
Data Lake Monitoring & S3 Observability Blog

Technical posts on S3 monitoring, Delta Lake health, lakehouse observability, Athena tuning, and data pipeline debugging. For data engineers and platform teams.

ALL POSTS
Data Lake Health · 6 min read

How to Expire Iceberg Snapshots Without Breaking Time-Travel Queries

Running expire_snapshots too aggressively can break time-travel queries your downstream consumers depend on. Here is how to set the retention window correctly and verify expiry ran without corrupting your table.

reCost Team
Apr 14, 2026
Data Lake Health · 7 min read

Manifest Bloat in Apache Iceberg: How to Detect It and When to Run rewrite_manifests

Iceberg manifest files are the index your query engines read before touching a single data file. When manifests bloat into thousands of small manifest files per snapshot, query planning slows significantly before any data is read. Here is how to detect manifest bloat and when rewrite_manifests actually helps.

reCost Team
Mar 24, 2026
Data Lake Health · 9 min read

The Small Files Problem in Iceberg, Delta Lake, and Hudi: Compaction Strategies Compared

Small files inflate query times and S3 request costs across every open table format. The symptoms look the same: slow scans, high GET counts, growing object counts. But the fix is format-specific. Here is how to detect the small files problem in Iceberg, Delta Lake, and Hudi, and which compaction procedure to run.

reCost Team
Mar 3, 2026
Pipeline Observability · 8 min read

Detecting Silent Writers in S3-Backed Data Lakes: Firehose, Kinesis, MSK, Glue, and Spark Streaming

A silent writer is a pipeline that stops committing data to S3 without raising an error. Glue reports success, Firehose shows no failures, Airflow marks the task green. But the table is not being updated. Here is how to detect silent writers across every major S3-writing service using S3 access logs.

reCost Team
Feb 10, 2026
Data Lake Health · 7 min read

Athena Cost per Table: Attribution Using S3 Access Logs and CloudTrail

Athena charges per byte scanned. But the Athena console only tells you the total per-query scan size, not which table caused it, which team runs the most expensive queries, or which partition is scanned cold every time. S3 access logs give you that attribution layer.

reCost Team
Jan 20, 2026
Data Lake Health · 7 min read

Delta Lake _delta_log Bloat: Why Your Checkpoints Grow to 170 TB and How to Fix It

The Delta Lake transaction log lives in the _delta_log/ prefix of every Delta table. Without proper checkpoint and VACUUM configuration, this log accumulates JSON files, Parquet checkpoints, and orphaned data files that inflate storage costs and slow every reader that opens the table.

reCost Team
Dec 30, 2025
Data Lake Health · 6 min read

Hudi MOR Compaction Lag: How to Monitor It Without Touching the Writer

Apache Hudi MOR (Merge-on-Read) tables accumulate delta log files between compaction runs. As the log-to-base-file ratio grows, read amplification increases. Every query must merge more log files before returning results. Here is how to monitor compaction lag without instrumenting your Spark or Flink writers.

reCost Team
Dec 9, 2025
Data Lake Health · 8 min read

AWS S3 Tables vs Self-Managed Iceberg: Compaction Cost, Observability, and When to Switch

AWS S3 Tables is a managed Iceberg service that handles compaction, snapshot expiry, and orphan file removal automatically. But it comes with different pricing, reduced observability, and trade-offs for teams with complex table formats or existing tooling. Here is how to evaluate whether to switch.

reCost Team
Nov 18, 2025
Data Lake Health · 7 min read

Trino Query Cost Attribution: Joining Event-Listener Logs with S3 Access Logs

Trino does not expose per-query S3 costs natively. But every Trino query that reads Iceberg, Delta, or Hudi data generates S3 GET requests that appear in your access logs under the Trino connector's IAM role. Here is how to join Trino event-listener logs with S3 access logs to attribute query cost per table, per user, and per team.

reCost Team
Oct 28, 2025
Data Lake Health · 9 min read

Delta Lake Health Monitoring: What to Track and How to Find Issues Without Your Query Engine

Most Delta Lake health problems are invisible until they show up as slow queries or failed jobs. Here's how S3 access logs surface compaction lag, orphaned files, and checkpoint failures before they escalate.

reCost Team
Oct 7, 2025
S3 Monitoring · 8 min read

S3 Monitoring Beyond CloudWatch: Object-Level Visibility for Data Engineering Teams

CloudWatch tells you how much S3 storage you have. It doesn't tell you which tables are degrading, which pipelines have stopped writing, or which IAM roles are behaving unexpectedly. Here's what object-level visibility actually looks like.

reCost Team
Sep 16, 2025
Data Lake Health · 10 min read

Lakehouse Monitoring in 2026: Covering Iceberg, Delta Lake, and Hudi From One Place

Iceberg, Delta Lake, and Hudi all have different metadata models, compaction patterns, and failure modes. Here's how to monitor all three from a single place without running queries against each catalog.

reCost Team
Aug 26, 2025
Data Lake Health · 7 min read

Athena Monitoring From S3 Access Logs: Cold Partitions, Stale Results, and Scan Waste

Athena scan costs scale with how much data each query touches. S3 access logs reveal exactly which partitions are being hit, how often, and whether your results cache is doing anything useful.

reCost Team
Aug 5, 2025
Pipeline Observability · 8 min read

Data Pipeline Observability Without Instrumentation: How S3 Write Patterns Tell the Story

Adding observability to every ETL job takes time your team doesn't have. S3 write patterns already contain the signal you need to detect dead pipelines, cadence drift, and checkpoint failures.

reCost Team
Jul 15, 2025
Engineering · 7 min read

IAM Monitoring for AWS Data Teams: Who Is Accessing What and With What SDK

IAM roles, SDK versions, access frequency, and bucket boundaries: most data teams have no visibility into this layer until something goes wrong. Here's how to build that picture from S3 access logs.

reCost Team
Jun 24, 2025
S3 Monitoring · 9 min read

Data Lake Monitoring in 2026: What Good Looks Like Across S3 Environments

We analyzed petabytes of S3 usage across hundreds of data lake workloads to define what efficient S3 storage looks like in 2026, and where most teams still fall short.

reCost Team
Jun 3, 2025
S3 Monitoring · 7 min read

What S3 Access Logs Reveal About Your Data Lake That CloudWatch Hides

CloudWatch surfaces bucket-level metrics. S3 access logs tell you which tables are growing, which pipelines have stopped, and which roles are crossing boundaries. The difference matters.

reCost Team
May 13, 2025
Data Lake Health · 6 min read

S3 Data Transfer Monitoring: What Engineers Actually Need to Know

Data transfer costs are one of the most unpredictable parts of your AWS bill. Cross-region replication, CDN charges, and internal service calls all add up faster than you think.

reCost Team
Apr 22, 2025
Pipeline Observability · 8 min read

S3 Request Monitoring: How GET and PUT Patterns Expose Pipeline Problems

GET, PUT, and LIST requests are billed per operation on S3, and they add up fast at scale. More importantly, unusual request patterns are often the first sign a pipeline is broken.

reCost Team
Apr 1, 2025
S3 Monitoring · 5 min read

Object-Level S3 Monitoring vs Bucket Metrics: What the Difference Reveals

Bucket metrics tell you total spend. Object-level visibility tells you which tables, prefixes, and access patterns are driving it. Here's what you can and can't see at each level.

reCost Team
Mar 11, 2025
Data Lake Health · 9 min read

Delta Lake Health and S3 Lifecycle: What Your Monitoring Stack Misses

S3 lifecycle policies and Delta Lake compaction interact in ways most monitoring tools don't surface. Here's where the gaps are and how to close them.

reCost Team
Feb 18, 2025
Data Lake Health · 7 min read

S3 Storage Class Monitoring: Finding Cold Data Without Touching Your Catalog

S3 Intelligent-Tiering automates transitions but doesn't tell you what's cold, why, or whether it matches your access patterns. Object-level visibility gives you that picture first.

reCost Team
Jan 28, 2025
Engineering · 5 min read

Bucket-Level vs Object-Level S3 Monitoring: Why Engineers Need Both

Bucket-level metrics are fast and cheap. Object-level visibility is where the real signal lives. Here's when each matters and how to combine them effectively.

reCost Team
Jan 7, 2025
Data Lake Health · 6 min read

Small Files in S3: How Data Lake Monitoring Surfaces the Real Performance Cost

Every GET and LIST request on S3 has a price. Small files multiply your request count while frequent access patterns compound the cost. Here's how to see and fix it.

reCost Team
Dec 17, 2024
Pipeline Observability · 7 min read

How to Monitor S3 Access Patterns and Catch Pipeline Failures Early

S3 API call costs can quietly drain your budget. But more importantly, access patterns are one of the earliest signals of pipeline failure, schema drift, and data quality issues.

reCost Team
Nov 26, 2024
Pipeline Observability · 6 min read

Why Your AWS S3 Lifecycle Policies Might Be Costing You More Than You Think

Misconfigured lifecycle rules can end up costing more than doing nothing. Here's how to audit your existing policies and fix the patterns that silently inflate your S3 spend.

reCost Team
Nov 5, 2024