Data Lake Health · 10 min read · Feb 2026

Lakehouse Monitoring in 2026: Covering Iceberg, Delta Lake, and Hudi From One Place

reCost Team

Iceberg, Delta Lake, and Hudi all solve the same core problem (ACID transactions on object storage), but they do it differently: different metadata models, different compaction strategies, different failure modes. Most monitoring approaches handle one format well and the others poorly. Here's how to cover all three without building three separate monitoring pipelines.

Why each format needs different monitoring signals

Apache Iceberg

Iceberg's snapshot model means every commit creates a new snapshot and a new table metadata file. Without snapshot expiry policies, snapshots accumulate indefinitely. That wastes storage, because data and manifest files referenced by old snapshots can't be deleted, and it slows down readers, because the metadata file every query parses grows with each retained snapshot. The key signals for Iceberg health are snapshot count per table, manifest file count, and whether expiry is configured and running.
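To make the snapshot-count signal concrete, here is a minimal sketch that reads one Iceberg table metadata file (a *.metadata.json file under the table's metadata/ directory) and counts snapshots older than a retention window. It relies only on the snapshots, snapshot-id, timestamp-ms, and current-snapshot-id fields from the Iceberg table spec. The function name and the 7-day window are hypothetical, and manifest counts would additionally require reading the Avro manifest lists, which this sketch skips.

```python
import json

MS_PER_DAY = 86_400_000


def iceberg_snapshot_health(metadata_json: str, now_ms: int, max_age_days: float = 7.0) -> dict:
    """Summarize snapshot retention from an Iceberg table metadata file (hypothetical helper)."""
    meta = json.loads(metadata_json)
    snapshots = meta.get("snapshots", [])
    current_id = meta.get("current-snapshot-id")
    cutoff_ms = now_ms - int(max_age_days * MS_PER_DAY)  # assumed retention window

    # Snapshots older than the window, excluding the current one, are what a
    # working expiry job would normally have removed already.
    stale = [
        s for s in snapshots
        if s["timestamp-ms"] < cutoff_ms and s["snapshot-id"] != current_id
    ]
    oldest_ms = min((s["timestamp-ms"] for s in snapshots), default=None)

    return {
        "snapshot_count": len(snapshots),
        "stale_snapshot_count": len(stale),
        "oldest_snapshot_age_days": (
            None if oldest_ms is None else (now_ms - oldest_ms) / MS_PER_DAY
        ),
        # Any stale snapshot suggests expiry is missing or not keeping up.
        "expiry_suspect": bool(stale),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "current-snapshot-id": 3,
        "snapshots": [
            {"snapshot-id": 1, "timestamp-ms": 1 * MS_PER_DAY},
            {"snapshot-id": 2, "timestamp-ms": 5 * MS_PER_DAY},
            {"snapshot-id": 3, "timestamp-ms": 9 * MS_PER_DAY},
        ],
    })
    print(iceberg_snapshot_health(sample, now_ms=10 * MS_PER_DAY))
```

The current snapshot is excluded from the stale set because expiry always keeps it, so an idle table with one old snapshot is not flagged.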

Delta Lake

Delta Lake records every change as a JSON commit in a transaction log under _delta_log/, periodically summarized by Parquet checkpoints. Log depth, checkpoint frequency, and the ratio of log files to data files are the primary health indicators. Orphaned data files (partial writes that were never committed, plus files logically removed by later commits) accumulate without VACUUM policies: query engines never read them, but they remain in S3 and keep costing storage.
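Log depth and checkpoint lag can be read directly from object keys, for example from an S3 Inventory listing. This sketch assumes the standard _delta_log/ naming: commits as 20-digit zero-padded version numbers ending in .json, and classic single-part or multi-part checkpoints ending in .checkpoint.parquet or .checkpoint.<part>.<parts>.parquet (newer UUID-named checkpoints are not matched). The function names are hypothetical.

```python
import re

# Commit files: 20-digit zero-padded version, e.g. 00000000000000000012.json
COMMIT_RE = re.compile(r"_delta_log/(\d{20})\.json$")
# Classic checkpoints, single-part or multi-part (<version>.checkpoint.<part>.<parts>.parquet)
CHECKPOINT_RE = re.compile(r"_delta_log/(\d{20})\.checkpoint(?:\.\d+\.\d+)?\.parquet$")


def _versions(keys, pattern):
    """Return sorted, de-duplicated versions for keys matching pattern."""
    found = set()
    for key in keys:
        match = pattern.search(key)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


def delta_log_health(keys):
    """Summarize log depth and checkpoint lag for one table's keys (hypothetical helper)."""
    commits = _versions(keys, COMMIT_RE)
    checkpoints = _versions(keys, CHECKPOINT_RE)
    last_checkpoint = checkpoints[-1] if checkpoints else None

    # Readers load the newest checkpoint and then replay every later commit,
    # so this count approximates per-query log-replay overhead.
    if last_checkpoint is None:
        since_checkpoint = len(commits)
    else:
        since_checkpoint = sum(1 for v in commits if v > last_checkpoint)

    return {
        "commit_files": len(commits),
        "latest_version": commits[-1] if commits else None,
        "last_checkpoint_version": last_checkpoint,
        "commits_since_checkpoint": since_checkpoint,
    }
```

Counting commits after the newest checkpoint (rather than subtracting version numbers) keeps the result correct when old commit files have already been cleaned up.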

Apache Hudi

Hudi records every action as an instant on its timeline, stored as files under .hoodie/. In Merge-on-Read (MOR) tables, updates land in log files attached to a base file, and compaction periodically merges them into new base files. Health problems in Hudi typically show up as compaction lag: a growing number of log files per base file, which increases read amplification over time. Timeline file counts and the ratio of log files to base files are the signals to watch.
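Compaction lag can likewise be estimated from keys alone by grouping log files with the base file they belong to. This is a sketch under an assumed file-naming layout (base files as <fileId>_<writeToken>_<instant>.parquet, log files as .<fileId>_<baseInstant>.log.<version>_<writeToken>); verify that pattern against your Hudi version before relying on it. The function name and the lag threshold of 5 log files are hypothetical.

```python
import posixpath
from collections import Counter


def hudi_log_to_base(keys, lag_threshold=5):
    """Estimate compaction lag per file group from object keys (hypothetical helper).

    Assumed naming (verify for your Hudi version):
      base file: <fileId>_<writeToken>_<instant>.parquet
      log file:  .<fileId>_<baseInstant>.log.<version>_<writeToken>
    """
    base_files = Counter()  # (partition, fileId) -> number of base files
    log_files = Counter()   # (partition, fileId) -> number of log files
    for key in keys:
        if "/.hoodie/" in "/" + key:
            continue  # timeline and table metadata, not data files
        partition, name = posixpath.split(key)
        if name.startswith(".") and ".log." in name:
            file_id = name[1:].split("_", 1)[0]
            log_files[(partition, file_id)] += 1
        elif name.endswith(".parquet"):
            file_id = name.split("_", 1)[0]
            base_files[(partition, file_id)] += 1

    total_logs = sum(log_files.values())
    total_bases = sum(base_files.values())
    return {
        "file_groups": len(set(base_files) | set(log_files)),
        # max(..., 1) avoids dividing by zero for tables with no base files yet
        "log_to_base_ratio": total_logs / max(total_bases, 1),
        "max_logs_per_group": max(log_files.values(), default=0),
        "lagging_groups": sorted(g for g, n in log_files.items() if n >= lag_threshold),
    }
```

Per-group counts matter more than the table-wide ratio: a single hot file group with dozens of uncompacted log files can dominate read cost while the overall ratio still looks modest.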

What they have in common from an S3 perspective

Despite their differences, all three formats leave similar footprints in S3 that reveal health status:

  • Metadata file accumulation without corresponding data growth indicates cleanup policy failures
  • High GET request rates to metadata prefixes relative to data prefixes indicate reader overhead
  • Write cadence changes (writes arriving faster or slower than baseline) indicate pipeline changes or failures
  • Object size distribution shifts indicate small file problems or compaction regressions
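Several of these footprints can be computed from S3 Inventory exports with a few lines of code. The sketch below is illustrative only: it treats any key under _delta_log/, metadata/, or .hoodie/ as metadata, uses an assumed 32 MiB small-file threshold, compares metadata growth against data growth between two inventory snapshots, and flags write-cadence anomalies with a simple rule of more than k standard deviations from a baseline. All names and thresholds are hypothetical and should be tuned to your tables.

```python
from statistics import mean, pstdev

# Metadata directories for Delta, Iceberg, and Hudi (matched anywhere in the key).
METADATA_MARKERS = ("/_delta_log/", "/metadata/", "/.hoodie/")
SMALL_FILE_BYTES = 32 * 1024 * 1024  # assumed threshold; tune per workload


def is_metadata(key):
    return any(marker in "/" + key for marker in METADATA_MARKERS)


def footprint(rows):
    """Summarize (key, size_bytes) rows from one S3 Inventory snapshot."""
    meta_sizes = [size for key, size in rows if is_metadata(key)]
    data_sizes = [size for key, size in rows if not is_metadata(key)]
    small = sum(1 for size in data_sizes if size < SMALL_FILE_BYTES)
    return {
        "metadata_objects": len(meta_sizes),
        "data_objects": len(data_sizes),
        "data_bytes": sum(data_sizes),
        "small_file_fraction": small / len(data_sizes) if data_sizes else 0.0,
    }


def metadata_outpacing_data(prev, curr, tolerance=1.5):
    """Flag when metadata object count grows much faster than data volume."""
    meta_growth = curr["metadata_objects"] / max(prev["metadata_objects"], 1)
    data_growth = curr["data_bytes"] / max(prev["data_bytes"], 1)
    return meta_growth > tolerance * data_growth


def cadence_anomaly(baseline_counts, today_count, k=3.0):
    """Flag a daily write count more than k standard deviations from baseline."""
    mu = mean(baseline_counts)
    sigma = pstdev(baseline_counts)
    if sigma == 0:
        return today_count != mu  # perfectly steady baseline: any change stands out
    return abs(today_count - mu) > k * sigma
```

A fixed k-sigma rule is the simplest possible baseline; pipelines with strong weekly seasonality would need a per-weekday baseline instead.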

The practical monitoring stack for multi-format lakehouses

The simplest approach is to build monitoring at the S3 layer rather than building format-specific monitoring for each table format. S3 access logs and inventory data are format-agnostic: they tell you what's being read, written, and accumulated without needing to understand the specific metadata model.

The format-specific interpretation happens in how you contextualize the signals: a high ratio of metadata GET requests under a _delta_log/ prefix means something different from the same pattern under an Iceberg table's metadata/ prefix. But the underlying data source (S3 logs and inventory) is the same.
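As a sketch of that contextualization, the code below parses S3 server access log lines, keeps only REST.GET.OBJECT requests, and computes a per-table ratio of metadata GETs to data GETs, using each format's default metadata directory relative to the table root (_delta_log/, metadata/, .hoodie/). It assumes you supply the table roots and their formats yourself (the tables mapping is hypothetical), and it parses only the leading fields of the access log record: bucket owner, bucket, time, remote IP, requester, request ID, operation, and key.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Leading fields of an S3 server access log record:
# bucket_owner bucket [time] remote_ip requester request_id operation key ...
LOG_RE = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] (\S+) (\S+) (\S+) (\S+) (\S+)")

# Default metadata directory for each format, relative to the table root.
METADATA_DIR = {"delta": "_delta_log/", "iceberg": "metadata/", "hudi": ".hoodie/"}


def metadata_get_ratios(log_lines, tables):
    """Return metadata-GET / data-GET per table.

    tables: hypothetical mapping of table root prefix (e.g. "warehouse/orders/")
    to format name ("delta", "iceberg", or "hudi").
    """
    meta, data = Counter(), Counter()
    for line in log_lines:
        match = LOG_RE.match(line)
        if not match or match.group(7) != "REST.GET.OBJECT":
            continue  # skip malformed lines and non-GET operations
        key = unquote(match.group(8))  # keys are URL-encoded in access logs
        for root, fmt in tables.items():
            if key.startswith(root):
                if key[len(root):].startswith(METADATA_DIR[fmt]):
                    meta[root] += 1
                else:
                    data[root] += 1
                break  # first matching root wins; avoid overlapping roots
    # max(..., 1) keeps tables with no data GETs from dividing by zero
    return {root: meta[root] / max(data[root], 1) for root in tables}
```

The same ratio is then read differently per format: for Delta it points toward checkpoint lag, for Iceberg toward snapshot or manifest bloat, for Hudi toward timeline growth.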

What reCost covers across all three formats

  • Iceberg: snapshot count, manifest bloat, expiry policy status, metadata overhead per table
  • Delta Lake: log depth, checkpoint age, orphaned file volume, small file ratios
  • Hudi: compaction lag, timeline file counts, log-to-base-file ratios
  • All formats: write cadence monitoring, storage class distribution, IAM access patterns

How to get started

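Connect reCost to your S3 environment by enabling S3 server access logging and S3 Inventory. Both are bucket-level settings, with no agents or code changes required. Once logs and inventory reports start arriving (S3 delivers inventory reports on a daily or weekly schedule, and the first report can take up to 48 hours), you'll see health status across all your table formats, including which tables have active issues and what type of remediation is needed.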
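For reference, the two bucket settings can be applied with the AWS SDK. This is a minimal sketch, not reCost's own setup procedure: it builds the payloads for the S3 PutBucketLogging and PutBucketInventoryConfiguration calls and applies them through a boto3-style client passed in by the caller. The configuration ID, prefixes, and optional fields chosen here are hypothetical, and both destination buckets need bucket policies that allow S3 to deliver logs and inventory reports into them.

```python
def logging_status(log_bucket: str, log_prefix: str) -> dict:
    """BucketLoggingStatus payload for S3 PutBucketLogging."""
    return {"LoggingEnabled": {"TargetBucket": log_bucket, "TargetPrefix": log_prefix}}


def inventory_configuration(config_id: str, dest_bucket: str, dest_prefix: str,
                            frequency: str = "Daily") -> dict:
    """InventoryConfiguration payload for S3 PutBucketInventoryConfiguration."""
    return {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{dest_bucket}",  # destination is given as an ARN
                "Format": "Parquet",
                "Prefix": dest_prefix,
            }
        },
        "Schedule": {"Frequency": frequency},  # "Daily" or "Weekly"
        # Size and LastModifiedDate support small-file and cadence analysis.
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
    }


def enable_monitoring_inputs(s3, bucket: str, log_bucket: str, inventory_bucket: str,
                             config_id: str = "lakehouse-health") -> None:
    """Apply both settings through a boto3-style S3 client (names are placeholders)."""
    s3.put_bucket_logging(
        Bucket=bucket,
        BucketLoggingStatus=logging_status(log_bucket, f"access-logs/{bucket}/"),
    )
    s3.put_bucket_inventory_configuration(
        Bucket=bucket,
        Id=config_id,
        InventoryConfiguration=inventory_configuration(
            config_id, inventory_bucket, f"inventory/{bucket}"
        ),
    )
```

With boto3 installed, passing boto3.client("s3") as the first argument along with your own bucket names would apply both settings. Keeping the payload builders separate from the API calls makes them easy to review or reuse in infrastructure-as-code.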
