Data Lake Health · 7 min read · May 2026

Manifest Bloat in Apache Iceberg: How to Detect It and When to Run rewrite_manifests

reCost Team

Iceberg manifest files are the index your query engines read before touching a single data file. When manifests bloat (thousands of small manifest files per snapshot), query planning slows significantly before any data is read. Here is how to detect manifest bloat and when rewrite_manifests actually helps.

What Iceberg manifests are and why they matter

Every Iceberg snapshot has a manifest list: a file that points to one or more manifest files. Each manifest file lists a subset of the table's data files, along with per-file statistics (min/max values, null counts) that query planners use to skip files that cannot match a query. The manifest list also records the range of partition values each manifest covers, so planners can skip whole manifests. The more manifest files a snapshot references, the more files a query engine must read before it can begin scanning data.
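To make the structure concrete, here is a minimal Python sketch of the snapshot, manifest list, and manifest hierarchy. The class and function names are hypothetical and the model is deliberately simplified (no statistics, no pruning); it only shows how the number of metadata files opened during planning grows with the manifest count, even when the data files stay the same.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFileEntry:
    path: str
    record_count: int


@dataclass
class Manifest:
    path: str
    entries: List[DataFileEntry] = field(default_factory=list)


@dataclass
class Snapshot:
    snapshot_id: int
    manifest_list_path: str
    manifests: List[Manifest] = field(default_factory=list)


def metadata_reads_for_full_scan(snapshot: Snapshot) -> int:
    """Metadata files opened before any data file: the manifest list plus every manifest."""
    return 1 + len(snapshot.manifests)


def build_snapshot(total_files: int, files_per_manifest: int) -> Snapshot:
    """Spread the same data files across manifests of a given size (illustrative only)."""
    files = [
        DataFileEntry(f"data/file-{i}.parquet", record_count=1000)
        for i in range(total_files)
    ]
    manifests = [
        Manifest(f"metadata/manifest-{n}.avro", files[start:start + files_per_manifest])
        for n, start in enumerate(range(0, total_files, files_per_manifest))
    ]
    return Snapshot(1, "metadata/snap-1.avro", manifests)


compact = build_snapshot(total_files=1000, files_per_manifest=250)
bloated = build_snapshot(total_files=1000, files_per_manifest=1)
print(metadata_reads_for_full_scan(compact))  # 5
print(metadata_reads_for_full_scan(bloated))  # 1001
```

Both snapshots describe the same 1,000 data files; only the manifest layout differs.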

How manifest bloat develops

Manifest bloat happens when writes are small and frequent. Each commit writes at least one new manifest and adds it to the new snapshot's manifest list; unless the writer merges manifests at commit time, the existing manifests are carried forward unchanged. After thousands of incremental writes without a rewrite_manifests pass, a single table scan can require reading hundreds or thousands of manifest files before reaching any data, adding seconds to query startup time.

  • Streaming pipelines with micro-batch commits: each commit adds a new manifest
  • Frequent small appends without compaction: each append creates a new manifest rather than updating existing ones
  • No rewrite_manifests in the maintenance schedule: manifests accumulate until query planning visibly degrades
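The accumulation described above can be sketched with a small simulation. It assumes the worst case: every commit adds exactly one manifest and nothing is merged at commit time. The function name and the rewrite cadence are hypothetical, and a real rewrite usually leaves more than one manifest on a large table.

```python
from typing import Optional


def manifests_after(commits: int,
                    rewrite_every: Optional[int] = None,
                    manifests_after_rewrite: int = 1) -> int:
    """Manifest count in the latest snapshot after `commits` small appends.

    Assumes one new manifest per commit and no merging at commit time.
    A rewrite pass (if scheduled) collapses the count to `manifests_after_rewrite`.
    """
    count = 0
    for commit_number in range(1, commits + 1):
        count += 1  # each small append contributes one more manifest
        if rewrite_every and commit_number % rewrite_every == 0:
            count = manifests_after_rewrite
    return count


# One micro-batch commit per minute for a day, with no maintenance:
print(manifests_after(commits=1440))  # 1440

# Same workload plus ten more commits, with a rewrite pass every 60 commits:
print(manifests_after(commits=1450, rewrite_every=60))  # 11
```

Under these assumptions the manifest count tracks the commit count until a maintenance job resets it, which is why a missing rewrite_manifests schedule shows up as steadily rising planning time.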

How to detect manifest bloat

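The manifest-list file for each snapshot contains one entry per manifest, so counting its entries gives the manifest count for that snapshot. A healthy table typically has 1 to 20 manifests per snapshot. Tables with hundreds or thousands of manifests per snapshot have significant bloat. The table metadata JSON records each snapshot's manifest-list location, so you can find and read that file without running a query engine. Some writers also record manifest counters in the snapshot's summary field, but those fields are optional and may be absent.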
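As a rough sketch of the first step, the Python below parses a table metadata JSON document with the standard library and returns the current snapshot's manifest-list location and summary. The sample metadata, bucket path, and function names are hypothetical; the "elevated" band between 20 and 100 manifests is an assumption filling the gap between the healthy range and "hundreds" given above. Counting entries in the manifest-list Avro file itself needs an Avro reader and is not shown.

```python
import json
from typing import Optional, Tuple


def current_manifest_list(metadata_json: str) -> Optional[Tuple[str, dict]]:
    """Return (manifest-list location, summary) for the current snapshot, if any."""
    metadata = json.loads(metadata_json)
    current_id = metadata.get("current-snapshot-id")
    for snapshot in metadata.get("snapshots", []):
        if snapshot.get("snapshot-id") == current_id:
            return snapshot.get("manifest-list"), snapshot.get("summary", {})
    return None  # table has no current snapshot


def classify_manifest_count(manifest_count: int) -> str:
    """Bucket a manifest count using the ranges from the text (middle band is assumed)."""
    if manifest_count <= 20:
        return "healthy"
    if manifest_count < 100:
        return "elevated"  # assumption: the text does not name this range
    return "bloated"


# Hypothetical, heavily trimmed metadata document for illustration.
sample = json.dumps({
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "manifest-list": "s3://example-bucket/t/metadata/snap-1.avro",
         "summary": {"operation": "append"}},
        {"snapshot-id": 2, "manifest-list": "s3://example-bucket/t/metadata/snap-2.avro",
         "summary": {"operation": "append"}},
    ],
})

location, summary = current_manifest_list(sample)
print(location)
print(classify_manifest_count(12), classify_manifest_count(850))
```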

When to run rewrite_manifests

Run rewrite_manifests when any of the following holds: the manifest count per snapshot exceeds 100, manifest files average less than 8 MB, or query planning time has increased without a corresponding increase in data volume. rewrite_manifests combines many small manifests into fewer, larger ones and commits the result as a new snapshot; data files are not rewritten. Readers keep using the snapshot they started with, so the operation is safe to run while the table is being read.
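The three criteria can be expressed as a small check. This is a sketch, not a definitive policy: the function name, the growth-ratio inputs, and the 1.5x slack used to decide that planning time has outpaced data volume are all assumptions made for illustration. Only the 100-manifest and 8 MB thresholds come from the text.

```python
from typing import List

MANIFEST_COUNT_THRESHOLD = 100             # from the text
MIN_AVG_MANIFEST_BYTES = 8 * 1024 * 1024   # 8 MB, from the text
PLANNING_SLACK = 1.5                       # assumption: tolerated planning growth relative to data growth


def rewrite_reasons(manifest_count: int,
                    avg_manifest_bytes: float,
                    planning_time_growth: float,
                    data_volume_growth: float) -> List[str]:
    """Return the criteria that currently argue for running rewrite_manifests.

    Growth values are ratios of current to baseline (1.0 means unchanged).
    """
    reasons = []
    if manifest_count > MANIFEST_COUNT_THRESHOLD:
        reasons.append("manifest count above 100")
    if avg_manifest_bytes < MIN_AVG_MANIFEST_BYTES:
        reasons.append("average manifest smaller than 8 MB")
    if planning_time_growth > data_volume_growth * PLANNING_SLACK:
        reasons.append("planning time grew faster than data volume")
    return reasons


# 150 manifests averaging 2 MB, planning time and data volume both flat:
print(rewrite_reasons(150, 2 * 1024 * 1024, 1.0, 1.0))

# 10 well-sized manifests, but planning time tripled while data grew 10%:
print(rewrite_reasons(10, 16 * 1024 * 1024, 3.0, 1.1))
```

A small table with a single undersized manifest also trips the size rule, so in practice it may make sense to require the size rule together with a minimum manifest count.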

What reCost tracks

  • Manifest count per snapshot per table, trended over 30 days
  • Average manifest file size per table
  • Alert threshold: flag tables with more than 200 manifests per snapshot
  • Recommended rewrite_manifests invocation with correct catalog and table reference
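For illustration, a recommended invocation might be assembled like the sketch below. It is not reCost's actual implementation: the helper names are hypothetical, and the output assumes the table is maintained through Iceberg's Spark SQL stored procedures, where rewrite_manifests is called under the catalog's system namespace. The 200-manifest alert threshold is the one listed above.

```python
import re
from typing import Optional

ALERT_THRESHOLD = 200  # manifests per snapshot, from the list above
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def rewrite_manifests_call(catalog: str, namespace: str, table: str) -> str:
    """Build a Spark SQL CALL statement for the rewrite_manifests procedure."""
    for part in (catalog, namespace, table):
        # Reject anything that is not a plain identifier rather than trying to quote it.
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"CALL {catalog}.system.rewrite_manifests(table => '{namespace}.{table}')"


def recommendation(catalog: str, namespace: str, table: str,
                   manifest_count: int) -> Optional[str]:
    """Return a CALL statement only when the table crosses the alert threshold."""
    if manifest_count > ALERT_THRESHOLD:
        return rewrite_manifests_call(catalog, namespace, table)
    return None


# Hypothetical catalog and table names.
print(recommendation("lake", "analytics", "events", manifest_count=350))
print(recommendation("lake", "analytics", "events", manifest_count=40))
```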