Small files inflate query times and S3 request costs across every open table format. The symptoms look the same: slow scans, high GET counts, growing object counts. But the fix is format-specific. Here is how to detect the small files problem in Iceberg, Delta Lake, and Hudi, and which compaction procedure to run.
Why small files hurt performance
Query engines like Athena and Spark pay a fixed cost for every file they touch: a file open, a metadata fetch, and often a task slot of its own. A table with 50,000 files of 1 MB each (about 50 GB) requires 50,000 file opens. The same data in 50 files of 1 GB each requires 50. At scale, file counts above 10,000 per partition add measurable query planning overhead before a single byte of data is read.
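The same arithmetic applies to any table size. A minimal sketch, assuming decimal units (1 GB = 1,000 MB) as in the example above; the function name is illustrative only:

```python
import math

def file_count(total_mb: int, file_size_mb: int) -> int:
    """Number of files needed to hold total_mb of data at a given file size."""
    return math.ceil(total_mb / file_size_mb)

total_mb = 50_000  # about 50 GB of table data

small = file_count(total_mb, 1)      # 1 MB files
large = file_count(total_mb, 1_000)  # 1 GB files

# Every file costs at least one open, so the ratio approximates
# how many more per-file operations the small-file layout needs.
print(small, large, small // large)  # 50000 50 1000
```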
Detecting small files from S3 monitoring
- Pull S3 Inventory for the table prefix and compute median object size per partition
- Flag partitions with median object size below 64 MB as small-file candidates
- Check the ratio of GET requests to bytes scanned: a high GET count relative to bytes read indicates query engines are touching many small files
- For Iceberg: cross-reference manifest data file entries against S3 object sizes
- For Delta Lake: parse the add actions in the _delta_log commit files to get per-file sizes without running DESCRIBE DETAIL
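The first two checks above can be sketched in a few lines. This assumes the S3 Inventory report has already been loaded into (object key, size in bytes) pairs and filtered to data files; the function names and the rule that the partition is everything before the file name are illustrative assumptions, not part of any S3 API:

```python
from collections import defaultdict
from statistics import median

MB = 1024 * 1024
SMALL_FILE_THRESHOLD = 64 * MB  # threshold from the checklist above

def partition_of(key: str) -> str:
    """Assumption: the partition path is everything before the file name."""
    return key.rsplit("/", 1)[0] if "/" in key else ""

def small_file_partitions(rows, threshold=SMALL_FILE_THRESHOLD):
    """rows: iterable of (object_key, size_in_bytes) pairs.

    Returns {partition: (object_count, median_size)} for every partition
    whose median object size is below the threshold.
    """
    sizes = defaultdict(list)
    for key, size in rows:
        sizes[partition_of(key)].append(size)

    flagged = {}
    for part, values in sizes.items():
        m = median(values)
        if m < threshold:
            flagged[part] = (len(values), m)
    return flagged

# Hypothetical inventory rows for two partitions of one table.
rows = [
    ("tbl/dt=2024-01-01/a.parquet", 1 * MB),
    ("tbl/dt=2024-01-01/b.parquet", 2 * MB),
    ("tbl/dt=2024-01-01/c.parquet", 3 * MB),
    ("tbl/dt=2024-01-02/a.parquet", 512 * MB),
    ("tbl/dt=2024-01-02/b.parquet", 600 * MB),
]

print(small_file_partitions(rows))
```

Only the first partition is flagged here: its median object size is 2 MB, while the second partition's median is well above 64 MB.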
Iceberg: bin-pack compaction
Iceberg compacts data files with the rewrite_data_files procedure (CALL system.rewrite_data_files). Its default bin-pack strategy merges small files into target-size files (default 512 MB). For the most control, pass a where filter and compact one partition at a time. Iceberg 1.5+ supports concurrent compaction with write operations.
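A minimal sketch of building that call for a single partition. The helper name, catalog, table, and column are hypothetical; the procedure arguments (table, strategy, where, options) and the target-file-size-bytes option are the ones the Iceberg Spark procedure documents. The helper only builds the statement, so it runs without a Spark session:

```python
def rewrite_data_files_sql(catalog, table, where=None,
                           target_file_size_bytes=512 * 1024 * 1024):
    """Build a CALL statement for Iceberg's rewrite_data_files procedure."""
    args = [
        f"table => '{table}'",
        "strategy => 'binpack'",
        f"options => map('target-file-size-bytes', '{target_file_size_bytes}')",
    ]
    if where:
        # The filter is embedded in a single-quoted SQL string, so this
        # sketch rejects single quotes; use double-quoted literals inside.
        if "'" in where:
            raise ValueError("use double quotes for literals inside the filter")
        args.append(f"where => '{where}'")
    return f"CALL {catalog}.system.rewrite_data_files({', '.join(args)})"

stmt = rewrite_data_files_sql(
    "my_catalog", "db.events", where='event_date = "2024-01-01"'
)
print(stmt)
```

In a Spark session with the Iceberg SQL extensions enabled, the statement would be passed to spark.sql(stmt).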
Delta Lake: OPTIMIZE and ZORDER
Delta Lake's OPTIMIZE command compacts small files into files of up to 1 GB by default and optionally applies Z-ordering (ZORDER BY) to co-locate data for frequently filtered columns. Delta 3.0+ supports Liquid Clustering as an alternative to static Z-ordering. Run OPTIMIZE after batch writes and on a daily schedule for streaming tables.
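A minimal sketch of assembling the OPTIMIZE statement, with an optional partition filter and ZORDER BY clause. The helper name, table, and columns are hypothetical, and the helper only builds the string, so it runs without Spark:

```python
def optimize_sql(table, where=None, zorder_by=()):
    """Build a Delta Lake OPTIMIZE statement.

    where: optional predicate; Delta expects it to reference partition columns.
    zorder_by: optional column names to co-locate with Z-ordering.
    """
    stmt = f"OPTIMIZE {table}"
    if where:
        stmt += f" WHERE {where}"
    if zorder_by:
        stmt += f" ZORDER BY ({', '.join(zorder_by)})"
    return stmt

# Compact only recent partitions and cluster by a frequently filtered column.
stmt = optimize_sql(
    "events",
    where="event_date >= '2024-01-01'",
    zorder_by=["user_id"],
)
print(stmt)  # OPTIMIZE events WHERE event_date >= '2024-01-01' ZORDER BY (user_id)
```

Restricting the run to recently written partitions keeps a scheduled job from rewriting data that is already compacted.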
Apache Hudi: compaction for MOR tables
Hudi MOR (Merge-on-Read) tables accumulate log files alongside base files. Compaction merges log files back into base files. Inline compaction runs during writes; async compaction runs separately. When compaction lags, the log-to-base-file ratio climbs and read amplification increases. Monitor compaction lag using the Hudi timeline metadata.
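One way to quantify that lag is the log-to-base-file ratio per file group. The sketch below assumes the table's files have already been listed and classified into (file group ID, kind) pairs; the function names, the input shape, and the threshold of 2 are illustrative assumptions, not Hudi APIs or defaults:

```python
from collections import Counter

def log_to_base_ratios(files):
    """files: iterable of (file_group_id, kind), kind being 'base' or 'log'.

    Returns {file_group_id: log_file_count / base_file_count}.
    """
    base, logs = Counter(), Counter()
    for group, kind in files:
        if kind == "base":
            base[group] += 1
        elif kind == "log":
            logs[group] += 1
        else:
            raise ValueError(f"unknown file kind: {kind!r}")
    # A file group that has only log files so far is treated as having
    # one base file, which avoids dividing by zero.
    return {g: logs[g] / max(base[g], 1) for g in set(base) | set(logs)}

def lagging_groups(files, max_ratio=2.0):
    """File groups whose ratio exceeds max_ratio (a hypothetical threshold)."""
    ratios = log_to_base_ratios(files)
    return sorted(g for g, r in ratios.items() if r > max_ratio)

# Hypothetical listing: fg-1 has three log files waiting, fg-2 has one.
files = [
    ("fg-1", "base"), ("fg-1", "log"), ("fg-1", "log"), ("fg-1", "log"),
    ("fg-2", "base"), ("fg-2", "log"),
]

print(lagging_groups(files))  # ['fg-1']
```

File groups returned here are the ones where readers merge the most log data per base file, so they are reasonable first candidates for compaction.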
How reCost surfaces the small files problem
- Median object size per table partition, updated from S3 Inventory snapshots
- Alert when median object size drops below 64 MB across a partition with more than 10,000 objects
- Format-specific recommendation: rewrite_data_files for Iceberg, OPTIMIZE for Delta, compaction for Hudi MOR
- GET request efficiency score: ratio of bytes read to GET requests per query engine per table
Connect reCost to your S3 environment in 5 minutes
No agents, no code changes. Just your S3 access logs and a complete picture of your data lake health.