Data Lake Health · 8 min read · March 2026

AWS S3 Tables vs Self-Managed Iceberg: Compaction Cost, Observability, and When to Switch

reCost Team

AWS S3 Tables is a managed Iceberg service that handles compaction, snapshot expiry, and orphan file removal automatically. It also comes with a different pricing model, reduced observability, and trade-offs for teams with complex partitioning or existing tooling. Here is how to evaluate whether to switch.

What AWS S3 Tables provides

S3 Tables is a purpose-built S3 bucket type for Apache Iceberg tables, with compaction, snapshot expiry, and orphan file removal run by AWS rather than by jobs you schedule. Iceberg operations go through the same S3 API surface as standard S3 buckets, but metadata management happens under AWS control, outside the customer's account.

Compaction cost comparison

With self-managed Iceberg, you pay for compaction through whichever compute engine runs it: a Spark cluster on EMR, a Glue job, or Athena-initiated compaction. With S3 Tables, AWS charges for the table maintenance capacity units (TMCUs) consumed by automated compaction. At low write rates, S3 Tables compaction can be cheaper than running a dedicated Spark compaction job, because the dedicated job carries startup and idle overhead regardless of how little data it rewrites. At high write rates, TMCU costs can exceed the cost of self-managed compaction.

  • Low write rate tables (under 1 GB/day): S3 Tables automated compaction is typically cost-effective
  • Medium write rate tables (1-100 GB/day): compare TMCU cost against Glue job cost for equivalent compaction frequency
  • High write rate tables (over 100 GB/day): self-managed compaction with tuned bin-pack parameters often wins on cost
  • Tables with custom partition specs: S3 Tables may not honor non-default partition specs in automated compaction
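
The comparison above can be sketched as a small cost model. This is a rough illustration, not AWS pricing: every rate below (TMCUs per GB, price per TMCU, job hourly cost, throughput, fixed overhead) is a hypothetical input that you would replace with figures from your own bill and job metrics. The model assumes managed cost scales linearly with data written and that a self-managed job adds a fixed daily overhead on top of its per-GB cost.

```python
from dataclasses import dataclass


@dataclass
class CompactionCostInputs:
    # All fields are assumptions supplied by you; none are published AWS rates.
    daily_write_gb: float                  # data written to the table per day
    tmcu_per_gb: float                     # TMCUs consumed per GB compacted
    tmcu_price_usd: float                  # price per TMCU, from your bill
    job_hourly_usd: float                  # hourly cost of the Glue/EMR job
    job_gb_per_hour: float                 # compaction throughput of that job
    job_fixed_hours_per_day: float = 0.0   # startup/idle overhead per day


def managed_daily_cost(c: CompactionCostInputs) -> float:
    """Managed compaction cost per day under the linear model."""
    return c.daily_write_gb * c.tmcu_per_gb * c.tmcu_price_usd


def self_managed_daily_cost(c: CompactionCostInputs) -> float:
    """Self-managed cost per day: fixed overhead plus time spent rewriting."""
    hours = c.job_fixed_hours_per_day + c.daily_write_gb / c.job_gb_per_hour
    return hours * c.job_hourly_usd


def break_even_gb_per_day(c: CompactionCostInputs):
    """Daily write volume where both options cost the same, or None."""
    managed_per_gb = c.tmcu_per_gb * c.tmcu_price_usd
    self_per_gb = c.job_hourly_usd / c.job_gb_per_hour
    if managed_per_gb <= self_per_gb:
        # Managed never becomes more expensive under this model.
        return None
    fixed = c.job_fixed_hours_per_day * c.job_hourly_usd
    return fixed / (managed_per_gb - self_per_gb)
```

Below the break-even volume the managed option is cheaper in this model; above it, the self-managed job wins. Real bills are rarely this linear, so treat the result as a starting point for the comparison rather than an answer.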

Observability trade-offs

Self-managed Iceberg gives you full access to the metadata tree, S3 access logs, and snapshot history. S3 Tables restricts direct metadata access: table storage lives in an AWS-managed prefix that does not show up in your S3 Inventory or access logs the way a standard bucket does. That limits your ability to independently monitor orphan file accumulation, snapshot health, and manifest bloat.
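
As an illustration of the kind of independent check this refers to, here is a minimal sketch that turns three raw inputs into simple health indicators. It assumes you can obtain data file sizes, a manifest count, and a snapshot count, for example from Iceberg's `files`, `manifests`, and `snapshots` metadata tables where your engine exposes them. The function name and all thresholds (32 MB small-file cutoff, 50% small-file ratio, 500 snapshots) are hypothetical defaults, not recommendations from AWS or the Iceberg project.

```python
from dataclasses import dataclass


@dataclass
class TableHealth:
    small_file_ratio: float        # share of data files below the cutoff
    avg_files_per_manifest: float  # low values can hint at manifest bloat
    snapshot_count: int
    needs_attention: bool


def table_health(file_sizes_bytes, manifest_count, snapshot_count,
                 small_file_bytes=32 * 1024 * 1024,
                 max_small_ratio=0.5, max_snapshots=500):
    """Summarize table health from metadata counts (thresholds are assumptions)."""
    n = len(file_sizes_bytes)
    small = sum(1 for size in file_sizes_bytes if size < small_file_bytes)
    ratio = small / n if n else 0.0
    per_manifest = n / manifest_count if manifest_count else 0.0
    flagged = ratio > max_small_ratio or snapshot_count > max_snapshots
    return TableHealth(ratio, per_manifest, snapshot_count, flagged)
```

On a self-managed table you can feed this from the metadata tree directly; on S3 Tables the same inputs have to come through whatever query path the service exposes, which is the observability gap in practice.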

When to stay self-managed

  • You need object-level cost attribution per table: S3 Tables does not expose the storage breakdown at the file level
  • You run custom compaction strategies (Z-ORDER, bin-pack with custom target sizes): S3 Tables uses AWS-defined compaction parameters
  • You rely on S3 access logs for security monitoring: S3 Tables access patterns appear differently in access logs
  • You have Iceberg tables with complex partition evolution history: managed compaction may not handle all partition evolution cases
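
For reference, the custom compaction strategies mentioned above are usually expressed in self-managed setups through Iceberg's Spark `rewrite_data_files` procedure. The helper below is a minimal sketch that only builds the `CALL` statement; the helper name, the catalog and table names in the usage note, and the 512 MB default are hypothetical. It does no escaping, so it should only be given trusted identifiers.

```python
def rewrite_data_files_sql(catalog, table, target_file_mb=512, zorder_columns=None):
    """Build a Spark SQL CALL for Iceberg's rewrite_data_files procedure.

    Uses bin-pack by default, or a Z-order sort when zorder_columns is given.
    """
    args = [f"table => '{table}'"]
    if zorder_columns:
        cols = ", ".join(zorder_columns)
        args.append("strategy => 'sort'")
        args.append(f"sort_order => 'zorder({cols})'")
    else:
        args.append("strategy => 'binpack'")
    # Iceberg expects the target file size in bytes, passed as a string option.
    target_bytes = target_file_mb * 1024 * 1024
    args.append(f"options => map('target-file-size-bytes', '{target_bytes}')")
    return f"CALL {catalog}.system.rewrite_data_files({', '.join(args)})"
```

In a Spark session configured with the Iceberg extensions, the resulting string would be passed to `spark.sql(...)`, for example `spark.sql(rewrite_data_files_sql("my_catalog", "db.events", 256))`. This is the level of control you give up when compaction parameters are defined by the managed service.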

When S3 Tables makes sense

  • Small teams without Spark infrastructure to run maintenance jobs
  • Tables with predictable write rates where TMCU pricing is favorable
  • New projects where operational simplicity outweighs observability depth
  • Tables that are primarily read-heavy with infrequent writes

Monitoring S3 Tables with reCost

reCost monitors S3 Tables through CloudTrail data events and the Iceberg REST catalog API, providing table health scores, query cost attribution, and silent writer detection for tables on both self-managed and S3 Tables backends.
