Data Lake Health · 6 min read · May 2026

How to Expire Iceberg Snapshots Without Breaking Time-Travel Queries

reCost Team

Running expire_snapshots too aggressively can break time-travel queries your downstream consumers depend on. Here is how to set the retention window correctly and verify expiry ran without corrupting your table.

What expire_snapshots does

The Iceberg expire_snapshots procedure removes snapshots older than a specified timestamp from the table metadata. It also deletes the data files and manifest files that only those expired snapshots reference; remove_orphan_files is a separate procedure for files that no table metadata references at all. Without regular expiry, the snapshot list grows indefinitely, adding overhead to every reader that has to load the table metadata before it can plan a query.

The time-travel risk

Iceberg supports time-travel queries that read data as-of a past snapshot (SELECT * FROM table FOR VERSION AS OF snapshot_id or FOR TIMESTAMP AS OF ts). If you expire snapshots that a downstream consumer relies on for auditing or rollback, those time-travel queries will fail with 'snapshot not found' errors. The solution is not to skip expiry. It is to set retention correctly.
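To make the failure mode concrete, here is a minimal sketch of how a timestamp-based time-travel lookup behaves before and after expiry. It is a simplified model, not Iceberg's actual code: the function name `snapshot_as_of`, the snapshot IDs, and the dates are all hypothetical.

```python
from datetime import datetime


def snapshot_as_of(snapshots, as_of):
    """Return the ID of the latest snapshot committed at or before `as_of`.

    `snapshots` maps snapshot ID -> commit time. Returns None when no
    retained snapshot is old enough, which is the case where a
    FOR TIMESTAMP AS OF query fails.
    """
    candidates = [(ts, sid) for sid, ts in snapshots.items() if ts <= as_of]
    return max(candidates)[1] if candidates else None


# Hypothetical table history: three snapshots before expiry.
before_expiry = {
    101: datetime(2026, 4, 20),
    102: datetime(2026, 4, 28),
    103: datetime(2026, 5, 3),
}
# The same table after expiring everything older than 2026-05-01.
after_expiry = {103: datetime(2026, 5, 3)}

query_time = datetime(2026, 4, 30)
resolved_before = snapshot_as_of(before_expiry, query_time)
resolved_after = snapshot_as_of(after_expiry, query_time)
```

In this model the query for April 30 resolves to snapshot 102 before expiry and to nothing afterwards. Note that the snapshot serving a given timestamp can be older than the timestamp itself, so a retention window with some margin is safer than one that exactly matches the oldest query you expect.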

  • Audit your consumers: check for any SELECT ... FOR TIMESTAMP AS OF queries in your Athena and Spark history
  • Identify the oldest time-travel window you need to support (common values: 7 days, 14 days, 30 days)
  • Set older_than to now() minus your retention window, and never to a more recent timestamp
  • Run expire_snapshots on a schedule that keeps pace with your write cadence, so snapshots do not pile up between runs
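
The retention rule in the list above comes down to a small date calculation. A minimal sketch, assuming hypothetical helper names (`older_than_cutoff`, `as_sql_timestamp`) and a Spark session time zone of UTC:

```python
from datetime import datetime, timedelta, timezone


def older_than_cutoff(retention_days, now=None):
    """Return the most recent timestamp that is safe to pass as older_than."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=retention_days)


def as_sql_timestamp(ts):
    """Format a datetime for a SQL literal: TIMESTAMP 'YYYY-MM-DD HH:MM:SS'."""
    return ts.strftime("%Y-%m-%d %H:%M:%S")


# Example: a 14-day time-travel window evaluated at a fixed point in time.
fixed_now = datetime(2026, 5, 15, 12, 0, 0, tzinfo=timezone.utc)
cutoff = older_than_cutoff(14, now=fixed_now)
cutoff_literal = as_sql_timestamp(cutoff)
```

Passing `now` explicitly keeps the calculation testable; in a scheduled job you would normally omit it and let the helper use the current UTC time.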

The correct CALL syntax

For Spark with Iceberg: CALL catalog.system.expire_snapshots(table => 'db.table', older_than => TIMESTAMP 'YYYY-MM-DD HH:MM:SS', max_concurrent_deletes => 4). The max_concurrent_deletes parameter sets the size of the thread pool used to delete files. Set it to 4 or higher to drain large backlogs faster. Athena does not expose this CALL procedure for Iceberg tables: run Athena's VACUUM statement, which performs snapshot expiration, or make a direct Iceberg API call from a Glue job.
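
If the call is issued from a scheduled PySpark job, the statement can be assembled from parameters rather than hand-edited. The sketch below is one way to do that; `build_expire_call`, `my_catalog`, and `db.events` are hypothetical names. The optional retain_last argument is a real expire_snapshots parameter that keeps a minimum number of recent snapshots, although the text above does not use it.

```python
def build_expire_call(catalog, table, older_than,
                      max_concurrent_deletes=4, retain_last=None):
    """Build the Spark SQL CALL statement for Iceberg's expire_snapshots.

    `older_than` is a 'YYYY-MM-DD HH:MM:SS' string. Inputs are assumed to
    come from trusted job configuration, not from end users.
    """
    args = [
        f"table => '{table}'",
        f"older_than => TIMESTAMP '{older_than}'",
        f"max_concurrent_deletes => {int(max_concurrent_deletes)}",
    ]
    if retain_last is not None:
        args.append(f"retain_last => {int(retain_last)}")
    return f"CALL {catalog}.system.expire_snapshots({', '.join(args)})"


statement = build_expire_call("my_catalog", "db.events", "2026-05-01 12:00:00")

# In a job with an Iceberg-enabled SparkSession named `spark` (assumed):
# spark.sql(statement).show()
```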

Verifying expiry ran correctly

After expire_snapshots completes, check the snapshot count in the table metadata. If it has not decreased, the cutoff may not have covered any snapshots (when older_than is omitted, the table's history.expire.max-snapshot-age-ms setting applies), or the commit may have failed on a lock. S3 access logs show whether a write to the metadata/ prefix occurred during the expiry window. If there were no metadata writes, expiry did not commit.
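
One way to automate that check is to capture snapshot commit times before and after the run and compare them. This is a sketch under assumptions: `check_expiry` is a hypothetical helper, and `my_catalog.db.events` is a placeholder table name. The query reads Iceberg's `snapshots` metadata table.

```python
from datetime import datetime

# Run this before and after expiry and collect the committed_at values.
SNAPSHOTS_QUERY = "SELECT committed_at FROM my_catalog.db.events.snapshots"


def check_expiry(before, after, cutoff):
    """Compare snapshot commit times captured before and after expiry."""
    return {
        "removed": len(before) - len(after),
        # Snapshots older than the cutoff that still exist after the run.
        "stale_remaining": sum(1 for ts in after if ts < cutoff),
        "expired_something": len(after) < len(before),
    }


# Hypothetical commit times for illustration.
cutoff = datetime(2026, 5, 1)
before = [datetime(2026, 4, 20), datetime(2026, 4, 28), datetime(2026, 5, 3)]
after = [datetime(2026, 5, 3)]
report = check_expiry(before, after, cutoff)
```

A non-zero stale_remaining is not always a failure: the current snapshot, snapshots kept by retain_last, and snapshots referenced by a branch or tag can legitimately survive a run.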

How reCost monitors snapshot expiry health

  • Tracks snapshot count per table over time, not just current count
  • Alerts when snapshot count exceeds a threshold (default: 1,000 per table)
  • Shows last expire_snapshots run timestamp inferred from metadata writes
  • Flags tables where expire_snapshots has never run or has not run in more than 7 days
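
The checks in this list are simple enough to prototype yourself. The sketch below is not reCost's implementation; it only illustrates the same rules using the thresholds quoted above, and the function and flag names are hypothetical.

```python
from datetime import datetime, timedelta


def expiry_health_flags(snapshot_count, last_expiry, now,
                        max_snapshots=1000, max_age_days=7):
    """Return a list of health flags for one table.

    `last_expiry` is the time of the last known expire_snapshots run,
    or None if no run has ever been observed.
    """
    flags = []
    if snapshot_count > max_snapshots:
        flags.append("snapshot_count_over_threshold")
    if last_expiry is None:
        flags.append("expiry_never_run")
    elif now - last_expiry > timedelta(days=max_age_days):
        flags.append("expiry_stale")
    return flags


now = datetime(2026, 5, 15)
healthy = expiry_health_flags(120, datetime(2026, 5, 14), now)
neglected = expiry_health_flags(4200, datetime(2026, 4, 1), now)
```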