Running expire_snapshots too aggressively can break time-travel queries your downstream consumers depend on. Here is how to set the retention window correctly and verify expiry ran without corrupting your table.
What expire_snapshots does
The Iceberg expire_snapshots procedure removes snapshots older than a specified timestamp from the table metadata. It also deletes the data files and manifest files referenced exclusively by those snapshots, since no remaining snapshot can reach them. The separate remove_orphan_files procedure handles a different case: files that no table metadata references at all, such as leftovers from failed writes. Without regular expiry, snapshot lists grow indefinitely, adding overhead to every reader that loads the table metadata on startup.
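To make "referenced exclusively by those snapshots" concrete, here is a toy model in Python. It is a sketch only, not Iceberg's implementation: the `Snapshot` class, the `plan_expiry` function, and the file names are all hypothetical, and it assumes the newest snapshot is the current table state and is always kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    files: frozenset  # data and manifest files reachable from this snapshot


def plan_expiry(snapshots, older_than_ms):
    """Return (expired_ids, deletable_files) for a toy snapshot list."""
    # Assumption: the newest snapshot is the current table state and is kept.
    current = max(snapshots, key=lambda s: s.timestamp_ms)
    expired = [s for s in snapshots
               if s.timestamp_ms < older_than_ms and s is not current]
    retained = [s for s in snapshots if s not in expired]

    still_needed = set()
    for s in retained:
        still_needed |= s.files
    expired_files = set()
    for s in expired:
        expired_files |= s.files

    # Only files that no retained snapshot references are safe to delete.
    deletable = sorted(expired_files - still_needed)
    return sorted(s.snapshot_id for s in expired), deletable


snapshots = [
    Snapshot(1, 1_000, frozenset({"a.parquet"})),
    Snapshot(2, 2_000, frozenset({"a.parquet", "b.parquet"})),
    Snapshot(3, 3_000, frozenset({"b.parquet", "c.parquet"})),
]
expired_ids, deletable = plan_expiry(snapshots, older_than_ms=2_500)
print(expired_ids, deletable)  # [1, 2] ['a.parquet']
```

Snapshots 1 and 2 expire, but b.parquet survives because snapshot 3 still references it. Only a.parquet becomes deletable.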
The time-travel risk
Iceberg supports time-travel queries that read a table as of a past snapshot (SELECT * FROM table FOR VERSION AS OF snapshot_id or FOR TIMESTAMP AS OF ts). If you expire snapshots that a downstream consumer relies on for auditing or rollback, those time-travel queries will fail because the snapshot they ask for no longer exists. The solution is not to skip expiry but to set retention correctly.
- Audit your consumers: check your Athena and Spark query history for any SELECT ... FOR TIMESTAMP AS OF or FOR VERSION AS OF queries
- Identify the oldest time-travel window you need to support (common values: 7 days, 14 days, 30 days)
- Set older_than to now() minus your retention window or earlier, never later, so that every snapshot inside the window survives
- Run expire_snapshots on a schedule that matches your write cadence, not less frequently
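The cutoff calculation from the steps above can be sketched in Python. This is a minimal example under stated assumptions: the `older_than_cutoff` helper and the consumer names are hypothetical, and timestamps are handled in UTC.

```python
from datetime import datetime, timedelta, timezone


def older_than_cutoff(retention_days, now=None):
    """Return the older_than timestamp for the longest retention window."""
    windows = list(retention_days)
    if not windows:
        raise ValueError("at least one retention window is required")
    now = now or datetime.now(timezone.utc)
    # The longest window wins: a later cutoff would break that consumer.
    return now - timedelta(days=max(windows))


# Hypothetical consumers and the time-travel windows they need, in days.
consumer_windows = {"audit_reports": 30, "rollback_runbook": 7}
cutoff = older_than_cutoff(
    consumer_windows.values(),
    now=datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc),
)
print(cutoff.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-05-31 12:00:00
```

Taking the maximum across consumers means one 30-day auditor sets the window for the whole table, even if everyone else only needs 7 days.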
The correct CALL syntax
For Spark with Iceberg: `CALL catalog.system.expire_snapshots(table => 'db.table', older_than => TIMESTAMP 'YYYY-MM-DD HH:MM:SS', max_concurrent_deletes => 4)`. The `max_concurrent_deletes` parameter sets the size of the thread pool used to delete files. Set it to 4 or higher to drain large backlogs faster. For Athena with Iceberg tables: run the VACUUM statement, which expires snapshots according to the table's vacuum retention properties, or make a direct Iceberg API call from a Glue job.
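If you schedule the Spark call from a job, a small helper can build the statement from a computed cutoff. The sketch below is illustrative: `expire_snapshots_sql` is a hypothetical helper, it assumes the Spark session time zone is UTC, and it assumes the catalog and table names come from trusted configuration because they are interpolated without escaping.

```python
from datetime import datetime, timezone


def expire_snapshots_sql(catalog, table, older_than, max_concurrent_deletes=4):
    """Build the Spark SQL CALL statement for expire_snapshots."""
    if max_concurrent_deletes < 1:
        raise ValueError("max_concurrent_deletes must be at least 1")
    # Assumption: the Spark session time zone is UTC, matching older_than.
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{ts}', "
        f"max_concurrent_deletes => {max_concurrent_deletes})"
    )


statement = expire_snapshots_sql(
    "catalog", "db.table", datetime(2024, 5, 31, 12, 0, 0, tzinfo=timezone.utc)
)
print(statement)
# With a SparkSession configured for Iceberg: spark.sql(statement)
```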
Verifying expiry ran correctly
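After expire_snapshots completes, check the snapshot count in the table metadata, for example by counting rows in the table's snapshots metadata table. If it has not decreased, there may have been no snapshots older than older_than, a retention setting such as retain_last or the history.expire.min-snapshots-to-keep table property may have kept them, or the commit may have failed on a lock or a concurrent write. S3 access logs show whether a write to the metadata/ prefix occurred during the expiry window. If no metadata write occurred, expiry did not commit.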
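One way to automate the count check is to capture the snapshot list before and after the run and compare them. The following is a sketch under assumptions: `check_expiry` is a hypothetical helper, and each snapshot is a dict with the `snapshot-id` and `timestamp-ms` keys used in the `snapshots` list of an Iceberg table metadata JSON file.

```python
def check_expiry(before, after, cutoff_ms):
    """Compare snapshot lists taken before and after an expire_snapshots run."""
    before_ids = {s["snapshot-id"] for s in before}
    after_ids = {s["snapshot-id"] for s in after}
    # Snapshots older than the cutoff that survived. Some can be legitimate,
    # e.g. the current snapshot or ones kept by retain_last.
    survivors = sorted(
        s["snapshot-id"] for s in after if s["timestamp-ms"] < cutoff_ms
    )
    return {
        "count_decreased": len(after) < len(before),
        "removed": sorted(before_ids - after_ids),
        "older_than_cutoff_remaining": survivors,
    }


before = [
    {"snapshot-id": 1, "timestamp-ms": 1_000},
    {"snapshot-id": 2, "timestamp-ms": 2_000},
    {"snapshot-id": 3, "timestamp-ms": 3_000},
]
after = [
    {"snapshot-id": 2, "timestamp-ms": 2_000},
    {"snapshot-id": 3, "timestamp-ms": 3_000},
]
report = check_expiry(before, after, cutoff_ms=2_500)
print(report)
```

Here snapshot 1 was removed, while snapshot 2 is older than the cutoff but still present, which is worth investigating rather than treating as an automatic failure.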
How reCost monitors snapshot expiry health
- Tracks snapshot count per table over time, not just current count
- Alerts when snapshot count exceeds a threshold (default: 1,000 per table)
- Shows last expire_snapshots run timestamp inferred from metadata writes
- Flags tables where expire_snapshots has never run or has not run in more than 7 days
Connect reCost to your S3 environment in 5 minutes
No agents, no code changes. Just your S3 access logs and a complete picture of your data lake health.