Apache Iceberg Table Observability

Monitor table health, snapshot growth, compaction schedules, and schema evolution across all your Iceberg tables. Never miss a compaction again.

Why Iceberg tables degrade silently

Every write to an Iceberg table creates a new snapshot. Without regular compaction, small files multiply and snapshot metadata grows, which can slow query planning by 2-5x over a period of months. Native AWS tooling doesn't surface this until queries are already failing.

Snapshot Lifecycle Tracking

See every snapshot, when it was created, how many files it references, and how much metadata it adds to table planning time.
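As a rough illustration of the raw data behind this view, the sketch below reads the `snapshots` array of an Iceberg table metadata JSON document and lists each snapshot with its creation time and data file count. The field names (`snapshot-id`, `timestamp-ms`, `summary`, `total-data-files`) come from the Iceberg table spec; the function name and the sample document are hypothetical, and this is not the product's implementation.

```python
import json
from datetime import datetime, timezone


def summarize_snapshots(metadata_json: str) -> list[dict]:
    """Return one row per snapshot, oldest first, from a table metadata document."""
    metadata = json.loads(metadata_json)
    rows = []
    for snap in metadata.get("snapshots", []):
        summary = snap.get("summary", {})
        # Summary values are strings, and total-data-files is optional.
        files = summary.get("total-data-files")
        rows.append({
            "snapshot_id": snap["snapshot-id"],
            "created_at": datetime.fromtimestamp(
                snap["timestamp-ms"] / 1000, tz=timezone.utc
            ),
            "data_files": int(files) if files is not None else None,
        })
    return sorted(rows, key=lambda r: r["created_at"])


# Hypothetical two-snapshot metadata document, for illustration only.
example = json.dumps({"snapshots": [
    {"snapshot-id": 2, "timestamp-ms": 1700000600000,
     "summary": {"operation": "append", "total-data-files": "12"}},
    {"snapshot-id": 1, "timestamp-ms": 1700000000000,
     "summary": {"operation": "append", "total-data-files": "5"}},
]})
rows = summarize_snapshots(example)
```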

Compaction Health Alerts

Get alerted when snapshot count exceeds safe thresholds or when file count per partition indicates compaction is overdue.
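A minimal sketch of this kind of threshold check, assuming the snapshot count and per-partition file counts have already been collected. The `Thresholds` class, the function name, and the numeric limits are all hypothetical placeholders, not product defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    max_snapshots: int = 500             # assumed limit, tune per workload
    max_files_per_partition: int = 100   # assumed limit, tune per workload


def compaction_alerts(snapshot_count: int,
                      files_per_partition: dict[str, int],
                      thresholds: Optional[Thresholds] = None) -> list[str]:
    """Return a human-readable alert for each breached threshold."""
    t = thresholds or Thresholds()
    alerts = []
    if snapshot_count > t.max_snapshots:
        alerts.append(f"snapshot count {snapshot_count} exceeds {t.max_snapshots}")
    for partition, n in sorted(files_per_partition.items()):
        if n > t.max_files_per_partition:
            alerts.append(f"partition {partition} has {n} files; compaction may be overdue")
    return alerts


alerts = compaction_alerts(600, {"dt=2024-01-01": 150, "dt=2024-01-02": 20})
```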

Schema Drift Detection

Track schema evolution over time. Get notified when columns are added, dropped, or renamed, before the change breaks downstream consumers.
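Because Iceberg identifies columns by field ID rather than by name, a rename can be told apart from a drop plus an add. The sketch below compares two schema versions represented as simple `{field_id: name}` mappings; that representation and the function name are assumptions made for illustration.

```python
def diff_schemas(old: dict[int, str], new: dict[int, str]) -> dict[str, list]:
    """Classify changes between two schema versions keyed by Iceberg field ID."""
    added = [new[i] for i in sorted(new.keys() - old.keys())]
    dropped = [old[i] for i in sorted(old.keys() - new.keys())]
    # Same field ID with a different name means the column was renamed.
    renamed = [(old[i], new[i])
               for i in sorted(old.keys() & new.keys())
               if old[i] != new[i]]
    return {"added": added, "dropped": dropped, "renamed": renamed}


# Hypothetical schema versions, for illustration only.
changes = diff_schemas(
    {1: "id", 2: "ts", 3: "amt"},
    {1: "id", 2: "event_ts", 4: "region"},
)
```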

Partition Statistics

Monitor row counts, file sizes, and data distribution per partition. Identify hot partitions and skew that degrades query performance.
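One simple way to flag a hot partition is to compare each partition's row count with the median across partitions. The sketch below does that; the function name and the factor of 3 are arbitrary assumptions, and a production check would likely also consider file sizes.

```python
from statistics import median


def skewed_partitions(row_counts: dict[str, int], factor: float = 3.0) -> list[str]:
    """Return partitions whose row count exceeds `factor` times the median."""
    if not row_counts:
        return []
    typical = median(row_counts.values())
    if typical <= 0:
        return []
    return sorted(p for p, n in row_counts.items() if n > factor * typical)


# Hypothetical per-partition row counts, for illustration only.
hot = skewed_partitions({"a": 100, "b": 110, "c": 90, "d": 1000})
```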

Manifest File Analysis

Track manifest list growth and average files per manifest. Surface tables approaching the point where planning time dominates query time.

S3 Inventory Integration

Reads table metadata directly from S3 object prefixes and S3 Inventory snapshots. No Glue Data Catalog or Lake Formation access required.

Safe maintenance order

Run Iceberg maintenance procedures in this order to avoid corrupting time-travel history:

  1. expire_snapshots: Remove old snapshot references. Always run first. Subsequent steps depend on a clean snapshot list.
  2. remove_orphan_files: Delete files not referenced by any live snapshot. Safe only after expire_snapshots completes.
  3. rewrite_manifests: Compact manifest files for faster planning. Run last: operates on the current, cleaned snapshot state.

Reversing the order (orphan files before snapshots) can delete data files still referenced by live snapshots.
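The order above can be sketched as a small helper that builds the three Iceberg Spark procedure calls in sequence. The procedure names and the `table` and `older_than` arguments are part of Iceberg's Spark procedures; the helper name, catalog name, table name, and timestamp are hypothetical.

```python
def maintenance_statements(catalog: str, table: str, older_than: str) -> list[str]:
    """Build the maintenance CALL statements in the recommended order."""
    return [
        # 1. Expire old snapshots first.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{older_than}')",
        # 2. Then remove files no longer referenced by table metadata.
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
        # 3. Finally compact manifests against the cleaned snapshot state.
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
    ]


# Hypothetical catalog and table names, for illustration only.
statements = maintenance_statements("my_catalog", "db.events", "2024-01-01 00:00:00")
```

In a Spark session configured with the Iceberg SQL extensions, each string could be passed to `spark.sql(...)` one at a time, waiting for each call to finish before starting the next.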

How to connect Iceberg

1. Deploy read-only IAM role

CloudFormation creates scoped read access to your S3 table metadata prefixes and S3 access log buckets.

2. Register your tables

Provide your table database and namespace in the reCost Data Flow console. We auto-discover tables within the registered namespace.

3. Set compaction thresholds

Configure snapshot count and file size thresholds that match your workload's SLAs. Defaults are based on each table's write frequency.

4. Monitor and alert

Table health dashboards populate within 24 hours. Alerts fire when thresholds are breached.

Required IAM permissions

s3:GetObject (S3 access log buckets)
s3:ListBucket (S3 access log buckets)
s3:GetInventoryConfiguration
s3:GetObject (table metadata prefixes, read-only)
s3:ListBucket (table metadata prefixes, read-only)

No Glue, Lake Formation, or catalog access. S3 metadata prefixes and access logs only.
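For orientation, the permissions listed above could be expressed as an IAM policy document along the following lines. This is an illustrative sketch, not the template the CloudFormation stack deploys; the function name, bucket names, and prefix are hypothetical.

```python
import json


def read_only_policy(metadata_bucket: str, metadata_prefix: str, log_bucket: str) -> dict:
    """Build a read-only S3 policy covering metadata prefixes and access logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Object reads are scoped to object ARNs.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [
                    f"arn:aws:s3:::{metadata_bucket}/{metadata_prefix}/*",
                    f"arn:aws:s3:::{log_bucket}/*",
                ],
            },
            {   # Listing is granted on the bucket ARN itself.
                "Sid": "ListBuckets",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": [
                    f"arn:aws:s3:::{metadata_bucket}",
                    f"arn:aws:s3:::{log_bucket}",
                ],
            },
            {   # Inventory configuration is a bucket-level read.
                "Sid": "ReadInventoryConfig",
                "Effect": "Allow",
                "Action": "s3:GetInventoryConfiguration",
                "Resource": [f"arn:aws:s3:::{metadata_bucket}"],
            },
        ],
    }


# Hypothetical bucket names and prefix, for illustration only.
policy = read_only_policy("example-lake-bucket", "warehouse/db/events/metadata", "example-log-bucket")
policy_json = json.dumps(policy, indent=2)
```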

See exactly what's happening in your S3 data layer

Works with your existing AWS setup. Read-only access. No agents. No data exposure.