Question 1

Does reCost need agents on my Spark or Glue clusters?

Accepted Answer

No. reCost reads S3 access logs and S3 Inventory directly. No agents, no code changes, and no catalog access are required. Setup takes under 5 minutes with a read-only IAM role.
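
As a rough illustration of what a read-only IAM role for this kind of setup might allow, the sketch below builds a policy document granting only list and read access to two buckets. The bucket names and the helper `read_only_policy` are hypothetical, and the exact permissions reCost asks for are not stated here, so treat this as an assumption rather than the product's actual policy.

```python
import json

# Hypothetical bucket names: substitute the buckets that hold your
# S3 access logs and S3 Inventory reports.
ACCESS_LOG_BUCKET = "example-s3-access-logs"
INVENTORY_BUCKET = "example-s3-inventory"


def read_only_policy(buckets):
    """Build an IAM policy document that allows listing and reading only."""
    resources = []
    for bucket in buckets:
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket ARN, used by s3:ListBucket
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object ARNs, used by s3:GetObject
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": resources,
            }
        ],
    }


policy = read_only_policy([ACCESS_LOG_BUCKET, INVENTORY_BUCKET])
print(json.dumps(policy, indent=2))
```

The policy contains no write or delete actions, which is what makes the role read-only.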

Question 2

Which table formats does reCost support?

Accepted Answer

reCost supports three table formats: Apache Iceberg, Delta Lake, and Apache Hudi. It also works with query engines including Athena, Trino, Glue, Spark, EMR, and Databricks.

Question 3

How does reCost detect silent pipeline failures?

Accepted Answer

reCost tracks the last-write timestamp for every Iceberg, Delta, and Hudi table by writer identity (Glue, Airflow, Firehose, Kinesis, Spark Streaming, Flink, dbt). When a writer misses its SLO, reCost alerts via Slack, email, or webhook before downstream consumers notice.
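
The idea of a last-write check against an SLO can be sketched in a few lines. This is a minimal illustration, not reCost's implementation: the `TableWrite` record, the `stale_writers` helper, and the sample tables and SLO values are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class TableWrite(NamedTuple):
    table: str            # table name (hypothetical examples below)
    writer: str           # writer identity, e.g. "Glue" or "Firehose"
    last_write: datetime  # timestamp of the most recent write
    slo: timedelta        # maximum allowed gap between writes


def stale_writers(writes, now):
    """Return (table, writer, lateness) for every writer past its SLO."""
    late = []
    for w in writes:
        age = now - w.last_write
        if age > w.slo:
            late.append((w.table, w.writer, age - w.slo))
    return late


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
writes = [
    TableWrite("sales.orders", "Glue", now - timedelta(minutes=30), timedelta(hours=1)),
    TableWrite("events.clicks", "Firehose", now - timedelta(hours=3), timedelta(minutes=15)),
]

for table, writer, lateness in stale_writers(writes, now):
    # A real system would send this to Slack, email, or a webhook.
    print(f"ALERT {table}: {writer} is {lateness} past its SLO")
```

Here only the Firehose writer is flagged, because its last write is older than its 15-minute SLO while the Glue writer is still within its one-hour window.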

Question 4

Can reCost attribute Athena query costs to specific tables?

Accepted Answer

Yes. reCost joins query-engine logs with S3 access logs to show scan size, GET requests, and bytes read per query, mapped to the specific table, writer, and team that caused the cost.
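
One way to picture this join is shown below. It is a sketch under stated assumptions, not reCost's actual method: it assumes an access-log record belongs to a query when the requester matches the query's role and the request time falls inside the query's run window, and it maps object keys to tables by prefix. The prefixes, field names, and the `attribute` helper are hypothetical; `REST.GET.OBJECT` is the operation name S3 server access logs use for object reads.

```python
from collections import defaultdict

# Hypothetical mapping from S3 key prefix to table name.
TABLE_PREFIXES = {
    "warehouse/sales/orders/": "sales.orders",
    "warehouse/events/clicks/": "events.clicks",
}


def table_for_key(key):
    """Return the table whose prefix matches the object key, or None."""
    for prefix, table in TABLE_PREFIXES.items():
        if key.startswith(prefix):
            return table
    return None


def attribute(queries, access_records):
    """Sum GET requests and bytes read per (query_id, table)."""
    usage = defaultdict(lambda: {"get_requests": 0, "bytes_read": 0})
    for rec in access_records:
        if rec["operation"] != "REST.GET.OBJECT":
            continue  # only object reads count toward scan cost
        table = table_for_key(rec["key"])
        if table is None:
            continue  # key is outside every known table location
        for q in queries:
            # Assumed join rule: same requester, request time inside the query window.
            if q["role"] == rec["requester"] and q["start"] <= rec["time"] <= q["end"]:
                entry = usage[(q["query_id"], table)]
                entry["get_requests"] += 1
                entry["bytes_read"] += rec["bytes_sent"]
                break
    return dict(usage)


ROLE = "arn:aws:iam::111122223333:role/example-analytics"  # hypothetical role
queries = [{"query_id": "q-1", "role": ROLE, "start": 100, "end": 200}]
access_records = [
    {"requester": ROLE, "time": 120, "key": "warehouse/sales/orders/data/a.parquet",
     "bytes_sent": 1000, "operation": "REST.GET.OBJECT"},
    {"requester": ROLE, "time": 150, "key": "warehouse/sales/orders/data/b.parquet",
     "bytes_sent": 500, "operation": "REST.GET.OBJECT"},
    {"requester": ROLE, "time": 160, "key": "warehouse/events/clicks/data/c.parquet",
     "bytes_sent": 200, "operation": "REST.HEAD.OBJECT"},
    {"requester": ROLE, "time": 300, "key": "warehouse/sales/orders/data/a.parquet",
     "bytes_sent": 999, "operation": "REST.GET.OBJECT"},
]

print(attribute(queries, access_records))
```

A time-window join like this is ambiguous when one role runs overlapping queries, so a production system would need a stronger correlation key.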

Question 5

What is the safe order for running Iceberg maintenance procedures?

Accepted Answer

The safe order is: (1) expire_snapshots, (2) remove_orphan_files, (3) rewrite_manifests. Reversing the order risks corrupting time-travel history: removing orphan files before expiring snapshots can delete data files that live snapshots still reference.
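
The order above can be sketched as a list of Iceberg Spark procedure calls. The helper `maintenance_statements`, the catalog and table names, and the 7-day retention are hypothetical choices for illustration; the procedures and their `table` and `older_than` arguments are the ones Iceberg exposes through Spark SQL.

```python
from datetime import datetime, timedelta, timezone


def maintenance_statements(catalog, table, now, retention=timedelta(days=7)):
    """Return the three maintenance CALL statements in the order given above."""
    cutoff = (now - retention).strftime("%Y-%m-%d %H:%M:%S")
    return [
        # 1. Drop snapshots older than the cutoff.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        # 2. Delete files no longer referenced by table metadata.
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        # 3. Rewrite manifests once the file set has settled.
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
    ]


now = datetime(2024, 1, 8, 0, 0, tzinfo=timezone.utc)
for stmt in maintenance_statements("my_catalog", "db.events", now):
    print(stmt)
```

With a SparkSession configured for Iceberg, each statement could be passed to `spark.sql(stmt)` in sequence. Keep the `older_than` cutoff for orphan removal longer than the longest expected write, since files from in-progress writes are not yet referenced by table metadata.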

Catch broken pipelines and bad tables before queries break

Your queries are getting slower and you don't know why

Three lenses. One S3 data source.

Iceberg, Delta, and Hudi health from metadata

Last-write tracking and silent-writer alerts

Query observability across engines

15.6 TB of Orphaned Files in S3, Invisible for 8 Months

Stop flying blind on your data lake