Athena scan costs scale directly with how much data each query touches. Cold partitions, stale query results, and unbounded table scans all inflate your bill, but they're invisible in Athena's own metrics. S3 access logs tell a different story.
The problem with Athena's built-in monitoring
Athena provides per-query scan metrics and query history, but both are scoped to individual query execution. They don't tell you which partitions are being repeatedly scanned, whether your results cache is actually helping, or what proportion of your total scan cost comes from cold data that rarely gets queried.
CloudWatch Athena metrics (TotalExecutionTime, ProcessedBytes) are aggregate. They're useful for budget tracking but not for understanding where the waste is concentrated.
What S3 access logs reveal about Athena behavior
Every Athena query generates S3 GET requests against the data it scans. These requests are logged in your S3 server access logs with the Athena service principal as the requester. From these logs you can reconstruct:
- Which prefixes and partitions are being scanned most frequently
- Which partitions haven't been touched in weeks or months but are still included in full scans
- Whether your Athena results location is being re-read (indicating stale result reuse) or generating fresh data reads on every query
- How scan volume per prefix has changed over time, including growing scan waste on aging partitions
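As a rough illustration of the first item, the sketch below parses S3 server access log lines and counts GET requests per Hive-style partition prefix. It rests on several assumptions: the regex covers only the leading fields of the standard access log layout, keys are treated as URL-encoded, and `looks_like_athena` is a hypothetical filter that matches a marker string in the requester or user-agent field. Check how Athena traffic actually appears in your own logs and adjust the filter before relying on it.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Leading fields of an S3 server access log line (trailing fields are ignored).
LOG_RE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+) (?P<error>\S+) '
    r'(?P<bytes_sent>\S+) (?P<object_size>\S+) (?P<total_time>\S+) '
    r'(?P<turnaround>\S+) "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def partition_prefix(key):
    """Return the key up to its last `name=value` directory, or None."""
    parts = unquote(key).split("/")
    # Ignore the final segment (the object name) when looking for partitions.
    last = max((i for i, p in enumerate(parts[:-1]) if "=" in p), default=None)
    if last is None:
        return None
    return "/".join(parts[: last + 1]) + "/"


def looks_like_athena(record, marker="athena"):
    # ASSUMPTION: Athena traffic is recognizable by a marker string in the
    # requester or user-agent field. Replace with whatever your logs show.
    return (marker in record["requester"].lower()
            or marker in record["user_agent"].lower())


def scans_per_partition(lines):
    """Count Athena-attributed GET requests per partition prefix."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None or record["operation"] != "REST.GET.OBJECT":
            continue
        if not looks_like_athena(record):
            continue
        prefix = partition_prefix(record["key"])
        if prefix is not None:
            counts[prefix] += 1
    return counts
```

Calling `scans_per_partition(open("access.log"))` on a downloaded log file (the file name is illustrative) returns a `Counter` whose `most_common()` output lists the most frequently scanned partitions first.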
Cold partitions: the biggest source of Athena waste
Cold partitions are partitions that your queries technically include in their scope but whose data hasn't changed or been accessed recently. If your partition pruning isn't working correctly, or if your queries use date ranges broader than necessary, these partitions add to every scan even though they contribute no useful results.
In one case reCost surfaced, 214 cold partitions were being included in regular table scans, adding a 4.2× scan overhead to 28,400 Athena queries in a single month. None of this was visible in Athena query history; it showed up only in the S3 access pattern for those prefixes.
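To make the idea of scan overhead concrete, here is a minimal sketch that flags cold partitions and computes an overhead ratio from per-partition statistics already extracted from the logs. The `PartitionStats` type, the 30-day and 10-scan thresholds, and the ratio definition (total bytes scanned divided by bytes scanned from non-cold partitions) are all assumptions made for illustration; this is not necessarily how reCost derives its figures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class PartitionStats:
    """Hypothetical per-partition summary built from S3 access logs."""
    prefix: str
    scan_bytes: int                 # bytes sent for GETs in the window
    scan_count: int                 # number of GETs in the window
    last_write: Optional[datetime]  # latest PUT seen, or None if none logged


def find_cold_partitions(stats: List[PartitionStats], now: datetime,
                         max_age: timedelta = timedelta(days=30),
                         min_scans: int = 10) -> List[PartitionStats]:
    """Partitions scanned often but not modified within `max_age`."""
    cold = []
    for s in stats:
        stale = s.last_write is None or now - s.last_write > max_age
        if stale and s.scan_count >= min_scans:
            cold.append(s)
    return cold


def scan_overhead_ratio(stats: List[PartitionStats],
                        cold: List[PartitionStats]) -> Optional[float]:
    """Total scanned bytes divided by bytes from non-cold partitions."""
    cold_prefixes = {s.prefix for s in cold}
    total = sum(s.scan_bytes for s in stats)
    useful = sum(s.scan_bytes for s in stats if s.prefix not in cold_prefixes)
    if useful == 0:
        return None  # ratio is undefined when nothing useful was scanned
    return total / useful
```

Under this definition, a ratio of 1.0 means no bytes came from cold partitions, and larger values mean a larger share of the scan volume is overhead.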
Stale results and query result reuse
Athena has a query result reuse feature that can serve cached results for identical queries. But whether it's working, and for which queries, isn't surfaced in Athena metrics. S3 access patterns tell you: if your Athena results location is generating new writes for queries that should be hitting cache, result reuse isn't working for those queries.
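One way to turn this into a number is to compare result-file writes against the number of queries run. The sketch below assumes each freshly computed result shows up as a `REST.PUT.OBJECT` for a `.csv` object under the results prefix, and that per-day query counts come from a separate source such as Athena query history. Both the record shape and the function names are hypothetical, and large results written through multipart uploads would need additional operations included.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple

# Each record is (day, operation, key), already parsed from the access logs.
Record = Tuple[date, str, str]


def result_writes_per_day(records: Iterable[Record],
                          results_prefix: str) -> Counter:
    """Count result objects written under the Athena results prefix."""
    writes = Counter()
    for day, operation, key in records:
        # ASSUMPTION: one `.csv` PUT per freshly computed query result;
        # `.csv.metadata` companions are skipped by the suffix check.
        if (operation == "REST.PUT.OBJECT"
                and key.startswith(results_prefix)
                and key.endswith(".csv")):
            writes[day] += 1
    return writes


def miss_rate_proxy(writes: Counter,
                    queries_per_day: Dict[date, int]) -> Dict[date, float]:
    """Fraction of queries per day that produced a fresh result write."""
    return {
        day: min(1.0, writes.get(day, 0) / n)
        for day, n in queries_per_day.items()
        if n > 0
    }
```

A value near 1.0 for a workload of repeated identical queries would suggest result reuse is not taking effect; a value near 0 suggests most queries are being served from earlier results.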
Practical Athena monitoring from S3 logs
- Map GET requests from the Athena service principal to your table partition structure
- Flag partitions with high scan frequency but no recent data modification
- Track scan volume per prefix over time to identify growing overhead
- Monitor results bucket write frequency as a proxy for cache miss rate
- Alert on scan overhead ratios exceeding 2-3× baseline for high-frequency query patterns
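The last two items can be sketched as a simple baseline comparison. The example below assumes you already have a daily series of bytes scanned per prefix (oldest first), uses the median of the preceding days as the baseline, and flags prefixes whose latest day exceeds a threshold multiple of it. The 7-day window, the 2.0 threshold, and the function name are illustrative choices, not recommendations from the source.

```python
from statistics import median
from typing import Dict, List


def overhead_alerts(daily_bytes: Dict[str, List[int]],
                    baseline_days: int = 7,
                    threshold: float = 2.0) -> Dict[str, float]:
    """Return prefixes whose latest daily scan volume exceeds
    `threshold` times the median of the preceding `baseline_days`."""
    alerts = {}
    for prefix, series in daily_bytes.items():
        if len(series) <= baseline_days:
            continue  # not enough history to form a baseline
        baseline = median(series[-baseline_days - 1:-1])
        latest = series[-1]
        if baseline > 0 and latest / baseline > threshold:
            alerts[prefix] = latest / baseline
    return alerts
```

Using the median rather than the mean keeps a single earlier spike from inflating the baseline and hiding a later one.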
What reCost surfaces automatically
reCost maps Athena access patterns from your S3 logs and surfaces cold partition lists, stale result accumulation, and scan waste estimates per table, per day. You see which tables are costing the most in unnecessary scans and what partition pruning or cache configuration changes would have the biggest impact.
Connect reCost to your S3 environment in 5 minutes
No agents, no code changes. Just your S3 access logs and a complete picture of your data lake health.