PRODUCT

Everything reCost Sees

One platform. Object-level insight across your entire AWS data layer, from Iceberg tables to media pipelines to EOL SDK detection.

TABLE FORMAT MONITORING

Know the health of every table, without touching your query engine

reCost reads your S3 inventory and access logs to reconstruct the full health picture of every Iceberg, Delta, and Hudi table (snapshot count, orphaned files, manifest bloat, small file accumulation, cold partitions) without needing access to your catalog or compute.

Iceberg snapshot count per table: detect tables where snapshot expiry was never configured
Orphaned file detection and GB quantification per table
Manifest bloat: how many old manifests are read on every query
Delta Lake transaction log depth: tables with thousands of JSON commits and no checkpoints
Delta VACUUM gaps: unreferenced data files that VACUUM has not removed, still accumulating storage cost
Hudi compaction lag: commit health and compaction frequency
Small file explosion: per-table accumulation of tiny files from streaming micro-commits
Average file size per table: OPTIMIZE candidates
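As a rough illustration of the inventory-only approach, the Python sketch below derives a few of these metrics from plain `(key, size)` rows such as an S3 Inventory report provides. It assumes default table layouts (Iceberg manifest lists named `snap-*.avro` under `<table>/metadata/`, Delta commits as `_delta_log/*.json` with single-part `*.checkpoint.parquet` checkpoints, data stored as Parquet); `TableStats` and `summarize` are hypothetical names, not reCost's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TableStats:
    iceberg_snapshots: int = 0   # manifest-list files (metadata/snap-*.avro)
    delta_commits: int = 0       # _delta_log/*.json commit files
    delta_checkpoints: int = 0   # _delta_log/*.checkpoint.parquet (single-part only)
    data_files: int = 0          # .parquet files outside metadata/ and _delta_log/
    data_bytes: int = 0

    @property
    def avg_file_size(self) -> float:
        return self.data_bytes / self.data_files if self.data_files else 0.0


def summarize(inventory: list[tuple[str, int]]) -> dict[str, TableStats]:
    """Summarize (key, size_bytes) rows, e.g. from an S3 Inventory report."""
    # Pass 1: assume a table root is whatever precedes a metadata directory.
    roots = set()
    for key, _ in inventory:
        for marker in ("/metadata/", "/_delta_log/"):
            if marker in key:
                roots.add(key.split(marker, 1)[0])
    stats = {root: TableStats() for root in roots}

    # Pass 2: attribute every object to the longest root that contains it.
    for key, size in inventory:
        root = max((r for r in roots if key.startswith(r + "/")), key=len, default=None)
        if root is None:
            continue
        s = stats[root]
        rel = key[len(root) + 1:]
        name = rel.rsplit("/", 1)[-1]
        if rel.startswith("metadata/"):
            if name.startswith("snap-") and name.endswith(".avro"):
                s.iceberg_snapshots += 1
        elif rel.startswith("_delta_log/"):
            if name.endswith(".checkpoint.parquet"):
                s.delta_checkpoints += 1
            elif name.endswith(".json"):
                s.delta_commits += 1
        elif name.endswith(".parquet"):
            s.data_files += 1
            s.data_bytes += size
    return stats
```

Counting manifest-list files only approximates the snapshot count, since manifest lists orphaned by failed or partial cleanup are counted too. A table with a high `iceberg_snapshots`, or many `delta_commits` and no `delta_checkpoints`, matches the patterns listed above.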

REAL FINDING

"We found 15.6 TB of orphaned files on a single Iceberg table. The table had 42,015 snapshots with no expiry policy."

PIPELINE HEALTH

Detect broken pipelines before your users do

Pipelines leave footprints in S3. When a Spark job stops writing, when a Glue job goes silent, when a streaming checkpoint stops advancing, reCost detects it from access log patterns, without any integration into your orchestration layer.

Dead pipeline detection: prefixes with no writes past their expected cadence
Cadence-aware monitoring: distinguishes hourly, daily, and weekly jobs
Spark streaming checkpoint health: commit offset lag detection
Glue job silence: jobs with no PUT activity for N days
Delta Lake streaming job monitoring: checkpoint write gaps
New writer alert: an unexpected IAM role starts writing to a table
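Cadence-aware silence detection can be sketched from nothing but the PUT timestamps seen for one prefix in the access logs. This sketch assumes the median gap between writes approximates the job's schedule and treats a prefix as dead once it has been silent for more than twice that gap; `infer_cadence`, `classify`, `is_dead`, and the 2x tolerance are illustrative choices, not reCost's actual rules.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from statistics import median

# Schedules this sketch can recognise; extend as needed.
CADENCES = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def infer_cadence(write_times: list[datetime]) -> timedelta | None:
    """Median gap between consecutive PUTs to a prefix; None if fewer than 2."""
    ts = sorted(write_times)
    if len(ts) < 2:
        return None
    return median([b - a for a, b in zip(ts, ts[1:])])


def classify(cadence: timedelta) -> str:
    """Snap an inferred cadence to the nearest known schedule."""
    return min(CADENCES, key=lambda name: abs(CADENCES[name] - cadence))


def is_dead(write_times: list[datetime], now: datetime, tolerance: float = 2.0) -> bool:
    """True if the prefix has been silent for more than `tolerance` cadences."""
    cadence = infer_cadence(write_times)
    if cadence is None:
        return False  # not enough history to infer a schedule
    return now - max(write_times) > cadence * tolerance
```

Using the median rather than the mean keeps one long outage or a burst of backfill writes from skewing the inferred schedule.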

REAL FINDING

"A Go service was making 16.6M failed S3 requests per month , 96% error rate. Nobody had noticed. We found it from user agent analysis."

QUERY INTELLIGENCE

Understand what your query engines are actually doing to your storage

Every Athena query, every Trino scan, every Databricks read leaves a trace in your S3 access logs. reCost reconstructs query behavior (which tables are hot, which partitions are cold, how much stale metadata is read during every query plan) without any query engine integration.

Athena query count estimation from result file patterns
Cold partition detection: data files Athena has read in the past but not in the last 90 days
Stale result accumulation: GB under the Athena Unsaved/ prefix growing unnoticed
Snapshot/manifest GET overhead per table: old metadata still read on every query
Trino metadata overhead: manifest list GET patterns
Databricks Unity Catalog anomalies
Estimated query count per table derived from access log GET patterns
Per-table GET/PUT ratio: read-heavy vs write-heavy classification
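Everything in this section starts from parsing S3 server access log records. The sketch below tokenizes one record (bracketed timestamp, quoted strings, bare tokens) and reads fields by position, then labels tables read-heavy or write-heavy from `REST.GET.OBJECT` and `REST.PUT.OBJECT` counts. It assumes object keys follow a hypothetical `<database>/<table>/...` layout and ignores multipart uploads and URL-encoding; `parse_line`, `table_of`, and `classify_tables` are illustrative names.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict
from datetime import datetime

# A field is a [bracketed timestamp], a "quoted string", or a bare token.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def parse_line(line: str) -> dict:
    """Pick a few fields out of one S3 server access log record."""
    f = TOKEN.findall(line)
    return {
        "bucket": f[1],
        "time": datetime.strptime(f[2][1:-1], "%d/%b/%Y:%H:%M:%S %z"),
        "requester": f[4],
        "operation": f[6],   # e.g. REST.GET.OBJECT, REST.PUT.OBJECT
        "key": f[7],         # URL-encoded in the raw log; not decoded here
        "status": int(f[9]) if f[9].isdigit() else None,
        "user_agent": f[16].strip('"'),
    }


def table_of(key: str) -> str:
    # Hypothetical layout: object keys start with <database>/<table>/.
    return "/".join(key.split("/")[:2])


def classify_tables(lines: list[str]) -> dict[str, str]:
    """Label each table read-heavy or write-heavy from object GET/PUT counts."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        rec = parse_line(line)
        if rec["operation"] == "REST.GET.OBJECT":
            counts[table_of(rec["key"])]["GET"] += 1
        elif rec["operation"] == "REST.PUT.OBJECT":
            counts[table_of(rec["key"])]["PUT"] += 1
    return {
        table: "read-heavy" if c["GET"] > c["PUT"] else "write-heavy"
        for table, c in counts.items()
    }
```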

REAL FINDING

"64% of all snapshot GETs were on files older than 90 days. Every query was loading ancient snapshot chains just to find current data."

SECURITY & COMPLIANCE

See who is touching what, and what SDK they're using

S3 access logs record the identity behind every request: the requester (including the IAM role), the user agent (which carries the SDK and its version), and the source IP. reCost cross-references this against known EOL runtimes and CVEs to surface production security risks that build-time scanners like Snyk will never see.

EOL SDK detection in production: aws-sdk-java/1.x, aws-sdk-go/1.x, Hadoop 3.3.x, nodejs10/12/14
CVE matching against detected runtime versions
IAM boundary violations: roles accessing out-of-scope buckets
First-time role access alerts: a new identity touching a bucket
Browser / human access to production buckets
Unidentified services: null user agents or raw HTTP clients in production
Detection of external tools hitting sensitive prefixes
PII path detection: email and name patterns in object keys
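EOL detection from user agents reduces to pattern matching on the user agent field of each access log record. The rule table below is a hypothetical example built from the runtimes named above, not an authoritative EOL or CVE list; the Lambda rule assumes the runtime is visible through an `exec-env/AWS_Lambda_nodejs14.x`-style token, which some AWS SDKs append when running in Lambda. `EOL_RULES` and `classify_user_agent` are illustrative names.

```python
from __future__ import annotations

import re

# Illustrative rules only: patterns and labels are assumptions, not an
# authoritative EOL or CVE database.
EOL_RULES = [
    (re.compile(r"aws-sdk-java/1\."), "AWS SDK for Java 1.x"),
    (re.compile(r"aws-sdk-go/1\."), "AWS SDK for Go v1"),
    (re.compile(r"Hadoop 3\.3\."), "Hadoop 3.3.x"),
    (re.compile(r"AWS_Lambda_nodejs(?:10|12|14)\.x"), "Lambda Node.js 10/12/14 runtime"),
]


def classify_user_agent(user_agent: str) -> list[str]:
    """Return every finding for one user agent string from an access log."""
    if user_agent in ("", "-"):
        return ["unidentified: no user agent"]
    return [label for pattern, label in EOL_RULES if pattern.search(user_agent)]
```

Anchoring the Go rule on `aws-sdk-go/1.` keeps it from matching the v2 SDK, whose user agent begins with `aws-sdk-go-v2/`.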

REAL FINDING

"nodejs10.x Lambda , EOL since 2021 , still making requests to production S3. Three years past end of support, actively hitting a sensitive bucket."

IAM DATA FLOW

See exactly who is moving data where

reCost builds a complete map of data movement in your AWS environment: which IAM roles access which buckets and prefixes, with which operations, and how much data. The Sankey diagram makes otherwise invisible data flows visible at a glance.

IAM Role → Bucket → Prefix → Operation Sankey diagram
Top roles by GET volume and PUT volume
Cross-bucket data movement visibility
Role access frequency and byte volume per bucket
Hover to highlight all connections for a specific role
Detect roles with unexpectedly broad bucket access
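A Sankey view like this is essentially a weighted edge list. Assuming access log records have already been reduced to `(requester, bucket, key, operation, nbytes)` tuples, the sketch below sums bytes on each role → bucket → prefix → operation link and counts distinct buckets per role. `role_name`, `sankey_edges`, and `buckets_per_role` are hypothetical helpers, and choosing `nbytes` (for example, bytes sent for GETs and object size for PUTs) is left to the caller.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def role_name(requester: str) -> str:
    """Reduce an assumed-role ARN to its role name; pass anything else through."""
    # arn:aws:sts::<account>:assumed-role/<role>/<session>
    if ":assumed-role/" in requester:
        return requester.split(":assumed-role/", 1)[1].split("/", 1)[0]
    return requester


def sankey_edges(records, prefix_depth: int = 1) -> Counter:
    """Sum bytes on role -> bucket -> prefix -> operation links.

    records: iterable of (requester, bucket, key, operation, nbytes).
    """
    edges: Counter = Counter()
    for requester, bucket, key, operation, nbytes in records:
        role = f"role:{role_name(requester)}"
        bkt = f"bucket:{bucket}"
        prefix = f"prefix:{bucket}/" + "/".join(key.split("/")[:prefix_depth])
        op = f"op:{operation}"
        edges[(role, bkt)] += nbytes
        edges[(bkt, prefix)] += nbytes
        edges[(prefix, op)] += nbytes
    return edges


def buckets_per_role(records) -> dict[str, int]:
    """Distinct buckets each role touched; unusually high counts suggest broad access."""
    seen: dict[str, set] = defaultdict(set)
    for requester, bucket, *_ in records:
        seen[role_name(requester)].add(bucket)
    return {role: len(buckets) for role, buckets in seen.items()}
```

Collapsing session names into the role keeps short-lived STS sessions from appearing as separate identities in the diagram.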

REAL FINDING

"Three IAM roles were crossing production bucket boundaries. None of them were supposed to have access. Found within the first scan."

STORAGE MONITORING

Object-level storage visibility, not just bucket totals

CloudWatch shows you bucket size. reCost shows you which tables are growing, which prefixes are cold, which storage classes your data is actually in, and what's costing you money at the object level.

Largest tables by size and object count
Data temperature per prefix: hot, warm, cold, frozen
Storage class distribution per bucket and per table
S3 cost breakdown by category: compute, storage, transfer, requests
Largest buckets by size and monthly request volume
CloudFront cache miss rate vs S3 origin requests (for media workloads)
Bandwidth by content type: jpg, webp, mp4, wav
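Data temperature can be sketched as a threshold on days since a prefix was last read. The 7/30/180-day cut-offs below are hypothetical, and an object with no GET in the retained logs is only provably idle if those logs cover the whole window; `temperature` and `idle_standard_bytes` are illustrative names.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical cut-offs, in days since the last observed GET.
TIERS = [(7, "hot"), (30, "warm"), (180, "cold")]


def temperature(last_get: datetime | None, now: datetime) -> str:
    """Bucket a prefix by how long ago it was last read."""
    if last_get is None:
        return "frozen"  # no GET seen in the retained logs
    age_days = (now - last_get).days
    for limit, label in TIERS:
        if age_days < limit:
            return label
    return "frozen"


def idle_standard_bytes(objects, now: datetime, min_idle_days: int = 180) -> int:
    """Bytes in STANDARD with no GET in `min_idle_days`.

    objects: iterable of (key, size_bytes, storage_class, last_get or None),
    e.g. S3 Inventory rows joined with the latest GET per key from access logs.
    """
    total = 0
    for _key, size, storage_class, last_get in objects:
        idle = last_get is None or (now - last_get).days >= min_idle_days
        if storage_class == "STANDARD" and idle:
            total += size
    return total
```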

REAL FINDING

"12 TB in Standard storage tier with zero GETs in 180 days. Moving to Glacier saved $1,400/month. Identified in under 10 minutes."

MEDIA WORKLOADS

S3 as a media server, now observable

If your S3 serves media through CloudFront, reCost gives you visibility that CloudFront logs alone can't provide: which originals are hot, which resized variants are being served, which files have never been accessed, and where your CDN is passing requests through to S3 vs serving from cache.

Resized variant popularity by dimension (728x, 336x, 1200x, etc.)
Original vs derivative file access patterns
Peak CDN→S3 traffic hours
Files in S3 Intelligent-Tiering with zero GETs
Cache miss rate: 200 (full object served from S3) vs 304 (CloudFront revalidated its cached copy)
EOL SDK detection from media pipeline user agents (FFmpeg, etc.)
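Because S3 access logs only contain requests CloudFront forwarded to the origin, a 200 there means S3 sent the full object, while a 304 means CloudFront revalidated an expired cached copy with a conditional request. The sketch below assumes origin requests have already been filtered (for example, by the `Amazon CloudFront` user agent) into `(key, status)` pairs and that resized variants live under a hypothetical `.../<width>x/` path segment; `VARIANT` and `origin_breakdown` are illustrative names.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical key layout for resized variants: .../<width>x/<name>
VARIANT = re.compile(r"/(\d+)x/")


def origin_breakdown(records) -> tuple[Counter, float]:
    """records: iterable of (key, http_status) for CloudFront-to-S3 GETs.

    Returns origin requests per variant, plus the share of 200/304 responses
    that were full transfers (200) rather than revalidations (304).
    """
    by_variant: Counter = Counter()
    status: Counter = Counter()
    for key, code in records:
        m = VARIANT.search(key)
        by_variant[f"{m.group(1)}x" if m else "original"] += 1
        status[code] += 1
    answered = status[200] + status[304]
    full_share = status[200] / answered if answered else 0.0
    return by_variant, full_share
```

A high `full_share` on variants that are rarely requested suggests short cache TTLs or cache keys that fragment the same asset.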

REAL FINDING

"68% of CloudFront origin requests were for image variants that hadn't been accessed in 90+ days. Cache warming the wrong assets."

See exactly what's happening in your S3 data layer

Works with your existing AWS setup. Read-only access. No agents. No data exposure.

Book a Demo