PRODUCT

Everything reCost Sees

One platform. Object-level insight across your entire AWS data layer, from Iceberg tables to media pipelines to EOL SDK detection.

TABLE FORMAT MONITORING

Know the health of every table, without touching your query engine

reCost reads your S3 inventory and access logs to reconstruct the full health picture of every Iceberg, Delta, and Hudi table (snapshot count, orphaned files, manifest bloat, small file accumulation, cold partitions) without needing access to your catalog or compute.

  • Iceberg snapshot count per table: detect tables with expiry never configured
  • Orphaned file detection and GB quantification per table
  • Manifest bloat: how many old manifests are being read on every query
  • Delta Lake transaction log depth: tables with thousands of JSON commits and no checkpoints
  • Delta VACUUM gaps: unvacuumed deleted records accumulating storage cost
  • Hudi compaction lag: commit health and compaction frequency
  • Small file explosion: streaming micro-commit accumulation per table
  • Avg file size per table: OPTIMIZE candidates

REAL FINDING

"We found 15.6 TB of orphaned files on a single Iceberg table. The table had 42,015 snapshots with no expiry policy."
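
To make the idea concrete, here is a minimal sketch of how snapshot counts and average file size could be derived from inventory keys alone. It is an illustration, not reCost's implementation. It assumes the common Iceberg layout in which manifest lists are written as `<table>/metadata/snap-*.avro` and data files live under `<table>/data/`; the function and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TableStats:
    snapshots: int = 0
    data_files: int = 0
    data_bytes: int = 0

    @property
    def avg_file_mb(self) -> float:
        # Average data file size in MiB; 0.0 when the table has no data files.
        if self.data_files == 0:
            return 0.0
        return self.data_bytes / self.data_files / (1024 * 1024)


def summarize_tables(inventory_rows):
    """Group (key, size_bytes) inventory rows by table root.

    Assumption: keys follow <table>/metadata/... and <table>/data/...
    """
    stats = defaultdict(TableStats)
    for key, size in inventory_rows:
        if "/metadata/" in key:
            table, _, name = key.partition("/metadata/")
            # One manifest list (snap-*.avro) is written per snapshot.
            if name.startswith("snap-") and name.endswith(".avro"):
                stats[table].snapshots += 1
        elif "/data/" in key:
            table, _, _ = key.partition("/data/")
            stats[table].data_files += 1
            stats[table].data_bytes += size
    return dict(stats)
```

A table with a very high snapshot count and a low average file size would be a candidate for snapshot expiry and compaction.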

PIPELINE HEALTH

Detect broken pipelines before your users do

Pipelines leave footprints in S3. When a Spark job stops writing, when a Glue job goes silent, when a streaming checkpoint stops advancing, reCost detects it from access log patterns, without any integration into your orchestration layer.

  • Dead pipeline detection: prefixes with no writes after expected cadence
  • Cadence-aware monitoring: distinguishes hourly, daily, and weekly jobs
  • Spark streaming checkpoint health: commit offset lag detection
  • Glue job silence: jobs with no PUT activity for N days
  • Delta Lake streaming job monitoring: checkpoint write gaps
  • New writer alert: unexpected IAM role starts writing to a table

REAL FINDING

"A Go service was making 16.6M failed S3 requests per month, a 96% error rate. Nobody had noticed. We found it from user agent analysis."
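
As a rough illustration of cadence-aware detection, the sketch below infers each prefix's write cadence from the median gap between past PUTs and flags prefixes that have been idle for longer than a multiple of that cadence. The function name, the input shape, and the `tolerance` factor are assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import median


def find_silent_prefixes(writes, now, tolerance=2.0):
    """Return (prefix, cadence, idle) for prefixes that stopped writing.

    writes: dict mapping prefix -> list of datetime PUT timestamps.
    tolerance: hypothetical factor; idle time beyond tolerance * cadence
    counts as silence.
    """
    silent = []
    for prefix, stamps in writes.items():
        ordered = sorted(stamps)
        if len(ordered) < 3:
            continue  # too little history to infer a cadence
        gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
        cadence = median(gaps)
        idle = (now - ordered[-1]).total_seconds()
        if cadence > 0 and idle > tolerance * cadence:
            silent.append(
                (prefix, timedelta(seconds=cadence), timedelta(seconds=idle))
            )
    return silent
```

Because the cadence is learned per prefix, an hourly job that has been quiet for seven hours is flagged while a daily job that last wrote twelve hours ago is not.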

QUERY INTELLIGENCE

Understand what your query engines are actually doing to your storage

Every Athena query, every Trino scan, every Databricks read leaves a trace in your S3 access logs. reCost reconstructs query behavior (which tables are hot, which partitions are cold, how much stale metadata is being read on every plan) without any query engine integration.

  • Athena query count estimation from result file patterns
  • Cold partition detection: data files once read by Athena but with no GETs in 90+ days
  • Stale result accumulation: Unsaved/ prefix GB growing unnoticed
  • Snapshot/manifest GET overhead per table: old metadata still read on every query
  • Trino metadata overhead: manifest list GET patterns
  • Databricks Unity Catalog anomalies
  • Estimated query count per table derived from access log GET patterns
  • Per-table GET/PUT ratio: read-heavy vs write-heavy classification

REAL FINDING

"64% of all snapshot GETs were on files older than 90 days. Every query was loading ancient snapshot chains just to find current data."
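
The per-table GET/PUT classification can be sketched in a few lines. This example assumes access log records have already been parsed into `(operation, key)` pairs using the S3 server access log operation names `REST.GET.OBJECT` and `REST.PUT.OBJECT`. The `table_depth` setting and both thresholds are hypothetical values chosen for illustration.

```python
from collections import Counter


def classify_tables(records, table_depth=3, read_heavy=10.0, write_heavy=0.1):
    """Classify tables as read-heavy, write-heavy, or mixed.

    records: iterable of (operation, key) pairs from parsed access logs.
    table_depth: assumed number of leading key segments naming a table.
    """
    gets, puts = Counter(), Counter()
    for operation, key in records:
        table = "/".join(key.split("/")[:table_depth])
        if operation == "REST.GET.OBJECT":
            gets[table] += 1
        elif operation == "REST.PUT.OBJECT":
            puts[table] += 1

    result = {}
    for table in gets.keys() | puts.keys():
        # Avoid division by zero for tables that were never written.
        ratio = gets[table] / max(puts[table], 1)
        if ratio >= read_heavy:
            label = "read-heavy"
        elif ratio <= write_heavy:
            label = "write-heavy"
        else:
            label = "mixed"
        result[table] = (ratio, label)
    return result
```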

SECURITY & COMPLIANCE

See who is touching what, and what SDK they're using

S3 access logs contain the full identity of every request: IAM role, SDK version, user agent, source IP. reCost cross-references this against known EOL runtimes and CVEs to surface production security risks that build-time scanners like Snyk will never see.

  • EOL SDK detection in production: aws-sdk-java/1.x, aws-sdk-go/1.x, Hadoop 3.3.x, nodejs10/12/14
  • CVE matching against detected runtime versions
  • IAM boundary violations: roles accessing out-of-scope buckets
  • First-time role access alerts: new identity touching a bucket
  • Browser / human access to production buckets
  • Unidentified services: null user agents or raw HTTP clients in production
  • External tool detection hitting sensitive prefixes
  • PII path detection: email, name patterns in object keys

REAL FINDING

"nodejs10.x Lambda, EOL since 2021, still making requests to production S3. Three years past end of support, actively hitting a sensitive bucket."
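
User agent matching of this kind can be approximated with a small rule table. The sketch below is illustrative only: the patterns cover three of the runtimes named above, and the rule names, regular expressions, and sample user agent strings are assumptions rather than reCost's actual detection rules.

```python
import re

# Hypothetical rule table: label -> pattern matched against the user agent.
EOL_PATTERNS = {
    "aws-sdk-java 1.x": re.compile(r"aws-sdk-java/1\.\d+"),
    "aws-sdk-go 1.x": re.compile(r"aws-sdk-go/1\.\d+"),
    "Node.js 10/12/14": re.compile(r"nodejs(10|12|14)\b"),
}


def eol_findings(user_agent):
    """Return the labels of all EOL rules that match a user agent string."""
    if not user_agent or user_agent == "-":
        # S3 access logs use "-" for missing fields; treat as unidentified.
        return ["unidentified client"]
    return [label for label, pattern in EOL_PATTERNS.items()
            if pattern.search(user_agent)]
```

Note that `aws-sdk-go-v2/...` does not match the `aws-sdk-go/1.x` rule, since the v2 SDK identifies itself with a different product token.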

IAM DATA FLOW

See exactly who is moving data where

reCost builds a complete map of data movement in your AWS environment: which IAM roles access which buckets, which prefixes, which operations, and how much. The Sankey diagram makes invisible data flows visible at a glance.

  • IAM Role → Bucket → Prefix → Operation Sankey diagram
  • Top roles by GET volume and PUT volume
  • Cross-bucket data movement visibility
  • Role access frequency and byte volume per bucket
  • Hover to highlight all connections for a specific role
  • Detect roles with unexpectedly broad bucket access

REAL FINDING

"Three IAM roles were crossing production bucket boundaries. None of them were supposed to have access. Found within the first scan."
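
The data behind a Role → Bucket → Prefix → Operation diagram is essentially an aggregation over access log records. A minimal sketch follows, assuming records are already parsed into `(role, bucket, key, operation, bytes)` tuples. The function names and the `max_buckets` threshold are hypothetical.

```python
from collections import defaultdict


def build_flows(records, prefix_depth=1):
    """Aggregate request count and bytes per (role, bucket, prefix, operation)."""
    links = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for role, bucket, key, operation, nbytes in records:
        prefix = "/".join(key.split("/")[:prefix_depth])
        path = (role, bucket, prefix, operation)
        links[path]["requests"] += 1
        links[path]["bytes"] += nbytes
    return dict(links)


def broad_roles(links, max_buckets=2):
    """Return roles that touch more buckets than max_buckets (assumed limit)."""
    buckets = defaultdict(set)
    for role, bucket, _prefix, _operation in links:
        buckets[role].add(bucket)
    return sorted(role for role, seen in buckets.items()
                  if len(seen) > max_buckets)
```

Each key in the `build_flows` result is one path through the diagram, and the request or byte totals give the link widths.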

STORAGE MONITORING

Object-level storage visibility, not just bucket totals

CloudWatch shows you bucket size. reCost shows you which tables are growing, which prefixes are cold, which storage classes your data is actually in, and what's costing you money at the object level.

  • Largest tables by size and object count
  • Data temperature per prefix: hot, warm, cold, frozen
  • Storage class distribution per bucket and per table
  • S3 cost breakdown by category: compute, storage, transfer, requests
  • Largest buckets by size and monthly request volume
  • CloudFront cache miss rate vs S3 origin requests (for media workloads)
  • Bandwidth by content type: jpg, webp, mp4, wav

REAL FINDING

"12 TB in Standard storage tier with zero GETs in 180 days. Moving to Glacier saved $1,400/month. Identified in under 10 minutes."
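
Data temperature boils down to bucketing each prefix by how long ago it was last read. The sketch below shows one way to do that. The day thresholds are assumptions picked for illustration, not reCost's definitions of hot, warm, cold, and frozen.

```python
from datetime import date

# Hypothetical bands: (max idle days, label). Anything older is "frozen".
BANDS = [(7, "hot"), (30, "warm"), (180, "cold")]


def temperature(last_get, today):
    """Label a prefix by days since its last GET (None means never read)."""
    if last_get is None:
        return "frozen"
    idle_days = (today - last_get).days
    for limit, label in BANDS:
        if idle_days <= limit:
            return label
    return "frozen"


def bytes_by_temperature(prefixes, today):
    """Sum bytes per label for (prefix, size_bytes, last_get) rows."""
    totals = {"hot": 0, "warm": 0, "cold": 0, "frozen": 0}
    for _prefix, size, last_get in prefixes:
        totals[temperature(last_get, today)] += size
    return totals
```

Large totals in the cold and frozen bands that still sit in the Standard storage class are the natural candidates for a lifecycle transition.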

MEDIA WORKLOADS

S3 as a media server, now observable

If your S3 serves media through CloudFront, reCost gives you visibility that CloudFront logs alone can't provide: which originals are hot, which resized variants are being served, which files have never been accessed, and where your CDN is passing requests through to S3 vs serving from cache.

  • Resized variant popularity by dimension (728x, 336x, 1200x, etc.)
  • Original vs derivative file access patterns
  • Peak CDN→S3 traffic hours
  • Files in Intelligent Tiering with zero GETs
  • Cache miss rate: 200 (object served from S3) vs 304 (revalidated, CloudFront copy reused)
  • EOL SDK detection from media pipeline user agents (FFmpeg, etc.)

REAL FINDING

"68% of CloudFront origin requests were for image variants that hadn't been accessed in 90+ days. Cache warming the wrong assets."
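
Variant popularity can be estimated by pulling the dimension token out of each requested key. The sketch below assumes variants embed a width such as `728x` or `336x280` in the key, delimited by `/`, `_`, `-`, or `.`. Real key layouts vary, so the pattern and function name are hypothetical.

```python
import re
from collections import Counter

# Assumed naming: a width like "728x" or "336x280" set off by / _ - or .
VARIANT = re.compile(r"(?:^|[/_-])(\d{2,5})x(\d{2,5})?(?=[/._-]|$)")


def variant_popularity(keys):
    """Count requests per width bucket; keys with no dimension are originals."""
    counts = Counter()
    for key in keys:
        match = VARIANT.search(key)
        counts[f"{match.group(1)}x" if match else "original"] += 1
    return counts
```

Feeding this the keys from origin GET requests shows which resized variants actually reach S3 and which are never requested.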

See exactly what's happening in your S3 data layer

Works with your existing AWS setup. Read-only access. No agents. No data exposure.

Book a Demo