Data Lake Health · 7 min read · Feb 12, 2025

S3 Storage Class Monitoring: Finding Cold Data Without Touching Your Catalog

reCost Team

S3 Intelligent-Tiering automates transitions between access tiers based on access frequency. But it doesn't tell you what's cold, why, or whether the cold data matches your expected access patterns. Object-level visibility gives you that picture before Intelligent-Tiering starts making decisions.

What S3 Intelligent-Tiering does

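S3 Intelligent-Tiering monitors access patterns for individual objects and moves them between access tiers automatically. An object that has not been accessed for 30 consecutive days moves from Frequent Access to Infrequent Access, and after 90 consecutive days without access it moves to Archive Instant Access. Two further tiers are opt-in: Archive Access (after a configurable minimum of 90 days without access) and Deep Archive Access (after a configurable minimum of 180 days). Accessing an object in Infrequent Access or Archive Instant Access moves it back to Frequent Access.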
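To make the thresholds concrete, here is a minimal sketch that maps an object's idle time to the tier you would expect it to occupy. It assumes the documented defaults (30 days to Infrequent Access, 90 days to Archive Instant Access) and treats the two opt-in archive tiers as optional parameters. The function name `expected_tier` and its parameters are illustrative, not part of any AWS API, and the sketch ignores the rule that objects under 128 KB are never auto-tiered.

```python
from typing import Optional


def expected_tier(idle_days: int,
                  archive_after: Optional[int] = None,
                  deep_archive_after: Optional[int] = None) -> str:
    """Return the access tier an object is expected to sit in.

    idle_days: consecutive days without access.
    archive_after / deep_archive_after: opt-in thresholds in days;
    None means that optional tier is not enabled (hypothetical parameters).
    """
    # Check the coldest tiers first so the deepest matching tier wins.
    if deep_archive_after is not None and idle_days >= deep_archive_after:
        return "Deep Archive Access"
    if archive_after is not None and idle_days >= archive_after:
        return "Archive Access"
    if idle_days >= 90:
        return "Archive Instant Access"
    if idle_days >= 30:
        return "Infrequent Access"
    return "Frequent Access"


if __name__ == "__main__":
    for days in (5, 45, 120):
        print(days, expected_tier(days))
```

Comparing this expected tier with what your own access analysis says about a prefix is one way to sanity-check that the tiering you see matches the access pattern you intended.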

The pricing model charges a small monthly monitoring and automation fee per object and has no retrieval charges. Objects smaller than 128 KB are not monitored and always stay in the Frequent Access tier. For workloads with unpredictable access patterns, it's often cheaper than managing storage classes manually.

What Intelligent-Tiering doesn't tell you

Intelligent-Tiering is an optimization mechanism, not a visibility layer. It doesn't explain why objects are cold: whether a pipeline stopped writing, a table partition is never queried, or a data set has gone stale. It also doesn't surface which specific tables or prefixes contain the cold data, or whether the cold pattern is expected or anomalous.

  • It won't tell you if cold data corresponds to a failed pipeline that should be actively writing
  • It won't alert you that a Delta Lake partition has never been accessed since it was created
  • It won't distinguish between intentionally archival data and accidentally orphaned data
  • It won't tell you whether cold data is still referenced by a Delta Lake or Iceberg transaction log
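The last point can be checked outside Intelligent-Tiering. As a rough sketch, the snippet below replays the `add` and `remove` actions from Delta Lake JSON commit files to find which data files are still part of the current table version, then splits a list of cold paths into "still referenced" and "orphan candidates". The helper names `live_files` and `split_cold` are hypothetical. The sketch assumes you pass commit lines in version order, and it ignores Parquet checkpoints, path encoding, and Iceberg metadata entirely, so treat the output as a starting point for review rather than a delete list.

```python
import json
from typing import Iterable, List, Set, Tuple


def live_files(commit_lines: Iterable[str]) -> Set[str]:
    """Replay Delta commit actions in order; return paths still in the table."""
    live: Set[str] = set()
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "add" in action:
            live.add(action["add"]["path"])
        elif "remove" in action:
            # discard() tolerates removes for files we never saw added
            live.discard(action["remove"]["path"])
    return live


def split_cold(cold_paths: Iterable[str],
               commit_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split cold paths into (still referenced, orphan candidates)."""
    live = live_files(commit_lines)
    cold = list(cold_paths)
    referenced = sorted(p for p in cold if p in live)
    orphan_candidates = sorted(p for p in cold if p not in live)
    return referenced, orphan_candidates


if __name__ == "__main__":
    log = [
        '{"add": {"path": "part-0001.parquet"}}',
        '{"add": {"path": "part-0002.parquet"}}',
        '{"remove": {"path": "part-0001.parquet"}}',
    ]
    print(split_cold(["part-0001.parquet", "part-0002.parquet"], log))
```

Note that a file removed from the log may still be needed for time travel until the table's retention period has passed, which is another reason to review orphan candidates before acting on them.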

Using access logs to understand cold data before Intelligent-Tiering acts

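S3 server access logs, joined with an S3 Inventory listing, let you derive a last-read time per object and a history of read requests for as long as you retain the logs. Server access logs are delivered on a best-effort basis, so treat the result as a strong signal rather than an exact audit trail. With that caveat, you can answer questions that Intelligent-Tiering doesn't surface: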

  • Which tables have partitions that have never been read since they were created?
  • Which prefixes have cold data that is still referenced by active transaction log entries?
  • Which cold prefixes correspond to pipelines that should be actively writing?
  • Which cold data is genuinely archival vs which is orphaned?
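The first of these questions can be sketched in a few lines. The example below assumes the access logs have already been parsed into `(key, operation, timestamp)` tuples and that the inventory is available as a list of keys; both shapes are assumptions for illustration, not the raw formats S3 delivers. It counts only `REST.GET.OBJECT` operations as reads, which is a simplification, and "never read" here means "never read within the log window you supplied".

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: only object GETs count as reads for this analysis.
READ_OPERATIONS = {"REST.GET.OBJECT"}

AccessRecord = Tuple[str, str, datetime]  # (key, operation, timestamp)


def last_read_per_key(inventory_keys: Iterable[str],
                      access_records: Iterable[AccessRecord]
                      ) -> Dict[str, Optional[datetime]]:
    """Map every inventoried key to its latest read time (None if never read)."""
    last_read: Dict[str, Optional[datetime]] = {k: None for k in inventory_keys}
    for key, operation, ts in access_records:
        if operation not in READ_OPERATIONS or key not in last_read:
            continue
        current = last_read[key]
        if current is None or ts > current:
            last_read[key] = ts
    return last_read


def never_read_by_prefix(last_read: Dict[str, Optional[datetime]],
                         depth: int = 2) -> Dict[str, List[str]]:
    """Group never-read keys by their first `depth` path segments."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, ts in last_read.items():
        if ts is None:
            prefix = "/".join(key.split("/")[:depth])
            grouped[prefix].append(key)
    return {prefix: sorted(keys) for prefix, keys in grouped.items()}


if __name__ == "__main__":
    inventory = ["sales/dt=2025-01-01/a.parquet", "sales/dt=2025-01-02/b.parquet"]
    records = [("sales/dt=2025-01-01/a.parquet", "REST.GET.OBJECT",
                datetime(2025, 1, 5))]
    print(never_read_by_prefix(last_read_per_key(inventory, records)))
```

Starting from the inventory rather than the logs matters: an object that was never read produces no read entries at all, so it only shows up if you begin with the full object listing.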

The monitoring gap Intelligent-Tiering leaves open

Intelligent-Tiering handles the transition economics of cold data. Object-level monitoring handles the operational understanding of why data is cold and whether that's expected. Both layers are needed: Intelligent-Tiering to optimize costs automatically, and access log analysis to understand the data lake's health and catch cases where cold data signals a problem rather than normal archival.

Practical approach

Enable Intelligent-Tiering for data prefixes with variable access patterns. Use object-level monitoring (access logs + inventory) to identify cold data before Intelligent-Tiering moves it, so you can triage each case. Cold data from a failed pipeline means fixing the pipeline. Orphaned files mean running VACUUM. Genuinely archival data is correct behavior, and Intelligent-Tiering can handle it.
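The triage can be expressed as a small decision function. This is a sketch under stated assumptions: the `PrefixStats` fields, the `expects_writes` flag (which would come from your own pipeline registry), and the 30-day and 2-day thresholds are all hypothetical and should be tuned to your environment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PrefixStats:
    prefix: str
    last_write: datetime
    last_read: Optional[datetime]   # None if no read was seen in the logs
    referenced_by_log: bool         # still referenced by a table transaction log
    expects_writes: bool            # hypothetical flag: a pipeline should write here


def triage(stats: PrefixStats,
           now: datetime,
           cold_after: timedelta = timedelta(days=30),
           write_gap: timedelta = timedelta(days=2)) -> str:
    """Classify a prefix as 'check pipeline', 'active', 'orphan candidate' or 'archival'."""
    # A prefix that should be receiving writes but is not is a pipeline problem,
    # regardless of how recently it was read.
    if stats.expects_writes and now - stats.last_write > write_gap:
        return "check pipeline"
    # Otherwise judge coldness by the most recent read or write.
    last_activity = max(t for t in (stats.last_write, stats.last_read) if t is not None)
    if now - last_activity <= cold_after:
        return "active"
    if not stats.referenced_by_log:
        return "orphan candidate"
    return "archival"


if __name__ == "__main__":
    now = datetime(2025, 2, 12)
    stalled = PrefixStats("events/", datetime(2025, 2, 1), None, True, True)
    print(stalled.prefix, triage(stalled, now))
```

The pipeline check comes first on purpose: a stalled writer is the case where cold data signals a problem, so it should not be masked by the archival branch.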
