Pipeline Observability · 8 min read · Mar 2, 2025

S3 Request Monitoring: How GET and PUT Patterns Expose Pipeline Problems

reCost Team
Mar 2, 2025

S3 request costs (GET, PUT, LIST) are billed per operation and scale quickly at data lake volumes. But the more important signal in request patterns isn't cost: it's what unusual GET and PUT behavior tells you about the health of your pipelines.

S3 request pricing basics

For S3 Standard in US East (N. Virginia), PUT, COPY, POST, and LIST requests cost $0.005 per 1,000, while GET, SELECT, and most other requests cost $0.0004 per 1,000 (DELETE and CANCEL requests are free). At first glance, these numbers seem negligible. At data lake scale (millions of small file writes, frequent LIST operations for metadata discovery, high-frequency reads by query engines), they add up. More importantly, they leave a detailed trace of system behavior.
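To make the arithmetic concrete, here is a minimal Python sketch that prices a batch of requests at the per-1,000 rates above. The rate table assumes S3 Standard in us-east-1, where DELETE and CANCEL requests are free; other regions and storage classes are priced differently. The monthly volumes in the usage line are hypothetical.

```python
# Per-1,000-request prices, assuming S3 Standard in us-east-1.
# Check current AWS pricing for your region and storage class.
PRICE_PER_1K = {
    "PUT": 0.005, "COPY": 0.005, "POST": 0.005, "LIST": 0.005,
    "GET": 0.0004, "SELECT": 0.0004,
    "DELETE": 0.0, "CANCEL": 0.0,  # free under published S3 pricing
}
DEFAULT_RATE = 0.0004  # remaining request types fall into the GET tier


def request_cost(counts: dict[str, int]) -> float:
    """Return the dollar cost of the given request counts, keyed by operation."""
    return sum(n / 1000 * PRICE_PER_1K.get(op, DEFAULT_RATE) for op, n in counts.items())


# Hypothetical month: 2M PUTs, 5M LISTs and 50M GETs per day, for 30 days.
monthly = request_cost({"PUT": 60_000_000, "LIST": 150_000_000, "GET": 1_500_000_000})
print(f"${monthly:,.2f}")  # prints $1,650.00
```

LIST is priced at 12.5 times the GET rate, which is why the LIST line ($750) outweighs the GET line ($600) in this example despite ten times fewer requests.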

GET patterns as a pipeline health signal

Normal data lake read patterns are predictable: query engines read specific partitions, usually the most recent ones. Anomalies in GET patterns often indicate problems:

  • A sudden spike in GET requests to a specific prefix: a new consumer, a misconfigured loop, or a caching failure
  • GET requests to partitions that haven't been written recently: cold partition scans, possibly from unbounded query predicates
  • GET requests to _delta_log/ or metadata/ prefixes proportionally higher than data reads: table metadata bloat slowing reader startup
  • GET requests from an unexpected IAM role: a misconfigured or compromised service accessing data it shouldn't
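The last two signals can be checked directly from S3 server access logs. The Python sketch below parses only the leading space-delimited fields of each log record (bucket owner, bucket, time, remote IP, requester, request ID, operation, key), counts `REST.GET.OBJECT` requests, and reports the ratio of metadata reads to data reads along with any requester not on an allowlist. The `_delta_log/` and `metadata/` markers follow Delta Lake and Apache Iceberg layout conventions; the allowlist, the 1.0 ratio threshold, and the function names are illustrative assumptions, not part of any AWS or reCost API.

```python
import re
from collections import Counter
from typing import Optional

# Leading fields of an S3 server access log record:
# bucket owner, bucket, [time], remote IP, requester, request ID, operation, key
LOG_RE = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] (\S+) (\S+) (\S+) (\S+) (\S+)")

# Path markers for table metadata (Delta Lake and Apache Iceberg conventions).
METADATA_MARKERS = ("/_delta_log/", "/metadata/")


def parse_record(line: str) -> Optional[dict]:
    """Extract the fields this check needs; return None for unparseable lines."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    _owner, bucket, ts, _ip, requester, _req_id, operation, key = m.groups()
    return {"bucket": bucket, "time": ts, "requester": requester,
            "operation": operation, "key": key}


def get_health(lines, allowed_requesters, max_metadata_ratio=1.0) -> dict:
    """Summarize GET traffic: metadata-vs-data reads and unexpected requesters."""
    metadata_gets = data_gets = 0
    unexpected = Counter()
    for line in lines:
        rec = parse_record(line)
        if rec is None or rec["operation"] != "REST.GET.OBJECT":
            continue  # skip malformed lines and non-GET operations
        if rec["requester"] not in allowed_requesters:
            unexpected[rec["requester"]] += 1
        # Prefix with "/" so a marker directly under the bucket root still matches.
        if any(marker in "/" + rec["key"] for marker in METADATA_MARKERS):
            metadata_gets += 1
        else:
            data_gets += 1
    if data_gets:
        ratio = metadata_gets / data_gets
    else:
        ratio = float("inf") if metadata_gets else 0.0
    return {
        "metadata_gets": metadata_gets,
        "data_gets": data_gets,
        "metadata_ratio": ratio,
        "metadata_heavy": ratio > max_metadata_ratio,  # metadata reads outnumber data reads
        "unexpected_requesters": dict(unexpected),
    }
```

The requester field identifies the caller (in practice often an IAM user or assumed-role ARN, or `-` for unauthenticated requests). Assumed-role entries can carry per-session suffixes, so a production allowlist would likely match on the role name rather than the exact string.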

PUT patterns as a pipeline cadence signal

PUT request patterns reveal pipeline write behavior at a granular level. A daily ETL that writes 500 objects per run has a recognizable PUT signature. Deviations are observable without any instrumentation on the pipeline itself.

  • Zero PUT requests to a prefix that normally receives writes: dead pipeline
  • PUT volume significantly below baseline: partial failure, upstream data loss, or predicate filtering issue
  • PUT request rate much higher than usual: runaway write loop or misconfigured streaming job
  • PUT requests arriving at wrong times: schedule drift or dependency ordering issue
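The four checks above can be sketched in Python for a single pipeline run, assuming you already have the run's PUT count, a baseline count, and the time its first write landed. The 0.5x and 2x thresholds and the expected time window are hypothetical tuning values. One caveat when building the count: multipart uploads issue a separate PUT for each part, so count the same unit (objects or requests) in both the baseline and the observation.

```python
from datetime import time
from typing import Optional, Tuple


def check_put_cadence(
    put_count: int,
    baseline: float,
    first_put: Optional[time] = None,
    expected_window: Optional[Tuple[time, time]] = None,
    low: float = 0.5,
    high: float = 2.0,
) -> list:
    """Return labels for the PUT-pattern deviations listed above."""
    if put_count == 0:
        return ["dead"]  # no writes at all: pipeline likely not running
    findings = []
    if put_count < baseline * low:
        findings.append("below_baseline")  # partial failure or upstream data loss
    elif put_count > baseline * high:
        findings.append("above_baseline")  # runaway loop or misconfigured job
    if first_put is not None and expected_window is not None:
        # Assumes the window does not cross midnight.
        start, end = expected_window
        if not start <= first_put <= end:
            findings.append("off_schedule")  # schedule drift or ordering issue
    return findings
```

For example, a run with 480 PUTs against a baseline of 500 whose first write landed at 09:30, when the job normally writes between 02:00 and 04:00, returns `["off_schedule"]`; the volume alone looks healthy.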

LIST requests: the hidden cost in metadata-heavy workloads

LIST operations are often overlooked in cost analysis but can become significant in data lake environments with many small files. Query engines and catalog tools perform LIST operations to discover available objects and partitions, and each LIST request returns at most 1,000 keys, so a full scan of a prefix holding millions of small files takes thousands of paginated requests. In these environments, LIST operations can outnumber GET requests and, at 12.5 times the per-request GET price, make up a meaningful share of total request costs.

High LIST-to-GET ratios often indicate: partition discovery overhead from excessive small file counts, frequent catalog refreshes from metadata tooling, or repeated directory scans in pipelines that don't cache results.
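The small-file effect on LIST volume is easy to estimate. A ListObjectsV2 request returns at most 1,000 keys, so a flat scan of a prefix needs roughly one request per 1,000 objects. The Python sketch below assumes a full, non-delimited scan repeated on a fixed schedule at the S3 Standard us-east-1 LIST price; the object counts and refresh rate in the usage lines are hypothetical.

```python
import math

MAX_KEYS_PER_LIST = 1000   # ListObjectsV2 page size limit
LIST_PRICE_PER_1K = 0.005  # assumption: S3 Standard, us-east-1


def list_requests_per_scan(object_count: int) -> int:
    """Paginated LIST calls for one full scan; an empty prefix still costs one call."""
    return max(1, math.ceil(object_count / MAX_KEYS_PER_LIST))


def daily_list_load(object_count: int, scans_per_day: int):
    """Return (LIST requests per day, dollars per day) for repeated full scans."""
    requests = list_requests_per_scan(object_count) * scans_per_day
    return requests, requests / 1000 * LIST_PRICE_PER_1K


def list_to_get_ratio(list_count: int, get_count: int) -> float:
    """LIST requests per GET request; infinite if nothing was read."""
    return list_count / get_count if get_count else math.inf


# Catalog refresh every 5 minutes (288 scans/day) over one prefix:
print(daily_list_load(5_000_000, 288))  # 5M small files
print(daily_list_load(50_000, 288))     # same data compacted into 50k files
```

Under these assumptions the uncompacted prefix generates about 1.44 million LIST requests (roughly $7.20) per day, while the compacted layout needs 14,400 (about 7 cents), a hundredfold reduction driven entirely by file count.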

Building request pattern baselines

Effective S3 request monitoring requires baselines: what's the normal GET volume per prefix per day? What's the expected PUT count per pipeline run? With baselines established, deviations become actionable signals rather than noise.
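One simple way to turn baselines into signals, sketched in Python below, is a robust z-score: compare today's count for each prefix with the median of the previous days, scaled by the median absolute deviation (MAD), so a single past outlier does not distort the baseline. The 14-day window and 3.5 threshold are common starting points for MAD-based outlier tests rather than values from this article, and the sketch assumes daily per-prefix counts have already been aggregated (for example from access logs). It illustrates a general technique and is not a description of how reCost computes its baselines.

```python
import math
from statistics import median

MAD_SCALE = 1.4826  # makes MAD comparable to a standard deviation for normal data


def robust_z(history: list, today: float) -> float:
    """Robust z-score of today's count relative to the history window."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # Perfectly flat history: any change at all is an unbounded deviation.
        return 0.0 if today == med else math.copysign(math.inf, today - med)
    return (today - med) / (MAD_SCALE * mad)


def flag_deviations(daily_counts: dict, window: int = 14, threshold: float = 3.5) -> dict:
    """daily_counts maps prefix -> daily request counts, oldest first, today last."""
    flagged = {}
    for prefix, series in daily_counts.items():
        if len(series) < window + 1:
            continue  # not enough history to form a baseline yet
        history, today = series[-window - 1:-1], series[-1]
        z = robust_z(history, today)
        if abs(z) > threshold:
            flagged[prefix] = z
    return flagged
```

Run against GET counts, a large positive score corresponds to the spike case described earlier; run against PUT counts, a large negative score (down to zero) corresponds to the dead or partial pipeline cases.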

reCost establishes these baselines automatically from your S3 access logs, surfaces deviations, and maps them back to the specific pipelines, query patterns, or IAM roles driving the anomalous behavior.

SEE IT IN YOUR ENVIRONMENT

Connect reCost to your S3 environment in 5 minutes

No agents, no code changes. Just your S3 access logs and a complete picture of your data lake health.
