S3 data transfer costs are among the most misunderstood charges on an AWS bill. Cross-region replication, internet egress, CDN fees, and inter-service transfers are all priced differently, and without object-level monitoring it's nearly impossible to attribute costs accurately.
The four types of S3 data transfer charges
1. Internet egress
Data transferred out of S3 to the public internet (for example, serving objects to end users) is billed at standard egress rates. For data engineering workloads this is usually not the primary concern, but it can appear unexpectedly when debugging tools or third-party integrations make direct S3 requests.
2. Cross-region transfer
This is the most common source of surprise data transfer costs in data lake environments. When a Spark cluster in us-west-2 reads data from a bucket in us-east-1, every byte read is billed as inter-region data transfer out of us-east-1. For multi-region architectures, or teams that haven't enforced region locality, these charges can add up to significant monthly costs.
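A quick first check is whether each bucket and the compute reading it share a region. A minimal sketch, assuming you pass in the bucket's LocationConstraint (as returned by boto3's `get_bucket_location`) and the region your cluster runs in; the helper names are hypothetical.

```python
from typing import Optional

# GetBucketLocation quirks: buckets in us-east-1 report no LocationConstraint (None),
# and some legacy eu-west-1 buckets report "EU".
_LEGACY_LOCATIONS = {None: "us-east-1", "": "us-east-1", "EU": "eu-west-1"}


def bucket_region(location_constraint: Optional[str]) -> str:
    """Normalize a LocationConstraint value into a region name."""
    return _LEGACY_LOCATIONS.get(location_constraint, location_constraint)


def is_cross_region(location_constraint: Optional[str], compute_region: str) -> bool:
    """True if compute running in compute_region would read this bucket across regions."""
    return bucket_region(location_constraint) != compute_region


# With boto3 (not run here), the input would come from:
#   boto3.client("s3").get_bucket_location(Bucket="my-bucket")["LocationConstraint"]
print(is_cross_region(None, "us-west-2"))  # True: a us-east-1 bucket read from us-west-2
```

Running this against every bucket a job touches, with the job's own region, gives a short list of cross-region reads to investigate first.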
3. Cross-AZ transfer within a VPC
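Data transferred between Availability Zones within the same region is charged at $0.01/GB in each direction. S3 is a regional service, so the charge usually comes from the path to it rather than from S3 itself: for example, instances in private subnets reaching S3 through a NAT gateway in another AZ instead of a gateway VPC endpoint, or multi-AZ clusters shuffling data read from S3 between nodes. For high-throughput workloads this can be a meaningful cost, and it doesn't appear in bucket-level S3 metrics.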
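Because the $0.01/GB rate applies in each direction, every byte that crosses an AZ boundary is effectively billed twice. A back-of-envelope estimate, using a hypothetical traffic volume and a hypothetical helper name:

```python
# Rate from the text: $0.01/GB, charged in each direction for cross-AZ traffic.
CROSS_AZ_RATE_PER_GB = 0.01


def monthly_cross_az_cost(gb_per_month: float, rate: float = CROSS_AZ_RATE_PER_GB) -> float:
    """Both sides are billed (data out of one AZ, data in to the other), so each GB costs 2 * rate."""
    return gb_per_month * rate * 2


# Hypothetical workload moving 50 TB (51,200 GB) across AZs per month.
print(f"${monthly_cross_az_cost(50 * 1024):,.2f} per month")
```

Under these assumptions, 50 TB of cross-AZ traffic per month comes to roughly $1,024.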
4. Replication traffic
S3 Cross-Region Replication incurs inter-region transfer charges for the replicated data and PUT request charges in the destination region, on top of storage for the replica itself. For frequently updated data lake tables, replication costs can exceed the underlying storage costs on a monthly basis.
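To see how replication can outrun storage, here is a rough monthly estimate for a table that is rewritten every day. The rates are illustrative placeholders (roughly in line with us-east-1 list prices for inter-region transfer, S3 Standard PUTs, and S3 Standard storage), not quoted prices; check current S3 pricing for your regions. The table size, file count, and function names are hypothetical.

```python
# Illustrative placeholder rates (assumptions; check current S3 pricing for your regions).
TRANSFER_PER_GB = 0.02        # inter-region transfer, $/GB
PUT_PER_1000 = 0.005          # PUT requests in the destination, $ per 1,000
STORAGE_PER_GB_MONTH = 0.023  # storage, $/GB-month


def monthly_replication_cost(changed_gb: float, put_requests: int,
                             transfer_rate: float = TRANSFER_PER_GB,
                             put_rate_per_1000: float = PUT_PER_1000) -> float:
    """Transfer for every replicated byte plus the PUTs issued in the destination."""
    return changed_gb * transfer_rate + put_requests / 1000 * put_rate_per_1000


def monthly_storage_cost(stored_gb: float, rate: float = STORAGE_PER_GB_MONTH) -> float:
    """Storage for the source table over one month."""
    return stored_gb * rate


# Hypothetical table: 1 TB (1,024 GB) in ~10,000 files, fully rewritten every day for 30 days.
replication = monthly_replication_cost(changed_gb=1024 * 30, put_requests=10_000 * 30)
storage = monthly_storage_cost(stored_gb=1024)
print(f"replication ≈ ${replication:,.2f}/month, storage ≈ ${storage:,.2f}/month")
```

With these assumptions replication comes to about $615.90 a month against about $23.55 of storage; the transfer term dominates, which is why rewrite-heavy tables are the ones to watch.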
Why data transfer is hard to monitor
The challenge is that data transfer costs aren't directly visible in S3 metrics or CloudWatch. In AWS Cost Explorer, some of them (cross-AZ and NAT gateway traffic, for example) land under EC2 ("EC2 - Other") rather than S3, and even the S3 line items are broken down by usage type, not by prefix or job. This makes attribution difficult: you know you're paying for data transfer, but you can't easily see which S3 access patterns are driving it.
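Cost Explorer can at least show where the transfer charges land. A minimal sketch that summarizes a `get_cost_and_usage` response grouped by SERVICE and then USAGE_TYPE. It assumes transfer-related usage types end in "-Bytes" (true of common ones such as DataTransfer-Out-Bytes and DataTransfer-Regional-Bytes, but worth verifying against your own bill); the function name and the sample figures are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

# The real call (not run here) would look like:
#   boto3.client("ce").get_cost_and_usage(
#       TimePeriod={"Start": "YYYY-MM-01", "End": "YYYY-MM-01"},
#       Granularity="MONTHLY",
#       Metrics=["UnblendedCost"],
#       GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"},
#                {"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
#   )


def transfer_costs(response: dict, metric: str = "UnblendedCost") -> Dict[Tuple[str, str], float]:
    """Sum cost per (service, usage type), keeping usage types that end in "-Bytes"."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service, usage_type = group["Keys"]  # same order as the GroupBy list
            if usage_type.endswith("-Bytes"):
                totals[(service, usage_type)] += float(group["Metrics"][metric]["Amount"])
    # Largest spend first, so the biggest contributors surface at the top.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Made-up response in the API's shape, for illustration only.
sample = {"ResultsByTime": [{"Groups": [
    {"Keys": ["Amazon Simple Storage Service", "USE1-USW2-AWS-Out-Bytes"],
     "Metrics": {"UnblendedCost": {"Amount": "120.50", "Unit": "USD"}}},
    {"Keys": ["EC2 - Other", "DataTransfer-Regional-Bytes"],
     "Metrics": {"UnblendedCost": {"Amount": "40.00", "Unit": "USD"}}},
    {"Keys": ["Amazon Simple Storage Service", "TimedStorage-ByteHrs"],
     "Metrics": {"UnblendedCost": {"Amount": "900.00", "Unit": "USD"}}},
]}]}

for (service, usage_type), cost in transfer_costs(sample).items():
    print(f"{service:32} {usage_type:28} ${cost:,.2f}")
```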
S3 access logs help by revealing which prefixes generate the most read volume and which source IPs the requests come from (private addresses typically mean the request arrived through a VPC endpoint), letting you infer whether traffic is staying within a region or crossing boundaries.
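Extracting those fields is mostly a tokenizing problem, because the S3 server access log format mixes space-separated fields with a bracketed timestamp and quoted strings. A minimal sketch that pulls out the fields needed for transfer attribution. The field positions follow the documented log format; the function and type names are hypothetical, and the sketch does not handle escaped quotes inside quoted fields.

```python
import re
from typing import NamedTuple, Optional

# A token is a [bracketed timestamp], a "quoted string", or a run of non-space characters.
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


class AccessRecord(NamedTuple):
    bucket: str
    remote_ip: str
    operation: str
    key: str
    bytes_sent: int


def parse_access_log_line(line: str) -> Optional[AccessRecord]:
    fields = _TOKEN.findall(line)
    if len(fields) < 12:
        return None  # truncated or malformed line
    # Positions: 0 owner, 1 bucket, 2 time, 3 remote IP, 4 requester, 5 request ID,
    # 6 operation, 7 key, 8 request-URI, 9 HTTP status, 10 error code, 11 bytes sent.
    bytes_sent = 0 if fields[11] == "-" else int(fields[11])  # "-" means nothing sent
    return AccessRecord(fields[1], fields[3], fields[6], fields[7], bytes_sent)


# Hypothetical log line in the server access log layout.
line = ('ownerid my-bucket [06/Feb/2019:00:00:38 +0000] 10.0.1.25 '
        'arn:aws:iam::123456789012:role/etl 3E57427F3EXAMPLE REST.GET.OBJECT '
        'events/2024/part-0001.parquet "GET /my-bucket/events/2024/part-0001.parquet HTTP/1.1" '
        '200 - 1048576 1048576 12 10 "-" "aws-sdk-java" -')
print(parse_access_log_line(line))
```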
Practical monitoring approach
- Enable S3 access logging on all production buckets and parse requester IP and VPC endpoint information
- Flag read requests that may originate outside the bucket's region, based on private (VPC endpoint) versus public source IP patterns
- Monitor replication rule configurations and track destination bucket write volumes against source change rates
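- Use S3 gateway VPC endpoints for within-region traffic where possible, so requests from private subnets don't route through NAT gateways and pick up NAT processing and cross-AZ charges
- Set Cost Anomaly Detection alerts that cover data transfer usage types such as DataTransfer-Out-Bytes and DataTransfer-Regional-Bytes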
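The flagging step can be sketched as follows, assuming access log lines have already been parsed into (remote IP, operation, key, bytes sent) tuples. Treating private source IPs as in-region VPC endpoint traffic and public ones as candidates for cross-region or internet transfer is a heuristic, not a rule: same-region instances with public IPs, or traffic through a NAT gateway, also show up as public. Matching public IPs against AWS's published ip-ranges.json, which tags ranges by region, would sharpen it. Function names and sample records are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple


def source_class(remote_ip: str) -> str:
    """Heuristic: private IPs usually arrived via a VPC endpoint; public ones need a closer look."""
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        return "unknown"  # e.g. "-" for some internal requests
    return "private" if ip.is_private else "public"


def read_bytes_by_prefix(records: Iterable[Tuple[str, str, str, int]], depth: int = 1) -> Counter:
    """Sum bytes sent by object GETs per (key prefix, source class)."""
    totals: Counter = Counter()
    for remote_ip, operation, key, bytes_sent in records:
        if operation != "REST.GET.OBJECT":
            continue  # only object reads drive read-side transfer
        prefix = "/".join(key.split("/")[:depth])
        totals[(prefix, source_class(remote_ip))] += bytes_sent
    return totals


records = [
    ("10.0.1.25", "REST.GET.OBJECT", "events/2024/part-0001.parquet", 1_048_576),
    ("3.5.140.2", "REST.GET.OBJECT", "events/2024/part-0002.parquet", 524_288),  # arbitrary public address
    ("3.5.140.2", "REST.PUT.OBJECT", "events/2024/part-0003.parquet", 0),
]
for (prefix, source), nbytes in read_bytes_by_prefix(records).most_common():
    print(prefix, source, nbytes)
```

Prefixes with a large "public" share are the ones to trace back to a job or region.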
What to fix first
Cross-region access is usually the quickest win. If your compute and your data aren't in the same region, moving one or the other (or enabling replication to a local replica) typically produces the largest reduction in transfer costs. Cross-AZ optimization usually matters less unless you're running very high-throughput workloads.
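Whether a local replica beats reading across regions comes down to how often the data is read versus how often it changes. A rough comparison that ignores request charges, with hypothetical volumes, placeholder rates, and a hypothetical function name:

```python
from typing import Tuple


def remote_vs_replica(read_gb: float, changed_gb: float, replica_gb: float,
                      transfer_rate: float, storage_rate: float) -> Tuple[float, float]:
    """Monthly cost of (a) reading across regions vs (b) keeping a local replica.

    (a) pays transfer on every byte read; (b) pays transfer only on changed bytes,
    plus storage for the replica. Request charges are ignored in this sketch.
    """
    remote = read_gb * transfer_rate
    replica = changed_gb * transfer_rate + replica_gb * storage_rate
    return remote, replica


# Hypothetical workload: ~20,000 GB scanned per month from a 5,000 GB table that changes
# by ~500 GB per month; placeholder $0.02/GB transfer and $0.023/GB-month storage.
remote, replica = remote_vs_replica(20_000, 500, 5_000, 0.02, 0.023)
print(f"remote reads ≈ ${remote:,.2f}/month, local replica ≈ ${replica:,.2f}/month")
```

A table that is scanned rarely but rewritten constantly flips the result, so the comparison is worth running per table rather than per bucket.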