Engineering · 7 min read · Dec 2025

IAM Monitoring for AWS Data Teams: Who Is Accessing What and With What SDK

reCost Team

IAM roles, SDK versions, access frequency, and bucket boundaries: most data teams have no visibility into this layer until something goes wrong. A security audit, a compliance review, or an unexplained spike in S3 costs often reveals IAM access patterns that have been accumulating for months. Here's how to build that picture from S3 access logs before an incident forces your hand.

Why IAM monitoring matters for data teams specifically

Data teams deal with a specific IAM problem: a large number of roles with broad S3 permissions, maintained by multiple teams, across multiple environments. Glue roles, Spark roles, Athena execution roles, ETL pipeline roles, BI tool service accounts: each one has access to specific prefixes, and that access map changes constantly as data architectures evolve.

The risk isn't primarily external attack; it's configuration drift. A role that was scoped to read a specific prefix in dev gets cloned into production with broader permissions. An SDK version that was current two years ago is still running in production because no one updated it. A browser session hits a production bucket during debugging and the access pattern goes unnoticed.

What S3 access logs tell you about IAM behavior

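S3 server access logs record the requester for each logged request: an IAM user or assumed-role ARN, or "-" for unauthenticated requests. This means you can reconstruct an access map: which roles are accessing which buckets and prefixes, at what frequency, using which HTTP user-agent (which includes SDK version information), and with what operation types. Log delivery is best-effort, so treat the result as a close approximation of actual access rather than a guaranteed-complete record.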

  • Role-to-bucket access map: which roles are accessing buckets they shouldn't
  • SDK version detection: which roles are still using outdated SDK versions with known vulnerabilities (boto3 1.9.x, old AWS CLI versions)
  • First-time access alerts: a role accessing a prefix it has never accessed before
  • Browser session detection: user-agent patterns that indicate human browsing vs automated access
  • Access frequency anomalies: roles with sudden spikes or drops in request volume
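The fields behind all of these signals come from the same log line. As a minimal sketch, assuming the standard space-delimited S3 server access log layout (bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request URI, status, error code, bytes sent, object size, total time, turn-around time, referer, user-agent, ...), the following pulls out the parts needed for an access map. The names `AccessRecord`, `parse_access_log_line`, and `SAMPLE_LINE` are illustrative, and the sample line is made up for the example.

```python
import re
from typing import NamedTuple, Optional

# A token is either a [bracketed] timestamp, a "quoted" string,
# or a bare run of non-space characters.
_TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


class AccessRecord(NamedTuple):
    bucket: str
    requester: str   # IAM user / assumed-role ARN, or "-" if unauthenticated
    operation: str   # e.g. REST.GET.OBJECT
    key: str         # object key, or "-" for bucket-level operations
    user_agent: str


def parse_access_log_line(line: str) -> Optional[AccessRecord]:
    """Extract the identity-related fields from one access log line.

    Returns None when the line has fewer fields than expected. This is a
    simplification: it does not handle escaped quotes inside quoted fields.
    """
    tokens = _TOKEN.findall(line)
    if len(tokens) < 17:
        return None
    return AccessRecord(
        bucket=tokens[1],
        requester=tokens[4],
        operation=tokens[6],
        key=tokens[7],
        user_agent=tokens[16].strip('"'),
    )


# Made-up example line in the documented field order.
SAMPLE_LINE = (
    '79a59df900b949e5 example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'arn:aws:sts::123456789012:assumed-role/etl-role/session-1 3E57427F3EXAMPLE '
    'REST.GET.OBJECT raw/events/part-0001.parquet '
    '"GET /raw/events/part-0001.parquet HTTP/1.1" 200 - 1024 1024 20 19 "-" '
    '"Boto3/1.9.42 Python/3.6.8 Linux/4.14 Botocore/1.12.42" - hostid SigV4'
)

record = parse_access_log_line(SAMPLE_LINE)
```

Positional parsing keeps the sketch short. Newer log fields are appended at the end of the line, so the first seventeen positions used here are unaffected by them.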

EOL SDK detection and why it matters

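Old SDK versions can carry known CVEs and miss later security fixes. boto3 1.9.x (botocore 1.12.x) is a typical example of a release line that is still found in production long after newer versions shipped. Match any CVE to the exact component and version before flagging it: CVE-2018-15869, for instance, is recorded against AMI lookups in the AWS CLI that omit an owner filter, not against presigned URL handling in boto3. If a role is making 520K requests per month against production data using an SDK version with an applicable advisory, that's an active risk, not a hypothetical one.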

SDK version is visible in the user-agent string in S3 access logs. Parsing this at scale reveals which roles are running on outdated stacks without requiring any changes to the applications themselves.
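A rough sketch of that parsing step follows. It assumes user-agent strings shaped like `Boto3/1.9.42 Python/3.6.8 Linux/4.14 Botocore/1.12.42` or `aws-cli/1.16.90 Python/2.7.15 ...`, which is how boto3 and the AWS CLI commonly identify themselves. The function names, the `MINIMUM_VERSIONS` thresholds, and the browser hints are hypothetical placeholders for your own policy, not values from any advisory.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Matches "<sdk>/<major>.<minor>.<patch>" for a few Python-based AWS clients.
_SDK_PATTERN = re.compile(
    r'\b(Boto3|aws-cli|Botocore)/(\d+)\.(\d+)\.(\d+)', re.IGNORECASE
)

# Hypothetical policy floors: versions below these get flagged for review.
MINIMUM_VERSIONS = {
    "boto3": (1, 10, 0),
    "aws-cli": (1, 17, 0),
}

# Assumed markers of a human session (web browser or the S3 console).
_BROWSER_HINTS = ("Mozilla/", "S3Console/")


def detect_sdk(user_agent: str) -> Optional[Tuple[str, Version]]:
    """Return (sdk_name, version) for the first recognised SDK token, else None."""
    match = _SDK_PATTERN.search(user_agent)
    if match is None:
        return None
    name = match.group(1).lower()
    version = tuple(int(part) for part in match.group(2, 3, 4))
    return name, version


def is_outdated(name: str, version: Version) -> bool:
    """True when the SDK has a configured floor and the version is below it."""
    floor = MINIMUM_VERSIONS.get(name)
    return floor is not None and version < floor


def looks_like_browser(user_agent: str) -> bool:
    """Heuristic only: the user-agent is client-supplied and can be spoofed."""
    return any(hint in user_agent for hint in _BROWSER_HINTS)
```

For instance, `detect_sdk("Boto3/1.9.42 Python/3.6.8 Linux/4.14 Botocore/1.12.42")` yields the boto3 token first because it appears first in the string, and `is_outdated` then compares it against whatever floor you configure.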

Building an IAM access map from S3 logs

The practical workflow is: parse S3 access logs for requester identity, group by role ARN, map each role to its accessed buckets and prefixes, and flag anomalies (new access patterns, outdated SDKs with known vulnerabilities, boundary violations). This gives you a current-state IAM access map that reflects actual behavior, not just what the IAM policy permits.
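The grouping and first-time-access steps of that workflow can be sketched as below. This is an illustration under stated assumptions, not reCost's implementation: records are plain `(requester, bucket, key)` tuples already extracted from the logs, assumed-role session ARNs are collapsed to account plus role name, and a "prefix" is taken to be the first two directory levels of the key. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Access = Tuple[str, str]        # (bucket, prefix)
Record = Tuple[str, str, str]   # (requester, bucket, key)


def role_from_requester(requester: str) -> str:
    """Collapse an assumed-role session ARN to '<account>:role/<name>'.

    Other requesters (IAM users, "-") are returned unchanged.
    """
    parts = requester.split(":", 5)
    if len(parts) == 6 and parts[5].startswith("assumed-role/"):
        role_name = parts[5].split("/")[1]
        return f"{parts[4]}:role/{role_name}"
    return requester


def key_prefix(key: str, depth: int = 2) -> str:
    """Return the first `depth` directory levels of a key, with trailing slash."""
    directories = key.split("/")[:-1]   # drop the object name itself
    return "".join(f"{d}/" for d in directories[:depth])


def build_access_map(records: Iterable[Record], depth: int = 2) -> Dict[str, Set[Access]]:
    """Group observed accesses by role."""
    access_map: Dict[str, Set[Access]] = defaultdict(set)
    for requester, bucket, key in records:
        role = role_from_requester(requester)
        access_map[role].add((bucket, key_prefix(key, depth)))
    return dict(access_map)


def first_time_accesses(
    baseline: Dict[str, Set[Access]], current: Dict[str, Set[Access]]
) -> List[Tuple[str, str, str]]:
    """List (role, bucket, prefix) combinations present now but not in the baseline."""
    new = []
    for role, accesses in current.items():
        for bucket, prefix in accesses - baseline.get(role, set()):
            new.append((role, bucket, prefix))
    return sorted(new)
```

Building one map from last month's logs and one from this month's, then passing both to `first_time_accesses`, gives the list of new role, bucket, and prefix combinations to review.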

IAM policies tell you what's allowed. S3 access logs tell you what's actually happening. For most data environments, there's a significant gap between the two.
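One way to make that gap concrete is to compare observed accesses against a hand-maintained statement of intent. The sketch below assumes a hypothetical `intended` mapping of role to allowed `(bucket, prefix)` pairs. It is a simple prefix allowlist for illustration and does not evaluate real IAM policy documents, conditions, or bucket policies.

```python
from typing import Dict, List, Set, Tuple

Access = Tuple[str, str]   # (bucket, prefix)


def find_boundary_violations(
    observed: Dict[str, Set[Access]],
    intended: Dict[str, Set[Access]],
) -> List[Tuple[str, str, str]]:
    """Return (role, bucket, prefix) accesses that fall outside the intended scope.

    An observed access is in scope when some intended entry has the same
    bucket and a prefix that the observed prefix starts with. Roles missing
    from `intended` have no allowed scope, so all their accesses are reported.
    """
    violations = []
    for role, accesses in observed.items():
        allowed = intended.get(role, set())
        for bucket, prefix in accesses:
            in_scope = any(
                bucket == allowed_bucket and prefix.startswith(allowed_prefix)
                for allowed_bucket, allowed_prefix in allowed
            )
            if not in_scope:
                violations.append((role, bucket, prefix))
    return sorted(violations)


# Illustrative data: the role is meant to read only under raw/.
observed = {"etl-role": {("prod-lake", "raw/events/"), ("prod-lake", "curated/finance/")}}
intended = {"etl-role": {("prod-lake", "raw/")}}
violations = find_boundary_violations(observed, intended)
```

Keeping the intent map separate from the IAM policy is a deliberate choice in this sketch: the policy may legitimately be broader than what a role is supposed to touch, and the difference is the drift worth reviewing.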

What reCost surfaces automatically

  • Role-to-bucket access map across your entire S3 environment
  • SDK version detection per role with CVE flagging for outdated SDK versions
  • First-time access alerts: new role, new bucket, or new prefix combinations
  • Browser session detection in production prefixes
  • Access frequency trending per role over time

How to act on IAM monitoring findings

When reCost surfaces an outdated SDK version with known CVEs, the fix is updating the application's dependency. When it surfaces a boundary violation (a role accessing a production prefix it shouldn't), the fix is updating the IAM policy or the application configuration. When it surfaces a first-time browser session on a production prefix, the fix is understanding who did it and whether it should be blocked.

None of these are hard to fix once you know about them. The challenge is always knowing about them before they become incidents.

SEE IT IN YOUR ENVIRONMENT

Connect reCost to your S3 environment in 5 minutes

No agents, no code changes. Just your S3 access logs and a complete picture of your data lake health.
