IAM roles, SDK versions, access frequency, and bucket boundaries: most data teams have no visibility into this layer until something goes wrong. A security audit, a compliance review, or an unexplained spike in S3 costs often reveals IAM access patterns that have been accumulating for months. Here's how to build that picture from S3 access logs before an incident forces your hand.
Why IAM monitoring matters for data teams specifically
Data teams deal with a specific IAM problem: a large number of roles with broad S3 permissions, maintained by multiple teams, across multiple environments. Glue roles, Spark roles, Athena execution roles, ETL pipeline roles, BI tool service accounts: each one has access to specific prefixes, and that access map changes constantly as data architectures evolve.
The risk isn't primarily external attack; it's configuration drift. A role that was scoped to read a specific prefix in dev gets cloned into production with broader permissions. An SDK version that was current two years ago is still running in production because no one updated it. A browser session hits a production bucket during debugging and the access pattern goes unnoticed.
What S3 access logs tell you about IAM behavior
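S3 server access logs record the requester for each logged request: typically an IAM user or assumed-role ARN for authenticated calls, and "-" for anonymous ones. Log delivery is best-effort, so the record is near-complete rather than guaranteed, but it is enough to reconstruct an access map: which roles are accessing which buckets and prefixes, at what frequency, using which HTTP user-agent (which typically includes SDK version information), and with what operation types. From that map you can derive: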
- Role-to-bucket access map: which roles are accessing buckets they shouldn't
- SDK version detection: which roles are still using outdated SDK versions with known vulnerabilities (boto3 1.9.x, old AWS CLI versions)
- First-time access alerts: a role accessing a prefix it has never accessed before
- Browser session detection: user-agent patterns that indicate human browsing vs automated access
- Access frequency anomalies: roles with sudden spikes or drops in request volume
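Every item in that list starts from the same step: turning a raw log line into fields. Here is a minimal sketch in Python, assuming the documented space-delimited server access log layout (bucket owner, bucket, timestamp, remote IP, requester, request ID, operation, key, request URI, then status and size fields, referrer, and user-agent). The names AccessRecord, parse_line, and role_from_requester are hypothetical, and SAMPLE_LINE is a made-up record, not real log data.

```python
import re
from typing import NamedTuple, Optional

# A token is a [bracketed timestamp], a "quoted string", or a bare word.
_TOKEN = re.compile(r'\[([^\]]*)\]|"((?:[^"\\]|\\.)*)"|(\S+)')

# Assumed-role requesters look like arn:aws:sts::<account>:assumed-role/<role>/<session>.
_ASSUMED_ROLE = re.compile(r"arn:aws:sts::\d+:assumed-role/([^/]+)/")


class AccessRecord(NamedTuple):
    bucket: str
    requester: str
    role: Optional[str]  # None when the requester is not an assumed role
    operation: str
    key: str
    user_agent: str


def role_from_requester(requester: str) -> Optional[str]:
    """Return the role name for an assumed-role ARN, else None."""
    m = _ASSUMED_ROLE.match(requester)
    return m.group(1) if m else None


def parse_line(line: str) -> Optional[AccessRecord]:
    """Split one access log line into the fields needed for an access map."""
    tokens = [
        next(g for g in m.groups() if g is not None)
        for m in _TOKEN.finditer(line)
    ]
    if len(tokens) < 17:  # user-agent is the 17th field; skip malformed lines
        return None
    return AccessRecord(
        bucket=tokens[1],
        requester=tokens[4],
        role=role_from_requester(tokens[4]),
        operation=tokens[6],
        key=tokens[7],
        user_agent=tokens[16],
    )


# Made-up example line for illustration only.
SAMPLE_LINE = (
    "79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be "
    "example-data-lake [06/Feb/2019:00:00:38 +0000] 192.0.2.3 "
    "arn:aws:sts::123456789012:assumed-role/example-etl-role/session-1 "
    "3E57427F3EXAMPLE REST.GET.OBJECT raw/events/part-0000.parquet "
    '"GET /raw/events/part-0000.parquet HTTP/1.1" 200 - 2048 2048 12 10 '
    '"-" "Boto3/1.9.42 Python/3.6.8 Linux/4.14.77 Botocore/1.12.42" -'
)

if __name__ == "__main__":
    print(parse_line(SAMPLE_LINE))
```

A user-agent that itself contains a double quote will confuse this tokenizer, so a production parser needs a stricter pattern. For a first pass over a few days of logs, the sketch is usually enough to see which roles show up where.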
EOL SDK detection and why it matters
Old SDK versions can carry known CVEs, in the SDK itself or in the dependencies pinned alongside it, and they no longer receive fixes. boto3 1.9.x (paired with botocore 1.12.x) is a release line from 2018 and 2019. If a role is making 520K requests per month against production data using an SDK version like this, that's an active risk, not a hypothetical one.
SDK version is visible in the user-agent string in S3 access logs. Parsing this at scale reveals which roles are running on outdated stacks without requiring any changes to the applications themselves.
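As a rough illustration of that parsing step, the sketch below pulls a boto3 version out of a user-agent string and groups outdated versions by role. The MIN_BOTO3 threshold is a hypothetical policy choice, not a published cutoff, and the function names are made up for this example. The user-agent is supplied by the client, so treat the result as a strong hint rather than proof.

```python
import re
from collections import defaultdict

# boto3 user-agents begin with a "Boto3/<major>.<minor>.<patch>" component.
_BOTO3 = re.compile(r"\bBoto3/(\d+)\.(\d+)\.(\d+)", re.IGNORECASE)

# Hypothetical minimum version; pick the floor your own policy requires.
MIN_BOTO3 = (1, 26, 0)


def boto3_version(user_agent):
    """Return the boto3 version as an (int, int, int) tuple, or None."""
    m = _BOTO3.search(user_agent)
    return tuple(int(part) for part in m.groups()) if m else None


def is_outdated(user_agent, minimum=MIN_BOTO3):
    """True only when a boto3 version is present and below the minimum."""
    version = boto3_version(user_agent)
    return version is not None and version < minimum


def outdated_by_role(records, minimum=MIN_BOTO3):
    """Map each role to the set of outdated boto3 versions it was seen using.

    `records` is an iterable of (role, user_agent) pairs.
    """
    findings = defaultdict(set)
    for role, user_agent in records:
        if is_outdated(user_agent, minimum):
            findings[role].add(boto3_version(user_agent))
    return dict(findings)


if __name__ == "__main__":
    # Illustrative records, not real log data.
    records = [
        ("example-etl-role", "Boto3/1.9.42 Python/3.6.8 Linux/4.14.77 Botocore/1.12.42"),
        ("example-bi-role", "Boto3/1.34.10 md/Botocore#1.34.10 ua/2.0 lang/python#3.11.6"),
        ("example-etl-role", "Mozilla/5.0 (X11; Linux x86_64)"),
    ]
    print(outdated_by_role(records))
```

The same pattern extends to other clients (for example an `aws-cli/` prefix) by adding one regular expression per SDK.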
Building an IAM access map from S3 logs
The practical workflow is: parse S3 access logs for requester identity, group by role ARN, map each role to its accessed buckets and prefixes, and flag anomalies (new access patterns, outdated SDKs with known vulnerabilities, boundary violations). This gives you a current-state IAM access map that reflects actual behavior, not just what the IAM policy permits.
IAM policies tell you what's allowed. S3 access logs tell you what's actually happening. For most data environments, there's a significant gap between the two.
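The grouping and flagging steps can be sketched in a few lines. This example assumes the logs have already been reduced to (role, bucket, key) tuples, rolls each key up to its first two path segments (an arbitrary depth chosen for illustration), and compares a current window against a baseline window to find first-time access. All function names here are hypothetical.

```python
from collections import defaultdict


def prefix_of(key, depth=2):
    """Roll an object key up to its first `depth` directory segments."""
    dirs = key.split("/")[:-1]  # drop the object name itself
    head = dirs[:depth]
    return "/".join(head) + "/" if head else ""


def build_access_map(records, depth=2):
    """Count requests per role and (bucket, prefix) target.

    `records` is an iterable of (role, bucket, key) tuples.
    """
    access = defaultdict(lambda: defaultdict(int))
    for role, bucket, key in records:
        access[role][(bucket, prefix_of(key, depth))] += 1
    return access


def first_time_access(baseline, current):
    """List (role, bucket, prefix) combinations absent from the baseline."""
    new = []
    for role, targets in current.items():
        seen = baseline.get(role, {})
        for target in targets:
            if target not in seen:
                new.append((role, *target))
    return sorted(new)


if __name__ == "__main__":
    # Illustrative data only: last month as baseline, this week as current.
    baseline = build_access_map([
        ("example-etl-role", "example-lake", "raw/events/a.parquet"),
    ])
    current = build_access_map([
        ("example-etl-role", "example-lake", "raw/events/b.parquet"),
        ("example-etl-role", "example-lake", "curated/pii/c.parquet"),
        ("example-bi-role", "example-lake", "curated/sales/d.parquet"),
    ])
    for finding in first_time_access(baseline, current):
        print(finding)
```

A role that is entirely new shows up here as a set of first-time combinations, which matches how a new deployment usually looks in practice. The per-target counts in the map are also the raw material for spotting frequency spikes or drops between windows.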
What reCost surfaces automatically
- Role-to-bucket access map across your entire S3 environment
- SDK version detection per role with CVE flagging for outdated SDK versions
- First-time access alerts: new role, new bucket, or new prefix combinations
- Browser session detection in production prefixes
- Access frequency trending per role over time
How to act on IAM monitoring findings
When reCost surfaces an outdated SDK version with known CVEs, the fix is updating the application's dependency. When it surfaces a boundary violation (a role accessing a production prefix it shouldn't), the fix is updating the IAM policy or the application configuration. When it surfaces a first-time browser session on a production prefix, the fix is understanding who did it and whether it should be blocked.
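For the boundary-violation case, tightening the policy often means scoping read access to a single prefix. Below is a hedged sketch that builds such a policy document as a Python dict. The helper name, bucket, and prefix are hypothetical, and a real role will likely need more actions than the two shown here.

```python
import json


def prefix_scoped_read_policy(bucket, prefix):
    """Build a read-only IAM policy limited to one prefix of one bucket."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket, then narrowed by condition.
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads are narrowed through the resource ARN itself.
                "Sid": "ReadOnlyThisPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix names.
    policy = prefix_scoped_read_policy("example-data-lake", "dev/events/")
    print(json.dumps(policy, indent=2))
```

Comparing a policy like this against the prefixes the role actually touched in the logs shows whether the role can be narrowed without breaking the job.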
None of these are hard to fix once you know about them. The challenge is always knowing about them before they become incidents.
Connect reCost to your S3 environment in 5 minutes
No agents, no code changes. Just your S3 access logs and a complete picture of your data lake health.