The Scale of the Problem

Human identity security has been a first-class concern for decades. MFA, SSO, JIT access, PAM systems, and phishing-resistant authentication are mature disciplines. Non-human identity security is not. Most organisations that can enumerate their human users precisely cannot tell you how many service accounts, API keys, or OAuth clients exist across their cloud environments.

The disparity in numbers is staggering. A medium-sized engineering organisation with 200 engineers may have 200 human IAM users and upward of 9,000 non-human identities: Lambda execution roles, ECS task roles, GitHub Actions OIDC subjects, Terraform service accounts, database connection strings, third-party SaaS webhooks, CI runner tokens, deployment pipeline credentials, and the API keys that nobody knows who originally created or why.

Governance vacuum: Human identity lifecycle management (onboarding, offboarding, access reviews) is well defined in most organisations. Non-human identity lifecycle management is typically undefined. When an engineer leaves, their MFA device is revoked. The service account they created for a project that was cancelled is not.

Serverless Token Theft

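Serverless functions present a specific NHI attack vector. AWS Lambda functions, GCP Cloud Functions, and Azure Functions execute under execution roles: IAM identities (called service accounts on GCP and managed identities on Azure) that the platform automatically supplies with temporary credentials. On Lambda the credentials arrive as environment variables; on GCP and Azure the function fetches tokens from a local metadata or identity endpoint. Either way, these credentials are available to any code executing inside the function, including code that the developer did not write, such as third-party dependencies.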
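To make the exposure concrete for AWS Lambda specifically: the runtime places the execution role's temporary credentials in three well-known environment variables, so any code in the process can see them. The sketch below is illustrative only. The helper name `visible_credential_vars` is hypothetical, and it reports variable names, never values.

```python
import os
from typing import Mapping

# Environment variables the AWS Lambda runtime uses to hand the execution
# role's temporary credentials to the function process.
LAMBDA_CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def visible_credential_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names (never the values) of credential variables that are set.

    Hypothetical helper: it only demonstrates that no privilege boundary
    separates application code, dependencies, and the role's credentials.
    """
    return [name for name in LAMBDA_CREDENTIAL_VARS if env.get(name)]
```

Called from a dependency three levels deep in the import graph, this would return the same list as when called from the handler, which is why least-privilege execution roles matter more than trust in any single code path.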

An attacker who can achieve code execution inside a serverless function, for example through a vulnerability in the function's own code or in one of its dependencies, can extract the execution role's temporary credentials and use them outside the function context for as long as they remain valid (up to 12 hours for assumed roles). The function's network security controls are irrelevant once the attacker has the credentials in hand.

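  1. RCE: The attacker achieves code execution in the Lambda function via an application vulnerability.
  2. Token theft: The attacker reads the temporary credentials from the metadata endpoint or environment variables.
  3. Exfiltration: The attacker sends the credentials out over HTTP (the function has outbound access).
  4. Pivot: The attacker uses the role outside the function to access S3, DynamoDB, and secrets.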

The defence is not just restricting the execution role's permissions (though that is necessary). It is also detecting anomalous use of execution role credentials from outside the expected invocation context. AWS CloudTrail records the source IP address of each API call it logs. An execution role whose credentials are suddenly being used from an IP address outside the ranges your functions normally call from (for VPC-attached functions, your NAT gateway) is a strong signal of credential theft.
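The source-IP check described above can be sketched as a filter over CloudTrail records that have already been parsed into dictionaries. This is a minimal sketch, not a detection product: the function name, the role name fragment, and the CIDR ranges are all assumptions for illustration. The only CloudTrail fields it relies on are `userIdentity.type`, `userIdentity.arn`, and `sourceIPAddress`.

```python
import ipaddress


def find_off_network_calls(records, role_arn_fragment, expected_cidrs):
    """Return records where the role's credentials were used from an unexpected IP.

    records:           iterable of CloudTrail event dicts
    role_arn_fragment: substring identifying the execution role (hypothetical value)
    expected_cidrs:    CIDR strings your functions legitimately call from
    """
    networks = [ipaddress.ip_network(cidr) for cidr in expected_cidrs]
    suspicious = []
    for record in records:
        identity = record.get("userIdentity", {})
        # Execution role sessions appear in CloudTrail as assumed-role identities.
        if identity.get("type") != "AssumedRole":
            continue
        if role_arn_fragment not in identity.get("arn", ""):
            continue
        source = record.get("sourceIPAddress", "")
        try:
            address = ipaddress.ip_address(source)
        except ValueError:
            # When an AWS service makes the call, this field holds a service
            # hostname rather than an IP address; this sketch skips those.
            continue
        if not any(address in network for network in networks):
            suspicious.append(record)
    return suspicious


# Example with documentation-reserved addresses (all values are made up).
sample_records = [
    {"eventName": "GetObject",
     "userIdentity": {"type": "AssumedRole",
                      "arn": "arn:aws:sts::123456789012:assumed-role/orders-fn-role/orders-fn"},
     "sourceIPAddress": "203.0.113.10"},
    {"eventName": "ListBuckets",
     "userIdentity": {"type": "AssumedRole",
                      "arn": "arn:aws:sts::123456789012:assumed-role/orders-fn-role/orders-fn"},
     "sourceIPAddress": "198.51.100.7"},
]
alerts = find_off_network_calls(sample_records, "assumed-role/orders-fn-role/", ["203.0.113.0/24"])
```

In practice the expected ranges would come from your own network configuration, and an alert would feed a review queue rather than trigger automatic revocation, since legitimate calls can originate from new addresses after infrastructure changes.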

CI/CD Credential Exposure

CI/CD systems are the most credential-dense environments in most organisations. A typical pipeline has access to version control tokens, artifact registry credentials, cloud provider deployment keys, database connection strings, and SaaS API keys for notification and monitoring systems. These credentials are accessed hundreds of times a day by automated processes, so a malicious access is hard to distinguish from the background noise of legitimate use.

The three most common CI credential exposure vectors are: build log exposure (secrets printed to stdout by a step that was debugging), pull request workflow triggers (PRs from forks executing in contexts with access to repository secrets in some CI configurations), and runner compromise (a compromised self-hosted runner has access to all secrets that any pipeline running on that runner is configured to use).
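For the first of these vectors, build log exposure, a post-build scan of the log output can catch secrets that a debugging step printed. The following is a rough sketch under stated assumptions: the two patterns (AWS access key IDs beginning `AKIA` or `ASIA`, and GitHub tokens with a `gh?_` prefix) are illustrative and far from exhaustive, and the function name `scan_log` is hypothetical. Dedicated secret scanners cover many more formats.

```python
import re

# Illustrative patterns only; real scanners maintain much larger rule sets.
SECRET_PATTERNS = {
    # AWS access key IDs: 20 characters, long-term keys start with AKIA,
    # temporary (STS) keys start with ASIA.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # GitHub tokens with a ghp_/gho_/ghu_/ghs_/ghr_ prefix (assumed format).
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
}


def scan_log(lines):
    """Return (line_number, pattern_label) pairs for lines that look like they leak a secret."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


# The AWS value below is the placeholder key used in AWS documentation.
example_log = [
    "Step 3/7: running integration tests",
    "DEBUG env: AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "Step 4/7: packaging",
]
example_findings = scan_log(example_log)
```

A scan like this is a backstop, not a fix: a secret that has reached a log should be treated as compromised and rotated, whether or not anyone is known to have read the log.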

GitHub's design decision to not expose repository secrets to workflows triggered from forks was correct, but has been worked around by developers who need fork-triggered workflows to have access to certain credentials. The "manual approval required" pattern for fork-triggered workflows that need secret access is the right default. Most teams that have disabled this protection did so for convenience without fully understanding the exposure.

Long-Lived Key Risks

Long-lived API keys and service account credentials share a fundamental problem: they accumulate risk over time. Every day a key exists is another day it might appear in a git commit, a log file, a support ticket, a Slack message, or a database backup. The older a key is, the more copies of it exist in places the original owner has forgotten about.

Research consistently finds that the average organisation has thousands of API keys that have not been rotated in over a year. A significant fraction have never been rotated at all. Keys created for a specific purpose by a developer who has since left the organisation are particularly dangerous: there is no owner to notice anomalous usage, and the key often has broader permissions than it currently needs.

NHI Inventory and Governance

You cannot secure what you cannot enumerate. The first step in NHI governance is building a complete inventory. This is harder than it sounds because NHIs are created across multiple systems, by multiple teams, for multiple purposes, and the authoritative source of truth is distributed across your cloud provider IAM, your CI/CD systems, your vault/secrets manager, and your source code repositories.

A practical NHI inventory includes for each identity: what it is (type, identifier), who created it and when, what it has access to (permissions), where its credential is stored, and when it was last used. The "last used" field is the most immediately actionable. Any NHI that has not been used in 90 days is a candidate for revocation, and revoking it usually carries little operational risk, provided you first rule out identities that serve infrequent jobs such as annual reports or disaster-recovery procedures.
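The inventory fields listed above map naturally onto a small record type, and the 90-day rule becomes a one-line filter. This is a sketch, not a schema recommendation: the class name `NHIRecord`, its field names, and the choice to fall back to the creation date for identities that have never been used are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class NHIRecord:
    identifier: str               # e.g. a role ARN or key ID
    kind: str                     # e.g. "lambda-execution-role", "api-key"
    created_by: str
    created_at: datetime
    permissions: list[str]
    credential_location: str      # where the credential is stored
    last_used: Optional[datetime] = None  # None means never observed in use


def revocation_candidates(records, now, max_idle_days=90):
    """Return records idle for longer than max_idle_days.

    An identity that has never been used is measured from its creation
    date, so a freshly created identity is not flagged immediately.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return [r for r in records if (r.last_used or r.created_at) < cutoff]


# Hypothetical inventory entries.
utc = timezone.utc
now = datetime(2024, 6, 1, tzinfo=utc)
inventory = [
    NHIRecord("deploy-key", "api-key", "alice", datetime(2023, 1, 10, tzinfo=utc),
              ["s3:PutObject"], "ci-secrets", last_used=datetime(2024, 5, 20, tzinfo=utc)),
    NHIRecord("old-webhook", "saas-webhook", "bob", datetime(2022, 3, 1, tzinfo=utc),
              ["tickets:write"], "vault", last_used=datetime(2024, 1, 1, tzinfo=utc)),
    NHIRecord("never-used", "service-account", "carol", datetime(2023, 1, 1, tzinfo=utc),
              ["db:read"], "unknown"),
]
stale = revocation_candidates(inventory, now)
```

The output is a review list rather than a revocation list: each candidate still needs an owner to confirm it is not tied to an infrequent process before the credential is removed.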

Mitigations

  1. Replace long-lived keys with OIDC-based short-lived tokens wherever possible: GitHub Actions, GitLab CI, and most major cloud providers support OIDC federation. Use it. A token that expires when the workflow run completes cannot be used for long-term access even if it is exfiltrated.
  2. Enforce 90-day automatic rotation for all long-lived keys: Automated rotation removes the human discipline requirement. Keys that cannot be automatically rotated (some legacy integration APIs) should be inventoried as technical debt requiring resolution.
  3. Restrict execution role permissions to the minimum required for each function: Lambda functions and container tasks should have named-resource-scoped IAM policies, not wildcard policies. s3:GetObject on a specific bucket, not s3:* on *.
  4. Enable CloudTrail/audit logging on all API calls and alert on NHI usage anomalies: Usage from unexpected IPs, at unexpected times, or for unexpected actions is the primary detection signal for credential theft. These signals are only visible if you have audit logging enabled and are monitoring it.
  5. Revoke NHIs when their purpose ends: When a project completes, a developer leaves, or a migration finishes, audit the NHIs created for that context and revoke the ones that are no longer needed. Make this part of your offboarding and project closure checklists.
  6. Block IMDSv1 and require IMDSv2: On EC2 instances and EC2-backed container hosts, IMDSv1 answers a simple HTTP GET with no additional authentication. IMDSv2 requires a PUT request to acquire a session token first, which defeats most SSRF-based credential theft, because a typical SSRF flaw lets the attacker issue only GET requests without custom headers. Enforce IMDSv2 at the account level.
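Mitigation 3 lends itself to an automated check. The sketch below flags wildcard actions and wildcard resources in an IAM policy document that has been loaded as a dictionary. It is deliberately simplistic: the function names are hypothetical, it inspects only `Allow` statements, and it ignores `NotAction`, `NotResource`, and conditions, all of which a real policy linter would need to consider.

```python
def _as_list(value):
    """IAM allows a single item or a list for Statement, Action, and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy):
    """Return human-readable findings for wildcard grants in an IAM policy dict."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue
        for action in _as_list(statement.get("Action")):
            # "*" grants every action; "service:*" grants every action in a service.
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: wildcard action {action}")
        for resource in _as_list(statement.get("Resource")):
            if resource == "*":
                findings.append(f"statement {index}: wildcard resource *")
    return findings


# The contrast from mitigation 3: s3:* on * versus s3:GetObject on one bucket.
# The bucket name is a placeholder.
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": ["s3:GetObject"],
                   "Resource": "arn:aws:s3:::example-bucket/*"}],
}
broad_findings = wildcard_findings(broad_policy)
scoped_findings = wildcard_findings(scoped_policy)
```

Run against every execution role policy in a CI step, a check like this turns least privilege from a review-time aspiration into a failing build, though some wildcard grants are legitimate and will need an explicit allowlist.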

Human identity security has decades of tooling and process maturity. Non-human identity security is at the same stage human identity security was in 2005. The attack surface is larger, the governance is weaker, and the signals are available but unmonitored. The gap between human and non-human identity security discipline is where most cloud-native attacks now live.