The Open Source Trust Problem
When you add a dependency, you implicitly trust every contributor who has ever committed to that project, and every maintainer who has ever merged a PR. For a popular library that has been maintained for years, often by a single volunteer, that trust chain is long, spread across many people, and largely unaudited.
The open source security model works reasonably well for unintentional vulnerabilities. It breaks down for deliberately malicious contributions, because the social and technical mechanisms that catch bugs (many eyes, automated testing) don't reliably catch a stealthy backdoor inserted by a trusted contributor.
Maintainer Account Takeover
The fastest path to compromising a popular library is compromising the maintainer's account on npm, PyPI, or GitHub. Methods:
- Credential stuffing: maintainers reuse passwords, so if a breach of another service exposes their credentials, the registry account may be accessible
- Phishing: targeted spear phishing of maintainers is well documented; some campaigns use fake security advisory emails asking maintainers to "verify" access
- Package handoff: an attacker contacts a burned-out maintainer, offers to take over maintenance of a popular but neglected package, then publishes a malicious version after gaining trust
- Malicious GitHub App authorisations: tricking maintainers into authorising a malicious GitHub App that then has write access to the repository
Historically, most npm maintainer accounts had no MFA enabled, and npm only began enforcing MFA for maintainers of the top 100 packages by download count in 2022. For years, a large share of critical infrastructure depended on packages protected by nothing more than a single password.
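Both a package handoff and an account takeover usually show up as a change in who publishes releases. The sketch below flags any release whose maintainer set includes someone who never appeared in an earlier release. It is a minimal illustration only: the function name and the input shape (an oldest-first list of version strings paired with sets of maintainer names) are hypothetical, and you would need to build that list yourself from whatever registry metadata you collect.

```python
from typing import Iterable


def flag_new_maintainers(
    history: Iterable[tuple[str, set[str]]],
) -> list[tuple[str, set[str]]]:
    """Return (version, newcomers) for every release that adds a maintainer
    who never appeared in an earlier release.

    `history` must be ordered oldest-first. Its shape is hypothetical: build
    it from whatever registry metadata you collect.
    """
    seen: set[str] = set()
    flagged: list[tuple[str, set[str]]] = []
    for index, (version, maintainers) in enumerate(history):
        newcomers = maintainers - seen
        # The first release has no baseline to compare against, so skip it.
        if index > 0 and newcomers:
            flagged.append((version, newcomers))
        seen |= maintainers
    return flagged


history = [
    ("1.0.0", {"alice"}),
    ("1.1.0", {"alice"}),
    ("2.0.0", {"alice", "mallory"}),  # a new co-maintainer appears
    ("2.0.1", {"mallory"}),           # the original maintainer drops off
]
print(flag_new_maintainers(history))  # [('2.0.0', {'mallory'})]
```

A flag is not proof of compromise; legitimate handovers look identical. The point is to make the change visible so someone reviews the first release from the new publisher.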
Malicious Pull Requests
Contributing to a project over time to build trust before submitting a malicious PR is a documented attack pattern. The attacker:
- Makes several legitimate, high-quality contributions over weeks or months
- Builds a reputation as a trusted contributor
- Submits a PR that fixes a real bug or adds a real feature, but includes a small, stealthy malicious change in a part of the codebase the reviewer is less familiar with
- The PR gets merged on the strength of the attacker's contribution history
The malicious change is often placed in a non-obvious location: deep in a utility module, inside a conditional that only triggers in specific environments, or in a build script rather than the main library code.
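Reviewers can use a crude triage aid to surface added lines that gate behaviour on the runtime environment, one of the placements described above. This is a minimal sketch, assuming a unified diff as input. The pattern list covers only a few Python and Node.js idioms, is illustrative rather than complete, and will produce false positives on legitimate configuration code. The _warm_cache() call in the sample diff is hypothetical.

```python
import re

# Illustrative patterns only: conditions that branch on the runtime
# environment. A real review aid would need a tuned, language-aware list.
ENV_GATE_PATTERNS = [
    re.compile(r"\bos\.environ\b"),
    re.compile(r"\bos\.getenv\("),
    re.compile(r"\bsys\.platform\b"),
    re.compile(r"\bplatform\.system\("),
    re.compile(r"\bprocess\.env\b"),
    re.compile(r"\bprocess\.platform\b"),
]


def env_gated_additions(unified_diff: str) -> list[tuple[str, str]]:
    """Return (file, added line) pairs for added lines that reference the environment."""
    hits: list[tuple[str, str]] = []
    current_file = "?"
    lines = unified_diff.splitlines()
    for i, raw in enumerate(lines):
        # A "+++ " line directly after a "--- " line is a file header, not an addition.
        if raw.startswith("+++ ") and i > 0 and lines[i - 1].startswith("--- "):
            path = raw[4:].strip()
            current_file = path[2:] if path.startswith("b/") else path
        elif raw.startswith("+"):
            added = raw[1:]
            if any(p.search(added) for p in ENV_GATE_PATTERNS):
                hits.append((current_file, added.strip()))
    return hits


sample_diff = """\
--- a/src/utils/strings.py
+++ b/src/utils/strings.py
@@ -1,2 +1,4 @@
 def slugify(s):
+    if sys.platform == "linux" and os.getenv("CI") is None:
+        _warm_cache()
     return s.lower()
"""
for path, line in env_gated_additions(sample_diff):
    print(f"{path}: {line}")
```

An environment check inside a string utility is exactly the kind of line that deserves a second look, even though most such checks turn out to be benign.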
Stealthy Commit Techniques
Techniques for hiding malicious code in commits include:
- Whitespace manipulation: hiding code in trailing whitespace, exploiting tab/space differences that change Python indentation, or using non-printing characters
- Test-only placement: inserting malicious code in test files or test utilities that ship with the package but tend to receive less scrutiny
- Binary file changes: modifying compiled assets, certificates, or data files that can't be meaningfully reviewed in a standard PR diff
- Build script injection: modifying build scripts (Makefile, setup.py, configure.ac) that run during compilation
- Merge commit hiding: exploiting GitHub's merge commit to introduce changes that weren't in the reviewed PR
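Several of the whitespace tricks above can be checked mechanically before review. This minimal sketch, using only the Python standard library, reports invisible format characters (Unicode category Cf, which includes zero-width characters and bidirectional control characters), lines whose indentation mixes tabs and spaces, and lines with a long run of whitespace between visible characters. The 40-character gap threshold is an arbitrary assumption to tune, and run() in the sample is a hypothetical name.

```python
import re
import unicodedata

# Hypothetical threshold: a gap of 40+ spaces or tabs between two visible
# characters can push trailing code past the edge of a review window.
LONG_GAP = re.compile(r"\S[ \t]{40,}\S")


def invisible_characters(text: str) -> list[tuple[int, int, str]]:
    """(line, column, name) for each invisible format character (category Cf).

    Cf includes zero-width characters and the bidirectional controls that
    can make code render differently from how it is parsed.
    """
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                findings.append((lineno, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return findings


def mixed_indentation(text: str) -> list[int]:
    """Line numbers whose leading whitespace mixes tabs and spaces."""
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            flagged.append(lineno)
    return flagged


def long_whitespace_gaps(text: str) -> list[int]:
    """Line numbers containing a suspiciously long run of inner whitespace."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if LONG_GAP.search(line)]


sample = "x = 1\u200b\nif True:\n\t    y = 2\nz = 3" + " " * 60 + "; run()\n"
print(invisible_characters(sample))  # [(1, 6, 'ZERO WIDTH SPACE')]
print(mixed_indentation(sample))     # [3]
print(long_whitespace_gaps(sample))  # [4]
```

Running checks like these as a pre-merge CI step means the reviewer is told about invisible content rather than having to spot it.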
The xz-utils Case Study (2024)
The xz-utils backdoor is one of the most sophisticated supply chain attacks ever documented against an open source library. Key elements:
- The attacker, operating as "Jia Tan", contributed to the xz project for over two years, making legitimate improvements and eventually becoming a trusted co-maintainer
- The malicious payload was hidden across multiple commits, including in binary test files that couldn't be easily reviewed
- The backdoor targeted systemd-linked builds of sshd, a deliberately narrow target chosen to avoid widespread detection
- It was discovered by a Microsoft engineer who noticed unusual sshd CPU usage during routine benchmarking, not by any security scanner
What xz-utils reveals: a patient, sophisticated attacker can sustain a multi-year infiltration of a critical open source project, evade all automated scanning, and come close to getting an SSH backdoor into stable releases of major Linux distributions. Standard CVE-based SCA would never catch this, because no CVE exists for a backdoor until after it has been discovered.
Detection and Defences
No single control prevents this class of attack. The defence is a layered combination:
- Pin exact versions: specify exact versions and commit the lock file, rather than relying on semver ranges that automatically pull in new releases
- Review dependency diffs: when upgrading a dependency, review what changed in the library itself, not just your own code changes
- Verify package signatures: Sigstore and npm's provenance features let you verify that a package was built from a specific commit by a specific workflow
- Monitor for unexpected new capabilities: a library update that suddenly introduces network calls, process spawning, or file system access deserves scrutiny whether or not it has a CVE
- Binary file scanning: scan source repositories for executables, archives, and opaque blobs, which are common payload staging locations
- SBOM generation and monitoring: generate an SBOM on every build and alert on unexpected new transitive dependencies
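A CI check for the version-pinning defence could reject range specifiers in the manifest. This is a minimal sketch, assuming an npm-style package.json. The regular expression accepts only plain MAJOR.MINOR.PATCH versions with optional pre-release and build suffixes, so it conservatively flags everything else, including git URLs and dist-tags such as latest.

```python
import json
import re

# Exact SemVer version: MAJOR.MINOR.PATCH with optional pre-release and build
# suffixes. Ranges (^1.2.3, ~1.2.3, >=1.0.0, 1.x, *), dist-tags (latest) and
# URLs all let the resolver pick a version nobody has reviewed.
EXACT_VERSION = re.compile(
    r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)

SECTIONS = ("dependencies", "devDependencies", "optionalDependencies")


def unpinned_dependencies(package_json: str) -> dict[str, str]:
    """Map dependency name -> spec for every spec that is not an exact version."""
    manifest = json.loads(package_json)
    unpinned: dict[str, str] = {}
    for section in SECTIONS:
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec.strip()):
                unpinned[name] = spec
    return unpinned


example = """{
  "dependencies": {"left-pad": "1.3.0", "lodash": "^4.17.21"},
  "devDependencies": {"jest": "~29.7.0"}
}"""
print(unpinned_dependencies(example))  # {'lodash': '^4.17.21', 'jest': '~29.7.0'}
```

A manifest check only covers direct dependencies. Committing package-lock.json and installing with npm ci, which installs exactly what the lock file records, is what pins the transitive tree.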
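Monitoring for new capabilities can start with something as simple as comparing imports across versions. The sketch below parses two versions of a Python module with the standard ast module and reports sensitive imports that appear only in the new one. The watch-list is a hypothetical starting point, and the check misses dynamic imports (for example via importlib or __import__), so treat a hit as a signal for review rather than the absence of hits as a guarantee.

```python
import ast

# Hypothetical watch-list of top-level modules that grant network, process,
# or native-code capabilities. Tune it for your own dependencies.
SENSITIVE_MODULES = {
    "socket", "subprocess", "ctypes", "multiprocessing",
    "urllib", "http", "ftplib", "smtplib", "requests",
}


def imported_modules(source: str) -> set[str]:
    """Top-level names of every module a Python source file imports statically."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) refer to the package itself; skip them.
            modules.add(node.module.split(".")[0])
    return modules


def new_capabilities(old_source: str, new_source: str) -> set[str]:
    """Sensitive modules imported by the new version but not by the old one."""
    added = imported_modules(new_source) - imported_modules(old_source)
    return added & SENSITIVE_MODULES


old_version = "import json\n\ndef load(path):\n    return json.load(open(path))\n"
new_version = (
    "import json\nimport subprocess\nfrom urllib.request import urlopen\n\n"
    "def load(path):\n    return json.load(open(path))\n"
)
print(sorted(new_capabilities(old_version, new_version)))  # ['subprocess', 'urllib']
```

Because ast.walk visits every node, imports hidden inside functions or conditionals are caught too, which matters for the environment-gated placements described earlier.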
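For binary file scanning, a first pass can identify executables and archives by their leading magic bytes rather than by file extension, which an attacker controls. This minimal sketch walks a source tree (skipping .git) and labels matches. The signature table is deliberately short, the NUL-byte fallback is a heuristic, and two-byte signatures such as MZ will occasionally match ordinary files.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes of common executable and archive formats.
MAGIC_SIGNATURES = {
    b"\x7fELF": "ELF executable",
    b"MZ": "PE/DOS executable",
    b"\xfe\xed\xfa\xce": "Mach-O executable",
    b"\xce\xfa\xed\xfe": "Mach-O executable",
    b"\xfe\xed\xfa\xcf": "Mach-O executable",
    b"\xcf\xfa\xed\xfe": "Mach-O executable",
    b"PK\x03\x04": "zip archive",
    b"\x1f\x8b": "gzip stream",
    b"\xfd7zXZ\x00": "xz stream",
    b"BZh": "bzip2 stream",
    b"7z\xbc\xaf\x27\x1c": "7z archive",
}


def classify_blob(head: bytes) -> Optional[str]:
    """Label the first bytes of a file, or return None if it looks like text."""
    for magic, label in MAGIC_SIGNATURES.items():
        if head.startswith(magic):
            return label
    # Heuristic: text files rarely contain NUL bytes.
    if b"\x00" in head:
        return "unrecognised binary"
    return None


def scan_tree(root: str) -> list[tuple[str, str]]:
    """Return (relative path, label) for every non-text file under root."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        relative = path.relative_to(root)
        if ".git" in relative.parts or not path.is_file():
            continue
        with path.open("rb") as handle:
            label = classify_blob(handle.read(8192))
        if label is not None:
            findings.append((relative.as_posix(), label))
    return findings
```

Run scan_tree on a checkout, or on an unpacked release tarball, and review every hit by hand; fixtures that legitimately need binaries can be kept on an explicit allow-list so that new ones stand out.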
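To alert on unexpected new transitive dependencies, compare the SBOM from each build with the previous one. This sketch assumes CycloneDX-style JSON with a flat top-level components array whose entries carry name, version, and optional group fields. Nested components and multiple versions of the same package are not handled, and the package names in the example are placeholders.

```python
import json


def component_index(sbom_json: str) -> dict[str, str]:
    """Map component key -> version from a CycloneDX-style JSON SBOM.

    Assumes a flat top-level "components" array. Nested components and
    several versions of one package are not handled: the last entry wins.
    """
    index: dict[str, str] = {}
    for component in json.loads(sbom_json).get("components", []):
        group = component.get("group")
        key = f"{group}/{component['name']}" if group else component["name"]
        index[key] = component.get("version", "")
    return index


def sbom_diff(old_json: str, new_json: str) -> dict[str, dict[str, str]]:
    """Components added, removed, or re-versioned between two builds."""
    old, new = component_index(old_json), component_index(new_json)
    return {
        "added": {k: v for k, v in new.items() if k not in old},
        "removed": {k: v for k, v in old.items() if k not in new},
        "changed": {
            k: f"{old[k]} -> {v}" for k, v in new.items() if k in old and old[k] != v
        },
    }


previous = json.dumps({"components": [
    {"name": "pkg-a", "version": "1.0.0"},
    {"name": "pkg-b", "version": "2.0.0"},
]})
current = json.dumps({"components": [
    {"name": "pkg-a", "version": "1.0.0"},
    {"name": "pkg-b", "version": "2.1.0"},
    {"name": "pkg-c", "version": "0.1.0"},  # new transitive dependency
]})
report = sbom_diff(previous, current)
if report["added"]:
    print("ALERT: new components", report["added"])
```

Alerting on "added" rather than on every version bump keeps the signal manageable: routine upgrades show up under "changed", while a dependency nobody asked for stands out.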