Detecting Obfuscated Malware in Source Code: Patterns, Tools, and CI Gates

Why Obfuscation Works Against Code Review

Code review is a human process with human cognitive limits. A reviewer reading hundreds of lines of code looks for correctness and intent — not for a 200-character base64 string that decodes to a network call hidden inside what appears to be a configuration constant.

Obfuscation exploits the asymmetry between writing and reading malicious code: it takes seconds to encode a payload and minutes to decode it during review. Most reviewers won't bother decoding a suspicious string manually — they'll note it, assume it's legitimate, and move on.

Obfuscation is not just a malware technique. Legitimate minification and bundling create high-entropy, low-readability code. Detection tools must distinguish between intentional obfuscation in build artefacts (acceptable) and obfuscation in source files (suspicious).

Encoding and Encoding Chains

The simplest obfuscation is a single base64-encoded string that is decoded and executed at runtime. Detection is straightforward — look for decode operations applied to long string literals.

Single-layer base64 payload (Python)
import base64
# The long string below decodes to: __import__('os').system('curl ...')
exec(base64.b64decode(
  'X19pbXBvcnRfXygn...'  # truncated
).decode())

Multi-layer encoding chains are harder to detect — each layer decodes to the next layer's decoder:

Multi-layer encoding chain (JavaScript)
// Layer 1: hex-encoded Layer 2 decoder
const l2 = Buffer.from('66756e6374696f6e...', 'hex').toString();
// Layer 2: base64-encoded Layer 3 payload
eval(eval(l2)('aHR0cHM6...'));

Detection rule: flag any code where a decode operation (base64, hex, URL encoding) is immediately passed to an execution function (eval, exec, Function(), subprocess, os.system).
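
The rule above can be sketched for Python sources with the standard-library ast module. This is a minimal illustration, not a complete detector: the DECODERS and EXECUTORS name sets and the find_decode_then_execute helper are assumptions made for this example, and the check only sees a decode call nested directly inside the arguments of an execution call.

```python
import ast

# Assumed name sets for this sketch; extend them for your own codebase.
DECODERS = {"b64decode", "b32decode", "a85decode", "unhexlify", "fromhex", "unquote"}
EXECUTORS = {"eval", "exec", "system", "popen", "run", "call", "check_output"}


def call_name(node):
    """Return the rightmost name of a call target: base64.b64decode -> 'b64decode'."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def find_decode_then_execute(source):
    """Return line numbers of execution calls whose arguments contain a decode call."""
    hits = []
    # ast.parse only builds a syntax tree; the scanned code is never executed.
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and call_name(node) in EXECUTORS):
            continue
        args = list(node.args) + [kw.value for kw in node.keywords]
        for arg in args:
            if any(
                isinstance(inner, ast.Call) and call_name(inner) in DECODERS
                for inner in ast.walk(arg)
            ):
                hits.append(node.lineno)
                break
    return hits
```

Matching on the rightmost name keeps the sketch short but makes it noisy: any method called run or call will be treated as an executor. It also misses payloads that are decoded into a variable first and executed on a later line, which would need data-flow tracking.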

Eval and Dynamic Code Execution

Dynamic code execution is rarely legitimate in production source code. The pattern eval(some_variable) or new Function(string)() is almost always a red flag in source files (as opposed to build tooling).

JavaScript: eval(), new Function(), setTimeout(string), setInterval(string)
Python: exec(), eval(), compile() + exec(), __import__() with computed (non-literal) string arguments
Ruby: eval, instance_eval, class_eval with string arguments
PHP: eval(), preg_replace with the /e modifier (removed in PHP 7.0), assert() with a string argument (removed in PHP 8.0); both still matter when scanning legacy code

Context matters. A REPL, test framework, or template engine may legitimately use eval. Flag dynamic execution in combination with obfuscation signals (encoded strings, network calls), not in isolation.
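
One way to express "in combination, not in isolation" is to collect independent signals per file and flag only when dynamic execution co-occurs with at least one other signal. The sketch below does this for Python sources; the signal names, the three name sets, and the collect_signals and should_flag helpers are all assumptions chosen for illustration.

```python
import ast

# Assumed signal definitions for this sketch.
DYNAMIC_EXEC = {"eval", "exec", "compile", "__import__"}
DECODE_CALLS = {"b64decode", "unhexlify", "fromhex"}
NETWORK_MODULES = {"socket", "urllib", "requests", "http"}


def collect_signals(source):
    """Return the set of signal names present in a Python source string."""
    signals = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if isinstance(func, ast.Name) and name in DYNAMIC_EXEC:
                # A literal argument such as eval("1 + 1") is visible to the reviewer,
                # so only non-literal arguments count as a signal here.
                if node.args and not isinstance(node.args[0], ast.Constant):
                    signals.add("dynamic_exec")
            elif name in DECODE_CALLS:
                signals.add("decode")
        elif isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in NETWORK_MODULES for alias in node.names):
                signals.add("network")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in NETWORK_MODULES:
                signals.add("network")
    return signals


def should_flag(source):
    """Flag only when dynamic execution co-occurs with another obfuscation signal."""
    signals = collect_signals(source)
    return "dynamic_exec" in signals and len(signals) > 1
```

With these definitions, a template engine that calls eval(expression) on its own is not flagged, while the same call in a file that also decodes base64 or imports a network module is.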

Unicode Homoglyphs and Zero-Width Characters

Unicode attacks are particularly dangerous because the characters involved are invisible, or visually indistinguishable from ASCII, in any editor or review tool that does not explicitly highlight them. Common techniques:

Homoglyph substitution — replace ASCII letters with visually identical Unicode characters. pаypal with a Cyrillic 'а' looks identical to paypal but is a different identifier
Zero-width characters — insert invisible characters (U+200B, U+FEFF, U+200C) inside identifiers or strings to change their runtime value without affecting visual appearance
Bidirectional text (Trojan Source) — use Unicode bidirectional control characters to visually reorder code so it reads differently to a human reviewer than to the compiler

Trojan Source example, bidirectional override (Python)
# What the reviewer sees vs what the compiler executes diverge
# due to Unicode bidi control characters — not visible in plain text
access_level = "user‮ ⁦# Check if admin⁩ ⁦"
# The string above contains bidi overrides that make it
# visually appear to be a comment in some editors
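
All three techniques can be surfaced by walking a file character by character. The following is a rough sketch rather than a vetted scanner: the scan_unicode helper and its two code point sets are assumptions for this example, and treating every non-ASCII letter as a possible homoglyph is deliberately crude.

```python
import unicodedata

# Zero-width code points named in this section, plus U+200D (zero-width joiner).
INVISIBLE = {0x200B, 0x200C, 0x200D, 0xFEFF}
# Bidi embeddings and overrides (U+202A to U+202E) and isolates (U+2066 to U+2069).
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))


def scan_unicode(text):
    """Return (line, column, code point label, reason) for each suspicious character."""
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if cp in BIDI_CONTROLS:
                reason = "bidi control"
            elif cp in INVISIBLE:
                reason = "zero-width character"
            elif ch.isalpha() and cp > 0x7F:
                # Non-ASCII letter: possible homoglyph; the Unicode name shows its script.
                reason = unicodedata.name(ch, "unknown")
            else:
                continue
            findings.append((line_no, col, f"U+{cp:04X}", reason))
    return findings
```

Run against an identifier such as pаypal, the scanner reports the second character as CYRILLIC SMALL LETTER A. Codebases with legitimate non-English identifiers, strings, or comments would need the letter check narrowed, for example to identifiers that mix scripts.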

Entropy Analysis

Shannon entropy measures the average information per character in a string. Legitimate code has relatively low entropy: identifiers, keywords, and comments are human-readable and reuse a small set of characters in predictable proportions. Encrypted or compressed payloads look close to uniformly random, so once encoded as text they approach the maximum entropy of the encoding alphabet: 6 bits per character for base64, 4 for hex.

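A practical starting threshold for source code review: flag strings longer than 64 characters with Shannon entropy above 4.5 bits per character. This catches base64-encoded payloads, including encrypted shellcode stored as base64, while limiting false positives on normal code. Hex-encoded payloads need a separate, lower threshold: a hex string uses only 16 symbols, so its entropy can never exceed 4 bits per character.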

Entropy calculation (Python)
import math
from collections import Counter

def entropy(s):
    counts = Counter(s)
    total = len(s)
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts.values()
    )

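# Flag rule from the text: len > 64 AND entropy > 4.5.
# This sample is only 44 characters long, so it illustrates the entropy
# calculation but would fall below the length cutoff on its own.
SUSPICIOUS_STRING = "aHR0cHM6Ly9hdHRhY2tlci5leGFtcGxlL3BheWxvYWQ="
print(round(entropy(SUSPICIOUS_STRING), 2))  # → 4.66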
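
To apply the length and entropy thresholds to a whole file rather than one string, the string literals can be pulled out of the syntax tree first. The sketch below assumes Python sources; the high_entropy_literals helper and the two constants are names chosen for this example, and the entropy function is repeated so the block stands alone.

```python
import ast
import math
from collections import Counter

MIN_LENGTH = 64    # characters, threshold from the text
MIN_ENTROPY = 4.5  # bits per character, threshold from the text


def shannon_entropy(s):
    """Shannon entropy of a string in bits per character."""
    if not s:
        return 0.0
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in Counter(s).values())


def high_entropy_literals(source):
    """Return (line, length, entropy) for each long, high-entropy string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            value = node.value
            if len(value) > MIN_LENGTH:
                h = shannon_entropy(value)
                if h > MIN_ENTROPY:
                    findings.append((node.lineno, len(value), round(h, 2)))
    return findings
```

Working on parsed literals rather than raw lines means a payload split across implicit string concatenation is still measured as one value, at the cost of only handling files that parse as valid Python.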

Detection Tooling and CI Gates

A layered detection approach covers different obfuscation classes:

Semgrep custom rules: write rules for the decode-then-execute pattern, dynamic eval, and network calls combined with encoding
git-secrets / truffleHog: built primarily for secret detection. git-secrets matches configured regex patterns; truffleHog's entropy-based checks, where enabled, can also catch encoded payloads
detect-secrets: Yelp's tool detects high-entropy strings across many file types, with separate entropy limits for base64 and hex strings
Unicode scanner: grep for zero-width and bidirectional control characters, including the isolates U+2066 to U+2069 used in Trojan Source attacks: grep -rP "[\x{200B}\x{200C}\x{200D}\x{FEFF}\x{202A}-\x{202E}\x{2066}-\x{2069}]" src/ (requires GNU grep with PCRE support and a UTF-8 locale)
AquilaX SAST: combines entropy analysis, AST-based eval detection, and behavioural pattern matching to catch obfuscated malware as part of standard scanning

Obfuscation detection creates false positives. Legitimate base64-encoded configuration values, embedded fonts, and minified vendor files will trigger entropy rules. Build an allowlist for known legitimate high-entropy strings, and enforce that new additions require review sign-off.
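
A CI gate built on such an allowlist can be sketched as a filter plus an exit code. This example is an assumption about how one might wire it, not a description of any particular tool: the findings format, the choice to key the allowlist by SHA-256 digest, and the fingerprint, gate, and main helpers are all hypothetical.

```python
import hashlib


def fingerprint(value):
    """Stable identifier for a string, so the allowlist never stores the raw value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def gate(findings, allowlist):
    """Return the findings whose fingerprints are not on the reviewed allowlist.

    findings: iterable of (path, line, value) tuples from any scanner.
    allowlist: set of SHA-256 hex digests approved during review.
    """
    return [f for f in findings if fingerprint(f[2]) not in allowlist]


def main(findings, allowlist):
    """Print unapproved findings and return a process exit code for CI."""
    blocked = gate(findings, allowlist)
    for path, line, value in blocked:
        print(f"{path}:{line}: unapproved high-entropy string {fingerprint(value)[:12]}")
    return 1 if blocked else 0
```

In a pipeline, a wrapper script would pass the return value of main to sys.exit so that any unapproved string fails the build. Keeping the allowlist file under version control, with changes to it requiring the same sign-off as code, is what enforces the review step.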
