How to Detect Malware in a Git Repository: Tools, Scripts, and CI/CD Integration

What malware actually looks like in a Git repository

Most teams think of malware as an executable file attached to an email. In a Git repository context, the threat is different. Malicious code gets into repos through: supply chain attacks (a dependency is compromised and the lockfile updated), compromised contributor accounts pushing backdoored commits, malicious pull requests that add a few lines to a legitimate file, or automated dependency update bots being tricked into upgrading to a malicious version.

Common patterns to look for:

Encoded payloads: Base64, hex, or ROT13 strings decoded and executed at runtime — eval(atob(...)) in JavaScript, exec(base64.b64decode(...)) in Python.
Reverse shell one-liners hidden in build scripts, post-install hooks, or CI configuration.
Data exfiltration: curl or wget commands that pipe environment variables or file contents to an external host.
Committed binaries — ELF executables, compiled DLLs, or packed scripts — that don't belong in the repository.
Typosquatted package names added alongside legitimate ones (e.g., lodahs next to lodash).
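Typosquats are usually a single keystroke away from the real name. "lodahs" is one adjacent transposition from "lodash". Plain Levenshtein distance counts that as two edits, but the optimal string alignment (OSA) variant counts it as one. The sketch below assumes you maintain your own list of trusted package names. `KNOWN_PACKAGES` and the helper names are hypothetical placeholders. It flags dependency names from a package.json that sit within one edit of a trusted name:

```python
"""Flag dependency names that look like typosquats of known-good packages."""
import json

# Hypothetical allowlist; in practice, build it from packages your team already trusts.
KNOWN_PACKAGES = {"lodash", "express", "react", "requests", "numpy"}


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def find_typosquats(names, known=KNOWN_PACKAGES, max_distance=1):
    """Return (suspect, lookalike) pairs for names close to, but not equal to, a known name."""
    hits = []
    for name in sorted(names):
        if name in known:
            continue
        for good in sorted(known):
            if osa_distance(name, good) <= max_distance:
                hits.append((name, good))
    return hits


def dependency_names(package_json_text: str) -> set:
    """Collect names from the dependency sections of a package.json document."""
    manifest = json.loads(package_json_text)
    names = set()
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        names.update(manifest.get(section, {}))
    return names


example = '{"dependencies": {"lodash": "^4.17.21", "lodahs": "^1.0.0"}}'
print(find_typosquats(dependency_names(example)))  # [('lodahs', 'lodash')]
```

Very short package names produce more false positives at distance 1. Treat each hit as a prompt for review, not as proof.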

Real example: The event-stream npm package incident (2018) — a maintainer handed the package to a stranger who added a dependency (flatmap-stream) containing an encrypted payload that targeted a specific Bitcoin wallet app. The malicious code was in a transitive dependency, obfuscated, and passed all existing CI checks.

Quick grep patterns to run on your repo right now

These one-liners catch the most common malware patterns. Run them from your repository root:

Terminal: malware indicator patterns

# Encoded payloads executed at runtime
$ grep -rn "eval(base64" --include="*.js" --include="*.py" --include="*.php" .
$ grep -rn "exec(base64" --include="*.py" .
$ grep -rn "eval(atob(" --include="*.js" .

# Reverse shells and data exfiltration
$ grep -rn "bash -i >&" .
$ grep -rn "/dev/tcp/" .
$ grep -rnF "0.0.0.0:4" .
$ grep -rEn 'curl.*(env|/etc/passwd|\$HOME)' .

# Obfuscated strings (long base64-looking content in code)
$ grep -rEn "[A-Za-z0-9+/]{100,}={0,2}" --include="*.js" --include="*.py" .

# Suspicious postinstall hooks in package.json
$ grep -rn '"postinstall"' --include="package.json" .
$ grep -rn '"preinstall"' --include="package.json" .

# Outbound network calls in unexpected places
$ grep -rn "wget " --include="*.sh" --include="Makefile" .
$ grep -rEn "import requests|urllib" --include="setup.py" .
              

False positive rate: These patterns will produce false positives in legitimate codebases (e.g., eval used for dynamic configuration, Base64 for data encoding). Review each match in context. The goal of the grep pass is to reduce the search space, not to conclusively identify malware.
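If you want to rerun the grep pass as a script, for example in CI with the same patterns every time, here is a minimal Python sketch. The pattern dictionary is an illustrative subset of the one-liners above, not a complete ruleset, and `scan_tree` is a hypothetical helper name. Unlike `grep -r`, it skips the `.git` directory, whose objects are compressed and would only add noise.

```python
"""Walk a directory tree and report lines matching common malware indicators."""
import os
import re

# Illustrative subset of the grep patterns above; extend it for your own threat model.
INDICATORS = {
    "eval-of-decoded-data": re.compile(r"eval\((?:atob|base64)"),
    "exec-of-base64": re.compile(r"exec\(base64\.b64decode"),
    "bash-reverse-shell": re.compile(r"bash -i >&"),
    "dev-tcp-socket": re.compile(r"/dev/tcp/"),
    "long-base64-blob": re.compile(r"[A-Za-z0-9+/]{100,}={0,2}"),
}
SKIP_DIRS = {".git"}


def scan_tree(root):
    """Yield (path, line_number, indicator_name) for every matching line."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]  # prune walk in place
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        for name, pattern in INDICATORS.items():
                            if pattern.search(line):
                                yield path, lineno, name
            except OSError:
                continue  # unreadable file (permissions, broken symlink)
```

Usage: `for hit in scan_tree("."): print(*hit, sep=":")`. As with grep, every hit still needs review in context.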

Scanning committed binary files

Binary files committed to Git are a common malware vector. Developers sometimes commit compiled tools, pre-built libraries, or bundled executables. Legitimate reasons exist (vendored dependencies, test fixtures), but every committed binary should be treated as suspicious until verified.

Find and scan binaries in the repository

# Find all binary (non-text) files tracked by git (-z/-0 keep file names with spaces intact)
$ git ls-files -z | xargs -0 file | grep -v text | grep -v empty

# Scan with ClamAV
$ git ls-files -z | xargs -0 clamscan --infected --no-summary

# Scan with YARA rules (supply your own rule files); -n1 passes one target file per yara run
$ git ls-files -z | xargs -0 -n1 yara /path/to/rules/*.yar

# Check file entropy (high entropy = likely packed/encrypted)
$ python3 -c "
import sys, math, collections
for path in sys.argv[1:]:
    data = open(path, 'rb').read()
    if not data:
        continue  # entropy is undefined for empty files
    n = len(data)
    entropy = sum((c / n) * math.log2(n / c) for c in collections.Counter(data).values())
    print(f'{entropy:.2f}  {path}')
" suspicious_file.bin
              

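High Shannon entropy (above about 7.0 bits per byte, where 8.0 is the maximum) in a committed file strongly suggests the content is compressed, encrypted, or packed, all of which are common in malware. Legitimate source files rarely exceed about 5.5 bits per byte. Images, archives, and other compressed formats also score high, so treat entropy as a triage signal rather than a verdict.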
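The entropy rule of thumb can be turned into a triage pass over many files. This minimal sketch assumes the 7.0 bits-per-byte threshold quoted above. It skips a few formats that are compressed by design; that extension list is an illustrative assumption, not a complete one. Results are ranked so the most suspicious files come first.

```python
"""Rank files by Shannon entropy (bits per byte) to surface packed or encrypted content."""
import collections
import math
import os

HIGH_ENTROPY = 7.0  # threshold from the text; tune it for your repository
# Formats that are compressed by design and would otherwise dominate the report (illustrative).
COMPRESSED_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".zip", ".gz", ".woff2"}


def shannon_entropy(data: bytes) -> float:
    """Return entropy in bits per byte: 0.0 for empty or single-valued data, at most 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in collections.Counter(data).values())


def high_entropy_files(paths, threshold=HIGH_ENTROPY):
    """Return (entropy, path) pairs above the threshold, highest entropy first."""
    flagged = []
    for path in paths:
        if os.path.splitext(path)[1].lower() in COMPRESSED_EXTS:
            continue
        with open(path, "rb") as fh:
            entropy = shannon_entropy(fh.read())
        if entropy > threshold:
            flagged.append((entropy, path))
    return sorted(flagged, reverse=True)
```

To cover the whole repository, feed it the paths from `git ls-files -z`, split on the NUL separator, with empty entries dropped.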

Checking git history for injected code

Supply chain attacks often involve a single malicious commit added to an otherwise legitimate project. Reviewing recent commits, especially those from unfamiliar contributors or automated bots, is an effective detection technique.

Git history investigation commands

# Show all commits with the files they changed (last 30 days)
$ git log --since="30 days ago" --name-only --oneline

# Find commits that modified package.json or lockfiles
$ git log --all --oneline -- package.json package-lock.json yarn.lock

# See the full diff of a suspicious commit
$ git show <commit-sha>

# Find commits that added or removed a suspicious string (-S), with their diffs (-p)
$ git log -p --all -S "base64_decode" -- "*.php"
$ git log -p --all -S "eval(atob" -- "*.js"

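# List distinct commit authors outside your team, then inspect a specific author's commits
$ git log --all --format="%ae %an" | sort -u | grep -vF "@yourcompany.com"
$ git log --all --author="<author-email>" --name-only --oneline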
              

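Git history is a durable record unless someone rewrites it (for example, with a force-push). Even if a malicious file has been removed in a subsequent commit, git log -p -S "malicious_string" will find the commit where it was introduced. This is useful for forensics after an incident: you can determine exactly when the malicious code was added and under which author identity. Keep in mind that author names and emails are self-reported and can be spoofed unless commits are signed.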

Dedicated scanning tools

Trivy — filesystem and repository scan

Trivy's fs scanner checks a local directory or Git repository for known vulnerabilities in dependencies, exposed secrets, and misconfigurations. It does not match malware signatures, so pair it with ClamAV or YARA for that. It's fast, needs little setup, and integrates directly into CI.

Trivy repository scan

$ trivy fs \
  --scanners vuln,secret,misconfig \
  --severity HIGH,CRITICAL \
  ./my-repo
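To apply your own gating policy to Trivy results, you can request machine-readable output (for example, `trivy fs --scanners vuln,secret,misconfig --format json --output trivy.json ./my-repo`) and post-process the report. This minimal sketch assumes the report layout of recent Trivy releases: a top-level `Results` list whose entries may contain `Vulnerabilities`, `Secrets`, and `Misconfigurations` arrays, with a `Severity` field on each finding. Verify that layout against the version you run.

```python
"""Count Trivy findings by category and severity from a JSON report."""
import collections
import json

# Assumed finding arrays in each entry of the report's "Results" list.
FINDING_KEYS = ("Vulnerabilities", "Secrets", "Misconfigurations")


def summarize_trivy(report_text):
    """Return a Counter keyed by (category, severity)."""
    report = json.loads(report_text)
    counts = collections.Counter()
    for result in report.get("Results") or []:
        for key in FINDING_KEYS:
            for finding in result.get(key) or []:  # tolerate missing or null arrays
                counts[(key, finding.get("Severity", "UNKNOWN"))] += 1
    return counts
```

A gate could then fail whenever `counts[("Secrets", "CRITICAL")]` is non-zero, while only reporting lower-severity vulnerabilities.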
              

ClamAV — open-source antivirus signatures

ClamAV maintains a large signature database for known malware families. It will miss novel or targeted malware, but it reliably catches known commodity malware. The clamscan command works directly on files and directories; add --recursive to descend into subdirectories.
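clamscan reports each detection as a `path: SignatureName FOUND` line. Its exit status signals the outcome: 0 when nothing is found, 1 when something is detected, and 2 on errors. This minimal wrapper sketch is built on that behaviour. `run_clamscan` and `parse_clamscan` are hypothetical helper names, and the sketch assumes clamscan is on `PATH` with an up-to-date signature database.

```python
"""Run clamscan and return the detections it reports."""
import subprocess


def parse_clamscan(output):
    """Extract (path, signature) pairs from 'path: Signature FOUND' lines."""
    detections = []
    for line in output.splitlines():
        if line.endswith(" FOUND"):
            # rpartition keeps paths that themselves contain ': ' intact
            path, _, signature = line[: -len(" FOUND")].rpartition(": ")
            detections.append((path, signature))
    return detections


def run_clamscan(target="."):
    """Scan target recursively; raise on scanner errors (exit status 2)."""
    proc = subprocess.run(
        ["clamscan", "--recursive", "--infected", "--no-summary", target],
        capture_output=True, encoding="utf-8", errors="replace",
    )
    if proc.returncode not in (0, 1):
        raise RuntimeError(f"clamscan failed: {proc.stderr.strip()}")
    return parse_clamscan(proc.stdout)
```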

YARA — custom rule matching

YARA lets you write custom detection rules based on byte patterns, strings, and conditions. It's the tool of choice for targeted malware hunting. Public YARA rule repositories (e.g., Yara-Rules/rules on GitHub) cover hundreds of known malware families.
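YARA rules are plain text, so a low-effort starting point is to turn the literal strings you already grep for into a rule. This minimal sketch generates one. `make_yara_rule` is a hypothetical helper, and its output uses YARA's `strings:`/`condition:` layout with literal (non-regex) text strings. Compile the result with `yara` or `yarac` before relying on it.

```python
"""Generate a simple YARA rule from literal indicator strings."""
import re


def yara_escape(text):
    """Escape a literal for a YARA text string (backslash and double quote)."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def make_yara_rule(name, literals, condition="any of them"):
    """Return YARA rule source that matches any of the given ASCII literals."""
    # YARA identifiers: letters, digits, underscore; no leading digit; at most 128 chars.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]{0,127}", name):
        raise ValueError(f"invalid rule name: {name!r}")
    lines = [f"rule {name}", "{", "    strings:"]
    for i, literal in enumerate(literals):
        lines.append(f'        $s{i} = "{yara_escape(literal)}" ascii')
    lines += ["    condition:", f"        {condition}", "}"]
    return "\n".join(lines) + "\n"


print(make_yara_rule("Suspicious_Decode_Exec", ["eval(atob(", "exec(base64.b64decode(", "/dev/tcp/"]))
```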

AquilaX malware scanner

AquilaX combines static analysis, signature matching, and behavioral pattern detection across all file types in a repository — including minified JavaScript, compiled Python (.pyc), and shell scripts — with results in a structured report suitable for CI gating.

Adding a malware gate to your CI/CD pipeline

The most effective control is automated scanning on every commit. When malware indicators are found, the gate should fail the pipeline, not just report them. Mark the job as a required status check so that a failing scan actually blocks the merge.

.github/workflows/malware-scan.yml

name: Malware Scan
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683

      - name: Trivy filesystem scan
        run: |
          curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
          trivy fs \
            --scanners secret \
            --exit-code 1 \
            --severity CRITICAL,HIGH \
            .

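      - name: ClamAV scan
        run: |
          sudo apt-get update
          sudo apt-get install -y clamav
          # The clamav-freshclam service started by the package can hold freshclam's lock
          sudo systemctl stop clamav-freshclam || true
          sudo freshclam
          # clamscan exits 1 when it finds malware and 2 on errors, failing the step either way
          clamscan --recursive --infected --no-summary .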
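The workflow above relies on signature scanners, which only know about previously catalogued malware. A complementary policy gate fails the build whenever a tracked file looks binary and is not on an explicit allowlist. This minimal sketch assumes a hypothetical `.binary-allowlist` file with one repository-relative path per line. It uses a NUL-byte check on the first 8000 bytes, which mirrors Git's own default binary heuristic. The function names are hypothetical.

```python
"""Fail CI when a tracked file looks binary and is not explicitly allowlisted."""
import os
import subprocess

ALLOWLIST_FILE = ".binary-allowlist"  # hypothetical: one repository-relative path per line


def looks_binary(path, probe=8000):
    """Heuristic: a file is binary if its first `probe` bytes contain a NUL byte."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(probe)


def load_allowlist(path=ALLOWLIST_FILE):
    """Read allowlisted paths, ignoring blank lines and '#' comments."""
    try:
        with open(path, encoding="utf-8") as fh:
            entries = (line.strip() for line in fh)
            return {e for e in entries if e and not e.startswith("#")}
    except FileNotFoundError:
        return set()


def unapproved_binaries(paths, allowlist):
    """Return regular files that look binary and are not allowlisted."""
    return [
        p for p in paths
        if p not in allowlist and os.path.isfile(p) and not os.path.islink(p) and looks_binary(p)
    ]


def main():
    """Check every tracked file; run from the repository root. Returns a process exit code."""
    out = subprocess.run(["git", "ls-files", "-z"], capture_output=True, check=True).stdout
    paths = [os.fsdecode(p) for p in out.split(b"\0") if p]
    offenders = unapproved_binaries(paths, load_allowlist())
    for path in offenders:
        print(f"unapproved binary file: {path}")
    return 1 if offenders else 0
```

Saved as a script that ends with `sys.exit(main())`, it could run as one more `run:` step in the job.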
              

Practical tip: Combine automated scanning with a .gitattributes rule that marks binary files explicitly (*.exe binary, *.dll binary). Git then reports these files as "Binary files ... differ" instead of attempting a text diff. This makes it easy to audit committed binaries separately from source code, and an added or changed binary stands out to reviewers.
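The same marking shows up in Git's machine-readable output. `git diff --numstat` prints `-` in place of the added and deleted line counts for files it treats as binary. This minimal sketch uses that to list binary files changed on a branch. It assumes the base branch is `origin/main`; adjust that for your repository. `changed_binaries` is a hypothetical helper.

```python
"""List binary files added or modified relative to a base branch."""
import subprocess


def parse_numstat_binaries(numstat_text):
    """Return paths whose numstat line is '-<TAB>-<TAB>path' (Git's marker for binary)."""
    binaries = []
    for line in numstat_text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) == 3 and parts[0] == "-" and parts[1] == "-":
            binaries.append(parts[2])
    return binaries


def changed_binaries(base="origin/main"):
    """Compare HEAD with its merge base against `base` (the `base...HEAD` range)."""
    out = subprocess.run(
        ["git", "diff", "--numstat", "--no-renames", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat_binaries(out)
```

`--no-renames` keeps each path on its own line instead of Git's `old => new` rename notation.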

