How Much AI Code Is Already in Your Codebase
If you're running a development team in 2026 and you don't have a clear answer to this question, you're not alone, but you should get one. The 70% figure for enterprise codebases containing AI-generated code comes from GitHub's State of the Developer Ecosystem 2025 report, and it aligns with what we're seeing in security assessments.
The challenge is that AI-generated code isn't labelled. Developers don't add a comment saying "Copilot wrote this function." It flows seamlessly into the codebase alongside hand-written code. You can attempt to detect it statistically, since AI-generated code has characteristic patterns, but there's no reliable automated marker.
Why it matters for your threat model: AI-generated code is written by a model trained on a corpus that includes insecure code. The model doesn't understand the security implications of the patterns it reproduces: it's predicting the next plausible token. Code that "looks correct" statistically may implement known antipatterns.
The Security Profile of AI-Generated Code
Research from Stanford, NYU, and multiple security firms in 2023-2025 consistently shows that AI-generated code has a distinct security profile compared to hand-written code. It's not universally worse: in some categories (basic injection protection, SQL parameterization for simple cases) it's actually better than average developer output. But in specific categories, it's reliably worse.
The Stanford CyberSecurity Lab study found that ~40% of code generated by GitHub Copilot for security-relevant tasks contained vulnerabilities, compared to ~25% for experienced developers writing equivalent functions. More importantly, the distribution of vulnerability types differs: LLMs over-represent cryptographic errors, error handling failures, and subtle authorization logic flaws.
Patterns LLMs Reliably Get Wrong
Cryptographic implementation
LLMs have a strong bias toward generating cryptographic code that matches common Stack Overflow patterns, many of which are insecure. The most common failures are using ECB mode for AES encryption (because it's simpler and appears more frequently in tutorials), using MD5 for anything security-related, and generating encryption code that uses a static IV.
```python
# What Copilot often generates for "encrypt with AES":
from Crypto.Cipher import AES
import hashlib

def encrypt(data: str, key: str) -> bytes:
    key_bytes = hashlib.md5(key.encode()).digest()  # MD5 for key derivation!
    cipher = AES.new(key_bytes, AES.MODE_ECB)       # ECB mode leaks patterns
    return cipher.encrypt(data.ljust(16).encode())

# What it should generate:
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os

def encrypt(data: bytes, key: bytes) -> bytes:
    nonce = os.urandom(12)  # random nonce, never reuse
    aesgcm = AESGCM(key)
    ciphertext = aesgcm.encrypt(nonce, data, None)
    return nonce + ciphertext  # prepend nonce for decryption
```
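The secure version prepends the nonce so it can travel with the ciphertext; the matching decrypt routine (a sketch of mine, not from the original prompt output) splits it back off and lets AES-GCM's authentication tag catch tampering:

```python
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def decrypt(blob: bytes, key: bytes) -> bytes:
    # First 12 bytes are the nonce prepended by encrypt()
    nonce, ciphertext = blob[:12], blob[12:]
    aesgcm = AESGCM(key)
    # Raises InvalidTag if the ciphertext or tag was modified
    return aesgcm.decrypt(nonce, ciphertext, None)
```

Because GCM is authenticated, a flipped bit anywhere in the blob fails loudly instead of silently decrypting to garbage, which is exactly the property ECB mode lacks.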
Error handling
LLMs frequently generate code that swallows exceptions or returns success values on failure, exactly the A10 pattern we discussed in the OWASP 2025 post. When asked to add error handling, models often generate the path of least resistance: a bare `except: pass` or a catch-and-return-None pattern that silently fails.
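A concrete contrast, using a hypothetical permission check (`user_has_role` here is a stand-in for whatever role lookup your application actually does):

```python
import logging

logger = logging.getLogger(__name__)

def user_has_role(user, role):
    """Stand-in for a real role lookup (database, IdP, etc.)."""
    raise RuntimeError("role store unreachable")  # simulate an outage

# The pattern LLMs often generate: the exception path returns a success value
def can_delete_unsafe(user, resource):
    try:
        return user_has_role(user, "admin")
    except Exception:
        return True  # fails OPEN: any lookup error grants access

# Fail-closed version: log the failure with details, then deny
def can_delete_safe(user, resource):
    try:
        return user_has_role(user, "admin")
    except Exception:
        logger.exception("role lookup failed for %s; denying access", user)
        return False  # fail closed
```

When the role store is down, the first function quietly grants every delete; the second denies and leaves a log trail, which is the behavior you want reviewers to insist on.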
Injection vulnerabilities
For simple cases, LLMs generally use parameterized queries. But for complex, dynamic queries (the cases where a senior developer would reach for a query builder), LLMs often fall back to string concatenation with a comment like "# TODO: sanitize this input." Those TODOs tend to ship.
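Dynamic queries don't require concatenation: column names can come from a whitelist while values are always bound. A minimal sketch with the stdlib's sqlite3 (the `find_users` helper and `ALLOWED_COLUMNS` set are illustrative):

```python
import sqlite3

ALLOWED_COLUMNS = {"name", "email", "status"}  # fixed whitelist, never user input

def find_users(conn, filters: dict):
    """Build a dynamic WHERE clause safely: columns whitelisted, values bound."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected filter column: {column}")
        clauses.append(f"{column} = ?")  # column from whitelist, value bound below
        params.append(value)
    sql = "SELECT id, name FROM users"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()
```

An attacker-controlled filter key like `"1=1; DROP TABLE users --"` raises `ValueError` instead of reaching the SQL string, and every value travels as a bound parameter.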
Why Classic SAST Rules Miss AI-Specific Antipatterns
Traditional SAST rules are pattern-based: they match specific syntax, function calls, and data flows. They're excellent at catching known-bad patterns that have been codified into rules.
The problem with AI-generated code is that it can introduce vulnerabilities through patterns that look syntactically fine but are semantically wrong. ECB mode usage is one example: it's a valid AES configuration, not a syntax error. A bare `except: return True` in a permission check passes all syntax validation.
Classic SAST rules also lack context about why code is doing what it's doing. An LLM-generated authentication function with a subtle logic flaw (checking the wrong field, using a non-constant-time comparison) may not trigger any existing rule, because the function uses all the right primitives in a slightly wrong way.
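The non-constant-time comparison flaw is a good illustration of "right primitives, slightly wrong": `==` on a secret short-circuits at the first mismatched byte and can leak timing information, while the stdlib's `hmac.compare_digest` examines every byte regardless. A minimal sketch:

```python
import hmac

def verify_token_unsafe(supplied: str, expected: str) -> bool:
    # Short-circuits on the first differing byte: timing leaks position info
    return supplied == expected

def verify_token_safe(supplied: str, expected: str) -> bool:
    # Constant-time comparison: runtime does not depend on where bytes differ
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Both functions return identical results, which is exactly why no syntax-level rule distinguishes them; the difference is only visible to an analysis that knows the operand is a secret.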
Semantic Analysis vs Rule-Based Scanning
Semantic analysis, which models what code means rather than just how it's written, is better positioned to catch AI-generated vulnerability patterns. This includes taint analysis (tracking data flow), type analysis (understanding what values can flow where), and increasingly, LLM-augmented analysis that can reason about code intent.
The practical direction for security teams: use traditional SAST for breadth (it covers known patterns at scale) and add semantic or LLM-augmented analysis specifically for security-critical code sections. Don't try to run LLM-based analysis on every file in a large codebase; the cost and latency don't justify it. Target it at auth code, crypto code, and input handling.
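Targeting can be as simple as routing only security-critical paths to the expensive analyzer. A sketch (the path markers are an assumption; adjust them to your repo layout):

```python
from pathlib import Path

# Path fragments that mark security-critical code in this hypothetical layout
SENSITIVE_MARKERS = ("auth", "crypto", "login", "permission", "token")

def sensitive_files(repo_root: str):
    """Yield the Python files worth the cost of deep/LLM-augmented analysis."""
    for path in Path(repo_root).rglob("*.py"):
        if any(marker in path.as_posix().lower() for marker in SENSITIVE_MARKERS):
            yield path
```

In CI, the full tree still goes through fast rule-based SAST, while only the files this filter selects are sent to the slower semantic pass.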
License and IP Risks from AI-Generated Code
This isn't strictly a security vulnerability, but it's a risk that security teams are increasingly being asked to assess. LLMs can reproduce code from their training data verbatim, including GPL-licensed code, code with restrictive licenses, or proprietary code that appeared in the training corpus.
GitHub Copilot has a "duplication detection" feature that warns when generated code closely matches training data, but it's opt-in and many organisations haven't enabled it. The legal status of AI-generated code that's substantially similar to licensed source code is still being litigated in multiple jurisdictions.
Practical advice: Enable duplication detection in your Copilot or AI coding tool settings. Consider adding license scanning to your CI pipeline to detect GPL and other copyleft licenses in code β both in dependencies and, with appropriate tooling, in the code itself.
Practical Scanning Strategy for AI Code in CI
Rather than trying to treat AI-generated code as a special category (you can't reliably identify it), the right approach is to extend your existing SAST configuration to cover AI-specific antipatterns explicitly.
- Add crypto-specific rules: Explicitly check for ECB mode, static IVs, MD5/SHA1 for password hashing, and non-constant-time comparisons in authentication code.
- Add exception handling rules: Flag bare except clauses in security-critical code paths, and check that exception handlers in auth/permission functions fail closed.
- Increase sensitivity on new code: Configure higher sensitivity (more rules, lower threshold) for code added in the last 30 days. AI-generated code tends to be newer β this catches it before it has time to get embedded.
- Add human review requirements for security primitives: Any function handling auth, crypto, or access control should require a human security review regardless of SAST results.
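Several of these rules are simple enough to prototype without a full SAST engine. A minimal AST-based sketch covering two of them, bare `except` clauses and ECB mode usage (illustrative only, not a replacement for Semgrep or a commercial tool):

```python
import ast

def scan_source(source: str) -> list:
    """Flag bare `except:` handlers and AES ECB-mode usage in one file."""
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Rule: bare except (handler with no exception type)
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "bare except: consider failing closed"))
        # Rule: any attribute access spelled MODE_ECB, e.g. AES.MODE_ECB
        if isinstance(node, ast.Attribute) and node.attr == "MODE_ECB":
            findings.append((node.lineno, "AES ECB mode leaks plaintext patterns"))
    return findings
```

Wired into CI against files changed in a pull request, even a checker this small catches the two antipatterns from the crypto example above before review starts.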
Developer Education: Prompt Engineering for Security
You can meaningfully reduce the rate of AI-generated vulnerabilities by teaching developers to include security context in their prompts. This is low-effort, high-return developer education.
```text
--- Less secure prompt ---
"Write a function to encrypt a string with AES"

--- More secure prompt ---
"Write a function to encrypt a string with AES-GCM using a random nonce. Use the
Python cryptography library, not PyCrypto. Use authenticated encryption so we can
detect tampering. Do not use ECB mode. Generate a new random nonce for every
encryption."

--- Less secure prompt ---
"Add error handling to this function"

--- More secure prompt ---
"Add error handling to this function. If any exception occurs, log the error with
the exception details, and return False (fail closed). Do not swallow exceptions
silently. Do not return True or grant access on exception."
```
What Tools Are Adapting
The SAST market is responding to AI-generated code risks, though adoption is uneven. Tools to watch:
- Semgrep has added AI-specific rule packs targeting common LLM antipatterns, including ECB mode detection and exception-swallowing in security contexts
- GitHub Advanced Security now has AI-powered autofix suggestions that tend to be more contextually aware than pure rule matching, and includes a "code scanning alert dismissal" audit trail useful for tracking AI-generated finding patterns
- Snyk Code uses a semantic analysis engine that performs better on logic flaws than rule-based alternatives
- AquilaX combines SAST with semantic analysis specifically designed to catch the cryptographic, error handling, and injection patterns that LLMs produce most frequently, across 20+ languages
The irony: The best tools for catching AI-generated vulnerabilities are themselves using AI for analysis. A security-focused LLM reviewing code is better at understanding the semantic intent of LLM-generated code than a rule set written before LLMs existed.
Scan AI-Generated Code in Your Codebase
AquilaX SAST catches AI-generated vulnerability patterns across 20+ languages, including the crypto, error handling, and injection patterns LLMs reliably get wrong.
Start Free Scan