The Concept: Treating Remediation Like Infrastructure

Infrastructure as Code made infrastructure provisioning declarative, version-controlled, and reviewable. Remediation as Code applies the same principle to security fixing: every remediation action is expressed as code, goes through version control, is tested, and is deployed via the standard PR process.

The AI layer closes the gap between finding identification and fix generation – the part of the process that previously required a security engineer to manually write the patch.

Pipeline Stages: CVE to PR

Stage 1: Finding ingestion

The pipeline receives a finding from any scanner: CVE identifier, CWE identifier, SAST rule match, or secret detection. It normalises this into a structured finding object with: type, severity, affected file, affected line range, scanner-provided description, and NVD/OSV enrichment.
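One way to sketch the normalised finding object in Python. The field and key names here are illustrative assumptions, not a standard schema; real scanner payloads differ:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    finding_type: str          # e.g. "CVE", "CWE", "SAST", "SECRET"
    identifier: str            # e.g. "CVE-2024-1234" or "CWE-89"
    severity: float            # CVSS base score, 0.0-10.0
    affected_file: str
    line_start: int
    line_end: int
    description: str           # scanner-provided text
    enrichment: dict = field(default_factory=dict)  # NVD/OSV data

def normalise(raw: dict) -> Finding:
    """Map a raw scanner payload onto the common finding shape.
    The raw keys used here are hypothetical; adapt per scanner."""
    return Finding(
        finding_type=raw["type"],
        identifier=raw["id"],
        severity=float(raw.get("cvss", 0.0)),
        affected_file=raw["file"],
        line_start=raw["lines"][0],
        line_end=raw["lines"][1],
        description=raw.get("message", ""),
    )
```

Every downstream stage consumes this one shape, so adding a new scanner only means writing a new adapter.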

Stage 2: Codebase context extraction

For the affected file and location: extract the containing function, its callers (up to depth 2), the relevant test file, the import list, and any existing sanitisation patterns. For dependency CVEs: extract all usages of the affected package, the current version, and available fixed versions.

Stage 3: Fix generation

The LLM receives the enriched finding and codebase context. It generates a unified diff with the fix. The prompt constrains output format strictly: diff only, no explanation, no new files unless necessary.
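Even with a strict prompt, models sometimes wrap output in a markdown fence or add prose. A defensive sketch that enforces the diff-only contract before anything touches a branch:

```python
def extract_diff(response: str) -> str:
    """Accept only a bare unified diff; strip a stray markdown
    fence if the model added one despite instructions."""
    text = response.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        if lines[-1].strip() == "```":
            lines = lines[1:-1]   # drop opening fence + closing fence
        else:
            lines = lines[1:]     # drop opening fence only
        text = "\n".join(lines)
    if not text.startswith(("--- ", "diff --git")):
        raise ValueError("model did not return a unified diff")
    return text
```

Rejecting malformed output here, rather than letting `git apply` fail later, keeps the failure mode explicit and retryable.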

Stage 4: Validation

Apply the diff to a clean branch. Run: linter, type checker, test suite, and the original scanner against the patched code. All four must pass before the PR is created.

Stage 5: PR creation with provenance

Create a PR containing: the original finding, the LLM prompt used, the generated diff, validation results (test run, re-scan), and a confidence score. This is the full audit trail.

LLM Prompting Strategy

The most important prompt engineering decisions:

  • Output format constraint – "Return only a unified diff. Do not explain. Do not add comments. Do not modify lines outside the vulnerable function."
  • Context boundaries – explicitly tell the model what it should not change: "Do not modify the function signature. Do not change the return type. Do not add new dependencies."
  • Security class framing – tell the model exactly what class of vulnerability it is fixing and what the correct fix pattern is for that class.
  • Negative examples – include examples of incorrect fixes for this class: "Do not use string escaping. Do not use regex validation. Use parameterised queries."

Few-shot prompting: Including 2-3 examples of previous correct fixes from your codebase dramatically improves fix quality. The model learns your codebase conventions, import style, and test patterns.
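The four decisions plus few-shot examples can be assembled into one prompt builder. A sketch; the per-class fix-pattern table is an assumption you would maintain yourself for the vulnerability classes you cover:

```python
# Correct fix pattern per vulnerability class, including negative
# examples. Illustrative; extend with the classes you remediate.
FIX_PATTERNS = {
    "CWE-89": (
        "Use parameterised queries. "
        "Do not use string escaping. Do not use regex validation."
    ),
}

def build_prompt(finding: dict, context: str, few_shot: list[str]) -> str:
    """Assemble the remediation prompt: class framing, codebase
    context, few-shot examples, then the output-format constraints."""
    examples = "\n\n".join(f"Example fix:\n{ex}" for ex in few_shot[:3])
    return "\n\n".join([
        f"Vulnerability class: {finding['identifier']}",
        f"Correct fix pattern: {FIX_PATTERNS.get(finding['identifier'], '')}",
        f"Affected code:\n{context}",
        examples,
        "Return only a unified diff. Do not explain. Do not add comments. "
        "Do not modify lines outside the vulnerable function. "
        "Do not modify the function signature. Do not change the return type. "
        "Do not add new dependencies.",
    ])
```

Putting the format constraints last keeps them closest to the point of generation, which in practice improves compliance.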

Anatomy of a Good Remediation PR

A well-formed AI-generated remediation PR contains:

  • Title – "fix: resolve CWE-89 SQL injection in users.get_by_id() [AI-generated]"
  • Finding details – scanner, rule, severity, CVE/CWE identifier
  • Diff explanation – what changed and why (written by the AI, reviewed by a human)
  • Validation evidence – link to the CI run that passed, re-scan results showing the finding is resolved
  • Confidence score – the pipeline's assessment of fix quality
  • Prompt provenance – the exact prompt used, for reproducibility and audit
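The anatomy above can be rendered mechanically, so every PR body is identical in structure. A sketch with hypothetical section headings; adjust to your PR template:

```python
def render_pr_body(finding: dict, diff_summary: str, ci_url: str,
                   rescan_clean: bool, confidence: float, prompt: str) -> str:
    """Render the PR description carrying the full audit trail."""
    return "\n\n".join([
        f"## Finding\n{finding['scanner']} / {finding['identifier']} "
        f"(severity {finding['severity']})",
        f"## What changed\n{diff_summary}",
        f"## Validation\nCI run: {ci_url}\n"
        f"Re-scan clean: {'yes' if rescan_clean else 'no'}",
        f"## Confidence\n{confidence:.2f}",
        f"## Prompt provenance\n{prompt}",
    ])
```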

Human Checkpoints You Must Never Skip

  • CVSS ≥ 8.0 – always requires a security engineer to read the diff before merge, regardless of confidence score
  • Authentication or authorisation code – any fix touching auth must be reviewed by the team lead
  • Database migration – fixes that change table schemas or query patterns require DBA review
  • Cryptographic code – algorithm changes or key handling modifications require specialist review
  • Novel fix patterns – if the AI generates a fix using a pattern not present in your historical validated fixes, flag for review regardless of confidence score
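These checkpoints are easiest to enforce as routing logic in the pipeline itself rather than as policy in a wiki. A sketch; the path heuristics and reviewer group names are assumptions you would map to your own repo layout:

```python
def required_reviewers(finding: dict, touched_paths: list[str],
                       novel_pattern: bool) -> set[str]:
    """Return the mandatory human checkpoints for this fix.
    Deliberately ignores the confidence score: these gates
    apply regardless of how confident the pipeline is."""
    reviewers = set()
    if finding["severity"] >= 8.0:
        reviewers.add("security-engineer")
    if any("auth" in p for p in touched_paths):
        reviewers.add("team-lead")
    if any("migrations" in p for p in touched_paths):
        reviewers.add("dba")
    if any("crypto" in p for p in touched_paths):
        reviewers.add("crypto-specialist")
    if novel_pattern:
        reviewers.add("security-engineer")
    return reviewers
```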

"The goal of remediation as code is not to remove humans from the loop. It is to ensure every human in the loop is looking at a well-tested, well-documented, consistently formatted fix – not a raw diff someone emailed them."