The Security Backlog Problem

Security backlogs are self-reinforcing. As findings accumulate, triage takes longer. As triage takes longer, the backlog grows. As the backlog grows, engineer fatigue increases. As fatigue increases, less gets remediated. The cycle ends when a breach forces emergency triage.

The root cause is a capacity mismatch: automated scanners can generate findings 100x faster than engineers can manually remediate them. The traditional solution (hire more security engineers) does not scale. The actual solution is to automate the remediations that can be automated.

The Continuous Remediation Model

In the continuous model, every scanner finding immediately triggers an automated response attempt. The flow:

  1. Scanner produces finding
  2. Finding enrichment: severity, reachability, EPSS, business context
  3. Routing decision: auto-fix attempt, or direct to human queue
  4. For auto-fix track: fix generation → validation → PR creation
  5. For human track: prioritised ticket with full context and suggested fix
  6. Outcome tracking: measure fix rate, fix quality, time-to-resolution

The goal is that the human queue contains only findings that genuinely require human judgment, not findings that an automated system could have fixed trivially.
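The six-step flow can be sketched as a single dispatch function. A minimal sketch: the names here (`Finding`, `process_finding`, the two queues) and the one-line routing test are illustrative assumptions, not any specific scanner's API.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    id: str
    vuln_class: str
    cvss: float
    reachable: bool = True   # step 2 enrichment: is the vulnerable code reachable?
    epss: float = 0.0        # step 2 enrichment: exploit prediction score

# Illustrative stand-ins for the auto-fix pipeline and the human ticket queue.
auto_fix_queue: list[Finding] = []
human_queue: list[Finding] = []

def process_finding(finding: Finding) -> str:
    """Steps 3-6 of the flow, collapsed for illustration; step 2 (enrichment)
    is assumed to have populated the finding's fields already."""
    # Step 3: routing decision (deliberately simplified here; the full
    # criteria are discussed under "Finding Routing Logic").
    if finding.cvss < 7.0:
        auto_fix_queue.append(finding)   # step 4: fix generation, validation, PR
        return "auto-fix"
    human_queue.append(finding)          # step 5: prioritised ticket for a human
    return "human"
```

In a real system step 6 (outcome tracking) would record the route taken and the eventual result for every finding, since those outcomes feed the routing criteria below.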

Finding Routing Logic

The routing decision is the critical design point. A finding should go to the auto-fix track when:

  • The vulnerability class has a >85% historical auto-fix correctness rate for this codebase
  • The severity is below a configurable threshold (typically CVSS < 7.0 for auto-merge)
  • The affected code has >70% test coverage
  • The fix has been validated in the last 90 days for this exact vulnerability class

A finding should go directly to the human queue when:

  • The vulnerability class requires architectural understanding (auth, authZ, business logic)
  • The severity is critical (CVSS ≥ 9.0: always human eyes)
  • The affected code is in a high-risk module (payments, authentication, cryptography)
  • The auto-fix engine has failed on this finding more than twice
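Taken together, the two lists reduce to a predicate: the human-queue conditions veto first, then every auto-fix eligibility criterion must hold. A minimal sketch, in which the field names, module set, and class set are illustrative assumptions:

```python
from dataclasses import dataclass

# Assumed sets; a real deployment would configure these per organisation.
HIGH_RISK_MODULES = {"payments", "authentication", "cryptography"}
ARCHITECTURAL_CLASSES = {"auth", "authz", "business-logic"}

@dataclass
class RoutingInput:
    vuln_class: str
    cvss: float
    module: str
    test_coverage: float          # 0.0-1.0 line coverage of the affected code
    historical_fix_rate: float    # historical auto-fix correctness for this class
    days_since_validated: int     # days since this class's fix was last validated
    prior_auto_fix_failures: int  # failed attempts on this specific finding

def route_to_auto_fix(f: RoutingInput) -> bool:
    # Human-queue conditions take precedence over auto-fix eligibility.
    if f.vuln_class in ARCHITECTURAL_CLASSES:
        return False
    if f.cvss >= 9.0:                     # critical: always human eyes
        return False
    if f.module in HIGH_RISK_MODULES:
        return False
    if f.prior_auto_fix_failures > 2:     # failed more than twice
        return False
    # Auto-fix eligibility: all four criteria must hold.
    return (f.historical_fix_rate > 0.85
            and f.cvss < 7.0
            and f.test_coverage > 0.70
            and f.days_since_validated <= 90)
```

Note the asymmetry: a single veto condition sends the finding to a human, while auto-fix requires every eligibility criterion at once.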

Routing confidence: Track your routing accuracy over time. If human reviewers frequently merge auto-generated PRs unchanged, your routing is correctly identifying auto-fixable findings. If they frequently reject them, tighten the auto-fix criteria so that more findings are routed to the human queue from the start.
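That feedback loop can be made concrete as a crude signal computed from review outcomes of auto-generated PRs. The function name and the 90% / 20% thresholds here are illustrative assumptions, not values from the text:

```python
def routing_feedback(merged_unchanged: int, modified: int, rejected: int) -> str:
    """Turn review outcomes of auto-generated PRs into a routing adjustment hint.
    Thresholds (90% merged unchanged, 20% rejected) are illustrative."""
    total = merged_unchanged + modified + rejected
    if merged_unchanged / total > 0.90:
        return "routing healthy; consider loosening auto-fix criteria"
    if rejected / total > 0.20:
        return "tighten auto-fix criteria; route more findings to humans"
    return "hold steady"
```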

Measuring Zero Backlog

Four metrics define a healthy continuous remediation programme:

  • Auto-remediation rate: percentage of findings that are automatically fixed (PR merged) without human code review. Target: 40-60% of total findings.
  • Mean time to remediation (MTTR): time from finding creation to vulnerability resolved in production. Target: <24 hours for auto-fixed findings, <7 days for human-reviewed.
  • Backlog age distribution: no high- or critical-severity finding older than 30 days, and zero findings of any severity older than 90 days.
  • Auto-fix quality rate: percentage of auto-fixes that pass human review without modification. Target: >90%.
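A sketch of how the first, second, and fourth metrics might be computed from a log of resolved findings. The `ResolvedFinding` shape and its field names are assumptions; backlog age distribution would be computed from still-open findings and is omitted here. The quality rate applies to auto-fixes that did receive a review (e.g. a sampled subset):

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class ResolvedFinding:
    created: datetime
    resolved: datetime
    auto_fixed: bool                       # fixed via the auto-fix track
    passed_review_unmodified: bool = True  # for auto-fixes that were reviewed

def auto_remediation_rate(findings) -> float:
    """Fraction of all findings fixed via the auto-fix track."""
    return sum(f.auto_fixed for f in findings) / len(findings)

def mttr_hours(findings) -> float:
    """Mean hours from finding creation to resolution in production."""
    return mean((f.resolved - f.created).total_seconds() / 3600 for f in findings)

def auto_fix_quality_rate(findings) -> float:
    """Fraction of auto-fixes that passed human review without modification."""
    auto = [f for f in findings if f.auto_fixed]
    return sum(f.passed_review_unmodified for f in auto) / len(auto)
```

Computing MTTR separately for the auto-fixed and human-reviewed populations (to compare against the <24 hour and <7 day targets) is a matter of filtering on `auto_fixed` before calling `mttr_hours`.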

Prerequisites for Continuous Auto-Remediation

Continuous remediation amplifies whatever is already true about your development process. If your CI is flaky, auto-generated PRs will fail on unrelated test failures. If your scanner is misconfigured, you will auto-remediate false positives.

  • Stable CI: test suite passes reliably on the default branch. Flaky tests must be fixed before enabling auto-merge.
  • Good test coverage: >70% line coverage on modules in the auto-fix scope.
  • Tuned scanner: false positive rate below 20% before you start trying to auto-fix findings.
  • Code review culture: engineers must be willing to review AI-generated PRs critically, not just rubber-stamp them because they passed CI.
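The measurable prerequisites can be encoded as a go/no-go gate run before enabling auto-merge. The 95% CI pass-rate threshold is an illustrative stand-in for "passes reliably"; the coverage and false-positive thresholds come from the list above, and the metric sources (CI history, coverage report, triage log) are assumed to exist:

```python
def ready_for_auto_merge(ci_pass_rate: float,
                         line_coverage: float,
                         scanner_fp_rate: float) -> list[str]:
    """Return the list of unmet prerequisites; an empty list means go."""
    problems = []
    if ci_pass_rate < 0.95:  # illustrative threshold for "passes reliably"
        problems.append(f"CI pass rate {ci_pass_rate:.0%}: fix flaky tests first")
    if line_coverage <= 0.70:
        problems.append(f"line coverage {line_coverage:.0%} not above 70%")
    if scanner_fp_rate >= 0.20:
        problems.append(f"scanner false positive rate {scanner_fp_rate:.0%} not below 20%")
    return problems
```

The fourth prerequisite, review culture, is deliberately absent: it cannot be checked by a script.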

"Continuous auto-remediation is not a shortcut. It is the logical endpoint of taking security seriously β€” investing in automation so that the findings you cannot auto-fix get genuine human attention instead of being buried in a backlog."