Rotating the key is step one. It is not the last step.

Someone on your team accidentally committed an AWS access key in a config file three years ago. A GitHub secret scanning alert finally fires, or worse, you find out from an AWS billing anomaly. You immediately revoke the key, generate a new one, update your secrets manager, and push a commit that removes the plaintext from the file. Incident closed, right?

Absolutely not. That old commit is still there. Every developer who has cloned the repo has it. Every CI runner that ever ran against that branch fetched it. GitHub has it indexed and cached. Archive services may have crawled it. If the repo was ever public for even a few minutes after the commit landed, it was almost certainly already scraped by automated bots that scan GitHub for exactly this.

The core mistake: Most teams treat "remove from current code" as equivalent to "secret is gone." These are completely different operations. Git is an append-only store. Removing content from HEAD does not remove it from history.

This post covers the mechanics of how attackers recover rotated credentials from git history, and what an actual remediation looks like, including why the standard advice of "just rewrite history" often falls short.

How git stores commits, and why they never disappear

Git is a content-addressed store. Every object is named by the SHA-1 hash (SHA-256 in repositories created with the newer object format) of its content, and a commit object records its tree, its parent commit hashes, and its metadata. When you make a new commit that removes a file or changes a value, git creates new objects. The old objects (blobs, trees, commits) are still there, referenced by their original SHAs.
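To make the content addressing concrete, here is a minimal sketch that computes a blob's ID by hand. It assumes a SHA-1 repository and a system with sha1sum available; blob_id is a hypothetical helper name, not a git command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute the object ID git would assign to a file's
# contents in a SHA-1 repository. Git hashes the header "blob <size>\0"
# followed by the raw bytes, so identical content always gets the same ID.
blob_id() {
  local file=$1 size
  size=$(wc -c < "$file" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"
blob_id "$tmp"   # prints ce013625030ba8dba906f756967f9e9ca394464a
rm -f "$tmp"
```

Running git hash-object on the same file should print the same ID. The ID depends only on the bytes, which is why "removing" a line in a later commit cannot alter or erase the earlier blob.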

Git object graph after "remove secret" commit git
# Three commits, simplified object graph:

a1b2c3d # "Add database config"
  blob:config.py → DB_PASSWORD = "hunter2_prod"

e4f5a6b # "Add feature X"
  blob:config.py → DB_PASSWORD = "hunter2_prod"  # same blob, unchanged

c7d8e9f # "Remove hardcoded secret"  ← you are here
  blob:config.py → DB_PASSWORD = os.environ["DB_PASSWORD"]

# The blob containing "hunter2_prod" is still in the object store.
# git gc won't touch it because it's referenced by a1b2c3d and e4f5a6b.
# Those commits are referenced by the branch history.
# git log can find them. git show a1b2c3d:config.py shows the secret.

Even if you force-push a new history that excludes those commits, the old objects remain in the `.git/objects` directory locally and on any remotes that received them. GitHub in particular does not promptly garbage-collect objects that were pushed to the server, so they can stay retrievable by SHA long after a force push.
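The "remove secret" scenario is easy to reproduce in a throwaway repository. The sketch below is illustrative only: the file name and password mirror the simplified graph, and the g wrapper is a hypothetical convenience so the demo does not depend on a global git identity.

```shell
#!/usr/bin/env bash
# Throwaway demo in a temp directory: commit a fake secret, "remove" it
# in a follow-up commit, then read it back out of history.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q .
# Wrapper so the demo does not depend on a global git identity.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo 'DB_PASSWORD = "hunter2_prod"' > config.py
g add config.py
g commit -q -m "Add database config"
leaky=$(git rev-parse HEAD)

echo 'DB_PASSWORD = os.environ["DB_PASSWORD"]' > config.py
g commit -q -am "Remove hardcoded secret"

cat config.py                 # HEAD: only the environment lookup
git show "$leaky:config.py"   # history: the old password, verbatim
```

The second command prints the original line even though the working tree and HEAD are clean, which is exactly the gap between "removed from current code" and "secret is gone".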

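The reflog trap: Rewriting history does not by itself delete the old commits. git filter-branch leaves backup refs under .git/refs/original/, and an ordinary clone keeps rewritten commits reachable through its reflog (90 days by default, 30 days for entries no longer reachable from a branch). git filter-repo expires the reflog and repacks the clone it runs in, but it cannot touch anyone else's copy: any clone made before the rewrite still has the old objects and can read them by SHA.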
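A small experiment shows the reflog keeping a rewritten commit alive. This is a sketch under assumptions: it uses git commit --amend as a stand-in for a real history rewrite, and the token value is fake.

```shell
#!/usr/bin/env bash
# Throwaway demo: an amended-away commit stays readable until the reflog
# is expired and unreachable objects are pruned.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q .
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo 'TOKEN = "fake-token-for-demo"' > config.py
g add config.py
g commit -q -m "Add config"
leaky=$(git rev-parse HEAD)

# Stand-in for a history rewrite: replace the commit with a clean one.
echo 'TOKEN = None' > config.py
g commit -q -a --amend -m "Add config"

# No branch points at the old commit now, but the reflog still does.
git cat-file -e "$leaky" && echo "before cleanup: old commit still readable"

# Drop every reflog entry, then delete objects nothing refers to.
git reflog expire --expire=now --all
git gc --quiet --prune=now

git cat-file -e "$leaky" 2>/dev/null || echo "after cleanup: old commit is gone"
```

The cleanup only affects this one repository. Every other clone, fork, and server-side copy needs the same treatment, and you usually cannot apply it to copies you do not control.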

The GitHub CDN cache layer

GitHub serves commit and blob content through a CDN. When you push a commit, GitHub renders it and caches the diff view, the raw file view, and the blob content at URLs like raw.githubusercontent.com/org/repo/commit-sha/path. These URLs remain accessible even after you force-push or delete the branch, as long as the underlying git objects haven't been garbage-collected from GitHub's backend, which for pushed objects they typically haven't been.

This means an attacker who has the commit SHA (which may be visible in PR comments, CI logs, branch names, or simply guessable for short-lived public repos) can fetch the raw file content directly, bypassing any branch protection or access controls on the repo UI.

The attacker's playbook

When targeting a repository that may have leaked credentials, attackers follow a predictable sequence. None of this is novel: automated bots running on GitHub's event stream do all of it in milliseconds on every public push.

  1. Clone the full history, not just the current branch
  2. Search all blobs for credential patterns (AKIA, ghp_, sk-)
  3. For every match, record the commit SHA + file path
  4. Try each credential against the target service API

Manual git history search for AWS key IDs bash
# Clone with full history
git clone --mirror https://github.com/org/repo.git

# Search all commits for AWS access key patterns
git log --all -p | grep -E 'AKIA[0-9A-Z]{16}'

# Or use git grep across all trees
git grep -i 'aws_access_key' $(git rev-list --all)

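# Recover .env files that were deleted at some point and search their
# contents for assignment-style secrets (options must come before "--")
git log --all --full-history --diff-filter=D -p -- '*.env' \
  | grep -E "(password|secret|token|key)[[:space:]]*=[[:space:]]*[\"'][^\"']{8,}"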

# Direct blob extraction once SHA is known
git show a1b2c3d:path/to/config.py

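The key insight here is git log --all. Most developers only think in terms of branches. But --all traverses every ref under refs/ plus HEAD: all branches, all tags, all remote-tracking refs, and, critically, the most recent stash (refs/stash). A secret committed to a feature branch that was later deleted on the server is still reachable as long as a remote-tracking ref for it survives in the clone. Commits that only the reflog remembers are not covered by --all; add --reflog to include them.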

Deleted branches aren't gone either. When you delete a branch on GitHub, the underlying commits become unreferenced, but only on GitHub's side. Anyone who cloned the repo before the deletion fetched the branch and its full history. The objects remain in their local repos. If they ever push anything that references those commits (even accidentally), the objects get re-uploaded.

The GitGuardian real-world numbers

GitGuardian's state-of-secrets reports consistently show that when a secret is pushed to a public GitHub repository, it is typically detected and tested within 5 seconds by automated scanners. Rotation latency (the time between detection and actual key revocation) averages hours to days for most teams. The overlap window is the exposure window. For cloud credentials where API calls are immediately billable or destructive, even a 5-second window is enough for an automated exploit chain.

Finding secrets in your git history

The first step in any remediation is understanding the full scope. You need to scan all commits across all branches: not just your main branch, not just the last 100 commits. Here are the tools actually worth using:

truffleHog

Searches git history with high-entropy detection and regex patterns. v3 has 700+ detectors for specific credential formats and verifies them against APIs. trufflehog git file://./repo --only-verified

gitleaks

Fast, well-maintained, TOML-configurable. Great for CI integration as a pre-push or PR check. gitleaks detect --source . --log-opts="--all"

detect-secrets

Yelp's tool. Creates a baseline file of known false-positives, making it practical for existing repos with lots of non-secret high-entropy strings. Good for brownfield projects.

git log + grep

Blunt but no install required. Useful for targeted searches when you know the credential format: git log --all -p -S 'AKIA'
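For the git log + grep route, a small filter keeps the output readable. The sketch below is a hypothetical helper, scan_stream, that reads any text on stdin (for example the output of git log --all -p) and tags lines that look like credentials. The three patterns are assumptions chosen to match the prefixes mentioned earlier (AKIA, ghp_, sk-) and are far from exhaustive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read text on stdin and print each line that looks
# like a credential, prefixed with a rough guess at its type.
# grep -n adds the input line number in front of each match.
scan_stream() {
  grep -nE 'AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|sk-[A-Za-z0-9]{20,}' |
  while IFS= read -r line; do
    case $line in
      *AKIA*) echo "aws-access-key-id  $line" ;;
      *ghp_*) echo "github-pat         $line" ;;
      *)      echo "sk-prefixed-key    $line" ;;
    esac
  done
}

# Example input: only the second line should be reported.
printf '%s\n' \
  'region = "eu-west-1"' \
  'aws_access_key_id = AKIAIOSFODNN7EXAMPLE' \
  'note: rotate the task-runner token' |
  scan_stream
```

In a real repository you would pipe history into it, for instance git log --all -p | scan_stream. Treat a hit as a lead to investigate, and an empty result as weak evidence at best; the dedicated tools above cover many more formats.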

Full history scan with truffleHog v3 bash
# Scan full history and verify findings against provider APIs.
# trufflehog walks every branch by default; --branch restricts it to one.
trufflehog git file://. \
  --only-verified \
  --json \
  | tee secrets-report.json

# Count verified secrets by type
jq -r '.DetectorName' secrets-report.json | sort | uniq -c | sort -rn

# Get full commit context for each finding
jq -r '[.SourceMetadata.Data.Git.commit, .DetectorName, .Raw] | @tsv' \
  secrets-report.json

gitleaks with all-branch scan bash
# Install
brew install gitleaks  # or use the Docker image

# Scan full history including all branches
gitleaks detect \
  --source . \
  --log-opts="--all --full-history" \
  --report-format json \
  --report-path gitleaks-report.json \
  --verbose

# Scan just the commits in a PR (useful as a CI gate)
gitleaks detect \
  --log-opts="origin/main..HEAD" \
  --source .

Run both tools. They use different detection heuristics and you will get different (complementary) results. truffleHog's API verification is particularly valuable: it tells you whether a found credential is still active, which is the most critical fact for triage.

The GitHub cache problem (and why you need their help)

After you identify the commits containing secrets and rewrite history locally, you face a platform problem: GitHub's backend has cached the objects and serves them via CDN. Even after you force-push a rewritten history, the old commit SHAs remain accessible via direct URLs for an indeterminate period.

The RAW URL problem: raw.githubusercontent.com/org/repo/<old-commit-sha>/path/to/file will continue serving the old file content after a force push. The CDN cache is separate from the git object store. GitHub must explicitly purge it.

GitHub's own documentation on this is clear: after rewriting history to remove sensitive data, you must contact GitHub Support to request a cache purge. Without this step, the old content remains accessible via direct CDN URLs even though it's no longer reachable through normal git operations on the platform.
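One way to check whether an old blob is still being served is to request its raw URL and look at the status code. The helpers below are a sketch: raw_url and check_raw are hypothetical names, the arguments are placeholders, and the example assumes a public repository (a private one would also need an authorization header, which is left out here).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the raw-content URL for a file at a commit.
# Arguments: org, repo, commit SHA, path inside the repository.
raw_url() {
  printf 'https://raw.githubusercontent.com/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: print the HTTP status for that URL.
# 200 means the old content is still served; 404 means it is not
# available at that URL any more.
check_raw() {
  curl -s -o /dev/null -w '%{http_code}\n' "$(raw_url "$@")"
}

raw_url org repo a1b2c3d path/to/config.py
# check_raw org repo <full-commit-sha> path/to/config.py
```

Checking again after GitHub Support confirms the purge gives you a simple before-and-after record for the incident report.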

The process for a public repository:

  1. Revoke and rotate the actual credential. Do this immediately, before anything else on this list
  2. Rewrite history with git filter-repo (see next section)
  3. Force-push the rewritten history to all remote branches and tags
  4. Contact GitHub Support with the repository name and the commit SHAs to purge
  5. Ask all collaborators to delete their local clones and re-clone (not git pull: any local clone has the old objects and can re-introduce them)

Fun fact: Forked repositories inherit the full object graph at fork time. If anyone forked your repo before the rewrite, they still have the old objects. GitHub's fork network means those objects might be accessible from dozens of forks you don't control. This is another reason why rotation of the actual credential is non-negotiable: you cannot guarantee removal of the secret from all possible locations.

Actual history rewriting: git filter-repo

The old advice was git filter-branch. Don't use it. It is slow and error-prone, and git's own documentation now steers users to git filter-repo, which is orders of magnitude faster and has cleaner semantics for credential removal.

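Before you start: Make a full backup (cp -r .git .git.bak) and keep it somewhere access-controlled, since it still contains the secret. Run git filter-repo in a fresh clone; by default it refuses to run anywhere else unless you pass --force. History rewriting is destructive and will change the SHA of every commit from the first modified one onward. Any open PRs, commit references in issues, and deployment tags will all point to now-orphaned commits. Coordinate with your team before doing this on a shared repo.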
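If you would rather keep the backup as a single file, a git bundle is one option. The sketch below assumes a hypothetical helper named backup_repo and placeholder paths; note that the bundle contains the leaked secret too, so store it accordingly and delete it once the incident is closed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write every ref and its history into one bundle
# file, then check that the file is a valid bundle.
# Pass an absolute path for the output, since git -C changes directory.
backup_repo() {
  local repo=$1 out=$2
  git -C "$repo" bundle create "$out" --all >/dev/null 2>&1 &&
    git -C "$repo" bundle verify "$out" >/dev/null 2>&1
}

# Usage (paths are placeholders):
#   backup_repo . /secure/backups/repo-before-rewrite.bundle
#   git clone /secure/backups/repo-before-rewrite.bundle restored-repo
```

Restoring is an ordinary clone from the bundle file, which makes it easy to compare the rewritten history against the original if something goes wrong.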

Remove a specific file from all history bash
# Install git-filter-repo
pip install git-filter-repo

# Remove a specific file from all history
git filter-repo --path secrets/credentials.py --invert-paths

# Remove multiple files
git filter-repo \
  --path config/production.env \
  --path secrets.json \
  --invert-paths

# Remove a pattern from file contents (more surgical)
# This rewrites file content rather than removing entire files
git filter-repo --replace-text replacements.txt

# replacements.txt format:
# AKIA1234567890ABCDEF==>REDACTED_AWS_KEY_ID
# wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY==>REDACTED_AWS_SECRET

Force push rewritten history to all branches bash
# After rewriting, push to remote (destructive: requires force)
# You must disable branch protection rules temporarily

# git filter-repo removes the origin remote as a safety measure; add it back
git remote add origin https://github.com/org/repo.git
git push origin --force --all
git push origin --force --tags

# Clean up local reflog and pack files to remove old objects locally
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Verify the old commit SHA no longer resolves
git cat-file -t a1b2c3d
# Should output: "fatal: Not a valid object name a1b2c3d"
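
Checking one SHA only proves that one object is gone. A broader check is to search every commit still reachable from any ref for the literal secret. The function below is a sketch with a hypothetical name, secret_in_history; it walks commits one at a time, which is simple but slow on large repositories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (and name the first offending commit) if
# the literal string appears in any file of any commit reachable from
# any ref of the repository at $1. Fail if it appears nowhere.
secret_in_history() {
  local repo=$1 secret=$2 rev
  for rev in $(git -C "$repo" rev-list --all); do
    # -F treats the secret as a fixed string, not a regular expression.
    if git -C "$repo" grep -q -F -e "$secret" "$rev" -- 2>/dev/null; then
      echo "found in $rev"
      return 0
    fi
  done
  return 1
}

# Usage (the key is the placeholder from replacements.txt):
#   secret_in_history . 'AKIA1234567890ABCDEF' && echo "rewrite incomplete"
```

A clean result covers only this clone and only reachable commits. It says nothing about forks, other clones, or cached views on the hosting platform.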

BFG Repo Cleaner: the faster alternative for specific patterns

For the specific use case of removing credentials (rather than entire files), BFG Repo Cleaner is often more convenient than git filter-repo, and far faster than git filter-branch. It handles the common "replace all occurrences of this string" case with a single flag:

BFG credential removal bash
# Download BFG
wget https://repo1.maven.org/maven2/com/madgag/bfg/1.14.0/bfg-1.14.0.jar

# Create a file listing the secrets to replace
cat > secrets-to-remove.txt <<EOF
AKIA1234567890ABCDEF
wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
my-super-secret-db-password
EOF

# Run BFG on a bare clone (safest approach)
git clone --mirror https://github.com/org/repo.git
java -jar bfg-1.14.0.jar \
  --replace-text secrets-to-remove.txt \
  repo.git

# Apply the changes
cd repo.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push --force

BFG handles simple text replacement across large repos quickly. However, it only handles committed content: it won't touch your working directory. It also protects the contents of your latest commit (HEAD) by default, so a secret still present there is left in place. Clean HEAD with a normal commit first, or pass --no-blob-protection to include it.

Prevention: stop secrets reaching git in the first place

Remediation is painful. Prevention is cheap. The two most effective controls are pre-commit hooks and CI/CD gate checks.

.pre-commit-config.yaml yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.2
    hooks:
      - id: gitleaks
        name: Detect secrets with gitleaks
        entry: gitleaks protect --staged --redact --no-banner
        language: system
        pass_filenames: false
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

.github/workflows/secret-scan.yml (CI gate) yaml
name: Secret Scanning
on: [push, pull_request]

jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, not just the latest commit
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITLEAKS_LICENSE: ${{ secrets.GITLEAKS_LICENSE }}

  trufflehog:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: TruffleHog scan
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.repository.default_branch }}
          head: HEAD
          extra_args: --only-verified

Enable GitHub's native secret scanning: GitHub provides free secret scanning for public repos and as part of Advanced Security for private repos. It scans pushed content and, for supported providers (AWS and GitHub's own tokens among them), reports the leak to the issuer, which can then revoke or quarantine the credential automatically. It's not a replacement for the above but it's a useful additional layer, and it covers 200+ token formats.

The .gitignore is not a security control

A .gitignore entry for .env means that file won't be tracked by git, until someone runs git add -f .env or creates a file with a slightly different name. .gitignore is a developer convenience, not a security boundary. The only reliable control is scanning that verifies no secrets are staged before every commit.
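To show what "scanning what is staged" means in practice, here is a minimal sketch of a hand-written .git/hooks/pre-commit script. It inspects the staged diff itself, so a file forced in with git add -f is checked like any other. The function name and the short pattern list are assumptions for illustration; the gitleaks and detect-secrets hooks shown earlier are the better choice for real use.

```shell
#!/usr/bin/env bash
# Sketch of .git/hooks/pre-commit: block the commit if any line being
# added in the staged diff matches a credential-like pattern.
# The pattern list is illustrative only; real scanners cover far more.
PATTERN='AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}|-----BEGIN [A-Z ]*PRIVATE KEY-----'

# Reads a unified diff on stdin. Succeeds if an added line (one starting
# with "+", excluding the "+++ b/file" header) matches the pattern.
added_lines_have_secret() {
  grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -qE -e "$PATTERN"
}

# The staged diff ignores .gitignore: whatever is in the index is scanned.
if git diff --cached -U0 2>/dev/null | added_lines_have_secret; then
  echo "pre-commit: possible secret in staged changes; commit blocked" >&2
  exit 1
fi
```

A local hook like this can be skipped with git commit --no-verify, which is why the CI gate above still matters as a second line of defense.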
