Running Containers as Root
The most common container security mistake is also the most impactful: running processes inside containers as root (UID 0). Most official Docker images default to root. Most Dockerfiles never add a USER instruction. So most containers run as root.
Why does this matter? If an attacker exploits a vulnerability in your application and achieves code execution within the container, they have root privileges inside that container. Combined with kernel vulnerabilities or container escape bugs, root-in-container often becomes root-on-host.
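As a defence-in-depth measure, an application can also check at startup whether it is running as root and complain loudly. A minimal sketch in Python (the check itself, not any specific framework's API):

```python
import os

def running_as_root() -> bool:
    """Return True if the effective UID is 0 (root). Unix-only."""
    return os.geteuid() == 0

if running_as_root():
    # Warn (or exit) rather than silently operating with root privileges.
    print("warning: this process is running as root inside the container")
```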
```dockerfile
# Bad - runs as root (default)
FROM python:3.12-slim
COPY . /app
CMD ["python", "/app/main.py"]
```

```dockerfile
# Good - non-root user
FROM python:3.12-slim
RUN groupadd -r appuser && useradd -r -g appuser appuser
COPY --chown=appuser:appuser . /app
USER appuser
CMD ["python", "/app/main.py"]
```
Check your running containers: docker inspect <container> | grep -i user. If the User field is empty or shows "root", the container is running as root.
Privileged Containers and Host Namespace Escapes
Privileged containers (--privileged flag) have nearly all Linux capabilities and can access host devices. They're effectively equivalent to running directly on the host with root access. We find privileged containers used "temporarily for debugging" that somehow never get reverted in production.
Escaping a privileged container is trivial: from inside one, an attacker can mount and write to the host filesystem, load kernel modules, access host processes, and escape the container entirely in under 60 seconds using publicly available techniques. Never run privileged containers in production.
If a container genuinely needs elevated capabilities, grant only the specific capabilities required using --cap-add instead of --privileged. For example, a network monitoring container might need NET_ADMIN β add that capability alone rather than enabling all of them.
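In practice that might look like the following (the image name is illustrative):

```shell
# Drop all capabilities, then grant back only what the workload needs.
docker run --cap-drop ALL --cap-add NET_ADMIN netmonitor:latest
```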
Base Image Vulnerabilities
Every Docker image starts from a base image β and that base image includes the OS packages it was built with. ubuntu:22.04 pulled a year ago contains packages with CVEs fixed since then. python:3.12 includes the full Debian OS with potentially hundreds of packages.
In practice, we find containers in production with base images that haven't been updated in 12-24 months, containing dozens of known vulnerabilities β some critical severity. Teams don't update base images because their CI/CD builds are pinned and rebuild-on-update isn't automated.
Rebuild regularly: Pin your base image to a specific digest for reproducibility, but have your CI rebuild images weekly against fresh base image pulls. This picks up OS-level security patches automatically.
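In the weekly rebuild job, one way to force fresh base layers (image tag illustrative):

```shell
# --pull fetches the latest base image even if a copy is cached locally;
# --no-cache rebuilds every layer so OS-level security patches are picked up.
docker build --pull --no-cache -t myapp:latest .
```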
Secrets in Dockerfiles and Image Layers
Docker image layers are permanent. If you copy a secret into an image layer and then delete it in a later layer, the secret still exists in the earlier layer β visible to anyone who pulls the image and inspects the layer history.
```dockerfile
# WRONG - secret exists in layer history even after rm
COPY credentials.json /tmp/credentials.json
RUN pip install -r requirements.txt
RUN rm /tmp/credentials.json  # still in previous layer!

# WRONG - ARG values appear in docker history
ARG API_KEY
RUN curl -H "Authorization: $API_KEY" https://api.example.com/setup

# CORRECT - use BuildKit secrets (never written to layers)
RUN --mount=type=secret,id=api_key \
    curl -H "Authorization: $(cat /run/secrets/api_key)" https://api.example.com/setup
```
For runtime secrets (environment variables, database passwords), inject via orchestrator secrets management (Kubernetes Secrets, Docker Swarm secrets, HashiCorp Vault) β never bake them into images.
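As one option, a Kubernetes Secret can be read into an environment variable at runtime (the names here are illustrative):

```yaml
# Pod spec fragment: the password comes from a Secret at deploy time,
# never from the image itself.
containers:
  - name: app
    image: myapp:latest
    env:
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: db-credentials   # Secret created out-of-band
            key: password
```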
The Minimal Base Image Principle
The attack surface of a container is proportional to what's in it. A full Ubuntu image contains compilers, shells, package managers, curl, wget β every tool an attacker needs to operate after a successful exploitation. A distroless image contains only your application and its runtime dependencies.
```dockerfile
# Build stage - full tools available
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt --target /app/packages

# Runtime stage - minimal, no shell, no package manager
FROM gcr.io/distroless/python3
WORKDIR /app
COPY --from=builder /app/packages /app/packages
COPY src/ /app/src/
ENV PYTHONPATH=/app/packages
CMD ["/app/src/main.py"]
```
Distroless images don't have a shell, which means that if an attacker achieves RCE, they can't easily run commands, install tools, or explore the filesystem. This doesn't prevent exploitation, but it significantly raises the cost of post-exploitation.
Docker Image Layer Scanning for CVEs
Image scanning tools inspect every layer of a Docker image and identify packages with known CVEs. This covers both OS-level packages (from the base image) and application-level packages (your requirements.txt or package.json dependencies).
Integrate image scanning into your CI pipeline so every built image is scanned before it's pushed to your registry. Block pushes that introduce critical-severity vulnerabilities.
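A minimal CI gate, sketched here with Trivy as one scanner option (the registry and image name are illustrative):

```shell
# Non-zero exit code fails the pipeline if the image contains critical CVEs.
trivy image --exit-code 1 --severity CRITICAL registry.example.com/myapp:latest
```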
Scan images in your registry too: New CVEs are discovered every day. A scan-on-build strategy misses vulnerabilities disclosed after the image was built. Continuous registry scanning catches these β alerting when a previously-clean image becomes vulnerable.
Runtime Security
Scanning images before deployment is necessary but not sufficient β attackers can exploit vulnerabilities at runtime, or applications can behave in unexpected ways after deployment. Runtime security tools monitor container behaviour and alert on anomalies:
- Falco: open-source runtime security that alerts on unexpected syscalls, file access, or network connections
- Read-only root filesystem: mount the container root as read-only β legitimate applications almost never need to write to the root filesystem, but attackers do
- No new privileges: --security-opt=no-new-privileges prevents privilege escalation via setuid binaries
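Combined on the command line, these runtime restrictions might look like this (image name illustrative):

```shell
# --read-only mounts the container root filesystem read-only;
# --tmpfs provides a writable scratch area where one is genuinely needed;
# --security-opt=no-new-privileges blocks setuid-based privilege escalation.
docker run --read-only --tmpfs /tmp \
  --security-opt=no-new-privileges myapp:latest
```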
Kubernetes Security Context Gotchas
A hardened Deployment sets the security context at both the pod and the container level:

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
      containers:
        - name: app
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]  # drop all capabilities by default
```
Use Pod Security Standards (PSS) to enforce security context requirements across your cluster. The "restricted" profile enforces non-root execution, forbids privilege escalation, and requires dropping all capabilities; a read-only root filesystem is not part of the standard, so set it per container as shown above.
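PSS enforcement is enabled per namespace via labels, for example (namespace name illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Reject pods that don't meet the "restricted" profile.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
```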
Container Security Checklist
- All containers run as non-root user with explicit UID
- No privileged containers in production
- Capabilities explicitly dropped - add back only what's needed
- Base images updated at least monthly - automated rebuild process
- No secrets in Dockerfiles, ARG, or image layers
- Runtime secrets injected via Kubernetes Secrets or Vault
- Multi-stage builds with minimal runtime images (distroless or alpine)
- Image scanning in CI - block critical CVEs before push
- Continuous scanning of registry for newly-disclosed CVEs
- Read-only root filesystem where possible
- Network policies restricting pod-to-pod communication
Scan Your Container Images
AquilaX scans Docker images for CVEs across all layers, detects secrets baked into image history, and checks Dockerfile security configurations.
Start Free Scan