Running Containers as Root

The most common container security mistake is also the most impactful: running processes inside containers as root (UID 0). Most official Docker images default to root. Most Dockerfiles never add a USER instruction. So most containers run as root.

Why does this matter? If an attacker exploits a vulnerability in your application and achieves code execution within the container, they have root privileges inside that container. Combined with kernel vulnerabilities or container escape bugs, root-in-container often becomes root-on-host.

Dockerfile:
# Bad β€” runs as root (default)
FROM python:3.12-slim
COPY . /app
CMD ["python", "/app/main.py"]

# Good β€” non-root user
FROM python:3.12-slim
RUN groupadd -r appuser && useradd -r -g appuser appuser
COPY --chown=appuser:appuser . /app
USER appuser
CMD ["python", "/app/main.py"]

Check your running containers: docker inspect --format '{{.Config.User}}' <container> — if the output is empty, "root", or "0", the container is running as root.
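To audit every running container at once, the same check can be scripted (a quick sketch; output is container name followed by configured user):

```shell
# Empty user after the colon means the container runs as root
docker ps -q | xargs -r docker inspect --format '{{.Name}}: {{.Config.User}}'
```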

Privileged Containers and Host Namespace Escapes

Privileged containers (--privileged flag) have nearly all Linux capabilities and can access host devices. They're effectively equivalent to running directly on the host with root access. We find privileged containers used "temporarily for debugging" that somehow never get reverted in production.

Privileged container escape is trivial: from inside a privileged container, you can mount the host filesystem and write to it, load kernel modules, access host processes, and escape the container entirely in under 60 seconds using publicly available techniques. Never run privileged containers in production.

If a container genuinely needs elevated capabilities, grant only the specific capabilities required using --cap-add instead of --privileged. For example, a network monitoring container might need NET_ADMIN β€” add that capability alone rather than enabling all of them.
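For the NET_ADMIN example, that looks like this (image name is illustrative):

```shell
# Drop everything, then add back only the capability this workload needs
docker run --cap-drop=ALL --cap-add=NET_ADMIN netmon:latest
```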

Base Image Vulnerabilities

Every Docker image starts from a base image β€” and that base image includes the OS packages it was built with. ubuntu:22.04 pulled a year ago contains packages with CVEs fixed since then. python:3.12 includes the full Debian OS with potentially hundreds of packages.

In practice, we find containers in production with base images that haven't been updated in 12-24 months, containing dozens of known vulnerabilities β€” some critical severity. Teams don't update base images because their CI/CD builds are pinned and rebuild-on-update isn't automated.

Rebuild regularly: Pin your base image to a specific digest for reproducibility, and have your CI refresh that digest and rebuild weekly. This picks up OS-level security patches automatically without giving up reproducible builds.
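A digest pin looks like this; the commands resolve the digest a tag currently points to, which CI can refresh on each weekly rebuild (the sha256 value is a placeholder to fill in):

```shell
# Resolve the current digest for the tag you build from
docker pull python:3.12-slim
docker inspect --format '{{index .RepoDigests 0}}' python:3.12-slim
# Then pin it in the Dockerfile:
#   FROM python:3.12-slim@sha256:<digest>
```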

Secrets in Dockerfiles and Image Layers

Docker image layers are permanent. If you copy a secret into an image layer and then delete it in a later layer, the secret still exists in the earlier layer β€” visible to anyone who pulls the image and inspects the layer history.

Dockerfile:
# WRONG β€” secret exists in layer history even after rm
COPY credentials.json /tmp/credentials.json
RUN pip install -r requirements.txt
RUN rm /tmp/credentials.json  # still in previous layer!

# WRONG β€” ARG values appear in docker history
ARG API_KEY
RUN curl -H "Authorization: $API_KEY" https://api.example.com/setup

# CORRECT β€” use BuildKit secrets (never written to layers)
RUN --mount=type=secret,id=api_key \
    curl -H "Authorization: $(cat /run/secrets/api_key)" https://api.example.com/setup
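The secret itself is supplied at build time and never stored in the image. Assuming the key lives in a local file (path illustrative), the build command is:

```shell
# Requires BuildKit (the default builder in current Docker releases)
docker build --secret id=api_key,src=./api_key.txt -t myapp .
```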

For runtime secrets (environment variables, database passwords), inject via orchestrator secrets management (Kubernetes Secrets, Docker Swarm secrets, HashiCorp Vault) β€” never bake them into images.
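As a sketch of the Kubernetes route (all names here are illustrative), create the Secret once and reference it from the pod spec rather than baking it into the image:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
stringData:
  password: change-me   # placeholder; real values come from your secrets pipeline
---
# In the Deployment's container spec, inject it as an environment variable:
# env:
# - name: DB_PASSWORD
#   valueFrom:
#     secretKeyRef:
#       name: db-credentials
#       key: password
```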

The Minimal Base Image Principle

The attack surface of a container is proportional to what's in it. A full Ubuntu image contains compilers, shells, package managers, curl, wget β€” every tool an attacker needs to operate after a successful exploitation. A distroless image contains only your application and its runtime dependencies.

Dockerfile (multi-stage):
# Build stage β€” full tools available
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt --target /app/packages

# Runtime stage β€” minimal, no shell, no package manager
FROM gcr.io/distroless/python3
WORKDIR /app
COPY --from=builder /app/packages /app/packages
COPY src/ /app/src/
ENV PYTHONPATH=/app/packages
CMD ["/app/src/main.py"]

Distroless images don't have a shell β€” which means if an attacker achieves RCE, they can't easily run commands, install tools, or explore the filesystem. It doesn't prevent exploitation but significantly raises the cost of post-exploitation.

Docker Image Layer Scanning for CVEs

Image scanning tools inspect every layer of a Docker image and identify packages with known CVEs. This covers both OS-level packages (from the base image) and application-level packages (your requirements.txt or package.json dependencies).

Integrate image scanning into your CI pipeline so every built image is scanned before it's pushed to your registry. Block pushes that introduce critical-severity vulnerabilities.
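One way to wire this into CI (Trivy is one common open-source scanner; the registry path is illustrative):

```shell
# Non-zero exit on any critical CVE, which fails the CI job before push
trivy image --severity CRITICAL --exit-code 1 registry.example.com/myapp:latest
```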

Scan images in your registry too: New CVEs are disclosed every day, so a scan-on-build strategy misses vulnerabilities disclosed after the image was built. Continuous registry scanning catches these — alerting when a previously clean image becomes vulnerable.

Runtime Security

Scanning images before deployment is necessary but not sufficient β€” attackers can exploit vulnerabilities at runtime, or applications can behave in unexpected ways after deployment. Runtime security tools monitor container behaviour and alert on anomalies:

  • Falco: open-source runtime security that alerts on unexpected syscalls, file access, or network connections
  • Read-only root filesystem: mount the container root as read-only β€” legitimate applications almost never need to write to the root filesystem, but attackers do
  • No new privileges: --security-opt=no-new-privileges prevents privilege escalation via setuid binaries
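Put together, a hardened docker run combining these options might look like this (image name and UID are illustrative):

```shell
# Read-only root filesystem, writable /tmp only, no setuid escalation, non-root UID
docker run --read-only --tmpfs /tmp \
  --security-opt=no-new-privileges \
  --user 10001 \
  myapp:latest
```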

Kubernetes Security Context Gotchas

deployment.yaml:
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
      containers:
      - name: app
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]  # drop all capabilities by default

Use Pod Security Standards (PSS) to enforce security context requirements across your cluster. The "restricted" profile enforces non-root, no privilege escalation, dropped capabilities, and read-only root filesystem.
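PSS is enforced per namespace via labels; a sketch for a production namespace (name illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
```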

Container Security Checklist

  1. All containers run as non-root user with explicit UID
  2. No privileged containers in production
  3. Capabilities explicitly dropped β€” add back only what's needed
  4. Base images updated at least monthly β€” automated rebuild process
  5. No secrets in Dockerfiles, ARG, or image layers
  6. Runtime secrets injected via Kubernetes Secrets or Vault
  7. Multi-stage builds with minimal runtime images (distroless or alpine)
  8. Image scanning in CI β€” block critical CVEs before push
  9. Continuous scanning of registry for newly-disclosed CVEs
  10. Read-only root filesystem where possible
  11. Network policies restricting pod-to-pod communication
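The last item can be bootstrapped with a default-deny policy (namespace name illustrative) that blocks all ingress until explicit policies allow specific traffic:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}      # empty selector matches every pod in the namespace
  policyTypes:
  - Ingress            # deny all ingress; add allow policies per workload
```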

Scan Your Container Images

AquilaX scans Docker images for CVEs across all layers, detects secrets baked into image history, and checks Dockerfile security configurations.

Start Free Scan