The OTEL Data Model

OpenTelemetry defines three signal types: traces (request journeys through a distributed system), metrics (numeric measurements over time), and logs (structured event records). The power of OTEL is that a single instrumentation layer can capture all three, correlated by trace context. Auto-instrumentation — available for most major languages and frameworks via OTEL SDKs — requires minimal code changes and captures a rich telemetry stream automatically.

The problem is that this rich telemetry stream is not filtered by default. The OTEL specification defines a set of semantic conventions for what attributes to capture — and those conventions include HTTP request headers, database query statements, RPC method arguments, and process environment variables. An OTEL deployment with default settings will capture all of these. Most organisations deploying OTEL for the first time are not aware of the data they are sending to their observability backend.

Default is dangerous: The OpenTelemetry HTTP instrumentation libraries capture http.request.header.* attributes by default in many SDKs. This includes the Authorization header — meaning every API call's bearer token is stored in your trace backend unless you explicitly filter it out.

Auto-Instrumentation Risks

Auto-instrumentation hooks into framework internals to capture telemetry without requiring developers to add instrumentation code. For HTTP frameworks, this means capturing request and response objects. For database clients, it means capturing SQL statements. For message queue clients, it means capturing message payloads. Each of these can contain highly sensitive data.

HTTP Instrumentation and Header Capture

The OTEL HTTP semantic conventions define http.request.header.{name} attributes for capturing HTTP headers. In many SDK implementations, header capture is enabled for all headers unless explicitly restricted. The Authorization, X-API-Key, Cookie, and any custom authentication headers are all captured. These end up as span attributes in your trace backend — searchable, indexed, and retained for however long your trace retention policy specifies.

```python
# Python OTEL - header capture is enabled by default
# This creates spans with attributes like:
#   http.request.header.authorization = "Bearer eyJhbGci..."
#   http.request.header.cookie = "session=abc123; csrf=xyz..."
# To suppress sensitive headers, configure attribute filtering:
from opentelemetry.instrumentation.flask import FlaskInstrumentor

FlaskInstrumentor().instrument(
    request_hook=lambda span, environ: (
        span.set_attribute("http.request.header.authorization", "[REDACTED]")
        if span.is_recording()
        else None
    )
)
```

Database Instrumentation and Query Capture

OTEL database instrumentations capture db.statement — the full SQL query, including any literal values that were not parameterised. A query like SELECT * FROM users WHERE email = '[email protected]' captured in a trace exposes the searched email address to everyone who has access to the trace backend. In systems where developers use literal values in ad-hoc queries (a common practice in analytics and admin tooling), this can expose significant volumes of PII.
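The literal-stripping that the collector-side replace_pattern rule performs can be sketched as a plain function. This is illustrative only; the regexes are deliberately simple (they assume single-quoted string literals and bare numerics) and a production sanitiser would need hardening:

```python
import re

# Illustrative sketch: strip literal values from a SQL statement before it
# is recorded as db.statement, so the query *shape* survives but PII does not.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")      # '[email protected]', 'O''Brien'
_NUMERIC_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")  # 30, 3.14

def sanitize_sql(statement: str) -> str:
    """Replace string and numeric literals with '?' placeholders."""
    return _NUMERIC_LITERAL.sub("?", _STRING_LITERAL.sub("?", statement))

query = "SELECT * FROM users WHERE email = '[email protected]' AND age > 30"
print(sanitize_sql(query))  # SELECT * FROM users WHERE email = ? AND age > ?
```

Applying this in an instrumentation hook keeps the sanitised form useful for debugging (the query structure is intact) while removing the searched values.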

Sensitive Data in Traces

Beyond HTTP headers and SQL queries, distributed traces can capture sensitive data through several mechanisms:

  - Span events that log exception details, including stack traces with variable values.
  - Custom spans added by developers for debugging, which capture request payloads.
  - Resource attributes that capture environment variable values.
  - Baggage — the OTEL mechanism for propagating key-value pairs across service boundaries — which developers sometimes use to propagate user context, including identifiers and session tokens.
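To make the baggage risk concrete, the sketch below approximates the W3C baggage header that propagators attach to outbound requests. baggage_header is a hypothetical helper, not an OTEL API; real propagators in the SDK also handle metadata and full percent-encoding:

```python
from urllib.parse import quote

# Hypothetical helper approximating what a W3C baggage propagator injects
# into outbound HTTP requests as the 'baggage' header.
def baggage_header(entries: dict) -> str:
    return ",".join(f"{key}={quote(value)}" for key, value in entries.items())

# Anti-pattern: a session token placed in baggage travels, in plaintext,
# with every downstream call -- and is recorded by anything logging headers.
print(baggage_header({"user.id": "12345", "session.token": "abc123"}))
# user.id=12345,session.token=abc123
```

Anything a developer puts into baggage is broadcast to every service the request touches, which is why baggage should carry only non-sensitive routing context.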

The aggregation problem is significant: while a single trace may seem innocuous, the trace backend stores months of telemetry from every service. An attacker who gains read access to the trace backend has a comprehensive record of every API call, every database query, and every inter-service request, correlated by user and session, across the entire system's history. This is a more complete picture of user behaviour and system state than most application databases contain.

The Jaeger/Tempo exposure risk: Self-hosted trace backends like Jaeger and Grafana Tempo are frequently deployed without authentication on internal networks. "Internal" means accessible to any compromised service in the cluster — which can read every trace from every other service.

The Collector as an Attack Target

The OpenTelemetry Collector is the recommended deployment model for production — agents send telemetry to the collector, which batches, transforms, and forwards to backends. This makes the collector a high-value target: it processes the complete telemetry stream from every instrumented service. A compromised collector can exfiltrate the entire observability stream to an attacker-controlled endpoint while still forwarding to the legitimate backend, making the breach invisible to operators.

The collector's receiver endpoints — typically listening on ports 4317 (gRPC) and 4318 (HTTP) — accept telemetry from any source by default. An attacker with network access to the collector can inject fake spans that appear to come from legitimate services, polluting the trace data and potentially triggering false alerts or obscuring real incidents.

Collector authentication: The OTEL Collector supports mTLS and bearer token authentication for both receivers and exporters. Enabling mTLS for inbound telemetry ensures that only authorised services can submit traces — preventing both data injection and unauthorised telemetry submission.
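A sketch of the receiver side of that configuration, using the collector's TLS settings; the certificate paths are placeholders for your own PKI:

```yaml
# Require client certificates on the collector's OTLP gRPC receiver (mTLS).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /etc/otelcol/server.crt
          key_file: /etc/otelcol/server.key
          # Setting client_ca_file makes a client certificate mandatory:
          client_ca_file: /etc/otelcol/client-ca.crt
```

With client_ca_file set, the receiver rejects any connection that does not present a certificate signed by that CA, which closes off the span-injection path described above.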

Securing the Observability Pipeline

The OTEL Collector's processor pipeline is the right place to apply data sanitisation before telemetry reaches the backend. The transform processor can apply OpenTelemetry Transformation Language (OTTL) rules to redact, hash, or remove sensitive attributes.

```yaml
# OTEL Collector config: redact sensitive HTTP headers and SQL statements
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          - set(attributes["http.request.header.authorization"], "[REDACTED]")
          - set(attributes["http.request.header.cookie"], "[REDACTED]")
          - replace_pattern(attributes["db.statement"], "'[^']*'", "'?'")
  redaction:
    allow_all_keys: false
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?"                          # credit card numbers
      - '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'   # email addresses
```

Telemetry Data Governance

  1. Inventory what you capture: Before deploying OTEL at scale, audit what attributes each instrumentation library captures by default. Enable debug logging in the collector temporarily to see the full attribute set for each span type.
  2. Apply a deny-list at the collector: Use the redaction processor or transform processor to systematically remove attributes matching PII patterns — email addresses, credit card numbers, national ID patterns — from all spans before they reach any backend.
  3. Classify trace backends as sensitive data stores: Apply the same access controls to your Jaeger, Tempo, or commercial trace backend as you apply to your application databases. Enable authentication, restrict access to engineering staff who need it, and audit access logs.
  4. Enable mTLS between agents and the collector: Ensure that only your instrumented services can submit telemetry to the collector. This prevents injection of fake spans and limits the attack surface of the collector's receiver ports.
  5. Separate observability tenants by environment: Do not send production traces to the same backend as development or staging. Production telemetry contains real user data; development telemetry is often used in less-controlled contexts.
  6. Apply trace retention policies aligned with your data retention requirements: Traces containing PII are subject to your data retention and deletion obligations. Implement TTL policies that match your compliance requirements rather than keeping everything indefinitely.
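Trace backends generally expose retention as a first-class setting. As one example (the 30-day value is illustrative, not a recommendation), Grafana Tempo caps block retention in the compactor:

```yaml
# Tempo: delete trace blocks older than 30 days (720h), aligning trace
# retention with the data-retention policy rather than keeping everything.
compactor:
  compaction:
    block_retention: 720h
```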