The Event Trust Model
In request-response architectures, trust is established per-request: the caller authenticates, the server authorises, and the interaction is complete. Event-driven architectures break this model. A message published to a Kafka topic may be consumed days later by a service that has no direct relationship with the original producer. The consumer typically trusts the message contents: it has to, because the originator is gone and there is no session to re-validate against.
This implicit trust in message contents is the central security challenge of event-driven systems. Consumers that deserialise and act on message payloads without validation are vulnerable to every class of injection attack, just deferred in time and distributed across all subscribers. The broker itself becomes a persistence layer for malicious payloads.
The retention problem: Kafka topics are configured with retention periods that can be hours, days, or indefinite. A malicious message is not just consumed once; it sits in the topic for the entire retention period, ready to be reprocessed by any consumer that resets its offset. Malicious payloads in event streams can outlast the incident response that discovered them.
Message Injection
The most direct attack on an event-driven system is publishing malicious messages to a topic. Kafka's ACL system controls which principals can produce to which topics, but ACL misconfigurations are common, particularly the default configuration, which allows any authenticated client to produce to any topic. A compromised service account for a low-privilege producer can publish messages to high-privilege topics if ACLs are not explicitly configured.
Message injection can target: deserialisation vulnerabilities in consumers (Java deserialisation in legacy Avro consumers, pickle deserialisation in Python services); business logic, by publishing order events with negative prices or fraudulent user IDs to corrupt downstream state; and notification systems, by injecting email/SMS trigger events to flood users or exfiltrate data through notification channels.
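The business-logic case above is the easiest to defend against at the consumer. The following sketch validates a hypothetical order event before it reaches any downstream logic; the field names ("order_id", "price", "quantity") and bounds are illustrative assumptions, not a real schema:

```python
import json

# Hypothetical order-event fields; illustrative, not from any real schema.
ALLOWED_FIELDS = {"order_id", "user_id", "price", "quantity"}

def validate_order_event(raw: bytes) -> dict:
    """Parse and validate an order event, raising ValueError on anything suspect."""
    event = json.loads(raw)  # json, never pickle, for untrusted payloads
    if not isinstance(event, dict):
        raise ValueError("payload must be a JSON object")
    unexpected = set(event) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    price = event.get("price")
    if not isinstance(price, (int, float)) or isinstance(price, bool) or price < 0:
        raise ValueError("price must be a non-negative number")
    quantity = event.get("quantity")
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        raise ValueError("quantity must be a positive integer")
    return event
```

Rejecting unexpected fields outright (rather than ignoring them) also surfaces injection attempts in the consumer's error metrics rather than silently discarding them.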
Schema Registry Attacks
Confluent Schema Registry and similar schema management systems are used to enforce message structure in Kafka topics. Producers register schemas; consumers use the registered schema to deserialise messages. The Schema Registry is trusted infrastructure: consumers trust that the schema they retrieve accurately describes the messages in the topic.
An attacker who can write to the Schema Registry can register a modified schema that causes consumers to misparse messages: interpreting a user ID field as a command field, or changing a string field to a byte array that triggers deserialisation issues. More broadly, schema modification can cause consumers to silently drop fields, leading to data integrity issues that are difficult to detect through normal monitoring.
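One mitigation is for consumers to pin a fingerprint of the schema they were built against, rather than trusting whatever the registry currently returns. A minimal sketch, using sorted-key JSON as the canonical form (an assumption for illustration; Avro defines its own Parsing Canonical Form for exactly this purpose, and the schema shown is hypothetical):

```python
import hashlib
import json

def schema_fingerprint(schema: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the schema."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# The schema this consumer was built and tested against (hypothetical example).
PINNED_SCHEMA = {
    "type": "record",
    "name": "OrderCreated",
    "fields": [{"name": "user_id", "type": "string"},
               {"name": "price", "type": "double"}],
}
PINNED_FINGERPRINT = schema_fingerprint(PINNED_SCHEMA)

def check_retrieved_schema(retrieved: dict) -> None:
    """Refuse to deserialise if the registry returned an unexpected schema."""
    if schema_fingerprint(retrieved) != PINNED_FINGERPRINT:
        raise RuntimeError("Schema Registry returned an unexpected schema")
```

A fingerprint mismatch then becomes a deploy-time or alert-worthy event rather than a silent parsing change.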
Schema Registry is often unauthenticated: Many Kafka deployments configure the Schema Registry without authentication on internal networks. Any service in the cluster can register and modify schemas, regardless of whether it produces to the affected topics.
Consumer Group Manipulation
Kafka's consumer group mechanism allows multiple instances of the same consumer to coordinate partition assignments. Consumer group management operations (resetting offsets, deleting groups, modifying assignments) require specific ACL permissions but are often granted broadly. An attacker who can modify consumer group offsets can replay historical messages to consumers (potentially triggering duplicate processing of financial transactions or state changes), skip messages (denying service to consumers), or cause consumers to process messages out of order.
Consumer group deletion is a denial-of-service attack on the consuming service: deleting the consumer group causes all instances to lose their offset tracking and restart from the earliest available offset, potentially reprocessing days of events.
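The duplicate-processing risk from replayed offsets is blunted if consumers are idempotent. A minimal in-memory sketch of the pattern, keyed on a per-event idempotency identifier (a real implementation would persist seen keys in the same transactional store as the side effect, not in process memory):

```python
class IdempotentProcessor:
    """Apply each event at most once, regardless of how many times it is delivered."""

    def __init__(self):
        self._seen = set()   # persisted durably in a real system
        self.applied = []    # stands in for the actual side effect

    def process(self, event_id: str, payload: dict) -> bool:
        """Apply the event; return False if this event_id was already processed."""
        if event_id in self._seen:
            return False  # replayed or duplicated delivery: skip
        self._seen.add(event_id)
        self.applied.append(payload)
        return True
```

With this in place, an offset reset still costs reprocessing time but no longer double-applies financial transactions or state changes.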
Kafka Authentication and ACL Configuration
Kafka supports multiple authentication mechanisms: SASL/PLAIN (username/password, transmitted in plaintext unless TLS is configured), SASL/SCRAM (challenge-response, more secure), SASL/GSSAPI (Kerberos), and mTLS. Many production Kafka deployments still use SASL/PLAIN without TLS, or leave listeners for "internal" traffic unauthenticated (clients are mapped to the ANONYMOUS principal), meaning any service on the network can connect without credentials.
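A broker configuration that enforces TLS-wrapped SCRAM authentication and enables an authorizer might look like the following server.properties sketch. Paths, the port, and the password are placeholders; the authorizer class shown is for ZooKeeper-mode clusters, while KRaft-mode clusters use org.apache.kafka.metadata.authorizer.StandardAuthorizer instead:

```properties
# Accept only authenticated, encrypted connections
listeners=SASL_SSL://0.0.0.0:9093
security.inter.broker.protocol=SASL_SSL
sasl.enabled.mechanisms=SCRAM-SHA-512
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512

# Broker TLS identity (placeholder paths and password)
ssl.keystore.location=/etc/kafka/broker.keystore.jks
ssl.keystore.password=changeit

# Enable ACL enforcement and default-deny when no ACL matches
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
allow.everyone.if.no.acl.found=false
```

The last line matters: with an authorizer enabled but allow.everyone.if.no.acl.found=true, a missing ACL silently grants access instead of denying it.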
Kafka's ACL system defines permissions at the topic and consumer group level for produce, consume, and describe operations. Default Kafka clusters have no ACLs configured, meaning any authenticated user can produce and consume from any topic. Turning on ACLs without a complete permission model causes all existing consumers to fail, so organisations often defer enabling them, and frequently never do.
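The permission model an authorizer applies can be sketched as a default-deny lookup over (principal, operation, resource) bindings. This toy version is illustrative of the model, not Kafka's implementation, and the principal, topic, and group names are hypothetical:

```python
# Explicit ALLOW bindings; anything not listed is denied.
# Note the separate producer and consumer principals for the same topic.
ACLS = {
    ("User:orders-producer", "WRITE", "topic:orders"),
    ("User:orders-consumer", "READ", "topic:orders"),
    ("User:orders-consumer", "READ", "group:orders-service"),
}

def is_authorised(principal: str, operation: str, resource: str) -> bool:
    """Default-deny: authorised only if an explicit binding matches."""
    return (principal, operation, resource) in ACLS
```

Building the complete binding set up front, before enforcement is switched on, is what lets ACLs be enabled without breaking existing consumers.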
Securing Event Streams
- Enable TLS for all Kafka listeners: Configure all Kafka listeners to require TLS. Never use the PLAINTEXT listener in any environment where messages may contain sensitive data. Apply mTLS for internal service-to-broker communication.
- Implement Kafka ACLs from day one: Define ACLs that restrict each service account to only the topics it legitimately produces to and consumes from. Use separate service accounts for producers and consumers of the same topic. Default-deny all access and explicitly grant required permissions.
- Validate all message contents at consumer boundaries: Treat Kafka message payloads as untrusted input. Validate against expected schemas, check field value ranges, and apply the same input sanitisation you apply to HTTP request bodies.
- Authenticate access to Schema Registry: Enable authentication on the Schema Registry. Restrict schema write access to CI/CD pipelines or dedicated schema management tools, not individual service accounts.
- Restrict consumer group management permissions: Only grant consumer group write operations (offset reset, group deletion) to operations tooling and dedicated administrative accounts, not to application service accounts.
- Audit message provenance: Include a producer identity field in message schemas (set server-side from the authenticated principal, not supplied by the producer) so consumers can verify message origin. Alert on messages from unexpected principals on sensitive topics.
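The provenance check in the last point can be sketched as a per-topic allowlist of producer principals. The topic and principal names here are hypothetical, and the producer identity is assumed to have been stamped from the authenticated principal, not supplied by the client:

```python
# Which principals are expected to produce to each sensitive topic
# (hypothetical names, maintained alongside the ACL definitions).
EXPECTED_PRODUCERS = {
    "payments": {"User:payments-service"},
    "orders": {"User:orders-service", "User:checkout-service"},
}

def verify_provenance(topic: str, producer_principal: str) -> bool:
    """True if this producer is expected for this topic; unknown topics deny."""
    return producer_principal in EXPECTED_PRODUCERS.get(topic, set())
```

A False result here is an alerting signal, not just a message to drop: an unexpected principal on a sensitive topic usually means an ACL gap or a compromised account.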