Apache Kafka

A distributed, durable event-streaming platform built around a partitioned, replicated commit log — where events are retained by time or size rather than deleted on consumption.

Problem

Traditional message queues delete messages after they are consumed, making it impossible to replay past events, audit history, or have multiple independent consumers read the same stream at different speeds. High-throughput data pipelines (IoT, financial transactions, microservices integration) require a system that can ingest millions of events per second, store them durably, and serve them to heterogeneous consumers.

Solution / Explanation

Kafka models data as an ordered, immutable log. Producers append events to topics; consumers read from any offset in the log. The log is retained for a configurable window (time- or size-based), independent of consumption. This makes Kafka both a message bus and a persistent event store.
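The log model above can be sketched in a few lines. This is an illustrative toy, not the Kafka protocol: appending assigns a sequential offset, and any reader can start from any offset because nothing is deleted on consumption.

```python
# Toy append-only log: the core abstraction behind Kafka.
# Events are retained regardless of consumption; readers pick their offset.

class Log:
    def __init__(self):
        self.events = []                  # retained, never removed on read

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1       # the event's offset

    def read(self, offset, max_events=10):
        return self.events[offset:offset + max_events]

log = Log()
for e in ["created", "paid", "shipped"]:
    log.append(e)

print(log.read(0))   # a new consumer replays from the beginning
print(log.read(2))   # another consumer reads only the newest event
```

Because reads are non-destructive, the same structure serves both as a message bus (tail the log) and as a persistent event store (replay from offset 0).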

Core Abstractions

Events: The unit of data. Each event has a key, a value, a timestamp, and optional headers. Events are immutable once written.

Topics: Named, durable log streams. A topic is analogous to a filesystem folder for events. Topics support multiple simultaneous producers and consumers. Events are not deleted on consumption.

Partitions: Each topic is split into one or more partitions, each an ordered, append-only log hosted on a broker. Partitions enable parallelism:

  • Events with the same key always route to the same partition (ordered delivery for that key).
  • Different partitions may be on different brokers, distributing I/O.
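Key-based routing is just a deterministic hash of the key modulo the partition count. Kafka's Java client uses murmur2 for this; the sketch below uses CRC32 purely to show the invariant: the same key always lands on the same partition, which is what preserves per-key ordering.

```python
import zlib

# Sketch of key-based partition routing (Kafka's default partitioner uses
# murmur2; CRC32 here is only for illustration).
def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

p1 = partition_for(b"order-42", 6)
p2 = partition_for(b"order-42", 6)
assert p1 == p2  # all events for order-42 share one partition, in order
```

Note the consequence: changing the number of partitions changes the key-to-partition mapping, so per-key ordering is only guaranteed while the partition count is stable.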

Offsets: A sequential integer identifying an event's position within a partition. Consumers track their own offset, giving them full control over replay vs. live consumption.
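Consumer-controlled offsets mean the broker does not decide what a consumer has "seen" (names below are illustrative, not Kafka API calls):

```python
# Toy consumer over a fixed partition log: the consumer owns its position
# and can rewind it at will (the essence of Kafka's seek/replay model).

log = ["created", "paid", "shipped", "delivered"]

class Consumer:
    def __init__(self):
        self.offset = 0

    def poll(self):
        if self.offset < len(log):
            event = log[self.offset]
            self.offset += 1       # advance our own position
            return event
        return None                # caught up with the log tail

    def seek(self, offset):
        self.offset = offset       # replay: rewind to any past offset

c = Consumer()
c.poll(); c.poll()                 # consume two events
c.seek(0)                          # rewind to the beginning
assert c.poll() == "created"       # the same events are still there
```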

Brokers: Kafka servers forming a cluster. Each partition has one leader broker and zero or more follower replicas for fault tolerance.

Replication: Topics are typically replicated with a factor of 3 across brokers; with rack awareness, replicas can also be spread across availability zones or datacenters. If the leader fails, an in-sync follower is elected automatically.
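The failover rule can be stated as a small function. This is a toy model of the invariant only: a new leader must come from the in-sync replica set (ISR); in a real cluster the election is coordinated by the controller (KRaft or ZooKeeper), not by the brokers themselves.

```python
# Toy leader election for one partition: prefer the current leader if it
# is alive, otherwise promote the first surviving in-sync replica.

def elect_leader(leader, isr, failed):
    if leader not in failed:
        return leader
    for replica in isr:
        if replica not in failed:
            return replica
    raise RuntimeError("no in-sync replica available; partition offline")

new_leader = elect_leader("broker-1", ["broker-2", "broker-3"],
                          failed={"broker-1"})
assert new_leader == "broker-2"
```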

Retention: Kafka retains events based on configured time (e.g., 7 days) or total size, regardless of whether they have been consumed. This enables:

  • Replay from any point in time.
  • Multiple independent consumers at different positions.
  • Event sourcing replay to rebuild state.
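Retention is configured per broker (defaults) and per topic (overrides). A representative sketch, using real Kafka configuration keys but example values and an example topic name:

```properties
# server.properties (broker defaults)
log.retention.hours=168      # delete events older than 7 days
log.retention.bytes=-1       # no per-partition size cap (-1 = unlimited)

# Per-topic override: keep the (example) "orders" topic for 30 days
# kafka-configs.sh --bootstrap-server localhost:9092 \
#   --entity-type topics --entity-name orders \
#   --alter --add-config retention.ms=2592000000
```

Whichever limit is hit first (time or size) triggers deletion of old log segments; consumers still positioned inside a deleted range simply resume from the earliest retained offset.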

Consumer Groups

A consumer group is a set of consumer instances that collectively read a topic. Kafka assigns each partition to exactly one member of the group at a time, distributing load. Multiple independent consumer groups can read the same topic simultaneously, each at their own pace and offset — enabling both parallel processing and pub-sub semantics in one model.

Topic Partition 0 → Consumer Group A: Instance 1
Topic Partition 1 → Consumer Group A: Instance 2
Topic Partition 0 → Consumer Group B: Instance 1  (independent read)
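The assignment shown above can be sketched as a round-robin spread of partitions over group members. Kafka ships several assignment strategies (range, round-robin, sticky); this toy shows only the invariant that matters: within one group, each partition goes to exactly one member, while independent groups each get the full topic.

```python
# Toy group coordinator: distribute partitions round-robin over the
# members of a consumer group.

def assign(partitions, members):
    assignment = {m: [] for m in members}
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return assignment

groups = {
    "A": ["instance-1", "instance-2"],   # load is split across instances
    "B": ["instance-1"],                 # independent group, own offsets
}
for name, members in groups.items():
    print(f"Group {name}: {assign([0, 1], members)}")
```

When an instance joins or leaves, the group rebalances: the same function is simply re-run over the new membership.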

Kafka APIs

API             Purpose
Producer API    Publish events to topics
Consumer API    Subscribe to topics and read events
Kafka Streams   Client library for stream processing (filter, join, aggregate)
Kafka Connect   Connectors to import/export data from external systems
Admin API       Manage topics, partitions, and configurations

Key Components

  • ZooKeeper / KRaft — cluster coordination (KRaft replaces ZooKeeper: production-ready since Kafka 3.3, with ZooKeeper support removed in 4.0).
  • Topic — named log stream.
  • Partition — ordered sub-log enabling parallelism.
  • Consumer Group — set of consumers sharing partition assignment.
  • Offset — consumer position within a partition.
  • Retention Policy — time- or size-based log expiry.

When to Use

  • Real-time event streaming and data pipelines (high throughput).
  • Event sourcing — Kafka as the system of record for event history (see Event Sourcing).
  • CQRS read-model projection — consuming the event log to build query stores (see CQRS).
  • Microservices integration — services publish domain events, others consume asynchronously.
  • Activity tracking (page views, clicks, IoT sensor data).
  • Log aggregation from many services into a central stream.

Trade-offs

Benefit                                            Cost
Extremely high throughput (millions of events/s)   Operationally complex cluster to manage
Durable, replayable log                            Higher latency than in-memory queues for very low-volume workloads
Multiple independent consumer groups               Ordering only within a partition; global ordering not guaranteed
Decouples producers from consumers                 Consumers must manage offset state
Native stream processing (Kafka Streams)           Schema evolution requires discipline (use a Schema Registry)

Comparison: Kafka vs. RabbitMQ

                    Apache Kafka                                   RabbitMQ
Model               Distributed log (pull-based)                   Message broker (push-based)
Retention           Time/size based (independent of consumption)   Deleted after ACK
Throughput          Millions of events/s                           Hundreds of thousands/s
Ordering            Per partition                                  Per queue
Consumer position   Consumer-controlled offset                     Broker-controlled
Replay              Yes                                            No (once consumed, gone)
Use case            Event streaming, audit, replay                 Task queues, RPC, routing