Demystify Kafka: How Producer, Broker And Consumer Work Together

A concise, in-depth explanation for new Kafka users of how Kafka's basic components work together.

Li Pei
5 min read · Dec 1, 2023

For developers with little experience with Kafka or event-driven architecture, it can be daunting to comprehend how Kafka works, since it is not the intuitive point-to-point communication they may be used to. In this article I will try my best to offer an in-depth, no-fluff presentation. By evolving the system step by step, I hope to make it easier to understand and remember how Kafka works and why it works this way.

Intro Of Producer, Consumer And Broker

Consider a scenario where Producer A needs to dispatch messages, but it is unsure about the eventual recipients. Rather than directly targeting a specific message consumer, Producer A opts for an indirect approach. It sends the messages to an intermediate entity known as a Broker. This method allows for more flexibility in message delivery, ensuring that the messages reach the appropriate consumers without the producer needing precise knowledge of who they are at the time of sending.

At present, the Broker functions like a temporary storage system. It appends each message it receives to an append-only log file. Rather than actively seeking out consumers, the Broker takes a passive role, holding onto the messages until a consumer comes to poll them. This approach ensures that messages are securely stored and readily available for consumers whenever they are ready to retrieve them.

When Consumer A, the first interested party, arrives on the scene, it initiates the process by polling messages from the Broker.

Producer, Broker and Consumer
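
To make this flow concrete, here is a minimal sketch using the standard Apache Kafka Java client. The broker address, topic name, and group id are illustrative assumptions, not values from any particular deployment.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ProducerConsumerSketch {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Producer A sends to the Broker; it never addresses a consumer directly.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("topic-a", "key-1", "hello"));
        }

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "consumer-group-a"); // hypothetical group id
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Consumer A takes the active role: it polls the Broker for stored messages.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("topic-a"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("consumed %s=%s%n", record.key(), record.value());
            }
        }
    }
}
```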

Scale Log With Topic

As the system expands, additional producers and consumers begin to participate. Let's introduce Producer B and its corresponding Consumer B into this scenario. In the initial setup, the Broker mixes all messages from both Producer A and Producer B in a single log file. This forces consumers to process a multitude of irrelevant messages: for example, if Producer A generates messages at a rate of 99/s while Producer B only produces 1/s, Consumer B ends up sifting through mostly irrelevant data, wasting 99% of its read bandwidth.

To counteract this inefficiency, the Broker evolves to segment the large log into multiple independent logs, each identified by a unique Topic. Now, when a producer sends a message, it specifies the topic it is addressing, and similarly, a consumer specifies the topic from which it intends to retrieve messages. This organization vastly improves system efficiency by streamlining the message flow to the relevant parties.

Produce and Consume from Topic
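
To sketch this in code (topic names and the broker address are, again, assumptions), routing is nothing more than the topic field on each record and on each subscription:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class TopicRoutingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each record names the topic (log) it is appended to, so the two
            // streams no longer share one file on the Broker.
            producer.send(new ProducerRecord<>("topic-a", "k", "from Producer A")); // 99 msg/s stream
            producer.send(new ProducerRecord<>("topic-b", "k", "from Producer B")); // 1 msg/s stream
        }
        // Consumer B would now call subscribe(List.of("topic-b")) and never
        // spend bandwidth reading Producer A's traffic.
    }
}
```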

Accelerate Consumption With Consumer Group And Partition

Imagine a situation where Producer A generates messages under Topic A at a high volume, but Consumer A struggles to process them fast enough. When the messages are not correlated, an effective strategy is to scale horizontally by increasing the number of instances of Consumer A. These instances collectively form what is known as a Consumer Group, sharing the same processing work.

However, in the current configuration, this Consumer Group would still be reading from a single log file under the topic. To better facilitate horizontal scaling and enhance processing efficiency, the topic can be divided into multiple Partitions. Multiple instances of Consumer A can then read from different partitions simultaneously. This not only improves the consumers' processing capability but also helps scale the broker, which will be elaborated on in a later section.
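
As a minimal sketch (broker address, topic name, and group id are assumptions): two consumer instances that share the same group.id automatically form one consumer group, and Kafka divides the topic's partitions between them.

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.List;
import java.util.Properties;

public class ConsumerGroupSketch {

    // Instances created with the same group.id join the same consumer group;
    // the broker then splits the topic's partitions among the members.
    static KafkaConsumer<String, String> newGroupMember() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "consumer-group-a");        // same id => same group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("topic-a"));
        return consumer;
    }

    public static void main(String[] args) {
        // With, say, 4 partitions on topic-a, each member ends up assigned
        // roughly half of them, and the two instances consume in parallel.
        // (KafkaConsumer is not thread-safe: poll each member from its own thread.)
        KafkaConsumer<String, String> member1 = newGroupMember();
        KafkaConsumer<String, String> member2 = newGroupMember();
    }
}
```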

How are these partitions determined? The number of partitions is configured per topic on the broker. When a producer dispatches a message, it can attach a key to the message. This key is hashed to compute a partition number, which dictates the destination partition (messages without a key are simply spread across partitions by the client). This method ensures a balanced and efficient distribution of messages across the Kafka system.

Produce and Consume from Topic with Partition
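
As a simplified sketch of the idea: Kafka's actual default partitioner applies a murmur2 hash to the serialized key bytes, but the shape is the same, a hash of the key modulo the partition count.

```java
public class PartitionSketch {

    // Simplified stand-in for Kafka's default partitioner, which really hashes
    // the serialized key bytes with murmur2. Same key -> same partition, so
    // all messages for one key stay ordered within one partition.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions; // drop sign bit
    }

    public static void main(String[] args) {
        // Every message keyed "user-42" lands on the same one of 4 partitions.
        System.out.println(partitionFor("user-42", 4));
    }
}
```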

Remember Consumption State With Offset

In Kafka, a consumer may or may not be a long-running instance. Under the basic setup, however, if a consumer goes offline, it faces the challenge of having to reread messages from the beginning when it comes back. To avoid this, Kafka's Broker assigns an auto-incrementing sequence number to every message within a partition. This sequence number, known as the Offset, is crucial: the Broker not only assigns offsets but also keeps track of which offsets have been consumed on each partition.

Is this enough? Consider a scenario where another consumer group, Consumer Group A1, is also interested in Topic A alongside Consumer Group A. Since different consumer groups may consume messages at varying rates, storing a single consumed offset per partition is inadequate. To address this, Kafka maintains a separate record of consumed offsets for each consumer group on each partition. After a consumer finishes processing messages, it notifies the Broker to mark those offsets as consumed for its consumer group and partition.

This system ensures precise tracking of message consumption, catering to the unique pace of different consumer groups within the Kafka ecosystem.
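
Here is a sketch of committing offsets manually (the configuration values are assumptions): with enable.auto.commit turned off, the consumer explicitly tells the Broker how far its group has read, and a restarted group member resumes from there instead of offset 0.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OffsetCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "consumer-group-a");        // offsets are tracked per group
        props.put("enable.auto.commit", "false");         // commit manually after processing
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("topic-a"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                // ... process the message, then note where we are ...
                System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
            }
            // Tell the Broker this group is done up to these offsets; only this
            // group's position moves, other groups keep their own offsets.
            consumer.commitSync();
        }
    }
}
```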

Broker Cluster

Until now, our discussion has assumed a single-broker scenario, which poses a significant risk as a single point of failure. In practice, Kafka employs a cluster of servers, ideally distributing partitions evenly across them to mitigate this risk. When a producer sends a message, it first calculates the appropriate partition, then identifies and communicates with the broker that manages that partition. Similarly, consumers are assigned specific partitions at startup and establish dedicated connections to the corresponding brokers.

This distributed architecture plays a critical role in Kafka’s reliability and fault tolerance. Should one broker in the cluster fail, the partitions it manages are promptly rebalanced and reassigned to other brokers in the cluster. This ensures minimal disruption, as only the connections related to the affected partitions and their associated producers and consumers are impacted. Such a design significantly enhances the system’s resilience against failures, maintaining continuous operation even in the face of individual server issues.

Produce and Consume from Broker Cluster
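
As a configuration sketch (the broker host names are hypothetical), a client lists several brokers as bootstrap servers, then discovers the full cluster and which broker owns each partition on its own:

```java
import org.apache.kafka.clients.producer.KafkaProducer;

import java.util.Properties;

public class ClusterClientSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical three-broker cluster. Any listed broker can serve as
        // the first point of contact; the client then fetches cluster metadata
        // and connects directly to whichever broker manages each partition.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // If broker1 fails, the client refreshes its metadata, learns the
            // partitions' new owners, and resumes sending without code changes.
        }
    }
}
```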

Summary

In this article, several fundamental building blocks, namely Producer, Consumer, Broker, Consumer Group, Topic and Partition, were covered and assembled together to explain why they exist and how they work together. In the coming articles, more detail on each component and best practices will be discussed.

Written by Li Pei

Everything can be built in an event-driven way
