
Deriving Kafka's basic architecture diagram

2024-07-12


1. Single-Node Model

1. Terminology

  1. Broker: A single node (server) in a Kafka cluster. A cluster consists of multiple Brokers that work together to receive, store, and serve messages. Each Broker hosts one or more partitions.

  2. Topic: A named category of messages. Producers send messages to a specified Topic, and consumers subscribe to the Topic to receive them. A Topic is only a logical grouping; the data itself is physically stored in the Topic's partitions.

  3. Partition: A subset of a Topic and the basic unit for storing and processing messages in Kafka. Each Topic can be divided into multiple Partitions, and each Partition is an ordered, append-only sequence of messages: once written, a message is never modified.

  4. Replica: A copy of a partition. A partition can have multiple replicas.

  5. Leader Broker: When a partition has multiple replicas, the broker that handles all read and write requests for that partition.

  6. Follower Broker: When a partition has multiple replicas, a broker that holds one of the other replicas and keeps it in sync by copying the Leader's data.


The producer sends messages (records) to Kafka, and the consumer reads them by offset, the position of a record within its partition (similar to an array index).

Each partition also has its own log file, which Kafka uses to persist the data to disk.
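The relationship between a partition's log and offsets can be sketched in a few lines of Python. This is an in-memory illustration only, not Kafka code: the class name `PartitionLog` and its methods are hypothetical, and a real partition log lives in files on disk.

```python
class PartitionLog:
    """In-memory stand-in for one partition's append-only log (hypothetical)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record at the end and return the offset assigned to it."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
for value in ["a", "b", "c"]:
    log.append(value)

# A consumer that has already processed offset 0 resumes from offset 1.
next_offset = 1
batch = log.read(next_offset)
next_offset += len(batch)  # the consumer advances its own position
```

Reading never removes anything from the log; the consumer only moves its own offset forward, which is why the array-index analogy holds.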

2. Distributed cluster: horizontal scaling

1. Multiple partitions per Topic


About production

The producer first connects to the Kafka cluster through a bootstrap broker. The purpose of this step is to establish the initial connection and fetch the cluster's metadata.

Once the producer obtains this metadata, it knows who the leader broker is for each partition and can send the message directly to the correct leader broker.
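As a rough sketch of that lookup, the metadata can be pictured as a map from each partition to its leader. The topic name `orders`, the broker ids, and the addresses below are all made up for illustration, and the real metadata response carries more information than this.

```python
# Hypothetical metadata, as a producer might hold it after the bootstrap step:
# (topic, partition) -> id of the broker that currently leads that partition.
cluster_metadata = {
    ("orders", 0): 1,
    ("orders", 1): 2,
    ("orders", 2): 3,
}

# Hypothetical broker id -> network address.
broker_addresses = {
    1: "broker-1:9092",
    2: "broker-2:9092",
    3: "broker-3:9092",
}


def leader_address(topic, partition):
    """Return the address of the broker leading the given partition."""
    leader_id = cluster_metadata[(topic, partition)]
    return broker_addresses[leader_id]


# The producer sends records for partition 1 of "orders" straight to its leader.
target = leader_address("orders", 1)
```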

A producer must specify a Topic when sending a message; specifying a partition is optional.

  • No partition specified: If the producer does not explicitly specify a partition, the producer client's default partitioning strategy picks one:
    • If the message has a key, the partition is determined by a hash of the key. The same key is always assigned to the same partition, as long as the number of partitions does not change.
    • If the message has no key, messages are spread across partitions (round-robin in older clients, sticky batching since Kafka 2.4) so that they end up evenly distributed.
  • Partition specified: The producer can also explicitly specify the partition when sending a message. In that case the message is sent directly to the specified partition.
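The selection rules above can be sketched as a single function. This is a simplified model under stated assumptions: `choose_partition` is a hypothetical name, `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner uses, and keyless messages are handled with plain round-robin rather than sticky batching.

```python
import itertools
import zlib

# Shared counter used to spread keyless messages (simplified round-robin).
_round_robin = itertools.count()


def choose_partition(num_partitions, key=None, partition=None):
    """Pick a partition for a message, mirroring the three cases in the text."""
    if partition is not None:
        # Case 1: the producer named the partition explicitly.
        if not 0 <= partition < num_partitions:
            raise ValueError("partition out of range")
        return partition
    if key is not None:
        # Case 2: hash the key (CRC32 here is only a stand-in for murmur2).
        return zlib.crc32(key) % num_partitions
    # Case 3: no key, so rotate through the partitions.
    return next(_round_robin) % num_partitions


explicit = choose_partition(3, partition=2)
keyed_first = choose_partition(3, key=b"user-42")
keyed_again = choose_partition(3, key=b"user-42")
keyless = [choose_partition(3) for _ in range(3)]
```

Because the keyed case depends only on the key and the partition count, repeated calls with the same key return the same partition, which is what preserves per-key ordering.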

When a Broker receives a message from a producer, the first thing it does is append the message to the partition's log on disk, which is what makes the message persistent and reliable.

About consumption

Consumers in Kafka usually belong to a consumer group, and each consumer group has a unique group ID. Consumer groups are used to achieve load balancing and parallel consumption of messages.

When multiple consumers belong to the same group, Kafka assigns the Topic's partitions to the consumers in that group. Within a group, each partition is consumed by only one consumer, which is how load balancing is achieved.

  • A single consumer subscribes to a Topic

    • If only one consumer subscribes to a Topic, then the consumer will receive all messages in the Topic.
  • Multiple consumers belong to the same group

    • The partitions in a topic are distributed among the consumers in the group. Each partition will only be consumed by one consumer in the group.
    • If there are more consumers than partitions, the excess consumers are not assigned any partition and stay idle. If another consumer in the group exits, an idle consumer can automatically take over its partitions, which provides high availability.
    • If there are fewer consumers than partitions, some consumers will be assigned more than one partition.
  • Multiple consumers belong to different groups

    • Each group consumes all messages in the Topic independently. In other words, every message is delivered to every group, as if it were broadcast to the groups.
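The three cases above can be illustrated with a small assignment function. It is a simplified round-robin sketch, not one of Kafka's actual assignor implementations, and the function name and consumer ids are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Distribute partitions over the consumers of ONE group, round-robin.

    Returns a dict mapping each consumer to its list of partitions.
    """
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    ordered = sorted(consumers)
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


partitions = [0, 1, 2]

# One consumer in the group: it receives every partition.
single = assign_partitions(partitions, ["c1"])

# Fewer consumers than partitions: someone gets more than one partition.
fewer = assign_partitions(partitions, ["c1", "c2"])

# More consumers than partitions: the extra consumer stays idle.
more = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])

# Different groups are assigned independently, so each group sees all partitions.
group_a = assign_partitions(partitions, ["a1"])
group_b = assign_partitions(partitions, ["b1", "b2"])
```

In every result each partition appears exactly once per group, which is the "one consumer per partition within a group" rule from the text.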

About adding new partitions

When partitions are added to a Topic, Kafka creates them in the cluster and assigns them to different Brokers so that data storage stays balanced and highly available. Kafka does not automatically move or rebalance data from existing partitions into the new ones: new partitions start out empty and only receive data as producers send subsequent messages. The consumer group detects the change in the number of partitions and triggers a rebalance.
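A small simulation makes the "existing data is not moved" point concrete. The topic layout and key names are hypothetical, and `zlib.crc32` is only a stand-in for Kafka's real key hash. The sketch also computes which keys would map to a different partition after the change, a side effect of hashing modulo the partition count.

```python
import zlib


def partition_for(key, num_partitions):
    """Stand-in for key hashing (Kafka's partitioner uses murmur2, not CRC32)."""
    return zlib.crc32(key) % num_partitions


# Hypothetical topic with two partitions: partition id -> stored keys.
topic = {0: [], 1: []}
keys = [f"user-{i}".encode() for i in range(6)]
for key in keys:
    topic[partition_for(key, len(topic))].append(key)

snapshot = {p: list(records) for p, records in topic.items()}

# Add two partitions. Nothing is copied into them.
topic[2] = []
topic[3] = []

# Old partitions still hold exactly what they held before.
unchanged = all(topic[p] == snapshot[p] for p in snapshot)

# Keys whose target partition differs between 2 and 4 partitions.
moved = [k for k in keys if partition_for(k, 2) != partition_for(k, 4)]
```

Any key listed in `moved` would have its future messages written to a new partition while its older messages stay where they are, so per-key ordering across the change is something to plan for.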

2. Multiple replicas per partition


Kafka allows each partition to have multiple replicas, which are stored on different brokers. One replica is called the Leader, which is responsible for processing all read and write requests, and the other replicas are Followers, which are responsible for synchronizing the Leader's data.

At any given time, only one replica of a partition, the Leader replica, serves reads and writes. All the other replicas are Follower replicas and act as backups.
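The leader and follower roles can be sketched as follows. This is a toy in-memory model: the class and method names are hypothetical, and synchronization is shown as an explicit copy step rather than the fetch requests that real followers send to the leader.

```python
class Replica:
    """One copy of a partition, hosted on one broker (hypothetical model)."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []


class ReplicatedPartition:
    """A partition with one Leader replica and any number of Followers."""

    def __init__(self, broker_ids):
        self.replicas = [Replica(broker_id) for broker_id in broker_ids]
        self.leader = self.replicas[0]  # assumption: the first replica leads

    def followers(self):
        return [r for r in self.replicas if r is not self.leader]

    def produce(self, record):
        """Writes go to the Leader only; returns the record's offset."""
        self.leader.log.append(record)
        return len(self.leader.log) - 1

    def consume(self, offset):
        """Reads are served by the Leader only."""
        return self.leader.log[offset]

    def sync_followers(self):
        """Each Follower copies whatever it is missing from the Leader's log."""
        for follower in self.followers():
            follower.log.extend(self.leader.log[len(follower.log):])


partition = ReplicatedPartition(broker_ids=[1, 2, 3])
partition.produce("m0")
partition.produce("m1")
partition.sync_followers()
follower_logs = [f.log for f in partition.followers()]
```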