ELK Filebeat Kafka

2024-07-11

1. Kafka Overview

1. Kafka definition

Kafka is a distributed message queue (MQ) based on the publish/subscribe model. It is mainly used for real-time computing and log collection in big data systems.
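
To make the publish/subscribe model concrete, here is a minimal in-memory sketch. It is not Kafka's API: the class `InMemoryBroker`, the topic name `nginx-access`, and the sample log lines are all hypothetical names chosen for illustration. It only shows the core idea that a publisher writes to a topic without knowing who reads from it.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Toy stand-in for a message broker (hypothetical, not Kafka's API)."""

    def __init__(self) -> None:
        # Maps a topic name to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = InMemoryBroker()
indexed: List[str] = []   # subscriber 1: keeps every log line
alerts: List[str] = []    # subscriber 2: keeps only lines with status 500

broker.subscribe("nginx-access", indexed.append)
broker.subscribe(
    "nginx-access",
    lambda line: alerts.append(line) if " 500 " in line else None,
)

# The publisher only knows the topic name, not the subscribers.
broker.publish("nginx-access", "GET /index.html 200 12ms")
broker.publish("nginx-access", "GET /api/orders 500 48ms")
```

A real Kafka deployment differs in important ways: messages are persisted on the brokers, and consumers pull them at their own pace rather than having them pushed through callbacks.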

2. Introduction to Kafka

Kafka was originally developed by LinkedIn. It is a distributed messaging middleware system that supports partitioning and multiple replicas and relies on ZooKeeper for coordination. Its biggest strength is that it can process large volumes of data in real time, which suits a wide range of scenarios: Hadoop-based batch processing systems, low-latency real-time systems, Spark/Flink stream processing engines, nginx access log collection, messaging services, and so on. It is written in Scala and Java.
LinkedIn contributed Kafka to the Apache Software Foundation, where it entered the Apache Incubator in 2011 and later became a top-level open source project.

3. Why do we need Message Queuing (MQ)?

The main reason is that in a high-concurrency environment, synchronous requests cannot be processed in time and often block. For example, when a large number of requests hit the database concurrently, they cause row and table locks. Request threads then pile up, the database reports a "Too many connections" error, and the failure cascades through the system (the avalanche effect).
A message queue relieves this pressure by processing requests asynchronously. Message queues are commonly used for asynchronous processing, traffic peak shaving, application decoupling, and message communication.
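
The asynchronous pattern described above can be sketched with Python's standard `queue` and `threading` modules. This is a simplified illustration under stated assumptions, not a Kafka client: `handle_request`, `worker`, and `saved_orders` (standing in for a database table) are hypothetical names. The request handler only enqueues and returns, while a single worker writes to the "database" one item at a time, so a burst of requests never turns into a burst of concurrent database connections.

```python
import queue
import threading

pending = queue.Queue()   # the message queue between the front end and the worker
saved_orders = []         # hypothetical stand-in for a database table


def handle_request(order_id: int) -> str:
    """Front-end handler: enqueue the work and return immediately."""
    pending.put(order_id)
    return "accepted"


def worker() -> None:
    """Single consumer: writes to the 'database' one order at a time."""
    while True:
        order_id = pending.get()
        if order_id is None:          # sentinel value: stop the worker
            pending.task_done()
            break
        saved_orders.append(order_id)
        pending.task_done()


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

# A burst of 1000 requests is accepted without waiting for the database.
responses = [handle_request(i) for i in range(1000)]

pending.put(None)   # tell the worker to stop once the backlog is drained
pending.join()      # wait until every queued item has been processed
consumer.join()
```

Because there is only one consumer here, the orders are written in the order they were accepted. With several consumers, throughput goes up but global ordering is no longer guaranteed.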

Common MQ middleware currently includes ActiveMQ, RabbitMQ, RocketMQ, and Kafka.

4. Benefits of using message queues

(1) Decoupling
Decoupling lets you extend or modify the processing on either side independently, as long as both sides adhere to the same interface constraints.

(2) Recoverability
When one part of the system fails, the rest of the system is not affected. The message queue reduces the coupling between processes, so even if a process that consumes messages crashes, the messages already in the queue can still be processed after it recovers.

(3) Buffer
A queue helps control and smooth the rate at which data flows through the system, resolving the mismatch between the rate at which messages are produced and the rate at which they are consumed.
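
As a rough illustration of buffering, the sketch below puts a bounded queue between a fast producer and a slower consumer. The queue size of 5, the message count of 20, and the 1 ms processing delay are arbitrary assumptions for the example. The bounded in-memory queue is also an assumption made for illustration only: Kafka itself buffers by persisting messages to disk on the brokers instead of blocking producers this way.

```python
import queue
import threading
import time

buffer = queue.Queue(maxsize=5)   # bounded buffer (size chosen arbitrarily)
consumed = []                     # messages the consumer has finished processing
depths = []                       # queue depth observed after each put


def producer() -> None:
    """Fast side: emits 20 messages as quickly as the buffer allows."""
    for i in range(20):
        buffer.put(i)             # blocks while the buffer is full
        depths.append(buffer.qsize())
    buffer.put(None)              # sentinel value: no more messages


def consumer() -> None:
    """Slow side: simulates 1 ms of work per message."""
    while True:
        item = buffer.get()
        if item is None:
            break
        time.sleep(0.001)
        consumed.append(item)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The producer never has more than five messages waiting, and the consumer still receives every message in order, so the speed difference between the two sides is absorbed by the queue rather than by either side failing.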

(4) Flexibility
