Hundred-day foundation building day 17 - Introduction to message queues
Basic Concepts
What is a message queue?
MQ:Message Queue
The queue for storing messages consumes messages in order (first in, first out).
The two parties involved in message passing are called Producer andconsumer , producers are responsible for sending messages, and consumers are responsible for processing messages.
What is the use of message queues?
Three familiar benefits:
Asynchronous processing: Asynchronous processing is achieved through message queues. After the corresponding message is sent to the message queue, the result is returned immediately, reducing response time and improving user experience. Subsequently, the system consumes the message.
Peak shaving/current limiting: First, store the transaction messages generated by short-term high concurrency in the message queue, and then the backend service slowly consumes these messages according to its own capabilities, so as to avoid directly bringing down the backend service.
Reduce system coupling: There is no direct call between modules, so adding new modules or modifying modules will have less impact on other modules.
Enterprise application scenarios:
Implementing distributed transactions: One of the solutions for distributed transactions is MQ transactions, which are supported by most MQs. Transactions allow event stream applications toConsumption, processing, productionThe entire message process is defined as an atomic operation.
Sequence guarantee: Applicable to scenarios that have strict requirements on data order, supported by most MQs.
Delayed/timed processing: After the message is sent, it will not be consumed immediately, but a time will be specified and it will be consumed after the time.
Data stream processing: For the massive data streams generated by distributed systems, such as business logs, monitoring data, user behavior, etc., message queues can collect this data in real time or in batches, and import it into the big data processing engine to achieve efficient data stream management and processing.
What problems will arise when using message queues?
Reduced system availability: The system availability is reduced to some extent. Why do I say that? Before adding MQ, you don’t have to worry about message loss or MQ crashes, but after introducing MQ, you need to consider it!
Increased system complexity: After joining MQ, you need to ensure that messages are not consumed repeatedly, handle message loss, ensure the order of message delivery, and so on!
Consistency issues: I mentioned above that message queues can achieve asynchrony, and the asynchrony brought by message queues can indeed improve the system response speed. However, what if the real consumer of the message does not consume the message correctly? This will lead to inconsistent data!
Common message queues
Kafka
Kafka is an open sourceDistributed Stream Processing PlatformIt has become a top-level Apache project. It was used to process massive logs in the early days, and later gradually developed into a full-featured high-performance message queue.
A streaming platform has three key functions:
message queue: Publish and subscribe to message streams. This function is similar to a message queue, which is why Kafka is also classified as a message queue.
Fault-tolerant persistent storage of recorded message streams: Kafka will persist messages to disk, effectively avoiding the risk of message loss.
Streaming Platform: Kafka provides a complete stream processing library for processing messages when they are published.
RocketMQ
RocketMQ is an open source cloud-native "message, event, and stream" real-time data processing platform developed by Alibaba. It draws on Kafka and has become a top Apache project.
The core features of RocketMQ (from the RocketMQ official website):
Cloud Native: Born and grown in the cloud, infinite elastic scaling, Kubernetes-friendly
High throughput: Trillion-level throughput is guaranteed, meeting the needs of both microservice and big data scenarios.
Stream processing: Provides a lightweight, highly scalable, high-performance, and richly functional stream computing engine.
Financial-grade: Financial-grade stability, widely used in core transaction links.
Extremely simple architecture: zero external dependencies, shared-nothing architecture.
Eco-friendly: Seamlessly connect to microservices, real-time computing, data lakes and other surrounding ecosystems.
RabbitMQ
RocketMQ is an open source cloud-native "message, event, and stream" real-time data processing platform developed by Alibaba. It draws on Kafka and has become a top Apache project.
The core features of RocketMQ (from the RocketMQ official website):
Cloud Native: Born and grown in the cloud, infinite elastic scaling, Kubernetes-friendly
High throughput: Trillion-level throughput is guaranteed, meeting the needs of both microservice and big data scenarios.
Stream processing: Provides a lightweight, highly scalable, high-performance, and richly functional stream computing engine.
Financial-grade: Financial-grade stability, widely used in core transaction links.
Extremely simple architecture: zero external dependencies, shared-nothing architecture.
Eco-friendly: Seamlessly connect to microservices, real-time computing, data lakes and other surrounding ecosystems.
Pulsar
Pulsar is a next-generation cloud-native distributed message streaming platform originally developed by Yahoo and has become an Apache top-level project.
Pulsar integrates messaging, storage, and lightweight functional computing. It adopts a computing and storage separation architecture design, supports multi-tenancy, persistent storage, and cross-regional data replication in multiple computer rooms. It has streaming data storage characteristics such as strong consistency, high throughput, low latency, and high scalability. It is regarded as the best solution for real-time message stream transmission, storage, and computing in the cloud-native era.
The key features of Pulsar are as follows (from the official website):
It is the next generation cloud-native distributed message streaming platform.
A single instance of Pulsar natively supports multiple clusters and can seamlessly replicate messages between clusters across data centers.
Very low publishing latency and end-to-end latency.
Seamlessly scales to over a million topics.
Simple client API supporting Java, Go, Python and C++.
Multiple subscription modes for topics (exclusive, shared, and failover).
Message delivery is guaranteed through the persistent message storage mechanism provided by Apache BookKeeper.
Stream-native data processing is implemented by the lightweight serverless computing framework Pulsar Functions.
Pulsar IO, a serverless connector framework based on Pulsar Functions, makes it easier to move data into and out of Apache Pulsar.
Tiered storage can offload data from hot storage to cold/long-term storage (such as S3, GCS) when the data becomes obsolete.
The above MQ comparison:
Contrast direction
summary
Throughput
The throughput of ActiveMQ and RabbitMQ at the 10,000 level (ActiveMQ has the worst performance) is an order of magnitude lower than that of RocketMQ and Kafka at the 100,000 or even 1 million level.
Availability
Both can achieve high availability. ActiveMQ and RabbitMQ are based on master-slave architecture to achieve high availability. RocketMQ is based on distributed architecture. Kafka is also distributed, with multiple copies of one data. If a few machines go down, data will not be lost and will not cause unavailability.
Timeliness
RabbitMQ is developed based on Erlang, so it has strong concurrency capabilities, extremely good performance, and very low latency, reaching the microsecond level. The other ones are all at the ms level.
Functional support
Pulsar has more comprehensive functions, supporting multi-tenancy, multiple consumption modes, persistence modes and other functions. It is the next-generation cloud-native distributed message streaming platform.
Message Lost
The possibility of loss for ActiveMQ and RabbitMQ is very low, and Kafka, RocketMQ and Pulsar can theoretically achieve 0 loss.
Summarize:
Although RabbitMQ is slightly inferior to Kafka, RocketMQ and Pulsar in terms of throughput, it is developed based on Erlang, so it has strong concurrency, extremely good performance, and low latency, reaching the microsecond level. However, because RabbitMQ is developed based on Erlang, few companies in China have the strength to do research and customization at the Erlang source code level. If the business scenario does not require too high concurrency (hundreds of thousands or millions), then RabbitMQ may be your first choice among these message queues.
RocketMQ and Pulsar support strong consistency and can be used in scenarios with high requirements for message consistency.
RocketMQ is produced by Alibaba and is a Java open source project. We can read the source code directly and customize our own company's MQ. RocketMQ has been tested in Alibaba's actual business scenarios.
The characteristics of Kafka are actually very obvious. It only provides a few core functions, but provides ultra-high throughput, millisecond-level latency, extremely high availability and reliability, and the distribution can be expanded arbitrarily. At the same time, Kafka is best to support a small number of topics to ensure its ultra-high throughput. The only disadvantage of Kafka is that messages may be consumed repeatedly, which will have a very slight impact on data accuracy. In the field of big data and log collection, this slight impact can be ignored. This feature is naturally suitable for real-time computing of big data and log collection. If it is a real-time computing, log collection and other scenarios in the field of big data, using Kafka is the industry standard, absolutely no problem, the community is very active, and it will never be outdated, not to mention that it is almost a de facto standard in this field in the world.
A brief survey on enterprise-level self-developed MQ