Technology Sharing

Kafka Interview Questions (Basics / Advanced / Advanced Edition)

2024-07-12


Table of contents

Kafka Basics

1. What are the uses of Kafka? What are the usage scenarios?

2. What do ISR and AR stand for in Kafka? What does ISR scaling mean?

3. What do HW, LEO, LSO, LW, etc. in Kafka stand for?

4. How is message order reflected in Kafka?

5. Do you understand the partitioner, serializer, and interceptor in Kafka? What is the processing order between them?

6. What is the overall structure of the Kafka producer client?

7. How many threads are used in the Kafka producer client? What are they?

8. What are the design flaws of Kafka's old Scala consumer client?

9. Is it true that "if the number of consumers in a consumer group exceeds the number of partitions in a topic, some consumers will not be able to consume data"? If so, is there any hack to fix this?

10. What situations may lead to duplicate consumption?

11. In what situations will messages fail to be consumed (be missed)?

12. KafkaConsumer is not thread-safe, so how can multi-threaded consumption be implemented?

13. Briefly describe the relationship between consumers and consumer groups

14. When you create (delete) a topic using kafka-topics.sh, what logic does Kafka execute behind the scenes?

15. Can the number of topic partitions be increased? If so, how can it be increased? If not, why not?

16. Can the number of topic partitions be reduced? If so, how? If not, why not?

17. How to choose the appropriate number of partitions when creating a topic?

Kafka Advanced

1. What internal topics does Kafka currently have? What are their characteristics? What are their respective functions?

2. What is a preferred replica? What special role does it play?

3. Where does Kafka have the concept of partition allocation? Briefly describe the general process and principle

4. Briefly describe the log directory structure of Kafka

5. What are the index files in Kafka?

6. If I specify an offset, how does Kafka find the corresponding message?

7. If I specify a timestamp, how does Kafka find the corresponding message?

8. Talk about your understanding of Kafka's Log Retention

1. Time-based

2. Based on log size

3. Based on the log start offset

9. Talk about your understanding of Kafka's Log Compaction

10. Talk about your understanding of Kafka's underlying storage

11. Let's talk about the principle of Kafka's delayed operations

12. Let's talk about the role of the Kafka controller

13. What are the design flaws of Kafka's old Scala consumer client?

14. What is the principle of consumption rebalancing? (Hint: Consumer Coordinator and Consumer Group Coordinator)

15. How is idempotence achieved in Kafka?

Kafka Advanced Edition

1. How are transactions implemented in Kafka?

2. What is a failed (out-of-sync) replica? How is it handled?

3. The evolution of HW and LEO in each replica under multiple replicas

4. What improvements has Kafka made in terms of reliability?

5. Why does Kafka not support read-write separation?

6. How to implement delay queue in Kafka

7. How to implement a dead letter queue and a retry queue in Kafka?

8. How to perform message auditing in Kafka?

9. How to do message tracking in Kafka?

10. How to calculate Lag? (Note the difference between read_uncommitted and read_committed states)

11. Which Kafka metrics should we pay attention to?

12. What design choices give Kafka such high performance?

1. Partition

2. Reduced network transmission overhead

3. Sequential reading and writing

4. Zero copy technology

5. Excellent file storage mechanism


Kafka Basics

1. What are the uses of Kafka? What are the usage scenarios?

Messaging system: Kafka and traditional messaging systems (also called message-oriented middleware) both provide system decoupling, redundant storage, traffic peak shaving, buffering, asynchronous communication, scalability, and recoverability. In addition, Kafka offers ordered message delivery and backtracking (replay) of consumption, which most messaging systems find difficult to provide.
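
As a minimal sketch of the messaging-system use case (not code from the original article), the Java producer below publishes an order event to a topic and reports the assigned partition and offset; the broker address, topic name, key, and value are assumptions made for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The producing service only knows the topic name; it is fully decoupled from
        // whichever consumer groups later read (or replay) "order-events".
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("order-events", "order-1001", "CREATED"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();                         // delivery failed
                        } else {
                            System.out.printf("stored at %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        } // close() flushes the buffered record before exiting
    }
}
```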

Storage system: Kafka persists messages to disk, which effectively reduces the risk of data loss compared with memory-based storage systems. Thanks to this persistence and to Kafka's multi-replica mechanism, Kafka can serve as a long-term data storage system: simply set the topic's retention policy to "permanent" or enable its log compaction feature.
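
A rough sketch of the storage-system use case: the AdminClient snippet below creates a topic configured for long-term retention, either by disabling time-based deletion (retention.ms=-1) or by switching the cleanup policy to compaction. The topic name, partition count, replication factor, and broker address are assumptions made for this example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class LongTermStorageTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Assumed topic name and sizing: 3 partitions, replication factor 3.
            NewTopic topic = new NewTopic("user-profile-changelog", 3, (short) 3)
                    .configs(Map.of(
                            // keep data forever: disable time-based deletion ...
                            TopicConfig.RETENTION_MS_CONFIG, "-1",
                            // ... and/or keep only the latest value per key via log compaction
                            TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```

In practice you would choose between the two settings depending on whether you need the full event history or only the latest value per key.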

Streaming platform: Kafka not only provides a reliable source of data for popular stream processing frameworks, but also ships with a complete stream processing library (Kafka Streams) that supports operations such as windowing, joins, transformations, and aggregations.
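
To make the stream-processing point concrete, here is a minimal Kafka Streams sketch (assuming a 3.x kafka-streams dependency) that groups a keyed stream, applies a one-minute tumbling window, and counts records per window; the application id, broker address, and topic name are invented for the example.

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class PageViewCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counter");    // assumed application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String())) // assumed input topic
               .groupByKey()                                                          // group events by page key
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))      // 1-minute tumbling window
               .count()                                                               // views per page per window
               .toStream()
               .foreach((windowedKey, count) -> System.out.printf(
                       "page=%s windowStart=%d views=%d%n",
                       windowedKey.key(), windowedKey.window().start(), count));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```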