2024-07-12
한어Русский языкEnglishFrançaisIndonesianSanskrit日本語DeutschPortuguêsΕλληνικάespañolItalianoSuomalainenLatina
The following example is used to test and illustrate the execution process of Kafka message streaming by counting the number of word occurrences in the message.
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<exclusions>
<exclusion>
<artifactId>connect-json</artifactId>
<groupId>org.apache.kafka</groupId>
</exclusion>
<exclusion>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
</dependency>
</dependencies>
First, write and create three classes, which serve as message producers, message consumers, and stream processors.
KafkaStreamProducer
: Message producer
public class KafkaStreamProducer {
public static void main(String[] args) throws ExecutionException, InterruptedException {
Properties properties = new Properties();
//kafka的连接地址
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.246.128:9092");
//发送失败,失败的重试次数
properties.put(ProducerConfig.RETRIES_CONFIG, 5);
//消息key的序列化器
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
//消息value的序列化器
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
for (int i = 0; i < 5; i++) {
ProducerRecord<String, String> producerRecord = new ProducerRecord<>("kafka-stream-topic-input", "hello kafka");
producer.send(producerRecord);
}
producer.close();
}
}
The message producer sends akafka-stream-topic-input
Send five timeshello kafka
KafkaStreamConsumer
: Message Consumer
public class KafkaStreamConsumer {
public static void main(String[] args) {
Properties properties = new Properties();
//kafka的连接地址
properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.246.128:9092");
//消费者组
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
//消息的反序列化器
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
//手动提交偏移量
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
//订阅主题
consumer.subscribe(Collections.singletonList("kafka-stream-topic-output"));
try {
while (true) {
ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofMillis(1000));
for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
System.out.println("consumerRecord.key() = " + consumerRecord.key());
System.out.println("consumerRecord.value() = " + consumerRecord.value());
}
// 异步提交偏移量
consumer.commitAsync();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
// 同步提交偏移量
consumer.commitSync();
}
}
}
KafkaStreamQuickStart
: Stream processing class
public class KafkaStreamQuickStart {
public static void main(String[] args) {
Properties properties = new Properties();
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.246.128:9092");
properties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-quickstart");
StreamsBuilder streamsBuilder = new StreamsBuilder();
//流式计算
streamProcessor(streamsBuilder);
KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(), properties);
kafkaStreams.start();
}
/**
* 消息格式:hello world hello world
* 配置并处理流数据。
* 使用StreamsBuilder创建并配置KStream,对输入的主题中的数据进行处理,然后将处理结果发送到输出主题。
* 具体处理包括:分割每个消息的值,按值分组,对每个分组在10秒的时间窗口内进行计数,然后将结果转换为KeyValue对并发送到输出主题。
*
* @param streamsBuilder 用于构建KStream对象的StreamsBuilder。
*/
private static void streamProcessor(StreamsBuilder streamsBuilder) {
// 从"kafka-stream-topic-input"主题中读取数据流
KStream<String, String> stream = streamsBuilder.stream("kafka-stream-topic-input");
System.out.println("stream = " + stream);
// 将每个值按空格分割成数组,并将数组转换为列表,以扩展单个消息的值
stream.flatMapValues((ValueMapper<String, Iterable<String>>) value -> {
String[] valAry = value.split(" ");
return Arrays.asList(valAry);
})
// 按消息的值进行分组,为后续的窗口化计数操作做准备
.groupBy((key, value) -> value)
// 定义10秒的时间窗口,在每个窗口内对每个分组进行计数
.windowedBy(TimeWindows.of(Duration.ofSeconds(10)))
.count()
// 将计数结果转换为流,以便进行进一步的处理和转换
.toStream()
// 显示键值对的内容,并将键和值转换为字符串格式
.map((key, value) -> {
System.out.println("key = " + key);
System.out.println("value = " + value);
return new KeyValue<>(key.key().toString(), value.toString());
})
// 将处理后的流数据发送到"kafka-stream-topic-output"主题
.to("kafka-stream-topic-output");
}
}
The processing class first starts from the subjectkafka-stream-topic-input
Get the message data from the source file and send it to the topic after processing.kafka-stream-topic-output
Then the message consumerKafkaStreamConsumer
Consumption
When entering a topickafka-stream-topic-input
When reading a data stream, each message is a key-value pair. Assume the key of the input message isnull
or a specific string, depending on how the message is sent to the input topic.
KStream<String, String> stream = streamsBuilder.stream("kafka-stream-topic-input");
useflatMapValues
The method splits the message value, but this operation does not change the message key. If the key of the input message isnull
, then at this stage the key of the message is stillnull
。
stream.flatMapValues((ValueMapper<String, Iterable<String>>) value -> {
String[] valAry = value.split(" ");
return Arrays.asList(valAry);
})
In Kafka Streams, when usinggroupBy
When you group a stream using the grouping method, you are actually specifying a new key that will be used in subsequent windowing and aggregation operations.groupBy
Method is used to group by message value:
.groupBy((key, value) -> value)
This means that after the grouping operation, the key of each message in the stream is set to the value of the message.map
See in the methodkey
Parameters, thiskey
is actually the original value of the message, becausegroupBy
After that, the value of the message has become the key.
At this stage, messages are windowed and counted, but the keys remain the same.
.windowedBy(TimeWindows.of(Duration.ofSeconds(10)))
.count()
When converting the count result into a stream, the key is still the key used for the previous grouping.
.toStream()
existmap
In the method, you seekey
The parameters are actually the grouped keys, which are the original values of the messages:
.map((key, value) -> {
System.out.println("key = " + key);
System.out.println("value = " + value);
return new KeyValue<>(key.key().toString(), value.toString());
})
map
Methodkey.key().toString()
is to get the string representation of the key, whilevalue.toString()
is to convert the count value into a string.
.to("kafka-stream-topic-output");