In this post we will run a Kafka cluster with 3 Kafka brokers on Docker Compose, use a producer to send messages to our topics and a consumer to receive messages from those topics (both developed in Python), and explore kafka-ui.
What is Kafka?
Kafka is a distributed event store and stream processing platform. Kafka is used to build real-time streaming data pipelines and real-time streaming applications.
At a high level, a typical Kafka setup has the following components:
Zookeeper: Kafka relies on Zookeeper to store cluster metadata and to elect the controller broker, which in turn assigns leaders for topic partitions.
Broker: A Kafka server that receives messages from producers, assigns offsets to them and commits the messages to disk storage. An offset identifies a message's position within a partition, so that after a failure consumers know where to resume from their last message.
Topic: A topic can be thought of as a category used to organize messages. Producers write messages to topics, and consumers read from those topics.
Partitions: A topic is split into multiple partitions, which can be spread across brokers. This improves scalability through parallelism (not just one broker). Kafka also replicates partitions across brokers for fault tolerance.
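To make partitions and offsets more concrete, here is a small in-memory sketch in Python. It is not a Kafka client: `ToyTopic` and its methods are hypothetical names used only for illustration, and the CRC32-based key hashing is an assumption (Kafka's default partitioner uses a different hash, but the idea of "same key, same partition" is the same).

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic; illustrative only, not a Kafka client."""

    def __init__(self, num_partitions):
        # Each partition is an append-only list of messages.
        self.partitions = [[] for _ in range(num_partitions)]
        # Next offset to read, per (consumer group, partition).
        self.committed = defaultdict(int)

    def produce(self, key, value):
        # Assumption: CRC32 of the key modulo the partition count.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        # The offset is simply the message's position in the partition.
        return partition, len(log) - 1

    def consume(self, group, partition, max_records=10):
        start = self.committed[(group, partition)]
        records = self.partitions[partition][start:start + max_records]
        # Commit the next offset so that a restart resumes from here.
        self.committed[(group, partition)] = start + len(records)
        return records
```

Producing twice with the same key lands both messages in the same partition at consecutive offsets, and a consumer group that has already read them only receives messages produced afterwards.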
For more detailed information about Kafka and its components, I encourage you to visit the post mentioned above.
Launch Kafka
This is the docker-compose.yaml that we will use to run a Kafka cluster with 3 broker containers, 1 Zookeeper container, 1 producer, 1 consumer and kafka-ui.
You can verify that the brokers are passing their health checks with:
docker-compose ps
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
broker-1 confluentinc/cp-kafka:7.4.0 "/etc/confluent/dock…" broker-1 5 minutes ago Up 4 minutes (healthy) 0.0.0.0:9091->9091/tcp, :::9091->9091/tcp, 9092/tcp
broker-2 confluentinc/cp-kafka:7.4.0 "/etc/confluent/dock…" broker-2 5 minutes ago Up 4 minutes (healthy) 0.0.0.0:9092->9092/tcp, :::9092->9092/tcp
broker-3 confluentinc/cp-kafka:7.4.0 "/etc/confluent/dock…" broker-3 5 minutes ago Up 4 minutes (healthy) 9092/tcp, 0.0.0.0:9093->9093/tcp, :::9093->9093/tcp
consumer ruanbekker/kafka-producer-consumer:2023-05-17 "sh /src/run.sh $ACT…" consumer 5 minutes ago Up 4 minutes
kafka-ui provectuslabs/kafka-ui:latest "/bin/sh -c 'java --…" kafka-ui 5 minutes ago Up 4 minutes 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
producer ruanbekker/kafka-producer-consumer:2023-05-17 "sh /src/run.sh $ACT…" producer 5 minutes ago Up 4 minutes
zookeeper confluentinc/cp-zookeeper:7.4.0 "/etc/confluent/dock…" zookeeper 5 minutes ago Up 5 minutes (healthy) 0.0.0.0:2888->2888/tcp, :::2888->2888/tcp, 0.0.0.0:3888->3888/tcp, :::3888->3888/tcp, 2181/tcp, 0.0.0.0:32181->32181/tcp, :::32181->32181/tcp
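If you prefer to check from Python, the following sketch tests whether the broker ports published in the listing above (9091, 9092 and 9093) accept TCP connections on localhost. The function name `port_is_open` is my own, and an open port only shows that something is listening; it is a weaker signal than the container health check.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved hosts.
        return False


# Host ports published by the three brokers in the listing above.
BROKER_PORTS = [9091, 9092, 9093]

if __name__ == "__main__":
    for port in BROKER_PORTS:
        state = "reachable" if port_is_open("localhost", port) else "unreachable"
        print(f"localhost:{port} is {state}")
```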
Producers and Consumers
The producer generates random data and sends it to a topic. The consumer listens on the same topic and reads the messages from it.
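As a rough idea of what the producer side might look like, here is a minimal sketch that builds one random record and serializes it to bytes. This is not the code inside the producer image: the field names are taken from the consumer output shown later in this post, while the value ranges, the platform choices other than "Tablet", the JSON encoding and the function names are assumptions. The `address` field is left out to keep the sketch dependency-free.

```python
import json
import random
import uuid
from datetime import datetime


def build_message(sequence_id):
    """Build one random record; value ranges here are assumptions."""
    return {
        "sequence_id": sequence_id,
        "user_id": str(random.randint(10000, 99999)),
        "transaction_id": str(uuid.uuid4()),
        "product_id": str(random.randint(10000, 99999)),
        "signup_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        # Only "Tablet" appears in the sample output; the rest are made up.
        "platform_id": random.choice(["Tablet", "Mobile", "Laptop"]),
        "message": f"transaction made by userid {random.randint(10**14, 10**15 - 1)}",
    }


def serialize(message):
    """Encode a record as UTF-8 JSON bytes, ready to hand to a Kafka client."""
    return json.dumps(message).encode("utf-8")


def deserialize(raw):
    """Decode bytes received by a consumer back into a dict."""
    return json.loads(raw.decode("utf-8"))
```

Sending the bytes to a topic would go through a Kafka client library, which is not shown here.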
To view the output of what the producer is doing, you can tail the logs:
docker logs -f producer
setting up producer, checking if brokers are available
brokers not available yet
brokers are available and ready to produce messages
message sent to kafka with squence id of 1
message sent to kafka with squence id of 2
message sent to kafka with squence id of 3
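The "brokers not available yet" lines suggest the producer retries until the brokers respond. A minimal sketch of such a wait loop, assuming a hypothetical `probe` callable that returns True once the brokers are reachable (the actual check used by the producer image is not shown in this post):

```python
import time


def wait_until_ready(probe, retries=10, delay=1.0, sleep=time.sleep):
    """Call probe() until it returns True; return the attempt number."""
    for attempt in range(1, retries + 1):
        if probe():
            print("brokers are available and ready to produce messages")
            return attempt
        print("brokers not available yet")
        sleep(delay)  # fixed pause between attempts
    raise TimeoutError(f"brokers still unavailable after {retries} attempts")
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting, and raising after a bounded number of attempts avoids a container that hangs forever when the brokers never come up.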
And to view the output of what the consumer is doing, you can tail the logs:
docker logs -f consumer
starting consumer, checks if brokers are availabe
brokers not availbe yet
brokers are available and ready to consume messages
{'sequence_id': 10, 'user_id': '20520', 'transaction_id': '4026fd10-2aca-4d2e-8bd2-8ef0201af2dd', 'product_id': '17974', 'address': '71741 Lopez Throughway | South John | BT', 'signup_at': '2023-05-11 06:54:52', 'platform_id': 'Tablet', 'message': 'transaction made by userid 119740995334901'}
{'sequence_id': 11, 'user_id': '78172', 'transaction_id': '4089cee1-0a58-4d9b-9489-97b6bc4b768f', 'product_id': '21477', 'address': '735 Jasmine Village Apt. 009 | South Deniseland | BN', 'signup_at': '2023-05-17 09:54:10', 'platform_id': 'Tablet', 'message': 'transaction made by userid 159204336307945'}
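The consumer prints each record as a Python dict, so the log output can be turned back into data if you want to inspect it. The sketch below assumes one record per log line and uses a hypothetical helper name, `parse_records`; status lines and anything that is not a single well-formed dict literal are skipped.

```python
import ast


def parse_records(lines):
    """Yield dict records from consumer log lines, skipping status lines."""
    for line in lines:
        line = line.strip()
        if not (line.startswith("{") and line.endswith("}")):
            continue
        try:
            # literal_eval only accepts Python literals, so it is safe on log text.
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue
        if isinstance(record, dict):
            yield record
```

For example, `[r["sequence_id"] for r in parse_records(log_lines)]` gives the sequence ids that the consumer has received so far, where `log_lines` is whatever list of log lines you captured.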