Tutorial: Kafka basic commands

Familiarize yourself with the basic commands of Kafka by creating a topic, producing messages, and consuming messages.

Requirements

This tutorial assumes you are running a cluster deployed with tdp-getting-started, an easy-to-launch TDP environment for testing purposes. This deployment provides you with:

  • tdp_user, a user with the ability to kinit for authentication.
  • An edge node accessible over SSH.

Note: When using a TDP deployment other than tdp-getting-started, some commands must be adapted to your environment.

Before beginning the tutorial, connect to the cluster and authenticate yourself with kinit using the following commands:

# Connect to edge-01.tdp 
vagrant ssh edge-01
# Switch user to tdp_user
sudo su tdp_user
# Authenticate the user with the tdp_user Kerberos principal and keytab
kinit -kt ~/tdp_user.keytab tdp_user@REALM.TDP
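
To confirm that authentication succeeded, you can list your cached Kerberos credentials with klist:

```shell
# List cached Kerberos tickets; the output should show the
# tdp_user@REALM.TDP principal along with its expiration time
klist
```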

Topic creation

Use the following command to create a topic named tdp-user-system-metrics with three partitions and a replication factor of three, meaning each partition is stored on three brokers. The --command-config option specifies the client properties file used for Kafka authentication.

# Create tdp-user-system-metrics topic
# Replace <broker-host> with a Kafka broker address from your deployment
/bin/kafka-topics.sh --create \
--topic tdp-user-system-metrics \
--bootstrap-server <broker-host>:9092 \
--replication-factor 3 \
--partitions 3 \
--command-config /etc/kafka/conf/client.properties
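
To confirm the topic was created, you can list the topics visible to your user. The <broker-host> placeholder is an assumption; substitute a Kafka broker address from your deployment:

```shell
# List all topics; tdp-user-system-metrics should appear in the output
# Replace <broker-host> with a Kafka broker address from your deployment
/bin/kafka-topics.sh --list \
--bootstrap-server <broker-host>:9092 \
--command-config /etc/kafka/conf/client.properties
```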

Print the description of your topic using the describe command:

# Describe tdp-user-system-metrics topic
# Replace <broker-host> with a Kafka broker address from your deployment
/bin/kafka-topics.sh --describe \
--topic tdp-user-system-metrics \
--bootstrap-server <broker-host>:9092 \
--command-config /etc/kafka/conf/client.properties

Console Producer

In this example, we will continuously capture system metrics every 5 seconds and send them to a Kafka topic called tdp-user-system-metrics.

To start producing the messages, run the following command:

# Replace <broker-host> with a Kafka broker address from your deployment
while sleep 5; do ps -e -o pid,user,cmd,%cpu,%mem,vsz,rss,ni,state,etime --no-headers; done \
| /bin/kafka-console-producer.sh --topic tdp-user-system-metrics \
--bootstrap-server <broker-host>:9092 \
--producer.config /etc/kafka/conf/producer.properties

The loop runs every 5 seconds, using the ps command with the -e option to list all processes. It captures key information such as the process ID (PID), user, command, CPU usage, memory usage, and other metrics, while the --no-headers option excludes the header row from the output. The console producer then sends each line of this output as a separate message to the Kafka topic tdp-user-system-metrics, using the provided producer configuration.
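
Before wiring the loop to Kafka, you can preview the exact lines the producer will send by running the ps part on its own:

```shell
# Print a few sample metric lines, one process per line,
# in the same format the console producer will send to Kafka
ps -e -o pid,user,cmd,%cpu,%mem,vsz,rss,ni,state,etime --no-headers | head -n 5
```

Each line begins with the PID, so the first field of every Kafka message will be a number.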

Console Consumer

To begin consuming messages from the tdp-user-system-metrics topic, open a new terminal and execute the following command:

# Start consuming messages
# Replace <broker-host> with a Kafka broker address from your deployment
/bin/kafka-console-consumer.sh --topic tdp-user-system-metrics \
--bootstrap-server <broker-host>:9092 \
--consumer.config /etc/kafka/conf/consumer.properties \
--from-beginning
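
If you only want a quick look at the stream rather than a continuous feed, the console consumer can exit after a fixed number of messages. As above, <broker-host> is a placeholder; use a broker address from your deployment:

```shell
# Read the first 10 messages from the topic, then exit
# Replace <broker-host> with a Kafka broker address from your deployment
/bin/kafka-console-consumer.sh --topic tdp-user-system-metrics \
--bootstrap-server <broker-host>:9092 \
--consumer.config /etc/kafka/conf/consumer.properties \
--from-beginning \
--max-messages 10
```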

Topic removal

To delete the topic:

# Delete tdp-user-system-metrics topic
# Replace <broker-host> with a Kafka broker address from your deployment
/bin/kafka-topics.sh --delete \
--topic tdp-user-system-metrics \
--bootstrap-server <broker-host>:9092 \
--command-config /etc/kafka/conf/client.properties
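
You can confirm the topic is gone by listing topics and filtering for its name. Again, <broker-host> is a placeholder for a broker address in your deployment:

```shell
# Replace <broker-host> with a Kafka broker address from your deployment
/bin/kafka-topics.sh --list \
--bootstrap-server <broker-host>:9092 \
--command-config /etc/kafka/conf/client.properties | grep tdp-user-system-metrics
# No output means the topic has been removed
```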

Further reading

In the next tutorial, we will explore how to process the streaming data ingested into Kafka using Spark Structured Streaming, enabling near real-time data processing. To learn about Kafka's architecture and its components, refer to the Kafka overview.