Sunday, March 5, 2017

Configuring and Running Apache Kafka in IBM BigInsights

This post describes how to configure and run Apache Kafka on IBM BigInsights.

Apache Kafka is an open-source messaging system built on a publish-subscribe model. See https://kafka.apache.org/ for details.

I assume you are familiar with terms such as Producer, Consumer, Broker, Topic, and Partition. Here, I focus on creating multiple brokers in BigInsights, creating a topic, publishing messages from the command line, and reading them back from the brokers with a consumer.


Environment: BigInsights 4.2

Step 1: Creating Kafka Brokers from Ambari

By default, Ambari configures one Kafka Broker. Depending on your use case, you may need to create multiple brokers.

Log in to the Ambari UI, click Hosts, and add the Kafka Broker component to each node where you want to install a broker.


You can then see multiple brokers running on the Kafka service page in the Ambari UI.

 
Step 2: Create a Topic

Log in to one of the nodes where a broker is running, then create a topic.

cd /usr/iop/4.2.0.0/kafka/bin

su kafka -c "./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic CustomerOrder"

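A replication factor of 2 only works when at least two brokers are running, which is why Step 1 adds a second broker. The following is a minimal sketch of a pre-check you could run before creating the topic. The function name check_replication is hypothetical and is not part of Kafka; the broker count is something you would read off the Ambari UI.

```shell
# Sketch only: check_replication is a hypothetical helper, not a Kafka tool.
# Usage: check_replication <broker_count> <replication_factor>
check_replication() {
  if [ "$2" -le "$1" ]; then
    echo "ok: replication factor $2 fits $1 broker(s)"
  else
    echo "error: replication factor $2 exceeds $1 broker(s)" >&2
    return 1
  fi
}

# Two brokers, replication factor 2, as in the create command above.
check_replication 2 2
```

If the check fails, either add another broker from Ambari or lower --replication-factor.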

You can view the details of the topic with the describe command below.

su kafka -c "./kafka-topics.sh --describe --zookeeper localhost:2181 --topic CustomerOrder"

 
Step 3: Start the Producer

For the --broker-list argument, pass all the brokers that are running, as a comma-separated list of host:port pairs.

su kafka -c "./kafka-console-producer.sh --broker-list bi1.test.com:6667,bi2.test.com:6667 --topic CustomerOrder"

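With more than a couple of brokers, typing the --broker-list value by hand gets error-prone. As a rough sketch, the list can be assembled from the host names. The helper join_brokers is hypothetical (not part of Kafka), and the hosts and port 6667 are simply the ones used in this post. The script only prints the producer command so you can review it before running it.

```shell
# Sketch only: join_brokers is a hypothetical helper, not part of Kafka.
# Usage: join_brokers <port> <host> [<host> ...]
join_brokers() {
  port="$1"; shift
  list=""
  for host in "$@"; do
    # Append "host:port", adding a comma only between entries.
    list="${list:+$list,}${host}:${port}"
  done
  printf '%s\n' "$list"
}

BROKER_LIST="$(join_brokers 6667 bi1.test.com bi2.test.com)"

# Print the producer command for review instead of running it.
echo "./kafka-console-producer.sh --broker-list $BROKER_LIST --topic CustomerOrder"
```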
When you run the producer command, it waits for user input. You can enter a sample message:

{"ID":99, "CUSTOMID":234,"ADDRESS":"12,5-7,westmead", "ORDERID":99, "ITEM":"iphone6", "COST":980}

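The console producer sends each line it reads from standard input as a separate message, so a JSON message should stay on a single line. Here is a small sketch that generates a few one-line test orders instead of typing them. The function make_order and its reduced set of fields are illustrative assumptions, not something required by Kafka.

```shell
# Sketch only: make_order is a hypothetical helper that prints one
# single-line JSON order. Arguments: ID, CUSTOMID, ITEM, COST.
make_order() {
  printf '{"ID":%s, "CUSTOMID":%s, "ITEM":"%s", "COST":%s}\n' "$1" "$2" "$3" "$4"
}

# Generate three sample orders, one per line.
for id in 100 101 102; do
  make_order "$id" 234 iphone6 980
done

# Assumption: to publish them, pipe the loop's output into the producer
# command from Step 3 instead of typing the messages interactively.
```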

Step 4: Start the Consumer

Open another Linux terminal, change to the same /usr/iop/4.2.0.0/kafka/bin directory, and start the consumer. It will display all the messages sent by the producer.

su kafka -c "./kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic CustomerOrder"

 

Thus, we are able to configure and run a sample pub-sub system using Kafka.

