Showing posts with label Streaming. Show all posts
Showing posts with label Streaming. Show all posts

Sunday, February 12, 2017

Running HDFS Word Count using Spark Streaming in IBM BigInsights

This blog talks on running a simple word count example to demonstrate Spark Streaming in IBM BigInsights.

Environment : IBM BigInsights 4.2

Step 1: Run the Spark Streaming word Count example for HDFS.

su hdfs

cd /usr/iop/current/spark-client

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount --master yarn-client lib/spark-examples.jar /tmp/wordcount


 The above statement will be listening to the hdfs folder ( /tmp/wordcount ). Whenever a file is loaded to hdfs folder, it will do a word count and output it.


 Step2: Open another Linux terminal and run the below command as hdfs user.

echo "Hello - Date is `date`" | hadoop fs -put - /tmp/wordcount/test1.txt



In the Linux terminal in step 1, you can see the output of the word count.















The above example will help us to validate the Spark Streaming.