Sunday, February 12, 2017

Running HDFS Word Count using Spark Streaming in IBM BigInsights

This blog walks through a simple word count example that demonstrates Spark Streaming in IBM BigInsights.

Environment : IBM BigInsights 4.2

Step 1: Run the Spark Streaming word count example against HDFS.

su hdfs

cd /usr/iop/current/spark-client

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount --master yarn-client lib/spark-examples.jar /tmp/wordcount
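Before submitting, make sure the directory the job will watch already exists in HDFS; Spark Streaming's file-based source fails if the monitored path is missing. A small precaution, using the same path passed to spark-submit above:

```shell
# Create the HDFS directory the streaming job will monitor (run as the hdfs user)
hadoop fs -mkdir -p /tmp/wordcount
```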


The above command listens to the HDFS directory ( /tmp/wordcount ). Whenever a new file is written to that directory, the job counts the words in it and prints the result.
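Conceptually, each streaming batch performs a classic word count over the newly arrived lines. The same transformation can be sketched locally with standard shell tools (this pipeline is only an illustration of the logic, not part of the Spark job):

```shell
# Split words onto separate lines, count occurrences, and sort by frequency
echo "spark streaming spark hdfs" | tr ' ' '\n' | sort | uniq -c | sort -rn
```

In the Spark example, the equivalent work is done in parallel across the cluster with flatMap, map, and reduceByKey.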


Step 2: Open another Linux terminal and run the command below as the hdfs user.

echo "Hello - Date is `date`" | hadoop fs -put - /tmp/wordcount/test1.txt
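You can also push several files to watch repeated batches arrive; each new file in the monitored directory produces a fresh set of counts (the file names and sleep interval here are arbitrary):

```shell
# Each file dropped into the watched directory triggers a new word count batch
for i in 1 2 3; do
  echo "hello spark streaming test $i" | hadoop fs -put - /tmp/wordcount/test-loop-$i.txt
  sleep 5
done
```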



In the Linux terminal from Step 1, you can see the output of the word count.
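The streaming job prints each batch with a timestamp header followed by (word, count) pairs. The exact timestamp and date string will differ, but for the file pushed in Step 2 the output looks roughly like this:

```
-------------------------------------------
Time: 1486900000000 ms
-------------------------------------------
(Hello,1)
(-,1)
(Date,1)
(is,1)
...
```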

This simple example lets you verify that Spark Streaming is working in your BigInsights environment.
