Sunday, February 12, 2017

Running HDFS Word Count using Spark Streaming in IBM BigInsights

This blog walks through a simple word count example that demonstrates Spark Streaming in IBM BigInsights.

Environment : IBM BigInsights 4.2

Step 1: Run the Spark Streaming word count example against HDFS.

su hdfs

cd /usr/iop/current/spark-client

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount --master yarn-client lib/spark-examples.jar /tmp/wordcount
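Before submitting, make sure the directory the job will watch already exists in HDFS; Spark Streaming's file-based source fails if the monitored path is missing. A small precaution, using the same path passed to spark-submit above:

```shell
# Create the HDFS directory the streaming job will monitor (run as the hdfs user)
hadoop fs -mkdir -p /tmp/wordcount
```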


The above command listens to the HDFS directory ( /tmp/wordcount ). Whenever a new file is written to that directory, the job counts the words in it and prints the result.
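Conceptually, each streaming batch performs a classic word count over the newly arrived lines. The same transformation can be sketched locally with standard shell tools (this pipeline is only an illustration of the logic, not part of the Spark job):

```shell
# Split words onto separate lines, count occurrences, and sort by frequency
echo "spark streaming spark hdfs" | tr ' ' '\n' | sort | uniq -c | sort -rn
```

In the Spark example, the equivalent work is done in parallel across the cluster with flatMap, map, and reduceByKey.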


Step 2: Open another Linux terminal and run the command below as the hdfs user.

echo "Hello - Date is `date`" | hadoop fs -put - /tmp/wordcount/test1.txt
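You can also push several files to watch repeated batches arrive; each new file in the monitored directory produces a fresh set of counts (the file names and sleep interval here are arbitrary):

```shell
# Each file dropped into the watched directory triggers a new word count batch
for i in 1 2 3; do
  echo "hello spark streaming test $i" | hadoop fs -put - /tmp/wordcount/test-loop-$i.txt
  sleep 5
done
```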



In the Linux terminal from Step 1, you can see the output of the word count.
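The streaming job prints each batch with a timestamp header followed by (word, count) pairs. The exact timestamp and date string will differ, but for the file pushed in Step 2 the output looks roughly like this:

```
-------------------------------------------
Time: 1486900000000 ms
-------------------------------------------
(Hello,1)
(-,1)
(Date,1)
(is,1)
...
```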

This simple example lets you verify that Spark Streaming is working in your BigInsights environment.
