This blog describes how to run a simple word count example to demonstrate Spark Streaming on IBM BigInsights.
Environment: IBM BigInsights 4.2
Step 1: Run the Spark Streaming word count example for HDFS.
su hdfs
cd /usr/iop/current/spark-client
./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount --master yarn-client lib/spark-examples.jar /tmp/wordcount
The above command starts a streaming job that monitors the HDFS folder /tmp/wordcount. Whenever a new file is added to that folder, Spark Streaming counts the words in it and prints the result to the console.
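For reference, the HdfsWordCount class shipped with the Spark examples is roughly the following Scala program (a minimal sketch; the batch interval and other details may differ slightly in your Spark version):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    // args(0) is the HDFS directory to monitor, e.g. /tmp/wordcount
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    // Create a streaming context with a 2-second batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Treat every new file that appears in the directory as a text stream
    val lines = ssc.textFileStream(args(0))
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()   // print the per-batch counts to the console

    ssc.start()
    ssc.awaitTermination()
  }
}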
Step 2: Open another Linux terminal and run the below command as the hdfs user.
echo "Hello - Date is `date`" | hadoop fs -put - /tmp/wordcount/test1.txt
In the terminal from Step 1, you will see the word count output for the newly added file.
This example helps validate that Spark Streaming is working in your BigInsights environment.