Tuesday, November 27, 2018

Diagnosing the Scheduler Timeout Error


This blog describes on - Diagnosing the Scheduler Timeout Error

When you run some huge bigsql queries or running many concurrent queries, sometimes you may get scheduler error. But when you rerun it, it will return the results.

In the db2diag log, you will see the error

2018-10-22-15.54.09.353224+540 I56948911015      LEVEL: Error
PID    : 29917               TID : 70312999533568 PROC : db2sysc 0
INSTANCE: bigsql              NODE : 000          DB  : BIGSQL
APPHDL : 0-2278              APPID: *1.bigsql.199921064012
AUTHID : BIGSQL              HOSTNAME: testcluster234.ibm.com
EDUID  : 228                 EDUNAME: db2agent (BIGSQL) 0
FUNCTION: DB2 UDB, base sys utilities, sqeBigSqlSchedulerInternal::registerQuery, probe:2690
MESSAGE : ZRC=0xFFFFEBB1=-5199
         SQL5199N The statement failed because a connection to a Hadoop I/O
         component could not be established or maintained. Hadoop I/O
         component name: "". Reason code: "". Database partition number: "".

DATA #1 : String, 140 bytes
Transport Exception occurred. The BigSql Scheduler service may not be running or the scheduler client request timeout may not be sufficient.

If you get the error,


 1) You need to check whether the scheduler is running.

                       $BIGSQL_HOME/bin/bigsql status -scheduler


2) If the scheduler is running, then you may need to check the time taken for scanning metadata & registerQueryNew. To get the time, you need to check the /var/ibm/bigsql/logs/bigsql-sched-recurring-diag-info.log

 [requestScanMetadata]
    top (5) max elapsed times:
      time= 4363; info= [TableSchema(schName:<schema>, tblName:<tablename>,         impersonationID:bigsql)]
      time= 2339; info= [TableSchema(schName:<schema>, tblName:<tablename>, impersonationID:user_tyre)]
      time= 32; info= [TableSchema(schName:<schema>, tblName:<tablename>, impersonationID:bigsql)]
    elapsed-time-range-in-millis and frequency-of-calls-in-that-range:
      range= 0-10; freq= 0
      range= 10-100; freq= 1
      range= 100-1000; freq= 0
      range= 1000-10000; freq= 2
      range= 10000-100000; freq= 0
      range= 100000-1000000; freq= 0
      range= 1000000-9223372036854775807; freq= 0
    
[registerQueryNew]
    top (5) max elapsed times:
      time= 448892; info= schema.tablename;
      time= 245079; info= schema.tablename;
      time= 204996; info= schema.tablename;
      time= 152660; info= schema.tablename;
      time= 151576; info= schema.tablename;
    elapsed-time-range-in-millis and frequency-of-calls-in-that-range:
      range= 0-10; freq= 71
      range= 10-100; freq= 18
      range= 100-1000; freq= 16
      range= 1000-10000; freq= 6
      range= 10000-100000; freq= 13
      range= 100000-1000000; freq= 9
      range= 1000000-9223372036854775807; freq= 0
    
You need to check the max time taken during the time when query failed. In above example, the max time is 448892 millisecond ( 7.48 mins).

Check the timeout set for the property (scheduler.client.request.IUDEnd.timeout & scheduler.client.request.timeout) in /usr/ibmpacks/current/bigsql/bigsql/conf/bigsql-conf.xml.

The default values are scheduler.client.request.timeout = 360000 (6 Mins) & scheduler.client.request.IUDEnd.timeout = 600000 (10 Mins)

  <property>
    <name>scheduler.client.request.IUDEnd.timeout</name>
    <value>600000</value>
    <description>
      Scheduler clients will wait for scheduler to respond for
      these many milli-seconds before timing out during RPC for
      finalizing Insert/Update/Delete.
      For Inserting/Updating/Deleting very large dataset,
      this many need to be adjusted.
    </description>
  </property>
 
  <property>
    <name>scheduler.client.request.timeout</name>
    <value>3600000</value>
    <description>
      Scheduler clients will wait for scheduler to respond for
      these many milli-seconds before timing out during any RPC
      call other than finalizing Insert/Update/Delete.
      For query over very large dataset, this many need to be adjusted.
    </description>
  </property>

As per the logs the query is taking 7.48 mins but the timeout we set in property is 6 mins. So we need to increase the timeout of property higher than the max timeout in the logs. You can change the timeout to 720000 (12 mins) and restart the BigSQL.

If you are using BigSQL 5.0.2 or above, you can do it from Ambari UI or you need to change it from /usr/ibmpacks/current/bigsql/bigsql/conf/bigsql-conf.xml


10 comments:

Anonymous said...

Interesting blog. The coding part is really awesome.

Vicky from Way2Smile - Trusted Big Data Analytics Company in Dubai.

Prwatech said...

You have share informative information. Thank you.
Big Data Certification Course in Bangalore

Unknown said...

Very knowledgeable Blog.Thanks for providing such a valuable Knowledge on Big Data.

Unknown said...

Thank you for explaining theBig Data analysis in a effective way so that it reaches us in a manner which is understandable .Keep Sharing.

Anonymous said...

Amazing info on Big Data. Big Data streaming requires via Robotics in Singapore.

Devi said...

Worth reading! Our experts also have given detailed inputs about these trainings & courses! Presenting here for your reference. Do checkout Big data training in chennai & enjoy learning more about it.

Lafay Tech Plaza said...

Big Data service is a lot of data – so much that it becomes too much for a single computer to handle. An entire industry has emerged around making sense of Big Data. The companies that do this the best are the ones that have mastered getting the most out of Big Data, while keeping costs down.

George said...


Cloud data lakeis a storage repository that stores data in its native format, and does not require a data warehouse to be stored in a predetermined format. The cloud data lake enables a company to store its data in whatever format it is in, and to whatever depth. A cloud data lake is more flexible than a cloud data warehouse, which requires a company's data to be in a predetermined format, and a cloud data lake is significantly cheaper than a cloud data warehouse.

Jasonsmith said...

Such a nice blog with the reference links. Thanks for sharing with others.
Selenium Training In Chennai
Selenium Online Training
Selenium Training In Bangalore
Selenium Webdrivers

Vihan said...

Nice Post Oracle Cloud Automated Testing Tool