
Big SQL Best Practices and Performance

Session 7: Problem Determination

IBM Big Data Performance Team

Big SQL 4.1.0.2

For questions about this presentation contact Simon Harris [email protected]

© 2015 IBM Corporation

Session 7: Big SQL Performance Problem Determination

1. General Problem Determination
2. Performance Problem Determination:
   a. Statistics
   b. How to identify a Suspicious Plan
   c. Ways to influence the Optimizer
   d. Big SQL Health Check
   e. Docs to collect



Big SQL Diagnostics

 Big SQL logs information to a number of log files located in (v4 and beyond): /var/ibm/bigsql

 The log used depends on the component writing the message:

File                    Description
diag/DIAGXXXX           The "db2dump" directory. Contains Big SQL (DB2) runtime log files,
                        event files, trace information, dumps, and traps. 'XXXX' represents
                        the node number.
logs/bigsql.log         Log messages for the Java I/O and DDL handler
logs/bigsql-sched.log   Log for the scheduler service. Only found on the host on which the
                        scheduler is installed (typically the Big SQL master node)
logs/bigsql-ndfsio.log  Log for the native I/O engine



Diagnostics – Error Messages and Logs

 There are a number of error messages that can be issued, similar to:

SQL5197N The statement failed because of a communication error
with a Big SQL component. Big SQL component name: "HDFS".
Reason code: "1". Log entry identifier: "BSL-9-3ef8abc4".

 The "Log entry identifier" indicates that the details of the failure may be found in a specific Big SQL log file. The format is:

NRL-9-3ef8abc4
– Which log file: BSL – bigsql.log, SCL – bigsql-sched.log, NRL – bigsql-ndfsio.log, DDL – DB2 diag logs
– Which node: the node # (from db2nodes.cfg) that owns the log file
– What to look for: a unique string to search for in the log to locate the error

 An empty identifier usually means it is in the db2diag.log file

Diagnostics – Retrieving Error Messages

 Edit the appropriate file and search for the “Log entry identifier”.

 Info related to the error messages can also be retrieved using SQL:
SELECT * FROM table(SYSHADOOP.LOG_ENTRY('log_entry_id'));

 For example:

SELECT * FROM table(SYSHADOOP.LOG_ENTRY('BSL-9-3ef8abc4'));

 Can also control the number of lines output before and after:
– Output 30 lines before, 30 lines after:

SELECT * FROM table(SYSHADOOP.LOG_ENTRY('BSL-9-3ef8abc4', 30, 30));




Big SQL Performance Problem Determination

 Most likely causes of Big SQL performance issues - in approximate order (most frequent first):
1. Incomplete or inaccurate statistics
2. Only a small amount of cluster resources dedicated to Big SQL – default of 25% used
3. Single disk used for temporary working data – default of one path/disk used
4. Not using the optimal storage format
5. Big SQL configuration
6. Mapping Hive STRING data types to VARCHAR(32k) – default of 32k used



Big SQL Performance PD – Check statistics

 ANALYZE, ANALYZE, ANALYZE………


– Most performance issues in Big SQL are resolved by running ANALYZE on the correct set
of tables and columns

 First port of call for a Big SQL performance problem – Make sure ANALYZE has
been run and statistics are up to date

 Check STATS_TIME and CARD in SYSCAT.TABLES to see if ANALYZE has been run on the table:

select substr(tabname,1,20), stats_time, card from syscat.tables where tabschema='SIMON'

1                    STATS_TIME                 CARD
-------------------- -------------------------- --------------------
NATION               -                                            -1
ORDERS               -                                            -1
CUSTOMER             -                                            -1
LINEITEM             -                                            -1

STATS_TIME = '-' and CARD = -1 indicate that ANALYZE has not been run.
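
For example, a minimal sketch of gathering statistics for one of the tables above (the column list is illustrative; choose the columns actually used in joins and predicates):

-- Collect table-level and column-level statistics so the optimizer
-- has accurate cardinality estimates
ANALYZE TABLE SIMON.ORDERS
    COMPUTE STATISTICS FOR COLUMNS O_ORDERKEY, O_CUSTKEY, O_ORDERDATE;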

Big SQL Optimizer – Statistics are crucial


 Can also check “Extended Diagnostic Information” in a formatted explain – it will
identify which tables/columns being referenced in the query do not have
statistics

Extended Diagnostic Information:
--------------------------------

Diagnostic Identifier: 1
Diagnostic Details:    EXP0021W  Table column has no statistics. The
                       column "DATEKEY" of table "EDW"."DIMENSION"
                       has not had runstats run on it. This can lead
                       to poor cardinality and predicate filtering
                       estimates.
Diagnostic Identifier: 2
Diagnostic Details:    EXP0021W  Table column has no statistics. The
                       column "APP_TYPE" of table "EDW"."FACT" has
                       not had runstats run on it. This can lead to
                       poor cardinality and predicate filtering
                       estimates.
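
The formatted explain containing this section is produced by populating the explain tables and formatting them with db2exfmt (see Session 6). A minimal sketch, assuming the explain tables have already been created; the join between the EDW tables is illustrative only:

-- Capture the access plan (and its Extended Diagnostic Information)
-- into the explain tables; format afterwards with db2exfmt
EXPLAIN PLAN FOR
SELECT f.APP_TYPE, COUNT(*)
FROM EDW.FACT f
JOIN EDW.DIMENSION d ON f.DATEKEY = d.DATEKEY
GROUP BY f.APP_TYPE;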

 But it cannot tell you if the statistics are out of date or inaccurate

See “Session 2: Best Practices: 6. Statistics” for more information.


See “Session 6: Big SQL Optimizer: Explain” for more information on formatted
explains.

Suspicious plans !!!

 If a query is still slow once stats have been gathered for all necessary tables & columns, explain the query and look for sections in the plan which:
– are NOT using hash joins as the join type
– have fat BTQs
– are using replicated Hadoop scans on large amounts of data

Note: these are not always signs of a bad plan, but they are indicators – especially if the data volumes are large.

 Warning flags to look out for in the explain:
 Nested Loop Joins (NLJOIN operator) - can be very bad
 Nested Loop Joins without a TEMP on the inner - can be very, very, very bad
 Merge Scan Joins (MSJOIN)



Suspicious plans - Nested Loop Joins


 Nested Loop Joins (NLJOIN) are usually bad for performance in Big SQL
– Chiefly because Big SQL does not support indexes, which prevents efficient indexed lookups on the inner table

 Therefore, in Big SQL, a NLJOIN will scan all the qualifying rows of the inner for every row of the outer
– This can be very costly if there are more than just a handful of rows on the outer

 On the face of it, this plan doesn't look so bad:

                242.747
                NLJOIN
                (   3)
                184.025
                   2
          /------+-------\
         1                364.12
      TBSCAN             TBSCAN
      (   4)             (   5)
      90.3946            93.6308
         1                  1
        |                  |
      27470               9103
  HTABLE: EDW        HTABLE: EDW
      FACT             DIMENSION
       Q4                 Q3

– 1 row qualifies in the outer (LHS) of the NLJOIN, so the inner (RHS) 364 rows (estimate) will be scanned just once
– But what happens if stats have not been collected for the EDW.FACT table, or the optimizer's selectivity estimates (which lead to an estimate of 1 row from the FACT table qualifying) are inaccurate?
– That might lead to many more rows qualifying for the outer table, which will mean many more scans of the inner



Suspicious plans - Nested Loop Joins


 Optimizer might be forced to choose a NLJOIN if:
– there is an inequality join predicate in the query
• HSJOIN cannot be used in these cases
• Left with a choice between Nested Loop and Merge Scan joins
– the data types of the joined columns do not match
• May include cases where one or more of the columns have a function call
• HSJOIN can only be used in limited circumstances here
• Try changing the column types of the join columns in the DDL

 Nested Loop Joins (NLJOIN) can be efficient in Big SQL when the optimizer can guarantee there is only a single row qualifying on the outer
– And also for HBase lookups

 If you see a NLJOIN in a plan, treat it with suspicion
– Most can be changed to a HSJOIN by running ANALYZE on the tables & columns used in the query



Big SQL Optimizer – Be wary of FAT BTQs


 Be wary of Broadcast Table Queues moving large amounts of data

                2.79356e+06
                 ^HSJOIN
                 (  11)
                9.50434e+06
                 61907.1
        /-----------------+------------------\
   7.22346e+08                        2.43121e+07
       BTQ                              HSJOIN^
     (  12)                             (  21)

This BTQ will send 722M rows from each data node to every other data node, so each data node processes ALL the data for the outer leg of this join.
In an 18-node cluster each node will process 722M * 18 = 12,996M rows on the outer leg.

 With Directed Table Queues, data is partitioned on the fly by the join key and only sent to the node responsible for processing that key

                2.79356e+06
                 ^HSJOIN
                 (  14)
                1.97698e+06
                  49690
          /--------+---------\
   7.22346e+08            2.43121e+07
       DTQ                    DTQ
     (  12)                 (  21)

So each node only processes a subset of the data for this join.
In an 18-node cluster each node will process approx 722M rows on the outer leg.

 The latter will scale much better than the former

Influencing the Optimizer - Using SELECTIVITY clause

 If correct statistics are not enough to achieve an efficient access plan, then
there are alternative ways to influence the access plan chosen by the optimizer

 Use the SELECTIVITY clause to tell the optimizer the correct filter factor of
predicates
– Useful for influencing plans

 Set the following registry variable:


– db2set -im DB2_SELECTIVITY=YES|NO

 When this registry variable is set to YES, the SELECTIVITY clause can be
specified for the following predicate types:
– A user-defined predicate
– A basic predicate in which at least one expression contains host variables or parameter
markers




Influencing the Optimizer - Using SELECTIVITY clause

 Add the SELECTIVITY clause to the SQL statement indicating the selectivity of the predicate
– overrides the filter factor calculated by the optimizer
– For example, to tell the optimizer that this predicate will select 10% of the data from the table:
D_DATE >= :orderdate SELECTIVITY 0.1
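
In context, a hedged sketch of a complete statement using the clause (the table name is illustrative; :orderdate is a host variable, as required when DB2_SELECTIVITY=YES):

-- Tell the optimizer the date predicate qualifies roughly 10% of the rows
SELECT COUNT(*)
FROM EDW.DIMENSION
WHERE D_DATE >= :orderdate SELECTIVITY 0.1;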

 Check using explain:


8) External Sarg Predicate,
Comparison Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 0.1000
Predicate Text:
--------------
(Q2.D_DATE >= :ORDERDATE)

 For more information see:
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.regvars.doc/doc/r0005664.html?lang=en



Health Check: Check Percentage of Cluster Dedicated to Big SQL
 The default is a measly 25% to allow for MapReduce, HBase and other Hadoop
components
– Particularly important if comparing Big SQL against Hive or other SQL over Hadoop
solutions that have 100% of cluster resources available

 Check current value:


SELECT NAME, VALUE, DEFERRED_VALUE FROM SYSIBMADM.DBMCFG where name='instance_memory';
+-----------------+----------+----------------+
| NAME | VALUE | DEFERRED_VALUE |
+-----------------+----------+----------------+
| instance_memory | 18077673 | 25 |
+-----------------+----------+----------------+

 If Big SQL is the priority resource on this cluster, increase using autoconfigure
command:

call syshadoop.big_sql_service_mode('on');
autoconfigure using mem_percent 75
workload_type complex
is_populated no
apply db and dbm;

See “Session 2: Best Practices: 3. Resource Sharing” for more information.

Health Check: Check TEMPORARY tablespace paths


 The installation default is to create the TEMPORARY tablespace on a single path/drive – this is not good for performance
– Does not allow for any parallel I/O to/from temp storage

 To find out if the default was changed to use the recommended multiple paths, use the
following query:
SELECT member, count(container_name) as count_containers
FROM TABLE(MON_GET_CONTAINER('',-2)) AS t
WHERE tbsp_name='TEMPSPACE1' GROUP BY member;
+--------+------------------+
| MEMBER | COUNT_CONTAINERS |
+--------+------------------+
| 0 | 9 |
| 1 | 9 |
| 2 | 9 |
| 3 | 9 |
| 4 | 9 |
+--------+------------------+
 Temporary tablespace should be spread across the same set of disks as your Hadoop
data. See following article to redistribute tablespace paths:
https://developer.ibm.com/hadoop/blog/2016/02/02/redistribute-big-sql-4-x-storage-paths/

See “Session 2: Best Practices: 1. Physical Database Design” for more information.




Health Check: Monitoring TEMPORARY work space consumption
 Use the following query to monitor the space consumption of temporary tables
select substr(tabschema, 1, 20) as schema,
substr(tabname, 1, 20) as tabname,
cast(substr(tabschema, 2, position('>', tabschema, octets) - 2) as
bigint) as app_handle,
t.data_object_l_pages * q.tbsp_page_size as Data_SIZE,
t.long_object_l_pages * q.tbsp_page_size as LOB_SIZE
from table(mon_get_Table(NULL,NULL, -1)) as t,
table(mon_get_tablespace(NULL,-1)) as q
where t.tbsp_id = q.tbsp_id and
q.tbsp_content_type = 'SYSTEMP' ;

The SCHEMA column for a temporary table has the form <APP_HANDLE><AUTHID>:
+-----------------+--------------------+------------+------------+----------+
| SCHEMA | TABNAME | APP_HANDLE | DATA_SIZE | LOB_SIZE |
+-----------------+--------------------+------------+------------+----------+
| <413><BIGSQL > | TEMP (00001,00336) | 413 | 1591148544 | [NULL] |
| <436><BIGSQL > | TEMP (00001,00002) | 436 | 365330432 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00003) | 429 | 3548217344 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00004) | 429 | 3547267072 | [NULL] |
| <413><BIGSQL > | TEMP (00001,00005) | 413 | 131072 | [NULL] |
| <436><BIGSQL > | TEMP (00001,00006) | 436 | 371982336 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00007) | 429 | 173801472 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00008) | 429 | 173867008 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00009) | 429 | 173899776 | [NULL] |

Health Check: Check Storage Format of data


 Ensure the optimal storage format is being used to store the data
– Parquet is often the format that gives best performance with good compression for analytical
workloads

 To check the storage format being used:


[bigsql] 1> VALUES(SYSHADOOP.HCAT_DESCRIBETAB('TPCH1000G','ORDERS'));
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Hive schema : tpch1000g |
| Hive name : orders |
| Type : MANAGED_TABLE |
| Table params : |
| COLUMN_STATS_ACCURATE = true |
| biginsights.sql.constraints = |
| [{"v":1,"type":"pk","name":"SQL1449620333612","trusted":true,"enforce":false,"cols":["O_ORDERKEY"]},{"v":1,"type":"fk","name":"SQL1449620335604","trust |
| ed":true,"enforce":false,"cols":["O_CUSTKEY"],"fscm":"TPCH1000G","ftab":"CUSTOMER","fcols":["C_CUSTKEY"]}] |
| biginsights.sql.metadata = {"v":1,"source":"BIGSQL","version":"4.0","dependents":[{"schema":"TPCH1000G","table":"LINEITEM"}]} |
| last_modified_by = bigsql |
| last_modified_time = 1449620336 |
| numFiles = 150 |
| numRows = -1 |
| rawDataSize = -1 |
| totalSize = 64545981252 |
| transient_lastDdlTime = 1449620768 |
| SerDe : null |
| SerDe lib : org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe |
| SerDe params : |
| serialization.format = 1 |
| Location : hdfs://bigaperf108.svl.ibm.com:8020/apps/hive/warehouse/tpch1000g.db/orders |
| Inputformat : org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |
| Outputformat : org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
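
If a table turns out to be stored in a less optimal format, one possible approach (a sketch only; the new table name and abbreviated column list are illustrative) is to create a Parquet copy and repopulate it:

-- Create a Parquet-format table and load it from the existing table
CREATE HADOOP TABLE TPCH1000G.ORDERS_PARQUET (
    O_ORDERKEY  BIGINT,
    O_CUSTKEY   INTEGER,
    O_ORDERDATE DATE
)
STORED AS PARQUETFILE;

INSERT INTO TPCH1000G.ORDERS_PARQUET
    SELECT O_ORDERKEY, O_CUSTKEY, O_ORDERDATE FROM TPCH1000G.ORDERS;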

See “Session 2: Best Practices: 2. Storage Formats” for more information.




Health Check: Check Big SQL Configuration


 Check critical properties of Big SQL configuration:
– SORTHEAP & SHEAPTHRES_SHR for sorting and HashJoins
– Bufferpool size
– SMP parallelism
– Optimization Level

 Get current values using:


SELECT NAME, VALUE, DEFERRED_VALUE, DBPARTITIONNUM FROM SYSIBMADM.DBCFG WHERE
DBPARTITIONNUM=0;

SELECT NAME, VALUE, DEFERRED_VALUE FROM SYSIBMADM.DBMCFG;

db2set
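
For example, a minimal sketch that pulls just the sort and hash-join memory settings from partition 0:

-- Check SORTHEAP and SHEAPTHRES_SHR on partition 0
SELECT NAME, VALUE, DEFERRED_VALUE
FROM SYSIBMADM.DBCFG
WHERE DBPARTITIONNUM = 0
  AND NAME IN ('sortheap', 'sheapthres_shr');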

See “Session 3: Big SQL Tuning Knobs” for more information.




Health Check: Check the data types

 STRING is bad for Big SQL!
– But is prevalent in the Hadoop and Hive worlds

$ db2 "describe table SIMON.ORDERS"

                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
O_ORDERKEY                      SYSIBM    BIGINT                       8     0 No
O_CUSTKEY                       SYSIBM    INTEGER                      4     0 No
O_ORDERSTATUS                   SYSIBM    VARCHAR                      1     0 No
O_TOTALPRICE                    SYSIBM    DOUBLE                       8     0 No
O_ORDERDATE                     SYSIBM    DATE                         4     0 No
O_ORDERPRIORITY                 SYSIBM    VARCHAR                     15     0 No
O_CLERK                         SYSIBM    VARCHAR                     15     0 No
O_SHIPPRIORITY                  SYSIBM    INTEGER                      4     0 No
O_COMMENT                       SYSIBM    VARCHAR                  32672     0 No

9 record(s) selected.

 This performance degradation can be avoided by:

 Change references from STRING to an explicit VARCHAR(n) that most appropriately fits the data size

 Use the bigsql.string.size property (via SET HADOOP PROPERTY) to lower the default size of the VARCHAR to which the STRING is mapped when creating new tables.
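
A hedged sketch of the property approach (the 255 value is only illustrative; size it to fit your data):

-- Map Hive STRING columns to VARCHAR(255) instead of VARCHAR(32672)
-- for tables created in this session
SET HADOOP PROPERTY 'bigsql.string.size' = 255;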

See “Session 2: Best Practices: 4. Big SQL Logical Design” for more information.




Big SQL Performance PD – what docs to gather?

 Collect db2look information for the Big SQL database:
– db2look -d bigsql -e -m -l -f
– See http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0002051.html?cp=SSEPGG_10.5.0%2F3-5-2-6-80&lang=en

 Collect db2support information for the Big SQL database (aka catsim):
– db2support <output_directory> -d <database_name> -cl 0
– See http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.trb.doc/doc/t0020808.html?lang=en

 Collect a formatted explain of the query using db2exfmt:
– Try to collect the explain with section actuals (which show the actual number of rows processed at each stage):
– http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.perf.doc/doc/c0005134.html?lang=en



Big SQL Performance PD – what docs to gather?

 Collect general configuration information:
– Database Manager Configuration:
db2 "attach to bigsql"
db2 "get dbm cfg show detail"
db2 "detach"
– Database Configuration:
db2 "connect to bigsql"
db2 "get db cfg for bigsql show detail"
db2 "terminate"
– Big SQL registry variables:
db2set
– Big SQL reader/scheduler configuration files:
tar -cvf `hostname`.bigsql-conf.tar $BIGSQL_HOME/conf/* ./`hostname`.bigsql.*

 Collect Reader log files (if requested):
– See the "Big SQL Readers" section of this presentation for more details
