
Big SQL Best Practices and Performance

Session 7: Problem Determination

IBM Big Data Performance Team

Big SQL 4.1.0.2

For questions about this presentation contact Simon Harris [email protected]

© 2015 IBM Corporation

Session 7: Big SQL Performance Problem Determination

1. General Problem Determination
2. Performance Problem Determination:
   a. Statistics
   b. How to identify a Suspicious Plan
   c. Ways to influence the Optimizer
   d. Big SQL Health Check
   e. Docs to collect



Big SQL Diagnostics

 Big SQL logs information to a number of log files located in (v4 and beyond): /var/ibm/bigsql

 The log used depends on the component writing the message:

File                    Description
diag/DIAGXXXX           The "db2dump" directory. Contains Big SQL (DB2) runtime log files,
                        event files, trace information, dumps, and traps. 'XXXX' represents
                        the node number.
logs/bigsql.log         Log messages for the Java I/O and DDL handler
logs/bigsql-sched.log   Log for the scheduler service. Only found on the host on which the
                        scheduler is installed (typically the Big SQL master node)
logs/bigsql-ndfsio.log  Log for the native I/O engine



Diagnostics – Error Messages and Logs

 There are a number of error messages that can be issued, similar to:

SQL5197N The statement failed because of a communication error
with a Big SQL component. Big SQL component name: "HDFS".
Reason code: "1". Log entry identifier: "BSL-9-3ef8abc4".

 The "Log entry identifier" indicates that the details of the failure may be found in a specific Big SQL log file. The format is:

NRL-9-3ef8abc4
– Which log file: BSL – bigsql.log, SCL – bigsql-sched.log, NRL – bigsql-ndfsio.log, DDL – DB2 diag logs
– Which node: the node # (from db2nodes.cfg) that owns the log file
– What to look for: a unique string to search for in the log to locate the error

 An empty identifier usually means it is in the db2diag.log file

Diagnostics – Retrieving Error Messages

 Edit the appropriate file and search for the “Log entry identifier”.

 Info related to the error messages can also be retrieved using SQL:
SELECT * FROM table(SYSHADOOP.LOG_ENTRY('log_entry_id'));

 For example:

SELECT * FROM table(SYSHADOOP.LOG_ENTRY('BSL-9-3ef8abc4'));

 Can also control the number of lines output before and after:
– Output 30 lines before, 30 lines after:

SELECT * FROM table(SYSHADOOP.LOG_ENTRY('BSL-9-3ef8abc4', 30, 30));




Big SQL Performance Problem Determination

 Most likely causes of Big SQL performance issues - in approximate order (most frequent first):
1. Incomplete or inaccurate statistics
2. Only a small amount of cluster resources dedicated to Big SQL – default of 25% used
3. Single disk used for temporary working data – default of one path/disk used
4. Not using the optimal storage format
5. Big SQL configuration
6. Mapping Hive STRING data types to VARCHAR(32k) – default of 32k used



Big SQL Performance PD – Check statistics

 ANALYZE, ANALYZE, ANALYZE………


– Most performance issues in Big SQL are resolved by running ANALYZE on the correct set
of tables and columns

 First port of call for a Big SQL performance problem – Make sure ANALYZE has
been run and statistics are up to date

 Check STATS_TIME and CARD in SYSCAT.TABLES to see if ANALYZE has been run on the table:

select substr(tabname,1,20), stats_time, card from syscat.tables where tabschema='SIMON'

1                    STATS_TIME                 CARD
-------------------- -------------------------- --------------------
NATION               -                                            -1
ORDERS               -                                            -1
CUSTOMER             -                                            -1
LINEITEM             -                                            -1

STATS_TIME = '-' and CARD = -1 indicate that ANALYZE has not been run.
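
For example, a minimal sketch of gathering statistics for one of the tables above (the column list is illustrative; choose the columns actually used in joins and predicates):

-- Collect table-level and column-level statistics so the optimizer
-- has accurate cardinality estimates
ANALYZE TABLE SIMON.ORDERS
    COMPUTE STATISTICS FOR COLUMNS O_ORDERKEY, O_CUSTKEY, O_ORDERDATE;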

Big SQL Optimizer – Statistics are crucial


 Can also check “Extended Diagnostic Information” in a formatted explain – it will
identify which tables/columns being referenced in the query do not have
statistics

Extended Diagnostic Information:
--------------------------------

Diagnostic Identifier: 1
Diagnostic Details:    EXP0021W  Table column has no statistics. The
                       column "DATEKEY" of table "EDW"."DIMENSION"
                       has not had runstats run on it. This can lead
                       to poor cardinality and predicate filtering
                       estimates.
Diagnostic Identifier: 2
Diagnostic Details:    EXP0021W  Table column has no statistics. The
                       column "APP_TYPE" of table "EDW"."FACT" has
                       not had runstats run on it. This can lead to
                       poor cardinality and predicate filtering
                       estimates.
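
The formatted explain containing this section is produced by populating the explain tables and formatting them with db2exfmt (see Session 6). A minimal sketch, assuming the explain tables have already been created; the join between the EDW tables is illustrative only:

-- Capture the access plan (and its Extended Diagnostic Information)
-- into the explain tables; format afterwards with db2exfmt
EXPLAIN PLAN FOR
SELECT f.APP_TYPE, COUNT(*)
FROM EDW.FACT f
JOIN EDW.DIMENSION d ON f.DATEKEY = d.DATEKEY
GROUP BY f.APP_TYPE;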

 But it cannot tell you if the statistics are out of date or inaccurate

See “Session 2: Best Practices: 6. Statistics” for more information.


See “Session 6: Big SQL Optimizer: Explain” for more information on formatted
explains.

Suspicious plans !!!

 If a query is still slow once stats have been gathered for all necessary tables & columns, explain the query and look for sections in the plan which:
– are NOT using hash joins as the join type
– have fat BTQs
– are using replicated Hadoop scans on large amounts of data

Note: these are not always signs of a bad plan, but they are indicators – especially if the data volumes are large.

 Warning flags to look out for in the explain:
 Nested Loop Joins (NLJOIN operator) - can be very bad
 Nested Loop Joins without a TEMP on the inner - can be very, very, very bad
 Merge Scan Joins (MSJOIN)



Suspicious plans - Nested Loop Joins


 Nested Loop Joins (NLJOIN) are usually bad for performance in Big SQL
– Chiefly because Big SQL does not support indexes, which prevents efficient indexed lookups on the inner table

 Therefore, in Big SQL, a NLJOIN will scan all the qualifying rows of the inner for every row of the outer
– This can be very costly if there are more than just a handful of rows on the outer

 On the face of it, this plan doesn't look so bad:

                242.747
                NLJOIN
                (   3)
                184.025
                   2
          /------+-------\
         1                364.12
      TBSCAN             TBSCAN
      (   4)             (   5)
      90.3946            93.6308
         1                  1
        |                  |
      27470               9103
  HTABLE: EDW        HTABLE: EDW
      FACT             DIMENSION
       Q4                 Q3

– 1 row qualifies in the outer (LHS) of the NLJOIN, so the inner (RHS) 364 rows (estimate) will be scanned just once
– But what happens if stats have not been collected for the EDW.FACT table, or the optimizer's selectivity estimates (which lead to an estimate of 1 row from the FACT table qualifying) are inaccurate?
– That might lead to many more rows qualifying for the outer table, which will mean many more scans of the inner



Suspicious plans - Nested Loop Joins


 Optimizer might be forced to choose a NLJOIN if:
– there is an inequality join predicate in the query
• HSJOIN cannot be used in these cases
• Left with a choice between Nested Loop and Merge Scan joins
– the data types of the joined columns do not match
• May include cases where one or more of the columns have a function call
• HSJOIN can only be used in limited circumstances here
• Try changing the column types of the join columns in the DDL

 Nested Loop Joins (NLJOIN) can be efficient in Big SQL when the optimizer can guarantee there is only a single row qualifying on the outer
– And also for HBase lookups

 If you see a NLJOIN in a plan, treat it with suspicion
– Most can be changed to a HSJOIN by running ANALYZE on the tables & columns used in the query



Big SQL Optimizer – Be wary of FAT BTQs


 Be wary of Broadcast Table Queues moving large amounts of data

                2.79356e+06
                 ^HSJOIN
                 (  11)
                9.50434e+06
                 61907.1
        /-----------------+------------------\
   7.22346e+08                        2.43121e+07
       BTQ                              HSJOIN^
     (  12)                             (  21)

This BTQ will send 722M rows from each data node to every other data node, so each data node processes ALL the data for the outer leg of this join.
In an 18-node cluster each node will process 722M * 18 = 12,996M rows on the outer leg.

 With Directed Table Queues, data is partitioned on the fly by the join key and only sent to the node responsible for processing that key

                2.79356e+06
                 ^HSJOIN
                 (  14)
                1.97698e+06
                  49690
          /--------+---------\
   7.22346e+08            2.43121e+07
       DTQ                    DTQ
     (  12)                 (  21)

So each node only processes a subset of the data for this join.
In an 18-node cluster each node will process approx 722M rows on the outer leg.

 The latter will scale much better than the former

Influencing the Optimizer - Using SELECTIVITY clause

 If correct statistics are not enough to achieve an efficient access plan, then
there are alternative ways to influence the access plan chosen by the optimizer

 Use the SELECTIVITY clause to tell the optimizer the correct filter factor of
predicates
– Useful for influencing plans

 Set the following registry variable:


– db2set -im DB2_SELECTIVITY=YES|NO

 When this registry variable is set to YES, the SELECTIVITY clause can be
specified for the following predicate types:
– A user-defined predicate
– A basic predicate in which at least one expression contains host variables or parameter
markers




Influencing the Optimizer - Using SELECTIVITY clause

 Add the SELECTIVITY clause to the SQL statement indicating the selectivity of the predicate
– overrides the filter factor calculated by the optimizer
– For example, to tell the optimizer that this predicate will select 10% of the data from the table:
D_DATE >= :orderdate SELECTIVITY 0.1
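
In context, a hedged sketch of a complete statement using the clause (the table name is illustrative; :orderdate is a host variable, as required when DB2_SELECTIVITY=YES):

-- Tell the optimizer the date predicate qualifies roughly 10% of the rows
SELECT COUNT(*)
FROM EDW.DIMENSION
WHERE D_DATE >= :orderdate SELECTIVITY 0.1;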

 Check using explain:


8) External Sarg Predicate,
Comparison Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 0.1000
Predicate Text:
--------------
(Q2.D_DATE >= :ORDERDATE)

 For more information see:
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.regvars.doc/doc/r0005664.html?lang=en



Health Check: Check Percentage of Cluster Dedicated to Big SQL
 The default is a measly 25% to allow for MapReduce, HBase and other Hadoop
components
– Particularly important if comparing Big SQL against Hive or other SQL over Hadoop
solutions that have 100% of cluster resources available

 Check current value:


SELECT NAME, VALUE, DEFERRED_VALUE FROM SYSIBMADM.DBMCFG where name='instance_memory';
+-----------------+----------+----------------+
| NAME | VALUE | DEFERRED_VALUE |
+-----------------+----------+----------------+
| instance_memory | 18077673 | 25 |
+-----------------+----------+----------------+

 If Big SQL is the priority resource on this cluster, increase using autoconfigure
command:

call syshadoop.big_sql_service_mode('on');
autoconfigure using mem_percent 75
workload_type complex
is_populated no
apply db and dbm;

See “Session 2: Best Practices: 3. Resource Sharing” for more information.

Health Check: Check TEMPORARY tablespace paths


 The installation default is to create the TEMPORARY tablespace on a single path/drive – this is not good for performance
– Does not allow for any parallel I/O to/from temp storage

 To find out if the default was changed to use the recommended multiple paths, use the
following query:
SELECT member, count(container_name) as count_containers
FROM TABLE(MON_GET_CONTAINER('',-2)) AS t
WHERE tbsp_name='TEMPSPACE1' GROUP BY member;
+--------+------------------+
| MEMBER | COUNT_CONTAINERS |
+--------+------------------+
| 0 | 9 |
| 1 | 9 |
| 2 | 9 |
| 3 | 9 |
| 4 | 9 |
+--------+------------------+
 Temporary tablespace should be spread across the same set of disks as your Hadoop
data. See following article to redistribute tablespace paths:
https://developer.ibm.com/hadoop/blog/2016/02/02/redistribute-big-sql-4-x-storage-paths/

See “Session 2: Best Practices: 1. Physical Database Design” for more information.




Health Check: Monitoring TEMPORARY work space consumption
 Use the following query to monitor the space consumption of temporary tables
select substr(tabschema, 1, 20) as schema,
substr(tabname, 1, 20) as tabname,
cast(substr(tabschema, 2, position('>', tabschema, octets) - 2) as
bigint) as app_handle,
t.data_object_l_pages * q.tbsp_page_size as Data_SIZE,
t.long_object_l_pages * q.tbsp_page_size as LOB_SIZE
from table(mon_get_Table(NULL,NULL, -1)) as t,
table(mon_get_tablespace(NULL,-1)) as q
where t.tbsp_id = q.tbsp_id and
q.tbsp_content_type = 'SYSTEMP' ;

The SCHEMA column for a temporary table has the form <APP_HANDLE><AUTHID>:
+-----------------+--------------------+------------+------------+----------+
| SCHEMA | TABNAME | APP_HANDLE | DATA_SIZE | LOB_SIZE |
+-----------------+--------------------+------------+------------+----------+
| <413><BIGSQL > | TEMP (00001,00336) | 413 | 1591148544 | [NULL] |
| <436><BIGSQL > | TEMP (00001,00002) | 436 | 365330432 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00003) | 429 | 3548217344 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00004) | 429 | 3547267072 | [NULL] |
| <413><BIGSQL > | TEMP (00001,00005) | 413 | 131072 | [NULL] |
| <436><BIGSQL > | TEMP (00001,00006) | 436 | 371982336 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00007) | 429 | 173801472 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00008) | 429 | 173867008 | [NULL] |
| <429><BIGSQL > | TEMP (00001,00009) | 429 | 173899776 | [NULL] |

Health Check: Check Storage Format of data


 Ensure the optimal storage format is being used to store the data
– Parquet is often the format that gives best performance with good compression for analytical
workloads

 To check the storage format being used:


[bigsql] 1> VALUES(SYSHADOOP.HCAT_DESCRIBETAB('TPCH1000G','ORDERS'));
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| Hive schema : tpch1000g |
| Hive name : orders |
| Type : MANAGED_TABLE |
| Table params : |
| COLUMN_STATS_ACCURATE = true |
| biginsights.sql.constraints = |
| [{"v":1,"type":"pk","name":"SQL1449620333612","trusted":true,"enforce":false,"cols":["O_ORDERKEY"]},{"v":1,"type":"fk","name":"SQL1449620335604","trust |
| ed":true,"enforce":false,"cols":["O_CUSTKEY"],"fscm":"TPCH1000G","ftab":"CUSTOMER","fcols":["C_CUSTKEY"]}] |
| biginsights.sql.metadata = {"v":1,"source":"BIGSQL","version":"4.0","dependents":[{"schema":"TPCH1000G","table":"LINEITEM"}]} |
| last_modified_by = bigsql |
| last_modified_time = 1449620336 |
| numFiles = 150 |
| numRows = -1 |
| rawDataSize = -1 |
| totalSize = 64545981252 |
| transient_lastDdlTime = 1449620768 |
| SerDe : null |
| SerDe lib : org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe |
| SerDe params : |
| serialization.format = 1 |
| Location : hdfs://bigaperf108.svl.ibm.com:8020/apps/hive/warehouse/tpch1000g.db/orders |
| Inputformat : org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |
| Outputformat : org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
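
If a table turns out to be stored in a less optimal format, one possible approach (a sketch only; the new table name and abbreviated column list are illustrative) is to create a Parquet copy and repopulate it:

-- Create a Parquet-format table and load it from the existing table
CREATE HADOOP TABLE TPCH1000G.ORDERS_PARQUET (
    O_ORDERKEY  BIGINT,
    O_CUSTKEY   INTEGER,
    O_ORDERDATE DATE
)
STORED AS PARQUETFILE;

INSERT INTO TPCH1000G.ORDERS_PARQUET
    SELECT O_ORDERKEY, O_CUSTKEY, O_ORDERDATE FROM TPCH1000G.ORDERS;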

See “Session 2: Best Practices: 2. Storage Formats” for more information.




Health Check: Check Big SQL Configuration


 Check critical properties of Big SQL configuration:
– SORTHEAP & SHEAPTHRES_SHR for sorting and HashJoins
– Bufferpool size
– SMP parallelism
– Optimization Level

 Get current values using:


SELECT NAME, VALUE, DEFERRED_VALUE, DBPARTITIONNUM FROM SYSIBMADM.DBCFG WHERE
DBPARTITIONNUM=0;

SELECT NAME, VALUE, DEFERRED_VALUE FROM SYSIBMADM.DBMCFG;

db2set
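
For example, a minimal sketch that pulls just the sort and hash-join memory settings from partition 0:

-- Check SORTHEAP and SHEAPTHRES_SHR on partition 0
SELECT NAME, VALUE, DEFERRED_VALUE
FROM SYSIBMADM.DBCFG
WHERE DBPARTITIONNUM = 0
  AND NAME IN ('sortheap', 'sheapthres_shr');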

See “Session 3: Big SQL Tuning Knobs” for more information.




Health Check: Check the data types

 STRING is bad for Big SQL!
– But is prevalent in the Hadoop and Hive worlds

$ db2 "describe table SIMON.ORDERS"

                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
O_ORDERKEY                      SYSIBM    BIGINT                       8     0 No
O_CUSTKEY                       SYSIBM    INTEGER                      4     0 No
O_ORDERSTATUS                   SYSIBM    VARCHAR                      1     0 No
O_TOTALPRICE                    SYSIBM    DOUBLE                       8     0 No
O_ORDERDATE                     SYSIBM    DATE                         4     0 No
O_ORDERPRIORITY                 SYSIBM    VARCHAR                     15     0 No
O_CLERK                         SYSIBM    VARCHAR                     15     0 No
O_SHIPPRIORITY                  SYSIBM    INTEGER                      4     0 No
O_COMMENT                       SYSIBM    VARCHAR                  32672     0 No

9 record(s) selected.

 This performance degradation can be avoided by:

 Change references from STRING to an explicit VARCHAR(n) that most appropriately fits the data size

 Use the bigsql.string.size property (via SET HADOOP PROPERTY) to lower the default size of the VARCHAR to which the STRING is mapped when creating new tables.
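
A hedged sketch of the property approach (the 255 value is only illustrative; size it to fit your data):

-- Map Hive STRING columns to VARCHAR(255) instead of VARCHAR(32672)
-- for tables created in this session
SET HADOOP PROPERTY 'bigsql.string.size' = 255;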

See “Session 2: Best Practices: 4. Big SQL Logical Design” for more information.




Big SQL Performance PD – what docs to gather?

 Collect db2look information for the Big SQL database:
– db2look -d bigsql -e -m -l -f
– See http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0002051.html?cp=SSEPGG_10.5.0%2F3-5-2-6-80&lang=en

 Collect db2support information for the Big SQL database (aka catsim):
– db2support <output_directory> -d <database_name> -cl 0
– See http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.trb.doc/doc/t0020808.html?lang=en

 Collect a formatted explain of the query using db2exfmt:
– Try to collect the explain with section actuals (which show the actual number of rows processed at each stage):
– http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.perf.doc/doc/c0005134.html?lang=en



Big SQL Performance PD – what docs to gather?

 Collect general configuration information:
– Database Manager Configuration:
db2 "attach to bigsql"
db2 "get dbm cfg show detail"
db2 "detach"
– Database Configuration:
db2 "connect to bigsql"
db2 "get db cfg for bigsql show detail"
db2 "terminate"
– Big SQL registry variables:
db2set
– Big SQL reader/scheduler configuration files:
tar -cvf `hostname`.bigsql-conf.tar $BIGSQL_HOME/conf/* ./`hostname`.bigsql.*

 Collect Reader log files (if requested):
– See the "Big SQL Readers" section of this presentation for more details
