Optimizing SQL Query Processing
William Miles
5/22/2005
Abstract
Query performance in relational database systems depends not only on the database structure, but also on the way in which the query is optimized. We show various classes of syntactically similar SQL queries, each of which can exhibit substantial differences in data access depending on the characteristics of the query formulation and the success of the database query optimizer. Simply put, similar-looking queries can take significantly different times to execute. We conclude that online analytical processing systems must not depend on dynamic, user-specified SQL queries if consistent overall system performance is required. If SQL queries can be structured dynamically from user input, then system designers will not be able to guarantee performance.
Introduction
SQL query processing requires that the DBMS identify and execute a strategy for retrieving the results of the query. The SQL query determines what data is to be found, but does not define the method by which the data manager searches the database. Hence, query optimization is necessary for high-level relational queries, and it provides an opportunity for the DBMS to systematically evaluate alternative query execution strategies and to choose an optimal one. In some cases the data manager cannot determine the optimal strategy; instead, it makes assumptions that are predicated on the actual structure of the SQL query. These assumptions can significantly affect query performance, which implies that certain queries can exhibit significantly different response times after relatively innocuous changes in query syntax and structure.

For the purpose of this discussion an example medical database will be used. Figure 1 below illustrates our subject database schema for physicians, patients, and medical services. The Physician table contains one row for every physician in the system; its attributes describe the physician name, address, provider number, and specialty. The Patient table contains one row for every individual in the system. Patients have attributes listing their social security number, name, residence area, age, gender, and doctor. For simplicity, a physician can see many patients, but a patient has only one doctor. A Services table lists all the valid medical procedures that can be performed. When a patient is ill and under the care of a physician, a row exists in the Treatment table describing the prescribed treatment. This table contains one attribute recording the cost of the individual service and a compound key that identifies the patient, physician, and the specific service received.
Patient (1,000,000 rows): SSN, Name, Age, Gender, Area, Doctor
Physician (1,000 rows): Provider, Dr_SSN, Specialty, Dr_Name, Dr_Address
Service (10,000 rows): Service, Type
Treatment (10,000,000 rows): Patient, DrNum, SrvNum, Cost
Figure 1. Example medical database schema with table row counts.
Query Processing
The steps necessary for processing an SQL query are shown in Figure 2. The SQL query statement is first parsed into its constituent parts. The basic SELECT statement is formed from the three clauses SELECT, FROM, and WHERE. These clauses identify the various tables and columns that participate in the data selection process. The WHERE clause determines the order and precedence of the various attribute comparisons through a conditional expression. An example query to determine the names and addresses of all patients of Doctor 1234 is shown as query Q1 below. Its WHERE clause uses a conjunctive condition that combines two attribute comparisons; more complex conditions are possible.

Q1:
SELECT Name, Address, Dr_Name FROM Patient, Physician
WHERE Patient.Doctor = Physician.Provider AND Physician.Provider = 1234
The query optimizer has the task of determining the optimum query execution plan. The term optimizer is actually a misnomer, because in many cases the optimum strategy is not found. The goal is to find a reasonably efficient strategy for executing the query. Finding the perfect strategy is usually too time consuming and can require detailed information on both the data storage structure and the actual data content; usually this information is simply not available. Once the execution plan is established, the query code is generated. Various techniques such as memory management, disk caching, and parallel query execution can be used to improve query performance. However, if the plan is not correct, then the query performance cannot be optimum.
Figure 2. Steps in processing an SQL query: parsing, query optimization, code generation, and execution.
Query Optimization
There are two main techniques for query optimization. The first approach is to use a rule-based or heuristic method for ordering the operations in a query execution strategy. The rules usually state general characteristics of data access; for example, it is more efficient to search a table using an index, if one is available, than to perform a full table scan. The second approach systematically estimates the cost of different execution strategies and chooses the least-cost solution. This approach uses simple statistics about the data structure size and organization as arguments to a cost-estimating equation. In practice, most commercial database systems use a combination of both techniques.
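The two approaches can be contrasted with a minimal sketch. The functions and constants below are illustrative assumptions rather than any real optimizer's interface: the rule-based chooser applies the fixed prefer-an-index heuristic, while the cost-based chooser builds rough cost estimates from table statistics and keeps the cheapest candidate.

    import math

    RANDOM_IO_PENALTY = 4  # assumed cost ratio of a random row fetch vs. a sequential read

    def rule_based_choice(has_index: bool) -> str:
        # Heuristic rule: an indexed access path, when one exists, beats a scan.
        return "index_lookup" if has_index else "full_scan"

    def cost_based_choice(n_rows: int, matching_rows: int, has_index: bool) -> str:
        # Estimate the work done by each candidate strategy and keep the cheapest.
        candidates = {"full_scan": n_rows}
        if has_index:
            candidates["index_lookup"] = math.log2(n_rows) + matching_rows * RANDOM_IO_PENALTY
        return min(candidates, key=candidates.get)

    print(rule_based_choice(True))                       # index_lookup
    print(cost_based_choice(1_000_000, 1, True))         # index_lookup
    print(cost_based_choice(1_000_000, 500_000, True))   # full_scan: the rule alone would mislead here

Note that the cost-based chooser can overrule the heuristic when a condition matches a large fraction of the table, which is one reason commercial systems combine the two techniques.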
Indexes

Consider, for example, a rule-based technique for query optimization which states that indexed access to data is preferable to a full table scan. Whenever a single condition specifies the selection, it is a simple matter to check whether or not an indexed access path exists for the attribute involved in the condition. Queries Q2 and Q3 below have an identical syntactic structure. However, query Q2 selects on the patient SSN, which is indexed, while query Q3 selects on the patient name, which is not. Assuming a balanced tree-based index, query Q2 will in the worst case access on the order of log2(n) entries to locate the required row in the table. Conversely, query Q3 must search on average n/2 rows to find the entry during a full table scan, and n rows if the entry does not exist in the table. When n = 1,000,000 this is the difference between accessing 20 rows and 500,000 rows for a successful search. Clearly, indexing can significantly improve query performance. However, it is not always practical to index every attribute in every table, so certain types of user queries can respond quite differently from others.
Q2:
SELECT * FROM Patient WHERE Patient.SSN = 11111111
In this query, the SSN attribute is the primary key index for the Patient table.

Q3:
SELECT * FROM Patient WHERE Patient.Name = 'Doe, John Q.'
In this query, no index exists on the Name attribute, so a full table scan is required.
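The access counts claimed above can be reproduced directly from the table size. The snippet below is only a sketch in which cost is the number of rows or index entries examined.

    import math

    n = 1_000_000                            # rows in the Patient table

    btree_lookup = math.ceil(math.log2(n))   # Q2: balanced-tree index, worst case
    scan_hit     = n // 2                    # Q3: full scan, average successful search
    scan_miss    = n                         # Q3: full scan, name not present

    print(btree_lookup)   # 20
    print(scan_hit)       # 500000
    print(scan_miss)      # 1000000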
Selectivities

A more significant problem occurs when more than one condition is used in a conjunctive selection. In this case the selectivity of each condition must be considered. Selectivity is defined as the ratio of the number of rows that satisfy the condition to the total number of rows in the table; it is the probability that a row satisfies the condition, assuming a uniform distribution. If the selectivity is small, then only a few rows are selected by the condition, and it is desirable to apply this condition first when retrieving records. To calculate selectivities, the database manager needs statistics on all table and attribute values. The heuristic rule states that, for multiple conjunctive conditions, the order of application is from smallest selectivity to largest. Queries Q4 and Q5 illustrate multiple conditions in a conjunctive selection on the Patient table. Consider the case where the selectivity on Age is 10,000/1,000,000 = 0.01 (Age is assumed to be uniformly distributed between 0 and 100), while the selectivity on Gender is 500,000/1,000,000 = 0.5 (Gender is assumed to be either M or F). By using Age as the first retrieval condition, 10,000 rows are accessed for testing against the Gender condition, versus 500,000 rows if the Gender attribute were chosen first: a 50 times performance difference. Selectivities can be used only if statistics are maintained by the database manager. If this information is not available, then the order of condition testing often defaults to the order of the conditions as specified in the WHERE clause.
Q4:
SELECT * FROM Patient WHERE Age = 45 AND Gender = 'M'

Q5:
SELECT * FROM Patient WHERE Gender = 'M' AND Age = 45
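The selectivity arithmetic for Q4 and Q5 can be made concrete with a short sketch; the row counts are the uniform-distribution assumptions stated above, not measured statistics.

    n_patients = 1_000_000

    sel_age    = 10_000 / n_patients     # Age uniform over 0..100  -> 0.01
    sel_gender = 500_000 / n_patients    # Gender either 'M' or 'F' -> 0.5

    # Rows fetched by whichever condition is applied first, before the
    # second condition filters them.
    rows_if_age_first    = sel_age * n_patients      # 10,000
    rows_if_gender_first = sel_gender * n_patients   # 500,000

    print(rows_if_age_first, rows_if_gender_first)   # 10000.0 500000.0
    print(rows_if_gender_first / rows_if_age_first)  # 50.0: the factor quoted in the text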
Uniformity

In many cases the actual data does not follow a uniform distribution. Consider the case where 95% of the patients live in the province of New Brunswick and the remaining 5% live in 199 different states and countries of the world. In this case there are 200 different values for the Area attribute. The selectivity of the Area attribute, assuming a uniform distribution, is 5,000/1,000,000 = 0.005. Thus, this attribute will be accessed first in any query with a conjunctive clause relating Area and Age. In the example below, query Q6 selects on the province of Ontario. We estimate that (5% of 1,000,000) / 199, or about 251, patients live in Ontario. These rows are accessed first and then tested against the Age condition. Conversely, query Q7 selects patients in the province of New Brunswick. In this case, 950,000 patient rows are accessed, or more than 3,700 times the number of rows for the Ontario example. The distribution was skewed sufficiently to result in a poor choice by the query optimizer. Clearly, non-uniform data distributions can significantly affect query performance.
Q6:
SELECT * FROM Patient WHERE Area = 'Ontario' AND Age = 45
A uniform distribution for out-of-province residents predicts that about 251 patients live in Ontario.

Q7:
SELECT * FROM Patient WHERE Area = 'New Brunswick' AND Age = 45
The actual data has 950,000 patients living in New Brunswick.
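The gap between the optimizer's uniform estimate and the skewed reality can be reproduced from the figures above; this is only a sketch of the estimate, not a model of how any particular optimizer handles skew.

    n_patients = 1_000_000
    n_areas    = 200

    uniform_estimate = n_patients // n_areas               # 5,000 rows expected for any single area

    ontario_actual       = n_patients * 5 // 100 // 199    # 5% spread over 199 out-of-province areas
    new_brunswick_actual = n_patients * 95 // 100           # 95% concentrated in one area

    print(uniform_estimate)                                 # 5000
    print(ontario_actual, new_brunswick_actual)             # 251 950000
    print(new_brunswick_actual // ontario_actual)           # 3784: the >3,700x gap noted above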
Disjunctive Clauses

A disjunctive clause occurs when simple conditions are connected by the OR logical connective rather than AND. These clauses are much harder to process and optimize. For example, consider query Q8, which uses a disjunctive clause relating a specific doctor and the patient area of residence. With such a condition, little optimization can be done because the rows satisfying the query are the union of the rows satisfying each of the individual conditions. If any one of the search conditions does not have an access path, then the query optimizer is compelled to choose a full table scan to satisfy the query. Performance can be improved only if an access path exists for every condition in the disjunctive clause. In that case, a row set satisfying each condition can be retrieved, and the sets can then be combined with a union operation that eliminates duplicate rows. However, set union operations can also be expensive: the customary way to implement them is to sort the relations on the same attributes and then scan the sorted files to eliminate duplicate rows. Superficially, the differences between queries Q8 and Q9 appear trivial, yet the queries can have profound differences in performance. In many cases the use of disjunctive clauses in queries results in either a brute-force linear search of the table or a sort of a potentially large amount of data.
Q8:
SELECT * FROM Patient WHERE Doctor = 1234 OR Area = 'Ontario'

Q9:
SELECT * FROM Patient WHERE Doctor = 1234 AND Area = 'Ontario'
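To see why the OR form is so much more expensive, the sketch below compares rough strategies for Q8 and Q9, counting only rows and index entries examined. The per-doctor row count (1,000 patients per doctor) and the presence of indexes on both Doctor and Area are assumptions for illustration, not facts from the schema.

    import math

    n_patients       = 1_000_000
    rows_doctor_1234 = 1_000    # assumed: 1,000,000 patients spread evenly over 1,000 doctors
    rows_ontario     = 251      # from the skewed Area example above

    # Q9 (AND): fetch the smaller indexed candidate set, then filter it
    # against the other condition.
    cost_and = min(rows_doctor_1234, rows_ontario)

    # Q8 (OR) with no access path on one condition: full table scan.
    cost_or_scan = n_patients

    # Q8 (OR) with indexes on both conditions: fetch both row sets, then
    # sort the combined result to eliminate duplicates (sort-based union).
    k = rows_doctor_1234 + rows_ontario
    cost_or_union = k + k * math.log2(k)

    print(cost_and)               # 251
    print(cost_or_scan)           # 1000000
    print(round(cost_or_union))   # ~14,000: cheaper than a scan, far dearer than the AND case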
Join Selectivities

The JOIN operation is one of the most time-consuming operations in query processing. A join operation matches two tables across domain-compatible attributes. One common technique for performing a join is a nested (inner-outer) loop, or brute-force, approach: for every row in the first table, a scan of the second table is performed and every record is tested against the join condition. A second technique is to use an access structure or index to retrieve the matching records: for every row in the first table, an index is used to access the matching records in the second table. One factor that significantly affects the performance of a join is the percentage of rows in one table that will be joined with rows in the other table. This is called the join selection factor. It depends not only on the two tables to be joined, but also on the join fields, if there are multiple join conditions between the two tables. For example, query Q10 joins each Physician row with the Patient rows. Each physician is expected to exist once in the Patient table (after all, a physician is also a patient), but 999,000 patient rows will not be joined. Suppose indexes exist on each of the join attributes. There are two options for performing the join. The first retrieves each Patient row and then uses the index into the Physician table to find the matching record; no matching records will be found for those patients who are not also physicians. The second option first retrieves each Physician row and then uses the index into the Patient table to find the matching Patient row; every physician will have one matching patient row. Clearly, the second option is more efficient than the first. This occurs because the join selection factor of Physician with respect to the join condition is 1, whereas the Patient selection factor with respect to the same join condition is 1,000/1,000,000. Choosing optimum join methods requires that various table sizes and other statistics be used to compute estimated join selectivities.
Q10:
SELECT * FROM Patient, Physician WHERE Patient.SSN = Physician.Dr_SSN

Q11:
SELECT * FROM Patient, Physician WHERE Physician.Dr_SSN = Patient.SSN

If join selectivities are not used, then these two queries can exhibit quite different performance.
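The difference between the two join orders for Q10 and Q11 can be approximated by counting index probes, using the row counts from Figure 1; the logarithmic probe cost is a simplifying assumption about the index structure.

    import math

    n_patients   = 1_000_000
    n_physicians = 1_000

    def index_join_cost(outer_rows, inner_rows):
        # Index-based nested loop: one index probe into the inner table
        # for every row of the outer table.
        return outer_rows * math.log2(inner_rows)

    patient_outer   = index_join_cost(n_patients, n_physicians)   # Patient as the outer table
    physician_outer = index_join_cost(n_physicians, n_patients)   # Physician as the outer table

    print(round(patient_outer))                    # ~10 million probe steps
    print(round(physician_outer))                  # ~20 thousand probe steps
    print(round(patient_outer / physician_outer))  # 500: the penalty for the wrong join order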
Views

A view in SQL is a single table that is derived from other tables. A view can be considered a virtual table or a stored query, and is often used to capture a frequently used query. This is of particular benefit if tables must be joined or restricted. One difficulty with views is that a view can hide the query complexity from the user. For example, view V1 below describes a virtual table that contains the same number of rows as the Physician table. Query Q12 accesses the Patient, Physician, and Treatment tables through view V1 to determine the total cost of services that ophthalmologists have rendered. Conversely, query Q13 accesses only the Physician table to retrieve (different) data on ophthalmologists. The problem is that both Q12 and Q13 appear to be of the same order of complexity, given that knowledge of the view is hidden, yet each query will clearly have a different performance profile.
V1:
CREATE VIEW DrService (Dr, Specialty, Age, TotCost) AS
SELECT Provider, Specialty, Age, SUM(Cost)
FROM Patient, Physician, Treatment
WHERE SSN = Dr_SSN AND DrNum = Provider
GROUP BY Provider, Specialty, Age
This view matches the Physician table to the Treatment table, and then joins the result to the Patient table.

Q12:
SELECT * FROM DrService WHERE Specialty = 'Ophthalmologist'
This query performs a three-way join through the view.

Q13:
SELECT * FROM Physician WHERE Specialty = 'Ophthalmologist'
This query simply scans one table.
Conclusions
For many decision support systems we have observed that clients expect that information can always be retrieved efficiently, provided the database is designed properly. We have attempted to show why this is a myth. Queries formulated in an SQL query language provide little predictive information useful for estimating query performance. Internal knowledge of the database structure, the data distribution, and the query optimizing strategy is necessary to develop effective query statements, and this technical knowledge rarely exists in the user community. This leads us to recommend that enterprise decision support systems remain independent of user-developed, unstructured queries. Any request to integrate ineffective or unproven query statements into a management system should be discouraged; the inevitable result is a dissatisfied client.