11 Physical design
UpdateSalary monthly 10 S U
Decisions to be taken
• The physical organization of relations
• Indexes
• Index type
• ... Logical schema transformation to improve
performance
Decisions for relations and indexes
• Storage structures for relations:
– heap (small data set, scan operations, use of indexes)
– sequential (sorted static data)
– hash (key equality search), usually static
– tree (index sequential) (key equality and range search)
• Choice of secondary index, considering that
– they are extremely useful
– they slow down updates of the index keys
– they require memory
How to choose indexes
• Use a DBMS tool, such as DB2 Design Advisor,
SQL Server Database Engine Tuning Advisor, or
Oracle SQL Access Advisor.
How to choose indexes
• Tips: don’t use indexes
– on small relations,
– on frequently modified attributes,
– on non-selective attributes (queries that return ≥ 15% of the data),
– on attributes whose values are long strings.
• Define indexes on primary and foreign keys
• Consider defining indexes on attributes used
in queries that require sorting: ORDER BY,
GROUP BY, DISTINCT, set operations
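As a minimal sketch of the sorting tip, the following uses SQLite through Python's sqlite3 module (an assumption made for illustration; the slides do not name a DBMS). The Lecturers table is reduced to three hypothetical columns, and the helper order_by_plan is an illustrative name. It compares the query plan for an ORDER BY with and without an index on the sort attribute.

```python
import sqlite3


def order_by_plan(with_index: bool) -> str:
    """Return SQLite's plan for an ORDER BY query as one string."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified version of the Lecturers relation.
    conn.execute(
        "CREATE TABLE Lecturers ("
        "PkLecturer INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)"
    )
    if with_index:
        conn.execute("CREATE INDEX IdxSalary ON Lecturers(Salary)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT Name FROM Lecturers ORDER BY Salary"
    ).fetchall()
    conn.close()
    # The human-readable plan step is the last column of each row.
    return "; ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print("without index:", order_by_plan(False))
    print("with index:   ", order_by_plan(True))
```

Without the index the plan should contain an explicit sort step (a temporary B-tree); with the index SQLite can read the rows already ordered by Salary. Other systems report plans differently, so the exact text is specific to SQLite.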
How to choose indexes (cont.)
• Evaluate the convenience of indexes on attributes
that can be used to generate index-only plans.
• Evaluate the convenience of indexes on selective
attributes in the WHERE:
– hash indexes, for equality search
– B+tree, for equality and range search, possibly clustered
– multi-attribute (composite), for conjunctive conditions
• Evaluate the convenience of indexes to improve joins with
IndexNestedLoop or MergeJoin, grouping, sorting, and duplicate elimination.
• Pay attention to disjunctive conditions
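The idea of an index-only plan can be sketched as follows, again assuming SQLite via Python's sqlite3 module. The index name IdxSalaryName, the reduced Lecturers schema, and the helper index_only_plan are hypothetical. Because the index contains every attribute the query needs, the DBMS can answer the query without touching the table.

```python
import sqlite3


def index_only_plan() -> str:
    """Return SQLite's plan for a query answerable from the index alone."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Lecturers ("
        "PkLecturer INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)"
    )
    # Composite index that holds both the search key and the projected attribute.
    conn.execute("CREATE INDEX IdxSalaryName ON Lecturers(Salary, Name)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT Name FROM Lecturers WHERE Salary BETWEEN 50 AND 60"
    ).fetchall()
    conn.close()
    return "; ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print(index_only_plan())
```

SQLite calls such an index a "covering index" in its plan output; the price is a larger index that must be maintained on every update of Salary or Name.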
How to choose indexes (cont.)
– Local view: indexes useful for one query (simple)
Departments Lecturers
Nrec 300 10 000
Npag 30 1 200
Nkey(IdxSalary) 50 (min=40, max=160)
Nkey(IdxCity) 15
SELECT Name
FROM Lecturers
WHERE PkLecturer = 70
The index is created implicitly on a PK
The definition of (multi-attribute) indexes
How many indexes on a table? How many indexes does the DBMS use for an AND condition?
SELECT Name
FROM Lecturers
WHERE Position = 'P'
AND Salary BETWEEN 50 AND 60
• A secondary index on Position or on Salary?
• A secondary index on Position and another on Salary?
• A secondary index on <Position, Salary>?
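One way to explore the composite-index option is to ask a DBMS how it would use it. The sketch below assumes SQLite via Python's sqlite3 module; the index name IdxPositionSalary, the reduced schema, and the helper composite_index_plan are hypothetical, and other systems may choose differently (for example by intersecting two single-attribute indexes).

```python
import sqlite3


def composite_index_plan() -> str:
    """Return SQLite's plan for the conjunctive query on Position and Salary."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Lecturers ("
        "PkLecturer INTEGER PRIMARY KEY, Name TEXT, "
        "Position TEXT, Salary INTEGER)"
    )
    # Equality attribute first, range attribute second.
    conn.execute(
        "CREATE INDEX IdxPositionSalary ON Lecturers(Position, Salary)"
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT Name FROM Lecturers "
        "WHERE Position = 'P' AND Salary BETWEEN 50 AND 60"
    ).fetchall()
    conn.close()
    return "; ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print(composite_index_plan())
```

The plan should show a single index search that applies both conditions, which is why the attribute order (equality before range) matters for a composite index.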
Fully efficient DB tuning requires deep knowledge about...
• OS: How does it manage HW resources and services used by the overall system?
• HW: What are suitable HW components to support the performance requirements?
Examples
Adding indexes -> benefit: better query performance
costs: more disk memory, more update time
Denormalization -> benefit: better query performance
costs: need to control redundancy within tables
Replace disk by RAID -> benefit: better I/O performance
costs: HW costs
Schallehn: Database Tuning and Self-Tuning 2012
Basic principles: Pareto principle
80/20 Rule: by applying 20% of the effort one can achieve 80% of the
desired effects.
Hence, one does not need to be an expert on all levels of the system to
be able to implement a reasonable solution...
-- T1.begin:                          -- T2.begin: RU
UPDATE Account
SET Balance = Balance - 200.00
WHERE No = 123;
                                      SELECT AVG(Balance)
                                      FROM Account;
                                      COMMIT;
ROLLBACK;
Isolation levels in SQL
READ COMMITTED: shared read locks are released immediately,
exclusive locks are held until the transaction commits
Problem: dirty reads are avoided, but not unrepeatable reads or lost updates
-- T1.begin:                          -- T2.begin: RC
                                      SELECT AVG(Balance)
                                      FROM Account;
UPDATE Account
SET Balance = Balance - 200.00
WHERE No = 123;
COMMIT;
                                      SELECT AVG(Balance)
                                      FROM Account;
                                      COMMIT;
Isolation levels in SQL
REPEATABLE READ: shared and exclusive locks on records are held until the
end of the transaction
Problem: the previous problems are avoided, but not the “phantom records”
problem:
-- T1.begin:                          -- T2.begin: RR
                                      SELECT AVG(Balance)
                                      FROM Account;
INSERT INTO Account
VALUES(1233,'xx',200.00);
COMMIT;
                                      SELECT AVG(Balance)
                                      FROM Account;
                                      COMMIT;
Isolation levels in SQL (cont.)
• SERIALIZABLE, multi-granularity locks: tables read by
a transaction cannot be updated by other transactions.
• Good, but the number of transactions which can be
executed concurrently is considerably reduced.
DBMS isolation levels
• Commercial DBMS may
– provide some isolation levels only,
– not have the same isolation level by default
– have other isolation levels (e.g. SNAPSHOT)
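To illustrate that real systems differ from the lock-based levels above, the following sketch assumes SQLite in WAL mode via Python's sqlite3 module, where a reading transaction sees a consistent snapshot. The Account table is reduced to two hypothetical columns and two made-up rows; snapshot_demo is an illustrative name. T2 reads the average twice while T1 commits an update in between.

```python
import os
import sqlite3
import tempfile


def snapshot_demo():
    """Return T2's two reads inside one transaction and one read after it."""
    avg = "SELECT AVG(Balance) FROM Account"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")
        # isolation_level=None: transactions are controlled explicitly.
        t1 = sqlite3.connect(path, isolation_level=None)
        t1.execute("PRAGMA journal_mode=WAL")
        t1.execute(
            "CREATE TABLE Account (No INTEGER PRIMARY KEY, Balance REAL)"
        )
        t1.execute("INSERT INTO Account VALUES (123, 1000.0), (456, 500.0)")
        t2 = sqlite3.connect(path, isolation_level=None)
        try:
            t2.execute("BEGIN")
            first = t2.execute(avg).fetchall()[0][0]
            # T1 updates and commits (autocommit) while T2 is still open.
            t1.execute(
                "UPDATE Account SET Balance = Balance - 200.00 WHERE No = 123"
            )
            second = t2.execute(avg).fetchall()[0][0]
            t2.execute("COMMIT")
            after = t2.execute(avg).fetchall()[0][0]
        finally:
            t2.close()
            t1.close()
    return first, second, after


if __name__ == "__main__":
    print(snapshot_demo())
```

Under this assumption both reads inside T2 return the same average, and the update becomes visible only after T2 ends, so the unrepeatable read of the READ COMMITTED schedule does not occur.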
Logical schema tuning
• Types of logical schema restructuring:
– Vertical Partitioning.
– Horizontal Partitioning
– Denormalization
• Unlike changes to the physical schema (physical
independence), changes to the logical schema
(schema evolution) require the creation of views to
preserve logical independence.
Logical schema tuning
• Partitioning: splitting a table for performance
– Horizontal: on a property
– Vertical: R1(pk, Name, Surname) R2(pk, Address, …)
• Normalization: divide Students from Exams to avoid
anomalies
• Denormalization: store Students and Exams into one
table:
– increases update time but makes joins faster
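A minimal sketch of vertical partitioning with a view that preserves logical independence, assuming SQLite via Python's sqlite3 module. R1 and R2 follow the example above; the view name R, the sample rows, and the helper vertical_partition_demo are hypothetical.

```python
import sqlite3


def vertical_partition_demo():
    """Split a relation vertically and rebuild it through a view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE R1 (pk INTEGER PRIMARY KEY, Name TEXT, Surname TEXT);
        CREATE TABLE R2 (pk INTEGER PRIMARY KEY REFERENCES R1(pk),
                         Address TEXT);
        INSERT INTO R1 VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
        INSERT INTO R2 VALUES (1, 'London'), (2, 'Manchester');
        -- Applications written against the original relation query R.
        CREATE VIEW R AS
            SELECT R1.pk, R1.Name, R1.Surname, R2.Address
            FROM R1 JOIN R2 ON R2.pk = R1.pk;
        """
    )
    rows = conn.execute(
        "SELECT pk, Name, Surname, Address FROM R ORDER BY pk"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(vertical_partition_demo())
```

Queries that need only Name and Surname can read the narrower R1 directly, while existing queries keep working through the view at the cost of a join.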
Vertical partitioning (projections)
Students(Name, StudentNo, City, BirthYear, BDegree, University)
Exams(PkE, Course, StudentNo, Master, Date, Other)
Critical Query:
Find the number of exams passed and the number of students who have
done the test by course, and by academic year.
ExamForAnalysis(PkE, Course, Master, Date)
ExamsOther(PkE, StudentNo, Other)
MasterExam(FkM, FkE, ...)
Master(PkM, Title, President, ...)
Critical Query:
For a study program with title X and a course with fewer than 5 exams passed,
find the number of exams, by course and by academic year.
MasterXExams(PkE, Course, StudentNo, Degree, Date, ...)
MasterYExams(PkE, Course, StudentNo, Degree, Date, ...)
...
Denormalization (attribute replication)
Students(Name, StudentNo, City, BirthYear, BDegree, University)
Exams(PkE, Course, Student, Degree, Date, Other)
MasterExam(FkM, FkE, ...)
Master(PkM, Title, President, ...)
Critical Query:
For a student number N, find the student name, the master program title
and the grade of exams passed
Exams(PkE, Course, Student, Name, Degree, Date, Master, Other)
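The trade-off of attribute replication can be sketched as follows, assuming SQLite via Python's sqlite3 module. The schema is a simplified, hypothetical version of the one above (only the student name is replicated; the Master title is left out), and the table ExamsDenorm, the Grade column, the sample rows, and the helper denormalization_demo are made up for illustration.

```python
import sqlite3


def denormalization_demo():
    """Compare a join query with its denormalized equivalent."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Students (StudentNo INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Exams (PkE INTEGER PRIMARY KEY, Course TEXT,
                            StudentNo INTEGER, Grade INTEGER);
        INSERT INTO Students VALUES (1, 'Rossi'), (2, 'Bianchi');
        INSERT INTO Exams VALUES (10, 'Databases', 1, 28),
                                 (11, 'Algorithms', 1, 30),
                                 (12, 'Databases', 2, 25);
        -- Denormalized copy: the student name is replicated in every exam.
        CREATE TABLE ExamsDenorm AS
            SELECT e.PkE, e.Course, e.StudentNo, s.Name, e.Grade
            FROM Exams e JOIN Students s ON s.StudentNo = e.StudentNo;
        """
    )
    joined = conn.execute(
        "SELECT s.Name, e.Course, e.Grade "
        "FROM Exams e JOIN Students s ON s.StudentNo = e.StudentNo "
        "WHERE e.StudentNo = ? ORDER BY e.PkE",
        (1,),
    ).fetchall()
    flat = conn.execute(
        "SELECT Name, Course, Grade FROM ExamsDenorm "
        "WHERE StudentNo = ? ORDER BY PkE",
        (1,),
    ).fetchall()
    # Cost of redundancy: renaming one student touches one row per exam.
    rows_touched = conn.execute(
        "UPDATE ExamsDenorm SET Name = 'Rossi-Verdi' WHERE StudentNo = 1"
    ).rowcount
    conn.close()
    return joined, flat, rows_touched


if __name__ == "__main__":
    print(denormalization_demo())
```

The critical query needs no join on the denormalized table, but a single logical change (one student's name) now updates as many rows as that student has exams, which is the redundancy that must be kept under control.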
The DBMS itself is the best tuning expert!
• OS: knowledge about it is encoded in platform-specific code + runtime information via OS interfaces
• HW: knowledge about it is encoded in platform-specific code + runtime information via OS interfaces