Solution 03

The document outlines Exercise 3 for a Database Systems course, focusing on various join algorithms, sorting techniques, and histogram calculations. It includes detailed questions and solutions related to Nested-Loop-Join, Hash-Join, join implementations in code, external sorting, and V-optimal histograms. Each question provides specific tasks, calculations, and expected outcomes, along with insights into performance comparisons and algorithm efficiency.

Database Systems WS 2024/25

Prof. Dr.-Ing. Sebastian Michel

M.Sc. Angjela Davitkova
Exercise 3: Handout 04.11.2024, Due 11.11.2024 12:00 CET https://dbis.cs.uni-kl.de
Lecture content: Videos up to #016

Question 1: Nested-Loop-Join and Hash-Join (1 P.)

We want to join the relations Invoice and Item. There are 100 000 invoices and 200 000 items. Each
tuple of each relation fits in exactly one page. The join buffer has 128 pages and the result tuples of the
join also fit into one page (We perform the projection during the join). For this join Item is the inner
relation and Invoice the outer.

a) Instead of counting all page accesses, we now would like to estimate the number of sequential page
accesses (e.g., loading the first 128 pages of a table into the buffer counts as 1 sequential page
access).
In the Block-Nested-Loop join from the lecture, as many tuples of the outer relation as possible
are stored in the buffer. Now we will fill the buffer with 50% inner tuples and 50% outer tuples.
Calculate the number of sequential page accesses that are required to compute the join.

Solution
One page in the buffer is reserved for the result tuples. The remaining 127 pages are split into 64
pages for the outer and 63 pages for the inner relation (or the other way around). For each 64-page
chunk of the outer relation we have to read the whole inner relation in 63-page chunks.
The outer relation has $\lceil 100\,000/64 \rceil = 1\,563$ of these chunks, while the inner relation has
$\lceil 200\,000/63 \rceil = 3\,175$.
Thereby, the total number of sequential page accesses is $1\,563 + 1\,563 \cdot 3\,175 = 4\,964\,088$.
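The calculation above can be reproduced with a few lines of Python. This is only a sketch of the counting argument; the function and parameter names are illustrative and not taken from the lecture.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def bnl_sequential_accesses(outer_pages, inner_pages, outer_buf, inner_buf):
    """Sequential accesses of a block-nested-loop join with a split buffer."""
    outer_chunks = ceil_div(outer_pages, outer_buf)  # one access per outer chunk
    inner_chunks = ceil_div(inner_pages, inner_buf)  # inner scan, once per outer chunk
    return outer_chunks + outer_chunks * inner_chunks

# 128 buffer pages, 1 reserved for the result, 64 for Invoice, 63 for Item.
total = bnl_sequential_accesses(100_000, 200_000, outer_buf=64, inner_buf=63)
print(total)  # 4964088
```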

b) We now want to perform a hash join with the hash function mod k. Determine k such that
the buffer is utilized optimally, and calculate the number of sequential page accesses for the join.
You can ignore the write operations after partitioning and assume that all relevant attributes are
uniformly distributed natural numbers ≥ 1.

Solution
With the sizes of the relations, the uniform distribution and the lossless join we can determine
that each invoice will be joined with 2 items. We use the buffer optimally if it is filled with tuples
from Invoice and Item in a 1 : 2 ratio. 127 pages are available for tuples in the buffer. Thereby,
$\lfloor 127/3 \rfloor = 42$, so $42 \cdot 1 = 42$ Invoice tuples and $42 \cdot 2 = 84$ Item tuples should be stored in the buffer.
In total, 126 of the 127 available pages will be used.
Since we want to end up with 42 Invoice pages in each bucket, k can now be calculated as
$k = \lceil 100\,000/42 \rceil = 2\,381$.
To calculate the sequential reads, we first have to partition both relations by reading them with
$100\,000/127 + 200\,000/127$ sequential accesses. The $k = 2\,381$ partitions of both relations are then
read and joined, resulting in another $2 \cdot 2\,381$ accesses.
In total, $100\,000/127 + 200\,000/127 + 2 \cdot 2\,381$ sequential read operations are performed.
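The following sketch evaluates these expressions numerically. The solution leaves the final sum unevaluated; rounding each fractional term up to a whole access is an assumption made here, under which the sum comes to 788 + 1 575 + 4 762 = 7 125. All variable names are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

INVOICES, ITEMS = 100_000, 200_000
buffer_pages = 128 - 1                    # one page is reserved for the result

invoice_share = buffer_pages // 3         # 1 : 2 ratio, so 42 Invoice pages
item_share = 2 * invoice_share            # and 84 Item pages
k = ceil_div(INVOICES, invoice_share)     # number of partitions

# Assumption: a partially filled last buffer load still costs one access.
partition_reads = ceil_div(INVOICES, buffer_pages) + ceil_div(ITEMS, buffer_pages)
join_reads = 2 * k                        # one read per partition and relation
total_reads = partition_reads + join_reads
print(k, partition_reads, join_reads, total_reads)
```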


Question 2: Join Implementation (1 P.)

This question requires you to implement code in any programming language you want. The code has to
compile and return the correct result. Submit the code as a separate file (or archive, if you have
more than one source file). If you use a different language than Java, please provide instructions on
how to compile and run your code. In OLAT, we provide a Java template with most of the boilerplate
code already in place.
Please delete all indices that were created on the tables lineitem and orders.

a) Implement the following query (see footnote 1):

SELECT l_orderkey, l_shipdate, o_orderdate
FROM lineitem JOIN orders ON l_orderkey = o_orderkey

First you should load the required values from the TPC-H database into lists. You then execute a
nested loop join and an index-based nested loop join once. The index should be a suitable HashMap.

Solution
Pseudo code nested loop join:

result = []
for o in outer:
    for i in inner:
        if o.key == i.key:    # join condition (l_orderkey = o_orderkey)
            result << new Tuple(o, i)
return result

Pseudo code index-based:

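map = {}
for i in inner:
    map[i.key] = i    # assumes unique keys in the inner relation (e.g. o_orderkey)

result = []
for o in outer:
    if o.key in map:  # skip outer tuples without a join partner
        result << new Tuple(o, map[o.key])
return result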
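A minimal runnable version of both variants in Python could look as follows. The tuple types and the four sample rows are made up for illustration (they are not TPC-H data), and the sketch assumes that o_orderkey is unique, so that a plain dictionary suffices as the index.

```python
from collections import namedtuple

Order = namedtuple("Order", ["o_orderkey", "o_orderdate"])
LineItem = namedtuple("LineItem", ["l_orderkey", "l_shipdate"])

def nested_loop_join(lineitems, orders):
    """Compare every lineitem with every order."""
    result = []
    for l in lineitems:
        for o in orders:
            if l.l_orderkey == o.o_orderkey:
                result.append((l.l_orderkey, l.l_shipdate, o.o_orderdate))
    return result

def index_nested_loop_join(lineitems, orders):
    """Build a hash index on orders once, then probe it per lineitem."""
    index = {o.o_orderkey: o for o in orders}  # assumes unique order keys
    result = []
    for l in lineitems:
        o = index.get(l.l_orderkey)
        if o is not None:                      # no partner: no output tuple
            result.append((l.l_orderkey, l.l_shipdate, o.o_orderdate))
    return result

# Hypothetical sample data.
orders = [Order(1, "1996-01-02"), Order(2, "1996-12-01"), Order(3, "1993-10-14")]
lineitems = [LineItem(1, "1996-03-13"), LineItem(1, "1996-04-12"),
             LineItem(3, "1994-02-02"), LineItem(7, "1995-01-01")]

nl_rows = nested_loop_join(lineitems, orders)
idx_rows = index_nested_loop_join(lineitems, orders)
```

Both functions return the same three rows here; the lineitem with key 7 has no matching order and is dropped by both.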

b) Measure the execution time of your implementations (including the index creation time) and compare
it to the time Postgres requires for the same query. How do you explain the differences?

Solution
We collected statistics from multiple systems and observed that the nested loop join requires approx. 4 s,
the index-based (hash) join 20 ms, and Postgres 500 ms. The hash join is much faster than the nested loop join,
as we only have to iterate over the outer relation once and find the join partner in the inner relation in
constant time.
Postgres is slower than our hash join most probably because its measured time also includes the time Postgres
needs to load the tuples from disk.
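One way to take such measurements is sketched below in Python with `time.perf_counter`. The keys are synthetic integers rather than TPC-H data, and the absolute timings depend on the machine, so the figures quoted above will not be reproduced exactly. The index build is deliberately inside the timed call, as the task requires.

```python
import time

def nested_loop_join(outer, inner):
    # Compares every outer key with every inner key: O(|outer| * |inner|).
    return [(o, i) for o in outer for i in inner if o == i]

def hash_index_join(outer, inner):
    # Builds the index once, then probes once per outer key: O(|outer| + |inner|).
    index = {i: i for i in inner}
    return [(o, index[o]) for o in outer if o in index]

def timed(join, outer, inner):
    """Run a join and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = join(outer, inner)
    return result, time.perf_counter() - start

# Synthetic keys (an assumption): 1 000 order keys, 2 000 lineitem keys.
order_keys = list(range(1000))
lineitem_keys = [n % 1000 for n in range(2000)]

nl_result, nl_seconds = timed(nested_loop_join, lineitem_keys, order_keys)
hj_result, hj_seconds = timed(hash_index_join, lineitem_keys, order_keys)
print(f"nested loop: {nl_seconds:.4f} s, hash index: {hj_seconds:.4f} s")
```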
Footnote 1: In the example code we only load tuples where orderkey < 50000; you may try bigger values if you want.


c) Which join algorithm did Postgres choose? Make Postgres perform a merge join and measure the
execution time.

Solution
Postgres performs a hash join, by hashing the Orders tuples. To perform a merge join, the data
has to be readable in a sorted manner.
We can use either of the following two solutions to enable a sorted read:
• Create two indices, enabling an index-only scan:
CREATE INDEX index1 ON lineitem(l_orderkey, l_shipdate);
CREATE INDEX index2 ON orders(o_orderkey, o_orderdate);

• Or cluster the tables by orderkey:

CREATE INDEX index3 ON lineitem(l_orderkey);
CLUSTER lineitem USING index3;
CREATE INDEX index4 ON orders(o_orderkey);
CLUSTER orders USING index4;
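To illustrate why a merge join needs sorted inputs, here is a small in-memory sketch in Python. It is not how Postgres implements the operator; the function name, the key extractors and the sample tuples are assumptions for illustration only.

```python
def merge_join(left, right, key_left, key_right):
    """Join two lists that are already sorted by their join keys."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_left(left[i]), key_right(right[j])
        if kl < kr:
            i += 1                      # left key too small: advance left
        elif kl > kr:
            j += 1                      # right key too small: advance right
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and key_right(right[j_end]) == kl:
                j_end += 1
            # Pair every left tuple carrying this key with the whole right run.
            while i < len(left) and key_left(left[i]) == kl:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result

# Hypothetical sorted inputs: (key, payload).
left = [(1, "a"), (1, "b"), (3, "c"), (7, "d")]
right = [(1, "x"), (2, "y"), (3, "z"), (3, "w")]
pairs = merge_join(left, right, lambda t: t[0], lambda t: t[0])
```

Each input is scanned only once, front to back, which is possible only because equal keys sit next to each other in both sorted inputs.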

Question 3: Swapping Inputs of Join Operators at Runtime (1 P.)

Consider a natural join between two relations R(A, B) and S(B, C). As we know, the join operator is
associative and commutative, so regarding the results it does not matter if we execute R ⋈ S or S ⋈ R.
Assume that the join algorithm started executing R ⋈ S. Discuss whether it is possible to swap
the two inputs during execution and at which points of the algorithm this can be done. Do so for Nested
Loop Join as well as Merge Join (relations are already properly sorted).

Solution
Merge Join:
The merge join is naturally symmetric, so the inputs can in principle be swapped at any point. The only thing
to take care of is ties (runs of equal join-key values): either a run of ties is processed entirely before
swapping, or extra caution is needed to resume correctly within it.
Nested Loop Join:
The nested loop join is not symmetric, but whenever the inner loop is completed for one specific tuple of the
outer relation, we can swap inputs. We then have to make sure not to join tuples a second time.
E.g.: If |R| = 10, |S| = 20, and we just completed the join for $i_R = 2$ (0-based), then after swapping the inputs
the nested loop has to skip the already processed tuples R[0..2]:

for each s in S
    for each r in R[3..|R|-1]
        ...
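The swap for the nested loop join can be sketched in Python as below. The function name, the `swap_after` parameter and the sample relations are hypothetical; the sketch only shows that restricting the new inner loop to the unprocessed R tuples yields every result pair exactly once.

```python
def nested_loop_with_swap(R, S, swap_after, match):
    """Join R and S; after `swap_after` outer tuples of R, S becomes the outer input."""
    result = []
    # Phase 1: R is the outer relation for its first `swap_after` tuples.
    for r in R[:swap_after]:
        for s in S:
            if match(r, s):
                result.append((r, s))
    # Phase 2: inputs swapped. Only R tuples not handled in phase 1 are joined,
    # so no pair is produced twice.
    for s in S:
        for r in R[swap_after:]:
            if match(r, s):
                result.append((r, s))
    return result

# Hypothetical relations R(A, B) and S(B, C) with |R| = 10 and |S| = 20.
R = [(a, a % 4) for a in range(10)]
S = [(c % 4, c) for c in range(20)]
on_b = lambda r, s: r[1] == s[0]

swapped = nested_loop_with_swap(R, S, swap_after=3, match=on_b)  # swap after i_R = 2
plain = [(r, s) for r in R for s in S if on_b(r, s)]
```

The two results contain the same pairs, only in a different order.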


Question 4: External Sort (1 P.)

a) A file consisting of 100 000 blocks is to be sorted. The sorting should take no more than 4 merge
phases. Is this possible using standard external sort with an in-memory buffer of 20 pages? If it
is possible, what is the minimum buffer size required?

Solution
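With $N = 100\,000$ and $n_B = 20$ we calculate:
\[ n_R = \left\lceil \frac{N}{n_B} \right\rceil = \left\lceil \frac{100\,000}{20} \right\rceil = 5\,000 \]
\[ p = \left\lceil \log_{n_B - 1}(n_R) \right\rceil = \left\lceil \log_{19}(5\,000) \right\rceil = 3 \]
Thereby, it is possible to merge the blocks in the specified number of phases. To get the minimum
buffer size we may try sensible numbers or calculate it by using the phases formula:
\[ p = \left\lceil \log_{n_B - 1}\left( \left\lceil \frac{N}{n_B} \right\rceil \right) \right\rceil = 4 \]
Solving for $n_B$ (ignoring the ceilings) gives $n_B \approx 10.807$. Thereby, we need a minimum of 11 pages.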
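The "try sensible numbers" approach can be automated. The sketch below counts merge phases by repeated ceiling division instead of using the logarithm (which avoids floating-point rounding) and then searches for the smallest buffer; the function names are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def merge_phases(n_blocks, n_buffer):
    """Number of merge phases of standard external sort."""
    runs = ceil_div(n_blocks, n_buffer)          # pass 0 creates the initial runs
    phases = 0
    while runs > 1:
        runs = ceil_div(runs, n_buffer - 1)      # merge n_buffer - 1 runs at a time
        phases += 1
    return phases

def min_buffer(n_blocks, max_phases):
    """Smallest buffer size (at least 3 pages) that stays within max_phases."""
    n_buffer = 3
    while merge_phases(n_blocks, n_buffer) > max_phases:
        n_buffer += 1
    return n_buffer

print(merge_phases(100_000, 20))  # 3
print(min_buffer(100_000, 4))     # 11
```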

b) Apply external sort with and without blocked I/O, for $N$ and $n_B$ as specified in the previous
question. For blocked I/O assume a buffer block size of b = 2. Specify the number of runs in each pass
of the algorithm. Which algorithm requires fewer phases?

Solution
The number of initial runs is given by $\lceil N/n_B \rceil$, where $N$ is the number of blocks of the file and $n_B$ is
the available size of main memory used for sorting (buffer size). In each merge pass, external
sort without blocked I/O merges $n_B - 1$ runs at a time, whereas with blocked I/O it merges $\lfloor n_B/b \rfloor - 1$ runs
(one block is reserved for the output). For $n_B = 20$ and $b = 2$ these are 19 and 9 runs, respectively.
External sort without blocked I/O:
• Pass 0 (Initial Sorting): $\lceil 100\,000/20 \rceil = 5\,000$ runs.
• Pass 1: $\lceil 5\,000/19 \rceil = 264$ runs.
• Pass 2: $\lceil 264/19 \rceil = 14$ runs.
• Pass 3: merges the 14 runs.
External sort with blocked I/O:
• Pass 0 (Initial Sorting): $\lceil 100\,000/20 \rceil = 5\,000$ runs.
• Pass 1: $\lceil 5\,000/9 \rceil = 556$ runs.
• Pass 2: $\lceil 556/9 \rceil = 62$ runs.
• Pass 3: $\lceil 62/9 \rceil = 7$ runs.
• Pass 4: merges the 7 runs.
Clearly, external sort without blocked I/O requires fewer phases (3 merge passes instead of 4).
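The run counts per pass can be generated mechanically. In this sketch the fan-in (number of runs merged at a time) is passed in explicitly; the function name is illustrative, and the final entry 1 stands for the single sorted output of the last merge pass.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def runs_per_pass(n_blocks, n_buffer, fan_in):
    """Number of runs after pass 0, pass 1, ... until one sorted run remains."""
    runs = [ceil_div(n_blocks, n_buffer)]
    while runs[-1] > 1:
        runs.append(ceil_div(runs[-1], fan_in))
    return runs

N, N_BUFFER, BLOCK = 100_000, 20, 2
plain = runs_per_pass(N, N_BUFFER, fan_in=N_BUFFER - 1)             # fan-in 19
blocked = runs_per_pass(N, N_BUFFER, fan_in=N_BUFFER // BLOCK - 1)  # fan-in 9
print(plain)    # [5000, 264, 14, 1]
print(blocked)  # [5000, 556, 62, 7, 1]
```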


Question 5: V-Optimal Histograms (1 P.)

a) Given the following table of values and frequencies, calculate the V-optimal histogram for B = 2
cells. Use the algorithm that was discussed in the lecture.
Please write down P, PP and a table with all calculated SSE*(i, k) values, and note when and how
each value was updated. As the final result, the histogram bounds should be provided.

Value   Frequency
1       9
2       15
3       2
4       17
5       15

Solution

k     0    1    2    3    4    5
P     0    9   24   26   43   58
PP    0   81  306  310  599  824

In the lecture we saw that we can calculate the SSE with the following formula:

\[ \mathrm{SSE}([i, j]) = PP[j] - PP[i-1] - (j - i + 1) \cdot \left( \frac{P[j] - P[i-1]}{j - i + 1} \right)^2 \]
For k = 1 we do not have a choice on how to distribute the values, so we only have to calculate the
errors SSE(1, i):

k \ i   1       2       3       4        5
1       0.00    18.00   84.67   136.75   151.20
2       -       -       -       -        -
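As a worked instance of the formula, using the P and PP values from the table above, the entry for i = 3 in the k = 1 row can be checked by hand:

\[ \mathrm{SSE}([1, 3]) = PP[3] - PP[0] - 3 \cdot \left( \frac{P[3] - P[0]}{3} \right)^2 = 310 - 3 \cdot \left( \frac{26}{3} \right)^2 = 310 - \frac{676}{3} \approx 84.67 \]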

For k > 1 we now execute the algorithm as follows:

k   i   j   besterror[j][k−1]   SSE(j+1, i)   besterror[i][k]   Action

2   1   1     0.00                0.00            0.00          Initial
2   2   1     0.00                0.00            0.00          Initial
2   2   2    18.00                0.00            0.00          -
2   3   1     0.00               84.50            0.00          Initial
2   3   2    18.00                0.00           84.50          Replace
2   3   3    84.67                0.00           18.00          -
2   4   1     0.00              132.67            0.00          Initial
2   4   2    18.00              112.50          132.67          Replace
2   4   3    84.67                0.00          130.50          Replace
2   4   4   136.75                0.00           84.67          -
2   5   1     0.00              142.75            0.00          Initial
2   5   2    18.00              132.67          142.75          -
2   5   3    84.67                2.00          142.75          Replace
2   5   4   136.75                0.00           86.67          -
2   5   5   151.20                0.00           86.67          -


The full SSE*(i, k) table now looks like:

k \ i   1       2       3       4        5
1       0.00    18.00   84.67   136.75   151.20
2       0.00    0.00    18.00   84.67    86.67

By looking at the best choices made, we can see that the buckets are [1, 3] and [4, 5], with a total error
of 86.667.
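The whole computation can be reproduced with a short dynamic program in Python. This is a sketch that follows the trace above (prefix sums P and PP, then minimising besterror over the split position j), not necessarily the exact formulation from the lecture. The names `v_optimal`, `best` and `split` are illustrative, and `best[k][i]` corresponds to besterror[i][k].

```python
def v_optimal(freqs, n_buckets):
    """Return (minimal total SSE, list of 1-based inclusive bucket bounds)."""
    n = len(freqs)
    P, PP = [0], [0]                      # prefix sums and prefix sums of squares
    for f in freqs:
        P.append(P[-1] + f)
        PP.append(PP[-1] + f * f)

    def sse(i, j):
        """SSE of one bucket over positions i..j; an empty bucket costs 0."""
        if i > j:
            return 0.0
        total = P[j] - P[i - 1]
        return PP[j] - PP[i - 1] - total * total / (j - i + 1)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(n_buckets + 1)]
    split = [[0] * (n + 1) for _ in range(n_buckets + 1)]
    for i in range(1, n + 1):
        best[1][i] = sse(1, i)            # k = 1: a single bucket [1, i]
    for k in range(2, n_buckets + 1):
        for i in range(1, n + 1):
            for j in range(1, i + 1):     # last bucket is [j + 1, i]
                candidate = best[k - 1][j] + sse(j + 1, i)
                if candidate < best[k][i]:
                    best[k][i], split[k][i] = candidate, j

    bounds, i = [], n                     # walk the split points backwards
    for k in range(n_buckets, 1, -1):
        j = split[k][i]
        bounds.append((j + 1, i))
        i = j
    bounds.append((1, i))
    return best[n_buckets][n], bounds[::-1]

error, buckets = v_optimal([9, 15, 2, 17, 15], 2)
print(round(error, 3), buckets)  # 86.667 [(1, 3), (4, 5)]
```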

b) Formally show that

\[ \mathrm{SSE}([i, j]) = \sum_{i \le k \le j} F[k]^2 - (j - i + 1) \cdot \mathrm{AVG}([i, j])^2 \]

Solution

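\begin{align}
\mathrm{SSE}([i, j]) &= \sum_{i \le k \le j} \left( F[k] - \mathrm{AVG}([i, j]) \right)^2 \\
&= \sum_{i \le k \le j} \left( F[k]^2 - 2\,\mathrm{AVG}([i, j]) \cdot F[k] + \mathrm{AVG}([i, j])^2 \right) \\
&= \sum_{i \le k \le j} F[k]^2 - \sum_{i \le k \le j} 2\,\mathrm{AVG}([i, j]) \cdot F[k] + \sum_{i \le k \le j} \mathrm{AVG}([i, j])^2 \\
&= \sum_{i \le k \le j} F[k]^2 - 2\,\mathrm{AVG}([i, j]) \cdot \sum_{i \le k \le j} F[k] + \sum_{i \le k \le j} \mathrm{AVG}([i, j])^2 \\
&= \sum_{i \le k \le j} F[k]^2 - 2\,\mathrm{AVG}([i, j]) \cdot (j - i + 1)\,\mathrm{AVG}([i, j]) + \sum_{i \le k \le j} \mathrm{AVG}([i, j])^2 \\
&= \sum_{i \le k \le j} F[k]^2 - 2\,\mathrm{AVG}([i, j]) \cdot (j - i + 1)\,\mathrm{AVG}([i, j]) + (j - i + 1)\,\mathrm{AVG}([i, j])^2 \\
&= \sum_{i \le k \le j} F[k]^2 - 2\,(j - i + 1)\,\mathrm{AVG}([i, j])^2 + (j - i + 1)\,\mathrm{AVG}([i, j])^2 \\
&= \sum_{i \le k \le j} F[k]^2 - (j - i + 1)\,\mathrm{AVG}([i, j])^2
\end{align}

Step (5) uses the definition of the average, $\sum_{i \le k \le j} F[k] = (j - i + 1) \cdot \mathrm{AVG}([i, j])$, and step (6) uses that
$\mathrm{AVG}([i, j])^2$ does not depend on $k$, so summing it over the $j - i + 1$ positions gives $(j - i + 1) \cdot \mathrm{AVG}([i, j])^2$.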
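As a numerical sanity check (not a substitute for the proof), the identity can be tested on the frequencies from part a). The Python sketch below compares the definition with the closed form for every interval [i, j]; the helper names are illustrative.

```python
freqs = [9, 15, 2, 17, 15]   # F[1..5] from the table in part a)

def sse_direct(i, j):
    """Definition: sum of squared deviations from the bucket average."""
    bucket = freqs[i - 1:j]
    avg = sum(bucket) / len(bucket)
    return sum((f - avg) ** 2 for f in bucket)

def sse_closed_form(i, j):
    """Right-hand side: sum of squares minus (j - i + 1) * AVG^2."""
    bucket = freqs[i - 1:j]
    avg = sum(bucket) / len(bucket)
    return sum(f * f for f in bucket) - len(bucket) * avg ** 2

# Largest difference between both forms over all intervals 1 <= i <= j <= 5.
max_gap = max(abs(sse_direct(i, j) - sse_closed_form(i, j))
              for i in range(1, 6) for j in range(i, 6))
print(max_gap < 1e-9)  # True
```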
