HW4 Solutions

This document contains a homework assignment with 8 questions about database systems topics including block allocation, B-trees, hashing, and grid files. The questions ask the student to calculate the number of blocks needed for different data structures, determine minimum node sizes in B-trees, probabilities in extensible hashing, and buckets to examine for range and nearest neighbor queries in a grid file. Detailed answers are provided for each question.

Uploaded by

Nikhil Gupta

CSE 562

Database Systems
HW 4
Total Marks 100

Your Name:
Your email ID:
Your UB Person ID:

1. Suppose blocks hold either three records, or ten key-pointer pairs. As a function
of n, the number of records, how many blocks do we need to hold a data file and: (a)
A dense index (b) A sparse index?

Answers:

a) For a dense index we need a key-pointer pair for each record, and so will need $n/10$
blocks. For the data, we will need $n/3$ blocks, and so the total number of blocks is
$13n/30$.

b) For the sparse index we need a key-pointer pair for each data block, and so will
need $n/30$ blocks. For the data, we will need $n/3$ blocks, and so the total number of
blocks is $11n/30$.

2. Repeat Problem 1 if blocks can hold up to 30 records or 200 key-pointer pairs, but
neither data nor index blocks are allowed to be more than 80% full.

Answers:

a) For a dense index we need a key-pointer pair for each record, and so will need
$n/(200 \cdot 0.8)$ blocks. For the data, we will need $n/(30 \cdot 0.8)$ blocks, and so the total
number of blocks is $23n/480$.

b) For the sparse index we need a key-pointer pair for each data block, and so will
need $n/(30 \cdot 0.8 \cdot 200 \cdot 0.8)$ blocks. For the data, we will need $n/(30 \cdot 0.8)$ blocks, and
so the total number of blocks is $161n/3840$.
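The block counts in Problems 1 and 2 can be checked with exact fractions. The following is a minimal sketch, not part of the original solution; the function name `blocks_per_record` and its parameters are illustrative, and like the answers above it ignores rounding up to whole blocks.

```python
from fractions import Fraction


def blocks_per_record(records_per_block, pairs_per_block, fill=Fraction(1), dense=True):
    """Return (data blocks + index blocks) as a multiple of n, the number of records.

    fill is the maximum fraction of each block that may be used.
    Rounding up to whole blocks is ignored, as in the written answers.
    """
    data = Fraction(1) / (records_per_block * fill)
    # A dense index has one entry per record; a sparse index has one per data block.
    index_entries = Fraction(1) if dense else data
    index = index_entries / (pairs_per_block * fill)
    return data + index


# Problem 1: 3 records or 10 key-pointer pairs per block, blocks may be full.
print(blocks_per_record(3, 10, dense=True))
print(blocks_per_record(3, 10, dense=False))
# Problem 2: 30 records or 200 pairs per block, at most 80% full.
print(blocks_per_record(30, 200, fill=Fraction(4, 5), dense=True))
print(blocks_per_record(30, 200, fill=Fraction(4, 5), dense=False))
```

Using `Fraction` rather than floating point keeps the coefficients of n exact, so they can be compared directly with the fractions in the answers.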

3. Suppose that blocks can hold either ten records or 99 keys and 100 pointers. Also
assume that the average B-tree node is 70% full; i.e., it will have 69 keys and 70
pointers. We can use B-trees as part of several different structures. For each structure
described below, determine
(i) the total number of blocks needed for a 1,000,000-record file, and
(ii) the average number of disk I/O’s to retrieve a record given its search key. You
may assume nothing is in memory initially, and the search key is the primary key for
the records.

a) The data file is a sequential file, sorted on the search key, with 10 records per block.
The B-tree is a dense index.
b) The same as (a), but the data file consists of records in no particular order, packed
10 to a block.
c) The same as (a), but the B-tree is a sparse index.
d) Instead of the B-tree leaves having pointers to data records, the B-tree leaves hold
the records themselves. A block can hold ten records, but on average, a leaf block is
70% full; i.e., there are seven records per leaf block.
e) The data file is a sequential file, and the B-tree is a sparse index, but each primary
block of the data file has one overflow block. On average, the primary block is full,
and the overflow block is half full. However, records are in no particular order within
a primary block and its overflow block.

answer:
1. We would need $1000000/10 = 100000$ blocks for the data + $\lceil 1000000/69 \rceil = 14493$
blocks for the leaf nodes + $\lceil 14493/70 \rceil = 208$ blocks for the next B-tree level,
$\lceil 208/70 \rceil = 3$ for the next level, and one block for the root. The total would be
114705 blocks. We would need 5 I/O’s (4 for the B-tree levels + data page).
2. Same as (a).
3. We would need $1000000/10 = 100000$ blocks for the data + $\lceil 100000/69 \rceil = 1450$
blocks for the leaf nodes + $\lceil 1450/70 \rceil = 21$ blocks for the next B-tree level, and one
block for the root. The total would be 101472 blocks. We would need 4 I/O’s (3 for the
B-tree levels + data page).
4. We would need $\lceil 1000000/7 \rceil = 142858$ blocks for the leaf nodes +
$\lceil 142858/70 \rceil = 2041$ blocks for the next level + $\lceil 2041/70 \rceil = 30$ blocks for the
next level, and one block for the root. The total would be 144930 blocks. We would need
4 I/O’s (4 for the B-tree levels).
5. We would need $\lceil 1000000/15 \rceil = 66667$ blocks for the primary data + 66667 for the
overflow blocks, $\lceil 66667/69 \rceil = 967$ blocks for the leaf nodes + $\lceil 967/70 \rceil = 14$
blocks for the next B-tree level, and one block for the root. The total would be 134316
blocks. We would need 3 I/O’s for the B-tree levels + an average of $1 + 1 \cdot \frac{1}{3}$ for
the data, for a total of $4\frac{1}{3}$.
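The level-by-level counts in Problem 3 all follow the same pattern: divide, round up, and repeat until a single root block remains. The sketch below reproduces that arithmetic; the helper names `ceil_div` and `btree_levels` are illustrative and not from the original solution.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


def btree_levels(leaf_entries, per_leaf=69, per_interior=70):
    """Block counts per B-tree level, leaves first and root last.

    leaf_entries: number of keys (or records) stored at the leaf level.
    per_leaf: average entries per leaf block; per_interior: average pointers per interior block.
    """
    levels = [ceil_div(leaf_entries, per_leaf)]
    while levels[-1] > 1:
        levels.append(ceil_div(levels[-1], per_interior))
    return levels


data_blocks = ceil_div(1_000_000, 10)

dense = btree_levels(1_000_000)                      # cases (a) and (b)
sparse = btree_levels(data_blocks)                   # case (c): one key per data block
records_in_leaves = btree_levels(1_000_000, per_leaf=7)  # case (d)
primary = ceil_div(1_000_000, 15)                    # case (e): 10 + 5 records per primary/overflow pair
with_overflow = btree_levels(primary)

print(data_blocks + sum(dense), len(dense) + 1)      # total blocks, I/O's
print(data_blocks + sum(sparse), len(sparse) + 1)
print(sum(records_in_leaves), len(records_in_leaves))
print(2 * primary + sum(with_overflow))
```

The number of I/O’s is the number of B-tree levels plus the data access, except in case (d), where the leaves already hold the records.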

4. What are the minimum numbers of keys and pointers in B-tree (i) interior nodes
and (ii) leaves, when:
a) n = 10; i.e., a block holds 10 keys and 11 pointers.
b) n = 11; i.e., a block holds 11 keys and 12 pointers.

answer:

a) 5 keys and 6 pointers for the interior nodes, 5 keys and 5 pointers in the leaf nodes.

b) 5 keys and 6 pointers for the interior nodes, 6 keys and 6 pointers in the leaf nodes.
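These minimums are consistent with the usual minimum-occupancy rule for a B-tree whose blocks hold n keys and n + 1 pointers: an interior node uses at least $\lceil (n+1)/2 \rceil$ pointers, and a leaf at least $\lfloor (n+1)/2 \rfloor$ pointers to data. A small sketch, assuming that rule and not counting the leaf's next-leaf pointer (the function name `min_occupancy` is illustrative):

```python
def min_occupancy(n):
    """Minimum (keys, pointers) for interior nodes and leaves of a B-tree with n keys per block."""
    interior_pointers = (n + 2) // 2   # ceil((n + 1) / 2)
    leaf_pointers = (n + 1) // 2       # floor((n + 1) / 2), pointers to data only
    return {
        "interior": (interior_pointers - 1, interior_pointers),
        "leaf": (leaf_pointers, leaf_pointers),
    }


print(min_occupancy(10))
print(min_occupancy(11))
```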

5. In an extensible hash table with n records per block, what is the probability that
an overflowing block will have to be handled recursively; i.e., all members of the block
will go into the same one of the two blocks created in the split?

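answer: In order for all members of the block to go to the same created block, they must
have the same bit at the (j + 1)st position. The probability that all n + 1 records have
one particular bit value there is $(1/2)^{n+1}$. Since the shared bit can be either 0 or 1,
the probability that they all land in the same block is $2 \cdot (1/2)^{n+1} = (1/2)^{n}$.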
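As a sanity check on Problem 5, one can enumerate every possible assignment of the (j + 1)st bit to the n + 1 records and count the assignments in which all bits agree. This sketch assumes the bits are independent and uniformly distributed; the function name is illustrative. Exactly two assignments qualify (all 0s and all 1s), each with probability $(1/2)^{n+1}$.

```python
from fractions import Fraction
from itertools import product


def same_block_probability(n):
    """Probability that all n + 1 records of an overflowing block share the next hash bit."""
    outcomes = list(product("01", repeat=n + 1))
    all_same = sum(1 for bits in outcomes if len(set(bits)) == 1)
    return Fraction(all_same, len(outcomes))


for n in (1, 2, 3, 4):
    print(n, same_block_probability(n))
```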

6. Suppose keys are hashed to four-bit sequences, as in our examples of extensible
and linear hashing in this section. However, also suppose that blocks can hold three
records, rather than the two-record blocks of our examples. If we start with a hash
table with two empty blocks (corresponding to 0 and 1), show the organization after
we insert records with hashed keys:
a) 0000, 0001, ..., 1111, and the method of hashing is extensible hashing.
b) 0000, 0001, ..., 1111, and the method of hashing is linear hashing with a capacity
threshold of 100%.
c) 1111, 1110, ..., 0000, and the method of hashing is extensible hashing.
d) 1111, 1110, ..., 0000, and the method of hashing is linear hashing with a capacity
threshold of 75%.

answer:

1. Each bucket is listed as its bit prefix followed by its records:
   000: 0000, 0001
   001: 0010, 0011
   010: 0100, 0101
   011: 0110, 0111
   100: 1000, 1001
   101: 1010, 1011
   110: 1100, 1101
   111: 1110, 1111
2. i = 3, n = 6, r = 16
   000: 0000, 1000
   001: 0001, 1001
   010: 0010, 0110, 1010; overflow block: 1110
   011: 0011, 0111, 1011; overflow block: 1111
   100: 0100, 1100
   101: 0101, 1101
3. Each bucket is listed as its bit prefix followed by its records:
   000: 0001, 0000
   001: 0011, 0010
   010: 0101, 0100
   011: 0111, 0110
   100: 1001, 1000
   101: 1011, 1010
   110: 1101, 1100
   111: 1111, 1110
4. i = 3, n = 8, r = 16
   000: 1000, 0000
   001: 1001, 0001
   010: 1010, 0010
   011: 1011, 0011
   100: 1100, 0100
   101: 1101, 0101
   110: 1110, 0110
   111: 1111, 0111
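The four organizations in Problem 6 can be reproduced by simulating the insertions. The sketch below is a simplified model under stated assumptions, not the textbook's implementation: the extensible hash table tracks only the buckets and their bit prefixes (no directory), and the linear hash table adds one bucket whenever r exceeds threshold × n × capacity, without modelling overflow blocks separately. All function names are illustrative.

```python
def extensible_insert_all(keys, capacity=3):
    """Insert bit-string keys; buckets are identified by their leading-bit prefix."""
    buckets = {"0": [], "1": []}
    for key in keys:
        prefix = next(p for p in buckets if key.startswith(p))
        buckets[prefix].append(key)
        # Split repeatedly while the bucket that received the key is over capacity.
        while len(buckets[prefix]) > capacity:
            records = buckets.pop(prefix)
            for bit in "01":
                buckets[prefix + bit] = [r for r in records if r.startswith(prefix + bit)]
            prefix = max((prefix + "0", prefix + "1"), key=lambda p: len(buckets[p]))
    return dict(sorted(buckets.items()))


def linear_insert_all(keys, capacity=3, threshold=1.0):
    """Insert bit-string keys using the low-order i bits; returns (i, n, r, buckets)."""
    i, n, r = 1, 2, 0
    buckets = [[], []]

    def home(key):
        m = int(key, 2) % (1 << i)                 # value of the low-order i bits
        return m if m < n else m - (1 << (i - 1))  # bucket m does not exist yet

    for key in keys:
        buckets[home(key)].append(key)
        r += 1
        if r > threshold * n * capacity:
            if n == 1 << i:
                i += 1
            buckets.append([])
            n += 1
            old = (n - 1) - (1 << (i - 1))         # bucket whose records are split
            records, buckets[old] = buckets[old], []
            for rec in records:
                buckets[home(rec)].append(rec)
    return i, n, r, buckets


ascending = [format(k, "04b") for k in range(16)]
descending = ascending[::-1]

print(extensible_insert_all(ascending))
print(linear_insert_all(ascending, threshold=1.0))
print(extensible_insert_all(descending))
print(linear_insert_all(descending, threshold=0.75))
```

In the linear-hashing output, a bucket list longer than three records corresponds to a primary block plus an overflow block.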

7. Suppose we store a relation R(x, y) in a grid file. Both attributes have a range of
values from 0 to 1000. The partitions of this grid file happen to be uniformly spaced;
for x there are partitions every 20 units, at 20, 40, 60, and so on, while for y the
partitions are every 50 units, at 50, 100, 150, and so on.
a) How many buckets do we have to examine to answer the range query
SELECT * FROM R
WHERE 310 < x AND x < 400 AND 520 < y AND y < 730
b) We wish to perform a nearest-neighbor query for the point (110,205). We begin
by searching the bucket with lower-left corner at (100,200) and upper-right corner at
(120,250), and we find that the closest point in this bucket is (115,220). What other
buckets must be searched to verify that this point is the closest?

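answer: (a) 25. The x-range overlaps 5 stripes (300-320 through 380-400) and the y-range
overlaps 5 stripes (500-550 through 700-750), so 5 × 5 = 25 buckets must be examined.
(b) The distance from (110,205) to (115,220) is about 15.8, so the other buckets that need
to be examined are those with lower-left corners at (80,200), (120,200), (80,150),
(100,150), and (120,150).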
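Both parts of Problem 7 can be verified mechanically. The following sketch counts the grid stripes that intersect an open query interval, and lists every bucket whose nearest edge is closer to the query point than the best point found so far. The function names and constants are illustrative, and buckets are identified by their lower-left corners.

```python
from math import hypot

X_STEP, Y_STEP, LIMIT = 20, 50, 1000


def stripes_overlapping(lo, hi, step):
    """Count stripes [k*step, (k+1)*step) that intersect the open interval (lo, hi)."""
    first = lo // step
    last = -(-hi // step) - 1   # ceil(hi / step) - 1
    return last - first + 1


def buckets_for_range(x_lo, x_hi, y_lo, y_hi):
    return stripes_overlapping(x_lo, x_hi, X_STEP) * stripes_overlapping(y_lo, y_hi, Y_STEP)


def buckets_to_check(query, best, home):
    """Lower-left corners of buckets (other than home) that could hold a closer point."""
    qx, qy = query
    radius = hypot(best[0] - qx, best[1] - qy)
    found = []
    for bx in range(0, LIMIT, X_STEP):
        for by in range(0, LIMIT, Y_STEP):
            # Distance from the query point to the nearest point of the bucket rectangle.
            dx = max(bx - qx, 0, qx - (bx + X_STEP))
            dy = max(by - qy, 0, qy - (by + Y_STEP))
            if (bx, by) != home and hypot(dx, dy) < radius:
                found.append((bx, by))
    return found


print(buckets_for_range(310, 400, 520, 730))
print(buckets_to_check((110, 205), (115, 220), (100, 200)))
```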

8. Suppose we have a relation R(x, y, z), where the pair of attributes x and y together
form the key. Attribute x ranges from 1 to 100, and y ranges from 1 to 1000. For each
x there are records with 100 different values of y, and for each y there are records
with 10 different values of x. Note that there are thus 10,000 records in R. We wish
to use a multiple-key index that will help us to answer queries of the form:
SELECT z
FROM R
WHERE x = C AND y = D;
where C and D are constants. Assume that blocks can hold ten key-pointer pairs, and
we wish to create dense indexes at each level, perhaps with sparse higher-level indexes
above them, so that each index starts from a single block. Also assume that initially
all index and data blocks are on disk.
a) How many disk I/O’s are necessary to answer a query of the above form if the first
index is on x?
b) How many disk I/O’s are necessary to answer a query of the above form if the
first index is on y?
c) Suppose you were allowed to buffer 11 blocks in memory at all times. Which blocks
would you choose, and would you make x or y the first index, if you wanted to minimize
the number of additional disk I/O’s needed?

answer:

1. For the dense index on x we would need 100/10 = 10 blocks, and so for the sparse
index of x we would need 1 block. Therefore, we need two disk I/O’s to get to the
y index. For each x there are 100 y values and so the dense index on y will need 10
blocks, and the sparse index will need one block. We would need two disk I/O’s to get
to the y value. The total number of I/O’s is then 2 + 2 + 1 (for the data) = 5.
2. For the dense index on y we would need 1000/10 = 100 blocks, and so for the sparse
index on y we would need two levels (10 blocks and 1 block). Therefore, we need 3 disk
I/O’s to get to the x index. For each y there are 10 values of x and so we just need a
one block dense index for x. The total disk I/O’s is then 3+1+1 (for the data) = 5.
3. We would want to buffer the top 11 blocks of the tree (root + 10 intermediate blocks
of the next level after the root). This means that picking x as the first index is better
since the whole index would be in memory and the queries where predicate x = C is
false could be answered without any additional I/O’s.
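The I/O counts in parts 1 and 2 come from counting index levels: a dense level over the distinct values, then sparse levels above it until a single block remains. A minimal sketch of that count, with illustrative function names:

```python
def index_levels(entries, pairs_per_block=10):
    """Levels needed for a dense index over `entries` keys topped by sparse levels down to one block."""
    levels = 1
    blocks = -(-entries // pairs_per_block)   # blocks in the dense level
    while blocks > 1:
        blocks = -(-blocks // pairs_per_block)
        levels += 1
    return levels


def query_ios(first_index_values, second_index_values):
    """One block read per level of each index, plus one read for the data block."""
    return index_levels(first_index_values) + index_levels(second_index_values) + 1


# x first: 100 distinct x values, then 100 y values for the chosen x.
print(query_ios(100, 100))
# y first: 1000 distinct y values, then 10 x values for the chosen y.
print(query_ios(1000, 10))
```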

9. For the structure of Problem 8, how many disk I/O’s are required to answer the
range query in which 20 < x < 35 and 200 < y < 350? Assume data is distributed
uniformly; i.e., the expected number of points will be found within any given range.

answer: To evaluate 20 ≤ x ≤ 35 we need to read the root, then 3 blocks for the range
(11-20, 21-30, 31-40). That’s 4 I/O’s. For each of the x values qualified we need to
evaluate 200 ≤ y ≤ 350, which means reading the root block and then 3 blocks for the range
(101-200, 201-300, 301-400). The total is then 4 + 1·4 + 10·4 + 10·4 = 88.
