Index On The Search Key, and Heap Files With An Unclustered Hash Index. Briefly Discuss The

This document discusses database indexing and performance. It includes: 1) Questions about sorting data, clustered vs unclustered indexes, and different file organizations for various database operations. 2) A scenario involving professors and departments where indexes would help optimize specific queries. 3) Questions about hash indexing, linear hashing, and its performance benefits over tree indexes for tables with few inserts and frequent lookups by item ID. 4) An assignment to implement a student database with indexing to improve query performance, and report on the findings.

Part 1: Concepts and Principles

Question 1
A. Briefly explain the three main alternatives for storing information in a data entry of an
index.

B. Define clustered index, and discuss the relation between the three alternatives and
clustered/unclustered indexes.

Question 2
Consider the following file organizations: sorted files, heap files with an unclustered tree
index on the search key, and heap files with an unclustered hash index. Briefly discuss the
suitability of each of these file organizations to perform the following operations: file
scans, range selections, inserts, and deletes.

Question 3
A. Briefly describe the two internal organizations for heap files (using lists versus
directory of pages).
B. Explain which organization you would choose if records are variable in length.

Question 4
Compare the ISAM and B+ tree indexes. Briefly explain how they differ in handling Search,
Insert, and Delete, and discuss when you would use ISAM and when you would use a B+
tree index.

Question 5
Does the final structure of a B+ tree depend on the order in which the entries are added to
it? Explain your answer using an illustrative example.

Question 6
Explain how extendible hashing uses a directory of buckets and discuss the global depth
of the index and local depth of a bucket.
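
As a point of reference for the terms in this question, the following is a minimal sketch of an extendible hash table, not a complete or authoritative implementation. It assumes non-negative integer keys, uses the low-order bits of the key as the directory index, and omits deletion and overflow pages. The names `Bucket`, `ExtendibleHash`, and `bucket_capacity` are hypothetical and chosen only for illustration.

```python
class Bucket:
    """A bucket (page) that remembers how many hash bits it distinguishes."""

    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.keys = []


class ExtendibleHash:
    """Directory of 2**global_depth slots, each pointing to a bucket."""

    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _slot(self, key):
        # Assumption: keys are non-negative integers; the low-order
        # global_depth bits of the key select the directory slot.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        # Keep splitting the target bucket until the key fits.
        # (Distinct keys are assumed; there are no overflow pages.)
        while True:
            bucket = self.directory[self._slot(key)]
            if len(bucket.keys) < bucket.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)

    def _split(self, bucket):
        # The directory doubles only when the full bucket already uses
        # every directory bit (local depth equals global depth).
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth, self.capacity)
        high_bit = 1 << (bucket.local_depth - 1)
        # Slots whose newly significant bit is 1 now point to the split image.
        for i, b in enumerate(self.directory):
            if b is bucket and (i & high_bit):
                self.directory[i] = image
        # Redistribute the old bucket's keys between the bucket and its image.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)


table = ExtendibleHash(bucket_capacity=2)
for k in range(16):
    table.insert(k)
print("global depth:", table.global_depth)
print("directory slots:", len(table.directory))
print("local depths:", [b.local_depth for b in table.directory])
```

In this sketch a bucket's local depth never exceeds the global depth, and several directory slots may share one bucket whenever that bucket's local depth is smaller than the global depth.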

Part 2: Design considerations for application

Question 1

Consider the following relations:

Professor (profid: integer, name: varchar, salary: integer, age: integer, depid: integer)
Department (did: integer, budget: integer, location: varchar, mgr_eid: integer)

Salaries range from $30,000 to $100,000, ages vary from 20 to 80, each department has about 20
employees on average, there are 10 locations, and budgets vary from $100,000 to $1 million.
You can assume uniform distributions of values.

For each of the following queries, what index would you choose to speed up the query? If your
database system does not consider index-only plans (i.e., data records are always retrieved even
if enough information is available in the index entry), how would your answer change? Explain
briefly.

A. Query1: Print name, age, and salary for all professors.


B. Query2: Find the dids of departments that are located in Edmonton and have a budget of
more than $150,000. 

Question 2

The CVT Company is a leader in the manufacture of work clothes. You are hired as database
administrator for the company and your IT supervisor asked you to solve a retrieval speed
problem they have been having with a large file of item records. Your supervisor mentioned that
they have sorted the file but the problem did not improve, so they need to create a B+ tree index
to solve the problem. Your supervisor outlined the way to do it: “The best way to accomplish this
task is to scan the file, record by record, inserting each one using the B+ tree insertion
procedure.” Being a fresh graduate, you noticed that since the file is already sorted there is a
better way to do it.

a. What performance and storage utilization problems are there with your
supervisor’s approach? 
b. Explain how the bulk-loading algorithm provides a better alternative than the
proposed scheme. 

Question 3

Your team in charge of database administration was discussing different alternatives for indexing
your organization’s databases. Some tables in one database have very few insertions but they are
used intensively by different services to check for information about items using the item_ID
number. While many of your colleagues proposed using a tree index, you argued for a Hash
index for these tables because it provides an average-case search cost of only slightly more than
one disk I/O. The team leader agrees to adopt your solution but has asked you to write a short
explanation for two questions:

a. How does Linear Hashing provide an average-case search cost of only slightly
more than one disk I/O, given that overflow buckets are part of its data
structure? (6 marks)
b. If a Linear Hashing index using Alternative (1) for data entries contains 10,000
records, with 10 records per page and an average storage utilization of 80 percent,
what is the worst-case cost for an equality search? Under what conditions would
this cost be the actual search cost? (6 marks)

Part 3: Implementation Case

Consider the following database schema with the following relations:

Student (SID, Name, Address, Telephone, Age)


Course (CourseNo, Title, Department, NumberOfCredits, CourseFees)
Registration (SID, CourseNo, startDate, CompleteDate, Grade)

Consider the following queries:

o List the student numbers and names of students who received a grade greater than or
equal to 70% in the course “COMP418,” sorted by age ascending.
o List the course numbers and titles of courses that have more than 10 students
getting a grade lower than 50. (Use GROUP BY CourseNo and COUNT(SID).)
o List the course numbers and titles of courses whose course fees are between 400
and 600 dollars.
o List all courses in the database.
o Update all the course fees by adding 6 dollars to each course.

Your task is to implement this database using PostgreSQL or another DBMS from the following list
(Oracle, MySQL, DB2, SQL Server), and then compare the performance of the system before creating
the indexes and after creating the indexes. Make sure that you create indexes that support the
queries.

o You should use test data to identify performance issues: the more data, the better.
Make sure there is sufficient test data in your system that each of the queries above
returns at least a dozen rows of data. Unless there
is a fair amount of test data, you will not be able to see much difference in query
execution times.
o Decide on the type of indexing that would be most appropriate. This will require
you to read about the different indexing options in your DBMS. The PostgreSQL 9
manual on the subject is available
at http://www.postgresql.org/docs/9.1/interactive/indexes.html. Most DBMSs,
including PostgreSQL, provide an 'ANALYZE,' 'EXPLAIN' or similar command
that can be used to help tune your database and make recommendations on
indexing that you may find very useful.
o Check the performance of the queries before adding indexes. If using PostgreSQL,
you will likely find the EXPLAIN command useful in accomplishing this. There is
a visual EXPLAIN tool available as part of some versions of PgAdmin that you
may want to try. Information on reading and interpreting the results as well as on
how to use the tool is available at
http://www.postgresonline.com/journal/index.php?/archives/27-Reading-PgAdmin-Graphical-Explain-Plans.html
o Add your indexes. You will probably be using the CREATE INDEX command for
this, but do feel free to use other DBMS tools if they are available. For a better
analysis you may want to add one index at a time and check performance changes
between each addition to discover the cumulative effects of each index.
o Check the performance again, and record the results. If you have the time and
inclination, it would be informative to experiment more with your DBMS to
discover what differences different kinds of indexes make to different queries. If
you have enough test data, you may find considerable differences in performance
as a result.
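
The before-and-after workflow described in the steps above might look like the following minimal sketch, which covers only the course-fee range query. It uses Python's built-in sqlite3 module purely so that the example is self-contained; SQLite is not one of the DBMSs listed for this assignment, so the statements would need to be adapted to your chosen system (for example, EXPLAIN ANALYZE in PostgreSQL instead of EXPLAIN QUERY PLAN). The column types, the synthetic test data, and the index name `idx_course_fees` are assumptions made for illustration.

```python
import random
import sqlite3
import time

# The course-fee range query from the list of queries above.
QUERY = "SELECT CourseNo, Title FROM Course WHERE CourseFees BETWEEN 400 AND 600"


def build_db(n_courses=20000, seed=0):
    """Create an in-memory Course table filled with synthetic test data."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # Column types are assumptions; the assignment lists only column names.
    conn.execute(
        "CREATE TABLE Course ("
        " CourseNo TEXT PRIMARY KEY, Title TEXT, Department TEXT,"
        " NumberOfCredits INTEGER, CourseFees INTEGER)"
    )
    rows = [
        (f"C{i:06d}", f"Course {i}", f"Dept{i % 50}",
         rng.randint(1, 6), rng.randint(100, 5000))
        for i in range(n_courses)
    ]
    conn.executemany("INSERT INTO Course VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def plan(conn):
    """Return the planner's description of how QUERY would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text


def timed(conn, repeats=20):
    """Return the average wall-clock time per run and the last result set."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = conn.execute(QUERY).fetchall()
    return (time.perf_counter() - start) / repeats, result


conn = build_db()

# Measure before creating any index on CourseFees.
plan_before = plan(conn)
t_before, rows_before = timed(conn)

# Hypothetical index name; the indexed column matches the range predicate.
conn.execute("CREATE INDEX idx_course_fees ON Course (CourseFees)")

# Measure again with the index in place.
plan_after = plan(conn)
t_after, rows_after = timed(conn)

print("before:", plan_before, f"{t_before * 1000:.3f} ms")
print("after: ", plan_after, f"{t_after * 1000:.3f} ms")
print("rows returned:", len(rows_after))
```

Recording both the plan and the timing for each query, one index at a time, gives you the raw material for the comparison table in your report. Actual timings depend on the DBMS, the hardware, and the volume of test data, so treat the numbers from a sketch like this only as an indication.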

Write a short report (1-2 pages maximum) that summarizes your findings during the experiment.
The report should include:

o A description of your implementation of the database. Include the SQL code for
implementing the tables. How many records did you enter in each table?
o A description of the execution time of the queries before creating the indexes.
o A description of the created indexes, and a justification of why you think those
indexes would improve the system performance for the specified queries.
o A table comparing both execution times before and after indexing for each query.
