Database Systems - BIT - University of Colombo - Year 3 (Lecture Note 5)
Database Systems II
BIT – 3rd Year
Semester 6
IT6405: Database Systems II – Topic 2
Learning Outcome
After successful completion of this course,
students will be able to:
– create stored procedures and triggers
– describe data storage & access and
manipulate query processing techniques
– demonstrate transaction processing
techniques of database systems
– determine designs for distributed databases
© UCSC - 2016 2
Outline of Syllabus
1. Stored Procedures and Triggers
2. Data Storage and Querying
3. Transaction Management
4. Distributed Databases
Duration: 15 hours
Blocking
• Blocking: refers to storing a number of records in
one block on the disk.
• Blocking factor (bfr) refers to the number of
records per block.
• There may be empty space in a block if an integral
number of records do not fit in one block.
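The blocking factor can be worked out with simple integer arithmetic. The sketch below assumes fixed-length records that do not span block boundaries (an assumption, since the slide does not say); the function names and the sample sizes (1024-byte blocks, 100-byte records) are illustrative only.

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """bfr = floor(B / R) for fixed-length, unspanned records."""
    if record_size <= 0 or record_size > block_size:
        raise ValueError("record must be positive in size and fit in one block")
    return block_size // record_size


def unused_bytes_per_block(block_size: int, record_size: int) -> int:
    """Space left over when an integral number of records does not fill the block."""
    return block_size - blocking_factor(block_size, record_size) * record_size


def blocks_needed(num_records: int, block_size: int, record_size: int) -> int:
    """Number of blocks b = ceil(r / bfr) required to hold r records."""
    bfr = blocking_factor(block_size, record_size)
    return -(-num_records // bfr)  # ceiling division on integers


if __name__ == "__main__":
    B, R, r = 1024, 100, 30000  # illustrative sizes, not from the slides
    print(blocking_factor(B, R), unused_bytes_per_block(B, R), blocks_needed(r, B, R))
```

With these sample sizes each block holds 10 records and leaves 24 bytes unused, so 30000 records occupy 3000 blocks.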
Files of Records
• A file is a sequence of records, where each record is a
collection of data values (or data items).
Operations on Files
• OPEN: Readies the file for access, and associates
a pointer that will refer to a current file record at
each point in time.
• FIND: Searches for the first file record that satisfies
a certain condition, and makes it the current file
record.
• FINDNEXT: Searches for the next file record (from
the current record) that satisfies a certain condition,
and makes it the current file record.
• READ: Reads the current file record into a
program variable.
• INSERT: Inserts a new record into the file, and makes
it the current file record.
• DELETE: Removes the current file record from the
file, usually by marking the record to indicate that it
is no longer valid.
• MODIFY: Changes the values of some fields of the
current file record.
• CLOSE: Terminates access to the file.
• REORGANIZE: Reorganizes the file records. For
example, the records marked deleted are physically
removed from the file or a new organization of the
file records is created.
• READ_ORDERED: Reads the file blocks in order of
a specific field of the file.
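The record-at-a-time operations listed above can be pictured with a small in-memory model. This is only a sketch: `RecordFile` and its method names are hypothetical stand-ins that mirror the slide's operation names, not the API of any real DBMS, and records are plain dictionaries.

```python
class RecordFile:
    """In-memory stand-in for a file of records (hypothetical, for illustration)."""

    def __init__(self, records=None):
        self.records = [dict(r) for r in (records or [])]
        self.deleted = [False] * len(self.records)  # deletion markers
        self.current = None                         # index of the current record
        self.is_open = False

    def open(self):
        """OPEN: ready the file and reset the current-record pointer."""
        self.is_open = True
        self.current = None

    def _scan(self, start, condition):
        for i in range(start, len(self.records)):
            if not self.deleted[i] and condition(self.records[i]):
                self.current = i
                return True
        return False  # the current record is left unchanged on failure

    def find(self, condition):
        """FIND: first record satisfying the condition becomes current."""
        return self._scan(0, condition)

    def find_next(self, condition):
        """FINDNEXT: next matching record after the current one."""
        start = 0 if self.current is None else self.current + 1
        return self._scan(start, condition)

    def read(self):
        """READ: copy the current record into a program variable."""
        return dict(self.records[self.current])

    def insert(self, record):
        """INSERT: add a record and make it current."""
        self.records.append(dict(record))
        self.deleted.append(False)
        self.current = len(self.records) - 1

    def delete(self):
        """DELETE: mark the current record as no longer valid."""
        self.deleted[self.current] = True

    def modify(self, **changes):
        """MODIFY: change some field values of the current record."""
        self.records[self.current].update(changes)

    def reorganize(self):
        """REORGANIZE: physically remove records marked as deleted."""
        self.records = [r for r, d in zip(self.records, self.deleted) if not d]
        self.deleted = [False] * len(self.records)
        self.current = None

    def close(self):
        """CLOSE: terminate access to the file."""
        self.is_open = False
        self.current = None
```

A typical use is `f.open()`, then `f.find(cond)` followed by repeated `f.find_next(cond)` calls, reading each match with `f.read()`.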
Unordered Files
• Also called a heap or a pile file.
• New records are inserted at the end of the file.
• To search for a record, a linear search through
the file records is necessary. For a file of b blocks,
this reads about b/2 blocks on average when the
record exists (and all b blocks when it does not),
and is hence quite expensive.
• Record insertion is quite efficient.
• To delete a record, the record is marked as
deleted. Space is reclaimed during periodic
reorganization.
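The cost of searching a heap file can be checked by counting block reads. The sketch below is a simulation under simplifying assumptions: blocks are Python lists, records are bare integer keys, and every key is equally likely to be searched for. The function name is illustrative.

```python
def linear_search_blocks(blocks, key):
    """Scan blocks in file order; return (found, number_of_blocks_read)."""
    for reads, block in enumerate(blocks, start=1):
        if key in block:  # the whole block is in memory once read
            return True, reads
    return False, len(blocks)  # an unsuccessful search reads every block


if __name__ == "__main__":
    # 100 records stored 10 per block, in insertion order (b = 10 blocks)
    blocks = [list(range(i * 10, (i + 1) * 10)) for i in range(10)]
    total = sum(linear_search_blocks(blocks, k)[1] for k in range(100))
    print(total / 100)                        # (b + 1) / 2 = 5.5 for b = 10
    print(linear_search_blocks(blocks, 999))  # missing key: all 10 blocks read
```

The average for successful searches is (b + 1) / 2, which is where the "half the file blocks" rule of thumb comes from.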
Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an
ordering field.
• Insertion is expensive: records must be inserted in the
correct order.
• A binary search can be used to search for a record on its
ordering field value. For a file of b blocks, this reads
about log2(b) blocks on average, an improvement
over linear search.
• Reading the records in order of the ordering field is
quite efficient.
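Binary search on an ordered file works on whole blocks rather than on single records. The following sketch assumes each block is a non-empty sorted list of integer keys and that blocks are stored in key order; the function name and the block-read counter are illustrative additions.

```python
def binary_search_blocks(blocks, key):
    """Binary search over the blocks of an ordered file.

    Returns (found, number_of_blocks_read).
    """
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]  # one block access
        reads += 1
        if key < block[0]:
            hi = mid - 1     # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1     # key can only be in a later block
        else:
            return key in block, reads
    return False, reads


if __name__ == "__main__":
    # b = 64 blocks of 4 records each, sorted on the ordering field
    blocks = [list(range(i * 4, (i + 1) * 4)) for i in range(64)]
    worst = max(binary_search_blocks(blocks, k)[1] for k in range(256))
    print(worst)  # never more than floor(log2(64)) + 1 block reads
```

Compared with the linear scan of a heap file, the number of block reads grows with log2(b) instead of with b.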
Hashed Files
• The file blocks are divided into M equal-sized
buckets, numbered bucket 0, bucket 1, ..., bucket M-1.
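• A collision occurs when a new record hashes to a
bucket that is already full. There are numerous
methods for collision resolution, including open
addressing, chaining, and multiple hashing.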
• The hash function h should distribute the records
uniformly among the buckets; otherwise, search
time will be increased because many overflow
records will exist.
• Main disadvantage of static hashing:
the fixed number of buckets M is a problem if the
number of records in the file grows or shrinks.
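A static hashed file with overflow handling can be sketched in a few lines. Several details here are assumptions made for illustration: the hash function h(K) = K mod M, integer keys, a fixed bucket capacity, and one overflow chain per bucket. `StaticHashFile` is a hypothetical name.

```python
class StaticHashFile:
    """Static hashing with M fixed buckets and one overflow chain per bucket."""

    def __init__(self, num_buckets: int, bucket_capacity: int):
        self.M = num_buckets                  # fixed for the life of the file
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]

    def _h(self, key: int) -> int:
        return key % self.M                   # assumed hash function: K mod M

    def insert(self, key: int, record) -> None:
        i = self._h(key)
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, record))
        else:
            self.overflow[i].append((key, record))  # bucket full: overflow record

    def search(self, key: int):
        """Return (record, used_overflow); record is None if the key is absent."""
        i = self._h(key)
        for k, rec in self.buckets[i]:
            if k == key:
                return rec, False
        for k, rec in self.overflow[i]:       # extra work caused by overflow
            if k == key:
                return rec, True
        return None, bool(self.overflow[i])


if __name__ == "__main__":
    f = StaticHashFile(num_buckets=4, bucket_capacity=2)
    for key in (0, 4, 8, 1):                  # 0, 4 and 8 all hash to bucket 0
        f.insert(key, f"r{key}")
    print(f.search(1))                        # found in its home bucket
    print(f.search(8))                        # found only in the overflow chain
```

Because M never changes, a growing file pushes more and more records into the overflow chains, which is the disadvantage noted above.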
Indexes
• Index: A data structure that allows particular records in a
file to be located more quickly, much like the index
in a book.
• Data file: a file containing the logical
records
• Index file: a file containing the index
records
• Indexing field: the field used to order the
index records in the index file
Dense Index
Sparse Index
Primary Index
• Defined on an ordered data file.
• Includes one index entry for each block in the data file;
the index entry has the key field value for the first
record in the block, which is called the block anchor.
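A primary index built from block anchors can be sketched as follows. The layout is an assumption for illustration: each block is a non-empty list of `(key, record)` pairs sorted on a unique key, and the index is a list of `(anchor_key, block_number)` entries. Function names are hypothetical; `bisect_right` is from Python's standard library.

```python
from bisect import bisect_right


def build_primary_index(blocks):
    """One index entry per block: (key of the first record, block number)."""
    return [(block[0][0], block_no) for block_no, block in enumerate(blocks)]


def lookup(index, blocks, key):
    """Binary-search the index, then read the single candidate block."""
    anchors = [anchor for anchor, _ in index]
    pos = bisect_right(anchors, key) - 1  # last anchor <= key
    if pos < 0:
        return None                       # key precedes the first anchor
    _, block_no = index[pos]
    for k, record in blocks[block_no]:
        if k == key:
            return record
    return None


if __name__ == "__main__":
    blocks = [
        [(1, "a"), (3, "b")],
        [(5, "c"), (7, "d")],
        [(9, "e")],
    ]
    index = build_primary_index(blocks)
    print(index)                     # one entry per block, so the index is sparse
    print(lookup(index, blocks, 7))
```

The index has one entry per block rather than one per record, which keeps it much smaller than the data file.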
Clustering Index
• Defined on an ordered data file whose ordering
field is a non-key field, so several records may
share the same ordering value.
Secondary Index
• A secondary index provides a secondary means of
accessing a file for which some primary access already
exists.
• There can be many secondary indexes (and
hence, indexing fields) for the same file.
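Several secondary indexes on one file can be sketched as separate lookup structures over the same records. In this illustration (all names and the sample data are hypothetical) the data file is a list of dictionaries ordered on `id`, and each secondary index maps a value of a non-ordering field to the positions of the matching records.

```python
from collections import defaultdict


def build_secondary_index(records, field):
    """Dense index on a non-ordering field: value -> list of record positions."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return dict(sorted(index.items()))  # index entries kept in indexing-field order


if __name__ == "__main__":
    # Data file ordered on id; name and dept are non-ordering fields.
    employees = [
        {"id": 1, "name": "Perera", "dept": "IT"},
        {"id": 2, "name": "Silva", "dept": "HR"},
        {"id": 3, "name": "Fernando", "dept": "IT"},
    ]
    by_dept = build_secondary_index(employees, "dept")
    by_name = build_secondary_index(employees, "name")
    print([employees[p]["name"] for p in by_dept["IT"]])
    print(employees[by_name["Silva"][0]]["id"])
```

Each index is built and searched independently, which is why a file can carry as many secondary indexes as there are fields worth searching on.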
Multi-Level Indexes
• Since a single-level index is an ordered file, we
can create a primary index to the index itself.
• In this case, the original index file is called the first-level
index and the index to the index is called the
second-level index.
• We can repeat the process, creating a third, fourth,
..., top level until all entries of the top level fit in one
disk block.
• A multi-level index can be created for any type of first
level index (primary, secondary, clustering) as long as
the first-level index consists of more than one disk
block.
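The number of levels follows from repeatedly dividing by the number of index entries that fit in one block. The sketch below assumes every index block holds the same number of entries (called `fan_out` here, a name introduced for illustration); the sample figures of 30000 first-level entries and 68 entries per block are illustrative as well.

```python
import math


def index_levels(first_level_entries: int, fan_out: int):
    """Return the number of blocks at each level, first level first.

    Each higher level has one entry per block of the level below, and the
    process stops when a level fits in a single block.
    """
    if first_level_entries < 1 or fan_out < 2:
        raise ValueError("need at least one entry and a fan-out of 2 or more")
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks  # the next level indexes the blocks of this level


if __name__ == "__main__":
    print(index_levels(30000, 68))  # [442, 7, 1]: three levels
```

A search then reads one block per level plus one data block, so the three-level index in this example needs four block accesses.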
• Such a multi-level index is a form of search tree.
B+ tree
The structure of the internal nodes of a B+ tree
of order p is as follows:
• Each internal node is of the form
<P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>
where q ≤ p. Each Pi is a tree pointer.
• Within each node K1 < K2 < ... < Kq-1
• Each node has at most p tree pointers.
• Each node with q tree pointers, q ≤ p, has q-1
search key field values.
The structure of the leaf nodes of a B+ tree of
order p is as follows:
• Each leaf node is of the form
<<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>
where q ≤ p. Each Pri is a data pointer. Pnext
points to the next leaf node of the B+ tree.
• Within each node K1 < K2 < ... < Kq-1
• All leaf nodes are at the same level.
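The node layouts above translate directly into a search routine. This is a minimal sketch over a hand-built tree, not a full B+ tree (no insertion, deletion, or node splitting). It assumes the common textbook convention that a search value X with K(i-1) < X ≤ Ki follows tree pointer Pi, which the slides do not state; the class and function names are illustrative.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]                 # K1 < K2 < ... < Kq-1
    data_pointers: List[str]        # Pr1 ... Prq-1, one per key
    next: Optional["Leaf"] = None   # Pnext


@dataclass
class Internal:
    keys: List[int]                           # K1 < K2 < ... < Kq-1
    children: List[Union["Internal", Leaf]]   # P1 ... Pq, one more than keys


def search(node, key):
    """Descend from the root to a leaf, then look for the key in that leaf."""
    while isinstance(node, Internal):
        # Assumed convention: X <= K1 follows P1, K(i-1) < X <= Ki follows Pi.
        node = node.children[bisect_left(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.data_pointers[i]
    return None


def scan_leaves(leaf):
    """Follow Pnext pointers to read all entries in key order."""
    while leaf is not None:
        yield from zip(leaf.keys, leaf.data_pointers)
        leaf = leaf.next


if __name__ == "__main__":
    l3 = Leaf([8, 9], ["rec8", "rec9"])
    l2 = Leaf([5, 7], ["rec5", "rec7"], next=l3)
    l1 = Leaf([1, 3], ["rec1", "rec3"], next=l2)
    root = Internal(keys=[3, 7], children=[l1, l2, l3])
    print(search(root, 7), search(root, 4))
    print([k for k, _ in scan_leaves(l1)])
```

Because every leaf is at the same level, each lookup touches the same number of nodes, and the Pnext chain allows an ordered scan without revisiting internal nodes.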