Database Systems - BIT - University of Colombo - Year 3 (Lecture Note 5)

The document discusses different file organization techniques for databases including unordered files, ordered files, hashed files, and the average access times for each. It also covers topics like blocking, disk storage devices, and basic operations on files.

IT6405: Database Systems II
BIT – 3rd Year
Semester 6
IT6405: Database Systems II – Topic 2

Learning Outcome
After successful completion of this course, students will be able to:
– create stored procedures and triggers
– describe data storage and access, and manipulate query processing techniques
– demonstrate transaction processing techniques of database systems
– determine designs for distributed databases

© UCSC - 2016 2

Outline of Syllabus
1. Stored Procedures and Triggers
2. Data Storage and Querying
3. Transaction Management
4. Distributed Databases


References
1. Elmasri, Navathe, Somayajulu, and Gupta, “Fundamentals of Database Systems”, 5th Edition, Pearson Education (2008). Note: 6th Edition released in 2011.
2. Silberschatz A., Korth H.F., and Sudarshan S., “Database System Concepts”, 5th Edition, McGraw Hill (2006). Note: 6th Edition released in 2010.
3. Ramakrishnan and Gehrke, “Database Management Systems”, 3rd Edition, McGraw Hill.


IT6405: Database Systems II

Data Storage, Indexing

Duration: 15 hours


File Organization and Storage Structures
Primary Storage (Main Memory)
• Fast
• Volatile
• Expensive

Secondary Storage (files on disks or tapes)
• Non-Volatile


Disk Storage Devices


• Preferred secondary storage device for high
storage capacity and low cost.
• Data stored as magnetized areas on magnetic
disk surfaces.
• A disk pack contains several magnetic disks
connected to a rotating spindle.
• Disks are divided into concentric circular
tracks on each disk surface. Track capacities
vary typically from 4 to 50 Kbytes.

• Since a track usually contains a large amount of
information, it is divided into smaller blocks or
sectors.

• The block size B is fixed for each system.

• Typical block sizes range from B=512 bytes to
B=4096 bytes. Whole blocks are transferred
between disk and main memory for processing.

• A read-write head moves to the track that contains
the block to be transferred.
• Disk rotation moves the block under the read-write
head for reading or writing.
• Reading or writing a disk block is time consuming
because of the seek time s and rotational delay
(latency) rd.
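
As a rough illustration of why a block access is slow, the sketch below estimates the average rotational delay rd as half of one revolution and adds it to a seek time s. The drive speed and seek time are hypothetical values chosen for the example, and the block transfer time is ignored.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay rd: half of one full revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def avg_block_access_ms(seek_ms: float, rpm: float) -> float:
    """Seek time s plus average rotational delay rd (transfer time ignored)."""
    return seek_ms + avg_rotational_delay_ms(rpm)


# Hypothetical drive: 7200 rpm with an assumed average seek time of 8 ms.
rd = avg_rotational_delay_ms(7200)
total = avg_block_access_ms(8.0, 7200)
print(f"rd = {rd:.2f} ms, s + rd = {total:.2f} ms")
```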


Blocking
• Blocking refers to storing a number of records in
one block on the disk.
• The blocking factor (bfr) is the number of
records per block.
• There may be unused space in a block if the block
size is not an exact multiple of the record size.
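
The blocking factor can be worked out with simple integer arithmetic. The sketch below assumes fixed-length records that are not allowed to span blocks; the block size, record size, and record count are hypothetical values.

```python
import math


def blocking_factor(block_size: int, record_size: int) -> int:
    """bfr = floor(B / R) for fixed-length records that do not span blocks."""
    return block_size // record_size


def unused_bytes_per_block(block_size: int, record_size: int) -> int:
    """Space left over in each block when B is not a multiple of R."""
    return block_size - blocking_factor(block_size, record_size) * record_size


def blocks_needed(num_records: int, bfr: int) -> int:
    """Number of blocks b needed to hold all records."""
    return math.ceil(num_records / bfr)


# Hypothetical file: B = 512 bytes, R = 100 bytes, r = 30,000 records.
bfr = blocking_factor(512, 100)
unused = unused_bytes_per_block(512, 100)
b = blocks_needed(30_000, bfr)
print(bfr, unused, b)
```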


Files of Records
• A file is a sequence of records, where each record is a
collection of data values (or data items).

• A file descriptor (or file header) includes information
that describes the file, such as the field names and
their data types, and the addresses of the file blocks
on disk.

• Records are stored on disk blocks. The blocking
factor bfr for a file is the (average) number of file
records stored in a disk block.


Operations on Files
• OPEN: Readies the file for access, and associates
a pointer that will refer to a current file record at
each point in time.
• FIND: Searches for the first file record that satisfies
a certain condition, and makes it the current file
record.
• FINDNEXT: Searches for the next file record (from
the current record) that satisfies a certain condition,
and makes it the current file record.
• READ: Reads the current file record into a
program variable.
• INSERT: Inserts a new record into the file, and makes
it the current file record.

• DELETE: Removes the current file record from the
file, usually by marking the record to indicate that it
is no longer valid.
• MODIFY: Changes the values of some fields of the
current file record.
• CLOSE: Terminates access to the file.
• REORGANIZE: Reorganizes the file records. For
example, the records marked deleted are physically
removed from the file or a new organization of the
file records is created.
• READ_ORDERED: Reads the file blocks in order of
a specific field of the file.


Unordered Files
• Also called a heap or a pile file.
• New records are inserted at the end of the file.
• To search for a record, a linear search through
the file records is necessary. This requires
reading and searching half the file blocks on
average (b/2 for a file of b blocks), and is hence
quite expensive.
• Record insertion is quite efficient.
• To delete a record, the record is marked as
deleted. Space is reclaimed during periodic
reorganization.
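
The behaviour of a heap file can be sketched in a few lines. This is a toy in-memory model, not a storage engine: the class name HeapFile, its methods, and the sample records are all hypothetical, and a "block" is simply a Python list holding at most bfr records.

```python
from dataclasses import dataclass, field


@dataclass
class HeapFile:
    """Toy unordered file: a list of blocks, each holding up to bfr slots."""
    bfr: int
    blocks: list = field(default_factory=list)

    def insert(self, record: dict) -> None:
        # New records always go at the end of the file.
        if not self.blocks or len(self.blocks[-1]) >= self.bfr:
            self.blocks.append([])
        self.blocks[-1].append({"data": record, "deleted": False})

    def find(self, key: str, value):
        """Linear search; returns (record or None, number of blocks read)."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for slot in block:
                if not slot["deleted"] and slot["data"].get(key) == value:
                    return slot["data"], blocks_read
        return None, len(self.blocks)

    def delete(self, key: str, value) -> bool:
        """Mark the first matching record as deleted; space is not reclaimed."""
        for block in self.blocks:
            for slot in block:
                if not slot["deleted"] and slot["data"].get(key) == value:
                    slot["deleted"] = True
                    return True
        return False

    def reorganize(self) -> None:
        """Rewrite the file without the records marked as deleted."""
        live = [s["data"] for blk in self.blocks for s in blk if not s["deleted"]]
        self.blocks = []
        for record in live:
            self.insert(record)


heap = HeapFile(bfr=2)
for student_id in [30, 10, 20, 40, 50]:
    heap.insert({"id": student_id})

record, blocks_read = heap.find("id", 40)   # found in the second block
heap.delete("id", 10)
blocks_before = len(heap.blocks)            # deletion alone frees nothing
heap.reorganize()
blocks_after = len(heap.blocks)
```

Insertion touches only the last block, whereas a search may have to read every block, which is the trade-off the bullets above describe.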


Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an
ordering field.
• Insertion is expensive: records must be inserted in the
correct order.
• A binary search can be used to search for a record on its
ordering field value. This requires reading and searching
about log2(b) blocks of a file with b blocks on average, an
improvement over the b/2 blocks of a linear search.
• Reading the records in order of the ordering field is
quite efficient.
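
A binary search over the blocks of an ordered file might look like the sketch below. It assumes every block is a non-empty sorted list of ordering-field values and counts one block access per probe; the function name and the sample file are hypothetical.

```python
import math


def binary_search_blocks(blocks: list, key: int):
    """Binary search over the blocks of an ordered file.

    Returns (found, blocks_read). Each probe reads exactly one block.
    """
    lo, hi = 0, len(blocks) - 1
    blocks_read = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # one block access
        blocks_read += 1
        if key < block[0]:
            hi = mid - 1             # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1             # key can only be in a later block
        else:
            return key in block, blocks_read
    return False, blocks_read


# Hypothetical ordered file: 1024 blocks of 4 keys each (keys 0..4095).
blocks = [list(range(i, i + 4)) for i in range(0, 4096, 4)]
found, reads = binary_search_blocks(blocks, 2718)
max_reads = math.floor(math.log2(len(blocks))) + 1
print(found, reads, "block reads; a linear search averages", len(blocks) // 2)
```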


Average Access Times


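The following table shows the average number of block accesses needed to
reach a specific record in a file of b blocks, for each type of file
organization:

Type of organization    Search method     Average blocks accessed
Heap (unordered)        Linear search     b/2
Ordered                 Binary search     log2 b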


Hashed Files
• The file blocks are divided into M equal-sized
buckets, numbered bucket 0, bucket 1, ..., bucket M-1.

• One of the file fields is designated to be the hash key
of the file.

• The record with hash key value K is stored in bucket
i, where i = h(K), and h is the hashing function.

• Search is very efficient on the hash key.

• Collisions occur when a new record hashes to a
bucket that is already full. An overflow file is kept for
storing such records.
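
The bucket scheme above can be sketched as follows. The class StaticHashFile is hypothetical, and the hash function h(K) = K mod M is an assumption made for the example, since the slides do not fix a particular h.

```python
class StaticHashFile:
    """Toy static hash file: M fixed-capacity buckets plus one overflow area."""

    def __init__(self, num_buckets: int, bucket_capacity: int):
        self.M = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = []  # records whose home bucket was already full

    def h(self, key: int) -> int:
        # Assumed hash function: K mod M.
        return key % self.M

    def insert(self, key: int, record) -> None:
        bucket = self.buckets[self.h(key)]
        if len(bucket) < self.capacity:
            bucket.append((key, record))
        else:
            self.overflow.append((key, record))  # collision on a full bucket

    def search(self, key: int):
        # Look in the home bucket first, then fall back to the overflow area.
        for k, record in self.buckets[self.h(key)]:
            if k == key:
                return record
        for k, record in self.overflow:
            if k == key:
                return record
        return None


hash_file = StaticHashFile(num_buckets=4, bucket_capacity=2)
for k in [3, 7, 11, 8, 5]:
    hash_file.insert(k, f"rec{k}")
```

Keys 3, 7, and 11 all hash to bucket 3, so the third of them ends up in the overflow area and costs an extra lookup step.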

• There are numerous methods for collision
resolution, including the following:

– Open addressing: Proceeding from the occupied
position specified by the hash address, the program
checks the subsequent positions in order until an
unused (empty) position is found.

– Chaining: A collision is resolved by placing the new
record in an unused overflow location and setting
the pointer of the occupied hash address location to
the address of that overflow location.

– Multiple hashing: The program applies a second
hash function if the first results in a collision.
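
Open addressing is the easiest of the three methods to show in code. The sketch below uses linear probing with wrap-around at the end of the table; the wrap-around and the hash function K mod m are assumptions made for the example, and the function name is hypothetical.

```python
def insert_open_addressing(table: list, key: int) -> int:
    """Insert key using linear probing; returns the slot index that was used."""
    m = len(table)
    start = key % m                    # assumed hash function: K mod m
    for step in range(m):
        pos = (start + step) % m       # check subsequent positions in order
        if table[pos] is None:         # unused (empty) position found
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")


table = [None] * 7
slots = [insert_open_addressing(table, k) for k in (10, 17, 24, 5)]
print(slots)
print(table)
```

Keys 10, 17, and 24 share hash address 3, so they occupy slots 3, 4, and 5; key 5 then finds its own address taken and moves on to slot 6.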

• The hash function h should distribute the records
uniformly among the buckets; otherwise, search
time will be increased because many overflow
records will exist.
• Main disadvantage of static hashing:
the fixed number of buckets M is a problem if the
number of records in the file grows or shrinks.


Hashed Files Limitation


• Inappropriate for some retrievals:
– Based on pattern matching,
e.g. find all students with ID like 98xxxxxx.
– Involving ranges of values,
e.g. find all students from 50100000 to 50199999.
– Based on a field other than the hash field.


Indexes
• Index: A data structure that allows particular records in a
file to be located more quickly, much like the index in a book.

• An index can be sparse or dense:

– Sparse: index records for only some of the search key
values (e.g. Staff IDs: CS001, EE001, MA001).
Applicable to ordered data files only.
– Dense: an index record for every search key value (e.g.
Staff IDs: CS001, CS002, ..., CS089, EE001,
EE002, ...).
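
The difference between the two kinds of index can be sketched on a small ordered file. The staff IDs reuse the style of the examples above, but the block contents and the helper sparse_lookup are hypothetical.

```python
import bisect

# Hypothetical ordered data file: blocks of staff IDs, sorted on the search key.
data_blocks = [
    ["CS001", "CS002", "CS089"],
    ["EE001", "EE002", "EE010"],
    ["MA001", "MA005", "MA020"],
]

# Dense index: one entry per search key value -> (block number, slot).
dense_index = {
    key: (block_no, slot)
    for block_no, block in enumerate(data_blocks)
    for slot, key in enumerate(block)
}

# Sparse index: one entry per block, holding only the first key of that block.
sparse_keys = [block[0] for block in data_blocks]


def sparse_lookup(key: str):
    """Pick the block whose first key is the largest one <= key, then scan it."""
    block_no = bisect.bisect_right(sparse_keys, key) - 1
    if block_no < 0:
        return None                      # key sorts before the first block
    block = data_blocks[block_no]
    return (block_no, block.index(key)) if key in block else None


print(len(dense_index), "dense entries vs", len(sparse_keys), "sparse entries")
print(sparse_lookup("EE002"))
```

The sparse lookup only works because the data file is sorted on the search key, which is why a sparse index applies to ordered files only.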

• Data file: a file containing the logical
records
• Index file: a file containing the index
records
• Indexing field: the field used to order the
index records in the index file


Dense Index

– The index is usually specified on one
field of the file (although it could be
specified on several fields).
– One form of an index is a file of
entries <field value, pointer to
record>, which is ordered by field
value.
– The index is called an access path
on the field.


Sparse Index


Primary Index
• Defined on an ordered data file.

• The data file is ordered on a key field.

• Includes one index entry for each block in the data file;
the index entry has the key field value for the first
record in the block, which is called the block anchor.

• A primary index is a nondense (sparse) index, since it
includes one entry for each disk block of the data file,
holding the key of that block's anchor record, rather than
one entry for every search value.
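
The saving from a primary index can be estimated with a short calculation. All sizes below (block size, record size, key and pointer sizes, record count) are hypothetical, and the cost model simply counts block accesses for a binary search.

```python
import math

B = 1024        # block size in bytes (assumed)
R = 100         # data record size in bytes (assumed)
r = 30_000      # number of data records (assumed)
V, P = 9, 6     # key field size and block pointer size in bytes (assumed)

# Data file: bfr records per block, b blocks in total.
bfr = B // R
b = math.ceil(r / bfr)
cost_without_index = math.ceil(math.log2(b))     # binary search on the data file

# Primary index: one <key, block pointer> entry per data block.
bfr_index = B // (V + P)
b_index = math.ceil(b / bfr_index)
# Binary search on the index, plus one access to fetch the data block.
cost_with_index = math.ceil(math.log2(b_index)) + 1

print(b, b_index, cost_without_index, cost_with_index)
```

Because index entries are much smaller than data records, the index occupies far fewer blocks than the data file and the binary search over it is correspondingly shorter.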


© 2010, University of Colombo School of Computing


Clustering Index
• Defined on an ordered data file

• The data file is ordered on a non-key field, unlike a primary
index, which requires that the ordering field of the data
file have a distinct value for each record.

• Includes one index entry for each distinct value of
the field; the index entry points to the first data block
that contains records with that field value.

• It is another example of a nondense index.
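
A clustering index can be built in one pass over the ordered file. In this sketch each block is a list of values of the non-key ordering field (hypothetical department numbers), and the function name is an assumption for the example.

```python
# Hypothetical data file ordered on a non-key field (department number).
dept_blocks = [
    [1, 1, 1],
    [1, 2, 2],
    [3, 3, 3],
    [3, 5, 5],
]


def build_clustering_index(blocks: list) -> dict:
    """One entry per distinct field value -> first block containing that value."""
    index = {}
    for block_no, block in enumerate(blocks):
        for value in block:
            index.setdefault(value, block_no)   # keep only the first occurrence
    return index


clustering_index = build_clustering_index(dept_blocks)
print(clustering_index)
```

The index has four entries for twelve records, one per distinct value, which is what makes it nondense.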


Secondary Index
• A secondary index provides a secondary means of
accessing a file for which some primary access already
exists.

• The secondary index may be on a field which is a candidate
key and has a unique value in every record, or a non-key with
duplicate values.

• The index is an ordered file with two fields.

• The first field is of the same data type as some
non-ordering field of the data file that is an indexing field.

• The second field is either a block pointer or a record
pointer.

• There can be many secondary indexes (and
hence, indexing fields) for the same file.

• Includes one entry for each record in the data file;
hence, it is a dense index.
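
A secondary index on a non-key field can be sketched as an ordered mapping from field value to record pointers. The records, the field names, and the use of list positions as record pointers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical data file ordered on "id"; "dept" is a non-ordering, non-key field.
records = [
    {"id": 1, "dept": "EE"},
    {"id": 2, "dept": "CS"},
    {"id": 3, "dept": "MA"},
    {"id": 4, "dept": "CS"},
]


def build_secondary_index(rows: list, field_name: str) -> dict:
    """Dense index: every record contributes one (field value, pointer) pair."""
    index = defaultdict(list)
    for record_pointer, row in enumerate(rows):
        index[row[field_name]].append(record_pointer)
    # Keep the entries ordered by indexing field value, like an ordered file.
    return dict(sorted(index.items()))


dept_index = build_secondary_index(records, "dept")
print(dept_index)
```

The data file itself stays in id order; only the index is ordered on dept, so several such indexes can coexist on the same file.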


Multi-Level Indexes
• Since a single-level index is an ordered file, we
can create a primary index to the index itself.
• In this case, the original index file is called the first-level
index and the index to the index is called the
second-level index.
• We can repeat the process, creating a third, fourth,
..., top level until all entries of the top level fit in one
disk block.
• A multi-level index can be created for any type of first
level index (primary, secondary, clustering) as long as
the first-level index consists of more than one disk
block.
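
The number of levels follows directly from repeating the "index on the index" step. The sketch below assumes every index block holds the same number of entries (called fan_out here, a name introduced for the example) and uses hypothetical entry counts.

```python
import math


def index_levels(first_level_entries: int, fan_out: int) -> int:
    """Number of index levels needed until the top level fits in one block."""
    levels = 1
    blocks = math.ceil(first_level_entries / fan_out)   # first-level blocks
    while blocks > 1:
        # The next level up has one entry per block of the level below it.
        blocks = math.ceil(blocks / fan_out)
        levels += 1
    return levels


# Hypothetical index with 30,000 first-level entries and 68 entries per block.
levels = index_levels(30_000, 68)
block_accesses = levels + 1      # one block per index level, plus the data block
print(levels, block_accesses)
```

With these numbers the levels shrink from 442 blocks to 7 blocks to 1 block, so a lookup reads one block per level instead of performing a binary search over the first-level index.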

• Such a multi-level index is a form of search tree.

• However, insertion and deletion of index
entries is a severe problem because every level
of the index is an ordered file.


Dynamic Multilevel Indexes Using B+-Trees
• Most multi-level indexes use the B+-tree data
structure because of the insertion and deletion problem.
• A B+-tree leaves space in each tree node (disk block) to
allow for new index entries.
• The data structure is a variation of search trees
that allows efficient insertion and deletion of
search values.
• In the B+-tree data structure, each node
corresponds to a disk block.
• Each node is kept between half full and completely full.


• An insertion into a node that is not full is quite
efficient.

• If a node is full, the insertion causes a split into two
nodes.

• Splitting may propagate to other tree levels.
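
The split of a full leaf can be sketched in isolation. This is a simplified model that handles keys only: the function name and max_keys parameter are hypothetical, and it assumes the convention that the left leaf keeps the larger half and its last key is copied up to the parent as the separator (other texts copy up the first key of the right leaf instead).

```python
import bisect


def insert_into_leaf(keys: list, key: int, max_keys: int):
    """Insert key into a sorted leaf, splitting the leaf if it overflows.

    Returns (left_keys, right_keys, separator); right_keys and separator
    are None when no split was needed.
    """
    keys = keys.copy()
    bisect.insort(keys, key)
    if len(keys) <= max_keys:
        return keys, None, None          # the node was not full: cheap case
    mid = (len(keys) + 1) // 2           # left leaf keeps the larger half
    left, right = keys[:mid], keys[mid:]
    separator = left[-1]                 # assumed convention: copied to parent
    return left, right, separator


no_split = insert_into_leaf([5, 8], 7, max_keys=3)
split = insert_into_leaf([5, 8, 12], 7, max_keys=3)
print(no_split)
print(split)
```

The separator has to be inserted into the parent node, which may itself be full and split in turn; that is how a split propagates to other levels.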


• A deletion is quite efficient if a node does not
become less than half full.
• If a deletion causes a node to become less than
half full, it must be merged with neighboring
nodes.


B+ tree
The structure of the internal nodes of a B+ tree
of order p is as follows:
• Each internal node is of the form
<P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>
where q ≤ p. Each Pi is a tree pointer.
• Within each node, K1 < K2 < ... < Kq-1.
• Each node has at most p tree pointers.
• Each node with q tree pointers, q ≤ p, has q-1
search key field values.

The structure of the leaf nodes of a B+ tree of
order p is as follows:
• Each leaf node is of the form
<<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>
where q ≤ p. Each Pri is a data pointer. Pnext
points to the next leaf node of the B+ tree.
• Within each node, K1 < K2 < ... < Kq-1.
• All leaf nodes are at the same level.
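
The two node layouts map naturally onto two small classes, and a search simply follows tree pointers down to a leaf. The class names, the hand-built two-leaf tree, and the string "pointers" are hypothetical; the sketch assumes the convention that subtree Pi holds the keys K with Ki-1 < K ≤ Ki.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeafNode:
    keys: list                                # K1 < K2 < ... < Kq-1
    data_pointers: list                       # Pr_i for each K_i
    next_leaf: Optional["LeafNode"] = None    # Pnext


@dataclass
class InternalNode:
    keys: list                                # K1 < K2 < ... < Kq-1
    children: list                            # tree pointers P1 ... Pq


def search(node, key):
    """Follow tree pointers down to a leaf, then look for key in that leaf."""
    while isinstance(node, InternalNode):
        # Assumed convention: child i holds keys K with K_{i-1} < K <= K_i.
        i = bisect.bisect_left(node.keys, key)
        node = node.children[i]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.data_pointers[i]
    return None


def scan_leaves(leaf):
    """Read all keys in order by following the Pnext pointers."""
    while leaf is not None:
        yield from leaf.keys
        leaf = leaf.next_leaf


# Hand-built tree with one internal node and two leaves.
right = LeafNode(keys=[8, 12], data_pointers=["r8", "r12"])
left = LeafNode(keys=[5, 7], data_pointers=["r5", "r7"], next_leaf=right)
root = InternalNode(keys=[7], children=[left, right])

print(search(root, 8), search(root, 9), list(scan_leaves(left)))
```

Because data pointers live only in the leaves and the leaves are chained through Pnext, an ordered scan never has to revisit the internal nodes.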


Difference between B-tree and B+-tree

• In a B-tree, pointers to data records exist at all
levels of the tree.

• In a B+-tree, all pointers to data records exist at
the leaf-level nodes.

• A B+-tree can have fewer levels (or a higher capacity
of search values) than the corresponding B-tree.
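
The capacity difference can be made concrete by computing how many tree pointers fit in one block for each structure. The block, key, and pointer sizes below are hypothetical; the two inequalities in the comments express only that a node must fit in a block.

```python
B = 512    # block size in bytes (assumed)
V = 9      # search key field size in bytes (assumed)
Pr = 7     # data (record) pointer size in bytes (assumed)
P = 6      # tree (block) pointer size in bytes (assumed)

# B-tree node: p tree pointers plus (p - 1) <key, data pointer> pairs.
#   p*P + (p - 1)*(Pr + V) <= B
btree_order = (B + Pr + V) // (P + Pr + V)

# B+-tree internal node: p tree pointers plus (p - 1) keys, no data pointers.
#   p*P + (p - 1)*V <= B
bplus_order = (B + V) // (P + V)

print(btree_order, bplus_order)
```

With these sizes an internal B+-tree node fans out to 34 children against 24 for a B-tree node, so the B+-tree reaches the same number of search values in fewer levels.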
