Unitv Part1
●
Databases are stored physically as files of records, which
are typically stored on magnetic disks.
●
The collection of data that makes up a computerized
database must be stored physically on some computer
storage medium. The DBMS software can then retrieve,
update, and process this data as needed.
●
Computer storage media form a storage hierarchy that
includes three main categories: primary, secondary, and
tertiary storage.
Primary Storage
●
This category includes storage media that can be operated on directly by the
computer’s central processing unit (CPU), such as the computer’s main memory
and smaller but faster cache memories.
●
Primary storage usually provides fast access to data but is of limited storage
capacity.
●
Although main memory capacities have been growing rapidly in recent years,
they are still more expensive and have less storage capacity than demanded by
typical enterprise-level databases.
●
The contents of main memory are lost in case of power failure or a system crash.
●
Examples: CPU registers, cache memory, main memory (RAM), etc.
Secondary Storage
●
The primary choice of storage medium for online storage of enterprise databases
has been magnetic disks.
●
These devices provide non-volatile, long-term storage for computer systems,
retaining data even after the system is powered down.
●
Secondary storage plays a crucial role in effective data management, retrieval,
and sharing in modern computer systems.
●
However, flash memories are becoming a common medium of choice for storing
moderate amounts of permanent data. When used as a substitute for a disk drive,
such memory is called a solid-state drive (SSD).
●
Examples: Hard disk drives (HDDs), Solid-state drives (SSDs), USB flash drives.
Tertiary Storage
●
Optical disks (CD-ROMs, DVDs, and other similar storage
media) and tapes are removable media used in today’s systems
as offline storage for archiving databases and hence come under
the category called tertiary storage.
●
These devices usually have a larger capacity, cost less, and
provide slower access to data than do primary storage devices.
●
Data in secondary or tertiary storage cannot be processed
directly by the CPU; first it must be copied into primary storage
and then processed by the CPU.
File Organization
●
A file is a collection of records, and individual records can be accessed using the
primary key. The file organization used for a given set of records determines
the type and frequency of access that it can support efficiently.
●
File organization is the logical relationship among the records of a file. It
defines how file records are mapped onto disk blocks.
●
File organization is used to describe the way in which the records are stored in
terms of blocks, and the blocks are placed on the storage medium.
●
One approach to mapping the database to files is to use several files and
store records of only one fixed length in any given file. An alternative approach is to
structure the files so that they can hold records of multiple lengths.
●
Files of fixed length records are easier to implement than the files of variable
length records.
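●
The mapping of fixed-length records onto blocks can be sketched as follows. This is a minimal illustration, not a real DBMS layout: the record format (a 4-byte id plus a 20-byte name), the 96-byte block size, and the helper names are all assumptions chosen for the example.

```python
import struct

# Assumed layout: 4-byte integer id + 20-byte name = 24 bytes per record.
RECORD_FORMAT = "<i20s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)
BLOCK_SIZE = 96                                 # assumed tiny block size
RECORDS_PER_BLOCK = BLOCK_SIZE // RECORD_SIZE   # whole records per block


def pack_record(rec_id, name):
    """Encode one record into exactly RECORD_SIZE bytes (name is null-padded)."""
    return struct.pack(RECORD_FORMAT, rec_id, name.encode("utf-8"))


def unpack_record(raw):
    """Decode RECORD_SIZE bytes back into (id, name)."""
    rec_id, name = struct.unpack(RECORD_FORMAT, raw)
    return rec_id, name.rstrip(b"\x00").decode("utf-8")


def locate(record_number):
    """Return (block number, byte offset inside that block) of the n-th record."""
    block = record_number // RECORDS_PER_BLOCK
    offset = (record_number % RECORDS_PER_BLOCK) * RECORD_SIZE
    return block, offset


# Six records stored back to back; record 5 lands in the second block.
file_bytes = b"".join(pack_record(i, f"name{i}") for i in range(6))
block, offset = locate(5)
start = block * BLOCK_SIZE + offset
record = unpack_record(file_bytes[start:start + RECORD_SIZE])
```

Because every record has the same size, the position of the n-th record follows from simple arithmetic, which is why fixed-length files are easier to implement.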
Types of File Organization
●
Sequential file organization
●
Heap file organization
●
Hash file organization
●
B+ tree file organization
●
Indexed sequential access method (ISAM)
●
Cluster file organization
Sequential File Organization
●
This is the simplest method of file
organization. In this method, records are stored
sequentially. It can be implemented
in two ways:
– Pile File Method
– Sorted File Method
Sequential File Organization (Pile File Method)
●
It is quite a simple method. In this method, we store the
records in a sequence, i.e., one after another. The
records are placed in the file in the order in which they are
inserted into the table.
●
To update or delete a record, the record is first
searched for in the memory blocks. When it is found,
it is marked for deletion, and, for an update, the new
version of the record is inserted.
●
Suppose we have a sequence of records R1, R3, and so on, up to R9 and R8.
●
Here, each record is simply a row in a table. Suppose we want to insert a new
record R2 in the sequence; it will be placed at the end of the file.
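●
The pile file behaviour described above can be sketched in a few lines. The class name `PileFile`, its methods, and the record contents are hypothetical and chosen only for illustration; a Python list stands in for the file.

```python
class PileFile:
    """Records kept in arrival order; deletion only marks a record."""

    def __init__(self):
        self.records = []   # each entry: [key, data, deleted_flag]

    def insert(self, key, data):
        # New records always go to the end of the file.
        self.records.append([key, data, False])

    def find(self, key):
        # Linear scan from the start of the file; skips deleted records.
        for position, (k, _, deleted) in enumerate(self.records):
            if k == key and not deleted:
                return position
        return None

    def delete(self, key):
        position = self.find(key)
        if position is None:
            return False
        self.records[position][2] = True   # mark for deletion
        return True

    def update(self, key, new_data):
        # Mark the old copy as deleted, then append the new version.
        if self.delete(key):
            self.insert(key, new_data)
            return True
        return False

    def live_keys(self):
        return [k for k, _, deleted in self.records if not deleted]


pile = PileFile()
for key in ["R1", "R3", "R9", "R8"]:
    pile.insert(key, f"row for {key}")
pile.insert("R2", "row for R2")        # lands at the end, not between R1 and R3
order_after_insert = pile.live_keys()
```

Note that an update leaves the marked copy in place, so the file grows until it is reorganized.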
Objective of File Organization
●
Records should be selected optimally, i.e., as fast as
possible.
●
Insert, delete, and update transactions on the records
should be quick and easy.
●
Duplicate records should not be introduced as a result of
insert, update, or delete operations.
●
Records should be stored efficiently to minimize the cost
of storage.
Sequential File Organization (Sorted File Method)
●
In this method, the new record is always inserted at
the file's end, and the sequence is then sorted in
ascending or descending order. Records are sorted
on the primary key or on any other key.
●
When a record is modified, the record is updated
and the file is sorted again so that the updated
record ends up in the right place.
●
Suppose there is a preexisting sorted sequence of four records R1, R3, and so on, up to R6 and R7.
●
Suppose a new record R2 has to be inserted in the sequence. It is first inserted at the end of the file, and then the sequence is sorted.
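●
A minimal sketch of the sorted file method, assuming record names of the form `R<number>` and a Python list standing in for the file. `insert_then_sort` follows the description literally; `insert_in_place` is an alternative that is not part of the method as described, shown only to illustrate that a binary search can find the right position without re-sorting.

```python
import bisect


def record_number(name):
    """'R12' -> 12; assumes hypothetical names of the form R<number>."""
    return int(name[1:])


def insert_then_sort(records, new_record):
    """Append at the end of the file, then sort the whole sequence."""
    records.append(new_record)
    records.sort(key=record_number)


def insert_in_place(records, new_record):
    """Alternative: binary-search the position and insert there directly."""
    keys = [record_number(r) for r in records]
    position = bisect.bisect_left(keys, record_number(new_record))
    records.insert(position, new_record)


sorted_file = ["R1", "R3", "R6", "R7"]
insert_then_sort(sorted_file, "R2")    # R2 ends up between R1 and R3
```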
Pros of sequential file organization
●
It is a fast and efficient method for processing huge amounts of data.
●
In this method, files can be easily stored on cheaper storage
media such as magnetic tapes.
●
It is simple in design. It does not require much effort to store the data.
●
This method is used when most of the records have to be accessed,
as in calculating students' grades, generating salary slips, etc.
●
This method is used for report generation or statistical calculations.
Cons of sequential file organization
●
It wastes time because we cannot jump directly to a
particular record; we have to move through the
records sequentially until we reach it.
●
The sorted file method takes more time and space
for sorting the records.
Heap File Organization
●
It is the simplest and most basic type of organization. It works with data blocks.
In heap file organization, the records are inserted at the file's end. Inserting
records does not require any sorting or ordering of records.
●
When the data block is full, the new record is stored in some other block. This
new data block need not be the very next data block; the DBMS can select any
data block in the memory to store new records. The heap file is also known as
an unordered file.
●
In the file, every record has a unique id, and every page in a file is of the same
size. It is the DBMS's responsibility to store and manage the new records.
Insertion of a new record
●
Suppose we have five records R1, R3, R6, R4 and R5 in a heap and suppose
we want to insert a new record R2 in the heap. If data block 3 is full, then R2 will
be inserted in any data block selected by the DBMS, let's say data block 1.
●
If we want to search, update or delete data in heap file organization, then
we need to traverse the data from the start of the file until we get the requested
record.
●
If the database is very large, then searching, updating or deleting a record will
be time-consuming because there is no sorting or ordering of records. In
heap file organization, we need to check the data record by record until we get
the requested record.
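●
The insertion and search behaviour of a heap file can be sketched as below. The block capacity of two records, the placement of the five existing records, and the policy of choosing the first block with free space are assumptions; a real DBMS chooses the block by its own rules.

```python
BLOCK_CAPACITY = 2   # assumed number of records per data block


class HeapFile:
    def __init__(self, blocks):
        self.blocks = blocks   # list of data blocks, each a list of records

    def insert(self, record):
        """Store the record in any block with free space (here: the first found)."""
        for number, block in enumerate(self.blocks):
            if len(block) < BLOCK_CAPACITY:
                block.append(record)
                return number
        self.blocks.append([record])        # every block is full: add a new one
        return len(self.blocks) - 1

    def search(self, record):
        """Scan from the start of the file; return (block index, blocks read)."""
        blocks_read = 0
        for number, block in enumerate(self.blocks):
            blocks_read += 1
            if record in block:
                return number, blocks_read
        return None, blocks_read


# Index 0 corresponds to "data block 1" in the text; the last block is full.
heap = HeapFile([["R1"], ["R3", "R6"], ["R4", "R5"]])
chosen_block = heap.insert("R2")
found_in, blocks_read = heap.search("R5")
```

The search cost grows with the number of blocks, which is the reason heap files become slow for large databases.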
Pros of Heap file organization
●
It is a very good method of file organization for
bulk insertion. If a large amount of data needs
to be loaded into the database at one time,
then this method is best suited.
●
In the case of a small database, fetching and
retrieving records is faster than with sequential
file organization.
Cons of Heap file organization
●
This method is inefficient for large databases
because it takes time to search for or modify a
record.
Hash File Organization
●
Hash file organization computes a hash function on some
fields of the records. The hash function's output determines the location of
the disk block where the records are to be placed.
●
When a record has to be retrieved using the hash key columns, the
address is generated, and the whole record is retrieved using that address.
In the same way, when a new record has to be inserted, the address
is generated using the hash key and the record is inserted directly. The same
process is applied in the case of delete and update.
●
In this method, there is no need to search or sort the entire file. Each
record is stored at the location given by its hash value, so records appear to
be scattered randomly in the memory.
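●
A minimal sketch of hash file organization, assuming integer keys, five data blocks, and a modulo hash function. All three are illustrative choices; the text does not prescribe a particular hash function.

```python
BLOCK_COUNT = 5   # assumed number of data blocks


def block_address(key):
    """Hypothetical hash function: key modulo the number of blocks."""
    return key % BLOCK_COUNT


# Each data block maps key -> record.
blocks = {address: {} for address in range(BLOCK_COUNT)}


def insert(key, record):
    blocks[block_address(key)][key] = record


def retrieve(key):
    # Only the one block named by the hash function is examined.
    return blocks[block_address(key)].get(key)


def delete(key):
    return blocks[block_address(key)].pop(key, None)


for emp_id in (103, 104, 110):
    insert(emp_id, f"employee {emp_id}")
```

Insert, retrieve, and delete all compute the same address first, so none of them has to scan the file.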
B+ File Organization
●
B+ tree file organization is an advanced form of the indexed sequential
access method. It uses a tree-like structure to store records in a file.
●
It uses the same key-index concept, where the primary key is used to
sort the records. For each primary key, an index value is generated
and mapped to the record.
●
The B+ tree is similar to a binary search tree (BST), but a node can have more
than two children. In this method, all the records are stored only at the
leaf nodes. Intermediate nodes act as pointers to the leaf nodes. They do
not contain any records.
The previous B+ tree shows that:
●
There is one root node of the tree, i.e., 25.
●
There is an intermediary layer with nodes. They do not store the actual record.
They have only pointers to the leaf node.
●
The nodes to the left of the root node contain values smaller than the root, and the
nodes to the right contain values larger than the root, i.e., 15 and 30 respectively.
●
There is a single leaf level, whose nodes hold only values, i.e., 10, 12, 17, 20, 24, 27 and
29.
●
Searching for any record is easier because the tree is balanced, so all the leaf nodes are
at the same depth.
●
In this method, any record can be found by traversing a single path from the root
and accessed easily.
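●
The single-path search can be sketched as follows. The keys 25, 15, 30 and the leaf values come from the example above, but the grouping of values into leaf nodes is an assumption because the figure is not reproduced here. The right-most leaf is left empty since the text lists no value of 30 or above. The `Node` class and `search` function are hypothetical names.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys (leaf: the stored values)
        self.children = children    # None marks a leaf node

    @property
    def is_leaf(self):
        return self.children is None


def search(root, key):
    """Follow one root-to-leaf path; return (found, keys of each visited node)."""
    node, path = root, []
    while not node.is_leaf:
        path.append(node.keys)
        # Values below keys[i] are in child i; equal or larger values are to the right.
        node = node.children[bisect.bisect_right(node.keys, key)]
    path.append(node.keys)
    return key in node.keys, path


# Assumed leaf grouping for the values listed in the text.
leaves = [Node([10, 12]), Node([17, 20, 24]), Node([27, 29]), Node([])]
left = Node([15], leaves[:2])
right = Node([30], leaves[2:])
root = Node([25], [left, right])

found, path = search(root, 20)
```

Every search visits exactly one node per level, so the cost depends on the height of the tree rather than on the number of records.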
Pros of B+ tree file organization
●
In this method, searching becomes very easy as all the records are
stored only in the leaf nodes and kept sorted in a sequential linked list.
●
Traversing through the tree structure is easier and faster.
●
The size of the B+ tree has no restrictions, so the number of
records can increase or decrease and the B+ tree structure can
also grow or shrink.
●
It is a balanced tree structure, and insert/update/delete operations do
not degrade the performance of the tree.
Cons of B+ tree file organization
●
This method is inefficient for mostly static data.
Indexed sequential access method (ISAM)
●
The ISAM method is an advanced sequential file
organization. In this method, records are stored in the
file using the primary key. An index value is generated
for each primary key and mapped to the record. This
index contains the address of the record in the file.
●
If any record has to be retrieved based on its index
value, then the address of the data block is fetched and
the record is retrieved from the memory.
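●
A minimal sketch of index-based retrieval, assuming the list position of a record stands in for its address in the file. The keys and names are made-up sample data. The index is kept sorted on the primary key, so a binary search finds one key and a slice of the index answers a range query.

```python
import bisect

# Data file: the list position plays the role of the record's address.
data_file = [
    (101, "Asha"), (105, "Ravi"), (110, "Meena"), (120, "John"), (125, "Sara"),
]

# Index: sorted primary keys, each mapped to the address of its record.
index_keys = [key for key, _ in data_file]
index_addresses = list(range(len(data_file)))


def lookup(key):
    """Binary-search the index, then fetch the record at the stored address."""
    position = bisect.bisect_left(index_keys, key)
    if position < len(index_keys) and index_keys[position] == key:
        return data_file[index_addresses[position]]
    return None


def range_lookup(low, high):
    """Return all records whose key lies in [low, high]."""
    start = bisect.bisect_left(index_keys, low)
    end = bisect.bisect_right(index_keys, high)
    return [data_file[index_addresses[i]] for i in range(start, end)]


one = lookup(110)
several = range_lookup(105, 120)
```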
Pros of ISAM:
●
In this method, since each record has the address of its data block,
searching for a record in a huge database is quick and easy.
●
This method supports range retrieval and partial retrieval of
records. Since the index is based on the primary key values,
we can retrieve the data for the given range of value.
●
In the same way, a partial value can also be easily
searched, e.g., student names starting with 'JA' can be
found easily.
Cons of ISAM
●
This method requires extra space on the disk to store
the index values.
●
When new records are inserted, these files
have to be reconstructed to maintain the sequence.
●
When the record is deleted, then the space used by
it needs to be released. Otherwise, the performance
of the database will slow down.
Cluster file organization
●
When the records of two or more tables are stored in the same file, it is known as a cluster.
These files will have two or more tables in the same data block, and the key
attributes which are used to map these tables together are stored only once.
●
This method reduces the cost of searching for various records in different files.
●
The cluster file organization is used when there is a frequent need for joining the
tables with the same condition. These joins will give only a few records from both
tables. In the given example, we are retrieving the records for only particular
departments. This method can't be used to retrieve the records for the entire
department.
●
In this method, we can directly insert, update or delete any record. Data is sorted
based on the key with which searching is done. The cluster key is the key on
which the tables are joined.
Types of Cluster file organization:
●
Indexed Clusters:
– In an indexed cluster, records are grouped based on the cluster key and
stored together. The EMPLOYEE and DEPARTMENT relationship
is an example of an indexed cluster. Here, all the records are grouped
based on the cluster key DEP_ID.
●
Hash Clusters:
– It is similar to the indexed cluster. In a hash cluster, instead of storing the
records based on the cluster key, we generate the value of the hash key
for the cluster key and store the records with the same hash key value together.
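●
Both cluster types can be sketched with the EMPLOYEE and DEPARTMENT tables joined on DEP_ID. The rows, the bucket count, and the helper names are made-up sample values; dictionaries stand in for data blocks.

```python
from collections import defaultdict

# Hypothetical sample rows.
departments = [(10, "Sales"), (20, "HR")]                          # (DEP_ID, name)
employees = [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meena", 10)]   # (id, name, DEP_ID)

# Indexed cluster: one entry per DEP_ID, so the cluster key is stored only once.
cluster = {dep_id: {"department": name, "employees": []} for dep_id, name in departments}
for emp_id, name, dep_id in employees:
    cluster[dep_id]["employees"].append((emp_id, name))


def join_for_department(dep_id):
    """The join result for one department is read from a single cluster entry."""
    entry = cluster[dep_id]
    return [(name, entry["department"]) for _, name in entry["employees"]]


# Hash cluster: records are grouped by the hash of the cluster key instead.
BUCKETS = 4   # assumed bucket count
hash_cluster = defaultdict(list)
for emp_id, name, dep_id in employees:
    hash_cluster[dep_id % BUCKETS].append((emp_id, name, dep_id))

sales_rows = join_for_department(10)
```

Because matching rows already sit together, the join for one department needs no scan of either table.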
Pros of Cluster file organization
●
The cluster file organization is used when there
is a frequent request for joining the tables with the
same joining condition.
●
It provides efficient results when there is a
1:M mapping between the tables.
Cons of Cluster file organization
●
This method has low performance for very
large databases.
●
If there is any change in the joining condition, then this
method cannot be used. If we change the joining
condition, then traversing the file takes a lot of time.
●
This method is not suitable for tables with a 1:1
mapping.
Data Dictionary
●
A data dictionary in Database Management System
(DBMS) can be defined as a component that stores the
collection of names, definitions, and attributes for data
elements that are being used in a database.
●
The Data Dictionary stores metadata, i.e., data about the
database.
●
These data elements are then used as part of a database,
research project, or information system.
●
A data dictionary is a crucial part of a relational
database as it provides additional information
about the relationships between multiple tables
in a database.
●
The data dictionary in DBMS helps the user to
arrange data in a neat and well-organized way,
thus preventing data redundancy.
Advantages of Data Dictionaries
●
Data models in DBMS provide very little information about the database, so
a data dictionary is essential for proper knowledge of the entities,
relationships, and attributes that are present in a data model.
●
The Data Dictionary provides consistency by reducing data redundancy in
the collection and use of data across various members of a team.
●
The Data Dictionary provides structured analysis and design tools by
enforcing the use of data standards. Data standards are the set of rules that
govern the way data is collected, recorded, and represented.
●
Using a Data Dictionary helps to define naming conventions that are used in
a model.
Types of Data Dictionaries (Active Data Dictionaries)
●
An active data dictionary is a type of dictionary that is
automatically managed and updated by the database
management system (DBMS) whenever any modification or
changes are executed in the database.
●
It does not require any external maintenance software or tool and
is highly consistent with the structure and definition of the
database.
●
The active data dictionary is also known as an integrated data
dictionary.
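●
SQLite's built-in catalog can serve as a small illustration of an active data dictionary. The sketch below uses Python's standard `sqlite3` module, the `sqlite_master` catalog table, and the `PRAGMA table_info` statement; the table names `student` and `course` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def table_names():
    """List the tables recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def column_info(table):
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


before = table_names()
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")
after = table_names()          # the catalog already reflects the new table
student_columns = column_info("student")
```

No separate maintenance step is needed: the DBMS updates the catalog as part of executing the `CREATE TABLE` statement.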
Types of Data Dictionaries (Passive Data Dictionaries)
●
A passive data dictionary is a type of dictionary that also provides
storage for centralizing the metadata, but it is maintained separately from the
database and is not updated automatically by the DBMS.
●
It is manually updated and maintained, and one of its
disadvantages is that it has a high maintenance cost and
may require other teams to maintain it.
●
The passive data dictionary is also known as a non-integrated
dictionary or a standalone dictionary.
Hashing
●
Hashing is a technique used in DBMS for the storage and
retrieval of data records in a database, particularly in large
databases containing thousands or millions of records.
●
It is used to quickly locate a data record in a database by
utilizing an auxiliary hash table and a hash function.
●
The hash function takes the primary key of a data record as
input and computes an index or location where the current
data record is stored.
Dynamic Hashing
●
Nature of Data: Dynamic hashing is used for variable-size, changing
data.
●
Data Bucket Size: The number of data buckets in dynamic hashing is not
fixed; buckets are added or removed as the data grows or shrinks.
●
Handling Data Changes: It handles variable-size or changing
data efficiently.
●
Bucket Overflow: Dynamic hashing reduces bucket overflow by
splitting a full bucket instead of relying on a fixed set of buckets.
●
Complexity: Dynamic hashing is more complex compared to static hashing.
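●
In contrast to a fixed bucket count, a dynamic scheme grows its bucket structure as data arrives. The sketch below uses extendible hashing, which is one common dynamic scheme and is chosen here only as an illustration. It assumes non-negative integer keys, a bucket capacity of two, and the low-order bits of the key as the hash value; all class and method names are hypothetical.

```python
BUCKET_CAPACITY = 2   # assumed number of records per bucket


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # number of hash bits this bucket depends on
        self.items = {}


class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        # The lowest global_depth bits of the key select a directory slot.
        return key & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def insert(self, key, value):
        bucket = self.directory[self._slot(key)]
        if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
            bucket.items[key] = value
            return
        self._split(bucket)
        self.insert(key, value)          # retry; a further split may be needed

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; each new slot points to the same bucket as its twin.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Slots of the old bucket whose newly used bit is set move to the sibling.
        for slot, target in enumerate(self.directory):
            if target is bucket and slot & high_bit:
                self.directory[slot] = sibling
        # Redistribute the old bucket's records between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v


table = ExtendibleHash()
for key in (0, 4, 8, 1):
    table.insert(key, f"record {key}")
```

Only the overflowing bucket is split; the other buckets and their records stay where they are.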
Static Hashing
●
Nature of Data: Static hashing is used for fixed-size, non-changing
data.
●
Data Bucket Size: The resulting data bucket in static hashing is of fixed
length.
●
Handling Data Changes: It does not handle variable-size or changing
data efficiently.
●
Bucket Overflow: Static hashing can face challenges with bucket
overflow, especially if the memory size is limited.
●
Complexity: Static hashing is simpler compared to dynamic hashing.
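●
The fixed bucket count and the resulting overflow can be sketched as follows. The bucket count of three, the capacity of two, the modulo hash function, and the use of a separate overflow area per bucket are assumptions made for the example.

```python
BUCKET_COUNT = 3      # fixed for the lifetime of the file
BUCKET_CAPACITY = 2   # assumed number of records per bucket


class StaticHashFile:
    def __init__(self):
        self.buckets = [[] for _ in range(BUCKET_COUNT)]
        self.overflow = [[] for _ in range(BUCKET_COUNT)]   # one overflow area per bucket

    def insert(self, key, value):
        # Assumes each key is inserted only once.
        number = key % BUCKET_COUNT
        if len(self.buckets[number]) < BUCKET_CAPACITY:
            self.buckets[number].append((key, value))
        else:
            self.overflow[number].append((key, value))      # bucket overflow

    def get(self, key):
        number = key % BUCKET_COUNT
        for k, v in self.buckets[number] + self.overflow[number]:
            if k == key:
                return v
        return None

    def overflow_count(self):
        return sum(len(area) for area in self.overflow)


static_file = StaticHashFile()
for key in (3, 6, 9, 12):          # all four keys hash to bucket 0
    static_file.insert(key, f"record {key}")
```

Since the number of buckets never changes, every extra record for a full bucket lengthens its overflow area and slows down lookups for that bucket.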
ACID Properties of Transaction
●
Atomicity
●
Consistency
●
Isolation
●
Durability
Atomicity
●
A transaction is an atomic unit of processing; it
should either be performed in its entirety or not
performed at all.
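●
Atomicity can be demonstrated with Python's standard `sqlite3` module. The `account` table, the balances, and the `transfer` function are made up for the example; the `CHECK` constraint is an assumed business rule that makes the second update of an oversized transfer fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(source, target, amount):
    """Credit the target and debit the source as one transaction."""
    try:
        with conn:   # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, target)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False   # the credit above has been undone together with the debit


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer("A", "B", 30)        # both updates are applied
failed = transfer("A", "B", 500)   # debit violates the CHECK; the credit is rolled back
final = balances()
```

After the failed transfer, neither account has changed, even though the first update had already been executed inside the transaction.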
Consistency
●
A transaction should be consistency preserving,
meaning that if it is completely executed from
beginning to end without interference from other
transactions, it should take the database from
one consistent state to another.
Isolation
●
A transaction should appear as though it is
being executed in isolation from other
transactions, even though many transactions are
executing concurrently.
●
That is, the execution of a transaction should not
be interfered with by any other transactions
executing concurrently.
Durability
●
The changes applied to the database by a
committed transaction must persist in the
database.
●
These changes must not be lost because of any
failure.