4 Marks
Chapter (12)
1) Physical Storage Media
1. Cache
2. Main Memory
3. Flash Memory
4. Magnetic disk
5. Optical Storage
6. Tape Storage
Primary Storage : Cache, Main Memory
Secondary Storage : Flash Memory, Magnetic disk
Tertiary Storage : Optical Storage, Tape Storage
Volatile Storage : Cache, Main Memory
Non-Volatile Storage : Flash Memory, Magnetic disk, Optical Storage, Tape Storage
2) Improvement of Reliability via Redundancy
The solution to the problem of reliability is to introduce redundancy; that is, we store
extra information that is not needed normally but that can be used in the event of failure of a
disk to rebuild the lost information. Thus, even if a disk fails, data are not lost, so the effective
mean time to failure is increased, provided that we count only failures that lead to loss of data
or to non-availability of data.
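As a rough illustration of how much redundancy can help, the sketch below uses the usual textbook estimate for a mirrored pair of disks with independent failures: mean time to data loss is about MTTF squared divided by (2 x MTTR). The function name and the sample figures (100,000-hour disk MTTF, 10-hour repair time) are assumptions chosen for illustration, not values from these notes.

```python
def mean_time_to_data_loss(mttf_hours, mttr_hours):
    """Estimate for a mirrored pair, assuming independent disk failures."""
    return mttf_hours ** 2 / (2 * mttr_hours)


single_disk_mttf = 100_000                      # assumed MTTF of one disk, in hours
mirrored = mean_time_to_data_loss(100_000, 10)  # assumed 10-hour repair time

print(mirrored)                     # hours until data loss for the mirrored pair
print(mirrored / single_disk_mttf)  # improvement factor over a single disk
```

The estimate only holds if disk failures really are independent; correlated failures (power faults, aging disks from the same batch) make the real figure lower.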
3) Two main goals of parallelism
1. Load-balance multiple small accesses (block accesses), so that the throughput of such
accesses increases.
2. Parallelize large accesses so that the response time of large accesses is reduced.
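One common way to pursue both goals is block-level striping, where logical block i is placed on disk (i mod n). The sketch below is a minimal illustration under that assumption; the function name `stripe` and the block numbers are hypothetical.

```python
def stripe(blocks, n_disks):
    """Distribute logical blocks round-robin: block i goes to disk i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks


layout = stripe(list(range(8)), n_disks=4)
for disk_no, contents in enumerate(layout):
    print(disk_no, contents)
```

A large read of blocks 0 to 7 touches all four disks at once (goal 2), while unrelated single-block reads tend to land on different disks (goal 1).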
4) Choice of RAID Level
The factors to be taken into account in choosing a RAID level are :
1. Monetary cost of extra disk-storage requirements.
2. Performance requirements in terms of number of I/O operations per second.
3. Performance when a disk has failed.
4. Performance during rebuild (i.e., while the data in a failed disk are being rebuilt on a new
disk).
Chapter (13)
1) Variable-Length Records
Variable-length records arise in database systems due to several reasons. The most
common reason is the presence of variable length fields, such as strings. Other reasons
include record types that contain repeating fields such as arrays or multisets, and the presence
of multiple record types within a file.
The representation of a record with variable-length attributes typically has two parts:
an initial part with fixed-length information, whose structure is the same for all records of the
same relation, followed by the contents of variable-length attributes.
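The two-part layout can be sketched as follows. This is a minimal sketch, not a real storage format: it assumes a record with two variable-length attributes (name, dept) and one fixed-length attribute (salary), and stores an (offset, length) pair for each variable-length attribute in the fixed-length part. The field names and the byte layout are assumptions for illustration.

```python
import struct

# Fixed part: (offset, length) for name, (offset, length) for dept, 4-byte salary.
HEADER = "<HHHHi"
HEADER_SIZE = struct.calcsize(HEADER)


def encode_record(name, dept, salary):
    name_b = name.encode("utf-8")
    dept_b = dept.encode("utf-8")
    name_off = HEADER_SIZE                 # variable data starts after the fixed part
    dept_off = name_off + len(name_b)
    header = struct.pack(HEADER, name_off, len(name_b), dept_off, len(dept_b), salary)
    return header + name_b + dept_b


def decode_record(buf):
    name_off, name_len, dept_off, dept_len, salary = struct.unpack_from(HEADER, buf, 0)
    name = buf[name_off:name_off + name_len].decode("utf-8")
    dept = buf[dept_off:dept_off + dept_len].decode("utf-8")
    return name, dept, salary


record = encode_record("Srinivasan", "Comp. Sci.", 65000)
print(len(record), decode_record(record))
```

Because the fixed part has the same structure for every record, any attribute can be located without scanning the variable-length data.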
2) Organization of Records in Files (List & Explain one or two)
1. Heap file organization
2. Sequential file organization
3. Multitable clustering file organization
4. B+-tree file organization
5. Hashing file organization
Heap file organization
Any record can be placed anywhere in the file where there is space for the record.
There is no ordering of records. Typically, there is either a single file or a set of files for each
relation.
Sequential file organization
Records are stored in sequential order, according to the value of a “search key” of
each record.
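The difference between the two organizations shows up on insertion. The sketch below is a simplified in-memory model: the block capacity, the function names, and the use of the first tuple field as the search key are all assumptions for illustration.

```python
import bisect


def heap_insert(blocks, record, capacity=2):
    """Heap file: put the record in any block that has free space."""
    for block in blocks:
        if len(block) < capacity:
            block.append(record)
            return
    blocks.append([record])  # no free space anywhere: allocate a new block


def sequential_insert(records, record):
    """Sequential file: keep records ordered by search key (first tuple field)."""
    bisect.insort(records, record)


heap_blocks = [[(30, "c"), (10, "a")], [(50, "e")]]
heap_insert(heap_blocks, (20, "b"))
print(heap_blocks)        # the new record lands wherever there is room

seq_records = [(10, "a"), (30, "c"), (50, "e")]
sequential_insert(seq_records, (20, "b"))
print(seq_records)        # search-key order is preserved
```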
3) Data-Dictionary Storage
Among the types of information that the system must store are these :
1. Names of the relations
2. Names of the attributes of each relation
3. Domains and lengths of attributes
4. Names of views defined on the database, and definitions of those views
5. Integrity constraints (e.g., key constraints)
In addition, many systems keep the following data on users of the system :
1. Names of users, the default schemas of the users, and passwords or other information to
authenticate users
2. Information about authorizations for each user
The data dictionary may also note the storage organization (heap, sequential,
hash, etc.) of relations, and the location where each relation is stored:
1. If relations are stored in operating system files, the dictionary would note the names
of the file (or files) containing each relation.
2. If the database stores all relations in a single file, the dictionary may note the blocks
containing records of each relation in a data structure such as a linked list.
For indices, the system also needs to store information about each
index on each of the relations:
1. Name of the index
2. Name of the relation being indexed
3. Attributes on which the index is defined
4. Type of index formed
Chapter (14)
1) Define Ordered indices and Hash indices.
Ordered indices. Based on a sorted ordering of the values.
Hash indices. Based on a uniform distribution of values across a range of buckets. The
bucket to which a value is assigned is determined by a function, called a hash function.
2) Difference between Clustering index and Nonclustering index.
Clustering Index (primary index)
If the file containing the records is sequentially ordered, a clustering index is an index
whose search key also defines the sequential order of the file.
Nonclustering Index (secondary index)
Indices whose search key specifies an order different from the sequential order of the
file are called nonclustering indices.
3) Dense and Sparse Indices
An index entry, or index record, consists of a search-key value and pointers to one or
more records with that value as their search-key value. The pointer to a record consists of the
identifier of a disk block and an offset within the disk block to identify the record within the
block.
There are two types of ordered indices that we can use:
Dense index: In a dense index, an index entry appears for every search-key value in the file.
Sparse index: In a sparse index, an index entry appears for only some of the search key
values.
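The difference between the two can be sketched with a small sorted file. This is a simplified model that assumes unique search keys, a file stored as a list of blocks, and a sparse index with one entry per block; the sample records and function names are hypothetical.

```python
import bisect

# A sequentially ordered file: each block holds (search_key, name) records.
blocks = [
    [(10101, "Srinivasan"), (12121, "Wu")],
    [(15151, "Mozart"), (22222, "Einstein")],
    [(32343, "El Said"), (33456, "Gold")],
]

# Dense index: one entry per search-key value -> (block number, offset in block).
dense_index = {
    key: (b, off)
    for b, block in enumerate(blocks)
    for off, (key, _) in enumerate(block)
}

# Sparse index: only the first search-key value of each block.
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    loc = dense_index.get(key)
    if loc is None:
        return None
    b, off = loc
    return blocks[b][off]


def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key, then scan that block.
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for record in blocks[b]:
        if record[0] == key:
            return record
    return None


print(dense_lookup(22222), sparse_lookup(22222))
```

The sparse index holds fewer entries but needs a scan inside the block, which is why it only works when the file is ordered on the search key.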
4) What is the main drawback of index-sequential file organization?
The main drawback of index-sequential file organization is the degradation of
performance as the file grows: with growth, an increasing percentage of index entries and
actual records become out of order and are stored in overflow blocks. The degradation of
index lookups can be solved by using B+-tree indices on the file.
5) Difference between B+-tree and B-tree
B-tree indices are similar to B+-tree indices. The primary distinction between the two
approaches is that a B-tree eliminates the redundant storage of search-key values. In a B+-tree,
every search-key value appears in some leaf node, and several are repeated in nonleaf nodes. A B-tree
allows search-key values to appear only once (if they are unique), unlike a B+-tree, where a
value may appear in a nonleaf node, in addition to appearing in a leaf node. Since search keys
are not repeated in the B-tree, we may be able to store the index in fewer tree nodes than in
the corresponding B+-tree index. However, since search keys that appear in nonleaf nodes
appear nowhere else in the B-tree, we are forced to include an additional pointer field for
each search key in a nonleaf node. These additional pointers point to either file records or
buckets for the associated search key.
6) Which conditions can cause bucket overflow?
Bucket overflow can occur if there are insufficient buckets for the given number of records.
Bucket overflow can also occur if some buckets are assigned more records than are others,
resulting in one bucket overflowing even when other buckets still have a lot of free space.
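Both conditions can be reproduced with a toy hash file. The sketch below assumes integer keys, the hash function h(k) = k mod (number of buckets), and a fixed bucket capacity; all of these are illustrative choices, not part of the notes.

```python
def insert_all(keys, n_buckets, capacity):
    """Hash each key to a bucket and report which buckets exceed their capacity."""
    buckets = [[] for _ in range(n_buckets)]
    overflowed = set()
    for key in keys:
        b = key % n_buckets          # assumed hash function
        buckets[b].append(key)
        if len(buckets[b]) > capacity:
            overflowed.add(b)
    return buckets, overflowed


# Insufficient buckets: 10 records, but only 2 buckets x 4 slots = 8 slots.
_, too_few = insert_all(range(10), n_buckets=2, capacity=4)
print(too_few)

# Skew: 16 slots for 5 records, yet every key hashes to bucket 0.
skewed_buckets, skewed = insert_all([0, 4, 8, 12, 16], n_buckets=4, capacity=4)
print(skewed, [len(b) for b in skewed_buckets])
```

In the second case three buckets stay empty while bucket 0 overflows, which is the skew situation described above.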
Chapter (15)
1) Query Processing
Query processing refers to the range of activities involved in extracting data from a
database. The activities include translation of queries in high-level database languages into
expressions that can be used at the physical level of the file system, a variety of
query-optimizing transformations, and actual evaluation of queries.
The basic steps are:
1. Parsing and translation.
2. Optimization.
3. Evaluation.
Chapter (16)
1) Query Optimization
Query optimization is the process of selecting the most efficient query-evaluation plan
from among the many strategies usually possible for processing a given query, especially if
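the query is complex. Users are not expected to write queries so that they can be processed
efficiently; rather, the system is expected to construct a query-evaluation plan that
minimizes the cost of query evaluation.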
One aspect of optimization occurs at the relational-algebra level, where the system
attempts to find an expression that is equivalent to the given expression, but more efficient to
execute. Another aspect is selecting a detailed strategy for processing the query, such as
choosing the algorithm to use for executing an operation, choosing the specific indices to use,
and so on.
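The idea of equivalent but cheaper expressions can be sketched with two plans for the same query: select after a join, or select before it. The relations, the sample tuples, and the use of intermediate-result size as a stand-in for cost are assumptions for illustration only.

```python
# Assumed sample relations: instructor(id, name, dept) and teaches(id, course).
instructor = [(1, "Wu", "Finance"), (2, "Mozart", "Music"), (3, "Einstein", "Physics")]
teaches = [(1, "FIN-201"), (2, "MU-199"), (3, "PHY-101"), (3, "PHY-102")]


def join(left, right):
    """Nested-loop join on the first field (id)."""
    return [l + r[1:] for l in left for r in right if l[0] == r[0]]


def select_after_join():
    joined = join(instructor, teaches)                  # intermediate result
    result = [t for t in joined if t[2] == "Music"]
    return result, len(joined)


def select_before_join():
    filtered = [i for i in instructor if i[2] == "Music"]   # intermediate result
    result = join(filtered, teaches)
    return result, len(filtered)


print(select_after_join())
print(select_before_join())
```

Both plans return the same tuples, but the second builds a smaller intermediate result, which is the kind of saving an optimizer looks for.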
Chapter (17)
1) Transactions
A transaction is a unit of program execution that accesses and possibly updates
various data items.
A transaction must have the following properties (the ACID properties) :
1. Atomicity: A transaction is all or nothing. If one part fails, everything is rolled back.
Prevents partial updates.
2. Consistency: Transactions must take the database from one correct state to another.
Prevents data corruption.
3. Isolation: Transactions should not interfere with each other. Prevents one transaction
from seeing incomplete results of another.
4. Durability: Once a transaction is committed, it is permanently stored. Even if the
system crashes, the data remains safe.
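Atomicity and consistency can be seen in a small funds-transfer sketch using Python's built-in sqlite3 module. The table, the account names, and the CHECK constraint are assumptions for illustration; an in-memory database is used, so durability is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, "A", "B", 500), balances(conn))
```

The failed transfer credits B first and then fails on the debit of A; the rollback undoes the credit, so no partial update is left behind.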
2) Transaction States
A transaction must be in one of the following states :
Active, the initial state; the transaction stays in this state while it is executing.
Partially committed, after the final statement has been executed.
Failed, after the discovery that normal execution can no longer proceed.
Aborted, after the transaction has been rolled back and the database has been restored to its
state prior to the start of the transaction.
Committed, after successful completion.
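These states can be modeled as a small state machine. The transition table below follows the standard state diagram for transactions (for example, a partially committed transaction can still fail); the table and the function name are assumptions added for illustration.

```python
# Allowed transitions between transaction states (assumed from the standard diagram).
ALLOWED = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),   # terminal state
    "aborted": set(),     # terminal state
}


def advance(state, new_state):
    """Return new_state if the transition is legal, otherwise raise ValueError."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state


state = "active"
state = advance(state, "partially committed")
state = advance(state, "committed")
print(state)
```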
3) Concurrency Control
When several transactions execute concurrently in the database, the isolation
property may no longer be preserved. To ensure that it is, the system must control the
interaction among the concurrent transactions; this control is achieved through one of a
variety of mechanisms called concurrency-control schemes.
Two good reasons for allowing concurrency are :
1. Improved throughput and resource utilization
2. Reduced waiting time
Chapter (18)
1) The Two-Phase Locking Protocol
One protocol that ensures serializability is the two-phase locking protocol. This
protocol requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
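The two-phase rule can be sketched as a check on a single transaction's lock requests. The class below is a hypothetical helper: it only enforces the growing and shrinking phases and does not model lock modes or conflicts between transactions.

```python
class TwoPhaseTransaction:
    """Rejects any lock request made after the transaction's first unlock."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False   # False = growing phase, True = shrinking phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock {item!r} in the shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True    # the first unlock ends the growing phase


t1 = TwoPhaseTransaction("T1")
t1.lock("A")
t1.lock("B")      # growing phase: locks may be obtained
t1.unlock("A")    # shrinking phase begins
try:
    t1.lock("C")  # violates the protocol
except RuntimeError as err:
    print(err)
```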
Chapter (19)
1) Recovery System
The recovery scheme must support high availability; that is, the database should
be usable for a very high percentage of time. To support high availability in the face of
machine failure, the recovery scheme must support the ability to keep a backup copy of the database
synchronized with the current contents of the primary copy of the database. If the machine
with the primary copy fails, transaction processing can continue on the backup copy.
Transaction failure
There are two types of errors that may cause a transaction to fail :
1. Logical error. The transaction can no longer continue with its normal execution
because of some internal condition, such as bad input, data not found, overflow, or resource
limit exceeded.
2. System error. The system has entered an undesirable state (e.g., deadlock), as a result
of which a transaction cannot continue with its normal execution. The transaction, however,
can be reexecuted at a later time.