
Database System Structure

CS-3125

Chapter-12

1. Some databases use magnetic disks in a way that only sectors in outer tracks are used, while sectors in inner
tracks are left unused. What might be the benefits of doing so? (Page-582)
Answer :

The disk's data-transfer rate will be greater on the outer tracks than on the inner tracks. This is because the disk
spins at a constant rate, so more sectors pass underneath the drive head in a given amount of time when the arm
is positioned on an outer track than when on an inner track. Even more importantly, by using only outer tracks,
disk arm movement is minimized, reducing disk access latency. This aspect is important for transaction-
processing systems, where latency affects the transaction-processing rate.
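
As a rough back-of-the-envelope illustration, the following sketch (with made-up disk geometry, not real drive specifications) shows why a constant rotation rate makes outer tracks faster:

```python
# Toy model: at constant rotational speed, bytes/second is proportional
# to the number of sectors on the track under the head.
# The geometry figures below are illustrative assumptions, not real specs.

RPM = 7200                      # assumed rotational speed
SECTOR_BYTES = 512              # assumed sector size
SECTORS_OUTER = 1000            # assumed sectors on an outermost track
SECTORS_INNER = 500             # assumed sectors on an innermost track

revs_per_sec = RPM / 60

for name, sectors in [("outer", SECTORS_OUTER), ("inner", SECTORS_INNER)]:
    rate = sectors * SECTOR_BYTES * revs_per_sec
    print(f"{name} track: {rate / 1e6:.1f} MB/s")

# Output: outer track ~61.4 MB/s vs inner track ~30.7 MB/s, i.e. the
# outer track transfers data twice as fast in this toy geometry.
```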

2. A database administrator can choose how many disks are organized into a single RAID 5 array. What are the
trade-offs between having fewer disks versus more disks, in terms of cost, reliability, performance during
failure, and performance during rebuild? (Page-582)
Answer:

Fewer disks means a higher cost per byte of storage (a larger fraction of the array holds parity), but with more
disks, the chance of two disk failures, which would lead to data loss, is higher. Further, performance during
failure would be poor, since a block read from a failed disk would result in a large number of block reads from
the other disks. Similarly, the overhead for rebuilding the failed disk would also be higher, since more disks
need to be read to reconstruct the data on the failed disk.
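
A minimal sketch of how these trade-offs scale with the number of disks; the quantities follow directly from the RAID 5 layout (one disk's worth of parity, reconstruction from all surviving disks), while the function and variable names are illustrative:

```python
# Rough back-of-the-envelope for an N-disk RAID 5 array (illustrative only).

def raid5_tradeoffs(n_disks: int) -> dict:
    """Summarize how key RAID 5 properties scale with the number of disks."""
    return {
        # One disk's worth of capacity holds parity, so the overhead
        # (cost per usable byte) shrinks as the array grows.
        "parity_overhead": 1 / n_disks,
        # Reading a block of a failed disk needs the corresponding block
        # from every other disk to reconstruct it.
        "reads_per_degraded_block": n_disks - 1,
        # Rebuilding the failed disk likewise reads all surviving disks.
        "disks_read_during_rebuild": n_disks - 1,
    }

for n in (3, 6, 12):
    print(n, raid5_tradeoffs(n))
# Larger arrays cost less per usable byte but are slower during failure
# and rebuild, and expose more disks to a second (fatal) failure.
```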

Chapter-13

1. What is the main purpose of the data dictionary (or system catalog) in a relational database management
system (RDBMS), and what is its role in ensuring database functionality? (Page-602)
Relational schemas and other metadata about relations are stored in a structure called the data dictionary or
system catalog. Among the types of information that the system must store are these:

• Names of the relations

• Names of the attributes of each relation

• Domains and lengths of attributes

• Names of views defined on the database, and definitions of those views

• Integrity constraints (e.g., key constraints)
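
As a minimal illustration (hypothetical structures, not the catalog layout of any real system), the metadata listed above could be modeled as follows:

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of the catalog entries listed above;
# real systems store this metadata in special relations on disk.

@dataclass
class Attribute:
    name: str
    domain: str          # e.g. "varchar", "integer"
    length: int

@dataclass
class Relation:
    name: str
    attributes: list[Attribute]
    integrity_constraints: list[str] = field(default_factory=list)

@dataclass
class View:
    name: str
    definition: str      # the defining query text

# A tiny catalog: one base relation with a key constraint, one view on it.
catalog = {
    "relations": [Relation("instructor",
                           [Attribute("ID", "varchar", 5),
                            Attribute("name", "varchar", 20)],
                           ["primary key (ID)"])],
    "views": [View("instructor_names", "select name from instructor")],
}
```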

2. PostgreSQL normally uses a small buffer, leaving it to the operating system buffer manager to manage the
rest of main memory available for file system buffering. Explain (a) what is the benefit of this approach, and
(b) one key limitation of this approach. (Page-620)
Answer:

The database system does not know what the memory demands of other processes are. By using a
small buffer, PostgreSQL ensures that it does not grab too much of main memory. But at the same time, even
if a block is evicted from the buffer, if the file system buffer manager has enough memory allocated to it, the
evicted page is likely to still be cached in the file system buffer. Thus, a database buffer miss is often not very
expensive, since the block is still in the file system buffer.
The drawback of this approach is that the database system may not be able to control the file system buffer
replacement policy. Thus, the operating system may make suboptimal decisions on what to evict from the file
system buffer.

3. What are the key rules governing the granting of shared and exclusive locks on a buffer block, and how do
these rules affect the processing of lock requests from different database processes? (Page-606)
• Any number of processes may have shared locks on a block at the same time.
• Only one process is allowed to get an exclusive lock at a time, and further when a process has an
exclusive lock, no other process may have a shared lock on the block. Thus, an exclusive lock can be
granted only when no other process has a lock on the buffer block.

• If a process requests an exclusive lock when a block is already locked in shared or exclusive mode, the
request is kept pending until all earlier locks are released.
• If a process requests a shared lock when a block is not locked, or already shared locked, the lock may
be granted; however, if another process has an exclusive lock, the shared lock is granted only after the
exclusive lock has been released.
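
A minimal sketch of these grant rules for a single buffer block; the class and method names are hypothetical, and real buffer managers additionally queue waiters fairly, wake them on release, and deal with deadlocks:

```python
# Minimal sketch of the shared/exclusive grant rules listed above,
# for one buffer block (hypothetical code, not any real system's).

class BufferBlockLock:
    def __init__(self):
        self.shared_holders = set()   # processes holding shared locks
        self.exclusive_holder = None  # process holding the exclusive lock

    def try_shared(self, pid) -> bool:
        # Granted if no process holds an exclusive lock;
        # any number of concurrent sharers is allowed.
        if self.exclusive_holder is None:
            self.shared_holders.add(pid)
            return True
        return False        # pending until the exclusive lock is released

    def try_exclusive(self, pid) -> bool:
        # Granted only when no other process holds any lock on the block.
        if self.exclusive_holder is None and not self.shared_holders:
            self.exclusive_holder = pid
            return True
        return False        # kept pending until earlier locks are released

    def release(self, pid):
        self.shared_holders.discard(pid)
        if self.exclusive_holder == pid:
            self.exclusive_holder = None
```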

Chapter-14

1. Indices speed query processing, but it is usually a bad idea to create indices on every attribute, and every
combination of attributes, that are potential search keys. Explain why. (Page-679)
Answer :
Reasons for not keeping indices on every attribute include:
• Every index requires additional CPU time and disk I/O overhead during inserts and deletions.
• Indices on non-primary keys might have to be changed on updates, although an index on the primary key
might not (this is because updates typically do not modify the primary-key attributes).
• Each extra index requires additional storage space.
• For queries which involve conditions on several search keys, efficiency might not be bad even if only some
of the keys have indices on them. Therefore, database performance is improved less by adding indices when
many indices already exist.

2. What factors should be considered when evaluating indexing techniques for a database and why are they
important to evaluate? (Page-624)

• Access types: The types of access that are supported efficiently. Access types can include finding records with a
specified attribute value and finding records whose attribute values fall in a specified range.

• Access time: The time it takes to find a particular data item, or set of items, using the technique in question.

• Insertion time: The time it takes to insert a new data item. This value includes the time it takes to find the
correct place to insert the new data item, as well as the time it takes to update the index structure.

• Deletion time: The time it takes to delete a data item. This value includes the time it takes to find the item to
be deleted, as well as the time it takes to update the index structure.

• Space overhead: The additional space occupied by an index structure. Provided that the amount of additional
space is moderate, it is usually worthwhile to sacrifice the space to achieve improved performance.

3. What are the major distinctions between dense and sparse indexes in terms of their structure and
functionality, and how do these distinctions affect their utilization in ordered indexing? (Page-626)

1. Dense Index:
o Dense Clustering Index: Contains an index entry for every search-key value, with each entry
pointing to the first record of that search-key value. Subsequent records with the same key are
stored sequentially in the file.
o Dense Nonclustering Index: Includes an index entry for every search-key value, with each entry
pointing to a list of pointers to all records with that search-key value.
2. Sparse Index: Contains index entries for only some search-key values and requires the data to be
sorted by the search key. Each entry points to the first record with that search-key value. To
find a record, locate the index entry with the largest search-key value less than or equal to the
target value, then scan sequentially from there to find the desired record.
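
A toy sketch of the sparse-index lookup procedure just described, assuming the file is a sorted list of keys and the index holds an entry for every third record (all names illustrative):

```python
import bisect

# Toy sketch of a sparse-index lookup. The "file" is a list of records
# sorted on the search key; the index holds an entry for only every
# k-th record (names and parameters here are illustrative).

def build_sparse_index(records, every=3):
    """Index entries: (search-key value, position of first record with it)."""
    return [(records[i], i) for i in range(0, len(records), every)]

def lookup(records, index, key):
    keys = [k for k, _ in index]
    # Locate the index entry with the largest key <= the target...
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None
    # ...then scan sequentially from the record it points to.
    pos = index[i][1]
    while pos < len(records) and records[pos] <= key:
        if records[pos] == key:
            return pos
        pos += 1
    return None

data = [2, 5, 7, 11, 13, 17, 19, 23, 29]
idx = build_sparse_index(data)      # entries for keys 2, 11, 19
print(lookup(data, idx, 13))        # -> 4 (position of key 13)
```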

Chapter-15

1. What are the two primary benefits of creating a pipeline of operations in query evaluation, and how do these
benefits impact query performance and user experience? (Page-726)

Creating a pipeline of operations can provide two benefits:

1. It eliminates the cost of reading and writing temporary relations, reducing the cost of query evaluation. Note
that the cost formulae that we saw earlier for each operation included the cost of reading the result from disk. If
the input to an operator oi is pipelined from a preceding operator oj, the cost of oi should not include the cost of
reading the input from disk; the cost formulae that we saw earlier can be modified accordingly.

2. It can start generating query results quickly, if the root operator of a query evaluation plan is combined in a
pipeline with its inputs. This can be quite useful if the results are displayed to a user as they are generated, since
otherwise there may be a long delay before the user sees any query results.
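
Both benefits can be illustrated with the iterator (pull) model of pipelining, sketched here with Python generators; this is a conceptual sketch, not any particular engine's implementation:

```python
# Sketch of pipelined evaluation using generators: each operator pulls
# tuples from its input one at a time, so no temporary relation is
# written out, and the first results appear immediately.

def table_scan(rows):
    for row in rows:
        yield row

def select(pred, child):
    for row in child:
        if pred(row):
            yield row

def project(cols, child):
    for row in child:
        yield {c: row[c] for c in cols}

# A lazily produced "relation"; nothing is materialized up front.
rows = ({"id": i, "val": i * i} for i in range(10**9))
plan = project(["id"], select(lambda r: r["val"] % 2 == 0, table_scan(rows)))

# The user sees the first result without waiting for the whole input:
print(next(plan))   # -> {'id': 0}
```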

2. The indexed nested-loop join algorithm can be inefficient if the index is a secondary index and there are
multiple tuples with the same value for the join attributes. Why is it inefficient? Describe a way, using sorting,
to reduce the cost of retrieving tuples of the inner relation. Under what conditions would this algorithm be more
efficient than hybrid merge join? (Page-737)

Answer :

If there are multiple tuples in the inner relation with the same value for the join attributes, we may have to
access that many blocks of the inner relation for each tuple of the outer relation. That is why it is inefficient. To
reduce this cost we can perform a join of the outer relation tuples with just the secondary index leaf entries,
postponing the inner relation tuple retrieval. The result file obtained is then sorted on the inner relation
addresses, allowing an efficient physical-order scan to complete the join.

Hybrid merge join requires the outer relation to be sorted. The above algorithm does not have this
requirement, but for each tuple in the outer relation it needs to perform an index lookup on the inner relation. If
the outer relation is much larger than the inner relation, this index lookup cost will be less than the sorting cost,
and thus this algorithm will be more efficient.
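
A sketch of the sort-based optimization described above, with hypothetical names: the index is probed first, the intermediate result is sorted on inner-record addresses, and only then are inner tuples fetched:

```python
# Sketch of the optimization described above (all names hypothetical):
# join outer tuples with just the secondary-index leaf entries, sort the
# intermediate result on the inner relation's record addresses, then
# fetch inner tuples in physical order so each block is read at most once.

def index_join_with_sorted_fetch(outer, leaf_index, fetch_inner):
    """leaf_index: join-key -> list of inner record addresses.
    fetch_inner: address -> inner tuple (the expensive random I/O)."""
    # Phase 1: probe the index only; no inner tuples are fetched yet.
    pending = [(addr, o) for o in outer
               for addr in leaf_index.get(o["key"], [])]
    # Phase 2: sort on inner addresses, turning random I/O into a
    # near-sequential scan of the inner relation.
    pending.sort(key=lambda p: p[0])
    return [(o, fetch_inner(addr)) for addr, o in pending]

# Example with toy data: addresses are block numbers.
out = index_join_with_sorted_fetch(
    outer=[{"key": "a"}, {"key": "b"}],
    leaf_index={"a": [42, 7], "b": [7]},
    fetch_inner=lambda addr: {"addr": addr},
)
```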

3. What factors contribute to the difficulty in estimating the response time of a query-evaluation plan? (Page-
694)

The response time for a query-evaluation plan (that is, the wall-clock time required to execute the plan),
assuming no other activity is going on in the computer, would account for all these costs, and could be used as a
measure of the cost of the plan. Unfortunately, the response time of a plan is very hard to estimate without
actually executing the plan, for the following two reasons:

1. The response time depends on the contents of the buffer when the query begins execution; this information is
not available when the query is optimized and is hard to account for even if it were available.

2. In a system with multiple disks, the response time depends on how accesses are distributed among disks,
which is hard to estimate without detailed knowledge of data layout on disk.

Chapter-16

1. What are the key requirements for efficiently optimizing query evaluation plans, and how do each of these
requirements contribute to the overall efficiency of the approach? (Page-771)

To make the approach work efficiently requires the following:


1. A space-efficient representation of expressions that avoids making multiple copies of the same
subexpressions when equivalence rules are applied.
2. Efficient techniques for detecting duplicate derivations of the same expression.
3. A form of dynamic programming based on memoization, which stores the optimal query evaluation plan
for a subexpression when it is optimized for the first time; subsequent requests to optimize the same
subexpression are handled by returning the already memoized plan.
4. Techniques that avoid generating all possible equivalent plans by keeping track of the cheapest plan
generated for any subexpression up to any point of time, and pruning away any plan that is more expensive
than the cheapest plan found so far for that subexpression.
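
Points 3 and 4 can be sketched abstractly as follows (the interfaces are hypothetical; real optimizers work on canonical expression representations):

```python
# Abstract sketch of memoization with cost-based pruning: memoize the
# best plan per subexpression, and discard any candidate costlier than
# the cheapest plan already found for that subexpression.

best_plan = {}   # canonical subexpression -> (plan, cost)

def canonical(expr):
    # Placeholder: real optimizers use a canonical form so that
    # equivalent subexpressions map to the same memo entry.
    return repr(expr)

def optimize(expr, generate_candidates, cost):
    key = canonical(expr)
    if key in best_plan:                 # memoization: reuse earlier work
        return best_plan[key]
    cheapest = None
    for plan in generate_candidates(expr):
        c = cost(plan)
        # Pruning: keep only the cheapest plan seen so far.
        if cheapest is None or c < cheapest[1]:
            cheapest = (plan, c)
    best_plan[key] = cheapest
    return cheapest
```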

2. How can expression-representation techniques and cost-based optimization strategies help reduce the
space and time costs associated with query optimization? (Page-757)
The preceding process is extremely costly both in space and in time, but optimizers can greatly reduce
both the space and time cost, using two key ideas.
1. If we generate an expression E′ from an expression E1 by using an equivalence rule on subexpression
ei, then E′ and E1 have identical subexpressions except for ei and its transformation. Even ei and its
transformed version usually share many identical subexpressions. Expression-representation techniques
that allow both expressions to point to shared subexpressions can reduce the space requirement
significantly.
2. It is not always necessary to generate every expression that can be generated with the equivalence
rules. If an optimizer takes cost estimates of evaluation into account, it may be able to avoid examining
some of the expressions. We can reduce the time required for optimization by using techniques such as
these.

Chapter-17

1. What are the four essential attributes of a database system that must guarantee transactions' behavior and
reliability? (Page-800)
• Atomicity. Either all operations of the transaction are reflected properly in the database, or none are.
• Consistency. Execution of a transaction in isolation preserves the consistency of the database.
• Isolation. Even though multiple transactions may execute concurrently, the system guarantees that, for
every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti
started or Tj started execution after Ti finished. Thus, each transaction is unaware of other
transactions executing concurrently in the system.
• Durability. After a transaction completes successfully, the changes it has made to the database persist,
even if there are system failures.

2. What is the difference between volatile, non-volatile, and stable storage, and how does it affect data
persistence and system reliability? (Page-804)

• Volatile Storage: Volatile storage, such as main memory and cache memory, does not retain
information through system crashes. It is very fast and allows direct access to data items but is not
suitable for long-term data retention.

• Non-Volatile Storage: Non-volatile storage, including magnetic disks, flash storage, optical media,
and magnetic tapes, retains information through system crashes. Although slower than volatile storage
and prone to potential failures, it is used for both online and archival data storage.

• Stable Storage: Stable storage aims to ensure that information is never lost, even in extreme
scenarios. While truly stable storage is theoretically impossible, it is closely approximated by replicating
data across multiple non-volatile media with independent failure modes to minimize the risk of data loss.

Chapter-18

1. Show by example that there are schedules possible under the tree protocol that are not possible under the
two-phase locking protocol, and vice versa. (Page-899)

Answer :

Consider the tree-structured database graph given below.

[Figure omitted: tree-structured database graph]

Schedule possible under the tree protocol but not under 2PL:

[Figure omitted: example schedule]

Schedule possible under 2PL but not under the tree protocol:

[Figure omitted: example schedule]

2. Describe the process of managing concurrent transactions in a multiversion concurrency-control scheme,
and discuss the specific implications of write operations in multiversion timestamp ordering and
multiversion two-phase locking. (Page-896)
A multiversion concurrency-control scheme is based on the creation of a new version of a data item for
each transaction that writes that item. When a read operation is issued, the system selects one of the
versions to be read. The concurrency-control scheme ensures that the version to be read is selected in a
manner that ensures serializability, by using timestamps. A read operation always succeeds.
° In multiversion timestamp ordering, a write operation may result in the rollback of the transaction.
° In multiversion two-phase locking, write operations may result in a lock wait or, possibly, in deadlock.
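
A sketch of the multiversion timestamp-ordering rules summarized above (structure and names illustrative): reads always succeed, while a write may force a rollback if a younger transaction has already read the version the write would logically follow:

```python
# Sketch of multiversion timestamp ordering (illustrative only).
# Each version carries the timestamps the scheme maintains:
#   w_ts: timestamp of the transaction that created the version
#   r_ts: largest timestamp of any transaction that read the version

from dataclasses import dataclass

@dataclass
class Version:
    value: object
    w_ts: int
    r_ts: int

def read(versions, ts):
    """Reads always succeed: return the version with the largest
    write timestamp <= the reader's timestamp."""
    v = max((x for x in versions if x.w_ts <= ts), key=lambda x: x.w_ts)
    v.r_ts = max(v.r_ts, ts)
    return v.value

def write(versions, ts, value):
    """May force a rollback: if a younger transaction already read the
    version this write would logically follow, the writer must abort."""
    v = max((x for x in versions if x.w_ts <= ts), key=lambda x: x.w_ts)
    if ts < v.r_ts:
        raise Exception("rollback: a later reader already saw this version")
    if v.w_ts == ts:
        v.value = value                          # overwrite own version
    else:
        versions.append(Version(value, w_ts=ts, r_ts=ts))

versions = [Version("v0", w_ts=0, r_ts=0)]       # initial version
print(read(versions, ts=5))                      # -> "v0"
write(versions, ts=5, value="v1")                # creates a new version
```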

Chapter-19

1. What are the important fields that make up an update log record, and what information does each field give
about a database write operation? (Page-913)
There are several types of log records. An update log record describes a single database write. It has
these fields:
• Transaction identifier, which is the unique identifier of the transaction that performed the write
operation.
• Data-item identifier, which is the unique identifier of the data item written. Typically, it is the location
on disk of the data item, consisting of the block identifier of the block on which the data item resides
and an offset within the block.
• Old value, which is the value of the data item prior to the write.
• New value, which is the value that the data item will have after the write.
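
The four fields map naturally onto a record such as the following sketch (structure illustrative; real log records are compactly binary-encoded):

```python
# Sketch of an update log record with the four fields described above
# (field names and encoding are illustrative, not any real system's).

from dataclasses import dataclass

@dataclass(frozen=True)
class UpdateLogRecord:
    txn_id: str        # transaction identifier, e.g. "T17"
    block_id: int      # data-item identifier: the block on disk...
    offset: int        # ...plus the offset within that block
    old_value: bytes   # value before the write (used for undo)
    new_value: bytes   # value after the write (used for redo)

rec = UpdateLogRecord("T17", block_id=4096, offset=128,
                      old_value=b"\x00\x0a", new_value=b"\x00\x0b")
```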

2. What are the various faults that can occur in a system, and how should they be addressed based on the
fail-stop assumption and other recovery mechanisms? (Page-907)

• Transaction failure. There are two types of errors that may cause a transaction to fail:
° Logical error.
° System error.
• System crash. There is a hardware malfunction, or a bug in the database software or the operating
system, that causes the loss of the content of volatile storage and brings transaction processing to a halt.
The assumption that hardware errors and bugs in the software bring the system to a halt, but do not
corrupt the non-volatile storage contents, is known as the fail-stop assumption. Well-designed systems
have numerous internal checks, at the hardware and the software level, that bring the system to a halt
when there is an error.
• Disk failure. A disk block loses its content as a result of either a head crash or a failure during a
data-transfer operation. Copies of the data on other disks, or archival backups on tertiary media, such as
DVD or tapes, are used to recover from the failure.
