Dbms Notes
NOTES ON DATABASE MANAGEMENT SYSTEM
BY D.K
Debasis kamila 9432208397 DATABASE MANAGEMENT SYSTEMS
Ans: Database – A database is a collection of related data and/or information stored so that it is
available to many users for different purposes.
Advantages Of DBMS
1. Centralized Management and Control - One of the main advantages of using a
database system is that the organization can exert, via the DBA, centralized management and control
over the data.
2. Reduction of Redundancies and Inconsistencies - Centralized control avoids
unnecessary duplication of data and effectively reduces the total amount of data storage required.
Removing redundancy eliminates inconsistencies.
3. Data Sharing - A database allows the sharing of data under its control by any number
of application programs or users.
4. Data Integrity - Data integrity means that the data contained in the database is both
accurate and consistent. Centralized control can also ensure that adequate checks are incorporated in the
DBMS to provide data integrity.
5. Data Security - Data is of vital importance to an organization and may be
confidential. Such confidential data must not be accessed by unauthorized persons. The DBA who has
the ultimate responsibility for the data in the DBMS can ensure that proper access procedures are
followed. Different levels of security could be implemented for various types of data and operations.
6. Data Independence - Data independence is the capacity to change the schema at one
level of a database system without having to change the schema at the next level. It is usually
considered from two points of view: physical data independence and logical data independence.
Physical data independence is the capacity to change the internal schema without having to change
conceptual schema. Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs.
7. Providing Storage Structures for Efficient Query Processing - Database systems
provide capabilities for efficiently executing queries and updates. Auxiliary files called indexes are used
for this purpose.
8. Backup and Recovery - These facilities are provided to recover databases from
hardware and/or software failures.
Some other advantages are:
▪ Reduced Application Development Time
▪ Flexibility
▪ Availability of up-to-date Information
Disadvantages Of DBMS
1. Cost of Software/Hardware and Migration - A significant disadvantage of the
DBMS system is cost.
2. Mappings between the internal and the conceptual levels, as well as between the
conceptual and external levels, are also defined by the DBA.
3. DBA ensures that appropriate measures are in place to maintain the integrity of the
database and that the database is not accessible to unauthorized users.
4. DBA is responsible for granting permission to the users of the database and stores
the profile of each user in the database.
5. DBA is responsible for defining procedures to recover the database from failures
with minimal loss of data.
Explain the terms primary key, candidate key and foreign key. Give an example for each.
(7)
Ans: Primary Key – The primary key is the candidate key chosen to uniquely identify
each row in the relation.
Candidate Key – A candidate key of an entity set is a minimal superkey, that is, a minimal set of
attributes that uniquely identifies each row in the relation.
Foreign Key – Let R and S be two relations (tables). Any candidate key of the relation R that is
referred to in the relation S is called a foreign key in S and the referenced key in R.
The relation R is also called the parent table and the relation S the child table.
For example:
STUDENT
GRADE
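The three kinds of key can be tried out with a small sketch using Python's built-in sqlite3 module. The column names (ROLL_NO, EMAIL, NAME, COURSE, LETTER) and the sample rows are hypothetical, chosen only for illustration; they are not taken from the STUDENT and GRADE tables of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table. ROLL_NO and EMAIL are both candidate keys;
# ROLL_NO is the one chosen as the primary key.
conn.execute(
    "CREATE TABLE STUDENT ("
    " ROLL_NO INTEGER PRIMARY KEY,"
    " EMAIL TEXT NOT NULL UNIQUE,"
    " NAME TEXT)"
)
# Child table. ROLL_NO is a foreign key that refers to STUDENT.
conn.execute(
    "CREATE TABLE GRADE ("
    " ROLL_NO INTEGER REFERENCES STUDENT(ROLL_NO),"
    " COURSE TEXT,"
    " LETTER TEXT,"
    " PRIMARY KEY (ROLL_NO, COURSE))"
)

conn.execute("INSERT INTO STUDENT VALUES (1, 'a@example.com', 'Asha')")
conn.execute("INSERT INTO GRADE VALUES (1, 'DBMS', 'A')")


def rejected(sql):
    """Return True if the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Same primary key value twice.
dup_primary = rejected("INSERT INTO STUDENT VALUES (1, 'b@example.com', 'Bina')")
# Same value for the other candidate key (EMAIL).
dup_candidate = rejected("INSERT INTO STUDENT VALUES (2, 'a@example.com', 'Bina')")
# Child row whose foreign key has no matching parent row.
orphan_grade = rejected("INSERT INTO GRADE VALUES (99, 'DBMS', 'B')")
```

All three statements are refused: the first two break uniqueness of a candidate key, the third breaks the parent/child link.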
Differentiate between logical database design and physical database design. Show how this separation
leads to data independence. (7)
Ans:
Basis: Task
  Logical Database Design: Maps or transforms the conceptual schema (or an ER schema) from the
  high-level data model into a relational database schema.
  Physical Database Design: The specifications for the stored database in terms of physical storage
  structures, record placement, and indexes are designed.
Basis: Choice of criteria
  Logical Database Design: The mapping can proceed in two stages:
    ▪ System-independent mapping but data model-dependent
    ▪ Tailoring the schemas to a specific DBMS
  Physical Database Design: The following criteria are often used to guide the choice of physical
  database design options:
    ▪ Response Time
    ▪ Space Utilization
    ▪ Transaction Throughput
Basis: Result
  Logical Database Design: DDL statements in the language of the chosen DBMS that specify the
  conceptual and external level schemas of the database system. But if the DDL statements include
  some physical design parameters, a complete DDL specification must wait until after the physical
  database design phase is completed.
  Physical Database Design: An initial determination of storage structures and the access paths for
  the database files. This corresponds to defining the internal schema in terms of Data Storage
  Definition Language.
The database design is divided into several phases; logical database design and physical database
design are two of them. This separation follows the three-level architecture of a DBMS, which
provides data independence. The separation leads to data independence because the output of
logical database design, the conceptual and external level schemas of the database system, is
independent of the output of physical database design, the internal schema.
Consider the following relation schemes: (2×7=14)
Project (Project#, Project_name, chief_architect)
Employee (Emp#, Empname)
Assigned_To (Project#, Emp#)
Give expression in Tuple calculus and Domain calculus for each of the queries below:
(i) Get the employee numbers of employees who work on all projects.
(ii) Get the employee numbers of employees who do not work on the COMP123
project.
Ans:
(i) Tuple Calculus:
{t[Emp#] | t ∈ ASSIGNED_TO ∧ ∀p (p ∈ PROJECT → ∃u (u ∈ ASSIGNED_TO ∧
p[Project#] = u[Project#] ∧ t[Emp#] = u[Emp#]))}
Domain Calculus:
{e | ∃p (<p, e> ∈ ASSIGNED_TO ∧ ∀p1 ∀n1 ∀c1 (<p1, n1, c1> ∈ PROJECT
→ <p1, e> ∈ ASSIGNED_TO))}
(ii) Tuple Calculus:
{t[Emp#] | t ∈ ASSIGNED_TO ∧ ¬∃u (u ∈ ASSIGNED_TO ∧
u[Project#] = 'COMP123' ∧ t[Emp#] = u[Emp#])}
Domain Calculus:
{e | ∃p (<p, e> ∈ ASSIGNED_TO ∧ ∀p1, e1 (<p1, e1> ∉ ASSIGNED_TO ∨
p1 ≠ 'COMP123' ∨ e1 ≠ e))}
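The two queries can be checked with Python set comprehensions, which read almost like the calculus expressions: `all(...)` plays the role of the universal quantifier and `any(...)` the existential one. This is only a sketch; the sample tuples, project names and architects below are made up for illustration.

```python
# Hypothetical sample data for the PROJECT and ASSIGNED_TO relations.
project = {
    ("COMP123", "Compiler", "Rao"),
    ("DB456", "Database", "Sen"),
}
assigned_to = {
    ("COMP123", 1),
    ("DB456", 1),
    ("DB456", 2),
    ("COMP123", 3),
}

# (i) Employees who work on all projects:
# for every project p there is an assignment (p, e).
works_on_all = {
    e
    for (_, e) in assigned_to
    if all((p, e) in assigned_to for (p, _, _) in project)
}

# (ii) Employees who do not work on COMP123:
# no assignment pairs this employee with project COMP123.
not_on_comp123 = {
    e
    for (_, e) in assigned_to
    if not any(p == "COMP123" and e1 == e for (p, e1) in assigned_to)
}
```

As in the expressions above, both queries range over ASSIGNED_TO, so an employee with no assignment at all does not appear in either result.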
Define the five basic operators of relational algebra with an example each.
Ans: Five basic operators of relational algebra are:
1. Union (∪) - Selects tuples that are in either P or Q or in both of them. The
duplicate tuples are eliminated.
R = P ∪ Q
2. Minus (–) - Removes from the first relation the tuples it has in common with the second.
R = P – Q
3. Cartesian Product or Cross Product (×) - The cartesian product of two relations
is the concatenation of tuples belonging to the two relations, consisting of all possible combinations
of the tuples.
R = P × Q
For Example:
P:
ID    Name
101   Jones
103   Smith
104   Lalonde
Q:
ID    Name
100   John
104   Lalonde
R = P ∪ Q:
ID    Name
100   John
101   Jones
103   Smith
104   Lalonde
R = P – Q:
ID    Name
101   Jones
103   Smith
R = P × Q:
P.ID  P.Name   Q.ID  Q.Name
101   Jones    100   John
101   Jones    104   Lalonde
103   Smith    100   John
103   Smith    104   Lalonde
104   Lalonde  100   John
104   Lalonde  104   Lalonde
4. Projection (π) - The projection of a relation is defined as a projection of all its tuples over
some set of attributes, i.e., it yields a vertical subset of the relation. It is used to either reduce the
number of attributes (degree) in the resultant relation or to reorder attributes. The projection of a
relation T on the attribute A is denoted by π_A(T).
5. Selection (σ) - Selects only those tuples of the relation that satisfy given criteria.
It yields a horizontal subset of a given relation, i.e., the action is defined over a complete set of
attribute names but only a subset of the tuples is included in the result. R = σ_B(P)
For Example:
EMPLOYEE:
Id    Name
101   Jones
103   Smith
104   Lalonde
106   Byron
Result of Projection of EMPLOYEE over Name:
Name
Jones
Smith
Lalonde
Byron
Result of Selection over EMPLOYEE for ID > 103:
Id    Name
104   Lalonde
106   Byron
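The five operators map directly onto Python sets of tuples. The sketch below uses the relations P and Q as implied by the result tables of the example above, and the EMPLOYEE relation of the selection example; the variable names are illustrative only.

```python
# Relations as sets of (ID, Name) tuples.
P = {(101, "Jones"), (103, "Smith"), (104, "Lalonde")}
Q = {(100, "John"), (104, "Lalonde")}
EMPLOYEE = {(101, "Jones"), (103, "Smith"), (104, "Lalonde"), (106, "Byron")}

# 1. Union: tuples in P, in Q, or in both (a set drops duplicates).
union = P | Q

# 2. Minus: tuples of P that do not also occur in Q.
minus = P - Q

# 3. Cartesian product: every tuple of P concatenated with every tuple of Q.
product = {p + q for p in P for q in Q}

# 4. Projection over Name: keep one column, a vertical subset.
projection = {(name,) for (_id, name) in EMPLOYEE}

# 5. Selection for ID > 103: keep matching rows, a horizontal subset.
selection = {t for t in EMPLOYEE if t[0] > 103}
```

Because a set cannot hold duplicates, union and projection eliminate repeated tuples without any extra step, which matches the set semantics of relational algebra.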
Explain entity integrity and referential integrity rules in relational model. Show how these
are realized in SQL.
Ans:
Entity Integrity Rule – No primary key value can be null.
Referential Integrity Rule – In referential integrity, it is ensured that a value that appears in one
relation for a given set of attributes also appears for a certain set of attributes in another relation.
In SQL, entity integrity and referential integrity rules are implemented as constraints on the relation,
called the primary key constraint and the foreign key (REFERENCES) constraint respectively. These
constraints can be specified when the relations are created, or afterwards by
altering the definition of the relations. For example:
CREATE TABLE DEPT
(DEPTNO NUMBER PRIMARY KEY,
DNAME VARCHAR2(15));
CREATE TABLE EMP
(EMPNO NUMBER PRIMARY KEY,
What are the advantages of embedded query language? Give an example of an embedded SQL
query.
Ans:
Embedded query language – SQL can be used in three ways: interactively, embedded in a host
language, or through an API. The use of SQL commands within a host language (e.g.,
C, Java, etc.) program is called embedded query language or Embedded SQL. Although similar
capabilities are supported for a variety of host languages, the syntax sometimes varies. Some of the
advantages of embedded SQL are:
▪ SQL statements can be used wherever a statement in the host language is allowed.
▪ It combines the strengths of two programming environments, the procedural features of
host languages and non-procedural features of SQL.
▪ SQL statements can refer to variables (must be prefixed by a colon in SQL statements)
defined in the host program.
▪ Special program variables (called null indicators) are used to assign and retrieve the
NULL values to and from the database.
▪ The facilities available through the interactive query language are also automatically
available to the host programs.
▪ Embedded SQL along with host languages can be used to accomplish very complex and
complicated data access and manipulation tasks.
Example: The following Embedded SQL statement in C inserts a row whose column values are taken
from the host language variables it contains.
EXEC SQL
INSERT INTO Sailors VALUES (:c_sname, :c_sid, :c_rating, :c_age);
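Embedded SQL in C needs a precompiler, so it is not easy to try out directly. The same idea of colon-prefixed host variables can be seen with Python's sqlite3 module, which accepts named parameters written as :name. Note that this is the API approach, not Embedded SQL itself, and that the column types of Sailors and the sample values are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names and types are assumed; the notes give only the INSERT.
conn.execute(
    "CREATE TABLE Sailors (sname TEXT, sid INTEGER, rating INTEGER, age REAL)"
)

# Host-language variables. None takes the place of a null indicator
# and is stored as SQL NULL.
c_sname, c_sid, c_rating, c_age = "Dustin", 22, 7, None

conn.execute(
    "INSERT INTO Sailors VALUES (:c_sname, :c_sid, :c_rating, :c_age)",
    {"c_sname": c_sname, "c_sid": c_sid, "c_rating": c_rating, "c_age": c_age},
)

row = conn.execute("SELECT sname, sid, rating, age FROM Sailors").fetchone()
```

The SQL text stays fixed while the values come from program variables, which is the same division of labour that host variables give in Embedded SQL.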
Define a view and a trigger. Construct a view for the above relations which has the information
about suppliers and the parts they supply. The view contains the S#, SNAME, P# , PNAME
renamed as SNO, NAME, PNO, PNAME.
Ans:
View – A view is a virtual table which is based on one or more physical tables and/or views. In
other words, a view is a named table that is represented, not by its own physically separate stored data,
but by its definition in terms of other named tables (base tables or views).
Trigger – A trigger is a procedure that is automatically invoked by the DBMS in response to
specified changes to the database. Triggers may be used to supplement declarative referential integrity,
to enforce complex business rules or to audit changes to data.
Command:
CREATE VIEW SUP_PART (SNO, NAME, PNO, PNAME) AS
SELECT S.S#, SNAME, P.P#, PNAME
FROM S, SP, P
WHERE S.S# = SP.S# AND P.P# = SP.P#
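The view can be exercised in SQLite through Python's sqlite3 module, as a sketch. The table definitions and sample rows are assumptions, since the relations the question refers to are not reproduced in these notes. In SQLite, column names containing # must be written as quoted identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S  ("S#" TEXT PRIMARY KEY, SNAME TEXT);
CREATE TABLE P  ("P#" TEXT PRIMARY KEY, PNAME TEXT);
CREATE TABLE SP ("S#" TEXT REFERENCES S("S#"),
                 "P#" TEXT REFERENCES P("P#"));

INSERT INTO S  VALUES ('S1', 'Smith'), ('S2', 'Jones');
INSERT INTO P  VALUES ('P1', 'Nut'), ('P2', 'Bolt');
INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P2');

-- The view stores only this definition, not a copy of the rows.
CREATE VIEW SUP_PART (SNO, NAME, PNO, PNAME) AS
  SELECT S."S#", SNAME, P."P#", PNAME
  FROM S, SP, P
  WHERE S."S#" = SP."S#" AND P."P#" = SP."P#";
""")

# The view is queried like any other table, using the renamed columns.
rows = conn.execute(
    "SELECT SNO, NAME, PNO, PNAME FROM SUP_PART ORDER BY SNO, PNO"
).fetchall()
```

A row added to SP afterwards shows up in SUP_PART on the next query, because the view is evaluated from the base tables each time.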
Ans: (i) Theta Join – The theta join operation is an extension to the natural-join operation that allows us
to combine a selection and a Cartesian product into a single operation. Consider relations r(R) and s(S),
and let θ be a predicate on attributes in the schema R ∪ S. The theta join operation r ⋈θ s is defined as
follows:
r ⋈θ s = σθ(r × s)
(ii) Equi Join – It produces all the combinations of tuples from two relations that satisfy a
join condition with only equality comparison (=).
(iii) Natural Join - Same as equi-join except that the join attributes (having the same
names) are not duplicated in the resulting relation. Only one set of the domain-compatible attributes
involved in the natural join is present.
(iv) Outer Join - Rows in one table that have no corresponding
value(s) in the other are not selected by an equi-join. Such rows can be forcefully selected by
using the outer join. The columns taken from the other table will have NULLs for such a row. There are three
forms of the outer-join operation: left outer join (⟕), right outer join (⟖) and full outer join (⟗).
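The four joins can be compared side by side in SQLite through Python's sqlite3 module. This is a sketch with hypothetical EMP and DEPT tables and made-up rows; only the left outer join is shown because older SQLite versions do not support the right and full forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP  (ENAME TEXT, DEPTNO INTEGER);
CREATE TABLE DEPT (DEPTNO INTEGER, DNAME TEXT);
INSERT INTO EMP  VALUES ('Jones', 10), ('Smith', 20), ('Byron', NULL);
INSERT INTO DEPT VALUES (10, 'Sales'), (20, 'Research'), (30, 'Audit');
""")


def query(sql):
    return conn.execute(sql).fetchall()


# Theta join: a Cartesian product filtered by an arbitrary predicate (< here).
theta = query(
    "SELECT ENAME, DNAME FROM EMP, DEPT "
    "WHERE EMP.DEPTNO < DEPT.DEPTNO ORDER BY ENAME, DNAME"
)

# Equi join: the predicate uses only equality; DEPTNO appears twice.
equi = query(
    "SELECT * FROM EMP JOIN DEPT ON EMP.DEPTNO = DEPT.DEPTNO ORDER BY ENAME"
)

# Natural join: equality on the same-named column, which appears once.
natural = query("SELECT * FROM EMP NATURAL JOIN DEPT ORDER BY ENAME")

# Left outer join: unmatched EMP rows are kept, padded with NULL.
left_outer = query(
    "SELECT ENAME, DNAME FROM EMP LEFT OUTER JOIN DEPT "
    "ON EMP.DEPTNO = DEPT.DEPTNO ORDER BY ENAME"
)
```

Byron has no department, so he is missing from the equi and natural joins and appears only in the outer join, with None (NULL) in place of the department name.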
Draw and explain the three level architecture of the database system.
Ans:
A DBMS that provides three levels of data description is said to follow the three-level architecture.
The goal of the three-schema architecture is to separate the user applications from the physical
database. The view at each of these levels is described by a schema. The processes of transforming
requests and results between levels are called mappings. In this architecture, schemas can be defined
at the following three levels:
▪ External Level or Subschema – It is the highest level of database abstraction where only
those portions of the database of concern to a user or application program are included. Any number of
user views (some of which may be identical) may exist for a given global or conceptual view. Each
external view is described by means of a schema called an external schema or subschema.
▪ Conceptual Level or Conceptual Schema - At this level of database abstraction all the
database entities and the relationships among them are included. One conceptual view represents the
entire database. This conceptual view is defined by the conceptual schema. There is only one
conceptual schema per database. The description of data at this level is in a format independent of its
physical representation. It also includes features that specify the checks to retain data consistency and
integrity.
▪ Internal Level or Physical Schema – It is closest to the physical storage method used. It
indicates how the data will be stored and describes the data structures and access methods to be used by
the database. The internal view is expressed by the internal schema.
Explain (a) Heap file (b) Sorted file. Also discuss their advantages and disadvantages.
Ans: Heap File – A heap file is an unordered set of records, stored on a set of pages. It provides basic
support for inserting, selecting, updating, and deleting records. Temporary heap files are used for
external sorting and in other relational operators. A sequential scan of a heap file (via the Scan class) is
the most basic access method. Insertion is cheap because a new record can be placed on any page
with free space, but searching for a record requires a scan of the file.
Sorted File – A sorted file keeps its records ordered on one or more sort keys. Retrieval in key order
and searching on the key are efficient, but insertions and deletions are costly because the order must
be maintained. A sort utility that produces or checks such files shall perform one of the following
functions:
1. Sort lines of all the named files together and write the result to the specified output.
2. Merge lines of all the named (presorted) files together and write the result to the specified
output.
3. Check that a single input file is correctly presorted.
Comparisons shall be based on one or more sort keys extracted from each line of input (or, if no sort
keys are specified, the entire line up to, but not including, the terminating
<newline>), and shall be performed using the collating sequence of the current locale.
Describe a method for direct search. Explain how data is stored in a file so that direct
searching can be performed.
Ans: For a file of unordered fixed-length records using unspanned blocks and contiguous allocation, it
is straightforward to access any record by its position in the file. If the file records are numbered
0, 1, 2, ..., r-1 and the records in each block are numbered 0, 1, ..., bfr-1, where bfr is the blocking
factor, then the ith record of the file is located in block ⌊i/bfr⌋ and is the (i mod bfr)th record in that
block. Such a file is often called a relative or direct file because records can easily be accessed
directly by their relative positions. Accessing a record by its position does not help locate a record
based on a search condition; however, it facilitates the construction of
access paths on the file, such as indexes.
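The block and slot arithmetic can be written out as a short Python sketch. The record size, the blocking factor and the record contents below are arbitrary values chosen for illustration, and the file is simulated by a bytes object.

```python
RECORD_SIZE = 8  # bytes per fixed-length record (assumed value)
BFR = 4          # blocking factor: records per block (assumed value)


def locate(i, bfr=BFR):
    """Return (block number, position within block) of record i, both 0-based."""
    return i // bfr, i % bfr


# A simulated file: 10 fixed-length records stored contiguously.
data = b"".join(f"REC{n:05d}".encode() for n in range(10))


def read_record(i):
    """Fetch record i directly from its byte offset, with no scanning."""
    start = i * RECORD_SIZE
    return data[start:start + RECORD_SIZE]


block, slot = locate(7)
record = read_record(7)
```

With a blocking factor of 4, record 7 sits in block 1 at position 3, and its byte offset follows from the record number alone. This works only because every record has the same length.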
Explain the integrity constraints: Not Null, Unique, Primary Key with an example each. Is the
combination ‘Not Null, Primary Key’ a valid combination? Justify.
Ans: Not Null – The attribute should contain valid values and cannot be NULL.
Unique – An attribute or a combination of two or more attributes must have a unique value in each
row. The unique key can have NULL values.
Primary Key – It is the same as a unique key but cannot have NULL values. A table can have at most one
primary key in it.
For example:
STUDENT
Ans: (i) Nested Queries – A SELECT query can have subquery(s) in it. A SELECT query
that contains another SELECT query is called a nested query. Some operations cannot be
performed with a single SELECT command or with a join operation, but
can be performed with the help of nested queries (also referred to as subqueries). For example, to
compute the second highest salary:
SELECT MAX(SAL) FROM EMP WHERE SAL < (SELECT MAX(SAL) FROM EMP)
Some operations can be performed both by joins and by subqueries. Where the join would be costlier in terms
of time and space, the solution based on subqueries is preferred.
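The second-highest-salary query can be run as written in SQLite through Python's sqlite3 module. The EMP rows below are made up for the sketch; two employees share the top salary to show that the inner query compares values, not rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENAME TEXT, SAL INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?)",
    [("A", 5000), ("B", 3000), ("C", 5000), ("D", 4000)],
)

# The inner SELECT finds the highest salary; the outer SELECT finds
# the highest salary strictly below it.
second_highest = conn.execute(
    "SELECT MAX(SAL) FROM EMP WHERE SAL < (SELECT MAX(SAL) FROM EMP)"
).fetchone()[0]
```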
(ii) Cursors in SQL – An object used to store the output of a query for row-by-row
processing by the application programs. Cursors are constructs that enable the user to name a private
memory area to hold a specific statement for access at a later time. Cursors are used to process multi-
row result sets one row at a time. Additionally, cursors keep track of which row is currently being
accessed, which allows for interactive processing of the active set.
(iii) RDBMS – RDBMS is a database management system (DBMS) that stores data in the
form of relations. Relational databases are powerful because they require few assumptions about how
data is related or how it will be extracted from the database. As a result, the same database can be
viewed in many different ways. An important feature of relational system is that a single database can be
spread across several tables. This differs from flat-file databases, in which each database is self-contained in a
single table.
(iv) View – A view is a relation (virtual rather than base) and can be used in query
expressions, that is, queries can be written using the view as a relation. In other words, a view is a
named table that is represented, not by its own physically separate stored data, but by its definition in
terms of other named tables (base tables or views). The base relations on which a view is based are
sometimes called the existing relations. The definition of a view in a create view statement is stored in
the system catalog. The syntax to create a view is:
CREATE [OR REPLACE] VIEW <view_name> [(<aliases>)] AS
<query> [WITH {READ ONLY | CHECK OPTION [CONSTRAINT <constraint_name>]}];
(v) Application Programming Interface – Commercial SQL implementations take one
of the two basic techniques for including SQL in a programming language – embedded SQL and
application program interface (API). In the application program interface approach, the program
communicates with the RDBMS using a set of functions called the Application Program Interface
(API). The program passes the SQL statements to the RDBMS using API calls and uses API calls to
retrieve the results. In this method, the precompiler is not required.
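Cursors (ii) and the API approach (v) can be seen together in Python's sqlite3 module, which is one example of such an API: the program passes SQL text through function calls and walks the result one row at a time with a cursor. This is a sketch with a hypothetical EMP table and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ENAME TEXT, SAL INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?)",
    [("Jones", 3000), ("Smith", 4000), ("Byron", 2500)],
)

# Open a cursor on a multi-row result set.
cur = conn.cursor()
cur.execute("SELECT ENAME, SAL FROM EMP ORDER BY ENAME")

names = []
total = 0
while True:
    row = cur.fetchone()  # advances the cursor to the next row
    if row is None:       # no more rows in the active set
        break
    names.append(row[0])
    total += row[1]

cur.close()
```

No precompiler is involved: the SQL statement is an ordinary string handed to the API at run time.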
Ans: Data independence is the capacity to change the schema at one level of a database system without
having to change the schema at the next level. The three-schema architecture allows the feature of data
independence. Data independence occurs because when the schema is changed at some level, the
schema at the next level remains unchanged; only the mapping between the two levels is changed.
Types of data independence are:
▪ Physical Data Independence – It is capacity to change the internal schema without
having to change conceptual schema. Hence, the external schemas need not be changed as well.
Changes to the internal schema may be needed because some physical files had to be reorganized to
improve the performance of retrieval or update. If the same data as before remains in the database, the
conceptual schema need not be changed.
▪ Logical Data Independence - It is the capacity to change the conceptual schema without
having to change external schemas or application programs. The conceptual schema may be changed to
expand the database (by adding a record type or data item), to change constraints, or to reduce the
database (by removing a record type or data item). Only the view definition and the mappings need be
changed in a DBMS that supports logical data independence. Changes to constraints can be applied to
the conceptual schema without affecting the external schemas or application programs.
Reals: An attribute with a value of one or more real numbers. Enter each number on a separate line in
the Attribute Value text box.
String: An attribute with a value of a series of characters (text).
Strings: An attribute with a value of one or more strings. Enter each string on a separate line in the
Attribute Value text box.
Unique ID: An attribute with a value of a unique text string. An element can have only one ID attribute
(which can be of type Unique ID or Unique IDs). All ID values must be unique in the document or
book. An element with a Unique ID attribute can be the source for an element-based cross-reference.
Unique IDs: An attribute with a value of one or more unique text strings. Enter each string on a
separate line in the Attribute Value text box.
(iii) Oracle Instances: An instance is the running Oracle software together with the memory it
uses. It is the instance that manipulates the data stored in the database. It can be started independent of
any database. It consists of:
1) A shared memory area that provides the communication between various processes.
2) Up to five background processes which handle various tasks. Whenever an Oracle
instance starts, the parameter file 'INIT.ORA' is read.
(iv) Mid square method of hashing: In midsquare hashing, the key is squared and the
address selected from the middle of the squared number.
Mid square method
* Square K.
* Strip predetermined digits from front and rear.
* e.g., use thousands and ten thousands places.
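The steps above can be sketched in a few lines of Python. The function name, its parameters and the sample key are illustrative assumptions; the digit positions follow the example of using the thousands and ten thousands places.

```python
def mid_square(key, low_place=1000, digits=2):
    """Mid-square hash: square the key and keep `digits` middle digits.

    With low_place=1000 and digits=2 the address is formed from the
    thousands and ten thousands places of the squared key.
    """
    squared = key * key
    # Integer division strips the digits at the rear,
    # the modulo strips the digits at the front.
    return (squared // low_place) % (10 ** digits)


# 3121 squared is 9740641; the ten thousands and thousands digits are 4 and 0.
address = mid_square(3121)
```

The middle digits depend on every digit of the key, which is why they are preferred over the first or last digits of the square.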
Ans:
CREATE TABLE EMPLOYEE
( EMPLOYEE_NAME VARCHAR2(20) PRIMARY KEY,
STREET VARCHAR2(20),
CITY VARCHAR2(15));
CREATE TABLE COMPANY
( COMPANY_NAME VARCHAR2(50) PRIMARY KEY,
CITY VARCHAR2(15));
CREATE TABLE WORKS
( EMPLOYEE_NAME VARCHAR2(20) REFERENCES EMPLOYEE(EMPLOYEE_NAME),
COMPANYNAME VARCHAR2(50) REFERENCES COMPANY(COMPANY_NAME),
SALARY NUMBER(6),
CONSTRAINT WORKS_PK PRIMARY KEY(EMPLOYEE_NAME, COMPANYNAME));
Give an expression in SQL for each of queries below:
(i) Find the names of all employees who work for First Bank Corporation.
(ii) Find the names and company names of all employees sorted in ascending order
of company name and descending order of employee names of that company.
(iii) Change the city of First Bank Corporation to ‘New Delhi’
Ans:
(i) SELECT EMPLOYEE_NAME
FROM WORKS
WHERE COMPANYNAME = 'First Bank Corporation';
(ii) SELECT EMPLOYEE_NAME, COMPANYNAME
FROM WORKS
ORDER BY COMPANYNAME, EMPLOYEE_NAME DESC;
(iii) UPDATE COMPANY
SET CITY = 'New Delhi'
WHERE COMPANY_NAME = 'First Bank Corporation';
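The three statements can be tried against a small in-memory SQLite database through Python's sqlite3 module. This is a sketch: SQLite type names replace the Oracle ones, only the COMPANY and WORKS tables are created, and the sample rows and cities are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COMPANY (COMPANY_NAME TEXT PRIMARY KEY, CITY TEXT);
CREATE TABLE WORKS (EMPLOYEE_NAME TEXT, COMPANYNAME TEXT, SALARY INTEGER,
                    PRIMARY KEY (EMPLOYEE_NAME, COMPANYNAME));
INSERT INTO COMPANY VALUES ('First Bank Corporation', 'Mumbai'),
                           ('Small Bank', 'Pune');
INSERT INTO WORKS VALUES ('Asha', 'First Bank Corporation', 50000),
                         ('Ravi', 'First Bank Corporation', 40000),
                         ('Meena', 'Small Bank', 30000);
""")

# (i) Employees of First Bank Corporation. The ORDER BY is added here
# only to make the output order predictable.
q1 = [
    r[0]
    for r in conn.execute(
        "SELECT EMPLOYEE_NAME FROM WORKS "
        "WHERE COMPANYNAME = 'First Bank Corporation' "
        "ORDER BY EMPLOYEE_NAME"
    )
]

# (ii) Company ascending (the default), employee name descending within it.
q2 = conn.execute(
    "SELECT EMPLOYEE_NAME, COMPANYNAME FROM WORKS "
    "ORDER BY COMPANYNAME, EMPLOYEE_NAME DESC"
).fetchall()

# (iii) Change the city, then read it back.
conn.execute(
    "UPDATE COMPANY SET CITY = 'New Delhi' "
    "WHERE COMPANY_NAME = 'First Bank Corporation'"
)
city = conn.execute(
    "SELECT CITY FROM COMPANY WHERE COMPANY_NAME = 'First Bank Corporation'"
).fetchone()[0]
```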
Discuss the correspondence between the E-R model construct and the relation model construct.
Show how each E-R model construct can be mapped to the relational model using the suitable
example?
Ans: An entity-relationship model (ERM): An entity-relationship model (ERM) is an abstract
conceptual representation of structured data. Entity-relationship modeling is a relational schema
database modeling method, used in software engineering to produce a type of conceptual data model (or
semantic data model) of a system, often a relational database, and its requirements in a top-down
fashion. Diagrams created using this process are called entity-relationship diagrams, or ER diagrams or
ERDs for short.
ER-to-Relational Mapping Algorithm:
1) Step 1: Mapping of regular entity types: For each strong entity type E, create a
relation T that includes all the simple attributes of E and the simple components of any composite attribute.
2) Step 2: Mapping of weak entity types: For each weak entity type W with owner entity
type E, create a relation R and include all simple attributes (or simple components of composite
attributes) of W as attributes of R. In addition, include as foreign key attributes of R the primary key
attribute(s) of the relation(s) that correspond to the owner(s). The primary key of R is the combination
of these attributes and the partial key of the weak entity type W, if any.
3) Step 3: Mapping of relationship types: Form a relation R for the relationship, with the primary keys of
the participating relations A and B as foreign keys in R. In addition, any attributes of the relationship
become attributes of R.
4) Step 4: Mapping of multivalued attributes: For each multivalued attribute A, create a new
relation R. This relation R will include an attribute corresponding to A, plus the primary key attribute K
(as a foreign key in R) of the relation that represents the entity type or relationship type that has A as an
attribute.
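The mapping of a multivalued attribute can be sketched in SQLite through Python's sqlite3 module, taking PreviousDegrees of a STUDENT as the multivalued attribute. The table names, column names and rows are hypothetical and serve only to illustrate the step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Regular entity type STUDENT becomes a relation with its simple attributes.
CREATE TABLE STUDENT (ROLL_NO INTEGER PRIMARY KEY, NAME TEXT);

-- The multivalued attribute PreviousDegrees becomes its own relation:
-- one attribute for the value plus the owner's primary key as a foreign key.
CREATE TABLE STUDENT_DEGREE (
    ROLL_NO INTEGER REFERENCES STUDENT(ROLL_NO),
    DEGREE  TEXT,
    PRIMARY KEY (ROLL_NO, DEGREE)
);

INSERT INTO STUDENT VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO STUDENT_DEGREE VALUES (1, 'BSc'), (1, 'MSc'), (2, 'BTech');
""")

# All values of the multivalued attribute for one student.
degrees_of_1 = [
    r[0]
    for r in conn.execute(
        "SELECT DEGREE FROM STUDENT_DEGREE WHERE ROLL_NO = 1 ORDER BY DEGREE"
    )
]
```

Each value of the attribute becomes one row, so a student may have any number of degrees without changing the STUDENT relation.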
Explain the concepts of relational data model. Also discuss its advantages and
disadvantages.
Ans:
Relational Data Model – The relational model was first introduced by Prof. E.F. Codd of the IBM
Research in 1970 and attracted immediate attention due to its simplicity and mathematical foundation.
The model uses the concept of a mathematical relation (like a table of values) as its basic building
block, and has its theoretical basis in set theory and first-order predicate logic. The relational model
represents the database as a collection of relations. The relational model like all other models consists
of three basic components:
▪ a set of domains and a set of relations
▪ operation on relations
▪ integrity rules
Advantages
Ease of use – The representation of any information as tables consisting of rows and columns
is quite natural and therefore even first-time users find it attractive.
Flexibility – Different tables from which information has to be linked and extracted can
be easily manipulated by operators such as project and join to give information in the form in which it is
desired.
Security – Security control and authorization can also be implemented more easily by
moving sensitive attributes in a given table into a separate relation with its own authorization controls.
If authorization requirement permits, a particular attribute could be joined back with others to enable
full information retrieval.
Data Independence – Data independence is achieved more easily with normalization
structure used in a relational database than in the more complicated tree or network structure. It also
frees the users from details of storage structure and access methods.
Data Manipulation Language – The possibility of responding to ad-hoc query by
means of a language based on relational algebra and relational calculus is easy in the relational database
approach. Provides simplicity in the data organization and the availability of reasonably simple to very
powerful query languages.
Disadvantages
Performance – If the number of tables between which relationships are to be established
is large and the tables themselves are voluminous, the performance in responding to queries is
degraded.
Unsuitable for Hierarchies – The relational database approach is a logically
attractive, commercially feasible approach, but if the data is, for example, naturally organized in a
hierarchical manner and stored as such, the hierarchical approach may give better results.
Ans:
(i) Integrity Constraints – A database is only as good as the information stored in it, and a
DBMS must therefore help prevent the entry of incorrect information. An integrity constraint is a
condition specified on a database schema and restricts the data that can be stored in an instance of the
database. If a database instance satisfies all the integrity constraints specified on the database schema, it
is a legal instance. A DBMS enforces integrity constraints, in that it permits only legal instances to be
stored in the database. Integrity constraints are specified and enforced at different times:
▪ When the DBA or end user defines a database schema, he or she specifies the integrity
constraints that must hold on any instance of this database.
▪ When a database application is run, the DBMS checks for violations and disallows
changes to the data that violate the specified integrity constraints.
Many kinds of integrity constraints can be specified in the relational model, such as, Not Null, Check,
Unique, Primary Key, etc.
List any two significant differences between a file processing system and a DBMS.
Ans:
File Processing System vs. DBMS
Data Independence - Data independence is the capacity to change the schema at one level of a
database system without having to change the schema at the next level. In file processing systems the
data and applications are generally interdependent, but DBMS provides the feature of data
independence.
Data Redundancy – Data redundancy means unnecessary duplication of data. In file processing
systems there is redundancy of data, but in DBMS we can reduce data redundancy by means of
normalization process without affecting the original data. If we do so in file processing system, it
becomes too complex.
Differentiate between various levels of data abstraction.
Ans: Data Abstraction – Abstraction is the process of hiding irrelevant details from the users and
presenting only the relevant ones. Database systems are often used by non-computer
professionals, so the complexity must be hidden from database system users. This is done by
defining levels of abstraction at which the database may be viewed: the logical view or external view,
the conceptual view, and the internal view or physical view.
o External View – This is the highest level of abstraction as seen by a user. It describes
only the part of entire database, which is relevant to a particular user.
o Conceptual View – This is the next higher level of abstraction which is the sum total
of Database Management System user's views. It describes what data are actually stored in the
database. It contains information about entire database in terms of a small number of relatively simple
structure.
o Internal View – This is the lowest level of abstraction. It describes how the data are
physically stored.
Ans: a) Primary Key – Primary key is one of the candidate keys. It should be chosen such that its
attribute values are never, or very rarely, changed.
b) Data Manipulation Language (DML) – A data manipulation language is a language
that enables users to access or manipulate data as organized by the appropriate data model.
c) Multivalued Attribute – Multivalued attribute may have more than one value for an
entity. For example, PreviousDegrees of a STUDENT.
d) Relationship Instance – A relationship is an association among two or more entities.
An instance of relationship set is a set of relationships.
Explain the concept of a data model. What data models are used in database management systems?
Ans:
Data Model – Model is an abstraction process that hides irrelevant details while highlighting details
relevant to the applications at hand. Similarly, a data model is a collection of concepts that can be used
to describe structure of a database and provides the necessary means to achieve this abstraction.
Structure of database means the data types, relationships, and constraints that should hold for the data.
In general a data model consists of two elements:
A mathematical notation for expressing data and relationships.
Operations on the data that serve to express queries and other manipulations of the
data.
Data Models used in DBMSs:
▪ Hierarchical Model - It was developed to model many types of hierarchical
organizations that exist in the real world. It uses tree structures to represent relationship among records.
In hierarchical model, no dependent record can occur without its parent record occurrence and no
dependent record occurrence may be connected to more than one parent record occurrence.
▪ Network Model - It was formalised in the late 1960s by the Database Task Group (DBTG) of the
Conference on Data Systems Languages (CODASYL). It uses two different data structures to
represent the database entities and relationships between the entities, namely record type and set type. In
the network model, the relationships as well as the navigation through the database are predefined at
database creation time.
▪ Relational Model - The relational model was first introduced by E.F. Codd of the
IBM Research in 1970. The model uses the concept of a mathematical relation (like a table of values) as
its basic building block, and has its theoretical basis in set theory and first-order predicate logic. The
relational model represents the database as a collection of relations.
▪ Object Oriented Model – This model is based on the object-oriented programming
language paradigm. It includes the features of OOP like inheritance, object-identity,
encapsulation, etc. It also supports a rich type system, including structured and collection types.
▪ Object Relational Model – This model combines the features of both relational
model and object oriented model. It extends the traditional relational model with a variety of features
such as structured and collection types.
Briefly explain the differences between a stand alone query language, embedded query language
and a data manipulation language.
Ans: Stand alone Query Language – The query language which can be used interactively is called
stand alone query language. It does not need the support of a host language.
Embedded Query Language – A query language (e.g., SQL) can be implemented in two ways. It can
be used interactively or embedded in a host language. The use of query language commands within a
host language (e.g., C, Java, etc.) program is called embedded query language. Although similar
capabilities are supported for a variety of host languages, the syntax sometimes varies.
Data Manipulation Language (DML) – A data manipulation language is a language that enables
users to access or manipulate data as organized by the appropriate data model.
Consider the following relations for a database that keeps track of business trips of
salespersons in a sales office:
SALESPERSON (SSN, Name, start_year, Dept_no)
TRIP (SSN, From_city, To_city, Departure_Date, Return_Date, Trip_ID)
EXPENSE (TripID, Account#, Amount)
Specify the following queries in relational algebra:
Give the details (all attributes of TRIP) for trips that exceeded $2000 in expenses.
(i) Print the SSN of salesman who took trips to 'Honolulu'.
(ii) Print the trip expenses incurred by the salesman with SSN = '234-56-7890'. Note that the
salesman may have gone on more than one trip. List them individually.
Ans: Key – A key is a single attribute or a combination of two or more attributes of an entity set that is
used to identify one or more instances (rows) of the set (table). It is a minimal combination of attributes.
Super Key – A super key is a set of one or more attributes that, taken collectively, allows us to identify
uniquely a tuple in the relation.
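The difference between a super key and a (minimal) key can be checked mechanically on a sample table. The sketch below is only an illustration: the rows, the attribute names and the helper functions is_superkey and is_key are made up for this example, and a test on one instance shows only that the attributes are unique in that instance, whereas a real key is a constraint on every legal instance.

```python
from itertools import combinations

# Hypothetical sample instance of a relation (not taken from the notes).
rows = [
    {"ssn": "111", "name": "Ann", "dept": 1},
    {"ssn": "222", "name": "Bob", "dept": 1},
    {"ssn": "333", "name": "Ann", "dept": 2},
]

def is_superkey(attrs, rows):
    """True if no two rows agree on all attributes in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_key(attrs, rows):
    """True if attrs is a super key and no proper subset of it is one."""
    if not is_superkey(attrs, rows):
        return False
    return not any(
        is_superkey(subset, rows)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_superkey(("ssn", "name"), rows))  # unique, but not minimal
print(is_key(("ssn", "name"), rows))
print(is_key(("ssn",), rows))
```

Here (ssn, name) identifies every row but is not a key, because ssn alone already does so.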
What are views? Explain how views are different from tables.
Ans:
A view in SQL terminology is a single table that is derived from other tables. These other
tables could be base tables or previously defined views. A view does not necessarily exist in physical
form; it is considered a virtual table, in contrast to base tables, whose tuples are actually stored in the
database. This limits the possible update operations that can be applied to views, but it places no
limitations on querying a view. A view represents a different perspective of one or more base relations. The
definition of a view in a CREATE VIEW statement is stored in the system catalog. An attribute in the view
can be updated as long as the attribute is simple and not derived from a computation involving two or
more base relation attributes. Views that involve a join may or may not be updatable. Such views are not
updatable if they do not include the primary keys of the base relations.
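A minimal sketch of the "virtual table" idea, using Python's built-in sqlite3 module. The employee table, its rows and the view name sales_staff are assumptions made for the illustration. Note that SQLite itself treats views as read-only, so the sketch shows only querying and the stored definition, not view updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO employee VALUES
        (1, 'Asha', 'Sales', 30000),
        (2, 'Ravi', 'Sales', 42000),
        (3, 'Meena', 'HR', 38000);
    -- Only the defining query is stored; no rows are copied into the view.
    CREATE VIEW sales_staff AS
        SELECT emp_id, name FROM employee WHERE dept = 'Sales';
""")

before = conn.execute("SELECT name FROM sales_staff ORDER BY emp_id").fetchall()

# A change to the base table is visible through the view, because the view
# is evaluated against the base table each time it is queried.
conn.execute("INSERT INTO employee VALUES (4, 'Kiran', 'Sales', 35000)")
after = conn.execute("SELECT name FROM sales_staff ORDER BY emp_id").fetchall()

# sqlite_master plays the role of the system catalog: it holds the view definition.
view_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'sales_staff'"
).fetchone()[0]

print(before)
print(after)
print(view_sql)
```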
What do you mean by integrity constraints? Explain the two constraints, check and
foreign key in SQL with an example for each. Give the syntax.
Ans: Integrity Constraints – An integrity constraint is a condition specified on a database schema and
restricts the data that can be stored in an instance of the database. If a database instance satisfies all the
integrity constraints specified on the database schema, it is a legal instance. A DBMS enforces integrity
constraints, in that it permits only legal instances to be stored in the database.
CHECK constraint – CHECK constraint specifies an expression that must always be true for every
row in the table. It can't refer to values in other rows.
Syntax:
ALTER TABLE <table_name>
ADD CONSTRAINT <constraint_name> CHECK(<expression>);
FOREIGN KEY constraint – A foreign key is a combination of columns with values based on the
primary key values from another table. A foreign key constraint, also known as referential integrity
constraint, specifies that the values of the foreign key correspond to actual values of the primary or
unique key in other table. One can refer to a primary or unique key in the same table also.
Syntax:
ALTER TABLE <table_name>
ADD CONSTRAINT <constraint_name> FOREIGN KEY (<column_name(s)>)
REFERENCES <base_table>(<column_name>) ON {DELETE | UPDATE} CASCADE;
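As an example of each constraint, the sketch below uses Python's built-in sqlite3 module. The department and employee tables and the constraint names chk_salary and fk_dept are invented for the illustration. SQLite does not support ALTER TABLE ... ADD CONSTRAINT, so the constraints are declared inside CREATE TABLE instead; the constraint clauses themselves have the same shape as in the syntax above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
    CREATE TABLE department (
        dept_no INTEGER PRIMARY KEY,
        dname   TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_no  INTEGER PRIMARY KEY,
        salary  REAL,
        dept_no INTEGER,
        CONSTRAINT chk_salary CHECK (salary > 0),
        CONSTRAINT fk_dept FOREIGN KEY (dept_no)
            REFERENCES department (dept_no) ON DELETE CASCADE
    );
    INSERT INTO department VALUES (10, 'Sales');
    INSERT INTO employee VALUES (1, 25000, 10);
""")

def try_sql(sql):
    """Run one statement; return 'ok' or the text of the constraint error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

check_result = try_sql("INSERT INTO employee VALUES (2, -500, 10)")  # salary fails the CHECK
fk_result = try_sql("INSERT INTO employee VALUES (3, 18000, 99)")    # department 99 does not exist
try_sql("DELETE FROM department WHERE dept_no = 10")                 # cascades to employee rows
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

print(check_result)
print(fk_result)
print(remaining)
```

With ON DELETE CASCADE, deleting the department also deletes the employees that reference it, which is why no employee rows remain at the end.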
Q: What are the different types of database end users? Discuss the main activities of each.
Ans:
End-Users – End-users are the people whose jobs require access to the database for querying, updating,
and generating reports; the database primarily exists for their use. The different types of end-users are:
▪ Casual end-users – occasionally access the database, need different information each
time.
▪ Naive or Parametric end-users – includes tellers, clerks, etc., make up a sizable portion
of database end-users, main job function revolves around constantly querying and updating the
database.
▪ Sophisticated end-users – includes engineers, scientists, business analysts, etc., who use the
database for their complex requirements.
▪ Stand-alone users – maintain personal databases by using ready-made program
packages that provide easy-to-use menu-based or graphics-based interfaces.
Information about a bank is about customers and their account. Customer has a name, address
which consists of house number, area and city, and one or more phone numbers. Account has
number, type and balance. We need to record customers who own an account. Account can be
held individually or jointly. An account cannot exist without a customer.
Arrive at an E-R diagram. Clearly indicate attributes, keys, the cardinality ratios and
participation constraints.
Ans:
[Figure: E-R diagram for the bank database. Labels recoverable from the original figure: name, address (house_no, area, city), phone_no, account_no, type, balance, details.]
Describe the static hash file with buckets and chaining and show how insertion, deletion and
modification of a record can be performed.
Ans:
In static hash file organization, the term bucket is used to denote a unit of storage that can store one or
more records. A file consists of buckets 0 through N-1, with one primary page per bucket initially and
additional overflow pages chained to the bucket if required later. Buckets contain data entries (or data
records). In a hashing scheme, a hash function, h, is applied to the key of the record to identify the
bucket to which the data record belongs. The hash function is an important component of the hashing
approach. The main problem with a static hash file is that the number of buckets is fixed.
[Figure: static hashing. A key is passed through the hash function h, and h(key) mod N selects one of the buckets 0 through N-1.]
Insertion of a record – To insert a data entry, the hash function is used to identify the
correct bucket and the data entry is put there. If there is no space for this data entry, a
new overflow page is allocated, the data entry is put on this page, and the page is added to the overflow chain
of the bucket.
Deletion of a record – To delete a data entry, the hash function is used to identify the correct bucket,
locate the data entry by searching the bucket, and then remove it. If the data entry is the last in an
overflow page, the overflow page is removed from the overflow chain of the bucket and added to a list
of free pages.
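Modification of a record – To modify a data entry, the hash function is used to identify the correct
bucket, the data entry is located by searching the bucket and read, the data entry is modified, and the
modified data entry is then rewritten in its place. If the modification changes the hash key itself, the
record must be deleted from its old bucket and inserted into the bucket selected by the new key.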
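The three operations above can be sketched in a few lines of Python. This is an in-memory illustration only, not a disk-based implementation: the class names, the page capacity of 2, the use of integer keys with h(key) = key, and the rule that re-inserting an existing key overwrites it are all assumptions made for the example.

```python
PAGE_CAPACITY = 2  # records per page; assumed, kept small to force overflow

class Page:
    def __init__(self):
        self.records = {}     # key -> record
        self.overflow = None  # next page in the bucket's overflow chain

class StaticHashFile:
    def __init__(self, n_buckets):
        self.n = n_buckets
        self.buckets = [Page() for _ in range(n_buckets)]  # one primary page per bucket

    def chain(self, key):
        """Yield the primary page, then every overflow page, of the key's bucket."""
        page = self.buckets[key % self.n]  # h(key) mod N, with h(key) = key
        while page is not None:
            yield page
            page = page.overflow

    def search(self, key):
        for page in self.chain(key):
            if key in page.records:
                return page.records[key]
        return None

    def modify(self, key, record):
        for page in self.chain(key):
            if key in page.records:
                page.records[key] = record  # rewrite in place on the same page
                return True
        return False

    def insert(self, key, record):
        if self.modify(key, record):  # key already stored: overwrite it
            return
        last = None
        for page in self.chain(key):
            if len(page.records) < PAGE_CAPACITY:
                page.records[key] = record
                return
            last = page
        last.overflow = Page()  # every page is full: chain a new overflow page
        last.overflow.records[key] = record

    def delete(self, key):
        prev = None
        for page in self.chain(key):
            if key in page.records:
                del page.records[key]
                if not page.records and prev is not None:
                    prev.overflow = page.overflow  # unlink the emptied overflow page
                return True
            prev = page
        return False

f = StaticHashFile(n_buckets=3)
for k in (3, 6, 9, 12, 4):
    f.insert(k, f"rec{k}")
chain_len = sum(1 for _ in f.chain(3))   # keys 3, 6, 9, 12 all fall in bucket 0
f.modify(9, "rec9-updated")
updated = f.search(9)
f.delete(9)
f.delete(12)
chain_after = sum(1 for _ in f.chain(3))  # the emptied overflow page has been unlinked
print(chain_len, updated, chain_after)
```

Because the number of buckets never changes, a bucket that keeps receiving keys simply grows a longer overflow chain, which is the weakness of static hashing noted above.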
Ans: (i) Derived and Stored Attribute - In some cases, two or more attribute values are related, for
example, Age and BirthDate attributes of a person. For a particular person entity, the value of Age can be
determined from the current date and the value of that person's BirthDate. Hence, the attribute Age is
called a derived attribute and the attribute BirthDate is called a stored attribute.
(ii) Distributed System – A distributed system consists of a number of processing
elements that are interconnected by a computer network and that cooperate in performing certain
assigned tasks.
(iii) Interblock Gap – A track of a disk is divided into equal-sized disk blocks. Blocks are
separated by fixed-size gaps, called as interblock gaps, which include specially coded control
information written during disk initialization.
(iv) Degree of a Relation – The degree or arity of a relation is the number of attributes
n of its relation schema.
(v) Catalog – A relational DBMS maintains information about every table and index that
it contains. A catalog is a collection of special tables, which stores the descriptive information of every
table and index.
(vi) Conceptual Schema – Conceptual schema describes the structure of the whole
database for a community of users. It hides the details of physical storage structures and concentrates on
describing entities, data types, relationships, and constraints.
(vii) DDL and SDL – The data definition language (DDL) is used by DBA and database
designers to define conceptual schema, internal schema, and mappings between these two. In some
DBMSs, a clear separation is maintained between conceptual schema and internal schema. In that case,
DDL is used to specify the conceptual schema only. Another language, storage definition language
(SDL) is used to specify the internal schema. The mappings between the two schemas may be specified
in either one of these languages.
Define a relation.
Ans: Relation – A relation is a named two-dimensional table of data. Mathematically, a relation can be
defined as a subset of the cartesian product of a list of domains. Each relation consists of a set of named
columns and an arbitrary number of rows. The columns correspond to the fields describing each tuple
in the table or relation. The rows correspond to each instance of the entity described by the table or
relation.
Describe entity integrity and referential integrity. Give an example of each.
Ans:
Entity Integrity Rule – If the attribute A of relation R is a prime attribute of R then A
cannot accept null values.
Referential Integrity Rule – In referential integrity, it is ensured that a value that appears in one
relation for a given set of attributes also appears for a certain set of attributes in another relation.
For example, consider the relations STUDENT and GRADE:
▪ Roll No is the primary key in the relation STUDENT and Roll No + Course is the
primary key of the relation GRADE. (Entity Integrity)
▪ Roll No in the relation GRADE (child table) is a foreign key, which is referenced from
the relation STUDENT (parent table). (Referential Integrity).
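The two rules can be tried out with Python's built-in sqlite3 module. The column names and sample rows below are assumptions made for the illustration; only the STUDENT/GRADE relationship comes from the example above. The key columns of grade are declared NOT NULL explicitly, because SQLite does not otherwise reject NULLs in an ordinary primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to check foreign keys
conn.executescript("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT
    );
    CREATE TABLE grade (
        roll_no INTEGER NOT NULL REFERENCES student (roll_no),
        course  TEXT    NOT NULL,
        grade   TEXT,
        PRIMARY KEY (roll_no, course)
    );
    INSERT INTO student VALUES (1, 'Anita');
    INSERT INTO grade VALUES (1, 'DBMS', 'A');
""")

def try_sql(sql):
    """Run one statement; return 'ok' or the text of the integrity error."""
    try:
        with conn:
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = {
    # Entity integrity: part of the primary key is NULL.
    "null_in_key": try_sql("INSERT INTO grade VALUES (1, NULL, 'B')"),
    # Key constraint: the primary key value already exists.
    "duplicate_key": try_sql("INSERT INTO grade VALUES (1, 'DBMS', 'B')"),
    # Referential integrity on insert: there is no student 7.
    "dangling_insert": try_sql("INSERT INTO grade VALUES (7, 'DBMS', 'A')"),
    # Referential integrity on delete: student 1 is still referenced by grade.
    "delete_parent": try_sql("DELETE FROM student WHERE roll_no = 1"),
}
for case, outcome in results.items():
    print(case, "->", outcome)
```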
Ans:
(i) SELECT P.NAME
FROM TRAIN T, TICKET I, PASSENGER P
WHERE P.PNRNO = I.PNRNO AND T.NAME = I.NAME
AND T.START = I.START AND T.DEST = I.DEST;
(ii) SELECT NAME FROM PASSENGER
WHERE PNRNO IN (SELECT DISTINCT A.PNRNO
FROM TICKET A, TICKET B
WHERE A.PNRNO = B.PNRNO AND A.START = B.DEST AND A.DEST = B.START);
(iii) INSERT INTO TRAIN
VALUES ('Shatabdi', 'Delhi', 'Banglore');
(iv) DELETE FROM TICKET
WHERE PNRNO = (SELECT PNRNO FROM PASSENGER WHERE NAME = 'Tintin');
Define outer union operation of the relational algebra. Compute the outer union for the
relations R and S given below.
R                 S
A   B   C         D   A   F
a1  b1  c1        d1  a1  f1
a3  b2  c2        d1  a2  null
Ans:
Outer Join - If there are any values in one table that do not have corresponding value(s) in the other,
those rows are not selected by an equi-join. Such rows can be forcefully selected by using the outer join.
The corresponding columns for that row will have NULLs. There are actually three forms of the outer
join operation: left outer join (⟕), right outer join (⟖) and full outer join (⟗).
R.A   B     C     D     S.A   F
a1    b1    c1    d1    a1    f1
a3    b2    c2    Null  Null  Null
Null  Null  Null  d1    a2    Null
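The result table above is what a full outer join of R and S on the common attribute A produces. The sketch below recomputes it in plain Python, reading the flattened tables as R(A, B, C) and S(D, A, F), which is an assumption inferred from the result's column headings; the function full_outer_join is a helper written for this illustration, with None standing in for NULL.

```python
R = [{"A": "a1", "B": "b1", "C": "c1"},
     {"A": "a3", "B": "b2", "C": "c2"}]
S = [{"D": "d1", "A": "a1", "F": "f1"},
     {"D": "d1", "A": "a2", "F": None}]

def full_outer_join(left, right, attr):
    """Full outer join of two lists of dict rows on the common attribute attr."""
    left_cols, right_cols = list(left[0]), list(right[0])
    rows, matched_right = [], set()
    for l in left:
        # As in SQL, a NULL join value never matches anything.
        partners = [i for i, r in enumerate(right)
                    if l[attr] is not None and r[attr] == l[attr]]
        if partners:
            for i in partners:
                matched_right.add(i)
                rows.append(tuple(l[c] for c in left_cols)
                            + tuple(right[i][c] for c in right_cols))
        else:
            # Unmatched left row: pad the right-hand columns with NULLs.
            rows.append(tuple(l[c] for c in left_cols) + (None,) * len(right_cols))
    for i, r in enumerate(right):
        if i not in matched_right:
            # Unmatched right row: pad the left-hand columns with NULLs.
            rows.append((None,) * len(left_cols) + tuple(r[c] for c in right_cols))
    return rows

result = full_outer_join(R, S, "A")
for row in result:
    print(row)
```

Dropping the final loop gives the left outer join; dropping the padding branch for unmatched left rows, while keeping the final loop, gives the right outer join.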
Ans: Every DBA uses database utilities to manage and control their databases. But
there is a lot of confusion in the field as to what, exactly, a database utility is. There are a lot of
definitions floating around. DBAs constantly refer to utilities, tools, solutions, and suites.
So, first of all, let's be clear on what a utility is and what a "tool" or "solution" is. A utility is generally
a single-purpose program for moving and/or verifying database pages; examples include LOAD,
UNLOAD, REORG, CHECK, COPY, and RECOVER. A database tool is a multi-functioned program
designed to simplify database monitoring, management, and/or administrative tasks. A solution is a
synergistic group of tools and utilities designed to work together to address a customer's business issue.
A suite is a group of tools that are sold together, but are not necessarily integrated to work with each
other in any way. Of course, these are just working definitions, but they are useful definitions that make it
easier to discuss DBA products and programs.
Differentiate between
(i) WHERE and HAVING clause in SQL.
(ii) Strong entity set and weak entity set.
Ans: (i) WHERE and HAVING clause in SQL
The WHERE clause is basically used for implementing conditions on every tuple of the relation.
The HAVING clause is used in combination with the GROUP BY clause. It can be used in a SELECT
statement to filter groups of the records that a GROUP BY returns.
The syntax for the HAVING clause is:
SELECT column1, column2, ... column_n, aggregate_function (expression)
FROM tables
WHERE predicates
GROUP BY column1, column2, ... column_n
HAVING condition1 ... condition_n;
Aggregate_function can be a function such as SUM, COUNT, MIN, or MAX.
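The difference is easiest to see when both clauses appear in one query. The sketch below uses Python's built-in sqlite3 module; the orders table and its rows are invented for the illustration. WHERE discards individual rows before grouping, and HAVING discards whole groups after the aggregate has been computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('A', 50), ('A', 300), ('B', 20), ('B', 30), ('C', 500);
""")

rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 30            -- row filter: drops the order of 20
    GROUP BY customer
    HAVING SUM(amount) > 100      -- group filter: drops customer B (total 30)
    ORDER BY customer
""").fetchall()

print(rows)
```

A condition on SUM(amount) cannot be placed in the WHERE clause, because the sum does not exist until the rows have been grouped.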
(ii) Strong entity set and weak entity set: A strong entity set has a primary key. All
tuples in the set are distinguishable by that key. A weak entity set has no primary key unless attributes
of the strong entity set on which it depends are included. Tuples in a weak entity set are partitioned
according to their relationship with tuples in a strong entity set. Tuples within each partition are
distinguishable by a discriminator, which is a set of attributes.
Discuss with examples about various types of attributes present in the ER model.
Ans: Types of Attributes are:
SIMPLE attributes: Attributes that are drawn from atomic value domains
E.g. Name = {John}; Age = {23}
COMPOSITE attributes: Attributes that consist of a hierarchy of attributes
E.g. Address may consist of "Number", "Street" and "Suburb" → Address = {59 + 'Meek Street' +
'Kingsford'}
SINGLE VALUED attributes: Attributes that have only one value for each entity
E.g. Name, Age for EMPLOYEE
MULTIVALUED attributes: Attributes that have a set of values for each entity
E.g. Degrees of a person: 'BSc', 'MIT', 'PhD'
DERIVED attributes: Attributes that contain values calculated from other
attributes
E.g. Age can be derived from the attribute DateOfBirth. In this situation, DateOfBirth might be called a stored
attribute.
Ans: (i)
A C
1 3
4 6
7 9
(ii)
B C D
2 3 10
2 3 11
(iii)
A B C D
1 2 3 10
1 2 3 11
(iv) Assuming left outer join
A B C D
1 2 3 10
1 2 3 11
4 5 6 NULL
7 8 9 NULL
An ORACLE database has both a physical and a logical structure. By separating physical and logical
database structure, the physical storage of data can be managed without affecting the access to logical
storage structures.
(iii) Group By clause in SQL: The GROUP BY clause can be used in a SELECT
statement to collect data across multiple records and group the results by one or more columns.
The syntax for the GROUP BY clause is:
SELECT column1, column2, ... column_n, aggregate_function (expression)
FROM tables
WHERE predicates
GROUP BY column1, column2, ... column_n;
aggregate_function can be a function such as SUM, COUNT, MIN, or MAX.
Explain the three data models namely relational, network and hierarchical and compare their
relative advantages and disadvantages.
Ans: Hierarchical Model: In hierarchical model, data elements are connected to one another through
links. Records are arranged in a top-down structure that resembles a tree or genealogy chart. The top
node is called the root, the bottom nodes are called leaves, and intermediate nodes have one parent
node and several child nodes. The root can have any number of child nodes but a child node can have
only one parent node. Data are related in a nested, one-to-many set of relationships, while many-to-
many relationship cannot be directly expressed.
A child record occurrence must have a parent record occurrence; deleting a parent record occurrence
requires deleting all its child record occurrences.
A network data model can be regarded as an extended form of the hierarchical model; the principal
distinction between the two is that in a hierarchical model a child record has exactly one parent,
whereas in the network model a child record can have any number of parents, including zero.
Data in the network model is represented by a collection of records, and relationships among data are
represented by links, which can be viewed as pointers. The records in the database are organized as
collections of arbitrary graphs, which allows one-to-many as well as many-to-many relationships.
A record is a collection of data items which can be retrieved from a database, or which can be stored in a
database, as an undivided object. Thus, a DBMS may STORE, DELETE or MODIFY records within a
database. In this way, the number of records within a network database changes dynamically. The
network model can be graphically represented as follows:
A labeled rectangle represents the corresponding entity or record type. An arrow represents the set type,
which denotes the relationship between the owner record type and member record. The arrow direction
is from the owner record type to the member record type.
Each many-to-many relationship is handled by introducing a new record type to represent the
relationship, wherein the attributes, if any, of the relationship are stored. We then create two
symmetrical 1:M sets with the member in each of the sets being the newly introduced record type. In
this model, the relationships as well as the navigation through the database are predefined at database
creation time.
In the relational model the data and the relations among them are represented by a collection of tables. A
table is a collection of records and each record in a table contains the same fields. The attractiveness of
the relational approach arises from the simplicity of the data organization and the availability of reasonably
simple to very powerful query languages. Relational database design relies on a technique called
normalization, proposed by E.F. Codd. This model reduces the complexity of the Network and
Hierarchical Models. It uses mathematical operations from relational algebra and
relational calculus on relations, such as projection, union and join. Where fields in two different
tables take values from the same set, a join operation can be performed to select related records in the
two tables by matching values in those fields. A description of data in terms of a data model is called a
schema. In the relational model, the schema for a relation specifies its name, the name of each field and the
type of each field.
Navigation through relations that represent an M:N relationship is just as simple as through a 1:M
relationship. This leads us to conclude that it is easier to specify how to manipulate a relational database
than a network or hierarchical one. This in turn leads to a query language for the relational model that is
correct, clear, and effective in specifying the required operations. Unfortunately, the join operation can be
expensive: it may demand a considerable amount of processing and the retrieval of unnecessary
data. The structures for the network and hierarchical models can be implemented efficiently. Such an
implementation would mean that navigating through these databases, though awkward, requires the
retrieval of relatively little unnecessary data.
In an organization several projects are undertaken. Each project can employ one or more employees.
Each employee can work on one or more projects. Each project is undertaken at the request of a
client. A client can request several projects. Each
Explain the relevance of Data Dictionary in a Database System.
Ans: A data dictionary is a database in its own right, residing on disk, which consists of metadata,
that is, data about all entity sets, attributes, relationships among entity sets, and constraints. It
contains, in compiled form, definitions, structure and usage information on the data stored, design
decisions, usage standards, application program descriptions, and user information. It is consulted by the
DBMS before each DML operation, and by users to learn what each piece of data and the various synonyms of
data fields mean. A data dictionary can be an integrated system, where it is part of the DBMS, or an add-on to
the DBMS. In an integrated system the data dictionary contains information concerning the external, conceptual and
internal levels of the database, in both source and object form. It contains the source of each data field
value, the frequency of its use, an audit trail concerning updates, and cross-reference information. Present
systems are all add-ons; standards do not exist for integrating the data dictionary with the DBMS. The data dictionary
should be integrated in the database it defines, and thus include its own definition, so that it can be queried
with the same language used for querying the database.
Ans: (i) Procedural and non procedural languages - A procedural language specifies the operations
to be performed on the existing data to derive the results. It also specifies the sequence of operations in
which they will be performed. But, a non procedural language specifies only the result or information
required, not how it is obtained.
(ii) Key and superkey - A key is a single attribute or a combination of two or more
attributes of an entity set that is used to identify one or more instances (rows) of the set (table). If we
add some additional attributes to a primary key, the augmented set of attributes is called a super key.
Therefore, the primary key is a minimal super key.
(iii) Primary and secondary storage – A primary storage device stores data
temporarily. Primary storage is generally used by the processing unit to temporarily store the data,
intermediate results, and the final results before they are written to secondary storage, because
secondary storage devices are not directly accessible by the CPU. If we want to store data permanently,
secondary storage devices are required. Secondary storage devices are slower than primary
storage devices.
What is the difference between a primary index and a secondary index? What are the
advantages of using an index and what are its disadvantages.
Ans: Primary Index: A primary index is an ordered file whose records are of fixed length with two
fields. The first field is the ordering key field (called the primary key) of the data file, and the second field is
a pointer to a disk block. There is one index entry in the index file for each block in the data file. Each
index entry has the value of the primary key field for the first record in a block and a pointer to that
block as its two field values. A major problem with a primary index is insertion and deletion of records.
If we attempt to insert a record in its correct position in the data file, we have to not only move records
to make space for the new record but also change some index entries.
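A lookup through such a primary index can be sketched as follows. The data blocks, the block size of three records and the function lookup are assumptions made for the illustration; the index itself is exactly the structure described above, one entry per block holding the key of that block's first record.

```python
from bisect import bisect_right

# Data file: blocks of (key, record) pairs, sorted on the primary key.
blocks = [
    [(5, "e"), (8, "h"), (12, "l")],
    [(15, "o"), (21, "u"), (27, "x")],
    [(30, "a"), (33, "d"), (41, "k")],
]

# Primary index: the key of the first record in each block.
# The position in this list doubles as the block pointer.
index_keys = [block[0][0] for block in blocks]

def lookup(key):
    """Binary-search the index, then scan the single block it points to."""
    pos = bisect_right(index_keys, key) - 1  # last index entry whose key is <= key
    if pos < 0:
        return None                          # key is smaller than every indexed key
    for k, record in blocks[pos]:
        if k == key:
            return record
    return None

print(lookup(21), lookup(30), lookup(22), lookup(3))
```

Only one data block is read per lookup, and the index has as many entries as there are blocks rather than records, which is why it is much smaller than the data file.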
Secondary Index: A secondary index is also an ordered file with two fields. The first field is a non-
ordering field of the data file that is an indexing field. The second field is either a block pointer or a
record pointer. A secondary index on a candidate key looks just like a dense primary index, except that
the records pointed to by successive values in the index are not stored sequentially.
In contrast, if the search key of a secondary index is not a candidate key, it is not enough to point to
just the first record with each search-key value. The remaining records with the same search-key
value could be anywhere in the file, since the records are ordered by the search key of the primary
index, rather than by the search key of the secondary index. Therefore, a secondary index must contain
pointers to all the records. Secondary indices improve the performance of queries that use keys other
than the search key of the primary index. However, they impose a significant overhead on modification
of the database. The designer of a database decides which secondary indices are desirable on the basis of an
estimate of the relative frequency of queries and modifications.
Some of the advantages of using an index are:
(i) Indexes speed up search on the indexed attribute(s). Without an index either a
sequential search or some sort of binary search would be needed.
(ii) Indexes can also speed up sequential processing of the file when the file is not
stored as a sequential file.
Some of the disadvantages of using an index are:
(i) An index requires additional storage. This additional storage can be significant
when a number of indexes are being used on a file.
(ii) Insertion, deletion and updates on a file with indexes take more time than on a file
without any indexes.
Describe the function of each of the following types of keys: Primary, alternative, secondary and
foreign.
Ans: Primary Key : The primary key is an attribute or a set of attributes that uniquely identify a
specific instance of an entity. Every entity in the data model must have a primary key whose values
uniquely identify instances of the entity.
To qualify as a primary key for an entity, an attribute must have the following properties :
* It must have a non-null value for each instance of the entity.
* The value must be unique for each instance of an entity
* The values must not change or become null during the life of each entity instance.
Candidate Key and Alternate Key : In some instances, an entity will have more than one attribute
that can serve as a primary key. Any attribute or minimal set of attributes that could be a primary key is called a
candidate key. Once candidate keys are identified, one of them is chosen as the primary key. The choice of
primary key is based on guaranteed uniqueness and minimality.
Candidate keys which are not chosen as the primary key are known as alternate keys.
Foreign Key : The primary key of one file or table which is implanted in another file or table to implement the
relationships between them. Foreign keys are used to implement some types of relationships. Foreign
keys do not exist in information models.
Discuss the techniques for a hash file to expand and shrink dynamically. What are the
advantages and disadvantages of each?
Ans:
The hashing techniques that allow dynamic file expansion are:
(i) Extendible hashing
(ii) Linear hashing
The main advantage of extendible hashing that makes it attractive is that performance of the file does
not degrade as the file grows. Also, no space is allocated in extendible hashing for future growth, but
additional buckets can be allocated dynamically as needed. A disadvantage is that the directory must be
searched before accessing the buckets themselves, resulting in two block accesses instead of the one needed in
static hashing.
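A minimal in-memory sketch of extendible hashing, showing the directory lookup followed by the bucket access and how a full bucket is split without touching the other buckets. The class names, the bucket capacity of 2, the use of integer keys and the choice of low-order key bits as the directory index are all assumptions made for the illustration; shrinking (merging buckets and halving the directory) is left out.

```python
BUCKET_CAPACITY = 2  # assumed; kept small so that splits happen quickly

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of key bits this bucket distinguishes
        self.items = {}

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        # Directory index = low-order global_depth bits of the integer key.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        # Two steps: one directory access, then one bucket access.
        return self.directory[self._slot(key)].items.get(key)

    def insert(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and try again

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # No spare directory bit left: double the directory.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Directory slots whose newly used bit is 1 now point to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Move the records whose newly used bit is 1 into the sibling.
        for k in list(bucket.items):
            if k & new_bit:
                sibling.items[k] = bucket.items.pop(k)

h = ExtendibleHash()
keys = [0, 4, 8, 1, 5, 9]
for k in keys:
    h.insert(k, f"rec{k}")
found = [h.search(k) for k in keys]
print(h.global_depth, len(h.directory), found)
```

Only the overflowing bucket is reorganised on a split; the directory doubles only when that bucket was already using every directory bit.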
Discuss the types of integrity constraints that must be checked for the update operations –Insert
and Delete. Give examples.
Ans: Insert operation can violate any of the following four constraints:
1) Domain constraints can be violated if a given attribute value does not appear in the
corresponding domain.
2) Key constraints can be violated if a key value in the new tuple t already exists in another
tuple of the relation.
3) Entity integrity can be violated if the primary key of the new tuple t is NULL.
4) Referential integrity can be violated if the value of any foreign key in t refers to a tuple
that does not exist in the referenced relation.
Delete operation can violate only referential integrity constraints, if the tuple being deleted is referenced
by the foreign keys from other tuples in the database.
Ans: EXISTS: The EXISTS predicate takes one parameter, which is an SQL subquery. If any records exist
that match the criteria it returns true, otherwise it returns false. This gives you a clean, efficient way to
write a stored procedure that does either an insert or update.
UNIQUE: If UNIQUE is specified then only unique values are used to calculate the mean.
What is NULL? Give an example to illustrate testing for NULL in SQL.
Ans: The NULL SQL keyword is used to represent either a missing value or a value that is not
applicable in a relational table.
Consider the relation:
Person(id, name, address, phone)
The query to find the ids and names of persons who do not have a phone is:
SELECT id, name FROM Person
WHERE phone IS NULL;
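The reason for the special IS NULL test is that an ordinary comparison with NULL is never true. The sketch below runs both forms with Python's built-in sqlite3 module; the two sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT, address TEXT, phone TEXT);
    INSERT INTO Person VALUES
        (1, 'Amit', 'Kolkata', '555-0101'),
        (2, 'Bina', 'Delhi', NULL);
""")

# Correct test: IS NULL finds the person without a phone.
with_is_null = conn.execute(
    "SELECT id, name FROM Person WHERE phone IS NULL"
).fetchall()

# Common mistake: phone = NULL evaluates to unknown, so no row qualifies.
with_equals = conn.execute(
    "SELECT id, name FROM Person WHERE phone = NULL"
).fetchall()

print(with_is_null)
print(with_equals)
```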
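Discuss the differences between the candidate keys and the primary key of a relation. Give
example to illustrate your answer.
Ans: A candidate key is a minimal set of attributes that can be used as the primary key, that is, one for
which both the NOT NULL and the uniqueness conditions hold. One of the candidate keys is chosen as the
primary key; hence every primary key is a candidate key, but not every candidate key is the primary key.
For example, if each tuple of a STUDENT relation carried both a unique roll number and a unique
registration number, both would be candidate keys and either one could be chosen as the primary key.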
Ans:
Constraint     Description
PRIMARY KEY    Determines which column(s) uniquely identify each record. The primary key
               cannot be NULL, and the data value(s) must be unique.
FOREIGN KEY    In a one-to-many relationship, the constraint is added to the "many" table.
               The constraint ensures that if a value is entered into a specified column, it
               must already exist in the "one" table, or the record is not added.
UNIQUE         Ensures that all data values stored in a specified column are unique. The
               UNIQUE constraint differs from the PRIMARY KEY constraint in that it allows
               NULL values.
CHECK          Ensures that a specified condition is true before the data value is added to a
               table. For example, an order's ship date cannot be earlier than its order date.
NOT NULL       Ensures that a specified column cannot contain a NULL value. The NOT NULL
               constraint can only be created with the column-level approach to table creation.
What are the various types of the update operations on relations? Also explain the constraints
on these update operation. Give examples in support of your answer.
DC10 DATABASE MANAGEMENT SYSTEMS
Write short notes on the following:
(i) Relational Constraints
(ii) Disadvantages of Relational Approach
(iii) Instances and Schemas
What is data model? Explain object based and record based data models.
Ans: A data model is an abstract model that describes how data is represented and accessed.
(i) Object based data models: Similar to a relational database model, but objects,
classes and inheritance are directly supported in database schemas and in the query language.
(ii) Record based data models: describe the database as a collection of fixed-format records of
several types. The relational, network and hierarchical models are record based. The relational model
is based on first-order predicate logic: its core idea is to describe a database as a collection of
predicates over a finite set of predicate variables, describing constraints on the possible values and
combinations of values.
Generalization
Generalization is the process of combining several entities into one generalized entity that contains the
properties they have in common. In generalization, a number of entities are brought together into
one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow and
dove can all be generalized as Birds.
Specialization
Specialization is the opposite of generalization. In specialization, a group of entities is divided into
sub-groups based on their characteristics. Take a group 'Person' for example. A person has name, date of birth,
gender, etc. These properties are common to all persons. But in a company, persons can be
identified as employee, employer, customer, or vendor, based on what role they play in the company.
Similarly, in a school database, persons can be specialized as teacher, student, or staff, based on what role
they play in the school as entities.
Aggregation
Aggregation is a process in which the relationship between two entities is treated as a single entity. Here the relationship
between Center and Course acts as an entity in relation with Visitor.
An anomaly is an irregularity, or something which deviates from the expected or normal state. When
designing databases, we identify three types of anomalies: Insert, Update and Delete.
We can't insert a row in the REFERENCING RELATION if the referencing attribute's value is not present among
the referenced attribute's values. E.g., insertion of a student with BRANCH_CODE 'ME' in the STUDENT relation
will result in an error because 'ME' is not present in BRANCH_CODE of BRANCH.
We can't delete or update a row from the REFERENCED RELATION if the value of the REFERENCED ATTRIBUTE
is used in a value of the REFERENCING ATTRIBUTE. E.g., if we try to delete the tuple from BRANCH having
BRANCH_CODE 'CS', it will result in an error because 'CS' is referenced by BRANCH_CODE of
STUDENT, but if we try to delete the row from BRANCH with BRANCH_CODE 'CV', it will be deleted as
the value is not used by the referencing relation. It can be handled by the following method:
ON DELETE CASCADE: It will delete the tuples from the REFERENCING RELATION if the value used by the
REFERENCING ATTRIBUTE is deleted from the REFERENCED RELATION. E.g., if we delete a row from
BRANCH with BRANCH_CODE 'CS', the rows in the STUDENT relation with BRANCH_CODE 'CS'
(ROLL_NO 1 and 2 in this case) will be deleted.
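The behaviour described above can be tried out with a short sketch, assuming Python's built-in sqlite3 module. The relation and attribute names follow the text; the student names, branch names and the extra ROLL_NO 3 are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE BRANCH (BRANCH_CODE TEXT PRIMARY KEY, BRANCH_NAME TEXT);
CREATE TABLE STUDENT (
    ROLL_NO INTEGER PRIMARY KEY,
    NAME TEXT,
    BRANCH_CODE TEXT REFERENCES BRANCH(BRANCH_CODE) ON DELETE CASCADE
);
INSERT INTO BRANCH VALUES ('CS', 'Computer Science'), ('CV', 'Civil');
INSERT INTO STUDENT VALUES (1, 'Student A', 'CS'), (2, 'Student B', 'CS');
""")

# Insert with a BRANCH_CODE that does not exist in BRANCH is rejected.
try:
    conn.execute("INSERT INTO STUDENT VALUES (3, 'Student C', 'ME')")
    insert_error = None
except sqlite3.IntegrityError as exc:
    insert_error = str(exc)

# 'CV' is not referenced by any student, so this delete simply succeeds.
conn.execute("DELETE FROM BRANCH WHERE BRANCH_CODE = 'CV'")

# 'CS' is referenced; ON DELETE CASCADE removes the referencing students too.
conn.execute("DELETE FROM BRANCH WHERE BRANCH_CODE = 'CS'")
remaining_students = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]

print(insert_error)
print(remaining_students)
```

Without the ON DELETE CASCADE clause, the last DELETE would fail with a foreign key error instead of removing the two students.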
A key that consists of two or more attributes that together uniquely identify an entity occurrence is called a composite key.
No single attribute of a composite key is a key on its own.
Database Normalization
Database normalization is the process of organizing the attributes of database to reduce or eliminate data
redundancy (having same data but at different places) .
Functional Dependency
Functional Dependency is a constraint between two sets of attributes in a relation from a database.
Functional dependency is denoted by an arrow (→). If an attribute A functionally determines B, then it is
written as A → B.
For example, employee_id → name means employee_id functionally determines the name of the employee. As
another example, in a timetable database, {student_id, time} → {lecture_room}: student ID and time
determine the lecture room where the student should be.
For example, in the table below A → B is true, but B → A is not true as there are different values of A for B
= 3.
A B
------
1 3
2 3
4 0
1 3
4 0
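Whether a dependency holds in a given table instance can be checked mechanically. The sketch below is plain Python; the helper name fd_holds is hypothetical, and the rows are the ones from the table above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this table instance.

    The dependency holds when rows that agree on lhs also agree on rhs.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True

rows = [
    {"A": 1, "B": 3},
    {"A": 2, "B": 3},
    {"A": 4, "B": 0},
    {"A": 1, "B": 3},
    {"A": 4, "B": 0},
]

a_to_b = fd_holds(rows, ["A"], ["B"])  # each A value maps to one B value
b_to_a = fd_holds(rows, ["B"], ["A"])  # B = 3 appears with A = 1 and A = 2
print(a_to_b, b_to_a)
```

Note that an instance can only refute a dependency; a dependency that holds in one instance is a constraint on the schema only if it holds for every legal instance.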
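Trivial Functional Dependencies
X –> Y is a trivial functional dependency when Y is a subset of X.
Examples:
ABC --> AB
ABC --> A
ABC --> ABC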
Non Trivial Functional Dependencies
X –> Y is a non-trivial functional dependency when Y is not a subset of X.
X –> Y is called completely non-trivial when the intersection of X and Y is empty.
Examples:
Id --> Name,
Name --> DOB
Normal form
Normal forms are used to eliminate or reduce redundancy in database tables.
Example :
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M C2, c3
A relation is in 2NF iff it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not
part of any candidate key) is dependent on any proper subset of any candidate key of the table.
In the above relation, AB is the only candidate key and there is no partial dependency, i.e., no proper subset
of AB determines any non-prime attribute.
All possible candidate keys in the above relation are {A, E, CD, BC}.
All attributes on the right sides of all functional dependencies are prime.
BCNF
A relation is in BCNF iff in every non-trivial functional dependency X –> Y, X is a super key.
Key Points
Exercise 1: Find the highest normal form in R (A, B, C, D, E) under following functional
dependencies.
ABC --> D
CD --> AE
Candidate keys: ABC and BCD (the closure of each is ABCDE and no proper subset of either is a key), so the
prime attributes are A, B, C and D, and E is non-prime.
BCNF: ABC -> D satisfies BCNF because ABC is a super key. Let us check CD -> AE: CD is not a super key, so this
dependency violates BCNF. So, R is not in BCNF.
3NF: We don't need to check ABC -> D, as it already satisfies BCNF. Let us consider
CD -> AE. Since CD is not a super key and E is not a prime attribute, the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD is a proper subset of the candidate key BCD and
it determines E, which is a non-prime attribute. So, the given relation is also not in 2NF.
So, the highest normal form is 1 NF.
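The steps of Exercise 1 can be checked with a small sketch in plain Python. The function names closure and candidate_keys are hypothetical helpers; functional dependencies are written as (left side, right side) pairs of attribute strings, and the brute-force key search is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            is_superkey = closure(attrs, fds) == set(schema)
            if is_superkey and not any(key <= attrs for key in keys):
                keys.append(attrs)
    return keys

schema = "ABCDE"
fds = [("ABC", "D"), ("CD", "AE")]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)
non_prime = set(schema) - prime

# 2NF test: does a proper subset of some candidate key determine a non-prime attribute?
has_partial_dependency = any(
    closure(subset, fds) & non_prime
    for key in keys
    for size in range(1, len(key))
    for subset in combinations(sorted(key), size)
)

print(keys, sorted(non_prime), has_partial_dependency)
```

The sketch finds the candidate keys ABC and BCD, the non-prime attribute E, and a partial dependency, which agrees with the conclusion that the relation is only in 1NF.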
Given a Relation with different FD sets for that relation, we have to find out whether one FD set is subset of
other or both are equal.
1. If all FDs of FD1 can be derived from FDs present in FD2, we can say that FD2 ⊃ FD1.
2. If all FDs of FD2 can be derived from FDs present in FD1, we can say that FD1 ⊃ FD2.
3. If 1 and 2 both are true, FD1=FD2.
All these three cases can be shown using Venn diagram as:
Q. Let us take an example to show the relationship between two FD sets. A relation R(A,B,C,D) having two
FD sets FD1 = {A->B, B->C, AB->D} and FD2 = {A->B, B->C, A->C, A->D}
Step 1. As all FDs in set FD1 also hold in set FD2 (AB->D follows from A->D), FD2 ⊃ FD1 is true.
Step 2. As all FDs in set FD2 also hold in set FD1 (A->C follows from A->B and B->C; A->D follows from
A->B and AB->D), FD1 ⊃ FD2 is true.
Step 3. As FD2 ⊃ FD1 and FD1 ⊃ FD2 are both true, FD2 = FD1 is true. These two FD sets are semantically
equivalent.
Q. Let us take another example to show the relationship between two FD sets. A relation R2(A,B,C,D)
having two FD sets FD1 = {A->B, B->C,A->C} and FD2 = {A->B, B->C, A->D}
Step 1. As all FDs in set FD1 also hold in set FD2 (A->C follows from A->B and B->C), FD2 ⊃ FD1 is true.
Step 2. As not all FDs in set FD2 hold in set FD1 (A->D cannot be derived from FD1), FD1 ⊅ FD2.
Step 3. In this case FD2 ⊃ FD1 but FD1 ⊅ FD2, so these two FD sets are not semantically equivalent.
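Both examples can be verified with attribute closures: an FD X->Y can be derived from a set F exactly when Y is contained in the closure of X under F. The sketch below is plain Python; the helper names closure, covers and equivalent are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every FD in g can be derived from f (written f ⊃ g in the text)."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

# First example: R(A, B, C, D)
fd1 = [("A", "B"), ("B", "C"), ("AB", "D")]
fd2 = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
first_equivalent = equivalent(fd1, fd2)

# Second example: R2(A, B, C, D)
fd1_b = [("A", "B"), ("B", "C"), ("A", "C")]
fd2_b = [("A", "B"), ("B", "C"), ("A", "D")]
fd2_covers_fd1 = covers(fd2_b, fd1_b)
fd1_covers_fd2 = covers(fd1_b, fd2_b)

print(first_equivalent, fd2_covers_fd1, fd1_covers_fd2)
```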
Atomicity
By this, we mean that either the entire transaction takes place at once or doesn't happen at all. There is no midway, i.e.
transactions do not occur partially. Each transaction is considered as one unit and either runs to completion or is not
executed at all. It involves the following two operations.
- Abort: If a transaction aborts, changes made to the database are not visible.
- Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the 'All or nothing rule'.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to account Y.
If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before
write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent
database state. Therefore, the transaction must be executed in its entirety in order to ensure correctness of the
database state.
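A minimal sketch of the all-or-nothing behaviour, assuming Python's built-in sqlite3 module. The account table, the transfer function, the simulated failure and the starting balances of 500 and 200 are all illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER);
INSERT INTO account VALUES ('X', 500), ('Y', 200);
""")

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
            if fail_midway:
                # Simulated failure after write(X) but before write(Y).
                raise RuntimeError("simulated crash")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except RuntimeError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer(conn, "X", "Y", 100, fail_midway=True)
after_failure = balances(conn)   # the deduction from X was rolled back

succeeded = transfer(conn, "X", "Y", 100)
after_success = balances(conn)   # both writes are visible together

print(failed, after_failure)
print(succeeded, after_success)
```

In both outcomes the total of the two balances stays the same, because the partial update is never left visible.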
Consistency
This means that integrity constraints must be maintained so that the database is consistent before and after the
transaction. It refers to correctness of a database. Referring to the example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a result T is incomplete.
Isolation
This property ensures that multiple transactions can occur concurrently without leading to inconsistency of the database
state. Transactions occur independently without interference. Changes occurring in a particular transaction will not be
visible to any other transaction until that particular change in that transaction is written to memory or has been
committed. This property ensures that executing transactions concurrently will result in a state that is equivalent
to a state achieved if they were executed serially in some order.
Let X = 500, Y = 500.
Consider two transactions T and T''.
Suppose T has been executed till Read(Y) and then T'' starts. As a result, interleaving of operations takes
place, due to which T'' reads the correct value of X but an incorrect value of Y, and the sum computed by
T'': (X+Y = 50,000 + 500 = 50,500)
is thus not consistent with the sum at the end of transaction
T: (X+Y = 50,000 + 450 = 50,450).
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take place in
isolation and changes should be visible only after they have been made to the main memory.
Durability:
This property ensures that once the transaction has completed execution, the updates and modifications to the database
are written to disk and persist even if a system failure occurs. These updates now become permanent
and are stored in non-volatile memory. The effects of the transaction, thus, are never lost.
The ACID properties, in totality, provide a mechanism to ensure correctness and consistency of a database in
a way such that each transaction is a group of operations that acts as a single unit, produces consistent results,
acts in isolation from other operations, and makes updates that are durably stored.
Indexing in Databases
Indexing is a way to optimize performance of a database by minimizing the number of disk accesses
required when a query is processed.
An index or database index is a data structure which is used to quickly locate and access the data in a
database table.
The first column is the Search key that contains a copy of the primary key or candidate key of the table. These
values are stored in sorted order so that the corresponding data can be accessed quickly (Note that the data
may or may not be stored in sorted order).
The second column is the Data Reference which contains a set of pointers holding the address of the disk
block where that particular key value can be found.
There is no comparison between both the techniques, it depends on the database application on which
it is being applied.
Indexing Methods
Ordered Indices
The indices are usually sorted so that the searching is faster. The indices which are sorted are known
as ordered indices.
If the search key of any index specifies same order as the sequential order of the file, it is known as
primary index or clustering index.
Note: The search key of a primary index is usually the primary key, but it is not necessarily so.
If the search key of any index specifies an order different from the sequential order of the file, it is
called the secondary index or non-clustering index.
Clustered Indexing
Clustering index is defined on an ordered data file. The data file is ordered on a non-key field. In some
cases, the index is created on non-primary key columns which may not be unique for each record. In
such cases, in order to identify the records faster, we will group two or more columns together to get
the unique values and create index out of them. This method is known as clustering index. Basically,
records with similar characteristics are grouped together and indexes are created for these groups.
For example, students studying in each semester are grouped together. i.e. 1st Semester students, 2nd
semester students, 3rd semester students etc are grouped.
Primary Index
In this case, the data is sorted according to the search key. It induces sequential file organisation.
In this case, the primary key of the database table is used to create the index. As primary keys are unique and
are stored in sorted manner, the performance of searching operation is quite efficient. The primary index is
classified into two types : Dense Index and Sparse Index.
(I) Dense Index :
For every search key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with that search key value.
(II) Sparse Index :
The index record appears only for a few items in the data file. Each item points to a block as shown.
To locate a record, we find the index record with the largest search key value less than or equal to the search
key value we are looking for.
We start at that record pointed to by the index record, and proceed along the pointers in the file (that is,
sequentially) until we find the desired record.
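The sparse-index lookup just described can be sketched in plain Python. The data file is modelled as a list of sorted blocks with one index entry per block; the block contents, the names blocks, sparse_index and lookup, and the block size of three records are hypothetical.

```python
from bisect import bisect_right

# Data file: records sorted by search key and stored in fixed-size blocks.
blocks = [
    [(10, "rec10"), (20, "rec20"), (30, "rec30")],
    [(40, "rec40"), (50, "rec50"), (60, "rec60")],
    [(70, "rec70"), (80, "rec80"), (90, "rec90")],
]

# Sparse index: one entry per block (first key in the block, block number).
sparse_index = [(block[0][0], number) for number, block in enumerate(blocks)]
index_keys = [key for key, _ in sparse_index]

def lookup(search_key):
    """Find a record using the sparse index, or return None."""
    # Index entry with the largest key less than or equal to search_key.
    position = bisect_right(index_keys, search_key) - 1
    if position < 0:
        return None  # smaller than every key in the file
    block_number = sparse_index[position][1]
    # Scan sequentially from the start of that block.
    for key, record in blocks[block_number]:
        if key == search_key:
            return record
        if key > search_key:
            break
    return None

print(lookup(50), lookup(55), lookup(5))
```

A dense index would instead hold one entry for every search key value, trading extra index space for a lookup that needs no sequential scan.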
Q: What are the differences between Relational Algebra and Relational Calculus?
BASIS FOR
COMPARISON   Relational Algebra                  Relational Calculus
Order        Describes the order in which        Does not specify the order of
             operations have to be performed.    operations.
Domain       Is not domain dependent.            Can be domain dependent.
BASIS FOR
COMPARISON          Sequential                          Hash
Method of storing   Stored as they come or              Stored at the hash address generated
                    sorted as they come
Types               Pile file and sorted file method    Static and dynamic hashing
Q: Discuss different types of anomalies.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the
same in two rows or the data will become inconsistent. If somehow, the correct address gets updated
in one department but not in other then as per the database, Rick would be having two different
addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and currently
not assigned to any department. Then we would not be able to insert the data into the table if the emp_dept
field doesn't allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then
deleting the rows that are having emp_dept as D890 would also delete the information of employee
Maggie since she is assigned only to this department.
SELECT column-names
FROM table-name T1 JOIN table-name T2
ON join-condition
WHERE condition
Q: Why "bcnf is stronger than 3nf" explain with the help of an example.
A relation R is in 3NF if and only if every dependency A->B satisfied by R meets at least ONE of the
following criteria:
1. A->B is trivial (i.e. B is a subset of A).
2. A is a superkey.
3. B is a subset of a candidate key.
BCNF doesn't permit the third of these options. Therefore BCNF is said to be stronger than 3NF because
3NF permits some dependencies which BCNF does not.
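As an example, consider a relation R(A, B, C) with the dependencies AB->C and C->B; this relation is an assumption chosen for illustration, not one taken from the notes. Its candidate keys are AB and AC, so C->B passes the third criterion (B is prime) although C is not a superkey. The sketch below, in plain Python with hypothetical helper names, checks the listed dependencies against both definitions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if closure(attrs, fds) == set(schema) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

def is_bcnf(schema, fds):
    # Every listed FD must be trivial or have a superkey on the left.
    return all(
        set(rhs) <= set(lhs) or closure(lhs, fds) == set(schema)
        for lhs, rhs in fds
    )

def is_3nf(schema, fds):
    # As BCNF, but an FD is also allowed if its right side is made of prime attributes.
    prime = set().union(*candidate_keys(schema, fds))
    return all(
        set(rhs) <= set(lhs)
        or closure(lhs, fds) == set(schema)
        or (set(rhs) - set(lhs)) <= prime
        for lhs, rhs in fds
    )

schema = "ABC"
fds = [("AB", "C"), ("C", "B")]
in_3nf = is_3nf(schema, fds)
in_bcnf = is_bcnf(schema, fds)
print(candidate_keys(schema, fds), in_3nf, in_bcnf)
```

The relation is in 3NF but not in BCNF, which is exactly the kind of dependency that 3NF permits and BCNF rejects.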
Why is SQL called a structured and a non-procedural language?
SQL is a declarative language in which the expected result or operation is given without the specific details
about how to accomplish the task. The steps required to execute SQL statements are handled transparently
by the SQL database. Sometimes SQL is characterized as non-procedural because procedural languages
generally require the details of the operations to be specified, such as opening and closing tables, loading
and searching indexes, or flushing buffers and writing data to filesystems. Therefore, SQL is considered to
be designed at a higher conceptual level of operation than procedural languages because the lower level
logical and physical operations aren't specified and are determined by the SQL engine or server process that
executes it.
Q: What are the advantages of hash file organization ?
Records need not be sorted after any transaction. Hence the effort of sorting is saved in this
method.
Since the block address is given by the hash function, accessing any record is very fast. Similarly,
updating or deleting a record is also very quick.
This method can handle multiple transactions as each record is independent of the others, i.e., since there
is no dependency on storage location for each record, multiple records can be accessed at the same
time.
It is suitable for online transaction systems like online banking, ticket booking systems, etc.
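The idea behind these advantages can be sketched in plain Python: a hash function maps the key directly to a bucket (standing in for a disk block), so no sorting or scan of the whole file is needed. The bucket count, the modulo hash function and the sample keys are assumptions for illustration.

```python
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]  # each bucket stands in for a disk block

def bucket_of(key):
    """Hash function: maps a key straight to its bucket address."""
    return key % NUM_BUCKETS

def insert(key, record):
    buckets[bucket_of(key)].append((key, record))

def find(key):
    # Only the one bucket given by the hash function is searched.
    for stored_key, record in buckets[bucket_of(key)]:
        if stored_key == key:
            return record
    return None

def delete(key):
    bucket = buckets[bucket_of(key)]
    for position, (stored_key, _) in enumerate(bucket):
        if stored_key == key:
            del bucket[position]
            return True
    return False

insert(101, "record A")
insert(105, "record B")   # 101 and 105 hash to the same bucket
insert(202, "record C")

found = find(105)
deleted = delete(101)
missing = find(101)
print(found, deleted, missing)
```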
The main purpose of a data model is to give an idea of how the final system or software will look after
development is completed.
1. Hierarchical Model
The hierarchical model was implemented in the Information Management System (IMS), developed by IBM and
North American Rockwell.
It represents the data in a hierarchical tree structure.
This model is the first DBMS model.
In this model, the data is organized hierarchically.
It uses pointers to navigate between the stored data.
2. Relational Model
Relational model is based on first-order predicate logic.
This model was first proposed by E. F. Codd.
It represents data as relations or tables.
Relational database simplifies the database structure by making use of tables and columns.
In this diagram,
Rectangle represents the entities. Eg. Doctor and Patient.
Ellipse represents the attributes. Eg. DocId, Dname, PId, Pname. Attributes describe each entity and become a
major part of the data stored in the database.
Diamond represents the relationship in ER diagrams. Eg. Doctor diagnoses the Patient.
What do you mean by Lossless Decomposition ?
Lossless Decomposition :
Decomposition is lossless if it is feasible to reconstruct relation R from the decomposed tables using joins. This
is the preferred choice. No information is lost from the relation when it is decomposed. The join
results in the same original relation.
(Tables not reproduced: <EmpInfo>, <EmpDetails>, <DeptDetails>)
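A lossless decomposition can be checked on sample data by projecting the relation, joining the projections back, and comparing with the original. The sketch below is plain Python; the column names and the two sample rows are hypothetical and are not the contents of the tables named above.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in seen]

def natural_join(left, right):
    """Join on all attributes the two inputs share (cross product if none)."""
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]

def as_set(rows):
    return {tuple(sorted(row.items())) for row in rows}

emp_info = [
    {"emp_id": 1, "emp_name": "Employee A", "dept_id": "D1", "dept_name": "Operations"},
    {"emp_id": 2, "emp_name": "Employee B", "dept_id": "D2", "dept_name": "HR"},
]

# Decomposition that keeps dept_id in both tables as the join attribute.
emp_details = project(emp_info, ["emp_id", "emp_name", "dept_id"])
dept_details = project(emp_info, ["dept_id", "dept_name"])
lossless = as_set(natural_join(emp_details, dept_details)) == as_set(emp_info)

# Decomposition with no common attribute: the join produces spurious rows.
emp_only = project(emp_info, ["emp_id", "emp_name"])
lossy_join = natural_join(emp_only, dept_details)
lossy_matches = as_set(lossy_join) == as_set(emp_info)

print(lossless, len(lossy_join), lossy_matches)
```

The first decomposition reproduces the original rows exactly, while the second yields four rows instead of two, so the original relation cannot be recovered from it.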
Non-Clustered Indexing
A non-clustered index just tells us where the data lies, i.e. it gives us a list of virtual pointers or references to
the location where the data is actually stored. Data is not physically stored in the order of the index. Instead,
data is present in leaf nodes. For example, consider the contents page of a book. Each entry gives us the page number or
location of the information stored. The actual data here (the information on each page of the book) is not organised,
but we have an ordered reference (the contents page) to where the data actually lies.
It requires more time as compared to a clustered index because some extra work is done in
order to extract the data by further following the pointer. In the case of a clustered index, the data is directly
present in front of the index.
Secondary Index
It is used to optimize query processing and access records in a database with some information other
than the usual search key (primary key). Here, two levels of indexing are used in order to reduce the
mapping size of the first level. Initially, for the first level, a large range of numbers is
selected so that the mapping size is small. Each range is then divided into further sub-ranges.
For quick memory access, the first level is stored in primary memory. The actual physical
location of the data is determined by the second mapping level.
DDL
Data Definition Language (DDL) statements are used to define the database structure or schema.
Some examples:
DML
Data Manipulation Language (DML) statements are used for managing data within schema objects.
Some examples:
DCL
TCL
Transaction Control (TCL) statements are used to manage the changes made by DML statements. It
allows statements to be grouped together into logical transactions.
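The statement categories above can be illustrated with a short sketch, assuming Python's built-in sqlite3 module; the student table and its columns are hypothetical. DCL statements such as GRANT and REVOKE are not shown because SQLite does not support them.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are exactly the BEGIN/COMMIT/ROLLBACK we issue.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define and change the structure.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN semester INTEGER")

# DML: manage the data inside the structure.
conn.execute("INSERT INTO student VALUES (1, 'Student A', 1)")
conn.execute("UPDATE student SET semester = 2 WHERE roll_no = 1")

# TCL: group DML statements into a transaction and undo it.
conn.execute("BEGIN")
conn.execute("DELETE FROM student WHERE roll_no = 1")
conn.execute("ROLLBACK")
count_after_rollback = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# TCL: the same DELETE made permanent with COMMIT.
conn.execute("BEGIN")
conn.execute("DELETE FROM student WHERE roll_no = 1")
conn.execute("COMMIT")
count_after_commit = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# DDL: remove the structure itself.
conn.execute("DROP TABLE student")

print(count_after_rollback, count_after_commit)
```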
DELETE command: The DELETE command is used to delete rows from a table based on the condition that we
provide in a WHERE clause.
o The DELETE command deletes only those rows which are specified with the WHERE clause.
o The DELETE command can be rolled back.
o The DELETE command maintains a log, which is why it is slow.
o DELETE uses row locks while performing the delete.
Ans: Normalization is the process of efficiently organizing data in a database. There are two goals of the
normalization process: eliminating redundant data (for example, storing the same data in more than one
table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are
worthy goals as they reduce the amount of space a database consumes and ensure that data is logically
stored.
Why is it required? Normalization reduces redundancy. Redundancy is the unnecessary repetition
of data. It can cause problems with storage, retrieval and updating of data. Redundancy can lead to:
Inconsistencies: errors are more likely to occur when facts are repeated.
Update anomalies: inserting, modifying and deleting data may cause inconsistencies. Inconsistency occurs
when we update or delete data in one relation while forgetting to make the corresponding changes in other
relations.
During the process of normalization, you can identify dependencies which can cause problems when deleting or
updating. Normalization also helps to simplify the structure of the tables. A fully normalized record consists
of:
A primary key that identifies the entity.
A set of attributes that describe the entity.
Ans: A log file is a recording of everything that goes in and out of a particular server. It is a concept much
like the black box of an airplane that records everything going on with the plane in the event of a problem.
The information is frequently recorded chronologically, and is located in the root directory, or occasionally
in a secondary folder, depending on how it is set up with the server. The only person who has regular access
to the log files of a server is the server administrator, and a log file is generally password protected, so that
the server administrator has a record of everyone and everything that wants to look at the log files for a
specific server.
What is the domain of an Attribute?
Ans: Attribute domains are rules that describe the legal values of a field type, providing a method for
enforcing data integrity. Attribute domains are used to constrain the values allowed in any particular
attribute for a table or feature class. If the features in a feature class or nonspatial objects in a table have been
grouped into subtypes, different attribute domains can be assigned to each of the subtypes. A domain is a
declaration of acceptable attribute values. Whenever a domain is associated with an attribute field, only the
values within that domain are valid for the field. In other words, the field will not accept a value that is not in
that domain. Using domains helps ensure data integrity by limiting the choice of values for a particular field.
Ans: Metadata is data about data. An item of metadata may describe an individual data item or a collection
of data items. Metadata is used to facilitate the understanding, use and management of data. Metadata
defines the nature of the data stored in the database. Metadata consists of pre-determined values that
describe various attributes of a given table or a relation. Thus a part of the database which contains
information about data stored in the database is called as metadata.
Ans: Derived attributes are those attributes which are based on and are derived from the attributes of
another table or a relation. The derived attributes may contain new values or the values from the base table
from which they were derived. Derived attributes are effectively read-only since there is no place to write them
back to. Also, because derived attributes don't directly point to anything in the database, they cannot be used
as primary keys. For example, a derived attribute for a person's full name may be derived from the person's
first name and last name.
Group by clause is used to apply aggregate functions to a set of tuples. The attributes given in the group by
clause are used to form groups. Tuples with the same value on all attributes in the group by clause are placed
in one group.
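A minimal sketch of grouping, assuming Python's built-in sqlite3 module; the emp table and its three sample rows are made up for illustration. Rows with the same dept value form one group, and the aggregate functions are applied once per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO emp VALUES ('A', 'D1', 100), ('B', 'D1', 300), ('C', 'D2', 200);
""")

# One output row per distinct dept value.
rows = conn.execute("""
    SELECT dept, COUNT(*), AVG(salary)
    FROM emp
    GROUP BY dept
    ORDER BY dept
""").fetchall()

print(rows)
```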