CSC 203.1 Note
Why Database:
To overcome the limitations of a file system, a new approach was required, and so the
database approach emerged. A database is a persistent collection of logically related
data. The initial attempts were to provide a centralized collection of data. A database has
a self-describing nature: it contains not only the data but also a description of that data.
This allows the data of an organization to be shared and integrated in a single database.
A small database can be handled manually, but a large database with multiple users is
difficult to maintain by hand. In that case a computerized database is useful.
COMPUTER SCIENCE UNIT
Advantages of DBMS:
Reduction of redundancies:
Centralized control of data by the DBA avoids unnecessary duplication of data and
effectively reduces the total amount of data storage required. It also eliminates the
inconsistencies that tend to be present in redundant data files.
Sharing of Data:
A database allows the sharing of data under its control by any number of application
programs or users.
Data Security:
The DBA, who has the ultimate responsibility for the data in the DBMS, can ensure that
proper access procedures are followed, including proper authentication for access to the
database system and additional checks before permitting access to sensitive data.
Conflict Resolution:
The DBA resolves the conflicting requirements of various users and applications. The DBA
chooses the best file structure and access method to get optimal performance for the
applications.
Data Independence:
Data independence is usually considered from two points of view: physical data
independence and logical data independence.
Physical data independence allows changes in the physical storage devices or
organization of the files to be made without requiring changes in the conceptual view or
any of the external views, and hence in the application programs using the database.
Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas or any application program.
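Both kinds of data independence can be illustrated with a small sketch. It assumes
Python's built-in sqlite3 module as the DBMS; the student table, its columns and the
sample rows are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, branch TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "CSE"), (2, "Ravi", "ECE"), (3, "Meena", "CSE")],
)

# The application's query names columns only, never storage details.
QUERY = "SELECT name FROM student WHERE branch = ? ORDER BY roll_no"
before = conn.execute(QUERY, ("CSE",)).fetchall()

# Physical change: add an index (a new access path). The query text is unchanged.
conn.execute("CREATE INDEX idx_student_branch ON student (branch)")
after_index = conn.execute(QUERY, ("CSE",)).fetchall()

# Logical change: add a column to the schema. The existing query still works.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_column = conn.execute(QUERY, ("CSE",)).fetchall()

print(before, after_index, after_column)
```

The same query string is used before and after each change, which is the point: the
application program does not have to be rewritten.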
Disadvantages of DBMS:
1. The cost of DBMS software and hardware (networking installation) is high.
2. The DBMS adds processing overhead to implement security, integrity and
sharing of the data.
3. Centralized database control
4. Setup of the database system requires more knowledge, money, skills, and time.
5. The complexity of the database may result in poor performance.
A subschema is a schema derived from an existing schema as per the user requirement.
More than one subschema may be created for a single conceptual schema.
[Figure: three-level architecture, showing several views at the external level and the internal level]
A database management system that provides three levels of data abstraction is said to
follow the three-level architecture.
⚫ External level
⚫ Conceptual level
⚫ Internal level
External Level:
The external level is the highest level of database abstraction. At this level, many
views are defined for different users' requirements. A view describes only a subset of
the database. Any number of user views may exist for a given global schema (conceptual
schema).
For example, each student has a different view of the time table. The view of a student of
BTech (CSE) is different from the view of a student of BTech (ECE). Thus this level of
abstraction is concerned with different categories of users.
Each external view is described by means of a schema called a subschema.
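The time table example can be sketched with SQL views, one per category of user. This is
only an illustration under stated assumptions: it uses Python's built-in sqlite3 module,
and the timetable table, the view names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one global table holding the whole time table.
conn.execute("CREATE TABLE timetable (branch TEXT, day TEXT, slot INTEGER, subject TEXT)")
conn.executemany(
    "INSERT INTO timetable VALUES (?, ?, ?, ?)",
    [
        ("CSE", "Mon", 1, "DBMS"),
        ("CSE", "Mon", 2, "Algorithms"),
        ("ECE", "Mon", 1, "Signals"),
    ],
)

# External level: each view (subschema) exposes only a subset of the database.
conn.execute(
    "CREATE VIEW cse_timetable AS "
    "SELECT day, slot, subject FROM timetable WHERE branch = 'CSE'"
)
conn.execute(
    "CREATE VIEW ece_timetable AS "
    "SELECT day, slot, subject FROM timetable WHERE branch = 'ECE'"
)

cse_rows = conn.execute("SELECT subject FROM cse_timetable ORDER BY slot").fetchall()
ece_rows = conn.execute("SELECT subject FROM ece_timetable ORDER BY slot").fetchall()
print(cse_rows, ece_rows)
```

A CSE student querying cse_timetable never sees the ECE rows, although both views are
derived from the same stored table.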
Internal level:
It is the lowest level of abstraction closest to the physical storage method used. It indicates
how the data will be stored and describes the data structures and access methods to be
used by the database. The internal view is expressed by internal schema.
The following aspects are considered at this level:
1. Storage allocation, e.g. B-tree, hashing
2. Access paths, e.g. specification of primary and secondary keys, indexes etc.
3. Miscellaneous, e.g. data compression and encryption techniques, optimization of
the internal structures.
Database Users:
Naive Users:
Users who need not be aware of the presence of the database system or any other system
supporting their usage are considered naive users. A user of an automatic teller machine
falls into this category.
Online Users:
These are users who may communicate with the database directly via an online terminal
or indirectly via a user interface and application program. These users are aware of the
database system and also know the data manipulation language of the system.
Database language:
1) Data definition language (DDL):
DDL is used to define database objects. The conceptual schema is specified by a set
of definitions expressed in this language. It also gives some details about how to
implement this schema on the physical devices used to store the data. This
definition includes all the entity sets, their associated attributes and their
relationships. The result of DDL statements is a set of tables that are stored in
a special file called the data dictionary.
2) Data Manipulation Language (DML):
A DML is a language that enables users to access or manipulate data stored in the
database. Data manipulation involves retrieval of data from the database, insertion
of new data into the database, and deletion or modification of existing data.
There are basically two types of DML:
⚫ Procedural: requires a user to specify what data is needed and how to
get it.
⚫ Non-procedural: requires a user to specify what data is needed without
specifying how to get it.
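The difference between DDL and DML can be seen in a short sketch. It assumes SQL as the
language and Python's built-in sqlite3 module as the DBMS; the book table and its rows
are hypothetical. SQLite records table definitions in a catalog table named
sqlite_master, which plays the role of the data dictionary in this sketch. SQL is a
non-procedural DML: the statements say what data is wanted, not how to fetch it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a database object.
conn.execute(
    "CREATE TABLE book (bookcode INTEGER PRIMARY KEY, booktitle TEXT NOT NULL, price REAL)"
)

# The definition is recorded in SQLite's catalog table sqlite_master.
catalog = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

# DML: insertion, modification, deletion and retrieval of data.
conn.execute("INSERT INTO book VALUES (1, 'Database Basics', 250.0)")
conn.execute("INSERT INTO book VALUES (2, 'File Systems', 180.0)")
conn.execute("UPDATE book SET price = 200.0 WHERE bookcode = 2")
conn.execute("DELETE FROM book WHERE bookcode = 1")
remaining = conn.execute("SELECT bookcode, booktitle, price FROM book").fetchall()

print(catalog, remaining)
```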
ELEMENTS OF DBMS
DML Pre-Compiler:
It converts DML statements embedded in an application program into normal procedure
calls in the host language. The pre-compiler must interact with the query processor in
order to generate the appropriate code.
DDL Compiler:
The DDL compiler converts the data definition statements into a set of tables. These tables
contain information concerning the database and are in a form that can be used by other
components of the DBMS.
File Manager:
The file manager manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
Database Manager:
A database manager is a program module which provides the interface between the low
level data stored in the database and the application programs and queries submitted to
the system.
The responsibilities of the database manager are:
1. Interaction with File Manager: The database manager is responsible for
the actual storing, retrieving and updating of data in the database.
2. Integrity Enforcement: The data values stored in the database must satisfy
certain constraints (e.g. the age of a person can't be less than zero). These
constraints are specified by the DBA. The database manager checks the constraints
and stores the data in the database only if they are satisfied.
3. Security Enforcement: The database manager enforces the security measures that
protect the database from unauthorized users.
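Integrity enforcement can be sketched with the age constraint mentioned above. The
sketch assumes Python's built-in sqlite3 module as the DBMS; the person table and the
sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The constraint is declared once, together with the table definition.
conn.execute("CREATE TABLE person (name TEXT NOT NULL, age INTEGER CHECK (age >= 0))")

# A row that satisfies the constraint is stored.
conn.execute("INSERT INTO person VALUES ('Asha', 30)")

# A row that violates the constraint is rejected by the DBMS itself.
rejected = False
try:
    conn.execute("INSERT INTO person VALUES ('Ravi', -5)")
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT name, age FROM person").fetchall()
print(rejected, stored)
```

The application does not need its own check: the constraint is enforced for every
program that writes to the table.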
[Figure: structure of a DBMS, showing the database manager, the file manager, the data files and the data dictionary]
ER-MODEL
Data Model:
The data model describes the structure of a database. It is a collection of conceptual tools
for describing data, data relationships and consistency constraints. There are various
types of data models, such as:
1. Object based logical model
2. Record based logical model
3. Physical model
Basic Concepts:
The E-R data model employs three basic notions: entity sets, relationship sets and
attributes.
Entity Sets: An entity is a “thing” or “object” in the real world that is distinguishable from
all other objects. For example, each person in an enterprise is an entity. An entity has a set
of properties, and the values for some set of properties may uniquely identify an entity.
For example, BOOK is an entity and its properties (called attributes) are bookcode,
booktitle, price etc.
An entity set is a set of entities of the same type that share the same properties, or
attributes. For example, the set of all persons who are customers at a given bank is an
entity set.
Attributes:
An entity is represented by a set of attributes. Attributes are descriptive properties
possessed by each member of an entity set (entity).
For example, Customer is an entity and its attributes are customerid, customername, custaddress etc.
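As a rough illustration of these notions outside any DBMS, an entity can be modelled as
a record and an entity set as a set of such records. The sketch below uses a Python
dataclass; the class name, the field names and the sample values are assumptions made
for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """One entity; each field is an attribute."""
    customerid: int
    customername: str
    custaddress: str


# An entity set: entities of the same type that share the same attributes.
customer_set = {
    Customer(1, "Asha", "12 Lake Road"),
    Customer(2, "Ravi", "7 Hill Street"),
}

# The value of customerid distinguishes each entity from all the others.
ids = sorted(c.customerid for c in customer_set)
print(ids)
```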
Relationship:
A relationship is an association among entities. Relationships are usually expressed as
verbs such as assign, associate, track etc. A relationship provides useful information
that could not be easily discerned from the entity types alone.
In the above diagram, borrow denotes the relationship between the two elements customer
and loan.
Cardinality Ratio:
Cardinality ratios express the number of entities to which another entity can be associated
via a relationship set. It describes the number of relationship instances in which an entity
can participate. Types of cardinality ratio:
1. One to One:
An entity in A is associated with at most one entity in B, and an entity in B is associated
with at most one entity in A.
E.g. the relationship between college and principal
2. One to Many:
An entity in A is associated with any number of entities in B. An entity in B is associated
with at most one entity in A.
E.g. the relationship between department and faculty
[Figure: Department (1) Contains Faculty (M)]
3. Many to One:
An entity in A is associated with at most one entity in B. An entity in B is associated with
any number of entities in A.
[Figure: Employee (M) Works in Department (1)]
4. Many to Many:
Entities in A and B are associated with any number of entities from each other.
[Figure: customer (M) owns account (M)]
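One common way to realize these cardinality ratios in tables is sketched below. It
assumes Python's built-in sqlite3 module; all table names, column names and sample rows
are hypothetical. A UNIQUE foreign key gives one to one, a plain foreign key gives one to
many (or many to one, read from the other side), and a separate table with a composite
key gives many to many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE college (college_id INTEGER PRIMARY KEY, name TEXT);
-- One to one: UNIQUE allows at most one principal per college.
CREATE TABLE principal (
    principal_id INTEGER PRIMARY KEY,
    name TEXT,
    college_id INTEGER UNIQUE REFERENCES college(college_id)
);

CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
-- One to many: each faculty row refers to one department,
-- and many faculty rows may refer to the same department.
CREATE TABLE faculty (
    faculty_id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES department(dept_id)
);

CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE account (account_no INTEGER PRIMARY KEY);
-- Many to many: one row per (customer, account) pair.
CREATE TABLE owns (
    customer_id INTEGER REFERENCES customer(customer_id),
    account_no INTEGER REFERENCES account(account_no),
    PRIMARY KEY (customer_id, account_no)
);
""")

conn.execute("INSERT INTO college VALUES (1, 'City College')")
conn.execute("INSERT INTO principal VALUES (1, 'Dr. Rao', 1)")
try:
    conn.execute("INSERT INTO principal VALUES (2, 'Dr. Sen', 1)")
    second_principal_allowed = True
except sqlite3.IntegrityError:
    second_principal_allowed = False

conn.execute("INSERT INTO department VALUES (10, 'Computer Science')")
conn.executemany("INSERT INTO faculty VALUES (?, ?, ?)", [(1, "A", 10), (2, "B", 10)])
faculty_in_dept = conn.execute(
    "SELECT COUNT(*) FROM faculty WHERE dept_id = 10"
).fetchone()[0]

conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO account VALUES (?)", [(100,), (200,)])
conn.executemany("INSERT INTO owns VALUES (?, ?)", [(1, 100), (1, 200), (2, 100)])
owners_of_100 = conn.execute(
    "SELECT COUNT(*) FROM owns WHERE account_no = 100"
).fetchone()[0]
accounts_of_1 = conn.execute(
    "SELECT COUNT(*) FROM owns WHERE customer_id = 1"
).fetchone()[0]

print(second_principal_allowed, faculty_in_dept, owners_of_100, accounts_of_1)
```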
Participation Constraints:
The participation constraints specify the number of instances of an entity that can
participate in a relationship set.
a) Total: When all the entities from an entity set participate in a relationship type,
it is called total participation. For example, the participation of the entity set student
in the relationship set ‘opts’ is said to be total because every enrolled student must opt
for a course.
b) Partial: When it is not necessary for all the entities from an entity set to participate
in a relationship type, it is called partial participation. For example, the participation of
the entity set student in ‘represents’ is partial, since not every student in a class is a class
representative.
Weak Entity:
Entity types that do not contain any key attribute, and hence cannot be identified
independently, are called weak entity types. A weak entity can be identified uniquely
only by considering some of its attributes in conjunction with the primary key attribute
of another entity, which is called the identifying owner entity.
Generally, a partial key is attached to a weak entity type and is used for unique
identification of the weak entities related to a particular owner entity. The following
restrictions must hold:
⚫ The owner entity set and the weak entity set must participate in a one-to-many
relationship set. This relationship set is called the identifying relationship set of the
weak entity set.
⚫ The weak entity set must have total participation in the identifying relationship.
Example:
Consider the entity type Dependent related to the Employee entity, which is used to keep
track of the dependents of each employee. The attributes of Dependent are: name,
birthdate, sex and relationship. Each employee entity is said to own the dependent
entities that are related to it. However, note that the ‘Dependent’ entity does not exist on
its own; it is dependent on the Employee entity.
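The Employee and Dependent example can be sketched as two tables. This is one possible
mapping, not the only one. It assumes Python's built-in sqlite3 module; empno as the key
of employee, the dependent's name as the partial key, and the sample rows are all
assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (empno INTEGER PRIMARY KEY, name TEXT);

-- Weak entity: identified by the owner's key plus its own partial key (name).
-- NOT NULL on empno expresses total participation in the identifying relationship.
CREATE TABLE dependent (
    empno INTEGER NOT NULL REFERENCES employee(empno) ON DELETE CASCADE,
    name TEXT NOT NULL,
    birthdate TEXT,
    sex TEXT,
    relationship TEXT,
    PRIMARY KEY (empno, name)
);
""")

conn.execute("INSERT INTO employee VALUES (1, 'Asha')")
conn.execute("INSERT INTO employee VALUES (2, 'Ravi')")

# The same dependent name may appear under different owners.
conn.execute("INSERT INTO dependent VALUES (1, 'Kiran', '2010-05-01', 'F', 'daughter')")
conn.execute("INSERT INTO dependent VALUES (2, 'Kiran', '2012-08-15', 'M', 'son')")

# A dependent does not exist on its own: removing the owner removes it too.
conn.execute("DELETE FROM employee WHERE empno = 1")
remaining = conn.execute("SELECT empno, name FROM dependent").fetchall()
print(remaining)
```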
ER-DIAGRAM:
The overall logical structure of a database can be expressed graphically using the
ER model, with the help of an ER diagram.
[Figure: ER diagram symbols for entity, weak entity, attribute, composite attribute and relationship, with 1:m and 1:1 cardinality labels]
Consider a university database for the scheduling of class rooms for final exams. This
database could be modeled as the single entity set exam, with attributes course-
[Figure: generalization and specialization of the employee entity through "Is" relationships]
Relation schemas shown with the figure:
EMPLOYEE(empno, name, dob)
FULL_TIME_EMPLOYEE(empno, salary)
PART_TIME_EMPLOYEE(empno, type)
Faculty(empno, degree, interest)
Staff(empno, hour-rate)
Teaching(empno, stipend)
Aggregation:
Aggregation is the process of compiling information on an object, thereby abstracting a
higher level object. For example, the entity person is derived by aggregating the
characteristics name, address and ssn. Another form of aggregation is abstracting a
relationship between objects and viewing the relationship as an object.
[Figure: aggregation example with the entities Employee, Branch, Job and Manager and the relationships Works on and Manages]
RELATIONAL MODEL
The relational model is a simple model in which a database is represented as a collection
of “relations”, where each relation is represented by a two-dimensional table.
The relational model was proposed by E. F. Codd of IBM in 1970. The basic concept in
the relational model is that of a relation.
Properties:
o It is column homogeneous. In other words, in any given column of a table, all items
are of the same kind.
o Each item is a simple number or a character string. That is, a table must be in first
normal form.
o All rows of a table are distinct. The ordering of rows within a table is immaterial.
o The columns of a table are assigned distinct names, and the ordering of these
columns is immaterial.
Tuple:
Each row in a table represents a record and is called a tuple. A row of a table with ‘n’
attributes is called an n-tuple.
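The properties of distinct rows and immaterial row order amount to treating a relation
as a set of tuples. A minimal sketch in plain Python, with a hypothetical Student
relation and made-up rows:

```python
from typing import NamedTuple


class Student(NamedTuple):
    """A 2-tuple: one row with two named attributes."""
    roll_no: int
    name: str


# A relation as a set of tuples: duplicates collapse and order does not matter.
relation_a = {Student(1, "Asha"), Student(2, "Ravi")}
relation_b = {Student(name="Ravi", roll_no=2), Student(1, "Asha"), Student(1, "Asha")}

same_relation = relation_a == relation_b
arity = len(Student._fields)  # the 'n' of the n-tuple
print(same_relation, arity)
```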
Attributes:
Domain:
A domain is the set of values that can be given to an attribute. So, every attribute in a
table has a specific domain. A value outside its domain cannot be assigned to an
attribute.
Keys:
Super key:
A super key is an attribute or a set of attributes used to identify the records uniquely in a
relation. For example, customer-id, (cname, customer-id), (cname,telno)
Candidate key:
Super keys of a relation can contain extra attributes. Candidate keys are minimal super
keys, i.e. such a key contains no extraneous attribute. An attribute is called extraneous
if, even after it is removed from the key, the remaining attributes still have the
properties of a key (that is, they still identify every row of the table).
In a relation R, a candidate key for R is a subset of the set of attributes of R which has
the following properties:
⚫ Uniqueness: No two distinct tuples in R have the same values for the candidate key.
⚫ Irreducibility: No proper subset of the candidate key has the uniqueness property.
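The two properties can be checked mechanically on a sample of rows, as in the sketch
below. Note the limits of such a check: a key is a property of the relation as designed,
so a sample can only show that a set of attributes is not a key, never prove that it is
one. The helper functions and the customer rows are hypothetical; the attribute names
follow the super key example above.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """Uniqueness: no two rows have the same values for all of attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """Uniqueness plus irreducibility: no proper subset is itself a super key."""
    if not is_superkey(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_superkey(rows, subset):
                return False
    return True


# Sample rows: two customers share a name, and two share a telephone number.
customers = [
    {"customer-id": 1, "cname": "Asha", "telno": "555-0101"},
    {"customer-id": 2, "cname": "Ravi", "telno": "555-0101"},
    {"customer-id": 3, "cname": "Asha", "telno": "555-0102"},
]

for attrs in [("customer-id",), ("cname", "customer-id"), ("cname", "telno")]:
    print(attrs, is_superkey(customers, attrs), is_candidate_key(customers, attrs))
```

On these rows (cname, customer-id) is a super key but not a candidate key, because cname
is extraneous: customer-id alone is already unique.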