Unit I
Database system architecture
1
Syllabus
• Data Abstraction,
• Data Independence,
• Data Definition Language (DDL),
• Data Manipulation Language (DML).
• Data models:
– Entity-relationship model,
– network model,
– relational and
– object-oriented data models,
• integrity constraints,
• data manipulation
2
Introduction
3
What is data?
• A collection of facts from which conclusions
may be drawn.
• Data is often obtained as a result of recordings
or observations.
• Data is the plural form of datum.
• The temperature of the days is data.
4
Temperature of the days
• When this data is to be collected, a system or
person monitors the daily temperatures and
records them.
• Finally, when it is to be converted into
meaningful information, the patterns in the
temperatures are analyzed and a conclusion
about the temperature is arrived at.
• So information is obtained as a result of
analysis, communication, or investigation.
5
Properties of Data
• Data should be well organized.
• Data should be related.
• Data should be accessible in any order.
• Each data item should be stored the minimum
number of times.
6
What is a Database?
A database is a collection of related data that
contains information about an entity.
For example:
1. University database
2. Employee database
3. Student database
4. Airlines database
7
Properties of Database
• A database represents some aspect of the real
world, sometimes called the miniworld or the
universe of discourse (UoD).
• A database is a logically coherent collection of
data with some inherent meaning.
• A database is designed, built and populated
with data for a specific purpose.
8
What is a Database Management System
(DBMS)?
• A database management system (DBMS) is a collection of
programs that enables users to create and maintain a database. It
facilitates the definition, creation and manipulation of the
database.
• Definition – specifies only the structure of the database, not the
data. It involves specifying the data types, structures and
constraints for the data to be stored in the database.
• Creation – the inputting of actual data into the database. It
involves storing the data itself on some storage medium that is
controlled by the DBMS.
• Manipulation – includes functions such as update, insertion,
deletion, retrieval of specific data and generating reports from the
data.
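The three activities above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student table and its rows are assumptions chosen for the example, and SQLite stands in for any DBMS.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Definition: only the structure (data types, constraints) is specified.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER)"
)

# Creation: the actual data is stored under the control of the DBMS.
conn.executemany(
    "INSERT INTO student (roll_no, name, age) VALUES (?, ?, ?)",
    [(1, "Rakesh", 20), (2, "Nikhil", 21)],  # illustrative rows
)

# Manipulation: update, delete, and retrieve specific data.
conn.execute("UPDATE student SET age = 22 WHERE roll_no = 2")
conn.execute("DELETE FROM student WHERE roll_no = 1")
rows = conn.execute("SELECT roll_no, name, age FROM student").fetchall()
print(rows)
```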
9
Typical DBMS Functionality
• Define a database : in terms of data types,
structures and constraints
• Construct or Load the Database: on a
secondary storage medium
• Manipulating the database : querying,
generating reports, insertions, deletions and
modifications to its content
• Concurrent Processing and Sharing by a set of
users and programs: yet, keeping all data valid
and consistent
10
Typical DBMS Functionality
Other features:
– Protection or Security measures to prevent
unauthorized access
– “Active” processing to take internal actions
on data
– Presentation and Visualization of data
11
Database System
• The database and the DBMS together are called the
database system.
• Database systems are designed to manage large bodies of
information.
• It involves both defining structures for storage of
information & providing mechanisms for the manipulation
of information.
• Database system must ensure the safety of the information
stored.
• Metadata – the data about the data. It contains the
structure of the database as well as the physical location of
the database.
12
A simplified database system
environment
13
Database System Applications
• Banking: for customer information, accounts & loans, and banking
transactions.
• Airlines: for reservations & schedule information.
• Universities: for student information, course registration and grades.
• Credit card transactions: for purchases on credit cards & generation of
monthly statements.
• Telecommunication: for keeping records of calls made, generating
monthly bills, maintaining balances, information about communication
networks.
• Finance: for storing information about holdings, sales & purchases of
financial instruments such as stocks & bonds.
• Sales: for customer, product and purchase information.
• Manufacturing: for management of supply chain & for tracking
production of items in factories.
• Human resources: for information about employees, salaries, payroll
taxes and benefits.
14
Database system
• Oracle
• SQL Server
• DB2
• Sybase
• MySQL
• PostgreSQL
• Teradata
• Informix
• Ingres
• Amazon’s SimpleDB
15
Traditional File systems
16
Traditional File systems
17
Traditional file system
• For example, one user, the grade reporting office, may keep a
file on students and their grades. Programs to print a student’s
transcript and to enter new grades into the file are
implemented.
• A second user, the accounting office, may keep track of
students’ fees and their payments.
• Although both users are interested in data about students,
each user maintains separate files—and programs to
manipulate these files—because each requires some data not
available from the other user’s files.
• This redundancy in defining and storing data results in wasted
storage space and in redundant efforts to maintain common
data up-to-date.
18
Disadvantages of File systems
1. Data Redundancy & Inconsistency
2. Difficulty in Accessing Data
3. Data Isolation
4. Integrity Problems
5. Atomicity Problems
6. Concurrent Access Anomalies
7. Security Problems
19
Data Redundancy & Inconsistency
20
Difficulty in Accessing data
• Retrieving specific data is also difficult in a file system.
• Suppose we want to see the records of all customers who
have a balance of less than Rs 10,000. We can either check the
list & find the names manually or write an application
program.
• If we write an application program & at some later time
we need to see the records of customers who have a balance
of less than Rs 20,000, then a new program has to be
written again.
• It means that file processing systems do not allow data to be
accessed in a convenient manner.
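By contrast, a DBMS lets the same query be reused with a different threshold. The sketch below uses Python's sqlite3 module; the customer table, the names and the balances are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Asha", 8000), ("Ravi", 15000), ("Meena", 25000)],  # made-up data
)

def customers_below(limit):
    """Return the names of customers whose balance is below `limit`."""
    cur = conn.execute(
        "SELECT name FROM customer WHERE balance < ? ORDER BY name",
        (limit,),
    )
    return [name for (name,) in cur]

# The same query serves both requests; only the parameter changes.
below_10k = customers_below(10000)
below_20k = customers_below(20000)
print(below_10k, below_20k)
```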
21
Data Isolation
• As the data is stored in various files, & the
files may be stored in different formats, writing
application programs to retrieve the data is
difficult.
22
Integrity Problems
23
Atomicity Problems
• Any mechanical or electrical device is subject to failure, and
so is the computer system.
• After a failure, we have to ensure that the data is restored to a
consistent state.
• For example, an amount of Rs 50 has to be transferred from
Account A to Account B.
• Suppose the amount has been debited from Account A but has not
yet been credited to Account B when some failure occurs.
• This leaves the data in an inconsistent state.
• So we have to adopt a mechanism which ensures that either the
full transaction is executed or none of it is,
i.e. the fund transfer should be atomic.
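A minimal sketch of an atomic transfer, using Python's sqlite3 module. The starting balances (Rs 500 and Rs 200) and the simulated failure are assumptions for the example. The connection is used as a context manager, which commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

def transfer(amount, fail_midway=False):
    """Move `amount` from A to B as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'",
                (amount,),
            )
    except RuntimeError:
        pass  # the debit was rolled back, so nothing changed

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(50, fail_midway=True)
after_failure = balances()   # unchanged: the debit was undone
transfer(50)
after_success = balances()   # both the debit and the credit applied
print(after_failure, after_success)
```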
24
Concurrent access Problems
• Many systems allow multiple users to update
the data simultaneously.
• This can also leave the data in an inconsistent state.
• Suppose a bank account contains a balance of
Rs 500 & two customers want to withdraw
Rs 100 & Rs 50 simultaneously.
• If both transactions read the old balance &
withdraw from it, they write back Rs 400 and
Rs 450 respectively. The final balance is then one
of these values, which is incorrect; the correct
balance is Rs 350.
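The anomaly can be traced step by step in plain Python. This is a deterministic simulation of one possible interleaving, not real concurrent code; the variable names are chosen for the example.

```python
balance = 500  # shared balance in Rs

# Both transactions read the balance before either one writes.
read_by_t1 = balance
read_by_t2 = balance

# Each computes its new balance from its own stale copy.
result_t1 = read_by_t1 - 100
result_t2 = read_by_t2 - 50

# Whichever write lands last overwrites the other (a "lost update").
balance = result_t1
balance = result_t2
interleaved_balance = balance

# Serial execution: the second withdrawal sees the result of the first.
serial_balance = 500 - 100 - 50

print(interleaved_balance, serial_balance)
```

A DBMS avoids this by controlling concurrent access so that the outcome matches some serial order of the transactions.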
25
Security Problems
• Not every user of the database should be able to
access all the data.
• For example, payroll personnel need to
access only that part of the data which has
information about employees; they do
not need access to information about
customer accounts.
26
Advantages of DBMS
27
Advantages of DBMS
• Reducing Data Redundancy
• Sharing of Data
• Data Integrity
• Data Security
• Privacy
• Backup and Recovery
• Data Consistency
28
Reducing Data Redundancy
31
Data Security
• Data security is a vital concept in a database.
Only authorized users should be allowed to
access the database, and their identity should
be authenticated, for example with a username and
password.
• Unauthorized users should not be allowed to
access the database under any circumstances,
as this would compromise the security of the data.
32
Privacy
• The privacy rule in a database means only
authorized users can access the database,
according to its privacy constraints.
• There are levels of database access, and a user
can only view the data they are allowed to view.
• For example, on social networking sites,
access constraints differ between the
accounts a user may want to access.
33
Backup and Recovery
34
Data Consistency
35
Disadvantages of DBMS
• Cost of Hardware & Software
• Cost of Data Conversion
• Cost of Staff Training
• Appointing Technical Staff
• Database Damage
36
Database users
37
Users may be divided into
38
Actors on the scene
– Database administrators
– Database Designers
– End-users
39
Database administrators (DBA)
40
Functions of DBA
42
Functions of Database Designers (DBD)
• Creating the original description of the
database structure.
• Database designers interact with different
groups of users & integrate their views to arrive at
the best structure.
43
End-users
44
• Casual: they can only browse through the
database; they cannot create, update or make
any changes in the database. They
access the database occasionally.
• Naive or Parametric: they use ready-made
software which deals with the database. They
can query and update the database only through it. Examples are
bank tellers or reservation clerks, who do this
activity for an entire shift of operations.
45
• Sophisticated: these include business analysts,
scientists, engineers and others who thoroughly
familiarize themselves with the facilities of the
DBMS so as to implement their applications to
meet their complex requirements.
• Stand-alone: they mostly maintain personal
databases using ready-to-use packaged
applications. An example is a tax program user
who creates his or her own internal database.
46
Workers behind the scene
47
Data models
48
DATA MODEL
50
Conceptual data models
• Conceptual data models provide concepts that are close to
the way many users perceive data.
• Before implementation, a rough model of the database is
created.
• This model is never implemented but is used for design
purposes.
• This model uses concepts such as entities, attributes and
relationships.
51
E-R model
• Stands for entity-relationship model.
Terms used in E-R model:
Attribute
Entity
52
EXAMPLE
53
Physical data models
• They provide concepts that describe the details of
how data is stored in the computer by
representing information such as record
formats (fixed/variable length), record orderings,
and access paths (key indexing).
• An access path is a structure that makes the
search for particular database records efficient.
• Concepts provided by physical data models are
generally meant for computer specialists, not for
typical end users.
54
Implementation data models
• Provide concepts that fall between the above
two.
• It also provides concepts that may be
understood by end users but that are not too far
away from the way data is organized within
the computer.
• Example: relational model, network model,
hierarchical model, object oriented model
55
Hierarchical Model
The hierarchical model was the first DBMS model. This
model organizes the data in a hierarchical tree
structure.
The hierarchy starts from the root, which holds the root data,
and then expands in the form of a tree by adding child
nodes to parent nodes.
This model easily represents some real-world
relationships such as food recipes, the sitemap of a website etc.
Example: We can represent the relationship between
the shoes present on a shopping website in the
following way:
56
Example
57
Features of a Hierarchical Model
58
Advantages and Disadvantages of Hierarchical Model
59
Network Model
• This model is an extension of the hierarchical model.
• It was the most popular model before the relational
model.
• This model is the same as the hierarchical model, the
only difference is that a record can have more than
one parent.
• It replaces the hierarchical tree with a graph.
• Example: In the example below we can see that node
student has two parents i.e. CSE Department and
Library.
• This was earlier not possible in the hierarchical model.
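The difference between the two models can be sketched in Python by recording each record's parents. The "College" root and the helper name are assumptions added for illustration; only Student, CSE Department and Library come from the example.

```python
# Each record maps to the list of its parent (owner) records.
parents = {
    "College": [],                             # hypothetical root record
    "CSE Department": ["College"],
    "Library": ["College"],
    "Student": ["CSE Department", "Library"],  # two parents: a graph
}

def is_hierarchy(parent_map):
    """True if every record has at most one parent, i.e. a tree."""
    return all(len(p) <= 1 for p in parent_map.values())

fits_hierarchical_model = is_hierarchy(parents)
print(fits_hierarchical_model)
```

Because Student has two parents, the structure is a graph rather than a tree, so it fits the network model but not the hierarchical one.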
60
Example
61
Features of a Network Model
account-number balance
A-101 500
A-201 900
A-215 700
A-217 750
65
Features of Relational Model
67
Object oriented data model
68
Example of Object oriented data model
69
Advantages
70
Disadvantages
71
Object relational data model
72
Schemas And Instances
73
Schemas And Instances
prerequisite
course number   prerequisite number
cosc3380        cosc3320
cosc3330        math2410
cosc3320        cosc1310
77
Data abstraction
78
Levels of data abstraction
79
Physical level
80
Logical level
81
View level
• This level contains the actual data which is shown
to the users.
• This is the highest level of abstraction & the user
of this level need not know the actual details of
data storage.
82
Levels of Abstraction
• Physical level: describes how a record (e.g.
customer) is stored.
• Logical level: describes data stored in database,
and the relationships among the data.
type customer = record
    name: string;
    street: string;
    city: string;
end;
• View level: application programs hide details of
data types. Views can also hide information (e.g.
salary) for security purposes.
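A view that hides salary can be sketched with Python's sqlite3 module. The employee table, the view name and the sample row are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full record, including the sensitive column.
conn.execute(
    "CREATE TABLE employee (name TEXT, street TEXT, city TEXT, salary INTEGER)"
)
conn.execute("INSERT INTO employee VALUES ('John', 'Main St', 'Pune', 50000)")

# View level: expose everything except salary.
conn.execute(
    "CREATE VIEW employee_public AS SELECT name, street, city FROM employee"
)

cur = conn.execute("SELECT * FROM employee_public")
columns = [d[0] for d in cur.description]  # column names seen by the user
rows = cur.fetchall()
print(columns, rows)
```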
83
3-level DBMS Architecture
84
ANSI-SPARC 3-level DBMS Architecture
85
Three-schema architecture
• ANSI-SPARC stands for American National
Standards Institute, Standards Planning And
Requirements Committee.
• The three-schema architecture is a convenient tool for
the user to visualize the schema levels in a database
system.
• In this architecture, schemas can be defined at the
following three levels:
– Internal schema/Physical schema
– Conceptual schema
– External schema
86
• The internal level has an internal schema, which describes
the physical storage structure of the database.
Conceptual schema:
• Student (sid: string, name: string, age: number, percent: real)
• Courses (cid: string, cname: string, credits: number)
• Enrolled (enrollment: integer,sid: string, cid: string, grade:
string)
External schema:
• Course_info(cid: string, enrollment: integer)
88
DATA INDEPENDENCE
89
DATA INDEPENDENCE
90
Types of data independence
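• Physical data independence is the capacity to
change the internal schema without having to
change the conceptual (or external) schemas.
• Logical data independence is the capacity to
change the conceptual schema without having to
change the external schemas or application programs.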
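Physical data independence can be illustrated with Python's sqlite3 module: adding an index changes only how the data is accessed internally, while the table definition and the query stay the same. The rows and the index name are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid TEXT, name TEXT, age INTEGER, percent REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("s1", "Rakesh", 20, 71.5), ("s2", "Nikhil", 21, 64.0)],  # sample rows
)

query = "SELECT sid, name FROM student WHERE age > 20 ORDER BY sid"
before = conn.execute(query).fetchall()

# Change at the internal level only: add an access path (index) on age.
conn.execute("CREATE INDEX idx_student_age ON student (age)")

# The conceptual schema and the query text are untouched.
after = conn.execute(query).fetchall()
print(before == after)
```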
91
Entity- Relationship Model
92
Entity- Relationship Model
• The E-R model is the most commonly used
conceptual model.
• In this model, the real world consists of a
collection of basic objects called entities and the
relationships among these objects.
• The end product of the modeling process is an
entity-relationship diagram (ERD) or ER diagram.
• The ER diagram is not implemented directly; it is
the design from which the database is created.
93
The E-R data model employs three basic notions:
• Entity
• Attributes
• Relationship
94
Entity
• It is an object with a physical or conceptual existence.
• For example, each person in an enterprise, a
car, a house, a company, a student.
95
Entity Type & Entity Sets
• Entity Type –
– collection of entities that have the same attributes.
Ex: STUDENT
STUDENT
96
Graphical representation of entity sets
97
Attributes
• Attributes are the particular properties that
describe an entity.
Ex: A STUDENT entity may be described by
student’s name, student’s roll_number.
98
Graphical representation of attributes
99
Types of Attributes
100
Simple (Atomic) and Composite Attributes
101
Address
102
Single Valued & Multi-valued Attributes
103
Stored and Derived Attributes
104
Null Valued Attributes
105
Complex Attributes
106
Key attribute in an entity type
• A key attribute has a unique value for
each entity in the entity set.
• It identifies every entity in the entity set.
• A key attribute can never be a null-valued attribute.
• A composite attribute can also be a key
attribute.
• There can be more than one key attribute for
an entity type.
Example: roll_no, enrollment_no
107
Domain of value set of an attribute
• Domain of an attribute is the allowed set of
values of that attribute.
Example: if attribute is ‘grade’, then its allowed
values are A,B,C,F.
• Grade = {A, B, C, F}
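One way a DBMS can enforce such a domain is a CHECK constraint. The sketch below uses Python's sqlite3 module; the result table and the rejected value 'Z' are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result ("
    " roll_no INTEGER,"
    " grade TEXT CHECK (grade IN ('A', 'B', 'C', 'F')))"
)

conn.execute("INSERT INTO result VALUES (1, 'A')")  # inside the domain

try:
    conn.execute("INSERT INTO result VALUES (2, 'Z')")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT roll_no, grade FROM result").fetchall()
print(rejected, stored)
```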
108
TYPES OF ENTITY TYPES
Strong entity type – an entity type that has at least one
key attribute.
Weak entity type – an entity type that does not have any
key attribute of its own.
An entity in a weak entity type is identified by a
relationship with a strong entity type. That
relationship is called the identifying relationship, and the
strong entity type is called the owner of the weak entity
type.
109
TYPES OF ENTITY TYPES
Student
Roll No. Name Age
1 Rakesh 20
2 Nikhil 21
3 Nikhil 21
Secured
Marks
Name M1 M2 M3
Nikhil 50 45 40
Nikhil 80 75 82
Identifying Relationship
110
Relationship
• Relates two or more distinct entities with a specific meaning.
– For example, EMPLOYEE John works on the ProductX
PROJECT
or
– EMPLOYEE Franklin manages the Research
DEPARTMENT.
Terms used:
Relationship type,
Relationship set,
Relationship instances.
111
Relationship type: secured
Relationship set: {R1, R2, R3, R4}
Relationship instances: R1
112
Graphical Representation of Relationship Sets
113
NOTATIONS USED IN E-R DIAGRAM
Entity Type
Attribute
Key Attribute
114
NOTATIONS USED IN E-R DIAGRAM
Composite Attribute
Derived Attribute
Multivalued Attribute
115
NOTATIONS USED IN E-R DIAGRAM
Relationship Type
Identifying Relationship
116
Example of E-R Diagrams
117
E-R Diagram With Composite, Multivalued, and Derived
Attributes
118
Relationship Types with Attributes
119
Constraints
Relationship types usually have certain
constraints. Two main types of relationship
constraints:
• Mapping cardinalities
• Participation constraints
120
Mapping cardinalities, or cardinality ratios
121
Mapping Cardinalities
• One-to-one (1:1)
• One-to-many (1: N)
• Many-to-one (N: 1)
• Many-to-many (M: N)
122
Cardinality ratio
• We express cardinality ratio by drawing a
directed line (→), signifying “one,” or an
undirected line (—), signifying “many,”
between the relationship set and the entity set.
123
One-To-One Relationship
124
One-To-Many Relationship
125
Many-To-One Relationships
• In a many-to-one relationship a loan is associated with several
customers via borrower.
126
Many-To-Many Relationship
127
Find out the Cardinality ratio
• Prime minister-country
• classroom –students
• students –classroom
• customer -loan
128
Participation constraints
• Total participation: every entity in the entity type participates in at least
one relationship in the relationship type (displayed by a double line).
– E.g. participation of loan in borrower is total:
every loan must have a customer associated with it via borrower.
• Partial participation: some entities may not participate in any relationship
in the relationship type (displayed by a single line).
– E.g. participation of customer in borrower is partial:
some customers may not participate in any loan.
129
KEYS
130
Types of keys
• Candidate Key
• Alternate & Primary key
• Superkey
131
Candidate Key
• A candidate key is a minimal set of attributes that uniquely identifies each
entity in an entity set.
• There can be more than one candidate key in an entity set.
• More than one attribute can together form a single candidate key.
• Suppose that a combination of customer-name and customer-street
is sufficient to distinguish among members of the customer entity
set.
• Then, both {customer-id} and {customer-name, customer-street}
are candidate keys.
• Although the attributes customer-id and customer-name together
can distinguish customer entities, their combination does not form
a candidate key, since the attribute customer-id alone is a
candidate key.
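The uniqueness and minimality conditions can be checked mechanically on a sample of entities. In the sketch below the customer rows and the function names are assumptions made for illustration. A sample can only refute a key, since a key is a property of the entity set as designed, not of one snapshot.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in `attrs`."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """True if `attrs` is unique and no proper subset is also unique."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Made-up sample of the customer entity set.
customers = [
    {"customer-id": 1, "customer-name": "Hayes", "customer-street": "Main"},
    {"customer-id": 2, "customer-name": "Hayes", "customer-street": "Park"},
    {"customer-id": 3, "customer-name": "Jones", "customer-street": "Main"},
]

id_only = is_candidate_key(customers, ("customer-id",))
name_street = is_candidate_key(customers, ("customer-name", "customer-street"))
id_name = is_candidate_key(customers, ("customer-id", "customer-name"))
print(id_only, name_street, id_name)
```

The pair (customer-id, customer-name) is unique but not minimal, so it fails the candidate key test while still passing is_superkey.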
132
Alternate & Primary key
• Alternate and primary keys are both defined in terms of
candidate keys.
• In an entity set, exactly one candidate key is chosen
as the primary key & the remaining
candidate keys are called alternate keys.
• AK = CK - PK
133
Superkey
• A superkey is any superset of a candidate key (a candidate key is itself a superkey).
• For example, the customer-id attribute of the entity set
customer is sufficient to distinguish one customer entity
from another.
• Thus, customer-id is a superkey.
• Similarly, the combination of customer-name and
customer-id is a superkey for the entity set customer.
• The customer-name attribute of customer is not a
superkey, because several people might have the same
name.
• Example: {customer-id}, {customer-name, customer-id}
134
Weak Entity Types
• An entity type that does not have sufficient attributes of its own
to form a primary key is referred to as a weak entity type.
135
Weak Entity types (Cont.)
• We depict a weak entity type by double rectangles.
• We underline the partial key of a weak entity type with a
dashed line.
• payment_number – partial key of the payment entity type
• Primary key for payment – (loan_number, payment_number)
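When mapped to tables, the weak entity type borrows the key of its owner. A sketch using Python's sqlite3 module; the amount columns and the sample values are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Owner (strong) entity type.
conn.execute(
    "CREATE TABLE loan (loan_number TEXT PRIMARY KEY NOT NULL, amount INTEGER)"
)
# Weak entity type: partial key plus the owner's key form the primary key.
conn.execute(
    "CREATE TABLE payment ("
    " loan_number TEXT NOT NULL REFERENCES loan (loan_number),"
    " payment_number INTEGER NOT NULL,"
    " payment_amount INTEGER,"
    " PRIMARY KEY (loan_number, payment_number))"
)

conn.execute("INSERT INTO loan VALUES ('L-1', 1000)")
conn.execute("INSERT INTO loan VALUES ('L-2', 2000)")

# The same payment_number may repeat across different loans...
conn.execute("INSERT INTO payment VALUES ('L-1', 1, 100)")
conn.execute("INSERT INTO payment VALUES ('L-2', 1, 150)")

# ...but not within one loan.
try:
    conn.execute("INSERT INTO payment VALUES ('L-1', 1, 999)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

payment_count = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
print(duplicate_rejected, payment_count)
```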
136
Question for discussion
• Can we convert weak entity type into strong
entity type?
137
Steps in ER Modeling
138
PROBLEMS ON E-R DIAGRAM
139
140
Question 2
• Design an E-R diagram for keeping track of the
exploits of your favorite sports team. You
should store the matches played, the scores in
each match, the players in each match and
individual player statistics for each match.
Summary statistics should be modeled as
derived attributes.
141
Solution
142
Question 3
• Construct an E-R diagram for a hospital with a
set of patients and a set of medical doctors.
Associate with each patient a log of the
various tests and examinations conducted.
143
Solution
144
Question 4
Construct an E-R diagram for Bank.
145
146
Integrity constraints
147
Integrity constraints
• Integrity constraints are a set of rules. They are
used to maintain the quality of information.
• Integrity constraints ensure that data
insertion, updating, and other processes are
performed in such a way that data
integrity is not affected.
• Thus, integrity constraints are used to guard
against accidental damage to the database.
148
Types of Integrity Constraint
149
Domain constraints
150
Example:
151
Entity integrity constraints
152
Example:
153
Referential Integrity Constraints
154
Referential Integrity
• Referential integrity is concerned with foreign keys.
A primary key and a foreign key relate two tables.
• Referential integrity is a property that, when satisfied, requires
every value of one attribute of a relation to exist as a value
of another attribute in a different relation.
• Less formally: for referential integrity to hold, any field in
a table that is declared a foreign key can contain only values
from a parent table's primary key, or null.
155
Referential Integrity Constraint
The value in the foreign key column FK of the referencing relation
R1 can be either:
(1) an existing value of the corresponding
primary key PK in the referenced relation R2, or
(2) a null.
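Both permitted cases, and a rejected one, can be sketched with Python's sqlite3 module. The table names r1 and r2 follow the R1/R2 notation above; the column names and values are assumptions. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute("CREATE TABLE r2 (pk INTEGER PRIMARY KEY)")  # referenced
conn.execute(
    "CREATE TABLE r1 (id INTEGER PRIMARY KEY,"
    " fk INTEGER REFERENCES r2 (pk))"                      # referencing
)
conn.execute("INSERT INTO r2 VALUES (10)")

conn.execute("INSERT INTO r1 VALUES (1, 10)")    # case 1: existing PK value
conn.execute("INSERT INTO r1 VALUES (2, NULL)")  # case 2: null

try:
    conn.execute("INSERT INTO r1 VALUES (3, 99)")  # no such PK value in r2
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

stored = conn.execute("SELECT id, fk FROM r1 ORDER BY id").fetchall()
print(dangling_rejected, stored)
```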
156
Example
157
Key constraints
• A key is a set of attributes that is used to identify
an entity within its entity set uniquely.
• An entity set can have multiple keys, out
of which one is chosen as the primary key. A
primary key value must be unique and cannot be
null in the relational table.
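The key constraint can be observed with Python's sqlite3 module. The employee table and its values are assumptions for the example. Note that SQLite needs an explicit NOT NULL on a non-integer primary key, whereas the SQL standard implies it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Explicit NOT NULL: SQLite does not imply it for a TEXT primary key.
conn.execute(
    "CREATE TABLE employee (emp_id TEXT PRIMARY KEY NOT NULL, name TEXT)"
)
conn.execute("INSERT INTO employee VALUES ('E1', 'John')")

def try_insert(emp_id, name):
    """Return True if the row was accepted, False if a constraint failed."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert("E1", "Franklin")  # repeats an existing key
null_ok = try_insert(None, "Franklin")       # null key
fresh_ok = try_insert("E2", "Franklin")      # new, non-null key
print(duplicate_ok, null_ok, fresh_ok)
```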
158
Example
159
Key Constraints
• Super key
• Candidate key
• Primary key
• Alternate key
• Foreign Key
160
Super key
• A super key is a set of attributes that identifies a row or tuple
uniquely.
• A super key is a superset of a candidate key. Equivalently, a
candidate key is a minimal super key: a super key from which no
attribute can be removed without losing uniqueness.
• The role of the super key is simply to identify the tuples of the
specified table in the database.
• So every set of attributes in a table that can identify
the rows of the table uniquely is a super key.
161
Example
• Let's consider an EMPLOYEE_DETAIL table example where we have the following attributes:
• Employee(Emp_SSN, Emp_Id, Emp_name, Emp_email)
162
Candidate key
163
Continued…
• It is possible that several distinct sets of attributes could serve as
a candidate key.
• Suppose that a combination of customer-name and customer-
street is sufficient to distinguish among members of the customer
entity set.
• Then, both {customer-id} and {customer-name, customer-street}
are candidate keys.
• Although the attributes customer-id and customer-name together
can distinguish customer entities, their combination does not
form a candidate key, since the attribute customer-id alone is a
candidate key.
165
Alternate & Primary key
166
Foreign Key
• A foreign key is a column (or set of columns) of a table that points to
the primary key of another table.
• Every employee works in a specific department in a
company, and employee and department are two different
entities. So we should not store the department's information in
the employee table. Instead, we link these two tables
through the primary key of one table.
• We add the primary key of the DEPARTMENT table,
Department_Id, as a new attribute in the EMPLOYEE table.
• In the EMPLOYEE table, Department_Id is the foreign key,
and the two tables are related through it.
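The link between the two tables can be followed with a join. A sketch using Python's sqlite3 module; apart from Department_Id, the column names and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE department ("
    " Department_Id INTEGER PRIMARY KEY, Dept_name TEXT)"
)
conn.execute(
    "CREATE TABLE employee ("
    " Emp_Id INTEGER PRIMARY KEY, Emp_name TEXT,"
    " Department_Id INTEGER REFERENCES department (Department_Id))"
)

conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(101, "John", 1), (102, "Franklin", 1)],
)

# Department details are stored once and reached through the foreign key.
rows = conn.execute(
    "SELECT e.Emp_name, d.Dept_name"
    " FROM employee AS e"
    " JOIN department AS d ON e.Department_Id = d.Department_Id"
    " ORDER BY e.Emp_Id"
).fetchall()
print(rows)
```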
167
Example
168
DDL and DML commands
169
Structured Query Language (SQL)
• Structured Query Language (SQL) is the database
language with which we can create a database and perform
operations on an existing database.
• SQL uses commands such as CREATE, DROP and INSERT to
carry out the required tasks.
• These SQL commands are mainly categorized into four
categories:
• DDL – Data Definition Language
• DML – Data Manipulation Language
• DCL – Data Control Language
• TCL – Transaction Control Language
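The grouping can be written down as a small lookup in Python. The keyword lists below are the commonly cited members of each category and are an assumption beyond the three commands named above; SELECT is left out because some texts place it in DML and others in a separate query category.

```python
COMMAND_CATEGORIES = {
    "DDL": {"CREATE", "ALTER", "DROP", "TRUNCATE"},
    "DML": {"INSERT", "UPDATE", "DELETE"},
    "DCL": {"GRANT", "REVOKE"},
    "TCL": {"COMMIT", "ROLLBACK", "SAVEPOINT"},
}

def classify(statement):
    """Return the category of a SQL statement from its first keyword."""
    words = statement.split()
    if not words:
        return None
    keyword = words[0].upper()
    for category, keywords in COMMAND_CATEGORIES.items():
        if keyword in keywords:
            return category
    return None  # keyword not covered by this sketch

print(classify("CREATE TABLE t (x INTEGER)"))
print(classify("insert into t values (1)"))
```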
170
DDL (Data Definition Language)