Notes
Unit-I
Introduction
●
Data: A collection of raw facts that is translated into a
meaningful form for the purpose of processing. Data contains
both useful and useless material, whereas information is the
useful data extracted from it.
●
Database: A database is a collection of data. It contains
information related to one particular enterprise. Examples:
●
Bank: stores customer, account, loan and banking transaction
information.
●
Airlines: store schedule and reservation information.
●
Universities: store student, course registration and exam
information.
●
DBMS: Database Management System
●
A database management system is a collection of interrelated data
and a set of programs to access that data.
●
The primary goal of a DBMS is to provide a way to store and
retrieve database information that is both convenient and
efficient.
●
Databases are an important part of most enterprises.
Database System Applications
●
Banking: customer information, loan accounts, banking transactions
●
Airlines
●
Universities
●
Credit card transactions
●
Telecommunication
●
Finance
●
Sales
●
Online retailers
●
Manufacturing
●
Human resources
Purpose of Database Systems
●
Disadvantages of using a file processing system:
1. Data Redundancy (duplication) and Inconsistency:
●
Different programmers create the files and application programs
over a long period. As a result:
(i) The data files are likely to have different formats.
(ii) Programs may be written in several programming languages.
(iii) The same information may be duplicated in several files.
●
This redundancy leads to higher storage and access costs.
●
In addition, it may lead to data inconsistency: the various
copies of the same data may no longer agree. E.g. the same address
may be written in different ways in different files.
2. Difficulty in accessing data:
●
A conventional file processing system does not allow needed data
to be retrieved in a convenient and efficient manner.
●
E.g. consider a data file Student with fields (rollno, name, marks).
●
Application programs are written to access the data. If a user
wants the list of students whose marks are greater than 75 and no
program for that request exists, a new program has to be written
to produce the list.
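The situation described above can be sketched in Python. This is only an illustration: the file contents and the function name students_above_75 are made up, and the flat file is held in a string so the sketch is self-contained.

```python
import csv
import io

# Made-up contents of a flat Student file with fields (rollno, name, marks).
STUDENT_FILE = "1,Asha,82\n2,Ravi,64\n3,Meena,91\n"

def students_above_75(file_text):
    """A purpose-built program: it answers this one request and no other."""
    result = []
    for rollno, name, marks in csv.reader(io.StringIO(file_text)):
        if int(marks) > 75:
            result.append(name)
    return result

names = students_above_75(STUDENT_FILE)
```

A different request, such as listing students with marks below 40, would need another hand-written function. That is the difficulty this point describes.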
3. Data isolation:
●
Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve
the appropriate information is difficult.
4. Integrity Constraints:
●
The data values stored in the database must satisfy certain types
of consistency constraints.
●
E.g. the balance of a savings account may never fall below Rs. 200.
●
Application programmers enforce these consistency constraints
by adding appropriate code to the various application programs.
However, when a new constraint is to be added, it is difficult to
change the programs to enforce it.
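By contrast, a DBMS lets the constraint be declared once, with the data. The following is a minimal sketch using Python's sqlite3 module, not part of the original notes. The table layout, the account name 'A' and the helper withdraw are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule "balance never falls below Rs. 200" is stated once, in the schema.
conn.execute(
    "CREATE TABLE account ("
    " acc_no TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 200))"
)
conn.execute("INSERT INTO account VALUES ('A', 500)")
conn.commit()

def withdraw(conn, acc_no, amount):
    """Try to withdraw; return True only if the constraint allowed it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first_ok = withdraw(conn, "A", 100)    # leaves 400, allowed
second_ok = withdraw(conn, "A", 300)   # would leave 100, rejected
balance = conn.execute("SELECT balance FROM account").fetchone()[0]
```

No application program has to repeat the check, and changing the rule means changing one declaration.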
5. Atomicity Problems:
●
A computer system is subject to failure. In many applications it
is essential that, if a failure occurs, the data be restored to the
consistent state that existed prior to the failure, and this is
difficult to ensure in a file processing system.
●
E.g. consider a program to transfer Rs. 250 from account A to
account B. If a failure occurs during the transfer, it is possible
that the funds were removed from account A but not credited to
account B.
●
This results in inconsistent state.
●
It is essential that either both credit and debit occur or that
neither occur.
●
That is, the funds transfer must be atomic: it must happen in its
entirety or not at all.
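The transfer example can be sketched with a database transaction. This is an illustrative sketch using Python's sqlite3 module. The starting balances, the helper names and the fail_midway switch, which simulates a crash between the debit and the credit, are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 500)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as a single transaction."""
    try:
        with conn:  # commits if the block finishes, rolls back on exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account"))

failed = transfer(conn, "A", "B", 250, fail_midway=True)
after_failure = balances(conn)    # the debit was rolled back
succeeded = transfer(conn, "A", "B", 250)
after_success = balances(conn)    # both debit and credit applied
```

After the simulated failure the balances are unchanged, so the database never shows the debit without the credit.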
6. Concurrent Access Anomalies:
●
Many systems allow multiple users to update the data
simultaneously. In a file processing system, however, the data is
not centralized, so concurrent updates are hard to supervise.
●
If two or more users update the same data at the same time, the
interaction of concurrent updates may result in inconsistent data.
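One such anomaly, often called a lost update, can be reproduced deterministically without real concurrency by interleaving the steps by hand. The following Python sketch is an illustration only. The dictionary store stands in for a shared data file, and the amounts are made up.

```python
# A shared "file" holding one account balance (hypothetical data).
store = {"balance": 500}

def read_balance():
    return store["balance"]

def write_balance(value):
    store["balance"] = value

# Two users each make a deposit, but their steps interleave.
a_seen = read_balance()        # user A reads 500
b_seen = read_balance()        # user B also reads 500, before A writes
write_balance(a_seen + 100)    # A writes 600
write_balance(b_seen + 50)     # B writes 550 and overwrites A's update
interleaved_result = read_balance()

# The same two deposits applied one after the other.
store["balance"] = 500
write_balance(read_balance() + 100)
write_balance(read_balance() + 50)
serial_result = read_balance()
```

The interleaved run ends with 550 instead of the correct 650: user A's deposit is lost.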
7. Security Problems:
●
Not every user of the database system should be able to access all
the data.
●
Enforcing such security constraints is difficult in a conventional
file processing system.
DATA ABSTRACTION:
●
A major purpose of a database system is to provide users with an
abstract view of the data. That is, the system hides certain details
of how the data are stored and maintained.
●
Many complex data structures are used to represent data in
the database. Since many database system users are not computer
trained, developers hide this complexity from users through
several levels of abstraction, to simplify users' interaction with
the system:
●
Physical level: The lowest level of data abstraction. It
describes how the data are actually stored, including the data
structures and access methods used by the database.
●
At the physical level, complex low-level data structures are
described in detail.
●
Conceptual (logical) level: The next higher level of abstraction.
It describes what data are stored in the database and what
relationships exist among those data.
●
Here, the entire database is described in terms of a small number
of relatively simple structures.
●
The conceptual level of abstraction is used by database
administrators, who must decide what information is kept in the
database.
●
Implementation of the simple structures at the conceptual level
may involve complex physical-level structures.
●
View level: The highest level of abstraction. It describes only a
part of the entire database.
●
Need for abstraction: Many users of the database system are not
concerned with all of the information; they need only a part of
the database. The view level exists to simplify their interaction
with the system.
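In SQL systems, the view level is commonly realized with views. The following is a small sketch using Python's sqlite3 module. The view name student_directory and the sample row are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full student table.
conn.execute(
    "CREATE TABLE student (rollno INTEGER, name TEXT, address TEXT, marks INTEGER)"
)
conn.execute("INSERT INTO student VALUES (10, 'Asha', 'Pune', 82)")

# View level: a user of this view sees only roll numbers and names.
conn.execute("CREATE VIEW student_directory AS SELECT rollno, name FROM student")

cursor = conn.execute("SELECT * FROM student_directory")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```

The address and marks columns remain stored in the table, but they are not part of what this user sees.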
INSTANCES AND SCHEMAS
●
A database changes over time as information is inserted and deleted.
●
The collection of information stored in the database at a
particular moment is called an instance of the database.
●
The overall design (architecture or blueprint) of the database is
called the database schema.
●
Schemas are changed infrequently.
●
A database schema is similar to the variable declarations or type
definitions in a programming language.
e.g. struct student
{
    int rollno;          /* roll number */
    char name[40];
    char address[40];
    int marks;
} stud;
●
A database schema corresponds to the programming language
data type definition, e.g. the student structure in the above example.
●
A variable of a given type has a particular value at a given instant.
●
E.g. the stud variable in the above example has a particular value
at each point while an application program runs; that value
corresponds to an instance of the database.
●
Database systems have several schemas, partitioned according to
the level of abstraction.
Physical schema describes the database design at the physical
(lowest) level.
Logical schema describes the database design at the logical
level.
A database may also have several schemas at the view level,
sometimes called subschemas, that describe different views of
the database.
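The schema and instance distinction can be mimicked in Python, in the same spirit as the struct student analogy. In this sketch the class definition plays the role of the schema, and the list contents at a given moment play the role of an instance. The sample records are made up.

```python
from dataclasses import dataclass, fields

@dataclass
class Student:            # plays the role of the schema
    rollno: int
    name: str
    address: str
    marks: int

schema_fields = [f.name for f in fields(Student)]

database = []
database.append(Student(10, "Asha", "Pune", 82))
instance_1 = list(database)      # snapshot of the contents at this moment

database.append(Student(11, "Ravi", "Nashik", 64))
instance_2 = list(database)      # a later, different instance
```

The two snapshots differ, while the field list stays the same throughout.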
DATA INDEPENDENCE
●
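The ability to modify a schema definition at one level without
affecting a schema definition at the next higher level is called
data independence.
●
There are two levels of data independence:
●
Physical data independence: the ability to modify the physical
schema without causing application programs to be rewritten.
●
Logical data independence: the ability to modify the logical
schema without causing application programs to be rewritten.
●
Logical data independence is more difficult to achieve than
physical data independence.
●
Programmers construct applications by using the logical schema.
The physical schema is hidden beneath the logical schema and can
be changed easily without affecting the application programs.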
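Physical data independence can be observed on a small scale. In the sketch below, which uses Python's sqlite3 module, adding an index changes how the data are accessed physically, while the application's query and its result stay the same. The index name and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(10, "Asha", 82), (11, "Ravi", 64)],
)

# The application is written only against the logical schema.
QUERY = "SELECT name FROM student WHERE rollno = 10"
before = conn.execute(QUERY).fetchall()

# A physical-level change: add an access structure on rollno.
conn.execute("CREATE INDEX idx_student_rollno ON student (rollno)")

after = conn.execute(QUERY).fetchall()   # same query, unchanged
```

Only the way the DBMS locates the row may change; the application does not.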
Database Languages
●
DDL: Data Definition Language, used to specify the database schema.
●
DML: Data Manipulation Language, used to express database queries
and updates.
●
Data definition language: A database schema is specified by a set
of definitions expressed in a special language called the data
definition language (DDL).
●
The result of compiling DDL statements is a set of tables,
which is stored in a special file called the data dictionary (or
data directory).
●
E.g. the following statement in the SQL language defines the student
table:
create table student
( rollno int(5),
  name char(30),
  address char(40) );
●
DATA DICTIONARY: The set of tables produced by compiling DDL
statements is stored in a special file called the data dictionary.
This file contains metadata, i.e. data about data. Among other
things, the data dictionary stores the following information:
●
Names of relations (tables).
●
Names of the attributes of each relation.
●
Domains of attributes.
●
Names of views defined on the database, and the definitions of
those views.
●
Integrity constraints for each relation.
●
In addition, many systems keep the following data on the users
of the system:
●
Names of authorized users.
●
Accounting information about users.
●
In systems that use highly sophisticated structures to store
relations, statistical and descriptive data about the relations
may also be kept online.
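Most SQL systems let the data dictionary itself be queried. The following sketch uses SQLite through Python's sqlite3 module. Here the catalog table is called sqlite_master and column metadata comes from PRAGMA table_info; both are specific to SQLite, and other systems expose their dictionaries under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (rollno INTEGER, name CHAR(30), address CHAR(40))"
)

# Names of relations, read from SQLite's catalog table.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Attribute names and declared types of one relation.
# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
column_names = [name for name, _ in columns]
```

The metadata is read with the same kind of query used for ordinary data, which matches the idea that the dictionary is itself stored as a set of tables.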
●
Data Manipulation Language:
●
A DML is a language that enables users to access or manipulate data
as organized by the appropriate data model.
●
Data manipulation means:
– To retrieve information from the database
– To insert new information into the database
– To delete information from the database
– To modify information stored in the database
●
There are basically two types of DML:
●
Procedural DMLs
●
Non-procedural DMLs
●
Procedural DMLs: These require a user to specify what data are
needed and how to get those data.
●
Non-procedural DMLs: These require a user to specify what data are
needed without specifying how to get those data.
E.g. the DML component of the SQL language is non-procedural.
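The difference between the two styles can be sketched side by side. In this illustration the Python loop stands in for a procedural request and the SQL statement, run through the sqlite3 module, for a non-procedural one. The data and variable names are made up.

```python
import sqlite3

rows = [(10, "Asha", 82), (11, "Ravi", 64), (12, "Meena", 91)]  # made-up data

# Procedural style: the caller spells out HOW to find the record.
procedural_name = None
for rollno, name, marks in rows:      # scan record by record
    if rollno == 11:
        procedural_name = name
        break

# Non-procedural style: the caller states WHAT is wanted and the
# DBMS decides how to locate it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER, name TEXT, marks INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)
declarative_name = conn.execute(
    "SELECT name FROM student WHERE rollno = 11"
).fetchone()[0]
```

Both approaches return the same name. Only the first one fixes the access strategy in the application code.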
●
Query:
●
A query is a statement requesting the retrieval of
information. The portion of a DML that involves information
retrieval is called a query language.
●
e.g. select name from student where rollno=10;
●
This SQL query finds the name of the student whose roll
number is 10.
●
Queries may involve information from more than one table.
●
The query processor component of the database system translates
DML queries into sequences of actions at the physical level of the
database system.
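A query that draws on more than one table might look like the following sketch, run through Python's sqlite3 module. The registration table and all sample rows are assumptions added for illustration; only the student table appears in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE registration (rollno INTEGER, course TEXT)")  # hypothetical
conn.executemany("INSERT INTO student VALUES (?, ?)", [(10, "Asha"), (11, "Ravi")])
conn.executemany(
    "INSERT INTO registration VALUES (?, ?)",
    [(10, "DBMS"), (10, "Networks"), (11, "DBMS")],
)

# Combine both tables: name from student, course from registration.
courses_of_10 = conn.execute(
    "SELECT s.name, r.course"
    " FROM student AS s JOIN registration AS r ON r.rollno = s.rollno"
    " WHERE s.rollno = 10"
    " ORDER BY r.course"
).fetchall()
```

The query states only which rows are wanted. How the two tables are matched at the physical level is left to the query processor.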
DATA MODELS
●
Underlying the structure of a database is the data model: a
collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints.
OBJECT BASED LOGICAL MODELS:
●
Object-based logical models are used in describing data at the
logical and view levels. They provide flexible structuring
capabilities and allow data constraints to be specified explicitly.
●
There are several such models. The more widely known ones include
– The entity relationship model
●
Account with attributes
– 1. social-security no.
– 2. balance
– 3. type
●
In the above example, every customer has an account in the bank.
Depositor is a relationship between the two entities customer and
account; it specifies which account belongs to which customer.
●
Degree of a Relationship Set: The number of entity sets that
participate in a relationship set.
●
Relationship sets that involve two entity sets are binary (or
degree two). Most relationship sets in a database system are
binary.
●
Relationship sets may involve more than two entity sets.
Entity sets customer, account, and branch may be linked by
the ternary (degree three) relationship set CAB.
Mapping Cardinality
●
Mapping cardinalities, or cardinality ratios, express the number of
entities to which another entity can be associated via a
relationship set.
●
They are most useful in describing binary relationship sets.
●
Cardinality types are distinguished by drawing a directed line
(→), signifying “one,” or an undirected line (—), signifying
“many,” between the relationship set and the entity set.
●
For a binary relationship set R between entity sets A and B, the
mapping cardinality must be one of the following:
●
One to one: An entity in A is associated with at most one entity
in B, and an entity in B is associated with at most one entity in A.
●
One to many: An entity in A is associated with any number of
entities in B. An entity in B, however, can be associated with at
most one entity in A.
●
Many to one: An entity in A is associated with at most one entity
in B. An entity in B, however, can be associated with any number of
entities in A.
●
Many to many: An entity in A is associated with any number of
entities in B, and an entity in B is associated with any number of
entities in A.
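The four cases can be told apart mechanically from a set of (A, B) pairs. The Python function below is an illustrative sketch, and its name and sample data are made up. It classifies only the pairs it is given, whereas a mapping cardinality is a constraint on every legal instance of the relationship set.

```python
from collections import defaultdict

def mapping_cardinality(pairs):
    """Classify a relationship instance given as (a, b) pairs."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    # Does some entity in A relate to several in B, and vice versa?
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"
    if b_has_many:
        return "many to one"
    return "one to one"

# A = customers, B = accounts (hypothetical identifiers).
joint_and_multiple = [("c1", "a1"), ("c1", "a2"), ("c2", "a1")]
kind = mapping_cardinality(joint_and_multiple)
```

Here customer c1 holds two accounts and account a1 has two holders, so the observed instance is many to many.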
PARTICIPATION CONSTRAINTS
Specialization
●
In specialization, an entity is divided into sub-entities based on
its characteristics.
●
It is a top-down approach where the higher-level entity is
specialized into two or more lower-level entities.
●
For example, an EMPLOYEE entity in an employee management
system can be specialized into DEVELOPER, TESTER, etc. In this
case, common attributes like E_NAME, E_SAL, etc. become part of the
higher-level entity (EMPLOYEE), and specialized attributes like
TES_TYPE become part of a specialized entity (TESTER).
Generalization
●
Generalization is the process of extracting common properties
from a set of entities and creating a generalized entity from it.
●
It is a bottom-up approach in which two or more entities can be
generalized to a higher-level entity if they have some attributes in
common.
●
For example, STUDENT and FACULTY can be generalized to a
higher-level entity called PERSON. In this case, common attributes
like P_NAME and P_ADD become part of the higher-level entity
(PERSON), and specialized attributes like S_FEE become part of a
specialized entity (STUDENT).
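The PERSON example maps naturally onto class inheritance, which gives a loose programming analogy for both generalization and specialization. The Python sketch below is only an analogy, not a database design, and the sample values are made up.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:               # generalized, higher-level entity
    p_name: str
    p_add: str

@dataclass
class Student(Person):      # adds only what is specific to students
    s_fee: int

@dataclass
class Faculty(Person):      # the notes name no faculty-specific attribute
    pass

student = Student(p_name="Asha", p_add="Pune", s_fee=5000)
student_attrs = [f.name for f in fields(Student)]
faculty_attrs = [f.name for f in fields(Faculty)]
```

The common attributes are declared once in Person and inherited by both lower-level entities.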
Aggregation
●
Aggregation is an abstraction through which relationships are
treated as higher level entities.
●
It helps to express relationships among relationships.
●
A basic ER diagram cannot represent a relationship between an
entity and a relationship, which may be required in some
scenarios. In those cases, a relationship with its
corresponding entities is aggregated into a higher-level entity.
●
For example, an employee working on a project may require some
machinery. So, a REQUIRE relationship is needed between the
relationship WORKS_FOR and the entity MACHINERY. Using
aggregation, the WORKS_FOR relationship with its entities EMPLOYEE
and PROJECT is aggregated into a single entity, and the relationship
REQUIRE is created between the aggregated entity and
MACHINERY.
Converting ER diagram to Tables
●
Assignment