DBMS Unit-1
UNIT-1
Introduction to Database Management System
As the name suggests, the database management system consists of two parts. They are:
1. Database and
2. Management System
What is a Database?
To find out what a database is, we have to start from data, which is the basic building block of any
DBMS.
Data: Facts, figures, statistics etc. having no particular meaning (e.g. 1, ABC, 19 etc).
Record: Collection of related data items, e.g. in the above example the three data items had no
meaning. But if we organize them in the following way, then they collectively represent meaningful
information.
1 ABC 19
Basis of Comparison | Data | Information
Meaning | It refers to raw facts which one gathers about something and which are bare and random. | Information is facts regarding something, put into context and refined through processing.
Measurement | We measure data in bits and bytes. | We measure information in units of time, quantity and more.
Example | A student’s exam score | The average score of the class
The columns of this relation are called Fields, Attributes or Domains. The rows are called Tuples
or Records.
A Historical Perspective
Database Management Systems (DBMS) have been around for several decades, and their history can be
traced back to the early 1960s. In the early days, computer systems were designed to manage data in a
hierarchical or navigational manner, where data was stored in a tree-like structure. This method of
storing data was inefficient and difficult to use, as it required a lot of manual effort to access and
manage the data.
In the early 1960s, Charles Bachman designed the first general-purpose DBMS, the Integrated Data
Store (IDS), which was based on the network data model. Bachman received the 1973 Turing Award for
this work (the Turing Award is often described as the equivalent of a Nobel Prize in the field of
Computer Science).
In 1970, Edgar Codd proposed a new data representation framework called the Relational Database
Model. Codd won the 1981 Turing Award for this seminal work. The model was based on the concept of a
table, with rows representing individual records and columns representing individual fields within
those records. The relational model allowed for more efficient storage and retrieval of data and was
easier to use than the hierarchical or navigational models.
In the 1970s, IBM developed the Structured Query Language (SQL) for relational databases as a part
of the System R project. This system was designed to manage large amounts of data and was used primarily
in corporate and government applications. SQL was later adopted as a standard by the American National
Standards Institute (ANSI) in 1986 and by the International Organization for Standardization (ISO) in 1987.
In the 1980s, several new DBMS products were introduced, including Oracle, Sybase, and Microsoft SQL
Server. These systems were designed to be more user-friendly and to support more advanced data
modeling and query languages.
In the 1990s, object-oriented DBMS (OODBMS) emerged, which were designed to store and manage
complex data structures, such as multimedia and other types of non-traditional data. These systems
were initially popular in research and academic environments, but their adoption was limited in the
commercial sector.
In 1992, Microsoft shipped MS Access, a personal DBMS that gradually displaced most other personal DBMS
products.
Around 1997, XML began to be applied to database processing, and many vendors started to integrate XML
into their DBMS products.
In the 2000s, web-based applications and cloud computing became more popular, and DBMS systems
began to adapt to these new technologies. New DBMS systems were developed to support distributed
and web-based applications, including NoSQL databases such as MongoDB and Cassandra.
Today, DBMS systems continue to evolve, with an emphasis on scalability, performance, and support for
cloud-based applications. Some of the most popular DBMS systems in use today include Oracle,
Microsoft SQL Server, MySQL, PostgreSQL, and MongoDB.
A file system is a technique for arranging files on a storage device such as a hard disk, pen drive, or DVD.
It helps you organize data and allows easy retrieval of files when they are required. A file system
controls how data is read from and written to the storage medium. It is installed on the computer
together with the operating system, such as Windows or Linux.
What is DBMS?
A Database Management System (DBMS) is software for storing and retrieving users’ data while
applying appropriate security measures. It consists of a group of programs that manipulate the
database. The DBMS accepts a request for data from an application and instructs the DBMS engine to
provide the specific data. In large systems, a DBMS helps users and other third-party software to store
and retrieve data.
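The request-and-response flow described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for a DBMS here, and the table and column names (student, roll_no, name, age) are assumptions chosen to match the "1 ABC 19" record used earlier.

```python
import sqlite3

# An in-memory SQLite database plays the role of the DBMS in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

# The application asks the DBMS to store a record ...
conn.execute("INSERT INTO student VALUES (?, ?, ?)", (1, "ABC", 19))
conn.commit()

# ... and later asks for it back. The application only states WHAT it wants;
# how the data is located on storage is left to the DBMS.
row = conn.execute(
    "SELECT name, age FROM student WHERE roll_no = ?", (1,)
).fetchone()
print(row)
```

The application never opens or parses a data file itself; every access goes through the DBMS interface.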
File System | DBMS
A file system is software that manages and organizes the files in a storage medium. It controls how data is stored and retrieved. | A DBMS (Database Management System) is a software application used for creating, accessing, and managing databases.
The file system exposes the details of data representation and storage. | A DBMS gives an abstract view of data that hides the details.
Storing and retrieving data cannot be done efficiently in a file system. | A DBMS is efficient to use, as there is a wide variety of methods to store and retrieve data.
It does not offer data recovery processes. | A DBMS provides backup and recovery of data.
The file system doesn’t have a crash recovery mechanism. | A DBMS provides a crash recovery mechanism.
Protecting a file system is very difficult. | A DBMS offers a good protection mechanism.
In a file management system, the redundancy of data is greater. | The redundancy of data is low in a DBMS.
Data inconsistency is higher in the file system. | Data inconsistency is low in a database management system.
The file system offers less security. | A database management system offers high security.
A file system stores data as isolated data files and entities. | A database management system stores data together with defined constraints and interrelations.
It does not support complicated transactions. | Complicated transactions are easy to implement.
Centralization is hard to achieve in a file management system. | Centralization is easy to achieve in a DBMS.
It doesn’t offer backup and recovery of data if it is lost. | A DBMS provides backup and recovery of data even if it is lost.
There is no efficient query processing in the file system. | You can easily query data in a database using the SQL language.
A file system doesn’t offer concurrency. | A DBMS provides a concurrency facility.
Disadvantages of File System:
Each application has its own data file, so the same data may have to be recorded and stored many
times.
Programs in a file processing system are data-dependent: they are tied to a particular file format, so
incompatible file formats become a problem.
Data sharing is limited.
Security is a problem.
It is time-consuming.
Maintaining the records of a big firm with a large number of items is difficult.
It requires a lot of manual labor.
A database management system is a computerized record-keeping system. It is a repository or a container for a collection of
computerized data files. The overall purpose of a DBMS is to allow the users to define, store, retrieve and update the
information contained in the database on demand. Information can be anything that is of significance to an individual
or organization.
Databases touch all aspects of our lives. Some of the major areas of application are as follows:
1. Banking
2. Airlines
3. Universities
4. Manufacturing and selling
5. Human resources
• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting information.
◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and for generation of
paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of items in factories,
inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For sales data noted above plus online order tracking, generation of recommendation lists, and
maintenance of online product evaluations.
• Banking and Finance
◦ Banking: For customer information, accounts, loans, and banking transactions.
◦ Credit card transactions: For purchases on credit cards and generation of monthly statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial instruments such as stocks and
bonds; also for storing real-time market data to enable online trading by customers and automated trading
by the firm.
• Universities: For student information, course registrations, and grades (in addition to standard enterprise
information such as human resources and accounting).
• Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid
calling cards, and storing information about the communication networks.
Advantages of DBMS:
Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e., storing the same data multiple times). In a
database system, a centralized database under the centralized control of the DBA avoids unnecessary duplication of
data. This also eliminates the extra time needed to process a large volume of duplicated data, and it saves storage
space.
Improved Data Sharing: A DBMS allows the same data to be shared by any number of users and application programs.
Data Integrity: Integrity means that the data in the database is accurate. Centralized control of the data lets
the administrator define integrity constraints on the data in the database. For example, in a customer
database we can enforce an integrity constraint that accepts customers only from the cities Noida and Meerut.
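The city rule from the customer example can be sketched as a CHECK constraint. This is a minimal illustration using Python's sqlite3 module; the table layout and the sample customer names are assumptions, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity constraint lives in the database, not in each application.
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL CHECK (city IN ('Noida', 'Meerut')))"
)

# A customer from an allowed city is accepted.
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Noida')")

# A customer from any other city is rejected by the DBMS itself.
try:
    conn.execute("INSERT INTO customer VALUES (2, 'Ravi', 'Delhi')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected, count)
```

Because the rule is declared once in the schema, every application that writes to the table is held to it.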
Security: Having complete authority over the operational data enables the DBA to ensure that the only means of
access to the database is through proper channels. The DBA can define authorization checks to be carried out whenever
access to sensitive data is attempted.
Data Consistency : By eliminating data redundancy, we greatly reduce the opportunities for inconsistency. For example, if a
customer address is stored only once, we cannot have disagreement on the stored values. Also, updating data values is
greatly simplified when each value is stored in one place only. Finally, we avoid the wasted storage that results from
redundant data storage.
Efficient Data Access : In a database system, the data is managed by the DBMS and all access to the data goes through the DBMS,
which is a key to effective data processing.
Enforcement of Standards : With centralized control of data, the DBA can establish and enforce data standards, which may
include naming conventions, data quality standards, etc.
Data Independence : In a database system, the database management system provides the interface between the application
programs and the data. When changes are made to the data representation, the metadata maintained by the DBMS is
changed, but the DBMS continues to provide the data to application programs in the previously used way. The DBMS handles
the task of transforming the data wherever necessary.
Reduced Application Development and Maintenance Time : A DBMS supports many important functions that are common to
many applications accessing data stored in the DBMS, which facilitates quick development of applications.
Disadvantages of DBMS
1) It is complex. Because it supports many functions to serve users well, the underlying software has
become complex. Designers and developers need thorough knowledge of the software to get the
most out of it.
2) Because of its complexity and functionality, it needs a large amount of memory to run
efficiently.
3) A DBMS works as a centralized system, i.e., all the users, wherever they are, access the same database.
Hence any failure of the DBMS will impact all the users.
4) A DBMS is generalized software, i.e., it is written to work for all kinds of systems rather than a specific one. Hence some
applications may run slowly.
Data Models
A data model is a collection of high-level data description constructs that hide many low-level storage
details. A DBMS allows a user to define the data to be stored in terms of a data model.
1. Hierarchical Model
2. Network Model
3. Entity-Relationship Model
4. Relational Model
5. Object-Based Data Model
1. Hierarchical Model
The hierarchical model was the first DBMS model. It organises the data in a hierarchical tree
structure.
The hierarchy starts from the root, which holds the root data, and then expands in the form of a tree by adding
child nodes to parent nodes. This model easily represents some real-world relationships, such as
food recipes or the sitemap of a website.
It depicts a set of one-to-many (1:M) relationships.
2. Network Model
This model is an extension of the hierarchical model. The only difference is that a record can have more
than one parent, so the hierarchical tree is replaced with a graph.
The network model was created to represent complex data relationships more effectively than the
hierarchical model, and to improve database performance and standards.
It depicts both one-to-many (1:M) and many-to-many (M:N) relationships.
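The difference between the two models can be sketched with plain Python dictionaries. This is an illustration of the parent rule only, not of how a real hierarchical or network DBMS stores records; the sitemap pages and the course and student names are made up.

```python
# Hierarchical model: every record has at most ONE parent (a tree).
# Example: the sitemap of a website.
parent_of = {
    "Home": None,            # root of the tree
    "Products": "Home",
    "About": "Home",
    "Laptops": "Products",
}

def path_to_root(node):
    """Follow the single parent link of each record up to the root."""
    path = [node]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path

# Network model: a record may have SEVERAL parents (a graph).
parents_of = {
    "Course-DBMS": [],
    "Course-OS": [],
    "Student-S1": ["Course-DBMS", "Course-OS"],   # two parents
    "Student-S2": ["Course-DBMS"],
}

def children_of(owner):
    """List every record that names `owner` as one of its parents."""
    return sorted(c for c, owners in parents_of.items() if owner in owners)

print(path_to_root("Laptops"))
print(children_of("Course-DBMS"))
```

A single parent link per record is enough for 1:M relationships, while the list of parents is what lets the network model express M:N relationships.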
3. Entity-Relationship Model
An ER model is the logical representation of data as objects and relationships among them. These
objects are known as entities, and relationship is an association among these entities.
Entity-Relationship Model Components
An ER diagram has three basic components:
1. Entities − An entity is a real-world thing, which can be a person, place, or even a concept. For example:
Department, Admin, Courses, Teachers, Students, Building, etc. are some of the entities of a
School Management System.
2. Attributes − An attribute is a real-world property of an entity. For example: the
entity employee has properties such as employee id, salary, and age.
3. Relationship − A relationship tells how two entities are related. For example: an employee works
for a department.
An entity has a real-world property called attribute and these attributes are defined by a set of values
called domain.
These concepts are explained below.
Advantages of Entity-Relationship Model
4. Relational Model
The relational model uses a collection of tables to represent both data and the relationships among the data. Tables are
also known as relations. Each table has multiple columns, called attributes, which are the
properties that define the relation. Each row of the table is called a tuple and holds one piece of
information (one record).
Tables: Relations are saved in table format. A table has two properties: rows and columns.
Attribute: Each column represents an attribute.
Tuple: Each row represents a tuple.
Relation Schema: A relation schema gives the name of the relation together with its attributes.
Degree: The total number of attributes in the relation is called the degree of the relation.
Cardinality: The total number of rows present in the table.
Column: A column represents the set of values for a specific attribute.
Relation instance: The set of tuples of a relation at a particular instant of time is called the
relation instance.
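The terms above can be read directly off a small table. The following sketch uses Python's sqlite3 module; the student relation and its rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schema: student(roll_no, name, age)
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "ABC", 19), (2, "DEF", 20), (3, "GHI", 19)],
)

cursor = conn.execute("SELECT * FROM student ORDER BY roll_no")
attributes = [col[0] for col in cursor.description]  # column names
tuples = cursor.fetchall()                           # relation instance right now

degree = len(attributes)     # number of attributes
cardinality = len(tuples)    # number of rows

# A column is the set of values of one attribute across all tuples.
age_column = [t[attributes.index("age")] for t in tuples]

print(attributes, degree, cardinality, age_column)
```

Inserting or deleting a row changes the relation instance and the cardinality, while the schema and the degree stay the same.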
1. Objects: An object is an abstraction of a real-world entity, or we can say it is an instance of a class.
An object encapsulates data and code into a single unit, which provides data abstraction by hiding the
implementation details from the user.
2. Attribute: An attribute describes a property of an object.
3. Methods: A method represents the behavior of an object, i.e., a real-world action.
4. Class: A class is a collection of similar objects with shared structure (attributes) and behavior.
5. Inheritance: New classes are created from existing classes.
Reduced Maintenance
Real-World Modeling
Improved Reliability and Flexibility
High Code Reusability
Data Abstraction is a process of hiding unwanted or irrelevant details from the end user. It provides a
different view and helps in achieving data independence which is used to enhance the security of data.
The database systems consist of complicated data structures and relations. For users to access the data
easily, these complications are kept hidden, and only the relevant part of the database is made
accessible to the users through data abstraction .
Levels of abstraction for DBMS
Database systems include complex data structures. To simplify data retrieval, reduce complexity for
users, and keep the system efficient, developers use levels of abstraction
that hide irrelevant details from the users. Levels of abstraction also simplify database design.
Mainly there are three levels of abstraction for DBMS
Internal Level/Schema
The internal schema defines the physical storage structure of the database. The internal schema is a
very low-level representation of the entire database. It contains multiple occurrences of multiple
types of internal record. In ANSI terminology, an internal record is also called a “stored record”.
It keeps information about the actual representation of the entire database, such as the
actual storage of the data on the disk in the form of records.
The internal view tells us what data is stored in the database and how.
It never deals with the physical devices directly. Instead, the internal schema views a physical device as a
collection of physical pages.
Conceptual Schema/Level
The conceptual schema describes the structure of the whole database for the community
of users. This schema hides information about the physical storage structures and focuses on
describing data types, entities, relationships, etc.
This logical level comes between the user level and the physical storage view. There is only
a single conceptual view of a single database.
At the conceptual level, the data available to a user must be contained in or derivable from the
physical level.
External Schema/Level
An external schema describes the part of the database that a specific user is interested in. It hides
the unrelated details of the database from the user. There may be any number (“n”) of external views for
each database.
Each external view is defined using an external schema, which consists of definitions of the various types
of external record of that specific view.
An external view is just the content of the database as it is seen by some particular user. For
example, a user from the sales department will see only sales-related data.
An external level is only related to the data which is viewed by specific end users.
The external schema describes the segment of the database which is needed by a certain user
group and hides the remaining details of the database from that user group.
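One common way to realise an external view is an SQL view. The sketch below uses Python's sqlite3 module and assumes a hypothetical employee table and a sales_staff view; it only illustrates the idea that a sales user sees sales-related data and nothing else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full employee relation.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Sales", 50000.0),
        (2, "Ravi", "HR", 45000.0),
        (3, "Meena", "Sales", 52000.0),
    ],
)

# External level: a view for the sales department that hides other
# departments and the salary column.
conn.execute(
    "CREATE VIEW sales_staff AS"
    " SELECT emp_id, name FROM employee WHERE dept = 'Sales'"
)

sales_rows = conn.execute(
    "SELECT * FROM sales_staff ORDER BY emp_id"
).fetchall()
print(sales_rows)
```

Several such views can be defined over the same conceptual schema, one per user group.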
Every user should be able to access the same data while seeing a customized view of that data.
The user need not deal directly with physical database storage details.
The DBA should be able to change the database storage structure without disturbing the users’
views.
The internal structure of the database should remain unaffected when changes are made to the
physical aspects of storage.
The DBMS architecture allows you to make changes at the presentation level without affecting the
other two layers.
It is more secure, as the client doesn’t have direct access to the database business logic.
If one tier fails, no data is lost, because the data can still be accessed through the other
tier.
The complete DB schema is a complex structure that is difficult for everyone to understand.
The physical separation of the tiers can affect the performance of the database.
There are two levels of data independence based on three levels of abstraction. These are as
follows −
o Physical data independence can be defined as the capacity to change the internal schema
without having to change the conceptual schema.
o If we do any changes in the storage size of the database system server, then the Conceptual
structure of the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal levels.
o Physical data independence occurs at the logical interface level.
The changes in the physical level may include changes using the following −
o Logical data independence refers to the ability to change the conceptual schema
without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we make any changes in the conceptual view of the data, the user view of the data will not
be affected.
o Logical data independence occurs at the user interface level.
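Both kinds of data independence can be sketched with Python's sqlite3 module. In this illustration, adding an index stands in for a change to the internal schema and adding a column stands in for a change to the conceptual schema; the product table, the price_list view and the index name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (prod_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Pen", 10.0), (2, "Book", 120.0)],
)
# External schema: what the user sees.
conn.execute("CREATE VIEW price_list AS SELECT name, price FROM product")

def read_price_list():
    """The user's query; it is never rewritten in this sketch."""
    return conn.execute(
        "SELECT name, price FROM price_list ORDER BY name"
    ).fetchall()

before = read_price_list()

# Physical data independence: change the storage structure (add an index);
# the conceptual schema and the user's query are untouched.
conn.execute("CREATE INDEX idx_product_name ON product(name)")
after_physical_change = read_price_list()

# Logical data independence: change the conceptual schema (add a column);
# the external view and the user's query are untouched.
conn.execute("ALTER TABLE product ADD COLUMN supplier TEXT")
after_logical_change = read_price_list()

print(before == after_physical_change == after_logical_change)
```

The same query returns the same result after each change, which is the practical meaning of the two independence levels.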
DBMS Architecture
Database architecture uses languages to design a particular type of software for business
organizations.
1-Tier Architecture
1. In this architecture, there is only one server, and multiple clients may be present.
2. Only the user interface is present.
3. A file server is used rather than a database server.
4. It is mainly used where changes to the data are infrequent and there
are not multiple users accessing the data.
5. Basically, a one-tier architecture keeps all of the elements of an
application, including the interface, middleware, and back-end data, in only one
place.
For example – Let us say we want to fetch the records of employees from the database and the
database is available on the computer system, so the request to fetch employee details will be made by
the computer and data will be fetched from the database. This type of system is referred to as a local
database system.
2-Tier Architecture
1. It is a client-server architecture with two layers: the client tier and
the database tier.
2. The client communicates directly with the database server, without an intermediate layer.
3. A database server is used.
4. The database system is present on the server machine and the DBMS application
is present on the client machine. The two machines are connected with each other via
a reliable network.
5. All the clients communicate with the database server which is present in the
organization.
6. Whenever the client machine makes a request to access the database located at the
server using a query language like SQL, the server performs the request on the
database and returns the result to the client.
Application connection interfaces such as JDBC and ODBC are generally used for the
interconnection between server and client.
There is no intermediate layer between the client and the server.
3-Tier Architecture
1. In this architecture, there are three layers: the client tier, the business logic tier, and the
database tier.
2. The client application does not communicate directly with the database system.
Instead, it communicates with a server application, and the server application
communicates internally with the database system.
3. This architecture separates tiers depending on the complexity of the users and how they
use the data present in the database.
4. It is typically used for web-based applications.
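The three layers can be sketched in a single Python file. This is a toy illustration in which all three tiers run in one process; in a real deployment they would usually be separate machines or services. The names EmployeeService and client_request, and the sample data, are hypothetical.

```python
import sqlite3

# --- Database tier: owns the data ---------------------------------------
def make_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")]
    )
    return conn

# --- Business logic tier: the only code that talks to the database ------
class EmployeeService:
    def __init__(self, conn):
        self._conn = conn

    def get_employee_name(self, emp_id):
        # Validation rules live here, not in the client.
        if not isinstance(emp_id, int) or emp_id <= 0:
            raise ValueError("emp_id must be a positive integer")
        row = self._conn.execute(
            "SELECT name FROM employee WHERE emp_id = ?", (emp_id,)
        ).fetchone()
        return row[0] if row else None

# --- Client tier: knows only the service, never the SQL or the tables ---
def client_request(service, emp_id):
    name = service.get_employee_name(emp_id)
    if name is None:
        return f"Employee {emp_id} not found"
    return f"Employee {emp_id}: {name}"

service = EmployeeService(make_database())
print(client_request(service, 1))
print(client_request(service, 99))
```

Because the client depends only on the service interface, the database tier can be changed without touching client code.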
Database Management System (DBMS) is a software that allows access to data stored in a database
and provides an easy and effective method of –
Defining the information.
Storing the information.
Manipulating the information.
Protecting the information from system crashes or data theft.
Differentiating access permissions for different users.
Please note that the Structure of a Database Management System is also referred to as the Overall
System Structure or Database Architecture, but it is different from the tier architecture of a database.
The database system is divided into three components: Query Processor, Storage Manager, and Disk
Storage.
These are explained as following below.
Architecture of DBMS
1. Query Processor :
It interprets the requests (queries) received from end user via an application program into
instructions. It also executes the user request which is received from the DML compiler.
Query Processor contains the following components –
DML Compiler –
It processes the DML statements into low level instruction (machine language), so that they can be
executed.
DDL Interpreter –
It processes the DDL statements into a set of table containing meta data (data about data).
Query Optimizer –
It chooses an efficient execution plan for the instructions generated by the DML Compiler, which are then executed.
2. Storage Manager :
Storage Manager is a program that provides an interface between the data stored in the database and
the queries received. It is also known as the Database Control System. It maintains the consistency and
integrity of the database by applying the constraints, and it executes the DCL statements. It is responsible
for updating, storing, deleting, and retrieving data in the database.
It contains the following components –
Authorization Manager –
It ensures role-based access control, i.e., it checks whether a particular person is privileged to
perform the requested operation or not.
Integrity Manager –
It checks the integrity constraints when the database is modified.
Transaction Manager –
It controls concurrent access by scheduling the operations of the transactions it receives.
Thus, it ensures that the database remains in a consistent state before and after the
execution of a transaction.
File Manager –
It manages the file space and the data structure used to represent information in the database.
Buffer Manager –
It is responsible for cache memory and the transfer of data between the secondary storage and
main memory.
Data Dictionary –
It contains the information about the structure of any database object. It is the repository of
information that governs the metadata.
Indices –
They provide faster retrieval of data items.
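The cooperation of the integrity manager and the transaction manager can be sketched with Python's sqlite3 module. SQLite's internals are not organised exactly like the component list above, so this only illustrates the observable behaviour: a transaction that would violate a constraint leaves the database unchanged. The account table and the transfer function are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0))"   # integrity constraint
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money as ONE transaction: both updates happen, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the earlier credit
        # was rolled back together with it.
        return False

ok = transfer(conn, 1, 2, 30.0)        # allowed
failed = transfer(conn, 1, 2, 500.0)   # would make account 1 negative
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(ok, failed, balances)
```

After the failed transfer, account 2 does not keep the credited amount, so the database is in the same consistent state as before the transaction started.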
What is an ER Diagram?
An Entity Relationship Diagram (ER Diagram) pictorially explains the relationship between entities to be
stored in a database. Fundamentally, the ER Diagram is a structural design of the database. It acts as a
framework created with specialized symbols for the purpose of defining the relationship between the
database entities. ER diagram is created based on three principal components: entities, attributes, and
relationships.
Lines: It links attributes to entity types and entity types with other relationship types
Components of ER Diagram
Entities
Weak Entity
Attributes
Key Attribute
Composite Attribute
Multivalued Attribute
Derived Attribute
Relationships
One-to-One Relationships
One-to-Many Relationships
Many-to-One Relationships
Many-to-Many Relationships
Entities
For example, in a student study course, both the student and the course are entities.
Weak Entity
An entity that relies on another entity for its identification is called a weak entity.
In the example below, school is a strong entity because it has a primary key attribute, the school number.
Unlike school, the classroom is a weak entity because it does not have a primary key of its own, and the room
number here acts only as a discriminator.
Entity Set:
An Entity is an object of Entity Type and a set of all entities is called an entity set. For Example, E1 is an
entity having Entity Type Student and the set of all students is called Entity Set. In ER diagram, Entity
Type is represented as:
Entity Set
Attribute
Key Attribute
For example: For a student entity, the roll number can uniquely identify a student from a set of students.
Composite Attribute
Multivalued Attribute
Some attributes can hold more than one value; such attributes are called multivalued attributes.
Derived Attribute
An attribute that can be derived from other attributes of the entity is known as a derived attribute.
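The four attribute kinds can be sketched on a single student entity using Python dataclasses. The notes do not define the composite attribute, so treating a name as a composite of first and last name is an assumption, as are the field names and sample values.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:
    # Composite attribute (assumed meaning): made up of smaller parts.
    first: str
    last: str

@dataclass
class Student:
    roll_no: int                  # key attribute: identifies one student
    name: Name                    # composite attribute
    date_of_birth: date
    phone_numbers: list = field(default_factory=list)  # multivalued attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

s = Student(1, Name("Asha", "Verma"), date(2005, 6, 15), ["111", "222"])
print(s.age_on(date(2024, 6, 14)), s.age_on(date(2024, 6, 15)))
```

Storing the date of birth and deriving the age avoids having to update a stored age every year.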
Relationship
In the example below, both the student and the course are entities, and study is the relationship
between them.
One-to-One Relationship
When a single element of an entity is associated with a single element of another entity, it is called a
one-to-one relationship.
For example, a student has only one identification card and an identification card is given to one person.
One-to-Many Relationship
When a single element of an entity is associated with more than one element of another entity, it is
called a one-to-many relationship
For example, a customer can place many orders, but an order cannot be placed by many customers.
Many-to-One Relationship
When more than one element of an entity is related to a single element of another entity, then it is
called a many-to-one relationship.
For example, students have to opt for a single course, but a course can have many students.
Many-to-Many Relationship
When more than one element of an entity is associated with more than one element of another entity,
this is called a many-to-many relationship.
For example, you can assign an employee to many projects and a project can have many employees.
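When such relationships are stored in relational tables, a one-to-many relationship is commonly represented by a foreign key on the "many" side, and a many-to-many relationship by a separate linking table. The sketch below uses Python's sqlite3 module; the table names, column names and rows are assumptions based on the customer/order and employee/project examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    -- One-to-many: each order row points at exactly one customer.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customer(cust_id)
    );

    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- Many-to-many: one row per (employee, project) pair.
    CREATE TABLE assignment (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );

    INSERT INTO customer VALUES (1, 'Asha');
    INSERT INTO orders VALUES (100, 1), (101, 1);
    INSERT INTO employee VALUES (1, 'Ravi'), (2, 'Meena');
    INSERT INTO project VALUES (10, 'Payroll'), (11, 'Website');
    INSERT INTO assignment VALUES (1, 10), (1, 11), (2, 10);
    """
)

orders_of_asha = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE cust_id = 1"
).fetchone()[0]
projects_of_ravi = [r[0] for r in conn.execute(
    "SELECT proj_id FROM assignment WHERE emp_id = 1 ORDER BY proj_id")]
staff_on_payroll = [r[0] for r in conn.execute(
    "SELECT emp_id FROM assignment WHERE proj_id = 10 ORDER BY emp_id")]
print(orders_of_asha, projects_of_ravi, staff_on_payroll)
```

An order can name only one customer because cust_id is a single column, whereas the assignment table allows any number of pairs in both directions.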
Relationship Set
A set of relationships of the same type is known as a relationship set .
The following relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in
C3.
Relationship Set
1. Unary Relationship: When there is only ONE entity set participating in a relationship, the relationship is
called a unary relationship. For example, one person is married to only one person.
Unary Relationship
2. Binary Relationship: When there are TWO entity sets participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
Binary Relationship
3. Ternary Relationship: When there are exactly THREE entity sets participating in a relationship, the
relationship is called a ternary relationship.
4. n-ary Relationship: When there are n entity sets participating in a relationship, the relationship is called
an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a relationship set is known as cardinality.
Participation Constraint
Specialization – An entity set may be broken down into sub-entities that are distinct in some way from other entities in
the set. For instance, a subset of entities within an entity set may have attributes that are not shared by all
the entities in the entity set. The E-R model provides a means for representing these distinctive entity
groupings.
Specialization is a “top-down approach” in which a high-level entity is specialized into two or more lower-level
entities.
Example – Consider an entity set vehicle, with attributes color and no. of tires. A vehicle may be further
classified as one of the following:
Car
Bike
Bus
Each of these vehicle types is described by a set of attributes that includes all the attributes of the entity set
vehicle plus possibly additional attributes. For example, car entities may be described further by the
attribute gear, whereas bike entities may be described further by the attribute automatic brake. The
process of designating subgroupings within an entity set is called specialization. The specialization of
vehicles allows us to distinguish among vehicles according to whether they are cars, buses, or bikes.
Generalization – It is a process of extracting common properties from a set of entities and creating a
generalized entity from them. Generalization is a “bottom-up approach” in which two or more entities are
combined to form a higher-level entity if they have some attributes in common.
Example: There are three entities given: car, bus, and bike. They have some attributes in common, since
cars, buses, and bikes all have a number of tires and a color. So they can all be grouped under a
superclass named vehicle.
Inheritance – An entity that is a member of a subclass inherits all the attributes it has as a member
of the superclass. The entity also inherits all the relationships that the superclass participates in. Inheritance
is an important feature of Generalization and Specialization. It allows lower-level entities to inherit the
attributes of higher-level entities.
Example – Cars, bikes, and buses inherit the attributes of a vehicle. Thus, a car is described by its color and
no. of tires attributes, and additionally a gear attribute; a bike is described by its color and no. of tires attributes, and
additionally an automatic brake attribute.
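The vehicle example can be sketched with Python classes, where the superclass plays the role of the higher-level entity. This is an analogy to attribute inheritance in the E-R model rather than a database implementation; the attribute values are made up.

```python
class Vehicle:
    """Higher-level entity: attributes shared by all vehicles."""
    def __init__(self, color, no_of_tires):
        self.color = color
        self.no_of_tires = no_of_tires

class Car(Vehicle):
    """Lower-level entity: inherits color and no_of_tires, adds gear."""
    def __init__(self, color, no_of_tires, gear):
        super().__init__(color, no_of_tires)
        self.gear = gear

class Bike(Vehicle):
    """Lower-level entity: inherits color and no_of_tires, adds automatic_brake."""
    def __init__(self, color, no_of_tires, automatic_brake):
        super().__init__(color, no_of_tires)
        self.automatic_brake = automatic_brake

car = Car("red", 4, "manual")
bike = Bike("black", 2, True)

# Every car and bike is also a vehicle and carries the vehicle attributes.
print(isinstance(car, Vehicle), car.color, bike.no_of_tires)
# Attributes added by one subclass do not appear on the other.
print(hasattr(bike, "gear"))
```

Reading the class diagram downwards from Vehicle corresponds to specialization, and reading it upwards from Car and Bike corresponds to generalization.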
Aggregation – In aggregation, the relation between two entities is treated as a single entity. In aggregation,
the relationship with its corresponding entities is aggregated into a higher-level entity.
Example- phone numbers on your mobile phone. You can refer to them individually – your mother’s
number, your best friend’s number, etc. But it’s easier to think of them collectively, as your phone number
list. It is also important to realize that each member of the aggregation still has the properties of the whole.
In other words, each phone number in the list remains a phone number. The process of combining them has
not altered them in any way.
Weak entity
A weak entity is an entity set that does not have sufficient attributes for unique identification of its
records.
Example 1 – A loan entity cannot be created for a customer if the customer doesn’t exist.
Simply put, a weak entity is an entity that does not have a primary key attribute of its own.
It contains a partial key called a discriminator, which helps in identifying a group of entities from
the entity set.
A discriminator is represented by underlining it with a dashed line.
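In a relational schema, a weak entity is commonly represented by a table whose primary key combines the owner's key with the discriminator. The sketch below uses Python's sqlite3 module with a school as the strong entity and a classroom as the weak entity; the column names, the ON DELETE CASCADE choice and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Strong entity: has its own primary key.
    CREATE TABLE school (school_no INTEGER PRIMARY KEY, name TEXT);

    -- Weak entity: room_no is only a discriminator, so the key is
    -- (owner key, discriminator).
    CREATE TABLE classroom (
        school_no INTEGER NOT NULL
            REFERENCES school(school_no) ON DELETE CASCADE,
        room_no   INTEGER NOT NULL,
        capacity  INTEGER,
        PRIMARY KEY (school_no, room_no)
    );

    INSERT INTO school VALUES (1, 'North'), (2, 'South');
    -- The same room number can exist in two different schools.
    INSERT INTO classroom VALUES (1, 101, 30), (2, 101, 25);
    """
)

# A classroom cannot exist without its owning school.
try:
    conn.execute("INSERT INTO classroom VALUES (3, 101, 20)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# Removing the owner removes its dependent classrooms as well.
conn.execute("DELETE FROM school WHERE school_no = 1")
remaining = conn.execute(
    "SELECT school_no, room_no FROM classroom"
).fetchall()
print(orphan_allowed, remaining)
```

The loan and customer example follows the same pattern, with the customer as the owner and the loan as the dependent entity.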
Representation