DBMS Unit-1
What is Database
A database is a collection of interrelated data organized so that the data can be
retrieved, inserted, and deleted efficiently. The data is organized in the form of tables,
schemas, views, reports, and so on.
For example, a college database organizes the data about the admin, staff, students,
and faculty.
Using the database, you can easily retrieve, insert, and delete the information.
Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the
information.
o It can provide a clear and logical view of the process that manipulates data.
o DBMS contains automatic backup and recovery procedures.
o It supports the ACID properties, which keep data in a consistent state in case of
failure.
o It can reduce the complex relationship between data.
o It is used to support manipulation and processing of data.
o It is used to provide security of data.
o It lets users view the database from different viewpoints according to their
requirements.
Advantages of DBMS
o Controls database redundancy: It can control data redundancy because it stores
all the data in one single database, so the same data does not have to be
recorded in several places.
o Data sharing: In a DBMS, the authorized users of an organization can share the
data among themselves.
o Easy maintenance: It is easy to maintain because of the centralized nature
of the database system.
o Reduced time: It reduces development time and maintenance needs.
o Backup: It provides backup and recovery subsystems, which automatically back up
data to protect against hardware and software failures and restore the data if
required.
o Multiple user interfaces: It provides different types of user interfaces, such as
graphical user interfaces and application program interfaces.
Disadvantages of DBMS
o Cost of hardware and software: It requires a high-speed processor and a
large amount of memory to run the DBMS software.
o Size: It occupies a large amount of disk space and needs a large amount of memory
to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a severe impact because, in most
organizations, all the data is stored in a single database. If the database is
damaged by a power failure or database corruption, the data may be lost
forever.
DBMS vs. File System
There are the following differences between DBMS and File systems:
Sharing of data
o DBMS: Due to the centralized approach, data sharing is easy.
o File system: Data is distributed in many files, and it may be of different formats,
so it isn't easy to share data.
Data Abstraction
o DBMS: DBMS gives an abstract view of data that hides the details.
o File system: The file system exposes the details of the data representation and
storage of data.
Security and Protection
o DBMS: DBMS provides a good protection mechanism.
o File system: It isn't easy to protect a file under the file system.
Recovery Mechanism
o DBMS: DBMS provides a crash recovery mechanism, i.e., DBMS protects the user from
system failure.
o File system: The file system doesn't have a crash recovery mechanism, i.e., if the
system crashes while entering some data, then the content of the file may be lost.
Manipulation Techniques
o DBMS: DBMS contains a wide variety of sophisticated techniques to store and
retrieve the data.
o File system: The file system can't efficiently store and retrieve the data.
Where to use
o DBMS: The database approach is used in large systems which interrelate many files.
o File system: The file system approach is suited to small systems with few
interrelated files.
Data Redundancy and Inconsistency
o DBMS: Due to the centralization of the database, the problems of data redundancy
and inconsistency are controlled.
o File system: The files and application programs are created by different
programmers, so there is a lot of duplication of data, which may lead to
inconsistency.
Structure
o DBMS: The database structure is complex to design.
o File system: The file system approach has a simple structure.
Data Independence
o DBMS: Data independence exists, and it can be of two types: logical data
independence and physical data independence.
o File system: In the file system approach, there is no data independence.
Data Models
o DBMS: In the database approach, three types of data models exist, such as the
hierarchical data model.
o File system: In the file system approach, there is no concept of data models.
Flexibility
o DBMS: Changes to the content of the stored data are often necessary, and these
changes are made more easily with a database approach.
o File system: The flexibility of the system is less as compared to the DBMS
approach.
1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the
user works directly on the DBMS.
o Any changes done here are applied directly to the database itself. It doesn't
provide a handy tool for end users.
o The 1-Tier architecture is used for the development of local applications, where
programmers can communicate directly with the database for a quick response.
2-Tier Architecture
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and the server. In
this architecture, the client can't directly communicate with the server.
o The 3-Tier architecture is used in the case of large web applications.
4) Semistructured Data Model: This type of data model is different from the other
three data models (explained above). The semistructured data model allows data
specifications in which individual data items of the same type may have different
sets of attributes. The Extensible Markup Language, also known as XML, is widely
used for representing semistructured data. Although XML was initially designed for
adding markup information to text documents, it has gained importance because of
its application in the exchange of data.
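As a small illustration of items of the same type carrying different attribute sets, the sketch below parses an XML fragment with Python's standard xml.etree.ElementTree module. The element names (students, student, name, phone, email) and the values are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: both items are <student> elements, but each one
# carries a different set of child fields.
XML_TEXT = """
<students>
  <student id="1">
    <name>Asha</name>
    <phone>555-0101</phone>
  </student>
  <student id="2">
    <name>Ravi</name>
    <email>ravi@example.com</email>
    <phone>555-0102</phone>
    <phone>555-0103</phone>
  </student>
</students>
"""


def field_names(xml_text):
    """Map each student's id to the sorted list of field names it uses."""
    root = ET.fromstring(xml_text)
    return {
        student.get("id"): sorted({child.tag for child in student})
        for student in root.findall("student")
    }


fields = field_names(XML_TEXT)
for student_id, names in fields.items():
    print(student_id, names)
```

The second student has an email field and two phone fields that the first student lacks, which a fixed relational row could not express without schema changes.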
Database Schema
A database schema is the skeleton structure that represents the logical view of the
entire database. It defines how the data is organized and how the relations among
them are associated. It formulates all the constraints that are to be applied on the data.
A database schema defines its entities and the relationships among them. It contains
descriptive details of the database, which can be depicted by means of schema
diagrams. Database designers design the schema to help programmers
understand the database and make it useful.
Database Instance
It is important to distinguish between a database schema and a database instance. The
database schema is the skeleton of the database. It is designed before the database
exists. Once the database is operational, it is very difficult to make any changes to
the schema. A database schema does not contain any data or information.
A database instance is the state of an operational database, with its data, at a given
point in time. It contains a snapshot of the database. Database instances tend to change
with time. A DBMS ensures that every instance (state) is valid by diligently enforcing
all the validations, constraints, and conditions that the database designers have
imposed.
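The difference between schema and instance can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student table and its rows are hypothetical, and SQLite is used here simply because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def schema(conn):
    """Return the stored table definitions (the schema)."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


def instance(conn):
    """Return the current rows of the student table (the instance)."""
    return conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()


schema_before = schema(conn)
instance_before = instance(conn)  # no data has been stored yet

conn.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
conn.execute("INSERT INTO student (id, name) VALUES (2, 'Ravi')")

schema_after = schema(conn)
instance_after = instance(conn)

print(schema_before == schema_after)    # the schema did not change
print(instance_before, instance_after)  # the instance did
```

Inserting rows changes the instance from one snapshot to the next, while the schema stays exactly as it was defined.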
Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the ability to modify the schema at one level
of the database system without altering the schema at the next higher level.
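A rough sketch of this idea, using Python's sqlite3 module, is given below. It assumes a hypothetical student table that applications read only through a view named student_names. A change at the physical level (adding an index) and a change at the logical level (adding a column) both leave the application's query and its result untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Asha", 20), (2, "Ravi", 22), (3, "Meena", 20)],
)

# View level: the application reads through this view only.
conn.execute("CREATE VIEW student_names AS SELECT id, name FROM student")

QUERY = "SELECT name FROM student_names ORDER BY id"
before = conn.execute(QUERY).fetchall()

# Physical-level change: add an index on the base table.
conn.execute("CREATE INDEX idx_student_age ON student (age)")
after_index = conn.execute(QUERY).fetchall()

# Logical-level change: add a column to the base table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_column = conn.execute(QUERY).fetchall()

print(before == after_index == after_column)
```

The first change corresponds to physical data independence and the second to logical data independence; in both cases the query text itself did not have to be rewritten.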
Database Language
o A DBMS has appropriate languages and interfaces to express database queries and
updates.
o Database languages can be used to read, store and update the data in the database.
Types of Database Language
o DDL stands for Data Definition Language. It is used to define database structure or
pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the number of
tables and schemas, their names, indexes, columns in each table, constraints, etc.
These commands are used to update the database schema, which is why they come under
the data definition language.
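A minimal sketch of DDL statements, run through Python's sqlite3 module, is shown below. The course table and the index name are hypothetical, and the statements shown (CREATE, ALTER, DROP) are only a sample of DDL. The catalog helper reads SQLite's own metadata table, sqlite_master, to show that DDL changes the stored structure rather than the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def catalog(conn):
    """List (type, name) pairs for the objects recorded in the catalog."""
    rows = conn.execute("SELECT type, name FROM sqlite_master ORDER BY type, name")
    return rows.fetchall()


# CREATE: define a table and an index.
conn.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_course_title ON course (title)")
after_create = catalog(conn)

# ALTER: add a column to the existing table.
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]

# DROP: remove the table; SQLite removes its indexes along with it.
conn.execute("DROP TABLE course")
after_drop = catalog(conn)

print(after_create)
print(columns)
print(after_drop)
```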
o DCL stands for Data Control Language. It is used to control access to the stored
data, for example by granting and revoking user privileges.
o DCL execution is transactional, so it also has rollback parameters.
(But in the Oracle database, the execution of data control language cannot be
rolled back.)
There are the following operations which have the authorization of Revoke:
Types of DBA
• Administrative DBA – This DBA is mainly concerned with installing and maintaining DBMS
servers. The prime tasks are installation, backups, recovery, security, replication, memory
management, configuration, and tuning. This DBA is mainly responsible for all administrative
tasks of a database.
• Development DBA – This DBA is responsible for creating queries and procedures for the
requirement. Basically, the task is similar to that of any database developer.
• Database Architect – The database architect is responsible for creating and maintaining the
users, roles, access rights, tables, views, constraints, and indexes, and is mainly responsible
for designing the structure of the database depending on the requirement. These structures
will be used by developers and the development DBA to code.
• Data Warehouse DBA – This DBA should be able to maintain the data and procedures from
various sources in the data warehouse. These sources can be files, COBOL, or any other
programs. Here data and programs will be from different sources. A good DBA should be able to
keep the performance and function levels from these sources at the same pace to make the data
warehouse work.
• Application DBA – This DBA acts like a bridge between the application program and the
database, and makes sure every application program is optimized to interact with the database.
This DBA ensures that all activities, from installing, upgrading, patching, maintaining, backup,
and recovery to executing the records, work without any issues.
• OLAP DBA – This DBA is responsible for installing and maintaining the database in OLAP
systems, and maintains only OLAP databases.
ACID Properties
ACID stands for Atomicity, Consistency, Isolation, and Durability:
1) Atomicity: Atomicity means that a transaction is treated as a single, indivisible
unit. If any operation is performed on the data, it should either be executed
completely or not be executed at all. The operation should not stop midway or
execute partially.
2) Consistency: Consistency means that the integrity of the data is always preserved.
In a DBMS, if a change is made to the database, the integrity constraints must still
hold afterward. In the case of transactions, the integrity of the data is essential
so that the database remains consistent before and after the transaction. The data
should always be correct.
3) Isolation: The term 'isolation' means separation. In a DBMS, isolation is the
property that operations running concurrently do not affect one another. In short, it
should appear as if one operation begins only after the other is complete. In the case
of transactions, when two or more transactions occur simultaneously, consistency
should be maintained. Any changes made in a particular transaction are not seen by
other transactions until the change is committed.
4) Durability: Once a transaction is committed, its changes are permanent and survive
a later system failure.
Therefore, the ACID properties of a DBMS play a vital role in maintaining the
consistency and availability of data in the database.
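Atomicity and consistency can be sketched with a small money-transfer example using Python's sqlite3 module. The account table, the names A and B, and the amounts are hypothetical. The CHECK constraint stands in for an integrity rule, and the "with conn:" block commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account (name, balance) VALUES (?, ?)",
    [("A", 100), ("B", 50)],
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount between accounts as one transaction; return True on success."""
    try:
        with conn:  # commit on success, roll back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account").fetchall())


first_ok = transfer(conn, "A", "B", 30)    # both updates succeed
after_first = balances(conn)
second_ok = transfer(conn, "A", "B", 500)  # the debit would make A negative
after_second = balances(conn)

print(first_ok, after_first)
print(second_ok, after_second)
```

In the second transfer the credit to B is executed first, but the failed debit rolls the whole transaction back, so no partial result is left behind.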
Components of DBMS
In this section, we will look at the common components that are universal
across all DBMS software, including:
• Storage engine
• Query language
• Query processor
• Optimization engine
• Metadata catalog
• Log manager
• Reporting and monitoring tools
• Data utilities
A decision support system helps in decision-making but does not necessarily give a
decision itself. The decision makers compile useful information from raw data, documents,
personal knowledge, and/or business models to identify and solve problems and make
decisions.
Attributes of a DSS
• Adaptability and flexibility
• High level of Interactivity
• Ease of use
• Efficiency and effectiveness
• Complete control by decision-makers
• Ease of development
• Extendibility
• Support for modeling and analysis
• Support for data access
• Standalone, integrated, and Web-based
Components of a DSS
• Database Management System (DBMS) − To solve a problem, the necessary
data may come from an internal or external database.
• Model Management System − It stores and accesses models that managers
use to make decisions. Such models are used for designing a manufacturing
facility, analyzing the financial health of an organization, forecasting demand for
a product or service, etc.
• Support Tools − Support tools such as online help, pull-down menus, user
interfaces, graphical analysis, and error correction mechanisms facilitate the user's
interactions with the system.
ER model
o ER model stands for Entity-Relationship model. It is a high-level data model. This
model is used to define the data elements and relationships for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and
easy-to-design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an
entity-relationship diagram.
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity is
represented as a rectangle.
a. Weak Entity
• An entity that depends on another entity is called a weak entity. The weak entity
doesn't contain any key attribute of its own. The weak entity is represented by a
double rectangle.
2. Attribute
• An attribute is used to describe a property of an entity. An ellipse is used to
represent an attribute.
• For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
• The key attribute is used to represent the main characteristic of an entity. It
represents a primary key. The key attribute is represented by an ellipse with the
text underlined.
b. Composite Attribute
c. Multivalued Attribute
• An attribute can have more than one value. Such attributes are known as
multivalued attributes. A double oval is used to represent a multivalued attribute.
• For example, a student can have more than one phone number.
d. Derived Attribute
3. Relationship
• A relationship is used to describe the relation between entities. A diamond or
rhombus is used to represent the relationship.
a. One-to-One Relationship
• When only one instance of each entity is associated with the relationship, it is
known as a one-to-one relationship.
• For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship
• When only one instance of the entity on the left and more than one instance of
the entity on the right are associated with the relationship, it is known as a
one-to-many relationship.
• For example, a scientist can invent many inventions, but each invention is made by
one specific scientist.
c. Many-to-one relationship
• When more than one instance of the entity on the left and only one instance of
the entity on the right are associated with the relationship, it is known as a
many-to-one relationship.
• For example, a student enrolls in only one course, but a course can have many
students.
d. Many-to-many relationship
• When more than one instance of the entity on the left and more than one
instance of the entity on the right are associated with the relationship, it is known
as a many-to-many relationship.
• For example, an employee can be assigned to many projects, and a project can have
many employees.
Notation of ER diagram
Mapping Constraints
o A mapping constraint is a data constraint that expresses the number of entities to which
another entity can be related via a relationship set.
o It is most useful in describing binary relationship sets, although it can also be
applied to relationship sets that involve more than two entity sets.
o For a binary relationship set R between entity sets E1 and E2, there are four possible
mapping cardinalities. These are as follows:
1. One to one (1:1)
2. One to many (1:M)
3. Many to one (M:1)
4. Many to many (M:M)
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and
an entity in E2 is associated with at most one entity in E1.
One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in E2,
and an entity in E2 is associated with at most one entity in E1.
Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and
an entity in E2 is associated with any number of entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities in
E2, and an entity in E2 is associated with any number of entities in E1.
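The four definitions above can be sketched as a small Python function. It takes a relationship set as a list of (e1, e2) pairs and reports which cardinality the observed pairs fit. The function name and the sample identifiers are hypothetical. Note that this only classifies the pairs it is given, whereas a real mapping constraint states what is allowed, not what happens to be present.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (e1, e2) pairs."""
    right_of = defaultdict(set)  # e1 -> the E2 entities it is linked to
    left_of = defaultdict(set)   # e2 -> the E1 entities it is linked to
    for e1, e2 in pairs:
        right_of[e1].add(e2)
        left_of[e2].add(e1)

    # Does some E1 entity relate to more than one E2 entity, and vice versa?
    e1_has_many = any(len(linked) > 1 for linked in right_of.values())
    e2_has_many = any(len(linked) > 1 for linked in left_of.values())

    if e1_has_many and e2_has_many:
        return "many-to-many"
    if e1_has_many:
        return "one-to-many"
    if e2_has_many:
        return "many-to-one"
    return "one-to-one"


# Hypothetical identifiers: s = scientist, i = invention,
# st = student, c = course, e = employee, p = project.
print(mapping_cardinality([("s1", "i1"), ("s1", "i2")]))
print(mapping_cardinality([("st1", "c1"), ("st2", "c1")]))
print(mapping_cardinality([("e1", "p1"), ("e1", "p2"), ("e2", "p1")]))
```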
Participation Constraints
• Total participation − Every entity in the entity set is involved in the relationship.
Total participation is represented by double lines.
• Partial participation − Not all entities in the entity set are involved in the
relationship. Partial participation is represented by single lines.
ER Design Issues
Users often misunderstand the concepts of the elements and the design process of the
ER diagram. This leads to a complex ER diagram structure and to issues that do not
match the characteristics of the real-world enterprise model.
Here, we will discuss the basic design issues of an ER database schema in the following
points:
For example, ID is used as a key in the Student table because it is unique for each
student. In the PERSON table, passport_number, license_number, SSN are keys since
they are unique for each person.
Types of keys:
1. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset
of a candidate key.
2. Candidate key
o A candidate key is a minimal attribute or set of attributes that can uniquely identify
a tuple.
o Apart from the primary key, the remaining attributes that uniquely identify a tuple are
also candidate keys. The candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The other
unique attributes, like SSN, Passport_Number, License_Number, etc., are considered
candidate keys.
3. Primary key
o It is the key used to identify one and only one instance of an entity uniquely. An entity
can contain multiple keys, as we saw in the PERSON table. The most suitable key
from that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In
the EMPLOYEE table, we could also select License_Number or Passport_Number as
the primary key since they are also unique.
o For each entity, the selection of the primary key depends on the requirements and on the
developers.
4. Foreign key
o A foreign key is a column of a table that points to the primary key of another
table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we can't store the department's information in
the employee table. That's why we link these two tables through the primary key of one
table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in
the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are
related.
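The EMPLOYEE and DEPARTMENT link described above can be sketched with Python's sqlite3 module. The column names other than Department_Id and the sample rows are hypothetical. SQLite enforces foreign keys only after "PRAGMA foreign_keys = ON", so the sketch enables it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE department ("
    " department_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department_id INTEGER NOT NULL REFERENCES department (department_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # department 10 exists


def try_insert_employee(conn, emp_id, name, department_id):
    """Insert an employee; return False if the foreign key is violated."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, department_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert_employee(conn, 2, "Ravi", 10)   # valid department
rejected = try_insert_employee(conn, 3, "Meena", 99)  # no department 99

# The foreign key is what lets the two tables be joined.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee AS e"
    " JOIN department AS d ON d.department_id = e.department_id"
    " ORDER BY e.id"
).fetchall()
print(accepted, rejected, joined)
```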
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely
identify each tuple in a relation. These attributes or combinations of the attributes are
called the candidate keys. One key is chosen as the primary key from these candidate
keys, and the remaining candidate keys, if any, are termed alternate keys. In other
words, the total number of alternate keys is the total number of candidate keys
minus one (the primary key). An alternate key may or may not exist. If there is only
one candidate key in a relation, it does not have an alternate key.
For example, the employee relation has two attributes, Employee_Id and PAN_No, that act
as candidate keys. In this relation, Employee_Id is chosen as the primary key, so the
other candidate key, PAN_No, acts as the alternate key.
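The relationship between super keys, candidate keys, the primary key, and alternate keys can be sketched in Python as follows. The sample rows and PAN values are hypothetical. One caveat applies: uniqueness in a few sample rows only suggests a key, because a real key must hold for every possible state of the relation.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, attributes):
    """Return the minimal attribute sets that are unique across rows."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A superset of a key already found is a super key but is
            # not minimal, so it is not a candidate key.
            if any(set(key) <= set(attrs) for key in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


EMPLOYEES = [
    {"Employee_Id": 1, "PAN_No": "P-001", "Name": "Asha"},
    {"Employee_Id": 2, "PAN_No": "P-002", "Name": "Ravi"},
    {"Employee_Id": 3, "PAN_No": "P-003", "Name": "Asha"},
]
ATTRIBUTES = ["Employee_Id", "PAN_No", "Name"]

keys = candidate_keys(EMPLOYEES, ATTRIBUTES)
primary_key = keys[0]      # the designer's choice; here simply the first one
alternate_keys = keys[1:]  # every other candidate key

print(keys)
print(primary_key, alternate_keys)
```

The pair (Employee_Id, Name) is a super key in this sample but not a candidate key, because Employee_Id alone is already unique.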
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite
key. This key is also known as Concatenated Key.
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These keys
are created when a primary key is large and complex and has no relationship with many
other relations. The data values of the artificial keys are usually numbered in serial
order.
For example, the primary key, which is composed of Emp_ID, Emp_role, and Proj_ID, is
large in the employee relation. So it would be better to add a new virtual attribute to
identify each tuple in the relation uniquely.
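A minimal sketch of this idea with Python's sqlite3 module follows. The table name assignment, the column assignment_id, and the sample rows are hypothetical. The artificial key is a serially numbered integer, while a UNIQUE constraint keeps the original combination of Emp_ID, Emp_role, and Proj_ID free of duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment ("
    " assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # artificial key
    " emp_id INTEGER NOT NULL,"
    " emp_role TEXT NOT NULL,"
    " proj_id INTEGER NOT NULL,"
    " UNIQUE (emp_id, emp_role, proj_id))"  # the composite key stays unique
)

rows = [(7, "developer", 100), (7, "tester", 101), (8, "developer", 100)]
ids = [
    conn.execute(
        "INSERT INTO assignment (emp_id, emp_role, proj_id) VALUES (?, ?, ?)", row
    ).lastrowid
    for row in rows
]
print(ids)  # serially assigned values of the artificial key

# Other tables can now refer to a row by one small integer.
found = conn.execute(
    "SELECT emp_id, emp_role, proj_id FROM assignment WHERE assignment_id = ?",
    (ids[1],),
).fetchone()
print(found)
```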
E-R Diagram: