DBMS Unit 1
A database is an organized collection of data that is stored and managed electronically. It allows
for efficient retrieval, updating, insertion, and deletion of data. Databases are typically structured
in tables that are related to one another, and they use database management systems (DBMS) to
ensure data integrity, security, and efficient querying.
Databases can store various types of data such as text, numbers, images, or even complex objects
like videos and documents. They are widely used in applications ranging from websites to
enterprise systems and can handle large amounts of information.
Characteristics of Database
The database approach includes key features that ensure data accuracy, consistency, integrity,
security, etc. Let us now understand some of the important characteristics of the Database
Approach separately.
The characteristics of the Database are as follows:
Data Organization and self-describing nature of Database system
The database approach stores data in a structured and organized manner. It uses the database
management system to create, design, and maintain data logically and consistently. The database
approach also has a self-describing nature: the database contains not only the data itself but also
a complete description of the database structure (such as table definitions, data types, and
constraints) in the form of metadata, which is stored in the DBMS catalog.
For example, the database of any e-commerce company contains data on its customers and
vendors in tables linked with foreign keys.
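A minimal sketch of the self-describing nature, assuming SQLite through Python's built-in sqlite3 module: the hypothetical customer, vendor, and orders tables below are described in SQLite's own catalog table, sqlite_master, and by its PRAGMA statements, so a program can discover the structure without any outside documentation. Other DBMSs expose the same idea through their own catalogs (for example, INFORMATION_SCHEMA views), but the exact queries differ.

```python
import sqlite3

# Hypothetical e-commerce tables; orders links customers and vendors with foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE vendor   (vendor_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    vendor_id   INTEGER REFERENCES vendor(vendor_id)
);
""")

def list_tables(conn):
    """Table names, read from SQLite's catalog (sqlite_master) rather than from user data."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]

def list_columns(conn, table):
    """(column name, declared type) pairs as recorded in the catalog."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    # The table name comes from our own code, never from user input.
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

def list_foreign_keys(conn, table):
    """Set of (column, referenced table, referenced column) triples."""
    # Each PRAGMA foreign_key_list row is (id, seq, table, from, to, on_update, on_delete, match).
    return {(row[3], row[2], row[4]) for row in conn.execute(f"PRAGMA foreign_key_list({table})")}

print(list_tables(conn))
print(list_columns(conn, "orders"))
print(list_foreign_keys(conn, "orders"))
```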
Integrity Of Data
Data integrity means the accuracy and consistency of data over its entire lifecycle. The database
approach enforces data integrity through methods such as integrity constraints, data validation,
data normalization, and referential integrity. These techniques keep data accurate, consistent,
and free from errors or duplicates. Data integrity helps prevent human entry errors, errors
introduced while data is transferred, and data that fails verification.
For example, the database of any e-commerce company maintains the accuracy and consistency
of data inserted in tables because of the property of Data Integrity.
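As a sketch of how a DBMS enforces integrity, assuming SQLite via Python's sqlite3 module and a hypothetical product table: the NOT NULL, UNIQUE, and CHECK constraints make the database itself reject missing, duplicate, and invalid values, and sqlite3 reports each rejection as sqlite3.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,          -- no missing or duplicate names
    price      REAL NOT NULL CHECK (price > 0) -- validation rule
)
""")

def try_insert(conn, name, price):
    """Insert a product; return True if accepted, False if an integrity rule rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO product (name, price) VALUES (?, ?)", (name, price))
        return True
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
        return False

try_insert(conn, "Keyboard", 25.0)   # accepted
try_insert(conn, "Keyboard", 30.0)   # duplicate name: rejected by UNIQUE
try_insert(conn, "Mouse", -5.0)      # negative price: rejected by CHECK
try_insert(conn, None, 10.0)         # missing name: rejected by NOT NULL
```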
Data Independence
Data independence is another vital characteristic of the database approach. Data independence
means that the database structure and design are independent of the application programs that use
it. Changes to the database structure, such as adding a column or changing how data is stored,
generally do not require changes to the application programs, and application programs can change
without requiring changes to the database. This property makes it easier to modify and update the
database without disrupting the application software.
For example, the database of an online food ordering company can be modified without affecting
the consumer side of the website because of its data independence.
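A small sketch of data independence, assuming SQLite and a hypothetical menu_item table: the application function names only the columns it uses, so adding a column and adding an index (a change to how data is stored) leave its results unchanged. Code that uses SELECT * could still be affected by an added column, which is why naming columns explicitly is the usual practice.

```python
import sqlite3

def get_menu(conn):
    """Hypothetical application code: it names only the columns it needs."""
    return conn.execute("SELECT name, price FROM menu_item ORDER BY name").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE menu_item (item_id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO menu_item (name, price) VALUES (?, ?)",
                 [("Pizza", 8.5), ("Burger", 5.0)])
before = get_menu(conn)

# Structural changes made on the database side:
conn.execute("ALTER TABLE menu_item ADD COLUMN is_vegetarian INTEGER DEFAULT 0")  # new column
conn.execute("CREATE INDEX idx_menu_price ON menu_item(price)")                   # new storage structure

after = get_menu(conn)
print(before == after)  # True: the application code did not have to change
```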
Data Security
Data security is a critical aspect of the database approach. A database management system
provides various security measures such as authentication, authorization, and encryption to
protect data from unauthorized access, tampering, or data breaches. This makes it easier to
identify the user and control who has access to data and what they can do with it, thus making
the data more secure.
The database approach also provides a good backup and data recovery feature. This is achieved
with techniques like database backups, replication, and transaction logs.
For example, customer data in the database of an e-commerce company is backed up regularly and
stored securely.
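A sketch of the backup and recovery idea, assuming Python's sqlite3 module (Connection.backup, available since Python 3.7) and a hypothetical customer table. Both databases are in memory here only to keep the example self-contained; a real backup would be written to separate storage.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO customer (name) VALUES ('Asha')")
source.commit()

backup_copy = sqlite3.connect(":memory:")  # in practice, a file on separate storage
source.backup(backup_copy)                 # copy every page of the live database

# Simulate losing data in the live database...
source.execute("DELETE FROM customer")
source.commit()

# ...and recover it by copying the backup back over the live database.
backup_copy.backup(source)
print(source.execute("SELECT name FROM customer").fetchall())
```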
Data Consistency with centralized data management
All the data in the database approach is stored in a single centralized location rather than in
separate files. This centralized data management contributes to data consistency because data is
stored in a standardized format and is updated and manipulated from a single location. A database
stored on a central computer system can be accessed by users over a network, such as the
internet. This makes coordinating data much simpler.
Sharing of Data
Data sharing is one of the essential aspects of the database approach. Here, data can be shared
among multiple users and applications simultaneously. It also reduces the need to store duplicate
data in various locations, which can lead to inconsistencies and errors.
Concurrency control and locking allow data to be shared safely among multiple users, which
enhances collaboration and efficiency. Locking prevents other users from accessing specific data
while it is being updated, while concurrency control coordinates access by many users using
strategies such as timestamp ordering and pessimistic and optimistic concurrency control.
For example, on an online food ordering website, multiple users can add food to their carts and
order food at the same time, all accessing the same database.
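Optimistic concurrency control, one of the strategies named above, can be sketched with a version column, assuming SQLite and a hypothetical dish table in a food ordering system: each user reads the row without locking it, and an update succeeds only if the version has not changed since that read. A pessimistic approach would instead lock the row before reading it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dish (dish_id INTEGER PRIMARY KEY, "
             "portions_left INTEGER NOT NULL, version INTEGER NOT NULL)")
conn.execute("INSERT INTO dish VALUES (1, 1, 0)")  # one portion left, version 0
conn.commit()

def read_dish(conn, dish_id):
    """Return (portions_left, version) without taking any lock."""
    return conn.execute("SELECT portions_left, version FROM dish WHERE dish_id = ?",
                        (dish_id,)).fetchone()

def order_dish(conn, dish_id, seen_portions, seen_version):
    """Optimistic write: succeeds only if the row still has the version we read."""
    if seen_portions < 1:
        return False
    with conn:
        cur = conn.execute(
            "UPDATE dish SET portions_left = ?, version = version + 1 "
            "WHERE dish_id = ? AND version = ?",
            (seen_portions - 1, dish_id, seen_version),
        )
    return cur.rowcount == 1  # 0 rows updated: someone else changed the row first

# Two customers read the same state before either one writes.
portions_a, version_a = read_dish(conn, 1)
portions_b, version_b = read_dish(conn, 1)

print(order_dish(conn, 1, portions_a, version_a))  # True: first writer succeeds
print(order_dish(conn, 1, portions_b, version_b))  # False: stale version, must re-read
```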
Query and Report Flexibility
The database approach also offers users query and report flexibility, allowing them to fetch and
analyze data efficiently. Queries can be used to perform CRUD operations (creating, reading,
updating, and deleting data) or to filter data according to the requirement.
Report flexibility means the database approach offers tools to generate customized, interactive
reports in formats like graphs, tables, and charts that make data easier to analyze. Users can
also define their own dimensions and metrics for the data.
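A sketch of CRUD operations and a simple report, assuming SQLite and a hypothetical orders table: city acts as the report's dimension, and the order count and total amount act as its metrics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT, amount REAL)")

# Create
conn.executemany("INSERT INTO orders (city, amount) VALUES (?, ?)",
                 [("Pune", 250.0), ("Delhi", 400.0), ("Pune", 150.0), ("Delhi", 100.0)])
# Update
conn.execute("UPDATE orders SET amount = 300.0 WHERE order_id = 1")
# Delete
conn.execute("DELETE FROM orders WHERE amount < 120")
conn.commit()

# Read, filtered to larger orders
big_orders = conn.execute(
    "SELECT order_id, city, amount FROM orders WHERE amount >= 200 ORDER BY order_id"
).fetchall()

# Report: one dimension (city) and two metrics (order count, total amount)
report = conn.execute("""
    SELECT city, COUNT(*) AS order_count, SUM(amount) AS total
    FROM orders
    GROUP BY city
    ORDER BY city
""").fetchall()

print(big_orders)
print(report)
```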
Database Schema:
The schema is the blueprint or the structure of the database. It defines how data is organized,
how relationships between data are managed, and the types of data (tables, fields, relationships,
constraints, etc.) that will be stored in the database.
It is static and typically doesn't change frequently. The schema describes the structure of the
entire database, including its tables, fields, relationships, and constraints.
Database Instance:
The instance refers to the actual data stored in the database at a specific point in time. It
represents the state of the database at any moment.
It is dynamic, meaning the instance can change frequently as data is added, updated, or deleted.
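The difference between schema and instance can be observed directly, assuming SQLite and a hypothetical student table: the table definition stored in the catalog stays the same while inserts change the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

def schema(conn):
    """The schema: table definitions stored in the catalog."""
    return conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()

def instance(conn):
    """The instance: the rows stored right now."""
    return conn.execute("SELECT roll_no, name FROM student ORDER BY roll_no").fetchall()

schema_t0, instance_t0 = schema(conn), instance(conn)
conn.execute("INSERT INTO student VALUES (1, 'Ravi')")
conn.execute("INSERT INTO student VALUES (2, 'Meera')")
schema_t1, instance_t1 = schema(conn), instance(conn)

print(schema_t0 == schema_t1)     # True: the structure did not change
print(instance_t0, instance_t1)   # the instance went from no rows to two rows
```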
Components of a DBMS
Query Optimizer: It chooses an efficient evaluation plan (the order and method of operations) for
the instructions generated by the DML Compiler.
Authorization Manager: It ensures role-based access control, i.e., it checks whether a particular
user is authorized to perform the requested operation.
Integrity Manager: It checks the integrity constraints when the database is modified.
Transaction Manager: It controls concurrent access by scheduling the operations of the
transactions it receives. Thus, it ensures that the database remains in a consistent state before
and after the execution of a transaction.
File Manager: It manages the file space and the data structure used to represent information in
the database.
Buffer Manager: It manages the buffer (cache) in main memory and is responsible for transferring
data between secondary storage and main memory.
Data Dictionary: It contains information about the structure of every database object. It is the
repository of metadata.
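The Authorization Manager's role-based check can be imitated with the set_authorizer hook of Python's sqlite3 module. This is only an illustration: SQLite has no user accounts or GRANT statements, so the roles and the CAN_DELETE table below are hypothetical, and a server DBMS would enforce this with its own privilege system instead.

```python
import sqlite3

# Hypothetical roles: which roles may delete rows.
CAN_DELETE = {"admin": True, "clerk": False}

def make_authorizer(role):
    """Build a callback that SQLite calls while compiling each statement."""
    def authorizer(action, arg1, arg2, db_name, trigger_or_view):
        if action == sqlite3.SQLITE_DELETE and not CAN_DELETE[role]:
            return sqlite3.SQLITE_DENY  # the statement fails to compile
        return sqlite3.SQLITE_OK
    return authorizer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (101, 5000.0)")
conn.commit()

def run_as(role, sql):
    """Run `sql` with the permissions of `role`; report whether it was allowed."""
    conn.set_authorizer(make_authorizer(role))
    try:
        conn.execute(sql)
        conn.commit()
        return "allowed"
    except sqlite3.DatabaseError:
        return "denied"

print(run_as("clerk", "SELECT balance FROM account"))             # allowed
print(run_as("clerk", "DELETE FROM account WHERE acc_no = 101"))  # denied
print(run_as("admin", "DELETE FROM account WHERE acc_no = 101"))  # allowed
```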
Introduction of ER Model
Peter Chen developed the ER diagram in 1976. The ER model was created to provide a simple
and understandable model for representing the structure and logic of databases. It has since
evolved into variations such as the Enhanced ER Model and the Object Relationship Model.
The Entity Relationship Model is a model for identifying the entities to be represented in the
database and representing how those entities are related. The ER data model specifies an
enterprise schema that graphically represents the overall logical structure of a database.
The Entity Relationship Diagram explains the relationship among the entities present in the
database. ER models are used to model real-world objects like a person, a car, or a company and
the relation between these real-world objects. In short, the ER Diagram is the structural format
of the database.
Why Use ER Diagrams In DBMS?
ER diagrams represent the E-R model of a database, which makes them easy to convert
into relations (tables).
ER diagrams model real-world objects, which makes them especially useful.
ER diagrams require no specialized technical knowledge and no hardware support.
These diagrams are very easy to understand and easy to create, even for a naive user.
They give a standard way to visualize data logically.
Symbols Used in ER Model
The ER Model, which models the logical view of the system from a data perspective, uses these
symbols:
Rectangles: Rectangles represent Entities in the ER Model.
Ellipses: Ellipses represent Attributes in the ER Model.
Diamond: Diamonds represent Relationships among Entities.
Lines: Lines connect attributes to entities and connect entity sets to relationship types.
Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
Double Rectangle: Double Rectangle represents a Weak Entity.
Components of ER Diagram
ER Model consists of Entities, Attributes, and Relationships among Entities in a Database
System.
What is Entity?
An Entity may be an object with a physical existence – a particular person, car, house, or
employee – or it may be an object with a conceptual existence – a company, a job, or a university
course.
What is Entity Set?
An entity is an object of an entity type, and the set of all entities of that type is called an
entity set. For example, E1 is an entity of entity type Student, and the set of all students is
called an entity set. In an ER diagram, an entity type is represented by a rectangle.
We can represent an entity set in an ER diagram but cannot represent an individual entity, because
an entity corresponds to a row in a relation, while an ER diagram is a graphical representation of
the structure of the data.
Types of Entity
There are two types of entity:
1. Strong Entity
A Strong Entity is a type of entity that has a key attribute. A Strong Entity does not depend on
any other entity in the schema. It has a primary key that identifies it uniquely, and it is
represented by a rectangle. These are called Strong Entity Types.
2. Weak Entity
An entity type normally has a key attribute that uniquely identifies each entity in the entity
set. However, for some entity types a key attribute cannot be defined. These are called Weak Entity
types.
For example, a company may store information about the dependents (parents, children, spouse)
of an employee. The dependents cannot exist without the employee, so Dependent is a Weak Entity
Type and Employee is the Identifying Entity Type for Dependent; Employee itself is a Strong
Entity Type.
A weak entity type is represented by a Double Rectangle. The participation of weak entity types
is always total. The relationship between the weak entity type and its identifying strong entity
type is called identifying relationship and it is represented by a double diamond.
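A common way to turn the Employee/Dependent example into tables, sketched here with SQLite and hypothetical column names: the weak entity's primary key combines the owner's key with the dependent's own partial key, and ON DELETE CASCADE reflects that a dependent cannot exist without its employee. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE employee (               -- strong entity: has its own key
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE dependent (              -- weak entity: identified through its employee
    emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
    dep_name TEXT NOT NULL,           -- partial key (discriminator)
    relation TEXT,
    PRIMARY KEY (emp_id, dep_name)
);
""")
conn.execute("INSERT INTO employee VALUES (1, 'Kiran')")
conn.executemany("INSERT INTO dependent VALUES (?, ?, ?)",
                 [(1, "Anu", "Child"), (1, "Latha", "Spouse")])

# A dependent cannot exist without an employee...
try:
    conn.execute("INSERT INTO dependent VALUES (99, 'Ghost', 'Child')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# ...and deleting the employee removes the dependents too.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
print(conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0])
```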
What is Attributes?
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB,
Age, Address, and Mobile_No are the attributes that define entity type Student. In ER diagram,
the attribute is represented by an oval.
Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute.
For example, Roll_No will be unique for each student. In an ER diagram, the key attribute is
represented by an oval with the attribute name underlined.
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For example,
the Address attribute of the Student entity type consists of Street, City, State, and Country. In
an ER diagram, a composite attribute is represented by an oval connected to the ovals of its
component attributes.
3. Multivalued Attribute
An attribute that can have more than one value for a given entity is called a multivalued
attribute. For example, Phone_No, since a student can have more than one phone number. In an ER
diagram, a multivalued attribute is represented by a double oval.
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. For example, Age can be derived from DOB. In an ER diagram, a derived attribute is
represented by a dashed oval.
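In an ER diagram, the complete Student entity type is drawn as a rectangle connected to the ovals
of its attributes: Roll_No (key attribute, underlined), Name, DOB, Address (composite attribute
with Street, City, State, and Country), Phone_No (multivalued attribute, double oval), and Age
(derived attribute, dashed oval).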
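The attribute types above can be mirrored in code. This sketch uses Python dataclasses with hypothetical field names: Address is a composite attribute, phone_nos holds a multivalued attribute, and age is derived on demand from dob rather than stored. In a relational schema, a multivalued attribute such as Phone_No would normally get its own table.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Address:                  # composite attribute: built from smaller attributes
    street: str
    city: str
    state: str
    country: str

@dataclass
class Student:
    roll_no: int                # key attribute: unique for each student
    name: str
    dob: date
    address: Address
    phone_nos: List[str] = field(default_factory=list)  # multivalued attribute

    def age_on(self, on_date: date) -> int:
        """Derived attribute: Age is computed from DOB, not stored."""
        had_birthday = (on_date.month, on_date.day) >= (self.dob.month, self.dob.day)
        return on_date.year - self.dob.year - (0 if had_birthday else 1)

s = Student(1, "Ravi", date(2004, 8, 15),
            Address("MG Road", "Pune", "Maharashtra", "India"),
            ["555-0101", "555-0102"])
print(s.age_on(date(2024, 8, 14)))  # 19: birthday not yet reached
print(s.age_on(date(2024, 8, 15)))  # 20
```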
Entity-Relationship Set
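A set of relationships of the same type is known as a relationship set. For example, an 'Enrolled
in' relationship set may contain the relationships S1 enrolled in C2, S2 enrolled in C1, and S3
enrolled in C3.
Degree of a Relationship Set
1. Unary Relationship: When there is only ONE entity set participating in a relationship, the
relationship is called a unary relationship. For example, one Person is married to another Person.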
2. Binary Relationship: When there are TWO entity sets participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
3. Ternary Relationship: When there are three entity sets participating in a relationship, the
relationship is called a ternary relationship.
4. N-ary Relationship: When there are n entity sets participating in a relationship, the
relationship is called an n-ary relationship.
What is Cardinality?
The number of times an entity of an entity set participates in a relationship set is known
as cardinality. Cardinality can be of different types:
1. One-to-One: When each entity in each entity set can take part only once in the relationship,
the cardinality is one-to-one. Let us assume that a male can marry one female and a female can
marry one male. So the relationship will be one-to-one.
The minimum number of tables needed for this relationship is 2, since the relationship can be
stored as a foreign key in either entity's table.
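2. One-to-Many: When one entity in an entity set can be related to more than one entity in another
entity set, but each entity in that other set is related to at most one entity in the first, the
cardinality is one-to-many. Let us assume that one surgery department can accommodate many
doctors. So the cardinality will be 1 to M: one department has many doctors.
The minimum number of tables needed is 2, since the relationship can be stored as a foreign key in
the table on the "many" side (Doctor); storing the relationship in a separate table would use 3.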
3. Many-to-One: When entities in one entity set can take part only once in the relationship set
and entities in the other entity set can take part more than once, the cardinality is
many-to-one. Let us assume that a student can take only one course but one course can be taken by
many students. So the cardinality will be n to 1: for one course there can be n students, but each
student takes only one course.
The minimum number of tables needed is 2, since the relationship can be stored as a foreign key
(the course) in the Student table; storing the relationship in a separate table would use 3.
4. Many-to-Many: When entities in all entity sets can take part more than once in the
relationship, the cardinality is many-to-many. Let us assume that a student can take more than one
course and one course can be taken by many students. So the relationship will be many-to-many.
The minimum number of tables needed is 3: one for each entity set and one for the relationship.
For example, student S1 is enrolled in C1 and C3, and course C3 has S1, S3, and S4 enrolled. So it
is a many-to-many relationship.
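The three-table design for a many-to-many relationship can be sketched with SQLite, using the enrollments from the example above (table and column names are assumptions): the enrolled_in table holds one row per (student, course) pair, and its composite primary key prevents the same enrollment from being recorded twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY);
CREATE TABLE course  (course_id  TEXT PRIMARY KEY);
-- Third table for the relationship itself: one row per (student, course) pair.
-- (SQLite only enforces these REFERENCES after PRAGMA foreign_keys = ON.)
CREATE TABLE enrolled_in (
    student_id TEXT REFERENCES student(student_id),
    course_id  TEXT REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")
conn.executemany("INSERT INTO student VALUES (?)", [("S1",), ("S3",), ("S4",)])
conn.executemany("INSERT INTO course VALUES (?)", [("C1",), ("C3",)])
conn.executemany("INSERT INTO enrolled_in VALUES (?, ?)",
                 [("S1", "C1"), ("S1", "C3"), ("S3", "C3"), ("S4", "C3")])

def courses_of(student_id):
    """Many courses per student."""
    rows = conn.execute("SELECT course_id FROM enrolled_in "
                        "WHERE student_id = ? ORDER BY course_id", (student_id,))
    return [c for (c,) in rows]

def students_in(course_id):
    """Many students per course."""
    rows = conn.execute("SELECT student_id FROM enrolled_in "
                        "WHERE course_id = ? ORDER BY student_id", (course_id,))
    return [s for (s,) in rows]

print(courses_of("S1"))   # ['C1', 'C3']
print(students_in("C3"))  # ['S1', 'S3', 'S4']
```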
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If each
student must enroll in a course, the participation of students will be total. Total participation is
shown by a double line in the ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT participate in the
relationship. If some courses have no students enrolled, the participation of courses will be
partial.
For example, in the 'Enrolled in' relationship set, the Student entity set has total participation
and the Course entity set has partial participation: every student in the Student entity set
participates in the relationship, but course C4 does not take part in it.
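A relational schema usually cannot express total participation with a simple column constraint, so it is often checked with a query (or enforced with triggers). This sketch, assuming SQLite and hypothetical enrollment data consistent with the description above, lists the entities that take part in no 'Enrolled in' relationship: none for Student (total participation) and C4 for Course (partial participation).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY);
CREATE TABLE course  (course_id  TEXT PRIMARY KEY);
CREATE TABLE enrolled_in (student_id TEXT, course_id TEXT,
                          PRIMARY KEY (student_id, course_id));
INSERT INTO student VALUES ('S1'), ('S2'), ('S3');
INSERT INTO course  VALUES ('C1'), ('C2'), ('C3'), ('C4');
INSERT INTO enrolled_in VALUES ('S1', 'C1'), ('S2', 'C2'), ('S3', 'C3');
""")

def non_participants(entity_table, key):
    """Entities of `entity_table` that appear in no enrolled_in row."""
    # Identifiers come from our own code, not from user input.
    sql = (f"SELECT e.{key} FROM {entity_table} AS e "
           f"LEFT JOIN enrolled_in AS r ON r.{key} = e.{key} "
           f"WHERE r.{key} IS NULL ORDER BY e.{key}")
    return [k for (k,) in conn.execute(sql)]

print(non_participants("student", "student_id"))  # []: total participation
print(non_participants("course", "course_id"))    # ['C4']: partial participation
```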
How to Draw ER Diagram?
The very first step is to identify all the entities, place each one in a rectangle, and
label it accordingly.
The next step is to identify the relationships between them and place them
using diamonds, making sure that relationships are not
connected directly to each other.
Attach attributes to the entities properly.
Remove redundant entities and relationships.
Add proper colors to highlight the data present in the database.
The Enhanced ER (EER) model extends the basic ER model with the following features:
1. Generalization
2. Specialization
3. Aggregation
1. Generalization
Generalization is the process of extracting common properties from a set of entities and
creating a generalized entity from them.
Generalization is a "bottom-up approach" in which two or more entities can be combined
to form a higher-level entity if they have some attributes in common.
Subclasses are combined to make a superclass.
Generalization is used to emphasize the similarities among lower-level entity sets and to
hide their differences in the schema.
Example:
Suppose we have three sub-entities: Car, Bus, and Motorcycle. These three entities can be
generalized into one higher-level entity (or superclass) named Vehicle.
2. Specialization
Specialization is opposite of Generalization.
In Specialization, an entity is broken down into sub-entities based on their
characteristics.
Specialization is a "Top-down approach" where higher level entity is specialized into
two or more lower level entities.
Specialization is used to identify the subset of an entity set that shares some
distinguishing characteristics.
Specialization can be repeatedly applied to refine the design of schema.
Normally, the superclass is defined first, the subclasses and their related attributes are
defined next, and relationship sets are then added.
Specialization is depicted by a triangle component labeled ISA.
Example:
Vehicle entity can be a Car, Bus or Motorcycle.
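Generalization and specialization resemble class inheritance, so the Vehicle example can be sketched with Python dataclasses as an analogy (attribute names are hypothetical). The superclass holds the common attributes, each subclass adds its own, and the ISA relationship becomes subclassing. Relational designs map ISA hierarchies differently, for example with one table per subclass.

```python
from dataclasses import dataclass

@dataclass
class Vehicle:              # higher-level entity produced by generalization
    reg_no: str             # attributes shared by every vehicle
    wheels: int

@dataclass
class Car(Vehicle):         # Car ISA Vehicle
    seats: int

@dataclass
class Bus(Vehicle):         # Bus ISA Vehicle
    route_no: str

@dataclass
class Motorcycle(Vehicle):  # Motorcycle ISA Vehicle
    has_sidecar: bool

fleet = [Car("V1", 4, 5), Bus("V2", 6, "R42"), Motorcycle("V3", 2, False)]

# Generalization: treat every vehicle uniformly through the superclass attributes.
total_wheels = sum(v.wheels for v in fleet)
# Specialization: pick out the subset with a distinguishing characteristic.
buses = [v.reg_no for v in fleet if isinstance(v, Bus)]
print(total_wheels, buses)  # 12 ['V2']
```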
3. Aggregation
Aggregation is used when we need to express a relationship among relationships.
Aggregation is an abstraction through which relationships are treated as higher level
entities.
Aggregation is a process in which a relationship between two entities is treated as a
single entity, and this single entity in turn has a relationship with another entity.
Note: Basic E-R model can't represent relationships involving other relationships.
Types of Keys
1. Candidate Key
2. Primary Key
3. Foreign Key
4. Super Key
5. Alternate Key
6. Composite Key
7. Unique Key
1. Primary Key
The primary key refers to a column or a set of columns of a table that identifies each
record in that table uniquely. A table can have only one primary key. No two rows can have
the same primary key value: all the values of a primary key have to be different, with no
repetitions. A primary key also cannot contain NULL values.
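A quick check of the primary key rules, assuming SQLite and a hypothetical student table: a second row with the same key and a row with a NULL key are both rejected. NOT NULL is written out because SQLite, unlike standard SQL, does not imply it for most PRIMARY KEY columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is explicit: SQLite does not imply it for non-INTEGER primary keys.
conn.execute("CREATE TABLE student (roll_no TEXT NOT NULL PRIMARY KEY, name TEXT)")

def insert_student(roll_no, name):
    """Try to insert a student; report whether the primary key rules allowed it."""
    try:
        with conn:
            conn.execute("INSERT INTO student VALUES (?, ?)", (roll_no, name))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

print(insert_student("R1", "Ravi"))   # ok
print(insert_student("R1", "Meera"))  # rejected: duplicate primary key
print(insert_student(None, "Arjun"))  # rejected: NULL primary key
```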
2. Super Key
A super key is any set of one or more columns whose values uniquely identify each row
in a table. In other words, every set of columns that can uniquely identify the rows of
the table acts as a super key.
A super key is a superset of a candidate key (candidate keys are explained below): every
super key contains at least one candidate key, and a candidate key is a minimal super key.
The primary key of a table is chosen from among these keys so that it serves as the
table's identity attribute.
3. Candidate Key
The candidate keys are the minimal sets of attributes that identify rows uniquely in a
table: no attribute can be removed from a candidate key without losing uniqueness. In a
table, we select the primary key from the candidate keys. Thus, a candidate key has
similar properties to the primary key explained above. A table can have multiple
candidate keys.
4. Alternate Key
As stated above, a table can have multiple candidate keys to choose from for the
primary key, but only one can be chosen. All the candidate keys that did not
become the primary key are known as alternate keys.
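The definitions of super, candidate, and alternate keys can be tested mechanically on a small hypothetical relation. One caution: keys really come from the rules of the domain, not from whatever rows happen to be present, so a sample like this can only rule a set of attributes out; it cannot prove that a set will stay unique. Picking the first candidate key as the primary key below is an arbitrary choice for illustration.

```python
from itertools import combinations

ATTRS = ("roll_no", "name", "email", "dept")
ROWS = [
    {"roll_no": 1, "name": "Ravi",  "email": "ravi1@example.edu", "dept": "CS"},
    {"roll_no": 2, "name": "Meera", "email": "meera@example.edu", "dept": "CS"},
    {"roll_no": 3, "name": "Ravi",  "email": "ravi3@example.edu", "dept": "CS"},
]

def is_super_key(attrs, rows):
    """True if no two rows share the same values for all of `attrs`."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)

def super_keys(attrs, rows):
    """Every non-empty attribute subset that identifies the rows uniquely."""
    return [set(combo)
            for size in range(1, len(attrs) + 1)
            for combo in combinations(attrs, size)
            if is_super_key(combo, rows)]

def candidate_keys(attrs, rows):
    """Minimal super keys: no proper subset is also a super key."""
    supers = super_keys(attrs, rows)
    return [key for key in supers if not any(other < key for other in supers)]

candidates = candidate_keys(ATTRS, ROWS)
primary_key = candidates[0]      # arbitrary choice for illustration
alternate_keys = candidates[1:]  # candidate keys not chosen as primary key
print(len(super_keys(ATTRS, ROWS)), candidates, primary_key, alternate_keys)
```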
5. Foreign Key
We use a foreign key to establish a relationship between two tables.
A foreign key requires every non-NULL value in a column (or set of columns)
to match a primary key value in the referenced table. A foreign key helps us maintain
data integrity as well as referential integrity.
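A sketch of referential integrity, assuming SQLite and hypothetical department and employee tables: with foreign key enforcement switched on (it is off by default in SQLite), a reference to a missing department and the deletion of a department that is still referenced are both rejected, while NULL is allowed to mean "no department".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE department (dept_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id TEXT REFERENCES department(dept_id)   -- foreign key
);
INSERT INTO department VALUES ('D1', 'Sales');
INSERT INTO employee VALUES (1, 'D1');
""")

def attempt(sql):
    """Run one statement; report whether referential integrity allowed it."""
    try:
        with conn:
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

print(attempt("INSERT INTO employee VALUES (2, 'D9')"))        # rejected: no department D9
print(attempt("DELETE FROM department WHERE dept_id = 'D1'"))  # rejected: employee 1 refers to D1
print(attempt("INSERT INTO employee VALUES (3, NULL)"))        # ok: NULL means no department
```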
6. Composite Key
The composite key refers to a set of multiple attributes that together uniquely
identify every tuple in a table. The attributes in the set may not be unique when
considered separately, but taken together they guarantee uniqueness.
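A composite key sketch, assuming SQLite and a hypothetical marks table keyed by (roll_no, subject): roll_no and subject may each repeat on their own, but the pair must be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE marks (
    roll_no TEXT NOT NULL,
    subject TEXT NOT NULL,
    score   INTEGER,
    PRIMARY KEY (roll_no, subject)   -- composite key
)
""")

def add_mark(roll_no, subject, score):
    """Insert a mark; report whether the composite key allowed it."""
    try:
        with conn:
            conn.execute("INSERT INTO marks VALUES (?, ?, ?)", (roll_no, subject, score))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

print(add_mark("R1", "DBMS", 82))  # ok
print(add_mark("R1", "OS", 75))    # ok: same roll_no, different subject
print(add_mark("R2", "DBMS", 90))  # ok: same subject, different roll_no
print(add_mark("R1", "DBMS", 60))  # rejected: the pair (R1, DBMS) already exists
```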