DBMS
What is Information?
Information is processed, organized, and structured data.
It provides context for the data and enables decision making.
It is processed data that makes sense to us.
Information is extracted from data by analyzing and interpreting pieces of data.
E.g., you have data on all the people living in your locality; that is data. When you analyze and
interpret the data, you come to conclusions such as:
i. There are 100 senior citizens.
ii. The sex ratio is 1.1.
iii. Newborn babies are 100.
These are information.
Data vs Information
a) Data is a collection of facts, while information puts those facts into context.
b) While data is raw and unorganized, information is organized.
c) Data points are individual and sometimes unrelated. Information maps out that data to
provide a big-picture view of how it all fits together.
d) Data, on its own, is meaningless. When it’s analyzed and interpreted, it becomes
meaningful information.
e) Data does not depend on information; however, information depends on data.
f) Data typically comes in the form of graphs, numbers, figures, or statistics. Information is
typically presented through words, language, thoughts, and ideas.
g) Data isn’t sufficient for decision-making, but you can make decisions based on
information.
What is a Database?
A database is an electronic place/system where data is stored in a way that it can be easily
accessed, managed, and updated.
To make real use of data, we need a database management system (DBMS).
What is the DBMS?
A DBMS is a set of programs that acts as an interface between the user and the database
(a collection of interrelated data). It lets the user interact with the database, that is,
access, delete, or modify the data in it.
The primary goal of a DBMS is to provide a way to store and retrieve data efficiently.
1. Data Redundancy and Inconsistency
Problem: When a student changes their address, the update must be applied to every file. If
the address is updated in `students.txt` but not in `courses.txt` or `grades.txt`, the data
becomes inconsistent. This redundancy increases the storage requirement and leads to
inconsistencies.
DBMS Solution: A DBMS stores the student's information in a single table `Students` and
references this table in other tables like `Courses` and `Grades`, eliminating redundancy and
ensuring consistency.
2. Difficulty in Accessing Data
To find all students enrolled in a specific course and their grades, you need to combine
information from `students.txt`, `courses.txt`, and `grades.txt`.
Problem: In a file system, this requires writing a complex program to parse and integrate data
from multiple files.
DBMS Solution: In a DBMS, a single SQL query that joins the related tables retrieves this
information.
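The kind of query described here can be sketched with Python's built-in sqlite3 module and an in-memory database. The table names, column names, and sample rows below are assumptions made for illustration; the notes do not define a schema.

```python
import sqlite3

# In-memory database; the schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Courses  (course_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Grades (
        student_id INTEGER REFERENCES Students(student_id),
        course_id  INTEGER REFERENCES Courses(course_id),
        grade      INTEGER
    );
    INSERT INTO Students VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO Courses  VALUES (10, 'DBMS'), (20, 'Networks');
    INSERT INTO Grades   VALUES (1, 10, 88), (2, 10, 75), (1, 20, 91);
""")

# One declarative query replaces the custom file-parsing program.
enrolled = conn.execute("""
    SELECT s.name, g.grade
    FROM Students AS s
    JOIN Grades   AS g ON g.student_id = s.student_id
    JOIN Courses  AS c ON c.course_id  = g.course_id
    WHERE c.title = ?
    ORDER BY s.name
""", ("DBMS",)).fetchall()

print(enrolled)
```

The join conditions carry the work that a file-based program would do by hand when matching records across `students.txt`, `courses.txt`, and `grades.txt`.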
3. Data Isolation
The university wants to generate a report of all courses a student has taken along with their
grades.
Problem: In a file system, student data, course enrollments, and grades are in separate files,
making it difficult to integrate and retrieve combined information.
DBMS Solution: A DBMS allows data to be stored in related tables, making it easy to join these
tables and retrieve integrated information.
4. Integrity Problems
Scenario: Each student must have a unique ID, and grades must be within a specific range (0-100).
Problem: A file system cannot enforce these constraints, leading to potential integrity issues
like duplicate IDs or invalid grades.
DBMS Solution: A DBMS enforces integrity constraints such as primary keys for unique IDs and
check constraints for valid grade ranges.
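A minimal sketch of both constraints, using Python's sqlite3 module. The `Grades` table layout and the helper `try_insert` are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Grades (
        student_id INTEGER PRIMARY KEY,                  -- unique student ID
        grade      INTEGER CHECK (grade BETWEEN 0 AND 100)
    )
""")
conn.execute("INSERT INTO Grades VALUES (1, 85)")

def try_insert(student_id, grade):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Grades VALUES (?, ?)", (student_id, grade))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_id_ok = try_insert(1, 90)    # violates the PRIMARY KEY constraint
invalid_grade_ok = try_insert(2, 150)  # violates the CHECK constraint
valid_ok = try_insert(3, 70)           # satisfies both constraints
```

The database rejects the bad rows itself, so no application code has to re-implement the rules.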
5. Atomicity Problems
Scenario: A transaction involves registering a student for a course and updating their total
course count.
Problem: In a file system, if the system crashes after updating the course file but before
updating the student file, the data will be inconsistent.
DBMS Solution: A DBMS ensures atomicity of transactions, meaning both updates (course
registration and course count) will be completed together or not at all.
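The all-or-nothing behaviour can be sketched with sqlite3, where using the connection as a context manager commits on success and rolls back if an exception escapes. The table names and the `fail_midway` flag, which simulates a crash between the two updates, are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (student_id INTEGER PRIMARY KEY, course_count INTEGER);
    CREATE TABLE Registrations (student_id INTEGER, course_id INTEGER);
    INSERT INTO Students VALUES (1, 0);
""")
conn.commit()

def register(student_id, course_id, fail_midway=False):
    """Register a student and bump the course count as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO Registrations VALUES (?, ?)",
                         (student_id, course_id))
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute("UPDATE Students SET course_count = course_count + 1 "
                         "WHERE student_id = ?", (student_id,))
        return True
    except RuntimeError:
        return False

register(1, 10, fail_midway=True)   # rolled back: neither change persists
register(1, 20)                     # both changes persist together

registrations = conn.execute("SELECT COUNT(*) FROM Registrations").fetchone()[0]
course_count = conn.execute("SELECT course_count FROM Students").fetchone()[0]
```

After the simulated failure, the registration row inserted before the crash is gone, so the two tables never disagree.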
6. Concurrent-Access Anomalies
Scenario: Two administrators simultaneously update a student's course registration and
personal details.
Problem: In a file system, concurrent updates can lead to data loss or corruption if not
properly synchronized.
DBMS Solution: A DBMS handles concurrent access using locking mechanisms to prevent
conflicts, ensuring data consistency.
7. Security Problems
Scenario: Sensitive data like student grades and personal information need to be protected.
Problem: A file system offers limited security controls, making it difficult to restrict access to
sensitive data based on user roles.
DBMS Solution: A DBMS provides fine-grained access control, allowing the definition of user
roles and permissions. For example, only authorized personnel can access or modify student
grades.
If we already have a file system, why do we use a DBMS instead of the file system to store
and maintain data?
DBMS vs File Systems
a. File-processing systems have major disadvantages:
1. Data Redundancy and inconsistency
2. Difficulty in accessing data
3. Data isolation
4. Integrity problems
5. Atomicity problems
6. Concurrent-access anomalies
7. Security problems
b. The seven points above are also the advantages of a DBMS (the answer to "Why use a DBMS?").
6. Concurrent-Access Anomalies:
File-processing systems may not handle concurrent access to data properly, leading to
anomalies such as lost updates, uncommitted data, or inconsistent retrievals.
Real-life Example: In a shared document management system using file-based storage, if
multiple users attempt to modify the same document simultaneously, their changes may
overwrite each other, leading to lost updates or corrupted files.
7. Security Problems:
File-processing systems may lack robust security features, making it challenging to control
access to sensitive data and protect against unauthorized access or modifications.
Real-life Example: In a healthcare system using file-based storage for patient records, if
files are not adequately secured, unauthorized personnel may gain access to sensitive
medical information, compromising patient privacy and confidentiality.
By employing the Three Schema Architecture, the Student Management System achieves
modularity, flexibility, and data abstraction, enabling efficient management of student-related
information while providing customized views for different stakeholders within the educational
institution.
Instances and Schemas
The collection of information stored in the DB at a particular moment is called an
instance of DB.
The overall design of the DB is called the DB schema.
A schema is a structural description of data. The schema doesn’t change frequently; the data
may change frequently.
The DB schema corresponds to the variable declarations (along with types) in a program.
There are 3 types of schemas: physical, logical, and several view schemas called
subschemas.
The logical schema is the most important in terms of its effect on application programs, as
programmers construct apps by using the logical schema.
Physical data independence: a change in the physical schema should not affect the logical
schema or application programs.
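The difference between schema and instance can be illustrated with a small sqlite3 sketch. The `Student` table and the helper names `schema` and `instance` are assumptions for this example; `sqlite_master` is SQLite's own catalogue of table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT)")

def schema():
    """The stored table definitions (the schema)."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()

def instance():
    """The rows present right now (the instance)."""
    return conn.execute("SELECT * FROM Student ORDER BY roll_no").fetchall()

schema_before = schema()
instance_before = instance()

conn.execute("INSERT INTO Student VALUES (1, 'Asha')")
conn.execute("INSERT INTO Student VALUES (2, 'Ravi')")

schema_after = schema()      # unchanged by the inserts
instance_after = instance()  # changed by the inserts
```

Inserting rows changes the instance while the schema stays the same, much as assigning to a variable changes its value but not its declaration.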
Data Models:
A data model provides a way to describe the design of a DB at the logical level.
Underlying the structure of the DB is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints.
E.g., ER model, relational model, object-oriented model, object-relational data model, etc.
Database Languages:
a. Data definition language (DDL) to specify the database schema.
b. Data manipulation language (DML) to express database queries and updates.
c. In practice, both sets of features are present in a single DB language, e.g., SQL.
d. DDL: We also specify consistency constraints, which must be checked every time the DB is
updated.
e. DML: Data manipulation involves
o Retrieval of information stored in the DB.
o Insertion of new information into the DB.
o Deletion of information from the DB.
o Updating of existing information stored in the DB.
A query language is the part of the DML used to specify statements requesting the retrieval of
information.
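As a rough illustration of the DDL/DML split, the sketch below runs one DDL statement and the four kinds of DML statements through Python's sqlite3 module. The `Student` table and its rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema, including consistency constraints.
conn.execute("""
    CREATE TABLE Student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        age     INTEGER CHECK (age > 0)
    )
""")

# DML: insertion, update, deletion.
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [(1, "Asha", 19), (2, "Ravi", 20), (3, "Meera", 21)])
conn.execute("UPDATE Student SET age = 22 WHERE roll_no = 3")
conn.execute("DELETE FROM Student WHERE roll_no = 2")

# DML (query language): retrieval.
rows = conn.execute(
    "SELECT roll_no, name, age FROM Student ORDER BY roll_no").fetchall()
```

All five statements are SQL, which matches the point that one language usually provides both DDL and DML features.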
One question may now come to mind: if a DBMS only understands query
languages like SQL, how can an application written in another language
access the database?
Ans: An application accesses a database by establishing a connection through a
database driver (a library or interface; for example, in Node.js the mysql package is a widely
used library for interacting with MySQL databases). It then uses a database API to execute SQL
queries and commands, processes the retrieved data according to its own logic, handles errors,
and manages database connections efficiently. This allows the application to retrieve,
manipulate, and present data from the database to users, enabling dynamic and interactive
functionality.
Here's a brief list of database drivers for different languages:
1. Java: JDBC (MySQL Connector/J, PostgreSQL JDBC Driver, Oracle JDBC Driver)
2. Python: psycopg2 (PostgreSQL), pymysql (MySQL), cx_Oracle (Oracle), pyodbc (SQL
Server, PostgreSQL, MySQL, etc.)
3. Node.js: mysql2 (MySQL), pg (PostgreSQL), mssql (SQL Server)
4. C++: ODBC
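The connect, execute, process, and close cycle described above can be sketched with sqlite3, the driver bundled with Python. The function name `fetch_student_names` and the `Student` table are hypothetical. Other Python drivers such as psycopg2 or pymysql follow the same connection and cursor pattern, though they use `%s` rather than `?` as the parameter placeholder.

```python
import sqlite3

def fetch_student_names(db_path=":memory:"):
    """Connect, run a parameterised query, handle errors, and close the connection."""
    conn = None
    try:
        conn = sqlite3.connect(db_path)   # 1. establish a connection via the driver
        cur = conn.cursor()               # 2. obtain a cursor from the database API
        cur.execute(
            "CREATE TABLE IF NOT EXISTS Student (roll_no INTEGER, name TEXT)")
        cur.execute("INSERT INTO Student VALUES (?, ?)", (1, "Asha"))
        conn.commit()
        # 3. send SQL to the DBMS; values are passed separately from the SQL text
        cur.execute("SELECT name FROM Student WHERE roll_no = ?", (1,))
        # 4. process the result in the host language
        return [row[0] for row in cur.fetchall()]
    except sqlite3.Error as exc:
        print(f"database error: {exc}")
        return []
    finally:
        if conn is not None:
            conn.close()                  # 5. release the connection

names = fetch_student_names()
```

Passing values as parameters instead of pasting them into the SQL string also guards against SQL injection.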
DBMS Application Architectures: Client machines, on which remote DB users work, and
server machines, on which the DB system runs.
a. T1 Architecture: The client, server, and DB are all present on the same machine.
b. T2 Architecture
1. The app is partitioned into 2 components.
2. The client machine invokes DB system functionality at the server end through query
language statements.
3. API standards like ODBC and JDBC are used for interaction between client and server.
c. T3 Architecture
1. The app is partitioned into 3 logical components.
2. The client machine is just a frontend and doesn’t contain any direct DB calls.
3. The client machine communicates with the app server, and the app server communicates
with the DB system to access data.
4. Business logic (what action to take under which condition) resides in the app server itself.
5. T3 architecture is best suited to WWW applications.
6. Advantages:
Scalability, due to distributed application servers.
Data integrity: the app server acts as a middle layer between client and DB, which
minimizes the chances of data corruption.
Security: the client can't directly access the DB, hence it is more secure.
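The three-tier split can be mimicked in a single Python file to show where each responsibility lives. This is only a sketch: in a real deployment the tiers would run on separate machines, and the function names `get_student_name` and `client_show_student` are hypothetical.

```python
import sqlite3

# --- Data tier: the database itself (assumed schema) ---
_db = sqlite3.connect(":memory:")
_db.execute("CREATE TABLE Student (roll_no INTEGER PRIMARY KEY, name TEXT)")
_db.execute("INSERT INTO Student VALUES (1, 'Asha')")

# --- Application tier: business logic; the only layer that issues SQL ---
def get_student_name(roll_no):
    if not isinstance(roll_no, int) or roll_no <= 0:
        raise ValueError("roll_no must be a positive integer")
    row = _db.execute(
        "SELECT name FROM Student WHERE roll_no = ?", (roll_no,)).fetchone()
    return row[0] if row else None

# --- Client tier: a frontend that calls the app server and never touches the DB ---
def client_show_student(roll_no):
    name = get_student_name(roll_no)
    if name:
        return f"Student {roll_no}: {name}"
    return f"Student {roll_no} not found"

found = client_show_student(1)
missing = client_show_student(2)
```

Because the client function contains no SQL, input validation and access rules can be enforced in one place, the application tier.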
Data model:
1. Data Model: Collection of conceptual tools for describing data, data relationships,
data semantics, and consistency constraints.
2. ER Model
It is a high-level data model based on a perception of the real world as a
collection of basic objects, called entities, and of relationships among these objects.
The graphical representation of the ER model is the ER diagram, which acts as a blueprint of the DB.
3. Entity: An entity is a “thing” or “object” in the real world that is distinguishable from all
other objects.
It has physical existence.
Each student in a college is an entity.
An entity can be uniquely identified (by a primary attribute, aka primary key).
Strong Entity: Can be uniquely identified.
Weak Entity: Can’t be uniquely identified on its own; it depends on some other strong entity.
o It doesn’t have sufficient attributes from which to select a uniquely identifying attribute.
o Loan -> strong entity, Payment -> weak entity: instalment numbers are a sequential
counter generated separately for each loan, so a payment is identified only together with its loan.
o A weak entity depends on a strong entity for its existence.
4. Entity set
It is a set of entities of the same type that share the same properties, or attributes.
E.g., Student is an entity set.
E.g., Customer of a bank
5. Attributes
An entity is represented by a set of attributes.
Each entity has a value for each of its attributes.
For each attribute, there is a set of permitted values, called the domain, or value set,
of that attribute.
E.g., Student Entity has following attributes
Student_ID , Name, Standard, Course, Batch, Contact number, Address
Types of Attributes
1. Simple
Attributes which can’t be divided further.
E.g., Customer’s account number in a bank, Student’s Roll number etc.
2. Composite
Can be divided into subparts (that is, other attributes).
E.g., the name of a person can be divided into first-name, middle-name, last-name.
Useful when a user wants to refer to the entire attribute at some times and to only a
component of it at others.
An address can also be divided into street, city, state, PIN code.
3. Single-valued
Attribute having only one value.
e.g., Student ID, loan-number for a loan.
4. Multi-valued
Attribute having more than one value.
e.g., phone-number, nominee-name on some insurance, dependent-name etc.
Limit constraint may be applied, upper or lower limits.
5. Derived
Value of this type of attribute can be derived from the value of other related
attributes.
e.g., Age, loan-age, membership-period etc.
6. NULL Value
An attribute takes a null value when an entity does not have a value for it.
It may indicate “not applicable”, i.e., the value doesn’t exist, e.g., a person having no middle-name.
It may indicate “unknown”.
o Missing: e.g., the name value of a customer is NULL; it is missing, as a name
must have some value.
o Not known: e.g., the salary attribute of an employee is NULL because it is not
known yet.
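The attribute types above can be mirrored in a small Python sketch using dataclasses. The class layout, field names, and sample values are assumptions chosen for illustration, and `None` stands in for NULL.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Name:                        # composite attribute: has sub-parts
    first: str
    last: str
    middle: Optional[str] = None   # NULL ("not applicable") when there is no middle name

@dataclass
class Student:
    student_id: int                # simple, single-valued
    name: Name                     # composite
    date_of_birth: date
    phone_numbers: List[str] = field(default_factory=list)  # multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, not stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

s = Student(1, Name("Asha", "Rao"), date(2004, 6, 15), ["555-0100", "555-0101"])
```

Age is computed on demand from the date of birth, which is why a derived attribute usually does not need to be stored.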
Strong entity and weak entity:
Strong Entity
A strong entity in the schema is independent of all other entities. There will always be
a primary key for a strong entity. A strong entity set is a set that is made up of many strong
entities.
Representation:
A single rectangle is used to represent strong entities.
A single diamond is used to represent the relationship between two strong entities.
In the above image, we have two strong entities, namely Employee and Department, hence
each is represented using a single rectangle. The relationship between them is Works In, i.e.,
it gives information about an employee working in a particular department, hence it is
represented using a single diamond. If we remove the relationship
between the two entities, both Employee and Department will still exist, since they are
independent of each other. This illustrates the independent nature of strong entities.
Weak Entity
A weak entity in DBMS is an entity whose existence depends on other strong entities and
it does not have a primary key of its own.
Representation:
A double rectangle is used to represent weak entities.
A double diamond is used to represent the identifying relationship between a weak entity and its
owner (strong) entity.
Example: In the context of a customer relationship management system, an "Address" entity
can be considered a weak entity because it does not have a unique identifier on its own. It
relies on the existence of a "Customer" entity to which it is associated. An address can be
uniquely identified only in combination with its associated customer. Therefore, the
"Customer" entity serves as the identifying or owner entity for the "Address" entity.
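One common way to map a weak entity to tables is sketched below with the Loan and Payment example given earlier in these notes. The primary key of Payment combines the owner's key with a per-loan counter. Column names and sample rows are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Loan (loan_no INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE Payment (
        loan_no    INTEGER NOT NULL REFERENCES Loan(loan_no) ON DELETE CASCADE,
        payment_no INTEGER NOT NULL,        -- unique only within one loan
        amount     REAL,
        PRIMARY KEY (loan_no, payment_no)   -- owner key + per-loan counter
    );
    INSERT INTO Loan VALUES (100, 5000), (200, 8000);
    INSERT INTO Payment VALUES (100, 1, 500), (100, 2, 500), (200, 1, 800);
""")

def orphan_payment_rejected():
    """A payment for a non-existent loan should be refused by the database."""
    try:
        conn.execute("INSERT INTO Payment VALUES (999, 1, 100)")
        return False
    except sqlite3.IntegrityError:
        return True

rejected = orphan_payment_rejected()
payment_numbers = conn.execute(
    "SELECT loan_no, payment_no FROM Payment ORDER BY loan_no, payment_no"
).fetchall()
```

Payment number 1 appears under both loans, which shows that `payment_no` alone cannot identify a payment.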
Examples:
1. One-to-One (1:1):
Example: Employee and EmployeeID
Each employee has exactly one employee ID, and each employee ID is associated
with only one employee.
2. One-to-Many (1:N):
Example: Department and Employee
Each department can have multiple employees, but each employee belongs to only
one department.
3. Many-to-One (N:1):
Example: Employee and Manager
Many employees can report to the same manager, but each employee has only one
manager.
4. Many-to-Many (N:M):
Example: Student and Course
Each student can enroll in multiple courses, and each course can have multiple
students enrolled.
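A sketch of how the 1:N and N:M examples are commonly mapped to tables: the "many" side of a 1:N relationship holds a foreign key, while an N:M relationship gets a separate junction table. The `Enrollment` table name, the columns, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1:N  Department -> Employee: the "many" side holds the foreign key.
    CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES Department(dept_id)
    );

    -- N:M  Student <-> Course: a junction table holds one row per pairing.
    CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Enrollment (
        student_id INTEGER REFERENCES Student(student_id),
        course_id  INTEGER REFERENCES Course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO Department VALUES (1, 'Sales');
    INSERT INTO Employee VALUES (1, 'Asha', 1), (2, 'Ravi', 1);
    INSERT INTO Student VALUES (1, 'Meera'), (2, 'John');
    INSERT INTO Course VALUES (10, 'DBMS'), (20, 'Networks');
    INSERT INTO Enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

employees_in_sales = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE dept_id = 1").fetchone()[0]
courses_of_meera = conn.execute(
    "SELECT COUNT(*) FROM Enrollment WHERE student_id = 1").fetchone()[0]
students_in_dbms = conn.execute(
    "SELECT COUNT(*) FROM Enrollment WHERE course_id = 10").fetchone()[0]
```

Each employee row can name only one department, whereas the junction table lets one student appear with several courses and one course with several students.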
Participation constraints:
The total participation constraint here is between the Borrow relationship and the Loan entity.
It specifies that every loan entity must participate in the Borrow relationship, meaning every
loan must be associated with at least one customer through the Borrow relationship.
If total participation is enforced:
Every loan entity must be connected to at least one customer entity through the Borrow
relationship.
It ensures that there are no loans in the system that exist without being borrowed by any
customer.
Graphically, in an Entity-Relationship Diagram (ERD), total participation constraints are
typically represented by a double line connecting the relationship to the entity, indicating that
the participation is total.
Note: In the above example, Customer participates only partially, meaning there may be customers
without a loan.
A weak entity always has a total participation constraint, but a strong entity may not
have total participation.
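Total and partial participation can be checked with two queries, sketched here with sqlite3. The `Customer`, `Loan`, and `Borrow` tables follow the example in the text, but their columns and rows are assumptions. A plain junction table does not enforce total participation by itself, so this sketch only detects violations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Loan     (loan_no INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE Borrow (
        cust_id INTEGER REFERENCES Customer(cust_id),
        loan_no INTEGER REFERENCES Loan(loan_no),
        PRIMARY KEY (cust_id, loan_no)
    );
    INSERT INTO Customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO Loan VALUES (100, 5000), (200, 8000);
    INSERT INTO Borrow VALUES (1, 100), (1, 200);
""")

# Loans that violate total participation (should be empty).
loans_without_customer = conn.execute("""
    SELECT loan_no FROM Loan
    WHERE loan_no NOT IN (SELECT loan_no FROM Borrow)
""").fetchall()

# Customers with no loan are allowed: Customer participates only partially.
customers_without_loan = conn.execute("""
    SELECT name FROM Customer
    WHERE cust_id NOT IN (SELECT cust_id FROM Borrow)
""").fetchall()
```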
Extended ER Features:
1. Specialisation
2. Generalisation
3. Aggregation
1. The basic ER features studied in LEC-3 can be used to model most DB features, but
when complexity increases, it is better to use some extended ER features to model the DB
schema.
2. Specialisation
In the ER model, we may need to subgroup an entity set into other entity sets that are
distinct in some way from the rest.
Specialisation is the splitting of an entity set into further sub entity sets on the basis
of their functionalities, specialities, and features.
It is a Top-Down approach.
e.g., Person entity set can be divided into customer, student, employee. Person is
superclass and other specialised entity sets are subclasses.
1. We have "is-a" relationship between superclass and subclass.
2. Depicted by triangle component
Why Specialisation?
3. Certain attributes may only be applicable to a few entities of the parent entity
set.
4. DB designer can show the distinctive features of the sub entities.
5. To group such entities we apply Specialisation, to overall refine the DB blueprint.
Here we break down Person into two entities, Customer and Employee, and divide the attributes between them.
The attributes that belong to both Customer and Employee are kept as attributes of Person, since
sub-entities inherit the attributes of the super-entity.
2. Generalisation
1. It is just a reverse of Specialisation.
2. A DB designer may find that certain properties of two entities overlap. The
designer may then create a new generalised entity set, which
becomes the superclass.
3. An “is-a” relationship is present between subclass and superclass.
4. E.g., Car, Jeep, and Bus all have some common attributes. To avoid data repetition for
the common attributes, the DB designer may generalise them into a new entity set,
“Vehicle”.
5. It is a Bottom-up approach.
6. Why Generalisation?
Makes DB more refined and simpler.
Common attributes are not repeated.
Note: The ER diagram is the same as for specialisation; the difference is that here we think
from the bottom up. For example, we first think about Customer and Employee, and if both
entities have some common attributes, we then introduce the Person entity and assign
those common attributes to it.
Attribute Inheritance
1. Both Specialisation and Generalisation have attribute inheritance.
2. The attributes of higher level entity sets are inherited by lower level entity sets.
3. E.g., Customer & Employee inherit the attributes of Person.
Participation Inheritance
1. If a parent entity set participates in a relationship then its child entity sets will also
participate in that relationship.
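Attribute inheritance in the Person, Customer, and Employee example resembles class inheritance in a programming language, as in the sketch below. The specific attributes (`credit_rating`, `salary`) are assumptions added for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:              # superclass (higher-level entity set)
    person_id: int
    name: str

@dataclass
class Customer(Person):    # Customer "is-a" Person
    credit_rating: int = 0

@dataclass
class Employee(Person):    # Employee "is-a" Person
    salary: float = 0.0

# Each subclass lists the inherited attributes first, then its own.
customer_attrs = [f.name for f in fields(Customer)]
employee_attrs = [f.name for f in fields(Employee)]
```

Whether the design was reached top-down (specialisation) or bottom-up (generalisation), the resulting hierarchy is the same.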
3. Aggregation
How do we show relationships among relationships? Aggregation is the technique.
Abstraction is applied to treat a relationship as a higher-level entity. We can call it
an abstract entity.
It avoids redundancy by aggregating a relationship into an entity set itself.
One limitation of the E-R model is that it cannot express relationships among
relationships. Aggregation is the abstraction through which a relationship is treated
as a higher-level entity.
Note:
A supertype entity is a data model entity which has one or more other entities that act as
subtypes. In a supertype/subtype entity structure: The top-level entity is referred to as the
parent entity or the supertype entity. Each lower-level entity is referred to as a child entity or
a subtype entity
Q.1: What is the purpose of the Generalization?
Answer:
Generalization is simply gathering the common properties from entities and creating a
generalized concept from those extracted data. Generalization helps in improving the
flexibility, and reusability of the database.
Q.2: Why is generalization important in the database?
Answer:
Generalization is important in the database because it helps to gather important information
so that analysis of the data becomes easier and faster for the user, and it also helps in
making decisions faster.
Q.3. What does it mean to generalize/specialize an object in an ER diagram?
Answer:
Generalization is the process of creating a more general object from a more specific object. In
an ER diagram, this is represented by an arrow going from the more specific object to the
more general object. Specialization is the process of creating a more specific object from a
more general object. In an ER diagram, this is represented by an arrow going from the more
general object to the more specific object.
Step3: relation and constraints
Step4: ER-Diagram
Facebook ER-diagram:
After designing the conceptual model of the database using an ER diagram, we need to convert
the conceptual model into a relational model, which can be implemented using
any RDBMS such as Oracle or MySQL. So we will see what the relational model is.
The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name.
Tables are also known as relations. The relational model is an example of a record-based
model. Record-based models are so named because the database is structured in fixed-format
records of several types. Each table contains records of a particular type. Each record
type defines a fixed number of fields, or attributes. The columns of the table correspond to
the attributes of the record type. The relational data model is the most widely used data
model, and a vast majority of current database systems are based on the relational model.
Relational Model:
The relational model represents how data is stored in Relational Databases. A relational
database consists of a collection of tables, each of which is assigned a unique name. Consider
a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE, and AGE shown in
the table
Important Terminologies:
1. Relational Model (RM) organises the data in the form of relations (tables).
2. A relational DB consists of a collection of tables, each of which is assigned a unique name.
3. A row in a table represents a relationship among a set of values, and a table is a collection of
such relationships.
4. Tuple: A single row of the table, representing a single data point / a unique record.
5. Columns: represent the attributes of the relation. For each attribute, there is a set of permitted
values, called the domain of the attribute.
6. Relation Schema: defines the design and structure of the relation, contains the name of
the relation and all the columns/attributes.
7. Common RM based DBMS systems, aka RDBMS: Oracle, IBM, MySQL, MS Access.
8. Degree of table: number of attributes/columns in a given table/relation.
9. Cardinality: Total no. of tuples in a given relation.
10. Relational Key: A set of attributes which can uniquely identify each tuple.
11.Relation Instance: The set of tuples of a relation at a particular instance of time is called
a relation instance. Table 1 shows the relation instance of STUDENT at a particular time. It
can change whenever there is an insertion, deletion, or update in the database.
12.NULL Values: The value which is not known or unavailable is called a NULL value. It is
represented by blank space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.
13. Relation Keys: These are the keys used to identify rows uniquely and to
relate tables to one another. They are of the following types.
Super Key
Candidate Key
Primary Key
Alternate Key
Foreign Key
Composite Key
Compound Key
Surrogate Key
1. Super Key (SK): Any combination of attributes present in a table which can uniquely identify
each tuple.
2. Candidate Key (CK): A minimal super key, i.e., a super key none of whose proper subsets can
uniquely identify each tuple. It contains no redundant attribute.
CK value shouldn’t be NULL.
3. Primary Key (PK): Selected out of the CK set, usually the one with the least no. of attributes.
4. Alternate Key (AK): All CKs except the PK.
5. Foreign Key (FK):
It creates a relation between two tables.
A relation, say r1, may include among its attributes the PK of another relation, say
r2. This attribute is called an FK from r1 referencing r2.
The relation r1 is aka the Referencing (Child) relation of the FK dependency, and r2
is called the Referenced (Parent) relation of the FK.
An FK helps to cross-reference between two different relations.
6. Composite Key: PK formed using at least 2 attributes.
7. Compound Key: PK which is formed using 2 FKs.
8. Surrogate Key:
Synthetic PK.
Generated automatically by DB, usually an integer value.
May be used as PK.
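The definitions of super key and candidate key can be tried out on a small sample relation. In the sketch below, the rows and the helper names `is_super_key` and `candidate_keys` are assumptions. Strictly speaking, a key is a property of the schema and must hold for every possible instance, so testing one sample instance only shows which attribute sets are consistent with that data.

```python
from itertools import combinations

# Sample instance of STUDENT; the rows are illustrative assumptions.
ATTRS = ("roll_no", "name", "phone")
ROWS = [
    (1, "Asha", "555-0100"),
    (2, "Ravi", "555-0101"),
    (3, "Asha", "555-0102"),
]

def is_super_key(attrs):
    """True if the given attributes take distinct values in every row of ROWS."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(ROWS)

def candidate_keys():
    """Super keys with no proper subset that is also a super key."""
    keys = []
    # Smaller sets are examined first, so every minimal key is found
    # before any of its supersets is considered.
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            if is_super_key(combo) and not any(set(k) < set(combo) for k in keys):
                keys.append(combo)
    return keys

keys = candidate_keys()
```

Here `(roll_no, name)` is a super key but not a candidate key, because `roll_no` alone already identifies each row.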
Surrogate key:
In the above figure, let’s say we have the databases of two schools and want to merge the
tables of school A and school B. We cannot use the register number as the primary
key because the two schools use different formats, so to identify rows of the merged table
uniquely, the database adds a surrogate key to the table.
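A sketch of the merge using sqlite3, where an `INTEGER PRIMARY KEY AUTOINCREMENT` column serves as the surrogate key. The register-number formats and student names are assumptions, since the figure's actual values are not reproduced in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        surrogate_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated by the database
        school       TEXT,
        register_no  TEXT,   -- original identifier; the format differs per school
        name         TEXT
    )
""")

school_a = [("A", "210101", "Asha"), ("A", "210102", "Ravi")]
school_b = [("B", "CS-21-07", "Meera")]
conn.executemany(
    "INSERT INTO Student (school, register_no, name) VALUES (?, ?, ?)",
    school_a + school_b)

merged = conn.execute(
    "SELECT surrogate_id, register_no FROM Student ORDER BY surrogate_id"
).fetchall()
```

The original register numbers are kept as ordinary columns, while the generated integer identifies each row of the merged table.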
Integrity Constraints:
Integrity constraints are a set of rules used to maintain the quality of information.
Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.
Thus, integrity constraints guard against accidental damage to the database.
1. Domain constraints
o A domain constraint defines the valid set of values for an
attribute.
o The data type of a domain can be string, character, integer, time, date, currency, etc.
The value of the attribute must belong to the corresponding domain.
o It restricts the values of an attribute of a relation by specifying its domain.
o It restricts the data type of every attribute.
o E.g., we want to specify that enrolment should happen only for candidates with birth year < 2002.
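The birth-year rule can be expressed as a CHECK constraint, sketched here with sqlite3. The `Candidate` table and the `enrol` helper are hypothetical names. SQLite's column type checking is looser than that of most RDBMSs, so the CHECK clause does the restricting in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Candidate (
        candidate_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        birth_year   INTEGER NOT NULL CHECK (birth_year < 2002)
    )
""")

def enrol(candidate_id, name, birth_year):
    """Return True if the row satisfies the domain constraint, else False."""
    try:
        conn.execute("INSERT INTO Candidate VALUES (?, ?, ?)",
                     (candidate_id, name, birth_year))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = enrol(1, "Asha", 1999)   # birth year within the allowed domain
rejected = enrol(2, "Ravi", 2005)   # birth year outside the allowed domain
```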
1. Cascading Deletes
We can define a foreign key with the ON DELETE CASCADE option. This means that if a record
in the parent table is deleted, all related records in the child table will also be deleted
automatically.
In this setup, deleting a student will automatically delete all associated grades.
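A minimal sketch of a cascading delete with sqlite3. The `Students` and `Grades` columns and rows are assumptions. Note that SQLite applies foreign key actions only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required in SQLite for FK actions to run
conn.executescript("""
    CREATE TABLE Students (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Grades (
        grade_id   INTEGER PRIMARY KEY,
        student_id INTEGER REFERENCES Students(student_id) ON DELETE CASCADE,
        grade      INTEGER
    );
    INSERT INTO Students VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO Grades VALUES (1, 1, 88), (2, 1, 91), (3, 2, 75);
""")

# Deleting the parent row removes its child rows as well.
conn.execute("DELETE FROM Students WHERE student_id = 1")
remaining_grades = conn.execute(
    "SELECT grade_id, student_id FROM Grades ORDER BY grade_id").fetchall()
```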
2. Set Null on Delete (a likely question: can a foreign key have a NULL value? Yes.)
You can define a foreign key with the ON DELETE SET NULL option. This means that if a record
in the parent table is deleted, the foreign key field in the child table will be set to NULL.
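A matching sketch for ON DELETE SET NULL, again with sqlite3. The `Department` and `Employee` tables are assumptions chosen so that the child row can sensibly outlive its parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required in SQLite for FK actions to run
conn.executescript("""
    CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES Department(dept_id) ON DELETE SET NULL
    );
    INSERT INTO Department VALUES (1, 'Sales');
    INSERT INTO Employee VALUES (1, 'Asha', 1);
""")

# The employee row survives; only its foreign key value becomes NULL.
conn.execute("DELETE FROM Department WHERE dept_id = 1")
employee_after = conn.execute(
    "SELECT emp_id, name, dept_id FROM Employee").fetchall()
```

This works only when the foreign key column is nullable; a column declared NOT NULL could not use this option.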
4. Key constraints
o A key is an attribute, or set of attributes, used to identify an entity uniquely within its entity set.
o Key constraints are rules applied to a table's columns to enforce the uniqueness and
validity of data within a database.
o An entity set can have multiple keys, out of which one key will be the primary key.
A primary key must contain unique values and cannot contain NULL in the relational table.