
DATABASE MANAGEMENT SYSTEM (DBMS)

File-Based System vs. Database Management System


1. Data Redundancy and Inconsistency:
○ In file-processing systems, data redundancy occurs when the same piece of information is stored in multiple files, or in multiple locations within a file. For example, consider a system where employee information is stored in separate files for each department. If an employee changes their address, this change must be made in every file where their information is stored. Failure to do so leads to inconsistency, where different copies of the same data contain conflicting information.
○ Inconsistencies can arise from errors in data entry or updates, lack of synchronization between different copies of data, or failures during data manipulation. These inconsistencies can lead to incorrect decision-making, inefficiencies, and ultimately loss of trust in the data.
2. Difficulty in Accessing Data:
○ File-processing systems often lack a centralized mechanism for managing and accessing data. As a result, finding and retrieving specific information can be challenging, especially in large and complex systems with numerous files stored in various formats and locations.
○ Users may need to navigate through multiple directories or remember specific file names and locations to access the desired data. This decentralized approach to data storage and retrieval can lead to inefficiencies, increased response times, and user frustration.
3. Data Isolation:
○ In file-processing systems, data is typically organized into separate files or directories based on the requirements of individual applications or departments. Each application or department may have its own set of files, formats, and access methods, leading to data isolation.
○ Data isolation inhibits data sharing and integration across different parts of an organization. It hampers collaboration, consistency, and the ability to derive insights from data stored in disparate systems.
4. Integrity Problems:
○ Ensuring data integrity (maintaining the accuracy, consistency, and reliability of data) is challenging in file-processing systems, especially when multiple users or processes concurrently access and modify the same files.
○ Without proper controls and mechanisms in place, integrity problems such as data corruption, loss, or unauthorized modification can occur. For example, if two users simultaneously update the same file, their changes may conflict, leading to inconsistencies or loss of data integrity.
5. Atomicity Problems:
○ Atomicity is the property of database transactions whereby either all the operations within a transaction complete successfully, or none of them do. File-processing systems often lack built-in mechanisms to ensure atomicity, leaving transactions vulnerable to interruptions or failures.
○ Without atomicity, incomplete or partially applied changes to data can occur, leading to data inconsistencies or corruption. For example, if a transaction involves updating multiple files, a failure during the process could leave the data in an inconsistent state, with some files updated and others not.
6. Concurrent-Access Anomalies:
○ Concurrent access refers to the scenario where multiple users or processes attempt to access and modify the same data at the same time. In file-processing systems, concurrent access can lead to various anomalies, including lost updates, uncommitted data, and inconsistent retrievals.
○ For instance, if two users simultaneously read and update the same file, one user's changes may overwrite the other's, resulting in a lost update. Similarly, if one user is updating a file while another is reading it, the reader may retrieve inconsistent or incomplete data.

View of Data (Three Schema Architecture)


● The major purpose of a DBMS is to provide users with an abstract view of the data; that is, the system hides certain details of how the data is stored and maintained.
● To simplify user interaction with the system, abstraction is applied through several levels.
● The main objective of the three-level architecture is to enable multiple users to access the same data with a personalized view while storing the underlying data only once.

The three-schema architecture is as follows:


1. Internal Level / Physical Level

● The internal level has an internal schema, which describes the physical storage structure of the database.
● The internal schema is also known as the physical schema.
● It uses the physical data model and defines how the data will be stored in blocks.
● The physical level describes complex low-level data structures in detail.

The internal level is generally concerned with the following activities:

● Storage space allocation. For example: B-trees, hashing, etc.
● Access paths. For example: specification of primary and secondary keys, indexes, pointers, and sequencing.
● Data compression and encryption techniques.
● Optimization of internal structures.
● Representation of stored fields.

2. Conceptual Level / Logical Level

● The conceptual schema describes the design of the database at the conceptual level. The conceptual level is also known as the logical level.
● The conceptual schema describes the structure of the whole database.
● The conceptual level describes what data are to be stored in the database and what relationships exist among those data.
● At the conceptual level, internal details such as the implementation of data structures are hidden.
● Programmers and database administrators work at this level.

3. External Level

● At the external level, a database contains several schemas, sometimes called subschemas. A subschema describes a particular view of the database.
● An external schema is also known as a view schema.
● Each view schema describes the part of the database that a particular user group is interested in, and hides the rest of the database from that user group.
● The view schema describes the end-user interaction with the database system.
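
To make the levels concrete, here is a minimal SQL sketch (the table, column, and view names are illustrative assumptions, not from the original): the CREATE TABLE statement belongs to the conceptual schema, while CREATE VIEW defines an external schema that exposes only the part of the data a particular user group needs, with the underlying data stored only once.

-- Conceptual level: the logical structure of the stored data.
CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(100),
    address    VARCHAR(200),
    marks      INT
);

-- External level: a results clerk sees only id, name, and marks;
-- the address column is hidden from this user group.
CREATE VIEW result_view AS
SELECT student_id, name, marks
FROM student;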

Instances and Schema

What is DBMS Schema?

The DBMS schema is the design of the database. For example, consider an employee table with the attributes EMP_ID, EMP_ADDRESS, EMP_NAME, and EMP_CONTACT. Together, these attributes form the schema of the employee table.

Schema is further divided into three types:

1. Logical schema
2. View schema
3. Physical schema

The schema defines the logical view of the database: it tells us what data the database holds and where each piece of data belongs.

In DBMS, the schema is usually shown in diagram form. From the diagram we can understand the relationships between the data present in the database, and with the schema in place, DBMS functions such as delete, insert, search, and update can be implemented against a known structure.

For example, a schema diagram for SECTION, COURSE, and STUDENT shows the relationship between sections and courses. The schema is purely a structural view of the database.

1. Physical schema:

In the physical schema, the database is designed at the physical level. At this level, the schema describes how data blocks are stored and how storage is managed.

2. Logical schema:

In the logical schema, the database is designed at the logical level. Programmers and database administrators work at this level, where the data is described in a structured way. The internal implementation details remain hidden in the physical layer, partly for security reasons.

3. View schema:

In the view schema, the database is designed at the view level. This schema describes the users' interaction with the database system.

Moreover, Data Definition Language (DDL) statements are used to define the schema of a database. The schema covers the names of the tables, the names and types of their attributes, and the constraints on the tables. If users want to modify the schema, they write DDL statements.
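
As a minimal sketch, the employee schema described above could be written in DDL as follows (the column types are assumptions):

-- Define the schema of the employee table.
CREATE TABLE employee (
    EMP_ID      INT PRIMARY KEY,
    EMP_NAME    VARCHAR(100) NOT NULL,
    EMP_ADDRESS VARCHAR(200),
    EMP_CONTACT VARCHAR(15)
);

-- Modifying the schema is also a DDL statement.
ALTER TABLE employee ADD COLUMN EMP_EMAIL VARCHAR(100);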

What is DBMS Instance?


The data stored in the database at a particular moment in time is called an instance of the database. The schema defines the attributes of the database; the values those attributes hold at a particular moment form an instance.

In the example above, each table contains some rows or records; the data those rows hold at this moment is the current instance of the employee table.

Let's take another example. Say we have a single table, student, in the database; today the table has 100 records, so today the instance of the database has 100 records. If we add another 100 records to this table by tomorrow, the instance of the database tomorrow will have 200 records in the table. In short, the data stored in the database at a particular moment is called the instance; it changes over time as we add, delete, or update data in the database.

Differences between Database Schema and Instance

Both of these help in describing the data available in a database, but there is a fundamental difference between schema and instance in DBMS. A schema is the overall description of a given database; an instance is the collection of data and information that the database stores at a particular moment.

The major differences between schema and instance are as follows:

Database Schema vs. Database Instance:

● Schema: the definition or description of the database. Instance: a snapshot of the database at a specific moment.
● Schema: rarely changes. Instance: changes frequently, with every addition, deletion, and update of data.
● Schema: corresponds to a variable declaration in a programming language. Instance: corresponds to the value of that variable in a program at a point in time.
● Schema: defines the basic structure of the database, i.e., how the data will be stored. Instance: the set of information stored at a particular time.
● Schema: the same for the whole database. Instance: differs from moment to moment.

Data Model
A data model provides a way to describe the design of a DB at the logical level.

Underlying the structure of the DB is the data model: a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.

Examples: the ER model, the relational model, the object-oriented model, the object-relational data model, etc.
Database Administrator (DBA)
A Database Administrator (DBA) is like the caretaker of a database environment. They're responsible for
keeping the database up and running smoothly, ensuring that the data is secure, accessible, and
consistently performs well. Think of a DBA as a mix between a librarian, who organizes and takes care of
the books (data), and a building superintendent, who maintains the building (database system) and ensures
everything operates correctly.

Here are some key functions of a DBA explained in simple terms:

1. Installation and Setup: The DBA is responsible for installing the DBMS and configuring it to operate efficiently. This is like setting up the shelves and organizing the library to make it easy to navigate and use.
2. Maintenance: They perform regular checks and maintenance tasks to ensure the database runs smoothly. This involves updating the DBMS software, applying patches, and making sure the hardware is up to date, similar to a librarian who ensures the library building is well-maintained and the book catalog is current.
3. Security: A DBA protects the database from unauthorized access. They set up user accounts with appropriate access levels, encrypt data, and implement other security measures. It's akin to a librarian who controls who enters the library and ensures that only authorized personnel can access certain restricted sections.
4. Backup and Recovery: The DBA regularly backs up the database to prevent data loss in case of a failure and knows how to recover the data if something goes wrong. Imagine a librarian who keeps copies of important documents safe and knows exactly where they are if needed.
5. Capacity Planning: They anticipate the future needs of the database, planning for storage, computing resources, and scalability to ensure the database can handle growth over time. This is like planning a library expansion or rearranging shelves to accommodate more books.
6. Ensuring Data Integrity: The DBA sets up rules and policies to maintain the accuracy and consistency of data in the database, similar to a librarian ensuring that books are correctly categorized and returned to the right place.
7. Troubleshooting: When problems arise, the DBA is on call to diagnose and fix issues, ensuring minimal downtime. Think of them as the go-to person for fixing a leak in the library roof or a glitch in the check-out system.
ER (Entity Relationship) Diagram in DBMS
● ER model stands for Entity-Relationship model. It is a high-level data model, used to define the data elements and relationships for a specified system.
● It develops a conceptual design for the database and provides a simple, easy-to-understand view of the data.
● In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.

For example, suppose we design a school database. In this database, the student will be an entity with attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code, etc., and there will be a relationship between them.

Components of ER Diagram
1. Entity:

An entity may be any object, class, person, or place. In an ER diagram, an entity is represented as a rectangle.

Consider an organization as an example: manager, product, employee, department, etc. can all be taken as entities.

a. Weak Entity
An entity that depends on another entity is called a weak entity. A weak entity does not contain any key attribute of its own and is represented by a double rectangle.
2. Attribute

An attribute describes a property of an entity. An ellipse is used to represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute
The key attribute represents the main characteristic of an entity: its primary key. A key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute
An attribute that is composed of other attributes is known as a composite attribute. A composite attribute is represented by an ellipse connected to the ellipses of its component attributes.

c. Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to represent it.

For example, a student can have more than one phone number.

d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a dashed ellipse.

For example, a person's age changes over time and can be derived from another attribute, such as date of birth.
3. Relationship

A relationship describes the relation between entities. A diamond (rhombus) is used to represent a relationship.

The types of relationships are as follows; a sketch of how they map to relational tables appears after the list:

a. One-to-One Relationship
When only one instance of each entity is associated with the relationship, it is known as a one-to-one relationship.

For example, a female can marry one male, and a male can marry one female.

b. One-to-Many Relationship
When one instance of the entity on the left is associated with more than one instance of the entity on the right, it is known as a one-to-many relationship.

For example, a scientist can invent many inventions, but each invention is made by only one specific scientist.

c. Many-to-One Relationship
When more than one instance of the entity on the left is associated with only one instance of the entity on the right, it is known as a many-to-one relationship.

For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-Many Relationship
When more than one instance of the entity on the left is associated with more than one instance of the entity on the right, it is known as a many-to-many relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.
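
As a hedged illustration of how these cardinalities surface in a relational design (all names below are assumed for the example): a one-to-many relationship is typically implemented with a foreign key on the "many" side, while a many-to-many relationship gets its own junction table.

-- One-to-many: a course has many students; each student enrolls in one course.
CREATE TABLE course (
    course_id   INT PRIMARY KEY,
    course_name VARCHAR(50)
);
CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(50),
    course_id  INT REFERENCES course(course_id)  -- FK on the "many" side
);

-- Many-to-many: an employee works on many projects and vice versa,
-- so the relationship becomes a table of its own.
CREATE TABLE employee (emp_id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE project  (proj_id INT PRIMARY KEY, title VARCHAR(50));
CREATE TABLE works_on (
    emp_id  INT REFERENCES employee(emp_id),
    proj_id INT REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)                -- composite key of the junction table
);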
Specialization
● Specialization is a top-down approach, the opposite of generalization. In specialization, one higher-level entity can be broken down into two or more lower-level entities.
● Specialization is used to identify subsets of an entity set that share some distinguishing characteristics.
● Normally, the superclass is defined first, the subclass and its related attributes are defined next, and relationship sets are then added.

For example, in an employee management system, the EMPLOYEE entity can be specialized as a TESTER or DEVELOPER based on the role they play in the company.

Generalization
● Generalization is a bottom-up approach in which two or more lower-level entities combine to form a higher-level entity if they have some attributes in common.
● In generalization, a higher-level entity can also combine with lower-level entities to form a still higher-level entity.
● Generalization resembles a subclass-superclass system; the difference is the direction of the approach, since generalization works bottom-up.
● In generalization, entities are combined to form a more generalized entity, i.e., subclasses are combined to make a superclass.

For example, FACULTY and STUDENT entities can be generalized to create a higher-level entity PERSON.
Aggregation
In aggregation, a relationship between two entities is treated as a single entity. The relationship, together with its corresponding entities, is aggregated into a higher-level entity.

For example, the CENTER entity, the COURSE entity, and the "offers" relationship between them act together as a single entity, which in turn participates in a relationship with another entity, VISITOR. In the real world, a visitor to a coaching center never enquires about the course alone or the center alone; they enquire about both.

Relational Model
The relational model is a conceptual framework used in database management. It
represents data in the form of tables, where each table consists of rows and columns.
Here's an explanation of some key terms associated with the relational model:
1. Tuple: In the context of a relational database, a tuple refers to a single row in a table. Each tuple represents a single record or entity within the database.
2. Attributes: Attributes, also known as columns or fields, represent the properties or characteristics of the entities being modeled. Each attribute in a table corresponds to a particular piece of information about the entities represented by the tuples. For example, in a table representing employees, attributes might include 'Name', 'Age', 'Salary', etc.
3. Domain: A domain refers to the set of possible values that an attribute can take. It defines the data type and constraints for each attribute. For instance, the domain of an 'Age' attribute might be integers between 0 and 150, and the domain of a 'Name' attribute might be strings of characters.
4. Cardinality: Cardinality refers to the number of tuples (rows) in a relation (table). The term is also used for relationships between tables; for example, in a one-to-many relationship, the cardinality indicates how many tuples in one table are related to each tuple in the other table.
5. Degree: Degree refers to the number of attributes (columns) in a relation (table). It indicates the number of properties being stored for each entity represented by a tuple. For example, a table with attributes 'Name', 'Age', and 'Salary' has a degree of 3.

Important properties of a Table in Relational Model


1. The name of a relation is distinct from the names of all other relations.
2. The values have to be atomic: they cannot be broken down further.
3. The name of each attribute/column must be unique.
4. Each tuple must be unique in a table.
5. The sequence of rows and columns has no significance.
6. Tables must follow integrity constraints, which help maintain data consistency across the tables.

Relational Model Keys


1. Primary Key (PK):
○ A primary key is a unique identifier for each tuple (row) in a table.
○ It ensures that each row in a table can be uniquely identified.
○ Primary keys cannot have NULL values.
○ Each table can have only one primary key.
○ Primary keys are often implemented using a single column, but they can also be composite (using multiple columns).
○ Example: In a table representing employees, the 'EmployeeID' column might serve as the primary key.
2. Foreign Key (FK):
○ A foreign key is a column or set of columns in one table that refers to the primary key in another table.
○ It establishes a relationship between two tables by enforcing referential integrity.
○ Foreign key values must either match a primary key value in the referenced table or be NULL.
○ Foreign keys ensure that data in related tables remains synchronized and consistent.
○ Multiple foreign keys can exist in a table, each referring to a different primary key.
○ Example: In a table representing orders, the 'CustomerID' column might serve as a foreign key referencing the 'CustomerID' primary key in a 'Customers' table.
3. Unique Key:
○ A unique key ensures that all values in a column or set of columns are unique.
○ Unlike primary keys, unique keys can contain NULL values, but each non-NULL value must be unique.
○ Tables can have multiple unique keys, but only one primary key.
○ Unique keys are used to enforce data integrity constraints and prevent duplicate entries.
○ Example: In a table representing students, the 'StudentID' column might have a unique key constraint to ensure each student has a distinct identifier.
4. Candidate Key:
○ A candidate key is a set of one or more columns that can uniquely identify tuples in a table.
○ Any candidate key can potentially serve as the primary key.
○ When the primary key is chosen from among multiple candidate keys, the remaining candidate keys become alternate keys.
○ Example: In a table representing postal addresses, the combination of 'HouseNumber', 'Street', 'City', and 'PostalCode' might form a candidate key, as it uniquely identifies each address.
5. Composite Key:
○ A composite key is a primary key composed of multiple columns.
○ Together, these columns ensure uniqueness within the table.
○ Composite keys are useful when no single column can uniquely identify tuples.
○ Example: In a table representing sales transactions, a composite key consisting of 'TransactionID' and 'LineItemNumber' might be used to uniquely identify each line item within a transaction.
6. Compound Key:
○ A compound key, like a composite key, consists of multiple columns that together uniquely identify each row in the table.
○ The two terms are often used interchangeably; when a distinction is drawn, each column of a compound key is a key (often a foreign key) in its own right, whereas the columns of a composite key need not be.
○ Compound keys are useful when no single column can uniquely identify tuples in a table.
○ Example: In a table representing sales transactions, a compound key consisting of 'TransactionID' and 'ProductID' might be used to uniquely identify each product within a transaction.
7. Surrogate Key:
○ A surrogate key is an artificially generated key used as the primary key of a table.
○ Surrogate keys are typically numeric and have no inherent meaning or relationship to the data they represent.
○ They are often implemented using auto-incrementing integers or GUIDs (Globally Unique Identifiers).
○ Surrogate keys simplify database design and improve performance by providing a stable, unique identifier for each tuple.
○ They are especially useful when the natural key (the key derived from existing data attributes) is complex, subject to change, or otherwise unsuitable as a primary key.
○ Example: In a table representing employees, a surrogate key named 'EmployeeID' might be used as the primary key instead of a natural key like 'Social Security Number', due to privacy concerns or potential changes in the SSN.
8. Super Key:
○ A super key is a set of one or more attributes (columns) that, taken together, uniquely identify tuples in a table.
○ It is a broader concept than a candidate key because it includes all combinations of attributes that uniquely identify tuples.
○ A super key may contain more attributes than necessary to uniquely identify tuples, making it a superset of some candidate key.
○ Any minimal subset of a super key that still uniquely identifies tuples is a candidate key.
○ Example: In a table representing students, a super key might include the attributes 'StudentID', 'FirstName', 'LastName', and 'DateOfBirth', as these together uniquely identify each student. This super key contains more attributes than necessary, which is what makes it a super key rather than a candidate key.
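
Several of these keys can be seen together in one short SQL sketch (all table and column names are illustrative assumptions):

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,      -- primary key (a surrogate key here)
    email       VARCHAR(100) UNIQUE,  -- unique key: may be NULL, non-NULL values never repeat
    ssn         CHAR(11) UNIQUE,      -- candidate key not chosen as PK (an alternate key)
    name        VARCHAR(100)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id)  -- foreign key
);

CREATE TABLE order_lines (
    order_id    INT REFERENCES orders(order_id),
    line_number INT,
    PRIMARY KEY (order_id, line_number)  -- composite (compound) primary key
);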

Integrity Constraints
Integrity constraints are rules or conditions that are enforced on data in a relational
database to ensure its accuracy, consistency, and reliability. These constraints help
maintain the quality and integrity of the data stored in the database. There are several
types of integrity constraints commonly used in relational databases:

I. Entity Integrity Constraint:
a. Also known as the primary key constraint.
b. It ensures that each row in a table is uniquely identifiable by its primary key.
c. This constraint prevents duplicate or NULL values in the primary key column(s).
II. Referential Integrity Constraint:
a. Also known as the foreign key constraint.
b. It ensures the consistency of relationships between tables by enforcing referential integrity.
c. This constraint ensures that values in a foreign key column must match values in the corresponding primary key column of the related table or be NULL.
d. It prevents orphaned records by restricting actions like deleting or updating records in the referenced table while related records exist in other tables.
III. Domain Integrity Constraint:
a. It ensures that data values stored in a column adhere to a defined set of permissible values or data types.
b. Domain integrity constraints can include data type constraints, range constraints, and check constraints.
c. Data type constraints enforce that data values stored in a column match the specified data type (e.g., integers, strings, dates).
d. Range constraints define allowable ranges of values for numeric data types (e.g., minimum and maximum values for integers).
e. Check constraints define custom conditions that data values must satisfy (e.g., ensuring that ages are greater than zero).
IV. Attribute or Column Constraints:
a. These constraints are specific to individual columns and define rules or conditions that the values in those columns must satisfy.
b. Examples include NOT NULL constraints, UNIQUE constraints, and DEFAULT constraints.
c. NOT NULL constraints ensure that a column does not contain NULL values.
d. UNIQUE constraints ensure that values in a column (or combination of columns) are unique across all rows in the table.
e. DEFAULT constraints specify default values for columns if no value is explicitly provided during insertion.
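
These four kinds of constraints map directly onto SQL constraint clauses. A minimal sketch, assuming illustrative table and column names:

CREATE TABLE department (
    dept_id   INT PRIMARY KEY,               -- entity integrity: unique and non-NULL
    dept_name VARCHAR(50) NOT NULL UNIQUE    -- column constraints
);

CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(100) NOT NULL,          -- NOT NULL constraint
    age      INT CHECK (age > 0),            -- domain integrity: check constraint
    status   VARCHAR(10) DEFAULT 'active',   -- DEFAULT constraint
    dept_id  INT REFERENCES department(dept_id)
             ON DELETE RESTRICT              -- referential integrity: no orphaned rows
);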

CRUD operations must be performed under this integrity policy so that the database is always consistent. CRUD stands for Create, Read, Update, and Delete: the four basic functions typically performed on data in a database or any persistent storage system. Here's a brief explanation of each CRUD operation:

a. Create:
○ The create operation involves adding new data or records to a database.
○ It typically involves inserting a new row or tuple into a table.
○ For example, creating a new user account, adding a new product to an inventory, or inserting a new entry into a log.
b. Read:
○ The read operation involves retrieving existing data or records from a database.
○ It allows users to access and view the stored data without making any changes to it.
○ Read operations can be simple queries to retrieve specific information or complex searches across multiple tables.
○ For example, retrieving user information based on a username, querying product details for a specific category, or fetching all entries from a log file within a certain time frame.
c. Update:
○ The update operation involves modifying existing data or records in a database.
○ It allows users to change the values of one or more attributes of a particular record.
○ Update operations typically involve specifying the record to be updated and providing new values for the attributes.
○ For example, updating a user's profile information, changing the quantity of a product in inventory, or editing the content of a document.
d. Delete:
○ The delete operation involves removing existing data or records from a database.
○ It permanently eliminates the specified records from the database.
○ Delete operations should be used with caution, as they can result in the loss of valuable data.
○ For example, deleting a user account, removing a product from inventory, or deleting outdated records from a log file.

These rules are introduced so that we do not accidentally corrupt the DB.
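
In SQL, the four CRUD operations correspond to INSERT, SELECT, UPDATE, and DELETE. A minimal sketch against an assumed users table:

-- Create: insert a new row.
INSERT INTO users (user_id, username, email)
VALUES (1, 'alice', 'alice@example.com');

-- Read: retrieve data without changing it.
SELECT username, email FROM users WHERE user_id = 1;

-- Update: modify attributes of an existing row.
UPDATE users SET email = 'alice@new-domain.com' WHERE user_id = 1;

-- Delete: remove the row permanently (use with caution).
DELETE FROM users WHERE user_id = 1;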

Functional Dependencies
A Functional Dependency in DBMS is a fundamental concept that describes the
relationship between attributes (columns) in a table. It shows how the values in one or
more attributes determine the value in another. In layperson's terms, it describes how data
in one column or set of columns can relate to data in another column. It helps to maintain
the quality of the data in DBMS.

X → Y
Here X determines Y.

To check whether X → Y holds, compare every pair of tuples t1 and t2:
if t1.X = t2.X, then t1.Y = t2.Y must also hold.
Roll_no Name Marks Department Course
1 Joy 78 CS C1
2 Ravi 60 EE C1
3 Joy 78 CS C2
4 Ravi 60 EE C3
5 Kevin 80 IT C3
6 Shyam 80 EC C2

Examples (X → Y):

Roll_no → Name
Here Roll_no determines Name. Every value of Roll_no is unique, so this is a functional dependency.

Name → Marks
This is a functional dependency: tuples that agree on Name (the two Joy tuples, the two Ravi tuples) also agree on Marks.

Name → Course
This is not a functional dependency. Name has the value Joy in two tuples, so t1.X = t2.X holds, but t1.Y = t2.Y does not: the two Joy tuples have different courses, C1 and C2. The same goes for Ravi.

(Name, Marks) → Course
Here t1.X = t2.X holds, since (Joy, 78) = (Joy, 78), but t1.Y = t2.Y does not, because C1 ≠ C2. So this is not a functional dependency.

(Name, Marks) → Department
This is a functional dependency.
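
This tuple-based check translates directly into SQL: X → Y fails exactly when some X value is associated with more than one Y value. A hedged sketch, assuming the table above is stored as a relation named student:

-- Does Name -> Course hold? List the Name values that violate it.
SELECT name
FROM student
GROUP BY name
HAVING COUNT(DISTINCT course) > 1;
-- Returns Joy and Ravi, so Name -> Course does not hold;
-- the same query with marks in place of course returns nothing,
-- confirming that Name -> Marks holds.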
Types of Functional Dependencies

If X → Y where Y ⊆ X, then it is called a trivial functional dependency.

Example: (Name, Marks) → Marks, where Marks is a subset of (Name, Marks), so it is a trivial functional dependency.

If X → Y where Y is not a subset of X, then it is called a non-trivial functional dependency; if in addition X ⋂ Y = ϕ, it is called completely non-trivial.

Examples: Name → Marks, and (Name, Marks) → (Marks, Department).

Armstrong Axioms / Inference Rules


1. Reflexivity rule: X → Y where Y ⊆ X.
2. Transitivity rule: if (X → Y and Y → Z) then X → Z.
Example: (Name → Marks) and (Marks → Department) give Name → Department.
3. Augmentation rule: if X → Y then XA → YA.
Example: if (Name → Marks) then (Name, Course) → (Marks, Course).
4. Union rule: if (X → Y and X → Z) then X → YZ.
Example: (Name → Marks) and (Name → Department) give Name → (Marks, Department).
5. Decomposition rule: if (X → YZ) then (X → Y and X → Z).
Example: Name → (Marks, Department) gives (Name → Marks) and (Name → Department).
6. Pseudotransitivity rule: if (X → Y and YZ → A) then XZ → A.
Example: (Roll_no → Name) and ((Name, Marks) → Department) give (Roll_no, Marks) → Department.
7. Composition rule: if X → Y and A → B then XA → YB.
NORMALIZATION
● Normalization is the process of organizing the data in a database efficiently.
● It involves breaking down a large table into smaller tables and defining relationships between them.
● The main goals of normalization are to eliminate redundancy, reduce data anomalies, and ensure data integrity.

Why do we need Normalization?

The main reason for normalizing relations is to remove these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that help you create a good database structure.

Data modification anomalies can be categorized into three types:

● Insertion Anomaly: an insertion anomaly occurs when one cannot insert a new tuple into a relation due to a lack of data.
● Deletion Anomaly: a deletion anomaly is the situation where deleting some data results in the unintended loss of other important data.
● Update Anomaly: an update anomaly occurs when updating a single data value requires multiple rows of data to be updated.

Types of Normal Forms
There are several levels of normalization, typically referred to as normal forms (NF), with
each subsequent normal form building upon the rules of the previous one. The most
commonly used normal forms are:

First Normal Form (1NF)


● A relation is in 1NF if every attribute value it contains is atomic.
● An attribute of a table cannot hold multiple values; it must hold only single values.

Example: The relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab

Second Normal Form (2NF)

● The relation should be in 1NF.
● There should be no partial dependency:
1. All non-prime attributes must be fully dependent on the PK.
2. Non-prime attributes cannot depend on a part of the PK.

STUDENT_PROJECT table:

STUD_ID PROJ_ID STUD_NAME PROJ_NAME
589 P07 Alex ChatBot
586 P02 Eva Cloud
576 P04 Dave IOT
592 P03 Robert Portfolio

PK: {STUD_ID, PROJ_ID}
FD: STUD_ID → STUD_NAME
PROJ_ID → PROJ_NAME

Here STUD_NAME and PROJ_NAME are non-prime attributes, whereas STUD_ID and PROJ_ID are prime attributes. We can see that the non-prime attributes depend on parts of the PK (STUD_ID → STUD_NAME and PROJ_ID → PROJ_NAME), which is partial dependency.

The decomposition of the STUDENT_PROJECT table into 2NF is shown below:

STUDENT table

PROJ_ID STUD_ID STUD_NAME
P07 589 Alex
P02 586 Eva
P04 576 Dave
P03 592 Robert

PROJECT table

PROJ_ID PROJ_NAME
P07 ChatBot
P02 Cloud
P04 IOT
P03 Portfolio
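
A hedged SQL sketch of this decomposition (column types are assumptions): after the split, each non-prime attribute depends on the whole key of its own table, so no partial dependency remains.

CREATE TABLE project (
    proj_id   VARCHAR(5) PRIMARY KEY,
    proj_name VARCHAR(50)                -- depends on the whole key proj_id
);

CREATE TABLE student (
    stud_id   INT PRIMARY KEY,
    stud_name VARCHAR(50),               -- depends on the whole key stud_id
    proj_id   VARCHAR(5) REFERENCES project(proj_id)
);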

Third Normal Form (3NF)

● The relation should be in 2NF.
● No transitive dependency exists:
1. A non-prime attribute must not determine another non-prime attribute.

Course_ID Course_Name Instructor Department
1 Math Smith Mathematics
2 History Jones History
3 Physics Kim Science
4 Biology Patel Science

PK: {Course_ID}
FD: Course_ID → Course_Name, Department
Course_Name → Instructor

Here a transitive dependency exists: Course_ID → Course_Name → Instructor, i.e., the non-prime attribute Course_Name determines the non-prime attribute Instructor. The decomposition into 3NF is shown below:
COURSE table

Course_ID Course_Name Department
1 Math Mathematics
2 History History
3 Physics Science
4 Biology Science

INSTRUCTOR table

Course_Name Instructor
Math Smith
History Jones
Physics Kim
Biology Patel

Boyce-Codd Normal Form (BCNF)

● The relation should be in 3NF.
● For every FD A → B, A must be a super key:
1. A prime attribute must not be derived from an attribute that is not a super key.

STUD_ID SUBJECT PROFESSOR
101 JAVA Alex
101 CPP Eva
102 JAVA Alex
103 C# Robert
104 JAVA Alex

PK: {STUD_ID, SUBJECT}
FD: {STUD_ID, SUBJECT} → PROFESSOR
PROFESSOR → SUBJECT

Here PROFESSOR is a non-prime attribute that determines the prime attribute SUBJECT, and PROFESSOR is not a super key, so the relation is not in BCNF. The decomposition is shown below:

STUD_ID P_ID
101 1
101 2
102 1
103 3
104 1

P_ID PROFESSOR SUBJECT
1 Alex JAVA
2 Eva CPP
3 Robert C#

Transaction in DBMS
A transaction is an action, or series of actions, performed by a single user or application to access or modify the contents of the database.
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account. This small transaction consists of several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of a Transaction:
Read(X): The read operation reads the value of X from the database and stores it in a buffer in main memory.

Write(X): The write operation writes the value back to the database from the buffer.

Let's take the example of a debit transaction on an account, which consists of the following operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before the start of the transaction is 4000.

● The first operation reads X's value from the database and stores it in a buffer.
● The second operation decreases the value of X by 500, so the buffer contains 3500.
● The third operation writes the buffer's value to the database, so X's final value is 3500.

But it may happen that, because of a hardware, software, or power failure, the transaction fails before all the operations in the set are finished.

For example, if the debit transaction above fails after executing operation 2, X's value will remain 4000 in the database, which is not acceptable to the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
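
In SQL, the transfer above is wrapped between BEGIN and COMMIT, and ROLLBACK undoes the work if anything fails. A minimal sketch (table and column names are assumptions; the syntax follows PostgreSQL-style SQL):

BEGIN;  -- start the transaction

UPDATE accounts SET balance = balance - 800 WHERE account_id = 'X';
UPDATE accounts SET balance = balance + 800 WHERE account_id = 'Y';

COMMIT;  -- save both updates permanently

-- If a failure occurs before COMMIT, issue ROLLBACK (or let the DBMS
-- roll back automatically) so that neither update takes effect:
-- the transfer is all-or-nothing.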

Transaction Properties
A transaction has four properties, which are used to maintain consistency in the database before and after the transaction:

1. Atomicity
2. Consistency
3. Isolation
4. Durability

Atomicity: All operations within a transaction are treated as a single unit of work. Either all operations are completed successfully, or none are. There is no partial completion.

Consistency: The integrity constraints are maintained, so that the database is consistent before and after the transaction.

Example: In transaction T1, A sends 100 to B. Before the transaction, the balance in A is 400 and in B is 500.

Read(A)   -- 400
A = A - 100
Write(A)  -- 300
Read(B)   -- 500
B = B + 100
Write(B)  -- 600
Commit

Before the transaction: 400 + 500 = 900
After the transaction: 300 + 600 = 900

Isolation: Each transaction is isolated from the others, which means one transaction cannot affect another while both are in progress.

Durability: Once a transaction is committed, its effects are permanently stored in the
database and cannot be undone, even in the event of a system failure.

States of Transaction
In a database, a transaction can be in one of the following states:

Active state

The active state is the first state of every transaction. In this state, the transaction is being executed; for example, inserting, deleting, or updating a record happens here. But the changes are still not saved to the database.
Partially committed

In the partially committed state, a transaction has executed its final operation, but the data is still not saved to the database.

For example, in a transaction that calculates a student's total marks, the final display of the total marks is executed in this state.

Committed

A transaction is said to be in the committed state if it has executed all its operations successfully. In this state, all its effects are permanently saved in the database system.

Failed state

If any of the checks made by the database recovery system fails, the transaction is said to be in the failed state.

In the total-marks example, if the database is unable to execute the query that fetches the marks, the transaction fails.

Aborted

If any check fails and the transaction has reached the failed state, the database recovery system makes sure the database is returned to its previous consistent state: it aborts, or rolls back, the transaction.

If the transaction fails in the middle, all the operations it has already executed are rolled back, restoring the consistent state that existed before the transaction started.

After aborting the transaction, the database recovery module selects one of two operations:

● Restart the transaction
● Kill the transaction

Schedule
A schedule is the chronological order in which the operations of multiple transactions are executed.

There are two main types of schedules in DBMS:


1. Serial Schedule:
○ In a serial schedule, transactions are executed one after the other in a sequential manner. Each transaction is fully completed before the next one begins.
○ Serial schedules ensure consistency and isolation of transactions, as there is no concurrency or interference between transactions.
○ While serial schedules are simple and easy to implement, they may lead to reduced performance, especially in systems with a high volume of transactions: because transactions execute sequentially, resource contention and long execution times can result.
2. Parallel Schedule:
○ In a parallel (non-serial) schedule, transactions are executed concurrently, with multiple transactions making progress at the same time.
○ Parallel schedules can improve performance and throughput by utilizing the available resources more efficiently and reducing overall execution time.
○ However, parallel schedules introduce challenges related to concurrency control and to ensuring the consistency and isolation of transactions.
○ Techniques such as concurrency control mechanisms (e.g., locking, timestamp ordering, optimistic concurrency control) are employed to manage concurrency and maintain the consistency and integrity of the database in parallel schedules.

Concurrency
Concurrency control deals with ensuring that transactions execute concurrently without interfering with each other in a way that could lead to inconsistencies or incorrect results. If transactions T1 and T2 run concurrently, their operations are interleaved: first part of T1 executes, then part of T2, then T1 again, and so on.

Disadvantages of Concurrency

Dirty Read Problem / Write-Read Conflict: The dirty read problem occurs when a transaction reads data that has been updated by another transaction that is still uncommitted.

Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000. The sequence of read/write operations is:

Transaction A: Read(DT)   -- reads 1000
Transaction A: Write(DT)  -- updates DT to 1500 in its buffer (uncommitted)
Transaction B: Read(DT)   -- reads the uncommitted value 1500
Transaction B: Commit     -- DT permanently becomes 1500 in DB
Transaction A: Rollback   -- a server error forces A back to 1000

Transaction A reads the value of DT as 1000 and modifies it to 1500, which is stored in a temporary buffer. Transaction B reads DT as 1500 and commits it, so the value of DT in the database DB is permanently changed to 1500. Then a server error occurs in transaction A and it must be rolled back to its initial value, 1000: B has read a value that was never validly committed, and the dirty read problem occurs.
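
A hedged sketch of how this plays out across two SQL sessions, and how the isolation level prevents it (PostgreSQL-style syntax; table and column names are assumptions). Under READ COMMITTED, B simply never sees A's uncommitted value; systems that actually honor READ UNCOMMITTED would permit the dirty read described above.

-- Session A
BEGIN;
UPDATE data SET dt = 1500 WHERE id = 1;          -- DT is 1500, but uncommitted

-- Session B, while A is still open
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- dirty reads are not allowed
SELECT dt FROM data WHERE id = 1;                -- sees the committed value 1000
COMMIT;

-- Session A
ROLLBACK;                                        -- DT returns to 1000; B never saw dirty data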

Incorrect Summary Problem: The incorrect summary problem occurs when a transaction computes an aggregate (for example, a sum) over data while another transaction changes one of the values being aggregated.

Example: Consider two transactions A and B performing read/write operations on two data items DT1 and DT2 in the database DB. The current value of DT1 is 1000 and of DT2 is 2000. The sequence of operations is:

Transaction A: Read(DT1)  -- reads 1000 and starts computing SUM(DT1, DT2)
Transaction B: Write(DT2) -- changes DT2 from 2000 to 2500 and commits
Transaction A: Read(DT2)  -- reads the modified value 2500

Transaction A reads the value of DT1 as 1000 and uses an aggregate function SUM to calculate the total of DT1 and DT2 in a variable add, but in between, the value of DT2 is changed from 2000 to 2500 by transaction B. The variable add uses the modified value of DT2 and gives the resulting sum as 3500 instead of 3000.
