DBMS Units 1-4 Notes
DATA
Data can be defined as the representation of facts, concepts, or instructions in a formalized manner, suitable for communication, interpretation, or processing by humans or electronic machines.
Data is represented with the help of characters such as letters (A to Z, a to z), digits (0-9), or special characters (+, =, *, <, >, /, \).
RECORD
Record is a collection of related data items.
For example- payroll record for an employee contain such data fields like name, age,
qualifications, DA, HRA, PF.
INFORMATION
Information is processed data that is meaningful and relevant. It results from
organizing, analysing, and interpreting data to provide context and value.
Example: In the university database, information might include the average score of
all BCA101 students, which is derived from data by performing calculations on
exam scores.
CHARACTERISTICS OF INFORMATION
1. Timely
2. Accurate
3. Complete
4. Given to the right person
5. Purposeful
FILES
Files are collections of related records. They are used to store and manage data in a
structured manner within a database system. Files can be thought of as tables in a
database, each containing a set of records.
Example: In the university database, you might have files or tables for students,
courses, instructors, and grades. The "Students" file would contain records of all
students, while the "Courses" file would store information about various courses
offered.
DATABASE
The database approach involves the use of a Database Management System (DBMS) to
manage and store data. In this approach, data is organized into a centralized repository with
structured relationships, ensuring data consistency, security, and integrity.
CHARACTERISTICS OF DATABASE
1. Centralization
2. Query Language
3. Data Independence
4. Recovery
5. Security
6. Concurrent Access
7. Integrity
8. Sharing
ADVANTAGES OF DATABASE SYSTEM
1. Data Centralization: All data is stored in one central location, making it easier
to manage and access data efficiently.
2. Data Consistency: Database systems enforce data integrity constraints, ensuring that
data remains accurate and consistent.
3. Data Security: DBMS provides security features like authentication and
authorization to control who can access and modify data.
4. Data Sharing: Multiple users and applications can access and share data
simultaneously, promoting data integration and collaboration.
5. Data Independence: Changes to the database structure can be made without
affecting the application programs that use the data (data and program
independence).
6. Concurrent Access: DBMS handles concurrent access by multiple users, ensuring data availability and consistency.
7. Backup and Recovery: Robust backup and recovery mechanisms are in place
to prevent data loss in case of system failures.
8. Query Language: A query language (e.g., SQL) allows users to retrieve, manipulate, and analyse data easily; a short sketch follows this list.
9. Data Relationships: Database systems support the establishment of relationships
between data in different tables, facilitating complex queries and data modelling.
10. Scalability: DBMS can be scaled to accommodate growing data and user loads.
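As a minimal sketch of that query-language advantage, a single declarative statement expresses a complete retrieval. The STUDENT table and its columns here are illustrative assumptions, not from these notes:
Code:-
-- Fetch adult students, sorted by name, in one declarative statement.
SELECT name, age FROM STUDENT WHERE age >= 18 ORDER BY name;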
3. SYSTEM ANALYST
A system analyst is a user who analyses the requirements of parametric end users.
They check whether all the requirements of the end users are satisfied.
4. SOPHISTICATED USERS
Sophisticated users can be engineers, scientists, or business analysts who are familiar with the database.
They can develop their own database applications according to their requirements.
5. DATABASE DESIGNER
Database designers are the users who design the structure of the database, which includes tables, indexes, and stored procedures.
6. APPLICATION PROGRAMMERS
Application programmers, also referred to as system analysts or simply software engineers, are the back-end programmers who write the code for the application programs.
7. CASUAL USERS
Casual or temporary users access the database only occasionally, when they need particular information.
Example - middle- or higher-level managers.
DISADVANTAGES OF DBMS
1. Complexity: DBMS systems can be complex to set up and manage.
Database administrators require specialized training and expertise to ensure
optimal performance and security.
2. Cost: Implementing and maintaining a DBMS can be expensive. Costs include
software licenses, hardware, personnel training, and ongoing maintenance expenses.
3. Performance Overhead: DBMS systems introduce performance overhead due to
query processing, indexing, and data management. In some cases, complex queries
may take longer to execute compared to simpler file-based systems.
4. Data Size: Large databases may require significant storage resources and can lead to
increased hardware costs. Additionally, managing and backing up large volumes of
data can be time-consuming.
5. Complex Backup and Recovery: While DBMS systems offer backup and
recovery mechanisms, implementing and managing them effectively can be
complex. This complexity may lead to data loss if not properly configured.
6. Vendor Lock-In: Organizations may become locked into a specific DBMS
vendor's technology, making it challenging to switch to a different system in the
future.
7. Resource Consumption: Database systems can consume significant
system resources, including CPU, memory, and storage. This may affect the
overall performance of the host system.
8. Data Redundancy: While DBMS systems aim to reduce data redundancy, it is still
possible to have some redundancy in certain situations, leading to increased storage
requirements.
9. Data Security Risks: While DBMS systems offer security features, they can still be
vulnerable to security breaches if not properly configured and maintained. Security
risks include unauthorized access and data breaches.
10. Data Isolation: In some cases, data in a DBMS may be isolated and not easily
shared or integrated with other systems, leading to data silos.
File-based systems were an early attempt to computerize the manual system.
This is also called the traditional approach, in which a decentralized approach is taken: each department stores and controls its own data with the help of its own data processing.
These roles work collaboratively to ensure that data is effectively managed, secured, and made available to support the needs of an organization. Effective communication and coordination among these roles are essential for the successful operation of the database environment.
UNIT-II
External Level:-
An external schema provides a customized and simplified view of the database for a particular group of users or applications.
It acts as a layer of abstraction that hides the complexity of the underlying database schema.
The external schema is designed to meet the requirements of a specific group of users or
applications.
Conceptual Level:-
A conceptual schema is a high-level, abstract representation of the entire database, describing
entities, their relationships, and constraints, providing a clear, conceptual understanding of the
data organization.
Physical Level:-
The physical schema, or physical level, represents how data is stored, indexed, and organized at
the lowest level, focusing on disk storage, file structures, access paths, and hardware
considerations within a database system.
The physical schema describes how data is physically stored on the underlying storage
devices such as hard drives, SSDs, or other storage media
It defines the actual access paths and mechanisms used to retrieve and manipulate data
efficiently.
The physical schema is closely tied to the hardware and operating system of the underlying
computing environment.
Data Independence:-
Data independence is the ability to modify the database schema at one level (e.g., logical) without
affecting the schema, applications, or processes at another level (e.g., physical), enhancing system
adaptability, flexibility, and maintenance.
Physical data independence refers to the capacity to alter the physical storage and access
mechanisms without affecting the conceptual or external schemas. This is essential for
efficiency, as changes at the physical level can be made transparently to applications and
higher-level schemas.
Key points include:
Physical Data Independence: Changes to physical storage (e.g., indexing, storage mechanisms) do not affect the logical or external schemas. It is important for optimizing database performance, storage, and retrieval.
Logical Data Independence: Changes to the logical schema (e.g., table structure, relationships) do not affect external schemas, applications, or user views. It is important for ensuring flexibility in data representation and ease of modification, enhancing adaptability and maintenance.
Classification of DBMS:-
Database Management Systems (DBMS) can be classified based on various criteria, including
data model, users, purpose, and sites. Here's a classification based on these criteria:
Based on user:-
1. Single User:- In a single-user DBMS, the database is stored on a single computer and accessed by only one user at a time.
2. Multi User:- The database is stored on a single system and accessed by multiple users simultaneously.
Based on Purpose:-
1. General Purpose:- A database which provides data for general-purpose applications is called a general-purpose DBMS.
Example: Oracle
2. Special Purpose:- Designed and built for special-purpose applications.
Example: Airline reservation, banks
Based on no of sites :-
1. Centralized: Data is stored at a single computer site. A centralized database stores data in a single location; it is maintained and modified from that location only, and it is usually accessed over a network connection such as a LAN or MAN.
(Diagram: a centralized database accessed from several PCs, and a distributed database whose sites are connected by a communication network.)
2. Distributed: Data is stored at multiple sites connected by a communication network. Distributed databases are further classified as heterogeneous or homogeneous.
1. Heterogeneous Database :-
In this, different sites can use different schemas and software, which can lead to problems in query processing.
Different operating systems and different applications are used.
The sites may even use different data models for the database.
2. Homogenous Database:-
In a homogeneous database, all sites store the database identically.
The operating system, DBMS, and data structures used are the same at all sites.
(Diagram: an example College database with Library, CSE Dept., and Student, and attributes such as Roll No, Name, EMP ID, and Salary.)
B. Relational Data Model :
In a relational DBMS, data is stored in the form of rows and columns that together form a table.
A relational DBMS uses SQL for storing and manipulating as well as maintaining the data.
(Diagram: database system architecture - end users at terminals interact with application programs through a GUI interface, and the application programs access the database through the database system.)
Data Models:-
(Diagram: classification of data models, including hierarchical, network, relational, and functional data models.)
1. Hierarchical Model:-
In the hierarchical model, data are represented by collections of records.
In this model, relationships among the data are represented by links.
A tree data structure is used in this model.
It was developed in the 1960s by IBM to manage large amounts of data.
The basic logical structure of the hierarchical data model is an upside-down tree.
Advantages:-
A. Simplicity
B. Data Integrity
C. Data Security
D. Data Efficiency
E. Easy Availability
Disadvantages:-
A. Complexity
B. Flexibility
C. Lack of data independence.
(Diagram: a hierarchical tree rooted at an Employee record.)
2. Network Model:-
In the network model, data are represented by collections of records.
In this model, relationships among the data are represented by links.
Graph data structures are used in this model.
It permits a record to have more than one parent.
Advantages:-
1. Data Integrity
2. Database Standards
Disadvantages:-
1. System Complexity
(Diagram: a network model example in which a record has more than one parent, e.g., Project 1 and Project 2.)
Advantages
1. Easy To Design
2. Easy to Manage
Disadvantages
1. Ease of design can result in a bad design.
UNIT-III
Component of ER Diagram
1. Entity:
An entity may be any object, class, person, or place. In the ER diagram, an entity is represented as a rectangle.
1. Strong Entity – A strong entity is an entity type that has a key attribute. It
doesn't depend on other entities in the schema. A strong entity always has a
primary key, and it is represented by a single rectangle in the ER diagram.
2. Weak Entity – A weak entity type doesn't have a key attribute, so we cannot uniquely identify its entities by their attributes alone. Therefore, a foreign key must be used in combination with its attributes to create a primary key. Weak entity types are so called because they can't be identified on their own; a weak entity relies on a strong entity for its unique identity. A weak entity is represented by a double-outlined rectangle in ER diagrams.
The relationship between a weak entity type and a strong entity type is shown with
a double-outlined diamond instead of a single-outlined diamond. This
representation can be seen in the image given below.
Entity Sets:
Entity set: a group of entities of a similar kind, i.e., entities with attributes that share similar values. For example, a car entity set, an animal entity set, a bank account entity set, and so on.
Attribute:
Attributes are the characteristics or properties which define the entity type. In ER
diagram, the attribute is represented by an oval.
For example, here id, Name, Age, and Mobile No are the attributes that define
the entity type Student. There are five types of attributes:
4. Derived attribute: Derived attributes in DBMS are the ones that can be
derived from other attributes of an entity type. The derived attributes are
represented by a dashed oval symbol in the ER diagram.
Relationship
4. Many to Many Relationships: If multiple instances of the entity on the left are linked by the relationship to multiple instances of the entity on the right, this is considered a many-to-many relationship. For example, one employee can be assigned many projects, and one project can be assigned to many employees.
Features of ER
The features of ER Model are as follows −
ER Diagram: ER diagrams are the diagrams that are sketched out to design
the database. They are created based on three basic
concepts: entities, attributes, and relationships between them. In ER diagram
we define the entities, their related attributes, and the relationships between
them. This helps in illustrating the logical structure of the databases.
Database Design: The Entity-Relationship model helps the database
designers to build the database in a very simple and conceptual manner.
Graphical Representation helps in Better Understanding: ER diagrams
are very easy and simple to understand and so the developers can easily use
them to communicate with stakeholders.
Easy to build: The ER model is very easy to build.
The extended E-R features: Some of the additional features of the ER model are specialization, upper- and lower-level entity sets, attribute inheritance, aggregation, and generalization.
Integration of ER model: This model can be integrated into a common
dominant relational model and is widely used by database designers for
communicating their ideas.
Simplicity and various applications of ER model: It provides a preview of
how all your tables should connect, and what fields are going to be on each
table, which can be used as a blueprint for implementing data in specific
software applications.
What is Data Abstraction?
Data abstraction refers to the process of simplifying complex data structures or systems by focusing on the essential aspects and hiding unnecessary details. It involves representing data and its operations at a higher level of abstraction, making it easier to understand and work with.
The main purpose of this is to hide unnecessary details and provide an abstract view of the data for the end user.
LEVELS OF ABSTRACTION
1.PHYSICAL LEVEL:
2. LOGICAL LEVEL:
3. EXTERNAL LEVEL:
NETWORK MODEL:
The network model was created to represent complex data relationships more
effectively when compared to hierarchical models, to improve database
performance and standards.
It has entities which are organized in a graphical representation and some entities
are accessed through several paths. A User perceives the network model as a
collection of records in 1:M relationships.
Tuple − A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
Relation key − Each row has one or more attributes, known as the relation key, which can identify the row in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as the attribute domain.
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called Relational Integrity Constraints. There are three main integrity constraints −
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely. This minimal subset of attributes is called the key for that relation. If there is more than one such minimal subset, they are called candidate keys.
Key constraints force that −
in a relation with a key attribute, no two tuples can have identical values for the key attributes;
a key attribute cannot have NULL values.
Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. The same kinds of constraints are applied to the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.
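A minimal SQL sketch of how these three constraint types can be declared; the table and column names are illustrative assumptions:
Code:-
-- Key constraint: dept_id uniquely identifies each department
CREATE TABLE DEPARTMENT (
    dept_id INT PRIMARY KEY
);
-- Domain constraint: age must be a non-negative integer
-- Referential integrity: dept_id must match an existing department
CREATE TABLE EMPLOYEE (
    emp_id  INT PRIMARY KEY,
    age     INT CHECK (age >= 0),
    dept_id INT REFERENCES DEPARTMENT(dept_id)
);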
In the 1980s and '90s, relational databases grew increasingly dominant, delivering rich indexes to make any query efficient. Table joins, the term for read operations that pull together separate records into one, and transactions, meaning combinations of reads and especially writes spread across the database, were essential. SQL, the Structured Query Language, became the language of data, and software developers learned to use it to ask for what they wanted and let the database decide how to deliver it. Strict guarantees were engineered into the database to prevent surprises.
Important Terminologies
Here are some Relational Model concepts in DBMS:
Tables – In the case of the relational model, all relations are saved in the
table format, and it is stored along with the entities. A table consists of two
properties: columns and rows. While rows represent records, the columns
represent attributes.
Degree: It refers to the total number of attributes that are there in the
relation. The EMPLOYEE relation defined here has degree 5.
Relation Schema: It represents the relation’s name along with its attributes.
E.g., EMPLOYEE (ID_NO, NAME, ADDRESS, ROLL_NO, AGE) is the
relation schema for EMPLOYEE. If a schema has more than 1 relation, then
it is known as Relational Schema.
Column: It represents the set of values for a certain attribute. The column
ID_NO is extracted from the relation EMPLOYEE.
Cardinality: It refers to the total number of rows present in the given table.
The EMPLOYEE relation defined here has cardinality 4.
Relation instance – It refers to a finite set of tuples present in the RDBMS
system. A relation instance never has duplicate tuples.
Attribute domain – Every attribute has some predefined value and scope,
which is known as the attribute domain.
Relation key – Every row has one or more attributes that can identify the row in the relation uniquely; these attributes are known as a relation key.
NULL Values: The value that is NOT known or the value that is unavailable
is known as a NULL value. This null value is represented by the blank
spaces. E.g., the MOBILE of the EMPLOYEE having ID_NO 4 is NULL.
Relational database structure
The database and the database structure are defined in the installation process.
The structure of the database depends on whether the database is Oracle Database, IBM® DB2®, or Microsoft SQL Server.
A relational database contains:
a set of system catalog tables that describe the logical and physical structure of the data;
a configuration file containing the parameter values allocated for the database;
a recovery log with ongoing transactions and archivable transactions.
DATABASE RELATION:
A relational database collects different types of data sets organized into tables, records, and columns. It creates well-defined relationships between database tables so that data can be stored and retrieved easily. Examples of relational databases include Microsoft SQL Server, Oracle Database, MySQL, etc.
One to One Relationship (1:1): It is used to create a relationship between two tables in which a single row of the first table can be related to one and only one record of the second table. Similarly, a row of the second table can be related to only one row of the first table.
For example, each person has one passport, and each passport belongs to exactly one person.
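A sketch of how a 1:1 relationship can be enforced in SQL; the UNIQUE foreign key is what limits each person to one passport (table names and column types are illustrative assumptions):
Code:-
CREATE TABLE PERSON (
    person_id INT PRIMARY KEY,
    name      VARCHAR(50)
);
CREATE TABLE PASSPORT (
    passport_no VARCHAR(9) PRIMARY KEY,
    person_id   INT UNIQUE REFERENCES PERSON(person_id)  -- UNIQUE makes the link 1:1
);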
Properties of relations:
1. Each table in a database has a unique identity (name).
2. Any entry at the intersection of each row and column has a single value. There can be only one value associated with each attribute on a specific row of a table; no multivalued attributes are allowed in a relation.
3. Each row is unique; no two rows in the same relation can be identical.
4. Each attribute (or column) within a table has a unique name.
5. The sequence of columns (left to right) is insignificant. The order of the columns in a relation can be changed without changing the meaning or use of the relation.
6. The sequence of rows (top to bottom) is insignificant. As with columns, the order of the rows of a relation may be changed or stored in any sequence.
KEYS:
o Keys play an important role in the relational database.
o They are used to uniquely identify any record or row of data in a table. They are also used to establish and identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each
student. In the PERSON table, passport_number, license_number, SSN are keys
since they are unique for each person.
Types of keys:
1. Primary key
o It is the first key used to identify one and only one instance of an entity
uniquely. An entity can contain multiple keys, as we saw in the PERSON
table. The key which is most suitable from those lists becomes a primary
key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for
each employee. In the EMPLOYEE table, we can even select
License_Number and Passport_Number as primary keys since they are also
unique.
o For each entity, the primary key selection is based on requirements and
developers.
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify
a tuple.
o Except for the primary key, the remaining attributes are considered a
candidate key. The candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The
rest of the attributes, like SSN, Passport_Number, License_Number, etc., are
considered a candidate key.
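In SQL, the chosen candidate key is declared as the PRIMARY KEY and the remaining candidate keys can be declared UNIQUE. A minimal sketch of the EMPLOYEE table described above (column types are assumptions):
Code:-
CREATE TABLE EMPLOYEE (
    ID              INT PRIMARY KEY,      -- the chosen primary key
    Name            VARCHAR(50),
    SSN             CHAR(11) UNIQUE,      -- remaining candidate keys
    Passport_Number VARCHAR(20) UNIQUE,
    License_Number  VARCHAR(20) UNIQUE
);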
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a
superset of a candidate key.
5. Alternate key
6. Composite key
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These
keys are created when a primary key is large and complex and has no relationship
with many other relations. The data values of the artificial keys are usually
numbered in a serial order.
For example, the primary key, which is composed of Emp_ID, Emp_role, and
Proj_ID, is large in employee relations. So it would be better to add a new virtual
attribute to identify each tuple in the relation uniquely.
DOMAIN:
The data type defined for a column in a database is called a database domain. This
data type can either be a built-in type (such as an integer or a string) or a custom
type that defines data constraints.
To understand this more effectively, let's think like this :
A database schema has a set of attributes, also called columns or fields, that define
the database. Each attribute has a domain that specifies the types of values that can
be used, as well as other information such as data type, length, and values.
Creating a Domain:
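A domain can be created with a statement along the following lines. CREATE DOMAIN is standard SQL (supported by PostgreSQL, among others); exact syntax varies by database, and the name and size below simply match the description that follows:
Code:-
-- A domain for ten-digit contact numbers that disallows NULL values.
CREATE DOMAIN C_Number AS NUMERIC(10) NOT NULL;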
The above statement, for example, creates a C_Number domain of ten digits for storing contact numbers, in which it is not possible to use a NULL or an unknown value.
There are different types of data integrity constraints that are commonly found in
relational databases, including the following −
Relational Algebra
Relational algebra refers to a procedural query language that takes relation instances as input and
returns relation instances as output. It performs queries with the help of operators. A binary or
unary operator can be used. They take in relations as input and produce relations as output.
Types of Relational Operations
1. Select (σ)
2. Project (∏)
3. Union (𝖴)
4. Set Difference (-)
5. Set Intersection (∩)
6. Cartesian product (X)
7. Rename (ρ)
1. Select (σ)
Select operation is done by the Selection Operator, which is represented by "sigma" (σ). It is used to retrieve those tuples (rows) from a relation that satisfy a given condition. It is also known as horizontal partitioning, as it selects rows from the table. It is a unary operator.
Notation : σ p(r)
Where σ is used to represent SELECTION
r is used to represent RELATION
p is the predicate or selection condition
Query:
Output:
2. Project (∏)
Project operation is done by Projection Operator which is represented by "pi"(∏). It is used to
retrieve certain attributes(columns) from the table. It is also known as vertical partitioning as it
separates the table vertically. It is also a unary operator.
Notation : ∏ a(r)
Where ∏ is used to represent PROJECTION
r is used to represent RELATION
a is the attribute list
Syntax of Project Operator (∏)
Output:
3. Union (𝖴)
Union operation is done by Union Operator which is represented by "union"(𝖴). It is the same as
the union operator from set theory, i.e., it selects all tuples from both relations but with the
exception that for the union of two relations/tables both relations must have the same set of
Attributes. It is a binary operator as it requires two operands.
Notation: R 𝖴 S
Where R is the first relation
S is the second relation
If the relations do not have the same set of attributes, the union operation is invalid.
Syntax of Union Operator (𝖴)
Query:
Output:
5. Intersection (∩)
Intersection operation is done by Intersection Operator which is represented by
"intersection"(∩).It is the same as the intersection operator from set theory, i.e., it selects all the
tuples which are present in both relations. It is a binary operator as it requires two operands.
Also, it eliminates duplicates.
Notation : R ∩ S
Where R is the first relation
S is the second relation
Syntax of Intersection Operator (∩)
Table 2: STUDENT
Query:
Output:
6. Cartesian Product (X)
Table 2: S
Query:
Let's find the Cartesian product of tables R and S.
Output:
Note: The number of rows in the output will always be the cross product of the number of rows in each table. In our example, table 1 has 3 rows and table 2 has 3 rows, so the output has 3×3 = 9 rows.
7. Rename (ρ)
Rename operation is denoted by "rho" (ρ). As its name suggests, it is used to rename the output relation. The rename operator is a unary operator: it operates on a single relation.
Notation: ρ(R,S)
Where R is the new relation name
S is the old relation name
Rename (ρ) Syntax:
Query:
Output:
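For comparison, each of the operators above has a close SQL counterpart. A sketch using a hypothetical STUDENT(id, name, age) table and an ALUMNI table with the same attributes (note that INTERSECT is not supported by every engine):
Code:-
SELECT * FROM STUDENT WHERE age > 18;     -- Select: σ age>18 (STUDENT)
SELECT name FROM STUDENT;                 -- Project: ∏ name (STUDENT)
SELECT id, name FROM STUDENT
UNION
SELECT id, name FROM ALUMNI;              -- Union of two union-compatible relations
SELECT id, name FROM STUDENT
INTERSECT
SELECT id, name FROM ALUMNI;              -- Intersection
SELECT * FROM STUDENT CROSS JOIN ALUMNI;  -- Cartesian product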
Relational Calculus
Relational calculus is a non-procedural query language and has no description of how the query will work or how the data will be fetched. It focuses only on what to do, not on how to do it.
Types of Relation calculus
1. Tuple Relational Calculus (TRC) :- Tuple Relational Calculus in DBMS uses a tuple
variable (t) that goes to each row of the table and checks if the predicate is true or false for the
given row. Depending on the given predicate condition, it returns the row or part of the row.
Syntax
{T | P (T)} or {T | Condition (T)}
Example
Table: Student
First_Name Last_Name Age
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28
Query:
{ t.Last_Name | Student(t) AND t.age > 30 }
Output:
Last_Name
Singh
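For comparison, the same query expressed in SQL:
Code:-
SELECT Last_Name FROM Student WHERE Age > 30;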
2. Domain Relational Calculus (DRC) :- Domain Relational Calculus uses domain
Variables to get the column values required from the database based on the predicate expression
or condition.
Syntax
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Example:-
First_Name Last_Name Age
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28
Query
{ <First_Name, Age> | <First_Name, Last_Name, Age> ∈ Student ∧ Age > 27 }
Note:
The symbols used for logical operators are: ∧ for AND, ∨ for OR, and ¬ for NOT.
Output
First_Name Age
Ajeet 30
Chaitanya 31
Carl 28
Functional Dependencies
In relational database management, a functional dependency is a concept that specifies the relationship between two sets of attributes, where one set determines the value of the other. It is denoted as X → Y, where the attribute set on the left side of the arrow, X, is called the determinant, and Y is called the dependent.
Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule
Example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the
augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence {roll_no, name, dept_name}
→ {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid
by the Transitivity rule.
Example, roll_no → dept_name & dept_name → dept_building, then roll_no →
dept_building is also valid.
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
The following table, with attributes (roll_no, name, age), illustrates the first three types:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency, since name is a subset of {roll_no, name}; roll_no → name is a non-trivial functional dependency, since name is not a subset of {roll_no}; and roll_no → {name, age} is a multivalued functional dependency, since name and age do not depend on each other.
For transitive dependency, consider the table below with attributes (enrol_no, name, dept, building_no):
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect
functional dependency, hence called Transitive functional dependency.
Advantages of Functional Dependencies
1. Data Normalization:- Data normalization is the process of organizing data in a database
in order to minimize redundancy and increase data integrity.
2. Query Optimization:- With the help of functional dependencies we are able to decide the connectivity between the tables and which attributes need to be projected to retrieve the required data from the tables.
3. Consistency of Data:- Functional dependencies ensure the consistency of the data by removing any redundancies or inconsistencies that may exist in the data.
4. Data Quality Improvement:- Functional dependencies ensure that the data in the database is accurate, complete, and up to date.
Anomalies
Anomalies are problems or inconsistencies that arise during operations performed on a table. They can occur for many reasons: for example, when data is stored multiple times unnecessarily (i.e., redundant data is present), or when all the data is stored in a single table. Normalization is used to overcome anomalies. The different types of anomalies are insertion, deletion, and update anomalies.
Type of Anomalies
1. Update
2. Insert
3. Delete
Input
The same input is used for all three anomalies.
Student table (ID, NAME, AGE, BRANCH, HOD_ID, HOD_NAME)
1. Insertion Anomaly:-
When certain data or attributes cannot be inserted into the database without the presence of other
data, it's called insertion anomaly.
For example, let's take a branch name petroleum, now the data regarding petroleum cannot be
stored in the table unless we insert a student which is in petroleum.
Code
INSERT INTO STUDENT VALUES (3, 'G', 16, 'PETROLEUM', 104, 'NAMAN');
SELECT * FROM STUDENT;
Output
2. Deletion anomaly:-
If we delete any data from the database and any other information which is required also gets
deleted with that deletion, then it is called deletion anomaly.
For example, suppose a student of the electrical branch is leaving, so we have to delete that student's data. The problem is that if we delete the student's data, the branch data will also get deleted along with it, as there is only one student through which the branch data is present.
Code
DELETE FROM STUDENT WHERE BRANCH = 'ELECTRICAL';
SELECT * FROM STUDENT;
Output
3. Updation/modification anomaly:-
If we want to update any single piece of data, then we have to update all other copies as well; otherwise the table becomes inconsistent. This is the update anomaly.
For example, suppose we need to change the HOD name for the civil branch. As per the requirement, only a single value is to be changed, but we have to change it at every place it appears so as not to make the table inconsistent.
Code:-
UPDATE STUDENT          #table selected to perform the task
SET HOD_NAME = 'RAHUL'  #change to be made
WHERE BRANCH = 'CIVIL'; #condition given
SELECT * FROM STUDENT;  #data selected
Output:-
employee_roles Table
EMPLOYEE_ID JOB_CODE
E001 J01
E001 J02
E002 J02
E002 J03
E003 J01
employees Table
EMPLOYEE_ID NAME STATE_CODE HOME_STATE
jobs table
JOB_CODE JOB
J01 Chef
J02 Waiter
J03 Bartender
home_state is now dependent on state_code. So, if you know the state_code, then you can find
the home_state value.
To take this a step further, we should separate them again to a different table to make it 3NF.
3. The Third Normal Form – 3NF
When a table is in 2NF, it eliminates repeating groups and redundancy, but it does not eliminate transitive dependency.
A transitive dependency means a non-prime attribute (an attribute that is not part of the candidate key) is dependent on another non-prime attribute. This is what the third normal form (3NF) eliminates.
For a table to be in 3NF, it must:
be in 2NF
have no transitive dependency.
employee_roles Table
EMPLOYEE_ID JOB_CODE
E001 J01
E001 J02
E002 J02
E002 J03
E003 J01
employees Table
jobs Table:-
JOB_CODE JOB
J01 Chef
J02 Waiter
J03 Bartender
states Table
STATE_CODE HOME_STATE
26 Michigan
56 Wyoming
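A sketch of these 3NF tables in SQL (column types are assumptions; the foreign keys make the remaining dependencies explicit):
Code:-
CREATE TABLE states (
    state_code INT PRIMARY KEY,
    home_state VARCHAR(30)
);
CREATE TABLE employees (
    employee_id CHAR(4) PRIMARY KEY,
    name        VARCHAR(50),
    state_code  INT REFERENCES states(state_code)
);
CREATE TABLE jobs (
    job_code CHAR(3) PRIMARY KEY,
    job      VARCHAR(30)
);
CREATE TABLE employee_roles (
    employee_id CHAR(4) REFERENCES employees(employee_id),
    job_code    CHAR(3) REFERENCES jobs(job_code),
    PRIMARY KEY (employee_id, job_code)   -- each employee-job pair appears once
);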
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
364 UK
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
4. Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when,
1.It is in the Boyce-Codd Normal Form.
2. And, it doesn't have Multi-Valued Dependency.
Example where a table is used to store the Roll Numbers and Names of the students enrolled in
a university.
ROLL_NO STUDENT
901 Armaan
902 Ashutosh
903 Baljeet
904 Bhupinder
STUDENT table:
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities; there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-
valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Example – Consider the above schema, with a case as “if a company makes a product and an
agent is an agent for that company, then he always sells that product for the company”. Under
these circumstances, the ACP table is shown as:
Table ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is again decomposed into 3 relations. Now, the natural Join of all three
relations will be shown as:
Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
The result of the Natural Join of R1 and R3 over 'Company', and then the Natural Join of R13 and R2 over 'Agent' and 'Product', will be Table ACP.
SQL
Data type: Data types are used to represent the nature of the data that can be stored in the
database table. For example, in a particular column of a table, if we want to store a string type of
data then we will have to declare a string data type of this column.
Data types are mainly classified into three categories for every database.
char(n): A fixed-width character string data type; its size can be up to 8000 characters. Storage: defined width.
varchar(n): A variable-width character string data type; its size can be up to 8000 characters. Storage: 2 bytes + number of chars.
varchar(max): A variable-width character string data type; its size can be up to 1,073,741,824 characters. Storage: 2 bytes + number of chars.
text: A variable-width character string data type; its size can be up to 2 GB of text data. Storage: 4 bytes + number of chars.
nchar: A fixed-width Unicode string data type; its size can be up to 4000 characters. Storage: defined width x 2.
binary(n): A fixed-width binary string data type; its size can be up to 8000 bytes.
varbinary: A variable-width binary string data type; its size can be up to 8000 bytes.
float(n): A floating precision number from -1.79E+308 to 1.79E+308. The n parameter indicates whether the field should hold 4 or 8 bytes; the default value of n is 53. Storage: 4 or 8 bytes.
datetime2: A date and time combination. It supports the range January 1, 0001 to December 31, 9999, with an accuracy of 100 nanoseconds. Storage: 6-8 bytes.
date: Stores a date only. It supports the range January 1, 0001 to December 31, 9999. Storage: 3 bytes.
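A short sketch that uses several of the types above (SQL Server flavour, matching the table; the table and column names are illustrative):
Code:-
CREATE TABLE demo_types (
    code    CHAR(5),       -- fixed-width string
    note    VARCHAR(100),  -- variable-width string
    price   FLOAT(24),     -- floating precision, held in 4 bytes
    created DATE,          -- date only
    stamp   DATETIME2      -- date and time
);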
UPDATE Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Note: Be careful when updating records. If you omit the WHERE clause, ALL records will be updated!
CustomerID CustomerName ContactName Address City PostalCode Country
The following SQL statement updates the first customer (CustomerID = 1) with a new contact
person and a new city.
Code:- UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;
Table:-
CustomerID CustomerName ContactName Address City PostalCode Country
Query Processing
General Strategies of Query Processing:- Query processing is a process of translating a user
query into an executable form. It helps to retrieve the results from a database. In query
processing, it converts the high-level query into a low-level query for the database. Query
processing is a very important component of DBMS. It is critical to the performance of
applications that rely on database operations. The flow of query processing in DBMS is
mentioned below:
Optimization
After doing query parsing, the DBMS starts finding the most efficient way to
execute the given query. The optimization process follows some factors for the
query. These factors are indexing, joins, and other optimization mechanisms.
These help in determining the most efficient query execution plan. So, query
optimization tells the DBMS what the best execution plan is for it. The main goal
of this step is to retrieve the required data with minimal cost in terms of resources
and time.
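Many engines let you inspect the plan the optimizer chooses. For example, PostgreSQL and MySQL provide an EXPLAIN statement (the table and column names here are illustrative):
Code:-
-- Shows the chosen execution plan (e.g., index scan vs. full table scan).
EXPLAIN SELECT name FROM STUDENT WHERE age > 18;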
Evaluation
After finding the best execution plan, the DBMS starts the execution of the
optimized query. And it gives the results from the database. In this step, DBMS
can perform operations on the data. These operations are selecting the data,
inserting something, updating the data, and so on.
Once everything is completed, DBMS returns the result after the evaluation step.
This result is shown to you in a suitable format.
Query Processor:-
It interprets the requests (queries) received from end user via an application
program into instructions. It also executes the user request which is received from
the DML compiler.
Query Processor contains the following components –
DML Compiler: It processes the DML statements into low level instruction
(machine language), so that they can be executed.
DDL Interpreter: It processes the DDL statements into a set of table containing
meta data (data about data).
Embedded DML Pre-compiler: It processes DML statements embedded in an
application program into procedural calls.
Query Optimizer: It determines the most efficient execution plan for the instructions generated by the DML compiler.
Concurrency Control
Concurrency control means that multiple transactions can be executed at the same time, so their log records are interleaved. Because interleaving may change transaction results, the system must maintain a proper order of execution of those transactions.
During recovery, it would be very difficult for the recovery system to backtrack all the
logs and then start recovering.
Recovery with concurrent transactions can be done in the following four ways.
1. Interaction with concurrency control
2. Transaction rollback
3. Checkpoints
4. Restart recovery
Interaction with concurrency control :
In this scheme, the recovery scheme depends greatly on the concurrency control scheme that is used. So, to roll back a failed transaction, we must undo the updates performed by the transaction.
Transaction rollback :
In this scheme, we roll back a failed transaction by using the log.
The system scans the log backward for the failed transaction; for every log record of the transaction found in the log, the system restores the data item to its old value.
Checkpoints :
Checkpointing is a process of saving a snapshot of the application's state so that it can restart from that point in case of failure.
A checkpoint is a point of time at which a record is written onto the database from the buffers.
Checkpoint shortens the recovery process.
When a checkpoint is reached, the transactions completed up to that point are reflected in the database, and the log records up to that point are removed from the log file.
The log file is then updated with the steps of new transactions until the next checkpoint, and so on.
The checkpoint is used to declare the point before which the DBMS was in
the consistent state, and all the transactions were committed.
To ease this situation, the 'checkpoint' concept is used by most DBMSs.
In this scheme, we used checkpoints to reduce the number of log records that
the system must scan when it recovers from a crash.
In a concurrent transaction processing system, we require that the checkpoint
log record be of the form <checkpoint L>, where ‘L’ is a list of transactions
active at the time of the checkpoint.
A fuzzy checkpoint is a checkpoint where transactions are allowed to perform
updates even while buffer blocks are being written out.
Restart recovery :
When the system recovers from a crash, it constructs two lists.
The undo-list consists of transactions to be undone, and the redo-list consists of transactions to be redone.
The system constructs the two lists as follows: Initially, they are both empty.
The system scans the log backward, examining each record, until it finds the
first <checkpoint> record.