
Module 1 - Query Processing

Query processing involves a series of steps that a DBMS follows to execute SQL queries efficiently, including query decomposition, optimization, plan generation, code generation, and execution. It also encompasses mapping global queries to local queries in distributed databases, ensuring optimal resource utilization, and employing relational algebra for query formulation. Effective query optimization is essential for improving execution time, minimizing resource consumption, and enhancing user experience.

Uploaded by

mystiq.soull

UNIT - I : Query Processing

Introduction
Query processing is the sequence of steps that a database management system
(DBMS) follows to execute a query and retrieve the desired results from the
database efficiently. It involves transforming a high-level SQL query into
low-level database operations and optimizing them for better performance.

Phases of Query Processing

1. Query Decomposition

• Purpose: Breaks down the SQL query into smaller, logical steps for
further analysis.

• Subphases:

1. Query Parsing:

• Checks the query for syntax errors.

• Outputs a parse tree (a structured representation of the query).

2. Query Validation:

• Performs the semantic checks: ensures that the query refers to valid
tables, columns, and data types.

2. Query Optimization

• Purpose: Determines the most efficient way to execute the query.

• Steps:

1. Heuristic Optimization: Applies rules such as moving filters early
(e.g., apply WHERE conditions before a JOIN).

2. Cost-Based Optimization: Estimates the cost of candidate plans from
factors like CPU usage, I/O, and memory, and chooses the query plan with
the lowest estimated cost.

• Output: An optimized query plan that minimizes resource consumption.

3. Query Plan Generation

• Purpose: Converts the optimized query plan into a series of physical
operations for execution.

• The query plan may include:

o Index scans or table scans.

o Join algorithms (e.g., merge join or hash join).

• Output: A physical plan with detailed instructions for the database engine.

4. Code Generation

• Purpose: Converts the query plan into executable code (low-level
instructions).

• This code interacts directly with the database storage to fetch and
manipulate data.

5. Query Execution

• Purpose: Executes the generated code and retrieves the desired result.

• Data is fetched from storage, processed, and returned to the user in the
requested format.
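The parsing and validation subphases of Phase 1 can be observed from the outside. The following is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as the DBMS; the `employee` table and the helper `try_query` are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

def try_query(sql):
    """Return 'ok', or the error message the engine reports for the query."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)

# Misspelled keyword: rejected while parsing (syntax check).
syntax_result = try_query("SELEC name FROM employee")

# Well-formed SQL, but the column does not exist: rejected during validation.
validation_result = try_query("SELECT salary FROM employee")

# Passes both checks and is executed.
valid_result = try_query("SELECT name FROM employee")

print(syntax_result)
print(validation_result)
print(valid_result)
```

Other systems word their error messages differently, but the split between syntax errors and references to unknown objects is the same idea.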

Overview of Query Processing and Optimisation in Databases
Mapping Global Queries into Local Queries
Mapping global queries into local queries refers to the process of converting a
high-level query written for a distributed database (global query) into smaller
subqueries that can be executed by the individual local databases where data is
stored.

This is a critical step in distributed database management systems (DDBMS) to
ensure seamless query execution and result integration.

Steps in the Process

1. Query Decomposition

o The global query is divided into multiple subqueries, each designed
to retrieve data from specific fragments or local databases.

o The decomposition ensures that the data required by the query is
retrieved from all relevant locations.

2. Data Location Identification

o Identify which fragments or local databases store the data needed
for the query. This is typically achieved using metadata maintained
by the system, which maps global tables to their respective local
data sources.

3. Query Translation

o Convert the subqueries into the syntax and format supported by the
local database systems. This step is essential when the local
databases are heterogeneous, meaning they use different query
languages or database management systems.

4. Query Execution Planning

o Develop a strategy to execute the subqueries efficiently. This
includes deciding the order of execution and minimizing data
transfer between nodes to optimize performance.

5. Execution of Subqueries

o The subqueries are executed on their respective local databases
independently. The results from these executions are collected by
the global system.

6. Result Integration

o The results from all local queries are combined to form a single
unified result set that matches the global query's requirements. This
often involves operations such as merging, sorting, or joining data.
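The steps above can be sketched in a few lines. This is an illustration only, assuming a global `customer` table that is horizontally fragmented across two sites, each simulated by an in-memory SQLite database; the site names, the `catalog` dictionary, and `run_global_query` are hypothetical.

```python
import sqlite3

def make_site(rows):
    """Create one hypothetical local database holding a fragment of `customer`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, region TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    return conn

# Data location metadata: maps each fragment of the global table to its site.
catalog = {
    "north": make_site([(1, "Asha", "north"), (2, "Ben", "north")]),
    "south": make_site([(3, "Chen", "south"), (4, "Dina", "south")]),
}

def run_global_query(min_id):
    """Global query: SELECT id, name FROM customer WHERE id >= min_id."""
    # Decomposition and translation: the same subquery is sent to every fragment.
    local_sql = "SELECT id, name FROM customer WHERE id >= ?"
    partial = []
    for site, conn in catalog.items():
        # Execution of subqueries: each site filters its own rows locally,
        # so only matching rows travel back to the coordinator.
        partial.extend(conn.execute(local_sql, (min_id,)).fetchall())
    # Result integration: merge and sort the partial results.
    return sorted(partial)

result = run_global_query(2)
print(result)
```

Filtering at each site before merging is the design choice that keeps data transfer between nodes small.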

Query Optimisation
Query optimization refers to the process of selecting the most efficient query
execution plan for a given query. The goal is to minimize the resources required
for query execution (such as time, memory, and CPU usage) while ensuring that
the query results are correct.

As data grows and queries become more complex, optimizing queries becomes
increasingly important to maintain system performance and cost-effectiveness.

Optimal Utilisation of Resources in the Distributed System


Optimal utilization of resources in a distributed system refers to the efficient
management and allocation of various resources (such as CPU, memory,
bandwidth, and storage) across multiple interconnected nodes in a network.

Basic Concepts of Query Processing


Basic Steps in Query Processing
1. Parsing and Translation

• What it is: In this first step, the system checks the syntax of the user’s
query and translates it into an internal form that the system can process.

• How it works:

o The query is analyzed and divided into tokens (e.g., keywords, table
names, column names).

o A parse tree or abstract syntax tree (AST) is created to represent the
logical structure of the query.

o Semantic analysis ensures that the query references valid database
objects (like tables, columns).

• Output: The output is the parse tree or abstract syntax tree that
represents the query's logical structure.

2. Query Optimization

• What it is: This step focuses on improving the efficiency of the query by
finding the most cost-effective way to execute it.

• How it works:

o The system looks at different ways to execute the query (e.g.,
different join strategies, filtering methods, indexing options).

o The query optimizer uses heuristic rules or cost-based approaches
to evaluate and compare possible query execution plans.

o The goal is to select the best plan based on estimated resource
consumption (CPU, memory, disk I/O).

• Output: The output is an optimized query execution plan that specifies
how to best retrieve the data.

3. Evaluation

• What it is: The evaluation phase involves carrying out the optimized query
plan, that is, actually performing the database operations (like scans,
joins, and sorts).

• How it works:

o The DBMS evaluates the query by performing the actions defined in
the optimized execution plan.

o The system accesses the storage manager to retrieve data from the
disk if needed.

o The results are generated based on the operations specified
in the execution plan (e.g., reading from tables, applying joins,
filtering data).

• Output: The result of the query is computed, typically as a result set
containing the requested data.

4. Execution

• What it is: Evaluation and execution are closely related, and many
textbooks treat them as a single step. Here, execution covers running the
plan to completion and returning the results to the user.

• How it works:

o The evaluated plan is executed in a manner that retrieves the
necessary data from the database.

o Depending on the complexity of the query, different operations like
filtering, joining, or aggregation might be performed.

• Output: The query results are returned to the user.
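The tokenizing part of step 1 can be illustrated with a small sketch. It assumes a tiny, hypothetical subset of SQL (a few keywords, identifiers, numbers, quoted strings, and comparison symbols); a real DBMS parser is far more complete and goes on to build a parse tree from these tokens.

```python
import re

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}

# Identifier/keyword, number, quoted string, or symbol (two-character symbols first).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+|'[^']*'|<=|>=|<>|[=<>*,();.]")

def tokenize(sql):
    """Split a query string into (kind, text) tokens."""
    tokens = []
    for text in TOKEN_RE.findall(sql):
        if text.upper() in KEYWORDS:
            kind = "KEYWORD"
        elif text[0].isalpha() or text[0] == "_":
            kind = "IDENTIFIER"
        elif text[0].isdigit() or text[0] == "'":
            kind = "LITERAL"
        else:
            kind = "SYMBOL"
        tokens.append((kind, text))
    return tokens

tokens = tokenize("SELECT name FROM employee WHERE age >= 30")
for token in tokens:
    print(token)
```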

Converting SQL Queries into Relational Algebra


Overview of Relational Algebra
Relational Algebra is a mathematical framework used for querying and
manipulating relational databases. It provides a set of operations that work on
relations (tables) and return relations as results.
These operations serve as the foundation for SQL queries and are used by
DBMSs to execute queries efficiently.

Types of Relational Operation


1. Selection (σ): The selection operation is used to filter rows based on a
specified condition (predicate). It returns a subset of the relation that
satisfies the given condition.

2. Projection (π): The projection operation is used to select specific columns
(attributes) from a relation. It removes duplicate tuples by default (unlike
SQL's SELECT, which keeps duplicates unless DISTINCT is specified).

3. Union ( ∪ ): The union operation combines the tuples from two relations,
eliminating duplicates. Both relations must have the same set of attributes.

4. Set Difference ( − ): The set difference operation returns the tuples that
are in one relation but not in another.

5. Cartesian Product ( × ): The Cartesian product operation combines every
tuple of the first relation with every tuple of the second relation. The result is
a new relation with all possible combinations of tuples.

6. Rename (ρ): The rename operation is used to rename the attributes or the
relation itself. This is often used to avoid ambiguity when working with
multiple relations.
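The six operations can be sketched directly in Python. This is a minimal illustration, assuming a relation is represented as a list of dictionaries (one dictionary per tuple); the function names and the sample `employees` data are hypothetical.

```python
def dedupe(rows):
    """Drop duplicate tuples, since relations are sets."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def select(rows, predicate):            # σ: keep the rows satisfying the predicate
    return [r for r in rows if predicate(r)]

def project(rows, attrs):               # π: keep the chosen columns, drop duplicates
    return dedupe([{a: r[a] for a in attrs} for r in rows])

def union(r, s):                        # ∪: rows of both relations, no duplicates
    return dedupe(r + s)

def difference(r, s):                   # −: rows of r that are not in s
    return [row for row in r if row not in s]

def product(r, s):                      # ×: every pairing (attribute names must differ)
    return [{**a, **b} for a in r for b in s]

def rename(rows, mapping):              # ρ: rename attributes
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

employees = [
    {"name": "Asha", "dept": "Sales"},
    {"name": "Ben", "dept": "IT"},
    {"name": "Chen", "dept": "Sales"},
]

sales = select(employees, lambda r: r["dept"] == "Sales")
depts = project(employees, ["dept"])
all_depts = union(depts, [{"dept": "Sales"}, {"dept": "HR"}])
non_it = difference(depts, [{"dept": "IT"}])
pairs = product(project(employees, ["name"]), rename(depts, {"dept": "dept_name"}))
print(sales, depts, all_depts, non_it, len(pairs))
```

Every function takes relations and returns a relation, which is what allows the operations to be nested into larger expressions.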

How Can Queries Be Categorised in a Relational Database?


In relational databases, queries can be categorized into different types based on
the actions they perform. Here's a simplified breakdown:

1. Data Retrieval Queries


These queries are used to retrieve or fetch data from one or more tables
without changing anything. For example, using a SELECT statement to get
specific columns from a table.

2. Data Manipulation Queries


These queries help in adding, modifying, or removing data. Examples
include:

o INSERT: Adds new data to a table.

o UPDATE: Changes existing data.


o DELETE: Removes data.

3. Data Definition Queries


These queries define or alter the structure of the database. Examples
include:

o CREATE: Creates new tables or structures.

o ALTER: Changes existing structures (like adding or removing columns).

o DROP: Deletes tables or structures.

4. Data Control Queries


These queries manage permissions and access to data. Examples:

o GRANT: Assigns permissions to users.

o REVOKE: Removes permissions from users.

5. Transaction Control Queries


These queries manage how changes are saved or reverted during a
transaction. Examples:

o COMMIT: Saves all changes made during a transaction.

o ROLLBACK: Reverts any changes made during a transaction.
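The categories above can be seen together in one short session. This is a sketch assuming SQLite through Python's `sqlite3` module with its default transaction handling; the `account` table is hypothetical. SQLite has no user accounts, so GRANT and REVOKE are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create the table structure.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")

# Data manipulation: add rows, then make them permanent.
conn.execute("INSERT INTO account (owner, balance) VALUES ('Asha', 100)")
conn.execute("INSERT INTO account (owner, balance) VALUES ('Ben', 50)")
conn.commit()        # transaction control: COMMIT

# A mistaken update that is undone before it is committed.
conn.execute("UPDATE account SET balance = balance - 500 WHERE owner = 'Asha'")
conn.rollback()      # transaction control: ROLLBACK

# Data retrieval: read the data without changing it.
balances = conn.execute("SELECT owner, balance FROM account ORDER BY owner").fetchall()
print(balances)
```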

Converting SQL Queries into Relational Algebra


Relational algebra is a theoretical foundation for SQL. It helps understand how
SQL queries are executed by using set operations on tables.
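As a worked illustration, two simple queries and their relational algebra counterparts are shown below. The `Employee` and `Department` tables and their columns are hypothetical. The correspondence is exact only when the SQL result has no duplicate rows (for example with SELECT DISTINCT), because relational algebra works on sets.

```latex
% SQL: SELECT name FROM Employee WHERE dept = 'Sales';
\pi_{\text{name}}\bigl(\sigma_{\text{dept} = \text{'Sales'}}(\text{Employee})\bigr)

% SQL: SELECT e.name, d.city
%      FROM Employee e JOIN Department d ON e.dept = d.dept_name;
\pi_{\text{name},\,\text{city}}\bigl(\text{Employee}
    \bowtie_{\text{dept} = \text{dept\_name}} \text{Department}\bigr)
```

Reading each expression from the inside out gives one possible order of evaluation: the WHERE clause becomes a selection, the SELECT list becomes a projection, and the JOIN ... ON clause becomes a join with a condition.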

Optimisation
Optimisation in the context of databases refers to the process of improving the
efficiency of SQL queries. This involves reducing the query's execution time,
minimizing the use of resources (like CPU and memory), and ensuring faster
response times.

Requirement for SQL Query Optimisation

SQL query optimization is essential for improving the performance of a
database. It helps in:

• Reducing Execution Time: Optimized queries execute faster, making
the application more responsive.

• Minimizing Resource Consumption: Efficient queries consume less
CPU, memory, and disk I/O.

• Improving Scalability: Optimized queries scale better as data grows,
making applications perform well even with larger datasets.

• Improving User Experience: Faster query results lead to a better user
experience, especially in high-traffic systems.

Best Practices for SQL Query Optimisation

1. Indexing: Create indexes on frequently queried columns to speed up data
retrieval. This can drastically reduce the time it takes to search through
large tables.

2. Select Only Necessary Columns: Avoid SELECT *. Only retrieve the
columns you need to minimize data processing and network traffic.

3. Use WHERE Clauses Effectively: Filtering data as early as possible
reduces the amount of data that needs to be processed. Avoid
unnecessary full-table scans by using appropriate conditions.

4. Avoid Subqueries When Possible: Subqueries, especially correlated ones,
can be inefficient on some systems. Try to rewrite them as joins or use
EXISTS instead, which may be more efficient.

5. Limit the Number of Returned Rows: Use LIMIT or TOP to restrict the
number of rows returned, especially in queries that only need a subset of
the data.

6. Avoid Multiple Joins on Large Tables: When joining large tables,
ensure that indexes are used and only relevant rows are included in the
join.
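The effect of the indexing practice can be checked by asking the database for its plan before and after an index is created. This is a sketch assuming SQLite, whose `EXPLAIN QUERY PLAN` statement reports the chosen access path; the `orders` table, the index name, and the helper `plan_for` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def plan_for(sql):
    """Return the plan description SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)   # column 3 holds the detail text

# Only the needed columns are selected, and the filter is in the WHERE clause.
query = "SELECT id, total FROM orders WHERE customer_id = 42"

plan_before = plan_for(query)       # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan_for(query)        # the index is now used to find matching rows

print("before:", plan_before)
print("after: ", plan_after)
```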

Metrics for Analyzing Query Performance for SQL Query Optimisation

To assess how well a query is performing, it's important to analyze certain
metrics:

1. Execution Time: The amount of time it takes for a query to return
results. Lower execution time means better performance.

2. CPU Usage: High CPU usage indicates that the query is computationally
expensive, which could suggest a need for optimization.

3. Disk I/O: Measures the amount of data read from or written to disk. High
disk I/O suggests inefficient query patterns, possibly due to missing
indexes.

4. Memory Usage: Excessive memory usage can indicate that large
intermediate results are being created during query execution.

1. Selection

In SQL, selection refers to the process of filtering records from a table based on
a specified condition. Using the WHERE clause efficiently can significantly
improve query performance by narrowing down the dataset early in the query.

2. Avoid Using SELECT DISTINCT

The SELECT DISTINCT keyword is used to eliminate duplicate records from the
result set. While it can be useful, it can be expensive in terms of performance
because it requires additional processing to check for duplicate rows.

3. Inner Joins vs WHERE Clause

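Inner joins are used to combine rows from two or more tables based on a
related column. The equivalent condition can also be written in the WHERE
clause. Most modern optimizers treat the two forms of an inner join
identically, so the explicit JOIN ... ON syntax is preferred mainly for
readability; any performance difference depends on the specific DBMS.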

4. LIMIT Command

The LIMIT command is used to restrict the number of rows returned by a query.
It can help optimize queries, particularly when you only need a subset of the
data (e.g., in pagination).

5. IN versus EXISTS

Both IN and EXISTS are used in subqueries, but they work differently. The IN
operator checks if a value matches any value in a list or subquery, while EXISTS
checks if a subquery returns any rows.
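The points on joins, LIMIT, and IN versus EXISTS can be tried on a small example. This is a sketch assuming SQLite and two hypothetical tables, `customer` and `orders`; it only shows that the alternative formulations return the same rows, not how fast each one runs on a particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (10, 1, 250), (11, 1, 40), (12, 3, 90);
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Inner join written with JOIN ... ON and with a WHERE condition.
join_on = rows("SELECT c.name, o.total FROM customer c "
               "INNER JOIN orders o ON o.customer_id = c.id ORDER BY o.id")
join_where = rows("SELECT c.name, o.total FROM customer c, orders o "
                  "WHERE o.customer_id = c.id ORDER BY o.id")

# Customers with at least one order, written with IN and with EXISTS.
with_in = rows("SELECT name FROM customer "
               "WHERE id IN (SELECT customer_id FROM orders) ORDER BY id")
with_exists = rows("SELECT name FROM customer c WHERE EXISTS "
                   "(SELECT 1 FROM orders o WHERE o.customer_id = c.id) ORDER BY id")

# LIMIT: only the first page of two rows is returned.
first_page = rows("SELECT id FROM orders ORDER BY id LIMIT 2")

print(join_on == join_where, with_in == with_exists, first_page)
```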

Evaluation
Query evaluation in databases refers to the process of executing a SQL query to
retrieve the requested data. The query optimizer decides the best way to
execute a given query, and the execution engine then carries out that plan.
This process involves transforming a query into an execution plan that is
efficient in terms of resource usage (time, memory, etc.).

Creation of Query Evaluation Plans

A Query Evaluation Plan is a sequence of steps that describe how a query will be
executed. These steps typically involve a series of relational operations (like
joins, selections, projections) applied on tables.

1. Parser Stage: The query is parsed to generate a parse tree.

2. Logical Plan: The query is transformed into a logical plan, which consists
of relational algebra expressions.

3. Optimization: The logical plan is optimized to improve performance by
transforming it into more efficient equivalent forms.

4. Physical Plan: The final optimized plan is mapped into a physical execution
plan, specifying how operations like joins and selections should be
implemented physically (e.g., using hash joins or nested loops).

Transformation of Relational Expressions

Transforming relational expressions refers to the process of converting one form
of relational operation into another. This is done to optimize the query
evaluation process.

For example, changing the order of operations or converting a join to a different
type of join can lead to more efficient query execution.

Equivalence Rules

Equivalence rules are used to transform a relational expression into an
equivalent expression that may be more efficient to execute. These rules are
applied during query optimization to ensure that the result of the query is the
same, but the execution is more efficient.

Here are some common equivalence rules:

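1. Commutativity of JOIN: Swapping the two operands of a join does not
affect the result: R ⋈ S = S ⋈ R.

2. Associativity of JOIN: The grouping in which joins are performed does not
affect the result: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T).

3. Distributivity of SELECTION over JOIN: A selection whose condition uses
only the attributes of one relation can be applied to that relation before
the join, reducing the number of rows involved in the join.

4. Projection can be moved: Projections can be applied before or after
joins without changing the results, provided the attributes needed by the
join condition are kept until the join has been performed.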

These equivalence rules help the query optimizer identify more efficient
execution strategies for the query.
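The first three rules can be checked on concrete data. The sketch below assumes relations stored as lists of dictionaries and a simple natural join on shared attribute names; the relations `R`, `S`, `T` and the helper functions are hypothetical. A check on one data set illustrates the rules but does not prove them.

```python
def natural_join(r, s):
    """Join two relations on all attribute names they have in common."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s if all(a[k] == b[k] for k in common)]

def select(rows, predicate):
    return [row for row in rows if predicate(row)]

def as_set(rows):
    """Order-independent form of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
S = [{"b": 10, "c": "x"}, {"b": 20, "c": "y"}, {"b": 30, "c": "z"}]
T = [{"c": "x", "d": True}, {"c": "y", "d": False}]

# Rule 1, commutativity: R join S equals S join R.
commutative = as_set(natural_join(R, S)) == as_set(natural_join(S, R))

# Rule 2, associativity: (R join S) join T equals R join (S join T).
associative = (as_set(natural_join(natural_join(R, S), T))
               == as_set(natural_join(R, natural_join(S, T))))

# Rule 3, selection pushdown: filtering R before the join gives the same rows
# as filtering after it, because the condition uses only an attribute of R.
def is_a1(row):
    return row["a"] == 1

pushdown = (as_set(select(natural_join(R, S), is_a1))
            == as_set(natural_join(select(R, is_a1), S)))

print(commutative, associative, pushdown)
```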

Obtaining Alternative Query Expressions

Obtaining alternative query expressions refers to generating different ways of
writing a query to achieve the same result. Each alternative might have
different performance characteristics, and the query optimizer evaluates which
one is the most efficient.

For example:

• Using different join algorithms: The same inner join can be evaluated
with a hash join, a merge join, or a nested loop join, depending on the size
of the tables and the indexes available.

• Rearranging operations: The order of a selection and a join can be
swapped, depending on which order results in fewer rows being processed.

Query Evaluation Plans

A query evaluation plan defines the physical steps for executing a query,
outlining how the database will perform operations such as table scans, joins,
selections, and projections.

Components of a Query Evaluation Plan:

• Access Paths: How data will be retrieved (e.g., full table scan, index scan).

• Join Algorithms: Defines which join algorithm to use (e.g., hash join,
nested loops join).

• Sort Operations: Whether and how sorting is performed.

• Cost Estimation: The estimated cost (in terms of time and resources) for
executing the plan.
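To make the "Join Algorithms" component concrete, here is a minimal sketch of two in-memory join algorithms that produce the same result in different ways. The sample relations and function names are hypothetical, and real engines work on disk pages and handle memory limits, which this sketch ignores.

```python
from collections import defaultdict

def nested_loop_join(left, right, key):
    """Compare every left row with every right row: about len(left) * len(right) steps."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]

def hash_join(left, right, key):
    """Build a hash table on one input, then probe it once per row of the other."""
    buckets = defaultdict(list)
    for r in right:                       # build phase
        buckets[r[key]].append(r)
    return [(l, r) for l in left for r in buckets.get(l[key], [])]   # probe phase

employees = [
    {"name": "Asha", "dept": "Sales"},
    {"name": "Ben", "dept": "IT"},
    {"name": "Chen", "dept": "Sales"},
]
departments = [
    {"dept": "Sales", "city": "Pune"},
    {"dept": "IT", "city": "Delhi"},
    {"dept": "HR", "city": "Goa"},
]

nl_result = nested_loop_join(employees, departments, "dept")
hj_result = hash_join(employees, departments, "dept")
print(nl_result == hj_result, len(hj_result))
```

Because both functions return the same rows, the optimizer is free to pick whichever it estimates to be cheaper for the given table sizes.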

Choice of Evaluation Plans


Choosing the best query evaluation plan is crucial for optimizing query
performance. The choice depends on factors such as:

1. Data Size: Larger datasets may benefit from more efficient join
algorithms like hash joins or merge joins.

2. Indexes Available: If indexes are available, they can speed up data
retrieval (e.g., using index scans instead of full table scans).

3. Join Types: Depending on the number of rows involved, the optimizer
may choose between nested loop joins, hash joins, or merge joins.

4. Cost-Based Optimization: Optimizers may use cost models to predict
the cost of different plans. The plan with the lowest cost (in terms of time
and resources) is typically chosen.

5. Caching: If certain data is likely to be accessed repeatedly, caching
intermediate results can reduce repeated computations.

Query Tree
A Query Tree is a hierarchical tree structure that represents a query in terms of
relational operations. It is used in the query optimization process to help
visualize and transform queries into efficient execution plans.

Each node in the tree represents a relational operation (such as SELECT, JOIN, or
PROJECT), and the branches show the order in which the operations need to be
performed. A query tree helps in understanding the flow of data and aids in
finding an optimal execution strategy.

Creation of Query Tree

Creating a Query Tree involves transforming a given query into a tree structure
where each operation is represented as a node. The nodes are connected in a
way that shows the order of operations.

Here are the steps involved in creating a query tree:

1. Parsing:
The query is parsed by the database system to check its syntax and
structure. During this phase, the query parser generates a parse tree that
represents the syntactical structure of the query.

2. Query Optimization:
Once the query is parsed, it is transformed into an internal representation,
often as a relational algebra expression. The optimizer uses this internal
representation to apply optimization techniques and generate more
efficient query plans. During optimization, a query tree is created or
transformed to represent the most efficient plan.

3. Query Execution Tree:


The query execution tree represents the steps in the query plan that will
actually be executed. It is an optimized version of the query tree, where
the database system has selected the most efficient operations. Each
node in the tree corresponds to an operation (such as scan, join, etc.), and
the tree structure shows the order in which the operations will be
performed.

4. Physical Execution Plan:


Once the query execution tree is generated, it is mapped to a physical
execution plan. This plan specifies the physical algorithms for performing
operations like sorting, joining, or scanning. For example, a join operation
might be implemented using a nested loop join, hash join, or merge join,
depending on the size of the data and available indexes.

5. Execution:
After the physical execution plan is created, the query is executed
according to the steps in the plan. The database system performs the
operations as per the tree's structure, using the chosen algorithms to
retrieve the data.

Overview of the Process

1. Parsing: The query is first checked for syntax and structure.

2. Query Optimization: The query is optimized to find the most efficient
execution plan.

3. Query Execution Tree: The optimized plan is represented as a query
execution tree.

4. Physical Execution Plan: The execution plan is translated into physical
steps for actual data retrieval.

5. Execution: The query is executed based on the plan.


Each of these stages plays a crucial role in ensuring that the query is processed
as efficiently as possible, from parsing and optimization to execution.
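A query tree can be sketched as a small set of node classes with a recursive evaluator. This is an illustration under simplifying assumptions: the node classes `Scan`, `Select`, `Project`, and `Join` and the sample data are hypothetical, relations are lists of dictionaries, and the join is an equality join on one shared column.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scan:                 # leaf node: reads a stored relation
    rows: list

@dataclass
class Select:               # keeps the rows satisfying a predicate
    predicate: Callable[[dict], bool]
    child: object

@dataclass
class Project:              # keeps the listed attributes
    attrs: list
    child: object

@dataclass
class Join:                 # equality join on one shared attribute
    left: object
    right: object
    key: str

def evaluate(node):
    """Evaluate the tree bottom-up: children first, then the node itself."""
    if isinstance(node, Scan):
        return list(node.rows)
    if isinstance(node, Select):
        return [r for r in evaluate(node.child) if node.predicate(r)]
    if isinstance(node, Project):
        return [{a: r[a] for a in node.attrs} for r in evaluate(node.child)]
    if isinstance(node, Join):
        right = evaluate(node.right)
        return [{**l, **r} for l in evaluate(node.left)
                for r in right if l[node.key] == r[node.key]]
    raise TypeError(f"unknown node type: {type(node).__name__}")

employees = [{"name": "Asha", "dept": "Sales"}, {"name": "Ben", "dept": "IT"},
             {"name": "Chen", "dept": "Sales"}]
departments = [{"dept": "Sales", "city": "Pune"}, {"dept": "IT", "city": "Delhi"}]

# Tree for: names and cities of employees whose department is in Pune.
tree = Project(["name", "city"],
               Select(lambda r: r["city"] == "Pune",
                      Join(Scan(employees), Scan(departments), "dept")))
result = evaluate(tree)
print(result)
```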

Query Graph
A Query Graph is a graphical representation used to visualize and manage the
relationships between different parts of a SQL query. It provides a way to break
down complex SQL operations into simpler components, showing how different
tables and operations interact with one another.

The query graph helps in visualizing the flow of data and operations in a query,
which is useful for optimizing query performance and analyzing the query
execution process.
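One common way to hold a query graph in memory is as an adjacency structure: tables are nodes, and each join condition is an edge between the two tables it mentions. The sketch below assumes three hypothetical tables and join conditions.

```python
from collections import defaultdict

# Each entry: (table, table, join condition connecting them).
join_predicates = [
    ("customer", "orders", "customer.id = orders.customer_id"),
    ("orders", "item", "orders.id = item.order_id"),
]

graph = defaultdict(dict)
for left, right, condition in join_predicates:
    graph[left][right] = condition      # edges are stored in both directions
    graph[right][left] = condition

def joinable(a, b):
    """True when a join condition directly connects the two tables."""
    return b in graph.get(a, {})

print(sorted(graph["orders"]))
print(joinable("customer", "orders"), joinable("customer", "item"))
```

An optimizer can use such a graph to avoid join orders that pair unconnected tables, such as `customer` with `item` here, since those would require a Cartesian product.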

Role of Query Graph in SQL

1. Query Optimisation:
The query graph plays a key role in query optimization by representing
the operations and their relationships. It helps the optimizer identify the
most efficient way to execute a query by considering different join types,
the order of operations, and the flow of data between operations. For
example, the graph can help in deciding the most efficient join order,
helping to minimize resource usage and query time.

2. Visual Representation:
A query graph provides a visual representation of the SQL query's logical
structure. Each node in the graph typically represents a table or an
operation, such as a join or selection. The edges (connections between
nodes) show how data flows between operations. This representation
helps both developers and query optimizers understand how the query
works at a high level, aiding in better decision-making for optimization.

3. Query Execution Plan:


The query execution plan can be derived from the query graph. By
breaking down the query into smaller operations (such as selections, joins,
or projections), the query graph makes it easier to transform the query
into a physical execution plan. The graph can show which operations will
be executed first and how data will be processed step-by-step, helping
database systems choose the most efficient plan.

4. Cost-Based Optimisation:
Query graphs are often used in cost-based optimization. The optimizer
uses the query graph to calculate the estimated cost (in terms of time and
resources) for different query execution strategies. Each operation in the
graph has a cost associated with it, and the optimizer will try to minimize
the total cost by selecting the most efficient operations and their order.
The query graph helps in evaluating the trade-offs between different
strategies.

5. Parallel Execution:
The query graph also plays a significant role in parallel execution of
queries. By visualizing how different parts of a query can be executed
independently, the query graph helps identify opportunities for parallel
processing. For instance, if two operations in the graph do not depend on
each other, they can potentially be executed in parallel to improve
performance and reduce query execution time.

6. Debugging and Performance Tuning:


Query graphs are valuable for debugging and performance tuning. When
queries are running slowly or not returning the expected results, the query
graph can help identify inefficiencies in the query's execution plan.
Developers and database administrators can use the graph to spot
performance bottlenecks, such as unnecessary joins, excessive sorting, or
inefficient access paths, and make adjustments to improve query
performance.

7. Query Explain Plans:


Query Explain Plans are generated by databases to show how a query will
be executed. These plans are usually presented as a tree or an indented list
of operators, closely related to the query graph, detailing the sequence of
operations, their estimated cost, and the data flow between them. By
analyzing the explain plan, users can better understand how their queries
are being executed and identify potential areas for optimization.

Heuristic Optimisation of Query Tree


Heuristic optimization of a query tree refers to the process of applying specific
rules or heuristics to modify the query tree structure in order to improve the
performance of the query execution. The goal is to reduce the time and
resources required to execute a query by simplifying and rearranging operations
within the query tree.

In simple terms, it involves making educated guesses or using predefined rules
that have been shown to lead to better performance, without necessarily
considering all possible query plans. Heuristic optimization is often a faster,
more practical approach compared to exhaustive optimization methods, which
may take a lot of time and computational resources.
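One such heuristic, pushing a selection below a join, can be sketched as a tree rewrite. The sketch assumes a query tree encoded as nested tuples: `("scan", table, attributes)`, `("join", left, right)`, and `("select", attribute, value, child)`. This encoding and the function names are hypothetical.

```python
def attributes(node):
    """Set of attribute names produced by a subtree."""
    kind = node[0]
    if kind == "scan":
        return set(node[2])
    if kind == "select":
        return attributes(node[3])
    return attributes(node[1]) | attributes(node[2])      # join

def push_selection_down(node):
    """Move each selection below a join when it refers to only one join input."""
    kind = node[0]
    if kind == "scan":
        return node
    if kind == "join":
        return ("join", push_selection_down(node[1]), push_selection_down(node[2]))
    _, attr, value, child = node                           # kind == "select"
    if child[0] == "join":
        _, left, right = child
        in_left, in_right = attr in attributes(left), attr in attributes(right)
        if in_left and not in_right:
            return ("join",
                    push_selection_down(("select", attr, value, left)),
                    push_selection_down(right))
        if in_right and not in_left:
            return ("join",
                    push_selection_down(left),
                    push_selection_down(("select", attr, value, right)))
    return ("select", attr, value, push_selection_down(child))

employee = ("scan", "employee", ["name", "dept"])
department = ("scan", "department", ["dept", "city"])

original = ("select", "city", "Pune", ("join", employee, department))
optimized = push_selection_down(original)
print(optimized)
```

In the rewritten tree the filter on `city` is applied to `department` before the join, so fewer rows take part in the join. No cost estimates are needed to apply the rule.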
Different Query Optimisation Approaches
Query optimization is the process of improving the performance of a database
query. There are two main approaches to query optimization: Cost-Based
Optimization and Semantic Query Optimization.

Each approach uses different techniques to improve the performance of SQL
queries by reducing execution time, minimizing resource usage, and optimizing
the query execution plan.

1. Cost-Based Optimisation

Cost-Based Optimisation (CBO) is an approach that chooses the best query
execution plan based on the estimated "cost" of various possible plans. The cost
is typically measured in terms of time (CPU usage) or resources (disk I/O). CBO
uses a cost model to evaluate different execution strategies for a given query
and selects the one with the least estimated cost.

Key Features of Cost-Based Optimisation:

• Estimates Costs: It calculates the cost of each possible execution plan
using factors such as CPU usage, I/O operations, and network resources.

• Relies on Statistics: The optimization process depends on statistics
about the tables and indexes involved in the query. These statistics
include data distribution, table size, and index availability.

• Plan Search: CBO evaluates multiple execution plans and
compares their costs. This can involve expensive computations, so practical
optimizers usually prune the search space rather than enumerate every plan.
The chosen plan is the cheapest among those considered, and only as good
as the cost estimates and statistics it is based on.

• Plan Selection: The optimizer chooses the query execution plan that
minimizes the total estimated cost, considering operations like joins,
selections, and projections.
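The idea of comparing plans by estimated cost can be sketched with a toy join-order search. Everything here is an assumption made for illustration: the table sizes, the join selectivities, and the cost model (the sum of estimated intermediate result sizes) are invented and are much simpler than those of a real optimizer.

```python
from itertools import permutations

# Assumed statistics: number of rows per table.
cardinality = {"customer": 1_000, "orders": 10_000, "item": 50_000}

# Assumed selectivity of each join condition (fraction of row pairs that match).
selectivity = {
    frozenset({"customer", "orders"}): 1 / 1_000,
    frozenset({"orders", "item"}): 1 / 10_000,
}

def estimated_rows(tables):
    """Estimated size of joining the given tables (a cross product if unconnected)."""
    rows = 1.0
    for table in tables:
        rows *= cardinality[table]
    for pair, factor in selectivity.items():
        if pair <= set(tables):
            rows *= factor
    return rows

def plan_cost(order):
    """Toy cost of a left-deep join order: total size of all intermediate results."""
    return sum(estimated_rows(order[:i]) for i in range(2, len(order) + 1))

# Enumerate every join order and keep the cheapest one.
best_order = min(permutations(cardinality), key=plan_cost)

for order in permutations(cardinality):
    print(order, round(plan_cost(order)))
print("chosen:", best_order)
```

With three tables there are only six orders, but the number grows factorially with the number of tables, which is one reason exhaustive enumeration is rarely practical.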

2. Semantic Query Optimisation

Semantic Query Optimisation (SQO) is an optimization approach that focuses on
the logical structure of the query and aims to exploit the inherent properties of
the data and the query itself. Unlike cost-based optimization, which relies
heavily on statistics and resources, semantic query optimization uses
knowledge of the domain or business logic to simplify or transform queries in
ways that may not be immediately obvious through cost analysis alone.

Key Features of Semantic Query Optimisation:

• Application of Business Rules: SQO takes into account domain-specific
knowledge, such as business rules and constraints, to improve query
performance. This may include rules like "if this condition is true, then
certain other conditions can be skipped."

• Transformation of Queries: It often involves transforming a query into
an equivalent but more efficient form. For example, it might replace an
expensive operation with a simpler one or eliminate redundant operations.

• Focus on Logical Equivalence: The optimization seeks to achieve a
semantically equivalent query that may run more efficiently. For example,
a semantic optimization might eliminate unnecessary joins or apply a rule
like "filtering data early in the query is better."

• Using Views and Materialized Views: Sometimes, semantic
optimizations involve replacing complex subqueries with predefined views
or materialized views, which store precomputed results for faster access.
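A very small sketch of the idea: if an integrity constraint already rules out every row a query asks for, the answer can be returned without reading any data. The constraint (salaries lie between 0 and 200,000), the `stats` counter, and the function name are hypothetical.

```python
# Assumed integrity constraint: 0 <= salary <= 200000 for every row.
constraints = {"salary": (0, 200_000)}

def answer_range_query(rows, attr, low, high, stats):
    """Return the rows with low <= row[attr] <= high, using the constraint first."""
    allowed_low, allowed_high = constraints[attr]
    if high < allowed_low or low > allowed_high:
        # The requested range contradicts the constraint: the result must be
        # empty, so the table is never scanned.
        return []
    stats["rows_scanned"] += len(rows)
    return [row for row in rows if low <= row[attr] <= high]

employees = [
    {"name": "Asha", "salary": 60_000},
    {"name": "Ben", "salary": 95_000},
    {"name": "Chen", "salary": 150_000},
]

stats = {"rows_scanned": 0}
impossible = answer_range_query(employees, "salary", -100, -1, stats)
scanned_after_impossible = stats["rows_scanned"]

possible = answer_range_query(employees, "salary", 50_000, 100_000, stats)
print(impossible, scanned_after_impossible, possible, stats)
```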

Functional Dependencies
A functional dependency (FD) is a relationship between two sets of attributes in
a database relation. It expresses how the value of one attribute (or a group of
attributes) determines the value of another attribute (or group of attributes).

Functional dependencies are crucial for normalization and help to minimize
redundancy and ensure data consistency in relational databases.
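Whether a functional dependency holds in a given table instance can be checked mechanically: no two rows may agree on the determinant while disagreeing on the dependent attributes. The sketch below assumes rows stored as dictionaries; the `student` data and the function name are hypothetical. Note that a check on sample data can only refute a dependency. Whether it holds in general is a design decision.

```python
def fd_holds(rows, determinant, dependent):
    """True if, in these rows, the determinant attributes fix the dependent ones."""
    seen = {}
    for row in rows:
        lhs = tuple(row[a] for a in determinant)
        rhs = tuple(row[a] for a in dependent)
        # Remember the first dependent value seen for each determinant value;
        # a different value later means the dependency is violated.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True

student = [
    {"student_id": 1, "dept": "CS", "building": "Block A"},
    {"student_id": 2, "dept": "CS", "building": "Block A"},
    {"student_id": 3, "dept": "EE", "building": "Block B"},
]

dept_determines_building = fd_holds(student, ["dept"], ["building"])
dept_determines_student = fd_holds(student, ["dept"], ["student_id"])
print(dept_determines_building, dept_determines_student)
```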

Rules of Functional Dependency


In a relational database, functional dependency (FD) defines the relationship
between attributes (or columns) in a relation (or table). It specifies that one
attribute (or a set of attributes) uniquely determines another attribute (or set of
attributes).

Functional dependencies are used to maintain data consistency and reduce
redundancy in databases. The rules of functional dependency are essential in
understanding database normalization and ensuring data integrity.
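The standard inference rules for functional dependencies (reflexivity, augmentation, and transitivity, known as Armstrong's axioms) are usually applied through the attribute closure: the set of all attributes determined by a given set. The following is a minimal sketch of the usual closure algorithm; the attribute names and dependencies are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)            # reflexivity: the attributes determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumed dependencies: student_id -> dept, and dept -> building.
fds = [
    ({"student_id"}, {"dept"}),
    ({"dept"}, {"building"}),
]

student_closure = closure({"student_id"}, fds)
dept_closure = closure({"dept"}, fds)
print(sorted(student_closure), sorted(dept_closure))
```

Because the closure of `student_id` contains every attribute, `student_id` is a candidate key of a relation with these three attributes.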

Types of Functional Dependency


1. Multivalued Dependency (MVD)

A multivalued dependency (MVD) occurs when one attribute (or set of
attributes) determines a set of values for another attribute (or set of attributes),
and the set of values can be independently determined without reference to
other attributes in the table. Strictly speaking, an MVD is a generalization
of a functional dependency rather than a kind of functional dependency.

Properties:

• Multivalued dependencies arise when there are multiple independent facts
related to the same entity.

• A non-trivial MVD whose determinant is not a superkey violates the
Fourth Normal Form (4NF).

2. Trivial Dependency

A trivial functional dependency occurs when the dependency is always true,
regardless of the data values. Specifically, X → Y is trivial when Y is a
subset of X; for example, any set of attributes always determines itself.

Properties:

• Trivial dependencies do not provide useful information for normalization
purposes.

• They are generally ignored in the context of database design.

3. Non-Trivial Dependency

A non-trivial functional dependency X → Y is a functional dependency where the
dependent attribute set Y is not a subset of the determinant X. This is a more
meaningful dependency than a trivial dependency.

Properties:

• Non-trivial dependencies are the primary focus of database normalization.

• These dependencies help in identifying redundant data and improving the
design of relational schemas.

4. Transitive Dependency

A transitive dependency occurs when one attribute indirectly determines
another attribute through a third attribute. In other words, if X → Y and Y → Z
(and Y does not determine X), then X → Z holds by transitivity.

Properties:

• A transitive dependency of a non-key attribute on the key violates
Third Normal Form (3NF).

• They can often be eliminated by decomposing the relation into smaller
relations to remove redundancy and ensure that only direct dependencies
remain.
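The decomposition mentioned above can be sketched on a small table. Assume a hypothetical `student` relation in which `student_id → dept` and `dept → building`, so `building` depends on the key only transitively. Splitting the table on `dept` stores each department's building once, and joining the two parts gives back the original rows.

```python
def project(rows, attrs):
    """Keep the listed attributes and drop duplicate rows."""
    out = []
    for row in rows:
        reduced = {a: row[a] for a in attrs}
        if reduced not in out:
            out.append(reduced)
    return out

student = [
    {"student_id": 1, "dept": "CS", "building": "Block A"},
    {"student_id": 2, "dept": "CS", "building": "Block A"},
    {"student_id": 3, "dept": "EE", "building": "Block B"},
]

# Decompose so that each table holds only direct dependencies.
student_dept = project(student, ["student_id", "dept"])     # student_id -> dept
dept_building = project(student, ["dept", "building"])      # dept -> building

# Joining the parts on `dept` reconstructs the original relation.
rejoined = [{**s, **d} for s in student_dept for d in dept_building
            if s["dept"] == d["dept"]]

print(dept_building)
print(rejoined == student)
```

In the decomposed form, "Block A" is stored once per department instead of once per student, so changing a department's building is a single-row update.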
Normal Forms
Normalisation
Normalization is the process of organizing a relational database in such a way
that it reduces redundancy and avoids undesirable dependencies that cause
insertion, update, and deletion anomalies. The goal is to ensure that the data
is logically stored and that updates to the database will be more efficient and
less error-prone. The process involves dividing large tables into smaller ones and
defining relationships between them.

1st Normal Form

A table is in 1st Normal Form if it meets the following conditions:

• All columns contain atomic (indivisible) values.

• Each record in the table has a unique identifier (Primary Key).

• The table does not contain repeating groups or arrays.

2nd Normal Form

A table is in 2nd Normal Form if:

• It is already in 1st Normal Form.

• It has no partial dependencies. This means that all non-key attributes are
fully dependent on the entire primary key, not just part of it.

3rd Normal Form

A table is in 3rd Normal Form if:

• It is already in 2nd Normal Form.

• There are no transitive dependencies. This means that non-key attributes
should not depend on other non-key attributes.

4th Normal Form

A table is in 4th Normal Form if:

• It is already in Boyce-Codd Normal Form (BCNF), a stricter version of
3rd Normal Form.

• It has no non-trivial multivalued dependencies, other than those whose
determinant is a superkey. This means that there should be no situation
where one attribute determines multiple values for another attribute
independently of other attributes.

5th Normal Form

A table is in 5th Normal Form (also called Project-Join Normal Form) if:

• It is already in 4th Normal Form.

• It has no join dependencies other than those implied by its candidate
keys. This means that the table cannot be split into smaller tables and
rejoined without loss of information, except by decompositions based on
its candidate keys.
