Normalization: Problems of Data Redundancy
Data Inconsistency
Advantages
Minimizes Data Redundancy: Reduces duplicate data, saving storage space.
Avoids Anomalies: Prevents issues with data insertion, update, and deletion.
Improves Query Performance: Simplifies queries and makes the database more
efficient, since each fact is stored and updated in one place.
Disadvantages
Performance Issues: The need for multiple joins can slow down read operations,
especially in highly normalized databases.
A table is in first normal form (1NF) when every column holds atomic values (an
atomic value is a single, indivisible value stored in one column of a table).
Ex: If you want to update the middle name in the de-normalized table below, you
need to write an awkward SQL query that rewrites part of a string, so you will
run into DML (Insert, Update, Delete) anomalies.
So we will have separate columns for first,
middle and last names, as shown in the normalized table below.
Now you can see how simple it is to write a SQL query to update the middle name
on the normalized table.
De-Normalized
Data is in de-normalized form when multiple values of a single
attribute are stored in a single column rather than in separate
columns.
Normalized
Each cell contains only one value
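The difference between the two updates can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the tables from the slides: the table names (customer_denorm, customer_norm), the column names, and the sample name are all assumptions, and the de-normalized update assumes the name has exactly three space-separated parts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# De-normalized (hypothetical table): the whole name sits in one column.
conn.execute("CREATE TABLE customer_denorm (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO customer_denorm VALUES (1, 'John Michael Smith')")

# Normalized to 1NF (hypothetical table): one atomic value per column.
conn.execute(
    "CREATE TABLE customer_norm "
    "(id INTEGER PRIMARY KEY, first_name TEXT, middle_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO customer_norm VALUES (1, 'John', 'Michael', 'Smith')")

# De-normalized update: cut the string at both spaces and glue it back together.
set_middle_denorm = """
UPDATE customer_denorm
SET full_name =
    substr(full_name, 1, instr(full_name, ' '))           -- 'John '
    || ?                                                   -- new middle name
    || substr(
           substr(full_name, instr(full_name, ' ') + 1),   -- 'Michael Smith'
           instr(substr(full_name, instr(full_name, ' ') + 1), ' ')
       )                                                   -- ' Smith'
WHERE id = ?
"""
conn.execute(set_middle_denorm, ("David", 1))

# Normalized update: a plain single-column assignment.
conn.execute("UPDATE customer_norm SET middle_name = ? WHERE id = ?", ("David", 1))

denorm_name = conn.execute("SELECT full_name FROM customer_denorm WHERE id = 1").fetchone()[0]
norm_row = conn.execute(
    "SELECT first_name, middle_name, last_name FROM customer_norm WHERE id = 1"
).fetchone()
print(denorm_name)
print(norm_row)
```

Both updates reach the same result, but the first one breaks as soon as a name has no middle part or more than three parts, which is the kind of anomaly 1NF avoids.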
www.linkedin.com/in/chenchuanil
2nd Normal Form
Partial dependency
2 NF satisfied
3rd Normal Form
Transitive dependency
3rd NF Satisfied
De-Normalization
Which one to Choose
Choose normalization when you have write-heavy operations, such as a customer
placing an order every 2 or 3 seconds. Ex: OLTP
ACID PROPERTIES
The ACID properties ensure the reliability of transactions in a
database. They stand for Atomicity, Consistency, Isolation, and
Durability.
1. Atomicity
Atomicity ensures that all operations within a transaction are
completed successfully. If any part of the transaction fails, the
entire transaction is rolled back, and no changes are made to the
database.
BEGIN TRANSACTION;
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 1;
UPDATE accounts
SET balance = balance + 100
WHERE account_id = 2;
COMMIT;
If either UPDATE statement in the SQL above fails (e.g., a constraint rejects the debit
because of insufficient funds in account 1), the transaction is rolled back instead of
committed, ensuring no partial updates occur.
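The all-or-nothing behaviour can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the transfer() helper, the starting balances, and the "account not found" failure (used here to make the second UPDATE fail after the first has already run) are illustrative and not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 300)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one unit; return True only if committed."""
    steps = (
        ("UPDATE accounts SET balance = balance - ? WHERE account_id = ?", (amount, src)),
        ("UPDATE accounts SET balance = balance + ? WHERE account_id = ?", (amount, dst)),
    )
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            for sql, params in steps:
                # Treat "no such account" as a failure of the whole transfer.
                if conn.execute(sql, params).rowcount != 1:
                    raise LookupError("account not found")
        return True
    except LookupError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT account_id, balance FROM accounts"))


# Account 99 does not exist: the debit runs first, then the credit fails,
# so the debit is rolled back as well.
failed_ok = transfer(conn, 1, 99, 100)
after_failed = balances(conn)

# A valid transfer commits both updates together.
success_ok = transfer(conn, 1, 2, 100)
after_success = balances(conn)
print(failed_ok, after_failed, success_ok, after_success)
```

The point of the sketch is that account 1 keeps its full balance after the failed attempt even though its debit had already executed inside the transaction.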
2. Consistency
Consistency ensures that a transaction brings the database from one valid
state to another, complying with all predefined rules (such as constraints,
triggers, or integrity conditions). If a transaction violates any rule, the
entire transaction is aborted, and the database remains unchanged.
Example:
Consider a scenario where a bank database has a rule that account
balances must always be non-negative (i.e., there is a CHECK constraint
that prevents balances from falling below zero).
BEGIN TRANSACTION;
UPDATE accounts
SET balance = balance - 500
WHERE account_id = 1;
COMMIT;
If the UPDATE operation attempts to reduce the balance below zero, the transaction
will fail due to the CHECK constraint, and no changes will be applied to the database.
Consistency ensures that the database maintains its integrity, meaning no invalid data
(like a negative balance) will ever be stored. The transaction will either complete
successfully or roll back to maintain this state.
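A CHECK constraint of this kind can be tried out with Python's built-in sqlite3 module. The sketch below assumes an accounts table with a starting balance of 300 for account 1 (the original does not state the balance), so the withdrawal of 500 would push it below zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "account_id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # the non-negative rule
)
conn.execute("INSERT INTO accounts VALUES (1, 300)")  # assumed starting balance
conn.commit()

try:
    with conn:  # commits on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE account_id = 1")
    withdrawal_committed = True
except sqlite3.IntegrityError:
    # SQLite reports a violated CHECK constraint as an IntegrityError.
    withdrawal_committed = False

balance_after = conn.execute(
    "SELECT balance FROM accounts WHERE account_id = 1"
).fetchone()[0]
print(withdrawal_committed, balance_after)
```

The rule lives in the schema rather than in application code, so every transaction is held to it regardless of which program issues the UPDATE.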
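3. Isolation
Isolation ensures that concurrently running transactions do not interfere with
each other: each transaction behaves as if it were the only one running, and its
uncommitted changes are not visible to other transactions.
Example:
Consider two transactions running concurrently:
Transaction A: Withdraw 100 from account 1.
Transaction B: Query the balance of account 1.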
-- Transaction A
BEGIN TRANSACTION;
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 1;
-- Transaction not yet committed
-- Transaction B
SELECT balance FROM accounts
WHERE account_id = 1;
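If isolation is maintained properly (for example, at the READ COMMITTED level), Transaction B
sees the balance either as it was before Transaction A's update or, once Transaction A commits,
as it is after the update, but never the uncommitted intermediate state (a dirty read).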
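The scenario can be reproduced with two connections to the same SQLite file, using Python's built-in sqlite3 module. This is a sketch under assumptions: the starting balance of 500 and the temporary file name are illustrative, and the behaviour shown is SQLite's in its default journal mode; other databases implement isolation levels differently.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")  # hypothetical database file

    # isolation_level=None lets us issue BEGIN and COMMIT explicitly.
    conn_a = sqlite3.connect(path, isolation_level=None)
    conn_b = sqlite3.connect(path, isolation_level=None)

    conn_a.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER)")
    conn_a.execute("INSERT INTO accounts VALUES (1, 500)")  # assumed starting balance

    # Transaction A: withdraw 100, but do not commit yet.
    conn_a.execute("BEGIN")
    conn_a.execute("UPDATE accounts SET balance = balance - 100 WHERE account_id = 1")

    # Transaction B reads while A is still open: it sees the old balance.
    seen_before_commit = conn_b.execute(
        "SELECT balance FROM accounts WHERE account_id = 1"
    ).fetchall()[0][0]

    conn_a.execute("COMMIT")

    # After A commits, B sees the new balance.
    seen_after_commit = conn_b.execute(
        "SELECT balance FROM accounts WHERE account_id = 1"
    ).fetchall()[0][0]

    conn_a.close()
    conn_b.close()

print(seen_before_commit, seen_after_commit)
```

Transaction B only ever observes a committed balance; the value written by A stays private to A until its COMMIT.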
4. Durability
Durability ensures that once a transaction is committed, its effects are
permanent. Even in the case of a system crash, power failure, or other
catastrophic events, the changes made by committed transactions will
persist in the database.
Example:
Consider a scenario where you transfer 200 from one account to another.
Once the transaction is committed, the database guarantees that the
changes (the deduction from one account and the addition to the other) will
not be lost, even if there’s a power outage or a system failure.
BEGIN TRANSACTION;
UPDATE accounts
SET balance = balance - 200
WHERE account_id = 1;
UPDATE accounts
SET balance = balance + 200
WHERE account_id = 2;
COMMIT;
After the COMMIT statement, the changes to the accounts table are durable. Even if the
system crashes or there's a power failure immediately after the commit, the database
will ensure that these changes remain intact when the system comes back online.
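A rough way to observe this with Python's built-in sqlite3 module is to commit to a file, close the connection, and read the file again from a fresh connection. This is only a sketch: closing and reopening the connection stands in for a restart and does not simulate a real crash or power failure, and the file name and starting balances (500 and 300) are assumptions.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")  # hypothetical database file

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 300)])
    conn.commit()

    # The transfer of 200 from the example, committed as one transaction.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 200 WHERE account_id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 200 WHERE account_id = 2")

    conn.close()  # stand-in for the process going away after COMMIT

    # A brand-new connection reads the committed state back from disk.
    reopened = sqlite3.connect(path)
    balances_after_restart = dict(
        reopened.execute("SELECT account_id, balance FROM accounts")
    )
    reopened.close()

print(balances_after_restart)
```

In a real engine the guarantee comes from writing the change to durable storage (for example a journal or write-ahead log) before COMMIT returns, which this sketch relies on but does not test.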
Slowly Changing Dimensions (SCD)
SCD Type 0: Retain Original (No Change)
Design: The dimension table is designed to store static data, and no provisions are
made for changes.
Example
SCD Type 1: Overwrite
Purpose: The old data is overwritten with new data.
Use Case: When keeping history is not important, and you only care
about the most current information.
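A Type 1 change is a plain in-place UPDATE. The sketch below uses Python's built-in sqlite3 module; the customer_dimension columns (name, address) and the sample addresses are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dimension "
    "(customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT)"  # assumed columns
)
conn.execute("INSERT INTO customer_dimension VALUES (1, 'Jane Doe', '12 Oak Street')")

# Type 1: overwrite the attribute in place; the old address is not kept anywhere.
conn.execute(
    "UPDATE customer_dimension SET address = ? WHERE customer_id = ?",
    ("98 Pine Avenue", 1),
)
conn.commit()

rows = conn.execute(
    "SELECT customer_id, name, address FROM customer_dimension"
).fetchall()
print(rows)
```

There is still exactly one row per customer after the change, which is why this type cannot answer questions about where the customer used to live.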
SCD Type 2: Add New Row
Method: Instead of overwriting the existing record, a new row is inserted with the new attribute values, and the
old row is retained with additional metadata (e.g., effective dates, active/inactive flags) to track historical
versions.
Use Case: When it's important to track the history of changes over time.
Example: If a customer changes their address, a new row is added to the dimension table with the new address,
while the old row remains unchanged, representing the historical record.
-- 1. Expire the current record (close out the old version)
UPDATE customer_dimension
SET effective_end_date = CURDATE(),  -- Set the end date to today
    is_current = FALSE               -- Mark it as inactive
WHERE customer_id = 1                -- Filter by the specific customer
  AND is_current = TRUE;             -- Only update the current active record
-- 2. Insert a new record with the updated address (start a new version)
Key Point: Each new version of the data gets a new row, and historical versions
are retained by setting the end date and status flag.
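Both steps, expiring the current row and inserting the new version, can be sketched end to end with Python's built-in sqlite3 module. The change_address() helper, the surrogate key customer_key, the address and effective_start_date columns, and the sample dates are assumptions; only customer_id, effective_end_date and is_current appear in the SQL above. SQLite stores the flag as 1/0, and the change date is passed in rather than taken from CURDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_dimension (
        customer_key         INTEGER PRIMARY KEY AUTOINCREMENT, -- one per version (assumed)
        customer_id          INTEGER NOT NULL,                  -- business key
        address              TEXT,                              -- tracked attribute (assumed)
        effective_start_date TEXT,                              -- assumed column
        effective_end_date   TEXT,                              -- NULL while current
        is_current           INTEGER NOT NULL                   -- 1 = active, 0 = historical
    )
    """
)
conn.execute(
    "INSERT INTO customer_dimension "
    "(customer_id, address, effective_start_date, effective_end_date, is_current) "
    "VALUES (1, '12 Oak Street', '2023-01-01', NULL, 1)"
)
conn.commit()


def change_address(conn, customer_id, new_address, change_date):
    """Apply a Type 2 change: expire the current row, then add a new version."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE customer_dimension "
            "SET effective_end_date = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (change_date, customer_id),
        )
        conn.execute(
            "INSERT INTO customer_dimension "
            "(customer_id, address, effective_start_date, effective_end_date, is_current) "
            "VALUES (?, ?, ?, NULL, 1)",
            (customer_id, new_address, change_date),
        )


change_address(conn, 1, "98 Pine Avenue", "2024-06-01")

history = conn.execute(
    "SELECT address, effective_start_date, effective_end_date, is_current "
    "FROM customer_dimension WHERE customer_id = 1 ORDER BY customer_key"
).fetchall()
print(history)
```

Running the two statements in one transaction avoids a window in which the customer has no current row or two current rows.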
SCD Type 3: Add New Attribute (Track Limited History)
Purpose: Track limited history by adding new columns (attributes) for changes.
Method: Instead of adding new rows, a new column (attribute) is added to the table to
store the previous value of the attribute. This approach only tracks a limited number of
changes (usually just the previous value and the current value).
Use Case: Useful when only a few changes need to be tracked, and full historical tracking
is not necessary.
Example: A "previous address" column is added to store the old address, while the
current address is stored in a separate column.
Design:
Add columns to the dimension table for storing both the current and previous values
of attributes. This method only tracks limited historical changes, typically just one
previous version.
Key Point: Only a limited amount of historical information is stored (e.g., the previous
value).
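The current/previous column pair can be sketched with Python's built-in sqlite3 module. The column names current_address and previous_address, the change_address() helper, and the sample addresses are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dimension ("
    "customer_id INTEGER PRIMARY KEY, "
    "current_address TEXT, "    # assumed column name
    "previous_address TEXT)"    # assumed column name; holds one prior value
)
conn.execute("INSERT INTO customer_dimension VALUES (1, '12 Oak Street', NULL)")
conn.commit()


def change_address(conn, customer_id, new_address):
    """Apply a Type 3 change: shift current to previous, then store the new value."""
    with conn:
        # Expressions in SET read the row's old values, so previous_address
        # receives the address as it was before this UPDATE.
        conn.execute(
            "UPDATE customer_dimension "
            "SET previous_address = current_address, current_address = ? "
            "WHERE customer_id = ?",
            (new_address, customer_id),
        )


def fetch_row(conn, customer_id):
    return conn.execute(
        "SELECT customer_id, current_address, previous_address "
        "FROM customer_dimension WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()


change_address(conn, 1, "98 Pine Avenue")
after_first_change = fetch_row(conn, 1)

change_address(conn, 1, "7 Elm Road")
after_second_change = fetch_row(conn, 1)  # '12 Oak Street' is no longer stored
print(after_first_change, after_second_change)
```

After the second change the original address is gone, which shows the limit of this type: only as many past values survive as there are columns reserved for them.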
ANIL REDDY CHENCHU
DATA ANALYTICS