0% found this document useful (0 votes)
10 views

Complete Data Cleaning Guide On in SQL

Uploaded by

abhijeet161b001
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views

Complete Data Cleaning Guide On in SQL

Uploaded by

abhijeet161b001
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 93

Complete

Guide on

Data

cleaning

in sql

Email [email protected]
PART -1
Data

cleaning

IN sql

Email [email protected]
02

For this tutorial I have used


different Databases and tables.

This data is only for learning


purpose.

Let’s learn how to do Data


Cleaning in SQL.

03

Data Cleaning

INITIAL STEPS
First Step is to create a New table for our
cleaning purpose.

We should never clean data in original


dataset.

So let’s create a new temporary table


and insert our original data into it.

CREATE TABLE temp_employees LIKE employees;


04

Data Cleaning

CREATING TEMP TABLE


Employees is our original table and we
have created a same schema temporary
table from our original table.

SELECT * FROM temp_employees;


05

Data Cleaning

INSERTING VALUES IN TEMP TABLE

Now Let’s Insert data from our original


table to our temporary table to start data
cleaning.

INSERT INTO temp_employees

SELECT * FROM employees;


06

Data Cleaning

UNDERSTANDING SCHEMA
Now we have the data in our temporary
table we can start data cleaning.

Let’s look at the table data types and


constraints.

DESCRIBE temp_employees ;

07

Data Cleaning

UNDERSTANDING SCHEMA

We can now check the data types of the


columns whether it is correct or not.

We can also see all the constraints on our


columns.
08

Data Cleaning

CHANGING DATA TYPES


From the Schema we can clearly see that
hire_date column is of VARCHAR data type
instead of Date Type.

So let’s Convert this to correct data type.

We have 2 ways :

1. Altering the table

ALTER TABLE temp_employees

MODIFY COLUMN hire_date DATE;

09

Data Cleaning

CHANGING DATA TYPES


Casting and Converting Data

We use this when we don’t want to alter


our table.

SELECT STR_TO_DATE(hire_date,'%Y-%m-%d') as
hire_date FROM temp_employees;

We use this function to convert any column


to date data type to year-month-date
format.
10

Data Cleaning

CHANGING DATA TYPES


To Convert VARCHAR to INT we use

SELECT CAST(age as UNSIGNED) as age FROM


temp_employees;

VARCHAR to DATE

SELECT CONVERT(hire_date,DATE) as hire_date FROM


temp_employees;

11

Data Cleaning

CHANGING DATA TYPES


To Convert VARCHAR to DECIMAL

SELECT CONVERT(‘12345’,DECIMAL(10,2)) as salary


FROM temp_employees;

12

This is just the start of data


cleaning we have many things to
cover in this series.

So make sure to follow me and


save this pdf.

After all parts are done I will


merge all this pdf into single
data cleaning SQL Pdf.

For More Education Content

FOLLOW

ME ON

LINKEDIN

Please Like and


Repost if you find

this helpful

LinkedIn www.linkedin.com/in/dhruvik-detroja/
PART -2
Data

cleaning

IN sql

Email [email protected]
02

For this tutorial I have used


different Databases and tables.

This data is only for learning


purpose.

Let’s learn how to do Data


Cleaning in SQL.

Check Part 1 in Comment


Section.

03

Data Cleaning

HANDLING MISSING DATA


We have customers table like this.

First we need to identify missing data.


04

Data Cleaning

IDENTIFYING MISSING DATA


Step 1 : Identify Missing Data

SELECT

COUNT(*) AS TotalRows,

COUNT(CASE WHEN FullName IS NULL THEN 1 END) AS


MissingFullName,

COUNT(CASE WHEN Age IS NULL THEN 1 END) AS


MissingAge,

COUNT(CASE WHEN Email IS NULL THEN 1 END) AS


MissingEmail,

COUNT(CASE WHEN PhoneNumber IS NULL THEN 1 END) AS


MissingPhoneNumber,

COUNT(CASE WHEN SignupDate IS NULL THEN 1 END) AS


MissingSignupDate

FROM Customers;

05

Data Cleaning

IDENTIFYING MISSING DATA


This will return the count of null values
in our table.
06

Data Cleaning

ANALYZING MISSING DATA


Step 2 : Analyzing Missing Values

SELECT * FROM Customers WHERE

FullName IS NULL OR

Age IS NULL OR

Email IS NULL OR

PhoneNumber IS NULL OR

SignupDate IS NULL;
07

Data Cleaning

ANALYZING MISSING DATA


Now we have all the rows having some
kind of null values.

We can look at this data and we should


try to understand what can we do with
this null values.

There is only 2 options we can do to


handle this missing values
Delete Missing Dat
Adding Values to that Missing Data

08

Data Cleaning

HANDLING MISSING DATA


Step 3 : New Values for Missing Data

a) When we have missing data for


numerical columns we can set those
missing value according to mean,
median or mode approach.

Remember how to handle missing


completely depends on what kind of
data you are working on and applying
the domain knowledge on that dataset.

It should make sense to apply those


values when we have big dataset but
very low null values.

09

Data Cleaning

HANDLING MISSING DATA

UPDATE customers

SET Age = (SELECT Age FROM (SELECT AVG(Age) as Age


FROM customers) AS S)

WHERE Age IS NULL;

Using this query we are applying Mean


age to our all null values in age column.
10

Data Cleaning

HANDLING MISSING DATA


b) Imputing with some other value

We have phone number column we don’t


have any value we can put in this row so
we can write unknown to this column.

UPDATE Customers

SET PhoneNumber = 'Unknown'

WHERE PhoneNumber IS NULL;


11

Data Cleaning

HANDLING MISSING DATA


b) Imputing with some other value

Look at this data


12

Data Cleaning

HANDLING MISSING DATA


By Looking at that missing data and
applying simple logic we can say
industry for Airbnb should be Travel

So this time we have not null value we


have blank value which is different thing

We can have blank values as well as Null


Values in our dataset.

So Check it properly also for blank


values.
13

Data Cleaning

HANDLING MISSING DATA

UPDATE customers

SET industry = 'Travel'

WHERE company='Airbnb';
14

Data Cleaning

HANDLING MISSING DATA


Step 4 : Deleting Missing Data

This step should be avoided as this


involve data loss.

This should be our least priority when


handling null values.

If we don’t have any other option then


deleting the data then only we should
use this.

When dataset is bigger and there are


only few null values then it’s fine to
delete this data.

15

Data Cleaning

HANDLING MISSING DATA


Let’s remove data which doesn’t have
FullName.

DELETE FROM Customers

WHERE FullName IS NULL;


16

Data Cleaning

HANDLING MISSING DATA

We have successfully deleted the record.


17

This is just the start of data


cleaning we have many things to
cover in this series.

So make sure to follow me and


save this pdf.

After all parts are done I will


merge all this pdf into single
data cleaning SQL Pdf.

For More Education Content

FOLLOW

ME ON

LINKEDIN

Please Like and


Repost if you find

this helpful

LinkedIn www.linkedin.com/in/dhruvik-detroja/
PART -3
Data

cleaning

IN sql

Email [email protected]
02

For this tutorial I have used


different Databases and tables.

This data is only for learning


purpose.

Let’s learn how to do Data


Cleaning in SQL.

Check Part 2 in Comment


Section.

03

Data Cleaning

HANDLING DUPLICATE DATA


We have customers table like this.

First we need to identify duplicate data.


04

Data Cleaning

IDENTIFYING DUPLICATE DATA


First we need to identify duplicate data.

SELECT

FullName, Age, Email, PhoneNumber, SignupDate, COUNT(*)

FROM

Customers

GROUP BY

FullName, Age, Email, PhoneNumber, SignupDate

HAVING

COUNT(*) > 1;

05

Data Cleaning

IDENTIFYING DUPLICATE DATA


As you can see we are grouping all
columns and also using Count(*) > 1 which
give us records which are duplicate.

We got Count value as 3 this tell us that


this record is written 3 times and so we
need to remove 2 duplicate records from
this table.
06

Data Cleaning

IDENTIFYING DUPLICATE DATA


But we don’t have CustomerID for this
column so we need to write some kind of
code to find it.

We can also delete this record giving all


column values in our delete statement or
we can find CustomerID for records and
then delete records using those
CustomerID.
07

Data Cleaning

IDENTIFYING DUPLICATE DATA


This is the code to find CustomerID

SELECT

CustomerID, FullName, Age, Email, PhoneNumber, SignupDate

FROM

Customers

WHERE

(FullName, Age, Email, PhoneNumber, SignupDate) IN (

SELECT

FullName, Age, Email, PhoneNumber, SignupDate

FROM

Customers

GROUP BY

FullName, Age, Email, PhoneNumber, SignupDate

HAVING

COUNT(*) > 1

ORDER BY

FullName, Age, Email, PhoneNumber, SignupDate;

08

Data Cleaning

REMOVING DUPLICATE DATA


We can keep first occurance and remove
other duplicate records.

DELETE FROM Customers

WHERE CustomerID IN (4,7)

Note: Before doing any Update/Delete


Operation also Commit the database.

This is important when we might have


done something wrong we can always
rollback to that state of database again.

Next Query We can use so we do not need


to hard code any value.
09

Data Cleaning

REMOVING DUPLICATE DATA


WITH CTE AS (

SELECT

CustomerID,

FullName,

Age,

Email,

PhoneNumber,

SignupDate,

ROW_NUMBER() OVER (PARTITION BY FullName, Age, Email,


PhoneNumber, SignupDate ORDER BY CustomerID) AS row_num

FROM

Customers

-- Step 2: Delete duplicates, keeping the first occurrence

DELETE FROM Customers

WHERE CustomerID IN (

SELECT CustomerID

FROM CTE

WHERE row_num > 1

);
10

Data Cleaning

RECHECKING DUPLICATE DATA


Using the same SQL Query we should
double check at the end whether All
duplicate records are removed or not.
For More Education Content

FOLLOW

ME ON

LINKEDIN

Please Like and


Repost if you find

this helpful

LinkedIn www.linkedin.com/in/dhruvik-detroja/
PART -4

Data

cleaning

IN sql

Email [email protected]
02

For this tutorial I have used


different Databases and tables.

This data is only for learning


purpose.

Let’s learn how to do Data


Cleaning in SQL.

Check Part 3 in Comment


Section.

03

Data Cleaning

STANDARDIZING
Standardizing is the process to make
data standardize means ensuring all the
data values are in perfect format.

You will Understand this topic better


with examples.

So we can standardize different


categories of data like text , numbers
and dates.

Let’s start standardizing.


04

Data Cleaning

STANDARDIZING DATES
We have this Orders Table let’s
standardize it into proper date format.

We will Standardize it to yyyy-mm-dd


format.
05

Data Cleaning

STANDARDIZING DATES

-- Format: 'YYYY/MM/DD'

UPDATE Orders

SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate,

'%Y/%m/%d'), '%Y-%m-%d')

WHERE OrderDate LIKE '%/%/%';

-- Format: 'DD-MM-YYYY'

UPDATE Orders

SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate,

'%d-%m-%Y'), '%Y-%m-%d')

WHERE OrderDate LIKE '__-__-____';

STR_TO_DATE() function is used to convert


string to date data type.
06

Data Cleaning

STANDARDIZING DATES

-- Format: 'Month DD, YYYY'

UPDATE Orders

SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate, '%M


%d, %Y'), '%Y-%m-%d')

WHERE OrderDate LIKE '% %,%';

-- Format: 'YYYY.MM.DD'

UPDATE Orders

SET OrderDate = DATE_FORMAT(STR_TO_DATE(OrderDate, '%Y.


%m.%d'), '%Y-%m-%d')

WHERE OrderDate LIKE '%.%.%';

After we have date format we can format


this date using DATE_FORMAT() function.
07

Data Cleaning

STANDARDIZING DATES
Here is the result.
08

Data Cleaning

STANDARDIZING TEXT FIELD


We have this employees table and as you
can see text field needs to be standardized
as there are some spaces also some words
are in capital some in smaller letter.

Let’s standardize this text fields.


09

Data Cleaning

STANDARDIZING TEXT FIELD

-- Convert full name to proper case

UPDATE Employees

SET FullName = CONCAT(

UPPER(LEFT(TRIM(FullName), 1)),
LOWER(SUBSTR(TRIM(FullName), 2))

);

-- Trim leading and trailing spaces from email addresses

UPDATE Employees

SET Email = TRIM(Email);

TRIM() Function is used to remove wide


spaces from both sides then LEFT function
is used to retrieve the characters from left
side mentioning index we have used 1 so it
will take 1st character of our FullName.
10

Data Cleaning

STANDARDIZING TEXT FIELD


UPPER() is used to uppercase our
character and here we want 1st character
upper, then SUBSTR() is used to retrieve
substring by mentioning index we have
used 2 that mean start from 2nd index to
end of characters so we will get resultant
substring starting from 2nd position to end

LOWER() is used to lower case the letters.

Then We have used CONCAT to merge


both strings the 1st character which will be
now capital and after string will be in
Lower case

Ex-dhruvik d | hruvik = Dhruvik


11

Data Cleaning

STANDARDIZING TEXT FIELD


This will be our Result.

I have used multiple functions so you get


opportunity to learn all functions try those
functions separately when practicing.
12

Data Cleaning

STANDARDIZING PHONE NUMS


We have Contacts Table like this

Let’s Standardize this.

13

Data Cleaning

STANDARDIZING PHONE NUMS


We have Contacts Table like this
UPDATE Contacts

SET PhoneNumber = CONCAT(

SUBSTR(PhoneNumber, 2, 3), '-',

SUBSTR(PhoneNumber, 7, 3), '-',

SUBSTR(PhoneNumber, 11, 4)

WHERE PhoneNumber LIKE '(%';

-- Remove non-numeric characters and format as '123-456-7890'

UPDATE Contacts

SET PhoneNumber = CONCAT(

SUBSTR(PhoneNumber, 1, 3), '-',

SUBSTR(PhoneNumber, 5, 3), '-',

SUBSTR(PhoneNumber, 9, 4))

WHERE PhoneNumber LIKE '%-%' OR PhoneNumber LIKE '%.%'

OR PhoneNumber LIKE '% %';

14

Data Cleaning

STANDARDIZING PHONE NUMS


Note :

When we talk about using SUBSTR() let’s


understand how it works 1st argument is
column_name , 2nd argument is Index
Position from which we want to retrieve
the string, 3rd argument is how much
characters we want after that index
position.

15

Data Cleaning

STANDARDIZING ADDRESS
Now we have Addresses Table like this.

Let’s Standardize this.


16

Data Cleaning

STANDARDIZING ADDRESS
We have Contacts Table like this
-- Convert state abbreviations to uppercase

UPDATE Addresses

SET State = UPPER(State);

-- Standardize street suffixes (e.g., 'st.' to 'Street')

UPDATE Addresses

SET Street = REPLACE(Street, ' st.', ' Street')

WHERE Street LIKE '% st.';

-- Remove extra spaces and standardize city names

UPDATE Addresses

SET City = CONCAT(

UPPER(LEFT(TRIM(City), 1)),

LOWER(SUBSTR(TRIM(City), 2))

);

-- Standardize ZIP code format (e.g., '90001-1234' to '90001')

UPDATE Addresses

SET ZipCode = SUBSTR(TRIM(ZipCode), 1, 5)

WHERE ZipCode LIKE '%-%';

17

Data Cleaning

STANDARDIZING ADDRESS
This time we have a new REPLACE()
function which has 1st argument as column
name, 2nd argument value we want to
replace and 3rd argument value which we
want to replace with.

We already know other stuff so here is the


result.
18

Data Cleaning

END OF STANDARDIZING
We have covered most of the important
stuffs in our data cleaning series and we
have few topics left in the series.

If you find my content value then please


like , repost and share it this will help me
grow my engagement and provide more
value to the community.

Check Comment Box for other parts of this


series.

Thank You!

Dhruvik Detroja
For More Education Content

FOLLOW

ME ON

LINKEDIN

Please Like and


Repost if you find

this helpful

LinkedIn www.linkedin.com/in/dhruvik-detroja/
PART -5

Data

cleaning

IN sql

Email [email protected]
02

For this tutorial I have used


different Databases and tables.

This data is only for learning


purpose.

Let’s learn how to do Data


Cleaning in SQL.

Check Part 4 in Comment


Section.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
03

Data Cleaning

OUTLIERS
Outliers are the data points which
deviate significantly from the rest of the
dataset.

Outliers can distort statistical analysis


and lead to inaccurate conclusions.

So today we are going to learn how to


handle outliers with step by step
process.

Ready?

Let’s Go

LinkedIn www.linkedin.com/in/dhruvik-detroja/
04

Data Cleaning

IDENTIFYING OUTLIERS
To identify outliers we have different
method let’s learn them

1. Z Score Method:

Measures how many standard deviations


a data point is from the mean.

Normally we use Z score technique when


data is normally distributed.

We have sales table like this.

Let’s learn how to find Z score.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
05

Data Cleaning

IDENTIFYING OUTLIERS

LinkedIn www.linkedin.com/in/dhruvik-detroja/
06

Data Cleaning

IDENTIFYING OUTLIERS

SELECT

SaleID,

CustomerID,

SaleAmount,

(SaleAmount - (SELECT AVG(SaleAmount) FROM Sales)) /


(SELECT STDDEV(SaleAmount) FROM Sales) AS ZScore

FROM

Sales

HAVING

ABS(ZScore) > 3;

-- Assuming a threshold of 3 for identifying outliers

Mostly if Z score is above +-3 we


consider it as outlier.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
07

Data Cleaning

IDENTIFYING OUTLIERS
2. IQR Method:

Interquartile Range (IQR) is used to


identify outliers mostly used in skewed
distributions.

IQR is range between Q1 (first quartile)


and Q3 (third quartile)

IQR= Q3-Q1

Outliers are below Q1-1.5*IQR

OR

Outliers are above Q3+1.5*IQR

LinkedIn www.linkedin.com/in/dhruvik-detroja/
08

Data Cleaning

IDENTIFYING OUTLIERS
We have the Products table like this.

Let’s find outliers

LinkedIn www.linkedin.com/in/dhruvik-detroja/
09

Data Cleaning

IDENTIFYING OUTLIERS

-- Calculate the total number of rows

SELECT COUNT(*) INTO @row_count FROM Products;

-- Calculate Q1

WITH OrderedPrices AS (

SELECT

Price,

ROW_NUMBER() OVER (ORDER BY Price) AS RowNum

FROM Products

),

Quartiles AS (

SELECT

Price AS Q1

FROM OrderedPrices

WHERE RowNum = FLOOR(@row_count * 0.25) + 1

SELECT Q1 INTO @Q1 FROM Quartiles;

LinkedIn www.linkedin.com/in/dhruvik-detroja/
10

Data Cleaning

IDENTIFYING OUTLIERS

-- Calculate Q3

WITH OrderedPrices AS (

SELECT

Price,

ROW_NUMBER() OVER (ORDER BY Price) AS RowNum

FROM Products

),

Quartiles AS (

SELECT

Price AS Q3

FROM OrderedPrices

WHERE RowNum = FLOOR(@row_count * 0.75) + 1

SELECT Q3 INTO @Q3 FROM Quartiles;

LinkedIn www.linkedin.com/in/dhruvik-detroja/
11

Data Cleaning

IDENTIFYING OUTLIERS

SET @IQR = @Q3 - @Q1;

-- Calculate the lower and upper bounds for outliers

SET @LowerBound = @Q1 - 1.5 * @IQR;

SET @UpperBound = @Q3 + 1.5 * @IQR;

-- Identify outliers

SELECT

ProductID,

ProductName,

Price

FROM

Products

WHERE

Price < @LowerBound OR Price > @UpperBound;

LinkedIn www.linkedin.com/in/dhruvik-detroja/
12

Data Cleaning

IDENTIFYING OUTLIERS
Here is the result and we have successfully
found the outlier.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
13

Data Cleaning

ANALYZING OUTLIERS
This was just a sample dataset.

But Real world don’t works always on


formula there might be cases where you
find a data point which is very deviated
from dataset but it might not be outlier.

But for your analysis purpose you might


remove it or might not it completely
depends on what data we have in hand.

So Analyzing Outliers is very important


before taking any next step so check data
entry errors, understand business context
or identify rare but valid cases.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
14

Data Cleaning

ANALYZING OUTLIERS
Take Example of rooms booked in a hotel

dataset is like:

Rooms Booked
Here 12 is outlier but still
it’s possible in real world
1

to book 12 rooms at a
2
time.

2
So might not be any
12
data entry error here.

3 Now to remove or not


completely depends on
analysis you want to
perform.
LinkedIn www.linkedin.com/in/dhruvik-detroja/
15

Data Cleaning

DECIDING HANDLING STRATEGY


We can do following things to handle
outlier after detecting
Remove Outlier
Replacing Outliers with some other
value.

You can replace it with mean, maybe with


median or with some other value it
completely depends on what type of
analysis you doing or will replacement
value will make sense for those outliers.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
16

Data Cleaning

REMOVING OUTLIERS
Simply delete the data after identifying
outliers if you want to remove outliers.

We have identified lower bound and upper


bound using IQR so let’s remove data
which outside of this bound.
We have Contacts Table like this
DELETE FROM Products

WHERE Price < @LowerBound OR Price > @UpperBound

For Replacing Value simply use Update


statement with your replacement value.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
For More Education Content

FOLLOW

ME ON

LINKEDIN

Please Like and

Repost if you find

this helpful

LinkedIn www.linkedin.com/in/dhruvik-detroja/
PART -6
Data

cleaning

IN sql

Email [email protected]
02

For this tutorial I have used


different Databases and tables.

This data is only for learning


purpose.

Let’s learn how to do Data


Cleaning in SQL.

Check Part 5 in Comment


Section.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
03

Data Cleaning

ENSURING CONSTRAINTS
During Data Cleaning Process its
important that we should add
meaningful constraints to our columns.

So we can minimize our data insertion


errors.

Consider a simple employees table from


the next page.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
04

Data Cleaning

ENSURING CONSTRAINTS

CREATE TABLE Employees (

EmployeeID INT,

FirstName VARCHAR(50),

LastName VARCHAR(50),

Email VARCHAR(100),

PhoneNumber VARCHAR(20),

HireDate DATETIME,

Salary DECIMAL(10, 2),

DepartmentID INT

);

Now Applying Constraints in these columns


will help in future preventing data insertion
errors.
LinkedIn www.linkedin.com/in/dhruvik-detroja/
05

Data Cleaning

ENSURING CONSTRAINTS

ALTER TABLE Employees

ADD CONSTRAINT uq_email UNIQUE (Email);

ALTER TABLE Employees

ADD CONSTRAINT chk_salary CHECK (Salary >= 0);

ALTER TABLE Employees

MODIFY COLUMN HireDate DATETIME DEFAULT NOW();

Similarly we can also apply primary key,

not null foreign key constraints.

What constraints you need to apply


completely depends on your data.
LinkedIn www.linkedin.com/in/dhruvik-detroja/
06

Data Cleaning

REMOVING UNNECESSARY COLUMNS

Sometimes during data cleaning we might


want to remove some unnecessary
columns which might not be very useful to
our data analysis so we should drop those
columns.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
07

Data Cleaning

REMOVING UNNECESSARY COLUMNS

ALTER TABLE employees

DROP COLUMN MiddleName;

As we are working in temporary tables for


our data cleaning so its fine to remove
those unnecessary columns.

The original data will be still present in our


main data table.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
08

Data Cleaning

DERIVING NEW COLUMN


When we are performing data analysis we
might want to derive some new column
using some calculations in our existing
table.

Take similar example of our employees


table.

Let’s derive new columns for our table


let’s see what types of new columns we
can create.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
09

Data Cleaning

DERIVING NEW COLUMN


We have birthdate column we need to
derive new age column.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
10

Data Cleaning

DERIVING NEW COLUMN


Add column to the table

We have birthdate column we need to


derive newEmployees

ALTER TABLE age column.


ADD COLUMN Age INT;

Adding data to New Column

We have birthdate column we need to


derive
UPDATE new age column.
Employees

SET Age = YEAR(CURDATE()) - YEAR(BirthDate)

- (DATE_FORMAT(CURDATE(), '%m-%d') <


DATE_FORMAT(BirthDate, '%m-%d'));

LinkedIn www.linkedin.com/in/dhruvik-detroja/
11

Data Cleaning

DERIVING NEW COLUMN


We have salary column we need to derive
bonus for our employees.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
12

Data Cleaning

DERIVING NEW COLUMN


Add column to the table

We have birthdate column we need to


derive newEmployees

ALTER TABLE age column.


ADD COLUMN YearlyBonus DECIMAL(10, 2);

We assuming bonus to be 15% of salary

We have birthdate column we need to


derive
UPDATE new age column.
Employees

SET YearlyBonus = Salary * 0.15;

LinkedIn www.linkedin.com/in/dhruvik-detroja/
13

Data Cleaning

DERIVING NEW COLUMN

Here we go we have our new derived


columns which can be useful for our data
analysis.

So this is how we can derive new columns


during our data cleaning , it’s actually a
part of preparing data for our future data
analysis.

LinkedIn www.linkedin.com/in/dhruvik-detroja/
For More Education Content

FOLLOW

ME ON

LINKEDIN

Please Like and


Repost if you find

this helpful

LinkedIn www.linkedin.com/in/dhruvik-detroja/

You might also like