SQL - Data Cleaning

Sql question

Uploaded by

559aryan.ar3

Data Engineering
Data Cleaning using SQL

Seekho Bigdata Institute


https://www.seekhobigdata.com/
1. Removing Duplicates

Purpose:
To delete duplicate rows from a table and retain unique records only.

Syntax:

DELETE FROM table_name
WHERE id NOT IN (
    SELECT MIN(id)
    FROM table_name
    GROUP BY column1, column2, ...
);

Explanation:

This query groups rows by the columns that define a duplicate (column1,
column2, ...) and keeps the row with the minimum id value in each group.
All other rows in the group are deleted. It assumes the table has a
unique id column.
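
As a rough illustration, the query above can be tried on an in-memory SQLite database from Python. The customers table, its columns, and the sample rows are hypothetical and exist only for this sketch; other database engines may need the statement adapted.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [
        ("Asha", "asha@example.com"),
        ("Ravi", "ravi@example.com"),
        ("Asha", "asha@example.com"),  # duplicate of the first row
    ],
)

# Keep the lowest id in each (name, email) group and delete the rest.
conn.execute(
    """
    DELETE FROM customers
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM customers
        GROUP BY name, email
    )
    """
)
conn.commit()

rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
print(rows)
```

Running a SELECT with the same GROUP BY before the DELETE is a cheap way to check which rows will be kept.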

2. Handling NULL Values

Purpose:
To replace or fill NULL values in columns to ensure consistency in data.

Syntax:

UPDATE table_name
SET column_name = default_value
WHERE column_name IS NULL;

Explanation:

This replaces every NULL in the given column with a specified
default_value, so the column no longer contains missing entries.
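
A minimal sketch of this update, run against an in-memory SQLite database from Python. The employees table, the city column, and the default value 'Unknown' are assumptions chosen for the example.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO employees (name, city) VALUES (?, ?)",
    [("Asha", "Pune"), ("Ravi", None), ("Meena", None)],
)

# Fill every NULL city with a default value ('Unknown' is an assumption).
cur = conn.execute(
    "UPDATE employees SET city = ? WHERE city IS NULL", ("Unknown",)
)
updated = cur.rowcount  # number of rows the UPDATE changed
conn.commit()

cities = [row[0] for row in conn.execute("SELECT city FROM employees ORDER BY id")]
print(updated, cities)
```

The WHERE column_name IS NULL filter matters: NULL cannot be matched with the = operator, so IS NULL is the only way to select these rows.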

3. Trimming Whitespaces

Purpose:
To remove extra spaces from text data that can cause inconsistencies
during analysis.

Syntax:

UPDATE table_name
SET column_name = TRIM(column_name);

Explanation:

The TRIM function removes leading and trailing spaces from the text in
column_name.
Spaces inside the text are left as they are.
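
The effect can be checked with a small sketch on an in-memory SQLite database. The products table and its sample values are hypothetical.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [("  Laptop ",), ("Mouse",), ("Key board  ",)],
)

# Strip spaces from both ends of every name.
conn.execute("UPDATE products SET name = TRIM(name)")
conn.commit()

names = [row[0] for row in conn.execute("SELECT name FROM products ORDER BY id")]
print(names)
```

Note that the space inside 'Key board' survives, since TRIM only touches the two ends of the string.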

4. Standardizing Data (e.g., Case Consistency)

Purpose:
To maintain uniformity in data entries (e.g., all text in uppercase or
lowercase).

Syntax:

UPDATE table_name
SET column_name = UPPER(column_name);

Explanation:

Using UPPER or LOWER, you can make all text data in a column
consistent in case (e.g., all uppercase).
This avoids mismatches when values are compared, grouped, or joined
case-sensitively.
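
A short sketch of why this matters, using an in-memory SQLite database. The customers table and its country values are hypothetical; the distinct count before and after the update shows the variants collapsing into one spelling.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany(
    "INSERT INTO customers (country) VALUES (?)",
    [("india",), ("India",), ("INDIA",), ("Nepal",)],
)

count_sql = "SELECT COUNT(DISTINCT country) FROM customers"
before = conn.execute(count_sql).fetchone()[0]

# Convert every value to uppercase so the spellings agree.
conn.execute("UPDATE customers SET country = UPPER(country)")
conn.commit()

after = conn.execute(count_sql).fetchone()[0]
countries = [
    row[0]
    for row in conn.execute("SELECT DISTINCT country FROM customers ORDER BY country")
]
print(before, after, countries)
```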

5. Filtering Out Unwanted Data (e.g., Rows with Invalid Data)

Purpose:
To delete rows that don’t meet specific quality criteria.

Syntax:

DELETE FROM table_name
WHERE column_name = 'invalid_value';

Explanation:

This removes rows whose column_name matches a known bad value (here
'invalid_value'), so that only rows passing the check remain.
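
The following sketch runs this delete on an in-memory SQLite database. The orders table and its status values are hypothetical. It also includes one NULL status to show that an equality filter does not match NULL, so such rows need their own IS NULL condition if they should go too.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("shipped",), ("invalid_value",), (None,), ("pending",), ("invalid_value",)],
)

# Delete only the rows whose status equals the known bad value.
cur = conn.execute("DELETE FROM orders WHERE status = ?", ("invalid_value",))
deleted = cur.rowcount  # number of rows removed
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT status FROM orders ORDER BY id")]
print(deleted, remaining)
```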

6. Converting Data Types

Purpose:
To ensure columns have the correct data type (e.g., converting text
dates to DATE type).

Syntax:

ALTER TABLE table_name
ALTER COLUMN column_name TYPE new_data_type USING expression;

Explanation:

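Storing each column in its correct data type makes it easier to work
with the data consistently.
USING expression can handle any necessary transformations (e.g.,
converting string format dates to DATE).
The ALTER COLUMN ... TYPE ... USING form shown here is PostgreSQL
syntax; other databases use different statements for the same task.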
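
SQLite, used here only so the sketch runs without a database server, cannot change a column's type in place. The example below therefore swaps in a different technique: it adds a new typed column and fills it with a CAST expression, which plays the role of the USING expression above. The readings table, the amount column, and the amount_int column are all hypothetical.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, amount TEXT)")
conn.executemany(
    "INSERT INTO readings (amount) VALUES (?)",
    [("12",), (" 7 ",), ("30",)],
)

# No in-place type change in SQLite: add a typed column, then fill it
# with the conversion expression (trim first, then cast to INTEGER).
conn.execute("ALTER TABLE readings ADD COLUMN amount_int INTEGER")
conn.execute("UPDATE readings SET amount_int = CAST(TRIM(amount) AS INTEGER)")
conn.commit()

converted = conn.execute(
    "SELECT amount_int, typeof(amount_int) FROM readings ORDER BY id"
).fetchall()
total = conn.execute("SELECT SUM(amount_int) FROM readings").fetchone()[0]
print(converted, total)
```

Once the values are stored as integers, numeric operations such as SUM behave as expected instead of depending on text-to-number coercion.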

7. Removing Outliers (Advanced)

Purpose:
To remove extreme values that can skew data analysis.

Syntax:

DELETE FROM table_name
WHERE column_name < lower_limit OR column_name > upper_limit;

Explanation:

Specify a lower_limit and upper_limit for acceptable values in a
column.
Rows with values outside this range are deleted as outliers, and rows
within the range (including the limits themselves) are kept.
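
A minimal sketch of the range filter on an in-memory SQLite database. The measurements table, its values, and the limits 0 and 100 are assumptions picked for the example; in practice the limits would come from domain knowledge or from statistics computed on the data.

```python
import sqlite3

# Hypothetical table, sample rows, and limits, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    [(10,), (12,), (11,), (500,), (-40,)],
)

lower_limit, upper_limit = 0, 100

# Delete every row that falls outside the accepted range.
cur = conn.execute(
    "DELETE FROM measurements WHERE value < ? OR value > ?",
    (lower_limit, upper_limit),
)
removed = cur.rowcount  # number of outlier rows deleted
conn.commit()

kept = [row[0] for row in conn.execute("SELECT value FROM measurements ORDER BY id")]
print(removed, kept)
```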

Thank you

Seekho Bigdata Institute


https://www.seekhobigdata.com/
+91 99894 54737
