SQL - Data Cleaning
Data Cleaning using SQL
1. Removing Duplicates
Purpose:
To delete duplicate rows from a table and retain unique records only.
Syntax:
Explanation:
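A minimal sketch of one common way to remove duplicates, assuming a hypothetical customers table with a unique id column, where two rows count as duplicates when both name and email match:

```sql
-- Hypothetical table and sample rows, for illustration only
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    name  VARCHAR(50),
    email VARCHAR(100)
);

INSERT INTO customers (id, name, email) VALUES
    (1, 'Asha', 'asha@example.com'),
    (2, 'Asha', 'asha@example.com'),   -- duplicate of row 1
    (3, 'Ravi', 'ravi@example.com');

-- Keep the row with the lowest id in each (name, email) group
-- and delete every other row in that group.
DELETE FROM customers
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM customers
        GROUP BY name, email
    ) AS keepers
);

-- Rows 1 and 3 remain.
SELECT id, name, email FROM customers ORDER BY id;
```

The inner query picks one row to keep per group, and the DELETE removes everything else. The extra derived table (aliased keepers, a name chosen here) is there because MySQL rejects a subquery that reads directly from the table being deleted from. Running the inner SELECT on its own first shows which rows would be kept.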
https://www.seekhobigdata.com/
2. Handling NULL Values
Purpose:
To replace or fill NULL values in columns to ensure consistency in data.
Syntax:
UPDATE table_name
SET column_name = default_value
WHERE column_name IS NULL;
Explanation:
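The UPDATE statement writes default_value into every row where column_name is NULL and leaves all other rows unchanged. A minimal sketch, assuming a hypothetical employees table and 'Unknown' as the chosen default:

```sql
-- Hypothetical table and sample rows, for illustration only
CREATE TABLE employees (
    id         INTEGER PRIMARY KEY,
    name       VARCHAR(50),
    department VARCHAR(50)
);

INSERT INTO employees (id, name, department) VALUES
    (1, 'Meera', 'Finance'),
    (2, 'Kiran', NULL);

-- Non-destructive option: substitute the default at query time only.
SELECT id, name, COALESCE(department, 'Unknown') AS department
FROM employees;

-- Permanent option: overwrite the NULLs in the table itself.
UPDATE employees
SET department = 'Unknown'
WHERE department IS NULL;
```

COALESCE returns its first non-NULL argument, so the SELECT is a way to preview the result before committing to the UPDATE. The test must be written as IS NULL, because a comparison such as department = NULL never evaluates to true.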
3. Trimming Whitespace
Purpose:
To remove extra spaces from text data that can cause inconsistencies
during analysis.
Syntax:
UPDATE table_name
SET column_name = TRIM(column_name);
Explanation:
The TRIM function removes leading and trailing spaces from the text
in column_name. Spaces inside the text are left untouched.
After trimming, values that differ only by surrounding spaces compare
as equal in filters, joins, and grouping.
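As an illustration, a short sketch assuming a hypothetical cities table whose city column contains stray spaces:

```sql
-- Hypothetical table and sample rows, for illustration only
CREATE TABLE cities (
    id   INTEGER PRIMARY KEY,
    city VARCHAR(50)
);

INSERT INTO cities (id, city) VALUES
    (1, '  Pune'),
    (2, 'Delhi   '),
    (3, ' New  Delhi ');

-- Strip leading and trailing spaces from every value.
UPDATE cities
SET city = TRIM(city);

-- Values are now 'Pune', 'Delhi' and 'New  Delhi';
-- the double space inside row 3 is not touched by TRIM.
SELECT id, city FROM cities ORDER BY id;
```

Most databases also provide LTRIM and RTRIM for trimming one side only.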
4. Standardizing Data (e.g., Case Consistency)
Purpose:
To maintain uniformity in data entries (e.g., all text in uppercase or
lowercase).
Syntax:
UPDATE table_name
SET column_name = UPPER(column_name);
Explanation:
Using UPPER or LOWER, you can convert all text in a column to a
single case (e.g., all uppercase).
This prevents values that differ only in case from being treated as
different in case-sensitive comparisons, joins, and grouping.
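A small sketch of case standardization, assuming a hypothetical contacts table where emails are stored in lowercase and country codes in uppercase:

```sql
-- Hypothetical table and sample rows, for illustration only
CREATE TABLE contacts (
    id           INTEGER PRIMARY KEY,
    email        VARCHAR(100),
    country_code VARCHAR(2)
);

INSERT INTO contacts (id, email, country_code) VALUES
    (1, 'Asha@Example.COM', 'in'),
    (2, 'asha@example.com', 'IN');

-- Normalize both columns in one pass.
UPDATE contacts
SET email        = LOWER(email),
    country_code = UPPER(country_code);

-- Both rows now hold 'asha@example.com' and 'IN',
-- so the duplicate becomes visible to GROUP BY or DISTINCT.
SELECT email, country_code, COUNT(*) AS copies
FROM contacts
GROUP BY email, country_code;
```

Standardizing case before removing duplicates is usually the more useful order, since rows that differ only in case would otherwise survive deduplication.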
5. Filtering Out Unwanted Data (e.g., Rows with Invalid Data)
Purpose:
To delete rows that don’t meet specific quality criteria.
Syntax:
Explanation:
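A minimal sketch of this step, assuming a hypothetical orders table in which a non-positive quantity or a missing order date marks a row as invalid (both rules are examples, not fixed criteria):

```sql
-- Hypothetical table and sample rows, for illustration only
CREATE TABLE orders (
    id         INTEGER PRIMARY KEY,
    quantity   INTEGER,
    order_date DATE
);

INSERT INTO orders (id, quantity, order_date) VALUES
    (1,  5, '2024-01-10'),
    (2, -3, '2024-01-11'),   -- invalid: negative quantity
    (3,  2, NULL);           -- invalid: missing date

-- Preview how many rows the rule matches before deleting anything.
SELECT COUNT(*) AS rows_to_delete
FROM orders
WHERE quantity <= 0
   OR order_date IS NULL;

-- Delete the rows that fail the quality rule.
DELETE FROM orders
WHERE quantity <= 0
   OR order_date IS NULL;
```

The DELETE removes every row for which the WHERE condition is true. Reusing the same condition in a SELECT first is a cheap safeguard, because a DELETE cannot be undone outside a transaction.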
6. Converting Data Types
Purpose:
To ensure columns have the correct data type (e.g., converting text
dates to DATE type).
Syntax:
Explanation:
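The statement for changing a column's type differs between databases. The sketch below assumes PostgreSQL and a hypothetical events table whose event_date column holds text in YYYY-MM-DD form:

```sql
-- Hypothetical table and sample rows, for illustration only (PostgreSQL)
CREATE TABLE events (
    id         INTEGER PRIMARY KEY,
    event_date VARCHAR(10)
);

INSERT INTO events (id, event_date) VALUES
    (1, '2024-01-15'),
    (2, '2024-02-29');

-- Convert the text column to a real DATE column.
-- The USING clause tells PostgreSQL how to convert each existing value.
ALTER TABLE events
    ALTER COLUMN event_date TYPE DATE
    USING CAST(event_date AS DATE);
```

The conversion fails if any existing value cannot be read as a date, so such values need to be fixed or removed first. Other systems use different syntax for the same change, for example MODIFY COLUMN in MySQL.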
7. Removing Outliers (Advanced)
Purpose:
To remove extreme values that can skew data analysis.
Syntax:
Explanation:
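One possible approach is to treat values more than three standard deviations from the mean as outliers. The sketch below assumes PostgreSQL, a hypothetical sales table with a numeric amount column, and the three-sigma threshold as the chosen rule:

```sql
-- Hypothetical table, for illustration only (PostgreSQL)
CREATE TABLE sales (
    id     INTEGER PRIMARY KEY,
    amount NUMERIC(12, 2)
);

-- Delete rows whose amount lies more than 3 standard deviations
-- above or below the mean of the whole column.
DELETE FROM sales
WHERE amount > (SELECT AVG(amount) + 3 * STDDEV(amount) FROM sales)
   OR amount < (SELECT AVG(amount) - 3 * STDDEV(amount) FROM sales);
```

The threshold is a judgment call and should be checked against the data before deleting anything. On very small tables a single extreme value inflates the standard deviation so much that it may never cross the threshold. The name of the standard deviation function also varies between databases.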
Thank you