Data Cleaning in SQL

This document discusses data cleaning processes performed on the Nashville Housing Dataset using SQL queries. The processes included: 1) standardizing the date format in the SaleDate column, 2) populating blank property addresses by comparing parcel IDs, 3) splitting the PropertyAddress and OwnerAddress columns into individual address, city, and state columns, 4) replacing "Y" and "N" values with "Yes" and "No" in the "Sold as Vacant" field, and 5) deleting unused columns like PropertyAddress and SaleDate. The end result was a cleaned dataset with standardized, unambiguous data in individual columns.

By Muhammad Ikhwan Fadillah
This project was carried out using SQL queries in Microsoft SQL Server Management Studio.
Data Brief
Nashville Housing Dataset
This is the core dataset for the project. It has the following 19 columns:
1. UniqueID
2. ParcelID
3. LandUse
4. PropertyAddress
5. SaleDate
6. SalePrice
7. LegalReference
8. SoldAsVacant
9. OwnerName
10. OwnerAddress
11. Acreage
12. TaxDistrict
13. LandValue
14. BuildingValue
15. TotalValue
16. YearBuilt
17. Bedrooms
18. FullBath
19. HalfBath
Total rows: 56,477
Data Cleaning Process
1. Standardize Date Format: Convert the SaleDate column from the Datetime format to the Date format.
2. Populate Property Address Data: Fill in blank values in the PropertyAddress column using rows that share the same ParcelID.
3. Breaking Out Address Columns: Split the PropertyAddress and OwnerAddress columns into separate address, city, and state columns.
4. Replace Sold as Vacant Data: Replace "Y" and "N" with "Yes" and "No".
5. Delete Unused Columns: Delete columns that are no longer needed, such as PropertyAddress, OwnerAddress, and SaleDate.
Standardize Date Format
Solution and Result (before and after): [figures]
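The slide's own query is not preserved in this text, so the following is only a minimal T-SQL sketch of one common way to do this step. The table name NashvilleHousing and the new column name SaleDateConverted are assumptions; the source names neither.

```sql
-- Assumption: the data lives in a table called NashvilleHousing.
-- Assumption: SaleDateConverted is a hypothetical name for the new column.

-- Add a DATE column to hold the standardized value.
ALTER TABLE NashvilleHousing
ADD SaleDateConverted DATE;
GO  -- batch separator (SSMS/sqlcmd), so the new column is visible to the next statement

-- Drop the time portion by converting the Datetime value to Date.
UPDATE NashvilleHousing
SET SaleDateConverted = CONVERT(DATE, SaleDate);
GO

-- Compare the original and converted values side by side.
SELECT SaleDate, SaleDateConverted
FROM NashvilleHousing;
```

Writing to a new column rather than overwriting SaleDate keeps the original values available for checking until the unused columns are dropped at the end.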
Populate Property Address Data
Solution and Result: [figures]
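A self-join is one way to fill blank property addresses from other rows with the same ParcelID. The sketch below assumes the table is named NashvilleHousing, that blanks are stored as NULL, and that UniqueID distinguishes rows; none of this is confirmed by the source.

```sql
-- Assumption: table name NashvilleHousing; blank addresses are stored as NULL.

-- Preview: rows with a missing address paired with another row
-- that has the same ParcelID and a known address.
SELECT a.UniqueID, a.ParcelID, a.PropertyAddress,
       b.PropertyAddress AS AddressFromMatchingParcel
FROM NashvilleHousing AS a
JOIN NashvilleHousing AS b
    ON a.ParcelID = b.ParcelID
   AND a.UniqueID <> b.UniqueID        -- same parcel, different row
WHERE a.PropertyAddress IS NULL
  AND b.PropertyAddress IS NOT NULL;

-- Fill the missing addresses from the matching rows.
UPDATE a
SET PropertyAddress = b.PropertyAddress
FROM NashvilleHousing AS a
JOIN NashvilleHousing AS b
    ON a.ParcelID = b.ParcelID
   AND a.UniqueID <> b.UniqueID
WHERE a.PropertyAddress IS NULL
  AND b.PropertyAddress IS NOT NULL;
```

Running the preview query again after the update should return no rows if every blank address had a matching parcel.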
Breaking out Property Address into Individual Columns (Address and City)
Solution and Result: [figures]
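As a rough sketch of this step, the query below splits PropertyAddress at its first comma. It assumes the values look like "street, city", that the table is named NashvilleHousing, and that PropertySplitAddress and PropertySplitCity are acceptable names for the new columns; all of these are assumptions, not taken from the slides.

```sql
-- Assumption: PropertyAddress has the form '<street>, <city>'.
-- Assumption: table and new column names are hypothetical.

ALTER TABLE NashvilleHousing
ADD PropertySplitAddress NVARCHAR(255),
    PropertySplitCity    NVARCHAR(255);
GO  -- batch separator, so the new columns exist before the UPDATE is compiled

UPDATE NashvilleHousing
SET PropertySplitAddress =
        -- everything before the first comma
        LTRIM(RTRIM(SUBSTRING(PropertyAddress, 1,
                              CHARINDEX(',', PropertyAddress) - 1))),
    PropertySplitCity =
        -- everything after the first comma
        LTRIM(RTRIM(SUBSTRING(PropertyAddress,
                              CHARINDEX(',', PropertyAddress) + 1,
                              LEN(PropertyAddress))))
WHERE CHARINDEX(',', PropertyAddress) > 0;  -- skip rows without a comma
```

Rows without a comma are left untouched, so their split columns stay NULL and can be reviewed separately.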
Breaking out Owner Address into Individual Columns (Address, City, and State)
Solution and Result: [figures]
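For a three-part value, one frequently used T-SQL shortcut is PARSENAME, which splits on periods and counts parts from the right. The sketch below assumes OwnerAddress has the form "street, city, state" with no periods inside it, and uses hypothetical table and column names.

```sql
-- Assumption: OwnerAddress has the form '<street>, <city>, <state>'
-- and contains no periods. Table and new column names are hypothetical.

ALTER TABLE NashvilleHousing
ADD OwnerSplitAddress NVARCHAR(255),
    OwnerSplitCity    NVARCHAR(255),
    OwnerSplitState   NVARCHAR(255);
GO  -- batch separator, so the new columns exist before the UPDATE is compiled

-- PARSENAME splits on '.', so commas are replaced with periods first.
-- Parts are numbered from the right: 1 = state, 2 = city, 3 = street.
UPDATE NashvilleHousing
SET OwnerSplitAddress = LTRIM(PARSENAME(REPLACE(OwnerAddress, ',', '.'), 3)),
    OwnerSplitCity    = LTRIM(PARSENAME(REPLACE(OwnerAddress, ',', '.'), 2)),
    OwnerSplitState   = LTRIM(PARSENAME(REPLACE(OwnerAddress, ',', '.'), 1));
```

PARSENAME handles at most four parts and returns NULL when a value does not fit that shape, so addresses containing extra periods or commas would need the CHARINDEX and SUBSTRING approach instead.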
Change Y and N to Yes and No in "Sold as Vacant" field
Solution and Result: [figures]
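A CASE expression is a simple way to express this replacement. The sketch below assumes the table is named NashvilleHousing and that SoldAsVacant is a character column wide enough to hold "Yes"; neither detail is stated in the source.

```sql
-- Assumption: table name NashvilleHousing; SoldAsVacant is a character column.

-- Check which distinct values exist and how often each occurs.
SELECT SoldAsVacant, COUNT(*) AS NumRows
FROM NashvilleHousing
GROUP BY SoldAsVacant
ORDER BY NumRows;

-- Map 'Y' to 'Yes' and 'N' to 'No'; leave every other value unchanged.
UPDATE NashvilleHousing
SET SoldAsVacant = CASE
        WHEN SoldAsVacant = 'Y' THEN 'Yes'
        WHEN SoldAsVacant = 'N' THEN 'No'
        ELSE SoldAsVacant
    END;
```

Re-running the first query after the update should show only "Yes" and "No" if those four values were the only ones present.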
Delete Unused Columns
Solution: [figure]
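Dropping the columns listed in the cleaning process could look like the sketch below. The table name NashvilleHousing is an assumption, and the statement presumes the converted date and split address columns have already been created, since the drop cannot be undone without a backup.

```sql
-- Assumption: table name NashvilleHousing.
-- Caution: DROP COLUMN permanently removes the data in these columns.
ALTER TABLE NashvilleHousing
DROP COLUMN PropertyAddress, OwnerAddress, SaleDate;
```

Working on a copy of the raw table, rather than the original import, keeps the source data recoverable.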
THANK YOU
