Data Cleaning in Machine Learning With Numerical Example
101 25 50000 1
102 NaN 60000 0
103 40 NaN 1
104 35 70000 1
105 50 80000 0
106 25 50000 1
107 -5 45000 1
108 29 90000 0
import pandas as pd
# `data` holds the eight records listed above
df = pd.DataFrame(data)
print("Original Dataset:\n", df)
103 40 60000 1
104 35 70000 1
105 50 80000 0
108 29 90000 0
Improvements:
✅ Missing values filled with the mean (Age) and the median (Salary).
✅ Duplicate record removed (Customer 106 duplicated Customer 101).
✅ Negative value corrected (Customer 107’s Age changed from -5 to 5).
✅ Outliers in the Salary column removed using the IQR method.
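The four steps listed above can be sketched in pandas roughly as follows. This is a minimal illustration, not necessarily the exact code behind the output shown: the column names `CustomerID`, `Age`, `Salary`, and `Flag` are assumptions (the table header is not available here, and `Flag` is a hypothetical name for the fourth column), the steps run in the order of the list, and the 1.5 × IQR fence is the common convention rather than something the article states.

```python
import numpy as np
import pandas as pd

# Rows from the original dataset; column names are assumed for illustration.
data = {
    "CustomerID": [101, 102, 103, 104, 105, 106, 107, 108],
    "Age": [25, np.nan, 40, 35, 50, 25, -5, 29],
    "Salary": [50000, 60000, np.nan, 70000, 80000, 50000, 45000, 90000],
    "Flag": [1, 0, 1, 1, 0, 1, 1, 0],  # hypothetical name for the 0/1 column
}
df = pd.DataFrame(data)

# 1. Fill missing values: mean for Age, median for Salary.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Salary"] = df["Salary"].fillna(df["Salary"].median())

# 2. Drop rows that repeat an earlier row in every column except the ID.
df = df.drop_duplicates(subset=["Age", "Salary", "Flag"], keep="first")

# 3. Replace negative ages with their absolute value (-5 becomes 5).
df = df.assign(Age=df["Age"].abs())

# 4. Keep only salaries inside the 1.5 * IQR fences (assumed multiplier).
q1, q3 = df["Salary"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["Salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[in_range].reset_index(drop=True)

print("Cleaned Dataset:\n", df)
```

On these rows the Salary median is 60000, which agrees with Customer 103's Salary in the cleaned output. The IQR fences work out to 25000 and 105000, so in this sketch step 4 does not drop any of the remaining rows.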