Data Cleansing, Linear Regression, Gradient Descent Algorithm in ML

Data cleaning is a vital process in the machine learning pipeline that involves identifying and rectifying errors in data to enhance its quality and usability. This step is crucial for ensuring that the data is accurate and consistent, as poor data quality can adversely affect model performance. Additionally, the document briefly discusses linear regression and gradient descent as methods used in machine learning for model training and optimization.

28-01-2025

Machine Learning Practical

Data Cleaning in ML

ML | Overview of Data Cleaning

What is Data Cleaning?

Data cleaning is a crucial step in the machine learning (ML) pipeline, as it involves identifying and removing (or otherwise handling) missing, duplicate, or irrelevant data. The goal of data cleaning is to ensure that the data is accurate, consistent, and free of errors, because incorrect or inconsistent data can degrade the performance of the ML model. Professional data scientists usually invest a large portion of their time in this step because of the belief that “Better data beats fancier algorithms”.

Data cleaning, also known as data cleansing and often treated as part of data preprocessing, involves identifying and correcting or removing errors, inconsistencies, and inaccuracies in the data to improve its quality and usability. It is essential because raw data is often noisy, incomplete, and inconsistent, which can reduce the accuracy and reliability of the insights derived from it.
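A minimal pandas sketch of these steps, using a small hypothetical DataFrame (the columns row_id, name, city and all values are invented for illustration). It removes an irrelevant column, corrects inconsistent text formatting, drops duplicate rows, and drops rows missing a key field:

```python
import pandas as pd

# Hypothetical raw data: an irrelevant ID column, inconsistent city spellings,
# one duplicated row, and one row with a missing name.
raw = pd.DataFrame({
    "row_id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Bob", "Carol", None],
    "city": [" london", "Paris", "Paris", "LONDON ", "Paris"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop(columns=["row_id"])                  # remove an irrelevant column
    df["city"] = df["city"].str.strip().str.title()   # fix inconsistent formatting
    df = df.drop_duplicates()                         # remove duplicate rows
    df = df.dropna(subset=["name"])                   # remove rows missing a key field
    return df.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Normalizing the text before calling drop_duplicates() matters: rows that differ only in spacing or capitalization are caught as duplicates only after they have been made consistent.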


From the data info above, we can see that the Age and Cabin columns have fewer non-null counts than the other columns, which means they contain missing values. Some of the columns are categorical and have the object data type, while others hold integer or float values.
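A sketch of how such an inspection and a common follow-up might look in pandas. The DataFrame below is hypothetical: it reuses the Age and Cabin column names from the slide, but the Sex and Fare columns, all values, and the 50% drop threshold are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical mini dataset; only the Age and Cabin names come from the slide.
df = pd.DataFrame({
    "Age":   [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan, np.nan],
    "Sex":   ["male", "female", "female", "male", "male"],
    "Fare":  [7.25, 71.28, 7.93, 53.10, 8.05],
})

df.info()  # the non-null count per column reveals which columns have gaps

# Split columns by dtype: non-numeric columns are treated as categorical.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
numerical_cols = df.select_dtypes(include="number").columns.tolist()

# Fraction of missing values in each column.
missing_share = df.isnull().mean()

# One common strategy (an assumption, not the only option):
# drop columns that are mostly empty, impute numeric gaps with the median.
df = df.drop(columns=missing_share[missing_share > 0.5].index)
df["Age"] = df["Age"].fillna(df["Age"].median())
```

The median is used rather than the mean because it is less sensitive to outliers; other choices (mean, mode, model-based imputation) are equally valid depending on the data.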


Linear Regression Example Steps


import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate dataset: y = 4 + 3x plus Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Split dataset: 80% training, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Model parameters (y is 2-D, so coef_ has shape (1, 1) and intercept_ has shape (1,))
print(f"Slope (m): {model.coef_[0][0]}")
print(f"Intercept (b): {model.intercept_[0]}")

# Predict and evaluate on the held-out test set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R² Score: {r2}")

# Visualization: sort X so the regression line is drawn left to right
X_line = np.sort(X, axis=0)
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X_line, model.predict(X_line), color='red', label='Regression Line')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Linear Regression')
plt.show()
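To show what fit() computes for a single feature, here is a NumPy-only sketch of the closed-form least-squares solution on the same synthetic data. It is an illustration rather than one of the original steps, and it fits all 100 points instead of the 80-point training split, so its numbers will differ slightly from the scikit-learn model.

```python
import numpy as np

# Same synthetic data as the scikit-learn example.
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Closed-form simple linear regression:
#   m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   b = mean_y - m * mean_x
x = X.ravel()
t = y.ravel()
m = np.sum((x - x.mean()) * (t - t.mean())) / np.sum((x - x.mean()) ** 2)
b = t.mean() - m * x.mean()

print(f"Slope (m): {m:.3f}")
print(f"Intercept (b): {b:.3f}")
```

As a cross-check, np.polyfit(x, t, 1) returns the same slope and intercept, and both land near the true values 3 and 4 used to generate the data.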


What is gradient descent?


Gradient descent is an optimization algorithm often used to train machine learning models by finding the minimum of a cost function.
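As a worked illustration, assume a simple linear model with slope m and intercept b, a mean squared error cost over n training points, and a learning rate η (a symbol introduced here). Gradient descent locates the minimum by repeatedly stepping against the gradient of the cost:

```latex
\hat{y}_i = m x_i + b, \qquad
J(m, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2

\frac{\partial J}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - \hat{y}_i \right), \qquad
\frac{\partial J}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)

m \leftarrow m - \eta \frac{\partial J}{\partial m}, \qquad
b \leftarrow b - \eta \frac{\partial J}{\partial b}
```

Each update moves m and b a small distance in the direction that lowers J; once both partial derivatives reach zero, the updates no longer change the parameters.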

Through this process, gradient descent minimizes the cost function and reduces the gap between predicted and actual values, improving a machine learning model’s accuracy over time.
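A minimal batch gradient descent sketch that fits the same line as the linear regression example by repeatedly stepping against the gradient of the mean squared error. The learning rate of 0.1 and the 1,000 iterations are illustrative choices, not values from the slides.

```python
import numpy as np

# Same synthetic data as the linear regression example.
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
x, t = X.ravel(), y.ravel()


def gradient_descent(x, t, learning_rate=0.1, n_iters=1000):
    """Batch gradient descent on the MSE cost of y_hat = m * x + b."""
    m, b = 0.0, 0.0           # arbitrary starting point
    n = len(x)
    history = []              # cost after each update, to show it falling
    for _ in range(n_iters):
        error = t - (m * x + b)
        grad_m = -2.0 / n * np.sum(x * error)   # dJ/dm
        grad_b = -2.0 / n * np.sum(error)       # dJ/db
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
        history.append(np.mean((t - (m * x + b)) ** 2))
    return m, b, history


m, b, history = gradient_descent(x, t)
print(f"Slope (m): {m:.3f}, Intercept (b): {b:.3f}")
print(f"MSE after first update: {history[0]:.3f}, after last: {history[-1]:.3f}")
```

The result should match the closed-form least-squares line on the same data. If the learning rate is too large the updates overshoot and can diverge; if it is too small, convergence takes many more iterations.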


A gradient measures how much the error changes when the weights change. You can also think of a gradient as the slope of a function. The higher the gradient, the steeper the slope and the faster a model can learn. But if the slope is zero, the model stops learning. In mathematical terms, the gradient is the vector of partial derivatives of a function with respect to its inputs; for a model's cost function, those inputs are the model's weights.

Imagine a blindfolded man who wants to climb to the top of a hill with the fewest steps possible. He might start climbing the hill by taking really big steps in the steepest direction. But as he comes closer to the top, his steps will get smaller and smaller to avoid overshooting it. Gradient descent applies the same idea in reverse: it steps downhill, against the gradient, toward the minimum of the cost function.
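A toy one-dimensional sketch of the shrinking steps in the analogy, assuming a hypothetical cost f(w) = (w - 3)^2 and a fixed learning rate of 0.2 (both chosen only for illustration). Because each step is the learning rate times the slope, the steps shrink on their own as w approaches the minimum, even though the learning rate never changes.

```python
def f_grad(w):
    # Gradient of the toy cost f(w) = (w - 3) ** 2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)


def descend(w0, learning_rate=0.2, n_steps=10):
    w = w0
    steps = []
    for _ in range(n_steps):
        step = learning_rate * f_grad(w)  # step size is proportional to the slope
        w -= step                         # move against the gradient (downhill)
        steps.append(abs(step))
    return w, steps


w_final, steps = descend(w0=-5.0)
for i, s in enumerate(steps, start=1):
    print(f"step {i}: size {s:.4f}")
print(f"final w: {w_final:.4f}")
```

Here the distance to the minimum shrinks by a factor of 0.6 on every update, so each step is 60% the size of the one before it.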
