Data Pre-Processing with Sklearn using Standard and Minmax

The document discusses data preprocessing in machine learning, emphasizing its importance in preparing raw data for model training. It covers techniques such as feature scaling using Standard and MinMax scalers, as well as binarization, and provides Python code examples using Scikit-learn. Additionally, it explains linear regression as a supervised learning algorithm for predicting dependent variables based on independent features.

Uploaded by

Barkha Kumari
Copyright
© All Rights Reserved

Data Pre-Processing with Sklearn using Standard and MinMax Scaler

Data Preprocessing in Machine Learning
• Data preprocessing is the process of preparing raw
data and making it suitable for a machine learning
model. It is the first, crucial step in building a
machine learning model.
• In a machine learning project, the data we receive is
not always clean and well formatted. Before performing
any operation on the data, it must be cleaned and put
into a consistent format. Data preprocessing is the
task that does this.
Why do we need Data Preprocessing?
• Real-world data generally contains noise and missing
values, and may be in a format that machine learning
models cannot use directly. Data preprocessing cleans
the data and makes it suitable for a machine learning
model, which can also improve the model's accuracy and
efficiency.
It involves the following steps:
• Getting the dataset
• Importing libraries
• Importing datasets
• Finding Missing Data
• Encoding Categorical Data
• Splitting dataset into training and test set
• Feature scaling
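The steps listed above can be sketched end to end as follows. This is only a minimal illustration on a made-up dataset: the arrays age, city, and purchased are hypothetical and not from the original slides, and mean imputation and one-hot encoding are just one common choice for each step. The split is done before fitting any transformer so that the test rows do not influence the learned statistics.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset: one numeric column, one categorical column, one label
age = np.array([[25.0], [32.0], [np.nan], [41.0], [29.0], [50.0]])
city = np.array([["Delhi"], ["Pune"], ["Delhi"], ["Mumbai"], ["Pune"], ["Mumbai"]])
purchased = np.array([0, 1, 0, 1, 0, 1])

# Finding missing data: count the NaN entries in the numeric column
n_missing = int(np.isnan(age).sum())

# Splitting the dataset into training and test sets
age_train, age_test, city_train, city_test, y_train, y_test = train_test_split(
    age, city, purchased, test_size=2, random_state=0
)

# Handling missing data: replace NaN with the mean of the training column
imputer = SimpleImputer(strategy="mean")
age_train = imputer.fit_transform(age_train)
age_test = imputer.transform(age_test)

# Encoding categorical data: one-hot encode the city column
encoder = OneHotEncoder(handle_unknown="ignore")
city_train_enc = encoder.fit_transform(city_train).toarray()
city_test_enc = encoder.transform(city_test).toarray()

# Feature scaling: standardize the numeric column using training statistics
scaler = StandardScaler()
age_train_scaled = scaler.fit_transform(age_train)
age_test_scaled = scaler.transform(age_test)

# Combine the processed columns into model-ready matrices
X_train_ready = np.hstack([age_train_scaled, city_train_enc])
X_test_ready = np.hstack([age_test_scaled, city_test_enc])
print("missing values:", n_missing)
print(X_train_ready.shape, X_test_ready.shape)
```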
• Data scaling is a preprocessing step for numerical
features. Many machine learning algorithms, such as
gradient-descent-based methods, the KNN algorithm, and linear
and logistic regression, often need scaled features to produce
good results.
• Apart from the scaler classes themselves, the following methods
are used to apply the scaling:
• For StandardScaler, the fit(data) method computes the mean and
standard deviation of each feature so that they can be used
later for scaling.
• The transform(data) method performs the scaling using the
mean and standard deviation computed by the fit() method.
• The fit_transform() method does both fit and transform in one call.
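The three methods above can be seen together in a short sketch with StandardScaler. The input values are chosen here only for illustration; mean_ and scale_ are the attributes in which scikit-learn stores the per-column mean and standard deviation learned by fit().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data: 4 samples, 2 features
data = np.array([[11.0, 2.0], [3.0, 7.0], [0.0, 10.0], [11.0, 8.0]])

scaler = StandardScaler()
scaler.fit(data)                 # learns the per-column mean and standard deviation
scaled = scaler.transform(data)  # applies (x - mean) / std column by column

# fit_transform() gives the same result as fit() followed by transform()
scaled_again = StandardScaler().fit_transform(data)

print(scaler.mean_)   # per-column means learned by fit()
print(scaler.scale_)  # per-column standard deviations learned by fit()
print(scaled)
```

After scaling, each column has mean 0 and standard deviation 1.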
MinMax Scaler
Another way of scaling data is to map the minimum of
each feature to zero and the maximum to one.
MinMaxScaler rescales the data into a given range,
usually 0 to 1, transforming each feature independently.
Because the transformation is linear, it changes the
range of the values without changing the shape of the
original distribution.
The MinMax scaling is done using:
• x_std = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
• x_scaled = x_std * (max - min) + min
Where,
• min, max = feature_range
• x.min(axis=0): Minimum feature value
• x.max(axis=0): Maximum feature value
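The formula above can be checked directly in NumPy. The helper below is a hypothetical name introduced only for this sketch (it is not part of scikit-learn), and it assumes that no column is constant, since that would make the denominator zero.

```python
import numpy as np

def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Scale each column of x into feature_range (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    lo, hi = feature_range
    x_min = x.min(axis=0)                   # minimum of each feature
    x_max = x.max(axis=0)                   # maximum of each feature
    x_std = (x - x_min) / (x_max - x_min)   # assumes no constant column
    return x_std * (hi - lo) + lo

data = [[11, 2], [3, 7], [0, 10], [11, 8]]
scaled_01 = min_max_scale(data)                              # default range 0 to 1
scaled_pm1 = min_max_scale(data, feature_range=(-1.0, 1.0))  # range -1 to 1
print(scaled_01)
print(scaled_pm1)
```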
• Sklearn's preprocessing module defines the MinMaxScaler
class to achieve this.
• Syntax: class
sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1),
*, copy=True, clip=False)
• Parameters:
• feature_range: Desired range of the scaled data, given
as a tuple (min, max). The default range is (0, 1).
• copy: If False, scaling is done in place. If True, a copy
is created instead of scaling in place.
• clip: If True, scaled data is clipped to the provided feature
range.
# import module
from sklearn.preprocessing import MinMaxScaler

# create data
data = [[11, 2], [3, 7], [0, 10], [11, 8]]

# scale features
scaler = MinMaxScaler()
model = scaler.fit(data)
scaled_data = model.transform(data)

# print scaled features
print(scaled_data)
Output:

[[1. 0. ]
[0.27272727 0.625 ]
[0. 1. ]
[1. 0.75 ]]
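The effect of the clip parameter only shows up when the scaler is applied to data outside the range it was fitted on. The sketch below uses made-up training and new values to illustrate this; the variable names are assumptions for the example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative data: the scaler is fitted on values between 0 and 10
train = np.array([[0.0], [5.0], [10.0]])
# New data contains values below and above the training range
new = np.array([[-5.0], [5.0], [20.0]])

plain = MinMaxScaler(feature_range=(0, 1)).fit(train)
clipped = MinMaxScaler(feature_range=(0, 1), clip=True).fit(train)

plain_out = plain.transform(new).ravel()      # can fall outside [0, 1]
clipped_out = clipped.transform(new).ravel()  # forced into [0, 1]
print(plain_out)
print(clipped_out)
```

Without clipping, the out-of-range inputs map to values below 0 and above 1; with clip=True they are limited to the ends of feature_range.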
Binarize the Data Using Python Scikit-learn
• Binarization is a preprocessing technique used to
convert numerical data into binary values (0 or 1).
Scikit-learn provides the function
sklearn.preprocessing.binarize() and the equivalent
transformer class sklearn.preprocessing.Binarizer for this.
• Both take a threshold parameter: feature values below
or equal to the threshold are replaced by 0, and values
above it are replaced by 1.
# Importing the necessary packages
from sklearn import preprocessing

X = [[0.4, -1.8, 2.9], [2.5, 0.9, 0.3], [0., 1., -1.5], [0.1, 2.9, 5.9]]
Binarized_Data = preprocessing.Binarizer(threshold=0.5).transform(X)
print("\nThe Binarized data is:\n", Binarized_Data)
Output:
The Binarized data is:
[[0. 0. 1.]
[1. 1. 0.]
[0. 1. 0.]
[0. 1. 1.]]
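The same result can be obtained with the binarize() function named in the text, and the threshold rule can be checked against a plain NumPy comparison. This is a small sketch; the variable names are chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, binarize

X = np.array([[0.4, -1.8, 2.9], [2.5, 0.9, 0.3], [0.0, 1.0, -1.5], [0.1, 2.9, 5.9]])

by_function = binarize(X, threshold=0.5)              # function form
by_class = Binarizer(threshold=0.5).fit_transform(X)  # transformer form
by_numpy = (X > 0.5).astype(float)                    # the rule written by hand

# A value exactly equal to the threshold is mapped to 0
at_threshold = binarize(np.array([[0.5]]), threshold=0.5)

print(by_function)
print(at_threshold)
```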
• Implementation of Logistic and Linear Regression Algorithms
Linear Regression:
Linear regression is a supervised machine learning
algorithm that models the linear relationship
between a dependent variable and one or more
independent features.
When there is only one independent feature, it is
known as univariate (simple) linear regression; when
there is more than one feature, it is known as multiple
linear regression (often loosely called multivariate
linear regression).
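The difference between one feature and several features can be sketched with scikit-learn's LinearRegression. The data below is synthetic and noise-free, generated from coefficients chosen purely for illustration, so the fitted model should recover them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# One independent feature: y = 2 * x + 1 (synthetic, illustrative)
x_single = rng.uniform(0, 10, size=(50, 1))
y_single = 2.0 * x_single[:, 0] + 1.0
single = LinearRegression().fit(x_single, y_single)

# Two independent features: y = 3 * x1 - 4 * x2 + 5 (synthetic, illustrative)
x_multi = rng.uniform(0, 10, size=(50, 2))
y_multi = 3.0 * x_multi[:, 0] - 4.0 * x_multi[:, 1] + 5.0
multi = LinearRegression().fit(x_multi, y_multi)

print(single.coef_, single.intercept_)  # one slope and an intercept
print(multi.coef_, multi.intercept_)    # one slope per feature and an intercept
```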
• The goal of the algorithm is to find the best linear
equation that can predict the value of the dependent
variable based on the independent variables.
• The equation provides a straight line that represents the
relationship between the dependent and independent
variables.
• The slope of the line indicates how much the dependent
variable changes for a unit change in the independent
variable(s).
• Linear regression is used in many different fields,
including finance, economics, and psychology, to
understand and predict the behavior of a particular
variable. For example, in finance, linear regression
might be used to understand the relationship between a
company’s stock price and its earnings or to predict the
future value of a currency based on its past
performance.
• Regression is one of the most important supervised learning tasks.
In regression, a set of records with X and Y values is used to
learn a function, so that Y can be predicted for a previously
unseen X. Because the value to be found is continuous, the learned
function must predict a continuous Y given X as the independent
features.
• Here Y is called the dependent or target variable, and X is called
the independent variable, also known as the predictor of Y. Many
types of functions or models can be used for regression; a linear
function is the simplest. X may be a single feature or multiple
features representing the problem.
Linear regression predicts the value of a dependent
variable (y) from a given independent variable (x) by
fitting a linear relationship between them; hence the
name Linear Regression. In the figure above, X (input) is
the work experience and Y (output) is the salary of a
person. The regression line is the best-fit line for our model.
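The experience and salary example can be sketched as follows. The numbers are hypothetical (they are not taken from the figure) and were chosen to lie exactly on a straight line, so the slope and intercept of the best-fit line are easy to read off.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of work experience (X) and salary in thousands (Y)
experience = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
salary = np.array([35.0, 40.0, 45.0, 50.0, 55.0])  # exactly 30 + 5 * years

model = LinearRegression().fit(experience, salary)

slope = model.coef_[0]        # change in salary per extra year of experience
intercept = model.intercept_  # predicted salary at zero years of experience
predicted = model.predict(np.array([[6.0]]))[0]  # salary for an unseen X

print(slope, intercept, predicted)
```

Here the slope is the amount by which the predicted salary changes for each additional year of experience.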
