Data Pre-Processing With Sklearn Using Standard and Minmax

The document discusses data preprocessing in machine learning, emphasizing its importance in preparing raw data for model training. It covers techniques such as feature scaling using Standard and MinMax scalers, as well as binarization, and provides Python code examples using Scikit-learn. Additionally, it explains linear regression as a supervised learning algorithm for predicting dependent variables based on independent features.

Uploaded by

Barkha Kumari

Data Pre-Processing with Sklearn using Standard and MinMax Scaler

Data Preprocessing in Machine learning
• Data preprocessing is the process of preparing raw
data and making it suitable for a machine learning
model. It is the first and a crucial step in building a
machine learning model.
• In a machine learning project, we rarely come across
clean, well-formatted data. Before doing any operation
on the data, it must be cleaned and put into a
consistent format. This is the job of data preprocessing.
• Why do we need data preprocessing?
• Real-world data generally contains noise and missing
values, and may be in a format that cannot be used
directly by machine learning models. Data
preprocessing cleans the data and makes it suitable
for a machine learning model, which can also improve
the accuracy and efficiency of the model.
Data preprocessing involves the following steps:
• Getting the dataset
• Importing libraries
• Importing datasets
• Finding Missing Data
• Encoding Categorical Data
• Splitting dataset into training and test set
• Feature scaling
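The steps listed above can be sketched end to end on a tiny dataset. This is only an illustration: the age and city columns, the target values, and the choice of mean imputation and one-hot encoding are assumptions made for the example, not something the slides prescribe. For brevity the imputer is fitted on the full data here; in practice it would usually be fitted on the training set only, like the scaler.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset: age (one missing value), a categorical city, a binary target
age = np.array([[25.0], [32.0], [np.nan], [41.0], [38.0], [29.0]])
city = np.array([["Delhi"], ["Pune"], ["Delhi"], ["Goa"], ["Pune"], ["Goa"]])
y = np.array([0, 1, 0, 1, 1, 0])

# Finding missing data: replace NaN with the mean of the column
age_filled = SimpleImputer(strategy="mean").fit_transform(age)

# Encoding categorical data: one binary column per category
city_encoded = OneHotEncoder().fit_transform(city).toarray()

X = np.hstack([age_filled, city_encoded])

# Splitting the dataset into training and test sets (2 of 6 rows held out)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2, random_state=0
)

# Feature scaling: fit on the training rows only, then reuse the same
# statistics for the test rows (only the numeric age column is scaled)
scaler = StandardScaler().fit(X_train[:, :1])
X_train[:, :1] = scaler.transform(X_train[:, :1])
X_test[:, :1] = scaler.transform(X_test[:, :1])

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training set alone keeps information about the test set out of the preprocessing step.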
• Data scaling is a preprocessing step for numerical
features. Many machine learning algorithms, such as gradient
descent based methods, the KNN algorithm, and linear and logistic
regression (when trained with gradient descent or regularization),
rely on scaled data to produce good results.
• Scikit-learn scalers share three methods:
• The fit(data) method computes the statistics needed for scaling:
the mean and std dev of each feature for StandardScaler, or the
minimum and maximum for MinMaxScaler.
• The transform(data) method performs the scaling using
the statistics computed by the fit() method.
• The fit_transform(data) method does both fit and transform in one call.
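As a minimal sketch of the three methods, the snippet below applies StandardScaler to a small toy array. The data values are made up for illustration; StandardScaler, mean_ and scale_ are the actual scikit-learn names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 4 samples, 2 features
data = np.array([[11.0, 2.0], [3.0, 7.0], [0.0, 10.0], [11.0, 8.0]])

scaler = StandardScaler()

# fit() learns the mean and standard deviation of each feature
scaler.fit(data)
print(scaler.mean_)   # per-feature mean
print(scaler.scale_)  # per-feature standard deviation

# transform() applies (x - mean) / std with the learned statistics
scaled = scaler.transform(data)

# fit_transform() performs both steps in one call
scaled_again = StandardScaler().fit_transform(data)
print(scaled)
```

After scaling, each feature has mean 0 and standard deviation 1.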
MinMax Scaler
MinMax scaling is another way of scaling data: the minimum
of each feature is mapped to zero and the maximum
to one. MinMaxScaler rescales each feature
to a given range, 0 to 1 by default. Because the
transformation is linear, it changes the scale of the
values without changing the shape of the original
distribution.
The MinMax scaling is done using:
• x_std = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
• x_scaled = x_std * (max - min) + min
Where,
• min, max = feature_range
• x.min(axis=0): minimum value of each feature
• x.max(axis=0): maximum value of each feature
• The sklearn.preprocessing module provides the MinMaxScaler
class to achieve this.
• Syntax: class
sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1),
*, copy=True, clip=False)
• Parameters:
• feature_range: Desired range of the scaled data, given
as a tuple (min, max). The default range is
(0, 1).
• copy: If True, a copy of the data is scaled. If False,
scaling is done in place where possible.
• clip: If True, scaled data is clipped to the provided feature
range.
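The effect of feature_range and clip is easiest to see on a value that falls outside the range seen during fit. The sketch below uses made-up one-feature data and assumes a scikit-learn version that supports the clip parameter (0.24 or later).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data (one feature) and a held-out value above its maximum
train = np.array([[0.0], [5.0], [10.0]])
held_out = np.array([[15.0]])

# feature_range=(-1, 1) maps the training minimum to -1 and the maximum to 1
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(train)
print(scaler.transform(train).ravel())     # values -1, 0, 1
print(scaler.transform(held_out).ravel())  # value 2, outside the range (clip=False)

# clip=True caps transformed values at the ends of feature_range
clipped = MinMaxScaler(feature_range=(-1, 1), clip=True).fit(train)
print(clipped.transform(held_out).ravel())  # value 1
```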
# import module
from sklearn.preprocessing import MinMaxScaler

# create data
data = [[11, 2], [3, 7], [0, 10], [11, 8]]

# scale features
scaler = MinMaxScaler()
model = scaler.fit(data)
scaled_data = model.transform(data)

# print scaled features
print(scaled_data)
Output:

[[1.         0.        ]
 [0.27272727 0.625     ]
 [0.         1.        ]
 [1.         0.75      ]]
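The same numbers can be reproduced by applying the two MinMax formulas directly with NumPy. The helper name min_max_scale is hypothetical and exists only for this sketch; it assumes every feature has at least two distinct values, so the denominator is never zero.

```python
import numpy as np

def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Apply the MinMax formulas column by column (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    lo, hi = feature_range
    x_min = x.min(axis=0)  # minimum of each feature
    x_max = x.max(axis=0)  # maximum of each feature
    x_std = (x - x_min) / (x_max - x_min)  # assumes x_max > x_min per column
    return x_std * (hi - lo) + lo

data = [[11, 2], [3, 7], [0, 10], [11, 8]]

manual = min_max_scale(data)
print(manual)

# The same helper with a different target range
wide = min_max_scale(data, feature_range=(-1, 1))
print(wide)
```

For example, the value 3 in the first feature becomes (3 - 0) / (11 - 0) = 0.2727..., matching the second row of the output.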
Binarizing data using Python Scikit-learn
• Binarization is a preprocessing technique used when
we need to convert numerical data into binary
values (0 and 1). Scikit-learn provides the function
sklearn.preprocessing.binarize() and the transformer class
sklearn.preprocessing.Binarizer for this.
• Both take a threshold parameter: feature values
below or equal to the threshold are
replaced by 0, and values above it are replaced by 1.
# Importing the necessary packages
from sklearn import preprocessing

X = [[0.4, -1.8, 2.9], [2.5, 0.9, 0.3], [0., 1., -1.5], [0.1, 2.9, 5.9]]
Binarized_Data = preprocessing.Binarizer(threshold=0.5).transform(X)
print("\nThe Binarized data is:\n", Binarized_Data)
Output:

The Binarized data is:
 [[0. 0. 1.]
 [1. 1. 0.]
 [0. 1. 0.]
 [0. 1. 1.]]
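The threshold rule is a plain element-wise comparison, so it can be cross-checked against NumPy. The sketch below compares a hand-written comparison with scikit-learn's binarize() function and Binarizer class on the same X; the variable names are chosen only for this example.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, binarize

X = np.array([[0.4, -1.8, 2.9],
              [2.5, 0.9, 0.3],
              [0.0, 1.0, -1.5],
              [0.1, 2.9, 5.9]])

# Plain NumPy: values strictly above the threshold become 1, all others 0
by_hand = (X > 0.5).astype(float)

# Function form and transformer form from scikit-learn
by_function = binarize(X, threshold=0.5)
by_class = Binarizer(threshold=0.5).fit_transform(X)

# A value exactly equal to the threshold maps to 0
boundary = binarize(np.array([[0.5]]), threshold=0.5)

print(by_hand)
```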
Implementation of Logistic and Linear Regression Algorithms
Linear Regression:
Linear regression is a supervised machine
learning algorithm that models the linear relationship
between a dependent variable and one or more
independent features.
When there is a single independent feature, it
is known as univariate (simple) linear regression;
with more than one feature, it is known as
multiple linear regression (often loosely called
multivariate linear regression).
• The goal of the algorithm is to find the best linear
equation that can predict the value of the dependent
variable based on the independent variables.
• The equation provides a straight line that represents the
relationship between the dependent and independent
variables.
• The slope of the line indicates how much the dependent
variable changes for a unit change in the independent
variable(s).
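For a single feature, the best-fit slope and intercept can be computed in closed form. The sketch below assumes that "best" means ordinary least squares, which the slides do not state explicitly, and uses made-up points that lie exactly on the line y = 2x + 1.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Ordinary least squares for one feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(slope, intercept)  # approximately 2.0 and 1.0
```

A slope of 2 means that the dependent variable increases by 2 for every unit increase in the independent variable.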
• Linear regression is used in many different fields,
including finance, economics, and psychology, to
understand and predict the behavior of a particular
variable. For example, in finance, linear regression
might be used to understand the relationship between a
company’s stock price and its earnings or to predict the
future value of a currency based on its past
performance.
• Regression is one of the most important supervised learning tasks.
In regression, a set of records with X and Y values is
used to learn a function, so that Y can be predicted
for a previously unseen X. Because the Y to be predicted
is continuous, the learned function must map the
independent features X to a continuous value.
• Here Y is called the dependent or target variable, and X is called the
independent variable, also known as the predictor of Y. Many
types of functions or models can be used for
regression; a linear function is the simplest. Here,
X may be a single feature or multiple features representing the
problem.
Linear regression predicts the value of a
dependent variable (y) from a given
independent variable (x) by assuming a linear
relationship between them; hence the name linear
regression. In the slide's figure, X (input) is the work
experience and Y (output) is the salary of a person. The
regression line is the best-fit line for the model.
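A minimal scikit-learn sketch of the experience and salary example follows. The numbers are invented for illustration (they are not taken from the slide's figure) and are chosen to lie exactly on a line so the fitted coefficients are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience (X) and salary in thousands (Y),
# constructed so that salary = 30 + 5 * experience
experience = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
salary = np.array([35.0, 40.0, 45.0, 50.0, 55.0])

# Fit the best-fit (regression) line
model = LinearRegression().fit(experience, salary)
print(model.coef_[0], model.intercept_)  # approximately 5.0 and 30.0

# Predict the salary for an unseen value of X (6 years of experience)
predicted = model.predict(np.array([[6.0]]))
print(predicted[0])  # approximately 60.0
```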
