
Scikit-learn Tutorial
Copyright: © All Rights Reserved

FEATURES VS TARGET
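The slide names the features/target split without an example. Here is a minimal sketch, assuming the Iris dataset bundled with scikit-learn as illustrative data (the slide does not name a dataset). The features `X` are the input columns and the target `y` is the value the model learns to predict:

```python
from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data    # features: one row per sample, one column per measurement
y = iris.target  # target: the class label (species) for each sample

print(X.shape, y.shape)    # 150 samples, 4 feature columns, 1 label each
print(iris.feature_names)  # names of the 4 feature columns
```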
TYPES OF MACHINE LEARNING
1. SUPERVISED LEARNING
2. UNSUPERVISED LEARNING
3. REINFORCEMENT LEARNING
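In scikit-learn, the difference between the first two types shows up in what `fit` receives. This is a sketch assuming the bundled Iris data and two example estimators chosen for illustration: a supervised model is fit on features and target, and an unsupervised model is fit on features only. Scikit-learn does not provide reinforcement learning algorithms, so that type is not shown.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised learning: the model sees both the features X and the target y
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
predicted_labels = clf.predict(X)

# Unsupervised learning: the model sees only X and finds groups on its own
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X)
cluster_ids = km.labels_  # cluster numbers, not species names
```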
Diagram: Scikit-learn at the center, connected to preprocessing tools, feature selection, train/test split, algorithms, and model evaluation.
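The components in the diagram usually appear together in one workflow. The following is a minimal sketch under illustrative assumptions (Iris data, `SelectKBest` with `f_classif` for feature selection, and a k-nearest-neighbors classifier as the algorithm); it is one possible combination, not the deck's own example:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Train/test split: hold out 25% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Feature selection: keep the 2 features with the highest ANOVA F-score,
# learned from the training data only
selector = SelectKBest(score_func=f_classif, k=2)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# Algorithm: a k-nearest-neighbors classifier
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train_sel, y_train)

# Model evaluation: accuracy on the held-out test set
acc = accuracy_score(y_test, model.predict(X_test_sel))
print(f"Test accuracy: {acc:.3f}")
```

Fitting the selector on the training split only keeps information from the test rows out of the model.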
SCIKIT-LEARN
Scikit-learn, also known as sklearn, is a popular open-source machine learning library in Python that provides a wide range of tools for data analysis, modeling, and evaluation.

Sklearn is built on top of NumPy, SciPy, and Matplotlib, and supports integration with Pandas, which makes it easy to use in data science workflows.

Sklearn is widely used in the data science community for applications such as predictive modeling, natural language processing, computer vision, and time series forecasting, among others.
INSTALLATION
IMPORT
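The installation and import slides lost their content in extraction. A minimal sketch of both steps: the package is installed under the name `scikit-learn` but imported as `sklearn`. The install command runs in a terminal, so it appears here only as a comment:

```python
# Installation (run in a terminal, not inside Python):
#   pip install scikit-learn

# Import: the package name is "scikit-learn", the module name is "sklearn"
import sklearn
from sklearn.preprocessing import StandardScaler  # import specific tools as needed

print(sklearn.__version__)  # confirms the installation works
```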
PREPROCESSING
1. Feature scaling
2. Encoding
3. Imputing null values
4. Outlier detection and handling

FEATURE SCALING
Feature scaling is a method used to normalize the range of the features in a dataset.

Feature scaling modifies values using methods such as Normalization or Standardization.

It helps prevent a machine learning model from being biased toward features that have larger numeric ranges.
WHY SCALING?
When a dataset has numerical features that are each on a different scale, an ML model (especially a distance-based one such as KNN) can put more weight on the features with the larger scale.

Scaling helps all features contribute equally.
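To make the "larger scale gets more weight" point concrete, here is a small sketch with hypothetical data (age in years, income in dollars). It measures how much of the squared Euclidean distance between two rows comes from the income column, before and after min-max scaling; the helper `income_share` is illustrative only:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: columns are [age in years, income in dollars]
X = np.array([
    [25.0, 50_000.0],
    [45.0, 51_000.0],
    [35.0, 90_000.0],
])

def income_share(u, v):
    """Fraction of the squared Euclidean distance contributed by the income column."""
    sq = (u - v) ** 2
    return sq[1] / sq.sum()

raw_share = income_share(X[0], X[1])
X_scaled = MinMaxScaler().fit_transform(X)
scaled_share = income_share(X_scaled[0], X_scaled[1])

print(raw_share)     # about 0.9996: income dominates the distance
print(scaled_share)  # about 0.0006: each difference is now relative to its own range
```

Before scaling, a 20-year age gap barely registers next to a 1,000-dollar income gap; after scaling, each difference is measured against that feature's own range.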
NORMALIZATION

Normalization scales the data by fitting the data points into the range 0 to 1.
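For reference, the usual min-max formula applies to each feature separately, where $x_{\min}$ and $x_{\max}$ are that feature's smallest and largest values:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

The smallest value maps to 0 and the largest maps to 1.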
MIN-MAX SCALER

MinMaxScaler from sklearn performs normalization.
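A minimal sketch of `MinMaxScaler` on a single hypothetical feature. With a minimum of 1 and a maximum of 5, each value becomes (x - 1) / 4:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [5.0]])  # one feature, four samples

scaler = MinMaxScaler()  # default feature_range=(0, 1)
X_norm = scaler.fit_transform(X)

print(scaler.data_min_, scaler.data_max_)  # min and max learned during fit
print(X_norm.ravel())                      # 0.0, 0.25, 0.5, 1.0
```

In practice, the scaler is fit on the training data only and then applied to the test data with `scaler.transform(...)`, so the test set does not influence the learned min and max.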
STANDARDIZATION

Standardization transforms the data points so that they have a mean of 0 and a standard deviation of 1.
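The standard-score formula applies to each feature separately, where $\mu$ is the feature's mean and $\sigma$ is its standard deviation:

$$z = \frac{x - \mu}{\sigma}$$

A value equal to the mean becomes 0, and a value one standard deviation above the mean becomes 1.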


STANDARD SCALER

StandardScaler from sklearn performs standardization.
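A minimal sketch of `StandardScaler` on two hypothetical features with very different scales. After the transform, each column has mean 0 and standard deviation 1. `StandardScaler` uses the population standard deviation, which is equivalent to `np.std` with `ddof=0`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales
X = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

print(scaler.mean_)                        # per-column means learned during fit
print(X_std.mean(axis=0), X_std.std(axis=0))  # approximately [0, 0] and [1, 1]
```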
ROBUST SCALER

RobustScaler from sklearn subtracts the median and divides by the interquartile range (IQR), so it is robust to outliers.
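To see why the IQR helps, here is a sketch comparing `RobustScaler` with `MinMaxScaler` on hypothetical data containing one outlier (100). Here the median is 3 and the IQR is 4 - 2 = 2, so the robust result is (x - 3) / 2. Min-max scaling instead divides by the full range of 99, which squeezes the four normal values close to 0:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier

robust = RobustScaler().fit_transform(X).ravel()
minmax = MinMaxScaler().fit_transform(X).ravel()

# Robust: (x - median) / IQR = (x - 3) / 2 -> -1.0, -0.5, 0.0, 0.5, 48.5
print(robust)
# MinMax: (x - 1) / 99 -> the four inliers land between 0 and about 0.03
print(minmax)
```

The outlier stays large under `RobustScaler`, but it no longer compresses the spread of the other values.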
WHICH IS BETTER?
Normalization:

Useful when the data doesn't follow a Gaussian (normal) distribution. Also useful in algorithms such as KNN and neural networks (for example, CNNs and ANNs).

Standardization:

Useful when your data follows a Gaussian distribution.

Robust Scaler:

Useful when your data has outliers.


ENCODING
Most machine learning models can only work with numerical values.

For this reason, the categorical values of the relevant features must be transformed into numerical ones.

This process is called feature encoding.
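TYPES OF ENCODING
1. Nominal encoding:

Represents categories that have no order or hierarchy (for example, colors). It can be done with OneHotEncoder.

2. Ordinal encoding:

Assigns a unique integer to each category based on its rank/order (for example, small < medium < large). It can be done with OrdinalEncoder by passing the category order explicitly. LabelEncoder also assigns integers, but it orders categories alphabetically and is intended for encoding target labels (y) rather than input features.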


ONE-HOT ENCODER
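A minimal sketch of `OneHotEncoder` on a hypothetical color column. Each category becomes its own 0/1 column. Categories are sorted, so the columns are blue, green, red. The result is sparse by default, so `.toarray()` converts it to a regular array for display:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical feature with no natural order
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

enc = OneHotEncoder()
onehot = enc.fit_transform(colors).toarray()

print(enc.categories_)  # learned categories: blue, green, red
print(onehot)           # one row per sample, one column per category
```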
LABEL ENCODER
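A sketch contrasting `LabelEncoder` with `OrdinalEncoder` on a hypothetical size column. `LabelEncoder` numbers the classes alphabetically (large=0, medium=1, small=2), which does not match their real rank. `OrdinalEncoder` with an explicit `categories` list keeps the intended order (small=0, medium=1, large=2) and accepts 2-D feature input:

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

sizes = ["medium", "small", "large", "small"]

# LabelEncoder: 1-D input, classes sorted alphabetically; meant for targets (y)
le = LabelEncoder()
label_codes = le.fit_transform(sizes)
print(le.classes_, label_codes)  # ['large' 'medium' 'small'] [1 2 0 2]

# OrdinalEncoder: 2-D input (one column here), order given explicitly
oe = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordinal_codes = oe.fit_transform([[s] for s in sizes])
print(ordinal_codes.ravel())  # [1. 0. 2. 0.]
```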
