
Workshop on Trending Models in Deep Learning

Feature Engineering

Shailesh S
Overview
➢ Types of learning
➢ Dimensionality Reduction
➢ Components of Dimensionality Reduction
➢ Methods of Dimensionality Reduction
➢ Feature Reduction Iris Dataset
➢ Preprocessing



What does machine learning do?



Some Facts
Expectation:
• We have good enough data
• So the focus is on designing better algorithms

Reality:
• We have a large amount of data, but it is not good enough
• How do we transform the data into a learning-compatible form?
What is a Feature in ML
• A feature is a measurable property of the object
you’re trying to analyze. In datasets, features
appear as columns
• Feature engineering is the process of transforming
raw data into features that better represent the
underlying problem to the predictive models,
resulting in improved model accuracy on unseen
data.
• Feature engineering turns your inputs into things
the algorithm can understand

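For illustration, the first rows of the Iris dataset show one column per feature and one row per observation:

sepal_length  sepal_width  petal_length  petal_width  species
5.1           3.5          1.4           0.2          setosa
4.9           3.0          1.4           0.2          setosa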


Types of Features



Features from observation


Feature Engineering



Features in iris data set

Source: Kaggle
https://www.kaggle.com/uciml/iris



Iris dataset in scikit-learn
• scikit-learn – machine learning in Python
• Simple and efficient tools for data mining and data analysis
• Implements several machine learning algorithms

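As a quick, minimal sketch (using the standard scikit-learn API), the Iris dataset ships with the library and can be loaded directly:

    # Load the Iris dataset bundled with scikit-learn
    from sklearn.datasets import load_iris

    iris = load_iris()
    X, y = iris.data, iris.target     # 150 samples, 4 features
    print(X.shape)                    # (150, 4)
    print(iris.feature_names)         # sepal/petal length and width (cm)
    print(iris.target_names)          # ['setosa' 'versicolor' 'virginica']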


Dimensionality Reduction
Dimensionality reduction - Reducing the
number of random variables to consider.


Dimensionality Reduction
• ‘Dimensionality’ - simply refers to the number of
features (i.e. input variables) in your dataset.
• When the number of features is very large relative to
the number of observations in your
dataset, certain algorithms struggle to train
effective models.
• This is called the “Curse of Dimensionality.”
• It is especially relevant for clustering algorithms
that rely on distance calculations.



Components of Dimensionality Reduction

Feature selection: you select a subset of the original feature set.

Feature extraction: you build a new set of features from the original feature set.

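To make the distinction concrete, here is a minimal sketch on the Iris data (SelectKBest for selection and PCA for extraction are one illustrative pairing; k = 2 is an assumption, not a recommendation):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_iris(return_X_y=True)

    # Feature selection: keep 2 of the original 4 columns, scored by the ANOVA F-value
    X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

    # Feature extraction: build 2 new features as linear combinations of all 4 columns
    X_extracted = PCA(n_components=2).fit_transform(X)

    print(X_selected.shape, X_extracted.shape)   # (150, 2) (150, 2)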


Methods of Dimensionality
Reduction
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)

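A minimal sketch of how both methods can be applied to the Iris data with scikit-learn (one plausible way to produce output like that shown on the next slide; the exact workshop code is not given here):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)

    # PCA: unsupervised, projects onto the directions of maximum variance
    X_pca = PCA(n_components=2).fit_transform(X)

    # LDA: supervised, projects onto the directions that best separate the classes
    X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

    print(X_pca[:3])
    print(X_lda[:3])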




Feature Reduction Iris Dataset
PCA OUTPUT
[[-2.68412563  0.31939725]
 [-2.71414169 -0.17700123]
 [-2.88899057 -0.14494943]
 [-2.74534286 -0.31829898]
 [-2.72871654  0.32675451]
 [-2.28085963  0.74133045]
 ...
 [ 1.76434572  0.07885885]
 [ 1.90094161  0.11662796]
 [ 1.39018886 -0.28266094]]

LDA OUTPUT
[[-8.06179978e+00  3.00420621e-01]
 [-7.12868772e+00 -7.86660426e-01]
 [-7.48982797e+00 -2.65384488e-01]
 [-6.81320057e+00 -6.70631068e-01]
 ...
 [ 4.96774090e+00  8.21140550e-01]
 [ 5.88614539e+00  2.34509051e+00]
 [ 4.68315426e+00  3.32033811e-01]]



Preprocessing
Scaling
• The MinMaxScaler is probably the best-known scaling algorithm. For each feature it computes:

      x_scaled = (x_i − min(x)) / (max(x) − min(x))

• This shrinks the range of each feature so that it lies between 0 and 1 (or −1 and 1 if there are negative values).

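A minimal sketch of MinMax scaling with scikit-learn (the default feature_range of (0, 1) is assumed):

    from sklearn.datasets import load_iris
    from sklearn.preprocessing import MinMaxScaler

    X, _ = load_iris(return_X_y=True)

    # Each column is rescaled independently: (x - min) / (max - min)
    scaler = MinMaxScaler()
    X_scaled = scaler.fit_transform(X)

    print(X_scaled.min(axis=0))   # [0. 0. 0. 0.]
    print(X_scaled.max(axis=0))   # [1. 1. 1. 1.]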


Preprocessing
Label Encoding
• Used to transform non-numerical labels (i.e., categorical values) into numerical labels.

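A minimal sketch with scikit-learn's LabelEncoder (the species strings below are illustrative, mirroring the Iris targets):

    from sklearn.preprocessing import LabelEncoder

    species = ["setosa", "versicolor", "virginica", "setosa"]

    # Map each distinct string label to an integer code (classes are sorted alphabetically)
    encoder = LabelEncoder()
    codes = encoder.fit_transform(species)

    print(codes)              # [0 1 2 0]
    print(encoder.classes_)   # ['setosa' 'versicolor' 'virginica']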


Thank You
