
Workshop on Trending Models in Deep Learning

Feature Engineering

Shailesh S
Overview
➢ Types of learning
➢ Dimensionality Reduction
➢ Components of Dimensionality Reduction
➢ Methods of Dimensionality Reduction
➢ Feature Reduction Iris Dataset
➢ Preprocessing



What does machine learning do?



Some Facts
Expectation:
• We have good enough data
• So the focus is on designing better algorithms

Reality:
• We have a large amount of data, but it is not good enough
• How do we transform the data into a learning-compatible form?
What is a Feature in ML
• A feature is a measurable property of the object
you’re trying to analyze. In datasets, features
appear as columns
• Feature engineering is the process of transforming
raw data into features that better represent the
underlying problem to the predictive models,
resulting in improved model accuracy on unseen
data.
• Feature engineering turns your inputs into things
the algorithm can understand

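For illustration, the first rows of the Iris dataset show one column per feature and one row per observation:

sepal_length  sepal_width  petal_length  petal_width  species
5.1           3.5          1.4           0.2          setosa
4.9           3.0          1.4           0.2          setosa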


Types of Features



Features from observation


Feature Engineering



Features in iris data set

Source: Kaggle
https://www.kaggle.com/uciml/iris



Iris dataset in scikit-learn
• scikit-learn – machine learning in Python
• Simple and efficient tools for data mining and data analysis
• Implements several machine learning algorithms

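As a quick, minimal sketch (using the standard scikit-learn API), the Iris dataset ships with the library and can be loaded directly:

    # Load the Iris dataset bundled with scikit-learn
    from sklearn.datasets import load_iris

    iris = load_iris()
    X, y = iris.data, iris.target     # 150 samples, 4 features
    print(X.shape)                    # (150, 4)
    print(iris.feature_names)         # sepal/petal length and width (cm)
    print(iris.target_names)          # ['setosa' 'versicolor' 'virginica']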


Dimensionality Reduction
Dimensionality reduction - Reducing the
number of random variables to consider.


Dimensionality Reduction
• ‘Dimensionality’ - simply refers to the number of
features (i.e. input variables) in your dataset.
• When the number of features is very large relative to
the number of observations in your
dataset, certain algorithms struggle to train
effective models.
• This is called the “Curse of Dimensionality.”
• It is especially relevant for clustering algorithms
that rely on distance calculations.



Components of Dimensionality Reduction

Feature selection: you select a subset of the original feature set.

Feature extraction: you build a new set of features from the original feature set.

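To make the distinction concrete, here is a minimal sketch on the Iris data (SelectKBest for selection and PCA for extraction are one illustrative pairing; k = 2 is an assumption, not a recommendation):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_iris(return_X_y=True)

    # Feature selection: keep 2 of the original 4 columns, scored by the ANOVA F-value
    X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

    # Feature extraction: build 2 new features as linear combinations of all 4 columns
    X_extracted = PCA(n_components=2).fit_transform(X)

    print(X_selected.shape, X_extracted.shape)   # (150, 2) (150, 2)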


Methods of Dimensionality
Reduction
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)

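A minimal sketch of how both methods can be applied to the Iris data with scikit-learn (one plausible way to produce output like that shown on the next slide; the exact workshop code is not given here):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)

    # PCA: unsupervised, projects onto the directions of maximum variance
    X_pca = PCA(n_components=2).fit_transform(X)

    # LDA: supervised, projects onto the directions that best separate the classes
    X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

    print(X_pca[:3])
    print(X_lda[:3])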




Feature Reduction Iris Dataset
PCA OUTPUT
[[-2.68412563  0.31939725]
 [-2.71414169 -0.17700123]
 [-2.88899057 -0.14494943]
 [-2.74534286 -0.31829898]
 [-2.72871654  0.32675451]
 [-2.28085963  0.74133045]
 ...
 [ 1.76434572  0.07885885]
 [ 1.90094161  0.11662796]
 [ 1.39018886 -0.28266094]]

LDA OUTPUT
[[-8.06179978e+00  3.00420621e-01]
 [-7.12868772e+00 -7.86660426e-01]
 [-7.48982797e+00 -2.65384488e-01]
 [-6.81320057e+00 -6.70631068e-01]
 ...
 [ 4.96774090e+00  8.21140550e-01]
 [ 5.88614539e+00  2.34509051e+00]
 [ 4.68315426e+00  3.32033811e-01]]



Preprocessing
Scaling
• The MinMaxScaler is probably the best-known scaling algorithm. For each feature it computes:

      x_scaled = (x_i − min(x)) / (max(x) − min(x))

• This shrinks the range of each feature so that it lies between 0 and 1 (or −1 and 1 if there are negative values).

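A minimal sketch of MinMax scaling with scikit-learn (the default feature_range of (0, 1) is assumed):

    from sklearn.datasets import load_iris
    from sklearn.preprocessing import MinMaxScaler

    X, _ = load_iris(return_X_y=True)

    # Each column is rescaled independently: (x - min) / (max - min)
    scaler = MinMaxScaler()
    X_scaled = scaler.fit_transform(X)

    print(X_scaled.min(axis=0))   # [0. 0. 0. 0.]
    print(X_scaled.max(axis=0))   # [1. 1. 1. 1.]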


Preprocessing
Label Encoding
• Used to transform non-numerical labels (i.e., categorical values) into numerical labels.

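A minimal sketch with scikit-learn's LabelEncoder (the species strings below are illustrative, mirroring the Iris targets):

    from sklearn.preprocessing import LabelEncoder

    species = ["setosa", "versicolor", "virginica", "setosa"]

    # Map each distinct string label to an integer code (classes are sorted alphabetically)
    encoder = LabelEncoder()
    codes = encoder.fit_transform(species)

    print(codes)              # [0 1 2 0]
    print(encoder.classes_)   # ['setosa' 'versicolor' 'virginica']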


Thank You
