Lesson 7 Feature Engineering

The document outlines a 4-hour course on feature engineering, covering topics such as the definition of features, the importance of feature selection and extraction, and various techniques like one-hot encoding, normalization, and dealing with missing data. It also discusses visual pattern recognition features, including shape-based descriptors and the Histogram of Oriented Gradients (HOG) algorithm. The course emphasizes the need for informative and discriminative features to improve machine learning model performance.


Feature engineering

 Duration: 4 hrs
 Outline:
1. Introduction
2. Feature engineering
3. Features in visual pattern recognition
4. Shape-based feature descriptors
Introduction to feature & feature engineering
 Feature:
 an individual measurable property or characteristic of a data example
 describes the example
 Features are usually numeric.

 Feature engineering: transforming raw data into a feature vector

Data → feature vector → ML model


The general framework for Machine Learning
Curse of dimensionality
 Dimensionality: the number of features in the feature vector.

 Curse of dimensionality:
 The number of features is very large relative to the number of observations (examples) in the dataset
 Hard to train an effective model

 Dimensionality reduction
 Feature selection
 Feature extraction
Feature extraction vs. feature selection

 Feature selection:
 Filtering irrelevant or redundant features from the dataset
 Choosing a subset of the original features

 Feature extraction:
 Creating a new, smaller set of features
 Deriving useful features from existing data

 Features need to be informative, discriminating and independent (see the sketch after the comparison below)
Feature extraction vs. feature selection
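To make the contrast concrete, here is a minimal sketch using scikit-learn on its built-in Iris dataset; the choice of SelectKBest (selection) and PCA (extraction) is illustrative, not something the slides prescribe:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 examples, 4 original features

# Feature selection: keep a subset of the ORIGINAL features
# (here: the 2 features with the highest ANOVA F-score).
X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: create NEW features from the existing ones
# (here: 2 principal components, linear combinations of all 4 features).
X_extracted = PCA(n_components=2).fit_transform(X)

print(X.shape, X_selected.shape, X_extracted.shape)  # (150, 4) (150, 2) (150, 2)
```

Both reduce the dimensionality from 4 to 2, but selection discards original columns while extraction derives new combined features.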
Feature engineering

 Duration: 4 hrs
 Outline:
1. Introduction
2. Feature engineering
3. Features in visual pattern recognition
4. Shape-based feature descriptors
Feature engineering
 One-hot encoding

 Binning

 Normalization

 Standardization

 Dealing with missing features

 Data imputation techniques


One-hot encoding
 Transform a categorical feature into several binary features

 Example: feature “color” has 3 values: “red”, “yellow”, “green”

 A naive encoding “red” = 1, “yellow” = 2, “green” = 3 would impose an artificial order on the values

 One-hot encoding instead: “red” = [1, 0, 0], “yellow” = [0, 1, 0], “green” = [0, 0, 1] (see the sketch below)
Binning (bucketing)
 Transform a numerical feature into a categorical feature

 Example: feature “age” (sketched in code below)
 Put all ages between 0 and 5 years old into one bin
 Put ages from 6 to 10 years old in the second bin
 Put ages from 11 to 15 years old in the third bin, and so on.
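A minimal pandas sketch of the age bins above (bin edges follow the slide's example; the sample ages are made up):

```python
import pandas as pd

ages = pd.Series([2, 7, 13, 4, 11])

# Bins [0, 5], [6, 10], [11, 15] from the example above.
bins = [0, 5, 10, 15]
labels = ["0-5", "6-10", "11-15"]

age_binned = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)
print(age_binned.tolist())  # ['0-5', '6-10', '11-15', '0-5', '11-15']
```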


Normalization

 Converting an actual range of values of a numerical feature into a standard range of values, typically the interval [-1, 1] or [0, 1]

 Formula: x̄ = (x − min) / (max − min)

 Example: natural range = [350, 1450]
 Subtracting 350 from every value of the feature
 Dividing the result by 1100 → normalized range = [0, 1]
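The example above as a minimal NumPy sketch (the sample values are made up to span the natural range [350, 1450]):

```python
import numpy as np

x = np.array([350.0, 700.0, 1450.0])  # natural range [350, 1450]

# Min-max normalization: subtract the minimum, divide by the range.
x_norm = (x - x.min()) / (x.max() - x.min())
print(x_norm)  # [0.         0.31818182 1.        ]
```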


Standardization

 Rescaling the feature values so that they have the properties of a standard normal distribution with μ = 0 and σ = 1

 Formula: x̂ = (x − μ) / σ, where μ is the mean and σ the standard deviation of the feature
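A minimal NumPy sketch of the formula (the sample values are illustrative):

```python
import numpy as np

x = np.array([350.0, 700.0, 1450.0])

# z-score: subtract the mean, divide by the standard deviation.
z = (x - x.mean()) / x.std()
print(z.mean().round(10), z.std().round(10))  # ~0.0 and 1.0
```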
Standardization or normalization?
 Try both if you have time :)

 Rules of thumb:
Dealing with missing features

 Removing the examples with missing features

 Using a data imputation technique

Data imputation techniques

 Technique 1: Replacing the missing value of a feature with the average value of this feature in the dataset

 Technique 2: Replacing the missing value with a value outside the normal range of values (e.g., −1 if the normal range is [0, 1])

 Technique 3: Replacing the missing value with a value in the middle of the range

 …etc… (the first three are sketched below)
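A minimal pandas sketch of the three techniques (the toy series and the out-of-range sentinel −1 are illustrative):

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, np.nan, 3.0, 4.0])

# Technique 1: replace the missing value with the feature's mean.
filled_mean = x.fillna(x.mean())                    # NaN -> 2.666...

# Technique 2: replace it with a value outside the normal range.
filled_outside = x.fillna(-1.0)                     # NaN -> -1.0

# Technique 3: replace it with the middle of the range.
filled_middle = x.fillna((x.min() + x.max()) / 2)   # NaN -> 2.5
```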
Feature engineering

 Duration: 4 hrs
 Outline:
1. Introduction
2. Feature engineering
3. Features in visual pattern recognition
4. Shape-based feature descriptors
Image feature extraction
 Purpose:
 To reduce the dimensionality of the input image
 To transform each input image into a corresponding multi-dimensional feature vector
 To perform the predefined classification tasks with sufficient accuracy without using the entire input image

 Requirements:
 Features should extract the most suitable characteristics from the input image
An example of feature extraction

Visual features

 Color-based features
Visual features
 Shape-based features
Visual features
 Texture-based features
Which feature is the best?

 Example: plant recognition

 Plant features: leaf, fruit, flower, root, branch,…

 Leaf features: shape, vein, margin, texture


 No single best feature for a given leaf identity → combine different features

 No single best representation for a given feature → use multiple descriptors to characterize the feature from different perspectives

 Hand-crafting features is challenging → deep learning, which learns features from data, is an innovative alternative
Feature engineering

 Duration: 4 hrs
 Outline:
1. Introduction
2. Feature engineering
3. Features in visual pattern recognition
4. Shape-based feature descriptors
Shape-based feature descriptor

 Shape is an important visual cue

 A good shape descriptor is invariant to geometrical transformations (rotation, reflection, scaling, translation)

 Types of shape descriptors: simple and morphological shape descriptors (SMSD), contour-based, region-based
Simple and morphological shape descriptor

 Refers to basic geometric properties of the shape

 Basic descriptors: diameter, major axis length, minor axis length, area, perimeter, centroid,…

 Morphological descriptors: aspect ratio, perimeter-to-area ratio, rectangularity measures, circularity measures,…
Contour-based feature descriptor

 Considers the boundary of a shape and neglects the information contained in the shape interior

 Ex: CCD (centroid contour distance), Fourier descriptors computed on the CCD (a sketch of CCD follows below)
Contour-based feature descriptor
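A minimal NumPy sketch of CCD, assuming the shape boundary is already available as an (N, 2) array of points; the uniform resampling and max-normalization are common choices, not mandated by the slides:

```python
import numpy as np

def centroid_contour_distance(contour, n_samples=64):
    """CCD: distances from the shape centroid to sampled boundary points."""
    centroid = contour.mean(axis=0)
    # Resample the boundary uniformly so every shape yields a fixed-length vector.
    idx = np.linspace(0, len(contour) - 1, n_samples).astype(int)
    dist = np.linalg.norm(contour[idx] - centroid, axis=1)
    return dist / dist.max()  # divide by the max distance for scale invariance

# Toy contour: points on a circle -> CCD is (almost) constant.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
print(centroid_contour_distance(circle)[:4])  # ~[1. 1. 1. 1.]
```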
Region-based feature descriptor

 Takes all the pixels within a shape region into account to obtain the shape representation

 Image moments: statistical descriptors of a shape. Ex: Hu moments

 Local features: select key points in the image. Ex: HOG (histogram of oriented gradients), SIFT (scale-invariant feature transform)
Histogram of Oriented Gradients (HOG) ALGORITHM

• HOG stands for histogram of oriented gradients.
• The HOG descriptor focuses on the structure or shape of the object.
• It uses both the magnitude and the direction of the gradient to compute the features.
• It builds histograms from the magnitude and direction of the gradient.
HOG ALGORITHM

• Consider a pixel whose left/right neighbors have intensities 40 and 70, and whose upper/lower neighbors have intensities 20 and 70.
• Gradient in the X direction = |40 − 70| = 30
• Gradient in the Y direction = |20 − 70| = 50
• From these values we calculate the magnitude and direction of the gradient
• Using magnitude and direction over all pixels, we compute the feature vectors (see the sketch below)
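The slide's numbers as a minimal sketch (neighbor intensities 40/70 horizontally and 20/70 vertically, as read off the figure):

```python
import math

# Neighboring pixel intensities from the slide's example.
gx = abs(40 - 70)  # gradient along x = 30
gy = abs(20 - 70)  # gradient along y = 50

magnitude = math.hypot(gx, gy)                 # sqrt(30^2 + 50^2) ~ 58.31
direction = math.degrees(math.atan2(gy, gx))   # ~ 59.04 degrees
print(magnitude, direction)
```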
HOG ALGORITHM
HOG ALGORITHM

• Before obtaining the final HOG feature, after concatenating the feature vectors, we are supposed to normalize them.
• Suppose we take a 150×300-pixel image and multiply it by 2 to increase the brightness, or divide it by 2 to decrease the brightness; then you can't compare the two images without normalization, because the pixel intensities have changed.
• But if you normalize the feature vectors, they are easy to compare.
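A minimal end-to-end sketch using scikit-image's hog function on a built-in test image; the cell and block sizes are the library's conventional values, and "L2-Hys" is the block normalization discussed above:

```python
from skimage import data
from skimage.feature import hog

image = data.camera()  # built-in grayscale test image

features = hog(
    image,
    orientations=9,          # 9 direction bins per histogram
    pixels_per_cell=(8, 8),  # gradients pooled over 8x8 cells
    cells_per_block=(2, 2),  # blocks of cells normalized together
    block_norm="L2-Hys",     # robust to brightness/contrast changes
)
print(features.shape)  # one long, normalized feature vector
```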
HOG ALGORITHM
HOG ALGORITHM

• For detection, HOG features are computed for a human template, and the input image is scanned (convolved) with this human model.
• The detector then predicts whether each region contains a human or not.
Image moments
Hu moments feature descriptor

Centroid of the image, where s(x, y) is the pixel intensity at (x, y):
x̄ = (Σx Σy x·s(x, y)) / (Σx Σy s(x, y)),  ȳ = (Σx Σy y·s(x, y)) / (Σx Σy s(x, y))

Central moments:
μpq = Σx Σy (x − x̄)^p (y − ȳ)^q s(x, y)

Central normalized moments:
ηpq = μpq / μ00^(1 + (p+q)/2)

The seven Hu moments are algebraic combinations of the ηpq that are invariant to translation, scale and rotation.
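A minimal OpenCV sketch computing the seven Hu moments of a toy binary shape (the filled rectangle is illustrative):

```python
import cv2
import numpy as np

# Toy binary shape: a filled rectangle on a black background.
img = np.zeros((100, 100), dtype=np.uint8)
cv2.rectangle(img, (20, 30), (80, 70), 255, thickness=-1)

moments = cv2.moments(img)             # raw, central and normalized moments
hu = cv2.HuMoments(moments).flatten()  # the 7 Hu invariants

# Log-scale the values, since they span many orders of magnitude.
hu_log = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
print(hu_log)
```

Translating, scaling or rotating the rectangle leaves the Hu values (nearly) unchanged.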
Hu moments feature (cont)
[Table: the seven Hu moment values S1–S7 computed for several sample images]
HOG

https://www.youtube.com/watch?v=XmO0CSsKg88&t=41s
