
INT247

Machine Learning Foundations


Lecture #5.0
Normalization and Feature Scaling

© LPU :: INT247 Machine Learning Foundations


Feature Scaling
• Used to standardize the range of the independent variables (features) of a dataset.
• A dataset often contains features that vary in magnitude, units and range. For example:
  – Gold_weight measured in grams (g).
  – Iron_weight measured in kilograms (kg).
• Algorithms that rely on Euclidean distance are dominated by the features with the largest magnitudes, so the features should be scaled first.
Techniques of Feature Scaling
• Standardisation
• Mean Normalization
• Min-Max Scaling
• Unit Vector

Standardisation

• x' = (x − μ) / σ, where μ is the mean of the feature and σ its standard deviation.
• This rescales the feature so that it has mean 0 and standard deviation 1.
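A minimal sketch of standardisation with scikit-learn; the small array below is illustrative. (Note that StandardScaler divides by the population standard deviation, ddof = 0.)

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])  # one feature, six samples

scaler = StandardScaler()          # subtracts the mean, divides by the standard deviation
X_std = scaler.fit_transform(X)

print(X_std.mean(), X_std.std())   # ~0.0 and 1.0 after standardisation
```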

Normalisation
• Min-max scaling: x' = (x − x_min) / (x_max − x_min).
• This rescales the feature to the range [0, 1].
Exercise
Consider the following dataset:
X
0.0
1.0
2.0
3.0
4.0
5.0

Perform standardisation and normalisation on the dataset.

Solution
Consider the following dataset:
X Normalized Standardized
0.0 0.0 -1.336306
1.0 0.2 -0.801784
2.0 0.4 -0.267261
3.0 0.6 0.267261
4.0 0.8 0.801784
5.0 1.0 1.336306
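These values can be checked with NumPy. Note that the Standardized column above uses the sample standard deviation (ddof = 1); scikit-learn's StandardScaler uses ddof = 0 and would give slightly different numbers.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

x_norm = (x - x.min()) / (x.max() - x.min())   # min-max normalisation to [0, 1]
x_std = (x - x.mean()) / x.std(ddof=1)         # standardisation with sample std (ddof=1)

print(x_norm)   # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(x_std)    # approx. [-1.3363, -0.8018, -0.2673, 0.2673, 0.8018, 1.3363]
```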

Over-fitting
• The model performs much better on the training dataset than on the test dataset.
• The model fits its parameters too closely to the particular observations in the training dataset.
• As a result, it does not generalize well to new, unseen data.

Reduce Generalization Errors
• Collect more training data.
• Introduce a penalty for complexity via regularization.
• Choose a simpler model with fewer parameters.
• Reduce the dimensionality of the data.

Sparse Solution With L1 Regularization

• L1 regularization yields sparse feature vectors: most feature weights are driven to zero.
• Sparsity is useful if the dataset is high-dimensional with many irrelevant features.
• The L1 penalty is the sum of the absolute values of the weight coefficients.
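A minimal sketch of an L1-penalized model with scikit-learn; the dataset and the regularization strength C are illustrative choices, not from the slides.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# L1-penalized logistic regression; a small C (strong penalty) drives
# many weight coefficients exactly to zero, i.e. a sparse solution
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X, y)

coef = model.named_steps["logisticregression"].coef_.ravel()
print("non-zero weights:", np.sum(coef != 0), "of", coef.size)
```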

Sequential Feature Selection Algorithms
• Family of greedy search algorithms.
• Reduce an initial d-dimensional feature space to a k-dimensional feature subspace, where k < d.
• Automatically select a subset of features that are most
relevant to the problem.

Sequential Forward Selection (SFS) Algo.
SFS is the simplest greedy search algorithm.
– Starting from the empty set, sequentially add the feature x+ that maximizes J(Yk + x+) when combined with the features Yk that have already been selected.
1. Start with the empty set Y0 = { }.
2. Select the next best feature x+ = argmax J(Yk + x), where x ∉ Yk.
3. Update Yk+1 = Yk + x+; k = k + 1.
4. Go to step 2 (until the desired number of features has been selected).
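A hedged sketch of forward selection using scikit-learn's SequentialFeatureSelector; the dataset, estimator and number of features are illustrative, and the class uses cross-validated score as its criterion J.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

# Greedy forward selection: start from the empty set and repeatedly add
# the feature that most improves cross-validated accuracy
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),
    n_features_to_select=5,
    direction="forward",
    cv=5,
)
sfs.fit(X, y)
print("selected feature indices:", sfs.get_support(indices=True))
```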
Sequential Forward Selection (SFS) Algo.
• SFS performs best when the optimal subset is
small.
• The search space is often drawn as an ellipse to emphasize that there are fewer candidate states near the empty and the full feature sets.

Example
• Run SFS to completion for the following objective function:
  J(X) = -2x1x2 + 3x1 + 5x2 - 2x1x2x3 + 7x3 + 4x4 - 2x1x2x3x4
  where xk are indicator variables that indicate whether the k-th feature has been selected (xk = 1) or not (xk = 0).
• Step 1: J(x1) = 3, J(x2) = 5, J(x3) = 7, J(x4) = 4 → x3 is selected.
• Step 2: J(x3x1) = 10, J(x3x2) = 12, J(x3x4) = 11 → x2 is added.
• Step 3: J(x3x2x1) = 11, J(x3x2x4) = 16 → x4 is added.
• Step 4: J(x3x2x4x1) = 13 → x1 is added last. (Note that J drops from 16 to 13; SFS run to completion still adds the remaining feature.)
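A small script (illustrative, not from the slides) that runs the same greedy forward selection on this objective:

```python
def J(S):
    """Objective function from the example; S is a set of selected feature indices (1..4)."""
    x = [0, 0, 0, 0, 0]      # x[1]..x[4] are the indicator variables xk
    for i in S:
        x[i] = 1
    return (-2 * x[1] * x[2] + 3 * x[1] + 5 * x[2]
            - 2 * x[1] * x[2] * x[3] + 7 * x[3] + 4 * x[4]
            - 2 * x[1] * x[2] * x[3] * x[4])


selected, remaining = set(), {1, 2, 3, 4}
while remaining:
    # greedily add the feature that maximizes J of the enlarged subset
    best = max(remaining, key=lambda f: J(selected | {f}))
    selected.add(best)
    remaining.remove(best)
    print(sorted(selected), "J =", J(selected))
# prints [3] J = 7, then [2, 3] J = 12, then [2, 3, 4] J = 16, then [1, 2, 3, 4] J = 13
```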

Sequential Backward Selection (SBS) Algo.
Aims to reduce the dimensionality of the initial feature subspace.
– Initialize the algorithm with k = d, where d is the dimensionality of the full feature space Xd.
– Determine the feature x− that maximizes the criterion after removal, x− = argmax J(Xk − x), where x ∈ Xk.
– Remove the feature x− from the feature set: Xk−1 = Xk − x−; k = k − 1.
– Terminate if k equals the number of desired features; otherwise, go to step 2.
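A sketch of backward elimination with the same scikit-learn class as above, simply switching direction="backward"; dataset and estimator are again illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

# Greedy backward elimination: start from all 13 features and repeatedly
# drop the feature whose removal hurts cross-validated accuracy the least
sbs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),
    n_features_to_select=5,
    direction="backward",
    cv=5,
)
sbs.fit(X, y)
print("remaining feature indices:", sbs.get_support(indices=True))
```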

Sequential Backward Selection (SBS)
• SBS works best when the optimal feature subset is
large, since SBS spends most of its time visiting
large subsets.
• The main limitation of SBS is its inability to re-
evaluate the usefulness of a feature after it has
been discarded.

Bidirectional Search (BDS)
BDS is a parallel implementation of SFS and SBS.
– SFS is performed from the empty set.
– SBS is performed from the full set.
– To guarantee that SFS and SBS converge to the same solution:
• Features already selected by SFS are not removed by SBS.
• Features already removed by SBS are not selected by SFS.

Bidirectional Search (BDS)
1. Start SFS with YF = { } (the empty set).
2. Start SBS with YB = X (the full set).
3. Select the best feature x+ = argmax J(YF + x), chosen among features not yet removed by SBS, and add it to YF.
4. Remove the worst feature x− = argmax J(YB − x), chosen among features not already selected by SFS, and remove it from YB.
5. Go to step 3.

Selecting Features Using Random Forests
There are two different methods for feature selection:
– Mean decrease impurity
– Mean decrease accuracy

Mean Decrease Impurity
• Impurity: the measure (e.g. Gini impurity or entropy) on the basis of which the optimal split condition is chosen.
• During training, it is computed how much each feature decreases the weighted impurity in a tree.
• For a forest, the impurity decrease from each feature can be averaged, and the features are ranked according to this measure.
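A minimal sketch with scikit-learn, whose RandomForestClassifier exposes feature_importances_ computed as mean decrease in impurity; the dataset and hyperparameters are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# feature_importances_ averages the impurity decrease over all trees in the forest
ranking = sorted(zip(data.feature_names, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```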

Mean Decrease Impurity
• Feature selection based on impurity reduction is biased
towards preferring variables with more categories.
• When the dataset has two or more correlated features,
any of these correlated features can be used as the
predictor.

Mean Decrease Accuracy
• Measures the impact of each feature on the accuracy of the model.
• Permute the values of each feature and measure how much the permutation decreases the accuracy of the model.
• Permuting unimportant variables has little or no effect on model accuracy.
• Permuting important variables significantly decreases the accuracy.
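A hedged sketch using scikit-learn's permutation_importance, which implements this idea; the dataset, model and train/test split are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn on held-out data and record the drop in accuracy
result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: mean accuracy drop = {result.importances_mean[i]:.3f}")
```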

