INT247 Lect3.03.1
Standardisation
• Standardisation rescales each feature as x' = (x - μ) / σ, where μ is the mean and σ the standard deviation of that feature.
• This redistributes the features so that they have mean = 0 and standard deviation = 1.
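A minimal sketch of standardisation in Python, assuming NumPy is available; the array x is just an illustrative feature column.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])  # hypothetical feature column

# x' = (x - mean) / standard deviation
x_std = (x - x.mean()) / x.std()

print(x_std.mean(), x_std.std())  # ~0.0 and 1.0 after standardisation
```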
Normalisation
• Normalisation (min-max scaling) rescales each feature to the range [0, 1]: x' = (x - x_min) / (x_max - x_min).
Exercise
Normalise and standardise the feature X in the following dataset:
X
0.0
1.0
2.0
3.0
4.0
5.0
Solution
For this dataset, the normalised and standardised values are:
X Normalised Standardised
0.0 0.0 -1.336306
1.0 0.2 -0.801784
2.0 0.4 -0.267261
3.0 0.6 0.267261
4.0 0.8 0.801784
5.0 1.0 1.336306
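A hedged sketch that reproduces the table above with pandas (the library choice is an assumption; any array library works). Note that the standardised column matches the sample standard deviation (ddof = 1), which is what pandas' std() uses by default; sklearn's StandardScaler uses the population standard deviation and would give slightly different values.

```python
import pandas as pd

df = pd.DataFrame({"X": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]})

# Min-max normalisation to [0, 1]
df["Normalised"] = (df["X"] - df["X"].min()) / (df["X"].max() - df["X"].min())

# Standardisation with the sample standard deviation (pandas default)
df["Standardised"] = (df["X"] - df["X"].mean()) / df["X"].std()

print(df)
#      X  Normalised  Standardised
# 0  0.0         0.0     -1.336306
# 5  5.0         1.0      1.336306
```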
Over-fitting
• The model performs much better on the training dataset than on the test dataset.
• The model fits its parameters too closely to the particular observations in the training dataset.
• As a result, it does not generalise well to unseen real-world data.
Reduce Generalization Errors
• Collect more training data.
• Introduce a penalty for complexity via regularization.
• Choose a simpler model with fewer parameters.
• Reduce the dimensionality of the data.
Sparse Solution With L1 Regularization
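Since this slide relies on the sparsity effect of the L1 penalty, here is a minimal sketch of it with scikit-learn; the Iris data, the estimator, and the C value are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# The L1 penalty drives many weights to exactly zero;
# a smaller C means stronger regularization and more zeros.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)

print(clf.coef_)  # several coefficients are exactly 0.0 -> sparse solution
```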
Sequential Feature Selection Algorithms
• Family of greedy search algorithms.
• Reduce an initial d-dimensional feature space to a k-dimensional feature subspace, where k < d.
• Automatically select a subset of features that are most
relevant to the problem.
Sequential Forward Selection (SFS) Algo.
SFS is the simplest greedy search algorithm.
– Starting from the empty set, sequentially add the feature x+ that maximizes J(Yk + x+) when combined with the features Yk that have already been selected.
1. Start with the empty set Y0 = {}
2. Select the next best feature x+ = argmax J(Yk + x), where x ∉ Yk
3. Update Yk+1 = Yk + x+; k = k + 1
4. Go to step 2
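A compact sketch of the SFS loop above, assuming scikit-learn; the wine dataset, the k-NN estimator, and cross-validated accuracy standing in for the criterion J are all illustrative choices.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
estimator = KNeighborsClassifier()

def J(features):
    # Criterion: mean cross-validated accuracy on the chosen columns
    return cross_val_score(estimator, X[:, features], y, cv=5).mean()

selected = []                        # step 1: Y0 = {}
remaining = list(range(X.shape[1]))
k = 4                                # desired subset size

while len(selected) < k:
    best = max(remaining, key=lambda f: J(selected + [f]))  # step 2
    selected.append(best)            # step 3: Yk+1 = Yk + x+
    remaining.remove(best)           # step 4: repeat

print("Selected features:", selected)
```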
Sequential Forward Selection (SFS) Algo.
• SFS performs best when the optimal subset is
small.
• The search space is typically drawn as an ellipse to emphasize that there are fewer states towards the full and empty sets.
Example
• Run SFS to completion for the following objective function:
J(X) = -2x1x2 + 3x1 + 5x2 - 2x1x2x3 + 7x3 + 4x4 - 2x1x2x3x4
where xk are indicator variables that indicate whether the kth feature has been selected (xk = 1) or not (xk = 0).
J(x1) = 3, J(x2) = 5, J(x3) = 7, J(x4) = 4
x3 is the maximum: J(x3x1) = 10, J(x3x2) = 12, J(x3x4) = 11
x3x2 is the maximum: J(x3x2x1) = 11, J(x3x2x4) = 16
x3x2x4 is the maximum: J(x3x2x4x1) = 13
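A short sketch that replays this greedy selection in Python to check the arithmetic; the objective J is copied directly from the example above.

```python
def J(selected):
    # Indicator variables: xk = 1 if feature k is selected, else 0
    x1, x2, x3, x4 = (1 if k in selected else 0 for k in (1, 2, 3, 4))
    return (-2*x1*x2 + 3*x1 + 5*x2 - 2*x1*x2*x3
            + 7*x3 + 4*x4 - 2*x1*x2*x3*x4)

selected, remaining = set(), {1, 2, 3, 4}
while remaining:
    best = max(remaining, key=lambda f: J(selected | {f}))
    selected.add(best)
    remaining.remove(best)
    print(f"add x{best}: J = {J(selected)}")
# Output: add x3: J = 7, add x2: J = 12, add x4: J = 16, add x1: J = 13
```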
Sequential Backward Selection (SBS) Algo.
Aims to reduce the dimensionality of the initial feature subspace.
– Initialize the algorithm with k = d, where d is the dimensionality of the full feature space Xd.
– Determine the feature x- that maximizes the criterion: x- = argmax J(Xk - x), where x ∈ Xk.
– Remove the feature x- from the feature set: Xk-1 = Xk - x-; k = k - 1.
– Terminate if k equals the number of desired features; if not, go to step 2.
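A mirror-image sketch of the SBS loop, under the same assumptions as the SFS sketch above (scikit-learn, cross-validated accuracy as the criterion J, placeholder dataset and estimator).

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
estimator = KNeighborsClassifier()

def J(features):
    return cross_val_score(estimator, X[:, features], y, cv=5).mean()

current = list(range(X.shape[1]))  # step 1: start with the full set, k = d
k_target = 4                       # desired number of features

while len(current) > k_target:     # step 4: terminate when k features remain
    # Steps 2-3: drop the feature whose removal maximizes J(Xk - x)
    worst = max(current, key=lambda f: J([c for c in current if c != f]))
    current.remove(worst)

print("Remaining features:", current)
```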
Sequential Backward Selection (SBS)
• SBS works best when the optimal feature subset is
large, since SBS spends most of its time visiting
large subsets.
• The main limitation of SBS is its inability to re-
evaluate the usefulness of a feature after it has
been discarded.
Bidirectional Search (BDS)
BDS is a parallel implementation of SFS and SBS.
– SFS is performed from the empty set.
– SBS is performed from the full set.
– To guarantee that SFS and SBS converge to the same solution:
• Features already selected by SFS are not removed by SBS.
• Features already removed by SBS are not selected by SFS.
Bidirectional Search (BDS)
1. Start SFS with YF = {}
2. Start SBS with YB = X
3. Select the best feature x+ = argmax J(YF + x), where x ∈ YB and x ∉ YF; update YF = YF + x+
4. Remove the worst feature x- = argmax J(YB - x), where x ∈ YB and x ∉ YF; update YB = YB - x-
5. Go to step 3
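A sketch of the BDS loop under the same assumptions as the SFS/SBS sketches above; J is any callable that scores a list of feature indices (for example the cross-validated accuracy used earlier).

```python
def bds(n_features, J):
    YF = []                         # forward (SFS) set, grown from {}
    YB = list(range(n_features))    # backward (SBS) set, shrunk from X

    while len(YF) < len(YB):
        # Forward step: only features still kept by SBS may be selected
        candidates = [f for f in YB if f not in YF]
        best = max(candidates, key=lambda f: J(YF + [f]))
        YF.append(best)

        # Backward step: never remove a feature already selected by SFS
        removable = [f for f in YB if f not in YF]
        if not removable:
            break
        worst = max(removable, key=lambda f: J([c for c in YB if c != f]))
        YB.remove(worst)

    return YF   # YF and YB have met in the middle
```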
Selecting Features Using Random Forests
There are two different methods for feature selection:
– Mean decrease impurity
– Mean decrease accuracy
Mean Decrease Impurity
• Impurity: the measure on which the optimal split condition is chosen at each node.
• During training, it is computed how much each feature decreases the weighted impurity in a tree.
• For a forest, the impurity decrease from each feature can
be averaged and the features are ranked according to this
measure.
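A hedged sketch of mean decrease impurity with scikit-learn: feature_importances_ on a fitted RandomForestClassifier is this impurity-based ranking; the wine dataset is a stand-in.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

data = load_wine()
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ = impurity decrease per feature, averaged over trees
for idx in np.argsort(forest.feature_importances_)[::-1][:5]:
    print(f"{data.feature_names[idx]}: {forest.feature_importances_[idx]:.3f}")
```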
Mean Decrease Impurity
• Feature selection based on impurity reduction is biased
towards preferring variables with more categories.
• When the dataset has two or more correlated features,
any of these correlated features can be used as the
predictor.
Mean Decrease Accuracy
• Measures the impact of each feature on the accuracy of the model.
• Permute the values of each feature and measure how
much the permutation decreases the accuracy of the
model.
• Permuting unimportant variables has little or no effect on model accuracy.
• Permuting important variables significantly decreases accuracy.
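A minimal sketch of mean decrease accuracy using scikit-learn's permutation_importance; the held-out split, estimator, and number of repeats are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Shuffle each feature column on held-out data and record how much the
# accuracy drops; large drops indicate important features.
result = permutation_importance(forest, X_test, y_test, scoring="accuracy",
                                n_repeats=10, random_state=0)
print(result.importances_mean)
```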