
APEX INSTITUTE OF TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Machine Learning (21CSH-286)


Faculty: Prof. (Dr.) Madan Lal Saini(E13485)

Lecture - 7
Data Transformation, Normalization,
Dimensionality Reduction
Machine Learning: Course Objectives
COURSE OBJECTIVES
The Course aims to:
1. Understand and apply various data handling and visualization techniques.
2. Understand basic learning algorithms and techniques and their applications, as well as general questions related to analysing and handling large data sets.
3. Develop skills in supervised and unsupervised learning techniques and implement them to solve real-life problems.
4. Develop basic knowledge of machine learning techniques to build an intelligent machine for making decisions on behalf of humans.
5. Develop skills for selecting an algorithm and model parameters, and apply them to design optimized machine learning applications.
COURSE OUTCOMES

On completion of this course, the students shall be able to:

CO1: Describe and apply various data pre-processing and visualization techniques on datasets.
CO2: Understand basic learning algorithms and analyse their applications, as well as general questions related to analysing and handling large data sets.
CO3: Describe machine learning techniques to build an intelligent machine for making decisions on behalf of humans.
CO4: Develop supervised and unsupervised learning techniques and implement them to solve real-life problems.
CO5: Analyse the performance of a machine learning model and apply optimization techniques to improve it.
Unit-1 Syllabus

Unit-1 Data Pre-processing Techniques


Data Pre-Processing: Data Frame Basics, CSV File, Libraries for Pre-processing, Handling Missing Data, Encoding Categorical Data, Feature Scaling, Handling Time Series Data.

Feature Extraction: Dimensionality Reduction: Feature Selection Techniques, Feature Extraction Techniques; Data Transformation, Data Normalization.

Data Visualization: Different types of plots, Plotting fundamentals using Matplotlib, Plotting fundamentals using Seaborn.
SUGGESTIVE READINGS
TEXT BOOKS:
• T1: Tom M. Mitchell, “Machine Learning”, McGraw Hill, International Edition, 2018.
• T2: Ethem Alpaydin, “Introduction to Machine Learning”, Eastern Economy Edition, Prentice Hall of India, 2015.
• T3: Andreas C. Müller, Sarah Guido, “Introduction to Machine Learning with Python”, O’Reilly, 2018.

REFERENCE BOOKS:
• R1: Sebastian Raschka, Vahid Mirjalili, “Python Machine Learning”, Packt Publishing, 2019.
• R2: Aurélien Géron, “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, O’Reilly, 2nd Edition, 2019.
• R3: Christopher Bishop, “Pattern Recognition and Machine Learning”, Springer, 2016.
Data Transformation
• Data Transformation
• Data transformation is one of the fundamental steps of data processing. When I first learnt feature scaling, the terms scale, standardise, and normalise were often used interchangeably, and it was hard to find guidance on which of them to use and when.
• What Does Feature Scaling Mean?
• In practice, we often encounter different types of variables in the same dataset. A significant issue is that the ranges of these variables may differ a lot. Using the original scales may put more weight on the variables with a large range. To deal with this problem, we rescale the independent variables (features) of the data during pre-processing. The terms normalisation and standardisation are sometimes used interchangeably, but they usually refer to different things.
Data Transformation
Consider a dataset that contains a dependent variable (Purchased) and three independent variables (Country, Age, and Salary). We can easily notice that the variables are not on the same scale: Age ranges from 27 to 50, while Salary ranges from 48 K to 83 K. The range of Salary is much wider than the range of Age. This causes issues for many machine learning models, such as k-means clustering and nearest-neighbour classification, which are based on the Euclidean distance.
Focusing on Age and Salary: when we calculate the Euclidean distance, the Salary term (x2 − x1)² is much bigger than the Age term (y2 − y1)², which means the distance will be dominated by Salary if we do not apply feature scaling; the difference in Age contributes very little. Therefore, we should use feature scaling to bring all values to the same magnitude and thus solve this issue. There are primarily two methods for doing so: Standardisation and Normalisation. A small sketch of the problem follows.
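To make this concrete, here is a minimal Python sketch, using hypothetical Age/Salary values (the slide's dataset is not reproduced here), showing how Salary dominates the raw Euclidean distance and how standardisation balances the two features:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical two-person dataset: columns are [Age, Salary].
X = np.array([[27.0, 48000.0],
              [50.0, 83000.0]])

# Raw distance: the salary gap (35000) swamps the age gap (23).
print(np.linalg.norm(X[0] - X[1]))                 # ~35000.0

# After standardisation, both features contribute equally.
X_scaled = StandardScaler().fit_transform(X)
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))   # ~2.83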
Data Transformation
• Standardisation
Standardisation (also called Z-score normalisation) rescales a feature so that it has a mean of 0 and a standard deviation of 1:
X_stand = (X − mean(X)) / std(X)
Normalization
• Max-Min Normalization
Another common approach is the so-called Max-Min Normalization (Min-Max scaling). This technique re-scales a feature to a distribution with values between 0 and 1: for every feature, the minimum value gets transformed into 0, and the maximum value gets transformed into 1. The general equation is shown below:

X_norm = (X − X_min) / (X_max − X_min)

• Standardisation vs Max-Min Normalization

• In contrast to standardisation, we obtain smaller standard deviations through Max-Min Normalisation. Let me illustrate this using the above dataset, with a small sketch below.
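A minimal sketch of the comparison, assuming a small hypothetical Age/Salary table (scikit-learn's StandardScaler and MinMaxScaler stand in for the two methods):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical data: columns are [Age, Salary].
X = np.array([[27, 48000],
              [35, 54000],
              [41, 61000],
              [50, 83000]], dtype=float)

for name, scaler in [("Standardisation", StandardScaler()),
                     ("Max-Min Normalisation", MinMaxScaler())]:
    X_t = scaler.fit_transform(X)
    print(name, "-> std devs:", X_t.std(axis=0))
# Standardised columns have std dev 1.0; max-min-normalised columns have
# smaller std devs, i.e. the values are more concentrated.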
Normalization
[Figure: distribution plots of Age and Salary after Standardisation and after Max-Min Normalisation]
Normalization

• From the above graphs, we can clearly see that applying Max-Min Normalisation to our dataset produces smaller standard deviations (for Salary and Age) than the Standardisation method. This implies the data are more concentrated around the mean when scaled with Max-Min Normalisation.
• As a result, if you have outliers in a feature (column), normalising your data will scale most of it into a small interval: all features end up on the same scale, but the outliers are not handled well. Standardisation is more robust to outliers, and in many cases it is preferable to Max-Min Normalisation.
Dimensionality Reduction
• What is Dimensionality Reduction?
In machine learning classification problems, there are often too many
factors on the basis of which the final classification is done. These
factors are basically variables called features. The higher the number of
features, the harder it gets to visualize the training set and then work
on it. Sometimes, most of these features are correlated, and hence
redundant. This is where dimensionality reduction algorithms come
into play. Dimensionality reduction is the process of reducing the
number of random variables under consideration, by obtaining a set of
principal variables. It can be divided into feature selection and feature
extraction.

Dimensionality Reduction

Components of Dimensionality Reduction

There are two components of dimensionality reduction (a small sketch follows this list):
• Feature selection: we try to find a subset of the original set of variables, or features, to get a smaller subset which can be used to model the problem. It usually involves one of three approaches:
  • Filter
  • Wrapper
  • Embedded
• Feature extraction: this reduces the data in a high-dimensional space to a lower-dimensional space, i.e. a space with a smaller number of dimensions.
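As a sketch of the difference, the snippet below (using scikit-learn's built-in Iris data purely as an illustrative stand-in) selects a subset of the original features with a filter method and extracts new features with PCA:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)            # 150 samples, 4 features

# Feature selection (filter): keep the 2 original columns that score best.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: build 2 new features as combinations of all 4.
X_pca = PCA(n_components=2).fit_transform(X)

print(X.shape, X_sel.shape, X_pca.shape)     # (150, 4) (150, 2) (150, 2)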
Dimensionality Reduction
Methods of Dimensionality Reduction
• The various methods used for dimensionality reduction include:
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Generalized Discriminant Analysis (GDA)
• Dimensionality reduction may be either linear or non-linear, depending upon the method used. The prime linear method, called Principal Component Analysis, or PCA, is discussed below; a brief usage sketch of these methods follows.
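A brief usage sketch of two of the listed methods in scikit-learn (Iris again serves as a stand-in dataset; GDA is omitted as it has no direct scikit-learn class):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)      # unsupervised: ignores y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised
print(X_pca.shape, X_lda.shape)                   # (150, 2) (150, 2)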
Principal Component Analysis
The main idea of principal component analysis (PCA) is to reduce the
dimensionality of a data set consisting of many variables correlated
with each other, either heavily or lightly, while retaining the variation
present in the dataset, up to the maximum extent. The same is done by
transforming the variables to a new set of variables, which are known
as the principal components (or simply, the PCs) and are orthogonal,
ordered such that the retention of variation present in the original
variables decreases as we move down in the order. So, in this way, the
1st principal component retains the maximum variation that was present in the original variables. The principal components are the eigenvectors of the covariance matrix, and hence they are orthogonal.

How PCA works?
Step 1: Normalize the data
The first step is to normalize the data we have so that PCA works properly. This is done by subtracting the respective mean from each number in its column. So if we have two dimensions X and Y, all X become x − x̄ and all Y become y − ȳ. This produces a dataset whose mean is zero.
Step 2: Calculate the covariance matrix
Since the dataset we took is 2-dimensional, this will result in a 2x2
Covariance matrix.

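A minimal numpy sketch of Steps 1 and 2, using a small hypothetical 2-D dataset:

import numpy as np

# Hypothetical dataset: columns are the two dimensions X and Y.
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])

centered = data - data.mean(axis=0)       # Step 1: subtract the column means
cov = np.cov(centered, rowvar=False)      # Step 2: 2x2 covariance matrix
print(cov)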
How PCA works?
Step 3: Calculate the eigenvalues and eigenvectors
Next step is to calculate the eigenvalues and eigenvectors for the
covariance matrix. This is possible because the covariance matrix is a square matrix. λ is an eigenvalue of a matrix A if it is a solution of the characteristic equation:
det(λI − A) = 0
where I is the identity matrix of the same dimension as A (a required condition for the matrix subtraction) and det is the determinant of the matrix. For each eigenvalue λ, a corresponding eigenvector v can be found by solving:
(λI − A)v = 0
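Continuing the sketch, Step 3 in numpy (np.linalg.eigh is used because a covariance matrix is symmetric):

import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
cov = np.cov(data - data.mean(axis=0), rowvar=False)

eigenvalues, eigenvectors = np.linalg.eigh(cov)   # solves det(λI − A) = 0
print(eigenvalues)     # returned in ascending order
print(eigenvectors)    # one eigenvector per column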
How PCA works?
Step 4: Choosing components and forming a feature vector:
We order the eigenvalues from largest to smallest, which gives us the components in order of significance. Here comes the dimensionality reduction part. If we have a dataset with n variables, then we have n corresponding eigenvalues and eigenvectors. It turns out that the eigenvector corresponding to the highest eigenvalue is the principal component of the dataset, and it is our call how many eigenvalues we choose to proceed with. To reduce the dimensions, we choose the first p eigenvalues and ignore the rest. We do lose some information in the process, but if the discarded eigenvalues are small, we do not lose much.
Next we form a feature vector, which is a matrix of vectors: in our case, only those eigenvectors we want to proceed with. Since we have just 2 dimensions in the running example, we can either choose the eigenvector corresponding to the greater eigenvalue or simply take both:
Feature Vector = (eig1, eig2)
How PCA works?
Step 5: Forming Principal Components:
This is the final step, where we actually form the principal components using all the math we did so far. We take the transpose of the feature vector and left-multiply it by the transpose of the scaled (mean-centered) version of the original dataset:
NewData = FeatureVectorT x ScaledDataT
Here,
NewData is the matrix consisting of the principal components,
FeatureVector is the matrix of the eigenvectors we chose to keep, and
ScaledData is the scaled version of the original dataset.
(‘T’ in the superscript denotes the transpose of a matrix, which is formed by interchanging rows and columns. In particular, a 2x3 matrix has a transpose of size 3x2.) A numpy sketch of Steps 4 and 5 follows.
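Continuing the sketch, Steps 4 and 5: rank the eigenpairs, keep the top p, and project the centered data, reproducing NewData = FeatureVectorT x ScaledDataT:

import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
scaled = data - data.mean(axis=0)                 # ScaledData (mean-centered)
eigvals, eigvecs = np.linalg.eigh(np.cov(scaled, rowvar=False))

order = np.argsort(eigvals)[::-1]                 # Step 4: largest eigenvalue first
p = 1                                             # keep one principal component
feature_vector = eigvecs[:, order[:p]]            # chosen eigenvectors as columns

new_data = feature_vector.T @ scaled.T            # Step 5: 1 x 5 matrix of PCs
print(new_data)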
Questions?

• How Do You Handle Missing or Corrupted Data in a Dataset?

• How Can You Choose a Classifier Based on a Training Set Data Size?

• What Are the Three Stages of Building a Model in Machine Learning?

• What Are the Different Types of Machine Learning?

• What Are the ‘Training Set’ and ‘Test Set’ in a Machine Learning Model? How Much Data Will You Allocate for Your Training, Validation, and Test Sets?
References
Book:
• Ethem Alpaydin, “Introduction to Machine Learning”, Eastern Economy Edition, Prentice Hall of India, 2015.
• Andreas C. Müller, Sarah Guido, “Introduction to Machine Learning with Python”, O’Reilly, 2018.
Research Paper:
• Bi, Qifang, et al. "What is machine learning? A primer for the epidemiologist." American journal of
epidemiology 188.12 (2019): 2222-2239.
• Jordan, Michael I., and Tom M. Mitchell. "Machine learning: Trends, perspectives, and
prospects." Science 349.6245 (2015): 255-260.
Websites:
• https://www.geeksforgeeks.org/machine-learning/
• https://www.javatpoint.com/machine-learning
Videos:
• https://www.youtube.com/playlist?list=PLIg1dOXc_acbdJo-AE5RXpIM_rvwrerwR
THANK YOU

For queries
Email: [email protected]
