Module 5 - BECE309L - AIML - Part2


Data Transformation

▪ Benefits:
  ▪ Transformed data is better organized.
  ▪ Properly formatted and validated data improves data quality.
  ▪ Data transformation facilitates compatibility between applications, systems, and types of data.
▪ Challenges:
  ▪ Data transformation can be expensive and resource-intensive (computing resources, licensing costs, etc.).
  ▪ Lack of expertise and carelessness can introduce errors during transformation.
Data Reduction
▪ Data Reduction: Process of transforming numerical or alphabetical digital information
derived empirically or experimentally into a corrected, ordered, and simplified form.
▪ Concept: Reduction of vast amounts of data down to the meaningful parts.
▪ In the case of discrete data, reduction involves smoothing and interpolation. In the case
of digital data, reduction involves some editing, scaling, encoding, sorting, collating,
etc.
▪ Benefits of data reduction:
  ▪ The reduced representation of the data is smaller in volume but produces the same (or similar) analytical results.
  ▪ Data analytics/mining on the reduced data set takes much less time than on the complete data set.
Data Reduction
▪ Curse of Dimensionality
  ▪ Dimensionality: the number of features we have in the dataset.
  ▪ Assume a housing price prediction task with 500 features:

      No. of Features    ML Model    Accuracy
      3                  Model 1     ↑↑
      10                 Model 2     ↑↑↑↑
      15                 Model 3     ↓↓
      50                 Model 4     ↓↓↓
      100                Model 5     ↓↓↓↓
      500                Model 6     ↓↓↓↓↓↓

  ▪ As the number of features increases, accuracy decreases and model performance degrades.
  ▪ Reason: the model would have to do a lot of calculations.
Data Reduction

▪ Example: Housing Price. With a few features the price is clear; with many features, the model gets confused:

      Feature                          Price
      Location                         75 L
      3 BHK                            1.2 Cr.
      Beach nearby                     2.5 Cr.
      Near a mall                      ???
      Near a school                    ???
      Near an EV charging station      ???   → Confused!
Data Reduction

▪ Two Ways to Address the Curse of Dimensionality
  ▪ Feature Selection
    ▪ Consider only the important features from the dataset, and then train the model on them.
  ▪ Dimensionality Reduction (Feature Extraction)
    ▪ Derive new features from the existing set of features such that the essence of the original features is captured.
    ▪ E.g.: {f1, f2, f3, …, f7} → {d1, d2} (see the sketch below)
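A minimal sketch of both approaches (an illustration, not the slides' own code), assuming scikit-learn is available and using a synthetic regression dataset in place of the housing data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.decomposition import PCA

# Hypothetical data: 7 original features f1..f7 and a target y.
X, y = make_regression(n_samples=200, n_features=7, noise=0.1, random_state=0)

# Feature selection: keep the 2 original features most related to the target.
selector = SelectKBest(score_func=f_regression, k=2)
X_selected = selector.fit_transform(X, y)    # shape (200, 2), original columns

# Feature extraction: derive 2 new features {d1, d2} from all 7 features.
pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X)           # shape (200, 2), new components

print(X_selected.shape, X_extracted.shape)
```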
Why Dimensionality Reduction

▪ Prevent → Curse of Dimensionality
▪ Improve Performance of the Model
▪ Better Visualization
  ▪ How to visualize 500 features? (See the sketch below.)
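A minimal visualization sketch (an assumption, not from the slides), reducing 500 synthetic features to 2 principal components for plotting; it assumes scikit-learn and matplotlib are installed:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Hypothetical data with 500 features.
X, y = make_classification(n_samples=300, n_features=500, n_informative=10,
                           random_state=0)

X_2d = PCA(n_components=2).fit_transform(X)   # 500-D -> 2-D

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="coolwarm", s=15)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("500 features visualized in 2 principal components")
plt.show()
```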
Principal Component Analysis (PCA)

▪ Summary:
  ▪ As we keep increasing the number of features,
  ▪ at training time the model may, at some point, get confused, as it has so many features to learn;
  ▪ because of that, the performance of the model will degrade.
Covariance Analysis
▪ Feature Selection: covariance analysis is used to find the mathematical relationship between a feature x and the output y (inputs i1 … in map to outputs o1 … om).
  ▪ Linear relationship (+ve Cov): x↑ y↑ and x↓ y↓
  ▪ Inverse linear relationship (−ve Cov): x↑ y↓ and x↓ y↑
  ▪ No relationship (0 Cov)
  [Figure: scatter plots of y vs. x for positive, negative, and zero covariance]
Covariance Analysis

▪ Feature Selection
  ▪ Features with very high covariance are important, as they directly influence the output.
  ▪ If the covariance is ZERO, there is no relation between the feature and the output, and the feature can be neglected. (See the sketch below.)
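A minimal sketch (not from the slides) of covariance-based feature screening with NumPy; the three feature columns and the output y are made up for illustration:

```python
import numpy as np

# Hypothetical data: 3 features (columns) and an output y.
X = np.array([[2.0, 1.0, 7.0],
              [3.0, 5.0, 7.1],
              [4.0, 3.0, 6.9],
              [5.0, 6.0, 7.0],
              [6.0, 7.0, 7.2],
              [7.0, 8.0, 6.8]])
y = np.array([1.0, 5.0, 3.0, 6.0, 7.0, 8.0])

for j in range(X.shape[1]):
    cov_xy = np.cov(X[:, j], y, bias=True)[0, 1]   # population covariance
    print(f"feature {j}: cov with y = {cov_xy:.2f}")
# Features whose covariance with y is near zero are candidates to drop.
```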
Pearson Correlation

Pearson Correlation = cov(x, y) / (σx · σy), which lies in [−1, +1]

▪ If PC is closer to +1 → x and y have a stronger positive correlation
▪ If PC is closer to −1 → x and y have a stronger negative correlation
  (A short numerical sketch follows below.)
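A minimal sketch (made-up feature/target columns) computing the Pearson correlation directly from the formula above and cross-checking it with NumPy's built-in:

```python
import numpy as np

x = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([1.0, 5.0, 3.0, 6.0, 7.0, 8.0])

cov_xy = np.cov(x, y, bias=True)[0, 1]       # population covariance
pearson = cov_xy / (x.std() * y.std())       # value in [-1, +1]

print(round(pearson, 3))                     # ~0.90 -> strong positive correlation
print(round(np.corrcoef(x, y)[0, 1], 3))     # same result via np.corrcoef
```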
Feature Selection

▪ Example: Housing Dataset
  ▪ Features: House Size, No. of Rooms
  ▪ Output Feature: Price
  [Figure: scatter plots of Price vs. House Size and Price vs. Fountain Size]
Feature Extraction

▪ Example: Housing Dataset
  ▪ Independent Features: No. of Rooms, Room Size
  ▪ Output Feature: Price
  ▪ Reduce the features from TWO to ONE: apply some transform to generate a new feature (House Size).
Feature Extraction

▪ Example: Housing Dataset (columns: Room Size, No. of Rooms, Price)
  ▪ Reduce the TWO features (Room Size and No. of Rooms) to the single derived feature House Size.
  [Figure: scatter plot of No. of Rooms vs. Room Size]
Feature Extraction

▪ Example: Housing Dataset (columns: Room Size, No. of Rooms, Price)
  ▪ 2D → 1D: project the (No. of Rooms, Room Size) data onto a single axis.
  ▪ As the spread ↑, the variance ↑.
  [Figure: 2-D scatter of No. of Rooms vs. Room Size with its 1-D projection]
Feature Extraction

▪ Example: Housing Dataset (columns: Room Size, No. of Rooms, Price)
  ▪ As the spread ↑, the variance ↑.
  ▪ In the 2D → 1D projection, information is lost if only one dimension is considered.
  [Figure: 2-D scatter projected onto one axis, showing the lost spread]
Feature Extraction

▪ Example: Housing Dataset (columns: Room Size, No. of Rooms, Price)
  ▪ In an nD → mD conversion with n ≫ m, if too much information is lost, the model might give WRONG predictions.
Principal Component Analysis (PCA)

▪ In an nD → mD conversion with n ≫ m, some transformation (eigen decomposition) is applied to a matrix.
  [Figure: No. of Rooms vs. Room Size with the direction along which maximum variance is captured and least variance is lost]
Principal Component Analysis (PCA)
▪ Maximum variance is captured along the new X axis (PC1); the least variance is lost along the new Y axis (PC2).
▪ The best PC captures the maximum variance.
▪ Variance: PC1 > PC2 > PC3 > PC4 > …
  [Figure: No. of Rooms vs. Room Size data projected onto the PC1 and PC2 axes]
Principal Component Analysis (PCA)

▪ Find a projection that captures the largest amount of variation in the data.
▪ The original data are projected onto a much smaller space, resulting in dimensionality reduction.
▪ We find the eigenvectors of the covariance matrix, and these eigenvectors define the new space.
  [Figure: data in the (x1, x2) plane with the principal eigenvector e]
Principal Component Analysis (PCA)
❑ PCA transforms the variables into a new set of variables called principal components (PCs).
❑ PCs are linear combinations of the original variables and are orthogonal.
❑ The 1st PC accounts for most of the possible variation in the original data.
❑ The 2nd PC does its best to capture the remaining variance in the data.
❑ There can be only 2 PCs for a 2-D data set.

[Figure: original data space → transformed data space]
Principal Component Analysis (PCA)
The steps involved in the PCA algorithm are as follows (a minimal NumPy sketch is given below):
❑ Get the data.
❑ Compute the mean vector (µ).
❑ Subtract the mean from the given data.
❑ Calculate the covariance matrix.
❑ Calculate the eigenvectors and eigenvalues of the covariance matrix.
❑ Choose components and form a feature vector.
❑ Derive the new data set.
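The sketch below (an illustration, not the slides' own code) implements these steps with NumPy on the same six 2-D points used in the worked example that follows:

```python
import numpy as np

# Step 1: get the data (one column per sample, as in the example below).
X = np.array([[2, 3, 4, 5, 6, 7],
              [1, 5, 3, 6, 7, 8]], dtype=float)

# Steps 2-3: compute the mean vector and subtract it from the data.
mu = X.mean(axis=1, keepdims=True)      # [[4.5], [5.0]]
Xc = X - mu

# Step 4: covariance matrix, dividing by n as in the slides.
n = X.shape[1]
C = (Xc @ Xc.T) / n                     # ~[[2.92, 3.67], [3.67, 5.67]]

# Step 5: eigenvalues and eigenvectors of the (symmetric) covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns them in ascending order

# Step 6: choose the component with the largest eigenvalue as the feature vector.
pc1 = eigvecs[:, -1]                    # unit-length direction of maximum variance

# Step 7: derive the new (1-D) data set by projecting the centred data onto PC1.
X_new = pc1 @ Xc
print(np.round(eigvals, 2))             # ~[0.38, 8.21]
print(np.round(X_new, 2))
```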
PCA : Example
Consider the two-dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
Compute the principal component using the PCA algorithm.

x1 = (2, 1), x2 = (3, 5), x3 = (4, 3), x4 = (5, 6), x5 = (6, 7), x6 = (7, 8)

      x1   x2   x3   x4   x5   x6
      2    3    4    5    6    7
      1    5    3    6    7    8
PCA : Example
Compute Mean

Mean µ = ( (2 + 3 + 4 + 5 + 6 + 7) / 6 , (1 + 5 + 3 + 6 + 7 + 8) / 6 )

Mean Vector µ = (4.5, 5)
PCA : Example (… contd.)
Subtract Mean from the Vectors
x1 − µ = (2 − 4.5, 1 − 5)
x2 − µ = (3 − 4.5, 5 − 5)
x3 − µ = (4 − 4.5, 3 − 5)
x4 − µ = (5 − 4.5, 6 − 5)
x5 − µ = (6 − 4.5, 7 − 5)
x6 − µ = (7 − 4.5, 8 − 5)

      x1′    x2′    x3′    x4′    x5′    x6′
      −2.5   −1.5   −0.5   0.5    1.5    2.5
      −4     0      −2     1      2      3
PCA : Example (… contd.)
Compute Covariance Matrix

Covariance Matrix = Σ (xi − µ)(xi − µ)ᵀ / n
PCA : Example (… contd.)
Compute Covariance Matrix

Covariance Matrix = Σ (xi − µ)(xi − µ)ᵀ / n

With mi = (xi − µ)(xi − µ)ᵀ (2×2 matrices written row-wise as [a b; c d]):

m1 = [−2.5; −4] · [−2.5  −4] = [6.25  10; 10  16]
m2 = [−1.5; 0] · [−1.5  0] = [2.25  0; 0  0]
m3 = [−0.5; −2] · [−0.5  −2] = [0.25  1; 1  4]
m4 = [0.5; 1] · [0.5  1] = [0.25  0.5; 0.5  1]
m5 = [1.5; 2] · [1.5  2] = [2.25  3; 3  4]
m6 = [2.5; 3] · [2.5  3] = [6.25  7.5; 7.5  9]
PCA : Example (… contd.)
Compute Covariance Matrix

Covariance Matrix = (m1 + m2 + m3 + m4 + m5 + m6) / n

CM = (1/6) · [17.5  22; 22  34] = [2.92  3.67; 3.67  5.67]
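A quick NumPy check of this covariance matrix, offered as a sketch rather than part of the original example; bias=True divides by n, matching the slides' formula:

```python
import numpy as np

X = np.array([[2, 3, 4, 5, 6, 7],
              [1, 5, 3, 6, 7, 8]], dtype=float)

CM = np.cov(X, bias=True)     # population covariance (divide by n)
print(np.round(CM, 2))        # [[2.92 3.67]
                              #  [3.67 5.67]]
```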
PCA : Example (… contd.)
Compute Eigenvalues & Eigenvectors of CM

det(CM − λI) = 0

det( [2.92  3.67; 3.67  5.67] − [λ  0; 0  λ] ) = 0

det( [2.92 − λ   3.67; 3.67   5.67 − λ] ) = 0
PCA : Example (… contd.)
Solving the equation:

(2.92 − λ)(5.67 − λ) − 3.67 · 3.67 = 0
λ² − 8.59λ + 3.09 = 0

Solving the quadratic equation, we obtain the eigenvalues:

λ1 = 8.22
λ2 = 0.38

The 2nd eigenvalue is much smaller than the 1st, so it can be left out.
PCA : Example (… contd.)
Obtaining Eigenvectors

The eigenvector can be determined from the following equation:

M X = λ X,    where M is the covariance matrix, X is the eigenvector, and λ is the eigenvalue.

⇒ [2.92  3.67; 3.67  5.67] · [x1; x2] = 8.22 · [x1; x2]

Solving:
2.92 x1 + 3.67 x2 = 8.22 x1
3.67 x1 + 5.67 x2 = 8.22 x2
PCA : Example (… contd.)
Simplifying:
5.3 x1 = 3.67 x2      … eq. (1)
3.67 x1 = 2.55 x2     … eq. (2)

From the above equations: x1 = 0.69 x2

From eq. (2), the eigenvector is:

⇒ [x1; x2] = [2.55; 3.67]

[Figure: the eigenvector direction in the (x1, x2) plane]
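As a side check (not part of the slides), NumPy's eigen solver gives the same eigenvalues and the same eigenvector direction, up to scale and sign:

```python
import numpy as np

CM = np.array([[2.92, 3.67],
               [3.67, 5.67]])

eigvals, eigvecs = np.linalg.eig(CM)
print(np.round(eigvals, 2))            # ~8.21 and 0.38 (order may vary; slides round to 8.22)

v = eigvecs[:, np.argmax(eigvals)]     # unit eigenvector for the largest eigenvalue
print(np.round(v / v[0] * 2.55, 2))    # rescaled: ~[2.55, 3.67], matching the example
```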
PCA : Example 2
Using the PCA algorithm, transform the given pattern (2, 1) onto the eigenvector obtained in the previous example.

Given:
Eigenvector: [x1; x2] = [2.55; 3.67]
Mean Vector: [4.5; 5]
Feature Vector: [2; 1]
PCA : Example 2
To transform the feature vector:

(Eigenvector)ᵀ · (Feature Vector − Mean Vector)
= [2.55  3.67] · ([2; 1] − [4.5; 5])
= [2.55  3.67] · [−2.5; −4]
= −21.055
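A quick NumPy check of this projection (a sketch; note that the example's eigenvector is not unit length):

```python
import numpy as np

v = np.array([2.55, 3.67])     # eigenvector from the previous example (not unit length)
mu = np.array([4.5, 5.0])      # mean vector
x = np.array([2.0, 1.0])       # pattern to transform

print(v @ (x - mu))            # -21.055
# With a unit-length eigenvector the projection would be -21.055 / ||v|| ~ -4.71.
```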
