
INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

Machine Learning
CSN-382 (Lecture 5)
Dr. R. Balasubramanian
Professor
Department of Computer Science and Engineering
Mehta Family School of Data Science and Artificial Intelligence
Indian Institute of Technology Roorkee
Roorkee 247 667
[email protected]
https://faculty.iitr.ac.in/cs/bala/
R squared

● R-Squared (R² or the coefficient of determination) is a statistical
measure of fit in a regression model.
● It shows how well the data fit the regression model (the goodness of
fit).
● R-squared can take any value between 0 and 1.

R² = SSR / SST

Where,
SSR = Regression sum of squares
SST = Total sum of squares
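As a quick, hedged sketch (not from the slides; the data and function names are illustrative), the ratio above can be computed directly:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = SSR / SST, following the slide's definition."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    y_mean = y_true.mean()
    ssr = np.sum((y_pred - y_mean) ** 2)   # regression sum of squares
    sst = np.sum((y_true - y_mean) ** 2)   # total sum of squares
    return ssr / sst

# Example: predictions from some fitted regression model (illustrative values)
y = [3.0, 4.5, 6.1, 7.9]
y_hat = [3.2, 4.4, 6.0, 7.8]
print(r_squared(y, y_hat))
```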

2
Adjusted R squared

● Adjusted R² is a corrected goodness-of-fit (model accuracy)
measure for linear models.
● The adjusted R-squared increases only when a new term improves the
model more than would be expected by chance.
● The adjusted R-squared is usually positive, though it can be negative.
It is never higher than the R-squared.

Adjusted R² = 1 - [(1 - R²)(N - 1)] / (N - p - 1)

Where,
R² = Sample R-squared
N = Total sample size
p = No. of independent variables
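A minimal companion sketch of the adjusted R-squared formula above (illustrative values, not from the slides):

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Example: R^2 = 0.95 from a model with 3 predictors fitted on 30 samples
print(adjusted_r_squared(0.95, n=30, p=3))   # slightly below 0.95
```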

3
Fluke Correlation

● For any two associated variables, there is practically never an instance
where all the positive deviations exactly cancel all the negative deviations.
● In other words, when we plot the data points for two variables, the
correlation between them is almost never exactly zero,
● because in real-world scenarios the data points are likely to have an
asymmetric distribution, and therefore R > 0.
● This relationship of R > 0 is called a Fluke Relationship.
● R-squared does not eliminate this statistical fluke, whereas Adjusted
R-squared removes it and gives a more reliable measure of the
quality of our model.

4
Linear Regression Assumptions

● The dependent variable must be numeric.


● Linear relationship between dependent and independent variables.
● Predictors must not show multicollinearity.
● Independence of observations should exist (absence of
autocorrelation).
● The error terms should be homoscedastic.
● The error terms must follow a normal distribution.
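A rough sketch of how a few of these assumptions might be checked numerically; the data and checks below are hypothetical and only illustrate the idea:

```python
import numpy as np

# Hypothetical data: 100 samples, 2 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Fit ordinary least squares via least squares on [1, x1, x2]
A = np.column_stack([np.ones(len(X)), X])        # add intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ coef

# Rough assumption checks:
print(np.corrcoef(X, rowvar=False))   # predictor correlations (multicollinearity)
print(residuals.mean())               # should be close to zero
print(residuals.std())                # spread; plot vs. predictions for homoscedasticity
```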

5
Let’s Answer some questions

1. Can we use Linear regression to predict a categorical variable?


2. Is 95% R-Squared a good metric?

6
Polynomial Regression

► Source: https://www.javatpoint.com/machine-learning-polynomial-regression
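As a hedged sketch of the idea (illustrative data, assuming a scikit-learn style workflow rather than the slide's own example), polynomial regression expands x into powers of x and then fits an ordinary linear model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data with a quadratic trend
x = np.arange(10).reshape(-1, 1)
y = 2 + 0.5 * x.ravel() ** 2 + np.random.normal(scale=1.0, size=10)

# Expand x into [1, x, x^2] and fit a linear model on those features
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(x)
model = LinearRegression().fit(X_poly, y)

print(model.predict(poly.transform([[12]])))   # extrapolate to x = 12
```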

7
Non-Linear Regression

Year    Price (24 karat per 10 grams)
1964 Rs.63.25
1965 Rs.71.75
1966 Rs.83.75
1967 Rs.102.50
1968 Rs.162.00
1969 Rs.176.00
1970 Rs.184.00
1971 Rs.193.00
1972 Rs.202.00
1973 Rs.278.50
1974 Rs.506.00

8
1975 Rs.540.00
1976 Rs.432.00
1977 Rs.486.00
1978 Rs.685.00
1979 Rs.937.00
1980 Rs.1,330.00
1981 Rs.1,800.00
1982 Rs.1,645.00
1983 Rs.1,800.00
1984 Rs.1,970.00
1985 Rs.2,130.00

9
1986 Rs.2,140.00
1987 Rs.2,570.00
1988 Rs.3,130.00
1989 Rs.3,140.00
1990 Rs.3,200.00
1991 Rs.3,466.00

1992 Rs.4,334.00
1993 Rs.4,140.00
1994 Rs.4,598.00
1995 Rs.4,680.00
1996 Rs.5,160.00

10
1997 Rs.4,725.00

1998 Rs.4,045.00

1999 Rs.4,234.00

2000 Rs.4,400.00

2001 Rs.4,300.00

2002 Rs.4,990.00

2003 Rs.5,600.00

2004 Rs.5,850.00

2005 Rs.7,000.00

2007 Rs.10,800.00

11
2008 Rs.12,500.00

2009 Rs.14,500.00

2010 Rs.18,500.00

2011 Rs.26,400.00

2012 Rs.31,050.00

2013 Rs.29,600.00

2014 Rs.28,006.50

2015 Rs.26,343.50

2016 Rs.28,623.50

2017 Rs.29,667.50

12
2018 Rs.31,438.00
2019 Rs.35,220.00
2020 Rs.48,651.00
2021 Rs.48,720.00
2022 (Till Today) Rs.52,690.00

13
14
15
Predicted Gold rate in 2032 and
2042
► 2032: Rs. 92390

► 2042: Rs. 132090
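The slides do not show how these predictions were produced; one hedged way to extrapolate such a trend is to fit a line to the log of the price (i.e., assume roughly exponential growth) using a few of the tabulated points. The exact figures depend on the model chosen, so this sketch will not necessarily reproduce the values above.

```python
import numpy as np

# A subset of the gold-price table above (Rs. per 10 g, 24 karat)
years  = np.array([1964, 1974, 1984, 1994, 2004, 2014, 2022])
prices = np.array([63.25, 506.00, 1970.00, 4598.00, 5850.00, 28006.50, 52690.00])

# Fit a straight line to log(price) vs. year, i.e. an exponential trend
coeffs = np.polyfit(years, np.log(prices), deg=1)
trend = np.poly1d(coeffs)

for year in (2032, 2042):
    print(year, round(float(np.exp(trend(year)))))   # extrapolated estimate
```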

16
Feature Engineering for ML

► Feature engineering is the pre-processing step of machine
learning, which is used to transform raw data into features
that can be used for creating a predictive model using
machine learning or statistical modelling.
► Features arise in many domains, e.g.:
 Image Processing
 NLP
► A feature is an attribute that impacts a problem or is useful for
solving the problem.

17
Feature Engineering

18
Four Processes of Feature Engineering

► Feature Creation
► Transformations
► Feature Extraction
► Feature Selection.

19
Feature Creation

► Feature creation is finding the most useful variables to be


used in a predictive model.

► Reference Papers:

► Adaptive Fine-Grained Sketch-Based Image Retrieval,


ECCV 2022.
► Sketching without Worrying: Noise-Tolerant Sketch-Based
Image Retrieval, CVPR 2022.

20
Transformations

► The transformation step of feature engineering involves


adjusting the predictor variable to improve the accuracy and
performance of the model.
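For example, a heavily skewed predictor is often log-transformed and standardized before modelling; a small illustrative sketch (the values are hypothetical, not from the slides):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative skewed predictor (e.g. incomes)
x = np.array([[12_000.0], [15_000.0], [18_000.0], [250_000.0]])

x_log = np.log1p(x)                              # compress the long right tail
x_std = StandardScaler().fit_transform(x_log)    # rescale to zero mean, unit variance
print(x_std.ravel())
```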

21
Feature Extraction

► Feature extraction is an automated feature engineering


process that generates new variables by extracting them
from the raw data.
► Examples:
► Cluster analysis, text analytics, edge detection algorithms,
and principal components analysis (PCA).

22
Feature Selection

► Feature selection is a way of selecting the subset of the most


relevant features from the original feature set by removing
the redundant, irrelevant, or noisy features.
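A minimal sketch of one common selection approach, univariate scoring with scikit-learn (the dataset here is only illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-score against the target
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.get_support())   # boolean mask of the retained features
print(X_selected.shape)         # (150, 2)
```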

23
Principal Component Analysis
(PCA)
► Purpose of PCA
 To compress a large amount of data into a smaller representation, such that the
compressed data captures the essence of the original data.
 Dimensionality Reduction.
 X-D into 3-D or 2-D.

► How is Dimensionality Reduction useful?


 Data processing in higher dimensions involves high time & space complexity and
computing cost.
 There is a risk of over-fitting.

► Not all the features in the dataset are relevant to the problem. Some
features are more relevant than others. The processing may be done
for the more relevant features only, without significant loss of
information.
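As a quick illustration of "X-D into 2-D" (scikit-learn's PCA is used here only for brevity; the slides derive PCA by hand in the following sections):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)      # 64-dimensional pixel features

pca = PCA(n_components=2)                # X-D -> 2-D
X_2d = pca.fit_transform(X)

print(X.shape, "->", X_2d.shape)         # (1797, 64) -> (1797, 2)
print(pca.explained_variance_ratio_)     # variance captured by each component
```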

24
Intuition behind using PCA
► Let’s take an example of counting the minions that are scattered in
a 2-D space. Suppose we want to project them onto a 1-D line and
count.

Courtesy: https://www.youtube.com/channel/UCFJPdVHPZOYhSyxmX_C_Pew
25
Intuition behind using PCA
► How to choose the 1-D line?
 Vertical: the minions will collide with each other when projected => ✖
 At an angle: still, the possibility of collision is there. => ✖
 Horizontal: least possibility of collision. Max. variation. => ✔

26
Principal Component Analysis
(PCA)
► In the minion’s example:
 We reduced the dimensionality from 2 to 1.
 The horizontal line would be the Principal Component.

► How to determine the Principal Component, mathematically?


 Using the concepts: Covariance matrix, Eigen-vectors, etc.
 Discussed in the further slides.

27
Variance and Covariance

Variance
‘How spread out a given dataset is.’

Courtesy: https://www.youtube.com/watch?v=g-Hb26agBFg
28
Variance and Covariance
Covariance
‘Total variation of two variables
from their expected values.’

Covariance Matrix
C = (c_ij), an n × n matrix,
where:
c_ij = cov(A_i, A_j)
A_1, ..., A_n = the given n attributes.

• Covariance:
– Positive ⇒ both the variables increase together.
– Negative ⇒ as one variable increases, the other decreases.
– Zero ⇒ no linear relationship between the variables (they are uncorrelated).
Courtesy: Smith, Lindsay I. A tutorial on principal components analysis. 2002.
29
Example on Covariance
Matrix
Covariance Matrix

C = | cov(H, H)  cov(H, M) |  =  | var(H)   104.5  |  =  |  47.7   104.5 |
    | cov(M, H)  cov(M, M) |     | 104.5    var(M) |     | 104.5   370   |
Eigenvalues and Eigenvectors
► Eigenvector is a direction. E.g., in the minion's example, the
eigenvectors were the directions of the lines - vertical,
horizontal or at an angle.
► Eigenvalue is a number telling how much variance is there
in the data in that direction. E.g., in the minion's example, the
eigenvalue is the number telling how spread out the minions
are on the line.
► Principal Component = Eigenvector with higher
eigenvalue.
► Every Eigenvector has a corresponding Eigenvalue.
► The number of eigenvector/eigenvalue pairs that exist equals the
number of dimensions (experimental observation).

31
Eigenvalues and Eigenvectors
► Let A be an n × n matrix.
 x is an eigenvector of A if:
Ax = λx
 λ is called the eigenvalue associated with x

► How to find the eigenvalue λ?
 Equate the determinant |A - λI| to 0. Here, I is the identity matrix.
|A - λI| = 0 (Characteristic Equation)
 Eigenvalues are the roots of the Characteristic Equation.

► How to find the eigenvectors?
 Use the values of λ in the equation (A - λI)x = 0.

32
Eigenvalues and Eigenvectors
Ques. Find the eigenvalues and eigenvectors of the matrix

A = | -4  -6 |
    |  3   5 |

Let us first derive the characteristic polynomial of A. We get:

A - λI₂ = | -4  -6 |  -  λ | 1  0 |  =  | -4-λ    -6  |
          |  3   5 |       | 0  1 |     |   3    5-λ  |

|A - λI₂| = (-4 - λ)(5 - λ) + 18 = λ² - λ - 2

By solving the above characteristic equation of A, we get the
eigenvalues, λ = 2 and λ = -1.
The corresponding eigenvectors are found by using these values of
λ in the equation (A - λI)x = 0. There are many eigenvectors
corresponding to each eigenvalue.
Courtesy: http://webct.math.yorku.ca/file.php/230/eigenvalues-and-eigenvectors.ppt
33
Eigenvalues and Eigenvectors
• For λ = 2
We solve the equation (A - 2I)x = 0 for x.
The matrix (A - 2I) is obtained by subtracting 2 from the
diagonal elements of A. We get

| -6  -6 | | x1 |  =  0
|  3   3 | | x2 |

This leads to the system of equations
-6x1 - 6x2 = 0
 3x1 + 3x2 = 0
giving x1 = -x2. The solutions to this system of equations are
x1 = -r, x2 = r, where r is a scalar. Thus the eigenvectors of A
corresponding to λ = 2 are nonzero vectors of the form

v1 = | x1 | = x2 | -1 | = r | -1 |
     | x2 |      |  1 |     |  1 |
34
Eigenvalues and Eigenvectors
• For λ = -1
We solve the equation (A + 1I)x = 0 for x.
The matrix (A + 1I) is obtained by adding 1 to the
diagonal elements of A. We get

| -3  -6 | | x1 |  =  0
|  3   6 | | x2 |

This leads to the system of equations
-3x1 - 6x2 = 0
 3x1 + 6x2 = 0
Thus x1 = -2x2. The solutions to this system of equations are
x1 = -2s and x2 = s, where s is a scalar. Thus the eigenvectors
of A corresponding to λ = -1 are nonzero vectors of the form

v2 = | x1 | = x2 | -2 | = s | -2 |
     | x2 |      |  1 |     |  1 |
35
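The hand-computed result can be cross-checked numerically; a brief sketch (NumPy returns unit-length eigenvectors, so they are scalar multiples of v1 and v2 above, and the ordering may differ):

```python
import numpy as np

A = np.array([[-4.0, -6.0],
              [ 3.0,  5.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # approximately [ 2., -1.]
print(eigenvectors)   # columns are eigenvectors, each normalised to length 1
```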
Dimensionality Reduction
► Reduce the data down to its basic components, chipping away any
unnecessary part.
► Assume the minions' data had been represented in 3-D.

36
Dimensionality Reduction
► Clearly, EV3 is unnecessary. Reduce it and represent the data in
terms of EV1 and EV2.
► Re-arrange the axes along the Eigenvectors, rather than the
original 3-D axes.

37
Dimensionality Reduction
Advantages:
► Reduces redundant features,
► Solves the multi-collinearity issue,
► Helps compress the data and reduces the space requirements,
► Shortens the time required to perform the same computation.

Applications:
► Stock Market Analysis,
► Image and Text processing,
► Speech Recognition,
► Recommendation Engine, etc.

38
Example on performing PCA
Step 1: Get some data
Shown in Table 1.
Step2: Subtract the mean
• All the x values have x̅ subtracted
and y values have y̅ subtracted
from them. This produces a data
set whose mean is zero.
• Subtracting the mean makes
variance and covariance
calculation easier by simplifying
their equations. The variance and
co-variance values are not affected
by the mean value.
Courtesy: Smith, Lindsay I. A tutorial on principal components analysis. 2002.
39
Example
Step 3: Calculate the Covariance Matrix:
Solved similarly to the example depicted on slide 8.

Step 4: Calculate eigenvectors & eigenvalues of covariance matrix:


Solved similarly to the example depicted on slides 11-13.

40
Example

41
Example

• Eigenvectors are plotted as diagonal dotted lines on the plot.


• Observe, they are perpendicular to each other.
• Observe, one of the eigenvectors goes through the middle of
the points, like drawing a line of best fit.
• The second eigenvector gives us the other, less important,
pattern in the data, that all the points follow the main line,
but are off to the side of the main line by some amount.

42
Example
Step 5: Choosing components and forming a feature vector:

We can either form a feature vector with both of the eigenvectors:

or, we can choose to leave out the smaller, less significant


component and only have a single column:

43
Example
► Step 6: Deriving the new data

RowFeatureVector is the matrix with the eigenvectors in the


columns transposed so that the eigenvectors are now in the rows,
with the most significant eigenvector at the top.

RowDataAdjust is the mean-adjusted data transposed, i.e. the data


items are in each column, with each row holding a separate
dimension.

44
Example

RowFeatureVector1 = | -.677873399   -.735178956 |        (Refer: slide 20)
                    | -.735178956    .677873399 |

RowFeatureVector2 = | -.677873399   -.735178956 |

                | .69  -1.31  .39  .09  1.29  .49   .19  -.81  -.31   -.71 |   (Refer: slide 17)
RowDataAdjust = | .49  -1.21  .99  .29  1.09  .79  -.31  -.81  -.31  -1.01 |

This gives the original data in terms of the chosen components (eigenvectors), i.e., along
these axes.
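Putting the steps together, a compact sketch that carries out this projection with NumPy on the mean-adjusted data shown above, assuming the FinalData = RowFeatureVector × RowDataAdjust equation from the Smith tutorial the slides follow (signs and ordering of the eigenvectors may differ from the slide):

```python
import numpy as np

# Mean-adjusted data from the slide: one row per dimension (x, y)
row_data_adjust = np.array([
    [ .69, -1.31, .39, .09, 1.29, .49,  .19, -.81, -.31,  -.71],
    [ .49, -1.21, .99, .29, 1.09, .79, -.31, -.81, -.31, -1.01],
])

# Covariance matrix of the two dimensions, then its eigen-decomposition
cov = np.cov(row_data_adjust)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: symmetric matrix, ascending eigenvalues

# RowFeatureVector: eigenvectors as rows, most significant (largest eigenvalue) first
row_feature_vector = eigvecs[:, ::-1].T

# FinalData = RowFeatureVector x RowDataAdjust
# (keep both rows, or only the first row to reduce the data to 1-D)
final_data = row_feature_vector @ row_data_adjust
print(final_data)
```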

45
Example

Final Data: RowFeatureVector1 has 2 rows => 2 eigenvectors
=> 2 dimensions in the final data (same as the initial data).
After solving the Eq. on slide 21.
Using RowFeatureVector1..

46
Example

Final Data: RowFeatureVector2 has 1 row => 1 eigenvector


=> 1 dimension in the final data (reduced from 2)
After solving the Eq. on slide 21.
Using RowFeatureVector2..

47
Thank You!

48
