1-Python Algebra Maths
4. Covariance and Correlation Matrix
Variance of the data X can be defined as follows (as you can see, for single-column data the variance is the same as the covariance of the data with itself).
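A minimal sketch of the definition (assuming the sample form with the 1/(n − 1) normalization; some texts use 1/n instead), writing x̄ for the mean of the n values in X:

```latex
\operatorname{Var}(X) = \operatorname{Cov}(X, X)
  = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```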
Now let's assume that we have two sets of single-column data, which we will call x and y.
4. Covariance and Correlation Matrix
• The covariance between these two data sets (two single-column data columns) is defined as follows:
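A sketch of the definition, assuming the two columns are called x and y with means x̄ and ȳ, and using the same 1/(n − 1) normalization as above:

```latex
\operatorname{Cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```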
• If you combine the two sets of data into a single two-column matrix, the variances and covariances of the columns can be collected into the covariance matrix shown below.
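For this two-column case (the column names x and y are assumptions carried over from above), the covariance matrix holds the variances on the diagonal and the covariances off the diagonal:

```latex
X = \begin{bmatrix} x_1 & y_1 \\ \vdots & \vdots \\ x_n & y_n \end{bmatrix},
\qquad
\operatorname{Cov}(X) =
\begin{bmatrix}
\operatorname{Var}(x)    & \operatorname{Cov}(x, y) \\
\operatorname{Cov}(y, x) & \operatorname{Var}(y)
\end{bmatrix}
```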
4. Covariance and Correlation Matrix
• Properties of the Covariance Matrix
• The covariance matrix can represent the variance and linear correlation in multivariate/multidimensional data.
• The covariance matrix gives you a meaningful result only when the relationships in the data are linear (it only captures linear correlation).
• The covariance matrix is always a square matrix.
• If the data has n rows and m columns (an n x m matrix), the covariance matrix is an m x m square matrix.
• Covariance vs. Correlation: the correlation matrix is the covariance matrix normalized by the standard deviations of each variable, so every entry lies between -1 and 1, as sketched below.
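A minimal sketch of the difference in NumPy; the arrays x and y are made-up sample data, not values from the original slides:

```python
import numpy as np

# Two made-up single-column data sets (one observation per row)
x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.0, 12.0, 14.0])
data = np.column_stack([x, y])          # n x m matrix (4 x 2)

cov = np.cov(data, rowvar=False)        # m x m covariance matrix
corr = np.corrcoef(data, rowvar=False)  # covariance normalized to [-1, 1]

print(cov)   # diagonal = variances, off-diagonal = covariances
print(corr)  # diagonal = 1.0, off-diagonal = correlation coefficients
```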
5. Eigenvalues and Eigenvectors
• In linear algebra, an eigenvector or characteristic vector of a linear transformation
is a nonzero vector that changes at most by a scalar factor when that linear
transformation is applied to it.
• The corresponding eigenvalue is the factor by which the eigenvector is scaled.
5. Eigenvalues and Eigenvectors
• How do we find these eigenvalues and eigenvectors?
• We start by finding the eigenvalue: we know this equation must be true:
Av = λv
• Now let us put in an identity matrix so we are dealing with matrix-vs-matrix:
Av = λIv
• Bring all to left hand side:
Av − λIv = 0
• If v is non-zero then we can solve for λ using just the determinant:
| A − λI | = 0
5. Eigenvalues and Eigenvectors
Example 1
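The worked example from the original slide is not reproduced in the text, so the matrix below is an illustrative substitute, solved with the determinant equation from the previous slide:

```latex
A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},
\qquad
|A - \lambda I| =
\begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix}
= (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = 0
\;\Rightarrow\; \lambda_1 = 3,\ \lambda_2 = 1
```

Substituting λ₁ = 3 back into (A − λI)v = 0 gives the eigenvector v₁ = (1, 1); λ₂ = 1 gives v₂ = (1, −1).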
5. Eigenvalues and Eigenvectors
Example 2
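The original Example 2 is likewise not in the text; as an illustrative substitute, the same assumed matrix can be checked numerically with NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are
# the corresponding (unit-length) eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # 3.0 and 1.0 (order may vary)
print(eigenvectors)  # columns are the eigenvectors of A

# Verify A v = lambda v for the first eigenpair
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # True
```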
6. Principal Component Analysis
• Principal Component Analysis, or PCA, is a dimensionality-reduction method that is often used on large data sets: it transforms a large set of variables into a smaller one that still contains most of the information in the large set.
• Smaller data sets are easier to explore and visualize, and they make analyzing data much easier and faster for machine learning algorithms because there are no extraneous variables to process.
6. Principal Component Analysis
• STEP BY STEP EXPLANATION OF PCA
• STEP 1: STANDARDIZATION
• The aim of this step is to standardize the range of the continuous initial
variables so that each one of them contributes equally to the analysis.
• Mathematically, this can be done by subtracting the mean and dividing
by the standard deviation for each value of each variable.
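A minimal sketch of this step in NumPy; the array X and its values are made-up illustrations, not data from the original slides:

```python
import numpy as np

# X: n observations (rows) x m variables (columns) -- made-up data
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# z-score standardization: subtract each column's mean, divide by its
# standard deviation (use ddof=1 for the sample version if preferred)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```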
6. Principal Component Analysis
• STEP 2: COVARIANCE MATRIX COMPUTATION
• The aim of this step is to understand how the variables of the input data set vary from the mean with respect to each other, or in other words, to see if there is any relationship between them, as sketched below.
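Continuing the same hypothetical example, the covariance matrix of the standardized data is an m x m matrix (2 x 2 here):

```python
# Columns are the variables, so rowvar=False; result is m x m
cov_matrix = np.cov(X_std, rowvar=False)
print(cov_matrix)  # off-diagonal entries show how the two variables co-vary
```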
6. Principal Component Analysis
• STEP BY STEP EXPLANATION OF PCA
• STEP 3: COMPUTE THE EIGENVECTORS AND EIGENVALUES OF THE COVARIANCE MATRIX TO IDENTIFY THE PRINCIPAL COMPONENTS
• Eigenvectors and eigenvalues are the linear algebra concepts that we need to compute from the covariance matrix in order to determine the principal components of the data.
• Principal components are new variables that are constructed as linear combinations or mixtures of the initial variables.
• Most of the information within the initial variables is squeezed or compressed into the first components, as sketched below.
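Continuing the sketch, the eigenpairs of the covariance matrix are computed and sorted in descending order of eigenvalue (np.linalg.eigh is used because a covariance matrix is symmetric):

```python
# Eigen-decomposition of the symmetric covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort the eigenpairs by eigenvalue, largest first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Fraction of the total variance carried by each principal component
explained = eigenvalues / eigenvalues.sum()
```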
6. Principal Component Analysis
• STEP BY STEP EXPLANATION OF PCA
• STEP 4: FEATURE VECTOR
• Computing the eigenvectors and ordering them by their eigenvalues in descending order allows us to find the principal components in order of significance.
• In this step, we choose whether to keep all of these components or discard those of lesser significance (those with low eigenvalues), and form a matrix with the remaining eigenvectors that we call the feature vector, as sketched below.
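Continuing the sketch, keeping only the top k components (k = 1 here is an arbitrary illustrative choice) gives the feature vector, an m x k matrix of eigenvectors:

```python
k = 1                                  # number of components to keep
feature_vector = eigenvectors[:, :k]   # m x k matrix of top eigenvectors
```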
6. Principal Component Analysis
• LAST STEP: RECAST THE DATA ALONG THE PRINCIPAL COMPONENTS AXES
• In this step, which is the last one, the aim is to use the feature vector formed from the eigenvectors of the covariance matrix to reorient the data from the original axes to the ones represented by the principal components.
• This can be done by multiplying the transpose of the feature vector by the transpose of the standardized original data set (equivalently, multiplying the standardized data by the feature vector), as sketched below.
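Continuing the sketch, the projection X_std @ feature_vector is the same quantity (transposed) as the product described above:

```python
# Recast the data: (feature_vector.T @ X_std.T).T == X_std @ feature_vector
X_pca = X_std @ feature_vector   # n x k data expressed along the new axes
print(X_pca.shape)               # (5, 1) for the made-up data above
```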