Mod2 Notes (4)
Prepared By
Indu K S
Asst Professor, Dept of ISE, TOCE, Bangalore
Module 2
• Understanding Data – 2: Bivariate Data and Multivariate Data
• Multivariate Statistics
• Essential Mathematics for Multivariate Data
• Feature Engineering and Dimensionality Reduction Techniques
• Basic Learning Theory: Design of Learning System
• Introduction to Concept of Learning
• Modelling in Machine Learning
1. Covariance
Covariance is a measure of the joint variability of two random variables, say X and Y.
Generally, random variables are represented in capital letters.
It is written as covariance(X, Y) or COV(X, Y) and is used to measure how the two
variables vary together. The formula for the covariance of specific values x and y is:

COV(X, Y) = (1/N) Σ_i (x_i − E(X)) (y_i − E(Y))
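As a small illustration (not part of the original notes), the covariance can be computed directly from the definition or with NumPy; the data values below are made up:

```python
# A minimal sketch (not from the notes): computing covariance with NumPy.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 4, 9, 16, 25], dtype=float)

# Manual computation following the definition: mean of (x - E(X)) * (y - E(Y)).
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is COV(X, Y).
# Note: np.cov divides by N - 1 by default, so pass bias=True to divide by N
# and match the population formula above.
cov_matrix = np.cov(x, y, bias=True)
print(cov_manual, cov_matrix[0, 1])   # both print the same value
```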
2. Correlation
The (Pearson) correlation coefficient normalizes the covariance by the standard deviations of the two variables:

r = COV(X, Y) / (σ_X × σ_Y)

Its value lies between −1 and +1 and indicates the strength and direction of the linear relationship between X and Y.
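A minimal sketch (not from the notes) of computing the correlation coefficient with NumPy; the arrays below are hypothetical:

```python
# A minimal sketch: Pearson correlation with NumPy.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)   # perfectly linear in x

# r = COV(X, Y) / (sigma_X * sigma_Y); np.corrcoef returns the 2x2 matrix of r values.
r = np.corrcoef(x, y)[0, 1]
print(r)   # ≈ 1.0, a perfect positive correlation
```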
2.7 MULTIVARIATE STATISTICS
• In machine learning, almost all datasets are multivariable.
• Multivariate analysis deals with more than two observable variables.
• The multivariate data is like bivariate data but may have more than two
dependent variables.
• Some of the multivariate analyses are regression analysis, principal
component analysis, and path analysis.
The mean of multivariate data is a mean vector; for example, for a sample dataset with
three attributes, the mean vector might be (2, 7.5, 1.33).
The variance of multivariate data becomes the covariance matrix. The mean
vector is called the centroid, and the covariance matrix is called the dispersion matrix.
Multivariate data has three or more variables.
The aims of multivariate analysis are much broader than those of bivariate analysis;
typical techniques include regression analysis, factor analysis, and multivariate analysis of variance (MANOVA).
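As a small illustration (not part of the original notes), the mean vector and covariance (dispersion) matrix can be computed with NumPy. The original worked table of three attributes is not reproduced here, so the numbers below are hypothetical values chosen only so that the mean vector comes out as (2, 7.5, 1.33):

```python
# A minimal sketch: mean vector (centroid) and covariance (dispersion) matrix
# for a small multivariate dataset with three attributes.
import numpy as np

# Hypothetical data: each row is an observation, each column an attribute.
X = np.array([[1.0,  5.0, 1.0],
              [2.0,  7.5, 1.0],
              [3.0, 10.0, 2.0]])

mean_vector = X.mean(axis=0)           # the centroid, here (2, 7.5, 1.33)
cov_matrix = np.cov(X, rowvar=False)   # 3x3 covariance (dispersion) matrix
print(mean_vector)
print(cov_matrix)
```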
Heatmap
• A heatmap is a graphical representation of data where individual values in a matrix are
represented as colors.
• It is commonly used in data science and machine learning to visualize correlations
between variables, feature importance, or distributions in datasets.
• A heatmap is a graphical representation of a 2D matrix.
• A heatmap is like a table, but instead of numbers, colors are used to indicate values.
• It takes a matrix as input and colors each cell according to its value.
• Darker colors indicate larger values and lighter colors indicate smaller values.
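A minimal sketch (not from the notes) of drawing such a heatmap with Matplotlib; the matrix and colour map are arbitrary choices:

```python
# A minimal sketch: drawing a heatmap of a 2D matrix with Matplotlib.
import numpy as np
import matplotlib.pyplot as plt

matrix = np.random.rand(5, 5)          # any 2D matrix works

plt.imshow(matrix, cmap="viridis")     # color each cell by its value
plt.colorbar(label="value")            # legend mapping colors to numbers
plt.title("Heatmap of a 5x5 matrix")
plt.show()
```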
Understanding Correlation Heatmaps
A correlation heatmap is a special type of heatmap that visualizes the correlation between variables
in a dataset. It helps identify relationships between features.
Correlation values range from -1 to 1:
● +1 → Perfect positive correlation (as one variable increases, the other increases proportionally).
● -1 → Perfect negative correlation (as one variable increases, the other decreases proportionally).
● 0 → No linear correlation.
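A minimal sketch (not from the notes) of building a correlation heatmap with pandas and seaborn; the DataFrame columns (height, weight, noise) and their values are hypothetical:

```python
# A minimal sketch: correlation heatmap of a hypothetical DataFrame.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame({"height": rng.normal(170, 10, 100)})
df["weight"] = 0.5 * df["height"] + rng.normal(0, 5, 100)   # positively correlated
df["noise"] = rng.normal(0, 1, 100)                         # roughly uncorrelated

corr = df.corr()                       # matrix of pairwise correlations in [-1, 1]
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation heatmap")
plt.show()
```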
Example: imagine you are waiting for a bus that arrives randomly every 10 minutes (on average). The waiting time can be modelled by a probability distribution whose parameter is unknown, and MLE estimates that parameter from the observed waiting times.
1. Assume a Distribution
○ We assume the observed data follows a specific probability distribution with unknown parameters.
2. Define the Likelihood Function
○ This function tells us how likely it is to see the observed data given some
parameter values.
3. Maximize the Likelihood
○ We adjust the parameters so that the likelihood of seeing our data is as high as
possible.
○ This gives us the best estimate of the parameters.
Maximum Likelihood Estimation (MLE) is a method used in parametric density
estimation to estimate the parameters of a probability distribution by maximizing the
likelihood function. It aims to find the parameter values that make the observed data
most probable.
In parametric density estimation, we assume that the data follows a specific probability
distribution (e.g., Normal, Exponential, Binomial) with unknown parameters. MLE helps
us determine these parameters.
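A minimal sketch (not from the notes) of MLE for a Normal distribution: the likelihood is maximised by the sample mean and the population standard deviation, and scipy's fit() obtains the same estimates numerically. The data below is synthetic:

```python
# A minimal sketch: maximum likelihood estimation for a Normal distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)   # hypothetical observations

# Closed-form MLE for the Normal distribution.
mu_hat = data.mean()
sigma_hat = data.std()          # divides by N, which is the MLE (not N - 1)

# scipy's maximum likelihood fit gives (almost) the same answer.
mu_fit, sigma_fit = stats.norm.fit(data)
print(mu_hat, sigma_hat)
print(mu_fit, sigma_fit)
```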
The relevance of MLE for machine learning is that MLE can solve the problem of
predictive modelling.
If one assumes that the regression problem can be framed as predicting an output y
given an input x, i.e. modelling p(y | x), then the MLE framework can be applied as:

max_h Σ_i log P(y_i | x_i, h)

where h is the hypothesis (model) being fitted.
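As a rough illustration (not in the original notes): if the noise in y is assumed to be Gaussian, maximising this log-likelihood over a linear hypothesis reduces to ordinary least squares. The data, slope and intercept below are made up:

```python
# A minimal sketch: Gaussian MLE for linear regression equals least squares.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 3.0 * x + 1.0 + rng.normal(0, 1, 50)    # hypothetical data: y = 3x + 1 + noise

# Least-squares (= Gaussian MLE) estimate of slope and intercept.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
print(slope, intercept)   # close to 3 and 1
```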
Gaussian Mixture Model and Expectation-Maximization (EM) Algorithm
In machine learning, clustering is one of the important tasks. The MLE framework is quite useful for
designing model-based methods for clustering data.
1. Gaussian Mixture Model (GMM)
A Gaussian Mixture Model (GMM) is a soft clustering algorithm that assumes data points
come from multiple Gaussian (bell-shaped) distributions. Instead of assigning each point to
one specific cluster (like K-Means), GMM gives a probability score for a data point belonging
to different clusters.
Example:
Imagine you have height and weight data for people from three different countries, but you
don’t know which country each person belongs to.
● A GMM would assume the data comes from three different Gaussian distributions.
● Instead of saying "this person is definitely from country A", GMM will say "this person
has a 70% chance of being from country A, 20% from B, and 10% from C."
2. Expectation-Maximization (EM) Algorithm
The EM algorithm is commonly used to compute the MLE in the presence of latent or
missing variables. It alternates between an Expectation (E) step, which estimates the latent
variables given the current parameters, and a Maximization (M) step, which re-estimates
the parameters given those latent variables, until convergence.
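A minimal sketch (not from the notes) using scikit-learn, whose GaussianMixture estimator fits the mixture parameters with the EM algorithm. The three height/weight groups below are synthetic stand-ins for the three-country example:

```python
# A minimal sketch: fitting a Gaussian Mixture Model (EM under the hood).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical height/weight data drawn from three groups.
group_a = rng.normal([160, 55], [5, 4], size=(100, 2))
group_b = rng.normal([175, 70], [5, 5], size=(100, 2))
group_c = rng.normal([185, 90], [5, 6], size=(100, 2))
X = np.vstack([group_a, group_b, group_c])

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Soft clustering: a probability for each of the three components per point.
probs = gmm.predict_proba(X[:1])
print(probs)   # three membership probabilities that sum to 1
```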
Eigenvalues and Eigenvectors
Formal Definition:
For a square matrix A, an eigenvector v and its corresponding eigenvalue λ satisfy:
Av=λv
This means:
● When you multiply matrix A with vector v, it only scales the vector (does not change its
direction).
● The scaling factor is the eigenvalue λ.
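A quick check of Av = λv with NumPy (a small illustration, not from the notes; the matrix A is arbitrary):

```python
# A minimal sketch: eigenvalues and eigenvectors satisfy A v = lambda v.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of eigenvectors are the v's

v = eigenvectors[:, 0]
lam = eigenvalues[0]
print(A @ v)        # same direction ...
print(lam * v)      # ... only scaled by lambda
```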
2.10.3 Principal Component Analysis
PCA (Principal Component Analysis) is a method used in machine learning and statistics to
reduce the number of variables in a dataset while keeping the most important information.
This leads to a reduced and compact set of features. This elimination is possible because
of information redundancy among the original features, and the resulting compact
representation has a reduced dimension.
Ex: Imagine you have a big collection of books and want to organize them efficiently.
● Instead of sorting them by every small detail (title, author, genre, year, pages, price,
etc.),
● You pick only the most important factors (genre and author) to classify them.
PCA does something similar—it reduces the number of features (variables) in a dataset but
keeps the most important patterns.
The PCA algorithm is as follows:
1. The dataset X is taken as input.
2. The mean is subtracted from the dataset. Let the mean be m. Thus, the adjusted dataset is X − m. The objective
of this step is to transform the dataset to have zero mean.
3. The covariance matrix of the mean-adjusted dataset is computed.
4. The eigenvalues and eigenvectors of the covariance matrix are computed.
5. The eigenvector of the highest eigenvalue is the principal component of the dataset. The eigenvalues are arranged
in descending order, and the feature vector is formed with these eigenvectors in its columns.
6. The transpose of the feature vector is taken; call it A.
7. The PCA transform is y = A × (x − m), where x is the input dataset, m is the mean, and A is the transpose of the feature
vector.
The original data can be retrieved using the formula given below:

x = A^T × y + m

When all the eigenvectors are retained, A is orthonormal and the original data is recovered exactly; if only the top eigenvectors are kept, the retrieval is approximate.
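A minimal sketch (not from the notes) of these PCA steps with NumPy: subtract the mean, take the eigenvectors of the covariance matrix, project, and then reconstruct. The dataset and the choice of keeping two components are arbitrary:

```python
# A minimal sketch of PCA following the steps above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # hypothetical dataset: 100 samples, 3 features

m = X.mean(axis=0)                       # step 2: mean vector
X_adj = X - m                            # zero-mean dataset

C = np.cov(X_adj, rowvar=False)          # step 3: covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # step 4: eigenvalues / eigenvectors

order = np.argsort(eigvals)[::-1]        # step 5: sort eigenvalues in descending order
feature_vector = eigvecs[:, order[:2]]   # keep the top-2 principal components

A = feature_vector.T                     # step 6: transpose of the feature vector
Y = (A @ X_adj.T).T                      # step 7: PCA transform y = A (x - m)

X_back = (A.T @ Y.T).T + m               # approximate retrieval: x ≈ A^T y + m
print(Y.shape, X_back.shape)
```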