Module-2 Notes-BCS602
MODULE 2
Understanding Data – 2: Bivariate Data and Multivariate Data, Multivariate Statistics, Essential
Mathematics for Multivariate Data, Feature Engineering and Dimensionality Reduction Techniques.
Basic Learning Theory: Design of Learning System, Introduction to Concept of Learning, Modelling in
Machine Learning.
Bivariate Statistics:
Covariance:
• Covariance is a measure of the joint variability of two random variables, say X and Y.
• Random variables are represented in capital letters; the covariance is written as covariance(X, Y) or COV(X, Y) and is used to measure the variance between two dimensions.
• COV(X, Y) = (1/N) Σ (xi − E(X)) (yi − E(Y))
• Here xi and yi are the data values from X and Y, and E(X) and E(Y) are the mean values of xi and yi.
• N is the number of given data points.
• Example:
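As a small numerical illustration of the formula above (the x and y values are assumed purely for this sketch), covariance can be computed with NumPy:

```python
import numpy as np

# Hypothetical data values for X and Y, assumed only for illustration
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 3, 5, 4, 7])

# COV(X, Y) = (1/N) * sum((xi - E(X)) * (yi - E(Y)))
n = len(x)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / n
print(cov_xy)

# np.cov divides by (N - 1) by default; bias=True gives the 1/N version above
print(np.cov(x, y, bias=True)[0, 1])
```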
Correlation
• It is the most common test for determining an association between two phenomena.
• It measures the strength and direction of the linear relationship between variables x and y. The Pearson correlation coefficient is given by r = COV(X, Y) / (σX · σY).
• The correlation indicates the relationship between dimensions through its sign.
• The sign is more important than the actual value.
o If the value is positive, it indicates that the dimensions increase together.
o If the value is negative, it indicates that while one dimension increases, the other dimension decreases.
o If the value is zero, it indicates that the two dimensions have no linear relationship (they are uncorrelated).
o If two dimensions are highly correlated, it is better to remove one of them, as it is a redundant dimension.
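A minimal sketch of the Pearson correlation coefficient using NumPy, with made-up data values; the sign of r carries the interpretation listed above:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])       # hypothetical dimension X
y = np.array([10, 8, 6, 4, 2])      # hypothetical dimension Y (decreases as X increases)

# Pearson correlation: r = COV(X, Y) / (sigma_X * sigma_Y)
r = np.cov(x, y, bias=True)[0, 1] / (x.std() * y.std())
print(r)                            # -1.0: a perfect negative (inverse) relationship

# the same value from NumPy's built-in helper
print(np.corrcoef(x, y)[0, 1])
```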
Pairplot:
• A pair plot (scatter matrix) is a data visualisation for multivariate data.
• It consists of several pairwise scatter plots for the variables of the multivariate data.
• All the results are presented in matrix format, which makes relationships among the variables, such as correlations between them, easy to read.
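A minimal pair plot sketch using the seaborn library; the iris dataset bundled with seaborn is assumed here purely as an example of multivariate data:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# iris is a small multivariate dataset bundled with seaborn
df = sns.load_dataset("iris")

# Pairwise scatter plots of every numeric variable against every other,
# with distributions on the diagonal; colour by class to see group structure
sns.pairplot(df, hue="species")
plt.show()
```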
1. System of linear equations:
• If there is a unique solution, the system is called consistent independent. If there are multiple solutions, the system is called consistent dependent. If there are no solutions, i.e. the equations are contradictory, the system is called inconsistent.
• For solving a large system of equations, Gaussian elimination can be used.
• The procedure for applying Gaussian elimination is (a code sketch follows this list):
i. Write the system as an augmented matrix.
ii. Apply elementary row operations to reduce the matrix to upper triangular (row echelon) form.
iii. Find the unknowns by back substitution, starting from the last row.
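The sketch below walks through this procedure in Python with NumPy. It assumes a small square system whose pivots are non-zero (so no row interchanges are needed), and the coefficients are chosen only for illustration:

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b by forward elimination followed by back substitution.
    Minimal sketch: assumes a square system with non-zero pivots (no row swaps)."""
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)

    # Forward elimination: reduce A to upper triangular form
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]              # multiplier for row i
            A[i, k:] = A[i, k:] - m * A[k, k:]
            b[i] = b[i] - m * b[k]

    # Back substitution, starting from the last row
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0, -1.0],
              [-3.0, -1.0, 2.0],
              [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
print(gaussian_elimination(A, b))              # agrees with np.linalg.solve(A, b)
```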
2. Matrix decompositions:
• It is a way of reducing a matrix into its eigen values and eigen vectors (eigen decomposition).
• A matrix A can be decomposed as:
o A = Q Λ Q⁻¹
o where Q is the matrix whose columns are the eigen vectors of A, and Λ is the diagonal matrix of its eigen values.
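A short NumPy sketch of this eigen decomposition, using an arbitrary 2×2 matrix assumed for illustration:

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

# Columns of Q are the eigen vectors; Lam is the diagonal matrix of eigen values
eigvals, Q = np.linalg.eig(A)
Lam = np.diag(eigvals)

# Check the decomposition A = Q Lam Q^-1
print(np.allclose(A, Q @ Lam @ np.linalg.inv(Q)))   # True
```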
• LU decomposition:
o Matrix A can be decomposed as: A = LU
o where L is a lower triangular matrix and U is an upper triangular matrix.
• It can be observed that the first matrix is L, the lower triangular matrix whose entries are the multipliers (here 3, 3 and 2/3) used in the reduction of the equations.
• The second matrix is U, the upper triangular matrix whose values are those of the matrix reduced by Gaussian elimination.
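A minimal LU decomposition sketch using SciPy; note that scipy.linalg.lu also returns a permutation matrix P, so in general A = PLU:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# scipy.linalg.lu also returns a permutation matrix P, so in general A = P L U;
# when no row interchanges are needed, P is the identity and A = L U
P, L, U = lu(A)
print(L)                                  # lower triangular (unit diagonal)
print(U)                                  # upper triangular
print(np.allclose(A, P @ L @ U))          # True
```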
• Analysis of data is done using statistics; any data will generate a probability distribution.
Probability Distributions:
• A probability distribution of a variable, say X, summarises the probabilities associated with X's events.
• A distribution is a parameterized mathematical function that describes the relationship between the observations in the sample space.
• Probability distributions are of two types
a. Discrete Probability Distribution
b. Continuous Probability Distribution
• The relationship between the events of a continuous random variable and their probabilities is called a Continuous Probability Distribution. It is summarised by a Probability Density Function (PDF), which gives the likelihood of observing an instance.
• The Cumulative Distribution Function (CDF) computes the probability of an observation being less than or equal to a given value.
• For a continuous random variable, the probability of an exact outcome cannot be obtained directly from the PDF; it has to be computed as the area under the curve over a small interval around that specific outcome.
Types of continuous probability distribution
1. Normal Distribution:
• It is a continuous probability distribution, also known as the Gaussian distribution or the bell-shaped curve distribution.
• Data tends to be around a central value with no bias to the left or right.
• Examples: heights of students, blood pressure of a population, marks scored in a class.
• The normal distribution is given as f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)), where μ is the mean and σ is the standard deviation.
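A small sketch using scipy.stats.norm, with an assumed mean and standard deviation (for example, student heights in cm), showing the PDF, the CDF, and the probability of a small interval computed as an area under the curve:

```python
from scipy.stats import norm

mu, sigma = 170.0, 10.0        # assumed mean and standard deviation (heights in cm)

# PDF: density (likelihood) of observing a height of 175 cm
print(norm.pdf(175, loc=mu, scale=sigma))

# CDF: probability of observing a height less than or equal to 175 cm
print(norm.cdf(175, loc=mu, scale=sigma))

# Probability of a small interval around 175 cm = area under the PDF curve
print(norm.cdf(175.5, loc=mu, scale=sigma) - norm.cdf(174.5, loc=mu, scale=sigma))
```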
Density estimation is the problem of estimating the density function from observed data.
The estimated density function, denoted p(x), can be used directly to obtain the density value for any unknown data point, say xt, as p(xt).
There are two types of density estimation methods: parametric density estimation and non-parametric density estimation.
1. Parametric Density Estimation: it assumes the data is drawn from a known probability distribution and estimates the parameters of that distribution from the given data.
2. Non-Parametric Density Estimation: it makes no assumption about the form of the underlying distribution.
o Parzen window estimation: the density at a point x is estimated as p(x) = k / (N × V), where k is the number of samples that fall inside a region R centred at x and N is the total number of samples. Here V is the volume of the region R; if R is a hypercube centred at x and h is the length of its side, the volume V is h² for a 2D square and h³ for a 3D cube.
o KNN estimation: it is another non-parametric density estimation method. The initial parameter k is determined, and based on it the k nearest neighbours are found; the estimated probability density function is the average of the values returned by those neighbours.
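A minimal Parzen-window (hypercube) sketch of non-parametric density estimation; the 1-D sample below is synthetic and assumed only for illustration:

```python
import numpy as np

def parzen_window_density(x, samples, h):
    """Estimate p(x) = k / (N * V) by counting the samples that fall inside a
    hypercube of side h centred at x. Minimal non-parametric sketch."""
    samples = np.atleast_2d(samples)
    n, d = samples.shape
    k = np.sum(np.all(np.abs(samples - x) <= h / 2.0, axis=1))
    V = h ** d                          # h^2 for a 2D square, h^3 for a 3D cube
    return k / (n * V)

# Synthetic 1-D sample assumed for illustration (drawn from a standard normal)
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=1000).reshape(-1, 1)
print(parzen_window_density(np.array([0.0]), data, h=0.5))   # roughly 0.4, the true density at 0
```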
o Feature Relevance: some features are more relevant than others. Distinctive features, for example a mole on a face, can help in face detection more than common features like the nose.
o Feature Redundancy: some features are redundant. For example, when a database table has a field called date of birth, an age field is not needed, as age can easily be computed from the date of birth; removing the age column reduces the dimension by one.
o The procedure for removing redundant features (feature subset selection) is as follows; a code sketch follows this list:
i. generate all possible feature subsets
ii. evaluate each subset using model performance
iii. evaluate the results and select the optimal feature subset
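A minimal sketch of this exhaustive subset-search procedure, assuming scikit-learn's iris dataset and a logistic regression model purely for illustration (for many features this brute-force search becomes expensive, and greedy search is used instead):

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Exhaustive (wrapper-style) feature subset search
X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -np.inf, None
for r in range(1, n_features + 1):
    for subset in combinations(range(n_features), r):        # i. generate all subsets
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, list(subset)], y, cv=5).mean()  # ii. evaluate performance
        if score > best_score:                                # iii. keep the best subset
            best_score, best_subset = score, subset

print(best_subset, best_score)
```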
4. Eigen values and eigen vectors of a covariance matrix are calculated. an eigenvector is a
non-zero vector that remains in the same direction after a linear transformation, scaled by
its corresponding eigenvalue.
5. The eigen vector with the highest eigen value is the principal component of the dataset. The eigen values are arranged in descending order, and the feature vector is formed with the corresponding eigen vectors as its columns: Feature vector = {eigenvector1, eigenvector2, ..., eigenvectorn}
6. Obtain the transpose of the feature vector; let it be A. The mapping of a vector x to y using this transformation is described as y = A(x − m), where m is the mean vector. This transform is called the Karhunen-Loeve or Hotelling transform.
7. The PCA transform is therefore y = A(x − m), where x is the input, m is the mean, and A is the transpose of the feature vector.
8. The original data can be retrieved using the formula given below (since A is orthogonal, A⁻¹ = Aᵀ):
Original data (f) = {A⁻¹ × y} + m
= {Aᵀ × y} + m
• A scree plot is a visualisation technique used to display the principal components and their eigen values.
• For example, when PCA is applied to a randomly selected dataset with 246 attributes, its scree plot indicates that only 6 out of the 246 attributes are important.
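A scree-plot sketch using scikit-learn's PCA and matplotlib; the data here is random and assumed only to show the plotting mechanics (a real dataset would show a clearer elbow):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# Random data assumed only to show the mechanics; use the real dataset in practice
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))

pca = PCA().fit(X)

# Scree plot: eigen values (explained variance) of the principal components,
# in descending order; the "elbow" suggests how many components to keep
plt.plot(range(1, len(pca.explained_variance_) + 1), pca.explained_variance_, "o-")
plt.xlabel("Principal component")
plt.ylabel("Eigen value (explained variance)")
plt.title("Scree plot")
plt.show()
```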
Example: let the data points be (2, 6) and (1, 7), taken as 2-D column vectors. Apply PCA and find the transformed data.
Then apply the inverse transform and verify that the original data is recovered, proving that PCA works (a code sketch follows):
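A NumPy sketch of this worked example, reading the two data points as 2-D column vectors (2, 6) and (1, 7); it applies the transform y = A(x − m) and then the inverse transform to recover the original data:

```python
import numpy as np

# The two data points as the columns of a matrix
X = np.array([[2.0, 1.0],
              [6.0, 7.0]])

m = X.mean(axis=1, keepdims=True)       # mean vector
C = np.cov(X, bias=True)                # covariance matrix of the data points

# Eigen values / eigen vectors of the covariance matrix, sorted in descending order
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
A = eigvecs[:, order].T                 # rows are the eigen vectors (transpose of the feature vector)

# PCA / Karhunen-Loeve transform: y = A (x - m)
Y = A @ (X - m)
print(Y)

# Inverse transform: original data = A^T y + m (A is orthogonal, so A^-1 = A^T)
X_rec = A.T @ Y + m
print(np.allclose(X_rec, X))            # True: the original data is recovered exactly
```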
5. Find the eigen values and eigen vectors of AᵀA. Pack the eigen vectors as the columns of a matrix called V.
6. Thus A = USVᵀ. Here U and V are orthogonal matrices, and S is a diagonal matrix whose entries (the singular values) are the square roots of the eigen values of AᵀA.
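A short NumPy sketch of the SVD, with an arbitrary 3×2 matrix assumed for illustration; it verifies that A = USVᵀ and that the singular values are the square roots of the eigen values of AᵀA:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

# NumPy returns U, the singular values s, and V^T directly
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S = np.diag(s)
print(np.allclose(A, U @ S @ Vt))       # True: A = U S V^T

# The singular values are the square roots of the eigen values of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)   # ascending order
print(np.sqrt(eigvals[::-1]))           # matches s (descending order)
```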
Examples: