ML Mod32019
Maximum likelihood estimation (MLE) is a method for estimating the
parameters of a statistical model, given observations (see Section 6.5 for details). The
method seeks the parameter values that maximize the likelihood function, or
equivalently the log-likelihood function, given the observations.
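As a concrete illustration, consider estimating the mean and variance of a univariate Gaussian. Here the log-likelihood has a closed-form maximizer: the sample mean and the (biased, divide-by-n) sample variance. The sketch below, using NumPy with illustrative values, checks that these estimates score at least as well as nearby parameter values:

```python
import numpy as np

# Illustrative sketch: MLE for the mean and variance of a univariate Gaussian.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)  # assumed true mean 5, std 2

mu_hat = data.mean()                      # MLE of the mean
var_hat = ((data - mu_hat) ** 2).mean()   # MLE of the variance (divides by n, not n-1)

def log_likelihood(x, mu, var):
    """Gaussian log-likelihood of the sample x under N(mu, var)."""
    return np.sum(-0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var))

# The MLE should score at least as high as nearby parameter values.
assert log_likelihood(data, mu_hat, var_hat) >= log_likelihood(data, mu_hat + 0.1, var_hat)
assert log_likelihood(data, mu_hat, var_hat) >= log_likelihood(data, mu_hat, var_hat * 1.1)
```

With enough samples, `mu_hat` and `var_hat` land close to the true values used to generate the data.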
In the case of Gaussian mixture models, finding a maximum likelihood estimate by
taking the derivatives of the log-likelihood function with respect to all the parameters
and simultaneously solving the resulting equations is analytically intractable, because
the likelihood contains a sum over components inside the logarithm. So we apply the
EM algorithm to solve the problem.
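The EM iteration for a Gaussian mixture can be sketched as follows. This is a minimal, illustrative implementation for a two-component one-dimensional mixture (all names and the specific initial values are my own choices, not from the text): the E-step computes each point's posterior responsibility for each component, and the M-step re-estimates weights, means, and variances from the responsibility-weighted data.

```python
import numpy as np

# Minimal EM sketch for a two-component 1-D Gaussian mixture.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Initial guesses for mixing weights, means, and variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibilities (posterior probability of each component).
    dens = pi * gaussian_pdf(x[:, None], mu, var)      # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the weighted data.
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

# The estimated means should move toward the values used to generate the data.
print(np.sort(mu))
```

Each iteration is guaranteed not to decrease the log-likelihood, which is why EM is the standard approach where direct maximization is intractable.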
Dimensionality reduction is commonly used in fields that deal with high-dimensional
data, such as speech recognition, signal processing, and bioinformatics. It can also be
used for data visualization, noise reduction, and cluster analysis.
Benefits of applying Dimensionality Reduction
By reducing the dimensions of the features, the space required to store the dataset is
also reduced.
Less computation and training time are required with fewer feature dimensions.
Reduced feature dimensions make it easier to visualize the data quickly.
It removes redundant features (if present) by taking care of multicollinearity.
Principal Component Analysis (PCA) is an unsupervised learning algorithm used for
dimensionality reduction in machine learning. It is a statistical process that converts the
observations of correlated features into a set of linearly uncorrelated features with the help of
an orthogonal transformation. These new transformed features are called the Principal
Components. It is one of the popular tools used for exploratory data analysis and
predictive modeling. It is a technique for drawing strong patterns from a dataset by
reducing the number of dimensions while retaining as much of the variance as possible.
As described above, the transformed new features, i.e., the output of PCA, are the Principal
Components. The number of these PCs is at most the number of original features
present in the dataset. Some properties of these principal components are given below:
Each principal component must be a linear combination of the original features.
The components are orthogonal, i.e., the correlation between any pair of them is
zero.
The importance of the components decreases from 1 to n: the 1st PC captures the
most variance and is the most important, and the nth PC the least.
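These properties can be verified directly with a small sketch. The example below (illustrative data and variable names of my own choosing) computes PCA via eigendecomposition of the covariance matrix and checks that the component directions are orthonormal and that the variance explained per component decreases from PC 1 to PC n:

```python
import numpy as np

# PCA sketch via eigendecomposition of the covariance matrix.
rng = np.random.default_rng(2)
base = rng.normal(size=(500, 1))
# Three features, the first two strongly correlated (illustrative data).
X = np.hstack([base + 0.1 * rng.normal(size=(500, 1)),
               2 * base + 0.1 * rng.normal(size=(500, 1)),
               rng.normal(size=(500, 1))])

Xc = X - X.mean(axis=0)                  # center the features
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # sort PCs by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                    # the transformed (uncorrelated) features

# Orthogonality: the component directions form an orthonormal basis.
assert np.allclose(eigvecs.T @ eigvecs, np.eye(3), atol=1e-8)
# Decreasing importance: PC variances are in descending order.
pc_var = scores.var(axis=0, ddof=1)
assert np.all(np.diff(pc_var) <= 1e-9)
```

Each column of `eigvecs` is one principal component expressed as a linear combination of the original features, matching the first property above.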