ML Module 2,3,4
- Norms:
Norms are mathematical measures that quantify the size or length of a vector
in a vector space. In machine learning and linear algebra, the two most
commonly used norms are the L1 norm and the L2 norm, defined below.
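For a vector x = (x1, x2, …, xn):
L1 norm: ||x||_1 = |x1| + |x2| + … + |xn| (the sum of absolute values, also called the Manhattan length)
L2 norm: ||x||_2 = sqrt(x1^2 + x2^2 + … + xn^2) (the Euclidean length)
The L1 norm underlies Lasso regularization and the L2 norm underlies Euclidean
distance and Ridge regularization (both discussed under regularized regression below).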
- Inner Product:
The inner product, also known as the dot product, is a binary operation
that takes two vectors and returns a scalar quantity.
It measures the similarity or projection of one vector onto another and
plays a fundamental role in defining distances, angles, and orthogonality
in vector spaces.
The inner product is used in many mathematical and computational
applications, including geometry, signal processing, and machine learning;
a short example follows.
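A quick NumPy illustration (the vectors a and b here are arbitrary examples,
not taken from the notes):

import numpy as np

# Two example vectors.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 2.0])

# Inner (dot) product: a scalar measuring how strongly a and b align.
dot = np.dot(a, b)  # 1*4 + 2*(-1) + 3*2 = 8.0

# Angle between the vectors, recovered from <a, b> = ||a|| ||b|| cos(theta).
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
theta_deg = np.degrees(np.arccos(cos_theta))

# A zero inner product would mean the vectors are orthogonal.
print(dot, theta_deg)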
- Diagonalization:
Diagonalization is the process of factoring a square matrix A as
A = P D P^(-1), where the columns of P form a basis of eigenvectors of A
and D is a diagonal matrix holding the corresponding eigenvalues.
Diagonalization simplifies matrix computations, facilitates eigenvalue
analysis, and provides insight into the matrix's properties and behavior.
It is used in many mathematical and computational applications,
including solving systems of linear equations, computing matrix powers
(A^k = P D^k P^(-1)), and solving differential equations.
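A small NumPy sketch of these points, using an arbitrary symmetric matrix
(symmetric so real eigenvalues and an orthogonal eigenbasis are guaranteed):

import numpy as np

# An arbitrary symmetric 2x2 matrix.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# eigh returns eigenvalues d and a matrix P whose columns are eigenvectors.
d, P = np.linalg.eigh(A)
D = np.diag(d)

# Verify A = P D P^(-1); P is orthogonal here, so P^(-1) = P.T.
print(np.allclose(A, P @ D @ P.T))  # True

# Matrix powers become cheap on the diagonal form: A^5 = P D^5 P^(-1).
A5 = P @ np.diag(d ** 5) @ P.T
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))  # True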
- Least-Squares Classification (LSC):
3. Decision Boundary:
- The decision boundary is determined by the threshold value (e.g., 0.5)
applied to the predicted class probabilities.
- If the predicted probability is above the threshold, the instance is classified
as class 1; otherwise, it is classified as class 0.
4. Loss Function:
- LSC minimizes the squared-error loss between the model's real-valued
predictions and the numeric class labels (e.g., 0 and 1).
- The loss function penalizes misclassifications by squaring the difference
between the predicted value and the actual class label; a minimal sketch
appears after this list.
5. Applications:
- Least-Squares Regression for classification is a simple and interpretable
method commonly used in situations where linear decision boundaries are
appropriate.
- It can be applied to various binary classification tasks, such as spam
detection, medical diagnosis, and sentiment analysis.
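A minimal NumPy sketch of least-squares classification with the 0.5 threshold
described above (the toy dataset is invented for illustration):

import numpy as np

# Toy binary dataset: two features per point, labels 0/1.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.5], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])

# Append a bias column and solve the least-squares problem min ||Xb w - y||^2.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Real-valued scores; thresholding at 0.5 gives the class assignments.
scores = Xb @ w
pred = (scores >= 0.5).astype(int)
print(pred)  # [0 0 1 1]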
- Multivariate Linear Regression:
2. Applications:
- Multivariate linear regression is widely used in fields such as:
o Economics: analyzing the impact of multiple factors on economic
outcomes like GDP or employment rates.
o Finance: predicting stock prices from multiple financial indicators
such as interest rates, market indices, and company performance
metrics.
o Social Sciences: investigating the relationships between
demographic factors, social behaviors, and health outcomes.
o Marketing: predicting sales or market share from advertising
expenditure, pricing strategies, and consumer demographics.
o Environmental Science: modeling the relationships between
environmental variables (temperature, humidity, pollution levels)
and ecological outcomes (species abundance, biodiversity).
- Regularized regression:
Regularized regression is an extension of linear regression that introduces
penalty terms to the model's cost function, aiming to prevent overfitting and
improve predictive performance.
Model Representation:
The model equation is the same as in ordinary linear regression:
ŷ = w0 + w1·x1 + w2·x2 + … + wd·xd
Types of Regularization:
The two primary types of regularization commonly used in regularized
regression are:
1. L1 Regularization (Lasso):
L1 regularization adds the sum of the absolute values of the
coefficients to the cost function:
J(w) = Σi (yi − ŷi)² + λ Σj |wj|
L1 regularization encourages sparsity in the coefficient estimates,
as it tends to shrink the coefficients of less relevant features to
exactly zero.
L1 regularization thus facilitates feature selection by effectively
removing irrelevant or redundant features from the model.
2. L2 Regularization (Ridge):
L2 regularization adds the sum of the squared coefficients to the
cost function:
J(w) = Σi (yi − ŷi)² + λ Σj wj²
L2 regularization shrinks all coefficients toward zero but rarely to
exactly zero, which stabilizes the estimates when features are
correlated; a short code sketch contrasting the two penalties follows.
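A scikit-learn sketch contrasting the two penalties (the synthetic data and
the alpha value are invented for illustration; alpha plays the role of λ above):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Synthetic data: 5 features, but only the first two actually matter.
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# alpha is the penalty strength (the lambda in the cost functions above).
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# Lasso tends to zero out the three irrelevant coefficients;
# Ridge shrinks them toward zero but usually not exactly to zero.
print("lasso:", lasso.coef_)
print("ridge:", ridge.coef_)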
- Support Vector Machines (SVM):
Model Representation:
o Given a training dataset with input features and corresponding
class labels, SVM finds the hyperplane that separates the classes
with the largest margin.
o The hyperplane is defined by a set of support vectors, which are the
data points closest to the decision boundary.
Key Concepts:
o Margin: The distance between the hyperplane and the nearest data
point from each class. SVM aims to maximize this margin, leading
to better generalization.
o Kernel Trick: SVM can handle non-linearly separable data by
mapping input features into a higher-dimensional space using
kernel functions (e.g., polynomial, radial basis function) to find a
linear separation boundary.
o Regularization Parameter (C): Controls the trade-off between
maximizing the margin and minimizing the classification error on
the training data. Higher values of C allow for fewer margin
violations but may lead to overfitting.
o Kernel Parameters: Parameters specific to the chosen kernel
function, such as the degree for polynomial kernels and the gamma
parameter for radial basis function (RBF) kernels.
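These parameters map directly onto scikit-learn's SVC; a minimal sketch on a
standard non-linearly-separable toy dataset:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable in the original space.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# The RBF kernel maps the data implicitly to a higher-dimensional space;
# C and gamma are the trade-off and kernel parameters described above.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)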
Types of SVM:
SVMs can be categorized by the type of decision boundary they form.
The main types are:
1. Linear SVM:
a. Linear SVMs classify data by finding the optimal hyperplane that
linearly separates the classes in the feature space.
b. The decision boundary is a straight line (in 2D), or a hyperplane (in
higher dimensions) that maximizes the margin between the classes.
c. Linear SVMs are suitable for linearly separable datasets where
classes can be separated by a straight line or plane.
2. Non-linear SVM:
a. Non-linear SVMs are used for datasets that are not linearly
separable in the original feature space.
b. They employ kernel functions to map the input features into a
higher-dimensional space where the classes become separable by a
hyperplane.
c. Common kernel functions include polynomial kernel, radial basis
function (RBF) kernel, sigmoid kernel, and custom kernels tailored
to specific data characteristics.
d. Non-linear SVMs are capable of capturing complex decision
boundaries and can handle more intricate patterns in the data.
Applications:
SVM is widely used in various fields, including:
o Text classification (e.g., spam detection, sentiment analysis).
o Image recognition (e.g., object detection, facial recognition).
o Bioinformatics (e.g., gene expression classification, protein
structure prediction).
o Finance (e.g., credit scoring, stock market prediction).
o Medical diagnosis (e.g., disease classification, cancer detection).
Module 4: Hebbian Learning and Expectation Maximization
- EM Algorithm for Clustering:
1. Objective:
a. The EM algorithm for clustering aims to find the parameters of a
mixture model that best describe the underlying data distribution.
b. It iteratively estimates the parameters of the mixture model by
maximizing the likelihood of the observed data.
2. Model Representation:
a. The mixture model represents the data as a combination of multiple
probability distributions (e.g., Gaussian distributions) with
different parameters.
b. Each component of the mixture model represents a cluster in the
data.
3. Algorithm Steps:
a. Expectation (E) Step: the algorithm estimates the probability of
each data point belonging to each cluster (i.e., computes the
posterior probabilities, or responsibilities).
b. Maximization (M) Step: the algorithm updates the parameters of
the mixture model (e.g., the means and covariances of the Gaussian
components) based on the responsibilities estimated in the E-step;
both steps appear in the sketch after this list.
4. Iterative Process:
a. The EM algorithm iterates between the E-step and M-step until
convergence, where the likelihood of the observed data stops
improving or reaches a predefined threshold.
b. Each iteration of the algorithm typically improves the fit of the
mixture model to the data, leading to better cluster assignments and
parameter estimates.
5. Initialization:
a. The performance of the EM algorithm can be sensitive to the initial
parameter values.
b. Common initialization strategies include random initialization,
k-means clustering, or hierarchical clustering.
6. Applications:
a. The EM algorithm for clustering is widely used in various
domains, including image segmentation, document clustering, and
gene expression analysis.
b. It is particularly useful when the data contains hidden or latent
variables and when the underlying data distribution is complex and
cannot be easily modeled by a single probability distribution.
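As referenced in the algorithm steps above, here is a compact sketch of
EM-based clustering using scikit-learn's GaussianMixture (the two-cluster
data is synthetic, invented for illustration):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic data drawn from two well-separated Gaussian clusters.
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(150, 2)),
               rng.normal(loc=5.0, scale=1.0, size=(150, 2))])

# GaussianMixture runs EM internally: the E-step computes responsibilities,
# the M-step re-estimates means, covariances, and mixing weights.
gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)

labels = gmm.predict(X)      # hard cluster assignments
resp = gmm.predict_proba(X)  # soft responsibilities from the E-step
print(gmm.means_)            # estimated centers, roughly [0, 0] and [5, 5]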