ML Module 2,3,4

Module 2: Linear Algebra for Machine Learning

- System of Linear Equations:
  • A system of linear equations consists of multiple equations in multiple unknown variables.
  • It can be represented in matrix form as Ax = b and solved using techniques such as Gaussian elimination, matrix inversion, and matrix factorization (see the sketch below).
  • Linear algebra provides a powerful framework for solving systems of equations and understanding the properties of linear transformations.
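
A minimal NumPy sketch of solving a small system Ax = b (the coefficient values are invented for illustration; np.linalg.solve uses an LU, i.e. Gaussian-elimination style, factorization internally):

```python
import numpy as np

# 2x + y = 5
# x - 3y = -1
A = np.array([[2.0, 1.0],
              [1.0, -3.0]])
b = np.array([5.0, -1.0])

x = np.linalg.solve(A, b)  # solves Ax = b via LU factorization
print(x)                   # the solution vector [x, y]
```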

- Norms:
Norms are mathematical measures used to quantify the size or length of a vector in a vector space. In machine learning and linear algebra, two commonly used norms are the L1 norm and the L2 norm:

1. L1 Norm (Manhattan Norm or Taxicab Norm):
  • The L1 norm of a vector x is defined as the sum of the absolute values of its components:
    ||x||_1 = |x_1| + |x_2| + ... + |x_n|
  • Geometrically, the L1 norm represents the distance between the origin and the point defined by the vector when travelling only along horizontal and vertical paths (like a taxi cab navigating city blocks).
  • Properties of the L1 norm:
    - It is less sensitive to outliers than the L2 norm.
    - It tends to produce sparse solutions in optimization problems, which leads to feature selection.
    - It is commonly used in Lasso regularization for linear regression and in feature selection algorithms.
2. L2 Norm (Euclidean Norm):
  • The L2 norm of a vector x is defined as the square root of the sum of the squares of its components:
    ||x||_2 = sqrt(x_1^2 + x_2^2 + ... + x_n^2)
  • Geometrically, the L2 norm represents the Euclidean distance between the origin and the point defined by the vector in a multidimensional space.
  • Properties of the L2 norm:
    - It is sensitive to outliers because it squares the values of the vector components.
    - It tends to produce dense solutions in optimization problems, which may not be desirable in some scenarios.
    - It is commonly used in Ridge regularization for linear regression and as a distance metric in algorithms such as k-nearest neighbors (KNN).
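
A short NumPy sketch (example vector invented here) showing both norms:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l1 = np.sum(np.abs(x))        # L1 norm: |3| + |-4| + |1| = 8
l2 = np.sqrt(np.sum(x ** 2))  # L2 norm: sqrt(9 + 16 + 1) ≈ 5.099

# The same values via NumPy's built-in norm function
assert np.isclose(l1, np.linalg.norm(x, ord=1))
assert np.isclose(l2, np.linalg.norm(x, ord=2))
```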

- Inner Product:
  • The inner product, also known as the dot product, is a binary operation that takes two vectors and returns a scalar quantity.
  • It measures the similarity or projection of one vector onto another and plays a fundamental role in defining distances, angles, and orthogonality in vector spaces.
  • The inner product is used in various mathematical and computational applications, including vector spaces, geometry, signal processing, and machine learning.
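
As a quick illustration (the example vectors are invented), the inner product and the cosine of the angle it induces can be computed as:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -5.0, 6.0])

dot = np.dot(a, b)  # 1*4 + 2*(-5) + 3*6 = 12
cos_angle = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine similarity
print(dot, cos_angle)
```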

- Diagonalization:
  • Diagonalization is the process of transforming a square matrix A into a diagonal matrix by finding a basis of eigenvectors and writing A = P D P^{-1}, where the columns of P are the eigenvectors and D is a diagonal matrix of the corresponding eigenvalues.
  • Diagonalization simplifies matrix computations, facilitates eigenvalue analysis, and provides insight into the matrix's properties and behavior.
  • It is used in various mathematical and computational applications, including solving systems of linear equations, computing matrix powers, and solving differential equations.
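
A small NumPy sketch (example matrix chosen arbitrarily) of diagonalization and its use for computing matrix powers:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, P = np.linalg.eig(A)  # columns of P are eigenvectors of A
D = np.diag(eigvals)           # diagonal matrix of eigenvalues

# A can be reconstructed from its eigendecomposition: A = P D P^{-1}
assert np.allclose(A, P @ D @ np.linalg.inv(P))

# Diagonalization makes matrix powers cheap: A^5 = P D^5 P^{-1}
A_pow5 = P @ np.diag(eigvals ** 5) @ np.linalg.inv(P)
assert np.allclose(A_pow5, np.linalg.matrix_power(A, 5))
```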

- SVD and its Applications:
  • Singular Value Decomposition (SVD) is a matrix factorization technique that decomposes a matrix A into three matrices: A = U Σ V^T.
  • U is an orthogonal matrix containing the left singular vectors.
  • Σ is a diagonal matrix containing the singular values.
  • V^T is the transpose of an orthogonal matrix containing the right singular vectors.
  • SVD is a powerful tool for dimensionality reduction, data compression, and matrix approximation.
Its applications include:
  • Dimensionality Reduction: Retaining only the significant singular values and vectors reduces data dimensions while preserving essential information.
  • Matrix Approximation: Truncating Σ yields a lower-rank approximation of A, useful for denoising and matrix completion tasks.
  • Collaborative Filtering: Factorizing user-item rating matrices helps make personalized recommendations in recommender systems.
  • Principal Component Analysis (PCA): SVD identifies principal components, aiding dimensionality reduction while retaining variance in the data.
  • Image Processing: SVD enables image denoising, compression, and restoration by decomposing images into singular components.
  • Latent Semantic Analysis (LSA): Analyzing term-document matrices uncovers latent semantic structures in text for tasks like topic modeling and information retrieval.
  • Low-Rank Matrix Completion: SVD helps recover missing entries by approximating a matrix with a low-rank representation, useful in recommender systems and collaborative filtering.
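
A brief NumPy sketch (a random matrix is used purely for illustration) of SVD and a truncated rank-k approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
assert np.allclose(A, U @ np.diag(s) @ Vt)

k = 2  # keep only the 2 largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-2 approximation of A
print(np.linalg.norm(A - A_k))               # approximation error
```
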
Module 3: Regression and Support Vector Machine (SVM)

- Least-Squares Regression for classification:


Least-Squares Regression for classification, also known as Least-Squares
Classification (LSC), is a simple method used for binary classification tasks.
Here's a straightforward explanation:
1. Objective:
- Least-Squares Regression for classification aims to find a linear decision
boundary that separates the classes in the feature space.
- It seeks to minimize the squared error between the predicted class labels and
the actual class labels.
2. Model Representation:
- Given a dataset with input features and binary class labels (0 or 1), LSC fits
a linear regression model to the data.
- The model predicts the class labels using a linear equation of the form:
    y = w_0 + w_1*x_1 + w_2*x_2 + ... + w_n*x_n
3. Decision Boundary:
- The decision boundary is determined by applying a threshold (e.g., 0.5) to the model's predicted values.
- If the predicted value is above the threshold, the instance is classified as class 1; otherwise, it is classified as class 0.
4. Loss Function:
- LSC minimizes the squared error loss between the predicted class labels and
the actual class labels.
- The loss function penalizes misclassifications by squaring the difference
between the predicted and actual class labels.
5. Applications:
- Least-Squares Regression for classification is a simple and interpretable
method commonly used in situations where linear decision boundaries are
appropriate.
- It can be applied to various binary classification tasks, such as spam
detection, medical diagnosis, and sentiment analysis.
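
A minimal NumPy sketch of least-squares classification on a tiny invented dataset, fitting the linear model with np.linalg.lstsq and thresholding at 0.5:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])  # input features
y = np.array([0, 0, 1, 1])                                      # binary labels

Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # add an intercept column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)     # least-squares fit of the weights

scores = Xb @ w                            # real-valued predictions
predictions = (scores >= 0.5).astype(int)  # threshold at 0.5
print(predictions)                         # [0 0 1 1]
```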

- Multivariate linear regression:


Multivariate linear regression is an extension of simple linear regression to
handle multiple independent variables.
In multivariate linear regression, the goal is to model the relationship between
multiple predictors (independent variables) and a single target variable
(dependent variable) by fitting a linear equation to the observed data.
1. Model Representation:
- The model takes the form y = β_0 + β_1*x_1 + β_2*x_2 + ... + β_p*x_p + ε, where y is the target variable, x_1, ..., x_p are the predictors, β_0, ..., β_p are the regression coefficients, and ε is the error term.
2. Applications:
- Multivariate linear regression is widely used in various fields such as:
  • Economics: Analyzing the impact of multiple factors on economic outcomes like GDP or employment rates.
  • Finance: Predicting stock prices based on multiple financial indicators such as interest rates, market indices, and company performance metrics.
  • Social Sciences: Investigating the relationships between demographic factors, social behaviors, and health outcomes.
  • Marketing: Predicting sales or market share based on advertising expenditure, pricing strategies, and consumer demographics.
  • Environmental Science: Modeling the relationships between environmental variables (temperature, humidity, pollution levels) and ecological outcomes (species abundance, biodiversity).
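
A short scikit-learn sketch of multivariate linear regression; the feature meanings and numbers below are invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [advertising spend, price, store count]; target: sales
X = np.array([[10.0, 5.0, 3.0],
              [20.0, 4.5, 4.0],
              [30.0, 4.0, 6.0],
              [40.0, 3.5, 8.0]])
y = np.array([25.0, 40.0, 58.0, 75.0])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)      # beta_0 and (beta_1, ..., beta_p)
print(model.predict([[25.0, 4.2, 5.0]]))  # prediction for a new observation
```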

- Regularized regression:
Regularized regression is an extension of linear regression that introduces
penalty terms to the model's cost function, aiming to prevent overfitting and
improve predictive performance.

Model Representation:
  • The model equation resembles linear regression:
    y = β_0 + β_1*x_1 + ... + β_p*x_p + ε
  • Additional penalty terms are included in the cost function to regulate the size of the coefficients.

Types of Regularization:
The two primary types of regularization commonly used in regularized
regression are:

1. L1 Regularization (Lasso):
  • L1 regularization adds the sum of the absolute values of the coefficients to the cost function:
    Cost = Σ (y_i - ŷ_i)^2 + λ Σ |β_j|
  • L1 regularization encourages sparsity in the coefficient estimates, as it tends to shrink the coefficients of less relevant features to exactly zero.
  • L1 regularization facilitates feature selection by effectively removing irrelevant or redundant features from the model.

2. L2 Regularization (Ridge):
  • L2 regularization adds the sum of the squared values of the coefficients to the cost function:
    Cost = Σ (y_i - ŷ_i)^2 + λ Σ β_j^2
  • L2 regularization penalizes large coefficients, shrinking them towards zero, but typically not all the way to zero.
  • L2 regularization is effective in reducing the impact of multicollinearity among predictor variables by stabilizing the coefficient estimates.
Applications:
Regularized regression finds application across various domains:
  • Finance: Predicting stock prices, risk assessment.
  • Healthcare: Patient outcome prediction, disease diagnosis.
  • Marketing: Customer churn prediction, sales forecasting.
  • Environmental Science: Modeling environmental impacts on ecosystems.
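
A compact scikit-learn sketch contrasting Lasso (L1) and Ridge (L2) on synthetic data; the alpha values are arbitrary and would normally be tuned (e.g., by cross-validation):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Only the first two features matter; the other three are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print(lasso.coef_)  # L1: irrelevant coefficients driven to exactly zero (sparse)
print(ridge.coef_)  # L2: coefficients shrunk toward zero but rarely exactly zero
```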

Difference between Lasso and Ridge Regression:
  • Penalty term: Lasso adds the sum of absolute coefficient values (L1); Ridge adds the sum of squared coefficient values (L2).
  • Effect on coefficients: Lasso can shrink coefficients to exactly zero; Ridge shrinks them towards zero but rarely to exactly zero.
  • Feature selection: Lasso performs implicit feature selection; Ridge retains all features.
  • Typical use: Lasso suits sparse problems with few relevant features; Ridge suits correlated predictors (multicollinearity).

- Support Vector Machine (SVM):


Support Vector Machine (SVM) is a supervised machine learning algorithm that
finds the optimal hyperplane in an n-dimensional space to classify data into
different classes. It works by identifying the best separation boundary between
classes, maximizing the margin between the classes.

• Model Representation:
o Given a training dataset with input features and corresponding
class labels, SVM finds the hyperplane that separates the classes
with the largest margin.
o The hyperplane is defined by a set of support vectors, which are the
data points closest to the decision boundary.

• Key Concepts:
o Margin: The distance between the hyperplane and the nearest data
point from each class. SVM aims to maximize this margin, leading
to better generalization.
o Kernel Trick: SVM can handle non-linearly separable data by
mapping input features into a higher-dimensional space using
kernel functions (e.g., polynomial, radial basis function) to find a
linear separation boundary.
o Regularization Parameter (C): Controls the trade-off between
maximizing the margin and minimizing the classification error on
the training data. Higher values of C allow for fewer margin
violations but may lead to overfitting.
o Kernel Parameters: Parameters specific to the chosen kernel
function, such as the degree for polynomial kernels and the gamma
parameter for radial basis function (RBF) kernels.

• Types of SVM:
SVMs can be categorized by the type of decision boundary they form. The main types are:

1. Linear SVM:
a. Linear SVMs classify data by finding the optimal hyperplane that
linearly separates the classes in the feature space.
b. The decision boundary is a straight line (in 2D), or a hyperplane (in
higher dimensions) that maximizes the margin between the classes.
c. Linear SVMs are suitable for linearly separable datasets where
classes can be separated by a straight line or plane.

2. Non-linear SVM:
a. Non-linear SVMs are used for datasets that are not linearly
separable in the original feature space.
b. They employ kernel functions to map the input features into a
higher-dimensional space where the classes become separable by a
hyperplane.
c. Common kernel functions include polynomial kernel, radial basis
function (RBF) kernel, sigmoid kernel, and custom kernels tailored
to specific data characteristics.
d. Non-linear SVMs are capable of capturing complex decision
boundaries and can handle more intricate patterns in the data.

• Applications:
SVM is widely used in various fields, including:
o Text classification (e.g., spam detection, sentiment analysis).
o Image recognition (e.g., object detection, facial recognition).
o Bioinformatics (e.g., gene expression classification, protein
structure prediction).
o Finance (e.g., credit scoring, stock market prediction).
o Medical diagnosis (e.g., disease classification, cancer detection).
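
A brief scikit-learn sketch of a linear SVM and an RBF-kernel SVM on the Iris dataset (dataset and hyperparameters chosen only for illustration):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print(linear_svm.score(X_test, y_test))  # accuracy of the linear decision boundary
print(rbf_svm.score(X_test, y_test))     # accuracy using the kernel trick (RBF)
```

Increasing C penalizes margin violations more heavily, which fits the training data more closely at the risk of overfitting.
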
Module 4: Hebbian Learning and Expectation Maximization

- Hebbian learning rule:


The Hebbian learning rule is a concept in neuroscience and neural network
theory that describes a mechanism for synaptic plasticity, which is the ability of
synapses to strengthen or weaken over time based on their activity. Here's a
simplified explanation:
• Definition:
- The Hebbian learning rule states that "neurons that fire together,
wire together." It suggests that if two neurons are repeatedly
activated at the same time, the strength of the connection
(synaptic weight) between them should increase.
- Proposed by Donald Hebb in 1949, the rule provides a
foundational concept for understanding how learning and
memory formation occur in biological neural networks.
• Mechanism:
- When a presynaptic neuron repeatedly fires and causes the
postsynaptic neuron to fire, the connection between them
strengthens.
- If the presynaptic neuron consistently precedes the firing of the
postsynaptic neuron, the synaptic connection strengthens further.
- Conversely, if the presynaptic neuron consistently fails to cause
the postsynaptic neuron to fire, the connection weakens.
• Key Points:
- The Hebbian learning rule is based on the idea of correlation
between neuronal activity.
- It provides a mechanism for associative learning, where the co-activation of neurons leads to the formation of associations or memories.
- While the Hebbian learning rule offers a simple explanation for
synaptic plasticity, it doesn't account for all aspects of learning
and memory, and more complex rules have been proposed in
neuroscience and artificial neural network models.
• Applications:
- The Hebbian learning rule has inspired various computational
models of learning and memory in artificial neural networks.
- It forms the basis for unsupervised learning algorithms such as Hebbian-style weight updates and competitive learning, where networks self-organize based on input patterns (a minimal sketch follows).
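
A minimal sketch of the plain Hebbian update Δw = η·x·y for a single linear neuron; the input patterns, learning rate, and initialization are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=(100, 3))  # 100 presentations of a 3-dimensional input
w = rng.normal(scale=0.1, size=3)   # small random initial synaptic weights
eta = 0.01                          # learning rate

for x in inputs:
    y = np.dot(w, x)  # postsynaptic activity
    w += eta * y * x  # "fire together, wire together" update

# Note: the plain rule lets weights grow without bound; practical variants
# (e.g., Oja's rule) add a normalization term.
print(w)
```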

- Expectation maximization algorithm for clustering:


The Expectation Maximization (EM) algorithm for clustering is a powerful
method used to fit mixture models, particularly in unsupervised learning tasks
such as clustering. Here's a straightforward explanation:

1. Objective:
a. The EM algorithm for clustering aims to find the parameters of a
mixture model that best describe the underlying data distribution.
b. It iteratively estimates the parameters of the mixture model by
maximizing the likelihood of the observed data.

2. Model Representation:
a. The mixture model represents the data as a combination of multiple
probability distributions (e.g., Gaussian distributions) with
different parameters.
b. Each component of the mixture model represents a cluster in the
data.

3. Algorithm Steps:
a. Expectation (E) Step: In the E-step, the algorithm estimates the
probabilities of data points belonging to each cluster (i.e.,
computes the posterior probabilities or responsibilities).
b. Maximization (M) Step: In the M-step, the algorithm updates the
parameters of the mixture model (e.g., means and covariances of
Gaussian distributions) based on the estimated cluster assignments
obtained from the E-step.
4. Iterative Process:
a. The EM algorithm iterates between the E-step and M-step until
convergence, where the likelihood of the observed data stops
improving or reaches a predefined threshold.
b. Each iteration of the algorithm typically improves the fit of the
mixture model to the data, leading to better cluster assignments and
parameter estimates.

5. Initialization:
a. The performance of the EM algorithm can be sensitive to the initial
parameter values.
b. Common initialization strategies include random initialization, k-means clustering, or hierarchical clustering.

6. Applications:
a. The EM algorithm for clustering is widely used in various
domains, including image segmentation, document clustering, and
gene expression analysis.
b. It is particularly useful when the data contains hidden or latent
variables and when the underlying data distribution is complex and
cannot be easily modeled by a single probability distribution.
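
An illustrative scikit-learn sketch: GaussianMixture runs the E- and M-steps internally when fitting a Gaussian mixture to data (the two clusters below are synthetic):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic clusters drawn from different Gaussians
data = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
                  rng.normal(loc=5.0, scale=1.5, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

labels = gmm.predict(data)      # hard cluster assignments
resp = gmm.predict_proba(data)  # E-step responsibilities for each point
print(gmm.means_)               # M-step estimates of the component means
```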
