MLF Week 4 Notes by Manisha Pal

The document covers key concepts in linear regression, including least squares and maximum likelihood estimation (MLE), explaining how they relate to finding the best-fit line for data. It also introduces polynomial regression and ridge regression to address overfitting, along with the importance of eigenvalues and eigenvectors in matrix transformations and diagonalization. Finally, it discusses the spectral theorem for real symmetric matrices, emphasizing the significance of orthogonal diagonalization.


Week 4 Notes, by Manisha Pal

Linear Regression: Imagine you're trying to predict something, like how much a house might sell for based
on its size. You have some data about other houses—how big they are and how much they sold for. You
want to find a simple rule that connects the size of a house to its price. This rule could be a straight line on
a graph, which is what linear regression helps you find.
Least Squares: Once you've drawn this line, you might notice that the line doesn't pass through all the
points exactly. Some points are above the line, and some are below. These differences are called "errors" or
"residuals."
1. Squared Errors: To measure how good the line is, we look at these errors. But instead of just adding
them up, we square each one (to make sure they are all positive and to give bigger errors more
weight).
2. Minimizing the Sum: We then add up all these squared errors. The goal is to draw the line in such a
way that this total is as small as possible. This method of finding the line is called "least squares"
because it minimizes the sum of the squared errors.

Least Squares Solution

Writing the model as Aθ ≈ y, where A is the feature matrix, θ (theta) is the vector of parameters, and y is the vector of observed outcomes, the least-squares solution comes from the normal equations: θ = (A^T A)^(-1) A^T y, provided A^T A is invertible.
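As a concrete sketch (the numbers below are invented house-price data, not from these notes), here is how the least-squares θ can be computed with NumPy, both directly and via the normal equations:

```python
import numpy as np

# Invented toy data: a column of ones (intercept) plus house sizes, and prices.
A = np.array([[1.0, 1000.0],
              [1.0, 1500.0],
              [1.0, 2000.0],
              [1.0, 2500.0]])               # feature matrix A
y = np.array([200.0, 280.0, 370.0, 450.0])  # observed outcomes y

# Least-squares solution of A @ theta ≈ y.
theta = np.linalg.lstsq(A, y, rcond=None)[0]

# Same estimate via the normal equations: theta = (A^T A)^(-1) A^T y.
theta_normal = np.linalg.solve(A.T @ A, A.T @ y)

residuals = y - A @ theta
print(theta, theta_normal, np.sum(residuals ** 2))  # parameters and sum of squared errors
```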
Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE) is a method used to estimate the parameters of a statistical model.
The idea is to find the parameter values that make the observed data most probable.
Example: Suppose you have a bag of coins, some are fair, and some are biased. You want to estimate the
probability of getting heads (let’s call this probability p) for each coin. You flip each coin multiple times and
observe the results.
MLE will help you find the probability p that maximizes the likelihood of obtaining the observed results. In
other words, MLE estimates p such that, given this probability, the observed number of heads and tails
would be most likely. It is like trying to figure out the best guess for a hidden thing based on what you see
happening.
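As a small illustration of the coin example (the flip counts here are invented), the Bernoulli likelihood is maximized at p = heads / total flips, which a tiny numerical sketch can confirm:

```python
import numpy as np

heads, tails = 7, 3                    # hypothetical observed flips
p_grid = np.linspace(0.01, 0.99, 99)   # candidate values of p

# Log-likelihood of observing `heads` heads and `tails` tails for each candidate p.
log_lik = heads * np.log(p_grid) + tails * np.log(1 - p_grid)

p_mle = p_grid[np.argmax(log_lik)]     # numerical maximizer of the likelihood
print(p_mle, heads / (heads + tails))  # both are 0.7
```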
Connection Between Least Squares and MLE
The connection between Least Squares and Maximum Likelihood Estimation comes into play in the
context of linear regression.
1. Linear Regression: In linear regression, we fit a line to data points to predict values. The least
squares method is often used to determine the line that best fits the data.
2. MLE in Linear Regression: When we use MLE to estimate the parameters of a linear regression
model, it turns out that the estimates provided by MLE are the same as those provided by the least
squares method.
Why? In the case of linear regression with normally distributed errors (errors that follow a Gaussian
distribution), minimizing the sum of squared errors (least squares) is equivalent to maximizing the
likelihood function (MLE).
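To see the equivalence in one line of algebra (this sketch assumes the standard setup of independent Gaussian errors with a fixed variance σ^2, which the notes describe in words rather than symbols):

```latex
% Linear model with Gaussian noise: y = A\theta + \varepsilon, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\log L(\theta) \;=\; -\tfrac{n}{2}\log\!\bigl(2\pi\sigma^2\bigr) \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - (A\theta)_i\bigr)^2
% Only the second term depends on \theta, so maximizing \log L(\theta)
% is the same as minimizing \sum_i (y_i - (A\theta)_i)^2, the least-squares objective.
```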

Connecting Projections to Linear Regression


In linear regression, we are trying to find the best-fit line through our data points. This line is a model that
helps us make predictions.
When we talk about projections in the context of linear regression, we are referring to projecting our data
onto the "column space" of a matrix (which represents the line or plane in our data).
Breaking it Down:
1. Matrix A and Vector Y:
o Think of matrix A as a set of instructions that tell us how to connect our data to the line
we’re trying to draw (like the line of best fit).
o Vector Y is our actual data points (like the prices of houses we measured).
2. Projection of Y onto A:
o We want to find the closest point on this line (or plane) to each of our data points Y. This
"closest point" is the projection.
o The line we draw (using linear regression) is based on this projection. It’s the best
approximation of our data within the limits of the line we’re using (a small numerical sketch follows below).
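A minimal NumPy sketch of that projection idea (the matrix A and vector Y are invented examples): the least-squares prediction A @ theta is exactly the projection of Y onto the column space of A.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])     # columns span the "line/plane" we project onto
Y = np.array([1.0, 2.0, 2.0])  # actual data points

# Projection matrix onto the column space of A: P = A (A^T A)^(-1) A^T.
P = A @ np.linalg.inv(A.T @ A) @ A.T
Y_hat = P @ Y                  # projection of Y = best approximation within col(A)

theta = np.linalg.lstsq(A, Y, rcond=None)[0]
print(np.allclose(Y_hat, A @ theta))  # True: the regression fit equals the projection
```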
Introduction to Polynomial Regression
What is Polynomial Regression?
Sometimes, data points don’t fit well with a straight line. In these cases, you might need a curve instead of
a line. Polynomial regression extends linear regression to fit curves, not just lines.
Polynomial Regression involves fitting a polynomial (a mathematical expression involving powers of
variables) to your data. This means instead of a straight line, you could use a curve like a parabola (U-
shaped curve) or even more complex curves.
Fitting a Polynomial:
1. Transform Features:
o Start with x.
o Create polynomial features: x (the original) and x^2 (the squared term).
2. Form Polynomial Regression Model:
o Your model will now include both x and x^2 terms.
o The equation might look like this: y = a0 + a1x + a2x^2, where a0, a1, and a2 are coefficients that
the model will find.
3. Fit the Curve:
o Use linear regression techniques to determine the best values for a0, a1, and a2 that
minimize the error between your actual data points and the points predicted by the
polynomial equation (a short sketch follows this list).
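A minimal NumPy sketch of these three steps, using invented data points with a curved relationship:

```python
import numpy as np

# Hypothetical data that does not fit a straight line well.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.8, 7.2, 13.5, 22.3])

# Step 1: transform features -> columns [1, x, x^2].
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Steps 2-3: fit y ≈ a0 + a1*x + a2*x^2 by ordinary least squares.
a0, a1, a2 = np.linalg.lstsq(X, y, rcond=None)[0]

y_pred = a0 + a1 * x + a2 * x ** 2
print(a0, a1, a2, np.sum((y - y_pred) ** 2))  # coefficients and squared error
```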

Regularization and Ridge Regression


Ridge Regression is a method used in machine learning and statistics to deal with a problem called
overfitting. Overfitting happens when a model is too complex and fits the training data too closely, which
makes it perform poorly on new, unseen data. Ridge regression helps to make models more general and
less prone to overfitting by adding a penalty to large coefficients.
Regularization is like putting a constraint or limit on the model to keep it from becoming too complex.
Think of it like a speed limit for driving. Just as a speed limit prevents you from going too fast, regularization
prevents your model from becoming too complex.
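Written out (using the A, θ, y notation from the least-squares section; λ is an assumed knob called the regularization strength, not something named in these notes), ridge regression adds a penalty on the size of the coefficients:

```latex
% Ridge objective: squared prediction error plus a penalty on large coefficients.
\hat{\theta}_{\text{ridge}} \;=\; \arg\min_{\theta}\; \lVert y - A\theta \rVert^{2} \;+\; \lambda \lVert \theta \rVert^{2}
% Closed-form solution; the extra \lambda I term shrinks the coefficients toward zero.
\hat{\theta}_{\text{ridge}} \;=\; \bigl(A^{T}A + \lambda I\bigr)^{-1} A^{T} y
```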
Simple Linear Regression vs. Ridge Regression
Scenario: Suppose you are predicting house prices based on features like size (in square feet) and number
of bedrooms.
• Without Regularization: If you use a simple linear regression, you might end up with very large
coefficients for some features if the data has high variance. This could cause the model to fit the
noise in the data rather than the actual trend.
• With Ridge Regression: When you apply ridge regression, it will add a penalty to the size of these
coefficients. For example, if the coefficient for the number of bedrooms becomes very large, ridge
regression will reduce its size to balance the model and prevent overfitting.
Polynomial Regression with Ridge Regularization
Scenario: You are fitting a polynomial curve to data points that show a complex relationship. You use
polynomial regression to fit a curve, but the model might overfit by creating a very wiggly curve that fits
the training data too closely.
• Without Regularization: The polynomial might be very complex, with high-degree terms having
very large coefficients.
• With Ridge Regression: The ridge penalty keeps the coefficients of the polynomial terms smaller,
leading to a smoother curve that captures the overall trend without fitting every tiny fluctuation in
the training data (see the sketch below).
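A scikit-learn sketch of that comparison (the data, the degree-9 polynomial, and alpha=1.0 are invented example choices; alpha is scikit-learn's name for the ridge penalty strength):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy data with an underlying smooth trend.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.2, 20)

# High-degree polynomial features invite overfitting.
X_poly = PolynomialFeatures(degree=9, include_bias=False).fit_transform(x)

plain = LinearRegression().fit(X_poly, y)  # no regularization
ridge = Ridge(alpha=1.0).fit(X_poly, y)    # penalized coefficients

# Ridge coefficients stay much smaller, giving a smoother fitted curve.
print(np.abs(plain.coef_).max(), np.abs(ridge.coef_).max())
```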

What Are Eigenvalues and Eigenvectors?


Matrix as a Transformation
Imagine you have a rubber sheet that you can stretch, rotate, or compress. A matrix is like a set of
instructions for how to stretch, rotate, or compress this sheet.
• Matrix: Just like you can use different tools to transform the rubber sheet, you use a matrix to
transform vectors (arrows or points) in space. For instance, a matrix might stretch the rubber sheet
more in one direction than in another.
Eigenvalues and Eigenvectors
• Eigenvalue: Think of an eigenvalue as a “scaling factor.” It tells you how much the matrix stretches or
shrinks the eigenvector.
• Eigenvector: Imagine an eigenvector as a special direction on the rubber sheet. When you apply the
matrix (your transformation instructions) to this direction, it only stretches or shrinks the direction
but doesn’t change it.
Why Do We Need Eigenvalues and Eigenvectors?
Understanding Matrix Transformations
Let’s use a simple example to see why eigenvalues and eigenvectors are useful:
Example: Stretching a Rubber Sheet
1. Transforming Directions: Suppose you have a rubber sheet stretched out in a certain way, and you
have a particular direction on that sheet (let’s call it “direction A”). When you apply a stretching
matrix to this sheet, direction A might get longer or shorter, but it still lies along the same line; its
direction does not change. This direction A is an eigenvector.
2. Scaling Factor: The amount by which direction A stretches or shrinks is the eigenvalue. If the
eigenvalue is 2, it means direction A stretches to twice its length. If the eigenvalue is 0.5, it shrinks
to half its length.

In short: the direction that stays fixed under the transformation is the eigenvector, and how much that direction gets stretched or shrunk is the eigenvalue.
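A small NumPy sketch (the matrix is an invented example) showing that applying the matrix to an eigenvector only rescales it by the eigenvalue:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.5]])  # stretches the x-direction by 2, shrinks y by 0.5

eigenvalues, eigenvectors = np.linalg.eig(A)

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]  # an eigenvector (a "special direction")
    # A @ v points along v, just scaled by the eigenvalue.
    print(eigenvalues[i], np.allclose(A @ v, eigenvalues[i] * v))  # ... True
```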
Diagonalization
Diagonalization is a process of simplifying a matrix by converting it into a diagonal matrix. This is useful
because working with diagonal matrices is often easier than working with the original matrix.
Why Do We Need Diagonalization?
Simplification: Diagonal matrices are much simpler to work with, especially when it comes to matrix powers
or solving matrix equations.
Why Diagonalization Works
• Reason: When a matrix is diagonalizable, its action (transformation) can be broken down into
simpler actions along the directions of its eigenvectors. This makes calculations involving the matrix
more straightforward.
Example: Suppose a matrix represents a linear transformation of rotating and stretching space.
Diagonalizing it means you can think of the transformation as stretching along certain directions without
any rotation, making it easier to understand and compute.
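A short sketch of diagonalization in NumPy (the matrix is an invented example): with S holding the eigenvectors as columns and D the diagonal matrix of eigenvalues, A = S D S^(-1), which makes powers of A easy to compute.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, S = np.linalg.eig(A)  # columns of S are eigenvectors
D = np.diag(eigenvalues)           # diagonal matrix of eigenvalues

# A = S D S^(-1), so A^5 = S D^5 S^(-1): only the diagonal entries get powered.
A_rebuilt = S @ D @ np.linalg.inv(S)
A_pow5 = S @ np.diag(eigenvalues ** 5) @ np.linalg.inv(S)

print(np.allclose(A, A_rebuilt),
      np.allclose(A_pow5, np.linalg.matrix_power(A, 5)))  # True True
```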

3. Linear Independence of Eigenvectors


Key Idea: For a matrix to be diagonalizable, you need enough independent directions (eigenvectors) to
describe the space. These eigenvectors should be linearly independent, meaning no eigenvector can be
written as a combination of the others.
Example: Imagine you have three directions (eigenvectors) in space, and each one points in a unique
direction. These are independent directions. If you have a matrix with exactly three such independent
directions, you can simplify it into a diagonal matrix.
Why Linear Independence is Crucial:
• Reason: If you have n linearly independent eigenvectors for an n×n matrix, you can create a matrix S
with these eigenvectors as columns. If S is invertible (meaning it has an inverse), you can diagonalize
the matrix.

Distinct Eigenvalues

Claim: If a matrix has all distinct eigenvalues, it is diagonalizable.

Reason: Distinct eigenvalues guarantee that their corresponding eigenvectors are linearly independent, so an n×n matrix with n distinct eigenvalues has n independent eigenvectors and can be diagonalized.

Orthogonal Diagonalization:

For orthogonal diagonalization we want A = Q D Q^T, where Q is an orthogonal matrix (Q^T Q = I) whose columns are orthonormal eigenvectors of A, and D is a diagonal matrix of the eigenvalues.

Real Symmetric Matrices:


• A matrix is symmetric if A = A^T
• For real symmetric matrices, the spectral theorem guarantees diagonalizability and ensures the
eigenvalues are real. The matrix can be orthogonally diagonalized

Eigenvectors and Eigenvalues for orthogonal diagonalization:


• Eigenvectors corresponding to different eigenvalues of a real symmetric matrix are orthogonal.
• Eigenvectors can be normalized to form an orthonormal basis.
• The matrix formed by placing these orthonormal eigenvectors as columns is the orthogonal matrix
Q

Spectral Theorem:
• For any real symmetric matrix A, the eigenvalues are real, eigenvectors corresponding to distinct
eigenvalues are orthogonal, and the matrix is orthogonally diagonalizable.
In practical terms, the Spectral Theorem helps us break down complex matrix operations into simpler ones,
making it easier to understand and work with the underlying data or transformations the matrix represents.
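A minimal NumPy sketch of orthogonal diagonalization for an invented real symmetric matrix, using np.linalg.eigh (which is designed for symmetric/Hermitian matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # real symmetric: A == A.T

eigenvalues, Q = np.linalg.eigh(A)  # real eigenvalues, orthonormal eigenvectors
D = np.diag(eigenvalues)

print(np.allclose(Q.T @ Q, np.eye(2)))  # Q is orthogonal: Q^T Q = I
print(np.allclose(A, Q @ D @ Q.T))      # A = Q D Q^T, as the spectral theorem states
```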
