MLF Week 4 Notes by Manisha Pal

The document covers key concepts in linear regression, including least squares and maximum likelihood estimation (MLE), explaining how they relate to finding the best-fit line for data. It also introduces polynomial regression and ridge regression to address overfitting, along with the importance of eigenvalues and eigenvectors in matrix transformations and diagonalization. Finally, it discusses the spectral theorem for real symmetric matrices, emphasizing the significance of orthogonal diagonalization.


Week 4 Notes, by Manisha Pal

Linear Regression: Imagine you're trying to predict something, like how much a house might sell for based
on its size. You have some data about other houses—how big they are and how much they sold for. You
want to find a simple rule that connects the size of a house to its price. This rule could be a straight line on
a graph, which is what linear regression helps you find.
Least Squares: Once you've drawn this line, you might notice that the line doesn't pass through all the
points exactly. Some points are above the line, and some are below. These differences are called "errors" or
"residuals."
1. Squared Errors: To measure how good the line is, we look at these errors. But instead of just adding
them up, we square each one (to make sure they are all positive and to give bigger errors more
weight).
2. Minimizing the Sum: We then add up all these squared errors. The goal is to draw the line in such a
way that this total is as small as possible. This method of finding the line is called "least squares"
because it minimizes the sum of the squared errors.

Least Squares Solution

Writing the model as Aθ ≈ y, where A is the feature matrix, θ (theta) is the vector of parameters, and y is the vector of observed outcomes, the least-squares solution comes from the normal equations: θ = (A^T A)^(-1) A^T y, provided A^T A is invertible.
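As a concrete sketch (the numbers below are invented house-price data, not from these notes), here is how the least-squares θ can be computed with NumPy, both directly and via the normal equations:

```python
import numpy as np

# Invented toy data: a column of ones (intercept) plus house sizes, and prices.
A = np.array([[1.0, 1000.0],
              [1.0, 1500.0],
              [1.0, 2000.0],
              [1.0, 2500.0]])               # feature matrix A
y = np.array([200.0, 280.0, 370.0, 450.0])  # observed outcomes y

# Least-squares solution of A @ theta ≈ y.
theta = np.linalg.lstsq(A, y, rcond=None)[0]

# Same estimate via the normal equations: theta = (A^T A)^(-1) A^T y.
theta_normal = np.linalg.solve(A.T @ A, A.T @ y)

residuals = y - A @ theta
print(theta, theta_normal, np.sum(residuals ** 2))  # parameters and sum of squared errors
```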
Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE) is a method used to estimate the parameters of a statistical model.
The idea is to find the parameter values that make the observed data most probable.
Example: Suppose you have a bag of coins, some are fair, and some are biased. You want to estimate the
probability of getting heads (let’s call this probability p) for each coin. You flip each coin multiple times and
observe the results.
MLE will help you find the probability p that maximizes the likelihood of obtaining the observed results. In
other words, MLE estimates p such that, given this probability, the observed number of heads and tails
would be most likely. It is like trying to figure out the best guess for a hidden thing based on what you see
happening.
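As a small illustration of the coin example (the flip counts here are invented), the Bernoulli likelihood is maximized at p = heads / total flips, which a tiny numerical sketch can confirm:

```python
import numpy as np

heads, tails = 7, 3                    # hypothetical observed flips
p_grid = np.linspace(0.01, 0.99, 99)   # candidate values of p

# Log-likelihood of observing `heads` heads and `tails` tails for each candidate p.
log_lik = heads * np.log(p_grid) + tails * np.log(1 - p_grid)

p_mle = p_grid[np.argmax(log_lik)]     # numerical maximizer of the likelihood
print(p_mle, heads / (heads + tails))  # both are 0.7
```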
Connection Between Least Squares and MLE
The connection between Least Squares and Maximum Likelihood Estimation comes into play in the
context of linear regression.
1. Linear Regression: In linear regression, we fit a line to data points to predict values. The least
squares method is often used to determine the line that best fits the data.
2. MLE in Linear Regression: When we use MLE to estimate the parameters of a linear regression
model, it turns out that the estimates provided by MLE are the same as those provided by the least
squares method.
Why? In the case of linear regression with normally distributed errors (errors that follow a Gaussian
distribution), minimizing the sum of squared errors (least squares) is equivalent to maximizing the
likelihood function (MLE).
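To see the equivalence in one line of algebra (this sketch assumes the standard setup of independent Gaussian errors with a fixed variance σ^2, which the notes describe in words rather than symbols):

```latex
% Linear model with Gaussian noise: y = A\theta + \varepsilon, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\log L(\theta) \;=\; -\tfrac{n}{2}\log\!\bigl(2\pi\sigma^2\bigr) \;-\; \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - (A\theta)_i\bigr)^2
% Only the second term depends on \theta, so maximizing \log L(\theta)
% is the same as minimizing \sum_i (y_i - (A\theta)_i)^2, the least-squares objective.
```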

Connecting Projections to Linear Regression


In linear regression, we are trying to find the best-fit line through our data points. This line is a model that
helps us make predictions.
When we talk about projections in the context of linear regression, we are referring to projecting our data
onto the "column space" of a matrix (which represents the line or plane in our data).
Breaking it Down:
1. Matrix A and Vector Y:
o Think of matrix A as a set of instructions that tell us how to connect our data to the line
we’re trying to draw (like the line of best fit).
o Vector Y is our actual data points (like the prices of houses we measured).
2. Projection of Y onto A:
o We want to find the closest point on this line (or plane) to each of our data points Y. This
"closest point" is the projection.
o The line we draw (using linear regression) is based on this projection. It’s the best
approximation of our data within the limits of the line we’re using (a small numerical sketch follows below).
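A minimal NumPy sketch of that projection idea (the matrix A and vector Y are invented examples): the least-squares prediction A @ theta is exactly the projection of Y onto the column space of A.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])     # columns span the "line/plane" we project onto
Y = np.array([1.0, 2.0, 2.0])  # actual data points

# Projection matrix onto the column space of A: P = A (A^T A)^(-1) A^T.
P = A @ np.linalg.inv(A.T @ A) @ A.T
Y_hat = P @ Y                  # projection of Y = best approximation within col(A)

theta = np.linalg.lstsq(A, Y, rcond=None)[0]
print(np.allclose(Y_hat, A @ theta))  # True: the regression fit equals the projection
```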
Introduction to Polynomial Regression
What is Polynomial Regression?
Sometimes, data points don’t fit well with a straight line. In these cases, you might need a curve instead of
a line. Polynomial regression extends linear regression to fit curves, not just lines.
Polynomial Regression involves fitting a polynomial (a mathematical expression involving powers of
variables) to your data. This means instead of a straight line, you could use a curve like a parabola (U-
shaped curve) or even more complex curves.
Fitting a Polynomial:
1. Transform Features:
o Start with x.
o Create polynomial features: x (the original) and x^2 (the squared term).
2. Form Polynomial Regression Model:
o Your model will now include both x and x^2 terms.
o The equation might look like this: y = a0 + a1x + a2x^2, where a0, a1, and a2 are coefficients that
the model will find.
3. Fit the Curve:
o Use linear regression techniques to determine the best values for a0, a1, and a2 that
minimize the error between your actual data points and the points predicted by the
polynomial equation (a short sketch follows this list).
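A minimal NumPy sketch of these three steps, using invented data points with a curved relationship:

```python
import numpy as np

# Hypothetical data that does not fit a straight line well.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.8, 7.2, 13.5, 22.3])

# Step 1: transform features -> columns [1, x, x^2].
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Steps 2-3: fit y ≈ a0 + a1*x + a2*x^2 by ordinary least squares.
a0, a1, a2 = np.linalg.lstsq(X, y, rcond=None)[0]

y_pred = a0 + a1 * x + a2 * x ** 2
print(a0, a1, a2, np.sum((y - y_pred) ** 2))  # coefficients and squared error
```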

Regularization and Ridge Regression


Ridge Regression is a method used in machine learning and statistics to deal with a problem called
overfitting. Overfitting happens when a model is too complex and fits the training data too closely, which
makes it perform poorly on new, unseen data. Ridge regression helps to make models more general and
less prone to overfitting by adding a penalty to large coefficients.
Regularization is like putting a constraint or limit on the model to keep it from becoming too complex.
Think of it like a speed limit for driving. Just as a speed limit prevents you from going too fast, regularization
prevents your model from becoming too complex.
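Written out (using the A, θ, y notation from the least-squares section; λ is an assumed knob called the regularization strength, not something named in these notes), ridge regression adds a penalty on the size of the coefficients:

```latex
% Ridge objective: squared prediction error plus a penalty on large coefficients.
\hat{\theta}_{\text{ridge}} \;=\; \arg\min_{\theta}\; \lVert y - A\theta \rVert^{2} \;+\; \lambda \lVert \theta \rVert^{2}
% Closed-form solution; the extra \lambda I term shrinks the coefficients toward zero.
\hat{\theta}_{\text{ridge}} \;=\; \bigl(A^{T}A + \lambda I\bigr)^{-1} A^{T} y
```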
Simple Linear Regression vs. Ridge Regression
Scenario: Suppose you are predicting house prices based on features like size (in square feet) and number
of bedrooms.
• Without Regularization: If you use a simple linear regression, you might end up with very large
coefficients for some features if the data has high variance. This could cause the model to fit the
noise in the data rather than the actual trend.
• With Ridge Regression: When you apply ridge regression, it will add a penalty to the size of these
coefficients. For example, if the coefficient for the number of bedrooms becomes very large, ridge
regression will reduce its size to balance the model and prevent overfitting.
Polynomial Regression with Ridge Regularization
Scenario: You are fitting a polynomial curve to data points that show a complex relationship. You use
polynomial regression to fit a curve, but the model might overfit by creating a very wiggly curve that fits
the training data too closely.
• Without Regularization: The polynomial might be very complex, with high-degree terms having
very large coefficients.
• With Ridge Regression: The ridge penalty keeps the coefficients of the polynomial terms smaller,
leading to a smoother curve that captures the overall trend without fitting every tiny fluctuation in
the training data (see the sketch below).
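A scikit-learn sketch of that comparison (the data, the degree-9 polynomial, and alpha=1.0 are invented example choices; alpha is scikit-learn's name for the ridge penalty strength):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy data with an underlying smooth trend.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.2, 20)

# High-degree polynomial features invite overfitting.
X_poly = PolynomialFeatures(degree=9, include_bias=False).fit_transform(x)

plain = LinearRegression().fit(X_poly, y)  # no regularization
ridge = Ridge(alpha=1.0).fit(X_poly, y)    # penalized coefficients

# Ridge coefficients stay much smaller, giving a smoother fitted curve.
print(np.abs(plain.coef_).max(), np.abs(ridge.coef_).max())
```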

What Are Eigenvalues and Eigenvectors?


Matrix as a Transformation
Imagine you have a rubber sheet that you can stretch, rotate, or compress. A matrix is like a set of
instructions for how to stretch, rotate, or compress this sheet.
• Matrix: Just like you can use different tools to transform the rubber sheet, you use a matrix to
transform vectors (arrows or points) in space. For instance, a matrix might stretch the rubber sheet
more in one direction than in another.
Eigenvalues and Eigenvectors
• Eigenvalue: Think of an eigenvalue as a “scaling factor.” It tells you how much the matrix stretches or
shrinks the eigenvector.
• Eigenvector: Imagine an eigenvector as a special direction on the rubber sheet. When you apply the
matrix (your transformation instructions) to this direction, it only stretches or shrinks the direction
but doesn’t change it.
Why Do We Need Eigenvalues and Eigenvectors?
Understanding Matrix Transformations
Let’s use a simple example to see why eigenvalues and eigenvectors are useful:
Example: Stretching a Rubber Sheet
1. Transforming Directions: Suppose you have a rubber sheet stretched out in a certain way, and you
have a particular direction on that sheet (let’s call it “direction A”). When you apply a stretching
matrix to this sheet, direction A might get longer or shorter, but it still lies along the same line; its
direction does not change. This direction A is an eigenvector.
2. Scaling Factor: The amount by which direction A stretches or shrinks is the eigenvalue. If the
eigenvalue is 2, it means direction A stretches to twice its length. If the eigenvalue is 0.5, it shrinks
to half its length.

In short: the direction that stays fixed under the transformation is the eigenvector, and how much that direction gets stretched or shrunk is the eigenvalue.
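A small NumPy sketch (the matrix is an invented example) showing that applying the matrix to an eigenvector only rescales it by the eigenvalue:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.5]])  # stretches the x-direction by 2, shrinks y by 0.5

eigenvalues, eigenvectors = np.linalg.eig(A)

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]  # an eigenvector (a "special direction")
    # A @ v points along v, just scaled by the eigenvalue.
    print(eigenvalues[i], np.allclose(A @ v, eigenvalues[i] * v))  # ... True
```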
Diagonalization
Diagonalization is a process of simplifying a matrix by converting it into a diagonal matrix. This is useful
because working with diagonal matrices is often easier than working with the original matrix.
Why Do We Need Diagonalization?
Simplification: Diagonal matrices are much simpler to work with, especially when it comes to matrix powers
or solving matrix equations.
Why Diagonalization Works
• Reason: When a matrix is diagonalizable, its action (transformation) can be broken down into
simpler actions along the directions of its eigenvectors. This makes calculations involving the matrix
more straightforward.
Example: Suppose a matrix represents a linear transformation of rotating and stretching space.
Diagonalizing it means you can think of the transformation as stretching along certain directions without
any rotation, making it easier to understand and compute.
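A short sketch of diagonalization in NumPy (the matrix is an invented example): with S holding the eigenvectors as columns and D the diagonal matrix of eigenvalues, A = S D S^(-1), which makes powers of A easy to compute.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, S = np.linalg.eig(A)  # columns of S are eigenvectors
D = np.diag(eigenvalues)           # diagonal matrix of eigenvalues

# A = S D S^(-1), so A^5 = S D^5 S^(-1): only the diagonal entries get powered.
A_rebuilt = S @ D @ np.linalg.inv(S)
A_pow5 = S @ np.diag(eigenvalues ** 5) @ np.linalg.inv(S)

print(np.allclose(A, A_rebuilt),
      np.allclose(A_pow5, np.linalg.matrix_power(A, 5)))  # True True
```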

3. Linear Independence of Eigenvectors


Key Idea: For a matrix to be diagonalizable, you need enough independent directions (eigenvectors) to
describe the space. These eigenvectors should be linearly independent, meaning no eigenvector can be
written as a combination of the others.
Example: Imagine you have three directions (eigenvectors) in space, and each one points in a unique
direction. These are independent directions. If you have a matrix with exactly three such independent
directions, you can simplify it into a diagonal matrix.
Why Linear Independence is Crucial:
• Reason: If you have n linearly independent eigenvectors for an n×n matrix, you can create a matrix S
with these eigenvectors as columns. If S is invertible (meaning it has an inverse), you can diagonalize
the matrix.

Distinct Eigenvalues

Claim: If a matrix has all distinct eigenvalues, it is diagonalizable.

Reason: Distinct eigenvalues guarantee that their corresponding eigenvectors are linearly independent, so an n×n matrix with n distinct eigenvalues has n independent eigenvectors and can be diagonalized.

Orthogonal Diagonalization:

For orthogonal diagonalization we want A = Q D Q^T, where Q is an orthogonal matrix (Q^T Q = I) whose columns are orthonormal eigenvectors of A, and D is a diagonal matrix of the eigenvalues.

Real Symmetric Matrices:


• A matrix is symmetric if A = A^T
• For real symmetric matrices, the spectral theorem guarantees diagonalizability and ensures the
eigenvalues are real. The matrix can be orthogonally diagonalized

Eigenvectors and Eigenvalues for orthogonal diagonalization:


• Eigenvectors corresponding to different eigenvalues of a real symmetric matrix are orthogonal.
• Eigenvectors can be normalized to form an orthonormal basis.
• The matrix formed by placing these orthonormal eigenvectors as columns is the orthogonal matrix
Q

Spectral Theorem:
• For any real symmetric matrix A, the eigenvalues are real, eigenvectors corresponding to distinct
eigenvalues are orthogonal, and the matrix is orthogonally diagonalizable.
In practical terms, the Spectral Theorem helps us break down complex matrix operations into simpler ones,
making it easier to understand and work with the underlying data or transformations the matrix represents.
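A minimal NumPy sketch of orthogonal diagonalization for an invented real symmetric matrix, using np.linalg.eigh (which is designed for symmetric/Hermitian matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # real symmetric: A == A.T

eigenvalues, Q = np.linalg.eigh(A)  # real eigenvalues, orthonormal eigenvectors
D = np.diag(eigenvalues)

print(np.allclose(Q.T @ Q, np.eye(2)))  # Q is orthogonal: Q^T Q = I
print(np.allclose(A, Q @ D @ Q.T))      # A = Q D Q^T, as the spectral theorem states
```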
