MLF Week 4 Notes by Manisha Pal
Linear Regression: Imagine you're trying to predict something, like how much a house might sell for based
on its size. You have some data about other houses—how big they are and how much they sold for. You
want to find a simple rule that connects the size of a house to its price. This rule could be a straight line on
a graph, which is what linear regression helps you find.
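As a quick illustration, here is a minimal sketch in Python that fits such a straight line; the house sizes and prices are made-up numbers used only for illustration.
```python
# Minimal sketch of linear regression: fit a line price = slope * size + intercept.
import numpy as np

sizes = np.array([50.0, 80.0, 100.0, 120.0, 150.0])     # square metres (hypothetical)
prices = np.array([150.0, 230.0, 290.0, 340.0, 430.0])  # in thousands (hypothetical)

# np.polyfit with degree 1 returns the slope and intercept of the best-fit line.
slope, intercept = np.polyfit(sizes, prices, 1)
print(f"price ≈ {slope:.2f} * size + {intercept:.2f}")

# Use the fitted line to predict the price of a new house.
new_size = 110.0
print("predicted price:", slope * new_size + intercept)
```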
Least Squares: Once you've drawn this line, you might notice that the line doesn't pass through all the
points exactly. Some points are above the line, and some are below. These differences are called "errors" or
"residuals."
1. Squared Errors: To measure how good the line is, we look at these errors. But instead of just adding
them up, we square each one (to make sure they are all positive and to give bigger errors more
weight).
2. Minimizing the Sum: We then add up all these squared errors. The goal is to draw the line in such a
way that this total is as small as possible. This method of finding the line is called "least squares"
because it minimizes the sum of the squared errors.
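A small sketch of this idea, reusing the same hypothetical data: compute the residuals for a candidate line, square them, and add them up. The least-squares line is the one with the smallest total.
```python
# Sketch: measure how good a candidate line is by its sum of squared errors (SSE).
import numpy as np

sizes = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
prices = np.array([150.0, 230.0, 290.0, 340.0, 430.0])

def sse(slope, intercept):
    predictions = slope * sizes + intercept   # points on the candidate line
    residuals = prices - predictions          # errors: observed minus predicted
    return np.sum(residuals ** 2)             # square each error, then add them up

# The least-squares line has a smaller SSE than an arbitrary guess.
best_slope, best_intercept = np.polyfit(sizes, prices, 1)
print("SSE of a rough guess :", sse(3.0, 0.0))
print("SSE of least squares :", sse(best_slope, best_intercept))
```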
Least Squares Solution
If A is the feature matrix, Theta is the vector of model parameters, and y is the vector of observed outcomes, the model is A·Theta ≈ y, and the least-squares solution is given by the normal equation Theta = (AᵀA)⁻¹ Aᵀ y (assuming AᵀA is invertible).
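A minimal sketch of this solution in Python, assuming the normal-equation form above and the same hypothetical house data; a column of ones is added to A so the fitted line can have an intercept.
```python
# Sketch of the least-squares solution theta = (A^T A)^{-1} A^T y.
import numpy as np

sizes = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
prices = np.array([150.0, 230.0, 290.0, 340.0, 430.0])

A = np.column_stack([sizes, np.ones_like(sizes)])   # feature matrix with intercept column
theta = np.linalg.inv(A.T @ A) @ A.T @ prices       # normal equation
print("theta (slope, intercept):", theta)

# In practice np.linalg.lstsq is preferred over forming the inverse explicitly,
# since it solves the same problem in a numerically more stable way.
theta_lstsq, *_ = np.linalg.lstsq(A, prices, rcond=None)
print("lstsq solution          :", theta_lstsq)
```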
Maximum Likelihood Estimation (MLE)
Maximum Likelihood Estimation (MLE) is a method used to estimate the parameters of a statistical model.
The idea is to find the parameter values that make the observed data most probable.
Example: Suppose you have a bag of coins, some are fair, and some are biased. You want to estimate the
probability of getting heads (let’s call this probability p) for each coin. You flip each coin multiple times and
observe the results.
MLE will help you find the probability p that maximizes the likelihood of obtaining the observed results. In
other words, MLE estimates p such that, given this probability, the observed number of heads and tails
would be most likely. It is like trying to figure out the best guess for a hidden thing based on what you see
happening.
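A minimal sketch of this for a single coin, with made-up flip results: the likelihood of h heads in n flips is p^h (1 - p)^(n - h), and a simple grid search over p recovers the familiar closed-form answer h / n.
```python
# Sketch of MLE for a coin: find the p that makes the observed flips most probable.
import numpy as np

flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # hypothetical results, 1 = heads
n, h = len(flips), int(flips.sum())

# Evaluate the log-likelihood h*log(p) + (n-h)*log(1-p) over a grid of candidate p.
p_grid = np.linspace(0.01, 0.99, 99)
log_likelihood = h * np.log(p_grid) + (n - h) * np.log(1 - p_grid)

p_mle = p_grid[np.argmax(log_likelihood)]
print("MLE from grid search :", p_mle)
print("Closed-form h / n    :", h / n)
```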
Connection Between Least Squares and MLE
The connection between Least Squares and Maximum Likelihood Estimation comes into play in the
context of linear regression.
1. Linear Regression: In linear regression, we fit a line to data points to predict values. The least
squares method is often used to determine the line that best fits the data.
2. MLE in Linear Regression: When we use MLE to estimate the parameters of a linear regression
model, it turns out that the estimates provided by MLE are the same as those provided by the least
squares method.
Why? In the case of linear regression with normally distributed errors (errors that follow a Gaussian
distribution), minimizing the sum of squared errors (least squares) is equivalent to maximizing the
likelihood function (MLE).
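A short sketch of why this holds, assuming the standard Gaussian-noise model y_i = Thetaᵀa_i + e_i with e_i ~ N(0, σ²): the log-likelihood of the data is
```latex
\log L(\theta)
  = \sum_{i=1}^{n} \log \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left(-\frac{(y_i - \theta^\top a_i)^2}{2\sigma^2}\right)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \theta^\top a_i\right)^2 .
```
The first term does not depend on Theta, so maximizing the log-likelihood over Theta is exactly the same as minimizing the sum of squared errors.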
Eigenvalues and Eigenvectors
An eigenvector of a matrix gives a direction that the matrix leaves unchanged: vectors along it are only stretched or shrunk, not rotated, and the corresponding eigenvalue tells by how much they are scaled.
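A minimal sketch with NumPy, using a small matrix chosen for illustration: for each eigenpair, multiplying by A only rescales the eigenvector by its eigenvalue.
```python
# Sketch: check that A @ v equals lambda * v for each eigenpair of a 2x2 matrix.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # eigenvectors are the columns

for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    print("A @ v      :", A @ v)
    print("lambda * v :", lam * v)   # same vector: direction kept, length scaled by lambda
```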
Diagonalization
Diagonalization is a process of simplifying a matrix by converting it into a diagonal matrix. This is useful
because working with diagonal matrices is often easier than working with the original matrix.
Why Do We Need Diagonalization?
Simplification: Diagonal matrices are much simpler to work with, especially when it comes to matrix powers
or solving matrix equations.
Why Diagonalization Works
• Reason: When a matrix is diagonalizable, its action (transformation) can be broken down into
simpler actions along the directions of its eigenvectors. This makes calculations involving the matrix
more straightforward.
Example: Suppose a matrix represents a linear transformation of rotating and stretching space.
Diagonalizing it means you can think of the transformation as stretching along certain directions without
any rotation, making it easier to understand and compute.
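A minimal sketch of this with a small illustrative matrix: diagonalize A as P D P⁻¹, then compute a matrix power through D, whose powers are just element-wise powers of the eigenvalues.
```python
# Sketch of diagonalization A = P D P^{-1}, and why it simplifies matrix powers.
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, P = np.linalg.eig(A)   # P holds the eigenvectors as columns
D = np.diag(eigenvalues)

# A^5 computed directly vs. via the diagonal form P D^5 P^{-1}.
direct = np.linalg.matrix_power(A, 5)
via_diag = P @ np.diag(eigenvalues ** 5) @ np.linalg.inv(P)
print(np.allclose(direct, via_diag))   # True: both give the same matrix
```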
Distinct Eigenvalues
Reason: Distinct eigenvalues guarantee that the corresponding eigenvectors are linearly independent, which is exactly the condition needed for diagonalization: an n×n matrix with n distinct eigenvalues has n independent eigenvectors and is therefore diagonalizable.
Orthogonal Diagonalization:
Spectral Theorem:
• For any real symmetric matrix A, the eigenvalues are real, eigenvectors corresponding to distinct eigenvalues are orthogonal, and the matrix is orthogonally diagonalizable.
In practical terms, the Spectral Theorem helps us break down complex matrix operations into simpler ones,
making it easier to understand and work with the underlying data or transformations the matrix represents.
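A minimal sketch of the theorem in action, on a small symmetric matrix chosen for illustration: the eigenvector matrix Q is orthogonal (Q⁻¹ = Qᵀ) and A factors as Q Λ Qᵀ.
```python
# Sketch of the Spectral Theorem on a real symmetric matrix.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])           # symmetric: A == A.T

eigenvalues, Q = np.linalg.eigh(A)   # eigh is intended for symmetric (Hermitian) matrices
Lambda = np.diag(eigenvalues)

print("Q is orthogonal   :", np.allclose(Q.T @ Q, np.eye(2)))
print("A == Q Lambda Q^T :", np.allclose(A, Q @ Lambda @ Q.T))
```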