Linear regression fits a line to a data set of observations to predict unobserved values. It works by minimizing the squared error between each data point and the line using least squares. R-squared measures how well the linear regression line fits the data, ranging from 0 to 1, with higher values indicating more of the variance is captured by the model.
Week 3 - 1-Linear Regression
Linear Regression
Fit a line to a data set of observations
Use this line to predict unobserved values
I don’t know why they call it “regression”. It’s really misleading. You can use it to predict points in the future, the past, whatever. Usually, time has nothing to do with it.

Linear Regression: How does it work?
Usually using “least squares”
Minimises the squared error between each point and the line
Remember the slope-intercept equation of a line? y = mx + b
The slope is the correlation between the two variables times the standard deviation in Y, all divided by the standard deviation in X.
Nice how the standard deviation has some real mathematical meaning?
The intercept is the mean of Y minus the slope times the mean of X
But Python will do all that for you.

Linear Regression: How does it work?
Least squares minimises the sum of the squared errors.
This is the same as maximizing the likelihood of the observed data (assuming normally distributed errors) if you start thinking of the probabilities and probability distribution functions
This is something called “maximum likelihood estimation”

More than one way to do it
Gradient descent is an alternative way to find the line, instead of solving least squares directly.
Basically iterates to find the line that best follows the contours defined by the data.
Can make sense when dealing with 3D data.
Easy to try in Python and just compare the results to least squares
But usually least squares is a perfectly good choice.

Measuring error with r-squared
How do we measure how well our line fits our data?
R-squared (aka coefficient of determination) measures:
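The fraction of the total variation in Y that is captured by the model

Computing r-squared
1.0 - (sum of squared errors) / (sum of squared variation from the mean)

Interpreting r-squared
Ranges from 0 to 1 for a least-squares line measured on the data it was fitted to
0 is bad (none of the variance is captured), 1 is good (all of the variance is captured).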
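
The slope, intercept and r-squared descriptions from these slides can be sketched by hand in plain Python. This is only an illustration, not the code used in the course: the function names (fit_line, r_squared, fit_line_gd), the small made-up data set, and the learning rate and step count for gradient descent are all assumptions chosen for the example.

```python
from statistics import fmean, pstdev


def fit_line(xs, ys):
    """Least-squares slope and intercept using the correlation formula."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    std_x, std_y = pstdev(xs), pstdev(ys)
    # Population covariance, then the correlation between X and Y.
    cov = fmean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])
    r = cov / (std_x * std_y)
    # Slope: correlation times std dev of Y, divided by std dev of X.
    slope = r * std_y / std_x
    # Intercept: mean of Y minus slope times mean of X.
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """1 - (sum of squared errors) / (sum of squared variation from the mean)."""
    mean_y = fmean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_line_gd(xs, ys, lr=0.01, steps=10000):
    """Gradient descent on the mean squared error (lr and steps are assumed values)."""
    n = len(xs)
    m, b = 0.0, 0.0
    for _ in range(steps):
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        grad_m = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b


# Hypothetical observations, roughly following y = 2x with a little noise.
demo_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
demo_y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

slope, intercept = fit_line(demo_x, demo_y)
print("least squares:   ", slope, intercept)
print("gradient descent:", *fit_line_gd(demo_x, demo_y))
print("r-squared:       ", r_squared(demo_x, demo_y, slope, intercept))
```

On a small, well-behaved data set like this one, the gradient descent result should agree closely with the least-squares result, which is the comparison the slides suggest trying. With a larger learning rate or X values on a much bigger scale, the gradient descent loop may fail to converge, so the defaults here should not be read as general-purpose settings.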