Linear_Regression_Deviation_Example

Understanding Deviation in Linear Regression

In linear regression, understanding the different types of variation is crucial for interpreting the model's performance. Let's break down the three key types of variation: total deviation, explained deviation, and unexplained deviation.

1. Total Variation (Total Deviation):

Total variation measures how much the actual data points (the observed values) differ from the overall average value of the dependent variable (Y).

Mathematically, for each data point, the total deviation is calculated as the difference between the observed value (yi) and the mean of all observed values (mean y).

Total Deviation = yi - mean y

2. Explained Variation (Explained Deviation):

Explained variation measures how much of the total variation is explained by the regression line. This represents the part of the difference between the actual value (yi) and the mean (mean y) that can be explained by the relationship between the independent variables and the dependent variable.

Mathematically, it is the difference between the predicted value (predicted y) from the regression line and the mean (mean y).

Explained Deviation = predicted y - mean y


3. Unexplained Variation (Unexplained Deviation):

Unexplained variation measures the portion of the total variation that the regression line does not explain. This leftover part is also called the residual.

Mathematically, it is the difference between the actual value (yi) and the predicted value (predicted y).

Unexplained Deviation = yi - predicted y

Putting it all together:

- The total deviation tells you how far the actual point is from the average of all points.

- The explained deviation tells you how much of that distance is accounted for by the regression line (how far the predicted value on the line is from the average).

- The unexplained deviation is what's left over, i.e., how much the point deviates from what the regression line predicts.

In other words:

Total Deviation = Explained Deviation + Unexplained Deviation
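The identity follows directly from the three definitions, because the predicted value cancels out. As a short sketch, using the notation \bar{y} for mean y and \hat{y}_i for predicted y (this symbolic notation is an assumption made here for compactness and does not appear in the original):

```latex
\begin{align*}
\underbrace{(\hat{y}_i - \bar{y})}_{\text{explained}}
  + \underbrace{(y_i - \hat{y}_i)}_{\text{unexplained}}
  &= \hat{y}_i - \bar{y} + y_i - \hat{y}_i \\
  &= y_i - \bar{y} \qquad \text{(total)}
\end{align*}
```

Since this is plain algebra, it holds for every individual data point, whatever predicted value the line gives for it.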

Example:

Suppose you have data on the number of hours students studied for a test (X) and their corresponding test scores (Y). Let's look at three students' data:

| Hours Studied (X) | Actual Test Score (Y) | Predicted Test Score (predicted y) |
|-------------------|-----------------------|------------------------------------|
| 2                 | 65                    | 70                                 |
| 4                 | 75                    | 80                                 |
| 6                 | 85                    | 90                                 |

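In this example, the mean of the actual test scores is (65 + 75 + 85) / 3 = 75, so:

- For Student 1, the total deviation is 65 - 75 = -10, the explained deviation is 70 - 75 = -5, and the unexplained deviation is 65 - 70 = -5.

- For Student 2, the total deviation is 75 - 75 = 0, the explained deviation is 80 - 75 = 5, and the unexplained deviation is 75 - 80 = -5.

- For Student 3, the total deviation is 85 - 75 = 10, the explained deviation is 90 - 75 = 15, and the unexplained deviation is 85 - 90 = -5.

In each case the explained and unexplained deviations add up to the total deviation (for Student 3, 15 + (-5) = 10).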
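The arithmetic for the three students can be reproduced with a few lines of Python. This is a minimal sketch: the function name `deviations` and the variable names are illustrative choices, and the predicted scores are simply copied from the table rather than computed from a fitted line.

```python
# Actual and predicted test scores copied from the example table.
# The predicted values are taken as given; no regression is fitted here.
actual = [65, 75, 85]
predicted = [70, 80, 90]


def deviations(actual, predicted):
    """Return a (total, explained, unexplained) tuple for each data point."""
    mean_y = sum(actual) / len(actual)  # mean of the observed values
    rows = []
    for y, y_hat in zip(actual, predicted):
        total = y - mean_y          # observed value minus mean
        explained = y_hat - mean_y  # predicted value minus mean
        unexplained = y - y_hat     # observed value minus predicted value
        rows.append((total, explained, unexplained))
    return rows


rows = deviations(actual, predicted)
for student, (total, explained, unexplained) in enumerate(rows, start=1):
    print(
        f"Student {student}: total={total:+.0f}, "
        f"explained={explained:+.0f}, unexplained={unexplained:+.0f}"
    )
```

Each tuple in `rows` satisfies total = explained + unexplained, which is a quick way to check a hand calculation.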

This breakdown helps us understand how much of the variation in test scores can be explained by the number of hours studied and how much cannot be explained.
