Regression Numerical

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. It is used to predict the value of the dependent variable Y based on the independent variable X. The least squares line finds the line of best fit that minimizes the distance between the data points and the line. It is calculated by determining the slope and y-intercept using the means, sums, and counts of the X and Y values. The correlation coefficient measures how well the data fits the calculated least squares line.

Uploaded by

Murtaza

Linear Regression

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data.
Linear regression is a method for predicting Y from X.
 In our case, Y is the dependent variable, and X is the independent
variable. 
We want to predict the value of Y for a given value of X. 
Now, if the data were perfectly linear, we could simply calculate the
slope-intercept form of the line, Y = a + bX.
To predict Y, we would just plug the given value of X into the equation
along with a and b.
In the real world, our data will not be perfectly linear.
It will likely be in the form of a cluster of data points on
a scatterplot.
From that scatterplot, we would like to determine the line of best fit
that describes the linear trend of the data, and how well that line fits
the cluster of points.

Let’s make up some data to use as an example. The relationship between
chimpanzee hunting party size and percentage of successful hunts is well
documented.
Plot the data using a scatterplot.

Draw a line through the data; it will not fit the points perfectly.


Using just a Least Squares Line drawn by hand through the data, we could
predict that a hunting party of 4 chimpanzees is going to be around 52%
successful.
We are not 100 percent accurate, but with more data, we would likely
improve our accuracy.
How well the data fits the Least Squares Line is measured by the
Correlation Coefficient.

Least Squares Line


In the chart above, I just drew a line by hand through the data that I
judged to be the best fit.
To make true predictions, we should calculate this line in
slope-intercept form, Y = a + bX.
What we are seeking is a line where the differences between the line and
each point are as small as possible. This is the line of best fit.

The least squares line has two components: the slope b and the Y-intercept a.
The equations for b and a are:

b = (∑xy − (∑x)(∑y)/n) / (∑x² − (∑x)²/n)

a = Y̅ − b·x̅

We need to calculate Y̅, x̅ , ∑x, ∑y, ∑xy, ∑x², and ∑y².


Each piece will then be fed into the equations for a and b. 

Create the below table based on our original dataset.


Now it is a simple matter to plug our Sigma values into the equation for a
and b. n is the number of values in the dataset, which in our case is 8.

b = (∑xy − (∑x)(∑y)/n) / (∑x² − (∑x)²/n)

b = (2249 − (36)(449)/8) / (204 − 36²/8) = 228.5 / 42

b = 5.44

Calculate the means of X and Y


Y̅= 449/8 = 56.12

x̅ = 36/8 = 4.5
Therefore
a = Y̅ - b X̅
a = 56.12 – b (4.5) = 56.12 - 5.44(4.5) = 31.64
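As a check, the arithmetic above can be reproduced in a short Python sketch using only the sums from the table (n = 8, ∑x = 36, ∑y = 449, ∑xy = 2249, ∑x² = 204); the individual data points are not needed for the slope and intercept.

```python
# Least squares slope and intercept from summary statistics.
# Sums taken from the worked example: n=8, Σx=36, Σy=449, Σxy=2249, Σx²=204.
n = 8
sum_x, sum_y = 36, 449
sum_xy, sum_x2 = 2249, 204

# Slope: b = (Σxy − ΣxΣy/n) / (Σx² − (Σx)²/n)
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

# Intercept: a = Ȳ − b·x̄
y_bar = sum_y / n  # 56.125
x_bar = sum_x / n  # 4.5
a = y_bar - b * x_bar

print(round(b, 2), round(a, 2))  # 5.44 31.64
```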

You can make predictions of Y from given values of X using your equation:
Y = a + bX

 Y = 31.64 + 5.44X 

This means that our line starts out at 31.64 and the Y-values increase
by 5.44 percentage points for every 1 Chimpanzee that joins the hunting
party.

To test this out, let’s predict the percent hunt success for 4 chimpanzees,
i.e. X = 4; Y = ?
Y = 31.64 + 5.44(4), which results in Y=53.4
X = 9; Y = ?
Y = 31.64 + 5.44(9), which results in Y = 80.6
We just predicted the percentage of successful hunts for a chimpanzee
hunting party based solely on knowledge of their group size.
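The two predictions above can be sketched as a small helper function, using the rounded values a = 31.64 and b = 5.44 from the text:

```python
# Predict hunt success (%) from hunting party size using Y = a + bX.
a, b = 31.64, 5.44  # rounded intercept and slope from the worked example

def predict(x):
    """Return the predicted Y for a given X."""
    return a + b * x

print(round(predict(4), 1))  # 53.4
print(round(predict(9), 1))  # 80.6
```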

However, now that you can make predictions, you need to qualify your
predictions with the Correlation Coefficient, which describes how well the
data fits your calculated line.
r = 0.96
Our value is close to +1, which means the data is strongly and positively
correlated.
You could have judged this from looking at the least squares line plotted
over the scatterplot, but the Correlation Coefficient quantifies it.

Calculate the Standard Error of Estimate

The standard error of estimate measures the typical deviation of the
observed Y values from the regression line:

s_e = √( (∑y² − a∑y − b∑xy) / (n − 2) )

∑y² is not listed in the worked table above, so the numeric value is left
as an exercise.

Calculate the Regression Coefficients

The regression coefficient of Y on X is the slope already found:
b_yx = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²) = 5.44. Swapping the roles of X and
Y gives b_xy = (n∑xy − ∑x∑y) / (n∑y² − (∑y)²).


Example 2:
A random sample of 11 statistics students produced the following
data, where x is the third exam score out of 80, and y is the final
exam score out of 200. Can you predict the final exam score of a
randomly selected student if you know the third exam score?

Table showing the scores on the final exam based on scores from
the third exam.

x (third exam score)    y (final exam score)
65                      175
67                      133
71                      185
71                      163
66                      126
75                      198
67                      153
70                      163
71                      159
69                      151
69                      159
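For Example 2 the raw data is available, so the slope, intercept, correlation coefficient, and standard error of estimate can all be computed from scratch. A minimal Python sketch using the same formulas as the chimpanzee example:

```python
from math import sqrt

# Third-exam scores (x) and final-exam scores (y) for the 11 students.
x = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
y = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sx2 = sum(xi ** 2 for xi in x)
sy2 = sum(yi ** 2 for yi in y)

# Slope and intercept of the least squares line.
b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
a = sy / n - b * sx / n

# Correlation coefficient.
r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Standard error of estimate: typical deviation of y from the fitted line.
se = sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

print(round(a, 2), round(b, 2), round(r, 2), round(se, 2))
# -173.51 4.83 0.66 16.41
```

With these values the line is Y ≈ −173.51 + 4.83X, so a student who scored 73 on the third exam would be predicted to score about −173.51 + 4.83(73) ≈ 179 on the final.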
