
Linear Regression

The aim of this learning module is to test whether an apparently linear relationship
between two variables is real, or whether it could have happened by chance because of
variability. To do this we use regression analysis (strictly speaking, linear regression).

This is based on the 'method of least squares' that we've already met in using Excel to fit
a trendline to data on a spreadsheet chart.

Regression can also be used to extrapolate or interpolate your data to obtain
predicted values on the Y-axis for X-axis values where you don't have an actual
measurement.
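
As a quick illustration of prediction from a fitted line, here is a minimal Python sketch. The data values are made up purely for illustration and are not from this module; it fits a straight line with NumPy and then evaluates it at X values where no measurement was taken.

```python
import numpy as np

# Made-up measurements (purely illustrative)
x = np.array([1.0, 2.0, 3.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 9.8, 12.1])

# Least-squares fit of a straight line y = a + b*x
b, a = np.polyfit(x, y, 1)        # polyfit returns [slope, intercept]
print(f"fitted line: y = {a:.2f} + {b:.2f}x")

# Interpolate at x = 4 (inside the measured range), extrapolate at x = 8 (outside it)
for x_new in (4.0, 8.0):
    print(f"predicted y at x = {x_new}: {a + b * x_new:.2f}")
```

Extrapolation (predicting outside the measured range) should be treated with more caution than interpolation, since the linear relationship may not hold beyond the data you have.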

Difference between regression and correlation


In an earlier unit we saw how correlation allows us to test for relationships between
two measurements taken on the same items, either of which might be dependent on
the other, e.g. heart rate and blood pressure.

However, there are many cases when one measurement is clearly independent of the
other one. Some examples:

1. Ages and weights of people: the weight of a growing person clearly depends on their
age, but their age is independent of their weight.

2. Time and intracellular pH in cells: you can measure the pH in cells at particular times,
but it would be meaningless to plan to measure the time at particular pH values.

3. Reaction rate and temperature: you can control the temperature and this affects the
rate of the reaction, but you can't set a reaction rate that will affect the temperature of
the experiment.

Statistically speaking, you shouldn't investigate these cases using correlation. Instead
you should plot the data correctly and use regression analysis to see if there is a linear
association.

Plotting the data


You must plot the dependent variable (weight, pH, reaction rate etc.) on the Y-axis and
the independent variable (time, temperature etc.) on the X-axis. For instance the
graph below shows how the mass of eggs depends on their age.
This helps you to see how the dependent variable is affected by the independent
variable.

What regression does

Regression analysis calculates the "line of best fit" through the data points.

It does this by finding a straight line, y = a + bx, which minimises the sum of the
squares of the distances, sᵢ², of each point to the line.

The slope, b, of the line is given by the equation

b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²

where the bars over the x and y indicate that they are the mean values of x and y from
all the data points, and the sums run over every data point. The value of a is then
calculated by putting x̄, ȳ and b into the equation y = a + bx, i.e. a = ȳ - bx̄.
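
To make the formula concrete, here is a minimal Python sketch (with illustrative data, not the egg measurements used later in this module) that computes b and a directly from the least-squares expressions above and cross-checks the result against NumPy's built-in fit.

```python
import numpy as np

# Illustrative data (not the egg measurements from this module)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 4.0, 3.2, 1.9, 1.1, 0.2])

x_bar, y_bar = x.mean(), y.mean()

# Slope: b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Intercept: substituting the means into y = a + bx gives a = y_bar - b * x_bar
a = y_bar - b * x_bar

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")

# Cross-check against NumPy's own least-squares fit (returns [slope, intercept])
print(np.polyfit(x, y, 1))
```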

The problem

You could get exactly the same line and exactly the same equation just by chance even
if there is no association between the independent variable and the dependent one. This
can easily happen when the data points are very scattered, as on the right-hand graph
below.

In panel (a) you can clearly see that there is a significant linear relationship between the
two variables. This will be indicated by the fact that the sum of squares Σsᵢ² is low.

In panel (b) the points are all over the place, so the value of Σsᵢ² is likely to be very large,
indicating that there is not a significant association - even though the line of best fit is
exactly the same as in panel (a).
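
A small sketch of this idea using simulated data (an arbitrary underlying line, not the module's figure): the same kind of fit is made to a tightly clustered and a widely scattered data set, and the residual sum of squares Σsᵢ² comes out far larger in the scattered case.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
underlying = 90 - 1.4 * x                               # an arbitrary underlying line

y_tight = underlying + rng.normal(0, 0.5, x.size)       # little scatter, like panel (a)
y_scattered = underlying + rng.normal(0, 8.0, x.size)   # heavy scatter, like panel (b)

for label, y in (("tight", y_tight), ("scattered", y_scattered)):
    b, a = np.polyfit(x, y, 1)                          # line of best fit
    s = y - (a + b * x)                                 # vertical distances s_i to the line
    print(f"{label:9s}: sum of squares = {np.sum(s**2):.1f}")
```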

To illustrate the use of linear regression analysis in Prism, let's consider the relationship
between time and the mass of a batch of eggs. It looks as if the mass of the eggs fell as
they got older, but is this a significant fall? To determine this we must test whether the
slope of the regression line is significantly less than zero. The null hypothesis is that the
mass of the eggs does not change over time; in other words, that the slope of the line is
zero.

To enter the data we use an XY data table in Prism, putting the time values (the
independent variable in this example) in the X column and the mass of the eggs (the
dependent variable) in the first Y column.

After pressing the Analyze button we select Linear Regression from the list of XY
analyses.

Accepting all the default options in the next dialogue box we get to a Results page.

Here we can see that the line of best fit has a Slope of -1.361 with a standard error of
0.0951, and a Y-intercept of 89.44 with a standard error of 2.279.

The equation of the line Y = -1.361*X + 89.44 is shown at the bottom of the window.

The R squared value of 0.9192 is close to 1, telling us that the regression line
(equivalent to a trendline in Excel) is a good fit to the data.

The P value for the difference between the slope and zero is much less than 0.05 so we
can reject our null hypothesis and conclude that the mass of the eggs does decrease
significantly with time.
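
Outside Prism, the same kind of analysis can be run in Python. The sketch below uses scipy.stats.linregress on made-up time/mass values; the numbers only mimic the shape of the module's data and will not reproduce the results quoted above.

```python
import numpy as np
from scipy import stats

# Made-up time (days) and egg mass (g) values, purely illustrative
time = np.array([2, 5, 8, 11, 14, 17, 20, 23, 26, 29], dtype=float)
mass = 89.0 - 1.36 * time + np.random.default_rng(1).normal(0.0, 2.0, time.size)

result = stats.linregress(time, mass)

print(f"slope     = {result.slope:.3f} (SE {result.stderr:.4f})")
print(f"intercept = {result.intercept:.2f} (SE {result.intercept_stderr:.3f})")
print(f"R squared = {result.rvalue ** 2:.4f}")
print(f"P value   = {result.pvalue:.3g}  # two-tailed test of whether the slope is zero")
```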

The Graph page shows an XY plot of the data together with the fitted regression line.

You can also carry out your own t-tests on the results of the linear regression obtained
with Prism to work out whether the slope or intercept differs from any particular
value.

For example, you could test whether eggs are significantly lighter than, say, 90 g when
they are laid. In other words, test whether the intercept is significantly lower than 90.
We do this as follows:

1. Our null hypothesis is that the eggs do weigh 90 g when laid.

2. Calculate t using the equation

t = |a - a₀| / SE(a)

where a = 89.44 g is the fitted intercept, a₀ = 90 g is the hypothesised value and
SE(a) = 2.279 is the standard error of the intercept.

This gives t ≈ 0.247.
3. Compare t with the critical value for N - 2 degrees of freedom, where N is the number
of data points:

tcrit = 2.069

4. Clearly 0.247 is less than 2.069, so the difference is not significant.

We conclude that the initial weight is not significantly different from 90 g.
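
The same test can be sketched in Python using the values quoted above. The number of data points is not stated in this module; N = 25 is assumed here because N - 2 = 23 degrees of freedom matches the quoted tcrit of 2.069, and the computed t may differ very slightly from 0.247 because the coefficients above are rounded.

```python
from scipy import stats

# Regression results quoted in the text above
intercept = 89.44        # fitted Y-intercept (g)
se_intercept = 2.279     # standard error of the intercept
n_points = 25            # assumed: not stated in the module, chosen so that
                         # N - 2 = 23 degrees of freedom matches tcrit = 2.069

t_stat = abs(intercept - 90.0) / se_intercept      # may differ slightly from the quoted
                                                   # 0.247 because of rounding
t_crit = stats.t.ppf(0.975, n_points - 2)          # two-tailed 5% critical value

print(f"t = {t_stat:.3f}, tcrit = {t_crit:.3f}")
print("significant" if t_stat > t_crit else "not significant")
```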
