Ch.1 Regression, Correlation and Hypothesis Testing
Previously, in Chapter 4 of Stats/Mech Year 1, you learnt how to interpret correlation and regression line equations for bivariate data. The methods you used relied on the two variables having a linear correlation. In this chapter we will look at how to analyse data where a correlation exists between two variables but is not linear. We will also define the product moment correlation coefficient and explore its role in hypothesis testing for correlation.
Measuring correlation
The product moment correlation coefficient (PMCC) is a measure that describes the strength of the linear correlation between two variables. The PMCC for a sample of data is denoted by 𝑟, while for a population we denote the PMCC by 𝜌. The PMCC can only take values between −1 and 1 inclusive.
▪ If 𝑟 = 1 then there is a perfect positive linear correlation. All points will lie on a straight line with positive gradient.
▪ If 𝑟 = −1 then there is a perfect negative linear correlation. All points will lie on a straight line with negative gradient.
▪ If 𝑟 = 0 then there is no linear correlation.
[Figure: a selection of scatter graphs illustrating how to interpret the PMCC]
Interpreting models of the forms 𝑦 = 𝑎𝑥^𝑛 and 𝑦 = 𝑘𝑏^𝑥
Linear models are very useful because they allow us to analyse data with relative ease. However, in reality not all relationships between two variables are linear. We will now look at one such model, of the form 𝑦 = 𝑎𝑥^𝑛. You can use the coding 𝑌 = log 𝑦 and 𝑋 = log 𝑥 to obtain a linear relationship:
▪ If you have a model of the form 𝑦 = 𝑎𝑥^𝑛, then a linear relationship is given by log 𝑦 = log 𝑎 + 𝑛 log 𝑥.
If we plot log 𝑦 against log 𝑥, we will obtain a straight line, since the above equation is in the form 𝑌 = 𝑀𝑋 + 𝐶 with gradient 𝑀 = 𝑛 and intercept 𝐶 = log 𝑎. The linear form follows from the original form by taking logs of both sides and applying the laws of logarithms:
log 𝑦 = log(𝑎𝑥^𝑛) = log 𝑎 + log(𝑥^𝑛) = log 𝑎 + 𝑛 log 𝑥
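As a rough illustration of the coding for a model of the form 𝑦 = 𝑎𝑥^𝑛, the sketch below regresses log 𝑦 on log 𝑥 and reads 𝑛 off the gradient and 𝑎 off the intercept. The function name fit_power_law and the data are hypothetical (the data are generated exactly from 𝑦 = 2𝑥^1.5 so that the answer is known in advance), and logs are assumed to be base 10.

```python
import math

def fit_power_law(xs, ys):
    """Estimate a and n in y = a * x**n by regressing log10(y) on log10(x)."""
    X = [math.log10(x) for x in xs]  # coding X = log x
    Y = [math.log10(y) for y in ys]  # coding Y = log y
    mean_X = sum(X) / len(X)
    mean_Y = sum(Y) / len(Y)
    s_xy = sum((u - mean_X) * (v - mean_Y) for u, v in zip(X, Y))
    s_xx = sum((u - mean_X) ** 2 for u in X)
    n = s_xy / s_xx              # gradient M of Y = MX + C, equal to n
    log_a = mean_Y - n * mean_X  # intercept C of Y = MX + C, equal to log a
    return 10 ** log_a, n

# Hypothetical data, generated exactly from y = 2 * x**1.5
xs = [1, 2, 4, 8, 16]
ys = [2 * x ** 1.5 for x in xs]

a, n = fit_power_law(xs, ys)
print(round(a, 6), round(n, 6))
```

With real data the points will not lie exactly on a line, so the recovered 𝑎 and 𝑛 are estimates rather than exact values.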
To obtain a linear relationship corresponding to a model of the form 𝑦 = 𝑘𝑏^𝑥, we use the coding 𝑌 = log 𝑦 and 𝑋 = 𝑥:
▪ If you have a model of the form 𝑦 = 𝑘𝑏^𝑥, then the linear relationship is given by log 𝑦 = log 𝑘 + 𝑥 log 𝑏.
In this case, we need to plot log 𝑦 against 𝑥 to obtain a linear model.
Example 1: The heights, ℎ cm, and masses, 𝑚 kg, of a sample of Galapagos penguins are recorded. The data are coded using 𝑦 = log 𝑚 and 𝑥 = log ℎ and it is found that a linear relationship exists between 𝑥 and 𝑦. The equation of the regression line of 𝑦 on 𝑥 is 𝑦 = 0.0023 + 1.8𝑥. Find an equation to describe the relationship between 𝑚 and ℎ, giving your answer in the form 𝑚 = 𝑎ℎ^𝑛, where 𝑎 and 𝑛 are constants to be found.
Using the regression line and substituting the coding: log 𝑚 = 0.0023 + 1.8 log ℎ
Using the power rule and taking the log ℎ term to the other side: log 𝑚 − log(ℎ^1.8) = 0.0023
Using the division law for logs and then the relationship between exponentials and logs: log(𝑚/ℎ^1.8) = 0.0023 ⇒ 𝑚/ℎ^1.8 = 10^0.0023
Simplifying: 𝑚 = 10^0.0023 × ℎ^1.8
We can see that 𝑎 = 10^0.0023 ≈ 1.005 and 𝑛 = 1.8.
3. Use the significance level and the sample size given in the question to find the critical value. You will need to refer to the “product moment coefficient” table in the formula booklet (or at the back of the Edexcel textbook) to find this value.
4. Take the absolute value of your PMCC, 𝑟, and compare it to the critical value. If the absolute value is greater than the critical value, then you should reject the null hypothesis. Otherwise you should not reject the null hypothesis. Don’t forget to write a full conclusion in the context of the question.
Example 2: Twelve students sat two biology tests, one theoretical and the other practical. Their marks are shown below:
Marks in theoretical test, 𝑡: 5 9 7 11 20 4 6 17 12 10 15 16
Marks in practical test, 𝑝: 6 8 9 13 20 9 8 17 14 8 17 18
a) Find the product moment correlation coefficient for these data, correct to 3 significant figures.
b) A teacher claims that students who do well in their theoretical test tend to do well in their practical test. Test this claim at the 0.05 significance level, stating your hypotheses clearly.
a) Using a calculator, we find the PMCC: 𝑟 = 0.935.
b) The teacher claims that better scores in the theoretical test are likely to give better practical scores, so we are testing for a positive correlation. Therefore, our hypotheses are: 𝐻₀: 𝜌 = 0, 𝐻₁: 𝜌 > 0.
To find the critical value, note that the sample size, 𝑛, is 12 and the significance level is 5% (one-tailed). From the table, the critical value is 0.4973.
We compare the absolute value of our PMCC to the critical value: 0.935 > 0.4973.
As a result, we reject the null hypothesis. We finish by writing a conclusion in the context of the question: there is sufficient evidence to suggest that students who do well in theoretical Biology tests also do well in practical Biology tests.
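The calculation in Example 2 can be checked with a short sketch like the one below. It computes the PMCC from the summary quantities S_xx, S_yy and S_xy rather than with a calculator. The function name pmcc is hypothetical, and the critical value 0.4973 is simply copied from the table lookup above, not computed. Because the alternative hypothesis is 𝜌 > 0, the sketch compares 𝑟 itself with the critical value, which here is the same as comparing its absolute value since 𝑟 is positive.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Marks from Example 2
t = [5, 9, 7, 11, 20, 4, 6, 17, 12, 10, 15, 16]  # theoretical test
p = [6, 8, 9, 13, 20, 9, 8, 17, 14, 8, 17, 18]   # practical test

r = pmcc(t, p)

# Critical value taken from the table: n = 12, 5% significance, one-tailed
critical_value = 0.4973

# One-tailed test of H0: rho = 0 against H1: rho > 0
reject_h0 = r > critical_value

print(f"r = {r:.3f}, reject H0: {reject_h0}")  # r = 0.935, reject H0: True
```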
https://bit.ly/pmt-cc
https://bit.ly/pmt-edu