Ch.1 Regression, Correlation and Hypothesis Testing

This document discusses regression, correlation, and hypothesis testing for bivariate data. It defines the product moment correlation coefficient (PMCC) as a measure of the strength of the linear correlation between two variables ranging from -1 to 1. A value of 0 indicates no linear correlation, 1 indicates perfect positive correlation, and -1 indicates perfect negative correlation. The document provides examples of how to interpret scatter plots based on PMCC values and outlines the four step process to conduct a hypothesis test for zero correlation using the PMCC. This includes stating the null and alternative hypotheses, calculating the sample PMCC, finding the critical value, and comparing the sample PMCC to the critical value to determine whether to reject or fail to reject the null hypothesis.

Uploaded by

Adam Salik

Regression, correlation and hypothesis testing Edexcel Stats/Mech Year 2

Previously, in Chapter 4 of Stats/Mech Year 1, you learnt how to interpret correlation and regression line equations for bivariate data. The methods you used relied on the two variables having a linear correlation. In this chapter we will look at how to analyse data where a correlation exists between two variables but is not linear. We will also define the product moment correlation coefficient and explore its role in hypothesis testing for correlation.

Interpreting models of the forms $y = ax^n$ and $y = kb^x$

Linear models are very useful because they allow us to analyse data with relative ease. In reality, however, not every relationship between two variables is linear. We will now look at one such model, of the form $y = ax^n$. You can use the coding $Y = \log y$ and $X = \log x$ to obtain a linear relationship:

▪ If you have a model of the form $y = ax^n$, then a linear relationship is given by $\log y = \log a + n \log x$.

If we plot $\log y$ against $\log x$, we obtain a straight line, since the above equation is of the form $Y = MX + C$. We will now look at how to obtain the linear form from the original form. Recall that if the base of a logarithm is not explicitly written, then you can assume it is 10.

We start with the original equation: $y = ax^n$
Taking logs of both sides: $\log y = \log(ax^n)$
Using the multiplication law for logs: $\log y = \log a + \log(x^n)$
Using the power rule for logs: $\log y = \log a + n \log x$

This is now in the linear form $Y = C + MX$, with $Y = \log y$, $C = \log a$, $M = n$ and $X = \log x$.

[Figure: points on the curve $y = 0.1x^{1.8}$, which is of the form $y = ax^n$. There is a pattern here, but it is not linear.]
[Figure: plotting $\log y$ against $\log x$ now gives a straight line. The gradient of this line, $n$, is equal to 1.8 and the intercept on the vertical axis is equal to $\log 0.1$.]

To obtain a linear relationship corresponding to a model of the form $y = kb^x$, we use the coding $Y = \log y$ and $X = x$:

▪ If you have a model of the form $y = kb^x$, then the linear relationship is given by $\log y = \log k + x \log b$.

In this case, we need to plot $\log y$ against $x$ to obtain a linear model.

Example 1: The heights, $h$ cm, and masses, $m$ kg, of a sample of Galapagos penguins are recorded. The data are coded using $y = \log m$ and $x = \log h$, and it is found that a linear relationship exists between $x$ and $y$. The equation of the regression line of $y$ on $x$ is $y = 0.0023 + 1.8x$. Find an equation to describe the relationship between $m$ and $h$, giving your answer in the form $m = ah^n$, where $a$ and $n$ are constants to be found.

Using the regression line and substituting the coding: $\log m = 0.0023 + 1.8 \log h$
Using the power rule and taking the $\log h$ term to the other side: $\log m - \log(h^{1.8}) = 0.0023$
Using the division law for logs: $\log\left(\frac{m}{h^{1.8}}\right) = 0.0023$
Using the relationship between exponentials and logs: $\frac{m}{h^{1.8}} = 10^{0.0023}$
Simplifying: $m = 10^{0.0023}\,h^{1.8}$

We can see that $a = 10^{0.0023}$ (approximately 1.005) and $n = 1.8$.

Measuring correlation

The product moment correlation coefficient (PMCC) is a measure that describes the strength of the linear correlation between two variables. The PMCC for a sample of data is denoted by $r$, while for a population we denote the PMCC by $\rho$. The PMCC can only take values between $-1$ and $1$ inclusive.

▪ If $r = 1$ then there is a perfect positive linear correlation. All points will lie on a straight line.
▪ If $r = -1$ then there is a perfect negative linear correlation. All points will lie on a straight line.
▪ If $r = 0$ then there is no linear correlation.

Here is a selection of scatter graphs to help you interpret the PMCC:

[Figure: five scatter graphs with $r = -1$, $r = -0.8$, $r = 0$, $r = 0.6$ and $r = 1$.]

You need to be able to use your calculator to find the PMCC for bivariate data (data involving two variables). The method for doing so depends on which calculator you are using, so refer to your calculator's handbook or a relevant online tutorial if you are unsure how to calculate the PMCC.

Hypothesis testing for zero correlation

You need to be able to carry out hypothesis tests on a sample of bivariate data to find out whether there is evidence of a linear relationship in the entire population. The idea is that we calculate the PMCC for the sample and compare it to a critical value, which tells us whether or not a linear relationship is likely to exist. The procedure for these questions can be consolidated into four steps:

1. First write down your null and alternative hypotheses. Your null hypothesis is always $\rho = 0$, while your alternative hypothesis will depend on what you are told in the question.

2. Using your calculator, work out the PMCC of the sample data, $r$.

3. Use the significance level and the sample size given to you in the question to find the critical value. You will need to refer to the "product moment coefficient" table in the formula booklet (or at the back of the Edexcel textbook) to find this value.

4. Take the absolute value of your PMCC, $r$, and compare it to the critical value. If the absolute value is greater than the critical value, then you should reject the null hypothesis (for a one-tailed test, also check that the sign of $r$ agrees with the alternative hypothesis). Otherwise, you do not reject the null hypothesis, because there is insufficient evidence against it. Don't forget to write a full conclusion in the context of the question.

Example 2: Twelve students sat two biology tests, one theoretical and the other practical. Their marks are shown below:

Marks in theoretical test, t | 5 | 9 | 7 | 11 | 20 | 4 | 6 | 17 | 12 | 10 | 15 | 16
Marks in practical test, p | 6 | 8 | 9 | 13 | 20 | 9 | 8 | 17 | 14 | 8 | 17 | 18

a) Find the product moment correlation coefficient for these data, correct to 3 significant figures.
b) A teacher claims that students who do well in their theoretical test tend to do well in their practical test. Test this claim at the 0.05 significance level, stating your hypotheses clearly.

a) Using a calculator, $r = 0.935$.

b) The teacher claims that better scores in the theoretical test are likely to give better practical scores, so we are testing for a positive correlation. Therefore, our hypotheses are $H_0: \rho = 0$ and $H_1: \rho > 0$.

To find the critical value, note that the sample size, $n$, is 12 and the significance level is 5% (one-tailed). From the table, the critical value is 0.4973.

We compare the absolute value of our PMCC to the critical value. The absolute value is 0.935, which is greater than 0.4973, and $r$ is positive, in agreement with $H_1$.

As a result, we reject the null hypothesis. We finish by writing a conclusion in the context of the question: there is sufficient evidence, at the 5% significance level, to suggest that students who do well in theoretical Biology tests also tend to do well in practical Biology tests.
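
As a rough numerical check of the two worked examples, the following Python sketch recomputes the PMCC for the biology marks, applies the comparison in step 4, and decodes the penguin regression line back into the form m = a h^n. The function names `pmcc`, `reject_zero_correlation` and `decode_power_model` are illustrative assumptions and do not come from the notes. The critical value 0.4973 is copied from the table lookup in Example 2 rather than computed. The sketch stands in for the calculator step and is not a prescribed method.

```python
import math


def pmcc(xs, ys):
    """Sample product moment correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def reject_zero_correlation(r, critical_value, alternative="two-sided"):
    """Decide whether to reject H0: rho = 0.

    critical_value must already be read from the table for the right
    sample size, significance level and number of tails (assumption:
    the caller does that lookup; this function does not).
    """
    if alternative == "greater":      # H1: rho > 0
        return r > critical_value
    if alternative == "less":         # H1: rho < 0
        return r < -critical_value
    if alternative == "two-sided":    # H1: rho != 0
        return abs(r) > critical_value
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


def decode_power_model(intercept, gradient):
    """Turn log10(y) = intercept + gradient * log10(x) into y = a * x**n."""
    return 10 ** intercept, gradient


# Example 2: marks in the theoretical (t) and practical (p) tests
t = [5, 9, 7, 11, 20, 4, 6, 17, 12, 10, 15, 16]
p = [6, 8, 9, 13, 20, 9, 8, 17, 14, 8, 17, 18]
CRITICAL_VALUE = 0.4973  # n = 12, 5% significance, one-tailed (from the notes)

r = pmcc(t, p)
print(f"r = {r:.3f}")  # r = 0.935
print("reject H0:", reject_zero_correlation(r, CRITICAL_VALUE, "greater"))

# Example 1: regression line of log m on log h is y = 0.0023 + 1.8x
a, n = decode_power_model(0.0023, 1.8)
print(f"m = {a:.4f} * h^{n}")
```

Passing `"greater"` mirrors the one-tailed hypothesis H1: ρ > 0 in Example 2. With the default `"two-sided"` the function compares the absolute value of r with the critical value, as in step 4.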
