
Lecture_ Week 9 Statistical Inference

This document outlines key learning outcomes related to statistical inference and correlation, including definitions of correlation, p-values, and hypothesis testing. It discusses the importance of statistical analysis in making inductive inferences and provides examples of significance testing, including t-tests and z-tests. The document emphasizes the interpretation of p-values in determining the validity of hypotheses in various research scenarios.

Uploaded by

bkadimogullari1

Arguments and Inference

(Reasoning)
Stats: Learning Outcomes
After this class, you will be able to:
1. Define what is meant by the correlation r and explain what its value tells us.
2. Guess roughly, by looking at a scatterplot, the value of r; in particular,
whether it is positive or negative and whether it is close to 1 (or −1).
3. Define and distinguish the terms ‘sample’ and ‘population’.
4. Define what is meant by the p-value.
5. State the null and alternative hypotheses for the purpose of performing
significance tests for some set of data.
6. Depending on the p-value given, state conclusions about which
hypothesis to accept.
Statistical Inference: Correlation
• A professor noted over years that students who had a better
command of English did better in the course ART 101.
• How can she test this observation? In other words, what does it take
for her to say, “Indeed, students who have a better command of English
do better in ART 101”?
• Note that this is a sort of inductive argument.
• There is a simple statistical concept that can help in making such
inductive inferences: correlation.
Statistical Inference: Correlation
• When a scatterplot of two quantitative variables suggests a
straight-line pattern, we say there is a correlation between these variables.
• The strength of the correlation is reflected in how closely the data
points cluster around a straight line.
[Figure: two scatterplots, A and B]
• The variables in B have a stronger correlation than those in A.
Statistical Inference: Correlation
• In the figure below, it seems that the data on the right have a stronger
correlation.
• Wrong: both plots show the same data, drawn using different scales.
• So judging by eye is not sufficient.
Statistical Inference: Correlation
• Correlation quantified:
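One standard way to quantify correlation is the Pearson coefficient, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below assumes this is the formula meant here; the function name pearson_r and the sample numbers are invented for illustration and are not course data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores, invented for illustration only.
english = [55, 62, 70, 78, 85]
art_101 = [58, 60, 72, 75, 88]

r = pearson_r(english, art_101)
print(f"r = {r:.3f}")

# Changing the units of either variable (the plotting scale) leaves r unchanged.
r_rescaled = pearson_r([v / 100 for v in english], [v * 10 for v in art_101])
print(f"r after rescaling = {r_rescaled:.3f}")
```

Because r is built from deviations divided by their overall spread, rescaling either variable does not change it. That is why two differently scaled plots of the same data share one value of r.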
Statistical Inference: Correlation
• Positive correlation (r > 0): if one variable increases, the other also
increases, and vice versa.
• Negative correlation (r < 0): if one variable increases, the other
decreases, and vice versa.
• Correlation makes no use of the distinction between explanatory
(independent) and response (dependent) variables. It makes no
difference which variable you call x and which you call y in calculating
the correlation.
Statistical Inference: Correlation
• Correlation requires that both variables be quantitative, so that it
makes sense to do the arithmetic indicated by the formula for r.
• The correlation r is always a number between −1 and 1:
- Values of r near 0 indicate a very weak linear relationship.
- Values of r close to −1 or 1 indicate that the points lie close to a
straight line.
- The extreme values r = −1 and r = 1 occur only when the points in a
scatterplot lie exactly along a straight line.
Statistical Inference: Correlation
• Correlation measures the strength of only the linear relationship
between two variables. Correlation does not describe curved
relationships between variables, no matter how strong they are.
• This does not mean that there is no relation between variables whose
scatterplot is not linear!
• It means that you cannot use r as a measure of the strength of the
relationship in this case.
• So first plot the data to check that the scatterplot suggests a straight
line, then calculate r.
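A small sketch of why plotting comes first: the invented data below follow the exact curve y = x², a perfect but non-linear relationship, and r still comes out as zero. The function name pearson_r is hypothetical and uses the standard Pearson formula.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented data: y is completely determined by x, but along a curve.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r_curved = pearson_r(x, y)
print(f"r for y = x^2 on symmetric x values: {r_curved:.3f}")
```

The positive and negative halves of the curve cancel in the sum of products, so r reports no linear relationship even though y depends entirely on x.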
Statistical Inference: Correlation
• Examples:
Statistical Inference: Correlation
• Examples:
Statistical Inference: Correlation-DO
• Coffee is a leading export from several developing countries. When
coffee prices are high, farmers often clear forest to plant more coffee
trees. Here are data for five years on prices paid to coffee growers in
Indonesia and the rate of deforestation in a national park that lies in a
coffee-producing region.
• Find the correlation r and make a
scatterplot using the Pearson
Correlation Coefficient Calculator:
https://www.socscistatistics.com/tests/pearson/default2.aspx
Statistical Inference: Correlation-DO
• Your results should look like this:

• r = 0.9552
Statistical Inference: Correlation-DO
• Do the two exercises posted to the LMS under the current week using the
Pearson Correlation Coefficient Calculator at the link provided on the
previous slide.
Statistical Inference: Significance
Tests
• Recall the following type of inductive inference:
• Enumerative induction (inductive generalization):
we arrive at a generalization about a group of things after
observing only some members of that group.
Symbolically:
X (proportion/percent) of the observed members (i.e. the sample) of A
are B. Therefore, X (proportion/percent) of the entire group
(the population) of A are B.
Statistical Inference: Significance
Tests

• The purpose of statistical inference is to draw conclusions from data.
• It serves to justify or support inductive arguments such as
enumerative induction (inductive generalization).
Statistical Inference: Significance
Tests
• Consider the following example:
Researchers want to know if a new drug is more effective than a
placebo. Twenty patients receive the new drug, and 20 receive a
placebo. Twelve (60%) of those taking the drug show improvement
versus only 8 (40%) of the placebo patients.
• Our unaided judgment would suggest that the new drug is better.
• However, statistical analysis yields a p-value of 20%. So what does this tell us?
Statistical Inference: Significance
Tests
• This means the following:
1. If there were NO difference between the effect of the drug and that
of the placebo, and
2. we repeated the study 100 times, each time drawing a new sample of 20
patients for the drug and 20 for the placebo and comparing their results, then
3. about 20 of these 100 pairs of samples would show a difference at least
as large as the one we observed, purely by chance.
• Therefore, the conclusion that the drug is more effective than the
placebo is not sufficiently supported. We cannot accept it on this evidence!
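The slide does not name the test behind the 20% figure. A minimal sketch, assuming a two-sided two-proportion z-test with a pooled standard error (one common choice, not necessarily the one used for the slide), gives a p-value of roughly 0.2 for these counts. The function name is hypothetical.

```python
from math import erfc, sqrt

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one proportion."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under H0 the best estimate of the common proportion pools both groups.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Counts from the example: 12 of 20 improved on the drug, 8 of 20 on placebo.
z, p_value = two_proportion_z_test(12, 20, 8, 20)
print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
```

With only 20 patients per group, a 60% versus 40% split is well within what chance alone produces, which is what the large p-value expresses.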
Statistical Inference: Significance
Tests
• The t-test: Two samples from two different populations:
• Steps of Significance Testing- the issue:
We have two samples representing two populations.
We measure a statistical quantity (usually the mean of some
quantity) in the two samples and get different results.
The question is: does this imply that the population means of this
quantity are different?
Statistical Inference: Significance
Tests
• Steps of Significance Testing- the procedure:
We make two hypotheses:
- The Null Hypothesis H0:
H0: there is no difference in the population means of this quantity.
- The Alternative Hypothesis Ha:
Ha: the population means of this quantity differ (or one is larger
than the other).
• We perform a statistical test and get the p-value:
- p-value large (usually p > 0.05): do not reject H0; the data do not support Ha
- p-value small (usually p < 0.05): reject H0 in favor of Ha
Statistical Inference: Significance
Tests
• The statistical test is performed on the data of the samples, and
measures the variance within each sample and between the samples.
• The p-value provides the following information: assuming that the null
hypothesis is true, out of 100 pairs of samples taken from the two
populations, how many would give a difference at least as large as the
one obtained from the two samples measured.
• A large p-value indicates that such a difference in the results of the
samples might be due to chance and NOT due to a difference between
the two populations.
Statistical Inference: Significance
Tests
• Example from studies on the use of the malaria drug hydroxychloroquine (HCQ)
during the Covid pandemic:
Safety and Efficacy of Hydroxychloroquine in COVID-19: A Systematic
Review and Meta-Analysis
Waqas Ullah, Hafez M. Abdullah, Sohaib Roomi, Yasar Sattar, Talal
Almas, Smitha Narayana Gowda, Rehan Saeed, Maryam Mukhtar,
Ammar Ahmad, Tony Oliver, M. Chadi Alraies, Donald C. Haas, and
David L. Fischman
• J Clin Med Res. 2020 Aug; 12(8): 483–491.
Published online 2020 Jul 4. doi: 10.14740/jocmr4233
Statistical Inference: Significance
Tests
• Example from studies on the use of the malaria drug (HCQ) during the Covid pandemic:
• Results
• Twelve studies comprising 3,912 patients (HCQ 2,512 and control 1400) were included.
The odds of all-cause mortality (OR: 2.23, 95% confidence interval (CI): 1.58 - 3.13, P
value < 0.00001) were significantly higher in patients on HCQ compared to patients on
control agent. The response to therapy assessed by negative repeat polymerase chain
reaction (PCR) (OR: 1.83, 95% CI: 0.50 - 6.75, P = 0.36), radiological resolution (OR:
1.98, 95% CI: 0.47 - 8.36, P value = 0.36) and the need for invasive mechanical
ventilation (IMV) (OR: 1.21, 95% CI: 0.34 - 4.33, P value = 0.76) were identical between
the two groups. Overall, four times higher odds of net adverse events (NAEs) were
observed in the HCQ group (OR: 4.59, 95% CI 1.73 - 12.20, P value = 0.02). The
measures for individual safety endpoints were also numerically lower in the control
arm; however, none of these values reached the level of statistical significance
Statistical Inference: Significance
Tests
• Example from studies on the use of the malaria drug (HCQ) during the Covid
pandemic:

Conclusions
HCQ might offer no benefits in terms of decreasing the viral load and
radiological improvement in patients with COVID-19. HCQ appears to
be associated with higher odds of all-cause mortality and NAEs
Statistical Inference: Significance
Tests-DO
• The coach of basketball team A claims that his team is doing better than
team B in three-pointers. He bases his claim on the following statistics of
both teams during the last 8 games (SAMPLE).

Team A: 12  7 13  7  8  7 11  9   Mean A = 9.25
Team B:  4  7  6 11  9  8  4  7   Mean B = 7

• Do you believe him?

http://www.rossmanchance.com/applets/TOSCalculator.html
Statistical Inference: Significance
Tests-DO
• Let us do a significance test.
• What are H0 and Ha?
- H0: The average score of team A in ALL MATCHES is the same as that of team B.
- Ha: The average score of team A in ALL MATCHES is higher than that of team B.

• Use the applet at the link below to do the unpaired t-test:

http://www.rossmanchance.com/applets/TOSCalculator.html
Statistical Inference: Significance
Tests-DO
• This is how the results
look on the applet.
Statistical Inference: Significance
Tests-DO
P value and statistical significance:
• The P value equals 0.0522 > 0.05
• By conventional criteria (p < 0.05), this difference is considered to be
not statistically significant.
• These results could plausibly be due to chance, so they do NOT give
sufficient evidence of a difference in the means of the two teams in the
population, which is ALL GAMES of the two teams.
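The applet result can be checked by hand. The sketch below assumes an unpooled two-sample t statistic, a one-sided alternative (team A higher) and the conservative degrees of freedom min(n1, n2) − 1 = 7. These settings are assumptions, chosen because they give a p-value close to the 0.0522 reported above. The helper names are hypothetical, and the tail area comes from simple numerical integration rather than a statistics library.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

def two_sample_t(a, b):
    """Unpooled t statistic with conservative df = min(n_a, n_b) - 1."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t_stat = (mean(a) - mean(b)) / se
    df = min(len(a), len(b)) - 1
    return t_stat, df, t_upper_tail(t_stat, df)

# Three-pointers per game, copied from the example.
team_a = [12, 7, 13, 7, 8, 7, 11, 9]
team_b = [4, 7, 6, 11, 9, 8, 4, 7]

t_stat, df, p_value = two_sample_t(team_a, team_b)
print(f"t = {t_stat:.3f}, df = {df}, one-sided p = {p_value:.4f}")
```

Larger degrees of freedom (the pooled or Welch conventions) would give a somewhat smaller p-value for the same t statistic, so it is worth noting which convention a calculator uses when a result sits this close to 0.05.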
Statistical Inference: Significance
Tests-DO
• Repeat the test for the following data:

A    B
12   4
7    7
13   6
7    11
8    9
7    8
11   4
9    7
9    9
10   8
9    7
10   6
12   11
12   8
11   7
13   5

• What do you conclude?
Statistical Inference: Significance
Tests-DO
• This is how the results
look on the applet.
• The P value equals
0.0036 < 0.05
• Conclusion:
The data give strong evidence that
team A scores higher than
team B on average in ALL GAMES.
Statistical Inference: Significance
Tests
• Consider one more example:
Foresters who study special very old pine trees in a forest are
interested in how the trees are distributed in the forest. Is there some
sort of clustering, resulting in regions of the forest with more trees than
others? Or are the tree locations random, resulting in no particular
patterns?
Statistical Inference: Significance
Tests
• The figure below gives a plot of the locations of all 584 longleaf pine
trees in a randomly picked 200-meter by 200-meter region in the
forest.
• Is there clustering?
• Seems so by eye.
• But if trees were distributed
completely randomly, isn't
there a chance that such clustering
would appear in some 200x200 region?
Statistical Inference: Significance
Tests
• Statistical analysis finds that the probability of having a pattern like
this if all the trees in the forest were distributed randomly is 4 in
100, i.e. the p-value is 4%.
• This means that, assuming all the trees
in the forest (the population) are randomly
distributed, if we pick 100 areas of 200x200 m
(samples), then only about 4 of them
will show a pattern similar to the one that we
investigated.
Statistical Inference: Significance
Tests
• Therefore, the probability of getting such a pattern in a sample if the
trees were randomly distributed is small, 4%.
• Therefore, it is very likely that the trees
in the forest (the population) are clustered, NOT
randomly distributed.
• OR, stated differently, it is unlikely to have
such a pattern if the trees were randomly
distributed.
Statistical Inference: Significance
Tests
• The z-test: One sample; the mean of the population is known:
• Steps of Significance Testing- the issue:
We have one population for which the mean and standard
deviation of some quantity are known.
We draw a sample from this population representing some
sub-population and measure the mean and standard deviation of the same
quantity for the sample.
The question is: do the sample's mean and standard deviation imply
that this sub-population has a mean different from that of the whole
population?
Statistical Inference: Significance
Tests
• Steps of Significance Testing- the procedure:
We make two hypotheses:
- The Null Hypothesis H0:
H0: the sub-population mean is the same as the whole population mean.
- The Alternative Hypothesis Ha:
Ha: the sub-population mean is different from the whole population mean.
• We perform a statistical test and get the p-value:
- p-value large (usually p > 0.05): do not reject H0; the data do not support Ha
- p-value small (usually p < 0.05): reject H0 in favor of Ha
Statistical Inference: Significance
Tests-DO
• The CGPA of all TEDU students is 2.5. A group of 10 engineering
students claimed that the CGPA of Engineering students of TEDU is
higher than the overall CGPA of TEDU students. To support their
argument, they reported their mean CGPA, which was 2.8 with s = 0.7.
• Does this data support their claim?
• What are H0 and Ha?
• Find the p-value using the applet at:
http://www.rossmanchance.com/applets/TOSCalculator.html
Statistical Inference: Significance
Tests-DO
• The p-value is 0.1042 > 0.05. Therefore,
at this level of significance we can NOT
reject H0 in favor of Ha.
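This p-value can be reproduced approximately by hand. Because only the sample standard deviation s is given, the sketch below assumes a one-sided one-sample t-test with n − 1 = 9 degrees of freedom. That setting is an assumption, chosen because it gives a value close to the reported 0.1042. A z-test that treats s as if it were the known population standard deviation is computed alongside for comparison. The helper names are hypothetical.

```python
from math import erfc, gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

def one_sample_t(sample_mean, sample_sd, n, population_mean):
    """Return (t, df, one-sided p) for Ha: sub-population mean is higher."""
    t_stat = (sample_mean - population_mean) / (sample_sd / sqrt(n))
    return t_stat, n - 1, t_upper_tail(t_stat, n - 1)

# Values from the example: mean 2.8, s = 0.7, n = 10, overall CGPA 2.5.
t_stat, df, p_t = one_sample_t(2.8, 0.7, 10, 2.5)

# Same statistic referred to the standard normal curve (z-test).
p_z = 0.5 * erfc(t_stat / sqrt(2))

print(f"t = {t_stat:.4f}, df = {df}")
print(f"one-sided p (t, df = 9) = {p_t:.4f}")
print(f"one-sided p (z)         = {p_z:.4f}")
```

The t-based value is the larger of the two because, with only 10 students, the t distribution has heavier tails than the normal curve.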
Statistical Inference: Significance
Tests-DO
• Assume now that they reported a mean CGPA of 3.2
with s = 0.3.
• Does this data support their claim?
Statistical Inference: Significance
Tests-DO
• The p-value is 0.0035 << 0.05. Therefore,
at this level of significance we CAN
reject H0 in favor of Ha.
