Hypothesis Testing
www.statstutor.ac.uk
Data types
Data variables are either:
• Scale: measurements / numerical / count data
• Categorical: appear as categories; tick boxes on questionnaires
Populations and samples
• Taking a sample from a population
How can exam score data be summarised?
Summary statistics
• Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Standard deviation (s) is a measure of how much the individuals differ
from the mean:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Large SD = very spread out data
Small SD = there is little variation from the mean
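The mean and standard deviation formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the exam scores are hypothetical, and the standard deviation uses the n - 1 denominator shown above.

```python
import math

def mean(values):
    """Arithmetic mean: the sum of the values divided by n."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))

# Hypothetical exam scores, invented for illustration only.
scores = [52, 61, 48, 75, 64]

print(f"mean = {mean(scores):.2f}")
print(f"SD   = {sample_sd(scores):.2f}")
```

Python's standard library offers the same calculations as `statistics.mean` and `statistics.stdev`; the hand-written versions are shown only to mirror the formulas.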
Interpretation of standard deviation
• The larger the standard deviation, the more spread
out the data is.
Normal Distribution
• As discussed in the introductory section, normal distributions do not
necessarily have the same means and standard deviations.
• A normal distribution with a mean of 0 and a standard deviation of 1
is called a standard normal distribution.
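Any set of values can be put on the mean-0, SD-1 scale by subtracting the mean and dividing by the standard deviation (a z-score). The sketch below is an illustration under stated assumptions: the function name `standardize` and the measurements are hypothetical. Rescaling changes only the location and spread of the data, not its shape, so it does not make skewed data normal.

```python
import statistics

def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [(x - m) / s for x in values]

# Hypothetical measurements, invented for illustration only.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
z_scores = standardize(heights)

print([round(z, 3) for z in z_scores])
print("mean of z-scores:", round(statistics.mean(z_scores), 6))
print("SD of z-scores:  ", round(statistics.stdev(z_scores), 6))
```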
Assessing Normality
Charts can be used to informally assess whether data is normally distributed or skewed.
H0: µ = 20
• We test the null hypothesis against an alternative hypothesis, Ha
• It includes the outcomes not covered by the null hypothesis
• Alternative hypothesis would express that freshmen do not study 20 hours per week:
Ha: µ ≠ 20
Hypothesis testing
• z-statistic
• Population S.D is known
• t-statistic
• Population S.D is not known
Ha: µ ≠ 20 (two-tailed) or Ha: µ > 20 (one-tailed)
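For reference, the two test statistics can be written out as follows. The symbols are assumptions of this sketch rather than notation taken from the slide: $\bar{x}$ is the sample mean, $\mu_0$ the value stated in H0, $\sigma$ the known population S.D., $s$ the sample S.D., and $n$ the sample size.

```latex
% z-statistic: population S.D. (sigma) is known
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}

% t-statistic: population S.D. is not known, so the sample S.D. (s) is used
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1
```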
Assumptions for t-test
• The population distribution is normal; or
• The sampling distribution is symmetric and the sample size is n ≤ 15; or
• The sampling distribution is moderately skewed and the sample size is 16 ≤ n ≤ 30
• In every case, the random sample is made up of independent observations
t-test example
Problem Statement: Duracell manufactures batteries that the CEO claims
will last an average of 300 hours under normal use. A researcher randomly
selected 20 batteries from the production line and tested these batteries.
The tested batteries had a mean life span of 270 hours with a standard
deviation of 50 hours. Do we have enough evidence to suggest that the
claim of an average lifetime of 300 hours is false?
df = 19
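The test statistic for this problem can be worked through as below. This is a sketch under stated assumptions, not the slide's own solution: the problem does not give a significance level, so a two-tailed test at α = 0.05 is assumed, with the critical value 2.093 read from a t-table for df = 19. The function name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return (t, df) where t = (x_bar - mu0) / (s / sqrt(n)) and df = n - 1."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error, n - 1

# Values from the problem statement: H0: mu = 300, Ha: mu != 300.
t_stat, df = one_sample_t(sample_mean=270, mu0=300, sample_sd=50, n=20)

# Assumption: two-tailed test at alpha = 0.05 (not stated in the problem).
# 2.093 is the t-table critical value for df = 19.
T_CRITICAL = 2.093
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions |t| is larger than the critical value, so the sample would count as evidence against the 300-hour claim.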
t-test example
df = 29
t = 10.96
t-table
Example
• In the past a machine has produced washers having a mean thickness
of 0.050 inches. To determine whether the machine is in proper
working order, a sample of 10 washers was taken, for which the mean
thickness is 0.053 inches and the standard deviation is 0.003 inches.
Test the hypothesis that the machine is in proper working order, using
a level of significance of 0.05.
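• H0 is rejected: t = (0.053 - 0.050) / (0.003 / √10) ≈ 3.16, which exceeds the two-tailed critical value of 2.262 for df = 9 at the 0.05 level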
Example
• A drug company claimed that the medicine Tylenol can increase the
platelet count by at least 50,000 on average. A sample of 30
dengue fever patients was tested, and the observed mean increase
in platelet count was 52,000 with a standard deviation of 5,000.
Is the drug affecting the platelet count?
• H0 is accepted
Example
• A process is in control when the average amount of instant coffee
that is packed is 6 oz. The standard deviation is 0.2 oz. A sample of
100 jars is selected at random and the sample average is found to be 6.1
oz. Is the process out of control?
• H0 is rejected
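The coffee example can be checked with a z-statistic, since the process standard deviation is given. This is a sketch under stated assumptions: the 0.2 oz figure is treated as the known population S.D., and because no significance level is stated, a two-tailed test at α = 0.05 (critical value 1.96) is assumed. The function name `one_sample_z` is hypothetical.

```python
import math

def one_sample_z(sample_mean, mu0, population_sd, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (population_sd / math.sqrt(n))

# Values from the example: H0: mu = 6 oz (process in control).
z_stat = one_sample_z(sample_mean=6.1, mu0=6.0, population_sd=0.2, n=100)

# Assumption: two-tailed test at alpha = 0.05, critical value 1.96.
Z_CRITICAL = 1.96
reject_h0 = abs(z_stat) > Z_CRITICAL

print(f"z = {z_stat:.2f}, reject H0: {reject_h0}")
```

The statistic comes out at about 5, well beyond 1.96, which agrees with the conclusion that H0 is rejected.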
Protein-Drug Interaction Example
• Sorafenib
• Approved for the treatment of primary kidney cancer and primary liver cancer
• Small-molecule inhibitor of
• Raf kinase
• Platelet-derived growth factor
• VEGF receptor 2 & 3 kinases
• c-Kit
• Assumptions
• Docking scores are normally distributed
Protein-Drug Interaction Example
Protein Name                                                       Docking Score
cAMP-specific 3',5'-cyclic phosphodiesterase 4B                    -167.63
Proto-oncogene tyrosine-protein kinase Src                         -92.22
MHC Class I (HLA A / HLA B)                                        -73.87 / -85.8
Mitogen-activated protein kinase 1                                 -72.66
MAP kinase-activated protein kinase 2                              -50.03
Nicotinate-nucleotide pyrophosphorylase                            -40.77
Ferrochelatase, mitochondrial                                      -13.99
Protein kinase C eta type                                          -11.71
cAMP and cAMP-inhibited cGMP 3',5'-cyclic phosphodiesterase 10A    -11.16
VEGFR2                                                             -56.28

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
$\bar{x} = -61.984$
$\mu_0 = 0$
$s = 45.722$
$n = 10$
$$t = \frac{-61.984 - 0}{45.722 / \sqrt{10}} = -4.287$$
t-critical = 1.833
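The figures on this slide can be reproduced from the listed docking scores. The sketch below rests on two observations, both of which are readings of the slide rather than statements in it: the reported mean of -61.984 with n = 10 matches the ten scores other than VEGFR2, and the reported s = 45.722 matches the divide-by-n standard deviation (`statistics.pstdev`) rather than the n - 1 form (`statistics.stdev`). Variable names are hypothetical.

```python
import math
import statistics

# The ten docking scores other than VEGFR2, in the order listed.
scores = [-167.63, -92.22, -73.87, -85.8, -72.66,
          -50.03, -40.77, -13.99, -11.71, -11.16]

n = len(scores)
mu0 = 0  # hypothesised mean docking score under H0
x_bar = statistics.mean(scores)

# Divide-by-n SD: this is the version that reproduces s = 45.722.
s_n = statistics.pstdev(scores)
t_slide = (x_bar - mu0) / (s_n / math.sqrt(n))

# Divide-by-(n - 1) SD, shown for comparison only.
s_n1 = statistics.stdev(scores)
t_n1 = (x_bar - mu0) / (s_n1 / math.sqrt(n))

print(f"x_bar = {x_bar:.3f}, n = {n}")
print(f"s (n)     = {s_n:.3f}, t = {t_slide:.3f}")
print(f"s (n - 1) = {s_n1:.3f}, t = {t_n1:.3f}")
```

With either standard deviation, |t| is larger than the t-critical value of 1.833 quoted on the slide.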
The Chi-Square Test
• Note:
• The chi square test does not prove that a hypothesis is correct
• It evaluates to what extent the data and the hypothesis have a good fit
Hypothesis Testing with Chi-Square
Chi-square follows five steps:
1. Making assumptions (random sampling)
• The research hypothesis (H1) proposes that the two variables are
related in the population.
• The null hypothesis (H0) states that no association exists between the
two cross-tabulated variables in the population, and therefore the
variables are statistically independent.
H0: There is no association between the two
variables.
• If the chi-square value results in a probability that is less than 0.05 (i.e., less
than 5%), it is considered statistically significant
• The null hypothesis is rejected
Calculating the Obtained Chi-Square
$$\chi^2 = \sum \frac{(f_e - f_o)^2}{f_e}$$
fe = expected frequencies
fo = observed frequencies
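The obtained chi-square can be computed for a cross-tabulation as sketched below. Several things here are assumptions rather than content from the slides: the 2 x 2 counts are invented, the expected frequencies use the standard independence formula (row total × column total / grand total), and 3.841 is the chi-square table critical value for df = 1 at the 0.05 level. Function names are hypothetical.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (fe - fo)^2 / fe over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (fe - fo) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

# Hypothetical observed counts, invented for illustration only.
observed = [[20, 30],
            [30, 20]]

chi2 = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: alpha = 0.05; 3.841 is the table critical value for df = 1.
CHI2_CRITICAL = 3.841
significant = chi2 > CHI2_CRITICAL

print(f"chi-square = {chi2:.3f}, df = {df}, significant: {significant}")
```

For these made-up counts every expected frequency is 25, so each cell contributes 1 and the statistic is 4, which is above 3.841.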
Example
• H0: SNPs in coding and non-coding areas are not significantly
different. (Null Hypothesis)