Hypothesis Testing

This document provides an overview of key concepts in hypothesis testing, including:
• Variables, data types, populations and samples, point estimation
• Descriptive statistics like mean and standard deviation, and how data can be summarized
• The process of hypothesis testing, including null and alternative hypotheses
• Common statistical tests like z-tests, t-tests, and chi-square tests
• Examples of hypothesis tests conducted using t-tests, with analysis of their results

Uploaded by

Ayesha Khan

Hypothesis Testing

Dr. Zoya Khalid


Data and variables
DATA: the answers to questions or measurements from the experiment

VARIABLE = measurement which varies between subjects, e.g. height or gender

Data layout: one variable per column, one row per subject

www.statstutor.ac.uk
Data types
Data variables are either:
• Scale: measurements / numerical / count data
• Categorical: appear as categories, e.g. tick boxes on questionnaires

Populations and samples
• Taking a sample from a population

Sample data ‘represents’ the whole population


Point estimation
Sample data is used to estimate parameters of a population
Statistics are calculated using sample data.
Parameters are the characteristics of population data

Sample mean $\bar{x}$ estimates population mean $\mu$
Sample SD $s$ estimates population SD $\sigma$

How can exam score data be summarised?

Exam marks for 60 students (marked out of 65)

mean = 30.3 sd = 14.46

Summary statistics
• Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Standard deviation (s) is a measure of how much the individuals differ from the mean:
  $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Large SD = very spread out data
Small SD = there is little variation from the mean

For exam scores, mean = 30.5, SD = 14.46
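
As a minimal sketch, the mean and standard deviation formulas can be evaluated with plain Python. The five scores below are hypothetical illustration values, not the 60 exam marks summarised on the slide.

```python
import math

# Hypothetical scores for illustration only (not the exam data from the slides).
scores = [12, 25, 30, 41, 47]

n = len(scores)
mean = sum(scores) / n                        # x-bar = (sum of x_i) / n
squared_devs = [(x - mean) ** 2 for x in scores]
sd = math.sqrt(sum(squared_devs) / (n - 1))   # sample SD divides by n - 1

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```

The standard library function `statistics.stdev` uses the same n - 1 divisor, so it can serve as a cross-check.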

Interpretation of standard deviation
• The larger the standard deviation, the more spread
out the data is.

Normal Distribution
• As discussed in the introductory section, normal distributions do not
necessarily have the same means and standard deviations.
• A normal distribution with a mean of 0 and a standard deviation of 1
is called a standard normal distribution.
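
Any normal distribution can be mapped onto the standard normal one by subtracting the mean and dividing by the standard deviation. The sketch below illustrates this with the exam-score summary values quoted in the slides. The score of 45 is a hypothetical value, and treating the exam scores as normally distributed is an assumption made only for this illustration.

```python
from statistics import NormalDist

mean, sd = 30.5, 14.46     # exam-score summary values quoted in the slides
score = 45                 # hypothetical score chosen for illustration

z = (score - mean) / sd    # standardised value: number of SDs above the mean
standard_normal = NormalDist(mu=0, sigma=1)
share_below = standard_normal.cdf(z)   # meaningful only if scores are roughly normal

print(f"z = {z:.2f}, share of scores below = {share_below:.3f}")
```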

Assessing Normality
Charts can be used to informally assess whether data is normally distributed or skewed.

The mean and median are very different for skewed data.
Introduction – Hypothesis testing
• In everyday life, we often have to make decisions based on
incomplete information
• Will I improve my biology grades if I spend more time studying vocabulary?
• Should I become a chemistry major to increase my chances of getting into
med school?
• Hypothesis testing is a kind of statistical inference that involves
asking a question, collecting data, and then examining what the data
tells us about how to proceed
Null and alternate hypothesis
• The hypothesis to be tested is called the null hypothesis, H0
• No difference between a hypothesized population mean and a sample mean
• College freshmen study 20 hours per week:

H0: µ = 20
• We test the null hypothesis against an alternative hypothesis, Ha
• It includes the outcomes not covered by the null hypothesis
• Alternative hypothesis would express that freshmen do not study 20 hours
per week:

Ha: µ ≠ 20
Hypothesis testing
• z-statistic
• Population S.D is known

• t-statistic
• Population S.D is not known

Degree of freedom (df) = n-1
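
Both statistics share the same form, (sample mean - hypothesized mean) / (SD / √n); they differ only in which standard deviation is used. The sketch below shows this under stated assumptions: the function name `compute_statistic` and the study-hours numbers are hypothetical and do not come from the slides.

```python
import math

def compute_statistic(sample_mean, mu0, sd, n):
    """Return (sample_mean - mu0) / (sd / sqrt(n)).

    This is a z-statistic when sd is the known population SD, and a
    t-statistic with n - 1 degrees of freedom when sd is the sample SD.
    """
    standard_error = sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Hypothetical data: 25 freshmen average 22 study hours, sample SD 5, H0: mu = 20.
n = 25
t = compute_statistic(sample_mean=22, mu0=20, sd=5, n=n)
df = n - 1
print(f"t = {t:.2f}, df = {df}")
```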


Outline of t-test

• Level of significance or alpha (α) level: 0.05


Hypothesis test direction

Ha: µ ≠ 20 (two-tailed)        Ha: µ > 20 (one-tailed)
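
The direction of the alternative hypothesis decides which tail or tails form the critical region. The following sketch expresses that decision rule. The function name and the direction labels are hypothetical, and the two critical values are standard t-table entries for df = 9 at α = 0.05.

```python
def reject_null(t, t_critical, direction):
    """Decision rule for a t-test, given a positive table value t_critical.

    direction: "two-sided" (Ha: mu != mu0), "greater" (Ha: mu > mu0)
    or "less" (Ha: mu < mu0). For a two-sided test t_critical must be the
    alpha/2 table value; for a one-sided test, the alpha table value.
    """
    if direction == "two-sided":
        return abs(t) > t_critical
    if direction == "greater":
        return t > t_critical
    if direction == "less":
        return t < -t_critical
    raise ValueError(f"unknown direction: {direction}")

# Hypothetical t = 2.0 with df = 9 and alpha = 0.05:
two_sided = reject_null(2.0, 2.262, "two-sided")  # 2.262 = two-tailed table value
one_sided = reject_null(2.0, 1.833, "greater")    # 1.833 = one-tailed table value
print(two_sided, one_sided)
```

The same t value can be significant in a one-tailed test and not in a two-tailed test, which is why the direction has to be fixed before looking at the data.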
Assumptions for t-test
• The population distribution is normal; or
• The sampling distribution is symmetric and the sample size is ≤ 15; or
• The sampling distribution is moderately skewed and the sample size is 16 ≤ n ≤ 30
• In every case, the random sample is made up of independent observations
t-test example
Problem Statement: Duracell manufactures batteries that the CEO claims
will last an average of 300 hours under normal use. A researcher randomly
selected 20 batteries from the production line and tested these batteries.
The tested batteries had a mean life span of 270 hours with a standard
deviation of 50 hours. Do we have enough evidence to suggest that the
claim of an average lifetime of 300 hours is false?

Step 1: Clearly state the Null and Alternative Hypothesis


t-test example
Step 2: Identify the appropriate significance level and confirm the test
assumptions.
α = 0.05
Step 3: Analyze the data and compute the test statistic.

df = 19
t-test example

Step 4: Interpret your results


The t-test value lies beyond the t-critical value, i.e. in the critical
region, so we reject the Null Hypothesis.
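
The four steps can be reproduced numerically from the figures in the problem statement. This is a sketch under one assumption: the test is treated as two-tailed (H0: µ = 300 against Ha: µ ≠ 300), so the critical value 2.093 is the two-tailed t-table entry for df = 19 at α = 0.05.

```python
import math

mu0 = 300      # claimed mean lifetime (hours)
x_bar = 270    # sample mean (hours)
s = 50         # sample standard deviation (hours)
n = 20         # number of batteries tested

t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
t_critical = 2.093   # two-tailed table value, df = 19, alpha = 0.05 (assumed two-tailed)

reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```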
Example
t-test example
Analyze the data and compute the test statistic.

df = 29

t= 10.96
T- Table
Example
• In the past a machine has produced washers having a mean thickness
of 0.050 inches. To determine whether the machine is in proper
working order, a sample of 10 washers was taken, for which the mean
thickness is 0.053 inches and the standard deviation is 0.003 inches.
Test the hypothesis that the machine is in proper working order, using
a level of significance of 0.05.

• H0 is rejected: t = (0.053 - 0.050) / (0.003 / √10) ≈ 3.16, which exceeds
the two-tailed critical value of 2.262 for df = 9
Example
• A drug company claimed that the medicine Tylenol can increase the
platelet count by at least 50,000 on average. A sample of 30
dengue fever patients was tested, and the observed mean increase
in the platelet count was 52,000 with a standard deviation of 5,000.
Is the drug affecting the platelet count?

• H0 is accepted
Example
• A process is in control when the average amount of instant coffee
that is packed is 6 oz. The standard deviation is 0.2 oz. A sample of
100 jars is selected at random and the sample average is found to be 6.1
oz. Is the process out of control?
• H0 is rejected
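
Because the process standard deviation is given as known, this example calls for a z-test. The sketch below checks the stated conclusion using only the Python standard library. The significance level of 0.05 and the two-tailed direction are assumptions, since the example does not state them.

```python
import math
from statistics import NormalDist

mu0, sigma = 6.0, 0.2    # in-control mean and known SD (oz)
x_bar, n = 6.1, 100      # sample average and sample size
alpha = 0.05             # assumed; the example does not state a level

z = (x_bar - mu0) / (sigma / math.sqrt(n))
z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a two-tailed test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-tailed p-value

reject_h0 = abs(z) > z_critical
print(f"z = {z:.2f}, critical = {z_critical:.2f}, reject H0: {reject_h0}")
```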
Protein-Drug Interaction Example
• Sorafenib
• Approved for the treatment of primary kidney cancer and primary liver cancer
• Small molecular inhibitor of
• Raf kinase
• Platelet-derived growth factor
• VEGF receptor 2 & 3 kinases
• c-Kit

• Assumptions
• Docking scores are normally distributed
Protein-Drug Interaction Example
Protein Name                                                          Docking Score
cAMP-specific 3',5'-cyclic phosphodiesterase 4B                       -167.63
Proto-oncogene tyrosine-protein kinase Src                            -92.22
MHC Class I (HLA A / HLA B)                                           -73.87 / -85.8
Mitogen-activated protein kinase 1                                    -72.66
MAP kinase-activated protein kinase 2                                 -50.03
Nicotinate-nucleotide pyrophosphorylase                               -40.77
Ferrochelatase, mitochondrial                                         -13.99
Protein kinase C eta type                                             -11.71
cAMP and cAMP-inhibited cGMP 3',5'-cyclic phosphodiesterase 10A       -11.16
VEGFR2                                                                -56.28

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

$\bar{x} = -61.984$, $\mu_0 = 0$, $s = 45.722$, $n = 10$

$t = \frac{-61.984 - 0}{45.722 / \sqrt{10}} = -4.287$

t-critical = 1.833
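
The slide's figures can be re-derived from the listed docking scores. The sketch rests on two assumptions inferred from the numbers rather than stated in the slides: the sample consists of the ten scores listed before VEGFR2 (with HLA A and HLA B counted separately), and s = 45.722 is the standard deviation computed with divisor n. The n - 1 version is shown alongside for comparison.

```python
import math
import statistics

# Ten docking scores listed before VEGFR2 (HLA A and HLA B counted separately).
scores = [-167.63, -92.22, -73.87, -85.8, -72.66,
          -50.03, -40.77, -13.99, -11.71, -11.16]
mu0 = 0
n = len(scores)
x_bar = statistics.mean(scores)

s_slide = statistics.pstdev(scores)   # divisor n; reproduces the slide's s
s_sample = statistics.stdev(scores)   # divisor n - 1, as in the sample SD formula

t_slide = (x_bar - mu0) / (s_slide / math.sqrt(n))
t_sample = (x_bar - mu0) / (s_sample / math.sqrt(n))

print(f"x_bar = {x_bar:.3f}, s (divisor n) = {s_slide:.3f}, t = {t_slide:.3f}")
print(f"s (divisor n - 1) = {s_sample:.3f}, t = {t_sample:.3f}")
```

With either standard deviation, the magnitude of t is well beyond the quoted critical value of 1.833, so the conclusion does not depend on that choice.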
The Chi-Square Test

• Chi-square test: an inferential statistics technique designed to test
for significant relationships between two variables.

• Chi-square requires no assumptions about the shape of the
population distribution from which a sample is drawn.
The Chi-Square Test
• A statistical method used to determine goodness of fit
• Goodness of fit refers to how close the observed data are to those predicted
from a hypothesis

• Note:
• The chi square test does not prove that a hypothesis is correct
• It evaluates to what extent the data and the hypothesis have a good fit
Hypothesis Testing with Chi-Square
Chi-square follows five steps:
1. Making assumptions (random sampling)

2. Stating the research and null hypotheses

3. Selecting the sampling distribution and specifying the test statistic

4. Computing the test statistic

5. Making a decision and interpreting the results


Stating Research and Null Hypotheses

• The research hypothesis (H1) proposes that the two variables are
related in the population.

• The null hypothesis (H0) states that no association exists between the
two cross-tabulated variables in the population, and therefore the
variables are statistically independent.
H0: There is no association between the two
variables.

Gender and fear of walking alone at night are statistically independent.

Afraid Men Women Total


No 71.1% 71.1% 71.1%
Yes 28.9% 28.9% 28.9%
Total 100% 100% 100%
H1: The two variables are related in the
population.

Gender and fear of walking alone at night are statistically dependent.

Afraid Men Women Total


No 83.3% 57.2% 71.1%
Yes 16.7% 42.8% 28.9%
Total 100% 100% 100%
The Concept of Expected Frequencies
Expected frequencies fe: the cell frequencies that
would be expected in a bivariate table if the two
variables were statistically independent.

Observed frequencies fo: the cell frequencies actually observed in a bivariate table.
Chi-Square (obtained)
• The test statistic that summarizes the differences between the
observed (fo) and the expected (fe) frequencies in a bivariate
table.
Interpret the chi-square values
• Low chi square values indicate a high probability that the observed deviations
could be due to random chance alone
• High chi square values indicate a low probability that the observed deviations
are due to random chance alone

• If the chi square value results in a probability that is less than 0.05 (i.e. less
than 5%), it is considered statistically significant
• The null hypothesis is rejected
Calculating the Obtained Chi-Square

$\chi^2 = \sum \frac{(f_e - f_o)^2}{f_e}$
fe = expected frequencies
fo = observed frequencies
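
The sketch below works through the obtained chi-square for a 2×2 table. The observed counts are hypothetical: the slides give only percentages, so the counts were invented for illustration. Each expected frequency is computed as row total × column total / grand total, which is the cell count implied by statistical independence. The critical value 3.841 is the standard chi-square table entry for df = 1 at α = 0.05.

```python
# Hypothetical observed counts (rows: not afraid, afraid; columns: men, women).
observed = [[83, 57],
            [17, 43]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / total   # expected count under independence
        chi_square += (f_e - f_o) ** 2 / f_e

df = (len(observed) - 1) * (len(observed[0]) - 1)     # (rows - 1) * (columns - 1)
chi_critical = 3.841                                  # table value, df = 1, alpha = 0.05

reject_h0 = chi_square > chi_critical
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```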
Example
• H0: SNPs in coding and non-coding areas are not significantly
different. (Null Hypothesis)

• H1: They are significantly different. (Alternate hypothesis)

• If the test statistic is higher than the critical value, the null hypothesis
is rejected, which means the two variables are significantly different.
• If the test statistic is lower than the critical value and the p-value is
higher than alpha (p > 0.05), the null hypothesis is not rejected, which means
the difference between the two variables is not statistically significant.
