
4 Inferentials

The document provides an overview of statistical inference, focusing on estimation and hypothesis testing. It explains key concepts such as point and interval estimation, confidence intervals, and the steps involved in hypothesis testing, including the formulation of null and alternative hypotheses. Additionally, it discusses the types of errors in hypothesis testing and provides examples to illustrate these concepts.

Uploaded by

Abas Ahmed
Copyright: © All Rights Reserved

College of health and medical science

Department of Epidemiology and Biostatistics
Statistical inference

By Adisu B. (MPH, Assistant professor)


Objectives
After completing this session, you will be able to:
• Understand the basics of statistical inference
• Apply statistical inference to real data sets
Introduction #1
• Inference is the process of generalizing or drawing conclusions about the target population on the basis of results obtained from a sample.
It involves:
1. Estimation
2. Hypothesis testing
Its purpose is to make decisions about population characteristics.
Introduction #2
Two facts are key to statistical inference:
• Population parameters are fixed numbers whose values are usually unknown.
• Sample statistics are known values for any given sample, but vary from sample to sample taken from the same population.
This variability of sample statistics (sampling variation) is always present and must be accounted for in any inferential procedure by identifying probability distributions that describe the variability of sample statistics.
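As a minimal sketch of sampling variation, the simulation below draws several samples from one population and shows that the parameter stays fixed while the statistic changes from sample to sample. The population (normal, mean 120, SD 15), the sample size, and the seed are all hypothetical values chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 values (assumed normal, mean 120, SD 15).
population = [random.gauss(120, 15) for _ in range(10_000)]

# The parameter is a fixed number; we know it here only because we simulated it.
mu = statistics.fmean(population)

# Draw five samples of the same size; each one gives a different sample mean.
n = 25
sample_means = [statistics.fmean(random.sample(population, n)) for _ in range(5)]

print(f"population mean (parameter): {mu:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Rerunning with a different seed changes every sample mean but leaves the interpretation the same: the statistic varies around a fixed parameter.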
Estimation
• In the real world, the values of population parameters are fixed and usually not known.
• Instead, we must try to say something about the way in which a variable is distributed using the information contained in a sample of observations.
• Two broad categories of statistical inference: estimation and hypothesis testing.
Estimation
• Is concerned with estimating the values of specific population parameters based on sample statistics.
• Uses information in a sample to estimate the characteristics (parameters) of the source population.
Examples: A sample survey revealed:
• The proportion of smokers among a certain population group aged 15 to 24 is 6%.
• The mean SBP among the sampled population is 130.
• The next question is: what can we predict about the characteristics of the population from which the sample was drawn?
Two methods of estimation are commonly used: point estimation and interval estimation.
• Point estimation involves the calculation of a single number to estimate the population parameter. It represents a "single best guess": [value].
• Interval estimation specifies a range of reasonable values for the parameter, in the form of a "range of plausible values": [lower limit, upper limit].
Estimating the Sampling Error
• Any estimate derived from a sample is subject to sampling error.
• This comes from the fact that only a part of the population was observed, instead of the whole.
• Different samples could have produced different results.
• The amount of variation that exists among the estimates from the different possible samples is the sampling error.
• The set of sample means in repeated random samples of size n from a given population has variance σ²/n.
• The standard deviation of this set of sample means is σ/√n and is referred to as the standard error of the mean (sem) or the standard error.
• The sem is estimated by s/√n if σ is unknown.
• As n increases, the sample mean x̄ and the sample variance s² approach the values of the true population parameters, µ and σ², respectively.
Example
• Suppose that the mean ± standard deviation of DBP in 20 old males is 78.5 ± 10.3 mm Hg.
1. What is the point estimate of µ?
2. Compute the standard error of the mean.
3. Compare the standard error of the mean with the sd.
Ans:
• The best estimate of µ is 78.5.
• The sem of this estimate is 10.3/√20 = 2.3.
• The sem (2.3) is much smaller than the sd (10.3).
1. Point Estimate
• A single numerical value used to estimate the corresponding population parameter.
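As a small sketch, the point estimate and standard error from the DBP example above (mean 78.5, sd 10.3, n = 20) can be reproduced as follows. The variable names are illustrative only.

```python
import math

# Values taken from the DBP example: mean ± sd of 78.5 ± 10.3 mm Hg in 20 males.
sample_mean = 78.5
sample_sd = 10.3
n = 20

# The sample mean is the point estimate of the population mean µ.
point_estimate = sample_mean

# Estimated standard error of the mean: s / sqrt(n).
sem = sample_sd / math.sqrt(n)

print(f"point estimate of mu: {point_estimate}")
print(f"standard error of the mean: {sem:.1f}")
print(f"sd is {sample_sd / sem:.1f} times larger than the sem")
```

The ratio sd/sem equals √n, which is why the sem shrinks as the sample grows while the sd does not.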
2. Interval Estimation
• Interval estimation specifies a range of reasonable values for the population parameter based on a point estimate.
• A confidence interval is a particular type of interval estimator.

Confidence Intervals
• Give a plausible range of values that is likely to include the “true” (population) value with a given confidence level.
• CIs also give information about the precision of an estimate.
• When sampling variability is high, the CI will be wide to reflect the uncertainty of the observation. Wider CIs indicate less certainty.
• CIs can also answer the question of whether or not an association exists (analogous to p-values).
• A narrow CI reflects a large sample size, low variability, or both.
General Formula:
The general formula for all CIs is:
point estimate ± (measure of how confident we want to be) × (standard error)
• Point estimate: the value of the statistic in the sample (e.g., mean, proportion, etc.).
• Measure of confidence: taken from a Z table or a T table, depending on the sampling distribution of the statistic.
A confidence interval has 3 components:
1) A point estimate (e.g. the sample mean or proportion)
2) The standard error of the point estimate (e.g. SEM = σ/√n)
3) A confidence coefficient (the critical value)
• Lower limit = point estimate − (confidence coefficient) × (standard error)
• Upper limit = point estimate + (confidence coefficient) × (standard error)
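The lower and upper limit formulas can be sketched as a small function. The function name `confidence_interval` is hypothetical, and the usage applies it to the earlier DBP example with z = 1.96 purely for illustration; with a small sample and unknown σ, a value from the T table would normally replace 1.96.

```python
import math

def confidence_interval(point_estimate, standard_error, confidence_coefficient):
    """Return (lower, upper) = point estimate -/+ coefficient x standard error."""
    margin = confidence_coefficient * standard_error
    return point_estimate - margin, point_estimate + margin

# Illustration with the DBP example: mean 78.5, sd 10.3, n = 20.
sem = 10.3 / math.sqrt(20)

# Assumption: z = 1.96 (95% confidence) is used here only to show the mechanics.
lower, upper = confidence_interval(78.5, sem, 1.96)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Keeping the three components as separate arguments mirrors the slide: changing only the confidence coefficient changes the width without touching the estimate or its standard error.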
Confidence Level
• The confidence with which the interval will contain the unknown population parameter.
• Expressed as a percentage (less than 100%).
• Example: 95%, also written (1 − α) = 0.95.
Confidence intervals…
• A 90% CI is narrower than a 95% CI, since we are only 90% certain that the interval includes the population parameter.
• On the other hand, a 99% CI will be wider than a 95% CI; the extra width means that we can be more certain that the interval will contain the population parameter.
• To obtain higher confidence from the same sample, we must be willing to accept a larger margin of error (a wider interval).
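To make the trade-off concrete, the sketch below computes 90%, 95% and 99% intervals from the same sample. The point estimate (78.5) and standard error (2.3) are borrowed from the DBP example as illustrative inputs, and the z values are taken from Python's `statistics.NormalDist` rather than a printed table.

```python
from statistics import NormalDist

point_estimate = 78.5   # illustrative value (DBP example)
standard_error = 2.3    # illustrative value (DBP example)

widths = {}
for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    # Two-sided critical value: the z with area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = point_estimate - z * standard_error
    upper = point_estimate + z * standard_error
    widths[level] = upper - lower
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f}), width {widths[level]:.2f}")
```

The sample is identical in all three cases; only the confidence coefficient changes, so the width grows with the confidence level.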
Confidence intervals…
• For a given confidence level (e.g. 90%, 95%, 99%), the width of the confidence interval depends on the standard error of the estimate, which in turn depends on:
• 1. Sample size: The larger the sample size, the narrower the confidence interval (that is, the sample statistic will approach the population parameter) and the more precise our estimate.
• Lack of precision means that in repeated sampling the values of the sample statistic are spread out or scattered.
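A short sketch of the sample-size effect: holding the standard deviation and confidence level fixed, the margin of error shrinks in proportion to 1/√n. The sd of 10.3 and the sample sizes are illustrative assumptions.

```python
import math

sd = 10.3   # illustrative standard deviation, held fixed
z = 1.96    # 95% confidence coefficient

# Margin of error = z * sd / sqrt(n) for increasing sample sizes.
margins = {n: z * sd / math.sqrt(n) for n in (20, 80, 320)}

for n, margin in margins.items():
    print(f"n = {n:>3}: margin of error = {margin:.2f}")
```

Each fourfold increase in n halves the margin of error, so gains in precision become progressively more expensive in terms of sample size.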
Confidence intervals…
• To increase precision, use a larger sample. You can make the precision as high as you want by taking a large enough sample. The margin of error decreases as √n increases.
• 2. Standard deviation: The more variation among the individual values, the wider the confidence interval and the less precise the estimate. Note that it is the standard error, not the SD itself, that decreases as the sample size increases.
• Z is the value from the standard normal distribution (SND):
• 90% CI, z = 1.645
• 95% CI, z = 1.96
(2, 6.1)

(10, 100)
Hypothesis testing
• A hypothesis usually results from speculation concerning observed behavior, natural phenomena, or established theory.
• The statistical hypothesis is stated in terms of population parameters such as the mean and proportion.
Hypothesis
• A statistical hypothesis is an assumption, claim or statement, which may or may not be true, concerning one or more populations.
• Is a statement about one or more population parameters.
• Is frequently concerned with the parameters of the population about which the statement is made.
Examples of Research Hypotheses
Population Mean
• The average length of stay of patients admitted to the hospital is five days.
Population Proportion
• The proportion of adult smokers in Harar is P = 0.20.
Types of Hypothesis
1. The Null Hypothesis, H0
• Is a statement claiming that there is no difference between the hypothesized value and the population value (the effect of interest is zero = no difference).
• States the assumption (hypothesis) to be tested.
• H0 is a statement of agreement (or no difference).
• H0 is always about a population parameter, not about a sample statistic.
• We begin with the assumption that H0 is true.
• Always contains an “=”, “≤” or “≥” sign.
• May or may not be rejected.
2. The Alternative Hypothesis, HA
• Is a statement of what we will believe is true if our sample data cause us to reject Ho.
• Is a statement that disagrees with (opposes) Ho (the effect of interest is not zero).
• Never contains an “=”, “≤” or “≥” sign; instead it contains ≠, < or >.
• May or may not be accepted.
Steps in Hypothesis Testing
1. Choose the null hypothesis that is to be questioned.
2. Choose an alternative hypothesis which is accepted if the original hypothesis is rejected.
3. Choose a rule for making a decision about when to reject the original hypothesis and when to fail to reject it.
4. Choose a random sample from the appropriate population and compute the appropriate statistics: that is, mean, proportions and so on.
5. Make the decision.
Rules for Stating Statistical Hypotheses
• An indication of equality (either =, ≤ or ≥) must appear in Ho.
Ho: μ = μo, HA: μ ≠ μo
Ho: P = Po, HA: P ≠ Po
• Can we conclude that a certain population mean is not 50?
Ho: μ = 50, HA: μ ≠ 50
• Can we conclude that a certain population mean is greater than 50?
Ho: μ ≤ 50, HA: μ > 50
• Can we conclude that the proportion of patients with leukemia who survive more than six years is not 60%?
Ho: P = 0.6, HA: P ≠ 0.6
• Now think about how the hypothesis test should be carried out.
• We draw a random sample of size n from the underlying population and calculate its sample mean (x̄).
• Based on sample data, can we conclude that the average weight of students in this class is above 50 kg?
Ho: μ ≤ 50, HA: μ > 50
• Can we conclude from the available research that the proportion of TB patients in Harari region is 20%?
Ho: P = 0.2, HA: P ≠ 0.2
Decision Rule
• The decision to reject or not to reject Ho is based on the magnitude of the test statistic.
• An example of a test statistic is the quantity
z = (x̄ − μo) / (σ/√n)
• When the variance of the population is unknown and the sample size is small, the test statistic is:
t = (x̄ − μo) / (s/√n), with n − 1 degrees of freedom
Rejection and Non-Rejection Regions
• The values that the test statistic can assume lie on the horizontal axis of the normal distribution and are divided into two groups:
• Rejection region, and
• Non-rejection region.
Example: Two-sided test at α = 5%
[Figure: standard normal curve. Rejection region below −1.96 (α/2 = 0.025), non-rejection region between −1.96 and 1.96 (0.95), rejection region above 1.96 (α/2 = 0.025).]
Statistical Decision
• Reject Ho if the value of the test statistic that we compute from our sample is one of the values in the rejection region.
• Don’t reject Ho if the computed value of the test statistic is one of the values in the non-rejection region.
One tail and two tail tests
• In a one tail test, the rejection region is at one end of the distribution or the other.
• In a two tail test, the rejection region is split between the two tails.
• Which one is used depends on the way Ho is written; together with the level of significance, this determines the rejection region.
Example:
• The average survival time after cancer diagnosis is less than 3 years (a one tail test).
Critical values of z:
Confidence level (%)   One-tailed test   Two-tailed test
90                     1.28              1.645
95                     1.645             1.96
99                     2.33              2.58
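The tabulated values can be checked with a short sketch that uses the standard normal inverse CDF from Python's `statistics.NormalDist`. The helper name `critical_values` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mean 0, sd 1

def critical_values(confidence_level):
    """Return (one-tailed z, two-tailed z) for a given confidence level."""
    alpha = 1 - confidence_level
    one_tailed = std_normal.inv_cdf(1 - alpha)       # all of alpha in one tail
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # alpha split across both tails
    return one_tailed, two_tailed

for level in (0.90, 0.95, 0.99):
    one, two = critical_values(level)
    print(f"{level:.0%}: one-tailed z = {one:.3f}, two-tailed z = {two:.3f}")
```

Note that the one-tailed value at 95% equals the two-tailed value at 90%, because both leave 5% in a single tail.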
Types of Errors in Hypothesis Tests
• Whenever we reject or fail to reject Ho, we risk committing an error.
• Two types of errors can be committed:
• Type I Error
• Type II Error
Type I Error
• The error committed when a true Ho is rejected.
• The probability of a type I error is the probability of rejecting Ho when it is true.
• The probability of a type I error is α.
• α is called the level of significance of the test.
• It is set by the researcher in advance.
Type II Error
• The error committed when a false Ho is not rejected.
• The probability of a Type II error is β.
Power
• The probability of rejecting Ho when it is false.
• Power = 1 − β = 1 − probability of a type II error.
• We would like to maintain a low probability of a Type I error (α)
• and a low probability of a Type II error (β) [high power = 1 − β].
Action (Conclusion)   Reality: Ho True                          Reality: Ho False
Do not reject Ho      Correct action (Prob. = 1 − α)            Type II error (Prob. = β = 1 − Power)
Reject Ho             Type I error (Prob. = α = Sign. level)    Correct action (Prob. = Power = 1 − β)
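As a hedged illustration of how α, β and power relate, the sketch below computes the power of a two-sided z test for a single mean with known σ. The function name `power_two_sided_z` and all numeric inputs (µ0 = 100, true mean 105, σ = 10, n = 16 or 64) are hypothetical values chosen for the example, not figures from the slides.

```python
import math
from statistics import NormalDist

def power_two_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-sided z test of Ho: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the true mean, the test statistic is normal with this mean and sd 1.
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # Probability of landing in either rejection region (Z >= z_crit or Z <= -z_crit).
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Hypothetical scenario: Ho says mu = 100, but the true mean is 105.
power_small = power_two_sided_z(100, 105, sigma=10, n=16)
power_large = power_two_sided_z(100, 105, sigma=10, n=64)
print(f"n = 16: power = {power_small:.3f}, beta = {1 - power_small:.3f}")
print(f"n = 64: power = {power_large:.3f}, beta = {1 - power_large:.3f}")

# When Ho is actually true, the rejection probability is just alpha (type I error).
print(f"Ho true: P(reject) = {power_two_sided_z(100, 100, 10, 16):.3f}")
```

With α held fixed, a larger sample lowers β and therefore raises power, which is the usual way to keep both error probabilities small.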
Type I & II Error Relationship
Hypothesis testing for single population mean
• EXAMPLE: A researcher claims that the mean SBP of 16 patients at HFSUH is 110, while the expected value for the whole population is 100 with a standard deviation of 10. Test the hypothesis (assume the population is normally distributed).
Solution
1. Ho: µ = 100 vs HA: µ ≠ 100
2. Assume α = 0.05
3. Test statistic: z = (110 − 100) / (10/√16) = 10/2.5 = 4
4. z-critical at α/2 = 0.025 is 1.96.
5. Decision: reject the null hypothesis since 4 ≥ 1.96.
6. Conclusion: the mean SBP of the population is different from 100 at the 5% level of significance.
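The solution steps can be sketched in code. The function name `one_sample_z_test` is hypothetical, and the sketch assumes, as the example does, a known population standard deviation and a two-sided alternative.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z test for a single mean with known sigma.

    Returns (z, z_critical, reject_ho).
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    reject_ho = abs(z) >= z_critical
    return z, z_critical, reject_ho

# SBP example: sample mean 110, hypothesized mean 100, sigma 10, n = 16.
z, z_critical, reject_ho = one_sample_z_test(110, 100, 10, 16)
print(f"z = {z:.2f}, critical value = {z_critical:.2f}, reject Ho: {reject_ho}")
```

Comparing |z| with the critical value, rather than z itself, lets the same rule cover both rejection regions of the two-sided test.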
Inference for single proportions
• Example: In a study of intimate partner violence (IPV), research showed that 166 in a sample of 947 patients reported histories of IPV.
a) Construct a 95% confidence interval.
b) Test the hypothesis that the true population proportion is 30%.
• Solution (a)
• The 95% CI for P is given by
p̂ ± z(α/2) × √(p̂(1 − p̂)/n)
= 0.175 ± 1.96 × √(0.175 × 0.825/947)
= 0.175 ± 1.96 × 0.0124
≈ [0.151 ; 0.20]
Example……
To test the hypothesis we need to follow these steps:
Step 1: State the hypotheses
Ho: P = Po = 0.3
Ha: P ≠ 0.3
Step 2: Fix the level of significance (α = 0.05)
Step 3: Compute the calculated and tabulated values of the test statistic
zcal = (p̂ − Po) / √(Po(1 − Po)/n) = (0.175 − 0.3) / √(0.3 × 0.7/947) = −0.125/0.0149 = −8.39
ztab = 1.96
Example……
• Step 4: Compare the calculated and tabulated values of the test statistic.
• Since the absolute calculated value (8.39) is greater than the tabulated value (1.96), we reject the null hypothesis.
• Step 5: Conclusion
• Hence we conclude that the proportion of IPV in psychiatry patients is different from 0.3 at the 5% level of significance.
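Both parts of the IPV example can be reproduced with a short sketch. It uses the unrounded proportion 166/947, so the results differ slightly from the slide arithmetic, which rounds p̂ to 0.175 (for instance, z comes out near −8.37 rather than −8.39). Variable names are illustrative.

```python
import math
from statistics import NormalDist

x, n = 166, 947      # patients reporting IPV, sample size
p0 = 0.30            # hypothesized population proportion
alpha = 0.05

p_hat = x / n
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# (a) 95% CI: the standard error is based on the sample proportion.
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_crit * se_ci, p_hat + z_crit * se_ci)

# (b) Test of Ho: P = p0: the standard error is based on the hypothesized value.
se_null = math.sqrt(p0 * (1 - p0) / n)
z_cal = (p_hat - p0) / se_null
reject_ho = abs(z_cal) >= z_crit

print(f"p_hat = {p_hat:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z_cal:.2f}, critical value = {z_crit:.2f}, reject Ho: {reject_ho}")
```

The two standard errors differ on purpose: the interval describes uncertainty around the observed proportion, while the test measures distance from the hypothesized one. The interval also excludes 0.3, which agrees with the decision to reject Ho.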


Example: Hypothesis Testing
A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is not 30? The variance is known to be 20 and the population is normally distributed.
Take α = 0.05.
State the Hypotheses
Ho: µ = 30
HA: µ ≠ 30
Test statistic
• As the population variance is known, we use Z as the test statistic.
Decision Rule
• Reject Ho if the Z value falls in the rejection region.
• Don’t reject Ho if the Z value falls in the non-rejection region.
• Because of the structure of Ho, it is a two tail test.
• Therefore, reject Ho if Z ≤ −1.96 or Z ≥ 1.96.
Calculation of test statistic
Z = (x̄ − µo) / (σ/√n) = (27 − 30) / √(20/10) = −3/1.414 = −2.12
Statistical decision
We reject Ho because Z = −2.12 is in the rejection region.
Conclusion
• We can conclude that µ is not 30 at the 5% level of significance.
Question?

THE END OF INFERENCE CHAPTER
