Lecture 5

Uploaded by

Moybon Kalif

Statistical Estimation Techniques

1
Introduction
In the real world, the values of population parameters are fixed
and usually not known.
Instead, we must try to say something about the way in which
a variable is distributed using the information contained in a
sample of observations.
The process of drawing conclusions about an entire population
based on the data in a sample is known as statistical inference.
Two broad categories: Estimation and Hypothesis testing.

2
Estimation
Is concerned with estimating the values of specific population
parameters based on sample statistics.
Is about using information in a sample to make estimates of the
characteristics (parameters) of the source population.

Examples: A sample survey revealed:

• The proportion of smokers among a certain population group aged 15 to 24.

• The mean systolic blood pressure (SBP) among the sampled population.

The next question is: what can we infer about the characteristics of
the population from which the sample was drawn?
3
Estimation, Estimator & Estimate
♣ Estimation is the computation of a statistic from sample data,
often yielding a value that is an approximation (guess) of its
target, an unknown true population parameter value.

♣ The statistic itself is called an estimator and can be of two

types - point or interval.

♣ The value or values that the estimator assumes are called

estimates.
4
Two methods of estimation are commonly used:
point estimation and interval estimation

Point estimation involves the calculation of a single number to

estimate the population parameter
Interval estimation specifies a range of reasonable values for
the parameter

5
Point versus Interval Estimators
♣ An estimator that represents a "single best guess" is called a
point estimator.

♣ When the estimate is of the form of a "range of plausible

values", it is called an interval estimator.

 Thus,
 A point estimate is of the form: [ Value ],

 Whereas, an interval estimate is of the form: [ lower limit,

upper limit ]
6
Sample mean (x̄) is an unbiased estimator of the population mean (µ).

7
Estimating the Sampling Error

• Any estimate derived from a sample is subject to sampling error.
• This comes from the fact that only a part of the population
was observed, instead of the whole.
• A different sample could have produced a different result.

• The amount of variation that exists among the estimates from
the different possible samples is the sampling error.
8
• The set of sample means in repeated random samples of size n from a
given population has variance σ²/n.
• The standard deviation of this set of sample means is σ/√n and is
referred to as the standard error of the mean (sem) or the standard
error.
• The sem is estimated by s/√n if σ is unknown.

9
• The sampling error depends on the sample size (n) and the
variability of individual sample points (σ).

• As n increases, the sample mean (x̄) and the sample variance
s² approach the values of the true population parameters, µ
and σ², respectively.

10
Example
• Suppose that the mean ± sd of diastolic blood pressure (DBP) in 20 older
males is 78.5 ± 10.3 mm Hg.

1. What is our best estimate of µ?

2. What is the sem?

3. Compare the sem with the sd.

11
• The following table gives the se for the mean of DBP for different
sample sizes.
• Our best estimate of µ is the sample mean, 78.5 mm Hg.

• The sem of this estimate is 10.3/√20 = 2.3 mm Hg.

• The sem (2.3) is much smaller than the sd (10.3), because the sem describes
the variability of the sample mean rather than of individual observations.

12
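The sem calculation in the DBP example can be reproduced with a short Python sketch. The function name `standard_error` is an illustrative choice, not part of the lecture; the inputs (sd = 10.3, n = 20) come from the example, and the larger sample sizes in the loop are assumptions added only to show how the sem shrinks.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Estimated standard error of the mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sd / math.sqrt(n)


# DBP example from the slides: sd = 10.3 mm Hg, n = 20
print(round(standard_error(10.3, 20), 1))  # 2.3

# Illustrative sample sizes (assumed): quadrupling n halves the sem
for n in (20, 80, 320):
    print(n, round(standard_error(10.3, n), 2))
```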
1. Point Estimate
• A single numerical value used to estimate the corresponding
population parameter.
Sample statistics are estimators of population parameters:

Sample statistic (estimator)         Population parameter
Sample mean, x̄                       µ
Sample variance, s²                  σ²
Sample proportion, p                 P or π
Sample odds ratio, OR̂                OR
Sample relative risk, RR̂             RR
Sample correlation coefficient, r    ρ
13
2. Interval Estimation
 Interval estimation specifies a range of reasonable values for
the population parameter based on a point estimate.
 A confidence interval is a particular type of interval estimator.

Confidence Intervals
 Give a plausible range of values of the estimate likely to include
the “true” (population) value with a given confidence level.
 An interval estimate provides more information about a
population characteristic than does a point estimate
14
• CIs also give information about the precision of an estimate.

• When sampling variability is high, the CI will be wide to reflect
the uncertainty of the observation.

• Wider CIs indicate less certainty.

• CIs can also answer the question of whether or not an
association exists (analogous to p-values).

• A narrow CI reflects a large sample size, low variability,
or both.
15
General Formula:
The general formula for all CIs is:

point estimate ± (measure of how confident we want to be) × (standard error)

• Point estimate: the value of the statistic in the sample
(e.g., mean, proportion, etc.).
• Measure of confidence: taken from a Z table or a t table, depending on the
sampling distribution of the statistic.

16
A confidence interval has 3 components:

1) A point estimate (e.g. the sample mean)

2) The standard error of the point estimate (e.g. SEM = σ/√n)

3) A confidence coefficient (the critical value)

Lower limit = Point Estimate - (Critical Value) × (Standard Error)
Upper limit = Point Estimate + (Critical Value) × (Standard Error)
17
Confidence Level
• The confidence with which the interval will contain the unknown
population parameter.
• A percentage (less than 100%).

Example: 95%
• Also written (1 - α) = 0.95

18
Definition of 95% CI
1. Probabilistic interpretation:
 If all possible random samples of a given sample size were obtained
and if each were used to obtain its own CI, then 95% of all such CIs
would contain the unknown population parameter; the remaining 5%
would not.

2. Practical interpretation
• When sampling is from a normally distributed population with known
standard deviation, we are 100(1-α)% [e.g., 95%] confident that the
single computed interval contains the unknown population
parameter.
19
Estimation for Single Population

20
1. CI for a Population Mean (normally distributed)

A. Known variance (large sample size)

Consider the task of computing a CI estimate of μ for a

population distribution that is normal with σ known.
 Available are data from a random sample of size = n.

21
Assumptions
• Population standard deviation (σ) is known

• Population is normally distributed

• If the population is not normal, use a large sample

A 100(1-α)% C.I. for µ is:

x̄ ± z_{α/2} × σ/√n

• α is chosen by the researcher; the most common values
of α are 0.1, 0.05 and 0.01.
22
• The corresponding commonly used confidence levels are 90%, 95%, and 99%.

23
Finding the Critical Value

24
Margin of Error
(Precision of the estimate)

25
Factors Affecting Margin of Error

The width of the CI for a mean (twice the margin of error) is determined by n, s,
and α.
As n increases, the CI becomes narrower.

As s increases, the CI becomes wider.

As the confidence level increases (α decreases), the CI
becomes wider.
26
Example:
1. Waiting times (in hours) at a particular hospital are believed to
be approximately normally distributed with a variance of
2.25 hr².

a. A sample of 20 outpatients revealed a mean waiting time of
1.52 hours. Construct the 95% CI for the estimate of the
population mean.

b. Suppose that the mean of 1.52 hours had resulted from a
sample of 32 patients. Find the 95% CI.

c. What effect does a larger sample size have on the CI?
27

a. 1.52 ± 1.96 × √(2.25/20) = 1.52 ± 1.96(0.335)
= 1.52 ± 0.66 = (0.86, 2.18)

• We are 95% confident that the true mean waiting time is
between 0.86 and 2.18 hrs.
• 95% of the intervals formed in this manner will contain the true
mean.

28
b. 1.52 ± 1.96 × √(2.25/32) = 1.52 ± 1.96(0.265)
= 1.52 ± 0.52 = (1.00, 2.04)

c. A larger sample size makes the CI narrower (more
precise).
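The waiting-time example can be checked with a small Python sketch. The helper name `z_confidence_interval` is hypothetical; it assumes a normally distributed population with known variance, as the example states, and takes the critical value from the standard normal distribution. Hand calculations that round the standard error early can differ from this output in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, variance, n, confidence=0.95):
    """CI for a population mean when the population variance is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for 95%
    margin = z * sqrt(variance / n)           # margin of error = z * standard error
    return mean - margin, mean + margin


# Parts (a) and (b): same mean and variance, n = 20 versus n = 32
for n in (20, 32):
    lower, upper = z_confidence_interval(1.52, 2.25, n)
    print(n, round(lower, 2), round(upper, 2))
```

The interval for n = 32 is narrower than the one for n = 20, which is the effect asked about in part (c).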

29
• When constructing CIs, it has been assumed that the standard
deviation of the underlying population, σ, is known.
• What if σ is not known?

• In this case, the population SD (σ) can be replaced by the
sample SD (s) if the sample size is large enough (n > 30).
With a large sample size, the sampling distribution of the mean is
approximately normal.

30
• Example: In a sample of 35 patients, the mean lateness for
appointments was 17.2 minutes, with an SD of 8
minutes. What is the 90% CI for µ? Ans: (14.98, 19.42).
• Since the sample size is fairly large (>30) and the
population SD is unknown, we assume the distribution
of the sample mean to be normal based on
the CLT and use the sample SD in place of the population σ.
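The stated answer can be verified by writing out the large-sample interval. For a 90% CI, α = 0.10, so the critical value is z at 0.05 in the upper tail, 1.645:

```latex
\bar{x} \pm z_{0.05}\,\frac{s}{\sqrt{n}}
  = 17.2 \pm 1.645 \times \frac{8}{\sqrt{35}}
  \approx 17.2 \pm 2.22
  = (14.98,\ 19.42)
```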

31
B. Unknown variance
(small sample size, n ≤ 30)
• What if σ for the underlying population is unknown and
the sample size is small?

• As an alternative, we use Student’s t distribution.

32
33
Student’s t Distribution
• The t is a family of continuous probability distributions

• Bell shaped

• Symmetric about zero (the mean)

• Flatter than the Normal(0,1). This means:

The variability of a t is greater than that of a Z that is
Normal(0,1).
Thus, there is more area in the tails and less at the center.

Because the variability is greater, the resulting confidence intervals
will be wider.
34
• Note: t approaches z as n increases

35
Student’s t Table

36
t distribution values
 With comparison to the Z value

37
Example

 Standard error =

 t-value at 90% CI at 19 df =1.729

38
39
2. CIs for population proportion, p

Is based on three elements of CI.

Point estimate

SE of point estimate

Confidence coefficient
40
41
42
Lower limit = Point Estimate - (Critical Value) × (Standard
Error of Estimate)
Upper limit = Point Estimate + (Critical Value) × (Standard
Error of Estimate)

Hence,

p̂ ± 1.96 × √(p̂(1 - p̂)/n)

is an approximate 95% CI for the true proportion p, where p̂ is the sample proportion.

43
Example 1
 A random sample of 100 people shows that 25 are left-
handed. Form a 95% CI for the true proportion of left-
handers.
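A minimal Python sketch of this calculation, assuming the usual normal-approximation (Wald) interval for a proportion. The function name `proportion_ci` is an illustrative choice rather than anything defined in the lecture.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) CI for a population proportion."""
    p_hat = successes / n                                # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    se = sqrt(p_hat * (1 - p_hat) / n)                   # SE of the estimate
    return p_hat - z * se, p_hat + z * se


# 25 left-handers in a random sample of 100 people
lower, upper = proportion_ci(25, 100)
print(round(lower, 3), round(upper, 3))  # 0.165 0.335
```

Under these assumptions, the 95% CI for the true proportion of left-handers runs from about 16.5% to 33.5%.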

44
Interpretation

45
Example
 It was found that 28.1% of 153 cervical-cancer cases had never
had a Pap smear prior to the time of case’s diagnosis. Calculate
a 95% CI for the percentage of cervical-cancer cases who never
had a Pap test.
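As a worked sketch, applying the normal-approximation interval for a proportion with p̂ = 0.281 and n = 153 gives:

```latex
\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
  = 0.281 \pm 1.96\sqrt{\frac{0.281 \times 0.719}{153}}
  \approx 0.281 \pm 1.96\,(0.0363)
  \approx 0.281 \pm 0.071
  = (0.210,\ 0.352)
```

That is, roughly 21.0% to 35.2% of cervical-cancer cases had never had a Pap test.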

46
Sample size Determination
Too small a sample size:
May fail to detect an important effect

Estimates of effect may be too imprecise (wide CIs)

Too large a sample size:

May result in wastage of resources.

As a rough guide, generalizing to an entire population typically needs
a total sample size of 200-400
47
Confidence interval approach
• Given the confidence interval

mean (or proportion) ± z_{α/2} × s.e.

• the absolute precision, denoted by d, is the margin of error:

d = z_{α/2} × s.e.

• where s.e. is the standard error of the estimator of the
parameter of interest.

48
Steps to determine sample size:
1. Specify the tolerable error (i.e., desired precision and confidence
level via d and α)

2. Identify the appropriate equation relating the tolerable error (d, α) to
the sample size (n)

3. Estimate the unknown quantities in the equation

4. Solve for n

5. Evaluate (and return to the first step if necessary)

The sample size calculation should relate to the study’s outcome
variable.
49
Estimating a single population
mean/proportion

For a mean: n = (z_{α/2})² × σ² / d²

For a proportion: n = (z_{α/2})² × p(1 - p) / d²

50
Examples
1. A survey is being planned to determine what proportion of
families in a certain area are medically indigent. Previous studies
suggest that the proportion is 0.35. A 95%
confidence interval is desired with d = 5%. What size sample of
families should be selected?
2. Suppose that you are interested in the proportion of
infants who were breastfed beyond 18 months of age in a rural area.
Suppose that in a similar area, the proportion (p) of breastfed
infants was found to be 0.20. What sample size is required to
estimate the true proportion within ±3 percentage points with 95%
confidence? Let p = 0.20, d = 0.03, α = 5%.
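Both questions can be answered with the single-proportion sample size formula n = z²p(1 - p)/d². The sketch below is one way to code it; the function name `sample_size_proportion` is hypothetical, and rounding up to the next whole subject is an assumption of this sketch (rounding to the nearest integer can give a value one smaller).

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p, d, confidence=0.95):
    """Smallest n whose margin of error is at most d for a single proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


print(sample_size_proportion(0.35, 0.05))  # Example 1: 350 families
print(sample_size_proportion(0.20, 0.03))  # Example 2: 683 infants
```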

52
Example
3. Suppose that for a certain group of cancer patients, we are
interested in estimating the mean age at diagnosis. We would like
a 95% CI with a margin of error of 2 years.

If the population variance is 124 years² (SD ≈ 11.1 years), how large should our sample
be?

n = (z_{α/2})² × σ² / d² = (1.96 × 1.96 × 124) / (2 × 2) ≈ 119

53
Suppose there is no prior information about the proportion
(p) who breastfeed

For a fixed absolute precision (d), the required sample
size increases as p increases from 0 to 0.5, and then
decreases in the same way as the prevalence
approaches 1.

54
 An estimate of p is not always available.

 However, the formula may also be used for sample size

calculation based on various assumptions for the values of
p.
P = 0.1 → n = (1.96)²(0.1)(0.9)/(0.05)² = 138
P = 0.2 → n = (1.96)²(0.2)(0.8)/(0.05)² = 246
P = 0.3 → n = (1.96)²(0.3)(0.7)/(0.05)² = 323
P = 0.5 → n = (1.96)²(0.5)(0.5)/(0.05)² = 384
P = 0.7 → n = (1.96)²(0.7)(0.3)/(0.05)² = 323
P = 0.8 → n = (1.96)²(0.8)(0.2)/(0.05)² = 246
55
Some Considerations

56
Using design effect
• The loss of effectiveness from using cluster sampling
instead of simple random sampling is the design effect.
• The design effect is the ratio of the actual variance,
under the sampling method actually used, to the variance
computed under the assumption of simple random
sampling.
Using design effect cont.…
• When simple or systematic random sampling
is used, the design effect is taken as one.
• When cluster sampling is used, a design
effect of two is commonly assumed.
• When multistage sampling is used, the design
effect is commonly taken as equal to the number of stages.
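The slides do not spell out how the design effect enters the calculation. A common approach, assumed here, is to multiply the sample size computed for simple random sampling by the design effect. The function name `adjust_for_design_effect` is hypothetical.

```python
from math import ceil


def adjust_for_design_effect(n_srs: int, deff: float) -> int:
    """Inflate a simple-random-sampling sample size by the design effect."""
    if deff < 1:
        raise ValueError("this sketch assumes a design effect of at least 1")
    return ceil(n_srs * deff)


# n = 384 is the simple-random-sampling size the slides give for P = 0.5, d = 0.05
print(adjust_for_design_effect(384, 1))  # 384 (simple random sampling)
print(adjust_for_design_effect(384, 2))  # 768 (cluster sampling, deff assumed = 2)
```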
Hypothesis Testing

60
Introduction
In statistical analyses hypotheses are formulated, experiments are
performed, and results are evaluated for their consistency (non-
consistency) with a hypothesis.
Hypothesis Testing (HT) provides an objective framework for making
decisions using probabilistic methods.
The purpose of HT is to aid the clinician, researcher or administrator
in reaching a decision (conclusion).

61
Hypothesis
A statistical hypothesis is an assumption, claim or statement,
which may or may not be true, concerning one or more
populations.
Is a statement about one or more population parameters

Is frequently concerned with the parameters of the population
about which the statement is made.

62
Examples of Research
Hypotheses
Population Mean
The average length of stay of patients admitted to the hospital is five
days
The mean birth weight of babies delivered by mothers with low SES is
lower than those from higher SES.

Population Proportion
The proportion of adult smokers in Harar is P = 0.40

 The prevalence of HIV among non-married adults is higher than that in

married adults
63
Types of Hypothesis

1. The Null Hypothesis, H0

Is a statement claiming that there is no difference between the
hypothesized value and the population value. (The effect of
interest is zero = no difference)
States the assumption (hypothesis) to be tested
H0 is a statement of agreement (or no difference)
H0 is always about a population parameter, not about a sample
statistic
64
Begin with the assumption that the Ho is true
Similar to the notion of innocent until proven guilty

Always contains “=” , “ ≤” or “≥ ” sign

May or may not be rejected

65
2. The Alternative Hypothesis, HA

Is a statement of what we will believe is true if our sample data cause
us to reject Ho.
• Is generally the hypothesis that is believed (or needs to be supported)
by the researcher.
Is a statement that disagrees with (opposes) Ho (the effect of
interest is not zero).
Never contains the “=”, “≤” or “≥” sign

May or may not be accepted

66
Steps in Hypothesis Testing
1. Choose the null hypothesis that is to be questioned.

2. Choose an alternative hypothesis which is accepted if the original

hypothesis is rejected.

3. Choose a rule for making a decision about when to reject the

original hypothesis and when to fail to reject it.

4. Choose a random sample from the appropriate population and
compute appropriate statistics: that is, mean, proportions and so
on.

5. Make the decision.
67
Rules for Stating Statistical
Hypotheses
Indication of equality (either =, ≤ or ≥) must appear in Ho.

Ho: μ = μo, HA: μ ≠ μo

Ho: P = Po, HA: P ≠ Po

Can we conclude that a certain population mean is not 50?

Ho: μ = 50 and HA: μ ≠ 50

 Can we conclude that a certain population mean is greater than
50?

Ho: μ ≤ 50 HA: μ > 50

68
Can we conclude that the proportion of patients with leukemia
who survive more than six years is not 60%?

Ho: P = 0.6 HA: P ≠ 0.6

Now think about how the hypothesis test should be carried out.

We draw a random sample of size n from the underlying population
and calculate its sample mean (x̄).
We compare x̄ to the postulated mean μ0.

69
Decision Rule
The decision to reject or not to reject the Ho is based on the
magnitude of the test statistic.
An example of a test statistic is the quantity

z = (x̄ - μ0) / (σ/√n)

When the variance of the population is unknown and the sample size is
small, the test statistic is:

t = (x̄ - μ0) / (s/√n)

70
Rejection and Non-Rejection
Regions
The values that the test statistic can assume lie on the horizontal axis of the
normal distribution and are divided into two groups:
• Rejection region, and
• Non-rejection region.

The values of the test statistic forming the rejection region are less
likely to occur if the Ho is true.
The values making up the non-rejection region are more likely to occur
if the Ho is true.
71
Example: Two-sided test at α = 5%

[Figure: standard normal curve. Rejection regions of area α/2 = 0.025 in each
tail, beyond the critical values -1.96 and 1.96; non-rejection region of area
0.95 between them.]

72
Statistical
Decision
Reject Ho if the value of the test statistic that we compute from our
sample is one of the values in the rejection region
Don’t reject Ho if the computed value of the test statistic is one of
the values in the non-rejection region.

73
Level of
Significance, α
Is the probability of rejecting a true Ho (type 1 error)

Defines unlikely values of sample statistic if Ho is true

Defines rejection region of the sampling distribution

The decision is made on the basis of the level of significance,

designated by α.
More frequently used values of α are 0.01, 0.05 and 0.10.

α is selected by the researcher at the beginning

74
One tail and two tail
tests
In a one tail test, the rejection region is at one end of the distribution or the
other.
In a two tail test, the rejection region is split between the two tails.

Which one is used depends on the way the Ho is written.

Level of Significance and the Rejection Region

Example:
The average survival year after cancer diagnosis is less than 3 years.

75
76
Types of Errors in
Hypothesis Tests
Whenever we reject or fail to reject the Ho, we may commit an error.

Two types of error are possible.

 Type I Error
 Type II Error

77
Type I Error
The error committed when a true Ho is rejected

The probability of a type I error is the probability of rejecting the Ho
when it is true.
The probability of type I error is α

 Called level of significance of the test

 Set by researcher in advance

78
Type II Error
The error committed when a false Ho is not rejected
The probability of Type II Error is β

Power
The probability of rejecting the Ho when it is false.
Power = 1 – β = 1- probability of type II error
We would like to maintain low probability of a Type I error (α)
And low probability of a Type II error (β) [high power = 1 - β].

79
Action (Conclusion)    Reality: Ho True              Reality: Ho False

Do not reject Ho       Correct action                Type II error (β)
                       (Prob. = 1-α)                 (Prob. = β = 1-Power)

Reject Ho              Type I error (α)              Correct action
                       (Prob. = α = Sign. level)     (Prob. = Power = 1-β)

80
Type I & II Error
Relationship

81
1. Hypothesis Testing of a Single
Mean
(Normally Distributed)

82
1.1 Known
Variance

83
Example:
1. A simple random sample of 10 people from a certain population
has a mean age of 27. Can we conclude that the mean age of the
population is not 30? The variance is known to be 20 and the
population is normally distributed. Take α = 0.05.
Data: n = 10, sample mean = 27, σ² = 20, α = 0.05

84
State the Hypotheses

Ho: µ = 30

HA: µ ≠ 30

Test statistic
 As the population variance is known, we use Z as the test
statistic.

85
Decision Rule
Reject Ho if the Z value falls in the rejection region.

Don’t reject Ho if the Z value falls in the non-rejection region.

Because of the structure of Ho it is a two tail test.

Therefore, reject Ho if Z ≤ -1.96 or Z ≥ 1.96.

86
Calculation of test statistic

Z = (x̄ - μ0) / (σ/√n) = (27 - 30) / (√20/√10) = -2.12

Statistical decision

We reject the Ho because Z = -2.12 is in the rejection region.

Conclusion

• We conclude that the mean age of the population is not 30.

87
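The two-tailed Z test for the age example can be sketched in Python as follows. The function name `one_sample_z_test` is an illustrative assumption. The p-value is an extra that the slides do not report, computed from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, variance, n):
    """Return the Z statistic and two-sided p-value for Ho: mu = mu0."""
    z = (sample_mean - mu0) / sqrt(variance / n)
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided


# n = 10, sample mean = 27, known variance = 20, Ho: mu = 30
z, p = one_sample_z_test(27, 30, 20, 10)
critical = NormalDist().inv_cdf(0.975)   # about 1.96 for alpha = 0.05

print(round(z, 2), round(p, 3))   # -2.12 0.034
print(abs(z) >= critical)         # True, so Ho is rejected
```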
Hypothesis test using confidence
interval
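A problem like the above example can also be solved using a
confidence interval.
If the hypothesized value µ0 does not fall within the boundaries of the
100(1-α)% confidence interval, Ho is rejected at level α.

Confidence interval

27 ± 1.96 × √(20/10) = 27 ± 2.77 = (24.23, 29.77)

Since 30 lies outside this interval, we reject Ho, in agreement with the Z test.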

88
Example: One -Tailed Test
A simple random sample of 10 people from a certain population has a
mean age of 27. Can we conclude that the mean age of the population
is less than 30? The variance is known to be 20. Let α = 0.05.
Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
Hypotheses
Ho: µ ≥ 30, HA: µ < 30

89
Test statistic

Z = (27 - 30) / (√20/√10) = -2.12

Rejection Region

Lower tail test

With α = 0.05 and the inequality in HA, the entire rejection region is in the left tail.
The critical value is Z = -1.645. Reject Ho if Z < -1.645.
90
Statistical decision
 We reject the Ho because -2.12 < -1.645.
Conclusion
 We conclude that µ < 30.

91
Suppose that the Ho and HA take the form

Ho: µ = µo, HA: µ > µo

In this case, Ho would be rejected for large values of test statistic

Upper tail test

92
1.2 Unknown Variance
In most practical applications the standard deviation of the
underlying population is not known.
In this case, σ can be estimated by the sample standard deviation
s.
If the underlying population is normally distributed, then the test
statistic is:

t = (x̄ - μ0) / (s/√n)

93
Example: Two-Tailed Test
A random sample of 14 people from a certain population gives a
sample mean body mass index (BMI) of 30.5 and an sd of 10.64. Can we
conclude that the mean BMI is not 35 at α = 5%?

Ho: µ = 35, HA: µ ≠ 35

Test statistic

t = (30.5 - 35) / (10.64/√14) = -1.58

If the assumptions are correct and Ho is true, the test statistic follows
Student's t distribution with 13 degrees of freedom.

94
Decision rule

We have a two-tailed test. With α = 0.05, each tail has area 0.025. The
critical t values with 13 df are -2.1604 and 2.1604. We reject Ho if t ≤
-2.1604 or t ≥ 2.1604.

Do not reject Ho because -1.58 is not in the rejection region.

Conclusion: Based on the sample data, we cannot rule out µ = 35.
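The BMI example can be reproduced with a few lines of Python. The standard library has no t distribution, so this sketch simply reuses the critical value quoted in the slides (2.1604 for 13 df) instead of computing it. The function name `one_sample_t_statistic` is hypothetical.

```python
from math import sqrt


def one_sample_t_statistic(sample_mean, mu0, sd, n):
    """Return t = (mean - mu0) / (s / sqrt(n)) and its degrees of freedom."""
    return (sample_mean - mu0) / (sd / sqrt(n)), n - 1


t_stat, df = one_sample_t_statistic(30.5, 35, 10.64, 14)
T_CRITICAL = 2.1604  # two-sided 5% critical value for 13 df, taken from the slides

print(round(t_stat, 2), df)        # -1.58 13
print(abs(t_stat) >= T_CRITICAL)   # False, so Ho is not rejected
```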

95
Summary
Population mean, known population variance (or standard
deviation): Normal test.
Population mean, Unknown population variance (or standard
deviation) and small sample: Student’s t-test.
Single population proportion: Normal test.

96
Hypothesis Tests for
Proportions
Involves categorical values

 Two possible outcomes

“Success” (possesses a certain characteristic)

“Failure” (does not possess that characteristic)
 Fraction or proportion of population in the “success” category is
denoted by p

97
Hypothesis Testing for Population Proportion

98
Example
We are interested in the probability of developing asthma over a
given one-year period for children 0 to 4 years of age whose mothers
smoke in the home.
In the general population of 0 to 4-year-olds, the annual incidence of
asthma is 1.4%. If 10 cases of asthma are observed over a single year
in a sample of 500 children whose mothers smoke, can we conclude
that this is different from the underlying probability of p0 = 0.014?
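α = 5%
H0: p = 0.014
HA: p ≠ 0.014
99
The test statistic is given by:

Z = (p̂ - p0) / √(p0(1 - p0)/n) = (0.02 - 0.014) / √(0.014 × 0.986/500) = 1.14

where p̂ = 10/500 = 0.02 is the sample proportion.

100
The critical value of Z_{α/2} at α = 5% is ±1.96.

Don’t reject Ho since Z (= 1.14) is in the non-rejection region between ±1.96.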

We do not have sufficient evidence to conclude that the probability of

developing asthma for children whose mothers smoke in the home is different
from the probability in the general population
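The asthma example can be checked with the following sketch, which assumes the usual normal approximation with the standard error computed under Ho. The function name `one_sample_proportion_z` is illustrative, and the p-value is an addition not shown in the slides.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(successes, n, p0):
    """Z statistic and two-sided p-value for Ho: p = p0 (normal approximation)."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)   # standard error computed under Ho
    z = (p_hat - p0) / se0
    return z, 2 * NormalDist().cdf(-abs(z))


# 10 asthma cases among 500 children; Ho: p = 0.014
z, p_value = one_sample_proportion_z(10, 500, 0.014)
print(round(z, 2))      # 1.14
print(p_value > 0.05)   # True: do not reject Ho at the 5% level
```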

101
Chi-squared
test
The Chi-squared test measures the disparity between observed
frequencies (data from the sample) and expected frequencies.
Helps to check association between categorical variables.

The Chi-squared test is valid

If no cell has an observed frequency of 0
And no more than 20% of the cells have an expected frequency less than 5
Chi-square test (χ²)…
Chi-square test is used for nominal or ordinal explanatory and
response variables.
Variables can have any number of distinct levels.
If the two variables have two levels each, the resulting contingency
table will be 2×2:

                       Variable 1
Variable 2       Diseased    Not diseased    Total
Exposed          a           b               a+b
Not exposed      c           d               c+d
Total            a+c         b+d             N

χ²cal = N(ad - bc)² / [(a + c)(b + d)(a + b)(c + d)]
Chi-square test (χ²)…
• If the two variables have more than two levels, with r rows and c
columns, the resulting table will be r×c:

                         Variable 1
                 1       2       …       c       Total
Variable 2  1    O11     O12     …       O1c     r1
            2    O21     O22     …       O2c     r2
            …    …       …       …       …       …
            r    Or1     Or2     …       Orc     rr
Total            c1      c2      …       cc      n

χ² = Σ(all cells) (observed frequency - expected frequency)² / expected frequency
   = Σ(all cells) (O - E)² / E
Chi-square test (χ²)…
Oij is the observed frequency and Eij is the expected frequency.

Expected frequency (E) for the i-th row and the j-th column (Eij):

Eij = (row i total) × (column j total) / (sample size) = ri × cj / n

where ri = row i total, cj = column j total, and n = total sample size.

• The degrees of freedom are df = (r-1) × (c-1)

• r = number of rows and c = number of columns in the contingency table
Chi-square test (χ²)…
Hypothesis testing steps in chi square test
1. State Hypotheses:
Null hypothesis (Ho): The classification variables are independent

Alternative hypothesis (Ha): There is a relationship between the variables

2. Compute test statistic: Get the calculated χ² value

3. Determine critical values: Find the table value of χ² at the given df

4. Decision: Reject H0 if the calculated χ² > the critical value of χ² from the table.

Example 1

Consider the following 2×2 table. Is there an association between wearing a helmet and head injury?
Use a 95% confidence level.

                     Wearing helmet
Head injury      Yes      No       Total
Yes              17       218      235
No               130      428      558
Total            147      646      793

Example 1…

Step 1: hypothesis

• HO: There is no association between wearing a helmet and head
injury
HA: There is an association between wearing a helmet and head
injury
• Step 2: Test statistic
χ² = N(ad - bc)² / (nD × nND × nE × nNE)
   = 793 × (17×428 - 218×130)² / (235 × 558 × 147 × 646) = 28.26
Step 3: critical value χ² = 3.84 (df = 1, α = 0.05)
Step 4: Decision: since 28.26 > 3.84, reject the null hypothesis
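The helmet example can also be computed from the general definition, summing (O - E)²/E over all cells with expected counts E = (row total × column total)/n. The sketch below works for any r×c table; the function name `chi_square_statistic` is an illustrative choice, and the critical value 3.84 is the one quoted in the example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E_ij = r_i * c_j / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: head injury yes / no; columns: wearing helmet yes / no
helmet = [[17, 218], [130, 428]]
chi2, df = chi_square_statistic(helmet)
print(round(chi2, 2), df)   # 28.26 1
print(chi2 > 3.84)          # True: reject Ho at the 5% level
```

For a 2×2 table this sum is algebraically the same as the shortcut formula N(ad - bc)²/[(a+b)(c+d)(a+c)(b+d)].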
Example 2

A sample of 263 students who bought lunch at a school
canteen were asked whether or not they developed
gastroenteritis. The responses are given below.

                     Gastroenteritis
Ate sandwich     Yes      No       Total
Yes              109      116      225
No               4        34       38
Total            113      150      263

Example 2…
• Step 1: hypothesis

• HO: There is no association between eating the sandwich and gastroenteritis

• Ha: There is an association between eating the sandwich and gastroenteritis

• Step 2: Test statistic χ² = 17.6 (with Yates' continuity correction; the uncorrected value is 19.07)

• Step 3: Critical value of χ² with 1 df at α = 0.05 is 3.84

• Step 4: decision: since 17.6 > 3.84, reject the null hypothesis and
conclude that there is an association between eating the sandwich and
gastroenteritis
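The slides report χ² = 17.6 for this table without showing the calculation. That value appears to match the 2×2 shortcut formula with Yates' continuity correction; without the correction the same table gives about 19.07. The sketch below computes both, and the function name `chi_square_2x2` and its `yates` flag are hypothetical. Either way the statistic exceeds 3.84, so the conclusion is unchanged.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2x2 table [[a, b], [c, d]] via the shortcut formula."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)   # Yates' continuity correction
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Sandwich example: a = 109, b = 116, c = 4, d = 34
print(round(chi_square_2x2(109, 116, 4, 34), 2))              # 19.07
print(round(chi_square_2x2(109, 116, 4, 34, yates=True), 2))  # 17.56
```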
Parametric vs nonparametric
test
• Statistical methods that depend on assumptions about
the distribution of the variable in the population are referred
to as parametric methods.
• Parametric tests include the t-test, ANOVA, regression,
correlation, and so on.
• To use a parametric test, we must assume a normal
distribution for the dependent variable, equality of variance
where populations are compared, and a large sample size.
• However, in real research situations things do not come with
labels detailing the characteristics of the population of origin.
111
01/19/25
Parametric vs
nonparametric test
• Non-parametric statistics (sometimes called distribution-free
statistics) were designed to be used when we know nothing
about the distribution of the variable of interest in the
population.
• They require fewer assumptions about the population
probability distribution.
• They also handle data collected in the form of rankings.
• Nonparametric methods are often the way to analyze
nominal or ordinal data and draw statistical conclusions.
112
Parametric vs
nonparametric test
More generally, nonparametric methods have the following
advantages:
• Methods are quick and easy to apply.
• More powerful when the assumption of normality has been
violated.
• Can be used with small sample sizes.
• Less affected by the presence of outliers.
• Less sensitive to measurement error, as they use ranks.
• Inherently robust due to the lack of stringent assumptions.
113
Parametric Versus Non-parametric tests

Parametric          Non-parametric
Unpaired t-test     Wilcoxon rank sum test / Mann-Whitney-Wilcoxon test
Paired t-test       Wilcoxon signed rank test
One-way ANOVA       Kruskal-Wallis test

114
1. Wilcoxon Signed-Rank Test
• This test is the nonparametric alternative to the
parametric matched-sample (paired) test.

• The parametric matched-sample analysis requires the assumption
that the population of differences between the pairs of observations
is normally distributed.

• If the assumption of normally distributed differences is
not appropriate, the Wilcoxon signed-rank test can be
used.
115
1. Wilcoxon Signed-Rank Test
(cont’d)
Case: We are testing the effectiveness of a new fuel
additive. We run an experiment with 12 cars. We first run
each car without the fuel treatment and measure the
mileage. We then add the fuel treatment and repeat the
experiment.

We want to test the null hypothesis that the treatment had
no effect.

STATA CODE: signrank mpg1=mpg2
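To show what a signed-rank test computes, here is a minimal Python sketch using the large-sample normal approximation. The mileage values are made up for illustration and are NOT the lecture's data. The function name `signed_rank_test` is hypothetical, and Stata's `signrank` may treat zero differences and ties differently, so its output need not match this sketch exactly.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_test(before, after):
    """Wilcoxon signed-rank test with the large-sample normal approximation.

    Returns (W_plus, W_minus, z, two_sided_p). Zero differences are dropped
    and tied absolute differences share the average of their ranks. No tie
    or continuity correction is applied to the variance.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return w_plus, w_minus, z, 2 * NormalDist().cdf(-abs(z))


# Hypothetical mileage for 12 cars (made-up numbers for illustration only)
mpg_without = [22, 19, 24, 18, 21, 20, 26, 23, 17, 25, 20, 22]
mpg_with = [25, 19, 23, 22, 26, 22, 27, 29, 19, 24, 27, 30]
w_plus, w_minus, z, p = signed_rank_test(mpg_without, mpg_with)
print(w_plus, w_minus, round(z, 2), round(p, 4))
```

With only 12 pairs, an exact test would normally be preferred over the normal approximation. The approximation is used here only to keep the sketch short.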

116
1. Wilcoxon Signed-Rank Test
(cont’d)
Stata outputs

Conclusion: We reject Ho and conclude that the treatment had an
effect.

117
2. Wilcoxon rank-sum test / Mann-Whitney-Wilcoxon Test
• It is frequently used as the nonparametric analogue of the
Student’s t-test to compare two independent groups.

• It is used when the data are not normally distributed.

• It is also used when the original measurements were made
on an ordinal scale.

• This test, unlike the Wilcoxon signed-rank test, is not based
on a matched sample.

118
2. Wilcoxon rank-sum test / Mann-Whitney-Wilcoxon Test (cont’d)
• This method tests whether the two
populations are identical.

The hypotheses are:

H0: The two populations are identical

Ha: The two populations are not identical

119
2. Wilcoxon rank-sum test / Mann-Whitney-Wilcoxon Test (cont’d)

• We want to check whether weight differs
between male and female infants (assume that the
normality assumption is violated even after
transformation).

STATA CODE: ranksum weight, by(sexChild)

120
2. Wilcoxon rank-sum test / Mann-Whitney-Wilcoxon Test (cont’d)

Stata outputs

Conclusion: We reject Ho and conclude that there is a
weight difference between male and female infants.

121
2. Wilcoxon rank-sum test / Mann-Whitney-Wilcoxon Test (cont’d)
Student class activity 1
1. Check whether length differs between male
and female infants (assume that the normality
assumption is violated even after transformation).

122
3. Kruskal-Wallis
Test
 For a Gaussian outcome the means of three or more
independent groups are compared by one-way ANOVA.

 When the assumption of one-way ANOVA are not met, i.e.:

 Populations are not normally distributed with equal variance;

data consist of only ranks.

 The alternative is the Kruskal-Wallis one-way analysis.

123
01/19/25
3. Kruskal-Wallis Test
 (cont’d)
The Mann-Whitney-Wilcoxon test can be used to test whether two
populations are identical.

 The MWW test has been extended by Kruskal and Wallis for cases
of three or more populations.

 The Kruskal-Wallis test can be used with ordinal data as well as

with interval or ratio data.

 Also, the Kruskal-Wallis test does not require the assumption of

normally distributed populations.

The hypotheses are:

H0: All populations are identical

Ha: Not all populations are identical
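
As a rough illustration of what the Kruskal-Wallis test computes, the Python sketch below ranks the pooled observations, forms the H statistic with the usual tie correction, and refers it to a chi-square distribution with k - 1 degrees of freedom. It is a sketch under stated assumptions, not the implementation behind Stata's kwallis command. The helper names (midranks, chi2_sf, kruskal_wallis) and the three groups of weights are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Return (H, df, p) for k independent groups (chi-square approximation)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])  # rank sum of this group
        h += r_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction; it equals 1 when there are no tied values.
    # (Undefined if every observation is identical.)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


# Hypothetical weights in kg for three gravidity categories (made-up values).
g1 = [2.5, 2.7, 2.9, 3.0]
g2 = [3.1, 3.2, 3.4, 3.5]
g3 = [3.6, 3.8, 4.0, 4.1]
h, df, p = kruskal_wallis(g1, g2, g3)
print(f"H = {h:.3f}, df = {df}, p = {p:.4f}")
```

A small p-value leads to rejecting H0 that all populations are identical. It does not say which groups differ, so pairwise follow-up comparisons would be a separate step.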

3. Kruskal-Wallis Test (cont’d)

• We want to test whether weight differs among the categories of gravidity (assume that the normality assumption is violated even after transformation).

STATA CODE: kwallis weight, by(CatGravida)

3. Kruskal-Wallis Test (cont’d)

Stata outputs

Conclusion: We reject H0 and conclude that weight differs among the categories of gravidity (at least one category differs from the others).
3. Kruskal-Wallis Test (cont’d)

Student class activity 2

1. Test whether length differs among the categories of gravidity (assume that the normality assumption is violated even after transformation).

Spearman rank correlation
• Measures the strength and direction of the monotonic association between two variables measured on an ordinal or continuous scale.
• The Spearman correlation coefficient is often denoted by the symbol r_s (or the Greek letter ρ, pronounced rho).
• It is useful when Pearson's correlation is not appropriate because of violations of normality, a non-linear (but monotonic) relationship, or ordinal variables.
• Stata code: spearman variable1 variable2, or from the menu bar: Statistics > Nonparametric analysis > Tests of hypotheses > Spearman's rank correlation
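
The idea behind the coefficient can be sketched in a few lines of Python: replace each variable by its ranks (average ranks for ties) and compute the ordinary Pearson correlation of those ranks. This is an illustrative sketch, not Stata's implementation. The function names and the data are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))


# Hypothetical data with a monotonic but non-linear relationship (y = x squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson  r   = {pearson(x, y):.4f}")
print(f"Spearman r_s = {spearman(x, y):.4f}")
```

Because y increases whenever x increases, the ranks agree perfectly and r_s equals 1, while Pearson's r is below 1 because the relationship is not a straight line.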

Quiz
Write the null and alternative hypotheses for the following statements (1 & 2).

1. The average weight of patients admitted to the hospital is 45 kg.

2. The proportion of children exposed to asbestos is 0.21.

True or false (3 & 4)

3. The chi-square distribution is used to assess association between categorical variables.

4. A hypothesis is written in terms of a sample statistic.

5. A researcher who rejects a null hypothesis that is in fact true is committing which type of error?

A. Type I error

B. Type II error

C. Both types of error

THE END OF THE COURSE

THANKS!!!

All the best!!!
