Lecture 14 Nonparametric - Statistics

Here are the steps for the Wilcoxon rank sum test for the two samples in Example 8-3: 1. Combine the heights from girls and boys and rank them from smallest to largest without considering the group labels. Ties get the average rank. 2. Sum the ranks for each group. For girls: T1=37. For boys: T2=134. 3. Since n1=8 and n2=10 are both ≤50, look up the critical values in the T-value table. The critical value for a two-tailed test with α=0.05 is T≤27. 4. Compare the sums to the critical value. Both T1=37 and

Uploaded by

fareehakanwar93
Copyright
© © All Rights Reserved

Medical Statistics

医学统计学
Lecture 8 Nonparametric Statistics
非参数统计
Faming Pan
Key points of last class
• Statistical inference
• Estimation of population mean
• Hypothesis testing
• Basic principle
• Basic steps
• t-test
• Questions for hypothesis testing
• Type I error and Type II error
Contents
• Parametric test & Nonparametric test
• Wilcoxon signed rank sum test for matched pairs
• Wilcoxon signed rank sum test for one sample and a population median
• Wilcoxon rank sum test for two samples
• Kruskal-Wallis' H test
Vocabulary for Lecture 8
parametric test  参数检验
nonparametric test / distribution-free test  非参数检验
rank sum test  秩和检验
rank  秩
rank sum  秩和
precise  精确
pilot study  预试验
power  功效
Wilcoxon signed rank sum test  Wilcoxon 符号秩和检验
Wilcoxon rank sum test  Wilcoxon 秩和检验
tie  相持
Kruskal-Wallis' H test  Kruskal-Wallis H 检验
raw data  原始数据
survival time  生存时间
critical value  临界值
multiple comparison  多重比较
pair-wise comparison  两两比较
location or position  位置
1. Parametric test & Nonparametric test

• Parametric tests: the hypothesis-testing methods we have learnt so far, such as the t-test.
• They assume that the variable follows a normal distribution.
• Under that assumption, they test whether the means (parameters) are equal or not.
• Therefore, they are called parametric tests.

• Nonparametric tests (distribution-free tests): they make no assumption about the form of the distribution.
• Rank sum tests: one kind of nonparametric test, based on the ranks of the data.

• Nonparametric tests can be used in the following situations:
• The distribution of the data is unknown;
• The distribution of the data is skewed;
• The data are ranked or imprecise;
• A quick and easy analysis is needed (for example, in a pilot study).

• Nonparametric tests are suitable for a variety of data:
• Measurement, enumeration or ordinal data
• Normally distributed or not
• Symmetric or not
• However, if the data are suitable for parametric tests, the power of nonparametric tests will be slightly lower.
1. Wilcoxon signed rank sum test (matched pairs)
• Example 8-1: To evaluate HIV knowledge training, 13 middle school students were scored before and after the training. The scores are shown below:
Table 8-1 The scores of 13 students before and after training

Student no.   Before training   After training   Difference    Rank
(1)           (2)               (3)              (4)=(3)-(2)   (5)
1             7                 10               3             +9
2             7                 9                2             +6.5
3             7                 7                0             -
4             6                 7                1             +3
5             7                 10               3             +9
6             7                 6                -1            -3
7             8                 9                1             +3
8             2                 6                4             +11
9             9                 8                -1            -3
10            6                 9                3             +9
11            4                 6                2             +6.5
12            6                 6                0             -
13            6                 7                1             +3
1.1 Steps of nonparametric statistics

Step 1: Hypotheses.
H0: the median of the differences is 0 (Md = 0).
H1: the median of the differences is not 0 (Md ≠ 0).
α = 0.05.
Step 2: Compute the differences.
Step 3: Rank the absolute differences (omitting zeros), then give each rank the sign of its difference.
Step 4: Compute the rank sums and the statistic:
T = min(positive rank sum, negative rank sum)
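
The steps above can be sketched in Python using the scores of Table 8-1. This is a minimal illustration, not part of the original lecture; the helper names `average_ranks` and `signed_rank_sums` are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(before, after):
    """Return (n, positive rank sum, negative rank sum, T) for paired data."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # omit zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), pos, neg, min(pos, neg)


# Scores from Table 8-1
before = [7, 7, 7, 6, 7, 7, 8, 2, 9, 6, 4, 6, 6]
after = [10, 9, 7, 7, 10, 6, 9, 6, 8, 9, 6, 6, 7]
n, pos, neg, T = signed_rank_sums(before, after)
print(n, pos, neg, T)  # 11 60.0 6.0 6.0
```

The two zero differences are dropped before ranking, so n falls from 13 to 11.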
Step 5: P-value and conclusion (n ≤ 50: check the T-value table; n > 50: normal approximation).
• If n ≤ 50: for a fixed n, if n random numbers are ranked many times, 95% of the rank sums fall within (lower limit, upper limit); see the T-value table.
• If n > 50, which is beyond the range of the T-value table, the normal approximation can be used:

$$Z = \frac{T - n(n+1)/4}{\sqrt{\dfrac{n(n+1)(2n+1)}{24} - \dfrac{\sum (t_j^3 - t_j)}{48}}}$$

T: the smaller rank sum; n: sample size (zeros omitted); t_j: the number of observations sharing the j-th tied rank.

T-value table (column headings): n; one-sided P: 0.05, 0.025, 0.010, 0.005; two-sided P: 0.10, 0.050, 0.020, 0.010.
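
A minimal sketch of the normal approximation with the tie correction, assuming the form Z = (T - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24 - Σ(t_j³ - t_j)/48) with no continuity correction. The function name `signed_rank_z` is hypothetical. The Example 8-1 differences are used only to show the arithmetic; with n = 11 the T-value table would normally be used instead.

```python
import math
from collections import Counter


def signed_rank_z(T, abs_diffs):
    """Normal approximation for the signed rank statistic T with tie correction.

    abs_diffs: the non-zero absolute differences that were ranked.
    """
    n = len(abs_diffs)
    # sum of (t_j^3 - t_j) over groups of tied absolute differences
    tie_term = sum(t**3 - t for t in Counter(abs_diffs).values())
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    return (T - mean) / math.sqrt(variance)


# Illustration only: non-zero absolute differences from Table 8-1, T = 6
abs_diffs = [3, 2, 1, 3, 1, 1, 4, 1, 3, 2, 1]
z = signed_rank_z(6, abs_diffs)
print(round(z, 2))  # -2.43
```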


1.2 Steps of example 8-1
• Step 1: H0: Md = 0; H1: Md ≠ 0; α = 0.05.
• Step 2: Compute the differences (Table 8-1).
• Step 3: Rank the absolute differences, omitting zeros (each omitted zero reduces n by 1), and give the signs back to the ranks. Table 8-1 has two zeros, so n = 13 - 2 = 11.
• Step 4: Rank sums and statistic. Positive rank sum = 60, negative rank sum = 6; T = min(positive sum, negative sum) = 6.
• Step 5: P-value and conclusion.
For n ≤ 50, check the T-value table; for n > 50, use the normal approximation.
From the T-value table, the two-sided critical range for n = 11 at α = 0.05 is T0.05/2,11 = (10-56). This means that if 1000 people each wrote 11 numbers on a piece of paper, ranked them and computed the rank sum, the rank sums could range from 0 to 66, but about 950 of them would fall between 10 and 56.
• Here T = 6 lies outside 10-56, so a small-probability event has occurred. P < 0.05 and H0 is rejected. Conclusion: the training is effective.
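
The "1000 people" idea can be checked by enumeration. The sketch below assumes untied ranks 1 to 11 and treats every assignment of signs to the ranks as equally likely under H0; the function name `lower_tail_count` is hypothetical. It shows why 10 is the lower limit in the table: the lower-tail probability is at most 0.025 at 10 but exceeds 0.025 at 11.

```python
from itertools import combinations


def lower_tail_count(n, t):
    """Count the sign assignments of ranks 1..n whose negative rank sum is <= t.

    Each subset of {1, ..., n} is one equally likely set of negative ranks
    under H0, so there are 2**n assignments in total.
    """
    ranks = range(1, n + 1)
    return sum(
        1
        for size in range(n + 1)
        for subset in combinations(ranks, size)
        if sum(subset) <= t
    )


n = 11
total = 2 ** n  # 2048 equally likely sign assignments
p_10 = lower_tail_count(n, 10) / total  # 43/2048, about 0.0210
p_11 = lower_tail_count(n, 11) / total  # 55/2048, about 0.0269
print(p_10, p_11)
```

By symmetry the upper limit is 66 - 10 = 56, so about 96% of rank sums fall strictly between the two limits.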
2. Comparing medians between one sample and a population
• To infer whether a difference exists between the median of one sample and the population median.
Example 8-2

The median urine fluorine content of normal men is 2.15 mmol/L. Twelve workers were randomly sampled in a certain factory and their urine fluorine contents were measured (Table 8-2). Is the urine fluorine content of the workers in the factory higher than that of normal men?
Table 8-2 The urine fluorine of 12 workers

Urine fluorine   Median   Difference    Rank
(1)              (2)      (3)=(1)-(2)   (4)
2.15             2.15     0             -
2.10             2.15     -0.05         -2.5
2.20             2.15     0.05          +2.5
2.12             2.15     -0.03         -1
2.42             2.15     0.27          +4
2.52             2.15     0.37          +5
2.62             2.15     0.47          +6
2.72             2.15     0.57          +7
2.99             2.15     0.84          +8
3.19             2.15     1.04          +9
3.37             2.15     1.22          +10
4.57             2.15     2.42          +11
Sum of ranks (-): 3.5
Sum of ranks (+): 62.5
2. Steps of example 8-2

Step 1: H0: Md = 0; H1: Md > 0; α = 0.05.
Step 2: Compute the differences (Table 8-2).
Step 3: Rank the absolute differences, omitting zeros (each omitted zero reduces n by 1), and give the signs back to the ranks. Table 8-2 has one zero, so n = 12 - 1 = 11.
Step 4: Rank sums and statistic. Positive rank sum = 62.5, negative rank sum = 3.5; T = min(positive sum, negative sum) = 3.5.
Step 5: P-value and conclusion.
• n ≤ 50, so check the T-value table.
• From the T-value table, the one-sided critical range is T0.05,11 = (13-53). Here T = 3.5 lies outside 13-53, so a small-probability event has occurred.
• P < 0.05 and H0 is rejected. Conclusion: the urine fluorine content of the workers in the factory is higher than that of normal men.
• Why? Who can tell me why?
3. Wilcoxon rank sum test for two samples
• At least one of the two populations from which the samples came is not normally distributed;
• The variances are not equal.
• What are the conditions for the t-test?
Example 8-3

• To compare the heights of girls and boys in our class, we randomly measured the heights of 8 girls and 10 boys.
• Is there any difference between the heights of boys and girls?
• If you encountered such data, how do you think they should be handled?
Table 8-3 The heights of girls and boys

Girls              Boys
Height   Rank      Height   Rank
172.78   9         168.23   8
163.23   1         173.50   10
164.20   2         174.04   11.5
164.87   3         174.15   13
165.12   4         177.28   16
166.21   5         174.34   14
167.18   6         177.47   17
168.05   7         174.04   11.5
                   174.75   15
                   184.82   18
n1=8     T1=37     n2=10    T2=134

How can we judge which group is taller? What do you think?

What is the difference between this method and the previous rank method?

• First: pool the two groups and rank all observations together
• Second: identical observations in different groups receive the average rank
• Then: compute the rank sum for each group separately
• Check the tables
• Determine the P-value and draw a conclusion
T-value table for two samples (indexed by the smaller n)
3. Steps of example 8-3

Step 1: H0: M1 = M2; H1: M1 ≠ M2; α = 0.05.
Step 2: Rank the raw values of the two groups together (Table 8-3; n_smaller = 8, n_bigger = 10).
Step 3: Take the rank sum of the smaller sample as the statistic. (Why?) From Table 8-3: n_smaller = 8, T1 = 37; n_bigger = 10, T2 = 134.
Step 4: P-value and conclusion. From the T-value table, Tα/2,(n1, n2-n1) = T0.05/2,(8, 10-8) = (54-98). T1 = 37 is outside this range, so P < 0.05 and H0 is rejected. Conclusion: the heights of boys and girls in our class are different.
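
The pooled ranking and the two rank sums of Table 8-3 can be reproduced with a short sketch. The helper `average_ranks` is a hypothetical name, and the critical range 54-98 is simply the value quoted from the T-value table above, not something the code derives.

```python
def average_ranks(values):
    """Rank = (number of smaller values) + (number of equal values + 1) / 2."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


# Heights from Table 8-3
girls = [172.78, 163.23, 164.20, 164.87, 165.12, 166.21, 167.18, 168.05]
boys = [168.23, 173.50, 174.04, 174.15, 177.28, 174.34, 177.47, 174.04, 174.75, 184.82]

# Rank all 18 heights together, ignoring the group labels
ranks = average_ranks(girls + boys)
T1 = sum(ranks[:len(girls)])  # rank sum of the smaller sample (girls)
T2 = sum(ranks[len(girls):])  # rank sum of the larger sample (boys)
print(T1, T2)  # 37.0 134.0

lower, upper = 54, 98  # critical range quoted in the text for n1 = 8, n2 - n1 = 2
print(T1 < lower or T1 > upper)  # True: T1 is outside the range
```

The two boys with height 174.04 share ranks 11 and 12, so each receives 11.5.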
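If n1 > 11 or n2 - n1 > 10, which is beyond the range of the T-value table, the normal approximation can be used. Let N = n1 + n2 and calculate the Z value:

$$Z = \frac{T - n_1(N+1)/2}{\sqrt{\dfrac{n_1 n_2 (N+1)}{12}\left(1 - \dfrac{\sum (t_j^3 - t_j)}{N^3 - N}\right)}}$$

T: the rank sum of the smaller sample (size n1); t_j: the number of observations sharing the j-th tied rank.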
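
A minimal sketch of this approximation, assuming the form above with no continuity correction. The function name `rank_sum_z` is hypothetical. The heights of Example 8-3 are used only to illustrate the arithmetic, since that example is small enough for the T-value table.

```python
import math
from collections import Counter


def rank_sum_z(T, n1, n2, pooled):
    """Normal approximation for the rank sum T of the sample of size n1.

    pooled: all n1 + n2 observations, used only to count ties.
    """
    N = n1 + n2
    # sum of (t_j^3 - t_j) over groups of tied observations
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (N**3 - N)
    variance = n1 * n2 * (N + 1) / 12 * correction
    return (T - n1 * (N + 1) / 2) / math.sqrt(variance)


# Illustration only: heights from Table 8-3, T1 = 37
girls = [172.78, 163.23, 164.20, 164.87, 165.12, 166.21, 167.18, 168.05]
boys = [168.23, 173.50, 174.04, 174.15, 177.28, 174.34, 177.47, 174.04, 174.75, 184.82]
z = rank_sum_z(37, len(girls), len(boys), girls + boys)
print(round(z, 2))  # -3.47
```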
4. Kruskal-Wallis' H test

• For comparing ≥ 2 samples.

There are two situations:
• Raw data;
• A contingency table with ordinal categories.
4.1 For raw data
• Example 8-4: The hospitalization times (days) under 3 treatments for a certain disease are shown below:

Table 8-4 The hospitalization time of treatments A, B and C

      A     Rank    B     Rank    C     Rank
      (1)   (2)     (3)   (4)     (5)   (6)
      3     4       9     13      1     1
      7     10      12    15      2     2.5
      7     10      11    14      6     7.5
      6     7.5     8     12      4     5
      2     2.5     5     6       7     10
ni    5             5             5
Ri          34            60            26
Steps of example 8-4
Step 1: Hypotheses.
H0: the distributions of the three populations are all the same.
H1: the distributions of the three populations are not all the same.
α = 0.05.
Step 2: Rank all the observations of the three samples together (ties are handled the same way as before).
Step 3: Compute the rank sum of each sample:
R1 = 34, R2 = 60, R3 = 26
Step 4: Statistic H.
If there are no ties:

$$H = \frac{12}{N(N+1)} \sum \frac{R_i^2}{n_i} - 3(N+1)$$

If there are ties, H is divided by a correction factor C, where t_j is the number of individuals in the j-th tie:

$$C = 1 - \frac{\sum (t_j^3 - t_j)}{N^3 - N}, \qquad H_C = \frac{H}{C}$$

Example 8-4:

$$H = \frac{12}{15(15+1)}\left(\frac{34^2}{5} + \frac{60^2}{5} + \frac{26^2}{5}\right) - 3(15+1) = 6.32$$

$$C = 1 - \frac{(2^3-2) + (2^3-2) + (3^3-3)}{15^3 - 15} = 0.9893, \qquad H_C = \frac{6.32}{0.9893} = 6.39$$
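
The calculation for Example 8-4 can be reproduced with a short sketch that ranks the pooled data, sums the ranks per group, and applies the tie correction. The function name `kruskal_wallis` is hypothetical.

```python
from collections import Counter


def kruskal_wallis(groups):
    """Return (H, C, Hc) for a list of samples, using average ranks for ties."""
    pooled = [x for g in groups for x in g]
    N = len(pooled)
    # average rank of each distinct value in the pooled data
    rank_of = {
        x: sum(v < x for v in pooled) + (pooled.count(x) + 1) / 2
        for x in set(pooled)
    }
    rank_sums = [sum(rank_of[x] for x in g) for g in groups]
    H = 12 / (N * (N + 1)) * sum(R**2 / len(g) for R, g in zip(rank_sums, groups)) - 3 * (N + 1)
    # correction factor for ties
    C = 1 - sum(t**3 - t for t in Counter(pooled).values()) / (N**3 - N)
    return H, C, H / C


# Hospitalization times from Table 8-4
A = [3, 7, 7, 6, 2]
B = [9, 12, 11, 8, 5]
C_group = [1, 2, 6, 4, 7]
H, C, Hc = kruskal_wallis([A, B, C_group])
print(round(H, 2), round(C, 4), round(Hc, 2))  # 6.32 0.9893 6.39
```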
Step 5: P-value and conclusion.
Compare with the critical value of H (H-value table as below):

H0.05 = 5.78, Hc = 6.39 > 5.78, so P < 0.05.

Conclusion: the hospitalization times (days) of the three treatments are not all equal.
H-value table (for 3 samples)

n     n1   n2   n3   P=0.05   P=0.01
7     3    2    2    4.71
7     3    3    1    5.14
8     3    3    2    5.36
8     4    2    2    5.33
8     4    3    1    5.21
8     5    2    1    5.00
9     3    3    3    5.60     7.20
9     4    3    2    5.44     6.44
9     4    4    1    4.97     6.67
9     5    2    2    5.16     6.53
9     5    3    1    4.96
10    4    3    3    5.73     6.75
10    4    4    2    5.49     7.04
10    5    3    2    5.25     6.82
10    5    4    1    4.99     6.95
11    4    4    3    5.60     7.14
11    5    3    3    5.65     7.08
11    5    4    2    5.27     7.12
11    5    5    1    5.13     7.31
12    4    4    4    5.69     7.65
12    5    4    3    5.63     7.44
12    5    5    2    5.34     7.27
13    5    4    4    5.62     7.76
13    5    5    3    5.71     7.54
14    5    5    4    5.64     7.79
15    5    5    5    5.78     7.98
4.2 For contingency table with ordinal categories

Example 8-5 Milk secretion and number of pregnancy weeks at delivery

Milk        Premature  Term   Postmature  Total  Range     Average  Rank sum
secretion   birth      birth  birth              of rank   rank     Premature  Term    Postmature
(1)         (2)        (3)    (4)         (5)    (6)       (7)      (8)        (9)     (10)
None        30         132    10          172    1-172     86.5     2595       11418   865
Less        36         292    14          342    173-514   343.5    12366      100302  4809
More        31         414    34          479    515-993   754      23374      312156  25636
Total       97         838    58          993                       38335      423876  31310

How do we rank such data? Every observation in a category receives the average rank of that category's rank range (columns 6 and 7).
Statistic H and conclusion

$$H = \frac{12}{993(993+1)}\left(\frac{38335^2}{97} + \frac{423876^2}{838} + \frac{31310^2}{58}\right) - 3(993+1) = 14.3$$

$$C = 1 - \frac{(172^3-172) + (342^3-342) + (479^3-479)}{993^3 - 993} = 0.8417, \qquad H_C = \frac{14.3}{0.8417} = 17.0$$

$$\chi^2_{0.05,2} = 5.99, \qquad H_C = 17.0 > 5.99, \qquad P < 0.05$$

Conclusion: milk secretion differs significantly among the three delivery groups (premature, term and postmature birth).
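
For a contingency table, the same statistic can be computed from the counts alone. The sketch below assumes that rows are the ordinal categories (lowest first) and columns are the groups being compared; the function name `kruskal_wallis_ordinal` is hypothetical.

```python
def kruskal_wallis_ordinal(table):
    """table[i][j]: count in ordinal category i (rows, low to high) and group j (columns).

    Returns (rank sums per group, H, C, Hc).
    """
    row_totals = [sum(row) for row in table]
    N = sum(row_totals)
    # every observation in a category gets the average of that category's rank range
    avg_ranks, start = [], 1
    for t in row_totals:
        avg_ranks.append((start + start + t - 1) / 2)
        start += t
    k = len(table[0])
    n = [sum(row[j] for row in table) for j in range(k)]  # group sizes
    R = [sum(row[j] * r for row, r in zip(table, avg_ranks)) for j in range(k)]
    H = 12 / (N * (N + 1)) * sum(Rj**2 / nj for Rj, nj in zip(R, n)) - 3 * (N + 1)
    # each category is one large tie of size row_total
    C = 1 - sum(t**3 - t for t in row_totals) / (N**3 - N)
    return R, H, C, H / C


# Counts from Example 8-5 (columns: premature, term, postmature birth)
table = [
    [30, 132, 10],  # none
    [36, 292, 14],  # less
    [31, 414, 34],  # more
]
R, H, C, Hc = kruskal_wallis_ordinal(table)
print(R)  # [38335.0, 423876.0, 31310.0]
print(round(H, 1), round(C, 4), round(Hc, 1))  # 14.3 0.8417 17.0
```

Because each category is one large tie, the correction factor is far from 1 here, and applying it changes H noticeably (14.3 to 17.0).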
Summary

• The difference between parametric tests and nonparametric tests
• Application of nonparametric tests
• Rank and rank sum
• Correction factor
