Practical On Nonparametric Statistical Tests
PROBLEM 1.
Suppose that each of 13 randomly chosen female registered voters was asked to indicate if she was
going to vote for candidate A or candidate B in an upcoming election. The result shows that 9 of the
voters preferred A. Is this sufficient evidence to conclude that candidate A is preferred to B by female
voters?
ANSWER:
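Let S be the number of the 13 voters who prefer A, and let p be the probability that a female voter prefers A. We apply the sign (binomial) test of
H0 : p = 0.5
Against
H1 : p > 0.5
Under H0, S ~ Binomial(13, 0.5). The critical value sα is the smallest integer satisfying
$P_{H_0}\{S \ge s_\alpha\} \le \alpha = 0.05$
$\Rightarrow \sum_{s=s_\alpha}^{13} \binom{13}{s} (0.5)^{13} \le 0.05$
From the binomial table for n = 13 and p = 0.5:
P(X = 13) = 0.0001, P(X = 12) = 0.0016, P(X = 11) = 0.0095, P(X = 10) = 0.0349, P(X = 9) = 0.0873
For sα = 12,
P(S ≥ 12) = 0.0017 < 0.05.
For sα = 11,
P(S ≥ 11) = 0.0112 < 0.05.
For sα = 10,
P(S ≥ 10) = 0.0461 < 0.05.
For sα = 9,
P(S ≥ 9) = 0.1334 > 0.05.
So sα = 10, and we reject H0 only if S ≥ 10. The observed value S = 9 is below the critical value, so we fail to reject the null hypothesis. There is not sufficient evidence that candidate A is preferred to B by female voters.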
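The tail probabilities used in the sign test can be checked numerically. The following is a minimal sketch, assuming H0: p = 0.5 and α = 0.05; the helper name `upper_tail` is illustrative and not part of the original practical.

```python
from math import comb

def upper_tail(n: int, s: int, p: float = 0.5) -> float:
    """Return P(S >= s) for S ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(s, n + 1))

n, observed, alpha = 13, 9, 0.05

# Critical value: smallest s whose upper-tail probability does not exceed alpha.
critical = next(s for s in range(n + 1) if upper_tail(n, s) <= alpha)

# p-value of the observed count under H0.
p_value = upper_tail(n, observed)

print("critical value:", critical)
print("P(S >= 9):", round(p_value, 4))
print("reject H0:", observed >= critical)
```

Comparing the observed count with the critical value and comparing the p-value with α are equivalent ways of reaching the same decision.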
PROBLEM 2.
Let Xi denote the length, in centimeters, of a randomly selected pygmy sunfish, i = 1, 2, ...10. If we
obtain the following data set 5.0 3.9 5.2 5.5 2.8 6.1 6.4 2.6 1.7 4.3 Can we conclude that the median
length of pygmy sunfish differs significantly from 3.7 centimeters?
ANSWER:
Since X is a continuous random variable, we can use the Wilcoxon signed-rank test. Let m denote the population median length.
H0 : m = 3.7
Against
H1 : m ≠ 3.7
The calculations required for the solution are shown in the table below:
X_i    D_i = X_i - 3.7    |D_i|    Rank(|D_i|)
5.0     1.3               1.3      5
3.9     0.2               0.2      1
5.2     1.5               1.5      6
5.5     1.8               1.8      7
2.8    -0.9               0.9      3
6.1     2.4               2.4      9
6.4     2.7               2.7      10
2.6    -1.1               1.1      4
1.7    -2.0               2.0      8
4.3     0.6               0.6      2
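The ranks for which D_i > 0 are 5, 1, 6, 7, 9, 10 and 2.
So, T+ = 5 + 1 + 6 + 7 + 9 + 10 + 2 = 40
The ranks for which D_i < 0 are 3, 4 and 8, so T- = 3 + 4 + 8 = 15 and T = min(T+, T-) = 15.
For n = 10 and α = 0.05 (two-tailed), the tabulated value from the Wilcoxon table is 8. Since T = 15 > 8, we fail to reject the null hypothesis.
Therefore, the median length of pygmy sunfish does not differ significantly from 3.7 centimeters.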
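The ranking step of the signed-rank test can be reproduced with a few lines of code. This is a minimal sketch that assumes no ties and no zero differences (true for this data set); the variable names are illustrative.

```python
data = [5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3]
m0 = 3.7  # hypothesised median

# Differences from the hypothesised median, rounded to remove float noise.
diffs = [round(x - m0, 10) for x in data]

# Rank the absolute differences in increasing order (rank 1 = smallest).
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
ranks = [0] * len(diffs)
for rank, i in enumerate(order, start=1):
    ranks[i] = rank

# Sum the ranks separately for positive and negative differences.
t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

print("ranks:", ranks)
print("T+ =", t_plus, " T- =", t_minus)
```

As a check, T+ and T- must add up to n(n + 1)/2 = 55 for n = 10.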
PROBLEM 3.
Five independent weighings of a standard weight (in gm × 10^-6) give the following discrepancies from the
supposed true weight: -1.2, 0.2, -0.6, 0.8, -1.0.
Are the discrepancies sampled from N(0, 1)?
ANSWER:
We arrange the data in increasing order and prepare a table containing the empirical cdf F_n(x) as well as the
standard normal cdf Φ(x).
X       F_n(X)       Φ(X)               |F_n(X) - Φ(X)|
-1.2    1/5 = 0.2    Φ(-1.2) = 0.115    0.085
-1.0    2/5 = 0.4    Φ(-1.0) = 0.159    0.241
-0.6    3/5 = 0.6    Φ(-0.6) = 0.274    0.326
0.2     4/5 = 0.8    Φ(0.2) = 0.579     0.221
0.8     5/5 = 1.0    Φ(0.8) = 0.788     0.212
From the table, the maximum difference is 0.326. So, D_n (calculated) = 0.326.
For n = 5 and α = 0.05, the tabulated value of D_n (obtained from the K-S table) is 0.565.
Since the tabulated value is greater than the calculated value, we fail to reject the null hypothesis at the 5%
level of significance.
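The columns of the table can be recomputed directly. The sketch below assumes the signed discrepancies that appear in the table (-1.2, -1.0, -0.6, 0.2, 0.8) and evaluates Φ through the error function; `phi` is an illustrative helper name.

```python
from math import erf, sqrt

def phi(x: float) -> float:
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

sample = sorted([-1.2, 0.2, -0.6, 0.8, -1.0])
n = len(sample)

# One row per ordered observation: x, F_n(x), Phi(x), |F_n(x) - Phi(x)|.
rows = []
for i, x in enumerate(sample, start=1):
    fn = i / n
    rows.append((x, fn, phi(x), abs(fn - phi(x))))

d_n = max(row[3] for row in rows)

for x, fn, p, diff in rows:
    print(f"{x:5.1f} {fn:4.1f} {p:6.3f} {diff:6.3f}")
print(f"D_n = {d_n:.3f}")
```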
PROBLEM 4.
Ten observations are collected from a continuous distribution. Check whether they come from Uniform(0, 1).
0.11, 0.32, 0.51, 0.57, 0.53, 0.44, 0.60, 0.63, 0.65, 0.69.
ANSWER:
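H0 : F(x) = F0(x), where F0(x) = x, 0 < x < 1, is the distribution function of Uniform(0, 1).
Arranging the observations in increasing order and comparing the empirical cdf F_n(x) with F0(x) = x, the maximum difference is D_n = 0.31, attained at x = 0.69.
Now, for n = 10 and α = 0.05, the tabular value of D_n (obtained from the K-S table) is 0.41.
Since the calculated value of D_n is smaller than the tabulated value of D_n, we fail to reject the null
hypothesis.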
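The statistic D_n for this problem can be computed with a small general-purpose function. This is a sketch under the assumption that D_n is taken as the largest gap between the hypothesised cdf and the empirical cdf, checked just after and just before each jump; `ks_statistic` and `uniform_cdf` are illustrative names, not library functions.

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic against a continuous cdf."""
    xs = sorted(sample)
    n = len(xs)
    # Gap just after each jump of the empirical cdf.
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    # Gap just before each jump of the empirical cdf.
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return max(d_plus, d_minus)

def uniform_cdf(x: float) -> float:
    """Distribution function of Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)

data = [0.11, 0.32, 0.51, 0.57, 0.53, 0.44, 0.6, 0.63, 0.65, 0.69]
critical = 0.41  # tabulated value quoted in the text for n = 10, alpha = 0.05

d_n = ks_statistic(data, uniform_cdf)
print(f"D_n = {d_n:.2f}, reject H0: {d_n > critical}")
```

Passing a different cdf to the same function covers the other goodness-of-fit problems in this practical.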
PROBLEM 5.
A natural reserve in Australia has had 15 fires since the beginning of this year. The fires occurred on the
following days of the year: 4, 18, 32, 37, 56, 64, 78, 89, 104, 134, 154, 178, 190, 220, 256.
A researcher claims that the time between the occurrences of fire in the reserve, say X, follows an
exponential distribution with parameter λ, i.e. f(x) = λ exp(-λx), x > 0, where λ = 0.009. Is the claim
justified?
(Find the distribution function F(x) of the exponential distribution.)
ANSWER:
Let X be the time between the occurrences of fire and F(x) be its distribution function.
The null hypothesis is as follows:
H0 : F(x) = F0(x), where F0(x) = 1 - exp(-0.009x), x > 0, is the distribution function of the exponential distribution with λ = 0.009.
Now, for n = 15 and α = 0.05, the tabular value of D_n (obtained from the K-S table) is 0.338.
Since the calculated value of D_n is smaller than the tabulated value of D_n, we fail to reject the null
hypothesis.
Therefore, the data are consistent with an exponential distribution with parameter λ = 0.009.
PROBLEM 6.
The data below represent earnings (in dollars) for a random sample of five common stocks listed on the
New York Stock Exchange. 1.68,3.35,2.50,6.23,3.24. Check if these data can be regarded as a random
sample from a normal distribution with µ = 3 and σ = 1. i.e N(3,1)
ANSWER:
Let X be the earnings in dollars and F_X(x) be its distribution function.
We want to check whether X ~ N(3, 1).
Null hypothesis: the data come from N(3, 1).
Since X is a continuous random variable, we apply the Kolmogorov-Smirnov test.
Now, let $z = \frac{x - \mu}{\sigma} = \frac{x - 3}{1}$, so that $Z \sim N(0, 1)$ under the null hypothesis.
Therefore, the null hypothesis can be restated as H0 : F(z) = Φ(z),
where Φ(z) is the distribution function of N(0, 1).
The calculations required are given in the table below:
x       z        F_n(z)    Φ(z)                 |F_n(z) - Φ(z)|
1.68    -1.32    0.2       Φ(-1.32) = 0.0934    0.1066
2.50    -0.50    0.4       Φ(-0.50) = 0.3085    0.0915
3.24    0.24     0.6       Φ(0.24) = 0.5948     0.0052
3.35    0.35     0.8       Φ(0.35) = 0.6368     0.1632
6.23    3.23     1.0       Φ(3.23) = 0.9994     0.0006
The maximum difference is 0.1632, so the calculated value of D_n is 0.1632.
Now, for n = 5 and α = 0.05, the tabular value of D_n (obtained from the K-S table) is 0.565.
Since the calculated value of D_n is smaller than the tabulated value of D_n, we fail to reject the null
hypothesis.
PROBLEM 7.
The following data show the age at diagnosis of type II diabetes in young adults. Is the age at diagnosis
for males higher than that for females?
Males : 19 22 16 29 24
Females : 20 11 17 12
ANSWER:
Let X denote the age at diagnosis for males and Y the age at diagnosis for females. Suppose X and Y come from
two independent continuous populations with distribution functions F_X(x) and F_Y(x) respectively, with n1 = 5,
n2 = 4. We want to test H0 : F_X(x) = F_Y(x) against H1 : F_X(x) ≤ F_Y(x) for all x, with strict inequality for some x (that is, males tend to be diagnosed at higher ages).
Combining the two samples and arranging the observations in increasing order, with X marking a male and Y
marking a female, we get:
11 12 16 17 19 20 22 24 29
Y Y X Y X Y X X X
For α = 0.05, n1 = 5, n2 = 4 the value of U_tabular = 2 (from the one-tailed Mann-Whitney table).
Since U_tabular < U_x, we fail to reject the null hypothesis at the 5% level of significance.
Thus, there is not sufficient evidence that the age at diagnosis for males is higher than that for females.
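The Mann-Whitney statistic for this problem can be obtained by direct pair counting. The text does not print the value of U_x, so this sketch assumes the common convention that the smaller of the two pair counts is compared with the tabulated value; the variable names are illustrative.

```python
males = [19, 22, 16, 29, 24]   # X sample
females = [20, 11, 17, 12]     # Y sample

# Count the (male, female) pairs in which one observation exceeds the other.
u_male_greater = sum(1 for x in males for y in females if x > y)
u_female_greater = sum(1 for x in males for y in females if y > x)

# With no ties, the two counts add up to n1 * n2.
u = min(u_male_greater, u_female_greater)
u_tabular = 2  # one-tailed value quoted in the text for n1 = 5, n2 = 4

print("pairs with X > Y:", u_male_greater)
print("pairs with Y > X:", u_female_greater)
print("reject H0:", u <= u_tabular)
```

Either count exceeds the tabulated value here, so the decision does not depend on which one is labelled U_x.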
PROBLEM 8.
Census statistics for the state of Alabama, USA, give the percentage population change in 9 rural and
7 urban districts.
ANSWER:
Let X denote the population change in a rural area and Y denote the population change in an urban area.
Suppose that both X and Y follow continuous independent distributions with distribution functions F_X(x)
and F_Y(x) respectively.
Combining the two samples and arranging the observations in increasing order, we get:
-21.7(X), -16.3(X), -11.3(X), -10.4(X), -7(X), -2.4(Y), -2(X), 1.10(X), 1.9(X), 6.2(X), 7.4(X), 9.9(Y), 14.2(Y),
19.4(Y), 20.1(Y), 23.4(Y)
For α = 0.05, n1 = 9, n2 = 7 the value of U_tabular = 12 (from the two-tailed Mann-Whitney U table).
Hence, the population changes in urban and rural areas are not the same.
PROBLEM 9.
A production manager believes that playing some music in the production area will help to reduce the
number of defective items. 4 workers are randomly assigned to work a usual day without music and 5
workers are assigned to work with music. The numbers of defective items produced are given below. Check the manager's claim.
Without music : 3 4 9 10
With music : 1 2 5 7 8
ANSWER:
Let X denote the number of defective items without music and Y denote the number of defective items with music.
𝑛1 = 4, 𝑛2 = 5
To test,
Combining the two samples and arranging the observations in mixed sample we get:
For α = 0.05, n1 = 4, n2 = 5 the value of U_tabular = 2 (from the one-tailed Mann-Whitney table).
Since U_tabular = 2 < U_x = 16, we fail to reject the null hypothesis at the 5% level of significance.
Thus, the data do not support the manager's claim that playing some music in the production area helps to reduce the number of
defective items.
PROBLEM 10.
To determine whether a new hybrid seed produces bushier flowering plants, the following data on the girth of each
shrub's sapling were collected. Examine whether the data indicate that the new hybrid produces different
shrubs than the current variety.
Shrubs Girth(inches)
ANSWER:
Let X denote the girth of the current variety of flowering plant and Y denote the girth of the hybrid
variety. The distribution functions of X and Y are F_X(x) and F_Y(y) respectively.
n1 = 5, n2 = 6
To test,
Combining the two samples and arranging the observations in mixed sample we get:
21.3(X), 24.8(X), 27.6(X), 30(Y), 31.8(Y), 32.8(Y), 34.7(Y), 35.5(X), 36(Y), 36.7(X), 39.2(Y)
For 𝛼 = 0.05, 𝑛1 = 5, 𝑛2 = 6
Since U_tabular < U_x, we fail to reject the null hypothesis at the 5% level of significance.
Thus, there is no significant evidence that the new hybrid produces different shrubs than the current variety.
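U can also be obtained from the rank sum of the mixed sample rather than by counting pairs. The following is a minimal sketch using the labelled mixed sample listed above; which of the two values a given table calls U_x is a convention, so both are computed.

```python
mixed = [(21.3, "X"), (24.8, "X"), (27.6, "X"), (30.0, "Y"), (31.8, "Y"),
         (32.8, "Y"), (34.7, "Y"), (35.5, "X"), (36.0, "Y"), (36.7, "X"),
         (39.2, "Y")]

ordered = sorted(mixed)  # increasing order of girth; no ties in this sample
n_x = sum(1 for _, label in ordered if label == "X")
n_y = len(ordered) - n_x

# Sum of the ranks (1 = smallest) held by the X observations.
rank_sum_x = sum(rank for rank, (_, label) in enumerate(ordered, start=1)
                 if label == "X")

u_x = rank_sum_x - n_x * (n_x + 1) // 2  # number of pairs with X > Y
u_y = n_x * n_y - u_x                    # number of pairs with Y > X

print("rank sum of X:", rank_sum_x)
print("U_x =", u_x, " U_y =", u_y)
```

The identity U_x + U_y = n1 × n2 gives a quick arithmetic check on the ranks.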
PROBLEM 11.
A study assessed the effectiveness of a new drug designed to reduce repetitive behaviors in children
affected with autism. A total of 8 children with autism enrolled in the study, and the amount of time that
each child was engaged in repetitive behavior during three-hour observation periods was measured both
before treatment and again after taking the new medication for a period of 1 week. The data are
shown below. Test whether the new medication is better.
ANSWER:
Let X and Y denote the amount of time that each child is engaged in repetitive behavior before and after
one week of treatment respectively.
Suppose X and Y follow continuous distributions. The data points are paired, so we apply the Wilcoxon signed-rank
test for paired data. Here, the number of observations is 8.
To test,
H0 : μ_d = 0 against H1 : μ_d < 0
The calculations required are shown in the table:
From the Wilcoxon table (one-tailed test) the tabular value of T is 5 for n = 8 and α = 0.05.
Since T+ = 15 > T+ (tabular) = 5, we fail to reject the null hypothesis at the 5% level of significance.
PROBLEM 12.
The table below shows the hours of relief provided by two analgesic drugs in 12 patients suffering from
arthritis. Is there any evidence that one drug provides longer relief than the other?
ANSWER:
Let X and Y denote the hours of relief provided by Drug A and Drug B respectively.
Suppose X and Y follow continuous distributions. Each sample point is a paired observation.
We apply the Wilcoxon signed-rank test for paired data.
Number of observations, n = 12
To test,
H0 : μ_d = 0 against H1 : μ_d < 0
The calculations required are shown in the table:
From the Wilcoxon table (one tail test) the tabular value of T is 10 for n=12 and alpha =0.005
Since, 𝑇 + = 71 > 𝑇𝑡𝑎𝑏𝑢𝑙𝑎𝑟 = 10, we fail to reject the null hypothesis at 5% level of significance.
PROBLEM 13.
Let X and Y denote the times in hours per week that students in two different schools watch television.
Let F(x) and G(y) denote the respective distributions. To test the null hypothesis F(z) = G(z), a random
sample of eight students was selected from each school, yielding the following results:
What conclusion should we make about the equality of the two distribution functions?
ANSWER:
To test,
13(Y), 15.50(X), 16.75(X), 17.25(X), 17.50(Y), 19(Y), 19.25(X), 19.75(Y), 20.50(X), 20.75(X), 21.50(Y),
22(X), 22.50(Y), 22.75(Y), 23.50(Y), 24.75(Y)
PROBLEM 14.
A series of 20 coin tosses might produce the following sequence of heads (H) and tails (T).
H H T T H T H H H H T H H T T T T T H H.
ANSWER:
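To test,
H0 : the sequence of heads and tails is random
Against
H1 : the sequence is not random
The runs in the sequence are (H H) (T T) (H) (T) (H H H H) (T) (H H) (T T T T T) (H H).
So, W = 9
Now, for n1 = number of heads = 11, n2 = number of tails = 9 and α = 0.05, W_tabular = 6, 16.
Since 6 < W_calculated = 9 < 16, we fail to reject H0. The observations occur in a random manner.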
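Counting runs by hand is error-prone for longer sequences. A minimal sketch using `itertools.groupby`, which groups consecutive equal symbols, is given below; the critical values 6 and 16 are the tabulated ones quoted in the text.

```python
from itertools import groupby

tosses = "H H T T H T H H H H T H H T T T T T H H".split()

# Each group of consecutive identical outcomes is one run: (symbol, length).
runs = [(symbol, len(list(group))) for symbol, group in groupby(tosses)]

w = len(runs)
n_heads = tosses.count("H")
n_tails = tosses.count("T")

lower, upper = 6, 16  # tabulated values for n1 = 11, n2 = 9, alpha = 0.05
print("runs:", runs)
print("W =", w, " heads =", n_heads, " tails =", n_tails)
print("reject H0:", w <= lower or w >= upper)
```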
PROBLEM 15.
A quality control chart has been maintained for the weights of paint cans taken from a conveyor belt at
a fixed point in a production line. Sixteen (16) weights obtained today, in order of time, are as follows:
68.2 71.6 69.3 71.6 70.4 65.0 63.6 64.7 65.3 64.2 67.6 68.6 66.8 68.9 66.8 70.1
Use the run test, at approximately 0.05 level, to determine whether the weights of the paint cans on the
conveyor belt deviate from randomness.
ANSWER:
The data are quantitative, so we first find the median of the observations and then check which observations are
greater than the median and which are smaller. Arranged in increasing order, the observations are:
63.6 64.2 64.7 65.0 65.3 66.8 66.8 67.6 68.2 68.6 68.9 69.3 70.1 70.4 71.6 71.6
The median weight of the sampled paint cans is (67.6 + 68.2)/2 = 67.9. Labeling each observed weight, in order of time, with a U if the
observed weight is greater than the median (67.9), or an L if the observed weight is less than the median
(67.9), we get: (U U U U U) (L L L L L L) (U) (L) (U) (L) (U)
Now, W_calculated = number of runs = 7, and from the table we get W_tabular = 4, 14 for n1 = number of
U's = 8 and n2 = number of L's = 8. Therefore, the critical region is W ≤ 4 or W ≥ 14.
Since W = 7 lies outside the critical region, we fail to reject the null hypothesis. The weights of the paint cans on the
production line occur in a random manner.
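The median split and the run count can be checked together. This sketch assumes the sixteen weights in time order, with 68.9 as the fourteenth weight, as implied by the ordered list and the run pattern; observations equal to the median (none here) would be dropped.

```python
from itertools import groupby
from statistics import median

weights = [68.2, 71.6, 69.3, 71.6, 70.4, 65.0, 63.6, 64.7,
           65.3, 64.2, 67.6, 68.6, 66.8, 68.9, 66.8, 70.1]

med = median(weights)

# Label each weight, in time order, as above (U) or below (L) the median.
labels = ["U" if w > med else "L" for w in weights if w != med]

# A run is a maximal block of consecutive identical labels.
n_runs = sum(1 for _ in groupby(labels))
n_upper = labels.count("U")
n_lower = labels.count("L")

print("median =", round(med, 1))
print("labels:", " ".join(labels))
print("W =", n_runs, " n1 =", n_upper, " n2 =", n_lower)
```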