Research Methodology - 11,12
Hypothesis:-
A logical relationship between two or more variables expressed in the form of a
testable statement. It is an intelligent guess of the probable solution to the problem.
Hypotheses are derived from the theoretical framework formulated for the research.
A hypothesis is also an assumption about the population parameter.
A parameter is a characteristic of the population, like its mean or variance.
The parameter must be identified before analysis.
Characteristics of hypothesis:
Hypothesis must possess the following characteristics:
Hypothesis should be clear and precise.
Hypothesis should be capable of being tested.
Hypothesis should state relationship between variables, if it happens to be a
relational hypothesis.
Hypothesis should be consistent with most known facts i.e., it must be consistent
with a substantial body of established facts. In other words, it should be one that
judges accept as being the most likely.
Hypothesis should be amenable to testing within a reasonable time.
IMPORTANT CONCEPTS:-
Null hypothesis:-
Null hypothesis assumes no difference or no relationship between the two
hypothesized variables.
It assumes there is no difference between population parameter and sample
characteristic.
It is represented as H0.
It always contains the '=' sign (as =, <= or >=).
It indicates unbiased attitude of the researcher to the research.
A statement can be tested statistically.
Consider the following examples:-
1) H0 : μ>=60 (The mean score of the class is at least 60)
2) H0 : μ1= μ2 (There is no difference in average scores of two classes)
Note:-
If we want to test the significance of the difference between the statistic and
parameter or between two sample statistics then we set up a null hypothesis that
the difference is not significant.
e.g. H0: μ = x̄ (Population mean is same as sample mean)
If we want to test any statement about the population, we set up null hypothesis
that it is true.
e.g. H0: μ= 65 cms (Population mean is equal to stated value, say 65cms)
If we test relationship between two variables then we state that there is no
relationship between two variables.
Alternate hypothesis:-
1) It assumes some difference or some relationship between the hypothesized
variables.
2) It is represented as HA or H1.
3) It can be tested statistically as null hypothesis.
4) The rejection of null hypothesis results in acceptance of alternate hypothesis.
Examples 1 & 2 differ from examples 3 & 4. Examples 1 & 2 tell us that the variables are
unequal, but they do not tell us which variable has the higher value; that is, they do not
give the DIRECTION of the difference. Hence, they are known as NON-DIRECTIONAL
HYPOTHESES. Examples 3 & 4, in contrast, not only tell us that the variables mentioned
therein are unequal in magnitude but also tell us which variable has the higher
magnitude. They give the direction of the difference. Hence, they are
known as DIRECTIONAL HYPOTHESES.
The following considerations are usually kept in view in formation of hypothesis:
Alternative hypothesis is usually the one, which one wishes to prove and the null
hypothesis is the one, which one wishes to disprove. Thus, a null hypothesis
represents the hypothesis we are trying to reject, and alternative hypothesis
represents all other possibilities.
If the rejection of a certain hypothesis when it is actually true involves great risk, it is
taken as the null hypothesis, because then the probability of rejecting it when it is true is
α (the level of significance), which is chosen to be very small.
Null hypothesis should always be specific hypothesis i.e., it should not state about or
approximately a certain value.
Decision rule or test of hypothesis:
Given a hypothesis H0 and an alternative hypothesis Ha, we make a rule, known as the
decision rule, according to which we accept H0 (i.e., reject Ha) or reject H0 (i.e., accept
Ha). For instance, if H0 is that a certain lot is good (there are very few defective items in
it) against Ha that the lot is not good (there are too many defective items in it), then we
must decide the number of items to be tested and the criterion for accepting or rejecting
the hypothesis. We might test 10 items in the lot and plan our decision saying that if
there are none or only 1 defective item among the 10, we will accept H0; otherwise we
will reject H0 (or accept Ha). This sort of basis is known as a decision rule.
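The lot-inspection rule above can be explored numerically. The sketch below is only an illustration: it assumes each tested item is defective independently with the same probability (a binomial model), and the defect rates it tries are made-up values, not figures from these notes. The helper name `prob_accept` is hypothetical.

```python
from math import comb


def prob_accept(n: int, max_defects: int, defect_rate: float) -> float:
    """Probability of seeing at most `max_defects` defectives among n tested items."""
    return sum(
        comb(n, k) * defect_rate**k * (1 - defect_rate) ** (n - k)
        for k in range(max_defects + 1)
    )


# Rule from the text: test 10 items and accept H0 if none or only 1 is defective.
# The defect rates below are illustrative assumptions, not values from the text.
for rate in (0.01, 0.05, 0.20):
    print(f"true defect rate {rate:.2f}: P(accept H0) = {prob_accept(10, 1, rate):.3f}")
```

The output shows how often the rule would accept the lot for each assumed defect rate, which is one way to judge whether a rule of "at most 1 defective in 10" is strict enough.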
Symbolically, the two-tailed test is appropriate when we have
H0: μ = μH0 and Ha: μ ≠ μH0 (where μH0 is the hypothesized value of the population mean).
Thus, in a two-tailed test, there are two rejection regions, one on each tail of the curve,
which can be illustrated as under:
If the significance level is 5 per cent and the two-tailed test is to be applied, the
probability of the rejection area will be 0.05 (split equally between the two tails of the
curve as 0.025 each) and that of the acceptance region will be 0.95, as shown in the above
curve. If we take μ = 100 and if our sample mean deviates significantly from 100 in either
direction, then we shall reject the null hypothesis; but if the sample mean does not deviate
significantly from μ, we shall accept the null hypothesis. But there are
situations when only a one-tailed test is considered appropriate. A one-tailed test would be
used when we are to test, say, whether the population mean is either lower than or
higher than some hypothesized value. For instance, if our
H0: μ = μH0 and Ha: μ < μH0, then we are interested in what is known as a
left-tailed test (wherein there is one rejection region only on the left tail), which can be
illustrated as below:
If our μ = 100 and if our sample mean deviates significantly from 100 in the lower
direction, we shall reject H0; otherwise we shall accept H0 at a certain level of
significance. If the significance level in the given case is kept at 5%, then the rejection
region will be equal to 0.05 of area in the left tail, as has been shown in the above curve.
In case our null and alternate hypotheses are:-
H0: μ = μH0 and Ha: μ > μH0,
we are then interested in what is known as a one-tailed test (right tail), and the rejection
region will be on the right tail of the curve as shown below:
If our μ = 100 and if our sample mean deviates significantly from 100 in the upward
direction, we shall reject H0; otherwise we shall accept the same. If in the given case the
significance level is kept at 5%, then the rejection region will be equal to 0.05 of area in
the right tail, as has been shown in the above curve.
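The boundaries of the rejection regions described above can be computed rather than read from a table. The following is a minimal sketch that assumes the test statistic follows the standard normal distribution; the function name `critical_z` and its `tail` argument are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def critical_z(alpha: float, tail: str) -> tuple[float, ...]:
    """Critical z value(s) marking the rejection region for a given tail type."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split equally over both tails
        return (-z, z)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    raise ValueError("tail must be 'two', 'left' or 'right'")


for tail in ("two", "left", "right"):
    print(tail, [round(z, 3) for z in critical_z(0.05, tail)])
```

At the 5% level this gives roughly ±1.96 for the two-tailed test and ∓1.645 for the left- and right-tailed tests.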
IMPORTANT PARAMETRIC TESTS
The important parametric tests are: (1) z-test; (2) t-test; (3) χ²-test, and (4) F-test. All
these tests are based on the assumption of normality i.e., the source of data is considered
to be normally distributed. In some cases the population may not be normally
distributed, yet the tests will be applicable on account of the fact that we mostly deal
with samples and the sampling distributions closely approach normal distributions.
z-test is based on the normal probability distribution and is used for judging the
significance of several statistical measures, particularly the mean. The relevant test
statistic, z, is worked out and compared with its probable value (to be read from table
showing area under normal curve) at a specified level of significance for judging the
significance of the measure concerned. This is a most frequently used test in research
studies. This test is used even when binomial distribution or t-distribution is applicable
on the presumption that such a distribution tends to approximate normal distribution as
‘n’ becomes larger. Z-test is generally used for comparing the mean of a sample to some
hypothesized mean for the population in case of large sample, or when population
variance is known. Z-test is also used for judging the significance of difference between
means of two independent samples in case of large samples, or when population
variance is known. Z-test is also used for comparing the sample proportion to a
theoretical value of population proportion or for judging the difference in proportions of
two independent samples when n happens to be large. Besides, this test may be used for
judging the significance of median, mode, coefficient of correlation and several other
measures.
T-test is based on t-distribution and is considered an appropriate test for judging the
significance of a sample mean or for judging the significance of difference between the
means of two samples in case of small sample(s) when population variance is not known
(in which case we use variance of the sample as an estimate of the population variance).
In case two samples are related, we use paired t-test (or what is known as difference
test) for judging the significance of the mean of difference between the two related
samples. It can also be used for judging the significance of the coefficients of simple and
partial correlations. The relevant test statistic, t, is calculated from the sample data and
then compared with its probable value based on t-distribution (to be read from the table
that gives probable values of t for different levels of significance for different degrees of
freedom) at a specified level of significance for concerning degrees of freedom for
accepting or rejecting the null hypothesis. It may be noted that t-test applies only in case
of small sample(s) when population variance is unknown.
Difference between parametric and Non-parametric Test
1. A statistical test, in which specific assumptions are made about the population
parameter, is known as the parametric test. A statistical test used in the case of
non-metric independent variables is called nonparametric test.
2. In the parametric test, the test statistic is based on distribution. On the other
hand, the test statistic is arbitrary in the case of the nonparametric test.
3. In the parametric test, it is assumed that the variables of interest are measured
on an interval or ratio scale, whereas in the nonparametric test the variables of
interest are measured on a nominal or ordinal scale.
4. In general, the measure of central tendency in the parametric test is the mean, while
in the nonparametric test it is the median.
5. In the parametric test, there is complete information about the population.
Conversely, in the nonparametric test, there is no information about the
population.
6. The applicability of parametric test is for variables only, whereas nonparametric
test applies to both variables and attributes.
7. For measuring the degree of association between two quantitative variables,
Pearson's coefficient of correlation is used in the parametric test, while
Spearman's rank correlation is used in the nonparametric test.
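Formula:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where x̄ is the sample mean, μ is the hypothesized population mean, σ is the population
standard deviation, and n is the sample size.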
Example 1 (one-tailed test): A herd of 1,500 steers was fed a special high-protein
grain for a month. A random sample of 29 was weighed and had gained an average of 6.7
pounds. If the standard deviation of weight gain for the entire herd is 7.1 pounds, was the
average weight gain per steer for the month greater than 5 pounds, so that feeding the
grain should continue (use 5% level of significance)?
Solution:-
Given Data:-
Sample Size (n) = 29, Sample mean (x̄) = 6.7 pounds, Std. Dev. of population (σ)
= 7.1 pounds, Level of significance (α) = 0.05
Null hypothesis: H0: μ <= 5 (the average weight gain per steer is less than or equal to 5
pounds)
Alternate hypothesis: Ha: μ > 5 (the average weight gain per steer is greater than 5
pounds)
Since alternate hypothesis contains > (greater than), this is one tailed (right tailed)
test.
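$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{6.7 - 5}{7.1 / \sqrt{29}} = 1.289$$
Since the calculated value of Z (1.289) is less than the critical value of z for a right-tailed
test at the 5% level of significance (1.645), we do not reject the null hypothesis, i.e.
we cannot say that the average weight gain per steer is greater than 5 pounds, and the
sample does not by itself justify continuing with the high-protein grain.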
Solution:-
Given Data:-
Sample Size (n) = 90, Sample mean (x̄) = 65, Std. Dev. of population (σ) = 13, Level of
significance (α) = 0.05
There are two possible ways that the class may differ from the population. Its scores may
be lower than, or higher than, the population of all students taking the test; therefore,
this problem requires a two-tailed test. Alternate hypothesis contains ≠ sign.
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{65 - 68}{13 / \sqrt{90}} = -2.19$$
Step 4) Decision rule:-
Since the calculated value of Z (-2.19) is less than the lower critical value of z at the 5%
level of significance (-1.96), we reject the null hypothesis and accept the alternate
hypothesis, i.e. we say that the mean score of the class is not 68.
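The calculation and decision for this class-score example can be checked with a short script. This is a sketch under the example's own assumptions (known population standard deviation, normal sampling distribution); the hypothesized mean of 68 is taken from the decision step above, and the function name `one_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """z statistic for a sample mean when the population standard deviation is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


z = one_sample_z(sample_mean=65, mu0=68, sigma=13, n=90)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tailed test at the 5% level
p_value = 2 * NormalDist().cdf(-abs(z))       # area in both tails beyond |z|
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```

Comparing |z| with the critical value and comparing the p-value with α are two views of the same decision, so the two always agree.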
$$t = \frac{\bar{x} - \mu}{S / \sqrt{N}}$$
where x̄ is the sample mean, µ is a specified value to be tested, S is the sample standard
deviation, and N is the size of the sample. When the standard deviation of the sample is
substituted for the standard deviation of the population, the statistic does not have a
normal distribution; it has what is called the T-distribution. Because there is a different
T-distribution for each sample size, it is not practical to list a separate area-of-the-curve
table for each one. Instead, critical T-values for common alpha levels (.05, .01, .001, and
so forth) are usually given in a single table for a range of sample sizes. For very large
samples, the T-distribution approximates the standard normal ( Z) distribution.
Values in the T-table are not actually listed by sample size but by degrees of freedom
(DF). The number of degrees of freedom for a problem involving the T-distribution for
sample size N is simply N – 1 for a one-sample mean problem.
Degrees of Freedom
Suppose I tell you that I have a sample of n=4 scores, and that the first three scores are 2,
3, and 5. What is the value of the 4th score? You can't tell me, given only that n = 4. It
could be anything. In other words, all of the scores, including the last one, are free to
vary: df = n for a sample mean. To calculate t, you must first calculate the sample
standard deviation. The conceptual formula for the sample standard deviation is:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
Suppose that the last score in my sample of 4 scores is a 6. That would make the sample
mean equal to (2+3+5+6)/4 = 4. As shown in Table 2, the deviation scores for the first 3
scores are -2,-1, and 1.
Table: Illustration of degrees of freedom for sample standard deviation
Score (X)   Mean   Deviation (X − X̄)
2           4      -2
3           4      -1
5           4      1
6           4      x4
Using only the information shown in the final column of the Table, you can deduce that x4,
the 4th deviation score, is equal to 2. How so? Because by definition, the sum of the
deviations about the mean = 0. This is another way of saying that the mean is the exact
balancing point of the distribution. In symbols,
$$\sum (X - \bar{X}) = 0$$
So, once you have n-1 of the (X − X̄) deviation scores, the final deviation score is
determined. That is, the first n-1 deviation scores are free to vary, but the final one is not.
There are n-1 degrees of freedom whenever you calculate a sample variance (or
standard deviation).
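The argument about n-1 degrees of freedom can be verified with the four scores used in this section. The sketch below uses only the scores 2, 3, 5 and 6 given in the text; the variable names are illustrative.

```python
from math import sqrt

scores = [2, 3, 5, 6]
n = len(scores)
mean = sum(scores) / n
deviations = [x - mean for x in scores]

# Deviations about the mean always sum to zero, so the first n-1 of them
# fix the last one: it must cancel their total.
implied_last = -sum(deviations[:-1])

# Sample standard deviation divides by n-1, the number of free deviations.
sample_sd = sqrt(sum(d * d for d in deviations) / (n - 1))

print("deviations:", deviations)
print("last deviation implied by the others:", implied_last)
print(f"sample standard deviation: {sample_sd:.4f}")
```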
Solution:-
Given Data:-
Sample Size (n) =6, Level of significance (α) =0.1 (since you want to be 90% sure)
Step 2) Specify significance level and selection of test:-
Significance level is 10% (α = 0.1, as given).
Since the sample standard deviation is not given, we calculate it from the sample using the formula,
s = sqrt(((62 − 79.16)² + (92 − 79.16)² + (75 − 79.16)² + (68 − 79.16)² + (83 − 79.16)² +
(95 − 79.16)²) / (6 − 1))
= 13.17
$$t = \frac{|\bar{x} - \mu|}{s / \sqrt{n}} = 1.71$$
The T-distribution is particularly useful for tests with small samples (N < 30).
Example 2 (two-tailed test): A Little League baseball coach wants to know if his team
is representative of other teams in scoring runs. Nationally, the average number of runs
scored by a Little League team in a game is 5.7. He chooses five games at random in
which his team scored 5, 9, 4, 11, and 8 runs. Is it likely that his team's scores could have
come from the national distribution? Assume an alpha level of .05.
Because the team's scoring rate could be either higher than or lower than the national
average, the problem calls for a two-tailed test.
Solution:-
Given Data:-
Sample Size (n) =5, Level of significance (α) =0.05.
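Sample mean (x̄) = 7.4 runs, Sample Std. Dev. (s) = 2.88 runs (both computed from the five scores)
$$t = \frac{|\bar{x} - \mu|}{s / \sqrt{n}} = \frac{|7.4 - 5.7|}{2.88 / \sqrt{5}} = 1.32$$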
Step 4) Decision rule:-
The tabled value for T.025, 4 is 2.776. The computed T of 1.32 is smaller than the T from
the table, so you accept the null hypothesis that the mean of this team is equal to the
population mean. The coach can conclude that his team fits in with the national
distribution on runs scored.
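The Little League calculation can be reproduced directly from the five scores. This is a minimal sketch using only the standard library; the critical value 2.776 is the tabled value quoted in the text, not something the script computes.

```python
from math import sqrt
from statistics import mean, stdev

runs = [5, 9, 4, 11, 8]   # runs scored in the five sampled games
mu0 = 5.7                 # national average from the problem statement
n = len(runs)

x_bar = mean(runs)
s = stdev(runs)           # sample standard deviation (divides by n - 1)
t_stat = (x_bar - mu0) / (s / sqrt(n))

t_crit = 2.776            # tabled t for alpha/2 = 0.025 and n - 1 = 4 df, as quoted in the text
reject = abs(t_stat) > t_crit

print(f"mean = {x_bar:.1f}, s = {s:.3f}, t = {t_stat:.2f}")
print("reject H0" if reject else "do not reject H0")
```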
First People’s Bank of Central City would like to improve their loan application process.
In particular, the amount of time currently required to process loan applications is
approximately normally distributed with a mean of 18 days. Measures intended to
simplify and speed the process have been identified and implemented. Were they
effective? Test the appropriate hypothesis at the α= 0.05 level of significance if a sample
of 25 applications submitted after the measures were implemented gave an average
processing time of 15.2 days and a standard deviation of 2.0 days.
Solution:-
Given Data:-
Sample Size (n) = 25, Level of significance (α) = 0.05, Sample Mean (x̄) = 15.2 days,
Sample Std. Dev. (s) = 2 days
Step 3) Compute Statistic:-
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{15.2 - 18}{2 / \sqrt{25}} = -7.00$$
Formula:
$$z = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}}$$
where x̄1 and x̄2 are the means of the two samples, Δ is the hypothesized difference
between the population means (0 if testing for equal means), σ1 and σ2 are the standard
deviations of the two populations, and N1 and N2 are the sizes of the two samples.
Example 1 (two-tailed test): The amount of a certain trace element in blood is known
to vary with a standard deviation of 14.1 ppm (parts per million) for male blood donors
and 9.5 ppm for female donors. Random samples of 75 male and 50 female donors yield
concentration means of 28 and 33 ppm, respectively. What is the likelihood that the
population means of concentrations of the element are the same for men and women?
(Test at 5% significance level. The z-value for a two-tailed test at the 5% level of
significance is 1.96.)
Given Data:-
For Male:-
Sample Size (N 1) = 75
Standard Deviation (σ 1) = 14.1 ppm
Mean (x̄1) = 28 ppm
For Female:-
Sample Size (N 2) = 50
Standard Deviation (σ 2) = 9.5 ppm
Mean (x̄2) = 33 ppm
Null hypothesis:-
H0: μ1 = μ2 (there is no significant difference between the mean concentrations of the two
populations)
or H0: μ1 – μ2 = 0
Alternate hypothesis:
Ha: μ1 ≠ μ2
or: Ha: μ1 − μ2 ≠ 0
In practice, the two-sample Z-test is not often used because the two-population standard
deviations σ1 and σ2 are usually unknown. Instead, sample standard deviations and the
T-distribution are used.
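For the blood-donor example, the two-sample z statistic can be worked out as follows. This sketch uses the figures from the problem statement (male: N = 75, σ = 14.1, mean 28; female: N = 50, σ = 9.5, mean 33) and the critical value 1.96 quoted there; the function name `two_sample_z` is hypothetical.

```python
from math import sqrt


def two_sample_z(mean1: float, mean2: float, sigma1: float, sigma2: float,
                 n1: int, n2: int, delta: float = 0.0) -> float:
    """z statistic for the difference of two means with known population sigmas."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2 - delta) / standard_error


# Male donors are sample 1, female donors are sample 2.
z = two_sample_z(mean1=28, mean2=33, sigma1=14.1, sigma2=9.5, n1=75, n2=50)
z_crit = 1.96  # two-tailed critical value at the 5% level, as quoted in the problem
reject = abs(z) > z_crit

print(f"z = {z:.2f}")
print("reject H0" if reject else "do not reject H0")
```

With these inputs |z| comes out a little above 1.96, so the hypothesis of equal mean concentrations would be rejected at the 5% level.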
4) TWO SAMPLE T-TEST: COMPARING TWO MEANS (Equal Variances assumed)
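Requirements: two normally distributed but independent populations; σ is unknown
but assumed equal. When the sample sizes are nearly equal (admittedly "nearly equal"
is somewhat ambiguous, so often if sample sizes are small one requires they be equal),
a good rule of thumb is to check whether the ratio of the two sample standard deviations,
s1/s2, falls between 0.5 and 2 (that is, neither sample standard deviation is more than
twice the other). If this rule of thumb is satisfied, we can assume the variances are equal.
Formula:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$
where s_p is the pooled standard deviation, Δ is the hypothesized difference between the
population means (0 if testing for equal means), and the degrees of freedom are n1 + n2 − 2.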
GROUP  METHOD     N   MEAN (x̄)  STD. DEV. (S)
1      intensive  12  46.31     6.44
2      paced      10  42.79     7.52
Solution:-
Step 1) Set up Hypothesis:-
Null hypothesis:
H0: μ1 ≤ μ2 (The intensive tutoring is not superior to paced tutoring)
or H0: μ1 − μ2 ≤ 0
Alternate hypothesis:
Ha: μ1 > μ2 (The intensive tutoring is more effective than paced tutoring)
or: Ha: μ1 − μ2 > 0
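The tutoring comparison can be computed from the summary table alone. The sketch below assumes equal population variances (the pooled-variance t-test named in this section's heading); the critical value 1.725 is the standard tabled one-tailed value for α = 0.05 and 20 degrees of freedom and does not appear in these notes. The function name `pooled_t` is hypothetical.

```python
from math import sqrt


def pooled_t(mean1: float, s1: float, n1: int,
             mean2: float, s2: float, n2: int) -> tuple[float, int]:
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Group 1 = intensive tutoring, group 2 = paced tutoring (values from the table).
sd_ratio = 6.44 / 7.52                 # rule of thumb: should lie between 0.5 and 2
t_stat, df = pooled_t(46.31, 6.44, 12, 42.79, 7.52, 10)

t_crit = 1.725                         # assumed: tabled one-tailed t, alpha = 0.05, df = 20
reject = t_stat > t_crit

print(f"sd ratio = {sd_ratio:.2f}, t = {t_stat:.2f}, df = {df}")
print("reject H0" if reject else "do not reject H0")
```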
Pair    1     2     3     4     5     6      7     8     9      10    11
Before  87.4  92.9  83.6  81.5  89.7  100.5  98.6  88.8  112.4  87.6  92.8
After   85.4  88.3  84.7  81.2  83.3  94.6   90.1  87.2  104.6  88.4  91.7
Solution:-
Step 1) Set up Hypothesis:-
Let µ1 be the average weight before the diet and µ2 the average weight after diet.
Null hypothesis:
H0: µ2 - µ1 >= 0 (The average weight did not reduce after the diet plan)
or H0: μ2 >= μ1
Alternate hypothesis:
Ha: µ2 - µ1 < 0 (The average weight reduced significantly after the diet plan)
or: Ha: μ2 < μ1 One-tailed test (left-tailed test)
Sample Size(n) = 11
DAvg = (ΣDj) /n = -3.3
Std Dev = sqrt(Σ(Dj − Davg)² / (n − 1)) = 3.46
Std. Err.= std. Dev / sqrt(n) =1.044
Tcal= Davg / Std. Err. = -3.16
Degrees of Freedom = 11-1=10
Pair Before After Dj Dj² (Dj − Davg)²
1 87.4 85.4 -2 4 1.69
2 92.9 88.3 -4.6 21.2 1.69
3 83.6 84.7 1.1 1.21 19.36
4 81.5 81.2 -0.3 0.09 9.00
5 89.7 83.3 -6.4 41 9.61
6 100.5 94.6 -5.9 34.8 6.76
7 98.6 90.1 -8.5 72.3 27.04
8 88.8 87.2 -1.6 2.56 2.89
9 112.4 104.6 -7.8 60.8 20.25
10 87.6 88.4 0.8 0.64 16.81
11 92.8 91.7 -1.1 1.21 4.84
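Σ (Dj − Davg)² = 119.94
Lower critical value of t (TCri) at 5% level of significance and 10 degrees of freedom = -1.812
Since Tcal (-3.16) is less than the lower critical value (-1.812), we reject the null
hypothesis: the average weight after the diet plan is significantly lower.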
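The paired-difference steps for the diet data can be reproduced with a few lines of code. This is a sketch using the Before/After values from the table, with each difference taken as After minus Before as in the worked solution; the critical value -1.812 is the one quoted in the text.

```python
from math import sqrt
from statistics import mean, stdev

before = [87.4, 92.9, 83.6, 81.5, 89.7, 100.5, 98.6, 88.8, 112.4, 87.6, 92.8]
after = [85.4, 88.3, 84.7, 81.2, 83.3, 94.6, 90.1, 87.2, 104.6, 88.4, 91.7]

# Difference for each pair: After - Before (negative means weight was lost).
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

d_avg = mean(diffs)
std_err = stdev(diffs) / sqrt(n)   # stdev divides by n - 1
t_cal = d_avg / std_err

t_crit = -1.812                    # lower critical t at 5%, 10 df, as quoted in the text
reject = t_cal < t_crit            # left-tailed test

print(f"Davg = {d_avg:.2f}, Std. Err. = {std_err:.3f}, Tcal = {t_cal:.2f}")
print("reject H0" if reject else "do not reject H0")
```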
Forty-four sixth graders were randomly selected from a school district. Then, they were
divided into 22 matched pairs, each pair having equal IQ's. One member of each pair was
randomly selected to receive special training. Then, all of the students were given an IQ
test. Test results are summarized below.
Pair Training No training Pair Training No training
1 95 90 12 85 83
2 89 85 13 87 83
3 76 73 14 85 83
4 92 90 15 85 82
5 91 90 16 68 65
6 53 53 17 81 79
7 67 68 18 84 83
8 88 90 19 71 60
9 75 78 20 46 47
10 85 89 21 75 77
11 90 95 22 80 83
Do these results provide evidence that the special training helped or hurt student
performance? Use 0.05 level of significance. Assume that the mean differences are
approximately normally distributed.
Solution:-
Step 1) Set up Hypothesis:-
Let µ1 be the average score of the students without the training and µ2 the average score
of the students with the training. Each difference is Dj = (Training score) − (No training score).
Null hypothesis:
H0: µ2 - µ1 =0 (The average score after the training did not change i.e. the training
did not hurt or help)
or H0: μ2 = μ1
Alternate hypothesis:
Ha: µ2 - µ1 ≠ 0 (The average score after the training changed i.e. the training
either hurt or helped)
or: Ha: μ2 ≠ μ1 Two tailed test.
Sample Size(n) = 22
DAvg = (ΣDj) / n = 22/22 = 1
Std Dev (s) = sqrt(Σ(Dj − Davg)² / (n − 1)) = sqrt(270/21) = 3.586
Std. Err. (SE) = Std. Dev (s) / sqrt(n) = 3.586 / sqrt(22) = 0.765
Tcal = Davg / Std. Err. = 1 / 0.765 = 1.307
Degrees of Freedom = 22-1=21
Pair Training No training Dj (Diff) (Dj-Davg)^2
1 95 90 5 16
2 89 85 4 9
3 76 73 3 4
4 92 90 2 1
5 91 90 1 0
6 53 53 0 1
7 67 68 -1 4
8 88 90 -2 9
9 75 78 -3 16
10 85 89 -4 25
11 90 95 -5 36
12 85 83 2 1
13 87 83 4 9
14 85 83 2 1
15 85 82 3 4
16 68 65 3 4
17 81 79 2 1
18 84 83 1 0
19 71 60 11 100
20 46 47 -1 4
21 75 77 -2 9
22 80 83 -3 16
∑ Dj = 22
Davg = 1
∑ (Dj − Davg)^2 = 270
6) Z-TEST FOR TESTING OF PROPORTION
The CEO of a large electric utility claims that 80 percent of his 1,000,000 customers are
very satisfied with the service they receive. To test this claim, the local newspaper
surveyed 100 customers, using simple random sampling. Among the sampled customers,
73 percent say they are very satisfied. Based on these findings, can we reject the CEO's
hypothesis that 80% of the customers are very satisfied? Use a 0.05 level of significance.
Null hypothesis:
H0: P = 0.8 (80% of the customers are satisfied)
Alternate hypothesis:
Ha: P ≠ 0.8
Two tailed test.
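The test statistic for the utility-company example can be computed as below. This sketch assumes the usual large-sample normal approximation, with the standard error evaluated at the hypothesized proportion; the function name `proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(p_hat: float, p0: float, n: int) -> float:
    """z statistic for a sample proportion against a hypothesized proportion p0."""
    standard_error = sqrt(p0 * (1 - p0) / n)   # uses p0, the value assumed under H0
    return (p_hat - p0) / standard_error


z = proportion_z(p_hat=0.73, p0=0.80, n=100)
p_value = 2 * NormalDist().cdf(-abs(z))        # two-tailed p-value
reject = p_value < 0.05

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```

Here |z| is below 1.96, so at the 0.05 level the survey does not give enough evidence to reject the CEO's claim.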
Example 2: One-Tailed Test:-
Historically, 12% of Small Business Loans granted result in default. Three years ago, FPB
of Central City purchased software which they hope will assist in reducing the default
rate by more effectively discriminating between small business loan applicants who are
likely to default and those who are not likely to do so. After adequately training their
loan officers in use of software, FPB sampled 150 small business loan applications
processed using the software and found 9 to be in default at the end of two years. Using α
= 0.05, does it appear that the software is of value?
Null hypothesis:
H0: P >= 0.12 (The software is of no value i.e. the proportion of defaulters
has not reduced)
Alternate hypothesis:
Ha: P < 0.12 (The software is of value i.e. the proportion of defaulters has
reduced)
7) T-Test for testing significance of correlation coefficient:-
Example:-
Suppose you observe that r= .50 between literacy rate and political stability
in 10 nations.
1) Is this relationship "strong"?
2) Is the relationship "significant"?
Solution:-
1) Coefficient of determination = r² = 0.25, which means that 25% of the variance in
political stability is "explained" by literacy rate.
2) For testing whether Relationship is significant:-
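A common way to test the significance of a correlation coefficient is the statistic t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below applies it to r = 0.50 and n = 10 from the example. It assumes a two-tailed test at the 5% level, and the critical value 2.306 (8 degrees of freedom) comes from a standard t table rather than from these notes; the function name `correlation_t` is hypothetical.

```python
from math import sqrt


def correlation_t(r: float, n: int) -> tuple[float, int]:
    """t statistic and degrees of freedom for testing H0: no correlation."""
    df = n - 2
    return r * sqrt(df) / sqrt(1 - r**2), df


t_stat, df = correlation_t(r=0.50, n=10)

t_crit = 2.306   # assumed: tabled two-tailed t for alpha = 0.05 and df = 8
significant = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, df = {df}")
print("significant" if significant else "not significant at the 5% level")
```

With only 10 nations, a correlation of 0.50 yields a t value below the critical value, so under these assumptions the relationship would not be judged significant.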
Using the following data, apply the sign test to determine whether there is a difference
between the number of days required to collect an account receivable before and after a
new debt collection policy.
Before 30 41 29 33 30 40 45 38 28 36 35 41 40 38 35 34
After 32 38 26 30 31 40 42 42 25 35 29 38 38 42 34 38
Solution:-
Since the objective is to improve debt collection, a case in which the days for collection
after implementing the debt-collection policy are fewer than before the implementation of
the policy is considered a success and is indicated by a "+" sign.
H0: P <= 0.5 (The probability of success is less than or equal to 0.5)
H1: P > 0.5 (The probability of success is greater than 0.5)
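Before  After  Sign
30      32     -
41      38     +
29      26     +
33      30     +
30      31     -
40      40     0 (tie, not counted)
45      42     +
38      42     -
28      25     +
36      35     +
35      29     +
41      38     +
40      38     +
38      42     -
35      34     +
34      38     -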
Number of positives: - 10
Number of negatives: - 5
Sample proportion of Success (p) = 10/(10+5) =2/3
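The sign counts above can be reproduced from the data, and the test can be completed with an exact binomial tail probability. This is one possible way to finish the sign test, not necessarily the method intended in these notes (a normal approximation is also common). Ties are dropped, as in the counts above.

```python
from math import comb

before = [30, 41, 29, 33, 30, 40, 45, 38, 28, 36, 35, 41, 40, 38, 35, 34]
after = [32, 38, 26, 30, 31, 40, 42, 42, 25, 35, 29, 38, 38, 42, 34, 38]

# "+" (success) when collection took fewer days after the policy; ties are ignored.
plus = sum(b > a for b, a in zip(before, after))
minus = sum(b < a for b, a in zip(before, after))
n = plus + minus

# Exact one-sided p-value: P(X >= plus) when X ~ Binomial(n, 0.5) under H0.
p_value = sum(comb(n, k) for k in range(plus, n + 1)) / 2**n
reject = p_value < 0.05

print(f"plus = {plus}, minus = {minus}, p-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```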
EXERCISE
1) A packaging process is set to fill packets of oil to a mean of 400 ml. The amount of oil
filled is normally distributed and the standard deviation is known to be 6 ml. It is
important to check the quality of the process periodically because if it is over-filling
then it reduces the profitability of the company; if it is under-filling, then it risks
prosecution. Accordingly a random sample of 25 pouches is examined and the mean
quantity of oil filled is found to be 403 ml. Using a 5% level of significance, can we
conclude that the process is no longer filling oil with a mean of 400 ml? To frame the
question differently, is the mean quantity of oil in the sample significantly different
from 400 ml at a 0.05 level of significance? Also, calculate the p-value and interpret
the result. [Ans: zcal = 2.5 > 1.96. Reject Ho. p-value = 0.0124 < 0.05. Confidence
Interval = (397.65 – 402.35). Value of 403 falls outside the limits]
2) A random sample of 80 bank employees is taken to test a claim that the mean salary of
the bank executives in a certain State is Rs 48,400 per month. Further, from a related
study undertaken recently, it is known that the standard deviation of the distribution of
the salaries of the bank executives in that State is Rs 5,870. This value is believed to
be true. The random sample that was taken indicated that the average monthly salary
was Rs 47,456. Is the claim tenable that the mean salary was Rs 48,400 per month?
Test at 1% level of significance. [Ans: zcal = ─ 1.438 > ─ 2.58 (zcritical). Do not
reject Ho. p-value = 0.1498 > 0.01]
3) A consumers’ group is suspicious about the weight of a certain brand of cereal for
which the boxes are labeled as containing 12 ounces. A random sample of 50 boxes
yields an average weight of 11.6 ounces with the standard deviation of 1.68 ounces.
At 5% level of significance, test the hypothesis that the population mean weight is
12 ounces. Further, also calculate the p-value and show that the decision using this
criterion is the same as the one reached by z-test. [Ans: zcal = ─ 1.68 > ─ 1.96 (zcritical). Do
not reject Ho. p-value = 0.0930 > 0.05. Confidence limits = (11.534 – 12.466)]
4) A company is engaged in the manufacturing of brake systems for cars. A car running
at a speed of 50 kmph comes to a halt after covering a distance of 24 feet on the
average, when the brake which is already in use is applied. Recently the company has
developed a new brake system which, when applied 80 times to cars running at 50
kmph, gave a mean distance of 23.2 feet, with a standard deviation of 2.8 feet, for the
cars to come to a halt. Test at 1% level of significance whether the new brake system
is superior to the existing one. Also, calculate the p-value and confirm the results.
[Ans: It is a one-tail test because, to be superior, the new brake system should
have a mean significantly smaller than 24 feet. zcal = ─ 2.56 < ─ 2.33 (zcritical).
Reject Ho. p-value = 0.0052 < 0.01. Confidence limit (lower limit) = 23.271 > 23.2]
5) Ten bars of a certain quality are tested for their diameters. The results are given
below. At 95% level of confidence test that the mean diameters of the bars produced
by the process used is one cm. Verify your result by using the p-value approach.
Diameter (cm) 1.02 0.98 0.97 1.01 0.94 0.98 1.00 1.03 0.92 1.02
[Ans: We use the t-test. tcal = ─ 1.136 > ─ 2.262 (tcritical). Do not reject Ho.
p-value = 0.2853 > 0.05. Confidence limits = (0.9741 – 1.0259)]
6) The manager of a certain University café claims that, out of the customers visiting the
café, 40% prefer to have self-service. To test this claim made by the manager, a
sample of 150 customers is examined randomly. It is found that 54 out of them
preferred self-service. Examine the claim made by the manager in each of the
following cases: (i) Using a 95% confidence level (ii) Using a 99% confidence level
(iii) Using the p-value approach. (iv) Finding confidence limits at 95% , 99% levels
[Ans: (i) zcal = ─ 1.00 > ─ 1.96 (zcritical) for 95% level. Do not reject Ho.
(ii) zcal = ─ 1.00 > ─ 2.575 (zcritical) for 99 % level. Do not reject Ho.
(iii) p-value = 0.3174 which is > 0.05 as well as 0.01. Do not reject Ho.
(iv) Confidence limits: For 95% = (0.3216 – 0.4784).
For 99% = (0.297 – 0.503)]
8) A large retailing company wants to know whether there is a difference in the average
size of customer accounts in its Kolkata and Mumbai stores. Past experience has
shown that the standard deviations for the two are Rs 180 and Rs 192 respectively.
Samples of 80 accounts taken from Kolkata gave a mean value of Rs 885. Samples
of 90 accounts from Mumbai gave a mean value of Rs 936. At 5% level of
significance, does this provide evidence that the mean account sizes at the two
stores are different? [Ans: zcal = ─ 1.79 > ─ 1.96 (zcritical). Do not reject Ho.]
10) A new software package has been developed at Excel Ltd., which seeks to reduce the
time required by systems analysts to design, develop and implement information
systems. To test the efficacy of this new software, a random sample of 15 systems
analysts who are using the current (i.e., the existing) software technology, and another
random sample of 12 systems analysts who are trained to use the new software
package are selected. Each of the analysts is given information on a hypothetical
information system. The two sets of analysts perform the assigned work and the
following results are obtained.
Test at 90% level of confidence if it can be concluded that the new software package
is better in terms of developing and implementing an information system.
[Ans: tcal = 2.775 > 1.316 (tcritical). Reject Ho]
11) A new detergent powder was recently developed by a multinational company. It was
advertised in two States of India using two different media. In State A, it was
advertised on TV only, while in State B, it was advertised on FM radio and local
newspapers. The amount spent on advertisement was equal in both the States. After a
month of the advertisement campaign, a survey was conducted to assess the awareness
about this detergent in both the States. In State A, 600 people were selected randomly
out of which 360 responded that they were aware about the new detergent powder. In
State B, a random sample of 800 people was taken, out of which 620 indicated that
they were aware about the new product. On the basis of this survey, can we
conclude that the advertising media used in the two States were equally effective (or,
there was no difference) in making the consumers aware about the new product? Test
at 95% level of confidence.
[Ans: zcal = ─ 7.14 < ─ 1.96 (zcritical). Reject Ho]
12) In a certain city, an average middle class family spends Rs 19,000 per month. A
random sample of 64 families in this city indicated a mean monthly expenditure to be
Rs 18,450. The standard deviation was Rs 1,450. Using 5% significance level,
compute the value of the relevant statistic and state the decision you would take.
Compute the p-value of the statistic and comment. [Ans: zcal = ─ 3.034 < ─ 1.96
(zcritical). Reject Ho. p-value = 0.0024 < 0.05]
13) A dairy advertises that a tub of ice-cream gives, on an average, 80 scoops of ice-cream.
A retail ice-cream seller who buys, in wholesale, ice-cream from this dairy suspects
the claim made by the dairy. To verify this claim, he selects randomly 60 tubs of
ice-cream and finds that, on an average, 78.8 scoops of ice-cream were obtained. The
standard deviation of the sampled tubs was found to be 4.4 scoops. Using 1% level of
significance, carry out the one-tailed test and decide on who is correct. Find the
p-value from the computed statistic. [Ans: zcal = ─ 2.11 > ─ 2.33 (zcritical). Do not reject
Ho. p-value = 0.0174 > 0.01]
14) For a long period of time, a company has been using a particular component in its
production process. This component is known to have an average life of 2,400 hours.
The company is now considering to replace the existing component. It has to make a
choice between two new brands for the same component. For this purpose, the
purchase manager of the company studied a random sampling of each of the two
brands and obtained the following results
Brand A Brand B
Mean Life 2,442 hours 2,456 hours
Standard Deviation 126 hours 185 hours
Sample Size 80 80
Using 1% level of significance, test whether either Brand A or Brand B, or both, Brand
A and Brand B are better than the existing one which is being currently used. Both the
Brands are known to cost the same. Use one-tail test.
[Ans: For Brand A: zcal = 2.98 > 2.33 (zcritical). Reject Ho.
For Brand B: zcal = 2.71 > 2.33 (zcritical). Reject Ho.
This shows each of the new brands is better than the existing one.
p-value for Brand A is 0.0014 < 0.01
p-value for Brand B is 0.0034 < 0.01
From these values, it is evident that the evidence in the case of Brand A is
stronger than that for Brand B. Hence, Brand A may be preferred.]
15) A study is done to determine the effects of removing a renal blockage in patients
whose renal function is impaired because of advanced metastatic malignancy of
nonurologic cause. The arterial blood pressure of a random sample of 10 patients,
measured before and after surgery for treatment of the blockage, yielded the
following data:
Based on the sign test, can we conclude that the surgery tends to lower arterial
blood pressure?
17) A psychologist claims that the number of repeat offenders will decrease if first time
offenders complete a particular rehabilitation course. You randomly select 10 prisons
and record the number of repeat offenders during a two-year period. Then, after first-
time offenders complete the course, you record the number of repeat offenders at each
prison for another two-year period. The results are shown in the following table. At
0.05 significance level, can you support the psychologist’s claim?
Prison 1 2 3 4 5 6 7 8 9 10
Before 21 34 9 45 30 54 37 36 33 40
After 19 22 16 31 21 30 22 18 17 21