I M Com QT Final On16march2016
Master of Commerce
Semester I
Paper II
Study Material
2015 Admission onwards
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
CALICUT UNIVERSITY P.O., THENJIPALAM, MALAPPURAM-673635
2022
QUANTITATIVE TECHNIQUES FOR BUSINESS DECISIONS
TABLE OF CONTENTS
No. Topic
1 QUANTITATIVE TECHNIQUES - CONCEPTS
2 INFERENTIAL ANALYSIS - POINT ESTIMATION
3 INTERVAL ESTIMATION
4 SAMPLING & SAMPLE SIZE
5 TESTS OF SIGNIFICANCE - CONCEPTS
6 PARAMETRIC TESTS - MEANS & PROPORTIONS
7 TESTS FOR VARIANCE & PAIRED OBSERVATIONS
8 ANALYSIS OF VARIANCE
9 NON PARAMETRIC TESTS - CONCEPTS
10 CHI-SQUARE TESTS
11 SIGN AND SIGNED RANK TESTS
12 RANK SUM & OTHER NON PARAMETRIC TESTS
13 STATISTICAL QUALITY CONTROL - CONCEPTS
14 CONTROL CHART FOR VARIABLES
15 CONTROL CHART FOR ATTRIBUTES
16 TOTAL QUALITY MANAGEMENT
17 CORRELATION ANALYSIS
18 RANK, PARTIAL & MULTIPLE CORRELATION
19 REGRESSION ANALYSIS
20 SOFTWARES FOR QUANTITATIVE ANALYSIS
21 APPENDIX
Class Frequency
15.1 - 15.5 2
15.6 - 15.8 8
15.9 - 16.1 9
16.2 - 16.5 7
16.6 - 16.9 4
Total 30
EX 1.1
An employment exchange gave the following information about its registered candidates by
level of education: not completed +2, 35%; completed +2, 31%; attended but not completed
degree, 16%; completed degree, 9%; not completed PG, 6%; and completed PG, 3%.
Construct a relative frequency table and comment on the trend of registration.
Ex 1.2
UNIT II
INFERENTIAL ANALYSIS POINT ESTIMATE
Introduction
One of the main objectives of statistical studies is to draw valid conclusions about the
population on the basis of samples drawn from the population. Such a process of inferring about
the population is called inferential analysis. Inferential analysis is often required and applied in
business management.
Management is confronted with various practical problems like augmentation of
production, maximization of profit, minimization of cost, introduction of innovations,
improvement of production methods, etc. Solving these problems leads to the accomplishment
of certain pre-determined objectives and goals.
There has been a growing tendency to turn to quantitative techniques as a means for
solving many of the managerial decision problems that arise in a business or industrial
enterprise. A large number of business problems have been given quantitative representation
with a considerable degree of success. Inferential analysis is one such quantitative technique,
widely applied in managerial decision making.
Inferential analysis
Inferential analysis is a prominent quantitative technique, based on the probability concept, to
deal with uncertainty in decision making. It is a set of statistical methods to infer, with
reasonable accuracy, population characteristics on the basis of given sample statistics.
Statistical inference can be defined as drawing inferences, from a probabilistic sample, about
unknown population parameters.
Types of statistical inference
Statistical inference may be focused either on examining hypotheses or on predicting
probable values. Accordingly two types of statistical inferences are hypotheses testing and
statistical estimation.
In hypotheses testing we examine the claims made about unknown population parameter
using sample statistics. These claims are made using some past experience and logic.
Statistical Estimation means estimating unknown population parameters, with reasonable
accuracy, using sample statistics. This unit focuses on statistical estimation.
Statistical Estimation
Everyone makes estimates. When we are ready to cross a road, we estimate the speed of
any approaching car, the distance to the car, and our own speed. Having made these quick
estimates, we decide whether to wait or to walk.
Business managers also estimate for various purposes. Estimation is the process of
assessing characteristics of a phenomenon on the basis of intuition, experience, statistics and
other available information.
When estimation is exclusively based on statistical methods, it is statistical estimation.
Statistical estimation is a useful quantitative technique.
A carton of syringes contains 5 packets of 20 each. The number of defectives in each
packet is given below. Estimate the proportion of defectives.
Packet No Defectives
1 2
2 4
3 3
4 2
5 1
Total 12
Solution
Total items inspected = 5 x 20 = 100
Defectives observed = 2 + 4 + 3 + 2 + 1 = 12
Therefore, estimated proportion of defectives = 12/100 = .12 (it is likely that 12 out of
every 100 items are defective.)
Ex 2.3
Mamatha Bakery delivers shavarma @ Rs 60 and guarantees that it will be delivered
within 30 minutes of order. If it takes more than 30 minutes, it will be given free and recorded
as 30 minutes. Twelve random orders were delivered as below. Find the estimated average
delivery time. What is the population, and can the sample be used to estimate the average
delivery time correctly?
Sl No Delivery time
1 15.0
2 25.0
3 30.0
4 10.0
5 30.0
6 19.0
7 10.0
8 12.0
9 14.0
10 30.0
11 22.0
12 23.0
TOTAL 240.0
Proportion
Mean is considered a useful estimator. Proportion is another popular estimator of
population values. Proportion is generally used to express parameters of populations in social
studies. Using sample proportion, population proportion can be estimated statistically.
Ex2.3
Out of 60 executives in an IT company, 50 use 4G cell phones. Give a point estimate of
the proportion of 4G users.
Sample proportion = 50/60 or 5/6
Estimated population proportion = 5/6 (that is, out of every 6 persons, about 5 are expected
to use 4G cell phones.)
Ans
Ex 3.2
Average age of 100 college teachers is 45 with standard deviation 15. What would be the
probable general average age of college teachers in Kerala.
Ans
Ex 3.7
AM motors states that out of 75 employees, 40% are females. Estimate with 99%
confidence, proportion of females, to be recruited in future.
Solution
p = .4 q = .6 n = 75
Standard error = $\sqrt{pq/n} = \sqrt{(.4 \times .6)/75} = \sqrt{.0032}$ = .0566
99% upper confidence limit = .4 + 2.58 x .0566 = .546
99% lower confidence limit = .4 - 2.58 x .0566 = .254
Ex 3.8
Dr Shivkumar, a psychologist, surveyed 70 executives and found that 66% of them could
not add fractions.
(a) Estimate standard error of proportion
(b) Find lower and upper 95% confidence limits.
p = .66 q = .34 n = 70
Standard error of proportion = $\sqrt{pq/n} = \sqrt{(.66 \times .34)/70}$ = .0566
95% upper confidence limit = .66 + 1.96 x .0566 = .771
95% lower confidence limit = .66 - 1.96 x .0566 = .549
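The interval calculation in Ex 3.8 can be checked with a few lines of Python. This is only a minimal sketch: the function name `proportion_confidence_interval` is an illustrative choice, and the z multiplier (1.96 for 95%, 2.58 for 99%) is passed in by hand, as in the text.

```python
import math

def proportion_confidence_interval(p, n, z):
    """Return (standard error, lower limit, upper limit) for a sample proportion."""
    se = math.sqrt(p * (1 - p) / n)      # SE = sqrt(pq/n)
    return se, p - z * se, p + z * se

# Ex 3.8: 66% of 70 executives, 95% confidence (z = 1.96)
se, lower, upper = proportion_confidence_interval(p=0.66, n=70, z=1.96)
print(f"SE = {se:.4f}, limits = ({lower:.3f}, {upper:.3f})")
```

Changing `z` to 2.58 and the inputs to p = 0.4, n = 75 repeats the calculation for Ex 3.7.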
Review Questions and Exercises
Z value = 1
Z value = 0.55
t = 1.91
Degree of freedom = 20 - 1 = 19
Table value t @ 5% level of significance is 2.093
The calculated value is numerically less than the table value. So, we ACCEPT the null
hypothesis. There is no significant difference between mean of sample and original mean.
So, the advertisement is not effective.
EX: 6.5
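Prices of the shares of a company on different days in a month were found to be 66, 65,
69, 70, 69, 71, 70, 63, 64, 68. Discuss whether the mean price in the month is 65.
Ans:
Taking 70 as the assumed mean, d = X - 70.
X d $d^2$
66 -4 16
65 -5 25
69 -1 1
70 0 0
69 -1 1
71 1 1
70 0 0
63 -7 49
64 -6 36
68 -2 4
Total 675 -25 133
H0: There is no significant difference. The mean price of the share in the month is 65.
$\sum X = 675$, n = 10
Mean = $\bar{x} = \sum X / n = 675/10 = 67.5$
S.D. (s) = $\sqrt{\sum d^2/n - (\sum d/n)^2} = \sqrt{133/10 - (-25/10)^2} = \sqrt{13.3 - 6.25} = \sqrt{7.05} = 2.655$
SE = $s/\sqrt{n-1} = 2.655/\sqrt{9} = 2.655/3 = 0.885$
t value = $(\bar{x} - \mu)/SE = (67.5 - 65)/0.885 = 2.5/0.885 = 2.82$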
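The arithmetic of Ex 6.5 can be reproduced as a short Python sketch. It follows the text's convention of a standard deviation with divisor n and a standard error of s/sqrt(n - 1); the variable names are illustrative only.

```python
import math
import statistics

prices = [66, 65, 69, 70, 69, 71, 70, 63, 64, 68]
hypothesised_mean = 65

n = len(prices)
sample_mean = statistics.mean(prices)
s = statistics.pstdev(prices)            # standard deviation with divisor n
se = s / math.sqrt(n - 1)                # SE = s / sqrt(n - 1)
t_value = (sample_mean - hypothesised_mean) / se

print(f"mean = {sample_mean}, s = {s:.3f}, SE = {se:.3f}, t = {t_value:.2f}")
```

The resulting t value is then compared with the t table value for n - 1 = 9 degrees of freedom.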
t = $(\bar{x} - \mu)/SE$
SE = 12.06
t = 1.24
Degree of freedom = n - 1 = 12 - 1 = 11. Given level of significance = .01. Table
value of t (for one-tailed test) = 2.718
The calculated value 1.24 is less than the table value 2.718. So, we accept H0. The
stenographer can type at an average of 120 words per hour.
EX: 6.7
A factory was producing electric bulbs of average length of life 2000 hours. A new
manufacturing process was introduced with the hope of increasing the length of life of bulbs. A
sample of 25 bulbs produced by the new process was examined and the average length of life was
found to be 2200 hours. Examine whether the average length of life of the bulbs has increased,
assuming that the lives of bulbs follow a normal distribution with $\sigma$ = 300. (Significance level 0.05)
Ans:
H0: There is no significant difference. The new process has not increased the length of life of bulbs.
Here we want to test whether the increase in the length of life of bulbs is significant. So, it is a
one-tailed test. Population SD is given. So, we can use the Z-test for the calculation.
Z = $(\bar{x} - \mu)/SE$, SE = $\sigma/\sqrt{n}$
SE = $300/\sqrt{25}$ = 300/5 = 60
Z = (2200 - 2000)/60 = 200/60 = 3.33
Table value @ 0.05 level of significance is 1.645 (since one-tailed test).
The calculated value 3.33 is numerically greater than the table value 1.645.
So, we REJECT H0. The new process has increased the length of life.
Mean Tests Sample v/s Sample
Such a test may use large samples or small samples. The test examines the significance of the
difference between one sample and another sample. There is no population value. The test
focuses on the difference between samples, or whether they conform to each other. In this case
two sample sizes, and either the combined standard deviation or the two sample standard
deviations, must be given. Then standard errors are calculated as below:
SE = $\sigma\sqrt{1/n_1 + 1/n_2}$ (when the combined standard deviation $\sigma$ is given)
SE = $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ (for large samples, when two sample standard deviations are given)
Degree of freedom = $n_1 + n_2 - 2$
EX: 6.8
The mean yield of wheat from district A was 210 lbs. with standard deviation 10 lbs. per acre
from a sample of 100 plots. In another district B, the yield was 200 lbs. with standard deviation 12 lbs.
from a sample of 150 plots. Assuming that the standard deviation of yield in the entire state was 11 lbs., test
whether there is any significant difference between the mean yields of the crops in the two districts.
Ans:
H0: There is no significant difference.
There is no difference between the mean yields of crops in the two
districts. Here, the samples are large. So, we can use the Z-test. Note that in this question the
population S.D. is given. So, the sample S.D.s are not used for finding S.E.
Z = $(\bar{x}_1 - \bar{x}_2)/SE$, SE = $\sigma\sqrt{1/n_1 + 1/n_2}$, n1 = 100, n2 = 150.
SE = $11 \times \sqrt{1/100 + 1/150} = 11 \times \sqrt{.0167} = 11 \times .129 = 1.42$
Z = (210 - 200)/1.42 = 10/1.42 = 7.04, or about 7
Degree of freedom is infinity
Table value @ 5% level of significance is 1.96
The calculated value (7) is numerically greater than the table value (1.96). So, we REJECT H0.
There is a significant difference between the mean yields of the two districts.
Z = $(\bar{x}_1 - \bar{x}_2)/SE$, SE = $\sigma\sqrt{1/n_1 + 1/n_2}$. Here n1 = 50, n2 = 50.
SE = $2 \times \sqrt{1/50 + 1/50} = 2 \times \sqrt{.04} = 2 \times .2 = .4$
Z = $(\bar{x}_1 - \bar{x}_2)/.4$ = 3.75
Level of significance = .05
Degree of freedom = infinity
Table value of Z = 1.645 (since one-tailed test)
The calculated value (3.75) is numerically greater than the
table value (1.645). So, we REJECT the null hypothesis.
The special diet really promotes weight.
EX: 6.10
Particulars X Company Y Company
No. of Bulbs Used 100 100
Mean Life in Hours 1300 1248
Standard Deviation 82 93
Using the standard error of the difference between means, state whether there is any significant
difference in the life of the two makes.
Ans:
H0: There is no significant difference in the life of the 2 brands.
Z = $(\bar{x}_1 - \bar{x}_2)/SE$, SE = $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, s1 = 82; s2 = 93; n1 = 100; n2 = 100
SE = $\sqrt{82^2/100 + 93^2/100} = \sqrt{67.24 + 86.49} = \sqrt{153.73} = 12.4$
Z = (1300 - 1248)/12.4 = 52/12.4 = 4.19
Degree of freedom is infinity. The Z-table value @ 5% level of significance is 1.96.
The calculated value (4.19) is numerically greater than the table value (1.96). So, we REJECT H0.
There is a significant difference in the life of the 2 brands.
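As a quick check on Ex 6.10, the large-sample comparison of two means can be sketched in Python. The helper name `two_sample_z` is an illustrative assumption; it simply applies the standard error formula for two given sample standard deviations.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Return (Z, SE) for the difference between two large-sample means."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # SE of the difference
    return (mean1 - mean2) / se, se

# Ex 6.10: X Company versus Y Company bulbs
z_value, se = two_sample_z(1300, 82, 100, 1248, 93, 100)
print(f"SE = {se:.2f}, Z = {z_value:.2f}")
```

The Z value is then compared with 1.96 for a two-tailed test at the 5% level.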
Z = $(\bar{x}_1 - \bar{x}_2)/SE$; SE = $\sqrt{s_1^2/n_1 + s_2^2/n_2}$
SE = $\sqrt{.0225 + .0067} = \sqrt{.0292} = .1708$
Z = $(\bar{x}_1 - \bar{x}_2)/.1708$ = 14.64
Z-table value @ 5% level of significance for infinite degrees of freedom is 1.645 (one-tailed test).
The calculated value (14.64) is numerically greater than the table value (1.645). So, we REJECT H0.
The average wages of workers of Kerala are higher than those of workers in Tamil Nadu.
EX: 6.12
A random sample of 200 villages was taken from district A and the average population per village was
485 with SD 50. Another random sample of 250 villages from the same district gave an
average population of 510 per village with SD of 40. Is the difference between the averages of the
two samples statistically significant?
Ans:
H0: There is no significant difference.
The difference between the averages of the two samples is not statistically significant.
t= ; SE = +
SE = + = (. 04 + .04) = (.08)
.
= = = 45.708 = 6.535 t
So, we will REJECT H0. Both machines are not equally efficient.
EX: 6.14
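In a test given to two groups of students the marks obtained were as follows:
Group I 18 20 36 50 49 36 34 49 41
Group II 29 26 28 35 30 44 46
Assuming that the group standard deviations are the same, test the hypothesis that the group means are
equal.
Group I
x d = x - 37 $d^2$
18 -19 361
20 -17 289
36 -1 1
50 +13 169
49 +12 144
36 -1 1
34 -3 9
49 +12 144
41 +4 16
Total 333 0 1134
$\bar{x}_1 = 333/9 = 37$, $s_1 = \sqrt{\sum d^2/n_1} = \sqrt{1134/9} = \sqrt{126} = 11.22$
Group II
x d = x - 34 $d^2$
29 -5 25
26 -8 64
28 -6 36
35 +1 1
30 -4 16
44 +10 100
46 +12 144
Total 238 0 386
$\bar{x}_2 = 238/7 = 34$, $s_2 = \sqrt{\sum d^2/n_2} = \sqrt{386/7} = \sqrt{55.14} = 7.426$
SE = $\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\frac{1134 + 385.98}{14}(.111 + .143)} = \sqrt{\frac{1519.98}{14} \times .254} = \sqrt{108.57 \times .254} = \sqrt{27.58} = 5.25$
t = $(\bar{x}_1 - \bar{x}_2)/SE$ = (37 - 34)/5.25 = 0.57
Degree of freedom = 9 + 7 - 2 = 14. Table value of t @ 5% level of significance is 2.145.
The calculated value (0.57) is less than the table value (2.145). So, we accept H0. The group means are equal.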
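The pooled-variance calculation of Ex 6.14 can be sketched in Python as follows. This is a minimal illustration using only the standard library; the variable names are assumptions made for the sketch, and the pooled variance uses the sums of squared deviations ($n_1 s_1^2$ and $n_2 s_2^2$ in the notation of the text).

```python
import math
import statistics

group_1 = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group_2 = [29, 26, 28, 35, 30, 44, 46]

n1, n2 = len(group_1), len(group_2)
mean_1, mean_2 = statistics.mean(group_1), statistics.mean(group_2)

# Sums of squared deviations from each group's own mean
ss_1 = sum((x - mean_1) ** 2 for x in group_1)
ss_2 = sum((x - mean_2) ** 2 for x in group_2)

pooled_variance = (ss_1 + ss_2) / (n1 + n2 - 2)
se = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
t_value = (mean_1 - mean_2) / se

print(f"SE = {se:.2f}, t = {t_value:.2f}, d.f. = {n1 + n2 - 2}")
```

The t value is compared with the table value for $n_1 + n_2 - 2$ degrees of freedom.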
Ex 6.15 Average life of 26 bulbs is 1200 hours with standard deviation 150 hours. Test whether
these bulbs could be considered as a random sample from a normal population with mean 1300
hours.
Standard Error = $s/\sqrt{n-1} = 150/\sqrt{25} = 150/5 = 30$
t value = $|\bar{x} - \mu|/SE$ = (1300 - 1200)/30 = 100/30 = 3.33
Since the calculated t value is more than the t table value, the difference is considered significant.
The hypothesis is rejected. The bulbs cannot be considered a random sample from the normal population.
4. Obtain Z value = (sample proportion - population proportion)/SE = $(p - P)/SE$
5. Compare with the Z table value at the given significance level and degree of freedom.
6. Decide the fate of the null hypothesis.
Ex 6.16 A population survey indicates that out of 3232 births, 1705 were boys and 1527 girls. Do
these figures confirm the hypothesis that the gender ratio is 50:50?
Null Hypothesis: No significant difference between sample proportion and population
proportion. Gender ratio is 50:50.
Given P = 0.5, Q = 0.5, n = 3232
Sample proportion of boys p = 1705/3232 = .5275
Standard error = $\sqrt{PQ/n} = \sqrt{(.5 \times .5)/3232}$ = .0088
Z value = $(p - P)/SE$ = (.5275 - .5)/.0088 = .0275/.0088 = 3.125
Z table value at $\alpha$ = .05 is 1.96. The calculated value, 3.125, is more than the Z table value.
Hence the difference between proportions is significant. So, the hypothesis is rejected. Gender
ratio is not 50:50.
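The test of Ex 6.16 can be sketched in Python. The function name `proportion_z` is illustrative; the standard error is computed from the hypothesised proportion, as in the steps above. Because the code does not round intermediate values, its Z may differ slightly in the second decimal from a hand calculation.

```python
import math

def proportion_z(successes, n, hypothesised_p):
    """Z statistic for a sample proportion against a hypothesised proportion."""
    sample_p = successes / n
    se = math.sqrt(hypothesised_p * (1 - hypothesised_p) / n)
    return (sample_p - hypothesised_p) / se

# Ex 6.16: 1705 boys out of 3232 births, hypothesised ratio 50:50
z_value = proportion_z(successes=1705, n=3232, hypothesised_p=0.5)
print(f"Z = {z_value:.2f}")
```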
Proportion test - Sample Vs Sample
Sometimes proportions of 2 samples may be given, instead of the population proportion. In
that case the population proportion may be estimated on the basis of the sample proportions and
sample sizes, and then the standard error is obtained.
Steps
1. Form null hypothesis
2. Consider given sample proportions and sample sizes.
3. Obtain standard error of proportion = $\sqrt{PQ(1/n_1 + 1/n_2)}$ (where P = population proportion
and Q is 1 - P)
4. Calculate population proportion P = $(n_1 p_1 + n_2 p_2)/(n_1 + n_2)$
5. Find Z value = $(p_1 - p_2)/SE$
6. Compare with Z table value and decide the fate of H0
Ex 6.17
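In a city 400 out of 500 men were smokers. After an awareness program, a random sample of 600
men revealed 400 smokers. Is the awareness program effective?
$p_1$ = 400/500 = .8, $p_2$ = 400/600 = .667
P = (400 + 400)/(500 + 600) = .727, Q = 1 - P = .273
SE = $\sqrt{PQ(1/n_1 + 1/n_2)} = \sqrt{.727 \times .273 \times (1/500 + 1/600)}$ = .02697
Z value = $(p_1 - p_2)/SE$ = (.8 - .667)/.02697 = .1333/.02697 = 4.94
Z table value at $\alpha$ = .05 is 1.96, and the calculated Z value is more than the table value.
Therefore the difference is considered significant. The hypothesis is rejected. Ultimately, the
awareness program is effective.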
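The steps for the two-sample proportion test can be sketched as a small Python function, applied here to the smoking data of Ex 6.17. The name `two_proportion_z` is an illustrative assumption; the pooled proportion plays the role of P in the steps above. Intermediate values are not rounded, so the result may differ slightly from a hand calculation.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled_p = (x1 + x2) / (n1 + n2)            # estimate of population proportion P
    se = math.sqrt(pooled_p * (1 - pooled_p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Ex 6.17: 400 smokers out of 500 before, 400 out of 600 after
z_value = two_proportion_z(400, 500, 400, 600)
print(f"Z = {z_value:.2f}")
```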
Ex 6.18
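A random sample of 16 men from Malappuram District had a mean height of 68 inches
and sum of squares from mean 132. A sample of 25 men from Amritsar district had the corresponding
values 66.5 and 165 respectively. Do the samples belong to the same population?
Given: $\bar{x}_1$ = 68, $\bar{x}_2$ = 66.5, $\sum(x_1 - \bar{x}_1)^2 = 132 = n_1 s_1^2$, $\sum(x_2 - \bar{x}_2)^2 = 165 = n_2 s_2^2$,
n1 = 16, n2 = 25
Standard Error = $\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
= $\sqrt{\frac{132 + 165}{39}\left(\frac{1}{16} + \frac{1}{25}\right)} = \sqrt{0.78}$ = 0.88
t value = $(\bar{x}_1 - \bar{x}_2)/SE$ = (68 - 66.5)/0.8835 = 1.697
Since the calculated t value is less than the t table value, the difference is insignificant.
The hypothesis is accepted. The samples belong to the same population.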
UNIT VII
PARAMETRIC TESTS FOR VARIANCES AND PAIRED OBSERVATIONS
Two popular parametric tests, after mean tests and proportion tests, are variance tests and
paired observations tests. Variance tests focus on the significance of the difference between a
population variance and a sample variance, or between one sample variance and another sample
variance. Paired observations tests examine the significance of the difference between two
dependent sample groups.
Tests for variances
Significance tests may be designed to examine the equality of variances of two samples or
populations. Such tests examine whether one population differs significantly from another, or
whether two samples are randomly drawn from the same population. In other words, we are able to
determine whether two independent estimates of population variance are homogeneous or not.
The test can also be used to examine the equality of standard deviations of two sample groups. In
variance tests there is no need of standard error. The test statistic in tests of variances follows
Snedecor's F distribution.
We are not always interested in means and proportions. In many situations responsible
decision makers have to make inferences about variability within a population. A sociologist
investigating the effect of education on earning capacity may be eager to know whether the income of
college graduates is more variable than the income of post graduates.
Steps
1. Formulate Null hypothesis
2. Consider given observations, standard deviations and sample size.
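3. Find F Ratio = $S_1^2/S_2^2$, where $S_1^2$ is the larger and $S_2^2$ the smaller estimate of population variance, each obtained as $S^2 = ns^2/(n-1)$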
4. Compare the obtained F value with F table value
5. Accept or reject null hypothesis , as the case may be
Assumptions
1. Populations from which samples are drawn is normally distributed
2. Samples are randomly drawn and independent of each other
3. Means of population or samples are taken to be equal.
Ex . 7.1
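In a rat feeding experiment, high protein was given to sample X containing 12 rats and low
protein to sample Y with 7 rats. Weights gained by them (in gms) are as below. Examine whether
protein leads to weight gain.
X 13 14 10 11 12 16 10 8 11 12 9 12
Y 7 11 10 8 10 13 9
$X^2$ $Y^2$
169 49
196 121
100 100
121 64
144 100
256 169
100 81
64
121
144
81
144
Total 1640 684
$\sum X$ = 138, $\sum Y$ = 68
$s_1^2 = \sum X^2/n_1 - (\sum X/n_1)^2 = 1640/12 - (138/12)^2 = 136.67 - 132.25 = 4.42$
$s_2^2 = \sum Y^2/n_2 - (\sum Y/n_2)^2 = 684/7 - (68/7)^2 = 97.71 - 94.37 = 3.35$
F value = $\frac{n_1 s_1^2/(n_1 - 1)}{n_2 s_2^2/(n_2 - 1)} = \frac{12 \times 4.42/11}{7 \times 3.35/6} = \frac{4.82}{3.91}$ = 1.23
F table value at $\alpha$ = 0.05 and degrees of freedom (11, 6) = 4.03.
Since the calculated F value is much less than the F table value, the difference is
considered insignificant. H0 is accepted. Protein does not lead to weight gain.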
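The variance ratio of Ex 7.1 can be checked with a short Python sketch. It relies on `statistics.variance`, which divides by n - 1 and therefore gives the estimate $ns^2/(n-1)$ directly. Small differences in the second decimal from a hand calculation can arise from rounding intermediate values.

```python
import statistics

high_protein = [13, 14, 10, 11, 12, 16, 10, 8, 11, 12, 9, 12]
low_protein = [7, 11, 10, 8, 10, 13, 9]

# statistics.variance uses divisor n - 1, i.e. n * s^2 / (n - 1)
var_x = statistics.variance(high_protein)
var_y = statistics.variance(low_protein)

# The larger estimate goes in the numerator
f_ratio = max(var_x, var_y) / min(var_x, var_y)
df = (len(high_protein) - 1, len(low_protein) - 1)

print(f"variance estimates = {var_x:.2f}, {var_y:.2f}; F = {f_ratio:.2f}; d.f. = {df}")
```

Here the X sample has the larger estimate, so the degrees of freedom are (11, 6), matching the table lookup in the text.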
Ex 7.2
An economist believes that income earned by graduates is more variable than non
graduate employees. A sample of 21 graduates have earning with standard deviation of 17000,
and income of 25 non graduates gave a standard deviation of 7500. Is his belief true?
Ex 7.3
Quality controller of Hero Honda is concerned with uniformity in the number of defects in
bikes coming off two assembly lines. If there is significant variability in defects, he wants to re
install the assembly lines. Test at a 0.05 level of significance.
Defects in A Defects in B
Mean 10 11
Variance 9 25
Sample size 20 16
Given: n1 = 20, n2 = 16, $s_1^2$ = 9, $s_2^2$ = 25
Mean tests, proportion tests or variance tests relate to independent sample groups. The
samples were drawn independently from a population, on random basis. However there are
situations where samples are related in one way or other and they have to be tested as
dependent sample groups.
Generally, data in the case of dependent samples will be given in pairs. Therefore the test
of dependent samples is also called the test for paired observations.
For example, the numbers of disputes before and after an Act will form 2 dependent samples. The
variables in the two samples are the same. A value from one sample and the corresponding
value in the other sample are taken together and form a pair. Thus we get pairs of
observations.
The paired observation test focuses on the average difference between pairs ($\bar{d}$) and its standard error.
Here no parameter is employed or no distribution is adopted.
Assumptions
Steps:
s = standard deviation of differences = $\sqrt{\sum d^2/n - (\sum d/n)^2}$
Ex 7.4
To test efficacy of sleeping pills, 5 person are selected. Time before sleep is given below in
seconds. Test whether sleeping pills are effective.
Ans
t value = $\bar{d}/SE$ = -2.04
Ex 7.5
Ten students scored following marks in two tests as below. Examine, if there is significant
difference in their performance.
Test1 67 24 57 55 63 54 56 68 33 43
Test 2 70 38 58 58 56 67 68 72 42 38
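Test 1 Test 2 d $d^2$
67 70 -3 9
24 38 -14 196
57 58 -1 1
55 58 -3 9
63 56 +7 49
54 67 -13 169
56 68 -12 144
68 72 -4 16
33 42 -9 81
43 38 +5 25
Total -47 699
$\bar{d} = \sum d/n$ = -47/10 = -4.7
s = $\sqrt{\sum d^2/n - (\sum d/n)^2} = \sqrt{69.9 - 22.09} = \sqrt{47.81}$ = 6.9
SE = $s/\sqrt{n-1} = 6.9/\sqrt{9}$ = 2.3
t value = $\bar{d}/SE$ = -4.7/2.3 = -2.04
Since the calculated t value is numerically less than the t table value, we accept the hypothesis
that there is no difference in performance.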
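The paired-observation calculation of Ex 7.5 can be reproduced with a minimal Python sketch. It follows the text's formulas (standard deviation of differences with divisor n, standard error s/sqrt(n - 1)); the variable names are illustrative.

```python
import math

test_1 = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
test_2 = [70, 38, 58, 58, 56, 67, 68, 72, 42, 38]

differences = [a - b for a, b in zip(test_1, test_2)]   # d = Test 1 - Test 2
n = len(differences)
mean_d = sum(differences) / n
s = math.sqrt(sum(d * d for d in differences) / n - mean_d ** 2)
se = s / math.sqrt(n - 1)
t_value = mean_d / se

print(f"mean d = {mean_d}, s = {s:.2f}, SE = {se:.2f}, t = {t_value:.2f}")
```

The absolute value of t is compared with the table value for n - 1 = 9 degrees of freedom.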
Ex 7.6
t value = $\bar{d}/SE$ = 2.9
Since the calculated t value is more than the t table value, we reject the hypothesis that there is no
increase in weight. There is a significant increase in weight.
13. Ten soldiers visit a rifle range for two consecutive weeks. For the first week, their scores
are 61, 26, 57, 55, 63, 54, 56, 68, 33, 43 and during the second week they score, in the
same order, 56, 36, 58, 58, 56, 67, 68, 72, 42, 38. Examine if there is a significant difference
in their performance. Conduct a paired observation test.
14. A certain medicine administered to each of 12 patients resulted in the following
increases of blood pressure: 5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6. Can it be concluded that the
stimulus will, in general, be accompanied by an increase in blood pressure? Do a paired
t test.
15. Two sets of 10 students selected at random from a college were given a
memory test and their scores were:
Set A: 10 8 7 9 8 10 9 6 7 8
Set B: 12 8 8 10 8 11 9 8 9 9
Test whether there is a significant difference in mean scores. Do a paired samples test.
(The value of t for 18 d.f. = 2.101)
16. 10 accountants were given intensive coaching and four tests were conducted in a month.
The scores of tests 1 and 4 are given below:
Sl. No.: 1 2 3 4 5 6 7 8 9 10
Marks in test 1: 50 42 51 42 60 41 70 55 62 38
Marks in test 4: 62 40 61 52 68 51 64 63 72 50
Examining the result of a paired t test, do the scores from test 1 to test 4 show an
improvement?
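$\bar{X}_1$ = 6, $\bar{X}_2$ = 20/4 = 5, $\bar{X}_3$ = 16/4 = 4
Mean of means = (6 + 5 + 4)/3 = 5
In the steps below, T is the grand total of all observations, N is the total number of
observations and c is the number of columns (samples).
1. Find correction factor = $T^2/N = (\sum X)^2/N$
2. Find sum of squares total = SST = sum of squares of all observations - $T^2/N$
3. Obtain sum of squares between columns = SSC = $(\sum X_1)^2/n_1 + (\sum X_2)^2/n_2 + \dots - T^2/N$
4. Find mean square column = MSC = SSC/(c - 1)
5. Obtain sum of squares within = SSE = SST - SSC
6. Find mean square within = MSE = SSE/(N - c)
7. Find F ratio = MSC/MSE
8. Present the result in the ANOVA table
9. Compare with the F table value and decide the fate of H0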
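The one-way steps above can be sketched as a small Python function. The function name `one_way_anova` and the three samples used are illustrative assumptions, not a worked example from the text; the code follows the correction-factor method step by step.

```python
def one_way_anova(columns):
    """One-way ANOVA by the correction-factor method; returns a dict of results."""
    all_values = [x for col in columns for x in col]
    n_total = len(all_values)
    correction = sum(all_values) ** 2 / n_total                  # step 1: T^2 / N
    sst = sum(x * x for x in all_values) - correction            # step 2
    ssc = sum(sum(col) ** 2 / len(col) for col in columns) - correction  # step 3
    sse = sst - ssc                                              # step 5
    df_between = len(columns) - 1
    df_within = n_total - len(columns)
    msc = ssc / df_between                                       # step 4
    mse = sse / df_within                                        # step 6
    return {"SST": sst, "SSC": ssc, "SSE": sse,
            "MSC": msc, "MSE": mse, "F": msc / mse,              # step 7
            "df": (df_between, df_within)}

# Illustrative data: three samples of four observations each
result = one_way_anova([[14, 16, 18, 22], [14, 13, 15, 19], [18, 16, 16, 20]])
for name, value in result.items():
    print(name, value)
```

The F value is then compared with the F table value for the returned degrees of freedom.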
ANOVA table
The results obtained from analysis of variance, whether one way classification or two way
classification, can be presented in a table called the ANOVA table. It shows the source of variation,
sum of squares, degrees of freedom, mean square (variance) and F ratio, along with table values. It
facilitates comprehension, comparison and analysis.
Ex. 8.3
Coding may be applied to reduce the data size. Thus, deducting 41 from all the values, the coded
data will be as below.
H0: Productivity does not differ between machines. Workers do not differ in productivity.
A B C D TOTAL
3 -3 6 -5 1
5 -1 11 2 17
-7 -5 3 -9 -18
2 -3 5 -8 -4
-3 1 8 -2 4
Column totals: 0 -11 33 -22 0
N = 20
SSR = $\sum(\text{row total})^2/c - T^2/N = \frac{(1)^2}{4} + \frac{(17)^2}{4} + \frac{(-18)^2}{4} + \frac{(-4)^2}{4} + \frac{(4)^2}{4} - 0$ = 161.5
MSR = SSR/(r - 1) = 161.5/4 = 40.38
MSE = SSE/((r - 1)(c - 1)) = 73.7/12 = 6.14
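The two-way calculation on the coded data of Ex 8.3 can be sketched in Python. The layout (five rows, four columns A to D) is taken from the coded table; the variable names are illustrative, and the code assumes one observation per cell.

```python
coded = [
    [3, -3, 6, -5],
    [5, -1, 11, 2],
    [-7, -5, 3, -9],
    [2, -3, 5, -8],
    [-3, 1, 8, -2],
]

r, c = len(coded), len(coded[0])
n_total = r * c
grand_total = sum(sum(row) for row in coded)
correction = grand_total ** 2 / n_total

sst = sum(x * x for row in coded for x in row) - correction
ssr = sum(sum(row) ** 2 for row in coded) / c - correction        # between rows
ssc = sum(sum(col) ** 2 for col in zip(*coded)) / r - correction  # between columns
sse = sst - ssr - ssc                                             # residual

msr = ssr / (r - 1)
msc = ssc / (c - 1)
mse = sse / ((r - 1) * (c - 1))

print(f"SSR = {ssr}, SSC = {ssc}, SSE = {sse:.1f}")
print(f"MSR = {msr:.2f}, MSC = {msc:.2f}, MSE = {mse:.2f}")
print(f"F (rows) = {msr / mse:.2f}, F (columns) = {msc / mse:.2f}")
```

Each F ratio is compared with the F table value for its own degrees of freedom, (r - 1, (r - 1)(c - 1)) for rows and (c - 1, (r - 1)(c - 1)) for columns.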
A B C
14 14 18
16 13 16
18 15 16
22 19 20
Is there significant difference in the production of three varieties?
Expected frequency = (row total x column total)/grand total = 358
(The total of expected frequencies must equal the total of observed frequencies. For this
purpose, figures must be rounded off.)
O E |O - E| $(O-E)^2/E$
300 358 58 9.39
350 358 8 .18
425 358 67 12.54
700 642 58 5.24
650 642 8 .10
575 642 67 6.99
Total 34.44
$\chi^2$ value = $\sum (O-E)^2/E$ = 34.44
$\chi^2$ table value @ $\alpha$ = 0.05 and degree of freedom = (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2 is
5.99. H0 is rejected because the calculated value is greater than the table value. So supporters are
not the same.
Ex. 10.2
In a hostel, there are 200 students, in 4 classes A: 70, B: 60, C: 30 & D: 40. Following
details are given on their meals heavy, medium & light. Are meals independent of class?
meals
class heavy medium light total
a 24 32 14 70
b 22 26 12 60
c 10 14 6 30
d 14 16 10 40
total 70 88 42 200
Ans: H0: No significant difference. Meals are independent of class.
O E |O - E| $(O-E)^2/E$
24 24 0 0
32 31 1 .03
14 15 1 .07
22 21 1 .05
26 26 0 0
12 13 1 .07
Total .22
Chi square table value at $\alpha$ = 0.05 and degree of freedom = (3 - 1)(4 - 1) = 2 x 3 = 6 is 12.59.
The calculated value 0.22 is much less than the table value. The hypothesis is accepted. Meals do
not depend on class.
b ) Tests for goodness of fit
If we have a set of frequencies of a distribution obtained by an experiment and if we are
interested in knowing whether these frequencies are consistent with those which may be
obtained based on some theory, then we can use chi square test of goodness of fit.
For example, if frequency distribution like Binomial or Poisson or Normal is applicable,
the expected frequencies would be derived using that distribution.
Ex .10.3
A sample analysis of the examination results of 200 students was made. It was found that
46 students had failed, 68 secured third class, 62 second class and the rest were placed in the first
division. Are these figures commensurate with the general examination results, which are in the
ratio 2 : 3 : 3 : 2 for the various categories respectively?
H0: No significant difference. There is goodness of fit. The result of the sample of 200 students is
commensurate with the general exam result.
O E $(O-E)^2$ $(O-E)^2/E$
46 40 36 0.9
68 60 64 1.07
62 60 4 0.07
24 40 256 6.4
Total 8.43
$\chi^2$ value = $\sum (O-E)^2/E$ = 8.43
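The goodness of fit statistic of Ex 10.3 can be computed with a short Python sketch. The expected frequencies are derived from the 2 : 3 : 3 : 2 ratio; the variable names are illustrative.

```python
observed = [46, 68, 62, 24]        # failed, third class, second class, first class
ratio = [2, 3, 3, 2]               # general examination result

total = sum(observed)
expected = [total * part / sum(ratio) for part in ratio]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(f"expected = {expected}, chi-square = {chi_square:.2f}, d.f. = {df}")
```

The statistic is compared with the chi square table value for the returned degrees of freedom.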
SINGLE 39 51 48 42 180
TOTAL 169 221 208 182 780
O E $(O-E)^2$ $(O-E)^2/E$
137 130 49 .4
32 39 49 1.3
164 170 36 .2
57 51 36 .7
152 160 64 .4
56 48 64 1.3
147 140 49 .4
35 42 49 1.2
Total 5.9
$\chi^2 = \sum (O-E)^2/E$ = 5.9
Degree of freedom = (r-1) (c-1) = (2-1) (4-1) = 3
Table value at $\alpha$ = 0.05 and 3 degrees of freedom = 7.815. The calculated value 5.9 is less
than the table value. Therefore the hypothesis is accepted. Cities show the same tendency to marry.
The $\chi^2$ test can be used for testing a given population variance (to test whether there is any
significant difference between the sample variance and the population variance).
The test statistic is obtained by the formula $\chi^2 = ns^2/\sigma^2$, where n is the sample size, $s^2$ is the
sample variance and $\sigma^2$ is the population variance. Degree of freedom = n - 1
Ex .10.5
The standard deviation of a sample of 10 observations from a normal population was found to be
5. Examine whether this is consistent with the hypothesis that the standard deviation of the
population is 5.3?
Ans: H0: No significant difference. Standard deviation of the population is 5.3.
$\chi^2 = ns^2/\sigma^2 = (10 \times 5^2)/5.3^2 = 250/28.09$ = 8.90
Table value of $\chi^2$ at 0.05 level of significance for 9 degrees of freedom is 16.92. The
calculated value 8.90 is less than the table value. Hence we accept the null hypothesis.
The population standard deviation can be taken as 5.3.
Ex 10.7 Following data relates to inoculation against fever. Is inoculation effective?
Attacked Not attacked
Inoculated 31 469
Not inoculated 185 1315
H0: The two attributes, inoculation and attack, are independent. Inoculation is not effective.
Expected frequencies = (row total x column total)/grand total: (500 x 216)/2000 = 54, and
similarly 446, 162 and 1338.
O E $(O-E)^2$ $(O-E)^2/E$
31 54 529 9.80
469 446 529 1.19
185 162 529 3.27
1315 1338 529 0.40
Total 14.66
Table value at 1 degree of freedom and 5% level of significance = 3.84. The calculated
value is much greater than the table value. Hence the hypothesis is rejected. The inoculation is
effective.
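The test of independence in Ex 10.7 can be sketched in Python for any contingency table given as rows of observed frequencies. The function name `chi_square_independence` is an illustrative assumption; expected frequencies are not rounded, so the result can differ slightly from a hand calculation.

```python
def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df

# Ex 10.7: rows = inoculated / not inoculated, columns = attacked / not attacked
chi_square, df = chi_square_independence([[31, 469], [185, 1315]])
print(f"chi-square = {chi_square:.2f}, d.f. = {df}")
```

The same function accepts larger tables, for example a 4 x 3 table of class by type of meal.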
Expected frequencies: 30, 52.5, 37.5, 31.25, 54.69, 39.06, 18.75, 32.81 and 23.44
O E $(O-E)^2$ $(O-E)^2/E$
22 30 64 2.13
38 52.5 210.25 4.00
60 37.5 506.25 13.50
35 31.25 14.06 0.45
70 54.69 234.4 4.29
20 39.06 363.26 9.30
23 18.75 18.06 0.96
32 32.81 0.66 0.02
20 23.44 11.83 0.50
Total 35.15
$\chi^2$ value = 35.15.
The calculated value is 35.15, whereas the table value is 9.488. The difference is significant. The
hypothesis is rejected. Intelligence is related to
Precautions for chi square test
The chi square test is no doubt one of the most frequently used tests, but its correct application is
an uphill task. It must be performed with special care and diligence.
1. It should be borne in mind that the test is to be applied only when the individual observations
of the sample are independent, which means that the occurrence of one individual observation
has no effect upon the occurrence of any other observation.
2. Small theoretical frequencies, if these occur in certain groups, should be dealt with special care,
and must be added along with an appropriate adjacent frequency.
3. The sum of observed frequencies and the sum of expected frequencies must be equalized by rounding off.
4. Level of significance and degrees of freedom must be properly ascertained.
Conditions in applying chi square test
1. The total frequencies must be reasonably large say, at least 50
2. Expected frequency of less than 5 is pooled with the preceding or succeeding frequency
so that no expected frequency is less than 5.
3. Accordingly the degrees of freedom must be modified.
4. The distribution should not be of proportions or percentages. It should be of original
units.
Contingency tables
A contingency table is a frequency table in which a sample from the population is
classified according to two attributes, which are divided into two or more classes. When there are
only two divisions for each attribute, the contingency table is known as a 2x2 contingency table.
The frequencies appearing in the table are known as cell frequencies. The independence of these
two attributes can be tested by the chi square test. A contingency table can have m rows and n
columns.
Sign test focuses on the + or - sign of each observation. In the sign test, we do not give
importance to the magnitude of observations. Usually the central point of a data set is the arithmetic
mean, where we give weightage to the magnitude of observations. That is not the case with the sign
test.
Since the location-wise central point is the median, the sign test considers the median in place of
the mean. Generally a median value will be given in the question, and the sign test
examines how many + signs there are, representing values above the median value, and how many
- signs, representing values less than the median value.
Sign test may be one sample sign test or two sample sign test. Again, such tests may be
conducted in the context of small samples and large samples.
one sample sign test
All tests concerning means that you have studied are based on the assumption that samples
are taken from a population having the shape of a normal distribution. When this assumption is
not possible, or any statistic is not available, vital questions still arise:
1. Is there a significant difference between actual observations and theoretically
expected observations?
2. Is it reasonable to believe that samples have been taken from a probabilistic sampling
distribution?
3. Is it reasonable to accept the sample as a random sample from a population? etc.
One sample sign test is applicable when the sample is taken from a population which is
continuous. In this case, the probability that a sample value is less than the median and the
probability that a sample value is greater than the median are both 1/2. Here the sign test is
used to test a hypothesis about the population median. The median divides the distribution into two
equal halves. Now we may examine whether the two halves are equal or not. If these halves are
equal or approximately equal, the hypothesised median is acceptable. This is the
rationale behind the one sample sign test.
Z = $(s - np)/\sqrt{npq}$
where, s = number of the maximum sign (+ or -)
n = total number of signs
p = population proportion (mostly 0.5)
q = 1 - p
steps
1. Form null hypothesis: no significant difference
2. Consider the observations given and the median value
3. Ascertain values greater than the median value and replace them with + sign
4. Ascertain values less than the median value and replace them with - sign
5. If any value happens to be equal to the given median value, no sign is assigned and
the sample size is reduced accordingly
6. Obtain the number of maximum signs = s
7. Ascertain the population proportion, as per the null hypothesis
8. Find z value as per the formula
z = $(s - np)/\sqrt{npq}$
where s = number of the maximum sign (+ or -)
p = population proportion
q = 1 - p
n = sample size
9. Compare with the z table value, and decide the fate of H0
EX 11 .3
The increase in pulse rate of 24 patients, measured before and after the drug Atenolol, is
given below. Examine whether the drug influences pulse rate by conducting a one sample sign test.
18 is the normal increase rate.
18 24 20 26 23 17 24 21 22 20 16 27 25 14 20 15 18 22 21 24 26 27 28
You may use 1% level of significance.
A. H0: No significant difference. The drug does not influence pulse rate.
Signs: 0 + + + + - + + + + - + + - + - 0 + + + + + +
- signs = 5
+ signs = 17
No sign = 2
s = 17, n = 24 - 2 = 22, p = q = 0.5
Z = $(s - np)/\sqrt{npq} = (17 - 11)/\sqrt{5.5}$ = 6/2.345 = 2.56
Z table value @ 0.01 level of significance = 2.58. The calculated value 2.56 is less than 2.58.
Hence, the hypothesis is accepted. The drug does not influence pulse rate.
Ex 11.4
A survey was conducted to study preference for fast food. A sample of 100 persons indicated
that 54 do not prefer fast food, and 46 preferred fast food. By using sign test, examine the
hypothesis that half of people prefer fast food.
A. H0: no significant difference; half of the persons prefer fast food.
Denote those who prefer fast food by a + sign, and those who do not by a - sign.
Then s = 54 (the more frequent sign), p = 0.5, q = 0.5, n = 100
$z = \dfrac{s - np}{\sqrt{npq}} = \dfrac{54 - 100 \times 0.5}{\sqrt{100 \times 0.5 \times 0.5}} = \dfrac{4}{5} = 0.8$
The Z table value at the 5% level of significance is 1.96, and the calculated sign test value
is 0.8. Hence the difference is considered insignificant. Thus the hypothesis is accepted and
half of the persons prefer fast food.
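The two worked sign tests above can be checked with a short sketch. The function name `sign_test_z` is an illustrative choice, not part of the study material; it only applies the normal-approximation formula z = (s - np)/sqrt(npq) given in the steps.

```python
import math

def sign_test_z(s, n, p=0.5):
    """Normal-approximation statistic for the sign test: (s - np) / sqrt(npq)."""
    q = 1 - p
    return (s - n * p) / math.sqrt(n * p * q)

# Ex 11.4: 54 of 100 persons carry the more frequent sign.
z_fast_food = sign_test_z(54, 100)
# Ex 11.3: 17 plus signs among the 22 observations that received a sign.
z_pulse = sign_test_z(17, 22)

print(round(z_fast_food, 2), round(z_pulse, 2))
```

Both values are then compared with the z table value for the chosen level of significance, exactly as in step 9.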
Sign test as proportion test
The two sample sign test can also be performed as a proportion test. The signs of the
differences between the two given sample groups provide the sample proportion. The population
proportion is taken as 0.5 under the null hypothesis. The standard error is used to compute the
test statistic, which is the ratio of the difference between the sample proportion and the
population proportion to the standard error. The test statistic is compared with the z table
value to reveal the significance of the difference.
Steps
1. Form H0.
2. Consider the given sample values and sample size.
3. Find the difference between pairs, and put appropriate signs.
4. Take the proportion of the more frequent sign (+ or -) as the sample proportion, $\hat{p} = s/n$.
5. Reduce the sample size n by the number of pairs with no sign.
6. Find the standard error, $SE = \sqrt{\dfrac{pq}{n}}$
7. Obtain the test statistic, $z = \dfrac{\hat{p} - p}{SE} = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$
Ex 11.5
To examine the effectiveness of a traffic signal system, the number of accidents before and
after its installation at a junction is given below. Use the sign test at $\alpha = 0.01$ to
examine the effectiveness.
9 and 5, 7 and 3, 3 and 4, 16 and 11, 12 and 7, 12 and 5, 5 and 5, 6 and 1
Is the training effective? Perform the Wilcoxon matched pairs signed rank test at the
5% level of significance.
Z = 0.83
Since the calculated Z value is less than the Z table value, the difference is considered
insignificant and the hypothesis is accepted. Training is not effective.
14. The pulse rates of 12 patients are measured before and after administering a drug. Test the
null hypothesis that the drug has no effect on pulse rate.
Test whether the distributions of scores under placebo and under drug are identical.
17. Following are 15 measurements of the octane rating of a certain kind of gasoline 97.5,
95.2, 97.3, 96.0, 96.8, 100.3, 97.4, 95.3, 93.2, 99.1, 96.1, 97.6, 98.2, 98.5, and 94.9. Use
the signed rank test at 0.05 level of significance to test whether the mean octane rating of
the given kind of gasoline is 98.5
18. Following are the numbers of employees absent from two government agencies on 25
days:
24-29 32-45 36-36 33-39 41-48 45-36 33-41 38-39 46-40 32-39
37-30 34-45 41-42 32-40 30-33 46-42 38-50 34-37 45-39 32-37
44-32 25-33 45-48 35-33 30-35
Use the Wilcoxon Two sample sign test at 0.05 level of significance to test the
hypothesis that absentees are uniform, all days.
Value Rank Sample
32 1 B
38 2 A
39 3 A
40 4 B
41 5 B
44 6.5 B
44 6.5 B
46 8 A
48 9 A
52 10 B
53 11.5 B
53 11.5 A
57 13 A
60 14 A
61 15 B
67 16 B
69 17 A
70 18 B
72 19.5 B
72 19.5 B
73 21.5 A
73 21.5 A
74 23 A
78 24 A
Sum of ranks assigned to sample A, R1 = 2+3+8+9+11.5+13+14+17+21.5+21.5+23+24 = 167.5
Since the calculated Z value for the U test, -1.01, is numerically less than the Z table value
1.96, the difference is not significant and therefore the hypothesis is accepted. The two
samples come from the same population.
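The rank sum and the Z value quoted above can be reproduced from the ranked table. The sketch below assumes the usual form of the statistic, U = n1n2 + n1(n1+1)/2 - R1, with mean n1n2/2 and standard deviation sqrt(n1n2(n1+n2+1)/12); the helper name `average_ranks` is illustrative.

```python
import math

# Values read from the ranked table above, split by sample label.
a = [38, 39, 46, 48, 53, 57, 60, 69, 73, 73, 74, 78]
b = [32, 40, 41, 44, 44, 52, 53, 61, 67, 70, 72, 72]

def average_ranks(values):
    """Rank from low to high; tied values share the mean of their ranks."""
    positions = {}
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return {v: sum(p) / len(p) for v, p in positions.items()}

rank_of = average_ranks(a + b)
r1 = sum(rank_of[v] for v in a)          # rank sum of sample A
n1, n2 = len(a), len(b)

u = n1 * n2 + n1 * (n1 + 1) / 2 - r1     # U statistic based on R1
mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mean_u) / sd_u

print(r1, u, round(z, 2))
```

Under these assumptions the sketch gives R1 = 167.5 and Z of about -1.01, in line with the figures in the text.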
Kruskal-Wallis Test (H Test)
This test is similar to one way analysis of variance, but it does not require the assumption
that the samples come from approximately normal populations or that the populations have the
same standard deviation or variance.
The Kruskal-Wallis test examines whether three or more independent samples come from
identical populations. It is used to test the null hypothesis that k independent random samples
come from identical universes against the alternative hypothesis that the medians of these
universes are not equal. The test focuses on the sums of ranks. The test statistic approximately
follows the chi square distribution and can therefore be compared with the chi square table
value in deciding on the null hypothesis.
In this test, as in the U test described above, the data are ranked jointly from low to high
or high to low, as if they constituted a single sample. The test statistic for this test is H.
If the null hypothesis is true, that is, there is no difference between the sample medians,
and each sample has at least five items, then the sampling distribution of H can be approximated
by a chi square distribution with (k - 1) degrees of freedom. We can therefore reject the null
hypothesis at a given level of significance if the calculated H value exceeds the corresponding
table value of chi square. For small samples the critical values can be taken from the Table
given in the appendix, for comparison and decision.
Steps:
1. Form the null hypothesis that the samples belong to identical populations.
2. Consider the values in the several sample groups.
3. Pool the values and rank them jointly in increasing order, from lowest to highest.
4. If there are ties, assign the mean of the tied ranks.
5. Find the sum of ranks for each sample separately and get R1, R2, R3, ...
6. Obtain the test statistic $H = \dfrac{12}{n(n+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(n+1)$,
where n is the total number of observations, $n_i$ the size of sample i and k the number of samples.
7. Compare with the chi square Table value at (k - 1) degrees of freedom and the specified
level of significance.
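Use the Kruskal-Wallis test at the 5% level of significance to test the null hypothesis that a
professional bowler performs equally well with the four bowling balls, given the following results.
$H = \dfrac{12}{n(n+1)} \sum \dfrac{R_i^2}{n_i} - 3(n+1)$
$H = \dfrac{12}{20 \times 21}\left(\dfrac{R_1^2}{n_1} + \dfrac{R_2^2}{n_2} + \dfrac{R_3^2}{n_3} + \dfrac{R_4^2}{n_4}\right) - 3(20 + 1) = 4.51$
The H value follows the chi-square distribution. The chi-square table value for 4 - 1 = 3
degrees of freedom at the 5% level is 7.815. The calculated value 4.51 is less than the chi-square
table value. The difference is not significant and the hypothesis is accepted.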
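As a check on the H value, the following sketch applies the steps above to the bowling scores listed in review exercise 14 of this unit (assumed here to be the data of this example, since the table itself is not reproduced). The variable names are illustrative.

```python
# Bowling scores as listed in review exercise 14 (assumed to be this example's data).
scores = {
    "a": [271, 282, 257, 248, 262],
    "b": [252, 275, 302, 268, 276],
    "c": [260, 255, 239, 246, 266],
    "d": [279, 242, 297, 270, 258],
}

def average_ranks(values):
    """Rank jointly from low to high; tied values share the mean of their ranks."""
    positions = {}
    for i, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(i)
    return {v: sum(p) / len(p) for v, p in positions.items()}

pooled = [v for group in scores.values() for v in group]
rank_of = average_ranks(pooled)
n = len(pooled)

# Rank sum R_i for each ball.
rank_sums = {ball: sum(rank_of[v] for v in group) for ball, group in scores.items()}

# H = 12 / (n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)
h = 12 / (n * (n + 1)) * sum(
    rank_sums[ball] ** 2 / len(group) for ball, group in scores.items()
) - 3 * (n + 1)

print(rank_sums, round(h, 2))
```

With these scores the rank sums come to 53, 68, 30 and 59 and H is about 4.51, which is below the table value 7.815.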
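$H = \dfrac{12}{n(n+1)} \sum \dfrac{R_i^2}{n_i} - 3(n+1)$
$H = \dfrac{12}{20 \times 21} \sum_{i=1}^{4} \dfrac{R_i^2}{n_i} - 3(20 + 1) = 4.51$
For the Kruskal-Wallis test, we follow the chi-square Table. The chi-square table value for
4 - 1 = 3 degrees of freedom is 7.815 at the 5% level of significance and 11.345 at the 1% level.
The calculated H value 4.51 is less than the table value at either level. The difference is
insignificant and the hypothesis is accepted. The four salesmen have performed equally in the
sales drive.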
In applying the statistical concepts discussed in the above units, it was always assumed
that the sample data had been collected by some randomization procedure.
The run test, based on the order in which sample observations are obtained, is a useful
technique for testing the null hypothesis that observations are drawn at random.
For example, suppose that 20 people are surveyed about their use of cosmetics. A researcher
may wish to test the randomness of the sample, that is, whether there is any bias towards men
or women.
In this case men may be denoted as M and women as W. The sequence in which the 20 men
and women were surveyed may be as below: WMMWWWMMWWWMMMMWMWMW. The first W is
a run, the next MM is another run, the next WWW is a run, and so on. There are altogether 11
runs in this survey.
Here each grouping is a run. Thus a run is a subsequence of one or more identical symbols
representing a common property of the data.
With $n_1$ and $n_2$ symbols of the two kinds and r runs, the test statistic is $Z = \dfrac{r - \mu_r}{\sigma_r}$, where
$\mu_r = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1$ and $\sigma_r = \sqrt{\dfrac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$
Steps
Ex 12.5
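20 men and 10 women queue before a bank. A man argues that men are neglected and
women favoured by the clerk. Is it true? Conduct the run test at $\alpha = 0.05$. Presently the
sequence of men and women is:
Number of runs r = 7
$\mu_r = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1 = \dfrac{2 \times 20 \times 10}{20 + 10} + 1 = 14.33$
$\sigma_r = \sqrt{\dfrac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\dfrac{400 \times 370}{900 \times 29}} = 2.38$
$Z = \dfrac{r - \mu_r}{\sigma_r} = \dfrac{7 - 14.33}{2.38} = -3.079$
The calculated value of Z for the runs test is -3.079, which in absolute value is more than the
Z table value of 1.96 at $\alpha = 0.05$. The difference is considered significant and the
hypothesis is rejected. Men and women were not served in random order, which supports the
argument that men are neglected and women favoured by the clerk.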
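The run count and the Z value can be verified with a small sketch. The function names `count_runs` and `runs_test_z` are illustrative; the formulas are the mean and standard deviation of the number of runs used in the worked example.

```python
import math
from itertools import groupby

def count_runs(sequence):
    """Number of runs, i.e. maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(sequence))

def runs_test_z(n1, n2, r):
    """Z statistic for the runs test with n1 and n2 symbols of each kind and r runs."""
    mean_r = 2 * n1 * n2 / (n1 + n2) + 1
    var_r = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return (r - mean_r) / math.sqrt(var_r)

# Cosmetics survey sequence quoted earlier in this unit.
runs_in_survey = count_runs("WMMWWWMMWWWMMMMWMWMW")

# Ex 12.5: 20 men, 10 women, 7 runs.
z_queue = runs_test_z(20, 10, 7)

print(runs_in_survey, round(z_queue, 2))
```

The survey sequence gives 11 runs, and the bank queue gives Z of about -3.08.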
Review questions and Exercises
12. A driver buys diesel either at a Texaco station (t), or at a Mobile station (m), and the
following arrangement shows the order of the stations from which she bought diesel over a
certain period of time.
t t t m t m t m t m m m
t m m m t m m t m t m m
t m m t t m t m m m t m
t t t m t t m t t t t m
Test for randomness at the 0.05 level of significance.
13. Following are the speeds at which every 5th passenger car was timed in a certain check point.
46 58 60 56 70 66 48 54 62 41 39 52
45 62 53 69 65 65 67 76 52 59 59 67
51 46 61 40 43 42 77 67 63 59 63 63
72 57 59 42 56 45 62 67 70 63 66 69
Test the null hypothesis for randomness at 0.05 level of significance. Do sign test.
(Hint median = 59.5, take value less than 59.5 as negative sign , and values greater than 59.5 as
positive signs. Any value equal to 59.5 is ignored)
14. Use the Kruskal-Wallis H test at the 5% level of significance to test the hypothesis
that a professional bowler performs equally well with 4 bowling balls.
score ball a 271 282 257 248 262
score ball b 252 275 302 268 276
score ball c 260 255 239 246 266
score ball d 279 242 297 270 258
15. Following are the numbers of students absent from a college on 24 days. Test for
randomness at $\alpha = 0.01$
29 25 31 28 30 28 33 31 35 39 31 33
35 28 36 30 33 26 30 28 32 31 38 27
[Figure: control chart showing UCL, CL and LCL; x-axis: sample number, y-axis: quality value]
Double sampling inspection plan provides for taking a second sample, if we are not in a
position to arrive at a decision, about accepting or rejecting a lot on the basis of a single sample.
[Figure: Mean Chart for samples 1 to 10]
Since all points fall within control limits, the process is under control.
Ex 14.2
In the production of cement, 10 sample bags were selected for quality and their weights are
given below. The standard deviation of weight is 0.5 kg.
[Figure: Mean Chart for samples 1 to 10]
Some points fall beyond the two control limits, therefore the process is not under control
Ex 14.3
Construct a control chart for mean from the following samples
CL = $\bar{\bar{X}}$ = 12.05 ; mean range $\bar{R}$ = 1.23
UCL = $\bar{\bar{X}} + A_2\bar{R}$ = 12.05 + 0.577 x 1.23 = 12.76
LCL = $\bar{\bar{X}} - A_2\bar{R}$ = 12.05 - 0.577 x 1.23 = 11.34
[Figure: Mean Chart for samples 1 to 10]
As all points fall within the upper and lower control limits, the process is under control.
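The control limits of the mean chart can be recomputed with a minimal sketch. The function name `mean_chart_limits` is an assumption made for illustration; the inputs are the grand mean 12.05, the mean range 1.23 and the constant A2 = 0.577 used in Ex 14.3.

```python
def mean_chart_limits(grand_mean, mean_range, a2):
    """Return (LCL, CL, UCL) for a mean chart using the A2 x mean-range method."""
    margin = a2 * mean_range
    return grand_mean - margin, grand_mean, grand_mean + margin

lcl, cl, ucl = mean_chart_limits(12.05, 1.23, 0.577)
print(round(lcl, 2), cl, round(ucl, 2))
```

A sample mean outside the interval from LCL to UCL would indicate that the process is out of control.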
Ex. 14.4
Construct control charts for Means and Ranges from the following samples
Sample number Mean length Range
1 12.2 1.2
2 12 1.4
3 11.8 1.2
4 12 1
5 12.2 1.2
6 12.1 1.4
Control limit values for Means
[Figure: Range Chart showing values, mean (CL), UCL and LCL]
As per the range chart, all range values come within the upper and lower control
limits, and the process is under control.
Ex. 14.5
You are given following sample numbers and ranges relating to production of writing
chalks. Construct range Chart and comment on the quality.
Sample 1 2 3 4 5 6 7 8 9 10
Range 5 6 5 7 7 4 8 6 4 6
[Figure: Range Chart for samples 1 to 10, showing values, mean (CL), UCL and LCL]
Since all range values fall within the two control limits, the process is under control
The range chart reveals process quality on the basis of the given ranges. Range is the
difference between the smallest value and the largest value within a sample group.
The range chart contains three lines: CL, UCL and LCL. It focuses on variations
within sample groups and is more analytical than the mean chart. The mean chart and the range
chart should be studied together, since the former reveals variations between samples and the
latter reveals variations within sample groups.
1. Explain variable
2. State the causes of variations in quality
3. Distinguish between variable and attribute
4. State uses of control charts
5. What are important variable control charts
6. Explain control limits for a mean chart
7. How is a mean chart drawn
8. Draw a model control chart for mean
9. Explain range Chart
10. State the interpretation of a Mean Chart
11. What are the steps in drawing a Range Chart
12. How is Range chart interpreted
13. What is UCL and LCL
Sample 1 2 3 4 5 6 7 8 9 10
X 43 49 37 44 45 37 51 46 43 47
R 5 6 5 7 7 4 8 6 4 6
You may use the following control chart constants.( For n = 5, A2 = 0.58,D3 = 0 and D4=
2.115)
15. A machine is set to deliver packets of a given weight. 10 samples of size 5 each were
recorded. below are given relevant data:
Sample no. 1 2 3 4 5 6 7 8 9 10
Mean 15 17 15 18 17 14 18 15 17 16
Range 7 7 4 9 8 7 12 4 11 5
Calculate the values for the central line and the control limits for the mean chart and then
comment on the state of control. (Conversion factors for n = 5 are A2 = 0.58, D3 = 0, D4 = 2.115)
16. Construct control charts for mean and range for the following data on fuses,
samples of 5 being taken every hour (each set of 5 has been arranged in ascending order
of magnitude)
42 42 19 36 42 51 60 18 15 69 64 61
65 45 24 54 51 74 60 20 30 109 90 78
75 68 80 69 57 75 72 27 39 113 93 94
78 72 81 77 59 78 95 42 62 118 109 109
87 90 81 84 78 132 138 60 84 153 112 136
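3. Obtain UCL = $\bar{p} + 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}}$
4. Obtain LCL = $\bar{p} - 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}}$
5. If the LCL falls below zero, take it as zero.
6. Draw the three lines on the control chart.
7. Ascertain and plot the fraction defective (p) of each sample.
8. Decide process quality on the basis of the plots.
Here $\bar{p}$ is the mean fraction defective, $\bar{q} = 1 - \bar{p}$ and n is the sample size.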
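In a certain sampling inspection, the numbers of defectives found in 10 samples of 100 each
are as given below:
16, 18, 11, 18, 21, 10, 20, 18, 17, and 21
Do these indicate that the quality characteristic under inspection is under statistical
control?
CL = $\bar{p} = \dfrac{170}{10 \times 100} = 0.17$
UCL = $\bar{p} + 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}} = 0.17 + 3\sqrt{\dfrac{0.17 \times 0.83}{100}} = 0.283$
LCL = $\bar{p} - 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}} = 0.17 - 3\sqrt{\dfrac{0.17 \times 0.83}{100}} = 0.057$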
P values = .16, .18, .11, .18, .21, .10, .20, .18, .17, and .21
[Figure: fraction defectives chart for samples 1 to 10, showing values, mean (CL), UCL and LCL]
All fraction defectives values fall within the control limits, and therefore the process is under
control.
Ex 15.2
20 Samples of 100 batteries each are taken from a production process which gives
number of defectives as below. Determine control chart limits for fraction defectives.
9,17, 8, 7, 12, 5, 11, 16, 14, 15, 10, 6, 7, 18, 16, 10, 5, 14, 7, 13
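CL = $\bar{p} = \dfrac{220}{20 \times 100} = 0.11$
UCL = $\bar{p} + 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}} = 0.11 + 3\sqrt{\dfrac{0.11 \times 0.89}{100}} = 0.204$
LCL = $\bar{p} - 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}} = 0.11 - 3\sqrt{\dfrac{0.11 \times 0.89}{100}} = 0.016$
Fraction defectives: 0.09, 0.17, 0.08, 0.07, 0.12, 0.05, 0.11, 0.16, 0.14, 0.15, 0.10, 0.06,
0.07, 0.18, 0.16, 0.10, 0.05, 0.14, 0.07, 0.13
[Figure: fraction defectives chart showing values, mean (CL), UCL and LCL]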
Since all fraction defective points come within the upper and lower control limits, the
process is said to be under control.
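The limits for Ex 15.2 can be checked with the sketch below, which follows the p chart steps (mean fraction defective plus and minus three standard errors, with a negative LCL taken as zero). The function name `p_chart_limits` is illustrative.

```python
import math

def p_chart_limits(defectives, sample_size):
    """Return (LCL, CL, UCL) of a p chart; a negative LCL is taken as zero."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    margin = 3 * math.sqrt(p_bar * (1 - p_bar) / sample_size)
    return max(0.0, p_bar - margin), p_bar, p_bar + margin

# Ex 15.2: defectives in 20 samples of 100 batteries each.
batteries = [9, 17, 8, 7, 12, 5, 11, 16, 14, 15, 10, 6, 7, 18, 16, 10, 5, 14, 7, 13]
lcl, cl, ucl = p_chart_limits(batteries, 100)

# Samples whose fraction defective falls outside the control limits.
out_of_control = [d for d in batteries if not lcl <= d / 100 <= ucl]

print(round(lcl, 3), round(cl, 2), round(ucl, 3), out_of_control)
```

An empty `out_of_control` list corresponds to the conclusion that the process is under control.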
Ex 15.3
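The average fraction defective in 22 sampled lots of 2000 rubber belts each was found to
be 16%. Indicate how to construct the relevant control chart.
CL = $\bar{p}$ = 0.16
Obtain UCL = $\bar{p} + 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}} = 0.16 + 3\sqrt{\dfrac{0.16 \times 0.84}{2000}} = 0.1846$
Obtain LCL = $\bar{p} - 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}} = 0.16 - 3\sqrt{\dfrac{0.16 \times 0.84}{2000}} = 0.1354$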
For drawing the control chart for fraction defectives, defectives values under
each sample must be given.
Ex 15.4
A daily sample of 30 cell phones was taken over a period of 14 days in order to
establish attributes control limits. If 21 defectives were found, what should be the upper
and lower control limits of the proportion of defectives?
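CL = $\bar{p} = \dfrac{\text{total defectives}}{\text{total items inspected}} = \dfrac{21}{30 \times 14} = \dfrac{21}{420} = 0.05$
UCL = $\bar{p} + 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}} = 0.05 + 3\sqrt{\dfrac{0.05 \times 0.95}{30}} = 0.05 + 0.119 = 0.169$
LCL = $\bar{p} - 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}} = 0.05 - 0.119 = -0.069$, taken as zero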
Interpretation of P chart
From the P chart, a process is judged to be in statistical control if all the sample points fall
within control limit. If one or more points fall beyond the two control limits-UCL and LCL, it
shows deterioration in quality. Reasons for this should be traced out and eliminated. If a point
goes below LCL or above UCL, it is an indication of bad quality. On the basis of fraction defectives
chart, reasons for deviation from the ideal product quality specification can be studied and
corrective measures adopted.
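The P chart is based on fraction defectives, i.e. it applies when the given data are numbers
of defectives in a certain number of samples of definite size. Fraction defectives are, however,
a little difficult to calculate and present. The number of defectives (np) chart plots the
number of defectives itself.
Thus CL = $n\bar{p}$
UCL = $n\bar{p} + 3\sqrt{n\bar{p}\bar{q}}$
LCL = $n\bar{p} - 3\sqrt{n\bar{p}\bar{q}}$
Steps
1. Select scale and draw the x and y axes.
2. Find the mean number of defectives per sample, $n\bar{p}$ = total defectives / number of samples = CL.
3. Ascertain $\bar{p} = n\bar{p}/n$ and $\bar{q} = 1 - \bar{p}$.
4. Ascertain UCL = $n\bar{p} + 3\sqrt{n\bar{p}\bar{q}}$.
5. Find LCL = $n\bar{p} - 3\sqrt{n\bar{p}\bar{q}}$.
6. If LCL is less than zero, it is taken as zero.
7. Draw the three lines, plot the number of defectives and decide the quality.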
Ex 15.5
A sample of 100 items was examined each hour from a production process. The number
of defectives so found on a day is reproduced below:
Find the control limits for number of defectives and comment on the state of control of the
process.
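$\bar{p}$ = 0.08 and therefore $\bar{q}$ = 1 - 0.08 = 0.92
UCL = $n\bar{p} + 3\sqrt{n\bar{p}\bar{q}} = 8 + 3\sqrt{100 \times 0.08 \times 0.92} = 16.1$
LCL = $n\bar{p} - 3\sqrt{n\bar{p}\bar{q}} = 8 - 3\sqrt{100 \times 0.08 \times 0.92} = -0.1$, taken as 0. (If LCL is
less than zero, it is to be taken as zero.)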
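The same limits follow from a short sketch of the number of defectives chart. The function name `np_chart_limits` is illustrative; it takes the mean fraction defective 0.08 and the sample size 100 from Ex 15.5.

```python
import math

def np_chart_limits(p_bar, n):
    """Return (LCL, CL, UCL) of a number-of-defectives chart; negative LCL becomes zero."""
    centre = n * p_bar
    margin = 3 * math.sqrt(n * p_bar * (1 - p_bar))
    return max(0.0, centre - margin), centre, centre + margin

lcl, cl, ucl = np_chart_limits(0.08, 100)
print(lcl, cl, round(ucl, 1))
```

Here the raw lower limit is slightly negative, so it is reported as zero in keeping with step 6.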
[Figure: number of defectives chart for samples 1 to 24, showing values, mean (CL), UCL and LCL]
Ex. 15.6
[Figure: Number of defectives chart for samples 1 to 10, showing values, mean (CL), UCL and LCL]
No point falls beyond control limits, and therefore, the quality control of this firm is good.
Interpretation of np (or d) chart
Ex. 15.7
10 Pieces of cloth, out of different rolls, were inspected and following defects found. Draw
control chart for number of defects and comment on the quality.
Since all defects values fall within control limits, the process is under quality control
Interpretation of C chart
In spite of wide applicability of Mean chart and Range chart, a number of practical
situations exist in many industries where numbers of defects are to be focused in order to control
quality of manufactured products.
The C chart is based on the number of defects found on inspection of units of products such
as cloth, carpet, paper, sheet, glass etc. They can be observed on inspection of sample units.
When the number of defects falls outside the upper or lower control limit, the process is said
to be beyond control, and needs corrective action.
2 3 4 0 5 6 7 4 3 2
How to identify and group causes of inferior quality? The question is difficult to answer.
Total quality management approach to complex business operations began with the realization
that all errors, defects and problems have causes, and that there is only a finite number of such
causes.
One crucial process in total quality management is to identify and discriminate between
things gone right and things gone wrong. In our airport example, some of the planes do leave on
time ( things gone right) . When we observe late departures (things gone wrong), we can begin
to build up a list of causes behind their delays.
Causes of problems can be gathered into logical groups. There will be various reasons for
departure delays, and there will be cause and effect relationship among them. These
relationships can be captured pictorially in a cause and effect diagram. Such diagrams are called
Ishikawa diagrams or Fishbone diagrams. A fishbone diagram takes an unstructured list of factors
that contribute to delayed take-offs, and organizes that list in two major ways. First, it gathers
the factors into logical groups. Then, within the groups, it indicates how various factors feed
into one another in cause and effect relationships.
Fishbone diagrams point out that employees at all levels must be involved in total quality
management, to be successful. Baggage handlers are much more likely than top management to
be able to identify a complete list of baggage problems that contribute to take off delays.
Besides, they are also very likely to be able to suggest ways to improve baggage operations.
Thus, fishbone diagram is the prominent tool of analysis in total quality management. It
sets out all the likely causes of departure from quality, along with their sources. Following is a
Fishbone diagram presentation of causes and effects of relationships among various operations in
a large airport. It identifies and presents things gone right and things gone wrong, which leads to
In any quality improvement process, there are likely to be a very large number of causes
for defects and errors. Looking at all the possible things that can go wrong, even if they are
organized into a neat fishbone diagram, can lead even well motivated people to despair at the
complexities. The solution is to distinguish between vital few and trivial many. In our airport
example, most of the delays are due to baggage handling, and only one delay a year is
attributable to a freak hail storm. In total quality improvement, companies must slay the dragons
first in working to improve the quality of their goods or services. A Pareto chart will assist in
identifying major causes.
Pareto diagrams
A fishbone diagram reveals the source, cause and effect of each variation in quality. When
such causes are revealed, they must be closely analyzed as to the contribution of each cause and
its effect. The diagram enabling such an analysis is called a Pareto diagram.
[Figure: Pareto chart of causes of departure delays: Baggage, Equipment, Passenger, Service, Crew, Other]
[Figure: Pareto chart of computer component faults: FD, HD, I/O, Keyboard, Monitor, Power, RAM, ROM, Video, Other]
Producer's risk
Any person or firm that manufactures or assembles goods to be supplied to others is
called a producer. Any sampling inspection plan carries the possibility of rejecting a lot of
satisfactory quality. The probability of rejecting such a lot under a sampling inspection plan
is called producer's risk. Thus producer's risk stands for the probability of rejecting a good lot.
Consumer's risk
Any person or firm that buys and uses a product or service is called the consumer. The
consumer has the risk of accepting a lot of unsatisfactory goods on the basis of a sampling
inspection plan. The probability of accepting a lot which may be defective is called consumer's
risk. Thus consumer's risk stands for the probability of accepting a poor lot.
1. Define total quality management
2. What are the causes of failures of control charts
3. What is the basis of total quality management
4. Explain reasons for variations in quality
Components Faults
CPU 5
HDD 10
I/O ports 10
Keyboard 50
Power supply 20
RAM 5
ROM 12
Adaptor 5
Others 6
Construct a Pareto chart.
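The formula can be further simplified as $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$, where x and y are deviations
from the actual means.
If deviations are taken from assumed means, instead of the actual means, the formula will
be as below:
$r = \dfrac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{[N\sum d_x^2 - (\sum d_x)^2][N\sum d_y^2 - (\sum d_y)^2]}}$, where dx, dy are deviations from assumed
means.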
Ex 17.1
Following are data on price and supply of a commodity. Find the correlation between
price and supply and comment on the degree of correlation between them.
X 22 26 29 30 31 33 34 35
Y 19 21 22 29 27 24 27 31
x = X - 30   y = Y - 25   x²   y²   xy
-8 -6 64 36 48
-4 -4 16 16 16
-1 -3 1 9 3
0 4 0 16 0
1 2 1 4 2
3 -1 9 1 -3
4 2 16 4 8
5 6 25 36 30
Total 0 0 132 122 104
$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \dfrac{104}{\sqrt{132 \times 122}} = \dfrac{104}{126.9} = 0.82$
There is a high degree of positive correlation. That is, when price increases, supply also
increases to a great extent.
Ex 17.2
Calculate Karl Pearsons coefficient of correlation, from the following data on price and demand
for a product.
X 2 4 5 6 8 11
Y 18 12 10 8 7 5
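X   Y   X²   Y²   XY
2 18 4 324 36
4 12 16 144 48
5 10 25 100 50
6 8 36 64 48
8 7 64 49 56
11 5 121 25 55
Total 36 60 266 706 293
$r = \dfrac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} = \dfrac{6 \times 293 - 36 \times 60}{\sqrt{(6 \times 266 - 36^2)(6 \times 706 - 60^2)}} = \dfrac{-402}{436.8} = -0.92$
There is a high degree of negative correlation. That is, when price increases, demand
decreases, almost in the same manner.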
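The two coefficients above can be recomputed directly from the raw observations. The sketch below uses the product-sum form of Karl Pearson's formula; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient computed from raw sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Ex 17.1: price and supply.
r_supply = pearson_r([22, 26, 29, 30, 31, 33, 34, 35], [19, 21, 22, 29, 27, 24, 27, 31])
# Ex 17.2: price and demand.
r_demand = pearson_r([2, 4, 5, 6, 8, 11], [18, 12, 10, 8, 7, 5])

print(round(r_supply, 2), round(r_demand, 2))
```

Because r is unaffected by a change of origin, the raw-value formula gives the same result as the deviation method used in Ex 17.1.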
Ex 17.3
Calculate correlation coefficient in the short cut method, ie, using assumed means.
X 70 92 80 74 65 83
Y 74 84 63 87 78 90
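$r = \dfrac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{[N\sum d_x^2 - (\sum d_x)^2][N\sum d_y^2 - (\sum d_y)^2]}} = 0.2397$
There is a very low degree of positive correlation. That is, when variable X increases,
variable Y also increases, but only slightly.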
Ex. 17.4
Find the coefficient of correlation between age and playing habit of students.
Age 14.5-15.5 15.5-16.5 16.5-17.5 17.5-18.5 18.5-19.5 19.5-20.5
Students 250 200 150 120 100 80
Players 200 150 90 48 30 12
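X = age (class mid-point) ; Y = percentage of players = (players / students) x 100, e.g.
(200 / 250) x 100 = 80 for the first class.
$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = -0.99$
There is a very high degree of negative correlation between age and playing habit. That is,
as age increases, playing habit decreases, in an almost perfect manner.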
Interpretation of r
Karl Pearsons coefficient of correlation = r has become the most popular measure of
correlation. It represents the degree or intensity of relation between two variables. The
interpretation of r may take following forms:
1. When the value of r is +1, it represents perfect positive correlation, and when it is -1, it
represents perfect negative correlation.
2. When the numerical value of r is below 0.3, it shows a low level of correlation.
3. When the numerical value is between 0.3 and 0.7, it shows a moderate level of correlation.
4. When the numerical value is above 0.7, it shows a high degree of correlation.
Properties of correlation coefficient
1. The value of r lies between -1 and +1.
2. The value of r is a pure number, independent of the units of measurement.
3. Karl Pearson's r will not change due to a change of origin or scale.
4. Karl Pearson's r between x and y will be the same as between y and x.
5. It does not represent a causal relation.
Probable error
Probable error of the coefficient of correlation is a statistical measure of the
reliability and dependability of the value of the coefficient of correlation. If the probable
error is added to and subtracted from the coefficient of correlation, it gives two limits
within which we can reasonably expect the value of the coefficient of correlation to vary.
Usually the coefficient of correlation is calculated from samples. For different samples drawn
from the same population, the coefficient of correlation may vary. But the numerical value of
such variation is expected to be less than the probable error.
The formula for finding probable error is $PE = 0.6745 \times \dfrac{1 - r^2}{\sqrt{n}}$
Where r = coefficient of correlation, n = number of pairs of observations
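As an illustration, the sketch below applies the probable error formula to the coefficient r = 0.82 obtained from the 8 pairs of Ex 17.1. Pairing the formula with that example is an assumption made here for demonstration; the function name `probable_error` is likewise illustrative.

```python
import math

def probable_error(r, n):
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

r = 0.82                      # coefficient from Ex 17.1
pe = probable_error(r, 8)     # 8 pairs of observations
lower, upper = r - pe, r + pe

print(round(pe, 3), round(lower, 3), round(upper, 3))
```

The pair `lower`, `upper` gives the limits within which the coefficient may reasonably be expected to vary from sample to sample.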
The coefficient of correlation indicates the nature and extent of relation between 2 variables.
A further measure is required to explain how much of the variation in one variable is accounted
for by the other. It is the coefficient of determination.
Coefficient of determination = $r^2 = \dfrac{\text{explained variance}}{\text{total variance}}$
25. Find Karl Pearsons coefficient of correlation between x and y from following data giving
test scores of 10 candidates in mathematics and statistics and interpret
Scores in Mathematics : 98 70 40 20 85 75 95 80 10 5
Scores in Statistics : 85 65 32 30 80 60 61 55 54 65
26. Find the coefficient of correlation from the following data give below
X: 12 20 15 18 33 24 30 12 15 22
Y: 30 35 28 36 29 39 30 25 30 38
27. Find the coefficient of correlation between sales and expenses of following 10 firms
(Figures omitting 000s)
Firms : 1 2 3 4 5 6 7 8 9 10
Sales : 50 50 55 60 65 65 65 60 60 50
Expenses : 11 13 14 16 16 15 15 14 14 13
28. Compute the coefficient of correlation for the following data and interpret it
Year 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934
Labors 368 384 385 361 347 384 395 503 400 385
bales 22 20 21 24 20 22 26 26 29 28
29. Find Karl Pearsons coefficient of correlation between two variables of x and y from given
below. Also find probable error and interpret
X: 78 89 96 69 59 79 68 61
Y: 121 137 156 112 107 136 223 108
30. Find coefficient of correlation between sales and price from following data. Also find
Probable Error (P. E)
Price (RS) 100 90 85 92 90 84 88 90 93 95
Sales (Units) 600 610 700 630 760 800 800 750 700 680
31. Establish correlation between the following pair of series and find out probable error and
also interpret your findings
X: 17 19 20 22 24 27 29 30 33 35
Y: 87 85 80 78 75 72 70 65 62 60
Uses
1. Rank correlation coefficient R is used to measure the degree of association between two
attributes
2. It is used in cases where exact or reliable measurement are not available like beauty,
attitude, intelligence, etc.
3. It gives a quick estimate of the degree of association between two attributes, avoiding
heavy calculation of Karl Pearsons coefficient of correlation
Ex. 18.1. The rankings of 10 beauty contestants by two judges A and B are given below. Calculate
Spearman's rank correlation coefficient.
Contestant A B C D E F G H I J
Judge A 1 6 3 9 5 2 7 10 8 4
Judge B 6 8 3 2 7 10 5 9 4 1
Ex 18.2
Data on intelligence and income are given. Find Rank correlation coefficient
Intelligence 17 13 15 16 6 11 14 9 7 12
Income 36 46 35 24 12 18 27 22 2 8
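Intelligence  Rank (R1)  Income  Rank (R2)  D  D²
17 1 36 2 1 1
13 5 46 1 4 16
15 3 35 3 0 0
16 2 24 5 3 9
6 10 12 8 2 4
11 7 18 7 0 0
14 4 27 4 0 0
9 8 22 6 2 4
7 9 2 10 1 1
12 6 8 9 3 9
Total ΣD² = 44
Spearman's rank correlation $R = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 44}{10(10^2 - 1)} = 1 - \dfrac{264}{990} = 0.733$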
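The ranking and the coefficient of Ex 18.2 can be reproduced with the sketch below. It ranks the largest value as 1, as in the table, and assumes there are no tied values (true for this data); the function names are illustrative.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    """Spearman's rank correlation: 1 - 6 * sum(D^2) / (n (n^2 - 1))."""
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

intelligence = [17, 13, 15, 16, 6, 11, 14, 9, 7, 12]
income = [36, 46, 35, 24, 12, 18, 27, 22, 2, 8]

rank_corr = spearman_r(intelligence, income)
print(round(rank_corr, 3))
```

With tied values the mean of the tied ranks would have to be assigned instead, which this simple sketch does not handle.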
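$r_{12.3} = \dfrac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
$r_{23.1} = \dfrac{r_{23} - r_{12} r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}}$
$r_{13.2} = \dfrac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}}$
If r12 = 0.7, r13 = 0.61, r23 = 0.4, find r12.3, r23.1, r13.2.
$r_{12.3} = \dfrac{0.7 - 0.61 \times 0.4}{\sqrt{(1 - 0.61^2)(1 - 0.4^2)}} = \dfrac{0.456}{0.726} = 0.63$
$r_{23.1} = \dfrac{0.4 - 0.7 \times 0.61}{\sqrt{(1 - 0.7^2)(1 - 0.61^2)}} = \dfrac{-0.027}{0.566} = -0.048$
$r_{13.2} = \dfrac{0.61 - 0.7 \times 0.4}{\sqrt{(1 - 0.7^2)(1 - 0.4^2)}} = \dfrac{0.33}{0.655} = 0.50$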
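$R_{1.23} = \sqrt{\dfrac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}}$
$R_{2.13} = \sqrt{\dfrac{r_{12}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{13}^2}}$
$R_{3.12} = \sqrt{\dfrac{r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{12}^2}}$
Where r12, r13, r23 are simple correlation coefficients.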
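Partial and multiple correlation coefficients can both be computed from the three simple coefficients. The sketch below uses the values r12 = 0.7, r13 = 0.61 and r23 = 0.4 from the partial correlation example; the function names are illustrative, and the multiple coefficient R1.23 for these values is an additional computation not worked in the text.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation r_ab.c, holding variable c constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def multiple_r(r_ab, r_ac, r_bc):
    """Multiple correlation R_a.bc of variable a on variables b and c."""
    return math.sqrt((r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc ** 2))

r12, r13, r23 = 0.7, 0.61, 0.4

r12_3 = partial_r(r12, r13, r23)   # variable 3 held constant
r23_1 = partial_r(r23, r12, r13)   # variable 1 held constant
r13_2 = partial_r(r13, r12, r23)   # variable 2 held constant
R1_23 = multiple_r(r12, r13, r23)  # variable 1 on variables 2 and 3

print(round(r12_3, 2), round(r23_1, 3), round(r13_2, 2), round(R1_23, 3))
```

The argument order matters: the first argument is always the simple coefficient between the two variables of interest, and the other two link each of them to the remaining variable.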
Ex.18.4
Given following simple correlation coefficients : . , . ,
. . .
, and . ,
. . . . .
. = = .
= 0.92
. . . . .
. = = .
= 0.84
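$R_{1.23} = \sqrt{\dfrac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}} = \sqrt{0.972} = 0.986$
$R_{2.13} = \sqrt{\dfrac{r_{12}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{13}^2}} = \sqrt{0.9751} = 0.9875$
$R_{3.12} = \sqrt{\dfrac{r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{12}^2}} = \sqrt{0.4924} = 0.7017$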
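Ex 18.6
If $x_1$, $x_2$ and $x_3$ are three variables measured from their means with N = 10, $\sum x_1^2$ = 90,
$\sum x_2^2$ = 160, $\sum x_3^2$ = 40, $\sum x_1 x_2$ = 60, $\sum x_2 x_3$ = 60, $\sum x_3 x_1$ = 40, calculate the partial
correlation coefficient $r_{13.2}$ and the multiple correlation coefficient $R_{1.23}$.
Ans: $r_{12} = \dfrac{\sum x_1 x_2}{\sqrt{\sum x_1^2 \sum x_2^2}} = \dfrac{60}{\sqrt{90 \times 160}} = 0.5$
$r_{13} = \dfrac{\sum x_1 x_3}{\sqrt{\sum x_1^2 \sum x_3^2}} = \dfrac{40}{\sqrt{90 \times 40}} = 0.67$
$r_{23} = \dfrac{\sum x_2 x_3}{\sqrt{\sum x_2^2 \sum x_3^2}} = \dfrac{60}{\sqrt{160 \times 40}} = 0.75$
$r_{13.2} = \dfrac{r_{13} - r_{12} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} = \dfrac{0.67 - 0.5 \times 0.75}{\sqrt{(1 - 0.25)(1 - 0.5625)}} = \dfrac{0.295}{0.573} = 0.515$
$R_{1.23} = \sqrt{\dfrac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}} = \sqrt{\dfrac{0.25 + 0.4489 - 0.5025}{0.4375}} = 0.67$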
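$b_{yx} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$, where X, Y are observations directly taken.
Steps in regression
1. Consider the given values.
2. Find $\sum X$, $\sum Y$, $\sum X^2$ and $\sum XY$, where values are in original form.
3. Instead of original values, deviations from actual means or assumed means may be taken.
4. Find the regression coefficient using the formula $b_{yx} = \dfrac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$
5. Substitute the regression coefficient into the equation $y - \bar{y} = b_{yx}(x - \bar{x})$
6. Obtain the equation in the form y = a + bX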
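Ex 19.1
From the following data of values of x and y, find the regression equation of y on x
x : 2 3 4 5 6
y : 3 5 4 8 9
Ans:
x   y   xy   x²
2 3 6 4
3 5 15 9
4 4 16 16
5 8 40 25
6 9 54 36
Total 20 29 131 90
The equation of the regression line of y on x is $y - \bar{y} = b(x - \bar{x})$
where $b = \dfrac{N\sum xy - \sum x \sum y}{N\sum x^2 - (\sum x)^2} = \dfrac{5 \times 131 - 20 \times 29}{5 \times 90 - 20^2} = \dfrac{655 - 580}{450 - 400} = \dfrac{75}{50} = 1.5$
$\bar{x} = \dfrac{\sum x}{N} = \dfrac{20}{5} = 4$ ; $\bar{y} = \dfrac{\sum y}{N} = \dfrac{29}{5} = 5.8$
The equation is y - 5.8 = 1.5 (x - 4)
y - 5.8 = 1.5x - 6
y = 1.5x - 6 + 5.8
y = 1.5x - 0.2
The regression equation of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$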
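The regression line of Ex 19.1 can be checked with a short sketch of the steps in regression. The function name `regression_y_on_x` is illustrative; it returns the intercept a and slope b of y = a + bX.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) of the regression line y = a + b x fitted from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # regression coefficient byx
    a = sy / n - b * sx / n                          # intercept from the two means
    return a, b

a, b = regression_y_on_x([2, 3, 4, 5, 6], [3, 5, 4, 8, 9])
print(round(a, 2), round(b, 2))
```

Swapping the two argument lists gives the regression of x on y in the same way.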
Ex 19.2
From the following data of age of husband and age of wife, form the two regression equations
and calculate (i) Husband's age when wife's age is 16
(ii) Wife's age when husband's age is 40.
Husband's age (X) 36 23 27 28 28 29 30 31 33 35
Wife's age (Y) 29 18 20 22 27 21 29 27 29 28
Ans:
$\bar{x} = \dfrac{\sum X}{N} = \dfrac{300}{10} = 30$ ; $\bar{y} = \dfrac{\sum Y}{N} = \dfrac{250}{10} = 25$
With x = X - 30 and y = Y - 25,
$b_{yx} = \dfrac{\sum xy}{\sum x^2} = \dfrac{123}{138} = 0.89$
$b_{xy} = \dfrac{\sum xy}{\sum y^2} = \dfrac{123}{164} = 0.75$
Equation of the line of regression of Y on X
$y - \bar{y} = b_{yx}(x - \bar{x})$, i.e. y - 25 = 0.89 (x - 30)
y = 0.89x - 26.7 + 25
y = 0.89x - 1.7
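$b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$
$b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$
Ex 19.4
Ans:
Given $\bar{x}$ = 36 ; $\bar{y}$ = 85 | $\sigma_x$ = 11 ; $\sigma_y$ = 8 | r = 0.66
$b_{yx} = r\dfrac{\sigma_y}{\sigma_x} = 0.66 \times \dfrac{8}{11} = 0.48$
$b_{xy} = r\dfrac{\sigma_x}{\sigma_y} = 0.66 \times \dfrac{11}{8} = 0.91$
Equation of regression of Y on X
$y - \bar{y} = b_{yx}(x - \bar{x})$
y - 85 = 0.48 (x - 36) = 0.48x - 17.28
y = 0.48x + 67.72
Equation of regression of X on Y
$x - \bar{x} = b_{xy}(y - \bar{y})$
x - 36 = 0.91 (y - 85) = 0.91y - 77.35
x = 0.91y - 41.35
To estimate the value of x when the value of y = 75, we use the regression equation of x on y:
x = 0.91 (75) - 41.35 = 68.25 - 41.35 = 26.9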
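When only the means, standard deviations and r are given, both regression lines follow from byx = r(sd of y / sd of x) and bxy = r(sd of x / sd of y). The sketch below uses the figures of Ex 19.4; the function name `regression_lines` is illustrative. It keeps the coefficients unrounded, so its intercepts differ slightly from the worked answer, which rounds bxy to 0.91 first.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((a_y, byx), (a_x, bxy)) for y = a_y + byx*x and x = a_x + bxy*y."""
    byx = r * sd_y / sd_x
    bxy = r * sd_x / sd_y
    return (mean_y - byx * mean_x, byx), (mean_x - bxy * mean_y, bxy)

(a_y, byx), (a_x, bxy) = regression_lines(36, 85, 11, 8, 0.66)

# Estimate x when y = 75 using the regression equation of x on y.
x_at_75 = a_x + bxy * 75

print(round(byx, 2), round(bxy, 2), round(a_y, 2), round(x_at_75, 1))
```

The estimate of x at y = 75 agrees with the worked answer to one decimal place.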
Ex 19.5
From the following results, estimate the yield of crops when the rainfall is 22 cm and the
rainfall when the yield is 600 kg.
Measure | Yield in kg (Y) | Rainfall in cm (X)
Arithmetic Mean | 508.4 | 26.7
Standard Deviation | 36.8 | 4.6
Ans:
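Ex 19.6
Find r if byx = -0.2 ; bxy = -0.7
Ans:
$r = \pm\sqrt{b_{yx} \times b_{xy}} = -\sqrt{0.2 \times 0.7} = -\sqrt{0.14} = -0.37$ (r is negative because both
regression coefficients are negative)
Ex 19.7
byx = 0.83 ; $\sigma_x$ = 10 ; $\sigma_y$ = 12. Find r
Ans:
$b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$
$0.83 = r \times \dfrac{12}{10}$
0.83 = r x 1.2
$r = \dfrac{0.83}{1.2} = 0.69$, or approximately 0.7
Ex 19.8 Comment on the following results.
For a bivariate distribution,
1. Coefficient of regression of y on x is 4.2 and coefficient of regression of x on y is 0.50
2. bxy = -0.82 and byx = 0.25
Ans: 1. $r^2 = b_{yx} \times b_{xy}$ = 4.2 x 0.5 = 2.10
But r square cannot be greater than 1. Hence the given statement is wrong.
2. byx and bxy cannot have opposite signs. So the statement is wrong.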
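The checks applied in Ex 19.6 and Ex 19.8 can be combined into one small sketch. The function name `r_from_coefficients` and the use of `ValueError` for inconsistent inputs are illustrative choices.

```python
import math

def r_from_coefficients(byx, bxy):
    """Correlation coefficient from the two regression coefficients.

    Raises ValueError when the pair is inconsistent: opposite signs,
    or a product (r squared) greater than 1.
    """
    product = byx * bxy
    if product < 0:
        raise ValueError("regression coefficients cannot have opposite signs")
    if product > 1:
        raise ValueError("r squared cannot exceed 1")
    sign = -1 if byx < 0 else 1      # r takes the common sign of the coefficients
    return sign * math.sqrt(product)

r_ex_19_6 = r_from_coefficients(-0.2, -0.7)
print(round(r_ex_19_6, 2))

for pair in [(4.2, 0.5), (0.25, -0.82)]:   # the two statements of Ex 19.8
    try:
        r_from_coefficients(*pair)
    except ValueError as err:
        print(pair, "is inconsistent:", err)
```

Both statements of Ex 19.8 are rejected by the function, for the same reasons given in the answer.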
1. Regression helps to obtain most probable values of one variable for given values of
other variable.
2. It helps to study the effect of price on supply or demand of a commodity.
3. It is widely applied in physical science where the relation is functional.
4. It is used to describe the nature of relation between 2 or more variable.
5. It reveals rate of change in one variable based on change in other variable.
Properties of Regression Coefficient
1. Both regression coefficients will have the same sign. That is, both will be positive
or both will be negative.
2. The product of the two regression coefficients equals the square of the correlation coefficient: byx × bxy = r².
3. byx and bxy will have the same sign as the correlation coefficient.
4. byx = r σy / σx and bxy = r σx / σy
5. byx and bxy cannot both be greater than one (in absolute value) at the same time.
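These properties can be verified numerically. The sketch below computes byx, bxy and r from raw paired observations and then checks the sign rule, the product rule and the rule that both coefficients cannot exceed one together. The six (X, Y) pairs are an illustrative data set and the function name is an assumption made for this example.

```python
import math


def regression_summary(xs, ys):
    """Return (byx, bxy, r) computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    byx = sxy / sxx                  # regression coefficient of Y on X
    bxy = sxy / syy                  # regression coefficient of X on Y
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return byx, bxy, r


# Illustrative data (an assumption for this sketch)
X = [2, 4, 5, 6, 8, 11]
Y = [18, 12, 10, 8, 7, 5]
byx, bxy, r = regression_summary(X, Y)

same_sign = (byx < 0) == (bxy < 0) == (r < 0)         # properties 1 and 3
product_is_r_squared = math.isclose(byx * bxy, r * r)  # property 2
not_both_above_one = not (abs(byx) > 1 and abs(bxy) > 1)  # property 5

print(round(byx, 2), round(bxy, 2), round(r, 2))
print(same_sign, product_is_r_squared, not_both_above_one)
```

Here byx is larger than one in absolute value, so bxy has to be smaller than one for the product to stay within r² ≤ 1.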
Distinguish Between Correlation and Regression
Correlation | Regression
Measures the degree of relationship | Studies the nature of relationship
No dependent or independent variables | One variable is dependent, the other independent
Does not enable prediction | Enables prediction
Means relation between two variables | Means returning to the average value
Need not imply cause and effect relationship between variables | Indicates cause and effect relationship
Relative measure | Absolute measure
Multiple Regressions
Regression can be simple or multiple according to the number of variables involved. In
simple regression, there are only two variables, and we are concerned with establishing linear
relationship between them. It ignores the possibility of variations in the dependent variable
being explained in terms of 2 or more independent variables.
In multiple regression, 3 or more variables are involved and we study the average relation
between one dependent variable and two or more independent variables.
For example, the price of a product depends on both the demand for it and its supply. If
the average relation between price on the one hand, and supply and demand on the other,
is calculated, it can be used to estimate the price as a combined effect of supply and
demand.
Multiple regression equation of X1 on X2 and X3: X1 = b12.3 X2 + b13.2 X3
Multiple regression equation of X2 on X1 and X3: X2 = b21.3 X1 + b23.1 X3
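Here the variables are measured from their means, σ1, σ2, σ3 are the standard deviations and r12, r13, r23 the simple correlation coefficients:
b21.3 = (σ2 / σ1) × (r12 - r13 r23) / (1 - r13²)
b23.1 = (σ2 / σ3) × (r23 - r12 r13) / (1 - r13²)
b31.2 = (σ3 / σ1) × (r13 - r12 r23) / (1 - r12²)
b32.1 = (σ3 / σ2) × (r23 - r12 r13) / (1 - r12²)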
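A minimal Python sketch of the regression of X1 on X2 and X3 follows. It assumes the standard expressions b12.3 = (σ1/σ2)(r12 - r13 r23)/(1 - r23²) and b13.2 = (σ1/σ3)(r13 - r12 r23)/(1 - r23²), where σi is the standard deviation of Xi and rij the simple correlation between Xi and Xj. The data are made up so that X1 = 1 + 2 X2 + 3 X3 holds exactly, which makes the result easy to check; the function names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def corr(u, v):
    """Simple (Pearson) correlation coefficient."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (sd(u) * sd(v))


def regress_x1_on_x2_x3(x1, x2, x3):
    """Return (a, b12_3, b13_2) for X1 = a + b12.3 X2 + b13.2 X3."""
    s1, s2, s3 = sd(x1), sd(x2), sd(x3)
    r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
    if math.isclose(abs(r23), 1.0):
        raise ValueError("X2 and X3 are perfectly correlated")
    b12_3 = (s1 / s2) * (r12 - r13 * r23) / (1 - r23 ** 2)
    b13_2 = (s1 / s3) * (r13 - r12 * r23) / (1 - r23 ** 2)
    # The intercept restores the original units after working with deviations
    a = mean(x1) - b12_3 * mean(x2) - b13_2 * mean(x3)
    return a, b12_3, b13_2


# Made-up data: x1 is built as 1 + 2*x2 + 3*x3
x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 6]
x1 = [1 + 2 * p + 3 * q for p, q in zip(x2, x3)]

a, b12_3, b13_2 = regress_x1_on_x2_x3(x1, x2, x3)
print(round(a, 6), round(b12_3, 6), round(b13_2, 6))
```

With real data the fit is not exact, and the coefficients then describe the average relation between X1 and the two independent variables.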
SPSS for Windows has the same general look and feel as most other programmes for
Windows. Virtually any statistical procedure that you wish to perform can be accomplished by
pointing and clicking on the menus and the various interactive dialog boxes. You
may have noted that the examples in the Howell textbook are performed/analyzed via code. That
is, SPSS, like many other packages, can also be driven by short scripts instead of
pointing and clicking. We will not cover any programming in this tutorial.
SPSS can save data in a large number of formats, including its own. A list of the available
formats can be viewed and selected by clicking on Save as type: in the Save As dialog box. If
your intention is to work only in SPSS, then there may be some benefit to saving in the
SPSS (*.sav) format. This format presumably allows faster reading and writing of the data file.
However, if your data will be analyzed or looked at in other packages (e.g., a spreadsheet), it
would be advisable to save in a more universal format (e.g., Excel (*.xls), 1-2-3 Rel 3.0 (*.wk3)).
Once the type of file has been selected, enter a filename, minus the extension (e.g., sav,
xls). You should also save the file in a meaningful directory, on your hard drive or floppy. That is,
for any given project a separate directory should be created. You don't want your data to get
mixed up.
Once data has been entered or modified, it is advisable to save. In fact, save as often as
possible [File => Save As].
Data View
The Data View window is simply a grid with rows and columns. The rows represent subjects
(cases or observations) and columns represent variables whose names should appear at the top
of the columns. In the grid, the intersection between a row and a column is known as a cell. A cell
will therefore contain the score of a particular subject (or case) on one particular variable. This
window displays the contents of the data file. You create new data files or modify existing ones in this
window. This window opens automatically when you start an SPSS session. See Figure 1 for a
brief annotation of this window.
Variable View
The Variable View window is also a simple grid with rows and columns. This window contains
descriptions of the attributes of each variable that makes up your data set. In this window, rows are
variables and columns are variable attributes. You can add, delete and modify the attributes of
variables in this window. There are eleven columns
altogether, namely: Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align, Measure
and Role. See Fig. 2 for more information. As you define variables in this window, they are
displayed in the Data View window. The number of rows in the Variable View window
corresponds to the number of columns in the Data View window.
Fig. 2: Variable View window (columns represent attributes of variables; rows represent variables).
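The relation between the two windows can be pictured with a small Python sketch. This is only an analogy and not how SPSS stores data: the variable names, labels and values are hypothetical, and only four of the eleven attributes are shown.

```python
# Data View analogue: one row per case, one column per variable
data_view = [
    {"age": 29, "income": 410},
    {"age": 35, "income": 520},
    {"age": 41, "income": 480},
]

# Variable View analogue: one row per variable, one column per attribute
variable_view = [
    {"Name": "age", "Type": "Numeric", "Label": "Age in years", "Measure": "Scale"},
    {"Name": "income", "Type": "Numeric", "Label": "Monthly income", "Measure": "Scale"},
]

data_columns = list(data_view[0].keys())
variable_rows = [row["Name"] for row in variable_view]

# Each row of the variable grid describes exactly one column of the data grid
print(len(variable_rows) == len(data_columns), variable_rows == data_columns)
```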
Dialogue boxes
You use dialogue boxes to select variables and options for statistics and charts. You select
variables for analysis from the source list and use the arrow button to move them
into the target list. Dialogue box buttons with an ellipsis (...) open sub dialogue boxes for optional
selections. There are five standard buttons on most dialogue boxes (OK, PASTE, RESET, CANCEL,
and HELP). You will see diagrams of some dialogue boxes as you progress through this
document. The Frequency dialogue box is shown in Fig. 11.
Fig. 11: Frequency dialogue box. Annotations: source variable list; target variable list; arrow button to transfer variable(s); buttons that open sub dialogue boxes with a single click; the OK button is not available until a variable has been transferred to the target list.
Variable names
Always give meaningful names to all your variables. If you do not, SPSS will name the
variables for you, calling the first variable var00001, the second var00002 and so on. There are six
specific rules that you should follow when selecting variable names. A variable name:
Descriptive statistics
A primary objective of a statistical study is to describe properties such as central
tendency, dispersion, normality, spread of the distribution, percentages, proportions, etc.
SPSS provides descriptive statistics that give the necessary information about
the variables. To compute descriptive statistics:
X 2 4 5 6 8 11
Y 18 12 10 8 7 5