Module-V CSE
MODULE - V
DESIGN OF EXPERIMENTS & ANOVA
Introduction
Design of Experiments (DOE) is a systematic approach for planning and conducting experiments
to understand relationships between factors and responses. Analysis of Variance (ANOVA) is a
statistical technique for assessing differences in means between multiple groups. DOE helps
optimize processes, while ANOVA quantifies the significance of these optimizations. It's a
powerful combination in scientific research, manufacturing, and quality control to make informed
decisions. DOE designs experiments, and ANOVA analyzes results, enhancing decision-making
across diverse fields.
Factor Analysis and Data Interpretation: Develop the skills to identify significant factors,
understand their interactions, and optimize processes or systems. Learn how to interpret and derive
meaningful insights from experimental data.
Practical Application: Apply DOE and ANOVA techniques in real-world scenarios, such as
quality improvement, process optimization, and scientific research, to make data-driven decisions
and enhance outcomes.
Effective Experimental Planning: Develop the ability to plan and execute experiments
systematically, considering factors, levels, and experimental designs for efficient data collection.
Hypothesis Testing Proficiency: Gain expertise in formulating hypotheses and applying ANOVA
to determine statistically significant differences among groups or treatments and understand the
impact of factors.
Data Analysis and Interpretation: Learn to interpret and draw meaningful insights from
experimental data, making informed decisions and improvements in quality, processes, and
research based on empirical evidence.
Planning an experiment to obtain appropriate data and drawing inference out of the data with
respect to any problem under investigation is known as design and analysis of experiments. This
might range anywhere from the formulations of the objectives of the experiment in clear terms to
the final stage of the drafting reports incorporating the important findings of the enquiry. The
structuring of the dependent and independent variables, the choice of their levels in the experiment,
the type of experimental material to be used, the method of the manipulation of the variables on
the experimental material, the method of recording and tabulation of data, the mode of analysis of
the material, the method of drawing sound and valid inference etc. are all intermediary details that
go with the design and analysis of an experiment.
Principles of Experimentation
Almost all experiments involve the three basic principles, viz., randomization, replication and
local control. These three principles are, in a way, complementary to each other in trying to
increase the accuracy of the experiment and to provide a valid test of significance, retaining at the
same time the distinctive features of their roles in any experiment.
Randomization: Assigning the treatments to the experimental units at random provides a basis for
making a valid estimate of random fluctuations, which is essential in testing the significance of
genuine differences.
Replication: Replication is the repetition of experiment under identical conditions but in the
context of experimental designs, it refers to the number of distinct experimental units under the
same treatment. Replication, with randomization, will provide a basis for estimating the error
variance. In the absence of randomization, any amount of replication may not lead to a true
estimate of error. The greater the number of replications, the greater the precision of the experiment.
Local control: Local control means the control of all factors except the ones about which we are
investigating. Local control, like replication is yet another device to reduce or control the variation
due to extraneous factors and increase the precision of the experiment.
A completely randomized design (CRD) is one where the treatments are assigned completely at
random so that each experimental unit has the same chance of receiving any one treatment. For
the CRD, any difference among experimental units receiving the same treatment is considered as
experimental error. Hence, CRD is appropriate only for experiments with homogeneous
experimental units, such as laboratory experiments, where environmental effects are relatively
easy to control. For field experiments, where there is generally large variation among experimental
plots in such environmental factors as soil, the CRD is rarely used.
The randomized complete block design (RCBD) is one of the most widely used experimental
designs in forestry research. The design is especially suited for field experiments where the number
of treatments is not large and there exists a conspicuous factor based on which homogeneous sets
of experimental units can be identified. The primary distinguishing feature of the RCBD is the
presence of blocks of equal size, each of which contains all the treatments.
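The difference between the two layouts can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the treatment labels, the number of replications and blocks, and the function names `crd_layout` and `rcbd_layout` are hypothetical and do not come from this module. In the CRD the treatment labels are shuffled over all units at once; in the RCBD the shuffle is done separately inside each block, so every block contains every treatment exactly once.

```python
import random


def crd_layout(treatments, replications, seed=None):
    """Completely randomized design: shuffle all treatment labels over all units."""
    rng = random.Random(seed)
    plan = [t for t in treatments for _ in range(replications)]
    rng.shuffle(plan)
    return plan  # plan[u] is the treatment given to experimental unit u


def rcbd_layout(treatments, blocks, seed=None):
    """Randomized complete block design: each block gets every treatment once, in random order."""
    rng = random.Random(seed)
    layout = []
    for _ in range(blocks):
        order = list(treatments)
        rng.shuffle(order)  # randomization is restricted to the block
        layout.append(order)
    return layout


# Hypothetical example: 4 treatments, 3 replications (CRD) or 3 blocks (RCBD)
treatments = ["A", "B", "C", "D"]
crd = crd_layout(treatments, replications=3, seed=1)
rcbd = rcbd_layout(treatments, blocks=3, seed=1)

print("CRD plan:", crd)
for number, order in enumerate(rcbd, start=1):
    print(f"RCBD block {number}:", order)
```

In both layouts each treatment is replicated three times; only the RCBD guarantees that the replications are spread evenly over the blocks.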
The ANOVA technique is important in the context of all those situations where we want
to compare more than two populations such as in comparing the yield of crop from several varieties
of seeds, the gasoline mileage of four automobiles, the smoking habits of five groups of university
students and so on. In such circumstances one generally does not want to consider all possible
combinations of two populations at a time for that would require a great number of tests before we
would be able to arrive at a decision. This would also consume lot of time and money, and even
then, certain relationships may be left unidentified (particularly the interaction effects). Therefore,
one quite often utilizes the ANOVA technique and through it investigates the differences among
the means of all the populations simultaneously.
ANOVA TECHNIQUE
Under one-way (or single factor) ANOVA, we consider only one factor and then observe that the
reason for said factor to be important is that several possible types of samples can occur within
that factor. We then determine if there are differences within that factor. The technique involves
the following steps:
(i) Obtain the mean of each sample, i.e., obtain $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_k$ when there are k samples.
(ii) Work out the mean of the sample means as follows:
$$\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3 + \cdots + \bar{X}_k}{\text{number of samples } (k)}$$
(iii) Take the deviations of the sample means from the mean of the sample means and calculate
the square of such deviations, which may be multiplied by the number of items in the
corresponding sample, and then obtain their total. This is known as the sum of squares for
variance between the samples (or SS between). Symbolically, this can be written:
$$SS_{\text{between}} = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + n_3(\bar{X}_3 - \bar{\bar{X}})^2 + \cdots + n_k(\bar{X}_k - \bar{\bar{X}})^2$$
(iv) Divide the result of step (iii) by the degrees of freedom between the samples to obtain the
variance or mean square (MS) between samples.
$$MS_{\text{between}} = \frac{SS_{\text{between}}}{(k - 1)}$$
Where 𝑘 − 1 represents degrees of freedom (d.f) between samples.
(v) Obtain the deviations of the values of the sample items for all the samples from the corresponding
means of the samples, calculate the squares of such deviations, and then obtain their
total. This total is known as the sum of squares for variance within samples (or SS within).
$$SS_{\text{within}} = \sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \cdots + \sum (X_{ki} - \bar{X}_k)^2$$
(vi) Divide the result of step (v) by the degrees of freedom within samples to obtain the variance or
mean square (MS) within samples.
$$MS_{\text{within}} = \frac{SS_{\text{within}}}{(n - k)}$$
Where (𝑛 − 𝑘) represents degrees of freedom within samples,
n=total number of items in all the samples i.e., 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘
k= number of samples.
(vii) As a check, work out the sum of squares of deviations for total variance, taking the deviations
of the individual items in all the samples from the mean of the sample means.
$$SS_{\text{total}} = \sum (X_{ij} - \bar{\bar{X}})^2, \quad i = 1, 2, 3, \ldots \text{ and } j = 1, 2, 3, \ldots$$
This should be equal to total of the result of the (iii) and (v) steps explained above i.e.,
𝑆𝑆 𝑓𝑜𝑟 𝑡𝑜𝑡𝑎𝑙 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = 𝑆𝑆 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 + 𝑆𝑆 𝑤𝑖𝑡ℎ𝑖𝑛
The degrees of freedom for total variance will be equal to the number of items in all samples minus
one i.e., (n – 1). The degrees of freedom for between and within must add up to the degrees of
freedom for total variance i.e., (n – 1) = (k – 1) + (n – k)
This fact explains the additive property of the ANOVA technique.
(viii) Finally, the F-ratio may be worked out as under:
$$F\text{-ratio} = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$
This ratio is used to judge whether the difference among several sample means is
significant or is just a matter of sampling fluctuations. For this purpose, we look into the F-table,
which gives the values of F for given degrees of freedom at different levels of significance. If the
worked-out value of F, as stated above, is less than the table value of F, the difference is taken as
insignificant, i.e., due to chance, and the null hypothesis of no difference between sample means
stands. If the calculated value of F is equal to or greater than its table value, the
difference is considered significant (which means the samples could not have come from the
same universe) and the conclusion may be drawn accordingly. The further the calculated value of
F lies above the table value, the more definite and sure one can be about one's conclusions.
Note:
(i) It should be remembered that the ANOVA test is always a one-tailed test, since a low calculated
value of F from the sample data would mean that the fit of the sample means to the null hypothesis
(viz., $\bar{X}_1 = \bar{X}_2 = \cdots = \bar{X}_k$) is a very good fit.
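Steps (i) to (viii) above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `one_way_anova` and the three small samples are hypothetical. One assumption is worth stating: the code takes the grand mean over all items, which coincides with the mean of the sample means in step (ii) whenever the samples are of equal size.

```python
def one_way_anova(samples):
    """Follow steps (i)-(viii) of the one-way ANOVA technique for a list of samples."""
    k = len(samples)                                   # number of samples
    n = sum(len(s) for s in samples)                   # total number of items
    means = [sum(s) / len(s) for s in samples]         # step (i)
    # Step (ii): grand mean over all items (equals the mean of the sample
    # means when every sample has the same number of items).
    grand_mean = sum(sum(s) for s in samples) / n
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))  # step (iii)
    ms_between = ss_between / (k - 1)                  # step (iv)
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)  # step (v)
    ms_within = ss_within / (n - k)                    # step (vi)
    ss_total = sum((x - grand_mean) ** 2
                   for s in samples for x in s)        # step (vii), used as a check
    f_ratio = ms_between / ms_within                   # step (viii)
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": k - 1, "df_within": n - k, "df_total": n - 1,
        "ms_between": ms_between, "ms_within": ms_within, "f_ratio": f_ratio,
    }


# Hypothetical data: three samples of three items each
result = one_way_anova([[2, 4, 6], [6, 8, 10], [10, 12, 14]])
for name, value in result.items():
    print(f"{name:>11}: {value}")
```

For these samples SS between is 96, SS within is 24 and SS total is 120, so the additive property holds, and the F-ratio is 48/4 = 12 on (2, 6) degrees of freedom. The calculated F would then be compared with the table value as described above.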
Table 1: Analysis of variance table for the one-way ANOVA technique (k samples having n items in all)
Problems
1. Set up an analysis of variance table for the following per acre production data for three varieties
of wheat, each grown on 4 plots and state if the variety differences are significant.
The above table shows that the calculated value of F is 1.5 which is less than the table value of
4.26 at 5% level with d.f. being v1 = 2 and v2 = 9 and hence could have arisen due to chance. This
analysis supports the null-hypothesis of no difference in sample means. We may, therefore,
conclude that the difference in wheat output due to varieties is insignificant and is just a matter of
chance.
2. Three different kinds of food are tested on three groups of rats for 5 weeks. The objective is to
check the difference in mean weight (in grams) of the rats per week. Apply one-way ANOVA
using a 0.05 significance level to the following data:
Food I Food II Food III
8 4 11
12 5 8
19 4 7
8 6 13
6 9 7
11 7 9
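Solution: Using the same procedure as explained in Problem 1, we set the hypotheses as
H0: μ1 = μ2 = μ3
H1: The means are not all equal
The values computed from the data above are as follows:
$\bar{X}_1 = 10.67$, $\bar{X}_2 = 5.83$, $\bar{X}_3 = 9.17$
Total mean = $\bar{\bar{X}}$ = 8.56
SSB = 73.44
SSE = 155.00
MSB = SSB/df1 = 73.44/2 = 36.72
MSE = SSE/df2 = 155.00/15 = 10.33
f = MSB/MSE = 36.72/10.33 = 3.55
Since the calculated value 3.55 is less than the 5% table value F(2, 15) = 3.68, we fail to reject H0.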
ANOVA table for Problem 1 (wheat varieties):
Source of variation   SS    d.f.            MS             F-ratio            5% F-limit (from the F-table)
Between samples       8     (3 - 1) = 2     8/2 = 4.00     4.00/2.67 = 1.5    F(2, 9) = 4.26
Within samples        24    (12 - 3) = 9    24/9 = 2.67
Total                 32    (12 - 1) = 11
Preliminary data analyses indicate that the independent samples come from normal populations
with equal standard deviations. At the 5% significance level, does there appear to be a difference
in mean lifetime among the four brands of batteries?
At the α = 0.05 level of significance, there is not enough evidence to conclude that the mean
lifetimes of the brands of batteries differ; we therefore fail to reject the null hypothesis.
4. Data on Scholastic Aptitude Test (SAT) scores are published by the College Entrance
Examination Board in National College-Bound Seniors. SAT scores for randomly selected
students from each of four high-school rank categories are displayed in the following table.
Top Tenth    Second Tenth    Second Fifth    Third Fifth
528          514             649             372
586          457             506             440
680          521             556             495
718          370             413             321
             532             470             424
                                             330
Construct the one-way ANOVA table for the data. Compute SSC and SSE using the defining
formulas.
Solution: $\bar{X}_1 = 628.0$, $\bar{X}_2 = 478.8$, $\bar{X}_3 = 518.8$, $\bar{X}_4 = 397.0$, $\bar{\bar{X}} = 494.1$
SS between samples (SSC) = 132508.2
SS Within samples (SSE) = 95877.6 and SS total = SSC+SSE=228385.8
Source of variation   SS          d.f.   MS          F-ratio
Between samples       132508.2    3      44169.40    7.37
Within samples        95877.6     16     5992.35
Total                 228385.8    19
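The figures in this solution can be cross-checked with a few lines of Python. The sketch below uses the correction-factor (short-cut) form, SS between = Σ T_j²/n_j − T²/n, instead of the defining formulas, so it serves as an independent check. One assumption is made loudly here: the short rows of the data table are assigned to the second, third and fourth columns (sample sizes 4, 5, 5 and 6), which is the only assignment that reproduces the sample means quoted in the solution. The variable names are illustrative.

```python
# SAT scores by high-school rank category.
# Assumption: the ragged rows of the table fill the right-hand columns,
# which reproduces the quoted means 628.0, 478.8, 518.8 and 397.0.
sat = {
    "Top tenth": [528, 586, 680, 718],
    "Second tenth": [514, 457, 521, 370, 532],
    "Second fifth": [649, 506, 556, 413, 470],
    "Third fifth": [372, 440, 495, 321, 424, 330],
}

k = len(sat)                                          # number of samples
n = sum(len(scores) for scores in sat.values())       # total number of items
grand_total = sum(sum(scores) for scores in sat.values())
correction_factor = grand_total ** 2 / n              # T^2 / n

ss_total = sum(x * x for scores in sat.values() for x in scores) - correction_factor
ssc = sum(sum(scores) ** 2 / len(scores) for scores in sat.values()) - correction_factor
sse = ss_total - ssc                                  # within samples, by subtraction

ms_between = ssc / (k - 1)
ms_within = sse / (n - k)
f_ratio = ms_between / ms_within

for name, scores in sat.items():
    print(f"{name:>12}: n = {len(scores)}, mean = {sum(scores) / len(scores):.1f}")
print(f"SSC = {ssc:.1f}, SSE = {sse:.1f}, SS total = {ss_total:.1f}")
print(f"MS between = {ms_between:.2f}, MS within = {ms_within:.2f}, F = {f_ratio:.2f}")
```

The sums of squares agree with the values 132508.2, 95877.6 and 228385.8 stated in the solution, and the F-ratio comes out as 7.37 on (3, 16) degrees of freedom.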
Exercise
1. Manufacturers of golf balls always seem to be claiming that their ball goes the farthest. A
writer for a sports magazine decided to conduct an impartial test. She randomly selected 20
golf professionals and then randomly assigned four golfers to each of five brands. Each golfer
drove the assigned brand of ball. The driving distances, in yards, are displayed in the following
table.
Brand 1 Brand 2 Brand 3 Brand 4 Brand 5
286 279 270 284 281
276 277 262 271 293
281 284 277 269 276
274 288 280 275 292
Preliminary data analyses indicate that the independent samples come from normal populations
with equal standard deviations. Do the data provide sufficient evidence to conclude that a
difference exists in mean driving distance among the five brands of golf ball?
Perform the required hypothesis test using α = 0.05.
2. The U.S. Bureau of Prisons publishes data in Statistical Report on the times served by prisoners
released from federal institutions for the first time. Independent random samples of released
prisoners for five different offense categories yielded the following information on time served,
in months. At the 1% significance level, do the data provide sufficient evidence to conclude
that a difference exists in mean time served by prisoners among the five offense groups?
Offense category    $n_i$    $\bar{X}_i$    $s_i$
Counterfeiting      15       14.5           4.5
Drug Laws           17       18.4           3.8
Firearms            12       18.2           4.5
Forgery             10       15.6           3.6
Fraud               11       11.5           4.7
TWO-WAY ANOVA
Two-way ANOVA technique is used when the data are classified on the basis of two factors. For
example, the agricultural output may be classified on the basis of different varieties of seeds and
also on the basis of different varieties of fertilizers used. A business firm may have its sales data
classified on the basis of different salesmen and also on the basis of sales in different regions. In a
factory, the various units of a product produced during a certain period may be classified on the
basis of different varieties of machines used and also on the basis of different grades of labour.
Such a two-way design may have repeated measurements of each factor or may not have repeated
values. The ANOVA technique is a little different in case of repeated measurements where we also
compute the interaction variation. We shall now discuss the two-way ANOVA technique in the
context of both the said designs with the help of examples.
(a) ANOVA technique in context of two-way design when repeated values are not there: As
we do not have repeated values, we cannot directly compute the sum of squares within samples as
we had done in the case of one-way ANOVA. Therefore, we have to calculate this residual or error
variation by subtraction, once we have calculated (just on the same lines as we did in the case of
one-way ANOVA) the sum of squares for total variance and for variance between varieties of one
treatment as also for variance between varieties of the other treatment.
The various steps involved are as follows:
(i) Take the total of the values of individual items in all the samples and call it T.
(ii) Work out the correction factor as under:
$$\text{Correction factor} = \frac{T^2}{n}$$
(iii) Find out the square of all the item values (or their coded values as the case may be) one by
one and then take its total. Subtract the correction factor from this total to obtain the sum of squares
of deviations for total variance:
$$\text{Total SS} = \sum X_{ij}^2 - \frac{T^2}{n}$$
(iv) Take the total of different columns and then obtain the square of each column total and divide
such squared values of each column by the number of items in the concerning column and take the
total of the result thus obtained. Finally, subtract the correction factor from this total to obtain the
sum of squares of deviations for variance between columns (or SS between columns).
(v) Take the total of different rows and then obtain the square of each row total and divide such
squared values of each row by the number of items in the corresponding row and take the total of
the result thus obtained. Finally, subtract the correction factor from this total to obtain the sum of
squares of deviations for variance between rows (or SS between rows).
(vi) Sum of squares of deviations for residual or error variance can be worked out by subtracting
the sum of the results of steps (iv) and (v) from the result of step (iii). In other words,
Total SS – (SS between columns + SS between rows) = SS for residual or error variance.
(vii) Degrees of freedom (d.f.) can be worked out as under:
d.f. for total variance = (c . r – 1)
d.f. for variance between columns = (c – 1)
d.f. for variance between rows = (r – 1)
d.f. for residual variance = (c – 1) (r – 1), where c = number of columns, r = number of rows
(viii) The ANOVA table can be set up in the usual fashion as shown below.
Solution: As the given problem is a two-way design of experiment without repeated values, we
shall adopt all the above stated steps.
Step (i) T = 60, n = 12, ∴ correction factor = $\frac{T^2}{n} = \frac{60 \times 60}{12} = 300$
Step (ii) Total SS = $(36 + 25 + 25 + 49 + 25 + 16 + 9 + 9 + 9 + 64 + 49 + 16) - \frac{60 \times 60}{12} = 332 - 300 = 32$
Step (iii) SS between columns treatment = $\left[\frac{24 \times 24}{4} + \frac{20 \times 20}{4} + \frac{16 \times 16}{4}\right] - \left[\frac{60 \times 60}{12}\right] = 144 + 100 + 64 - 300 = 8$
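Step (iv) SS between rows treatment = $\left[\frac{16 \times 16}{3} + \frac{16 \times 16}{3} + \frac{9 \times 9}{3} + \frac{19 \times 19}{3}\right] - \left[\frac{60 \times 60}{12}\right] = 85.33 + 85.33 + 27.00 + 120.33 - 300 = 18$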
From the said ANOVA table, we find that differences concerning varieties of seeds are
insignificant at 5% level as the calculated F-ratio of 4 is less than the table value of 5.14, but the
variety differences concerning fertilizers are significant as the calculated F-ratio of 6 is more than
its table value of 4.76.
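The whole two-way computation without repeated values can be sketched in Python. The data table for this problem is not reproduced in the text, so the 4 × 3 grid below is an assumption: it is recovered from the squared values listed in the Total SS step, read three per row, and it is consistent with the column totals 24, 20, 16 and the row totals 16, 16, 9, 19 used in the solution. The variable names are illustrative.

```python
# Assumed data grid (4 rows x 3 columns), recovered from the squared values
# and the row/column totals quoted in the worked solution.
data = [
    [6, 5, 5],
    [7, 5, 4],
    [3, 3, 3],
    [8, 7, 4],
]

r, c = len(data), len(data[0])
n = r * c
total = sum(sum(row) for row in data)                 # T
correction_factor = total ** 2 / n                    # T^2 / n

ss_total = sum(x * x for row in data for x in row) - correction_factor
column_totals = [sum(row[j] for row in data) for j in range(c)]
row_totals = [sum(row) for row in data]
ss_columns = sum(t * t for t in column_totals) / r - correction_factor
ss_rows = sum(t * t for t in row_totals) / c - correction_factor
ss_residual = ss_total - (ss_columns + ss_rows)       # error term, by subtraction

df_columns, df_rows = c - 1, r - 1
df_residual = (c - 1) * (r - 1)
ms_residual = ss_residual / df_residual
f_columns = (ss_columns / df_columns) / ms_residual
f_rows = (ss_rows / df_rows) / ms_residual

print(f"Total SS = {ss_total}, SS columns = {ss_columns}, SS rows = {ss_rows}")
print(f"Residual SS = {ss_residual} on {df_residual} d.f.")
print(f"F (columns) = {f_columns}, F (rows) = {f_rows}")
```

With this grid the sketch gives Total SS = 32, SS between columns = 8, SS between rows = 18 and a residual SS of 6 on 6 degrees of freedom, so the F-ratios are 4 for columns and 6 for rows, the two values compared with the table values 5.14 and 4.76 above.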
2. Set up an ANOVA table for the following information relating to three drugs tested to judge their
effectiveness in reducing blood pressure for three different groups of people:
Amount of Blood Pressure Reduction in Millimeters of Mercury
Do the drugs act differently? Are the different groups of people affected differently? Is the
interaction term significant? Justify your answer at 5% level of significance.
Solution: As the given problem is a two-way design of experiment with repeated values, we shall
adopt all the above stated steps.
Step (i) T = 187, n = 18, correction factor = $\frac{187 \times 187}{18} = 1942.72$
Step (ii) Total SS = $(14^2 + 15^2 + 12^2 + 11^2 + 10^2 + 11^2 + 10^2 + 9^2 + 7^2 + 8^2 + 11^2 + 11^2 + 11^2 + 11^2 + 10^2 + 11^2 + 8^2 + 7^2) - \frac{(187)^2}{18} = 2019 - 1942.72 = 76.28$
Step (iii) SS between columns (i.e., between drugs) = $\left[\frac{73 \times 73}{6} + \frac{56 \times 56}{6} + \frac{58 \times 58}{6}\right] - \left[\frac{(187)^2}{18}\right] \approx 28.77$
(NOTE: These figures are left-over figures and have been obtained by subtracting from the
column total the total of all other values in the said column. Thus, interaction SS = (76.28) – (28.77
+ 14.78 + 3.50) = 29.23 and interaction degrees of freedom = (17) – (2 + 2 + 9) = 4).
The above table shows that all the three F-ratios are significant at the 5% level, which means that the
drugs act differently, different groups of people are affected differently and the interaction term is
significant. In fact, if the interaction term happens to be significant, it is pointless to talk about the
differences between various treatments i.e., differences between drugs or differences between
groups of people in the given case.
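The way the interaction term is obtained by subtraction, and how the F-ratios follow from it, can be sketched in Python using only the figures quoted in the note above. The mapping of those figures to sources is an assumption inferred from the order in which they are listed: 28.77 (2 d.f.) for drugs, 14.78 (2 d.f.) for groups of people, and 3.50 (9 d.f.) for the within-samples error.

```python
# Sums of squares and degrees of freedom quoted in the worked solution.
# Assumed mapping: drugs (columns), groups of people (rows), within samples (error).
ss_total, df_total = 76.28, 17
ss_drugs, df_drugs = 28.77, 2
ss_groups, df_groups = 14.78, 2
ss_within, df_within = 3.50, 9

# The interaction term is the left-over part of the total.
ss_interaction = ss_total - (ss_drugs + ss_groups + ss_within)
df_interaction = df_total - (df_drugs + df_groups + df_within)

ms_within = ss_within / df_within          # error mean square (denominator of every F)
f_drugs = (ss_drugs / df_drugs) / ms_within
f_groups = (ss_groups / df_groups) / ms_within
f_interaction = (ss_interaction / df_interaction) / ms_within

print(f"Interaction SS = {ss_interaction:.2f} on {df_interaction} d.f.")
print(f"F (drugs) = {f_drugs:.2f}")
print(f"F (groups) = {f_groups:.2f}")
print(f"F (interaction) = {f_interaction:.2f}")
```

Each F-ratio is then compared with the 5% table value for its own degrees of freedom, (2, 9) for the two main effects and (4, 9) for the interaction.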
3. The following data show the number of worms quarantined from the GI areas of four groups
of muskrats in a carbon tetrachloride anthelmintic study. Conduct a two-way ANOVA test.
I II III IV
338 412 124 389
324 387 353 432
268 400 469 255
147 233 222 133
309 212 111 265
Solution: Using the same procedure as explained in Problem 1
Source of variation   Sum of Squares   Degrees of freedom   Mean Square   Ratio F
Between the groups    62111.6          8                    9078.067      F = MST / MSE = 9.4062 / 3.66 = 2.57
Within the groups     98787.8          16                   4567.89
Total                 167771.4         24
1. The following data represents the number of units of tablet production (in thousands) per day
by five different technicians by using four different types of machines.
Technician   Machine A   Machine B   Machine C   Machine D
P            54          48          57          46
Q            56          50          62          53
R            44          46          54          42
S            53          48          56          44
T            48          52          59          48
a) Test whether the mean productivity of the different machines is the same.
b) Test whether the five technicians differ with respect to mean productivity.
Note:
1. CODING METHOD: The coding method is an extension of the short-cut method. It is based on
an important property of F-ratio that its value does not change if all the n item values are either
multiplied or divided by a common figure or if a common figure is either added or subtracted from
each of the given n item values. Through this method big figures are reduced in magnitude by
division or subtraction and computation work is simplified without any disturbance on the F-ratio.
This method should be used specially when given figures are big or otherwise inconvenient. Once
the given figures are converted with the help of some common figure, then all the steps of the
short-cut method stated above can be adopted for obtaining and interpreting F-ratio.
2. In place of c we can as well write r or v since in Latin-square design c = r = v
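The property behind the coding method can be checked numerically. The sketch below is a hedged illustration with hypothetical data and a hypothetical helper `f_ratio`; it computes the one-way F-ratio by the short-cut (correction factor) method for the original figures, for the figures after subtracting a common value, and for the figures after dividing by a common value.

```python
import math


def f_ratio(samples):
    """One-way F-ratio computed with the short-cut (correction factor) method."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    total = sum(sum(s) for s in samples)
    correction_factor = total ** 2 / n
    ss_total = sum(x * x for s in samples for x in s) - correction_factor
    ss_between = sum(sum(s) ** 2 / len(s) for s in samples) - correction_factor
    ss_within = ss_total - ss_between
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical item values (not taken from this module)
original = [[24, 27, 21], [30, 33, 36], [18, 21, 24]]
shifted = [[x - 20 for x in s] for s in original]   # common figure subtracted
scaled = [[x / 3 for x in s] for s in original]     # divided by a common figure

f_original = f_ratio(original)
f_shifted = f_ratio(shifted)
f_scaled = f_ratio(scaled)

print("F from original figures:", f_original)
print("Unchanged after subtracting 20:", math.isclose(f_original, f_shifted))
print("Unchanged after dividing by 3:", math.isclose(f_original, f_scaled))
```

The sums of squares themselves do change when the figures are divided by a common value, but numerator and denominator of the F-ratio change by the same factor, so the ratio is unaffected.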
Problems
1. Analyze and interpret the following statistics concerning output of wheat per field obtained as
a result of experiment conducted to test four varieties of wheat viz., A, B, C and D under a
Latin-square design
Solution: Using the coding method, we subtract 20 from the figures given in each of the small
squares and obtain the coded figures as under:
Step (i) T = −12, n = 16, ∴ correction factor = $\frac{T^2}{n} = \frac{(-12) \times (-12)}{16} = 9$
Step (ii) Total SS = $\sum X_{ij}^2 - \frac{T^2}{n} = 122 - 9 = 113$
Step (iii) SS between columns = $\sum \frac{T_j^2}{n_j} - \frac{T^2}{n} = \frac{66}{4} - 9 = 7.5$
Step (iv) SS between rows = $\sum \frac{T_i^2}{n_i} - \frac{T^2}{n} = \frac{222}{4} - 9 = 46.5$
Step (v) For finding SS for variance between varieties, we would first rearrange the coded data in
the following form:
The sum of squares for residual variance works out to 113 − (7.5 + 46.5 + 48.5) = 10.50
d.f. for variance between columns = (c – 1) = (4 – 1) = 3
d.f. for variance between rows = (r – 1) = (4 – 1) = 3
d.f. for variance between varieties = (v – 1) = (4 – 1) = 3
d.f. for total variance = (n – 1) = (16 – 1) = 15
d.f. for residual variance = (c – 1) (c – 2) = (4 – 1) (4 – 2) = 6
ANOVA table in Latin-Square design can now be set up as shown below
The above table shows that variance between rows and variance between varieties are significant
and not due to chance factor at 5% level of significance as the calculated values of the said two
variances are 8.85 and 9.24 respectively which are greater than the table value of 4.76. But variance
between columns is insignificant and is due to chance because the calculated value of 1.43 is less
than the table value of 4.76.
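The F-ratios quoted in this conclusion can be reproduced from the sums of squares worked out in the solution. The short Python sketch below only re-does that arithmetic; the dictionary keys and variable names are illustrative.

```python
# Sums of squares from the worked solution of the 4 x 4 Latin-square problem
c = 4                                   # c = r = v in a Latin-square design
ss_total = 113.0
ss_effects = {"columns": 7.5, "rows": 46.5, "varieties": 48.5}

ss_residual = ss_total - sum(ss_effects.values())
df_effect = c - 1                       # same d.f. for columns, rows and varieties
df_residual = (c - 1) * (c - 2)
ms_residual = ss_residual / df_residual

f_ratios = {name: (ss / df_effect) / ms_residual for name, ss in ss_effects.items()}

print(f"Residual SS = {ss_residual} on {df_residual} d.f., MS = {ms_residual}")
for name, f in f_ratios.items():
    print(f"F ({name}) = {f:.3f}")
```

The residual mean square is 1.75, and the ratios come out as about 1.43 for columns, 8.86 for rows (quoted as 8.85 in the text) and 9.24 for varieties, each on (3, 6) degrees of freedom.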
2. Below are given the plan and yield in kgs/plot of a 5x5 Latin square experiment on the wheat
crop carried out for testing the effects of five manurial treatments A, B, C, D, and E. ‘A’ denotes
control.
Step (iv) SS between rows (SSR) = $\sum \frac{T_i^2}{n_i} - \frac{T^2}{n} = 3.04$
Step (v) To get SS due to treatments, first find the totals for each treatment using the given data
as follows:
SS for variance between treatments = $\sum \frac{T_v^2}{n_v} - \frac{T^2}{n} = 454.64$
Summary of results
Treatment means will be calculated from the original table on treatment totals.
Treatments               A       B       C       D       E       CD 5%
Mean yield in kgs/plot   9.0     13.6    17.6    21.8    16.6    1.33
The treatments are compared by arranging them in descending order of their yields.
Treatments               D       C       E       B       A       CD 5%
Mean yield in kgs/plot   21.8    17.6    16.6    13.6    9.0     1.33
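As a consistency check, the treatment sum of squares of 454.64 can be recovered from the treatment means alone, since each treatment occupies 5 plots in a 5 × 5 Latin square and its total is therefore 5 times its mean. The Python sketch below is illustrative; the variable names are assumptions.

```python
# Treatment means in kgs/plot, as given in the summary of results
means = {"A": 9.0, "B": 13.6, "C": 17.6, "D": 21.8, "E": 16.6}
plots_per_treatment = 5                 # each treatment appears once per row
n = plots_per_treatment * len(means)    # 25 plots in all

totals = {t: m * plots_per_treatment for t, m in means.items()}
grand_total = sum(totals.values())
correction_factor = grand_total ** 2 / n

# SS between treatments = sum of T_v^2 / n_v minus T^2 / n
ss_treatments = (sum(t ** 2 for t in totals.values()) / plots_per_treatment
                 - correction_factor)

# Treatments arranged in descending order of mean yield
ranking = sorted(means, key=means.get, reverse=True)

print(f"SS between treatments = {ss_treatments:.2f}")
print("Descending order of yield:", " ".join(ranking))
```

The result agrees with the value obtained in step (v), and the ordering D, C, E, B, A matches the arrangement used for the comparison.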
While applying the ANOCOVA technique, the influence of uncontrolled variable is usually
removed by simple linear regression method and the residual sums of squares are used to provide
variance estimates which in turn are used to make tests of significance. In other words, covariance
analysis consists in subtracting from each individual score (Yi) that portion of it Yi´ that is
predictable from uncontrolled variable (Zi) and then computing the usual analysis of variance on
the resulting (Y – Y´)’s, of course making the due adjustment to the degrees of freedom because
of the fact that estimation using regression method required loss of degrees of freedom.
ASSUMPTIONS IN ANOCOVA
The ANOCOVA technique requires one to assume that there is some sort of relationship between
the dependent variable and the uncontrolled variable. We also assume that this form of relationship
is the same in the various treatment groups. Other assumptions are:
(i) Various treatment groups are selected at random from the population.
(ii) The groups are homogeneous in variability.
(iii) The regression is linear and is same from group to group.
Problems
1. The following are paired observations for three experimental groups:
Y is the covariate (or concomitant) variable. Calculate the adjusted total, within groups and
between groups, sums of squares on X and test the significance of differences between the adjusted
means on X by using the appropriate F-ratio. Also calculate the adjusted means on X.
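Solution: We apply the technique of analysis of covariance and work out the related measures as under:
$\sum X = 49 + 114 + 175 = 338$ and correction factor for X = $\frac{(\sum X)^2}{N} = 7616.27$
$\sum Y = 33 + 72 + 105 = 210$ and correction factor for Y = $\frac{(\sum Y)^2}{N} = 2940$
$\sum X^2 = 9476$, $\sum Y^2 = 3734$, $\sum XY = 5838$ and correction factor for XY = $\frac{\sum X \sum Y}{N} = 4732$
Hence, total SS for X = $\sum X^2$ − correction factor for X = 9476 − 7616.27 = 1859.73
SS between for X = $\left\{\frac{49^2}{5} + \frac{114^2}{5} + \frac{175^2}{5}\right\}$ − correction factor for X = 9204.40 − 7616.27 = 1588.13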
SS within X = (total SS for X) - (SS between for X) = 1859.73-1588.13 = 271.60
Similarly, we work out the following values in respect of Y
Total SS for Y = $\sum Y^2$ − correction factor for Y = 3734 − 2940 = 794
SS between for Y = $\left\{\frac{33^2}{5} + \frac{72^2}{5} + \frac{105^2}{5}\right\}$ − correction factor for Y = 519.6
SS within for Y = (total SS for Y) – (SS between for Y) = (794) – (519.6) = 274.4
Then, we work out the following values in respect of both X and Y
Total sum of product of XY = ∑ 𝑋𝑌 – 𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑖𝑜𝑛 𝑓𝑎𝑐𝑡𝑜𝑟 𝑓𝑜𝑟 𝑋𝑌 = 5838 – 4732 = 1106
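SS between for XY = $\left\{\frac{49 \times 33}{5} + \frac{114 \times 72}{5} + \frac{175 \times 105}{5}\right\}$ − correction factor for XY = 5640 − 4732 = 908
SS within for XY = (total sum of product of XY) − (SS between for XY) = 1106 − 908 = 198
Adjusted total SS = $T_{XX} - \frac{T_{XY}^2}{T_{YY}} = 1859.73 - \frac{(1106)^2}{794} = 1859.73 - 1540.60 = 319.13$
Adjusted SS within group = $E_{XX} - \frac{E_{XY}^2}{E_{YY}} = 271.60 - \frac{(198)^2}{274.40} = 271.60 - 142.87 = 128.73$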
Adjusted SS between groups= (adjusted total SS) – (Adjusted SS within group) = (319.13 –
128.73) = 190.40
ANOVA table for adjusted X
At 5% level, the table value of F for v1 = 2 and v2 = 11 is 3.98 and at 1% level the table value of
F is 7.21. Both these values are less than the calculated value (i.e., calculated value of 8.14 is
greater than table values) and accordingly we infer that F-ratio is significant at both levels which
means the difference in group means is significant.
Adjusted means on X will be worked out as follows:
Regression coefficient for X on Y, i.e., $b = \frac{\text{Sum of product within group}}{\text{Sum of squares within groups for Y}} = \frac{198}{274.40} = 0.7216$
Adjusted means of groups in X = (Final mean) – b (deviation of initial mean from general mean
in case of Y). Hence,
Adjusted mean for Group I = (9.80) – 0.7216 (–7.4) = 15.14
Adjusted mean for Group II = (22.80) – 0.7216 (0.40) = 22.51
Adjusted mean for Group III = (35.00) – 0.7216 (7.00) = 29.95
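The complete covariance computation can be retraced in Python from the summary figures quoted in the solution (group totals, ΣX², ΣY² and ΣXY). This is a sketch under stated assumptions: the table of paired observations is not reproduced here, so only those totals are used, with three groups of five pairs each, and the variable names are illustrative. The degrees of freedom for the adjusted within-groups term are taken as N − k − 1 = 11, one less than usual because the regression coefficient is estimated.

```python
# Summary figures quoted in the worked solution (3 groups of 5 pairs each)
group_size = 5
sum_x = [49, 114, 175]                  # group totals of X (dependent variable)
sum_y = [33, 72, 105]                   # group totals of Y (covariate)
sum_x2, sum_y2, sum_xy = 9476, 3734, 5838
k = len(sum_x)
N = k * group_size

def total_and_within(sum_sq, totals_a, totals_b):
    """Return (total, within) sum of squares or products via correction factors."""
    correction = sum(totals_a) * sum(totals_b) / N
    total = sum_sq - correction
    between = sum(a * b for a, b in zip(totals_a, totals_b)) / group_size - correction
    return total, total - between

t_xx, e_xx = total_and_within(sum_x2, sum_x, sum_x)
t_yy, e_yy = total_and_within(sum_y2, sum_y, sum_y)
t_xy, e_xy = total_and_within(sum_xy, sum_x, sum_y)

adjusted_total = t_xx - t_xy ** 2 / t_yy
adjusted_within = e_xx - e_xy ** 2 / e_yy
adjusted_between = adjusted_total - adjusted_within

df_between, df_within = k - 1, N - k - 1        # one d.f. lost to the regression
f_ratio = (adjusted_between / df_between) / (adjusted_within / df_within)

b = e_xy / e_yy                                  # within-group regression of X on Y
mean_y_overall = sum(sum_y) / N
adjusted_means = [sx / group_size - b * (sy / group_size - mean_y_overall)
                  for sx, sy in zip(sum_x, sum_y)]

print(f"Adjusted SS: total = {adjusted_total:.2f}, within = {adjusted_within:.2f}, "
      f"between = {adjusted_between:.2f}")
print(f"F = {f_ratio:.2f} on ({df_between}, {df_within}) d.f., b = {b:.4f}")
print("Adjusted means:", [round(m, 2) for m in adjusted_means])
```

The output agrees with the worked figures: adjusted sums of squares of 319.13, 128.73 and 190.40, an F-ratio of 8.14, a regression coefficient of 0.7216, and adjusted means of 15.14, 22.51 and 29.95.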
Video Links:
1. Design of experiments
2. Basic of ANOVA
3. Problems on one-way ANOVA
4. Problems on two-way ANOVA
5. Problems on two-way ANOVA
6. Problems on ANOVA for Latin-Square design
7. Problems on ANOCOVA
Disclaimer: The content provided is prepared by department of Mathematics for the specified
syllabus by using reference books mentioned in the syllabus. This material is specifically for the
use of RVITM students and for education purpose only.