Lectures Total
Definition:
Examples:
Marital status: single, married, divorced, widowed
Blood group: A, B, AB and O.
Types of variables
18 Total
• For discrete quantitative variables:
• The categories here are the different classes of data.
• Pulse in the previous table is a discrete quantitative variable.
Example for its table (manual table) is:
• In constructing this frequency distribution, the following rules
are followed:
• In general, about 4 – 12 groups is an appropriate number to adopt
(the number of classes depends on the number of observations
in the series).
• Keep the class (or group) intervals at an equal width.
• The first step is to find the upper and lower limits over which the
tabulation extends and to divide that range into classes or intervals.
For example, for the pulse in the previous table, the upper limit is 80
and the lower limit is 65, so the range is 15, and we distributed it
into 4 equal classes, each with a class interval of 4.
2. For discrete quantitative variables:
18 Total
3. For continuous quantitative variables:
Answer:
• Range = highest value – lowest value = 35 – 10 = 25 kg.
• We select a class interval of 5 kg; therefore the number of classes
= 25 ÷ 5 = 5.
Simple frequency distribution table:
Weight (kg) Frequency
10- 5
15- 15
20- 10
25- 20
30 – 35 10
Total 60
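The arithmetic behind this table, and the cumulative frequency column that the next heading refers to, can be sketched in a few lines of Python. The variable names are illustrative only; the frequencies are copied from the table above.

```python
from itertools import accumulate

# Class labels (kg) and frequencies copied from the weight table above.
classes = ["10-", "15-", "20-", "25-", "30-35"]
frequencies = [5, 15, 10, 20, 10]

# Number of classes implied by the range and the chosen class interval.
value_range = 35 - 10          # highest value minus lowest value
class_interval = 5
n_classes = value_range // class_interval

# A running total of the frequencies gives the cumulative frequency column.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:<6} {f:>3} {cf:>3}")
```

The last cumulative frequency should always equal the table total, which is a quick check on the tabulation.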
Cumulative frequency tables:
[Figure: bar chart of frequency (vertical axis) by type of parasitic infection: Ascaris, Bilharziasis, Ankylostoma, Oxyuris, E. histolytica]
[Figure: bar charts of the same infections shown separately for males and females]
2. Line graph:
• It is the most suitable type to represent data when we are
dealing with a time variable (hours, days, weeks, months,
years).
• The time variable is a special kind of quantitative continuous
variable. Usually we show the time variable on the
horizontal axis and the other variable on the vertical axis.
Examples are a temperature chart over hours, blood sugar over
months, and crude birth rate and infant mortality rate over
years.
• Example: the following table shows infant mortality rate in
Egypt over 10 years:
Year IMR per thousand
1971 103
1972 116
1973 98
1974 101
1975 89
1976 87
1977 85
1978 74
1979 76
1980 76
[Figure: line graph of IMR per thousand (vertical axis, 0 to 140) against year (horizontal axis, 1971 to 1980)]
3. Histogram:
• This type of graph is used to present data from a frequency
distribution table, and the variable must be quantitative
continuous. The table is of a simple type and closed ended. In
this graph each category in the table is represented by a bar
whose height corresponds to the frequency of that category on the
Y (vertical) axis. The width of each bar corresponds to the width
of the interval that it represents; the intervals are equal and
there is no space between the bars.
• Example: this table shows the distribution of 45 students by
their weights:
4. Frequency polygon:
[Figure: frequency polygon of frequency (vertical axis, 0 to 14) against weight class (50-, 55-, 60-, 65-, 70-, 75-80)]
5. Pie chart:
• This graph is mainly used to represent qualitative or discrete
quantitative variables, although it can be used for all 4 types of
variables.
• The idea is to draw a circle with a suitable radius and this circle
is divided into a number of sectors equal to the number of
categories in the frequency distribution table. Therefore each
sector will represent one of the categories in the table, and the
area of the sector will be proportional to its frequency.
• The angle of each sector (i.e. each class) is determined as
follows:
[Figure: pie chart of blood groups A, B, AB and O]
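Because the area of each sector is proportional to its frequency, one common way to obtain the sector angle is frequency ÷ total × 360 degrees. A minimal sketch follows; the blood-group counts are hypothetical, chosen only for illustration, and `sector_angles` is an illustrative helper name.

```python
def sector_angles(frequencies):
    """Return the pie-chart angle (degrees) for each category."""
    total = sum(frequencies.values())
    return {category: 360 * f / total for category, f in frequencies.items()}

# Hypothetical counts, not taken from the lecture.
blood_groups = {"A": 40, "B": 20, "AB": 10, "O": 30}
angles = sector_angles(blood_groups)

for group, angle in angles.items():
    print(f"{group:<3} {angle:6.1f} degrees")
```

Whatever the counts, the angles should add up to 360 degrees.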
POPULATION AND SAMPLES
This will be illustrated for the distribution shown in fig. (9) and
(10) of the heights of adult men in the United Kingdom, which is
approximately normal with mean µ = 171.5 cm and standard
deviation σ = 6.5 cm.
Area in the upper tail of distribution:
The normal distribution can be used to estimate, for
example, the proportion of men taller than 180 cm.
This proportion is represented by the fraction of the
area under the frequency distribution curve that is
above 180cm. The corresponding SND (Z) is:
• $Z = \frac{180 - 171.5}{6.5} = 1.31$
• so that the proportion may be derived from the proportion of
the area of the standard normal distribution that is above 1.31.
• This area is illustrated in Figure 5.3(a) and can be found from a
computer or from Table A1 in the Appendix.
• The rows of the table refer to z to one decimal place and the
columns to the second decimal place.
• Thus the area above 1.31 is given in row 1.3 and column 0.01
and is 0.0951.
• We conclude that a fraction 0.0951, or equivalently 9.51%, of
adult men are taller than 180 cm.
Area in the lower tail of distribution:
The proportion of men shorter than 160cm, for example can
be similarly estimated.
• $Z = \frac{160 - 171.5}{6.5} = -1.77$
• The required area is illustrated in fig. (5.3). As the standard
normal distribution (SND) is symmetrical about zero, the area
below z = -1.77 is equal to the area above z = 1.77, which is
0.0384. Thus 3.84% of men are shorter than 160 cm.
Area of distribution between two values:
The proportion of men with a height between for example, 165cm and
175cm is estimated by finding the proportion of men shorter than
165cm and taller than 175cm and subtracting these from 1. This is
illustrated in figure (5.3).
SND corresponding to 165 cm is:
$Z = \frac{165 - 171.5}{6.5} = -1$. Proportion below this height is 0.1587.
SND corresponding to 175 cm is:
$Z = \frac{175 - 171.5}{6.5} = 0.54$. Proportion above this height is 0.2946.
Proportion of men with heights between 165 cm and 175 cm
= 1 – proportion below 165 cm – proportion above 175 cm
= 1 – 0.1587 – 0.2946 = 0.5467 or 54.67%
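The three tail-area calculations above can be checked without a printed table. This is a sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); rounding z to two decimals is an assumption made here to mimic looking the value up in a table.

```python
from statistics import NormalDist

MU, SIGMA = 171.5, 6.5      # mean and SD of adult male height (cm), from the text
snd = NormalDist()          # standard normal distribution (mean 0, SD 1)

def z_score(x, mu=MU, sigma=SIGMA):
    """Standard normal deviate, rounded to 2 decimals as in a z table."""
    return round((x - mu) / sigma, 2)

# Upper tail: proportion taller than 180 cm.
upper_180 = 1 - snd.cdf(z_score(180))

# Lower tail: proportion shorter than 160 cm.
lower_160 = snd.cdf(z_score(160))

# Between two values: 1 minus both tails.
below_165 = snd.cdf(z_score(165))
above_175 = 1 - snd.cdf(z_score(175))
between_165_175 = 1 - below_165 - above_175

print(f"taller than 180 cm : {upper_180:.4f}")
print(f"shorter than 160 cm: {lower_160:.4f}")
print(f"between 165 and 175: {between_165_175:.4f}")
```

Dropping the `round` call gives the exact areas, which differ from the table-based values only in the third or fourth decimal place.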
Value corresponding to specified tail area:
• Table A1 can also be used the other way round, that is starting
with an area and finding the corresponding z value.
• For example, what height is exceeded by 5% or 0.05 of the
population? Looking through the table, the closest value to
0.05 is found in row 1.6 and column 0.04, and so the required z
value is 1.64.
• The corresponding height is found by inverting the definition of
the SND to give:
• X = µ + zσ
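Working backwards from an area to a value can be sketched the same way. The example below uses `NormalDist.inv_cdf` from the Python standard library instead of a table search, so z comes out as about 1.645 rather than the tabulated 1.64; the mean and SD are the height figures used earlier in this section.

```python
from statistics import NormalDist

mu, sigma = 171.5, 6.5      # height of adult men (cm)
tail_area = 0.05            # proportion of men who exceed the height we want

# z value with 5% of the standard normal distribution above it.
z = NormalDist().inv_cdf(1 - tail_area)

# Invert the definition of the SND: X = mu + z * sigma.
height = mu + z * sigma

print(f"z = {z:.3f}, height = {height:.1f} cm")
```

With the table value z = 1.64 the same formula gives 171.5 + 1.64 × 6.5 = 182.16 cm, so either route leads to roughly 182 cm.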
$N \text{ (sample size)} = \frac{(1.96)^2 \times P(1 - P)}{d^2}$
Where:
P = Prevalence of the condition under study.
d = Error rate (i.e. 5%)
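As a rough illustration of this formula, the sketch below rounds the result up to the next whole participant. The prevalence values are hypothetical, and `sample_size_prevalence` is an illustrative name, not a function from any package.

```python
import math

def sample_size_prevalence(p, d, z=1.96):
    """N = z^2 * P * (1 - P) / d^2, rounded up to a whole number."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# Hypothetical inputs: expected prevalence 20%, margin of error 5%.
n_20 = sample_size_prevalence(p=0.20, d=0.05)

# P = 0.5 gives the largest possible value of P(1 - P).
n_50 = sample_size_prevalence(p=0.50, d=0.05)

print(n_20, n_50)
```

Taking P = 0.5 when the prevalence is unknown is a conservative choice, because P(1 – P) cannot be larger than 0.25.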
Sample size calculation for cohort studies and randomized clinical trials
$N \text{ (sample size)} = \frac{2 \times (1.96 + 0.84)^2 \times P(1 - P)}{(P_0 - P_1)^2}$
Where:
P0 = Proportion of the participants with the condition in the
control group
P1 = Proportion of the participants with the condition in the
exposed group
$P = \frac{P_0 + P_1}{2}$
Sample size calculation for case-control studies
$N \text{ (sample size)} = \frac{2 \times (1.96 + 0.84)^2 \times P(1 - P)}{(P_0 - P_1)^2}$
Where:
P0 = Proportion of the controls exposed to the factor under study
P1 = Proportion of the cases exposed to the factor under study
$= \frac{P_0 \times OR}{1 + P_0 (OR - 1)}$
OR = Odds ratio
$P = \frac{P_0 + P_1}{2}$
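The two-group formula is the same for cohort studies, clinical trials and case-control studies; only the way P1 is obtained differs. A minimal sketch, with hypothetical proportions and illustrative function names, assuming that 1.96 and 0.84 are the z values for a two-sided 5% significance level and 80% power, and reading N as the number needed in each group:

```python
import math

def p1_from_odds_ratio(p0, odds_ratio):
    """P1 = P0 * OR / (1 + P0 * (OR - 1)), used for case-control designs."""
    return p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))

def sample_size_two_groups(p0, p1, z_alpha=1.96, z_beta=0.84):
    """N = 2 * (z_alpha + z_beta)^2 * P * (1 - P) / (P0 - P1)^2, rounded up."""
    p_bar = (p0 + p1) / 2
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p0 - p1) ** 2
    return math.ceil(n)

# Hypothetical cohort or trial: 20% in one group versus 10% in the other.
n_cohort = sample_size_two_groups(p0=0.20, p1=0.10)

# Hypothetical case-control study: P0 = 20% and an odds ratio of 2 to detect.
p1_cases = p1_from_odds_ratio(p0=0.20, odds_ratio=2.0)
n_case_control = sample_size_two_groups(p0=0.20, p1=p1_cases)

print(n_cohort, round(p1_cases, 3), n_case_control)
```

The smaller the difference between P0 and P1, the larger the sample needed, because the difference is squared in the denominator.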
MATHEMATICAL PRESENTATION OF DATA
• There are 2 groups for statistical presentation of data:
A. Measures of position (measures of central tendency)
B. Measures of dispersion
$\text{Mean} = \frac{39 + 50 + 26 + 45 + 47}{5} = \frac{207}{5} = 41.4$ years
$\text{Mean } (\bar{X}) = \frac{\text{Sum of observations}}{\text{Total number of observations}} = \frac{\sum X_i}{n}$
• Mean from grouped data:
• Calculate the arithmetic mean from the following
table:
• First, we should find the midpoint of each class interval
• Then we multiply this midpoint by frequency of this class
• Then we add the results of multiplication of the midpoint of
class interval by its frequency and divide it by the sum of the
frequencies. As in the following table:
Age (years) Frequency (F) Midpoint (X) FX
10- 10 15 10 X 15 = 150
20- 5 25 5 X 25 = 125
30- 15 35 15 X 35 = 525
40- 10 45 10 X 45 = 450
50-60 20 55 20 X 55 = 1100
Total 60 2350
• $\text{Mean } (\bar{X}) = \frac{\sum FX}{N} = \frac{2350}{60} = 39.17$ yrs
Where:
• Σ = summation
• F = frequency of the corresponding class interval
• X = midpoint of the corresponding class interval
• N = summation of total frequencies (total number of
observations).
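The midpoint method can be sketched in a few lines. The class limits and frequencies below are taken from the age table above; `grouped_mean` is an illustrative helper name.

```python
# (lower limit, upper limit, frequency) for each age class in the table above.
age_table = [(10, 20, 10), (20, 30, 5), (30, 40, 15), (40, 50, 10), (50, 60, 20)]

def grouped_mean(table):
    """Mean from grouped data: sum of F * midpoint divided by the sum of F."""
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    total_f = sum(f for _, _, f in table)
    return total_fx / total_f

mean_age = grouped_mean(age_table)
print(f"Mean age = {mean_age:.2f} years")
```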
Properties of the arithmetic mean:
1. Simple: can easily be calculated and interpreted
2. Unique: for a given set of data there is only one mean.
3. Sensitive: the mean is affected by every value. Extreme
values may have an influence and can distort the mean and
become misleading. For example, the following are duration
of stay in hospital in days for specific conditions: 5, 5, 5, 7, 10,
20 and 102.
• The mean = $\frac{154}{7} = 22$ days
• The extreme value of 102 days has a misleading effect on the
mean. In this case the median (7 days) is a less misleading
measure, i.e. the mean may be affected by large outlying
observations, whereas the median is unaffected.
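The sensitivity of the mean to one extreme value is easy to demonstrate with the hospital-stay figures from the example above, using the Python standard library:

```python
from statistics import mean, median

stay_days = [5, 5, 5, 7, 10, 20, 102]   # durations of stay from the example

mean_all = mean(stay_days)
median_all = median(stay_days)

# Leaving out the extreme value (102 days) moves the mean a long way,
# while the median hardly changes.
without_outlier = stay_days[:-1]
mean_trimmed = mean(without_outlier)
median_trimmed = median(without_outlier)

print(mean_all, median_all)
print(round(mean_trimmed, 2), median_trimmed)
```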
Other types of mean
1. Geometric mean
2. Weighted mean
3. Harmonic mean
4. Trimmed mean
• The geometric mean
• The arithmetic mean is an inappropriate summary measure of
location if our data are skewed. If the data are skewed to the
right, we can produce a distribution that is more symmetrical if
we take the logarithm [to base 10 or to base e (2.7182818)] of
each value of the variable in this data set.
• The arithmetic mean of the log values is a measure of location
for the transformed data. To obtain a measure that has the same
units as the original observations, we have to back transform
(i.e. take the antilog of) (also known as exponentiating) the
mean of the log data; we call this the geometric mean.
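The log, average, antilog recipe can be sketched as follows. The data values are hypothetical (a right-skewed set chosen for illustration); `statistics.geometric_mean` from the standard library (Python 3.8 or later) is used only as a cross-check.

```python
import math
from statistics import geometric_mean, mean

values = [2, 4, 8, 16, 128]          # hypothetical right-skewed data

# Step 1: take the logarithm (base e here) of each value.
log_values = [math.log(v) for v in values]

# Step 2: arithmetic mean of the logged values.
log_mean = mean(log_values)

# Step 3: back-transform (antilog) to return to the original units.
gm = math.exp(log_mean)

print(f"arithmetic mean = {mean(values):.1f}")
print(f"geometric mean  = {gm:.2f}")
print(f"library check   = {geometric_mean(values):.2f}")
```

The choice of base does not matter as long as the antilog uses the same base; for skewed data like this the geometric mean is noticeably smaller than the arithmetic mean.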
• The weighted mean
• If the values x1, x2, x3, . . . , xn have corresponding weights w1, w2,
w3, . . . , wn, the weighted arithmetic mean is:
$\frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}$
• For example, suppose we are interested in determining the
average length of stay of hospitalized patients in a district,
and we know the average discharge time for patients in
every hospital. To take account of the amount of
information provided, one approach might be to take each
weight as the number of patients in the associated
hospital.
• The weighted mean and the arithmetic mean are identical
if each weight is equal to one.
• Example: Last week, 6 patients were discharged from hospital A;
their periods of stay were 8, 7, 15, 9, 4 and 3 days. In the same
week 5 patients were discharged from hospital B; their durations
of stay were 5, 4, 14, 7 and 6 days. In the same week 4 patients
were discharged from hospital C; their durations of stay were 2, 4,
6 and 4 days. Calculate the weighted mean.
• The mean stay is 46 / 6 = 7.67 days in hospital A, 36 / 5 = 7.2 days
in hospital B and 16 / 4 = 4 days in hospital C.
• If we want to give unequal weights to the three means, for example:
• Weight for the first mean = .2
• Weight for the second mean = .2
• Weight for the third mean = .6
• Then, the weighted mean will be: [(7.67 x .2) + (7.2 x .2) + (4 x .6)]
= 1.53 + 1.44 + 2.4 = 5.37 days
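The hospital example can be reproduced with a small helper. This is a sketch: `weighted_mean` is an illustrative name, and weighting by the number of patients is shown as one possible alternative to the fixed weights, following the earlier suggestion to weight each hospital by its number of patients.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Lengths of stay (days) from the example above.
stays = {
    "A": [8, 7, 15, 9, 4, 3],
    "B": [5, 4, 14, 7, 6],
    "C": [2, 4, 6, 4],
}

hospital_means = [sum(days) / len(days) for days in stays.values()]

# Fixed weights of .2, .2 and .6, as in the example.
by_given_weights = weighted_mean(hospital_means, [0.2, 0.2, 0.6])

# Alternative: weight each hospital mean by its number of patients.
patient_counts = [len(days) for days in stays.values()]
by_patient_counts = weighted_mean(hospital_means, patient_counts)

print([round(m, 2) for m in hospital_means])
print(round(by_given_weights, 2), round(by_patient_counts, 2))
```

Weighting by patient numbers gives the same answer as pooling all 15 patients and taking one overall mean.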
• Example: Suppose you take three 100-point exams in your statistics
class and score 80, 80 and 95. The last exam is much easier than the
first two, so your supervisor has given it less weight. The weights
for the three exams are:
• Exam 1: 40% of your grade (note: 40% as a proportion is .4)
• Exam 2: 40% of your grade
• Exam 3: 20% of your grade
• What is your weighted average (mean) for the class?
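Applying the weighted-mean formula to these scores, with the weights written as proportions, gives the following worked calculation (the symbol $\bar{x}_w$ is notation introduced here for the weighted mean):

```latex
\bar{x}_w = \frac{(0.4 \times 80) + (0.4 \times 80) + (0.2 \times 95)}{0.4 + 0.4 + 0.2}
          = \frac{32 + 32 + 19}{1}
          = 83
```

The unweighted mean of the three scores would be 85, so giving less weight to the easier exam lowers the average.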
• The Samsung type gets 8 (out of 10) for Voice Quality, 6 for Battery Life
and 7 for Camera
• The Huawei type gets 9 for Voice Quality, 4 for Battery Life and 6 for
Camera
• Then the median lies between the 3rd and 4th observations, and its value
equals the average of these two observations.
The Median from grouped data:
The following table shows frequency distribution of 26
individuals by age in years:
Age (years) Frequency Cumulative frequency
5- 3 3
10- 5 8
15- 12 20
20- 3 23
25-30 3 26
Total 26
• In order to determine the value of the median firstly we
have to calculate:
• General rank of the median:
• $\frac{N}{2} = \frac{26}{2} = 13$
Where:
• Lm = lower limit of the median class
Median $= 15 + \frac{5 \times 5}{12} = 15 + 2.08 = 17.08$ years.
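The calculation above corresponds to interpolating inside the median class: median = Lm + (rank of the median – cumulative frequency before the median class) ÷ frequency of the median class × class width. A sketch of that rule, using the age table from this example (`grouped_median` is an illustrative name):

```python
def grouped_median(table):
    """table: list of (lower limit, class width, frequency), in ascending order."""
    n = sum(f for _, _, f in table)
    rank = n / 2                      # general rank of the median
    cumulative = 0                    # cumulative frequency before the current class
    for lower, width, f in table:
        if cumulative + f >= rank:    # this is the median class
            return lower + (rank - cumulative) / f * width
        cumulative += f

# Age classes from the table above: lower limit, width 5, frequency.
age_table = [(5, 5, 3), (10, 5, 5), (15, 5, 12), (20, 5, 3), (25, 5, 3)]

median_age = grouped_median(age_table)
print(f"Median age = {median_age:.2f} years")
```

Here the rank is 13, the class 15- is the first whose cumulative frequency (20) reaches it, and 13 – 8 = 5 observations are needed from that class of 12.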
3. The Mode:
The mode is defined as the most frequently occurring value
in a series of observations. There may be no mode, one mode
or more than one mode for a group of observations.
The mode from ungrouped data: these are the weights of 5
live births: 3 – 3.5 – 3 – 2.5 – 3 kg. The mode is 3kg. which is
the most frequent observation.
The mode from grouped data: the class which has the
highest frequency is the modal class.
$\text{Mode} = L_m + \frac{D_1}{D_1 + D_2} \times I$
Where:
Modal class = the class interval of the mode is the class
interval with the highest frequency.
Lm = lower limit of the modal class.
D1 = difference between frequency of the modal class and
the above (previous) one.
D2 = difference between frequency of the modal class and
the lower (following) one
I = amount (or width) of the class interval.
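A sketch of this formula is given below. For illustration it reuses the age table from the grouped-median example (classes 5-, 10-, 15-, 20-, 25-30 with frequencies 3, 5, 12, 3, 3). Treating a missing neighbouring class as having frequency zero is an assumption made here, and `grouped_mode` is an illustrative name.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode = Lm + D1 / (D1 + D2) * I for the class with the highest frequency."""
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    # Assumption: a missing neighbouring class counts as frequency 0.
    previous_f = frequencies[i - 1] if i > 0 else 0
    following_f = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - previous_f
    d2 = frequencies[i] - following_f
    return lower_limits[i] + d1 / (d1 + d2) * width

lower_limits = [5, 10, 15, 20, 25]
frequencies = [3, 5, 12, 3, 3]

mode_age = grouped_mode(lower_limits, frequencies, width=5)
print(f"Mode = {mode_age:.2f} years")
```

For this table the modal class is 15-, D1 = 12 – 5 = 7 and D2 = 12 – 3 = 9, so the mode is 15 + 7/16 × 5.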
Total 200
MEASURES OF DISPERSION
Introduction:
As we have seen, the mean, median and mode are measures
of the central tendency of a variable, but they do not provide
any information about how much the measurements vary. This
section describes some common measures of variation (or
variability) which in statistical textbooks are often referred to
as measures of dispersion.
Measures of dispersion include the following:
1. Range
2. Percentiles and quantiles
3. Average deviation
4. Variance and
5. Standard deviation
1. RANGE:
• The range of a set of measurements is the difference
between the smallest and the largest measurement. For
example, if the weights of 7 pregnant women were: 40 – 41
– 42 – 43 – 47 – 72 kg.
• The range would be 72 – 40 = 32 kg.
• Although simple to calculate, the range does not tell us
anything about the distribution of the values between the
two extreme ones.
5 25 27 1st quartile ??
6 30 28
7 35 31
8 40 32
9 45 33
10 50 35 Median (2nd Q) ??
11 55 45
12 60 46
13 65 48
14 70 54
15 75 55 3rd quartile ??
16 80 57
17 85 59
18 90 59
19 95 61
20 100 62 Maximum = 62
• For the haemoglobin data, the median is the 35.5th observation,
and so we take the average of the 35th and 36th observations. Thus
the median is (11.8 + 11.9) ÷ 2 = 11.85, as shown in Table
3.3. It is the haemoglobin value corresponding to the point where
the 50% line crosses the curve, as shown in the Figure.
• Also marked on Figure 3.7 are the two points where the 25% and
75% lines cross the curve. These are called the lower (first) and
upper (third) quartiles of the distribution, respectively, and
together with the median they divide the distribution into four
equally-sized groups.
• The difference between the lower and upper quartiles is known as
the interquartile range.
Box plot:
• A useful plot, based on these values, is a box and whiskers plot.
• The box is drawn from the lower quartile to the upper quartile; its
length gives the interquartile range.
• The horizontal line in the middle of the box represents the median.
• Just as a cat’s whiskers mark the full width of its body, the
‘whiskers’ in this plot mark the full extent of the data. They are
drawn on either end of the box to the minimum and maximum
values.
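The five values that a box and whiskers plot is drawn from can be computed as sketched below. The data are hypothetical, and the "inclusive" quartile method of `statistics.quantiles` (Python 3.8 or later) is only one of several conventions; other software may give slightly different quartiles for the same data.

```python
from statistics import quantiles

# Hypothetical measurements, not taken from the lecture.
data = sorted([10.2, 10.9, 11.4, 11.8, 11.9, 12.3, 12.8, 13.1, 14.6])

# Lower quartile, median and upper quartile.
q1, med, q3 = quantiles(data, n=4, method="inclusive")

five_number_summary = {
    "minimum": data[0],        # end of the lower whisker
    "lower quartile": q1,      # bottom of the box
    "median": med,             # line inside the box
    "upper quartile": q3,      # top of the box
    "maximum": data[-1],       # end of the upper whisker
}
interquartile_range = q3 - q1  # length of the box

for name, value in five_number_summary.items():
    print(f"{name:<15} {value}")
print(f"interquartile range {interquartile_range:.1f}")
```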
Box plot
3. AVERAGE DEVIATION:
• Average deviation is the mean of the absolute difference
between individual values and the mean of these values.
1. Calculate the mean of all measurements
2. Calculate the difference between each individual measurement
and the mean
3. Add the absolute differences between the mean and the
individual measurements (i.e. ignore the negative signs).
4. Divide the above result by the number of observations
5. Average deviation = Σ |X – mean| / n
3. AVERAGE DEVIATION: example
Mean = 60 / 10 = 6; Average deviation = Σ |X – mean| / n
Average deviation = 16 / 10 = 1.6
$\text{Variance, } SD^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
4. Variance: example
Mean = 60 / 10 = 6; Variance = Σ (X – mean)² / (n – 1)
Variance = 34 / (10 – 1) = 3.78
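The average deviation, variance and standard deviation can be computed as sketched below. The ten values are hypothetical: they were chosen here only because they reproduce the totals quoted in the two examples above (sum 60, sum of absolute deviations 16, sum of squared deviations 34).

```python
from statistics import mean, stdev, variance

# Hypothetical data chosen to match the totals quoted above.
values = [9, 8, 7, 7, 7, 6, 5, 4, 4, 3]

m = mean(values)                                   # 60 / 10
absolute_deviations = [abs(x - m) for x in values]
squared_deviations = [(x - m) ** 2 for x in values]

# Average deviation: mean of the absolute differences from the mean.
average_deviation = sum(absolute_deviations) / len(values)

# Variance with the n - 1 denominator, and its square root (the SD).
sample_variance = variance(values)
standard_deviation = stdev(values)

print(m, sum(absolute_deviations), sum(squared_deviations))
print(average_deviation, round(sample_variance, 2), round(standard_deviation, 2))
```

`statistics.variance` and `statistics.stdev` use the n – 1 denominator, which matches the variance formula given above.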
• The chi-square test can be used to give us the answer. This test
is based on measuring the difference between the observed
frequencies and the expected frequencies if the null hypothesis
(i.e. the hypothesis of no difference) were true.
To perform a χ² test you need to complete the following 3 steps:
1. Calculate the χ² value
2. Use a χ² table, and
3. Interpret the results.
2. Using a χ² table
• As for the t-test, the calculated χ² value has to be compared
with a theoretical χ² value in order to determine whether the
null hypothesis is rejected or not.
• First you must decide on a p-value. We usually take 0.05.
• Then the degrees of freedom have to be calculated. With the χ²
test the number of degrees of freedom is related to the number of
cells, i.e. the number of groups or variables you are comparing.
The number of degrees of freedom is found by multiplying the
number of rows (r) minus 1 by the number of columns (c) minus 1:
• d.f. = (r – 1) X (c – 1)
• For a simple two-by-two table the number of degrees of freedom
is 1 (i.e. d.f. = (2 – 1) X (2 – 1) = 1).
• Then the χ² value belonging to the p-value and the number of
degrees of freedom is located in the table, in order to determine
whether the calculated χ² value is statistically significant or not.
3. Interpreting the result:
• As for the t-test, the null hypothesis is rejected if p<0.05, which is
the case if the calculated χ² is larger than the theoretical χ² value
in the table.
• Let us now apply the χ² test to the data given in Example (D)
(utilization of antenatal care). This gives the following results:
Step1(a):
The expected frequencies for each cell are calculated as follows:
E1 = 86 X 80 / 155 = 44.4    E2 = 69 X 80 / 155 = 35.6
E3 = 86 X 75 / 155 = 41.6    E4 = 69 X 75 / 155 = 33.4
For convenience, the observed and expected frequencies are
shown in the following table:
Table (E): Utilization of antenatal clinics, observed and expected
frequencies.
Distance from ANC   Used ANC    Did not use ANC   Total
< 10 km             O1 = 51     O2 = 29           80
                    E1 = 44.4   E2 = 35.6
≥ 10 km             O3 = 35     O4 = 40           75
                    E3 = 41.6   E4 = 33.4
Total               86          69                155
Note that the expected frequencies refer to the values we would
have expected, given the total numbers of 80 and 75 women in
the two groups, if the null hypothesis (stating that there is no
difference between the two groups) were true.
Step1(b) to 1(d):
$\chi^2 = \frac{(51 - 44.4)^2}{44.4} + \frac{(29 - 35.6)^2}{35.6} + \frac{(35 - 41.6)^2}{41.6} + \frac{(40 - 33.4)^2}{33.4}$
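The whole of step 1 (expected frequencies, then the sum of (O – E)² / E) can be sketched for any table of counts. The observed counts below are those of the antenatal care example; `chi_square` is an illustrative name, and 3.84 is the tabulated value for p = 0.05 and d.f. = 1.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency = row total * column total / grand total.
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: < 10 km, >= 10 km. Columns: used ANC, did not use ANC.
anc_table = [[51, 29], [35, 40]]

chi2, df = chi_square(anc_table)
significant = chi2 > 3.84      # theoretical value at p = 0.05, d.f. = 1

print(f"chi-square = {chi2:.2f}, d.f. = {df}, significant: {significant}")
```

Because the expected frequencies are not rounded to one decimal place here, the result may differ slightly in the second decimal from a hand calculation.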
[Table: Obese, Normal weight, Total]
• d.f. = (r – 1) (c – 1) = (2 – 1) (2 – 1) = 1
• Theoretical χ² value from χ² table at p = 0.05 and d.f. = 1 is 3.84.
Then, the difference is not significant.
• d.f. = (r – 1) (c – 1) = (2 – 1) (3 – 1) = 2
Test Statistics
Minority Classification
Chi-Square 149.274a
Df 1
Asymp. Sig. .000
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell
frequency is 237.0.
Example 2
colour
Observed N Expected N Residual
blue 25 20.0 5.0
pink 20 20.0 .0
green 20 20.0 .0
brown 15 20.0 -5.0
yellow 20 20.0 .0
Total 100
Test Statistics
colour
Chi-Square 2.500a
df 4
Asymp. Sig. .645
a. 0 cells (.0%) have expected frequencies less than 5. The minimum
expected cell frequency is 20.0.
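The figures in this output can be reproduced by hand. The sketch below assumes, as the output suggests, that each colour is expected equally often. The p-value uses a closed-form expression for the upper tail of the χ² distribution that is valid only when the degrees of freedom are even, which avoids needing a statistics package; `chi2_upper_tail_even_df` is an illustrative name.

```python
import math

observed = {"blue": 25, "pink": 20, "green": 20, "brown": 15, "yellow": 20}

# Equal expected frequency for every colour (100 / 5 = 20).
expected = sum(observed.values()) / len(observed)

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x); this closed form holds only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form used here needs an even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi2_upper_tail_even_df(chi2, df)
print(f"chi-square = {chi2:.3f}, d.f. = {df}, p = {p_value:.3f}")
```

Since p is well above 0.05, the observed colour frequencies are compatible with all colours being equally likely.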
COMPARISON OF TWO MEANS - STUDENT'S T-TEST
The unpaired (two-sample) t-test - Independent samples t-test
Rationale
• We consider the difference in the means of the two groups.
Under the null hypothesis that the population means in the two
groups are the same, this difference will equal zero. Therefore,
we use a test statistic that is based on the difference in the two
sample means, and on the value of the difference in population
means under the null hypothesis (i.e. zero).
When the sample sizes are reasonably large, the t-test is fairly
robust to departures from Normality.
• If Levene's test is not significant (p > .05) then you can
assume that the data show homogeneity of variance.
(b) If the F-ratio is very close to 1, you are safe in concluding that
the data probably show homogeneity of variance. If the F-ratio
is quite a bit larger than 1, then to decide how likely it is to get
your obtained F-ratio by chance, you need to use a table of
F-max values.
To use the table you need to know the d.f. (the number of
participants in a group, minus 1) and k (the number of groups or
conditions).
Note that Hartley's test assumes that there are equal numbers
of participants in each group.
2. Calculate the t-value when variances are unequal:
• Standard error of the difference $= \sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}$
Example: the following are amounts of money (in pounds) for 20 male
and 20 female children. Test whether there is a significant difference
between the two means.
• Males:
11.0 - 12.0 - 13.0 - 14.0 - 15.0 - 13.0 - 14.0 - 15.0 - 16.0 - 18.0
- 22.0 - 33.0 - 44.0 - 55.0 - 66.0 - 65.0 - 45.0 - 43.0 - 66.0 -
32.0
N1 = 20, Mean1 = 15.4, SD1 = 9.5
• Females:
11.0 - 12.0 - 13.0 - 11.0 - 11.0 - 12.0 - 13.0 - 14.0 - 15.0 - 55.0 -
22.0 - 21.0 - 21.0 - 23.0 - 24.0 - 21.0 - 23.0 - 24.0 - 25.0 -
32.0
N2 = 20, Mean2 = 35.4, SD2 = 16.3
The standard error of the difference is given by the following
formula:
• $\sqrt{\frac{9.5^2}{20} + \frac{16.3^2}{20}} = 4.2$
The t-value is then:
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}}$
• Where $\bar{X}_1$ is the mean value of the first sample and $\bar{X}_2$ is the mean value
of the second sample.
3. Using a T-Table:
• Once the t-value has been calculated, you will have to refer to a
t-table, from which you can determine whether the null
hypothesis is rejected or not.
A. First, decide which significance level (p-value) you want to use.
Remember that the p-value is an expression of the likelihood of
finding a difference by chance when there is no real difference.
Usually we take a p-value of 0.05.
B. Second, determine the number of degrees of freedom for the
test being performed. Degrees of freedom is a measure derived
from the sample size, which has to be taken into account when
performing a t-test. The bigger the sample size (and degrees of
freedom) the smaller the difference needed to reject the null
hypothesis.
• The way the number of degrees of freedom is calculated differs
from one statistical test to another. For Student t-test the number
of degrees of freedom is calculated as the sum of the two sample
sizes minus 2. That is: d.f. = n1 + n2 – 2
• Thus, for example 1 the number of degrees of freedom is:
• d.f. = 20 + 20 – 2 = 38
$\sqrt{\frac{SD_1^2 (n_1 - 1) + SD_2^2 (n_2 - 1)}{n_1 + n_2 - 2} \times \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
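The t-test calculation can be sketched from summary statistics alone. The figures below are the summary values quoted in the example (n, mean and SD for each group), not values recomputed from the raw listings, and the function names are illustrative. Both forms of the standard error are shown: the unequal-variance form and the pooled form just above.

```python
import math

def se_unequal_variances(sd1, n1, sd2, n2):
    """Standard error of the difference without pooling the variances."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def se_pooled(sd1, n1, sd2, n2):
    """Standard error of the difference based on the pooled variance."""
    pooled_variance = (sd1 ** 2 * (n1 - 1) + sd2 ** 2 * (n2 - 1)) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

# Summary statistics as quoted in the example.
n1, mean1, sd1 = 20, 15.4, 9.5
n2, mean2, sd2 = 20, 35.4, 16.3

se = se_unequal_variances(sd1, n1, sd2, n2)
se_equal = se_pooled(sd1, n1, sd2, n2)
t_value = (mean1 - mean2) / se
degrees_of_freedom = n1 + n2 - 2

print(f"SE = {se:.2f} (pooled: {se_equal:.2f})")
print(f"t = {t_value:.2f}, d.f. = {degrees_of_freedom}")
```

When the two groups are the same size, as here, the two standard errors coincide. The sign of t only reflects which mean was subtracted from which; its absolute value is what gets compared with the t-table.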
We have two samples that are related to each other and one
numerical or ordinal variable of interest.
Assumptions
In the population of interest, the individual differences are
Normally distributed with a given (usually unknown) variance.
We have a reasonable sample size so that we can check the
assumption of Normality.
Rationale
If the two sets of measurements were the same, then we would
expect the mean of the differences between each pair of
measurements to be zero in the population of interest.
Therefore, our test statistic simplifies to a one-sample t-test on
the differences, where the hypothesized value for the mean
difference in the population is zero.
Additional notation
Because of the paired nature of the data, our two samples must
be of the same size, n. We have n differences, with sample
mean, $\bar{d}$, and estimated standard deviation $s_d$.
• Since the pairing is explicitly defined and thus new information added
to the data, paired data can always be analyzed with the independent
sample t-test as well, but not vice versa.
d.f. = n - 1
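Since the paired test reduces to a one-sample t-test on the differences with a hypothesized mean of zero, the usual statistic is the mean difference divided by its standard error, s_d ÷ √n. A minimal sketch follows; the before and after values are hypothetical, and `paired_t` is an illustrative name.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """t statistic and degrees of freedom for paired measurements."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    d_bar = mean(differences)              # sample mean of the differences
    s_d = stdev(differences)               # SD of the differences (n - 1 denominator)
    t = d_bar / (s_d / math.sqrt(n))       # hypothesized mean difference is zero
    return t, n - 1

# Hypothetical paired measurements on the same 5 individuals.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 12, 15]

t_value, degrees_of_freedom = paired_t(before, after)
print(f"t = {t_value:.2f}, d.f. = {degrees_of_freedom}")
```

The calculated t is then compared with the tabulated value for n – 1 degrees of freedom, as for the other t-tests.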
The problem
• We have a sample from a single group of individuals and one
numerical or ordinal variable of interest. We are interested in
whether the average of this variable takes a particular value.
• For example, we may have a sample of patients with a specific
medical condition. We have been monitoring triglyceride levels in
the blood of healthy individuals and know that they have a
geometric mean of 1.74 mmol/L. We wish to know whether the
average level in our patients is the same as this value.
The one-sample t-test
• Assumptions
• In the population, the variable is Normally distributed with a
given (usually unknown) variance. In addition, we have taken a
reasonable sample size so that we can check the assumption of
Normality.
Rationale
• We are interested in whether the mean, µ, of the variable in the
population of interest differs from some hypothesized value, µ1.
We use a test statistic that is based on the difference between
the sample mean, X dash, and µ1. Assuming that we do not
know the population variance, this test statistic, often
referred to as t, follows the t-distribution.
• If we do know the population variance, or the sample size is
very large, then an alternative test (often called a z-test), based
on the Normal distribution, may be used. However, in these
situations, results from either test are virtually identical.
Additional notation
• Our sample is of size n and the estimated standard deviation is
s.
• Interpretation of the confidence interval
• The 95% confidence interval provides a range of values in which
we are 95% certain that the true population mean lies. If the 95%
confidence interval does not include the hypothesized value for
the mean, µ1, we reject the null hypothesis at the 5% level. If,
however, the confidence interval includes µ1, then we fail to
reject the null hypothesis at that level.
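In the notation above (sample size n, sample mean, estimated standard deviation s, hypothesized value µ1), the one-sample test statistic and the 95% confidence interval are usually written as follows. The symbol $t_{0.05}$ is notation assumed here for the tabulated two-sided 5% point of the t-distribution with n – 1 degrees of freedom.

```latex
t = \frac{\bar{x} - \mu_1}{s / \sqrt{n}}, \qquad \text{d.f.} = n - 1

\text{95\% confidence interval for the mean} = \bar{x} \pm t_{0.05} \times \frac{s}{\sqrt{n}}
```

Both expressions use the same standard error, $s / \sqrt{n}$, which is why the test and the confidence interval lead to the same decision about µ1.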
Grand mean = 79
a) data:
14 370 Total
F should be about 1 if there are no real differences between the
groups and larger than 1 if there are differences.
There is thus strong evidence that mean weight levels differ between
the three groups (first, second and third), the mean being lowest for
the third group (74), intermediate for the first group (79), and highest
for the second group (84).
• The data are said to have a balanced design if there are equal
numbers of observations in each group and an unbalanced
design if there are not.
• These data are classified in two ways, by strain and by sex. The
design is balanced with replication because there are five
observations in each strain–sex group.
• Number of groups = sex(2) X strain (3) = 6.
Mean weight gains in grams with standard deviations in
parentheses (n = 5 for each group).
Sex / strain A B C
2. The sum of squares due to differences between the sexes, that is the
main effect of sex. Its degrees of freedom equal 1, one less than the
number of sexes.
3. The sum of squares due to the interaction between strain and sex. An
interaction means that the strain differences are not the same for
both sexes and, equivalently, that the sex difference is not the same
for the three strains. The degrees of freedom equal the product of
the degrees of freedom of the two main effects, which is 2 x 1 = 2.
4. The residual sum of squares due to differences between the rats
within each strain–sex group. Its degrees of freedom equal 24,
the product of the number of strains (3), the number of sexes (2)
and the number of observations in each group minus one (4).
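The degrees of freedom in this breakdown can be checked with a short calculation. The sketch assumes a balanced design with replication, as in the rat example (3 strains, 2 sexes, 5 observations per group); `two_way_anova_df` is an illustrative name.

```python
def two_way_anova_df(levels_a, levels_b, n_per_group):
    """Degrees of freedom for a balanced two-way ANOVA with replication."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    return {
        "factor A": df_a,
        "factor B": df_b,
        "interaction": df_a * df_b,
        "residual": levels_a * levels_b * (n_per_group - 1),
        "total": levels_a * levels_b * n_per_group - 1,
    }

# Rat example: strain has 3 levels, sex has 2 levels, n = 5 per group.
df = two_way_anova_df(levels_a=3, levels_b=2, n_per_group=5)

for source, value in df.items():
    print(f"{source:<12} {value}")
```

The four components add up to the total degrees of freedom, 30 – 1 = 29, which is a useful check on an ANOVA table.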
• When there are two or more independent variables, it’s possible that
some variables use the same participants whereas others use different
participants. In this case we use the term mixed.
• When we name an ANOVA, we are simply telling the reader how many
independent variables we used and how they were measured. In
general terms we could write the name of an ANOVA as:
A (number of independent variables)-way (how these variables were
measured) ANOVA.
Naming ANOVA
• By remembering this you can understand the name of any ANOVA you
come across. Look at these examples and try to work out how many
variables were used and how they were measured:
One-way independent ANOVA
Two-way repeated-measures ANOVA
Two-way mixed ANOVA
Three-way independent ANOVA
• Therefore, the degrees of freedom used to assess the F-ratio are the degrees
of freedom for the effect of the model (dfM = 2) and the degrees of freedom
for the residuals of the model (dfR = 12).
• Therefore, the correct way to report the main finding would be:
There was a significant effect of Viagra on levels of libido, F(2, 12) = 5.12, p
< .05, ω = .60.
• Notice that the value of the F-ratio is preceded by the values of the degrees of
freedom for that effect. Also, we rarely state the exact significance value of
the F-ratio: instead we report that the significance value, p, was less than the
criterion value of .05 and include an effect size measure.
Input data
Analysis – compare means – one way ANOVA
Write variables in the windows
Select options
Post hoc multiple comparisons (pair-wise comparisons)
ANOVA table
Test of homogeneity of variances (result of Levene's test)
Result of post-hoc multiple comparison
Post hoc pairwise comparison tests
Post hoc procedures
Post hoc tests consist of pairwise comparisons that are designed
to compare all different combinations of the treatment groups. So,
it is rather like taking every pair of groups and then performing a
t-test on each pair of groups.
Now, this might seem like a particularly stupid thing to say in the
light of what I have already told you about the problems of
inflated familywise error rates.
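The size of the problem can be illustrated numerically. The sketch below counts the pairwise comparisons among k groups and computes the familywise error rate as 1 – (1 – α) raised to the number of comparisons, which assumes the comparisons are independent; that is a simplification, and `familywise_error` is an illustrative name.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Number of pairwise comparisons and the resulting familywise error rate."""
    n_comparisons = comb(n_groups, 2)              # k * (k - 1) / 2 pairs
    rate = 1 - (1 - alpha) ** n_comparisons        # assumes independent tests
    return n_comparisons, rate

for k in (3, 4, 5):
    pairs, rate = familywise_error(k)
    print(f"{k} groups: {pairs} comparisons, familywise error = {rate:.3f}")

pairs_3, rate_3 = familywise_error(3)
pairs_5, rate_5 = familywise_error(5)
```

Even with only three groups the chance of at least one false positive is already well above the nominal 5%, which is why post hoc procedures adjust for the number of comparisons.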
(2) does the test control the Type II error rate (i.e. does the test have
good statistical power); and
(3) is the test reliable when the test assumptions of ANOVA have
been violated?