Probability
The word statistics is defined in different ways depending on its use in the plural and singular sense.
In the plural sense: - statistics is defined as the collection of numerical facts or figures (or the raw data themselves).
Eg. 1. Vital statistics (numerical data on marriages, births, deaths, etc.).
2. "The average mark of the Statistics course for students is 70%" would be considered statistics, whereas
"Abebe got 90% in the Statistics course" is not statistics.
Remark: Statistics are aggregates of facts. Single and isolated figures are not statistics, as they cannot be compared
and are unrelated.
In its singular sense:- the word Statistics is the subject that deals with the methods of collecting, organizing,
presenting, analyzing and interpreting statistical data.
Classification of Statistics
Statistics is broadly divided into two categories based on how the collected data are used.
Descriptive Statistics:- deals with describing the collected data without drawing further conclusions.
Example 1.1: Suppose that the mark of 6 students in Statistics course for COTM students is given as 40, 45, 50,
60, 70 and 80. The average mark of the 6 students is 57.5 and it is considered as descriptive statistics.
Inferential Statistics:- It deals with making inferences and/or conclusions about a population based on data
obtained from a sample of observations. It consists of performing hypothesis testing, determining relationships
among variables and making predictions.
Example 1.2: In the above example, if we say that the average mark in the Statistics course for all COTM students
is 57.5, then we are talking about inferential statistics (drawing a conclusion based on the sample observations).
1.3 Definition of some statistical terms
Population: - It is the totality of objects under study. The population represents the target of an investigation, and
the objective of the investigation is to draw conclusions about the population hence we sometimes call it target
population. The word population doesn’t necessarily refer to people.
Examples:- All clients of Telephone Company, Population of families, etc.
The population could be finite or infinite (an imaginary collection of units).
Sample: - is part or subset of population under study.
Sampling frame: - is the list of all possible units of the population from which the sample can be drawn.
Eg. List of all students of AASTU, List of all residential houses in A.A city, etc
Survey: - is an investigation of a certain population to assess its characteristics. It may be a census or a sample survey.
Census survey: a complete enumeration of the population under study.
Sample survey: the process of collecting data covering a representative part or portion of a population.
Parameter: - is a statistical measure of a population, or summary value calculated from a population. Examples:
Average, Range, proportion, variance, etc
Statistic: - is a descriptive measure of a sample, or it is a summary value calculated from a sample.
Sampling: - The process or method of sample selection from the population.
Sample size: - The number of elements or observation to be included in the sample.
An element: - is a member of a sample or population. It is a specific subject or object (for example a person, firm,
item, etc.) about which the information is collected.
Variable: - It is an item of interest that can take numerical or non-numerical values for different elements. It may
be qualitative or quantitative. Example: age, weight, sex, marital status, etc.
Observation (measurement):- is the value of a variable for an element.
Qualitative variables:- are variables that assume non-numerical values. They can be categorized and they are
usually called attributes. Example: - Sex, marital status, ID number, etc.
Quantitative variables: - are variables which assume numerical values. eg. Age, weight, etc.
1.4 Applications, uses and limitations of Statistics
Statistics can be applied in any field of study which seeks quantitative evidence. For instance, Engineering,
Economics, Natural Science, etc.
Engineering: Statistics have wide application in engineering.
• To compare the breaking strength of two types of materials.
• To determine the reliability of a product.
• To control the quality of products in a given production process.
• To compare the improvement of yield due to certain additives such as fertilizers, herbicides, etc.
Function/Uses of Statistics
The following are some uses of statistics:
• It condenses and summarizes a mass of data: the original set of data (raw data) is normally voluminous and
disorganized unless it is summarized and expressed in few presentable, understandable & precise figures.
• Statistics facilitates comparison of data: measures obtained from different sets of data can be compared to draw
conclusion about those sets. Statistical values such as averages, percentages, ratios, rates, coefficients, etc, are
the tools that can be used for the purpose of comparing sets of data.
• Statistics helps to predict future trends: statistics is very useful for analyzing the past and present data and
forecasting future events.
• Statistics helps to formulate & review policies
• Formulating and testing hypothesis: Statistical methods are extremely useful in formulating and testing
hypothesis and to develop new theories.
Limitations of Statistics
Some of these limitations are:
a) It does not deal with individual values: as discussed earlier, statistics deals with aggregates of facts. For
example, the wage earned by an individual worker at any one time, taken by itself, is not statistics.
b) It does not deal with qualitative characteristics directly: statistics is not applicable to qualitative
characteristics such as beauty, honesty, poverty, standard of living and so on since these cannot be expressed in
quantitative terms.
c) Statistical conclusions are not universally true: since statistics is not an exact science, as is the case with
natural sciences, the statistical conclusions are true only under certain assumptions.
d) It can be misused: statistics cannot be used to full advantage in the absence of proper understanding of the
subject matter.
1.5 Levels of Measurement
Proper knowledge about the nature and type of data to be dealt with is essential in order to specify and apply the
proper statistical method for their analysis and inferences.
Scale Types
Measurement is the assignment of values to objects or events in a systematic fashion. Four levels of measurement
scales are commonly distinguished: nominal, ordinal, interval, and ratio. The first two are qualitative while the
last two are quantitative.
Nominal scale: The values of a nominal attribute are just different names, i.e., nominal attributes provide only
enough information to distinguish one object from another. The qualities have no ranking or ordering and no
numerical or quantitative value. These types of data consist of names, labels and categories.
Example 1.3: Eye color: brown, black, etc, sex: male, female.
• In this scale, one is different from the other
• Arithmetic operations (+, −, ×, ÷) are not applicable, and ordering comparisons (<, >, etc.) are impossible.
Ordinal scale: - defined as nominal data that can be ordered or ranked.
• Can be arranged in some order, but the differences between the data values are meaningless.
• Data consisting of an ordering of ranking of measurements are said to be on an ordinal scale of
measurements. That is, the values of an ordinal scale provide enough information to order objects.
• One is different from and greater /better/ less than the other
• Arithmetic operations (+, -, *, ÷) are impossible, comparison (<, >, ≠, etc) is possible.
Example 1.4 -Letter grading (A, B, C, D, F), -Rating scales (excellent, very good, good, fair, poor), military
status (general, colonel, lieutenant, etc).
Interval Level: data are defined as ordinal data and the differences between data values are meaningful. However,
there is no true zero, or starting point, and the ratio of data values are meaningless. Note: Celsius & Fahrenheit
temperature readings have no meaningful zero and ratios are meaningless.
In this measurement scale:-
• One is different from, and greater/better/less than, another by a certain amount of difference.
• It is possible to add and subtract. For example: 80°C − 50°C = 30°C, 70°C − 40°C = 30°C.
• Multiplication and division are not possible. For example: 60°C = 3(20°C), but this does not imply that an
object at 60°C is three times as hot as an object at 20°C.
Most common examples are: temperature, IQ.
Ratio scale: Similar to interval, except there is a true zero (absolute absence), or starting point, and the ratios of
data values have meaning.
• Arithmetic operations (+, -, *, ÷) are applicable. For ratio variables, both differences and ratios are
meaningful.
• One is different/larger /taller/ better/ less by a certain amount of difference and so much times than the
other.
• This measurement scale provides better information than interval scale of measurement.
Example 1.5: weight, age, number of students.
CHAPTER TWO: METHODS OF DATA COLLECTION AND PRESENTATION
Data:- is a measurement or observation value recorded for a certain element or variable. It is the raw material of
statistics. It can be obtained either by measurement or by counting.
Sources of data
The statistical data may be classified under two categories depending up on the sources.
Primary data: - Data collected by the investigator himself for the purpose of a specific inquiry or study.
Three of the most common methods of collecting Primary data are:
• Telephone survey
• Mailed questionnaire
• Personal interview.
Secondary data: - When an investigator uses data which have already been collected by others, such data are
called secondary data. Examples of secondary data: books, reports, magazines, etc.
2.2 Methods of Data Presentation
The presentation of data is broadly classified in to the following two categories:
✓ Frequency distribution /Tabular presentation
✓ Diagrammatic and Graphic presentation.
2.2.1 Frequency distribution
Frequency:- is the number of times a certain value or class of values occurs.
Frequency distribution (FD):- is the organization of raw data in table form using classes and frequency.
Definition of some basic terms
• Grouped frequency distribution: is a FD when several numbers are grouped into one class.
• Class limits (CL): They separate one class from another. The limits are values that could actually appear in the
data, and there are gaps between the upper limit of one class and the lower limit of the next class.
• Unit of measure (U): This is the smallest possible difference between successive values, e.g. 1, 0.1, 0.01, …
• Class boundaries: Separate one class in a grouped frequency distribution from the other. The boundary has
one more decimal place than the raw data. There is no gap between the upper boundaries of one class and
the lower boundaries of the succeeding class. Lower class boundary is found by subtracting half of the unit
of measure from the lower class limit and upper class boundary is found by adding half unit measure to the
upper class limit.
• Class width (W): The difference between the upper and lower class boundaries of any class. The class width is
also the difference between the lower limits (or upper limits) of two consecutive classes.
• Class mark (Midpoint): It is found by adding the lower and upper class limits (or boundaries) and dividing the
sum by two.
• Cumulative frequency (CF): It is the number of observation less than the upper class boundary or greater
than the lower class boundary of class.
• CF (Less than type): it is the number of values less than the upper class boundary of a given class.
• CF (Greater than type): it is the number of values greater than the lower class boundary of a given class.
• Relative frequency (Rf ):The class frequency divided by the total frequency. This gives the percent of
values falling in that class.
• Rfi = fi/n= fi/∑fi
• Relative cumulative frequency (RCf): The class cumulative frequency divided by the total frequency
gives the percent of the values which are less than the upper class boundary or the reverse.
RCfi = Cfi/n= Cfi/∑fi
Example 2.4: The following data are on the number of minutes to travel from home to work for a group of
automobile workers: 28 25 48 37 41 19 32 26 16 23 23 29 36 31 26 21 32 25 31 43 35 42 38 33 28.
Construct a frequency distribution for this data.
Solution:
✓ Range = 48 – 16 =32
✓ K=1+3.322log10 25=5.64≈6
✓ W=32/6=5.33 rounding up to the nearest integer i.e W=6.
Let the lower limit of the first class be 16 then the frequency distribution is as follows:
Class limit Class boundaries Tally Frequency
16-21 15.5-21.5 \\\ 3
22-27 21.5-27.5 \\\\\ \ 6
28-33 27.5-33.5 \\\\\ \\\ 8
34-39 33.5-39.5 \\\\ 4
40-45 39.5-45.5 \\\ 3
46-51 45.5-51.5 \ 1
Total 25
The final frequency distribution is shown in table below.
Table: The distribution of the times
Time (in minute) Number of workers
16-21 3
22-27 6
28-33 8
34-39 4
40-45 3
46-51 1
Total 25
This frequency distribution is more understandable than the raw data. For instance, many observations are found
in the second class and third class. This in turn implies that many workers took around 22 to 33 minutes to travel
from home to work.
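For readers who want to check such a construction quickly, the steps above (range, number of classes by Sturges' rule, class width, tallying) can be scripted. The following Python sketch is only illustrative; the data and class limits are those of Example 2.4, and the variable names are ours:

```python
import math

# Travel times (in minutes) from Example 2.4
data = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36, 31, 26,
        21, 32, 25, 31, 43, 35, 42, 38, 33, 28]

n = len(data)
value_range = max(data) - min(data)            # 48 - 16 = 32
k = round(1 + 3.322 * math.log10(n))           # number of classes, about 6
w = math.ceil(value_range / k)                 # class width rounded up to 6

lower = min(data)                              # lower limit of the first class
for i in range(k):
    lo, hi = lower + i * w, lower + (i + 1) * w - 1   # class limits, e.g. 16-21
    freq = sum(lo <= x <= hi for x in data)           # class frequency
    print(f"{lo}-{hi}: {freq}")
```

Running this reproduces the frequencies 3, 6, 8, 4, 3, 1 found by tallying above.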
Types of frequency distributions
Based on the type of frequency assigned to the classes we have three types of frequency distributions:
➢ Absolute frequency distribution
➢ Relative frequency distribution
➢ Cumulative frequency distribution
The frequency distributions that we have seen in the previous examples (examples 2.3 and 2.4 ) are absolute
frequency distributions because the frequencies assigned are absolute frequencies.
Definition 2.1: A relative frequency distribution is a distribution which specifies the frequency of a class
relative to the total frequency.
Example 2.5: Convert the above absolute frequency distribution in example 2.4 to a relative frequency
distribution.
Solution: First we find the relative frequency of each class. The relative frequency of a class is the frequency of
the class divided by the total number of observations. For instance the relative frequency of the first class is
3/25=0.12, the relative frequency of the second class is 6/25=0.24, and so on. Thus, the relative frequency
distribution is shown in the table below.
Definition 2.2: Cumulative frequency refers to the number of observations that are below a specified value or
that are above a specified value.
Note: Class boundaries are mostly used to obtain cumulative frequencies. Based on whether the observations are
bounded from above or from below, we can have a cumulative less than or a cumulative more than frequency
distributions, respectively.
Example 2.6: Convert the absolute frequency distribution in example 2.4 into:
i) a cumulative less than frequency distribution.
ii) a cumulative more than frequency distribution.
Solution:
i) We use the class boundaries to form cumulative frequencies. For instance, there is no observation which
is less than 15.5, 3 observations are less than 21.5, 9 observations are less than 27.5 and so on. Thus,
the following less than cumulative frequency distribution is obtained.
Table: Distribution of the number of children.
Number of Children Frequency Relative frequency
2 5 .17
3 7 .23
4 8 .27
5 4 .13
6 1 .03
7 2 .07
8 3 .1
Total 30 1
As we can see from this frequency distribution most families have 4 or 3 children. It would have been difficult to
observe such feature of the data if we did not organize the raw data using a frequency distribution.
Note: Up to now we have seen frequency distributions for quantitative data; we can have also frequency
distributions for qualitative (categorical) data.
Frequency polygon: is a graphic form of a frequency distribution. It can be constructed by plotting the class
frequencies against class marks and joining them by a set of line segments.
Note: we should add two classes with zero frequencies at the two ends of the frequency distribution to complete
the polygon.
Example 2.10: Construct a frequency polygon for the frequency distribution of the time spent by the automobile
workers that we have seen in example 2.4.
Example 2.11: Draw a bar chart for the following coffee production data.
Table: Coffee productions from 1990 to 1995.
[Bar chart of coffee production by production year, 1990–1995; the vertical axis (production) runs from 0 to 120.]
Pie-chart: it is a circle divided by radial lines into sections or sectors so that the area of each sector is proportional
to the size of the figure represented.
Pie-chart construction:
✓ Calculate the percentage frequency of each component: (fi/n) × 100.
✓ Calculate the degree measure of each sector: (fi/n) × 360°.
✓ Draw the circle using a protractor and compass.
Example 2.13: Draw a pie-chart to represent the following data on a certain family expenditure.
Table: Family expenditure.
Item                    Food    Clothing  House rent  Fuel & light  Miscellaneous  Total
Expenditure (in birr)   50      30        20          15            35             150
Percentage frequency    33.33   20        13.33       10            23.33          100
Angle of the sector     120°    72°       48°         36°           84°            360°
[Pie chart of the family expenditure by item: food, clothing, house rent, fuel and light, miscellaneous.]
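The percentage frequencies and sector angles in the table can be reproduced with a few lines of code. A minimal Python sketch (the expenditure figures are those of Example 2.13):

```python
# Family expenditure (in birr) from Example 2.13
expenditure = {"Food": 50, "Clothing": 30, "House rent": 20,
               "Fuel & light": 15, "Miscellaneous": 35}

total = sum(expenditure.values())          # 150 birr
for item, f in expenditure.items():
    percent = f / total * 100              # percentage frequency of the component
    angle = f / total * 360                # degree measure of the sector
    print(f"{item}: {percent:.2f}% -> {angle:.0f} degrees")
```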
Example 2.14: The following data are the blood types of 50 volunteers at a blood plasma donation clinic:
O A O AB A A O O B A O A AB B O O O A B A A O A A B O B A O AB A O O A
B A A A O B O O A O A B O AB A O
a) Organize this data using a categorical frequency distribution
b) Present the data using both a pie and a bar chart.
Solution
a) The classes of the frequency distribution are A, B, O, AB. Count the number of donors for each of the blood
types.
Blood type   Frequency   Percent
A            19          38.0
B            8           16.0
O            19          38.0
AB           4           8.0
Total        50          100.0
b) Pie chart
Find the percentage of donors for each blood type. In order to find the angles of the sector for each blood
type, multiply the corresponding percentage by 360° and divide by 100.
[Pie chart and bar chart of the blood-type distribution (A, B, O, AB).]
UNIT THREE: MEASURES OF CENTRAL TENDENCY
Objectives:
Having studied this unit, you should be able to:
✓ understand the role of descriptive statistics in summarization, description and interpretation of data.
✓ use several numerical methods belonging to measures of central tendency to describe the characteristics
of a data set.
Note that the individual values of the distribution must have a tendency to cluster around an average. In view of
this requirement an average is also referred to as a measure of central tendency.
Objectives of measuring central tendency:
✓ To get one single value that describes the characteristics of the entire group.
✓ To facilitate comparison between different data sets.
The following summation notation is used in what follows:
∑_{i=1}^{n} x_i y_i = x_1y_1 + x_2y_2 + x_3y_3 + ⋯ + x_n y_n
∑_{i=1}^{n} x_i f_i = x_1f_1 + x_2f_2 + x_3f_3 + ⋯ + x_n f_n
Definition 3.2:
i) Let x_1, x_2, x_3, …, x_n be the values of the variable X. The simple arithmetic mean, denoted by x̄, is the sum
of these observations of X divided by the number of values:
x̄ = (x_1 + x_2 + x_3 + ⋯ + x_n)/n = (∑_{i=1}^{n} x_i)/n
ii) If the numbers x_1, x_2, x_3, …, x_k occur with frequencies f_1, f_2, f_3, …, f_k, respectively, then the mean can be
defined in a more compact form as
x̄ = (x_1f_1 + x_2f_2 + x_3f_3 + ⋯ + x_kf_k)/(f_1 + f_2 + f_3 + ⋯ + f_k) = (∑_{i=1}^{k} f_i x_i)/(∑_{i=1}^{k} f_i)
Note that if the data refers to a population data the mean is denoted by the Greek letter µ (read as mu).
Arithmetic mean for raw data (ungrouped data)
Example 3.1: The following data is the weight (in Kg) of eight youths: 32,37,41,39,36,43,48 and 36. Calculate
the arithmetic mean of their weight.
Solution:
x̄ = (∑_{i=1}^{8} x_i)/n = (32 + 37 + 41 + ⋯ + 36)/8 = 312/8 = 39
Example 3.2: The ages of a random sample of patients in a given hospital in Ethiopia is given below:
Age 10 12 14 16 18 20 22
Number of patients 3 6 10 14 11 5 4
Calculate the average age of these patients.
Solution:
Age (xi) Number of patients (fi) (𝑓𝑖 𝑥𝑖 )
10 3 30
12 6 72
14 10 140
16 14 224
18 11 198
20 5 100
22 4 88
Total 53 852
x̄ = (∑_{i=1}^{k} f_i x_i)/(∑_{i=1}^{k} f_i) = (x_1f_1 + x_2f_2 + x_3f_3 + ⋯ + x_kf_k)/(f_1 + f_2 + f_3 + ⋯ + f_k)
= (10×3 + 12×6 + 14×10 + 16×14 + 18×11 + 20×5 + 22×4)/(3 + 6 + 10 + 14 + 11 + 5 + 4)
= (30 + 72 + 140 + 224 + 198 + 100 + 88)/53 = 852/53 = 16.075
Thus the mean age of these patients is 16.075.
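The same calculation is easy to script. A minimal Python sketch using the age data of Example 3.2:

```python
# Ages and frequencies of the patients (Example 3.2)
ages = [10, 12, 14, 16, 18, 20, 22]
freqs = [3, 6, 10, 14, 11, 5, 4]

# Mean of a frequency distribution: sum(f * x) / sum(f)
mean = sum(f * x for x, f in zip(ages, freqs)) / sum(freqs)
print(round(mean, 3))   # 16.075
```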
The weighted arithmetic mean
In some cases the data in the sample or population should not be weighted equally; rather, each value should be
weighted according to its importance. There is a measure of average for such problems known as the weighted arithmetic mean.
Weighted arithmetic mean is used to calculate the average when the relative importance of the observations
differs. This relative importance is technically known as weight. Weight could be a frequency or numerical
coefficient associated with observations.
Example 3.3: The GPA or CGPA of a student is a good example of a weighted arithmetic mean. Suppose that
Solomon obtained the following grades in the first semester of the freshman program at Jima University in 2000.
Course Credit hour (wi) Grade
Math101 4 A=4
Bio101 3 C=2
Chem101 3 B=3
Phys101 4 B=3
Flen101 3 C=2
Find the GPA of Solomon.
x̄_w = GPA = (4(4) + 3(2) + 3(3) + 4(3) + 3(2))/(4 + 3 + 3 + 4 + 3) = 49/17 = 2.88
Example 3.4: In a vacancy for a position of botanist in an organization, the criteria of selection were work
experience, entrance exam, and, interview result. The relative importance of these criteria was regarded to be
different. The weights of these criteria and the scores obtained by 3 candidates (out of 100 in each criterion) are
given in the following table. In addition, the selection of a candidate is based on average result on these criteria.
Criterion Weight Candidates
Tesfaye Gutema Kedir
Work experience 4 70 89 85
Entrance exam 3 78 83 89
Interview result 2 90 92 90
Who is the appropriate candidate for the position based on the criteria?
Solution: We use the weighted mean since the relative importances of these criteria are different.
Criterion Weight Candidates
Tesfaye Gutema Kedir
xi xiwi xi xiwi xi xiwi
Work experience 4 70 280 89 356 85 340
Entrance exam 3 78 234 83 249 89 267
Interview result 2 90 180 92 184 90 180
Total 9 238 694 264 789 264 787
The weighted mean and the simple arithmetic mean for the applicants are as follows:
Applicant Tesfaye Gutema Kedir
Weighted mean 694/9=77.11 789/9=87.67 787/9=87.44
Simple arithmetic mean 238/3=79.33 264/3=88 264/3=88
If we use the simple arithmetic mean of the scores, both Gutema and Kedir have got equal chances to be recruited.
However, the relative importance of the criteria is different. So we have to use the weighted mean for
discriminating among the candidates. The weighted mean of the scores obtained by Gutema is larger than the
others. So Gutema should be recruited for the job.
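The contrast between the weighted and the simple arithmetic mean can be seen directly in code. A short illustrative Python sketch with the scores of Example 3.4:

```python
weights = [4, 3, 2]   # work experience, entrance exam, interview result

candidates = {"Tesfaye": [70, 78, 90],
              "Gutema":  [89, 83, 92],
              "Kedir":   [85, 89, 90]}

for name, scores in candidates.items():
    weighted = sum(w * x for w, x in zip(weights, scores)) / sum(weights)
    simple = sum(scores) / len(scores)
    print(f"{name}: weighted mean = {weighted:.2f}, simple mean = {simple:.2f}")
```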
Properties of arithmetic mean
i. It can be computed for any set of numerical data; it always exists and is unique.
ii. It depends on all observations.
iii. The sum of deviations of the observations about the mean is zero i.e.
∑_{i=1}^{n} (x_i − x̄) = 0
iv. It is greatly affected by extreme values.
v. It lends itself to further statistical treatment, for instance, combinations of means.
vi. It is relatively reliable, i.e. it is not greatly affected by fluctuations in sampling.
vii. The sum of squares of deviations of all observations about the mean is the minimum
i.e. ∑_{i=1}^{n} (x_i − x̄)² ≤ ∑_{i=1}^{n} (x_i − A)² for any constant A.
1.3.2 Geometric mean
Definition 3.4: The geometric mean of any n positive numbers is the nth root of the products of the numbers.
Symbolically if 𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛 are given their geometric (G.M) mean is given by
G.M = (x_1 · x_2 · x_3 · … · x_n)^{1/n} = (∏_{i=1}^{n} x_i)^{1/n}
Median for raw data
i. If the number of observations, say n, is odd, then the median is equal to the ((n + 1)/2)th observation of
the array.
ii. If the number of observations n is even, then the median is equal to the sum of the (n/2)th observation and
the (n/2 + 1)th observation divided by two.
Notation: If X is the variable under consideration, then 𝑥̃ is used to denote the median.
Example 3.9: Find the median for the following sets of data:
i. 10 5 7 9 6 5 4
Solution: First arrange the data in the form of an array.
4 5 5 6 7 9 10
Here we have n = 7, which is odd.
Therefore, the median x̃ = the ((n + 1)/2)th observation = the 4th observation = 6.
ii. 10 5 7 9 6 5 4 8
Solution: Arrange the data in ascending order.
4 5 5 6 7 8 9 10
Here n = 8, which is even.
Therefore, x̃ = [(n/2)th observation + (n/2 + 1)th observation]/2
= (4th observation + 5th observation)/2 = (6 + 7)/2 = 6.5
iii. A shop keeper (sales person) recorded the number of video cassette recorders (VCRs) sold per month
over a two year period. Find the median number VCRs sold.
Number of sets sold Frequency ( months) Cumulative frequency
1 3 3
2 8 11
3 5 16
4 4 20
5 2 22
6 1 23
7 1 24
Solution: Here n = 24, which is even, so the median is the average of the 12th and 13th observations. From the
cumulative frequencies, both lie at the value 3, so the median number of VCRs sold is x̃ = (3 + 3)/2 = 3.
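The two rules for the median of raw data translate directly into code. A minimal Python sketch, checked against parts i and ii of Example 3.9:

```python
def median(values):
    """Median of raw (ungrouped) data."""
    arr = sorted(values)                       # form the array
    n = len(arr)
    mid = n // 2
    if n % 2 == 1:                             # odd n: the ((n + 1)/2)th observation
        return arr[mid]
    return (arr[mid - 1] + arr[mid]) / 2       # even n: average of the two middle values

print(median([10, 5, 7, 9, 6, 5, 4]))          # 6
print(median([10, 5, 7, 9, 6, 5, 4, 8]))       # 6.5
```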
The mode for raw data
Example 3.10: Find the modal value for the following sets of data.
i. 5 6 5 8 7 4 . In this data set, 5 is the most frequent value. Therefore, the mode is 5. Since the modal
value is only one number, we call the distribution unimodal.
ii. 1 2 3 4 8 2 5 4 6. In this data, the modal values are 2 and 4 since both 2 and 4 appear most frequently
and they occur an equal number of times. These kinds of distributions are called bimodal distributions.
iii. 1 2 4 3 5 6 8 7 In this data set, all values appear equal number of times so there is no modal
value.
Note:
✓ If a distribution has more than two modal values then we call the distribution multimodal.
✓ If in a set of observed values, all values occur once or equal number of times, there is no mode.
✓ The mode is also useful in finding the most typical case when the data are nominal or categorical.
Example 3.11: A survey showed the following distribution for the number of students enrolled in each field. Find
the mode.
Subject Number of students
Business 850
Liberal arts 825
Computer sciences 645
Education 478
General studies 100
Solution: Since the category with the highest frequency is business, the most typical case is a business major.
Properties of modal value
➢ It is easy to calculate and understand.
➢ It is not affected by extreme values.
➢ It is ill-defined, indeterminate and indefinite sometimes.
➢ It is not based on all observations.
➢ It is not used in further analysis of data.
The mean, median, and mode of grouped data
The mean for grouped data can be found by assuming that the values in each interval are centered at the mid-point
(class mark) of the interval.
Example 3.12: Consider the frequency distribution of the time spent by the automobile workers. Find the mean
time spent by these workers from this frequency distribution.
Time (in minute) Class mark (xi) Number of workers f×xi
15.5- 21.5 18.5 3 55.5
21.5-27.5 24.5 6 147
27.5-33.5 30.5 8 244
33.5-39.5 36.5 4 146
39.5-45.5 42.5 3 127.5
45.5-51.5 48.5 1 48.5
Total 25 768.5
Solution:
x̄ = (∑_{i=1}^{6} f_i x_i)/(∑ f_i) = 768.5/25 = 30.74
Note: In case of grouped data if any class interval is open, arithmetic mean cannot be calculated.
The median for grouped data can be approximated by the following formula.
x̃ = L_m + ((n/2 − cf)/f_m) · w
where Lm= lower class boundary for the median class.
n= total number of observations in the distribution.
cf= less than cumulative frequency for the class preceding the median class.
w= class width for median class.
fm=frequency for median class.
Note that the median class is the class containing the (n/2)th observation, i.e., the first class whose less-than
cumulative frequency is greater than or equal to n/2.
Example 3.13: Find the median for the following frequency distribution.
Class boundaries Frequency (f) Cumulative frequency
5.5-10.5 1 1
10.5-15.5 2 3
15.5-20.5 3 6
20.5-25.5 5 11
25.5-30.5 4 15
30.5-35.5 3 18
35.5-40.5 2 20
Solution: The class containing the (n/2)th observation, i.e., the 10th observation, is the median class. This class has
class boundaries 20.5 and 25.5 (the 4th class).
x̃ = L_m + ((n/2 − cf)/f_m) · w = 20.5 + ((10 − 6)/5) × 5 = 24.5
Therefore, the median is 24.5.
Note:
i. We approximate the median by assuming that the values in the median class are evenly distributed.
ii. We can compute the median for open-ended frequency distribution as long as the middle value does
not occur in the open-ended class.
The mode for grouped data can be estimated by the following formula.
The modal value is denoted by 𝑥̂. For grouped data we can compute the mode as follows:
x̂ = L_1 + ((f_1 − f_0)/(2f_1 − f_0 − f_2)) · w
where f1= frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class next to the modal class
L1= lower class boundary of the modal class
W = class width of the modal class
Note: The modal class is the class with the highest frequency.
Example 3.14: Calculate the modal time spent by the automobile workers.
Time (in minute) Number of workers
15.5- 21.5 3
21.5-27.5 6
27.5-33.5 8
33.5-39.5 4
39.5-45.5 3
45.5-51.5 1
The modal class is the class with largest frequency and it is the third class.
x̂ = 27.5 + ((8 − 6)/(16 − 6 − 4)) × 6 = 29.5
Therefore, the modal time spent is 29.5 minutes.
Note: The mode can be calculated for distributions with open ended classes.
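Both interpolation formulas are straightforward to implement. The following Python sketch reproduces Examples 3.13 and 3.14 (class boundaries and frequencies as given there):

```python
def grouped_median(boundaries, freqs):
    """Median of a grouped frequency distribution by interpolation."""
    n = sum(freqs)
    cum = 0                                     # cumulative frequency before current class
    for (lo, hi), f in zip(boundaries, freqs):
        if cum + f >= n / 2:                    # median class reached
            return lo + (n / 2 - cum) / f * (hi - lo)
        cum += f

def grouped_mode(boundaries, freqs):
    """Mode of a grouped frequency distribution."""
    i = freqs.index(max(freqs))                 # modal class = highest frequency
    lo, hi = boundaries[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0           # frequency of the preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # frequency of the succeeding class
    return lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

b = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
     (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
print(grouped_median(b, [1, 2, 3, 5, 4, 3, 2]))    # 24.5 (Example 3.13)

t = [(15.5, 21.5), (21.5, 27.5), (27.5, 33.5), (33.5, 39.5),
     (39.5, 45.5), (45.5, 51.5)]
print(grouped_mode(t, [3, 6, 8, 4, 3, 1]))         # 29.5 (Example 3.14)
```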
Deciles are nine points which divide an array into 10 parts in such a way that each part contains equal number
of elements. The 1st, 2nd,…, and the 9th points are known as the 1st, 2nd,…, and the 9th deciles and are usually
denoted by D1,D2,…,D9, respectively.
Percentiles are 99 points which divide an array into 100 parts in such a way that each part consists of equal
number of elements. The 1st, 2nd… and the 99th points are known as the 1st, 2nd… and the 99th percentiles and
are usually denoted by P1, P2… P99, respectively.
Note: The array should be in ascending order in order to get the quantiles.
i. Quantile points for raw data
First form an array in an ascending order and then apply the following procedure.
Q_k = the value at the {k(n + 1)/4}th position
D_k = the value at the {k(n + 1)/10}th position
P_k = the value at the {k(n + 1)/100}th position
Example 3.15: The following data relate to sizes of shoes sold at a stock during a week. Find the quartiles, the
seventh decile and the 90th percentile.
Size of shoes 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5
Number of pairs 2 5 15 30 60 40 23 11 4 1
Solution: The total number of observations is 191.
Q_1 = the size at the {1(191 + 1)/4}th position = the size at the 48th position = 6.5
Q_3 = the size at the {3(191 + 1)/4}th position = the size at the 144th position = 7.5
D_7 = the size at the {7(191 + 1)/10}th position = the size at the 134.4th position ≈ the size at the 134th position = 7.5
P_90 = the size at the {90(191 + 1)/100}th position = the size at the 172.8th position ≈ the size at the 173rd position = 8
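The positional rule used above is easy to script once the frequency table is expanded into an ordered array. A Python sketch for Example 3.15 (fractional positions are rounded to the nearest position, as in the example):

```python
# Shoe sizes and number of pairs sold (Example 3.15)
sizes = [5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5]
pairs = [2, 5, 15, 30, 60, 40, 23, 11, 4, 1]

array = [s for s, f in zip(sizes, pairs) for _ in range(f)]   # ordered array of 191 values
n = len(array)

def quantile(k, m):
    """k-th quantile point when the array is divided into m parts (m = 4, 10 or 100)."""
    pos = k * (n + 1) / m                    # position in the ordered array
    return array[round(pos) - 1]             # nearest position, counted from 1

print(quantile(1, 4), quantile(3, 4))        # Q1 = 6.5, Q3 = 7.5
print(quantile(7, 10))                       # D7 = 7.5
print(quantile(90, 100))                     # P90 = 8
```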
UNIT FOUR: MEASURES OF VARIATION
Objectives:
Having studied this unit, you should be able to
✓ understand the importance of measuring the variability (dispersion) in a data set.
✓ measure the scatter or dispersion in a data set.
✓ understand ‘moments’ as a convenient and unifying method for summarizing several
descriptive statistical measures.
✓ measure the extent to which the distribution of values in a data set deviate from symmetry.
4.1 Introduction and objectives of measuring variation
We have seen that averages are representatives of a frequency distribution. But they fail to give a
complete picture of the distribution. They do not tell anything about the spread or dispersion of
observations within the distribution. Suppose that we have the distribution of yield (kg per plot)
of two rice varieties from 5 plots each.
Variety 1: 45 42 42 41 40
Variety 2: 54 48 42 33 30
The mean yield of both varieties is 42 kg. The mean yield of variety 1 is close to the values in this
variety. On the other hand, the mean yield of variety 2 is not close to the values in variety 2. The
mean doesn’t tell us how the observations are close to each other. This example suggests that a
measure of central tendency alone is not sufficient to describe a frequency distribution. Therefore,
we should have a measure of spreads of observations.
There are different measures of dispersion. In this chapter we shall discuss the most commonly used
measures of dispersion or variation, such as the range, quartile deviation, standard deviation and coefficient
of variation, and measures of shape such as skewness and kurtosis.
Objectives of measuring variation
➢ To describe dispersion (variability) in a data.
➢ To compare the spread in two or more distributions.
➢ To determine the reliability of an average.
Note: The desirable properties of good measures of variation are almost identical with that of a
good measure of central tendency.
4.2 Absolute and relative measures
Measures of variation may be either absolute or relative. Absolute measures of variation are
expressed in the same unit of measurement in which the original data are given. These values may
be used to compare the variation in two distributions provided that the variables are in the same
units and of the same average size.
In case the two sets of data are expressed in different units, however, such as quintals of sugar
versus tonnes of sugarcane, or if the average sizes are very different, such as a manager's salary versus
a worker's salary, the absolute measures of dispersion are not comparable. In such cases measures
of relative dispersion should be used. A measure of relative dispersion is the ratio of a measure of
absolute dispersion to an appropriate measure of central tendency. It is a unitless measure.
4.3 Types of measures of variation
The range and relative range
Definition 4.1: Range is defined as the difference between the maximum and minimum
observations in a set of data. 𝑅𝑎𝑛𝑔𝑒 = 𝑀𝑎𝑥𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒 − 𝑀𝑖𝑛𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒
Range is the crudest absolute measures of variation. It is widely used in the construction of quality
control charts and description of daily temperature.
Definition 4.2: Relative range (RR) is defined as RR = Range/(maximum value + minimum value).
Definition 4.3: The variance is the average of the squares of the distance each value is from the
mean. The symbol for the population variance is σ2 (σ is the Greek lower case letter sigma). Let
x1,x2,…,xN be the measurements on N population units then, the population variance is given by
the formula:
σ² = (∑_{i=1}^{N} (x_i − µ)²)/N = {∑_{i=1}^{N} x_i² − (∑ x_i)²/N}/N,
where µ = population mean = (∑_{i=1}^{N} x_i)/N and N = population size.
Definition 4.4: The standard deviation is the square root of the variance. The symbol for the
population standard deviation is 𝜎. The corresponding formula for the standard deviation is
σ = √σ² = √[(∑_{i=1}^{N} (x_i − µ)²)/N].
Example 4.1: The height of members of a certain committee was measured in inches and the data
is presented below.
Height(x): 69 66 67 69 64 63 65 68 72
µ = population mean = (∑_{i=1}^{N} x_i)/N = (69 + 66 + ⋯ + 72)/9 = 603/9 = 67 inches
x − µ:     2   −1   0   2   −3   −4   −2   1   5
(x − µ)²:  4    1   0   4    9   16    4   1   25
σ² = (∑_{i=1}^{N} (x_i − µ)²)/N = (4 + 1 + 0 + 4 + 9 + 16 + 4 + 1 + 25)/9 = 64/9 = 7.11 inch²
and σ = √σ² = √7.11 ≈ 2.67 inches.
Definition 4.5: The sample variance is denoted by S², and its formula is
S² = (∑_{i=1}^{n} (x_i − x̄)²)/(n − 1) = (∑ f(x − x̄)²)/(n − 1) = {∑ fx² − (∑ fx)²/n}/(n − 1).
Definition 4.6: The sample standard deviation, denoted by S, is the square root of the sample
variance
S = √S² = √[(∑_{i=1}^{n} (x_i − x̄)²)/(n − 1)] = √[(∑ f(x − x̄)²)/(n − 1)].
Example 4.2: For a newly created position, a manager interviewed the following numbers of
applicants each day over a five-day period: 16, 19, 15, 15, and 14. Find the variance and standard
deviation.
Solution:
x̄ = 79/5 = 15.8
S² = (∑(x − x̄)²)/(n − 1) = 14.8/4 = 3.7
S² = {∑x² − (∑x)²/n}/(n − 1) = {1263 − (79)²/5}/4 = 14.8/4 = 3.7
S = √3.7 ≈ 1.92
Note that the procedure for finding the variance and standard deviation for grouped data is similar
to that for finding the mean for grouped data, and it uses the mid-points of each class.
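Both computational forms of the sample variance can be checked with a few lines of code. A minimal Python sketch using the data of Example 4.2:

```python
import math

# Number of applicants interviewed per day (Example 4.2)
x = [16, 19, 15, 15, 14]
n = len(x)
mean = sum(x) / n                                                     # 15.8

s2_definition = sum((xi - mean) ** 2 for xi in x) / (n - 1)           # 14.8 / 4 = 3.7
s2_shortcut = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)  # same value
s = math.sqrt(s2_definition)                                          # about 1.92

print(round(s2_definition, 2), round(s2_shortcut, 2), round(s, 2))
```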
Properties of variance
✓ The unit of measurement of the variance is the square of the unit of measurement of the
observed values. It is one of its limitations.
✓ The variance gives more weight to extreme values as compared to those which are near to
mean value, because the difference is squared in variance.
✓ It is based on all observations in the data set.
Properties of standard deviation
✓ Standard deviation is considered to be the best measure of dispersion and is used widely.
✓ There is, however, one difficulty with it. If the unit of measurement of variables of two
series is not the same, then their variability cannot be compared by comparing the values
of standard deviation.
Uses of the variance and standard deviation
✓ The variance and standard deviations can be used to determine the spread of data,
consistency of a variable and the proportion of data values that fall within a specified
interval in a distribution.
✓ If the variance or standard deviation is large, the data is more dispersed. This information is
useful in comparing two or more data sets to determine which is more (most) variable.
✓ Finally, the variance and standard deviation are used quite often in inferential statistics.
Coefficient of variation (CV)
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
Coefficient of variation is used in such problems where we want to compare the variability of two
or more different series. Coefficient of variation is the ratio of the standard deviation to the
arithmetic mean, usually expressed in percent:
CV = (S/x̄) × 100%
where S is the standard deviation and x̄ is the mean of the observations.
A distribution having less coefficient of variation is said to be less variable or more consistent or
more uniform or more homogeneous.
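The comparison itself is mechanical once the means and standard deviations are known. The Python sketch below uses two purely illustrative (hypothetical) sets of marks just to show the calculation:

```python
import statistics

# Hypothetical marks of two groups (illustrative values only)
group_1 = [60, 65, 70, 75, 80]
group_2 = [40, 55, 70, 85, 100]

for name, marks in [("Group 1", group_1), ("Group 2", group_2)]:
    cv = statistics.stdev(marks) / statistics.mean(marks) * 100   # CV = (S / mean) * 100%
    print(f"{name}: CV = {cv:.1f}%")

# The group with the smaller CV is the more consistent (homogeneous) one.
```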
Example 4.3: Last semester, the students of Biology and Chemistry Departments took Stat 273
course. At the end of the semester, the following information was recorded.
For n values x_1, x_2, …, x_n, the rth moment about any constant A is defined as ∑(x_i − A)^r / n.
The most known moments are moments about the mean also known as the central moments and
the moments about zero (also known as moments about the origin.)
The rth moment about the mean, µr, is given by:
µ_r = ∑(x_i − x̄)^r / n.
Special cases: µ_0 = 1, µ_1 = 0, µ_2 = s².
The rth moment about the origin, µ_r′, is given by:
µ_r′ = ∑ x_i^r / n.
Special cases: µ_0′ = 1, µ_1′ = x̄, µ_2′ = ∑ x_i²/n.
Skewness: it refers to lack of symmetry in a distribution.
Note: for a symmetrical and unimodal distribution:
i) Mean =median =mode
ii) The lower and upper quartiles are equidistant from the median, so also are corresponding
pairs of deciles and percentiles.
iii) Sum of positive deviations from the median is equal to the sum of negative deviations (signs
ignored).
iv) The two tails of the frequency curve are equal in length from the central value.
If a distribution is not symmetrical we call it skewed distribution.
Measures of skewness
i) Pearsonian coefficient of skewness (Pcsk), defined as:
Pcsk = (mean − mode)/s.d
In moderately skewed distributions, mode = mean − 3(mean − median), so that
Pcsk = 3(mean − median)/s.d
Interpretation: Pcsk > 0 indicates a positively skewed distribution, Pcsk < 0 a negatively skewed distribution, and Pcsk = 0 a symmetrical distribution.
Note: in a negatively skewed distribution larger values are more frequent than smaller values. In
a positively skewed distribution smaller values are more frequent than larger values.
Example 4.7: If the mean, mode and s.d of a frequency distribution are 70.2, 73.6, and 6.4, respectively,
what can one state about its skewness?
Pcsk = (mean − mode)/s.d = (70.2 − 73.6)/6.4 = −0.53.
This figure suggests that there is some negative skewness.
When the values of a distribution are closely bunched around the mode in such a way that the peak
of the distribution becomes relatively high, the distribution is said to be leptokurtic. If it is flat
topped we call it platykurtic. A distribution which is neither highly peaked nor flat topped is known
as a meso-kurtic distribution (normal).
Measures of kurtosis
UNIT FIVE: ELEMENTARY PROBABILITY
Objectives:
Having studied this unit, you should be able to
✓ understand the elements of probability
✓ calculate some probabilities of events associated with random experiments
✓ apply the concept of probability in some biological phenomena
5.1 Introduction
Without some formalism of probability theory, the student cannot appreciate the true interpretation
of data analysis through modern statistical methods. It is quite natural to study probability prior
to studying statistical inference. Elements of probability allow us to quantify the strength or
“confidence” in our conclusions. In this sense, concepts in probability form a major component
that supplements statistical methods and helps us to gauge the strength of the statistical inference.
The discipline of probability, then, provides the transition between descriptive statistics and
inferential methods. Elements of probability allow the conclusion to be put into the language that
the science or engineering practitioners require. An example follows that will enable the reader to
understand the notion of a P-value, which often provides the “bottom line” in the interpretation of
results from the use of statistical methods.
Definition 5.2:
Sample point (outcome): The individual result of a random experiment.
Sample space: The set containing all possible sample points (outcomes) of the random experiment.
The sample space is often called the universe and denoted by S.
Event: The collection of outcomes or simply a subset of the sample space. We denote events with
capital letters, A, B, C, etc.
Definitions 5.3:
1. Union: The union of A and B, A ∪ B, is the event containing all sample points in either
A or B or both. Sometimes we use A or B for union.
2. Intersection: The intersection of A and B, A ∩ B, is the event containing all sample points that are in both
A and B. Sometimes we use AB or A and B for intersection.
3. Subset: If for any ω ∈ A we also have ω ∈ B, then A ⊆ B.
4. Empty set: If a set A contains no points, it will be called the null set, or empty set, and denoted by ∅.
5. Complement: The complement of a set A, denoted by Ac, is the set of all ω ∈ S such that ω ∉ A.
6. Mutually Exclusive Events: Two events are said to be mutually exclusive (or disjoint) if their intersection
is empty (i.e. A ∩ B = ∅). Subsets A1, A2, … are defined to be mutually exclusive if Ai ∩ Aj = ∅ for every i ≠ j.
In short, to assign probabilities for an event, we might need to enumerate the possible outcomes
of a random experiment and need to know the number of possible outcomes favoring the event.
The following principles will help us in determining the number of possible outcomes favoring a
given event.
Example 5.3: Suppose one wants to purchase a certain commodity and that this commodity is on
sale in 5 government owned shops, 6 public shops and 10 private shops. How many alternatives
are there for the person to purchase this commodity?
Solution: Total number of ways =5+6+10=21 ways
Example 5.4: If we can go from Addis Ababa to Rome in 2 ways and from Rome to Washington
D.C. in 3 ways then the number of ways in which we can go from Addis Ababa to Rome to
Washington D.C. is 2x3 ways or 6 ways. We may illustrate the situation by using a tree diagram
below:
[Tree diagram: from A (Addis Ababa) two branches lead to R (Rome), and from each R three branches lead to W (Washington D.C.), giving 2 × 3 = 6 routes.]
Example 5.5: If a test consists of 10 multiple choice questions, with each permitting 4 possible
answers, how many ways are there in which a student gives his/her answers?
Solution: There are 10 steps required to complete the test.
First step: To give answer to question number one. He/she has 4 alternatives.
Second step: To give answer to question number two, he/she has 4 alternatives……
Last step: To give answer to last question, he/she has 4 alternatives.
Therefore, he/she has 4 × 4 × 4 × … × 4 = 4¹⁰ = 1,048,576 ways of completing the exam. Note
that there is only one way in which he/she can give correct answers to all questions and that there
are 3¹⁰ ways in which all the answers will be incorrect.
Example 5.6: A manufactured item must pass through three control stations. At each station the
item is inspected for a particular characteristic and marked accordingly. At the first station, three
ratings are possible while at the last two stations four ratings are possible. Hence there are 48 ways
in which the item may be marked.
Example 5.7: Suppose that car plate has three letters followed by three digits. How many possible
car plates are there, if each plate begins with a H or an F?
2 × 26 × 26 × 10 × 10 × 10 = 1,352,000 different plates.
Definition 5.4: If n is a positive integer, we define n!= n(n-1)(n-2)…1 and call it n-factorial and
0!=1.
Permutations
Suppose that we have n different objects. In how many ways, say nPn, may these objects be arranged
(permuted)? For example, if we have objects a, b and c we can consider the following
arrangements: abc, acb, bac, bca, cab, and cba. Thus the answer is 6. The following theorem gives the
general result on the number of such arrangements.
n gives (n - 1)!. The two circular permutations below are considered the same; their order is a, b,
c, d, e.
There are many problems in which we are interested in determining the number of ways in which
r objects can be selected from n distinct objects without regard to the order in which they are
selected. Such selections are called combinations or r-sets. It may help to think of combinations
as committees. The key here is without regard for order.
To obtain the general result we recall the formula derived above: the number of ways of choosing
r objects out of n and permuting the chosen r equals n!/(n-r)!. Let C be the number of ways of
choosing r out of n, disregarding order. C is the number required. Note that once the r items have
been chosen, there are r! ways of permuting them. Hence applying the multiplication principle
again, together with the above result, we obtain
C · r! = n!/(n − r)!. Therefore, C = n!/(r!(n − r)!). This number arises in many contexts in mathematics
and hence a special symbol is used for it. We shall write
C(n, r) = nCr = n!/(r!(n − r)!).
Example 5.12: How many different committees of 3 can be formed from Hawa, Segenet, Nigisty
and Lensa?
Solution: The question can be restated as: from a set of 4 objects, how many subsets of 3 elements are there?
In terms of combinations the question becomes: what is the number of combinations of 4 distinct objects taken
3 at a time? The list of committees: {H,S,N}, {H,S,L}, {H,N,L}, {S,N,L}. Therefore, we have 4C3 = 4 possible committees.
Example 5.13:
(i) A committee of 3 is to be formed from a group of 20 people. How many different committees
are possible?
(ii) From a group of 5 men and 7 women, how many different committees consisting of 2 men and
3 women can be formed?
Solution: (i) There are C(20, 3) = 20!/(3!17!) = 1140 possible committees.
(ii) There are C(5, 2) × C(7, 3) = (5!/(2!3!)) × (7!/(3!4!)) = 10 × 35 = 350 possible committees.
Remarks:
i) C(n, r) = C(n, n − r)
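Python's standard library (math.comb, available in recent versions) provides these counts directly; the sketch below checks Examples 5.12 and 5.13 and the remark above:

```python
from math import comb, factorial

print(comb(4, 3))                      # 4 committees of 3 from 4 people (Example 5.12)
print(comb(20, 3))                     # 1140 committees of 3 from 20 people
print(comb(5, 2) * comb(7, 3))         # 350 committees of 2 men and 3 women

# The symmetry remark: C(n, r) = C(n, n - r)
print(comb(20, 3) == comb(20, 17))     # True
print(factorial(20) // (factorial(3) * factorial(17)))   # 1140, same as comb(20, 3)
```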
It is rather surprising that with only these three axioms, we can construct the "entire" theory of
probability! The next theorems and definitions help in assigning probabilities of events.
Theorem 5.6: If A is an event in a discrete sample space S, then P(A) equals the sum of the probabilities of
the individual outcomes comprising A.
Theorem 5.7: Suppose that we have a random experiment with sample space S and probability function P,
and A and B are events. Then we have the following results:
i) P(∅) = 0
ii) P(Ac) = 1 − P(A)
iii) P(B ∩ Ac) = P(B) − P(A ∩ B)
iv) If A ⊆ B, then P(A) ≤ P(B).
P(A) = 2/4 = 0.5
P(B) = 2/4 = 0.5
P(Ac) = 1 − P(A) = 1 − 0.5 = 0.5
P(Bc) = 1 − P(B) = 1 − 0.5 = 0.5
P(Sc) = 1 − P(S) = 1 − 1 = 0 = P(∅)
Example 5.16: From a group of 5 men and 7 women, it is required to form a committee of 5
persons. If the selection is made randomly, then
i) what is the probability that 2 men and 3 women will be in the committee?
ii) what is the probability that all members of the committee will be men?
iii) what is the probability that at least three members will be women?
Solution: The total number of possible committees is C(12, 5) = 12!/(5!7!) = 792, i.e., the number of possible
outcomes in the sample space is 792.
i) Let A be the event that the committee will consist of 2 men and 3 women. We need to
know the number of possible outcomes favoring this event. The number of ways we
can select 2 men from 5 men is C(5, 2) = 5!/(2!3!) = 10, and the number of ways of selecting 3
women out of 7 women is C(7, 3) = 7!/(3!4!) = 35. Using the multiplication principle, the
number of elements favoring event A is 10 × 35 = 350.
Hence, using the classical definition of probability,
P(A) = C(5, 2)C(7, 3)/C(12, 5) = 350/792 = 0.44
ii) Let B be the event that all members of the committee will be men. Hence
P(B) = C(5, 5)C(7, 0)/C(12, 5) = 1/792
iii) Let C be the event that at least three of the committee members will be women.
Basically, three different compositions of committee members can be formed in terms
of sex: 3 women and 2 men, 4 women and 1 man, and all women. Hence the number
of possible outcomes favoring event C, using the principle of combination together with
the addition principle, is C(5, 2)C(7, 3) + C(5, 1)C(7, 4) + C(5, 0)C(7, 5) = 350 + 175 + 21 = 546.
Therefore, P(C) = 546/792 = 0.69
5
The above definition of probability is based on empirical data accumulated through time or based
on observations made from experiments repeated a large number of times.
5.5 Some probability rules
Theorem 5.8: If A and B are any two events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
In more precise terms, given an experiment, a corresponding sample space, and a probability law,
suppose that we know that the outcome is within some given event B. We wish to quantify the
likelihood that the outcome also belongs to some other given event A. We thus seek to construct a
new probability law, which takes into account this knowledge and which, for any event A, gives
us the conditional probability of A given B, denoted by P(A|B).
Definition 5.8: If P(B) > 0, the conditional probability of A given B, denoted by P(A|B), is
P(A|B) = P(A ∩ B)/P(B).
Example 5.19: Suppose cards numbered one through ten are placed in a hat, mixed up, and then
one of the cards is drawn at random. If we are told that the number on the drawn card is at least
five, then what is the conditional probability that it is ten?
Solution: Let A denote the event that the number on the drawn card is ten, and B be the event that
it is at least five. The desired probability is P(A|B).
P(A|B) = P(A ∩ B)/P(B) = P({10} ∩ {5,6,7,8,9,10})/P({5,6,7,8,9,10}) = P({10})/P({5,6,7,8,9,10}) = (1/10)/(6/10) = 1/6
Example 5.20: A family has two children. What is the conditional probability that both are boys
given that at least one of them is a boy? Assume that the sample space S is given by S = {(b, b),
(b, g), (g, b), (g, g)}, and all outcomes are equally likely. (b, g) means, for instance, that the older
child is a boy and the younger child is a girl.
Solution: Letting A denote the event that both children are boys, and B the event that at least one
of them is a boy, the desired probability is given by
P(A|B) = P(A ∩ B)/P(B) = (1/4)/(3/4) = 1/3
Law of Multiplication
The defining equation for conditional probability may also be written as:
P(A ∩ B) = P(B) P(A|B)
This formula is useful when the information given to us in a problem is P(B) and P(A|B) and we
are asked to find P(A ∩ B). An example illustrates the use of this formula. Suppose that 5 good fuses
and two defective ones have been mixed up. To find the defective fuses, we test them one-by-one,
at random and without replacement. What is the probability that we are lucky and find both of the
defective fuses in the first two tests? (By the multiplication law, this probability is (2/7)(1/6) = 1/21.)
Example 5.21: Suppose an urn contains seven black balls and five white balls. We draw two balls
from the urn without replacement. Assuming that each ball in the urn is equally likely to be drawn,
what is the probability that both drawn balls are black?
Solution: Let A and B denote, respectively, the events that the first and second balls drawn are
black. Now, given that the first ball selected is black, there are six remaining black balls and five
white balls, and so P(B|A) = 6/11. As P(A) is clearly 7/12, our desired probability is
P(A ∩ B) = P(A) P(B|A) = (7/12)(6/11) = 7/22
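Both conditional-probability calculations can be confirmed with exact fractions (a small Python sketch; the events are those described in Examples 5.19 and 5.21):

```python
from fractions import Fraction

# Example 5.19: a card numbered 1..10 is drawn; given it is at least five
at_least_five = [c for c in range(1, 11) if c >= 5]
p_ten_given_b = Fraction(sum(c == 10 for c in at_least_five), len(at_least_five))
print(p_ten_given_b)                                   # 1/6

# Example 5.21: two draws without replacement from 7 black and 5 white balls
p_first_black = Fraction(7, 12)
p_second_black_given_first = Fraction(6, 11)
print(p_first_black * p_second_black_given_first)      # 7/22
```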
Independence
We have introduced the conditional probability P (A|B) to capture the partial information that
event B provides about event A. An interesting and important special case arises when the
occurrence of B provides no information and does not alter the probability that A has occurred,
i.e., P(A|B) = P(A). When the above equality holds, we say that A is independent of B. Note that
by the definition P(A|B) = P(A ∩ B)/P(B), this is equivalent to P(A ∩ B) = P(A)P(B).
UNIT SIX: PROBABILITY DISTRIBUTIONS
Objectives:
Having studied this unit, you should be able to
✓ compute probabilities of events using the concept of probability distributions.
✓ compute expected values and variances of random variables.
✓ apply the concepts of probability distributions to real-life problems.
Introduction
In many applications, the outcomes of probabilistic experiments are numbers or have some
numbers associated with them, which we can use to obtain important information, beyond what
we have seen so far. We can, for instance, describe in various ways how large or small these
numbers are likely to be and compute likely averages and measures of spread. For example, in 3
tosses of a coin, the number of heads obtained can range from 0 to 3, and there is one of these
numbers associated with each possible outcome. Informally, the quantity “number of heads” is
called a random variable, and the numbers 0 to 3 its possible values. The value of a random
variable is determined by the outcome of the experiment. Thus, we may assign probabilities to the
possible values of the random variable.
If x is any possible value of X, the probability mass of x, denoted PX(x), is the probability of the
event {X = x} consisting of all outcomes that give rise to a value of X equal to x. A probability
mass function must satisfy the following conditions:
i. PX(x)≥0 for any value of x of X.
ii. ∑ P_X(x) = 1, where the summation is over all values x of X.
Example 6.1: Consider an experiment of tossing two fair coins. Letting X denote the number of
heads appearing on the top face, then X is a random variable taking on one of the values 0, 1, 2 .
The random variable X assigns a 0 value for the outcome (T,T), 1 for outcomes (T ,H) and (H, T
), and 2 for the outcome (H,H). Thus, we can calculate the probability that X can take specific
value/s as follows:
P(X = 0) = P({(T , T )}) = ¼
P(X = 1) = P({(T ,H),(H, T )}) = 2/4,
P(X = 2) = P({(H,H)}) = ¼
The table below shows the probability mass function X.
X 0 1 2
PX(x) ¼ 2/4 ¼
We can justify that PX(x) is probability mass function.
PX(x)≥0 for x=0,1,2 and
P(X = 0) + P(X = 1)+P(X = 2) = ¼ + 2/4 + ¼=1
Suppose we are interested to calculate the probability that X≥1. The values of X which are greater
than or equal to 1 are 1 and 2. Thus, the probability that X is greater than or equal to 1, denoted
P(X≥1), is found as P(X≥1) = P(X = 1) + P(X = 2)=3/4.
We can use the probability density function to calculate probabilities of events expressed in terms
of the random variable X. For instance, if we are interested in the probability that X lies between
two points, say a and b, we can find it using integration of f_X(x) on the interval [a, b], i.e.
P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx
It is useful to view the mean of X as a “representative” value of X, which lies somewhere in the
middle of its range. We can make this statement more precise, by viewing the mean as the center
of gravity of the distribution.
Variance
Definition 6.4: The variance of a random variable X, denoted V(X) or σ², is defined as
V(X) = E[(X − μ)²] = E(X²) − μ².
i) If X is discrete, V(X) = [∑ x² P_X(x)] − μ².
ii) If X is continuous, V(X) = [∫_{−∞}^{∞} x² f_X(x) dx] − μ².
The variance provides a measure of dispersion of X around its mean. Another measure of
dispersion is the standard deviation of X, which is defined as the square root of the variance and is
denoted by σ.
Example 6.2: Calculate the mean and variance of the random variable X in Example 6.1.
E(X) = ∑ x P_X(x) = 0(1/4) + 1(1/2) + 2(1/4) = 1
E(X²) = ∑ x² P_X(x) = 0²(1/4) + 1²(1/2) + 2²(1/4) = 1.5
V(X) = E(X²) − μ² = 1.5 − 1² = 0.5
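The same expectation and variance can be computed in a couple of lines from the probability mass function (a sketch for the distribution of Example 6.1):

```python
# Probability mass function of X = number of heads in two tosses (Example 6.1)
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

mean = sum(x * p for x, p in pmf.items())         # E(X) = 1
ex2 = sum(x ** 2 * p for x, p in pmf.items())     # E(X^2) = 1.5
variance = ex2 - mean ** 2                        # V(X) = E(X^2) - mu^2 = 0.5

print(mean, ex2, variance)
```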
6.3 Common discrete probability distributions – binomial and Poisson
The Binomial distribution
Many real problems (experiments) have two possible outcomes, for instance, a person may be
HIV-Positive or HIV-Negative, a seed may germinate or not, the sex of a newborn baby may be a
girl or a boy, etc. Technically, the two outcomes are called Success and Failure. Experiments or
trials whose outcomes can be classified as either a "success" or a "failure" are called Bernoulli
trials.
Suppose that n independent trials, each of which results in a “success” with probability p and in a
“failure” with probability 1 − p, are to be performed. If X represents the number of successes that
occur in the n trials, then X is said to have binomial distribution with parameters n and p. The
probability mass function of a binomial distribution with parameters n and p is given by
P_X(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, 2, ..., n
The mean and variance of the binomial distribution are np and np(1-p), respectively. Note that the
binomial distributions are used to model situations where there are just two possible outcomes,
success and failure. The following conditions also have to be satisfied.
i) There must be a fixed number of trials called n
ii) The probability of success (called p) must be the same for each trial.
iii) The trials must be independent
Example 6.3: A fair coin is flipped 4 times. Let X be the number of heads appearing out of the
four trials. Calculate the following probabilities:
i) 2 heads will appear
ii) No head will appear
iii) At least two heads will appear
iv) Less than two heads will appear
v) At most 2 heads will appear
Solution: We can consider that the outcomes of the trials are independent of each other. In
addition, the probability that a head will appear in each trial is the same. Thus, X has a binomial
distribution with number of trials 4 and probability of success (the occurrence of a head in a trial)
equal to 1/2. The probability mass function of X is given by
PX(x) = C(n, x) 0.5^x (1 − 0.5)^(n−x) = C(n, x) 0.5^n,   x = 0, 1, 2, 3, 4.   Note that n = 4 and p = 1/2.
i) P(X = 2) = C(4, 2) 0.5^2 (1 − 0.5)^2 = 0.3750
ii) P(X = 0) = C(4, 0) 0.5^0 (1 − 0.5)^4 = 0.0625
iii) P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4) = 0.3750 + 0.2500 + 0.0625 = 0.6875
iv) P(X < 2) = P(X = 0) + P(X = 1) = 0.0625 + 0.2500 = 0.3125
v) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.0625 + 0.2500 + 0.3750 = 0.6875
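As a quick numerical check of Example 6.3, the following sketch (Python, standard library only; binom_pmf is a helper name introduced here) evaluates the binomial pmf with math.comb and sums it over the required ranges:

    from math import comb

    n, p = 4, 0.5

    def binom_pmf(x):
        # P(X = x) for a binomial(n, p) random variable
        return comb(n, x) * p**x * (1 - p)**(n - x)

    print(binom_pmf(2))                                  # 0.375
    print(binom_pmf(0))                                  # 0.0625
    print(sum(binom_pmf(x) for x in range(2, n + 1)))    # P(X >= 2) = 0.6875
    print(sum(binom_pmf(x) for x in range(0, 2)))        # P(X < 2)  = 0.3125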
Example 6.4: Suppose that a particular trait of a person (such as eye color or left-handedness) is
classified on the basis of one pair of genes, and suppose that d represents a dominant gene and r a
recessive gene. Thus a person with dd genes is pure dominant, one with rr is pure recessive, and
one with rd is hybrid. The pure dominant and the hybrid are alike in appearance. Children receive
one gene from each parent. If, with respect to a particular trait, two hybrid parents have a total of
four children, what is the probability that exactly three of the four children have the outward
appearance of the dominant gene?
Solution: If we assume that each child is equally likely to inherit either of the two genes from each
parent, the probabilities that the child of two hybrid parents will have dd, rr, or rd pairs of genes
are, respectively, 1/4, 1/4, 1/2. Hence, because an offspring will have the outward appearance of the
dominant gene if its gene pair is either dd or rd, it follows that the number of such children, say X,
is binomially distributed with parameters n = 4 and p = 3/4. Thus the desired probability
is P(X = 3) = C(4, 3) 0.75^3 (1 − 0.75)^(4−3) = 0.421875.
Example 6.5: Suppose it is known that the probability of recovery for a certain disease is 0.4. If a
random sample of 10 people who are stricken with the disease is selected, what is the probability
that:
(a) exactly 5 of them will recover?
(b) at most 9 of them will recover?
Solution: Let X be the number of persons who will recover from the disease. We can assume that the
selection process will not affect the probability of success (0.4) for each trial by assuming a large
diseased population. Hence, X will have a binomial distribution with number of trials equal
to 10 and probability of success equal to 0.4.
P(X = k) = C(10, k) 0.4^k 0.6^(10−k),   k = 0, 1, 2, ..., 10
(a) P(X = 5) = C(10, 5) 0.4^5 0.6^5 = 0.200658
(b) P(X ≤ 9) = 1 − P(X = 10) = 1 − C(10, 10) 0.4^10 0.6^0 = 1 − 0.000105 = 0.9999
The Poisson Random Variable
A random variable X, taking on one of the values 0, 1, 2, . . . , is said to have a Poisson distribution
if its probability mass function is given by
PX(x) = (e^(−λ) λ^x)/x!,   x = 0, 1, 2, 3, ... and λ > 0.
λ is the parameter of this distribution. The mean and variance of the Poisson distribution are equal,
and both are equal to λ. Note that the Poisson distribution is used to model situations where
the random variable X is the number of occurrences of a particular event over a given period of
time (or space). Together with this, the following conditions must also be fulfilled: events are
independent of each other, events occur singly, and events occur at a constant rate (in other words,
for a given time interval the mean number of occurrences is proportional to the length of the
interval).
The Poisson distribution is used as a distribution of rare events such as telephone calls made to a
switchboard in a given minute, the number of misprints per page in a book, road accidents on a
particular motorway in one day, etc. The processes that give rise to such events are called Poisson
processes.
Example 6.6: Suppose that the number of typographical errors on a single page of this lecture note
has a Poisson distribution with parameter λ = 1. If we randomly select a page in this lecture note,
calculate the probability that
a) no error will occur.
b) exactly three errors will occur.
c) less than 2 errors will occur.
d) there is at least one error.
Solution: Let X = number of errors per page.
P(X = k) = (e^(−λ) λ^k)/k!,   λ = 1, k = 0, 1, 2, ...
a) P(X = 0) = (e^(−1) 1^0)/0! = 1/e = 0.367879
b) P(X = 3) = (e^(−1) 1^3)/3! = 0.061313
c) P(X < 2) = P(X = 0) + P(X = 1) = 0.73576
d) P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.367879 = 0.632121
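A rough numerical check of Example 6.6 can be written in a few lines of Python (standard library only; poisson_pmf is a helper introduced for this sketch):

    from math import exp, factorial

    lam = 1.0   # mean number of errors per page

    def poisson_pmf(k):
        # P(X = k) for a Poisson(lam) random variable
        return exp(-lam) * lam**k / factorial(k)

    print(poisson_pmf(0))                      # ~0.367879  (no error)
    print(poisson_pmf(3))                      # ~0.061313  (exactly three errors)
    print(poisson_pmf(0) + poisson_pmf(1))     # ~0.735759  (fewer than two errors)
    print(1 - poisson_pmf(0))                  # ~0.632121  (at least one error)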
Example 6.7: If the number of accidents occurring on a highway each day is a Poisson random
variable with parameter λ = 3, what is the probability that no accidents will occur on a randomly
selected day in the future?
Solution: Let X = number of accidents per day.
P(X = k) = (e^(−3) 3^k)/k!,   k = 0, 1, 2, ...
Required: P(X = 0) = (e^(−3) 3^0)/0! = e^(−3) = 0.05
Note: The Poisson random variable has a wide range of applications in a diverse number of areas.
An important property of the Poisson random variable is that it may be used to approximate a
binomial random variable when the binomial parameter n is large and p is small. The probability
that X equals k can then be approximated by setting λ = np in the Poisson distribution, i.e.
P(X = k) ≈ (e^(−λ) λ^k)/k!,   λ = np.
6.4 Common continuous probability distributions
Normal distribution
The normal distribution plays an important role in statistical inference because many real-life
distributions are approximately normal; many other distributions can be made approximately normal
by appropriate data transformations (e.g., taking the log); and, as the sample size increases, the
distribution of the means of samples drawn from a population with any distribution approaches the
normal distribution.
A continuous random variable X is said to follow a normal distribution if and only if its
probability density function (p.d.f.) is
fX(x) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²),   where x ∈ (−∞, ∞), μ ∈ (−∞, ∞)
and σ ∈ (0, ∞). There are infinitely many normal distributions, since different values of μ and σ
define different normal distributions. For instance, when μ = 0 and σ = 1, the above density takes
the form fZ(z) = (1/√(2π)) e^(−z²/2). This particular distribution is called the standard
normal distribution and is sometimes known as the Z-distribution. The random variable corresponding
to this distribution is usually denoted by Z. If X has a normal distribution with mean μ and variance
σ², we denote it as X ~ N(μ, σ²).
Properties of normal distribution
i) The normal distribution curve is bell-shaped, symmetric about μ and mesokurtic. The
p.d.f. attains its maximum value at x = μ.
ii) Since the vertical line x = μ divides the area under the normal curve into two equal parts, μ is
the mean, the median and the mode of the distribution.
iii) The mean and variance of the normal distribution are μ, and σ2, respectively.
iv) The total area under the curve and bounded from below by the horizontal axis is 1, i.e.
∫_{−∞}^{∞} fX(x) dx = 1
Areas under the standard normal distribution curve are tabulated in various ways. The most common
tables give areas bounded between Z = 0 and a positive value of Z. In addition to the standard normal
table, the properties of the normal distribution and the following theorem are useful for making
probability calculations easy for any normal distribution.
Example 6.8: Let Z be the standard normal random variable. Calculate the following probabilities
using the standard normal distribution table:
a) P(0<Z<1.2)   b) P(0<Z<1.43)   c) P(Z≤0)   d) P(−1.2<Z<0)   e) P(Z≤−1.43)
f) P(−1.43≤Z<1.2)   g) P(Z≥1.52)   h) P(Z≥−1.52)
Solution:
a) The probability that Z lies between 0 and 1.2 can be directly found from the standard normal
table as follows: look for the value 1.2 in the z column (first column) and then move
horizontally until you find the value 0.00 in the first row. The point of intersection made
by the horizontal and vertical movements gives the desired area (probability). Hence
P(0<Z<1.2) = 0.3849. Refer to the table below as a guide to finding this probability.
Figure: P(0<Z<1.2) is the shaded area
b) In a similar way, P(0<Z<1.43) = 0.4236.
c) We know that the normal distribution is symmetric about its mean. Hence the area to the left
of 0 and that to the right of 0 are 0.5 each. Therefore P(Z≤0) = P(Z≥0) = 0.5.
Figure: The area to the left and the right of 0 for the Z-distribution
d) P(−1.2<Z<0) = P(0<Z<1.2) = 0.3849, by symmetry.
e) P(Z≤−1.43) = 1 − P(Z ≥ −1.43)                  using the probability of the complement event
             = 1 − [P(−1.43<Z<0) + P(Z≥0)]        since the region can be broken into non-overlapping parts
             = 1 − [P(0<Z<1.43) + P(Z≥0)]         by symmetry
             = 1 − [0.4236 + 0.5]
             = 1 − 0.9236 = 0.0764
f) P(−1.43≤Z<1.2) = P(−1.43<Z<0) + P(0<Z<1.2) = 0.4236 + 0.3849 = 0.8085
Figure: P(−1.43≤Z<1.2) is the shaded region
g) P(Z≥1.52) = 0.5 − P(0<Z<1.52) = 0.5 − 0.4357 = 0.0643
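Instead of the printed table, these areas can also be checked numerically. The sketch below (Python, standard library only; Phi is a helper name introduced here, not notation from the note) builds the standard normal CDF from the error function and reproduces several parts of Example 6.8:

    from math import erf, sqrt

    def Phi(z):
        # cumulative distribution function of the standard normal distribution
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(Phi(1.2) - Phi(0))    # a) P(0<Z<1.2)   ~0.3849
    print(Phi(0))               # c) P(Z<=0)       0.5
    print(Phi(-1.43))           # e) P(Z<=-1.43)  ~0.0764
    print(1 - Phi(1.52))        # g) P(Z>=1.52)   ~0.0643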
Because of its importance, the chi-square distribution is tabulated for various values of the
parameter n (refer to the table). Thus we may find in the table the value, denoted by χ²α(n), satisfying
P(X ≥ χ²α(n)) = α,  0 < α < 1. The example below shows how to read chi-square distribution
values.
Example 6.11: To read the chi-square value with 2 degrees of freedom where the area to the right
of this value is 0.005, look up the degrees of freedom, 2, in the first column (df column) and then
move horizontally until you find the value of α, 0.005, in the first row. The point of intersection
made by the horizontal and vertical movements gives the desired chi-square value, 10.597. This
value satisfies P(X ≥ 10.597) = 0.005. In a similar way, the chi-square value with
100 degrees of freedom where the area to the right of this value is 0.975 is 74.222.
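If SciPy happens to be available (an assumption about the working environment, not something the note requires), the same table values can be reproduced with scipy.stats.chi2.ppf, whose quantile argument is the area to the left, i.e. 1 − α:

    from scipy.stats import chi2

    # area 0.005 to the right of the value, 2 degrees of freedom
    print(chi2.ppf(1 - 0.005, df=2))     # ~10.597

    # area 0.975 to the right of the value, 100 degrees of freedom
    print(chi2.ppf(1 - 0.975, df=100))   # ~74.222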
The t distribution
The t distribution is an important distribution useful in inference concerning population
mean/means. This distribution has one parameter called the degrees of freedom. Depending on the
values of the degrees of freedom, we may have different t distributions. The degrees of freedom
is usually denoted by n. In inference on the population mean, the degrees of freedom is related
to sample size. As the sample size or degrees of freedom increases, the t distribution approaches
the standard normal distribution.
The t- distribution shares some characteristics of the normal distribution and differs from it in
others. The t distribution is similar to the standard normal distribution in the following ways.
i) it is bell-shaped
ii) it is symmetrical about the mean
iii) the mean, median, and mode are equal to 0 and are located at the center of the distribution.
iv) The curve never touches the x-axis
The t distribution differs from the standard normal distribution in the following ways.
i) the variance is greater than 1.
ii) The t distribution is actually a family of curves based on the concept of degrees of freedom.
UNIT SEVEN: SAMPLING AND SAMPLING DISTRIBUTION OF SAMPLE MEAN
Objectives:
After a successful completion of this unit, students will be able to:
✓ Differentiate the two major sampling techniques: probabilistic and non-probabilistic
✓ Apply simple random sampling technique to select sample
✓ Define sampling distribution of the sample mean
Simple random sampling
Simple random sampling is a method of selecting a sample from a population in such a way that
every unit of the population is given an equal chance of being selected. In practice, you can draw
a simple random sample of elements using either the 'lottery method' or 'tables of random numbers'.
For example, you may use the lottery method to draw a random sample by using a set of 'N' tickets,
with numbers ' 1 to N' if there are 'N' units in the population. After shuffling the tickets thoroughly,
the sample of a required size, say n, is selected by picking the required n number of tickets.
The best method of drawing a simple random sample is to use a table of random numbers. After
assigning consecutive numbers to the units of population, the researcher starts at any point on the
table of random numbers and reads the consecutive numbers in any direction horizontally,
vertically or diagonally. If a read-out number corresponds to the number written on a unit card,
then that unit is chosen for the sample.
Suppose that a sample of 6 study centers is to be selected at random from a serially numbered
population of 60 study centers. The following table is portion of a random numbers table used to
select a sample.
Column→     1       2       3       4       5      ……     N
Row↓
1 2315 7548 5901 8372 5993 ….. 6744
2 0554 5550 4310 5374 3508 ….. 1343
3 1487 1603 5032 4043 6223 ….. 0834
4 3897 6749 5094 0517 5853 ….. 1695
5 9731 2617 1899 7553 0870 ….. 0510
6 1174 2693 8144 3393 0862 ….. 6850
7 4336 1288 5911 0164 5623 ….. 4036
8 9380 6204 7833 2680 4491 ….. 2571
9 4954 0131 8108 4298 4187 ….. 9527
10 3676 8726 3337 9482 1569 ….. 3880
11 ….. ….. ….. ….. ….. ….. …..
12 ….. ….. ….. ….. ….. ….. …..
13 ….. ….. ….. ….. ….. ….. …..
14 ….. ….. ….. ….. ….. ….. …..
15 ….. ….. ….. ….. ….. ….. …..
N 3914 5218 3587 4855 4888 ….. 8042
If you start in the first row and first column, centers numbered 23, 05, 14,…, will be selected.
However, centers numbered above the population size (60) will not be included in the sample. In
addition, if any number is repeated in the table, it may be substituted by the next number from the
same column. Besides, you can start at any point in the table. If you choose column 4 and row 1,
the number to start with is 83. In this way you can select the first 6 numbers from this column,
starting with 83.
Reading down column 4, the numbers obtained are:
83, 53, 40, 05, 75, 33, 01, 26
Hence, the study centers numbered 53, 40, 05, 33, 01 and 26 will be in the sample (83 and 75 are
discarded because they exceed the population size of 60).
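In practice the lottery method is usually carried out by a computer. A minimal sketch (Python, standard library only; the population size 60 and sample size 6 follow the study-center illustration above):

    import random

    N, n = 60, 6                           # population size and required sample size
    population = list(range(1, N + 1))     # serially numbered study centers 1, ..., 60

    random.seed(1)                         # fixed seed so the sketch is reproducible
    sample = random.sample(population, n)  # simple random sampling without replacement
    print(sorted(sample))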
Simple random sampling ensures the best results. However, from a practical point of view, a list
of all the units of a population is often not possible to obtain. Even if it is possible, it may involve a
very high cost which a researcher or an organization may not be able to afford. In addition, it may
result in an unrepresentative sample by chance.
Stratified sampling
Stratified random sampling takes into account the stratification of the main population into a
number of sub-populations, each of which is homogeneous with respect to one or more
characteristic(s). Having ensured this stratification, it provides for selecting randomly the required
number of units from each sub-population. The selection of a sample from each subpopulation
may be done using simple random sampling. It is useful in providing more accurate results than
simple random sampling.
Systematic sampling
In this method, samples are selected at equal intervals from a list of the elements. This
method provides a sample as good as a simple random sample, and it is comparatively easier to
draw the sample. For instance, to study the average monthly expenditure of households in a city,
you may select every fourth household from the household listing, starting at random.
Cluster sampling
Cluster sampling is used when a sampling frame is difficult to construct or when using other sampling
techniques (such as simple random sampling) is not feasible or is costly. For instance, when the
geographic distribution of units is scattered, it is difficult to apply simple random sampling. It involves division
of the population of elementary units into groups or clusters that serve as primary sampling units.
A selection of the clusters is then made to form the sample. The precision of estimates made based
on samples taken using this method is relatively low.
Non-probability sampling techniques
In non-probability sampling, the sample is not based on chance. It is rather determined by personal
judgment. This method is cost effective; however, we cannot make objective statistical inferences.
Depending on the technique used, non-probability samples are classified into quota, judgment or
purposive and convenience samples.
Sampling error can be reduced by using appropriate sampling methods and/or by increasing the sample size.
The non-sampling error, on the other hand, is likely to increase with an increase in sample size.
2. If sampling is with replacement, we will have N^n = 3² = 9 possible samples: (A, A), (A, B),
(A, C), (B, A), (B, B), (B, C), (C, A), (C, B) and (C, C). Hence the probability distribution
(sampling distribution) of the sample mean is:
x̄            3      4.5    6      7.5    9
P(X̄ = x̄)    1/9    2/9    3/9    2/9    1/9
E(X̄) = Σ x̄ P(X̄ = x̄) = 3(1/9) + 4.5(2/9) + 6(3/9) + 7.5(2/9) + 9(1/9) = 6
V(X̄) = Σ x̄² P(X̄ = x̄) − [E(X̄)]² = (1 + 4.5 + 12 + 12.5 + 9) − 36 = 3
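The enumeration behind this sampling distribution can be reproduced with the sketch below (Python, standard library only). It assumes the three population units A, B and C carry the values 3, 6 and 9, which is consistent with the sample means listed above; exact fractions are used so that the mean 6 and variance 3 come out exactly.

    from itertools import product
    from fractions import Fraction
    from collections import Counter

    values = {"A": 3, "B": 6, "C": 9}   # assumed population values (consistent with the means above)

    # all N^n = 9 ordered samples of size 2 drawn with replacement
    samples = list(product(values.values(), repeat=2))
    means = [Fraction(a + b, 2) for a, b in samples]

    pmf = {m: Fraction(c, len(means)) for m, c in Counter(means).items()}
    mean = sum(m * p for m, p in pmf.items())               # expected value of the sample mean = 6
    var = sum(m**2 * p for m, p in pmf.items()) - mean**2   # variance of the sample mean = 3
    print(pmf, mean, var)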
Note:
✓ The mean of the sampling distribution of the sample mean is the same as the population
mean irrespective of the sampling procedure.
✓ The variance of the sampling distribution of the sample mean is:
σ²/n, if sampling is with replacement, and
(σ²/n)((N − n)/(N − 1)), if sampling is without replacement.
✓ The problem with using the sample mean to make inferences about the population mean is that
the sample mean will probably differ from the population mean. This error is measured by the
standard deviation of the sampling distribution of the sample mean, which is known as the standard
error. The standard error is the average amount of sampling error incurred because of taking
a sample rather than the whole population. As the sample size increases, the standard error
decreases.
7.3 Central Limit Theorem
If X1, X2, …, Xn is a random sample from a population with mean μ and variance σ², then as n goes
to infinity the distribution of the sample mean X̄ approaches a normal distribution with mean μ
and variance σ²/n. That is, as n gets large, X̄ ~ N(μ, σ²/n), and its standardized form is
Z = (X̄ − μ)/(σ/√n) ~ N(0, 1).
Note: The central limit theorem is useful for approximating the distribution of the sample mean
based on a large sample size when the population distribution is non-normal; however, if the
population is normal, then the sampling distribution of the sample mean will be normal regardless
of the sample size.
Example 7.2: The uric acid values in normal adult males are normally distributed with mean 5.7
mg and standard deviation 1 mg. Find the probability that
a) a sample of size 4 will yield a mean less than 5
b) a sample of size 9 will yield a mean greater than 6
Solution: Let X be the uric acid value of a normal adult male, with mean 5.7 and variance 1.
a) If a sample of size 4 is taken, then X̄ ~ N(5.7, 0.25) since the population is normally
distributed.
P(X̄ < 5) = P(Z < (5 − 5.7)/0.5) = P(Z < −1.4)
= 0.5 − P(0 < Z < 1.4) = 0.0808
b) If a sample of size 9 is taken, then X̄ ~ N(5.7, 1/9) since the population is normally
distributed.
P(X̄ > 6) = P(Z > (6 − 5.7)/(1/3)) = P(Z > 0.9)
= 0.5 − P(0 < Z < 0.9) = 0.1841
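A numerical check of Example 7.2 (a Python sketch, standard library only; the standard normal CDF is again built from the error function):

    from math import erf, sqrt

    def Phi(z):
        # standard normal CDF
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sigma = 5.7, 1.0

    # a) sample of size 4: the standard error is sigma / sqrt(4) = 0.5
    print(Phi((5 - mu) / (sigma / sqrt(4))))       # ~0.0808

    # b) sample of size 9: the standard error is sigma / sqrt(9) = 1/3
    print(1 - Phi((6 - mu) / (sigma / sqrt(9))))   # ~0.1841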
UNIT EIGHT: SIMPLE LINEAR REGRESSION AND CORRELATION
Objectives:
Having studied this unit, you should be able to:
✓ Formulate a simple linear regression model.
✓ express quantitatively the magnitude and direction of the association between two variables
Introduction
The statistical methods discussed so far are used to analyze the data involving only one variable.
Often an analysis of data concerning two or more variables is needed to look for any statistical
relationship or association between them. Thus, regression and correlation analysis are helpful in
ascertaining the probable form of the relationship between variables and the strength of the
relationship.
The method of least squares chooses the estimates a and b such that Σê² = Σ(Y − Ŷ)² is minimum.
The solution of this minimization problem using partial differentiation is as follows:
b = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n] = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²],   and   a = Ȳ − bX̄
Example 8.1: A researcher wants to find out if there is any relationship between the height of a son
and that of his father. He took a random sample of 6 fathers and their sons. The heights, in inches,
are given in the table below:
Height of father (X) 63 65 64 65 67 68
Height of the son (Y) 66 68 65 67 69 70
i) Draw the scatter diagram and comment on the type of relationship.
ii) Fit the regression line of Y on X.
iii) Predict the height of the son if his father's height is 66 inches.
Solution:
i) From the scatter plot one can see that the points lie roughly on a straight line with a positive
slope, indicating a positive linear relationship.
ii) n = 6, ΣX = 392, ΣY = 405, ΣX² = 25628, ΣXY = 26476, ΣY² = 27355
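The remaining arithmetic of Example 8.1 can be carried out with a short sketch (Python, standard library only) that plugs the father-son data into the least squares formulas above:

    X = [63, 65, 64, 65, 67, 68]   # heights of the fathers (inches)
    Y = [66, 68, 65, 67, 69, 70]   # heights of the sons (inches)
    n = len(X)

    sx, sy = sum(X), sum(Y)
    sxy = sum(x * y for x, y in zip(X, Y))
    sxx = sum(x * x for x in X)

    b = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope of the regression of Y on X (~0.92)
    a = sy / n - b * sx / n                       # intercept (~7.19)
    print(a, b)                                   # fitted line: Y-hat = a + b*X
    print(a + b * 66)                             # predicted son's height for a 66-inch father (~68.1)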
For n pairs of sample values X and Y, Pearson's correlation coefficient is calculated as the ratio of
the covariance of the variables X and Y to the product of the standard deviations of X and Y.
Symbolically,
r = Cov(X, Y) / √(Var(X)·Var(Y))
  = [Σ(X − X̄)(Y − Ȳ)/(n − 1)] / [√(Σ(X − X̄)²/(n − 1)) · √(Σ(Y − Ȳ)²/(n − 1))]
  = Σ(X − X̄)(Y − Ȳ) / √(Σ(X − X̄)² Σ(Y − Ȳ)²)
Example 8.2: In some locations, there is strong association between concentrations of two
different pollutants. An article reports the accompanying data on ozone concentration x (ppm) and
secondary carbon concentration y (μg/m³):
X   0.066   0.088   0.120   0.050   0.162   0.186   0.057   0.100
Y   4.6     11.6    9.5     6.3     13.8    15.4    2.5     11.8
X   0.112   0.055   0.154   0.074   0.111   0.140   0.071   0.110
Y   8.0     7.0     20.6    16.6    9.2     17.9    2.8     13.0
a. Calculate the correlation coefficient and comment on the strength and direction of the
relationship between the two variables.
Solution: The summary quantities are
n = 16, Σxi = 1.656, Σyi = 170.6, Σxiyi = 20.0397, Σxi² = 0.196912, Σyi² = 2253.56
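The correlation coefficient for Example 8.2 can then be obtained by plugging these summary quantities into the computational (shortcut) form of the formula above; the sketch below (Python, standard library only) illustrates the calculation, and the value it prints is an illustration rather than part of the original note:

    from math import sqrt

    n = 16
    sx, sy = 1.656, 170.6
    sxy, sxx, syy = 20.0397, 0.196912, 2253.56

    num = sxy - sx * sy / n                              # sum of cross-deviations
    den = sqrt((sxx - sx**2 / n) * (syy - sy**2 / n))    # square root of the product of deviation sums
    r = num / den
    print(r)    # ~0.72, a fairly strong positive linear relationship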
UNIT NINE: ESTIMATION AND HYPOTHESIS TESTING
Objectives:
Having studied this unit, you should be able to
✓ construct and interpret confidence interval estimates
✓ formulate hypothesis about a population mean
✓ determine an appropriate sample size for estimation
Introduction
We now assume that we have collected, organized and summarized a random sample of data and
are trying to use that sample to estimate a population parameter. Statistical inference is a procedure
whereby inferences about a population are made on the basis of the results obtained from a sample.
Statistical inference can be divided into two main areas: estimation and hypothesis testing.
Estimation is concerned with estimating the values of specific population parameters; hypothesis
testing is concerned with testing whether the value of a population parameter is equal to some
specific value.
9.1 Point and interval estimation of the mean
Point estimate: In point estimation, a single sample statistic (such as X̄, S or p̂) is calculated from
the sample to provide an estimate of the true value of the corresponding population parameter
(such as μ, σ or p). Such a single statistic is termed a point estimator, and the specific value of
the statistic is termed a point estimate. For example, the sample mean X̄ is an estimator for the
population mean μ, and x̄ = 10 is an estimate, which is one of the possible values of X̄.
Interval estimate: In most practical problems, a point estimate does not provide information about
how close the estimate is to the population parameter unless it is accompanied by a statement of the
possible sampling error involved, based on the sampling distribution of the statistic. Hence, an
interval estimate of a population parameter is a confidence interval with a statement of confidence
that the interval contains the parameter value.
An interval estimate of a population parameter θ consists of two bounds within which the
parameter will be contained:
L ≤ θ ≤ U
where L is the lower bound and U is the upper bound.
Case 1: When the population is normal.
✓ If the variance σ² is known, the sampling distribution of the sample mean X̄ is normal
with mean μ and variance σ²/n, i.e., X̄ ~ N(μ, σ²/n), and Z = (X̄ − μ)/(σ/√n) ~ N(0, 1).
✓ If the variance σ² is unknown, t = (X̄ − μ)/(S/√n) will have a t-distribution with
n − 1 degrees of freedom. Moreover, as the sample size increases, t is approximately the
same as the standard normal.
Consider the case where σ² is known; we can derive a (1 − α)100% confidence interval for the
population mean μ.
Let Zα/2 be a point on the standard normal curve that cuts an area of α/2 to the right, i.e.
P(Z ≥ Zα/2) = α/2. By the symmetry of the normal distribution, P(Z ≤ −Zα/2) = α/2 (see the diagram
below).
From the standard normal distribution, we know that
P(−Zα/2 ≤ Z ≤ Zα/2) = 1 − α
(Diagram: standard normal curve with central area 1 − α and tail areas α/2 in each tail.)
To obtain the limits of the interval estimate, we use the standardized form of X̄ in the above
probability statement, i.e., letting Z = (X̄ − μ)/(σ/√n),
P(−Zα/2 ≤ Z ≤ Zα/2) = 1 − α becomes
P(−Zα/2 ≤ (X̄ − μ)/(σ/√n) ≤ Zα/2) = 1 − α
P(−Zα/2 σ/√n ≤ X̄ − μ ≤ Zα/2 σ/√n) = 1 − α
P(−X̄ − Zα/2 σ/√n ≤ −μ ≤ −X̄ + Zα/2 σ/√n) = 1 − α
P(X̄ − Zα/2 σ/√n ≤ μ ≤ X̄ + Zα/2 σ/√n) = 1 − α
We can assert with probability 1 − α that the interval (X̄ − Zα/2 σ/√n , X̄ + Zα/2 σ/√n) contains
the population mean we are estimating.
The end points of the interval, X̄ − Zα/2 σ/√n and X̄ + Zα/2 σ/√n, are called confidence limits, and the
probability 1 − α is called the degree of confidence.
In a similar way, a (1 − α)100% confidence interval for the population mean μ with unknown
variance σ² is given by
(X̄ − tα/2(n − 1) S/√n , X̄ + tα/2(n − 1) S/√n)
where tα/2(n − 1) is the critical value of the t statistic providing an area of α/2 in the right tail of
the t-distribution with n − 1 degrees of freedom.
For a sample of 6 observations with X̄ = 2.28, S = 0.95 and t0.025(5) = 2.571, the 95% confidence
interval is
(2.28 − 2.571 (0.95/√6) , 2.28 + 2.571 (0.95/√6))
= (2.28 − 0.997, 2.28 + 0.997)
= (1.28, 3.27)
We are 95% confident that the mean drop in blood pressure lies in between 1.28 mmHg and 3.27
mmHg for the sampled population.
Example 9.2: Punctuality of patients in keeping appointments is of interest to a research team. In
a study of patient flow through the offices of general practitioners, it was found that a sample of
35 patients was, on average, 17.2 minutes late for appointments. Previous research had shown
the standard deviation to be about 8 minutes. The population distribution was felt to be non-normal.
What is the 90 percent confidence interval for the true mean amount of time late for appointments?
Solution: Given: X̄ = 17.2, σ = 8, n = 35
(1 − α)100% = 90%, so 1 − α = 0.90, α = 0.1 and α/2 = 0.05.
Since the sample size is fairly large (n > 30), and since the population standard deviation is known,
according to the central limit theorem, the sampling distribution of sample mean is approximately
normal. Thus, a confidence interval of the population mean is given by:
(X̄ − Zα/2 σ/√n , X̄ + Zα/2 σ/√n)
And from the standard normal distribution table, Zα/2 = Z0.05 = 1.65. Hence the interval is
(17.2 − 1.65 (8/√35) , 17.2 + 1.65 (8/√35))
= (17.2 − 2.2, 17.2 + 2.2)
= (15.0, 19.4)
Therefore, the 90% confidence interval for true mean amount of time late for appointment is
between 15.0 and 19.4 minutes.
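A compact check of Example 9.2 (Python sketch, standard library only; 1.645 is used for the upper 5% point of the standard normal distribution, which the note rounds to 1.65):

    from math import sqrt

    xbar, sigma, n = 17.2, 8, 35
    z = 1.645                        # upper 5% point of the standard normal distribution
    margin = z * sigma / sqrt(n)     # margin of error

    print(xbar - margin, xbar + margin)   # roughly (15.0, 19.4)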
Step 1: State the null hypothesis ( H 0 ) and alternative hypothesis ( H 1 )
Null hypothesis ( H 0 ): refers to a hypothesized numerical value of the population parameter which
is initially assumed to be true. The null hypothesis is always expressed in the form of an equation
making a claim regarding the specific value of the population parameter. That is, for example
H0: μ = μ0
where μ0 is the hypothesized value of the population mean.
Alternative hypothesis (H1): is the logical opposite of the null hypothesis. The alternative
hypothesis states that the specific population parameter value is not equal to the value stated in the
null hypothesis. For example,
H1: μ ≠ μ0 (two-sided test)
H1: μ > μ0 or H1: μ < μ0 (one-sided test)
Step 2: State the level of significance (alpha) for the test
The level of significance is the probability of wrongly rejecting the null hypothesis H0 when it is
actually true. It is specified by the statistician or the researcher before the sample is drawn. The
most commonly used values of α are 0.10, 0.05 or 0.01.
Step 3: Calculate the appropriate test statistic
Test statistic is a value computed from a sample that is used to determine whether the null
hypothesis has to be rejected or not. The choice of suitable test statistic depends on the sampling
distribution of the sample statistic. Accordingly, we have the following cases:
Case 1: When the population is normal.
✓ If the variance σ² is known, the sampling distribution of the sample mean X̄ is normal
with mean μ and variance σ²/n, i.e., X̄ ~ N(μ, σ²/n), and the test statistic is
Z = (X̄ − μ0)/(σ/√n) ~ N(0, 1).
✓ If the variance σ² is unknown, the test statistic is t = (X̄ − μ0)/(S/√n) ~ t(n − 1).
critical value. For a specified α, we read the critical values from the Z or t tables, depending on
the test statistic chosen.
(Diagrams: for a two-tailed test, the acceptance region of area 1 − α lies around μ = μ0 and the
rejection regions, each of area α/2, lie beyond the critical values ±Zα/2; for a right-tailed test,
the rejection region of area α lies to the right of Zα.)
iii. For H1: μ < μ0 (left-tailed test), reject H0 if Z < −Zα.
(Diagram: rejection region of area α to the left of −Zα, acceptance region of area 1 − α to its right.)
In summary, for a test of H0: μ = μ0:
Reject H0 if |Z| > Zα/2 (two-tailed), Z > Zα (right-tailed) or Z < −Zα (left-tailed), when σ² is known;
Reject H0 if |t| > tα/2(n − 1) (two-tailed), t > tα(n − 1) (right-tailed) or t < −tα(n − 1) (left-tailed), when σ² is unknown.
                          Null hypothesis (H0)
Decision          True                    False
Reject H0         Type I error (α)        Correct decision
Accept H0         Correct decision        Type II error (β)
A Type I error is committed if we reject the null hypothesis when it is true. The probability of
committing a Type I error, denoted by α, is called the level of significance. The probability level
of this error is decided by the decision-maker before the hypothesis test is performed. A Type II
error is committed if we do not reject the null hypothesis when it is false. The probability of
committing a Type II error is denoted by β (the Greek letter beta). As the probability of a Type I
error increases, the probability of a Type II error decreases (they are inversely related). Hence we
cannot reduce both errors simultaneously. As the sample size increases, both errors will decrease.
Example 9.3: The life expectancy of people in the year 1999 in a country is expected to be 50
years. A survey was conducted in eleven regions of the country and the data obtained, in years, are
given below:
Life expectancy (years): 54.2, 50.4, 44.2, 49.7, 55.4, 47.0, 58.2, 56.6, 61.9, 57.5, and 53.4.
Do the data confirm the expected view? (Assuming normal population) Use 5% level of
significance.
Solution: Let μ be the life expectancy of people in the year 1999 in the country.
1. H0: μ = 50 (The life expectancy of people in the year 1999 in the country is 50 years)
H1: μ ≠ 50 (The life expectancy of people in the year 1999 in the country is different from
50 years)
2. Level of significance, α = 0.05.
3. Since σ is unknown and the population is normal, the t-test statistic is appropriate.
Given: n = 11; μ0 = 50, and we need to compute X̄ and S.
X̄ = (Σ xi)/n = (54.2 + 50.4 + ..... + 57.5 + 53.4)/11 = 598.5/11 = 54.41
S² = [Σ xi² − (Σ xi)²/n]/(n − 1) = [32799.91 − (598.5)²/11]/10 = (1/10)(236.07) = 23.607
S = √23.607 = 4.859
Then, the t-test statistic is calculated as:
t = (X̄ − μ0)/(S/√n) = (54.41 − 50)/(4.859/√11) = 4.41/1.465 = 3.01
4. For α = 0.05 and a two-tailed test, the critical (table) value is:
tα/2(n − 1) = t0.05/2(11 − 1) = t0.025(10) = 2.228
(Diagram: t-distribution with rejection regions of area 0.025 beyond −2.228 and 2.228.)
Since t = 3.01 > tα/2(n − 1) = 2.228, reject the null hypothesis H0. That is, the calculated
t value lies in the rejection region (the shaded region).
5. Conclusion: The data do not confirm the expected view. That is, the life expectancy is
different from 50 years at the 5% level of significance.
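The arithmetic of step 3 can be checked with a short sketch (Python, standard library only) that works from the summary quantities used in the solution (n, Σxi and Σxi²):

    from math import sqrt

    n, mu0 = 11, 50
    sum_x, sum_x2 = 598.5, 32799.91            # summary quantities from the solution

    xbar = sum_x / n                           # sample mean (~54.41)
    s2 = (sum_x2 - sum_x**2 / n) / (n - 1)     # sample variance (~23.607)
    s = sqrt(s2)                               # sample standard deviation (~4.859)

    t = (xbar - mu0) / (s / sqrt(n))           # t statistic with n - 1 = 10 degrees of freedom
    print(xbar, s, t)                          # t ~ 3.01 > 2.228, so H0 is rejected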
Example 9.4: Suppose that we want to test the hypothesis with a significance level of .05 that the
climate has changed since industrialization. Suppose that the mean temperature throughout history
is 50 degrees. During the last 40 years, the mean temperature has been 51 degrees and the
population standard deviation is 2 degrees. What can we conclude?
Solution:
Let μ be the mean temperature.
1. H0: μ = 50 (There has been no change in temperature since industrialization)
H1: μ ≠ 50 (There has been a change in temperature since industrialization)
2. Level of significance, α = 0.05.
3. Since n = 40 is large, the Z-test statistic is appropriate.
Given: n = 40; σ = 2; X̄ = 51; μ0 = 50
Z = (X̄ − μ0)/(σ/√n) = (51 − 50)/(2/√40) = 1/0.316 = 3.16
4. For α = 0.05 and a two-tailed test, the critical (table) value is:
Zα/2 = Z0.025 = 1.96
(Diagram: standard normal curve with rejection regions of area 0.025 beyond −1.96 and 1.96.)
Since Z = 3.16 > Zα/2 = Z0.025 = 1.96, reject the null hypothesis H0. That is, the
calculated Z value lies in the rejection region (the shaded region).
5. Conclusion: There has been a change in temperature since industrialization, at the 5% level of
significance.
Example 9.5: A study was conducted to describe the menopausal status, menopausal symptoms,
energy expenditure and aerobic fitness of healthy midlife women and to determine relationships
among these factors. Among the variables measured was maximum oxygen uptake (Vo2max). The
mean Vo2max score for a sample of 242 women was 33.3 with a standard deviation of 12.14. On
the basis of these data, can we conclude that the mean score for a population of such women is
greater than 30? Use a 5% level of significance.
Solution:
Let μ be the mean Vo2max score for the population of healthy midlife women.
1. H0: μ = 30 (The mean score for the population of healthy midlife women is 30)
H1: μ > 30 (The mean score for the population of healthy midlife women is greater than
30).
2. Level of significance, α = 0.05.
3. Since n = 242 is large, the Z-test statistic is appropriate.
Given: n = 242; S = 12.14; X̄ = 33.3; μ0 = 30
Z = (X̄ − μ0)/(S/√n) = (33.3 − 30)/(12.14/√242) = 3.3/0.7804 = 4.23
4. For α = 0.05 and a right-tailed test, the critical (table) value is:
Zα = Z0.05 = 1.65
(Diagram: standard normal curve with rejection region of area 0.05 to the right of 1.65.)
Since Z = 4.23 > Zα = 1.65, reject the null hypothesis H0. That is, the calculated Z value
lies in the rejection region (the shaded region).
5. Conclusion: The mean Vo2max score for the sampled population of healthy midlife women
is greater than 30 at the 5% level of significance.
9.3 Test of Association (Independence)
Usually we encounter nominal scale data. The χ² test of association is useful for determining
whether any relationship or association exists between two nominal variables. For instance,
we might be interested in the relationship between HIV status and sex, lung cancer and smoking
habit, political affiliation and sex, etc.
When observations are classified according to two variables or attributes and arranged in a table,
the display is called a contingency table, as shown below.
The test of association or independence uses the contingency table format. Here the variables A
and B have been classified into mutually exclusive categories. The value Oij in row i and column
j of the table shows the observed frequency falling in the joint category (i, j). The row and
column totals are the sums of their corresponding frequencies. The sum of the row or column totals
gives the grand total n, which represents the sample size. The procedure for testing the association
between the two variables is summarized as follows:
Step 1: State the null and alternative hypotheses
H0: There is no association or relationship between the two variables, that is, the two
variables are independent.
H1: There is an association or relationship between the two variables, that is, the two
variables are dependent.
Step 2: State the level of significance, α.
Step 3: Calculate the expected frequencies, Eij, corresponding to the observed frequency in row i
and column j. The expected frequencies in each cell are calculated as:
Eij = (Row i total × Column j total) / Sample size = Ri Cj / n
Step 4: Compute the value of the test statistic:
χ²cal = Σᵢ Σⱼ (Oij − Eij)²/Eij,   i = 1, ..., r; j = 1, ..., c
where Oij is the observed frequency in row i and column j, and Eij is the expected frequency in
row i and column j.
Step 5: Find the critical (table) value χ²α(df) (from the Appendix). The value χ²α corresponds
to an area α in the right tail of the distribution,
where df = (Number of rows − 1)(Number of columns − 1) = (r − 1)(c − 1).
Step 6: Compare the calculated and table values of χ². Decide whether the variables are
independent or not, using the following decision rule:
Reject H0 if χ²cal is greater than χ²α(df). Otherwise do not reject H0.
Example 9.6: The following data on the colour of eye and hair for 6800 individuals were obtained
from a source:
                             Eye colour
Hair colour    Fair    Brown    Black    Red    Total
Blue           1768    808      190      47     2813
Green          946     1387     746      43     3122
Brown          115     444      288      18     865
Total          2829    2639     1224     108    6800
Test the hypothesis that hair colour and eye colour are independently distributed (there is no
association between colour of eye and colour of hair) at the level α = 0.01.
Solution:
1. H 0 : There is no association between hair colour and eye colour.
H 1 : There is association between hair colour and eye colour.
2. α = 0.01.
3. Calculate the expected frequencies, Eij = Ri Cj / n:
E11 = (2813 × 2829)/6800 = 1170.29,  ……,  E14 = (2813 × 108)/6800 = 44.68
E31 = (865 × 2829)/6800 = 359.87,  ……,  E34 = (865 × 108)/6800 = 13.74
Therefore, the contingency table of expected frequencies is as follows:
                             Eye colour
Hair colour    Fair       Brown      Black     Red      Total
Blue           1170.29    1091.69    506.34    44.68    2813
Green          1298.84    1211.61    561.96    49.58    3122
Brown          359.87     335.70     155.70    13.74    865
Total          2829       2639       1224      108      6800
4. Calculate the test statistic:
χ²cal = Σᵢ Σⱼ (Oij − Eij)²/Eij
      = (1768 − 1170.29)²/1170.29 + ..... + (47 − 44.68)²/44.68 + (946 − 1298.84)²/1298.84 + .....
        + (43 − 49.58)²/49.58 + (115 − 359.87)²/359.87 + ..... + (18 − 13.74)²/13.74
      = 1074.43
5. Critical value χ²α(df):
df = (r − 1)(c − 1) = (3 − 1)(4 − 1) = (2)(3) = 6
χ²α(df) = χ²0.01(6) = 16.812
6. Since χ²cal = 1074.43 > χ²0.01(6) = 16.812, reject H0.
7. Conclusion: There is an association between hair colour and eye colour. That is, hair colour
and eye colour are dependent.
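If SciPy is available (an assumption about the working environment), the whole calculation of Example 9.6 can be reproduced with scipy.stats.chi2_contingency, which returns the χ² statistic, the p-value, the degrees of freedom and the table of expected frequencies:

    from scipy.stats import chi2_contingency

    # observed frequencies from Example 9.6
    # rows: hair colour (Blue, Green, Brown); columns: eye colour (Fair, Brown, Black, Red)
    observed = [
        [1768,  808, 190, 47],
        [ 946, 1387, 746, 43],
        [ 115,  444, 288, 18],
    ]

    chi2_stat, p_value, df, expected = chi2_contingency(observed)
    print(chi2_stat, df)      # ~1074 with 6 degrees of freedom
    print(p_value < 0.01)     # True: reject H0 at the 1% level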