Statistics and Probability Handout
Objectives:
Having studied this unit, you should be able to
understand statistics and basic terminologies
understand scales of measurement in statistics
understand the basic methods of data collection
Introduction
Most people become familiar with statistics through radio, television, newspapers and
magazines. For instance, one may find the following statements in a newspaper or reports. “The
HIV prevalence rate in Ethiopia among adults 15-49 years is 1.4 in 2005”; “Among older men,
the mortality rate for smokers is twice the rate of those who never smoked”; “The agricultural
production increased by 5 percent this year”.
More broadly, statistics is used in almost all fields of human endeavor to make scientific decisions
based on data. For example, in public health an administrator would be concerned with the
number of residents who contract a new strain of flu virus during a certain year. In pharmacy, it
is used to study the efficacy and potency of drugs. To study plant life, a botanist has to rely on
statistics to know the effect of temperature, rainfall and so on. In general, statistics can be
applied in business, social sciences, natural sciences and engineering.
Definition of Statistics
The word 'Statistics' is derived from the Latin word 'status', which means a "political state."
Clearly, statistics is closely linked with the administrative affairs of a state such as facts and
figures regarding defense force, population, housing, food, financial resources etc.
The word statistics has several meanings. In the first place, it is a plural noun which describes a
collection of numerical data such as employment statistics, accident statistics, population
statistics, economic statistics, agricultural statistics, etc. It is in this sense that the word
'statistics' is usually understood by a layman.
Secondly the word statistics as a singular noun is used to describe a branch of applied
mathematics, whose purpose is to provide methods of dealing with collections of data and
extracting information from them in compact form by tabulating, summarizing and analyzing the
numerical data or a set of observations.
Classification of Statistics
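Statistics is broadly classified into descriptive statistics and inferential statistics.
Descriptive statistics includes methods for organizing, summarizing and presenting data.
Inferential statistics includes statistical methods which facilitate estimating the characteristics of
a population or making decisions concerning a population on the basis of sample results. In this
regard, methods like estimation and hypothesis testing are examples of inferential statistics.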
For example, a biologist collected blood samples of 10 students from the biology department to
study blood types. Accordingly, the following data were obtained:
O A O AB A A O O B A O
A summary measure, such as the proportion of students with blood type O in the sample (50%),
is an example of descriptive statistics. We can also describe the data using bar or pie charts.
However, if he/she wants information on the proportion of students with blood type O in
the entire class, he/she may use the sample proportion (50%) as an estimate of the corresponding
value for the entire class. This is an example of inferential statistics.
Activity 1.1
A statistical study might involve the following stages: collection of data, organizing and
presenting the collected data, analyzing and interpreting the result.
Stage 1: Data collection: this stage involves acquiring data related to the problem at hand.
Stage 2: Organizing and presenting data: this stage involves classifying or sorting the
collected data based on some characteristics or attributes such as age, sex, marital status, etc.
Further, we may use tables, graphs, charts and so on to present the data.
Stage 3: Data analysis: a thorough scrutiny or analysis of the data is necessary in order to reach
conclusions or provide answers to a problem. The analysis might require simple or sophisticated
statistical tools depending on the type of answers that may have to be provided.
Activity 1.2
Population: the totality (collection) of all individuals, objects or items under consideration.
It consists of all elements, individuals, items or objects whose characteristics are being studied.
The population that is being studied is called the target population.
Sample: A portion of the population selected for study.
Sample survey: The technique of collecting information from a portion of the population.
Census survey: A survey that includes every member of the population.
Variable: a characteristic under study that assumes different values for different elements.
Quantitative variable: A variable that can be measured numerically. The data collected on a
quantitative variable are called quantitative data. Examples include weight, height, number of
students in a class, number of car accidents, etc.
Qualitative variable: A variable that cannot assume a numerical value but can be classified into
two or more non numerical categories. The data collected on such a variable are called
qualitative or categorical data. Examples include sex, blood type, marital status, religion, etc.
Discrete variable: a variable whose values are countable. Examples include the number of patients
in a hospital, the number of white blood cells in a droplet of blood sample, the number of rodents
per plot of farmland, etc.
Continuous variable: a variable that can assume any numerical value over a certain interval or
intervals. Examples include weight of newborn babies, height of seedlings, temperature
measurements, etc.
Parameter: a statistical measure obtained from population data. Examples include the population
mean, proportion, variance and so on.
Statistic: a statistical measure obtained from sample data. Examples include the sample mean,
proportion, variance and so on.
Unit of analysis: The type of thing being measured in the data, such as persons,
families, households, states, nations, etc.
Activity 1.3
2. Explain which of the following variables are quantitative and which are qualitative
a. Number of persons in a family b. Marital status of people
c. Monthly phone bills d. Length of frog jump
3. Classify the above variables as discrete or continuous.
4. Clearly identify the difference between population and sample by giving an example.
5. Differentiate the terms statistic and parameter.
Application of statistics
We pointed out that statistics has already become a very important subject area, and, that various
tools of statistics are being used to solve problems in everyday life, in research, in marketing, in
planning, in production and quality control and other areas. Nevertheless, statistics has its own
limitations and it can also be misused. In the following section we outline the limitations.
Limitation of statistics
Statistics deals with only those subjects of inquiry which are capable of being quantitatively
measured and numerically expressed.
Statistics deals only with aggregates of facts and no importance is attached to individual
items
Statistical data is only approximately and not mathematically correct
Statistics is liable to be misused. Hence expertise in the subject is very essential. Besides,
honesty is very important in the use of statistics.
Activity 1.4
2. Discuss the limitations of statistics, using an example for each limitation.
If we use different types of measurement scales having different levels of refinement to measure
one and the same object we obtain different amounts and types of information about a variable
under consideration. Formally, we distinguish among four levels of measurement scales, and,
therefore, among four types of data.
Nominal scale: it is the simplest measurement scale. Values of nominal scale are used merely to
categorize the quantity being measured and hence there is no natural ordering of the levels or
values of the scale. For example, sex of an individual may be male or female. There is no natural
ordering of the two sexes. Other examples include religion, blood type, eye colour, marital
status, etc. The values of a nominal scale can be coded using numerical values; however, we
cannot perform any meaningful mathematical operations on the numbers used to code.
Ordinal scale: this measurement scale is similar to the nominal scale but the levels or categories
can be ranked or ordered. That is, we can compare levels or categories of the scale. Therefore, this
scale of measurement gives better information on the quantities being measured as compared to
the nominal scale. For example, the living standard of a family can be poor, medium or high. These
categories can be ordered, as poor is less than medium and medium is less than high.
However, the distance or magnitude between the levels, say between poor and medium, is not
clearly known.
Interval scale: this measurement scale shares the ordering or ranking and labeling properties of
ordinal scale of measurement. Besides, the distance or magnitude between two values is clearly
known (meaningful). However, it lacks a true zero point (i.e., zero point is not meaningful). For
example, consider the temperature of an object in degrees centigrade or Fahrenheit. If the
temperature of an object is zero degrees centigrade, it doesn't mean that the object lacks heat.
Hence zero is an arbitrary point on the scale. It doesn't make sense to say that 80° F is twice as
hot as 40° F; in centigrade the ratio would be 6; neither ratio is meaningful. We can do subtraction
and addition on interval-level data, but multiplication and division do not give meaningful results.
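The 80° F versus 40° F comparison can be checked numerically. The short Python sketch below converts both readings to centigrade and shows that the ratio changes with the unit while the difference converts consistently. The helper name fahrenheit_to_celsius is an illustrative choice and does not come from the handout.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees centigrade."""
    return (temp_f - 32) * 5 / 9


hot_f, cold_f = 80.0, 40.0
hot_c = fahrenheit_to_celsius(hot_f)
cold_c = fahrenheit_to_celsius(cold_f)

# The ratio of the same two temperatures depends on the unit used.
ratio_f = hot_f / cold_f
ratio_c = hot_c / cold_c

# Differences convert consistently: a gap of d degrees Fahrenheit
# always equals a gap of d * 5/9 degrees centigrade.
diff_f = hot_f - cold_f
diff_c = hot_c - cold_c

print(f"Ratio in Fahrenheit: {ratio_f:.1f}")
print(f"Ratio in centigrade: {ratio_c:.1f}")
print(f"Difference: {diff_f:.1f} F = {diff_c:.2f} C")
```

Because the ratio is 2 in one unit and 6 in the other, neither number describes the temperatures themselves, which is why ratios are avoided on an interval scale.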
Ratio scale: it is the highest level of measurement scale. It shares the ordering, labeling and
meaningful distance properties of interval scale. In addition, it has a true or meaningful zero
point. The existence of a true zero makes the ratio of two measures meaningful. For instance, if
your salary is 1000 birr and your wife's is 2000 birr, we can say that your wife earns twice as
much as you. If you don't have any source of income, your income is zero on this scale and that is
a meaningful assignment. Other examples include weight, height, volume measurements, etc.
We can do subtraction, addition, multiplication and division on ratio-level data.
The most precise variable is the ratio variable and the least precise is the nominal variable. Ratio and
interval level data are classified under quantitative variable and, nominal and ordinal level data
are classified under qualitative variable.
Activity 1.5
1. For each of the variables, indicate whether it is quantitative or qualitative and specify the
scale of measurement that is employed when taking measurement on each
a. Class standing of the members of this class relative to each other
b. Admitting diagnosis of patients admitted to a mental health clinic
c. Weight of babies born in a hospital during a year
d. Gender of babies born in a hospital during a year.
e. Under-arm temperature of day-old infants born in a hospital.
CHAPTER - TWO
Objectives:
After completing this unit you should be able to
organize data using frequency distribution.
present data using suitable graphs or diagrams.
Introduction
The amount of data collected in real life situations is often too large, thus we need some methods
to organize it. One such method is grouping, that is, putting data into groups rather than
treating each observation individually. In fact, raw data provide little, if any, information to
decision makers. Thus, they need a means of converting the raw data into useful information.
Hence, the purpose of this chapter is to introduce tools used for data presentation.
The purposes of classifying and tabulating data are to display the points of similarity and dissimilarity;
to save mental strain by systematic condensation and suppression of irrelevant detail; to enable
one to form a mental picture of objects of perception; and to prepare the ground for comparison
and inference.
Types of classification
1. Geographical- in terms of cities, districts, countries etc.
2. Chronological - on the basis of time
3. Qualitative - according to some qualitative characteristics.
4. Quantitative – in terms of magnitude.
Tabulation: tables may be classified according to the number of characteristics used for
tabulation.
1. Simple or one way table: it uses only one characteristic or variable for classification.
Example 2.1: Students who took introduction to statistics in 1998 E.C. by gender.
Gender Number
Male 2000
Female 700
2. Two-way tables: it uses two characteristics for classification.
Example 2.2: Students who took introduction to statistics in 1998 E.C. by age and gender.
                        Gender
Age             Number of male   Number of female
19 and below    200              180
20-25           1415             385
26 and above    385              135
3. Higher ordered tables: results when we have more than two characteristics of classification.
For instance, we can classify the students who took introduction to statistics in 1998 by age,
gender and faculty.
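As a small illustration of how a two-way table is read, the following Python sketch stores the counts of Example 2.2 and computes the row totals (by age group) and column totals (by gender). The dictionary layout and variable names are assumptions made for this sketch only.

```python
# Counts from Example 2.2: rows are age groups, columns are genders.
two_way = {
    "19 and below": {"male": 200, "female": 180},
    "20-25": {"male": 1415, "female": 385},
    "26 and above": {"male": 385, "female": 135},
}

# Row totals: number of students in each age group.
row_totals = {age: sum(counts.values()) for age, counts in two_way.items()}

# Column totals: number of students of each gender.
column_totals = {}
for counts in two_way.values():
    for gender, number in counts.items():
        column_totals[gender] = column_totals.get(gender, 0) + number

grand_total = sum(row_totals.values())

print("Row totals:", row_totals)
print("Column totals:", column_totals)
print("Grand total:", grand_total)
```

The column totals reproduce the one-way table of Example 2.1 (2000 male and 700 female students), which is a quick consistency check between the two tables.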
There are many data collection techniques which are used to collect data for a study.
There are two types of data: primary and secondary data. Primary data refers to the statistical
material which the investigator originates for the purpose of inquiry. Secondary data, on the
other hand, refers to statistical material which the investigator does not originate
but obtains from someone else's records.
Primary methods of data collection: Those methods that aim at collecting primary data are
termed as primary method. These may involve data collection using observation, personal
interview, self administered questionnaire, mailed questionnaire etc.
Secondary method of data collection: Secondary data can be obtained from published or
unpublished documents: reports, journals, magazines, articles, etc.
Not every aggregate of numbers can be called statistical data. An aggregate of numbers is
statistical data when the numbers are
Comparable
Meaningful and
Collected for a well defined objective
Raw data: are collected data, which have not been organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
Array: an arrangement of raw numerical data in ascending or descending order of
magnitude.
It enables us to find the range of the data set easily and it also gives us some idea about
the general characteristics of the distribution.
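The raw data listed above (25, 10, 32, 18, 6, 93, 4) can be turned into an array with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
raw_data = [25, 10, 32, 18, 6, 93, 4]

# An array is the same data arranged in order of magnitude.
ascending = sorted(raw_data)
descending = sorted(raw_data, reverse=True)

# Once the data are ordered, the range is the last value minus the first.
data_range = ascending[-1] - ascending[0]

print("Ascending array:", ascending)
print("Descending array:", descending)
print("Range:", data_range)
```

In the ordered list it is also easy to see that 93 lies far from the other values, which is the kind of general characteristic an array makes visible.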
Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.
Primary source: a source of data that supplies first-hand information for the
immediate purpose.
Primary data: are data originally collected for the immediate purpose.
- Primary data are more expensive than secondary data.
Secondary source: individuals or agencies which supply data originally collected for
other purposes by them or others.
- Usually they are published or unpublished materials, records, reports, etc.
Secondary data: data collected from a secondary source.
The process of data collection from a primary source may involve field trials, laboratory
experiments, surveys (sample survey and census survey), etc.
Activity 2.1
1. Distinguish between primary and secondary data.
2. What are the various methods of collecting primary data? Give example of each.
3. Define secondary data. What are their sources?
4. Describe primary and secondary method of data collection. In what special circumstance
are the two methods suitable?
In this section, we will concentrate on some of the frequently used methods of organizing data.
The easiest method of organizing data is using a frequency distribution, which converts raw data
into a meaningful pattern for statistical analysis.
The main uses of a frequency distribution are
to organize data in a meaningful, intelligible way.
to enable one to determine the nature or shape of the distribution; how the observations
cluster around a central value; and how the values spread around the center of the data.
to facilitate computational procedures for measures of average and spread.
to enable one to draw charts and graphs for the presentation of data.
to enable one to make comparisons between data sets.
Frequency distribution: a grouping of data into categories showing the number of observations
in each mutually exclusive category.
Array: data put in an ascending or descending order of magnitude.
Grouped data: data presented in the form of a frequency distribution.
Frequency: the number of observations corresponding to a fixed value or to a class of values.
Relative frequency: the number obtained when the frequency of a class is divided by total
number of observations.
Generally, there are three basic types of frequency distributions: Categorical, Ungrouped and
Grouped frequency distributions.
The categorical frequency distribution is used for data which can be placed in specific categories
such as nominal or ordinal level data. For example, data such as political affiliation, religious
affiliation, blood type, marital status, or major field of study would use categorical frequency
distributions.
Example 1.1: The following data are on the political party affiliations of a sample of 40 statistics
students. D, R, and O stand for Democratic, Republican and other, respectively.
D D D D O R O R O R O R O D D R D D D R
R O R D R R O R R R R R O O R R D R D D
The classes for grouping are ‘Democratic’, ‘Republican’ and ‘Other’.
B B C B A C
D C C C B B
B A B C D C
A F B F C A
B C C A C D
There are five kinds of grades: A, B, C, D and F which may be used as the classes for constructing the
distribution. The procedure for constructing a frequency distribution for categorical data is given below.
(I) ( II ) ( III ) ( IV )
STEP 2. Tally the data and place the results in column (II)
STEP 3. Count the Tallies and put the results in column (III)
STEP 4. Calculate the percentage (%) of the frequency in each class by using the formula
$\% = \frac{f}{n} \times 100$, where f = frequency of the class (result in column (III)) and
n = total number of observations.
Percentages are not normally part of a frequency distribution, but they can be included since they are
important in different statistical analyses.
STEP 5. [For checking] find the total of column (III) and that of column (IV) and see that the total of
column (III) and that of column (IV) are n (total number of observations) and 100%
respectively.
Finally, the frequency distribution becomes as follows.
(I) Class   (II) Tally       (III) Frequency   (IV) Percent
A           /////            5                 16.7
B           ///// ////       9                 30.0
C           ///// ///// /    11                36.7
D           ///              3                 10.0
F           //               2                 6.7
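The tallying and percentage steps can also be sketched in Python. The block below counts the 30 letter grades listed above with collections.Counter and computes the percentage of each class as f / n × 100. The variable names and the print layout are choices made for this sketch.

```python
from collections import Counter

# The 30 letter grades from the example, row by row.
grades = (
    "B B C B A C "
    "D C C C B B "
    "B A B C D C "
    "A F B F C A "
    "B C C A C D"
).split()

n = len(grades)              # total number of observations
frequency = Counter(grades)  # frequency of each class (column III)

rows = []
for grade in sorted(frequency):
    f = frequency[grade]
    percent = f / n * 100    # column IV
    rows.append((grade, "/" * f, f, round(percent, 1)))

for grade, tally, f, percent in rows:
    print(f"{grade:<3}{tally:<13}{f:>3}{percent:>7.1f}")

# Check from STEP 5: the frequencies must add up to n.
total_frequency = sum(f for _, _, f, _ in rows)
print("Total frequency:", total_frequency)
```

Because each percentage is rounded to one decimal place, the rounded percentages may add up to slightly more or less than 100.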
An ungrouped frequency distribution is a table of all potential raw score values that could possibly occur in
the data along with their corresponding frequencies. An ungrouped frequency distribution is often
constructed for a small data set or a discrete variable.
To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores in the
collected data. Then make a columnar table of all potential raw score values arranged in order of
magnitude with the number of times a particular value is repeated, i.e., the frequency of that value. To
make counting easier, tallies can be used.
Example 2.1: The following data are the ages in years of 20 women who attended health education last year:
30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30, and 41.
29 / 1
30 //// 4
31 / 1
32 /// 3
33 / 1
35 // 2
36 // 2
37 / 1
39 / 1
41 /// 3
42 / 1
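A minimal Python sketch of the same construction is given below. Unlike the table above, it lists every potential age between the smallest and the largest value, so ages that never occur (34, 38 and 40) appear with frequency 0. The variable names are illustrative.

```python
from collections import Counter

ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

counts = Counter(ages)

# One row per potential value from the smallest to the largest score.
# A Counter returns 0 for values that do not occur in the data.
table = [(age, counts[age]) for age in range(min(ages), max(ages) + 1)]

for age, f in table:
    print(f"{age:>3}  {'/' * f:<5} {f}")

total = sum(f for _, f in table)
print("Total:", total)
```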
Class boundaries: the precise points which separate various classes rather than the values
included in any one of the classes. They are sometimes referred to as exact or true limits. They
leave no space for ambiguity and overlapping. A class boundary is located mid-way between the
upper class limit of a class and the lower class limit of the next higher class. They are carried out
to one more decimal place than the class limits.
Class mark: the point which divides the class into two equal parts. This is also known as class
mid-point. This can be determined by dividing the sum of the two limits or the sum of the two
boundaries by 2.
70 64 99 55 64 89 87 65 62 38 67 70 60 69 78 39 75 56 71 51
99 68 95 86 57 53 47 50 55 81 80 98 51 36 63 66 85 79 83 70
By grouping data into classes we can make the data much easier to read and understand. We
group these data by 10s. The smallest weight is 36 kg, thus the first class of weights is 31 kg up
to and including 40 kg.
Table 3.1: Distribution of weights.
Class Class boundary Count (Frequency)
31 – 40 30.5-40.5 3
41 – 50 40.5-50.5 2
51 – 60 50.5-60.5 8
61 – 70 60.5-70.5 12
71 – 80 70.5-80.5 5
81 – 90 80.5-90.5 6
91 - 100 90.5-100.5 4
Total 40
For this example, the first class is ‘31-40’. Lower limit of this class = 31; upper limit = 40. The
lower class boundary = 30.5; upper class boundary = 40.5. The width of the class = upper class
boundary - lower class boundary = 40.5-30.5 = 10. The class mark (class mid-point) of this class
is (31+40)/2 = 35.5. The values 36, 39, 38 are included in this class. Therefore, the frequency of
this class is 3.
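The quantities worked out for the first class can be computed for every class of Table 3.1. The Python sketch below assumes the weights are recorded to the nearest 1 kg, so the boundaries lie half a unit beyond the limits. The variable names are choices made for this illustration.

```python
weights = [70, 64, 99, 55, 64, 89, 87, 65, 62, 38, 67, 70, 60, 69, 78, 39, 75, 56, 71, 51,
           99, 68, 95, 86, 57, 53, 47, 50, 55, 81, 80, 98, 51, 36, 63, 66, 85, 79, 83, 70]

unit = 1  # assumed unit of measurement: weights recorded to the nearest 1 kg

# Class limits 31-40, 41-50, ..., 91-100 as in Table 3.1.
class_limits = [(lower, lower + 9) for lower in range(31, 100, 10)]

table = []
for lower, upper in class_limits:
    lcb = lower - unit / 2       # lower class boundary
    ucb = upper + unit / 2       # upper class boundary
    width = ucb - lcb            # class width
    mark = (lower + upper) / 2   # class mark (mid-point)
    frequency = sum(lower <= w <= upper for w in weights)
    table.append((lower, upper, lcb, ucb, width, mark, frequency))

for lower, upper, lcb, ucb, width, mark, frequency in table:
    print(f"{lower:>3} - {upper:<3}  {lcb:>5} - {ucb:<6} width={width:<5} mark={mark:<5} f={frequency}")
```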
Cumulative frequency (Cf) less than type – is the total frequency of all values (observations)
less than or equal to the upper class boundary for the given class.
Cumulative frequency (Cf) more than type – is the total frequency of all values (observations)
greater than or equal to the lower class boundary for the given class.
A tabular arrangement of class intervals together with their corresponding cumulative frequency
(either more than or less than type; as defined above) is called cumulative frequency distribution.
STEP 1. Find the maximum (Max) and the minimum (Min) observation, and then compute their range,
$R = \text{Max} - \text{Min}$.
STEP 2. Fix the number of classes desired (k). There are two ways to fix k:
– Fix k arbitrarily between 6 and 20, or
– Use Sturges' formula: $k = 1 + 3.322 \log_{10} N$, where N is the total frequency, and round
this value of k up to get an integer.
STEP 3. Find the class width (W) by dividing the range by the number of classes, $W = \frac{R}{k}$,
and round the number up to get an integer value.
STEP 4. Pick a suitable starting point less than or equal to the minimum value. This starting point is the
lower limit of the first class. Continue to add the class width to this lower limit to get the rest of
the lower limits.
STEP 5. Find the upper class limits. To find the upper class limit of the first class, subtract one unit of
measurement from the lower limit of the second class. Then continue to add the class width to this
upper limit so as to get the rest of the upper limits.
STEP 6. Compute the class boundaries as $LCB = LCL - \frac{1}{2}U$ and $UCB = UCL + \frac{1}{2}U$,
where LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary, UCB = upper
class boundary and U = one unit of measurement. The class boundaries are also half way between the
upper limit of one class and the lower limit of the next class.
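Steps 1 to 6 can be sketched as a single Python function. The function name class_intervals and its parameters are assumptions made for this illustration; it takes the minimum value as the starting point and assumes whole-number data unless another unit is given. The call uses the minimum (16), maximum (48) and number of observations (25) of the travel-time data in Example 3.2.

```python
import math


def class_intervals(minimum, maximum, n, unit=1, k=None):
    """Return (k, width, limits, boundaries) following Steps 1 to 6."""
    data_range = maximum - minimum                       # Step 1
    if k is None:                                        # Step 2: Sturges' formula
        k = math.ceil(1 + 3.322 * math.log10(n))
    width = math.ceil(data_range / k)                    # Step 3
    lower_limits = [minimum + i * width for i in range(k)]           # Step 4
    upper_limits = [lower + width - unit for lower in lower_limits]  # Step 5
    limits = list(zip(lower_limits, upper_limits))
    boundaries = [(lo - unit / 2, up + unit / 2) for lo, up in limits]  # Step 6
    return k, width, limits, boundaries


k, width, limits, boundaries = class_intervals(16, 48, 25)
print("Number of classes:", k)
print("Class width:", width)
for (lo, up), (lcb, ucb) in zip(limits, boundaries):
    print(f"{lo}-{up}   {lcb}-{ucb}")
```

When the range divided by k is already a whole number, rounding up does not widen the classes, so it is worth checking that the last upper limit still reaches the maximum observation. Passing k explicitly, for example k=8, corresponds to fixing the number of classes arbitrarily in Step 2.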
62 50 35 36 31 43 43 43
41 31 65 30 41 58 49 41
37 62 27 47 65 50 45 48
27 53 40 29 63 34 44 32
58 61 38 41 26 50 47 37
STEP 5. Upper limit of the first class = 31-1 = 30. And hence the upper class limits become
30 35 40 45 50 55 60 65
The lower and the upper class limits (Steps 4 and 5) can be written as follows.
Class limits
26 – 30
31 – 35
36 – 40
41 – 45
46 – 50
51 – 55
56 – 60
61 – 65
STEP 6. By subtracting 0.5 units of measurement from the lower class limits and by adding 0.5 units of
measurement to the upper class limits, we can get lower and upper class boundaries as follows.
Class boundaries
25.5 – 30.5
30.5 – 35.5
35.5– 40.5
40.5– 45.5
45.5– 50.5
50.5– 55.5
55.5– 60.5
60.5– 65.5
STEPS 7 and 8 are displayed in the following table (columns 3, 4 and 5&6 respectively).
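Class limits   Class boundaries   Tally          Frequency   Less than Cf   More than Cf
26 – 30        25.5 – 30.5        /////          5           5              40
31 – 35        30.5 – 35.5        /////          5           10             35
36 – 40        35.5 – 40.5        /////          5           15             30
41 – 45        40.5 – 45.5        ///// ////     9           24             25
46 – 50        45.5 – 50.5        ///// //       7           31             16
51 – 55        50.5 – 55.5        /              1           32             9
56 – 60        55.5 – 60.5        //             2           34             8
61 – 65        60.5 – 65.5        ///// /        6           40             6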
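The frequency and cumulative frequency columns can be recomputed from the 40 observations of this example. The Python sketch below uses the eight class limits listed above (26 – 30 up to 61 – 65) and builds the less than and more than type cumulative frequencies as running totals. The variable names are illustrative.

```python
from itertools import accumulate

observations = [62, 50, 35, 36, 31, 43, 43, 43,
                41, 31, 65, 30, 41, 58, 49, 41,
                37, 62, 27, 47, 65, 50, 45, 48,
                27, 53, 40, 29, 63, 34, 44, 32,
                58, 61, 38, 41, 26, 50, 47, 37]

# Class limits 26-30, 31-35, ..., 61-65 (class width 5).
class_limits = [(lower, lower + 4) for lower in range(26, 62, 5)]

frequencies = [sum(lo <= x <= up for x in observations) for lo, up in class_limits]

# Less than type: running total from the first class downwards.
less_than = list(accumulate(frequencies))

# More than type: running total from the last class upwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for (lo, up), f, lt, mt in zip(class_limits, frequencies, less_than, more_than):
    print(f"{lo} - {up}   f={f:<3} less than Cf={lt:<3} more than Cf={mt}")
```

The last less than value and the first more than value both equal the total number of observations, which is a convenient check on the table.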
Example 3.2: The following data are on the number of minutes to travel from home to work for a
group of automobile workers.
28 25 48 37 41 19 32 26 16 23 23 29 36
31 26 21 32 25 31 43 35 42 38 33 28.
Construct a frequency distribution for this data.
Solution:
Range = 48 – 16 = 32
$K = 1 + 3.322 \log_{10} 25 = 5.64 \approx 6$
Class width $W = 32/6 = 5.33 \approx 6$
Let the lower limit of the first class be 16 then the frequency distribution is as follows:
Class limit   Class boundaries   Tally        Frequency
16-21         15.5-21.5          ///          3
22-27         21.5-27.5          ///// /      6
28-33         27.5-33.5          ///// ///    8
34-39         33.5-39.5          ////         4
40-45         39.5-45.5          ///          3
46-51         45.5-51.5          /            1
Total                                         25
Table 3.2: The distribution of the time in minutes spent by automobile workers to travel from
home to work place.
This frequency distribution is more understandable than the raw data. We can see some features
of the data from this table. For instance, many observations are found in the second class and
third class. This in turn implies that many workers took around 22 to 33 minutes to travel from
home to work place.
Activity 2.2
1. In a biology experiment the lengths of 25 worms, measured to the nearest 0.1cm, were:
9.5 8.1 5.1 6.6 9.3 9.1 6.5 5.0 6.9 7.6 9.3 8.3 6.0
6.2 7.4 7.7 7.8 7.9 7.0 7.8 5.4 9.8 6.3 7.5 8.4
Construct a frequency distribution for the data by using Sturges' rule for the number of classes. What
do you think about the typical length of these worms?
Based on the type of frequency assigned to the classes we have three types of grouped frequency
distributions:
Absolute frequency distribution
Relative frequency distribution
Cumulative frequency distribution
The frequency distributions that we have seen in the previous examples (examples 3.2 and table 3.2) are
absolute frequency distributions because the frequencies assigned are absolute frequencies.
Example 3.3: Convert the above absolute frequency distribution in Example 3.2 to a relative frequency
distribution.
Solution: First we find the relative frequency of each class. The relative frequency of a class is the
frequency of the class divided by the total number of observations. For instance the relative frequency of
the first class is 3/25=0.12, the relative frequency of the second class is 6/25=0.24, and so on. Thus, the
relative frequency distribution is shown in Table 3.3.
Table 3.3: The distribution of the time in minutes spent by automobile workers to travel from home to
work place.
Time (in minute) Relative frequency
16-21 0.12
22-27 0.24
28-33 0.32
34-39 0.16
40-45 0.12
46-51 0.04
Total 1
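The conversion from absolute to relative frequencies can be sketched in Python as follows. The block starts from the 25 travel times of Example 3.2, counts them into the six classes and divides each frequency by the total number of observations. The variable names are choices made for this sketch.

```python
times = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36,
         31, 26, 21, 32, 25, 31, 43, 35, 42, 38, 33, 28]

class_limits = [(16, 21), (22, 27), (28, 33), (34, 39), (40, 45), (46, 51)]

n = len(times)

# Absolute frequency of each class.
frequencies = [sum(lo <= t <= up for t in times) for lo, up in class_limits]

# Relative frequency = class frequency / total number of observations.
relative = [f / n for f in frequencies]

# Multiplying a relative frequency by 100 gives its percentage form.
percent = [100 * r for r in relative]

for (lo, up), f, r, p in zip(class_limits, frequencies, relative, percent):
    print(f"{lo}-{up}   f={f:<2} relative={r:.2f}  percent={p:.0f}")
print("Total relative frequency:", round(sum(relative), 2))
```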
Note: Proportions may also be changed to percentages to obtain a percentage relative frequency
distribution.
Example 3.4: Convert the above relative frequency distribution to a percentage relative frequency
distribution.
Solution: We simply multiply the relative frequencies of the above relative frequency distribution by 100.
Table 3.4: The distribution of the time in minutes spent by automobile workers to travel from home to
work.
Definition 2.2: Cumulative frequency refers to the number of observations
that are below a specified value or that are above a specified value.
Note: Class boundaries are mostly used to obtain cumulative frequencies. Based on whether the
observations are bounded from above or from below we can have a cumulative less than or a
cumulative more than frequency distributions, respectively.
Example 2.8: Convert the absolute frequency distribution in Example 3.2 into less than and more
than cumulative frequency distributions.
Solution:
We use the class boundaries to form cumulative frequencies.
Table 3.5: The less than and more than type cumulative frequency distribution of the time in
minutes spent by automobile workers to travel from home to work place.
Activity 2.3
1. The following are the scores of 32 students who took statistics test:
55 70 80 75 90 80 60 100 95 70 75 85 80 80 70 95
100 80 85 70 85 90 80 75 85 70 90 60 80 70 85 80
Organize this data set using an absolute frequency distribution consisting of 7 classes. Start the first
class with the minimum value in the data set. Construct also the relative frequency distribution, the
less than cumulative frequency distribution, and the more than cumulative frequency distribution.
What do you think about the typical score of these students? How many students score below the
lower limit of the third class?
1. Histogram
A histogram consists of a set of adjacent rectangles whose bases are marked off by class boundaries (not
class limits) along the horizontal axis and whose heights are proportional to the frequencies
associated with the respective classes.
The importance of a histogram is that it enables us to organize and present data graphically so as
to draw attention to certain important features of the data. For instance, a histogram can often
indicate how symmetric the data are; how spread out the data are; whether there are intervals
having high levels of data concentration; whether there are gaps in the data; and whether some
data values are far apart from others.
Example: Construct a histogram for the frequency distribution of the time spent by the
automobile workers.
Table 3.6: The distribution of the time in minutes spent by automobile workers to travel from
home to work.
2. Frequency Polygon
A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical axis
and their respective class marks along the horizontal axis. The plotted points are then joined by straight
line segments.
[Figure: frequency polygon. Vertical axis: Frequency; horizontal axis: Class Marks.]
A cumulative frequency polygon can be drawn on a less than or a more than cumulative frequency basis. Place
the class boundaries along the horizontal axis and the corresponding cumulative frequencies (either less
than or more than cumulative frequencies) along the vertical axis. Then join the cross points by a free
hand curve.
Example: the data in the above example can be presented using either a less than or a more than
cumulative frequency polygon as given below (i) and (ii) respectively.
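[Figure (i): less than type cumulative frequency polygon. Vertical axis: Less than type cumulative frequencies; horizontal axis: class boundaries.]
[Figure (ii): more than type cumulative frequency polygon. Vertical axis: More than type cumulative frequencies; horizontal axis: class boundaries.]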
4. Line graph
Data from a frequency table can be graphically pictured by a line graph which plots the
successive values on the horizontal axis and indicates the corresponding frequency by the height
of a vertical line. This method of data presentation is especially suitable for discrete data. For
instance, data on the number of family members, number of car accidents, number of defective items
produced by machines, etc. can be presented well using a line graph.
Example: The following data are on the number of seeds germinated out of six seeds planted in
each of 50 pots.
1 1 1 2 6 3 3 4 2 4 3 2 1 5 2 1 3 6 2 2 3 1 1 4 3
2 2 2 2 3 0 3 1 2 1 2 3 1 1 3 3 2 1 2 1 1 3 1 5 1
1. Bar-charts
i) Simple bar charts: are diagrammatic representations of data in which the data are
represented by series of vertical or horizontal bars, the height (or length) of each bar
indicating the size of the figure represented.
Example: Draw a bar chart for the following coffee production data.
[Figure: simple bar chart of coffee production. Horizontal axis: Production year (1990 to 1995).]
ii) Component bar charts: are like ordinary bar charts except that the bars are subdivided
into two or more component parts. It is used to represent total figure in terms of
components. The components are proportional in size to the component parts of the
total quantity being represented by each bar.
a. Actual component bar charts: are charts in which the overall height of the bar and the
individual component lengths represent actual figures.
Example: Draw an actual component bar chart for the following data on production of coffee
(in 1000 tons).
Table: Coffee productions from 1991 to 1993 by region.
Production year 1991 1992 1993
[Figure: actual component bar chart. Vertical axis: Amount of coffee in 1000 tons; horizontal axis: Production year (1991 to 1993); components: Region A and Region B.]
b. Percentage component bar charts: are charts in which the individual component
lengths represent the percentage forms of the overall total. Note that a series of such bars
will all be of the same total height, i.e. 100 percent.
Example: Draw a percentage component bar chart for the above data on production of coffee (in
1000 tons).
Solution: First convert the component figures into percentage forms of their corresponding totals
to get the following result.
Table: Coffee productions from 1991 to 1993 by region.
[Figure: percentage component bar chart; vertical axis: amount of coffee in percent; horizontal axis: production year, 1991 to 1993; components: regions A and B]
iii) Multiple bar charts: are charts in which figures are shown as separate bars adjoining
each other. The height of each bar represents the actual value of the component
figures.
Example: Draw a multiple bar chart for the data on production of coffee.
[Figure: multiple bar chart; vertical axis: amount of coffee in 1000 tons; horizontal axis: production year, 1991 to 1993; adjoining bars: regions A and B]
2. Pie-chart
A pie chart is a circle divided by radial lines into sections or sectors so that the area of each sector is
proportional to the size of the figure represented.
Pie-chart construction:
- Calculate the percentage frequency of each component. It is given by $\frac{f_i}{n} \times 100$.
- Calculate the degree measure of each sector. It is given by $\frac{f_i}{n} \times 360^\circ$.
- Draw the circle using a protractor and compass.
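The construction steps can be sketched in Python. This is only an illustration: the function name `pie_sectors` and the expenditure figures are hypothetical and are not taken from the handout.

```python
def pie_sectors(frequencies):
    """Return {label: (percentage, angle in degrees)} for each component."""
    n = sum(frequencies.values())
    return {label: (f / n * 100, f / n * 360) for label, f in frequencies.items()}

# Hypothetical expenditure figures, chosen only to illustrate the arithmetic.
expenditure = {"Food": 50, "Clothing": 30, "House rent": 20}
sectors = pie_sectors(expenditure)
for label, (percent, angle) in sectors.items():
    print(f"{label}: {percent:.1f}% -> {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check before drawing the sectors.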
Example: Draw a pie-chart to represent the following data on a certain family expenditure.
Item
Food
Clothing
House rent
Fuel and light
Miscellaneous
Activity 2.4
1. The following data are the blood types of 50 volunteers at a blood plasma donation clinic:
O A O AB A A O O B A O A AB B O O O A B A A O A A B O B A O AB
A O O A B A A A O B O O A O A B O AB A O
2. The following table gives the number of deaths in a certain country in 1987 due to accidents for
individuals in various classifications.
Classification Number of deaths
Pedestrians 1699
Bicyclists 280
Motorcyclists 650
Represent the data using both a bar chart and a pie chart. Which of the charts is more
informative?
3. Pictogram
A pictogram is a device used to represent data by means of pictures or small symbols. It is customary to
represent a unit value of the data by a standard symbol or picture and the whole quantity by an
appropriate number of repetitions of that symbol. The symbol should be simple and
clear for understanding.
Example: The following table shows the orange production in a plantation from production year
1990-1993. Represent the data by a pictogram.
Chapter Three
A single value that describes the characteristics of the entire mass of data is called a measure of
central tendency or an average.
We say a measure of central tendency is best if it possesses most of the following properties. It should:
For instance, a data set consisting of six measurements 21, 13, 54, 46, 32 and 37 is represented by
$x_1, x_2, x_3, x_4, x_5$ and $x_6$, where $x_1 = 21$, $x_2 = 13$, $x_3 = 54$, $x_4 = 46$, $x_5 = 32$ and $x_6 = 37$.
Their sum becomes $\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203$.
Similarly, $x_1^2 + x_2^2 + \dots + x_n^2 = \sum_{i=1}^{n} x_i^2$.
Example:
Let $\sum_{i=1}^{12} x_i = 26$, $\sum_{i=1}^{12} y_i = 17$, $\sum_{i=1}^{12} x_i^2 = 484$ and $\sum_{i=1}^{12} y_i^2 = 362$.
Find I) $\sum_{i=1}^{12} (4x_i + 3y_i)$, II) $\sum_{i=1}^{12} 2x_i(x_i - 7)$
Solution: I) $\sum_{i=1}^{12} (4x_i + 3y_i) = 4\sum_{i=1}^{12} x_i + 3\sum_{i=1}^{12} y_i = 4(26) + 3(17) = 155$
II) $\sum_{i=1}^{12} 2x_i(x_i - 7) = 2\sum_{i=1}^{12} x_i^2 - 14\sum_{i=1}^{12} x_i = 2(484) - 14(26) = 604$
Several types of averages or measures of central tendency can be defined; the most common are
- the mean
- the mode
- the median
3.3.1. The Mean
There are four types of means: Arithmetic mean, Weighted arithmetic mean, Harmonic mean and
Geometric mean.
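Arithmetic mean is defined as the sum of the measurements of the items divided by the total
number of items. For observations $x_1, x_2, \dots, x_n$ it is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.
When the data are arranged or given in the form of an ungrouped frequency distribution, the
formula for the mean is $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$, where $f_i$ is the frequency of the value $x_i$.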
Example 1: You measure the body lengths (in inches) of 10 full-term infants at birth and record
the following:
17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75
Example 2: Monthly incomes of fourth year regular students are given in the following
frequency distribution.
Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5
Number of students 6 9 15 25 13 7 5
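As a minimal sketch, the two examples can be worked in Python. The helper names `mean` and `frequency_mean` are chosen here for illustration; the data are the body lengths of Example 1 and the income distribution of Example 2.

```python
def mean(values):
    """Arithmetic mean of raw (ungrouped) observations."""
    return sum(values) / len(values)

def frequency_mean(values, freqs):
    """Arithmetic mean of an ungrouped frequency distribution."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# Example 1: body lengths (in inches) of 10 infants.
lengths = [17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75]
mean_length = mean(lengths)            # 180.75 / 10

# Example 2: monthly incomes (birr) and number of students.
incomes = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5]
students = [6, 9, 15, 25, 13, 7, 5]
mean_income = frequency_mean(incomes, students)   # 6670 / 80

print(round(mean_length, 3), round(mean_income, 3))
```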
Arithmetic Mean for Grouped Frequency Distribution
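If data are given in the form of a continuous frequency distribution, the sample mean can be
computed as
$\bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 m_1 + f_2 m_2 + \dots + f_k m_k}{f_1 + f_2 + \dots + f_k}$
where $m_i$ is the class mark and $f_i$ is the frequency of the $i$th class.
Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.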
Example: The following table gives the daily wages of laborers. Calculate the average daily
wages paid to a laborer.
Number of laborers 3 4 5 6 6 4 3
The sum of the deviations of the items from their arithmetic mean is zero. This means the
algebraic sum of the deviations of a set of numbers $x_1, x_2, \dots, x_n$ from their mean $\bar{x}$ is zero.
That is, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
The sum of the squares of the deviations of a set of observations from any number, say A, is
minimum when $A = \bar{x}$. That is, $\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - A)^2$ for any number A.
When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of $n_1$ observations of
group 1, $\bar{x}_2$ is the mean of $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of $n_k$ observations
of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken together is
given by $\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k}$.
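If a wrong figure has been used in calculating the mean, we can correct it if we know the
correct figure that should have been used. Let
$x_{\text{wrong}}$ denote the wrong figure used in calculating the mean,
$x_{\text{correct}}$ be the correct figure that should have been used, and
$\bar{x}_{\text{wrong}}$ be the wrong mean calculated using $x_{\text{wrong}}$; then the correct mean, $\bar{x}_{\text{correct}}$, is given by
$\bar{x}_{\text{correct}} = \frac{n\bar{x}_{\text{wrong}} - x_{\text{wrong}} + x_{\text{correct}}}{n}$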
Solution:
Example 2: The average weight of 10 students was calculated to be 65 kg, but later it was
discovered that one measurement was misread as 40 kg instead of 80 kg. Calculate the corrected
average weight.
Solution:
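One way to check Example 2 is the short sketch below. The function name `corrected_mean` is hypothetical; the idea is simply to rebuild the total, swap the wrong figure for the correct one, and divide by n again.

```python
def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Remove the misread value from the total, add the correct one, re-average."""
    return (n * wrong_mean - wrong_value + correct_value) / n

# Example 2: mean of 65 kg over 10 students, 40 kg recorded instead of 80 kg.
corrected = corrected_mean(65, 10, 40, 80)
print(corrected)  # (650 - 40 + 80) / 10 = 69.0
```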
Exercise: The average score on the mid-term examination of 25 students was 75.8 out of 100.
After the mid-term exam, however, a student whose score was 41 out of 100 dropped the course.
What is the average/mean score among the 24 students?
In finding the arithmetic mean, all items were assumed to be of equal importance (each value in
the data set has equal weight). When the observations have different weights, we use a weighted
average. Weights are assigned to each item in proportion to its relative importance.
If $x_1, x_2, \dots, x_k$ represent values of the items and $w_1, w_2, \dots, w_k$ are the corresponding weights, then
the weighted mean $\bar{x}_w$ is given by $\bar{x}_w = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$
Example: A student’s final marks in Mathematics, Physics, Chemistry and Biology are
respectively 82, 80, 90 and 70. If the respective credits received for these courses are 3, 5, 3 and
1, determine the approximate average mark the student has got for one course.
Solution: We use a weighted arithmetic mean, the weight associated with each course being taken as
the number of credits received for the corresponding course.
$x_i$ 82 80 90 70
$w_i$ 3 5 3 1
Therefore $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 82) + (5 \times 80) + (3 \times 90) + (1 \times 70)}{3 + 5 + 3 + 1} = \frac{986}{12} \approx 82.17$
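The same weighted mean can be computed with a few lines of Python. This is a sketch; `weighted_mean` is a name introduced here, and the marks and credits are those of the example.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

marks = [82, 80, 90, 70]      # Mathematics, Physics, Chemistry, Biology
credits = [3, 5, 3, 1]
average_mark = weighted_mean(marks, credits)   # 986 / 12
print(round(average_mark, 2))
```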
Exercise: If a student gets A in 4 cr. hrs, B in 3 cr. hrs and D in 2 cr. hrs courses, what is his
GPA in this semester?
Values 4 3 1
Weight 4 3 2
- Arithmetic mean has a rigidly defined mathematical formula so that its value is always
definite.
- It is calculated based on all observations.
- Arithmetic mean is simple to calculate and easy to understand.
- It doesn’t need arrangement of data in increasing or decreasing order.
- Arithmetic mean is also capable of further algebraic treatment.
- It affords a good standard of comparison.
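Geometric Mean: It is used when observed values are measured as ratios, percentages,
proportions, indices or growth rates.
$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
If the observed values have frequencies $f_1, f_2, \dots, f_k$, then $GM = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}}$, where $n = f_1 + f_2 + \dots + f_k$.
Solution:
Values 2 4 6 8 10 Total
Frequencies 1 2 2 2 1 8
$GM = \sqrt[8]{2 \times 4^2 \times 6^2 \times 8^2 \times 10} \approx 5.41$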
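A small sketch of the geometric mean with frequencies follows. The function name is hypothetical, and working with logarithms is a choice made here to avoid very large products; it gives the same value as taking the n-th root of the product directly.

```python
import math

def geometric_mean(values, freqs=None):
    """n-th root of the product of x_i ** f_i, computed through logarithms."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n)

gm = geometric_mean([2, 4, 6, 8, 10], [1, 2, 2, 2, 1])
print(round(gm, 2))
```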
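Harmonic Mean: is a suitable measure of central tendency when the data pertain to speed, rate and
time.
$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \dots + \frac{1}{x_n}}$
If the observed values have frequencies $f_1, f_2, \dots, f_k$, then
$HM = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}} = \frac{f_1 + \dots + f_k}{\frac{f_1}{x_1} + \dots + \frac{f_k}{x_k}}$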
Example: A motorist travels 480 km on each of 3 days. She travels for 10 hours at a rate of 48 km/hr on the 1st day,
for 12 hours at a rate of 40 km/hr on the 2nd day and for 15 hours at a rate of 32 km/hr on the 3rd day.
What is her average speed?
$HM = \frac{3}{\frac{1}{48} + \frac{1}{40} + \frac{1}{32}} \approx 38.92$ km/hr
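Because the distance covered is the same on each day, the harmonic mean of the three speeds should equal total distance divided by total time. The sketch below checks this; the function and variable names are chosen here for illustration.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)

speeds = [48, 40, 32]   # km/hr on days 1, 2, 3
hours = [10, 12, 15]    # hours driven on days 1, 2, 3

hm = harmonic_mean(speeds)
total_distance = sum(s * h for s, h in zip(speeds, hours))   # 3 * 480 km
overall_speed = total_distance / sum(hours)                  # 1440 / 37

print(round(hm, 2), round(overall_speed, 2))  # both about 38.92 km/hr
```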
1. $\bar{x} \ge GM \ge HM$
2. For two observations, $\bar{x} \times HM = GM^2$
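The median of a set of items (numbers) arranged in order of magnitude (i.e. in an array form) is the
middle value or the arithmetic mean of the two middle values. We shall denote the median of
$x_1, x_2, \dots, x_n$ by $\tilde{x}$. For ungrouped data the median is obtained by
$\tilde{x}$ = the $\left(\frac{n+1}{2}\right)$th item if n is odd, and the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th items if n is even.
For grouped data the median is obtained by
$\tilde{x} = L_{med} + \left(\frac{\frac{n}{2} - F}{f_{med}}\right) W$
where $L_{med}$ is the lower class boundary of the median class, $f_{med}$ is the frequency of the median class, $W$ is the class width and
$F$ = sum of frequencies of all classes lower than the median class (in other words it is the
cumulative frequency immediately preceding the median class).
The median class is the class with the smallest cumulative frequency greater than or equal to $\frac{n}{2}$.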
Example 1: The birth weights in pounds of five babies born in a hospital on a certain day are 9.2,
6.4, 10.5, 8.1 and 7.8. Find the median weight of these five babies.
Example 2: The following table gives the distribution of the weekly wages of employees of a small
firm.
126 and below 3
127 – 135 5
136 – 144 9
145 – 153 12
154 – 162 5
163 – 171 4
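Both median examples can be sketched in Python. Several things here are assumptions: the function names, reading the two table columns as weekly wage class and number of employees, class boundaries half a unit outside the class limits (class width 9), and a lower boundary of 117.5 for the open-ended first class (which does not affect the result, since the median falls in a later class).

```python
def median(values):
    """Middle value of the sorted data, or mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def grouped_median(lower_boundaries, width, freqs):
    """Lower boundary + ((n/2 - F) / f) * W for the median class."""
    n = sum(freqs)
    cumulative = 0                      # F: cumulative frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= n / 2:     # first class whose cumulative frequency reaches n/2
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f

# Example 1: birth weights in pounds.
baby_median = median([9.2, 6.4, 10.5, 8.1, 7.8])

# Example 2: weekly wages (assumed boundaries; first class is open-ended).
boundaries = [117.5, 126.5, 135.5, 144.5, 153.5, 162.5]
employees = [3, 5, 9, 12, 5, 4]
wage_median = grouped_median(boundaries, 9, employees)

print(baby_median, round(wage_median, 2))
```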
Merits of median
Demerits of median
- It is not capable of further algebraic treatment.
- It is not a good representative of the data if the number of items (data) is small.
- The arrangement of items in order of magnitude is sometimes a very tedious process if the number
of items is very large.
The mode or the modal value is the most frequently occurring score/observation in a series and
denoted by x̂ . Note that the mode may not exist in the series or, even if it does exist, it may not be
unique.
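For grouped data the mode is obtained by
$\hat{x} = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W$
where $L_{mod}$ is the lower class boundary of the modal class, $W$ is the class width,
$\Delta_1$ = the difference between the frequency of the modal class and the frequency of the class preceding it, and
$\Delta_2$ = the difference between the frequency of the modal class and the frequency of the class following it.
The modal class is the class with the highest frequency in the distribution.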
Example 1: The marks obtained by ten students in a semester exam in statistics are: 70, 65, 68, 70,
75, 73, 80, 70, 83 and 86. Find the mode of the students’ marks.
Example 2: Find the mode for the frequency distribution of the birth weight (in kilogram) of 30
children given below.
No. of children 5 5 9 4 4 3
Solution: The third class, whose lower class boundary is 2.7, is the modal class since it has the highest frequency (9).
$\hat{x} = 2.7 + \left(\frac{4}{4 + 5}\right) \times 0.4 \approx 2.878$
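The two mode examples can be sketched as follows. The helper names are hypothetical; `modes` returns every most frequent value (an empty list when no value repeats), and `grouped_mode` takes the modal class frequency together with the frequencies of its neighbouring classes.

```python
from collections import Counter

def modes(values):
    """All values with the highest frequency; empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

def grouped_mode(lower_boundary, width, f_prev, f_modal, f_next):
    """Lower boundary + (d1 / (d1 + d2)) * W for the modal class."""
    d1 = f_modal - f_prev
    d2 = f_modal - f_next
    return lower_boundary + d1 / (d1 + d2) * width

# Example 1: marks of ten students.
mark_modes = modes([70, 65, 68, 70, 75, 73, 80, 70, 83, 86])

# Example 2: modal class with lower boundary 2.7, width 0.4, frequencies 5, 9, 4.
weight_mode = grouped_mode(2.7, 0.4, 5, 9, 4)

print(mark_modes, round(weight_mode, 3))
```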
Merits of mode
Demerits of mode
- Mode may not exist in the series and if it exists it may not be a unique value.
- It does not fulfill most of the requirements of a good measure of central tendency
3.3.4 Quantiles
Quantiles are values which divide the data set arranged in order of magnitude into certain equal
parts. They are averages of position (non-central tendency). Some of these are quartiles, deciles and
percentiles.
I. Quartiles: are values which divide the data set into four equal parts, denoted by $Q_1, Q_2$ and $Q_3$. The
first quartile is also called the lower quartile and the third quartile is the upper quartile. The second
quartile is the median.
For ungrouped data:
Let $Q_j$ be the $j$th quartile value for $j = 1, 2, 3$. Then
$Q_j = \left(\frac{j(n+1)}{4}\right)$th item; $j = 1, 2, 3$.
For grouped data:
$Q_j = L_{Q_j} + \left(\frac{\frac{jn}{4} - F_{Q_j}}{f_{Q_j}}\right) W$; $j = 1, 2, 3$.
The $j$th quartile class is the class with the smallest cumulative frequency greater than or equal
to $\frac{jn}{4}$.
II. Deciles: are values dividing the data into ten equal parts, denoted by $D_1, D_2, \dots, D_9$. The fifth decile
is the median.
For ungrouped data:
Let $D_j$ be the $j$th decile value for $j = 1, 2, \dots, 9$. Then
$D_j = \left(\frac{j(n+1)}{10}\right)$th item; $j = 1, 2, \dots, 9$
For grouped data:
$D_j = L_{D_j} + \left(\frac{\frac{jn}{10} - F_{D_j}}{f_{D_j}}\right) W$; $j = 1, 2, \dots, 9$
The $j$th decile class is the class with the smallest cumulative frequency greater than or equal
to $\frac{jn}{10}$.
III. Percentiles: are values which divide the data into one hundred equal parts, denoted by $P_1, P_2, \dots, P_{99}$.
The fiftieth percentile is the median.
For ungrouped data:
$P_j = \left(\frac{j(n+1)}{100}\right)$th item; $j = 1, 2, 3, \dots, 99$
For grouped data:
$P_j = L_{P_j} + \left(\frac{\frac{jn}{100} - F_{P_j}}{f_{P_j}}\right) W$; $j = 1, 2, 3, \dots, 99$
The $j$th percentile class is the class with the smallest cumulative frequency greater than or equal
to $\frac{jn}{100}$.
Interpretations
1. $Q_j$ is the value below which $(j \times 25)$ percent of the observations in the series are found
(where $j = 1, 2, 3$). For instance, $Q_3$ is the value below which 75 percent of the observations in the
given series are found.
2. $D_j$ is the value below which $(j \times 10)$ percent of the observations in the series are found
(where $j = 1, 2, \dots, 9$). For instance, $D_4$ is the value below which 40 percent of the values are found
in the series.
3. $P_j$ is the value below which $j$ percent of the total observations are found (where $j = 1, 2, 3, \dots, 99$).
For example, 73 percent of the observations in a given series are below $P_{73}$.
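For ungrouped data, the position rule j(n + 1)/parts can be sketched in Python. The function name and the seven data values are hypothetical. Linear interpolation between neighbouring items when the position is not a whole number is an assumption made here; the handout does not state how fractional positions are handled.

```python
def quantile(values, j, parts):
    """Value at the j(n + 1)/parts-th item of the sorted data (1-based position).

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    Fractional positions are resolved by linear interpolation (an assumption).
    """
    s = sorted(values)
    n = len(s)
    position = j * (n + 1) / parts
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return s[0]
    if whole >= n:
        return s[-1]
    return s[whole - 1] + fraction * (s[whole] - s[whole - 1])

# Hypothetical data set with n = 7 observations.
data = [14, 2, 10, 6, 12, 4, 8]
q1, q2, q3 = (quantile(data, j, 4) for j in (1, 2, 3))
d5 = quantile(data, 5, 10)
p50 = quantile(data, 50, 100)
print(q1, q2, q3, d5, p50)
```

In this sketch the second quartile, the fifth decile and the fiftieth percentile coincide, as expected for the median.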
Exercise: The following table presents the male population of a certain region in Ethiopia.
c) 65 th and 75 th percentiles
Male population 2580 3737 4620 5200 7250 620 297 355
Chapter Four
The average or central value is of little use unless the degree of variation, which occurs about it,
is given. If the scatter about the measure of central tendency is very large, the average is not a
typical value. Therefore it is necessary to develop a quantitative measure of the dispersion (or
variation) of the values about the average. Measures of variation are statistical measures, which
provide ways of measuring the extent to which the data are dispersed or spread out.
To judge the reliability of a measure of central tendency
To compare two or more sets of data with regard to their variability
To control variability itself like in quality control, body temperature, etc
To make further statistical analysis or to facilitate the use of other statistical measures.
In case the two sets of data are expressed in different units, however, such as quintals of sugar
versus tonnes of sugarcane, or if the average sizes are very different, such as a manager’s salary
versus a worker’s salary, the absolute measures of dispersion are not comparable. In such cases
measures of relative dispersion should be used.
Range (R) is defined as the difference between the largest and the smallest observation in a given
set of data. That is, $R = x_{max} - x_{min}$, where $x_{max}$ and $x_{min}$ are the largest and the smallest
observations in the series respectively.
In the case of grouped data, the range is found by taking the difference between the class mark of the last
class and that of the first class. That is, $R = M_{last} - M_{first}$, where $M_{last}$ and $M_{first}$ are the class
marks of the last class and that of the first class respectively.
The relative range (RR) is given by
$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{R}{x_{max} + x_{min}}$ ........ for ungrouped data
$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{R}{M_{last} + M_{first}}$ ......... for grouped data
- Range and relative range are easy to calculate and simple to understand.
- Both cannot be computed for grouped data with open ended classes.
- They do not tell us anything about the distribution of values in the series.
Example 1: Find the range and relative range for the monthly salary of ten workers in a certain
paint factory given below.
462 480 534 624 498 552 606 588 516 570
Solution:
Example 2: Find the values of the range and relative range for the following frequency
distribution: which shows the distribution of the maximum loads supported by a certain number
of cables.
93 – 97 2
98 – 102 5
103 – 107 12
108 – 112 17
113 – 117 14
118 – 122 6
123 – 127 3
128 – 132 1
Solution:
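Both range examples can be checked with the sketch below. The function name is hypothetical; for the grouped data it uses the class marks of the first class (93 to 97) and the last class (128 to 132).

```python
def range_and_relative_range(smallest, largest):
    """Return (R, RR) where R = largest - smallest and RR = R / (largest + smallest)."""
    r = largest - smallest
    return r, r / (largest + smallest)

# Example 1: monthly salaries of ten workers.
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]
r1, rr1 = range_and_relative_range(min(salaries), max(salaries))

# Example 2: class marks of the first and last classes.
first_mark = (93 + 97) / 2      # 95.0
last_mark = (128 + 132) / 2     # 130.0
r2, rr2 = range_and_relative_range(first_mark, last_mark)

print(r1, round(rr1, 4), r2, round(rr2, 4))
```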
The mean deviation (MD) measures the average deviation of a set of observations about their
central value, generally the mean or the median, ignoring the plus/minus sign of the deviations.
$MD = \frac{\sum |x_i - A|}{n}$ .... for ungrouped data, where A is a central measure (the mean or the median)
$MD = \frac{\sum f_i |m_i - A|}{n}$ .... for grouped frequency distribution, where $m_i$ is the class mark of the $i$th class, $f_i$ is the frequency of
the $i$th class and $n = \sum f_i$.
The mean deviation about the arithmetic mean is, therefore, given by
$MD = \frac{\sum |x_i - \bar{x}|}{n}$ .... for ungrouped data
$MD = \frac{\sum f_i |m_i - \bar{x}|}{n}$ .... for grouped frequency distribution, where $m_i$ is the class mark of
the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.
The coefficient of mean deviation (CMD) is the ratio of the mean deviation of the observations to
their appropriate measure of central tendency: the arithmetic mean or the median.
In general, $CMD = \frac{MD}{A}$, where A is a measure of central tendency: the arithmetic mean or the
median.
That is, CMD about the arithmetic mean is given by $CMD = \frac{MD}{\bar{x}}$, where MD is the mean
deviation calculated about the arithmetic mean. On the other hand, CMD about the median is
given by $CMD = \frac{MD}{\tilde{x}}$, in which case MD is calculated about the median of the observations.
- It is not capable of further mathematical treatments and it is not a very accurate measure
of dispersion.
The Variance
Variance is the arithmetic mean of the square of the deviation of observations from their
arithmetic mean.
Population Variance ($\sigma^2$)
For ungrouped data
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{1}{N}\left(\sum x_i^2 - N\mu^2\right)$, where $\mu$ is the population arithmetic mean
and N is the total number of observations in the population.
For grouped data
$\sigma^2 = \frac{\sum f_i (m_i - \mu)^2}{N}$, where $\mu$ is the population arithmetic
mean, $m_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $N = \sum f_i$.
Sample Variance ($S^2$)
For ungrouped data
$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left(\sum x_i^2 - n\bar{x}^2\right)$, where $\bar{x}$ is the sample arithmetic mean
and n is the total number of observations in the sample.
The Standard Deviation
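value ($x_i$) 3 6 9 12 15 total
frequency ($f_i$) 1 4 10 3 2 20
$f_i x_i$ 3 24 90 36 30 183
$x_i - \bar{x}$ -6.15 -3.15 -0.15 2.85 5.85
$(x_i - \bar{x})^2$ 37.8225 9.9225 0.0225 8.1225 34.2225
$f_i (x_i - \bar{x})^2$ 37.8225 39.69 0.225 24.3675 68.445 170.55
$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{183}{20} = 9.15$, where $n = \sum_{i=1}^{5} f_i = 20$
and $S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{170.55}{19} \approx 8.976$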
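The table above can be reproduced with a short sketch. The function name `frequency_variance` is introduced here for illustration; it returns the mean and the sample variance (divisor n - 1), and the standard deviation is the square root of the variance.

```python
import math

def frequency_variance(values, freqs):
    """Return (mean, sample variance) of an ungrouped frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    squared_deviations = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return mean, squared_deviations / (n - 1)

mean_x, s2 = frequency_variance([3, 6, 9, 12, 15], [1, 4, 10, 3, 2])
s = math.sqrt(s2)   # sample standard deviation
print(round(mean_x, 2), round(s2, 3), round(s, 3))
```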
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
Coefficient of variation is used in problems where we want to compare the variability of
two or more different series. Coefficient of variation is the ratio of the standard deviation to the
arithmetic mean, usually expressed in percent.
$CV = \frac{S}{\bar{x}} \times 100$, where S is the standard deviation of the observations.
A distribution having a smaller coefficient of variation is said to be less variable or more consistent or
more uniform or more homogeneous.
Example: Last semester, the students of Biology and Chemistry Departments took Stat 273
course. At the end of the semester, the following information was recorded.
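                     Biology   Chemistry
Mean score           79        64
Standard deviation   23        11
Compare the relative dispersions of the two departments’ scores using an appropriate measure.
Solution:
Biology: $CV = \frac{S}{\bar{x}} \times 100 = \frac{23}{79} \times 100 \approx 29.11\%$
Chemistry: $CV = \frac{S}{\bar{x}} \times 100 = \frac{11}{64} \times 100 \approx 17.19\%$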
Interpretation: Since the CV of Biology Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the
distribution of Biology students’ scores compared with that of Chemistry students.
Example: The following table illustrates the frequency distribution of masses of 100 male
students in Gander University.
No. of students 5 18 42 27 8
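Solution:
class mark ($m_i$) 61 64 67 70 73
$\sum_{i=1}^{5} f_i m_i = 6745$, $\sum_{i=1}^{5} f_i m_i^2 = 455803$, $n = \sum_{i=1}^{5} f_i = 100$
and $\bar{x} = \frac{\sum_{i=1}^{5} f_i m_i}{n} = \frac{6745}{100} = 67.45$
a) $S^2 = \frac{1}{n - 1}\left(\sum_{i=1}^{5} f_i m_i^2 - \frac{\left(\sum_{i=1}^{5} f_i m_i\right)^2}{n}\right) = \frac{1}{99}\left(455803 - \frac{(6745)^2}{100}\right) \approx 8.61$
b) $S = \sqrt{S^2} = \sqrt{8.61} \approx 2.93$
c) $CV = \frac{S}{\bar{x}} \times 100 = \frac{2.93}{67.45} \times 100 \approx 4.344$
d) $MD = \frac{\sum f_i |m_i - \bar{x}|}{n} = \frac{226.5}{100} = 2.265$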
Variance
– It removes most of the demerits or drawbacks of the measures of dispersion discussed so far.
– Its unit is the square of the unit of measurement of values. For example, if the variable is
measured in kg, the unit of variance is kg2.
– It is calculated based on all the observations/data in the series.
– It gives more weight to extreme values and less to those which are near to the mean.
Standard Deviation
A standard score is a measure that describes the relative position of a single score in the entire
distribution of scores in terms of the mean and standard deviation. It also gives us the number of
standard deviations a particular observation lies above or below the mean.
Population standard score: $Z = \frac{x - \mu}{\sigma}$, where x is the value of the observation, and $\mu$ and $\sigma$ are the
mean and standard deviation of the population respectively.
Sample standard score: $Z = \frac{x - \bar{x}}{S}$, where x is the value of the observation, and $\bar{x}$ and S are the mean
and standard deviation of the sample respectively.
Interpretation:
Example: Two sections were given an exam in a course. The average score was 72 with standard
deviation of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A from
section 1 scored 84 and student B from section 2 scored 90. Who performed better relative to
his/her group?
Solution: Section 1: $\bar{x}_1 = 72$, $S_1 = 6$ and the score of student A is $x_A = 84$. Section 2: $\bar{x}_2 = 85$, $S_2 = 5$ and the score of student B is $x_B = 90$.
Z-score of student A: $Z = \frac{x_A - \bar{x}_1}{S_1} = \frac{84 - 72}{6} = 2.00$
Z-score of student B: $Z = \frac{x_B - \bar{x}_2}{S_2} = \frac{90 - 85}{5} = 1.00$
From these two standard scores, we can conclude that student A has performed better relative to
his/her section because his/her score is two standard deviations above the mean score of
section 1, while the score of student B is only one standard deviation above the mean score of
section 2.
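The comparison of the two students can be sketched in a few lines. The function name `z_score` is chosen here; the figures are those of the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_a = z_score(84, 72, 6)   # student A, section 1
z_b = z_score(90, 85, 5)   # student B, section 2
better = "A" if z_a > z_b else "B"
print(z_a, z_b, better)
```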
Moments:
The kth raw moment about the origin for n given observations $x_1, x_2, \dots, x_n$
with the corresponding frequencies $f_1, f_2, \dots, f_n$ is defined as
$M'_k = \frac{1}{N} \sum_{i=1}^{n} f_i x_i^k$, where $N = \sum_{i=1}^{n} f_i$, $k = 1, 2, \dots$
For k = 1, we have $M'_1 = \frac{1}{N} \sum_{i=1}^{n} f_i x_i$.
Thus the 1st raw moment about the origin is the arithmetic mean.
For k = 2, we have $M'_2 = \frac{1}{N} \sum_{i=1}^{n} f_i x_i^2$.
The kth central moment about the arithmetic mean for n given observations is denoted by
$M_k$ and defined as $M_k = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \mu)^k$, where $N = \sum_{i=1}^{n} f_i$, $k = 1, 2, \dots$ and
$\mu$ is the arithmetic mean.
For k = 2 we obtain the population variance, i.e. $M_2 = \frac{1}{N} \sum_{i=1}^{n} f_i (x_i - \mu)^2 = \sigma^2$.
Example: Find the first three moments about the mean from the following data.
value 5 15 25 35 Total
frequency 1 3 4 2 10
Solution
value ($x_i$) 5 15 25 35 Total
frequency ($f_i$) 1 3 4 2 10
$f_i x_i$ 5 45 100 70 220
$x_i - \mu$ -17 -7 3 13
$f_i (x_i - \mu)$ -17 -21 12 26 0
$f_i (x_i - \mu)^2$ 289 147 36 338 810
$f_i (x_i - \mu)^3$ -4913 -1029 108 4394 -1440
$\mu = \frac{\sum_{i=1}^{4} f_i x_i}{\sum_{i=1}^{4} f_i} = \frac{220}{10} = 22$, $M_1 = \frac{\sum_{i=1}^{4} f_i (x_i - \mu)}{\sum_{i=1}^{4} f_i} = 0/10 = 0$,
$M_2 = \frac{\sum_{i=1}^{4} f_i (x_i - \mu)^2}{\sum_{i=1}^{4} f_i} = 810/10 = 81$ and $M_3 = \frac{\sum_{i=1}^{4} f_i (x_i - \mu)^3}{\sum_{i=1}^{4} f_i} = -1440/10 = -144$
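As a check on the table, the central moments can be computed directly. This is a sketch; `central_moment` is a name introduced here, and the data are those of the example.

```python
def central_moment(values, freqs, k):
    """k-th moment about the arithmetic mean of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** k for x, f in zip(values, freqs)) / n

values = [5, 15, 25, 35]
freqs = [1, 3, 4, 2]
m1, m2, m3 = (central_moment(values, freqs, k) for k in (1, 2, 3))
print(m1, m2, m3)   # the third moment is negative
```

A negative third central moment indicates that the long tail of the data lies below the mean.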
A distribution is said to be symmetrical when the values are uniformly distributed around the mean
(the distributions of the data below the mean and above the mean are equal). The mean, median and
the mode are equal.
Negatively Skewed distribution: if one or more extremely small observations are present i.e.
mean is smaller than median and mode.
Positively skewed distribution: if one or more observations are extremely large i.e. mean is
greater than median and mode
When deviations are raised to an odd power (i.e. k = 1, 3, 5, …) and the sum of the negative deviations
equals the sum of the positive deviations, the distribution is symmetrical; otherwise it is skewed.
I.e. the distribution is symmetrical if M3 = 0, M5 = 0, M7 = 0, etc., but if, for example, M3 ≠ 0 then the
distribution is skewed.
[Figure: negatively skewed and symmetrical distributions, with the relative positions of the mean, median and mode]
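$S_K$ lies between $-3$ and $3$, i.e. $-3 \le S_K \le 3$.
If $S_K = 0$, then the distribution is symmetrical, since $\bar{X} = \tilde{X}$.
If $S_K > 0$, then the distribution is positively skewed, since $\bar{X} > \tilde{X}$.
Solution: $\mu = 22$, $\sigma^2 = 81$
$\tilde{X} = \frac{\left(\frac{n}{2}\right)\text{th obs.} + \left(\frac{n}{2} + 1\right)\text{th obs.}}{2} = \frac{5\text{th obs.} + 6\text{th obs.}}{2} = \frac{25 + 25}{2} = 25$
$S_K = \frac{3(\mu - \tilde{X})}{\sigma} = \frac{3(22 - 25)}{9} = \frac{-9}{9} = -1 < 0$; this implies that the data are negatively skewed.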
(ii) Bowley’s quartile measure of skewness: in a symmetrical distribution the first and
third quartiles are equidistant from the median ($Q_2$),
i.e. $Q_2 - Q_1 = Q_3 - Q_2$; in other words, the median $\tilde{X} = \frac{Q_1 + Q_3}{2}$.
$S_B = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$. Since $Q_2$ = median, we can rewrite this as $S_B = \frac{Q_1 + Q_3 - 2\tilde{X}}{Q_3 - Q_1}$.
Mesokurtic (normal curve): if the frequency distribution is unimodal and if the curve is bell
shaped and symmetrical.
Leptokurtic: if the frequency distribution is more peaked than normal i.e. large numbers of
observations have high frequency
Platykurtic: if the frequency distribution is less peaked than normal i.e. large numbers of
observations have low frequency.
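[Figure: leptokurtic, mesokurtic and platykurtic frequency curves; vertical axis: frequency; horizontal axis: value]
Coefficient of kurtosis $= \frac{M_4}{(M_2)^2} = \frac{M_4}{\sigma^4}$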
Example: The standard deviation of a symmetrical distribution is 3.What must be the value of
the fourth moment about the mean in order that the distribution be mesokurtic?
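Solution: For a mesokurtic distribution the coefficient of kurtosis equals 3, so
$\frac{M_4}{\sigma^4} = \frac{M_4}{3^4} = \frac{M_4}{81} = 3 \Rightarrow M_4 = 3(81) = 243$. So the 4th moment about the mean should be equal to 243.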
Unit 5
5 Elementary Probabilities
Example: In throwing a fair die all possible outcomes are equally likely. That means the elements
of the sample space have the same chance of occurring.
5.2 Counting techniques:
In order to determine the number of outcomes, one can use several rules of counting.
1. Multiplication rule: in a sequence of n events in which the first event has k1 possibilities, the second event has k2 possibilities, …, and
the nth event has kn possibilities, the total number of possibilities of the sequence is k1 · k2 · … · kn.
Example 2: Fifteen athletes, including Haile, entered a race.
a) In how many different ways could prizes for the first, the second and the third place be
awarded?
b) How many of the triplets counted above have Haile in the first position?
Solution:
a) 15 objects taken 3 at a time: 15P3 = 15! / (15-3)! = 2730
b) There are 14P2 = 14! / (14-2)! = 182
3. Combination: a counting technique in which the order of the objects is immaterial. It is the selection
of r objects from a collection of n objects, where r <= n, without regard to order.
The number of combinations of n objects taken r at a time is given by
nCr = n! / ((n-r)! r!)
Example: In a club containing 7 members a committee of 3 people is to be formed. In how many
ways can the committee be formed?
Solution: 7C3 = 7! / ((7-3)! 3!) = 35
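The permutation and combination counts in these examples can be checked with Python's standard library. This sketch assumes Python 3.8 or later, where `math.perm` and `math.comb` are available; the variable names are chosen here for illustration.

```python
import math

# Permutations: ordered prize lists for 15 athletes and 3 places.
prize_orders = math.perm(15, 3)          # 15! / (15 - 3)!

# With Haile fixed in first place, order the remaining 14 athletes in 2 places.
haile_first = math.perm(14, 2)           # 14! / (14 - 2)!

# Combinations: committees of 3 chosen from 7 members, order ignored.
committees = math.comb(7, 3)             # 7! / ((7 - 3)! 3!)

print(prize_orders, haile_first, committees)
```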
5.3 Definition of probability
Probability: is the chance (likelihood) of occurrence of an event. It is expressed by a numerical
value between 0 and 1 inclusive. Probability is a building block of inferential statistics.
Deterministic Stochastic model (probabilistic)
-> Certain -> uncertain
->mathematical ->non-mathematical (econometric model)
Generally probability can be divided into two
i) Subjective probability: the probability of an event in a certain experiment
based on an individual’s belief or attitude.
ii) Objective probability: - the probability of an event in a certain experiment based on
experimental evidence.
5.4 Basic approaches to probability
Classical approach: uses the sample space to determine the numerical probability that an event
will happen. If there are n equally likely outcomes of an experiment, and event E occurs in k
of the n outcomes, the probability of the event E, denoted by P(E), is defined as
P (E) = n (E)/ n(S) = k/n
Deficiencies of classical approach
- If total number of outcomes is infinite or if it is not possible to enumerate all elements of
the sample space.
- If each out come is not equally likely
Example: In the experiment of tossing a coin and a die together, find the probability of an event
E consisting of a head and an even number.
Solution: S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}; then
E = {H2, H4, H6}; thus P (E) = n (E)/n(S) = 3/12 = 1/4
Let S be the sample space of an experiment. P is called a probability function if it satisfies the
following conditions:
0 ≤ P (A) ≤ 1 for each event A, where P (A) is called the probability of A, and P (S) = 1.
If A and B are mutually exclusive events, then P (A ∪ B) = P (A) + P (B).
Similarly, for mutually exclusive events A1, A2, …, An,
$P\left(\bigcup_{i=1}^{n} A_i\right) = P(A_1) + P(A_2) + \dots + P(A_n) = \sum_{i=1}^{n} P(A_i)$
Rule 1: Let A be an event and A’ be the complement of A with respect to a given sample space
of an experiment; then P (A’) = 1 - P (A).
Proof: Let S be a sample space. S = A ∪ A’, and A and A’ are mutually exclusive, i.e.
A ∩ A’ = ∅.
P (S) = P (A ∪ A’) = P (A’) + P (A) and P (S) = 1
1 = P (A’) + P (A) => P (A’) = 1 - P (A)
Rule 2: Let A and B be events of a sample space S; then
P (A’ ∩ B) = P (B) - P (A ∩ B)
Proof: B = S ∩ B = (A ∪ A’) ∩ B = (A ∩ B) ∪ (A’ ∩ B)
Case 1: if A ∩ B ≠ ∅, then P (B) = P (A ∩ B) + P (A’ ∩ B)
P (A’ ∩ B) = P (B) – P (A ∩ B)
Case 2: if A ∩ B = ∅, then P (B) = P (A ∩ B) + P (A’ ∩ B); since P (A ∩ B) = P (∅) = 0,
=> P (B) = P (A’ ∩ B)
Rule 3: Suppose A and B are two events of a sample space; then
P (A ∪ B) = P (A) + P (B) - P (A ∩ B)
Example: A fair die is thrown twice. Calculate the probability that the sum of spots on the faces
of the die that turn up is divisible by 2 or 3.
Solution:
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(3,1),
(3,2),(3,3),(3,4),(3,5),(3,6),(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
This sample space has 6*6 = 36 elements. Let E1 be the event that the sum of the spots on the die
is divisible by 2 and E2 be the event that the sum of the spots on the die is divisible by 3;
then
P (E1 or E2) = P (E1 ∪ E2)
= P (E1) + P (E2) – P (E1 ∩ E2)
= 18/36 + 12/36 - 6/36 = 24/36 = 2/3
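The addition rule in this example can be verified by enumerating the sample space. The sketch below uses exact fractions; the names `space`, `e1`, `e2` and `prob` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing a die twice.
space = list(product(range(1, 7), repeat=2))

e1 = {o for o in space if sum(o) % 2 == 0}   # sum divisible by 2
e2 = {o for o in space if sum(o) % 3 == 0}   # sum divisible by 3

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(space))

p_union = prob(e1 | e2)
p_by_rule = prob(e1) + prob(e2) - prob(e1 & e2)
print(len(e1), len(e2), len(e1 & e2), p_union)
```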
5.6 Conditional probability and independence
5.6.1 Conditional probability: the conditional probability of an event A in relation to B is
defined as the probability that event A occurs given that event B has already occurred.
P (A/B) = P (A ∩ B)/P (B), where P (B) > 0
Remark: (i) P (A ∩ B) and P (B) are computed with respect to the original sample space.
(ii) P (S/B) = P (S ∩ B)/P (B) = P (B)/P (B) = 1
(iii) P (B/S) = P (B), because P (B/S) = P (B ∩ S)/P (S) = P (B)/1 = P (B)
(iv) If A and B are independent events, then P (A/B) = P (A) and P (B/A) = P (B). Two events are independent if the
occurrence of B doesn’t affect the occurrence of A, i.e. P (A/B) = P (A ∩ B)/P (B), so
P (A ∩ B) = P (A/B) * P (B), but P (A/B) = P (A).
Hence P (A ∩ B) = P (A) * P (B)
Example: Suppose that an office has 100 calculating machines. Some of them use electric power
(E) while others are manual (M), and some machines are new (N) while others are used
(U). The table below gives the number of machines in each category. A person enters the office,
picks a machine at random and discovers that it is new. What is the probability that it uses
electric power?
        E    M    Total
N       40   30   70
U       20   10   30
Total   60   40   100
Solution: P (E/N) = P (E ∩ N) / P (N) = 40/70 = 4/7
Bayes’ theorem
Theorem 1.1: Let {E1, E2, …, En} be a partition of the sample space S, and suppose each of E1, E2, …, En
has non-zero probability, that is, P (Ei) ≠ 0 for i = 1, 2, …, n, and let E be any event. Then
P (E) = P (E1) * P (E/E1) + P (E2) * P (E/E2) + … + P (En) * P (E/En) = $\sum_{i=1}^{n} P(E_i) P(E/E_i)$
Example: Suppose that three machines A1, A2 and A3 produce 60%, 30% and 10%
respectively of the total production. The percentages of defective output of these machines are 2%, 4% and 6% respectively.
a) If an item is selected at random, find the probability that the item is defective.
b) Assuming that an item selected at random is found to be defective, find the probability that the item
was produced on machine A1.
Solution: Let B be the event of selecting a defective item at random and let E1, E2, E3 be the events that the item
was produced on machines A1, A2, A3 respectively. Then
P (E1) = 0.6, P (E2) = 0.3, P (E3) = 0.1,
P (B/E1) = 2% = 0.02, P (B/E2) = 4% = 0.04 and P (B/E3) = 6% = 0.06
a) P (B) = P (B ∩ [E1 ∪ E2 ∪ E3])
= P ([B ∩ E1] ∪ [B ∩ E2] ∪ [B ∩ E3])
= P (B ∩ E1) + P (B ∩ E2) + P (B ∩ E3)
= P (E1) * P (B/E1) + P (E2) * P (B/E2) + P (E3) * P (B/E3)
= 0.6 * 0.02 + 0.3 * 0.04 + 0.1 * 0.06
= 0.03
b) We use Bayes’ formula: $P(E_1/B) = \frac{P(E_1 \cap B)}{P(B)} = \frac{P(E_1) P(B/E_1)}{\sum_{i=1}^{3} P(E_i) P(B/E_i)} = \frac{0.6 \times 0.02}{0.03} = 0.4$
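The total probability and Bayes computations can be sketched with exact fractions. The dictionary names are hypothetical, and the production shares used are the figures that appear in the solution's arithmetic (0.6, 0.3 and 0.1).

```python
from fractions import Fraction

# Share of total production per machine (figures used in the solution).
priors = {"A1": Fraction(60, 100), "A2": Fraction(30, 100), "A3": Fraction(10, 100)}
# Probability that an item is defective given the machine that produced it.
defect_rate = {"A1": Fraction(2, 100), "A2": Fraction(4, 100), "A3": Fraction(6, 100)}

# Total probability: P(B) = sum of P(Ei) * P(B/Ei).
p_defective = sum(priors[m] * defect_rate[m] for m in priors)

# Bayes' formula: P(Ei/B) = P(Ei) * P(B/Ei) / P(B).
posterior = {m: priors[m] * defect_rate[m] / p_defective for m in priors}

print(float(p_defective), float(posterior["A1"]))
```

The posterior probabilities over the three machines add up to 1, which is a useful sanity check.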
5.6.2 Independence: two events E1 and E2 are said to be independent if the occurrence of E1
has no bearing on the occurrence of E2. That means knowing that E1 has occurred gives no
information about the occurrence of E2. Two events, A and B, are said to be independent
if P (A ∩ B) = P (A) * P (B).
Suppose A and B are independent events with 0 < P (A) < 1 and 0 < P (B) < 1. Show that the
following statements are true.
Example: Consider the experiment of drawing a card from a well-shuffled deck of cards.
Let A: a spade is drawn
B: an honor (10, J, Q, K, A) is drawn
Are the two events independent?
Solution: P (A) = 13/52 = 1/4, P (B) = 20/52 = 5/13 and P (A ∩ B) = 5/52
Using the independence theorem, P (A) * P (B) = (13/52) * (20/52) = 5/52 = P (A ∩ B), so A and B are independent.
6. Probability Distribution
6.1 Definition of Random Variable
Definition: A random variable is a variable whose values are determined by chance.
It is a numerical description of the outcomes of the experiment, or a numerical-valued function
defined on the sample space, usually denoted by capital letters.
Example: If X is a random variable, then it is a function from the elements of the sample space to
the set of real numbers, i.e. X is a function X: S→R.
Flip a coin three times and let X be the number of heads in three tosses.
S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
X (HHH) = 3
X (HHT) = X (HTH) = X (THH) = 2
X (HTT) = X (THT) = X (TTH) = 1
X (TTT) = 0
X = {0, 1, 2, 3}
Random Variables are of two types:
1. Discrete random variables: variables which can assume only a specific number of
values which are clearly separated and can be counted.
Example: Consider the possible outcomes for the experiment of tossing three coins together.
Sample space, S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
Let the r.v. X be the number of heads that turn up when three coins are tossed.
X = {0, 1, 2, 3}
P(X = 0) = P (TTT) = 1/8,
P(X=1) = P (HTT) +P (THT) + P (TTH) =1/8+1/8+1/8 = 3/8
P(X=2) = P (HHT) +P (HTH) +P (THH) = 1/8+1/8+1/8 = 3/8,
P(X=3) = P (HHH) = 1/8
X = x      0    1    2    3
P(X = x)  1/8  3/8  3/8  1/8
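The distribution of the number of heads can also be obtained by enumerating the sample space. This is a small sketch assuming a fair coin, so that all eight outcomes are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / 8
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
for x, p in pmf.items():
    print(x, p)
```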
Every discrete random variable X has a probability associated with each of its values. These
probabilities collectively are known as a probability mass function, which can be used to obtain
probabilities associated with the random variable.
Let X be a discrete random variable; then the probability mass function is given by
f(x) = P(X=x), for real number x.
A function f is a probability mass function if
1. f(x) ≥ 0
2. $\sum_{\text{all } x} f(x) = 1$
Let X be a discrete random variable whose possible values are X1, X2, …, Xn with
the probabilities P(X1), P(X2), P(X3), …, P(Xn) respectively.
Then the expected value of X, E(X), is defined as:
E(X) = X1P(X1) + X2P(X2) + … + XnP(Xn)
$E(X) = \sum_{i=1}^{n} x_i P(X = x_i)$
Example: What is the expected value for the r.v. from the above example?
Solution: X = 0, 1, 2, 3, so X1 = 0, X2 = 1, X3 = 2, X4 = 3
$E(X) = \sum_{i=1}^{4} x_i P(X = x_i) = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 1.5$
Variance
If X is a discrete random variable with expected value µ (i.e. E(X) = µ), then the variance of X,
denoted by Var (X), is defined by
$\mathrm{Var}(X) = E(X^2) - \mu^2 = \sum_{i=1}^{n} x_i^2 P(X = x_i) - \mu^2$
Alternatively, $\mathrm{Var}(X) = \sum_{i=1}^{n} (x_i - \mu)^2 P(X = x_i)$
Properties of Variances
For any r.v X and constant C, it can be shown that
Var (CX) = C² Var (X)
Var (X +C) = Var (X) +0 = Var (X)
If X and Y are independent random variables, then
Var (X + Y) = Var (X) + Var (Y)
More generally, if X1, X2, …, Xk are independent random variables,
then Var (X1 + X2 + … + Xk) = Var (X1) + Var (X2) + … + Var (Xk),
i.e. $\mathrm{Var}\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} \mathrm{Var}(X_i)$
E(X) = 0(1 − P) + 1(P) = P
$\mathrm{Var}(X) = \sum x_i^2 P(X = x_i) - \mu^2$
= [0² P(X = 0) + 1² P(X = 1)] − P²
= [0(1 − P) + 1(P)] − P²
= P − P² = P(1 − P)
2. Two fair coins are tossed. Determine Var (X) where X is the number of heads that appear.
a) Use the definition of the variance.
b) Use the fact that the variance of the sum of independent variables is equal to the
sum of the variances.
Solution:
a) P (X = 0) = ¼, P (X = 1) = ½, P (X = 2) = ¼
E (X) = 0·P(X=0) + 1·P(X=1) + 2·P(X=2) = 0(1/4) + 1(1/2) + 2(1/4) = 1
E (X²) = 0²·P(X=0) + 1²·P(X=1) + 2²·P(X=2) = 0(1/4) + 1(1/2) + 4(1/4) = 3/2
Var (X) = E (X²) − [E (X)]² = 3/2 − 1² = ½
b) Now let X and Y denote the number of heads on the first and second coin respectively, so that
the total number of heads is X + Y. For each coin,
Var (X) = ½ − (½)² = ¼ and Var (Y) = ½ − (½)² = ¼
X and Y are independent (i.e. the outcome of one coin does not influence the outcome of the
second), so
Var (X+Y) = Var (X) + Var (Y) = 1/4 + 1/4 = ½
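Both routes to the variance can be reproduced in a few lines. The sketch below uses exact fractions; the helper names `mean` and `variance` are illustrative.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2

# Route (a): distribution of the total number of heads in two tosses.
two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
var_by_definition = variance(two_coins)

# Route (b): each coin contributes 0 or 1 head; variances of independent
# variables add.
one_coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
var_by_sum = variance(one_coin) + variance(one_coin)

print(var_by_definition, var_by_sum)
```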
Definition: Let X be the number of successes in n repeated Bernoulli trials with probability of
success p on each trial; then the probability distribution of the discrete random variable X is
called the binomial distribution.
Let P = the probability of success
q = 1 − P = the probability of failure on any given trial.
A binomial random variable with parameters n and p represents the number of successes r in n
independent trials, when each trial has probability P of success.
Binomial probability distribution formula:
$P(X = r) = \frac{n!}{r!\,(n-r)!} P^r (1-P)^{n-r} = \frac{n!}{r!\,(n-r)!} P^r q^{n-r}$, where q = 1 − P
b) $P(X = 2) = \frac{3!}{2!\,(3-2)!} \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^{3-2} = \frac{3}{8}$
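2. The probability that a student entering a college will graduate is 0.4. Determine the probability
that out of 5 students (a) none, (b) one, (c) at least one, (d) at most three will graduate.
Solution: X = 0, 1, 2, 3, 4, 5
a) P (none will graduate) = P (X=0) = $\frac{5!}{0!\,(5-0)!} (0.4)^0 (0.6)^5 \approx 0.08$
b) P (one will graduate) = P (X=1) = $\frac{5!}{1!\,(5-1)!} (0.4)^1 (0.6)^4 \approx 0.26$
c) P (at least one will graduate) = P (X ≥ 1) = 1 − P (X=0)
= $1 - \frac{5!}{0!\,(5-0)!} (0.4)^0 (0.6)^5$
= 1 − 0.08 = 0.92
d) P (at most three will graduate) = P (X ≤ 3)
= 1 − P (X > 3)
= 1 − [P (X=4) + P (X=5)]
= $1 - \left[\frac{5!}{4!\,(5-4)!} (0.4)^4 (0.6)^1 + \frac{5!}{5!\,(5-5)!} (0.4)^5 (0.6)^0\right]$
= 0.91296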
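The four probabilities in this example follow directly from the binomial formula. A minimal sketch, in which `binomial_pmf` is an illustrative helper name:

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = C(n, r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 5, 0.4  # five students, each graduating with probability 0.4

p_none = binomial_pmf(0, n, p)
p_one = binomial_pmf(1, n, p)
p_at_least_one = 1 - p_none
p_at_most_three = sum(binomial_pmf(r, n, p) for r in range(4))  # r = 0, 1, 2, 3

print(round(p_none, 5), round(p_one, 5))
print(round(p_at_least_one, 5), round(p_at_most_three, 5))
```

Summing r = 0 to 3 directly gives the same value as the complement 1 − [P(X=4) + P(X=5)] used in the solution.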
1. E (X) = n.p.
2. Var ( X) = npq
Poisson distribution
- It is a discrete probability distribution which is used to model rare events such as the
number of car accidents in a day, the arrival of telephone calls over an interval of time, the
number of misprints on a typed page, natural disasters like earthquakes, etc.
Definition: Let X be the number of occurrences in a Poisson process and λ be the
average number of occurrences of an event in a unit length of interval. The probability function for
the Poisson distribution is
$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$ for x = 0, 1, 2, …, and 0 otherwise.
Remarks
The Poisson distribution possesses only one parameter, λ.
If X has a Poisson distribution with parameter λ, then E (X) = λ and
Var (X) = λ, i.e. E (X) = Var (X) = λ.
$\sum_{x=0}^{\infty} P(X = x) = 1$
Example 1: A company manufacturing light bulbs discovers from past experience that 2
defective bulbs are manufactured per 30 working hours. What is the probability that 4 defective
bulbs will be manufactured in 30 working hours?
Solution: λ = 2,
$P(X = 4) = \frac{e^{-2}\, 2^4}{4!} \approx 0.09$
Example 2 In a small city, 10 accidents took place in a time of 50 days. Find the probability
that there will be
a) Two accidents in a day
b) three or more accidents in a day
Solution: b) With λ = 10/50 = 0.2 accidents per day,
P (X ≥ 3) = P (X=3) + P (X=4) + P (X=5) + …
= 1 − [P (X=0) + P (X=1) + P (X=2)], because $\sum_{x=0}^{\infty} P(X = x) = 1$
= 0.0012
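The Poisson probabilities in Examples 1 and 2 can be reproduced as follows. This is a sketch: `poisson_pmf` is an illustrative helper, and the daily rate 10/50 = 0.2 is derived from the statement of Example 2.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam ** x * exp(-lam) / factorial(x)

# Example 1: on average 2 defective bulbs per 30 working hours.
p_four_defects = poisson_pmf(4, 2)

# Example 2: 10 accidents in 50 days gives 0.2 accidents per day.
lam_per_day = 10 / 50
p_two_accidents = poisson_pmf(2, lam_per_day)
p_three_or_more = 1 - sum(poisson_pmf(x, lam_per_day) for x in range(3))

print(round(p_four_defects, 4), round(p_two_accidents, 4), p_three_or_more)
```

The tail probability comes out close to the 0.0012 quoted above; the last digit depends on how the intermediate terms are rounded.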
3. a) Referring to eg.1, what is the expected no of defected light bulbs in a day? What
about the variance?
b) Referring to eg.2, find the mean and the variance for the no of accidents in a day
Solution a) E (X) = Var (X) = 2
The Poisson distribution can approximate the binomial distribution when the number of trials, n, is
comparatively large and the probability of a success, P, is small:
$P(X = x) = \frac{e^{-np} (np)^x}{x!}$, x = 0, 1, 2, …
Generally we use the Poisson to approximate a binomial when n ≥ 50 and np ≤ 5.
Example: Suppose that an insurance company has 2000 policy holders and that the probability that
any one of the policy holders will file at least one claim in any given year is 1/1000. Find the
probability that in any given year one or more of the policy holders will file at least one claim.
$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \binom{2000}{0} (0.001)^0 (0.999)^{2000} = 0.8648$
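To see how close the approximation is in this example, the exact binomial answer can be compared with the Poisson value for λ = np = 2. A minimal sketch with illustrative variable names:

```python
from math import comb, exp, factorial

n, p = 2000, 1 / 1000  # policy holders and claim probability

# Exact binomial: P(X >= 1) = 1 - P(X = 0)
exact = 1 - comb(n, 0) * p ** 0 * (1 - p) ** n

# Poisson approximation with lam = n * p
lam = n * p
approx = 1 - exp(-lam) * lam ** 0 / factorial(0)

print(round(exact, 4), round(approx, 4))
```

The two values agree to about three decimal places, which is the sense in which the Poisson distribution approximates the binomial here.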
It is the most important distribution for describing a continuous random variable and is used as an
approximation of other distributions. A random variable X is said to have a normal distribution if
its probability density function is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, where x is a real value of X,
i.e. −∞ < x < ∞, −∞ < µ < ∞ and σ > 0,
where µ = E(X) and σ² = Var(X).
µ and σ² are the parameters of the normal distribution.
2. The curve approaches the horizontal x-axis as we go in either direction from the mean.
3. The total area under the curve is 1, that is
$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = 1$
4. The probability that a random variable will have a value between any two points is equal
to the area under the curve between those points.
5. The height of the normal curve attains its maximum at X = µ; this implies that the mean and
mode coincide (are equal).
6.4.2 Standard normal Distribution
It is a normal distribution with mean 0 and variance 1. A normal distribution can be converted to
the standard normal distribution as follows. If X has a normal distribution with mean µ and standard
deviation σ, then the standard normal deviate Z is given by $Z = \frac{X - \mu}{\sigma}$, with density
$P(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$
Given a normally distributed random variable X with mean µ and standard deviation σ:
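i) $P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)$
ii) P (−∞ < Z < ∞) = 1
iii) P (a < Z < b) = P (Z < b) − P (Z < a) for a < b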
Consider the situations under the standard normal curve. It is clear that
P (0 < Z < ∞) = 0.5 = P (−∞ < Z < 0)
[Figures: shaded areas under the standard normal curve, including the area between Z1 and Z2]
iii) P (Z1 < Z < Z2)
As the value of σ increases, the curve becomes more and more flat, and vice versa.
Solution: a)
P (-2.2<Z<1.2) = P (0<Z<1.2) + P (-2.2<Z<0)
= P (0<Z<1.2) + P (0<Z<2.2)
= 0.3849+0.4861
= 0.8710
b)
P (Z>1.05) = 1 − P (Z<1.05)
= 1 − 0.8531 = 0.1469
c) P (0<Z<0.96) = 0.3315
d) P (-1.45 <Z<0) = P (0<Z<1.45) = 0.4265
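The table look-ups in parts (a) to (d) can be cross-checked numerically. The sketch below builds the standard normal cumulative distribution from the error function in Python's math module; `phi` is an illustrative name for P(Z < z).

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_a = phi(1.2) - phi(-2.2)  # P(-2.2 < Z < 1.2)
p_b = 1 - phi(1.05)         # P(Z > 1.05)
p_c = phi(0.96) - phi(0)    # P(0 < Z < 0.96)
p_d = phi(0) - phi(-1.45)   # P(-1.45 < Z < 0)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```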
Student distribution (t-distribution)
Suppose we have a sample X1, …, Xn from a normal population having mean µ (unknown) and
standard deviation σ (unknown), and using this sample data we want to get an interval
estimator of the population mean µ.
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution. But σ is unknown, so we substitute it by
its estimator S (the sample standard deviation). Hence
$t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ is a (Student) t r.v. having n − 1 df.
Notation: $t_{\alpha}(v)$ stands for the value of t with v df to the right of which lies an area equal to α.
For example, $t_{0.025}(12) = 2.179$ and $t_{0.01}(25) = 2.485$.
CHAPTER 7
We will now use the probability studied in the last two chapters to discuss inferential statistics.
We are going to analyze and interpret data to draw conclusions not about the data but about the
source of the data (population consisting of all elements being studied). We collect a sample of
data from the population and use it to make inferences about the population. Very often we will
be interested in estimating a population parameter. In order to estimate this we need to define
our terms carefully:
Unit: An element of the population. This will be a person or object on which observations can be
made or from which information can be obtained.
Sampling population: the population from which one actually draws a sample. The sampling
population covers the elements from which the sample was actually selected.
Target population: the population about which one wishes to make an inference.
Note: (i) The sampling population may be smaller than the target population because of non-coverage
or incompleteness.
(ii) Statistical inference procedures allow one to make inferences about the sampling population.
Only when the sampling population and the target population are equal can one infer about the target
population.
7.2 Types of Errors
An estimate based on a sample will not be exact; there will be an error involved. In general,
errors which occur during estimation based on a sample can be categorized into two:
Sampling errors
Non sampling errors
Sampling errors
The error which arises because only a sample is used to estimate the population parameter.
Even a representative sample will introduce errors if the sample size is small.
On the other hand our estimates of parameters will often be inaccurate if our sample is not
representative of the population. Because of this we need to know how to choose a sample. We
see this in Section 7.3.
Sampling error is the difference b/n an estimate and the true value of the parameter being
evaluated. We deal with this concept in the next chapter.
Suppose we have a representative sample and have chosen a sample large enough to ensure our
parameter estimates are accurate to a good degree of precision. Even then we will still have to
consider other kinds of errors such as measurement errors, recording errors, non-response
errors, respondent bias, interviewer error, errors in processing the data, and reporting
error. Measurement errors and recording errors occur if there is an error in measuring the item
being studied or in recording its result. Interviewer errors can occur in surveys when an
interviewer introduces bias into an interview or when a questionnaire is badly designed. Another
common form of error is the non-response error. Non responses can be due to refusals
Simple random sample: a sampling technique in which every member of the population is equally
likely to be included in the sample. Suppose we have a population of N objects and we wish to
choose n of them to form a sample. We have seen that there are N C n ways of choosing the
sample without replacement and N^n ways with replacement.
Table of random numbers: used to select a representative sample from a large population. To
select the sample, use the random digit technique. We proceed with the following steps:
Step 1: Number each element; for example, for a population of size 500 we assign 001 to 500.
Step 2: Select a random starting point.
Step 3: Use only the required number of digits. Proceed in this fashion until the required
number of sample units is selected.
Stratified random sampling: is often used when the population is split into subgroups or
“strata”. The different subgroups are believed to be very different from each other, but it is
thought that the individuals who make up each subgroup are similar. The number of units to be
chosen from each sub-group is fixed in advance and the units are chosen by simple random
sampling within the sub group.
Cluster sampling: in some cases the identification and location of an ultimate unit for sampling
may require considerable time and cost; in such cases cluster sampling is used. In cluster
sampling the population is subdivided into groups or clusters, and a probability sample of these
clusters is then drawn and studied. Clusters may be Regions, Zones, Weredas, Kebeles etc.
This method of sampling costs less and is faster and more convenient, but it may not be very
efficient or representative, because units within the same cluster tend to be similar.
Example: Suppose we want to study the travel habits of families in Ethiopia, which is divided into
Regions and Zones. We first draw a random sample of the Zones to be studied, and then
from these selected Zones or clusters we draw a random sample of households for the purpose of
investigation.
Systematic sampling: the items or individuals of the population are arranged in some way:
alphabetically, in a file drawer by date received, or by some other method. A random starting point
is selected and then every Kth member of the population is selected for the sample. For example,
if we want to select n items from a population of size N using systematic sampling, we divide N by
n (N/n = K) and choose a number i between 1 and K; then we take every Kth member. So the sample
will be i, i+K, i+2K, i+3K, etc., where 1 ≤ i ≤ K.
Example: Suppose we want to choose a sample of about 20 students out of a class of 100
students. First we put the class in order (may be alphabetical order, or by ID number) and give
each a number between 1 and 100. Next we divide 100 by 20 and we get 100/20 = 5. We now
choose a number at random between 1 and 5. The student corresponding to that number is the
first student in the sample, and we then take every 5th student. So if, for example, we choose the
number 2 the sample will consist of the 2nd, 7th, 12th, 17th... 92nd and 97th students on the list.
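The selection rule in this example can be sketched in code. The function name `systematic_sample` and its `start` argument are illustrative; when no start is supplied, a random starting point between 1 and K is drawn.

```python
import random

def systematic_sample(population, n, start=None):
    """Take every K-th item, K = N // n, beginning at position `start` (1-based)."""
    k = len(population) // n
    if start is None:
        start = random.randint(1, k)  # random starting point between 1 and K
    return population[start - 1::k][:n]

students = list(range(1, 101))  # students numbered 1 to 100
sample = systematic_sample(students, 20, start=2)
print(sample)
```

With a starting point of 2 this reproduces the list in the example: the 2nd, 7th, 12th, ..., 97th students.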
Non probability sampling: selection of sample is based on the judgment of the investigator
rather than on randomness.
Judgment sampling: the subjective judgment of the researcher is the basis for selecting items to
be included in a sample. Judgment sampling is often used to pre-test a questionnaire.
Quota sampling: in this sampling technique major population characteristics play an important
role in the selection of the sample. It has some aspects in common with stratified sampling, but has
no randomization.
Example: Suppose a scientist recognizes that the variability in daily milk production may be due to
age differences. Cows will then be selected from different age groups. For instance, if 30%
of the cows are b/n 4 and 6 years old and the remaining 70% are b/n 6 and 8 years old, a quota
sample must reflect those same percentages.
Convenience sampling: the sample is chosen simply because it is convenient to the researcher in
terms of time, money and administration.
Chapter 8
Estimation and hypothesis testing
8.1 Estimation
Point estimate: a single number computed from the sample and it is used to estimate the
population parameter. We try to find a statistic (calculated from the sample) that is a good
estimator of the unknown parameter.
Confidence interval: a range of values constructed from the sample data such that the parameter
lies within that range with a specified probability (the level of confidence).
Standard error of the sample mean (SE): the standard deviation of the probability
distribution of the sample means, which measures the variability of the sampling distribution of the
sample mean. It is denoted by $\sigma_{\bar{X}}$.
Case 1: when the variance σ² is known. In this case $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$, and the confidence interval for
the population mean µ is given by $\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Case 2: when the variance is unknown but the sample is large (n > 30). In this case $\sigma_{\bar{X}} = \frac{S}{\sqrt{n}}$ and
the confidence interval is $\left(\bar{X} - z_{\alpha/2} \frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2} \frac{S}{\sqrt{n}}\right)$. This can be written as $\bar{X} \pm z_{\alpha/2} \frac{S}{\sqrt{n}}$
Case 3: when the variance σ² is unknown and the sample size is small (n < 30). In this case $\sigma_{\bar{X}} = \frac{S}{\sqrt{n}}$
and the confidence interval for the population mean µ is given by $\bar{X} \pm t_{\alpha/2}(n-1) \frac{S}{\sqrt{n}}$
Example: An experiment involves selecting a random sample of 256 middle managers. One item
of interest is annual income. The sample mean is $45,420 and the sample standard deviation is
$2,050.
(i) What is the estimated mean income of all middle managers (the point estimate of the population
mean)?
(ii) What is the 95 percent confidence interval for the population mean?
(iii) What degree of confidence is being used?
(iv) Interpret the result.
Solution:
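One way to work parts (i) and (ii) is sketched below. It assumes the large-sample case (n = 256 > 30, population variance unknown), so the interval is the sample mean plus or minus z times S/√n, and it takes z = 1.96 as the tabulated value for 95% confidence; the variable names are illustrative.

```python
from math import sqrt

n = 256              # sample size
sample_mean = 45420  # sample mean income in dollars (the point estimate)
s = 2050             # sample standard deviation in dollars
z = 1.96             # tabulated z value for 95% confidence (assumed)

standard_error = s / sqrt(n)
margin = z * standard_error
confidence_interval = (sample_mean - margin, sample_mean + margin)

print(standard_error, confidence_interval)
```

Under these assumptions the point estimate is $45,420 and the interval runs from roughly $45,169 to $45,671.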
1) The manufacturer of a certain type of battery is trying to estimate the lifetime of the battery.
He believes each battery will last for a random amount of time that has a N (μ, 100) distribution.
(The lifetimes are measured in hours.) He carries out an experiment to estimate μ. A sample of
400 batteries is tested and their lifetimes are measured. The (sample) mean lifetime is found to
be 74.2 hours. Calculate a 95% confidence interval for μ. How do you interpret this interval?
2) A biostatistician intends to estimate μ, the mean blood pressure of women between the ages
of 45 and 50. She takes a random sample of 20 women and measures their blood pressure.
Based on past experience she believes the measurements will follow a
N(μ, 100) distribution. (Measurements are in mm mercury.) Suppose she discovers the sample
mean is equal to 136.9 mm mercury. Find a 95% confidence interval for μ.
4) A sports scientist takes a random sample of 17 athletes and asks them to run 5km on a
treadmill. Their heart rates are measured before the start of the run and five minutes after the
finish. The increases in heart rates are measured and are shown below.
53 45 71 74 65 83 47 56 61 74 61 72 54 43 72 65 54
Increase in heart rates (beats per minute)
(i) Calculate the mean and standard deviation of the data.
(ii) The sport scientist wanted to estimate μ, the mean increase in heart rate. Find a
point estimate for μ and construct a 95% confidence interval for it. What
assumptions do you need to make about the population for this interval to be
valid?
Step 1: state the null hypothesis (H0) and alternative hypothesis (H1)
Null hypothesis: a statistical hypothesis stated in order to test its validity. It is
denoted by H0, where H stands for hypothesis and 0 for no difference. Accepting H0 is not
sufficient to conclude that it is indeed true. It is better to say that H0 has not been shown to be false.
Alternative hypothesis: A statement that is accepted if the sample data provide enough evidence
that H0 is false.
The alternative hypothesis may be tested in keeping with either of the following two situations:
Level of significance: the probability of rejecting the null hypothesis when it is true. It is
usually designated by α.
There are many test statistics: Z, t, F and χ². A test statistic is a value calculated from the
sample information which is used to decide whether to accept or reject H0.
For example, in hypothesis testing of a population mean the test statistic Z is computed
as $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ (when the sample size is large and σ is known); similarly the test statistic t is
computed as $t_{calc} = \frac{\bar{x} - \mu}{S/\sqrt{n}}$ (when the sample size is small and σ is unknown).
Critical value: the dividing point b/n the region where the null hypothesis is rejected and the
region where it is not rejected.
Step 5: Make a decision: decide to reject or not to reject H0 according to whether the calculated
test statistic lies in the rejection region at the chosen level of significance.
Example: The Jamestown Steel Company manufactures and assembles desks and other office
equipment at several plants in western New York. The weekly production of the model A325 desk
at the Fredonia plant is normally distributed with mean 200 and standard deviation 16. Recently, due
to market expansion, a new production method was introduced and new employees were hired, and
the mean production of the last 50 weeks is 203.5. The vice president of the company would like to
investigate whether there has been an overall change in the weekly production of the model A325
desk. Is the mean production different from 200? Use the 0.01 level of significance.
Solution:
Since this is a two-tailed test, α/2 = 0.01/2 = 0.005, so the area under the normal curve in each
tail is 0.005, and 0.500 − 0.005 = 0.4950.
Step 5: Since 1.55 does not lie in the rejection region, H0 is accepted and we conclude that the
population mean is not different from 200, i.e. the production rate at the Fredonia plant has not changed.
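The test statistic for this example can be recomputed as follows. The two-sided p-value is an addition not used in the handout, included only as a cross-check of the decision; `phi` is an illustrative helper for the standard normal cumulative probability.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, sigma = 200, 16   # hypothesized mean and known standard deviation
n, xbar = 50, 203.5    # number of weeks and observed mean production
alpha = 0.01

z_calc = (xbar - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - phi(abs(z_calc)))  # two-tailed test
reject_h0 = p_value < alpha

print(round(z_calc, 2), round(p_value, 4), reject_h0)
```

The computed statistic rounds to the 1.55 quoted in Step 5, and H0 is not rejected at the 0.01 level.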
Example: A random sample of 200 senior school students produces a mean weight of 58 kg with
standard deviation 4 kg. Test the hypothesis that the mean weight of the population is greater than
60 kg. Use the 0.05 level of significance.
Solution:
i) H0: µ = 60 kg vs H1: µ > 60 kg
ii) α = 0.05
iii) Test statistic: $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{58 - 60}{4/\sqrt{200}} = -7.07$
Note: $Z = \frac{\bar{x} - \mu}{S/\sqrt{n}}$ when σ is unknown, where S is the sample standard deviation.
Example: A consumer service agency examined a new automobile for its gasoline performance.
A sample of 12 randomly chosen measurements of km covered per gallon under normal conditions
resulted in an average of 60 km/gallon with standard deviation 1.8 km. Does this result support the
manufacturer's claim that the new automobile covers more than 50 km/gallon? Use α = 0.10.
Solution:
Example: The mean length of a small counterbalance bar is 43 mm. There is a concern that the
adjustment of the machine has changed the bars. The null hypothesis is that there is no change in
the mean length (µ = 43). Test at the 0.02 level of significance. Twelve bars are randomly selected
and their lengths in mm are 42, 39, 42, 45, 43, 40, 39, 41, 40, 42, 43, 42.
Solution:
1) H0: µ = 43 vs H1: µ ≠ 43
2) α = 0.02
3) $t_{calc} = \frac{\bar{x} - \mu}{S/\sqrt{n}} = -2.92$, where n = 12, $\bar{x} = \frac{\sum x_i}{n} = 41.5$ and $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = 1.78$
5) Conclusion: The computed value (-2.92) lies beyond the critical value of -2.718, so based on the
sample result we conclude that the machine is out of adjustment.
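The sample mean, the sample standard deviation and the t statistic for the twelve bars can be recomputed directly from the data. In this sketch the critical value 2.718 is simply the tabulated figure quoted in the conclusion, not something the code derives.

```python
from math import sqrt

lengths = [42, 39, 42, 45, 43, 40, 39, 41, 40, 42, 43, 42]  # bar lengths in mm
mu0 = 43          # hypothesized mean length
t_critical = 2.718  # tabulated t value for alpha/2 = 0.01 with 11 df, as quoted

n = len(lengths)
xbar = sum(lengths) / n
s = sqrt(sum((x - xbar) ** 2 for x in lengths) / (n - 1))
t_calc = (xbar - mu0) / (s / sqrt(n))

reject_h0 = abs(t_calc) > t_critical  # two-tailed test
print(xbar, round(s, 2), round(t_calc, 2), reject_h0)
```

Carrying full precision gives a statistic of about -2.91; the -2.92 in the text comes from rounding S to 1.78 first. The decision is the same either way.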
(i) Goodness of fit test: it enables us to determine how good a fit there is between the observed
frequencies and the corresponding expected frequencies. It is concerned with a
multinomial population where the population and sample are distributed into two or more
classes according to a single attribute and the probability distribution is hypothesized.
Example: A random sample of 100 families with four children each disclosed the following data.
Female births        0   1   2   3   4
Number of families   5  25  40  20  10
Verify at α = 0.05 whether these data are consistent with the hypothesis that male and female births
are equally likely.
Solution:
Hypothesis H0: female births and male births are equally likely.
H1: female births and male births are not equally likely.
α = 0.05
Test statistic: Obtaining the expected frequencies E_i with the probability of a female birth as
p = 1/2 and q = 1 − 1/2 = 1/2, we have
E_i   6.25   25   37.5   25   6.25
O_i   5      25   40     20   10
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 3.667$
Decision rule: For v = 4 degrees of freedom, reject H0 in favour of H1 if $\chi^2 > \chi^2_{0.05}$,
where $\chi^2_{0.05} = 9.49$.
Conclusion: Since χ² = 3.667 is less than $\chi^2_{0.05} = 9.49$, H0 is accepted. This means that the sample
is consistent with the hypothesis that female and male births are equally likely.
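The expected frequencies and the chi-square statistic in this example can be reproduced as follows. This is a sketch in which the expected counts come from the binomial distribution with 4 children and p = 1/2; the critical value 9.49 is the tabulated figure quoted above.

```python
from math import comb

observed = [5, 25, 40, 20, 10]  # families with 0, 1, 2, 3, 4 female births
n_families = sum(observed)
children, p = 4, 0.5

# Expected count for k female births: 100 * C(4, k) * p^k * (1 - p)^(4 - k)
expected = [
    n_families * comb(children, k) * p ** k * (1 - p) ** (children - k)
    for k in range(children + 1)
]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_value = 9.49  # tabulated chi-square value, 4 df, alpha = 0.05
reject_h0 = chi_square > critical_value

print(expected, round(chi_square, 3), reject_h0)
```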
(ii) Tests of independence: enable us to determine whether or not the attributes are statistically
independent. The test is applied when the population and sample are classified according
to two attributes. Moreover, the probability distribution of the classification is not known.
Expected frequency of any cell = (column marginal frequency)(row marginal frequency) / (total frequency)
Example: The Employment Bureau located in a city received 200 applications in the month of
June, 1987 for registration. A tabular presentation of the applications according to sex and level
of education emerges as under.
Do these data provide evidence at α = 0.05 to indicate that the level of education is related to
sex?
Solution:
Hypothesis H0: the level of education is not related to sex
H1: sex and level of education are related
α = 0.05
Test statistic: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(30 - 24)^2}{24} + \dots + \frac{(50 - 28)^2}{28} = 44.4$, where
$E_i = \frac{(\text{column marginal frequency})(\text{row marginal frequency})}{\text{total frequency}}$, $E_1 = \frac{40(120)}{200} = 24, \dots, E_6 = 28$
Decision rule: For v = 2 df, H0 is rejected in favour of H1 if $\chi^2 > \chi^2_{0.05}$ at α = 0.05, where
$\chi^2_{0.05} = 5.99$.
Conclusion: As χ² = 44.4 > $\chi^2_{0.05} = 5.99$, H0 is rejected. Thus, the sample data indicate that level
of education is related to sex.
(iii) Tests of homogeneity: When two or more samples are drawn from the same population or
from different populations, it is of great interest to know whether the samples have
come from the same population. Alternatively, it means verifying whether the data obtained
from different samples are homogeneous (similarity b/n two or more sets of sample
data). The use of the χ² test statistic for verifying the homogeneous character of two or
more sets of sample data comprises the χ² test of homogeneity. H0: attributes are
homogeneous vs. H1: attributes are different
Example: Suppose a survey is conducted to know the views of different strata of people on the
new industrial policy, which aims at seeking greater participation of the private sector. Four
different samples consisting of 100 professionals, 120 business men, 110 farmers, and 150
students are selected. Each one is asked to indicate whether he/she is in favour of, against or
indifferent to the new industrial policy. The result is shown in the table below. Based on the
sample result, evaluate whether there is a significant difference in the views of the four categories
of person.
Observed frequencies, with expected frequencies in parentheses:
Those who are      Professionals   Business men   Farmers     Students    Total
In favour (F)      50 (39.6)       70 (47.5)      20 (43.5)   50 (59.4)   190
Against (A)        40 (43.8)       30 (52.5)      60 (48.1)   80 (65.6)   210
Indifferent (I)    10 (16.6)       20 (20.0)      30 (18.4)   20 (25.0)   80
Sample size (ni)   100             120            110         150         480
Solution:
Hypothesis H0: the views of the four categories of person on the new industrial policy are similar
(homogeneous)
H1: not H0
α = 0.05
Test statistic: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(50 - 39.6)^2}{39.6} + \dots + \frac{(20 - 25)^2}{25} = 54.57$
Decision rule: reject H0 if $\chi^2 > \chi^2_{0.05}$ at v = (r − 1)(c − 1) = (3 − 1)(4 − 1) = 6
Conclusion: Since χ² = 54.57 > $\chi^2_{0.05} = 12.60$, H0 is rejected. It means there is a significant
difference in the views of the four categories of people on the new industrial policy, i.e. the four
samples don't come from the same population (not homogeneous).
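The expected frequencies and the test statistic for the survey table can be recomputed from the observed counts alone. A minimal sketch, with rows for the three opinions and columns for the four groups:

```python
# Observed counts: rows = in favour, against, indifferent;
# columns = professionals, business men, farmers, students.
observed = [
    [50, 70, 20, 50],
    [40, 30, 60, 80],
    [10, 20, 30, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected frequency of a cell = (row total)(column total) / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 2), df)
```

With unrounded expected frequencies the statistic comes out near 54.7 rather than 54.57; the small gap is due to the rounded expected values in the table, and the conclusion is unchanged.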
CHAPTER 9
Regression is concerned with bringing out the nature of a relationship and using it to find the
best approximate value of one variable corresponding to a known value of the other variable.
Simple linear regression deals with the method of fitting a straight line (regression line) to a
sample of data on two variables, in terms of an equation, so that if the value of one variable is
given we can predict the value of the other variable.
In other words, if we have two variables under study, one may represent the cause and the other
may represent the effect. The variable representing the cause is known as the independent (predictor
or regressor) variable and is usually denoted by X. The variable representing the effect is
known as the dependent (predicted) variable and is usually denoted by Y. Then, if the relationship
between the two variables is a straight line, it is known as simple linear regression.
When there are more than two variables and one of them is assumed to be dependent upon the
others, the functional relationship between the variables is known as multiple linear regression.
Scatter diagram: a plot of all ordered pairs (x, y) on the coordinate plane, which is necessary to
discover whether the relationship b/n two variables is indeed best explained by a straight line.
Example:
Advertizing budget (X) 5 6 7 8 9 10 11
Profit(Y) 8 7 9 10 13 12 13
[Figure: scatter diagram of profit (Y) against advertising budget (X) for the data above]
So if we draw a line, the regression line is one that passes through almost all or closest to all
points in the scatter diagram.
[Figure: scatter of points with a fitted straight line]
Y = α + βX + ε
Where
α = y-intercept
β = slope of the line or regression coefficient
ε = the error term
The y-intercept α and the regression coefficient β are the population parameters. We obtain the
estimates of α and β from the sample. The estimators of α and β are denoted by a and b,
respectively. The fitted regression line is thus,
Ye = a + b X
The above algebraic equation is known as a regression line. The method of finding such a
relationship is known as fitting regression line. For each observed value of the variable X, we
can find out the value of Y. The computed values of Y are known as the expected values of Y
and are denoted by Ye.
The observed values of Y are denoted by Y. The difference between the observed and the
expected values Y-Ye, is known as error or residual, and is denoted by e. The residual can be
positive, negative or zero.
A best fitting line is one for which the sum of squares of the residuals, Σe², is minimum. For
this purpose the principle called the method of least squares is used.
According to the principle of least squares, one would select a and b such that
Σe² = Σ(Y − Ye)² is minimum, where Ye = a + bX.
To minimize this function, first we take the partial derivatives of Σe² with respect to a and b.
Then the partial derivatives are equated to zero separately. These will result in the following
normal equations:
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
Solving these normal equations simultaneously we can get the values of a and b as follows:
$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$ and $a = \bar{y} - b\bar{x}$
Regression analysis is useful in predicting the value of one variable from the given values of
another variable.
Example: A researcher wants to find out if there is any relationship b/n the height of a son and his
father. He took a random sample of 6 fathers and their sons. The heights in inches are given in the
table below. (i) Find the regression line of Y on X
(ii) What would be the height of the son if his father's height is 70 inches?
Height of father (X) 63 65 66 67 67 68
Solution: ΣX = 396, ΣY = 405, ΣX² = 26152, ΣXY = 26740, ΣY² = 27355
(i) $b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{6(26740) - (396)(405)}{6(26152) - (396)^2} = 0.625$
$a = \bar{y} - b\bar{x} = \frac{\sum Y - b\sum X}{n} = \frac{405 - (0.625)(396)}{6} = 26.25$, where $\bar{y} = 67.5$
Y = 26.25 + 0.625X
(ii) If X = 70, then
Y = 26.25 + 0.625(70) = 70; thus the height of the son is 70 inches.
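The slope and intercept can be recomputed from the summary totals with the least squares formulas. This sketch assumes ΣY = 405, the value used in the computation of b, and the variable names are illustrative.

```python
n = 6
sum_x, sum_y = 396, 405           # totals of fathers' and sons' heights
sum_x2, sum_xy = 26152, 26740     # sum of X squared and sum of XY

# Least squares estimates from the normal equations
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * (sum_x / n)   # a = y-bar - b * x-bar

predicted_height = a + b * 70     # fitted line Ye = a + bX at X = 70
print(a, b, predicted_height)
```

The slope is positive, so the fitted line is Ye = 26.25 + 0.625X, and a father of 70 inches gives a predicted son's height of 70 inches.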
Standard error of estimate: measures the average amount by which the estimated Ye values
depart from the corresponding observed Y values (the dispersion of observed values around the line
of regression of Y on X).
$S_{x.y} = \sqrt{\frac{\sum (y_i - y_{ei})^2}{n-2}}$, where Ye = a + bX and
Yi is the observed (actual) value of y.
Example: Given the observations (2, 2), (4, 5), (6, 4) and (8, 7), we get the regression line
Ye = 1 + 0.7X. Find the standard error of estimate of the regression line.
Solution:
Ye = 1 + 0.7Xi, i = 1, 2, 3, 4
Ye1 = 1 + 0.7(2) = 2.4
Ye2 = 1 + 0.7(4) = 3.8
Ye3 = 1 + 0.7(6) = 5.2
Ye4 = 1 + 0.7(8) = 6.6
$S_{x.y} = \sqrt{\frac{\sum (y_i - y_{ei})^2}{n-2}} = \sqrt{\frac{1}{2}\left[(2 - 2.4)^2 + \dots + (7 - 6.6)^2\right]} = 1.26$
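The residuals and the standard error of estimate for these four observations can be verified with a few lines. A minimal sketch using the fitted line Ye = 1 + 0.7X from the example:

```python
from math import sqrt

points = [(2, 2), (4, 5), (6, 4), (8, 7)]  # observed (x, y) pairs
a, b = 1.0, 0.7                            # fitted line Ye = a + bX

# Residual = observed y minus estimated y
residuals = [y - (a + b * x) for x, y in points]

# Standard error of estimate with n - 2 degrees of freedom
n = len(points)
standard_error = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print([round(e, 1) for e in residuals], round(standard_error, 2))
```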
The measure of the degree of relationship between two continuous variables is known as the
correlation coefficient. The population correlation coefficient is represented by ρ and its
estimator by r. The correlation coefficient r is also called Pearson's correlation coefficient since
it was developed by Karl Pearson. r is given as the ratio of the covariance of the variables x and
y to the product of the standard deviations of x and y. Symbolically,
$r = \frac{\mathrm{Cov}(x, y)}{sd(x)\, sd(y)} = \frac{\frac{\sum (x - \bar{x})(y - \bar{y})}{n-1}}{\sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \sqrt{\frac{\sum (y - \bar{y})^2}{n-1}}}$
$= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
$= \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$
The numerator is termed the sum of products of x and y, SPxy. In the denominator, the first
term is called the sum of squares of x, SSx, and the second term is called the sum of squares of y,
SSy. Thus,
$r = \frac{SP_{xy}}{\sqrt{SS_x\, SS_y}}$
For example, if r = 0.8, then r² = 0.64. This means that, on the basis of the sample,
approximately 64% of the variation in the dependent variable, say Y, is explained by the variation
of the independent variable, say X. The remaining 1 − r² = 36% of the variation in Y is unexplained by
variation in X. In other words, variables (factors) other than X could have caused the remaining
36% variation in Y.
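As an illustration of r and r², the sketch below computes both from the sum of products and the sums of squares. The four observations (2, 2), (4, 5), (6, 4), (8, 7) are borrowed from the standard error example purely for illustration.

```python
from math import sqrt

points = [(2, 2), (4, 5), (6, 4), (8, 7)]  # illustrative (x, y) pairs
n = len(points)

sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)
sum_y2 = sum(y * y for _, y in points)

sp_xy = sum_xy - sum_x * sum_y / n  # sum of products
ss_x = sum_x2 - sum_x ** 2 / n      # sum of squares of x
ss_y = sum_y2 - sum_y ** 2 / n      # sum of squares of y

r = sp_xy / sqrt(ss_x * ss_y)
r_squared = r ** 2

print(round(r, 3), round(r_squared, 3))
```

For these points about 75% of the variation in y is accounted for by the linear relationship with x.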
Example: The research director of the Dubbary Saving and Loan Bank collected 24 observations
of mortgage interest rates X and the number of house sales Y at each interest rate. The director
computed that
Σx_i = 276, Σy_i = 768, Σx_i² = 3300, Σy_i² = 25000, Σx_i y_i = 8690
Then compute (i) the coefficient of correlation,
(ii) the coefficient of determination.
Solution:
(i) $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{24(8690) - 276(768)}{\sqrt{\left[24(3300) - (276)^2\right]\left[24(25000) - (768)^2\right]}} = -0.61$
(ii) Coefficient of determination (R²) = r² = (−0.61)² = 0.37; this shows that 37% of the variation
in the number of house sales is explained by the variation in the interest rate.
Ranks may be assigned either by two persons on a single characteristic or by a single person to
two different characteristics.
$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, where $d_i = x_i - y_i$
Example: Two judges gave the following ranks (highest to lowest) to 11 girls in a beauty contest.
To assess whether there is agreement b/n the independent rankings of the two judges, find the rank
correlation and interpret the agreement b/n the two judges.
Girl number          1  2  3  4  5   6   7   8   9  10  11
Ranking of judge A   3  4  1  2  5  10  11   7   9   8   6
Ranking of judge B   2  4  3  1  7   9   6  11  10   5   8
Solution: Construct a table of the differences of paired ranks d_i and d_i².
Girl no   1  2   3  4   5  6   7   8   9  10  11
d_i       1  0  -2  1  -2  1   5  -4  -1   3  -2
d_i²      1  0   4  1   4  1  25  16   1   9   4
Then $r_s = 1 - \frac{6\sum_{i=1}^{11} d_i^2}{n(n^2 - 1)} = 1 - \frac{6(66)}{11(121 - 1)} = 0.7$, where $\sum_{i=1}^{11} d_i^2 = 66$
rs = 0.7 implies there is a very good agreement b/n the judges with regard to the beauty of the girls.
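The rank differences and the rank correlation can be recomputed from the two rankings. A short sketch with illustrative variable names:

```python
judge_a = [3, 4, 1, 2, 5, 10, 11, 7, 9, 8, 6]
judge_b = [2, 4, 3, 1, 7, 9, 6, 11, 10, 5, 8]

n = len(judge_a)
d = [a - b for a, b in zip(judge_a, judge_b)]  # rank differences d_i
sum_d2 = sum(di ** 2 for di in d)              # sum of squared differences

# Spearman rank correlation: r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(d, sum_d2, round(r_s, 2))
```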