
Statistics and Probability Handout

This document provides an introduction to statistics. It discusses the objectives of studying statistics, which are to understand basic terminology, scales of measurement, and methods of data collection. It then defines statistics and distinguishes between descriptive and inferential statistics. Descriptive statistics describe data, while inferential statistics allow estimating characteristics of populations based on samples. The document outlines the stages of a statistical investigation as data collection, organization, analysis, and interpretation. It also defines key statistical terms like population, sample, variable, parameter, and statistic. Finally, it discusses the applications and limitations of statistics.

Uploaded by

Onetwothree Tube
Copyright
© All Rights Reserved

CHAPTER - ONE: INTRODUCTION

Objectives:
Having studied this unit, you should be able to
• understand statistics and basic terminologies
• understand scales of measurement in statistics
• understand the basic methods of data collection

Introduction
Most people become familiar with statistics through radio, television, newspapers and
magazines. For instance, one may find the following statements in a newspaper or reports. “The
HIV prevalence rate in Ethiopia among adults 15-49 years is 1.4 in 2005”; “Among older men,
the mortality rate for smokers is twice the rate of those who never smoked”; “The agricultural
production increased by 5 percent this year”.
Moreover, statistics is used in almost all fields of human endeavor to make scientific decisions
based on data. For example, in public health an administrator would be concerned with the
number of residents who contract a new strain of flu virus during a certain year. In pharmacy,
statistics is used to study the efficacy and potency of drugs. To study plant life, a botanist has to
rely on statistics to know the effect of temperature, rainfall and so on. In general, statistics can be
applied in business, social sciences, natural sciences and engineering.

1.1 Definition and Classification of Statistics

Definition of Statistics
The word 'statistics' is derived from the Latin word 'status', which means a "political state."
Historically, statistics was closely linked with the administrative affairs of a state, such as facts
and figures regarding defense forces, population, housing, food, financial resources, etc.

The word statistics has several meanings. In the first place, it is a plural noun which describes a
collection of numerical data such as employment statistics, accident statistics, population
statistics, economic statistics, agricultural statistics, etc. It is in this sense that the word
'statistics' is usually understood by a layman.

Secondly, the word statistics as a singular noun is used to describe a branch of applied
mathematics whose purpose is to provide methods of dealing with collections of data and
extracting information from them in compact form by tabulating, summarizing and analyzing the
numerical data or a set of observations.

Classification of Statistics

Statistics may be divided into two main branches:


(1) Descriptive Statistics (2) Inferential Statistics
Descriptive statistics includes statistical methods involving the collection, presentation, and
characterization of a set of data in order to describe the various features of the data. In general,
methods of descriptive statistics include graphic methods (bar chart, pie chart, etc.) and numeric
measures (mean, median, variance, etc.). Descriptive statistics do not, however, allow us to draw
conclusions beyond the data we have analyzed. They are simply a way to describe data.
Meaningful and pertinent information cannot be obtained from raw data unless the data are
summarized by the tools of descriptive statistics. Descriptive statistics, therefore, allow us to
present the data in a more meaningful way that makes the data easier to interpret.

Inferential statistics includes statistical methods which facilitate estimating the characteristics of
a population or making decisions concerning a population on the basis of sample results. In this
regard, methods like estimation and hypothesis testing are examples of inferential statistics.

For example, a biologist collected blood samples of 10 students from the biology department to
study blood types. Accordingly, the following data were obtained:

O A O AB A A O O B A O

A summary measure, such as the statement that the proportion of students with blood type O in
the sample is 50%, is an example of descriptive statistics. We can also describe the data using bar
or pie charts.

However, if he/she wants to get information on the proportion of students with blood type O in
the entire class, he/she may use the sample proportion (50%) as an estimate of the corresponding
value for the entire class. This is an example of inferential statistics.

Activity 1.1

1. Define the term statistics in two ways.

2. Explain the difference between descriptive statistics and inferential statistics.

3. Give an example for each branch of statistics.

1.2 Stages in statistical investigation

A statistical study might involve the following stages: collection of data, organizing and
presenting the collected data, analyzing and interpreting the result.

Stage 1: Data collection: this stage involves acquiring data related to the problem at hand.

Stage 2: Organizing and presenting data: this stage involves classifying or sorting the
collected data based on some characteristics or attributes such as age, sex, marital status, etc.
Further, we may use tables, graphs, charts and so on to present the data.

Stage 3: Data analysis: a thorough scrutiny or analysis of the data is necessary in order to reach
conclusions or provide answers to a problem. The analysis might require simple or sophisticated
statistical tools depending on the type of answers that may have to be provided.

Stage 4: Interpretation of the result: logically, a statistical analysis has to be followed by
conclusions in order to be able to make a decision. The technical term used to describe
this last process of a statistical study is interpretation.

Activity 1.2

1. Briefly explain the steps in statistical investigation

1.3 Definition of some terms

Population: the totality (collection) of all individuals, objects or items under consideration.
It consists of all elements, individuals, items or objects whose characteristics are being studied.
The population that is being studied is called the target population.
Sample: a portion of the population selected for study.
Sample survey: the technique of collecting information from a portion of the population.
Census survey: a survey that includes every member of the population.
Variable: a characteristic under study that assumes different values for different elements.
Quantitative variable: a variable that can be measured numerically. The data collected on a
quantitative variable are called quantitative data. Examples include weight, height, number of
students in a class, number of car accidents, etc.
Qualitative variable: a variable that cannot assume a numerical value but can be classified into
two or more non-numerical categories. The data collected on such a variable are called
qualitative or categorical data. Examples include sex, blood type, marital status, religion, etc.
Discrete variable: a variable whose values are countable. Examples include number of patients in a
hospital, number of white blood cells in a droplet of blood sample, number of rodents per plot of
farmland, etc.
Continuous variable: a variable that can assume any numerical value over a certain interval or
intervals. Examples include weight of newborn babies, height of seedlings, temperature
measurements, etc.
Parameter: a statistical measure obtained from population data. Examples include the population
mean, proportion, variance and so on.
Statistic: a statistical measure obtained from sample data. Examples include the sample mean,
proportion, variance and so on.
Unit of analysis: the type of thing being measured in the data, such as persons,
families, households, states, nations, etc.

Activity 1.3

1. Explain the meaning of the following terms and give examples


a. Quantitative variable c. Qualitative variable
b. Discrete variable d. Continuous variable

2. Explain which of the following variables are quantitative and which are qualitative
a. Number of persons in a family b. Marital status of people
c. Monthly phone bills d. Length of frog jump
3. Classify the above variables as discrete or continuous.
4. Clearly identify the difference between population and sample by giving example.
5. Differentiate the terms statistic and parameter.

1.4 Applications, uses and limitations of statistics

Application of statistics
We pointed out that statistics has already become a very important subject area, and that various
tools of statistics are being used to solve problems in everyday life, in research, in marketing, in
planning, in production and quality control, and in other areas. Nevertheless, statistics has its own
limitations and it can also be misused. In the following section we outline the limitations.

Limitations of statistics

• Statistics deals only with those subjects of inquiry which are capable of being quantitatively
measured and numerically expressed.
• Statistics deals only with aggregates of facts, and no importance is attached to individual
items.
• Statistical data are only approximately, not mathematically, correct.
• Statistics is liable to be misused. Hence expertise in the subject is essential. Besides,
honesty is very important in the use of statistics.

Activity 1.4

1. Briefly explain the application of statistics in different sectors.

2. Discuss the limitations of statistics, using an example for each limitation.

1.5 Scales of measurement

If we use different types of measurement scales having different levels of refinement to measure
one and the same object we obtain different amounts and types of information about a variable
under consideration. Formally, we distinguish among four levels of measurement scales, and,
therefore, among four types of data.

Nominal scale: it is the simplest measurement scale. Values of a nominal scale are used merely to
categorize the quantity being measured, and hence there is no natural ordering of the levels or
values of the scale. For example, sex of an individual may be male or female. There is no natural
ordering of the two sexes. Other examples include religion, blood type, eye colour, marital
status, etc. The values of a nominal scale can be coded using numerical values; however, we
cannot perform any meaningful mathematical operations on the numbers used as codes.

Ordinal scale: this measurement scale is similar to the nominal scale, but the levels or categories
can be ranked or ordered. That is, we can compare levels or categories of the scale. Therefore, this
scale of measurement gives better information on the quantities being measured as compared to
the nominal scale. For example, the living standard of a family can be poor, medium or high. These
categories can be ordered: poor is less than medium, and medium is less than high.
However, the distance or magnitude between the levels, say between poor and medium, is not
clearly known.

Interval scale: this measurement scale shares the ordering or ranking and labeling properties of
the ordinal scale of measurement. Besides, the distance or magnitude between two values is clearly
known (meaningful). However, it lacks a true zero point (i.e., the zero point is not meaningful). For
example, consider the temperature of an object in degrees centigrade or Fahrenheit. If the
temperature of an object is zero degrees centigrade, it does not mean that the object lacks heat.
Hence zero is an arbitrary point on the scale. It does not make sense to say that 80°F is twice as hot
as 40°F: expressed in centigrade (about 26.7°C and 4.4°C) the ratio of the same two temperatures
would be 6, and neither ratio is meaningful. We can do subtraction and addition on interval level
data, but division and multiplication do not give meaningful results.
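The temperature example can be checked numerically. The short Python sketch below is only an illustration; the function name fahrenheit_to_celsius and the variable names are assumptions introduced here, not part of the handout. It shows that the ratio of the two temperatures depends on the unit used, while the difference merely rescales by a constant factor.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees centigrade."""
    return (temp_f - 32) * 5 / 9


hot_f, cold_f = 80.0, 40.0
hot_c = fahrenheit_to_celsius(hot_f)    # about 26.7
cold_c = fahrenheit_to_celsius(cold_f)  # about 4.4

# Ratios change with the unit, so "twice as hot" has no fixed meaning.
ratio_f = hot_f / cold_f   # 2 in Fahrenheit
ratio_c = hot_c / cold_c   # about 6 in centigrade

# Differences stay comparable: they only rescale by the constant 5/9.
diff_f = hot_f - cold_f
diff_c = hot_c - cold_c

print(f"ratio in F: {ratio_f:.2f}, ratio in C: {ratio_c:.2f}")
print(f"difference in F: {diff_f:.2f}, difference in C: {diff_c:.2f}")
```

Because the zero point of each temperature scale is arbitrary, the two units disagree about the ratio, which is why only addition and subtraction are meaningful at the interval level.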

Ratio scale: it is the highest level of measurement scale. It shares the ordering, labeling and
meaningful distance properties of the interval scale. In addition, it has a true or meaningful zero
point. The existence of a true zero makes the ratio of two measures meaningful. For instance, if
your salary is 1000 birr and your wife's is 2000 birr, we can say that your wife earns twice as much
as you. If you do not have any source of income, your income is zero on this scale, and this is a
meaningful assignment. Other examples include weight, height, volume measurements, etc.
We can do subtraction, addition, multiplication and division on ratio level data.

The most precise scale is the ratio scale and the least precise is the nominal scale. Ratio and
interval level data are classified as quantitative variables, while nominal and ordinal level data
are classified as qualitative variables.

Activity 1.5
1. For each of the variables, indicate whether it is quantitative or qualitative and specify the
scale of measurement that is employed when taking measurement on each
a. Class standing of the members of this class relative to each other
b. Admitting diagnosis of patients admitted to a mental health clinic
c. Weight of babies born in a hospital during a year
d. Gender of babies born in a hospital during a year.
e. Under-arm temperature of day-old infants born in a hospital.

CHAPTER - TWO

METHODS OF DATA COLLECTION AND PRESENTATION

Objectives:
After completing this unit you should be able to
• organize data using frequency distribution.
• present data using suitable graphs or diagrams.

Introduction

The amount of data collected in real life situations is often too large, so we need some methods
to organize it. One such method is grouping, that is, putting data into groups rather than
treating each observation individually. In fact, raw data provide little, if any, information to
decision makers. Thus, they need a means of converting the raw data into useful information.
Hence, the purpose of this chapter is to introduce tools used for data presentation.

2.1 Classification and tabulation of data

The purposes of classifying and tabulating data are to display the points of similarity and
dissimilarity; to save mental strain by systematic condensation and suppression of irrelevant
detail; to enable one to form a mental picture of objects of perception; and to prepare the ground
for comparison and inference.
Types of classification
1. Geographical – in terms of cities, districts, countries, etc.
2. Chronological – on the basis of time.
3. Qualitative – according to some qualitative characteristics.
4. Quantitative – in terms of magnitude.

One can also use a combination of these to classify data.

Tabulation: tables may be classified according to the number of characteristics used for
tabulation.
1. Simple or one way table: it uses only one characteristic or variable for classification.

Example 2.1: Students who took introduction to statistics in 1998 E.C. by gender.

Gender    Number
Male      2000
Female    700

2. Two-way table: it uses two characteristics for classification.

Example 2.2: Students who took introduction to statistics in 1998 E.C. by age and gender.

Age             Number of male   Number of female
19 and below    200              180
20-25           1415             385
26 and above    385              135

3. Higher order table: it results when we have more than two characteristics of classification.
For instance, we can classify the students who took introduction to statistics in 1998 E.C. by age,
gender and faculty.
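As a small illustration of how a two-way table relates to a one-way table, the Python sketch below stores the counts of Example 2.2 and sums them over age groups. The dictionary layout and the variable names are assumptions made for this sketch; the handout itself does not prescribe any software.

```python
# Counts copied from Example 2.2: age group -> number of students by gender.
two_way = {
    "19 and below": {"male": 200, "female": 180},
    "20-25": {"male": 1415, "female": 385},
    "26 and above": {"male": 385, "female": 135},
}

# Summing over age groups collapses the two-way table into a one-way table by gender.
column_totals = {
    gender: sum(row[gender] for row in two_way.values())
    for gender in ("male", "female")
}

# Summing over gender gives a one-way table by age group instead.
row_totals = {age: sum(row.values()) for age, row in two_way.items()}

grand_total = sum(row_totals.values())

print(column_totals)
print(row_totals)
print(grand_total)
```

The gender totals obtained this way agree with the one-way table of Example 2.1 (2000 male and 700 female students).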

2.2 Introduction to methods of data collection

There are many types of data collection techniques which are used to collect data for a study.
There are two types of data: primary and secondary data. Primary data refers to the statistical
material which the investigator originates for the purpose of the inquiry. Secondary data, on the
other hand, refers to statistical material which is not originated by the investigator himself,
but which he obtains from someone else's records.

Primary methods of data collection: those methods that aim at collecting primary data are
termed primary methods. These may involve data collection using observation, personal
interview, self-administered questionnaire, mailed questionnaire, etc.

Secondary method of data collection: secondary data can be obtained from published or
unpublished documents: reports, journals, magazines, articles, etc.
Not every aggregate of numbers can be called statistical data. We say an aggregate of numbers is
statistical data when the numbers are

• Comparable
• Meaningful and
• Collected for a well defined objective
Raw data: collected data which have not been organized numerically.
Example: 25, 10, 32, 18, 6, 93, 4.
An array: an arrangement of raw numerical data in ascending or descending order of
magnitude.
• It enables us to find the range of the data set easily, and it also gives us some idea about
the general characteristics of the distribution.
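A minimal Python sketch of turning the raw data above into an array and reading off the range; the variable names are chosen here only for illustration.

```python
# Raw data from the example above, in the order collected.
raw_data = [25, 10, 32, 18, 6, 93, 4]

# An array is the same data arranged in order of magnitude.
ascending_array = sorted(raw_data)
descending_array = sorted(raw_data, reverse=True)

# Once the data are ordered, the range is the last value minus the first.
data_range = ascending_array[-1] - ascending_array[0]

print(ascending_array)
print(descending_array)
print(data_range)
```

Sorting makes the smallest value (4) and the largest value (93) immediately visible, so the range of 89 can be read directly from the two ends of the array.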
Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.

Primary source: a source of data that supplies first-hand information for the immediate purpose.
• Primary data: data originally collected for the immediate purpose.
- Primary data are usually more expensive to obtain than secondary data.
Secondary source: individuals or agencies which supply data originally collected for
other purposes by them or by others.
- Usually they are published or unpublished materials, records, reports, etc.
• Secondary data: data collected from a secondary source.
The process of data collection from a primary source may involve field trials, laboratory
experiments, surveys (sample survey and census survey), etc.

Activity 2.1
1. Distinguish between primary and secondary data.
2. What are the various methods of collecting primary data? Give example of each.
3. Define secondary data. What are their sources?
4. Describe primary and secondary method of data collection. In what special circumstance
are the two methods suitable?

2.3 Methods of Data Presentation


2.3.1 Frequency distributions

In this section, we will concentrate on some of the frequently used methods of organizing data.
The easiest method of organizing data is a frequency distribution, which converts raw data
into a meaningful pattern for statistical analysis.
The main uses of a frequency distribution are
• to organize data in a meaningful, intelligible way.
• to enable one to determine the nature or shape of the distribution; how the observations
cluster around a central value; and how the values spread around the center of the data.
• to facilitate computational procedures for measures of average and spread.
• to enable one to draw charts and graphs for the presentation of data.
• to enable one to make comparisons between data sets.
Frequency distribution: a grouping of data into categories showing the number of observations
in each mutually exclusive category.
Array: data put in an ascending or descending order of magnitude.
Grouped data: data presented in the form of a frequency distribution.
Frequency: the number of observations corresponding to a fixed value or to a class of values.
Relative frequency: the number obtained when the frequency of a class is divided by total
number of observations.
Generally, there are three basic types of frequency distributions: Categorical, Ungrouped and
Grouped frequency distributions.

1. Categorical frequency distribution


– the data are usually qualitative
– the scales of measurements for the data are usually nominal or ordinal

The categorical frequency distribution is used for data which can be placed in specific categories
such as nominal or ordinal level data. For example, data such as political affiliation, religious
affiliation, blood type, marital status, or major field of study would use categorical frequency
distributions.

Example 1.1: The following data are on the political party affiliations of a sample of 40 statistics
students. D, R, and O stand for Democratic, Republican and Other, respectively.
D D D D O R O R O R O R O D D R D D D R
R O R D R R O R R R R R O O R R D R D D
The classes for grouping are 'Democratic', 'Republican' and 'Other'.

Table 2.12: Number of students by political party affiliation.

Class        Frequency   Relative frequency
Democratic   13          0.325
Republican   18          0.450
Other        9           0.225
Total        40          1.000
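The counting in Example 1.1 can be reproduced with a few lines of Python. This is a sketch only; it uses collections.Counter from the standard library, and the variable names and the mapping from letter codes to class names are assumptions made for the illustration.

```python
from collections import Counter

# The 40 observations of Example 1.1, copied row by row.
affiliation_rows = [
    "D D D D O R O R O R O R O D D R D D D R",
    "R O R D R R O R R R R R O O R R D R D D",
]
observations = " ".join(affiliation_rows).split()

class_names = {"D": "Democratic", "R": "Republican", "O": "Other"}

# Frequency: how many observations fall in each category.
frequency = Counter(observations)
n = len(observations)

# Relative frequency: class frequency divided by the total number of observations.
relative_frequency = {code: frequency[code] / n for code in class_names}

for code, name in class_names.items():
    print(f"{name:<12}{frequency[code]:>4}{relative_frequency[code]:>8.3f}")
print(f"{'Total':<12}{n:>4}{sum(relative_frequency.values()):>8.3f}")
```

The printed rows match the frequencies and relative frequencies in the table above.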
Example 1.2: Thirty students took the Stat 273 course last year and their grades were as follows.
Construct an appropriate frequency distribution for these data.

B B C B A C
D C C C B B
B A B C D C
A F B F C A
B C C A C D

There are five kinds of grades: A, B, C, D and F which may be used as the classes for constructing the
distribution. The procedure for constructing a frequency distribution for categorical data is given below.

STEP 1. Construct a table as shown below.

Class   Tally   Frequency   Percent*
(I)     (II)    (III)       (IV)

STEP 2. Tally the data and place the results in column (II)

STEP 3. Count the Tallies and put the results in column (III)
STEP 4. Calculate the percentage (%) of the frequency in each class by using the formula
$\% = \frac{f}{n} \times 100$
where f = frequency of the class (result in column (III)) and n = total number of observations.
* Percentages are not normally part of a frequency distribution, but they can be included since
they are important in different statistical analyses.

STEP 5. [For checking] Find the totals of column (III) and column (IV) and verify that they are
n (the total number of observations) and 100% (apart from rounding error), respectively.
Finally, the frequency distribution becomes as follows.

Class   Tally            Frequency   Percent*
(I)     (II)             (III)       (IV)
A       /////            5           16.7
B       ///// ////       9           30.0
C       ///// ///// /    11          36.7
D       ///              3           10.0
F       //               2           6.7
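The five steps above can be sketched in Python as follows. The helper tally_marks and the table layout are illustrative assumptions, not part of the handout; the check in the last lines corresponds to STEP 5.

```python
from collections import Counter

# Grades of the thirty students in Example 1.2, copied row by row.
grade_rows = [
    "B B C B A C",
    "D C C C B B",
    "B A B C D C",
    "A F B F C A",
    "B C C A C D",
]
grades = " ".join(grade_rows).split()


def tally_marks(count: int) -> str:
    """Return tally marks for a count, written in groups of five (hypothetical helper)."""
    groups = ["/////"] * (count // 5)
    if count % 5:
        groups.append("/" * (count % 5))
    return " ".join(groups)


counts = Counter(grades)  # STEPS 2 and 3: tally and count
n = len(grades)

# STEP 4: percentage of each class = f / n * 100.
table = []
for grade in ["A", "B", "C", "D", "F"]:
    f = counts[grade]
    table.append((grade, tally_marks(f), f, 100 * f / n))

for grade, tally, f, percent in table:
    print(f"{grade:<6}{tally:<16}{f:>3}{percent:>8.1f}")

# STEP 5: the frequencies must add up to n and the percentages to 100.
total_frequency = sum(row[2] for row in table)
total_percent = sum(row[3] for row in table)
print(total_frequency, round(total_percent, 1))
```

The check uses the unrounded percentages; percentages rounded to one decimal place can add up to slightly more or less than 100.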

2. Ungrouped frequency distribution

An ungrouped frequency distribution is a table of all raw score values that occur in the data
along with their corresponding frequencies. An ungrouped frequency distribution is often
constructed for a small set of data or for a discrete variable.

Constructing an ungrouped frequency distribution

To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores in the
collected data. Then make a columnar table of the raw score values arranged in order of
magnitude, together with the number of times each value is repeated, i.e., the frequency of that value.
Tallies can be used to make the counting easier.

Example 2.1: The following data are the ages in years of 20 women who attended health education last year:
30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30, and 41.

Construct a frequency distribution for these data.

STEP 1. Find the range of the data:
$\text{Range} = \text{Maximum observation} - \text{Minimum observation} = 42 - 29 = 13$
STEP 2. Construct a table, tally the data and complete the frequency column. The frequency distribution
becomes as follows.

Age   Tally   Frequency
29    /       1
30    ////    4
31    /       1
32    ///     3
33    /       1
35    //      2
36    //      2
37    /       1
39    /       1
41    ///     3
42    /       1
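A short Python sketch of the same construction, assuming the data are held in a plain list; collections.Counter does the counting and the variable names are illustrative.

```python
from collections import Counter

# Ages of the 20 women in Example 2.1.
ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

# STEP 1: range = maximum observation - minimum observation.
age_range = max(ages) - min(ages)

# STEP 2: count how often each observed value occurs, listed in order of magnitude.
frequency = Counter(ages)
distribution = [(age, frequency[age]) for age in sorted(frequency)]

print("Range:", age_range)
for age, count in distribution:
    print(f"{age:<5}{'/' * count:<6}{count}")
```

Only values that actually occur are listed, as in the table above; ages with zero frequency (for example 34) are left out.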

3. Grouped frequency distribution

Components of a grouped frequency distribution


Class limits: the values of a variable which typically serve to identify the classes of a frequency
distribution. They are sometimes referred to as nominal or apparent limits. The smaller and the
larger values are known as the lower and the upper class limits, respectively. They should be
selected in such a way that they have the same number of significant places or units of
measurement as the observations to be classified.

Class boundaries: the precise points which separate various classes rather than the values
included in any one of the classes. They are sometimes referred to as exact or true limits. They
leave no space for ambiguity and overlapping. A class boundary is located mid-way between the
upper class limit of a class and the lower class limit of the next higher class. They are carried out
to one more decimal place than the class limits.

Class mark: the point which divides the class into two equal parts. This is also known as class
mid-point. This can be determined by dividing the sum of the two limits or the sum of the two
boundaries by 2.

Class width: the length of a class.


Example 2.3: The following data are the weights in kg of 40 individuals who participated in a diet
program for weight loss:

70 64 99 55 64 89 87 65 62 38 67 70 60 69 78 39 75 56 71 51
99 68 95 86 57 53 47 50 55 81 80 98 51 36 63 66 85 79 83 70

By grouping data into classes we can make the data much easier to read and understand. We
group these data by 10s. The smallest weight is 36 kg, thus the first class of weights is 31 kg up
to and including 40 kg.

Table 3.1: Distribution of weights.

Class       Class boundary   Count (Frequency)
31 – 40     30.5 – 40.5      3
41 – 50     40.5 – 50.5      2
51 – 60     50.5 – 60.5      8
61 – 70     60.5 – 70.5      12
71 – 80     70.5 – 80.5      5
81 – 90     80.5 – 90.5      6
91 – 100    90.5 – 100.5     4
Total                        40

For this example, the first class is ‘31-40’. Lower limit of this class = 31; upper limit = 40. The
lower class boundary = 30.5; upper class boundary = 40.5. The width of the class = upper class
boundary - lower class boundary = 40.5-30.5 = 10. The class mark (class mid-point) of this class
is (31+40)/2 = 35.5. The values 36, 39, 38 are included in this class. Therefore, the frequency of
this class is 3.
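The quantities worked out for the first class can be computed for every class of Table 3.1 in the same way. The Python sketch below is an illustration under the assumption that the unit of measurement is 1 kg; the variable names are not from the handout.

```python
# Weights in kg of the 40 individuals in Example 2.3.
weights = [70, 64, 99, 55, 64, 89, 87, 65, 62, 38, 67, 70, 60, 69, 78, 39, 75, 56, 71, 51,
           99, 68, 95, 86, 57, 53, 47, 50, 55, 81, 80, 98, 51, 36, 63, 66, 85, 79, 83, 70]

# Class limits of Table 3.1 (lower limit, upper limit).
class_limits = [(31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
unit = 1  # assumed unit of measurement: weights are recorded to the nearest kg

rows = []
for lower, upper in class_limits:
    lower_boundary = lower - unit / 2          # halfway to the previous class
    upper_boundary = upper + unit / 2          # halfway to the next class
    width = upper_boundary - lower_boundary    # class width
    mark = (lower + upper) / 2                 # class mark (mid-point)
    frequency = sum(lower <= w <= upper for w in weights)
    rows.append((lower, upper, lower_boundary, upper_boundary, width, mark, frequency))

for lower, upper, lcb, ucb, width, mark, frequency in rows:
    print(f"{lower:>3} - {upper:<4}{lcb:>6} - {ucb:<7}{width:>5}{mark:>7}{frequency:>4}")
```

Each class has a width of 10, and the frequencies add up to the 40 observations.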

• Cumulative frequency (Cf) less than type – is the total frequency of all values (observations)
less than or equal to the upper class boundary for the given class.
• Cumulative frequency (Cf) more than type – is the total frequency of all values (observations)
greater than or equal to the lower class boundary for the given class.
• A tabular arrangement of class intervals together with their corresponding cumulative frequencies
(either more than or less than type, as defined above) is called a cumulative frequency distribution.
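As an illustration of the two definitions, the sketch below computes both types of cumulative frequency for the class frequencies of Table 3.1. It uses itertools.accumulate from the Python standard library; the variable names are assumptions made here.

```python
from itertools import accumulate

# Class frequencies of Table 3.1, from the class 31-40 up to the class 91-100.
frequencies = [3, 2, 8, 12, 5, 6, 4]

# Less than type: running total from the first class down to the given class.
cf_less_than = list(accumulate(frequencies))

# More than type: running total from the last class back up to the given class.
cf_more_than = list(accumulate(reversed(frequencies)))[::-1]

for f, less, more in zip(frequencies, cf_less_than, cf_more_than):
    print(f"{f:>4}{less:>6}{more:>6}")
```

The less than type ends at the total frequency (40) in the last class, while the more than type starts at the total frequency in the first class.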

Steps for construction of a grouped frequency distribution

STEP 1. Find the maximum (Max) and the minimum (Min) observations, and then compute their range, R:
$R = \text{Max} - \text{Min}$
STEP 2. Fix the number of classes desired (k). There are two ways to fix k:
– Fix k arbitrarily between 6 and 20, or
– Use Sturges' formula: $k = 1 + 3.322 \log_{10} N$, where N is the total frequency, and round
this value of k up to get an integer.
STEP 3. Find the class width (W) by dividing the range by the number of classes, and round the result up
to get an integer value:
$W = \frac{R}{k}$
STEP 4. Pick a suitable starting point less than or equal to the minimum value. This starting point is the
lower limit of the first class. Continue to add the class width to this lower limit to get the rest of
the lower limits.

STEP 5. Find the upper class limits. To find the upper class limit of the first class, subtract one unit of
measurement from the lower limit of the second class. Then continue to add the class width to this
upper limit so as to get the rest of the upper limits.
STEP 6. Compute the class boundaries as $LCB = LCL - \frac{1}{2}U$ and $UCB = UCL + \frac{1}{2}U$,
where LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary, UCB = upper
class boundary and U = one unit of measurement. The class boundaries are also halfway between the
upper limit of one class and the lower limit of the next class.

STEP 7. Tally the data and Find the frequencies.


STEP 8. (If necessary) Find the cumulative frequencies (more than and less than types).
Example 3.1: The number of hours 40 employees spent on their job during the last 7 working days is given
below.

62 50 35 36 31 43 43 43

41 31 65 30 41 58 49 41

37 62 27 47 65 50 45 48

27 53 40 29 63 34 44 32

58 61 38 41 26 50 47 37

Construct a suitable frequency distribution for these data using 8 classes.

STEP 1. Max = 65, Min = 26 so that R = 65-26 = 39


STEP 2. It is already determined to construct a frequency distribution having 8 classes.
STEP 3. Class width $W = \frac{39}{8} = 4.875 \approx 5$
STEP 4. Starting point = 26 = lower limit of the first class. And hence the lower class limits become
26 31 36 41 46 51 56 61

STEP 5. Upper limit of the first class = 31-1 = 30. And hence the upper class limits become
30 35 40 45 50 55 60 65

The lower and the upper class limits (Steps 4 and 5) can be written as follows.

Class limits
26 – 30
31 – 35
36 – 40
41 – 45
46 – 50
51 – 55
56 – 60
61 – 65

STEP 6. By subtracting 0.5 units of measurement from the lower class limits and by adding 0.5 units of
measurement to the upper class limits, we can get lower and upper class boundaries as follows.

Class boundaries
25.5 – 30.5
30.5 – 35.5
35.5 – 40.5
40.5 – 45.5
45.5 – 50.5
50.5 – 55.5
55.5 – 60.5
60.5 – 65.5

STEPS 7 and 8 are displayed in the following table (columns 3 and 4 for Step 7, columns 5 and 6 for Step 8).

Class limits   Class boundaries   Tally         Frequency   Cf (less than type)   Cf (more than type)
26 – 30        25.5 – 30.5        /////         5           5                     40
31 – 35        30.5 – 35.5        /////         5           10                    35
36 – 40        35.5 – 40.5        /////         5           15                    30
41 – 45        40.5 – 45.5        ///// ////    9           24                    25
46 – 50        45.5 – 50.5        ///// //      7           31                    16
51 – 55        50.5 – 55.5        /             1           32                    9
56 – 60        55.5 – 60.5        //            2           34                    8
61 – 65        60.5 – 65.5        ///// /       6           40                    6
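The whole construction of Example 3.1 can be sketched in Python as below. This is an illustration under two stated assumptions: the unit of measurement is 1 hour, and the first lower limit is taken to be the minimum observation, as in the worked solution. The variable names are not from the handout.

```python
import math
from itertools import accumulate

# Hours worked by the 40 employees in Example 3.1.
hours = [62, 50, 35, 36, 31, 43, 43, 43, 41, 31, 65, 30, 41, 58, 49, 41,
         37, 62, 27, 47, 65, 50, 45, 48, 27, 53, 40, 29, 63, 34, 44, 32,
         58, 61, 38, 41, 26, 50, 47, 37]

k = 8     # STEP 2: number of classes, fixed in advance by the example
unit = 1  # assumed unit of measurement (whole hours)

data_range = max(hours) - min(hours)   # STEP 1: 65 - 26
width = math.ceil(data_range / k)      # STEP 3: 39 / 8 = 4.875, rounded up

# STEPS 4 and 5: class limits, starting from the minimum observation.
lower_limits = [min(hours) + i * width for i in range(k)]
upper_limits = [low + width - unit for low in lower_limits]

# STEP 6: class boundaries lie half a unit outside the class limits.
boundaries = [(low - unit / 2, up + unit / 2)
              for low, up in zip(lower_limits, upper_limits)]

# STEP 7: frequencies; STEP 8: cumulative frequencies of both types.
frequencies = [sum(low <= h <= up for h in hours)
               for low, up in zip(lower_limits, upper_limits)]
cf_less_than = list(accumulate(frequencies))
cf_more_than = list(accumulate(reversed(frequencies)))[::-1]

for (low, up), (lcb, ucb), f, less, more in zip(
        zip(lower_limits, upper_limits), boundaries,
        frequencies, cf_less_than, cf_more_than):
    print(f"{low} - {up}   {lcb} - {ucb}   {f:>2} {less:>3} {more:>3}")
```

With other data sets the rounded class width may fail to cover the maximum when the range is an exact multiple of k; in that case one more class or a wider class is needed, which this sketch does not handle.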

Example 3.2: The following data are on the number of minutes to travel from home to work for a
group of automobile workers.

28 25 48 37 41 19 32 26 16 23 23 29 36

31 26 21 32 25 31 43 35 42 38 33 28.
Construct a frequency distribution for these data.

Solution:

• Range = 48 – 16 = 32

• $K = 1 + 3.322 \log_{10} 25 = 5.64 \approx 6$

• W = 32/6 = 5.33, which is rounded up to the nearest integer, i.e., W = 6.

Let the lower limit of the first class be 16; then the frequency distribution is as follows:

Class limit   Class boundaries   Tally        Frequency
16-21         15.5-21.5          ///          3
22-27         21.5-27.5          ///// /      6
28-33         27.5-33.5          ///// ///    8
34-39         33.5-39.5          ////         4
40-45         39.5-45.5          ///          3
46-51         45.5-51.5          /            1
Total                                         25
Table 3.2: The distribution of the time in minutes spent by automobile workers to travel from
home to work place.

Time (in minutes)   Number of workers
16-21               3
22-27               6
28-33               8
34-39               4
40-45               3
46-51               1
Total               25

This frequency distribution is more understandable than the raw data. We can see some features
of the data from this table. For instance, most observations are found in the second and
third classes. This in turn implies that many workers took around 22 to 33 minutes to travel from
home to work place.
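The choice of the number of classes and of the class width in Example 3.2 can be reproduced with Sturges' formula. The function sturges_classes below is a hypothetical helper written for this sketch, and the unit of measurement is assumed to be 1 minute.

```python
import math


def sturges_classes(n: int) -> int:
    """Number of classes from Sturges' formula k = 1 + 3.322 log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))


# Travel times in minutes of the 25 automobile workers in Example 3.2.
minutes = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36,
           31, 26, 21, 32, 25, 31, 43, 35, 42, 38, 33, 28]

data_range = max(minutes) - min(minutes)   # 48 - 16
k = sturges_classes(len(minutes))          # 5.64 rounded up
width = math.ceil(data_range / k)          # 5.33 rounded up

# Class limits start at the minimum observation; the unit of measurement is 1 minute.
lower_limits = [min(minutes) + i * width for i in range(k)]
upper_limits = [low + width - 1 for low in lower_limits]
frequencies = [sum(low <= m <= up for m in minutes)
               for low, up in zip(lower_limits, upper_limits)]

print("k =", k, "W =", width)
for low, up, f in zip(lower_limits, upper_limits, frequencies):
    print(f"{low}-{up}: {f}")
```

The resulting classes and frequencies agree with Table 3.2.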

Activity 2.2
1. In a biology experiment the lengths of 25 worms, measured to the nearest 0.1cm, were:
9.5 8.1 5.1 6.6 9.3 9.1 6.5 5.0 6.9 7.6 9.3 8.3 6.0
6.2 7.4 7.7 7.8 7.9 7.0 7.8 5.4 9.8 6.3 7.5 8.4
Construct a frequency distribution for the data by using Sturges' rule for the number of classes. What
do you think about the typical length of these worms?

Types of grouped frequency distributions

Based on the type of frequency assigned to the classes we have three types of grouped frequency
distributions:
- Absolute frequency distribution
- Relative frequency distribution
- Cumulative frequency distribution
The frequency distributions that we have seen in the previous examples (Example 3.2 and Table 3.2) are
absolute frequency distributions because the frequencies assigned are absolute frequencies.

Definition 2.1: A relative frequency distribution is a distribution which specifies
the frequency of a class relative to the total frequency.

Example 3.3: Convert the absolute frequency distribution in Table 3.2 to a relative frequency
distribution.
Solution: First we find the relative frequency of each class. The relative frequency of a class is the
frequency of the class divided by the total number of observations. For instance, the relative frequency of
the first class is 3/25 = 0.12, the relative frequency of the second class is 6/25 = 0.24, and so on. Thus, the
relative frequency distribution is as shown in Table 3.3.
Table 3.3: The distribution of the time in minutes spent by automobile workers to travel from home to
work place.
Time (in minute) Relative frequency
16-21 0.12
22-27 0.24
28-33 0.32
34-39 0.16
40-45 0.12
46-51 0.04
Total 1
Note: Proportion may also be changed to percentages to obtain a percentage relative frequency
distribution.

Example 3.4: Convert the above relative frequency distribution to a percentage relative frequency
distribution.

Solution: We simply multiply the relative frequencies of the above relative frequency distribution by 100.

Table 3.4: The distribution of the time in minutes spent by automobile workers to travel from home to
work.

Time (in minute) Percentage relative frequency


16-21 12
22-27 24
28-33 32
34-39 16
40-45 12
46-51 4
Total 100

Definition 2.2: Cumulative frequency refers to the number of observations
that are below a specified value or that are above a specified value.
Note: Class boundaries are mostly used to obtain cumulative frequencies. Based on whether the
observations are bounded from above or from below we can have a cumulative less than or a
cumulative more than frequency distributions, respectively.

Example 3.5: Convert the absolute frequency distribution in Table 3.2 into:

i) a cumulative frequency distribution less than type.


ii) a cumulative frequency distribution more than type.

Solution:
We use the class boundaries to form cumulative frequencies.

Table 3.5: The less than and more than type cumulative frequency distribution of the time in
minutes spent by automobile workers to travel from home to work place.

Time (in minute)     Cumulative frequency   Cumulative frequency
(class boundaries)   (less than type)       (more than type)
15.5 – 21.5          3                      25
21.5 – 27.5          9                      22
27.5 – 33.5          17                     16
33.5 – 39.5          21                     8
39.5 – 45.5          24                     4
45.5 – 51.5          25                     1
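
The relative, percentage and cumulative frequencies of the travel-time distribution can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative only.

```python
from itertools import accumulate

# Absolute frequencies of the classes 16-21, 22-27, ..., 46-51
freqs = [3, 6, 8, 4, 3, 1]
n = sum(freqs)                                   # total number of observations

relative = [f / n for f in freqs]                # relative frequencies
percent = [100 * f / n for f in freqs]           # percentage relative frequencies

# "Less than" type: running total from the first class downwards
less_than = list(accumulate(freqs))
# "More than" type: running total from the last class upwards
more_than = list(accumulate(reversed(freqs)))[::-1]

print(relative)
print(percent)
print(less_than)   # [3, 9, 17, 21, 24, 25]
print(more_than)   # [25, 22, 16, 8, 4, 1]
```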

Activity 2.3
1. The following are the scores of 32 students who took statistics test:
55 70 80 75 90 80 60 100 95 70 75 85 80 80 70 95

100 80 85 70 85 90 80 75 85 70 90 60 80 70 85 80

Organize this data set using an absolute frequency distribution consisting of 7 classes. Start the first
class with the minimum value in the data set. Construct also the relative frequency distribution, the
less than cumulative frequency distribution, and the more than cumulative frequency distribution.
What do you think about the typical score of these students? How many students score below the
lower limit of the third class?

2.3.2 Diagrammatic and graphical presentation of data

2.3.2.1 Graphs for quantitative data


1. Histogram

It consists of a set of adjacent rectangles whose bases are marked off by class boundaries (not
class limits) along the horizontal axis and whose heights are proportional to the frequencies
associated with the respective classes.

To construct a histogram from a data set:


1. Arrange the data in increasing order.
2. Choose class intervals so that all data points are covered.
3. Construct a frequency table.
4. Draw adjacent bars having heights determined by the frequencies in step 3.

The importance of a histogram is that it enables us to organize and present data graphically so as
to draw attention to certain important features of the data. For instance, a histogram can often
indicate how symmetric the data are; how spread out the data are; whether there are intervals
having high levels of data concentration; whether there are gaps in the data; and whether some
data values are far apart from others.
Example: Construct a histogram for the frequency distribution of the time spent by the
automobile workers.

Table 3.6: The distribution of the time in minutes spent by automobile workers to travel from
home to work.

Time (in minute) Class mark Number of workers


15.5- 21.5 18.5 3
21.5-27.5 24.5 6
27.5-33.5 30.5 8
33.5-39.5 36.5 4
39.5-45.5 42.5 3
45.5-51.5 48.5 1

2. Frequency Polygon

A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical axis
and their respective class marks along the horizontal axis. The plotted points are then joined by straight
line segments.

Example: Draw a frequency polygon presenting the following data.

Class boundaries   Class mark   Frequency   Cumulative frequency   Cumulative frequency
                                            (less than type)       (more than type)
5.5 – 11.5         8.5          2           2                      20
11.5 – 17.5        14.5         2           4                      18
17.5 – 23.5        20.5         7           11                     16
23.5 – 29.5        26.5         4           15                     9
29.5 – 35.5        32.5         3           18                     5
35.5 – 41.5        38.5         2           20                     2

[Figure: frequency polygon of the data above. Horizontal axis: class marks (8.5, 14.5, 20.5, 26.5, 32.5, 38.5); vertical axis: frequency.]

3. Cumulative Frequency Polygon (Ogive)

Cumulative frequency polygon can be traced on less than or more than cumulative frequency basis. Place
the class boundaries along the horizontal axis and the corresponding cumulative frequencies (either less
than or more than cumulative frequencies) along the vertical axis. Then join the cross points by a free
hand curve.

Example: the data in the above example can be presented using either a less than or a more than
cumulative frequency polygon as given below (i) and (ii) respectively.

(i) Less than type cumulative frequency curve

[Figure: less than type cumulative frequency curve. Horizontal axis: upper class boundaries (11.5 to 41.5); vertical axis: less than type cumulative frequencies.]

(ii) More than type cumulative frequency curve

[Figure: more than type cumulative frequency curve. Horizontal axis: lower class boundaries (5.5 to 35.5); vertical axis: more than type cumulative frequencies.]

4. Line graph

Data from a frequency table can be graphically pictured by a line graph which plots the
successive values on the horizontal axis and indicates the corresponding frequency by the height
of a vertical line. This method of data presentation is especially suitable for discrete data. For
instance, data on the number of family members, the number of car accidents, the number of defective items
produced by machines, etc. can be presented well using a line graph.

Example: The following data are on the number of seeds germinated out of six seeds planted in
each of 50 pots.
1 1 1 2 6 3 3 4 2 4 3 2 1 5 2 1 3 6 2 2 3 1 1 4 3
2 2 2 2 3 0 3 1 2 1 2 3 1 1 3 3 2 1 2 1 1 3 1 5 1

Construct a line graph for this data.

2.3.2.2 Graphs for qualitative data

1. Bar-charts
i) Simple bar charts: are diagrammatic representation of data in which the data are
represented by series of vertical or horizontal bars, the height (or length) of each bar
indicating the size of the figure represented.

Example: Draw a bar chart for the following coffee production data.

Table: Coffee productions from 1990 to 1995.

Production year 1990 1991 1992 1993 1994 1995


Amounts of coffee (in 1000 tons) 50 75 92 64 100 120

[Figure: simple bar chart. Horizontal axis: production year (1990 to 1995); vertical axis: amount of coffee in 1000 tons.]

ii) Component bar charts: are like ordinary bar charts except that the bars are subdivided
into two or more component parts. It is used to represent total figure in terms of
components. The components are proportional in size to the component parts of the
total quantity being represented by each bar.
a. Actual component bar charts: are charts in which the overall height of the bar and the
individual component lengths represent actual figures.
Example: Draw an actual component bar chart for the following data on production of coffee
(in 1000 tons).
Table: Coffee productions from 1991 to 1993 by region.
Production year                 1991   1992   1993
Amount of coffee   Region A     80     85     90
(in 1000 tons)     Region B     120    165    120
Total                           200    250    210

[Figure: actual component bar chart, with each bar split into Region A and Region B. Horizontal axis: production year (1991 to 1993); vertical axis: amount of coffee in 1000 tons.]

b. Percentage component bar charts: are charts in which the individual component
lengths represent the percentage forms of the overall total. Note that a series of such bars
will all be of the same total height, i.e. 100 percent.

Example: Draw a percentage component bar chart for the above data on production of coffee (in
1000 tons).

Solution: First convert the component figures into percentage forms of their corresponding totals
to get the following result.
Table: Coffee productions from 1991 to 1993 by region.

Production year                 1991   1992   1993
Amount of coffee   Region A     40     34     42.9
(in percent)       Region B     60     66     57.1
Total                           100    100    100

[Figure: percentage component bar chart, with each bar split into Region A and Region B. Horizontal axis: production year (1991 to 1993); vertical axis: amount of coffee in percent.]

iii) Multiple bar charts: are charts in which figures are shown as separate bars adjoining
each other. The height of each bar represents the actual value of the component
figures.

Example: Draw a multiple bar chart for the data on production of coffee.

[Figure: multiple bar chart, with separate adjoining bars for Region A and Region B. Horizontal axis: production year (1991 to 1993); vertical axis: amount of coffee in 1000 tons.]
2. Pie-chart
Is a circle divided by radial lines into sections or sectors so that the area of each sector is
proportional to the size of the figure represented.
Pie-chart construction:
- Calculate the percentage frequency of each component. It is given by $\frac{f_i}{n} \times 100$, where $f_i$ is the figure for the $i$th component and $n$ is the total.
- Calculate the degree measure of each sector. It is given by $\frac{f_i}{n} \times 360^\circ$.
- Draw the circle using protractor and compass.

Example: Draw a pie-chart to represent the following data on a certain family expenditure.

Table: Data on a certain family expenditure.


Item                     Food    Clothing   House rent   Fuel & light   Miscellaneous   Total
Expenditure (in birr)    50      30         20           15             35              150
Percentage frequencies   33.33   20         13.33        10             23.33           100
Angles of the sector     120°    72°        48°          36°            84°             360°

[Figure: pie chart of the family expenditure. Legend: Food, Clothing, House rent, Fuel and light, Miscellaneous.]
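
The percentage and angle calculations for the pie chart can be checked with a short Python sketch. The dictionary name `expenditure` is only illustrative.

```python
# Family expenditure in birr, taken from the table above
expenditure = {
    "Food": 50,
    "Clothing": 30,
    "House rent": 20,
    "Fuel & light": 15,
    "Miscellaneous": 35,
}

total = sum(expenditure.values())  # n = 150

# Percentage frequency (f_i / n * 100) and sector angle (f_i / n * 360 degrees)
percentages = {item: 100 * f / total for item, f in expenditure.items()}
angles = {item: 360 * f / total for item, f in expenditure.items()}

for item in expenditure:
    print(f"{item}: {percentages[item]:.2f}%  {angles[item]:.0f} degrees")
```

The sector angles should always add up to 360 degrees, which is a quick check on the arithmetic.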

Activity 2.4

1. The following data are the blood types of 50 volunteers at a blood plasma donation clinic:

O A O AB A A O O B A O A AB B O O O A B A A O A A B O B A O AB
A O O A B A A A O B O O A O A B O AB A O

a. Organize this data using a categorical frequency distribution


b. Present the data using both a pie and a bar chart.

2. The following table gives the number of deaths in a certain country in 1987 due to accidents for
individuals in various classifications.

Classification Number of deaths

Pedestrians 1699

Bicyclists 280

Motorcyclists 650

Automobile drivers 1327

Represent the data using both a bar chart and a pie chart. Which of the charts is more
informative?

3. Pictogram
Is a device used to represent data by means of pictures or small symbols. It is customary to
represent a unique value of the data by standard symbol or a picture and the whole quantity by an
appropriate number of repetitions of the symbol assumed. The symbol should be simple and
clear for understanding.

Example: The following table shows the orange production in a plantation from production year
1990-1993. Represent the data by a pictogram.

Table: Orange productions from 1990 to 1993.

Production year 1990 1991 1992 1993


Amount (in kg) 3000 3850 3500 5000

Chapter Three

MEASURES OF CENTRAL TENDENCY

3.1 Objectives of Measuring Central Tendency

A single value that describes the characteristics of the entire mass of data is called a measure of
central tendency or an average.

Objectives of measuring central tendency are:

- To get a single value that represents (describes) the characteristics of the entire data
- To summarize/reduce the volume of the data
- To facilitate comparison within one group or between groups of data
- To enable further statistical analysis

Desirable properties of a measure of central tendency

We say a measure of central tendency is best if it possesses most of the following properties. It should:

- be simple to understand and easy to calculate/interpret,

- exist and be unique,

- be rigidly defined by a mathematical formula,

- be based on all observations,

- not be seriously affected by extreme observations,

- be capable of further statistical analysis and/or algebraic manipulation.

3.2 The Summation Notation (∑)


Let a data set consist of a number of observations, represented by $x_1, x_2, \ldots, x_n$, where $n$ (the last
subscript) denotes the number of observations in the data and $x_i$ is the $i$th observation. Then the
sum $x_1 + x_2 + \cdots + x_n$ is denoted by $\sum_{i=1}^{n} x_i$.

For instance a data set consisting of six measurements 21, 13, 54, 46, 32 and 37 is represented by
$x_1, x_2, x_3, x_4, x_5$ and $x_6$ where $x_1 = 21$, $x_2 = 13$, $x_3 = 54$, $x_4 = 46$, $x_5 = 32$ and $x_6 = 37$.

Their sum becomes $\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203$.

Similarly, $x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$.

Some Properties of the Summation Notation

1. $\sum_{i=1}^{n} c = n \cdot c$, where $c$ is a constant number.
2. $\sum_{i=1}^{n} b\,x_i = b \sum_{i=1}^{n} x_i$, where $b$ is a constant number.
3. $\sum_{i=1}^{n} (a + b\,x_i) = n \cdot a + b \sum_{i=1}^{n} x_i$, where $a$ and $b$ are constant numbers.
4. $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
5. $\sum_{i=1}^{n} x_i y_i \neq \sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i$ (in general)

Example:

Let $\sum_{i=1}^{12} x_i = 26$, $\sum_{i=1}^{12} y_i = 17$, $\sum_{i=1}^{12} x_i^2 = 484$ and $\sum_{i=1}^{12} y_i^2 = 362$.

Find I) $\sum_{i=1}^{12} (4x_i + 3y_i)$, II) $\sum_{i=1}^{12} 2x_i(x_i - 7)$

Solution: I) $\sum_{i=1}^{12} (4x_i + 3y_i) = 4\sum_{i=1}^{12} x_i + 3\sum_{i=1}^{12} y_i = 4(26) + 3(17) = 155$

II) $\sum_{i=1}^{12} 2x_i(x_i - 7) = 2\sum_{i=1}^{12} x_i^2 - 14\sum_{i=1}^{12} x_i = 2(484) - 14(26) = 604$
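
The summation rules can be checked numerically. The sketch below uses the six measurements from the text together with a made-up second series `y` and arbitrary constants, all of which are illustrative assumptions. It then evaluates the worked example from the totals alone, assuming part I is the sum of $4x_i + 3y_i$.

```python
# Six measurements from the text, plus a hypothetical second series y
x = [21, 13, 54, 46, 32, 37]
y = [1, 2, 3, 4, 5, 6]          # made-up values, used only to check the rules
n, a, b, c = len(x), 3, 2, 5    # arbitrary constants

assert sum(c for _ in x) == n * c                                # property 1
assert sum(b * xi for xi in x) == b * sum(x)                     # property 2
assert sum(a + b * xi for xi in x) == n * a + b * sum(x)         # property 3
assert sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)   # property 4
assert sum(xi * yi for xi, yi in zip(x, y)) != sum(x) * sum(y)   # property 5

# Worked example: only the totals are known, so the rules do the work
sum_x, sum_y, sum_x2 = 26, 17, 484
part_i = 4 * sum_x + 3 * sum_y      # assumes the expression is sum(4*x_i + 3*y_i)
part_ii = 2 * sum_x2 - 14 * sum_x   # sum(2*x_i*(x_i - 7)) = 2*sum(x_i^2) - 14*sum(x_i)
print(part_i, part_ii)              # 155 604
```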

3.3 Types of Measures of Central Tendency

Several types of averages or measures of central tendency can be defined. The most common are

- the mean
- the mode
- the median
3.3.1. The Mean

There are four types of means: Arithmetic mean, Weighted arithmetic mean, Harmonic mean and
Geometric mean.

Arithmetic mean is defined as the sum of the measurements of the items divided by the total
number of items.

Arithmetic Mean for Ungrouped Frequency Distribution

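When the data are arranged or given in the form of an ungrouped frequency distribution, in which the
values $x_1, x_2, \ldots, x_k$ occur with frequencies $f_1, f_2, \ldots, f_k$, the formula for the mean is

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

For raw data without frequencies this reduces to $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.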

Example 1: You measure the body lengths (in inches) of 10 full-term infants at birth and record
the following:

17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75

Compute the mean length of the infants for these data.

Example 2: Monthly incomes of fourth year regular students are given in the following
frequency distribution.

Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5

Number of students 6 9 15 25 13 7 5

Compute the mean for these data.
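
Both examples can be worked through in Python as a check on hand calculation. This is a minimal sketch with illustrative variable names; Example 1 uses the plain mean and Example 2 weights each income by its frequency.

```python
# Example 1: body lengths (in inches) of 10 infants
lengths = [17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75]
mean_length = sum(lengths) / len(lengths)

# Example 2: monthly incomes (birr) with the number of students at each income
incomes = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5]
students = [6, 9, 15, 25, 13, 7, 5]
mean_income = sum(f * x for x, f in zip(incomes, students)) / sum(students)

print(mean_length)   # 18.075
print(mean_income)   # 83.375
```

The value 10.75 in Example 1 lies far below the other lengths and pulls the mean down, which illustrates how sensitive the mean is to extreme observations.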

Arithmetic Mean for Grouped Frequency Distribution

If data are given in the form of a continuous frequency distribution, the sample mean can be
computed as

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k}$$

where $m_i$ is the class mark of the $i$th class, $i = 1, 2, \ldots, k$;

$f_i$ = the frequency of the $i$th class and $k$ = the number of classes.

Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.

Example: The following table gives the daily wages of laborers. Calculate the average daily
wages paid to a laborer.

Wages in birr 11-13 13-15 15-17 17-19 19-21 21-23 23-25

Number of laborers 3 4 5 6 6 4 3
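
A minimal Python sketch of the grouped-mean calculation for the wage data follows. The variable names are illustrative, and each class mark is taken as the midpoint of its class.

```python
# Daily wage classes (birr) and the number of laborers in each class
classes = [(11, 13), (13, 15), (15, 17), (17, 19), (19, 21), (21, 23), (23, 25)]
freqs = [3, 4, 5, 6, 6, 4, 3]

# Class mark m_i = midpoint of the i-th class
marks = [(lower + upper) / 2 for lower, upper in classes]

n = sum(freqs)                                        # total number of laborers
total = sum(f * m for f, m in zip(freqs, marks))      # sum of f_i * m_i
mean_wage = total / n

print(n, total, round(mean_wage, 2))   # 31 560.0 18.06
```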

Properties of the Arithmetic Mean

- The sum of the deviations of the items from their arithmetic mean is zero. This means, the
algebraic sum of the deviations of a set of numbers $x_1, x_2, \ldots, x_n$ from their mean $\bar{x}$ is zero.
That is, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.

- The sum of the squares of the deviations of a set of observations from any number, say A, is
minimum when $A = \bar{x}$. That is, $\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - A)^2$ for any number A.

- When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of $n_1$ observations of
group 1, $\bar{x}_2$ is the mean of $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of $n_k$ observations
of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken together is
given by

$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

- If a wrong figure has been used in calculating the mean, we can correct it if we know the
correct figure that should have been used. Let
  $x_{wr}$ denote the wrong figure used in calculating the mean,
  $x_{cr}$ be the correct figure that should have been used, and
  $\bar{x}_{wr}$ be the wrong mean calculated using $x_{wr}$; then the correct mean, $\bar{x}_{cr}$, is given by

$$\bar{x}_{cr} = \frac{n\,\bar{x}_{wr} - x_{wr} + x_{cr}}{n}$$

- If the mean of $x_1, x_2, \ldots, x_n$ is $\bar{x}$, then

a) the mean of $x_1 \pm k, x_2 \pm k, \ldots, x_n \pm k$ will be $\bar{x} \pm k$
b) the mean of $kx_1, kx_2, \ldots, kx_n$ will be $k\bar{x}$.
Example 1: Last year there were three sections taking Stat 273 course in Alemaya University. At
the end of the semester, the three sections got average marks of 80, 83 and 76. There were 28, 32
and 35 students in each section respectively. Find the mean mark for the entire students.

Solution:

$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \frac{28(80) + 32(83) + 35(76)}{28 + 32 + 35} = \frac{7556}{95} = 79.54$$

Example 2: The average weight of 10 students was calculated to be 65 kg, but later it was
discovered that one measurement was misread as 40 kg instead of 80 kg. Calculate the corrected
average weight.

Solution:
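
The corrected mean can be worked out by removing the wrong figure from the total and adding the correct one. Below is a minimal Python sketch; the function names `combined_mean` and `corrected_mean` are illustrative and not from the handout. It also rechecks the combined mean of Example 1.

```python
def combined_mean(sizes, means):
    """Mean of all observations, given each group's size and mean."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

def corrected_mean(n, wrong_mean, wrong_value, correct_value):
    """Replace one misrecorded value in the total and recompute the mean."""
    return (n * wrong_mean - wrong_value + correct_value) / n

# Example 1: three sections with 28, 32 and 35 students
combined = combined_mean([28, 32, 35], [80, 83, 76])

# Example 2: 10 students, mean 65 kg, one value read as 40 kg instead of 80 kg
corrected = corrected_mean(10, 65, 40, 80)

print(round(combined, 2))   # 79.54
print(corrected)            # 69.0
```

So the corrected average weight is (10 × 65 − 40 + 80)/10 = 69 kg.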

Exercise: The average score on the mid-term examination of 25 students was 75.8 out of 100.
After the mid-term exam, however, a student whose score was 41 out of 100 dropped the course.
What is the average/mean score among the 24 students?

Weighted Arithmetic Mean

In finding the arithmetic mean, all items were assumed to be of equal importance (each value in
the data set has equal weight). When the observations have different weights, we use a weighted
average. Weights are assigned to each item in proportion to its relative importance.

If $x_1, x_2, \ldots, x_k$ represent values of the items and $w_1, w_2, \ldots, w_k$ are the corresponding weights, then
the weighted mean, $\bar{x}_w$, is given by

$$\bar{x}_w = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k}$$
Example: A student’s final mark in Mathematics, Physics, Chemistry and Biology are
respectively 82, 80, 90 and 70.If the respective credits received for these courses are 3, 5, 3 and
1, determine the approximate average mark the student has got for one course.

Solution: We use a weighted arithmetic mean, weight associated with each course being taken as
the number of credits received for the corresponding course.

$x_i$   82   80   90   70

$w_i$   3    5    3    1

Therefore $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 82) + (5 \times 80) + (3 \times 90) + (1 \times 70)}{3 + 5 + 3 + 1} = \frac{986}{12} = 82.17$

Average mark of the student for one course is approximately 82.
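
The same weighted mean can be reproduced in Python. This is a minimal sketch with illustrative names, using the credits as weights as in the solution above.

```python
# Marks in Mathematics, Physics, Chemistry and Biology, with their credits
marks = [82, 80, 90, 70]
credits = [3, 5, 3, 1]

# Weighted mean: sum of (weight * value) divided by the sum of weights
weighted_mean = sum(w * x for x, w in zip(marks, credits)) / sum(credits)
simple_mean = sum(marks) / len(marks)

print(round(weighted_mean, 2))   # 82.17
print(simple_mean)               # 80.5
```

The unweighted mean (80.5) differs from the weighted one because it treats the one-credit course as being as important as the five-credit course.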

Exercise: If a student gets A in 4 cr. hrs, B in 3 cr. hrs and D in 2 cr. hrs courses, what is his
GPA in this semester?

Values 4 3 1

Weight 4 3 2

Merits of Arithmetic Mean

- Arithmetic mean has a rigidly defined mathematical formula so that its value is always
definite.
- It is calculated based on all observations.
- Arithmetic mean is simple to calculate and easy to understand.
- It doesn’t need arrangement of data in increasing or decreasing order.
- Arithmetic mean is also capable of further algebraic treatment.
- It affords a good standard of comparison.

Drawbacks of Arithmetic Mean

- It is highly affected by extreme (abnormal) values in the series.


- It can be a number which does not exist in the series.
- It sometimes gives results which appear almost absurd. For example, we can get an average
of ‘3.6 children’ per family.
- It can’t be calculated for open-ended classes.

Geometric Mean: It is used when observed values are measured as ratios, percentages,
proportions, indices or growth rates.

$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

If the observed values have frequencies $f_1, f_2, \ldots, f_k$ with $n = \sum f_i$, then

$$GM = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}}$$

Example: compute the geometric mean of the following values: 2, 8, 6, 4, 10, 6, 8, 4

Solution:
Values 2 4 6 8 10 Total

Frequencies 1 2 2 2 1 8

$$GM = \sqrt[8]{2 \times 4^2 \times 6^2 \times 8^2 \times 10} = 5.41$$
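
A short Python sketch of the same geometric mean follows. It works through logarithms, a common way to avoid very large products, and is offered as an illustration rather than the handout's own method.

```python
import math

values = [2, 8, 6, 4, 10, 6, 8, 4]
n = len(values)

# Direct form: n-th root of the product of the values
gm_direct = math.prod(values) ** (1 / n)

# Equivalent form: exponential of the mean of the logarithms
gm_logs = math.exp(sum(math.log(v) for v in values) / n)

print(round(gm_direct, 2), round(gm_logs, 2))   # 5.41 5.41
```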
Harmonic Mean: is a suitable measure of central tendency when the data pertain to speed, rate and
time.

$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \cdots + \frac{1}{x_n}}$$

If the data are arranged in the form of a frequency distribution, then

$$HM = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}} = \frac{f_1 + \cdots + f_k}{\frac{f_1}{x_1} + \cdots + \frac{f_k}{x_k}}$$

Example: A motorist travels 480 km on each of 3 days. She travels for 10 hours at a rate of 48 km/hr on the 1st day,
for 12 hours at a rate of 40 km/hr on the 2nd day and for 15 hours at a rate of 32 km/hr on the 3rd day.
What is her average speed?

$$HM = \frac{3}{\frac{1}{48} + \frac{1}{40} + \frac{1}{32}} = \frac{3}{37/480} = 38.92 \text{ km/hr}$$
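
The harmonic mean is the right average here because the same distance is covered each day. The sketch below computes it in Python and cross-checks it against total distance divided by total time; the variable names are illustrative.

```python
speeds = [48, 40, 32]   # km/hr on days 1, 2 and 3
hours = [10, 12, 15]    # hours driven on each day

# Harmonic mean of the three speeds
hm = len(speeds) / sum(1 / s for s in speeds)

# Cross-check: average speed = total distance / total time
total_distance = sum(s * h for s, h in zip(speeds, hours))   # 480 km per day
total_time = sum(hours)
average_speed = total_distance / total_time

print(round(hm, 2), round(average_speed, 2))   # 38.92 38.92
```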

Relations among different means

1. $\bar{x} \ge GM \ge HM$

2. For two observations, $\bar{x} \times HM = GM^2$

3. $\bar{x} = GM = HM$ if all observations have equal magnitude

3.3.2 The Median

The median of a set of items (numbers) arranged in order of magnitude (i.e. in an array form) is the
middle value or the arithmetic mean of the two middle values. We shall denote the median of
$x_1, x_2, \ldots, x_n$ by $\tilde{x}$. For ungrouped data, with $x_{(i)}$ denoting the $i$th value in the ordered array, the median is obtained by

$$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if the number of items, } n, \text{ is odd} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & \text{if the number of items, } n, \text{ is even} \end{cases}$$

For grouped data the median, obtained by interpolation method, is given by

$$\tilde{x} = L_{med} + \left(\frac{\frac{n}{2} - F}{f_{med}}\right) W$$

where $L_{med}$ = lower class boundary of the median class,

$F$ = sum of frequencies of all classes lower than the median class (in other words it is the
cumulative frequency immediately preceding the median class),

$f_{med}$ = frequency of the median class and $W$ = the class width.

The median class is the class with the smallest cumulative frequency greater than or equal to $\frac{n}{2}$.
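Example 1: The birth weights in pounds of five babies born in a hospital on a certain day are 9.2,
6.4, 10.5, 8.1 and 7.8. Find the median weight of these five babies.

Solution: Arranged in increasing order the weights are 6.4, 7.8, 8.1, 9.2 and 10.5. Since n = 5 is odd,
the median is the third value, i.e. 8.1 pounds.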

Example 2: The following table gives the distribution of the weekly wages of employees of a small
firm.

Wages in birr    No. of employees
126 and below    3
127 – 135        5
136 – 144        9
145 – 153        12
154 – 162        5
163 – 171        4
172 and above    2

a) Find the median weekly wage.


b) Why is the median a more suitable measure of central tendency than the mean in
this case?
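
Part (a) can be sketched in Python with the interpolation formula. The class boundaries below (126.5, 135.5, and so on) and the class width of 9 are derived from the class limits; the lower boundary of the open-ended first class is unknown and is written as `None`, which is harmless because the median does not fall in that class.

```python
# Lower class boundaries and frequencies of the weekly-wage distribution
lower_bounds = [None, 126.5, 135.5, 144.5, 153.5, 162.5, 171.5]
freqs = [3, 5, 9, 12, 5, 4, 2]
width = 9                      # class width, e.g. 135.5 - 126.5

n = sum(freqs)                 # 40 employees
cumulative = 0                 # F: cumulative frequency before the current class
median = None
for lower, f in zip(lower_bounds, freqs):
    if cumulative + f >= n / 2:
        # First class whose cumulative frequency reaches n/2 is the median class
        median = lower + (n / 2 - cumulative) * width / f
        break
    cumulative += f

print(median)   # 146.75
```

The open-ended classes prevent the class marks of the first and last class from being determined, so the mean cannot be computed, whereas the median only needs the median class.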

Merits of median

- It is not influenced by extreme values.


- The median is rigidly defined, so that its value is always definite.
- Median can be calculated even in case of open-ended intervals.
- It can be computed for ratio, interval, and ordinal level of data.

Demerits of median
- It is not capable of further algebraic treatment.
- It is not a good representative of the data if the number of items (data) is small.
- The arrangement of items in order of magnitude is sometimes very tedious process if the number
of items is very large.

3.3.3 The Mode

The mode or the modal value is the most frequently occurring score/observation in a series and
denoted by x̂ . Note that the mode may not exist in the series or, even if it does exist, it may not be
unique.

For grouped data, the mode is found by the following formula:

$$\hat{x} = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W$$

where $L_{mod}$ = lower class boundary of the modal class,

$\Delta_1$ = the difference between the frequency of the modal class and the frequency of the class
immediately preceding the modal class,

$\Delta_2$ = the difference between the frequency of the modal class and the frequency of the class
immediately following the modal class,

$W$ = the class width.

The modal class is the class with the highest frequency in the distribution.
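
Both ways of finding a mode can be sketched in Python, using the data of the two examples that follow. The names are illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption made here only so that the code also works when the modal class is the first or last class.

```python
from collections import Counter

# Ungrouped data: marks of ten students (Example 1 below)
marks = [70, 65, 68, 70, 75, 73, 80, 70, 83, 86]
mode_value, mode_count = Counter(marks).most_common(1)[0]

# Grouped data: birth weights in kg (Example 2 below)
lower_bounds = [1.9, 2.3, 2.7, 3.1, 3.5, 3.9]
freqs = [5, 5, 9, 4, 4, 3]
width = 0.4

i = freqs.index(max(freqs))                          # index of the modal class
before = freqs[i - 1] if i > 0 else 0                # frequency of the preceding class
after = freqs[i + 1] if i < len(freqs) - 1 else 0    # frequency of the following class
delta1 = freqs[i] - before
delta2 = freqs[i] - after
grouped_mode = lower_bounds[i] + delta1 / (delta1 + delta2) * width

print(mode_value, mode_count)     # 70 3
print(round(grouped_mode, 3))     # 2.878
```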

Example 1: The marks obtained by ten students in a semester exam in statistics are: 70, 65, 68, 70,
75, 73, 80, 70, 83 and 86. Find the mode of the students’ marks.

Example 2: Find the mode for the frequency distribution of the birth weight (in kilogram) of 30
children given below.

Weight 1.9-2.3 2.3-2.7 2.7-3.1 3.1-3.5 3.5-3.9 3.9-4.3

No. of children 5 5 9 4 4 3

Solution: 2.7-3.1 is the modal class since it has the highest frequency.

$\Delta_1 = 9 - 5 = 4$, $\Delta_2 = 9 - 4 = 5$, $L_{mod} = 2.7$ and $W = 0.4$

$$\hat{x} = 2.7 + \left(\frac{4}{4 + 5}\right) \times 0.4 = 2.878$$

Merits of mode

- Mode is not affected by extreme values.


- Mode can be calculated even in the case of open-end intervals. And it is not necessary to know all
observations.
- It can be computed for all level of data i.e. ratio, interval, ordinal or nominal.

Demerits of mode

- Mode may not exist in the series and if it exists it may not be a unique value.
- It does not fulfill most of the requirements of a good measure of central tendency

3.3.4 Quantiles

Quantiles are values which divide the data set, arranged in order of magnitude, into a certain number of
equal parts. They are averages of position (non-central tendency). Some of these are quartiles, deciles and
percentiles.

I. Quartiles: are values which divide the data set into four equal parts, denoted by $Q_1$, $Q_2$ and $Q_3$. The
first quartile is also called the lower quartile and the third quartile is the upper quartile. The second
quartile is the median.
- For ungrouped data
Let $Q_j$ be the $j$th quartile value for $j = 1, 2, 3$. Then

$$Q_j = \left[\frac{j}{4}(n + 1)\right]^{th} \text{ item}; \quad j = 1, 2, 3.$$

- For grouped data

We can apply the following formula:

$$Q_j = L_{Q_j} + \left(\frac{\frac{j \times n}{4} - F_{Q_j}}{f_{Q_j}}\right) W; \quad j = 1, 2, 3.$$

where $Q_j$ = the $j$th quartile we are going to calculate,

$L_{Q_j}$ = lower class boundary of the $j$th quartile class,

$F_{Q_j}$ = sum of frequencies of all classes lower than the $j$th quartile class,

$f_{Q_j}$ = frequency of the $j$th quartile class and $W$ = class width.

The $j$th quartile class is the class with the smallest cumulative frequency greater than or equal
to $\frac{j \times n}{4}$.

II. Deciles: are values dividing the data into ten equal parts, denoted by $D_1, D_2, \ldots, D_9$. The fifth decile
is the median.
- For ungrouped data
Let $D_j$ be the $j$th decile value for $j = 1, 2, \ldots, 9$. Then

$$D_j = \left[\frac{j}{10}(n + 1)\right]^{th} \text{ item}; \quad j = 1, 2, \ldots, 9$$

- For grouped data

We can apply the following formula:

$$D_j = L_{D_j} + \left(\frac{\frac{j \times n}{10} - F_{D_j}}{f_{D_j}}\right) W; \quad j = 1, 2, \ldots, 9$$

Define the symbols in a similar way as we did in the case of quartiles.

The $j$th decile class is the class with the smallest cumulative frequency greater than or equal
to $\frac{j \times n}{10}$.

III. Percentiles: are values which divide the data into one hundred equal parts, denoted by $P_1, P_2, \ldots, P_{99}$.
The fiftieth percentile is the median.

- For ungrouped data

Let $P_j$ be the $j$th percentile value for $j = 1, 2, 3, \ldots, 99$. Then

$$P_j = \left[\frac{j}{100}(n + 1)\right]^{th} \text{ item}; \quad j = 1, 2, 3, \ldots, 99$$

- For grouped data

We can use the following formula:

$$P_j = L_{P_j} + \left(\frac{\frac{j \times n}{100} - F_{P_j}}{f_{P_j}}\right) W; \quad j = 1, 2, 3, \ldots, 99$$

Define the symbols in a similar way as we did in the case of quartiles.

The $j$th percentile class is the class with the smallest cumulative frequency greater than or equal
to $\frac{j \times n}{100}$.
Interpretations

1. $Q_j$ is the value below which $(j \times 25)$ percent of the observations in the series are found
(where $j = 1, 2, 3$). For instance $Q_3$ means the value below which 75 percent of observations in the
given series are found.
2. $D_j$ is the value below which $(j \times 10)$ percent of the observations in the series are found
(where $j = 1, 2, \ldots, 9$). For instance $D_4$ is the value below which 40 percent of the values are found
in the series.
3. $P_j$ is the value below which $j$ percent of the total observations are found (where $j = 1, 2, 3, \ldots, 99$).
For example 73 percent of the observations in a given series are below $P_{73}$.
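
The grouped-data formulas for quartiles, deciles and percentiles differ only in the divisor (4, 10 or 100), so one function can cover all three. The sketch below is illustrative; the function name `grouped_quantile` and its parameters are not from the handout. It is applied to the travel-time distribution of the automobile workers (class boundaries 15.5 to 51.5, class width 6).

```python
def grouped_quantile(lower_bounds, freqs, width, j, parts):
    """j-th quantile of grouped data split into `parts` equal parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(freqs)
    target = j * n / parts
    cumulative = 0                      # F: frequencies of all lower classes
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= target:    # first class reaching the target
            return lower + (target - cumulative) * width / f
        cumulative += f
    raise ValueError("j must satisfy 1 <= j < parts")

lower_bounds = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5]
freqs = [3, 6, 8, 4, 3, 1]

q1 = grouped_quantile(lower_bounds, freqs, 6, 1, 4)
q2 = grouped_quantile(lower_bounds, freqs, 6, 2, 4)
q3 = grouped_quantile(lower_bounds, freqs, 6, 3, 4)
d5 = grouped_quantile(lower_bounds, freqs, 6, 5, 10)
p50 = grouped_quantile(lower_bounds, freqs, 6, 50, 100)

print(q1, q2, q3)   # 24.75 30.125 36.125
```

As expected, the second quartile, the fifth decile and the fiftieth percentile all give the same value, the median.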
Exercise: The following table presents the male population of a certain region in Ethiopia.

Find a) all quartiles,

b) the 5th and 9th deciles, and

c) the 65th and 75th percentiles.

Age groups (in years) 0 – 5 5 – 10 10 – 15 15 – 20 20 – 25 25 – 30 30 – 35 35 - 40

Male population 2580 3737 4620 5200 7250 620 297 355

Chapter Four

Measures of Dispersion (Variation) and Shape

4.1 Objectives of Measuring Variation

Variation (dispersion) is the scatter or spread of observations (values) in a distribution.

The average or central value is of little use unless the degree of variation, which occurs about it,
is given. If the scatter about the measure of central tendency is very large, the average is not a
typical value. Therefore it is necessary to develop a quantitative measure of the dispersion (or
variation) of the values about the average. Measures of variation are statistical measures, which
provide ways of measuring the extent to which the data are dispersed or spread out.

Measures of variation are needed for the following basic objectives.

- To judge the reliability of a measure of central tendency
- To compare two or more sets of data with regard to their variability
- To control variability itself, as in quality control, body temperature, etc.
- To make further statistical analysis or to facilitate the use of other statistical measures.

Properties of a good measure of dispersion

A good measure of dispersion should:

- be rigidly defined by a mathematical formula,


- be simple to understand and easy to calculate,
- be unique,
- be calculated based on all observations in the series,
- not be affected by some extreme values existing in the series,
- have sampling stability property, and
- be capable of further algebraic treatment as well as further statistical analysis.

4.2 Absolute and Relative Measures of Dispersion

Measures of dispersion/variation may be either absolute or relative. Absolute measures of
dispersion are expressed in the same unit of measurement in which the original data are given.
These values may be used to compare the variation in two distributions provided that the
variables are in the same units and of the same average size.

In case the two sets of data are expressed in different units, however, such as quintals of sugar
versus tonnes of sugarcane, or if the average sizes are very different, such as managers' salaries
versus workers' salaries, the absolute measures of dispersion are not comparable. In such cases
measures of relative dispersion should be used.

A measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate
measure of central tendency. It is sometimes called a coefficient of dispersion because the word
“coefficient” represents a pure number (that is, independent of any unit of measurement). It
should be noted that while computing the relative dispersion, the average (the measure of central
tendency) used as a base should be the same one from which the absolute deviations were
measured. Note also that the value of a relative dispersion is a unitless quantity.

4.3 Types of Measures of Dispersion


4.3.1 The Range and Relative Range

Range (R) is defined as the difference between the largest and the smallest observation in a given
set of data. That is, $R = x_{max} - x_{min}$ where $x_{max}$ and $x_{min}$ are the largest and the smallest
observations in the series respectively.

In the case of grouped data, range is found by taking the difference between the class mark of the last
class and that of the first class. That is, $R = M_{last} - M_{first}$ where $M_{last}$ and $M_{first}$ are the class
marks of the last class and that of the first class respectively.

A relative range (RR), also known as coefficient of range, is given by

$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{R}{x_{max} + x_{min}} \quad \text{for ungrouped data}$$

$$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{R}{M_{last} + M_{first}} \quad \text{for grouped data}$$

Properties of Range and Relative Range

- Range and relative range are easy to calculate and simple to understand.
- Both cannot be computed for grouped data with open ended classes.
- They do not tell us anything about the distribution of values in the series.

Example 1: Find the range and relative range for the monthly salary of ten workers in a certain
paint factory given below.

462 480 534 624 498 552 606 588 516 570

Solution:

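$x_{\max} = 624$ birr, $x_{\min} = 462$ birr

$$R = x_{\max} - x_{\min} = 624 \text{ birr} - 462 \text{ birr} = 162 \text{ birr}$$

$$RR = \frac{x_{\max} - x_{\min}}{x_{\max} + x_{\min}} = \frac{624 \text{ birr} - 462 \text{ birr}}{624 \text{ birr} + 462 \text{ birr}} = \frac{162 \text{ birr}}{1086 \text{ birr}} = 0.149$$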
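
The calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch; the function name `range_and_relative_range` is an illustrative choice, not part of the handout.

```python
def range_and_relative_range(values):
    """Return (R, RR) for ungrouped data.

    R  = largest - smallest
    RR = R / (largest + smallest)
    """
    largest, smallest = max(values), min(values)
    r = largest - smallest
    rr = r / (largest + smallest)
    return r, rr


# Monthly salaries (birr) of the ten workers in Example 1
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]
r, rr = range_and_relative_range(salaries)
print(r, round(rr, 3))
```

Because RR is a ratio of two quantities in birr, the result carries no unit, which is what makes it comparable across data sets.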

Example 2: Find the range and relative range for the following frequency distribution, which
shows the distribution of the maximum loads supported by a certain number of cables.

Maximum load (in kilo-Newton)    Number of cables
93 – 97                           2
98 – 102                          5
103 – 107                        12
108 – 112                        17
113 – 117                        14
118 – 122                         6
123 – 127                         3
128 – 132                         1

Solution:

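$M_{\text{first}} = 95$ kN, $M_{\text{last}} = 130$ kN

$$R = M_{\text{last}} - M_{\text{first}} = 130 \text{ kN} - 95 \text{ kN} = 35 \text{ kN}$$

$$RR = \frac{M_{\text{last}} - M_{\text{first}}}{M_{\text{last}} + M_{\text{first}}} = \frac{130 \text{ kN} - 95 \text{ kN}}{130 \text{ kN} + 95 \text{ kN}} = \frac{35 \text{ kN}}{225 \text{ kN}} = 0.156$$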

4.3.2 The Mean Deviation and Coefficient of Mean Deviation

The mean deviation (MD) measures the average deviation of a set of observations about their
central value, generally the mean or the median, ignoring the plus/minus sign of the deviations.

The mean deviation of a sample of n observations $x_1, x_2, \ldots, x_n$ is given as

$$MD = \frac{\sum |x_i - A|}{n}$$

where A is a central measure (the mean or the median).

For grouped data, the formula for MD becomes

$$MD = \frac{\sum f_i\,|m_i - A|}{n}$$

where $m_i$ is the class mark of the i-th class, $f_i$ is the frequency of the i-th class and $n = \sum f_i$.

- The mean deviation about the arithmetic mean is, therefore, given by

$$MD = \frac{\sum |x_i - \bar{x}|}{n} \quad \text{for ungrouped data}$$

$$MD = \frac{\sum f_i\,|m_i - \bar{x}|}{n} \quad \text{for a grouped frequency distribution}$$

- The mean deviation about the median is given by

$$MD = \frac{\sum |x_i - \tilde{x}|}{n} \quad \text{for ungrouped data}$$

$$MD = \frac{\sum f_i\,|m_i - \tilde{x}|}{n} \quad \text{for a grouped frequency distribution}$$

In the grouped formulas, $m_i$ is the class mark of the i-th class, $f_i$ is the frequency of the i-th
class and $n = \sum f_i$.

The coefficient of mean deviation (CMD) is the ratio of the mean deviation of the observations to
their appropriate measure of central tendency: the arithmetic mean or the median.

In general,

$$CMD = \frac{MD}{A}$$

where A is a measure of central tendency: the arithmetic mean or the median.

That is, the CMD about the arithmetic mean is given by $CMD = \dfrac{MD}{\bar{x}}$, where MD is the mean
deviation calculated about the arithmetic mean. On the other hand, the CMD about the median is
given by $CMD = \dfrac{MD}{\tilde{x}}$, in which case MD is calculated about the median of the observations.

Properties of Mean Deviation and coefficient of mean deviation

- It is easy to understand and compute.
- It is based on all observations.
- It is not affected very much by extreme values.
- It is not capable of further mathematical treatment and it is not a very accurate measure
of dispersion.

4.3.3 The Variance, the Standard Deviation and Coefficient of Variation

The Variance

Variance is the arithmetic mean of the squared deviations of the observations from their
arithmetic mean.

- Population Variance ($\sigma^2$)

For ungrouped data

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}\right]$$

where $\mu$ is the population arithmetic mean and N is the total number of observations in the population.

For grouped data

$$\sigma^2 = \frac{\sum f_i (m_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{N}\right]$$

where $\mu$ is the population arithmetic mean, $m_i$ is the class mark of the i-th class, $f_i$ is the
frequency of the i-th class and $N = \sum f_i$.

- Sample Variance ($S^2$)

For ungrouped data

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right]$$

where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.

For grouped data

$$S^2 = \frac{\sum f_i (m_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{n}\right]$$

where $\bar{x}$ is the sample arithmetic mean, $m_i$ is the class mark of the i-th class, $f_i$ is the
frequency of the i-th class and $n = \sum f_i$.

The Standard Deviation

Standard deviation is the positive square root of the variance.

- Population Standard Deviation ($\sigma$)

$\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.

- Sample Standard Deviation ($S$)

$S = \sqrt{S^2}$, where $S^2$ is the sample variance.

Example 1: Compute the variance for the following data.

value (x_i)             3         6         9        12        15      total
frequency (f_i)         1         4        10         3         2         20
f_i x_i                 3        24        90        36        30        183
x_i - x̄             -6.15     -3.15     -0.15      2.85      5.85
(x_i - x̄)^2        37.8225    9.9225    0.0225    8.1225   34.2225
f_i (x_i - x̄)^2    37.8225     39.69     0.225   24.3675    68.445     170.55

$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{183}{20} = 9.15, \quad \text{where } n = \sum_{i=1}^{5} f_i = 20$$

and

$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{170.55}{19} = 8.976$$
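
As a check on Example 1, the sketch below computes the sample variance twice: once from the definition (squared deviations from the mean) and once from the computational shortcut that uses the sums of f·x and f·x². The variable names are illustrative only.

```python
values = [3, 6, 9, 12, 15]
freqs = [1, 4, 10, 3, 2]

n = sum(freqs)                                        # total frequency
sum_fx = sum(f * x for f, x in zip(freqs, values))    # sum of f_i * x_i
sum_fx2 = sum(f * x * x for f, x in zip(freqs, values))
mean = sum_fx / n

# Definition: weighted squared deviations divided by n - 1
var_definition = sum(f * (x - mean) ** 2 for f, x in zip(freqs, values)) / (n - 1)

# Shortcut: (sum f x^2 - (sum f x)^2 / n) / (n - 1)
var_shortcut = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

print(mean, round(var_definition, 3), round(var_shortcut, 3))
```

Both routes give the same value up to floating-point rounding; the shortcut is convenient for hand calculation because it avoids tabulating each deviation.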

Coefficient of Variation

The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).

The coefficient of variation is used when we want to compare the variability of two or more
different series. It is the ratio of the standard deviation to the arithmetic mean, usually
expressed in percent:

$$CV = \frac{S}{\bar{x}} \times 100$$

where S is the standard deviation of the observations and $\bar{x}$ is their arithmetic mean.

A distribution with a smaller coefficient of variation is said to be less variable, or more
consistent, more uniform or more homogeneous.

Example: Last semester, the students of Biology and Chemistry Departments took Stat 273
course. At the end of the semester, the following information was recorded.

Department              Biology    Chemistry
Mean score                 79          64
Standard deviation         23          11

Compare the relative dispersions of the two departments’ scores using an appropriate measure.

Solution:

Biology Department: $CV = \dfrac{S}{\bar{x}} \times 100 = \dfrac{23}{79} \times 100 = 29.11\%$

Chemistry Department: $CV = \dfrac{S}{\bar{x}} \times 100 = \dfrac{11}{64} \times 100 = 17.19\%$

Interpretation: Since the CV of Biology Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the
distribution of Biology students’ scores compared with that of Chemistry students.

Example: The following table shows the frequency distribution of the masses of 100 male
students in Gander University.

Mass (kg)            60-62   63-65   66-68   69-71   72-74
No. of students         5      18      42      27       8

Find: a) the variance b) the standard deviation c) the coefficient of variation
d) the mean deviation

Solution:

Mass (kg)              60-62    63-65    66-68    69-71    72-74    Total
No. of students (f_i)      5       18       42       27        8      100
Class mark (m_i)          61       64       67       70       73
f_i m_i                  305     1152     2814     1890      584     6745
f_i m_i^2              18605    73728   188538   132300    42632   455803
|m_i - x̄|              6.45     3.45     0.45     2.55     5.55
f_i |m_i - x̄|         32.25     62.1     18.9    68.85     44.4    226.5

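$$\sum_{i=1}^{5} f_i m_i = 6745, \quad \sum_{i=1}^{5} f_i m_i^2 = 455803, \quad n = \sum_{i=1}^{5} f_i = 100$$

and

$$\bar{x} = \frac{\sum_{i=1}^{5} f_i m_i}{n} = \frac{6745}{100} = 67.45$$

a) $S^2 = \dfrac{1}{n - 1}\left[\sum f_i m_i^2 - \dfrac{\left(\sum f_i m_i\right)^2}{n}\right] = \dfrac{1}{99}\left[455803 - \dfrac{(6745)^2}{100}\right] = 8.61$

b) $S = \sqrt{S^2} = \sqrt{8.61} = 2.93$

c) $CV = \dfrac{S}{\bar{x}} \times 100 = \dfrac{2.93}{67.45} \times 100 = 4.344\%$

d) $MD = \dfrac{\sum f_i\,|m_i - \bar{x}|}{n} = \dfrac{226.5}{100} = 2.265$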
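
The four quantities asked for in this example can be computed together from the class marks and frequencies. The following is a sketch under the handout's definitions (sample variance with divisor n − 1, mean deviation about the mean with divisor n); the function name `grouped_summary` is hypothetical.

```python
import math


def grouped_summary(class_marks, freqs):
    """Mean, sample variance, standard deviation, CV (%) and mean deviation
    about the mean for a grouped frequency distribution."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, class_marks)) / n
    variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, class_marks)) / (n - 1)
    sd = math.sqrt(variance)
    cv = sd / mean * 100
    md = sum(f * abs(m - mean) for f, m in zip(freqs, class_marks)) / n
    return {"mean": mean, "variance": variance, "sd": sd, "cv": cv, "md": md}


# Class marks and frequencies from the mass distribution above
summary = grouped_summary([61, 64, 67, 70, 73], [5, 18, 42, 27, 8])
for name, value in summary.items():
    print(name, round(value, 3))
```

The CV printed here may differ slightly in the later decimals from the hand calculation, because the hand calculation rounds S to 2.93 before dividing by the mean.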

Properties of the Variance and the Standard Deviation

Variance
– It removes most of the demerits or drawbacks of the measures of dispersion discussed so far.
– Its unit is the square of the unit of measurement of values. For example, if the variable is
measured in kg, the unit of variance is kg2.
– It is calculated based on all the observations/data in the series.
– It gives more weight to extreme values and less to those which are near to the mean.
Standard Deviation

– It is considered to be the best measure of dispersion.
– [Demerit] If the values of two series have different units of measurement, we cannot
compare their variability just by comparing the values of their respective standard deviations.
– It is calculated based on all the observations/data in the series.
– It is capable of further algebraic treatment.
– It is neither as easy to calculate nor as easy to understand as the range or the mean deviation.
– Like the variance, the standard deviation gives more weight to extreme values and less to
those which are near to the mean.

The Standard Scores (Z-Scores)

A standard score is a measure that describes the relative position of a single score in the entire
distribution of scores in terms of the mean and standard deviation. It gives the number of
standard deviations by which a particular observation lies above or below the mean.

Population standard score: $Z = \dfrac{x - \mu}{\sigma}$, where x is the value of the observation, and $\mu$ and $\sigma$ are the
mean and standard deviation of the population respectively.

Sample standard score: $Z = \dfrac{x - \bar{x}}{S}$, where x is the value of the observation, and $\bar{x}$ and S are the mean
and standard deviation of the sample respectively.

Interpretation:

Example: Two sections were given an exam in a course. The average score was 72 with standard
deviation of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A from
section 1 scored 84 and student B from section 2 scored 90. Who performed better relative to
his/her group?

Solution: Section 1: $\bar{x}_1 = 72$, $S_1 = 6$ and the score of student A from section 1 is $x_A = 84$.

Section 2: $\bar{x}_2 = 85$, $S_2 = 5$ and the score of student B from section 2 is $x_B = 90$.

Z-score of student A: $Z_A = \dfrac{x_A - \bar{x}_1}{S_1} = \dfrac{84 - 72}{6} = 2.00$

Z-score of student B: $Z_B = \dfrac{x_B - \bar{x}_2}{S_2} = \dfrac{90 - 85}{5} = 1.00$

From these two standard scores, we can conclude that student A performed better relative to
his/her section because his/her score is two standard deviations above the mean score of
section 1, while the score of student B is only one standard deviation above the mean score of
section 2.
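
The comparison above can be expressed as a small sketch in Python; `z_score` is an illustrative helper name, not something defined in the handout.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(84, mean=72, sd=6)   # student A, section 1
z_b = z_score(90, mean=85, sd=5)   # student B, section 2

better = "A" if z_a > z_b else "B"
print(z_a, z_b, better)
```

Although B has the higher raw score, A has the higher standard score, so A did better relative to his/her own section.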

4.2. Measure of shape

Moments:

The k-th raw moment about the origin for given observations $x_1, x_2, \ldots, x_n$ with the
corresponding frequencies $f_1, f_2, \ldots, f_n$ is defined as

$$M'_k = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^k, \quad \text{where } N = \sum_{i=1}^{n} f_i, \; k = 1, 2, \ldots$$

For k = 1, we have $M'_1 = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i$.

Thus the first raw moment about the origin is the arithmetic mean.

For k = 2, we have $M'_2 = \dfrac{1}{N}\sum_{i=1}^{n} f_i x_i^2$.

The k-th central moment about the arithmetic mean for the same observations is denoted by
$M_k$ and defined as

$$M_k = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \mu)^k, \quad \text{where } N = \sum_{i=1}^{n} f_i, \; k = 1, 2, \ldots$$

and $\mu$ is the arithmetic mean.

For k = 1: $M_1 = 0$.

For k = 2: $M_2 = \dfrac{1}{N}\sum_{i=1}^{n} f_i (x_i - \mu)^2 = \sigma^2$, i.e. the population variance.

Example: Find the first three moments about the mean from the following data.

value        5    15    25    35    Total
frequency    1     3     4     2      10

Solution

value (x_i)               5       15      25      35     Total
frequency (f_i)           1        3       4       2        10
f_i x_i                   5       45     100      70       220
x_i - μ                 -17       -7       3      13
f_i (x_i - μ)           -17      -21      12      26         0
f_i (x_i - μ)^2         289      147      36     338       810
f_i (x_i - μ)^3       -4913    -1029     108    4394     -1440

$$\mu = \frac{\sum_{i=1}^{4} f_i x_i}{\sum_{i=1}^{4} f_i} = \frac{220}{10} = 22, \qquad M_1 = \frac{\sum_{i=1}^{4} f_i (x_i - \mu)}{\sum_{i=1}^{4} f_i} = \frac{0}{10} = 0$$

$$M_2 = \frac{\sum_{i=1}^{4} f_i (x_i - \mu)^2}{\sum_{i=1}^{4} f_i} = \frac{810}{10} = 81 \quad \text{and} \quad M_3 = \frac{\sum_{i=1}^{4} f_i (x_i - \mu)^3}{\sum_{i=1}^{4} f_i} = \frac{-1440}{10} = -144$$
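
The central moments in this example can be reproduced with a short function. This is a sketch that follows the definition of the k-th central moment (frequency-weighted k-th powers of deviations from the mean, divided by the total frequency); `central_moment` is an illustrative name.

```python
def central_moment(values, freqs, k):
    """k-th central moment of a frequency distribution (divisor N = sum of frequencies)."""
    total = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, values)) / total
    return sum(f * (x - mean) ** k for f, x in zip(freqs, values)) / total


values = [5, 15, 25, 35]
freqs = [1, 3, 4, 2]

m1 = central_moment(values, freqs, 1)
m2 = central_moment(values, freqs, 2)
m3 = central_moment(values, freqs, 3)
print(m1, m2, m3)
```

The first central moment is always zero, the second is the population variance, and the sign of the third indicates the direction of skewness.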

Skewness: is lack of symmetry.

A distribution is said to be symmetrical when the values are uniformly distributed around the mean
(the distributions of the data below the mean and above the mean are equal). In that case the mean,
the median and the mode are equal.

Negatively skewed distribution: one or more extremely small observations are present, i.e. the
mean is smaller than the median and the mode.

Positively skewed distribution: one or more observations are extremely large, i.e. the mean is
greater than the median and the mode.

When deviations are raised to an odd power (i.e. k = 1, 3, 5, …) and the sum of the negative terms
equals the sum of the positive terms in magnitude, the distribution is symmetrical; otherwise it is skewed.

I.e. the distribution is symmetrical if M3 = 0, M5 = 0, M7 = 0, etc., but if, for example, M3 ≠ 0 then the
distribution is skewed.

[Figure: frequency curves of a positively skewed, a symmetrical and a negatively skewed
distribution, each labelled with the relative positions of the mean, the median and the mode.]

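(i) Coefficient of skewness (Karl Pearson’s measure of skewness)

$$S_K = \frac{3(\bar{X} - \tilde{X})}{S}$$

where $\bar{X}$ is the mean, $\tilde{X}$ is the median and S is the standard deviation.

$S_K$ lies between -3 and 3, i.e. $-3 \le S_K \le 3$.

- If $S_K = 0$, then the distribution is symmetrical, since $\bar{X} = \tilde{X}$.
- If $S_K > 0$, then the distribution is positively skewed, since $\bar{X} > \tilde{X}$.
- If $S_K < 0$, then the distribution is negatively skewed, since $\bar{X} < \tilde{X}$.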


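Example: Use the data of the previous example to check the skewness of the data.

Solution: $\mu = 22$, $\sigma^2 = 81$, so $\sigma = 9$. With n = 10 observations, the median is

$$\tilde{X} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{obs} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{obs}}{2} = \frac{5^{\text{th}} \text{ obs} + 6^{\text{th}} \text{ obs}}{2} = \frac{25 + 25}{2} = 25$$

$$S_K = \frac{3(22 - 25)}{9} = \frac{-9}{9} = -1 < 0$$

This implies that the data are negatively skewed.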

(ii) Bowley’s quartile measure of skewness: in a symmetrical distribution the first and
third quartiles are equidistant from the median (Q2),

i.e. Q2 – Q1 = Q3 – Q2; in other words, the median is $\tilde{X} = \dfrac{Q_1 + Q_3}{2}$.

If Q2 – Q1 ≠ Q3 – Q2, the data are skewed.

Bowley’s quartile coefficient of skewness is denoted by $S_B$:

$$S_B = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$$

Since Q2 = median, we can rewrite this as

$$S_B = \frac{(Q_3 - \text{median}) - (\text{median} - Q_1)}{(Q_3 - \text{median}) + (\text{median} - Q_1)}$$

$S_B$ lies between -1 and 1.

If $S_B = 0$, then the distribution is symmetrical.

If $S_B > 0$, then the distribution is positively skewed.

If $S_B < 0$, then the distribution is negatively skewed.

(iii) The coefficient of skewness in terms of moments, denoted by $\alpha_3$

$$\alpha_3 = \frac{M_3}{(M_2)^{3/2}}, \quad \text{where } M_2 = \sigma^2 \;\Rightarrow\; \alpha_3 = \frac{M_3}{\sigma^3}$$

If $\alpha_3 = 0$, the distribution is symmetrical.
If $\alpha_3 < 0$, the distribution is negatively skewed.
If $\alpha_3 > 0$, the distribution is positively skewed.
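
To see two of these coefficients side by side, the sketch below applies Pearson's measure and the moment coefficient to the data of the moments example (values 5, 15, 25, 35 with frequencies 1, 3, 4, 2). It assumes the population standard deviation is used in Pearson's measure, as in the worked example; the helper names are illustrative.

```python
import math
import statistics

values = [5, 15, 25, 35]
freqs = [1, 3, 4, 2]

# Expand the frequency table into the raw list of observations
data = [x for x, f in zip(values, freqs) for _ in range(f)]

n = len(data)
mean = sum(data) / n
median = statistics.median(data)


def central_moment(k):
    """k-th central moment of the expanded data (divisor n)."""
    return sum((x - mean) ** k for x in data) / n


sigma = math.sqrt(central_moment(2))

pearson_sk = 3 * (mean - median) / sigma                    # Pearson's coefficient
alpha_3 = central_moment(3) / central_moment(2) ** 1.5      # moment coefficient

print(pearson_sk, round(alpha_3, 4))
```

Both coefficients are negative for these data, so the two measures agree on the direction of skewness even though their magnitudes are on different scales.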

Kurtosis: is the degree of peakedness of the curve of a frequency distribution.

Mesokurtic (normal curve): if the frequency distribution is unimodal and if the curve is bell
shaped and symmetrical.

Leptokurtic: if the frequency distribution is more peaked than normal, i.e. a large number of
observations have high frequency.

Platykurtic: if the frequency distribution is less peaked than normal, i.e. a large number of
observations have low frequency.

[Figure: leptokurtic, mesokurtic and platykurtic curves plotted as frequency against value.]

Formula for the measure of kurtosis:

$$\alpha_4 = \frac{M_4}{(M_2)^2} = \frac{M_4}{\sigma^4}$$

If $\alpha_4 > 3$, the curve is leptokurtic (more peaked).

If $\alpha_4 < 3$, the curve is platykurtic (less peaked).

If $\alpha_4 = 3$, the curve is mesokurtic (normal curve).

Example: The standard deviation of a symmetrical distribution is 3. What must be the value of
the fourth moment about the mean in order that the distribution be mesokurtic?

Solution: For a mesokurtic distribution $\alpha_4 = 3$, and $\sigma^4 = 3^4 = 81$, so

$$\alpha_4 = \frac{M_4}{\sigma^4} = \frac{M_4}{81} = 3 \;\Rightarrow\; M_4 = 3(81) = 243$$

So the fourth moment about the mean should be equal to 243.
Unit 5

5 Elementary Probabilities

5.1 Definition of basic terms of probability

Random experiment: - is a process of measurement or observation which can be repeated at any
time and whose outcome cannot be predicted with certainty. E.g. tossing a coin.
Outcome: - a particular result of an experiment (the result of a single trial of an experiment).
Sample space: - is the set of all possible outcomes of a random experiment. Each possible
outcome is called a sample point.
Event: - is a subset of a sample space (one or more outcomes of an experiment).
Example 1: if we toss a coin, the sample space (S) of this experiment is
S = {head, tail}, where head and tail are the two faces of a coin. If we are interested in the
outcome that a head turns up, then the event is E = {head}.
Example 2: find the sample space of tossing a coin twice.
S = {HH, HT, TH, TT}
Elementary or simple event: - an event having only one sample point.
Mutually exclusive events: - two events E1 and E2 are said to be mutually exclusive if there is
no sample point which is common to E1 and E2,
i.e. E1 ∩ E2 = ∅.
Independent events: two events E1 and E2 are said to be independent if the occurrence of E1 has
no bearing on the occurrence of E2. That means knowing that E1 has occurred gives no information
about the occurrence of E2.
Collectively exhaustive events: - two events are said to be collectively exhaustive if at least one of
them must occur. Hence they include every possible outcome.
Equally likely outcomes: - outcomes are equally likely if each outcome in a sample space has the
same chance of occurring.

Example: In throwing a fair die all possible outcomes are equally likely. That means all the elements
of the sample space have the same chance of occurring.
5.2 Counting techniques:
In order to determine the number of outcomes, one can use several rules of counting.
1. Multiplication rule: - in a sequence of n events in which the first event has k1 possibilities, …,
and the nth event has kn possibilities, the total number of possibilities of the sequence is k1·k2·…·kn.

Example: - the personnel department of a large corporation wishes to issue each employee an ID
card with two letters followed by a two-digit number. How many different ID cards can be
issued?
Solution
K1 K2 K3 K4
26 26 10 10
Thus the total number of ID cards that could be issued is:
26*26*10*10 = 67600 (with repetition)
26*25*10*9 = 58500 (without repetition)

2. Permutation: is an arrangement of n objects in a specific order. In this case order is crucial.

a) The number of permutations of n objects taken all together is n!,
i.e. n! / (n-n)!
b) The number of arrangements of n distinct objects in a specific order using r objects at a time is
nPr = n!/(n-r)! = n(n-1)(n-2)…(n-r+1)
c) The number of permutations of n objects in which k1 are alike, k2 are alike, …, kn are alike is
n! / (k1! k2! … kn!)
Example: a photographer wants to arrange 3 persons in a row for a photograph. How many
different photographs are possible?
Solution:
Assume the 3 persons are Aster (A), Lemma (L) and Yared (Y), so n = 3.
Since n! = 3! = 3*2*1 = 6, there are 6 possible arrangements: ALY, AYL, LAY, LYA, YLA and
YAL.

Example 2: fifteen athletes, including Haile, entered a race.
a) In how many different ways could prizes for the first, the second and the third place be
awarded?
b) How many of the triplets counted above have Haile in the first position?
Solution:
a) 15 objects taken 3 at a time: 15P3 = 15! / (15-3)! = 2730
b) With Haile fixed in first place, the other two places can be filled in 14P2 = 14! / (14-2)! = 182 ways.
3. Combination: - a counting technique in which the order of the objects is immaterial: the selection
of r objects from a collection of n objects, where r <= n, without regard to order.
The number of combinations of n objects taken r at a time is given by
nCr = n!/((n-r)! r!)
Example: In a club containing 7 members a committee of 3 people is to be formed. In how many
ways can the committee be formed?
Solution: 7C3 = 7! / ((7-3)! 3!) = 35
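
The counting examples in this section can be verified with Python's standard library. This sketch uses `math.factorial`, `math.perm` and `math.comb` (the latter two are available from Python 3.8) and `itertools.permutations` to list the photograph arrangements; the variable names are illustrative.

```python
import math
from itertools import permutations

# Multiplication rule: two letters followed by two digits
ids_with_repetition = 26 * 26 * 10 * 10
ids_without_repetition = 26 * 25 * 10 * 9

# Permutations
photo_orders = math.factorial(3)          # 3 persons in a row
podiums = math.perm(15, 3)                # 1st, 2nd, 3rd among 15 athletes
podiums_haile_first = math.perm(14, 2)    # Haile fixed in 1st place

# Combinations: committee of 3 from 7 members
committees = math.comb(7, 3)

# Listing the arrangements of Aster (A), Lemma (L) and Yared (Y)
arrangements = ["".join(p) for p in permutations("ALY")]

print(ids_with_repetition, ids_without_repetition)
print(photo_orders, podiums, podiums_haile_first, committees)
print(arrangements)
```

Listing the arrangements explicitly is only practical for very small n; for larger problems the formulas are the only workable route.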
5.3 Definition of probability
Probability: - is the chance (likelihood) of occurrence of an event. It is expressed by a numerical
value between 0 and 1 inclusive. Probability is a building block of inferential statistics.

Deterministic model      Stochastic (probabilistic) model
-> certain               -> uncertain
-> mathematical          -> non-mathematical (econometric model)

Generally, probability can be divided into two types:
i) Subjective probability: - the probability that an event occurs in a certain experiment, based on
an individual’s belief or attitude.
ii) Objective probability: - the probability of an event in a certain experiment based on
experimental evidence.
5.4 Basic approaches to probability
Classical approach: - uses the sample space to determine the numerical probability that an event
will happen. If there are n equally likely outcomes of an experiment, and the event E consists of
k of these n outcomes, then the probability of the event E, denoted by P(E), is defined as
P(E) = n(E)/n(S) = k/n

Deficiencies of the classical approach. It cannot be used:
- If the total number of outcomes is infinite or if it is not possible to enumerate all elements of
the sample space.
- If the outcomes are not equally likely.
Example: in the experiment of tossing a coin and a die together, find the probability of the event
E consisting of a head and an even number.
Solution: S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}, then
E = {H2, H4, H6}; thus, P(E) = n(E)/n(S) = 3/12 = ¼
Let S be the sample space of an experiment. P is called a probability function if it satisfies the
following conditions:
0 ≤ P(A) ≤ 1 for each event A, where P(A) is called the probability of A, and P(S) = 1.
If A and B are mutually exclusive events, then P(A ∪ B) = P(A) + P(B).

Similarly, for mutually exclusive events $A_1, A_2, \ldots, A_n$,

$$P\left(\bigcup_{i=1}^{n} A_i\right) = P(A_1) + P(A_2) + \cdots + P(A_n) = \sum_{i=1}^{n} P(A_i)$$

Relative frequency approach (empirical approach): - suppose we repeat a certain
experiment n times, let A be an event of the experiment and let k be the number of times that
event A occurs. Then the ratio k/n is called the relative frequency of event A.

$$P(A) = \frac{\text{number of times event A has occurred}}{\text{total number of observations}} = \frac{k}{n}$$

In other words, given a frequency distribution, the probability of an event (E) being in a
given class is

$$P(E) = \frac{\text{frequency of the class}}{\text{total frequency in the distribution}}$$

Example: the national center for health statistics reported that of every 539 deaths in recent
years, 24 resulted from automobile accidents, 182 from cancer, and 333 from other diseases.
What is the probability that a particular death is due to an automobile accident?
Solution: P(automobile) = deaths due to automobile accidents / total deaths = 24/539
5.5 Rules of probability

Rule 1: let A be an event and A’ be the complement of A with respect to a given sample space
of an experiment; then P(A’) = 1 - P(A).
Proof: let S be the sample space. S = A ∪ A’, and A and A’ are mutually exclusive:
A ∩ A’ = ∅
P(S) = P(A ∪ A’) = P(A’) + P(A) and P(S) = 1
1 = P(A’) + P(A) => P(A’) = 1 - P(A)
Rule 2: let A and B be events of a sample space S; then
P(A’ ∩ B) = P(B) - P(A ∩ B)
Proof: B = S ∩ B = (A ∪ A’) ∩ B = (A ∩ B) ∪ (A’ ∩ B)
Case 1: if A ∩ B ≠ ∅, then P(B) = P(A ∩ B) + P(A’ ∩ B), so
P(A’ ∩ B) = P(B) – P(A ∩ B)
Case 2: if A ∩ B = ∅, then P(B) = P(A ∩ B) + P(A’ ∩ B), and since P(A ∩ B) = P(∅) = 0,
=> P(B) = P(A’ ∩ B)
Rule 3: suppose A and B are two events of a sample space; then
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Example: A fair die is thrown twice. Calculate the probability that the sum of the spots on the faces
of the die that turn up is divisible by 2 or 3.
Solution:
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6),(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
This sample space has 6*6 = 36 elements. Let E1 be the event that the sum of the spots on the die
is divisible by 2 and E2 be the event that the sum of the spots on the die is divisible by 3;
then E1 ∩ E2 is the event that the sum is divisible by 6, and
P(E1 or E2) = P(E1 ∪ E2)
= P(E1) + P(E2) – P(E1 ∩ E2)
= 18/36 + 12/36 - 6/36 = 24/36 = 2/3
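
The addition rule in this example can be confirmed by enumerating the 36 outcomes directly. The sketch below uses exact fractions; the helper `prob` is an illustrative name and relies on all outcomes being equally likely.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of throwing a fair die twice
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


p_e1 = prob(lambda o: sum(o) % 2 == 0)           # sum divisible by 2
p_e2 = prob(lambda o: sum(o) % 3 == 0)           # sum divisible by 3
p_both = prob(lambda o: sum(o) % 6 == 0)         # divisible by 2 and by 3
p_union = prob(lambda o: sum(o) % 2 == 0 or sum(o) % 3 == 0)

print(p_e1, p_e2, p_both, p_union)
```

Counting the union directly and applying Rule 3 give the same answer, which is a useful check that the intersection has been subtracted exactly once.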
5.6 Conditional probability and independence

5.6.1 Conditional probability: the conditional probability of an event A given an event B is
defined as the probability that event A occurs given that event B has already occurred.
P(A/B) = P(A ∩ B)/P(B), where P(B) > 0
Remark: (i) P(A ∩ B) and P(B) are computed with respect to the original sample space.
(ii) P(S/B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
(iii) P(B/S) = P(B), because P(B/S) = P(B ∩ S)/P(S) = P(B)/1 = P(B)
(iv) If A and B are independent events, then P(A/B) = P(A) and P(B/A) = P(B). Two events are
independent if the occurrence of B doesn’t affect the occurrence of A. Since P(A/B) = P(A ∩ B)/P(B),
P(A ∩ B) = P(A/B) * P(B), but P(A/B) = P(A).
Hence P(A ∩ B) = P(A) * P(B)
Example: Suppose that an office has 100 calculating machines. Some of them use electric power
(E) while others are manual (M), and some machines are new (N) while others are used
(U). The table below gives the number of machines in each category. A person enters the office,
picks a machine at random and discovers that it is new. What is the probability that it uses
electric power?
          E     M    Total
N        40    30     70
U        20    10     30
Total    60    40    100
Solution: P(E/N) = P(E ∩ N)/P(N) = 40/70 = 4/7
Bayes’ theorem
Theorem 1.1: let {E1, E2, …, En} be a partition of the sample space S, and suppose each of E1, E2, …, En
has non-zero probability, that is P(Ei) ≠ 0 for i = 1, 2, …, n, and let E be any event; then
P(E) = P(E1)*P(E/E1) + P(E2)*P(E/E2) + … + P(En)*P(E/En)

$$P(E) = \sum_{i=1}^{n} P(E_i)\,P(E/E_i)$$

Theorem 1.2: (Bayes’ theorem)

Let {E1, E2, …, En} be a partition of the sample space S, and suppose each of E1, E2, …, En has non-zero
probability, that is P(Ei) ≠ 0 for i = 1, 2, …, n, and let E be any event with P(E) > 0; then for each
integer k, 1 ≤ k ≤ n, we have

$$P(E_k/E) = \frac{P(E_k)\,P(E/E_k)}{\sum_{i=1}^{n} P(E_i)\,P(E/E_i)}$$

Example: suppose that three machines A1, A2 and A3 produce 60%, 30% and 10% respectively of the
total production, and that the percentages of defective items produced by these machines are
2%, 4% and 6% respectively.
a) If an item is selected at random, find the probability that the item is defective.
b) Assuming that an item selected at random is found to be defective, find the probability that the
item was produced on machine A1.
Solution: let B be the event of selecting a defective item at random and let E1, E2, E3 be the events
that the item was produced on machines A1, A2, A3 respectively; then
P(B/E1) = 2% = 0.02, P(B/E2) = 4% = 0.04 and P(B/E3) = 6% = 0.06
a) P(B) = P(B ∩ [E1 ∪ E2 ∪ E3])
= P([B ∩ E1] ∪ [B ∩ E2] ∪ [B ∩ E3])
= P(B ∩ E1) + P(B ∩ E2) + P(B ∩ E3)
= P(E1)*P(B/E1) + P(E2)*P(B/E2) + P(E3)*P(B/E3)
= 0.6*0.02 + 0.3*0.04 + 0.1*0.06
= 0.03
b) We use Bayes’ formula:

$$P(E_1/B) = \frac{P(E_1 \cap B)}{P(B)} = \frac{P(E_1)\,P(B/E_1)}{\sum_{i=1}^{3} P(E_i)\,P(B/E_i)} = \frac{0.6 \times 0.02}{0.03} = 0.4$$
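
The two theorems can be applied mechanically once the prior and conditional probabilities are tabulated. The sketch below assumes the production shares 0.6, 0.3 and 0.1 that are used in the calculation above (they sum to 1, as a partition requires); the dictionary names are illustrative.

```python
# Share of total production for each machine (assumed: 0.6, 0.3, 0.1)
prior = {"A1": 0.6, "A2": 0.3, "A3": 0.1}

# P(defective | machine)
defect_rate = {"A1": 0.02, "A2": 0.04, "A3": 0.06}

# Theorem of total probability: P(B) = sum of P(Ei) * P(B/Ei)
p_defective = sum(prior[m] * defect_rate[m] for m in prior)

# Bayes' theorem: P(Ei/B) = P(Ei) * P(B/Ei) / P(B)
posterior = {m: prior[m] * defect_rate[m] / p_defective for m in prior}

print(round(p_defective, 4))
for machine, p in posterior.items():
    print(machine, round(p, 4))
```

The posterior probabilities sum to 1, since a defective item must have come from one of the three machines.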

5.6.2 Independence: two events E1 and E2 are said to be independent if the occurrence of E1
has no bearing on the occurrence of E2. That means knowing that E1 has occurred gives no
information about the occurrence of E2. Formally, two events A and B are said to be independent
if P(A ∩ B) = P(A) P(B).
Suppose A and B are independent events with 0 < P(A) < 1 and 0 < P(B) < 1. Show that the
following statements are true.

i. A^c and B^c are independent
ii. A and B^c are independent
iii. A^c and B are independent
iv. P(B|A) = P(B)
v. P(B|A^c) = P(B)

Example: Consider the experiment of drawing a card from a well-shuffled deck of cards.
Let A: a spade is drawn
B: an honor (10, J, Q, K, A) is drawn
Are the two events independent?

Solution: $P(A) = \dfrac{13}{52} = \dfrac{1}{4}$, $P(B) = \dfrac{20}{52} = \dfrac{5}{13}$ and $P(A \cap B) = \dfrac{5}{52}$

Using the independence condition, $P(A)\,P(B) = \dfrac{13}{52} \times \dfrac{20}{52} = \dfrac{5}{52} = P(A \cap B)$, so the two events
are independent.

6. Probability Distribution
6.1 Definition of Random Variable
Definition: A random variable is a variable whose values are determined by chance.
It is a numerical description of the outcomes of an experiment, i.e. a numerically valued function
defined on the sample space, usually denoted by a capital letter.
If X is a random variable, then it is a function from the elements of the sample space to
the set of real numbers, i.e. X is a function X: S → R.
Example: Flip a coin three times and let X be the number of heads in the three tosses.
S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
X(HHH) = 3
X(HHT) = X(HTH) = X(THH) = 2
X(HTT) = X(THT) = X(TTH) = 1
X(TTT) = 0
X = {0, 1, 2, 3}
Random variables are of two types:
1. Discrete random variables: are variables which can assume only a specific number of
values which are clearly separated and can be counted.
Example:
- Toss a coin n times and count the number of heads.
- Number of children in a family.
- Number of car accidents per week.
- Number of defective items in a given company.
2. Continuous random variables: are variables that can assume any value in an interval.
Example:
- Height of students at a certain college.
- Mark of a student.
Definition: A probability distribution is a complete list of all possible values of a random
variable and their corresponding probabilities.
Discrete probability distribution: is a distribution whose random variable is discrete.

Example: Consider the possible outcomes for the experiment of tossing three coins together.
Sample space, S = {HHH, THH, HTH, HHT, TTH, THT, HTT, TTT}
Let the r.v. X be the number of heads that turn up when three coins are tossed.
X = {0, 1, 2, 3}
P(X = 0) = P(TTT) = 1/8
P(X = 1) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8
P(X = 2) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8
P(X = 3) = P(HHH) = 1/8

X = x        0      1      2      3
P(X = x)    1/8    3/8    3/8    1/8

Probability density function (continuous probability distribution): is a probability
distribution whose random variable is continuous. The probability of a single value is zero, and
the probability of an interval is the area bounded by the curve of the probability density function
and that interval on the x-axis. Let a and b be any two values with a < b. The probability that X
assumes a value that lies between a and b is equal to the area under the curve between a and b, i.e.

$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$

Note: P(X = a) = 0 for any point a.

[Fig.: probability density function of X; the shaded region between a and b has area P(a ≤ X ≤ b).]

Since X must assume some value, it follows that the total area under the density curve must equal 1.

If f(x) is a probability density function, then:

1) $f(x) \ge 0$

2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$

6.2 PROBABILITY MASS FUNCTION, EXPECTATION AND VARIANCE OF A DISCRETE RANDOM VARIABLE

Every value of a discrete random variable X has a probability associated with it. These
probabilities collectively are known as the probability mass function, which can be used to obtain
probabilities associated with the random variable.
Let X be a discrete random variable; then the probability mass function is given by
f(x) = P(X = x), for real number x.
A function f is a probability mass function if
1. $f(x) \ge 0$
2. $\sum_{\text{all } x} f(x) = 1$

Expected Value (Mean)

Let X be a discrete random variable whose possible values are x1, x2, …, xn with
the probabilities P(x1), P(x2), P(x3), …, P(xn) respectively.
Then the expected value of X, E(X), is defined as:
E(X) = x1 P(x1) + x2 P(x2) + … + xn P(xn)

$$E(X) = \sum_{i=1}^{n} x_i\,P(X = x_i)$$

Example: what is the expected value for the r.v. from the above example (the number of heads in
three tosses)?

Solution: X = 0, 1, 2, 3, so x1 = 0, x2 = 1, x3 = 2, x4 = 3

P(X = x1) = 1/8, P(X = x2) = 3/8, P(X = x3) = 3/8, P(X = x4) = 1/8

$$E(X) = \sum_{i=1}^{4} x_i\,P(X = x_i) = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 12/8 = 1.5$$

Properties of expected value

- If C is a constant, then E(C) = C
- E(CX) = C E(X), where C is a constant.
- E(X + C) = E(X) + C, where C is a constant.
- E(X + Y) = E(X) + E(Y)

Variance
If X is a discrete random variable with expected value $\mu$ (i.e. E(X) = $\mu$), then the variance of X,
denoted by Var(X), is defined by

$$Var(X) = E\left[(X - \mu)^2\right] = E(X^2) - \mu^2 = \sum_{i=1}^{n} x_i^2\,P(x_i) - \mu^2$$

Alternatively,

$$Var(X) = \sum_{i=1}^{n} (x_i - \mu)^2\,P(x_i)$$

Properties of Variances
- For any r.v. X and constant C, it can be shown that
Var(CX) = C² Var(X)
Var(X + C) = Var(X) + 0 = Var(X)
- If X and Y are independent random variables, then
Var(X + Y) = Var(X) + Var(Y)
More generally, if X1, X2, …, Xk are independent random variables,
then Var(X1 + X2 + … + Xk) = Var(X1) + Var(X2) + … + Var(Xk), i.e.

$$Var\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} Var(X_i)$$

- If X and Y are not independent, then
Var(X + Y) = Var(X) + 2Cov(X, Y) + Var(Y)
Var(X - Y) = Var(X) – 2Cov(X, Y) + Var(Y)

Examples
1. Consider a random variable X that takes the value 1 or 0 with respective probabilities p and
1 - p. Find the expected value as well as the variance of the r.v.
Solution: X = {0, 1}
P(X = 1) = p and P(X = 0) = 1 - p

$$E(X) = \sum x_i\,P(X = x_i) = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) = 0(1 - p) + 1(p) = p$$

$$Var(X) = E(X^2) - \mu^2 = \sum x_i^2\,P(X = x_i) - \mu^2 = \left[0^2\,P(X = 0) + 1^2\,P(X = 1)\right] - p^2$$

$$= \left[0(1 - p) + 1(p)\right] - p^2 = p - p^2 = p(1 - p)$$

2. Two fair coins are tossed. Determine Var(X), where X is the number of heads that appear.
A) Use the definition of the variance.
B) Use the fact that the variance of the sum of independent variables is equal to the
sum of the variances.

Solution A) X = number of heads = {0, 1, 2}, with sample space {HH, TH, HT, TT}

P(X = 0) = ¼, P(X = 1) = ½, P(X = 2) = ¼
E(X) = 0·P(X = 0) + 1·P(X = 1) + 2·P(X = 2) = 0(1/4) + 1(1/2) + 2(1/4) = 1
E(X²) = 0²·P(X = 0) + 1²·P(X = 1) + 2²·P(X = 2) = 0(1/4) + 1(1/2) + 4(1/4) = 3/2

Var(X) = E(X²) - μ² = 3/2 - 1 = 1/2

B) Let X = number of heads on the first coin = {0, 1}, corresponding to {T, H}
Y = number of heads on the second coin = {0, 1}, corresponding to {T, H}

P(X = 0) = ½, P(X = 1) = ½ and P(Y = 0) = ½, P(Y = 1) = ½

E(X) = 0·P(X = 0) + 1·P(X = 1) = 0(1/2) + 1(1/2) = 1/2
E(Y) = 0·P(Y = 0) + 1·P(Y = 1) = 0(1/2) + 1(1/2) = 1/2
E(X²) = 0²·P(X = 0) + 1²·P(X = 1) = 0(1/2) + 1(1/2) = 1/2
E(Y²) = 0²·P(Y = 0) + 1²·P(Y = 1) = 0(1/2) + 1(1/2) = 1/2

Var(X) = E(X²) - [E(X)]² = ½ - (1/2)² = ¼
Var(Y) = E(Y²) - [E(Y)]² = ½ - (1/2)² = ¼
X and Y are independent (i.e. the outcome of one coin does not influence the outcome of the
second), and X + Y is the total number of heads, so
Var(X + Y) = Var(X) + Var(Y) = 1/4 + 1/4 = ½
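
Both routes to the variance in this example can be checked with exact fractions. The sketch below represents a probability mass function as a dictionary mapping each value to its probability; the function names `expectation` and `variance` are illustrative.

```python
from fractions import Fraction


def expectation(pmf):
    """E(X) = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mean = expectation(pmf)
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - mean ** 2


# A) number of heads in two tosses of a fair coin
two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# B) number of heads on a single fair coin
one_coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

var_direct = variance(two_coins)
var_by_sum = variance(one_coin) + variance(one_coin)   # independent coins

print(expectation(two_coins), var_direct, var_by_sum)
```

The additive route is valid only because the two coins are independent; for dependent variables the covariance term would have to be included.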

6.3 COMMON DISCRETE PROBABILITY DISTRIBUTIONS


1. Binomial Distribution
The binomial experiment originates in the Bernoulli trial. A Bernoulli trial is an experiment
with only two mutually exclusive outcomes, which are designated "success (s)" and "failure
(f)". The sample space of a Bernoulli trial is {s, f}.
Notation: Let the probabilities of success and failure be p and q respectively:
P(success) = P(s) = p and P(failure) = P(f) = q, where q = 1 - p

Definition: Let X be the number of successes in n repeated Bernoulli trials with probability of
success p on each trial. Then the probability distribution of the discrete random variable X is
called the binomial distribution.
Let P = the probability of success
q = 1-P = the probability of failure on any given trial.
A binomial random variable with parameters n and P represents the number of successes r in n
independent trials, when each trial has probability P of success.

If X is a binomial random variable, then for r = 0, 1, 2, …, n

$P(X = r) = \frac{n!}{r!\,(n-r)!} P^r (1-P)^{n-r}$ (binomial probability distribution formula)

or equivalently

$P(X = r) = \frac{n!}{r!\,(n-r)!} P^r q^{n-r}$, where q = 1 - P

A binomial experiment is a probability experiment that satisfies the following requirements,
called the assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has only two possible mutually exclusive outcomes, a success or a failure.
3. The probability of each outcome does not change from trial to trial.
4. The trials are independent.
Examples of binomial experiments

- Tossing a coin 20 times to see how many tails occur.
- Asking 200 people whether or not they listen to the BBC news.

Example 1. A fair coin is flipped 3 times. What is the probability of getting
a) No heads?
b) 2 heads?
Solution: Let X = number of heads, with possible values r = 0, 1, 2, 3
P(getting a head) = P = ½, q = 1-P = ½, n = 3

a) $P(X=0) = \frac{3!}{0!\,(3-0)!} \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^{3-0} = \frac{1}{8}$

b) $P(X=2) = \frac{3!}{2!\,(3-2)!} \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^{3-2} = \frac{3}{8}$

2. The probability that a student entering a college will graduate is 0.4. Determine the probability
that out of 5 students (a) none, (b) one, (c) at least one, (d) at most three will graduate.

Solution: X = number of students who will graduate

X = 0, 1, 2, 3, 4, 5

P = 0.4, q = 1-P = 0.6

a) P(none will graduate) = $P(X=0) = \frac{5!}{0!\,(5-0)!} (0.4)^0 (0.6)^5 = 0.08$

b) P(one will graduate) = $P(X=1) = \frac{5!}{1!\,(5-1)!} (0.4)^1 (0.6)^4 = 0.26$

c) P(at least one will graduate) = P(X ≥ 1)

= P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)

= 1 - P(X < 1) = 1 - P(X=0)

$= 1 - \frac{5!}{0!\,(5-0)!} (0.4)^0 (0.6)^5$

= 1 - 0.08 = 0.92

d) P(at most three will graduate) = P(X ≤ 3)

= 1 - P(X > 3)

= 1 - [P(X=4) + P(X=5)]

$= 1 - \left[\frac{5!}{4!\,(5-4)!} (0.4)^4 (0.6)^1 + \frac{5!}{5!\,(5-5)!} (0.4)^5 (0.6)^0\right]$

= 0.91296

If X is a binomial random variable with two parameters n and P, then

1. E (X) = n.p.
2. Var ( X) = npq
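
The binomial formula and the results E(X) = np and Var(X) = npq can be verified with a short script. This is a minimal sketch using the graduation example (n = 5, P = 0.4); the function name `binomial_pmf` is an illustrative choice, not something defined in the handout.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for a binomial random variable with parameters n and p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 5, 0.4
p_none = binomial_pmf(0, n, p)                                   # part (a)
p_one = binomial_pmf(1, n, p)                                    # part (b)
p_at_least_one = 1 - p_none                                      # part (c)
p_at_most_three = sum(binomial_pmf(r, n, p) for r in range(4))   # part (d)

# Mean and variance computed directly from the distribution,
# to compare with n*p and n*p*q.
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
var = sum((r - mean) ** 2 * binomial_pmf(r, n, p) for r in range(n + 1))

print(round(p_none, 4), round(p_one, 4), round(p_at_least_one, 4), round(p_at_most_three, 5))
print(round(mean, 4), round(var, 4))
```

The unrounded values of parts (a) and (b) are 0.07776 and 0.2592, which the worked solution rounds to 0.08 and 0.26.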
Poisson distribution

- It is a discrete probability distribution used to model rare events, such as the number of car
accidents in a day, the arrival of telephone calls over an interval of time, the number of
misprints on a typed page, or natural disasters like earthquakes.

A Poisson model has the following assumptions:

- The expected number of occurrences of the event can be estimated from past trials (records).
- The numbers of events occurring in different regions or time intervals are independent of
one another.

Definition: Let X be the number of occurrences in a Poisson process and λ be the average
number of occurrences of the event in a unit interval. The probability function of the
Poisson distribution is

$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$ for x = 0, 1, 2, …, and P(X = x) = 0 otherwise.

Remarks
- The Poisson distribution has only one parameter, λ.
- If X has a Poisson distribution with parameter λ, then E(X) = λ and
Var(X) = λ, i.e. E(X) = Var(X) = λ.
- $\sum_{x=0}^{\infty} P(X = x) = 1$
Example 1. A company manufacturing light bulbs finds from past experience that on average 2
defective bulbs are manufactured per 30 working hours. What is the probability that 4 defective
bulbs will be manufactured in 30 working hours?

Solution: λ = 2

Let X be the r.v. denoting the number of defective bulbs.

$P(X = 4) = \frac{e^{-2}\, 2^4}{4!} \approx 0.09$
Example 2. In a small city, 10 accidents took place in a period of 50 days. Find the probability
that there will be
a) two accidents in a day
b) three or more accidents in a day

Solution: 10 accidents in 50 days corresponds to 10/50 = 0.2 accidents per day, so λ = 0.2.

Let X be the r.v. denoting the number of accidents per day.

X ~ Poisson(λ = 0.2), X = 0, 1, 2, …

a) $P(X = 2) = \frac{e^{-0.2} (0.2)^2}{2!} = 0.0164$

b) P(X ≥ 3) = P(X=3) + P(X=4) + P(X=5) + ...

= 1 - [P(X=0) + P(X=1) + P(X=2)], because $\sum_{x=0}^{\infty} P(X = x) = 1$

= 1 - [0.8187 + 0.1637 + 0.0164]

= 0.0012
3. a) Referring to Example 1, what is the expected number of defective light bulbs in 30 working
hours? What about the variance?
b) Referring to Example 2, find the mean and the variance of the number of accidents in a day.
Solution a) E(X) = Var(X) = λ = 2

b) E(X) = Var(X) = λ = 0.2
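
The Poisson probabilities in Examples 1 and 2 can be reproduced with a few lines of code. This is a minimal sketch; the function name `poisson_pmf` is an illustrative assumption.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


# Example 1: lam = 2 defective bulbs per 30 working hours.
p_four_defects = poisson_pmf(4, 2)

# Example 2: lam = 0.2 accidents per day.
p_two_accidents = poisson_pmf(2, 0.2)
p_three_or_more = 1 - sum(poisson_pmf(x, 0.2) for x in range(3))

print(round(p_four_defects, 4), round(p_two_accidents, 4), round(p_three_or_more, 4))
```

At full precision P(X ≥ 3) is about 0.0011; the hand calculation gives 0.0012 because the three terms are rounded to four decimals before being subtracted from 1.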

Approximating Binomial with Poisson

The Poisson distribution can approximate the binomial distribution when the number of trials n is
comparatively large and the probability of success P is small.

The approximating formula is

$P(X = x) = \frac{e^{-np} (np)^x}{x!}$, x = 0, 1, 2, …

Generally we use the Poisson distribution to approximate a binomial when n ≥ 50 and np ≤ 5.

Example: Suppose that an insurance company has 2000 policy holders and that the probability that
any one policy holder will file at least one claim in any given year is 1/1000. Find the
probability that in any given year one or more of the policy holders will file at least one claim.

Solution: X = number of policy holders who will file at least one claim

X = 0, 1, …, 2000, n = 2000, p = 0.001

Exact binomial: $P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \binom{2000}{0} (0.001)^0 (0.999)^{2000} = 0.8648$

Since n = 2000 > 50 and np = 2, we can use the Poisson approximation:

$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \frac{e^{-2}\, 2^0}{0!} = 0.8647$
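
The quality of the approximation in the insurance example can be seen by computing both answers side by side. This is a minimal sketch; the function names are illustrative assumptions.

```python
from math import comb, exp, factorial


def binomial_pmf(r, n, p):
    """Exact binomial probability P(X = r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(x, lam):
    """Poisson probability P(X = x) with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


n, p = 2000, 0.001
exact = 1 - binomial_pmf(0, n, p)      # 1 - (0.999)^2000
approx = 1 - poisson_pmf(0, n * p)     # 1 - e^(-2), using lam = n*p = 2

print(round(exact, 4), round(approx, 4))
```

The two results differ only in the fourth decimal place, which is why the approximation is considered acceptable for large n and small p.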

6.4. COMMON CONTINUOUS PROBABILITY DISTRIBUTIONS


6.4.1 Normal Distributions

It is the most important distribution for describing a continuous random variable and is used as an
approximation to other distributions. A random variable X is said to have a normal distribution if
its probability density function is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, where x is a real value of X,
i.e. -∞ < x < ∞, -∞ < µ < ∞ and σ > 0,
where µ = E(X) and σ² = Var(X).
µ and σ² are the parameters of the normal distribution.

Properties of Normal Distribution:

1. It is bell shaped and symmetrical about its mean. The maximum ordinate is at x = µ.
2. The curve approaches the horizontal x-axis as we go in either direction from the mean.
3. The total area under the curve is 1, that is
$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = 1$
4. The probability that the random variable takes a value between any two points is equal
to the area under the curve between those points.
5. The height of the normal curve attains its maximum at x = µ; this implies that the mean and
the mode coincide (are equal).
6.4.2 Standard normal Distribution
It is a normal distribution with mean 0 and variance 1. Any normal distribution can be converted to
the standard normal distribution as follows. If X has a normal distribution with mean µ and standard
deviation σ, then the standard normal deviate Z is given by $Z = \frac{X - \mu}{\sigma}$, with density

$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$

Properties of the standard normal distribution:


 The same as normal distribution, but the mean is zero and the variance is one.
 Areas under the standard normal distribution curve have been tabulated in various ways.
The most common ones are the areas between Z = 0 and a positive value of Z.

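Given a normally distributed random variable X with mean µ and standard deviation σ:

$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)$

$P(X < a) = P\left(\frac{X-\mu}{\sigma} < \frac{a-\mu}{\sigma}\right) = P\left(Z < \frac{a-\mu}{\sigma}\right)$, since $\frac{X-\mu}{\sigma} = Z$ is the standard normal r.v.

Note: i) P(a < X < b) = P(a ≤ X < b)
= P(a < X ≤ b)
= P(a ≤ X ≤ b)

ii) P(-∞ < Z < ∞) = 1

iii) P(a < Z < b) = P(Z < b) - P(Z < a) for a < b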

Consider the areas under the standard normal curve. By symmetry it is clear that

P(Z > 0) = 0.5 = P(Z < 0)

i) Let Z0 be a negative number. Then
P(Z < Z0) = P(Z < 0) - P(Z0 < Z < 0)
ii) If Z0 is a positive real number, then
P(Z > Z0) = P(Z > 0) - P(0 < Z < Z0)
iii) Let Z1 be a negative number and Z2 be a positive real number. Then
P(Z1 < Z < Z2) = P(Z1 < Z < 0) + P(0 < Z < Z2)
iv) If Z1 and Z2 are positive real numbers with Z1 < Z2, then
P(Z1 < Z < Z2) = P(0 < Z < Z2) - P(0 < Z < Z1)

[Figures: shaded areas under the standard normal curve for cases i) P(Z < Z0), ii) P(Z > Z0), iii) and iv) P(Z1 < Z < Z2)]

As the value of σ increases, the normal curve becomes flatter, and vice versa.

Examples: For a standard normal variable Z find

a) P(-2.2 < Z < 1.2)
b) P(Z > 1.05)
c) P(0 < Z < 0.96)
d) P(-1.45 < Z < 0)

Solution:
a) P(-2.2 < Z < 1.2) = P(0 < Z < 1.2) + P(-2.2 < Z < 0)
= P(0 < Z < 1.2) + P(0 < Z < 2.2)
= 0.3849 + 0.4861
= 0.8710
b) P(Z > 1.05) = 0.5 - P(0 < Z < 1.05)
= 0.5 - 0.3531 = 0.1469
c) P(0 < Z < 0.96) = 0.3315
d) P(-1.45 < Z < 0) = P(0 < Z < 1.45) = 0.4265
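
The table look-ups in this example can be cross-checked without a printed table. The sketch below assumes the standard identity Φ(z) = ½[1 + erf(z/√2)] for the cumulative standard normal distribution; the function name `phi` is an illustrative choice.

```python
from math import erf, sqrt


def phi(z):
    """Cumulative standard normal probability P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


p_a = phi(1.2) - phi(-2.2)    # P(-2.2 < Z < 1.2)
p_b = 1 - phi(1.05)           # P(Z > 1.05)
p_c = phi(0.96) - phi(0)      # P(0 < Z < 0.96)
p_d = phi(0) - phi(-1.45)     # P(-1.45 < Z < 0)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

Each probability is written as a difference of two cumulative values, which is note iii) above applied directly.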

Student distribution (t-distribution)

Suppose we have a sample X1, …, Xn from a normal population with unknown mean µ and
unknown standard deviation σ, and using this sample we want to obtain an interval
estimator of the population mean µ.

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution. But σ is unknown, so we substitute for it
its estimator S (the sample standard deviation). Hence

$t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ is a (Student) t random variable with n - 1 degrees of freedom (df).

Notation: $t_{\alpha}(v)$ stands for the value of t with v df to the right of which an area equal to α lies.

Example: $t_{0.025}(12) = 2.179$ means $P(t_{12} > 2.179) = 0.025$

$t_{0.01}(25) = 2.485$ means $P(t_{25} > 2.485) = 0.01$

N.B. The Student t-distribution is symmetrical about zero, so $t_{1-\alpha}(v) = -t_{\alpha}(v)$.

CHAPTER 7

Sampling and Sampling Distribution of the Mean

7.1 Definition and Some Basic Terms

We will now use the probability studied in the last two chapters to discuss inferential statistics.
We are going to analyze and interpret data to draw conclusions not about the data but about the
source of the data (population consisting of all elements being studied). We collect a sample of
data from the population and use it to make inferences about the population. Very often we will
be interested in estimating a population parameter. In order to estimate this we need to define
our terms carefully:

Population: the entire group of individuals or objects of interest.

Unit: An element of the population. This will be a person or object on which observations can be
made or from which information can be obtained.

Sampling Frame: list of all the units in the population.

Sampled population: the population from which one actually draws a sample. The sampled
population covers the elements from which the sample was actually selected.

Target population: the population about which one wishes to make an inference.

Note: (i) The sampled population may be smaller than the target population because of
non-coverage or incomplete coverage (missing units).

(ii) Statistical inference procedures allow one to make inferences about the sampled population.
Only when the sampled population and the target population are equal can one infer about the
target population.

Major reasons why sampling is necessary

1) the destructive nature of certain tests


2) physical impossibility of checking all items in the population
3) cost of studying all items in the population is often prohibitive
4) the adequacy of sample result
5) the time needed to contact the whole population is often prohibitive

7.2 Types of Errors

An estimate based on a sample will not be exact; there will be an error involved. In general,
errors which occur during estimation based on a sample can be categorized into two:

 Sampling errors
 Non sampling errors

Sampling errors

These are errors that arise because only a sample, rather than the whole population, is used to
estimate a population parameter. Even a representative sample will introduce errors if the sample
size is small.

On the other hand our estimates of parameters will often be inaccurate if our sample is not
representative of the population. Because of this we need to know how to choose a sample. We
see this in Section 7.3.

Sampling error is the difference between an estimate and the true value of the parameter being
estimated. We deal with this concept in the next chapter.

Non sampling errors

Suppose we have a representative sample and have chosen a sample large enough to ensure our
parameter estimates are accurate to a good degree of precision. Even then we will still have to
consider other kinds of errors, such as measurement errors, recording errors, non-response
errors, respondent bias, interviewer errors, errors in processing the data, and reporting
errors. Measurement errors and recording errors occur if there is an error in measuring the item
being studied or in recording its result. Interviewer errors can occur in surveys when an
interviewer introduces bias into an interview or when a questionnaire is badly designed. Another
common form of error is the non-response error. Non-responses can be due to refusals.

7.3 Sampling Methods

Sampling techniques can be grouped into two categories:

– Random (probability) sampling methods, and


– Non-random (non-probability) sampling methods.
Random sampling: a sampling method in which the items are included in the sample on a random
basis.

Simple random sample: a sampling technique in which each member of the population is equally
likely to be included in the sample. Suppose we have a population of N objects and we wish to
choose n of them to form a sample. We have seen that there are $\binom{N}{n}$ ways of choosing the
sample without replacement and $N^n$ ways with replacement.

Examples of simple random sampling


Lottery method: the units to be included in the sample are chosen by a lottery. Assign a number
to each element in the population. Write each number on a slip of paper, mix the slips, then draw
one number at a time. This method can only be used if the population is not very large; otherwise
it is cumbersome.

Table of random numbers: used to select a representative sample from a large population. To
select the sample we use random digits and proceed with the following steps:
Step 1: Number each element. For example, for a population of size 500 we assign 001 to 500.
Step 2: Select a random starting point in the table.
Step 3: Read off only as many digits as the labels have (three digits in the example above).
Proceed in this fashion until the required number of sample units has been selected.

Stratified random sampling: is often used when the population is split into subgroups or
“strata”. The different subgroups are believed to be very different from each other, but it is
thought that the individuals who make up each subgroup are similar. The number of units to be
chosen from each sub-group is fixed in advance and the units are chosen by simple random
sampling within the sub group.

Example: An investigator is interested in securing a particular response that would be
representative of undergraduate college students. He might stratify the population into four
groups: freshman, sophomore, junior and senior.

Cluster sampling: in some cases the identification and location of an ultimate unit for sampling
may require considerable time and cost. In such cases cluster sampling is used. In cluster
sampling the population is subdivided into groups or clusters, and a probability sample of these
clusters is then drawn and studied. Clusters may be Regions, Zones, Weredas, Kebeles etc.
This method of sampling is cheaper, faster and more convenient, but it may not be very
efficient or representative because of the usual tendency of the units within a cluster to be similar.
Example: Suppose we want to study the travel habits of families in Ethiopia, which is divided into
Regions and Zones. We first draw a random sample of the Zones to be studied, and then
from these selected Zones (clusters) we draw a random sample of households for the purpose of
investigation.

Systematic sampling: the items or individuals of the population are arranged in some way:
alphabetically, in a file drawer by date received, or by some other method. A random starting point
is selected and then every Kth member of the population is selected for the sample. For example,
if we want to select n items from a population of size N using systematic sampling, we divide N by
n (N/n = K) and choose a number i at random between 1 and K; then we take every Kth member. So the
sample will consist of items i, i+K, i+2K, i+3K, etc., where 1 ≤ i ≤ K.

Example: Suppose we want to choose a sample of about 20 students out of a class of 100
students. First we put the class in order (may be alphabetical order, or by ID number) and give
each a number between 1 and 100. Next we divide 100 by 20 and we get 100/20 = 5. We now
choose a number at random between 1 and 5. The student corresponding to that number is the
first student in the sample, and we then take every 5th student. So if, for example, we choose the
number 2 the sample will consist of the 2nd, 7th, 12th, 17th... 92nd and 97th students on the list.
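
The class-list example can be turned into a small routine. This is a minimal sketch, assuming the population is held in a Python list and positions are counted from 1 as in the text; the function name `systematic_sample` and its `start` and `rng` parameters are illustrative assumptions.

```python
import random


def systematic_sample(population, n, start=None, rng=random):
    """Take every K-th item (K = N // n), beginning at a 1-based position in 1..K."""
    k = len(population) // n
    if start is None:
        start = rng.randint(1, k)  # randint includes both end points
    positions = range(start, len(population) + 1, k)
    return [population[pos - 1] for pos in positions][:n]


students = list(range(1, 101))  # class list numbered 1 to 100

# Fixed start of 2, as in the worked example: 2nd, 7th, 12th, ..., 97th.
fixed = systematic_sample(students, 20, start=2)

# Random start, seeded only so that repeated runs give the same sample.
randomised = systematic_sample(students, 20, rng=random.Random(0))

print(fixed[:4], fixed[-2:])
```

Passing an explicit `start` reproduces the example in the text, while leaving it out lets the routine draw the starting point at random.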

Non probability sampling: selection of sample is based on the judgment of the investigator
rather than on randomness.

Judgment sampling: the subjective judgment of the researcher is the basis for selecting items to
be included in a sample. Judgment sampling is often used to pre-test a questionnaire.

Quota sampling: in this sampling technique major population characteristics play an important
role in the selection of the sample. It has some aspects in common with stratified sampling, but has
no randomization.

Example: Suppose a scientist recognizes that the variability in daily milk production may be due to
age differences. Cows will then be selected from different age groups. For instance, if 30%
of the cows are between 4 and 6 years old and the remaining 70% are between 6 and 8 years old, a
quota sample must reflect those same percentages.

Convenience sampling: units are chosen simply because they are convenient for the researcher in
terms of time, money and administration.

Chapter 8
Estimation and hypothesis testing

8.1 Estimation

The purpose of this chapter


We now assume that we have collected a random sample and are trying to use that sample to
estimate a population parameter. We discuss methods of point estimation and interval
estimation (mainly finding confidence intervals).

8.1.1 Point and interval estimation of the mean

Point estimate: a single number computed from the sample and it is used to estimate the
population parameter. We try to find a statistic (calculated from the sample) that is a good
estimator of the unknown parameter.

The sample mean $\bar{X}$ is a point estimate of the population mean $\mu_X$.

Confidence interval: a range of values constructed from the sample data so that the parameter
lies within that range with a specified probability (the level of confidence).

The confidence interval for the mean is Estimate ± Critical value × S.E. = $\bar{X} \pm z_{\alpha/2}(SE)$

Standard error of the sample mean (SE): the standard deviation of the probability
distribution of the sample mean, which measures the variability of the sampling distribution of the
sample mean. It is denoted by $\sigma_{\bar{X}}$.

Case 1: the variance σ² is known. In this case $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$, and the confidence interval for
the population mean $\mu_X$ is given by $\bar{X} \pm z_{\alpha/2} \frac{\sigma_X}{\sqrt{n}}$

Case 2: the variance is unknown but the sample is large (n > 30). Here $\sigma_{\bar{X}}$ is estimated by $\frac{S}{\sqrt{n}}$ and

the 100(1-α)% confidence interval for μ is

$\left(\bar{X} - z_{\alpha/2} \frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2} \frac{S}{\sqrt{n}}\right)$. This can be written as $\bar{X} \pm z_{\alpha/2} \frac{S}{\sqrt{n}}$

Case 3: the variance σ² is unknown and the sample size is small (n < 30). Then $\sigma_{\bar{X}}$ is estimated by $\frac{S}{\sqrt{n}}$
and the confidence interval for the population mean $\mu_X$ is given by $\bar{X} \pm t_{\alpha/2}(n-1) \frac{S}{\sqrt{n}}$

Example: An experiment involves selecting a random sample of 256 middle managers. One item
of interest is annual income. The sample mean is $45,420 and the sample standard deviation is
$2,050.

(i) What is the estimated mean income of all middle managers (point estimate of the population
mean)?
(ii) What is the 95 percent confidence interval for the population mean?
(iii) What degree of confidence is being used?
(iv) Interpret the result.
Solution:

(i) The point estimate of the population mean is $45,420.

(ii) The C.I. for the population mean ($\mu_X$) is
$\bar{X} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} = 45{,}420 \pm \frac{1.96(2{,}050)}{\sqrt{256}} = 45{,}420 \pm 251.125 = (45{,}168.875,\ 45{,}671.125)$
(iii) The degree (level) of confidence being used is 95%.
(iv) We are 95% confident that the mean annual income of all middle managers lies between
45,168.875 and 45,671.125.
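
The interval for the middle managers can be recomputed as follows. This is a minimal sketch for the large-sample case, where the critical value z = 1.96 for 95% confidence is taken from the normal table; the function name `z_confidence_interval` is an illustrative assumption.

```python
from math import sqrt


def z_confidence_interval(sample_mean, sample_sd, n, z):
    """Large-sample interval: sample_mean +/- z * sample_sd / sqrt(n)."""
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


low, high = z_confidence_interval(45420, 2050, 256, z=1.96)
print(round(low, 3), round(high, 3))
```

For a small sample with unknown variance (Case 3) the same function could be reused by passing the tabulated t value with n - 1 df in place of z.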
Exercise:

1) The manufacturer of a certain type of battery is trying to estimate the lifetime of the battery.
He believes each battery will last for a random amount of time that has a N (μ, 100) distribution.
(The lifetimes are measured in hours.) He carries out an experiment to estimate μ. A sample of
400 batteries is tested and their lifetimes are measured. The (sample) mean lifetime is found to
be 74.2 hours. Calculate a 95% confidence interval for μ. How do you interpret this interval?

2) A biostatistician intends to estimate μ, the mean blood pressure of women between the ages
of 45 and 50. She takes a random sample of 20 women and measures their blood pressure.
Based on past experience she believes the measurements will follow a
N(μ, 100) distribution. (Measurements are in mm mercury.) Suppose she discovers the sample
mean is equal to 136.9 mm mercury. Find a 95% confidence interval for μ.

3) A biologist measured a random sample of 12 fossil skeletons of an extinct species of bird. He


found that their skulls had a mean length of 6.34cm and a standard deviation of 0.45cm. He
believes that the lengths of the skulls follow a normal distribution. Use the data to obtain a 95%
confidence interval for the mean of this distribution.

4) A sports scientist takes a random sample of 17 athletes and asks them to run 5km on a
treadmill. Their heart rates are measured before the start of the run and five minutes after the
finish. The increases in heart rates are measured and are shown below.
53 45 71 74 65 83 47 56 61 74 61 72 54 43 72 65 54
Increase in heart rates (beats per minute)
(i) Calculate the mean and standard deviation of the data.
(ii) The sport scientist wanted to estimate μ, the mean increase in heart rate. Find a
point estimate for μ and construct a 95% confidence interval for it. What
assumptions do you need to make about the population for this interval to be
valid?

8.2 Hypothesis testing


A statistical hypothesis is a statement about a population parameter, and hypothesis testing is a
procedure based on sample evidence and probability theory to determine whether the hypothesis is a
reasonable statement.

There are five steps in testing hypothesis:

Step 1: state the null hypothesis (H0) and alternative hypothesis (H1)

Null hypothesis: a statistical hypothesis stated for the purpose of testing its validity. It is
denoted by H0, where H stands for hypothesis and 0 for no difference. Accepting H0 is not
sufficient to conclude that it is indeed true. It is better to say that H0 has not been shown to be false.

Alternative hypothesis: A statement that is accepted if the sample data provide enough evidence
that H0 is false.

The alternative hypothesis may be stated in either of the following two ways:

1. In a two tailed test H1 does not state a direction:
$H_0: \mu_X = \mu_0$ vs $H_1: \mu_X \ne \mu_0$
2. If our interest is to know whether the value of $\mu_X$ has increased or decreased compared to
the hypothesized mean value $\mu_0$, we use a one tailed test, because H1 states a direction:
(i) When $\mu_X$ is expected to have increased,
$H_0: \mu_X = \mu_0$ vs $H_1: \mu_X > \mu_0$
(ii) When $\mu_X$ is expected to have decreased,
$H_0: \mu_X = \mu_0$ vs $H_1: \mu_X < \mu_0$

Step 2: Select the level of significance

Level of significance is the probability of rejecting the null hypothesis when it is true. It is
usually denoted by α.

Step 3: Computing the test statistic

There are many test statistics: Z, t, F and $\chi^2$. A test statistic is a value calculated from the
sample information which is used to decide whether to accept or reject H0.

For example, in hypothesis testing of a population mean the test statistic Z is computed
as $Z = \frac{\bar{x} - \mu_0}{\sigma_X/\sqrt{n}}$ (when the sample size is large and $\sigma_X$ is known). Similarly the test statistic t is
computed as $t_{calc} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$ (when the sample size is small and $\sigma_X$ is unknown).

Step 4: formulating the decision rule

Decision rule: a statement of the conditions under which H0 is rejected or accepted.

Critical value: the dividing point between the region where the null hypothesis is rejected and the
region where it is not rejected.

Step 5: Make a decision: reject or do not reject H0 according to whether the calculated test
statistic lies in the rejection region at level of significance α.

8.2.1 Hypothesis test for large sample


One tailed test and two tailed test of significance

Two tailed test

Example: The Jamestown Steel Company manufactures and assembles desks and other office
equipment at several plants in western New York. The weekly production of the model A325 desk
at the Fredonia plant is normally distributed with mean 200 and standard deviation 16. Recently, due
to market expansion, a new production method has been introduced and new employees have been
hired, and the mean production of the last 50 weeks is 203.5. The vice president of the company
would like to investigate whether there has been an overall change in the weekly production of the
model A325 desk. Is the mean production different from 200? Use the 0.01 level of significance.

Solution:

Step 1: $H_0: \mu_X = 200$ vs $H_1: \mu_X \ne 200$

Step 2: α = 0.01 (the probability of committing a type I error)

Step 3: Test statistic

$Z = \frac{\bar{x} - \mu_0}{\sigma_X/\sqrt{n}} = \frac{203.5 - 200}{16/\sqrt{50}} = 1.55$

Step 4: Decision rule

Since this is a two tailed test, the area in each tail of the normal curve is α/2 = 0.01/2 = 0.005,
so 0.500 - 0.005 = 0.4950.

$P(0 < z < z_{\alpha/2}) = 0.4950$ implies $z_{0.005} = 2.58$

Accept H0 if Z lies between -2.58 and 2.58.

Step 5: Since 1.55 does not lie in the rejection region, H0 is accepted and we conclude that the
population mean is not different from 200, i.e. the production rate at the Fredonia plant has not changed.
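
The five steps of this two tailed test can be sketched in code. This is a minimal illustration using the figures stated in the example (sample mean 203.5, µ0 = 200, σ = 16, n = 50) and the tabulated critical value 2.58; the function and variable names are illustrative assumptions.

```python
from math import sqrt


def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample_mean - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


z = z_statistic(203.5, 200, 16, 50)
z_critical = 2.58                 # z value for alpha/2 = 0.005, read from the normal table
reject_h0 = abs(z) > z_critical   # two tailed: reject when Z falls in either tail

print(round(z, 2), reject_h0)
```

Comparing the absolute value of Z with the critical value is equivalent to checking whether Z lies outside the interval from -2.58 to 2.58.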

One tailed test

Example: A random sample of 200 senior school students produces a mean weight of 58 kg with
standard deviation 4 kg. Test the hypothesis that the mean weight of the population is greater than
60 kg. Use the 0.05 level of significance.

Solution:

i) $H_0: \mu_X = 60$ kg vs $H_1: \mu_X > 60$ kg

ii) α = 0.05

iii) Test statistic: $Z = \frac{\bar{x} - \mu_0}{\sigma_X/\sqrt{n}} = \frac{58 - 60}{4/\sqrt{200}} = -7.07$

iv) Critical region: 0.5 - 0.05 = 0.45, and $P(0 < z < z_{0.05}) = 0.4500$ implies $z_{0.05} = 1.65$.
Reject H0 if Z > 1.65.

v) Since Z = -7.07 is not greater than $z_{0.05} = 1.65$, H0 is not rejected. The sample does not
support the hypothesis that the mean weight of the senior school students is greater than 60 kg.

Note: for a large sample, $Z = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$ is used when σ is unknown, where S is the sample standard deviation.

8.2.2 Hypothesis testing for small sample size when σ is unknown


One tailed test

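Example: A consumer service agency examined a new automobile for its gasoline performance.
A random sample of 12 measurements of the kms covered per gallon under normal conditions resulted
in an average of 16 km/gallon with standard deviation 1.8 km. Does this result support the
manufacturer's claim that the new automobile covers more than 15 km/gallon? Use α = 0.10.

Solution:

1) $H_0: \mu_X = 15$ km/gallon vs $H_1: \mu_X > 15$ km/gallon

2) α = 0.10

3) Test statistic: $t_{calc} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = \frac{16 - 15}{1.8/\sqrt{12}} \approx 1.92$, with n = 12, $\bar{x}$ = 16, S = 1.8

4) Critical region: $t_{0.10} = 1.36$ with v = (n - 1) = 11 df. Reject H0 if $t_{calc} > 1.36$.

5) Conclusion: since $t_{calc} \approx 1.92$ is greater than 1.36, H0 is rejected, so the sample supports the
manufacturer's claim.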
Two tailed test

Example: The mean length of a small counterbalance bar is 43 mm. There is a concern that the
adjustment of the machine producing the bars has changed. Test the null hypothesis that there is no
change in the mean length ($\mu_X = 43$) at the 0.02 level of significance. Twelve bars are randomly
selected and their lengths in mm are 42, 39, 42, 45, 43, 40, 39, 41, 40, 42, 43, 42.

Solution:

1) $H_0: \mu_X = 43$ vs $H_1: \mu_X \ne 43$

2) α = 0.02

3) $t_{calc} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = -2.92$, with n = 12, $\bar{x} = \frac{\sum x_i}{n} = 41.5$ and $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = 1.78$

4) Critical value: accept H0 if $-t_{\alpha/2} < t < t_{\alpha/2}$, where $t_{0.02/2} = t_{0.01} = 2.718$ with 11 df

5) Conclusion: the computed t (-2.92) lies beyond the critical value of -2.718, so based on the sample
result we reject H0 and conclude that the machine is out of adjustment.
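
The sample mean, sample standard deviation and t statistic for the twelve bars can be recomputed from the raw data. This is a minimal sketch using Python's `statistics` module, whose `stdev` function uses the n - 1 divisor as in the formula for S; the critical value 2.718 is taken from the t table and the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

lengths = [42, 39, 42, 45, 43, 40, 39, 41, 40, 42, 43, 42]  # bar lengths in mm
mu0 = 43                                                     # hypothesized mean length

n = len(lengths)
x_bar = mean(lengths)
s = stdev(lengths)                      # sample standard deviation (n - 1 divisor)
t_calc = (x_bar - mu0) / (s / sqrt(n))

t_critical = 2.718                      # t value for alpha/2 = 0.01 with 11 df, from the table
reject_h0 = abs(t_calc) > t_critical    # two tailed test

print(x_bar, round(s, 2), round(t_calc, 2), reject_h0)
```

Without intermediate rounding the statistic is about -2.91; the hand calculation gives -2.92 because S is rounded to 1.78 first. The decision is the same either way.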

8.2.3 Chi-square test


It has three tests:
(i) Tests of goodness of fit (goodness of fit test)
(ii) Tests of independence (test of association of attributes)
(iii) Tests of homogeneity
Test statistic: the above three tests have the same test statistic $\chi^2$, where
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, $O_i$ = observed class frequency, $E_i$ = expected class frequency,

and $\sum_i O_i = \sum_i E_i = n$, since both sets of values are obtained from a single sample.

Note: if $O_i = E_i$ for all k classes, then $\chi^2$ is zero, and $\chi^2$ is maximum when the $O_i$ are
concentrated in one particular class.
Table value: the theoretical $\chi^2$ value, denoted by $\chi^2_{\alpha}$, for a given number of degrees of freedom v
is the value beyond which the right tail area under the $\chi^2$ curve is α = 0.1, 0.05, 0.025 etc. In terms
of probability, $P(\chi^2 > \chi^2_{\alpha}) = \alpha$. For a contingency table v = (c - 1)(r - 1), where 'c' stands for
the number of columns and 'r' stands for the number of rows.
Decision rule: We reject H0 at level of significance α with v degrees of freedom when $\chi^2 > \chi^2_{\alpha}$;
otherwise H0 is accepted.
Contingency table: when sample data are classified according to two attributes, the observed
and expected frequencies displayed in 'r' rows and 'c' columns are called a contingency
table.

(i) Goodness of fit test- it enables us to determine how good a fit is between the observed
frequencies and the corresponding expected frequencies. It is concerned with
multinomial population where population and sample are distributed into two or more
classes according to a single attribute and p.d. is hypothesized.
Example: A random sample of 100 families with four children each disclosed the following data.
Female births        0     1     2     3     4
Number of families   5     25    40    20    10
Verify at α = 0.05 whether these data are consistent with the hypothesis that male and female
births are equally likely.
Solution:
Hypothesis H0: female births and male births are equally likely.
H1: female births and male births are not equally likely.
α = 0.05
Test statistic: obtaining the $E_i$ from the binomial distribution with n = 4 and probability of a
female birth p = 1/2 (q = 1 - 1/2 = 1/2), we have
$E_i$    6.25   25    37.5   25    6.25
$O_i$    5      25    40     20    10

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 3.667$

Decision rule: for v = 4 degrees of freedom reject H0 in favour of H1 if $\chi^2 > \chi^2_{\alpha}$,
where $\chi^2_{0.05} = 9.49$.
Conclusion: since $\chi^2 = 3.667$ is less than $\chi^2_{0.05} = 9.49$, H0 is accepted. This means that the sample
is consistent with the hypothesis that female and male births are equally likely.
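
The expected frequencies and the test statistic for the family data can be recomputed as follows. This is a minimal sketch: the expected counts come from the binomial distribution with n = 4 and p = 1/2, the critical value 9.49 is taken from the chi-square table, and the variable names are illustrative assumptions.

```python
from math import comb

observed = [5, 25, 40, 20, 10]   # families with 0, 1, 2, 3, 4 female births
n_families = sum(observed)

# Expected counts under H0: number of girls among 4 children is Binomial(4, 1/2).
expected = [n_families * comb(4, k) * 0.5**4 for k in range(5)]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi_critical = 9.49              # chi-square table value for alpha = 0.05, v = 4 df
reject_h0 = chi_sq > chi_critical

print(expected, round(chi_sq, 3), reject_h0)
```

The degrees of freedom are the number of classes minus one, which is 5 - 1 = 4 here.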

(ii) Tests of independence: enable to determine whether or not the attributes are statistically
independent. The test is applied when population and sample are classified according
to two attribute. More over the p.d. of the classification is not known
(ColumnMa rg inalfrequency )( RowM arg inalFrequency )
Expected frequency of any cell=
TotalFrequency
Example: The Employment Bureau located in a city received 200 applications in the month of
June, 1987 for registration. A tabular presentation of the applications according to sex and level
of education emerges as under.

Educational level Sex Total


Male Female
Undergraduates 30(24) 10 40
Graduates 70 20 90
Post graduates 20 50(28) 70
Total 120 80 200

Do these data provide evidence at   0.05 to indicate that the level of education is related to
sex?
Solution:
Hypothesis: H0: the level of education is not related to sex.
H1: the level of education is related to sex.
$\alpha = 0.05$
Test statistic:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(30 - 24)^2}{24} + \dots + \frac{(50 - 28)^2}{28} = 44.4,$$
where
$$E_i = \frac{(\text{column marginal frequency})(\text{row marginal frequency})}{\text{total frequency}}, \quad E_1 = \frac{40(120)}{200} = 24, \ \dots, \ E_6 = 28.$$
Decision rule: For v = 2 degrees of freedom, H0 is rejected in favour of H1 if $\chi^2 > \chi^2_{0.05}$, where $\chi^2_{0.05} = 5.99$.
Conclusion: As $\chi^2 = 44.4 > \chi^2_{0.05} = 5.99$, H0 is rejected. Thus, the sample data indicate that level
of education is related to sex.
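
The expected frequencies and the test statistic for the employment example can be reproduced from the observed counts alone. The following is a sketch under the assumption that the table is stored as a list of rows; the helper names `expected_frequencies` and `chi_square_independence` are hypothetical.

```python
# Observed counts from the employment example: rows are education levels,
# columns are (male, female).
observed = [
    [30, 10],  # Undergraduates
    [70, 20],  # Graduates
    [20, 50],  # Postgraduates
]


def expected_frequencies(table):
    """Expected cell count = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return the chi-square statistic and its degrees of freedom."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


expected = expected_frequencies(observed)
chi2, dof = chi_square_independence(observed)
print(f"chi-square = {chi2:.1f} with {dof} degrees of freedom")
```

The statistic is then compared with the tabulated critical value (5.99 in the solution above); the script does not look that value up.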
(iii) Tests of homogeneity: When two or more samples are drawn from the same population or
from different populations, it is of great interest to know whether the samples have
come from the same population. Equivalently, it means verifying whether the data obtained
from different samples are homogeneous (that is, whether two or more sets of sample
data are similar). The use of the $\chi^2$ test statistic for verifying the homogeneity of two or
more sets of sample data constitutes the $\chi^2$ test of homogeneity. H0: attributes are
homogeneous vs. H1: attributes are different.
Example: Suppose a survey is conducted to learn the views of different strata of people on the
new industrial policy, which aims at seeking greater participation of the private sector. Four
different samples consisting of 100 professionals, 120 businessmen, 110 farmers, and 150
students are selected. Each person is asked to indicate whether he/she is in favour of, against, or
indifferent to the new industrial policy. The results are shown in the table below (expected
frequencies in parentheses). Based on the sample results, evaluate whether there is a significant
difference in the views of the four categories of persons.

Those who are       Professionals   Businessmen   Farmers     Students    Total
In favour (F)       50 (39.6)       70 (47.5)     20 (43.5)   50 (59.4)   190
Against (A)         40 (43.8)       30 (52.5)     60 (48.1)   80 (65.6)   210
Indifferent (I)     10 (16.6)       20 (20.0)     30 (18.4)   20 (25.0)   80
Sample size (n_i)   100             120           110         150         480

Solution:
Hypothesis: H0: the views of the four categories of persons on the new industrial policy are similar
(homogeneous).
H1: not H0
$\alpha = 0.05$
Test statistic:
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(50 - 39.6)^2}{39.6} + \dots + \frac{(20 - 25)^2}{25} = 54.57$$
Decision rule: reject H0 if $\chi^2 > \chi^2_{0.05}$ at $v = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6$ degrees of freedom.

Conclusion: Since $\chi^2 = 54.57 > \chi^2_{0.05} = 12.59$, H0 is rejected. It means there is a significant
difference in the views of the four categories of people on the new industrial policy, i.e. the four
samples do not come from the same population (they are not homogeneous).

CHAPTER 9

Simple Linear Regression and Correlation Analysis

10.1. Simple Linear Regression Analysis

Regression is concerned with bringing out the nature of the relationship between variables and using
it to find the best approximate value of one variable corresponding to a known value of the other variable.

Simple linear regression deals with the method of fitting a straight line (the regression line) to sample
data on two variables, expressed as an equation, so that if the value of one variable is given we can
predict the value of the other variable.

In other words, if we have two variables under study, one may represent the cause and the other
may represent the effect. The variable representing the cause is known as the independent (predictor
or regressor) variable and is usually denoted by X. The variable representing the effect is
known as the dependent (predicted) variable and is usually denoted by Y. If the relationship
between the two variables is a straight line, it is known as simple linear regression.

When there are more than two variables and one of them is assumed to be dependent upon the
others, the functional relationship between the variables is known as multiple linear regression.

Scatter diagram: a plot of all ordered pairs (x, y) on the coordinate plane, which is needed to
discover whether the relationship between two variables is indeed best explained by a straight line.

Example:
Advertising budget (X)   5   6   7   8    9    10   11
Profit (Y)               8   7   9   10   13   12   13

[Figure: scatter diagram of profit (Y) against advertising budget (X)]

If we draw a line through the scatter diagram, the regression line is the one that passes through, or
as close as possible to, all of the points.

The simple linear regression of Y on X in the population is given by:

$$Y = \alpha + \beta X + \varepsilon$$

where
$\alpha$ = y-intercept
$\beta$ = slope of the line, or regression coefficient
$\varepsilon$ = the error term

The y-intercept $\alpha$ and the regression coefficient $\beta$ are the population parameters. We obtain the
estimates of $\alpha$ and $\beta$ from the sample. The estimators of $\alpha$ and $\beta$ are denoted by a and b,
respectively. The fitted regression line is thus

$$Y_e = a + bX$$

The above algebraic equation is known as a regression line. The method of finding such a
relationship is known as fitting regression line. For each observed value of the variable X, we
can find out the value of Y. The computed values of Y are known as the expected values of Y
and are denoted by Ye.

The observed values of Y are denoted by Y. The difference between the observed and the
expected values, $Y - Y_e$, is known as the error or residual, and is denoted by e. The residual can be
positive, negative or zero.
A best-fitting line is one for which the sum of squares of the residuals, $\sum e^2$, is a minimum. For
this purpose the principle called the method of least squares is used.

According to the principle of least squares, one would select a and b such that
$\sum e^2 = \sum (Y - Y_e)^2$ is a minimum, where $Y_e = a + bX$.
To minimize this function, we first take the partial derivatives of $\sum e^2$ with respect to a and b.
The partial derivatives are then equated to zero separately. This results in the following
normal equations:
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
Solving these normal equations simultaneously gives the values of a and b:
$$b = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$
Regression analysis is useful in predicting the value of one variable from the given values of
another variable.
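
As an illustration, the formulas for a and b can be applied to the advertising budget and profit data from the scatter diagram example. This is a minimal sketch; the function name `fit_line` and the variable names are illustrative assumptions, not part of the handout.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b from the normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n  # a = mean(y) - b * mean(x)
    return a, b


budget = [5, 6, 7, 8, 9, 10, 11]    # advertising budget (X)
profit = [8, 7, 9, 10, 13, 12, 13]  # profit (Y)

a, b = fit_line(budget, profit)
print(f"Ye = {a:.3f} + {b:.3f} X")

# Predicted profit for a hypothetical budget of 12 (a value not in the sample).
predicted = a + b * 12
print(f"Predicted Y at X = 12: {predicted:.2f}")
```

A prediction far outside the observed range of X should be treated with caution, since the straight-line relationship is only supported by the sampled values.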

Example: A researcher wants to find out whether there is any relationship between the height of a son
and that of his father. He took a random sample of 6 fathers and their sons. The heights in inches are
given in the table below.
(i) Find the regression line of Y on X.
(ii) What would be the height of the son if his father's height is 70 inches?

Height of father (X)   63   65   66   67   67   68
Height of son (Y)      66   68   65   67   69   70

Solution: $\sum X = 396$, $\sum Y = 405$, $\sum X^2 = 26152$, $\sum XY = 26740$, $\sum Y^2 = 27355$

(i)
$$b = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{6(26740) - (396)(405)}{6(26152) - (396)^2} = \frac{60}{96} = 0.625$$
$$a = \bar{y} - b\bar{x} = \frac{\sum Y - b\sum X}{n} = \frac{405 - (0.625)(396)}{6} = 67.5 - 0.625(66) = 26.25$$
Therefore $Y_e = 26.25 + 0.625X$.
(ii) If X = 70, then
$Y_e = 26.25 + 0.625(70) = 70$; thus the predicted height of the son is 70 inches.

Standard error of estimate: measures the average amount by which the observed Y values
depart from the corresponding estimated $Y_e$ values (the dispersion of the observed values around the
regression line of Y on X).
$$S_{x.y} = \sqrt{\frac{\sum (y_i - y_{ei})^2}{n - 2}},$$
where $Y_e = a + bX$ is the fitted value and $y_i$ is the observed (actual) value of y.
Example: Given the observations (2, 2), (4, 5), (6, 4) and (8, 7), the regression line is
$Y_e = 1 + 0.7X$. Find the standard error of estimate of the regression line.
Solution:
$Y_{ei} = 1 + 0.7X_i$, i = 1, 2, 3, 4
$Y_{e1} = 1 + 0.7(2) = 2.4$
$Y_{e2} = 1 + 0.7(4) = 3.8$
$Y_{e3} = 1 + 0.7(6) = 5.2$
$Y_{e4} = 1 + 0.7(8) = 6.6$
$$S_{x.y} = \sqrt{\frac{\sum (y_i - y_{ei})^2}{n - 2}} = \sqrt{\frac{1}{2}\left[(2 - 2.4)^2 + \dots + (7 - 6.6)^2\right]} = \sqrt{1.6} \approx 1.26$$
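
The same calculation can be sketched in a few lines of code. The function name `standard_error_of_estimate` is a hypothetical helper; the intercept and slope are taken as given from the example rather than re-estimated.

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """Square root of the residual sum of squares divided by n - 2."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (len(xs) - 2))


xs = [2, 4, 6, 8]
ys = [2, 5, 4, 7]

# Intercept 1 and slope 0.7 are the values given in the worked example.
s = standard_error_of_estimate(xs, ys, a=1.0, b=0.7)
print(f"standard error of estimate = {s:.2f}")
```

The divisor is n - 2 rather than n because two quantities, a and b, are estimated from the sample.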

10.2 Simple Linear Correlation Analysis

The measure of the degree of relationship between two continuous variables is known as the
correlation coefficient. The population correlation coefficient is represented by $\rho$ and its
estimator by r. The correlation coefficient r is also called Pearson's correlation coefficient, since
it was developed by Karl Pearson. r is given as the ratio of the covariance of the variables x and
y to the product of the standard deviations of x and y. Symbolically,

$$r = \frac{\operatorname{Cov}(x, y)}{sd(x)\, sd(y)} = \frac{\dfrac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}}$$

$$= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

$$= \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$

The numerator is termed the sum of products of x and y, $SP_{xy}$. In the denominator, the first
term is called the sum of squares of x, $SS_x$, and the second term is called the sum of squares of y,
$SS_y$. Thus,

$$r = \frac{SP_{xy}}{\sqrt{SS_x\, SS_y}}$$

The correlation coefficient is always between -1 and +1, i.e., $-1 \le r \le 1$.


r = -1 implies perfect negative linear correlation between the variables under
consideration.
r = +1 implies perfect positive linear correlation between the variables under
consideration.
r = 0 implies there is no linear relationship between the two variables, but there could be a
non-linear relationship between them. In other words, when two variables are independent, r = 0, but
when r = 0, it is not necessarily true that the variables are independent.
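
The three cases can be illustrated numerically with the computational formula $r = SP_{xy} / \sqrt{SS_x SS_y}$. The data below are made up purely for illustration (they do not come from the handout), and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from the sum of products and sums of squares."""
    n = len(xs)
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sp_xy / math.sqrt(ss_x * ss_y)


xs = [-2, -1, 0, 1, 2]  # illustrative values only

r_positive = pearson_r(xs, [2 * x + 1 for x in xs])  # exact increasing line
r_negative = pearson_r(xs, [-3 * x for x in xs])     # exact decreasing line
r_quadratic = pearson_r(xs, [x * x for x in xs])     # y is fully determined by x

print(r_positive, r_negative, r_quadratic)
```

In the third case y depends entirely on x through $y = x^2$, yet the linear correlation is zero, which is why r = 0 does not by itself establish independence.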

[Figure: scatter diagrams illustrating perfect negative correlation (r = -1), perfect positive correlation (r = 1), and no correlation (r = 0)]

10.3 Coefficient of Determination ($R^2$)

The square of the correlation coefficient, $r^2$, is called the coefficient of
determination. It measures the proportion of the variation in the dependent variable Y that is
explained by variation in the independent variable X.

For example, if r = 0.8, then $r^2 = 0.64$. This means that, on the basis of the sample,
approximately 64% of the variation in the dependent variable, say Y, is explained by the variation
of the independent variable, say X. The remaining $1 - r^2$, i.e. 36% of the variation in Y, is unexplained by
variation in X. In other words, variables (factors) other than X could have caused the remaining
36% of the variation in Y.
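
The "explained variation" interpretation can be checked numerically. The sketch below reuses the four observations (2, 2), (4, 5), (6, 4), (8, 7) and the fitted line $Y_e = 1 + 0.7X$ from the standard error example in Section 10.1, and compares $r^2$ with one minus the ratio of residual to total variation. It relies on the standard identity $r^2 = 1 - SSE/SST$ for a least-squares line with an intercept; the variable names are illustrative.

```python
xs = [2, 4, 6, 8]
ys = [2, 5, 4, 7]
a, b = 1.0, 0.7  # fitted line from the Section 10.1 example
n = len(xs)

mean_y = sum(ys) / n
sst = sum((y - mean_y) ** 2 for y in ys)                      # total variation in Y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))     # unexplained variation
explained_share = 1 - sse / sst

# r^2 from the sum of products and sums of squares.
sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
r_squared = sp_xy ** 2 / (ss_x * ss_y)

print(f"1 - SSE/SST = {explained_share:.4f}, r^2 = {r_squared:.4f}")
```

Both quantities agree, so for this small data set roughly three quarters of the variation in Y is accounted for by the fitted line.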

Example: The research director of the Dubbary Saving and Loan Bank collected 24 observations
of mortgage interest rates X and the number of house sales Y at each interest rate. The director
computed that
$$\sum x_i = 276, \quad \sum y_i = 768, \quad \sum x_i^2 = 3300, \quad \sum y_i^2 = 25000, \quad \sum x_i y_i = 8690$$
Then compute (i) the coefficient of correlation and
(ii) the coefficient of determination.
Solution:

(i)
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{24(8690) - 276(768)}{\sqrt{\left[24(3300) - (276)^2\right]\left[24(25000) - (768)^2\right]}} = -0.61$$

(ii) Coefficient of determination: $R^2 = r^2 = (-0.61)^2 = 0.37$. This shows that 37% of the variation
in the number of house sales is explained by the variation in the interest rate.

Rank correlation: it is applied when quantification of some information is not possible, or the exact
magnitude is not ascertainable, and characteristics are instead expressed in comparative terms.
Examples: beauty, honesty, intelligence, etc.

Ranks may be assigned either by two persons on a single characteristic or by a single person on
two different characteristics.
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \qquad d_i = x_i - y_i$$
Example: Two judges gave the following ranks (highest to lowest) to 11 girls in a beauty contest.
To assess whether there is agreement between the independent rankings of the two judges, find the rank
correlation and interpret the agreement between the two judges.

Girl number          1   2   3   4   5   6    7    8    9    10   11
Ranking of judge A   3   4   1   2   5   10   11   7    9    8    6
Ranking of judge B   2   4   3   1   7   9    6    11   10   5    8

Solution: Construct a table of the differences of paired ranks $d_i$ and their squares $d_i^2$.

Girl no.   1   2   3    4   5    6   7    8    9    10   11
d_i        1   0   -2   1   -2   1   5    -4   -1   3    -2
d_i^2      1   0   4    1   4    1   25   16   1    9    4

Then, with $\sum_{i=1}^{11} d_i^2 = 66$,
$$r_s = 1 - \frac{6\sum_{i=1}^{11} d_i^2}{11(11^2 - 1)} = 1 - \frac{6(66)}{11(121 - 1)} = 1 - 0.3 = 0.7$$

$r_s = 0.7$ implies there is very good agreement between the judges with regard to the beauty of the girls.
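
The rank correlation for the two judges can be verified with a short script. This is a minimal sketch that assumes, as in the example, that there are no tied ranks; the function name `spearman_rs` is illustrative.

```python
def spearman_rs(ranks_a, ranks_b):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), for untied ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


judge_a = [3, 4, 1, 2, 5, 10, 11, 7, 9, 8, 6]
judge_b = [2, 4, 3, 1, 7, 9, 6, 11, 10, 5, 8]

sum_d2 = sum((a - b) ** 2 for a, b in zip(judge_a, judge_b))
rs = spearman_rs(judge_a, judge_b)
print(f"sum of squared differences = {sum_d2}, rs = {rs:.2f}")
```

Because only the squared differences enter the formula, the sign of each $d_i$ does not affect the result.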
