Probability
The word statistics is defined in different ways depending on its use in the plural and singular sense.
In the plural sense: - statistics is defined as the collection of numerical facts or figures (or the raw data themselves).
Eg. 1. Vital statistics (numerical data on marriages, births, deaths, etc.).
2. "The average mark of the Statistics course for students is 70%" would be considered statistics, whereas
"Abebe got 90% in the Statistics course" is not statistics.
Remark: Statistics are aggregates of facts. Single and isolated figures are not statistics, as they cannot be compared
and are unrelated.
In its singular sense:- the word Statistics is the subject that deals with the methods of collecting, organizing,
presenting, analyzing and interpreting statistical data.
Classification of Statistics
Statistics is broadly divided into two categories based on how the collected data are used.
Descriptive Statistics:- deals with describing the collected data without drawing further conclusions.
Example 1.1: Suppose that the mark of 6 students in Statistics course for COTM students is given as 40, 45, 50,
60, 70 and 80. The average mark of the 6 students is 57.5 and it is considered as descriptive statistics.
Inferential Statistics:- It deals with making inferences and/or conclusions about a population based on data
obtained from a sample of observations. It consists of performing hypothesis testing, determining relationships
among variables and making predictions.
Example 1.2: In the above example, if we say that the average mark in the Statistics course for all COTM students
is 57.5, then we are talking about inferential statistics (drawing a conclusion based on the sample observations).
1.3 Definition of some statistical terms
Population: - It is the totality of objects under study. The population represents the target of an investigation, and
the objective of the investigation is to draw conclusions about the population hence we sometimes call it target
population. The word population doesn’t necessarily refer to people.
Examples:- All clients of Telephone Company, Population of families, etc.
The population could be finite or infinite (an imaginary collection of units).
Sample: - is part or subset of population under study.
Sampling frame: - is the list of all possible units of the population from which the sample can be drawn.
Eg. List of all students of AASTU, List of all residential houses in A.A city, etc
Survey: - is an investigation of a certain population to assess its characteristics. It may be a census or a sample survey.
Census survey: a complete enumeration of the population under study.
Sample survey: the process of collecting data covering a representative part or portion of a population.
Parameter: - is a statistical measure of a population, or summary value calculated from a population. Examples:
Average, Range, proportion, variance, etc
Statistic: - is a descriptive measure of a sample, or it is a summary value calculated from a sample.
Sampling: - The process or method of sample selection from the population.
Sample size: - The number of elements or observation to be included in the sample.
An element: - is a member of a sample or population. It is a specific subject or object (for example a person, firm,
item, etc.) about which the information is collected.
Variable: - It is an item of interest that can take numerical or non-numerical values for different elements. It may
be qualitative or quantitative. Example: age, weight, sex, marital status, etc.
Observation (measurement):- is the value of a variable for an element.
Qualitative variables:- are variables that assume non-numerical values. They can be categorized and they are
usually called attributes. Example: - Sex, marital status, ID number, etc.
Quantitative variables: - are variables which assume numerical values. eg. Age, weight, etc.
1.4 Applications, uses and limitations of Statistics
Statistics can be applied in any field of study which seeks quantitative evidence. For instance, Engineering,
Economics, Natural Science, etc.
Engineering: Statistics have wide application in engineering.
• To compare the breaking strength of two types of materials.
• To determine the reliability of a product.
• To control the quality of products in a given production process.
• To compare the improvement of yield due to certain additives such as fertilizers, herbicides, etc.
Function/Uses of Statistics
The following are some uses of statistics:
• It condenses and summarizes a mass of data: the original set of data (raw data) is normally voluminous and
disorganized unless it is summarized and expressed in few presentable, understandable & precise figures.
• Statistics facilitates comparison of data: measures obtained from different sets of data can be compared to draw
conclusion about those sets. Statistical values such as averages, percentages, ratios, rates, coefficients, etc, are
the tools that can be used for the purpose of comparing sets of data.
• Statistics helps to predict future trends: statistics is very useful for analyzing the past and present data and
forecasting future events.
• Statistics helps to formulate & review policies
• Formulating and testing hypothesis: Statistical methods are extremely useful in formulating and testing
hypothesis and to develop new theories.
Limitations of Statistics
Some of these limitations are:
a) It does not deal with individual values: as discussed earlier, statistics deals with aggregates of facts. For
example, the wage earned by an individual worker at any one time, taken by itself, is not statistics.
b) It does not deal with qualitative characteristics directly: statistics is not applicable to qualitative
characteristics such as beauty, honesty, poverty, standard of living and so on since these cannot be expressed in
quantitative terms.
c) Statistical conclusions are not universally true: since statistics is not an exact science, as is the case with
natural sciences, the statistical conclusions are true only under certain assumptions.
d) It can be misused: statistics cannot be used to full advantage in the absence of proper understanding of the
subject matter.
1.5 Levels of Measurement
Proper knowledge about the nature and type of data to be dealt with is essential in order to specify and apply the
proper statistical method for their analysis and inferences.
Scale Types
Measurement is the assignment of values to objects or events in a systematic fashion. Four levels of measurement
scales are commonly distinguished: nominal, ordinal, interval, and ratio. The first two are qualitative while the
last two are quantitative.
Nominal scale: The values of a nominal attribute are just different names, i.e., nominal attributes provide only
enough information to distinguish one object from another. The qualities have no ranking or ordering and no
numerical or quantitative value. These types of data consist of names, labels and categories.
Example 1.3: Eye color: brown, black, etc, sex: male, female.
• In this scale, one is different from the other
• Arithmetic operations (+, −, ×, ÷) are not applicable, and ordering comparisons (<, >, etc.) are impossible.
Ordinal scale: - defined as nominal data that can be ordered or ranked.
• Can be arranged in some order, but the differences between the data values are meaningless.
• Data consisting of an ordering of ranking of measurements are said to be on an ordinal scale of
measurements. That is, the values of an ordinal scale provide enough information to order objects.
• One is different from and greater /better/ less than the other
• Arithmetic operations (+, -, *, ÷) are impossible, comparison (<, >, ≠, etc) is possible.
Example 1.4 -Letter grading (A, B, C, D, F), -Rating scales (excellent, very good, good, fair, poor), military
status (general, colonel, lieutenant, etc).
Interval Level: data are defined as ordinal data and the differences between data values are meaningful. However,
there is no true zero, or starting point, and the ratio of data values are meaningless. Note: Celsius & Fahrenheit
temperature readings have no meaningful zero and ratios are meaningless.
In this measurement scale:-
• One is different from, and greater/better/less than, another by a certain amount of difference.
• It is possible to add and subtract. For example: 80°C − 50°C = 30°C, 70°C − 40°C = 30°C.
• Multiplication and division are not possible. For example: 60°C = 3(20°C), but this does not imply that an
object at 60°C is three times as hot as an object at 20°C.
Most common examples are: temperature, IQ.
Ratio scale: Similar to interval, except there is a true zero (absolute absence), or starting point, and the ratios of
data values have meaning.
• Arithmetic operations (+, -, *, ÷) are applicable. For ratio variables, both differences and ratios are
meaningful.
• One is different/larger /taller/ better/ less by a certain amount of difference and so much times than the
other.
• This measurement scale provides better information than interval scale of measurement.
Example 1.5: weight, age, number of students.
CHAPTER TWO: METHODS OF DATA COLLECTION AND PRESENTATION
Data:- is a measurement or observation value recorded for a certain element or variable. It is the raw material of
statistics. It can be obtained either by measurement or by counting.
Sources of data
The statistical data may be classified under two categories depending up on the sources.
Primary data: - Data collected by the investigator himself for the purpose of a specific inquiry or study.
Three of the most common methods of collecting Primary data are:
• Telephone survey
• Mailed questionnaire
• Personal interview.
Secondary data: - When an investigator uses data which have already been collected by others, such data are
called secondary data. Examples of secondary data: books, reports, magazines, etc.
2.2 Methods of Data Presentation
The presentation of data is broadly classified in to the following two categories:
✓ Frequency distribution /Tabular presentation
✓ Diagrammatic and Graphic presentation.
2.2.1 Frequency distribution
Frequency:- is the number of times a certain value or class of values occurs.
Frequency distribution (FD):- is the organization of raw data in table form using classes and frequency.
Definition of some basic terms
• Grouped frequency distribution: is a FD when several numbers are grouped into one class.
• Class limits (CL): They separate one class from another. The limits are values that could actually appear in the
data, and there are gaps between the upper limit of one class and the lower limit of the next class.
• Unit of measure (U): This is the smallest possible difference between successive values, e.g. 1, 0.1, 0.01, …
• Class boundaries: Separate one class in a grouped frequency distribution from the other. The boundary has
one more decimal place than the raw data. There is no gap between the upper boundaries of one class and
the lower boundaries of the succeeding class. Lower class boundary is found by subtracting half of the unit
of measure from the lower class limit and upper class boundary is found by adding half unit measure to the
upper class limit.
• Class width (W): The difference between the upper and lower class boundaries of any class. The class width is
also the difference between the lower limits (or upper limits) of two consecutive classes.
• Class mark (Midpoint): It is found by adding the lower and upper class limits (or boundaries) and dividing the
sum by two.
• Cumulative frequency (CF): It is the number of observation less than the upper class boundary or greater
than the lower class boundary of class.
• CF (Less than type): it is the number of values less than the upper class boundary of a given class.
• CF (Greater than type): it is the number of values greater than the lower class boundary of a given class.
• Relative frequency (Rf ):The class frequency divided by the total frequency. This gives the percent of
values falling in that class.
• Rfi = fi/n= fi/∑fi
• Relative cumulative frequency (RCf): The class cumulative frequency divided by the total frequency
gives the percent of the values which are less than the upper class boundary or the reverse.
RCfi = Cfi/n= Cfi/∑fi
Example 2.4: The following data are on the number of minutes to travel from home to work for a group of
automobile workers: 28 25 48 37 41 19 32 26 16 23 23 29 36 31 26 21 32 25 31 43 35 42 38 33 28.
Construct a frequency distribution for this data.
Solution:
✓ Range = 48 – 16 =32
✓ K=1+3.322log10 25=5.64≈6
✓ W=32/6=5.33 rounding up to the nearest integer i.e W=6.
Let the lower limit of the first class be 16 then the frequency distribution is as follows:
Class limit Class boundaries Tally Frequency
16-21 15.5-21.5 \\\ 3
22-27 21.5-27.5 \\\\\ \ 6
28-33 27.5-33.5 \\\\\ \\\ 8
34-39 33.5-39.5 \\\\ 4
40-45 39.5-45.5 \\\ 3
46-51 45.5-51.5 \ 1
Total 25
The final frequency distribution is shown in table below.
Table: The distribution of the times
Time (in minute) Number of workers
16-21 3
22-27 6
28-33 8
34-39 4
40-45 3
46-51 1
Total 25
This frequency distribution is more understandable than the raw data. For instance, many observations are found
in the second class and third class. This in turn implies that many workers took around 22 to 33 minutes to travel
from home to work.
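For readers who want to check such a construction quickly, the steps above (range, number of classes by Sturges' rule, class width, tallying) can be scripted. The following Python sketch is only illustrative; the data and class limits are those of Example 2.4, and the variable names are ours:

```python
import math

# Travel times (in minutes) from Example 2.4
data = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36, 31, 26,
        21, 32, 25, 31, 43, 35, 42, 38, 33, 28]

n = len(data)
value_range = max(data) - min(data)            # 48 - 16 = 32
k = round(1 + 3.322 * math.log10(n))           # number of classes, about 6
w = math.ceil(value_range / k)                 # class width rounded up to 6

lower = min(data)                              # lower limit of the first class
for i in range(k):
    lo, hi = lower + i * w, lower + (i + 1) * w - 1   # class limits, e.g. 16-21
    freq = sum(lo <= x <= hi for x in data)           # class frequency
    print(f"{lo}-{hi}: {freq}")
```

Running this reproduces the frequencies 3, 6, 8, 4, 3, 1 found by tallying above.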
Types of frequency distributions
Based on the type of frequency assigned to the classes we have three types of frequency distributions:
➢ Absolute frequency distribution
➢ Relative frequency distribution
➢ Cumulative frequency distribution
The frequency distributions that we have seen in the previous examples (examples 2.3 and 2.4 ) are absolute
frequency distributions because the frequencies assigned are absolute frequencies.
Definition 2.1: A relative frequency distribution is a distribution which specifies the frequency of a class
relative to the total frequency.
Example 2.5: Convert the above absolute frequency distribution in example 2.4 to a relative frequency
distribution.
Solution: First we find the relative frequency of each class. The relative frequency of a class is the frequency of
the class divided by the total number of observations. For instance the relative frequency of the first class is
3/25=0.12, the relative frequency of the second class is 6/25=0.24, and so on. Thus, the relative frequency
distribution is shown in the table below.
Definition 2.2: Cumulative frequency refers to the number of observations that are below a specified value or
that are above a specified value.
Note: Class boundaries are mostly used to obtain cumulative frequencies. Based on whether the observations are
bounded from above or from below, we can have a cumulative less than or a cumulative more than frequency
distributions, respectively.
Example 2.6: Convert the absolute frequency distribution in example 2.4 into:
i) a cumulative less than frequency distribution.
ii) a cumulative more than frequency distribution.
Solution:
i) We use the class boundaries to form cumulative frequencies. For instance, there is no observation which
is less than 15.5, 3 observations are less than 21.5, 9 observations are less than 27.5 and so on. Thus,
the following less than cumulative frequency distribution is obtained.
Table: Distribution of the number of children.
Number of Children Frequency Relative frequency
2 5 .17
3 7 .23
4 8 .27
5 4 .13
6 1 .03
7 2 .07
8 3 .1
Total 30 1
As we can see from this frequency distribution most families have 4 or 3 children. It would have been difficult to
observe such feature of the data if we did not organize the raw data using a frequency distribution.
Note: Up to now we have seen frequency distributions for quantitative data; we can have also frequency
distributions for qualitative (categorical) data.
Frequency polygon: is a graphic form of a frequency distribution. It can be constructed by plotting the class
frequencies against class marks and joining them by a set of line segments.
Note: we should add two classes with zero frequencies at the two ends of the frequency distribution to complete
the polygon.
Example 2.10: Construct a frequency polygon for the frequency distribution of the time spent by the automobile
workers that we have seen in example 2.4.
Example 2.11: Draw a bar chart for the following coffee production data.
Table: Coffee productions from 1990 to 1995.
[Bar chart of coffee production by production year, 1990–1995; the vertical axis (production) runs from 0 to 120.]
Pie-chart: it is a circle divided by radial lines into sections or sectors so that the area of each sector is proportional
to the size of the figure represented.
Pie-chart construction:
✓ Calculate the percentage frequency of each component: (fi/n) × 100.
✓ Calculate the degree measure of each sector: (fi/n) × 360°.
✓ Draw the circle using a protractor and compass.
Example 2.13: Draw a pie-chart to represent the following data on a certain family expenditure.
Table: Family expenditure.
Item                    Food    Clothing  House rent  Fuel & light  Miscellaneous  Total
Expenditure (in birr)   50      30        20          15            35             150
Percentage frequency    33.33   20        13.33       10            23.33          100
Angle of the sector     120°    72°       48°         36°           84°            360°
[Pie chart of the family expenditure by item: food, clothing, house rent, fuel and light, miscellaneous.]
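The percentage frequencies and sector angles in the table can be reproduced with a few lines of code. A minimal Python sketch (the expenditure figures are those of Example 2.13):

```python
# Family expenditure (in birr) from Example 2.13
expenditure = {"Food": 50, "Clothing": 30, "House rent": 20,
               "Fuel & light": 15, "Miscellaneous": 35}

total = sum(expenditure.values())          # 150 birr
for item, f in expenditure.items():
    percent = f / total * 100              # percentage frequency of the component
    angle = f / total * 360                # degree measure of the sector
    print(f"{item}: {percent:.2f}% -> {angle:.0f} degrees")
```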
Example 2.14: The following data are the blood types of 50 volunteers at a blood plasma donation clinic:
O A O AB A A O O B A O A AB B O O O A B A A O A A B O B A O AB A O O A
B A A A O B O O A O A B O AB A O
a) Organize this data using a categorical frequency distribution
b) Present the data using both a pie and a bar chart.
Solution
a) The classes of the frequency distribution are A, B, O, AB. Count the number of donors for each of the blood
types.
Blood type   Frequency   Percent
A            19          38.0
B            8           16.0
O            19          38.0
AB           4           8.0
Total        50          100.0
b) Pie chart
Find the percentage of donors for each blood type. In order to find the angles of the sector for each blood
type, multiply the corresponding percentage by 360° and divide by 100.
[Pie chart and bar chart of the blood-type distribution (A, B, O, AB).]
UNIT THREE: MEASURES OF CENTRAL TENDENCY
Objectives:
Having studied this unit, you should be able to:
✓ understand the role of descriptive statistics in summarization, description and interpretation of data.
✓ use several numerical methods belonging to measures of central tendency to describe the characteristics
of a data set.
Note that the individual values of the distribution must have a tendency to cluster around an average. In view of
this requirement an average is also referred to as a measure of central tendency.
Objectives of measuring central tendency:
✓ To get one single value that describes the characteristics of the entire group.
✓ To facilitate comparison between different data sets.
The following summation notation is used in what follows:
∑_{i=1}^{n} x_i y_i = x_1y_1 + x_2y_2 + x_3y_3 + ⋯ + x_n y_n
∑_{i=1}^{n} x_i f_i = x_1f_1 + x_2f_2 + x_3f_3 + ⋯ + x_n f_n
Definition 3.2:
i) Let x_1, x_2, x_3, …, x_n be the values of the variable X. The simple arithmetic mean, denoted by x̄, is the sum
of these observations of X divided by the number of values:
x̄ = (x_1 + x_2 + x_3 + ⋯ + x_n)/n = (∑_{i=1}^{n} x_i)/n
ii) If the numbers x_1, x_2, x_3, …, x_k occur with frequencies f_1, f_2, f_3, …, f_k, respectively, then the mean can be
defined in a more compact form as
x̄ = (x_1f_1 + x_2f_2 + x_3f_3 + ⋯ + x_kf_k)/(f_1 + f_2 + f_3 + ⋯ + f_k) = (∑_{i=1}^{k} f_i x_i)/(∑_{i=1}^{k} f_i)
Note that if the data refers to a population data the mean is denoted by the Greek letter µ (read as mu).
Arithmetic mean for raw data (ungrouped data)
Example 3.1: The following data is the weight (in Kg) of eight youths: 32,37,41,39,36,43,48 and 36. Calculate
the arithmetic mean of their weight.
Solution:
x̄ = (∑_{i=1}^{8} x_i)/n = (32 + 37 + 41 + ⋯ + 36)/8 = 312/8 = 39
Example 3.2: The ages of a random sample of patients in a given hospital in Ethiopia is given below:
Age 10 12 14 16 18 20 22
Number of patients 3 6 10 14 11 5 4
Calculate the average age of these patients.
Solution:
Age (xi) Number of patients (fi) (𝑓𝑖 𝑥𝑖 )
10 3 30
12 6 72
14 10 140
16 14 224
18 11 198
20 5 100
22 4 88
Total 53 852
x̄ = (∑_{i=1}^{k} f_i x_i)/(∑_{i=1}^{k} f_i) = (x_1f_1 + x_2f_2 + x_3f_3 + ⋯ + x_kf_k)/(f_1 + f_2 + f_3 + ⋯ + f_k)
= (10×3 + 12×6 + 14×10 + 16×14 + 18×11 + 20×5 + 22×4)/(3 + 6 + 10 + 14 + 11 + 5 + 4)
= (30 + 72 + 140 + 224 + 198 + 100 + 88)/53 = 852/53 = 16.075
Thus the mean age of these patients is 16.075.
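The same calculation is easy to script. A minimal Python sketch using the age data of Example 3.2:

```python
# Ages and frequencies of the patients (Example 3.2)
ages = [10, 12, 14, 16, 18, 20, 22]
freqs = [3, 6, 10, 14, 11, 5, 4]

# Mean of a frequency distribution: sum(f * x) / sum(f)
mean = sum(f * x for x, f in zip(ages, freqs)) / sum(freqs)
print(round(mean, 3))   # 16.075
```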
The weighted arithmetic mean
In some cases the data in the sample or population should not be weighted equally; rather, each value should be
weighted according to its importance. There is a measure of average for such problems known as the weighted arithmetic mean.
Weighted arithmetic mean is used to calculate the average when the relative importance of the observations
differs. This relative importance is technically known as weight. Weight could be a frequency or numerical
coefficient associated with observations.
Example 3.3: The GPA or CGPA of a student is a good example of a weighted arithmetic mean. Suppose that
Solomon obtained the following grades in the first semester of the freshman program at Jima University in 2000.
Course Credit hour (wi) Grade
Math101 4 A=4
Bio101 3 C=2
Chem101 3 B=3
Phys101 4 B=3
Flen101 3 C=2
Find the GPA of Solomon.
x̄_w = GPA = (4(4) + 3(2) + 3(3) + 4(3) + 3(2))/(4 + 3 + 3 + 4 + 3) = 49/17 = 2.88
Example 3.4: In a vacancy for a position of botanist in an organization, the criteria of selection were work
experience, entrance exam, and, interview result. The relative importance of these criteria was regarded to be
different. The weights of these criteria and the scores obtained by 3 candidates (out of 100 in each criterion) are
given in the following table. In addition, the selection of a candidate is based on average result on these criteria.
Criterion Weight Candidates
Tesfaye Gutema Kedir
Work experience 4 70 89 85
Entrance exam 3 78 83 89
Interview result 2 90 92 90
Who is the appropriate candidate for the position based on the criteria?
Solution: We use the weighted mean since the relative importances of these criteria are different.
Criterion Weight Candidates
Tesfaye Gutema Kedir
xi xiwi xi xiwi xi xiwi
Work experience 4 70 280 89 356 85 340
Entrance exam 3 78 234 83 249 89 267
Interview result 2 90 180 92 184 90 180
Total 9 238 694 264 789 264 787
The weighted mean and the simple arithmetic mean for the applicants are as follows:
Applicant Tesfaye Gutema Kedir
Weighted mean 694/9=77.11 789/9=87.67 787/9=87.44
Simple arithmetic mean 238/3=79.33 264/3=88 264/3=88
If we use the simple arithmetic mean of the scores, both Gutema and Kedir have got equal chances to be recruited.
However, the relative importance of the criteria is different. So we have to use the weighted mean for
discriminating among the candidates. The weighted mean of the scores obtained by Gutema is larger than the
others. So Gutema should be recruited for the job.
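The contrast between the weighted and the simple arithmetic mean can be seen directly in code. A short illustrative Python sketch with the scores of Example 3.4:

```python
weights = [4, 3, 2]   # work experience, entrance exam, interview result

candidates = {"Tesfaye": [70, 78, 90],
              "Gutema":  [89, 83, 92],
              "Kedir":   [85, 89, 90]}

for name, scores in candidates.items():
    weighted = sum(w * x for w, x in zip(weights, scores)) / sum(weights)
    simple = sum(scores) / len(scores)
    print(f"{name}: weighted mean = {weighted:.2f}, simple mean = {simple:.2f}")
```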
Properties of arithmetic mean
i. It can be computed for any set of numerical data; it always exists and is unique.
ii. It depends on all observations.
iii. The sum of deviations of the observations about the mean is zero i.e.
∑_{i=1}^{n} (x_i − x̄) = 0
iv. It is greatly affected by extreme values.
v. It lends itself to further statistical treatment, for instance, combinations of means.
vi. It is relatively reliable, i.e. it is not greatly affected by fluctuations in sampling.
vii. The sum of squares of deviations of all observations about the mean is the minimum
i.e. ∑_{i=1}^{n} (x_i − x̄)² ≤ ∑_{i=1}^{n} (x_i − A)² for any constant A.
1.3.2 Geometric mean
Definition 3.4: The geometric mean of any n positive numbers is the nth root of the products of the numbers.
Symbolically if 𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛 are given their geometric (G.M) mean is given by
G.M = (x_1 · x_2 · x_3 · … · x_n)^{1/n} = (∏_{i=1}^{n} x_i)^{1/n}
Median for raw data
i. If the number of observations, say n, is odd, then the median is equal to the ((n + 1)/2)th observation of
the array.
ii. If the number of observations n is even, then the median is equal to the sum of the (n/2)th observation and
the (n/2 + 1)th observation divided by two.
Notation: If X is the variable under consideration, then 𝑥̃ is used to denote the median.
Example 3.9: Find the median for the following sets of data:
i. 10 5 7 9 6 5 4
Solution: First arrange the data in the form of an array.
4 5 5 6 7 9 10
Here we have n = 7, which is odd.
Therefore, the median x̃ = the ((n + 1)/2)th observation = the 4th observation = 6.
ii. 10 5 7 9 6 5 4 8
Solution: Arrange the data in ascending order.
4 5 5 6 7 8 9 10
Here n = 8, which is even.
Therefore, x̃ = [(n/2)th observation + (n/2 + 1)th observation]/2
= (4th observation + 5th observation)/2 = (6 + 7)/2 = 6.5
iii. A shop keeper (sales person) recorded the number of video cassette recorders (VCRs) sold per month
over a two year period. Find the median number VCRs sold.
Number of sets sold Frequency ( months) Cumulative frequency
1 3 3
2 8 11
3 5 16
4 4 20
5 2 22
6 1 23
7 1 24
Solution: Here n = 24, which is even, so the median is the average of the 12th and 13th observations. From the
cumulative frequencies, both lie at the value 3, so the median number of VCRs sold is x̃ = (3 + 3)/2 = 3.
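The two rules for the median of raw data translate directly into code. A minimal Python sketch, checked against parts i and ii of Example 3.9:

```python
def median(values):
    """Median of raw (ungrouped) data."""
    arr = sorted(values)                       # form the array
    n = len(arr)
    mid = n // 2
    if n % 2 == 1:                             # odd n: the ((n + 1)/2)th observation
        return arr[mid]
    return (arr[mid - 1] + arr[mid]) / 2       # even n: average of the two middle values

print(median([10, 5, 7, 9, 6, 5, 4]))          # 6
print(median([10, 5, 7, 9, 6, 5, 4, 8]))       # 6.5
```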
The mode for raw data
Example 3.10: Find the modal value for the following sets of data.
i. 5 6 5 8 7 4 . In this data set, 5 is the most frequent value. Therefore, the mode is 5. Since the modal
value is only one number, we call the distribution unimodal.
ii. 1 2 3 4 8 2 5 4 6. In this data, the modal values are 2 and 4 since both 2 and 4 appear most frequently
and they occur an equal number of times. These kinds of distributions are called bimodal distributions.
iii. 1 2 4 3 5 6 8 7 In this data set, all values appear equal number of times so there is no modal
value.
Note:
✓ If a distribution has more than two modal values then we call the distribution multimodal.
✓ If in a set of observed values, all values occur once or equal number of times, there is no mode.
✓ The mode is also useful in finding the most typical case when the data are nominal or categorical.
Example 3.11: A survey showed the following distribution for the number of students enrolled in each field. Find
the mode.
Subject Number of students
Business 850
Liberal arts 825
Computer sciences 645
Education 478
General studies 100
Solution: Since the category with the highest frequency is business, the most typical case is a business major.
Properties of modal value
➢ It is easy to calculate and understand.
➢ It is not affected by extreme values.
➢ It is ill-defined, indeterminate and indefinite sometimes.
➢ It is not based on all observations.
➢ It is not used in further analysis of data.
The mean, median, and mode of grouped data
The mean for grouped data can be found by assuming that the values in each interval are centered at the mid-point
(class mark) of the interval.
Example 3.12: Consider the frequency distribution of the time spent by the automobile workers. Find the mean
time spent by these workers from this frequency distribution.
Time (in minute) Class mark (xi) Number of workers f×xi
15.5- 21.5 18.5 3 55.5
21.5-27.5 24.5 6 147
27.5-33.5 30.5 8 244
33.5-39.5 36.5 4 146
39.5-45.5 42.5 3 127.5
45.5-51.5 48.5 1 48.5
Total 25 768.5
Solution:
x̄ = (∑_{i=1}^{6} f_i x_i)/(∑ f_i) = 768.5/25 = 30.74
Note: In case of grouped data if any class interval is open, arithmetic mean cannot be calculated.
The median for grouped data can be approximated by the following formula.
x̃ = L_m + ((n/2 − cf)/f_m) · w
where Lm= lower class boundary for the median class.
n= total number of observations in the distribution.
cf= less than cumulative frequency for the class preceding the median class.
w= class width for median class.
fm=frequency for median class.
Note that the median class is the class containing the (n/2)th observation, i.e., the first class whose less-than
cumulative frequency is greater than or equal to n/2.
Example 3.13: Find the median for the following frequency distribution.
Class boundaries Frequency (f) Cumulative frequency
5.5-10.5 1 1
10.5-15.5 2 3
15.5-20.5 3 6
20.5-25.5 5 11
25.5-30.5 4 15
30.5-35.5 3 18
35.5-40.5 2 20
Solution: The class containing the (n/2)th observation, i.e., the 10th observation, is the median class. This class has
class boundaries 20.5 and 25.5 (the 4th class).
x̃ = L_m + ((n/2 − cf)/f_m) · w = 20.5 + ((10 − 6)/5) × 5 = 24.5
Therefore, the median is 24.5.
Note:
i. We approximate the median by assuming that the values in the median class are evenly distributed.
ii. We can compute the median for open-ended frequency distribution as long as the middle value does
not occur in the open-ended class.
The mode for grouped data can be estimated by the following formula.
The modal value is denoted by 𝑥̂. For grouped data we can compute the mode as follows:
x̂ = L_1 + ((f_1 − f_0)/(2f_1 − f_0 − f_2)) · w
where f1= frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class next to the modal class
L1= lower class boundary of the modal class
W = class width of the modal class
Note: The modal class is the class with the highest frequency.
Example 3.14: Calculate the modal time spent by the automobile workers.
Time (in minute) Number of workers
15.5- 21.5 3
21.5-27.5 6
27.5-33.5 8
33.5-39.5 4
39.5-45.5 3
45.5-51.5 1
The modal class is the class with largest frequency and it is the third class.
x̂ = 27.5 + ((8 − 6)/(16 − 6 − 4)) × 6 = 29.5
Therefore, the modal time spent is 29.5 minutes.
Note: The mode can be calculated for distributions with open ended classes.
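Both interpolation formulas are straightforward to implement. The following Python sketch reproduces Examples 3.13 and 3.14 (class boundaries and frequencies as given there):

```python
def grouped_median(boundaries, freqs):
    """Median of a grouped frequency distribution by interpolation."""
    n = sum(freqs)
    cum = 0                                     # cumulative frequency before current class
    for (lo, hi), f in zip(boundaries, freqs):
        if cum + f >= n / 2:                    # median class reached
            return lo + (n / 2 - cum) / f * (hi - lo)
        cum += f

def grouped_mode(boundaries, freqs):
    """Mode of a grouped frequency distribution."""
    i = freqs.index(max(freqs))                 # modal class = highest frequency
    lo, hi = boundaries[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0           # frequency of the preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # frequency of the succeeding class
    return lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

b = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
     (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
print(grouped_median(b, [1, 2, 3, 5, 4, 3, 2]))    # 24.5 (Example 3.13)

t = [(15.5, 21.5), (21.5, 27.5), (27.5, 33.5), (33.5, 39.5),
     (39.5, 45.5), (45.5, 51.5)]
print(grouped_mode(t, [3, 6, 8, 4, 3, 1]))         # 29.5 (Example 3.14)
```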
Deciles are nine points which divide an array into 10 parts in such a way that each part contains equal number
of elements. The 1st, 2nd,…, and the 9th points are known as the 1st, 2nd,…, and the 9th deciles and are usually
denoted by D1,D2,…,D9, respectively.
Percentiles are 99 points which divide an array into 100 parts in such a way that each part consists of equal
number of elements. The 1st, 2nd… and the 99th points are known as the 1st, 2nd… and the 99th percentiles and
are usually denoted by P1, P2… P99, respectively.
Note: The array should be in ascending order in order to get the quantiles.
i. Quantile points for raw data
First form an array in an ascending order and then apply the following procedure.
Q_k = the value at the {k(n + 1)/4}th position
D_k = the value at the {k(n + 1)/10}th position
P_k = the value at the {k(n + 1)/100}th position
Example 3.15: The following data relate to sizes of shoes sold at a stock during a week. Find the quartiles, the
seventh decile and the 90th percentile.
Size of shoes 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5
Number of pairs 2 5 15 30 60 40 23 11 4 1
Solution: The total number of observations is 191.
Q_1 = the size at the {1(191 + 1)/4}th position = the size at the 48th position = 6.5
Q_3 = the size at the {3(191 + 1)/4}th position = the size at the 144th position = 7.5
D_7 = the size at the {7(191 + 1)/10}th position = the size at the 134.4th position ≈ the size at the 134th position = 7.5
P_90 = the size at the {90(191 + 1)/100}th position = the size at the 172.8th position ≈ the size at the 173rd position = 8
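The positional rule used above is easy to script once the frequency table is expanded into an ordered array. A Python sketch for Example 3.15 (fractional positions are rounded to the nearest position, as in the example):

```python
# Shoe sizes and number of pairs sold (Example 3.15)
sizes = [5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5]
pairs = [2, 5, 15, 30, 60, 40, 23, 11, 4, 1]

array = [s for s, f in zip(sizes, pairs) for _ in range(f)]   # ordered array of 191 values
n = len(array)

def quantile(k, m):
    """k-th quantile point when the array is divided into m parts (m = 4, 10 or 100)."""
    pos = k * (n + 1) / m                    # position in the ordered array
    return array[round(pos) - 1]             # nearest position, counted from 1

print(quantile(1, 4), quantile(3, 4))        # Q1 = 6.5, Q3 = 7.5
print(quantile(7, 10))                       # D7 = 7.5
print(quantile(90, 100))                     # P90 = 8
```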
UNIT FOUR: MEASURES OF VARIATION
Objectives:
Having studied this unit, you should be able to
✓ understand the importance of measuring the variability (dispersion) in a data set.
✓ measure the scatter or dispersion in a data set.
✓ understand ‘moments’ as a convenient and unifying method for summarizing several
descriptive statistical measures.
✓ measure the extent to which the distribution of values in a data set deviate from symmetry.
4.1 Introduction and objectives of measuring variation
We have seen that averages are representatives of a frequency distribution. But they fail to give a
complete picture of the distribution. They do not tell anything about the spread or dispersion of
observations within the distribution. Suppose that we have the distribution of yield (kg per plot)
of two rice varieties from 5 plots each.
Variety 1: 45 42 42 41 40
Variety 2: 54 48 42 33 30
The mean yield of both varieties is 42 kg. The mean yield of variety 1 is close to the values in this
variety. On the other hand, the mean yield of variety 2 is not close to the values in variety 2. The
mean doesn’t tell us how the observations are close to each other. This example suggests that a
measure of central tendency alone is not sufficient to describe a frequency distribution. Therefore,
we should have a measure of spreads of observations.
There are different measures of dispersion. In this chapter we shall discuss the most commonly used
measures of dispersion or variation, such as the range, quartile deviation, standard deviation and coefficient
of variation, and measures of shape such as skewness and kurtosis.
Objectives of measuring variation
➢ To describe dispersion (variability) in a data.
➢ To compare the spread in two or more distributions.
➢ To determine the reliability of an average.
Note: The desirable properties of good measures of variation are almost identical with that of a
good measure of central tendency.
4.2 Absolute and relative measures
Measures of variation may be either absolute or relative. Absolute measures of variation are
expressed in the same unit of measurement in which the original data are given. These values may
be used to compare the variation in two distributions provided that the variables are in the same
units and of the same average size.
In case the two sets of data are expressed in different units, however, such as quintals of sugar
versus tonnes of sugarcane, or if the average sizes are very different, such as a manager's salary versus
a worker's salary, the absolute measures of dispersion are not comparable. In such cases measures
of relative dispersion should be used. A measure of relative dispersion is the ratio of a measure of
absolute dispersion to an appropriate measure of central tendency. It is a unitless measure.
4.3 Types of measures of variation
The range and relative range
Definition 4.1: Range is defined as the difference between the maximum and minimum
observations in a set of data. 𝑅𝑎𝑛𝑔𝑒 = 𝑀𝑎𝑥𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒 − 𝑀𝑖𝑛𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒
Range is the crudest absolute measures of variation. It is widely used in the construction of quality
control charts and description of daily temperature.
Definition 4.2: Relative range (RR) is defined as RR = Range/(maximum value + minimum value).
Definition 4.3: The variance is the average of the squares of the distance each value is from the
mean. The symbol for the population variance is σ2 (σ is the Greek lower case letter sigma). Let
x1,x2,…,xN be the measurements on N population units then, the population variance is given by
the formula:
σ² = (∑_{i=1}^{N} (x_i − µ)²)/N = {∑_{i=1}^{N} x_i² − (∑ x_i)²/N}/N,
where µ = population mean = (∑_{i=1}^{N} x_i)/N and N = population size.
Definition 4.4: The standard deviation is the square root of the variance. The symbol for the
population standard deviation is 𝜎. The corresponding formula for the standard deviation is
σ = √σ² = √[(∑_{i=1}^{N} (x_i − µ)²)/N].
Example 4.1: The height of members of a certain committee was measured in inches and the data
is presented below.
Height(x): 69 66 67 69 64 63 65 68 72
µ = population mean = (∑_{i=1}^{N} x_i)/N = (69 + 66 + ⋯ + 72)/9 = 603/9 = 67 inches
x − µ:     2   −1   0   2   −3   −4   −2   1   5
(x − µ)²:  4    1   0   4    9   16    4   1   25
σ² = (∑_{i=1}^{N} (x_i − µ)²)/N = (4 + 1 + 0 + 4 + 9 + 16 + 4 + 1 + 25)/9 = 64/9 = 7.11 inch²
and σ = √σ² = √7.11 ≈ 2.67 inches.
Definition 4.5: The sample variance is denoted by S², and its formula is
S² = (∑_{i=1}^{n} (x_i − x̄)²)/(n − 1) = (∑ f(x − x̄)²)/(n − 1) = {∑ fx² − (∑ fx)²/n}/(n − 1).
Definition 4.6: The sample standard deviation, denoted by S, is the square root of the sample
variance
S = √S² = √[(∑_{i=1}^{n} (x_i − x̄)²)/(n − 1)] = √[(∑ f(x − x̄)²)/(n − 1)].
Example 4.2: For a newly created position, a manager interviewed the following numbers of
applicants each day over a five-day period: 16, 19, 15, 15, and 14. Find the variance and standard
deviation.
Solution:
x̄ = 79/5 = 15.8
S² = (∑(x − x̄)²)/(n − 1) = 14.8/4 = 3.7
S² = {∑x² − (∑x)²/n}/(n − 1) = {1263 − (79)²/5}/4 = 14.8/4 = 3.7
S = √3.7 ≈ 1.92
Note that the procedure for finding the variance and standard deviation for grouped data is similar
to that for finding the mean for grouped data, and it uses the mid-points of each class.
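Both computational forms of the sample variance can be checked with a few lines of code. A minimal Python sketch using the data of Example 4.2:

```python
import math

# Number of applicants interviewed per day (Example 4.2)
x = [16, 19, 15, 15, 14]
n = len(x)
mean = sum(x) / n                                                     # 15.8

s2_definition = sum((xi - mean) ** 2 for xi in x) / (n - 1)           # 14.8 / 4 = 3.7
s2_shortcut = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)  # same value
s = math.sqrt(s2_definition)                                          # about 1.92

print(round(s2_definition, 2), round(s2_shortcut, 2), round(s, 2))
```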
Properties of variance
✓ The unit of measurement of the variance is the square of the unit of measurement of the
observed values. It is one of its limitations.
✓ The variance gives more weight to extreme values as compared to those which are near to
mean value, because the difference is squared in variance.
✓ It is based on all observations in the data set.
Properties of standard deviation
✓ Standard deviation is considered to be the best measure of dispersion and is used widely.
✓ There is, however, one difficulty with it. If the unit of measurement of variables of two
series is not the same, then their variability cannot be compared by comparing the values
of standard deviation.
Uses of the variance and standard deviation
✓ The variance and standard deviations can be used to determine the spread of data,
consistency of a variable and the proportion of data values that fall within a specified
interval in a distribution.
✓ If the variance or standard deviation is large, the data is more dispersed. This information is
useful in comparing two or more data sets to determine which is more (most) variable.
✓ Finally, the variance and standard deviation are used quite often in inferential statistics.
Coefficient of variation (CV)
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
Coefficient of variation is used in such problems where we want to compare the variability of two
or more different series. Coefficient of variation is the ratio of the standard deviation to the
arithmetic mean, usually expressed in percent:
CV = (S/x̄) × 100%
where S is the standard deviation and x̄ is the mean of the observations.
A distribution having less coefficient of variation is said to be less variable or more consistent or
more uniform or more homogeneous.
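The comparison itself is mechanical once the means and standard deviations are known. The Python sketch below uses two purely illustrative (hypothetical) sets of marks just to show the calculation:

```python
import statistics

# Hypothetical marks of two groups (illustrative values only)
group_1 = [60, 65, 70, 75, 80]
group_2 = [40, 55, 70, 85, 100]

for name, marks in [("Group 1", group_1), ("Group 2", group_2)]:
    cv = statistics.stdev(marks) / statistics.mean(marks) * 100   # CV = (S / mean) * 100%
    print(f"{name}: CV = {cv:.1f}%")

# The group with the smaller CV is the more consistent (homogeneous) one.
```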
Example 4.3: Last semester, the students of Biology and Chemistry Departments took Stat 273
course. At the end of the semester, the following information was recorded.
For n values x_1, x_2, …, x_n, the rth moment about any constant A is defined as ∑(x_i − A)^r / n.
The most known moments are moments about the mean also known as the central moments and
the moments about zero (also known as moments about the origin.)
The rth moment about the mean, µr, is given by:
µ_r = ∑(x_i − x̄)^r / n.
Special cases: µ_0 = 1, µ_1 = 0, µ_2 = s².
The rth moment about the origin, µ_r′, is given by:
µ_r′ = ∑ x_i^r / n.
Special cases: µ_0′ = 1, µ_1′ = x̄, µ_2′ = ∑ x_i²/n.
Skewness: it refers to lack of symmetry in a distribution.
Note: for a symmetrical and unimodal distribution:
i) Mean =median =mode
ii) The lower and upper quartiles are equidistant from the median, so also are corresponding
pairs of deciles and percentiles.
iii) Sum of positive deviations from the median is equal to the sum of negative deviations (signs
ignored).
iv) The two tails of the frequency curve are equal in length from the central value.
If a distribution is not symmetrical we call it skewed distribution.
Measures of skewness
i) Pearsonian coefficient of skewness (Pcsk), defined as:
Pcsk = (mean − mode)/s.d
In moderately skewed distributions, mode = mean − 3(mean − median), so that
Pcsk = 3(mean − median)/s.d
Interpretation: Pcsk > 0 indicates a positively skewed distribution, Pcsk < 0 a negatively skewed distribution, and Pcsk = 0 a symmetrical distribution.
Note: in a negatively skewed distribution larger values are more frequent than smaller values. In
a positively skewed distribution smaller values are more frequent than larger values.
Example 4.7: If the mean, mode and s.d of a frequency distribution are 70.2, 73.6, and 6.4, respectively,
what can one state about its skewness?
Pcsk = (mean − mode)/s.d = (70.2 − 73.6)/6.4 = −0.53.
This figure suggests that there is some negative skewness.
When the values of a distribution are closely bunched around the mode in such a way that the peak
of the distribution becomes relatively high, the distribution is said to be leptokurtic. If it is flat
topped we call it platykurtic. A distribution which is neither highly peaked nor flat topped is known
as a meso-kurtic distribution (normal).
Measures of kurtosis
UNIT FIVE: ELEMENTARY PROBABILITY
Objectives:
Having studied this unit, you should be able to
✓ understand the elements of probability
✓ calculate some probabilities of events associated with random experiments
✓ apply the concept of probability in some biological phenomena
5.1 Introduction
Without some formalism of probability theory, the student cannot appreciate the true interpretation
of data analysis through modern statistical methods. It is quite natural to study probability prior
to studying statistical inference. Elements of probability allow us to quantify the strength or
“confidence” in our conclusions. In this sense, concepts in probability form a major component
that supplements statistical methods and helps us to gauge the strength of the statistical inference.
The discipline of probability, then, provides the transition between descriptive statistics and
inferential methods. Elements of probability allow the conclusion to be put into the language that
the science or engineering practitioners require. An example follows that will enable the reader to
understand the notion of a P-value, which often provides the “bottom line” in the interpretation of
results from the use of statistical methods.
Definition 5.2:
Sample point (outcome): The individual result of a random experiment.
Sample space: The set containing all possible sample points (outcomes) of the random experiment.
The sample space is often called the universe and denoted by S.
Event: The collection of outcomes or simply a subset of the sample space. We denote events with
capital letters, A, B, C, etc.
Definitions 5.3:
1. Union: The union of A and B, A ∪ B, is the event containing all sample points in either
A or B or both. Sometimes we use A or B for union.
2. Intersection: The intersection of A and B, A ∩ B, is the event containing all sample points that are in both
A and B. Sometimes we use AB or A and B for intersection.
3. Subset: If for any ω ∈ A we also have ω ∈ B, then A ⊆ B.
4. Empty set: If a set A contains no points, it will be called the null set, or empty set, and denoted by ∅.
5. Complement: The complement of a set A, denoted by Ac, is the set of all ω ∈ S such that ω ∉ A.
6. Mutually Exclusive Events: Two events are said to be mutually exclusive (or disjoint) if their intersection
is empty (i.e. A ∩ B = ∅). Subsets A1, A2, … are defined to be mutually exclusive if Ai ∩ Aj = ∅ for every i ≠ j.
In short, to assign probabilities for an event, we might need to enumerate the possible outcomes
of a random experiment and need to know the number of possible outcomes favoring the event.
The following principles will help us in determining the number of possible outcomes favoring a
given event.
Example 5.3: Suppose one wants to purchase a certain commodity and that this commodity is on
sale in 5 government owned shops, 6 public shops and 10 private shops. How many alternatives
are there for the person to purchase this commodity?
Solution: Total number of ways =5+6+10=21 ways
Example 5.4: If we can go from Addis Ababa to Rome in 2 ways and from Rome to Washington
D.C. in 3 ways then the number of ways in which we can go from Addis Ababa to Rome to
Washington D.C. is 2x3 ways or 6 ways. We may illustrate the situation by using a tree diagram
below:
[Tree diagram: from A (Addis Ababa) two branches lead to R (Rome), and from each R three branches lead to W (Washington D.C.), giving 2 × 3 = 6 routes.]
Example 5.5: If a test consists of 10 multiple choice questions, with each permitting 4 possible
answers, how many ways are there in which a student gives his/her answers?
Solution: There are 10 steps required to complete the test.
First step: To give answer to question number one. He/she has 4 alternatives.
Second step: To give answer to question number two, he/she has 4 alternatives……
Last step: To give answer to last question, he/she has 4 alternatives.
Therefore, he/she has 4 × 4 × 4 × … × 4 = 4¹⁰ = 1,048,576 ways of completing the exam. Note
that there is only one way in which he/she can give correct answers to all questions and that there
are 3¹⁰ ways in which all the answers will be incorrect.
Example 5.6: A manufactured item must pass through three control stations. At each station the
item is inspected for a particular characteristic and marked accordingly. At the first station, three
ratings are possible while at the last two stations four ratings are possible. Hence there are 48 ways
in which the item may be marked.
Example 5.7: Suppose that car plate has three letters followed by three digits. How many possible
car plates are there, if each plate begins with a H or an F?
2 × 26 × 26 × 10 × 10 × 10 = 1,352,000 different plates.
Definition 5.4: If n is a positive integer, we define n!= n(n-1)(n-2)…1 and call it n-factorial and
0!=1.
Permutations
Suppose that we have n different objects. In how many ways, say nPn, may these objects be arranged
(permuted)? For example, if we have objects a, b and c we can consider the following
arrangements: abc, acb, bac, bca, cab, and cba. Thus the answer is 6. The following theorem gives the
general result on the number of such arrangements.
n gives (n - 1)!. The two circular permutations below are considered the same; their order is a, b,
c, d, e.
There are many problems in which we are interested in determining the number of ways in which
r objects can be selected from n distinct objects without regard to the order in which they are
selected. Such selections are called combinations or r-sets. It may help to think of combinations
as committees. The key here is without regard for order.
To obtain the general result we recall the formula derived above: the number of ways of choosing
r objects out of n and permuting the chosen r equals n!/(n-r)!. Let C be the number of ways of
choosing r out of n, disregarding order. C is the number required. Note that once the r items have
been chosen, there are r! ways of permuting them. Hence applying the multiplication principle
again, together with the above result, we obtain
C · r! = n!/(n − r)!. Therefore, C = n!/(r!(n − r)!). This number arises in many contexts in mathematics
and hence a special symbol is used for it. We shall write
C(n, r) = nCr = n!/(r!(n − r)!).
Example 5.12: How many different committees of 3 can be formed from Hawa, Segenet, Nigisty
and Lensa?
Solution: The question can be restated as: from a set of 4 objects, how many subsets of 3 elements are there?
In terms of combinations the question becomes: what is the number of combinations of 4 distinct objects taken
3 at a time? The list of committees: {H,S,N}, {H,S,L}, {H,N,L}, {S,N,L}. Therefore, we have 4C3 = 4 possible committees.
Example 5.13:
(i) A committee of 3 is to be formed from a group of 20 people. How many different committees
are possible?
(ii) From a group of 5 men and 7 women, how many different committees consisting of 2 men and
3 women can be formed?
Solution: (i) There are C(20, 3) = 20!/(3!17!) = 1140 possible committees.
(ii) There are C(5, 2) × C(7, 3) = (5!/(2!3!)) × (7!/(3!4!)) = 10 × 35 = 350 possible committees.
Remarks:
i) C(n, r) = C(n, n − r)
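Python's standard library (math.comb, available in recent versions) provides these counts directly; the sketch below checks Examples 5.12 and 5.13 and the remark above:

```python
from math import comb, factorial

print(comb(4, 3))                      # 4 committees of 3 from 4 people (Example 5.12)
print(comb(20, 3))                     # 1140 committees of 3 from 20 people
print(comb(5, 2) * comb(7, 3))         # 350 committees of 2 men and 3 women

# The symmetry remark: C(n, r) = C(n, n - r)
print(comb(20, 3) == comb(20, 17))     # True
print(factorial(20) // (factorial(3) * factorial(17)))   # 1140, same as comb(20, 3)
```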
It is rather surprising that with only these three axioms, we can construct the "entire" theory of
probability! The next theorems and definitions help in assigning probabilities of events.
Theorem 5.6: If A is an event in a discrete sample space S, then P(A) equals the sum of the probabilities of
the individual outcomes comprising A.
Theorem 5.7: Suppose that we have a random experiment with sample space S and probability function P,
and A and B are events. Then we have the following results:
i) P(∅) = 0
ii) P(Ac) = 1 − P(A)
iii) P(B ∩ Ac) = P(B) − P(A ∩ B)
iv) If A ⊆ B, then P(A) ≤ P(B).
P(A) = 2/4 = 0.5
P(B) = 2/4 = 0.5
P(Ac) = 1 − P(A) = 1 − 0.5 = 0.5
P(Bc) = 1 − P(B) = 1 − 0.5 = 0.5
P(Sc) = 1 − P(S) = 1 − 1 = 0 = P(∅)
Example 5.16: From a group of 5 men and 7 women, it is required to form a committee of 5
persons. If the selection is made randomly, then
i) what is the probability that 2 men and 3 women will be in the committee?
ii) what is the probability that all members of the committee will be men?
iii) what is the probability that at least three members will be women?
Solution: The total number of possible committees is C(12, 5) = 12!/(5!7!) = 792, i.e., the number of possible
outcomes in the sample space is 792.
i) Let A be the event that the committee will consist of 2 men and 3 women. We need to
know the number of possible outcomes favoring this event. The number of ways we
can select 2 men from 5 men is C(5, 2) = 5!/(2!3!) = 10, and the number of ways of selecting 3
women out of 7 women is C(7, 3) = 7!/(3!4!) = 35. Using the multiplication principle, the
number of elements favoring event A is 10 × 35 = 350.
Hence, using the classical definition of probability,
P(A) = C(5, 2)C(7, 3)/C(12, 5) = 350/792 = 0.44
ii) Let B be the event that all members of the committee will be men. Hence
P(B) = C(5, 5)C(7, 0)/C(12, 5) = 1/792
iii) Let C be the event that at least three of the committee members will be women.
Basically, three different compositions of committee members can be formed in terms
of sex: 3 women and 2 men, 4 women and 1 man, and all women. Hence the number
of possible outcomes favoring event C, using the principle of combination together with
the addition principle, is C(5, 2)C(7, 3) + C(5, 1)C(7, 4) + C(5, 0)C(7, 5) = 350 + 175 + 21 = 546.
Therefore, P(C) = 546/792 = 0.69
5
The above definition of probability is based on empirical data accumulated through time or based
on observations made from experiments repeated a large number of times.
5.5 Some probability rules
Theorem 5.8: If A and B are any two events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
In more precise terms, given an experiment, a corresponding sample space, and a probability law,
suppose that we know that the outcome is within some given event B. We wish to quantify the
likelihood that the outcome also belongs to some other given event A. We thus seek to construct a
new probability law, which takes into account this knowledge and which, for any event A, gives
us the conditional probability of A given B, denoted by P(A|B).
Definition 5.8: If P(B) > 0, the conditional probability of A given B, denoted by P(A|B), is
P(A|B) = P(A ∩ B)/P(B).
Example 5.19: Suppose cards numbered one through ten are placed in a hat, mixed up, and then
one of the cards is drawn at random. If we are told that the number on the drawn card is at least
five, then what is the conditional probability that it is ten?
Solution: Let A denote the event that the number on the drawn card is ten, and B be the event that
it is at least five. The desired probability is P(A|B).
P(A|B) = P(A ∩ B)/P(B) = P({10} ∩ {5,6,7,8,9,10})/P({5,6,7,8,9,10}) = P({10})/P({5,6,7,8,9,10}) = (1/10)/(6/10) = 1/6
Example 5.20: A family has two children. What is the conditional probability that both are boys
given that at least one of them is a boy? Assume that the sample space S is given by S = {(b, b),
(b, g), (g, b), (g, g)}, and all outcomes are equally likely. (b, g) means, for instance, that the older
child is a boy and the younger child is a girl.
Solution: Letting A denote the event that both children are boys, and B the event that at least one
of them is a boy, the desired probability is given by
P(A|B) = P(A ∩ B)/P(B) = (1/4)/(3/4) = 1/3
Law of Multiplication
The defining equation for conditional probability may also be written as:
P(A ∩ B) = P(B) P(A|B)
This formula is useful when the information given to us in a problem is P(B) and P(A|B) and we
are asked to find P(A ∩ B). An example illustrates the use of this formula. Suppose that 5 good fuses
and two defective ones have been mixed up. To find the defective fuses, we test them one-by-one,
at random and without replacement. What is the probability that we are lucky and find both of the
defective fuses in the first two tests? (By the multiplication law, this probability is (2/7)(1/6) = 1/21.)
Example 5.21: Suppose an urn contains seven black balls and five white balls. We draw two balls
from the urn without replacement. Assuming that each ball in the urn is equally likely to be drawn,
what is the probability that both drawn balls are black?
Solution: Let A and B denote, respectively, the events that the first and second balls drawn are
black. Now, given that the first ball selected is black, there are six remaining black balls and five
white balls, and so P(B|A) = 6/11. As P(A) is clearly 7/12, our desired probability is
P(A ∩ B) = P(A) P(B|A) = (7/12)(6/11) = 7/22
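Both conditional-probability calculations can be confirmed with exact fractions (a small Python sketch; the events are those described in Examples 5.19 and 5.21):

```python
from fractions import Fraction

# Example 5.19: a card numbered 1..10 is drawn; given it is at least five
at_least_five = [c for c in range(1, 11) if c >= 5]
p_ten_given_b = Fraction(sum(c == 10 for c in at_least_five), len(at_least_five))
print(p_ten_given_b)                                   # 1/6

# Example 5.21: two draws without replacement from 7 black and 5 white balls
p_first_black = Fraction(7, 12)
p_second_black_given_first = Fraction(6, 11)
print(p_first_black * p_second_black_given_first)      # 7/22
```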
Independence
We have introduced the conditional probability P (A|B) to capture the partial information that
event B provides about event A. An interesting and important special case arises when the
occurrence of B provides no information and does not alter the probability that A has occurred,
i.e., P(A|B) = P(A). When the above equality holds, we say that A is independent of B. Note that
by the definition P(A|B) = P(A ∩ B)/P(B), this is equivalent to P(A ∩ B) = P(A)P(B).
UNIT SIX: PROBABILITY DISTRIBUTIONS
Objectives:
Having studied this unit, you should be able to
✓ compute probabilities of events using the concept of probability distributions.
✓ compute expected values and variances of random variables.
✓ apply the concepts of probability distributions to real-life problems.
Introduction
In many applications, the outcomes of probabilistic experiments are numbers or have some
numbers associated with them, which we can use to obtain important information, beyond what
we have seen so far. We can, for instance, describe in various ways how large or small these
numbers are likely to be and compute likely averages and measures of spread. For example, in 3
tosses of a coin, the number of heads obtained can range from 0 to 3, and there is one of these
numbers associated with each possible outcome. Informally, the quantity “number of heads” is
called a random variable, and the numbers 0 to 3 its possible values. The value of a random
variable is determined by the outcome of the experiment. Thus, we may assign probabilities to the
possible values of the random variable.
If x is any possible value of X, the probability mass of x, denoted PX(x), is the probability of the
event {X = x} consisting of all outcomes that give rise to a value of X equal to x. A probability
mass function must satisfy the following conditions:
i. PX(x)≥0 for any value of x of X.
ii. ∑ P_X(x) = 1, where the summation is over all values x of X.
Example 6.1: Consider an experiment of tossing two fair coins. Letting X denote the number of
heads appearing on the top face, then X is a random variable taking on one of the values 0, 1, 2 .
The random variable X assigns a 0 value for the outcome (T,T), 1 for outcomes (T ,H) and (H, T
), and 2 for the outcome (H,H). Thus, we can calculate the probability that X can take specific
value/s as follows:
P(X = 0) = P({(T , T )}) = ¼
P(X = 1) = P({(T ,H),(H, T )}) = 2/4,
P(X = 2) = P({(H,H)}) = ¼
The table below shows the probability mass function X.
X 0 1 2
PX(x) ¼ 2/4 ¼
We can justify that PX(x) is probability mass function.
PX(x)≥0 for x=0,1,2 and
P(X = 0) + P(X = 1)+P(X = 2) = ¼ + 2/4 + ¼=1
Suppose we are interested to calculate the probability that X≥1. The values of X which are greater
than or equal to 1 are 1 and 2. Thus, the probability that X is greater than or equal to 1, denoted
P(X≥1), is found as P(X≥1) = P(X = 1) + P(X = 2)=3/4.
We can use the probability density function to calculate probabilities of events expressed in terms
of the random variable X. For instance, if we are interested in the probability that X lies between
two points, say a and b, we can find it using integration of f_X(x) on the interval [a, b], i.e.
P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx
It is useful to view the mean of X as a “representative” value of X, which lies somewhere in the
middle of its range. We can make this statement more precise, by viewing the mean as the center
of gravity of the distribution.
Variance
Definition 6.4: The variance of a random variable X, denoted V(X) or σ², is defined as
V(X) = E[(X − μ)²] = E(X²) − μ².
i) If X is discrete, V(X) = [∑ x² P_X(x)] − μ².
ii) If X is continuous, V(X) = [∫_{−∞}^{∞} x² f_X(x) dx] − μ².
The variance provides a measure of dispersion of X around its mean. Another measure of
dispersion is the standard deviation of X, which is defined as the square root of the variance and is
denoted by σ.
Example 6.2: Calculate the mean and variance of the random variable X in Example 6.1.
E(X) = ∑ x P_X(x) = 0(1/4) + 1(1/2) + 2(1/4) = 1
E(X²) = ∑ x² P_X(x) = 0²(1/4) + 1²(1/2) + 2²(1/4) = 1.5
V(X) = E(X²) − μ² = 1.5 − 1² = 0.5
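The same expectation and variance can be computed in a couple of lines from the probability mass function (a sketch for the distribution of Example 6.1):

```python
# Probability mass function of X = number of heads in two tosses (Example 6.1)
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

mean = sum(x * p for x, p in pmf.items())         # E(X) = 1
ex2 = sum(x ** 2 * p for x, p in pmf.items())     # E(X^2) = 1.5
variance = ex2 - mean ** 2                        # V(X) = E(X^2) - mu^2 = 0.5

print(mean, ex2, variance)
```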
6.3 Common discrete probability distributions – binomial and Poisson
The Binomial distribution
Many real problems (experiments) have two possible outcomes, for instance, a person may be
HIV-Positive or HIV-Negative, a seed may germinate or not, the sex of a newborn baby may be a
girl or a boy, etc. Technically, the two outcomes are called Success and Failure. Experiments or
trials whose outcomes can be classified as either a "success" or a "failure" are called Bernoulli
trials.
Suppose that n independent trials, each of which results in a “success” with probability p and in a
“failure” with probability 1 − p, are to be performed. If X represents the number of successes that
occur in the n trials, then X is said to have binomial distribution with parameters n and p. The
probability mass function of a binomial distribution with parameters n and p is given by
P_X(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, 2, ..., n
The mean and variance of the binomial distribution are np and np(1-p), respectively. Note that the
binomial distributions are used to model situations where there are just two possible outcomes,
success and failure. The following conditions also have to be satisfied.
i) There must be a fixed number of trials called n
ii) The probability of success (called p) must be the same for each trial.
iii) The trials must be independent
Example 6.3: A fair coin is flipped 4 times. Let X be the number of heads appearing out of the
four trials. Calculate the following probabilities:
i) 2 heads will appear
ii) No head will appear
iii) At least two heads will appear
iv) Less than two heads will appear
v) At most 2 heads will appear
Solution: We can consider that the outcomes of the trials are independent of each other. In
addition, the probability that a head will appear in each trial is the same. Thus, X has a binomial
distribution with number of trials 4 and probability of success (the occurrence of a head in a trial)
equal to 1/2. The probability mass function of X is given by
PX(x) = C(n, x) 0.5^x (1 − 0.5)^(n−x) = C(n, x) 0.5^n,   x = 0, 1, 2, 3, 4.   Note that n = 4 and p = 1/2.
i) P(X = 2) = C(4, 2) 0.5^2 (1 − 0.5)^2 = 0.3750
ii) P(X = 0) = C(4, 0) 0.5^0 (1 − 0.5)^4 = 0.0625
iii) P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4) = 0.3750 + 0.2500 + 0.0625 = 0.6875
iv) P(X < 2) = P(X = 0) + P(X = 1) = 0.0625 + 0.2500 = 0.3125
v) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.0625 + 0.2500 + 0.3750 = 0.6875
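As a quick numerical check of Example 6.3, the following sketch (Python, standard library only; binom_pmf is a helper name introduced here) evaluates the binomial pmf with math.comb and sums it over the required ranges:

    from math import comb

    n, p = 4, 0.5

    def binom_pmf(x):
        # P(X = x) for a binomial(n, p) random variable
        return comb(n, x) * p**x * (1 - p)**(n - x)

    print(binom_pmf(2))                                  # 0.375
    print(binom_pmf(0))                                  # 0.0625
    print(sum(binom_pmf(x) for x in range(2, n + 1)))    # P(X >= 2) = 0.6875
    print(sum(binom_pmf(x) for x in range(0, 2)))        # P(X < 2)  = 0.3125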
Example 6.4: Suppose that a particular trait of a person (such as eye color or left-handedness) is
classified on the basis of one pair of genes, and suppose that d represents a dominant gene and r a
recessive gene. Thus a person with dd genes is pure dominant, one with rr is pure recessive, and
one with rd is hybrid. The pure dominant and the hybrid are alike in appearance. Children receive
one gene from each parent. If, with respect to a particular trait, two hybrid parents have a total of
four children, what is the probability that exactly three of the four children have the outward
appearance of the dominant gene?
Solution: If we assume that each child is equally likely to inherit either of the two genes from each
parent, the probabilities that the child of two hybrid parents will have dd, rr, or rd pairs of genes
are, respectively, 1/4, 1/4, 1/2. Hence, because an offspring will have the outward appearance of the
dominant gene if its gene pair is either dd or rd, it follows that the number of such children, say X,
is binomially distributed with parameters n = 4 and p = 3/4. Thus the desired probability
is P(X = 3) = C(4, 3) 0.75^3 (1 − 0.75)^(4−3) = 0.421875.
Example 6.5: Suppose it is known that the probability of recovery for a certain disease is 0.4. If a
random sample of 10 people who are stricken with the disease is selected, what is the probability
that:
(a) exactly 5 of them will recover?
(b) at most 9 of them will recover?
Solution: Let X be the number of persons who will recover from the disease. We can assume that the
selection process will not affect the probability of success (0.4) for each trial by assuming a large
diseased population. Hence, X will have a binomial distribution with number of trials equal
to 10 and probability of success equal to 0.4.
P(X = k) = C(10, k) 0.4^k 0.6^(10−k),   k = 0, 1, 2, ..., 10
(a) P(X = 5) = C(10, 5) 0.4^5 0.6^5 = 0.200658
(b) P(X ≤ 9) = 1 − P(X = 10) = 1 − C(10, 10) 0.4^10 0.6^0 = 1 − 0.000105 = 0.9999
The Poisson Random Variable
A random variable X, taking on one of the values 0, 1, 2, . . . , is said to have a Poisson distribution
if its probability mass function is given by
PX(x) = (e^(−λ) λ^x)/x!,   x = 0, 1, 2, 3, ... and λ > 0.
λ is the parameter of this distribution. The mean and variance of the Poisson distribution are equal,
and both are equal to λ. Note that the Poisson distribution is used to model situations where
the random variable X is the number of occurrences of a particular event over a given period of
time (or space). Together with this, the following conditions must also be fulfilled: events are
independent of each other, events occur singly, and events occur at a constant rate (in other words,
for a given time interval the mean number of occurrences is proportional to the length of the
interval).
The Poisson distribution is used as a distribution of rare events such as telephone calls made to a
switchboard in a given minute, the number of misprints per page in a book, road accidents on a
particular motorway in one day, etc. The processes that give rise to such events are called Poisson
processes.
Example 6.6: Suppose that the number of typographical errors on a single page of this lecture note
has a Poisson distribution with parameter λ = 1. If we randomly select a page in this lecture note,
calculate the probability that
a) no error will occur.
b) exactly three errors will occur.
c) less than 2 errors will occur.
d) there is at least one error.
Solution: Let X = number of errors per page.
P(X = k) = (e^(−λ) λ^k)/k!,   λ = 1, k = 0, 1, 2, ...
a) P(X = 0) = (e^(−1) 1^0)/0! = 1/e = 0.367879
b) P(X = 3) = (e^(−1) 1^3)/3! = 0.061313
c) P(X < 2) = P(X = 0) + P(X = 1) = 0.73576
d) P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.367879 = 0.632121
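A rough numerical check of Example 6.6 can be written in a few lines of Python (standard library only; poisson_pmf is a helper introduced for this sketch):

    from math import exp, factorial

    lam = 1.0   # mean number of errors per page

    def poisson_pmf(k):
        # P(X = k) for a Poisson(lam) random variable
        return exp(-lam) * lam**k / factorial(k)

    print(poisson_pmf(0))                      # ~0.367879  (no error)
    print(poisson_pmf(3))                      # ~0.061313  (exactly three errors)
    print(poisson_pmf(0) + poisson_pmf(1))     # ~0.735759  (fewer than two errors)
    print(1 - poisson_pmf(0))                  # ~0.632121  (at least one error)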
Example 6.7: If the number of accidents occurring on a highway each day is a Poisson random
variable with parameter λ = 3, what is the probability that no accidents will occur on a randomly
selected day in the future?
Solution: Let X = number of accidents per day.
P(X = k) = (e^(−3) 3^k)/k!,   k = 0, 1, 2, ...
Required: P(X = 0) = (e^(−3) 3^0)/0! = e^(−3) = 0.05
Note: The Poisson random variable has a wide range of applications in a diverse number of areas.
An important property of the Poisson random variable is that it may be used to approximate a
binomial random variable when the binomial parameter n is large and p is small. The probability
that X equals k can then be approximated by setting λ = np in the Poisson distribution, i.e.
P(X = k) ≈ (e^(−λ) λ^k)/k!,   λ = np.
6.4 Common continuous probability distributions
Normal distribution
The normal distribution plays an important role in statistical inference because many real-life
distributions are approximately normal; many other distributions can be made approximately normal
by appropriate data transformations (e.g., taking the log); and, as the sample size increases, the
distribution of the means of samples drawn from a population with any distribution approaches the
normal distribution.
A continuous random variable X is said to follow a normal distribution if and only if its
probability density function (p.d.f.) is
fX(x) = (1/(σ√(2π))) e^(−(1/2)((x − μ)/σ)²),   where x ∈ (−∞, ∞), μ ∈ (−∞, ∞)
and σ ∈ (0, ∞). There are infinitely many normal distributions, since different values of μ and σ
define different normal distributions. For instance, when μ = 0 and σ = 1, the above density takes
the form fZ(z) = (1/√(2π)) e^(−z²/2). This particular distribution is called the standard
normal distribution and is sometimes known as the Z-distribution. The random variable corresponding
to this distribution is usually denoted by Z. If X has a normal distribution with mean μ and variance
σ², we denote it as X ~ N(μ, σ²).
Properties of normal distribution
i) The normal distribution curve is bell-shaped, symmetric about μ and mesokurtic. The
p.d.f. attains its maximum value at x = μ.
ii) Since the vertical line x = μ divides the area under the normal curve into two equal parts, μ is
the mean, the median and the mode of the distribution.
iii) The mean and variance of the normal distribution are μ, and σ2, respectively.
iv) The total area under the curve and bounded from below by the horizontal axis is 1, i.e.
∫_{−∞}^{∞} fX(x) dx = 1
Areas under the standard normal distribution curve are tabulated in various ways. The most common
tables give areas bounded between Z = 0 and a positive value of Z. In addition to the standard normal
table, the properties of the normal distribution and the following theorem are useful for making
probability calculations easy for any normal distribution.
Example 6.8: Let Z be the standard normal random variable. Calculate the following probabilities
using the standard normal distribution table:
a) P(0<Z<1.2)   b) P(0<Z<1.43)   c) P(Z≤0)   d) P(−1.2<Z<0)   e) P(Z≤−1.43)
f) P(−1.43≤Z<1.2)   g) P(Z≥1.52)   h) P(Z≥−1.52)
Solution:
a) The probability that Z lies between 0 and 1.2 can be directly found from the standard normal
table as follows: look for the value 1.2 in the z column (first column) and then move
horizontally until you find the value 0.00 in the first row. The point of intersection made
by the horizontal and vertical movements gives the desired area (probability). Hence
P(0<Z<1.2) = 0.3849. Refer to the table below as a guide to finding this probability.
Figure: P(0<Z<1.2) is the shaded area
b) In a similar way, P(0<Z<1.43) = 0.4236.
c) We know that the normal distribution is symmetric about its mean. Hence the area to the left
of 0 and that to the right of 0 are 0.5 each. Therefore P(Z≤0) = P(Z≥0) = 0.5.
Figure: The area to the left and the right of 0 for the Z-distribution
d) P(−1.2<Z<0) = P(0<Z<1.2) = 0.3849, by symmetry.
e) P(Z≤−1.43) = 1 − P(Z ≥ −1.43)                  using the probability of the complement event
             = 1 − [P(−1.43<Z<0) + P(Z≥0)]        since the region can be broken into non-overlapping parts
             = 1 − [P(0<Z<1.43) + P(Z≥0)]         by symmetry
             = 1 − [0.4236 + 0.5]
             = 1 − 0.9236 = 0.0764
f) P(−1.43≤Z<1.2) = P(−1.43<Z<0) + P(0<Z<1.2) = 0.4236 + 0.3849 = 0.8085
Figure: P(−1.43≤Z<1.2) is the shaded region
g) P(Z≥1.52) = 0.5 − P(0<Z<1.52) = 0.5 − 0.4357 = 0.0643
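Instead of the printed table, these areas can also be checked numerically. The sketch below (Python, standard library only; Phi is a helper name introduced here, not notation from the note) builds the standard normal CDF from the error function and reproduces several parts of Example 6.8:

    from math import erf, sqrt

    def Phi(z):
        # cumulative distribution function of the standard normal distribution
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(Phi(1.2) - Phi(0))    # a) P(0<Z<1.2)   ~0.3849
    print(Phi(0))               # c) P(Z<=0)       0.5
    print(Phi(-1.43))           # e) P(Z<=-1.43)  ~0.0764
    print(1 - Phi(1.52))        # g) P(Z>=1.52)   ~0.0643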
Because of its importance, the chi-square distribution is tabulated for various values of the
parameter n (refer to the table). Thus we may find in the table the value, denoted by χ²α(n), satisfying
P(X ≥ χ²α(n)) = α,  0 < α < 1. The example below shows how to read chi-square distribution
values.
Example 6.11: To read the chi-square value with 2 degrees of freedom where the area to the right
of this value is 0.005, look up the degrees of freedom, 2, in the first column (df column) and then
move horizontally until you find the value of α, 0.005, in the first row. The point of intersection
made by the horizontal and vertical movements gives the desired chi-square value, 10.597. This
value satisfies P(X ≥ 10.597) = 0.005. In a similar way, the chi-square value with
100 degrees of freedom where the area to the right of this value is 0.975 is 74.222.
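If SciPy happens to be available (an assumption about the working environment, not something the note requires), the same table values can be reproduced with scipy.stats.chi2.ppf, whose quantile argument is the area to the left, i.e. 1 − α:

    from scipy.stats import chi2

    # area 0.005 to the right of the value, 2 degrees of freedom
    print(chi2.ppf(1 - 0.005, df=2))     # ~10.597

    # area 0.975 to the right of the value, 100 degrees of freedom
    print(chi2.ppf(1 - 0.975, df=100))   # ~74.222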
The t distribution
The t distribution is an important distribution useful in inference concerning population
mean/means. This distribution has one parameter called the degrees of freedom. Depending on the
values of the degrees of freedom, we may have different t distributions. The degrees of freedom
is usually denoted by n. In inference on the population mean, the degrees of freedom is related
to sample size. As the sample size or degrees of freedom increases, the t distribution approaches
the standard normal distribution.
The t- distribution shares some characteristics of the normal distribution and differs from it in
others. The t distribution is similar to the standard normal distribution in the following ways.
i) it is bell-shaped
ii) it is symmetrical about the mean
iii) the mean, median, and mode are equal to 0 and are located at the center of the distribution.
iv) The curve never touches the x-axis
The t distribution differs from the standard normal distribution in the following ways.
i) the variance is greater than 1.
ii) The t distribution is actually a family of curves based on the concept of degrees of freedom.
UNIT SEVEN: SAMPLING AND SAMPLING DISTRIBUTION OF SAMPLE MEAN
Objectives:
After a successful completion of this unit, students will be able to:
✓ Differentiate the two major sampling techniques: probabilistic and non-probabilistic
✓ Apply simple random sampling technique to select sample
✓ Define sampling distribution of the sample mean
Simple random sampling
Simple random sampling is a method of selecting a sample from a population in such a way that
every unit of the population is given an equal chance of being selected. In practice, you can draw
a simple random sample of elements using either the 'lottery method' or 'tables of random numbers'.
For example, you may use the lottery method to draw a random sample by using a set of 'N' tickets,
with numbers ' 1 to N' if there are 'N' units in the population. After shuffling the tickets thoroughly,
the sample of a required size, say n, is selected by picking the required n number of tickets.
The best method of drawing a simple random sample is to use a table of random numbers. After
assigning consecutive numbers to the units of population, the researcher starts at any point on the
table of random numbers and reads the consecutive numbers in any direction horizontally,
vertically or diagonally. If a read-out number corresponds to the number written on a unit card,
then that unit is chosen for the sample.
Suppose that a sample of 6 study centers is to be selected at random from a serially numbered
population of 60 study centers. The following table is portion of a random numbers table used to
select a sample.
Column→     1       2       3       4       5      ……     N
Row↓
1 2315 7548 5901 8372 5993 ….. 6744
2 0554 5550 4310 5374 3508 ….. 1343
3 1487 1603 5032 4043 6223 ….. 0834
4 3897 6749 5094 0517 5853 ….. 1695
5 9731 2617 1899 7553 0870 ….. 0510
6 1174 2693 8144 3393 0862 ….. 6850
7 4336 1288 5911 0164 5623 ….. 4036
8 9380 6204 7833 2680 4491 ….. 2571
9 4954 0131 8108 4298 4187 ….. 9527
10 3676 8726 3337 9482 1569 ….. 3880
11 ….. ….. ….. ….. ….. ….. …..
12 ….. ….. ….. ….. ….. ….. …..
13 ….. ….. ….. ….. ….. ….. …..
14 ….. ….. ….. ….. ….. ….. …..
15 ….. ….. ….. ….. ….. ….. …..
N 3914 5218 3587 4855 4888 ….. 8042
If you start in the first row and first column, centers numbered 23, 05, 14,…, will be selected.
However, centers numbered above the population size (60) will not be included in the sample. In
addition, if any number is repeated in the table, it may be substituted by the next number from the
same column. Besides, you can start at any point in the table. If you choose column 4 and row 1,
the number to start with is 83. In this way you can select the first 6 numbers from this column,
starting with 83.
Reading down column 4, the numbers obtained are:
83, 53, 40, 05, 75, 33, 01, 26
Hence, the study centers numbered 53, 40, 05, 33, 01 and 26 will be in the sample (83 and 75 are
discarded because they exceed the population size of 60).
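In practice the lottery method is usually carried out by a computer. A minimal sketch (Python, standard library only; the population size 60 and sample size 6 follow the study-center illustration above):

    import random

    N, n = 60, 6                           # population size and required sample size
    population = list(range(1, N + 1))     # serially numbered study centers 1, ..., 60

    random.seed(1)                         # fixed seed so the sketch is reproducible
    sample = random.sample(population, n)  # simple random sampling without replacement
    print(sorted(sample))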
Simple random sampling ensures the best results. However, from a practical point of view, a list
of all the units of a population is often not possible to obtain. Even if it is possible, it may involve a
very high cost which a researcher or an organization may not be able to afford. In addition, it may
result in an unrepresentative sample by chance.
Stratified sampling
Stratified random sampling takes into account the stratification of the main population into a
number of sub-populations, each of which is homogeneous with respect to one or more
characteristic(s). Having ensured this stratification, it provides for selecting randomly the required
number of units from each sub-population. The selection of a sample from each subpopulation
may be done using simple random sampling. It is useful in providing more accurate results than
simple random sampling.
Systematic sampling
In this method, samples are selected at equal intervals from a list of the elements. This
method provides a sample as good as a simple random sample, and it is comparatively easier to
draw the sample. For instance, to study the average monthly expenditure of households in a city,
you may select every fourth household from the household listing, starting at random.
Cluster sampling
Cluster sampling is used when a sampling frame is difficult to construct or when using other sampling
techniques (such as simple random sampling) is not feasible or is costly. For instance, when the
geographic distribution of units is scattered, it is difficult to apply simple random sampling. It involves division
of the population of elementary units into groups or clusters that serve as primary sampling units.
A selection of the clusters is then made to form the sample. The precision of estimates made based
on samples taken using this method is relatively low.
Non-probability sampling techniques
In non-probability sampling, the sample is not based on chance. It is rather determined by personal
judgment. This method is cost effective; however, we cannot make objective statistical inferences.
Depending on the technique used, non-probability samples are classified into quota, judgment or
purposive and convenience samples.
Sampling error can be reduced by using appropriate sampling methods and/or by increasing the sample size.
The non-sampling error, on the other hand, is likely to increase with an increase in sample size.
2. If sampling is with replacement, we will have N^n = 3² = 9 possible samples: (A, A), (A, B),
(A, C), (B, A), (B, B), (B, C), (C, A), (C, B) and (C, C). Hence the probability distribution
(sampling distribution) of the sample mean is:
x̄            3      4.5    6      7.5    9
P(X̄ = x̄)    1/9    2/9    3/9    2/9    1/9
E(X̄) = Σ x̄ P(X̄ = x̄) = 3(1/9) + 4.5(2/9) + 6(3/9) + 7.5(2/9) + 9(1/9) = 6
V(X̄) = Σ x̄² P(X̄ = x̄) − [E(X̄)]² = (1 + 4.5 + 12 + 12.5 + 9) − 36 = 3
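The enumeration behind this sampling distribution can be reproduced with the sketch below (Python, standard library only). It assumes the three population units A, B and C carry the values 3, 6 and 9, which is consistent with the sample means listed above; exact fractions are used so that the mean 6 and variance 3 come out exactly.

    from itertools import product
    from fractions import Fraction
    from collections import Counter

    values = {"A": 3, "B": 6, "C": 9}   # assumed population values (consistent with the means above)

    # all N^n = 9 ordered samples of size 2 drawn with replacement
    samples = list(product(values.values(), repeat=2))
    means = [Fraction(a + b, 2) for a, b in samples]

    pmf = {m: Fraction(c, len(means)) for m, c in Counter(means).items()}
    mean = sum(m * p for m, p in pmf.items())               # expected value of the sample mean = 6
    var = sum(m**2 * p for m, p in pmf.items()) - mean**2   # variance of the sample mean = 3
    print(pmf, mean, var)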
Note:
✓ The mean of the sampling distribution of the sample mean is the same as the population
mean irrespective of the sampling procedure.
✓ The variance of the sampling distribution of the sample mean is:
σ²/n, if sampling is with replacement, and
(σ²/n)((N − n)/(N − 1)), if sampling is without replacement.
✓ The problem with using the sample mean to make inferences about the population mean is that
the sample mean will probably differ from the population mean. This error is measured by the
standard deviation of the sampling distribution of the sample mean, which is known as the standard
error. The standard error is the average amount of sampling error incurred because of taking
a sample rather than the whole population. As the sample size increases, the standard error
decreases.
7.3 Central Limit Theorem
If X1, X2, …, Xn is a random sample from a population with mean μ and variance σ², then as n goes
to infinity the distribution of the sample mean X̄ approaches a normal distribution with mean μ
and variance σ²/n. That is, as n gets large, X̄ ~ N(μ, σ²/n), and its standardized form is
Z = (X̄ − μ)/(σ/√n) ~ N(0, 1).
Note: The central limit theorem is useful for approximating the distribution of the sample mean
based on a large sample size when the population distribution is non-normal; however, if the
population is normal, then the sampling distribution of the sample mean will be normal regardless
of the sample size.
Example 7.2: The uric acid values in normal adult males are normally distributed with mean 5.7
mg and standard deviation 1 mg. Find the probability that
a) a sample of size 4 will yield a mean less than 5
b) a sample of size 9 will yield a mean greater than 6
Solution: Let X be the uric acid value of a normal adult male, with mean 5.7 and variance 1.
a) If a sample of size 4 is taken, then X̄ ~ N(5.7, 0.25) since the population is normally
distributed.
P(X̄ < 5) = P(Z < (5 − 5.7)/0.5) = P(Z < −1.4)
= 0.5 − P(0 < Z < 1.4) = 0.0808
b) If a sample of size 9 is taken, then X̄ ~ N(5.7, 1/9) since the population is normally
distributed.
P(X̄ > 6) = P(Z > (6 − 5.7)/(1/3)) = P(Z > 0.9)
= 0.5 − P(0 < Z < 0.9) = 0.1841
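A numerical check of Example 7.2 (a Python sketch, standard library only; the standard normal CDF is again built from the error function):

    from math import erf, sqrt

    def Phi(z):
        # standard normal CDF
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sigma = 5.7, 1.0

    # a) sample of size 4: the standard error is sigma / sqrt(4) = 0.5
    print(Phi((5 - mu) / (sigma / sqrt(4))))       # ~0.0808

    # b) sample of size 9: the standard error is sigma / sqrt(9) = 1/3
    print(1 - Phi((6 - mu) / (sigma / sqrt(9))))   # ~0.1841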
UNIT EIGHT: SIMPLE LINEAR REGRESSION AND CORRELATION
Objectives:
Having studied this unit, you should be able to:
✓ Formulate a simple linear regression model.
✓ express quantitatively the magnitude and direction of the association between two variables
Introduction
The statistical methods discussed so far are used to analyze the data involving only one variable.
Often an analysis of data concerning two or more variables is needed to look for any statistical
relationship or association between them. Thus, regression and correlation analysis are helpful in
ascertaining the probable form of the relationship between variables and the strength of the
relationship.
The method of least squares chooses the estimates a and b such that Σê² = Σ(Y − Ŷ)² is minimum.
The solution of this minimization problem using partial differentiation is as follows:
b = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n] = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²],   and   a = Ȳ − bX̄
Example 8.1: A researcher wants to find out if there is any relationship between the height of a son
and that of his father. He took a random sample of 6 fathers and their sons. The heights, in inches,
are given in the table below:
Height of father (X) 63 65 64 65 67 68
Height of the son (Y) 66 68 65 67 69 70
i) Draw the scatter diagram and comment on the type of relationship.
ii) Fit the regression line of Y on X.
iii) Predict the height of the son if his father's height is 66 inches.
Solution:
i) From the scatter plot one can see that the points lie roughly on a straight line with a positive
slope, indicating a positive linear relationship.
ii) n = 6, ΣX = 392, ΣY = 405, ΣX² = 25628, ΣXY = 26476, ΣY² = 27355
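The remaining arithmetic of Example 8.1 can be carried out with a short sketch (Python, standard library only) that plugs the father-son data into the least squares formulas above:

    X = [63, 65, 64, 65, 67, 68]   # heights of the fathers (inches)
    Y = [66, 68, 65, 67, 69, 70]   # heights of the sons (inches)
    n = len(X)

    sx, sy = sum(X), sum(Y)
    sxy = sum(x * y for x, y in zip(X, Y))
    sxx = sum(x * x for x in X)

    b = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope of the regression of Y on X (~0.92)
    a = sy / n - b * sx / n                       # intercept (~7.19)
    print(a, b)                                   # fitted line: Y-hat = a + b*X
    print(a + b * 66)                             # predicted son's height for a 66-inch father (~68.1)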
For n pairs of sample values X and Y, Pearson's correlation coefficient is calculated as the ratio of
the covariance of the variables X and Y to the product of the standard deviations of X and Y.
Symbolically,
r = Cov(X, Y) / √(Var(X)·Var(Y))
  = [Σ(X − X̄)(Y − Ȳ)/(n − 1)] / [√(Σ(X − X̄)²/(n − 1)) · √(Σ(Y − Ȳ)²/(n − 1))]
  = Σ(X − X̄)(Y − Ȳ) / √(Σ(X − X̄)² Σ(Y − Ȳ)²)
Example 8.2: In some locations, there is strong association between concentrations of two
different pollutants. An article reports the accompanying data on ozone concentration x (ppm) and
secondary carbon concentration y (μg/m³):
X   0.066   0.088   0.120   0.050   0.162   0.186   0.057   0.100
Y   4.6     11.6    9.5     6.3     13.8    15.4    2.5     11.8
X   0.112   0.055   0.154   0.074   0.111   0.140   0.071   0.110
Y   8.0     7.0     20.6    16.6    9.2     17.9    2.8     13.0
a. Calculate the correlation coefficient and comment on the strength and direction of the
relationship between the two variables.
Solution: The summary quantities are
n = 16, Σxi = 1.656, Σyi = 170.6, Σxiyi = 20.0397, Σxi² = 0.196912, Σyi² = 2253.56
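The correlation coefficient for Example 8.2 can then be obtained by plugging these summary quantities into the computational (shortcut) form of the formula above; the sketch below (Python, standard library only) illustrates the calculation, and the value it prints is an illustration rather than part of the original note:

    from math import sqrt

    n = 16
    sx, sy = 1.656, 170.6
    sxy, sxx, syy = 20.0397, 0.196912, 2253.56

    num = sxy - sx * sy / n                              # sum of cross-deviations
    den = sqrt((sxx - sx**2 / n) * (syy - sy**2 / n))    # square root of the product of deviation sums
    r = num / den
    print(r)    # ~0.72, a fairly strong positive linear relationship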
UNIT NINE: ESTIMATION AND HYPOTHESIS TESTING
Objectives:
Having studied this unit, you should be able to
✓ construct and interpret confidence interval estimates
✓ formulate hypothesis about a population mean
✓ determine an appropriate sample size for estimation
Introduction
We now assume that we have collected, organized and summarized a random sample of data and
are trying to use that sample to estimate a population parameter. Statistical inference is a procedure
whereby inferences about a population are made on the basis of the results obtained from a sample.
Statistical inference can be divided into two main areas: estimation and hypothesis testing.
Estimation is concerned with estimating the values of specific population parameters; hypothesis
testing is concerned with testing whether the value of a population parameter is equal to some
specific value.
9.1 Point and interval estimation of the mean
Point estimate: In point estimation, a single sample statistic (such as X̄, S or p̂) is calculated from
the sample to provide an estimate of the true value of the corresponding population parameter
(such as μ, σ or p). Such a single statistic is termed a point estimator, and the specific value of
the statistic is termed a point estimate. For example, the sample mean X̄ is an estimator for the
population mean μ, and x̄ = 10 is an estimate, which is one of the possible values of X̄.
Interval estimate: In most practical problems, a point estimate does not provide information about
how close the estimate is to the population parameter unless it is accompanied by a statement of the
possible sampling error involved, based on the sampling distribution of the statistic. Hence, an
interval estimate of a population parameter is a confidence interval with a statement of confidence
that the interval contains the parameter value.
An interval estimate of a population parameter θ consists of two bounds within which the
parameter will be contained:
L ≤ θ ≤ U
where L is the lower bound and U is the upper bound.
Case 1: When the population is normal.
✓ If the variance σ² is known, the sampling distribution of the sample mean X̄ is normal
with mean μ and variance σ²/n, i.e., X̄ ~ N(μ, σ²/n), and Z = (X̄ − μ)/(σ/√n) ~ N(0, 1).
✓ If the variance σ² is unknown, t = (X̄ − μ)/(S/√n) will have a t-distribution with
n − 1 degrees of freedom. Moreover, as the sample size increases, t is approximately the
same as the standard normal.
Consider the case where σ² is known; we can derive a (1 − α)100% confidence interval for the
population mean μ.
Let Zα/2 be a point on the standard normal curve that cuts an area of α/2 to the right, i.e.
P(Z ≥ Zα/2) = α/2. By the symmetry of the normal distribution, P(Z ≤ −Zα/2) = α/2 (see the diagram
below).
From the standard normal distribution, we know that
P(−Zα/2 ≤ Z ≤ Zα/2) = 1 − α
(Diagram: standard normal curve with central area 1 − α and tail areas α/2 in each tail.)
To obtain the limits of the interval estimate, we use the standardized form of X̄ in the above
probability statement, i.e., letting Z = (X̄ − μ)/(σ/√n),
P(−Zα/2 ≤ Z ≤ Zα/2) = 1 − α becomes
P(−Zα/2 ≤ (X̄ − μ)/(σ/√n) ≤ Zα/2) = 1 − α
P(−Zα/2 σ/√n ≤ X̄ − μ ≤ Zα/2 σ/√n) = 1 − α
P(−X̄ − Zα/2 σ/√n ≤ −μ ≤ −X̄ + Zα/2 σ/√n) = 1 − α
P(X̄ − Zα/2 σ/√n ≤ μ ≤ X̄ + Zα/2 σ/√n) = 1 − α
We can assert with probability 1 − α that the interval (X̄ − Zα/2 σ/√n , X̄ + Zα/2 σ/√n) contains
the population mean we are estimating.
The end points of the interval, X̄ − Zα/2 σ/√n and X̄ + Zα/2 σ/√n, are called confidence limits, and the
probability 1 − α is called the degree of confidence.
In a similar way, a (1 − α)100% confidence interval for the population mean μ with unknown
variance σ² is given by
(X̄ − tα/2(n − 1) S/√n , X̄ + tα/2(n − 1) S/√n)
where tα/2(n − 1) is the critical value of the t statistic providing an area of α/2 in the right tail of
the t-distribution with n − 1 degrees of freedom.
For a sample of 6 observations with X̄ = 2.28, S = 0.95 and t0.025(5) = 2.571, the 95% confidence
interval is
(2.28 − 2.571 (0.95/√6) , 2.28 + 2.571 (0.95/√6))
= (2.28 − 0.997, 2.28 + 0.997)
= (1.28, 3.27)
We are 95% confident that the mean drop in blood pressure lies in between 1.28 mmHg and 3.27
mmHg for the sampled population.
Example 9.2: Punctuality of patients in keeping appointments is of interest to a research team. In
a study of patient flow through the offices of general practitioners, it was found that a sample of
35 patients was, on average, 17.2 minutes late for appointments. Previous research had shown
the standard deviation to be about 8 minutes. The population distribution was felt to be non-normal.
What is the 90 percent confidence interval for the true mean amount of time late for appointments?
Solution: Given: X̄ = 17.2, σ = 8, n = 35
(1 − α)100% = 90%, so 1 − α = 0.90, α = 0.1 and α/2 = 0.05.
Since the sample size is fairly large (n > 30), and since the population standard deviation is known,
according to the central limit theorem, the sampling distribution of sample mean is approximately
normal. Thus, a confidence interval of the population mean is given by:
(X̄ − Zα/2 σ/√n , X̄ + Zα/2 σ/√n)
And from the standard normal distribution table, Zα/2 = Z0.05 = 1.65. Hence the interval is
(17.2 − 1.65 (8/√35) , 17.2 + 1.65 (8/√35))
= (17.2 − 2.2, 17.2 + 2.2)
= (15.0, 19.4)
Therefore, the 90% confidence interval for true mean amount of time late for appointment is
between 15.0 and 19.4 minutes.
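A compact check of Example 9.2 (Python sketch, standard library only; 1.645 is used for the upper 5% point of the standard normal distribution, which the note rounds to 1.65):

    from math import sqrt

    xbar, sigma, n = 17.2, 8, 35
    z = 1.645                        # upper 5% point of the standard normal distribution
    margin = z * sigma / sqrt(n)     # margin of error

    print(xbar - margin, xbar + margin)   # roughly (15.0, 19.4)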
Step 1: State the null hypothesis ( H 0 ) and alternative hypothesis ( H 1 )
Null hypothesis ( H 0 ): refers to a hypothesized numerical value of the population parameter which
is initially assumed to be true. The null hypothesis is always expressed in the form of an equation
making a claim regarding the specific value of the population parameter. That is, for example
H0: μ = μ0
where μ0 is the hypothesized value of the population mean.
Alternative hypothesis (H1): is the logical opposite of the null hypothesis. The alternative
hypothesis states that the specific population parameter value is not equal to the value stated in the
null hypothesis. For example,
H1: μ ≠ μ0 (two-sided test)
H1: μ > μ0 or H1: μ < μ0 (one-sided test)
Step 2: State the level of significance (alpha) for the test
The level of significance is the probability of wrongly rejecting the null hypothesis H0 when it is
actually true. It is specified by the statistician or the researcher before the sample is drawn. The
most commonly used values of α are 0.10, 0.05 or 0.01.
Step 3: Calculate the appropriate test statistic
Test statistic is a value computed from a sample that is used to determine whether the null
hypothesis has to be rejected or not. The choice of suitable test statistic depends on the sampling
distribution of the sample statistic. Accordingly, we have the following cases:
Case 1: When the population is normal.
✓ If the variance σ² is known, the sampling distribution of the sample mean X̄ is normal
with mean μ and variance σ²/n, i.e., X̄ ~ N(μ, σ²/n), and the test statistic is
Z = (X̄ − μ0)/(σ/√n) ~ N(0, 1).
✓ If the variance σ² is unknown, the test statistic is t = (X̄ − μ0)/(S/√n) ~ t(n − 1).
critical value. For a specified α, we read the critical values from the Z or t tables, depending on
the test statistic chosen.
(Diagrams: for a two-tailed test, the acceptance region of area 1 − α lies around μ = μ0 and the
rejection regions, each of area α/2, lie beyond the critical values ±Zα/2; for a right-tailed test,
the rejection region of area α lies to the right of Zα.)
iii. For H1: μ < μ0 (left-tailed test), reject H0 if Z < −Zα.
(Diagram: rejection region of area α to the left of −Zα, acceptance region of area 1 − α to its right.)
In summary, for a test of H0: μ = μ0:
Reject H0 if |Z| > Zα/2 (two-tailed), Z > Zα (right-tailed) or Z < −Zα (left-tailed), when σ² is known;
Reject H0 if |t| > tα/2(n − 1) (two-tailed), t > tα(n − 1) (right-tailed) or t < −tα(n − 1) (left-tailed), when σ² is unknown.
                          Null hypothesis (H0)
Decision          True                    False
Reject H0         Type I error (α)        Correct decision
Accept H0         Correct decision        Type II error (β)
A Type I error is committed if we reject the null hypothesis when it is true. The probability of
committing a Type I error, denoted by α, is called the level of significance. The probability level
of this error is decided by the decision-maker before the hypothesis test is performed. A Type II
error is committed if we do not reject the null hypothesis when it is false. The probability of
committing a Type II error is denoted by β (the Greek letter beta). As the probability of a Type I
error increases, the probability of a Type II error decreases (they are inversely related). Hence we
cannot reduce both errors simultaneously. As the sample size increases, both errors will decrease.
Example 9.3: The life expectancy of people in the year 1999 in a country is expected to be 50
years. A survey was conducted in eleven regions of the country and the data obtained, in years, are
given below:
Life expectancy (years): 54.2, 50.4, 44.2, 49.7, 55.4, 47.0, 58.2, 56.6, 61.9, 57.5, and 53.4.
Do the data confirm the expected view? (Assuming normal population) Use 5% level of
significance.
Solution: Let μ be the life expectancy of people in the year 1999 in the country.
1. H0: μ = 50 (The life expectancy of people in the year 1999 in the country is 50 years)
H1: μ ≠ 50 (The life expectancy of people in the year 1999 in the country is different from
50 years)
2. Level of significance, α = 0.05.
3. Since σ is unknown and the population is normal, the t-test statistic is appropriate.
Given: n = 11; μ0 = 50, and we need to compute X̄ and S.
X̄ = (Σ xi)/n = (54.2 + 50.4 + ..... + 57.5 + 53.4)/11 = 598.5/11 = 54.41
S² = [Σ xi² − (Σ xi)²/n]/(n − 1) = [32799.91 − (598.5)²/11]/10 = (1/10)(236.07) = 23.607
S = √23.607 = 4.859
Then, the t-test statistic is calculated as:
t = (X̄ − μ0)/(S/√n) = (54.41 − 50)/(4.859/√11) = 4.41/1.465 = 3.01
4. For α = 0.05 and a two-tailed test, the critical (table) value is:
tα/2(n − 1) = t0.05/2(11 − 1) = t0.025(10) = 2.228
(Diagram: t-distribution with rejection regions of area 0.025 beyond −2.228 and 2.228.)
Since t = 3.01 > tα/2(n − 1) = 2.228, reject the null hypothesis H0. That is, the calculated
t value lies in the rejection region (the shaded region).
5. Conclusion: The data do not confirm the expected view. That is, the life expectancy is
different from 50 years at the 5% level of significance.
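The arithmetic of step 3 can be checked with a short sketch (Python, standard library only) that works from the summary quantities used in the solution (n, Σxi and Σxi²):

    from math import sqrt

    n, mu0 = 11, 50
    sum_x, sum_x2 = 598.5, 32799.91            # summary quantities from the solution

    xbar = sum_x / n                           # sample mean (~54.41)
    s2 = (sum_x2 - sum_x**2 / n) / (n - 1)     # sample variance (~23.607)
    s = sqrt(s2)                               # sample standard deviation (~4.859)

    t = (xbar - mu0) / (s / sqrt(n))           # t statistic with n - 1 = 10 degrees of freedom
    print(xbar, s, t)                          # t ~ 3.01 > 2.228, so H0 is rejected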
Example 9.4: Suppose that we want to test the hypothesis with a significance level of .05 that the
climate has changed since industrialization. Suppose that the mean temperature throughout history
is 50 degrees. During the last 40 years, the mean temperature has been 51 degrees and the
population standard deviation is 2 degrees. What can we conclude?
Solution:
Let μ be the mean temperature.
1. H0: μ = 50 (There has been no change in temperature since industrialization)
H1: μ ≠ 50 (There has been a change in temperature since industrialization)
2. Level of significance, α = 0.05.
3. Since n = 40 is large, the Z-test statistic is appropriate.
Given: n = 40; σ = 2; X̄ = 51; μ0 = 50
Z = (X̄ − μ0)/(σ/√n) = (51 − 50)/(2/√40) = 1/0.316 = 3.16
4. For α = 0.05 and a two-tailed test, the critical (table) value is:
Zα/2 = Z0.025 = 1.96
(Diagram: standard normal curve with rejection regions of area 0.025 beyond −1.96 and 1.96.)
Since Z = 3.16 > Zα/2 = Z0.025 = 1.96, reject the null hypothesis H0. That is, the
calculated Z value lies in the rejection region (the shaded region).
5. Conclusion: There has been a change in temperature since industrialization, at the 5% level of
significance.
Example 9.5: A study was conducted to describe the menopausal status, menopausal symptoms,
energy expenditure and aerobic fitness of healthy midlife women and to determine relationships
among these factors. Among the variables measured was maximum oxygen uptake (Vo2max). The
mean Vo2max score for a sample of 242 women was 33.3 with a standard deviation of 12.14. On
the basis of these data, can we conclude that the mean score for a population of such women is
greater than 30? Use a 5% level of significance.
Solution:
Let μ be the mean Vo2max score for the population of healthy midlife women.
1. H0: μ = 30 (The mean score for the population of healthy midlife women is 30)
H1: μ > 30 (The mean score for the population of healthy midlife women is greater than
30).
2. Level of significance, α = 0.05.
3. Since n = 242 is large, the Z-test statistic is appropriate.
Given: n = 242; S = 12.14; X̄ = 33.3; μ0 = 30
Z = (X̄ − μ0)/(S/√n) = (33.3 − 30)/(12.14/√242) = 3.3/0.7804 = 4.23
4. For α = 0.05 and a right-tailed test, the critical (table) value is:
Zα = Z0.05 = 1.65
(Diagram: standard normal curve with rejection region of area 0.05 to the right of 1.65.)
Since Z = 4.23 > Zα = 1.65, reject the null hypothesis H0. That is, the calculated Z value
lies in the rejection region (the shaded region).
5. Conclusion: The mean Vo2max score for the sampled population of healthy midlife women
is greater than 30 at the 5% level of significance.
9.3 Test of Association (Independence)
Usually we encounter nominal scale data. The χ² test of association is useful for determining
whether any relationship or association exists between two nominal variables. For instance,
we might be interested in the relationship between HIV status and sex, lung cancer and smoking
habit, political affiliation and sex, etc.
When observations are classified according to two variables or attributes and arranged in a table,
the display is called a contingency table, as shown below.
The test of association or independence uses the contingency table format. Here the variables A
and B have been classified into mutually exclusive categories. The value Oij in row i and column
j of the table shows the observed frequency falling in the joint category (i, j). The row and
column totals are the sums of their corresponding frequencies. The sum of the row or column totals
gives the grand total n, which represents the sample size. The procedure for testing the association
between the two variables is summarized as follows:
Step 1: State the null and alternative hypotheses
H0: There is no association or relationship between the two variables, that is, the two
variables are independent.
H1: There is an association or relationship between the two variables, that is, the two
variables are dependent.
Step 2: State the level of significance, α.
Step 3: Calculate the expected frequencies, Eij, corresponding to the observed frequency in row i
and column j. The expected frequencies in each cell are calculated as:
Eij = (Row i total × Column j total) / Sample size = Ri Cj / n
Step 4: Compute the value of the test statistic:
χ²cal = Σᵢ Σⱼ (Oij − Eij)²/Eij,   i = 1, ..., r; j = 1, ..., c
where Oij is the observed frequency in row i and column j, and Eij is the expected frequency in
row i and column j.
Step 5: Find the critical (table) value χ²α(df) (from the Appendix). The value χ²α corresponds
to an area α in the right tail of the distribution,
where df = (Number of rows − 1)(Number of columns − 1) = (r − 1)(c − 1).
Step 6: Compare the calculated and table values of χ². Decide whether the variables are
independent or not, using the following decision rule:
Reject H0 if χ²cal is greater than χ²α(df). Otherwise do not reject H0.
Example 9.6: The following data on the colour of eye and hair for 6800 individuals were obtained
from a source:
                             Eye colour
Hair colour    Fair    Brown    Black    Red    Total
Blue           1768    808      190      47     2813
Green          946     1387     746      43     3122
Brown          115     444      288      18     865
Total          2829    2639     1224     108    6800
Test the hypothesis that hair colour and eye colour are independently distributed (there is no
association between colour of eye and colour of hair) at the level α = 0.01.
Solution:
1. H 0 : There is no association between hair colour and eye colour.
H 1 : There is association between hair colour and eye colour.
2. α = 0.01.
3. Calculate the expected frequencies, Eij = Ri Cj / n:
E11 = (2813 × 2829)/6800 = 1170.29,  ……,  E14 = (2813 × 108)/6800 = 44.68
E31 = (865 × 2829)/6800 = 359.87,  ……,  E34 = (865 × 108)/6800 = 13.74
Therefore, the contingency table of expected frequencies is as follows:
                             Eye colour
Hair colour    Fair       Brown      Black     Red      Total
Blue           1170.29    1091.69    506.34    44.68    2813
Green          1298.84    1211.61    561.96    49.58    3122
Brown          359.87     335.70     155.70    13.74    865
Total          2829       2639       1224      108      6800
4. Calculate the test statistic:
χ²cal = Σᵢ Σⱼ (Oij − Eij)²/Eij
      = (1768 − 1170.29)²/1170.29 + ..... + (47 − 44.68)²/44.68 + (946 − 1298.84)²/1298.84 + .....
        + (43 − 49.58)²/49.58 + (115 − 359.87)²/359.87 + ..... + (18 − 13.74)²/13.74
      = 1074.43
5. Critical value χ²α(df):
df = (r − 1)(c − 1) = (3 − 1)(4 − 1) = (2)(3) = 6
χ²α(df) = χ²0.01(6) = 16.812
6. Since χ²cal = 1074.43 > χ²0.01(6) = 16.812, reject H0.
7. Conclusion: There is an association between hair colour and eye colour. That is, hair colour
and eye colour are dependent.
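If SciPy is available (an assumption about the working environment), the whole calculation of Example 9.6 can be reproduced with scipy.stats.chi2_contingency, which returns the χ² statistic, the p-value, the degrees of freedom and the table of expected frequencies:

    from scipy.stats import chi2_contingency

    # observed frequencies from Example 9.6
    # rows: hair colour (Blue, Green, Brown); columns: eye colour (Fair, Brown, Black, Red)
    observed = [
        [1768,  808, 190, 47],
        [ 946, 1387, 746, 43],
        [ 115,  444, 288, 18],
    ]

    chi2_stat, p_value, df, expected = chi2_contingency(observed)
    print(chi2_stat, df)      # ~1074 with 6 degrees of freedom
    print(p_value < 0.01)     # True: reject H0 at the 1% level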