STAT 111: Introduction to Statistics & Probability for Actuaries

Dr. Perpetual Saah Andam

Department of Statistics & Actuarial Science


(University of Ghana)
LECTURE ONE - OUTLINE
• Nature of Statistics
• Branches of Statistics
• Data types & Variables
• Measurement Scales for Variables
• Sources of Data
• Uses of Statistical Data

Dr. P. S. Andam 2
NATURE OF STATISTICS
• Statistics refers to the theory of information with inference making as
objective.

Theory of information here includes:


• Data Collection
• Summarization &
• Analyzing data

Inference making is concerned with making generalizations about a population based on a sample chosen from the population.
DEFINITION OF TERMS
• Population refers to the set of existing units or all objects under a study.
• Sample is a subset of the population.
• The process of choosing a sample from a population is known as sampling.
Sampling can be done in a number of ways:
1. Random Sampling
2. Cluster Sampling
3. Stratified Sampling
4. Purposive Sampling
5. Systematic Sampling etc.

BRANCHES OF STATISTICS

Descriptive Statistics
• Concerned with data collection, summary techniques & analysis of the results.

Inferential Statistics
• Deals with making decisions about the population based on a sample chosen from the population.

DATA
• Data is the raw material of statistics. It refers to unprocessed facts and
figures from which conclusions can be drawn.
• Data can either be:

Categorical or Qualitative
• Cannot be expressed as numerical values though numbers can be
assigned to them.
Quantitative
• Results from either counting or measuring. Also known as numerical.
Examples of Data Types
• Classify the following as either categorical or quantitative:

• Temperature of patients in a hospital
• Complexion of Level 100 actuarial students
• Performance grouped as: Good, better & best
• Number of letters in the English alphabet
• Heights of athletes in UG

VARIABLES
• A variable refers to any characteristic of an object under a survey.
• It is either measured (or counted) or categorized.
• Examples of variables include:
Age, height, sex, performance in a quiz, time, marital status etc.

• Variables can either be quantitative or qualitative.


• Quantitative variables are numerical.
• Alternatively, Qualitative variables can only be categorized or
grouped.
VARIABLES CONT’D
• Quantitative variables can also be Discrete or Continuous.

On one hand, they are discrete if they assume either a finite or a countably infinite number of values.

On the other hand, they are continuous if they take on an uncountably infinite number of values.

Example: Classify the following as either discrete or continuous variables:
Weight, half-life, age, length, number of females in a class, car numbers.

MEASUREMENT SCALES FOR VARIABLES
Qualitative Variables
• Nominal Scale: measurement that classifies values into mutually exclusive
groups where order/rank is unimportant.
• Ordinal Scale: measurement that classifies data also into mutually exclusive
categories that can be ranked.

Example: State the scale of measurement that will be used for each of the following variables:
Political party, performance on an IQ test (Pass, Fail), awards at a ceremony (1st position, 1st runner-up,…), religion, sex, blood group (A, B, O, AB), insurance claim severity (Very high, high, moderate,…) etc.
MEASUREMENT SCALES FOR VARIABLES CONT’D
Quantitative Variables:
• Interval Scale: ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
NB: temperature (0 °C does not mean total absence of temperature), IQ (a score of zero does not indicate no intelligence).
• Ratio Scale: possesses all properties of the interval scale, and also the ratio between any two values is meaningful.
NB: highest level of measurement & there exists a true zero (0).
Eg: Waiting time (zero means did not wait), treatment cost (zero means paid nothing).
SUMMARY OF MEASUREMENT SCALES & VARIABLES

Variables
• Quantitative (Discrete or Continuous): the Interval & Ratio Scales are used for quantitative data.
• Qualitative/Categorical: the Nominal & Ordinal Scales are used for qualitative data.
SOURCES OF STATISTICAL DATA

Primary Sources
• Primary data is obtained when researchers originally collect
data from designing experiments/conducting surveys. Eg:
interviews, giving questionnaires

Secondary Sources
• Secondary data are collected from other sources such as
libraries, the internet or corporate bodies.
USES OF STATISTICS (STATISTICAL DATA)
• Financial Planning
• Problem solving
• Political & economic decision-making
• Employment Opportunities

LECTURE TWO - OUTLINE
• Meaning of Data Reduction
• Describing Categorical & Quantitative Data using:
1. Frequency tables
2. Graphical Techniques (pie chart, bar chart, stem-and-leaf plot, dot plot, cumulative frequency diagram & histograms)

DATA REDUCTION
• Data in its natural form is large or meaningless.
• The process of putting data in such a way that meaning can be
made is known as Data Reduction.
• Data reduction is a step in the data mining process.

• Data mining comprises the use of statistical procedures as well as techniques from computer science to extract useful information from a large or meaningless dataset.
SUMMARIZING CATEGORICAL DATA
• Frequency Distribution
- Raw data is organized in table form by class and frequency
- Classes are mutually exclusive (i.e. non-overlapping)
Eg: The following are the drink preferences of 10 lecturers in UG:
Malt, Malt, Club, Pepsi, Malt, Club, Pepsi, Pepsi, Malt, Malt.
Drink Preference Number of lecturers
Club 2
Malt 5
Pepsi 3
SUMMARIZING CATEGORICAL DATA CONT’D
• $\text{Relative frequency} = \dfrac{\text{frequency}}{\sum \text{frequency}}$
• When the relative frequency is multiplied by 100, the result is the
percent frequency. As the name suggests, the category values can be
converted to percentages following this approach.
• NB: $\sum \text{Relative frequency} = 1$
Drink Preference    Number of Lecturers    Relative Frequency
Club                2                      2/10
Malt                5                      5/10
Pepsi               3                      3/10
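The frequency and relative frequency columns above can be reproduced with a short Python sketch. The variable names are illustrative only; the data is the list of drink preferences from the example.

```python
from collections import Counter

# Drink preferences of the 10 lecturers, as listed in the example.
drinks = ["Malt", "Malt", "Club", "Pepsi", "Malt",
          "Club", "Pepsi", "Pepsi", "Malt", "Malt"]

freq = Counter(drinks)                                # frequency of each class
total = sum(freq.values())                            # sum of all frequencies
rel_freq = {d: f / total for d, f in freq.items()}    # relative frequency
pct_freq = {d: 100 * r for d, r in rel_freq.items()}  # percent frequency

for drink in sorted(freq):
    print(f"{drink:<6}{freq[drink]:>3}{rel_freq[drink]:>6.2f}{pct_freq[drink]:>7.1f}%")
```

The relative frequencies sum to 1, as the NB on the slide states.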
SUMMARIZING CATEGORICAL DATA CONT’D
• Exercise
A partial relative frequency distribution is given by the table below:
Year      Relative Frequency
First     0.122
Second    0.180
Third     0.400
Fourth    _
(i) What is the relative frequency in the fourth year?
(ii) If the total sample is 200, what is the frequency of the fourth year?
(iii) Show the frequency distribution.

SUMMARIZING CATEGORICAL DATA CONT’D
PIE & BAR CHARTS
• Both useful for displaying categorical data with small number of classes.
• In a pie chart, the segment area represents the category value.

• In a bar chart, the height of a bar represents the value.

• Generally bar charts are better for display. Relative lengths are easier to
judge than relative areas.

SUMMARIZING CATEGORICAL DATA CONT’D

Example: Pie & Bar Charts
A questionnaire provides 58 yes, 42 no & 20 no-opinion answers.
• Construct a (i) Pie Chart (ii) Bar Graph
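Before drawing either chart, it helps to tabulate what each one needs. The sketch below (names are illustrative) computes the share of each answer and the central angle of its pie slice, on the assumption that slice angles are proportional to relative frequency; the bar chart simply uses the counts as bar heights.

```python
# Questionnaire answers from the example.
answers = {"Yes": 58, "No": 42, "No opinion": 20}
total = sum(answers.values())

# Each pie slice's central angle is proportional to its relative frequency.
angles = {k: 360 * v / total for k, v in answers.items()}

# A bar chart would use the counts themselves as bar heights.
for k, v in answers.items():
    print(f"{k:<11} count={v:>3}  share={v / total:.3f}  angle={angles[k]:.1f} deg")
```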

SUMMARIZING QUANTITATIVE DATA CONT’D
• Ungrouped frequency distributions
This is meant for discrete data.
Eg: The following are the ages of 10 Level 100 actuarial students in UG:
16, 17, 17, 18, 18, 18, 18, 19, 19, 20.
Ages (x)    Tally    Frequency (f)
16          /        1
17          //       2
18          ////     4
19          //       2
20          /        1
SUMMARIZING QUANTITATIVE DATA CONT’D
• Grouped Frequency Distributions
- Used for continuous data.
- As a ‘rule of thumb’ the number of classes is given by $\sqrt{N}$, where N is the number of observations
- One way to determine the class width is to use the formula:
$\text{Class Width} = \dfrac{\text{Highest Value} - \text{Lowest Value}}{\text{Number of Classes}}$

SUMMARIZING QUANTITATIVE DATA CONT’D
The following relates to marks obtained by Level 400 Actuarial Science
Students in UG:
52, 98, 85, 59, 92, 61, 81, 88, 58, 72, 72, 57, 78, 65, 62, 69, 80, 58, 60,74.
The frequency table is shown as follows:
Marks Tally Frequency Relative frequency Cumulative frequency

51 – 60 //// / 6 0.30 6
61 – 70 //// 4 0.20 10
71 – 80 //// 5 0.25 15
81 – 90 /// 3 0.15 18
91 – 100 // 2 0.10 20
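The table above can be rebuilt programmatically. This is a minimal sketch using the class limits shown in the table; the variable names are illustrative.

```python
marks = [52, 98, 85, 59, 92, 61, 81, 88, 58, 72,
         72, 57, 78, 65, 62, 69, 80, 58, 60, 74]
classes = [(51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]

n = len(marks)
rows = []
cumulative = 0
for low, high in classes:
    f = sum(low <= m <= high for m in marks)   # class frequency
    cumulative += f                            # running (cumulative) frequency
    rows.append((f"{low}-{high}", f, f / n, cumulative))

for label, f, rel, cum in rows:
    print(f"{label:<8}{f:>3}{rel:>7.2f}{cum:>4}")
```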
SUMMARIZING QUANTITATIVE DATA CONT’D
Stem & Leaf Plots
• Suitable for data measured by the interval and ratio scales of measurement.
• The actual data values are included in this graph.
• A stem plot consists of a series of horizontal rows of numbers.
• Each row is labeled with a number called its stem.
• All numbers that follow the stem are called the leaves.
• Useful for a small set of numeric data.
• Gives an impression of location, spread and shape of values.

SUMMARIZING QUANTITATIVE DATA CONT’D
• Example;
Question
• Consider the distribution of aptitude scores policy: 200, 204, 209, 210, 211, 212, 217, 218, 218, 222, 227, 227, 227, 228, 230, 231, 233, 237, 238, 241, 242, 242, 243, 247, 251, 251, 253, 254, 256, 260.
Solution: Stem & Leaf Plot
Key: Stem: Hundreds & Tens; Leaf: Ones
20 | 0 4 9
21 | 0 1 2 7 8 8
22 | 2 7 7 7 8
23 | 0 1 3 7 8
24 | 1 2 2 3 7
25 | 1 1 3 4 6
26 | 0
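The stem-and-leaf plot can be generated by splitting each score into a stem (hundreds and tens) and a leaf (ones). The sketch below is one way to do this in Python; the names are illustrative.

```python
from collections import defaultdict

scores = [200, 204, 209, 210, 211, 212, 217, 218, 218, 222,
          227, 227, 227, 228, 230, 231, 233, 237, 238, 241,
          242, 242, 243, 247, 251, 251, 253, 254, 256, 260]

stems = defaultdict(list)
for s in sorted(scores):
    stem, leaf = divmod(s, 10)   # stem: hundreds & tens, leaf: ones
    stems[stem].append(leaf)

for stem in sorted(stems):
    print(f"{stem} | {''.join(str(leaf) for leaf in stems[stem])}")
```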
SUMMARIZING QUANTITATIVE DATA CONT’D
Example: The following relates to marks obtained by Level 400 Actuarial
Science Students in UG:
52, 98, 85, 59, 92, 61, 81, 88, 58, 72, 72, 57, 78, 65, 62, 69, 80, 58, 60,74.
Draw a
(a) Cumulative frequency polygon
(b) Ogive
(c) Histogram, for the distribution.
(d) Dot plot

LECTURE THREE - OUTLINE
Descriptive Statistics for Univariate Data
• Measures of Location or Central Tendency:
- Mean (Arithmetic, Geometric & Harmonic), Mode & Median
• Measures of Dispersion or Spread or Variation:
- Range, Variance, Standard deviation, Coefficient of Variation,
Skewness & Kurtosis.
• Measures of Position
- Deciles, Quartiles & Percentiles
• Exploratory Data Analysis
- Box plots
INTRODUCTION
• When only one characteristic is measured on an experimental
unit, the data obtained is termed Univariate data.
• Graphical displays are not adequate for making inferences.
• As such, numerical measures are pursued.

• When a numerical measure is calculated based on all units of the population, it is known as a population parameter.
• When it is calculated on the sample alone, it is known as a
sample statistic.
MEASURES OF CENTRAL TENDENCY
• These are a set of statistics that locate the center of a dataset.
• They are the mean, mode and the median
• Also known as measures of location.
• They usually denote the most “typical” number in a dataset.
• ‘Usually, measures of central tendency are said to be
representative of all the measurements in a dataset but not
equal to any of them’. This statement is the basis of such
statements as “the average Ghanaian …”
MEASURES OF CENTRAL TENDENCY
The Mean
• There are three types of means; arithmetic, geometric and
harmonic.
• The arithmetic mean (sample mean) is the most widely used.
• The arithmetic mean is commonly called the average.
• It is calculated as $\bar{x} \text{ (sample mean)} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
• The population mean is usually denoted by $\mu$, given by $\mu = \dfrac{\sum_{i=1}^{N} X_i}{N}$
MEASURES OF CENTRAL TENDENCY
• Examples:
(a) The mean of 3 numbers, 10, 20 and x is 17. Find the value of x.
(b) Suppose the average annual income of 4 actuaries is $225,000, would
it be possible for any of them to have an annual income of $900,000?
(c) Five claim costs of an insurance company are $50, $75, $100, $125 and
$150. Suppose the company decides to pay the amounts less $5 each,
what is the mean claim cost?
(d) The cedi equivalents of some investment returns are given by: GHS
300, GHS 500 & GHS 750. find the mean dollar returns if 1 dollar = GHS
5.

MEASURES OF CENTRAL TENDENCY
Properties of the mean
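Given $x_1, x_2, \dots, x_n$ are sample units and $c \in \mathbb{R}$, then:
(a) $\dfrac{(x_1 \pm c) + (x_2 \pm c) + \dots + (x_n \pm c)}{n} = \bar{x} \pm c$
(b) $\dfrac{c x_1 + c x_2 + \dots + c x_n}{n} = c\bar{x}$
(c) $\sum_{i=1}^{n} c_1 x_i + c_2 = c_1 n \bar{x} + c_2$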

NB: (a) translation property (b) Scaling property (c) Linear combination
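The translation and scaling properties can be checked numerically. The sketch below reuses the claim costs and cedi returns from the earlier examples (c) and (d); it is only a check of the properties, not a prescribed solution method.

```python
from statistics import mean

claims = [50, 75, 100, 125, 150]   # claim costs from example (c)
returns_ghs = [300, 500, 750]      # cedi returns from example (d)

# (a) Translation: subtracting 5 from every claim lowers the mean by 5.
mean_after_deduction = mean([c - 5 for c in claims])
assert mean_after_deduction == mean(claims) - 5

# (b) Scaling: converting at 1 dollar = GHS 5 divides the mean by 5.
mean_usd = mean([r / 5 for r in returns_ghs])
assert abs(mean_usd - mean(returns_ghs) / 5) < 1e-9

print(mean_after_deduction, round(mean_usd, 2))
```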

MEASURES OF CENTRAL TENDENCY
The Weighted Mean
• This reflects the relative importance of observations by including their
weights.
• If $x_1, x_2, \dots, x_n$ are $n$ measurements and $w_1, w_2, \dots, w_n$ are their relative importance or weights (NB: $\sum_{i=1}^{n} w_i = 1$), then
$\bar{x}_w \text{ (the weighted mean)} = \sum_{i=1}^{n} w_i x_i$

Eg 1; Emma scored 60, 75 & $X_3$ marks in three different tests. To compute Emma’s grade, the weights to be used are 0.1 & 0.3, corresponding to 60 & 75 respectively. What is Emma’s weighted average?
Eg 2; Show that the arithmetic mean is a special case of the weighted mean.
MEASURES OF CENTRAL TENDENCY
Geometric Mean
• It is given by the nth root of the product of $n$ numbers:
$\bar{x}_G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
• Only applies to measurements of the same sign
• Used when the interest is to find proportional growth
Harmonic Mean
• It is given as $\bar{x}_H = \dfrac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$ and used when the average rate of things happening is desired.
MEASURES OF CENTRAL TENDENCY
Relationship between the Geometric, Harmonic & Arithmetic Means
• For two positive measurements, the relationship between the 3 means is $G^2 = AH$; in general, $A \ge G \ge H$.

Eg 1: Given the measurements 𝑥1 & 𝑥2 , show that 𝐺 2 = 𝐴𝐻.

Eg 2: Find the geometric mean of the following numbers; 2, 8 and 256.

Eg 3: Suppose the distance from my house to class is 40km. If I drove to class at a speed of 40km/h and returned at a speed of 80km/h, calculate the average speed used for the journey.
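The three means can be compared on the two speeds from Eg 3. The sketch below uses Python's standard `statistics` module (`geometric_mean` and `harmonic_mean` require Python 3.8 or later) and cross-checks the harmonic mean against total distance divided by total time.

```python
from statistics import mean, geometric_mean, harmonic_mean

speeds = [40, 80]             # km/h, outbound and return legs of Eg 3

a = mean(speeds)              # arithmetic mean
g = geometric_mean(speeds)    # geometric mean
h = harmonic_mean(speeds)     # harmonic mean: average speed over equal distances

# Cross-check: total distance / total time for the 40 km trip each way.
avg_speed = (40 + 40) / (40 / 40 + 40 / 80)

print(a, round(g, 4), round(h, 4), round(avg_speed, 4))
print(round(geometric_mean([2, 8, 256]), 6))   # the numbers from Eg 2
```

For these two values, `g ** 2` agrees with `a * h` up to floating-point rounding, which is the identity Eg 1 asks for.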
MEASURES OF CENTRAL TENDENCY
Mean of two or more means
• If $n_1, n_2, \dots, n_k$ are the sample sizes of $k$ samples and $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$ are their respective means, then the mean of the $k$ means is given as
$\bar{x}_W = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k}$

Example: Suppose there are 3 sections of this course, and the average scores of the final exams are $\bar{x}_A = 71$ for section A, $\bar{x}_B = 85$ for section B & $\bar{x}_C = 78$ for section C. If the size of the 3 sections is the same, the mean score will be …?
Suppose the sizes are 10 for A, 15 for B & 20 for C, calculate the average score.
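A small sketch of the mean-of-means formula, applied to the section averages in the example. The helper name `combined_mean` is hypothetical, introduced only for illustration.

```python
def combined_mean(sizes, means):
    """Mean of several sample means, weighted by sample size."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

section_means = [71, 85, 78]                          # sections A, B, C

equal = combined_mean([1, 1, 1], section_means)       # equal section sizes
unequal = combined_mean([10, 15, 20], section_means)  # sizes 10, 15, 20

print(equal, round(unequal, 2))
```

With equal sizes the formula reduces to the ordinary arithmetic mean of the three averages.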
MEASURES OF CENTRAL TENDENCY
The Median
• This is the middle number in a data set.
• It indicates that 50% of the data elements lie on each side of it.
• To find the median, data elements have to be arranged in order (either
ascending or descending).
• For ungrouped data, the median $\hat{m}$ corresponds to:
$\hat{m} = \left(\dfrac{n+1}{2}\right)\text{th position}$, if $n$ is odd &
$\hat{m} = \dfrac{\left(\frac{n}{2}\right)\text{th position} + \text{next position}}{2}$, if $n$ is even
MEASURES OF CENTRAL TENDENCY
• However, for grouped data, the median is
$\hat{m} = L_f + \left(\dfrac{\frac{\sum f}{2} - \sum f_m}{f_m}\right) C$, where
- $L_f$ = lower class boundary of median class
- $\sum f$ = total frequency
- $\sum f_m$ = sum of all frequencies lower than the median class
- $f_m$ = frequency of the median class
- $C$ = class size of median class

MEASURES OF CENTRAL TENDENCY
The Mode
• It is the data element with the highest frequency.
• It can be used as a measure of central tendency for categorical data
• For grouped (numerical) data, it is calculated as:
$\text{Mode} = L_m + \left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right) C$, where
-𝑳𝒎 = lower class boundary of the modal class
- ∆𝟏 = excess of modal class’ frequency over the frequency of the next
lower class
- ∆𝟐 = excess of modal class’ frequency over the frequency of the
next upper class
MEASURES OF CENTRAL TENDENCY
Example;
The table below is a collection of the marks obtained by level 400 Stats
students in a quiz. Calculate the mean, mode & median of the
distribution.
Marks Frequency
51 – 60 4
61 – 70 6
71 – 80 5
81 – 90 3
91 – 100 2
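The grouped-data formulas for the mean, median and mode can be applied to this table as follows. This is a sketch under two assumptions that the slides do not state explicitly: class boundaries lie 0.5 below each lower limit (e.g. 60.5 for the class 61 – 70), and the grouped mean uses class midpoints weighted by frequency.

```python
# (lower limit, upper limit, frequency) for each class in the table.
table = [(51, 60, 4), (61, 70, 6), (71, 80, 5), (81, 90, 3), (91, 100, 2)]
n = sum(f for _, _, f in table)
width = 10   # class size C, measured boundary to boundary (assumed)

# Mean: class midpoints weighted by frequency.
grouped_mean = sum((lo + hi) / 2 * f for lo, hi, f in table) / n

# Median: first class whose cumulative frequency reaches n/2.
cum = 0
for lo, hi, f in table:
    if cum + f >= n / 2:
        grouped_median = (lo - 0.5) + (n / 2 - cum) / f * width
        break
    cum += f

# Mode: class with the highest frequency, compared with its neighbours.
freqs = [f for _, _, f in table]
i = freqs.index(max(freqs))
d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)
d2 = freqs[i] - (freqs[i + 1] if i < len(freqs) - 1 else 0)
grouped_mode = (table[i][0] - 0.5) + d1 / (d1 + d2) * width

print(grouped_mean, grouped_median, round(grouped_mode, 2))
```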
MEASURES OF DISPERSION
• Dispersion or Variability is the degree of spread of numerical
data about an average.

• Measures of dispersion are important since two distributions may have the same mean but different shapes. Thus, they provide a further approach to understanding a data set.
• The simplest is the range and the most widely used is the
variance.

MEASURES OF DISPERSION
The Range
• Mathematically,
𝑹𝒂𝒏𝒈𝒆 = 𝑴𝒂𝒙𝒊𝒎𝒖𝒎 𝒗𝒂𝒍𝒖𝒆 − 𝑴𝒊𝒏𝒊𝒎𝒖𝒎 𝒗𝒂𝒍𝒖𝒆
• Although it is very easy to compute, it is largely affected by outliers and is also not meaningful for categorical data.
• For grouped data, the range is given by:
$\text{Range} = \text{Upper class boundary of the highest class} - \text{Lower class boundary of the lowest class}$

MEASURES OF DISPERSION
Variance and Standard deviation
• Most commonly used measures of dispersion
• Unlike the range, the variance and the standard deviation involve all elements within a dataset.
• The intuition behind these statistics is that we want a statistic that is
- small when observations are clustered around the mean &
- large when they are spread out
• The relationship between the variance and the standard deviation is
$\text{Standard deviation} = \sqrt{\text{Variance}}$
MEASURES OF DISPERSION
• The sample variance ($s^2$) is calculated as:
$s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
i.e. the sum of the squared deviations from the sample mean divided by (n – 1), where n = sample size.
• The population variance ($\sigma^2$) is given by:
$\sigma^2 = \dfrac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}$
i.e. the sum of the squared deviations from the population mean divided by n, where n = population size.

MEASURES OF DISPERSION
Properties of the Variance (Standard deviation)
• It is not affected by translation
If $x_1, x_2, \dots, x_n$ are sample units with variance, say, $s^2 = \pi$, then $(x_1 \pm c), (x_2 \pm c), \dots, (x_n \pm c)$ also have variance $s^2 = \pi$ for $c \in \mathbb{R}$.
• It responds to scaling
If $x_1, x_2, \dots, x_n$ are sample units with variance, say, $s^2 = \pi$, then $(c x_1), (c x_2), \dots, (c x_n)$ have variance $c^2 \pi$ for $c \in \mathbb{R}$.

Hence, $Var(aX + b) = a^2 Var(X)$, where a & b are constants.
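Both properties can be verified numerically. The sketch below uses the ages of the 10 Level 100 students from the earlier ungrouped frequency example; the shift of 3 and the scale factor of 2 are arbitrary illustrative constants.

```python
from math import isclose
from statistics import variance, pvariance, stdev

ages = [16, 17, 17, 18, 18, 18, 18, 19, 19, 20]   # ages from the earlier example

s2 = variance(ages)        # sample variance, divides by n - 1
sigma2 = pvariance(ages)   # population variance, divides by n

shifted = [x + 3 for x in ages]   # translation by c = 3
scaled = [2 * x for x in ages]    # scaling by c = 2

print(s2, sigma2, stdev(ages))
print(isclose(variance(shifted), s2))       # translation leaves the variance unchanged
print(isclose(variance(scaled), 4 * s2))    # scaling multiplies it by c**2
```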


MEASURES OF DISPERSION
Coefficient of Variation
• A demerit of the SD is that it depends on the unit of the measurements
• To compare the variability among two or more sets of data representing
different quantities with different units of measurements, the Coefficient
of Variation (CV) is used.
• Thus, it is a measure of relative variation.
• It is calculated as:
$CV = \dfrac{\text{standard deviation}}{\text{mean}} \times 100$

MEASURES OF DISPERSION: Example

• Compute the standard deviation:
Marks       Frequency
51 – 60     4
61 – 70     6
71 – 80     5
81 – 90     3
91 – 100    2

• The sample mean and SD of two sets of data, A and B, are:
Mean (A) = 67.57kg; SD (A) = 26.57kg
Mean (B) = 132.55lbs; SD (B) = 36.19lbs
Compare the two sets of data.
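Because sets A and B are in different units, the comparison is made through the coefficient of variation. A minimal sketch, with a hypothetical helper name:

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    return sd / mean * 100

cv_a = coefficient_of_variation(26.57, 67.57)     # data set A, in kg
cv_b = coefficient_of_variation(36.19, 132.55)    # data set B, in lbs

# The CV is unit-free, so A and B can be compared directly.
more_variable = "A" if cv_a > cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_variable)
```

Although B has the larger standard deviation in absolute terms, A has the larger relative variation.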
SHAPE OF DISTRIBUTIONS
• When we talk of shape in Statistics, two concepts worthy of note are:
- Skewness & - Kurtosis
• Skewness
This refers to the degree of asymmetry in a distribution.
• If a distribution is normal, then it is mound-shaped and also the mean,
mode and median all lie in the center of the distribution.
i.e. mean = median = mode

SHAPE OF DISTRIBUTIONS
• Right or positive skewed distributions have a long tail to the right.
• This implies the bulk of the measurements in the distribution lies to the
left
• Thus their mean > median > mode.

• Left or negatively skewed distributions have a long tail to the left.


• This implies the bulk of the measurements in the distribution lies to the
right.
• Thus their mean < median < mode
SHAPE OF DISTRIBUTIONS
• The Coefficient of Skewness (SK) is calculated by:
$SK = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s}\right)^3$
• The skewness of a normal distribution is zero
• If -0.5 < SK < 0.5, then data is slightly skewed (either positively or negatively)
• If -1 < SK < -0.5, then data is moderately negatively skewed (0.5 < SK < 1 means moderately positively skewed)
• If SK < -1, then data is highly negatively skewed (SK > 1 means highly positively skewed).

SHAPE OF DISTRIBUTIONS
Kurtosis
• This refers to the peakedness of a dataset.
• That is, how peaked or flat a distribution is.
• The kurtosis (K) of a normal distribution is 3 (i.e. mesokurtic)
• However, highly /sharply peaked distributions are leptokurtic
(K > 3)
• Flat peaked distributions are platykurtic (K < 3)

SHAPE OF DISTRIBUTIONS

• The Coefficient of Kurtosis (K) is given by:
$K = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s}\right)^4$
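The coefficients of skewness and kurtosis share the same structure: an average of standardized deviations raised to the 3rd or 4th power. The sketch below follows the slides' formulas with the $n - 1$ divisor (other texts and software may use different divisors); the helper name is hypothetical and the data is the set of Level 400 marks used earlier.

```python
from statistics import mean, stdev

def standardized_moment(data, power):
    """(1 / (n - 1)) * sum(((x - mean) / s) ** power), as in the slides' formulas."""
    n, xbar, s = len(data), mean(data), stdev(data)
    return sum(((x - xbar) / s) ** power for x in data) / (n - 1)

marks = [52, 98, 85, 59, 92, 61, 81, 88, 58, 72,
         72, 57, 78, 65, 62, 69, 80, 58, 60, 74]

sk = standardized_moment(marks, 3)   # coefficient of skewness
k = standardized_moment(marks, 4)    # coefficient of kurtosis
print(round(sk, 3), round(k, 3))
```

A perfectly symmetric data set such as 1, 2, 3, 4, 5 gives a skewness of zero under this formula.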

MEASURES OF POSITION
Deciles
• These are statistics that divide data into ten equal parts.
• The kth decile is given by:
$k\text{th decile} = \dfrac{k}{10} \times n\text{th position}$, where n = total freq.
• To find a decile, put the dataset in ascending order.

NB: The 5th decile corresponds to the median.

MEASURES OF POSITION
Quartiles
• These are three statistics that divide a data set into four.
• The first or lower quartile ($Q_1$) is given by:
$Q_1 = \dfrac{1}{4} \times N\text{th position}$
• The second or middle quartile ($Q_2$) is given by:
$Q_2 = \dfrac{1}{2} \times N\text{th position}$
• The third or upper quartile ($Q_3$) is given by:
$Q_3 = \dfrac{3}{4} \times N\text{th position}$
• Interquartile range = $Q_3 - Q_1$
MEASURES OF POSITION
Percentiles
• These are statistics that divide data into 100 equal parts.
• The kth percentile is given by:
$k\text{th percentile} = \dfrac{k}{100} \times n\text{th position}$, where n = total freq.
• To find a percentile, put the dataset in ascending order.

NB: The 50th percentile corresponds to the median.

EXPLORATORY DATA ANALYSIS
• Geared towards analyzing data sets to summarize their main
characteristics
• It often uses visual methods.
• One useful one is the Box plot (Box – and – whisker plot)
• The purpose of exploratory data analysis is to examine data to find out
what information can be discovered about the data such as the center
and the spread.

EXPLORATORY DATA ANALYSIS
Box Plots
• It is constructed using these five specific values;
- Maximum Value
- Minimum value
- Lower Quartile
- Upper Quartile
- Middle Quartile
• These values are called a five-number summary of the data set.

EXPLORATORY DATA ANALYSIS
A boxplot is a graph of a data set obtained by
• drawing a horizontal line from the minimum data value to 𝑸𝟏
• drawing a horizontal line from 𝑸𝟑 to the maximum data value
• drawing a box whose vertical sides pass through 𝑸𝟏 and 𝑸𝟑 with a
vertical line inside the box passing through the median or 𝑸𝟐 .
• The lines are known as whiskers.
• The lowest value is the lower fence while the maximum value is the upper fence.

EXPLORATORY DATA ANALYSIS
• Diagram depicting a box – and – whisker plot.

[Figure: box-and-whisker plot drawn along the x-axis, marking the lower fence, Q1, Q2, Q3 and the upper fence.]
EXPLORATORY DATA ANALYSIS
Summary of Steps to Construct a Box Plot
• Calculate median and the three (3) main quartiles.
• Obtain your IQR (interquartile range)
NB: The IQR is the length of the interval that contains the middle 50% of the data.
• Obtain your horizontal line

EXPLORATORY DATA ANALYSIS
Outliers
• These are anything above or below the upper and lower fences.
Lower fence = 𝑸𝟏 − 𝟏. 𝟓 𝑰𝑸𝑹
Upper fence = 𝑸𝟑 + 𝟏. 𝟓 𝑰𝑸𝑹
• Outliers may result from errors or miscalculations
NB; On the box plot, indicate outliers with an asterisk (*)
• The vertical line within the box corresponds to the median

EXPLORATORY DATA ANALYSIS
Example;
Amounts of fuel consumed per day by 8 buses tested for a fixed journey
are given below: 260, 290, 300, 320, 330, 340, 345, 520.
(a) Construct a box – and – whisker plot.
(b) Describe the graph
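The numbers needed for the box plot can be obtained as sketched below. The quartile convention is an assumption: Python's `statistics.quantiles` defaults to the "exclusive" method, which may give slightly different quartiles from a hand calculation based on the position formulas in these slides. The fences follow the 1.5 × IQR rule given under Outliers.

```python
from statistics import quantiles

fuel = [260, 290, 300, 320, 330, 340, 345, 520]

# Quartile convention is an assumption: statistics.quantiles defaults to the
# "exclusive" method, which may differ slightly from a hand calculation.
q1, q2, q3 = quantiles(fuel, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in fuel if x < lower_fence or x > upper_fence]
five_number = (min(fuel), q1, q2, q3, max(fuel))
print(five_number, iqr, (lower_fence, upper_fence), outliers)
```

Under this convention the only value flagged as an outlier is 520, which would be marked with an asterisk on the plot.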

LECTURE FOUR - OUTLINE
Descriptive Statistics for Bivariate Data
• Contingency Tables
• Scatter Plots
• Correlation
- Parametric Correlation Coefficients
- Non-parametric Correlation Coefficients

INTRODUCTION
Bivariate Data Set
• It is a collection of data consisting of two variables of an experimental
unit. For instance, Height and Weight of a particular person.

• To describe (summarize the characteristics of) a bivariate data set, the following techniques can be used:
- Contingency Tables
- Scatter Plots
- Correlation Coefficients
CONTINGENCY TABLES
• A summary technique for bivariate data.
• It is a cross-tabulation of variables in a dataset.
• It gives the count of observations for each combination of values of two categorical variables.
• Cells in the table indicate counts of measurements that bear the characteristics of the cross.

CONTINGENCY TABLES: Layout (r × n)
                          VARIABLE B
VARIABLE A    C1     C2     …    Cn     Row Sum
1             x11    x12    …    x1n    R1
2             x21    x22    …    x2n    R2
⋮             ⋮      ⋮      ⋱    ⋮      ⋮
r             xr1    xr2    …    xrn    Rr
Column Sum    C1     C2     …    Cn     Total

CONTINGENCY TABLES
Example;
A group of Financial Professionals consisting of Actuaries and Financial Analysts were asked to choose between an old method of valuation and a new one. Of the 50 Actuaries, 20 chose the new method while 25 were conservative. 5 Actuaries did not make a choice. Of the 150 Financial Analysts, 60 chose the new method, 75 stuck with the old method while 15 made no choice.
Construct a Contingency Table for the information given above.

CONTINGENCY TABLES
Solution
                          Valuation Method
Financial Professional    New Valuation Method    Old Valuation Method    No Choice    Total
Actuary                   20                      25                      5            50
Financial Analyst         60                      75                      15           150
Total                     80                      100                     20           200

CONTINGENCY TABLES
• Some important information that can be deduced from contingency tables are:
(1) Proportions (percentages)
These proportions are with respect to the total number of people (i.e. 200)
Eg: 10% of the Financial Professionals were Actuaries who went with the new valuation method etc.
                          Valuation Method
Financial Professional    New Valuation Method    Old Valuation Method    No Choice    Total
Actuary                   0.1                     0.125                   0.025        0.25
Financial Analyst         0.3                     0.375                   0.075        0.75
Total                     0.4                     0.5                     0.1          1
CONTINGENCY TABLES
(2) Marginal distributions
-Marginal distributions are the distribution of a single variable in a two – way
table.
Suppose we wish to find a distribution for Financial professionals alone, we can
do that in proportions as follows:

Financial Professional    Proportion
Actuary                   50/200 = 0.25
Financial Analyst         150/200 = 0.75
Total                     1
CONTINGENCY TABLES
(3) Conditional distributions
Conditional distributions are obtained when we condition on one value of one
variable and calculate the other variable.
Let’s now find the conditional distribution of the Choices of Valuation Methods
by Actuaries. This is given in the table below:
             Valuation Method (Proportions)
             New Valuation Method    Old Valuation Method    No Choice      Total
Actuaries    20/50 = 0.4             25/50 = 0.5             5/50 = 0.1     1
CONTINGENCY TABLES
The conditional distribution of the Choices of Valuation Methods
by Financial Analysts is given in the table below:

                     Valuation Method (Proportions)
                     New Valuation Method    Old Valuation Method    No Choice        Total
Financial Analyst    60/150 = 0.4            75/150 = 0.5            15/150 = 0.1     1
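The joint proportions, marginal distribution and conditional distributions all come from the same table of counts, divided by different totals. A minimal sketch using the counts from the example (the dictionary layout and names are illustrative):

```python
counts = {
    "Actuary":           {"New": 20, "Old": 25, "No choice": 5},
    "Financial Analyst": {"New": 60, "Old": 75, "No choice": 15},
}
total = sum(sum(row.values()) for row in counts.values())

# Joint proportions: each cell divided by the grand total.
joint = {p: {m: c / total for m, c in row.items()} for p, row in counts.items()}

# Marginal distribution of profession: row totals divided by the grand total.
marginal = {p: sum(row.values()) / total for p, row in counts.items()}

# Conditional distribution of method given profession: cell divided by its row total.
conditional = {p: {m: c / sum(row.values()) for m, c in row.items()}
               for p, row in counts.items()}

print(joint["Actuary"], marginal, conditional["Financial Analyst"], sep="\n")
```

In this example the two conditional distributions are identical, so the choice of valuation method does not depend on the type of professional.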

SCATTER PLOTS
• It is the best way of graphically displaying the relationship between quantitative variables in bivariate data.

• It is plotted such that one variable (explanatory/ independent/ predictor) is on the horizontal axis (x-axis) and the other (response/ explained/ dependent) is on the vertical axis.

SCATTER PLOTS
Examining Scatter Plots
• Look for the following

- Overall Pattern (Linear or not)


- Direction or association (Positive or negative)
- Strength (weak, moderate or strong)

CORRELATION COEFFICIENTS
• This statistic measures the strength and the direction of the
linear relationship between variables.

• We resort to numerical measures to explain relationships between variables since our eyes are not good judges of how strong relationships are.

CORRELATION COEFFICIENTS
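Pearson’s Correlation Coefficient (r)
• $r = \dfrac{Cov(X, Y)}{s_x s_y} = \dfrac{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}$, where
$s_x = \sqrt{\dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ & $s_y = \sqrt{\dfrac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}$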

CORRELATION COEFFICIENTS
Characteristics of the Correlation Coefficient
• The Cov (X, Y) in the correlation formula denotes the Covariance.
• It provides a measure of strength of the correlation between two or
more variables.
• It has the following properties:
- Cov (X, X) = Var (X)
- Cov (X, Y) = Cov (Y, X). i.e. it is symmetric.
- It is unaffected by translation but affected by scaling.

CORRELATION COEFFICIENTS
Properties of the Correlation Coefficient
• Corr (X, X) = 1
• It only measures linear association
• It is strongly affected by outliers

CORRELATION COEFFICIENTS
Interpreting the Pearson’s Correlation Coefficient (|r|≤ 𝟏)
|r|            Interpretation
0.00 – 0.20    Very weak
0.20 – 0.30    Weak
0.30 – 0.40    Moderate
0.40 – 0.70    Strong
0.70 – 1.00    Very Strong


CORRELATION COEFFICIENTS
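• A more convenient way to compute the correlation coefficient is to use the formula:
$r = \dfrac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$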

CORRELATION COEFFICIENT
Example: Calculate the Correlation Coefficient
Student    # of absences (x)    Final Grade (y)
A          6                    82
B          2                    86
C          15                   43
D          9                    74
E          12                   58
F          5                    90
G          8                    78
CORRELATION COEFFICIENT
• The computations for the correlation coefficient are displayed below:
STUDENT    x     y      xy      x²     y²
A          6     82     492     36     6724
B          2     86     172     4      7396
C          15    43     645     225    1849
D          9     74     666     81     5476
E          12    58     696     144    3364
F          5     90     450     25     8100
G          8     78     624     64     6084
TOTALS     57    511    3745    579    38993
CORRELATION COEFFICIENTS
• From the table;
$\sum x = 57, \sum y = 511, \sum xy = 3745, \sum x^2 = 579, \sum y^2 = 38993.$

Therefore; $r = \dfrac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$. Since $n = 7$,

$r = \dfrac{7(3745) - (57 \times 511)}{\sqrt{[7(579) - (57)^2][7(38993) - (511)^2]}}$
$r = -0.9442$
NB: Interpret the results
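The column totals and the coefficient can be recomputed with the computational formula. The sketch below takes the absences and final grades as listed in the example's data table (student B with 2 absences); the variable names are illustrative.

```python
from math import sqrt

absences = [6, 2, 15, 9, 12, 5, 8]       # x, from the example's data table
grades = [82, 86, 43, 74, 58, 90, 78]    # y, from the example's data table

n = len(absences)
sum_x, sum_y = sum(absences), sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in grades)

# Computational formula for Pearson's r.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2, round(r, 4))
```

A negative value of r indicates that final grades tend to fall as the number of absences rises.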

Correlation Coefficient
Spearman’s Rank Correlation Coefficient (𝒓𝒔 )
• Used when data does not follow a Normal Distribution.
• That is, it is a Non-parametric Statistic.
• The correlation coefficient is given by the formula
$r_s = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$

Correlation Coefficient
Treating tied ranks when dealing with Spearman’s Rank Correlation
• Assign the mean of the tied ranks to each tied score.
For instance, if there’s a tie between the scores in the 4th and 5th positions, assign the mean value $\frac{4+5}{2} = 4.5$ to each of the two positions.
2
• The next score receives the 6th position

Correlation Coefficient
Example;
Calculate the Spearman’s rank Correlation Coefficient for the data below:

STUDENT    Performance in Statistics (x)    Performance in Finance (y)
A 73 77
B 76 78
C 78 79
D 65 80
E 86 86
F 82 89
G 91 95
Correlation Coefficient
• The table for computation of the Spearman’s rank correlation coefficient
is shown below;

Rank of x    Rank of y    d = Rx - Ry    d²
6            7            -1             1
5            6            -1             1
4            5            -1             1
7            4            3              9
2            3            -1             1
3            2            1              1
1            1            0              0
Total                                    14
Correlation Coefficient
From the table given, $\sum d^2 = 14$ and $n = 7$.
Therefore, $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ becomes
$$ r_s = 1 - \frac{6(14)}{7\left[(7)^2 - 1\right]} = 1 - \frac{84}{336} = 0.75 $$
(Interpret this result.)
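The ranking and the rank-difference formula can be sketched in Python as follows. The helper names `average_ranks` and `spearman_rs` are illustrative; ranks are assigned with 1 for the highest score, and tied scores share the mean of their positions as described for tied ranks.

```python
# Scores from the Statistics / Finance example (students A to G).
stats_scores = [73, 76, 78, 65, 86, 82, 91]
finance_scores = [77, 78, 79, 80, 86, 89, 95]

def average_ranks(values):
    """Rank values with 1 = highest; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # first position held by v
        count = ordered.count(v)               # number of tied scores
        ranks.append(first + (count - 1) / 2)  # mean of the tied positions
    return ranks

def spearman_rs(xs, ys):
    """Spearman's r_s from the rank-difference formula."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

rs = spearman_rs(stats_scores, finance_scores)
print(rs)  # 0.75
```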

LECTURE FIVE - OUTLINE
Set Theory & Counting Processes
• Algebra of Sets
• Counting rules
- Permutations
- Combinations

INTRODUCTION: SETS
• A set is a collection of well-defined and distinct objects.
• “Well-defined” implies there is no doubt whatsoever about whether or not a
given item belongs to the set under consideration.
• “Distinct” means that no object appears more than once in the same set.
Examples of sets:
(i) The set of all students in STAT 111 class
(ii) The set of all months with fewer than 30 days
(iii) The set of all integers > 1

INTRODUCTION: SETS
• The objects that belong to a set are called its elements or members.
• Sets are denoted with capital letters such as A, B, 𝓔.
• Elements are denoted by lower case letters such as a, b, z.
• “$a \in B$” means “a is an element of set B” and “$a \notin B$” means “a is not an element of set B”.
• Sets can be described in three common ways:
– By definition (stating in words what it contains)
– By the roster method (listing the elements)
– By the property method (set-builder notation)

INTRODUCTION: SETS
Types of Sets
• Universal set (i.e. the population or sample in some cases)
• Equal & Equivalent sets
• Countable & Uncountable sets
• Null or Empty set (denoted by ∅ 𝑜𝑟 { })
• Singleton set
• Subsets
“𝐀 ⊂ 𝑩” means A is a subset of B. This implies that if 𝒂 ∈ 𝑨 then
𝒂 ∈ 𝑩.
SET OPERATIONS
Union of Sets (∪)
• “A ∪ B” denotes A union B.
• $A \cup B = \{x \mid x \in A \text{ or } x \in B \text{ or } (x \in A \text{ and } x \in B)\}$

• Note the following large operator notations for Union of Sets
- $A_1 \cup A_2 \cup \dots \cup A_n = \bigcup_{i=1}^{n} A_i$
- $A_1 \cup A_2 \cup \dots = \bigcup_{i=1}^{\infty} A_i$

SET OPERATIONS
Intersection of Sets
• “A ∩ B” denotes the intersection of two sets A and B.
• $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$
• If $A \cap B = \emptyset$, then A and B are known as disjoint sets.

• Large operator notations for Intersection of Sets are shown below:
- $A_1 \cap A_2 \cap \dots \cap A_n = \bigcap_{i=1}^{n} A_i$
- $A_1 \cap A_2 \cap \dots = \bigcap_{i=1}^{\infty} A_i$

SET OPERATIONS
Complement of a Set
• Given a set A, its complement is denoted by $A'$ or $A^c$ or $\bar{A}$.
• $A^c = \{x \mid x \in \mathcal{U},\ x \notin A\}$
• i.e. those elements belonging to $\mathcal{U}$ (the universal set) but not in A.
Note the following laws of set algebra relating to complement of sets
− $(A^c)^c = A$
− $\mathcal{U}^c = \emptyset$
− $\emptyset^c = \mathcal{U}$



SET OPERATIONS
Partition of Sets
• It is the subdivision of set A into non-empty subsets which are disjoint
and collectively exhaustive (i.e. their union is A)
• Hence if set A is partitioned into $P_1, P_2, \dots, P_n$, then:
- $P_i \cap P_j = \emptyset$, if $i \ne j$
- $\bigcup_{i=1}^{n} P_i = A$
• The subsets in a partition are called cells; thus the $P_i$, for all $i$, are the cells.
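The set operations and the two partition conditions map directly onto Python's built-in `set` type. The sketch below uses made-up sets `U`, `A` and `B` purely for illustration; `is_partition` is a hypothetical helper, not part of any library.

```python
# Hypothetical example sets; U plays the role of the universal set.
U = set(range(1, 11))
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

union = A | B            # elements in A or B (or both)
intersection = A & B     # elements in both A and B
complement_A = U - A     # elements of U that are not in A

def is_partition(cells, whole):
    """True if the cells are non-empty, pairwise disjoint and their union is `whole`."""
    if any(len(c) == 0 for c in cells):
        return False
    total = sum(len(c) for c in cells)
    combined = set().union(*cells)
    # Disjoint cells contribute every element exactly once.
    return combined == whole and total == len(whole)

print(union, intersection, complement_A)
print(is_partition([{1, 2}, {3, 4, 5}], A))     # True
print(is_partition([{1, 2, 3}, {3, 4, 5}], A))  # False: 3 appears in two cells
```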



VENN DIAGRAMS
• Provide a geometrical way of representing sets
• Named after English Mathematician John Venn.
• The universal set 𝒰 is represented by a rectangle while its subsets are
represented by circles.
Eg: Show the following regions on a Venn diagram:
(i) 𝐴 ∩ 𝐵
(ii) 𝐴 ∪ 𝐵
(iii)𝐴 − 𝐵



COUNTING PRINCIPLES
• Counting principles provide methods or techniques to enumerate the members of a set,
especially when the number of possible outcomes in an experiment is large.
• They form the basis of probability and statistics.
• The branch of mathematics concerned with counting is known as
Combinatorics or Combinatorial Analysis.

• The problem of counting begins with drawing $r$ objects from a specified
group of, say, $n$ objects. This is known as Sampling.



COUNTING PRINCIPLES
• If, in sampling, we draw the objects one after the other instead of
drawing all $r$ objects at once, then we can distinguish between
sampling with and without replacement.
• If an object is drawn, notice is taken of it and it is put back into the
population before the next object is drawn, then it is sampling with
replacement.
• However, if each object drawn is put aside before the next one is
drawn, until all the r objects are drawn, then the sampling is without
replacement.



COUNTING PRINCIPLES
Addition Principle of Counting
• This is the most basic counting principle.
• It states that if $A = B_1 \cup B_2 \cup \dots \cup B_n$, where the $B_i$ form a
partition of A (i.e. they are pairwise disjoint), then $n(A) = \sum_{i=1}^{n} n(B_i)$.

Eg: A class of students consists of 6 Ghanaians, 4 Nigerians and 2 Togolese.
In how many ways can a Ghanaian or Nigerian be drawn from the class if
no student has dual citizenship?



COUNTING PRINCIPLES
Solution
Let A denote the set of students who are Ghanaians or Nigerians,
$B_1$ denote the set of students who are Ghanaians, and
$B_2$ denote the set of students who are Nigerians.
Clearly $B_1$ and $B_2$ are disjoint since no student holds dual citizenship.
Hence,
$n(A) = n(B_1 \cup B_2) = n(B_1) + n(B_2) = 6 + 4 = 10$ students, so there are 10 ways.



COUNTING PRINCIPLES
Inclusion – Exclusion Principle
• Suppose A and B are subsets of $\mathcal{U}$, then:
$n(A \cup B) = n(A) + n(B) - n(A \cap B)$

Eg: A class of students consists of 6 Ghanaians, 4 Nigerians and 2 Togolese.
In how many ways can a Ghanaian or Nigerian be drawn from the class if 3
students have dual citizenship status of being both Ghanaian and Nigerian
citizens?



COUNTING PRINCIPLES
Solution
Let A denote the set of students who are Ghanaians or Nigerians,
$B_1$ denote the set of students who are Ghanaians, and
$B_2$ denote the set of students who are Nigerians.
Then, $B_1 \cap B_2$ denotes those students who are both Ghanaians and
Nigerians.
Hence,
$n(A) = n(B_1 \cup B_2) = n(B_1) + n(B_2) - n(B_1 \cap B_2)$
$n(A) = n(B_1 \cup B_2) = 6 + 4 - 3 = 7$ students, so there are 7 ways.
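The inclusion-exclusion count can be checked by building the two sets explicitly. In this sketch the student labels are hypothetical: `g*` are Ghanaian only, `n*` is Nigerian only, and `d*` are the three students holding both citizenships.

```python
# Hypothetical student labels chosen to match the counts in the example.
ghanaians = {"g1", "g2", "g3", "d1", "d2", "d3"}   # n(B1) = 6
nigerians = {"n1", "d1", "d2", "d3"}               # n(B2) = 4

both = ghanaians & nigerians     # B1 ∩ B2
either = ghanaians | nigerians   # B1 ∪ B2

# Inclusion-exclusion: n(B1 ∪ B2) = n(B1) + n(B2) − n(B1 ∩ B2)
by_formula = len(ghanaians) + len(nigerians) - len(both)
print(len(either), by_formula)   # 7 7
```

Simply adding 6 + 4 would count each dual citizen twice, which is why the intersection is subtracted once.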



COUNTING PRINCIPLES
Multiplication Principle
• This is the fundamental principle of counting.
• It states that if one thing can be accomplished in 𝒏𝟏 different ways and after this
a second thing can be accomplished in 𝒏𝟐 different ways,…, and finally a
𝒌𝒕𝒉 thing can be accomplished in 𝒏𝒌 ways, then all 𝒌 things can be
accomplished in specified order in 𝒏𝟏 × 𝒏𝟐 × ⋯ × 𝒏𝒌 different ways.
Eg: If a woman has 2 blouses and 3 skirts, how many outfits can she put together?
Solution:
She can choose a blouse in 2 ways and a skirt in 3 ways. Thus the
number of outfits she can put together is 2 × 3 = 6.
COUNTING PRINCIPLES
Eg 2:
Six dice are rolled. In how many ways may the faces of the dice show up?

Solution:
Each die has six faces and there are six dice in all, so by the fundamental
principle of counting the number of ways the faces may show up is:
6 ways for die 1 × 6 ways for die 2 × … × 6 ways for die 6
= 6 × 6 × 6 × 6 × 6 × 6 = $6^6$ = 46656 ways.
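The multiplication principle can be verified by brute force with `itertools.product`, which generates every combination of one choice per stage. This is only an illustrative check; the blouse and skirt labels are made up.

```python
from itertools import product

faces = range(1, 7)

# Every outcome of rolling six dice is a 6-tuple of faces.
outcomes = list(product(faces, repeat=6))
print(len(outcomes))   # 46656
print(6 ** 6)          # 46656

# The blouse-and-skirt example works the same way with unequal stage sizes.
outfits = list(product(["blouse 1", "blouse 2"],
                       ["skirt 1", "skirt 2", "skirt 3"]))
print(len(outfits))    # 6
```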
COUNTING PRINCIPLES
Factorial
• It is denoted by the exclamation mark. i.e. !
• Given a positive integer 𝑛, the product of all the whole numbers from 𝑛
to 1 is called 𝑛 factorial which is denoted as 𝒏!
• $n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$
• 𝟎! = 𝟏



COMBINATORICS
• When drawing r objects one at a time from n distinct objects, the order in which
the objects appear in the sample may or may not be taken into account.
• Thus we can have an ordered sample and an unordered sample.
• An ordered sample of size r drawn from a population of n objects
$a_1, a_2, \dots, a_n$ is any ordered arrangement $a_{j_1}, a_{j_2}, \dots, a_{j_r}$ of r objects.

• Whether or not order is important in the selection of r objects is the
main criterion for distinguishing between the two main techniques of
counting: Permutations and Combinations.



PERMUTATION
• It is an arrangement of objects in a given order.
• The term “n-Permutation” refers to the arrangement of n distinct
objects in a given order taking all at a time.
• In the case above, the number of ways this can be done is denoted by
$^nP_n$ or $P(n, n)$ or $_nP_n$.
• Thus, $^nP_n = n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$
• The formula above can be interpreted as follows:
“to make successive choices from a single set of n objects, the 1st choice
may be made in n ways, the second choice may be made in (n-1) ways
and so on up to the last choice, which can be made in only one way”.



PERMUTATION
Eg: Consider the set of letters a, b & c. In how many ways can the three letters
be arranged taking all of them together? Indicate the arrangements.
Solution:
𝑛(ways to arrange) = 3! = 3 × 2 × 1 = 6 ways.
The arrangements are abc, acb, bac, bca, cab and cba.

Eg: Given the numbers 1, 2, 3, 4, how many different numbers of three digits
can be formed from them if repetitions are not allowed?
Solution:
The first digit can be chosen in 4 ways, the second in 3 ways and the third in 2 ways, so
𝑛(ways) = 4 × 3 × 2 = 24 ways.



PERMUTATION
r-Permutation
• This is an arrangement of objects in a given order taking 𝒓 at a time
from a set of 𝒏 (𝒓 ≤ 𝒏) objects.
• This is denoted by $^nP_r$ or $P(n, r)$ or $_nP_r$.
• This definition assumes that the permutations are without repetition.
• Mathematically,
$$ ^nP_r = \frac{n!}{(n-r)!} $$



PERMUTATION
Eg1: In how many different ways can 4 people be chosen from a set of 6 and arranged in order?
Solution: n(ways) = $^6P_4 = \frac{6!}{(6-4)!} = 360$ ways.

NB: The number of permutations or arrangements of r objects with repetition
from $n$ distinct objects is $n^r$.

Eg2: Given the numbers 1, 2, 3, 4, how many different numbers of three digits
can be formed from them if repetitions are allowed?
Solution: 𝑛(ways) = $4^3$ = 4 × 4 × 4 = 64 ways
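The permutation counts in these examples can be reproduced with the standard library (Python 3.8 or later for `math.perm`). The sketch below both evaluates the formula and enumerates the arrangements directly, so the two counts can be compared.

```python
from itertools import permutations, product
from math import factorial, perm

# r-permutations without repetition: nPr = n! / (n - r)!
print(perm(6, 4))                          # 360
print(factorial(6) // factorial(6 - 4))    # 360

digits = [1, 2, 3, 4]
# Three-digit numbers with no repeated digit: 4P3 arrangements.
no_repeats = list(permutations(digits, 3))
# Three-digit numbers when digits may repeat: 4^3 arrangements.
with_repeats = list(product(digits, repeat=3))
print(len(no_repeats), len(with_repeats))  # 24 64

# All arrangements of the letters a, b, c taken together.
arrangements = ["".join(p) for p in permutations("abc")]
print(arrangements)
```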
CYCLIC PERMUTATIONS
• In a circular arrangement, actual positions do not matter.
For instance, if six people are sitting in a circle, we do not get a new permutation
if they all move one position in a clockwise (or anti-clockwise) direction.

• Relative positions, on the other hand, are what matter.

• To find the number of ways to arrange n objects in a circle, first fix one object.
• Starting from this object and moving either in a clockwise or anti-clockwise
direction, the remaining n – 1 objects can be arranged in $^{n-1}P_{n-1} = (n-1)!$ ways.



CYCLIC PERMUTATIONS
Eg1: Five executives attend a round-table meeting. How many different
arrangements are possible?
Solution; there are (n – 1)! = (5 – 1)! = 4! = 24 circular permutations.

Eg2: Suppose each of the executives was accompanied by his or her secretary to take
minutes at the meeting.
(a) How many arrangements are possible that alternate the executives and their
secretaries?
(b) If each secretary must sit next to his or her executive, how many arrangements are possible
that alternate the executives and the secretaries?
CYCLIC PERMUTATIONS
Solution:
(a) Suppose an executive sits down first. Then there are (5 – 1)! different
arrangements for the remaining executives. The five secretaries can be
seated in the 5 alternating seats between them, so there are 5! possibilities for them.
Then by the multiplication principle there are 5! × 4! = 2880
different arrangements.
(b) Suppose the first to sit down is an executive. Then there are (5 – 1)! different
arrangements of the executives. There are two ways the first secretary can sit, either at the left
or the right of her executive. Once she sits, all other places are
determined for the rest of the secretaries. Hence there are 2(4!) = 48 possible
arrangements.
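Both answers can be confirmed by enumeration. The sketch below is an illustrative model, not part of the original solution: seats are numbered 0 to 9 around the table, executives occupy the even seats, executive 0 is fixed in seat 0 to discard rotations, and secretary k belongs to executive k.

```python
from itertools import permutations

n = 5  # number of executives (and of secretaries)

def count_arrangements(secretary_beside_own_executive):
    count = 0
    # Fix executive 0 in seat 0 so that rotations are not counted again.
    for rest in permutations(range(1, n)):
        executives = (0,) + rest              # executives[k] sits in seat 2k
        for secretaries in permutations(range(n)):
            # secretaries[k] sits in seat 2k + 1, between
            # executives[k] and executives[(k + 1) % n].
            if secretary_beside_own_executive and any(
                s not in (executives[k], executives[(k + 1) % n])
                for k, s in enumerate(secretaries)
            ):
                continue
            count += 1
    return count

print(count_arrangements(False))  # 2880
print(count_arrangements(True))   # 48
```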
PERMUTATIONS WITH REPETITIONS
• Suppose we want to find the number of permutations of $n$ objects of which $n_1$ are
alike of one kind, $n_2$ are alike of a second kind, …, and $n_k$ are alike of a k-th kind.
• Then the number of ways is $\frac{n!}{n_1!\,n_2!\cdots n_k!}$, where $n_1 + n_2 + \dots + n_k = n$.
Eg: In how many ways can the letters of the word STATISTICS be arranged?
Solution:
$n = 10$, $n_1$ = number of S = 3, $n_2$ = number of T = 3,
$n_3$ = number of I = 2, $n_4$ = number of A = 1, $n_5$ = number of C = 1
∴ The number of ways = $\frac{10!}{3!\,3!\,2!\,1!\,1!} = 50400$ ways.
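A short Python sketch of the same calculation, with the letter multiplicities counted automatically; the function name `distinct_arrangements` is illustrative.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(word):
    """n! / (n1! n2! ... nk!), where the n_i are the letter multiplicities."""
    total = factorial(len(word))
    for multiplicity in Counter(word).values():
        total //= factorial(multiplicity)
    return total

print(Counter("STATISTICS"))                # S: 3, T: 3, I: 2, A: 1, C: 1
print(distinct_arrangements("STATISTICS"))  # 50400
```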



COMBINATIONS
• This is an unordered selection of objects.
• The number of r-combinations is the total number of combinations of a set of $n$
objects taking $r$ at a time, $n \ge r$.
• This is denoted by $^nC_r$ or $C(n, r)$ or $_nC_r$ or $\binom{n}{r}$.
• This definition assumes that the selections are without repetition.
• Mathematically, $^nC_r = \frac{n!}{r!\,(n-r)!}$



COMBINATIONS
Eg: A school basketball squad for the inter-school competition has ten
players. The coach must select a team for the first tournament.
(a) How many different teams of five players can be constituted for this
tournament?
(b) If, in constituting the team, the coach also has to designate positions,
how many different teams of five players can be constituted?

NB: (a) is a combination problem while (b) is a permutation problem.



COMBINATIONS
Solution:
(a) The number of ways = $\binom{10}{5} = \frac{10!}{5!\,(10-5)!} = 252$ combinations
(b) Since order counts in this case, the number of ways is given by
$^{10}P_5 = \frac{10!}{(10-5)!} = 30240$ permutations.
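The contrast between the two parts can be seen numerically with `math.comb` and `math.perm` (Python 3.8 or later). The player labels 0 to 9 are placeholders.

```python
from itertools import combinations
from math import comb, factorial, perm

players = range(10)

# (a) Unordered teams of five players.
print(comb(10, 5))                          # 252
print(len(list(combinations(players, 5))))  # 252

# (b) Five players assigned to five distinct positions (order matters).
print(perm(10, 5))                          # 30240
# Each unordered team can be arranged in 5! = 120 ways.
print(comb(10, 5) * factorial(5))           # 30240
```

The last line shows the general link between the two counts: $^nP_r = r! \times {}^nC_r$.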



LECTURE SIX - OUTLINE
• Definition of Terms (Experiments, Trials, outcomes, equally likely
outcomes, Sample space etc.)
• Axiomatic Approach to Probability
• Probability Calculus



INTRODUCTION: Definition of terms
Experiment
• It is a process specially set up or occurring naturally which leads to
some well-defined outcome or results.
• It can either be deterministic or random.
• It is deterministic if its observed result is not subject to chance.
• If its outcomes are uncertain, then the experiment is random.
• A single performance of an experiment is a trial.



INTRODUCTION: Definition of terms
Example of the types of an experiment
(a) If we measure the distance between two towns, A and B, many times
under exactly the same conditions, we expect the same result each time. Such an
experiment is said to be deterministic.
(b) However, if we toss a fair coin under exactly the same conditions, such
an experiment is random or stochastic since the outcome cannot be
predicted with certainty although all possible outcomes are known.



INTRODUCTION: Definition of terms
Sample Space
• This is the set of all possible outcomes of some random experiment.
• It is usually denoted by $S$ or $\Omega$.
• The possible outcomes are known as sample points and are denoted by $s$ or
$\omega$. Hence, $s \in S$ or $\omega \in \Omega$.
• If any one outcome of an experiment has the same chance of occurrence as
any other outcome when an experiment is performed, then the outcomes are
said to be equally likely.
For instance; when a die is tossed, the outcomes 1, 2, 3, 4, 5, 6 are all equally
likely so long as the die is fair.
In this case, the sample space 𝑺 = {𝟏, 𝟐, 𝟑, 𝟒, 𝟓, 𝟔}
INTRODUCTION: Definition of terms
Examples:
What are the sample spaces when;
(i) A fair coin is tossed thrice.
(ii) A fair die and a fair coin are tossed once.

NB: It must be noted that the sample space can either be finite, countably
infinite or uncountably infinite. i.e. Discrete or continuous.



INTRODUCTION: Definition of terms
Events
• It is a subset of the sample space 𝑆.
• It is also denoted by a capital letter such as 𝐴, 𝐵, 𝐶 etc.
• If a sample space has $n$ sample points, then there are a total of $2^n$ subsets or
events.
For instance: when a die is rolled, write down the following event sets:
(a) A number < 4
(b) A number > 4
(c) An odd number.



INTRODUCTION: Definition of terms
Mutually Exclusive Events
• Two events 𝐴 and 𝐵 are said to be mutually exclusive if they cannot
occur together. i.e. 𝐴 ∩ 𝐵 = ∅ .
• Otherwise they are Mutually Inclusive.
Example: Suppose $S = \{x \mid x \in \mathbb{N} \text{ and } x < 8\}$. Classify the following events
as either mutually exclusive or mutually inclusive.
(a) $A = \{1, 2, 3\}$ and $B = \{4, 5, 6\}$
(b) $C = \{x \mid x \text{ is an even number},\ 3 < x < 7\}$ and $D = \{4, 6, 7\}$



INTRODUCTION: Definition of terms
Collectively Exhaustive Events
• Two or more events defined on the same sample space are said to be
collectively exhaustive if their union is equal to the sample space 𝑆.
• i.e. $A_i$, $i = 1, 2, 3, \dots, n$, are said to be collectively exhaustive if
$$ \bigcup_{i=1}^{n} A_i = S $$
For instance, when a die is thrown, the events $\{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}$
are collectively exhaustive since their union equals the sample space $S$.
Again, when a coin is tossed, the events {𝐻} and {𝑇} are collectively
exhaustive since their union is the sample space 𝑆.
INTRODUCTION: Definition of terms
Partition
• The events $A_1, A_2, \dots, A_n$ form a partition of the sample space $S$ if
(a) $A_i \ne \emptyset$, $\forall i = 1, 2, \dots, n$
(b) $A_i \cap A_j = \emptyset$, $\forall i \ne j$ and $i, j = 1, 2, \dots, n$
(c) $\bigcup_{i=1}^{n} A_i = S$

• Condition (a) means empty classes are not allowed, (b) implies
classes or events should be pairwise mutually exclusive and (c) means
all classes or events taken together must be collectively exhaustive.
INTRODUCTION: Definition of terms
Example: Partition
A coin is tossed thrice. Partition the sample space 𝑆 according to the
number of heads in the outcome.
Solution
The sample space is 𝑆 = {𝐻𝐻𝐻, 𝐻𝐻𝑇, 𝐻𝑇𝐻, 𝑇𝐻𝐻, 𝐻𝑇𝑇, 𝑇𝐻𝑇, 𝑇𝑇𝐻, 𝑇𝑇𝑇}.
The Partitions are:
$A_1 = \{HHH\}$ (3 heads)
$A_2 = \{HHT, HTH, THH\}$ (2 heads)
$A_3 = \{TTH, THT, HTT\}$ (1 head)
$A_4 = \{TTT\}$ (No heads)
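The same partition can be generated programmatically. This sketch builds the sample space for three tosses, groups the sample points by their number of heads, and then checks the partition conditions; the variable names are illustrative.

```python
from itertools import product

# Sample space for three tosses of a coin.
S = {"".join(t) for t in product("HT", repeat=3)}

# Group the sample points by the number of heads they contain.
cells = {}
for s in S:
    cells.setdefault(s.count("H"), set()).add(s)

for heads in sorted(cells, reverse=True):
    print(heads, sorted(cells[heads]))

# Partition check: pairwise disjoint cells whose union is S.
all_points = [s for cell in cells.values() for s in cell]
is_partition = len(all_points) == len(set(all_points)) and set(all_points) == S
print(is_partition)  # True
```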
INTRODUCTION: Definition of terms
Independent Events
• Two events 𝐴 and 𝐵 are independent if the occurrence (or non-occurrence) of
one of them is not affected by the occurrence (or non-occurrence) of the
other.
Example (Independence): When two coins are tossed, the occurrence of the
event “head” on the first coin and “tail” on the second coin are independent.
• Otherwise, the two events are dependent.
Example (Dependence): A box contains two blue pens and one red pen. Two
pens are picked at random successively. The events “blue pen picked in the first
round” and “red pen picked in the second round” are dependent. Clearly, the
likelihood that you will pick a red pen depends on whether it has been picked
already or not.
INTRODUCTION: Definition of terms
Properties of Independent Events
If A & 𝐵 defined over the same sample space 𝑆 are independent events, then
(a) A & 𝐵′ are independent
(b) A’ & 𝐵 are independent
(c) A’ & 𝐵′ are independent

Also, if A, 𝐵, 𝐶 are independent events, then


(a) C & A ∪ 𝐵 are independent
(b) C & A ∩ 𝐵 are independent
(c) C & A\𝐵 are independent
CONCEPT OF PROBABILITY
Axiomatic Approach
• Probability is a set function $P(\cdot)$ which assigns to each event
$A \subset S$ a numerical value $P(A)$, called the probability of A, which
represents the likelihood of event A occurring, such that the following axioms
are satisfied:
(a) $0 \le P(A) \le 1$
(b) $P(S) = 1$
(c) If $A_1, A_2, \dots, A_n$ are events and $A_i \cap A_j = \emptyset$, $i \ne j$, then
$P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + \dots + P(A_n)$
PROBABILITY CALCULUS
From the three axioms stated, other properties can be established.
Theorem 1
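If $\emptyset$ is the empty set, then $P(\emptyset) = 0$.
Proof:
Let $S$ be the sample space, then
$S = S \cup \emptyset$
$P(S) = P(S \cup \emptyset)$
$P(S) = P(S) + P(\emptyset)$ (by axiom (c), since $S \cap \emptyset = \emptyset$)
$\Rightarrow P(\emptyset) = P(S) - P(S) = 0$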



PROBABILITY CALCULUS
Theorem 2
For each event $A \subset S$, $P(A) = 1 - P(A')$
Proof:
Since $A \subset S$, then
$S = A \cup A'$ (NB: $A \cap A' = \emptyset$)
$P(S) = P(A \cup A')$
$P(S) = P(A) + P(A')$
$1 = P(A) + P(A')$
$\Rightarrow P(A) = 1 - P(A')$



PROBABILITY CALCULUS
Theorem 3
If $A, B \subset S$ with $A \cap B \ne \emptyset$, then the probability that either A or B (or both)
will occur is the sum of their separate probabilities less the probability of their
joint occurrence: i.e. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Proof:
$A \cup B = A \cup (A' \cap B)$, where $A$ and $A' \cap B$ are disjoint
$P(A \cup B) = P(A \cup (A' \cap B))$
$P(A \cup B) = P(A) + P(A' \cap B)$   (eqn 1)
However,
$B = (A \cap B) \cup (A' \cap B)$
Hence
$P(B) = P(A \cap B) + P(A' \cap B)$
$P(B) - P(A \cap B) = P(A' \cap B)$   (eqn 2)
Substituting (eqn 2) into (eqn 1), we have
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$



PROBABILITY CALCULUS
Example:
A faculty leader was meeting two students in Paris, one coming from Town
A and the other from Town B, at approximately the same time. Let A and B be the
events that the student from Town A and the student from Town B, respectively,
arrive on time. If P(A) = 0.93, P(B) = 0.89 and P(A ∩ B) = 0.87, find
(a) The probability that at least one person is on time.
Solution
P(At least one person is on time) = P(A ∪ B)
Hence, P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 0.93 + 0.89 - 0.87 = 0.95.



PROBABILITY CALCULUS
Example: Given that P(A)=0.4, P(B)=0.5 and P(A ∩ B)=0.3, find:
(a) P(A ∪ B) (b) P(A ∩ B′) (c) P(A′ ∪ B′)
Solution:
(a) P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 0.4 + 0.5 - 0.3 = 0.6.

(b) P(A ∩ B′) = P(A) – P(A ∩ B) = 0.4 – 0.3 = 0.1

(c) P(A′ ∪ B′) = P(A′) + P(B′) - P(A′ ∩ B′), where P(A′ ∩ B′) = 1 - P(A ∪ B) = 1 - 0.6 = 0.4

⇒ P(A′ ∪ B′) = (1 - 0.4) + (1 - 0.5) - 0.4 = 0.7
Equivalently, by De Morgan's law, P(A′ ∪ B′) = 1 - P(A ∩ B) = 1 - 0.3 = 0.7.
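These three probabilities can be checked on a small concrete model. The sample space below is hypothetical: ten equally likely points, with A and B chosen only so that P(A) = 0.4, P(B) = 0.5 and P(A ∩ B) = 0.3. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# A hypothetical equally likely sample space chosen so that
# P(A) = 0.4, P(B) = 0.5 and P(A ∩ B) = 0.3.
S = set(range(10))
A = {0, 1, 2, 3}
B = {1, 2, 3, 4, 5}

def P(event):
    return Fraction(len(event), len(S))

print(P(A | B))               # 3/5   -> P(A ∪ B)
print(P(A - B))               # 1/10  -> P(A ∩ B')
print(P((S - A) | (S - B)))   # 7/10  -> P(A' ∪ B')
print(1 - P(A & B))           # 7/10  -> De Morgan's law gives the same value
```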



PROBABILITY CALCULUS
Conditional Probability
• Probability of an event will sometimes depend on whether we know
other events have occurred.
• Such probabilities are termed conditional probabilities.
• Thus, the probability of an event A occurring given that B has already
occurred is denoted by $P(A \mid B)$ and given by the formula:
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, $$
where $P(B) > 0$.



PROBABILITY CALCULUS
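• If A and B are mutually exclusive, then $A \cap B = \emptyset$ and hence
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = 0, \quad \text{since } P(A \cap B) = 0 \text{ and } P(B) > 0 $$

• Also, if $B \subseteq A$, then
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1 $$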



PROBABILITY CALCULUS
Example:
Suppose a fair die is tossed once. Find the probability of obtaining 1 given that
an odd number was obtained.
Solution
Let A be the event that a 1 is observed and B be the event that an odd number
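was obtained. Then $S = \{1, 2, 3, 4, 5, 6\}$, $A = \{1\}$, $B = \{1, 3, 5\}$
$n(A) = 1 \Rightarrow P(A) = \frac{n(A)}{n(S)} = \frac{1}{6}$ and $n(B) = 3 \Rightarrow P(B) = \frac{n(B)}{n(S)} = \frac{3}{6}$
$n(A \cap B) = 1 \Rightarrow P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{1}{6}$
Hence $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{3/6} = \frac{1}{3}$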
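For equally likely outcomes the conditional probability formula reduces to counting, which the following sketch makes explicit. The helper names `P` and `conditional` are illustrative.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}             # fair die
A = {1}                            # a 1 is observed
B = {x for x in S if x % 2 == 1}   # an odd number is observed

def P(event):
    return Fraction(len(event), len(S))

def conditional(event, given):
    """P(event | given) = P(event ∩ given) / P(given); requires P(given) > 0."""
    return P(event & given) / P(given)

print(conditional(A, B))  # 1/3
```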
PROBABILITY CALCULUS
Independent Events
If two events A and B are independent then,
$P(A \cap B) = P(A) \cdot P(B)$
• Hence,
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B)}{P(B)} = P(A) $$

• Also
$P(B \mid A) = P(B)$



PROBABILITY CALCULUS
Example:
If 𝑃 𝑋 = 0.6 and 𝑃 𝑌 = 0.5, given that events 𝑋 & Y are independent,
find
(a) $P(X \cup Y)$
(b) $P(X' \cap Y)$
(c) $P(X' \cup Y')$



PROBABILITY CALCULUS
Total Probability Rule (TPR)
• Also known as the formula for incompatible and exhaustive causes or
stratified sampling theorem.
We consider the following theorem before we proceed to the TPR.
Theorem
If $A_1, A_2, \dots, A_n$ form a partition of $S$, then for every event $B \subseteq S$ with $P(B) > 0$,
$$ P(B) = \sum_{i=1}^{n} P(A_i \cap B) $$



PROBABILITY CALCULUS
Theorem: Total Probability Rule (TPR)
If $A_1, A_2, \dots, A_n$ form a partition of $S$, then for every event $B \subseteq S$ with $P(B) > 0$,
$$ P(B) = \sum_{i=1}^{n} P(A_i)\,P(B \mid A_i) $$



PROBABILITY CALCULUS
Example: A group of visitors to UG comprised of 15 students from Oxford
and 20 students from Harvard. Among those from Oxford, 8 were females
and 15 of the Harvard students were males. A student was selected at
random to give the vote of thanks at the end of the visit. What is the
probability that the student is a female?
Solution:
Let F = {Female Students}
𝐴1 = { Oxford Students}
𝐴2 = {Harvard Students}



PROBABILITY CALCULUS
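Then either the student comes from Oxford and is a female or is from
Harvard and is a female. [Thus, $F = (A_1 \cap F) \cup (A_2 \cap F)$]
Therefore,
$P(F) = P(A_1)\,P(F \mid A_1) + P(A_2)\,P(F \mid A_2)$
$\Rightarrow P(F) = \frac{15}{35} \times \frac{8}{15} + \frac{20}{35} \times \frac{5}{20}$
$\Rightarrow P(F) = \frac{13}{35}$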
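The total probability rule is a weighted sum over the cells of the partition, which is easy to mirror in code. The dictionary keys below are just labels for the two groups of visitors.

```python
from fractions import Fraction

# P(A_i): probability the selected student belongs to each group.
priors = {"Oxford": Fraction(15, 35), "Harvard": Fraction(20, 35)}
# P(F | A_i): probability of a female within each group (20 - 15 = 5 Harvard females).
p_female_given = {"Oxford": Fraction(8, 15), "Harvard": Fraction(5, 20)}

# Total probability rule: P(F) = sum of P(A_i) * P(F | A_i) over the partition.
p_female = sum(priors[g] * p_female_given[g] for g in priors)
print(p_female)  # 13/35
```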



PROBABILITY CALCULUS
Bayes’ Theorem
• Also known as theorem on Probability of causes.
• Used to revise probabilities in accordance with newly acquired
conditional probabilities.
That is, it is applicable where quantities of the form $P(B \mid A_i)$ and $P(A_i)$
are known and we wish to find $P(A_i \mid B)$.
• $P(A_i)$ are called prior probabilities of $A_i$ while $P(A_i \mid B)$ are called
posterior probabilities.
• $P(B \mid A_i)$ are called likelihoods.
PROBABILITY CALCULUS
Theorem: Bayes’ Theorem
Suppose $A_1, A_2, \dots, A_n$ form a partition of $S$ such that $P(A_i) \ne 0\ \forall i$ are known. Let
$B$ be any event in $S$ such that $P(B) \ne 0$ and $P(B \mid A_i)$ is known.
Then
$$ P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)} $$



PROBABILITY CALCULUS
• Example 1:
In Orange County, 51% of the adults are males. One adult is randomly
selected for a survey involving credit card usage.
(a) Find the prior probability that the selected person is a male.

(b) It is later learned that the selected survey subject was smoking a
cigar. Also, 9.5% of males smoke cigars, whereas 1.7% of females
smoke cigars. Use this additional information to find the probability
that the selected subject is a male.



PROBABILITY CALCULUS
Example 2:
A survey is taken in Oklahoma, Kansas and Arkansas.
In Oklahoma, 50% of surveyed support raising tax. In Kansas, 60% support
a tax increase and in Arkansas only 35% favour the increase.
Of the total population of the three states, 40% live in Oklahoma, 25% live
in Kansas and 35% live in Arkansas.
Given that a surveyed person is in favour of raising taxes, what is the
probability that he/she lives in Kansas?
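Both examples have the same structure: priors over a partition, a likelihood for each cell, and a posterior to be found. A minimal sketch of Bayes' theorem as a reusable function follows. The function name `posterior` is illustrative, and two assumptions are made: in Example 1 the remaining 49% of adults are taken to be female, and in Example 2 the population shares are used as the prior probabilities of where a surveyed person lives.

```python
def posterior(priors, likelihoods, target):
    """Bayes' theorem: P(target | B) from priors P(A_i) and likelihoods P(B | A_i)."""
    evidence = sum(priors[a] * likelihoods[a] for a in priors)  # total probability P(B)
    return priors[target] * likelihoods[target] / evidence

# Example 1: B = "smokes cigars" (49% female is the assumed complement of 51% male).
p_male = posterior({"male": 0.51, "female": 0.49},
                   {"male": 0.095, "female": 0.017}, "male")
print(round(p_male, 3))    # 0.853

# Example 2: B = "supports raising taxes".
p_kansas = posterior({"Oklahoma": 0.40, "Kansas": 0.25, "Arkansas": 0.35},
                     {"Oklahoma": 0.50, "Kansas": 0.60, "Arkansas": 0.35}, "Kansas")
print(round(p_kansas, 3))  # 0.317
```

In Example 1 the evidence raises the probability that the subject is male from the prior 0.51 to roughly 0.853.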



LECTURE SEVEN - OUTLINE
• Random Variable definition and characterization.
• Characterization of Discrete random variables
• Characterization of Continuous random variables
• Numerical Characteristics of Random Variables



Introduction
• Random variable is a real-valued function defined on a sample space S.
• If S is the sample space associated with some random experiment 𝛆, a r.v.
𝑿 is a function that assigns a real number 𝑿(𝒔) to each sample point 𝒔𝝐𝑺.
• A r.v. is also known as a Stochastic or aleatory variable.

[Diagram: the random variable X maps each sample point s in the sample space S to a real number X(s) in the real number system.]
Introduction
Example:
Let the random experiment be the tossing of a fair coin twice.
The Sample space 𝑺 = {𝑯𝑯, 𝑯𝑻, 𝑻𝑯, 𝑻𝑻}
We can define a r.v. 𝑿 as the number of tails.
In tabular form;

Sample point            HH    HT    TH    TT
X (number of tails)     0     1     1     2



Introduction
• Thus $\{s \mid X(s) = 1\}$ is the event $\{HT, TH\}$. This can simply be written as
$\{X = 1\}$.
• In general, $\{X = x\}$ will be used to denote events.
• Similarly $\{s \mid X(s) < 1\}$ is shortened to $\{X < 1\}$ and the probabilities of
the two events are written as $P(X = 1)$ and $P(X < 1)$ respectively.
[Diagram: X = number of tails when two coins are tossed. HH maps to 0; HT and TH map to 1; TT maps to 2.]
Introduction
NB: (a) From the example, the original sample space has four sample
points but the event space has 3 points i.e. 𝟎, 𝟏, 𝟐
(b) 𝑿 is a function with domain 𝜴 and range ⊂ ℝ
Example:
Give the sample space and range space (event space) of each of the following r.v.
when two coins are tossed:
(i) Number of heads × Number of tails
(ii) Number of heads + Number of tails



Types of Random Variables
• Discrete random variables
Here, the range space is either finite or countably infinite.

• Continuous random variables
The range space is uncountably infinite.

• Mixed random variables
The range space is usually piecewise defined: partly discrete and
partly continuous.



The Probability Distribution of a Discrete r.v.
• This consists of:
(i) the list of all possible values 𝒙𝒊 , 𝑖 = 1,2,3, … of the r.v.
(ii) the corresponding probability of each 𝒙𝒊 occurring i.e. 𝑷(𝑿 = 𝒙𝒊 )
• The above is necessary for a complete characterization of the r.v. 𝑿.
• The probability distribution of a discrete r.v. is known as the Probability mass
function (p.m.f)
• For $P(X = x_i)$ to be a legitimate probability distribution, the following
conditions are necessary and sufficient:
$0 \le P(X = x_i) \le 1,\ \forall i$
$\sum_{\forall i} P(X = x_i) = 1$



Representing the Probability distribution of 𝑋
The probability distribution of 𝑋 can be represented in the ff forms:
- Tables
- Graphs
- Formulae



Representing the Probability distribution of 𝑋
Tabular form
• Given a r.v. 𝑿 together with corresponding probabilities 𝑷(𝑿 = 𝒙𝒊 ) or 𝐩(𝒙𝒊 ),
in tabular form;
xᵢ            x₁            x₂            …    xₙ
P(X = xᵢ)     P(X = x₁)     P(X = x₂)     …    P(X = xₙ)



Representing the Probability distribution of 𝑋
Probability Graph
• This is the graph of 𝑷(𝑿 = 𝒙𝒊 ) against 𝒙𝒊 .
• Vertical lines (or bars) are drawn above the possible values of 𝒙𝒊 of the r.v.
[Probability graph: vertical bars of heights p(x₁), p(x₂), p(x₃) and p(x₄) drawn above x₁, x₂, x₃ and x₄; vertical axis p(x), horizontal axis xᵢ.]



Discrete Probability Distributions
Example 1: A fair coin is tossed thrice. Let $X$ = number of heads that show up.
(a) Find the probability distribution of $X$
(b) Construct the probability graph
Solution:
$S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$
NB: All sample points are equiprobable (or equally likely), so
$P(\{s\}) = \frac{1}{8}$ for each $s \in S$.



Discrete Probability Distributions
• The results are as tabulated below ($X$ = number of heads):
s ∈ S      TTT    TTH    THT    HTT    HHT    HTH    THH    HHH
X          0      1      1      1      2      2      2      3
P({s})     1/8    1/8    1/8    1/8    1/8    1/8    1/8    1/8
• $P(X = 0) = P(\{TTT\}) = \frac{1}{8}$
• $P(X = 1) = P(\{TTH\} \cup \{THT\} \cup \{HTT\}) = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{3}{8}$
• $P(X = 2) = P(\{HHT\} \cup \{HTH\} \cup \{THH\}) = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{3}{8}$
• $P(X = 3) = P(\{HHH\}) = \frac{1}{8}$
Discrete Probability Distributions
(a) Hence, the p.m.f is given as:
xᵢ        0      1      2      3
P(xᵢ)     1/8    3/8    3/8    1/8
(b) [Probability graph: vertical bars of heights 1/8, 3/8, 3/8 and 1/8 above x = 0, 1, 2 and 3 respectively; vertical axis P(xᵢ), horizontal axis xᵢ.]
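The p.m.f. of the number of heads can also be obtained by enumerating the sample space and counting, as in this short sketch (variable names are illustrative).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(s.count("H") for s in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)            # 0 1/8, 1 3/8, 2 3/8, 3 1/8

print(sum(pmf.values()))   # 1, as required of a legitimate p.m.f.
```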
Discrete Probability Distributions
Example 2: A committee of 4 is to be selected from a group of 5 men and
5 women. Let 𝑿 be the r.v. representing the number of women in the
committee. Create the p.m.f.
Solution:
$X$ = number of women; hence $x = 0, 1, 2, 3, 4$.
$n(S) = \binom{10}{4}$, where $S$ is the sample space.
Number of ways of selecting $x$ women = $\binom{5}{x}$
Number of ways of selecting $y$ men = $\binom{5}{y}$
But $x + y = 4 \Rightarrow y = 4 - x$
Therefore, the number of ways of constituting the committee is
$$ \binom{5}{x}\binom{5}{y} = \binom{5}{x}\binom{5}{4-x} $$

Hence, in conclusion,
$$ P(X = x) = \frac{\binom{5}{x}\binom{5}{4-x}}{\binom{10}{4}}, \quad x = 0, 1, 2, 3, 4. $$
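Evaluating this formula for each x gives the full p.m.f. of the committee example. The sketch below uses `math.comb` (Python 3.8 or later); the function name `p_women` and its default arguments are illustrative.

```python
from fractions import Fraction
from math import comb

def p_women(x, women=5, men=5, size=4):
    """P(X = x): x women and size - x men on the committee."""
    return Fraction(comb(women, x) * comb(men, size - x), comb(women + men, size))

pmf = {x: p_women(x) for x in range(5)}
for x, p in pmf.items():
    print(x, p)            # 0 1/42, 1 5/21, 2 10/21, 3 5/21, 4 1/42

print(sum(pmf.values()))   # 1
```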
Discrete Probability Distributions
Example 3:
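Verify that $p(x)$ is a pmf of some r.v. $X$ if:
$$ p(x) = \begin{cases} \frac{1}{21}(2x + 3), & x = 1, 2, 3 \\ 0, & \text{elsewhere} \end{cases} $$
Example 4:
Given
$$ p(x) = \begin{cases} k(x - 1), & x = 3, 4, 5 \\ 0, & \text{otherwise} \end{cases} $$
find $k \in \mathbb{R}$ such that $p(x)$ is a legitimate pmf.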
Continuous Probability Distributions
• The probability distribution for a continuous random variable is given
either in graphical form or functional form.
• This is known as probability density function (pdf) denoted by 𝒇 𝒙 .
• Tables cannot be used because listing all values is impossible.
Definition: A function $f(x)$ defined on the real numbers is called a pdf if it
satisfies the following properties:
P1: $f(x) \ge 0$, $\forall x$
P2: $\int_{-\infty}^{\infty} f(x)\,dx = 1$



Continuous Probability Distributions
Definition: Probability between two values $a$ and $b$.
$$ P(a \le X \le b) = \int_a^b f(x)\,dx, \quad -\infty \le a \le b \le \infty $$

Theorem: The probability that a continuous r.v. assumes a specific value is zero, i.e.
$P(X = a) = 0$.
Proof:
$$ P(X = a) = P(a \le X \le a) = \int_a^a f(x)\,dx = 0 $$



Continuous Probability Distributions
Theorem: For all $a, b$ such that $a \le b$,
$P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b)$
Proof:
$P(a \le X \le b) = P(X = a) + P(a < X < b) + P(X = b)$.
$\Rightarrow P(a \le X \le b) = 0 + P(a < X < b) + 0$.
$\Rightarrow P(a \le X \le b) = P(a < X < b)$.

NB: The other parts may be proved in a similar way.



Continuous Probability Distributions
Example: Let $X$ be a continuous random variable such that:
$$ f(x) = \begin{cases} \frac{1}{8}x, & 0 < x < 4 \\ 0, & \text{elsewhere} \end{cases} $$
(a) Show that 𝒇 𝒙 is a pdf.
(b) Sketch the graph.



Continuous Probability Distributions
Solution:
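(a) Clearly,
$f(x) = \frac{1}{8}x \ge 0$ for all $x \in (0, 4)$, and $f(x) = 0$ elsewhere.
Also,
$$ \int_{-\infty}^{\infty} f(x)\,dx = \int_0^4 \frac{1}{8}x\,dx = \frac{1}{8} \cdot \frac{x^2}{2}\Big|_0^4 = \frac{1}{16}\left[(4)^2 - (0)^2\right] = \frac{16}{16} = 1. $$
Hence, $f(x)$ is a pdf.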



Continuous Probability Distributions
(b) The graph of the pdf is given by:
[Graph: the straight line f(x) = x/8 rising from 0 at x = 0 to 4/8 at x = 4; vertical axis f(x), horizontal axis x.]



Continuous Probability Distributions
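Example 2: A r.v. $X$ has pdf
$$ f(x) = \begin{cases} ax, & 0 < x < 4 \\ 0, & \text{otherwise} \end{cases} $$
where $a$ is a constant.
(a) Find the value of the constant $a$
(b) Compute $P(2 < X < 3)$
Solution:
(a) Since the function is a pdf, $\int_0^4 f(x)\,dx = \int_0^4 ax\,dx = 8a = 1 \Rightarrow a = \frac{1}{8}$
(b) $P(2 < X < 3) = \int_2^3 \frac{1}{8}x\,dx = \frac{1}{16}\left(3^2 - 2^2\right) = \frac{5}{16}$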
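The two integrals can be approximated numerically as a sanity check. The sketch below uses a hand-written midpoint rule rather than any particular library; `integrate` and `f` are illustrative names, and the number of steps is an arbitrary choice.

```python
def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def f(x):
    """pdf from the example: x/8 on (0, 4), zero elsewhere."""
    return x / 8 if 0 < x < 4 else 0.0

total_area = integrate(f, 0, 4)
p_2_to_3 = integrate(f, 2, 3)
print(round(total_area, 6))  # 1.0     total area under the pdf, consistent with a = 1/8
print(round(p_2_to_3, 6))    # 0.3125  P(2 < X < 3) = 5/16
```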



Cumulative Distribution Function
Definition: Let $X$ be a r.v. and $x \in \mathbb{R}$. The cdf of $X$ is a function $F$ defined as
the probability that the r.v. $X$ takes on values less than or equal to $x$, i.e.
$F(x) = P(X \le x)$

• The cdf is the most universal characteristic of a r.v. thus, it exists for all
random variables be it discrete or continuous.
• It is also known as the distribution function



Distribution function of Discrete r.v.
Definition: Let $X$ be a discrete r.v. with pmf $p(x_i)$, then the cdf of $X$ is
given by $F(x) = \sum_{x_i \le x} p(x_i)$.
• If $X$ takes on only a finite number of values $x_1, x_2, \dots, x_n$, then the cdf is:
$$ F(x) = \begin{cases} 0, & -\infty < x < x_1 \\ p(x_1), & x_1 \le x < x_2 \\ p(x_1) + p(x_2), & x_2 \le x < x_3 \\ \vdots & \\ p(x_1) + p(x_2) + \dots + p(x_n) = 1, & x_n \le x < +\infty \end{cases} $$
• Such a function is a jump function and is right continuous

Dr. P. S. Andam 180


Distribution function of Discrete r.v.
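Example: Given the probability distribution below, find the cdf.
$x_i$      0    1    2    3
$p(x_i)$   1/8  3/8  3/8  1/8
Solution:
$F(0) = P(X \le 0) = P(0 \le X < 1) = \frac{1}{8}$.
$F(1) = P(X \le 1) = P(0 \le X < 2) = P(X = 0) + P(X = 1) = \frac{1}{8} + \frac{3}{8} = \frac{4}{8}$.
$F(2) = P(X \le 2) = P(0 \le X < 3) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} = \frac{7}{8}$.
$F(3) = P(X \le 3) = P(0 \le X \le 3) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1$.
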
Distribution function of Discrete r.v.
Hence the cdf is
$$F(x) = \begin{cases} 0, & x < 0 \\ \frac{1}{8}, & 0 \le x < 1 \\ \frac{4}{8}, & 1 \le x < 2 \\ \frac{7}{8}, & 2 \le x < 3 \\ 1, & x \ge 3 \end{cases}$$
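NB: The cdf of a discrete r.v. is a running total of its pmf. A minimal Python sketch of this for the distribution in the example follows; the names `xs`, `pmf`, `cum` and `cdf` are assumptions made for illustration. Note that `Fraction` reduces 4/8 to 1/2.

```python
from fractions import Fraction
from itertools import accumulate

xs = [0, 1, 2, 3]
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
cum = list(accumulate(pmf))  # F evaluated at each support point

def cdf(x):
    """F(x) = sum of p(x_i) over all x_i <= x."""
    return sum((p for xi, p in zip(xs, pmf) if xi <= x), Fraction(0))

print([str(c) for c in cum])  # ['1/8', '1/2', '7/8', '1']
```

Evaluating `cdf` between support points, for instance `cdf(1.5)`, returns the value at the previous jump, which is the right-continuous step behaviour described for discrete cdfs.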



Distribution function of Discrete r.v.
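• The graph of $F(x)$ is a step function with jumps at $x = 0, 1, 2, 3$ to the heights $1/8$, $4/8$, $7/8$ and $1$.
[Figure: step graph of $F(x)$ against $x$; vertical axis marked 2/8, 4/8, 6/8, 1; horizontal axis marked 0, 1, 2, 3, 4.]
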
Distribution function of Continuous r.v.
Definition: Let $X$ be a continuous r.v. with pdf $f(x)$. Then the cdf $F(x)$ is
given by
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

Example: The pdf of $X$ is given by
$$f(x) = \begin{cases} 0, & x < 0 \\ \frac{x}{2}, & 0 \le x \le 2 \\ 0, & x > 2 \end{cases}$$
Find the cdf of $X$.


Distribution function of Continuous r.v.
Solution
• If $x < 0$, then $F(x) = 0$.
• If $0 \le x \le 2$, then $F(x) = \int_0^x f(t)\,dt = \int_0^x \frac{t}{2}\,dt = \frac{x^2}{4}$.
• If $x > 2$, then $F(x) = \int_0^2 f(t)\,dt = \int_0^2 \frac{t}{2}\,dt = 1$.
Hence,
$$F(x) = \begin{cases} 0, & x < 0 \\ \frac{x^2}{4}, & 0 \le x \le 2 \\ 1, & x > 2 \end{cases}$$
NB: Graph the above function.

Properties of CDFs
Property 1
$0 \le F(x) \le 1, \ \forall x$.
Property 2
$F(x)$ is a non-decreasing function of $x$.
Property 3
$P(a < X \le b) = F(b) - F(a)$. For a continuous r.v. this also equals $P(a < X < b)$.
Property 4
$\lim_{x \to +\infty} F(x) = 1$ and $\lim_{x \to -\infty} F(x) = 0$.


Properties of the CDF
Theorem: Suppose $X$ is a continuous r.v. with cdf $F(x)$. Then, wherever $F$ is differentiable,
$$f(x) = F'(x)$$

Definition: If $P(X \le x_0) \ge \frac{1}{2}$ and $P(X \ge x_0) \ge \frac{1}{2}$, then $x_0$ is said to be a
median of the distribution.
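NB: Both statements can be illustrated numerically. The sketch below assumes the cdf $F(x) = x^2/4$ on $[0, 2]$ (the cdf of the pdf $f(x) = x/2$); the function names and the bisection tolerance are choices made here for illustration. It estimates $F'(1)$ by a central difference and solves $F(x_0) = 1/2$ for the median.

```python
import math

def F(x):
    """cdf of the pdf f(x) = x/2 on [0, 2]."""
    if x < 0:
        return 0.0
    if x <= 2:
        return x * x / 4
    return 1.0

def f_numeric(x, h=1e-6):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def median_by_bisection(lo=0.0, hi=2.0, tol=1e-12):
    """Solve F(x) = 1/2 on [lo, hi]; F is non-decreasing there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

m = median_by_bisection()
print(round(f_numeric(1.0), 6), round(m, 6))  # 0.5 1.414214
```

The estimate 0.5 agrees with $f(1) = 1/2$, and the median agrees with the closed form $x_0 = \sqrt{2}$.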



Numerical Characteristics of Random Variables
• These give a general quantitative description of the r.v. by obtaining a
single value from the values of the r.v.

• Some of these values are:


- Mathematical Expectation (Mean Value)
- Variance
- Moments



Numerical Characteristics of Random Variables
Mathematical Expectation
• This is the same as the mean or expected value of a r.v.
Definition: [The Expectation of a Discrete r.v.]
If $x_1, x_2, \dots, x_n$ is the range of a discrete r.v. $X$ which assumes the value
$x_i$ with corresponding probability $p(x_i)$, $i = 1, 2, \dots, n$, then the
expectation of $X$ is given by:
$$E(X) = \sum_{i=1}^{n} x_i\,p(x_i)$$


Numerical Characteristics of Random Variables
Relationship between the Expectation and the Arithmetic Mean
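Since we know that $p(x_i) = \frac{f_i}{N}$, where $f_i$ is the frequency of $x_i$ and $N = \sum_{i=1}^{n} f_i$, then
$$E(X) = \sum_{i=1}^{n} x_i\,p(x_i) = \sum_{i=1}^{n} x_i \frac{f_i}{N} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$

When the $x_i$ are all equally likely, $f_i = 1$ for all $i$, so $N = n$ and the expectation
becomes the arithmetic mean:
$$E(X) = \frac{1}{N}\sum_{i=1}^{n} x_i$$
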
Numerical Characteristics of Random Variables
Definition: [The Expectation of a Continuous r.v.]
Suppose $X$ is a continuous random variable with pdf $f(x)$. Then the expectation is given by:
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$


Numerical Characteristics of Random Variables
Properties of the Expected Value of a r.v.
Property 1: The expectation of a constant is the constant, i.e. $E(c) = c$, $c \in \mathbb{R}$.

Property 2: If $c$ is a constant and $X$ is a r.v., then $E(cX) = cE(X)$, $c \in \mathbb{R}$.

Property 3: $E(aX + b) = aE(X) + b$, given that $a, b \in \mathbb{R}$ and $X$ is a r.v.

Property 4: The expectation of the deviation of a r.v. $X$ from its mean $\mu = E(X)$ is zero, i.e.
$E(X - \mu) = 0$

Numerical Characteristics of Random Variables
Variance
• The variance of a r.v. $X$ is the expectation of the square of the deviation of the
r.v. from its expected value $\mu = E(X)$, i.e. (discrete case)
$$Var(X) = E\left[(X - E(X))^2\right] = E\left[(X - \mu)^2\right] = \sum_{i=1}^{n} (x_i - \mu)^2\,p(x_i)$$
• The positive square root of the variance, denoted $\sigma$, is the standard
deviation. Thus
$$\sigma = \sqrt{Var(X)}$$
• When the variance formula above is expanded and simplified, it becomes:
$$Var(X) = E\left[(X - \mu)^2\right] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2$$

Numerical Characteristics of Random Variables
Properties of the Variance
Property 1: The variance of a constant is zero, i.e. $Var(c) = 0$, $c \in \mathbb{R}$.

Property 2: $Var(aX) = a^2\,Var(X)$, $a \in \mathbb{R}$.

Property 3: $Var(aX + b) = a^2\,Var(X)$, $a, b \in \mathbb{R}$.


Numerical Characteristics of Random Variables
Example:
For the distribution of the r.v. $X$ below, find the expectation of $2X$.
$X = x_i$        0    1    2    3
$P(X = x_i)$   1/8  3/8  3/8  1/8

Solution:
$$E(2X) = \sum_{i=0}^{3} 2x_i\,p(x_i) = 2\sum_{i=0}^{3} x_i\,p(x_i) = 2\left(0 \times \frac{1}{8} + 1 \times \frac{3}{8} + 2 \times \frac{3}{8} + 3 \times \frac{1}{8}\right) = 2 \times \frac{12}{8} = 3$$

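NB: The properties of the expectation and the variance can be checked on this distribution with a small Python sketch. The helper names `expectation` and `variance` are assumptions made for illustration; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x_i) * p(x_i) over the support."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf, g=lambda x: x):
    """Var[g(X)] = E[g(X)^2] - (E[g(X)])^2."""
    return expectation(pmf, lambda x: g(x) ** 2) - expectation(pmf, g) ** 2

mean_x = expectation(pmf)                    # 3/2
mean_2x = expectation(pmf, lambda x: 2 * x)  # 3, i.e. 2 * E(X)
var_x = variance(pmf)                        # 3/4
var_2x = variance(pmf, lambda x: 2 * x)      # 3, i.e. 2**2 * Var(X)
```

The last two lines illustrate $Var(aX) = a^2\,Var(X)$ with $a = 2$.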
Numerical Characteristics of Random Variables
Moments
• The $k$th moment of a r.v. $X$ is the expectation of the $k$th power ($k = 1, 2, \dots$) of the r.v.
(or of its deviation from a fixed point), provided the expectation exists.
Types of Moments
• Moment about the origin
• Moment about the mean
• Moment about a point


Numerical Characteristics of Random Variables
Moment about the Origin
• The kth moment of a r.v. $X$ about the origin is defined as $E(X^k)$, which is
given by:
$$E(X^k) = \sum_{i=1}^{n} x_i^k\,p(x_i) \quad \text{(Discrete case)}$$
$$E(X^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx \quad \text{(Continuous case)}$$
NB: When $k = 1$, we have the expected value of $X$, i.e. $E(X)$.


Numerical Characteristics of Random Variables
Moment about the mean
• This is also known as the central moment.
• Thus, with $\mu = E(X)$, the kth central moment of $X$ is given by:
$$E\left[(X - \mu)^k\right] = \sum_{i=1}^{n} (x_i - \mu)^k\,p(x_i) \quad \text{(Discrete case)}$$
$$E\left[(X - \mu)^k\right] = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx \quad \text{(Continuous case)}$$
NB: The first central moment of a r.v. $X$ is zero, i.e. $E(X - \mu) = 0$.


Numerical Characteristics of Random Variables
Moments about any point
• The kth moment of a r.v. $X$ about any arbitrary point $a$ is defined as:
$$E\left[(X - a)^k\right] = \sum_{i=1}^{n} (x_i - a)^k\,p(x_i) \quad \text{(Discrete case)}$$
$$E\left[(X - a)^k\right] = \int_{-\infty}^{\infty} (x - a)^k f(x)\,dx \quad \text{(Continuous case)}$$
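NB: The three kinds of moment differ only in the point about which deviations are taken, so one function covers all of them in the discrete case. The sketch below is an illustration only; the function name `moment` and the sample pmf (values 0 to 3 with probabilities 1/8, 3/8, 3/8, 1/8) are assumptions chosen for the demonstration.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def moment(pmf, k, about=0):
    """k-th moment about the point `about`: E[(X - about)^k]."""
    return sum((x - about) ** k * p for x, p in pmf.items())

mu = moment(pmf, 1)                   # first moment about the origin, E(X)
first_central = moment(pmf, 1, mu)    # first central moment
second_central = moment(pmf, 2, mu)   # second central moment, the variance
third_central = moment(pmf, 3, mu)    # 0 here because the pmf is symmetric about mu
about_one = moment(pmf, 2, about=1)   # second moment about the point a = 1
```

With `about=0` the function gives moments about the origin, and with `about=mu` it gives central moments.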


Numerical Characteristics of Random Variables
Uses of Moments
• Used in finding
- the expectation of a r.v. 𝑿
- the variance
- Skewness
- kurtosis



LECTURE EIGHT - OUTLINE
• Probability Distributions
Discrete
- Bernoulli Distribution
- Binomial Distribution
- Poisson Distribution
Continuous
- Normal Distribution
Bernoulli Distribution
• Consider a random experiment that has only two outcomes; i.e. either event
A occurs or does not occur.

• Examples of such experiments include:


- Selecting a male from among a group of males and females
- selecting defective items from a batch of defective and non-defective
products
- Experiments resulting in either life or death, head or tail, hit or miss



Bernoulli Distribution
• If event A occurs, the trial is labelled a success ($A$) and if it does not
occur, that trial is labelled a failure ($\bar{A}$).
• Such experiments are called Bernoulli trials.

Definition: Bernoulli trial
It is a random experiment, the outcome of which can be classified in one of
two mutually exclusive and exhaustive ways.

NB: The definition consists of the characteristics of a Bernoulli trial.

Bernoulli Distribution
Characterization of the Bernoulli Distribution
• The probability of a success will be denoted by $p$ [i.e. $P(A) = p$] and
that of a failure by $q = 1 - p$ [i.e. $P(\bar{A}) = q = 1 - p$].
• Let the random variable $X$ indicate the occurrence of $A$.
• For simplicity's sake, denote $A$ by 1 and $\bar{A}$ by 0. Thus,
$P(\text{Success}) = P(X = 1) = P(A) = p$ and
$P(\text{Failure}) = P(X = 0) = P(\bar{A}) = q = 1 - p$

NB: The r.v. $X$ takes on only two values, i.e. $X = 0, 1$.


Bernoulli Distribution
Definition: Bernoulli Distribution
• A discrete random variable $X$ is said to have a Bernoulli distribution with
parameter $p$ if its probability distribution satisfies
$$P(X = x) = \begin{cases} p, & x = 1 \\ 1 - p, & x = 0 \\ 0, & \text{elsewhere} \end{cases}$$
The above is represented by the pmf:
$$P(X = x) = p(x) = p^x (1 - p)^{1 - x}, \quad x = 0, 1$$


Bernoulli Distribution
Theorem:
The Bernoulli distribution is a legitimate probability distribution.
Proof: Exercise.

Properties of the Bernoulli Distribution.


(a) $E(X) = p$
(b) $Var(X) = pq = p(1 - p)$
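NB: These properties can be verified directly from the pmf $p^x(1 - p)^{1 - x}$. The Python sketch below is an illustration only; the function name `bernoulli_pmf` and the value $p = 1/6$ are assumptions chosen for the demonstration.

```python
from fractions import Fraction

def bernoulli_pmf(x, p):
    """p(x) = p^x (1 - p)^(1 - x) for x in {0, 1}, and 0 elsewhere."""
    return p ** x * (1 - p) ** (1 - x) if x in (0, 1) else 0

p = Fraction(1, 6)  # illustrative value only
total = sum(bernoulli_pmf(x, p) for x in (0, 1))                   # 1
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))                # p
var = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))   # p(1 - p)
```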



Bernoulli Distribution
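Example 1: If a fair die is tossed once, let the r.v. $X$ be such that
$$X = \begin{cases} 1, & \text{if a 3 occurs} \\ 0, & \text{otherwise} \end{cases}$$
Find (i) the probability distribution of $X$ (ii) the mean and variance of $X$.

Solution: $S = \{1, 2, 3, 4, 5, 6\}$, $P(X = 1) = \frac{1}{6}$ and $P(X = 0) = \frac{5}{6}$. Hence,
(i) $$P(X = x) = \begin{cases} \frac{1}{6}, & x = 1 \\ \frac{5}{6}, & x = 0 \end{cases}$$
(ii) $E(X) = p = \frac{1}{6}$ and $Var(X) = pq = \frac{1}{6} \times \frac{5}{6} = \frac{5}{36}$
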
Bernoulli Distribution
Example 2: Suppose the probability of germination of a bean seed is
0.8 and the germination of a seed is considered a success. If 10 seeds are
planted independently of each other, describe the experiment and
characterize its probability distribution.
Solution: The experiment involves 10 Bernoulli trials, each with success
probability (or parameter) $p = 0.8$. Thus, if $X$ denotes the germination of a seed, then
$X = 0$ or $1$, where 0 = non-germination and 1 = germination of a seed, and
$$P(X = x) = \begin{cases} 0.8, & x = 1 \\ 0.2, & x = 0 \end{cases}$$


Binomial Distribution
• It is based on a Bernoulli process.
• A Bernoulli process is a series of independent and identical Bernoulli trials.
• An example of a Bernoulli process is drawing n items one at a time with replacement from a batch of N items.


Binomial Distribution
Definition: Binomial Experiment
An experiment is said to be Binomial if it possesses the following properties;
(a) The experiment is repeated n times under identical conditions.
(b) Each of the n trials is a Bernoulli trial (i.e. each trial must result in either a
success or a failure)
(c) The n trials are independent of each other.
(d) The probability of a success on a single trial p remains the same for all trials.
(e) The random variable of interest, X is the number of successes in n Bernoulli
trials.



Binomial Distribution
Example: A coin is tossed 10 times. The random variable of interest, $X$, is the
number of heads that appear. Is this a Binomial experiment? What are the
parameters?
Solution:
(a) The experiment consists of n = 10 trials. One trial represents determining
whether a head appears.
(b) Each trial results in one of two outcomes: a head or not a head (a tail).
(c) The probability of obtaining a head is the same on each trial, i.e. $p = \frac{1}{2}$, assuming the coin is fair.
(d) The trials are independent because the outcome of one toss does not affect the others.
(e) The random variable of interest, $X$, is the number of heads (successes) in ten
trials.
Hence this is a Binomial experiment with parameters $n = 10$ and $p = \frac{1}{2}$.

Binomial Distribution
Definition: Binomial Distribution
A discrete r.v. $X$ is said to have a binomial distribution with parameters $n$ and $p$
($n \in \mathbb{Z}^+$, $0 \le p \le 1$, where $X$ counts the successes in $n$ Bernoulli trials) if it has pmf:
$$P(X = x) = \begin{cases} \binom{n}{x} p^x (1 - p)^{n - x}, & x = 0, 1, 2, \dots, n \\ 0, & \text{otherwise} \end{cases}$$

NB:
Theorem: The Binomial Distribution is a legitimate probability distribution.

Binomial Distribution
Properties of the Binomial Distribution
If $X \sim b(x; n, p)$ then:
(a) $E(X) = np$
(b) $Var(X) = npq = np(1 - p)$
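NB: The mean $np$ and the variance $np(1 - p)$ of a binomial r.v. can be confirmed by summing over the pmf directly. The sketch below is an illustration only; the function name `binom_pmf` and the parameters $n = 12$, $p = 1/3$ are assumptions, not values from the lecture.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        return 0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 12, Fraction(1, 3)  # illustrative parameters only
probs = [binom_pmf(x, n, p) for x in range(n + 1)]
total = sum(probs)                                           # 1, so the pmf is legitimate
mean = sum(x * q for x, q in enumerate(probs))               # n * p = 4
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))  # n * p * (1 - p) = 8/3
```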



Binomial Distribution
Example: A fair coin is tossed 5 times. Find the probability that
(a) Exactly three heads occur
(b) At least three heads occur
(c) At least one head occurs.

Solution:
Let a head be 'a success'; then $X$ = the number of heads,
and $n = 5$, $p = \frac{1}{2}$, $q = 1 - p = \frac{1}{2}$.
The r.v. $X \sim b(x; 5, \frac{1}{2})$

Binomial Distribution
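(a) $P(X = 3) = b(3; 5, \frac{1}{2}) = \binom{5}{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^{5-3} = \frac{10}{32} = 0.3125$

(b) $P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5) = \frac{10 + 5 + 1}{32} = 0.5$

(c) $P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{1}{32} = 0.96875$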
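NB: The probabilities for five tosses of a fair coin (exactly three heads, at least three heads, at least one head) can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; the function name `binom_pmf` is an assumption.

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.5
exactly_three = binom_pmf(3, n, p)
at_least_three = sum(binom_pmf(x, n, p) for x in range(3, n + 1))
at_least_one = 1 - binom_pmf(0, n, p)  # complement of "no heads"
print(exactly_three, at_least_three, at_least_one)  # 0.3125 0.5 0.96875
```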



Binomial Distribution
Example: In a certain gambling game, a player tosses a fair coin; if it falls
heads he wins $100 and if it falls tails he loses $100. A player with $800 tosses
the coin 6 times. What is the probability that he will be left with $600?
Solution:
Let X = number of times the player wins. Then X has a binomial
distribution with parameters n = 6, p = 1/2.
After the six tosses the player holds 800 + 100X − 100(6 − X) = 200 + 200X dollars, so he is
left with $600 exactly when X = 2, i.e. he wins only twice in the six tosses. Hence:
$$P(X = 2) = \binom{6}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^{6-2} = \frac{15}{64} = 0.2344$$

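NB: Since there are only 2^6 = 64 equally likely sequences of six tosses, the answer can also be confirmed by brute-force enumeration. The sketch below is an illustration only; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import product

start, stake, tosses, target = 800, 100, 6, 600

# Enumerate all 2^6 equally likely head/tail sequences (1 = head = win).
outcomes = list(product((0, 1), repeat=tosses))
finals = [start + stake * sum(seq) - stake * (tosses - sum(seq)) for seq in outcomes]
prob_target = Fraction(sum(1 for f in finals if f == target), len(outcomes))
print(prob_target, float(prob_target))  # 15/64 0.234375
```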
Binomial Distribution
Distribution Function
The cdf of the binomial r.v., $B(r; n, p)$, gives the probability of obtaining $r$
or fewer successes in $n$ trials ($r \le n$). It is obtained by adding the individual
binomial probabilities for all values less than or equal to $r$, i.e.
$$B(r; n, p) = P(X \le r) = b(0; n, p) + b(1; n, p) + \dots + b(r; n, p)$$
Hence in concise form:
$$B(r; n, p) = \sum_{x=0}^{r} b(x; n, p), \quad r = 0, 1, 2, \dots, n$$


Binomial Distribution
Example: A fair coin is tossed 10 times. What is the probability of getting:
(a) Less than 8 heads?
(b) 8 or more heads?
Solution: We have a binomial distribution with $n = 10$, $p = \frac{1}{2}$.
(a) $P(X < 8) = P(X \le 7) = \sum_{x=0}^{7} b(x; 10, 0.5) = B(7; 10, 0.5) = 0.94531$.
(b) $P(X \ge 8) = \sum_{x=8}^{10} b(x; 10, 0.5) = 1 - B(7; 10, 0.5) = 0.05469$.
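NB: The binomial cdf is a partial sum of the pmf, which is straightforward to compute. The sketch below is an illustration only (the function name `binom_cdf` is an assumption); it uses exact fractions for ten tosses of a fair coin.

```python
from fractions import Fraction
from math import comb

def binom_cdf(r, n, p):
    """B(r; n, p) = sum of b(x; n, p) for x = 0, 1, ..., r."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(r + 1))

less_than_8 = binom_cdf(7, 10, Fraction(1, 2))
eight_or_more = 1 - less_than_8
print(less_than_8, eight_or_more)  # 121/128 7/128
```

As decimals these are 0.9453125 and 0.0546875, in agreement with the tabulated values 0.94531 and 0.05469.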


Binomial Distribution
Example: Two fair dice are tossed 360 times. Find
(a) how many times you would expect the sum of the numbers which
show to be 10;
(b) the standard deviation of the number of times a sum of 10 is obtained.

Solution: Let $X$ be the number of times the sum of the two numbers which
show up is 10.
Then $X \sim b(x; 360, p)$.
The required event is $\{(4, 6), (6, 4), (5, 5)\}$, so $p = \frac{3}{36} = \frac{1}{12}$.

Binomial Distribution
(a) $E(X) = np = 360 \times \frac{1}{12} = 30$
(b) $Var(X) = np(1 - p) = 360 \times \frac{1}{12} \times \left(1 - \frac{1}{12}\right) = 27.50$
Therefore, the standard deviation is $\sigma = \sqrt{Var(X)} = \sqrt{27.50} = 5.24$
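NB: The success probability for a sum of 10 on two fair dice, and the resulting mean and standard deviation over 360 tosses, can be checked by enumerating the 36 outcomes. This Python sketch is an illustration only; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# p = P(sum of two fair dice is 10), found by enumerating the 36 outcomes.
rolls = list(product(range(1, 7), repeat=2))
p = Fraction(sum(1 for a, b in rolls if a + b == 10), len(rolls))

n = 360
mean = n * p             # np
var = n * p * (1 - p)    # np(1 - p)
sd = sqrt(var)
print(p, mean, var, round(sd, 2))  # 1/12 30 55/2 5.24
```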


The Poisson Distribution
• Suppose that events occur at random points in time (or in space or volume).
• Such events may be occurrences of accidents, errors, calls
coming into a telephone exchange, enquiries at an
information desk, etc.
• Such occurrences are assumed to follow a Poisson process.
• The Poisson distribution is used to model the number of occurrences of events of a specified
type in a period of time of length $t$, when events of this type are
occurring randomly at a mean rate $\lambda$ per unit time.


The Poisson Distribution
• It is based on the Poisson process.
• A process is said to be a Poisson process if the following postulates hold:
(a) The occurrence of an event in any given interval is independent of its occurrence in any
other non-overlapping interval.
(b) The probability of a single occurrence of the event in a given small interval is
proportional to the length of the interval (stationarity).
(c) In any infinitesimally small portion of the interval, the probability of
more than one occurrence of the event is negligible.


Poisson Distribution
Definition: A discrete r.v. $X$ is said to have a Poisson distribution with
parameter $\lambda$ ($\lambda > 0$) if its pmf is
$$p(x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!}, & x = 0, 1, 2, \dots \\ 0, & \text{elsewhere} \end{cases}$$

The Poisson distribution is denoted by $p(x; \lambda)$.

Theorem: The Poisson distribution is a legitimate probability distribution.


Poisson Distribution
Example: Suppose that the number of calls at a switchboard during a 30-minute
time interval is known to be a r.v. having a Poisson distribution with
parameter $\lambda = 3$. Find the probability that during a given 30 minutes,
(a) Exactly 4 calls will be received
(b) More than 4 calls will be received
(c) At least one call will be received
Solution:
The 30-minute time interval is taken as the unit of time. Thus $\lambda = 3$.


Poisson distribution
Let $X$ be the number of calls received at the switchboard.
Then $X \sim p(x; 3)$.
(a) $P(X = 4) = \frac{e^{-3} 3^4}{4!} = 0.1680$
(b) $P(X > 4) = 1 - P(X \le 4) = 0.1847$
(c) $P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-3} = 0.9502$
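NB: The three switchboard probabilities with $\lambda = 3$ can be reproduced from the Poisson pmf. The sketch below is an illustration only; the function names `poisson_pmf` and `poisson_cdf` are assumptions.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """p(x; lam) = e^(-lam) lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

def poisson_cdf(r, lam):
    """P(r; lam) = sum of p(x; lam) for x = 0, 1, ..., r."""
    return sum(poisson_pmf(x, lam) for x in range(r + 1))

lam = 3
exactly_4 = poisson_pmf(4, lam)
more_than_4 = 1 - poisson_cdf(4, lam)
at_least_1 = 1 - poisson_pmf(0, lam)
print(round(exactly_4, 4), round(more_than_4, 4), round(at_least_1, 4))
# 0.168 0.1847 0.9502
```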



Poisson Distribution
Distribution function
The cdf of a Poisson r.v. $X$ with parameter $\lambda$, denoted by $P(r; \lambda)$, is
defined as:
$$P(r; \lambda) = P(X \le r) = \sum_{x=0}^{r} p(x; \lambda)$$


Poisson Distribution
Properties of the Poisson Distribution
If 𝑿 has a Poisson distribution with parameter 𝝀, then
(a) $E(X) = \lambda$
(b) $Var(X) = \lambda$

The above is a very important characteristic of the Poisson distribution,
since the mean and the variance are equal:
$$E(X) = \lambda = Var(X)$$


Poisson Distribution
Poisson as a limiting form of Binomial Distribution
Theorem:
Suppose $X$ is a binomial r.v. with parameters $n$ and $p$. Then for large
$n$ ($n \to \infty$) and small $p$ ($p \to 0$) such that $\lambda = np$ remains constant and
moderate, the binomial distribution is approximated by the Poisson
distribution, i.e.
$$\lim_{n \to \infty} \binom{n}{x} p^x q^{n-x} = \frac{e^{-\lambda}\lambda^x}{x!}$$


Poisson Distribution
Definition: Rare events
In a binomial distribution, if $n \to \infty$ and $p \to 0$ (so that $q \to 1$), the event
is called rare.
In practice, if $n \ge 50$ while $np < 5$, then the binomial distribution is very
closely approximated by the Poisson distribution with $\lambda = np$.
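NB: The quality of the approximation can be seen by comparing the two pmfs term by term. The sketch below is an illustration only; the values $n = 100$, $p = 0.02$ are assumptions chosen to satisfy $n \ge 50$ and $np < 5$, and the function names are likewise assumptions.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """p(x; lam) = e^(-lam) lam^x / x!."""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 100, 0.02  # illustrative values with n >= 50 and np = 2 < 5
lam = n * p

# Largest absolute difference between the two pmfs over x = 0, 1, ..., n.
max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(n + 1))
print(max_gap < 0.01)  # True
```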



Poisson Distribution
Example: Suppose the probability that a newborn baby will die of a certain
disease is 0.00002. What is the probability that out of 100,000 newborn babies
(a) four or more will die of this disease?
(b) exactly four will die of this disease?
Solution: Let $X$ be the number of babies who die of the disease. Then $n = 100{,}000$, $p = 0.00002 \Rightarrow q = 1 - p = 0.99998$, and $\lambda = np = 2$.
(a) $P(X \ge 4) = \sum_{x=4}^{100{,}000} \binom{100{,}000}{x} (0.00002)^x (0.99998)^{100{,}000 - x} \approx 1 - P(3; 2) = 0.14288$
(b) $P(X = 4) \approx P(4; 2) - P(3; 2) = \frac{e^{-2} 2^4}{4!} = 0.0902$
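NB: Both parts can be evaluated numerically, using the binomial pmf directly and the Poisson approximation with $\lambda = np = 2$. The sketch below is an illustration only (function names are assumptions); it uses the complement so that only the terms $x = 0, 1, 2, 3$ need to be summed, and values are rounded to four places.

```python
from math import comb, exp, factorial

n, p = 100_000, 0.00002
lam = n * p

def binom_pmf(x):
    """Binomial pmf b(x; n, p) for the n and p defined above."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x):
    """Poisson pmf p(x; lam) with lam = n * p."""
    return exp(-lam) * lam ** x / factorial(x)

# P(X >= 4) = 1 - P(X <= 3), so only four terms are summed in each case.
binom_ge_4 = 1 - sum(binom_pmf(x) for x in range(4))
poisson_ge_4 = 1 - sum(poisson_pmf(x) for x in range(4))
print(round(binom_ge_4, 4), round(poisson_ge_4, 4), round(poisson_pmf(4), 4))
# 0.1429 0.1429 0.0902
```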
