
ECN 1203 Introductory Statistics Notes 2018-2019

Lecture 1: Introduction
We live in a world where numbers are very important in understanding the events
and things that are happening around us. Every day we are bombarded with
numbers that try to provide information about the world.

The more we understand the meaning and information these numbers are trying to
convey the better we understand the world and the more informed the decisions and
actions we take.

This course will provide us with a range of techniques and methods to better
understand numbers and the information they convey.

At the beginning of this course we would first like to define the term
statistics and some basic definitions of terms we use throughout the course.

Definition of Statistics.

Two meanings:

1. Narrow Meaning: Statistics refer to numerical data/facts. Here statistics
refer to the numbers themselves or numbers derived from other numbers. E.g.
employment statistics, the inflation rate, the speed limit etc.
2. Broad Meaning: Statistics is concerned with scientific methods for collecting,
organising, summarising, presenting and analysing data, as well as with drawing
valid conclusions and making reasonable decisions on the basis of such analysis.
(SOS)

The broader meaning of Statistics therefore points to a process that takes time. Our
whole course involves this process.

Two Branches of Statistics


1. Descriptive Statistics:
Much of the process mentioned in the broader meaning of Statistics falls under
the area of descriptive statistics. The basic meaning is that descriptive
statistics involves collecting, summarising, organising and analysing raw data.
Descriptive Statistics use actual data that has been collected.

2. Inferential Statistics:
Inferential Statistics involves using samples and information gathered from
samples to make conclusions or inferences about the population.

Population: refers to all the members, items in a general set under study. Eg.
All the students on campus, all the trees on campus, all the transactions
undertaken by a bank teller for a specific day.

Sample: A part or subset of a population.


Therefore when we are doing inferential statistics we use samples and
information derived from these samples to make conclusions about the
population.

Variables
A variable is a characteristic or trait related to a sample or a population. A variable
is a symbol, such as X, Y, H, or B, which can assume any of a prescribed set of
values. Therefore the values the variable takes on depend on the problem being
studied.

The value or measurement of a characteristic or trait related to a member or item of
a sample or population gives rise to a data point or observation. A collection of data
points gives rise to a data set.

Two types of variables :


1. Qualitative: A qualitative variable is represented by categories. It is non-
numeric in nature. Examples of this type of variable are gender, religious
affiliation, eye colour, type of car owned etc.

2. Quantitative variable: A Quantitative variable is one that takes on numeric


values eg the balance in a checking account, the number of children in a family,
height of students etc.

This type of variable is further subdivided into two categories:


Discrete:

Discrete variables are generated by counting items or elements in a sample or


population.
Continuous:
Continuous variables are generated by measuring.

Cross section vs Time Series Data

Cross Section Data: data collected on different units (people, firms, countries)
at the same point in time.

Time Series Data: data collected on the same unit over successive periods of time.

Sources of Data: Primary vs Secondary Data
Primary Data: data collected first-hand by the researcher for the purpose of the study.

Secondary Data: data originally collected by someone else and obtained from existing
sources.

2. Summarising Data: Arrays, Frequency Distributions and
Graphical Techniques of Representing Data.
Raw Data
Raw data has no order and represents the data as collected. The first part of the
course focuses on techniques for making such data more understandable.

Organising and Summarising Raw Data


1. Arrays
An array is a simple arrangement of raw data either in ascending or
descending order. Despite its lack of summary the array does offer some
information which may not be readily seen in the raw data.
Eg. 17, 45, 38, 27, 6, 48, 11, 57, 34 and 22

Ascending Array: 6, 11, 17, 22, 27, 34, 38, 45, 48, 57
Descending Array: 57, 48, 45, 38, 34, 27, 22, 17, 11, 6
Assume that the numbers in the arrays are marks from a test. What
information can such an array yield?
1) The highest mark is 57.
2) The lowest mark is 6.
3) The Range is 57 – 6 = 51.
4) The number of students who passed the test with a mark of 50 or more (here, only one).
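The same information can be pulled from the array programmatically; a minimal Python sketch using the marks above:

```python
# Sketch: deriving array information from the raw marks above.
raw = [17, 45, 38, 27, 6, 48, 11, 57, 34, 22]

ascending = sorted(raw)             # 6, 11, 17, 22, 27, 34, 38, 45, 48, 57
descending = sorted(raw, reverse=True)

highest = ascending[-1]             # 57
lowest = ascending[0]               # 6
value_range = highest - lowest      # 57 - 6 = 51
passes = sum(1 for mark in ascending if mark >= 50)  # marks of 50 or more
```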

2. Frequency distributions

Frequency Distributions are tables used to summarise and present raw data.
The type of distributions depends on the nature of the data collected. Thus
Frequency distributions may present qualitative or quantitative data. FDs
representing the latter may present discrete or continuous data. Further the
FD may be ungrouped or grouped depending on the size of the data set.

Qualitative Frequency Distributions

A qualitative frequency distribution presents non-numeric or qualitative data.
Here data is summarised using non-overlapping classes, where the latter are
given as categories. The example given below is taken from Bluman, our main
textbook.

Example 1:In this example the blood types of twenty five persons are given
and the FD for this data is given below.
Source: Alan Bluman “Elementary Statistics” – 8th Edition. McGraw Hill, New York

Example 2: Another worked out example is given below with some details:

Source: Prem Mann "Introductory Statistics" – 8th Edition. John Wiley and Sons

Diagrams to represent a qualitative frequency distribution


1. Quantitative Frequency distributions

These distributions represent numeric data, which may be discrete or continuous.


They usually fall into two categories, ungrouped and grouped distributions.

1.1. Ungrouped Distributions

An ungrouped Frequency Distribution is used when the data set is relatively small.
Items in the data set are therefore represented by single values of the variable in
question and the process of constructing the FD involves simply counting how many
items take on the different values the variable can assume. An example taken from
Mann is given below.

The following FD is worked out from the data above. Notice that it uses single value
classes.
Source: Prem Mann "Introductory Statistics" – 8th Edition. John Wiley and Sons

1.2. Grouped Distributions

In Table 1 the heights of 100 students at State University are recorded to
the nearest inch. Construct a grouped frequency distribution of the data given
in Table 1.

Table 1

61 68 63 68 73 64 66 68 67 72 66 65
64 70 65 67 70 67 69 64 68 64 67 70
66 74 68 69 67 73 71 69 69 65 71 66
62 70 67 70 63 68 66 69 67 70 66 73
69 66 65 67 67 70 62 67 70 70 71 66
69 73 66 69 64 68 66 69 71 64 67 67
66 64 67 68 68 70 63 66 69 66 62 69
63 66 61 68 67 72 66 68 63 63 67 68
70 73 69 64

2.2.1 Grouped frequency distribution

Class limits    Class boundaries    Frequency

60-62    59.5-62.5    5
63-65    62.5-65.5    18
66-68    65.5-68.5    42
69-71    68.5-71.5    27
72-74    71.5-74.5    8
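The tallying step above can be sketched in code; `heights` here is a short illustrative sample rather than the full 100 values from Table 1:

```python
# Sketch: tallying heights into the classes 60-62, 63-65, ... used above.
# `heights` is a short illustrative sample, not the full data set.
heights = [61, 68, 63, 68, 73, 64, 66, 68, 67, 72, 66, 65]

classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

freq = {limits: 0 for limits in classes}
for h in heights:
    for lower, upper in classes:
        if lower <= h <= upper:
            freq[(lower, upper)] += 1
            break

for (lower, upper), f in freq.items():
    # class boundaries extend half a unit beyond the stated limits
    print(f"{lower}-{upper}  boundary {lower - 0.5}-{upper + 0.5}  f={f}")
```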
2. Constructing the Histogram, Ogive and the Frequency Polygon

3.1 Using the frequency distribution in Table 2 below, construct a histogram

3.2 Using the frequency distribution in Table 2 below, construct a frequency
polygon

3.3 Using the frequency distribution in Table 2 below, construct an ogive

Table 2
Marks f Mid-point R.F P.F (%) C.F
(R.F = relative frequency, P.F = percentage frequency, C.F = cumulative frequency)

60-62 5 61 0.05 5 5
63-65 18 64 0.18 18 23
66-68 42 67 0.42 42 65
69-71 27 70 0.27 27 92
72-74 8 73 0.08 8 100
100 1.00 100
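The derived columns of the table can be reproduced from the class frequencies alone; a short sketch:

```python
# Sketch: reproducing the R.F, P.F (%) and C.F columns of Table 2
# from the class frequencies.
freqs = [5, 18, 42, 27, 8]
n = sum(freqs)                      # 100

cum = 0
for f in freqs:
    cum += f
    rf = f / n                      # relative frequency
    pf = 100 * rf                   # percentage frequency
    print(f"f={f:2d}  R.F={rf:.2f}  P.F={pf:.0f}%  C.F={cum}")
```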

3. Measures of Central Tendency


4.1 The Mean
The mean, also known as the arithmetic average, is found by adding the values of
the data and dividing by the total number of values: mean = (∑x)/n

4.1.1 Calculating the mean of ungrouped data


Example 4.1: Find the mean of the following numbers 45,49,12,61,70,36
= (45+49+12+61+70+36)/6
=273/6
= 45.5

Table 3.3

Marks f Mid-point (x) fx x² fx²

60-62 5 61 305 3,721 18,605

63-65 18 64 1,152 4,096 73,728

66-68 42 67 2,814 4,489 188,538

69-71 27 70 1,890 4,900 132,300

72-74 8 73 584 5,329 42,632

100 6,745 22,535 455,803

4.1.2 From the table above calculate the mean of the grouped frequency
distribution

Mean = (∑fx)/n

=6,745/100
=67.45
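The grouped mean above can be checked directly from the f and midpoint columns of the table:

```python
# Sketch: the grouped mean (sum of f*x)/n using the midpoints in Table 3.3.
freqs = [5, 18, 42, 27, 8]
midpoints = [61, 64, 67, 70, 73]

n = sum(freqs)                                   # 100
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
print(mean)                                      # 67.45
```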

4.2 Calculating the Median

The median is the middle value of a data array.

4.2.1 Ungrouped Data

45,49,12,61,70,36,18

Array: 12, 18, 36, 45, 49, 61, 70

Median = 45

4.2.2 Grouped Frequency Distribution

Median = L + ((n/2 − CF)/f) × w, where L is the lower boundary of the median class,
CF is the cumulative frequency before that class, f is its frequency and w is the
class width.

= 65.5 + ((100/2 - 23)/42) * 3

= 65.5 + ((50-23)/42)* 3

= 65.5 + 1.93

= 67.43
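The calculation uses the grouped-median formula L + ((n/2 − CF)/f) × w; a sketch with the same figures:

```python
# Sketch: grouped-median formula applied to the distribution above
# (median class 66-68).
L = 65.5     # lower boundary of the median class
n = 100      # total frequency
CF = 23      # cumulative frequency before the median class
f = 42       # frequency of the median class
w = 3        # class width

median = L + ((n / 2 - CF) / f) * w
print(round(median, 2))   # 67.43
```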

4.3 Calculating the Mode

The mode is the value that occurs most often in a data set. It is sometimes said to be
the most typical case.

4.3.1 Ungrouped Data

25,15,18,25,17,25,12

= 25

4.3.2 Grouped Data

Mode = L + (Δ1/(Δ1 + Δ2)) × w, where Δ1 = 42 − 18 = 24 and Δ2 = 42 − 27 = 15

= 65.5 + ((24)/(24+15)) * 3

=65.5 + 1.85

=67.35
4. Measures of Dispersion

4.1. Range

4.1.1. Range Ungrouped Data


25,15,18,63,17,75,12
= Highest Value – Lowest Value
= 75 – 12
= 63

4.1.2. Range Grouped Data


4.1.2.1 The Mid Point Method
= Midpoint of last class − Midpoint of first class (MPLC − MPFC)
= 73 – 61
= 12

4.1.2.2 The Boundary Method

= Upper boundary of last class − Lower boundary of first class (UBLC − LBFC)
= 74.5 - 59.5
= 15

4.2. Mean Absolute Deviation (MAD)


Computing the MAD for a raw data set.
5, 9, 16, 17, 18

x   (x − µ)   |x − µ|

5 -8 8

9 -4 4

16 3 3

17 4 4

18 5 5

65 0 24

µ= 13

Mean Absolute Deviation = Σ|x − µ| / N

= 24/5
= 4.8
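The raw-data MAD above can be checked directly:

```python
# Sketch: mean absolute deviation for the raw data above.
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)                     # 13.0
mad = sum(abs(x - mu) for x in data) / len(data)
print(mad)                                     # 4.8
```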

Computing Mean Absolute Deviation for Grouped Data.

Class   f   Mid-point (x)   fx   (x − x̄)   |x − x̄|   f|x − x̄|
118-126 3 122 366.0 -24.98 24.98 74.94
127-135 5 131 655.0 -15.98 15.98 79.90
136-144 9 140 1,260.0 -6.98 6.98 62.82
145-153 12 149 1,788.0 2.02 2.02 24.24
154-162 5 158 790.0 11.02 11.02 55.10
163-171 4 167 668.0 20.02 20.02 80.08
172-180 2 176 352.0 29.02 29.02 58.04
40 5,879.0 435.12

Mean = (∑fx)/n

= 5,879/40
= 146.98
Mean Absolute Deviation = Σf|x − x̄| / n = 435.12/40

= 10.88

On average the weight of each person in the data set deviates by 10.88 lbs from the
average of all persons in the data set.

4.3. Variance and Standard Deviation

Ungrouped data

Definitional formulas for population.

σ² = Σ(x − µ)² / N

= 1750/6

= 291.7

σ = √(Σ(x − µ)² / N)

= √291.7

= 17.08
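As a check, the definitional formulas can be evaluated directly. The data set below is illustrative only: the original values are not reproduced here, so these six numbers are chosen simply so that Σ(x − µ)² = 1750 and N = 6, matching the worked figures above:

```python
# Sketch: population variance and standard deviation by the definitional
# formulas. `data` is an illustrative stand-in, not the original data set.
import math

def pop_variance(data):
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

data = [10, 60, 50, 30, 40, 20]     # mean 35, sum of squared deviations 1750
var = pop_variance(data)            # 1750 / 6 = 291.7 (to 1 d.p.)
sd = math.sqrt(var)                 # 17.08 (to 2 d.p.)
```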

Definitional formulas for sample

The formulas above are the definitional formulas, however the shortcut formulas
below are the ones which should be used.
Variance and standard deviation for grouped data.
Population

Sample

Example
Marks f Mid-point (x) fx x² fx²

10-20 2 15 30 225 450

21-31 8 26 208 676 5,408

32-42 15 37 555 1,369 20,535

43-53 7 48 336 2,304 16,128

54-64 10 59 590 3,481 34,810

65-75 3 70 210 4,900 14,700

45 1,929 12,955 92,031

S² = [n(∑fx²) − (∑fx)²] / [n(n − 1)]

= [45(92,031) − (1,929)²] / [45(45 − 1)]

= (4,141,395 − 3,721,041) / 1,980

= 212.3

S = √S²

= √212.3

= 14.57
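The shortcut computation can be verified from the table's f and midpoint columns:

```python
# Sketch: shortcut sample variance and standard deviation for the
# grouped example above.
import math

freqs = [2, 8, 15, 7, 10, 3]
midpoints = [15, 26, 37, 48, 59, 70]

n = sum(freqs)                                             # 45
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))      # 1,929
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints)) # 92,031

s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(s2)
print(round(s2, 1), round(s, 2))                           # 212.3 14.57
```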
The Empirical Rule
Introduction to Probability
Uncertainty, Random experiments, sample points, sample spaces and
events.
We need to understand these concepts in order to calculate probabilities.

Uncertainty and Random Experiments.


If a process has two or more possible outcomes, the outcomes are said to be
uncertain.

Definition of Experiment: an experiment is an act or process of observation that


leads to a single outcome that cannot be predicted with certainty.

Examples:
1. The toss of a coin. The outcomes of this experiment are: Head or Tail. Using set
notation the set of outcomes is {H, T}.
2. Rolling a fair die. There are six possible outcomes of this experiment. These
are: 1, 2, 3, 4, 5, 6. Using set notation the set of outcomes is {1, 2, 3, 4, 5, 6}.
3. Tossing two coins: The outcomes of this experiment are: HH, HT, TH, TT.
Using set notation the set of outcomes is {HH, HT, TH, TT}

Sample Point

The most basic outcome of an experiment (simple event).

A sample point cannot be decomposed further. Refer to the experiments above to


identify sample points.

The first experiment has two sample points, while the second experiment has six
sample points.
Sample spaces

A sample space is a set containing all the possible outcomes or sample points of an
experiment, where sample points are collectively exhaustive and each pair of sample
points is mutually exclusive (universal set). The sample space is represented by the
letter S.

Examples:
1. The toss of a coin. The outcomes of this experiment are: Head or Tail. Using set
notation: S = {H, T}.
2. Rolling a fair die. There are six possible outcomes of this experiment. These
are: 1, 2, 3, 4, 5, 6. Using set notation: S = {1, 2, 3, 4, 5, 6}.
3. Tossing two coins: The outcomes of this experiment are: HH, HT, TH, TT.
Using set notation: S = {HH, HT, TH, TT}

In example 1 N(S) = 2

In example 2 N(S) = 6

In example 3 N(S) = 4

Sample spaces can also be represented by (1) Venn diagrams, (2) two
dimensional graphs and (3) Tree diagrams.

Venn diagrams
Two Dimensional Diagram

Sample space for the experiment of rolling two dice.

Tree Diagrams

The above ways of representing a sample space are useful when the sample space
is relatively small. In situations where the sample space is large other techniques
are used to determine the sample space for experiments. A study of combinations
and permutations may be useful here.
Events
An event is defined as a subset of the sample space. In other words an event is a
collection of sample points.

Types of events and related concepts


1. Simple Event: A simple event is a subset containing exactly one sample point in
a sample space. It is also known as an elementary event or a fundamental event.

Example: Experiment –tossing a die.

A simple event that is related to this experiment might be the event A where

A = {observing the number 1}

Another simple event might be B where

B = {observing the number 3}

2. Compound Event: A compound event is a subset containing two or more
sample points.

Example: Experiment –tossing a die.

A compound event that is related to this experiment might be the event C where

C = {observing an odd number} = {1, 3, 5}

Another compound event might be D where

D = {observing a number greater than 3} = {4, 5, 6}

3. Impossible Event: An impossible event is a subset containing none of the
points from the sample space. In other words the impossible event has
zero sample points. It is an empty or null set.
Example: Experiment –tossing a die.

An impossible event that is related to this experiment might be the event E where

E = {observing the number seven} = { } or ∅

A word on notation

Capital letters are used to refer to events.

Example: Event A = {observing an odd number in one roll of the die} = {1, 3, 5}

n (A) = number of sample points in event A. In this case n (A) = 3

Complement of an event

The complement of an event A, denoted Ā, consists of all the points in the
sample space that are not in the event A. The symbol Ā is usually read aloud as "A bar".

Example: Experiment – a single roll of a fair die. Let the event A = {observe an odd
number}

∴ A = {1, 3, 5} and Ā = {2, 4, 6}

We can put these ideas in the Venn diagram which follows:

[Venn diagram: A = {1, 3, 5}; the points 2, 4, 6 lie outside A, in Ā]

Union of two events

The union of two events A and B is the event that occurs if either A or B or both occur
on a single performance of the experiment and is denoted as A ⋃ B.

Note that A ⋃ B is an event which is derived from the two events A and B. In other
words the sample points that belong to the event (A ⋃ B) are all the different sample
points drawn from the two events A and B. When we are considering the union of
two sets there are three possible ways in which the event A and the event B are
related, as shown below.

Scenario 1. Union when the two events (sets) are disjoint sets or sets which have no
intersection.

Given S = {2, 4, 6, 8} and T = {1, 3, 5, 7}

S ⋃ T = {1, 2, 3, 4, 5, 6, 7, 8}

Scenario 2: Union when the two events (sets) which have elements (sample points) in
common or in other words when the sets intersect.

Given V = {a, e, i, o, u} and F = {d, i, g, n, o}

V ⋃ F = {a, d, e, g, i, n, o, u}
Scenario 3: Union of two sets when one set is a proper subset of the other set.

T = {0, 1, 2, 3, 4} and S = {0, 1, 2}

T ⋃ S = {0, 1, 2, 3, 4}

Intersection of two events

The intersection of two events A and B is the event that occurs if both A and B occur
on a single performance of the experiment. We write A ⋂ B for the intersection of A and B.
The intersection A⋂B, consists of all the sample points belonging to both A and B as shown in the
following diagram.

Scenario 1: The intersection involves part of both sets or events.

The above diagram represents a die toss experiment. Let A = {observing an even number} = {2, 4, 6}
and B = {observing a number less than or equal to 4} = {1, 2, 3, 4}

A ⋂ B = {2,4}

Scenario 2: Where the intersection is equal to all the elements of one set or event

Given D = {a, e, i, f, r, s, t} and C = {a, e, r}

D ⋂ C = {a, e, r}

Note: because the set C is a proper subset of D, their intersection is equal to C = {a, e, r}

Mutually Exclusive Events (Disjoint Events)

Events A and B are mutually exclusive if A ⋂ B contains no sample points, that is, events A and B have
no sample points in common. In other words A ⋂ B is an empty set and n(A ⋂ B) = 0.

The relevant diagram for this situation is shown below:

S ⋂ T = { } or empty set.

Three ways of finding the probability of an event.


The Classical Approach to Probabilities

The classical approach to probability is based on the assumption that all outcomes of
an experiment are equally likely. It depends on logical reasoning.

Two experiments for which the outcomes are equally likely are: (i) rolling a fair die and (ii) tossing a
fair coin.

In other words classical Probability posits that all the points in a suitably constructed sample space are
equally probable. Using this approach the probability of an event is equal to the number of points in
the sample space for which this event occurs, divided by the total number of points in the sample
space, S. Thus for an event E, the probability of the event occurring is given by:

P(E) = n(E) / N(S)

Where: P(E) – Probability of event E, occurring

n (E) – number of sample points in the event E.

N (S) – number of sample points in the sample space S.

Example1: What is the probability of getting a head on a single toss of a balanced coin?

S = {H, T} E = {H} n(E) = 1 N(S) = 2

Therefore P(E) = 1/2

Example 2: What is the probability of observing the number 1 in one roll of a die?

S = {1, 2, 3, 4, 5, 6} E = {observing the number one} = {1} n(E) = 1 N(S) = 6

Therefore P(E) = 1/6

Empirical approach to computing Probabilities


This approach is based on empirical observations and uses the frequency of past occurrences to compute
probabilities.

Hence if after n repetitions of an experiment, where n is very large, an event is observed to occur in x
of these repetitions, then the probability of the event is x/n.

Another way of looking at this: given an event E, the probability of the event P(E) is given by the
formula:

P(E) = (number of times E occurred) / (number of trials or repetitions) = f/n

Example. If we toss a coin 1000 times and find that it comes up heads 532 times, we estimate the
probability of a head coming up to be 532/1000 = 0.532

The empirical approach computes the probability of an event using the relative frequency of the event.
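A simulation illustrates the idea; the exact relative frequency varies from run to run, which is the point of the empirical approach:

```python
# Sketch: empirical probability as a relative frequency of simulated tosses.
import random

random.seed(1)        # fixed seed for a reproducible illustration
n = 1000
heads = sum(random.choice("HT") == "H" for _ in range(n))
print(heads / n)      # an estimate near 0.5; it approaches 0.5 as n grows
```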

The subjective approach to Probability.

Some events cannot be analysed using the objective approaches outlined above. The subjective
theorist regards probability as a measure of personal confidence in a particular proposition.

Hence for situations in which there is little or no historical information from which to determine a
probability, subjective probabilities can be employed.

Subjective probability can be thought of as the probability assigned by an individual or group based on
whatever evidence is available.

In this course the first and second approaches will be used to compute probabilities.

Propositions/Rules of Probability
1. Within any sample space S, if E is an event then:

0 ≤ P (E) ≤ 1

Basically, this is saying that any calculated probability must be between 0 and 1 inclusive. The closer
the probability of an event is to zero the less likely that event will occur. On the other hand the closer
the probability of an event is to 1 the more likely that event will occur. It should be noted that the
probability of an event cannot be less than 0 or be negative. Further the probability of an event cannot
be greater than 1.

Recall that for an experiment with equally likely outcomes the formula for computing the probability
is: P(E) = n(E)/N(S)

When n(E) = 0 then P(E) = 0 and the event E is certainly not going to take place.

When n(E) = N(S) then P(E) = 1, and the event is certain to occur.

Otherwise n(E) < N(S) and P(E) is between 0 and 1. The expression above captures these ideas.

2. For any sample space S,


P(S) = 1

This means that an entire sample space is a certainty. It is an event that is bound to occur in one
running of the experiment.

3. Addition Rule
There are two versions of the addition rule one for two events that are mutually exclusive and a
general rule which may be used for events which are mutually exclusive and for those that are not.

3.1. The Addition rule for Mutually Exclusive Events


Two events are mutually exclusive if they are disjoint, that is if they have no common
sample points. Given two mutually exclusive events, A and B the probability that A or B will
occur is given by P(A ⋃ B). To compute this probability we use the addition rule.
P (A ⋃ B) = P (A) + P (B)
This rule is also referred to as the special addition rule. Let us use an example to examine
how this rule may be derived.

Example 1: Experiment – Roll a fair die.


S = {1, 2, 3, 4, 5, 6}  A = {observing a number greater than 2} = {3, 4, 5, 6}
B = {observing a number less than 2} = {1}
From the events and the Venn diagram it is clear that the event (A ⋃ B) is made up of all the
sample points in both events: (A ⋃ B) = {1, 3, 4, 5, 6}.
Therefore we can use the classical approach to compute:
P(A ⋃ B) = n(A ⋃ B) / N(S) = 5/6

This result may also be computed using the special addition rule:
P(A ⋃ B) = P(A) + P(B) = 4/6 + 1/6 = 5/6

Example 2: Toss two coins. Let event A = {observe one head} and B = {observe exactly two
heads}. Find the probability of observing at least one head by first counting sample points
and secondly by using the addition rule.

S = {HH, HT, TH, TT} A = {HT, TH} B = {HH}

P{observing at least one head} = 3/4. This approach employs the classical approach of
counting sample points, since the outcomes of this experiment are equally likely.

The second approach uses the addition rule. Since both events A and B contain sample
points with at least one head and the events are disjoint, we can use the special addition
rule to compute the required probability.
P (A ⋃ B) = P (A) + P(B)

= 2/4 + 1/4 = 3/4

3.2. The General Addition Rule


This rule may be used both for events that are mutually exclusive and for those that are not.
Accordingly, within any sample space S, if A and B are two events, then:
P(A ⋃ B) = P (A) + P (B) – P (A ⋂ B)
In the case where two events are mutually exclusive, if we use this general addition rule then
the last term P (A ⋂ B) is equal to zero.

The question is why we make this adjustment. Recall from above that two events are
mutually exclusive when there are no common sample points in the two events, as shown in
the diagram below. In this diagram the events S and T have no common sample points.

On the other hand two events are not mutually exclusive when there are common sample
points in the two events, as shown below.

Given B = {1, 2, 3, 4} and A = {2, 4, 6}

A ⋃ B = {1, 2, 3, 4, 6} A ⋂ B = {2, 4}

Then P(A ⋃ B) = 5/6 (correct answer).

If we use the special addition rule P(A ⋃ B) = P(A) + P(B) for the second diagram then the
answer would be: 3/6 + 4/6 = 7/6 (this answer is incorrect)

Why does the special addition rule give us the incorrect answer?
Notice there are two common sample points in the two events and these are counted in the
computation of both P (A) and the P(B). Therefore we are double counting the numbers 2 and 4. To
correct for this we need to take away the effect of the double counting on the overall probability. This
is done by taking away P (A ⋂ B).

For the above problem we therefore need to use the general addition rule. So the
above problem with the adjustment gives the following:

P(A ⋃ B) = P (A) + P (B) – P (A ⋂ B)


= 3/6 + 4/6 − 2/6

= 5/6
Hence we see that the answer is the same as the correct answer above with the adjustment.
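The double-counting argument can be verified mechanically on the same two sets; a sketch:

```python
# Sketch: verifying the general addition rule on the sets used above.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {1, 2, 3, 4}

def P(E):
    # classical probability: number of sample points in E over N(S)
    return Fraction(len(E), len(S))

lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)     # 5/6 5/6
```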

Example: Toss a coin and roll a die and combine the results. Let the event A = {observing an outcome
with a number less than or equal to 3} and B = { observing an outcome with a number greater than or
equal to 3}. Find the probability of observing an outcome with a number less than or equal to 3 or
greater than or equal to 3, i.e find P(A ⋃ B).

Sample space from this experiment worked out using tree diagram.

S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
A = { H1, H2, H3, T1, T2, T3}
B = { H3, H4, H5, H6, T3, T4, T5, T6}
(A ⋃ B) = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
(A ⋂ B) = {H3, T3}
P(A) = n(A)/N(S) = 6/12    P(B) = n(B)/N(S) = 8/12    P(A ⋂ B) = n(A ⋂ B)/N(S) = 2/12

P(A ⋃ B) = n(A ⋃ B)/N(S) = 12/12 = 1

Alternatively we can use the general addition rule to solve this problem as given below.
P(A ⋃ B) = P (A) + P (B) – P (A ⋂ B)
= 6/12 + 8/12 − 2/12

= 12/12

= 1

4. The probability of an event E where E is the empty set.

P(∅) = 0

The empty event is an "impossibility" for which the probability of occurrence is zero.

Recall that the null set has no sample points.

Example: Roll a die. S = {1, 2, 3, 4, 5, 6}. Let A = {observe the number seven} = { }

P(A) = 0

5. Complementary Events and their probabilities.

The complement of an event E is the set of outcomes in the sample space that are
not included in the outcomes of event E. The complement of E is denoted by Ē (read "E
bar"). Bluman p. 218.
Example: Tossing two balanced coins. S = {HH, HT, TH, TT}. Let A = {observing at
least one head} = {HH, HT, TH} and Ā = {TT}

P(A) = n(A)/N(S) = 3/4 = 0.75        P(Ā) = 1/4

∴ P(A) = 1 − P(Ā) = 1 − 1/4 = 3/4

P(Ā) = 1 − P(A) = 1 − 3/4 = 1/4

P(A) + P(Ā) = 3/4 + 1/4 = 1

We will see later that sometimes it may be easier to compute the probability of an
event if we know the probability of its complement.

6. The Multiplication Rule.

There are two versions of the multiplication rule depending on whether the events
are independent or dependent.

6.1. The Multiplication Rule for independent events.


The big question is how to determine whether two events are independent.

Two events A and B are independent events if the fact that A occurs does not affect
the probability of B occurring.

The multiplication rules can be used to find the probability of two or more events
that occur in sequence. For example, if you toss a coin and then roll a die, you can
find the probability of getting a head on the coin and a 4 on the die. These two
events are said to be independent since the outcome of the first event (tossing a
coin) does not affect the probability outcome of the second event (rolling a die).

A couple of other examples are:


Rolling a die and getting a 6, and then rolling a second die and getting a 3.
Drawing a card from a deck and getting a queen, replacing it, and drawing a
second card and getting a queen.
Example: To find the probability of two independent events that occur in sequence,
you must find the probability of each event occurring separately and then multiply the
answers. For example, if a coin is tossed twice, what is the probability of getting two
heads, P(HH)?

We learnt that since all the outcomes of this experiment are equally likely, the
probability assigned to each one is 1/4.

Given that S = {HH, HT, TH, TT}, P{HH} = P{HT} = P{TH} = P{TT} = 1/4

We can view this experiment from a different perspective. Recall the tree diagram for
the experiment:

For sequential experiments the tree diagram offers us a method to compute the
probabilities of different events. Notice that P(HH) = 1/4, as worked out using the
classical approach to computing probabilities.

Placing the probabilities along the branches of the tree is a useful technique to work
out the probabilities of different events. So this experiment and example shows two
ways of computing the same probabilities of events.

Using the tree diagram to compute probabilities for the four equally likely outcomes of
this experiment allows us to explain the multiplication rule for independent events,

Notice that P{HH} = 1/4, or P(H and H) = 1/2 × 1/2 = 1/4
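The agreement between the two approaches can be checked in code; a sketch:

```python
# Sketch: P(HH) by counting equally likely outcomes, compared with the
# multiplication rule for independent events.
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))             # [('H','H'), ('H','T'), ...]
p_hh = Fraction(S.count(("H", "H")), len(S))  # classical count: 1/4

print(p_hh == Fraction(1, 2) * Fraction(1, 2))   # True
```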

Example 2: Tossing three coins.


S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

The probabilities of all the outcomes are the same since all the outcomes are equally
likely.

Thus the probability of each outcome is 1/8; for example P(HHH) = 1/2 × 1/2 × 1/2 = 1/8.
Example 3: Suppose there are five marbles in an urn. They are identical except for
colour. Three of the marbles are red and two are blue. You are instructed to draw out
one marble, note its colour, and replace it in the urn. Then you are to draw out another
marble and note its colour. What are the outcomes of the experiment? What is the
probability of each outcome?

We first draw a tree diagram to work out the outcomes of the experiment.

We see that the probabilities for the different outcomes are computed by multiplying
the probabilities along the branches of the tree.

These examples confirm the special multiplication rule for independent events.

6.2. The Multiplication Rule for dependent events.

When the events are dependent we need to adjust the multiplication formula to make
allowance for this.

The big question is when are events considered to be dependent?

Some examples of dependent events are drawing a card from a deck and not replacing
it before a second draw, or drawing marbles from an urn without replacement.

Formula: P(A ⋂ B) = P(A) × P(B|A)

Note that we have adjusted the second term on the right hand side of the formula.
That second term is referred to as the conditional probability of the event B, since
once the event A has occurred, then the probability of the event B needs to be
adjusted given that the event A has occurred.

Example 1: Suppose a card is drawn from a deck and not replaced, and then a
second card is drawn. What is the probability of selecting an ace on the first card and
a king on the second card?

Given that there are 52 cards in the pack and there are four aces and four kings:

P(Ace and King) = 4/52 × 4/51 = 16/2652 = 4/663
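A quick check of this computation with exact fractions; a sketch:

```python
# Sketch: without replacement, the second draw depends on the first,
# so the second factor uses the reduced deck of 51 cards.
from fractions import Fraction

p_ace_then_king = Fraction(4, 52) * Fraction(4, 51)
print(p_ace_then_king)          # 4/663
```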

Example 4
Suppose there are five marbles in an urn. They are identical except for colour. Three
of the marbles are red and two are blue. You are instructed to draw out one marble,
note its colour, and set it aside (do not replace it). Then you are to draw out another
marble and note its colour. What are the outcomes of the experiment? What is the
probability of each outcome?

We first draw a tree diagram to work out the outcomes of the experiment.
7. Conditional Probability

We have seen above that the general multiplication rule employs the conditional
probability, which adjusts the rule to deal with dependent events. The conditional
probability of an event, represented as P(A|B), gives the probability of the event A
given that another event B, which affects A, has already occurred.

The main idea in computing the conditional probability of an event is that some
additional information may affect the likelihood of the outcome of an experiment,
so we need to alter the probability of the event of interest.

A probability that reflects such additional knowledge is called the conditional
probability of the event.

The event that occurs first, or the additional knowledge, reduces the sample space to
one of its subsets. The additional information therefore tells us that we are in a
particular portion of the sample space rather than anywhere in the sample space.

Example: Roll a die. S = {1, 2, 3, 4, 5, 6}. Let event A = {observe an even
number} = {2, 4, 6} and event B = {observe a number less than or equal to 3} =
{1, 2, 3}.

P(A) = 3/6        P(B) = 3/6

Note: the probability of A, which is 3/6 or ½, is computed without any reference to the
event B. If we are now given the additional information that the event B occurred and
are asked to compute the probability of the event A, then that additional information
must be taken into account. In other words, if we are told that the event B has
occurred, then we are sure that one of the numbers 1, 2 or 3 showed up.

Question: With this additional information, would the probability of observing an even
number on that throw of the die still be equal to 3/6? No. Why not?
Two Venn diagrams will shed light on this question.

[Venn diagrams: on the left, the full sample space S = {1, 2, 3, 4, 5, 6} showing A = {2, 4, 6} and B = {1, 2, 3}; on the right, the reduced sample space {1, 2, 3} after B is known to have occurred.]

In the first Venn diagram we assume there is no additional information, so the
probabilities are calculated without such knowledge.

Once the additional information is provided, it must be taken into consideration
when computing the probability of the event A. Since we are told that the event B
occurred, the only number in the event A that is still possible is the number 2.

Note that the original sample space which contained six sample points is reduced to a
subset containing three sample points as shown in the diagram on the right.

∴ P(A|B) = 1/3

Hence probabilities associated with events defined on the reduced sample space are
called conditional probabilities. We will derive a formula below for conditional
probabilities.
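The reduced-sample-space idea can be sketched directly in Python (illustrative; the set names mirror the die example above):

```python
from fractions import Fraction

# Conditional probability as a reduced sample space: once B is known to
# have occurred, only the sample points inside B remain possible.
S = {1, 2, 3, 4, 5, 6}  # roll one die
A = {2, 4, 6}           # observe an even number
B = {1, 2, 3}           # observe a number less than or equal to 3

p_A = Fraction(len(A), len(S))              # unconditional probability
p_A_given_B = Fraction(len(A & B), len(B))  # probability on the reduced space

print(p_A, p_A_given_B)  # 1/2 1/3
```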
Example 2: This is a more extended example which will review most of the topics
dealt with so far.

Composition of a Firm's Labour Force

                      COLLEGE (C)   NON-COLLEGE (C̄)   TOTAL
MANAGERIAL (M)             50              20             70
NON-MANAGERIAL (M̄)       150             280            430
TOTAL                     200             300            500

In this example as shown in the table above, the employers of a firm are cross
classified as managerial or non-managerial personnel, and as college graduates or not.

We will choose a worker at random and compute probabilities for different subsets of
the sample space.

Marginal Probabilities

P(M) = n(M)/n(S) = 70/500 = 0.14

P(M̄) = n(M̄)/n(S) = 430/500 = 0.86

P(C) = n(C)/n(S) = 200/500 = 0.40

P(C̄) = n(C̄)/n(S) = 300/500 = 0.60

Cell Probabilities (Joint Probabilities)

P(M ⋂ C) = n(M ⋂ C)/n(S) = 50/500 = 0.10

P(M ⋂ C̄) = n(M ⋂ C̄)/n(S) = 20/500 = 0.04

P(M̄ ⋂ C) = n(M̄ ⋂ C)/n(S) = 150/500 = 0.30

P(M̄ ⋂ C̄) = n(M̄ ⋂ C̄)/n(S) = 280/500 = 0.56

Notice that the way in which the data are presented makes it easy to obtain the
numbers needed to compute the desired probabilities. We can also use the
multiplication rule to compute the cell probabilities, which we will do once we develop
our understanding of conditional probabilities.

Conditional Probabilities
We have seen in the example above that the conditional probability of an event takes
into consideration that another event occurred and the occurrence of that event has
changed the probability of the second event. We can use the table above to compute
conditional probabilities and below we will give a rule to compute the conditional
probability of an event.

P(M|C) = 50/200 = 0.25       P(M|C̄) = 20/300 = 0.0667

P(M̄|C) = 150/200 = 0.75      P(M̄|C̄) = 280/300 = 0.9333

P(C|M) = 50/70 = 0.71        P(C|M̄) = 150/430 = 0.35

P(C̄|M) = 20/70 = 0.29        P(C̄|M̄) = 280/430 = 0.65

Just as we did for the card examples, where two cards are pulled successively from a
deck without replacement, the probabilities above are computed for an event given
that a first event has already occurred. When the data are presented as in the table
above, it is easy to make the necessary adjustment. Otherwise we can use the formula
derived from the multiplication rule for dependent events. Rearranging that rule gives:

P(A|B) = P(A ⋂ B) / P(B)

We can now re-compute the probability in example one above. Here:

P(A|B) = P(A ⋂ B)/P(B) = (1/6)/(3/6) = 1/6 x 6/3 = 1/3 ≈ 0.33

P(M|C) = P(M ⋂ C)/P(C) = (50/500)/(200/500) = 50/500 x 500/200 = 50/200 = 0.25

We can use the formula to compute a couple of the other conditional probabilities for
practice.
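The conditional-probability formula can be sketched in Python for the firm example (illustrative; the variable names are our own):

```python
from fractions import Fraction

# Conditional probability from the formula P(M|C) = P(M and C) / P(C),
# using the counts in the firm's labour-force table.
N = 500
p_C = Fraction(200, N)        # marginal probability of College
p_M_and_C = Fraction(50, N)   # joint probability of Managerial and College

p_M_given_C = p_M_and_C / p_C
print(p_M_given_C)  # 1/4, i.e. 0.25, matching the direct table computation
```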

Another issue we have to address is: when do we know that two events are
independent or dependent?

It is tempting to assume that if there is an intersection, that is n(A ⋂ B) ≠ 0, the
two events are dependent. This is not true, since two events may intersect and be
either independent or dependent. In other words, we cannot infer independence or
dependence from the fact that two events intersect each other.

Statistically Independent Events.

1. Two events A and B are statistically independent if and only if:

P(A ⋂ B) = P(A) x P(B)

Therefore two events are statistically independent if their joint probability is equal to
the product of their separate probabilities.

2. Two events A and B are statistically independent if:


P(A) = P(A|B) or P(B) = P(B|A) ----------- statistically independent

P(A) ≠ P(A|B) or P(B) ≠ P(B|A) ----------- statistically dependent
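The product test can be applied to the firm's table in a short Python sketch (illustrative; the tolerance and names are our own):

```python
# Independence check: A and B are independent iff P(A ⋂ B) = P(A) * P(B).
# Applied to the firm's table: are "Managerial" and "College" independent?
N = 500
p_M = 70 / N          # marginal P(M)  = 0.14
p_C = 200 / N         # marginal P(C)  = 0.40
p_M_and_C = 50 / N    # joint P(M ⋂ C) = 0.10

product = p_M * p_C   # 0.056
independent = abs(p_M_and_C - product) < 1e-12
print(round(product, 3), p_M_and_C, independent)  # 0.056 0.1 False
```

Since 0.056 ≠ 0.10, being a manager and being a college graduate are statistically dependent events in this firm.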


Probability Distributions

Objectives:
 Define a random variable.
 Compute the mean and variance of a random variable.
 Define the probability distribution
 Distinguish between discrete and continuous probability distributions.
 Describe two probability distributions: the binomial PD and the normal PD.

In our study of probability we examined the probability of specific events of an
experiment occurring.

We now want to study the whole range of events resulting from an experiment. To
describe the likelihood of each outcome of this range of events, we use a probability
distribution.

We will see later that probability distributions help us to make accurate conclusions
about a population from which a sample was taken.

Basically a PD is a listing of the outcomes of an experiment that may occur and their
corresponding probabilities.

What is a random variable?

We came across the term variable before when we looked at the frequency
distribution. Each frequency distribution studies the distribution of a variable.

Revisit the bicycle example. In that example we have a variable (bicycle sales, x) and
the frequencies of the various values that x may assume.

Recall that the FDs may be converted to diagrams.

In relation to the probability distribution we are trying to do a similar thing. Here,
however, the variable x is called a random variable and the distribution is a
probability distribution.

Definition of a random variable:

We will try to define a random variable in relation to the following experiment.

Experiment: Toss two coins. Combine the up faces. S = {HH, HT, TH, TT}
In this experiment we are looking to define a variable let’s call it X, and we define it as
the number of heads observed when two coins are tossed.

We need to assign a single number to each outcome of the experiment. We define a
random variable X as the number of heads observed. As shown in the diagram, each
of the outcomes has a single number associated with it. The variable X is random
because the outcome of the experiment is uncertain.

Example 2: Roll two dice. Definition of variable: Let X be the sum resulting from
adding the two up surfaces of the dice. The range of the variable X is from the
numbers 2 to 12 inclusive.

Range of X   Number of sample points   Sample points in S
2            1                         (1,1)
3            2                         (1,2), (2,1)
4            3                         (1,3), (2,2), (3,1)
5            4                         (1,4), (2,3), (3,2), (4,1)
6            5                         (1,5), (2,4), (3,3), (4,2), (5,1)
7            6                         (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
8            5                         (2,6), (3,5), (4,4), (5,3), (6,2)
9            4                         (3,6), (4,5), (5,4), (6,3)
10           3                         (4,6), (5,5), (6,4)
11           2                         (5,6), (6,5)
12           1                         (6,6)
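The table of two-dice sums can be generated by enumerating the sample space in Python (illustrative; the `tally` name is our own):

```python
from collections import Counter
from itertools import product

# Enumerate the sample space of rolling two dice and tally the random
# variable X = sum of the two up faces.
tally = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for x in sorted(tally):
    print(x, tally[x])  # value of X and its number of sample points

assert sum(tally.values()) == 36  # 6 x 6 equally likely sample points
assert tally[2] == tally[12] == 1 and tally[7] == 6
```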

Some observations about random variables and sample spaces.

1. Each sample point is assigned a specific possible value of the random variable,
   though the same value can be assigned to two or more sample points.
2. Each possible value of a random variable is an event, since it is a subset defined
   on a sample space.
3. All the values of a random variable constitute a set of events that are mutually
   exclusive and collectively exhaustive.

Another way of viewing a random variable is that it is simply an uncertain quantity – a
quantity whose value is not known with certainty.
2 Types of Random Variables

1. Discrete Random Variables: can assume a finite number of values or a countably
   infinite number of values.

2. Continuous Random Variables: can assume values corresponding to any of the
   points contained in one or more intervals (i.e. they are uncountable).

Probability Distributions for Discrete Random Variables

A complete description of a DRV requires that we specify the possible values the RV
can assume and the probability associated with each value.

e.g. Toss of two coins. Let’s name the random variable X where X is defined as the
number of heads observed. Find the probability of each value the random
variable can assume.
S = {HH, HT, TH, TT}

We can associate with each outcome a number given the definition of the random
variable.

A probability distribution may be represented in tabular form and also as a graph.

Probability Distribution for Coin Toss Experiment

Definition of PD: The PD of a DRV is a graph, table or formula that specifies the
probability associated with each possible value the RV can assume.

Tabular representation of probability distribution

Observed Heads (x)    P(x)
0                     ¼
1                     ½
2                     ¼

Graphical representation of the probability distribution

2 requirements for the PD of a discrete random variable x


1. P(x) ≥0 for all values of x
2. ∑ p(x) = 1

A PD may also be expressed algebraically

Eg. Roll of 2 dice. From the table of outcomes above:

f(y) = (6 − |y − 7|)/36 for y = 2, 3, …, 12, and f(y) = 0 otherwise.

We see that PDs are analogous to the FDs we looked at earlier. As with frequency
distributions, it is useful to compute measures of central tendency and dispersion.

NB:

1. FDs are empirical distributions based on observed or realised values of variables.
   PDs, on the other hand, are theoretical distributions: they are defined by logical
   reasoning. A probability distribution is theoretical because it shows how the total
   probability of 1 is distributed among all the possible values of a random variable.
   These values are what we may expect to occur when an experiment is performed.
   Thus a probability distribution is predictive in nature, and this is the reason why
   it can aid us in making decisions under conditions of uncertainty.

2. An empirical distribution may be constructed from a set of sample data or from
   population data. A theoretical distribution is always a population distribution,
   because a random variable is considered a population: it takes on all the possible
   outcomes of a random process as its values. In other words, the possible values of
   a RV exhaust all the sample points in a sample space, which is a universal set.
   Thus descriptive measures for RVs are, by statistical convention, called parameters.

A discrete PD is based on a discrete random variable.

QUESTION: What is the difference between a RV and a PD?

RV – lists the outcomes of an experiment.

PD – includes the list of possible outcomes and the probability of each outcome.

A PD possesses a mean µ and a variance σ².

The mean of a PD

The mean is a value that is typical of the PD and it also represents the Long Run
average of the Random Variable. The mean for a Probability Distribution is also called
the expected value (denoted E(x)) and is a weighted average with the weights being
the probabilities of the different values of X or the Random Variable.

Thus µ = E(x) = ∑ x p(x)

Where: µ = E(x) – mean or expected value
       p(x) – probability of a particular value of X.

The expected value, as a measure of central tendency, tells us where the centre of
mass of the probability distribution of a random variable is located. It is also the
average value of the RV if the same random experiment is repeated over and over
again. The EV need not be a possible value of the RV.
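The weighted-average formula µ = ∑ x p(x) can be sketched in Python (illustrative; the `dist` dictionary is our own naming), using the coin-toss PD from above:

```python
# Expected value of a discrete RV: mu = E(X) = sum of x * p(x).
# PD for X = number of heads in two tosses of a fair coin.
dist = {0: 0.25, 1: 0.5, 2: 0.25}

mu = sum(x * p for x, p in dist.items())
print(mu)  # 1.0 -- on average one head per pair of tosses
```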

Variance of a PD

As in the case of FDs, measures of variability are very important to get a better
picture of the PD.

The population variance σ² is defined as the average of the squared distance of x from
the population mean µ:

σ² = E(x − µ)² = ∑ (x − µ)² p(x)

or, in the equivalent computational form,

σ² = ∑ x² p(x) − µ²
Standard Deviation

The standard deviation of a DRV is equal to the square root of the variance, i.e.

σ = √σ²

We can use µ and σ of the probability distribution of x, in conjunction with the
Empirical Rule, to make statements about the likelihood that values of x will fall
within the intervals µ ± σ, µ ± 2σ and µ ± 3σ.

We have seen so far how a discrete random variable defined on S may give rise to a
probability distribution, noted some similarities between PDs and FDs, and seen how
the measures of central tendency and dispersion also apply to the probability
distribution.

Example 1:

No. of Heads observed (X)    P(x)    X·P(x)    X²    X²·P(x)
0                            1/4     0         0     0
1                            1/2     1/2       1     1/2
2                            1/4     1/2       4     1
Total                        1       1               1.5
Example: Finding the variance and standard deviation of a probability distribution.

σ² = Σ[X²·P(X)] − µ²
   = 1.5 − 1²
   = 0.5

σ = √σ² = √0.5 = 0.7071
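Both the definitional and the computational forms of the variance can be checked in Python (illustrative; the `dist` dictionary is our own naming):

```python
# Variance of a discrete RV computed two equivalent ways, for the
# coin-toss PD with mu = 1:
#   definitional:  sigma^2 = sum((x - mu)^2 * p(x))
#   computational: sigma^2 = sum(x^2 * p(x)) - mu^2
dist = {0: 0.25, 1: 0.5, 2: 0.25}
mu = sum(x * p for x, p in dist.items())

var_def = sum((x - mu) ** 2 * p for x, p in dist.items())
var_comp = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2
sd = var_def ** 0.5

print(var_def, var_comp, round(sd, 4))  # 0.5 0.5 0.7071
```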

Example 2:

σ² = Σ[X²·P(X)] − µ²
   = 8.1137 − (1.661)²
   = 8.1137 − 2.7589
   = 5.3548

σ = √σ² = √5.3548 = 2.314
• The Interpretation of the Standard Deviation Measure

What is a standard deviation? What does it do, and what does it mean? The most
precise way to define the standard deviation is by reciting the formula used to
compute it. However, insight into the concept can be gleaned by viewing the manner
in which it is applied. Two ways of applying the standard deviation are the Empirical
Rule and Chebyshev's theorem.

• Empirical Rule

• The empirical rule is an important rule of thumb that is used to state the
approximate percentage of values that lie within a given number of standard
deviations from the mean of a set of data if the data are normally distributed.

The empirical rule is used only for three numbers of standard deviations: 1, 2, and 3.
More detailed analysis of other numbers of values is presented in Chapter 6. Also
discussed in further detail in Chapter 6 is the normal distribution, a unimodal,
symmetrical distribution that is bell (or mound) shaped. The requirement that the data
be normally distributed contains some tolerance, and the empirical rule generally
applies as long as the data are approximately mound shaped.

The Binomial Distribution

Many experiments result in dichotomous responses, i.e. responses for which there are
two possible alternatives.

Random variables associated with dichotomous responses are called binomial random
variables.

Example: toss a fair coin three times and count the number of Heads.

Why is this experiment a binomial experiment?

S = {(HHH), (HHT), (HTH), (HTT), (THH), (THT), (TTH), (TTT)}

# of Heads    Probability
0             1/8
1             3/8
2             3/8
3             1/8
Total         1

Formula for the binomial experiment.

Deriving the probability distribution for the above experiment using the binomial
formula:

Once the experiment is determined to be a binomial experiment, we can do the
following.

P(X) = n!/[(n − X)! X!] · p^X · q^(n−X)

n!/[(n − X)! X!] → This component of the formula is known as the binomial
coefficient and calculates the number of outcomes having the characteristics of the
event under consideration.

p^X · q^(n−X) → The second part of the formula gives the probability of any one
particular sequence containing X successes and n − X failures.

Brief review of computations involving factorial notation:

0! = 1        1! = 1        4! = 4 x 3 x 2 x 1 = 24        6! = 6 x 5 x 4 x 3 x 2 x 1 = 720

6! = 6 x 5 x 4! = 30 x 24 = 720

6!/4! = (6 x 5 x 4 x 3 x 2 x 1)/(4 x 3 x 2 x 1) = 720/24 = 30,
or 6!/4! = (6 x 5 x 4!)/4! = 6 x 5 = 30

For this application we have n = 3 trials. Since a success S is defined as observing a
head, p = P(S) = ½ and q = 1 − p = ½. Substituting n = 3, p = ½ and q = ½ into the
formula for p(x), we obtain:

P(0) = n!/[(n − X)! X!] · p^X · q^(n−X) = 3!/(3! 0!) · (½)^0 · (½)^3 = 1 · 1 · 1/8 = 1/8

P(1) = 3!/(2! 1!) · (½)^1 · (½)^2 = 3 · ½ · ¼ = 3/8

P(2) = 3!/(1! 2!) · (½)^2 · (½)^1 = 3 · ¼ · ½ = 3/8

P(3) = 3!/(0! 3!) · (½)^3 · (½)^0 = 1 · 1/8 · 1 = 1/8

nCx = (n choose x) = n!/[(n − X)! X!] → these three expressions are all equal. The
combination expression can be computed directly on the calculator.
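The whole binomial distribution can be generated in one line of Python, with `math.comb` playing the role of nCx (an illustrative sketch; the `pmf` name is our own):

```python
from math import comb

# Binomial pmf P(X = x) = C(n, x) * p**x * q**(n - x) for three tosses
# of a fair coin, where X counts the number of heads.
n, p = 3, 0.5
q = 1 - p

pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```

These values match 1/8, 3/8, 3/8, 1/8 obtained by hand above.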

Mean: µ = np = 3 x ½ = 1.5

Variance: σ² = npq = 3 x ½ x ½ = 0.75

Standard deviation: σ = √0.75 = 0.866
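The shortcut formulas for the binomial mean and variance can be verified against the definitional weighted average in a short Python sketch (illustrative; variable names are our own):

```python
from math import comb, sqrt

# Shortcut formulas for a binomial RV: mu = n*p and sigma^2 = n*p*q,
# checked against the definitional expected value sum(x * p(x)).
n, p = 3, 0.5
q = 1 - p

mu, var = n * p, n * p * q
print(mu, var, round(sqrt(var), 3))  # 1.5 0.75 0.866

mu_check = sum(x * comb(n, x) * p**x * q**(n - x) for x in range(n + 1))
assert mu == mu_check  # the shortcut agrees with the definition
```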
