Statistics Review
Lecture 1: Introduction
We live in a world where numbers are central to understanding the events happening around us. Every day we are bombarded with numbers that try to provide information about the world.
The better we understand the meaning and information these numbers convey, the better we understand the world and the more informed our decisions and actions become.
This course will provide us with a range of techniques and methods to better
understand numbers and the information they convey.
At the beginning of this course we first define the term statistics and give some basic definitions of terms used throughout the course.
Definition of Statistics.
Two meanings:
The broader meaning of Statistics therefore points to a process that takes time. Our whole course involves this process.
1. Descriptive Statistics:
Descriptive Statistics involves collecting, organising, summarising and presenting data.
2. Inferential Statistics:
Inferential Statistics involves using samples and information gathered from
samples to make conclusions or inferences about the population.
Population: refers to all the members or items in a general set under study, e.g. all the students on campus, all the trees on campus, or all the transactions undertaken by a bank teller on a specific day.
Variables
A variable is a characteristic or trait related to a sample or a population. A variable is a symbol, such as X, Y, H, or B, which can assume any of a prescribed set of values. Therefore the values the variable takes on depend on the problem being studied.
Secondary Data
Ascending Array: 6, 11, 17, 22, 27, 34, 38, 45, 48, 57
Descending Array: 57, 48, 45, 38, 34, 27, 22, 17, 11, 6
Assume that the numbers in the arrays are marks from a test. What information can such an array yield?
1) The highest mark is 57.
2) The lowest mark is 6.
3) The Range is 57 – 6 = 51.
4) We can quickly count how many students passed the test by achieving a mark of 50 or more.
2. Frequency distributions
Frequency Distributions are tables used to summarise and present raw data.
The type of distribution depends on the nature of the data collected. Thus frequency distributions may present qualitative or quantitative data. FDs representing the latter may present discrete or continuous data. Further, the FD may be ungrouped or grouped depending on the size of the data set.
Example 1: In this example the blood types of twenty-five persons are given and the FD for this data is given below.
Source: Alan Bluman “Elementary Statistics” – 8th Edition. McGraw Hill, New York
Example 2: Another worked out example is given below with some details:
Source: Prem Mann, "Introductory Statistics" – 8th Edition. John Wiley and Sons
An ungrouped Frequency Distribution is used when the data set is relatively small.
Items in the data set are therefore represented by single values of the variable in
question and the process of constructing the FD involves simply counting how many
items take on the different values the variable can assume. An example taken from
Mann is given below.
The following FD is worked out from the data above. Notice that it uses single value
classes.
Source: Prem Mann, "Introductory Statistics" – 8th Edition. John Wiley and Sons
In Table 1 the heights of 100 students at State University are recorded to the nearest inch. Construct a frequency distribution of the data given in Table 1.
Table 1
61 68 63 68 73 64 66 68 67 72 66 65
64 70 65 67 70 67 69 64 68 64 67 70
66 74 68 69 67 73 71 69 69 65 71 66
62 70 67 70 63 68 66 69 67 70 66 73
69 66 65 67 67 70 62 67 70 70 71 66
69 73 66 69 64 68 66 69 71 64 67 67
66 64 67 68 68 70 63 66 69 66 62 69
63 66 61 68 67 72 66 68 63 63 67 68
70 73 69 64
Table 2
Marks    f     Mid-point   R.F.   P.F. (%)   C.F.
60-62    5     61          0.05   5          5
63-65    18    64          0.18   18         23
66-68    42    67          0.42   42         65
69-71    27    70          0.27   27         92
72-74    8     73          0.08   8          100
Total    100               1.00   100
4.1.2 From the table above, calculate the mean of the grouped frequency distribution.
Mean = Σfx / n = 6,745 / 100 = 67.45
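The calculation above can be checked with a short Python sketch (illustrative only, not part of the original notes), using the midpoints and frequencies from Table 2:

# Grouped mean: sum of f*x divided by n, for the marks distribution above
midpoints = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]

n = sum(freqs)                                          # 100
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))   # 6,745
mean = sum_fx / n
print(mean)                                             # 67.45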
Median (ungrouped data): arrange the values in order and take the middle value.
45, 49, 12, 61, 70, 36, 18 → 12, 18, 36, 45, 49, 61, 70
Median = 45
Median (grouped data), using Median = L + ((n/2 − CF) / f) × h for the 66-68 class:
= 65.5 + ((50 − 23) / 42) × 3
= 65.5 + 1.93
= 67.43
The mode is the value that occurs most often in a data set. It is sometimes said to be
the most typical case.
25, 15, 18, 25, 17, 25, 12
Mode = 25
Mode (grouped data), using Mode = L + (Δ1 / (Δ1 + Δ2)) × h for the modal class 66-68 (Δ1 = 42 − 18 = 24, Δ2 = 42 − 27 = 15):
= 65.5 + (24 / 39) × 3
= 65.5 + 1.85
= 67.35
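A Python sketch of the grouped median and mode calculations above (illustrative only; it assumes the standard grouped formulas, with the 66-68 class as both the median and the modal class, lower boundary 65.5 and class width 3):

# Grouped median: L + ((n/2 - CF) / f) * h
L, h = 65.5, 3            # lower boundary and width of the 66-68 class
n, CF, f = 100, 23, 42    # total frequency, cumulative frequency below the class, class frequency
median = L + ((n / 2 - CF) / f) * h
print(round(median, 2))   # 67.43

# Grouped mode: L + (d1 / (d1 + d2)) * h, where d1 and d2 are the differences between
# the modal class frequency (42) and the frequencies just before (18) and just after (27)
d1, d2 = 42 - 18, 42 - 27
mode = L + (d1 / (d1 + d2)) * h
print(round(mode, 2))     # 67.35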
4. Measures of Dispersion
4.1. Range
The range is the difference between the largest and smallest values in the data set.
4.2. Mean Absolute Deviation
X     X − µ   |X − µ|
5     −8      8
9     −4      4
16    3       3
17    4       4
18    5       5
65    0       24

µ = 65/5 = 13

Mean Absolute Deviation = Σ|x − µ| / N = 24/5 = 4.8
Class      f    Mid-point (x)   fx        x − x̄    |x − x̄|   f|x − x̄|
118-126    3    122             366.0     −24.98   24.98     74.94
127-135    5    131             655.0     −15.98   15.98     79.90
136-144    9    140             1,260.0   −6.98    6.98      62.82
145-153    12   149             1,788.0   2.02     2.02      24.24
154-162    5    158             790.0     11.02    11.02     55.10
163-171    4    167             668.0     20.02    20.02     80.08
172-180    2    176             352.0     29.02    29.02     58.04
Total      40                   5,879.0                       435.12

Mean = Σfx / n = 5,879 / 40 = 146.98

Mean Absolute Deviation = Σf|x − x̄| / n = 435.12 / 40 = 10.88
On average the weight of each person in the data set deviates by 10.88 lbs from the
average of all persons in the data set.
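A Python sketch (illustrative only) that reproduces the grouped mean and mean absolute deviation from the weights table above:

# Mean absolute deviation for grouped data: sum of f*|x - mean| divided by n
midpoints = [122, 131, 140, 149, 158, 167, 176]
freqs = [3, 5, 9, 12, 5, 4, 2]

n = sum(freqs)                                               # 40
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n      # 146.975
mad = sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / n
print(round(mean, 2), round(mad, 2))                         # 146.98 10.88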
Ungrouped data
σ² = Σ(x − µ)² / N
= 1,750 / 6
= 291.7
σ = √(Σ(x − µ)² / N)
= √291.7
= 17.08
The formulas above are the definitional formulas; however, the shortcut formulas below are the ones which should be used.
Variance and standard deviation for grouped data.
Population: σ² = (Σfx² − (Σfx)² / N) / N
Sample: S² = (nΣfx² − (Σfx)²) / (n(n − 1))
Example
[The table of Marks, f, Mid-point, fx, x² and fx² is not reproduced here; from it n = 45, Σfx = 1,929 and Σfx² = 92,031.]
S² = (nΣfx² − (Σfx)²) / (n(n − 1))
= (45(92,031) − (1,929)²) / (45(45 − 1))
= (4,141,395 − 3,721,041) / 1,980
= 212.3
S = √S² = √212.3 = 14.57
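A Python sketch of the shortcut formula used above (illustrative only; since the full table is not reproduced here, it starts from the quoted summary values n = 45, Σfx = 1,929 and Σfx² = 92,031):

# Shortcut formula for the sample variance of grouped data:
# s^2 = (n * sum(f*x^2) - (sum(f*x))^2) / (n * (n - 1))
import math

n = 45
sum_fx = 1_929
sum_fx2 = 92_031

s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(s2)
print(round(s2, 1), round(s, 2))   # 212.3 14.57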
The Empirical Rule
Introduction to Probability
Uncertainty, Random experiments, sample points, sample spaces and
events.
We need to understand these concepts in order to calculate probabilities.
Examples:
1. The toss of a coin. The outcomes of this experiment are: Head or Tail. Using set
notation the set of outcomes is {H, T}.
2. Rolling a fair die. There are six possible outcomes of this experiment. These
are: 1, 2, 3, 4, 5, 6. Using set notation the set of outcomes is {1, 2, 3, 4, 5, 6}.
3. Tossing two coins: The outcomes of this experiment are: HH, HT, TH, TT.
Using set notation the set of outcomes is {HH, HT, TH, TT}
Sample Point
A sample point is a single possible outcome of a random experiment.
The first experiment has two sample points, while the second experiment has six sample points.
Sample spaces
A sample space is a set containing all the possible outcomes or sample points of an
experiment, where sample points are collectively exhaustive and each pair of sample
points is mutually exclusive (universal set). The sample space is represented by the
letter S.
Examples:
1. The toss of a coin. The outcomes of this experiment are: Head or Tail. Using set
notation: S = {H, T}.
2. Rolling a fair die. There are six possible outcomes of this experiment. These
are: 1, 2, 3, 4, 5, 6. Using set notation: S = {1, 2, 3, 4, 5, 6}.
3. Tossing two coins: The outcomes of this experiment are: HH, HT, TH, TT.
Using set notation: S = {HH, HT, TH, TT}
In example 1 N(S) = 2
In example 2 N(S) = 6
In example 3 N(S) = 4
Sample spaces can also be represented by (1) Venn diagrams, (2) two
dimensional graphs and (3) Tree diagrams.
Venn diagrams
Two Dimensional Diagram
Tree Diagrams
The above ways of representing a sample space are useful when the sample space
is relatively small. In situations where the sample space is large other techniques
are used to determine the sample space for experiments. A study of combinations
and permutations may be useful here.
Events
An event is defined as a subset of the sample space. In other words an event is a
collection of sample points.
A simple event that is related to this experiment might be the event A where
A compound event that is related to this experiment might be the event C where
An impossible event that is related to this experiment might be the event E where
A word on notation
Example: Event A = {observing an odd number in one roll of the die} = {1, 3, 5}
Complement of an event
The complement of an event A, denoted Ā, consists of all the points in the sample space that are not in the event A. The symbol Ā is usually read aloud as "A bar".
Example: Experiment – A single roll of a fair die. Let the event A = {observe an odd
number}
[Venn diagram: the circle A contains 1, 3 and 5; the points 2, 4 and 6 lie outside A and make up Ā.]
The union of two events A and B is the event that occurs if either A or B or both occur
on a single performance of the experiment and is denoted as A ⋃ B.
Note that A ⋃ B is an event which is derived from the two events A and B. In other
words the sample points that belong to the event (A ⋃ B) are all the different sample
points drawn from the two events A and B. When we are considering the union of
two sets there are three possible ways in which the event A and the event B are
related, as shown below.
Scenario 1. Union when the two events (sets) are disjoint sets or sets which have no
intersection.
S ⋃ T = {1, 2, 3, 4, 5, 6, 7, 8}
Scenario 2: Union when the two events (sets) which have elements (sample points) in
common or in other words when the sets intersect.
V ⋃ F = {a, d, e, g, i, n, o, u}
Scenario 3: Union of two sets when one set is a proper subset of the other set.
T ⋃ S = {1, 2, 3, 4}
The intersection of two events A and B is the event that occurs if both A and B occur
on a single performance of the experiment. We write A ⋂ B for the intersection of A and B.
The intersection A⋂B, consists of all the sample points belonging to both A and B as shown in the
following diagram.
The above diagram represents a die toss experiment. Let A = {observing an even number} = {2, 4, 6}
and B = {observing a number less than or equal to 4} = {1, 2, 3, 4}
A ⋂ B = {2,4}
Scenario 2: Where the intersection is equal to all the elements of one set or event.
Given D = {a, e, i, f, r, s, t} and C = {a, e, r}
D ⋂ C = {a, e, r}
Note because the set C is a proper subset of D, their intersection is equal to C = {a, e, r}
Events A and B are mutually exclusive if A ⋂ B contains no sample points, that is events A and B have
no sample points in common. In other words n(A ⋂ B) = 0 or is an empty set.
S ⋂ T = { } or empty set.
The classical approach to probability is based on the assumption that the possible outcomes are equally likely. It depends on logical reasoning.
Two experiments for which the outcomes are equally likely are: (i) rolling a fair die and (ii) tossing a fair coin.
In other words classical Probability posits that all the points in a suitably constructed sample space are
equally probable. Using this approach the probability of an event is equal to the number of points in
the sample space for which this event occurs, divided by the total number of points in the sample
space, S. Thus for an event E, the probability of the event occurring is given by:
P(E) = n(E) / N(S)
Example 1: What is the probability of getting a head on a single toss of a balanced coin?
Therefore P(E) = 1/2
Example 2: What is the probability of observing the number 1 in one roll of a die?
Therefore P(E) = 1/6
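A small Python sketch (illustrative only) of the classical formula P(E) = n(E)/N(S) applied to the two examples above:

# Classical probability: P(E) = n(E) / N(S), valid when all outcomes are equally likely
def classical_probability(event, sample_space):
    return len(event) / len(sample_space)

coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}

print(classical_probability({"H"}, coin))   # 0.5      (Example 1)
print(classical_probability({1}, die))      # 0.1666... (Example 2)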
Hence if, after n repetitions of an experiment (where n is very large), an event is observed to occur in x of these repetitions, then the probability of the event is x/n.
Another way of looking at this is: given an event E, the probability of the event P(E) is given by the formula:
P(E) = (number of times E occurred) / (number of trials (repetitions)) = f / n
Example. If we toss a coin 1000 times and find that it comes up heads 532 times, we estimate the
probability of a head coming up to be 532/1000 = 0.532
The empirical approach computes the probability of an event using the relative frequency of the event.
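The relative-frequency idea can be illustrated with a short simulation in Python (illustrative only; the count of heads will vary from run to run):

# Relative-frequency (empirical) estimate of P(head) after n tosses: f / n
import random

n = 1_000
heads = sum(1 for _ in range(n) if random.choice("HT") == "H")
print(heads / n)   # close to 0.5; the observed value in the notes was 532/1000 = 0.532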
Some events cannot be analysed using the objective approaches outlined above. The subjective
theorist regards probability as a measure of personal confidence in a particular proposition.
Hence for situations in which there is little or no historical information from which to determine a
probability, subjective probabilities can be employed.
Subjective probability can be thought of as the probability assigned by an individual or group based on
whatever evidence is available.
In this course the first and second approaches will be used to compute probabilities.
Propositions/Rules of Probability
1. Within any sample space S, if E is an event then:
0 ≤ P (E) ≤ 1
Basically, this is saying that any calculated probability must be between 0 and 1 inclusive. The closer
the probability of an event is to zero the less likely that event will occur. On the other hand the closer
the probability of an event is to 1 the more likely that event will occur. It should be noted that the
probability of an event cannot be less than 0 or be negative. Further the probability of an event cannot
be greater than 1.
Recall that for an experiment with equally likely outcomes the formula for computing the probability is: P(E) = n(E) / N(S)
When n(E) = 0, then P(E) = 0 and the event E is certainly not going to take place.
When n(E) = N(S), then P(E) = 1 and the event is certain to occur.
Otherwise n(E) < N(S) and P(E) is between 0 and 1. The expression above captures these ideas.
2. P(S) = 1
This means that an entire sample space is a certainty. It is an event that is bound to occur in one running of the experiment.
3. Addition Rule
There are two versions of the addition rule: one for two events that are mutually exclusive, and a general rule which may be used both for events which are mutually exclusive and for those that are not.
This result may also be computed using the special addition rule:
P(A ⋃ B) = P(A) + P(B) = 4/6 + 1/6 = 5/6
Example 2: Toss two coins. Let event A = {observe one head} and B = {observe exactly two
heads}. Find the probability of observing at least one head by first counting sample points
and secondly by using the addition rule.
P{observing at least one head} = 3/4. This approach employs the classical approach of counting sample points, since the outcomes of this experiment are equally likely.
The second approach uses the addition rule. Since both events A and B contain sample points with at least one head and the events are disjoint, we can use the special addition rule to compute the required probability.
P(A ⋃ B) = P(A) + P(B)
= 2/4 + 1/4
= 3/4
The question is why we make this adjustment. Recall from above that two events are
mutually exclusive when there are no common sample points in the two events, as shown in
the diagram below. In this diagram the events S and T have no common sample points.
On the other hand, two events are not mutually exclusive when there are common sample points in the two events, as shown below.
A ⋃ B = {1, 2, 3, 4, 6} A ⋂ B = {2, 4}
If we use the special addition rule P(A ⋃ B) = P(A) + P(B) for the second diagram, then the answer would be: 3/6 + 4/6 = 7/6 (this answer is incorrect).
Why does the special addition rule give us the incorrect answer?
Notice there are two common sample points in the two events and these are counted in the
computation of both P (A) and the P(B). Therefore we are double counting the numbers 2 and 4. To
correct for this we need to take away the effect of the double counting on the overall probability. This
is done by taking away P (A ⋂ B).
For the above problem we therefore need to use the general addition rule. So the above problem with the adjustment gives the following:
P(A ⋃ B) = P(A) + P(B) − P(A ⋂ B) = 3/6 + 4/6 − 2/6 = 5/6
Example: Toss a coin and roll a die and combine the results. Let the event A = {observing an outcome
with a number less than or equal to 3} and B = { observing an outcome with a number greater than or
equal to 3}. Find the probability of observing an outcome with a number less than or equal to 3 or
greater than or equal to 3, i.e find P(A ⋃ B).
Sample space from this experiment worked out using tree diagram.
S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
A = { H1, H2, H3, T1, T2, T3}
B = { H3, H4, H5, H6, T3, T4, T5, T6}
(A ⋃ B) = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
(A ⋂ B) = {H3, T3}
P(A) = n(A) / N(S) = 6/12    P(B) = n(B) / N(S) = 8/12    P(A ⋂ B) = n(A ⋂ B) / N(S) = 2/12
P(A ⋃ B) = n(A ⋃ B) / N(S) = 12/12 = 1
Alternatively we can use the general addition rule to solve this problem as given below.
P(A ⋃ B) = P (A) + P (B) – P (A ⋂ B)
= 6/12 + 8/12 − 2/12
= 12/12
= 1
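A Python sketch (illustrative only) that verifies both calculations above by enumerating the coin-and-die sample space:

# Verify the general addition rule on the coin-and-die experiment
S = {c + str(d) for c in "HT" for d in range(1, 7)}    # 12 equally likely outcomes
A = {s for s in S if int(s[1]) <= 3}                   # number less than or equal to 3
B = {s for s in S if int(s[1]) >= 3}                   # number greater than or equal to 3

def p(E):
    return len(E) / len(S)

print(p(A | B))                   # 1.0, by direct counting
print(p(A) + p(B) - p(A & B))     # 1.0, by the general addition rule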
P(∅) = 0
If an event is an "impossibility", then its probability of occurrence is zero.
Example: Roll a die. S = {1, 2, 3, 4, 5, 6}. Let A = {Observe the number seven} = {7}
P(A) = 0
The complement of an event E is the set of outcomes in the sample space that are not included in the outcomes of event E. The complement of E is denoted by Ē (read "E bar") (Bluman, p. 218).
Example: Tossing two balanced coins. S = {HH, HT, TH, TT}. Let A = {observing at least one head} = {HH, HT, TH} and Ā = {TT}.
P(A) = n(A) / N(S) = 3/4 = 0.75    P(Ā) = 1/4
∴ P(A) = 1 − P(Ā) = 1 − 1/4 = 3/4
P(Ā) = 1 − P(A) = 1 − 3/4 = 1/4
P(A) + P(Ā) = 3/4 + 1/4 = 1
We will see later that sometimes it may be easier to compute the probability of an
event if we know the probability of its complement.
There are two versions of the multiplication rule depending on whether the events
are independent or dependent.
Two events A and B are independent events if the fact that A occurs does not affect
the probability of B occurring.
The multiplication rules can be used to find the probability of two or more events
that occur in sequence. For example, if you toss a coin and then roll a die, you can
find the probability of getting a head on the coin and a 4 on the die. These two
events are said to be independent since the outcome of the first event (tossing a
coin) does not affect the probability outcome of the second event (rolling a die).
We learnt that since all the outcomes of this experiment are equally likely, the probability assigned to each one is 1/4.
Given that S = {HH, HT, TH, TT}: P{HH} = P{HT} = P{TH} = P{TT} = 1/4
We can view this experiment from a different perspective. Recall the tree diagram for
the experiment:
For sequential experiments the tree diagram offers us a method to compute the probabilities of different events. Notice that P(HH) = 1/4, as worked out using the classical approach to computing probabilities.
Placing the probabilities along the branches of the tree is a useful technique to work
out the probabilities of different events. So this experiment and example shows two
ways of computing the same probabilities of events.
Using the tree diagram to compute probabilities for the four equally likely outcomes of this experiment allows us to explain the multiplication rule for independent events.
Notice that P{HH} = 1/4, or P(H and H) = P(H ⋂ H) = 1/2 × 1/2 = 1/4.
The probabilities of all the outcomes are the same since all the outcomes are equally
likely.
Thus P(H, 1) = 1/2 × 1/6 = 1/12
Example 3: Suppose there are five marbles in an urn. They are identical except for
colour. Three of the marbles are red and two are blue. You are instructed to draw out
one marble, note its colour, and replace it in the urn. Then you are to draw out another
marble and note its colour. What are the outcomes of the experiment? What is the
probability of each outcome?
We first draw a tree diagram to work out the outcomes of the experiment.
We see that the probabilities for the different outcomes are computed by multiplying
the probabilities along the branches of the tree.
These examples confirm the special multiplication rule for independent events.
When the events are dependent we need to adjust the multiplication formula to make
allowance for this.
Note that we have adjusted the second term on the right hand side of the formula.
That second term is referred to as the conditional probability of the event B, since
once the event A has occurred, then the probability of the event B needs to be
adjusted given that the event A has occurred.
Example 1: Suppose a card is drawn from a deck and not replaced, and then a
second card is drawn. What is the probability of selecting an ace on the first card and
a king on the second card?
Given that there are 52 cards in the pack and there are four aces and four kings:
P(Ace and King) = 4/52 × 4/51 = 16/2652 = 4/663
Example 4
Suppose there are five marbles in an urn. They are identical except for colour. Three
of the marbles are red and two are blue. You are instructed to draw out one marble,
note its colour, and set it aside (do not replace it). Then you are to draw out another
marble and note its colour. What are the outcomes of the experiment? What is the
probability of each outcome?
We first draw a tree diagram to work out the outcomes of the experiment.
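A Python sketch (illustrative only) of the multiplication rule for dependent events applied to this urn, multiplying the probabilities along the branches of the tree:

# Two draws without replacement from an urn with 3 red (R) and 2 blue (B) marbles
from fractions import Fraction as F

P = {}
P[("R", "R")] = F(3, 5) * F(2, 4)   # after a red is removed, 2 reds remain out of 4
P[("R", "B")] = F(3, 5) * F(2, 4)
P[("B", "R")] = F(2, 5) * F(3, 4)
P[("B", "B")] = F(2, 5) * F(1, 4)

for outcome, prob in P.items():
    print(outcome, prob)            # 3/10, 3/10, 3/10, 1/10
print(sum(P.values()))              # 1, so the four outcomes cover the sample space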
7. Conditional Probability
We have seen above that the general multiplication rule employs the conditional
probability which adjusts the rule to deal with dependent events. The conditional probability of an event, which is represented as P(A|B), gives the probability of the event A given that another event B, which affects A, has already occurred.
The main idea in computing the conditional probability of an event is based on some
additional information that might affect the likelihood of the outcome of an
experiment, so we need to alter the probability of an event of interest.
The event that occurs first or the additional knowledge reduces the sample space to
one of its subsets. The additional information therefore tells us that we are in a
portion of the sample space rather than being anywhere in the sample space.
Example 1: Roll a fair die. Let A = {observe an even number} = {2, 4, 6} and B = {observe a number less than or equal to 3} = {1, 2, 3}.
P(A) = 3/6    P(B) = 3/6
Note: the probability of A which is 3/6 or ½, is computed without any reference to the
event B.
If we are now told that or given the additional information that the event B occurred
and we are now asked to compute the probability of the event A, then the additional
information that the event B has occurred must be taken into account. In other words, if we are told that the event B has occurred, then we are sure that one of the numbers 1, 2 or 3 showed up.
Question: With this additional information we ask the question: Would the probability of observing an even number on that throw of the die still be equal to 3/6?
No. WHY?
Let's look at two Venn diagrams that will shed light on these questions.
[Venn diagrams: A = {2, 4, 6} and B = {1, 2, 3}, with A ⋂ B = {2}; the reduced sample space is B = {1, 2, 3}.]
Once the additional information is provided then it must be taken into consideration when computing the probability of the event A. Since we are told that the event B occurred (a number less than or equal to 3 showed up), the only number in the event A that is still possible is the number 2.
Note that the original sample space which contained six sample points is reduced to a
subset containing three sample points as shown in the diagram on the right.
∴ P(A|B) = 1/3
Hence probabilities associated with events defined on the reduced sample space are
called conditional probabilities. We will derive a formula below for conditional
probabilities.
Example 2: This is a more extended example which will review most of the topics
dealt with so far.
                      COLLEGE (C)   NON-COLLEGE (C̄)   TOTAL
MANAGERIAL (M)             50               20            70
NON-MANAGERIAL (M̄)       150              280           430
TOTAL                     200              300           500
In this example, as shown in the table above, the employees of a firm are cross-classified as managerial or non-managerial personnel, and as college graduates or not.
We will choose a worker at random and compute probabilities for different subsets of
the sample space.
Marginal Probabilities
P(M) = n(M) / N(S) = 70/500 = 0.14
P(M̄) = n(M̄) / N(S) = 430/500 = 0.86
P(C) = n(C) / N(S) = 200/500 = 0.40
P(C̄) = n(C̄) / N(S) = 300/500 = 0.60
Joint Probabilities
P(M ⋂ C) = n(M ⋂ C) / N(S) = 50/500 = 0.10
P(M ⋂ C̄) = n(M ⋂ C̄) / N(S) = 20/500 = 0.04
P(M̄ ⋂ C) = n(M̄ ⋂ C) / N(S) = 150/500 = 0.30
P(M̄ ⋂ C̄) = n(M̄ ⋂ C̄) / N(S) = 280/500 = 0.56
Notice that the way in which the data is presented makes it easy to get the numbers
to compute the desired probabilities. We can also use the multiplication rule to
compute the cell probabilities which we will do when we develop our understanding of
conditional probabilities.
Conditional Probabilities
We have seen in the example above that the conditional probability of an event takes
into consideration that another event occurred and the occurrence of that event has
changed the probability of the second event. We can use the table above to compute
conditional probabilities and below we will give a rule to compute the conditional
probability of an event.
P(M|C) = 50/200 = 0.25     P(M|C̄) = 20/300 = 0.0667
P(M̄|C) = 150/200 = 0.75    P(M̄|C̄) = 280/300 = 0.933
P(C|M) = 50/70 = 0.71       P(C|M̄) = 150/430 = 0.35
P(C̄|M) = 20/70 = 0.29      P(C̄|M̄) = 280/430 = 0.65
Just as we did for the card examples where two cards are pulled successively from a
deck of cards without replacement, the probabilities above are computed for an event
given that a first event has already occurred. When the data is presented as in the
table above it is easy to make the necessary adjustment. Otherwise we can use the
formula derived below from the multiplication rule for dependent events.
We can rewrite the last expression as follows:
P(A|B) = P(A ⋂ B) / P(B) = (1/6) / (3/6) = 1/6 × 6/3 = 1/3 = 0.33
P(M|C) = P(M ⋂ C) / P(C) = (50/500) / (200/500) = 50/500 × 500/200 = 50/200 = 0.25
We can use the formula to compute a couple of the other conditional probabilities for
practice.
Therefore two events are statistically independent if their joint probability is equal to the product of their separate probabilities.
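A Python sketch (illustrative only) that computes a few of the marginal, joint and conditional probabilities from the managerial/college table, and checks the independence condition just stated:

# Counts from the cross-classification table (M = managerial, C = college graduate)
counts = {("M", "C"): 50, ("M", "notC"): 20,
          ("notM", "C"): 150, ("notM", "notC"): 280}
N = sum(counts.values())                                   # 500

p_M = (counts[("M", "C")] + counts[("M", "notC")]) / N     # 0.14 (marginal)
p_C = (counts[("M", "C")] + counts[("notM", "C")]) / N     # 0.40 (marginal)
p_M_and_C = counts[("M", "C")] / N                         # 0.10 (joint)
p_M_given_C = p_M_and_C / p_C                              # 0.25 (conditional)
print(p_M, p_C, p_M_and_C, p_M_given_C)

# Independence check: is P(M and C) equal to P(M) * P(C)?
print(p_M_and_C == p_M * p_C)      # False (0.10 vs 0.056), so M and C are not independent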
Objectives:
Define a random variable.
Compute the mean and variance of a random variable.
Define a probability distribution.
Distinguish between discrete and continuous probability distributions.
Study two probability distributions: the binomial PD and the normal PD.
We now want to study the whole range of events resulting from an experiment. To
describe the likelihood of each outcome of this range of events, we use a probability
distribution.
We will see later that probability distributions help us to make accurate conclusions about a population from which a sample was taken.
Basically a PD is a listing of the outcomes of an experiment that may occur and their
corresponding probabilities.
Revisit the bicycle example. In this example we have a variable (bicycle sales) and the frequency of the various values (x) it may assume.
Experiment: Toss two coins. Combine the up faces. S = {HH, HT, TH, TT}
In this experiment we define a variable, call it X, as the number of heads observed when two coins are tossed.
Example 2: Roll two dice. Definition of variable: Let X be the sum resulting from
adding the two up surfaces of the dice. The range of the variable X is from the
numbers 2 to 12 inclusive.
1. Each sample point is assigned a specific possible value of the random variable, though the same specific value can be assigned to two or more sample points.
2. Each possible value of a Random variable is an event, since it is a subset defined
on a sample space.
3. All the values of a Random Variable constitute a set of events that are mutually
exclusive and completely exhaustive.
A complete description of a DRV requires that we specify the possible values the RV
can assume and the probability associated with each value.
e.g. Toss of two coins. Let’s name the random variable X where X is defined as the
number of heads observed. Find the probability of each value the random
variable can assume.
S = {HH, HT, TH, TT}
We can associate with each outcome a number given the definition of the random
variable.
Definition of PD: The PD of a DRV is a graph, table or formula that specifies the
probability associated with each possible value the RV can assume.
Heads observed (x)   P(x)
0                    1/4
1                    1/2
2                    1/4
Tabular representation of the probability distribution.
We see that PDs are analogous to the FDs we looked at earlier. As with frequency distributions, it is useful to compute measures of central tendency and dispersion.
NB:
The mean of a PD
The mean is a value that is typical of the PD, and it also represents the long-run average of the random variable. The mean of a probability distribution is also called the expected value (denoted E(X)) and is a weighted average, with the weights being the probabilities of the different values of X (the random variable):
µ = E(X) = Σ x · P(x)
The expected value, as a measure of central tendency, tells us where the centre of the mass of the probability distribution of a random variable is located. It is also the average value of the RV if the same random experiment is repeated over and over again. The EV need not be a possible value of the RV.
Variance of a PD
As in the case of FDs, measures of variability are very important to get a better picture of a PD.
The population variance σ² is defined as the average of the squared distance of x from the population mean µ:
σ² = E[(x − µ)²] = Σ (x − µ)² · P(x)
Or, in the shortcut form used in the examples below:
σ² = Σ [x² · P(x)] − µ²
Standard Deviation
The standard deviation of a DRV is equal to the square root of the variance, i.e. σ = √σ².
We have seen so far how a discrete random variable defined on S may give rise to a probability distribution, we have seen some similarities between PDs and FDs, and we have seen how the measures of central tendency and dispersion also apply to the probability distribution.
Example 1:
No. of Heads observed (X)   P(X)   X · P(X)   X²   X² · P(X)
0                           1/4    0          0    0
1                           1/2    1/2        1    1/2
2                           1/4    1/2        4    1
Total                       1      1               1.5
Example: Finding the Variance and Standard deviation of a probability distribution.
σ² = Σ[X² · P(X)] − µ²
= 1.5 − 1
= 0.5
σ = √σ² = √0.5 = 0.7071
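A Python sketch (illustrative only) of the expected value, variance and standard deviation of the two-coin probability distribution above:

# Mean, variance and standard deviation of a discrete probability distribution
from math import sqrt

dist = {0: 0.25, 1: 0.5, 2: 0.25}                      # x: P(x) for the number of heads

mu = sum(x * p for x, p in dist.items())               # 1.0
var = sum(x**2 * p for x, p in dist.items()) - mu**2   # 1.5 - 1 = 0.5
sigma = sqrt(var)                                      # 0.7071
print(mu, var, round(sigma, 4))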
Example 2:
σ² = Σ[X² · P(X)] − µ²
= 8.1137 − (1.661)²
= 8.1137 − 2.7589
= 5.3548
σ = √σ² = √5.3548 = 2.314
• The Interpretation of the Standard Deviation Measure
What is a standard deviation? What does it do, and what does it mean? The most precise way to define standard deviation is by reciting the formula used to compute it. However, insight into the concept of standard deviation can be gleaned by viewing the manner in which it is applied. Two ways of applying the standard deviation are the empirical rule and Chebyshev's theorem.
• Empirical Rule
• The empirical rule is an important rule of thumb that is used to state the approximate percentage of values that lie within a given number of standard deviations from the mean of a set of data if the data are normally distributed. The empirical rule is used only for three numbers of standard deviations: 1, 2, and 3, within which approximately 68%, 95% and 99.7% of the values lie, respectively. More detailed analysis of other numbers of values is presented in Chapter 6. Also discussed in further detail in Chapter 6 is the normal distribution, a unimodal, symmetrical distribution that is bell (or mound) shaped. The requirement that the data be normally distributed contains some tolerance, and the empirical rule generally applies as long as the data are approximately mound shaped.
The Binomial Distribution
Many experiments result in dichotomous responses, i.e. where there are only two possible alternatives.
Random variables associated with dichotomous responses are called binomial random
variables.
Example: toss a fair coin three times and count the number of Heads.
# of Heads   Probability
0            1/8
1            3/8
2            3/8
3            1/8
Total        1
P(X) = [n! / ((n − X)! X!)] · p^X · q^(n−X)
n! / ((n − X)! X!) → This component of the formula is known as the binomial coefficient and calculates the number of outcomes having the characteristics of the event under consideration.
p^X · q^(n−X) → The second part of the formula gives the probability of the event of interest occurring once.
0! = 1,  1! = 1,  4! = 4 × 3 × 2 × 1 = 24,  6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
6! = 6 × 5 × 4! = 30 × 24 = 720
6!/4! = (6 × 5 × 4 × 3 × 2 × 1) / (4 × 3 × 2 × 1) = 720/24 = 30, or 6!/4! = (6 × 5 × 4!) / 4! = 30
For this application, we have n = 3 trials. Since a success S is defined as observing a head, p = P(S) = 1/2 and q = 1 − p = 1/2. Substituting n = 3, p = 1/2 and q = 1/2 into the formula for P(x), we obtain:
P(0) = [3! / ((3 − 0)! 0!)] · (1/2)⁰ · (1/2)³ = (6/6) · 1 · (1/8) = 1/8
P(1) = [3! / ((3 − 1)! 1!)] · (1/2)¹ · (1/2)² = (6/2) · (1/2) · (1/4) = 3/8
P(2) = [3! / ((3 − 2)! 2!)] · (1/2)² · (1/2)¹ = (6/2) · (1/4) · (1/2) = 3/8
P(3) = [3! / ((3 − 3)! 3!)] · (1/2)³ · (1/2)⁰ = (6/6) · (1/8) · 1 = 1/8
(n choose X) = nCx = n! / ((n − X)! X!) → these three expressions are all equal. The combination expression can be evaluated directly on the calculator.
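A Python sketch (illustrative only) of the binomial formula, reproducing the three-coin-toss table above:

# Binomial probability: P(x) = nCx * p**x * q**(n - x)
from math import comb

def binomial(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Three tosses of a fair coin: n = 3, p = 1/2
for x in range(4):
    print(x, binomial(x, 3, 0.5))   # 0.125, 0.375, 0.375, 0.125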