Statistics For Economists ch1 - ch7 (All in One)


STATISTICS FOR ECONOMISTS

ECON 2042
Addis Ababa University

College of Business and


Economics

Department of Economics
May, 2023
11/10/2023 Statistics for Economics_AAU 1
CHAPTER ONE
OVERVIEW OF BASIC PROBABILITY
THEORY
Introduction
❖ Many happenings in nature and in the realm of human activity are
associated with uncertainty.
❖ Uncertainty plays an important role in our daily lives and
activities as well as in business.
❖ It is often necessary to "guess" about the outcome of an event
in order to make a decision.
❖ In our daily lives we are faced with a lot of decision-making
situations that involve uncertainty.
For example:
✓ In a business context: investment, stock prices
✓ Game Wins, Lottery wins, Election win, Traffic jam
✓ Market Conditions: effectiveness of an ad campaign or the
eventual success of a new product.
✓ Weather forecast: whether clouds will appear in the sky tomorrow
morning is not certain.
✓ The sex of a baby to be born some months hence is again not
known for certain.



❑ In general, we often do not know an outcome with certainty;
instead, we are forced to guess, to estimate, to hedge our bets.
✓ In each of such happenings there are two or more outcomes
leading to uncertainty.
❑ If an experiment is repeated under essentially similar conditions,
we generally come across two types of situation: namely
1) the result or the outcome is unique or certain - known as
deterministic,
2) the result is not unique rather may be one of the several
possible outcomes- known as unpredictable or non-
deterministic or probabilistic phenomenon.
❑ The experiment (models) related to non-deterministic phenomenon
is called random experiment or non-deterministic model.



❖ In this course we will deal with random experiments, or
non-deterministic models.
Examples:
Exp1: Toss a die and observe the number that shows up.
Exp2: Toss a coin n times and observe the total number of
heads obtained.
Exp3: Manufacture items on a production line and count the
number of defective items produced during a 24 hr. period.
Exp4: Testing the life length of a manufactured light bulb
by inserting it into a socket and recording the time elapsed (in
hours) until it burns out.
Exp5: Measuring blood pressure of a group of individuals,
Exp6: Checking an automobile’s petrol mileage,
Exp7 : Measuring daily rainfall, and so on.



So what is probability theory ?
❖ Probability is the science of uncertainty.
❖ It provides precise mathematical rules for understanding and
analyzing our own ignorance.
❖ It does not tell us tomorrow’s weather or next week’s stock prices;
rather, it gives us a framework for working with our limited
knowledge and for making sensible decisions based on what we
do and do not know.
❖ Probability is a measure of uncertainty, with values between zero
and one inclusive, describing the relative possibility (chance or
likelihood) of occurrence of an event.
❑ In sum, probability theory provides us with a precise
understanding of uncertainty; and this understanding can help us
make predictions, make better decisions, assess risks, and even
make money.
1.1 Sample Space, Sample Points, Events and Event
Space
i) Experiment: An activity or measurement that results in an
outcome.
ii) Sample space: is the set of all possible outcomes of an experiment
denoted by S or Ω.
iii) Sample points: Each element of the sample space S is called a
sample point and usually denoted by ω𝑖 .
iv) Events: One or more of the possible outcomes; subsets of the
sample space S.
v) Event space: is a class or collection of all events associated with a
given experiment OR sample space. We use E to denote event
space.



EXAMPLE: Let the experiment be rolling a well-balanced single
die once.
Given this:
✓ The sample space of the experiment is S = {1, 2, 3, 4, 5, 6}
which contains all the possible outcomes of the experiment.
✓ Each outcome is a sample point, i.e., 𝝎𝒊 ∈ S.
Let A = {2, 4, 6}.
✓ Then A is a subset of S and defines the event of obtaining an
even outcome.
Let B = {4}.
✓ Then B defines the event of obtaining a number 4.
✓ Both A and B are subsets of the sample space and thus are
events.



➢ We may be interested in the following events:
a) the outcome is the number 1
b) the outcome is even but less than 3
c) the outcome is not even, and so on.
✓ When an event contains only one element of S, like B above, it is
called a simple (elementary) event.
❑ If we collect all subsets of S in one set and denote it by E, this new
set is called the event space.

Types of Events
✓ We have already defined an event as any subset of the
outcomes of an experiment.
1) Mutually Exclusive Events: If two or more events cannot occur
simultaneously in a single trial of an experiment, then such events
are called mutually exclusive events or disjoint events.
✓ In other words, two events are mutually exclusive if the occurrence
of one of them prevents or rules out the occurrence of the other.
For example, the numbers 2 and 3 cannot occur simultaneously on a
single roll of a die.
✓ Symbolically, a set of events {𝑨𝟏 , 𝑨𝟐 , . . ., 𝑨𝒏 } is mutually
exclusive if 𝑨𝒊 ∩ 𝑨𝒋 = ∅ for i ≠ j.
✓ This means the intersection of two events is a null set (∅); it is
impossible to observe an event that is common in both 𝑨𝒊 and 𝑨𝒋 .

2) Collectively Exhaustive Events: A list of events is said to be
collectively exhaustive when, taken together, the events include every
possible outcome of the experiment.
✓ That is, two or more events are collectively exhaustive if at
least one of the events must occur.



✓ Symbolically, a set of events {𝑨𝟏 , 𝑨𝟐 , . . ., 𝑨𝒏 } is collectively exhaustive if
the union of these events is identical with the sample space S.
✓ That is, S = A₁ ∪ A₂ ∪ . . . ∪ Aₙ.
For example, being male and being female, or the event of an even
number and the event of an odd number in rolling a die, are mutually
exclusive and collectively exhaustive events.

3) Independent and Dependent Events: Two events are said to be


independent if information about one tells nothing about the occurrence of the
other.
✓ In other words, outcome of one event does not affect, and is not
affected by, the other event.
✓ The outcomes of successive tosses of a coin are independent of its
preceding toss.
Example: Increase in the population (in per cent) per year in India is
independent of increase in wheat production (in per cent) per year in the USA.
✓ However, two or more events are said to be dependent if
information about one tells something about the other.
✓ That is, dependence between characteristics implies that a
relationship exists, and therefore, knowledge of one characteristic
is useful in assessing the occurrence of the other.
For example, drawing of a card (say a queen) from a pack of playing
cards without replacement reduces the chances of drawing a queen
in the subsequent draws.

4) Compound Events: When two or more events occur in connection


with each other, then their simultaneous occurrence is called a
compound event.
✓ finding the probability of more than one event occurring at the
same time
✓ These event may be (i) independent, or (ii) dependent.



5) Equally Likely Events: Two or more events are said to be equally
likely if each has an equal chance to occur.
✓ That is, one of them cannot be expected to occur in preference to
the other.
For example, each number may be expected to occur on the
uppermost face of a rolling die the same number of times in the long
run.
6) Complementary Events: If A is any subset of the sample space S,
then its complement, denoted by Ā (read as "A-bar"), contains all the
elements of the sample space that are not part of A.
✓ If S denotes the sample space, then
Ā = S − A = {all sample elements not in A}
❑ Obviously such events must be mutually exclusive and collectively
exhaustive.
1.2 Approaches of Probability
✓ A general definition of probability states that probability is a
numerical measure (between 0 and 1 inclusively) of the likelihood
or chance of occurrence of an uncertain event.
✓ However, it does not tell us how to compute the probability.
✓ In this section, we shall discuss different conceptual approaches of
calculating the probability of an event.
✓ There are three widely used or applied definitions or interpretation
of probability.
✓ These definitions are:
a) Classical or a priori definition of probability
b) Relative frequency or posterior definition of probability
c) Subjective definition of probability



A) Classical or a priori definition of Probability
➢ This approach of defining the probability is based on the assumption that all
the possible outcomes (finite in number) of an experiment are mutually
exclusive and equally likely.
Definition: If a random experiment can result in N mutually exclusive and
equally likely outcomes, i.e. if the sample space S consists of N mutually
exclusive and equally likely outcomes, and if NA of these outcomes have an
attribute A.
➢ Then the probability of A is given by the ratio:

P(A) = N_A / N = (Number of outcomes with attribute A) / (Total number of possible outcomes) ………….……(1)
❖ Since the probability of occurrence of an event is based on prior
knowledge of the process involved, therefore this approach is often
called a priori classical probability approach.
✓ This means, we do not have to perform random experiments to find the
probability of occurrence of an event.
✓ This also implies that no experimental data are required for
computation of probability.
✓ Since the assumption of equally likely simple events can rarely be
verified with certainty, therefore this approach is not used often other
than in games of chance.



Example 1: Let the random experiment be tossing of a single die.
S = {1, 2, 3, 4, 5, 6}.
❖These six outcomes are mutually exclusive since two or more
faces can not turn up at once or simultaneously.
❖And if the die is fair, or unbiased, the six outcomes are equally
likely that is, each face is expected to appear with about equal
relative frequency in the long run.
❖In this example the probability of obtaining any single outcome,
say 2, on a single toss is 1/N = 1/6.


Example 2: Similarly for the process of selecting a card at random, each
event or card is mutually exclusive, exhaustive, and equiprobable.
✓ The probability of selecting any one card on a trial is equal to 1/52,
since there are 52 cards.
Hence, in general, for a random experiment with N mutually
exclusive, exhaustive, equiprobable events, the probability of any of
the events is equal to 1/N.
✓ For instance, if an event A is an elementary event, meaning it
contains a single outcome, then its probability P(A) is 1/N.


Example 3: Let the experiment be tossing a single coin 3 times: or
tossing 3 coins simultaneously.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
✓ If the coin is well balanced, or fair, these 8 possible outcomes are
mutually exclusive and equally likely outcomes.
➢ Each single outcome, or sample point, has the probability 1/N = 1/8.
➢ Now suppose that we want the probability that the result of a toss
be at least two heads and represent it by B.
B = {HHH, HHT, HTH, THH}
✓ Then, the probability of the event B occurring is given by
P(B) = N(B)/N = 4/8 = 0.5
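The classical rule N(B)/N for this example can be sketched by enumerating the eight equally likely outcomes (variable names here are illustrative, not from the slides):

```python
from itertools import product
from fractions import Fraction

# Sample space: all 2^3 = 8 equally likely outcomes of tossing a fair coin 3 times.
S = list(product("HT", repeat=3))

# Event B: at least two heads.
B = [w for w in S if w.count("H") >= 2]

# Classical probability: P(B) = N(B)/N.
p_B = Fraction(len(B), len(S))
print(len(S), len(B), p_B)   # 8 4 1/2
```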



❑ To sum up, according to the classical definition, the probability P(A) of an
event A is determined a priori without actual experimentation, i.e. no
actual experiment need ever take place.
❑ In other words, it is possible to conceive the experiment of, say, throwing a
coin 3 times without actually doing it, and proceed logically on the
assumption of an unbiased coin.
❑ That is why it is called a priori probability.

NOTE: By the classical definition, the probability of event A is a number
between 0 and 1 inclusive, i.e. 0 ≤ P(A) ≤ 1.
i. P(A) ≤ 1, because the total number of possible outcomes cannot be
less than the number of outcomes with the specified attribute,
i.e. N ≥ N(A).
ii. If an event is certain to happen, its probability is 1
iii. If it is certain not to happen, its probability is 0.
Critiques:
✓ The classical definition breaks down in the following cases.
(a) If outcomes of an experiment are not mutually exclusive and equally
likely.
Example:
✓ What is the probability of the event of rolling a number 4, with a single
toss of a die if the die is biased?
✓ To put it differently, if the die is loaded and the probability of 4 equals,
say 0.2, the number 0.2 can not be calculated from the ratio given by
equation (1).
❖ The classical definition will not help us when we try to answer
questions such as: What is the probability that
✓ a child born in Addis will be a boy?; a male will die before age
50?; a candidate will pass in a certain test ? it will rain tomorrow ?
❖ Outcomes of such experiments are not equally likely, so these questions
cannot be answered with this approach.
✓ For example, it is not possible to state in advance, without repetitive
trials(empirical test) of the experiment, the probabilities in cases like
(i) whether a number greater than 3 will appear when die is rolled or
(ii) if 100 items will include 10 defective items.
b) When the number of possible outcomes is infinite
✓ So, the alternative is to use relative frequency or posterior definition of
probability.
B) Relative Frequency (posterior) definition of probability
❖ This approach of computing probability is based on the assumption that a
random experiment can be repeated a large number of times under
identical conditions where trials are independent to each other.
❖ While conducting a random experiment, we may or may not observe the
desired event.
❖ But as the experiment is repeated many times, that event may occur some
proportion of time.
❖ Thus, the approach calculates the proportion of the time (i.e.
the relative frequency) with which the event occurs over an
infinite number of repetitions of the experiment under identical
conditions.
✓ Let’s say N(A) is the number of times that A occurs in N trials;
the ratio N(A)/N appears to converge to a constant limit as N
increases indefinitely.
✓ We can take the limiting value of this ratio as P(A).
For example, if a die is tossed N times and N(A) denotes the
number of times the event A (say, a 4, 5, or 6) occurs, then the
ratio P(A) = N(A)/N gives the proportion of times the event A
occurs in N trials; this is also called the relative frequency of the
event in N trials.



✓ Although our estimate of P(A) may change after every trial,
we will find that the proportion N(A)/N tends to cluster
around a unique central value as the number of trials N
becomes ever larger.
✓ This unique central value (also called the probability of event A
under the relative frequency definition) is given by
P(A) = lim (N→∞) N(A)/N ………………………………….…(2)
✓ This assumes a series of experiments can be made
keeping the initial conditions as equal as possible.
✓ An observation of a random experiment is made, then the
experiment is repeated and another observation is taken.



✓ When the experiment is repeated up to sufficiently large times, in
many of the cases the observations fall into certain classes where in
the relative frequencies are quite stable.
✓ This stable relative frequency can be taken as the probability
(approximate) of events.
Example_1: Consider tossing a coin (balanced or unbalanced) 1000 times.
✓ A result like the following might be observed in this experiment.

OUTCOME   Observed Frequency   Observed Relative Frequency   Long-run expected frequency
T         540                  0.54                          0.5
H         460                  0.46                          0.5
Total     1000                 1.00                          1.00

✓ Thus, the probability of observing H is P(H) = 460/1000 ≈ 0.5, and
P(T) = 540/1000 ≈ 0.5.
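The convergence claimed by equation (2) can be illustrated with a short simulation (a sketch; the seed and sample sizes are arbitrary choices, not part of the slides):

```python
import random

random.seed(2023)  # arbitrary seed, for a reproducible illustration

def relative_freq_heads(n):
    """Toss a fair coin n times and return the relative frequency of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n))
    return heads / n

# The proportion N(H)/N clusters ever more tightly around 0.5 as N grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_freq_heads(n))
```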
Example_2: If we find, upon examination of large records, that about 51%
of births in one locality are male, it might be reasonable to postulate that
the probability of a male birth in this locality is P ≈ 0.51, i.e.
P(male birth in that locality) ≈ 0.51.
Example_3: A study of 8,000 economics graduates of A.A.U was
conducted. The study revealed that 400 of these students were not employed
in their major areas of study.
What is the probability that a particular economics graduate will be
employed in an area other than his/her field of study?
P(a graduate will be employed in an area other than his/her major) ≈ 400/8000 = 0.05
❖ What we learn from the above examples is that the probability of an event
happening in the long run is determined by observing what fraction of
the time similar events happened in the past.
P(an event happening) = (Number of times the event occurred in the past) / (Total number of observations)



NOTE: the relative frequency definition does not
a) require events to be equally likely, nor
b) require the objects of the experiment to be unbiased.

C) Subjective Definition
✓ If there is little or no past experience on which to base a probability, it may
be estimated subjectively.
✓ Essentially, this means evaluating the available information and then estimating
the probability of an event.
✓ The subjective approach is always based on the degree of beliefs, convictions, and
experience concerning the likelihood of occurrence of a random event.
Example: Estimating:
i) The probability that a new product of a firm will be successful in the market.
ii) The probability that a student will score an A in the course, and
iii) When a person says that the probability of rain tomorrow is, say 70% - he is
expressing his personal degree of belief.
1.3. Axioms of Probability: The Rule of probability
✓ Before giving the axiomatic definition of probability, let's first review the
elements of set theory that are relevant to our purpose.
De Morgan’s Law
I) (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ

II) (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

Example: Suppose Mr. X is interviewed for two Jobs(Bank trainee and


salesman).
Let A = Mr. X will be offered Job Bank Trainee
B = Mr. X will be offered Job Salesman
P(A) = 0.30; P(B)=0.50 and P(A∩ B) = 0.10
Questions: What is the probability that Mr. X
i) will be offered neither Bank Trainee nor Salesman?
ii) will not be offered both jobs at the same time?
(Use a Venn diagram for illustration)
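A sketch of the computation for these two questions, using the addition rule and De Morgan's laws (the probabilities are those given in the example above):

```python
# Given probabilities from the Mr. X example.
p_A = 0.30          # offered the bank trainee job
p_B = 0.50          # offered the salesman job
p_A_and_B = 0.10    # offered both

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
p_A_or_B = p_A + p_B - p_A_and_B

# (i) Neither job: by De Morgan, A'∩B' = (A ∪ B)', so take the complement.
p_neither = 1 - p_A_or_B

# (ii) Not both jobs at the same time: A'∪B' = (A ∩ B)'.
p_not_both = 1 - p_A_and_B

print(round(p_neither, 2), round(p_not_both, 2))   # 0.3 0.9
```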
Axiomatic Definition of Probability:
✓ The three Axioms of probability are:
i) P(A) ≥ 0 for every A ∈ S or Ω.
ii) P(S) or P(Ω) = 1
iii) P(𝐴1 ∪ 𝐴2 ∪ 𝐴3 ∪...𝐴𝑛 ) = P(𝐴1 ) + P(𝐴2 ) + P(𝐴3 ) +…P(𝐴𝑛 )
given 𝐴1 , 𝐴2 ,..,𝐴𝑛 are mutually exclusive or disjoint events i.e.
𝑨𝒊 ∩ 𝑨𝒋 = ∅.
OR P(⋃ᵢ₌₁^∞ Aᵢ) = Σᵢ₌₁^∞ P(Aᵢ) for a countable collection of mutually exclusive events.
Illustration: If S or Ω is a finite sample space, then if each outcome is
equally likely, we define the probability of A as the fraction of outcomes
that are in A.
P(A) = N_A / N
✓ Thus, computing P(A) means counting the number of outcomes in the
event A and the number of outcomes in the sample space S or Ω and
dividing.
✓ This simple formula provides the first two axioms.
Questions for Exercise
Find the probabilities under equally likely outcomes.
i) Probability of a head in a single coin toss?
ii) Probability of at least two heads in tossing a coin three times?
iii) Probability of a sum of 9 in rolling a die twice?
iv) Probability of exactly three heads in tossing a coin four times?
v) Probability of exactly four heads in tossing a coin four times?
vi) Probability of at least three heads in tossing a coin four times?

NOTE: The answers to the last three questions (exactly three, exactly four,
and at least three heads in four tosses) illustrate the third axiom.
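The exercise probabilities can be checked by brute-force enumeration under equally likely outcomes (a sketch; the helper `prob` is defined here for illustration, not part of the slides):

```python
from itertools import product
from fractions import Fraction

def prob(event, space):
    """Classical probability: outcomes satisfying `event` over the whole space."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

coin1  = list(product("HT", repeat=1))
coins3 = list(product("HT", repeat=3))
coins4 = list(product("HT", repeat=4))
dice2  = list(product(range(1, 7), repeat=2))

print(prob(lambda w: w.count("H") == 1, coin1))    # 1/2
print(prob(lambda w: w.count("H") >= 2, coins3))   # 1/2
print(prob(lambda w: sum(w) == 9, dice2))          # 1/9
p3 = prob(lambda w: w.count("H") == 3, coins4)     # 1/4
p4 = prob(lambda w: w.count("H") == 4, coins4)     # 1/16
p_atleast3 = prob(lambda w: w.count("H") >= 3, coins4)

# Disjoint events add (third axiom): P(exactly 3) + P(exactly 4) = P(at least 3).
assert p3 + p4 == p_atleast3
print(p3, p4, p_atleast3)                          # 1/4 1/16 5/16
```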



Rules(Properties) of Probability
✓ Other properties that we associate with a probability can be derived from the axioms.
1) The Complement Rule:
✓ Sometimes it is easier to calculate the probability of an event happening by determining
the probability of it not happening and subtracting the result from 1.
✓ Because A and its complement 𝐴𝐶 = {ω; ω ∉ A} are mutually exclusive and
collectively exhaustive, the probabilities of A and 𝐴𝐶 sum to 1.
P(A) + P(𝑨𝑪 ) = P(A ∪ 𝑨𝑪 ) = P(Ω) = 1
or P(𝑨𝑪 ) = 1 − P(A).
Example_1: if we toss a biased coin, we may want to say that P{heads} = p where p is not
necessarily equal to 1/2.
By necessity, P{tails} = 1 − p.
Example_2: Toss a coin 4 times.
P{fewer than 3 heads} = 1 − P{at least 3 heads} = 1 − 5/16 = 11/16



2) The Difference Rule:
✓ We write B\A to denote the outcomes that are in B but not in A.
✓ If A ⊂ B, then
P(B \ A) = P(B) − P(A)
(The symbol ⊂ denotes "is contained in".) A and B\A are mutually exclusive
and their union is B.
Thus, P(B) = P(A) + P(B\A).
Example_1: In A single die roll: let
A: an event of odd number
B: a number less than or equal to five
Solution:
A = {1, 3, 5} B = {1, 2, 3, 4, 5}
B/A = {2, 4}
P(B) = P(A) + P(B\A)

Thus, 5/6 = 1/2 + 1/3
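The difference-rule example can be verified directly with sets (a minimal sketch):

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}     # single die roll, equally likely outcomes
A = {1, 3, 5}              # odd number
B = {1, 2, 3, 4, 5}        # less than or equal to five

def P(E):
    return Fraction(len(E), len(S))

assert A <= B                      # the rule requires A ⊂ B
assert P(B - A) == P(B) - P(A)     # difference rule: P(B \ A) = P(B) − P(A)
print(P(B - A))                    # 1/3
```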


Rules of Addition
3) Special Rule of Addition
✓ The special rule of addition is used when events are mutually exclusive.
✓ Recall that mutually exclusive means that when one event occurs, none of
the other events can occur at the same time.
✓ An illustration of mutually exclusive events in the die-tossing experiment is
the events “a number 4 or larger” and “a number 2 or smaller.” If the
outcome is in the first group {4, 5, and 6}, then it cannot also be in the
second group {1 and 2}.
𝑷(𝑨 𝒐𝒓 𝑩) = 𝑷(𝑨) + 𝑷(𝑩)
✓ For three mutually exclusive events designated A, B, and C, the rule is
written:
P(A or B or C) = P(A) + P(B) + P(C)
Example_1: What is the probability of either head or tail in a single toss of
coin?
Solution:
P(Head or Tail) = P(Head) + P(Tail) = ½ + ½ = 1 = P(S)
4) The General Rule of Addition:
✓ The outcomes of an experiment may not be mutually exclusive.
✓ For example, let’s assume that Ethiopian tourism bureau reports that 400
tourists visited two Ethiopian tourist sites during the year 2020. The report also
revealed that 240 tourists went to Lalibela and 200 went to Sof Omar caves.
Question: What is the probability that a person selected visited either Lalibela or
Sof Omar Caves?
Solution:
✓ If the special rule of addition is used, the probability of selecting a tourist
who went to Lalibela is 0.60, found by 240/400.
✓ Similarly, the probability of a tourist going to Sof Omar caves is .50.
✓ The sum of these probabilities is 1.10.
✓ We know, however, that this probability cannot be greater than 1.
✓ The explanation is that many tourists visited both sites and are being counted
twice!
❑ A check of the survey responses revealed that 120 out of 400 sampled did,
in fact, visit both attractions.
✓ To answer our question, “What is the probability a selected person
visited either Lalibela or Sof Omar caves?” (1) add the probability that a
tourist visited Lalibela and the probability he or she visited Sof Omar caves,
and (2) subtract the probability of visiting both.
➢ Thus: P(Lalibela or Sof Omar caves) = P(Lalibela) + P(Sof Omar caves)
- P(both Lalibela and Sof Omar caves)
= 0.6 + 0.50 – 0.30
= 0.8
✓ When two events both occur, the probability is called a joint probability.
✓ The probability (.30) that a tourist visits both sites is an example of a joint
probability.
❑ So the general rule of addition, which is used to compute the probability
of two events(A, B) that are not mutually exclusive, is:
𝑷(𝑨 𝒐𝒓 𝑩) = 𝑷(𝑨) + 𝑷(𝑩) – 𝑷(𝑨 𝒂𝒏𝒅 𝑩)



Example_1: What is the probability that a card chosen at random from a standard
deck of cards will be either a king or a heart?
Solution:
✓ There are 4 cards with a king (K) in a deck of 52 cards.
✓ There are 13 cards with a heart in a deck of 52 cards.
✓ In the thirteen(13) heart cards there is one card with a king(K).
❑ Thus, P(king) = 4/52; P(Heart)= 13/52 and P(King and heart) = 1/52

P(either a King or a heart) = P(King) + P(Heart) − P(King and Heart)
= 4/52 + 13/52 − 1/52 = 16/52 ≈ 0.3077
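The king-or-heart calculation can be confirmed by listing the whole deck (the rank and suit names are illustrative):

```python
from itertools import product
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck  = list(product(ranks, suits))                    # 52 equally likely cards

king_or_heart = [c for c in deck if c[0] == "K" or c[1] == "hearts"]

# General addition rule: P(K or H) = 4/52 + 13/52 − 1/52 = 16/52.
lhs = Fraction(len(king_or_heart), len(deck))
rhs = Fraction(4, 52) + Fraction(13, 52) - Fraction(1, 52)
assert lhs == rhs
print(lhs, float(lhs))   # 4/13 ≈ 0.3077
```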

Exercises: The events X and Y are mutually exclusive. Suppose P(X) = .05
and P(Y) = .02.
i) What is the probability of either X or Y occurring?
ii) What is the probability that neither X nor Y will happen?
Rules of Multiplication
5) The Special Rule of Multiplication
✓ The special rule of multiplication requires that two events A and B are
independent.
✓ Two events are independent if the occurrence of one event does not alter
the probability of the occurrence of the other event.
Example: suppose two coins are tossed. The outcome of a coin toss (head or
tail) is unaffected by the outcome of any other prior coin toss (head or tail).
❑ For two independent events A and B, the probability that A and B will both
occur is found by multiplying the two probabilities.
❑ This is the special rule of multiplication and is written symbolically as:
𝑷(𝑨 𝒂𝒏𝒅 𝑩) = 𝑷(𝑨)𝑷(𝑩)
❑ For three independent events, A, B, and C, the special rule of multiplication
used to determine the probability that all three events will occur is:
𝑷(𝑨 𝒂𝒏𝒅 𝑩 𝒂𝒏𝒅 𝑪) = 𝑷(𝑨)𝑷(𝑩)𝑷(𝑪)



Example_1: A survey by the American Automobile Association (AAA)
revealed 60% of its members made airline reservations last year. Two members
are selected at random. What is the probability both made airline reservations
last year?
Solution:
✓ The probability the first member made an airline reservation last year is .60,
written P(𝑅1 ) = 0.60, where 𝑅1 refers to the fact that the first member made a
reservation. The probability that the second member selected made a
reservation is also .60, so P(𝑅2 ) = 0.60.
✓ Because the number of AAA members is very large, you may assume that 𝑹𝟏
and 𝑹𝟐 are independent. Consequently, using the previous formula, the
probability they both make a reservation is 0.36, found by:
P(R₁ and R₂) = P(R₁)P(R₂) = (0.60)(0.60) = 0.36



✓ Furthermore, recall R indicates that a reservation is made and 𝑅ത indicates
no reservation is made.
✓ The complement rule is applied to compute the probability that a member
does not make a reservation: P(R̄) = 0.40.
✓ Using this information, the probability that neither member makes a
reservation is P(R̄₁)P(R̄₂) = (0.40)(0.40) = 0.16.

6) The General Rule of Multiplication:


✓ If two events are not independent, they are referred to as dependent.
Example: A player has 12 golf shirts in his closet. Nine of these
shirts are white and the others blue. He gets dressed in the dark, so he just
grabs a shirt and puts it on. He plays golf two days in a row and does not
launder and return the used shirts to the closet (selection without
replacement). What is the likelihood both shirts selected are white?



Solution:
The event that the first shirt selected is white is W1. The probability is
P(W1) = 9/12 because nine of the 12 shirts are white. The event that
the second shirt selected is also white is identified as W2. The
conditional probability that the second shirt selected is white, given
that the first shirt selected is also white, is P(W2 | W1) = 8/11. Why is
this so?
✓ Because after the first shirt is selected, there are only 11 shirts
remaining in the closet and eight of these are white.
✓ To determine the probability of two white shirts being selected, we
use the general rule of multiplication:
P(W1 and W2) = P(W1)P(W2 | W1) = (9/12)(8/11) ≈ 0.55
❑ Thus, the general rule of multiplication for two events is given as:
P(A and B) = P(A)P(B|A)
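The golf-shirt example can be sketched both exactly and by simulation (the trial count and seed are arbitrary choices):

```python
from fractions import Fraction
import random

# Exact: general rule of multiplication, P(W1 and W2) = P(W1) · P(W2 | W1).
p_exact = Fraction(9, 12) * Fraction(8, 11)       # 6/11 ≈ 0.545

# Sanity-check by simulating draws without replacement.
random.seed(1)
closet = ["white"] * 9 + ["blue"] * 3
trials = 50_000
hits = sum(random.sample(closet, 2) == ["white", "white"] for _ in range(trials))
print(p_exact, hits / trials)   # the simulated proportion hovers near 6/11
```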



1.4 Counting Procedures
✓ If the number of possible outcomes in an experiment is small, it is
relatively easy to list and count all the possible events and assign
probability to all events.
❖ However, if the size of the event, say N(A), and the size of the sample
space, N(S), are large for a given random experiment with a finite
number of equally likely outcomes, the counting procedure becomes a
difficult problem.
✓ Such counting is usually handled by use of counting procedures known as
combinatorial formulas.
❖ There are three widely used counting (Enumeration) methods namely,
1) Multiplication Principle
2) Permutations
3) Combinations



1) Multiplication Principle
Suppose we have two sets F and T, if F has m distinct objects f1, f2, . . . fm
and T has n distinct objects, t1, t2, . . . tn, then the number of pairs ( fi, tj) that
can be formed by taking one object from set F and a second from the set T is
(m)(n).
Example 1: Let the random experiment be throwing a balanced coin and
fair die and record the paired outcomes. What is the total possible outcome of the
experiment?
Solution: If set F contains the possible outcomes of throwing a coin, its
elements are {H, T} and N(F) = 2; if set T contains the possible outcomes
of throwing a die, its elements are {1, 2, 3, 4, 5, 6} and N(T) = 6. The
outcomes of our random experiment are obtained by the Cartesian product
of the two sets, {H, T} × {1, 2, 3, 4, 5, 6}.
❖Thus, the total number of paired outcomes is 2 × 6 = 12.



❑ In general, if there are m ways of doing one thing and n ways of
doing another, there are (m)(n) ways of doing both.
Multiplication formula: Number of arrangements = (m)(n)
Note: This principle obviously can be extended to any number of
procedures.
✓ That is, if there are m ways of doing one thing, n ways of doing
another, and r ways of doing a third, then the total number of
arrangements is given by (m)(n)(r).
Example: A manufactured item must pass through three controls
stations. At each station the item is inspected for particular
characteristics and marked accordingly. At the first station, 3 ratings are
possible while at the last two stations 4 ratings are possible.
In how many possible ways may the item be marked?
Solution: 3 x 4 x 4 = 48 possible ways
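Both multiplication-principle examples can be enumerated directly (a sketch):

```python
from itertools import product

# Coin-and-die experiment: m·n = 2·6 = 12 paired outcomes.
pairs = list(product(["H", "T"], [1, 2, 3, 4, 5, 6]))

# Three inspection stations with 3, 4, and 4 possible ratings: 3·4·4 = 48 markings.
markings = list(product(range(3), range(4), range(4)))

print(len(pairs), len(markings))   # 12 48
```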



2) Permutation Rule
✓ This rule of counting involves ordering or permutations. This rule helps us to
compute the number of ways in which n distinct objects can be arranged,
taking r of them at a time.
Suppose that we have n different objects. Then the question is in how many
ways may these objects be arranged (Permuted ) where the order of
arrangement is important. That is ABC and CBA are considered as two
different arrangements.
❑ Let’s consider the following scheme. Arranging n objects is equivalent to
filling them into a box with n compartments in some specified order.
n n-1 … 2 1
✓ We have n choices to fill the first compartment. Once we choose any one of n
objects to fill the first compartment we will have ( n-1) options to fill the
second compartment etc. and for the last compartment we have exactly one
option, the total number of arrangements denoted by n Pn, is given by:
nPn = n(n − 1)(n − 2)(n − 3) … (2)(1) = n!
Example: Five students can line up in 5! ways, which is 120 different ways.
❑ Besides, we may be interested in the number of permutations possible
when we choose r (<n) objects from n. That is an arrangement
(Permutation ) of n objects taking r objects at a time. Like the above
case the first compartment position can be filled in n ways, the second
in ( n-1) ways etc, and the last compartment in (𝒏 − 𝒓 + 𝟏) ways.
✓ Hence the total permutation of n objects taken r at a time is:
nPr = n(n − 1)(n − 2)(n − 3)…(n − r + 1) = n!/(n − r)!
Example: A manufacturer uses a color code to identify lots of
manufactured items. The code consists of stamping seven colored stripes
on the container. The order of the color is significant and each
identification uses all 7 colors.
Suppose that the manufacturer wishes to use seven stripes but has 15
colors available. How many distinct markings can he get?
Solution:
15P7 = 15!/(15 − 7)! = 15!/8! = 32,432,400
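The same count can be reproduced in Python; `permutations_count` is a throwaway helper name, and `math.perm` (Python 3.8+) gives the result directly:

```python
from math import factorial, perm

# nPr = n! / (n - r)!: ordered arrangements of r objects chosen from n
def permutations_count(n, r):
    return factorial(n) // factorial(n - r)

# 7 colored stripes chosen (in order) from 15 available colors
print(permutations_count(15, 7))   # 32432400
print(perm(15, 7))                 # same result via math.perm
```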

3) Combination Rule
✓ Sometimes the ordering or arrangement of objects is not important, but
only the objects that are chosen. For example, we may not care in what
order the books are placed on the shelf, but only which books you are able
to shelve. In addition, when a five-person committee is chosen from a
group of 10 students, the order of choice is not important because all 5
students will be equal members of committee.
❑ This counting rule for combinations allows us to select r (say) number of
outcomes from a collection of n distinct outcomes without caring in what
order they are arranged. This rule is denoted by
nCr or (n r) = n!/((n − r)! r!)
Example 1: From eight persons, how many committees of 3 members
may be chosen?
Solution: Since two committees are the same if they are made up of the same
members we have,
8C3 = 8!/((8 − 3)! 3!) = 8!/(5! 3!) = 56 different committees
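A quick Python check of the combination rule (the second line previews Exercise 1, assuming a committee of 2 males chosen from 5 and 1 female from 3):

```python
from math import comb

# 8C3: committees of 3 chosen from 8 people, order irrelevant
print(comb(8, 3))                 # 56

# Exercise 1 sketch: 2 of 5 males and 1 of 3 females
print(comb(5, 2) * comb(3, 1))    # 30
```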

Exercise 1: A class consists of eight PhD students, 5 males and 3 females. A committee consisting of two males and one female is required by an office. How many possible committees are there?
Exercise 2: In how many possible ways can a three-digit car plate be made from the digits 0 to 9, given that the plate ends with the single letter A?
Exercise 3: Three items out of ten are defective but it is not known
which are defective. In how many ways can three items be selected? How
many of these selections will include at least one defective item?
1.5 Conditional Probability and Independence
✓ When two events happen, the outcome of the first event may or may
not have an effect on the outcome of the second event. That is, the
events may be either dependent or independent.
Definition: Statistical independence is the case when the occurrence
of an event has no effect on the probability of the occurrence of any other
event. On the other hand, statistical dependence exists when the
probability of some event is dependent up on or affected by the occurrence
of some other event.
✓ In this section, we examine events that are statistically independent.
❑ There are 3 types of probabilities under statistical independence:
1) Marginal(unconditional) Probability
2) Joint Probability
3) Conditional Probability
1.5.1 Conditional Probability
❑ An experiment is repeated N times and on each occasion we observe the
occurrence or non-occurrence of two events, say, A and B.
✓ Given these two events we are interested to know the probability of
event A given that event B has occurred.
❑ This probability is known as conditional probability of event A, given
that B has occurred and denoted by P(A/B). Similarly conditional
probability of event B, given that A has occurred is expressed as
P(B/A).
❑ The conditional probability of events can be computed under statistical
dependence and statistical independence.
Conditional Probability Under Statistical Independence
✓ For statistically independent events, the conditional probability of event
B given that event A has occurred is simply the probability of event B:
𝑷 (𝑩/𝑨) = 𝑷 (𝑩).
❑ Thus, events A and B are defined to be independent if and only if any of the
following conditions are satisfied.
1) P(A ∩ B) = P(A) ∗ P(B)
2) 𝑷 (𝑨/𝑩) = 𝑷 (𝑨)
3) 𝑷 (𝑩/𝑨) = 𝑷 (𝑩)
Example 1: What is the probability that the 𝟐𝒏𝒅 toss of a fair coin will result
in heads, given that heads resulted on the first toss?
Solution: In this case the two events are independent.
Symbolically: the question is written as: 𝑃(𝐻2 /𝐻1 ) ??
✓ Using conditional probability under statistically independent situation,
𝑷(𝑯𝟐 /𝑯𝟏 ) = P (𝑯𝟐 )
✓ Thus, 𝑷(𝑯𝟐 /𝑯𝟏 ) = 0.5
Implication: In the case of independent events the probability of occurrence
of either of the events does not depend or affect the occurrence of the others.
✓ Therefore, in the coin tossing example, the probability of a head occurrence in the second toss, given that head resulted in the first toss, is still 0.5.
Example 2: Suppose we have two red and three white balls in a bag.
Draw a ball with replacement. Are these events independent?
Solution: Let A = the event that the first draw is red.
B = the event that the second draw is red.
P(A/B) = P(A) = 2/5 = 0.4
P(B/A) = P(B) = 2/5 = 0.4
✓ Since the draw is made with replacement, neither draw affects the other; thus, events A and B are independent events.
Example 3: Two computers A and B are to be marketed. A salesman who
is assigned the job of finding customers for them has 60 percent and 40 percent
chances, respectively of succeeding for computers A and B. The two
computers can be sold independently. Given that he was able to sell at least
one computer, what is the probability that computer A has been sold?
Solution: Let us define the events as:
A = Computer A is marketed and
B = Computer B is marketed.
❑ It is given that P(A) = 0.60,
P(B) = 0.40 and
𝑃 𝐴 𝑎𝑛𝑑 𝐵 𝒐𝒓 𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐴 𝑃 𝐵
= 0.60 × 0.40 = 𝟎. 𝟐𝟒
❑ Hence, the probability that computer A has been sold given that the
salesman was able to sell at least one computer is given by:
P(A/A∪B) = P{A ∩ (A∪B)}/P(A∪B) = P(A)/P(A∪B) = P(A)/[P(A) + P(B) − P(A∩B)]
= 0.60/(0.60 + 0.40 − 0.24) = 0.789
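The computation in Example 3 can be verified numerically; the variable names below are illustrative only:

```python
# Example 3 check: P(A | A U B) for independent sales events
p_a, p_b = 0.60, 0.40
p_a_and_b = p_a * p_b                # independence: 0.24
p_a_or_b = p_a + p_b - p_a_and_b     # addition rule: 0.76
p_a_given_union = p_a / p_a_or_b     # P(A)/P(A U B)
print(round(p_a_given_union, 3))     # 0.789
```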
Exercise 1: A bag contains 3 red balls, 2 blue balls and 5 white
balls. A ball is selected and its color noted. Then it is replaced. A second
ball is selected and its color noted. Find the probability of each of these.
i) Selecting 2 blue balls
ii) Selecting 1 blue ball and then 1 white ball
iii) Selecting 1 red ball and then 1 blue ball
Exercise 2: Approximately 9% of men have a red-green color
blindness. If 3 men are selected at random, find the probability that all
of them will have this type of red-green color blindness.
Exercise: 3: What is the probability that a couple’s second child
will be:
a) A boy, given that their first child was a girl?
b) A girl, given that their first child was a girl?
Conditional Probability Under Statistical Dependence
Definition: Suppose two events A and B are in the sample space S. If it is known
that an element randomly drawn from S belongs to B, then the probability that it also
belongs to A is defined to be the conditional probability of A given B.
✓ In this case, the occurrence of event A depends on the occurrence of event B.
❑ Thus, this concept answers the question: what is the probability that event A
occurs, knowing that event B has already occurred.
❑ Symbolically, P(A/B) = P(A∩B)/P(B), if P(B) > 0
✓ In a Venn diagram, P(A/B) corresponds to the share of the region B that is also covered by A.
Example 1: A box contains black chips and white chips. A person selects two chips
without replacement. If the probability of selecting a black chip and a white chip is 𝟎. 𝟐𝟔𝟖,
and the probability of selecting a black chip on the first draw is 𝟎. 𝟑𝟕𝟓, find the probability of
selecting the white chip on the second draw, given that the first chip selected was a black
chip.
Solution: Let B = selecting a black chip
W = selecting a white chip, then,
P(W/B) = P(B and W)/P(B) = 0.268/0.375 ≈ 0.71
Example 2: A market survey was conducted in four cities to find out the preference for a
soap brand marked as XX. The responses are shown below:
Response City A City B City C City D
Yes 45 55 60 50
No 35 45 35 45
No opinion 5 5 5 5
Given the information, what is the probability that
A) a consumer selected at random, preferred brand XX?
B) a consumer preferred brand XX and was from city C?
C) a consumer preferred brand XX given that he was from city C?
D) a consumer was from city D given that consumer preferred brand XX?
Solution: Lets first summarize the given information as follow.
Response City A City B City C City D Total
Yes 45 55 60 50 210
No 35 45 35 45 160
No opinion 5 5 5 5 20
Total 85 105 100 100 390
✓ Now, Let E denote the event that a consumer selected at random preferred brand XX.
Then,
A) The probability that a consumer selected at random preferred brand XX is:
P(E) = 210/390 = 0.5385
B) The probability that a consumer preferred brand XX and was from city C
is:
P(E ∩ C) = 60/390 = 0.1538
C) The probability that a consumer preferred brand XX, given that he was
from city C:
P(E/C) = P(E and C)/P(C) = (60/390)/(100/390) = 60/100 = 0.60
D) The probability that the consumer belongs to city D, given that he
preferred brand XX,
P(D/E) = P(D and E)/P(E) = (50/390)/(210/390) = 50/210 = 0.238
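The four survey probabilities can be recomputed from the table; the dictionary layout below is just one convenient encoding of the counts:

```python
# Brand-XX survey counts: "Yes" responses and column totals per city
yes = {"A": 45, "B": 55, "C": 60, "D": 50}
total_by_city = {"A": 85, "B": 105, "C": 100, "D": 100}
n = 390                                      # grand total of respondents

p_e = sum(yes.values()) / n                  # P(E) = 210/390
p_e_and_c = yes["C"] / n                     # P(E and C) = 60/390
p_e_given_c = p_e_and_c / (total_by_city["C"] / n)   # P(E|C)
p_d_given_e = (yes["D"] / n) / p_e           # P(D|E)
print(round(p_e, 4), round(p_e_and_c, 4),
      round(p_e_given_c, 2), round(p_d_given_e, 3))
```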
Exercise 1: Two events, A and B, are statistically dependent, If 𝑷 (𝑨) = 𝟎. 𝟑𝟗,
𝑷 (𝑩) = 𝟎. 𝟐𝟏, and 𝑷(𝑨 𝒐𝒓 𝑩) = 𝟎. 𝟒𝟕, find the probability that,
i. Neither A nor B will Occur?
ii. Both A and B will occur?
iii. B will occur given that A has occurred?
iv. A will occur, given that B has occurred?
Exercise 2: The personnel department of a given company has records of its 200 engineers as below.
Age        BA Degree   MSc Degree
Under 30       90          10
30 to 40       20          30
Over 40        40          10

✓ If one engineer is selected at random from the company, find the probability that,
A) he has only a bachelor’s degree.
B) he has a master’s degree, given that he is over 40.
C) he is under 30, given that he has only a bachelor’s degree.
END !
Special Probability Distributions & Densities
Some Special Discrete Probability Distributions

 There are many special discrete probability distributions that have different applications and

properties. Some examples are:

 The Bernoulli Distribution

 The Binomial Distribution

 The Hypergeometric Distribution

 The Poisson Distribution


The Bernoulli Distribution
 Bernoulli process: A process in which each trial has only two possible outcomes, the probability
of the outcome at any trial remains fixed over time, and the trials are statistically independent.
 The Bernoulli distribution, which models a single trial with two possible outcomes, such as
success or failure.
 A random variable X is defined to have a Bernoulli distribution if the discrete density function (PMF) of X is given by:
f(x) = p^x (1 − p)^(1−x), x = 0, 1
 Where p (0 ≤ p ≤ 1) is the parameter that the distribution of X depends on.
Alternatively, let q = 1 − p; then f(x) = p^x q^(1−x), x = 0, 1.
 Bernoulli Trial is an experiment with only two possible outcomes: success and failure (Boy or Girl, dead or alive, adopt or not adopt, fail or pass etc.).
 The Bernoulli distribution arises when the following three conditions are satisfied.
 Each trial result in just two outcomes.
 The probability of success, p, is the same for each trial.
 The trials are independent.
 Example 1: Consider tossing of a coin and let X = an event of observing head.
Outcomes   X   P(xi)
Tail       0   1/2
Head       1   1/2
 From the outcomes, we can calculate,
 P(X = 1) = p^1 (1 − p)^0 = p = 1/2. And,
 P(X = 0) = p^0 (1 − p)^1 = 1 − p = 1/2

 Mean and Variance:
 Suppose X has a Bernoulli distribution with parameter p, then
E(X) = Σ(x=0 to 1) xi f(xi) = 0(1 − p) + 1(p) = p, and
V(X) = E(X²) − [E(X)]² = p − p² = p(1 − p) = pq, where q = 1 − p
 Example 2: If a die is tossed once, find the distribution, mean and variance if the value 4 occurs.
 Solution: Here p = 1/6 and f(xi) = (1/6)^x (5/6)^(1−x), x = 0, 1
 E(X) = p = 1/6, and
V(X) = pq = (1/6)(5/6) = 5/36

ii. Compute the first four moments about the mean for this Bernoulli distribution.
 Questions
I. How many possible outcomes can there be for a Bernoulli trial ?
II. In a Bernoulli trial, if the probability of success is p, what will be the probability of failure?
III. In a toss of a single fair die, compute the mean and variance of the distribution if an
even number occurs.
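A minimal sketch of the Bernoulli pmf and its moments, using the die example above (`bernoulli_pmf` is a hypothetical helper name):

```python
# Bernoulli pmf: f(x) = p**x * (1 - p)**(1 - x), x in {0, 1}
def bernoulli_pmf(x, p):
    return p**x * (1 - p)**(1 - x)

p = 1 / 6                      # success = rolling a 4 on a fair die
mean = p                       # E(X) = p
variance = p * (1 - p)         # V(X) = pq
print(bernoulli_pmf(1, p))     # 1/6
print(mean, variance)          # 1/6 and 5/36
```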
The Binomial Distribution

 A random variable X is defined to have a binomial distribution if the discrete density function (PMF) of X is given by:
f(x) = (n x) p^x (1 − p)^(n−x), x = 0, 1, …, n
 where: (n x) = nCx denotes a combination; n is the number of trials; x is the random variable defined as the number of successes; and p is the probability of a success on each trial.
 A Binomial Experiment is any experiment that can be considered as a sequence of
n trials in which,
i. The number of trials 'n' is finite and determined before the experiment
begins,
ii. Each trial results in two mutually disjoint outcomes, termed as success
and failure,
iii. The random variable is the number of successes in a fixed number of trials.
iv. The result of any trial is independent of the results of all other trials,
v. The probability of success and failure does not change from trial to trial.
 Illustration: Let’s consider a toss of a coin two times.
 Possible outcomes: {HH, HT, TH, TT}
 Now let’s consider getting a head as the success. Count the number of successes in each possible outcome. X = number of heads; what is the probability that X = 0, X = 1, and X = 2?
X = No. of heads   P(xi)
0                  1/4
1                  1/2
2                  1/4

 As the experiment is a binomial experiment, we can determine the probability of success simply using the binomial formula as follows:
f(x) = (n x) p^x (1 − p)^(n−x)
 Given: x = 1 (getting one head), number of trials n = 2, and p = 0.5, the probability of success in a single trial:
f(1) = (2 1)(0.5)^1 (0.5)^1 = 0.5
 Thus, the probability of observing exactly one head in an experiment of tossing a coin twice is 0.5.
 The same way, the probability of getting no head can be found using:
f(0) = (2 0)(0.5)^0 (0.5)^2 = 0.25

 Example 1: Tossing a coin three times.
i. What is the probability of getting exactly two heads?
ii. What is the probability of getting at least two heads?
• Solution:
i. Here the success is getting two heads, so n = 3, x = 2, p = 0.5.
f(2) = (3 2)(0.5)^2 (0.5)^1 = 3/8
ii) P(X ≥ 2) = P(X = 2) + P(X = 3)
 Thus, P(X ≥ 2) = 3/8 + 1/8 = 1/2.

• Exercise 1: A and B play a game in which A’s chance of winning is 2/3.
• In a series of 8 games, what is the probability that A will win at least 6 games?
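The binomial pmf, the coin example, and a numerical answer to Exercise 1 can be sketched as follows (the helper name is illustrative):

```python
from math import comb

# Binomial pmf: P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)
def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Coin tossed three times
print(binom_pmf(2, 3, 0.5))                          # 0.375 = 3/8
print(binom_pmf(2, 3, 0.5) + binom_pmf(3, 3, 0.5))   # 0.5, P(X >= 2)

# Exercise 1: P(A wins at least 6 of 8 games), p = 2/3
p_at_least_6 = sum(binom_pmf(x, 8, 2 / 3) for x in (6, 7, 8))
print(round(p_at_least_6, 4))
```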
 Mean and Variance: Let X be a binomially distributed random variable with parameters p and n, denoted by X ~ B(n, p).
 Then, E(X) = Σ(x=0 to n) xi f(xi) = Σ(x=0 to n) xi (n x) p^x (1 − p)^(n−x) = np
V(X) = E(X²) − [E(X)]² = npq, where q = 1 − p
 Example 1: If the probability of defective bolt is 0.1. Find:
(a) the mean and standard deviation for the distribution of defective bolts in a total of
500 bolts,
(b) the distribution of the random variable, and
(c) The first four moments about the mean?
 Solution:
a) p = 0.1, n = 500, q = 0.9
Then, mean = np = 500 × 0.1 = 50 and variance = npq = 500 × 0.1 × 0.9 = 45
 Therefore, standard deviation = √45 ≈ 6.71
b) Since the variable follows a binomial distribution, it becomes:
f(x) = (500 x)(0.1)^x (0.9)^(500−x), x = 0, 1, …, 500
The Hypergeometric Distribution

 For the binomial distribution to be applied, the probability of a success must stay the same

for each trial.

 For example, the probability of guessing the correct answer to a true/false question is 0.50.

This probability remains the same for each question on an examination.

 Most sampling, however, is done without replacement. Thus, if the population is small, the

probability of a success will change for each observation. When the population is finite and the

sampling is done without replacement, so that the events are stochastically dependent,

although random, we obtain hyper geometric distribution.


 The hypergeometric distribution is used when,

1) The sample is selected from a finite population without replacement and,

2) If the size of the sample n is more than 5% of the size of the population N.

 A random variable X is defined to have a Hypergeometric Distribution if the discrete density function of X is given by:
P(x) = C(S, x) C(N − S, n − x) / C(N, n), for x = 0, 1, …, n
 Where: N is the size of the population; S is the number of successes in the
population; x is the number of successes in the sample: It may be 0, 1, 2, 3, . . . ; and
n is the size of the sample or the number of trials.
 In summary, a hypergeometric probability distribution has these characteristics:
 An outcome on each trial of an experiment is classified into one of two mutually
exclusive categories: a success or a failure.
 The random variable is the number of successes in a fixed number of trials.
 The trials are not independent.
 We assume that we sample from a finite population without replacement and n/N > 0.05. So, the probability of a success changes for each trial.
 Example: From a basket of 10 items, containing 3 defectives a sample of 4 items is drawn at
random, without replacement.
 Let the random variable X denote the number of defective items in the sample. Find,
a) the probability of obtaining exactly 2 defective items,
b) P(X ≤ 1).
 Solution:
 Given N = 10; S = 3; n = 4,
a) P(2) = C(S, x) C(N − S, n − x) / C(N, n) = C(3, 2) C(7, 2) / C(10, 4) = (3 × 21)/210 = 0.3
b) P(X ≤ 1) = [C(3, 0) C(7, 4) + C(3, 1) C(7, 3)] / C(10, 4) = (35 + 105)/210 = 2/3
 Example 2: The college of Business and Economics have 50 academic staff. 40 of
them are in the department of Economics. The dean office wants to form a
committee of 5 persons for overseeing the operation of the Staff Lounge. What is the
probability that 4 of the selected committees will be from the department of
Economics?
 Solution:
 Given N = 50; S = 40; n = 5; x = 4.
 Then, P(4) = C(40, 4) C(10, 1) / C(50, 5) = (91,390 × 10)/2,118,760 = 0.431
 Thus, the probability of four members of the Economics department being selected for the committee is 43.1%.
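Both hypergeometric examples can be verified with a short sketch (`hypergeom_pmf` is an illustrative helper name):

```python
from math import comb

# Hypergeometric pmf: P(X = x) = C(S, x) * C(N - S, n - x) / C(N, n)
def hypergeom_pmf(x, N, S, n):
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)

# Example 1: N = 10 items, S = 3 defective, sample n = 4
print(hypergeom_pmf(2, 10, 3, 4))              # 0.3

# Example 2: N = 50 staff, S = 40 economists, committee of n = 5
print(round(hypergeom_pmf(4, 50, 40, 5), 3))   # 0.431
```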
 Mean and Variance: Let X be a hypergeometrically distributed random variable with the previously given discrete density function, then:
 Mean = E(X) = Σ(x=0 to n) xi P(xi) = nS/N
 Then, if we denote p = S/N, E(X) = np
 V(X) = E(X²) − [E(X)]². Knowing that E(X) = nS/N, upon lengthy simplification of the expressions,
V(X) = np(1 − p)(N − n)/(N − 1)
The Poisson Distribution
 The Poisson probability distribution describes the number of times some event occurs during
a specified interval. Examples of an interval may be time, distance, area, or volume.
 The distribution is based on two assumptions:
1) The probability is proportional to the length of the interval
2) The intervals are independent
 To put it another way, the longer the interval, the larger the probability, and the number
of occurrences in one interval does not affect the other intervals. This distribution is a
limiting form of the binomial distribution when the probability of a success is very small
and n is large.
 It is often referred to as the “law of improbable events,” meaning that the probability, p, of a particular event’s happening is quite small.
 A random variable X is defined to have a Poisson distribution if the discrete density function of X is given by:
P(x) = µ^x e^(−µ) / x!, where: µ (mu) = np is the mean number of occurrences (successes) in a particular interval; e is the constant 2.71828; x is the number of occurrences (successes); and P(x) is the probability for a specified value of x.
 The Poisson probability distribution has these characteristics:
 The random variable is the number of times some event occurs during a defined interval.
 The probability of the event is proportional to the size of the interval.
 The intervals do not overlap and are independent.
 The distribution results from a count of the number of successes in a fixed number of
trials.
 The probability of a success is usually small, and the number of trials is usually large.
 Illustrative Example: Lets assume baggage is rarely lost in a particular airlines. Most flight
do not experience any mishandled bags; and so on. Suppose a random sample of 1000 flight
shows a total of 30 bags were lost. Thus, the arithmetic mean number of lost bags per flight is
0.03, found by 30/1000.

 If the number of lost bags per flight follows a Poisson distribution with µ = 0.03, we can compute the various probabilities by the formula: P(x) = µ^x e^(−µ) / x!
 For example, the probability of not losing any bags is:
P(0) = (0.03)^0 e^(−0.03) / 0! = 0.9705
 In other words, 97.05 percent of the flights will have no lost baggage.
 The probability of exactly one lost bag:
P(1) = (0.03)^1 e^(−0.03) / 1! = 0.0291
 Thus, we would expect to find exactly one lost bag on 2.9 percent of the flights.
Questions
1) In a Poisson distribution .
a) What is the probability that ?
b) What is the probability that ?
2) In a Poisson distribution .
a) What is the probability that ?
b) What is the probability that ?
c) What is the probability that ?
 Mean and Variance: The mean of the Poisson Distribution is given by:
E(X) = Σ xi (µ^x e^(−µ)/x!) = e^(−µ) Σ xi µ^x/x! = µ
V(X) = E(X²) − [E(X)]²
 Given E(X) = µ and finding E(X²), after lengthy simplification we obtain E(X²) = µ² + µ. Therefore, V(X) = µ.
 Example 1: If the probability that a check cashed by a bank will bounce is 0.0003, and 10,000 checks are cashed, what will be the mean number of bad checks and its variance?
 Solution: µ = np = 10,000 × 0.0003 = 3
 The variance will also be the same as the mean, i.e. V(X) = µ = 3.
Uniform Density (UD) & The Normal Distribution
 Uniform Density is the simplest distribution for a continuous random variable.

 Uniform Density is rectangular in shape(thus called rectangular distribution) and is

completely defined by its minimum and maximum values.

 A random variable X is said to have a continuous uniform distribution over an interval (a, b) if its PDF is constant and given by:
f(x) = 1/(b − a), for a ≤ x ≤ b. Where the parameters a and b satisfy −∞ < a < b < ∞.
 Properties of Uniform Distribution
1) A uniformly distributed random variable has a constant pdf over the interval of definition.
2) A uniformly distributed random variable represents the continuous analogue of equally likely outcomes.
 That is, for any sub-interval [c, d] where a ≤ c < d ≤ b, P(c ≤ X ≤ d) is the same for all intervals having the same length. That is,
P(c ≤ X ≤ d) = (d − c)/(b − a)
3) The Cumulative Distribution Function of a uniformly distributed random variable X is given by:
F(x) = (x − a)/(b − a), for a ≤ x ≤ b
 Mean and Variance: The mean of a uniform distribution is located in the middle of the interval between the minimum and maximum values.
 It is computed as: E(X) = (a + b)/2
V(X) = E(X²) − [E(X)]² = (b − a)²/12

 Example 1: A point is chosen at random on a line segment [a, b]. What is the probability that the chosen point lies between 1 and 3/2?
 Solution: The pdf of X is given as f(x) = 1/(b − a) for a ≤ x ≤ b, so
P(1 ≤ X ≤ 3/2) = (3/2 − 1)/(b − a)
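A sketch of the uniform-density calculation, assuming for illustration the segment [0, 2] (the slide does not state the endpoints):

```python
# Uniform density on [a, b]: f(x) = 1/(b - a); P(c <= X <= d) = (d - c)/(b - a)
def uniform_prob(c, d, a, b):
    return (d - c) / (b - a)

a, b = 0.0, 2.0                        # assumed endpoints for illustration
print(uniform_prob(1.0, 1.5, a, b))    # 0.25
print((a + b) / 2, (b - a)**2 / 12)    # mean and variance of U(0, 2)
```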
Normal Probability Distribution
 Most numerical values of a random variable are spread around the center, and
greater the distance a numerical value has from the center, the fewer numerical
values have that specific value.
 A frequency distribution of values of random variable observed in nature which
follows this pattern is approximately bell shaped.
 A special case of distribution of measurements is called a normal curve (or
distribution).
 If a population of numerical values follows a normal curve and X is the randomly
selected numerical value from the population, then X is said to be normal random
variable, which has a normal probability distribution.
 The random variable X is said to have a normal (or Gaussian) distribution with parameters µ (mean) and σ (standard deviation) if its pdf is given by:
f(x) = (1/(σ√(2π))) e^(−(1/2)((x − µ)/σ)²), for −∞ < x < ∞
 Where: µ = Mean and σ = Standard Deviation. The Greek symbol π is a constant approximately equal to 3.1416, the letter e is also a constant (≈ 2.71828), and x is the value of a continuous random variable.
 So a normal distribution is based on µ and σ, that is, it is defined by its mean and standard deviation. It is a continuous probability distribution in which the curve is bell-shaped, having a single peak.
 The mean of the distribution lies at the center of the curve, and the curve reaches its highest point at the mean.
 Many variables are approximately, normally distributed, such as IQ scores, life expectancies,
and adult height. This implies that nearly all observations occur within 3 standard
deviations of the mean.
 On the other hand, observations that occur beyond 3 standard deviations from the mean are
extremely rare.
• Characteristics of the Normal Probability Distribution
1) There is a family of normal distributions.
 Each normal distribution may have a different mean, µ or standard deviation, σ.
 A unique normal distribution may be defined by assigning specific values to the mean
µ and standard deviation σ in the normal probability density function.
 For example, large value of σ reduce the height of the curve and increase the spread
and vice versa.
2) For every pair of values of µ and σ, the curve of normal probability density function is bell
shaped and symmetric.
3) The normal curve is symmetrical around a vertical line erected at the mean µ with respect
to the area under it, that is, fifty percent of the area of the curve lies on both sides of the mean
and reflect the mirror image of the shape of the curve on both sides of the mean . This implies
that the probability of any individual outcome above or below the mean will be same. Thus,
for any normal random variable X, P(X ≤ µ) = P(X ≥ µ) = 0.5.
4) Since the normal curve is symmetric, the mean, median, and mode for the normal
distribution are equal because the highest value of the probability density function occurs
when the value of the random variable x = µ.
5) The two tails of the normal curve extend to infinity in both directions and theoretically never
touch the horizontal axis.
6) The mean of the normal distribution may be negative, zero, or positive.
7) The mean µ determines the central location of the normal distribution, while standard
deviation σ determines its spread.
 The larger the value of the standard deviation σ, the wider and flatter is the normal curve,
thus showing more variability in the data.
 Thus standard deviation σ determines the range of values that any random variable is
likely to assume.
8) The area under the normal curve represents probabilities for the normal random variable, and
therefore, the total area under the curve for the normal probability distribution is 1.
Figure a: Characteristics of a Normal Distribution
Standard Normal Probability Distribution
 There is not just one normal probability distribution, but rather a “family” of them. In
application, it is necessary that a normal random variable X is standardized by
expressing its value as the number of σ it lies to the left or right of its µ.
 The standardized normal random variable, z (also called the standard normal variable, or normal variate) is defined as:
z = (x − µ)/σ, OR equivalently, x = µ + zσ.
A z value measures the number of standard deviations that a value of the random variable X falls from the mean.
 z has a normal probability distribution with mean zero and standard deviation one.
 From the formula, we may conclude that
i) If x is less than the mean (µ), the value of z is negative.
ii) If x is more than the mean (µ), the value of z is positive.
iii) When x = µ, the value of z = 0.
Figure b: Standard Normal Distribution
 Transforming measurements to standard normal deviates changes the scale. The conversions are also shown in the previous graph. For example, x = µ + σ is converted to a z value of 1.00.
 Likewise, x = µ + 2σ is transformed to a z value of 2.00.
 Note that the center of the z distribution is zero, indicating no deviation from the mean, x = µ.
 The Empirical Rule
 It states that if a random variable X is normally distributed, then:
 Approximately 68% of the observations will lie within plus and minus 1 standard deviation of the mean (µ ± σ).
 About 95% of the observations will lie within plus and minus 2 standard deviations of the mean (µ ± 2σ).
 Practically all, or 99.7% of the observations, will lie within plus and minus 3 standard deviations of the mean (µ ± 3σ).
 Area Under the Normal Curve
 The area under the standard normal distribution between the mean and a specified positive value of z, say z0, is the probability P(0 ≤ z ≤ z0) and can be read off directly from STANDARD NORMAL (Z) TABLES.
 For example, the area between two values x1 and x2 is the proportion of the area under the curve which lies between the vertical lines erected at those two points along the x-axis.
 The standard normal distribution is very useful for determining probabilities for
any normally distributed random variable.
 Basic Procedures in Finding Area
1) Find the z value for a particular value of the random variable based on the mean
and standard deviation of its distribution.
2) Use the z table value to find various probabilities(area under the curve).
Table a: Z-scores of the standard normal probability distribution, P(0 ≤ z ≤ z0)
 NOTE: The table gives the area to one side of the center (the right side).
 We can also have cumulative probabilities.
 Example 1: Suppose the weekly income of taxi drivers follows the normal
probability distribution with a mean of $1,000 and a standard deviation of $100. What is
the z value of income for
1) a driver who earns $1,100 per week?
2) a driver who earns $900 per week?
 Solution: Given, µ = 1,000; σ = 100; x1 = 1,100; and x2 = 900
 z1 = (x1 − µ)/σ = (1,100 − 1,000)/100 = 1.00
 z2 = (x2 − µ)/σ = (900 − 1,000)/100 = −1.00
 Implication:
 The z of 1.00 indicates that a weekly income of $1,100 is 1 standard deviation above the
mean, and a z of −1.00 shows that a $900 income is 1 standard deviation below the mean.
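The z-value computation is a one-liner; the sketch below re-derives both incomes’ z-scores:

```python
# z-score: number of standard deviations x lies from the mean
def z_score(x, mu, sigma):
    return (x - mu) / sigma

mu, sigma = 1000, 100              # weekly taxi-driver income
print(z_score(1100, mu, sigma))    # 1.0, one sd above the mean
print(z_score(900, mu, sigma))     # -1.0, one sd below the mean
```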
 Example 2: The lifetimes of certain kinds of electronic devices have a mean of 300 hours and
standard deviation of 25 hours. Assuming that the distribution of these lifetimes, which are
measured to the nearest hour, can be approximated closely with a normal curve,
 Find the probability that any one of these electronic devices will have a lifetime of
more than 350 hours.
 What percentage will have lifetimes of 300 hours or less?
 What percentage will have lifetimes from 220 to 260 hours?
 Solution: Given, µ = 300 and σ = 25. Then,
a) z = (350 − 300)/25 = 2.00
 The area under the normal curve up to z = 2.00, that is P(z ≤ 2.00), is 0.9772
 Note: This is a cumulative probability. Thus, since the area under the whole curve is one,
P(X > 350) = 1 − 0.9772 = 0.0228
b) z = (x − µ)/σ = (300 − 300)/25 = 0
 Thus, P(X ≤ 300) = 0.5
 Therefore, the required percentage is, 0.5000 × 100 = 50%.
c) Given µ = 300; σ = 25; x1 = 220; and x2 = 260.
 Then,
 z1 = (x1 − µ)/σ = (220 − 300)/25 = −3.2
 z2 = (x2 − µ)/σ = (260 − 300)/25 = −1.6
 From the normal table, we have
(Area between z = 0 and z = 3.2) = 0.4993 and (Area between z = 0 and z = 1.6) = 0.4452
 Thus the required probability is:
P(−3.2 ≤ z ≤ −1.6) = 0.4993 − 0.4452 = 0.0541
 Hence the required percentage is 5.41%.
 Example 3: In a normal distribution if thirty one percent of the items are under 45 and
eight percent are over 64. Find the mean and standard deviation of the distribution.
 Solution: Since 31 percent of the items are under 45, the area to the left of the ordinate at x = 45 is 0.31, and obviously the area from that ordinate up to the mean is 0.50 − 0.31 = 0.19. The value of z corresponding to this area is −0.5 (found from the z table).
 Hence, z = (x − µ)/σ, so (45 − µ)/σ = −0.5
 As 8 percent of the items are above 64, the area to the right of the ordinate at 64 is 0.08. The area from the mean ordinate up to the ordinate at x = 64 is 0.50 − 0.08 = 0.42, and the value of z corresponding to this area is 1.4 (obtained from the table).
 Hence, (64 − µ)/σ = 1.4
 From these two equations, we get µ = 50 and σ = 10. Thus, the mean of the distribution is 50 and the standard deviation 10.

Figure e: Area below x = 45 and above x = 64


 Example 4: Assume that the test scores from a college admissions’ test are normally

distributed with a mean of 450 and a standard deviation of 100.

a) What percentage of people taking the test score are between 400 and 500?

b) Suppose someone received a score of 630. What percentage of the people taking the test

score better? What percentage score worse?

c) If the particular college will not admit any one scoring below 480, what percentage of the

persons taking the test would be acceptable to the university?


 Solution: Given µ = 450; σ = 100; x₁ = 400; and x₂ = 500
 z₁ = (x₁ − µ)/σ = (400 − 450)/100 = −0.5 and z₂ = (x₂ − µ)/σ = (500 − 450)/100 = 0.5
a) The area under the normal curve between z = 0 and z = 0.5 is 0.1915. The probability that
the score falls between 400 and 500 is P(400 ≤ X ≤ 500) = P(−0.5 ≤ Z ≤ 0.5) =
0.1915 + 0.1915 = 0.3830. So the percentage of the people taking the test who score
between 400 and 500 is 38.30 percent.
b) Given x = 630, z = (x − µ)/σ = (630 − 450)/100 = 1.8
 The area under the normal curve between z = 0 and z = 1.8 is 0.4641.
 The probability that people taking the test score better is given by:
P(X > 630) = P(Z > 1.8) = 0.5 − 0.4641 = 0.0359. Thus, about 3.6 percent of people score better.
 The same way, the probability that people taking the test score worse is given by:
P(X < 630) = 0.5 + 0.4641 = 0.9641. Thus, about 96.4 percent of people score worse.
c) Given x = 480, z = (x − µ)/σ = (480 − 450)/100 = 0.3
 The area under the normal curve between z = 0 and z = 0.3 is 0.1179.
 So P(X > 480) = 0.5 − 0.1179 = 0.3821
 Thus, the percentage of people who score more than 480 and are acceptable to the
university is 38.21 percent.
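The three answers can be verified with the standard normal CDF built from `math.erf`, so no z-table or external library is needed (a sketch, not part of the slides):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 450, 100
z = lambda x: (x - mu) / sigma

p_between = phi(z(500)) - phi(z(400))   # part (a): ~0.3829
p_better  = 1 - phi(z(630))             # part (b): ~0.0359
p_accept  = 1 - phi(z(480))             # part (c): ~0.3821
print(round(p_between, 4), round(p_better, 4), round(p_accept, 4))
```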
Gamma and Beta Distributions
 Gamma Distribution: The PDF for the gamma (or Erlang) distribution is:

   f(x) = µ(µx)^(n−1) e^(−µx) / (n − 1)!,  for x > 0 and n = 1, 2, …

 The gamma distribution is derived as the sum of n identically distributed and independent
exponential random variables. Here it may be noted that the pdf of the gamma distribution
reduces to the exponential density function for n = 1. This means the exponential
distribution is a special case of the gamma distribution, where n = 1.
 In the pdf for the gamma distribution, the parameter µ changes the relative scales of the two axes,
and the parameter n determines the location of the peak of the curve.
 However, for all values of these two parameters, the area under the curve is equal to 1.
 The expected value and variance of this distribution are: E(X) = n/µ;  Var(X) = n/µ²
 The graphs of the pdf for the gamma distribution for µ = 1 and selected values of n are shown
in the Figure below.

Figure f: Gamma distribution pdf's for µ = 1 and selected values of n

 Applications: The gamma distribution is highly applicable: in modeling accident
occurrences (traffic accidents), cancer rates (for instance, the age distribution of cancer incidence
follows a gamma distribution), rainfall occurrences (amount of rainfall), insurance claims, etc.
 In analyzing times elapsed between various numbers of events (failure times, wait times,
service times, etc.).
 To model and fit right-skewed data.
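As a hedged numerical sketch (not from the slides), the Erlang pdf above can be checked directly: with µ = 1 and n = 3, a simple Riemann sum recovers an area near 1 and a mean near n/µ = 3.

```python
from math import exp, factorial

def erlang_pdf(x, mu, n):
    # f(x) = mu*(mu*x)^(n-1) * e^(-mu*x) / (n-1)!
    return mu * (mu * x) ** (n - 1) * exp(-mu * x) / factorial(n - 1)

mu, n, dx = 1.0, 3, 0.001
xs = [i * dx for i in range(1, 40000)]            # integrate over (0, 40)
area = sum(erlang_pdf(x, mu, n) for x in xs) * dx
mean = sum(x * erlang_pdf(x, mu, n) for x in xs) * dx
print(round(area, 3), round(mean, 3))  # ~1.0 and ~3.0 (= n/mu)
```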
 Beta Distribution
 The PDF for the beta distribution is:

   f(x) = x^(m−1)(1 − x)^(n−1) / B(m, n),  for 0 < x < 1;  m, n > 0

 where B(m, n) = ∫₀¹ x^(m−1)(1 − x)^(n−1) dx is the beta function, whose value may be obtained
directly from the table of the beta function.
 The expected value and variance of random variable X in this case are given by:
   E(X) = m/(m + n);  and  Var(X) = mn / [(m + n)²(m + n + 1)]

Applications:
 Beta distribution is commonly used to describe a random variable whose possible
values lie in a restricted interval of numbers.
 A typical use of this distribution is found in the Program Evaluation and
Review Technique (PERT) and the Critical Path Method (CPM), where activity times are
estimated within a specific range.
 Highly applicable in project management. For instance, the beta distribution
technique in project management is used to identify uncertainty in the estimated
project time.
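A similar numerical check works for the beta distribution. This sketch (the values m = 2, n = 3 are illustrative, not from the slides) computes B(m, n) via the identity B(m, n) = Γ(m)Γ(n)/Γ(m + n) and confirms the mean and variance formulas:

```python
from math import gamma

def beta_pdf(x, m, n):
    # B(m, n) = Gamma(m) * Gamma(n) / Gamma(m + n)
    B = gamma(m) * gamma(n) / gamma(m + n)
    return x ** (m - 1) * (1 - x) ** (n - 1) / B

m, n, dx = 2, 3, 0.0001
xs = [i * dx for i in range(1, 10000)]            # integrate over (0, 1)
mean = sum(x * beta_pdf(x, m, n) for x in xs) * dx
var = sum((x - mean) ** 2 * beta_pdf(x, m, n) for x in xs) * dx
print(round(mean, 3), round(var, 3))  # ~0.4 (= m/(m+n)) and ~0.04 (= mn/((m+n)^2(m+n+1)))
```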
The Normal Approximation of Binomial Distribution

 In the first section of this chapter we discussed the binomial probability distribution,
which is a discrete distribution.
 If a problem involved taking a large sample (say of 60), generating a binomial distribution
for that large a number would be very time consuming.
 A more efficient approach is to apply the normal approximation to the binomial.
 We can use the normal distribution (a continuous distribution) as a substitute for a binomial
distribution (a discrete distribution) for large values of n because, as n increases, a
binomial distribution gets closer and closer to a normal distribution.
 The chart below (Chart a) depicts the change in the shape of a binomial distribution with
π = 0.50, from an n of 1, to an n of 3, and to an n of 20.
 Notice how the case where n = 20 approximates the shape of the normal distribution.

Chart a: Binomial Distributions for an n of 1, 3, and 20, Where π = 0.50


 The normal probability distribution is a good approximation to the binomial probability
distribution when nπ and n(1 − π) are both at least 5.
 However, before we apply the normal approximation, we must make sure that our
distribution of interest is in fact a binomial distribution.
 Recall from the previous section that four criteria must be met:
1) There are only two mutually exclusive outcomes to an experiment: a “success” and
a “failure.”
2) The distribution results from counting the number of successes in a fixed
number of trials.
3) The probability of a success, π, remains the same from trial to trial.
4) Each trial is independent.
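The approximation can be sketched numerically. The values n = 60 and π = 0.2 below are illustrative (they are not the slides' exercise): the normal approximation with the usual continuity correction is compared against the exact binomial tail sum.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 60, 0.2
# Exact binomial: P(X >= 15)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(15, n + 1))
# Normal approximation with continuity correction: X >= 15 becomes x > 14.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = 1 - phi((14.5 - mu) / sigma)
print(round(exact, 4), round(approx, 4))
```

Here nπ = 12 and n(1 − π) = 48, so both exceed 5 and the two results agree closely.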
Questions:
 Assuming a binomial probability distribution with and , compute the
following:
a) The mean and standard deviation of the random variable.
b) The probability that X is 15 or more.
c) The probability that X is 10 or less.
• Assuming a binomial probability distribution with and , compute the
following:
a) The mean and standard deviation of the random variable.
b) The probability that X is 25 or greater.
c) The probability that X is 15 or less.
d) The probability that X is between 15 and 25, inclusive.
Joint and Conditional Probability Distribution

Department of Economics
Introduction: Joint and Conditional Probability Distribution

 Joint probability: The probability of two or more independent events occurring


together or in succession is called the joint probability.
 The joint probability of two or more independent events is equal to the product of
their marginal probabilities.
 Conditional probability: The probability of an event occurring, given that
another event has occurred.
 Marginal Probability: The marginal probability of an event under statistical
dependence is the same as the marginal probability of an event under
statistical independence.
 So far we have focused on probability distributions for single random variables (one

dimensional case).

 However, we are often interested in probability statements concerning two or more
random variables, e.g., education and earnings; height and longevity;
attendance and learning outcomes; sex ratios and areas under rice cultivation;
income and expenditure; rainfall (R) and average temperature (T); genetic make-up
and disease; etc.

Joint and Marginal Distributions
 Business and economic applications of statistics are often concerned about the
relationships between/among variables.
 Illustrations:
 Products at different quality levels have different prices.
 Age groups have different preferences for clothing, for automobiles, and for music.
 The percent returns on two different stocks may tend to be related (positive
relationship; negative relationship).
 When we work with probability models for problems involving relationships between
variables, it is important that the effect of these relationships is included in the
probability Model.
 For example, assume that a car dealer is selling the following automobiles:

 (1) a red two-door compact,

 (2) a blue minivan, and

 (3) a silver full-size sedan; the probability distribution for purchasing would

not be the same for women in their 20s, 30s, and 50s.

 Thus, it is important that probability models reflect the joint effect of variables on

probabilities.
 Joint Probability: is probability of two or more independent events occurring
together or in succession.
 The joint probability of two or more independent events is equal to the
product of their marginal probabilities.
 In particular, if A and B are independent events, the probability that both A
and B will occur is given by: P(A ∩ B) = P(A) × P(B)
 Example 1: Suppose we toss a coin twice. The probability that in both cases the
coin will turn up heads is given by:
   P(H₁ ∩ H₂) = P(H₁) × P(H₂) = 1/2 × 1/2 = 1/4
 In general, if X and Y are two random variables, the probability distribution that

defines their simultaneous behavior is called a Joint Probability Distribution.

 It is denoted as: f(x, y) = P(X = x, Y = y)
 If X and Y are discrete, this distribution can be described with a Joint
Probability Mass Function (Joint PMF).

 If X and Y are continuous, this distribution can be described with a Joint

Probability Density Function.


Joint and Marginal Distributions: Discrete
 Examples (Joint discrete variables)
 Years in College Vs. Number of credits taken
 Number of Cigarettes smoked per day Vs. Day of the week
 Average temperature Vs. Day of the Year
 Number of students Vs. Number of professors in a College
 Let X and Y be discrete random variables defined on the sample space that take on values
x₁, x₂, … and y₁, y₂, …, respectively.
 Then, the Joint Probability Mass Function of (X, Y) is:
   p(xᵢ, yⱼ) = P(X = xᵢ, Y = yⱼ)
 The values p(xᵢ, yⱼ) give the probability/likelihood that the outcomes X = xᵢ and Y = yⱼ
occur at the same time.
 Example 1: CDs are covered with a rectangular plastic. Measurements for the
length and width are rounded to the nearest mm (so they are discrete). Let X
denote the length and Y denote the width. The possible values of X are 129, 130,
and 131 mm while the possible values of Y are 15 and 16 mm (Thus, both X and Y
are discrete). Thus, there are six possible areas (pairs of X and Y) of the plastic
cover. The probability for each pair is shown in the following table:

                 X: length
             129     130     131
Y: Width 15  0.12    0.42    0.06
         16  0.08    0.28    0.04
 From the table, the sum of all the probabilities is 1.0, the combination with the highest
probability is (130, 15), the combination with the lowest probability is (131, 16).
 The joint probability mass function is the function f_XY(x, y) = P(X = x, Y = y). For
example, we have f_XY(129, 15) = 0.12.
 Recall: A marginal or unconditional probability is the simple probability of the occurrence
of an event.
 Which is the probability of the occurrence of a single event, i.e.,
P(A).
 Examples: The probability a card drawn is red is 1/2, and the probability a
card drawn is a heart is 13/52 = 1/4.
 These are marginal probabilities.

Marginal Probability Distribution: Discrete
 Suppose that the discrete random variables X and Y have joint PMF f(x, y).
Let x₁, x₂, … denote the possible values of X, and y₁, y₂, … denote the
possible values of Y.
 The marginal probability mass functions of X and Y are respectively given by:
   f_X(x) = Σ_y f(x, y)  and  f_Y(y) = Σ_x f(x, y)
 Thus, if we are given a joint PMF for X and Y, we can obtain the individual
probability distribution for X or for Y (and these are called the Marginal
Probability Distributions).
 Given the previous example: i. find the probability that a CD cover has a length of 129 mm
(i.e. P(X = 129)). ii. Then, what is the probability distribution of X?
Solution:
i. P(X = 129) = f(129, 15) + f(129, 16) = 0.12 + 0.08 = 0.20

                 X: length
Y: Width       129     130     131    Row Sum
         15    0.12    0.42    0.06    0.60
         16    0.08    0.28    0.04    0.40
Column Sum     0.20    0.70    0.10    1.00
ii. Then, the probability distribution for X appears in the column totals, given as:

X          129     130     131
P(X = x)   0.20    0.70    0.10

 The same way, the probability distribution for Y appears in the row totals, given
as:
Y          15      16
P(Y = y)   0.60    0.40

 Because the PMFs for X and Y appear in the margins of the table (i.e. column and
row totals), they are often referred to as the Marginal Distributions for X and Y.
 We can also compute the Expected Values (i.e. and ) and variance
of a single random variable from the joint probability distribution function.
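The marginal totals and expected values for the CD-cover example can be reproduced directly from the joint PMF table (a sketch, not part of the slides):

```python
xs = [129, 130, 131]
ys = [15, 16]
joint = {(129, 15): 0.12, (130, 15): 0.42, (131, 15): 0.06,
         (129, 16): 0.08, (130, 16): 0.28, (131, 16): 0.04}

fx = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # column sums: marginal of X
fy = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # row sums: marginal of Y
ex = sum(x * p for x, p in fx.items())                 # E(X)
ey = sum(y * p for y, p in fy.items())                 # E(Y)
print(fx, fy, round(ex, 2), round(ey, 2))
```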
 Example 2: A fair coin is tossed three times independently: let X denote the number of
heads on the first toss and Y denote the total number of heads. Then, find the joint
probability mass function of X and Y .
 Solution: the joint distribution of can be summarized in the following table:
X|Y 0 1 2 3
0 1/8 2/8 1/8 0
1 0 1/8 2/8 1/8
 Suppose that we wish to find the PMF of Y from the joint PMF of X and Y in the previous
example:
 f_Y(0) = P(Y = 0) = 1/8 + 0 = 1/8
 f_Y(1) = P(Y = 1) = 2/8 + 1/8 = 3/8, and so on.
 In general, to find the frequency function of Y, we simply sum down the
appropriate column of the table given the joint PMF of X and Y.
 For this reason, f_Y(y) is called the marginal probability mass function of Y.
 Similarly, summing across the appropriate rows gives f_X(x) = Σ_y f(x, y),
which is the marginal PMF of X.
 For the above example, we have f_X(0) = f_X(1) = 4/8 = 1/2.

Properties of Joint Discrete Probability Distribution
i. 0 ≤ f(x, y) ≤ 1 for every pair (x, y);
ii. the sum of the joint probabilities over all possible pairs of values must be 1,
i.e. Σ_X Σ_Y f(x, y) = 1;
iii. for any region A in the xy-plane, P[(X, Y) ∈ A] = Σ Σ_A f(x, y).
 From the marginals, the moments of each variable follow as usual:
   E(X) = Σ_x x f_X(x),  σ²_X = Σ_x x² f_X(x) − [E(X)]²
   E(Y) = Σ_y y f_Y(y),  σ²_Y = Σ_y y² f_Y(y) − [E(Y)]²

 Exercise 1: A basket contains two Red, three Black, and four White balls. Let X be the number of chosen
black balls and Y be number of chosen red balls. If two balls are randomly selected from the basket(without
replacement),
A) Find the joint probability distribution function
B) The variance of X and Y.
 Exercise 2: Given the joint PMF of two random variables as follow,
𝑷(𝒙, 𝒚) 0 10 20
10 1/5 1/10 1/5
25 5/100 3/20 3/10

A) Find the probability that .


B) Find the probability that .
C) Find the marginal probability of X, and marginal probability of Y,
D) Compute the expected value and variances of the two random variables.
 Exercise 3: Consider two random variable X and Y with joint PMF as given below.
X|Y   10    15    20
0 1/4 1/6 1/8
1 1/8 1/6 1/6
A) Find the probability that and .
B) Find the probability that .
B) Joint and Marginal Densities: Continuous
 A bivariate function can serve as a joint probability density function of a pair of
continuous random variables X and Y if its values, f(x, y), satisfy the following two
conditions:
i. f(x, y) ≥ 0 for −∞ < x < ∞ and −∞ < y < ∞
ii. ∫∫ f(x, y) dx dy = 1, integrating over the whole plane
 If X and Y are continuous random variables and f(x, y) is the value of their joint
probability density at (x, y), the function given by:
   g(x) = ∫ f(x, y) dy, for −∞ < x < ∞
is known as the marginal density of X.
 Correspondingly, the function given by:
   h(y) = ∫ f(x, y) dx, for −∞ < y < ∞
is known as the marginal density of Y.


 Exercise 1: Consider the following function: f(x, y) = … for …
and … . Verify that it is a joint PDF and find the marginal density functions
for X and Y.
 Exercise 2: The bivariate random variables X & Y has the pdf:
, , Then, find the value of the constant k
and compute the marginal density functions for X and Y
 Exercise 3: Given the joint pdf: , , . Then, compute
the marginal density functions for X, and compute the marginal density functions
for Y
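The exercise densities are not reproduced in these notes, so the following sketch uses an assumed density, f(x, y) = x + y on the unit square, to illustrate both checks numerically: the total integral is 1, and the marginal is f_X(x) = x + 1/2.

```python
# Midpoint Riemann sums over the unit square for the assumed density f(x, y) = x + y.
n = 500
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]           # midpoints of each cell

f = lambda x, y: x + y
total = sum(f(x, y) for x in pts for y in pts) * h * h   # should be ~1
fX = lambda x: sum(f(x, y) for y in pts) * h             # marginal: ~x + 0.5
print(round(total, 4), round(fX(0.25), 4))  # ~1.0 and ~0.75
```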
Conditional Distributions and Independence
 The probability of an event occurring, given that another event has occurred, is Conditional
Probability.
 The conditional probability of event A, given that event B has already occurred, is written as:
P(A|B) = P(A ∩ B)/P(B). Similarly, we may write P(B|A) = P(A ∩ B)/P(A).
 The vertical bar is read as 'given', and events appearing to the right of the bar are those that
you know have occurred. Two events A and B are said to be independent if and only if
P(A|B) = P(A) or P(B|A) = P(B). Otherwise, events are said to be dependent.
 Example: In a toss of a coin twice, P(H₂|H₁) = P(H₂) = 1/2: independent events
 In a box containing 2 white and 2 black balls, P(W₂|W₁) = P(W₂) = 1/2, with
replacement: independent events; P(W₂|W₁) = 1/3, without replacement:
dependent events.
 Conditional Distribution: The conditional probability distribution/density of X given
Y = y is defined as: f(x|y) = f(x, y)/f(y), if f(y) > 0; and the conditional probability
distribution/density of Y given X = x is defined as: f(y|x) = f(x, y)/f(x), if f(x) > 0.

 This definition holds for both discrete and continuous random variables. You can also notice
the similarity between the conditional probability and conditional probability distribution.
 Example:
Age Group (X)
Purchase Decision 1 2 3
(Y) (16 to 20) (26 to 46) (46 to 65)
1(buy) 0.10 0.20 0.10 0.40
2(not buy) 0.25 0.25 0.10 0.60
0.35 0.45 0.20 1.00
 Statistical Independence: The condition of statistical independence arises when the
occurrence of one event has no effect upon the probability of occurrence of any other event.
 For example, in a fair coin toss the outcome of each toss is an event that is statistically
independent of the outcomes of every other toss of the coin.
 Note: The probability computations under statistically dependent and independent events
are very different.
 Under statistical dependence:
   P(A ∩ B) = P(A|B) × P(B)  and  P(A ∩ B) = P(B|A) × P(A)
 OR
   P(A|B) = P(A ∩ B)/P(B)  and  P(B|A) = P(A ∩ B)/P(A)
 When the information that the random variable X takes a particular value x is irrelevant to the
determination of the probability that another random variable Y takes, we say that Y is
independent of X.
 Formally, two random variables, X and Y, are said to be independent if and only if any one of
the following three conditions holds:
 f(x|y) = f_X(x), for all x and y
 f(y|x) = f_Y(y), for all x and y
 f(x, y) = f_X(x) × f_Y(y), for all x and y
 Exercise 1: Given the joint PDFs in the previous exercises, compute the conditional
probability density functions for X given Y and for Y given X.
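As a discrete illustration of the same idea (a sketch, not part of the slides), the conditional PMF f(y|x) for the three-coin-toss example can be computed with exact fractions:

```python
from fractions import Fraction as F

# Joint PMF: X = heads on the first toss, Y = total heads in three tosses.
joint = {(0, 0): F(1, 8), (0, 1): F(2, 8), (0, 2): F(1, 8), (0, 3): F(0),
         (1, 0): F(0),    (1, 1): F(1, 8), (1, 2): F(2, 8), (1, 3): F(1, 8)}

# Marginal of X, then f(y | x) = f(x, y) / f(x)
fx = {x: sum(p for (xi, y), p in joint.items() if xi == x) for x in (0, 1)}
cond = {(x, y): joint[(x, y)] / fx[x] for (x, y) in joint}

print(cond[(0, 1)], cond[(1, 2)])  # 1/2 1/2
```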
Expectations: Mathematical Expectation
 If X and Y are two discrete random variables with joint PMF f(x, y), then:
 E(X) = Σ_x Σ_y x f(x, y)
 Given f_X(x) = Σ_y f(x, y),
 E(X) = Σ_x x f_X(x), for a discrete random variable. The expected value for Y is also
given in the same way.
 If X and Y are two continuous random variables with joint PDF f(x, y), then:
 Given g(x) = ∫ f(x, y) dy,  E(X) = ∫ x g(x) dx, for a continuous random
variable. The expected value for Y is also given in the same way.
 If X and Y are two random variables, then E(X + Y) = E(X) + E(Y).
 If g is any function of X and Y, then E[g(X, Y)] = Σ_x Σ_y g(x, y) f(x, y), if X and Y are
discrete random variables.
 E[g(X, Y)] = ∫∫ g(x, y) f(x, y) dx dy, if X and Y are continuous random variables.

 Example: Given the PMF in the form of a table as below, find E(X), E(Y), and E(XY).

X|Y    0     1     2     3
0     1/8   2/8   1/8    0    4/8
1      0    1/8   2/8   1/8   4/8
      1/8   3/8   3/8   1/8    1
Solution:
 E(X) = Σ_x x f_X(x) = 0 × 4/8 + 1 × 4/8 = 1/2
 E(Y) = Σ_y y f_Y(y) = 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8 = 12/8 = 3/2
 E(XY) = Σ_x Σ_y x y f(x, y) = 1 × 1 × 1/8 + 1 × 2 × 2/8 + 1 × 3 × 1/8 = 8/8 = 1
𝟐
 The Variance, Var(X) = σ²_X:
 Var(X) = E[(X − µ_X)²] = E(X²) − [E(X)]²
 Var(Y) = E[(Y − µ_Y)²] = E(Y²) − [E(Y)]²
 In terms of summation and integration notations:
 σ²_x = Σ_x x² f_X(x) − µ²_x, for a discrete random variable.
 σ²_x = ∫ x² g(x) dx − µ²_x, for a continuous random variable.
 Note that the variance for Y, σ²_y, is also expressed in the same way, replacing Y in place of X
in the expressions above.
 Exercise: Compute the variance for X and Y for the previous example. Answer:
σ²_x = 1/4 and σ²_y = 3/4
Covariance and Correlation
 Important measures when we have two random variables are the covariance and
correlation of X and Y. The relationship between two variables can be represented by a
covariance and a correlation, which then suggest whether there is a relationship that differs
from 0.
 Covariance: measures the amount of linear dependence between two random variables.
 Is a measure of the joint variability of two variables.
 A positive covariance indicates that two random variables move in the same direction,
while a negative covariance indicates they move in opposite directions.
 Interpreting the magnitude of a covariance can be a little tricky.
 It is defined as follows:
 Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = E(XY) − E(X)E(Y)
 Cov(X, Y) = Σ_y Σ_x (x − µ_X)(y − µ_Y) f(x, y), in the discrete case.
 Cov(X, Y) = ∫∫ (x − µ_X)(y − µ_Y) f(x, y) dx dy, in the continuous case.
 Note the following:
 Cov(X, X) = Var(X)
 Cov(X, Y) = Cov(Y, X)

 Correlation: the degree of (linear) association between two random variables X and Y.
 is an alternative measure of dependence between X and Y that solves the “units” problem
of the covariance.
 Specifically, the correlation between X and Y is the covariance between X and Y divided
by their standard deviations, denoted by r or ρ_{X,Y}:
   r = ρ_{X,Y} = Cov(X, Y)/√[Var(X) × Var(Y)] = Cov(X, Y)/(σ_X σ_Y)

 This is called the coefficient of correlation, which measures linear relationship between X
and Y. Thus, Covariance measures the joint variability of two random variables while
Correlation measures the strength/the degree to which variables are moving together.
 Properties of the coefficient of correlation
 −1 ≤ ρ_{X,Y} ≤ 1
 ρ is unit-free; and
 ρ_{x,y} = ρ_{y,x}
 Exercise 1: Find the covariance and coefficient of correlation of our example under the mean
for the discrete case.
 Answer: Cov(X, Y) = 1/4;  ρ_{x,y} = ρ_{y,x} = 1/√3 ≈ 0.577
 Exercise 2: Given f(x, y) = C(3, x) C(2, y) C(4, 2 − x − y) / C(9, 2), with x = 0, 1, 2;
y = 0, 1, 2; and 0 ≤ x + y ≤ 2, and
its results tabulated as follows:

X|Y     0      1      2     f(x)
0      1/6    2/9    1/36   5/12
1      1/3    1/6     0     1/2
2      1/12    0      0     1/12
f(y)   7/12   7/18   1/36    1

 Find the correlation coefficient between X and Y.


 Answer: E(X) = 2/3;  E(Y) = 4/9;  E(XY) = 1/6;
 E(X²) = 5/6;  E(Y²) = 1/2;  σ²_x = 7/18;  σ²_y = 49/162;
 Cov(X, Y) = E(XY) − E(X)E(Y) = 1/6 − (2/3)(4/9) = −7/54
 ρ_{x,y} = ρ_{y,x} = (−7/54) / √[(7/18)(49/162)] = (−7/54) / √(343/2916) ≈ −0.378
 The covariance ends up negative because of the restriction x + y ≤ 2, so that when X
increases, Y must go down; thus the negative relationship between the two random variables.
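Exercise 2 can be verified with exact fractions (a sketch, not part of the slides); the joint PMF is the two-draw basket model of Exercise 1 (2 red, 3 black, 4 white; X = black drawn, Y = red drawn):

```python
from fractions import Fraction as F
from math import comb, sqrt

pairs = [(x, y) for x in range(3) for y in range(3) if x + y <= 2]
f = {(x, y): F(comb(3, x) * comb(2, y) * comb(4, 2 - x - y), comb(9, 2))
     for (x, y) in pairs}

ex  = sum(x * p for (x, y), p in f.items())           # E(X) = 2/3
ey  = sum(y * p for (x, y), p in f.items())           # E(Y) = 4/9
exy = sum(x * y * p for (x, y), p in f.items())       # E(XY) = 1/6
vx  = sum(x * x * p for (x, y), p in f.items()) - ex ** 2
vy  = sum(y * y * p for (x, y), p in f.items()) - ey ** 2
cov = exy - ex * ey
rho = float(cov) / sqrt(float(vx) * float(vy))
print(cov, round(rho, 3))  # -7/54 -0.378
```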
 Conditional Expectation
 The conditional expectation of X given Y = y is defined as:
 E(X|Y = y) = Σ_x x f(x|y), if discrete.
 E(X|Y = y) = ∫ x f(x|y) dx, if continuous.
 Similarly, the conditional expectation of Y given X = x is defined as:
 E(Y|X = x) = Σ_y y f(y|x), if discrete.
 E(Y|X = x) = ∫ y f(y|x) dy, if continuous.
 Note that: The conditional expectation of Y given X turns out to be a function of X. That
is, E(Y|X) = g(X) is called the regression function of Y on X.
 It tells us how the mean of Y varies with changes in X.
 Example: In our earlier example given as follows

X|Y    0     1     2     3
0     1/8   2/8   1/8    0    4/8
1      0    1/8   2/8   1/8   4/8
      1/8   3/8   3/8   1/8    1

 Find the conditional probability distributions f(x|y) and f(y|x), and E(Y|X = x).
 Solution: a) f(x|y), the conditional probability distribution of X:

Y     f(0|y)   f(1|y)
0       1        0
1      2/3      1/3
2      1/3      2/3
3       0        1

b) f(y|x), the conditional probability distribution of Y:

X    f(0|x)   f(1|x)   f(2|x)   f(3|x)
0     1/4      1/2      1/4       0
1      0       1/4      1/2      1/4

 Now, the regression of Y on X is defined as E(Y|X = x). That is,

 E(Y|X = 0) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 + 3 × 0 = 1
 E(Y|X = 1) = 0 × 0 + 1 × 1/4 + 2 × 1/2 + 3 × 1/4 = 2

X    E(Y|X = x)
0        1
1        2

 In the form of a function: E(Y|X) = 1 + X
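The regression function E(Y|X) = 1 + X can be confirmed directly from the joint PMF (a sketch, not part of the slides):

```python
from fractions import Fraction as F

# Joint PMF: X = heads on the first toss, Y = total heads in three tosses.
joint = {(0, 0): F(1, 8), (0, 1): F(2, 8), (0, 2): F(1, 8), (0, 3): F(0),
         (1, 0): F(0),    (1, 1): F(1, 8), (1, 2): F(2, 8), (1, 3): F(1, 8)}

def e_y_given_x(x):
    fx = sum(p for (xi, y), p in joint.items() if xi == x)         # marginal f(x)
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / fx

print(e_y_given_x(0), e_y_given_x(1))  # 1 2
```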

Independence and Expectation
 Let X and Y be two random variables with probability function f(x, y), then
1) If X and Y are independent, then E(XY) = E(X)E(Y)
 Proof: E(XY) = Σ_x Σ_y x y f(x, y) = Σ_x Σ_y x y f_X(x) f_Y(y), as X and Y are independent
        = [Σ_x x f_X(x)][Σ_y y f_Y(y)] = E(X)E(Y)
2) If X and Y are independent, then Cov(X, Y) = 0 and ρ_{x,y} = 0
 Proof: Cov(X, Y) = E(XY) − E(X)E(Y) = E(X)E(Y) − E(X)E(Y) = 0
 Note: If two random variables are independent, then they have zero correlation. However, the
converse is not necessarily true. That is, two random variables may have zero correlation; it
does not necessarily mean that they are independent.
Example: Let the joint PMF of two random variables X and Y be given as follows:
X|Y -1 0 1
-1 0 1/4 0 1/4
0 1/4 0 1/4 1/2
1 0 1/4 0 1/4
1/4 1/2 1/4 1

 E(X) = 0, E(Y) = 0, and E(XY) = 0; therefore Cov(X, Y) = 0.
 However, f(x, y) is not equal to f_X(x) × f_Y(y):
 e.g. f(−1, −1) = 0, while f_X(−1) × f_Y(−1) = (1/4)(1/4) = 1/16.
 Note also that f(y|x) is not equal to f(y); so X and Y are dependent.

f(y | X = 0)              f(y)
Y                         Y
-1     0.5                -1     1/4
 0     0                   0     1/2
 1     0.5                 1     1/4
       1                          1

 The sum of the conditional distribution should add up to unity. Similarly, the sum
of the marginal distribution also adds up to one. If X and Y are independent,
f(y|x) = f(y) for all x, but in the above example this does not hold; therefore,
the two variables are not independent.

 Relationship between Independence and Uncorrelatedness

 Independence: Two random variables are independent when there is no relationship of


any form between the two RVs (X & Y).

 Uncorrelatedness: Two random variables are uncorrelated when there is no


relationship of a particular form between the two RVs (X & Y). Thus, independent
random variables are uncorrelated but uncorrelated random variables may not be
independent.
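A short sketch (not part of the slides) confirms zero covariance alongside dependence for this example:

```python
from fractions import Fraction as F

vals = (-1, 0, 1)
joint = {(-1, 0): F(1, 4), (0, -1): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 4)}
p = lambda x, y: joint.get((x, y), F(0))

ex  = sum(x * p(x, y) for x in vals for y in vals)
ey  = sum(y * p(x, y) for x in vals for y in vals)
exy = sum(x * y * p(x, y) for x in vals for y in vals)
cov = exy - ex * ey                                   # zero: uncorrelated
fx = {x: sum(p(x, y) for y in vals) for x in vals}    # marginal of X
fy = {y: sum(p(x, y) for x in vals) for y in vals}    # marginal of Y
print(cov, p(-1, -1), fx[-1] * fy[-1])  # 0, 0, 1/16  -> dependent despite cov = 0
```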
Sampling and Sampling Distributions

Lecturer, Department of Economics


Introduction: Sampling and Sampling Distributions


 The process of selecting a sample from a population is called
sampling.

 In sampling, a representative sample or portion of elements


of a population or process is selected and then analyzed.

 Based on sample results, called sample statistics, statistical


inferences are made about the population characteristic.
SAMPLING METHODS

 Sampling, compared to a census, provides an attractive
means of learning about a population or process in terms of
reduced cost, less time and greater accuracy.

Sampling Distributions
• The sampling distribution of a statistic is the probability
distribution of all possible values the statistic may assume,
when computed from random samples of the same size,
drawn from a specified population.
• The sampling distribution of X is the probability
distribution of all possible values the random variable X may
assume when a sample of size n is taken from a specified
population.

Sampling Distributions
• The concept of sampling distribution can be related to the
various probability distributions.
• Probability distributions are the theoretical distributions of
random variables that are used to describe characteristics of
populations or processes under certain specified conditions.
• That is, probability distributions are helpful in determining the
probabilities of outcomes of random variables when
populations or processes that generate these outcomes satisfy
certain conditions.
Sampling Distributions
• When the population has a normal distribution, then the
phenomenon that describes the normal probability distribution
provides a useful description of the distribution of population
values.
• When the mean values obtained from samples are distributed
normally, it implies that this distribution is useful for describing the
characteristics (properties) of sampling distribution.
• Consequently, these properties, which are also the properties of
sampling distribution, help to frame rules for making statistical
inferences about a population on the basis of a single sample drawn
from it, that is, without even repeating the sampling process.

Sampling Distributions
• The sampling distribution of a sample statistic also helps in
describing the extent of error associated with an estimate of the
value of population parameters.

Sample Distributions Vs. Sampling Distribution


• Population distribution is the distribution of the values of its
elements (members) and has mean denoted by µ, variance σ² and
standard deviation σ.
• Sample distribution is the distribution of measured values of a
statistic in random samples drawn from a given population.
• Each sample distribution is a discrete distribution, because the
value of the sample mean varies from sample to sample. This
variability serves as the basis for the random sampling
distribution.

Cont’d
• In such distributions the arithmetic mean represents the
average of all possible sample means, or the 'mean of means',
denoted by µ_x̄; the standard deviation, which measures the
variability among all possible values of the sample values, is
considered a good approximation of the population's standard
deviation σ.
• To estimate σ of the population to greater accuracy, the formula
   s = √[ Σ(x − x̄)² / (n − 1) ]  is used,

Cont’d
• where n is the size of the sample. The divisor n − 1 in the
denominator results in a higher value of s than dividing by n would
give. Here n − 1 is also known as the degrees of freedom.
• The number of degrees of freedom, df = n − 1, indicates the number
of values that are free to vary in a random sample.
• Sampling distribution is the distribution of all possible values
of a statistic from all the distinct possible samples of equal size
drawn from a population or a process.

Sampling Distribution of Sample Mean


• The sampling distribution of sample means depends on the
distribution of the population or process from which samples
are drawn.
• If a population or process is normally distributed, then the sampling
distribution of sample means is also normally distributed,
regardless of the sample size.
• Even if the population or process is not distributed normally,
the sampling distribution of the sample mean tends to be distributed
normally as the sample size becomes sufficiently large.
Sampling Distribution Properties
• Sampling distribution statistics is based on the following
properties:
Cont’d

 Sampling error provides some idea of the precision of a statistical estimate.


 A low sampling error means a relatively less variability or range in the
sampling distribution.
 The greater the sample’s standard deviation, the greater the standard error
(the sampling error).
 The standard error is also related to the sample size: greater the sample
size, smaller the standard error.

Properties of the Sampling Distribution of the Sample Mean


• Comparing the population distribution and the sampling distribution of the mean:
   The sampling distribution is more bell-shaped and symmetric.
   Both have the same center.
   The sampling distribution of the mean is more compact, with a smaller variance.

[Charts: Uniform Distribution (1, 8) population vs. the sampling distribution of its mean]
Relationships between Population Parameters and the Sampling
Distribution of the Sample Mean

The expected value of the sample mean is equal to the population mean:
   E(X̄) = µ_X̄ = µ

The variance of the sample mean is equal to the population variance divided by
the sample size:
   V(X̄) = σ²_X̄ = σ²/n

The standard deviation of the sample mean, known as the standard error of
the mean, is equal to the population standard deviation divided by the square
root of the sample size:
   SD(X̄) = σ_X̄ = σ/√n
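These relationships can be illustrated by simulation (a sketch, not part of the slides): the observed spread of many sample means matches σ/√n.

```python
import random
import statistics as st

random.seed(0)
pop_sigma, n = 1.0, 25
# Draw 4000 samples of size n from N(0, 1) and record each sample mean.
means = [st.mean(random.gauss(0, pop_sigma) for _ in range(n)) for _ in range(4000)]
se_observed = st.stdev(means)
se_theory = pop_sigma / n ** 0.5        # sigma / sqrt(n) = 0.2
print(round(se_observed, 3), se_theory)  # both ~0.2
```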
Sampling from a Normal Population
When sampling from a normal population with mean µ and standard
deviation σ, the sample mean, X̄, has a normal sampling
distribution:
   X̄ ~ N(µ, σ²/n)

This means that, as the sample size increases, the sampling
distribution of the sample mean remains centered on the
population mean, but becomes more compactly distributed
around that population mean.

[Chart: sampling distributions of X̄ for n = 2, 4, and 16 against the normal population]
The Central Limit Theorem
• If the population is not normally distributed, then we make use of the
central limit theorem to describe the random nature of the
sample mean for large samples without knowledge of the
population distribution.
• The Central Limit Theorem states that when random
samples of observations are drawn from a non-normal population
with finite mean µ and standard deviation σ, then as the sample
size n is increased, the sampling distribution of the sample mean x̄
is approximately normally distributed, with mean and standard
deviation:
   µ_x̄ = µ  and  σ_x̄ = σ/√n
Cont’d
• Regardless of its shape, the sampling distribution of the sample
mean x̄ always has a mean identical to that of the sampled
population, i.e. µ_x̄ = µ, and standard deviation σ_x̄ = σ/√n. This
implies that the spread of the distribution of sample means is
considerably less than the spread of the sampled population.
• The central limit theorem is useful in statistical inference.
• When the sample size is sufficiently large, estimators such as the
'average' or 'proportion' that are used to make inferences about
population parameters are expected to have a sampling distribution
that is approximately normal.
Cont’d
• The behavior of these estimators can be described in repeated sampling and is
used to evaluate the probability of observing certain sample results using
the normal distribution, via z = (x̄ − µ)/(σ/√n).
• However, guidelines, such as taking n > 30, are helpful in deciding an
appropriate value of n.
The Central Limit Theorem

When sampling from a population with mean µ and finite standard
deviation σ, the sampling distribution of the sample mean will
tend to a normal distribution with mean µ and standard deviation
σ/√n as the sample size becomes large (n > 30).

For "large enough" n:  X̄ ~ N(µ, σ²/n)

[Charts: sampling distributions of X̄ for n = 5, n = 20, and large n]
The Central Limit Theorem Applies to Sampling Distributions from Any Population

[Chart: sampling distributions of X̄ for n = 2 and n = 30 drawn from normal,
uniform, skewed, and general populations]
The Central Limit Theorem

 Example: Mercury makes a 2.4 liter V-6 engine, the Laser XRi, used in speedboats.
The company's engineers believe the engine delivers an average power of 220
horsepower and that the standard deviation of power delivered is 15 HP. A
potential buyer intends to sample 100 engines (each engine is to be run a single
time). What is the probability that the sample mean will be less than 217 HP?

   P(X̄ < 217) = P( (X̄ − µ)/(σ/√n) < (217 − 220)/(15/√100) )
              = P( Z < −3/1.5 )
              = P( Z < −2 ) = 0.0228
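The engine example can be checked with the erf-based normal CDF (a sketch, not part of the slides):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 220, 15, 100
z = (217 - mu) / (sigma / sqrt(n))   # standard error = 15/10 = 1.5
print(z, round(phi(z), 4))  # -2.0 0.0228
```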

Sampling Distribution of Mean When Population has Normal Distribution


 Population Standard Deviation σ known: as mentioned earlier, no
matter what the population distribution is, for any given sample of size n taken
from a population with mean µ and standard deviation σ, the sampling
distribution of the sample mean and its standard error are
defined respectively by:
   µ_x̄ = µ  and  σ_x̄ = σ/√n
 If all possible samples of size n are drawn with replacement from a
population having a normal distribution with mean µ and standard deviation σ,
then it can be shown that the sampling distribution of the mean x̄ with standard
error σ_x̄ will also be normally distributed, irrespective of the size of the sample.



 This result is true because any linear combination of normal random variables
is also a normal random variable.
 In particular, if the sampling distribution of x̄ is normal, the standard error of
the mean σ_x̄ can be used in conjunction with the normal distribution to determine
the probabilities of various values of the sample mean.
 For this purpose, the value of the sample mean x̄ is first converted into a value z on
the standard normal distribution, to know how any single mean value deviates
from the mean µ of the sample mean values, by using the formula:
   z = (x̄ − µ)/σ_x̄ = (x̄ − µ)/(σ/√n)

Student’s t Distribution

If the population standard deviation, σ, is unknown, replace σ with
the sample standard deviation, s. If the population is normal, the
resulting statistic:
   t = (X̄ − µ)/(s/√n)
has a t distribution with (n − 1) degrees of freedom.

• The t is a family of bell-shaped and symmetric distributions, one
for each number of degrees of freedom.
• The expected value of t is 0.
• The variance of t is greater than 1, but approaches 1 as the
number of degrees of freedom increases. The t is flatter and has
fatter tails than does the standard normal.
• The t distribution approaches a standard normal as the number
of degrees of freedom increases.

[Chart: standard normal vs. t with df = 10 and df = 20]
5-28

Degrees of Freedom
 Degrees of freedom: The number of unrestricted chances for
variation in the measurement being made.
 The divisor (n–1) in the formula for the sample variance s² is called the
number of degrees of freedom (df) associated with s².
 The number of degrees of freedom refers to the number of unrestricted
chances for variation in the measurement being made, i.e. number of
independent squared deviations in s2 that are available for estimating σ2.
 In other words, it refers to the number of values that are free to vary in a
random sample.
 The shape of t-distribution varies with degrees of freedom. Obviously
more is the sample size n, higher is the degrees of freedom.

Example
 Example: The mean length of life of a certain cutting tool is 41.5
hours with a standard deviation of 2.5 hours. What is the
probability that a simple random sample of size 50 drawn from
this population will have a mean between 40.5 hours and 42
hours?
 Solution: We are given the following information: µ = 41.5 hours,
σ = 2.5 hours, and n = 50.
 It is required to find the probability that the mean length of life, x̄,
of the cutting tool lies between 40.5 hours and 42 hours, that is,
P(40.5 ≤ x̄ ≤ 42).

Cont’d
 Solution (cont’d): The standard error is σ_x̄ = σ/√n = 2.5/√50 ≈ 0.354, so
   P(40.5 ≤ x̄ ≤ 42) = P( (40.5 − 41.5)/0.354 ≤ Z ≤ (42 − 41.5)/0.354 )
                    = P(−2.83 ≤ Z ≤ 1.41) ≈ 0.9207 − 0.0023 = 0.9184
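The cutting-tool probability can be checked numerically with a short Python sketch (using only the µ, σ and n given in the example):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 41.5, 2.5, 50
se = sigma / sqrt(n)                      # standard error ≈ 0.3536 hours
z_lo = (40.5 - mu) / se                   # ≈ -2.83
z_hi = (42.0 - mu) / se                   # ≈ 1.41
p = NormalDist().cdf(z_hi) - NormalDist().cdf(z_lo)
print(round(p, 4))                        # P(40.5 <= xbar <= 42) ≈ 0.919
```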

Degrees of Freedom

Consider a sample of size n = 4 containing the following data points:

x1 = 10   x2 = 12   x3 = 16   x4 = ?

and for which the sample mean is:  x̄ = (Σx)/n = 14

Given the values of three data points and the sample mean, the
value of the fourth data point can be determined:

   x̄ = (Σx)/n = (10 + 12 + 16 + x4)/4 = 14
   Σx = 56,  so  x4 = 56 − 10 − 12 − 16 = 18
Degrees of Freedom

If only two data points and the sample mean are known:

x1 = 10   x2 = 12   x3 = ?   x4 = ?   x̄ = 14

The values of the remaining two data points cannot be uniquely
determined:

   x̄ = (Σx)/n = (10 + 12 + x3 + x4)/4 = 14
   10 + 12 + x3 + x4 = 56

Degrees of Freedom
 The number of degrees of freedom is equal to the total number of
measurements (these are not always raw data points), less the total
number of restrictions on the measurements. A restriction is a
quantity computed from the measurements.
 The sample mean is a restriction on the sample measurements, so after
calculating the sample mean there are only (n-1) degrees of freedom
remaining with which to calculate the sample variance. The sample
variance is based on only (n-1) free data points:

   s² = Σ(x − x̄)² / (n − 1)

 Example: A sample of size 10 is given below. We are to choose three


different numbers from which the deviations are to be taken. The first
number is to be used for the first five sample points; the second
number is to be used for the next three sample points; and the third
number is to be used for the last two sample points.
Sample # 1 2 3 4 5 6 7 8 9 10

Sample 93 97 60 72 96 83 59 66 88 53
Point

i. What three numbers should we choose in order to minimize the SSD
(sum of squared deviations from the mean)?

• Note:  SSD = Σ(x − x̄)²

Solution: Choose the means of the corresponding sample points. These are: 83.6,
69.33, and 70.5.

ii. Calculate the SSD with chosen numbers.

Solution: SSD = 2030.367. See table on next slide for calculations.

iii. What is the df for the calculated SSD?

Solution: df = 10 – 3 = 7.

iv. Calculate an unbiased estimate of the population variance.

Solution: An unbiased estimate of the population variance is SSD/df = 2030.367/7 =


290.05.
Example

Sample # Sample Point Mean Deviations Deviation Squared

1 93 83.6 9.4 88.36


2 97 83.6 13.4 179.56
3 60 83.6 -23.6 556.96
4 72 83.6 -11.6 134.56
5 96 83.6 12.4 153.76
6 83 69.33 13.6667 186.7778
7 59 69.33 -10.3333 106.7778
8 66 69.33 -3.3333 11.1111
9 88 70.5 17.5 306.25
10 53 70.5 -17.5 306.25
SSD 2030.367
SSD/df 290.0524
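The SSD calculation in the table can be reproduced directly from the ten sample points. This sketch groups the points as in the example (first five, next three, last two) and uses each group's mean, which minimizes that group's sum of squared deviations:

```python
points = [93, 97, 60, 72, 96, 83, 59, 66, 88, 53]
groups = [points[:5], points[5:8], points[8:]]   # the three groups from the example

ssd = 0.0
for g in groups:
    m = sum(g) / len(g)                          # group means: 83.6, 69.33, 70.5
    ssd += sum((x - m) ** 2 for x in g)

df = len(points) - len(groups)                   # 10 observations - 3 estimated means = 7
print(round(ssd, 3), round(ssd / df, 2))         # 2030.367 290.05
```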

The Sampling Distribution of the Sample Proportion

 The sample proportion is the percentage of successes in n binomial trials.
It is the number of successes, X, divided by the number of trials, n:

   Sample proportion:  p̂ = X/n

 As the sample size, n, increases, the sampling distribution of p̂
approaches a normal distribution with mean p and standard
deviation  √( p(1 − p)/n )

[Figure: histograms of the sampling distribution of p̂ for p = 0.3 with n = 2, 10, and 15,
showing the approach to normality as n increases.]

• Example: In recent years, convertible sports coupes have become very popular in Japan.
Toyota is currently shipping Celicas to Los Angeles, where a customizer does a roof lift and
ships them back to Japan. Suppose that 25% of all Japanese in a given income and lifestyle
category are interested in buying Celica convertibles. A random sample of 100 Japanese
consumers in the category of interest is to be selected. What is the probability that at least
20% of those in the sample will express an interest in a Celica convertible?

   n = 100,  p = 0.25
   E(p̂) = p = 0.25
   V(p̂) = p(1 − p)/n = (0.25)(0.75)/100 = 0.001875
   SD(p̂) = √0.001875 = 0.04330127

   P(p̂ > 0.20) = P( (p̂ − p)/SD(p̂) > (0.20 − 0.25)/0.0433 )
               = P(Z > −1.15) = 0.8749
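The Celica calculation can be sketched in Python with the normal approximation to the sample proportion. Note the slides round z to −1.15 (giving 0.8749); carrying full precision gives a probability slightly above that.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.25
sd = sqrt(p * (1 - p) / n)            # standard deviation of p-hat ≈ 0.0433
z = (0.20 - p) / sd                   # ≈ -1.1547 (rounded to -1.15 on the slides)
prob = 1 - NormalDist().cdf(z)        # P(p-hat > 0.20)
print(round(prob, 4))
```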


ESTIMATION

Lecturer, Department of Economics


 Estimation: the method of estimating the value of a population
parameter from the value of the corresponding sample statistic.
 Estimation uses a sample statistic to estimate the most likely value of its
population parameter.
 Estimation is a method that enables us to estimate, with reasonable
accuracy, the population parameter.
 The procedure for making an estimate is to draw a random sample of
size n from the known probability distribution, compute the sample
statistic, and use it as an estimate of the population parameter.
 Some Terminologies of Estimation

 An estimator is a sample statistics used to estimate a population


parameter.

 An estimator of a population parameter is a random variable that


depends on the sample information, and whose realizations provide
approximations to this unknown parameter.

 Example: The sample mean, X̄, can be an estimator of the population
mean, µ.

 An estimate is a specific observed value of a statistic or it is a specific


realization of that random variable.
 We can get an estimate by taking a sample and computing the value
taken by our estimator in that sample.

 To clarify the distinction between the terms Estimator and Estimate,


suppose that we want to compute the mean age of a student from a
sample of this section and find it to be 19 years. If we use this specific
value to estimate the age of students in the campus, the value 19 years
would be an estimate.
 Point Estimation: is a single number, which is used to estimate an
unknown population parameter.
 Point estimates for population parameters are chosen based on certain
criteria that statisticians frequently use to choose among estimators.
 The following are the standard estimators used by statisticians to
estimate these parameters.
 The sample mean X̄: is the most common estimator of the
population mean µ.
 The sample mean is unbiased and consistent. Moreover, it can be
shown that if the population is normal the sample mean is the most
efficient unbiased estimator available.
 For these reasons the sample mean is generally the preferred estimator of
the population mean. It can be computed as:  X̄ = (Σ Xᵢ)/n

 The sample variance s²: the sample
variance is an unbiased and consistent estimator of the population variance σ².
 It is relatively efficient as compared to other estimators.
 Its square root, the sample standard deviation, which is generally used as an
estimator of the population standard deviation, is also relatively efficient.
 It can be computed as:  s² = Σ(Xᵢ − X̄)² / (n − 1)
 The sample proportion p̂ = X/n: is an unbiased, consistent, and relatively
efficient estimator of the population proportion p.
 Due to this fact, it is generally the preferred estimator of the
population proportion. It is calculated as the fraction of elements in the
sample that have the characteristic of interest.
 Example: Price- earnings ratios for a random sample of ten shares
traded in Addis Ababa stock Exchange on December 27, 2011 were:
10; 16; 5; 10; 12; 8; 4; 6; 5; 4
 Find point estimates of the population mean, variance and standard
deviation and Proportion of shares in the population for which the
price earnings ratio exceeded 8.5
 Solution
 To find the first three of these sample quantities, we show the
calculations in tabular form as:
i 1 2 3 4 5 6 7 8 9 10 Total

Xi 10 16 5 10 12 8 4 6 5 4 80

Xi2 100 256 25 100 144 64 16 36 25 16 782


 The point estimate of the population mean is  X̄ = 80/10 = 8

 The point estimate of the population variance is
   s² = (ΣXᵢ² − nX̄²)/(n − 1) = (782 − 10(8)²)/9 = 142/9 ≈ 15.78

 The point estimate of the standard deviation is  s = √15.78 ≈ 3.97

 The point estimate of the population proportion is  p̂ = 4/10 = 0.4
   (four of the ten ratios, namely 10, 16, 10 and 12, exceed 8.5)
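The four point estimates can be computed in a few lines of Python from the ten price-earnings ratios in the example:

```python
from math import sqrt

ratios = [10, 16, 5, 10, 12, 8, 4, 6, 5, 4]
n = len(ratios)

xbar = sum(ratios) / n                                  # point estimate of the mean: 8.0
s2 = sum((x - xbar) ** 2 for x in ratios) / (n - 1)     # sample variance = 142/9 ≈ 15.78
s = sqrt(s2)                                            # sample standard deviation ≈ 3.97
phat = sum(1 for x in ratios if x > 8.5) / n            # proportion above 8.5 = 0.4
print(xbar, round(s2, 2), round(s, 2), phat)
```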


 Interval Estimation: An interval estimate is a range of values used
to estimate a population parameter.
 It indicates the error in two ways: By the extent of its range and the
probability of the true population parameter lying within that range.
 Instead of relying on the point estimate alone, we may construct an
interval around the point estimator, say within two or three standard
error of the mean on either side of the point estimator such that this
interval has, for instance 0.95 probability of including the true
parameter value.
 Assume that we want to find out how “close” an estimator is to the parameter.
 For this purpose, we try to find two statistics θ̂_L and θ̂_U such
that the probability statement  P(θ̂_L ≤ θ ≤ θ̂_U) = 1 − α  holds.
 Where α is the level of significance, (1 − α) is the confidence coefficient,
and such an interval is known as a confidence interval.
 The random variables θ̂_L and θ̂_U are the lower and the upper
confidence limits (or critical values), respectively.
 Calculating Confidence Interval Estimate of a Population Mean:
 Normal Population with known σ: Suppose we have a normal
population whose mean and standard deviation are µ and σ; then the
sampling distribution of the mean will be normal with mean µ and
standard error σ/√n.
 For the sampling distribution of the mean the standard normal variable is
Z = (X̄ − µ)/(σ/√n), so the interval estimate of a population mean (the
confidence interval for a population mean) is:

   X̄ ± Zα/2 · σ/√n

 Where Zα/2 is the value of the standard normal variable that is exceeded
with a probability of α/2, i.e., Zα/2 is the Z value providing an area of α/2 in
the upper tail of the standard normal probability distribution.

 Example: A normal infinite population has a standard deviation of 10.


A random sample of size 25 has a mean of 50. Construct a 95%
confidence interval of the population mean?

 Solution: given σ = 10, n = 25, X̄ = 50, confidence level = 95%
 Required: the 95% confidence interval X̄ ± Zα/2 · σ/√n. First find the standard error
of the mean, i.e., σ_X̄ = σ/√n = 10/√25 = 2.
 Then we have α = level of significance = 0.05 since 1 − α = 0.95, so Zα/2 = Z0.025 = 1.96.
 Therefore, the confidence interval estimate for the population mean (µ) is:
   50 ± 1.96(2) = 50 ± 3.92,  i.e.,  46.08 ≤ µ ≤ 53.92
 In this case we may say that “we are 95% confident that the
population mean lies within 46.08 and 53.92.”
 This statement does not mean that the chance is 0.95 that the
population mean of all the random variables falls within the interval
established from this one sample.
 Instead, it means that if we select many random samples of the same
size and if we compute a confidence interval for each of these
samples, then in about 95 percent of these cases, the population mean
will lie within that interval.
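The 95% interval in this example can be sketched in Python; `inv_cdf` supplies the critical value Zα/2 that a normal table would give:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n, conf = 50, 10, 25, 0.95
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)          # z_{alpha/2} ≈ 1.96
half = z * sigma / sqrt(n)                            # margin of error ≈ 3.92
print(round(xbar - half, 2), round(xbar + half, 2))   # 46.08 53.92
```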

 Example 2: The mean annual income of Ethiopian Air lines Workers


(EAL) workers is supposed to be 24,000 Birr. Assume that this
estimate was based on a sample of 250 airline workers and the
population standard deviation was 5000 Birr. Then

a) Compute the 95% confidence interval for the population mean?

b) Construct the 90% confidence interval for the population mean?


 Solution: given X̄ = 24,000 Birr, σ = 5,000 Birr, n = 250
 Required: (a) the 95% and (b) the 90% confidence intervals, X̄ ± Zα/2 · σ/√n.
   Standard error: σ_X̄ = 5,000/√250 ≈ 316.23
 a) 95%: 24,000 ± 1.96(316.23) = 24,000 ± 619.8, i.e., 23,380.2 ≤ µ ≤ 24,619.8
 b) 90%: 24,000 ± 1.645(316.23) = 24,000 ± 520.2, i.e., 23,479.8 ≤ µ ≤ 24,520.2

 Note that:
 A narrower confidence interval is more precise
 Larger samples give more precise estimates
 Small variance leads to more precise estimates
 Lower confidence coefficients allow us to construct more precise
estimate
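Example 2 (the airline-workers data) can be worked the same way for both confidence levels in one loop:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 24000, 5000, 250
se = sigma / sqrt(n)                               # standard error ≈ 316.23 Birr

results = {}
for conf in (0.95, 0.90):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # 1.96 and 1.645
    results[conf] = (xbar - z * se, xbar + z * se)

for conf, (lo, hi) in results.items():
    print(f"{conf:.0%} CI: {lo:.1f} to {hi:.1f}")
```

Note how the 90% interval is narrower than the 95% one, matching the remark that lower confidence coefficients give more precise estimates.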
 Calculating Confidence Interval Estimate of a Population Mean:
Normal Population with Unknown σ
 When the population standard deviation σ is unknown, we use the
sample standard deviation s as an estimate of σ. The sample
standard deviation is given by:  s = √( Σ(Xᵢ − X̄)² / (n − 1) )

 Thus, the estimated standard deviation of the sampling distribution of the
sample means is given by:  σ̂_X̄ = s/√n
 In this case, the construction of the confidence interval estimate depends
upon whether the sample size is large or small:
 Case 1: When the sample size is large and σ is unknown (a sample size
is large when n > 30).
 The confidence interval estimate for the population mean (µ) is given by:
   X̄ ± Zα/2 · s/√n
 Case 2: When the sample size is small and σ is unknown (a sample size
is small when n ≤ 30).
 The confidence interval estimate for the population mean (µ) is given by:
   X̄ ± tα/2 · s/√n
 Where tα/2 is the t-value providing an area of α/2 in the upper tail of a
t-distribution with (n − 1) degrees of freedom, and s is the sample standard
deviation.
 The t-distribution is a family of similar probability distributions,
with a specific t- distribution depending on a parameter known as the
degrees of freedom.
 As the number of degrees of freedom increases, the difference
between the t-distribution and standard normal probability becomes
smaller and smaller, and the t-distribution will have less dispersion.
 The t-distribution is symmetrical, bell-shaped and has zero
as its mean. We use:
 The t-distribution if n ≤ 30 (and σ is unknown)
 The Z-distribution if n > 30

 Example: Sales personnel for Beer factory are required to submit
weekly reports listing the customer contacts made during the week. A
sample of 28 weekly contact reports showed a mean of 22.4 customer
contacts per week for the sales personnel. The sample standard
deviation was five contacts.
a) Develop a 95% confidence interval for the sale personnel?
b) Develop a 90% confidence interval for the sale personnel?
 Solution: given n = 28, X̄ = 22.4, s = 5, df = n − 1 = 27
 Required: the 95% and 90% confidence intervals, X̄ ± tα/2 · s/√n:
a) 95%: 22.4 ± 2.052(5/√28) = 22.4 ± 1.94, i.e., 20.46 to 24.34 contacts
b) 90%: 22.4 ± 1.703(5/√28) = 22.4 ± 1.61, i.e., 20.79 to 24.01 contacts
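The t-based intervals for the contact-report example can be sketched as follows. Python's standard library has no t quantile function, so the critical values for df = 27 are taken from a t table (an assumption here; with SciPy one could compute them via `scipy.stats.t.ppf`):

```python
from math import sqrt

xbar, s, n = 22.4, 5, 28
se = s / sqrt(n)                        # estimated standard error ≈ 0.945 contacts

# t critical values for df = 27, taken from a t table (assumption)
t_crit = {0.95: 2.052, 0.90: 1.703}

for conf, t in t_crit.items():
    half = t * se
    print(f"{conf:.0%} CI: {xbar - half:.2f} to {xbar + half:.2f}")
```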

 Example 2: In the testing of a new production method, 18 employees were
selected randomly and asked to try the new method. The sample mean
production rate for the 18 employees was 80 parts per hour and the sample
standard deviation was 10 parts per hour. Provide 90% and 95% confidence
intervals for the population mean production rate for the new method,
assuming the population has a normal probability distribution.

 Solution: given n = 18, X̄ = 80, s = 10, df = n − 1 = 17
 Required: the 90% and 95% confidence intervals, X̄ ± tα/2 · s/√n:
   90%: 80 ± 1.740(10/√18) = 80 ± 4.10, i.e., 75.90 to 84.10 parts per hour
   95%: 80 ± 2.110(10/√18) = 80 ± 4.97, i.e., 75.03 to 84.97 parts per hour


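The production-rate example follows the same pattern, now with df = 17; the t critical values are again table values assumed here rather than computed:

```python
from math import sqrt

xbar, s, n = 80, 10, 18
se = s / sqrt(n)                        # estimated standard error ≈ 2.357 parts per hour

# t critical values for df = 17 from a t table (assumption)
t_crit = {0.90: 1.740, 0.95: 2.110}

for conf, t in t_crit.items():
    print(f"{conf:.0%} CI: {xbar - t * se:.2f} to {xbar + t * se:.2f}")
```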

 Determining the Sample Size in Estimating Population Mean

 Some sampling error will arise because we have not studied the whole
population.

 Whenever we sample, we always miss some helpful information about


the population. If we want a high level of precision (that is, if we want
to be quite sure of our estimate).

 We have to sample enough of the population to provide the required


information. Sampling error is controlled by selecting a sample size
that is adequate. How is this adequate sample size determined for any
specified level of precision or confidence?

 A method to determine an adequate sample size is dealt with below.
 The confidence interval for the population mean is X̄ ± Zα/2 · σ/√n.
In this case the sampling error is Zα/2 · σ/√n or less, so let us denote this
by e, which is the maximum tolerable sampling error for some specified
level of confidence (1 − α):

   e = Zα/2 · σ/√n   so that   n = (Zα/2 · σ / e)²

 Where e is the maximum sampling error at the specified level of
precision (1 − α).
Example: The CSA of Ethiopia has past data that indicate the
interview time for a consumer opinion study has a standard deviation of
6 minutes.
a) How large a sample should be taken if the authority desires 98%
confidence that the mean interview time is within 2 minutes or
less?
b) Assume that the sample size recommended in (a) above is taken and
that the mean interview time for the sample is 32 minutes. What is the
98% confidence interval estimate for the mean interview time for the
population?
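The sample-size formula n = (Zα/2 · σ / e)², rounded up, can be sketched for the CSA example:

```python
from math import sqrt, ceil
from statistics import NormalDist

sigma, e, conf = 6, 2, 0.98
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{0.01} ≈ 2.326
n = ceil((z * sigma / e) ** 2)                 # round up so precision is guaranteed
print(n)                                       # 49

# part (b): 98% CI around the observed sample mean of 32 minutes
xbar = 32
half = z * sigma / sqrt(n)                     # ≈ 1.99 minutes
print(round(xbar - half, 2), round(xbar + half, 2))
```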
Properties of Estimators
 There are various methods with which we may obtain estimates of the
parameters of economic relationships.

 How are we to decide whether an estimate is ‘good’ or whether it is


better than another obtained from a different method?

 We need some criteria for judging the goodness of the estimate. The
criteria or properties for a good estimator may be different for small
and large samples.
A) Small sample properties
1. Unbiasedness: An estimator is said to be unbiased if the expected
value of the estimator is equal to the true population parameter, i.e. if
E(θ̂) = θ. It is biased if E(θ̂) ≠ θ and unbiased if E(θ̂) = θ,
where θ̂ is an estimator of θ and θ is a population parameter.


2. Minimum variance (Best Estimator): An estimator is best when it
possesses the smallest variance as compared to any other estimator
which is obtained by various methods.
3. Efficiency, Efficient estimator: An estimator is efficient when it
possesses both the aforementioned properties (unbiasedness,
minimum variance)
4. Minimum Mean Square Estimator (MSE): An estimator is a MSE
if it has the smallest mean square error defined as the expected value
of the square differences around the true population parameter.
5. Sufficiency, Sufficient Estimator: An estimator is said to be
sufficient if it utilizes all the information a sample contains about the
true parameter i.e. it must use all the observations of a sample
6. Best, Linear and Unbiased Estimator (BLUE): An estimator is
BLUE if it is best, linear and unbiased. An estimator is linear if it is a
linear function of the sample observation.
B) Large Sample Properties (Asymptotic Properties)
1. Asymptotic Unbiasedness: An estimator is asymptotically unbiased if
   lim(n→∞) E(θ̂) = θ.
2. Consistency: An estimator is consistent if it satisfies the following
conditions:
i. The estimator must be asymptotically unbiased
ii. The variance of the estimator must approach zero as n approaches
infinity, i.e., lim(n→∞) Var(θ̂) = 0
3. Asymptotic Efficiency: An estimator is asymptotically efficient if:
i. The estimator is consistent
ii. The estimator has a smaller asymptotic variance as compared to any
other consistent estimator.
Lecture VII
HYPOTHESIS TESTING
INTRODUCTION
 When a sample is drawn from a population, the evidence
obtained can be used to make inferential statements about the
characteristics of the population. As we have seen, one
possibility is to estimate the unknown population parameters
through the calculation of point estimates or confidence
intervals. Alternatively, the sample information can be
employed to assess the validity of some conjecture, or
hypothesis, that an investigator has formed about the
population. A hypothesis is formed about some population,
and conclusions about the merits of this hypothesis are to be
formed on the basis of sample information.
Some terminologies in hypothesis testing
 Alternative hypothesis (Ha): the complement of the null
hypothesis, or the hypothesis one accepts if he/she rejects the
null hypothesis. The alternative hypothesis can take one of
three forms: greater than, less than, or different from.
 A hypothesis against which the null hypothesis is tested, and which
will be held to be true if the null is held false.
 Simple Hypothesis: A hypothesis that specifies a single value for a
population parameter
 Composite Hypothesis: A hypothesis that specifies a range of value
for a population parameter
 One Sided Alternative: An alternative hypothesis involving all
possible values of a population parameter on either one side or the
other of (that is, either greater than or less than) the value specified by
a simple null hypothesis.
 Two Sided Alternatives: An alternative hypothesis involving all
possible values of a population parameter other than the value
specified by a simple null hypothesis.
 Hypothesis Test Decisions: A decision rule is formulated, leading the
investigator to either accept or reject the null hypothesis on the basis
of sample evidence.
Cont’d
 One tailed and two tailed test: the test of hypothesis in which
the alternative hypothesis is given by the statement of greater
than or less than is called one tailed test because it is given in
one direction. If the alternative hypothesis is given by the
statement "different from" the test is called a two tailed test.

 Critical value: the demarcation point between the acceptance


and the rejection region. If the test statistic value falls in the
acceptance region, the null hypothesis is accepted; if its value
falls in the rejection region the null hypothesis is rejected.
Cont’d
 Test statistics: the value that is computed from sample results and to
be compared with the critical value to conclude either to accept or
reject the null hypothesis.

 Type I error: If we strongly reject the null hypothesis which is


actually true, we commit type I error. The rejection of a true null
hypothesis

 Type II error: If one fails to reject the null hypothesis which is


actually false, the error so committed is a type II error. The acceptance
of a false null hypothesis
 Significance Level: the probability of rejecting a null hypothesis which is
true (this probability is sometimes expressed as a percentage)
Cont’d
 Power of test: The probability of rejecting a null hypothesis which is false.

 Level of significance (α): It is the probability that determines the level of
confidence for our conclusion. It establishes the criterion for the
rejection or acceptance of the null hypothesis.

 The P-value: Observed significance level or p-value is the probability of
observing a value of the test statistic that is at least as contradictory
to the null hypothesis as the one computed, when the null hypothesis is
true. The smaller the p-value, the stronger the evidence against the null
hypothesis provided by the data. If the p-value is as small as or smaller
than the α level, we say that the data are statistically significant at level α.
Steps In Hypothesis Testing
 Step 1: State the hypotheses. We begin by stating the value of a
population mean in a null hypothesis, which we presume is true.

 Step 2: Set the criteria for a decision. To set the criteria for a
decision, we state the level of significance for a test.

 Step 3: Determining the test distribution to use depends on the


standard deviation and sample size (Z-score test or T-test).

 Step 4: Defining the rejection or critical region to accept or


reject the null hypothesis.

 Step 5: Compute the test statistic.

 Step 6: Make a decision.


Large sample tests
 Large sample tests are used to examine more variables and larger sample
sizes. For a large sample, a z-test is used, while for a small sample, a t-test is
used.

 The steps of testing a hypothesis include setting up the null hypothesis,


setting up the alternative hypothesis, calculating test statistics, determining
the table value of test statistics, and concluding whether the null hypothesis
is accepted or rejected.

 Other testing options include the chi-square test and the f-test.

 A large t-score indicates that the groups are different, while

 A small t-score indicates that the groups are similar.

 The correlated t-test is performed when the samples consist of matched


pairs of similar units or when there are cases of repeated measures.
Types of Hypothesis
 The null hypothesis is usually denoted by H0 and the alternative
hypothesis is denoted by H1 (or Ha).
 Simple Hypothesis: A hypothesis, whether null or alternative, might
specify just a single (specific) value, say µ0, for the population
parameter µ. For example, if the null hypothesis is simple, it can be
stated as:  H0: µ = µ0
 Composite hypothesis: a hypothesis could specify a range of values
for the unknown population parameter. And will hold true for more
than one value of the population parameter.
 For instance, the null hypothesis that the mean weight of boxes of
cereal is at least 20 ounces is composite.
 The hypothesis is true for any population mean weight greater than or
equal to 20 ounces.
 In many applications, a simple null hypothesis, say H0: µ = µ0, is
tested against a composite alternative.
a. One-sided alternatives: In some cases, only alternatives on one side
of the null hypothesis are of interest.
 For example, we might want to test the null hypothesis H0: µ = µ0
against the alternative hypothesis that the true value of µ is bigger
than µ0, which we can write as:  H1: µ > µ0
 Conversely, the alternative of interest might be H1: µ < µ0. Such
alternative hypotheses are called one-sided alternatives.
 Two-sided alternative: sometimes we may want to test the simple
null hypothesis H0: µ = µ0 against the very general alternative that
the true population mean is different from µ0, that is, H1: µ ≠ µ0.
This is referred to as a two-sided alternative.
 Having specified a null and alternative hypothesis and collected
sample information, a decision concerning the null hypothesis must
be made.
 The two possibilities are to accept the null hypothesis, or reject it
in favor of the alternative.
 In order to reach one of these conclusions, some decision rule, based
on the sample evidence, has to be formulated.
 If all that is available is sample data from the population, then the
population parameters will not be precisely known.
 There are two possible states of nature-either the null hypothesis is
true or it is false.
 One error that could be made, called a Type I error is the rejection
of a true null hypothesis.
 If the decision rule is such that the probability of rejecting the null
hypothesis when it is true is α, then α is said to be the significance
level of the test.
 Since the null hypothesis must either be accepted or rejected, it
follows that the probability of accepting the null hypothesis when it is
true is (1 − α).

 The other possible error, called a Type II error, arises when a false
null hypothesis is accepted.

 Suppose that, for a particular decision rule, the probability of
making such an error when the null hypothesis is false is denoted by
β.

 Then, the probability of rejecting a false null hypothesis is (1 − β),
which is called the power of the test.
                                   Null hypothesis is true       Null hypothesis is false
Decision concerning    Accept      Correct decision              Type II error
the null hypothesis                (with a prob. of 1 − α)       (with a prob. of β)
                       Reject      Type I error                  Correct decision
                                   (with a prob. of α)           (with a prob. of 1 − β)

 Ideally, of course, we would like to have the probabilities of both
types of error be as small as possible. However, there is clearly a
trade-off between the two.
 Once a sample has been taken, any adjustment to the decision rule that
makes it less likely to reject a true null hypothesis will inevitably
render it more likely to accept this hypothesis when it is false.
 Specifically, suppose we want to test, on the basis of a random sample,
the null hypothesis that the true mean weight of the contents of boxes
of cereal is at least 20 ounces.
 Given a specific sample size- say, n = 30 observations we might
adopt the decision rule that the null hypothesis is rejected if the
sample mean weight is less than 18.5 ounces.
 Now, it is easy to find a decision rule for which the probability of
Type I error is lower.
 If we modify our decision rule to “Reject null hypothesis if sample
mean weight is less than 18 ounces” this objective will have been
achieved.
Large Sample Run Test
 Hypothesis testing involving large samples (n>30) is based on
the assumption that the population from which the sample is
drawn has a normal distribution. Consequently the sampling
distribution of mean x is also normal. Even if the population
does not have a normal distribution, the sampling distribution
of mean x is assumed to be normal due to the central limit
theorem because the sample size is large.
Large Sample Run Test
 The procedure for large sample test involves the following steps.
1. Set up the null hypothesis and the alternative hypothesis.
2. Choose the level of significance (alpha) based on the reliability and
risk of the test.
3. Determine whether the test is one-tailed or two-tailed based on the
alternative hypothesis.
4. Compute the test statistic using the population or sample standard
deviation and the standard error formula.
5. Compare the test statistic with the critical value or the p-value to
make a decision.
Large Sample Tests
 Sample Size Should be Large: as the size of a sample increases,
it becomes more and more representative of the parent population
and shows its characteristics. However, in actual practice, large
samples are more expensive. Thus a balance has to be
maintained between the sample size, the degree of accuracy
desired and the financial resources available.
 For a large sample, at a specified level of significance:
 Reject H0 if the computed value Zcal ≥ the critical value Zα
 Otherwise accept H0
Large Sample Tests
 A sample of size n ≥ 30 is generally considered to be a large
sample for statistical analysis whereas a sample of size n < 30 is
considered to be a small sample.
 It may be noted from the formula σ_x̄ = σ/√n that its value tends to
be smaller as the size of sample n increases, and vice-versa.
 When the standard deviation σ of the population is not known, the
standard deviation s of the sample, which closely approximates the
σ value, is used to compute the standard error, that is, σ̂_x̄ = s/√n.
 There’s a price to be paid, using the modified decision rule, we will be
more likely to accept the null hypothesis, whether it is true or false. Thus,
in decreasing the type I error probability, we have increased the type II
error probability.
 The only way of simultaneously lowering both error probabilities would
be to obtain more information about the true population mean, by taking
a larger sample. Typically what is done in practice is to fix at some
desired level the probability of making a type I error, that is; the
significance level is fixed. This then determines the appropriate decision.
 In fixing a significance level, generally at some small probability, we
are ensuring that the chance is low that a true null hypothesis will be
rejected.
Testing about the population mean
 Case I: Tests of the Mean of a Normal Distribution: Population
Variance Known
 If the sample size is large (n≥30) or the population standard
deviation is known the normal distribution is used for testing
about a mean.
 Suppose we have a random sample of n observations from a normal
population with mean µ and known variance σ².
 If the observed sample mean is x̄, then a test with significance level α
of the null hypothesis H0: µ = µ0 against H1: µ > µ0 rejects H0 if

   Z = (x̄ − µ0)/(σ/√n) > Zα

 where Z is a standard normal
random variable.
 Example: When a process producing ball is operating correctly, the
weights of the balls have a normal distribution with mean 5 gram and
standard deviation 0.1 gram. An adjustment has been made to the
process, and the plant manager suspects this has raised the mean
weight of ball produced, leaving the standard deviation unchanged. A
random sample of sixteen balls is taken, and their mean weight is
found to be 5.038 gram. Test at significance levels of 0.05 & 0.10
(that is, at 5% and 10% levels) the null hypothesis that the population
mean weight is 5 gram against the alternative that it is bigger.
 Solution: given µ0 = 5, σ = 0.1, n = 16, x̄ = 5.038
 The decision rule is to reject H0 in favor of H1 if Z > Zα, where
   Z = (x̄ − µ0)/(σ/√n) = (5.038 − 5)/(0.1/√16) = 0.038/0.025 = 1.52
 At α = 0.05, Z0.05 = 1.645. Since 1.52 does not exceed 1.645, we fail to
reject the null hypothesis at the 5% level of significance; that is, the null
hypothesis is accepted at this significance level.
 However, at α = 0.10, Z0.10 = 1.28, and since 1.52 is bigger than 1.28, the
null hypothesis is rejected at the 10% level of significance. To this
extent, then, there is some evidence in the data to suggest that the true
mean weight exceeds 5 gram.
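The ball-weight test can be sketched numerically; the one-sided p-value computed here is the 6.43% figure discussed below the example:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 5, 0.1, 16, 5.038
z = (xbar - mu0) / (sigma / sqrt(n))    # test statistic = 1.52
p_value = 1 - NormalDist().cdf(z)       # one-sided p-value ≈ 0.0643

print(round(z, 2), round(p_value, 4))
print(z > 1.645, z > 1.28)              # reject at 5%? at 10%?
```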
 Obviously the lower the significance level at which a null hypothesis
can be rejected, the greater the doubt cast on its truth.
 Rather than testing hypotheses at pre-assigned levels of significance,
investigators often determine the smallest level of significance at
which a null hypothesis can be rejected.
 The smallest significance level at which a null hypothesis can be
rejected is called the probability- value, or P-value, of the test.
 In the above example we found that Z = 1.52. Therefore, according
to our decision rule, the null hypothesis is rejected for any significance
level for which z_α is less than 1.52. From the standard normal table
we find that, when z is 1.52, P(Z > 1.52) is equal to 0.0643 = 6.43%.
This, then, is the P-value of the test. The implication is that the null
hypothesis can be rejected at all levels of significance higher than 6.43%.
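The P-value can be computed directly rather than read from a table; a minimal sketch for the ball-weight example:

```python
from statistics import NormalDist
import math

# P-value of the upper-tail test: the area to the right of the
# observed Z under the standard normal curve.
z = (5.038 - 5.0) / (0.1 / math.sqrt(16))   # observed statistic
p_value = 1 - NormalDist().cdf(z)           # P(Z > 1.52)
print(round(z, 2), round(p_value, 4))       # 1.52 0.0643
```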
 The appropriate procedure for testing, at a given significance level, the
composite null hypothesis H₀: μ ≤ μ₀ against the alternative hypothesis
H₁: μ > μ₀ is precisely the same as when the null hypothesis is H₀: μ = μ₀.
 Case II: Tests of the Mean of a Normal Distribution: Population
Variance Known (Lower-Tail Test)
 Suppose we have a random sample of n observations from a normal
population with mean μ and known variance σ². If the observed
sample mean is X̄, then a test with significance level α of either
null hypothesis H₀: μ = μ₀ or H₀: μ ≥ μ₀ against the alternative
H₁: μ < μ₀ is obtained from the decision rule: reject H₀ if
 Z = (X̄ − μ₀)/(σ/√n) < −z_α
 Case III: Tests of the Mean of a Normal Distribution: Population
Variance Known (Two-Sided Test)
 Suppose we have a random sample of n observations from a normal
population with mean μ and known variance σ².
 If the observed sample mean is X̄, then a test with significance level
α of the null hypothesis H₀: μ = μ₀ against the two-sided alternative
H₁: μ ≠ μ₀ is obtained from the decision rule: reject H₀ if
 Z = (X̄ − μ₀)/(σ/√n) > z_{α/2} or Z < −z_{α/2}
 Example: A drill, as part of an assembly line operation, is used to drill
holes in sheet metal. When the drill is functioning properly, the
diameters of these holes have a normal distribution with mean 2
inches and standard deviation 0.06 inch. Periodically, to check that the
drill is functioning properly, the diameters of a random sample of
holes are measured. Assume that the standard deviation does not vary.
A random sample of nine measurements yield mean diameter 1.95
inches. Test the null hypothesis that the population mean is 2 inches
against the alternative that it is not. Use a 5% significance level and
also find the P-value of the test.
 Solution: given μ₀ = 2, σ = 0.06, n = 9, X̄ = 1.95; H₀: μ = 2
against H₁: μ ≠ 2.
 The decision rule is to reject H₀ in favor of H₁ if
 Z > z_{α/2} or Z < −z_{α/2}, where z_{0.025} = 1.96
 Where Z = (X̄ − μ₀)/(σ/√n) = (1.95 − 2)/(0.06/√9) = −2.50
 Then, since – 2.50 is less than -1.96, the null hypothesis is rejected at
the 5% significance level.
 According to the decision rule, the null hypothesis will be rejected for
any significance level α for which −z_{α/2} is bigger than −2.50, that is,
for which z_{α/2} is less than 2.50. From the standard normal table, we
see that when z is 2.50, P(Z > 2.50) is equal to 0.0062. Hence, the
P-value is 2 × 0.0062 = 0.0124, implying that the null hypothesis can be
rejected against the two-sided alternative at any level of significance
greater than 1.24%. This certainly casts substantial doubt on the
hypothesis that the drill is functioning correctly.
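The drill example can likewise be verified with a short Python sketch (the helper name two_sided_z_test is ours, not from the text):

```python
from statistics import NormalDist
import math

def two_sided_z_test(xbar, mu0, sigma, n, alpha):
    """Two-sided z-test of H0: mu = mu0 against H1: mu != mu0.
    Returns the statistic, the reject decision, and the two-sided P-value."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails
    return z, abs(z) > z_crit, p_value

# Drill example: n = 9, sigma = 0.06, sample mean 1.95, mu0 = 2
z, reject, p = two_sided_z_test(1.95, 2.0, 0.06, 9, 0.05)
print(round(z, 2), reject, round(p, 4))   # -2.5 True 0.0124
```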
 Tests for the Mean of Large Sample Sizes
 Suppose we have a random sample of n observations from a population
with mean μ and variance σ². If the sample size n is large (n > 30), the
test procedures developed for the case where the population variance is
known can be employed when it is unknown, replacing σ² by the
observed sample variance s². Moreover, these procedures remain
approximately valid even if the population distribution is not normal.
 Example: It might be suspected that the firms most likely to attract
take-over bids are those that have been achieving relatively poor
returns. One measure of such performance is through “abnormal
returns,” which average 0 over all firms. A random sample of 88 firms
for which cash tender offers had been made showed abnormal returns
with a mean of -0.0029 and a standard deviation of 0.0169 in the
period from 24 months to 4 months prior to the take-over bids. Test
the null hypothesis that the mean abnormal return for this population
is 0 against the alternative that it is negative.
 Solution: given μ₀ = 0, n = 88, X̄ = −0.0029, s_x = 0.0169; H₀: μ = 0
against H₁: μ < 0.
 The decision rule is to reject H₀ in favor of H₁ if
 Z < −z_α
 Where Z = (X̄ − μ₀)/(s_x/√n) = (−0.0029 − 0)/(0.0169/√88) = −1.61
 According to our decision rule, the null hypothesis is rejected for any
significance level α for which −z_α is bigger than −1.61. From the
standard normal table, we see that when z is 1.61, P(Z > 1.61) is 0.0537.
 Hence, the null hypothesis is rejected at any significance level bigger
than 5.37%.Thus, the probability of observing sample mean abnormal
returns as low as or lower than those actually observed would be
0.0537 if the true mean abnormal returns for all firms attracting take-
over bids were 0. The data suggest quite strongly that, on average,
abnormal returns are lower for such firms.
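The large-sample computation above can be reproduced with the standard library; a minimal sketch for the abnormal-returns example:

```python
from statistics import NormalDist
import math

# Large-sample lower-tail test: H0: mu = 0 against H1: mu < 0,
# with the sample standard deviation s substituted for the unknown sigma.
n, xbar, s = 88, -0.0029, 0.0169
z = (xbar - 0) / (s / math.sqrt(n))
p_value = NormalDist().cdf(z)           # lower-tail area P(Z < z)
print(round(z, 2), round(p_value, 4))   # -1.61 0.0537
```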
 Tests of the Mean of Normal Distribution: Population Variance
Unknown
 Suppose we have a random sample of n observations from a normal
population with mean μ. If the observed sample mean and standard
deviation are X̄ and s_x, then the following tests have significance
level α:
 To test either null hypothesis H₀: μ = μ₀ or H₀: μ ≤ μ₀ against the
alternative H₁: μ > μ₀, the decision rule is: reject H₀ if
 t = (X̄ − μ₀)/(s_x/√n) > t_{n−1, α}
 To test either null hypothesis H₀: μ = μ₀ or H₀: μ ≥ μ₀ against the
alternative H₁: μ < μ₀, the decision rule is: reject H₀ if
 t = (X̄ − μ₀)/(s_x/√n) < −t_{n−1, α}
 To test the null hypothesis H₀: μ = μ₀ against the two-sided alternative
H₁: μ ≠ μ₀, the decision rule is: reject H₀ if
 t > t_{n−1, α/2} or t < −t_{n−1, α/2}
 where t_{n−1, α} is the value for which P(t_{n−1} > t_{n−1, α}) = α, and
t_{n−1} has a Student's t distribution with n − 1 degrees of freedom.
 Example: A retail chain knows that, on average, sales in its
stores are 20% higher in December than in November. For the
current year, a random sample of six stores was selected. Their
percentage December sales increases were found to be: 19.2;
18.4; 19.8; 20.2; 20.4 and 19.0. Assuming a normal population
distribution, test the null hypothesis that the true mean
percentage sales increase is 20, against the two-sided
alternative, at the 10% significance level.
 Solution: given the data we can find X̄ = 19.5 and s_x ≈ 0.7668;
also μ₀ = 20, n = 6; H₀: μ = 20 against H₁: μ ≠ 20.
 The decision rule is to reject H₀ in favor of H₁ if
 t > t_{5, 0.05} = 2.015 or t < −t_{5, 0.05} = −2.015
 Where t = (X̄ − μ₀)/(s_x/√n) = (19.5 − 20)/(0.7668/√6) ≈ −1.597
 Thus, since –1.597 lies between - 2.015 and 2.015, the null hypothesis
that the true mean percentage increase is 20 is accepted at the 10%
level. The evidence in the data against this hypothesis is not terribly
strong.
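The t statistic for this example can be checked in Python; since the standard library has no t distribution, the critical value t_{5, 0.05} = 2.015 is taken from the table, as in the text:

```python
import math
from statistics import mean, stdev

# December percentage sales increases from the example
data = [19.2, 18.4, 19.8, 20.2, 20.4, 19.0]
n = len(data)
xbar, s = mean(data), stdev(data)      # stdev uses the n-1 divisor
t = (xbar - 20) / (s / math.sqrt(n))   # t statistic with 5 degrees of freedom
t_crit = 2.015                         # t_{5, 0.05} read from a t table
reject = abs(t) > t_crit               # two-sided test at the 10% level
print(round(xbar, 1), round(t, 3), reject)   # 19.5 -1.597 False
```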
THANKS!!!
END!!!