Lecture Transcript 1 (Random Variables and Their Probability Distributions)
Learning Objectives:
In the previous lecture, we discussed the basic concepts of probability. Now, we are going to learn about
another concept: random variables and their corresponding probability distributions.
After reading this transcript, it is hoped that the knowledge acquired on the topic will help us form a greater
understanding of how to model and describe phenomena that are random in nature.
Lecture 1.1.
Introduction to Random Variables
Introduction
Recall that an experiment is an activity that produces outcomes or generates data. Further, note that each of these
outcomes has a corresponding chance (or probability) of occurring. Some examples are a) tossing three coins
and counting the number of heads, b) recording how long a person can squat before he/she gets exhausted, and
c) counting how many students attend class today. In this lecture, we will talk about a way to map the
outcomes of these statistical experiments, which occur with certain probabilities, to numbers.
Lesson Proper
Def. A random variable is a variable that assumes real-number values derived from the outcomes
of an experiment. Alternatively, a random variable is a function that maps each outcome of a random
process to a numeric value. That is,
𝑋: Outcome → Number
• 𝑊 = The playing time of an amateur basketball player per game.
Note that random variables are classified as discrete or continuous according to the values they assume.
The random variables 𝑋 and 𝑌 stated above are considered discrete random variables since they can assume
only a countable number of values (for example, a finite list such as 0, 1, 2, 3). Given this, we have the following definition:
Meanwhile, the random variables 𝑍 and 𝑊 are considered continuous because they can assume any value in an
interval of real numbers, which is an uncountably infinite set. This leads us to the following definition:
Example 1.1.1. Classify each of the following random variables as DISCRETE or CONTINUOUS:
As discussed previously, a random variable maps a random outcome to a number. In this part of the
lecture, we will learn how to identify the possible outcomes (also known as writing the sample space of the
random variable) and the corresponding values of the random variable.
Example 1.1.2. A box contains two (2) balls, one white and one black. Two balls are picked one at a time with
replacement. Using the table below, list all the possible outcomes and the values of the random variable 𝑋
representing the number of white balls drawn.
Solution to Example 1.1.2. Let 𝑋 represent the number of white balls drawn. First, we create a table with
two columns: Possible Outcomes and Number of White Balls (𝑋).
After that, we list the possible outcomes of the experiment. Clearly, if two balls are picked one at a time
with replacement, the possible outcomes are WW (White, White), WB (White, Black), BW (Black, White),
and BB (Black, Black).
Lastly, we determine the value of the random variable for each possible outcome:
Possible Outcomes    Number of White Balls (𝑋)
WW                   2
WB                   1
BW                   1
BB                   0
Remark. There are various ways to list the possible outcomes and values of a random variable, and
different texts may present them differently. The tabular format used here is only one of the many ways
to show the sample space and the values of the random variable.
Example 1.1.3. Three coins are tossed. List all the possible outcomes and the values of the random
variable 𝑌 representing the number of heads that occur.
Example 1.1.4. In a family with three children, list all the possible outcomes of their sexes and the
values of the random variable 𝑍 representing the number of male children.
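The enumerate-and-count procedure behind Examples 1.1.2 to 1.1.4 can be sketched in Python. This is a minimal illustration, not part of the original lecture: the helper `list_outcomes` and the one-letter codes (W/B for ball colors, H/T for coin faces, M/F for male/female) are our own assumptions. Each ordered outcome is generated with `itertools.product`, and the random variable is applied to it as a function that counts one symbol.

```python
from itertools import product


def list_outcomes(symbols, n, counted):
    """Return (outcome, value) pairs for n ordered trials.

    symbols: the possible results of a single trial, e.g. "WB" (hypothetical codes).
    counted: the symbol whose occurrences the random variable counts.
    """
    rows = []
    for outcome in product(symbols, repeat=n):
        text = "".join(outcome)
        # The random variable maps each outcome to a number.
        rows.append((text, text.count(counted)))
    return rows


# Example 1.1.2: two draws with replacement, X = number of white balls.
for outcome, x in list_outcomes("WB", 2, "W"):
    print(outcome, x)

# Example 1.1.3: three coins, Y = number of heads (8 outcomes).
coins = list_outcomes("HT", 3, "H")
# Example 1.1.4: three children, Z = number of male children.
children = list_outcomes("MF", 3, "M")
```

The outcomes come out in the same order as the table for Example 1.1.2 (WW, WB, BW, BB).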
Supplementary Exercises
List all possible outcomes and find the values of the random variable:
1. A manufacturer produces laptops. Suppose three units are tested by the quality assurance team, and
the team wants to find the number of defective units. Let D represent a defective unit and N
a non-defective unit. Using the table below, show the values of the random variable 𝑋 representing the
number of defective units.
Possible Outcomes    Number of Defective Laptops (𝑋)
2. In an experiment, four coins are tossed. Let 𝑀 be the random variable representing the number of tails
that occur. Find the values of the random variable 𝑀.
Lecture 1.2.
Discrete Probability Distribution
Introduction
After learning the concept of random variables, we will now learn how to construct the probability
distribution of a discrete random variable. Our understanding of the probability of an event will be
important in this lesson.
Lesson Proper
By convention, the notation used for the probability distribution of a discrete random variable is 𝑃(𝑋)
or 𝑃(𝑋 = 𝑘), where 𝑘 represents any of the values of the random variable.
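Remark.
Let us recall that,
a) Theoretical Probability is the ratio of the number of desired outcomes to the total number of possible
outcomes. Its formula is:
$P(E) = \frac{n(E)}{n(S)}$
Where:
$E$ is any event, and $n(E)$ is the number of desired outcomes
$S$ is the sample space, and $n(S)$ is the total number of possible outcomes
b) Experimental (or Observed/Empirical) Probability is the likelihood of an event determined by repeating an
experiment and observing the outcomes. Its formula is:
$P(E) = \frac{f}{N}$
where $f$ is the frequency of the desired observation and $N$ is the total number of observations.
Moreover, a probability distribution must satisfy the following properties:
a) The probability of each event in the sample space must be between 0 and 1, inclusive.
$0 \le P(X) \le 1$
b) The sum of the probabilities of all the events in the sample space must be equal to 1 (or, in
percentages, 100%).
$\sum P(X) = 1$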
Example 1.2.1. A box contains 2 balls, one red and one blue. Two balls are picked one at a time with
replacement. Letting 𝑋 be the number of red balls drawn, construct the probability distribution table of 𝑋.
Further, the sample space for the random variable is 𝑆 = { (blue, blue), (blue, red), (red, blue), (red, red) }.
This means that the number of outcomes in the sample space, 𝑛(𝑆), is equal to 4. From here, we create a table
containing the following information:
𝑋    𝑃(𝑋)
0
1
2
Now, we compute the corresponding theoretical probabilities:
𝑋    𝑃(𝑋)
0    $P(0) = \frac{1}{4} = 0.2500$
1    $P(1) = \frac{2}{4} = 0.5000$
2    $P(2) = \frac{1}{4} = 0.2500$
$\sum P(X) = 1$
Remark. Always check that you have made a valid probability distribution. First, the probability of
each event in the sample space should be between 0 and 1, inclusive. Second, the sum of the probabilities
of all the events in the sample space should be equal to 1.
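The two properties can also be checked mechanically. Below is a minimal Python sketch, assuming the probabilities are given as a list; the function name `is_valid_distribution` and the tolerance `tol` are illustrative choices, not part of the lecture. The tolerance only absorbs floating-point error; for hand-rounded decimals a looser value such as 0.001 may be more appropriate.

```python
from fractions import Fraction


def is_valid_distribution(probs, tol=1e-9):
    """Return True if probs satisfies both properties of a probability distribution."""
    # Property a): each probability is between 0 and 1, inclusive.
    each_in_range = all(0 <= p <= 1 for p in probs)
    # Property b): the probabilities sum to 1 (tol absorbs floating-point error).
    sums_to_one = abs(sum(probs) - 1) <= tol
    return each_in_range and sums_to_one


# Example 1.2.1 with exact fractions: P(0) = 1/4, P(1) = 2/4, P(2) = 1/4.
print(is_valid_distribution([Fraction(1, 4), Fraction(2, 4), Fraction(1, 4)]))  # True
# A table with a negative probability violates property a), even though it sums to 1.
print(is_valid_distribution([-0.1, 0.6, 0.5]))  # False
```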
Example 1.2.2. Suppose a die is rolled 2 times. Let 𝑋 = the number of times a 6 comes up. Answer the
following:
Now, it should be clear that the values of the random variable are 𝑋 = 0, 1, 2 (in rolling a die twice, a 6
could come up once, twice, or not at all).
Next, to get the counts needed to compute the probabilities, we may write down all the possible
outcomes (e.g., 1,1; 1,2; 1,3; 1,4; 1,5; 1,6; …; 6,6) and count the number of times a 6 comes up in each.
Alternatively, we may list the possible outcomes in a matrix, as illustrated below:
      1     2     3     4     5     6
1     1,1   1,2   1,3   1,4   1,5   1,6
2     2,1   2,2   2,3   2,4   2,5   2,6
3     3,1   3,2   3,3   3,4   3,5   3,6
4     4,1   4,2   4,3   4,4   4,5   4,6
5     5,1   5,2   5,3   5,4   5,5   5,6
6     6,1   6,2   6,3   6,4   6,5   6,6
From here we see that 𝑛(𝑆) = 36; 𝑛(no 6 comes up) = 25; 𝑛(exactly one 6 comes up) = 10; and
𝑛(two 6s come up) = 1. This brings us to the following table:
𝑋    𝑃(𝑋)
0    $P(0) = \frac{25}{36} = 0.6944$
1    $P(1) = \frac{10}{36} = 0.2778$
2    $P(2) = \frac{1}{36} = 0.0278$
$P(X = 0) = P(0) = \frac{25}{36} = 0.6944$ (or 69.44%)
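The counts 25, 10, and 1 can be double-checked by enumerating the 36 outcomes of the matrix in code. A minimal Python sketch (variable names are our own) that counts the sixes in each outcome and applies the theoretical probability $n(E)/n(S)$:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling a die twice (the matrix above).
outcomes = list(product(range(1, 7), repeat=2))

# X = number of times a 6 comes up in an outcome.
counts = {0: 0, 1: 0, 2: 0}
for first, second in outcomes:
    x = (first == 6) + (second == 6)  # True counts as 1, False as 0
    counts[x] += 1

n_s = len(outcomes)  # n(S) = 36
for x, n_e in counts.items():
    p = Fraction(n_e, n_s)  # theoretical probability n(E)/n(S)
    print(x, f"{n_e}/{n_s}", f"{float(p):.4f}")
```

Running it prints the same three rows as the table: 25/36 = 0.6944, 10/36 = 0.2778, and 1/36 = 0.0278.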
Remark. For uniformity, we will always express answers in four decimal places.
Example 1.2.3. The results of a survey given to Senior High School students are shown below.
Task: Construct the probability distribution for the random variable 𝑌, the number of pets the students have
at home, and determine the chance that a senior high school student has three (3) pets at home.
Solution to Example 1.2.3.
Since the values for the random variable 𝑌 are already given, we construct the following table:
Number of Pets at Home (𝑌)    𝑃(𝑌 = 𝑘)
0
1
2
3
4
5
Further, recall that the given survey result is an outcome of making an observation (or experiment). Thus,
the formula for the experimental probability, $P(Y = k) = \frac{f}{N}$, should be used. Note that in this formula, $f$
represents the frequency of the desired observation, while $N$ is the total number of observations. Based on
the given table, $N = 5 + 4 + 6 + 8 + 1 + 1 = 25$. Also, the frequency of having no pets at home is five (5); the
frequency of having one pet at home is four (4); the frequency of having two pets at home is six (6); and so on.
Number of Pets at Home (𝑌)    𝑃(𝑌 = 𝑘)
0    $P(Y = 0) = \frac{5}{25}$
1    $P(Y = 1) = \frac{4}{25}$
2    $P(Y = 2) = \frac{6}{25}$
3    $P(Y = 3) = \frac{8}{25}$
4    $P(Y = 4) = \frac{1}{25}$
5    $P(Y = 5) = \frac{1}{25}$
Lastly, the chance that a senior high school student has three (3) pets at home is
$P(\text{a senior high school student has three pets at home}) = P(Y = 3) = P(3) = \frac{8}{25} = 0.3200$ (or 32%)
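Computing experimental probabilities from a frequency table is a single division per row. A minimal Python sketch using the frequencies from the survey in Example 1.2.3 (the dictionary layout is our own choice); `Fraction` keeps each probability exact before it is rounded to four decimal places:

```python
from fractions import Fraction

# Survey frequencies: number of pets at home (k) -> number of students (f).
frequencies = {0: 5, 1: 4, 2: 6, 3: 8, 4: 1, 5: 1}
N = sum(frequencies.values())  # total number of observations

# Experimental probability P(Y = k) = f / N, kept as an exact fraction.
distribution = {k: Fraction(f, N) for k, f in frequencies.items()}

for k, f in frequencies.items():
    print(f"P(Y = {k}) = {f}/{N} = {float(distribution[k]):.4f}")
```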
Example 1.2.4. A nursing school is investigating the number of reported laboratory accidents incurred
by its students during their internship program. These are accidents related to on-the-job training over
a period of one month. The following are the records of the laboratory accidents that were documented:
Laboratory Accidents (𝑋)    𝑃(𝑋)
0    $P(X = 0) = \frac{25}{50} = 0.5$
1    $P(X = 1) = \frac{15}{50} = 0.3$
2    $P(X = 2) = \frac{2}{50} = 0.04$
3    $P(X = 3) = \frac{5}{50} = 0.10$
4    $P(X = 4) = \frac{2}{50} = 0.04$
5    $P(X = 5) = \frac{1}{50} = 0.02$
Solution to Example 1.2.4. c)
Supplementary Exercises
a)
𝑃(𝑋) 1/6 1/6 1/6 1/6 1/6 1/6
b)
𝑋 0 5 10 15 20
𝑃(𝑋) −0.5 0.7 0.3 0.2 0.3
c)
𝑋 2 4 6 8 10
𝑃(𝑋) 1.22 0.54 0.25 0.48 0.01
d)
𝑋 0 1 2 3 4
𝑃(𝑋) 0.10 0.15 0.15 0.25 0.35
𝑋 −20 −10 0 10 20 30
𝑃(𝑋) 0.125 0.175 0.150 0.200 𝑚 0.100
b) What is the probability that the random variable 𝑋 is negative?
c) What is the probability that the random variable 𝑋 is greater than −10?
d) What is the probability that the random variable 𝑋 is less than 0?
3. The table below shows the number of cars sold in a month by 25 car dealers.
4. The households of a local community were surveyed about the number of occupants who are
working. It was found that 25 households have one working occupant, 18 have two working occupants,
12 have three working occupants, and 5 have four working occupants.
Let 𝑋 be the number of working occupants in a randomly selected household. Create a probability
distribution for the random variable 𝑋.
Enrichment/Extension
Below is the result of a survey that aims to determine the reasons why some SHS graduates forgo a
university (college) education:
Lecture 1.3.
Mean and Variance of a Discrete Random Variable
Introduction
Previously, we discussed how to construct probability distributions for discrete random variables.
Further, significant questions such as “What fraction of the time will intern students of a certain nursing
school incur three laboratory accidents?” were answered.
Now, in this lecture, we will answer other important questions such as “On average, how many
laboratory accidents should we expect our intern students to incur, based on the data that we gathered?” In
other words, this lecture will teach us how to determine what to expect, on average, from a random
phenomenon we choose to observe.
Lesson Proper
Recall that the Population Mean, 𝜇, is a parameter (a population characteristic) that describes the center
(or typical value) of the data in the distribution. Now, for a discrete random variable, the mean (also called
the expected value) is defined as follows:
$\mu = E(X) = \sum [X \cdot P(X)]$
Let us look at an intuitive justification for the formula of the expected value of a discrete random variable:
Toss two fair coins again and let 𝑋 be the number of heads observed. Recall that the probability distribution
for 𝑋 (computed using theoretical probability) is
𝑋    𝑃(𝑋)
0    $\frac{1}{4}$
1    $\frac{2}{4}$
2    $\frac{1}{4}$
Now, suppose that we perform the experiment a large number of times, say 400 times. Intuitively, since the
coins are assumed to be fair, one would observe approximately 100 outcomes where no heads appear, 100
outcomes where a head and then a tail appear, 100 outcomes where a tail and then a head appear, and lastly,
100 outcomes where two heads appear. In other words, about 200 of the trials give exactly one head.
The average number of heads in this experiment is then
$\frac{\text{sum of outcomes}}{\text{total number of trials}} = \frac{100(0) + 200(1) + 100(2)}{400} = \frac{1}{4}(0) + \frac{2}{4}(1) + \frac{1}{4}(2) = 1$    [Equation 1]
where 0, 1, and 2 are the numbers of heads observed. This is exactly what the formula of the expected value,
$\sum [X \cdot P(X)]$, states.
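The intuitive argument can be illustrated with a quick simulation. The following Python sketch is an illustration only: it computes $\sum [X \cdot P(X)]$ exactly and then simulates 400 tosses of two fair coins using Python's pseudo-random generator (the seed value is an arbitrary choice). The simulated average will be close to, but generally not exactly, 1.

```python
import random
from fractions import Fraction

# Probability distribution of X = number of heads in two fair coin tosses.
dist = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}

# Expected value from the formula: sum of X * P(X).
expected = sum(x * p for x, p in dist.items())
print("E(X) =", expected)  # E(X) = 1

# Perform the experiment 400 times and average the observed values of X.
rng = random.Random(2016)  # fixed seed so the run is reproducible
trials = 400
total_heads = 0
for _ in range(trials):
    heads = sum(rng.random() < 0.5 for _ in range(2))  # two fair coins
    total_heads += heads
average = total_heads / trials
print("Simulated average =", average)  # near 1; the exact value depends on the seed
```

Increasing `trials` typically brings the simulated average closer to the expected value.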
Remark. We could use a similar intuitive justification for the formulas of the measures of dispersion:
the population variance (a parameter that measures the average squared deviation of each value from the
mean) and the standard deviation (which measures how spread out the values are around the mean, in the same
units as the data). However, to devote ample time to the priority areas of the lecture, we will go straight to
their definitions and applications in real-life situations. The variance of a discrete random variable 𝑋 is
$\sigma^2 = \sum [(X - \mu)^2 \cdot P(X)]$
Remark. The standard deviation (𝜎) of a random variable 𝑋 is equal to the principal square root (positive
square root) of its variance. That is,
$\sigma = \sqrt{\sigma^2}$
Example 1.3.1. A young professional wishes to venture into some investments. Two banks each offered an
investment plan that indicates the return on investment (ROI) for a period of 5 years. In addition, the
probabilities associated with each ROI are provided in the table:
• We multiply the 𝑋 and 𝑃(𝑋) values in each row and write the results in another column with the heading
“𝑋 ∙ 𝑃(𝑋)”.
• Next, we get the sum of the 𝑋 ∙ 𝑃(𝑋) column, as this is what the formula for the mean of a
discrete random variable states (𝜇 = ∑[𝑋 ∙ 𝑃(𝑋)]).
We see that both plans yield the same average ROI. At this point, this should make us realize that, in terms
of average ROI, neither plan can be regarded as the better investment. However, since a decision is not yet
final, we can look at another aspect of this random phenomenon by computing a measure of variability about
the mean (the variance or standard deviation of the random variables). Recall that the variance or standard
deviation tells us how “consistent” the data points are, or how “close” they are to the mean. Using this
measure as a basis for comparison is justified since the means of the two plans are the same.
Now, to compute the variance and standard deviation, we do the following:
• First, we add three more columns entitled “(𝑋 − 𝜇)”, “(𝑋 − 𝜇)²”, and “(𝑋 − 𝜇)² ∙ 𝑃(𝑋)” on the right
side of the table we previously created.
• Next, we continue to fill out the table (tip: it is easier to compute if we work row by row).
• Last, we compute the sum of the last column to get the value of the variance, then we take the
principal square root of the variance to get the standard deviation.
Investment Plan | ROI in thousands (𝑋) | Probability (𝑃(𝑋)) | 𝑋 ∙ 𝑃(𝑋) | (𝑋 − 𝜇) | (𝑋 − 𝜇)² | (𝑋 − 𝜇)² ∙ 𝑃(𝑋)
                | 10 | 0.2 | 2 | 10 − 30 = −20 | (−20)² = 400 | 400 ∙ 0.2 = 80
$\mu = \sum [X \cdot P(X)] = 30$
$\sigma^2 = \sum [(X - \mu)^2 \cdot P(X)] = 0$
$\sigma = 0$
Thus, in our opinion, this young professional should be advised to invest in Bank B.
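The column-by-column procedure used in Example 1.3.1 translates directly into code. Below is a minimal Python sketch with the hypothetical helper `mean_variance_sd` (not from the lecture); it is demonstrated on the two-coin distribution discussed earlier, whose mean is 1 and variance is 0.5.

```python
import math


def mean_variance_sd(xs, ps):
    """Mean, variance and standard deviation of a discrete random variable.

    Builds the same columns as the hand computation:
    X*P(X), (X - mu), (X - mu)^2 and (X - mu)^2 * P(X).
    """
    mu = sum(x * p for x, p in zip(xs, ps))   # sum of the X*P(X) column
    rows = []
    for x, p in zip(xs, ps):
        dev = x - mu                          # (X - mu)
        rows.append((x, p, x * p, dev, dev ** 2, dev ** 2 * p))
    variance = sum(row[-1] for row in rows)   # sum of the last column
    sd = math.sqrt(variance)                  # principal square root
    return mu, variance, sd, rows


# Two fair coins, X = number of heads.
mu, var, sd, rows = mean_variance_sd([0, 1, 2], [0.25, 0.5, 0.25])
for row in rows:
    print(*row)
print(mu, var, round(sd, 4))
```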
Example 1.3.2. The owner of a computer store chain is set to choose the “Top Branch of the Year”, and his
decision is based entirely on overall selling performance (number of laptops sold) over the past
twelve months. The table below shows the data of the two finalists:
Number of Laptops Sold (𝑋) | Branch A: No. of Months Before the Target was Met (𝑓) | Branch B: No. of Months Before the Target was Met (𝑓)
20 | 1 | 2
25 | 3 | 2
30 | 2 | 2
35 | 3 | 2
40 | 2 | 1
45 | 1 | 3
Help the owner determine the “Top Branch of the Year” using probability and statistics.
Finalists | Number of laptops sold, 𝑋 | No. of months, 𝑓 | Probability, 𝑃(𝑋) | 𝑋 ∙ 𝑃(𝑋)
Branch A | 20 | 1 | 1/12 = 0.0833 | 1.6660
Branch A | 25 | 3 | 3/12 = 0.2500 | 6.2500
Branch A | 30 | 2 | 2/12 = 0.1667 | 5.0010
Branch A | 35 | 3 | 3/12 = 0.2500 | 8.7500
Branch A | 40 | 2 | 2/12 = 0.1667 | 6.6680
Branch A | 45 | 1 | 1/12 = 0.0833 | 3.7485
Branch A: 𝑁 = 12; 𝜇 = ∑[𝑋 ∙ 𝑃(𝑋)] = 32.0835
Branch B | 20 | 2 | 2/12 = 0.1667 | 3.3340
Branch B | 25 | 2 | 2/12 = 0.1667 | 4.1675
Branch B | 30 | 2 | 2/12 = 0.1667 | 5.0010
Branch B | 35 | 2 | 2/12 = 0.1667 | 5.8345
Branch B | 40 | 1 | 1/12 = 0.0833 | 3.3320
Branch B | 45 | 3 | 3/12 = 0.2500 | 11.2500
Branch B: 𝑁 = 12; 𝜇 = ∑[𝑋 ∙ 𝑃(𝑋)] = 32.9190
Based on these results, Branch B could be regarded as the “Top Branch of the Year”.
Remark. It is worth considering, however, the consistency of sales, since the means are relatively close to
each other. The computation is given below:
Branch A: 𝜇 = 32.0835, 𝜎² = 51.9015, 𝜎 = 7.2042
Branch B:
𝑋 | 𝑓 | 𝑃(𝑋) | 𝑋 ∙ 𝑃(𝑋) | (𝑋 − 𝜇) | (𝑋 − 𝜇)² | (𝑋 − 𝜇)² ∙ 𝑃(𝑋)
20 | 2 | 2/12 = 0.1667 | 3.3340 | −12.9190 | 166.9006 | 27.8223
25 | 2 | 2/12 = 0.1667 | 4.1675 | −7.9190 | 62.7106 | 10.4539
30 | 2 | 2/12 = 0.1667 | 5.0010 | −2.9190 | 8.5206 | 1.4204
35 | 2 | 2/12 = 0.1667 | 5.8345 | 2.0810 | 4.3306 | 0.7219
40 | 1 | 1/12 = 0.0833 | 3.3320 | 7.0810 | 50.1406 | 4.1767
45 | 3 | 3/12 = 0.2500 | 11.2500 | 12.0810 | 145.9506 | 36.4876
Summing the last column gives 𝜎² = 81.0828, so 𝜎 = √81.0828 = 9.0046 for Branch B.
Since the means are relatively close to each other and the results above show that Branch B has greater
variability than Branch A, the owner could still consider Branch A, whose sales are more consistent, for
the “Top Branch of the Year” award.
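The computations in Example 1.3.2 can be reproduced from the frequency table. This minimal Python sketch (the helper `summarize` is our own) keeps the probabilities $f/N$ as exact fractions, so its results differ slightly from the hand-rounded table values; for example, Branch A's exact mean is $385/12 \approx 32.0833$ rather than 32.0835. The comparison itself (Branch B has the higher mean and the larger standard deviation) is unchanged.

```python
import math
from fractions import Fraction

# Number of laptops sold (X) and number of months (f) for each branch.
laptops = [20, 25, 30, 35, 40, 45]
branch_a = [1, 3, 2, 3, 2, 1]
branch_b = [2, 2, 2, 2, 1, 3]


def summarize(xs, freqs):
    """Return (mean, variance, standard deviation) from a frequency table."""
    n = sum(freqs)                              # N = 12 months
    probs = [Fraction(f, n) for f in freqs]     # P(X) = f / N, kept exact
    mu = sum(x * p for x, p in zip(xs, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, probs))
    return float(mu), float(var), math.sqrt(float(var))


for name, freqs in [("Branch A", branch_a), ("Branch B", branch_b)]:
    mu, var, sd = summarize(laptops, freqs)
    print(f"{name}: mean = {mu:.4f}, variance = {var:.4f}, sd = {sd:.4f}")
```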
Example 1.3.3. You attended your school fair, and a Cube Game (six-sided die game) caught your attention.
In this game, you pay 20 php to roll a die. If the die shows a “one”, you win 100 php; otherwise, you lose.
You have money and time to spare. Do you think it will be advantageous for you to play a considerable
number of games?
Now, to compute the mean, we first let 𝑋 represent the possible monetary outcomes of playing the game.
Clearly, the values of 𝑋 could be 80 php (if you win: you get 100 php, but recall that you paid 20 php
to play the game) and −20 php (if you lose). Also, the total number of possible outcomes of rolling the die
is six (6). We can now construct our probability distribution with the 𝑋 ∙ 𝑃(𝑋) column:
Possible outcomes | 𝑋 | 𝑃(𝑋) | 𝑋 ∙ 𝑃(𝑋)
1 | 80 | 1/6 = 0.1667 | 13.34
2, 3, 4, 5, 6 | −20 | 5/6 = 0.8333 | −16.67
$E(X) = \sum X \cdot P(X) = -3.33$
The negative (−) sign on the expected value means that you will lose in the “long run” (about 3.33 php per
game, on average). Alternatively, it means that the game is designed so that the house wins in the long run.
Example 1.3.4. Let us modify the game in Example 1.3.3 as follows: to play the game, you have to pay
20 php. If the die shows a “one”, you win 120 php; otherwise, you lose. Do you think you will win in the long
run?
Possible outcomes | 𝑋 | 𝑃(𝑋) | 𝑋 ∙ 𝑃(𝑋)
1 | 120 − 20 = 100 | 1/6 = 0.1667 | 16.67
2, 3, 4, 5, 6 | −20 | 5/6 = 0.8333 | −16.67
$E(X) = \sum X \cdot P(X) = 0$
The expected value of this game is 0, which means that, theoretically, your winnings and losses will even
out in the long run. Playing would perhaps be a wasted effort.
Remark. A game is said to be “fair” if its expected value is 0. In practice, you will rarely find a gambling
game whose expected value is equal to 0; most games of chance are designed in favor of the house or banker.
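Examples 1.3.3 and 1.3.4 differ only in the prize, so the expected value can be written as a function of it. A minimal Python sketch, assuming the game described above (a 20 php fee and a prize paid when the die shows a “one”); the function name `die_game_expected_value` is hypothetical. Exact fractions give $-\frac{10}{3} \approx -3.33$ for the first game and 0 for the second.

```python
from fractions import Fraction


def die_game_expected_value(prize, cost=20):
    """Expected net winnings (in php) per play of the one-die game."""
    p_win = Fraction(1, 6)        # the die shows a "one"
    p_lose = 1 - p_win            # the die shows 2, 3, 4, 5 or 6
    win_value = prize - cost      # net gain: prize minus the fee paid
    lose_value = -cost            # the fee is lost
    return win_value * p_win + lose_value * p_lose


for prize in (100, 120):   # Example 1.3.3 and Example 1.3.4
    ev = die_game_expected_value(prize)
    if ev == 0:
        verdict = "fair"
    elif ev > 0:
        verdict = "favors the player"
    else:
        verdict = "favors the house"
    print(prize, round(float(ev), 2), verdict)
```

Keeping the probabilities as fractions makes the fairness check `ev == 0` exact, which would not be guaranteed with rounded decimals such as 0.1667.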
Example 1.3.5. The organizing committee of a high school reunion placed 150 balls inside a box. Ten of the
balls are red, five are blue, one is gold, and the rest are white. A player has a chance to draw one ball from
the box. A red ball wins 500 php, a blue ball wins 1,000 php, and the single gold ball wins a prize of
5,000 php. However, the player wins nothing if he draws a white ball. What would be the fair price to pay for
a chance to draw a ball from the box?
Possible Outcomes | 𝑋 | 𝑃(𝑋) | 𝑋 ∙ 𝑃(𝑋)
Red (10 balls) | 500 | 10/150 = 0.0667 | 33.35
Blue (5 balls) | 1000 | 5/150 = 0.0333 | 33.30
Gold (1 ball) | 5000 | 1/150 = 0.0067 | 33.50
White (134 balls) | 0 | 134/150 = 0.8933 | 0
$E(X) = \sum X \cdot P(X) = 100.15$
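The exact expected value is $\frac{10(500) + 5(1000) + 1(5000)}{150} = \frac{15000}{150} = 100$ php; the value
100.15 comes from rounding the probabilities to four decimal places. Thus, the fair price to play is 100 php, and
if the organizing committee wants to profit from this game, the price to play should be greater than 100 php.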
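Exact arithmetic confirms the fair price. In this minimal Python sketch (the dictionary layout is our own), each prize is weighted by the exact probability count/150 instead of a rounded decimal:

```python
from fractions import Fraction

# Color -> (number of balls, prize in php) for the 150-ball box.
box = {"red": (10, 500), "blue": (5, 1000), "gold": (1, 5000), "white": (134, 0)}
total_balls = sum(count for count, _ in box.values())

# E(X) = sum of prize * P(color), with P(color) = count / total_balls.
expected = sum(Fraction(count, total_balls) * prize for count, prize in box.values())
print(total_balls, expected)  # 150 100
```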
Example 1.3.6. A car insurance company offers to pay 500,000 php if a car is stolen or is destroyed beyond
repair. The insurance policy costs 24,000 php and company research shows that the probability that the
company will need to pay the amount of insurance is 0.002. Find the expected value of the insurance to
car owners.
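Further, if a car owner does not experience such a loss during the entire life of the policy, the value of the
insurance to the owner is −24,000 php. We construct the probability distribution as follows:
𝑋 | 𝑃(𝑋) | 𝑋 ∙ 𝑃(𝑋)
500,000 − 24,000 = 476,000 | 0.002 | 952
−24,000 | 1 − 0.002 = 0.998 | −23,952
$E(X) = \sum X \cdot P(X) = -23{,}000$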
Therefore, the expected value of the insurance to the car owner is −23,000 php. The negative expected
value shows that the policy is designed to the advantage of the insurance company. Car owners still buy
these insurance policies because the security they provide in the event of a loss is worth the cost to them.
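The insurance example follows the same pattern with only two values of 𝑋. A minimal Python sketch, using the figures stated in the example (a 500,000 php payout, a 24,000 php premium, and a claim probability of 0.002); the variable names are our own:

```python
from fractions import Fraction

payout = 500_000              # paid if the car is stolen or destroyed beyond repair
premium = 24_000              # cost of the policy
p_claim = Fraction(2, 1000)   # probability that the company pays (0.002)

# Value of the policy to the car owner in each case, with its probability.
values = {payout - premium: p_claim, -premium: 1 - p_claim}
expected = sum(x * p for x, p in values.items())
print(expected)  # -23000
```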
Supplementary Exercises
1. A salesperson has found that the probability of making various numbers of sales per day, given that
calls can be made on 10 sales prospects, is presented in the table below. Calculate the average
number of sales per day, and the variance and standard deviation of the number of sales. Use the table
below to show your solution.
Sales per day when 10 prospects are contacted
Number of Sales, 𝑋 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Probability, 𝑃(𝑋) | 0.04 | 0.15 | 0.20 | 0.25 | 0.19 | 0.10 | 0.05 | 0.02
2.
3. The arrival of customers during randomly chosen 10-min intervals at a drive-in facility specializing
in photo development and film sales has been found to follow the probability distribution in the
following table. Calculate the expected number of arrivals for 10-min intervals and the standard
deviation of the arrivals. Use the table below to show your solution.
Determine the following:
a. Expected number of arrivals for 10-min intervals
b. Variance of arrivals for 10-min intervals
c. Standard deviation of arrivals for 10-min intervals
4. A company makes electronic gadgets. One out of every 50 gadgets is faulty, but the company doesn't
know which ones are faulty until a buyer complains. Suppose the company makes a 150 php profit
on the sale of any working gadget, but suffers a loss of 4000 php for every faulty gadget because
they have to repair the unit. Check whether the company can expect a profit in the long term.
5. A local club plans to invest 10,000 php to host a baseball game. They expect to sell tickets
worth 15,000 php. But if it rains on the day of the game, they won't sell any tickets and the club will
lose all the money invested. If the weather forecast for the day of the game is a 20% possibility of
rain, is this a good investment?
Enrichment
Determining the Expected Value, Variance and Standard Deviation of Discrete Probability Distributions
using a Scientific Calculator
There are myriad brands and models of scientific calculators, and they have different ways of finding
the mean, variance, and standard deviation. The succeeding examples illustrate, for two types of
calculators, how to use the statistics function of your calculator to find the mean, variance, and
standard deviation.
Let us go back to Example 1.3.1. Using the Statistics function of our calculator, let us determine the mean,
variance and standard deviation.