Assignments MA
FACULTY OF ENGINEERING
Statistics
Part I
1. Name and define the two areas of statistics.
Descriptive statistics describes a data set; it consists of the collection, organization,
summarization, and presentation of data.
Inferential statistics consists of generalizing from samples to populations, performing
estimations and hypothesis tests to draw conclusions about the population.
2. Why are samples used in statistics?
Samples are used to save time and money when the population is large, or when the
units must be destroyed to gain information.
3. In each of these statements, tell whether descriptive or inferential statistics have been used:
a. In the year 2030, 148 million Americans will be enrolled in an HMO. (Inferential)
b. Nine out of ten on-the-job fatalities are men. (Descriptive)
c. The median household income for people aged 25–34 is $35,888. (Descriptive)
PHM628: Probability and Statistics
Name: ID: 2/32
Part II
1. Name the three types of frequency distributions and explain when each should be used.
The three types are categorical, grouped, and ungrouped frequency distributions:
• Categorical: The categorical frequency distribution is used for data that can be
placed in specific categories, such as nominal- or ordinal-level data.
• Grouped: When the range of the data is large, the data must be grouped into classes
that are more than one unit in width, in what is called a grouped frequency
distribution.
• Ungrouped: When the range of the data values is relatively small, a frequency
distribution can be constructed using single data values for each class. This type of
distribution is called an ungrouped frequency distribution.
2. The following two frequency distributions are incorrectly constructed. State the reason why.
a. Class Frequency
27–32 1
33–38 0
39–44 6
45–49 4
50–55 2
Class width is not uniform: the class 45–49 has a width of 5, while the other classes have a width of 6.
b. Class Frequency
123–127 3
128–132 7
138–142 2
143–147 19
A class has been omitted: there is no class 133–137 between 128–132 and 138–142.
3. Shown here are the number of inches of rain received in 1 year in 25 selected cities in the
United States. Construct a grouped frequency distribution and a cumulative frequency
distribution with 6 classes.
6 37 14 45 22 32 33 49 55
94 38 83 85 40 67 36 67
71 52 52 55 63 49 44 58
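The construction can be sketched in Python. This is a minimal sketch that assumes the common textbook convention (class width = range ÷ number of classes, rounded up; first lower limit = lowest data value); the helper name `grouped_frequency` is hypothetical.

```python
import math

def grouped_frequency(data, num_classes):
    """Build a grouped frequency distribution with equal-width classes.

    Assumes integer data, a class width of ceil(range / num_classes),
    and a first lower limit equal to the lowest data value.
    """
    low, high = min(data), max(data)
    width = math.ceil((high - low) / num_classes)
    rows = []
    cumulative = 0
    for k in range(num_classes):
        lower = low + k * width
        upper = lower + width - 1  # integer class limits
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append((lower, upper, freq, cumulative))
    return rows

rain = [6, 37, 14, 45, 22, 32, 33, 49, 55,
        94, 38, 83, 85, 40, 67, 36, 67,
        71, 52, 52, 55, 63, 49, 44, 58]

for lower, upper, freq, cum in grouped_frequency(rain, 6):
    print(f"{lower}-{upper}: f = {freq}, cf = {cum}")
```

Under these assumptions the classes are 6–20, 21–35, 36–50, 51–65, 66–80, and 81–95, with frequencies 2, 3, 8, 6, 3, 3 and cumulative frequencies 2, 5, 13, 19, 22, 25.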
4. Do Students Need Summer Development? For 108 randomly selected college applicants,
the following frequency distribution for entrance exam scores was obtained. Construct a
histogram, frequency polygon, and ogive for the data.
Applicants who score above 107 need not enroll in a summer developmental program. In
this group, how many students do not have to enroll in the developmental program?
Histogram:
Frequency Polygon:
Ogive:
From the ogive, the number of students who scored 107 or less, and therefore need to enroll in the
summer developmental program, is 28.
Hence, the number of students who do not need to enroll = 108 – 28 = 80 students.
5. The math and reading achievement scores from the National Assessment of Educational
Progress for selected states are listed below. Construct a back-to-back stem and leaf plot
with the data.
Math: 52 66 69 62 61 63 57 59 59 55 55 59 74 72 73 68 76 73
Reading: 65 76 76 66 67 71 70 70 66 61 61 69 78 76 77 77 77 80
Math leaves | Stem | Reading leaves
9 9 9 7 5 5 2 | 5 |
9 8 6 3 2 1 | 6 | 1 1 5 6 6 7 9
6 4 3 3 2 | 7 | 0 0 1 6 6 6 7 7 7 8
| 8 | 0
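The plot can be reproduced with a short Python sketch, assuming two-digit scores (stem = tens digit, leaf = units digit); the helper name `back_to_back` is hypothetical. Leaves on the left side are listed in descending order so they read outward from the stem.

```python
from collections import defaultdict

def back_to_back(left, right):
    """Return rows of (left leaves, stem, right leaves) for two-digit data."""
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for x in left:
        left_leaves[x // 10].append(x % 10)
    for x in right:
        right_leaves[x // 10].append(x % 10)
    stems = range(min(min(left), min(right)) // 10,
                  max(max(left), max(right)) // 10 + 1)
    rows = []
    for s in stems:
        # Left leaves read outward from the stem, so sort them descending.
        l = "".join(str(d) for d in sorted(left_leaves[s], reverse=True))
        r = "".join(str(d) for d in sorted(right_leaves[s]))
        rows.append((l, s, r))
    return rows

math_scores = [52, 66, 69, 62, 61, 63, 57, 59, 59, 55, 55, 59, 74, 72, 73, 68, 76, 73]
reading_scores = [65, 76, 76, 66, 67, 71, 70, 70, 66, 61, 61, 69, 78, 76, 77, 77, 77, 80]

for l, s, r in back_to_back(math_scores, reading_scores):
    print(f"{l:>10} | {s} | {r}")
```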
6. The state gas tax in cents per gallon for 25 states is given below. Construct a grouped
frequency distribution and a cumulative frequency distribution with 5 classes.
7.5 16 23.5 17 22
21.5 19 20 27.1 20
22 20.7 17 28 20
23 18.5 25.3 24 31
14.5 25.9 18 30 31.5
Data in ascending order: 7.5 14.5 16 17 17 18 18.5 19 20 20 20 20.7 21.5 22 22 23 23.5 24 25.3 25.9 27.1 28 30 31 31.5
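A similar sketch for the gas-tax data, assuming the same convention (width 24 ÷ 5 = 4.8, rounded up to 5, starting at the lowest value 7.5) and half-open classes such as 7.5 ≤ x < 12.5 (printed as 7.5–12.4); `grouped_frequency` is again a hypothetical helper.

```python
import math

def grouped_frequency(data, num_classes):
    """Equal-width classes [lower, lower + width); width = range / k rounded up.

    Rounding the width up guarantees the maximum falls inside the last class.
    """
    low, high = min(data), max(data)
    width = math.ceil((high - low) / num_classes)
    rows, cumulative = [], 0
    for k in range(num_classes):
        lower = low + k * width
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)
        cumulative += freq
        rows.append((lower, upper, freq, cumulative))
    return rows

gas_tax = [7.5, 16, 23.5, 17, 22, 21.5, 19, 20, 27.1, 20, 22, 20.7, 17,
           28, 20, 23, 18.5, 25.3, 24, 31, 14.5, 25.9, 18, 30, 31.5]

for lower, upper, freq, cum in grouped_frequency(gas_tax, 5):
    print(f"{lower:.1f}-{upper - 0.1:.1f}: f = {freq}, cf = {cum}")
```

Under these assumptions it gives frequencies 1, 4, 10, 6, 4 and cumulative frequencies 1, 5, 15, 21, 25.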
Part III
1. The average undergraduate grade point average (GPA) for the 25 top-ranked medical
schools is listed below.
3.80 3.77 3.70 3.74 3.70
3.86 3.76 3.68 3.67 3.57
3.83 3.70 3.80 3.74 3.67
3.78 3.74 3.73 3.65 3.66
3.75 3.64 3.78 3.73 3.64.
Find the mean, the median, the mode, and the midrange.
$\text{Mean} = \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{93.09}{25} \approx 3.724$
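Median = MD = 3.73 (the 13th of the 25 ordered values)
Mode = 3.70 and 3.74 (each occurs three times, so the data are bimodal)
Midrange = (lowest value + highest value)/2 = (3.57 + 3.86)/2 = 3.715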
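These four measures can be checked with Python's standard `statistics` module; this is a small sketch, and `multimode` (Python 3.8+) is used because the data have more than one mode.

```python
import statistics

gpa = [3.80, 3.77, 3.70, 3.74, 3.70,
       3.86, 3.76, 3.68, 3.67, 3.57,
       3.83, 3.70, 3.80, 3.74, 3.67,
       3.78, 3.74, 3.73, 3.65, 3.66,
       3.75, 3.64, 3.78, 3.73, 3.64]

mean = statistics.mean(gpa)
median = statistics.median(gpa)
modes = statistics.multimode(gpa)          # all values tied for the highest count
midrange = (min(gpa) + max(gpa)) / 2

print(f"mean = {mean:.4f}, median = {median}, modes = {modes}, midrange = {midrange:.3f}")
```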
2. For the following data, construct a grouped frequency distribution with six classes then find
the mean and modal class.
1013 1867 1268 1666 2309 1231 3005 2895 2166 1136
1532 1461 1750 1069 1723 1827 1155 1714 2391 2155
1412 1688 2471 1759 3008 2511 2577 1082 1067 1062
1319 1037 2400.
Step 4: Find the mean of the grouped data using the formula given below:
$\text{Mean} = \bar{X} = \frac{\sum f \cdot X_m}{\sum f} = \frac{59553}{33} \approx 1804.6$
The modal class is the class with the largest frequency. For the given grouped data, the
class 1013–1345 has the maximum frequency (=11). Hence
modal class = 1013–1345
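Steps 1 to 3 (building the six classes) are not shown in the text. The sketch below assumes the usual convention (width = range ÷ 6 = 332.5, rounded up to 333, starting at the lowest value 1013); under that assumption it reproduces the class 1013–1345 and the sum Σf·Xm = 59553 quoted above.

```python
import math

data = [1013, 1867, 1268, 1666, 2309, 1231, 3005, 2895, 2166, 1136,
        1532, 1461, 1750, 1069, 1723, 1827, 1155, 1714, 2391, 2155,
        1412, 1688, 2471, 1759, 3008, 2511, 2577, 1082, 1067, 1062,
        1319, 1037, 2400]

k = 6
low, high = min(data), max(data)
width = math.ceil((high - low) / k)          # 1995 / 6 = 332.5 -> 333

classes = []
for i in range(k):
    lower = low + i * width
    upper = lower + width - 1                # integer class limits
    freq = sum(1 for x in data if lower <= x <= upper)
    midpoint = (lower + upper) / 2
    classes.append((lower, upper, freq, midpoint))

sum_f = sum(f for _, _, f, _ in classes)
sum_fx = sum(f * xm for _, _, f, xm in classes)
grouped_mean = sum_fx / sum_f
modal = max(classes, key=lambda c: c[2])     # class with the largest frequency

print(f"sum f*Xm = {sum_fx}, mean = {grouped_mean:.1f}, modal class = {modal[0]}-{modal[1]}")
```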
3. Find the weighted mean price of three models of automobiles sold. The number and price
of each model sold are shown in this list.
Model Number Price
A 8 $10,000
B 10 12,000
C 12 8,000.
The weighted mean of a variable X is found by multiplying each value by its
corresponding weight and dividing the sum of the products by the sum of the weights:
$\bar{X} = \frac{w_1X_1 + w_2X_2 + \cdots + w_nX_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wX}{\sum w}$
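For the data given to us, with the number of each model sold as the weight:
$\bar{X} = \frac{(8)(10000) + (10)(12000) + (12)(8000)}{8 + 10 + 12} = \frac{296000}{30} \approx \$9866.67$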
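The same weighted-mean formula as a small Python sketch; `weighted_mean` is a hypothetical helper name, and the number sold is used as the weight.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

prices = [10_000, 12_000, 8_000]   # models A, B, C
sold = [8, 10, 12]                 # number of each model sold (the weights)

print(f"weighted mean price = ${weighted_mean(prices, sold):,.2f}")
```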
$74.5 + 5\left(\frac{18 - 15}{12}\right) = 75.75$
Group Frequency
150 - 154 5
155 - 159 2
160 - 164 6
165 - 169 8
170 - 174 9
175 - 179 11
180 - 184 6
185 - 189 3
$\text{Mode} = L + i\left(\frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}\right) = 174.5 + 5\left(\frac{11 - 9}{(11 - 9) + (11 - 6)}\right) \approx 175.9$
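The interpolation formula for the mode of grouped data can be written as a short Python function (the name `grouped_mode` is hypothetical). Here L = 174.5 is the lower boundary of the modal class 175–179, i = 5 is the class width, and 9 and 6 are the frequencies of the neighbouring classes.

```python
def grouped_mode(lower_boundary, width, f_m, f_before, f_after):
    """Interpolated mode of grouped data: L + i * d1 / (d1 + d2)."""
    d1 = f_m - f_before
    d2 = f_m - f_after
    return lower_boundary + width * d1 / (d1 + d2)

# Modal class 175-179 (f = 11); neighbours 170-174 (f = 9) and 180-184 (f = 6).
print(round(grouped_mode(174.5, 5, 11, 9, 6), 1))
```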
7. For these situations, state which measure of central tendency (mean, median, or mode)
should be used.
a) The most typical case is desired. (The mode should be used as the measure of central
tendency, since the most typical value is the one that occurs most often in the data.)
b) The data are categorical. (The mode should be used as the measure of central tendency,
since the mean and median generally cannot be calculated for categorical data.)
c) The values are to be divided into two approximately equal groups, one group containing
the larger values and one containing the smaller values. (The median should be used as the
measure of central tendency, since the median is, by definition, the middle value that splits
the data into two parts: one containing values lower than the median and the other containing
values higher than the median.)
For normal monthly precipitation (n = 10):
$\bar{X} = \frac{\sum X}{n} = \frac{50 + 37 + \cdots + 34 + 61}{10} = 44.1$
where n is the number of values in the data set.
The variance is the sum of squared deviations from the mean divided by n − 1:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
The shortcut formula for computing the variance:
$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)} = 147.6$
The standard deviation is the square root of the variance:
$s = \sqrt{s^2} = \sqrt{\frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}} = \sqrt{147.6} \approx 12.15$
The coefficient of variation:
$CV = \frac{s}{\bar{X}} \cdot 100 = \frac{12.15}{44.1} \cdot 100 \approx 27.55\%$
For Monthly Precipitation:
The range is the difference between the highest and lowest data value:
R = highest value - lowest value
= 5.1 – 1.1 = 4
The mean is the sum of all values divided by the number of values:
$\bar{X} = \frac{\sum X}{n} = \frac{4.8 + 2.6 + \cdots + 1.8 + 2.5}{10} \approx 2.63$
where n is the number of values in the data set.
The shortcut formula for computing the variance:
$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)} \approx 1.89$
Equivalently, from the definition:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(4.8 - 2.63)^2 + \cdots + (2.5 - 2.63)^2}{10 - 1} \approx 1.89$
The standard deviation is the square root of the variance:
$s = \sqrt{s^2} = \sqrt{\frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}} = \sqrt{1.89} \approx 1.373$
The coefficient of variation:
$CV = \frac{s}{\bar{X}} \cdot 100 = \frac{1.373}{2.63} \cdot 100 \approx 52.2\%$
Measure | Normal monthly precipitation | Monthly precipitation
Range | 32 | 4
Variance | 147.6 | 1.89
Standard deviation | 12.15 | 1.373
CV | 27.55% | 52.2%
So the monthly precipitation is more variable, because it has the higher coefficient of variation (52.2% versus 27.55%).
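The raw precipitation values are not reproduced in full here, so this Python sketch works from the standard deviations and means in the summary table above; `coefficient_of_variation` is a hypothetical helper name.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (s / mean) * 100."""
    return sd / mean * 100

summary = {
    "normal monthly precipitation": (12.15, 44.1),
    "monthly precipitation": (1.373, 2.63),
}
cvs = {name: coefficient_of_variation(s, m) for name, (s, m) in summary.items()}
for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.2f}%")
print("more variable:", max(cvs, key=cvs.get))
```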
2. Team batting averages for major league baseball in 2005 are represented below. Find the
variance and standard deviation for each league.
NL class | Frequency | AL class | Frequency
0.252–0.256 | 4 | 0.256–0.261 | 2
0.257–0.261 | 6 | 0.262–0.267 | 5
0.262–0.266 | 1 | 0.268–0.273 | 4
0.267–0.271 | 4 | 0.274–0.279 | 2
0.272–0.276 | 1 | 0.280–0.285 | 1
The data given to us are grouped frequency distributions of the team batting averages
for major league baseball in 2005.
Finding the variance and standard deviation for the NL data
To find the variance and standard deviation of the given distribution, we execute
following steps:
Step 1 Make a table as shown, and find the midpoint of each class.
A B C D E
Class Frequency(f) Midpoint f . Xm f . Xm2
Step 2 Classes and corresponding frequencies are given to us. List them in columns A
and B, respectively.
Step 3 The midpoint (X_m) for each class can be calculated using:
$X_m = \frac{\text{lower class limit} + \text{upper class limit}}{2}$
List the midpoints corresponding to all the classes in column C.
Step 4 Multiply the frequency by the midpoint for each class, and place the products in
column D.
Step 5 Multiply the frequency by the square of the midpoint, and place the products in
column E.
Step 6 Find the sums of columns B, D, and E. (The sum of column B is n. The sum of
column D is $\sum f \cdot X_m$. The sum of column E is $\sum f \cdot X_m^2$.)
The resulting table obtained is given below:
A B C D E
Class Frequency(f) Midpoint f . Xm f . Xm2
0.252–0.256 4 0.254 1.016 0.258064
0.257–0.261 6 0.259 1.554 0.402486
0.262–0.266 1 0.264 0.264 0.069696
0.267–0.271 4 0.269 1.076 0.289444
0.272–0.276 1 0.274 0.274 0.075076
Total 16 4.184 1.094766
Finding the variance and standard deviation for the AL data
Repeating Steps 1 to 6 for the AL classes, the resulting table is given below:
A B C D E
Class Frequency(f) Midpoint f . Xm f . Xm2
0.256–0.261 2 0.2585 0.517 0.1336445
0.262–0.267 5 0.2645 1.3225 0.34980125
0.268–0.273 4 0.2705 1.082 0.292681
0.274–0.279 2 0.2765 0.553 0.1529045
0.280–0.285 1 0.2825 0.2825 0.07980625
Total 14 3.757 1.0088375
Step 7 Substitute in the formula and solve to get the variance.
$s^2 = \frac{n(\sum f \cdot X_m^2) - (\sum f \cdot X_m)^2}{n(n - 1)} = \frac{14(1.0088375) - (3.757)^2}{14(14 - 1)} = \frac{14.123725 - 14.115049}{182} \approx 0.0000477$
Step 8 Take the square root to get the standard deviation.
$s = \sqrt{s^2} = \sqrt{0.0000477} \approx 0.0069$
The table below shows the variance and the standard deviation in NL and AL:
Variance Standard Deviation
NL 0.000043 0.0066
AL 0.000048 0.0069
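Steps 1 to 8 can be sketched in Python for both leagues; `grouped_variance` is a hypothetical helper that uses class midpoints and the shortcut formula, exactly as in the tables above.

```python
import math

def grouped_variance(classes):
    """Sample variance of grouped data via the shortcut formula.

    classes: list of (lower limit, upper limit, frequency).
    """
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

nl = [(0.252, 0.256, 4), (0.257, 0.261, 6), (0.262, 0.266, 1),
      (0.267, 0.271, 4), (0.272, 0.276, 1)]
al = [(0.256, 0.261, 2), (0.262, 0.267, 5), (0.268, 0.273, 4),
      (0.274, 0.279, 2), (0.280, 0.285, 1)]

for league, classes in (("NL", nl), ("AL", al)):
    var = grouped_variance(classes)
    print(f"{league}: variance = {var:.7f}, sd = {math.sqrt(var):.4f}")
```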
3. The average age of senators in the 108th Congress was 59.5 years. If the standard deviation
was 11.5 years, find the z scores corresponding to the oldest and youngest senators:
Robert C. Byrd (D, WV), 86, and John Sununu (R, NH), 40.
For Robert C. Byrd:
$z = \frac{X - \mu}{\sigma} = \frac{86 - 59.5}{11.5} \approx 2.30$
For John Sununu:
$z = \frac{X - \mu}{\sigma} = \frac{40 - 59.5}{11.5} \approx -1.70$
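A minimal Python sketch of the z-score computation; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

mean_age, sd_age = 59.5, 11.5
for name, age in (("Byrd", 86), ("Sununu", 40)):
    print(f"{name}: z = {z_score(age, mean_age, sd_age):.2f}")
```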
4. The average teacher’s salary in a particular state is $54,166. If the standard deviation is
$10,200, find the salaries corresponding to the following z scores.
a. 2 b. -1 c. 0
a) $X = \mu + z\sigma = 54166 + 2(10200) = \$74{,}566$
b) $X = \mu + z\sigma = 54166 - 1(10200) = \$43{,}966$
c) $X = \mu + z\sigma = 54166 + 0(10200) = \$54{,}166$
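Inverting the z-score formula gives X = μ + zσ; a small Python sketch (the name `value_from_z` is hypothetical):

```python
def value_from_z(z, mean, sd):
    """Invert z = (x - mean) / sd to get x = mean + z * sd."""
    return mean + z * sd

for z in (2, -1, 0):
    print(f"z = {z:>2}: salary = ${value_from_z(z, 54_166, 10_200):,}")
```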
5. Find the percentile rank for each value in the data set. The data represent the values in
billions of dollars of the damage of 10 hurricanes.
1.1, 1.7, 1.9, 2.1, 2.2, 2.5, 3.3, 6.2, 6.8, 20.3.
For each value X:
$P = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100$
1.1: $P = \frac{0 + 0.5}{10} \cdot 100 = 5$
1.7: $P = \frac{1 + 0.5}{10} \cdot 100 = 15$
1.9: $P = \frac{2 + 0.5}{10} \cdot 100 = 25$
2.1: $P = \frac{3 + 0.5}{10} \cdot 100 = 35$
2.2: $P = \frac{4 + 0.5}{10} \cdot 100 = 45$
2.5: $P = \frac{5 + 0.5}{10} \cdot 100 = 55$
3.3: $P = \frac{6 + 0.5}{10} \cdot 100 = 65$
6.2: $P = \frac{7 + 0.5}{10} \cdot 100 = 75$
6.8: $P = \frac{8 + 0.5}{10} \cdot 100 = 85$
20.3: $P = \frac{9 + 0.5}{10} \cdot 100 = 95$
Hence the percentile ranks are the 5th, 15th, 25th, 35th, 45th, 55th, 65th, 75th, 85th, and 95th, respectively.
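The percentile-rank formula as a Python sketch; `percentile_rank` is a hypothetical helper, and multiplying by 100 before dividing keeps these particular results exact.

```python
def percentile_rank(data, x):
    """Percentile rank: ((number of values below x) + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) * 100 / len(data)

damage = [1.1, 1.7, 1.9, 2.1, 2.2, 2.5, 3.3, 6.2, 6.8, 20.3]
ranks = [percentile_rank(damage, x) for x in damage]
print([round(r) for r in ranks])
```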
• Computing Q1:
Q1 for a given data set is the median of the data values below the median of that data set.
The data values below the median (= 19) are:
7, 16
Number of values in this new data set: n = 2
Since the number of values is even, the median is the mean of the (n/2)th and (n/2 + 1)th values.
Hence the median here is the mean of the 1st and 2nd values:
$Q_1 = \frac{7 + 16}{2} = 11.5$
• Computing Q3:
Q3 for a given data set is the median of the data values above the median of that data set.
The data values above the median (= 19) are:
22, 48
Number of values in this new data set: n = 2
Since the number of values is even, the median is the mean of the (n/2)th and (n/2 + 1)th values.
Hence the median here is the mean of the 1st and 2nd values:
$Q_3 = \frac{22 + 48}{2} = 35$
The lowest and highest data values of given data-set are 7 and 48 respectively.
8. Construct a boxplot for the following data representing the number of games pitched by
major league baseball’s earned run average (ERA) leaders for the past few years.
30, 34, 29, 30, 34, 29, 31, 33, 34, 27, 30, 27, 34, 32.
To construct the boxplot of the given data set, we first need to find its five-number summary.
A five-number summary of a data set consists of five points:
(1) The lowest value of the data set (i.e., minimum)
(2) Q1 (first quartile)
(3) The median
(4) Q3 (third quartile)
(5) The highest value of the data set (i.e., maximum)
Hence, the five-number summary of a data set is:
Minimum, Q1, median, Q3, maximum
• Computing median:
The data set given: 30, 34, 29, 30, 34, 29, 31, 33, 34, 27, 30, 27, 34, 32
To find the median of the given data set, we execute the following steps:
Step 1 Arrange the data set in order from lowest to highest:
27, 27, 29, 29, 30, 30, 30, 31, 32, 33, 34, 34, 34, 34
Step 2 Compute the number of values (n) in the given data set.
Number of values = n = 14
Since the number of values is even, the median is the mean of the (n/2)th and (n/2 + 1)th values.
Hence the median of the data set is the mean of the 7th and 8th values:
$\text{Median} = \frac{30 + 31}{2} = 30.5$
• Computing Q1:
Q1 for a given data set is the median of the data values below the median of that data set.
The data values below the median (= 30.5) are:
27, 27, 29, 29, 30, 30, 30
Since the number of values (n = 7) is odd, the median is the ((n + 1)/2)th value. Hence the median
here is the 4th value:
$Q_1 = 29$
• Computing Q3:
Q3 for a given data set is the median of the data values above the median of that data set.
The data values above the median (= 30.5) are:
31, 32, 33, 34, 34, 34, 34
Since the number of values (n = 7) is odd, Q3 is the 4th value:
$Q_3 = 34$
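The five-number summary can be sketched in Python using the same quartile convention as the text (Q1 and Q3 are medians of the lower and upper halves, with the median itself excluded when n is odd). Other software may use different quartile definitions; `five_number_summary` is a hypothetical helper.

```python
import statistics

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum.

    Q1 and Q3 are medians of the values below and above the median
    (the median itself is excluded when n is odd).
    """
    s = sorted(data)
    n = len(s)
    lower_half = s[: n // 2]
    upper_half = s[(n + 1) // 2 :]
    return (s[0], statistics.median(lower_half), statistics.median(s),
            statistics.median(upper_half), s[-1])

games = [30, 34, 29, 30, 34, 29, 31, 33, 34, 27, 30, 27, 34, 32]
print(five_number_summary(games))
```

For the ERA data this gives (27, 29, 30.5, 34, 34). Applied to the data set 7, 16, 19, 22, 48 (reconstructed from the earlier worked quartiles, so an assumption), it returns Q1 = 11.5 and Q3 = 35.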
9. Starting teacher salaries (in equivalent U.S. dollars) for upper secondary education in
selected countries are listed below. Which set of data is more variable? (The U.S. average
starting salary at this time was $29,641.)
Europe Asia
Sweden $48,704 Korea $26,852
Germany 41,441 Japan 23,493
Spain 32,679 India 18,247
Finland 32,136 Malaysia 13,647
Denmark 30,384 Philippines 9,857
Netherlands 29,326 Thailand 5,862
Scotland 27,789
EUROPE:
The range is the difference between the highest and lowest data value:
R = highest value − lowest value
= 48704 – 27789 = 20915
The mean is the sum of all values divided by the number of values:
$\bar{X} = \frac{\sum X}{n} = \frac{48704 + 41441 + \cdots + 29326 + 27789}{7} = 34637$
where n is the number of values in the data set.
The variance is the sum of squared deviations from the mean divided by n − 1:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
The shortcut formula for computing the variance:
$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)} \approx 57908917$
The standard deviation is the square root of the variance:
$s = \sqrt{s^2} = \sqrt{57908917} \approx 7609.79$
The coefficient of variation:
$CV = \frac{s}{\bar{X}} \cdot 100 = \frac{7609.79}{34637} \cdot 100 \approx 21.97\%$
ASIA:
The range is the difference between the highest and lowest data value:
R = highest value − lowest value
= 26852 – 5862 = 20990
The mean is the sum of all values divided by the number of values:
$\bar{X} = \frac{\sum X}{n} = \frac{26852 + 23493 + \cdots + 9857 + 5862}{6} \approx 16326.33$
where n is the number of values in the data set.
The variance is the sum of squared deviations from the mean divided by n − 1:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
The shortcut formula for computing the variance:
$s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)} \approx 64874621$
The standard deviation is the square root of the variance:
$s = \sqrt{s^2} = \sqrt{64874621} \approx 8054.48$
The coefficient of variation:
$CV = \frac{s}{\bar{X}} \cdot 100 = \frac{8054.48}{16326.33} \cdot 100 \approx 49.33\%$
Since the coefficient of variation for Asia (49.33%) is higher than that for Europe (21.97%),
starting teacher salaries in Asia are more variable.
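Both coefficients of variation can be checked with Python's `statistics` module; `stdev` uses the n − 1 denominator, matching the sample formulas above. The helper name `cv_percent` is hypothetical.

```python
import statistics

europe = [48_704, 41_441, 32_679, 32_136, 30_384, 29_326, 27_789]
asia = [26_852, 23_493, 18_247, 13_647, 9_857, 5_862]

def cv_percent(data):
    """Coefficient of variation using the sample standard deviation (n - 1)."""
    return statistics.stdev(data) / statistics.mean(data) * 100

for region, data in (("Europe", europe), ("Asia", asia)):
    print(f"{region}: mean = {statistics.mean(data):.2f}, "
          f"s = {statistics.stdev(data):.2f}, CV = {cv_percent(data):.2f}%")
```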
(g) construct a Venn diagram to illustrate the intersections and unions of the events A, B
and C.
2. An experiment consists of tossing a die and then flipping a coin once if the
number on the die is even. If the number on the die is odd, the coin is flipped
twice. Construct a tree diagram to show the 18 elements of the sample space S.
From the tree diagram, the sample space is:
S = {1HH, 1HT, 1TH, 1TT, 2H, 2T, 3HH, 3HT, 3TH, 3TT, 4H, 4T, 5HH, 5HT, 5TH, 5TT,
6H, 6T}.
(a) list the elements corresponding to the event A that a number less than 3 occurs on
the die;
A = {1HH, 1HT, 1TH, 1TT, 2H, 2T}.
(b) list the elements corresponding to the event B that two tails occur;
B = {1TT, 3TT, 5TT}.
(c) list the elements corresponding to the event A′;
A′ = {3HH, 3HT, 3TH, 3TT, 4H, 4T, 5HH, 5HT, 5TH, 5TT, 6H, 6T}.
(d) list the elements corresponding to the event A′ ∩ B;
A′ ∩ B = {3TT, 5TT}.
(e) list the elements corresponding to the event A∪B.
A ∪ B = {1HH, 1HT, 1TH, 1TT, 2H, 2T, 3TT, 5TT}.
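The sample space and the events can be enumerated in Python as a check; outcomes are encoded as strings such as "3HT" (die value followed by the coin results), which is an illustrative choice.

```python
from itertools import product

def outcomes():
    """Die first; one coin flip if the die is even, two flips if it is odd."""
    for die in range(1, 7):
        flips = 1 if die % 2 == 0 else 2
        for coins in product("HT", repeat=flips):
            yield f"{die}{''.join(coins)}"

S = list(outcomes())
A = {s for s in S if int(s[0]) < 3}          # a number less than 3 on the die
B = {s for s in S if s.count("T") == 2}      # two tails
A_comp = set(S) - A

print(len(S), sorted(A_comp & B), sorted(A | B))
```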
4. If S = {x | 0 < x < 12}, M = {x | 1 < x < 9}, and N = {x | 0 < x < 5}, find
(a) M ∪ N;
M ∪ N = {x | 0 < x < 9}.
(b) M ∩ N;
M ∩ N = {x | 1 < x < 5}.
(c) M′ ∩ N′;
M′ ∩ N′ = {x | 9 ≤ x < 12}.
5. (a) How many three-digit numbers can be formed from the digits 0, 1, 2, 3, 4, 5, and 6 if
each digit can be used only once?
Any of the 6 nonzero digits can be chosen for the hundreds position, any of the
remaining 6 digits for the tens position, leaving 5 digits for the units position. So, there
are (6)(6)(5) = 180 three-digit numbers.
(b) How many of these are odd numbers?
The units position can be filled using any of the 3 odd digits. Any of the remaining 5
nonzero digits can be chosen for the hundreds position, leaving a choice of 5 digits for
the tens position. By Theorem 2.2, there are (3)(5)(5) = 75 three-digit odd numbers.
(c) How many are greater than 330?
(i) If a 4, 5, or 6 is used in the hundreds position there remain 6 and 5 choices,
respectively, for the tens and units positions. This gives (3)(6)(5) = 90 three-digit
numbers beginning with a 4, 5, or 6.
(ii) If a 3 is used in the hundreds position, then a 4, 5, or 6 must be used in the tens
position, leaving 5 choices for the units position. In this case, there are (1)(3)(5) = 15
three-digit numbers beginning with 3.
So, the total number of three-digit numbers that are greater than 330 is 90 + 15 = 105.
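A brute-force check in Python: enumerate all ordered choices of 3 distinct digits with no leading zero and count each case directly.

```python
from itertools import permutations

digits = "0123456"
# Three-digit numbers with distinct digits and no leading zero.
numbers = [int("".join(p)) for p in permutations(digits, 3) if p[0] != "0"]

total = len(numbers)
odd = sum(1 for n in numbers if n % 2 == 1)
over_330 = sum(1 for n in numbers if n > 330)
print(total, odd, over_330)
```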
6. Three lottery tickets for first, second, and third prizes are drawn from a group of 40
tickets. Find the number of sample points in S for awarding the 3 prizes if each
contestant holds only 1 ticket.
Three lottery tickets, from a group of 40 tickets are drawn for first, second and third prizes.
Because the order is important, we need to use permutations.
The number of permutations of 40 distinct objects taken 3 at a time is:
$_{40}P_3 = \frac{40!}{(40 - 3)!} = \frac{40!}{37!} = (40)(39)(38) = 59280$
Hence, there are 59280 possible ways of awarding the 3 prizes from a group of 40
tickets.
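The count can be checked with Python's `math.perm` (Python 3.8+), alongside the factorial formula:

```python
import math

# Ordered selections (first, second, third prize) of 3 tickets from 40.
ways = math.perm(40, 3)
# Same count from the formula n! / (n - r)!.
check = math.factorial(40) // math.factorial(40 - 3)
print(ways, check)
```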
7. How many ways are there to select 3 candidates from 8 equally qualified recent
graduates for openings in an accounting firm?
We need to calculate in how many ways 3 candidates can be selected out of 8 equally
qualified recent graduates.
The order in which the candidates are chosen is not important, so we use combinations.
The number of combinations of 8 distinct objects taken 3 at a time is:
$_8C_3 = \binom{8}{3} = \frac{8!}{3!(8 - 3)!} = 56$
Hence, there are 56 possible ways.
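The same count with Python's `math.comb` (Python 3.8+), checked against the factorial formula:

```python
import math

# Unordered selections of 3 candidates from 8.
ways = math.comb(8, 3)
check = math.factorial(8) // (math.factorial(3) * math.factorial(8 - 3))
print(ways, check)
```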
Part II
1. A box contains 500 envelopes, of which 75 contain $100 in cash, 150 contain $25, and 275
contain $10. An envelope may be purchased for $25. What is the sample space for the
different amounts of money? Assign probabilities to the sample points and then find the
probability that the first envelope purchased contains less than $100.
A box contains 500 envelopes, of which 75 contain $100 in cash, 150 contain $25, and 275
contain $10.
Hence, the sample space for the different amounts of money is:
S = {$10, $25, $100}
If an experiment can result in any one of N different equally likely outcomes, and if
exactly n of these outcomes correspond to the event A, then the probability of event A
is:
$P(A) = \frac{n}{N}$
• We have N = 500 envelopes, of which n_1 = 75 contain $100 in cash. The
probability of randomly choosing the $100 cash envelope is
$P(\text{randomly choosing the \$100 cash envelope}) = \frac{n_1}{N} = \frac{75}{500} = \frac{3}{20} = 0.15$
• We have N = 500 envelopes, of which n_2 = 150 contain $25 in cash. The
probability of randomly choosing the $25 cash envelope is
$P(\text{randomly choosing the \$25 cash envelope}) = \frac{n_2}{N} = \frac{150}{500} = \frac{3}{10} = 0.3$
• We have N = 500 envelopes, of which n_3 = 275 contain $10 in cash. The
probability of randomly choosing the $10 cash envelope is
$P(\text{randomly choosing the \$10 cash envelope}) = \frac{n_3}{N} = \frac{275}{500} = \frac{11}{20} = 0.55$
• Among the 500 envelopes, 150 + 275 = 425 envelopes contain less
than $100.
• Hence, the probability that the first envelope purchased contains less than $100
is
$P(\text{the first envelope purchased contains less than \$100}) = \frac{425}{500} = \frac{17}{20} = 0.85$
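The probability assignment can be reproduced with exact fractions in Python:

```python
from fractions import Fraction

counts = {100: 75, 25: 150, 10: 275}   # dollar amount -> number of envelopes
N = sum(counts.values())

probs = {amount: Fraction(n, N) for amount, n in counts.items()}
p_less_than_100 = sum(p for amount, p in probs.items() if amount < 100)

print(probs, p_less_than_100)
```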
2. A pair of fair dice is tossed. Find the probability of getting
When a pair of fair dice is thrown, each die can show an outcome of 1, 2, 3, 4, 5, or 6. If x and y
denote the outcomes of the two dice, then the sample space S can be given as
$S = \{(x, y) \mid 1 \le x, y \le 6\}$
where x and y take only integer values.
Total number of elements in the sample space = 36
Here we treat the two dice as distinct, which means (x, y) and (y, x) are not
identical.
(a) a total of 8;
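As an illustration of the sample space just defined, this Python sketch counts the ordered outcomes whose total is 8, treating the dice as distinct as stated above.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # ordered pairs (x, y): dice treated as distinct
total_8 = [(x, y) for x, y in S if x + y == 8]
p = Fraction(len(total_8), len(S))
print(total_8, p)
```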
The probability that the selected student took mathematics or history is denoted by
$P(M \cup H)$:
$P(M \cup H) = P(M) + P(H) - P(M \cap H)$
$P(M \cup H) = 0.54 + 0.69 - 0.35 = 0.88$
(b) the student did not take either of these subjects;
The probability that the selected student took neither of these subjects is denoted by
$P(M' \cap H')$:
$P(M' \cap H') = 1 - P(M \cup H)$
$P(M' \cap H') = 1 - 0.88 = 0.12$
(c) the student took history but not mathematics.
The probability that the selected student took history but not mathematics is denoted
by $P(H \cap M')$:
$P(H \cap M') = P(H) - P(H \cap M)$
$P(H \cap M') = 0.69 - 0.35 = 0.34$
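The three results follow from the addition rule and complements; a Python check with exact fractions (so that 0.54 + 0.69 − 0.35 is not subject to floating-point rounding):

```python
from fractions import Fraction

p_m = Fraction("0.54")        # took mathematics
p_h = Fraction("0.69")        # took history
p_m_and_h = Fraction("0.35")  # took both

p_m_or_h = p_m + p_h - p_m_and_h          # addition rule
p_neither = 1 - p_m_or_h                  # complement of the union
p_h_not_m = p_h - p_m_and_h               # history only
print(float(p_m_or_h), float(p_neither), float(p_h_not_m))
```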
 | Nonsmokers | Moderate smokers | Heavy smokers
H | 21 | 36 | 30
NH | 48 | 26 | 19
where H and NH in the table stand for Hypertension and Non-hypertension, respectively.
If one of these individuals is selected at random, find the probability that the person is
We have the following data:
 | Nonsmokers | Moderate smokers | Heavy smokers | Total
H | 21 | 36 | 30 | 87
NH | 48 | 26 | 19 | 93
Total | 69 | 62 | 49 | 180
(b) Find P( B C ) .
(c) Find P(C).
(d) Find the probability that the river is polluted, given that fishing is permitted, and the
sample tested did not detect pollution.
6. Police plan to enforce speed limits by using radar traps at four different locations within
the city limits. The radar traps at each of the locations L1, L2, L3, and L4 will be operated
40%, 30%, 20%, and 30% of the time. If a person who is speeding on her way to work has
probabilities of 0.2, 0.1, 0.5, and 0.2, respectively, of passing through these locations,
what is the probability that she will receive a speeding ticket?
If the person received a speeding ticket on her way to work, what is the probability that
she passed through the radar trap located at L2?
We have
$P(A) = 0.95$
$P(B) = 0.7 \Rightarrow P(B') = 1 - 0.7 = 0.3$
$P(C) = 0.8 \Rightarrow P(C') = 1 - 0.8 = 0.2$
$P(D) = 0.9$
So, the required probability is $P(A \cap (B \cup C) \cap D)$. Because the components are independent,
we have: $P(A \cap (B \cup C) \cap D) = P(A)\,P(B \cup C)\,P(D)$
Method 1:
Using $P(B \cup C) = P(B) + P(C) - P(B)P(C)$:
$P(A)\,P(B \cup C)\,P(D) = (0.95)[0.7 + 0.8 - (0.7)(0.8)](0.9) \approx 0.804$
Method 2:
$P(B \cup C) = 1 - P((B \cup C)') = 1 - P(B' \cap C') = 1 - P(B')P(C')$
So,
$P(A \cap (B \cup C) \cap D) = P(A)\,(1 - P(B')P(C'))\,P(D) = (0.95)(1 - (0.3)(0.2))(0.9) \approx 0.804$
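Both methods can be compared numerically. This sketch assumes, as the event $A \cap (B \cup C) \cap D$ implies, that A and D are in series with the parallel pair B, C, and that all components are independent.

```python
p_a, p_b, p_c, p_d = 0.95, 0.7, 0.8, 0.9

# Method 1: inclusion-exclusion for the parallel pair B, C.
p_b_or_c = p_b + p_c - p_b * p_c
# Method 2: the pair fails only if both B and C fail.
p_b_or_c_alt = 1 - (1 - p_b) * (1 - p_c)

p_system = p_a * p_b_or_c * p_d   # A, (B or C), D in series, independent
print(round(p_system, 4), round(p_a * p_b_or_c_alt * p_d, 4))
```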
Let T denote the event that a person will receive a speeding ticket.
• The radar traps at locations L1, L2, L3, and L4 operate 40%,
30%, 20%, and 30% of the time, respectively. Hence,
$P(T \mid L_1) = 0.4$
$P(T \mid L_2) = 0.3$
$P(T \mid L_3) = 0.2$
$P(T \mid L_4) = 0.3$
• A person who is speeding on her way to work has probabilities of 0.2, 0.1, 0.5,
and 0.2 of passing through locations L1, L2, L3, and L4, respectively,
i.e.,
$P(L_1) = 0.2$
$P(L_2) = 0.1$
$P(L_3) = 0.5$
$P(L_4) = 0.2$
We need the probability that she will receive a speeding ticket, i.e., we need to calculate P(T).
According to the theorem of total probability,
$P(T) = \sum_{i=1}^{4} P(T \mid L_i)\,P(L_i) = (0.4)(0.2) + (0.3)(0.1) + (0.2)(0.5) + (0.3)(0.2) = 0.27$
According to Bayes' rule, we have
$P(L_2 \mid T) = \frac{P(L_2)\,P(T \mid L_2)}{\sum_{i=1}^{4} P(L_i)\,P(T \mid L_i)} = \frac{(0.1)(0.3)}{(0.2)(0.4) + (0.1)(0.3) + (0.5)(0.2) + (0.2)(0.3)} = \frac{0.03}{0.27} \approx 0.1111$
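The total-probability and Bayes steps as a Python sketch with exact fractions; the dictionary keys are illustrative labels.

```python
from fractions import Fraction as F

p_loc = {"L1": F("0.2"), "L2": F("0.1"), "L3": F("0.5"), "L4": F("0.2")}
p_ticket_given = {"L1": F("0.4"), "L2": F("0.3"), "L3": F("0.2"), "L4": F("0.3")}

# Total probability: P(T) = sum of P(T | Li) P(Li)
p_t = sum(p_ticket_given[k] * p_loc[k] for k in p_loc)
# Bayes' rule: P(L2 | T) = P(T | L2) P(L2) / P(T)
p_l2_given_t = p_ticket_given["L2"] * p_loc["L2"] / p_t
print(float(p_t), float(p_l2_given_t))
```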
Suppose that a malfunction was reported, and it was found to be caused by other human
errors. What is the probability that it came from station C?
7. The number of malfunctions reported by each station and the causes are shown in the
following table:
Station A B C Total
Total 18 15 10 43
$P(H \mid B) = \frac{7}{15} \approx 0.47$
Next, we find the probability that a malfunction reported from station C is due to human errors, $P(H \mid C)$:
$P(H \mid C) = \frac{5}{10} = 0.5$
Finally, using Bayes' rule, we get the required probability as follows:
$P(C \mid H) = \frac{P(H \mid C)P(C)}{P(H \mid C)P(C) + P(H \mid B)P(B) + P(H \mid A)P(A)} = \frac{(0.5)(0.23)}{(0.5)(0.23) + (0.47)(0.35) + (0.39)(0.42)} = \frac{0.115}{0.4433} \approx 0.26$
Note: We can also get $P(H) = \frac{19}{43} \approx 0.4419$ directly.
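An exact version of this Bayes computation in Python. The station totals come from the table; the human-error counts for B (7 of 15) and C (5 of 10) come from the fractions above, while the count for A (7 of 18) is an assumption inferred from P(H) = 19/43 and P(H | A) ≈ 0.39.

```python
from fractions import Fraction

totals = {"A": 18, "B": 15, "C": 10}
human = {"A": 7, "B": 7, "C": 5}   # A's count is inferred (assumption)

n = sum(totals.values())
p_station = {s: Fraction(t, n) for s, t in totals.items()}
p_h_given = {s: Fraction(human[s], totals[s]) for s in totals}

p_h = sum(p_h_given[s] * p_station[s] for s in totals)     # total probability
p_c_given_h = p_h_given["C"] * p_station["C"] / p_h         # Bayes' rule
print(p_h, p_c_given_h, round(float(p_c_given_h), 2))
```

With exact fractions the result is 5/19 ≈ 0.263; the 0.26 above uses rounded probabilities.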
AIN SHAMS UNIVERSITY
2. Determine the value c so that the following function can serve as a probability distribution of
the discrete random variable X: $f(x) = c\binom{2}{x}\binom{3}{3 - x}$, for x = 0, 1, 2.
Since f is the probability distribution of the discrete random variable X, which takes the values 0,
1, and 2, we have:
$\sum_{x=0}^{2} f(x) = 1$
$\sum_{x=0}^{2} c\binom{2}{x}\binom{3}{3 - x} = 1$
$c\left[\binom{2}{0}\binom{3}{3} + \binom{2}{1}\binom{3}{2} + \binom{2}{2}\binom{3}{1}\right] = 1$
$c[(1)(1) + (2)(3) + (1)(3)] = 1$
$10c = 1 \Rightarrow c = \frac{1}{10}$
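The normalising constant can be checked in Python with `math.comb` and exact fractions:

```python
from fractions import Fraction
from math import comb

# f(x) = c * C(2, x) * C(3, 3 - x) for x = 0, 1, 2 must sum to 1.
total = sum(comb(2, x) * comb(3, 3 - x) for x in range(3))
c = Fraction(1, total)
pmf = {x: c * comb(2, x) * comb(3, 3 - x) for x in range(3)}
print(total, c, pmf)
```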
3. A shipment of 7 television sets contains 2 defective sets. A hotel makes a random purchase
of 3 of the sets. If x is the number of defective sets purchased by the hotel, find the probability
distribution of X.
x | 0 | 1 | 2
f(x) | 2/7 | 4/7 | 1/7
In order to find the probability distribution of X, we need to find the probability of X being
each of the possible values.
Since there are 2 defective sets and the hotel is purchasing 3 sets, X can take values 0,
1 and 2.
(i) If X = 0, then all 3 purchased sets are among the 5 non-defective television sets,
so:
$P(X = 0) = \frac{\binom{5}{3}\binom{2}{0}}{\binom{7}{3}} = \frac{10}{35} = \frac{2}{7}$
(ii) If X = 1, then 2 of the 3 purchased sets are among the 5 non-defective television
sets, while 1 purchased set is among the 2 defective sets, so:
$P(X = 1) = \frac{\binom{5}{2}\binom{2}{1}}{\binom{7}{3}} = \frac{(10)(2)}{35} = \frac{4}{7}$
(iii) If X = 2, then 2 of the 3 purchased sets are exactly the 2 defective television sets,
while the remaining one is non-defective, so:
$P(X = 2) = \frac{\binom{5}{1}\binom{2}{2}}{\binom{7}{3}} = \frac{(5)(1)}{35} = \frac{1}{7}$
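The three cases are the hypergeometric distribution; a Python sketch with the problem's values (N = 7 sets, K = 2 defective, n = 3 purchased) as defaults of a hypothetical helper:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N=7, K=2, n=3):
    """P(X = x) for x defectives in a sample of n drawn from N items with K defective."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

dist = {x: hypergeom_pmf(x) for x in range(3)}
print(dist)
```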
a) Express the results graphically as a probability histogram.
Histogram:
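b) Find the cumulative distribution function of the random variable X representing the
number of defectives.
$F(x) = \begin{cases} 0, & x < 0 \\ \frac{2}{7}, & 0 \le x < 1 \\ \frac{6}{7}, & 1 \le x < 2 \\ 1, & 2 \le x \end{cases}$
ii. $P(0 < X \le 2)$
$P(0 < X \le 2) = P(X \le 2) - P(X \le 0) = F(2) - F(0) = 1 - \frac{2}{7} = \frac{5}{7}$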
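The step function F(x) can be sketched in Python by summing the probability mass at or below x:

```python
from fractions import Fraction

pmf = {0: Fraction(2, 7), 1: Fraction(4, 7), 2: Fraction(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): add the probabilities of all values not exceeding x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

print([cdf(x) for x in (-1, 0, 1, 2)], cdf(2) - cdf(0))
```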
d) Construct a graph of the cumulative distribution function.
[Figure: step graph of the cumulative distribution function F(x) against x, with the vertical axis marked 1/7, 2/7, ..., 6/7, 1]
4. Determine the probability mass function of X from the following cumulative distribution
function:
$F(x) = \begin{cases} 0, & x < -2 \\ 0.2, & -2 \le x < 0 \\ 0.7, & 0 \le x < 2 \\ 1, & 2 \le x \end{cases}$
The following figure displays a plot of F (x). From the plot, the only points that receive
nonzero probability are –2, 0, and 2. The probability mass function at each point is the
jump in the cumulative distribution function at the point.
Therefore,
f(−2) = 0.2 − 0 = 0.2
f(0) = 0.7 − 0.2 = 0.5
f(2) = 1 − 0.7 = 0.3
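Reading the probability mass function off the jumps of F(x) can be sketched in Python; exact fractions avoid results such as 0.7 − 0.2 = 0.49999999999999994 in floating point.

```python
from fractions import Fraction as F

# Jump points of F(x) and the value F takes from each point onward.
cdf_steps = [(-2, F("0.2")), (0, F("0.7")), (2, F(1))]

pmf = {}
previous = F(0)
for point, value in cdf_steps:
    pmf[point] = value - previous   # size of the jump at this point
    previous = value
print({x: float(p) for x, p in pmf.items()})
```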
5. A coin is biased such that a head is three times as likely to occur as a tail. Find the expected
number of tails when this coin is tossed twice.
Let T be a random variable which represents the number of tails in 2 tosses of this
biased coin. Also, let X and Y be the random variables which represent the number of
tails in the first and second toss, respectively, of the same coin.
Obviously, T can assume the values 0, 1, and 2, while X and Y can only assume the values
0 and 1.
The coin is biased in such a way that the probability of a head occurring is three times
the probability of a tail occurring. Thus, since the sum of these two
probabilities must be 1, we get the system of equations:
$P(X = 0) = 3P(X = 1)$
$P(X = 0) + P(X = 1) = 1$
$\Rightarrow 3P(X = 1) + P(X = 1) = 1$
$\Rightarrow P(X = 1) = 0.25, \quad P(X = 0) = 0.75$
Of course, this also implies $P(Y = 1) = 0.25$ and $P(Y = 0) = 0.75$.
Next, we compute the distribution of T using these results:
$P(T = 0) = P(X = 0, Y = 0) = P(X = 0)P(Y = 0) = (0.75)(0.75) = \frac{9}{16}$ (1)
$P(T = 1) = P(X = 1, Y = 0) + P(X = 0, Y = 1) = P(X = 1)P(Y = 0) + P(X = 0)P(Y = 1) = (0.25)(0.75) + (0.75)(0.25) = \frac{3}{8}$ (2)
$P(T = 2) = P(X = 1, Y = 1) = P(X = 1)P(Y = 1) = (0.25)(0.25) = \frac{1}{16}$ (3)
In (1), (2) and (3) we used the independence of the variables X and Y.
Now, we find the expected number of tails by directly applying the formula for
the expected value of a discrete random variable:
$E(T) = \sum_t t f(t) = \sum_{t=0}^{2} t\,P(T = t) = (0)\left(\frac{9}{16}\right) + (1)\left(\frac{3}{8}\right) + (2)\left(\frac{1}{16}\right) = \frac{1}{2}$
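Because the two tosses are independent, T is binomial with n = 2 and P(tail) = 1/4. This Python sketch rebuilds the distribution of T that way and checks E(T) against the binomial mean np.

```python
from fractions import Fraction
from math import comb

p_tail = Fraction(1, 4)    # heads are 3 times as likely: P(tail) = 1 / (1 + 3)
n = 2

# Number of tails in n independent tosses is binomial(n, p_tail).
dist = {t: comb(n, t) * p_tail**t * (1 - p_tail)**(n - t) for t in range(n + 1)}
expected = sum(t * p for t, p in dist.items())
print(dist, expected)
```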
6. The distribution of the number of imperfections per 10 meters of synthetic fabric is given by
x 0 1 2 3 4
f(x) 0.41 0.37 0.16 0.05 0.01
c) Find $E(X^2)$.
In order to determine the required expected value, we will use the formula
$E[g(X)] = \sum_x g(x) f(x)$
With $g(x) = x^2$:
$E(X^2) = \sum_{x=0}^{4} x^2 f(x) = (0)^2(0.41) + (1)^2(0.37) + (2)^2(0.16) + (3)^2(0.05) + (4)^2(0.01) = 1.62$
For the discrete random variable X, which represents the number of imperfections
per 10 meters, the following probability distribution table is given:
x      0     1     2     3     4
f(x)   0.41  0.37  0.16  0.05  0.01
To find the variance of X, we use the formula
$$\sigma_X^2 = E(X^2) - \mu_X^2,$$
where $\mu_X = \sum_x x f(x) = (0)(0.41) + (1)(0.37) + (2)(0.16) + (3)(0.05) + (4)(0.01) = 0.88$.
Inserting these values into the formula above, we obtain
$$\sigma_X^2 = E(X^2) - \mu_X^2 = 1.62 - 0.88^2 = 0.8456$$
Now we compute the standard deviation of X by taking the positive square root of the
variance:
$$\sigma_X = \sqrt{0.8456} \approx 0.92$$
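A small Python sketch (an illustration, not part of the original solution) reproduces μ_X, E(X²), the variance and the standard deviation directly from the table; the lists `xs` and `fx` simply transcribe the distribution above.

```python
import math

xs = [0, 1, 2, 3, 4]                    # imperfections per 10 m
fx = [0.41, 0.37, 0.16, 0.05, 0.01]     # f(x) from the table

assert math.isclose(sum(fx), 1.0)       # a valid distribution sums to 1

mean = sum(x * p for x, p in zip(xs, fx))                 # mu_X = E(X)
second_moment = sum(x ** 2 * p for x, p in zip(xs, fx))   # E(X^2)
variance = second_moment - mean ** 2                      # E(X^2) - mu_X^2
std_dev = math.sqrt(variance)

print(round(mean, 4), round(second_moment, 4), round(variance, 4), round(std_dev, 2))
# 0.88 1.62 0.8456 0.92
```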
7. If a random variable X is defined such that $E[(X - 1)^2] = 10$ and $E[(X - 2)^2] = 6$, find μ and σ².
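The worked solution for this problem does not appear in this section. One way to obtain μ and σ², sketched here as an illustration rather than taken from the original, is to expand both squares with linearity of expectation and solve the resulting pair of linear equations:

```latex
\begin{aligned}
E[(X-1)^2] &= E(X^2) - 2\mu + 1 = 10, \\
E[(X-2)^2] &= E(X^2) - 4\mu + 4 = 6, \\
\text{subtracting: } 2\mu - 3 &= 4 \;\Rightarrow\; \mu = \tfrac{7}{2}, \\
E(X^2) &= 10 + 2\mu - 1 = 16, \\
\sigma^2 &= E(X^2) - \mu^2 = 16 - \tfrac{49}{4} = \tfrac{15}{4}.
\end{aligned}
```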
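We have:
$$
\begin{aligned}
P(X \le 3) &= P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \\
&= \binom{5}{0}(0.25)^5 + \binom{5}{1}(0.75)^1(0.25)^4 + \binom{5}{2}(0.75)^2(0.25)^3 + \binom{5}{3}(0.75)^3(0.25)^2 \\
&= \frac{5!}{0!\,5!}(0.000977) + \frac{5!}{1!\,4!}(0.75)(0.003906) + \frac{5!}{2!\,3!}(0.5625)(0.015625) + \frac{5!}{3!\,2!}(0.421875)(0.0625) \\
&= 0.0010 + 0.0146 + 0.0879 + 0.2637 \approx 0.3672
\end{aligned}
$$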
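The tail probability can be checked numerically. This Python sketch is illustrative (the helper names `binom_pmf` and `binom_cdf` are hypothetical); it reads the relation above as P(X ≤ 3) for X ~ b(x; 5, 0.75), sums the binomial probabilities from x = 0 to 3, and compares the result with the complement 1 − P(X = 4) − P(X = 5).

```python
from math import comb


def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binom_cdf(k, n, p):
    """P(X <= k) for a binomial random variable."""
    return sum(binom_pmf(x, n, p) for x in range(k + 1))


n, p = 5, 0.75
at_most_3 = binom_cdf(3, n, p)
via_complement = 1 - binom_pmf(4, n, p) - binom_pmf(5, n, p)
print(round(at_most_3, 4), round(via_complement, 4))  # 0.3672 0.3672
```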
2. A national study that examined attitudes about antidepressants revealed that approximately
70% of respondents believe “antidepressants do not really cure anything; they just cover up
the real trouble.” According to this study:
a) what is the probability that at least 3 of the next 5 people selected at random will hold
this opinion?
• Approximately 70% of respondents believe that antidepressants do not
really cure anything, so the probability that a randomly selected respondent
holds this opinion is p = 0.7.
• Let X be the number of the next 5 people selected at random who hold this
opinion. We need to find $P(X \ge 3)$.
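The probability distribution of the binomial random variable X is:
$$P(X = x) = b(x; 5, 0.7) = \binom{5}{x}(0.7)^x(0.3)^{5-x}, \quad x = 0, 1, \dots, 5$$
We have:
$$
\begin{aligned}
P(X \ge 3) &= 1 - P(X < 3) = 1 - P(X \le 2) \\
&= 1 - \{P(X = 0) + P(X = 1) + P(X = 2)\} \\
&= 1 - (0.00243 + 0.02835 + 0.1323) = 0.8369
\end{aligned}
$$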
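A minimal Python sketch of the complement rule used above (illustrative only, not the author's code; the helper `b` is hypothetical): it sums the three lower-tail binomial terms for n = 5 and p = 0.7 and checks the result against the direct sum of the upper tail.

```python
from math import comb

n, p = 5, 0.7   # 5 people, P(a person holds the opinion) = 0.7


def b(x, n, p):
    """Binomial probability b(x; n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


lower_tail = sum(b(x, n, p) for x in range(3))        # P(X <= 2)
at_least_3 = 1 - lower_tail                           # complement rule
direct = sum(b(x, n, p) for x in range(3, n + 1))     # P(3) + P(4) + P(5)
print(round(at_least_3, 4), round(direct, 4))  # 0.8369 0.8369
```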
b) If X represents the number of people who believe that antidepressants do not cure but
only cover up the real problem, find the mean and variance of X when 5 people are
selected at random.
• Approximately 70% of respondents believe that antidepressants do not really
cure anything, so the probability of success (a randomly selected respondent
holds this opinion) is p = 0.7.
So, the mean and variance of the binomial random variable X with parameters n = 5
and p = 0.7 are:
$$\mu = np = (5)(0.7) = 3.5$$
$$\sigma^2 = np(1 - p) = (5)(0.7)(1 - 0.7) = 1.05$$
3. In a batch of 2000 calculators, there are, on average, 8 defective ones. If a random sample of
150 is selected, find the probability of 5 defective ones.
In a batch of 2000 calculators there are, on average, 8 defective ones, and a random
sample of 150 calculators is chosen.
The number X of defective calculators in the sample is a binomial random variable with
n = 150 and p = 8/2000 = 0.004. Since n is large and p is small, we approximate X by a
Poisson random variable with mean μ = np.
To find: the probability that 5 of the 150 calculators are defective, P(X = 5).
Recall the Poisson probability formula:
$$P(x) = \frac{e^{-\mu}\mu^{x}}{x!}, \quad x = 0, 1, 2, \dots, \quad \mu = np$$
In this case,
$$\mu = np = (0.004)(150) = 0.6$$
Hence,
$$P(X = 5) = \frac{e^{-0.6}(0.6)^5}{5!} \approx 3.56 \times 10^{-4}$$
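The quality of the Poisson approximation can be inspected with a short Python sketch (an illustration, not part of the original solution; `poisson_pmf` and `binom_pmf` are hypothetical helper names). It evaluates the Poisson probability with μ = np = 0.6 and, for comparison, the exact binomial probability b(5; 150, 0.004) that the approximation replaces.

```python
from math import comb, exp, factorial

n, p = 150, 8 / 2000   # sample size and defect rate
mu = n * p             # Poisson mean, 0.6


def poisson_pmf(x, mu):
    """p(x; mu) = e^(-mu) mu^x / x!"""
    return exp(-mu) * mu ** x / factorial(x)


def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1 - p)^(n - x)"""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


approx = poisson_pmf(5, mu)
exact = binom_pmf(5, n, p)
print(f"{approx:.3e}")  # 3.556e-04
print(f"{exact:.3e}")
```

The exact binomial value comes out slightly smaller (roughly 3.4 × 10⁻⁴), which is the kind of gap to expect when approximating a very small probability.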
4. A mail-order company receives an average of 5 orders per 500 solicitations. If it sends out
100 advertisements, find the probability of receiving at least 2 orders.
A mail-order company receives an average of 5 orders per 500 solicitations, and 100
advertisements are sent out.
The number X of orders is binomial with n = 100 and p = 5/500 = 0.01. Since n is large
and p is small, we approximate X by a Poisson random variable with mean μ = np.
To find: the probability that at least 2 orders are received, P(X ≥ 2).
Recall the Poisson formula for the probability of receiving x orders:
$$P(x) = \frac{e^{-\mu}\mu^{x}}{x!}, \quad x = 0, 1, 2, \dots, \quad \mu = np$$
In this case,
$$\mu = np = (0.01)(100) = 1$$
Hence,
$$
\begin{aligned}
P(X \ge 2) &= 1 - P(X \le 1) = 1 - P(X = 1) - P(X = 0) \\
&= 1 - \frac{e^{-1}(1)^1}{1!} - \frac{e^{-1}(1)^0}{0!} = 1 - 2e^{-1} \approx 0.2642
\end{aligned}
$$
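The same complement calculation as a small Python sketch (illustrative; `poisson_pmf` is a hypothetical helper):

```python
from math import exp, factorial

mu = 100 * (5 / 500)   # np = 1 expected order per 100 advertisements


def poisson_pmf(x, mu):
    """p(x; mu) = e^(-mu) mu^x / x!"""
    return exp(-mu) * mu ** x / factorial(x)


at_least_2 = 1 - poisson_pmf(0, mu) - poisson_pmf(1, mu)
print(round(at_least_2, 4))  # 0.2642
```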
5. A bookstore owner examines 5 books from each lot of 25 to check for missing pages. If he
finds at least 2 books with missing pages, the entire lot is returned. If, indeed, there are 5
books with missing pages, find the probability that the lot will be returned.
The bookstore owner examines 5 books from each lot of 25 and returns the entire lot if he
finds at least 2 books with missing pages.
To find: the probability that the lot will be returned if there are 5 books with missing pages.
Since the lot contains only 2 types of books (5 with missing pages and 20 without) and the
books are selected without replacement, the number X of books with missing pages in the
sample has a hypergeometric distribution.
The probability of selecting, without replacement, x books with missing pages is
$$P(x) = \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}}$$
where x = number of items of type a among the n items drawn, a = total number of items of
type a in the population, b = total number of items of type b, and n = total number of items
drawn from the population.
In our case we need P(X ≥ 2) with
a = 5, b = 20, n = 5.
Hence,
$$
\begin{aligned}
P(X \ge 2) &= 1 - P(X \le 1) = 1 - P(X = 1) - P(X = 0) \\
&= 1 - \frac{\binom{5}{0}\binom{20}{5}}{\binom{25}{5}} - \frac{\binom{5}{1}\binom{20}{4}}{\binom{25}{5}} \approx 0.2522
\end{aligned}
$$
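A Python sketch of the hypergeometric calculation (illustrative; the helper name `hypergeom_pmf` and its argument order are assumptions made for this example, matching the symbols a, b, n above):

```python
from math import comb


def hypergeom_pmf(x, n, a, b):
    """P(x items of type a in n draws without replacement from a + b items)."""
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)


a, b, n = 5, 20, 5   # books with missing pages, intact books, books examined
p_returned = 1 - hypergeom_pmf(0, n, a, b) - hypergeom_pmf(1, n, a, b)
print(round(p_returned, 4))  # 0.2522
```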
1. The shelf life, in days, for bottles of a certain prescribed medicine is a random variable
having the density function
$$f(x) = \begin{cases} \dfrac{20{,}000}{(x + 100)^3}, & x > 0, \\ 0, & \text{elsewhere.} \end{cases}$$
Find the probability that a bottle of this medicine will have a shelf life of
a) at least 200 days.
Let X be the shelf life of a bottle of the medicine, i.e. the random variable with
density function f.
$$
\begin{aligned}
P(X \ge 200) &= 1 - P(X < 200) = 1 - \int_{0}^{200} f(x)\,dx = 1 - \int_{0}^{200} \frac{20000}{(x + 100)^3}\,dx \\
&= 1 - \left[-\frac{10000}{(x + 100)^2}\right]_{0}^{200} = 1 + \frac{10000}{(200 + 100)^2} - \frac{10000}{(0 + 100)^2} \\
&= 1 + \frac{10000}{90000} - 1 = \frac{1}{9}
\end{aligned}
$$
b) anywhere from 80 to 120 days.
$$
\begin{aligned}
P(80 < X < 120) &= \int_{80}^{120} f(x)\,dx = \int_{80}^{120} \frac{20000}{(x + 100)^3}\,dx \\
&= \left[-\frac{10000}{(x + 100)^2}\right]_{80}^{120} = -\frac{10000}{(220)^2} - \left(-\frac{10000}{(180)^2}\right) \approx 0.102
\end{aligned}
$$
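Both shelf-life answers can be cross-checked in Python. In this sketch (illustrative; the helper names are hypothetical) `F` is the closed-form CDF implied by the antiderivative used above, F(x) = 1 − 10000/(x + 100)² for x > 0, and a plain midpoint rule on f confirms the second value without relying on that antiderivative.

```python
def f(x):
    """Shelf-life density 20000 / (x + 100)^3 for x > 0."""
    return 20_000 / (x + 100) ** 3 if x > 0 else 0.0


def F(x):
    """CDF obtained from the antiderivative -10000 / (x + 100)^2."""
    return 1 - 10_000 / (x + 100) ** 2 if x > 0 else 0.0


def midpoint_integral(g, lo, hi, steps=10_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


p_a = 1 - F(200)          # at least 200 days
p_b = F(120) - F(80)      # between 80 and 120 days
print(round(p_a, 4), round(p_b, 4))  # 0.1111 0.102
```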
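Since f is a density function, it satisfies
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
Therefore, we get:
$$1 = \int_{-\infty}^{\infty} f(x)\,dx = k\int_{0}^{1} \sqrt{x}\,dx + 0 = k\left(\frac{2}{3}x^{3/2}\right)\Big|_{0}^{1} = \frac{2k}{3} \;\Rightarrow\; k = \frac{3}{2}$$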
b) Find F(x) and use it to evaluate P(0.3 < X < 0.6).
F(x) = 0 for x ≤ 0.
For 0 < x < 1:
$$F(x) = P(X \le x) = \int_{0}^{x} f(t)\,dt = \int_{0}^{x} \frac{3}{2}\sqrt{t}\,dt = \frac{3}{2}\left(\frac{2}{3}t^{3/2}\right)\Big|_{0}^{x} = x^{3/2}$$
For x ≥ 1:
$$F(x) = \int_{0}^{x} f(t)\,dt = \int_{0}^{1} f(t)\,dt = 1$$
Hence
$$F(x) = \begin{cases} 0, & x \le 0, \\ x^{3/2}, & 0 < x < 1, \\ 1, & x \ge 1, \end{cases}$$
and
$$P(0.3 < X < 0.6) = P(X < 0.6) - P(X \le 0.3) = F(0.6) - F(0.3) = 0.6^{3/2} - 0.3^{3/2} \approx 0.3004$$
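A Python sketch of the same computation (illustrative; the density f(x) = (3/2)√x on 0 < x < 1 is read from the worked solution, since the problem statement is not reproduced in this section, and the helper names are hypothetical). It also checks numerically that k = 3/2 makes the density integrate to 1.

```python
def F(x):
    """CDF for the density f(x) = (3/2) sqrt(x) on 0 < x < 1."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x ** 1.5


def midpoint_integral(g, lo, hi, steps=10_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


total = midpoint_integral(lambda x: 1.5 * x ** 0.5, 0, 1)   # close to 1 when k = 3/2
p = F(0.6) - F(0.3)
print(round(p, 4))  # 0.3004
```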
3. Measurements of scientific systems are always subject to variation, some more than
others. There are many structures for measurement error, and statisticians spend a great
deal of time modeling these errors. Suppose the measurement error X of a certain physical
quantity is decided by the density function:
$$f(x) = \begin{cases} k(3 - x^2), & -1 \le x \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$
a) Determine k that renders f(x) a valid density function.
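Since f is a density function, it has the property
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
Therefore, we get:
$$1 = \int_{-\infty}^{\infty} f(x)\,dx = k\int_{-1}^{1} (3 - x^2)\,dx = k\left(3x - \frac{1}{3}x^3\right)\Big|_{-1}^{1} = \frac{16k}{3} \;\Rightarrow\; k = \frac{3}{16}$$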
b) Find the probability that a random error in measurement is less than 1/2.
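We need to find the probability $P\left(X < \tfrac{1}{2}\right)$.
$$
\begin{aligned}
P\left(X < \tfrac{1}{2}\right) &= P\left(-1 \le X < \tfrac{1}{2}\right) = \int_{-1}^{1/2} \frac{3}{16}(3 - x^2)\,dx = \frac{3}{16}\left(3x - \frac{1}{3}x^3\right)\Big|_{-1}^{1/2} \\
&= \frac{3}{16}\left(\frac{3}{2} - \frac{1}{3}\left(\frac{1}{2}\right)^3 - (-3) + \frac{1}{3}(-1)^3\right) = \frac{99}{128}
\end{aligned}
$$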
c) For this particular measurement, it is undesirable if the magnitude of the error (i.e.,
|x|) exceeds 0.8. What is the probability that this occurs?
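We need to find the probability $P(|X| > 0.8)$, i.e. the probability $P(X < -0.8 \text{ or } X > 0.8)$.
Because the integrand is an even function, the integral over [−0.8, 0.8] equals twice the
integral over [0, 0.8]:
$$
\begin{aligned}
P(|X| > 0.8) &= 1 - P(|X| \le 0.8) = 1 - P(-0.8 \le X \le 0.8) \\
&= 1 - \int_{-0.8}^{0.8} \frac{3}{16}(3 - x^2)\,dx = 1 - 2\int_{0}^{0.8} \frac{3}{16}(3 - x^2)\,dx \\
&= 1 - 2\cdot\frac{3}{16}\left(3x - \frac{1}{3}x^3\right)\Big|_{0}^{0.8} = 1 - \frac{3}{8}\left(\left(3(0.8) - \frac{1}{3}(0.8)^3\right) - 0\right) = \frac{41}{250}
\end{aligned}
$$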
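All three answers for the measurement-error density follow from the antiderivative 3x − x³/3, so they can be recomputed exactly with Python fractions. This is a sketch for checking, not the original method; the helper `G` is a hypothetical name.

```python
from fractions import Fraction as Fr


def G(x):
    """Antiderivative of 3 - x^2, namely 3x - x^3/3."""
    return 3 * x - x ** 3 / 3


k = 1 / (G(Fr(1)) - G(Fr(-1)))                       # normalising constant
p_below_half = k * (G(Fr(1, 2)) - G(Fr(-1)))          # P(X < 1/2)
p_large_error = 1 - k * (G(Fr(4, 5)) - G(Fr(-4, 5)))  # P(|X| > 0.8)
print(k, p_below_half, p_large_error)  # 3/16 99/128 41/250
```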
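$$P\left(X \le \tfrac{1}{3}\right) = \int_{-\infty}^{1/3} f(x)\,dx = \int_{0}^{1/3} 2(1 - x)\,dx = 2\left(x - \frac{x^2}{2}\right)\Big|_{0}^{1/3} = 2\left(\frac{1}{3} - \frac{1}{2}\left(\frac{1}{3}\right)^2 - 0 + \frac{0^2}{2}\right) = \frac{5}{9}$$
$$P(X > 0.5) = \int_{0.5}^{\infty} f(x)\,dx = \int_{0.5}^{1} 2(1 - x)\,dx = 2\left(x - \frac{x^2}{2}\right)\Big|_{0.5}^{1} = 2\left(1 - \frac{1^2}{2} - 0.5 + \frac{0.5^2}{2}\right) = \frac{1}{4}$$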
c) Given that X ≥ 0.5, what is the probability that X will be less than 0.75?
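$$
\begin{aligned}
P(X < 0.75 \mid X \ge 0.5) &= \frac{P(0.5 \le X < 0.75)}{P(X \ge 0.5)} = \frac{\int_{0.5}^{0.75} f(x)\,dx}{\int_{0.5}^{1} f(x)\,dx} = \frac{\int_{0.5}^{0.75} 2(1 - x)\,dx}{\int_{0.5}^{1} 2(1 - x)\,dx} \\
&= \frac{2\left(x - \frac{x^2}{2}\right)\Big|_{0.5}^{0.75}}{2\left(x - \frac{x^2}{2}\right)\Big|_{0.5}^{1}} = \frac{0.75 - \frac{0.75^2}{2} - 0.5 + \frac{0.5^2}{2}}{1 - \frac{1^2}{2} - 0.5 + \frac{0.5^2}{2}} = \frac{3}{4}
\end{aligned}
$$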
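The three probabilities for the density f(x) = 2(1 − x) on 0 < x < 1 (read from the integrals above, since the statement is not reproduced in this section) can be recomputed exactly from its CDF F(x) = 2x − x². A sketch in Python with exact fractions; the helper `F` is hypothetical.

```python
from fractions import Fraction as Fr


def F(x):
    """CDF of f(x) = 2(1 - x) on (0, 1): F(x) = 2x - x^2."""
    x = Fr(x)
    if x <= 0:
        return Fr(0)
    if x >= 1:
        return Fr(1)
    return 2 * x - x ** 2


p_a = F(Fr(1, 3))                           # P(X <= 1/3)
p_b = 1 - F(Fr(1, 2))                       # P(X > 0.5)
p_c = (F(Fr(3, 4)) - F(Fr(1, 2))) / p_b     # P(X < 0.75 | X >= 0.5)
print(p_a, p_b, p_c)  # 5/9 1/4 3/4
```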
5. The time until a chemical reaction is complete (in milliseconds) is approximated by the
cumulative distribution function
$$F(x) = \begin{cases} 0, & x < 0, \\ 1 - e^{-0.01x}, & 0 \le x. \end{cases}$$
Determine the probability density function of X. What proportion of reactions is complete
within 200 milliseconds?
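Using the result that the probability density function is the derivative of F(x), we
obtain
$$f(x) = \begin{cases} 0, & x < 0, \\ 0.01e^{-0.01x}, & 0 \le x. \end{cases}$$
The probability that a reaction completes within 200 milliseconds is
$$P(X \le 200) = F(200) = 1 - e^{-0.01(200)} = 1 - e^{-2} \approx 0.8647,$$
so about 86.5% of reactions are complete within 200 milliseconds.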
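A small Python sketch (illustrative, with hypothetical helper names) encodes F and the derived density f, spot-checks f = F′ with a central difference, and evaluates F(200).

```python
import math


def F(x):
    """CDF of the reaction time: 1 - e^(-0.01 x) for x >= 0, else 0."""
    return 1 - math.exp(-0.01 * x) if x >= 0 else 0.0


def f(x):
    """Density obtained by differentiating F."""
    return 0.01 * math.exp(-0.01 * x) if x >= 0 else 0.0


# Spot-check f = F' with a central difference at an arbitrary point x0 > 0
x0, h = 50.0, 1e-4
assert abs((F(x0 + h) - F(x0 - h)) / (2 * h) - f(x0)) < 1e-8

print(round(F(200), 4))  # 0.8647
```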
2. Given a normal distribution with μ = 30 and σ = 6, find the value of x that has 80% of the
normal curve area to the left.
Let’s find the value of x that has 80% of the normal curve area to the left.
From the normal tables, the value of z that leaves an area of 0.80 to the left is z ≈ 0.842.
Therefore,
$$z = \frac{x - \mu}{\sigma} \;\Rightarrow\; x = \sigma z + \mu = 6(0.842) + 30 \approx 35.05$$
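Instead of a printed table, the quantile can be computed with `statistics.NormalDist` from the Python standard library (3.8+). This is a sketch for checking the table lookup, not part of the original solution.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.80)                   # standard normal quantile
x = NormalDist(mu=30, sigma=6).inv_cdf(0.80)     # same quantile on the original scale
print(round(z, 4), round(x, 2))  # 0.8416 35.05
```

The table value 0.842 is this quantile rounded to three decimals.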
3. Given the normally distributed variable X with mean 18 and standard deviation 2.5, find
the value of k such that P(X >k) = 0.1814.
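Let’s find the value of k such that $P(X > k) = 0.1814$.
$$
\begin{aligned}
P\left(\frac{X - \mu}{\sigma} > \frac{k - \mu}{\sigma}\right) &= 0.1814 \\
1 - P\left(Z \le \frac{k - \mu}{\sigma}\right) &= 0.1814 \\
P\left(Z \le \frac{k - \mu}{\sigma}\right) &= 1 - 0.1814 = 0.8186
\end{aligned}
$$
From the normal tables, $P(Z \le 0.91) = 0.8186$. Therefore,
$$\frac{k - 18}{2.5} = 0.91 \;\Rightarrow\; k = 18 + (2.5)(0.91) = 20.275$$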
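The same inverse lookup as a Python sketch using the standard-library `statistics.NormalDist` (illustrative only):

```python
from statistics import NormalDist

X = NormalDist(mu=18, sigma=2.5)
k = X.inv_cdf(1 - 0.1814)    # P(X > k) = 0.1814  <=>  P(X <= k) = 0.8186
print(round(k, 3))  # 20.275
```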
4. The heights of 1000 students are normally distributed with a mean of 174.5 centimeters
and a standard deviation of 6.9 centimeters. Assuming that the heights are recorded to
the nearest half-centimeter, how many of these students would you expect to have
heights
Let the random variable X represent the height of a student; X follows a normal
distribution with mean 174.5 cm and standard deviation 6.9 cm.
5. If a set of observations is normally distributed, what percent of these differ from the mean
by
a) more than 1.3σ?
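Let’s calculate what percent of the observations differ from the mean by more than 1.3σ.
$$
\begin{aligned}
P(|X - \mu| > 1.3\sigma) &= 1 - P(|X - \mu| < 1.3\sigma) = 1 - P(-1.3\sigma < X - \mu < 1.3\sigma) \\
&= 1 - P\left(-1.3 < \frac{X - \mu}{\sigma} < 1.3\right) = 1 - P(-1.3 < Z < 1.3) \\
&= 1 - P(Z < 1.3) + P(Z < -1.3) = 1 - 0.9032 + 0.0968 = 0.1936 = 19.36\%
\end{aligned}
$$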
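Because the event is expressed in units of σ, the answer does not depend on μ or σ, so the standard normal suffices. A Python sketch with `statistics.NormalDist` (illustrative only):

```python
from statistics import NormalDist

Z = NormalDist()                              # standard normal
within = Z.cdf(1.3) - Z.cdf(-1.3)             # P(-1.3 < Z < 1.3)
beyond = 1 - within                           # P(|X - mu| > 1.3 sigma)
print(f"{beyond:.2%}")  # 19.36%
```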
6. The length of time for one individual to be served at a cafeteria is a random variable having
an exponential distribution with a mean of 4 minutes. What is the probability that a
person is served in less than 3 minutes on at least 4 of the next 6 days?
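• Let the random variable X represent the length of time for one individual to be served at
the cafeteria.
- X follows an exponential distribution with mean μ = 4.
- The probability density function of this exponential distribution is:
$$f(x) = \begin{cases} \frac{1}{4}e^{-x/4}, & x > 0, \\ 0, & \text{elsewhere.} \end{cases}$$
First, let’s calculate the probability that a person is served in less than 3 minutes:
$$P(X < 3) = \int_{0}^{3} \frac{1}{4}e^{-x/4}\,dx = 1 - e^{-3/4} \approx 0.5276$$
Now let Y be the number of the next 6 days on which a person is served in less than
3 minutes. Then Y is a binomial random variable with n = 6 and p = 0.5276, and
$$
\begin{aligned}
P(Y \ge 4) &= P(Y = 4) + P(Y = 5) + P(Y = 6) \\
&= \binom{6}{4}(0.5276)^4(1 - 0.5276)^2 + \binom{6}{5}(0.5276)^5(1 - 0.5276)^1 + \binom{6}{6}(0.5276)^6(1 - 0.5276)^0 \approx 0.3968
\end{aligned}
$$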
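The two-stage calculation (an exponential probability feeding a binomial count) can be sketched in Python as follows; this is an illustration, not the original method.

```python
import math
from math import comb

mean_service = 4
p = 1 - math.exp(-3 / mean_service)   # P(X < 3) for an exponential with mean 4
n = 6                                 # days observed
p_at_least_4 = sum(comb(n, y) * p ** y * (1 - p) ** (n - y) for y in range(4, n + 1))
print(round(p, 4), round(p_at_least_4, 4))
```

Keeping p unrounded changes the answer only in the fourth decimal place, so it agrees with 0.3968 to about three decimal places.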
7. Suppose that a study of a certain computer system reveals that the response time, in
seconds, has an exponential distribution with a mean of 3 seconds.
What is the probability that response time exceeds 5 seconds?
• Let the random variable X represent the response time, in seconds.
- X has an exponential distribution with mean β = 3 seconds.
The probability density function of this exponential distribution is:
$$f(x; \beta) = \begin{cases} \frac{1}{3}e^{-x/3}, & x > 0, \\ 0, & \text{elsewhere.} \end{cases}$$
Let’s calculate the probability that the response time exceeds 5 seconds.
$$P(X > 5) = 1 - P(X \le 5) = 1 - \frac{1}{3}\int_{0}^{5} e^{-x/3}\,dx = 1 - (1 - e^{-5/3}) \approx 0.1889$$
a) What is the probability that response time exceeds 10 seconds?
Let’s find the probability that the response time exceeds 10 seconds.
$$P(X > 10) = 1 - P(X \le 10) = 1 - \frac{1}{3}\int_{0}^{10} e^{-x/3}\,dx = 1 - (1 - e^{-10/3}) \approx 0.0357$$
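Both answers are values of the exponential survival function P(X > x) = e^{−x/β}, which is what the expression 1 − (1 − e^{−x/β}) simplifies to. A Python sketch (the helper name is hypothetical):

```python
import math


def exp_survival(x, mean):
    """P(X > x) = e^(-x / mean) for an exponential random variable."""
    return math.exp(-x / mean)


for t in (5, 10):
    print(t, round(exp_survival(t, mean=3), 4))
# 5 0.1889
# 10 0.0357
```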
Part V: Joint Distributions
1. Determine the values of c so that the following function represents joint probability
distribution of the random variables X and Y:
f(x, y) = c|x − y|, for x = −2, 0, 2; y = −2, 3.
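Since f is a joint probability distribution,
$$\sum_x \sum_y f(x, y) = 1.$$
So,
$$1 = \sum_x \sum_y f(x, y) = \sum_x \sum_y c\,|x - y| = c\sum_x \sum_y |x - y| = c(0 + 5 + 2 + 3 + 4 + 1) = 15c \;\Rightarrow\; c = \frac{1}{15}$$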
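b) P(X + Y = 4)
We need to determine the probability P(X + Y = 4) by using the definition of the joint
probability function f(x, y) = (x + y)/30, x = 0, 1, 2, 3, y = 0, 1, 2:
$$P(X + Y = 4) = \sum_{y=0}^{2} P(X = 4 - y, Y = y) = f(3, 1) + f(2, 2) = \frac{4}{30} + \frac{4}{30} = \frac{4}{15},$$
since the term with X = 4 vanishes (X takes values only up to 3).
The marginal distributions of X and Y are
$$g(x) = \sum_{y=0}^{2} \frac{x + y}{30} = \frac{x + 0}{30} + \frac{x + 1}{30} + \frac{x + 2}{30} = \frac{x + 1}{10}, \quad x \in \{0, 1, 2, 3\}$$
$$h(y) = \sum_{x=0}^{3} \frac{x + y}{30} = \frac{0 + y}{30} + \frac{1 + y}{30} + \frac{2 + y}{30} + \frac{3 + y}{30} = \frac{2y + 3}{15}, \quad y \in \{0, 1, 2\}$$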
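Both discrete joint-distribution calculations can be checked by brute-force summation in Python. This sketch is illustrative; in particular, the pmf f(x, y) = (x + y)/30 on x = 0, …, 3 and y = 0, 1, 2 is read from the sums above, since its problem statement is not reproduced in this section.

```python
from fractions import Fraction as Fr
from itertools import product

# Problem 1: f(x, y) = c|x - y| for x in {-2, 0, 2}, y in {-2, 3}
total = sum(abs(x - y) for x, y in product((-2, 0, 2), (-2, 3)))
c = Fr(1, total)

# Pmf read from the sums above: f(x, y) = (x + y)/30, x = 0..3, y = 0..2
f = {(x, y): Fr(x + y, 30) for x, y in product(range(4), range(3))}
assert sum(f.values()) == 1
p_sum_is_4 = sum(prob for (x, y), prob in f.items() if x + y == 4)
g = {x: sum(f[x, y] for y in range(3)) for x in range(4)}   # marginal of X
h = {y: sum(f[x, y] for x in range(4)) for y in range(3)}   # marginal of Y
print(c, p_sum_is_4)  # 1/15 4/15
```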
3. A fast-food restaurant operates both a drive-through facility and a walk-in facility. On a
randomly selected day, let X and Y, respectively, be the proportions of the time that the
drive-through and walk-in facilities are in use, and suppose that the joint density function
of these random variables is
$$f(x, y) = \begin{cases} \frac{2}{3}(x + 2y), & 0 \le x \le 1,\ 0 \le y \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$
a) Find the marginal density of X
$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{0}^{1} \frac{2}{3}(x + 2y)\,dy = \frac{2}{3}(xy + y^2)\Big|_{0}^{1} = \frac{2}{3}\left((x \cdot 1 + 1^2) - (x \cdot 0 + 0^2)\right) = \frac{2}{3}(x + 1), \quad 0 \le x \le 1$$
b) Find the marginal density of Y
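By the definition of the marginal density of Y we get:
$$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{0}^{1} \frac{2}{3}(x + 2y)\,dx = \frac{2}{3}\left(\frac{x^2}{2} + 2xy\right)\Big|_{0}^{1} = \frac{2}{3}\left(\left(\frac{1^2}{2} + 2y\right) - \left(\frac{0^2}{2} + 2y \cdot 0\right)\right) = \frac{1}{3}(4y + 1), \quad 0 \le y \le 1$$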
c) Find the probability that the drive-through facility is busy less than one-half of the time
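We need to determine the probability $P(X < 0.5)$.
$$
\begin{aligned}
P(X < 0.5) &= P(0 \le X < 0.5,\ 0 \le Y \le 1) = \int_{0}^{1}\int_{0}^{0.5} \frac{2}{3}(x + 2y)\,dx\,dy \\
&= \frac{2}{3}\int_{0}^{1} \left(\frac{x^2}{2} + 2xy\right)\Big|_{x=0}^{x=0.5} dy = \frac{2}{3}\int_{0}^{1} \left(\frac{0.5^2}{2} + 2(0.5)y\right) dy \\
&= \frac{2}{3}\int_{0}^{1} \left(y + \frac{1}{8}\right) dy = \frac{2}{3}\left(\frac{y^2}{2} + \frac{y}{8}\right)\Big|_{0}^{1} = \frac{2}{3}\left(\frac{1^2}{2} + \frac{1}{8}\right) = \frac{5}{12}
\end{aligned}
$$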
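Because the density (2/3)(x + 2y) is linear in x and y, the two-dimensional midpoint rule integrates it exactly (up to rounding) even on a coarse grid. The Python sketch below (illustrative; the helper names are hypothetical) uses that fact to confirm that f integrates to 1 and that P(X < 0.5) = 5/12.

```python
def f(x, y):
    """Joint density (2/3)(x + 2y) on the unit square, 0 elsewhere."""
    return 2 / 3 * (x + 2 * y) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0


def integrate_2d(g, x_lo, x_hi, y_lo, y_hi, n=50):
    """Midpoint rule on an n-by-n grid; exact (up to rounding) for integrands linear in x and y."""
    hx, hy = (x_hi - x_lo) / n, (y_hi - y_lo) / n
    return hx * hy * sum(
        g(x_lo + (i + 0.5) * hx, y_lo + (j + 0.5) * hy)
        for i in range(n)
        for j in range(n)
    )


total = integrate_2d(f, 0, 1, 0, 1)          # should equal 1
p_half = integrate_2d(f, 0, 0.5, 0, 1)       # P(X < 0.5)
print(round(total, 6), round(p_half, 6))  # 1.0 0.416667
```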
4. A candy company distributes boxes of chocolates with a mixture of creams, toffees, and
cordials. Suppose that the weight of each box is 1 kilogram, but the individual weights of
the creams, toffees, and cordials vary from box to box. For a randomly selected box, let X
and Y represent the weights of the creams and the toffees, respectively, and suppose that
the joint density function of these variables is
$$f(x, y) = \begin{cases} 24xy, & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1, \\ 0, & \text{elsewhere.} \end{cases}$$
a) Find the probability that in a given box the cordials account for more than 1/2 of the
weight.
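Since X is the weight of the creams, Y is the weight of the toffees and the total weight
of the box is 1 kg, the weight of the cordials is 1 − X − Y.
We need to determine the probability $P\left(1 - X - Y > \tfrac{1}{2}\right)$.
$$
\begin{aligned}
P\left(1 - X - Y > \tfrac{1}{2}\right) &= P\left(X + Y < \tfrac{1}{2}\right) = \int_{0}^{1/2}\int_{0}^{1/2 - x} f(x, y)\,dy\,dx = \int_{0}^{1/2}\int_{0}^{1/2 - x} 24xy\,dy\,dx \\
&= \int_{0}^{1/2} x\left(\int_{0}^{1/2 - x} 24y\,dy\right) dx = \int_{0}^{1/2} x\left(12y^2\Big|_{0}^{1/2 - x}\right) dx = \int_{0}^{1/2} x\cdot 12\left(\frac{1}{2} - x\right)^2 dx \\
&= \int_{0}^{1/2} (3x - 12x^2 + 12x^3)\,dx = \left(\frac{3x^2}{2} - 4x^3 + 3x^4\right)\Big|_{0}^{1/2} = \frac{1}{16}
\end{aligned}
$$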
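After the inner integration, both the cordials probability and the total mass of the marginal g(x) reduce to one-variable polynomial integrals, which a few lines of Python can evaluate exactly. The helper `poly_integral` is a hypothetical name for this sketch.

```python
from fractions import Fraction as Fr


def poly_integral(coeffs, lo, hi):
    """Exact integral of sum(c_k x^k) from lo to hi; coeffs[k] is c_k."""
    def antiderivative(x):
        return sum(Fr(c) * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(Fr(hi)) - antiderivative(Fr(lo))


# After the inner y-integral, P(X + Y < 1/2) reduces to the integral of 3x - 12x^2 + 12x^3
p_cordials = poly_integral([0, 3, -12, 12], 0, Fr(1, 2))
# The marginal g(x) = 12x(1 - x)^2 = 12x - 24x^2 + 12x^3 should integrate to 1
g_total = poly_integral([0, 12, -24, 12], 0, 1)
print(p_cordials, g_total)  # 1/16 1
```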
b) Find the marginal density for the weight of the creams.
We need to find the marginal density for the random variable X:
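$$
\begin{aligned}
g(x) &= \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{0}^{1 - x} 24xy\,dy = 24x\left(\frac{y^2}{2}\right)\Big|_{0}^{1 - x} \\
&= 24x\,\frac{(1 - x)^2}{2} = 12x(1 - x)^2, \quad 0 \le x \le 1
\end{aligned}
$$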
c) Determine whether the two random variables X and Y are dependent or independent.
           x
f(x, y)    2      4
y    1     0.10   0.15
     3     0.20   0.30
     5     0.10   0.15
The marginal distributions are g(2) = 0.10 + 0.20 + 0.10 = 0.40, g(4) = 0.15 + 0.30 + 0.15 = 0.60,
h(1) = 0.10 + 0.15 = 0.25, h(3) = 0.20 + 0.30 = 0.50 and h(5) = 0.10 + 0.15 = 0.25.
The two random variables X and Y are independent, since the equality
f(x, y) = g(x)h(y)
holds for all values of x and y. Checking this identity for every combination of x and y, we get:
f(2, 1) = 0.10 = (0.40)(0.25) = g(2)h(1)
f(4, 1) = 0.15 = (0.60)(0.25) = g(4)h(1)
f(2, 3) = 0.20 = (0.40)(0.50) = g(2)h(3)
f(4, 3) = 0.30 = (0.60)(0.50) = g(4)h(3)
f(2, 5) = 0.10 = (0.40)(0.25) = g(2)h(5)
f(4, 5) = 0.15 = (0.60)(0.25) = g(4)h(5)
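The cell-by-cell independence check can be automated. In this Python sketch (illustrative; the dictionary `f` transcribes the table above with keys (x, y)), the marginals are summed from the table and every cell is compared with g(x)h(y) up to floating-point tolerance.

```python
import math
from itertools import product

# Joint distribution from the table above, keyed by (x, y)
f = {(2, 1): 0.10, (4, 1): 0.15,
     (2, 3): 0.20, (4, 3): 0.30,
     (2, 5): 0.10, (4, 5): 0.15}
xs, ys = (2, 4), (1, 3, 5)

g = {x: sum(f[x, y] for y in ys) for x in xs}   # marginal of X
h = {y: sum(f[x, y] for x in xs) for y in ys}   # marginal of Y

independent = all(math.isclose(f[x, y], g[x] * h[y]) for x, y in product(xs, ys))
print(independent)  # True
```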
6. A coin is tossed twice. Let Z denote the number of heads on the first toss and W the total
number of heads on the 2 tosses. If the coin is unbalanced and a head has a 40% chance
of occurring, find
the joint probability distribution of W and Z
From the definition of the random variables Z and W, Z takes the values 0 and 1, while W
takes the values 0, 1 and 2.
a) When Z = 1, i.e. when the first toss is a head, we have W ≥ 1, i.e. W ≠ 0. Therefore:
f(1, 0) = P(Z = 1, W = 0) = 0
Similarly, when Z = 0, i.e. when the first toss is a tail, we have W ≤ 1, i.e. W ≠ 2. Therefore:
f(0, 2) = P(Z = 0, W = 2) = 0
All other combinations of Z and W are possible.
Let Y be the random variable that takes the value 1 if the second toss is a head and 0 if it
is a tail.
Since a head has a 40% chance of occurring on each toss, we have
P(Y = 1) = P(Z = 1) = 0.4 and P(Y = 0) = P(Z = 0) = 0.6
From the definitions of the random variables Y, Z and W, we obtain
W = Y + Z
Now, using the independence of the two tosses, we calculate the remaining probabilities:
f (1,1) = P( Z = 1,W = 1) = P( Z = 1, Y + Z = 1) = P( Z = 1, Y = 0)
= P( Z = 1) P(Y = 0) = (0.4)(0.6) = 0.24
f (1,2) = P( Z = 1,W = 2) = P( Z = 1, Y + Z = 2) = P( Z = 1, Y = 1)
= P( Z = 1) P(Y = 1) = (0.4)(0.4) = 0.16
f (0,0) = P( Z = 0,W = 0) = P ( Z = 0,Y + Z = 0) = P ( Z = 0,Y = 0)
= P( Z = 0) P(Y = 0) = (0.6)(0.6) = 0.36
f (0,1) = P( Z = 0,W = 1) = P( Z = 0, Y + Z = 1) = P( Z = 0, Y = 1)
= P( Z = 0) P(Y = 1) = (0.6)(0.4) = 0.24
Thus, we get the following joint probability distribution table:
        W = 0   W = 1   W = 2
Z = 0   0.36    0.24    0
Z = 1   0       0.24    0.16
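The table can be reproduced by enumerating the four outcomes of the two tosses. In this Python sketch (illustrative; the encoding 1 = head, 0 = tail is an assumption made for the example), each outcome contributes its probability to the cell (z, w).

```python
from itertools import product

P_HEAD = 0.4
joint = {}   # (z, w) -> P(Z = z, W = w)
for first, second in product((0, 1), repeat=2):        # 1 = head, 0 = tail
    prob = (P_HEAD if first else 1 - P_HEAD) * (P_HEAD if second else 1 - P_HEAD)
    z, w = first, first + second                       # Z = first toss, W = total heads
    joint[z, w] = joint.get((z, w), 0) + prob

for z in (0, 1):
    print(f"Z = {z}:", [round(joint.get((z, w), 0), 2) for w in (0, 1, 2)])
# Z = 0: [0.36, 0.24, 0]
# Z = 1: [0, 0.24, 0.16]
```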
We use the following formula for the joint marginal distribution of the random
variables Y and Z:
$$g(y, z) = \int_{x} f(x, y, z)\,dx = \int_{0}^{1} \frac{4xyz^2}{9}\,dx = \left(\frac{2x^2yz^2}{9}\right)\Big|_{0}^{1} = \frac{2yz^2(1)^2}{9} - \frac{2yz^2(0)^2}{9} = \frac{2yz^2}{9}$$
$$= \int_{0}^{1} 4xy\,dx = (2x^2y)\Big|_{0}^{1} = 2y$$
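c) $P\left(\tfrac{1}{4} < X < \tfrac{1}{2},\ Y > \tfrac{1}{3},\ 1 < Z < 2\right)$;
$$
\begin{aligned}
P\left(\tfrac{1}{4} < X < \tfrac{1}{2},\ Y > \tfrac{1}{3},\ 1 < Z < 2\right) &= \int_{1/4}^{1/2}\int_{1/3}^{1}\int_{1}^{2} f(x, y, z)\,dz\,dy\,dx = \int_{1/4}^{1/2}\int_{1/3}^{1}\int_{1}^{2} \frac{4xyz^2}{9}\,dz\,dy\,dx \\
&= \int_{1/4}^{1/2}\int_{1/3}^{1} \left(\frac{4xyz^3}{27}\right)\Big|_{1}^{2} dy\,dx = \int_{1/4}^{1/2}\int_{1/3}^{1} \left(\frac{4xy(2)^3}{27} - \frac{4xy(1)^3}{27}\right) dy\,dx \\
&= \int_{1/4}^{1/2}\int_{1/3}^{1} \frac{28xy}{27}\,dy\,dx = \int_{1/4}^{1/2} \left(\frac{14xy^2}{27}\right)\Big|_{1/3}^{1} dx \\
&= \int_{1/4}^{1/2} \left(\frac{14x(1)^2}{27} - \frac{14x(1/3)^2}{27}\right) dx = \int_{1/4}^{1/2} \frac{112x}{243}\,dx = \left(\frac{56x^2}{243}\right)\Big|_{1/4}^{1/2} = \frac{7}{162}
\end{aligned}
$$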
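Since 4xyz²/9 is a product of separate powers of x, y and z, the triple integral over a box factorises into three one-dimensional integrals. A Python sketch with exact fractions (the helper `power_integral` is hypothetical):

```python
from fractions import Fraction as Fr


def power_integral(k, lo, hi):
    """Exact integral of t^k from lo to hi."""
    lo, hi = Fr(lo), Fr(hi)
    return (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)


# 4xyz^2/9 factorises, so the triple integral is a product of 1-D integrals
p = (Fr(4, 9)
     * power_integral(1, Fr(1, 4), Fr(1, 2))   # x from 1/4 to 1/2
     * power_integral(1, Fr(1, 3), 1)          # y from 1/3 to 1
     * power_integral(2, 1, 2))                # z from 1 to 2
print(p)  # 7/162
```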
8. Two tire-quality experts examine stacks of tires and assign a quality rating to each tire on
a 3-point scale. Let X denote the rating given by expert A and Y denote the rating given by
B. The following table gives the joint distribution for X and Y.
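                y
f(x, y)    1      2      3
x    1     0.10   0.05   0.02
     2     0.10   0.35   0.05
     3     0.03   0.10   0.20
Find μ_X and μ_Y.
Using the formulas
$$\mu_X = \sum_x \sum_y x f(x, y) = \sum_x x\,g(x), \qquad \mu_Y = \sum_x \sum_y y f(x, y) = \sum_y y\,h(y),$$
with g(1) = 0.17, g(2) = 0.50, g(3) = 0.33 and h(1) = 0.23, h(2) = 0.50, h(3) = 0.27, we have
$$\mu_X = (1)(0.17) + (2)(0.50) + (3)(0.33) = 2.16, \qquad \mu_Y = (1)(0.23) + (2)(0.50) + (3)(0.27) = 2.04.$$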
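A Python sketch of the double sums (illustrative; `table[x]` transcribes the row for rating x, listing f(x, 1), f(x, 2), f(x, 3)):

```python
# Joint distribution of the two ratings: table[x] lists f(x, 1), f(x, 2), f(x, 3)
table = {1: [0.10, 0.05, 0.02],
         2: [0.10, 0.35, 0.05],
         3: [0.03, 0.10, 0.20]}
ys = (1, 2, 3)

mu_x = sum(x * p for x, row in table.items() for p in row)
mu_y = sum(y * p for row in table.values() for y, p in zip(ys, row))
print(round(mu_x, 2), round(mu_y, 2))  # 2.16 2.04
```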
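Here the joint density equals 2 on the region 0 < x < y < 1 (the limits of integration below), so
$$
\begin{aligned}
\sigma_X^2 = E(X^2) - \mu_X^2 &= \int_{0}^{1}\int_{x}^{1} x^2 \cdot 2\,dy\,dx - \left(\int_{0}^{1}\int_{x}^{1} x \cdot 2\,dy\,dx\right)^2 \\
&= \int_{0}^{1} (2x^2 - 2x^3)\,dx - \left(\int_{0}^{1} (2x - 2x^2)\,dx\right)^2 \\
&= \left(\frac{2}{3}x^3 - \frac{1}{2}x^4\right)\Big|_{0}^{1} - \left(\left(x^2 - \frac{2}{3}x^3\right)\Big|_{0}^{1}\right)^2 = \frac{2}{3} - \frac{1}{2} - \left(1 - \frac{2}{3}\right)^2 = \frac{1}{18}
\end{aligned}
$$
We repeat the same process to find σ_Y²: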
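$$
\begin{aligned}
\sigma_Y^2 = E(Y^2) - \mu_Y^2 &= \int_x\int_y y^2 f(x, y)\,dy\,dx - \left(\int_x\int_y y f(x, y)\,dy\,dx\right)^2 \\
&= \int_{0}^{1}\int_{x}^{1} y^2 \cdot 2\,dy\,dx - \left(\int_{0}^{1}\int_{x}^{1} y \cdot 2\,dy\,dx\right)^2 = \int_{0}^{1} \frac{2}{3}y^3\Big|_{x}^{1} dx - \left(\int_{0}^{1} y^2\Big|_{x}^{1} dx\right)^2 \\
&= \int_{0}^{1} \left(\frac{2}{3} - \frac{2}{3}x^3\right) dx - \left(\int_{0}^{1} (1 - x^2)\,dx\right)^2 \\
&= \left(\frac{2}{3}x - \frac{1}{6}x^4\right)\Big|_{0}^{1} - \left(\left(x - \frac{1}{3}x^3\right)\Big|_{0}^{1}\right)^2 = \frac{2}{3} - \frac{1}{6} - \left(1 - \frac{1}{3}\right)^2 = \frac{1}{18}
\end{aligned}
$$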
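Next we need to find the covariance of X and Y. From calculating the expected values,
we know that $\mu_X = \tfrac{1}{3}$ and $\mu_Y = \tfrac{2}{3}$.
$$
\begin{aligned}
\sigma_{XY} = E(XY) - \mu_X\mu_Y &= \int_x\int_y xy f(x, y)\,dy\,dx - \frac{1}{3}\cdot\frac{2}{3} = \int_{0}^{1}\int_{x}^{1} xy \cdot 2\,dy\,dx - \frac{2}{9} \\
&= \int_{0}^{1} (xy^2)\Big|_{x}^{1} dx - \frac{2}{9} = \int_{0}^{1} (x - x^3)\,dx - \frac{2}{9} = \left(\frac{1}{2}x^2 - \frac{1}{4}x^4\right)\Big|_{0}^{1} - \frac{2}{9} = \frac{1}{36}
\end{aligned}
$$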
Finally, we obtain
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X\sigma_Y} = \frac{1/36}{\sqrt{1/18}\,\sqrt{1/18}} = \frac{1}{2}.$$
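Every moment used above has the form E(XᵃYᵇ) for the density 2 on 0 < x < y < 1. Integrating over y first gives the closed form E(XᵃYᵇ) = 2/(b + 1)·[1/(a + 1) − 1/(a + b + 2)], which is derived here for this sketch rather than taken from the original. The Python sketch below uses it with exact fractions to recheck σ_X², σ_Y², σ_XY and ρ_XY; the helper name `moment` is hypothetical.

```python
import math
from fractions import Fraction as Fr


def moment(a, b):
    """E(X^a Y^b) when f(x, y) = 2 on 0 < x < y < 1 (y integrated first)."""
    return Fr(2, b + 1) * (Fr(1, a + 1) - Fr(1, a + b + 2))


mu_x, mu_y = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - mu_x ** 2
var_y = moment(0, 2) - mu_y ** 2
cov = moment(1, 1) - mu_x * mu_y
rho = float(cov) / math.sqrt(var_x * var_y)
print(mu_x, mu_y, var_x, var_y, cov)  # 1/3 2/3 1/18 1/18 1/36
print(round(rho, 6))  # 0.5
```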
10. If X and Y are independent random variables with variances $\sigma_X^2 = 5$ and $\sigma_Y^2 = 3$, find the
variance of the random variable Z = −2X + 4Y − 3.
We use the property that for independent random variables X1, X2, …, Xn,
$$\sigma^2_{a_1X_1 + a_2X_2 + \cdots + a_nX_n} = a_1^2\sigma^2_{X_1} + a_2^2\sigma^2_{X_2} + \cdots + a_n^2\sigma^2_{X_n},$$
together with the fact that adding a constant does not change the variance. Since the given
variables X and Y are independent, direct use of these statements gives:
$$\sigma_Z^2 = \sigma^2_{-2X + 4Y - 3} = \sigma^2_{-2X + 4Y} = (-2)^2\sigma_X^2 + (4)^2\sigma_Y^2 = 4(5) + 16(3) = 68$$
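A Python sketch that applies the variance rule and then checks it by simulation. The normal distributions for X and Y are an arbitrary assumption made only for the simulation (any independent X and Y with variances 5 and 3 would do), and `var_linear_combination` is a hypothetical helper name.

```python
import random


def var_linear_combination(coeffs, variances):
    """Var(a1*X1 + ... + an*Xn + c) for independent Xi; the constant c drops out."""
    return sum(a * a * v for a, v in zip(coeffs, variances))


exact = var_linear_combination([-2, 4], [5, 3])
print(exact)  # 68

# Simulation check: normal X and Y are an arbitrary choice; only their variances matter
rng = random.Random(0)
zs = [-2 * rng.gauss(0, 5 ** 0.5) + 4 * rng.gauss(0, 3 ** 0.5) - 3 for _ in range(100_000)]
mean_z = sum(zs) / len(zs)
sample_var = sum((z - mean_z) ** 2 for z in zs) / (len(zs) - 1)
# the sample variance should be close to 68
print(round(sample_var, 1))
```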
11. There are two service lines. The random variables X and Y are the proportions of time
that line 1 and line 2 are in use, respectively. The joint probability density function for (X,
Y) is given by
$$f(x, y) = \begin{cases} \frac{3}{2}(x^2 + y^2), & 0 \le x, y \le 1, \\ 0, & \text{otherwise.} \end{cases}$$
a) Determine whether or not X and Y are independent.
We check the independence of the random variables X and Y by checking whether the
following identity holds:
$$f(x, y) = g(x)h(y), \quad \text{for all } 0 \le x, y \le 1 \qquad (1)$$
where g and h are the marginal distributions of X and Y, respectively.
Let’s find the functions g and h. Since X and Y are continuous, we have:
$$g(x) = \int_y f(x, y)\,dy = \int_{0}^{1} \frac{3}{2}(x^2 + y^2)\,dy = \frac{3}{2}\left(x^2y + \frac{1}{3}y^3\right)\Big|_{0}^{1} = \frac{3}{2}\left(x^2 + \frac{1}{3}\right) = \frac{3}{2}x^2 + \frac{1}{2}$$
Similarly, we find the marginal distribution of Y:
$$h(y) = \int_x f(x, y)\,dx = \int_{0}^{1} \frac{3}{2}(x^2 + y^2)\,dx = \frac{3}{2}\left(xy^2 + \frac{1}{3}x^3\right)\Big|_{0}^{1} = \frac{3}{2}\left(y^2 + \frac{1}{3}\right) = \frac{3}{2}y^2 + \frac{1}{2}$$
Now we can check whether statement (1) is true:
$$g(x)h(y) = \left(\frac{3}{2}x^2 + \frac{1}{2}\right)\left(\frac{3}{2}y^2 + \frac{1}{2}\right) \ne \frac{3}{2}(x^2 + y^2) = f(x, y)$$
For example, at x = y = 0 we get g(0)h(0) = 1/4, while f(0, 0) = 0.
So, the random variables X and Y are not independent.
To find the expected value of Z, we can use the property
$$E(X + Y) = E(X) + E(Y).$$
Therefore, let us find the means of X and Y.
$$E(X) = \int_x x\,g(x)\,dx = \int_{0}^{1} x\left(\frac{3}{2}x^2 + \frac{1}{2}\right) dx = \frac{1}{2}\int_{0}^{1} (3x^3 + x)\,dx = \frac{1}{2}\left(\frac{3}{4}x^4 + \frac{1}{2}x^2\right)\Big|_{0}^{1} = \frac{1}{2}\left(\frac{3}{4} + \frac{1}{2}\right) = \frac{5}{8}$$
Similarly, we get
$$E(Y) = \int_y y\,h(y)\,dy = \int_{0}^{1} y\left(\frac{3}{2}y^2 + \frac{1}{2}\right) dy = \frac{1}{2}\int_{0}^{1} (3y^3 + y)\,dy = \frac{1}{2}\left(\frac{3}{4}y^4 + \frac{1}{2}y^2\right)\Big|_{0}^{1} = \frac{5}{8}$$
Therefore, we obtain
$$E(Z) = E(X + Y) = E(X) + E(Y) = \frac{5}{8} + \frac{5}{8} = \frac{5}{4}$$
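Further, since X and Y are not independent random variables, we cannot use the identity
E(XY) = E(X)E(Y). Therefore, we compute the required expected value directly:
$$
\begin{aligned}
E(XY) &= \int_x\int_y xy f(x, y)\,dy\,dx = \int_{0}^{1}\int_{0}^{1} xy \cdot \frac{3}{2}(x^2 + y^2)\,dy\,dx = \frac{3}{2}\int_{0}^{1}\int_{0}^{1} (x^3y + xy^3)\,dy\,dx \\
&= \frac{3}{2}\int_{0}^{1} \left(\frac{1}{2}x^3y^2 + \frac{1}{4}xy^4\right)\Big|_{0}^{1} dx = \frac{3}{2}\int_{0}^{1} \left(\frac{x^3}{2} + \frac{x}{4}\right) dx = \frac{3}{8}\int_{0}^{1} (2x^3 + x)\,dx \\
&= \frac{3}{8}\left(\frac{1}{2}x^4 + \frac{1}{2}x^2\right)\Big|_{0}^{1} = \frac{3}{8}\left(\frac{1}{2} + \frac{1}{2}\right) = \frac{3}{8}
\end{aligned}
$$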
c) Find Var(X), Var(Y), and Cov(X, Y).
In order to find the variances of X and Y, we need the means of X² and Y².
$$
\begin{aligned}
E(X^2) &= \int_x\int_y x^2 f(x, y)\,dy\,dx = \int_x x^2 g(x)\,dx = \int_{0}^{1} x^2\left(\frac{3}{2}x^2 + \frac{1}{2}\right) dx = \frac{1}{2}\int_{0}^{1} (3x^4 + x^2)\,dx \\
&= \frac{1}{2}\left(\frac{3}{5}x^5 + \frac{1}{3}x^3\right)\Big|_{0}^{1} = \frac{1}{2}\left(\frac{3}{5} + \frac{1}{3}\right) = \frac{7}{15}
\end{aligned}
$$
Similarly,
$$E(Y^2) = \int_{0}^{1} y^2\left(\frac{3}{2}y^2 + \frac{1}{2}\right) dy = \frac{1}{2}\int_{0}^{1} (3y^4 + y^2)\,dy = \frac{1}{2}\left(\frac{3}{5}y^5 + \frac{1}{3}y^3\right)\Big|_{0}^{1} = \frac{7}{15}$$
Now we have everything required to compute the variances of X and Y:
$$Var(X) = E(X^2) - (E(X))^2 = \frac{7}{15} - \left(\frac{5}{8}\right)^2 = \frac{73}{960}$$
and, by exactly the same computation, $Var(Y) = \tfrac{73}{960}$.
We compute the covariance using the formula
$$Cov(X, Y) = E(XY) - E(X)E(Y) = \frac{3}{8} - \left(\frac{5}{8}\right)^2 = -\frac{1}{64}$$
d) Find Var(X + Y).
$$
\begin{aligned}
Var(X + Y) = \sigma^2_{X+Y} &= (1)^2\sigma_X^2 + (1)^2\sigma_Y^2 + (2)(1)(1)\sigma_{XY} = Var(X) + Var(Y) + 2\,Cov(X, Y) \\
&= \frac{73}{960} + \frac{73}{960} + 2\left(-\frac{1}{64}\right) = \frac{29}{240}
\end{aligned}
$$
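For the density (3/2)(x² + y²) on the unit square, every moment splits into two products of one-dimensional power integrals: E(XᵃYᵇ) = (3/2)[1/((a + 3)(b + 1)) + 1/((a + 1)(b + 3))]. This closed form is derived here for the sketch, not taken from the original. The Python sketch below (the helper `moment` is hypothetical) uses it to recheck the means, variances, covariance and Var(X + Y) exactly.

```python
from fractions import Fraction as Fr


def moment(a, b):
    """E(X^a Y^b) for f(x, y) = (3/2)(x^2 + y^2) on the unit square."""
    return Fr(3, 2) * (Fr(1, (a + 3) * (b + 1)) + Fr(1, (a + 1) * (b + 3)))


assert moment(0, 0) == 1              # f integrates to 1
ex, ey = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - ex ** 2
var_y = moment(0, 2) - ey ** 2
cov = moment(1, 1) - ex * ey
var_sum = var_x + var_y + 2 * cov
print(ex, ey, var_x, var_y, cov, var_sum)  # 5/8 5/8 73/960 73/960 -1/64 29/240
```

The nonzero covariance is consistent with the conclusion in part a) that X and Y are not independent.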