Assignment 1 To 4 - BTC507 - 20376005
Of
BIOSTATISTICS AND EXPERIMENTAL DESIGN (THEORY AND LAB)
Submitted By
Raghib Ishraq Alvy
20376005
Department of Mathematics and Natural Sciences
BRAC University
Submitted to
Mohammad Rafiqul Islam, PhD
Associate Professor
Department of Mathematics and Natural Sciences
BRAC University
TABLE OF CONTENTS
ASSIGNMENT NO: 01
ASSIGNMENT NO: 02
ASSIGNMENT NO: 03
ASSIGNMENT NO: 04
Assignment No: 01
Problem‐1: The daily sleeping durations (hours) of different persons were as follows:
7 8 10 0 5 12 3 4 6
1. Find the variance and standard deviation of sleeping duration and comment.
2. What is the variability of sleeping duration? Comment.
3. Calculate the coefficient of variation (C.V.) of sleeping duration.
Solution:
1. Variance, s² = (1/(n−1)) (∑x² − n x̄²)
Here,
n = 9
Mean, x̄ = (7+8+10+0+5+12+3+4+6) / 9
= 55 / 9
= 6.11 hours
x (hours)    x²
7            7² = 49
8            8² = 64
10           10² = 100
0            0² = 0
5            5² = 25
12           12² = 144
3            3² = 9
4            4² = 16
6            6² = 36
∑x = 55      ∑x² = 443
Variance, s² = (1/(n−1)) (∑x² − n x̄²)
= (1/(9−1)) {443 − 9 × (6.11)²}
= (1/8) (443 − 335.988)
= 0.125 × 107.01
= 13.37 hours²
Comment: The average squared difference of daily sleeping duration of different persons from
the mean sleeping hours is 13.37 hours².
Standard deviation, s = √variance
= √13.37
= 3.66 hours
Comment: The average difference of daily sleeping duration of different persons from the mean
sleeping hours is 3.66 hours.
2. The variability of sleeping duration: to describe the variability of the sleeping duration we
need the variance, the standard deviation and the coefficient of variation of the
provided data.
Variance, s² = 13.37 hours². This means that the average squared difference of daily sleeping
duration of different persons from the mean sleeping hours is 13.37 hours².
Standard deviation, s = 3.66 hours. This means that the average difference of daily sleeping
duration of different persons from the mean sleeping hours is 3.66 hours.
We know that,
Coefficient of variation, CV = (Standard deviation / Mean) × 100%
∴ CV = (3.66 / 6.11) × 100%
= 59.90%
Comment: The standard deviation of daily sleeping duration is 59.90% of the mean sleeping
duration, so the durations vary considerably relative to the mean.
3. From the previous calculation, the coefficient of variation is 59.90%.
Comment: Relative to the mean sleeping hours, the daily sleeping duration of different persons
varies by 59.90%.
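The hand calculation for the sleeping data can be cross-checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the assignment. Because it does not round the mean to 6.11 before squaring, its results may differ from the hand-worked values in the last decimal places.

```python
import math

# Daily sleeping durations (hours) from Assignment 1, Problem 1.
durations = [7, 8, 10, 0, 5, 12, 3, 4, 6]

n = len(durations)
mean = sum(durations) / n
sum_of_squares = sum(x ** 2 for x in durations)

# Sample variance via the shortcut formula (sum x^2 - n * mean^2) / (n - 1).
variance = (sum_of_squares - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)

# Coefficient of variation, expressed as a percentage of the mean.
cv_percent = std_dev / mean * 100

print(f"mean = {mean:.2f} hours")
print(f"variance = {variance:.2f} hours^2")
print(f"standard deviation = {std_dev:.2f} hours")
print(f"CV = {cv_percent:.2f} %")
```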
Problem‐2: The average monthly sales of shop “Fair” are 900 (million $) with variance 25
(million $)², whereas the average monthly sales of shop “People” are 1700 (million $) with variance
40 (million $)².
a. Which shop is more stable (or consistent) in monthly sales?
b. Which shop has more variation in monthly sales?
(Fair‐cv: 0.55%, People‐cv: 0.37%)
c. Which shop is less variable in monthly sales?
Solution:
a. Standard deviation of “Fair” = √25 = 5 (million $)
Standard deviation of “People” = √40 = 6.32 (million $)
CV1 = (Standard deviation / Mean) × 100%
= (5 / 900) × 100%
= 0.55 %
CV2 = (Standard deviation / Mean) × 100%
= (6.32 / 1700) × 100%
= 0.37 %
Comment: Since CV2 < CV1, the shop “People” is relatively more stable or consistent in monthly
sales.
b. The shop “Fair” has a coefficient of variation of 0.555% and has more variation in
monthly sales.
c. The shop “People” has a coefficient of variation of 0.372% and is less variable in monthly
sales.
Problem‐3: These are the runs of two cricket players in the last 7 innings:
Player A (runs) 100 20 0 50 0 0 120
Player B (runs) 30 50 20 30 40 0 10
Solution:
i. For player A:
Mean, x̄ = (100+20+0+50+0+0+120) / 7
= 290 / 7
= 41.428 ≈ 41.43 runs
And n = 7
x (runs)     x²
100          100² = 10000
20           20² = 400
0            0² = 0
50           50² = 2500
0            0² = 0
0            0² = 0
120          120² = 14400
∑x = 290     ∑x² = 27300
Now, Variance, s² = (1/(n−1)) (∑x² − n x̄²)
= (1/(7−1)) {27300 − 7 × (41.428)²}
= (1/6) (27300 − 12013.95)
= (1/6) × 15286.05
= 2547.67 runs²
The average squared difference of the runs of cricket player A in the last 7 innings from the
mean runs is 2547.67 runs².
Standard deviation, s = √variance
= √2547.67
= 50.47 runs
The average difference of the runs of cricket player A in the last 7 innings from the mean runs is
50.47 runs.
CV1 = (Standard deviation / Mean) × 100%
= (50.474 / 41.428) × 100%
= 121.83 %
The standard deviation of the runs of cricket player A in the last 7 innings is 121.83% of the
mean runs.
For player B:
Mean, x̄ = (30+50+20+30+40+0+10) / 7
= 180 / 7
= 25.71 runs
x (runs)     x²
30           30² = 900
50           50² = 2500
20           20² = 400
30           30² = 900
40           40² = 1600
0            0² = 0
10           10² = 100
∑x = 180     ∑x² = 6400
Variance, s² = (1/(n−1)) (∑x² − n x̄²)
= (1/(7−1)) (6400 − 7 × (25.71)²)
= (1/6) × (6400 − 4627.02)
= 295.49 runs²
The average squared difference of the runs of cricket player B in the last 7 innings from the mean
runs is 295.49 runs².
Standard deviation, s = √variance
= √295.49
= 17.19 runs
The average difference of the runs of cricket player B in the last 7 innings from the mean runs is
17.19 runs.
Therefore, the coefficient of variation for player B is:
CV2 = (Standard deviation / Mean) × 100%
= (17.19 / 25.71) × 100%
= 66.86%
The standard deviation of the runs of cricket player B in the last 7 innings is 66.86% of the
mean runs.
Since CV2 < CV1, player B is relatively more consistent than player A.
ii. I would select player B, because player B has a smaller coefficient of variation than player A.
This means player B is relatively more consistent (stable) in his performance and shows less
variability than player A.
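The comparison of the two players can be reproduced with the standard-library `statistics` module, whose `stdev` uses the same n − 1 divisor as the formula above. This is an illustrative sketch; the helper name `coefficient_of_variation` is an assumption introduced here, and the unrounded results may differ slightly from the hand-worked figures.

```python
import statistics

# Runs in the last 7 innings, taken from the worked solution.
player_a = [100, 20, 0, 50, 0, 0, 120]
player_b = [30, 50, 20, 30, 40, 0, 10]


def coefficient_of_variation(values):
    """Return the sample CV in percent: stdev (n - 1 divisor) / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


cv_a = coefficient_of_variation(player_a)
cv_b = coefficient_of_variation(player_b)

# The player with the smaller CV is the relatively more consistent one.
more_consistent = "B" if cv_b < cv_a else "A"

print(f"CV of player A = {cv_a:.2f} %")
print(f"CV of player B = {cv_b:.2f} %")
print(f"More consistent player: {more_consistent}")
```

A smaller CV only indicates relative consistency; it says nothing about which player scores more on average.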
Problem‐4. The average monthly income of Barclays Bank (UK) employees is 6500$ with
standard deviation 450$ whereas average monthly income of Dhaka Bank (Bangladesh)
employees is 65000tk with standard deviation 5500 tk.
a) Compare monthly income of two Banks
b) Which bank is more uniform (or homogeneous) in paying monthly income of
employees and why?
c) Which bank is more variable in paying monthly income of employees and why?
Solution:
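a) The two incomes are in different currencies, so they are compared through the unit-free
coefficient of variation. The coefficients of variation of the two banks are, respectively:
CV1 (Barclays Bank) = (Standard deviation / Mean) × 100%
= (450 / 6500) × 100%
= 6.92 %
CV2 (Dhaka Bank) = (Standard deviation / Mean) × 100%
= (5500 / 65000) × 100%
= 8.46 %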
b) Barclays Bank (UK) is more uniform (or homogeneous) in paying monthly income to its
employees. This is because the coefficient of variation of Barclays Bank (UK) (6.92%) is smaller
than the coefficient of variation of Dhaka Bank (Bangladesh) (8.46%). Thus, Barclays Bank (UK)
is more consistent in paying monthly income to its employees than Dhaka
Bank (Bangladesh).
c) Dhaka Bank (Bangladesh) is more variable in paying monthly income to its employees, as it has
a larger coefficient of variation than Barclays Bank (UK).
Problem-5. Survival times (in years) of cancer patients were as follows:
8, 12, 10, 12, 15, 20, 1
1. Calculate the average survival time and comment.
2. Find the variance and standard deviation (s.d.) of survival time and comment.
3. What is the variability of survival?
4. Calculate the coefficient of variation (C.V.) of survival time.
5. Also find the range.
Solution:
1. The average survival time of cancer patients:
Mean, x̄ = (8+12+10+12+15+20+1) / 7
= 78 / 7
= 11.14 years
Comment: The average survival time of cancer patients is 11.14 years.
2.
x (years)    x²
8            8² = 64
12           12² = 144
10           10² = 100
12           12² = 144
15           15² = 225
20           20² = 400
1            1² = 1
∑x = 78      ∑x² = 1078
Here, n = 7
Variance, s² = (1/(n−1)) (∑x² − n x̄²)
= (1/(7−1)) (1078 − 7 × 11.14²)
= (1/6) × 209.30
= 34.88 years²
Comment: The average squared difference of the survival time (in years) of cancer patients from
the mean survival time is 34.88 years².
Standard deviation, s = √variance
= √34.88
= 5.90 years
Comment: The average difference of the survival time (in years) of cancer patients from the
mean survival time is 5.90 years.
3. The variability of survival: The variability of survival of cancer patients is
described by the variance and the standard deviation. The variance of the survival
time of cancer patients is 34.88 years² and the standard deviation is 5.90 years.
4. Coefficient of variation:
CV = (Standard deviation / Mean) × 100%
= (5.90 / 11.14) × 100%
= 52.96 %
Comment: The standard deviation of the survival time of cancer patients is 52.96% of the
mean survival time.
5. Range = largest value − smallest value = 20 − 1 = 19 years.
Problem 6: The following data show the duration of pain relief after taking
a particular medicine:
Duration (hours)    No. of patients
4–8                 6
8–12                5
12–16               9
16–20               4
20–24               2
Find
1. Mean (average) duration of pain relief
2. Variance and s.d. of duration of pain relief
3. What is the conclusion about the variability (variation) of duration of pain relief?
4. Calculate the coefficient of variation (C.V.)
Solution:
1.
Duration (hours)   Midpoint, x   No. of patients, f   fx     x²     fx²
4–8                6             6                    36     36     216
8–12               10            5                    50     100    500
12–16              14            9                    126    196    1764
16–20              18            4                    72     324    1296
20–24              22            2                    44     484    968
                                 ∑f = 26              ∑fx = 328     ∑fx² = 4744
Mean (average) duration of pain relief is measured as follows:
Mean, x̄ = ∑fx / ∑f
= 328 / 26
= 12.61 hours.
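2. Variance, s² = (1/(n−1)) (∑fx² − n x̄²), where n = ∑f = 26
= (1/(26−1)) (4744 − 26 × (12.615)²)
= (1/25) × 606.41
= 24.26 hours²
Standard deviation, s = √variance
= √24.26
= 4.92 hours.
Comment: The average difference of the duration of pain relief after taking a particular medicine
from the mean duration of pain relief is 4.92 hours.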
3. Conclusion about the variability (variation) of duration of pain relief: The variability
(variation) of duration of pain relief is described by the variance and the standard
deviation. The variance of the duration of pain relief after taking the medicine is 24.26
hours² and the standard deviation is 4.92 hours.
4. Coefficient of variation:
CV = (Standard deviation / Mean) × 100%
= (4.92 / 12.61) × 100%
= 39.01 %
Comment: The standard deviation of the duration of pain relief after taking the medicine
is 39.01% of the mean duration of pain relief.
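For grouped data, each class is represented by its midpoint and weighted by its frequency. The following sketch, with illustrative variable names that are not part of the assignment, applies the same shortcut formula with n = ∑f. It keeps full precision for the mean, so the last decimals may differ from the hand calculation.

```python
import math

# (lower limit, upper limit, number of patients) for each duration class.
classes = [(4, 8, 6), (8, 12, 5), (12, 16, 9), (16, 20, 4), (20, 24, 2)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)                                               # sum of f
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f*x
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))  # sum of f*x^2

mean = sum_fx / n
variance = (sum_fx2 - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)
cv_percent = std_dev / mean * 100

print(f"mean = {mean:.2f} hours, variance = {variance:.2f} hours^2")
print(f"standard deviation = {std_dev:.2f} hours, CV = {cv_percent:.2f} %")
```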
Assignment No: 02
Problem‐1 The following are the sales and profits of a company for the last 7 days:
Sales('000'Tk) Profit ('000' TK)
6 1
7 1
8 3
11 5
12 6
10 4
12 5
1. Calculate the (Pearson's) correlation coefficient (r) between sales and profit and comment.
2. Find the relationship between sales and profit by scatter diagram and comment (interpret).
3. Fit a least squares regression line of profit on sales and comment (ordinary least squares
method).
4. If sales are 15 ('000' Tk), what is the profit?
5. Find the coefficient of determination (how well the regression line is fitted).
Solution:
Sales ('000' Tk), x   Profit ('000' Tk), y   xy     x²     y²
6                     1                      6      36     1
7                     1                      7      49     1
8                     3                      24     64     9
11                    5                      55     121    25
12                    6                      72     144    36
10                    4                      40     100    16
12                    5                      60     144    25
∑x = 66               ∑y = 25                ∑xy = 264   ∑x² = 658   ∑y² = 113
Here,
n = 7
x̄ = ∑x / n = 66 / 7 = 9.42
ȳ = ∑y / n = 25 / 7 = 3.57
1. The Pearson correlation coefficient, r, is
r = (∑xy − n x̄ ȳ) / √((∑x² − n x̄²)(∑y² − n ȳ²))
= 28.6 / √(36.89 × 23.82)
= 0.96
As r is positive (0.96) and close to +1, there is a strong positive correlation
between sales and profit. So, when sales increase, profit also tends to increase.
2.
The scatter plot shows that the points trend upward, which means the correlation
between sales and profit is positive: as sales increase, profit also tends to increase.
5. The coefficient of determination is
r² = (0.96)² = 0.92
About 92% of the variation in profit is explained by its linear relationship with
sales.
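The correlation, the least squares line of profit on sales, and the prediction for sales of 15 ('000' Tk) can all be obtained from the column totals. The sketch below is one way to do this with plain Python; the names `s_xy`, `s_xx` and `s_yy` are shorthand introduced here for the corrected sums. It uses unrounded means, so its values can differ somewhat from figures worked by hand with rounded means.

```python
import math

# Sales and profit ('000' Tk) for the last 7 days.
sales = [6, 7, 8, 11, 12, 10, 12]
profit = [1, 1, 3, 5, 6, 4, 5]

n = len(sales)
mean_x = sum(sales) / n
mean_y = sum(profit) / n

# Corrected sums of products and squares.
s_xy = sum(x * y for x, y in zip(sales, profit)) - n * mean_x * mean_y
s_xx = sum(x ** 2 for x in sales) - n * mean_x ** 2
s_yy = sum(y ** 2 for y in profit) - n * mean_y ** 2

r = s_xy / math.sqrt(s_xx * s_yy)    # Pearson correlation coefficient
slope = s_xy / s_xx                  # beta-hat
intercept = mean_y - slope * mean_x  # alpha-hat
r_squared = r ** 2                   # coefficient of determination

predicted_profit = intercept + slope * 15  # profit when sales = 15

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"profit = {intercept:.3f} + {slope:.3f} * sales")
print(f"predicted profit at sales 15 = {predicted_profit:.3f}")
```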
Problem‐2 The demand and price of a specific product are given in the following table:
Demand (kg) Price (tk)
10 25
8 37
9 40
7.5 45
5 48
4.5 50
3 55
2 70
1. Fit a least squares regression equation (line) of demand on price and comment.
2. What will the demand be when the price is 52 (tk) and 10 (tk)?
3. Find the correlation coefficient and comment.
4. Find the coefficient of determination (how well the regression line is fitted).
5. Find the relationship between demand and price by scatter diagram and comment
(interpret).
Solution:
Demand (kg), x   Price (tk), y   xy       x²       y²
10               25              250      100      625
8                37              296      64       1369
9                40              360      81       1600
7.5              45              337.5    56.25    2025
5                48              240      25       2304
4.5              50              225      20.25    2500
3                55              165      9        3025
2                70              140      4        4900
∑x = 49          ∑y = 370        ∑xy = 2013.5   ∑x² = 359.5   ∑y² = 18348
Here,
n = 8
x̄ = ∑x / n = 49 / 8 = 6.12
ȳ = ∑y / n = 370 / 8 = 46.25
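With demand as x and price as y (as in the table), the fitted line is ŷ = α̂ + β̂x = 71.89 − 4.19x.
α̂ = 71.89 means that the predicted price is 71.89 Tk when the demand is 0 kg.
β̂ = −4.19 means that the price decreases by 4.19 Tk, on average, for each 1 kg increase in demand.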
4. The coefficient of determination is
r² = (−0.92)² = 0.8464 ≈ 0.85
About 85% of the variation in price is explained by its linear relationship with
demand.
5.
The scatter plot shows that the points trend downward, which means the
correlation between demand and price is negative: as demand increases, price tends to
decrease.
Problem‐3
To study the strength of a certain wire, the following pairs of observations were recorded:
the diameter (in cm) and the mass supported (in kg/cm).
Diameter (cm) 0.6 1.0 1.2 1.6 2.0 2.2
Mass (kg/cm) 20 50 55 100 105 110
i. Fit a regression model of mass on diameter.
ii. What will be the mass for a wire with diameter 1.3 cm?
iii. Draw a scatter diagram and comment.
Solution:
Diameter (cm), x   Mass (kg/cm), y   xy     x²      y²
0.6                20                12     0.36    400
1                  50                50     1       2500
1.2                55                66     1.44    3025
1.6                100               160    2.56    10000
2                  105               210    4       11025
2.2                110               242    4.84    12100
∑x = 8.6           ∑y = 440          ∑xy = 740   ∑x² = 14.2   ∑y² = 39050
Here,
n = 6
x̄ = ∑x / n = 8.6 / 6 = 1.43
ȳ = ∑y / n = 440 / 6 = 73.33
iii.
The scatter plot shows that the points trend upward, which means the correlation
between diameter and mass is positive: as diameter increases, mass also tends to increase.
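Parts i and ii ask for the regression of mass on diameter and a prediction at 1.3 cm. A minimal sketch of that calculation follows; `fit_line` is a hypothetical helper written for this illustration, and it works from deviations about the unrounded means rather than from the rounded values in the table.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


diameter = [0.6, 1.0, 1.2, 1.6, 2.0, 2.2]  # cm
mass = [20, 50, 55, 100, 105, 110]         # kg/cm

intercept, slope = fit_line(diameter, mass)
mass_at_1_3 = intercept + slope * 1.3      # predicted mass for a 1.3 cm wire

print(f"mass = {intercept:.2f} + {slope:.2f} * diameter")
print(f"predicted mass at 1.3 cm = {mass_at_1_3:.2f}")
```

Since 1.3 cm lies inside the observed range of diameters (0.6 to 2.2 cm), the prediction is an interpolation rather than an extrapolation.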
Problem‐4
Daily studying(x) (in hours) and marks (y) in a quiz (out of 15) of 10 students were as follows:
Studying 6 5 4 2 1 7 10 0 8 5
Marks 14 12 10 8 6 12 13 12 15 9
i. Fit the regression equation of marks (y) on studying (x).
ii. What will be marks for studying 3 hours daily?
iii. Can you verify the relationship between marks and daily studying? Try all the
methods.
Solution:
Studying, x   Marks, y   xy     x²     y²
6             14         84     36     196
5             12         60     25     144
4             10         40     16     100
2             8          16     4      64
1             6          6      1      36
7             12         84     49     144
10            13         130    100    169
0             12         0      0      144
8             15         120    64     225
5             9          45     25     81
∑x = 48       ∑y = 111   ∑xy = 585   ∑x² = 320   ∑y² = 1303
Here,
n = 10
x̄ = ∑x / n = 48 / 10 = 4.8
ȳ = ∑y / n = 111 / 10 = 11.1
3. The Pearson correlation coefficient, r, is
r = (∑xy − n x̄ ȳ) / √((∑x² − n x̄²)(∑y² − n ȳ²))
= (585 − 10 × 4.8 × 11.1) / √((320 − 10 × 4.8²)(1303 − 10 × 11.1²))
= 52.2 / √(89.6 × 70.9)
= 0.65
As r is positive (0.65), there is a moderately strong positive correlation between studying
and marks. So, when the duration of study increases, marks also tend to increase.
The scatter plot shows that the points trend upward, which means the correlation
between duration of study and marks is positive: as the duration of study increases, marks
also tend to increase.
Problem‐5
A department of transportation’s study on driving speed and mileage for midsize
automobiles resulted in the following table:
Driving speed 30 40 50 55 25
Mileage 27 25 30 35 22
a. Is there any relationship between driving speed and mileage? Verify your answer.
b. Find the regression equation of driving speed on mileage.
c. What will be the mileage when the speed is 45?
d. Test the fitness of your regression model with explanation.
Solution:
Driving speed, x   Mileage, y   xy      x²      y²
30                 27           810     900     729
40                 25           1000    1600    625
50                 30           1500    2500    900
55                 35           1925    3025    1225
25                 22           550     625     484
∑x = 200           ∑y = 139     ∑xy = 5785   ∑x² = 8650   ∑y² = 3963
Here,
n = 5
x̄ = ∑x / n = 200 / 5 = 40
ȳ = ∑y / n = 139 / 5 = 27.8
α̂ = 14.2 means that the predicted mileage is 14.2 when the driving speed is 0.
β̂ = 0.34 means that the mileage increases by 0.34 units, on average, for each 1-unit increase in driving speed.
Problem‐6: correlation and regression
Father’s Height (inches) Son’s Height (inches)
58 60
60 62
62 63
64 64
65 64
66 65
67 66
70 67
73 69
75 70
1. Find the relationship between son’s height and father’s height and comment.
2. Find a least squares regression equation of son’s height on father’s height and
comment.
3. What will the son’s height be when the father’s height is 77 and 56 inches?
4. How well is the regression line fitted?
Solution:
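Father’s height (inches), x   Son’s height (inches), y   xy      x²      y²
58                            60                         3480    3364    3600
60                            62                         3720    3600    3844
62                            63                         3906    3844    3969
64                            64                         4096    4096    4096
65                            64                         4160    4225    4096
66                            65                         4290    4356    4225
67                            66                         4422    4489    4356
70                            67                         4690    4900    4489
73                            69                         5037    5329    4761
75                            70                         5250    5625    4900
∑x = 660                      ∑y = 650                   ∑xy = 43051   ∑x² = 43828   ∑y² = 42336
Here,
n = 10
x̄ = ∑x / n = 660 / 10 = 66
ȳ = ∑y / n = 650 / 10 = 65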
1. The Pearson correlation coefficient, r, is
r = (∑xy − n x̄ ȳ) / √((∑x² − n x̄²)(∑y² − n ȳ²))
= (43051 − 10 × 66 × 65) / √((43828 − 10 × 66²)(42336 − 10 × 65²))
= 151 / √(268 × 86)
= 0.99
As r is positive (0.99) and close to +1, there is a strong positive correlation
between father’s height and son’s height. So, when father’s height increases, son’s height also
tends to increase.
4. The coefficient of determination is
r² = (0.99)² = 0.98
About 98% of the variation in son’s height is explained by its linear relationship with
father’s height.
Assignment No: 03
Conditional probability
Problem 1: In a hospital, 60% of the patients suffer from diabetes, 40% from heart disease, and
20% from both diabetes and heart disease. One patient is selected at random.
(i) Find the probability that he suffers from heart disease, if he suffers from diabetes.
(ii) Find the probability that he suffers from diabetes, if he suffers from heart disease.
(iii) Find the probability that he suffers from at least one disease (either diabetes or
heart disease).
Solution:
P(diabetes) = 60% = 0.6
P(heart disease) = 40% = 0.4
P(heart disease ∩ diabetes) = 20% = 0.2
(i) The probability that he suffers from heart disease, given that he suffers from diabetes, is
P(heart disease | diabetes) = P(heart disease ∩ diabetes) / P(diabetes) = 0.2 / 0.6 = 0.33
(ii) The probability that he suffers from diabetes, given that he suffers from heart disease, is
P(diabetes | heart disease) = P(heart disease ∩ diabetes) / P(heart disease) = 0.2 / 0.4 = 0.5
(iii) The probability that he suffers from at least one disease (either diabetes or heart disease) is
P(heart disease ∪ diabetes) = P(heart disease) + P(diabetes) − P(heart disease ∩ diabetes)
= 0.4 + 0.6 − 0.2 = 0.8
So,
(i) the probability that he suffers from heart disease, if he suffers from diabetes, is 33%
(ii) the probability that he suffers from diabetes, if he suffers from heart disease, is 50%
(iii) the probability that he suffers from at least one disease (either diabetes or
heart disease) is 80%
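A conditional probability divides the joint probability by the probability of the conditioning event, so the denominator is always the event written after "given". The sketch below makes that explicit; the helper `conditional` is a hypothetical name introduced for illustration.

```python
# Probabilities stated in the problem.
p_diabetes = 0.6
p_heart = 0.4
p_both = 0.2  # P(heart disease and diabetes)


def conditional(p_joint, p_given):
    """P(A | B) = P(A and B) / P(B), where p_given is P(B)."""
    return p_joint / p_given


# (i) heart disease given diabetes: condition on diabetes.
p_heart_given_diabetes = conditional(p_both, p_diabetes)

# (ii) diabetes given heart disease: condition on heart disease.
p_diabetes_given_heart = conditional(p_both, p_heart)

# (iii) at least one disease, by the addition rule.
p_either = p_diabetes + p_heart - p_both

print(f"P(heart | diabetes) = {p_heart_given_diabetes:.2f}")
print(f"P(diabetes | heart) = {p_diabetes_given_heart:.2f}")
print(f"P(heart or diabetes) = {p_either:.2f}")
```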
Problem 2: Twenty percent of women in a community are diabetic. Of these, 75% have low
bone mineral density (BMD). Of those who do not have diabetes, 20% have low BMD. What
is the probability that a randomly selected woman with low BMD has diabetes?
Solution:
P(diabetes) = 20% = 0.2
P(no diabetes) = 1 − 0.2 = 0.8
P(low BMD | diabetes) = 75% = 0.75
P(low BMD | no diabetes) = 20% = 0.2
P(low BMD ∩ diabetes) = P(low BMD | diabetes) × P(diabetes) = 0.75 × 0.2 = 0.15
P(low BMD) = 0.75 × 0.2 + 0.2 × 0.8 = 0.15 + 0.16 = 0.31
By Bayes' theorem, the probability that a randomly selected woman with low BMD has diabetes is
P(diabetes | low BMD) = P(low BMD ∩ diabetes) / P(low BMD) = 0.15 / 0.31 = 0.48
So,
The probability that a randomly selected woman with low BMD has diabetes is 48%
Solution:
P(I) = 14% = 0.14
P(H) = 26% = 0.26
where I denotes insomnia and H denotes headache. Treating the two events as independent, the
probability that a patient experiences insomnia and also experiences headache is
P(I ∩ H) = P(I) × P(H) = 0.14 × 0.26 = 0.036
So,
the probability that a patient experiences both insomnia and headache is
3.6%
Problem 4: The distribution of students is as follows (contingency/cross table):
IT Software
Male 8 4
Female 5 15
One student is selected at random. What is the probability that the student is
i. from IT?
ii. female?
iii. male and from Software?
iv. female or from IT?
v. from IT or Software?
vi. female, if (given that) the student is from IT?
vii. If the student is male, what is the probability that the student is from Software?
viii. Are "student from IT" and "male" dependent? Why? Are they mutually exclusive?
IT Software Total
Male 8 4 12
Female 5 15 20
Total 13 19 32
i. From IT
P(IT) = 13/32 = 0.41
ii. Female
P(Female) = 20/32 = 0.625
iii. Male and from Software
P(Male ∩ Software) = 4/32 = 0.125
iv. Female or from IT
P(Female ∪ IT) = P(Female) + P(IT) − P(Female ∩ IT)
= 20/32 + 13/32 − 5/32
= 28/32 = 0.875
v. From IT or Software
P(IT ∪ Software) = P(IT) + P(Software) − P(IT ∩ Software)
= 13/32 + 19/32 − 0 = 1
(no student belongs to both departments, so P(IT ∩ Software) = 0)
vi. Female, given that the student is from IT
P(Female | IT) = P(Female ∩ IT) / P(IT)
= (5/32) / (13/32) = 5/13 = 0.38
vii. From Software, given that the student is male
P(Software | Male) = P(Male ∩ Software) / P(Male)
= (4/32) / (12/32) = 4/12 = 0.33
viii. Dependence and mutual exclusivity of "from IT" and "male"
P(Male ∩ IT) = 8/32 = 0.25
P(Male) × P(IT) = (12/32) × (13/32) = 0.15
Since P(Male ∩ IT) ≠ P(Male) × P(IT), the two events are dependent.
Since P(Male ∩ IT) ≠ 0, the two events are not mutually exclusive.
So,
i. The probability that the student is from IT is 41%
ii. The probability that the student is female is 62.5%
iii. The probability that the student is male and from Software is 12.5%
iv. The probability that the student is female or from IT is 87.5%
v. The probability that the student is from IT or Software is 100%
vi. The probability that the student is female, given that the student is from IT, is 38%
vii. The probability that the student is from Software, given that the student is male, is 33%
viii. "From IT" and "male" are dependent and are not mutually exclusive
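Every probability in this problem is a ratio of cell counts, so exact fractions avoid the rounding that builds up when decimals such as 0.40 and 0.62 are added. The sketch below uses the standard-library `fractions` module; the `prob` helper and its keyword arguments are illustrative names, not part of the assignment.

```python
from fractions import Fraction

# Cell counts from the contingency table: (sex, department) -> students.
counts = {
    ("Male", "IT"): 8,
    ("Male", "Software"): 4,
    ("Female", "IT"): 5,
    ("Female", "Software"): 15,
}
total = sum(counts.values())


def prob(sex=None, dept=None):
    """Probability that a random student matches the given sex and/or department."""
    matching = sum(
        c for (s, d), c in counts.items()
        if (sex is None or s == sex) and (dept is None or d == dept)
    )
    return Fraction(matching, total)


p_it = prob(dept="IT")
p_female = prob(sex="Female")
p_female_or_it = p_female + p_it - prob("Female", "IT")
p_it_or_software = p_it + prob(dept="Software")  # disjoint departments
p_female_given_it = prob("Female", "IT") / p_it
p_software_given_male = prob("Male", "Software") / prob(sex="Male")

# Independence holds only if the joint probability equals the product of marginals.
independent = prob("Male", "IT") == prob(sex="Male") * p_it
# Mutually exclusive events can never occur together.
mutually_exclusive = prob("Male", "IT") == 0

print(p_it, p_female, p_female_or_it, p_it_or_software)
print(p_female_given_it, p_software_given_male)
print("independent:", independent, "mutually exclusive:", mutually_exclusive)
```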
Assignment No: 04
Problem 1: Suppose that a new screening test is proposed for the detection of fracture. The
prevalence of fracture in the general population is known to be 10%. The test has been investigated
in subjects with fracture and was found to give a positive result in 70% of such cases (sensitivity). When
given to subjects without fracture, the test yielded a positive result in 20% of cases.
a) What proportion (probability) of subjects with a positive test actually have a
fracture?
b) What proportion (probability) of subjects with a positive test do not have a
fracture?
c) What is the overall rate of positive test results?
Solution:
P(fracture) = 0.1
P(positive result | fracture) = 0.7
P(positive result | no fracture) = 0.2
P(no fracture) = 1 − P(fracture) = 1 − 0.1 = 0.9
a) The probability that a subject with a positive test actually has a fracture is
P(fracture | positive result)
= P(positive result | fracture) × P(fracture) / [P(positive result | fracture) × P(fracture) + P(positive result | no fracture) × P(no fracture)]
= (0.7 × 0.1) / (0.7 × 0.1 + 0.2 × 0.9)
= 0.07 / 0.25
= 0.28 = 28%
b) The probability that a subject with a positive test does not have a fracture is
P(no fracture | positive result)
= P(positive result | no fracture) × P(no fracture) / [P(positive result | no fracture) × P(no fracture) + P(positive result | fracture) × P(fracture)]
= (0.2 × 0.9) / (0.2 × 0.9 + 0.7 × 0.1)
= 0.18 / 0.25
= 0.72 = 72%
c) The overall rate of positive test results is the denominator used above:
P(positive result) = 0.7 × 0.1 + 0.2 × 0.9 = 0.25 = 25%
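Bayes' theorem weights each group's positive-test rate by the group's share of the population and then normalises by the overall positive rate. The following sketch shows that pattern for the fracture screening data; the function name `posterior` and the dictionary keys are assumptions made for this illustration.

```python
def posterior(prior, likelihood):
    """Return (posterior probabilities, total probability of the evidence).

    prior[h] is P(h); likelihood[h] is P(evidence | h).
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # law of total probability
    return {h: joint[h] / evidence for h in joint}, evidence


prior = {"fracture": 0.1, "no fracture": 0.9}
p_positive_given = {"fracture": 0.7, "no fracture": 0.2}

post, p_positive = posterior(prior, p_positive_given)

print(f"P(positive) = {p_positive:.2f}")
print(f"P(fracture | positive) = {post['fracture']:.2f}")
print(f"P(no fracture | positive) = {post['no fracture']:.2f}")
```

The two posterior values always sum to 1, which is a quick check on any hand calculation of parts a) and b).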
Problem 2: Machines A and B produce 10% and 90% respectively of the production of a component
intended for the motor industry. From experience, it is known that the probability that machine A
produces a defective component is 0.01 while the probability that machine B produces a defective
component is 0.05. If a component is selected at random from a day’s production and is found to be
defective, find the probability that it was made by
(a) machine A;
(b) machine B.
(c) What is the total rate of defectives?
Solution:
P(A) = 0.1
P(B) = 0.9
P(D | A) = 0.01
P(D | B) = 0.05
(a) P(A | D) = P(D | A) × P(A) / [P(D | A) × P(A) + P(D | B) × P(B)]
= (0.01 × 0.1) / (0.01 × 0.1 + 0.05 × 0.9)
= 0.001 / 0.046
= 0.02 = 2%
(b) P(B | D) = P(D | B) × P(B) / [P(D | B) × P(B) + P(D | A) × P(A)]
= (0.05 × 0.9) / (0.05 × 0.9 + 0.01 × 0.1)
= 0.045 / 0.046
= 0.98 = 98%
(c) The total rate of defectives is the denominator used above:
P(D) = 0.01 × 0.1 + 0.05 × 0.9 = 0.046 = 4.6%
Problem 3: A test correctly identifies a disease in 95% of people who have it. It correctly identifies
no disease in 94% of people who do not have it. In the population, 3% of the people have the disease.
What is the probability that you have the disease if you tested positive?
Solution:
P(positive | disease) = 0.95
P(negative | no disease) = 0.94
P(positive | no disease) = 1 − 0.94 = 0.06
P(disease) = 0.03
P(no disease) = 1 − P(disease) = 1 − 0.03 = 0.97
P(disease | positive)
= P(positive | disease) × P(disease) / [P(positive | disease) × P(disease) + P(positive | no disease) × P(no disease)]
= (0.95 × 0.03) / (0.95 × 0.03 + 0.06 × 0.97)
= 0.0285 / 0.0867
= 0.33 = 33%
So,
The probability that I have the disease if I tested positive is about 33%
Problem 4: There are 140 companies in a conglomerate. These companies can be categorized as small
cap, medium cap and large cap, based on their total market capital. These companies publish their
profits and losses on an annual basis. The management of the conglomerate keeps a record of the
companies by category. In 2014, it was found that 75% of the large caps, 78% of the mid‐caps and
65% of the small caps reported profits. The records show that there are twice as many small cap
companies as the medium cap companies, and twice as many medium cap companies as the large cap
companies
i. Given that a company picked at random made profits in 2014, what is the probability
that it is a Small Cap company
ii. Given that a company picked at random made losses in 2014, what is the probability that
it is a Small cap company?
iii. Given that a company picked at random made profits in 2014, what is the probability
that it is a mid-cap company?
iv. Given that a company picked at random made losses in 2014, what is the probability that
it is a large cap company?
Solution:
Let LC, MC and SC denote large cap, medium cap and small cap companies.
P(profit | LC) = 0.75
P(profit | MC) = 0.78
P(profit | SC) = 0.65
P(loss | LC) = 1 − P(profit | LC) = 1 − 0.75 = 0.25
P(loss | MC) = 1 − P(profit | MC) = 1 − 0.78 = 0.22
P(loss | SC) = 1 − P(profit | SC) = 1 − 0.65 = 0.35
Let the number of large cap companies be k. Then there are 2k medium cap and 4k small cap
companies, so k + 2k + 4k = 7k = 140 and k = 20.
P(LC) = 20/140 = 1/7
P(MC) = 40/140 = 2/7
P(SC) = 80/140 = 4/7
P(profit) = 0.75 × (1/7) + 0.78 × (2/7) + 0.65 × (4/7) = 4.91/7 = 0.7014
P(loss) = 1 − P(profit) = 1 − 0.7014 = 0.2986
i. P(SC | profit) = P(profit | SC) × P(SC) / P(profit)
= (0.65 × 4/7) / 0.7014 = 0.3714 / 0.7014 = 0.53
ii. P(SC | loss) = P(loss | SC) × P(SC) / P(loss)
= (0.35 × 4/7) / 0.2986 = 0.2 / 0.2986 = 0.67
iii. P(MC | profit) = P(profit | MC) × P(MC) / P(profit)
= (0.78 × 2/7) / 0.7014 = 0.2229 / 0.7014 = 0.32
iv. P(LC | loss) = P(loss | LC) × P(LC) / P(loss)
= (0.25 × 1/7) / 0.2986 = 0.0357 / 0.2986 = 0.12
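With three categories it can be easier to think in expected counts of companies than in probabilities. The sketch below splits the 140 companies in the 1 : 2 : 4 ratio stated in the problem, applies each category's profit rate, and reads the conditional probabilities off as shares. The names `ratio`, `share` and the category keys are illustrative assumptions.

```python
total_companies = 140

# Large : mid : small = 1 : 2 : 4 (twice as many mid as large, twice as many small as mid).
ratio = {"large": 1, "mid": 2, "small": 4}
profit_rate = {"large": 0.75, "mid": 0.78, "small": 0.65}

unit = total_companies / sum(ratio.values())
counts = {cap: unit * parts for cap, parts in ratio.items()}

# Expected number of profit-making and loss-making companies per category.
profit_counts = {cap: counts[cap] * profit_rate[cap] for cap in counts}
loss_counts = {cap: counts[cap] * (1 - profit_rate[cap]) for cap in counts}


def share(table, cap):
    """P(category | outcome): the category's share of all companies with that outcome."""
    return table[cap] / sum(table.values())


p_small_given_profit = share(profit_counts, "small")
p_small_given_loss = share(loss_counts, "small")
p_mid_given_profit = share(profit_counts, "mid")
p_large_given_loss = share(loss_counts, "large")

print(f"P(small | profit) = {p_small_given_profit:.2f}")
print(f"P(small | loss)   = {p_small_given_loss:.2f}")
print(f"P(mid | profit)   = {p_mid_given_profit:.2f}")
print(f"P(large | loss)   = {p_large_given_loss:.2f}")
```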