Lecture Notes 5 - EDA - Continuous Random Variable
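The density function of the normal random variable X, with mean μ and
variance σ², is

$$n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < +\infty$$

[Figure: Normal Curve centered at μ, with σ marked on either side of the mean; horizontal axis: X values from −∞ to +∞]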
Properties of the Normal Curve
a. The mode, which is the point on the horizontal axis where the curve is a
maximum, occurs at x = μ.
b. The curve is symmetric about a vertical axis through the mean μ.
c. The curve has its points of inflection at x = μ ± σ, is concave downward if
μ − σ < X < μ + σ, and is concave upward otherwise.
d. The normal curve approaches the horizontal axis asymptotically as we
proceed in either direction away from the mean.
e. The total area under the curve and above the horizontal axis is equal to 1.
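Transformation Formula

$$Z = \frac{X - \mu}{\sigma}$$

[Figure: a Normal Curve with any mean μ and standard deviation σ (X value axis) is transformed into the Standard Normal Curve with μ = 0 and σ = 1 (Z value axis, centered at Z = 0)]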
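The transformation formula can be tried out numerically. The following is a minimal Python sketch, not part of the original notes: the function names `standardize` and `phi` and the sample values (X = 20.5, μ = 18, σ = 2.5) are illustrative assumptions, and the area to the left of z is computed from the error function instead of being read from a table.

```python
from math import erf, sqrt

def standardize(x, mu, sigma):
    """Transformation formula: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical values chosen only for illustration.
z = standardize(20.5, 18, 2.5)
print("z =", round(z, 2))
print("area to the left of z =", round(phi(z), 4))
print("area to the left of the mean =", round(phi(0.0), 4))
```

Any normal variable can be handled this way: standardize first, then use a single standard normal area function.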
1|27 cblamsis
EDA (Eng’g Data Analysis)
Reference: Intro to Statistics by R. Walpole Lecture Notes 5
Areas Under the Standard Normal Curve
[Figure: standard normal curve with the tabulated Area shaded; horizontal axis marked 0 and Z]
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
- 3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
- 3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
- 3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
- 3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
- 3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
- 2.9 0.0019 0.0018 0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
- 2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
- 2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
- 2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
- 2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
- 2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
- 2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
- 2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
- 2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
- 2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
- 1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
- 1.8 0.0359 0.0352 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
- 1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
- 1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
- 1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
- 1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0722 0.0708 0.0694 0.0681
- 1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
- 1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
- 1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
- 1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
- 0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
- 0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
- 0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
- 0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
- 0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
- 0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
- 0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
- 0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
- 0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
- 0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9278 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
Example
Given the normally distributed variable X with mean 18 and standard deviation 2.5, find:
a. P(X < 15)
b. P(X > 17)
c. P(19 < X < 22)
d. The value of k such that P(X > k) = 0.1814
Solution:
Let X = continuous random variable
The symbol, <, means the required probability (area) is to the left of a specific value and
the symbol, >, means the required probability (area) is to the right of a specific value.
NOTE:
THE TABLE, AREAS UNDER THE STANDARD NORMAL CURVE, SHOWN
ABOVE ALWAYS GIVES THE AREA TO THE LEFT OF A SPECIFIC VALUE
OF Z.
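a. P(X < 15)

$$Z_1 = \frac{X - \mu}{\sigma} = \frac{15 - 18}{2.5} = -1.20$$

[Figure: normal curve with X = 15 to the left of μ = 18 (X value axis), corresponding to Z1 = −1.20 to the left of Z = 0 (Z value axis)]

From the table, the area (probability) to the left of Z = −1.20 is 0.1151:

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
- 1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985

Thus P(X < 15) = P(Z < −1.20) = 0.1151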
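b. P(X > 17)

$$Z_2 = \frac{17 - 18}{2.5} = -0.40$$

[Figure: normal curve with X = 17 to the left of μ = 18 (X value axis), corresponding to Z2 = −0.40 to the left of Z = 0 (Z value axis)]

P(X > 17) = P(Z > −0.40) = P(S) − P(Z < −0.40) = 1 − 0.3446 = 0.6554

where P(S) = 1 is the total area under the curve.

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
- 0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121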
c. P(19 < X < 22)

$$Z_3 = \frac{19 - 18}{2.5} = 0.40, \qquad Z_4 = \frac{22 - 18}{2.5} = 1.60$$

[Figure: normal curve with X = 19 and X = 22 to the right of μ = 18 (X value axis), corresponding to Z3 = 0.40 and Z4 = 1.60 to the right of 0 (Z value axis)]

P(19 < X < 22) = P(0.40 < Z < 1.60) = P(Z < 1.60) − P(Z < 0.40) = 0.9452 − 0.6554 = 0.2898

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
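d. The value of k such that P(X > k) = 0.1814

[Figure: normal curve with X = k to the right of μ = 18 (X value axis) and Z = z5 to the right of 0 (Z value axis); the area to the right of k is P(X > k)]

Since the table gives areas to the left, P(Z < z5) = 1 − 0.1814 = 0.8186, which corresponds to z5 = 0.91. From the transformation formula,

$$Z = \frac{X - \mu}{\sigma} \quad\Rightarrow\quad X = Z\sigma + \mu$$

k = (0.91)(2.5) + 18 = 20.275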
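The four parts of this example can be cross-checked with Python's standard library. This is a sketch added for illustration, not part of the original notes; it relies on `statistics.NormalDist`, whose `cdf` gives the area to the left of a value and whose `inv_cdf` reverses that lookup. The variable names are assumptions.

```python
from statistics import NormalDist

X = NormalDist(mu=18, sigma=2.5)

p_a = X.cdf(15)                # a. P(X < 15)
p_b = 1 - X.cdf(17)            # b. P(X > 17) = 1 - area to the left of 17
p_c = X.cdf(22) - X.cdf(19)    # c. P(19 < X < 22)
k = X.inv_cdf(1 - 0.1814)      # d. k such that P(X > k) = 0.1814

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(k, 3))
```

Because no z value is rounded to two decimals here, the results can differ from table-based answers in the last digit.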
Example
In a mathematics examination the average grade was 82 and the standard deviation was
5. All students with grades from 88 to 94 received a grade of “B”. If the grades are
approximately normally distributed and 8 students received a B grade, how many
students took the examination?
Solution:
Assume that the grades are rounded to the nearest whole number, thus
X1 = 88 – 0.5 = 87.5
X2 = 94 + 0.5 = 94.5
And

$$Z_1 = \frac{X_1 - \mu}{\sigma} = \frac{87.5 - 82}{5} = 1.10$$

$$Z_2 = \frac{X_2 - \mu}{\sigma} = \frac{94.5 - 82}{5} = 2.50$$

Also

$$P(X_1 < X < X_2) = P(Z_1 < Z < Z_2) = \frac{8}{N}$$

[Figure: normal curve with X1 = 87.5 and X2 = 94.5 to the right of μ = 82 (X value axis), corresponding to Z1 = 1.10 and Z2 = 2.50 to the right of 0 (Z value axis)]

$$P(Z_1 < Z < Z_2) = P(Z < Z_2) - P(Z < Z_1) = P(Z < 2.50) - P(Z < 1.10)$$

$$P(Z_1 < Z < Z_2) = 0.9938 - 0.8643 = 0.1295 = \frac{8}{N}$$

$$N = \frac{8}{0.1295} \approx 62 \text{ students}$$

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
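As a check on the arithmetic, the class size can be recomputed in a few lines of Python. This sketch is an addition to the notes; it assumes the same rounding convention (grades rounded to the nearest whole number, so the limits are widened by 0.5), and the variable names are illustrative.

```python
from statistics import NormalDist

grades = NormalDist(mu=82, sigma=5)

# Grades are whole numbers, so "88 to 94" covers 87.5 to 94.5.
x1, x2 = 88 - 0.5, 94 + 0.5
p_b = grades.cdf(x2) - grades.cdf(x1)   # proportion of students receiving a B

# 8 students received a B, so 8 / N = p_b.
n_students = 8 / p_b
print(round(p_b, 4), round(n_students))
```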
Example
The tensile strength of a certain metal component is normally distributed with mean of
10,000 kg / sq. cm and a standard deviation of 100 kg / sq. cm. Measurements are
recorded to the nearest 50 kg / sq. cm.
a. What proportion of these components exceed 10,150 kg / sq. cm in tensile
strength?
b. If specifications require that all components have tensile strength between
9,800 and 10,200 kg / sq. cm inclusive, what proportion of pieces would we
expect to scrap?
Solution:
a. What proportion of these components exceed 10,150 kg / sq. cm in tensile
strength?

Since measurements are recorded to the nearest 50 kg / sq. cm, a recorded value exceeding 10,150 corresponds to X > 10,150 + 25 = 10,175.

$$Z_1 = \frac{10175 - 10000}{100} = 1.75$$

[Figure: standard normal curve with Z1 = 1.75 to the right of 0 (Z value axis)]

P(X > 10175) = P(Z > 1.75) = P(S) − P(Z < 1.75) = 1 − 0.9599 = 0.0401

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

b. If specifications require that all components have tensile strength between
9,800 and 10,200 kg / sq. cm inclusive, what proportion of pieces would we
expect to scrap?

With the same rounding, recorded values from 9,800 to 10,200 inclusive correspond to 9775 < X < 10225, so X2 = 9775 and X3 = 10225.

$$Z_2 = \frac{X_2 - 10000}{100} = \frac{9775 - 10000}{100} = -2.25$$

$$Z_3 = \frac{X_3 - 10000}{100} = \frac{10225 - 10000}{100} = 2.25$$

[Figure: standard normal curve showing P(9775 < X < 10225) = P(−2.25 < Z < 2.25), with Z2 = −2.25 and Z3 = 2.25 on either side of 0 (Z value axis)]

P(X2 < X < X3) = P(Z2 < Z < Z3) = P(Z < Z3) − P(Z < Z2) = 0.9878 − 0.0122 = 0.9756

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
- 2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

Therefore, the proportion of pieces that would be scrapped = 1 – 0.9756 = 0.0244 = 2.44%
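The role of the recording precision is easier to see in code. The sketch below is an illustration added to the notes: `half_step` (half of the 50 kg / sq. cm recording unit) widens each limit before the normal areas are computed. All names are assumptions.

```python
from statistics import NormalDist

strength = NormalDist(mu=10_000, sigma=100)
half_step = 50 / 2   # measurements are recorded to the nearest 50

# a. Recorded values above 10,150 correspond to X > 10,175.
p_exceed = 1 - strength.cdf(10_150 + half_step)

# b. Recorded values 9,800..10,200 inclusive correspond to 9,775 < X < 10,225.
p_within = strength.cdf(10_200 + half_step) - strength.cdf(9_800 - half_step)
p_scrap = 1 - p_within

print(round(p_exceed, 4), round(p_scrap, 4))
```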
Example
A lawyer commutes daily from his suburban home to his midtown office. On the average
the trip one way takes 24 minutes, with a standard deviation of 3.8 minutes. Assume the
distribution of trips to be normally distributed.
a. What is the probability that a trip will take at least ½ hour?
b. If the office opens at 9:00 A.M. and he leaves his house at 8:45 A.M. daily, what
percentage of the time is he late for work?
c. If he leaves the house at 8:35 A.M and coffee is served at the office from 8:50
A.M until 9:00 A.M., what is the probability that he misses coffee?
d. Find the length of time above which we find the slowest 15% of the trips.
e. Find the probability that 2 of the next 3 trips will take at least ½ hour.
Solution:
a. What is the probability that a trip will take at least ½ hour?

$$Z_1 = \frac{30 - 24}{3.8} = 1.58$$

[Figure: normal curve with X1 = 30 to the right of μ = 24 (X value axis), corresponding to Z1 = 1.58 to the right of 0 (Z value axis)]

P(X > 30) = P(Z > Z1) = P(S) − P(Z < Z1) = 1.0 − P(Z < 1.58) = 1.0 − 0.9429 = 0.0571

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
b. If the office opens at 9:00 A.M. and he leaves his house at 8:45 A.M. daily, what
percentage of the time is he late for work?

He is late whenever the trip takes more than 15 minutes.

$$Z_2 = \frac{15 - 24}{3.8} = -2.37$$

[Figure: normal curve with X = 15 to the left of μ = 24 (X value axis)]

P(X > 15) = P(Z > −2.37) = P(S) − P(Z < −2.37) = 1.0 − 0.0089 = 0.9911

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
- 2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084

Therefore, he is late for work 99.11% of the time.
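c. If he leaves the house at 8:35 A.M and coffee is served at the office from 8:50
A.M until 9:00 A.M., what is the probability that he misses coffee?

He misses coffee whenever the trip takes more than 25 minutes (8:35 A.M. to 9:00 A.M.).

$$Z_3 = \frac{25 - 24}{3.8} = 0.26$$

[Figure: normal curve with X = 25 to the right of μ = 24 (X value axis), corresponding to Z3 = 0.26 to the right of 0 (Z value axis)]

P(X > 25) = P(Z > 0.26) = P(S) − P(Z < 0.26) = 1.0 − 0.6026 = 0.3974

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

Therefore, there is a 0.3974 (39.74%) chance that the lawyer will miss his hot coffee.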
d. Find the length of time above which we find the slowest 15% of the trips.
The slowest trips are those with the longest trip times, thus
P(X > k) = 15% = 0.1500, so that P(X < k) = 0.8500

[Figure: normal curve with X = k to the right of μ = 24 (X value axis), corresponding to Z4 to the right of 0 (Z value axis)]

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

The nearest Z-value leaving an area to the left equal to 0.8500 is 1.04 (1.0 + 0.04).
Therefore:

$$X = Z\sigma + \mu$$

k = (1.04)(3.8) + 24 = 27.952 minutes
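e. Find the probability that 2 of the next 3 trips will take at least ½ hour.

Each trip takes at least ½ hour with probability p = 0.0571, so the number of such trips among the next 3 is binomial:

$$b(2; 3, 0.0571) = \frac{3!}{2!\,1!}(0.0571)^{2}(0.9429)^{1} = 0.0092$$

The probability that exactly 2 of the lawyer’s next 3 trips will take at least ½ hour is 0.0092.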
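The five parts of the commuting example can be reproduced with a short Python sketch. It is an addition to the notes, with assumed variable names. It uses unrounded z values, so some results differ slightly from the table-based figures (for example, about 27.94 minutes instead of 27.952 in part d).

```python
from math import comb
from statistics import NormalDist

trip = NormalDist(mu=24, sigma=3.8)    # one-way trip time in minutes

p_half_hour = 1 - trip.cdf(30)         # a. trip takes at least 30 minutes
p_late = 1 - trip.cdf(15)              # b. leaves 8:45, office opens 9:00
p_no_coffee = 1 - trip.cdf(25)         # c. leaves 8:35, coffee ends 9:00
k_slowest = trip.inv_cdf(1 - 0.15)     # d. time exceeded by the slowest 15%

# e. Binomial probability that exactly 2 of the next 3 trips take >= 30 minutes.
p_two_of_three = comb(3, 2) * p_half_hour**2 * (1 - p_half_hour)

for name, value in [("a", p_half_hour), ("b", p_late), ("c", p_no_coffee),
                    ("d", k_slowest), ("e", p_two_of_three)]:
    print(name, round(value, 4))
```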
NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION
If X is a binomial random variable with mean μ = np and variance σ² = npq, then the
limiting form of the distribution of

$$Z = \frac{X - np}{\sqrt{npq}}$$

as n → ∞, is the standard normal distribution n(z; 0, 1).
Example
The probability that a patient recovers from a delicate heart operation is 0.90. Of the next
100 patients having this operation, what is the probability that
a. between 84 and 95 inclusive survive?
b. Fewer than 86 survive?
Solution:
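μ = np = (100)(0.90) = 90
σ = (100 × 0.90 × 0.10)^(1/2) = 3

a. Between 84 and 95 inclusive survive?

$$Z_1 = \frac{84 - 90}{3} = -2.0, \qquad Z_2 = \frac{95 - 90}{3} = 1.67$$

[Figure: normal curve with X1 = 84 and X2 = 95 on either side of μ = 90 (X value axis), corresponding to Z1 = −2.0 and Z2 = 1.67 on either side of 0 (Z value axis)]

P(84 < X < 95) = P(−2.0 < Z < 1.67) = P(Z < 1.67) − P(Z < −2.0)
P(84 < X < 95) = P(−2.0 < Z < 1.67) = 0.9525 − 0.0228 = 0.9297

The probability that 84 to 95 of the 100 patients will survive the
delicate heart operation is 92.97%.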
b. Fewer than 86 survive?

$$Z_1 = \frac{86 - 90}{3} = -1.33$$

[Figure: normal curve with X = 86 to the left of μ = 90 (X value axis), corresponding to Z1 = −1.33 to the left of Z = 0 (Z value axis)]

P(X < 86) = P(Z < −1.33) = 0.0918

The probability that fewer than 86 of the 100 patients will survive
the delicate heart operation is 9.18%.
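How good the approximation is can be judged by comparing it with the exact binomial sum. The Python sketch below is an addition to the notes. The `exact` helper and the continuity-corrected variant (limits widened by 0.5) are assumptions introduced for comparison only; the solution above standardizes the stated limits directly.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.90
mu = n * p
sigma = sqrt(n * p * (1 - p))
Z = NormalDist()   # standard normal

def exact(lo, hi):
    """Exact binomial probability P(lo <= X <= hi)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(lo, hi + 1))

# a. Between 84 and 95 inclusive.
approx_a = Z.cdf((95 - mu) / sigma) - Z.cdf((84 - mu) / sigma)
corrected_a = Z.cdf((95.5 - mu) / sigma) - Z.cdf((83.5 - mu) / sigma)
exact_a = exact(84, 95)

# b. Fewer than 86, standardizing 86 directly as in the solution.
approx_b = Z.cdf((86 - mu) / sigma)

print(round(approx_a, 4), round(corrected_a, 4), round(exact_a, 4))
print(round(approx_b, 4))
```

In this sketch the continuity-corrected value lands closer to the exact sum than the uncorrected one, which is why the correction is a common refinement for discrete counts.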
Example
If 20% of the residents in certain city prefer a white telephone over any other color
available, what is the probability that among the next 1000 telephones installed in this city
a. between 170 and 185 inclusive will be white?
b. At least 210 but not more than 225 will be white?
Solution:
μ = np = (1000)(0.20) = 200
σ = (1000 × 0.20 × 0.80)^(1/2) = √160 ≈ 12.65

a. What is the probability that among the next 1000 telephones installed in this city
between 170 and 185 inclusive will be white?

Solution: X1 = 170
X2 = 185

$$Z_1 = \frac{170 - 200}{12.65} = -2.37$$

$$Z_2 = \frac{185 - 200}{12.65} = -1.19$$

[Figure: Normal Curve with σ = 12.65; Z1 = −2.37 and Z2 = −1.19 lie to the left of Z = 0 (Z value axis)]

P(170 < X < 185) = P(−2.37 < Z < −1.19) = P(Z < −1.19) − P(Z < −2.37)
P(170 < X < 185) = P(−2.37 < Z < −1.19) = 0.1170 − 0.0089 = 0.1081

b. What is the probability that among the next 1000 telephones installed in this city at
least 210 but not more than 225 will be white?

Solution: X3 = 210
X4 = 225

$$Z_3 = \frac{210 - 200}{12.65} = 0.79$$

$$Z_4 = \frac{225 - 200}{12.65} = 1.98$$

[Figure: Normal Curve with σ = 12.65; Z3 = 0.79 and Z4 = 1.98 lie to the right of 0 (Z value axis)]

P(210 < X < 225) = P(0.79 < Z < 1.98) = P(Z < 1.98) − P(Z < 0.79)
P(210 < X < 225) = P(0.79 < Z < 1.98) = 0.9761 − 0.7852 = 0.1909
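The standard deviation and both probabilities can be recomputed directly from n and p. The following Python sketch is an illustration added to the notes, with assumed variable names. It evaluates σ = √(npq) and the normal areas between the stated limits, without rounding z and without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.20
mu = n * p                          # np
sigma = sqrt(n * p * (1 - p))       # sqrt(npq)
white = NormalDist(mu, sigma)

p_a = white.cdf(185) - white.cdf(170)   # a. between 170 and 185
p_b = white.cdf(225) - white.cdf(210)   # b. at least 210, not more than 225

print(round(sigma, 2), round(p_a, 4), round(p_b, 4))
```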
SAMPLING THEORY
(Reference: Introduction to Statistics by R. Walpole – Chapter 8)
Research Methodology
refers to the detailed description of procedure, instrument, and participants.
This includes the sampling procedure, that is, how participants are selected
in the study.
Target Population
it is the entire particular group of people a researcher identifies to study and
about which to draw conclusions.
Sample
it refers to that part of the population that is included in the study and where the
information in research comes from.
Sampling
refers to the process of selecting the participants from the target population to
be included in the study.
SAMPLING TECHNIQUES
A. Random Sampling – is the method of selecting a sample size (n) from the universe
(N) such that each member of the population has an equal chance of being included in
the sample and all possible combinations of size (n) have an equal chance of being
selected as the sample.
B. Systematic Sampling – when sample units are obtained by drawing every, say, 4th or
7th or 10th item on a list.
1. Stratified Sampling – in this method the population is first divided into groups –
based on homogeneity – in order to avoid the possibility of drawing samples whose
members come only from one stratum. The distribution of sampling units is
proportionate to the total number of units in each stratum. The larger the
stratum, the more sample units are drawn from it; the smaller the stratum, the
fewer sample units are drawn.
2. Cluster Sampling – the cluster sample is sometimes referred to as an area sample
because it is frequently applied on a geographical basis. On this basis, districts or
blocks of a municipality or city are selected. These districts or blocks constitute the
clusters. Cluster sampling is useful in selecting the sample when blocks in a
community or city are occupied by heterogeneous groups.
3. Multi-stage Sampling – this technique uses several stages or phases in getting the
sample from the general population. Multi-stage sampling is useful in conducting
nation-wide surveys or any survey involving a large universe.
C. Non-random Sampling – under this methodology, not all members of the population
are given equal chances to be chosen. Certain elements in the population are deliberately
left out in the choice of the sample for varied reasons.
1. Purposive Sampling – this is based on certain criteria laid down by the researcher.
People who satisfy the criteria are interviewed.
2. Quota Sampling – this is a relatively quick and inexpensive method to operate. Each
interviewer is given definite instructions about the section of the public he is to
question, but the final choice of the actual persons is left to his own convenience
or preference and is not predetermined by some carefully operated randomized
plan. Each interviewer then proceeds to fill the prescribed quota.
3. Convenience Sampling – for example, a researcher might want to find out the
popularity of a radio program. Since the researcher has a telephone, he might
simply use it and “randomly” pick his samples from the telephone directory. This
method, of course, is biased against non-telephone users.
4. Snowball Sampling – the selection of samples through referrals made by people
who possess characteristics that are of interest to the researcher.
SLOVIN’S FORMULA
$$n = \frac{N}{1 + Ne^{2}}$$

where n is the sample size, N is the population size, and e is the margin of error.
Example
Determine the reliable sample size from 10,000 students of UC if the margin of error is
2%. Freshmen are 3500; sophomores are 3000; juniors are 2000; and seniors are 1500.
Solution:
Using Slovin’s formula, the total number of students to be selected for a reliable sample size is

$$n = \frac{N}{1 + Ne^{2}} = \frac{10000}{1 + 10000(0.02)^{2}} = 2000 \text{ students}$$
Then determine the number of students to be selected from the different year levels by
stratified sampling:

$$\text{Freshmen: } n_1 = \frac{3500}{10000}(2000) = 700 \text{ freshman students}$$

$$\text{Sophomores: } n_2 = \frac{3000}{10000}(2000) = 600 \text{ sophomore students}$$

$$\text{Juniors: } n_3 = \frac{2000}{10000}(2000) = 400 \text{ junior students}$$

$$\text{Seniors: } n_4 = \frac{1500}{10000}(2000) = 300 \text{ senior students}$$

And lastly, select the students from the different year levels by systematic sampling,
possibly using the data from the MIS of the school.

Year level           Number of students to be selected
Freshmen             700
Sophomores           600
Juniors              400
Seniors              300
Total sample size    2000
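The two steps of this example, Slovin's formula followed by proportional allocation to each stratum, can be sketched in Python as follows. The function name `slovin` and the dictionary layout are assumptions made for illustration. Results are rounded to whole students with `round`, which is a choice of this sketch and not something the notes specify.

```python
def slovin(N, e):
    """Slovin's formula: n = N / (1 + N * e**2)."""
    return N / (1 + N * e**2)

strata = {"Freshmen": 3500, "Sophomores": 3000, "Juniors": 2000, "Seniors": 1500}
N = sum(strata.values())          # population size
n = round(slovin(N, 0.02))        # total sample size at a 2% margin of error

# Proportional allocation: each stratum gets its share of the total sample.
allocation = {level: round(size / N * n) for level, size in strata.items()}

print(n)
for level, count in allocation.items():
    print(level, count)
```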
SAMPLING DISTRIBUTIONS
The probability distribution of a statistic.
Central Limit Theorem (if the sample size is large and population standard
deviation is known)
If random samples of size n are drawn from a large or infinite population with
mean μ and variance σ², then the sampling distribution of the sample mean X̄
is approximately normally distributed with mean μ_X̄ = μ and standard deviation
σ_X̄ = σ/√n. Hence

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
Example
The random variable X, representing the number of cherries in a cherry puff, has the
following probability distribution
x 4 5 6 7
P(X = x) 0.20 0.40 0.30 0.10
Solution:
a. Find the mean μ and the variance σ² of X.
The mean and variance will be determined using the formulas for a
discrete random variable.

$$\mu = \sum_{i=1}^{n} x_i P(X = x_i) = 4(0.20) + 5(0.40) + 6(0.30) + 7(0.10) = 5.3$$

$$\sigma^{2} = \sum_{i=1}^{n} x_i^{2} P(X = x_i) - \mu^{2} = 28.9 - (5.3)^{2} = 0.81$$

b. Find the mean μ_X̄ and the variance σ²_X̄ of the mean X̄ for random samples
of 36 cherry puffs.
For n = 36

$$\mu_{\bar{X}} = \mu = 5.3$$

$$\sigma_{\bar{X}}^{2} = \frac{\sigma^{2}}{n} = \frac{0.81}{36} = 0.0225$$

c. Find the probability that the average number of cherries in the 36 cherry puffs
will be less than 5.5.
n = 36
X̄ = 5.5
P(X̄ < 5.5) = ?
And

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{5.5 - 5.3}{\sqrt{0.81} / \sqrt{36}} = 1.33$$

[Figure: Normal Curve with σ_X̄ = 0.15, μ_X̄ = 5.3 and X̄ = 5.5 (X̄ value axis); Standard Normal Curve with σ = 1, Z = 0 and Z = 1.33 (Z value axis)]

P(X̄ < 5.5) = P(Z < 1.33) = 0.9082
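The three parts of the cherry puff example chain together naturally in code. The Python sketch below is an addition to the notes, with assumed variable names. It computes the mean and variance of the discrete distribution, the variance of the sample mean, and the Central Limit Theorem probability without rounding z.

```python
from math import sqrt
from statistics import NormalDist

dist = {4: 0.20, 5: 0.40, 6: 0.30, 7: 0.10}   # x -> P(X = x)

# a. Mean and variance of the discrete random variable X.
mu = sum(x * p for x, p in dist.items())
var = sum(x**2 * p for x, p in dist.items()) - mu**2

# b. Mean and variance of the sample mean for n = 36.
n = 36
var_mean = var / n

# c. P(sample mean < 5.5) by the Central Limit Theorem.
z = (5.5 - mu) / sqrt(var_mean)
prob = NormalDist().cdf(z)

print(round(mu, 2), round(var, 2), round(var_mean, 4), round(z, 2), round(prob, 4))
```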
Example
The heights of 1000 students are approximately normally distributed with a mean of 174.5
cm and standard deviation of 6.9 cm. If 200 random samples of size 25 are drawn from
this population and the means recorded to the nearest tenth of a cm, determine
a. the mean and standard error of the sampling distribution of 𝑋̅;
b. the number of sample means that fall between 172.5 and 175.8 cm
inclusive;
c. the number of sample means falling below 172 cm.
Solution:
Let X = continuous random variable representing the heights of students
Given: μ = 174.5 cm
σ = 6.9 cm
N = 200 samples
n = 25
means are recorded to the nearest tenth of a centimeter.
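a. Determine the mean and standard error of the sampling distribution of X̄.

$$\mu_{\bar{X}} = \mu = 174.5 \text{ cm}$$

$$\sigma_{\bar{X}}^{2} = \frac{\sigma^{2}}{n} = \frac{(6.9)^{2}}{25} = 1.9044$$

$$\sigma_{\bar{X}} = \sqrt{\frac{\sigma^{2}}{n}} = \sqrt{\sigma_{\bar{X}}^{2}} = \sqrt{1.9044} = 1.38 \text{ cm}$$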
b. Determine the number of sample means that fall between 172.5 and 175.8 cm
inclusive.

Because the means are recorded to the nearest tenth of a centimeter, each limit is extended by half of 0.1 cm:

$$\bar{X}_1 = 172.5 - \frac{0.1}{2} = 172.45$$

$$\bar{X}_2 = 175.8 + \frac{0.1}{2} = 175.85$$

Let:
A = the event that the sample mean falls between 172.5 and 175.8 cm
inclusive (n_A samples).

$$P(A) = \frac{n_A}{N} = \frac{n_A}{200} = P(172.45 < \bar{X} < 175.85)$$

$$Z_1 = \frac{172.45 - 174.5}{1.38} = -1.49, \qquad Z_2 = \frac{175.85 - 174.5}{1.38} = 0.98$$

[Figure: Normal Curve with σ_X̄ = 1.38 showing P(172.5 < X̄ < 175.8), with X̄1 = 172.5 and X̄2 = 175.8 on either side of 174.5 (X̄ value axis), corresponding to Z1 = −1.49 and Z2 = 0.98 on either side of 0 (Z value axis)]

P(−1.49 < Z < 0.98) = P(Z < 0.98) − P(Z < −1.49) = 0.8365 − 0.0681 = 0.7684

Thus

$$0.7684 = \frac{n_A}{200}$$

n_A = (0.7684)(200) = 153.68 ≈ 154 sample means
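c. Determine the number of sample means falling below 172 cm.

Because the means are recorded to the nearest tenth of a centimeter, a recorded mean below 172 corresponds to X̄ < 172 − 0.1/2 = 171.95.

$$Z_1 = \frac{171.95 - 174.5}{1.38} = -1.85$$

[Figure: Normal Curve with σ_X̄ = 1.38 and X̄3 = 172 to the left of 174.5 (X̄ value axis), corresponding to Z1 = −1.85 to the left of Z = 0 (Z value axis)]

Let: B = event that the sample mean falls below 172 cm (n_B samples)

$$P(B) = \frac{n_B}{N} = \frac{n_B}{200} = P(\bar{X} < 171.95) = P(Z < -1.85) = 0.0322$$

$$\frac{n_B}{200} = 0.0322$$

n_B = (0.0322)(200) = 6.44 ≈ 6 sample means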
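The counts of sample means can be reproduced with the sketch below, which is an addition to the notes. To mirror a table lookup, the helper `z_score` rounds each z value to two decimals; that rounding, the helper itself, and the variable names are assumptions of this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, n_samples = 174.5, 6.9, 25, 200
se = sigma / sqrt(n)        # standard error of the sample mean
Z = NormalDist()            # standard normal

def z_score(xbar):
    """Standardize a sample mean, rounded to two decimals like a table lookup."""
    return round((xbar - mu) / se, 2)

half_unit = 0.1 / 2         # means are recorded to the nearest tenth of a cm

# b. Recorded means from 172.5 to 175.8 inclusive.
p_between = Z.cdf(z_score(175.8 + half_unit)) - Z.cdf(z_score(172.5 - half_unit))
# c. Recorded means below 172.
p_below = Z.cdf(z_score(172 - half_unit))

n_between = round(p_between * n_samples)
n_below = round(p_below * n_samples)
print(round(se, 2), n_between, n_below)
```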
t Distribution (if the sample size is small and population standard deviation is
unknown)
If X̄ and S² are the mean and variance, respectively, of a random sample of size n taken
from a population that is normally distributed with mean μ and variance σ², then

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

is a value of a random variable T having the t distribution with v = n − 1 degrees of
freedom.

[Figure: t curve with the area α shaded to the right of t_α; horizontal axis marked 0 and t_α]

v    α: 0.10 0.05 0.025 0.01 0.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
infinity 1.282 1.645 1.960 2.326 2.576
The table above gives, for each number of degrees of freedom v, the value of T that leaves an area (probability) α to its right.
Example
Given a random sample of size 24 from a normal distribution, find k such that
a. P(– 2.069 < T < k) = 0.965
b. P(k < T < 2.807) = 0.095
c. P(– k < T < k) = 0.90
Solution:
The degrees of freedom, 𝑣 = n – 1 = 24 – 1 = 23
a. P(– 2.069 < T < k) = 0.965

The area between T = – 2.069 and T = k is 0.965, thus T = k must be at the right
side of T = 0.

[Figure: t curve with area 0.965 between T = −2.069 and T = k, area β to the left of T = −2.069 and area α to the right of T = k]

Determine β from the t – table with degrees of freedom v = 23, disregarding the negative sign
of the T value (T = 2.069).

v    α: 0.10 0.05 0.025 0.01 0.005
23 1.319 1.714 2.069 2.500 2.807

Thus: β = 0.025

The area to the right of T = k is then α = 1 − 0.965 − 0.025 = 0.01. In the same row of the t – table, the value under α = 0.01 is 2.500.

Thus: k = 2.500
b. P(k < T < 2.807) = 0.095

The area between T = k and T = 2.807 is 0.095 (< 0.50), thus T = k must be at the
right side of T = 0.

[Figure: t curve with area 0.095 between T = k and T = 2.807, and area α to the right of T = 2.807]

Determine α, the area to the right of T = 2.807, from the t – table with v = 23.

v    α: 0.10 0.05 0.025 0.01 0.005
23 1.319 1.714 2.069 2.500 2.807

Thus: α = 0.005

The area to the right of T = k is then 0.095 + 0.005 = 0.10. In the same row of the t – table, the value under α = 0.10 is 1.319.

Thus: k = 1.319
c. P(– k < T < k) = 0.90

The area between T = – k and T = k is 0.90, thus T = – k must be at the left side
of T = 0 and T = k must be at the right side of T = 0. The area is symmetrical about T
= 0.

[Figure: t curve with area 0.90 between T = −k and T = k, and area α in each tail]

Determine α:

$$\alpha = \frac{1}{2}(1.0 - 0.90) = 0.05$$

Solve for k in the t – table with v = 23 and α = 0.05.

v    α: 0.10 0.05 0.025 0.01 0.005
23 1.319 1.714 2.069 2.500 2.807

Thus: k = 1.714
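The table lookups in this example follow one pattern: convert the given probability into a right-tail area, then read the matching entry in the row for v = 23. A minimal Python sketch of that pattern is given below. It is an addition to the notes; the names `T_ROW_23`, `tail_area` and `critical_value` are hypothetical, and the lookup is limited to the five tabulated areas.

```python
# Row v = 23 of the t table: right-tail area alpha -> critical value t_alpha.
T_ROW_23 = {0.10: 1.319, 0.05: 1.714, 0.025: 2.069, 0.01: 2.500, 0.005: 2.807}

def tail_area(t_value):
    """Right-tail area of a tabulated |t| value (exact table entries only)."""
    for alpha, t_alpha in T_ROW_23.items():
        if abs(t_alpha - abs(t_value)) < 1e-9:
            return alpha
    raise ValueError("value is not tabulated in this row")

def critical_value(alpha):
    """Tabulated t value whose right-tail area is closest to alpha."""
    nearest = min(T_ROW_23, key=lambda a: abs(a - alpha))
    return T_ROW_23[nearest]

k_a = critical_value(1 - 0.965 - tail_area(-2.069))   # a. area right of k
k_b = critical_value(0.095 + tail_area(2.807))         # b. area right of k
k_c = critical_value((1 - 0.90) / 2)                   # c. equal tails
print(k_a, k_b, k_c)
```

The nearest-key lookup avoids floating-point mismatches such as 1 − 0.965 − 0.025 not being exactly 0.01.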
Example
A manufacturing firm claims that the batteries used in their electronic games will last an
average of 30 hours. To maintain this average, 16 batteries are tested each month. If the
computed t value falls between – t0.025 and t0.025, the firm is satisfied with its claim. What
conclusion should the firm draw from a sample that has a mean 𝑋̅ = 27.5 hours and a
standard deviation s = 5 hours? Assume the distribution of battery lives to be
approximately normal.
Solution:
n = 16
𝑋̅ = 27.5 ℎ𝑟𝑠
s = 5 hrs.
μ = 30 hrs.
Determine t-value

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{27.5 - 30}{5 / \sqrt{16}} = -2.0$$

[Figure: t curve with the acceptance region between T = −t0.025, 15 and T = t0.025, 15, and area α = 0.025 in each tail]

v    α: 0.10 0.05 0.025 0.01 0.005
15 1.341 1.753 2.131 2.602 2.947
Conclusion:
Since t = – 2.0 is greater than T = – 2.131 and less than T = 2.131, t = – 2.0 is
in the acceptance region; thus, the manufacturer’s claim is valid.

[Figure: t curve with the acceptance region between T = −2.131 and T = 2.131, area α = 0.025 in each tail, and T = −2.0 inside the acceptance region]
Example
A cigarette manufacturer claims that his cigarettes have an average nicotine content of
1.83 milligrams. A random sample of 8 cigarettes of this type has nicotine contents of
2.0, 1.7, 2.1, 1.9, 2.2, 2.1, 2.0, and 1.6 milligrams. What is the T-value?
Solution:
n=8
μ = 1.83
Nicotine content, Xi (Xi)2
2.0 4.00
1.7 2.89
2.1 4.41
1.9 3.61
2.2 4.84
2.1 4.41
2.0 4.00
1.6 2.56
∑ 𝑋𝑖 = 15.60 ∑(𝑋𝑖 )2 = 30.72
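$$\bar{X} = \frac{\sum X_i}{n} = \frac{15.60}{8} = 1.95 \text{ milligrams}$$

$$s = \left[\frac{n\sum X_i^{2} - \left(\sum X_i\right)^{2}}{n(n-1)}\right]^{1/2} = \left[\frac{8(30.72) - (15.6)^{2}}{8(8-1)}\right]^{1/2} = 0.207 \text{ milligrams}$$

Determine T- value

$$T = t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{1.95 - 1.83}{0.207 / \sqrt{8}} = 1.640$$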
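The sample mean, sample standard deviation and t value can be recomputed from the raw data. The Python sketch below is an addition to the notes, with assumed variable names. It uses `statistics.stdev`, which divides by n − 1 and is therefore equivalent to the computational formula for s used above.

```python
from math import sqrt
from statistics import mean, stdev

nicotine = [2.0, 1.7, 2.1, 1.9, 2.2, 2.1, 2.0, 1.6]   # milligrams
mu_claimed = 1.83

n = len(nicotine)
xbar = mean(nicotine)     # sample mean
s = stdev(nicotine)       # sample standard deviation (n - 1 in the denominator)
t = (xbar - mu_claimed) / (s / sqrt(n))

print(round(xbar, 2), round(s, 3), round(t, 2))
```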