Summary of Statistics
Statistics is a group of methods used to collect, analyse, present, and interpret data and to
make decisions.
In more detail, statistics is the practice or science of collecting and analysing numerical data
in large quantities, especially for the purpose of inferring proportions in a whole from those
in a representative sample.
1. Half of the students taking a test score less than the average mark.
3. In a large population of animals, about half of the adult animals are heavier than the
average adult weight.
4. Suppose that in a game you can only score an even number of points: 0, 2, 10, 50. So,
the average score over a series of games is an even number.
7. The chance of observing an outcome more than three standard deviations from the
mean is less than 1 in 100.
8. I repeat an experiment with a random numerical outcome many times. Eventually the
average of my outcomes will be within 1% of the theoretical average outcome.
9. The chance of observing an outcome more than ten standard deviations from the mean
is not more than 1%.
10. If two statistical processes are uncorrelated then they must be independent.
Definition of Population
In simple terms, a population is the aggregate of all elements under study that share one or
more common characteristics; for example, all people living in India constitute a
population. A population is not confined to people; it may also consist of animals,
events, objects, buildings, etc. It can be of any size, and the number of elements or members
in a population is known as the population size, i.e. if there are one hundred million people in India,
then the population size (N) is 100 million.
Examples
Definition of Sample
By the term sample, we mean a part of the population chosen at random for participation in the
study. The sample should represent the population in all its characteristics and should be
free from bias, so that it forms a miniature cross-section of the population, because
the sample observations are used to make generalisations about the population.
In other words, the respondents selected out of the population constitute a 'sample', and the
process of selecting respondents is known as 'sampling'. The units under study are called
sampling units, and the number of units in a sample is called the sample size.
Key Differences Between Population and Sample
The difference between population and sample can be drawn clearly on the following
grounds:
2. The population consists of each and every element of the entire group. On the other
hand, only a handful of items of the population are included in a sample.
3. A characteristic of the population based on all units is called a parameter, while a
measure computed from the sample observations is called a statistic.
4. When information is collected from all units of the population, the process is known as
a census or complete enumeration. Conversely, a sample survey is conducted to
gather information from the sample using a sampling method.
5. With a population, the focus is on identifying the characteristics of the elements, whereas
with a sample the focus is on making generalisations about the
characteristics of the population from which the sample came.
The term used in statistical sampling to describe the greater whole is population. A
population could be a group of people or any group of objects you are studying (e.g., rocks
containing gold, dog biscuits made by x-brand, or all left-handed one-toothed people in the
world). Studying populations can be complicated, expensive, and time consuming so
researchers have developed several different ways to sample whatever it is they are studying.
Broadly, these sampling techniques are either probability-based (random sampling) or non-
probability-based (non-random sampling).
Before deciding which sampling method to use, researchers must first decide what their
target population will be. From that they develop a sampling frame, or list, of the set
of people from whom data could be collected. This can be a difficult task at times. They
then apply one of the following methods to determine what, or who,
will be in the sample.
5. For each research objective below, identify the population and sample in
the study.
a) The government research agency contacts 2020 residents of Malaysia aged 18 or older and
asks whether the National Service is good to be implemented. (Sample)
b) A worker selected 20 cans of soft drinks at random for inspection. (Sample)
c) Steven Co. studies the prices of 40 Statistics books in Malaysia in order to set a price for
its new Statistics book. (Sample)
d) A bus driver selects 15 passengers at a bus station to see the colour of their clothes. (Sample)
(1)
The last four semesters an instructor taught Intermediate Algebra; the following numbers of
people passed the class.
SEM I-17, SEM II –19, SEM III-4, SEM IV -20
Which of the following conclusions can be obtained from purely descriptive measures and
which can be obtained by inferential methods?
Answer:
a) Descriptive- The last four semesters the instructor taught Intermediate Algebra; an
average of 15 people passed the class.
b) Inferential- The next time the instructor teaches Intermediate Algebra, we can expect
approximately 15 people to pass the class.
c) Inferential- This instructor will never pass more than 20 people in an Intermediate
Algebra class.
e) Inferential- Only 5 people passed one semester because the instructor was in a bad
mood the entire semester.
f) Inferential-The instructor passed 20 people the last time he taught the class to keep
the administration off of his back for poor results.
g) Inferential- The instructor passes so few people in his Intermediate Algebra classes
because he doesn't like teaching that class.
(2)
During the last week, Tony Gwynn of the San Diego Padres recorded the following number
of hits.
Sun –2
Mon –1
Tues –4
Wed –3
Thurs –0
Fri –3
Sat -1
Which of the following conclusions can be obtained from purely descriptive methods and
which can be obtained by inferential methods?
Answer:
(a) Inferential- Tony will never have more than 4 hits in a game.
(b) Inferential-Tony had 0 hits on Thursday because he used a bat that belonged to
another player.
(c) Descriptive- During the last week, Tony averaged 2 hits per game.
(e) Descriptive-Tony had the same total number of hits in the first 3 games as he did in
the last 4 games.
Qualitative. Qualitative variables take on values that are names or labels. The colour
of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier)
would be examples of qualitative or categorical variables.
Quantitative. Quantitative variables are numeric. They represent a measurable
quantity. For example, when we speak of the population of a city, we are talking
about the number of people in the city - a measurable attribute of the city. Therefore,
population would be a quantitative variable.
e. Salaries of employees.(Quantitative)
f. Religious affiliation.(Qualitative)
A variable that can assume any numerical value over a certain interval or intervals is called a
continuous variable.
Inclusive:
For this type of data, the lower boundary = lower limit - 0.5, and
upper boundary = upper limit + 0.5
Exclusive:
What is frequency distribution?
A frequency distribution lists each category of data and the number of occurrences for each
category of data.
$$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}} = \frac{f}{\sum f}$$
Percentage (%) = (Relative frequency) × 100
Example:
Answer:
Bar graph:
A graph made of bars whose heights represent the frequencies of respective categories is
called a bar graph.
x–axis : categories
y–axis : frequencies (or relative frequencies or percentages)
Example
Question on bar chart/graph:
The following data are the favourite national car models of 50 MMU students.
The question may give the data in the pattern below:
Car Model    Number of cars (f)
Wira         14
Iswara       6
Kancil       17
Kenari       7
Saga         6
Sum          50

Car Model    Number of cars (f)    Relative frequency    Percentage (%)
Wira         14                    0.28                  28
Iswara       6                     0.12                  12
Kancil       17                    0.34                  34
Kenari       7                     0.14                  14
Saga         6                     0.12                  12
Sum          50                    1                     100
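The relative frequency and percentage columns can be reproduced with a few lines of Python. This is only a minimal sketch: the dictionary name `freq` and the output layout are illustrative choices, and the frequencies are copied from the car-model data.

```python
# Frequencies copied from the car-model data.
freq = {"Wira": 14, "Iswara": 6, "Kancil": 17, "Kenari": 7, "Saga": 6}

total = sum(freq.values())  # sum of all frequencies

# Map each category to (relative frequency, percentage).
table = {model: (f / total, 100 * f / total) for model, f in freq.items()}

for model, (rf, pct) in table.items():
    print(f"{model:<8}{freq[model]:>4}{rf:>8.2f}{pct:>6.0f}")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any hand-built table.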
c)
Bar chart/graph: relative frequency (y-axis, 0 to 0.4) for each car model (x-axis: Wira, Iswara, Kancil, Kenari, Saga).
Pie Chart:
A pie chart is a circle divided into sectors. Each sector represents a category of data. The
area of each sector is proportional to the frequency of the category.
Pie chart Angle: (Relative Frequency) ×360˚
Answer:
(a & b)
Types of tastes    No. of respondents    RF          Percentage    Angle
S                  5                     0.416667    41.66667      0.416667 × 360˚ = 150˚
M                  4                     0.333333    33.33333      0.333333 × 360˚ = 120˚
I                  3                     0.25        25            0.25 × 360˚ = 90˚
                                         Sum = 1     Sum = 100     Sum = 360˚
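As a sketch of the angle calculation, the snippet below applies Angle = (Relative frequency) × 360˚ to the three taste categories. The variable names are illustrative, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

# Number of respondents per type of taste, copied from the table.
respondents = {"S": 5, "M": 4, "I": 3}
total = sum(respondents.values())

# Angle of each sector = relative frequency x 360 degrees.
angles = {taste: Fraction(count, total) * 360 for taste, count in respondents.items()}

for taste, angle in angles.items():
    print(taste, float(angle))
```

The angles must add up to 360˚, just as the relative frequencies add up to 1.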
c)
Pie chart: S 42% (150˚), M 33% (120˚), I 25% (90˚).
Inclusive:
For this type of data, the lower boundary = lower limit - 0.5, and
upper boundary = upper limit + 0.5
Exclusive:
If a question gives ungrouped data and asks you to
construct an inclusive/exclusive frequency distribution table:
For example:
Example:
Polygon:
A graph formed by joining the midpoints of the tops of successive bars in a histogram with
straight lines is called a polygon.
Class Limit    Frequency    RF       %
11-16          8            0.2      20
17-22          13           0.325    32.5
23-28          8            0.2      20
29-34          6            0.15     15
35-40          3            0.075    7.5
41-46          1            0.025    2.5
47-52          1            0.025    2.5
Total          40           1        100
b)
Class Limit    Frequency    RF       %       Cumulative frequency    Cumulative relative frequency    Cumulative percentage
11-16          8            0.2      20      8                       0.2                              20
17-22          13           0.325    32.5    21                      0.525                            52.5
23-28          8            0.2      20      29                      0.725                            72.5
29-34          6            0.15     15      35                      0.875                            87.5
35-40          3            0.075    7.5     38                      0.95                             95
41-46          1            0.025    2.5     39                      0.975                            97.5
47-52          1            0.025    2.5     40                      1                                100
Total          40           1        100
c)
Histogram & Polygon (x-axis: class boundaries 10.5, 16.5, 22.5, 28.5, 34.5, 40.5, 46.5, 52.5; y-axis: relative frequency, 0 to 0.35)
d)
Mode on Histogram (x-axis: class boundaries 10.5, 16.5, 22.5, 28.5, 34.5, 40.5, 46.5, 52.5; y-axis: relative frequency, 0 to 0.35)
More than type Cumulative Frequency Curve (Here we use the lower limit of the classes
to plot the curve)
Less than type Cumulative Frequency Curve (Here we use the upper limit of the classes
to plot the curve)
Answer:
a)
Electric bills (inclusive class interval)    No. of families (f)
21-30                                        3
31-40                                        5
41-50                                        8
51-60                                        5
61-70                                        5
71-80                                        4
Total                                        30
b)
Electric bills (inclusive class interval)    No. of families (f)    Cumulative frequency (CF)
X<20.5                                       0                      0
20.5-30.5                                    3                      3
30.5-40.5                                    5                      8
40.5-50.5                                    8                      16
50.5-60.5                                    5                      21
60.5-70.5                                    5                      26
70.5-80.5                                    4                      30
Total                                        30
Cumulative frequency curve (x-axis: upper boundaries; y-axis: cumulative frequency, 0 to 35)
c)
Cumulative frequency curve (x-axis: upper boundaries 10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5; y-axis: cumulative frequency, 0 to 35)
a) From the graph, 27 families have an electricity bill below RM 61. The
total number of families is 30, so the number of families whose electricity bill is
RM 61 or more is 30 - 27 = 3 (10%).
b) From the graph, 15 families have an electricity bill of RM 41 or less (50%).
d. Median, Position $= \frac{n}{2} = \frac{30}{2} = 15$
$$\text{Median} = 40.5 + \frac{15 - 8}{8} \times 10 = 40.5 + 8.75 = 49.25$$
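The interpolation used for the median can be written as a small function. This is a sketch under the assumption that all classes have the same width; the function name `grouped_median` and its argument names are illustrative and do not come from the notes.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median of grouped data by linear interpolation inside the median class."""
    half = sum(freqs) / 2          # position n/2
    cum = 0                        # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cum + f >= half:
            # L + (n/2 - cumulative frequency before the class) / f * C
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")


# Electric bill data: class boundaries 20.5-30.5, ..., 70.5-80.5.
median_bill = grouped_median([20.5, 30.5, 40.5, 50.5, 60.5, 70.5],
                             [3, 5, 8, 5, 5, 4], 10)
print(median_bill)  # 49.25
```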
e. First quartile $= \frac{n+1}{4} = \frac{104}{4} = 26$
Third quartile $= \frac{3(n+1)}{4} = \frac{315}{4} = 78.75$
Quartiles:
If we divide a cumulative frequency curve into quarters, the value at the lower quarter is
referred to as the lower quartile, the value at the middle gives the median and the value at the
upper quarter is the upper quartile.
A set of numbers may be as follows: 8, 14, 15, 16, 17, 18, 19, 50. The mean of these numbers
is 19.625 . However, the extremes in this set (8 and 50) distort this value. The interquartile
range is a method of measuring the spread of the middle 50% of the values and is useful since
it ignores the extreme values.
The lower quartile is the (n+1)/4 th value (n is the cumulative frequency, i.e. 157 in this case) and
the upper quartile is the 3(n+1)/4 th value. The difference between these two is the
interquartile range (IQR).
In the above example, the upper quartile is the 118.5th value and the lower quartile is the
39.5th value. If we draw a cumulative frequency curve, we see that the lower quartile,
therefore, is about 17 and the upper quartile is about 37. Therefore, the IQR is 20 (bear in
mind that this is a rough sketch- if you plot the values on graph paper you will get a more
accurate value).
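To see how the interquartile range ignores extreme values, the sketch below applies the (n+1)/4 and 3(n+1)/4 position rule to the eight numbers 8, 14, 15, 16, 17, 18, 19, 50 quoted earlier. It relies on Python's `statistics.quantiles` with `method="exclusive"`, which interpolates at those positions; the variable names are illustrative.

```python
import statistics

data = [8, 14, 15, 16, 17, 18, 19, 50]

mean = statistics.mean(data)  # pulled upwards by the extreme value 50

# method="exclusive" places the cut points at the (n+1)/4, 2(n+1)/4 and 3(n+1)/4 positions.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(mean, q1, q2, q3, iqr)  # 19.625 14.25 16.5 18.75 4.5
```

The IQR of 4.5 describes the middle half of the data and is unaffected by how large the value 50 is.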
Stem & Leaf:
In a stem-and-leaf display of quantitative data, each value is divided into two portions: a
stem and a leaf. The leaves for each stem are shown separately in a display. It is used in data
analysis when the data set is small.
Answer:
50, 52, 57, 61, 64, 65, 68, 69, 71, 71, 72, 72, 75, 76, 77, 78, 79, 79, 80, 81, 83, 84, 86, 87, 87,
92, 92, 93, 95, 96, 98
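A stem-and-leaf display for the values listed above can be built as follows. This is a minimal sketch that assumes two-digit values, with the tens digit as the stem and the units digit as the leaf; the names `values` and `stems` are illustrative.

```python
from collections import defaultdict

values = [50, 52, 57, 61, 64, 65, 68, 69, 71, 71, 72, 72, 75, 76, 77, 78,
          79, 79, 80, 81, 83, 84, 86, 87, 87, 92, 92, 93, 95, 96, 98]

stems = defaultdict(list)
for v in sorted(values):
    stems[v // 10].append(v % 10)  # tens digit is the stem, units digit is the leaf

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```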
Chapter 3, 4 & 5
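1. Mean (Ungrouped Data)
Population: $\mu = \frac{\sum x}{N}$
Sample: $\bar{x} = \frac{\sum x}{n}$
Mean (Grouped Data) (Continuous & Discrete)
Population: $\mu = \frac{\sum fx}{N}$ or $\mu = \frac{\sum fx}{\sum f}$
Sample: $\bar{x} = \frac{\sum fx}{n}$ or $\bar{x} = \frac{\sum fx}{\sum f}$
2. Median (Ungrouped and Discrete)
Median (Continuous)
$$\tilde{X} = L + \frac{\frac{n}{2} - \sum f_{m-1}}{f_m} \times C$$
Where L is the lower boundary of the class containing the median,
$\sum f_{m-1}$ is the cumulative frequency before the median class,
$f_m$ is the frequency of the median class,
C is the interval of the median class
3. Mode (Ungrouped and Discrete)
Take the value with the highest frequency
Mode (Continuous)
$$\hat{X} = L + \frac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)} \times C$$
Where L is the lower boundary of the class containing the mode,
$f_0$ is the frequency of the modal class,
$f_1$ is the frequency of the class before the modal class,
$f_2$ is the frequency of the class after the modal class,
C is the interval of the modal class
Variance and Standard Deviation (Ungrouped data)
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample standard deviation: $S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Variance and Standard Deviation (Grouped data)
Population variance: $\sigma^2 = \frac{\sum f(x - \mu)^2}{N}$
Sample standard deviation: $S = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$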
5. Dispersion:
Ungrouped Data
1. 23, 45, 32, 14, 56, 45, 30, 47
a) Mean
b) Median
c) Mode
d) Variance & Standard Deviation
Answer:
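a) Mean,
$$\bar{x} = \frac{\sum x}{n} = \frac{23 + 45 + 32 + 14 + 56 + 45 + 30 + 47}{8} = \frac{292}{8} = 36.5$$
b) Median: first arrange the data in ascending order: 14, 23, 30, 32, 45, 45, 47, 56.
Position of the median $= \frac{n+1}{2}$th $= \frac{8+1}{2}$th $= 4.5$th value,
so the median is the average of the 4th and 5th ordered values:
$$\tilde{x} = \frac{32 + 45}{2} = 38.5$$
c) Mode:
In this ungrouped data the value 45 occurs twice and every other value occurs once, so
the mode is 45.
d)
x     x̄      (x - x̄)    (x - x̄)²
23    36.5    -13.5       182.25
45    36.5    8.5         72.25
32    36.5    -4.5        20.25
14    36.5    -22.5       506.25
56    36.5    19.5        380.25
45    36.5    8.5         72.25
30    36.5    -6.5        42.25
47    36.5    10.5        110.25
                          $\sum (x - \bar{x})^2 = 1386$
Variance: $S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{1386}{8 - 1} = 198$
Standard deviation: $S = \sqrt{198} \approx 14.07$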
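As a cross-check, the four measures for the data set 23, 45, 32, 14, 56, 45, 30, 47 can be recomputed with Python's standard `statistics` module. This is a sketch; it assumes the sample formulas (division by n - 1) are the ones wanted, which is what `variance` and `stdev` implement.

```python
import statistics

data = [23, 45, 32, 14, 56, 45, 30, 47]

mean = statistics.mean(data)          # 36.5
median = statistics.median(data)      # sorts the data first, then averages the two middle values
modes = statistics.multimode(data)    # every value that shares the highest frequency
variance = statistics.variance(data)  # sample variance: sum of squared deviations / (n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

print(mean, median, modes, variance, round(std_dev, 2))
```

Note that the median is taken from the sorted list, not from the order in which the values were recorded.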
Grouped Data (Discrete)
The local Dress Rack store conducted an inventory of their sales to determine which
sizes to order for the fall season. The following data represent the number of dresses sold
this month by size
Size    No. of dresses sold
4       8
6       23
8       12
10      35
12      3
Calculate :
a) mean
b) median
c) mode
d) Standard deviation
Answer:
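a) Mean:
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{652}{81} \approx 8.05$$
d) Standard Deviation
f     Size (x)    x̄      (x - x̄)    (x - x̄)²    f(x - x̄)²
8     4           8.05    -4.05       16.40        131.2
23    6           8.05    -2.05       4.20         96.6
12    8           8.05    -0.05       0.0025       0.03
35    10          8.05    1.95        3.80         133
3     12          8.05    3.95        15.60        46.8
$\sum f = 81$                                      $\sum f(x - \bar{x})^2 = 407.63$
$$S = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{407.63}{81 - 1}} = \sqrt{5.10} \approx 2.26$$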
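The four requested measures for the dress data can be checked with a short script. This is a sketch, not the notes' own working: it treats the sizes as discrete values weighted by the number sold, uses n = 81 (the sum of the frequencies), and divides by n - 1 for the standard deviation.

```python
import math

sizes = [4, 6, 8, 10, 12]
sold = [8, 23, 12, 35, 3]

n = sum(sold)                                        # 81 dresses in total
mean = sum(f * x for x, f in zip(sizes, sold)) / n   # sum(fx) / sum(f)

# Write every sale out individually to locate the middle value.
expanded = sorted(x for x, f in zip(sizes, sold) for _ in range(f))
median = expanded[(n + 1) // 2 - 1]                  # n is odd: the (n+1)/2-th value

mode = sizes[sold.index(max(sold))]                  # size with the highest frequency

ss = sum(f * (x - mean) ** 2 for x, f in zip(sizes, sold))
std_dev = math.sqrt(ss / (n - 1))

print(round(mean, 2), median, mode, round(std_dev, 2))  # 8.05 8 10 2.26
```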
a) Mean
b) Median
c) Mode
d) Variance & Standard deviation
Answer:
a) Mean
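Speed (km/hour)    No. of vehicles (f)    Midpoint (x)    fx      Cf
40-44              8                      42              336     8
45-49              18                     47              846     26
50-54              16                     52              832     42
55-59              26                     57              1482    68
60-64              22                     62              1364    90
65-69              10                     67              670     100
                   $\sum f = 100$                         $\sum fx = 5530$
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{5530}{100} = 55.30$$
b) Median:
Position: $\frac{n}{2} = 50$, so the median class is 55-59.
L = 54.5
$\sum f_{m-1} = 42$ (cumulative frequency before the median class)
$f_m = 26$, C = 5
$$\tilde{X} = L + \frac{\frac{n}{2} - \sum f_{m-1}}{f_m} \times C = 54.5 + \frac{50 - 42}{26} \times 5 = 56.04$$
c) Mode:
The modal class is 55-59 (highest frequency, 26).
L = 54.5
$f_0 - f_1 = 26 - 16 = 10$; $f_0 - f_2 = 26 - 22 = 4$
$$\hat{X} = L + \frac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)} \times C = 54.5 + \frac{10}{10 + 4} \times 5 = 58.07$$
d) Variance & Standard deviation:
$\sum f(x - \bar{x})^2 = 8(42 - 55.3)^2 + 18(47 - 55.3)^2 + 16(52 - 55.3)^2 + 26(57 - 55.3)^2 + 22(62 - 55.3)^2 + 10(67 - 55.3)^2 = 5261$
$$S^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{5261}{100 - 1} \approx 53.14$$
$$S = \sqrt{53.14} \approx 7.29$$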
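The mode formula for continuous grouped data can be sketched as a function. The name `grouped_mode` is illustrative, and the sketch assumes equal class widths and a single modal class whose frequency is strictly greater than at least one neighbour (otherwise the denominator would be zero).

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of grouped data: L + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * C."""
    i = freqs.index(max(freqs))                      # index of the modal class
    f0 = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # class after the modal class
    return lower_boundaries[i] + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * width


# Speed data: classes 40-44, ..., 65-69 have lower boundaries 39.5, ..., 64.5.
speed_mode = grouped_mode([39.5, 44.5, 49.5, 54.5, 59.5, 64.5],
                          [8, 18, 16, 26, 22, 10], 5)
print(round(speed_mode, 2))  # 58.07
```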
Find out missing frequency
Table shows the age of 40 tourists who visited a tourist spot.
Age Frequency
10-19 4
20-29 m
30-39 n
40-49 10
50-59 8
Given that the median age is 35.5, find the value of m and of n.
Answer:
Age Frequency Cumulative frequency
10-19 4 4
20-29 m 4+m
30-39 n 4+m+n
40-49 10 14+m+n
50-59 8 22+m+n
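Here the total number of tourists is 40, so the last cumulative frequency (22+m+n) must be equal to
40.
That means, 22+m+n=40
n= 40-22-m
n=18-m ……….(1)
According to the question the median age is 35.5, which lies in the class 30-39, so this is the median
class. Its lower boundary is L = 29.5, its frequency is $f_m = n$, the cumulative frequency before it is
$\sum f_{m-1} = 4 + m$ and the class interval is C = 10. So the median will be,
$$\text{Median} = L + \left( \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right) \times C$$
$$35.5 = 29.5 + \frac{\frac{40}{2} - (4 + m)}{n} \times 10$$
$$35.5 - 29.5 = \frac{20 - 4 - m}{n} \times 10$$
$$6 = \frac{16 - m}{n} \times 10$$
6n = 10(16-m)
6n = 160-10m
3n = 80 - 5m ……………..(2)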
Now substitute (1) into (2) to find the value of m.
3(18-m) = 80-5m
54-3m = 80-5m
2m = 26
m = 13
Putting m = 13 into (1),
n = 18-m
n = 18-13
n = 5
So, m=13, n=5.
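The result can be verified by brute force: try every whole-number value of m, set n = 18 - m, and keep the pairs that give a median of 35.5. This is only a checking sketch; it assumes, as the worked answer does, that the median class is 30-39 (lower boundary 29.5, class interval 10).

```python
solutions = []
for m in range(0, 19):
    n = 18 - m                     # from 22 + m + n = 40
    if n == 0:
        continue                   # an empty class cannot contain the median
    cum_before = 4 + m             # cumulative frequency before class 30-39
    if cum_before > 20:
        continue                   # the 20th value would fall before class 30-39
    median = 29.5 + (20 - cum_before) / n * 10
    if abs(median - 35.5) < 1e-9:
        solutions.append((m, n))

print(solutions)  # [(13, 5)]
```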
Shapes of distribution:
Pearson’s Coefficient
The weekly income of all part time employees of a fast-food restaurant chain was organized into the
following frequency distribution.
Answer:
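Weekly income    f     x      fx      Cf    x̄     (x - x̄)    (x - x̄)²    f(x - x̄)²
100-150          5     125    625     5     240    -115        13225        66125
150-200          9     175    1575    14    240    -65         4225         38025
200-250          20    225    4500    34    240    -15         225          4500
250-300          18    275    4950    52    240    35          1225         22050
300-350          5     325    1625    57    240    85          7225         36125
350-400          3     375    1125    60    240    135         18225        54675
$\sum f = 60$    $\sum fx = 14400$    $\sum f(x - \bar{x})^2 = 221500$
Mean,
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{14400}{60} = 240$$
Median,
Position: $\frac{n}{2} = 30$, so the median class is 200-250.
L = 200
$\sum f_{m-1} = 14$
$f_m = 20$, C = 50
$$\tilde{X} = L + \frac{\frac{n}{2} - \sum f_{m-1}}{f_m} \times C = 200 + \frac{30 - 14}{20} \times 50 = 200 + 40 = 240$$
Standard deviation,
$$S = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{221500}{60 - 1}} = \sqrt{3754.24} \approx 61.27$$
Pearson coefficient,
$$r = \frac{3(\bar{X} - \tilde{X})}{S} = \frac{3(240 - 240)}{61.27} = 0$$
r = 0, so the distribution is symmetric.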
Greater Dispersion/CV
Answer:
Series A
Weekly income    f     x     fx      x̄       (x - x̄)    (x - x̄)²    f(x - x̄)²
10-20            10    15    150     42.86    -27.86      776.18       7761.8
20-30            16    25    400     42.86    -17.86      318.98       5103.68
30-40            30    35    1050    42.86    -7.86       61.78        1853.4
40-50            40    45    1800    42.86    2.14        4.58         183.2
50-60            26    55    1430    42.86    12.14       147.38       3831.88
60-70            18    65    1170    42.86    22.14       490.18       8823.24
$\sum f = 140$   $\sum fx = 6000$    $\sum f(x - \bar{x})^2 = 27557.2$
Mean,
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{6000}{140} = 42.86$$
Standard Deviation:
$$S = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{27557.2}{140 - 1}} = \sqrt{198} \approx 14$$
$$CV = \left( \frac{S}{\bar{X}} \right) \times 100\% = \left( \frac{14}{42.86} \right) \times 100\% = 32.66\%$$
Series B
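Weekly income    f     x     fx      x̄    (x - x̄)    (x - x̄)²    f(x - x̄)²
10-20            22    15    330     39    -24         576          12672
20-30            18    25    450     39    -14         196          3528
30-40            32    35    1120    39    -4          16           512
40-50            34    45    1530    39    6           36           1224
50-60            18    55    990     39    16          256          4608
60-70            16    65    1040    39    26          676          10816
$\sum f = 140$   $\sum fx = 5460$    $\sum f(x - \bar{x})^2 = 33360$
Mean,
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{5460}{140} = 39$$
Standard Deviation:
$$S = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{33360}{140 - 1}} = \sqrt{240} \approx 15.49$$
$$CV = \left( \frac{S}{\bar{X}} \right) \times 100\% = \left( \frac{15.49}{39} \right) \times 100\% = 39.72\%$$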
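The comparison of the two series can be scripted as below. This is a sketch with an illustrative helper name, `grouped_cv`; it uses the sample standard deviation (division by n - 1) and keeps full precision, so its value for Series A comes out slightly above a hand calculation that rounds S to 14.

```python
import math


def grouped_cv(midpoints, freqs):
    """Coefficient of variation (%) for grouped data, using the sample standard deviation."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    s = math.sqrt(ss / (n - 1))
    return 100 * s / mean


midpoints = [15, 25, 35, 45, 55, 65]
cv_a = grouped_cv(midpoints, [10, 16, 30, 40, 26, 18])   # Series A
cv_b = grouped_cv(midpoints, [22, 18, 32, 34, 18, 16])   # Series B

print(round(cv_a, 2), round(cv_b, 2))
print("Series B" if cv_b > cv_a else "Series A", "has the greater dispersion")
```

The series with the larger CV has the greater relative dispersion, which here is Series B.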
Chapter 6: Probability
Formulas:
$$P(A) = \frac{\text{Number of ways that event A can occur}}{\text{Total number of possible outcomes}} = \frac{n(A)}{n(S)}$$
Mutually exclusive
P(A∪B)= P(A)+P(B)
If bar is given,
$P(A \cap \bar{B}) = P(A) - P(A \cap B)$ or, $P(A \cup \bar{B}) = 1 - P(B) + P(A \cap B)$
$P(\bar{A} \cap B) = P(B) - P(A \cap B)$ or, $P(\bar{A} \cup B) = 1 - P(A) + P(A \cap B)$
$P(\bar{A} \cap \bar{B}) = 1 - P(A \cup B)$ or, $P(\bar{A} \cup \bar{B}) = 1 - P(A \cap B)$
Probability tree diagram:
Example 1:
A bag contains 3 black balls and 5 white balls. Paul picks a ball at random
from the bag and replaces it back in the bag. He mixes the balls in the bag
and then picks another ball at random from the bag.
a) Construct a probability tree of the problem.
b) Calculate the probability that Paul picks:
i) two black balls
ii) a black ball in his second draw
Solution:
b)
i. To find the probability of getting two black balls, first locate the B
branch and then follow the second B branch. Since these are
independent events we can multiply the probability of each branch.
ii) There are two outcomes where the second ball can be black.
From the
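The branches of the tree for Example 1 can be enumerated in code. This is a sketch with illustrative names; because the first ball is replaced, the two draws are treated as independent and the probabilities along each path are multiplied.

```python
from fractions import Fraction
from itertools import product

# Probability of each colour on a single draw: 3 black and 5 white balls.
p = {"B": Fraction(3, 8), "W": Fraction(5, 8)}

# Every path through the two-stage tree with its probability.
paths = {first + second: p[first] * p[second] for first, second in product(p, repeat=2)}

two_black = paths["BB"]                     # follow the B branch twice
second_black = paths["BB"] + paths["WB"]    # both paths that end in a black ball

print(two_black, second_black)  # 9/64 3/8
```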
Example 2:
Bag A contains 10 marbles of which 2 are red and 8 are black. Bag B contains 12 marbles of
which 4 are red and 8 are black. A marble is drawn at random from each bag.
a) Draw a probability tree diagram to show all the outcomes of the experiment.
b) Find the probability that:
(i) both are red.
(ii) both are black.
(iii) one black and one red.
(iv) at least one red.
Solution:
a) A probability tree diagram that shows all the outcomes of the experiment.
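b) The probability that:
(i) both are red.
$P(R, R) = \frac{2}{10} \times \frac{4}{12} = \frac{8}{120} = \frac{1}{15}$
(ii) both are black.
$P(B, B) = \frac{8}{10} \times \frac{8}{12} = \frac{64}{120} = \frac{8}{15}$
(iii) one black and one red.
$P(R, B) \text{ or } P(B, R) = \frac{2}{10} \times \frac{8}{12} + \frac{8}{10} \times \frac{4}{12} = \frac{48}{120} = \frac{2}{5}$
(iv) at least one red.
$1 - P(B, B) = 1 - \frac{8}{15} = \frac{7}{15}$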
Example 3:
A box contains 4 red and 2 blue chips. A chip is drawn at random and then replaced. A
second chip is then drawn at random.
a) Show all the possible outcomes using a probability tree diagram.
b) Calculate the probability of getting:
(i) at least one blue.
(ii) one red and one blue.
(iii) two of the same colour.
Solution:
a) A probability tree diagram to show all the possible outcomes.
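b)
(i) at least one blue.
$1 - P(R, R) = 1 - \frac{4}{6} \times \frac{4}{6} = \frac{5}{9}$
(ii) one red and one blue.
$P(R, B) \text{ or } P(B, R) = \frac{4}{6} \times \frac{2}{6} + \frac{2}{6} \times \frac{4}{6} = \frac{16}{36} = \frac{4}{9}$
(iii) two of the same colour.
$P(R, R) \text{ or } P(B, B) = \frac{4}{6} \times \frac{4}{6} + \frac{2}{6} \times \frac{2}{6} = \frac{20}{36} = \frac{5}{9}$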
Example 4
A card is taken at random from a pack of 52 playing cards, and then replaced. A
second card is then drawn at random from the pack.
Use a tree diagram to determine the probability that:
We first note that, for a single card drawn from the pack,
$p(\text{Diamond}) = \frac{13}{52} = \frac{1}{4}$ and $p(\text{not Diamond}) = \frac{39}{52} = \frac{3}{4}$.
We put these probabilities on the branches of the tree diagram below:
Note also that the probability for each combination, for example, two Diamonds, is
determined by multiplying the probabilities along the branches.
(a) both cards are Diamonds,
$p(\text{both Diamonds}) = \frac{1}{16}$
(b) at least one card is a Diamond,
$p(\text{at least one Diamond}) = \frac{1}{16} + \frac{3}{16} + \frac{3}{16} = \frac{7}{16}$
(c) exactly one card is a Diamond,
$p(\text{exactly one Diamond}) = \frac{3}{16} + \frac{3}{16} = \frac{6}{16} = \frac{3}{8}$
(d) neither card is a Diamond.
$p(\text{neither card a Diamond}) = \frac{9}{16}$
Example 5
The probability that a patient is allergic to penicillin is 0.20. Suppose this drug is administered
to three patients. Find the probability that:
a) all three of them are allergic to it.
b) exactly one patient is allergic to it.
c) at least one patient is allergic to it.
Answer:
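One way to work Example 5 is to count the branches of a three-stage tree with the binomial formula. The sketch below assumes that the three patients react independently of one another (the question does not state this explicitly) and uses exact fractions; the helper name `prob_allergic` is illustrative.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 5)   # P(allergic) = 0.20
n = 3                # number of patients


def prob_allergic(k):
    """Probability that exactly k of the n patients are allergic (assumes independence)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


all_three = prob_allergic(3)         # 0.2 x 0.2 x 0.2
exactly_one = prob_allergic(1)       # three branches, each 0.2 x 0.8 x 0.8
at_least_one = 1 - prob_allergic(0)  # complement of "no patient is allergic"

print(float(all_three), float(exactly_one), float(at_least_one))  # 0.008 0.384 0.488
```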
Venn Diagram
Example 1:
In a school of 320 students, 85 students are in the band, 200 students are on sports teams, and
60 students participate in both activities. Find the probability of students involved in either
band or sports?
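Answer:
Let A = the student is in the band and B = the student is on a sports team.
$P(A) = \frac{85}{320}$
$P(B) = \frac{200}{320}$
$P(A \cap B) = \frac{60}{320}$
P(A∪B)= P(A)+P(B)-P(A∩B)
$= \frac{85 + 200 - 60}{320}$
$= \frac{225}{320} = \frac{45}{64} \approx 0.70$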
Example 2:
P(A) = 0.4, P(B) = 0.3, P(A ∪ B) = 0.6. Find a) P(A ∩ B) b) P(A') c) P(A' ∩ B') d) Draw
these events out on a Venn Diagram
a)
P(A ∩ B) = P(A) + P(B) - P(A ∪ B)
P(A ∩ B) = 0.4 + 0.3 - 0.6
P(A ∩ B) = 0.1
b)
P(A') = 1 - P(A)
P(A') = 1 - 0.4
P(A') = 0.6
c)
P(A' ∩ B') - this means that event A doesn't occur AND event B doesn't occur. We know
that P(A ∪ B) = 0.6; this means the probability of event A OR B occurring is 0.6. Therefore
the probability that event A AND B DON'T occur, must be 1 - P(A ∪ B). This equals 0.4.
d)
Example 3:
The probability that a child in a school has green eyes is 0.37, and the probability they
have black hair is 0.45. The probability that the child has either green eyes or black hair
or both is 0.80. A child is randomly selected from the school. What is the probability that
the child has a) black hair and green eyes b) black hair but not green eyes c) neither
black hair nor green eyes?
a)
P(BH AND GE) = P(BH ∩ GE)
P(BH ∩ GE) = P(BH) + P(GE) - P(BH ∪ GE)
P(BH ∩ GE) = 0.45 + 0.37 - 0.8
P(BH ∩ GE) = 0.02
b)
P(BH AND NOT GE) = P(BH ∩ GE')
If we visualise a Venn diagram, we want the BH circle without the intersection with GE
circle. Therefore, we want BH minus P(BH ∩ GE):
P(BH ∩ GE') = P(BH) - P(BH ∩ GE)
P(BH ∩ GE') = 0.45 - 0.02
P(BH ∩ GE') = 0.43
c)
P(BH' ∩ GE') = 1 - P(BH ∪ GE)
P(BH' ∩ GE') = 1 - 0.8
P(BH' ∩ GE') = 0.2
Example 4:
An integer is selected randomly from a set of integers {1,2,3,4,5,6,7,8,9,10,11,12}. Find the
probability that the integer is
(a)an even number or is divisible by 3
(b)an even number and is not divisible by 3
(c)not an even number and is not divisible by 3
Answer:
Let A= event that even number is chosen
= (2,4,6,8,10,12)
Let B = event that the chosen number is divisible by 3
= (3,6,9,12)
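a) P(A∪B)= P(A)+P(B)-P(A∩B)
$= \frac{6}{12} + \frac{4}{12} - \frac{2}{12} = \frac{8}{12} = \frac{2}{3}$
b) $P(A \cap \bar{B}) = P(A) - P(A \cap B)$
$= \frac{6}{12} - \frac{2}{12} = \frac{4}{12} = \frac{1}{3}$
c) $P(\bar{A} \cap \bar{B}) = 1 - P(A \cup B)$
$= 1 - \frac{2}{3} = \frac{1}{3}$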
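Because the sample space in Example 4 is small, the three answers can be confirmed by listing the events as sets. This is a sketch with illustrative names; `|` is set union and `-` is set difference.

```python
from fractions import Fraction

S = set(range(1, 13))                  # sample space {1, ..., 12}
A = {x for x in S if x % 2 == 0}       # even numbers
B = {x for x in S if x % 3 == 0}       # numbers divisible by 3


def prob(event):
    """Probability of an event when every integer in S is equally likely."""
    return Fraction(len(event), len(S))


p_a = prob(A | B)          # even or divisible by 3
p_b = prob(A - B)          # even and not divisible by 3
p_c = prob(S - (A | B))    # neither even nor divisible by 3

print(p_a, p_b, p_c)  # 2/3 1/3 1/3
```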
Example:
Two dice are tossed. Find the probability that:
a) the sum of the two numbers is an even number.
b) the sum of the two numbers is an even number or is divisible by 3.
Answer:
A = the sum of the two numbers is an even number
B = the sum of the two numbers is divisible by 3
a) $P(A) = \frac{18}{36} = \frac{1}{2}$, or 0.50
b) $P(B) = \frac{12}{36} = \frac{1}{3}$, or 0.33
$P(A \cap B) = \frac{6}{36} = \frac{1}{6}$ (the sum is 6 or 12)
P(A∪B)= P(A)+P(B)-P(A∩B)
$= \frac{18}{36} + \frac{12}{36} - \frac{6}{36} = \frac{24}{36} = \frac{2}{3} \approx 0.67$
OR, in decimals,
0.500 + 0.333 - 0.167 = 0.666 ≈ 0.67
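The counts 18, 12 and 6 out of 36 can be confirmed by enumerating every outcome of the two dice. This is a minimal sketch with illustrative names; it assumes two fair six-sided dice.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 ordered pairs

A = {o for o in outcomes if sum(o) % 2 == 0}      # sum is even
B = {o for o in outcomes if sum(o) % 3 == 0}      # sum is divisible by 3

p_A = Fraction(len(A), len(outcomes))
p_B = Fraction(len(B), len(outcomes))
p_both = Fraction(len(A & B), len(outcomes))
p_union = p_A + p_B - p_both                      # addition rule

print(p_A, p_B, p_both, p_union)  # 1/2 1/3 1/6 2/3
```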