
Summary of Statistics

1. Statistics involves collecting, analyzing, presenting, and interpreting data to make decisions. It uses methods to study samples and make generalizations about populations. 2. Examples of statistical statements include that half of students scored below the average on a test, and that in a large animal population about half of adults weigh more than the average adult weight. 3. A population includes all individuals with a common characteristic, while a sample is a subset selected from the population. Samples are used because studying entire populations can be complicated, expensive, and time-consuming, so samples allow researchers to make generalizations about populations.

Uploaded by

Aditya Majumder
Copyright
© © All Rights Reserved

Summary of Statistics

Chapter 1: Introduction to Statistics


1. Define Statistics:

Statistics is a group of methods used to collect, analyse, present, and interpret data and to
make decisions.

In more detail, statistics is the practice or science of collecting and analysing numerical data
in large quantities, especially for the purpose of inferring proportions in a whole from those
in a representative sample.

2. Give TWO (2) examples of statistical statements:

1. Half of the students taking a test score less than the average mark.

2. Nobody scores higher than the average mark in a test.

3. In a large population of animals, about half of the adult animals are heavier than the
average adult weight.

4. Suppose that in a game you can only score an even number of points: 0, 2, 10, 50. So,
the average score over a series of games is an even number.

5. A random process is defined by a certain (unknown) probability distribution. The
standard deviation of the random process is not larger than the range of the observed
data.

6. A random process is defined by a certain probability distribution. The standard
deviation of the random process is not larger than half of the maximum theoretical
range of the observed data.

7. The chance of observing an outcome more than three standard deviations from the
mean is less than 1 in 100.

8. I repeat an experiment with a random numerical outcome many times. Eventually the
average of my outcomes will be within 1% of the theoretical average outcome.
9. The chance of observing an outcome more than ten standard deviations from the mean
is not more than 1%.
10. If two statistical processes are uncorrelated then they must be independent.

3. Explain the differences between a sample and a population.

Definition of Population

In simple terms, a population is the aggregate of all elements under study that share one or
more common characteristics; for example, all people living in India constitute a
population. A population is not confined to people; it may also include animals,
events, objects, buildings, etc. It can be of any size, and the number of elements or members
in a population is known as the population size, i.e. if there are one hundred million people in India,
then the population size (N) is 100 million.

Examples

• The population of all workers working in the sugar factory.
• The population of motorcycles produced by a particular company.
• The population of mosquitoes in a town.
• The population of tax payers in India.

Definition of Sample

By the term sample, we mean a part of the population chosen at random for participation in the
study. The sample should represent the population in all its characteristics and be free from
bias, so that it forms a miniature cross-section of the population, because the sample
observations are used to make generalisations about the population.

In other words, the respondents selected out of the population constitute a ‘sample’, and the
process of selecting respondents is known as ‘sampling’. The units under study are called
sampling units, and the number of units in a sample is called the sample size.
Key Differences Between Population and Sample

The difference between population and sample can be drawn clearly on the following
grounds:

1. The collection of all elements possessing common characteristics that comprise the
universe is known as the population. A subgroup of the members of the population
chosen for participation in the study is called a sample.

2. The population consists of each and every element of the entire group. On the other
hand, only a handful of items of the population is included in a sample.

3. The characteristic of population based on all units is called parameter while the
measure of sample observation is called statistic.

4. When information is collected from all units of population, the process is known as
census or complete enumeration. Conversely, the sample survey is conducted to
gather information from the sample using sampling method.

5. With a population, the focus is on identifying the characteristics of the elements, whereas
with a sample the focus is on making generalisations about the characteristics of the
population from which the sample came.

4. Why are samples used in statistics?

Sampling is a term used in statistics. It is the process of choosing a representative sample
from a target population and collecting data from that sample in order to understand
something about the population as a whole.

The term used in statistical sampling to describe the greater whole is population. A
population could be a group of people or any group of objects you are studying (e.g., rocks
containing gold, dog biscuits made by x-brand, or all left-handed one-toothed people in the
world). Studying populations can be complicated, expensive, and time consuming so
researchers have developed several different ways to sample whatever it is they are studying.
Broadly, these sampling techniques are either probability-based (random sampling) or non-
probability-based (non-random sampling).
Before choosing a sampling method, researchers must first decide what their target population
will be. From that they develop a sampling frame, or list, of the set
of people from whom data could be collected. This can be a difficult task at times. They
then apply a sampling method to determine what, or who,
will be in the sample.

5. For each research objective below, identify the population and sample in
the study.
a) The government research agency contacts 2020 residents of Malaysia aged 18 or older and
asks whether the National Service is good to be implemented. (Sample)
b) A worker selected 20 cans of soft drinks at random for inspection. (Sample)
c) Steven Co. studies the prices of 40 Statistics books in Malaysia in order to set a price for
its new Statistics book. (Sample)
d) A bus driver selects 15 passengers at a bus station to see the colour of their clothes. (Sample)

e)The heights of 100 secondary students in Malaysia (Sample)

f)The number of books sold by all bookstores in Melaka (Population).

g)The monthly income for Year 2005 (Population) .

h)The prices of all houses sold by a developer (Population).

i)The income tax collected from 50 companies in Malaysia (Sample).

j)The time taken by a sample of 85 university students in an examination.(Sample)

k)The weights of 14 policemen in a country.(Sample)

6. In each of the statements, tell whether descriptive or inferential statistics
have been used.
a) In the year 2015, 148 million Americans will be enrolled in an HMO. (Inferential)
b) Nine out of ten on-the-job fatalities are men. (Descriptive)
c) Drinking decaffeinated coffee can raise cholesterol levels by 7%. (Inferential)
d) The median household income for people aged 24-35 is $35,888. (Descriptive)

Example of Descriptive or inferential statistics

(1)
The last four semesters an instructor taught Intermediate Algebra; the following numbers of
people passed the class.
SEM I-17, SEM II –19, SEM III-4, SEM IV -20
Which of the following conclusions can be obtained from purely descriptive measures and
which can be obtained by inferential methods?
Answer:
a) Descriptive- The last four semesters the instructor taught Intermediate Algebra; an
average of 15 people passed the class.

b) Inferential- The next time the instructor teaches Intermediate Algebra, we can expect
approximately 15 people to pass the class.

c) Inferential- This instructor will never pass more than 20 people in an Intermediate
Algebra class.

d) Descriptive- The last four semesters the instructor taught Intermediate Algebra, no
more than 20 people passed the class.

e) Inferential- Only 5 people passed one semester because the instructor was in a bad
mood the entire semester.

f) Inferential-The instructor passed 20 people the last time he taught the class to keep
the administration off of his back for poor results.

g) Inferential- The instructor passes so few people in his Intermediate Algebra classes
because he doesn't like teaching that class.

(2)
During the last week, Tony Gwynn of the San Diego Padres recorded the following number
of hits.
Sun –2
Mon –1
Tues –4
Wed –3
Thurs –0
Fri –3
Sat -1
Which of the following conclusions can be obtained from purely descriptive methods and
which can be obtained by inferential methods?

Answer:

(a) Inferential- Tony will never have more than 4 hits in a game.

(b) Inferential-Tony had 0 hits on Thursday because he used a bat that belonged to
another player.
(c) Descriptive- During the last week, Tony averaged 2 hits per game.

(d) Inferential-Tony is a better hitter than any other baseball player

(e) Descriptive-Tony had the same total number of hits in the first 3 games as he did in
the last 4 games.

7. Distinguish between quantitative variables and qualitative variables.

Qualitative vs. Quantitative Variables

Variables can be classified as qualitative (aka categorical) or quantitative (aka numeric).

• Qualitative. Qualitative variables take on values that are names or labels. The colour
of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier)
would be examples of qualitative or categorical variables.
• Quantitative. Quantitative variables are numeric. They represent a measurable
quantity. For example, when we speak of the population of a city, we are talking
about the number of people in the city - a measurable attribute of the city. Therefore,
population would be a quantitative variable.

In algebraic equations, quantitative variables are represented by symbols (e.g., x, y, or z).

8. Determine whether the given statement is a qualitative or quantitative
variable. For quantitative variables, identify whether it is a discrete or
continuous variable.
a) Noon temperature (in degrees Celsius) in Kuala Lumpur over the last two weeks. (Quantitative, continuous)
b) The distances travelled by travellers who visit Malaysia. (Quantitative, continuous)
c) The advertisements shown on television from 9.00pm to 10.00pm last night. (Qualitative)
d) The responses to a survey (agree, disagree and no opinion). (Qualitative)
e) The number of typos in 2 pages of documents typed by a secretary. (Quantitative, discrete)

Example of Qualitative and quantitative variables:

Classify the qualitative and quantitative variable.


a. Hair colours.(Qualitative)

b. Types of products produced in a factory. (Qualitative)

c. Heights of policemen in a physical test. (Quantitative)

d. Weights of cars in a parking area. (Quantitative)

e. Salaries of employees.(Quantitative)

f. Religious affiliation.(Qualitative)

g. The number of times ‘tail’ is observed after a coin is tossed 20 times.(Quantitative)

h. Brands of cellular phones displayed in a telecommunication store. (Qualitative)

11.Definition of discrete variables:

A variable whose values are countable is called a discrete variable

Example of discrete variables:

a) The number of cars sold per month by a car sales executive.

b) The number of students attending the Statistics class.

c) The number of burgers sold per day in McDonald.

d) The number of books read by Shaun per year.

e) The number of heads occurring when a coin is tossed 3 times.

Definition of continuous variables:

A variable that can assume any numerical value over a certain interval or intervals is called a
continuous variable

Example of Continuous variables:

a) The amounts of milk that cows produce.

b) The heights of children in Sunshine Kindergarten.

c) The weights of IT students.


d) The temperature in a frozen room in a restaurant.

e) The prices of books in a book store.

Chapter 2: Describing Data using Tables and Graphs

Formulas:

Relative frequency of a category $= \dfrac{\text{Frequency of that category}}{\text{Sum of all frequencies}} = \dfrac{f}{\sum f}$

Percentage (%) = (Relative frequency) × 100

Pie chart angle = (Relative frequency) × 360°

Class width / size of class = Upper boundary – Lower boundary

Class midpoint/mark $= \dfrac{\text{Lower limit} + \text{Upper limit}}{2}$

Number of classes: c = 1 + 3.3 log n
c = number of classes
n = number of observations
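
The rule for the number of classes can be tried out numerically. The sketch below assumes that "log" means the base-10 logarithm and that n is the number of observations; the function name `sturges_classes` is only an illustrative label, not something taken from the notes.

```python
import math

def sturges_classes(n):
    """Suggested number of classes c = 1 + 3.3 * log10(n) for n observations.

    Assumption: the logarithm is base 10.
    """
    return 1 + 3.3 * math.log10(n)

# Example: 40 observations (the size of the grouped data set used later on)
c = sturges_classes(40)
print(round(c, 2))  # the value is normally rounded to a whole number of classes
```

In practice the result is rounded to a convenient whole number, and the class width is then chosen so that the classes cover the whole range of the data.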

Inclusive:
For this type of data, the lower boundary = lower limits –0.5, and
upper boundary = upper limits + 0.5

Exclusive:
What is a frequency distribution?
A frequency distribution lists each category of data and the number of occurrences for each
category of data.

Relative frequency of a category $= \dfrac{\text{Frequency of that category}}{\text{Sum of all frequencies}} = \dfrac{f}{\sum f}$

Percentage (%) = (Relative frequency) × 100

Example:

Answer:

Stress on Job   Frequency   Relative Frequency (RF)   Percentage
Very            10          10/30 = 0.333             0.333 × 100 = 33.3
Somewhat        14          14/30 = 0.467             0.467 × 100 = 46.7
None            6           6/30 = 0.200              0.200 × 100 = 20.0
                Sum = 30    Sum = 1.00                Sum = 100
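
The relative frequency and percentage columns can be reproduced with a few lines of Python. This is a minimal sketch using the stress-on-job counts from the table; the helper name `frequency_table` is an assumption made for illustration.

```python
def frequency_table(counts):
    """Map each category to (frequency, relative frequency, percentage)."""
    total = sum(counts.values())
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}

stress = {"Very": 10, "Somewhat": 14, "None": 6}
table = frequency_table(stress)

# Print one row per category: name, f, RF (3 d.p.), percentage (1 d.p.)
for cat, (f, rf, pct) in table.items():
    print(f"{cat:<10}{f:>4}{rf:>8.3f}{pct:>8.1f}")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any table built by hand.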

Bar graph:
A graph made of bars whose heights represent the frequencies of respective categories is
called a bar graph.
x–axis : categories
y–axis : frequencies (or relative frequencies or percentages)
Example question on bar chart/graph:
The following data are the favourite national car models of 50 MMU students.
The question may give the data as a raw list of responses or as a frequency table such as:

Car Model   Number of cars (f)
Wira        14
Iswara      6
Kancil      17
Kenari      7
Saga        6
            Sum = 50

(a) Prepare the frequency distribution table for these data


(b) Calculate Relative frequency and percentage distributions
(c) Draw a bar graph for the relative frequency
Answer:
(a and b)
Car Model   Number of cars (f)   RF        Percentage
Wira        14                   0.28      28
Iswara      6                    0.12      12
Kancil      17                   0.34      34
Kenari      7                    0.14      14
Saga        6                    0.12      12
            Sum = 50             Sum = 1   Sum = 100%

c)
[Figure: Bar chart/graph. x-axis: Car Model (Wira, Iswara, Kancil, Kenari, Saga); y-axis: Relative Frequency]

Pie Chart:
A pie chart is a circle divided into sectors. Each sector represents a category of data. The
area of each sector is proportional to the frequency of the category.
Pie chart Angle: (Relative Frequency) ×360˚

Example question on Pie chart:


Twelve persons were asked to taste two types of soft drinks, A and B, and indicate if the taste
of A was superior (S), the same (M) or inferior (I) to that of B. Their responses are listed
below:
S I I M M S M M S I S S
Or,
Types of tastes No of respondents
S 5
M 4
I 3

a) Construct a frequency distribution table.

b) Calculate the relative frequencies and percentages for all categories.

c) Draw a pie chart for the percentage distribution.

Answer:
(a & b)
Types of tastes   No of respondents   RF         Percentage   Angle
S                 5                   0.416667   41.66667     0.416667 × 360° = 150°
M                 4                   0.333333   33.33333     0.333333 × 360° = 120°
I                 3                   0.25       25           0.25 × 360° = 90°
                  Sum = 12            Sum = 1    Sum = 100    Sum = 360°
c)
[Figure: Pie chart with sectors S = 42%, M = 33%, I = 25%]
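
The sector angles follow directly from the rule angle = relative frequency × 360°. The sketch below applies it to the taste-test counts; `pie_angles` is a hypothetical helper name used only for this illustration.

```python
def pie_angles(counts):
    """Return the pie-chart angle in degrees for each category."""
    total = sum(counts.values())
    return {cat: f / total * 360 for cat, f in counts.items()}

tastes = {"S": 5, "M": 4, "I": 3}
angles = pie_angles(tastes)

for cat, angle in angles.items():
    print(cat, round(angle, 1))
```

Whatever the data, the angles should add up to 360°, which makes a simple sanity check before drawing the chart.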

Inclusive:
For this type of data, the lower boundary = lower limits – 0.5, and
upper boundary = upper limits + 0.5

Exclusive:
If the question gives ungrouped data and then asks you to
construct an inclusive/exclusive frequency distribution table:
For example:

Histogram & Polygon:


Histogram
A histogram is constructed by drawing rectangles for each class of data whose height is the
frequency or relative frequency of the class. The width of each rectangle should be the same
and they should touch each other.
x–axis : class boundaries (lower boundary -upper boundary)
y–axis : frequencies (or relative frequencies or percentages)

Example:
Polygon:
A graph formed by joining the midpoints of the tops of successive bars in a histogram with
straight lines is called a polygon.

Question on Histogram and Polygon with cumulative frequency:


Class Limit Frequency
11-16 8
17-22 13
23-28 8
29-34 6
35-40 3
41-46 1
47-52 1
Total 40

a) Calculate the relative frequency and percentage.


b) Calculate the cumulative frequency, cumulative relative frequency, and Cumulative
percentage
c) Draw a relative frequency histogram and a relative frequency polygon for the data on the
same graph.
d) Find out the mode on histogram.
Answer:
a)
Class Limit Frequency RF Percentage (%)

11-16 8 0.2 20
17-22 13 0.325 32.5
23-28 8 0.2 20
29-34 6 0.15 15
35-40 3 0.075 7.5
41-46 1 0.025 2.5
47-52 1 0.025 2.5
Total 40 1 100

b)
Class Limit   Frequency   RF      %      Cumulative frequency   Cumulative relative frequency   Cumulative percentage
11-16         8           0.2     20     8                      0.2                             20
17-22         13          0.325   32.5   21                     0.525                           52.5
23-28         8           0.2     20     29                     0.725                           72.5
29-34         6           0.15    15     35                     0.875                           87.5
35-40         3           0.075   7.5    38                     0.95                            95
41-46         1           0.025   2.5    39                     0.975                           97.5
47-52         1           0.025   2.5    40                     1                               100
Total         40          1       100

c)
[Figure: Histogram & Polygon. x-axis: class boundaries (10.5, 16.5, 22.5, 28.5, 34.5, 40.5, 46.5, 52.5); y-axis: Relative Frequency]
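
The cumulative columns of part b) are running totals of the frequency column. A minimal sketch using the standard-library `itertools.accumulate`, with the seven class frequencies from the question:

```python
from itertools import accumulate

frequencies = [8, 13, 8, 6, 3, 1, 1]   # classes 11-16 up to 47-52
total = sum(frequencies)

cumulative = list(accumulate(frequencies))               # running totals
cumulative_rf = [cf / total for cf in cumulative]        # cumulative relative frequency
cumulative_pct = [100 * cf / total for cf in cumulative] # cumulative percentage

for row in zip(frequencies, cumulative, cumulative_rf, cumulative_pct):
    print(row)
```

The last cumulative frequency must equal the total number of observations, and the last cumulative percentage must be 100.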
d)
[Figure: Mode on Histogram. x-axis: class boundaries (10.5 to 52.5); y-axis: Relative frequency]

Ogive or cumulative frequency curve:


An ogive (commonly known as a cumulative frequency curve) is a graph or line chart of a
cumulative frequency distribution. The cumulative frequencies are plotted against the upper
class boundaries. The cumulative frequency is used to determine the number of observations
which lie below a certain upper class boundary.
There are two types of cumulative frequency curves (or ogives):

• More than type cumulative frequency curve (here we use the lower limit of the classes
to plot the curve)

• Less than type cumulative frequency curve (here we use the upper limit of the classes
to plot the curve)

Difference between less than type and more than type

Question of ogive on graph:


d. Median
e. First & Third quartile

Answer:
a)
Electric bills (inclusive class interval)   No. of families (f)
21-30                                       3
31-40                                       5
41-50                                       8
51-60                                       5
61-70                                       5
71-80                                       4
Total                                       30

b)
Electric bills (class boundaries)   No. of families (f)   Cumulative frequency (CF)
X < 20.5                            0                     0
20.5-30.5                           3                     3
30.5-40.5                           5                     8
40.5-50.5                           8                     16
50.5-60.5                           5                     21
60.5-70.5                           5                     26
70.5-80.5                           4                     30
Total                               30
[Figure: Cumulative frequency curve. x-axis: Upper boundaries; y-axis: Cumulative Frequency]

c)
[Figure: Cumulative frequency curve. x-axis: Upper boundaries (10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5); y-axis: Cumulative Frequency]

a) From the graph, 21 families have an electricity bill below RM 61. The
total number of families is 30, so the number of families whose electricity bill is
RM 61 or more is 30 − 21 = 9 (30%).
b) From the graph, about 8 families have an electricity bill of RM 41 or less (about 27%).

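d. Median position $= \dfrac{n}{2} = \dfrac{30}{2} = 15$

$L = 40.5$, $\sum f_{m-1} = 8$, $f_m = 8$, $C = 10$

$$\text{Median} = L + \dfrac{\frac{n}{2} - \sum f_{m-1}}{f_m} \times C$$

$$= 40.5 + \dfrac{15 - 8}{8} \times 10$$

$$= 40.5 + 8.75$$

$$= 49.25$$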
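
The interpolation formula for the median of grouped data can be written as a small function. This is a sketch under the assumption that all classes have the same width and that the lower class boundaries are supplied in ascending order; the name `grouped_median` is illustrative only.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median = L + ((n/2 - cumulative frequency before the class) / f_m) * C."""
    n = sum(frequencies)
    target = n / 2
    before = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if before + f >= target:
            return lower + (target - before) / f * width
        before += f
    raise ValueError("frequencies must not be empty")

# Electric bill data: class boundaries 20.5-30.5, ..., 70.5-80.5
bills_median = grouped_median([20.5, 30.5, 40.5, 50.5, 60.5, 70.5],
                              [3, 5, 8, 5, 5, 4], 10)
print(bills_median)
```

The loop finds the first class whose cumulative frequency reaches n/2, which is the median class, and then interpolates inside it.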

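e. First quartile position $= \dfrac{n+1}{4} = \dfrac{31}{4} = 7.75$th value

Third quartile position $= \dfrac{3(n+1)}{4} = \dfrac{93}{4} = 23.25$th value

Reading the cumulative frequency curve at these positions gives a first quartile of about RM 40 and a third quartile of about RM 65.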

Quartiles:

Quartiles
If we divide a cumulative frequency curve into quarters, the value at the lower quarter is
referred to as the lower quartile, the value at the middle gives the median and the value at the
upper quarter is the upper quartile.

A set of numbers may be as follows: 8, 14, 15, 16, 17, 18, 19, 50. The mean of these numbers
is 19.625 . However, the extremes in this set (8 and 50) distort this value. The interquartile
range is a method of measuring the spread of the middle 50% of the values and is useful since
it ignores the extreme values.
The lower quartile is the (n+1)/4 th value (n is the total cumulative frequency, i.e. 157 in this case) and
the upper quartile is the 3(n+1)/4 th value. The difference between these two is the
interquartile range (IQR).

In the above example, the upper quartile is the 118.5th value and the lower quartile is the
39.5th value. If we draw a cumulative frequency curve, we see that the lower quartile,
therefore, is about 17 and the upper quartile is about 37. Therefore, the IQR is 20 (bear in
mind that this is a rough sketch- if you plot the values on graph paper you will get a more
accurate value).
Stem & Leaf:
In a stem-and-leaf display of quantitative data, each value is divided into two portions – a
stem and a leaf. The leaves for each stem are shown separately in a display. It is used in data
analysis when the data set is small.

Answer:
50, 52, 57, 61, 64, 65, 68, 69, 71, 71, 72, 72, 75, 76, 77, 78, 79, 79, 80, 81, 83, 84, 86, 87, 87,
92, 92, 93, 95, 96, 98
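
A stem-and-leaf display for the sorted values above can be generated as follows. The sketch assumes two-digit data, so the tens digit is the stem and the units digit is the leaf; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens digit (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

data = [50, 52, 57, 61, 64, 65, 68, 69, 71, 71, 72, 72, 75, 76, 77, 78,
        79, 79, 80, 81, 83, 84, 86, 87, 87, 92, 92, 93, 95, 96, 98]
display = stem_and_leaf(data)

for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```

Each printed row shows one stem followed by its leaves in ascending order, so the original values can be read back from the display.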
Chapter 3, 4 & 5: Measures of Central Tendency, IQR and Relative Dispersion

Formulas:

1. Mean (Ungrouped Data)

Population: $\mu = \dfrac{\sum x}{N}$

Sample: $\bar{X} = \dfrac{\sum x}{n}$

Mean (Grouped Data) (Continuous & Discrete)

Population: $\mu = \dfrac{\sum fx}{N}$ or $\mu = \dfrac{\sum fx}{\sum f}$

Sample: $\bar{X} = \dfrac{\sum fx}{n}$ or $\bar{X} = \dfrac{\sum fx}{\sum f}$

2. Median (Ungrouped and Discrete)

$\tilde{X}$ = the value in the $\dfrac{n+1}{2}$th position

Median (Continuous)

$$\tilde{X} = L + \dfrac{\frac{n}{2} - \sum f_{m-1}}{f_m} \times C$$

Where,
L is the lower boundary of the class containing the median
$\sum f_{m-1}$ is the cumulative frequency before the median class
$f_m$ is the frequency of the median class
C is the class interval of the class containing the median

3. Mode (Ungrouped and Discrete)

Take the value with the highest frequency.

Mode (Continuous)

$$\hat{X} = L + \dfrac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)} \times C$$

Where,
L is the lower boundary of the class containing the mode
$f_0$ is the frequency of the modal class
$f_1$ is the frequency of the class before the modal class
$f_2$ is the frequency of the class after the modal class
C is the interval of the modal class

4. Variance and Standard Deviation (Ungrouped data)

Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

Sample standard deviation: $S = \sqrt{\dfrac{\sum (x - \bar{X})^2}{n - 1}}$

Variance and Standard Deviation (Grouped data)

Population variance: $\sigma^2 = \dfrac{\sum f(x - \mu)^2}{N}$

Sample standard deviation: $S = \sqrt{\dfrac{\sum f(x - \bar{X})^2}{n - 1}}$

5. Dispersion:

$CV = \left(\dfrac{S}{\bar{X}}\right) \times 100\%$

6. Pearson Co-efficient (Skewness):

$r = \dfrac{3(\bar{X} - \tilde{X})}{S}$ or, $r = \dfrac{\bar{X} - \hat{X}}{S}$

Ungrouped Data
1. 23, 45, 32, 14, 56, 45, 30, 47
a) Mean
b) Median
c) Mode
d) Variance & Standard Deviation
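Answer:

a) Mean, $\bar{X} = \dfrac{\sum x}{n} = \dfrac{23+45+32+14+56+45+30+47}{8} = \dfrac{292}{8} = 36.5$

b) Median: first arrange the data in ascending order: 14, 23, 30, 32, 45, 45, 47, 56.
Position $= \dfrac{n+1}{2} = \dfrac{8+1}{2} = 4.5$th
So the median is the average of the 4th and 5th values, $\dfrac{32+45}{2} = 38.5$

c) Mode:
The value 45 occurs twice and every other value occurs once, so this data set has a single
mode, 45 (it is unimodal).

d)
x    x̄      (x − x̄)   (x − x̄)²
23   36.5   -13.5     182.25
45   36.5   8.5       72.25
32   36.5   -4.5      20.25
14   36.5   -22.5     506.25
56   36.5   19.5      380.25
45   36.5   8.5       72.25
30   36.5   -6.5      42.25
47   36.5   10.5      110.25
                      ∑(x − x̄)² = 1386

Variance: $S^2 = \dfrac{\sum (x - \bar{X})^2}{n - 1} = \dfrac{1386}{8 - 1} = 198$

Standard deviation: $S = \sqrt{198} \approx 14.07$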
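
Hand calculations on ungrouped data are easy to cross-check with Python's standard `statistics` module. This is a minimal sketch for the eight values of this question; it treats the data as a sample, so the variance uses the n − 1 divisor.

```python
import statistics

data = [23, 45, 32, 14, 56, 45, 30, 47]

mean = statistics.mean(data)
median = statistics.median(data)        # sorts the data internally
modes = statistics.multimode(data)      # list of all most-frequent values
variance = statistics.variance(data)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)        # sample standard deviation

print(mean, median, modes, variance, round(std_dev, 2))
```

`statistics.multimode` returns every value that shares the highest frequency, so a list with one element means the data set is unimodal and a list with two elements means it is bimodal.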
Grouped Data (Discrete)
The local Dress Rack store conducted an inventory of their sales to determine which
sizes to order for the fall season. The following data represent the number of dresses sold
this month by size

Size No of dresses
sold
4 8
6 23
8 12
10 35
12 3

Calculate :
a) mean
b) median
c) mode
d) Standard deviation
Answer:
a) Mean:

Size (x)   No of dresses sold (f)   fx          cf
4          8                        32          8
6          23                       138         31
8          12                       96          43
10         35                       350         78
12         3                        36          81
           ∑f = 81                  ∑fx = 652

$\bar{X} = \dfrac{\sum fx}{\sum f} = \dfrac{652}{81} = 8.05$

b) Median: position $= \dfrac{n+1}{2} = \dfrac{81+1}{2} = 41$st. From the cumulative frequencies, the 41st value falls in size 8, so the median is size 8.

c) Mode: here the mode is size 10 because it has the highest frequency.

d) Standard Deviation

f    Size (x)   x̄      (x − x̄)   (x − x̄)²   f(x − x̄)²
8    4          8.05   -4.05     16.40      131.2
23   6          8.05   -2.05     4.20       96.6
12   8          8.05   -0.05     0.0025     0.03
35   10         8.05   1.95      3.80       133
3    12         8.05   3.95      15.60      46.8
                                            ∑f(x − x̄)² = 407.63

$S = \sqrt{\dfrac{\sum f(x - \bar{X})^2}{n - 1}} = \sqrt{\dfrac{407.63}{81 - 1}} = \sqrt{5.10} \approx 2.26$
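
For grouped discrete data the same quantities are weighted by the frequencies. The sketch below recomputes the mean and the sample standard deviation for the dress-size table without intermediate rounding; the variable names are illustrative assumptions.

```python
from math import sqrt

sizes = [4, 6, 8, 10, 12]
sold = [8, 23, 12, 35, 3]          # frequency of each size

n = sum(sold)                                             # total dresses sold
grouped_mean = sum(f * x for x, f in zip(sizes, sold)) / n
squared_dev = sum(f * (x - grouped_mean) ** 2 for x, f in zip(sizes, sold))
grouped_sd = sqrt(squared_dev / (n - 1))                  # sample formula, n - 1 divisor

print(round(grouped_mean, 2), round(grouped_sd, 2))
```

Because the code does not round the mean before squaring the deviations, its sum of squares differs slightly from a table built with rounded values, but the standard deviation agrees to two decimal places.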

Grouped Data (Continuous)

The table below shows the speed of 100 vehicles that pass by a small town in a certain period of
time.

a) Mean
b) Median
c) Mode
d) Variance & Standard deviation

Answer:

a) Mean

Speed (km/hour)   No. of vehicles (f)   Midpoint (x)   fx           Cf
40-44             8                     42             336          8
45-49             18                    47             846          26
50-54             16                    52             832          42
55-59             26                    57             1482         68
60-64             22                    62             1364         90
65-69             10                    67             670          100
                  ∑f = 100                             ∑fx = 5530

$\bar{X} = \dfrac{\sum fx}{\sum f} = \dfrac{5530}{100} = 55.30$

b) Median:

Position: $\dfrac{n}{2} = 50$, so the median class is 55-59.

$L = 54.5$, $\sum f_{m-1} = 42$, $f_m = 26$, $C = 5$

$$\tilde{X} = L + \dfrac{\frac{n}{2} - \sum f_{m-1}}{f_m} \times C = 54.5 + \dfrac{50 - 42}{26} \times 5 = 56.04$$

c) Mode:

The modal class is 55-59, so $L = 54.5$.

$f_0 - f_1 = 26 - 16 = 10$; $f_0 - f_2 = 26 - 22 = 4$

$$\hat{X} = L + \dfrac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)} \times C = 54.5 + \dfrac{10}{10 + 4} \times 5 = 58.07$$

d) Variance & Standard deviation:

f    Midpoint (x)   x̄       (x − x̄)   (x − x̄)²   f(x − x̄)²
8    42             55.30   -13.30    176.89     1415.12
18   47             55.30   -8.30     68.89      1240.02
16   52             55.30   -3.30     10.89      174.24
26   57             55.30   1.70      2.89       75.14
22   62             55.30   6.70      44.89      987.58
10   67             55.30   11.70     136.89     1368.90
                                                 ∑f(x − x̄)² = 5261.00

Variance $= \dfrac{\sum f(x - \bar{X})^2}{\sum f} = \dfrac{5261}{100} = 52.61$

Standard deviation $= \sqrt{52.61} \approx 7.25$

(Dividing by $\sum f - 1 = 99$ instead, as in the sample formula, gives $\sqrt{53.14} \approx 7.29$.)
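
The mode formula for continuous grouped data can be sketched as a function. It assumes equal class widths and treats a missing neighbouring class (when the modal class is first or last) as having frequency 0; `grouped_mode` is an illustrative name, not part of the notes.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode = L + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * C for the modal class."""
    i = max(range(len(frequencies)), key=frequencies.__getitem__)  # modal class index
    f0 = frequencies[i]
    f1 = frequencies[i - 1] if i > 0 else 0                        # class before
    f2 = frequencies[i + 1] if i + 1 < len(frequencies) else 0     # class after
    return lower_boundaries[i] + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * width

speed_mode = grouped_mode([39.5, 44.5, 49.5, 54.5, 59.5, 64.5],
                          [8, 18, 16, 26, 22, 10], 5)
print(round(speed_mode, 2))
```

If the modal class and both neighbours have equal frequencies the denominator is zero, so the formula (and this sketch) does not apply in that case.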
Find out missing frequency
Table shows the age of 40 tourists who visited a tourist spot.

Age Frequency
10-19 4
20-29 m
30-39 n
40-49 10
50-59 8
Given that the median age is 35.5, find the value of m and of n.

Answer:
Age Frequency Cumulative frequency
10-19 4 4
20-29 m 4+m
30-39 n 4+m+n
40-49 10 14+m+n
50-59 8 22+m+n

Here the total number of tourists is 40, so the final cumulative frequency (22+m+n) must equal the
number of tourists, 40.
That means, 22+m+n=40
n= 40-22-m
n=18-m ……….(1)
According to the question the median age is 35.5, which lies in the median class 30-39 (lower
boundary L = 29.5, class width C = 10, frequency n). Writing N = 40 for the total number of
tourists, the median is

$$\text{Median} = L + \dfrac{\frac{N}{2} - \sum f_{m-1}}{f_m} \times C$$

$$35.5 = 29.5 + \dfrac{\frac{40}{2} - (4 + m)}{n} \times 10$$

$$35.5 - 29.5 = \dfrac{20 - 4 - m}{n} \times 10$$

$$6 = \dfrac{16 - m}{n} \times 10$$

6n = 10(16-m)
6n = 160-10m
3n = 80 – 5m ……………..(2)
Substituting (1) into (2) gives the value of m:
3(18-m) = 80-5m
54-3m = 80-5m
2m = 26
m = 13
Putting m = 13 back into (1):
n = 18-m
n = 18-13
n = 5
So, m=13, n=5.
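
The values of m and n can also be confirmed by a brute-force search. This sketch tries every whole-number split of the 18 missing tourists and keeps the one that reproduces a median of 35.5; exact fractions are used to avoid rounding problems, and all constant names are assumptions made for the illustration.

```python
from fractions import Fraction

TOTAL = 40
KNOWN = 4 + 10 + 8                 # frequencies already given in the table
LOWER = Fraction(59, 2)            # lower boundary 29.5 of the 30-39 class
WIDTH = 10                         # class width
TARGET_MEDIAN = Fraction(71, 2)    # 35.5

solutions = []
for m in range(TOTAL - KNOWN + 1):
    n = TOTAL - KNOWN - m
    if n == 0:
        continue                   # the median class cannot be empty
    median = LOWER + (Fraction(TOTAL, 2) - (4 + m)) / n * WIDTH
    if median == TARGET_MEDIAN:
        solutions.append((m, n))

print(solutions)
```

A search like this only checks the answer; the algebraic solution above is still the method expected in a written answer.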

Shapes of distribution:
Pearson’s Coefficient
The weekly income of all part time employees of a fast-food restaurant chain was organized into the
following frequency distribution.

Weekly income   No of employees
100-150         5
150-200         9
200-250         20
250-300         18
300-350         5
350-400         3
Compute Karl Pearson's coefficient of skewness and conclude on the shape of the distribution.

Answer:
Weekly income   f    x     fx     Cf   x̄     (x − x̄)   (x − x̄)²   f(x − x̄)²
100-150         5    125   625    5    240   -115      13225      66125
150-200         9    175   1575   14   240   -65       4225       38025
200-250         20   225   4500   34   240   -15       225        4500
250-300         18   275   4950   52   240   35        1225       22050
300-350         5    325   1625   57   240   85        7225       36125
350-400         3    375   1125   60   240   135       18225      54675
∑f = 60, ∑fx = 14400, ∑f(x − x̄)² = 221500

Mean,

$\bar{X} = \dfrac{\sum fx}{\sum f} = \dfrac{14400}{60} = 240$

Median:

Position: $\dfrac{n}{2} = 30$, so the median class is 200-250.

$L = 200$, $\sum f_{m-1} = 14$, $f_m = 20$, $C = 50$

$$\tilde{X} = L + \dfrac{\frac{n}{2} - \sum f_{m-1}}{f_m} \times C = 200 + \dfrac{30 - 14}{20} \times 50 = 200 + 40 = 240$$

Standard deviation:

$S = \sqrt{\dfrac{\sum f(x - \bar{X})^2}{n - 1}} = \sqrt{\dfrac{221500}{60 - 1}} = \sqrt{3754.24} \approx 61.27$

Pearson coefficient, $r = \dfrac{3(\bar{X} - \tilde{X})}{S} = \dfrac{3(240 - 240)}{61.27} = 0$

r = 0, so the distribution is symmetric.
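
Pearson's coefficient can be recomputed from the class midpoints and frequencies. In this sketch the median is taken from the interpolation formula for the 200-250 class, and the standard deviation uses the n − 1 divisor; the variable names are illustrative assumptions.

```python
from math import sqrt

midpoints = [125, 175, 225, 275, 325, 375]
freqs = [5, 9, 20, 18, 5, 3]

n = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
sd = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1))

# Median class 200-250: L = 200, cumulative frequency before = 14, f_m = 20, C = 50
median = 200 + (n / 2 - 14) / 20 * 50

skewness = 3 * (mean - median) / sd
print(mean, median, round(sd, 2), skewness)
```

A coefficient of zero indicates a symmetric distribution, a positive value indicates a right (positive) skew, and a negative value indicates a left (negative) skew.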

Greater Dispersion/CV

Answer:
Series A
Weekly income   f    x    fx     x̄       (x − x̄)   (x − x̄)²   f(x − x̄)²
10-20           10   15   150    42.86   -27.86    776.18     7761.8
20-30           16   25   400    42.86   -17.86    318.98     5103.68
30-40           30   35   1050   42.86   -7.86     61.78      1853.4
40-50           40   45   1800   42.86   2.14      4.58       183.2
50-60           26   55   1430   42.86   12.14     147.38     3831.88
60-70           18   65   1170   42.86   22.14     490.18     8823.24
∑f = 140, ∑fx = 6000, ∑f(x − x̄)² = 27557.2

Mean,

$\bar{X} = \dfrac{\sum fx}{\sum f} = \dfrac{6000}{140} = 42.86$

Standard Deviation:

$S = \sqrt{\dfrac{\sum f(x - \bar{X})^2}{n - 1}} = \sqrt{\dfrac{27557.2}{140 - 1}} = \sqrt{198} \approx 14$

$CV = \left(\dfrac{S}{\bar{X}}\right) \times 100\% = \left(\dfrac{14}{42.86}\right) \times 100\% = 32.66\%$

Series B
Weekly income   f    x    fx     x̄    (x − x̄)   (x − x̄)²   f(x − x̄)²
10-20           22   15   330    39   -24       576        12672
20-30           18   25   450    39   -14       196        3528
30-40           32   35   1120   39   -4        16         512
40-50           34   45   1530   39   6         36         1224
50-60           18   55   990    39   16        256        4608
60-70           16   65   1040   39   26        676        10816
∑f = 140, ∑fx = 5460, ∑f(x − x̄)² = 33360

Mean,

$\bar{X} = \dfrac{\sum fx}{\sum f} = \dfrac{5460}{140} = 39$

Standard Deviation:

$S = \sqrt{\dfrac{\sum f(x - \bar{X})^2}{n - 1}} = \sqrt{\dfrac{33360}{140 - 1}} = \sqrt{240} \approx 15.5$

$CV = \left(\dfrac{S}{\bar{X}}\right) \times 100\% = \left(\dfrac{15.5}{39}\right) \times 100\% = 39.74\%$

Series B has greater dispersion than series A.
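
The comparison of the two series can be checked with a short function that computes the coefficient of variation from midpoints and frequencies. This is a sketch that uses the n − 1 divisor and no intermediate rounding, so its percentages differ slightly from values obtained with a rounded standard deviation; `coefficient_of_variation` is an illustrative name.

```python
from math import sqrt

def coefficient_of_variation(midpoints, freqs):
    """Return CV = (S / mean) * 100 for grouped data, with S using n - 1."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    sd = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1))
    return sd / mean * 100

midpoints = [15, 25, 35, 45, 55, 65]
series_a = [10, 16, 30, 40, 26, 18]
series_b = [22, 18, 32, 34, 18, 16]

cv_a = coefficient_of_variation(midpoints, series_a)
cv_b = coefficient_of_variation(midpoints, series_b)
print(round(cv_a, 2), round(cv_b, 2))
```

The series with the larger CV has the greater relative dispersion, regardless of the units or the size of the mean.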

Chapter 6: Probability
Formulas:

$$P(A) = \dfrac{\text{Number of ways that event A can occur}}{\text{Total number of possible outcomes}} = \dfrac{n(A)}{n(S)}$$

Not mutually exclusive events (Addition rule)
P(A∪B) = P(A) + P(B) − P(A∩B)
P(A∩B) = P(A) + P(B) − P(A∪B)

Mutually exclusive events
P(A∪B) = P(A) + P(B)

If a bar (complement) is given,
P(A ∩ B̄) = P(A) − P(A∩B) or, P(A ∪ B̄) = 1 − P(B) + P(A∩B)
P(Ā ∩ B) = P(B) − P(A∩B) or, P(Ā ∪ B) = 1 − P(A) + P(A∩B)
P(Ā ∩ B̄) = 1 − P(A∪B) or, P(Ā ∪ B̄) = 1 − P(A∩B)
Probability tree diagram:
Example 1:

A bag contains 3 black balls and 5 white balls. Paul picks a ball at random
from the bag and replaces it back in the bag. He mixes the balls in the bag
and then picks another ball at random from the bag.
a) Construct a probability tree of the problem.
b) Calculate the probability that Paul picks:
i) two black balls
ii) a black ball in his second draw

Solution:

a) Check that the probabilities in the last column add up to 1.

b)
i) To find the probability of getting two black balls, first locate the B
branch and then follow the second B branch. Since these are
independent events we can multiply the probability of each branch.

P(B, B) $= \dfrac{3}{8} \times \dfrac{3}{8} = \dfrac{9}{64}$

ii) There are two outcomes where the second ball can be black:
either (B, B) or (W, B).

P(second ball black)
= P(B, B) or P(W, B)
= P(B, B) + P(W, B)
$= \dfrac{9}{64} + \dfrac{15}{64} = \dfrac{24}{64} = \dfrac{3}{8}$
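
A probability tree for two draws with replacement can be enumerated in code by multiplying along each branch. This is a minimal sketch for the bag with 3 black and 5 white balls, using exact fractions; the dictionary names are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

# Probability of each colour on a single draw (the ball is replaced each time)
p = {"B": Fraction(3, 8), "W": Fraction(5, 8)}

# Each path through the tree: multiply the probabilities along the branches
outcomes = {(first, second): p[first] * p[second]
            for first, second in product(p, repeat=2)}

p_two_black = outcomes[("B", "B")]
p_second_black = sum(pr for (first, second), pr in outcomes.items() if second == "B")

print(p_two_black, p_second_black, sum(outcomes.values()))
```

The final sum is the check suggested in part a): the probabilities of all paths through the tree must add up to 1.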

Example 2:
Bag A contains 10 marbles of which 2 are red and 8 are black. Bag B contains 12 marbles of
which 4 are red and 8 are black. A marble is drawn at random from each bag.
a) Draw a probability tree diagram to show all the outcomes of the experiment.
b) Find the probability that:
(i) both are red.
(ii) both are black.
(iii) one black and one red.
(iv) at least one red.

Solution:

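a) A probability tree diagram that shows all the outcomes of the experiment.
b) The probability that:
(i) both are red.

P(R, R) $= \dfrac{2}{10} \times \dfrac{4}{12} = \dfrac{8}{120} = \dfrac{1}{15}$

(ii) both are black.

P(B, B) $= \dfrac{8}{10} \times \dfrac{8}{12} = \dfrac{64}{120} = \dfrac{8}{15}$

(iii) one black and one red.

P(R, B) or P(B, R) $= \dfrac{2}{10} \times \dfrac{8}{12} + \dfrac{8}{10} \times \dfrac{4}{12} = \dfrac{16}{120} + \dfrac{32}{120} = \dfrac{48}{120} = \dfrac{2}{5}$

(iv) at least one red.

1 − P(B, B) $= 1 - \dfrac{8}{15} = \dfrac{7}{15}$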

Example 3:

A box contains 4 red and 2 blue chips. A chip is drawn at random and then replaced. A
second chip is then drawn at random. 
a) Show all the possible outcomes using a probability tree diagram. 
b) Calculate the probability of getting: 
(i) at least one blue. 
(ii) one red and one blue. 
(iii) two of the same colour.

Solution:
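a) A probability tree diagram to show all the possible outcomes.

b) The probability of getting:

(i) at least one blue.

P(R, B) or P(B, R) or P(B, B) $= \dfrac{4}{6} \times \dfrac{2}{6} + \dfrac{2}{6} \times \dfrac{4}{6} + \dfrac{2}{6} \times \dfrac{2}{6} = \dfrac{8}{36} + \dfrac{8}{36} + \dfrac{4}{36} = \dfrac{20}{36} = \dfrac{5}{9}$

(ii) one red and one blue.

P(R, B) or P(B, R) $= \dfrac{8}{36} + \dfrac{8}{36} = \dfrac{16}{36} = \dfrac{4}{9}$

(iii) two of the same colour.

P(R, R) or P(B, B) $= \dfrac{4}{6} \times \dfrac{4}{6} + \dfrac{2}{6} \times \dfrac{2}{6} = \dfrac{16}{36} + \dfrac{4}{36} = \dfrac{20}{36} = \dfrac{5}{9}$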

Example 4

A card is taken at random from a pack of 52 playing cards, and then replaced. A
second card is then drawn at random from the pack.
Use a tree diagram to determine the probabilities in parts (a) to (d).
We first note that, for a single card drawn from the pack,
p(Diamond) $= \dfrac{13}{52} = \dfrac{1}{4}$ and p(not Diamond) $= \dfrac{39}{52} = \dfrac{3}{4}$.
We put these probabilities on the branches of the tree diagram.
Note also that the probability for each combination, for example, two Diamonds, is
determined by multiplying the probabilities along the branches.

(a) both cards are Diamonds,
p(both Diamonds) $= \dfrac{1}{4} \times \dfrac{1}{4} = \dfrac{1}{16}$

(b) at least one card is a Diamond,
p(at least one Diamond) $= \dfrac{1}{16} + \dfrac{3}{16} + \dfrac{3}{16} = \dfrac{7}{16}$

(c) exactly one card is a Diamond,
p(exactly one Diamond) $= \dfrac{3}{16} + \dfrac{3}{16} = \dfrac{6}{16} = \dfrac{3}{8}$

(d) neither card is a Diamond.
p(neither card a Diamond) $= \dfrac{3}{4} \times \dfrac{3}{4} = \dfrac{9}{16}$

Example 5
The probability that a patient is allergic to penicillin is .20. Suppose this drug is administered
to three patients. Find the probability:
a)all three of them are allergic to it.
b)exactly one patient allergic to it.
c)at least one patient allergic to it.
Answer:

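a) P(A and B and C) = P(A) P(B) P(C) = (0.20)(0.20)(0.20) = 0.008

b) Each way of having exactly one allergic patient has probability (0.20)(0.80)(0.80) = 0.128, and there are three such ways:
P(exactly one patient allergic) = 0.128 + 0.128 + 0.128 = 0.384
c) P(at least one patient allergic) = 1 − P(no patient allergic) = 1 − (0.80)(0.80)(0.80) = 1 − 0.512 = 0.488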
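
The three answers can be verified by listing all eight allergic / not-allergic combinations for the three patients. This sketch assumes the patients are independent, as the multiplication in part a) does; the function name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

P_ALLERGIC = Fraction(1, 5)   # 0.20

def prob(outcome):
    """Probability of one combination, e.g. (True, False, False)."""
    result = Fraction(1)
    for allergic in outcome:
        result *= P_ALLERGIC if allergic else 1 - P_ALLERGIC
    return result

outcomes = list(product([True, False], repeat=3))   # all 8 combinations

p_all_three = prob((True, True, True))
p_exactly_one = sum(prob(o) for o in outcomes if sum(o) == 1)
p_at_least_one = sum(prob(o) for o in outcomes if any(o))

print(float(p_all_three), float(p_exactly_one), float(p_at_least_one))
```

Summing every combination with at least one allergic patient gives the same result as the shortcut 1 − P(no patient allergic).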

Venn Diagram
Example 1:
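In a school of 320 students, 85 students are in the band, 200 students are on sports teams, and
60 students participate in both activities. Find the probability that a student is involved in either
band or sports.

Answer:
Let A = the student is in the band and B = the student is on a sports team.
P(A) $= \dfrac{85}{320}$
P(B) $= \dfrac{200}{320}$
P(A∩B) $= \dfrac{60}{320}$
P(A∪B) = P(A)+P(B)-P(A∩B)
$= \dfrac{85 + 200 - 60}{320}$
$= \dfrac{225}{320} = \dfrac{45}{64} \approx 0.70$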

Example 2:
P(A) = 0.4, P(B) = 0.3, P(A ∪ B) = 0.6. Find a) P(A ∩ B) b) P(A') c) P(A' ∩ B') d) Draw
these events out on a Venn Diagram

a)
P(A ∩ B) = P(A) + P(B) - P(A ∪ B)
P(A ∩ B) = 0.4 + 0.3 - 0.6
P(A ∩ B) = 0.1

b)
P(A') = 1 - P(A)
P(A') = 1 - 0.4
P(A') = 0.6

c)
P(A' ∩ B') - this means that event A doesn't occur AND event B doesn't occur. We know
that P(A ∪ B) = 0.6; this means the probability of event A OR B occurring is 0.6. Therefore
the probability that event A AND B DON'T occur, must be 1 - P(A ∪ B). This equals 0.4.
 
d) 

Example 3:
The probability that a child in a school has green eyes is 0.37, and the probability they
have black hair is 0.45. The probability that the child has either green eyes or black hair
or both is 0.80. A child is randomly selected from the school. What is the probability that
the child has a) black hair and green eyes b) black hair but not green eyes c) neither
black hair nor green eyes?

P(BH) = 0.45; P(GE) = 0.37; P(BH ∪ GE) = 0.8

a)
P(BH AND GE) = P(BH ∩ GE)
P(BH ∩ GE) = P(BH) + P(GE) - P(BH ∪ GE)
P(BH ∩ GE) = 0.45 + 0.37 - 0.8
P(BH ∩ GE) = 0.02

b)
P(BH AND NOT GE) = P(BH ∩ GE')
If we visualise a Venn diagram, we want the BH circle without the intersection with GE
circle. Therefore, we want BH minus P(BH ∩ GE):
P(BH ∩ GE') = P(BH) - P(BH ∩ GE)
P(BH ∩ GE') = 0.45 - 0.02
P(BH ∩ GE') = 0.43

c)
P(NOT BH AND NOT GE) = P(BH' ∩ GE')
P(BH' ∩ GE') = 1 - P(BH ∪ GE)
P(BH' ∩ GE') = 1 - 0.8
P(BH' ∩ GE') = 0.2

Example 4:
An integer is selected randomly from a set of integers {1,2,3,4,5,6,7,8,9,10,11,12}. Find the
probability that the integer is
(a)an even number or is divisible by 3
(b)an even number and is not divisible by 3
(c)not an even number and is not divisible by 3
Answer:
Let A = event that an even number is chosen
= {2, 4, 6, 8, 10, 12}
Let B = event that the chosen number is divisible by 3
= {3, 6, 9, 12}
a) P(A∪B) = P(A) + P(B) − P(A∩B)
$= \dfrac{6}{12} + \dfrac{4}{12} - \dfrac{2}{12} = \dfrac{8}{12} = \dfrac{2}{3}$
b) P(A ∩ B̄) = P(A) − P(A∩B)
$= \dfrac{6}{12} - \dfrac{2}{12} = \dfrac{4}{12} = \dfrac{1}{3}$
c) P(Ā ∩ B̄) = 1 − P(A∪B)
$= 1 - \dfrac{2}{3} = \dfrac{1}{3}$

Example:
Two dice are tossed. Find the probability that:
a) the sum of the two numbers is an even number.
b) the sum of the two numbers is an even number or is divisible by 3.
Answer:
A = the sum of the two numbers is an even number
B = the sum of the two numbers is divisible by 3
a) P(A) $= \dfrac{18}{36} = \dfrac{1}{2}$, or 0.50
b) P(B) $= \dfrac{12}{36} = \dfrac{1}{3}$, or 0.33, and P(A∩B) $= \dfrac{6}{36} = \dfrac{1}{6}$, or 0.17
P(A∪B) = P(A) + P(B) − P(A∩B)
$= \dfrac{18}{36} + \dfrac{12}{36} - \dfrac{6}{36} = \dfrac{24}{36} = \dfrac{2}{3} \approx 0.67$
(Using the rounded decimals, 0.50 + 0.33 − 0.17 = 0.66.)
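
The counts 18, 12 and 6 out of 36 can be confirmed by listing all outcomes of two dice. This is a short sketch; `P` is a hypothetical helper that takes a condition on the sum of the two dice.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # all 36 equally likely outcomes

def P(condition):
    """Probability that the sum of the two dice satisfies the condition."""
    favourable = sum(1 for roll in rolls if condition(sum(roll)))
    return Fraction(favourable, len(rolls))

p_even = P(lambda s: s % 2 == 0)
p_div3 = P(lambda s: s % 3 == 0)
p_both = P(lambda s: s % 2 == 0 and s % 3 == 0)
p_either = P(lambda s: s % 2 == 0 or s % 3 == 0)

print(p_even, p_div3, p_both, p_either)
```

Counting the "either" event directly gives the same value as the addition rule P(A) + P(B) − P(A∩B).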
