Statistics For Year Two
UNIT 5 STATISTICS
TOPICS:
Data collection
Measures of dispersion
LEARNING OUTCOMES
Dear student, in this topic we are expected to find out the meaning of statistics and to discover
how statistics is applied in real-life situations.
Meaning of statistics
Statistics is a branch of mathematics that deals with the collection, organization, analysis,
interpretation and presentation of data. This should lead to meaningful conclusions being drawn
from the presented data. Generally, statistical investigations fall into two broad categories called
descriptive and inferential statistics.
Descriptive statistics
Descriptive statistics deals with the processing of data without attempting to draw any inferences
or conclusions from it. The data are presented in the form of tables and graphs. The characteristics of
the data are described in simple terms. Events that are dealt with include everyday happenings
such as prices of goods, business, incomes, epidemics, accidents, sports and population data.
Inferential statistics
Inferential statistics is a scientific discipline that uses mathematical tools to make forecasts and
projections by analyzing the given data. This is of use to people employed in fields like
engineering, business, agriculture, social sciences and communications.
Application of statistics
Statistical studies are of great importance in our everyday life. Every human being at a certain
level is a statistician. Below are some major areas where statistics is applied.
Government agencies
The government uses statistics to make decisions about population, health, education etc. It may
conduct research on education to check the progress of primary school pupils.
Medical field
The medical field would be far less effective without research to see which medicines or
interventions work best and how our human bodies react to treatment. Medical professionals also
perform studies by race, age, gender or nationality to see the effect of these characteristics on
health.
Psychology
Although this is attached to both the science and medical field, success in psychology would be
impossible without the systematic study of human behavior.
Education
At all levels of education and testing, there are statistical reports about learner performance from
kindergarten to university. Teachers are encouraged to be researchers in their own classrooms, in
order to see which methods work for which students and to understand why.
Large companies
Every large company employs its own statistical research divisions or firms to research issues
related to products, employees, customer service, etc.
Types of data
Quantitative data
Qualitative data
Primary data
Secondary data
Quantitative data
Data that is expressed in numbers and summarized using statistics to give meaningful
information is referred to as quantitative data. Examples of quantitative data are temperatures,
heights, weights, ages of students, etc.
Qualitative data
When we use data for description without measurement, we call it qualitative data. Examples of
qualitative data are colour (red, blue, yellow, pink, etc.), students’ attitude towards school,
attitudes towards exams, etc. Such data cannot be easily summarized using statistics.
Primary data
When we obtain data directly from individuals, objects or processes, we refer to it as primary
data. Quantitative or qualitative data can be collected using this approach. Such data is usually
collected specifically for the research problem that you will study.
Secondary data
When you use data that another researcher or agency originally gathered and then made
available, you are gathering secondary data. Examples of secondary data are census data published by
the Uganda Bureau of Statistics, price data published by a newspaper, etc.
1. Observation
In an observational data collection method, you acquire data by observing any relationship that
may be present in the event you are studying. Making direct observations of simple
phenomena can be a very quick and effective way of collecting data with minimal interruption. There
are four types of observational methods that are available to you as a researcher:
Cross-sectional
Case-control
Cohort
Ecological
Cross-sectional
In a cross-sectional study, you collect data on the observed relationship only once. This
method has the advantage of being cheaper and taking less time than the case-control and
cohort methods.
Case-control
In a case-control method, you create cases and controls and then observe them.
Cohort
In the cohort method, you follow people with similar characteristics over a period. This method is
advantageous when you are collecting data on occurrences that happen over a long period.
Ecological method
When you are interested in studying a population instead of individuals, you use the ecological
method. For example, you would use it if you were interested in malaria infection rates in Iganga
and Tororo districts.
2. Interviews
Interviews help researchers uncover rich, deep insight and learn information that they may have
missed otherwise. Below are the various ways of conducting interviews:
In-person interview
Telephone interview
Online interview
3. Questionnaires
These are forms which are completed and returned by respondents. The use of questionnaires is
cheap and useful in places where people are literate. Questionnaires are stand-alone instruments
of data collection that may be administered to the sample subjects. They have long been one of
the most popular data collection techniques.
4. Checklists
These are items that comprise several questions on a topic and require the same response format.
Checklists structure a person’s observation or evaluation of a performance or item. They can be
a simple list of criteria that can be marked as present or absent, or can provide space for observer
comments.
In this topic, we shall learn how to find the mean, mode and median of ungrouped data.
The frequency of an event is the number of times that the event occurs. When you have a large
amount of data, you need to organize it in a table. The frequency comes from adding together the
tallies. Each stroke / stands for 1. We mark a stroke in the row for each item up to four, and the
fifth is shown by striking through the four strokes to make a bundle of five.
Examples
Forty people were asked to judge which of five paintings, labeled P, Q, R, S and T, was the best.
The results are given below.
Q R Q Q S P T P
R R P P Q S T P
Q S Q Q Q R P S
T Q R P P R S Q
R R P S Q Q S P
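The tallying described above can also be checked by machine. The following is a minimal Python sketch, not part of the original notes, that counts the forty votes from the example; the variable names (rows, votes, frequency) are chosen here for illustration only.

```python
from collections import Counter

# Votes copied row by row from the example (5 rows of 8 responses).
rows = [
    "QRQQSPTP",
    "RRPPQSTP",
    "QSQQQRPS",
    "TQRPPRSQ",
    "RRPSQQSP",
]
votes = "".join(rows)

# Counter plays the role of the tally column: one count per painting.
frequency = Counter(votes)

for painting in sorted(frequency):
    count = frequency[painting]
    # Show the count as bundles of five plus the leftover single strokes.
    bundles, strokes = divmod(count, 5)
    tally = " ".join(["||||/"] * bundles + (["|" * strokes] if strokes else []))
    print(f"{painting}  {tally:<20} {count}")

# The frequencies should add up to the number of people asked.
total = sum(frequency.values())
print("Total:", total)
```

Adding the frequencies and comparing the total with the number of people asked is a quick way to confirm that no response was missed or counted twice.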
MEAN
The mean is by far the most commonly used measure of central tendency. It is obtained by
adding all the data items and then dividing the sum by the number of items.
Examples
Note. The Greek letter sigma, Ʃ, meaning ‘the sum of’, is always used in the formula for finding the
mean. You should also note that the mean of a sample is written symbolically as x̄ (read ‘x bar’).
The shoe sizes of nine boys are 7, 8, 6, 9, 8, 6, 7, 9 and 10. Find their mean shoe size.
We shall find the mean by adding the shoe sizes and dividing the sum by the number of data
items.
\[ \bar{x} = \frac{\sum x}{n} \]
where \sum x represents the sum of the items and n stands for the number of items.
\[ \bar{x} = \frac{70}{9} \approx 7.78 \]
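As a check on the working above, here is a short Python sketch of the same calculation. It is only an illustration; the variable names are chosen here and do not come from the notes.

```python
# Shoe sizes of the nine boys from the example.
shoe_sizes = [7, 8, 6, 9, 8, 6, 7, 9, 10]

total = sum(shoe_sizes)   # the sum of the items
n = len(shoe_sizes)       # the number of items
mean = total / n          # the sum divided by the number of items

print(f"mean = {total}/{n} = {mean:.2f}")
```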
When many data values occur more than once and a frequency distribution is used to organize
the data, we may use the formula below to calculate the mean.
\[ \text{Mean } \bar{x} = \frac{\sum fx}{\sum f} \]
where \sum fx represents the sum of all the products obtained by multiplying each data value by its
frequency, and \sum f is the sum of the frequencies.
Example
Cards were numbered 30, 40, 50, 60 and 70. The number of cards bearing each score is recorded in the table below.
Score 30 40 50 60 70
Frequency 2 3 8 12 5
Questions
Score (x)    Frequency (f)    fx
30           2                60
40           3                120
50           8                400
60           12               720
70           5                350
             Ʃf = 30          Ʃfx = 1650
\[ \text{Mean } \bar{x} = \frac{\sum fx}{\sum f} = \frac{1650}{30} = 55 \]
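The fx column and the mean in this example can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, with names chosen here for illustration.

```python
# Scores and their frequencies from the example table.
scores = [30, 40, 50, 60, 70]
frequencies = [2, 3, 8, 12, 5]

# The fx column: each score multiplied by its frequency.
fx = [x * f for x, f in zip(scores, frequencies)]

sum_f = sum(frequencies)   # sum of the frequencies
sum_fx = sum(fx)           # sum of the fx products
mean = sum_fx / sum_f

print("fx column:", fx)
print(f"mean = {sum_fx}/{sum_f} = {mean}")
```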
Task
Score 11 13 15 17 19
Frequency 3 5 11 16 8
MODE
The mode of a distribution is a data value that occurs most often in a data set. If more than one
data value has the highest frequency, then each of these data values is a mode. If there is no data
value that occurs most often, then the distribution has no mode.
Example
(b) 17, 11, 25, 14, 11, 15, 22, 16, 25, 20.
(c) Score 3 4 5 6 7 8 9
Frequency 5 7 9 11 6 4 2
Response
(a) The number 9 occurs more often than any other. So the mode = 9.
(b) The numbers 11 and 25 are both modes, since each occurs twice, which is more often than any other value.
(c) The highest frequency is 11. The score with this frequency is 6. So the mode = 6.
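Parts (b) and (c) can be checked with the Python sketch below. It uses multimode from Python's standard statistics module (available from Python 3.8), which returns every value that shares the highest frequency; the variable names are chosen here for illustration.

```python
from statistics import multimode

# Part (b): a plain list of data values.
data_b = [17, 11, 25, 14, 11, 15, 22, 16, 25, 20]
modes_b = multimode(data_b)   # all values with the highest frequency

# Part (c): a frequency table, so look for the highest frequency directly.
scores = [3, 4, 5, 6, 7, 8, 9]
frequencies = [5, 7, 9, 11, 6, 4, 2]
highest = max(frequencies)
modes_c = [x for x, f in zip(scores, frequencies) if f == highest]

print("modes of (b):", modes_b)
print("highest frequency in (c):", highest, "so mode(s):", modes_c)
```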
Task
(b) 42, 57, 85, 24, 57, 45, 65, 85, 87, 42, 70.
MEDIAN
The median of a distribution is the value below which half of the data items lie. To find the median of a
group of data items, we:
Arrange the data items in order. This can be either ascending or descending.
Check whether the number of data items is odd or even.
If the number of data items n is odd, the median is the data item in the middle of the list. It is the
value in the \frac{n+1}{2} position.
If the number of data items is even, the median is the mean of the two middle data items.
Examples
(b) 47, 12, 35, 8, 28, 16, 56, 40, 24, 39.
Response
(a) Arrange the data items in order first. The number of items, 7, is odd. So the median is the
middle number.
\[ \text{Position} = \frac{n+1}{2} = \frac{7+1}{2} = \frac{8}{2} = 4 \]
The median is the data item in the 4th position.
(b) Arrange the data items in order first, from the smallest to the largest:
8, 12, 16, 24, 28, 35, 39, 40, 47, 56.
The number of data items is 10. The median is the value in the \frac{n+1}{2} position.
\[ \frac{n+1}{2} = \frac{10+1}{2} = \frac{11}{2} = 5.5 \]
This is the 5.5th position, which means that the median is the mean of the data items in positions 5 and 6.
\[ \text{Median} = \frac{28+35}{2} = \frac{63}{2} = 31.5 \]
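The steps for finding the median (sort, check whether the count is odd or even, then pick the middle item or average the two middle items) can be sketched in Python as follows. The function name median is a helper written here for illustration; it is not from the notes.

```python
def median(values):
    """Return the median of a list of numbers using the steps in the text."""
    ordered = sorted(values)          # arrange the data items in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle item, i.e. position (n + 1) / 2.
        return ordered[middle]
    # Even count: the mean of the two middle items.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Part (b) of the example.
data_b = [47, 12, 35, 8, 28, 16, 56, 40, 24, 39]
median_b = median(data_b)

print("ordered:", sorted(data_b))
print("median:", median_b)
```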
2. The points scored by a team in various events are shown in the table below. Find the median
of the data.
Points 0 1 2 3 4 5 6 7
No. of events 1 4 5 3 7 4 3 2
We shall have columns labeled x, f and cumulative frequency (CF). The cumulative frequency of a row is
the running total of the frequencies from the first row up to and including that row.
Points (x) Frequency (f) Cumulative frequency ( CF)
0 1 1
1 4 5
2 5 10
3 3 13
4 7 20
5 4 24
6 3 27
7 2 29
The median is the value in the \frac{n+1}{2} position, where n = 29 is the total number of events.
\[ \frac{n+1}{2} = \frac{29+1}{2} = \frac{30}{2} = 15 \]
The median is therefore in the 15th position. This falls in the row where the cumulative frequency is 20,
since that row contains all the points scored from the 14th to the 20th position, and the corresponding
number of points is 4. Therefore the median is 4.
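The cumulative frequency column and the search for the median row can be sketched in Python as below. This is an illustration under the assumption that the total frequency is odd, as it is here, so the median position is a whole number; the names are chosen here and are not from the notes.

```python
from itertools import accumulate

# Points and number of events from the example table.
points = [0, 1, 2, 3, 4, 5, 6, 7]
frequencies = [1, 4, 5, 3, 7, 4, 3, 2]

# Running total of the frequencies gives the CF column.
cumulative = list(accumulate(frequencies))

n = cumulative[-1]        # total number of events
position = (n + 1) / 2    # position of the median

# The median is in the first row whose CF reaches the median position.
median = next(x for x, cf in zip(points, cumulative) if cf >= position)

print("CF column:", cumulative)
print(f"position = {position}, median = {median}")
```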
Task
(b) 61, 54, 19, 87, 24, 90, 55, 34, 78, 47, 28, 36.
MEASURE OF DISPERSION
RANGE
The range is the difference between the highest value and the lowest value in a given data set.
Example
Year two students scored the following marks in a piece of course work: 12, 20, 15, 25, 16, 17, 18, 24 and
10.
Range = highest value - lowest value
= 25 - 10
= 15
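The same calculation as a small Python sketch, for checking only. The name data_range is used here because range is already the name of a built-in Python function.

```python
# Course work marks from the example.
marks = [12, 20, 15, 25, 16, 17, 18, 24, 10]

highest = max(marks)
lowest = min(marks)
data_range = highest - lowest   # highest value minus lowest value

print(f"range = {highest} - {lowest} = {data_range}")
```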
The class intervals are usually of equal widths. The number of classes should be between 5 and
20. Classes are shown in the first column and frequencies in the subsequent column.
The marks obtained by students in a Maths test are shown below. Represent the data in a
frequency table with class intervals of 10, starting from 40.
44 54 85 92 73 57 99 91 96 74 75 70
83 49 52 57 64 40 65 82 90 70 88 91
52 64 82 73 59 67 73 78 81 89 53 52
First draw a frequency table with three columns (class, tally and frequency). Tally each mark once
and cross it out to avoid counting it twice. The first class interval, starting from 40, is 40-49.
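One way to check a hand-made tally for this data is the Python sketch below. It assumes class intervals of width 10 starting at 40, as the question asks; the variable names are chosen here for illustration.

```python
from collections import Counter

# The 36 marks from the question, row by row.
marks = [
    44, 54, 85, 92, 73, 57, 99, 91, 96, 74, 75, 70,
    83, 49, 52, 57, 64, 40, 65, 82, 90, 70, 88, 91,
    52, 64, 82, 73, 59, 67, 73, 78, 81, 89, 53, 52,
]

start = 40   # lower limit of the first class
width = 10   # class interval width

# Map each mark to the lower limit of its class, then count per class.
counts = Counter(start + ((m - start) // width) * width for m in marks)

# Build the frequency table as (class label, frequency) pairs.
table = [
    (f"{low}-{low + width - 1}", counts[low])
    for low in range(start, max(marks) + 1, width)
]

for label, frequency in table:
    print(f"{label}  {frequency}")
print("Total:", sum(f for _, f in table))
```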
Example
Sixty students measured the time, in minutes, that it takes them to travel to school.
Time (minutes) 1-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40
No. of students 6 8 10 11 12 6 4 3
First draw a frequency distribution table with all the required columns
MEAN
The mean of grouped data is calculated by finding the mid-interval value (mid-mark) x of each
class and multiplying it by the class frequency f. A mid-interval value is found by working out the
mean of the two class limits. For example, the mid-interval value of the class 1-5 is (1 + 5)/2 = 3.
\[ \text{Mean} = \frac{\sum fx}{\sum f} = \frac{1100}{60} \approx 18.33 \]
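The mid-interval values and the totals used in this calculation can be reproduced with the Python sketch below. It uses the travel-time table from the example; the variable names are chosen here for illustration.

```python
# Class limits (minutes) and number of students from the example table.
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35), (36, 40)]
frequencies = [6, 8, 10, 11, 12, 6, 4, 3]

# Mid-interval value: the mean of the two class limits.
midpoints = [(low + high) / 2 for low, high in classes]

# fx column: each mid-interval value multiplied by its class frequency.
fx = [f * x for f, x in zip(frequencies, midpoints)]

sum_f = sum(frequencies)
sum_fx = sum(fx)
mean = sum_fx / sum_f

print("mid-interval values:", midpoints)
print(f"mean = {sum_fx}/{sum_f} = {mean:.2f}")
```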
MEDIAN
The median lies in the median class. The median class is the class that contains the halfway position,
and it is found using the CF column. A class such as 20-24 has class boundaries 19.5 and 24.5: 19.5 is
called the lower class boundary and 24.5 is called the upper class boundary. These are useful in
estimating the median of grouped data by use of the formula given below.
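\[ \text{Median} = L_m + \left( \frac{\frac{N}{2} - cf_b}{f_m} \right) \times i \]
Where L_m is the lower class boundary of the median class, N is the total frequency, cf_b is the
cumulative frequency of the class before the median class, f_m is the frequency of the median class
and i is the class width.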
To estimate the median, first find the median class, which contains the \frac{60}{2} = 30th position. Thus
the median class is 16-20. So L_m = 15.5, N = 60, cf_b = 24, f_m = 11 and i = (20 - 16) + 1 = 5.
\[ \text{Median} = L_m + \left( \frac{\frac{N}{2} - cf_b}{f_m} \right) \times i \]
\[ = 15.5 + \left( \frac{\frac{60}{2} - 24}{11} \right) \times 5 \]
\[ = 15.5 + \frac{30}{11} \]
\[ = 15.5 + 2.727 \]
\[ = 18.227 \]
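The estimate above can be checked with the following Python sketch. It assumes whole-number class limits, so that each class boundary lies 0.5 below the lower limit, as in the example; the variable names are chosen here for illustration.

```python
from itertools import accumulate

# Travel-time table from the example.
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35), (36, 40)]
frequencies = [6, 8, 10, 11, 12, 6, 4, 3]

cumulative = list(accumulate(frequencies))   # CF column
n = cumulative[-1]                           # total frequency N
half = n / 2                                 # the N/2 position

# Median class: the first class whose CF reaches the N/2 position.
m = next(k for k, cf in enumerate(cumulative) if cf >= half)

lower_limit, upper_limit = classes[m]
lower_boundary = lower_limit - 0.5              # L_m (assumes integer limits)
width = upper_limit - lower_limit + 1           # i
cf_before = cumulative[m - 1] if m > 0 else 0   # cf_b
f_m = frequencies[m]

median = lower_boundary + (half - cf_before) / f_m * width

print("median class:", classes[m])
print(f"median = {median:.3f}")
```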
MODE
We first need to find the modal class, which is the class with the highest frequency. The actual
mode may not even lie in that class, so the mode can only be estimated, using the following
formula.
\[ \text{Mode} = L + \left( \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \right) \times i \]
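First find the modal class. This is the class with the highest frequency. It is 21-25.
Using the formula
\[ \text{Mode} = L + \left( \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \right) \times i \]
with
L = 20.5 (the lower class boundary of the modal class)
f_m = 12 (the frequency of the modal class)
f_1 = 11 (the frequency of the class before the modal class)
f_2 = 6 (the frequency of the class after the modal class)
i = 5 (the class width)
\[ \text{Mode} = 20.5 + \left( \frac{12 - 11}{(12 - 11) + (12 - 6)} \right) \times 5 \]
\[ = 20.5 + \frac{5}{7} \]
\[ \approx 21.214 \]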
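The mode estimate can be checked with the Python sketch below. It assumes whole-number class limits (boundary 0.5 below the lower limit) and treats a missing neighbouring class as having frequency 0; the variable names are chosen here for illustration.

```python
# Travel-time table from the example.
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35), (36, 40)]
frequencies = [6, 8, 10, 11, 12, 6, 4, 3]

# Modal class: the class with the highest frequency.
m = frequencies.index(max(frequencies))

lower_limit, upper_limit = classes[m]
lower_boundary = lower_limit - 0.5      # L (assumes integer limits)
width = upper_limit - lower_limit + 1   # i

f_m = frequencies[m]                                         # modal class frequency
f_1 = frequencies[m - 1] if m > 0 else 0                     # class before
f_2 = frequencies[m + 1] if m < len(frequencies) - 1 else 0  # class after

mode = lower_boundary + (f_m - f_1) / ((f_m - f_1) + (f_m - f_2)) * width

print("modal class:", classes[m])
print(f"mode = {mode:.3f}")
```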
Once the ogive has been drawn, it can be used to estimate values for the data, including the
median, quartiles and percentiles. Percentiles divide the distribution of the data into 100 equal parts.
The table below shows the frequency distribution of Mathematics test marks for 120
students.
Marks 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100
Frequency 2 6 8 14 19 26 23 12 7 3
Marks    Frequency    Cumulative frequency    Upper class boundary
1-10     2            2                       10.5
11-20    6            8                       20.5
21-30    8            16                      30.5
31-40    14           30                      40.5
41-50    19           49                      50.5
51-60    26           75                      60.5
61-70    23           98                      70.5