Lecture 3-Exploring and Making Sense of Data-Deriving Information
Lecture 3-Class Discussion Notes
BM & EBL Year 1
Kelebogile Kenalemang
Introduction
• We are now moving on to discuss ways of organising and presenting
data so that relevant information can be derived.
• The ideas and techniques considered here are very simple but they
must be applied appropriately if they are to result in useful and timely
information management.
• In this lecture it is assumed that the required data have been
collected by an appropriate method from an appropriate population
or sample.
• We are now ready to process and organise the data to find the
required information.
Frequency Distributions
• One of the most frequently used ways in which data are organized is
by means of frequency distributions.
Class  Frequency
90-99  4
80-89  6
70-79  4
60-69  3
50-59  2
40-49  1
• A frequency table lists intervals or ranges of data values called data
classes together with the number of data values from the set that are
in each class.
• This number is called the frequency of the class.
Types of Quantitative Frequency Tables
• Discrete data: data that take only whole-number values and are
obtained by counting, such as the number of defective items in a
batch or the number of students in a class. The values cannot be
subdivided, e.g. you can't have half a student or 1/3 of a student.
How to Construct an Ungrouped
Frequency Table
• In a certain area, 50 households were surveyed.
• The following data give the occupancy of each household.
4 7 4 1 4 2 3 6 3 5
6 3 4 9 12 1 3 4 2 2
1 1 3 8 1 1 4 2 3 4
3 2 1 4 6 5 6 1 2 3
4 4 4 1 4 2 3 5 4 4
• Construct a frequency distribution for these data.
• The discrete variable is 'the number of occupants in the household'.
• The minimum value is 1 and the maximum value is 12, hence, in this
data set, the variable takes values from 1 to 12 inclusive.
• The number of households with a given number of occupants is the
'frequency'.
• Determine the frequency of each value using a tally system. It is
necessary (and desirable) to read through the data set only once,
which reduces the risk of error.
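The tally for the household data can also be checked with a short script. This is only a minimal sketch in Python; the variable names `occupancy` and `frequency` are chosen here for illustration and are not part of the lecture.

```python
from collections import Counter

# Household occupancy data from the survey of 50 households.
occupancy = [
    4, 7, 4, 1, 4, 2, 3, 6, 3, 5,
    6, 3, 4, 9, 12, 1, 3, 4, 2, 2,
    1, 1, 3, 8, 1, 1, 4, 2, 3, 4,
    3, 2, 1, 4, 6, 5, 6, 1, 2, 3,
    4, 4, 4, 1, 4, 2, 3, 5, 4, 4,
]

# One pass through the data, like a tally: count how often each value occurs.
frequency = Counter(occupancy)

# List every value from the minimum to the maximum,
# including values that never occur (frequency 0).
for value in range(min(occupancy), max(occupancy) + 1):
    print(f"{value:>2}  {frequency[value]}")
```

The frequencies should add up to 50, the number of households surveyed, which is a quick way to check a hand tally.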
Grouped Frequency Table/Distribution
• When dealing with large amounts of data it is usual to group the data
into classes.
• For example, in a weight loss program, the weights in kilograms of 30
people were measured using a scale.
• We could list all 30 weights here but such a long list of data is
cumbersome.
• Instead, we can group the data into several weight ranges or classes.
• Such a table is called a Grouped frequency distribution.
Steps for constructing a grouped
frequency table from a data set
• 1. If the number of classes is not given, decide on a number
of classes to use. This number should be between 5 and 20.
• 2. Find the class width: determine the range (max-min) of the
data and divide it by the number of classes you chose in step 1.
• 3. Round the result up to the next convenient number (if it is
already a whole number, still round up to the next whole number).
• 4. Find the class limits: use the minimum data entry as
the lower limit of the first class. To get the lower limit of the next
class, add the class width. Continue until you reach the last class.
• 5. Find the upper limit of each class. The classes cannot
overlap; if your data include decimal numbers, it is fine for the
upper limits to be decimals.
• 6. Count the number of data entries in each class, and record the
number in the row of the table for that class. (The book recommends
using "tally" marks to count.)
• The groups are usually referred to as classes or class intervals.
• The range of values included in a class is referred to as the class
width. The minimum and maximum values included in the class are
referred to as class boundaries.
• The numbers used to specify the class are called the class limits.
Example of a Grouped Frequency Table
• For example, let’s say you have a list of IQ scores for a gifted
classroom in an elementary school.
• The IQ scores are: 118, 123, 124, 125, 127, 128, 129, 130, 130, 133,
136, 138, 141, 142, 149, 150, 154.
• That list doesn’t tell you much about anything. You could draw a
frequency distribution table, which will give a better picture of your
data than a simple list.
• Pick 5 classes for this example.
• Find the class width by calculating the range and dividing it by the
number of classes you picked above. Range = max - min = 154 - 118 = 36.
• Class width = range ÷ number of classes = 36 ÷ 5 = 7.2, which is
rounded up to 8.
• Find the class limits: use the minimum data entry as the lower limit of
the first class. For example, 118 will be the lower limit of the first class.
• Add the class width, 8, to 118 to get the next lower class limit:
118 + 8 = 126.
• Keep adding the class width to each lower class limit until you
have created the number of classes you chose in Step 1.
• We chose 5 classes, so our 5 lower class limits are:
118
126
134
142
150
• Write down the upper class limits.
• These are the highest values that can be in each class, so in most
cases you can subtract 1 from the class width and add the result to
the lower class limit. For example, 118 + (8 - 1) = 125, which gives:
118-125
126-133
134-141
142-149
150-157
Finishing up the table
• Add a second column for the frequencies
IQ Frequency
118-125 4
126-133 6
134-141 3
142-149 2
150-157 2
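The construction steps can be sketched as a short Python function and applied to the IQ scores. This is an illustration only: the function name `grouped_frequency_table` is made up here, and the sketch assumes whole-number data, so the class width is always rounded up to the next whole number and each upper limit sits (width - 1) above its lower limit.

```python
import math

def grouped_frequency_table(data, num_classes):
    """Return a list of (lower_limit, upper_limit, frequency) rows."""
    lowest, highest = min(data), max(data)
    # Range divided by the number of classes, always rounded up
    # to the next whole number (even if the result is already whole).
    width = math.floor((highest - lowest) / num_classes) + 1
    rows = []
    for i in range(num_classes):
        lower = lowest + i * width   # lower class limit
        upper = lower + width - 1    # upper class limit (whole-number data)
        frequency = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, frequency))
    return rows

iq_scores = [118, 123, 124, 125, 127, 128, 129, 130, 130, 133,
             136, 138, 141, 142, 149, 150, 154]

iq_table = grouped_frequency_table(iq_scores, 5)
for lower, upper, frequency in iq_table:
    print(f"{lower}-{upper}  {frequency}")
```

For data with decimals, the upper limits would need a different rule, as noted in the construction steps.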
Class exercise
• Construct a frequency table with 6 data classes from the following
data set.
• Amount of gas purchased by 28 drivers:
• 7, 4, 18, 4, 9, 8, 8, 7, 6, 2, 9, 5, 9, 12, 4, 14, 15, 7, 10, 2, 3, 11, 4, 4, 9,
12, 5, 3
Cumulative Frequency Distribution
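• The cumulative frequency of a class is the sum of the frequency of
that class and the frequencies of all classes listed before it.
Class  Frequency  Cumulative frequency
90-99  4  4
80-89  6  10
70-79  4  14
60-69  3  17
50-59  2  19
40-49  1  20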
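The cumulative frequency column is a running total of the frequency column. A minimal Python sketch, using the class labels and frequencies from the table above (the variable names are illustrative):

```python
from itertools import accumulate

classes = ["90-99", "80-89", "70-79", "60-69", "50-59", "40-49"]
frequencies = [4, 6, 4, 3, 2, 1]

# Running total of the frequencies, in the order the classes are listed.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label}  {f}  {cf}")
```

The last cumulative frequency always equals the total number of data values, here 20.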
Class exercise
• Add a cumulative frequency column to the table of gas purchases.
Relative Frequency
• The relative frequency of a data class is the proportion of data
elements in that class (it can also be expressed as a percentage). We
can calculate the relative frequency for each class as follows:
• Relative frequency = class frequency ÷ total number of data values
Class  Frequency  Cumulative frequency  Relative frequency
90-99  4  4  .20
80-89  6  10  .30
70-79  4  14  .20
60-69  3  17  .15
50-59  2  19  .10
40-49  1  20  .05
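The relative frequencies can be reproduced by dividing each class frequency by the total. A minimal Python sketch, using the frequencies from the cumulative frequency table (the variable names are illustrative):

```python
classes = ["90-99", "80-89", "70-79", "60-69", "50-59", "40-49"]
frequencies = [4, 6, 4, 3, 2, 1]

total = sum(frequencies)  # total number of data values

# Relative frequency = class frequency divided by the total.
relative = [f / total for f in frequencies]

for label, f, rf in zip(classes, frequencies, relative):
    print(f"{label}  {f}  {rf:.2f}")
```

The relative frequencies of all classes should add up to 1 (or 100%).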
Class exercise
Frequency Polygon
• In a frequency polygon, as in a histogram, the vertical axis represents frequency and
the horizontal axis represents the variable being measured in the data set. To construct
the graph, a point is plotted for each class at its midpoint, with height given by the
frequency of the class. The points are then connected by straight lines.
Figure: An example of a frequency polygon
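The points of a frequency polygon can be computed before drawing. The sketch below, in Python, reuses the classes 40-49 to 90-99 from the frequency table earlier in the lecture as an assumed example; it only computes the (midpoint, frequency) points and leaves the drawing to paper or a charting tool.

```python
# Class limits and frequencies, listed from the lowest class to the highest.
class_limits = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
frequencies = [1, 2, 3, 4, 6, 4]

# Each point sits at the class midpoint, at a height equal to the class frequency.
points = [((lower + upper) / 2, f)
          for (lower, upper), f in zip(class_limits, frequencies)]

for midpoint, f in points:
    print(f"midpoint {midpoint}: frequency {f}")
```

Joining these points in order with straight lines gives the polygon.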
Interpretation of Charts and Diagrams
• It is very easy to produce diagrams which mislead the user.
• Great care should always be taken to ensure that any diagram which
you draw conveys an accurate impression of the data.
• By the same token, you should also critically examine diagrams
produced by others, in order to avoid being misled yourself.
Reading Assignment
• Study the advantages and disadvantages of using each chart.
• Know at least four advantages and four disadvantages of each.