Lecture 2 Statistics

Uploaded by

phen zanuth
Chapter 2

Frequency Distributions and Graphs

Danet Hak, PhD December 3, 2024


Outline
1. Introduction
2. Organizing Data
3. Histograms, Frequency Polygons, and Ogives
4. Other Types of Graphs
Introduction
In this chapter we will learn:
• How to organize data by constructing frequency distribution tables
• How to present the data by constructing charts and graphs
o Histograms
o Frequency polygons
o Ogives
o Pie graphs
o Pareto charts
o Time series graphs
Organizing Data
• When data are collected in original form, they are called raw data.
• A frequency distribution is the organization of raw data in table form, using classes and frequencies.
Organizing Data
Example. You want to investigate the age distribution of the world's richest people.
=> First, get the ages of the world's 50 richest people from Forbes Magazine (the data source).
=> Then sort the data set into a frequency table.

Raw data:
49 57 38 73 81
74 59 76 65 69
54 56 69 68 78
65 85 49 69 61
48 81 68 37 43
78 82 43 64 67
52 56 81 77 79
85 40 85 59

(Figure: raw data => frequency distribution table)
Why Are Frequency Distributions Necessary?
The reasons for constructing a frequency distribution are as follows:
o To organize the data in a meaningful, intelligible way.
o To enable the reader to determine the nature or shape of the distribution.
o To facilitate computational procedures for measures of average and spread (more detail in chapter 3).
o To enable the researcher to draw charts and graphs for the presentation of data.
o To enable the reader to make comparisons among different data sets.
Two Types of Frequency Distributions
(1) Categorical frequency distributions
- Used for categorical/qualitative data => nominal or ordinal measurement scale (political affiliation, religious affiliation, blood type, etc.)
(2) Grouped frequency distributions
- Used for quantitative data => interval or ratio measurement scale (population age, scores, mileage of travelers, economic group)
- Used when the range of the data is very large.
- Data must be grouped into classes that are more than one unit in width.
Categorical Frequency Distributions
Example: Blood Type Frequency Distribution
Twenty-five army inductees were given a blood test to determine their blood type. The data set is A; B; B; AB; O; O; O; B; AB; B; B; B; O; A; O; A; O; O; O; AB; AB; A; O; B and A. Construct a frequency distribution for the data.

Blood type (classes)   Tally   Frequency   Percentage
A
B
AB
O
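
As a rough sketch of how the tally could be done programmatically, the Python snippet below counts the 25 blood types listed in the example and turns each count into a percentage. The function name `categorical_distribution` is a hypothetical helper chosen for illustration, not part of the lecture.

```python
from collections import Counter

# Blood types of the 25 inductees, copied from the example above.
blood_types = "A B B AB O O O B AB B B B O A O A O O O AB AB A O B A".split()

def categorical_distribution(values, classes):
    """Return (class, frequency, percentage) rows in the given class order."""
    counts = Counter(values)
    total = len(values)
    return [(c, counts[c], 100 * counts[c] / total) for c in classes]

rows = categorical_distribution(blood_types, ["A", "B", "AB", "O"])
for blood_type, freq, pct in rows:
    print(f"{blood_type:<3} {freq:>2} {pct:5.1f}%")
```

The frequencies should add up to the number of observations (25) and the percentages to 100, which is a quick check on any hand tally.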
Grouped Frequency Distributions
Ex. Grouped frequency distribution table of the number of hours that boat batteries lasted

Raw data:
49 57 38 73 81
74 59 76 65 69
54 56 69 68 78
65 85 49 69 61
48 81 68 37 43
78 82 43 64 67
52 56 81 77 79
85 40 85 59 80

(Figure: from raw data to grouped frequency distribution)
Terms Associated with a Grouped Frequency Distribution
• Class limits represent the smallest and largest data values that can be included in a class. (They should have the same decimal place value as the data.)
• Class boundaries are the midpoints between the upper class limit of one class and the lower class limit of the next class: the true limits of a class.
o Used to separate the classes so that there are no gaps in the frequency distribution
o Should have one more decimal place than the data (e.g. if the data are 1, 2, 3, 4, 5, 6, the class boundaries must end in .5)
• Class width is computed by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.
• Cumulative frequency is the frequency of the data values less than or equal to a specific value (usually an upper boundary). Cumulative frequencies show how many data values have accumulated up to and including a specific class.
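
The definitions above can be sketched in a few lines of Python. The helper names and the three illustrative class limits are assumptions made for this example; the `unit` parameter stands for the precision of the data (1 for whole numbers).

```python
def class_boundaries(limits, unit=1):
    """Turn (lower, upper) class limits into (lower, upper) class boundaries.

    Each boundary sits half a unit outside the corresponding limit, so it
    has one more decimal place than the data.
    """
    half = unit / 2
    return [(lo - half, hi + half) for lo, hi in limits]

def class_width(limits):
    """Lower limit of the second class minus lower limit of the first."""
    return limits[1][0] - limits[0][0]

def class_midpoints(limits):
    """Midpoint of each class: (lower limit + upper limit) / 2."""
    return [(lo + hi) / 2 for lo, hi in limits]

# Illustrative class limits for whole-number data.
limits = [(100, 104), (105, 109), (110, 114)]
print(class_boundaries(limits))   # boundaries end in .5
print(class_width(limits))
print(class_midpoints(limits))
```

Note that the width is taken between consecutive lower limits, not between the limits of a single class (104 - 100 would give 4, not 5).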
Some considerations when creating a grouped frequency table
1. Number of classes/groups
=> There is no strict rule for deciding the number of classes. Yet it is important to have enough classes to present a clear description of the data.
2. Class width
=> The class width should be an odd number, to ensure that the midpoint of each class has the same place value as the data.
3. Classes must be mutually exclusive
=> No overlapping class limits, so that a data value cannot be placed into two classes.
4. Classes must be exhaustive => there should be enough classes to accommodate all the data.
5. Classes must be continuous (no gaps between classes).
6. Classes must be equal in width.
o An exception occurs when a distribution has a class that is open-ended, that is, the class has no specific beginning value or no specific ending value. A frequency distribution with an open-ended class is called an open-ended distribution.
(Ex. 15 to  25, less than 1 to 10………..)
Some considerations when creating a grouped frequency table
Different kinds of classes

Appropriate   Non-continuous   Overlapped   Open-ended     Classes with unequal
classes       classes          classes      classes        width/overlapped classes
1-10          1-10             1-10         10 and below   1-10
11-20         21-30            10-20        11-20          11-20
21-30         31-40            20-30        21-30          20-30
31-40         41-50            30-40        31-40          31-38
41-50         51-60            40-50        41-50          39-44
51-60         61-70            50-60        51 and above   45-50

(The slide labels two of these columns "Incorrect class".)
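
One way to make the rules concrete is a small checker that looks for overlaps, gaps, and unequal widths in a list of class limits. This is only a sketch under stated assumptions: the function `check_classes` is hypothetical, it handles closed-ended classes in increasing order, and it assumes data recorded to the nearest `unit`. The columns tested are copied from the table above.

```python
def check_classes(limits, unit=1):
    """Return a list of problems found in a list of (lower, upper) class limits."""
    problems = []
    for (lo1, hi1), (lo2, hi2) in zip(limits, limits[1:]):
        if lo2 <= hi1:
            # A value equal to lo2 could be placed in both classes.
            problems.append(f"{lo1}-{hi1} overlaps {lo2}-{hi2}")
        elif lo2 - hi1 > unit:
            # Some values between hi1 and lo2 have no class.
            problems.append(f"gap between {lo1}-{hi1} and {lo2}-{hi2}")
    widths = {hi - lo + unit for lo, hi in limits}
    if len(widths) > 1:
        problems.append("unequal class widths")
    return problems

appropriate = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]
non_continuous = [(1, 10), (21, 30), (31, 40), (41, 50), (51, 60), (61, 70)]
overlapped = [(1, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
unequal = [(1, 10), (11, 20), (20, 30), (31, 38), (39, 44), (45, 50)]

for name, classes in [("appropriate", appropriate),
                      ("non-continuous", non_continuous),
                      ("overlapped", overlapped),
                      ("unequal", unequal)]:
    print(name, check_classes(classes))
```

Open-ended classes such as "10 and below" are left out because they have no numeric limit on one side; they would need separate handling.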
Procedure for Constructing a Grouped Frequency Distribution
Example: Grouped Frequency Distribution
These data represent the record high temperatures in degrees Fahrenheit (°F) for each of the 50 states in the US.

112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114

1. Construct a grouped frequency distribution for the data using 7 classes.
2. Find the relative frequency and cumulative frequency distributions of the data.
3. Analyze the distribution.
Solution
Frequency distribution table

Class limits   Class boundaries   Tally   Frequency   Relative frequency   Cumulative frequency
100-104        99.5-104.5
105-109        104.5-109.5
110-114        109.5-114.5
115-119        114.5-119.5
120-124        119.5-124.5
125-129        124.5-129.5
130-134        129.5-134.5

Lowest data value = 100
Highest data value = 134
Data range = 134 - 100 = 34
Number of classes = 7
Class width = 34/7 ≈ 4.9, rounded up to 5
Lower class limits = 100; 105; 110; 115; 120; 125; 130
Upper class limits = 104; 109; 114; 119; 124; 129; 134
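
The remaining columns of the table can be filled in by tallying the 50 temperatures into the seven classes. The sketch below does this in Python; the function `grouped_distribution` and its parameter names are assumptions made for illustration, and it assumes whole-number data so that each upper limit is the lower limit plus the width minus 1.

```python
# Record high temperatures (°F) for the 50 states, copied from the example.
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

def grouped_distribution(data, lowest_limit, width, n_classes):
    """Return rows of (lower, upper, frequency, relative, cumulative)."""
    rows = []
    cumulative = 0
    for i in range(n_classes):
        lo = lowest_limit + i * width
        hi = lo + width - 1  # upper class limit for whole-number data
        freq = sum(lo <= x <= hi for x in data)
        cumulative += freq
        rows.append((lo, hi, freq, freq / len(data), cumulative))
    return rows

table = grouped_distribution(temps, lowest_limit=100, width=5, n_classes=7)
for lo, hi, freq, rel, cum in table:
    print(f"{lo}-{hi}  {freq:>2}  {rel:.2f}  {cum:>2}")
```

The last cumulative frequency should equal the number of data values (50); if it does not, a value has fallen outside the classes or been counted twice.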
Another type of frequency distribution
Ungrouped frequency distribution
=> When the range of the data values is relatively small, a frequency distribution can be constructed using a single data value for each class. The resulting frequency distribution is called an "ungrouped frequency distribution".
Example: Ungrouped Frequency Distribution
The data shown here represent the number of miles per gallon (mpg) that 30 selected four-wheel-drive sports utility vehicles obtained in city driving. Construct a frequency distribution, and analyze the distribution.

12 17 12 14 16 18
16 18 12 16 17 15
15 16 12 15 16 16
12 14 15 12 15 15
19 13 16 18 16 14
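
A minimal Python sketch of the ungrouped distribution for these 30 values follows. Each distinct mpg value from the minimum to the maximum is its own class; the helper name `ungrouped_distribution` is hypothetical.

```python
from collections import Counter

# Miles per gallon for the 30 vehicles, copied from the example above.
mpg = [
    12, 17, 12, 14, 16, 18,
    16, 18, 12, 16, 17, 15,
    15, 16, 12, 15, 16, 16,
    12, 14, 15, 12, 15, 15,
    19, 13, 16, 18, 16, 14,
]

def ungrouped_distribution(data):
    """Return rows of (value, frequency, cumulative frequency)."""
    counts = Counter(data)
    rows = []
    cumulative = 0
    # Walk every whole value in the range so that empty classes show up as 0.
    for value in range(min(data), max(data) + 1):
        cumulative += counts[value]
        rows.append((value, counts[value], cumulative))
    return rows

for value, freq, cum in ungrouped_distribution(mpg):
    print(f"{value}  {freq:>2}  {cum:>2}")
```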
Let's move to the next topic...
Histograms, Frequency Polygons, and Ogives & Other Types of Graphs
The three most commonly used graphs in research
1. Histograms
2. Frequency polygons
3. Ogives, or cumulative frequency graphs
Histograms
• The histogram is a graph that displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes.

(Figure: histogram showing the frequency distribution of the distance in miles that 20 marathon runners ran during a given week.)
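
To show the idea without a plotting library, the sketch below draws a rough text histogram, turned on its side so that each class is one row. The class boundaries are those of the record-high-temperature example, and the frequencies are the ones obtained by tallying that data set into its seven classes; the function `text_histogram` is a hypothetical helper.

```python
def text_histogram(boundaries, frequencies):
    """Return one line per class: boundaries, a bar of '#', and the frequency."""
    lines = []
    for (lo, hi), freq in zip(boundaries, frequencies):
        lines.append(f"{lo:>5}-{hi:<5} | {'#' * freq} {freq}")
    return lines

boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
              (119.5, 124.5), (124.5, 129.5), (129.5, 134.5)]
frequencies = [2, 8, 18, 13, 7, 1, 1]

lines = text_histogram(boundaries, frequencies)
print("\n".join(lines))
```

Because each class ends at the boundary where the next one starts, the bars of a real histogram touch one another; a class with frequency 0 simply has no bar.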
Frequency polygon
• The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.

(Figure: frequency polygon showing the frequency distribution of the distance in miles that 20 marathon runners ran during a given week.)
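
The points of a frequency polygon can be computed directly from the class limits and frequencies. The sketch below uses the seven classes of the record-high-temperature example with the frequencies obtained by tallying that data. Adding a zero-frequency point one class width before the first midpoint and after the last is a common convention assumed here, not something the slide states; `polygon_points` is a hypothetical helper.

```python
def polygon_points(limits, frequencies):
    """Return (midpoint, frequency) pairs, anchored to the axis at both ends."""
    width = limits[1][0] - limits[0][0]
    mids = [(lo + hi) / 2 for lo, hi in limits]
    points = list(zip(mids, frequencies))
    # Assumed convention: close the polygon with zero-frequency end points.
    return [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]

limits = [(100, 104), (105, 109), (110, 114), (115, 119),
          (120, 124), (125, 129), (130, 134)]
frequencies = [2, 8, 18, 13, 7, 1, 1]

points = polygon_points(limits, frequencies)
for x, y in points:
    print(x, y)
```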
Ogive or cumulative frequency graph
(Ogive: pronounced o-jive).
• A cumulative frequency graph, or ogive, is a graph that represents the cumulative frequencies for the classes in a frequency distribution.

(Figure: cumulative frequency graph, or ogive, showing the cumulative frequency of the distance in miles that 20 marathon runners ran during a given week.)
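
The points of an ogive are the upper class boundaries paired with the running total of the frequencies, starting from zero at the lowest class boundary. The sketch below assumes the class boundaries of the record-high-temperature example and the frequencies obtained by tallying that data; `ogive_points` is a hypothetical helper.

```python
from itertools import accumulate

def ogive_points(boundaries, frequencies):
    """Return (upper boundary, cumulative frequency) pairs, starting at zero."""
    cumulative = list(accumulate(frequencies))
    start = (boundaries[0][0], 0)  # lowest class boundary, nothing accumulated yet
    return [start] + [(hi, c) for (_, hi), c in zip(boundaries, cumulative)]

boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
              (119.5, 124.5), (124.5, 129.5), (129.5, 134.5)]
frequencies = [2, 8, 18, 13, 7, 1, 1]

points = ogive_points(boundaries, frequencies)
for x, y in points:
    print(x, y)
```

The curve never decreases, and its last point sits at the total number of observations.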
Other Types of Graphs
Bar graph/bar chart - A bar graph represents the data by using vertical or horizontal bars whose heights or lengths represent the frequencies of the data.

(Figure: bar chart showing the average expenses of first-year college students)
Other Types of Graphs
Pareto charts - A Pareto chart is used to represent a frequency distribution for a categorical variable.
=> The frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.

(Figure: Pareto chart)
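
The only computation behind a Pareto chart is the ordering of the bars. As a small sketch, the Python snippet below sorts categories from highest to lowest frequency, using the blood type frequencies obtained by tallying the 25 inductees in the earlier example; `pareto_order` is a hypothetical helper.

```python
def pareto_order(frequencies):
    """Sort (category, frequency) pairs from highest to lowest frequency."""
    return sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

blood_type_counts = {"A": 5, "B": 7, "AB": 4, "O": 9}

ordered = pareto_order(blood_type_counts)
for category, freq in ordered:
    print(f"{category:<3} {'#' * freq} {freq}")
```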
Time Series Graph
Time series graph - A time series graph represents data that occur over a specific period of time.
=> Used when we want to see the variation or trend over time.
=> Two or more data sets can be compared on the same graph, called a compound time series graph.
Time Series Graph
(Figure: time series graph showing the trend of workplace homicides over the period 2003-2008)
(Figure: compound time series graph showing the variation in the number of elderly people in the U.S. labor force from 1960 to 2008)
The Pie Graph
Pie graph - A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.
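
The size of each wedge follows from the relative frequency: the percentage is the frequency divided by the total times 100, and the angle is the same fraction of 360 degrees. The sketch below applies this to the blood type frequencies obtained by tallying the 25 inductees in the earlier example; `pie_slices` is a hypothetical helper.

```python
def pie_slices(frequencies):
    """Map each category to (percentage, degrees) of its wedge."""
    total = sum(frequencies.values())
    return {cat: (100 * f / total, 360 * f / total)
            for cat, f in frequencies.items()}

blood_type_counts = {"A": 5, "B": 7, "AB": 4, "O": 9}

slices = pie_slices(blood_type_counts)
for category, (percent, degrees) in slices.items():
    print(f"{category:<3} {percent:5.1f}%  {degrees:6.1f} degrees")
```

The percentages should total 100 and the angles 360, which gives a simple check before drawing.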
The Pie Graph
Example: Pie chart of the number of crimes investigated by law enforcement officers in U.S. national parks during 1995
Misleading Graphs
- Inappropriate scale => the trend cannot be observed
- No axis label => unclear indication
Misleading Graphs
- No chart title => lack of information
- Inappropriate scale => unclear pattern
Misleading Graphs
- Too many categories in a chart
Misleading Graphs
- Improper axis label
- Improper comparison
Misleading Graphs
- Incorrect statistics
Misleading Graphs
- Unclear label on Y axis
- Unfair comparison
End of Chapter 2
Important terms
• Raw Data: Data collected in original form.
• Frequency : The number of times a certain value or class of values occurs.
• Frequency Distribution: The organization of raw data in table form with classes and frequencies.
• Categorical Frequency Distribution: A frequency distribution in which the data is only nominal or ordinal.
• Ungrouped Frequency Distribution: A frequency distribution of numerical data. The raw data is not grouped.
• Grouped Frequency Distribution: A frequency distribution where several numbers are grouped into one class.
• Class Limits: Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between
the upper limit of one class and the lower limit of the next.
• Class Boundaries: Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data
and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower
class boundary is found by subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding 0.5 units to the upper class
limit.
• Class Width: The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of
two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
• Class Mark (Midpoint): The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found
by adding the upper and lower boundaries and dividing by two.
• Cumulative Frequency: The number of values less than the upper class boundary for the current class. This is a running total of the frequencies.
• Relative Frequency: The frequency divided by the total frequency. This gives the proportion of values falling in that class (multiply by 100 for the percent).
• Cumulative Relative Frequency (Relative Cumulative Frequency): The running total of the relative frequencies, or the cumulative frequency divided by the total frequency. Gives the proportion of the values which are less than the upper class boundary (multiply by 100 for the percent).
• Histogram: A graph which displays the data by using vertical bars of various heights to represent frequencies. The horizontal axis can be either the class
boundaries, the class marks, or the class limits.
• Frequency Polygon: A line graph. The frequency is placed along the vertical axis and the class midpoints are placed along the horizontal axis. These
points are connected with lines.
• Ogive: A frequency polygon of the cumulative frequency or the relative cumulative frequency. The vertical axis is the cumulative frequency or relative cumulative frequency. The horizontal axis is the class boundaries. The graph always starts at zero at the lowest class boundary and ends at the total frequency (for a cumulative frequency) or 1.00 (for a relative cumulative frequency).
• Pareto Chart: A bar graph for qualitative data with the bars arranged according to frequency.
• Pie Chart: Graphical depiction of data as slices of a pie. The frequency determines the size of the slice. The number of degrees in any slice is the relative
frequency times 360 degrees.
• Pictograph: A graph that uses pictures to represent data.
• Stem and Leaf Plot: A data plot which uses part of the data value as the stem and the rest of the data value (the leaf) to form groups or classes. This is
very useful for sorting data quickly.
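
As an illustration of the stem and leaf plot, the sketch below uses the tens digit as the stem and the units digit as the leaf. The sample is the first ten values of the raw data set shown earlier in the chapter; `stem_and_leaf` is a hypothetical helper and assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group values by tens digit (stem); the units digits are the leaves."""
    groups = defaultdict(list)
    for value in sorted(data):
        groups[value // 10].append(value % 10)
    return {stem: groups[stem] for stem in sorted(groups)}

sample = [49, 57, 38, 73, 81, 74, 59, 76, 65, 69]

plot = stem_and_leaf(sample)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(map(str, leaves)))
```

Each row is effectively a class ten units wide, so the lengths of the rows give the shape of the distribution while every original value stays readable.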
