Lesson 2 Frequency Distribution and Graphs

Frequency distribution

Uploaded by

kentmatthewperez
Copyright
© All Rights Reserved
LESSON 2

Statistics: Frequency Distributions & Graphs

Definitions

Raw Data
Data collected in original form.
Frequency
The number of times a certain value or class of values occurs.
Frequency Distribution
The organization of raw data in table form with classes and
frequencies.
Categorical Frequency Distribution
A frequency distribution in which the data is only nominal or ordinal.
Ungrouped Frequency Distribution
A frequency distribution of numerical data. The raw data is not
grouped.
Grouped Frequency Distribution
A frequency distribution where several numbers are grouped into one
class.
Class Limits
Separate one class in a grouped frequency distribution from another.
The limits could actually appear in the data and have gaps between
the upper limit of one class and the lower limit of the next.
Class Boundaries
Separate one class in a grouped frequency distribution from another.
The boundaries have one more decimal place than the raw data and
therefore do not appear in the data. There is no gap between the
upper boundary of one class and the lower boundary of the next class.
The lower class boundary is found by subtracting 0.5 units from the
lower class limit and the upper class boundary is found by adding 0.5
units to the upper class limit.
Class Width
The difference between the upper and lower boundaries of any class.
The class width is also the difference between the lower limits of two
consecutive classes or the upper limits of two consecutive classes. It is
not the difference between the upper and lower limits of the same
class.
Class Mark (Midpoint)
The number in the middle of the class. It is found by adding the upper
and lower limits and dividing by two. It can also be found by adding the
upper and lower boundaries and dividing by two.
Cumulative Frequency
The number of values less than the upper class boundary for the
current class. This is a running total of the frequencies.
Relative Frequency
The frequency divided by the total frequency. This gives the proportion
of values falling in that class (multiply by 100 for the percent).
Cumulative Relative Frequency (Relative Cumulative Frequency)
The running total of the relative frequencies or the cumulative
frequency divided by the total frequency. Gives the proportion of the
values which are less than the upper class boundary.
Histogram
A graph which displays the data by using vertical bars of various
heights to represent frequencies. The horizontal axis can be either the
class boundaries, the class marks, or the class limits.
Frequency Polygon
A line graph. The frequency is placed along the vertical axis and the
class midpoints are placed along the horizontal axis. These points are
connected with lines.
Ogive
A frequency polygon of the cumulative frequency or the relative
cumulative frequency. The vertical axis is the cumulative frequency or
relative cumulative frequency. The horizontal axis is the class
boundaries. The graph always starts at zero at the lowest class
boundary and will end up at the total frequency (for a cumulative
frequency) or 1.00 (for a relative cumulative frequency).
Pareto Chart
A bar graph for qualitative data with the bars arranged according to
frequency.
Pie Chart
Graphical depiction of data as slices of a pie. The frequency
determines the size of the slice. The number of degrees in any slice is
the relative frequency times 360 degrees.
Pictograph
A graph that uses pictures to represent data.
Stem and Leaf Plot
A data plot which uses part of the data value as the stem and the rest
of the data value (the leaf) to form groups or classes. This is very
useful for sorting data quickly.
Percentile
The percent of the population which lies below that value. The data
must be ranked to find percentiles.
Quartile
Either the 25th, 50th, or 75th percentiles. The 50th percentile is also
called the median.
Decile
Either the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th
percentiles.
Lower Hinge
The median of the lower half of the numbers (up to and including the
median). The lower hinge is the first Quartile unless the remainder
when dividing the sample size by four is 3.
Upper Hinge
The median of the upper half of the numbers (including the median).
The upper hinge is the 3rd Quartile unless the remainder when dividing
the sample size by four is 3.

Box and Whiskers Plot (Box Plot)
A graphical representation of the minimum value, lower hinge, median,
upper hinge, and maximum. Some textbooks, and the TI-82 calculator,
define the five values as the minimum, first Quartile, median, third
Quartile, and maximum.
Five Number Summary
Minimum value, lower hinge, median, upper hinge, and maximum.
InterQuartile Range (IQR)
The difference between the 3rd and 1st Quartiles.
Outlier
An extremely high or low value when compared to the rest of the
values.
Mild Outliers
Values which lie between 1.5 and 3.0 times the InterQuartile Range
below the 1st Quartile or above the 3rd Quartile. Note, some texts use
hinges instead of Quartiles.
Extreme Outliers
Values which lie more than 3.0 times the InterQuartile Range below the
1st Quartile or above the 3rd Quartile. Note, some texts use hinges
instead of Quartiles.
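
The mild and extreme outlier definitions above can be sketched in a few lines of Python. This is only an illustration: the function name classify_outliers and the data are hypothetical, and the quartiles (q1 = 11, q3 = 15) are taken as given rather than computed, since texts differ on whether to use quartiles or hinges.

```python
def classify_outliers(values, q1, q3):
    """Label each value 'mild', 'extreme', or None (not an outlier)."""
    iqr = q3 - q1  # InterQuartile Range
    labels = {}
    for v in values:
        # Distance beyond the nearer quartile; 0 when v lies between q1 and q3.
        distance = max(q1 - v, v - q3, 0)
        if distance > 3.0 * iqr:
            labels[v] = "extreme"
        elif distance > 1.5 * iqr:
            labels[v] = "mild"
        else:
            labels[v] = None
    return labels


# Hypothetical data; q1 and q3 are assumed, not derived here.
values = [1, 10, 11, 12, 12, 13, 14, 15, 16, 40]
labels = classify_outliers(values, q1=11, q3=15)
outliers = {v: kind for v, kind in labels.items() if kind}
print(outliers)
```

Here the IQR is 4, so values more than 6 units outside the quartiles count as mild outliers and values more than 12 units outside count as extreme.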

Creating a Grouped Frequency Distribution

Guidelines for classes


1. There should be between 5 and 20 classes.
2. The class width should be an odd number. This will guarantee that the
class midpoints are integers instead of decimals.
3. The classes must be mutually exclusive. This means that no data value
can fall into two different classes.
4. The classes must be all inclusive or exhaustive. This means that all
data values must be included.
5. The classes must be continuous. There are no gaps in a frequency
distribution. Classes that have no values in them must be included
(unless it's the first or last class which are dropped).
6. The classes must be equal in width. The exception here is the first or
last class. It is possible to have a "below ..." or "... and above" class.
This is often used with ages.

Steps in Creating a Grouped Frequency Distribution


1. Select the number of classes desired. This is usually between 5 and 20.
2. Find the largest and smallest values.
3. Compute the Range = Maximum - Minimum.
4. Find the class width by dividing the range by the number of classes
and rounding up. There are two things to be careful of here. You
must round up, not off. Normally 3.2 would round to be 3, but in
rounding up, it becomes 4. If the range divided by the number of
classes gives an integer value (no remainder), then you can either add
one to the number of classes or add one to the class width. Sometimes
you're locked into a certain number of classes because of the
instructions.
5. Pick a suitable starting point less than or equal to the minimum value.
You will be able to cover: "the class width times the number of classes"
values. You need to cover one more value than the range. Follow this
rule and you'll be okay: The starting point plus the number of classes
times the class width must be greater than the maximum value. Your
starting point is the lower limit of the first class. Continue to add the
class width to this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract one from the lower
limit of the second class. Then continue to add the class width to this
upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting 0.5 units from the lower limits and
adding 0.5 units to the upper limits. The boundaries are also halfway
between the upper limit of one class and the lower limit of the
next class. Depending on what you're trying to accomplish, it may not
be necessary to find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're
trying to accomplish, it may not be necessary to find the cumulative
frequencies.
11. If necessary, find the relative frequencies and/or relative
cumulative frequencies.
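
The steps above can be sketched in Python for whole-number data. This is a minimal illustration, not a definitive implementation: the function name grouped_frequency and the sample data are hypothetical, the starting point is assumed to be the minimum value, and one unit is assumed to be 1.

```python
def grouped_frequency(data, num_classes):
    """Build a grouped frequency distribution for integer data.

    Each row is (lower limit, upper limit, lower boundary,
    upper boundary, frequency).
    """
    smallest, largest = min(data), max(data)
    data_range = largest - smallest
    # Round range / num_classes UP. When the quotient is already an
    # integer this adds one to the class width, as step 4 allows.
    width = data_range // num_classes + 1
    rows = []
    for i in range(num_classes):
        lower = smallest + i * width   # lower limits step by the width
        upper = lower + width - 1      # one unit below the next lower limit
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, lower - 0.5, upper + 0.5, freq))
    return rows


# Hypothetical data set of 12 whole numbers, grouped into 5 classes.
data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 35, 36]
table = grouped_frequency(data, 5)

cumulative = 0
for lower, upper, lb, ub, freq in table:
    cumulative += freq  # running total of the frequencies
    print(f"{lower}-{upper}  {lb}-{ub}  f={freq}  cf={cumulative}")
```

Because the width is always larger than the range divided by the number of classes, the starting point plus the number of classes times the width exceeds the maximum, which is the check given in step 5.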
Creating a Grouped Frequency Distribution and Constructing Graphs

Example:
Consider the data in Table I, which represent the lives of 40 similar
car batteries recorded to the nearest tenth of a year. The batteries were
guaranteed to last 3 years.

Table I.
Car Battery Lives

2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5

Steps

1. Decide on the number of class intervals required. Here we use 7.
2. Solve for the range (the difference between the highest score, HS, and
the lowest score, LS).
R = HS – LS = 4.7 – 1.6 = 3.1
3. Divide the range by the number of classes to estimate the approximate
width of the interval.
class width = range / number of class intervals
= 3.1/7
= 0.443
Rounding up to a convenient value gives c = 0.5.
4. List the lower class limit of the bottom interval and then the lower class
boundary. Add the class width to the lower class boundary to obtain the
upper class boundary. Write down the upper class limit.
If we begin the lowest class interval at 1.5, the lower class boundary for
this interval will be 1.45. To this we add the class width, 0.5, and find the
upper class boundary to be 1.95. The upper class limit is therefore 1.9.
5. List all the class limits and class boundaries by adding the class width to
the limits and boundaries of the previous interval.
6. Determine the class marks of each interval by averaging the class limits or
the class boundaries.
7. Tally the frequencies for each class.
8. Sum the frequency column and check against the total number of
observations.
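
The steps of this example can be checked with a short Python sketch. It is only an illustration under stated assumptions: the values are stored in tenths of a year (so 2.2 becomes 22) to keep the arithmetic exact, and the names TENTHS, START, WIDTH and NUM_CLASSES are hypothetical.

```python
# Battery lives from Table I, in tenths of a year (2.2 years -> 22).
TENTHS = [22, 41, 35, 45, 32, 37, 30, 26,
          34, 16, 31, 33, 38, 31, 47, 37,
          25, 43, 34, 36, 29, 33, 39, 31,
          33, 31, 37, 44, 32, 41, 19, 34,
          47, 38, 32, 26, 39, 30, 42, 35]

# Starting point 1.5, class width 0.5 and 7 classes, all in tenths.
START, WIDTH, NUM_CLASSES = 15, 5, 7

rows = []
for i in range(NUM_CLASSES):
    lower = START + i * WIDTH   # lower class limit
    upper = lower + WIDTH - 1   # one unit (0.1 year) below the next lower limit
    freq = sum(1 for x in TENTHS if lower <= x <= upper)  # tally
    rows.append({
        "limits": (lower / 10, upper / 10),
        "boundaries": ((lower - 0.5) / 10, (upper + 0.5) / 10),
        "mark": (lower + upper) / 2 / 10,
        "frequency": freq,
    })

# Step 8: the frequencies must add up to the number of observations.
assert sum(r["frequency"] for r in rows) == len(TENTHS)

for r in rows:
    print(r)
```

Half a unit here is 0.05 year, which is why the boundaries carry one more decimal place than the data.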

Further computations
For the class midpoint or class mark (average of upper and lower class limits)
Example: (lower class limit + upper class limit) / 2
(1.5 + 1.9)/2 = 1.7
Relative frequency
Relative frequency can be obtained by dividing the class frequency by the
total frequency.
Example: RF = frequency / total number of observations
= 2 / 40
= 0.05
Percentage distribution can be obtained by multiplying the RF (relative
frequency) by 100.
Example: RF x 100
= 0.05 x 100
= 5%
Cumulative frequency is the total frequency of all values less than the upper
class boundary of a given class interval.
Note: the cumulative frequency is obtained by keeping a running total of the
class frequencies.

Percentage cumulative distribution can be obtained by dividing the
cumulative frequency by the total number of observations and multiplying
by 100.
Example: (cumulative frequency / total number of observations) x 100
= 2/40 x 100 = 0.05 x 100 = 5
= 3/40 x 100 = 0.075 x 100 = 7.5

Class      Class        Class     Frequency  Relative   Percentage    Class           Cumulative  Percentage
interval   boundaries   midpoint             frequency  distribution  boundaries      frequency   cumulative
                                                                                                  distribution
1.5 – 1.9  1.45 – 1.95  1.7       2          0.05       5             Less than 1.45  0           0
2.0 – 2.4  1.95 – 2.45  2.2       1          0.025      2.5           Less than 1.95  2           5
2.5 – 2.9  2.45 – 2.95  2.7       4          0.1        10            Less than 2.45  3           7.5
3.0 – 3.4  2.95 – 3.45  3.2       15         0.375      37.5          Less than 2.95  7           17.5
3.5 – 3.9  3.45 – 3.95  3.7       10         0.25       25            Less than 3.45  22          55.0
4.0 – 4.4  3.95 – 4.45  4.2       5          0.125      12.5          Less than 3.95  32          80.0
4.5 – 4.9  4.45 – 4.95  4.7       3          0.075      7.5           Less than 4.45  37          92.5
Total                             40                                  Less than 4.95  40          100
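
The relative, percentage and cumulative columns can be reproduced from the class frequencies alone. The sketch below is only an illustration; the variable names are hypothetical, and the cumulative values are listed against the upper boundary of each class, so the opening row of 0 (less than 1.45) is not included.

```python
from itertools import accumulate

# Class frequencies for 1.5 – 1.9 through 4.5 – 4.9.
frequencies = [2, 1, 4, 15, 10, 5, 3]
total = sum(frequencies)

relative = [f / total for f in frequencies]           # RF = f / n
percentage = [100 * f / total for f in frequencies]   # RF x 100
cumulative = list(accumulate(frequencies))            # running total
cumulative_pct = [100 * c / total for c in cumulative]

for row in zip(frequencies, relative, percentage, cumulative, cumulative_pct):
    print(row)
```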
Constructing the graphs

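Bar chart
[Figure: bar chart of the battery lives data. Vertical axis: frequency
(0 to 16). Horizontal axis: class interval, battery lives (years). The bars
for 1.5 – 1.9, 2.0 – 2.4, 2.5 – 2.9, 3.0 – 3.4, 3.5 – 3.9, 4.0 – 4.4 and
4.5 – 4.9 have heights 2, 1, 4, 15, 10, 5 and 3.]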

In a bar chart, the base of each bar corresponds to a class interval of
the frequency distribution, and the heights of the bars represent the
frequencies associated with each class.

Frequency Histogram
[Figure: frequency histogram. Vertical axis: frequency. Horizontal axis:
battery lives (years).]

A histogram differs from a bar chart in that the bases of its bars are the
class boundaries rather than the class limits. The use of class boundaries
for the bases eliminates the spaces between the bars and gives the graph a
solid appearance.

Frequency polygon
[Figure: frequency polygon. Vertical axis: frequency (0 to 16). Horizontal
axis: class midpoint, battery lives in years, marked at 1.2, 1.7, 2.2, 2.7,
3.2, 3.7, 4.2, 4.7 and 5.2.]

Frequency polygons are constructed by plotting class frequencies
against class marks and connecting the consecutive points by straight lines.
A polygon is a many-sided closed figure. To close the frequency polygon,
an additional class interval is added to both ends of the distribution, each
with zero frequency.
We can obtain the frequency polygon very quickly from the
histogram by joining the midpoints of the tops of adjacent rectangles and
then adding the two intervals at each end.
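
The points of the frequency polygon for the battery data can be listed with a short sketch. It is only an illustration: the class marks are kept in tenths of a year so that the added end classes fall exactly one class width away, and the variable names are hypothetical.

```python
# Class frequencies and class marks (in tenths of a year: 1.7 -> 17).
frequencies = [2, 1, 4, 15, 10, 5, 3]
width = 5                                              # class width 0.5 year
marks = [17 + width * i for i in range(len(frequencies))]

# Close the polygon: one extra class at each end, each with zero frequency.
xs = [marks[0] - width] + marks + [marks[-1] + width]
ys = [0] + frequencies + [0]

points = [(x / 10, y) for x, y in zip(xs, ys)]
print(points)
```

The two added points sit at 1.2 and 5.2 on the horizontal axis, one class width outside the first and last class marks.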

Frequency ogive
[Figure: frequency ogive. Vertical axis: cumulative frequency (0 to 45).
Horizontal axis: class boundaries, battery lives (years), marked at 1.45,
1.95, 2.45, 2.95, 3.45, 3.95, 4.45 and 4.95.]

A frequency ogive, or cumulative frequency polygon, is obtained by
plotting the cumulative frequency less than any upper class boundary
against that upper class boundary and joining all the consecutive points
by straight lines.
On the other hand, if relative cumulative frequencies or percentages
have been used, we call the graph a relative frequency ogive or percentage
ogive.
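
The points of the ogive for the battery data can be listed in the same way. This sketch is only an illustration: the class boundaries are kept in hundredths of a year (1.45 becomes 145) to avoid rounding, and the variable names are hypothetical.

```python
from itertools import accumulate

frequencies = [2, 1, 4, 15, 10, 5, 3]

# Class boundaries in hundredths of a year: 1.45, 1.95, ..., 4.95.
boundaries = [145 + 50 * i for i in range(len(frequencies) + 1)]

# The ogive starts at zero at the lowest class boundary.
cumulative = [0] + list(accumulate(frequencies))
total = cumulative[-1]

ogive = [(b / 100, c) for b, c in zip(boundaries, cumulative)]
percentage_ogive = [(b / 100, 100 * c / total)
                    for b, c in zip(boundaries, cumulative)]
print(ogive)
print(percentage_ogive)
```

The last point reaches the total frequency (40) for the frequency ogive and 100 for the percentage ogive.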
Other Graphs

Pie chart

A pie chart is a circular statistical graphic, which is divided into slices
to illustrate numerical proportion. In a pie chart, the arc length of each
slice is proportional to the quantity it represents.
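
As a small illustration, the angle of each slice for the battery data is the relative frequency times 360 degrees. The sketch below uses hypothetical variable names and multiplies before dividing, which gives the same result while keeping the arithmetic exact.

```python
labels = ["1.5-1.9", "2.0-2.4", "2.5-2.9", "3.0-3.4",
          "3.5-3.9", "4.0-4.4", "4.5-4.9"]
frequencies = [2, 1, 4, 15, 10, 5, 3]
total = sum(frequencies)

# Degrees in each slice = (frequency / total) x 360.
angles = [360 * f / total for f in frequencies]

for label, angle in zip(labels, angles):
    print(f"{label}: {angle} degrees")
```

The angles add up to 360 degrees, so the slices fill the whole circle.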

Box plot

What is a Boxplot?

A boxplot, also called a box and whisker plot, is a way to show
the spread and center of a data set. Measures of spread include
the interquartile range and the range of the data set. Measures of center
include the mean (average) and the median (the middle of a data set).
The box and whiskers chart shows you how your data is spread out. Five
pieces of information (the “five number summary”) are generally included in
the chart:
• The minimum (the smallest number in the data set). The minimum is
shown at the far left of the chart, at the end of the left “whisker.”
• The first quartile, Q1, is the far left of the box (or the far right of the
left whisker).
• The median is shown as a line inside the box.
• The third quartile, Q3, is shown at the far right of the box (at the far
left of the right whisker).
• The maximum (the largest number in the data set) is shown at the far
right of the chart, at the end of the right whisker.

For an example of creating a box and whiskers plot in Excel, see:
https://support.microsoft.com/en-us/office/create-a-box-plot-10204530-8cdf-40fe-a711-2eb9785e510f
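
The five number summary of the battery data can be computed with a short sketch. It is only an illustration and assumes the hinge definition from the Definitions section (the median of each half, with the median itself included in both halves when the sample size is odd). Other quartile rules give slightly different values. The function name five_number_summary and the list TENTHS are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower_hinge = median(ordered[:half])
    upper_hinge = median(ordered[n - half:])
    return ordered[0], lower_hinge, median(ordered), upper_hinge, ordered[-1]


# Battery lives from Table I, in tenths of a year (2.2 years -> 22).
TENTHS = [22, 41, 35, 45, 32, 37, 30, 26,
          34, 16, 31, 33, 38, 31, 47, 37,
          25, 43, 34, 36, 29, 33, 39, 31,
          33, 31, 37, 44, 32, 41, 19, 34,
          47, 38, 32, 26, 39, 30, 42, 35]

summary = tuple(v / 10 for v in five_number_summary(TENTHS))
print(summary)
```

With hinges, the box runs from 3.1 to 3.85 years with the median line at 3.4, and the whiskers extend to 1.6 and 4.7.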

battery_lives                                  Statistic
Mean                                           3.4125
95% Confidence Interval for Mean  Lower Bound  3.1877
                                  Upper Bound  3.6373
5% Trimmed Mean                                3.4333
Median                                         3.4000
Variance                                       .494
Std. Deviation                                 .70281
Minimum                                        1.60
Maximum                                        4.70
Range                                          3.10
Interquartile Range                            .77
Skewness                                       -.364
Kurtosis                                       .359
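
Several of these statistics can be checked with Python's standard statistics module. The sketch below is only an illustration: it assumes the variance and standard deviation in the output are the sample versions (dividing by n - 1), and it works in tenths of a year before converting back. The variable names are hypothetical.

```python
import statistics

# Battery lives from Table I, in tenths of a year (2.2 years -> 22).
TENTHS = [22, 41, 35, 45, 32, 37, 30, 26,
          34, 16, 31, 33, 38, 31, 47, 37,
          25, 43, 34, 36, 29, 33, 39, 31,
          33, 31, 37, 44, 32, 41, 19, 34,
          47, 38, 32, 26, 39, 30, 42, 35]

mean = statistics.mean(TENTHS) / 10
median = statistics.median(TENTHS) / 10
variance = statistics.variance(TENTHS) / 100   # sample variance, years squared
std_dev = statistics.stdev(TENTHS) / 10        # sample standard deviation
data_range = (max(TENTHS) - min(TENTHS)) / 10

print(round(mean, 4), median, round(variance, 3),
      round(std_dev, 5), data_range)
```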

Stem-and-Leaf Plot

battery_lives Stem-and-Leaf Plot

Frequency Stem & Leaf

1.00 2 . 2
4.00 2 . 5669
15.00 3 . 001111222333444
10.00 3 . 5567778899
5.00 4 . 11234
3.00 4 . 577
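
A stem and leaf plot of this kind can be built with a short sketch. It is only an illustration: each stem is split into a low row (leaves 0 to 4) and a high row (leaves 5 to 9), the values are held in tenths of a year, and the function name stem_and_leaf is hypothetical. The frequencies in the listing above add up to 38, so the sketch, which plots all 40 values, has one extra row for the two lowest values (1.6 and 1.9).

```python
from collections import defaultdict


def stem_and_leaf(tenths):
    """Return rows of (frequency, stem, leaves) with each stem split in two."""
    rows = defaultdict(list)
    for value in sorted(tenths):
        stem, leaf = divmod(value, 10)   # 35 -> stem 3, leaf 5
        rows[(stem, leaf >= 5)].append(str(leaf))
    return [(len(leaves), stem, "".join(leaves))
            for (stem, _), leaves in sorted(rows.items())]


# Battery lives from Table I, in tenths of a year (2.2 years -> 22).
TENTHS = [22, 41, 35, 45, 32, 37, 30, 26,
          34, 16, 31, 33, 38, 31, 47, 37,
          25, 43, 34, 36, 29, 33, 39, 31,
          33, 31, 37, 44, 32, 41, 19, 34,
          47, 38, 32, 26, 39, 30, 42, 35]

plot = stem_and_leaf(TENTHS)
for frequency, stem, leaves in plot:
    print(f"{frequency:>5.2f}  {stem} . {leaves}")
```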
