Organizing Data Numerical Data W25
representing data
Part 2: Numerical Data
Lesson Plan
▪ Construct grouped frequency distributions
▪ Construct and interpret frequency tables and histograms
▪ Identify and describe common shapes of distributions
▪ Identify and use other graphic representations of data
▪ Construct stem-and-leaf displays
Purpose of organizing data: Review
To summarize large sets of data
Helps us see trends within our data (both graphically and numerically)
We will be focusing on numerical data
Stem & leaf plot: organizing data by its 10s or 100s value
46 16 41 26 22 33 30 22 36 34
63 21 26 18 27 24 31 38 26 55
31 47 27 43 35 22 64 40 58 20
49 37 53 25 29 32 23 49 39 40
24 56 30 51 21 45 27 34 47 35
Practice!
Sample of the ages of 50 drivers arrested for driving under the influence
Let's organize the data to make it clear and meaningful
Why is having the right number of intervals important?
If you have too few classes, you lose detail in the data; if you have too many, the pattern in the data becomes clouded.
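The effect of the number of classes can be sketched with the drivers' ages from the practice data. This is a minimal illustration, not part of the slides: the helper name `tally` and the two trial widths (25 and 2) are assumptions chosen only to show the two extremes.

```python
# Ages of the 50 drivers from the practice data
AGES = [46, 16, 41, 26, 22, 33, 30, 22, 36, 34,
        63, 21, 26, 18, 27, 24, 31, 38, 26, 55,
        31, 47, 27, 43, 35, 22, 64, 40, 58, 20,
        49, 37, 53, 25, 29, 32, 23, 49, 39, 40,
        24, 56, 30, 51, 21, 45, 27, 34, 47, 35]

def tally(data, first_lower, width):
    """Count how many values fall in each class of the given width."""
    counts = {}
    lower = first_lower
    while lower <= max(data):
        upper = lower + width - 1
        counts[(lower, upper)] = sum(lower <= x <= upper for x in data)
        lower += width
    return counts

few = tally(AGES, 16, 25)   # very wide classes: detail is lost
many = tally(AGES, 16, 2)   # very narrow classes: the pattern is clouded

for label, table in (("few", few), ("many", many)):
    print(label, "->", len(table), "classes")
    for (lo, hi), n in table.items():
        print(f"  {lo}-{hi}: {'|' * n}")
```

With only two classes the shape of the data disappears; with twenty-five classes most bars hold one or two values and no trend stands out.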
Step 3: Determine your class limits
Using the first lower class limit and the class width, list the other lower class limits.
Lower class limit is the smallest number in a particular class.
Upper class limit is the biggest number in a particular class.
Practice!
Class boundary: the halfway point between intervals

Class    Class boundaries
16-22    15.5-22.5
23-29    22.5-29.5
30-36    29.5-36.5
37-43    36.5-43.5
44-50    43.5-50.5
51-57    50.5-57.5
58-64    57.5-64.5
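The class limits and boundaries above can be generated rather than listed by hand. The sketch below assumes whole-number data (so adjacent classes are 1 apart and each boundary sits 0.5 beyond a limit); the constant names are illustrative, and the width of 7 is read off the classes 16-22, 23-29, and so on.

```python
FIRST_LOWER = 16   # first lower class limit (smallest age in the sample)
WIDTH = 7          # class width implied by the classes 16-22, 23-29, ...
NUM_CLASSES = 7

# Each lower limit is the previous lower limit plus the class width
classes = []
for i in range(NUM_CLASSES):
    lower = FIRST_LOWER + i * WIDTH
    upper = lower + WIDTH - 1
    classes.append((lower, upper))

# A boundary is halfway between one class's upper limit and the next lower limit
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in classes]

for (lo, hi), (b_lo, b_hi) in zip(classes, boundaries):
    print(f"{lo}-{hi}    {b_lo}-{b_hi}")
```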
Grouped Frequency Distribution Table
Step 5: Tally the data
Take each individual data value and put a tally mark in the appropriate class. Find the total frequency for each class.
Step 6: Calculate the midpoint
Midpoint = (lower class limit + upper class limit) / 2
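Steps 5 and 6 can be checked with a short script. This is a sketch using the drivers' ages and the classes 16-22 through 58-64 from the practice table; the variable names are illustrative.

```python
AGES = [46, 16, 41, 26, 22, 33, 30, 22, 36, 34,
        63, 21, 26, 18, 27, 24, 31, 38, 26, 55,
        31, 47, 27, 43, 35, 22, 64, 40, 58, 20,
        49, 37, 53, 25, 29, 32, 23, 49, 39, 40,
        24, 56, 30, 51, 21, 45, 27, 34, 47, 35]

CLASSES = [(16, 22), (23, 29), (30, 36), (37, 43),
           (44, 50), (51, 57), (58, 64)]

# Step 5: put one tally mark in the class that contains each value
frequencies = {c: 0 for c in CLASSES}
for age in AGES:
    for lo, hi in CLASSES:
        if lo <= age <= hi:
            frequencies[(lo, hi)] += 1
            break

# Step 6: midpoint = (lower class limit + upper class limit) / 2
midpoints = {(lo, hi): (lo + hi) / 2 for lo, hi in CLASSES}

for lo, hi in CLASSES:
    print(f"{lo}-{hi}  frequency={frequencies[(lo, hi)]}  midpoint={midpoints[(lo, hi)]}")
```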
Step 5: Tally the data
Relative frequency for first class: 8 / 50 = .16

Class    Class boundaries    Frequency    Relative frequency    Percent
16-22    15.5-22.5           8            .16                   16%
23-29    22.5-29.5           11           .22                   22%
30-36    29.5-36.5           11           .22                   22%
37-43    36.5-43.5           7            .14                   14%
44-50    43.5-50.5           6            .12                   12%
51-57    50.5-57.5           4            .08                   8%
58-64    57.5-64.5           3            .06                   6%
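The relative frequency and percent columns follow from dividing each class frequency by the total, as in 8 / 50 = .16 for the first class. A minimal sketch, with the frequencies copied from the table and illustrative variable names:

```python
FREQUENCIES = {"16-22": 8, "23-29": 11, "30-36": 11, "37-43": 7,
               "44-50": 6, "51-57": 4, "58-64": 3}

total = sum(FREQUENCIES.values())  # 50 drivers in the sample

# Relative frequency = class frequency / total frequency
relative = {c: f / total for c, f in FREQUENCIES.items()}

for c, f in FREQUENCIES.items():
    print(f"{c:<7}{f:>4}{relative[c]:>7.2f}{relative[c]:>6.0%}")
```

The relative frequencies should always add up to 1 (100%), which is a quick check that no value was missed in the tally.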
Histogram
[Figure: histogram of Frequency (vertical axis) against Age (horizontal axis), with class boundaries 15.5, 22.5, 29.5, 36.5, 43.5, 50.5, 57.5, 64.5]
The bars touch each other because the classes are sequential.
Histogram
How to make a histogram (or relative frequency histogram)
1. Make a frequency table with designated classes
2. Place class boundaries on horizontal axis and frequencies (or relative frequencies) on vertical axis
3. For each class draw a bar that corresponds to the class frequency
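The three steps can be sketched as a text histogram. This is only an approximation of the drawing: it is turned sideways (classes run down the page instead of along the horizontal axis), and the function name `histogram_rows` is an assumption made for the example.

```python
# Step 1: frequency table with designated classes (from the drivers' data)
BOUNDARIES = [15.5, 22.5, 29.5, 36.5, 43.5, 50.5, 57.5, 64.5]
FREQUENCIES = [8, 11, 11, 7, 6, 4, 3]

def histogram_rows(boundaries, frequencies):
    """Build one text bar per class; bar length equals the class frequency."""
    rows = []
    for i, freq in enumerate(frequencies):
        # Step 2: label each bar with its class boundaries
        label = f"{boundaries[i]:>4}-{boundaries[i + 1]:<4}"
        # Step 3: draw a bar that corresponds to the class frequency
        rows.append(f"{label} {'#' * freq} {freq}")
    return rows

rows = histogram_rows(BOUNDARIES, FREQUENCIES)
print("\n".join(rows))
```

Using relative frequencies instead of frequencies changes only the scale of the bars, not the shape.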
Skewness: when the graph lies more on one side
Kurtosis: how the data are spread between the peak and the tails
Distribution Shape
Symmetric / Normal distribution
Uniform: the frequency is the same for every single class or category
Bimodal
Skewed Left
Skewed Right
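Skewed left and skewed right can also be checked numerically. The slides do not give a formula, so the sketch below assumes the moment coefficient of skewness (third central moment divided by the standard deviation cubed): a positive value means a longer right tail, a negative value a longer left tail, and a value near zero a roughly symmetric shape.

```python
from statistics import mean

AGES = [46, 16, 41, 26, 22, 33, 30, 22, 36, 34,
        63, 21, 26, 18, 27, 24, 31, 38, 26, 55,
        31, 47, 27, 43, 35, 22, 64, 40, 58, 20,
        49, 37, 53, 25, 29, 32, 23, 49, 39, 40,
        24, 56, 30, 51, 21, 45, 27, 34, 47, 35]

def skewness(data):
    """Moment coefficient of skewness (an assumed formula, not from the slides)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

value = skewness(AGES)
shape = "skewed right" if value > 0 else "skewed left" if value < 0 else "symmetric"
print(f"skewness = {value:.2f} ({shape})")
```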
OTHER SHAPES ACCORDING TO MODALITY
A data set without modes is called a uniform distribution. Because the frequency is the same for every class, no frequency is repeated more than another, so there is no mode.
A bimodal distribution has two modes.
A multimodal distribution has three or more modes.
With several modes, we either have a large data set or a poorly defined population.
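The modality rules can be sketched by counting how many values share the highest frequency. This is a simplification: it looks for exact ties in the counts, whereas on a histogram modality is usually judged by eye from the peaks. The function name `modality` and the small sample lists are hypothetical.

```python
from collections import Counter

def modality(values):
    """Classify a data set by how many values tie for the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    modes = [v for v, c in counts.items() if c == highest]
    if len(counts) > 1 and len(modes) == len(counts):
        return "no mode"        # every value is equally frequent
    if len(modes) == 1:
        return "unimodal"
    if len(modes) == 2:
        return "bimodal"
    return "multimodal"         # three or more modes

print(modality([1, 2, 3, 4]))            # all frequencies equal
print(modality([1, 1, 2, 3]))            # one value repeats most
print(modality([1, 1, 2, 2, 3]))         # two values tie
print(modality([1, 1, 2, 2, 3, 3, 4]))   # three values tie
```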
Kurtosis
[Figure: distribution curves labeled Peak, Tail, Low, and Normal Distribution]
We are looking at how far the distribution ... to consider
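Kurtosis can be given a number as well. The slides do not state a formula, so this sketch assumes the common moment-based excess kurtosis (fourth central moment divided by the squared variance, minus 3), for which a normal distribution scores about 0. Positive values suggest a sharper peak and heavier tails than normal; negative values suggest a flatter shape.

```python
AGES = [46, 16, 41, 26, 22, 33, 30, 22, 36, 34,
        63, 21, 26, 18, 27, 24, 31, 38, 26, 55,
        31, 47, 27, 43, 35, 22, 64, 40, 58, 20,
        49, 37, 53, 25, 29, 32, 23, 49, 39, 40,
        24, 56, 30, 51, 21, 45, 27, 34, 47, 35]

def excess_kurtosis(data):
    """Moment-based excess kurtosis (an assumed formula, not from the slides)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n   # variance (population form)
    m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

print(f"excess kurtosis of the drivers' ages = {excess_kurtosis(AGES):.2f}")
```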
Cumulative Relative Frequency Distribution
[Figure: "Age of drivers driving under the influence"; Cumulative Percentage (vertical axis, 0% to 120%) against age class in Years (horizontal axis: 16-22, 23-29, 30-36, 37-43, 44-50, 51-57, 58-64)]
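The cumulative percentages behind this graph are running totals of the class frequencies divided by the total. A minimal sketch using the frequencies from the grouped frequency table (variable names are illustrative):

```python
from itertools import accumulate

CLASSES = ["16-22", "23-29", "30-36", "37-43", "44-50", "51-57", "58-64"]
FREQUENCIES = [8, 11, 11, 7, 6, 4, 3]

total = sum(FREQUENCIES)

# Running total of frequencies up to and including each class
cumulative = list(accumulate(FREQUENCIES))

# Cumulative relative frequency, expressed as a percentage
cumulative_pct = [100 * c / total for c in cumulative]

for label, c, pct in zip(CLASSES, cumulative, cumulative_pct):
    print(f"{label:<7}{c:>4}{pct:>7.0f}%")
```

The last class must reach 100%, since by then every driver in the sample has been counted.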
Histogram / Bar Graph
We ignore outliers
- But be mindful when eliminating them
Used in exploratory analysis
Used to display data
Practice!
46 16 41 26 22 33 30 22 36 34
63 21 26 18 27 24 31 38 26 55
31 47 27 43 35 22 64 40 58 20
49 37 53 25 29 32 23 49 39 40
24 56 30 51 21 45 27 34 47 35
Stem | Leaf
Leaf: rightmost digit, in increasing order
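The practice can be checked with a short script. This sketch assumes two-digit whole numbers, so the stem is the 10s digit and the leaf is the rightmost digit; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

AGES = [46, 16, 41, 26, 22, 33, 30, 22, 36, 34,
        63, 21, 26, 18, 27, 24, 31, 38, 26, 55,
        31, 47, 27, 43, 35, 22, 64, 40, 58, 20,
        49, 37, 53, 25, 29, 32, 23, 49, 39, 40,
        24, 56, 30, 51, 21, 45, 27, 34, 47, 35]

def stem_and_leaf(data):
    """Group values by 10s digit (stem); leaves are the rightmost digits."""
    stems = defaultdict(list)
    for value in sorted(data):          # sorting puts leaves in increasing order
        stems[value // 10].append(value % 10)
    return dict(stems)

plot = stem_and_leaf(AGES)
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Looping over every stem between the smallest and largest keeps empty stems visible, so gaps in the data are not hidden.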
Stem | Leaf
3 | 4 5 7 9 9
4 | 1 2 5 6 7 7 8 9
5 | 0 2 3 5 5 6 7 8 8 9
6 | 0 1 1 2 2 3 3 4 5 7 8 9
7 | 0 2 4 4 5 8
8 | 0 4 6 7 9
9 | 4 5 8
10 | 2
Key: 3 | 4 = 34 years old
We are not starting at the 10s or 20s digit because we don't have any values from that group.
Skewed right
• There are not many young voters in the sample. The youngest is 34 yrs old.
• Most voters in this sample are middle-aged and elderly.
• In general, this sample data suggests that the people who live in this electoral district are primarily older people, either nearing retirement or already retired.
Histogram vs. Stem & Leaf
Histogram
A histogram shows each interval as a bar. The heights of the bars show the frequencies or relative frequencies of values in each interval. The choice of intervals in a histogram can affect the appearance of a distribution. Histograms with more intervals show more detail but may have a less clear overall pattern.
Stem & Leaf
A stemplot shows every individual data value. For large data sets, however, it can be difficult to see the overall pattern in the graph. We can get a better picture of the distribution by grouping nearby values in a histogram.
Time Series graphs (or line graphs)
Plotting points according to your y and x axes
Plotting every single point to make a line; it shows change over a period of time
[Figure: Unemployment rate among young men and young women in Canada]