
Organizing Data Numerical Data W25

This document outlines methods for organizing and representing numerical data, focusing on techniques such as grouped frequency distributions, histograms, and stem-and-leaf displays. It emphasizes the importance of summarizing large data sets to identify trends and variations, and provides step-by-step instructions for constructing frequency tables and calculating relative and percentage frequencies. Additionally, it discusses the significance of distribution shapes and the impact of outliers on data analysis.


Organizing and representing data
Part 2: Numerical Data

Lesson Plan
▪ Construct grouped frequency distributions
▪ Construct and interpret frequency tables and histograms
▪ Identify and describe common shapes of distributions
▪ Identify and use other graphic representations of data
▪ Construct stem-and-leaf displays

Purpose of organizing data: Review
▪ To summarize large sets of data
▪ Helps us see trends within our data (both graphically and numerically)

We will be focusing on numerical data.
Example: ratio-level data, such as grades.

Grouped frequency distribution: grouping similar answer choices together.
Stem-and-leaf plot: organizing data by its tens or hundreds value.

Frequency Distribution
In larger, varied data sets, we use a grouped frequency distribution and stem-and-leaf displays.
In the sample below, the smallest age is 16 years old and the oldest is 64, so the values range from 16 to 64.

46 16 41 26 22 33 30 22 36 34
63 21 26 18 27 24 31 38 26 55
31 47 27 43 35 22 64 40 58 20
49 37 53 25 29 32 23 49 39 40
24 56 30 51 21 45 27 34 47 35

Practice!
The data are the ages of a sample of 50 drivers arrested for driving under the influence.
Let's organize the data to make it clear and meaningful.

Why is having the right number of intervals important?
- We usually have 1 - 15 classes.
- We are going to use 7 classes.
- If you have too few classes, you lose detail in the data; with too many, the data becomes clouded.
- The class width always needs to be a whole number. Always round up.


Grouped Frequency Distribution Table & Histograms
Steps in building a grouped frequency distribution
▪ Step 1: Decide the number of classes
▪ Step 2: Calculate the class width

$$\text{Class width} = \frac{\text{Maximum data value} - \text{Minimum data value}}{\text{Number of classes}}$$

Always round up! For the driver ages:

$$\frac{64 - 16}{7} = \frac{48}{7} \approx 6.86 \rightarrow 7$$
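
The class width calculation can be checked with a short Python sketch. The variable names (`ages`, `num_classes`, `class_width`) are illustrative choices, not part of the slides, and `math.ceil` is used here as one way to apply the "always round up" rule.

```python
import math

# Ages of the 50 drivers from the practice data set.
ages = [
    46, 16, 41, 26, 22, 33, 30, 22, 36, 34,
    63, 21, 26, 18, 27, 24, 31, 38, 26, 55,
    31, 47, 27, 43, 35, 22, 64, 40, 58, 20,
    49, 37, 53, 25, 29, 32, 23, 49, 39, 40,
    24, 56, 30, 51, 21, 45, 27, 34, 47, 35,
]

num_classes = 7  # Step 1: chosen number of classes

# Step 2: range divided by the number of classes, rounded up.
data_range = max(ages) - min(ages)      # 64 - 16 = 48
raw_width = data_range / num_classes    # about 6.86
class_width = math.ceil(raw_width)      # round up to a whole number

print(f"range={data_range}, raw width={raw_width:.2f}, class width={class_width}")
```

Note that `math.ceil` leaves a value unchanged if it is already a whole number; how to handle that case is a convention this sketch does not settle.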
Practice!
▪ Step 1: Number of classes: 7
▪ Step 2: Class width: 7


Grouped Frequency Distribution Table
▪ Step 3: Determine your class limits
  ▪ Take the lowest data value as the first lower class limit, then keep adding the class width to list the other lower class limits
  ▪ List the upper class limits
  The lower class limit is the smallest number in a particular class.
  The upper class limit is the biggest number in a particular class.
▪ Step 4: Determine class boundaries
  ▪ A boundary is the midpoint between the upper class limit of one class and the lower class limit of the next class

The first class is 16-22:
16, 17, 18, 19, 20, 21, 22 (7 values)
The remaining classes follow the same pattern:
23-29
30-36
37-43
44-50
51-57
58-64
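
The class limits listed above can be generated with a small sketch. The function name `class_limits` is hypothetical, and the code assumes whole-number data, so a class of width 7 that starts at 16 ends at 16 + 7 - 1 = 22.

```python
def class_limits(lowest, width, num_classes):
    """Return (lower, upper) class limits for whole-number data."""
    limits = []
    for i in range(num_classes):
        lower = lowest + i * width   # each lower limit is one class width further on
        upper = lower + width - 1    # a class of width 7 holds 7 whole values
        limits.append((lower, upper))
    return limits


limits = class_limits(lowest=16, width=7, num_classes=7)
for lower, upper in limits:
    print(f"{lower}-{upper}")
```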

Practice!
▪ Step 3: Class limits
  ▪ Lowest value: 16
  ▪ Class limits for the first class: 16-22, which includes 7 values (because the class width is 7)
  ▪ Do the same for all seven classes!
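
Practice!
▪ Step 4: Determine class boundaries
Each boundary is the halfway point between intervals. The next class starts at 23 rather than 22 because otherwise 22 would be counted twice.

Class   Class Boundaries
16-22   15.5-22.5
23-29   22.5-29.5
30-36   29.5-36.5
37-43   36.5-43.5
44-50   43.5-50.5
51-57   50.5-57.5
58-64   57.5-64.5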
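
As a rough check of Step 4, the boundaries can be computed from the class limits. This is a sketch under the assumption that the gap between consecutive classes is constant (here 23 - 22 = 1); the name `class_boundaries` is illustrative.

```python
# Class limits from Step 3.
limits = [(16, 22), (23, 29), (30, 36), (37, 43), (44, 50), (51, 57), (58, 64)]


def class_boundaries(limits):
    """Return (lower, upper) boundaries halfway between adjacent classes."""
    # Gap between one class's upper limit and the next class's lower limit.
    gap = limits[1][0] - limits[0][1]   # 23 - 22 = 1
    half = gap / 2                      # 0.5
    return [(lower - half, upper + half) for lower, upper in limits]


boundaries = class_boundaries(limits)
for (lower, upper), (b_low, b_high) in zip(limits, boundaries):
    print(f"{lower}-{upper}: {b_low} to {b_high}")
```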

Grouped Frequency Distribution Table
▪ Step 5: Tally the data
  ▪ Take each individual data value and put a tally mark in the appropriate class. Find the total frequency for each class.
▪ Step 6: Calculate the midpoint

$$\text{Midpoint} = \frac{\text{Lower class limit} + \text{Upper class limit}}{2}$$

Practice!
▪ Step 5: Tally the data

Class   Frequency
16-22   8
23-29   11
30-36   11
37-43   7
44-50   6
51-57   4
58-64   3
Total   50

▪ Step 6: Midpoint (for the first class)

$$\frac{22 + 16}{2} = 19$$
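
Steps 5 and 6 can be sketched in Python as follows. Counting with a generator expression stands in for the hand tally; the variable names are assumptions made for this example.

```python
# Ages of the 50 drivers from the practice data set.
ages = [
    46, 16, 41, 26, 22, 33, 30, 22, 36, 34,
    63, 21, 26, 18, 27, 24, 31, 38, 26, 55,
    31, 47, 27, 43, 35, 22, 64, 40, 58, 20,
    49, 37, 53, 25, 29, 32, 23, 49, 39, 40,
    24, 56, 30, 51, 21, 45, 27, 34, 47, 35,
]
limits = [(16, 22), (23, 29), (30, 36), (37, 43), (44, 50), (51, 57), (58, 64)]

# Step 5: count how many ages fall inside each class (limits are inclusive).
frequencies = [sum(lower <= age <= upper for age in ages) for lower, upper in limits]

# Step 6: midpoint = (lower class limit + upper class limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in limits]

for (lower, upper), freq, mid in zip(limits, frequencies, midpoints):
    print(f"{lower}-{upper}: frequency={freq}, midpoint={mid}")
print("Total:", sum(frequencies))
```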

Grouped Frequency Distribution Table
▪ Step 7: Proportions and percentages

$$\text{Relative frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}}$$

$$\text{Percentage frequency} = \frac{\text{class frequency}}{\text{sum of all frequencies}} \times 100\%$$

Grouped Frequency Distribution Table

Class   Class Boundaries   Frequency
16-22   15.5-22.5          8
23-29   22.5-29.5          11
30-36   29.5-36.5          11
37-43   36.5-43.5          7
44-50   43.5-50.5          6
51-57   50.5-57.5          4
58-64   57.5-64.5          3
Total                      50

Relative frequency for the first class:

$$\frac{8}{50} = .16$$

Percentage frequency for the first class:

$$\frac{8}{50} \times 100 = .16 \times 100 = 16\%$$

Grouped Frequency Distribution Table

Class   Class Boundaries   Frequency   Relative Frequency   Percentage Frequency
16-22   15.5-22.5          8           .16                  16%
23-29   22.5-29.5          11          .22                  22%
30-36   29.5-36.5          11          .22                  22%
37-43   36.5-43.5          7           .14                  14%
44-50   43.5-50.5          6           .12                  12%
51-57   50.5-57.5          4           .08                  8%
58-64   57.5-64.5          3           .06                  6%

Relative frequencies are rounded to two decimal places.
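
The relative and percentage columns can be reproduced with a few lines of Python. This is a minimal sketch that starts from the class frequencies in the table; the variable names are illustrative.

```python
frequencies = [8, 11, 11, 7, 6, 4, 3]
total = sum(frequencies)  # sum of all frequencies = 50

# Relative frequency = class frequency / sum of all frequencies (two decimals).
relative = [round(f / total, 2) for f in frequencies]

# Percentage frequency = relative frequency x 100%.
percentage = [round(100 * f / total) for f in frequencies]

for f, rel, pct in zip(frequencies, relative, percentage):
    print(f"frequency={f:>2}  relative={rel:.2f}  percentage={pct}%")
```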

Cumulative Frequency Distribution

Class   Class Boundaries   Frequency   Relative Frequency   Percentage Frequency   Cumulative Frequency
16-22   15.5-22.5          8           .16                  16%                    16%
23-29   22.5-29.5          11          .22                  22%                    16 + 22 = 38%
30-36   29.5-36.5          11          .22                  22%                    38 + 22 = 60%
37-43   36.5-43.5          7           .14                  14%                    60 + 14 = 74%
44-50   43.5-50.5          6           .12                  12%                    74 + 12 = 86%
51-57   50.5-57.5          4           .08                  8%                     86 + 8 = 94%
58-64   57.5-64.5          3           .06                  6%                     94 + 6 = 100%

The cumulative frequency for a class is its own frequency plus the sum of the previous classes.
Drivers aged 44-64 have fewer DUI arrests in this sample.
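
The running totals in the cumulative column can be sketched with `itertools.accumulate` from the Python standard library. The list `percentages` simply copies the percentage frequencies from the table.

```python
from itertools import accumulate

# Percentage frequency of each class, in class order (16-22 up to 58-64).
percentages = [16, 22, 22, 14, 12, 8, 6]

# Each cumulative value is the class percentage plus all previous ones.
cumulative = list(accumulate(percentages))

for pct, cum in zip(percentages, cumulative):
    print(f"{pct:>2}% -> cumulative {cum}%")
```

The last cumulative value equals 100% because every observation has been counted by then.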


Histogram
What is a histogram?
- A visual representation of a grouped frequency table
- The higher the bar, the higher the frequency
- The numbers at the bottom of the chart represent the class boundaries
- The numbers on the side of the chart represent the frequency of each class
- The bars are touching because the classes follow one from another (they are sequential)
▪ Shows how data is distributed

[Figure: histogram of the driver ages. Horizontal axis: Age, marked at the class boundaries 15.5, 22.5, 29.5, 36.5, 43.5, 50.5, 57.5, 64.5. Vertical axis: Frequency, from 0 to 12.]

Histogram
How to make a histogram (or relative frequency histogram)
1. Make a frequency table with designated classes
2. Place class boundaries on horizontal axis and frequencies (or relative frequencies) on vertical axis
3. For each class draw a bar that corresponds to the class frequency

This can also be done using Excel!
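
As a rough illustration of the three steps, the sketch below prints a text histogram using only the Python standard library. It is an assumption-laden stand-in for a real chart: the bars run sideways (classes down the side, frequency along the row), and the function name `text_histogram` is hypothetical.

```python
# Step 1: frequency table with designated classes.
classes = ["16-22", "23-29", "30-36", "37-43", "44-50", "51-57", "58-64"]
frequencies = [8, 11, 11, 7, 6, 4, 3]


def text_histogram(labels, counts, symbol="#"):
    """Return one text row per class; bar length equals the class frequency."""
    rows = []
    for label, count in zip(labels, counts):
        # Steps 2 and 3: class label on one axis, a bar of `count` symbols on the other.
        rows.append(f"{label:>5} | {symbol * count} {count}")
    return rows


rows = text_histogram(classes, frequencies)
print("\n".join(rows))
```

Passing relative frequencies scaled to whole numbers instead of counts would give the relative frequency version of the same picture.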

The Shape of a Distribution
▪ When a data set is graphed, it produces a shape that can give you information about useful statistics, such as the data set's dispersion, its variability, its mean, or its range.
▪ Distribution shapes are influenced by:
  ▪ their modality: the class or classes that appear the most
  ▪ symmetry/skewness
  ▪ kurtosis

Symmetry: the graph looks like a parabola (bell-shaped), and the data lies in the middle.
Skewness: the graph lies more on one side.
Kurtosis: how spread out our data is.

Distribution Shape
[Figure: example histograms of five shapes: Symmetric / Normal distribution, Uniform, Bimodal, Skewed Left, Skewed Right.]

- Uniform: the frequency is the same for every single class or category. Because no frequency is repeated more than any other, there is no mode.
- Skewed left: the tail is to the left. Also called a negative skew.
- Skewed right: the tail is to the right. Also called a positive skew.


Other shapes according to modality
- A bimodal distribution has two modes.
- A multimodal distribution has three or more modes. This happens when we either have a large data set or a poorly defined population.
- A data set without modes is called a uniform distribution.

Other factors to consider: Kurtosis
Kurtosis represents the degree of peakedness of a distribution (i.e. how pointy or flat it is).
We are looking at how far the peak and tail of the distribution are from the x-axis.

[Figure: three curves labelled Low, Normal Distribution, and High, with the peak and tail marked.]

Cumulative Relative Frequency Distribution
▪ The cumulative frequency distribution is calculated by adding each frequency from a frequency distribution table to the sum of its predecessors. The last value will always be equal to the total for all observations.

[Figure: cumulative percentage chart, "Age of drivers driving under the influence". Horizontal axis: Years, with classes 16-22, 23-29, 30-36, 37-43, 44-50, 51-57, 58-64. Vertical axis: Cumulative Percentage, from 0% to 120%.]

Difference between bar graph and a Histogram

Histogram
▪ A histogram displays the distribution of a quantitative variable.
▪ Histograms do not have gaps between bars.

Bar Graph
▪ A bar graph displays the distribution of a categorical variable.
▪ Bar graphs show gaps between bars.

Outliers
Reasons for outliers
- People are abnormal
- An improper person was selected to be a part of the population
- People lie

Main reasons
- People lie or there was an error

We ignore outliers
- But be mindful when eliminating them

Stem and Leaf
▪ Used in exploratory analysis
▪ Used to display data
▪ Used to rank order and arrange data into groups
Every stem and leaf will have a number.

Steps in making a Stem and Leaf
1. Divide data into two parts: stems and leaves. For example, 32 has stem 3 and leaf 2.
2. Align the stems in a vertical column from smallest to largest. Draw a vertical line.
3. Place leaves with the matching stems in increasing order
4. Label to indicate magnitude
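
The four steps can be sketched in Python for the driver ages. This is a minimal example that assumes two-digit whole numbers, so `divmod(value, 10)` splits each age into a tens-digit stem and a ones-digit leaf; the function name `stem_and_leaf` is hypothetical, and stems with no values are simply not listed.

```python
from collections import defaultdict

# Ages of the 50 drivers from the practice data set.
ages = [
    46, 16, 41, 26, 22, 33, 30, 22, 36, 34,
    63, 21, 26, 18, 27, 24, 31, 38, 26, 55,
    31, 47, 27, 43, 35, 22, 64, 40, 58, 20,
    49, 37, 53, 25, 29, 32, 23, 49, 39, 40,
    24, 56, 30, 51, 21, 45, 27, 34, 47, 35,
]


def stem_and_leaf(values):
    """Group leaves (ones digit) under their stems (tens digit), in increasing order."""
    groups = defaultdict(list)
    for value in sorted(values):          # sorting first keeps leaves in order
        stem, leaf = divmod(value, 10)    # step 1: split into stem and leaf
        groups[stem].append(leaf)
    return dict(groups)


plot = stem_and_leaf(ages)
for stem in sorted(plot):                 # step 2: stems from smallest to largest
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")           # step 3: leaves next to matching stem
print("Key: 3 | 3 represents 33 years old")  # step 4: label the magnitude
```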

Practice!
▪ Sample of 50 drivers arrested for driving under the influence

46 16 41 26 22 33 30 22 36 34
63 21 26 18 27 24 31 38 26 55
31 47 27 43 35 22 64 40 58 20
49 37 53 25 29 32 23 49 39 40
24 56 30 51 21 45 27 34 47 35

For example, 34 has stem 3 and leaf 4. The leaves are the rightmost digit, listed in increasing order.

Stem | Leaves
1 | 6 8
2 | 0 1 1 2 2 2 3 4 4 5 6 6 6 7 7 7 9
3 | 0 0 1 1 2 3 4 4 5 5 6 7 8 9
4 | 0 0 1 3 5 6 7 7 9 9
5 | 1 3 5 6 8
6 | 3 4
Key: 3 | 3 represents 33 years old

Stem & Leaf
▪ The stems are the tens digit.
▪ The display shows you the shape of the distribution. The driver ages above are skewed right: most leaves sit on the 20s and 30s stems, with a tail toward the older ages.

Age distribution of voters in a telephone survey (n=50).

Stem | Leaves
3 | 4 5 7 9 9
4 | 1 2 5 6 7 7 8 9
5 | 0 2 3 5 5 6 7 8 8 9
6 | 0 1 1 2 2 3 3 4 5 7 8 9
7 | 0 2 4 4 5 8
8 | 0 4 6 7 9
9 | 4 5 8
10 | 2
Key: 3 | 4 = 34 years old

We are not starting at the 10s or 20s stem because we don't have any values from those groups.

• There are not many young voters in the sample. The youngest is 34 years old.
• Most voters in this sample are middle-aged and elderly.
• In general, this sample data suggests that the people who live in this electoral district are primarily older people, either nearing retirement or already retired.

Histogram vs. Stem & Leaf

Histogram
▪ A histogram shows each interval as a bar. The heights of the bars show the frequencies or relative frequencies of values in each interval.
▪ The choice of intervals in a histogram can affect the appearance of a distribution.
▪ Histograms with more intervals show more detail but may have a less clear overall pattern.

Stem & Leaf
▪ A stemplot shows every individual data value. For large data sets, however, it can be difficult to see the overall pattern in the graph. We can get a better picture of the distribution by grouping nearby values in a histogram.

Time Series graphs (or line graphs)
▪ This graph displays measurements of the same variable recorded at regular intervals over a period of time. They are useful at displaying how data change over time.
- Plot the points according to your x and y axes
- Plotting every single point and joining them makes a line, which shows change over a period of time
- Make sure to label the axes and title the graph

[Figure: line graph, "Unemployment rate among young men and young women in Canada".]
