Week 2 Chapter 2 Describing Data
I. Data
II. Numerical data
III. Categorical data
Data
⚫ Categorical data (defined categories or groups). Examples: marital status; "Are you registered to vote?"; eye color
⚫ Numerical data
◦ Discrete (counted items). Examples: number of children; defects per hour
◦ Continuous (measured characteristics). Examples: weight; voltage
I. DATA
Graphical presentation of data
⚫ Numerical data: data whose values are numbers
⚫ Categorical data: data that fall into categories
II. NUMERICAL DATA
Example data (ordered array): 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
⚫ Find the range: 58 - 12 = 46
⚫ Select the number of classes: 5 (usually between 5 and 15)
⚫ Compute the class interval (width): 10 (46/5 = 9.2, rounded up)
⚫ Determine the class boundaries (limits): 10, 20, 30, 40, 50, 60
⚫ Compute the class midpoints: 15, 25, 35, 45, 55
⚫ Count the observations and assign them to classes
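The six steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: `frequency_distribution` is a hypothetical helper, and the lower boundary of the first class (`start=10`) is assumed to be picked by hand, as the slides do.

```python
import math

def frequency_distribution(data, num_classes, start):
    """Follow the steps above; num_classes (step 2) and start are chosen by hand."""
    data_range = max(data) - min(data)                     # step 1: range
    width = math.ceil(data_range / num_classes)            # step 3: round up
    boundaries = [start + i * width                        # step 4: boundaries
                  for i in range(num_classes + 1)]
    pairs = list(zip(boundaries, boundaries[1:]))          # (lower, upper) per class
    midpoints = [(lo + hi) / 2 for lo, hi in pairs]        # step 5: midpoints
    # Step 6: a class holds values with lower <= x < upper, so classes never overlap
    counts = [sum(lo <= x < hi for x in data) for lo, hi in pairs]
    return data_range, width, boundaries, midpoints, counts

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41,
        43, 44, 46, 53, 58]
rng, width, bounds, mids, counts = frequency_distribution(data, 5, start=10)
print(rng, width)  # 46 10
print(bounds)      # [10, 20, 30, 40, 50, 60]
print(mids)        # [15.0, 25.0, 35.0, 45.0, 55.0]
print(counts)      # [3, 6, 5, 4, 2]
```

If the first boundary is set too high or the rounded width is too small, some values fall outside every class; checking that `sum(counts) == len(data)` is a cheap safeguard.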
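2.2. Organizing numerical data: data in an ordered array
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                 Frequency  Percentage  Cumulative frequency  Cumulative percentage
10 but less than 20   3          15          3                     15
20 but less than 30   6          30          9                     45
30 but less than 40   5          25          14                    70
40 but less than 50   4          20          18                    90
50 but less than 60   2          10          20                    100
Total                 20         100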
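The percentage and cumulative columns of the frequency table follow directly from the class frequencies. A minimal sketch, assuming the class frequencies 3, 6, 5, 4, 2 of the 20-value example:

```python
from itertools import accumulate

# Class frequencies for the 20-value example (classes 10-20, ..., 50-60)
classes = ["10 but less than 20", "20 but less than 30", "30 but less than 40",
           "40 but less than 50", "50 but less than 60"]
freq = [3, 6, 5, 4, 2]

n = sum(freq)                                 # total observations
pct = [100 * f / n for f in freq]             # percentage of observations per class
cum_freq = list(accumulate(freq))             # running total of frequencies
cum_pct = [100 * c / n for c in cum_freq]     # running total as a percentage

for row in zip(classes, freq, pct, cum_freq, cum_pct):
    print("{:<22}{:>4}{:>8.0f}{:>6}{:>8.0f}".format(*row))
```

The last cumulative value always equals the total (20 observations, 100%), which is a quick consistency check on the table.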
Displays for numerical data: stem-and-leaf display, histogram, polygon, ogive
Stem and leaf diagram:
⚫ A simple way to see distribution details in a data set
Stem | Leaf
21 is shown as 2 | 1
38 is shown as 3 | 8
41 is shown as 4 | 1
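Applied to the 20-value example, splitting each value into its tens digit (stem) and units digit (leaf) gives the display below. This is a sketch for non-negative integers only; `stem_and_leaf` is a hypothetical helper, and listing empty stems is a design choice so that gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Tens digit(s) as the stem, units digit as the leaf (non-negative integers)."""
    display = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)       # e.g. 21 -> stem 2, leaf 1
        display[stem].append(leaf)
    # Include every stem between the smallest and largest, even if empty
    return {s: display.get(s, []) for s in range(min(display), max(display) + 1)}

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41,
        43, 44, 46, 53, 58]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```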
Example (100's digit as the stem; the 10's digit, rounded off, as the leaf)
Stem | Leaf
613 would become 6 | 1
776 would become 7 | 8
1224 would become 12 | 2
Stem and leaf: using other stem units
⚫ Using the 100's digit as the stem:
◦ Data: 613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891, 894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224
◦ The completed stem-and-leaf display:
 6 | 1 3 6
 7 | 2 2 5 8
 8 | 3 4 6 6 9 9
 9 | 1 3 3 6 8
10 | 3 5 6
11 | 4 7
12 | 2
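The completed display can be reproduced by first rounding each value to the nearest ten and then splitting the result into hundreds (stem) and tens (leaf). A minimal sketch, assuming half-up rounding (so 955 becomes 960, leaf 6); `stem_and_leaf_rounded` is a hypothetical helper:

```python
from collections import defaultdict

def stem_and_leaf_rounded(data):
    """Hundreds as the stem; the tens digit, rounded half-up, as the leaf."""
    display = defaultdict(list)
    for x in sorted(data):
        tens = (x + 5) // 10             # round to the nearest ten: 776 -> 78
        stem, leaf = divmod(tens, 10)    # 78 -> stem 7, leaf 8
        display[stem].append(leaf)
    return dict(display)

data = [613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891,
        894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224]
for stem, leaves in stem_and_leaf_rounded(data).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

With this rule a value such as 697 would round to 700 and be shown as 7 | 0, in the next stem up.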
Frequency distributions
What is a frequency distribution?
⚫ A frequency distribution is a list or a table containing class groupings (ranges within which the data fall) and the corresponding frequencies with which data fall within each grouping or category
Why use a frequency distribution?
⚫ It is a way to summarize numerical data
⚫ It condenses the raw data into a more useful form
⚫ It allows for a quick visual interpretation of the data
⚫ It enables the determination of the major characteristics of the data set, including where the data are concentrated or clustered
Class intervals and class boundaries
$$\text{Width of interval} = \frac{\text{range}}{\text{number of desired class groupings}}$$
⚫ Usually at least 5 but no more than 15 groupings
⚫ Class boundaries never overlap
⚫ Round up the interval width to get desirable endpoints
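Applying the width formula to the 20-value example (range 46, 5 classes) gives the width of 10 used earlier:

```latex
\text{Width of interval}
  = \frac{\text{range}}{\text{number of desired class groupings}}
  = \frac{58 - 12}{5} = \frac{46}{5} = 9.2 \;\Rightarrow\; 10 \quad \text{(rounded up)}
```

Starting the first class at 10, a width of 10 gives the boundaries 10, 20, 30, 40, 50, 60, so every value from 12 to 58 falls in exactly one class.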
2.3. The histogram
⚫ A graph of the data in a frequency distribution
is called a histogram
⚫ The class boundaries (or class midpoints) are
shown on the horizontal axis
⚫ The vertical axis is either frequency, relative
frequency, or percentage
⚫ Bars of the appropriate heights are used to
represent the number of observations within each
class
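Without a plotting library, the idea of "bars of the appropriate heights" can be sketched as a sideways text histogram, one bar per class with length equal to its frequency. This is only an illustration; `text_histogram` is a hypothetical helper, and the counts are those of the 20-value example. In a drawn histogram the bars would be vertical and touch each other.

```python
def text_histogram(boundaries, counts, symbol="#"):
    """One horizontal bar per class; bar length equals the class frequency."""
    lines = []
    for lo, hi, count in zip(boundaries, boundaries[1:], counts):
        lines.append(f"{lo:>3}-{hi:<3}| {symbol * count} ({count})")
    return lines

boundaries = [10, 20, 30, 40, 50, 60]   # class boundaries from the example
counts = [3, 6, 5, 4, 2]                # observations per class
for line in text_histogram(boundaries, counts):
    print(line)
```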
Example
[Figure: histogram; frequency on the vertical axis, class midpoints on the horizontal axis (no gaps between bars)]
Polygon
[Figure: frequency polygon; class frequency on the vertical axis, class midpoints on the horizontal axis]
(In a percentage polygon, the vertical axis would be defined to show the percentage of observations per class.)
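A frequency polygon joins the points (class midpoint, frequency) and is usually closed by adding an empty class at each end, which is why the horizontal axis runs from 5 to 65 for classes 10 to 60. A minimal sketch, assuming equal class widths; `frequency_polygon` is a hypothetical helper:

```python
def frequency_polygon(boundaries, counts):
    """Return (midpoint, frequency) vertices, closed with empty classes at both ends."""
    width = boundaries[1] - boundaries[0]          # assumes equal class widths
    mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    # Add a zero-frequency midpoint one class width beyond each end
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(counts) + [0]
    return list(zip(xs, ys))

points = frequency_polygon([10, 20, 30, 40, 50, 60], [3, 6, 5, 4, 2])
print(points)
# [(5.0, 0), (15.0, 3), (25.0, 6), (35.0, 5), (45.0, 4), (55.0, 2), (65.0, 0)]
```

For a percentage polygon, each count would be replaced by `100 * count / sum(counts)` before building the vertices.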
2.4. The ogive (cumulative % polygon)
Class                 Lower class boundary  Cumulative percentage
Less than 10          0                     0
10 but less than 20   10                    15
20 but less than 30   20                    45
30 but less than 40   30                    70
40 but less than 50   40                    90
50 but less than 60   50                    100
[Figure: Ogive: Daily High Temperature; cumulative percentage on the vertical axis, class boundaries 10 to 60 on the horizontal axis]
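Each ogive point pairs a class boundary with the percentage of observations below it: 0% of the values are less than 10, 15% are less than 20, and so on up to 100% less than 60. A minimal sketch, assuming the class frequencies 3, 6, 5, 4, 2 of the example; `ogive_points` is a hypothetical helper:

```python
from itertools import accumulate

def ogive_points(boundaries, counts):
    """Cumulative % of observations below each class boundary."""
    n = sum(counts)
    cum = [0] + list(accumulate(counts))      # nothing lies below the first boundary
    return [(b, 100 * c / n) for b, c in zip(boundaries, cum)]

points = ogive_points([10, 20, 30, 40, 50, 60], [3, 6, 5, 4, 2])
print(points)
# [(10, 0.0), (20, 15.0), (30, 45.0), (40, 70.0), (50, 90.0), (60, 100.0)]
```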
Distribution shape
⚫ The shape of the distribution is said to be
symmetric if the observations are balanced, or
evenly distributed, about the center
2.5. Scatter diagrams
⚫ Scatter diagrams are used to examine possible relationships between two numerical variables
Example data:
Volume per day (x)   Paired value (y)
29                   151
33                   160
38                   167
41                   185
42                   170
50                   188
55                   195
60                   200
[Figure: scatter diagram; volume per day on the horizontal axis, paired value on the vertical axis]
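A scatter diagram places one mark per (x, y) pair. As a rough text-only sketch (not a substitute for a plotting library), the example pairs can be drawn on a character grid; `text_scatter` and its grid steps are hypothetical choices, and the y values are assumed to lie between `y_min` and `y_max`.

```python
def text_scatter(points, x_step=2, y_step=10, y_min=150, y_max=200):
    """Plot (x, y) pairs on a character grid; row 0 is the top (largest y)."""
    n_cols = max(x for x, _ in points) // x_step + 1
    n_rows = (y_max - y_min) // y_step + 1
    grid = [[" "] * n_cols for _ in range(n_rows)]
    for x, y in points:
        row = (y_max - y) // y_step      # larger y -> closer to the top
        col = x // x_step                # larger x -> further right
        grid[row][col] = "*"
    return ["".join(r).rstrip() for r in grid]

# (volume per day, paired value) pairs from the example
pairs = [(29, 151), (33, 160), (38, 167), (41, 185),
         (42, 170), (50, 188), (55, 195), (60, 200)]
for line in text_scatter(pairs):
    print("|" + line)
```

The marks drift upward from left to right, which suggests that the paired value tends to increase with volume per day.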
III. CATEGORICAL DATA
Categorical data
⚫ Tabulating data
⚫ Graphing data
[Figure: bar chart, Investor's Portfolio; categories Savings, CD, Bonds, Stocks; horizontal axis: amount in $1000's]
Pie charts
[Figure: pie chart; Savings 15%, Stocks 42%, CD 14%, Bonds 29%]
Percentages are rounded to the nearest percent.
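Each slice of a pie chart gets a central angle proportional to its share of the total: a share of p percent takes p × 3.6 degrees of the 360-degree circle. A minimal sketch using the rounded shares from the pie chart; `slice_angles` is a hypothetical helper.

```python
# Portfolio shares from the pie chart (percent, rounded to the nearest percent)
shares = {"Savings": 15, "Stocks": 42, "CD": 14, "Bonds": 29}

def slice_angles(percentages):
    """Each slice's central angle is its share of the 360-degree circle."""
    return {name: pct * 360 / 100 for name, pct in percentages.items()}

angles = slice_angles(shares)
print(angles)
# {'Savings': 54.0, 'Stocks': 151.2, 'CD': 50.4, 'Bonds': 104.4}
```

Here the rounded shares happen to total 100%. With other data, rounding can make the shares add up to 99% or 101%, so a charting tool would normally compute angles from the unrounded amounts.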
3.3. Pareto diagram
⚫ Used to portray categorical data (nominal scale)
⚫ A bar chart, where categories are shown in descending order of frequency
⚫ A cumulative polygon is often shown in the same graph
⚫ Used to separate the “vital few” from the “trivial many”
⚫ For example: 400 defective items are examined for cause of defect:
Source of manufacturing error   Number of defects
Bad Weld                        34
Poor Alignment                  223
Missing Part                    25
Paint Flaw                      78
Electrical Short                19
Cracked case                    21
Total                           400
For example: 400 defective items are examined for cause of defect:
⚫ Step 1: Sort by number of defects, in descending order
⚫ Step 2: Determine the % in each category
Source of manufacturing error   Number of defects   % of total defects
Poor Alignment                  223                 55.75
Paint Flaw                      78                  19.50
Bad Weld                        34                  8.50
Missing Part                    25                  6.25
Cracked case                    21                  5.25
Electrical Short                19                  4.75
Total                           400                 100
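Steps 1 and 2, plus the cumulative percentages plotted by the cumulative polygon, can be sketched as below. Bad Weld is taken as 34 defects, the value consistent with the 400 total and its 8.50% share; `pareto_table` is a hypothetical helper.

```python
from itertools import accumulate

# Defect counts; Bad Weld = 34 (8.50% of 400), consistent with the total of 400
defects = {"Bad Weld": 34, "Poor Alignment": 223, "Missing Part": 25,
           "Paint Flaw": 78, "Electrical Short": 19, "Cracked case": 21}

def pareto_table(counts):
    """Sort categories by count (descending) and add % and cumulative %."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    pcts = [100 * c / total for _, c in ordered]
    cum_pcts = list(accumulate(pcts))
    return [(name, c, p, cp) for (name, c), p, cp in zip(ordered, pcts, cum_pcts)]

for name, count, pct, cum in pareto_table(defects):
    print(f"{name:<18}{count:>5}{pct:>8.2f}{cum:>8.2f}")
```

The cumulative column shows the "vital few": the two largest causes, Poor Alignment and Paint Flaw, account for 75.25% of all defects.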
3.3. Pareto diagram example
⚫ Step 3: Show the results graphically
Graphs for Time-series data
⚫ A line chart (time series plot) is used to study patterns in
the values of a variable over time
⚫ Time is measured on the horizontal axis
⚫ The variable of interest is measured on the vertical axis
Cross table or frequency table
⚫ Cross Tables (or contingency tables) list the number of
observations for every combination of values for two
categorical or ordinal variables
⚫ If there are r categories for the first variable (rows) and c
categories for the second variable (columns), the table
is called an r x c cross table
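Building a cross table amounts to counting every (row category, column category) combination. A minimal sketch using `collections.Counter`; the responses below are hypothetical illustration data, not from the slides, and `cross_table` is a hypothetical helper.

```python
from collections import Counter

def cross_table(pairs):
    """Build an r x c table of counts from (row_category, col_category) pairs."""
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    counts = Counter(pairs)                  # combinations never seen count as 0
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table

# Hypothetical responses: (region, registered to vote?)
responses = [("East", "Yes"), ("East", "No"), ("West", "Yes"),
             ("West", "Yes"), ("North", "No"), ("East", "Yes")]
rows, cols, table = cross_table(responses)
print(f"{len(rows)} x {len(cols)} cross table")   # 3 x 2 cross table
for r in rows:
    print(r, [table[r][c] for c in cols])
```

Listing combinations that never occur (such as North/Yes here) with a count of 0 keeps the table a full r x c grid.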
Tabulating and graphing multivariate categorical data
⚫ Contingency table for investment choices by investors (values in $1000’s)
⚫ Side-by-side bar charts
[Figure: "Comparing Investors", side-by-side bars for Savings, CD, Bonds, Stocks]
[Figure: side-by-side bars for East, West and North in each quarter (1st to 4th Qtr)]