3. Variables & Chart
⧫ An item of data
⧫ Examples:
– gender
– test scores
– weight
⧫ Value varies from one observation to
another
Types/Classifications of Variables
⧫ Qualitative
⧫ Quantitative
– Discrete
– Continuous
Qualitative Data
⧫ Describes the quality
⧫ Non-numerical format
⧫ Counts
⧫ Cannot order or measure
⧫ Examples
– gender
– marital status
– geographical region
– job title…
Categorical data
⧫ Non-overlapping categories or
characteristics
⧫ Examples:
– Completes/Incompletes
– Professions
– Gender
Quantitative Data
⧫ Frequencies
⧫ Measurements
Discrete
⧫ Measurements are integers
⧫ Examples:
– number of employees of a company
– number of incorrect answers on a test
– number of participants in a program…
Continuous
⧫ Measurements can take on any value -
usually within some range
⧫ Examples:
– Age
– Income
⧫ Arithmetic operations such as differences
and averages make sense.
Qualitative or Quantitative?
Discrete or Continuous?
⧫ Score on a placement exam
⧫ Preferred restaurant
⧫ Dollar amount of a loan
⧫ Height
⧫ Salary
⧫ Length of time to complete a task
⧫ Number of applicants
⧫ Ethnic origin
Treatment as Ranks
⧫ Natural order
⧫ Not strictly measured
⧫ Examples:
– Age group
– Likert Scale data
⧫ Distinction between adjacent points on the
scale is not necessarily the same
Analysis
Qualitative Data
⧫ Frequency tables
⧫ Mode - the most frequently occurring category
⧫ Graphs: Bar Charts and Pie Charts
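A frequency table and the mode for a qualitative variable can be sketched in a few lines of Python. The marital-status observations below are made up for illustration; only the technique (count each category, then pick the most common one) comes from the slide.

```python
from collections import Counter

# Hypothetical observations of a qualitative variable (marital status).
statuses = ["single", "married", "married", "divorced",
            "single", "married", "widowed", "married"]

# Frequency table: how often each category occurs.
freq = Counter(statuses)

# Relative frequencies: each count as a share of all observations.
rel_freq = {category: count / len(statuses) for category, count in freq.items()}

# Mode: the most frequently occurring category.
mode, mode_count = freq.most_common(1)[0]

for category, count in freq.most_common():
    print(f"{category:<10}{count:>3}{rel_freq[category]:>8.1%}")
print("Mode:", mode)
```

The relative frequencies are the shares a pie chart would display as segments, and the counts are the bar heights of a bar chart.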
Analysis
Quantitative Data
⧫ Any form
⧫ Create groups or categories and generate
frequency tables
⧫ All descriptive statistics
Effective Graphs:
Quantitative Data
⧫ Histograms
⧫ Stem-and-Leaf plots
⧫ Dot Plots
⧫ Box plots
⧫ XY Scatter Plots (2 variables).
Examples of Graphs
Pie Chart
[Figure: pie chart "Performance Appraisals" with segments Much Easier, Easier, Same, Difficult, and More Difficult; visible percentage labels 10%, 38%, 14%, 33%]
[Figure: chart of East, West, and North series by quarter (1st Qtr to 4th Qtr); vertical axis 0 to 90]
Histogram
[Figure: histogram of Score (horizontal axis, labels 49 to 99) against Frequency (vertical axis, 0 to 12)]
Boxplot
[Figure: boxplot of C1; axis from 20 to 110]
Stem and Leaf Plot
Weight of Meat (stem | leaves)
 7 | 5
 8 | 3
 8 | 7999
 9 | 23
 9 | 66789
10 |
10 | 688
11 | 2244
11 | 788
12 | 4
12 | 8
13 |
13 | 8
14 | 1
Analyze Ranked Data
⧫ Frequency tables
⧫ Mode, Median, Quartiles
⧫ Graphs:
– Bar Charts
– Dot Plots, Pie Charts
– Line Charts (2 variables)
Data Example
Suggest some ways you could analyze these items.
Pie Charts
⧫ Segment - percentage of the whole that falls into each category
[Figure: pie chart "Native Language" with segments including English and Spanish; visible percentage labels 55% and 25%]
Bar Charts
⧫ Bar charts - % in various categories
⧫ Vertical scale - frequencies, relative frequencies
⧫ Horizontal scale - categories
⧫ Allows comparisons
[Figure: bar chart "Average Units Sold (per person) by Product"; vertical axis Average Sold/Person (0 to 20); horizontal axis Product; series Before Training and After Training]
Constructing Bar Charts
⧫ All bars should have the same width
⧫ Leave gaps between the bars - the categories are not connected
⧫ The bars can appear in any order
⧫ Can be used to represent two categorical variables
simultaneously
Graphs: Measured
Continuous Quantitative Data
⧫ Histograms
⧫ Stem and Leaf
⧫ Box plots
⧫ Line Graphs
⧫ XY Scatter Charts (2 variables)
Histograms
⧫ Frequency distributions of continuous variables
⧫ Intervals - generally the same length
⧫ Number of values in each interval - class frequency
⧫ Relative frequencies
[Figure: histogram "Grade Distribution"; horizontal axis Grade (labels 59, 69, 79, 89, 99); vertical axis Frequency (0 to 12)]
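The class frequencies behind a histogram can be tallied by hand. The sketch below assumes equal-width classes of 10 points (50-59 up to 90-99), which is only a guess based on the axis labels of the grade histogram, and the grades themselves are hypothetical.

```python
from collections import Counter

# Hypothetical grades; the real data behind the slide's histogram is not given.
grades = [52, 58, 63, 67, 68, 71, 74, 75, 77, 79, 82, 85, 88, 91, 96]

def class_of(grade):
    """Label of the equal-width class (50-59, 60-69, ...) containing the grade."""
    lower = (grade // 10) * 10
    return f"{lower}-{lower + 9}"

# Class frequency: the number of values that fall in each interval.
class_freq = Counter(class_of(g) for g in grades)

for label in sorted(class_freq):
    count = class_freq[label]
    # Relative frequency = class frequency / total number of values.
    print(f"{label}  {count:>2}  {count / len(grades):.2f}  {'*' * count}")
```

The row of asterisks is a rough text histogram: each bar's length is the class frequency.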
XY Scatter Chart
⧫ Two variables
⧫ Variables: quantitative and continuous.
⧫ Plot pairs - rectangular coordinate system
⧫ Examine the relationship between two variables
[Figure: scatter chart "Absent by Age"; horizontal axis Age (0 to 70); vertical axis Days Absent (0 to 20)]
Line Chart
⧫ Similar to the scatter chart
⧫ Values of the independent variable (shown on the horizontal axis) can be ranked values (i.e. they do not have to be continuous variables).
[Figure: line chart "1997 Monthly Sales"; horizontal axis Month (Jan to June); vertical axis Sales (x$10,000), 125 to 170]
Basic Principles for Constructing
All Plots
⧫ Data should stand out clearly from
background
⧫ The information should be clearly labeled
– title
– axes, bars, pie segments, etc. - include units that
are needed to interpret data
– scale including starting points.
Principles cont.
⧫ Source
⧫ No clutter
⧫ Minimize information or data on one graph.
⧫ Try several approaches
Describing Data
⧫ Shape of the Distribution
– Symmetry
– Skewness
– Modality: number of peaks (modes, the most frequently occurring values)
– Unimodal, bimodal, or uniform
[Figures: three histograms of Frequency (0 to 12) against Grade (labels 59 to 99), titled Right Skewed, Left Skewed, and Symmetrical]
Describing Data
⧫ Centrality
⧫ Spread
⧫ Extreme values
Measures of Centrality
⧫ Mean
⧫ Median
⧫ Mode
Mean
⧫ Most common measure
⧫ Extremely large values in a data set will
increase the value of the mean
⧫ Extremely low values will decrease it.
Calculating the Mean
         T1    T2    T3
         85    85    85
         90    90    90
         75    35    75
         90    90   110
Sum     340   300   360
Mean     85    75    90
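The three columns can be checked with a short Python sketch. The scores are taken from the table; the dictionary layout and variable names are just one convenient way to hold them.

```python
from statistics import mean

# Scores from the table, one list per column.
tests = {
    "T1": [85, 90, 75, 90],
    "T2": [85, 90, 35, 90],   # one extremely low value (35)
    "T3": [85, 90, 75, 110],  # one extremely high value (110)
}

# Mean = sum of the values divided by the number of values.
means = {name: mean(scores) for name, scores in tests.items()}

for name, scores in tests.items():
    print(name, "sum =", sum(scores), "mean =", means[name])
```

Changing a single score from 75 to 35 pulls the mean down from 85 to 75, and changing 90 to 110 pushes it up to 90, which is the sensitivity to extreme values noted on the Mean slide.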
Median
⧫ Central point
⧫ Half of the data has a lower value than the median
⧫ Half of the data has a higher value than the
median
⧫ Not affected by extremely large or small
values
Find the Median
85 90 75 92 95 Data
75 85 90 92 95 Sorted Data
Median is 90.
Find the Median
95 90 92 85 Data
85 90 92 95 Sorted Data
Median:
(90 + 92)/2 = 91
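Both median examples follow the same procedure: sort the data, then take the middle value (odd count) or the average of the two middle values (even count). A minimal sketch, where `median_by_hand` is a name chosen here for illustration:

```python
from statistics import median

odd_scores = [85, 90, 75, 92, 95]   # five values
even_scores = [95, 90, 92, 85]      # four values

def median_by_hand(values):
    """Sort, then take the middle value or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_hand(odd_scores))    # 90
print(median_by_hand(even_scores))   # 91.0

# The standard library gives the same answers.
print(median(odd_scores), median(even_scores))
```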
Measures of Spread
Range
⧫ Subtract the smallest value from the largest
⧫ Report the smallest and largest values.
85 90 75 92 95 Scores
Range: 75 to 95
or 20
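The same range can be computed directly; this sketch reports it both ways, as the pair of extremes and as their difference.

```python
scores = [85, 90, 75, 92, 95]

smallest, largest = min(scores), max(scores)
data_range = largest - smallest  # largest minus smallest

print(f"Range: {smallest} to {largest}, or {data_range}")
```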
Variance/Standard Deviation
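⧫ Measure the typical variation of the data values around
their mean
⧫ Variance: the average squared deviation from the mean
⧫ Standard deviation: the square root of the variance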
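As a worked sketch, the variance and standard deviation of four test scores (85, 90, 75, 90, mean 85) can be computed step by step. The slides do not say whether the population form (divide by n) or the sample form (divide by n - 1) is intended, so both are shown; that choice is an assumption left to the reader.

```python
from math import sqrt
from statistics import pvariance, variance

scores = [85, 90, 75, 90]
n = len(scores)
mean_score = sum(scores) / n

# Squared deviation of each value from the mean: 0, 25, 100, 25.
squared_devs = [(x - mean_score) ** 2 for x in scores]

population_var = sum(squared_devs) / n       # divide by n
sample_var = sum(squared_devs) / (n - 1)     # divide by n - 1

print("population variance:", population_var, "sd:", round(sqrt(population_var), 2))
print("sample variance:    ", sample_var, "sd:", round(sqrt(sample_var), 2))

# The standard library agrees with the hand calculation.
print(pvariance(scores), variance(scores))
```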
The Empirical Rule
⧫ Symmetrical (bell-shaped) Data
⧫ Approximately:
68% of the data values are within one standard
deviation of the mean
95% of the data values are within two standard
deviations of the mean
99.7% of the data values are within three standard
deviations of the mean
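For bell-shaped data the shares within one, two, and three standard deviations come out at roughly 68%, 95%, and 99.7%. The sketch below checks this on simulated data; the normal distribution with mean 100 and standard deviation 15, the sample size, and the seed are all arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable
data = [random.gauss(100, 15) for _ in range(10_000)]

m, s = mean(data), pstdev(data)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return sum(abs(x - m) <= k * s for x in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.3f}")
```

The printed shares are sample estimates, so they vary slightly around the theoretical figures.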
Chebyshev’s Inequality
⧫ Any data, including skewed data
⧫ At least:
75% of the data values are within two standard
deviations of the mean.
89% of the data values are within three standard
deviations of the mean.
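Chebyshev's inequality guarantees that at least 1 - 1/k^2 of the values lie within k standard deviations of the mean, for any k greater than 1 and any shape of distribution. A minimal sketch of that bound (the function name is chosen here for illustration):

```python
def chebyshev_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(f"within {k} sd: at least {chebyshev_lower_bound(k):.1%}")
```

For k = 2 the bound is 75% and for k = 3 it is about 89%. For k = 1 the inequality guarantees nothing, which is why the function rejects it.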
Measures of Relative Standing
⧫ Percentiles
⧫ Quartiles
Quartiles
⧫ The lower quartile is the same as the 25th
percentile.
– 25% of the scores are lower and
– 75% of the scores are higher than the lower
quartile.
⧫ The upper quartile is the same as the 75th
percentile.
– 75% of the scores are lower and
– 25% of the scores are higher than the upper
quartile.
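Quartiles can be computed with Python's standard library. The scores below are hypothetical, and `statistics.quantiles` uses its default "exclusive" interpolation method; other textbooks and software packages use slightly different conventions and may give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical scores (11 values).
scores = [55, 60, 62, 68, 70, 75, 78, 81, 85, 90, 96]

# n=4 splits the data into four equal parts: lower quartile, median, upper quartile.
lower_q, middle, upper_q = quantiles(scores, n=4)

below_lower = sum(x < lower_q for x in scores) / len(scores)
below_upper = sum(x < upper_q for x in scores) / len(scores)

print("lower quartile (25th percentile):", lower_q)
print("median (50th percentile):        ", middle)
print("upper quartile (75th percentile):", upper_q)
print(f"share below lower quartile: {below_lower:.0%}")
print(f"share below upper quartile: {below_upper:.0%}")
```

With so few values the shares below each quartile only approximate 25% and 75%.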
Correlation
Describes the strength of the relationship
between two (or more) variables
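[Figure: scatter plot against X; horizontal axis 0 to 6, vertical axis 0 to 4]
Negative Relationship
⧫ r will be a negative number.
[Figure: scatter plot of Y against X showing a negative relationship; horizontal axis 0 to 6, vertical axis 0 to 8]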
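The correlation coefficient r can be sketched as follows, assuming r here means the Pearson correlation coefficient (the slides do not name it). The paired values are hypothetical and chosen so that y falls as x rises.

```python
from math import sqrt

# Hypothetical paired observations where y tends to fall as x rises.
x = [1, 2, 3, 4, 5]
y = [7, 6, 4, 3, 1]

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of xs and ys scaled to lie in [-1, 1]."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_dev = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    dev_x = sum((a - mean_x) ** 2 for a in xs)
    dev_y = sum((b - mean_y) ** 2 for b in ys)
    return co_dev / sqrt(dev_x * dev_y)

r = pearson_r(x, y)
print(round(r, 3))  # negative, close to -1
```

A value near -1 indicates a strong negative relationship, a value near +1 a strong positive one, and a value near 0 little or no linear relationship.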