Lecture 2
NOTE: for the project, do not choose aggregated data (such as province-level averages)
Types of quantitative variables
• Reminder: A quantitative variable is one that takes on numerical
values
• Quantitative variables can be discrete or continuous
• Discrete: Takes a countable number of values
• E.g. number of cats in your household: an exact count such as 2, 3, 4
• Continuous: Takes an uncountable (and infinite) number of values
• E.g. length of your cat’s tail: a decimal number such as 1.4, 3.1, etc.
NOTE: a variable with an unlimited number of possible values is still
discrete if those values are exact (rounded) numbers
Exploring distributions
• Exploratory data analysis: examining the main features of a
set of data
• Distribution of a variable: the values the variable takes and how often
it takes them
• Categorical variable: list each category and show a count or percent of the
cases that fall in each category
• Quantitative variable: give ranges of values for the variable and show how
often cases have values falling in each range
Displaying distributions with graphs
• Ways to display categorical data
• Bar graphs
• Pie charts
• Ways to display quantitative data
• Histograms
• Stemplots
• Time plots
Categorical variables - Distributions
• Example: marital status
• Values of the variable: Married,
Never married, Divorced,
Widowed
• We can summarize the data in
table form
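As a rough sketch of what "table form" means here, the snippet below tallies a count and a percent for each category. The responses are invented for illustration (they are not course data), and the helper name frequency_table is hypothetical.

```python
from collections import Counter

# Hypothetical survey responses, invented purely for illustration.
responses = [
    "Married", "Never married", "Married", "Divorced",
    "Widowed", "Married", "Never married", "Married",
]


def frequency_table(values):
    """Return (category, count, percent) rows in first-seen order."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.items()]


# Print one row per category: name, count, percent of all cases.
for category, count, percent in frequency_table(responses):
    print(f"{category:<15}{count:>5}{percent:>8.1f}%")
```

Each row is one category of the variable; the counts add up to the number of cases and the percents add up to 100.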
Bar graphs for categorical variables
• Categories are listed on the
horizontal axis
• The height of the bar above each
category represents the count
(or sometimes percentage) of
observations in that category
• The categories in the graph can
be ordered any way we want
(alphabetical, by increasing
value, chronological, etc.)
Pie charts for categorical variables
• The area of the pie dedicated to
each category represents the
proportion of observations
falling in that category
• The categories in the pie chart
are exhaustive – the percentages
add up to 100%
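One way to see why the slices must add up to 100% is to compute them directly. This is a minimal sketch with made-up counts; pie_slices is a hypothetical helper, and the slice angle is simply the category's share of a full 360 degree circle.

```python
def pie_slices(counts):
    """Map each category to (percent of cases, slice angle in degrees)."""
    total = sum(counts.values())
    return {
        cat: (100 * n / total, 360 * n / total)
        for cat, n in counts.items()
    }


# Hypothetical counts, invented purely for illustration.
counts = {"Married": 4, "Never married": 2, "Divorced": 1, "Widowed": 1}
slices = pie_slices(counts)

for cat, (percent, angle) in slices.items():
    print(f"{cat:<15}{percent:>6.1f}%{angle:>8.1f} degrees")
```

Because every case falls in exactly one category, the percents sum to 100 and the angles sum to 360.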
Pareto charts
• A Pareto chart is a bar chart sorted by frequency (most frequent category first)
• Example: accidents per day of the week
The Pareto chart on the left is easier to read than the chronologically ordered
bar chart on the right.
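The reordering behind a Pareto chart can be sketched in a few lines. The accident counts below are invented for illustration and are not the data from the slide; pareto_order is a hypothetical helper, and the "#" bars are only a text stand-in for a real bar chart.

```python
# Hypothetical accident counts per day, listed chronologically.
accidents = {
    "Mon": 12, "Tue": 9, "Wed": 7, "Thu": 10,
    "Fri": 18, "Sat": 15, "Sun": 5,
}


def pareto_order(counts):
    """Return (category, count) pairs sorted from most to least frequent."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Draw a rough text bar chart in Pareto order.
for day, n in pareto_order(accidents):
    print(f"{day:<4}{'#' * n} {n}")
```

Sorting puts the days with the most accidents first, so the largest bars can be picked out without scanning the whole chart.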
Displaying distributions – quantitative variables
• Histograms and stemplots
• These graphs summarize the distribution of a single quantitative variable.
• Time plots
• These are graphs of a single variable measured at multiple points in time. A
line connecting the points emphasizes the changes occurring over time.
Histograms
The range of the data is divided into
non-overlapping, equal-width
classes (bins) that cover the full
range of values.
How do we pick the number and width of bins? We will cover this next class.
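Until bin selection is covered, the binning itself can be sketched as follows. The tail lengths are made up, histogram_counts is a hypothetical helper, and the choice of 4 bins is arbitrary rather than a recommended rule. The sketch assumes the data are not all equal, so the bin width is not zero.

```python
def histogram_counts(data, num_bins):
    """Count values in num_bins equal-width, non-overlapping bins.

    The bins span [min(data), max(data)]. Assumes the values are not all
    equal (otherwise the bin width would be zero).
    """
    low, high = min(data), max(data)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The maximum value goes into the last bin, not a bin of its own.
        index = min(int((x - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical tail lengths in cm, invented purely for illustration.
tail_lengths = [20.0, 21.5, 23.0, 24.5, 25.0, 26.0, 27.5, 28.0, 29.5, 30.0]
edges, counts = histogram_counts(tail_lengths, num_bins=4)

for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {'#' * n} {n}")
```

Every value lands in exactly one bin, so the bin counts always add up to the number of observations.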
For next class
• Complete the survey, if you haven’t done it yet
• Reading: Alwan 3.1, 1.2; the rest of Krauth Chapter 2
• For practice: Alwan Questions 1.7, 1.13, 1.27, 1.28