Visualization
Business Analytics
Chapter 4: Data Visualization
Instructor: Hamed Qahri-Saremi, PhD
Visualizing Categorical and Numerical Variables
Methods to Visualize Categorical Variables
• A categorical variable consists of observations that represent labels or names.
– A bar chart depicts the frequency or relative frequency for each category of the categorical
variable.
• Horizontal or vertical bars
• Lengths proportional to the values they are depicting
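The frequency and relative frequency behind a bar chart can be sketched outside JMP as well. The following is a minimal Python illustration using only the standard library; the category labels and the helper name `frequency_table` are hypothetical and do not come from any course dataset.

```python
from collections import Counter

# Hypothetical category labels; not taken from any course dataset.
responses = ["Analyst", "Diplomat", "Explorer", "Diplomat", "Sentinel",
             "Diplomat", "Explorer", "Analyst", "Diplomat", "Sentinel"]

def frequency_table(values):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.items()}

table = frequency_table(responses)

# Text bar chart: each bar's length is proportional to the frequency.
for cat, (n, rel) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    print(f"{cat:<10} {'#' * n} {n} ({rel:.0%})")
```

Each printed row plays the role of one horizontal bar, so the longest bar marks the most common category.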
Methods to Visualize Categorical Variables
• Example: Myers-Briggs assessment of employees
Exercise 1 (Soft Drinks)
1. Import the SoftDrinks.xlsx dataset.
2. Use JMP: Analyze ► Distribution
Exercise 2 (Transit Survey)
• Example 3.1, page 83
– What is the most common commuting mode at the university?
– Open the Transit Survey data file
– Use JMP: Analyze ► Distribution
Methods to Visualize Numerical Variables
• A histogram is a series of rectangles where the width and height of each rectangle
represent the interval width and frequency (or relative frequency) of the respective
interval.
• A histogram provides information about the shape of the distribution.
– Symmetric: mirror image of itself on both sides of its center
– Skewed: positive (elongated right tail) or negative (elongated left tail)
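The counting step behind a histogram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the return values are hypothetical, the helper name `histogram_counts` is invented here, and intervals are treated as including their lower limit and excluding their upper limit, which may differ from the convention used in the textbook or in JMP.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width intervals [lo, hi) beginning at start."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)   # index of the interval containing v
        if 0 <= i < n_bins:             # values outside the range are ignored
            counts[i] += 1
    return counts

# Hypothetical annual returns (in %); not the Growth or Value data.
returns = [-12.0, 3.5, 7.2, 8.1, 11.0, 14.6, 15.3, 18.9, 22.4, 37.8]
counts = histogram_counts(returns, start=-20, width=10, n_bins=6)

# Text histogram: one row per interval, with frequency and relative frequency.
for i, n in enumerate(counts):
    lo = -20 + i * 10
    print(f"[{lo:>4}, {lo + 10:>3}) {'#' * n:<5} {n} ({n / len(returns):.0%})")
```

Changing `width` here corresponds to Set Bin Width in JMP: wider intervals smooth the shape, narrower ones show more detail.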
Methods to Visualize Numerical Variables
• Example: Histogram for the annual returns (in %) for Growth
Exercise 3 (Growth Value)
Construct a frequency distribution and a histogram using JMP for the annual returns (in %) for Value (Example 3.2, page 88)
1. Analyze ► Distribution
2. Click the red triangle next to “Value”:
1. Histogram Options ► Show Percents;
2. Histogram Options ► Set Bin Width
Exercise 4 (Audit Times)
Create a frequency distribution and a histogram for the dataset “AuditData.xlsx”
1. Reformat the dataset, if needed.
1. Select the columns. Tables ► Stack.
2. If needed, delete the unnecessary columns and rows (select, right click, delete) and rename the new
stacked column.
2. Analyze ► Distribution
3. Click the red triangle next to “Year-End Audit Times (in Days)”:
1. Histogram Options ► Show Counts;
2. Histogram Options ► Set Bin Width
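The reformatting step (stacking several columns into one) can be pictured with a small Python sketch. It is not JMP's implementation; the column names, the values, and the helper `stack_columns` are hypothetical, and the actual layout of AuditData.xlsx may differ.

```python
def stack_columns(rows, columns, label_name="Label", data_name="Data"):
    """Turn the listed wide columns into (label, value) pairs, one row per value."""
    stacked = []
    for row in rows:
        # Columns that are not being stacked are carried along unchanged.
        kept = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            stacked.append({**kept, label_name: col, data_name: row[col]})
    return stacked

# Hypothetical wide table; not the contents of AuditData.xlsx.
wide = [
    {"Row": 1, "Col A": 12, "Col B": 14},
    {"Row": 2, "Col A": 15, "Col B": 19},
]
long_rows = stack_columns(wide, ["Col A", "Col B"])

for r in long_rows:
    print(r)
```

After stacking, all values sit in a single data column, which is the shape the Distribution platform needs to build one histogram.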
Methods to Visualize Categorical and Numerical Variables
• The possibility exists for unintentional, as well as purposeful, distortions of graphical
information.
• Follow these basic guidelines.
– The simplest graph should be used for a given set of data.
– Axes should be clearly marked with numbers of their respective scales; each axis should be labeled.
– When creating a bar chart or a histogram, each bar/rectangle should be of the same width.
– The vertical axis should not be given a very high value as its upper limit, and it should not be stretched.
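The effect of the vertical axis limits can be checked with simple arithmetic. The sketch below is a hypothetical illustration: the two values, the axis limits, and the helper names `drawn_height` and `visual_gap` are all assumptions made for this example.

```python
def drawn_height(value, axis_min, axis_max, plot_height=100.0):
    """Height at which value is drawn on an axis from axis_min to axis_max."""
    return (value - axis_min) / (axis_max - axis_min) * plot_height

def visual_gap(a, b, axis_min, axis_max):
    """Drawn distance between two values, in the same units as plot_height."""
    return drawn_height(b, axis_min, axis_max) - drawn_height(a, axis_min, axis_max)

before, after = 50.0, 53.0  # hypothetical values: a 6% increase

axes = {
    "moderate axis (0 to 60)": (0, 60),
    "very high upper limit (0 to 1000)": (0, 1000),
    "stretched axis (48 to 54)": (48, 54),
}
for label, (lo, hi) in axes.items():
    print(f"{label:<36} gap = {visual_gap(before, after, lo, hi):.1f} of 100")
```

The same 6% increase occupies a sliver of the plot when the upper limit is very high and half of the plot when the axis is stretched, which is why both choices can mislead.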
Scale Matters
[Figure labels: 12%, 1.5%]
Visualizing the Relationship Between Two Variables
Visualize The Relationship Between Two Variables
• Use a contingency table to examine the relationship between two categorical variables.
– Frequencies for two categorical variables
– Each cell represents a mutually exclusive combination of the pair of values
• Use a stacked column / bar chart or side-by-side column / bar charts to visualize more
than one categorical variable.
– Graphically shows the contingency table
– Allows for the comparison of composition within each category.
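A contingency table is a count of each combination of two categorical values. The following Python sketch shows the idea with hypothetical paired observations; the labels and variable names are assumptions for illustration and are not the coded values of any course dataset.

```python
from collections import Counter

# Hypothetical paired observations: (smoking level, drinking level).
pairs = [("Non", "Non"), ("Non", "Occasional"), ("Occasional", "Occasional"),
         ("Heavy", "Heavy"), ("Non", "Non"), ("Occasional", "Heavy"),
         ("Heavy", "Occasional"), ("Non", "Occasional")]

counts = Counter(pairs)                      # frequency of each combination
row_labels = sorted({r for r, _ in pairs})
col_labels = sorted({c for _, c in pairs})

# One cell per mutually exclusive combination of the two variables.
table = [[counts[(r, c)] for c in col_labels] for r in row_labels]

print(f"{'':<12}" + "".join(f"{c:>12}" for c in col_labels))
for r, row in zip(row_labels, table):
    total = sum(row)
    shares = ", ".join(f"{n / total:.0%}" for n in row)
    print(f"{r:<12}" + "".join(f"{n:>12}" for n in row) + f"   row shares: {shares}")
```

The row shares are what a stacked bar chart displays: the composition of the second variable within each category of the first.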
Visualize The Relationship Between Two Variables
• Example: Myers-Briggs assessment and sex
Exercise 5 (Smoking & Drinking)
The dataset (Smoking Drinking.xlsx) shows the smoking and drinking habits of 8761 adults. Categories are coded: “N” is non, “O” occasional, “H” heavy, “S” smoker, and “D” drinker.
• Analyze ► Tabulate
– Add (drag and drop) statistics to show in the contingency table.
– From the red triangle, add chart (for the contingency table).
Exercise 5 (Smoking & Drinking)
• To construct a stacked bar chart:
1. Graph ► Graph Builder
2. Cast Smoking into the X role; drop Drinking into Overlay.
3. Change Bar Style to Stacked
Visualize The Relationship Between Two Variables
• Use a scatterplot to examine the relationship between two numerical variables.
– Determine whether two numerical variables are related in some systematic way
– Each point represents a paired observation for the two variables
– Refer to one variable as x (x-axis) and the other as y (y-axis)
• Example: life expectancy vs. birth rate by GNI (Gross National Income)
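To make concrete the idea that each point is a paired observation, here is a rough text-mode scatterplot in Python. It is a sketch only: the (x, y) pairs are hypothetical rather than real country data, the helper `text_scatter` is invented for this example, and it assumes the x values are not all equal and the y values are not all equal.

```python
def text_scatter(points, width=20, height=8):
    """Place one '*' per (x, y) pair on a width-by-height character grid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # grid row 0 is the top of the plot
    return ["".join(r) for r in grid]

# Hypothetical pairs: (birth rate, life expectancy).
points = [(10, 80), (15, 76), (20, 72), (30, 65), (40, 58)]
lines = text_scatter(points)

for line in lines:
    print("|" + line)
print("+" + "-" * 20)
```

With these made-up pairs the points fall from the upper left to the lower right, the pattern of a negative relationship between x and y.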
Exercise 8 (Billionaires)
• Construct a bubble plot to show the percentage of billionaires per 10 million residents against per capita income by country
– Graph ► Bubble Plot
Exercise 9 (Kirkland Regional)
• Construct a line chart for Sales in north and south regions (KirklandRegional.xlsx)
1. Adjust the data format
1. Tables ► Stack
2. Recode Data for Region
3. Remove unnecessary columns (if needed)
2. Graph ► Graph Builder
Exercise 9 (Kirkland Regional)
• Construct a bar chart for Sales in north and south regions (KirklandRegional.xlsx)
1. Adjust the data format
2. Graph ► Graph Builder
3. Set Bar Style as Stacked or Side by Side
Other Data Visualization Methods
• A heat map uses color or color intensity to display relationships between variables.
– Example: bookstore and book type
Exercise 10 (Same Store Sales)
• Construct a heat map to plot sale change (in %) against month for each location
(SameStoreSales.xlsx)
1. Format data: Tables ► Stack
2. Graph ► Graph Builder
3. See next slide for graph setup
Exercise 10 (Same Store Sales) Solution
1. Month: X
2. SaleChange: Y
3. SaleChange: Color
4. Location: Group Y
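The mapping from values to color intensity in a heat map can be sketched in plain Python. The records below are hypothetical and are not the contents of SameStoreSales.xlsx; the `SHADES` characters stand in for colors, and the helper `shade` is an assumption made for this illustration.

```python
SHADES = ".:*#"  # lightest to darkest, standing in for color intensity

def shade(value, lo, hi):
    """Map value in [lo, hi] to one of the SHADES characters."""
    frac = (value - lo) / (hi - lo)
    idx = min(int(frac * len(SHADES)), len(SHADES) - 1)
    return SHADES[idx]

# Hypothetical stacked (long-format) records: (location, month, sale change in %).
records = [("Store A", "Jan", -4.0), ("Store A", "Feb", 0.0), ("Store A", "Mar", 4.0),
           ("Store B", "Jan", 2.0), ("Store B", "Feb", -2.0), ("Store B", "Mar", 1.0)]
months = ["Jan", "Feb", "Mar"]

cells = {(loc, month): change for loc, month, change in records}
lo, hi = min(cells.values()), max(cells.values())

# One row of shaded cells per location, one column per month.
heat_rows = {}
for loc in sorted({loc for loc, _, _ in records}):
    heat_rows[loc] = "".join(shade(cells[(loc, m)], lo, hi) for m in months)
    print(f"{loc:<8} {heat_rows[loc]}")
```

Darker characters mark larger sale changes, so a glance across a row shows how one location changed from month to month.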