Visualization

Chapter 4 of the Business Analytics course focuses on data visualization techniques for both categorical and numerical variables. It discusses methods such as frequency distributions, bar charts, histograms, and scatterplots, along with practical exercises using datasets. The chapter emphasizes the importance of clear graphical representation and guidelines to avoid distortions in visual data presentation.

Uploaded by

manojpruthvi650
Copyright: © All Rights Reserved

CIS 370

Business Analytics
Chapter 4: Data Visualization
Instructor: Hamed Qahri-Saremi, PhD
Visualizing Categorical and Numerical Variables
Methods to Visualize Categorical Variables
• A categorical variable consists of observations that represent labels or names.

– Summarize the data with a frequency distribution.


• Group the data into categories and record the number of observations that fall into each category.
• The relative frequency for each category is the proportion of observations in each category. Multiply the
proportions by 100 to get percentages.

– A bar chart depicts the frequency or relative frequency for each category of the categorical
variable.
• Horizontal or vertical bars
• Lengths proportional to the values they are depicting
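As a minimal sketch of the frequency distribution and bar chart described above (outside JMP, in plain Python), the following counts observations per category, converts counts to relative frequencies, and draws text bars whose lengths are proportional to those values. The sample data and the function names are hypothetical and are not taken from the course datasets.

```python
from collections import Counter


def frequency_distribution(observations):
    """Return {category: (frequency, relative_frequency)}, most frequent first."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.most_common()}


def text_bar_chart(distribution, width=40):
    """One horizontal bar per category; bar length is proportional to relative frequency."""
    lines = []
    for cat, (n, rel) in distribution.items():
        bar = "#" * round(rel * width)
        lines.append(f"{cat:<8} {bar} {n} ({rel * 100:.1f}%)")
    return "\n".join(lines)


# Hypothetical commuting modes (illustration only).
modes = ["Bus", "Car", "Bike", "Car", "Bus", "Car", "Walk", "Car"]
dist = frequency_distribution(modes)
print(text_bar_chart(dist))
```

Multiplying each relative frequency by 100 gives the percentage shown at the end of each bar.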
Methods to Visualize Categorical Variables
• Example: Myers-Briggs assessment of employees
Exercise 1 (Soft Drinks)
1. Import the SoftDrinks.xlsx dataset.
2. Use JMP: Analyze ► Distribution
Exercise 2 (Transit Survey)
• Example 3.1, page 83
– What is the most common commuting mode at the university?
– Open the Transit Survey data file
– Use JMP: Analyze ► Distribution
Methods to Visualize Numerical Variables
• A histogram is a series of rectangles where the width and height of each rectangle
represent the interval width and frequency (or relative frequency) of the respective
interval.
• A histogram provides information about the shape of the distribution.
– Symmetric: mirror image of itself on both sides of its center
– Skewed: positive (elongated right tail) or negative (elongated left tail)
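The sketch below illustrates, under stated assumptions, how a histogram's intervals and its shape can be checked numerically. It assumes equal-width intervals that include the lower limit and exclude the upper limit (the convention in a given textbook or tool may differ), and it uses the moment-based skewness coefficient, which is not named in the slides, as a rough indicator of the direction of the tail. The data are hypothetical.

```python
import math


def bin_counts(values, start, width, n_bins):
    """Count observations in n_bins equal-width intervals [lower, upper)."""
    counts = [0] * n_bins
    for v in values:
        idx = math.floor((v - start) / width)
        if 0 <= idx < n_bins:  # values outside the covered range are ignored
            counts[idx] += 1
    return counts


def skewness(values):
    """Moment-based skewness: > 0 suggests a longer right tail, < 0 a longer left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical annual returns in % (illustration only).
returns = [-12, -3, 1, 2, 4, 5, 6, 8, 9, 35]
counts = bin_counts(returns, start=-20, width=10, n_bins=6)
relative = [c / len(returns) for c in counts]
print(counts, relative, skewness(returns))
```

A value near zero is consistent with a symmetric distribution; the histogram itself remains the primary check.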
Methods to Visualize Numerical Variables
• Example: Histogram for the annual returns (in %) for Growth
Exercise 3 (Growth Value)
Construct a frequency distribution and a histogram using JMP for annual returns (in %) for
value (Example 3.2, page 88)
1. Analyze ► Distribution
2. Click the red triangle next to “Value”;
1. Histogram Options ► Show Percents;
2. Histogram Options ► Set Bin Width
Exercise 4 (Audit Times)
Create a frequency distribution and a histogram for the dataset “AuditData.xlsx”
1. Reformat the dataset, if needed.
1. Select the columns. Table ► Stack.
2. If needed, delete the unnecessary columns and rows (select, right click, delete) and rename the new
stacked column.
2. Analyze ► Distribution
3. Click the red triangle next to “Year-End Audit Times (in Days)”;
1. Histogram Options ► Show Counts;
2. Histogram Options ► Set Bin Width
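The reformatting step above turns a wide table (one column per group) into a long table (one row per value). The following is a rough Python sketch of that idea only; it is not how JMP implements Stack, and the column names and values are hypothetical.

```python
def stack(rows, id_column, value_columns, label_name, value_name):
    """Reshape wide rows into long rows: one output row per non-empty cell."""
    stacked = []
    for row in rows:
        for col in value_columns:
            value = row.get(col)
            if value is None:  # drop empty cells, similar to deleting unneeded rows
                continue
            stacked.append(
                {id_column: row[id_column], label_name: col, value_name: value}
            )
    return stacked


# Hypothetical wide layout (illustration only).
wide = [
    {"Firm": 1, "Year 1": 12, "Year 2": 15},
    {"Firm": 2, "Year 1": 20, "Year 2": None},
]
long_rows = stack(wide, "Firm", ["Year 1", "Year 2"], "Period", "Audit Time")
for r in long_rows:
    print(r)
```

After stacking, a single value column can be passed to a distribution or histogram routine.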
Methods to Visualize Categorical and Numerical Variables
• The possibility exists for unintentional, as well as purposeful, distortions of graphical
information.
• Follow these basic guidelines.
– The simplest graph should be used for a given set of data.
– Axes should be clearly marked with the numbers of their respective scales; each axis should be labeled.
– When creating a bar chart or a histogram, each bar/rectangle should be of the same width.
– The vertical axis should not be given a very high value as an upper limit, which compresses the data and hides variation, and it should not be stretched, which makes changes look more pronounced than they are.
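To make the effect of axis choices concrete, here is a small sketch that computes how tall a bar appears as a fraction of the vertical axis. It assumes that one way of stretching an axis is to start it well above zero; that reading, the function names, and the numbers are illustrative assumptions rather than material from the slides.

```python
def apparent_height(value, axis_min, axis_max):
    """Fraction of the vertical axis that a bar of `value` occupies."""
    return (value - axis_min) / (axis_max - axis_min)


def apparent_ratio(a, b, axis_min, axis_max):
    """How many times taller bar b looks than bar a on the given axis."""
    return apparent_height(b, axis_min, axis_max) / apparent_height(a, axis_min, axis_max)


# Hypothetical values: b is 10% larger than a.
a, b = 50.0, 55.0

# Axis from 0 to 60: the visual ratio matches the real ratio.
plain = apparent_ratio(a, b, axis_min=0, axis_max=60)

# Axis from 45 to 60 (stretched): b looks about twice as tall as a.
stretched = apparent_ratio(a, b, axis_min=45, axis_max=60)

# Axis from 0 to 1000 (very high upper limit): the gap nearly vanishes.
flattened_gap = apparent_height(b, 0, 1000) - apparent_height(a, 0, 1000)

print(plain, stretched, flattened_gap)
```

The same 10% difference can look dramatic or negligible depending only on the axis limits.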
Scale Matters

[Figure omitted; surviving labels: 12% and 1.5%]
Visualizing the Relationship Between Two Variables
Visualize The Relationship Between Two Variables
• Use a contingency table to examine the relationship between two categorical variables.
– Frequencies for two categorical variables
– Each cell represents a mutually exclusive combination of the pair of values

• Use a stacked column / bar chart or side-by-side column / bar charts to visualize more
than one categorical variable.
– Graphically shows the contingency table
– Allows for the comparison of composition within each category.
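A contingency table can be sketched in a few lines of Python by counting each combination of the two categorical values; row percentages then give the composition within each category, which is what a stacked chart displays. The categories and counts below are hypothetical.

```python
from collections import Counter


def contingency_table(pairs):
    """Count each (row_category, column_category) combination."""
    return Counter(pairs)


def row_percentages(table):
    """Share of each column category within its row category, in percent."""
    row_totals = Counter()
    for (row, _col), n in table.items():
        row_totals[row] += n
    return {(row, col): 100 * n / row_totals[row] for (row, col), n in table.items()}


# Hypothetical paired observations (illustration only).
pairs = (
    [("Analyst", "Female")] * 3
    + [("Analyst", "Male")] * 1
    + [("Diplomat", "Female")] * 2
    + [("Diplomat", "Male")] * 2
)
table = contingency_table(pairs)
pct = row_percentages(table)
for cell in sorted(table):
    print(cell, table[cell], f"{pct[cell]:.1f}%")
```

Each observation falls into exactly one cell, so the cell counts sum to the number of observations.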
Visualize The Relationship Between Two Variables
• Example: Myers-Briggs assessment and sex
Exercise 5 (Smoking & Drinking)
The dataset (Smoking Drinking.xlsx) shows the smoking and drinking habits of 8761 adults. Categories are coded: “N” is non, “O” occasional, “H” heavy, “S” smoker, and “D” drinker.
• Analyze ► Tabulate
– Add (drag and drop) statistics to show in the contingency table.
– From the red triangle, add chart (for the contingency table).
Exercise 5 (Smoking & Drinking)
• To construct a stacked bar chart:
1. Graph ► Graph Builder
2. Cast Smoking into X role; Drop Drinking into Overlay;
3. Change Bar Style to Stacked
Visualize The Relationship Between Two Variables
• Use a scatterplot to examine the relationship between two numerical variables.
– Determine whether two numerical variables are related in some systematic way
– Each point represents a paired observation for the two variables
– Refer to one variable as x (x-axis) and the other as y (y-axis)

• Once plotted, the graph may reveal, for example, a positive or negative linear relationship, a curvilinear relationship, or no relationship.
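As a numerical companion to a scatterplot, the sketch below computes the Pearson correlation coefficient for paired observations. The coefficient is not mentioned in the slides and is used here only as an assumption-laden illustration: it measures linear association, so a value near zero does not rule out a curved pattern, which is one reason to plot the points. The data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of paired observations (x_i, y_i)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data (illustration only).
xs = [-2, -1, 0, 1, 2]
rising = [1, 3, 5, 7, 9]           # points on an upward-sloping line
falling = [9, 7, 5, 3, 1]          # points on a downward-sloping line
curved = [x ** 2 for x in xs]      # U-shaped pattern

print(pearson_r(xs, rising), pearson_r(xs, falling), pearson_r(xs, curved))
```

The U-shaped data are clearly related to x, yet the linear coefficient is zero.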


Other Data Visualization Methods
• A line chart displays a numerical variable as a series of data points connected by a line.
– A line chart is especially useful for tracking changes or trends over time.
– When multiple lines are plotted in the same chart, we can compare these observations on one or
more dimensions.
Exercise 6 (Growth-Value)
• Construct a scatterplot of Value against Growth (Example 3.4, page 101)
Exercise 7 (Kirkland)
Construct a scatterplot and a line chart for Sales (Kirkland.xlsx)
1. Graph ► Graph Builder
2. Select scatterplot or line chart
Other Data Visualization Methods
• Incorporate a categorical variable within a scatterplot by using different colors or
symbols. This allows you to determine if the relationship between x and y differs across
the values of the categorical variable.
– Example: life expectancy vs. birth rate by country development
Other Data Visualization Methods
• A bubble plot shows the relationship between three numerical variables.
– The third variable is represented by the size of the bubble (points).

• Example: life expectancy vs. birth rate by GNI (Gross National Income)
Exercise 8 (Billionaires)
• Construct a bubble plot to show the percentage of billionaires per 10 million residents against per capita income by country
– Graph ► Bubble Plot
Exercise 9 (Kirkland Regional)
• Construct a line chart for Sales in north and south regions (KirklandRegional.xlsx)
1. Adjust the data format
1. Table ► Stack
2. Recode Data for Region
3. Remove unnecessary columns (if needed)
2. Graph ► Graph Builder
Exercise 9 (Kirkland Regional)
• Construct a bar chart for Sales in north and south regions (KirklandRegional.xlsx)
1. Adjust the data format
2. Graph ► Graph Builder
3. Set Bar Style as Stacked or Side by Side
Other Data Visualization Methods
• A heat map uses color or color intensity to display relationships between variables.
– Example: bookstore and book type
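One way to think about a heat map is as a grid of cells, one per combination of two variables, where each cell's value is mapped to a color intensity. The sketch below builds such a grid from long-format records and rescales values to the range 0 to 1 with min-max scaling; the scaling choice, field names, and data are assumptions for illustration, and an actual tool would map these intensities to a color gradient.

```python
def heat_map_grid(records, row_key, col_key, value_key):
    """Return (row labels, column labels, grid of intensities in [0, 1])."""
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    cell = {(r[row_key], r[col_key]): r[value_key] for r in records}
    low, high = min(cell.values()), max(cell.values())
    span = (high - low) or 1.0  # avoid dividing by zero when all values are equal
    grid = [
        [(cell[(r, c)] - low) / span if (r, c) in cell else None for c in cols]
        for r in rows
    ]
    return rows, cols, grid


# Hypothetical long-format records (illustration only).
records = [
    {"Location": "A", "Month": 1, "Change": -2.0},
    {"Location": "A", "Month": 2, "Change": 0.0},
    {"Location": "B", "Month": 1, "Change": 2.0},
    {"Location": "B", "Month": 2, "Change": 6.0},
]
rows, cols, grid = heat_map_grid(records, "Location", "Month", "Change")
for label, intensities in zip(rows, grid):
    print(label, intensities)
```

Missing combinations are left as `None` so they can be drawn as blank cells.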
Exercise 10 (Same Store Sales)
• Construct a heat map to plot sale change (in %) against month for each location
(SameStoreSales.xlsx)
1. Format data: Table ► Stack
2. Graph ► Graph Builder
3. See next slide for graph setup
Exercise 10 (Same Store Sales) Solution
1. Month: X
2. SaleChange: Y
3. SaleChange: Color
4. Location: Group Y
