Module 3B - Visualizing Relationships Among Variables
Business Analytics
Data Analysis and Decision Making (7e)
S. Christian Albright
Wayne L. Winston
© 2020 Cengage. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-
protected website or school-approved learning management system for classroom use.
Example 3.1: Relationship between Smoking and
Drinking
• To create the crosstabs, enter the
category headings in Excel® and use the
COUNTIFS function to fill the table with
counts of joint categories.
• Next, sum across rows and down
columns to get totals.
• Then express the counts as percentages
of row totals and of column totals.
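The three steps above can be sketched outside Excel as well. The following is a minimal Python illustration, not the textbook's workbook: the records and category labels are hypothetical, and a `Counter` lookup plays the role of one COUNTIFS call per table cell.

```python
from collections import Counter

# Hypothetical survey records (smoking status, drinking status); not the textbook data.
records = [
    ("NonSmoker", "NonDrinker"), ("NonSmoker", "Occasional"),
    ("NonSmoker", "NonDrinker"), ("Occasional", "Heavy"),
    ("Heavy", "Heavy"), ("Heavy", "Occasional"),
    ("Occasional", "Occasional"), ("NonSmoker", "NonDrinker"),
]

smoking_levels = ["NonSmoker", "Occasional", "Heavy"]
drinking_levels = ["NonDrinker", "Occasional", "Heavy"]

# Step 1: joint counts, the analogue of one COUNTIFS call per cell.
joint = Counter(records)
counts = {s: {d: joint[(s, d)] for d in drinking_levels} for s in smoking_levels}

# Step 2: totals across rows and down columns.
row_totals = {s: sum(counts[s].values()) for s in smoking_levels}
col_totals = {d: sum(counts[s][d] for s in smoking_levels) for d in drinking_levels}

# Step 3: counts as percentages (fractions) of row totals and of column totals.
row_pct = {s: {d: counts[s][d] / row_totals[s] for d in drinking_levels}
           for s in smoking_levels}
col_pct = {d: {s: counts[s][d] / col_totals[d] for s in smoking_levels}
           for d in drinking_levels}
```

Each row of `row_pct` sums to 1, as does each column of `col_pct`, which is a quick check that the totals were computed correctly.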
Example 3.1: Relationship between Smoking and
Drinking
• Create a side-by-side bar graph of the row percentages.
• Create a side-by-side bar graph of the column percentages.
Example 3.1: Relationship between Smoking and
Drinking
• Counts versus percentages
❑There is no single correct way to display the data in a
crosstab.
❑Showing the counts as percentages of row totals or
percentages of column totals usually makes any
relationships stand out more clearly.
3B-2 Relationships among Categorical Variables
and a Numerical Variable
• The comparison problem is one of the most important problems in data analysis.
It occurs whenever you want to compare a numerical measure across two or more
subpopulations.
• Examples
• The subpopulations are males and females, and the numerical measure is
salary.
• The subpopulations are different regions of the country, and the numerical
measure is the cost of living.
• The subpopulations are different days of the week, and the numerical
measure is the number of customers going to a particular fast-food chain.
Stacked and Unstacked Formats
• There are two possible data formats, stacked and unstacked.
• The data are stacked if there are two “long” variables, such as Gender
and Salary. The idea is that the male salaries are stacked in with the
female salaries.
• This is the format you will see in the vast majority of situations.
• You will occasionally see data in unstacked format, when there are two
“short” variables, such as Male Salary and Female Salary.
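To make the two formats concrete, here is a small Python sketch that converts stacked data to unstacked form and back. The gender labels and salary values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stacked data: one "long" Gender column and one "long" Salary column.
gender = ["Male", "Female", "Female", "Male", "Female"]
salary = [52000, 61000, 48000, 57000, 55000]

# Unstack: one "short" list of salaries per gender.
unstacked = defaultdict(list)
for g, s in zip(gender, salary):
    unstacked[g].append(s)

# Restack: rebuild (Gender, Salary) pairs from the short lists.
restacked = [(g, s) for g, values in unstacked.items() for s in values]
```

Note that the short variables can have different lengths (two male salaries, three female salaries here), which is one reason the stacked format is usually easier to work with.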
Example 3.2: Baseball Salaries
• Objective: To learn methods in Excel for breaking down baseball
salaries by various categorical variables.
• Solution: Data set contains the same 2018 baseball data examined
previously, as well as several extra categorical variables.
• Create summary measures by selecting One-Variable Summary from
the Summary Statistics dropdown list.
• Next, click the Format button and choose Stacked. Then choose the Cat
variable you want to categorize by and the Val variable you want to
summarize.
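The same breakdown (a Cat variable to categorize by, a Val variable to summarize) can be sketched in Python. This is only an illustration of the idea, not the add-in's output; the positions and salaries below are hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical stacked data: Cat variable = position, Val variable = salary ($ thousands).
position = ["Pitcher", "Catcher", "Pitcher", "Outfielder",
            "Catcher", "Pitcher", "Outfielder"]
salary = [4000, 1200, 600, 9000, 800, 2000, 3000]

# Collect the Val values for each category of the Cat variable.
groups = {}
for cat, val in zip(position, salary):
    groups.setdefault(cat, []).append(val)

# One set of summary measures per category.
summary = {
    cat: {"count": len(v), "mean": mean(v), "median": median(v), "stdev": stdev(v)}
    for cat, v in groups.items()
}
```

`stdev` needs at least two values per category, so a category with a single observation would have to be handled separately.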
Some Conclusions:
Example 3.2: Baseball Salaries
• Categorize the teams as
Yankees and Non-Yankees.
• Create a box plot for the Salary
of Yankees and Non-Yankees.
The Yankees’ payroll is indeed much
larger than the payrolls for the rest of
the teams. In fact, it is so large that
Alex Rodriguez’s $32 million is
considered only a mild outlier relative
to the rest of the team.
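The "mild outlier" label depends on the box-plot convention in use. A common one, assumed here, calls a point mild if it lies more than 1.5 IQRs beyond the box and extreme if it lies more than 3 IQRs beyond it. The sketch below applies that rule to hypothetical salaries; quartiles are computed with Python's inclusive method, and other software may compute them slightly differently.

```python
from statistics import quantiles

def classify_outliers(values):
    """Label each value 'none', 'mild', or 'extreme' using 1.5*IQR and 3*IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        # Distance outside the box (0 if the value lies between Q1 and Q3).
        distance = max(q1 - v, v - q3, 0)
        if distance > 3 * iqr:
            labels.append("extreme")
        elif distance > 1.5 * iqr:
            labels.append("mild")
        else:
            labels.append("none")
    return labels

# Hypothetical salaries in $ millions (not the textbook data).
salaries = [1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 30]
labels = classify_outliers(salaries)
```

Because the fences are built from the group's own quartiles, a salary that is extreme relative to one group can be only mild, or no outlier at all, relative to a group with a wider box.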
3B-3 Relationships Among Numeric Variables
• To study relationships among numeric variables, use the scatterplot and two
summary measures for numerical variables: correlation and covariance.
• These measures can be applied to any variables that are displayed
numerically.
• However, they are appropriate only for truly numerical variables, not for
categorical variables that have been coded numerically.
• In general, don’t use correlations that involve coded categorical variables
such as 0–1 dummies. The methods from the previous section are more
appropriate.
Scatterplots
• A scatterplot is a scatter of points, where each point denotes the
values of an observation for two selected variables.
• It is a graphical method for detecting relationships between two numerical
variables.
• The two variables are often labeled generically as X and Y, so a scatterplot is
sometimes called an X-Y chart.
• The purpose of a scatterplot is to make a relationship (or the lack of it)
apparent.
Example 3.3: Golf Stats on the PGA Tour
• Objective: To use scatterplots to search for relationships in the golf data.
• Solution: Data set includes an observation (stats) for each of the top 200
earners on the PGA Tour.
Example 3.3: Golf Stats on the PGA Tour
• In StatTools, designate a StatTools data set for a particular year.
• Next, select Scatterplot from the Summary Graphs dropdown list
and then select at least one X variable and at least one Y variable.
Correlation and Covariance
• Correlation and covariance measure the strength and direction of a
linear relationship between two numerical variables.
• The relationship is “strong” if the points in a scatterplot cluster
tightly around some straight line.
• If this straight line rises from left to right, the relationship is
positive and the measures will be positive numbers.
• If it falls from left to right, the relationship is negative and the
measures will be negative numbers.
Correlation and Covariance
• The two numerical variables must be “paired” variables.
• They must have the same number of observations, and
the values for any observation should be naturally
paired.
Covariance
• Covariance is essentially an average of products of deviations from
means.
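Written out in the usual sample form, with notation that is an assumption here ($n$ paired observations $(X_i, Y_i)$ and sample means $\bar{X}$ and $\bar{Y}$), the definition reads as follows. The divisor $n-1$ rather than $n$ is why it is "essentially" an average.

```latex
\operatorname{Covar}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
```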
Correlation
• Correlation is a unitless quantity that is unaffected by the measurement
scale.
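A short Python sketch shows both measures and the unitless property. It assumes the usual sample definitions (covariance divided by n - 1, correlation as covariance divided by the product of the two standard deviations); the paired values are hypothetical.

```python
from math import sqrt
from statistics import mean

def covariance(x, y):
    """Sample covariance: sum of products of deviations from means, divided by n - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired, with the same number of observations")
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical paired observations (not the textbook data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = covariance(x, y)
corr_xy = correlation(x, y)

# Rescaling x (say, dollars to thousands of dollars) changes the covariance
# but leaves the correlation unchanged.
x_scaled = [v / 1000 for v in x]
cov_scaled = covariance(x_scaled, y)
corr_scaled = correlation(x_scaled, y)
```

Here the covariance shrinks by a factor of 1000 after rescaling while the correlation stays the same, which is why correlation is usually the easier of the two to interpret.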
Example 3.3: Golf Stats on the PGA Tour
• You can learn more about a correlation by creating the corresponding
scatterplot.
3B-4 Pivot Tables
• The pivot table is an Excel® tool that allows you to break data down
by categories.
• Sometimes pivot tables are used to display tables of counts, often
called crosstabs or contingency tables.
• However, crosstabs typically list only counts, whereas pivot tables can
list counts, sums, averages, and other summary measures.
Example 3.4: Customer Orders at Elecmart
• Objective: To use pivot tables
to break down the customer
order data by a number of
categorical variables.
• Solution: Data set contains
data on 400 customer orders
during several months for
the Elecmart company.
Example 3.4: Customer Orders at Elecmart
• Create a pivot table by
clicking the PivotTable
button on the Insert ribbon.
• The top section of the
dialog box allows you to
specify the table or range
that contains the data. The
bottom section allows you
to select the location where
you want the results to be
placed.
Example 3.4: Customer Orders at Elecmart
• Assuming any cell inside this blank pivot table is selected, the PivotTable Tools
“super tab” is visible.
• This super tab has two ribbons, Analyze and Design.
Example 3.4: Customer Orders at
Elecmart
• Finally, the PivotTable Fields pane is visible.
• The pane indicates that a pivot table has four areas.
• These are for Filters, Rows, Columns, and Values. They
correspond to the four areas in a blank pivot table
where you can put fields.
Example 3.4: Customer Orders at Elecmart
• A Rows field has categories that go down the left side of a pivot table.
• A Columns field has categories that go across the top of a pivot table.
• A Filters field lets you filter the whole pivot table by its categories.
• A Values field contains the data you want to summarize.
• Typically (but not always), you will place categorical variables in the
Filters, Rows, and/or Columns areas, and you will place numeric
variables in the Values area.
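The roles of the four areas can be mimicked in a few lines of Python. This is a rough sketch of the idea, not how Excel implements pivot tables; `pivot_sum` is a hypothetical helper, and the order records are made up.

```python
from collections import defaultdict

# Hypothetical order records; the values are made up for illustration.
orders = [
    {"Time": "Morning", "Region": "South", "Gender": "Female", "Total Cost": 120.0},
    {"Time": "Morning", "Region": "West", "Gender": "Male", "Total Cost": 80.0},
    {"Time": "Evening", "Region": "South", "Gender": "Female", "Total Cost": 200.0},
    {"Time": "Evening", "Region": "South", "Gender": "Male", "Total Cost": 50.0},
    {"Time": "Morning", "Region": "South", "Gender": "Male", "Total Cost": 30.0},
]

def pivot_sum(records, rows, columns, values, filters=None):
    """Sum `values` for each (rows, columns) category pair, after applying `filters`."""
    filters = filters or {}
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        # Filters field: keep only records that match every requested category.
        if all(rec[field] == wanted for field, wanted in filters.items()):
            table[rec[rows]][rec[columns]] += rec[values]
    return {r: dict(cols) for r, cols in table.items()}

# Rows = Time, Columns = Region, Values = Sum of Total Cost.
all_orders = pivot_sum(orders, rows="Time", columns="Region", values="Total Cost")

# Same layout, filtered to one Gender category.
female_only = pivot_sum(orders, rows="Time", columns="Region", values="Total Cost",
                        filters={"Gender": "Female"})
```

Swapping the `rows` and `columns` arguments gives the transposed table, which is the "pivoting" that gives the tool its name.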
Example 3.4: Customer Orders at Elecmart
• Check the Time, Region, and
Total Cost boxes in the upper
half of the PivotTable Fields
pane.
• Choose from three layouts:
Compact, Outline, or Tabular,
available from the Report
Layout list on the Design
ribbon.
Hiding Categories (Filtering)
• When you place a categorical field in the Rows, Columns, or Filters
area, all its categories show by default.
• It is often useful to filter out, or hide, some of these categories. This
lets you focus on the categories of most interest.
Sorting
• It is easy to sort in a pivot table, either by the numbers in the Values
area or by the labels in a Rows or Columns field.
• To sort by the numbers in the Values area, right-click any number and select
Sort.
• To sort on the labels of a Rows or Columns field, right-click any of the
categories and select Sort.
• You can also click the dropdown arrow for the field and get the dialog box that allows
both sorting and filtering.
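The two kinds of sort differ only in the key used. A minimal Python sketch, with hypothetical region totals standing in for a pivot table's output:

```python
# Hypothetical pivot-table output: Sum of Total Cost by Region (values are made up).
sum_by_region = {"South": 5200.0, "West": 7100.0, "NorthEast": 6400.0, "MidWest": 4900.0}

# Sort by the numbers in the Values area, largest first.
by_value = sorted(sum_by_region.items(), key=lambda item: item[1], reverse=True)

# Sort by the category labels, A to Z.
by_label = sorted(sum_by_region.items(), key=lambda item: item[0])
```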
Pivot Charts
• It is easy to accompany pivot tables with pivot charts.
• These charts adapt automatically to the underlying pivot table.
• To create a pivot chart, click anywhere inside the pivot table, select the
PivotChart button on the Analyze/Options ribbon, and select a chart type.
Multiple Variables in the Values Area
• More than a single variable can be placed in the Values area.
• Also, a given variable in the Values area can be summarized by more
than one summarizing function.
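As a rough Python analogue, the sketch below summarizes two variables by Region and summarizes one of them in two ways. The field name "Items Ordered" and all values are hypothetical.

```python
from statistics import mean

# Hypothetical orders: (Region, Items Ordered, Total Cost); values are made up.
orders = [
    ("South", 3, 120.0), ("West", 1, 80.0), ("South", 5, 200.0),
    ("West", 2, 40.0), ("South", 1, 40.0),
]

# Collect both variables for each Region category.
by_region = {}
for region, items, cost in orders:
    by_region.setdefault(region, {"items": [], "cost": []})
    by_region[region]["items"].append(items)
    by_region[region]["cost"].append(cost)

# Two variables in the Values area; Total Cost is summarized by Sum and by Average.
summary = {
    region: {
        "Sum of Items Ordered": sum(cols["items"]),
        "Sum of Total Cost": sum(cols["cost"]),
        "Average of Total Cost": mean(cols["cost"]),
    }
    for region, cols in by_region.items()
}
```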
Summarizing by Count
• The variable in the Values area can be summarized by the Count
function.
• This is useful when you want to know, for example, how many of the orders
were placed by females in the South.
• Right-click any number in the pivot table, select Value Field Settings, and select
the Count function.
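Summarizing by Count amounts to tallying records in each combination of categories. A minimal Python sketch with made-up orders:

```python
from collections import Counter

# Hypothetical orders: (Gender, Region); values are made up.
orders = [
    ("Female", "South"), ("Male", "South"), ("Female", "West"),
    ("Female", "South"), ("Male", "West"), ("Female", "South"),
]

# Count of orders for every Gender/Region combination.
order_counts = Counter(orders)

# How many of the orders were placed by females in the South?
females_in_south = order_counts[("Female", "South")]
```

A `Counter` returns 0 for a combination that never occurs, which matches an empty cell in a table of counts.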
Grouping
• Categories in a Rows or Columns
variable can be grouped.
• Suppose you want to summarize
Sum of Total Cost by Date.
• Starting with a blank pivot table,
check both Date and Total Cost in
the PivotTable Fields pane.
• Then right-click any date and
select Group.
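Grouping dates by month is one natural choice. The sketch below shows the idea in Python, with hypothetical dates and costs: each date is mapped to its (year, month) group, and Total Cost is summed within each group.

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders: (Date, Total Cost); values are made up.
orders = [
    (date(2018, 3, 2), 120.0), (date(2018, 3, 17), 80.0),
    (date(2018, 4, 5), 200.0), (date(2018, 4, 28), 50.0),
    (date(2018, 5, 9), 75.0),
]

# Group the dates by (year, month) and sum Total Cost within each group.
sum_by_month = defaultdict(float)
for day, cost in orders:
    sum_by_month[(day.year, day.month)] += cost

# Present the groups in calendar order.
monthly = dict(sorted(sum_by_month.items()))
```

Including the year in the group key keeps the same month in different years from being merged.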
Example 3.5: Frozen Lasagna Dinners
• Objective: To use pivot tables to explore which demographic variables help to
distinguish lasagna triers from nontriers.
• Solution: Data set contains data on over 800 potential customers being
tracked by a frozen lasagna company.
• Set up a pivot table that shows counts of triers and nontriers for different
categories of the variables.