Edited and Compiled By:
Dr. Chandrashekhar V. Joshi
Book: Business Analytics: Methods, Models,
and Decisions, 1st edition, James R. Evans
3-1
The old adage “A picture is worth 1000 words” is probably truer in today’s
information-rich environment than ever before.
Data visualization is the process of displaying data (often in large
quantities) in a meaningful fashion to provide insights that will support
better decisions.
Data visualization improves decision-making, provides managers with
better analysis capabilities that reduce reliance on IT professionals, and
improves collaboration and information sharing.
Converting data into information to understand past and current
performance is the core of descriptive analytics and is vital to making good
business decisions. Techniques for doing this include plotting data on
charts, extracting data from databases, and manipulating and summarizing
data.
3-2
Raw data are important, particularly when one needs to identify accurate values or
compare individual numbers.
However, it is quite difficult to identify trends and patterns, find exceptions, or
compare groups of data in tabular form. The human brain does a surprisingly good
job processing visual information—if presented in an effective way.
Visualizing data provides a way of communicating data at all levels of a business
and can reveal surprising patterns.
3-3
Tabular data can be used to determine exactly how
many units of a certain product were sold in a particular
month, or to compare one month to another.
For example, we see that sales of product A dropped in February,
specifically by 6.7% (computed as 1 – B3/B2). Beyond such
calculations, however, it is difficult to draw big picture conclusions.
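The percentage drop computed as 1 – B3/B2 can be reproduced outside the spreadsheet. The following is a minimal Python sketch; the function name `percent_drop` and the two sales figures are hypothetical stand-ins for cells B2 and B3, not values from the worksheet.

```python
def percent_drop(previous, current):
    """Return the fractional decrease from previous to current (1 - current/previous)."""
    return 1 - current / previous

# Hypothetical sales values standing in for cells B2 (January) and B3 (February).
january_sales = 7500
february_sales = 7000

drop = percent_drop(january_sales, february_sales)
print(f"Sales dropped by {drop:.1%}")
```

A negative result would indicate that sales rose rather than fell.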
3-4
A visual chart provides the means to easily compare overall sales of
different products (Product C sells the least, for example); identify
trends (sales of Product D are increasing), other patterns (sales of
Product C are relatively stable while sales of Product B fluctuate more
over time), and exceptions (Product E’s sales fell considerably in
September).
3-5
A dashboard is a visual representation of a set of key business
measures. It is derived from the analogy of an automobile’s control
panel, which displays speed, gasoline level, temperature, and so on.
Dashboards provide important summaries of key business
information to help manage a business process or function.
3-6
Data visualization ranges from simple Excel charts to more advanced
interactive tools and software that allow users to easily view and
manipulate data with a few clicks, not only on computers, but on iPads
and other devices as well.
While we will only focus on Excel-based tools, you should be aware of
other options and commercial packages that are available.
In particular, we suggest that you look at the capabilities of Tableau
(www.tableausoftware.com) and IBM’s Cognos software
(www.cognos10.com). Tableau is easy to use and offers a free trial.
3-7
Select the Insert tab.
Highlight the data.
Click on chart type, then subtype.
Use Chart Tools to customize.
3-8
Excel distinguishes between vertical and horizontal bar
charts, calling the former column charts and the latter
bar charts.
◦ A clustered column chart compares values across categories
using vertical rectangles;
◦ a stacked column chart displays the contribution of each value to
the total by stacking the rectangles;
◦ a 100% stacked column chart compares the percentage that each
value contributes to a total.
Column and bar charts are useful for comparing
categorical or ordinal data, for illustrating differences
between sets of values, and for showing proportions or
percentages of a whole.
3-9
Highlight the range C3:K6, which includes the headings and
data for each category. Click on the Column Chart button
and then on the first chart type in the list (a clustered
column chart).
Highlighted Cells
3-10
To add a title, click on the first icon in the Chart Layouts group. Click on “Chart
Title” in the chart and change it to “EEO Employment Report—Alabama.” The
names of the data series can be changed by clicking on the Select Data button
in the Data group of the Design tab. In the Select Data Source dialog (see
below), click on “Series1” and then the Edit button. Enter the name of the data
series, in this case “All Employees.” Change the names of the other data
series to “Men” and “Women” in a similar fashion.
3-11
Line charts provide a useful means for displaying data over time.
You may plot multiple data series in line charts; however, they can be difficult to
interpret if the magnitude of the data values differs greatly. In that case, it would be
advisable to create separate charts for each data series.
Figure shows a line chart giving the amount of U.S. exports to China in billions of
dollars from the Excel file China Trade Data. The chart clearly shows a significant
rise in exports starting in the year 2000, which began to level off around 2008.
Example 3.3: A Line Chart for China Export Data
3-12
A Pie Chart for Census Data
Consider the marital status of individuals in the U.S. population in the Excel file
Census Education Data, a portion of which is shown in Figure. To show the relative
proportion in each category, we can use a pie chart, as shown. This chart uses a layout
option that shows the labels associated with the data as well as the actual proportions
as percentages. A different layout that shows the values, the proportions, or both
can also be chosen.
3-13
A pie chart displays this by partitioning a circle into pie-
shaped areas showing the relative proportion.
Example 3.4: A Pie Chart for Census Data
3-14
Data visualization professionals don't recommend using pie charts.
In a pie chart, it is difficult to compare the relative sizes of areas;
however, the bars in the column chart can easily be compared to
determine relative ratios of the data.
If you do use pie charts, restrict them to small numbers of
categories, always ensure that the numbers add to 100%, and use
labels to display the group names and actual percentages. Avoid
three-dimensional (3-D) pie charts—especially those that are
rotated—and keep them simple.
3-15
An area chart combines the features of a pie chart with those of line charts. Area
charts present more information than pie or line charts alone but may clutter the
observer’s mind with too many details if too many data series are used; thus, they
should be used with care.
Figure displays total energy consumption (billion Btu) and consumption of fossil
fuels from the Excel file Energy Production & Consumption. This chart shows that
although total energy consumption has grown since 1949, the relative proportion of
fossil fuel consumption has remained generally consistent at about half of the total,
indicating that alternative energy sources have not replaced a significant portion of
fossil-fuel consumption.
Example 3.5: An Area Chart for Energy Consumption
3-16
Scatter charts show the relationship between two variables. To construct a scatter
chart, we need observations that consist of pairs of variables.
Figure shows a scatter chart of house size (in square feet) versus the home market
value from the Excel file Home Market Value. The data clearly suggest that higher
market values are associated with larger homes.
Example 3.6: A Scatter Chart for Real Estate Data
3-17
A bubble chart is a type of scatter chart in which the size of the data marker
corresponds to the value of a third variable; consequently, it is a way to plot three
variables in two dimensions.
Example 3.7: A Bubble Chart for Stock Comparisons
Figure shows a bubble chart for displaying price, P/E (price/earnings) ratio, and market
capitalization for five different stocks on one particular day in the Excel file Stock
Comparisons. The position on the chart shows the price and P/E; the size of the bubble
represents the market cap in billions of dollars.
3-18
Excel provides several additional charts for special applications. These
additional types of charts (including bubble charts) can be selected and
created from the Other Charts button in the Excel ribbon.
These include the following:
A stock chart allows you to plot stock prices, such as the daily high,
low, and close. It may also be used for scientific data such as
temperature changes.
A surface chart shows 3-D data.
A doughnut chart is similar to a pie chart but can contain more than
one data series.
A radar chart allows you to plot multiple dimensions of several data
series.
3-19
Many applications of business analytics involve geographic data.
Visualizing geographic data can highlight key data relationships,
identify trends, and uncover business opportunities. In addition, it
can often help to spot data errors and help end users understand
solutions, thus increasing the likelihood of acceptance of decision
models.
Companies like Nike use geographic data and information systems
for visualizing where products are being distributed and how that
relates to demographic and sales information. This information is
vital to marketing strategies.
Geographic mapping capabilities were introduced in Excel 2000 but
were not available in Excel 2002 and later versions. These
capabilities are now available through Microsoft MapPoint 2010,
which must be purchased separately.
3-20
Data bars
Color scales
Icon sets
Sparklines
Camera tool
Data bars, color scales, and icon sets are part of Excel’s Conditional Formatting rules,
which allow you to visualize different numerical values through
the use of colors and symbols. Excel has a variety of standard
templates to use, but you may also customize the rules to meet
your own conditions and styles. We encourage you to experiment
with these tools.
3-21
Data bars display colored bars that are scaled to the
magnitude of the data values (similar to a bar chart) but
placed directly within the cells of a range.
Highlight the data in each column, click the Conditional Formatting
button in the Styles group within the Home tab, select Data Bars,
and choose the fill option and color.
3-22
Color scales shade cells based on their numerical value
using a color palette.
Color-coding of quantitative data is commonly called a heatmap.
3-23
Icon sets provide similar information using various
symbols such as arrows or stoplight colors.
3-24
Sparklines are graphics that summarize a row
or column of data in a single cell.
Excel has three types of sparklines: line,
column, and win/loss.
◦ Line sparklines are clearly useful for time-series data.
◦ Column sparklines are more appropriate for
categorical data.
◦ Win-loss sparklines are useful for data that move up or
down over time.
3-25
Example 3.9 Examples of Sparklines
Figure shows line sparklines in row 14 for each product. In column G, we
display column sparklines, which are essentially small column charts.
Generally you need to expand the row or column widths to display them
effectively. Notice, however, that the lengths of the bars are not scaled
properly to the data; for example, in the first one, products D and E are
roughly one-third the value of Product E yet the bars are not scaled
correctly. So be careful when using them.
Next Figure shows a modified worksheet in which we computed the
percentage change from 1 month to the next for products A and B. The
win-loss sparklines in row 14 show the patterns of sales increases and
decreases, suggesting that product A has a cyclical pattern while product B
changed in a more random fashion. If you click on any cell containing a
sparkline, the Sparkline Tools Design tab appears, allowing you to
customize colors and other options.
3-26
3-27
The camera tool allows you to create live pictures of various ranges from
different worksheets, place them on a single page, and size and arrange
them easily.
They are simply linked pictures of the original ranges, and the
advantage is that as any data are changed or updated, the camera
shots are updated as well.
To use the camera tool, first add it to the Quick Access Toolbar (the
set of buttons above the ribbon). From the File menu, choose
Options and then Quick Access Toolbar. Choose Commands, and
then Commands Not in the Ribbon. Select Camera and add it.
3-28
Managers often need to sort and filter data.
Filtering means extracting a set of records having certain
characteristics.
Excel provides a convenient way of formatting
databases to facilitate analysis using sorting and
filtering, called Tables.
3-29
First, select the range of the data, including headers (a useful shortcut is to
select the first cell in the upper left corner, then press Ctrl+Shift+down arrow,
and then Ctrl+Shift+right arrow).
Next, click Table from the Tables group on the Insert tab and make sure that
the box for My Table Has Headers is checked. (You may also just select a
cell within the table and then click on Table from the Insert menu.)
The table range will now be formatted and will continue automatically when
new data are entered.
If you click within a table, the Table Tools Design tab will appear in the
ribbon, allowing you to do a variety of things, such as change the color
scheme, remove duplicates, change the formatting, and so on.
3-30
Suppose that in the Credit Risk Data table, we wish to calculate the
total amount of savings in column C. We could, of course, simply
use the function =SUM(C4:C428). However, with a table, we could
use the formula =SUM(Table1[Savings]). The table name, Table1,
can be found (and changed) in the Properties group of the Table
Tools Design tab. Note that Savings is the name of the header in
column C. One of the advantages of doing this is that if we add new
records to the table, the calculation will be updated automatically.
3-31
The sort buttons in Excel can be found under the Data
tab in the Sort & Filter group. Select a single cell in the
column you want to sort on and click the “AZ down
arrow” button to sort from smallest to largest or the “AZ
up arrow” button to sort from largest to smallest. You
may also click the Sort button to specify criteria for more
advanced sorting capabilities.
3-32
Suppose we wish to sort the data by supplier. Click on any
cell in column A of the data (but not the header cell A3) and
then the “AZ down” button in the Data tab. Excel will select
the entire range of the data and sort by name of supplier in
column A.
3-33
An Italian economist, Vilfredo Pareto, observed in 1906
that a large proportion of the wealth in Italy was owned
by a small proportion of the people.
Similarly, businesses often find that a large proportion of
sales come from a small percentage of customers, a
large percentage of quality defects stems from just a
couple of sources, or a large percentage of inventory
value corresponds to a small percentage of items.
A Pareto analysis involves sorting data and calculating
cumulative proportions.
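The two steps of a Pareto analysis, sorting and accumulating proportions, can be sketched in Python as follows. The function name `pareto_table` and the inventory values are hypothetical and serve only to illustrate the calculation.

```python
def pareto_table(values):
    """Sort items by value (largest first) and attach cumulative proportions."""
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0.0
    for name, value in ranked:
        running += value
        rows.append((name, value, running / total))
    return rows

# Hypothetical inventory values per item (not taken from the source data).
inventory = {"Item A": 500.0, "Item B": 300.0, "Item C": 120.0, "Item D": 80.0}

for name, value, cumulative in pareto_table(inventory):
    print(f"{name:8s} {value:8.2f} {cumulative:6.1%}")
```

Reading down the last column shows how few items are needed to reach a given share of the total value.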
3-34
About 75% of the bicycle inventory value comes from roughly 40% (9/24) of the items.
3-35
For large data files, finding a particular subset of
records that meet certain characteristics by sorting
can be tedious.
Excel provides two filtering tools:
◦ AutoFilter for simple criteria, and
◦ Advanced Filter for more complex criteria.
3-36
In the Purchase Orders database, suppose we are interested in extracting
all records corresponding to the item Bolt-nut package.
Select any cell in the database.
Data > Sort & Filter > Filter
Click on the drop-down arrow in cell D3.
Select Bolt-nut package to filter out all other items.
3-37
The filter tool does not extract the records; it simply hides the
records that don’t match the criteria. However, you can copy and
paste the data to another Excel worksheet, Microsoft Word
document, or a PowerPoint presentation.
To restore the original data file, click on the drop-down arrow again
and then click Clear filter from “Item Description.”
3-38
Suppose we wish to identify all records in the Purchase Orders
database whose item cost is at least $200. First, click on the drop-
down arrow in the Item Cost column and position the cursor over
Number Filters. This displays a list of options. Select Greater Than
Or Equal To . . . from the list.
3-39
The Custom AutoFilter dialog allows you to specify up to two
specific criteria using “and” and “or” logic. Enter 200 in the box as
shown; the tool will display all records having an item cost of $200
or more.
3-40
AutoFilter creates filtering criteria based on the type of
data being filtered. If you choose to filter on Order Date
or Arrival Date, the AutoFilter tools will display a different
Date Filters menu list for filtering that includes
“tomorrow,” “next week,” “year to date,” and so on.
AutoFilter can be used sequentially to “drill down” into
the data.
◦ For example, after filtering the results by Bolt-nut package, we
could then filter by order date and select all orders processed in
September.
3-41
Statistics is both the science of uncertainty and
the technology of extracting information from data.
A statistic is a summary measure of data.
Descriptive statistics are methods that describe
and summarize data.
Microsoft Excel supports statistical analysis in two
ways:
1. Statistical functions
2. Analysis ToolPak add-in
3-42
A frequency distribution is a table that shows
the number of observations in each of several
non-overlapping groups.
◦ Categorical variables naturally define the groups in a
frequency distribution.
To construct a frequency distribution, we need
only count the number of observations that
appear in each category.
◦ This can be done using the Excel COUNTIF function.
3-43
List the item names in a column on the spreadsheet.
Use the function =COUNTIF($D$4:$D$97,cell_reference),
where cell_reference is the cell containing the item name.
Construct a column chart to visualize the frequencies.
3-45
Relative frequency is the fraction, or proportion, of the
total.
If a data set has n observations, the relative frequency of
category i is:
Relative frequency of category i = (frequency of category i) / n
We often multiply the relative frequencies by 100 to
express them as percentages.
A relative frequency distribution is a tabular summary
of the relative frequencies of all categories.
3-46
First, sum the frequencies to find the total number (note
that the sum of the frequencies must be the same as the
total number of observations, n).
Then divide the frequency of each category by this
value.
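The two steps above (sum the frequencies, then divide each by the total) can be sketched in Python. The function name `relative_frequencies` and the category counts are hypothetical.

```python
def relative_frequencies(frequencies):
    """Divide each category's frequency by the total number of observations n."""
    n = sum(frequencies.values())
    return {category: count / n for category, count in frequencies.items()}

# Hypothetical category counts; n = 10 observations in total.
counts = {"A": 2, "B": 5, "C": 3}

relative = relative_frequencies(counts)
for category, value in relative.items():
    print(f"{category}: {value:.3f} ({value:.1%})")
```

The relative frequencies should sum to 1 (or 100% when expressed as percentages), apart from rounding.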
3-47
For numerical data that consist of a small number
of discrete values, we may construct a frequency
distribution similar to the way we did for
categorical data; that is, we simply use COUNTIF
to count the frequencies of each discrete value.
3-48
In the Purchase Orders data, the A/P terms are all
whole numbers 15, 25, 30, and 45.
3-49
A graphical depiction of a frequency distribution for
numerical data in the form of a column chart is
called a histogram.
Frequency distributions and histograms can be
created using the Analysis ToolPak in Excel.
◦ Click the Data Analysis tools button in the Analysis group
under the Data tab in the Excel menu bar and select
Histogram from the list.
3-50
Specify the Input Range corresponding to the data. If you include
the column header, then also check the Labels box so Excel knows
that the range contains a label. The Bin Range defines the groups
(Excel calls these “bins”) used for the frequency distribution.
3-51
If you do not specify a Bin Range, Excel will
automatically determine bin values for the frequency
distribution and histogram, which often results in a rather
poor choice.
If you have discrete values, set up a column of these
values in your spreadsheet for the bin range and specify
this range in the Bin Range field.
3-52
We will create a frequency distribution and histogram for
the A/P Terms variable in the Purchase Orders
database.
We defined the bin range below the data in cells
H99:H103 as follows:
Month
15
25
30
45
3-53
Histogram tool results:
3-54
For numerical data that have many different discrete values with
little repetition or are continuous, a frequency distribution requires
that we define groups by specifying
1. the number of groups,
2. the width of each group, and
3. the upper and lower limits of each group.
Choose between 5 and 15 groups, and make the range of each group
equal.
Choose the lower limit of the first group (LL) as a whole number
smaller than the minimum data value and the upper limit of the last
group (UL) as a whole number larger than the maximum data value.
3-55
The data range from a minimum of $68.75 to a maximum of
$127,500; set the lower limit of the first group to $0 and the upper
limit of the last group to $130,000.
If we select 5 groups, using equation (3.2) the width of each group is
($130,000 - 0) / 5 = $26,000
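The width calculation, and the upper limits it implies for the Bin Range, can be sketched in Python. The function name `group_limits` is hypothetical; the limits 0 and 130,000 and the choice of 5 groups come from the example above.

```python
def group_limits(lower, upper, groups):
    """Return the common group width and the upper limit of each group."""
    width = (upper - lower) / groups
    limits = [lower + width * (i + 1) for i in range(groups)]
    return width, limits

# Lower limit $0, upper limit $130,000, and 5 groups, as in the example.
width, bins = group_limits(0, 130_000, 5)
print(width)  # 26000.0
print(bins)   # [26000.0, 52000.0, 78000.0, 104000.0, 130000.0]
```

Calling `group_limits(0, 130_000, 10)` gives a width of 13,000 for a ten-group histogram.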
3-56
Ten-group histogram
3-57
Set the cumulative relative frequency of the first group equal to its
relative frequency. Then add the relative frequency of the next group
to the cumulative relative frequency.
For example, the cumulative relative frequency in cell D3 is
computed as =D2+C3 = 0.000 + 0.447 = 0.447.
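The running sum described above can be sketched in Python with `itertools.accumulate`. Only the first two relative frequencies (0.000 and 0.447) echo the worked example; the remaining values are hypothetical and chosen so that the total is 1.

```python
from itertools import accumulate

# Relative frequencies per group. The first two values match the worked
# example; the last three are hypothetical.
relative = [0.000, 0.447, 0.303, 0.150, 0.100]

# Each cumulative value is the previous cumulative value plus the next
# relative frequency, exactly like =D2+C3 copied down the column.
cumulative = list(accumulate(relative))

for rel, cum in zip(relative, cumulative):
    print(f"{rel:.3f}  {cum:.3f}")
```

The last cumulative relative frequency should equal 1, apart from rounding.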
3-58
The kth percentile is a value at or below which at least k
percent of the observations lie. The most common way to
compute the kth percentile is to order the data values from
smallest to largest and calculate the rank of the kth percentile
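using the formula:
Rank of kth percentile = nk/100 + 0.5 (3.3)
where n is the number of observations. Round the result to the nearest
integer and take the value at that rank.
Statistical software packages use different methods that often involve
interpolating between ranks instead of rounding, thus
producing different results.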
◦ The Excel function PERCENTILE.INC(array, k) computes the kth
percentile of data in the range specified in the array field, where k is in
the range 0 to 1, inclusive (i.e., including 0 and 1).
3-59
Compute the 90th percentile for Cost per order in
the Purchase Orders data.
◦ Rank of kth percentile = nk/100 + 0.5
◦ n = 94; k = 90
◦ For the 90th percentile, the rank is
= 94(90)/100+0.5 = 85.1 (round to 85)
◦ Value of the 85th observation = $74,375
Using the Excel function
PERCENTILE.INC(G4:G97,0.9), the 90th percentile is
$73,737.50, which is different from using formula (3.3).
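The difference between the two results comes from rounding a rank versus interpolating between ranks. The Python sketch below contrasts the two approaches. The function names and the sample data are hypothetical, and the interpolation rule shown (position (n − 1)k/100 counted from zero) is an assumption about the inclusive method that PERCENTILE.INC is generally understood to use. Note that k is given here on a 0 to 100 scale, whereas PERCENTILE.INC takes k between 0 and 1.

```python
import math

def percentile_by_rank(data, k):
    """kth percentile using rank = n*k/100 + 0.5, rounded to the nearest integer."""
    ordered = sorted(data)
    n = len(ordered)
    rank = math.floor(n * k / 100 + 0.5 + 0.5)  # round half up
    rank = min(max(rank, 1), n)                 # keep the rank inside 1..n
    return ordered[rank - 1]

def percentile_interpolated(data, k):
    """kth percentile by linear interpolation between neighboring ranks."""
    ordered = sorted(data)
    position = (len(ordered) - 1) * k / 100     # zero-based fractional position
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# Hypothetical data set with five observations.
values = [10, 20, 30, 40, 50]
print(percentile_by_rank(values, 90))
print(percentile_interpolated(values, 90))
```

With n = 94 and k = 90, the first function uses rank 85, matching the manual calculation above.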
3-60
Data > Data Analysis > Rank and Percentile
The Rank and Percentile tool reports $74,375 as the 90.3rd percentile value.
This is the same value that was computed in Example 3.23 as the 90th
percentile by the manual method.
3-61
Quartiles break the data into four parts.
◦ The 25th percentile is called the first quartile, Q1;
◦ the 50th percentile is called the second quartile, Q2;
◦ the 75th percentile is called the third quartile, Q3; and
◦ the 100th percentile is the fourth quartile, Q4.
One-fourth of the data fall below the first quartile, one-
half are below the second quartile, and three-fourths are
below the third quartile.
The Excel function QUARTILE.INC(array, quart) computes quartiles, where
array specifies the range of the data and quart is a whole
number between 1 and 4, designating the desired
quartile.
3-62
Compute the Quartiles of the Cost per Order data
First quartile: =QUARTILE.INC(G4:G97,1) = $6,757.81
Second quartile: =QUARTILE.INC(G4:G97,2) = $15,656.25
Third quartile: =QUARTILE.INC(G4:G97,3) = $27,593.75
Fourth quartile: =QUARTILE.INC(G4:G97,4) = $127,500.00
3-63
A cross-tabulation is a tabular method that displays the
number of observations in a data set for different
subcategories of two categorical variables.
◦ A cross-tabulation table is often called a contingency table.
The subcategories of the variables must be mutually
exclusive and exhaustive, meaning that each
observation can be classified into only one subcategory,
and, taken together over all subcategories, they must
constitute the complete data set.
3-64
Sales Transactions database
Count the number (and compute the percentage) of
books and DVDs ordered by region.
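A cross-tabulation of this kind amounts to counting observations for each pair of subcategories. A minimal Python sketch, assuming a handful of hypothetical (region, product) records rather than the actual Sales Transactions data:

```python
from collections import Counter

# Hypothetical (region, product) records; these are not the Sales Transactions data.
records = [
    ("East", "Book"), ("West", "DVD"), ("East", "DVD"),
    ("East", "Book"), ("West", "Book"), ("West", "DVD"),
]

counts = Counter(records)                                 # observations per (region, product)
region_totals = Counter(region for region, _ in records)  # observations per region

for (region, product), count in sorted(counts.items()):
    share = count / region_totals[region]                 # proportion within the region
    print(f"{region:5s} {product:5s} {count:2d} {share:6.1%}")
```

Because every record falls into exactly one (region, product) cell, the cell counts sum to the number of records.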
3-65
3-66
Excel provides a powerful tool for distilling a
complex data set into meaningful information:
PivotTables.
PivotTables allow you to create custom
summaries and charts of key information in the
data.
PivotTables can be used to quickly create cross-
tabulations and to drill down into a large set of
data in numerous ways.
3-67
Click inside your database.
Insert > Tables > PivotTable
The wizard creates a blank PivotTable as shown.
3-68
Select and drag the
fields to one of the
PivotTable areas:
Report Filter
Column Labels
Row Labels
Σ Values
3-69
Initial PivotTable for Regional Sales by Product
The PivotTable defaults to a sum of the field in the Values area.
We seek a count of the number of records in each category.
3-70
Analyze > Active Field > Field Settings
Change the summarization method in the Value Field Settings dialog box.
Select Count.
3-71
3-72
Uncheck the boxes in the PivotTable Field List or drag the field names to
different areas.
You may easily add multiple variables in the fields to create different
views of the data.
◦ Example: drag the Source field into the Row Labels area.
3-73
Dragging a field into the Report Filter area in the
PivotTable Field list allows you to add a third dimension
to your analysis.
Click the drop down
arrow in cell B1;
choose Credit:
3-74
PivotCharts visualize data in PivotTables.
They can be created in a simple one-click fashion.
◦ Select the PivotTable
◦ From the Analyze tab, click PivotChart.
◦ Excel will display an Insert Chart dialog that allows you to
choose the type of chart you wish to display.
3-75
By clicking on the drop-down buttons, you can easily change the
data that are displayed by filtering the data. Also, by clicking on the
chart and selecting the PivotChart Tools Design tab, you can switch
the rows and columns to display an alternate view of the chart or
change the chart type entirely.
3-76
Excel 2010 introduced slicers — tools for drilling
down to “slice” a PivotTable and display a subset
of data.
To create a slicer for any of the columns in the
database, click on the PivotTable and choose
Insert Slicer from the Analyze tab in the PivotTable
Tools ribbon.
3-77
Cross-tabulation “sliced” by E-mail
3-78
The camera tool is useful for creating PivotTable-
based dashboards.
If you create several different PivotTables and
charts, you can easily use the camera tool to take
pictures of them and consolidate them onto one
worksheet.
In this fashion, you can still make changes to the
PivotTables and they will automatically be
reflected in the camera shots.
3-79
3-80
Exercises
1. The total runs scored by 30 players in a test cricket match in the year 2011 were recorded
to determine which score was the highest and which the lowest.
The runs are:
423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401, 363,
391, 405, 382, 400, 381, 399, 415, 428, 422, 396, 372, 410, 419, 386, 390
Construct the frequency distribution table and calculate relative frequency.
2. A community health-status survey obtained the following demographic information from
the respondents:
Compute the relative frequency and cumulative relative frequency of the age groups.
3-81