Chapter 3 - Visualizing Data

Visualizing Data

Uploaded by

Ryan Dinglasan
Copyright
© All Rights Reserved
Chapter 3

Visualizing and
Exploring Data

KIM RYAN DINGLASAN


Data Visualization
 Data visualization - the process of displaying data (often in large
quantities) in a meaningful fashion to provide insights that will
support better decisions.
◦ Data visualization improves decision-making, provides managers with better
analysis capabilities that reduce reliance on IT professionals, and improves
collaboration and information sharing.
Benefits of Data Visualization
 Better Analysis
 Quick Action
 Identifying Patterns
 Finding Errors
 Understanding the Story
 Exploring Business Insights
 Grasping the Latest Trends
Example 3.1: Tabular vs. Visual Data Analysis
 Tabular data can be used to determine exactly how many units of a certain
product were sold in a particular month, or to compare one month to another.
◦ For example, we see that sales of product A dropped in February, specifically by 6.7%
(computed as 1 – B3/B2). Beyond such calculations, however, it is difficult to draw big
picture conclusions.
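The percentage drop quoted above (1 – B3/B2) can be sketched in a few lines of Python. The unit counts here are hypothetical stand-ins for cells B2 and B3, not the workbook's actual figures, and `percent_drop` is an illustrative helper name.

```python
def percent_drop(previous: float, current: float) -> float:
    """Fractional decrease from previous to current, i.e. 1 - current/previous."""
    return 1 - current / previous


# Hypothetical unit sales standing in for cells B2 (January) and B3 (February).
january_units = 7500
february_units = 7000

drop = percent_drop(january_units, february_units)
print(f"Sales dropped by {drop:.1%}")
```

A positive result means sales fell; a negative result would mean they rose.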
Example 3.1: Tabular vs. Visual Data Analysis
 A visual chart provides the means to
◦ easily compare overall sales of different
products (Product C sells the least, for
example);
◦ identify trends (sales of Product D are
increasing), other patterns (sales of
Product C are relatively stable, while sales of
Product B fluctuate more over time), and
exceptions (Product E’s sales fell
considerably in September).
Dashboards
 A dashboard  is a visual representation of a set
of key business measures. It is derived from the
analogy of an automobile’s control panel, which
displays speed, gasoline level, temperature, and
so on.
◦ Dashboards provide important summaries of key
business information to help manage a business
process or function.  
Creating Charts in Microsoft Excel
 Select the Insert tab.
 Highlight the data.
 Click on chart type, then subtype.

 Use Chart Tools to customize.


Column and Bar Charts
 Excel distinguishes between vertical and horizontal bar
charts, calling the former column charts and the latter
bar charts.
◦ A clustered column chart compares values across categories
using vertical rectangles;
◦ a stacked column chart displays the contribution of each value to
the total by stacking the rectangles;
◦ a 100% stacked column chart compares the percentage that each
value contributes to a total.
 Column and bar charts are useful for comparing
categorical or ordinal data, for illustrating differences
between sets of values, and for showing proportions or
percentages of a whole.
Example 3.2: Creating a Column Chart
Open file: EEO Employment Report.
Highlight the range C3:K6, which includes the headings and
data for each category. Click on the Column Chart button
and then on the first chart type in the list (a clustered
column chart).
Highlighted Cells
Example 3.2: Creating a Column Chart
To add a title, click on the first icon in the Chart Layouts group. Click on “Chart
Title” in the chart and change it to “Alabama Employment.” The names of the
data series can be changed by clicking on the Select Data button in the Data
group of the Design tab. In the Select Data Source dialog (see below), click on
“Series1” and then the Edit button. Enter the name of the data series, in this
case “All Employees.” Change the names of the other data series to “Men” and
“Women” in a similar fashion.
Line Charts
Open File: China Trade Data
 Line charts provide a useful means for displaying data
over time.
◦ You may plot multiple data series in line charts; however, they can
be difficult to interpret if the magnitude of the data values differs
greatly. In that case, it would be advisable to create separate
charts for each data series.
Pie Charts
 A pie chart displays the relative proportion of each data source to the
total by partitioning a circle into pie-shaped areas.

Open File: Census Education Data


Area Charts
 An area chart combines the features of a pie chart with
those of line charts.
◦ Area charts present more information than pie or line charts alone
but may clutter the observer’s mind with too many details if too
many data series are used; thus, they should be used with care.

Open File: Energy Production & Consumption
Scatter Charts
 Scatter charts show the relationship between two
variables. To construct a scatter chart, we need
observations that consist of pairs of variables.

Open File: Home Market Value


Bubble Charts
 A bubble chart is a type of scatter chart in which the size
of the data marker corresponds to the value of a third
variable; consequently, it is a way to plot three variables
in two dimensions.

Open File: Stock Comparisons


Miscellaneous Excel Charts
 Stock chart – allows you to plot stock prices, such as the daily high,
low, and close.
 Surface chart – shows 3D data.
 Doughnut chart – similar to a pie chart but can contain more than one
data series.
 Radar chart – allows you to plot multiple dimensions of several data
series.
Geographic Data
 Many applications of business analytics involve
geographic data. For example, problems such as
finding the best location for production and
distribution facilities, analyzing regional sales
performance, transporting raw materials and
finished goods, and routing vehicles such as
delivery trucks involve geographic data.
Data Visualization through Conditional Formatting
 Data bars display colored bars that are scaled to the magnitude of the data
values (similar to a bar chart) but placed directly within the cells of a range.
Open File: Monthly Product Sales

◦ Highlight the data in each column, click the Conditional Formatting button in the Styles
group within the Home tab, select Data Bars, and choose the fill option and color.
Data Visualization through Conditional Formatting
 Color scales shade cells based on their numerical value using a color
palette.
◦ Color-coding of quantitative data is commonly called a heatmap.  
Data Visualization through Conditional Formatting
 Icon sets provide similar information using various symbols such as arrows
or stoplight colors.
Sparklines
 Sparklines are graphics that summarize
a row or column of data in a single cell.
 Excel has three types of sparklines: line,
column, and win/loss.
◦ Line sparklines are clearly useful for
time-series data.
◦ Column sparklines are more appropriate for
categorical data.
◦ Win/loss sparklines are useful for data that
move up or down over time.
Open File: Monthly Product Sales
Excel Camera Tool
 This tool allows you to create live pictures of various ranges from different
worksheets that you can place on a single page, size them, and arrange them
easily.
◦ To use the camera tool, first add it to the Quick Access Toolbar (the set of buttons above the ribbon).
From the File menu, choose Options and then Quick Access Toolbar. In the “Choose commands from” list,
select Commands Not in the Ribbon. Select Camera and add it.
Data Queries: Tables, Sorting, and Filtering
 Managers often need to sort and filter
data.
 Filtering means extracting a set of
records having certain characteristics.
 Excel provides a convenient way of
formatting databases to facilitate analysis
using sorting and filtering, called Tables.

Open File: Credit Risk Data


Example 3.10: Creating an Excel Table
Example 3.11: Table-Based Calculations
Sorting Data in Excel
 The sort buttons in Excel can be found under the Data
tab in the Sort & Filter group. Select a single cell in the
column you want to sort on and click the “AZ down
arrow” button to sort from smallest to largest or the “AZ
up arrow” button to sort from largest to smallest. You
may also click the Sort button to specify criteria for more
advanced sorting capabilities.
Example 3.12: Sorting Data in the Purchase Orders Database
Pareto Analysis
 An Italian economist, Vilfredo Pareto, observed in 1906
that a large proportion of the wealth in Italy was owned
by a small proportion of the people.
 Similarly, businesses often find that a large proportion of
sales come from a small percentage of customers, a
large percentage of quality defects stems from just a
couple of sources, or a large percentage of inventory
value corresponds to a small percentage of items.
 A Pareto analysis involves sorting data and calculating
cumulative proportions.
Open File: Bicycle Inventory
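The two steps of a Pareto analysis, sorting and calculating cumulative proportions, can be sketched as follows. The item names and inventory values are made up for illustration and are not taken from the Bicycle Inventory file; `pareto_table` is a hypothetical helper.

```python
def pareto_table(item_values):
    """Sort items by value (largest first) and attach cumulative proportions.

    Each row is (name, value, cumulative share of value, cumulative share of items).
    """
    total = sum(item_values.values())
    ranked = sorted(item_values.items(), key=lambda pair: pair[1], reverse=True)
    rows = []
    running_value = 0.0
    for position, (name, value) in enumerate(ranked, start=1):
        running_value += value
        rows.append((name, value, running_value / total, position / len(ranked)))
    return rows


# Hypothetical inventory values per item (assumption, for illustration only).
inventory = {"Item C": 1000, "Item A": 5000, "Item E": 400, "Item B": 3000, "Item D": 600}

rows = pareto_table(inventory)
for name, value, cum_value, cum_items in rows:
    print(f"{name}: {value:>5}  value share {cum_value:.0%}  item share {cum_items:.0%}")
```

Reading down the table, the row where the cumulative value share first reaches the level of interest (for example, 75%) shows what share of the items accounts for it.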
Example 3.13: Applying the Pareto Principle
75% of the bicycle inventory value comes from about 40% (9 of 24, or 37.5%) of the items.
Filtering Data
 For large data files, finding a particular
subset of records that meet certain
characteristics by sorting can be tedious.
 Excel provides two filtering tools:
◦ AutoFilter for simple criteria, and
◦ Advanced Filter for more complex criteria.
Open File: Purchase Orders


Example 3.14: Filtering Records by Item Description
Example 3.15: Filtering Records by Item Cost
Example 3.15: Filtering Records by Item Cost
 The Custom AutoFilter dialog allows you to specify up to two
specific criteria using “and” and “or” logic. Enter 200 in the box as
shown; the tool will display all records having an item cost of $200
or more.
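The logic behind the Custom AutoFilter dialog (one or two criteria joined by “and” or “or”) can be sketched in Python. This is only an analogy to the dialog, not Excel's implementation; the records, field names, and the `custom_filter` helper are hypothetical.

```python
def custom_filter(records, field, first, second=None, logic="and"):
    """Keep records whose field passes one or two tests joined by 'and' or 'or'."""
    kept = []
    for record in records:
        value = record[field]
        passes = first(value)
        if second is not None:
            if logic == "and":
                passes = passes and second(value)
            else:
                passes = passes or second(value)
        if passes:
            kept.append(record)
    return kept


# Hypothetical purchase-order records (assumption, for illustration only).
orders = [
    {"item": "Bolt", "cost": 150.0},
    {"item": "Gasket", "cost": 200.0},
    {"item": "Panel", "cost": 425.0},
    {"item": "Valve", "cost": 95.0},
]

# Single criterion: item cost of $200 or more.
at_least_200 = custom_filter(orders, "cost", lambda c: c >= 200)
# Two criteria joined by "and": cost between $100 and $300.
mid_range = custom_filter(orders, "cost", lambda c: c >= 100, lambda c: c <= 300, "and")
# Two criteria joined by "or": cost below $100 or at least $400.
extremes = custom_filter(orders, "cost", lambda c: c < 100, lambda c: c >= 400, "or")

print([o["item"] for o in at_least_200])
```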
Statistical Methods for Summarizing Data
 Statistics is both the science of uncertainty and the technology of
extracting information from data.
 A statistic is a summary measure of data.
 Descriptive statistics are methods that describe and summarize
data.
 Microsoft Excel supports statistical analysis in two ways:
1. Statistical functions
2. Analysis ToolPak add-in
Frequency Distributions for Categorical Data
 A frequency distribution is a table that shows the number of
observations in each of several non-overlapping groups.
◦ Categorical variables naturally define the groups in a frequency distribution.
 To construct a frequency distribution, we only need to count the
number of observations that appear in each category.
◦ This can be done using the Excel COUNTIF function.
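Counting the observations in each category, as COUNTIF does for one category at a time, can be sketched with Python's standard library. The item descriptions are hypothetical, and `countif` is an illustrative stand-in for the Excel function, not a port of it.

```python
from collections import Counter


def countif(observations, category):
    """Count how many observations equal the given category."""
    return sum(1 for value in observations if value == category)


def frequency_distribution(observations):
    """Number of observations in each category, as a plain dict."""
    return dict(Counter(observations))


# Hypothetical item descriptions (assumption, for illustration only).
items = ["Bolt", "Gasket", "Bolt", "Panel", "Gasket", "Bolt"]

print(countif(items, "Bolt"))
print(frequency_distribution(items))
```

Because the categories do not overlap, the frequencies always add up to the number of observations.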
Example 3.16: Constructing a Frequency Distribution for Items in the Purchase Orders Database
 Construct a column chart to visualize the frequencies.
Relative Frequency Distributions
 Relative frequency is the fraction, or proportion, of the total.
 If a data set has n observations, the relative frequency of category i is:
Relative frequency of category i = (frequency of category i) / n
 We often multiply the relative frequencies by 100 to express them as
percentages.
 A relative frequency distribution is a tabular summary of the relative
frequencies of all categories.
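A relative frequency distribution divides each category's frequency by the number of observations n. A minimal sketch, using hypothetical category labels and an illustrative helper name:

```python
from collections import Counter


def relative_frequencies(observations):
    """Fraction of the n observations that fall in each category."""
    n = len(observations)
    return {category: count / n for category, count in Counter(observations).items()}


# Hypothetical categorical data (assumption, for illustration only).
items = ["Bolt", "Gasket", "Bolt", "Panel"]

shares = relative_frequencies(items)
for category, share in shares.items():
    print(f"{category}: {share:.2f} ({share * 100:.0f}%)")
```

The relative frequencies sum to 1 (or 100% when expressed as percentages), which is a quick check on the table.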
Frequency Distributions for Numerical Data
 For numerical data that consist of a small number
of discrete values, we may construct a frequency
distribution similar to the way we did for
categorical data; that is, we simply use COUNTIF
to count the frequencies of each discrete value.
Example 3.18: Frequency and Relative Frequency Distribution for A/P Terms
Excel Histogram Tool
 A graphical depiction of a frequency distribution for numerical
data in the form of a column chart is called a histogram.
 Frequency distributions and histograms can be created using the
Analysis ToolPak in Excel.
◦ Click the Data Analysis tools button in the Analysis group under the Data
tab in the Excel menu bar and select Histogram from the list.
Histogram Dialog
 Specify the Input Range corresponding to the data. If you include
the column header, then also check the Labels box so Excel knows
that the range contains a label. The Bin Range defines the groups
(Excel calls these “bins”) used for the frequency distribution.
Using Bin Ranges
 If you do not specify a Bin Range, Excel will
automatically determine bin values for the frequency
distribution and histogram, which often results in a rather
poor choice.
 If you have discrete values, set up a column of these
values in your spreadsheet for the bin range and specify
this range in the Bin Range field.
Example 3.19: Using the Histogram Tool
 We will create a frequency distribution and histogram for the A/P Terms
variable in the Purchase Orders database.
 We defined the bin range below the data in cells H99:H103 as follows:
Month
15
25
30
45
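How a bin range turns into a frequency distribution can be sketched as follows. The sketch assumes each bin value acts as an upper limit (a value is counted in the first bin it does not exceed) with a final “More” group for anything larger, which is how the Histogram tool is generally understood to count. The bin values 15, 25, 30, 45 come from this example; the A/P terms data are hypothetical.

```python
def bin_counts(values, bins):
    """Count values per bin, treating each bin as an upper limit.

    Returns one count per bin plus a final count for values above the last bin.
    """
    limits = sorted(bins)
    counts = [0] * (len(limits) + 1)
    for value in values:
        for index, upper in enumerate(limits):
            if value <= upper:
                counts[index] += 1
                break
        else:
            counts[-1] += 1  # larger than every bin: the "More" group
    return counts


bins = [15, 25, 30, 45]
# Hypothetical A/P terms in days (assumption, for illustration only).
ap_terms = [30, 30, 25, 15, 45, 30, 25, 15, 30, 45]

counts = bin_counts(ap_terms, bins)
for label, count in zip([*bins, "More"], counts):
    print(f"{label}: {count}")
```

With discrete data, using the distinct values themselves as bins gives one group per value, which is why the example sets up the bin range this way.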
Example 3.19: Using the Histogram Tool
 Histogram tool results:
Percentiles
 The kth percentile is a value at or below which at least k
percent of the observations lie. The most common way to
compute the kth percentile is to order the data values from
smallest to largest and calculate the rank of the kth percentile
using the formula:
Rank of kth percentile = (nk/100) + 0.5    (3.3)
where n is the number of observations. Round the rank to the nearest
integer and take the value at that rank as the kth percentile.
 Statistical software often uses different methods that involve
interpolating between ranks instead of rounding, thus
producing different results.
◦ The Excel function PERCENTILE.INC(array, k) computes the kth
percentile of data in the range specified in the array field, where k is in
the range 0 to 1, inclusive (i.e., including 0 and 1).
Examples 3.22 and 3.23: Computing Percentiles
 Compute the 90th percentile for Cost per Order in the Purchase
Orders data.
 Sort the Cost per Order column in ascending order.
◦ Rank of kth percentile = (nk/100) + 0.5
◦ n = 94; k = 90
◦ For the 90th percentile, the rank is
(94(90)/100) + 0.5 = 85.1 (round to 85)
◦ Value of the 85th observation = $74,375
 Using the Excel function PERCENTILE.INC(G4:G97,0.9), the 90th percentile
is $73,737.50, which differs from the result of formula (3.3).
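The gap between the two results comes from rounding a rank versus interpolating between ranks. The sketch below contrasts the two approaches. The interpolation rule (position (n – 1)k/100 counted from zero) is assumed here to mirror what PERCENTILE.INC does; the data are hypothetical, not the Purchase Orders values, and all function names are illustrative.

```python
def rounded_rank(n, k):
    """Rank of the kth percentile, n*k/100 + 0.5, rounded half up."""
    return int(n * k / 100 + 0.5 + 0.5)


def percentile_by_rank(values, k):
    """kth percentile as the value at the rounded rank (1-based)."""
    ordered = sorted(values)
    rank = min(max(rounded_rank(len(ordered), k), 1), len(ordered))
    return ordered[rank - 1]


def percentile_interpolated(values, k):
    """kth percentile by linear interpolation between neighboring ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * k / 100  # 0-based fractional position
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# The rank computed in the example: n = 94, k = 90.
print(rounded_rank(94, 90))

# Hypothetical data (assumption, for illustration only).
costs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_by_rank(costs, 90), percentile_interpolated(costs, 90))
```

On small data sets the two methods can differ noticeably; the difference tends to shrink as the number of observations grows.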
Example 3.24: Excel Rank and Percentile Tool
Data > Data Analysis > Rank and Percentile
 In the tool's output, $74,375 is the 90.3rd percentile. This is the value obtained by
manually computing the 90th percentile with the rank formula, not the
PERCENTILE.INC result of $73,737.50.
Quartiles
 Quartiles break the data into four parts.
◦ The 25th percentile is called the first quartile, Q1;
◦ the 50th percentile is called the second quartile, Q2;
◦ the 75th percentile is called the third quartile, Q3; and
◦ the 100th percentile is the fourth quartile, Q4.
 One-fourth of the data fall below the first quartile, one-half are below the
second quartile, and three-fourths are below the third quartile.
 The Excel function QUARTILE.INC(array, quart) computes quartiles, where array
specifies the range of the data and quart is a whole number between 1 and 4,
designating the desired quartile.
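Outside Excel, quartiles can be sketched with Python's `statistics.quantiles`. Its `method="inclusive"` option interpolates between data values and is assumed here to correspond to the QUARTILE.INC convention; the data are hypothetical and `quartiles_inclusive` is an illustrative helper.

```python
import statistics


def quartiles_inclusive(values):
    """Return (Q1, Q2, Q3, Q4), with Q4 taken as the largest value."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, max(values)


# Hypothetical data (assumption, for illustration only).
costs = [10, 20, 30, 40]

q1, q2, q3, q4 = quartiles_inclusive(costs)
print(q1, q2, q3, q4)
```

The second quartile is the median, and the fourth quartile is simply the maximum, matching the definition of the 100th percentile.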
Example 3.25 Computing Quartiles in Excel
 Compute the Quartiles of the Cost per Order data
 First quartile: =QUARTILE.INC(G4:G97,1) = $6,757.81
 Second quartile: =QUARTILE.INC(G4:G97,2) = $15,656.25
 Third quartile: =QUARTILE.INC(G4:G97,3) = $27,593.75
 Fourth quartile: =QUARTILE.INC(G4:G97,4) = $127,500.00
Cross-Tabulations
 A cross-tabulation is a tabular method that displays the number of
observations in a data set for different subcategories of two categorical
variables.
◦ A cross-tabulation table is often called a contingency table.
 The subcategories of the variables must be mutually exclusive and
exhaustive, meaning that each observation can be classified into only one
subcategory, and, taken together over all subcategories, they must constitute
the complete data set.
 Open Sales Transaction file.
Example 3.26: Constructing a Cross-Tabulation
Example 3.26: Constructing a Cross-Tabulation
Cross-Tabulation Visualization: Chart of Regional Sales by Product
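A cross-tabulation of two categorical variables amounts to counting observations for each pair of subcategories. A minimal sketch, with hypothetical region and product values (not the Sales Transactions data) and an illustrative `cross_tabulate` helper:

```python
from collections import Counter


def cross_tabulate(records, row_field, col_field):
    """Count records for every (row category, column category) pair."""
    counts = Counter((record[row_field], record[col_field]) for record in records)
    row_labels = sorted({row for row, _ in counts})
    col_labels = sorted({col for _, col in counts})
    return {
        row: {col: counts.get((row, col), 0) for col in col_labels}
        for row in row_labels
    }


# Hypothetical transactions (assumption, for illustration only).
sales = [
    {"region": "East", "product": "Book"},
    {"region": "East", "product": "DVD"},
    {"region": "East", "product": "Book"},
    {"region": "West", "product": "Book"},
]

table = cross_tabulate(sales, "region", "product")
for region, row in table.items():
    print(region, row)
```

Because the subcategories are mutually exclusive and exhaustive, the cell counts add up to the total number of records.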
Exploring Data Using PivotTables
 Excel provides a powerful tool for distilling a complex data set into
meaningful information: PivotTables.
 PivotTables allow you to create custom summaries and charts of
key information in the data.
 PivotTables can be used to quickly create cross-tabulations and to
drill down into a large set of data in numerous ways.


Constructing PivotTables
 Open Sales Transaction file.
 Select the data.
 Create a PivotTable: Insert > Tables > PivotTable
The wizard creates a blank PivotTable as shown.
PivotTable Field List
Select and drag the fields to one of the PivotTable areas:
 Column Labels - Product
 Row Labels - Region
 Σ Values – Sum of Cust ID
Example 3.27: Creating a PivotTable
Initial PivotTable for Regional Sales by Product
The PivotTable defaults to a sum of the field in the Values area.
We seek a count of the number of records in each category.
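Why the default sum is not useful here, and a count is, can be seen in a small sketch that groups records and applies either summary. This is a loose analogy to a PivotTable, not Excel's behavior in detail; the records and the `pivot` helper are hypothetical.

```python
from collections import defaultdict


def pivot(records, row_field, col_field, value_field, aggregate):
    """Group value_field by (row, column) and summarize each group."""
    groups = defaultdict(list)
    for record in records:
        groups[(record[row_field], record[col_field])].append(record[value_field])
    return {key: aggregate(values) for key, values in groups.items()}


# Hypothetical transactions (assumption, for illustration only).
sales = [
    {"region": "East", "product": "Book", "cust_id": 10001},
    {"region": "East", "product": "Book", "cust_id": 10002},
    {"region": "East", "product": "DVD", "cust_id": 10003},
    {"region": "West", "product": "Book", "cust_id": 10004},
]

summed = pivot(sales, "region", "product", "cust_id", sum)   # adds up ID numbers
counted = pivot(sales, "region", "product", "cust_id", len)  # number of records

print(summed[("East", "Book")], counted[("East", "Book")])
```

Adding customer ID numbers produces a figure with no business meaning, whereas the count gives the number of transactions in each region and product combination.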
Changing Value Field Settings
Analyze > Active Field > Field Settings
 Change summarization
method in Value Field
Settings dialog box
 Select Count
Final PivotTable
Modifying PivotTables
 Uncheck the boxes in
the PivotTable Field
List or drag the field
names to different
areas.
 You may easily add
multiple variables in
the fields to create
different views of the
data.
◦ Example: drag the
Source field into the Row
Labels area
Example 3.28: Using the PivotTable Report Filter
 Dragging a field into the Report Filter area in the
PivotTable Field list allows you to add a third dimension
to your analysis.
Click the drop-down arrow in cell B1; choose Credit.
PivotCharts
 PivotCharts visualize data in PivotTables.
 They can be created in a simple one-click fashion.
◦ Select the PivotTable.
◦ From the Analyze tab, click PivotChart.
◦ Excel will display an Insert Chart dialog that allows you to choose the type of
chart you wish to display.
Example 3.29: A PivotChart for Sales Data

By clicking on the drop-down buttons, you can easily change the
data that are displayed by filtering the data. Also, by clicking on the
chart and selecting the PivotChart Tools Design tab, you can switch
the rows and columns to display an alternate view of the chart or
change the chart type entirely.
Slicers
 Excel 2010 introduced slicers, tools for drilling down to “slice” a
PivotTable and display a subset of data.
 To create a slicer for any of the columns in the database, click on
the PivotTable and choose Insert Slicer from the Analyze tab in the
PivotTable Tools ribbon.
Example 3.30: Using Slicers
Cross-tabulation “sliced” by E-mail
PivotTable Dashboards
 The camera tool is useful for creating PivotTable-based dashboards.
 If you create several different PivotTables and charts, you can easily
use the camera tool to take pictures of them and consolidate them
onto one worksheet.
 In this fashion, you can still make changes to the PivotTables and
they will automatically be reflected in the camera shots.


Camera-Based Dashboard Example
Thank you!
