Module 5 - Data Visualization.pptx (1)

Uploaded by

21-58129
Chapter 3

Visualizing and Exploring Data
Data Queries: Tables, Sorting, and Filtering
• Managers often need to sort and filter data.
◦ Filtering means extracting a set of records having certain characteristics.
• Excel provides a convenient way of formatting databases to facilitate analysis using sorting and filtering, called Tables.
Example 3.10: Creating an Excel Table
• First, select the range of the data, including headers (a useful shortcut is to select the first cell in the upper left corner, then press Ctrl+Shift+down arrow, and then Ctrl+Shift+right arrow).
• Next, click Table from the Tables group on the Insert tab and make sure that the box for My Table Has Headers is checked. (You may also just select a cell within the table and then click on Table from the Insert menu.)
• The table range will now be formatted and will expand automatically when new data are entered.
• If you click within a table, the Table Tools Design tab will appear in the ribbon, allowing you to do a variety of things, such as change the color scheme, remove duplicates, change the formatting, and so on.
Example 3.11: Table-Based Calculations
• Suppose that in the Credit Risk Data table, we wish to calculate the total amount of savings in column C. We could, of course, simply use the function =SUM(C4:C428). However, with a table, we could use the formula =SUM(Table1[Savings]). The table name, Table1, can be found (and changed) in the Properties group of the Table Tools Design tab. Note that Savings is the name of the header in column C. One of the advantages of doing this is that if we add new records to the table, the calculation will be updated automatically.
Sorting Data in Excel
• The sort buttons in Excel can be found under the Data tab in the Sort & Filter group. Select a single cell in the column you want to sort on and click the “AZ down arrow” button to sort from smallest to largest or the “AZ up arrow” button to sort from largest to smallest. You may also click the Sort button to specify criteria for more advanced sorting capabilities.
Example 3.12: Sorting Data in the Purchase Orders Database
• Suppose we wish to sort the data by supplier. Click on any cell in column A of the data (but not the header cell A3) and then the “AZ down” button in the Data tab. Excel will select the entire range of the data and sort by name of supplier in column A.
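For readers who want to reproduce the same kind of sort outside Excel, the following is a minimal Python sketch. The records, supplier names, and field names are hypothetical and are not taken from the Purchase Orders database.

```python
# Hypothetical purchase-order records; names and costs are made up.
orders = [
    {"supplier": "Zenith Metals", "item_cost": 4.25},
    {"supplier": "Acme Fasteners", "item_cost": 1.05},
    {"supplier": "Bolt Supply Co.", "item_cost": 0.75},
]

# Ascending sort on the supplier column (smallest to largest, A to Z).
by_supplier = sorted(orders, key=lambda row: row["supplier"])

# Descending sort on a numeric column (largest to smallest).
by_cost_desc = sorted(orders, key=lambda row: row["item_cost"], reverse=True)

print([row["supplier"] for row in by_supplier])
print([row["item_cost"] for row in by_cost_desc])
```

As in Excel, whole records move together: sorting on one column reorders every field of each row.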
Pareto Analysis
An Italian economist, Vilfredo Pareto, observed in 1906
that a large proportion of the wealth in Italy was owned
by a small proportion of the people.
Similarly, businesses often find that a large proportion of
sales come from a small percentage of customers, a
large percentage of quality defects stems from just a
couple of sources, or a large percentage of inventory
value corresponds to a small percentage of items.
A Pareto analysis involves sorting data and calculating
cumulative proportions.
Example 3.13: Applying the Pareto Principle
About 75% of the bicycle inventory value comes from fewer than 40% (9 of 24) of the items.
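The two steps of a Pareto analysis (sort, then accumulate proportions) can be sketched in Python as follows. The item names and inventory values are hypothetical; they are not the bicycle inventory data from the example.

```python
from itertools import accumulate

# Hypothetical inventory values per item (illustrative only).
inventory = {"A": 500.0, "B": 120.0, "C": 2400.0, "D": 80.0, "E": 900.0}

# Step 1: sort items from largest to smallest value.
ranked = sorted(inventory.items(), key=lambda kv: kv[1], reverse=True)

# Step 2: divide the running total by the grand total to get
# cumulative proportions of value.
total = sum(inventory.values())
running = list(accumulate(value for _, value in ranked))
pareto = [
    (name, value, cum / total)
    for (name, value), cum in zip(ranked, running)
]

for position, (name, value, cum_share) in enumerate(pareto, start=1):
    item_share = position / len(pareto)  # cumulative share of items
    print(f"{name}: value={value:8.2f}  cum. value={cum_share:6.1%}  cum. items={item_share:6.1%}")
```

Reading the two cumulative columns side by side shows how few items account for most of the value.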
Filtering Data
• For large data files, finding a particular subset of records that meet certain characteristics by sorting can be tedious.
• Excel provides two filtering tools:
◦ AutoFilter for simple criteria, and
◦ Advanced Filter for more complex criteria.
Example 3.14: Filtering Records by Item Description
In the Purchase Orders database, suppose we are interested in extracting all records corresponding to the item Bolt-nut package.
1. Select any cell in the database.
2. Choose Data > Sort & Filter > Filter.
3. Click on the drop-down arrow in cell D3.
4. Select Bolt-nut package to filter out all other items.
Example 3.14: Filter Results
• The filter tool does not extract the records; it simply hides the records that don’t match the criteria. However, you can copy and paste the data to another Excel worksheet, Microsoft Word document, or a PowerPoint presentation.
• To restore the original data file, click on the drop-down arrow again and then click Clear filter from “Item Description.”
Example 3.15: Filtering Records by Item Cost
• Suppose we wish to identify all records in the Purchase Orders database whose item cost is at least $200. First, click on the drop-down arrow in the Item Cost column and position the cursor over Number Filters. This displays a list of options. Select Greater Than Or Equal To . . . from the list.
Example 3.15: Filtering Records by Item Cost
• The Custom AutoFilter dialog allows you to specify up to two specific criteria using “and” and “or” logic. Enter 200 in the box as shown; the tool will display all records having an item cost of $200 or more.
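The filtering logic itself is simple to express in code. The sketch below uses hypothetical rows and field names; only the $200 threshold comes from the example. The $250 upper bound in the second filter is an arbitrary value chosen to illustrate a two-criteria “and” condition.

```python
# Hypothetical rows; field names loosely mirror the columns mentioned above.
orders = [
    {"item": "Bolt-nut package", "item_cost": 3.75},
    {"item": "Gasket", "item_cost": 210.00},
    {"item": "Bolt-nut package", "item_cost": 3.75},
    {"item": "Control panel", "item_cost": 255.00},
]

# Single criterion: item cost greater than or equal to 200.
costly = [row for row in orders if row["item_cost"] >= 200]

# Two criteria joined with "and" (the 250 bound is an illustrative assumption).
mid_range = [row for row in orders if 200 <= row["item_cost"] <= 250]

# Like AutoFilter, this does not alter the source data: `orders` is unchanged.
print(len(orders), len(costly), len(mid_range))
```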
About the AutoFilter
• AutoFilter creates filtering criteria based on the type of data being filtered. If you choose to filter on Order Date or Arrival Date, the AutoFilter tools will display a different Date Filters menu list for filtering that includes “tomorrow,” “next week,” “year to date,” and so on.
• AutoFilter can be used sequentially to “drill down” into the data.
◦ For example, after filtering the results by Bolt-nut package, we could then filter by order date and select all orders processed in September.
Statistical Methods for Summarizing Data
• Statistics is both the science of uncertainty and the technology of extracting information from data.
• A statistic is a summary measure of data.
• Descriptive statistics are methods that describe and summarize data.
• Microsoft Excel supports statistical analysis in two ways:
1. Statistical functions
2. Analysis ToolPak add-in
Frequency Distributions for Categorical Data
• A frequency distribution is a table that shows the number of observations in each of several nonoverlapping groups.
◦ Categorical variables naturally define the groups in a frequency distribution.
• To construct a frequency distribution, we need only count the number of observations that appear in each category.
◦ This can be done using the Excel COUNTIF function.
Example 3.16: Constructing a Frequency Distribution for Items in the Purchase Orders Database
• List the item names in a column on the spreadsheet.
• Use the function =COUNTIF($D$4:$D$97,cell_reference), where cell_reference is the cell containing the item name.
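The counting that COUNTIF performs for each category can be sketched in Python with `collections.Counter`. The item list below is hypothetical and merely stands in for the item-description column.

```python
from collections import Counter

# Hypothetical item descriptions standing in for the item column.
items = [
    "Bolt-nut package", "Gasket", "O-Ring",
    "Gasket", "Bolt-nut package", "Gasket",
]

# Counter tallies each distinct value, like one COUNTIF per category.
frequency = Counter(items)

for item, count in sorted(frequency.items()):
    print(f"{item:20s} {count}")
```

The counts always sum to the number of observations, which is a quick check that no record was missed.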
Example 3.16: Constructing a Frequency Distribution for Items in the Purchase Orders Database
• Construct a column chart to visualize the frequencies.
Relative Frequency Distributions
• Relative frequency is the fraction, or proportion, of the total.
• If a data set has n observations, the relative frequency of category i is:
Relative frequency of category i = (frequency of category i) / n
• We often multiply the relative frequencies by 100 to express them as percentages.
• A relative frequency distribution is a tabular summary of the relative frequencies of all categories.
Example 3.17: Constructing a Relative Frequency Distribution for Items in the Purchase Orders Database
• First, sum the frequencies to find the total number (note that the sum of the frequencies must be the same as the total number of observations, n).
• Then divide the frequency of each category by this value.
Frequency Distributions for Numerical Data
• For numerical data that consist of a small number of discrete values, we may construct a frequency distribution similar to the way we did for categorical data; that is, we simply use COUNTIF to count the frequencies of each discrete value.
Example 3.18: Frequency and Relative Frequency Distribution for A/P Terms
• In the Purchase Orders data, the A/P terms are all whole numbers: 15, 25, 30, and 45.
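A frequency and relative frequency distribution for a discrete numerical variable can be sketched as below. The four distinct values come from the example; the individual observations, and therefore the resulting counts, are hypothetical.

```python
from collections import Counter

# Hypothetical A/P terms observations using the four values named above.
terms = [30, 30, 15, 45, 30, 25, 15, 30]

frequency = Counter(terms)
n = sum(frequency.values())  # must equal the number of observations

# Relative frequency = frequency of the value divided by n.
relative = {value: count / n for value, count in frequency.items()}

for value in sorted(frequency):
    print(f"{value:3d}  freq={frequency[value]}  rel. freq={relative[value]:.3f}")
```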
Excel Histogram Tool
• A graphical depiction of a frequency distribution for numerical data in the form of a column chart is called a histogram.
• Frequency distributions and histograms can be created using the Analysis ToolPak in Excel.
◦ Click the Data Analysis tools button in the Analysis group under the Data tab in the Excel menu bar and select Histogram from the list.
Histogram Dialog
• Specify the Input Range corresponding to the data. If you include the column header, then also check the Labels box so Excel knows that the range contains a label. The Bin Range defines the groups (Excel calls these “bins”) used for the frequency distribution.
Using Bin Ranges
• If you do not specify a Bin Range, Excel will automatically determine bin values for the frequency distribution and histogram, which often results in a rather poor choice.
• If you have discrete values, set up a column of these values in your spreadsheet for the bin range and specify this range in the Bin Range field.
Example 3.19: Using the Histogram Tool
• We will create a frequency distribution and histogram for the A/P Terms variable in the Purchase Orders database.
• We defined the bin range below the data in cells H99:H103 as follows:
Month
15
25
30
45
Example 3.19: Using the Histogram Tool
• Histogram tool results:
Histograms for Numerical Data
• For numerical data that have many different discrete values with little repetition or are continuous, a frequency distribution requires that we define the groups (bins) by specifying
1. the number of groups,
2. the width of each group, and
3. the upper and lower limits of each group.
• Choose between 5 and 15 groups, and make the width of each group equal.
• Choose the lower limit of the first group (LL) as a whole number smaller than the minimum data value and the upper limit of the last group (UL) as a whole number larger than the maximum data value.
Example 3.20: Constructing a Frequency Distribution and Histogram for Cost per Order
• The data range from a minimum of $68.75 to a maximum of $127,500; set the lower limit of the first group to $0 and the upper limit of the last group to $130,000.
• If we select 5 groups, using equation (3.2) the width of each group is
(UL - LL) / (number of groups) = ($130,000 - $0) / 5 = $26,000
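The group-width calculation and the resulting binning can be sketched in Python. Only the limits ($0 and $130,000) and the choice of 5 groups come from the example; the cost values are hypothetical. The rule that a value equal to a group's upper limit is counted in that group is an assumption of this sketch.

```python
from bisect import bisect_left

# Hypothetical order costs in dollars, all within the chosen limits.
costs = [68.75, 1200.0, 26000.0, 26000.01, 54000.0, 90500.0, 127500.0]

lower_limit, upper_limit, groups = 0.0, 130_000.0, 5
width = (upper_limit - lower_limit) / groups  # (UL - LL) / number of groups

# Upper limit of each group: 26,000, 52,000, ..., 130,000.
upper_limits = [lower_limit + width * (i + 1) for i in range(groups)]

# Assumed convention: a value is counted in the first group whose upper
# limit is greater than or equal to it. Values above the last upper limit
# are not handled here because the limits were chosen to cover the data.
counts = [0] * groups
for cost in costs:
    counts[bisect_left(upper_limits, cost)] += 1

for limit, count in zip(upper_limits, counts):
    print(f"up to {limit:>10,.0f}: {count}")
```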
Example 3.20: Constructing a Frequency Distribution and Histogram for Cost per Order
• Ten-group histogram
Example 3.21: Computing Cumulative Relative Frequencies
• Set the cumulative relative frequency of the first group equal to its relative frequency. Then add the relative frequency of the next group to the cumulative relative frequency.
• For example, the cumulative relative frequency in cell D3 is computed as =D2+C3 = 0.000 + 0.447 = 0.447.
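The running-sum rule described above is what `itertools.accumulate` computes. The relative frequencies in this sketch are hypothetical and are not the values from the worksheet in the example.

```python
from itertools import accumulate

# Hypothetical relative frequencies for five groups; they sum to 1.
relative = [0.40, 0.25, 0.20, 0.10, 0.05]

# The first cumulative value equals the first relative frequency; each
# later value adds the next relative frequency to the running total.
cumulative = list(accumulate(relative))

for rel, cum in zip(relative, cumulative):
    print(f"rel. freq={rel:.2f}  cum. rel. freq={cum:.2f}")
```

The last cumulative relative frequency should equal 1 (up to floating-point rounding), which is a useful check on the table.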
Percentiles
• The kth percentile is a value at or below which at least k percent of the observations lie. The most common way to compute the kth percentile is to order the data values from smallest to largest and calculate the rank of the kth percentile using the formula:
Rank of kth percentile = nk/100 + 0.5 (3.3)
where n is the number of observations. Round the rank to the nearest integer and take the value at that rank.
• Statistical software packages use different methods that often involve interpolating between ranks instead of rounding, thus producing different results.
◦ The Excel function PERCENTILE.INC(array, k) computes the kth percentile of data in the range specified in the array field, where k is in the range 0 to 1, inclusive (i.e., including 0 and 1).
Examples 3.22 and 3.23: Computing Percentiles
• Compute the 90th percentile for Cost per order in the Purchase Orders data.
◦ Rank of kth percentile = nk/100 + 0.5
◦ n = 94; k = 90
◦ For the 90th percentile, the rank is 94(90)/100 + 0.5 = 85.1 (round to 85)
◦ Value of the 85th observation = $74,375
• Using the Excel function PERCENTILE.INC(G4:G97,0.9), the 90th percentile is $73,737.50, which is different from using formula (3.3).
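The gap between the two results comes from rounding versus interpolating. The sketch below implements both approaches on hypothetical data. The function names are illustrative, the half-up rounding rule is an assumption, and the interpolation scheme (0% maps to the minimum, 100% to the maximum, linear in between) is the convention PERCENTILE.INC is understood to follow; treat that correspondence as an assumption rather than a specification.

```python
import math


def percentile_by_rank(values, k):
    """kth percentile via rank = n*k/100 + 0.5, rounded to the nearest integer."""
    ordered = sorted(values)
    n = len(ordered)
    rank = math.floor(n * k / 100 + 0.5 + 0.5)  # round half up (assumption)
    rank = min(max(rank, 1), n)                 # keep the rank inside 1..n
    return ordered[rank - 1]                    # ranks are 1-based


def percentile_interpolated(values, k):
    """Linear interpolation between ranks; 0% is the minimum, 100% the maximum."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * k / 100     # 0-based fractional position
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


data = [12, 15, 20, 22, 30, 41, 55, 90]  # hypothetical values

print(percentile_by_rank(data, 90))       # value at the rounded rank
print(percentile_interpolated(data, 90))  # value interpolated between ranks
```

With n = 94 and k = 90 the rank formula gives 85.1, which this rounding rule maps to rank 85, matching the manual calculation above.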
Example 3.24: Excel Rank and Percentile Tool
Data > Data Analysis > Rank and Percentile
The tool reports $74,375 as the 90.3rd percentile. This is the value obtained by manually computing the 90th percentile with formula (3.3); it differs from the PERCENTILE.INC result of $73,737.50.
Quartiles
• Quartiles break the data into four parts.
◦ The 25th percentile is called the first quartile, Q1;
◦ the 50th percentile is called the second quartile, Q2;
◦ the 75th percentile is called the third quartile, Q3; and
◦ the 100th percentile is the fourth quartile, Q4.
• One-fourth of the data fall below the first quartile, one-half are below the second quartile, and three-fourths are below the third quartile.
• The Excel function QUARTILE.INC(array, quart) computes quartiles, where array specifies the range of the data and quart is a whole number between 1 and 4 designating the desired quartile.
Example 3.25: Computing Quartiles in Excel
• Compute the quartiles of the Cost per Order data.
First quartile: =QUARTILE.INC(G4:G97,1) = $6,757.81
Second quartile: =QUARTILE.INC(G4:G97,2) = $15,656.25
Third quartile: =QUARTILE.INC(G4:G97,3) = $27,593.75
Fourth quartile: =QUARTILE.INC(G4:G97,4) = $127,500.00
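Quartiles can also be computed with Python's standard `statistics` module. The data below are hypothetical. The `method="inclusive"` option interpolates between the minimum and maximum of the data, which is believed to correspond to the QUARTILE.INC convention; that correspondence is an assumption of this sketch.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical values

# n=4 requests the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
q4 = max(data)  # the fourth quartile (100th percentile) is the maximum

print(q1, q2, q3, q4)
```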
Cross-Tabulations
• A cross-tabulation is a tabular method that displays the number of observations in a data set for different subcategories of two categorical variables.
◦ A cross-tabulation table is often called a contingency table.
• The subcategories of the variables must be mutually exclusive and exhaustive, meaning that each observation can be classified into only one subcategory, and, taken together over all subcategories, they must constitute the complete data set.
Example 3.26: Constructing a Cross-Tabulation
Sales Transactions database
Count the number (and compute the percentage) of books and DVDs ordered by region.
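A cross-tabulation is a count over pairs of categories, which can be sketched as follows. The transactions are hypothetical and are not drawn from the Sales Transactions database. The percentages are computed within each region (row percentages); that choice is an assumption, since the example does not state the base for the percentage.

```python
from collections import Counter

# Hypothetical transactions as (region, product) pairs.
sales = [
    ("East", "Book"), ("East", "DVD"), ("West", "Book"),
    ("East", "Book"), ("South", "DVD"), ("West", "Book"),
]

counts = Counter(sales)
regions = sorted({region for region, _ in sales})
products = sorted({product for _, product in sales})

# Print the contingency table with row totals.
print(f"{'Region':8s}" + "".join(f"{p:>6s}" for p in products) + f"{'Total':>7s}")
for region in regions:
    row = [counts[(region, product)] for product in products]
    print(f"{region:8s}" + "".join(f"{c:6d}" for c in row) + f"{sum(row):7d}")

# Row percentages: share of each product within a region (assumed base).
row_share = {
    (region, product): counts[(region, product)]
    / sum(counts[(region, p)] for p in products)
    for region in regions
    for product in products
}
```

Because every transaction falls in exactly one (region, product) cell, the cell counts sum to the number of records.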
Cross-Tabulation Visualization: Chart of Regional Sales by Product
Exploring Data Using PivotTables
• Excel provides a powerful tool for distilling a complex data set into meaningful information: PivotTables.
• PivotTables allow you to create custom summaries and charts of key information in the data.
• PivotTables can be used to quickly create cross-tabulations and to drill down into a large set of data in numerous ways.
Constructing PivotTables
1. Click inside your database.
2. Choose Insert > Tables > PivotTable.
3. The wizard creates a blank PivotTable as shown.
PivotTable Field List
Select and drag the fields to one of the PivotTable areas:
◦ Report Filter
◦ Column Labels
◦ Row Labels
◦ Σ Values
Example 3.27: Creating a PivotTable
Initial PivotTable for Regional Sales by Product
• The PivotTable defaults to a sum of the field in the Values area.
• We seek a count of the number of records in each category.
Changing Value Field Settings
1. Choose Analyze > Active Field > Field Settings.
2. Change the summarization method in the Value Field Settings dialog box.
3. Select Count.
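The difference between the default Sum and the Count summarization can be illustrated with a small grouping sketch. The records and amounts are hypothetical; the point is only that the same row and column grouping yields different cell values depending on the summarization method.

```python
from collections import defaultdict

# Hypothetical records: (region, product, amount).
records = [
    ("East", "Book", 20.0),
    ("East", "Book", 15.5),
    ("East", "DVD", 25.0),
    ("West", "Book", 18.0),
]

sum_by_cell = defaultdict(float)  # what a Sum summarization would show
count_by_cell = defaultdict(int)  # what a Count summarization would show

for region, product, amount in records:
    sum_by_cell[(region, product)] += amount
    count_by_cell[(region, product)] += 1

for cell in sorted(count_by_cell):
    print(cell, f"sum={sum_by_cell[cell]:.2f}", f"count={count_by_cell[cell]}")
```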
Final Pivot Table
Modifying PivotTables
• Uncheck the boxes in the PivotTable Field List or drag the field names to different areas.
• You may easily add multiple variables in the fields to create different views of the data.
◦ Example: drag the Source field into the Row Labels area.
Example 3.28: Using the PivotTable Report Filter
• Dragging a field into the Report Filter area in the PivotTable Field List allows you to add a third dimension to your analysis.
• Click the drop-down arrow in cell B1 and choose Credit.
PivotCharts
• PivotCharts visualize data in PivotTables.
• They can be created in a simple one-click fashion.
◦ Select the PivotTable.
◦ From the Analyze tab, click PivotChart.
◦ Excel will display an Insert Chart dialog that allows you to choose the type of chart you wish to display.
Example 3.29: A PivotChart for Sales Data
By clicking on the drop-down buttons, you can easily change the data that are displayed by filtering the data. Also, by clicking on the chart and selecting the PivotChart Tools Design tab, you can switch the rows and columns to display an alternate view of the chart or change the chart type entirely.
Slicers
• Excel 2010 introduced slicers, which are tools for drilling down to “slice” a PivotTable and display a subset of data.
• To create a slicer for any of the columns in the database, click on the PivotTable and choose Insert Slicer from the Analyze tab in the PivotTable Tools ribbon.
Example 3.30: Using Slicers
Cross-tabulation “sliced” by E-mail
PivotTable Dashboards
• The camera tool is useful for creating PivotTable-based dashboards.
• If you create several different PivotTables and charts, you can easily use the camera tool to take pictures of them and consolidate them onto one worksheet.
• In this fashion, you can still make changes to the PivotTables and they will automatically be reflected in the camera shots.
Camera-Based Dashboard Example
