
Visualizing Descriptive Statistics and Analytics: Progress Check


PROGRESS CHECK

1. What is typically captured on the vertical axis in a data visualization?


2. What is typically captured on the horizontal axis in a data visualization?
3. How do you sort data in a chart in Excel?
4. Why did we sort the data in Exhibit 10.4 by the numerical value (Total Sales) instead of by the categorical
value (Company Name)? Would it be meaningful to sort it by Company Name instead?

VISUALIZING DESCRIPTIVE STATISTICS AND ANALYTICS

LO 10-2
Explain how descriptive analytics
incorporates visualizations in
communicating its results.

Recall from Chapter 6 that descriptive statistics procedures summarize data to determine
what happened. Descriptive statistics are brief summaries (or factoids) of a
data set that represent the data set as a whole, including basic statistics
such as the mean, standard deviation, minimum, and maximum. Exhibit 10.5
depicts a simple table showing Amazon’s net income and sales from 2008 through 2018.
Presented in tabular form, it is difficult to see the changes of the measures over time.

EXHIBIT 10.5
Table of Financial Performance for Amazon from 2008–2018 ($ in millions)
Source: Amazon Income Statements 2008–2018.
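
As a rough illustration of the basic statistics named above, the following Python sketch computes a mean, standard deviation, minimum, and maximum. The values in `sales` are hypothetical placeholders, not the figures from Exhibit 10.5, and the variable names are assumptions made for this example.

```python
import statistics

# Hypothetical yearly sales figures ($ in millions); NOT Amazon's actual data.
sales = [120, 150, 180, 240, 310]

# Basic descriptive statistics that summarize the data set as a whole.
summary = {
    "mean": statistics.mean(sales),
    "std_dev": statistics.stdev(sales),  # sample standard deviation
    "min": min(sales),
    "max": max(sales),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Each number summarizes the whole series in a single value, which is why such statistics are often paired with a chart that shows how the values change over time.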

When the data in Exhibit 10.5 is visualized using a bar chart, as shown in Exhibit 10.6, you can
quickly recognize that Amazon’s sales have grown steadily each year. Note the chart
components:

EXHIBIT 10.6
Bar Chart of Financial Performance for Amazon from 2008–2018 ($ in millions)
Source: Amazon Income Statements 2008–2018.

A vertical axis with a scale from 0 to 250,000, with tick marks indicating each incremental
50,000 increase.
A horizontal axis indicating the years from 2008 through 2018.
Bars indicating the data series for each sales data point of each year.

Bar charts should be sorted logically. When using time series data (values taken on by a
variable are listed in time order: days, months, or years), it typically makes the most sense to
sort the data chronologically. However, bar charts also lend themselves to sorting based on
the numerical value of the measure. In this instance, sales increased each year, so the order
of the bars would not change if the data were sorted in ascending order by sales.
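
The two sort orders can be compared with a small Python sketch. The `(year, sales)` pairs below are hypothetical and deliberately listed out of order; they are not taken from the exhibits.

```python
# Hypothetical (year, sales) pairs; values are illustrative only.
sales_by_year = [(2016, 300), (2014, 180), (2017, 410), (2015, 240)]

# Chronological order: sort on the year (the time dimension).
chronological = sorted(sales_by_year, key=lambda pair: pair[0])

# Ascending by measure: sort on the sales value.
by_value = sorted(sales_by_year, key=lambda pair: pair[1])

# Because sales rise every year in this sample, both orders agree.
print(chronological == by_value)
```

If any year had lower sales than the year before it, the two orders would differ, and the chronological sort would usually be the more meaningful one for time series data.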
Due to the time series nature of this data, a line chart would also work. Line charts should
only be sorted chronologically. Depicting Exhibit 10.1’s sales from five different companies
as a line chart would not make sense because there is nothing that inherently connects the
different companies’ sales totals from data point to data point. In contrast, the data in Exhibit
10.5 can be depicted as a line chart because the net income is provided over a 10-year period
as shown in Exhibit 10.7. Not only do line charts lend themselves to time series data, but they
also lend themselves well to numerical data that extends below 0. In this instance, 2012 and
2014 each had negative net income.
EXHIBIT 10.7
Line Chart of Net Income for Amazon from 2008–2018 ($ in millions)
Source: Amazon Income Statements 2008–2018.

This net income data could technically be depicted with a bar chart as shown in Exhibit
10.8; however, it is clunkier in its presentation than the sales data. The data point for 2012 is
not even visible on the bar chart due to the negative amounts. Line charts are also preferable
when the overall trend is most important to communicate rather than specific
data points. Line charts provide more flexibility with scales.

EXHIBIT 10.8
Bar Chart of Net Income for Amazon from 2008–2018 ($ in millions)
Source: Amazon Income Statements 2008–2018.

Typically, ratio data should be visualized with 0 as the starting point, especially in a bar
chart. However, in a line chart, the trend will stand out the same way, even if your starting
point makes more sense at a point above or below 0.
Another type of chart to depict descriptive analytics is a pie chart. Pie charts can be
useful for showing proportion and visualizing categorical data. Recall that categorical data
tend to be represented by words, such as categorizing transaction types (e.g., sales versus
returns).
Pie charts can be used to show proportion when it is meaningful for numerical data.
Proportion is the number of observations in one particular category divided by the grand
total of observations. However, in Exhibit 10.9, the proportion of each year’s sales over a
period of 10 years is not particularly useful. It is also difficult to tell the year each slice
represents based on the legend. Pie charts are rarely preferred if there are more than six
categories (or “slices” of pie).

EXHIBIT 10.9
Pie Chart of Financial Performance for Amazon from 2008–2018 ($ in millions)
Source: Amazon Income Statements 2008–2018.
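
The proportion calculation described above (observations in one category divided by the grand total) can be sketched in Python as follows. The transaction list is hypothetical, using the sales-versus-returns categories mentioned in the text.

```python
from collections import Counter

# Hypothetical transaction types; illustrative only.
transactions = ["sale", "sale", "return", "sale", "sale", "return", "sale", "sale"]

counts = Counter(transactions)
total = sum(counts.values())

# Proportion = observations in a category / grand total of observations.
proportions = {category: count / total for category, count in counts.items()}

for category, share in proportions.items():
    print(f"{category}: {share:.0%}")
```

The proportions always sum to 1, which is what makes a pie chart a natural fit when there are only a few categories.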


PROGRESS CHECK
5. When is a line chart preferred over a bar chart?
6. When you are visualizing data that is presented over time, which charts are the best options?
7. What type of data does a pie chart present?

PRESENTING DATA IN A DASHBOARD


When you think of a dashboard, you might think of your car. When driving, the dashboard
presents all the information needed to make driving decisions—a speedometer to ensure
you’re keeping under the speed limit, an odometer to indicate when to take your car for
service, a gas gauge to help prevent you from running out of fuel, and so on. In a business
setting, a dashboard is a report that contains a collection of useful visualizations and tables to
check business progress and drive decisions.
In Amazon’s case, it could be useful to display both the year-over-year sales bar chart and
the year-over-year net income line chart on the same report. Dashboards also typically have
filters available so that it is easy for decision-makers to home in on a particular time period,
category, product, or geographic area. If we had more detailed information about the different
locations and products that form the aggregated data in Exhibit 10.5, we could filter the two
charts so that they shifted based on the different location, product category, or year we
selected, as shown in Exhibit 10.10.
EXHIBIT 10.10
Example of How Amazon May Use a Simple Dashboard
Source: Amazon Income Statements 2008–2018.
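
The idea of one filter driving several charts can be sketched in Python. This is only a minimal illustration under stated assumptions: the records, field names, and the `totals_by_year` helper are all hypothetical, and the two dictionaries stand in for the data behind the sales chart and the net income chart.

```python
# Hypothetical detail records; fields and values are illustrative only.
records = [
    {"year": 2017, "category": "Books", "sales": 100, "net_income": 8},
    {"year": 2017, "category": "Electronics", "sales": 250, "net_income": -5},
    {"year": 2018, "category": "Books", "sales": 120, "net_income": 10},
    {"year": 2018, "category": "Electronics", "sales": 300, "net_income": 12},
]


def totals_by_year(rows, measure, category=None):
    """Sum a measure per year, optionally filtered to one category."""
    totals = {}
    for row in rows:
        if category is not None and row["category"] != category:
            continue  # row is excluded by the filter
        totals[row["year"]] = totals.get(row["year"], 0) + row[measure]
    return totals


# Both "charts" react to the same filter selection.
selected = "Books"
sales_chart = totals_by_year(records, "sales", category=selected)
income_chart = totals_by_year(records, "net_income", category=selected)

print(sales_chart)
print(income_chart)
```

Passing `category=None` returns the aggregated totals, which corresponds to viewing the dashboard with no filter applied.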

LAB CONNECTION
Lab 10.1 provides an example of creating a dashboard in Excel with filters, and Lab 10.2 provides an
example of creating a dashboard in Tableau.


BAR CHARTS VERSUS HISTOGRAMS


As you know from previous chapters, histograms are used to illustrate frequency in the
distributions of data. Histograms can be used to assess the distribution of a particular dataset
as well as to simply compare the number of observations across intervals.
While histograms may look like bar charts, they have three key differences:
Histograms represent bins/intervals. Instead of the bars representing categories (such as
product name or year), the bars of a histogram represent bins. Bins, or class intervals, are
subsets of the data, arranged in increasing order. The first bin would begin with the lowest
number in the range of data, and the last bin would end with the largest number in the range
of data. Tableau automatically creates bins for you, but depending on the version of Excel in
use, you may have to create your own bins.
Each bin should be the same size, and together the bins must cover the entire range of data.
Typically, there are between 5 and 20 bins (depending on the size and diversity of the dataset).
It is a best practice to make the range of the data evenly divisible by the number of bins.
Histograms use numerical data. While bar charts could represent counts of
categorical data, histograms are only used for numerical data. This is made clear visually
because there are no gaps between the bars in histograms, unless the gap represents the
absence of data in that particular bin.
Vertical axis shows count of observations. While the vertical axis in a bar chart can be
associated with a variety of descriptive statistics (average, sum, count, etc.), the vertical axis
in a histogram is always associated with the count of observations in each particular bin.
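
The three differences above can be seen in a minimal Python sketch that builds equal-width bins and counts the observations in each one. The scores and the bin width of 10 are assumptions chosen for illustration; this is not how Excel or Tableau creates bins internally.

```python
import math

# Hypothetical exam scores; illustrative only.
scores = [52, 71, 74, 78, 81, 83, 85, 88, 90, 92, 95, 97]

bin_width = 10  # assumed width; every bin is the same size
lowest = math.floor(min(scores) / bin_width) * bin_width
highest = math.ceil((max(scores) + 1) / bin_width) * bin_width

# Equal-width bins [start, start + width) that together cover the whole range.
bin_starts = list(range(lowest, highest, bin_width))
counts = {start: 0 for start in bin_starts}
for score in scores:
    start = (score - lowest) // bin_width * bin_width + lowest
    counts[start] += 1

# Text histogram: the "vertical axis" is the count of observations per bin.
for start, count in counts.items():
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")
```

In this sample the 60-69 bin is empty, so it would appear as a gap in the histogram even though histogram bars otherwise touch.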
Exhibit 10.11 is an example of a histogram for exam results, showing how many As, Bs,
Cs, Ds, and Fs were earned in a class. We can tell that the exam in Exhibit 10.11 is relatively
easy because of the shape of the distribution shown in the histogram. This distribution is
referred to as negatively skewed or skewed left because the bulk of the data is in the upper
range of the distribution (from 70-80, 80-90, and 90-100, or Cs, Bs, and As), but there is an
outlier to the far left, or the lower end of the distribution.
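
One rough way to check for a negative skew numerically is to compare the mean with the median and to compute a skewness coefficient. The sketch below uses hypothetical scores with a single low outlier. The mean-below-median comparison is a common rule of thumb rather than a guarantee, and the skewness formula used here is the simple population version.

```python
import statistics

# Hypothetical exam scores with one low outlier; illustrative only.
scores = [35, 72, 75, 78, 81, 84, 86, 88, 91, 93, 95, 98]

mean = statistics.mean(scores)
median = statistics.median(scores)

# Population skewness: the average cubed standardized deviation.
std = statistics.pstdev(scores)
skewness = sum(((x - mean) / std) ** 3 for x in scores) / len(scores)

print(f"mean={mean:.1f}, median={median:.1f}, skewness={skewness:.2f}")
```

The low outlier pulls the mean below the median and makes the skewness coefficient negative, which matches the skewed-left shape described above.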

EXHIBIT 10.11
Histogram of Grade Distribution
An example of aged receivables (created using the Excel PivotTable in Lab 6-1) shows an
aging analysis, which can translate into a modified histogram (Exhibit 10.12). It is modified
for the following reasons:

EXHIBIT 10.12
Example of Aged Receivables from an Excel Pivot Chart (using Lab 6 data)

The vertical axis represents a sum of transaction amounts in each bin, instead of a count of
transactions.
The lowest bin starts at 1, instead of 9 (which is the actual lowest value in the range of data).
The data presents like a bar chart, due to the spaces between the bars.

Regardless of these differences, the way one would interpret this chart is similar to the way
one would interpret the results of a histogram. There does not seem to be a significant skew,
with the bulk of the receivables being in the middle intervals of the dataset.
For comparison’s sake, look at a count of the aged receivables transactions in the actual
histogram in Exhibit 10.13. The distribution is very similar.
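
The difference between summing amounts per bin (the modified histogram) and counting transactions per bin (the actual histogram) can be sketched in Python. The invoices and the 30-day bucket width are hypothetical assumptions, not the Lab 6 data.

```python
# Hypothetical open invoices as (days_outstanding, amount); illustrative only.
invoices = [(9, 500.0), (25, 1200.0), (40, 800.0), (45, 1500.0), (70, 300.0), (95, 200.0)]

bin_width = 30  # assumed aging bucket size in days
count_by_bin = {}
sum_by_bin = {}
for days, amount in invoices:
    # Buckets start at day 1 (1-30, 31-60, ...), not at the lowest value, 9.
    start = (days - 1) // bin_width * bin_width + 1
    count_by_bin[start] = count_by_bin.get(start, 0) + 1
    sum_by_bin[start] = sum_by_bin.get(start, 0.0) + amount

for start in sorted(count_by_bin):
    end = start + bin_width - 1
    print(f"{start}-{end} days: count={count_by_bin[start]}, sum={sum_by_bin[start]:.2f}")
```

The two views use the same bins but can tell different stories: a bin with few transactions may still hold a large share of the total amount.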
