BA 1.2 - Visualizing Data

- The document discusses visual representations of data, such as histograms, and how they help identify trends and patterns, especially in large data sets.
- It provides an example of a histogram created from the heights of 10 Boston Red Sox players from the 2013 season. The heights vary from 68 to 76 inches, with most players between 72 and 74 inches.
- The document explains how to create a histogram in Excel, using oil consumption data from 10 countries as an example, including setting up bins from 1 to 19 million barrels per day to categorize the data.

Uploaded by

ScarfaceXXX

1.2.1 Recognizing Patterns
Some people are naturally more comfortable working with numbers than others. For
most of us, visual representations of data make it easier to identify trends and patterns
that can help us make better decisions. This is especially true for large data sets; the
more data we have, the harder it typically is to distinguish trends and patterns based
solely on reviewing the data set.

Data visualizations are all around us. The nutritional information on a cereal box offers a
standardized format that helps us understand how a single bowl of cereal fits into the
context of our overall daily food intake. The dashboard on your car is a visual
representation of complex data about a car’s performance—such as its speed, fuel and
oil levels, and whether or not its headlights are on. The dashboard has been carefully
designed to help us understand how the car is operating with only a quick glance.

Managers often look at data in the form of graphs or charts. Consider this set of data
about ten Boston Red Sox players on the roster for the 2013 season—a year in which
the Red Sox won the World Series.

Reflection
What do you notice about the heights of these ten Red Sox Players? Among other
topics, you may wish to consider the following in your response:

- Do the heights of these players vary significantly?

- How tall are the majority of the players?

- Approximately how tall are the shortest and tallest players?

The heights of these players vary considerably. The data reveal that the maximum is 76
inches while the minimum is 68 inches, a difference of 8 inches.

Six players (the majority) are between 72 and 74 inches tall.

The shortest player is 68 inches tall and the two tallest players are 76 inches tall.

In addition to graphing data, we can group data into categories that make it easier to
perform analyses within a category or across multiple categories. The groupings we
choose are often influenced by the question we are asking or the problem we want to
solve. An income statement, also known as a profit and loss (P&L) statement, is a good
example of a way to arrange financial data to make it easier to understand. Accountants
separate data into categories such as income and expenses so companies can analyze
their performance.

Statements with financial data can look intimidating at first, but you’ll become
comfortable using them more quickly than you might think. Over time, most managers
become as adept at understanding a balance sheet or P&L as a driver is at monitoring a
car’s performance by taking a quick glance at the dashboard.

1.2.2 Histograms
One of the most useful and commonly used graphical representations of data is a
histogram. A histogram displays the frequency, or number, of data points (often called
observations) that fall within specified bins.

Histograms allow us to quickly discern trends or patterns in a data set and are easy to
construct using programs such as Excel. The graph of the Red Sox players’ heights
shown below is an example of a histogram.

Video transcript: 1.2.2_01_Histograms.wmv

Let's learn how to create a histogram by looking at the heights of 10 baseball players on
the 2013 World Series-winning Boston Red Sox team. We start by drawing a horizontal, or
x-axis, and label it with the variable of interest, in this case, a player's height. We
then draw the vertical, or y-axis, and label it with frequency to indicate the number of
players whose heights fall within certain ranges.

Before we plot the data, we first need to determine those height ranges. We call these
ranges bins, and we'll use them to categorize players' heights. Let's start by using six
bins, labeled 66, 68, 70, 72, 74, and 76 inches. Each of these numbers represents a range
of possible heights that run up to that number. By convention, Excel includes in the range
the number represented by the label. For example, bin 66 will contain players whose heights
are less than or equal to 66 inches. Bin 68 will contain players whose heights are greater
than 66 inches and less than or equal to 68 inches.

No players are shorter than 66 inches, so the first bin is empty. Its frequency is zero.
The shortest player, Dustin Pedroia, is 68 inches tall, so we place him in bin 68. No other
players have heights that are greater than 66 and less than or equal to 68 inches, so the
frequency of bin 68 is just one. For bin 70, we want to know how many players have heights
that are greater than 68 but less than or equal to 70 inches. The only player is Shane
Victorino, who is 69 inches tall. So the frequency of bin 70 is also one.

We continue this process of grouping players by height until all players are included in
the histogram. Two players, Drew and Napoli, fall into bin 72. Three players, Ellsbury,
Gomes, and Uehara, fall into bin 74. The remaining three players fall into bin 76. Notice
that the sum of the frequencies is 10, because there are 10 baseball players in our dataset.

This histogram gives us a good sense of the height distribution of these 10 players.
Clearly, more players are at the taller end of the distribution than at the shorter end.

Suppose we had used larger bins, say 66, 72, and 78. Again, there are no players in bin 66.
But now Drew, Napoli, Pedroia, and Victorino are all in bin 72. So the frequency of that
bin is four. All of the other players have heights that are greater than 72 but less than
or equal to 78 inches. So the frequency of bin 78 is six.

The bins we choose have a noticeable impact on what a histogram reveals about the
underlying data. Using larger bins simplifies our graph, but provides less detail about the
distribution of heights than the graph we previously constructed. These large bins can
prevent us from seeing interesting trends in the data. Very small bins can create graphs
that show such low frequencies that it can also be difficult to discern patterns.
Fortunately, it's easy to adjust the bins when working in Excel. So with a little
experimentation, we can usually construct bins that provide good visual representations of
our data.
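
The effect of switching from six bins to three can be reproduced from the frequencies given in the transcript alone. The following is a minimal Python sketch, not part of the course material; the function name `merge_bins` is hypothetical, and the approach assumes that every narrow bin lies entirely inside one wider bin, which holds for the labels used here.

```python
from bisect import bisect_left

# Frequencies from the six-bin histogram described above (bin label -> count).
fine_counts = {66: 0, 68: 1, 70: 1, 72: 2, 74: 3, 76: 3}

def merge_bins(counts, coarse_labels):
    """Re-bin existing counts into wider bins.

    A narrow bin labeled b is added to the first wide bin whose label is >= b.
    This is only valid when each narrow bin sits entirely inside one wide bin.
    """
    coarse_labels = sorted(coarse_labels)
    merged = {label: 0 for label in coarse_labels}
    for label, count in counts.items():
        idx = bisect_left(coarse_labels, label)
        if idx == len(coarse_labels):
            raise ValueError(f"bin {label} exceeds the largest wide bin")
        merged[coarse_labels[idx]] += count
    return merged

coarse_counts = merge_bins(fine_counts, [66, 72, 78])
print(coarse_counts)  # {66: 0, 72: 4, 78: 6}
```

The merged counts match the transcript (zero, four, and six players), and the total stays at 10, but the detail about where players sit within each wide range is lost.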

Let’s review a few key concepts before we learn how to create a histogram in Excel.

On the horizontal axis, we display a series of single values, each of which represents a
bin, or range of possible values. On the vertical axis, we display the frequency of the
observations in each bin. With a small data set, we can count and assign data points to
bins as we just did, but with large data sets, this approach would be extremely tedious.
This is where programs like Excel are helpful!

Let's consider another example, the amount of oil consumed by the ten top oil-
consuming countries in 2012, and create a histogram of that data set in Excel. Before we
begin, let’s decide which bins to use. Notice that the consumption data (shown below)
have been sorted from largest to smallest. Because the data values range from
approximately 2 to 19 million barrels per day, we’ll use bins 1, 2, …, 19.

By convention, Excel includes in the range the number represented by the bin label. So
bin 1 includes all countries with oil consumption less than or equal to 1 million barrels
per day (x≤1); bin 2 includes all countries with oil consumption greater than 1 but less
than or equal to 2 million barrels per day (1<x≤2); and bin 19 includes all countries with
oil consumption greater than 18 but less than or equal to 19 million barrels per day
(18<x≤19).
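
The upper-inclusive bin rule described above can be written as a small lookup. This is an illustrative Python sketch rather than Excel's own implementation; the name `bin_label` is hypothetical, and values above the largest label are reported as "More", the overflow label that Excel adds to its histogram output.

```python
from bisect import bisect_left

BIN_LABELS = list(range(1, 20))  # bins 1, 2, ..., 19 (million barrels per day)

def bin_label(x, labels=BIN_LABELS):
    """Return the label of the bin containing x.

    x belongs to the first bin whose label is >= x, so a value equal to a
    label stays in that bin (for example, 2.0 is in bin 2, not bin 3).
    """
    idx = bisect_left(labels, x)
    return labels[idx] if idx < len(labels) else "More"

print(bin_label(0.5))   # 1
print(bin_label(2.0))   # 2
print(bin_label(2.3))   # 3
print(bin_label(18.6))  # 19
print(bin_label(19.5))  # More
```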

Spreadsheet: Creating a Histogram

This is a two-question set. In Question 1, you will only be setting up the bins (1-19) that
correspond to the range of the data. In Question 2, you will be building the actual
histogram.

Question 1 of 2

Step 1

Label column D. Since this label will appear on the horizontal axis of the histogram,
copy the label from cell A1 into cell D1.

Step 2
To enter the bins in column D, input 1 into cell D2, then input 2 into cell D3, and
continue down the column until you have entered all 19 bin labels.

Note

- Numeric responses should be entered into the blue answer cells. All of the blue
  answer cells must be filled before you can submit your answer.
- You can manually input each value or you can use auto-fill, which allows you to
  quickly populate values without entering them one by one.
- To use auto-fill, enter the first two values in cells D2 and D3. Highlight those two
  cells and place your cursor at the bottom right-hand corner of cell D3. The cursor
  will turn into a black cross. Drag the cross down the column until you reach
  cell D20. When you release the mouse, the values will auto-fill.

Correct!

1 should be in cell D2, 2 should be in cell D3, …, and 19 should be in cell D20.

Question 2 of 2

Step 3

From the Data menu, select Data Analysis, then select Histogram.

Step 4

Enter the appropriate Input Range and Bin Range:

- The Input Range is the original oil consumption data in column A with its
  label, A1:A11.
- The Bin Range is the set of values from 1-19 that you created in column D with
  its label, D1:D20.
- Make sure to include the cells containing labels when inputting your ranges and
  check the Labels in first row box, as this ensures that your histogram will be
  appropriately labeled.

Correct!

The Input Range is A1:A11 and the Bin Range is D1:D20. You must check
the Labels in first row box since we included A1 and D1 to ensure that the
histogram’s axes are appropriately labeled. Note that Excel automatically adds a bin
called “More” to histograms. In this case “More” includes all countries with oil
consumption greater than 19 million barrels per day.
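
For readers without Excel at hand, the frequency table that the Histogram tool produces can be approximated in a few lines of Python. This is a sketch under stated assumptions: the consumption figures are the ones quoted in this section's quiz feedback, the function name `histogram_table` is hypothetical, and the output mimics only the bin counts, not Excel's chart.

```python
# Oil consumption in million barrels per day, as quoted in this section.
consumption = {
    "United States": 18.6, "China": 10.3, "Japan": 4.7, "India": 3.6,
    "Russia": 3.2, "Saudi Arabia": 2.9, "Brazil": 2.8, "Germany": 2.4,
    "Canada": 2.3, "South Korea": 2.3,
}
bins = list(range(1, 20))  # the 19 bin labels entered in column D

def histogram_table(values, bin_labels):
    """Count values per bin, with an overflow bin labeled "More"."""
    table = {label: 0 for label in bin_labels}
    table["More"] = 0
    for v in values:
        # First bin whose label is >= v; "More" if v exceeds every label.
        label = next((b for b in bin_labels if v <= b), "More")
        table[label] += 1
    return table

table = histogram_table(consumption.values(), bins)
for label, freq in table.items():
    if freq:
        print(label, freq)
# 3 5
# 4 2
# 5 1
# 11 1
# 19 1
```

The "More" bin stays at zero here because no country in the data set consumes more than 19 million barrels per day.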

Related: Creating Histograms in Excel

In order to create histograms in Excel, you may need to download the Analysis
ToolPak, which is an add-in program. Please consult the Microsoft website for more
information on how to install the Analysis ToolPak for your own version of Excel.

Note that you do NOT need to obtain the Analysis ToolPak to complete the
online spreadsheets that are part of the HBS Online Business Analytics
course. The add-in program is only necessary if you wish to use these analytical
tools in your own version of Excel.

The histogram you just created is skewed, meaning that it has a tail that extends out to
one side. The tail is the part of a graph that appears long or “flattens”, and has bins with
lower frequencies. Skewness measures the degree of a graph’s asymmetry. If the right
tail is longer, we say the distribution is skewed to the right or “right-tailed.” Likewise, if
the left tail is longer, we say the distribution is skewed to the left or “left-tailed.” The oil
consumption data set is skewed to the right.
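
The direction of skew can also be checked numerically. The course text does not give a formula for skewness, so the sketch below assumes one common moment-based definition, g1 = m3 / m2^1.5; other software may apply a small-sample adjustment and report a slightly different number. The function name `sample_skewness` is hypothetical, and the data are the consumption figures quoted in this section.

```python
from statistics import mean, median

# Oil consumption in million barrels per day, as quoted in this section.
consumption = [18.6, 10.3, 4.7, 3.6, 3.2, 2.9, 2.8, 2.4, 2.3, 2.3]

def sample_skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

skew = sample_skewness(consumption)
print(skew > 0)                                  # True: right-skewed
print(mean(consumption) > median(consumption))   # True: the tail pulls the mean up
```

A positive value indicates a longer right tail. The mean exceeding the median is a second, informal sign of the same asymmetry.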
Question 1 of 5

Select the countries that are in bin 4.

United States
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. The United States consumes 18.6 million barrels
per day.
China
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. China consumes 10.3 million barrels per day.
Japan
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. Japan consumes 4.7 million barrels per day.
India CORRECT
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. India consumes 3.6 million barrels per day and
therefore falls into bin 4.
Russia CORRECT
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. Russia consumes 3.2 million barrels per day and
therefore falls into bin 4.
Saudi Arabia
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. Saudi Arabia consumes 2.9 million barrels per
day.
Brazil
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. Brazil consumes 2.8 million barrels per day.
Germany
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. Germany consumes 2.4 million barrels per day.
Canada
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. Canada consumes 2.3 million barrels per day.
South Korea
Bin 4 includes all countries that consume more than 3 million barrels and less than or
equal to 4 million barrels of oil per day. South Korea consumes 2.3 million barrels per
day.

Question 2 of 5

According to the histogram, which range contains the most countries?

Greater than 1 million and less than or equal to 2 million barrels per day
The frequency of bin 2, which represents oil consumption greater than 1 million and less
than or equal to 2 million barrels per day, is zero. Look for the bin with the highest
frequency.
Greater than 2 million and less than or equal to 3 million barrels per day
The tallest bar corresponds to bin 3, which means that bin 3 has the highest frequency.
Therefore, the range containing the most countries in the data set includes all countries
that consume more than 2 million and less than or equal to 3 million barrels of oil per
day.
Greater than 3 million and less than or equal to 4 million barrels per day
Only two countries consume more than 3 million and less than or equal to 4 million
barrels per day. Look for the bin with the highest frequency.
Greater than 18 million and less than or equal to 19 million barrels per day
Only one country consumes more than 18 million and less than or equal to 19 million
barrels per day. Look for the bin with the highest frequency.

Question 3 of 5

According to the histogram, which range contains the country in the data set that
consumes the least amount of oil?

More than 0 and less than or equal to 1 million barrels per day
The frequency of bin 1, which represents oil consumption less than or equal to 1 million
barrels per day, is zero. This means that no countries consume an amount in this range.
The amount of oil consumed per day is shown on the horizontal axis. Look for the range,
or bin, with the lowest value that has a frequency of at least one.
More than 1 and less than or equal to 2 million barrels per day
The frequency of bin 2, which represents oil consumption more than 1 million and less
than or equal to 2 million barrels per day, is zero. This means that no countries consume
an amount in this range. The amount of oil consumed per day is shown on the horizontal
axis. Look for the range, or bin, with the lowest value that has a frequency of at least
one.
More than 2 and less than or equal to 3 million barrels per day
The amount of oil consumed per day is shown on the horizontal axis. Bin 3 is the lowest
range that has a frequency of at least one (in this case, the frequency is 5). Therefore,
the lowest consumer of oil consumes more than 2 million and less than or equal to 3
million barrels per day.
More than 18 and less than or equal to 19 million barrels per day
The country in bin 19 consumes more than 18 million barrels per day. Other countries
consume less. The amount of oil consumed per day is shown on the horizontal axis.
Look for the range, or bin, with the lowest value that has a frequency of at least one.

Question 4 of 5

How many countries consume more than 10 million and less than or equal to 11 million
barrels of oil per day?

0
The number of countries that consume more than 10 million and less than or equal to 11
million barrels of oil per day is indicated by the height of the bar at bin 11. Because the
height exceeds zero, we know that at least one country’s consumption is within this
range. How many countries consume more than 10 million and less than or equal to 11
million barrels?
1
The number of countries that consume more than 10 million and less than or equal to 11
million barrels of oil per day is indicated by the height of the bar at bin 11. The frequency
of that bar is one, which indicates that one country consumes more than 10 million and
less than or equal to 11 million barrels of oil per day.
8
The number of countries that consume more than 10 million and less than or equal to 11
million barrels of oil per day is indicated by the height of the bar at bin 11. Eight countries
consume less than or equal to 10 million barrels of oil per day. How many countries
consume more than 10 million and less than or equal to 11 million barrels?
9
The number of countries that consume more than 10 million and less than or equal to 11
million barrels of oil per day is indicated by the height of the bar at bin 11. Nine countries
consume less than or equal to 11 million barrels of oil per day. How many countries
consume more than 10 million and less than or equal to 11 million barrels?

Question 5 of 5

Suppose you are interested in gaining a deep understanding about the distribution of
salaries at your company.

Which histogram provides the greatest insight about the distribution of the salary data for 15
employees?

COLD CALL

Which histogram do you think best displays the 2012 revenue for the top 100 U.S.
companies? Why?
I think Option C is the most adequate graph since it uses the right amount of bins.

Using larger bins such as in Option A and B simplifies the graph but provides less detail
about the distribution. Therefore they prevent us from seeing interesting trends in the
data.

On the other hand, very small bins such as in Option D provide graphs that show such
low frequencies that it can be difficult to see any patterns in the data. –Carla

Option B. We see that it is right-tailed, and it shows the distribution of 2012 revenue in
a reasonable format. Options C and D have too much data. –Tarun

COLD CALL

How would you decide what bins to use when creating a histogram? What factors might
influence the bins you select?

Choosing bins balances showing trends versus showing noise. Ideal bins demonstrate
general trends in the data, skewing right or left, without overwhelming the visual with too
much noise. While raw numbers show granular data, histograms should be used for
more abstract shape attributes of the data. –Jason

1. Size of the frequency or population.

2. Spread of sizes, i.e., from lowest to highest, and the average gap between the sizes.

3. Desired outcome, i.e., if a detailed outcome is required, then the bin sizes are smaller.

–Thang Fei

I would look over the entire range of data to get a sense of how many data points will be
shown and how widely dispersed the data points are. –JJ

1.2.3 Outliers
Video transcript: 1.2.3_01_Outliers.wmv

When we examine a histogram, we may notice values that fall far from the rest of the data.
For example, if we graph the age distribution of students in a college course, we might see
that almost all of the data points are clustered around 20 years, but then see one data
point at 60 years. Data points that fall far from the rest of the data are called outliers.

When we see an outlier, we should investigate why it exists. Is it just an unusual but
valid data point? Is it a data entry error? Was it collected in a different manner or at a
different time than the rest of the data? In this case, we might discover that the data
point refers to a 60-year-old who is taking the course to learn more about a topic of
interest, but is not enrolled in a degree-granting program.

Typically, we take one of three approaches when we see an outlier: leave it as is, change
it to a corrected value, or, very rarely, remove it from the data set. A 60-year-old in a
college class may be an outlier, but his age represents a legitimate value in the data set.
If we truly want to understand the age distribution of all students in the course, we
should leave that point in. However, if what we really want to know is the age distribution
of students in the course who are also enrolled in a full-time degree-granting program, we
would exclude this outlier.

Occasionally, we may change the value of an outlier. But this should be done only after
examining the underlying situation very carefully. For example, let's look at a graph
showing the inventory of football cleats in a retail store. Most of the values are between
5 and 15. A data point showing inventory of 80 pairs would certainly be an outlier. Notice
that the data point 80 was recorded on April 7th, and that the inventory was 10 on the
preceding day and 6 on the following day. Based on our management understanding of how
inventory levels rise and fall, we realize that the value 80 is extremely unlikely. We
speculate that the data point might be a data entry error. Further investigation of sales
and purchasing records reveals that the actual inventory level on that day was 8, not 80.

Excluding or changing data is not something we do often. We should never do it to help the
data fit a conclusion we wish to draw. Such changes to a data set should only be done on a
case-by-case basis after careful investigation of the situation.

Drill Down: Technical Definition of an Outlier

Technically, a data point is considered an outlier if it is more than a specified distance
below the lower quartile or above the upper quartile of a data set. Let’s start with a
couple of definitions. The lower quartile, Q1, is the 25th percentile—by definition, 25%
of all observations fall below Q1. The upper quartile, Q3, is the 75th percentile—75%
of all observations fall below Q3. The interquartile range (IQR) is the difference
between the upper and lower quartiles, that is, IQR=Q3–Q1. We then multiply the IQR
by 1.5 to find the appropriate range, computing 1.5(IQR)=1.5(Q3–Q1). A data point is
an outlier if it is less than Q1–1.5(IQR) or greater than Q3+1.5(IQR).
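
The 1.5(IQR) rule can be applied to the oil consumption figures quoted earlier in this section. The sketch below is illustrative only: the function name `iqr_outliers` is hypothetical, and the way quartiles are interpolated is an assumption, since the text does not say which convention it uses. Here the default is the "inclusive" method of Python's `statistics.quantiles`.

```python
from statistics import quantiles

# Oil consumption in million barrels per day, as quoted in this section.
consumption = {
    "United States": 18.6, "China": 10.3, "Japan": 4.7, "India": 3.6,
    "Russia": 3.2, "Saudi Arabia": 2.9, "Brazil": 2.8, "Germany": 2.4,
    "Canada": 2.3, "South Korea": 2.3,
}

def iqr_outliers(data, k=1.5, method="inclusive"):
    """Return names whose values lie below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(data.values(), n=4, method=method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = [name for name, v in data.items() if v < low or v > high]
    return flagged, (low, high)

outliers, (low, high) = iqr_outliers(consumption)
print(outliers)  # ['United States', 'China']
```

With the inclusive convention, Q1 is about 2.5 and Q3 about 4.425, so the upper cutoff is roughly 7.3 and both the United States and China exceed it. The result is sensitive to the quartile convention: with `method="exclusive"` the upper cutoff rises above China's 10.3 and only the United States is flagged, which is consistent with the remark below that China's status is hard to judge from the graph.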

Let’s return to the oil consumption data.

Question 1 of 2

Select the countries that appear to be outliers.

United States CORRECT

An outlier is a data point that falls far from the rest of the data. The United States
consumes far more oil per day than any other country in the data set and is therefore
considered an outlier. (See the Drill Down Drop Bar above for more information about
how outliers are defined.)
China CORRECT
An outlier is a data point that falls far from the rest of the data. Whether or not China is
an outlier is hard to determine from looking at the graph. Technically, China is an outlier.
(See the Drill Down Drop Bar above for more information about how outliers are
defined.)
Japan
Japan is not an outlier because it is sufficiently close to where the rest of the data
cluster. (See the Drill Down Drop Bar above for more information about how outliers are
defined.)
India
India is not an outlier because it is sufficiently close to where the rest of the data cluster.
(See the Drill Down Drop Bar above for more information about how outliers are
defined.)
Russia
Russia is not an outlier because it is sufficiently close to where the rest of the data
cluster. (See the Drill Down Drop Bar above for more information about how outliers are
defined.)
Saudi Arabia
Saudi Arabia is not an outlier because it is sufficiently close to where the rest of the data
cluster. (See the Drill Down Drop Bar above for more information about how outliers are
defined.)
Brazil
Brazil is not an outlier because it is sufficiently close to where the rest of the data cluster.
(See the Drill Down Drop Bar above for more information about how outliers are
defined.)
Germany
Germany is not an outlier because it is sufficiently close to where the rest of the data
cluster. (See the Drill Down Drop Bar above for more information about how outliers are
defined.)
Canada
Canada is not an outlier because it is sufficiently close to where the rest of the data
cluster. (See the Drill Down Drop Bar above for more information about how outliers are
defined.)
South Korea
South Korea is not an outlier because it is sufficiently close to where the rest of the data
cluster. (See the Drill Down Drop Bar above for more information about how outliers are
defined.)

Question 2 of 2

If you were interested in knowing the average oil consumption of the top oil-consuming
countries, how would you handle the outliers?

Keep them in the data set

Because we want to know average oil consumption of the top oil-consuming countries,
we would keep the outliers in the data set. We know that the United States and China
are large oil-consumers, so we have no reason to suspect that those countries’ data are
not valid entries. Removing them would significantly alter our analysis.
Change their values
Based on our knowledge, we know that the United States and China are large
consumers of oil, so we have no reason to suspect that those countries’ data are not
valid entries. Thus, even though those data points are outliers, we should not change
their values.
Remove them from the data set
Based on our knowledge, we know that the United States and China are large
consumers of oil, so we have no reason to suspect that the data points for those
countries are not valid entries. Thus, even though those data points are outliers, we
should not remove the outliers from the data set. Removing them would significantly
alter our analysis.
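
To see how much removing the two outliers would change the answer, the average can be computed both ways. This is a quick Python check using the consumption figures quoted in this section; the variable names are illustrative.

```python
from statistics import mean

# Oil consumption in million barrels per day, as quoted in this section.
consumption = {
    "United States": 18.6, "China": 10.3, "Japan": 4.7, "India": 3.6,
    "Russia": 3.2, "Saudi Arabia": 2.9, "Brazil": 2.8, "Germany": 2.4,
    "Canada": 2.3, "South Korea": 2.3,
}
outliers = {"United States", "China"}

mean_all = mean(consumption.values())
mean_without = mean(v for name, v in consumption.items() if name not in outliers)

print(f"{mean_all:.2f}")      # 5.31
print(f"{mean_without:.3f}")  # 3.025
```

Dropping the two valid data points would cut the average from about 5.3 to about 3.0 million barrels per day, which no longer describes the top oil-consuming countries as a group.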

1.2 Practice Questions

Question 1 of 6

The following data set lists the prices for thirty houses in and around Boston,
Massachusetts. Create a histogram of the data using the bins provided in column D.

Correct!

The Input Range is B1:B31 and the Bin Range is D1:D8. You must check
the Labels in first row box since we included B1 and D1 to ensure that the
histogram’s axes are appropriately labeled.

Question 2 of 6

How would you describe the shape of the distribution shown below of the real estate
pricing data?
Uniform
A uniform distribution has constant probability across a range of possible outcomes.
Thus the bars of the histogram of a uniform distribution will have the same frequency
provided the bins over the range of possible outcomes are of equal size. Since the
frequencies of the bins in this graph vary, the distribution is not uniform.
Right-tailed
This graph has a tail that extends out the right side. As selling price increases, the
frequency of each bin above $600,000 is much less than those below $600,000.
Therefore, we infer that this distribution is skewed to the right, or right-tailed.
Left-tailed
This graph is not left-tailed. Although it has a tail, the tail extends out the right side, not
the left side. Thus we cannot infer that the distribution is left-tailed.
Symmetric
This graph is not symmetric; it has a tail that extends out to one side.
Question 3 of 6

How many houses cost more than $400 thousand and less than or equal to $800
thousand?
Approximately 2
By convention, Excel includes in a bin’s range the number represented by the bin label.
For example, the first bin (labeled $200,000) includes all houses with values less than or
equal to $200,000 and the second bin (labeled $400,000) includes all houses with
values greater than $200,000 but less than or equal to $400,000. The only bins with
frequency 2 are the fourth bin (labeled $800,000), which indicates that approximately 2
houses cost more than $600,000 and less than or equal to $800,000, and the sixth bin
(labeled $1,200,000), which indicates that approximately 2 houses cost more than
$1,000,000 and less than or equal to $1,200,000. The number of houses that cost more
than $400,000 and less than or equal to $800,000 is indicated by the height of the bars
at bins $600,000 and $800,000. How many houses cost more than $400,000 and less
than or equal to $800,000?
Approximately 11
The number of houses that cost more than $400,000 and less than or equal to $800,000
is indicated by the height of the bars at bins $600,000 and $800,000. The frequency of
the bar above bin $600,000 is approximately 9 and the frequency of the bar above bin
$800,000 is approximately 2. Therefore, approximately 9+2=11 houses cost more than
$400,000 and less than or equal to $800,000.
Approximately 15
By convention, Excel includes in a bin’s range the number represented by the bin label.
For example, the first bin (labeled $200,000) includes all houses with values less than or
equal to $200,000 and the second bin (labeled $400,000) includes all houses with
values greater than $200,000 but less than or equal to $400,000. Approximately 15
houses cost less than or equal to $400,000. The number of houses that cost more than
$400,000 and less than or equal to $800,000 is indicated by the height of the bars at
bins $600,000 and $800,000. How many houses cost more than $400,000 and less than
or equal to $800,000?
Approximately 25
By convention, Excel includes in a bin’s range the number represented by the bin label.
For example, the first bin (labeled $200,000) includes all houses with values less than or
equal to $200,000 and the second bin (labeled $400,000) includes all houses with
values greater than $200,000 but less than or equal to $400,000. Approximately 25
houses cost less than or equal to $600,000. The number of houses that cost more than
$400,000 and less than or equal to $800,000 is indicated by the height of the bars at
bins $600,000 and $800,000. How many houses cost more than $400,000 and less than
or equal to $800,000?
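
The bin convention used in these explanations (each bin includes the value on its label and excludes the label of the bin before it) can be sketched in a few lines of Python. This is only an illustration: the function name excel_style_counts and the price list are made up for the example and are not the housing data from the question.

```python
from bisect import bisect_left

def excel_style_counts(values, bin_labels):
    """Count values per bin, where each bin includes its label (upper bound).

    Bin i holds values v with bin_labels[i-1] < v <= bin_labels[i].
    One extra slot at the end counts values above the largest label.
    """
    labels = sorted(bin_labels)
    counts = [0] * (len(labels) + 1)
    for v in values:
        # bisect_left returns the first index whose label is >= v,
        # so a value equal to a label lands in that label's bin.
        counts[bisect_left(labels, v)] += 1
    return counts

# Hypothetical prices in thousands of dollars (illustrative only).
prices = [150, 200, 250, 400, 450, 600, 650, 800, 950]
bins = [200, 400, 600, 800]

counts = excel_style_counts(prices, bins)

# "More than $400 thousand and at most $800 thousand" is the sum of the
# bars labeled 600 and 800, exactly as in the answer explanation.
between_400_and_800 = counts[bins.index(600)] + counts[bins.index(800)]
print(counts, between_400_and_800)
```

Note that a house priced at exactly 400 falls in the bin labeled 400, not in the bin labeled 600, which is why the bars at 600 and 800 are the ones to add.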

Question 4 of 6
The following data set contains the heights of several members of the Boston Red Sox.
Create a histogram of the data using the bins provided in column C.
Correct!

The Input Range is B1:B11 and the Bin Range is C1:C4. You must check
the Labels in first row box since we included B1 and C1 to ensure that the
histogram’s axes are appropriately labeled.

Question 5 of 6
The following data set contains the heights of several members of the Boston Red Sox.
Create a histogram of the data using the bins provided in column C.
Correct!

The Input Range is B1:B11 and the Bin Range is C1:C6. You must check
the Labels in first row box since we included B1 and C1 to ensure that the
histogram’s axes are appropriately labeled.
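
Questions 4 and 5 bin the same heights with two different bin ranges. A rough Python sketch of how the choice of bins changes the picture follows. The heights and both sets of bin labels are invented for the example (they are not the values from the spreadsheet), and bin_counts and text_histogram are hypothetical helper names.

```python
from bisect import bisect_left

def bin_counts(values, labels):
    """Frequency per bin; each bin includes its label as the upper bound.

    labels must be sorted in increasing order.
    """
    counts = [0] * (len(labels) + 1)  # last slot: values above every label
    for v in values:
        counts[bisect_left(labels, v)] += 1
    return counts

def text_histogram(values, labels):
    """Return one text line per bin, with one '#' per data point."""
    counts = bin_counts(values, labels)
    names = [f"<= {b}" for b in labels] + [f"> {labels[-1]}"]
    return [f"{name:>6} | {'#' * c}" for name, c in zip(names, counts)]

# Made-up heights in inches, for illustration only.
heights = [68, 70, 71, 72, 72, 73, 73, 74, 74, 76]

coarse_bins = [70, 73, 76]          # three bins, as with a short Bin Range
fine_bins = [68, 70, 72, 74, 76]    # five bins, as with a longer Bin Range

coarse = bin_counts(heights, coarse_bins)
fine = bin_counts(heights, fine_bins)

for title, labels in (("Coarse bins", coarse_bins), ("Fine bins", fine_bins)):
    print(title)
    print("\n".join(text_histogram(heights, labels)))
```

Both versions count the same ten data points. Only the grouping differs, so fewer, wider bins hide detail that narrower bins reveal.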

Question 6 of 6
Below are three histograms showing the heights of several members of the Boston Red
Sox. Which do you think is most effective in showing the distribution of player heights?

