Data Analysis Visualisations in Excel Printable
This item contains selected online content. It is for use alongside, not as a replacement for the module website,
which is the primary study format and contains activities and resources that cannot be replicated in the printed
versions.
22/07/24
Contents
Introduction
Learning outcomes
1 Excel spreadsheets
1.1 Using Excel
1.2 Opening an Excel file
1.3 Adding the Data Analysis ToolPak in Excel
1.4 Decimal points and dates
1.5 Using shortcut keys in Excel
1.6 Use of Excel spreadsheets
2 Univariate data visualisation
2.1 Frequency tables
2.2 Types of frequency distribution
2.3 Histograms: a graphical visualisation of frequency tables
2.4 Frequency density
3 Bivariate data
3.1 Contingency tables
3.2 Scatter diagrams
Conclusion
Acknowledgements
Introduction
The objective of this course is to explore the ways to visualise data sets, such as
univariate and bivariate data, in Excel and to familiarise yourself with the functions used in
Excel to explore the relationship between variables. Univariate data here refers to data
that consists of one variable, and bivariate data refers to data that consists of two
variables.
Learning outcomes
After studying this course, you should be able to:
● explore the functionalities of Excel that are used for problem solving in a business
context
● demonstrate the numeracy skills required for gathering and organising data for
decision making related to a specific problem
● use graphical techniques (histograms and scatter diagrams) to provide a visual
summary of available data
● recognise data presentation and communication techniques used in a range of
traditional and electronic media
● describe the relationship between two variables (independent and dependent
variables).
1 Excel spreadsheets
Before making any decision for a business, it is usually a good idea to get a clear picture
of the data, which can provide you with an overview of the relevant information. It is good
practice, therefore, to organise and present data in such ways that make it useful for
decision making and problem solving. Microsoft Excel is used widely for data analysis in a
professional context. Researchers and analysts alike use this tool for various applications
in the real world, such as in business, medicine, academia, tax and auditing, marketing,
accounting and finance. Moreover, it is flexible enough to be used with many types of
data, irrespective of whether it is qualitative or quantitative data.
In this course, you will make extensive use of Excel spreadsheets. In this section you will
familiarise yourself with the basics of using Excel. This will enhance your analytical skills
as well as your employability skills.
This section briefly explains the various features and functions of Excel that are used by
researchers and data analysts to explore, organise and analyse data.
If you are currently studying with the OU as a fee-paying student, you have free
access to Microsoft Office 365. This includes the spreadsheet software Excel. For
this you need to go to the OU Computing Guide and scroll down to ‘Microsoft Office
365’. Click on the link. If you have not already done so, follow the
instructions to sign up and get access to your free version of the software. If you have
already installed Microsoft Excel on your laptop, then you may prefer to use your
own version. Although earlier versions of Excel are not significantly different, the
layout of some tabs and menus may vary slightly.
When you open the worksheet, you should see quarterly data of units sold by JC Electrics in
four columns. The screenshot below shows a spreadsheet with these columns labelled
as: ‘Quarter’, ‘Generators’, ‘Transformers’ and ‘Electric Motors’.
Page Down / Page Up: Move screen down or up.
Ctrl + Arrow keys: Move to the edge of a region. This is useful for navigating large blocks of data, particularly with the Ctrl + Shift + Arrow selection functionality.
Ctrl + Home: Move to the beginning of a worksheet. This is most useful if you have multiple worksheets and want to prepare a nice-looking workbook, by cycling through all worksheets pressing Ctrl + Page Down and Ctrl + Home for each sheet, which quickly puts the cursor in the upper-left corner.
Ctrl + Tab: Set focus on next workbook if multiple workbooks are open.
Ctrl + Shift + Page Up: Select the current and previous sheet in a workbook. This is useful if you have similar worksheets and want to edit cells in all of them at the same time.
Shift + Arrow key: Extend the selection by one cell. This is one of the most useful shortcuts.
Ctrl + Shift + Arrow key: Extend the selection to the last cell with content in row or column. You can do this with the Page Up/Down keys.
Table 3 Editing
Ctrl + X: Cut active selection. Think carefully about whether you want to copy or cut a selection before pasting in each situation, because cell references in copied selections will point to other cells and not the original references when pasted.
Alt + Enter: Start a new line in the same cell when entering text.
Shift + F11: Insert new worksheet.
Ctrl + Shift + Enter: Enter an array formula. Must have a range selected first. (This is shown here only for reference and will be explained later.)
F4: When editing a cell reference (e.g. ‘H5’), pressing F4 makes this reference absolute (e.g. ‘$H$5’). Pressing F4 repeatedly makes only row or column absolute.
Ctrl + S: Save the current workbook. Extremely useful for the occasional power outage or computer crash.
● In schools and universities, spreadsheets are often used to manage student data in
areas such as their grade performance, attendance or personal biography.
● In hospitals, spreadsheets are used to manage patient data, like their personal
information, details of their illness or details of the medicines they use.
● Data is often exported from more complicated computer systems, such as
manufacturing, financial or marketing systems, to allow managers or analysts to
manipulate the data once it has been created, and to carry out forecasts, simulations
and ‘what-if’ exercises.
Excel also has many formatting options (borders, colour highlighting), to allow you to draw
attention to aspects of your figures. One of the more advanced features is conditional
formatting, in which Excel automatically assigns distinct colours to cells according to their
value (e.g. red for negative, green for positive and appropriate shades in between).
Choose an answer to each of the questions. You can check your answer to each
question as you go.
Which of the following functions counts all cells?
o a) SUMIF
o b) COUNTIF
o c) AVERAGE
o d) COUNT
What is the shortcut key on the keyboard to edit a formula or text?
o a) F9
o b) F2
o c) Ctrl + F
What is the shortcut key to cancel the selection in the sheet or cell?
o a) Ctrl + Alt + Delete
o b) Esc
o c) F12
In the next section, you will learn how to present and summarise a univariate dataset in a
table and graphical form.
2 Univariate data visualisation
● tabular form
● graphical form.
While presenting and summarising data in Excel, it is important to know the features of
data. If your data is univariate – that is, the data consists of many observations for only
one variable – then you can either use a frequency table or a histogram to summarise the
data and get an idea of its features. However, if your data is bivariate – that is, the data
consists of two variables (an independent variable and a dependent variable) – and you
need to know the relationship between these two variables, then you can use either a
contingency table or scatter diagram to summarise the data and get an idea of its
structure. You will learn about bivariate data visualisation later in the course.
The next section will briefly explain frequency tables.
In the next section, you will learn about various types of frequency distribution table.
Before describing each type of frequency distribution table, you need to know the
difference between ungrouped data and grouped data.
In simple terms, ungrouped data is raw data that has not been categorised. For example,
a manager in a firm knows that 100 employees work in their firm; this is raw data because
it does not tell you how many employees work in each department (e.g. production and
sales). However, if you have raw data that is categorised, it is defined as grouped data.
For example, if this manager knows that 50 employees work in production and
50 employees work in sales, it means that the data is organised in such a way that it
provides a clear indication of how many employees work in each department.
Class interval    Class boundaries
7–8               6.5–8.5
9–10              8.5–10.5
11–12             10.5–12.5
13–14             12.5–14.5
15–16             14.5–16.5
Referring to Table 5, you can say that the real lower limit of the first class interval
(7–8) is 6.5, as all values between 6.5 and 7.5 are recorded as 7. Meanwhile, the real
upper limit of this interval is 8.5, as all values between 7.5 and 8.5 are recorded as 8.
The real class limit of a class is called a class boundary. A class boundary is obtained by
adding two successive class limits (the upper limit of one class and the lower limit of the
next) and dividing the sum by 2; for example, (8+9)/2=8.5. The value so obtained is taken
as the upper class boundary for the previous class, and the lower class boundary for the
next class.
Midpoint or class mark: this is the average of a class interval, and is obtained by dividing
the sum of the upper and lower class limits by 2. Thus, the class mark of the interval 7–8 is
7.5, as (7+8)/2=7.5.
The size or the width of a class interval: the size, or width, of a class interval is the
difference between the lower and upper class boundaries and is also referred to as the
class width, class size, or class length. If all class intervals of a frequency distribution have
equal widths, this common width is denoted by c.
Range: this is the difference between the maximum value and the minimum value of the
data set. For example, in the JC Electrics data set the maximum number of Electric
Motors sold has a value of 25, while the minimum is 14. Hence, to calculate the range, you
must calculate 25–14=11.
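The definitions above can be checked with a short Python sketch. This is only an illustration outside Excel: the function names are hypothetical, the class limits are those of Table 5, and the first and last boundaries assume the same gap between classes as everywhere else in the table.

```python
# Class limits (lower limit, upper limit) of the intervals in Table 5.
class_limits = [(7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]


def class_boundaries(limits):
    """Return (lower boundary, upper boundary) for each class interval.

    Assumption: the gap between successive classes is constant, so the
    outermost boundaries use the same half-gap as the inner ones.
    """
    gap = limits[1][0] - limits[0][1]  # e.g. 9 - 8 = 1
    return [(lower - gap / 2, upper + gap / 2) for lower, upper in limits]


def class_mark(lower, upper):
    """Midpoint of a class interval: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


boundaries = class_boundaries(class_limits)
marks = [class_mark(lower, upper) for lower, upper in class_limits]
# Class width is the difference between the upper and lower boundaries.
widths = [upper - lower for lower, upper in boundaries]
# Range of the Electric Motors example: maximum minus minimum.
data_range = 25 - 14

for (lower, upper), (lb, ub), mark in zip(class_limits, boundaries, marks):
    print(f"{lower}-{upper}: boundaries {lb}-{ub}, class mark {mark}")
print("common width c =", widths[0], "| range =", data_range)
```

The boundaries it prints should agree with the second column of Table 5.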
In this activity, you will learn how to make an ungrouped frequency table in Excel.
Once you have produced the ungrouped frequency table in Excel, you may need to
compare it with the final output by clicking ‘Reveal discussion’. This will help you to
see whether you have produced the accurate ungrouped frequency table or not.
Watch Video 1, which gives instructions on how to create a frequency table, or follow the
instructions below.
● Open the Excel file JC Electrics. This file contains the quarterly data of
number of generators sold. Make sure that the data is arranged in columns.
● Copy the data containing the number of generators sold to Column A of a new
worksheet.
● Label Column B ‘Value’ and label Column C ‘Frequency’.
● Find the minimum and maximum value in the data. In this example:
=MAX(A5:A28), which is 15, and =MIN(A5:A28), which is 7
● Calculate the range: (MAX – MIN), so 15–7=8.
● To count the number of quarters in which 7 units were sold, you need to
calculate the frequency in Column C. Type =COUNTIF(range, value). For
example, =COUNTIF(A5:A28,7)
● You should now save your file as you will return to this ungrouped frequency
table in a later activity.
Discussion
Table 6 Ungrouped frequency table
Value Frequency
7 4
8 2
9 5
10 2
11 3
12 3
13 2
14 2
15 1
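For readers who want to cross-check the COUNTIF approach outside Excel, here is a minimal Python sketch. The raw quarterly values are not listed in this text, so the data set is rebuilt from the frequencies in Table 6; the order of the values and the function name are assumptions made for illustration.

```python
from collections import Counter

# Frequencies from Table 6. The raw data is rebuilt from them, so the
# order of the 24 quarterly values is an assumption.
table_6 = {7: 4, 8: 2, 9: 5, 10: 2, 11: 3, 12: 3, 13: 2, 14: 2, 15: 1}
generators_sold = [value for value, count in table_6.items() for _ in range(count)]


def ungrouped_frequency(data):
    """Count how often each whole-number value between MIN and MAX occurs,
    which is what one COUNTIF per value does in the worksheet."""
    counts = Counter(data)
    return {value: counts[value] for value in range(min(data), max(data) + 1)}


frequencies = ungrouped_frequency(generators_sold)
data_range = max(generators_sold) - min(generators_sold)

print("Value Frequency")
for value, count in frequencies.items():
    print(f"{value:>5} {count:>9}")
print("Range:", data_range)
```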
As is mentioned above, ungrouped frequency tables are useful when you have a small set
of data and you want to easily observe the frequency of each value in the data set.
However, if you have a large data set then a grouped frequency distribution table is the
best option; you will learn about these in the next section.
In this activity, you need to produce a grouped frequency table in Excel either by
watching the screencast in Video 2 or by following the instructions given below.
Once you have produced the grouped frequency table in Excel, you can check your
answer by clicking ‘Reveal discussion’.
● Open the Excel file JC Electrics. This file contains quarterly data of the
number of generators sold. Make sure that the data is arranged in columns.
● Find the range, which is the difference between the maximum and minimum
value in the data set. You can do this either by entering the formula
=MAX(A2:A25)-MIN(A2:A25), or by simply using the results you have calculated in
Column H, as =H10-H11 (see Figure 11).
● Decide the class interval width. There are no firm rules on how to choose the
width. However, the most common method is to divide the range by the number
of class intervals:
class width = range / number of class intervals
● You can round this value to a whole number or a number that is convenient to
add (such as a multiple of 10). For example, the width calculated in the given
data set is 1.6 (a range of 8 divided by five class intervals), so it will be taken
as 2 (see Figure 12).
● This means that the first class interval has a lower limit of 7 and an upper limit of 8.
See Figure 13 below.
● The next step is to calculate the frequency. Select the range E2:E6 and enter
the FREQUENCY function as shown in Figure 14 in the discussion.
● Press CTRL + SHIFT + ENTER to submit the FREQUENCY formula above as
an array formula. If it is entered correctly, you will see the formula wrapped in
curly braces {}.
● You should now save your file as you will return to this grouped frequency table
in a later activity.
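The steps above can be sketched in Python as a cross-check. This is not how Excel implements FREQUENCY; it only mimics the documented behaviour of counting values up to each upper limit, with one extra slot for anything above the last limit. The data is rebuilt from Table 6, and the choice of five class intervals is an assumption inferred from the width of 1.6 for a range of 8.

```python
import math
from bisect import bisect_left

# Raw values rebuilt from Table 6 (the order of values is an assumption).
table_6 = {7: 4, 8: 2, 9: 5, 10: 2, 11: 3, 12: 3, 13: 2, 14: 2, 15: 1}
generators_sold = [value for value, count in table_6.items() for _ in range(count)]

# Assumption: five class intervals, so 8 / 5 = 1.6, rounded up to 2.
number_of_classes = 5
data_range = max(generators_sold) - min(generators_sold)
class_width = math.ceil(data_range / number_of_classes)

# Upper limits of the class intervals 7-8, 9-10, 11-12, 13-14, 15-16.
lowest = min(generators_sold)
upper_limits = [lowest + class_width * (i + 1) - 1 for i in range(number_of_classes)]


def frequency(data, bins):
    """Count values per bin: a value belongs to the first bin whose upper
    limit is greater than or equal to it. The final extra slot counts
    values above the last upper limit."""
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1
    return counts


grouped = frequency(generators_sold, upper_limits)
print("class width:", class_width)
print("upper limits:", upper_limits)
print("frequencies:", grouped[:-1], "| above last bin:", grouped[-1])
```

The five frequencies should add up to the sample size of 24.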
Discussion
As mentioned above, a grouped frequency table is the best option to visualise the
frequency of values in a large data set. However, if you are interested to know the
proportion of a particular value in relation to the total number of values in the data set,
then a relative frequency distribution table is the better option. In the next section, you will
learn how to produce a relative frequency table in Excel.
In this activity, you will build a relative frequency table using the ungrouped
frequency distribution table from Activity 2. Once you have made the relative
frequency distribution table in Excel, check your answer by clicking ‘Reveal
discussion’ below.
The ungrouped frequency distribution table consists of three columns. Column A is
labelled ‘Generators’, Column B is labelled ‘Value’ and Column C is labelled
‘Frequency’. Add a fourth column to the table for the relative frequencies.
To calculate the relative frequencies, you need to divide each frequency by the
sample size (frequency / sample size). You can calculate the sample size by taking
the sum of all the frequencies in Column C, which is 24.
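As a quick check of the calculation described above, the following Python sketch divides each frequency from Table 6 by the sample size. The variable names are illustrative only.

```python
# Ungrouped frequencies from Table 6.
table_6 = {7: 4, 8: 2, 9: 5, 10: 2, 11: 3, 12: 3, 13: 2, 14: 2, 15: 1}

# Sample size: the sum of all the frequencies.
sample_size = sum(table_6.values())

# Relative frequency = frequency / sample size.
relative_frequency = {value: count / sample_size for value, count in table_6.items()}

print("Value Frequency Relative frequency")
for value, count in table_6.items():
    print(f"{value:>5} {count:>9} {relative_frequency[value]:>18.3f}")
print("Total of relative frequencies:", round(sum(relative_frequency.values()), 3))
```

The relative frequencies should always add up to 1 (or 100%), which is a useful check on the worksheet column.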
Discussion
In the next section, you will learn how to make cumulative frequency distribution tables in
Excel.
In the following activity, you will learn how to build a cumulative frequency distribution
table in Excel.
In this activity, you will build a cumulative frequency distribution table using the
grouped frequency distribution table in Activity 3. Once you have built the
cumulative frequency distribution table, you can check your answer by clicking
‘Reveal discussion’ below.
Borrow the grouped frequency distribution table from Activity 3. This table consists
of five columns. Column A is labelled Generators, Column B is labelled Class
intervals, Column C is labelled Lower limit of class interval, Column D is labelled
Upper limit of class interval and Column E is labelled Frequency.
Add another column, Column F, to the table for the cumulative frequency. The
cumulative frequency is calculated by adding each frequency from a frequency
distribution table to the sum of its predecessors. The last value will always be equal
to the total for all observations, since all frequencies will already have been added to
the previous total.
Discussion
class intervals. Column D shows the values of upper limit of each class intervals.
Column E shows the frequency.
To calculate the cumulative frequency in Column F, add each frequency to the
frequencies in the previous rows. If you do it correctly, the value in the last row will
be equal to the sample size.
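The running total described above can be sketched in a few lines of Python. The grouped frequencies used here are the ones that follow from Table 6 for the class intervals 7–8, 9–10, 11–12, 13–14 and 15–16; treat them as an assumption if your worksheet uses different intervals.

```python
from itertools import accumulate

class_intervals = ["7-8", "9-10", "11-12", "13-14", "15-16"]
# Grouped frequencies derived from Table 6 for the intervals above.
frequencies = [6, 7, 6, 4, 1]

# Each cumulative value is the frequency plus the sum of its predecessors.
cumulative = list(accumulate(frequencies))

print("Interval Frequency Cumulative")
for interval, freq, cum in zip(class_intervals, frequencies, cumulative):
    print(f"{interval:>8} {freq:>9} {cum:>10}")
```

The last cumulative value equals the sample size, matching the check described in the discussion.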
In the next section, you will learn how to visualise these tables by drawing histograms in
Excel.
In this activity, you will learn how to produce a histogram in Excel by following the
instructions that are given below. Once you have produced the histogram in Excel,
you can check your answer by clicking ‘Reveal discussion’ below.
● Open the Excel file called JC Electrics, which contains the quarterly data of
number of generators sold. The third column C contains information about the
number of generators sold in each quarter of the year.
● Find the minimum and maximum value in the data set. You can obtain them
through the MIN(range) and MAX(range) functions in Excel. Type
=MAX(A5:A28) into cell L10 and =MIN(A5:A28) into cell L11. This will give the
maximum and minimum values of the data set, which are 15 and 7.
● Next, you need to specify a range of intervals (often called ‘bins’) for which to
count the number of observations that fall into each bin. The maximum value is
15 and the minimum value is 7, so you can make the class intervals 7–8, 9–10,
11–12, 13–14 and 15–16. This means that the first class has the lower
limit 7 and the upper limit 8, and so on. See Columns C and D in the
worksheet in Figure 18.
● There are many ways to calculate the width of the bin in Excel. One of the
easiest is to divide the sample size by the range, which gives 3 (i.e. 24/8=3).
In this example, however, the bin width is 2, to match the class intervals above.
● Click on ‘Data Analysis’ in the ‘Data’ ribbon. This will bring up a list of some of
the statistical analyses that you can perform in Excel.
● Select ‘Histogram’ and click ‘OK’.
● Specify the input range as A5:A28 and the bin range as D5:D9.
● Tick the box ‘Chart Output’ and specify the output location as H5, as shown in
Figure 19 below.
● Click ‘OK’. Excel will put the histogram next to your frequency table.
● To remove the space between the bars, right click a bar, click Format Data
Series, and change the Gap Width to 0%.
● To add borders, right click a bar, click Format Data Series, click the Fill & Line
icon, click Border, and select a colour.
● Now click ‘Reveal discussion’ to compare what you have made against the
answer.
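If you want a rough idea of the shape to expect before comparing with the discussion, the sketch below prints a text version of the histogram in Python. It is not the chart Excel draws; it assumes the bins 7–8 to 15–16 and the grouped frequencies that follow from Table 6, and the function name is hypothetical.

```python
class_intervals = ["7-8", "9-10", "11-12", "13-14", "15-16"]
# Grouped frequencies derived from Table 6 for the bins above.
frequencies = [6, 7, 6, 4, 1]


def text_histogram(labels, counts, symbol="#"):
    """Return one line per bin, with a bar whose length equals the frequency."""
    return [f"{label:>6} | {symbol * count} {count}" for label, count in zip(labels, counts)]


histogram_lines = text_histogram(class_intervals, frequencies)
for line in histogram_lines:
    print(line)
```

Because all bins have the same width, the bar lengths can be compared directly, as in the Excel chart.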
Discussion
The frequency density is the ratio of the frequency of a class to its width.
Frequency density is used to plot a frequency density histogram; here, you plot frequency
density instead of frequency on the y-axis. In such a histogram, the area of each bar
(rather than its height) represents the frequency of the class.
You can calculate frequency density when you have a set of grouped data that consists of
unequal widths of class intervals. For example, see the following Excel worksheet in
Figure 22, which shows information about the ages of a group of people playing football.
Figure 22 Information about the age of people playing football in an Excel file
To calculate the frequency densities:
● In Column C, find the class width of the class intervals by finding the difference of
upper and lower bounds/limits. (For example, , and so on.)
● Then, in Column D, divide the frequency of each class interval by its width.
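The two steps above can be sketched in Python as follows. The values in Figure 22 are not reproduced in this text, so the age groups below are hypothetical and serve only to show the calculation with unequal class widths.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
# These are NOT the values from Figure 22.
age_groups = [(10, 20, 8), (20, 25, 10), (25, 40, 6)]


def frequency_densities(groups):
    """Return (class width, frequency density) for each class interval,
    where frequency density = frequency / class width."""
    result = []
    for lower, upper, frequency in groups:
        width = upper - lower
        result.append((width, frequency / width))
    return result


densities = frequency_densities(age_groups)
for (lower, upper, frequency), (width, density) in zip(age_groups, densities):
    # The area of the bar (width x density) recovers the frequency.
    print(f"{lower}-{upper}: width {width}, frequency {frequency}, "
          f"density {density:.2f}, bar area {width * density:.0f}")
```

Note how the widest class has the lowest density even though its frequency is not the smallest; this is why bar area, not bar height, carries the frequency when widths differ.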
Watch the video below and note down in the box the difference between frequency
histograms and frequency density histograms.
3 Bivariate data
Bivariate data refers to an instance in which two separate variables are examined and
compared. For example, a performance manager may be interested to know how well
employees perform their work, that is, to measure the efficiency of the employees. In this
example, the performance manager may examine two variables: the number of tasks they
complete and the quality of the tasks.
Bivariate data is collected to explore the relationship between two variables and then use
this relationship to inform future decisions. One of the main aims of the researcher is to
find out whether changes in one variable may be caused by changes in another variable.
This type of research involves two basic types of variables: independent variables and
dependent variables.
Independent variables: an independent variable is one that stands alone and is not
changed by the other variable you are trying to measure. The researcher changes the
independent variable to see the effect it will have on the dependent variable.
Dependent variables: a dependent variable is the one that changes because of
independent variable manipulation. It is the outcome you are interested in measuring,
and it ‘depends’ on your independent variable. In statistics, dependent variables are also
called response variables (as they respond to a change in another variable).
For example, say a researcher is interested to know whether mature students’
performance in a maths class changes based on the time of the class. To answer this
question, the researcher measures mature students’ performance in a morning class and
an evening class. The study finds that mature students perform better in the evening class
than in the morning class.
What are the independent and dependent variables in the example above? The
independent variable is the time of the class, and the dependent variable is mature
students’ performance in maths, as it might change in relation to the independent variable.
In the next activity, you will expand your knowledge of bivariate data.
Watch the following video and make notes on bivariate data in the free response
box below.
Bivariate data can be visualised using contingency tables and scatter diagrams. In the
next section you will learn about contingency tables.
Firm  Sector      Number of employees
1     Technology  <50
2     Food        50+
3     Technology  <50
4     Food        <50
5     Food        <50
6     Food        50+
7     Technology  <50
8     Technology  <50
9     Food        50+
10    Food        50+
See the following cross table (Table 8), which summarises the information of the sample
data. It counts the number of firms for each combination of sector and number of
employees.
Sector      <50  50+  Total
Technology  4    0    4
Food        2    4    6
Total       6    4    10
The cross table shows that there are two firms in the food manufacturing sector that have
fewer than 50 employees. However, the 50+ cell shows that there are four firms in the food
manufacturing sector that have 50 or more employees. The total number of food
manufacturing firms is six, which is 60% of the grand total.
Contingency tables vary in size and type because the size of the contingency table
depends on the sample size and number of observations.
There is no formula to draw a contingency table in Excel. However, analysts use a
PivotTable to build contingency tables. A PivotTable is considered a powerful statistical
tool to summarise bivariate and multivariate data sets in an Excel spreadsheet or
database table and obtain the desired report. This tool does not actually change the
spreadsheet or database itself; it simply pivots or turns the data to view it from different
perspectives. Researchers and analysts use PivotTables especially when they have large
amounts of data that would be time consuming to calculate by hand. A PivotTable can
perform a few data processing functions such as identifying sums, averages, ranges or
outliers. It then arranges this information in a simple and meaningful way that draws
attention to key values. If you would like to experiment with PivotTables, go to the ‘Insert’
ribbon in Excel and select ‘PivotTable’.
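The counting that a PivotTable performs for a contingency table can be sketched in Python. This is an illustration only, not a description of how Excel works internally. The ten firms are those of the sample data, with firm 9 counted under Food, which is what the totals in Table 8 imply; the variable names are assumptions.

```python
from collections import Counter

# Sample data: (firm, sector, number of employees).
# Firm 9 is counted under Food, as implied by the totals in Table 8.
firms = [
    (1, "Technology", "<50"), (2, "Food", "50+"), (3, "Technology", "<50"),
    (4, "Food", "<50"), (5, "Food", "<50"), (6, "Food", "50+"),
    (7, "Technology", "<50"), (8, "Technology", "<50"),
    (9, "Food", "50+"), (10, "Food", "50+"),
]

sectors = ["Technology", "Food"]
sizes = ["<50", "50+"]

# Count the firms for each combination of sector and size.
cell_counts = Counter((sector, size) for _, sector, size in firms)

# Build the table row by row, adding row totals and a final column-total row.
contingency = {}
for sector in sectors:
    row = [cell_counts[(sector, size)] for size in sizes]
    contingency[sector] = row + [sum(row)]
column_totals = [sum(contingency[sector][i] for sector in sectors) for i in range(len(sizes))]
contingency["Total"] = column_totals + [sum(column_totals)]

print(f"{'Sector':<12}{'<50':>5}{'50+':>5}{'Total':>7}")
for label, (small, large, total) in contingency.items():
    print(f"{label:<12}{small:>5}{large:>5}{total:>7}")
```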
variable) and the other variable on the y-axis (the dependent variable). You can then plot
the corresponding point on the diagram.
In the next activity, you will produce a scatter diagram in Excel either by following the
video or the instructions provided.
The screencast in Video 5 gives you instructions on how to draw scatter plots in
Excel.
Look at the following example, which shows a data set relating to the temperature
on several days in June, and the number of Pepsi drinks sold in a small shop.
Temperature (X)  12 14 15 17 22 13 20 23
Pepsi (Y)        12 16 16 19 32 10 24 40
● On the Insert tab, in the Charts group, click the Scatter symbol.
● This will open a drop-down menu showing various types of scatter plots. The
standard type is the one with unconnected dots in the top left. Click the Scatter
symbol to insert this chart.
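To see how the pairs of values become points, the following Python sketch prints a rough text version of the scatter diagram for the temperature and Pepsi data. It only approximates the Excel chart: the grid size and the function name are arbitrary choices made for illustration.

```python
temperature = [12, 14, 15, 17, 22, 13, 20, 23]   # X, independent variable
pepsi_sold = [12, 16, 16, 19, 32, 10, 24, 40]    # Y, dependent variable

# Each observation becomes one (x, y) point on the diagram.
points = list(zip(temperature, pepsi_sold))


def text_scatter(points, width=24, height=10):
    """Place a '*' for every point on a width x height character grid,
    with x increasing to the right and y increasing upwards."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is the top of the plot
    return ["".join(line) for line in grid]


rows = text_scatter(points)
for line in rows:
    print("|" + line)
print("+" + "-" * len(rows[0]))
```

The points drift upwards from left to right, suggesting that more drinks are sold on warmer days.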
Discussion
Conclusion
In this course, you have started to familiarise yourself with the spreadsheet software
Excel, which is widely used in workplaces, and useful in many different fields and
contexts: for example, in business, medicine, marketing, tax and auditing, accounting and
finance.
You have also studied the basics of data analysis. The focus here was on the several
ways to visualise and summarise data using tools available in Microsoft Excel, such as
frequency tables, histograms, and scatter diagrams or plots. The main objective of data
analysis and statistical modelling is to help make more evidence-based decisions. The
various data visualisation tools studied in this course are only the first step toward starting
the decision-making process using data.
The next step could be to study descriptive statistics, which gets you closer to a
comprehensive analysis of the data. You could then become more confident when
examining and summarising data and using Excel tools such as measures of location and
measures of dispersion to numerically analyse data.
A second OpenLearn course on data analysis, Data analysis: hypothesis testing, is now
also available should you wish to take your studies further.
This OpenLearn course is an adapted extract from the Open University course
B126 Business data analytics and decision making.
Acknowledgements
This free course was written by Henry Lahr.
Except for third party materials and otherwise stated (see terms and conditions), this
content is made available under a
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 Licence.
The material acknowledged below is Proprietary and used under licence (not subject to
Creative Commons Licence). Grateful acknowledgement is made to the following sources
for permission to reproduce material in this free course:
Course image: ismagilov / iStock / Getty Images Plus
Figure 1: jadamprostore / iStock / Getty Images Plus
Figures 2–28: © Excel
Every effort has been made to contact copyright owners. If any have been inadvertently
overlooked, the publishers will be pleased to make the necessary arrangements at the
first opportunity.