
Data Analysis: Learning Outcomes

Uploaded by

Thuỳ Giang
Copyright
© All Rights Reserved

Chapter 13

Data analysis

Introduction
Learning outcomes
Syllabus links
Assessment context
Chapter study guidance
Learning topics
1 Use of data in business
2 Sources of data and information
3 Qualities of good information
4 Data analysis
5 Spreadsheets
6 Potential problems with data
7 Presentation of information
8 Big data
9 Data science
Summary
Further question practice
Technical references
Self-test questions
Answers to Interactive questions
Answers to Self-test questions

Introduction

Learning outcomes
• Specify the purpose of data, the different types and sources of data, the importance of data
comparability and the role of professional scepticism in relation to data collection, analysis and
visualisation
• Specify principles in relation to the collection and analysis of data, including populations, surveys,
presentation of simple frequency distributions, basic sampling and data ethics
• Identify types of error in data and types of data bias, including their causes and effects
• Identify issues in relation to the use of spreadsheets and the visualisation and interpretation of
data in graphs, charts etc
• Identify the characteristics of big data
• Specify uses of data science and data analytics by organisations
Specific syllabus references are 6a, 6b, 6c, 6d, 6e, 6f

Syllabus links
Knowledge of data analysis is developed throughout the ACA scheme. At the certificate level the
Management Information module also requires an understanding of issues around data bias. At the
professional level, the Business Strategy and Technology module requires the ability to use data to
evaluate strategic decisions and risks. At the advanced level the Audit and Assurance and Corporate
Reporting modules require students to analyse datasets that include thousands of transactions.

Assessment context
Questions on data analysis will focus on why data is used and potential problems with data analysis.
Questions will be set in multiple choice format, either as a straight test of knowledge or in a
scenario.

Chapter study guidance
Use this schedule and your study timetable to plan the dates on which you will complete your study
of this chapter.

Topic 1: Use of data in business
Practical significance: We live in the information age, where many of the largest companies owe their
success to their ability to make the best use of information and data.
Study approach: This section is introductory in nature. Read through it and be aware of the
difference between data and information, and the main activities that data is used for.
Stop and think: Think about the planning, controlling and decision-making activities that you
perform in your personal life. What information do you use when you perform these?
Exam approach: Questions are likely to focus on the meaning of planning, control and decision
making, and identifying which of these activities a particular piece of data is most useful for.

Topic 2: Sources of data and information
Practical significance: As an accountant working in a business, you may be asked to help identify
sources of data. As this section shows, there are many new sources, particularly thanks to the
‘internet of things’.
Study approach: Read fairly quickly through internal and external data sources as you may be aware
of many of these. Make sure you understand the concept of the internet of things.
Exam approach: Questions on sources of data are likely to focus on which particular source would be
best for a particular type of data.

Topic 3: Qualities of good information
Practical significance: Providing information is the core task of the accountant. Knowing what makes
good quality information is something that you will find useful throughout your career.
Study approach: Make sure you understand and appreciate the components of the ACCURATE mnemonic
and learn this.
Stop and think: Think about a piece of information you used recently – for example instructions on
how to find something. Analyse it using the ACCURATE acronym.
Exam approach: Exam questions are likely to focus on which particular facets of good information are
present in, or missing from, a particular piece of information.
Interactive questions: IQ 1: Good information

Topic 4: Data analysis
Practical significance: As an accountant, you would need to understand the results of data analysis
and the potential pitfalls, without necessarily having detailed statistical knowledge. This section
introduces the basic statistical methods.
Study approach: The examiner expects you to understand the purpose of the statistical techniques
used. You would not be expected to perform calculations for most of these items. Take this section
slowly, ensuring you understand the concepts covered.
Stop and think: Do any of your outside interests include reading analysis of any data? Perhaps you
analyse the performance of your favourite sports team? How did you learn to understand the analysis
given?
Exam approach: Exam questions will focus on your ability to interpret analysis. Ensure you know what
the various analyses mean, particularly the mean, standard deviation, and regression. Can you
interpret the different values of the correlation coefficient? Do you know what type I and type II
errors are?

Topic 5: Spreadsheets
Study approach: Read through the first section which introduces the concept of a spreadsheet. If you
have used a spreadsheet application, you will know the information in this section. Spend more time
on sub sections 2 and 3 which deal with the use of spreadsheets in finance and potential problems
with their use. Understand the principles in the ICAEW list of principles.
Exam approach: Exam questions would not require you to use spreadsheets. They are likely to test
your understanding of their use and the potential problems and solutions.

Topic 6: Potential problems with data
Practical significance: An important quality of a professional accountant is professional
scepticism. When looking at data analysis an accountant needs to know what questions to ask to
ensure that the data is reliable. This section looks at ways in which the reliability of data may be
compromised.
Study approach: Ensure you know the meaning of comparability and the different types of data bias.
Learn the meaning of the two types of error (type 1 and type 2) from hypothesis testing.
Exam approach: Exam questions will test whether you know the meaning of the terms. They are likely
to be scenario based (eg, providing you with an example of some data that is compromised, and asking
you to identify what type of bias or error exists).

Topic 7: Presentation of information
Practical significance: Being able to present information well is a skill that all professional
staff will find useful.
Study approach: Review the different types of chart shown and ensure that you understand them. You
would not need to produce such diagrams in the BTF exam. Be aware too of the principles of good
presentation.
Exam approach: Exam questions will focus on your ability to interpret charts, or on your ability to
choose which type of diagram would be most appropriate for a particular situation.

Topics 8-9: Big data and data science
Practical significance: Big data is an area that has been growing in importance over the last 10
years. Initially it was only large companies that had the resources to analyse big data but
developments in technology have meant that sophisticated data analysis is open to most business
organisations.
Study approach: Learn the definitions of big data and the 4Vs. Ensure that you understand the
factors that have led to the growth of big data. Make sure you know the meaning of data science and
data analytics. Appreciate how big data and data analytics can create value for companies. Ensure
you know the different types of big data (structured/unstructured) and source (processed data, open
data and so on).
Exam approach: Exam questions could test the meaning of concepts covered in these two sections.

Once you have worked through this guidance you are ready to attempt the further question practice
included at the end of this chapter.

1 Use of data in business

Section overview
• Data refers to unprocessed facts; information is data that has been processed and is therefore
useful.
• Information can help organisations to plan, control and make decisions more effectively.

1.1 What is data and information?
These two terms are often used interchangeably, and it is useful at this point to make sure you are
clear about the distinction between them.

Definitions
Data: Distinct pieces of information, which can exist in a variety of forms – as numbers or text on
pieces of paper, as bits or bytes stored in electronic memory, or as facts stored in a person’s mind.
Information: The output of whatever system is used to process data or to organise it in a useful way.
This may be a computer system, turning single pieces of data into a report, for instance.

1.2 Types of data
Quantitative data is data that concerns quantitative variables such as measurements or counts. It is
expressed as numeric values (eg, the number of units sold per day, or the height of individuals).
Quantitative data lends itself to statistical and other analysis.
Qualitative data is data about variables that are derived from qualitative attributes (eg, gender, or
nationality), which cannot be expressed in numerical terms. As discussed in the chapter Introduction
to risk management, it is more difficult to analyse such data using statistical analysis. It would not
be possible, for example, to talk about the mean favourite colour of a group of individuals. The mode
can be used as a measure of central tendency (eg, the most popular colour is blue). If the colours were
ordered according to the rainbow and each one was ascribed a number, we could identify the
median as well as the mode and that would be meaningful.
Discrete data can only take exact values (usually whole numbers), and usually refers to items that
can be counted (eg, the number of people in a group).
Continuous variables can take any value within a range (for example the height of individuals). If we
were collecting data about the height of individuals, and only specified whole numbers (170 cm, 171
cm etc) we would exclude the heights of people who fall between these values (eg, 170.4 cm). It
would not be practical to precisely list all possible heights, as infinitesimally small differences in
height between individuals would need to be specified. With continuous variables, we therefore
specify ranges, and analyse the values based on which range they fall in (eg, 170–171 cm, 171–172
cm etc).

1.3 Uses of data
The amount of data that is created by society is increasing at an exponential rate. According to
Bernard Marr, every two days we create as much data as we did from the beginning of time until
2003. Some of the most successful companies in the world owe their success to their ability to use
data to give themselves a competitive advantage. Data can be analysed and used to inform the
management of businesses. Essentially, data can be used for the following activities:
• planning
• decision making
• control
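The treatment of qualitative and continuous data described in section 1.2 can be illustrated with a
short Python sketch. It is an illustration only: the survey answers, the heights, the helper name
height_range and the convention that a value on a boundary (eg, 171.0 cm) falls into the upper range
are all assumptions made for the example.

```python
from collections import Counter
import statistics

# Qualitative data: hypothetical answers to "what is your favourite colour?".
# A mean is meaningless here, but the mode (most frequent value) is not.
colours = ["blue", "red", "blue", "green", "blue", "red"]
most_popular = statistics.mode(colours)
print(most_popular)  # blue

# Continuous data: hypothetical heights in cm, analysed by 1 cm range
# rather than by listing every possible exact value.
heights_cm = [170.4, 170.9, 171.2, 171.8, 172.5]


def height_range(height):
    """Return the 1 cm range a height falls in (assumed convention: lower bound inclusive)."""
    lower = int(height)
    return f"{lower}-{lower + 1} cm"


frequency = Counter(height_range(h) for h in heights_cm)
print(dict(frequency))  # {'170-171 cm': 2, '171-172 cm': 2, '172-173 cm': 1}
```

The frequency count is a simple frequency distribution: it shows how many observations fall in each
range without needing every exact height.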

1.3.1 Planning
Businesses need to plan their activities to ensure that they have sufficient resources (labour,
materials and production capacity) to meet their anticipated demand. Analysis of data can help

managers make more accurate forecasts of what will happen and therefore better plans (eg,
supermarkets need to plan which products to stock, and in what quantities, based on anticipated
customer demand).

1.3.2 Decision making
Organisational managers constantly make decisions, from short-term operational decisions to
longer-term strategic decisions. At the strategic level, businesses need to plan new products or
services, and therefore need information about what customers want (or what they will want) in the
future. Often decisions may be mutually exclusive, meaning that managers have to choose between
two or more potential actions. If such decisions are supported by good data, then better decisions
will be made, leading to organisations achieving their objectives (eg, profits) more effectively.
When making decisions about risk, having more accurate information about the expected returns
and standard deviations of different projects would enable managers to better manage risk (eg,
selecting the project with the lower standard deviation where two projects have the same return,
assuming the two projects are mutually exclusive so they cannot both be chosen). Refer back to the
chapter Introduction to risk management for detail about the use of information to support decision
making relating to risk.

1.3.3 Control
Control involves ensuring that an organisation is achieving its objectives and taking action to
remedy situations where the organisation is deviating from its plans. Traditionally, financial
information was widely used for this purpose, such as comparing actual profits against budgets.
Thanks to technology, much more quantitative, non-financial data is now available that enables
organisations to monitor their operations in greater detail, in real time (eg, the use of sensors in
machines that can alert the users if the machine is starting to malfunction).
Note: Refer back to the chapter The finance function and financial information for more detail about
the uses of financial information. See section 8 below for the uses of big data.

2 Sources of data and information

Section overview
• Useful data/information comes from both inside and outside the organisation, from a variety of
sources.
• The internet of things is an important source of data. Smart devices, software, sensors and
security devices are all part of the internet of things.

2.1 Internal data sources
Capturing data/information from inside the organisation involves the following:
• a system for collecting or measuring transactions data (for example, sales, purchases and
inventory), which sets out procedures for what data is collected, how frequently, by whom, and by
what methods, and how it is processed, filed or communicated
• informal communication of information between managers and staff (for example, by word-of-mouth
or at meetings)
• communication between managers
Inside the business, data/information comes from the following internal sources:
• The accounting records: Computerised accounting systems hold information that may be of great
value outside the finance function, for example, sales information for the marketing function. To
maintain the integrity of its accounting records, a business operates controls over transactions.
These also give rise to valuable information. An inventory control system, for example, will include
details of purchase orders, goods received notes, goods returned notes and so on, which can be
analysed to provide management information about speed of delivery, say, or the quality of
supplies.
• Human resources and payroll records, holding information on people, their skills and aspirations,
and so on.
• Machine logs and computer systems in production/operations containing information about
machine capacity, fuel consumption, movement of people, materials used, and work in progress,
set up times, maintenance requirements and so on.
• Procurement data systems (for example, Electronic Data Interchange (EDI) systems which share
data between organisations in the supply chain) hold information on the organisation’s purchases
of raw materials and other goods and services that they buy in.
• Timesheets in service businesses, notably accountants and solicitors, containing data on the time
spent on various activities, both to justify fees to clients and to assess the efficiency and
profitability of operations.
• Staff: Information may be obtained either informally in the course of day-to-day business or
through meetings, interviews or questionnaires.

2.2 External data sources
Capturing data/information from outside the business may be formal or informal.
Formal collection of data from outside sources includes the following:
• a business’s tax specialists will gather information about changes in tax law and how this will
affect the business
• obtaining information about any new legislation on health and safety at work, or employment
regulations, must be the responsibility of a particular person who must then pass on the
information to managers affected by it
• research and development (R&D) work often relies on information about other R&D work being
done by another business or by government institutions
• marketing managers need to know about the opinions and buying attitudes of potential
customers. To obtain this information, they carry out marketing research exercises
Informal gathering of information from the environment goes on all the time, consciously or
unconsciously, because the employees of an organisation learn what is going on in the world
around them – from the internet, social media, newspapers, television reports, meetings with
business associates or the trade press.
A business’s files (paper and computerised) include information from external sources such as
invoices, letters, emails, advertisements and so on received from customers and suppliers.
Sometimes additional external information is needed, which requires an active search outside the
business. The following sources may be identified, usually accessed via the internet:
• the government
• advice or information bureaux, such as Reuters or Bloomberg
• data sharing portals (web-based access to shared data)
• consultancies of all sorts
• newspaper and magazine publishers
• specific reference works which are used in a particular line of work
• libraries and information services
• the systems of other businesses

2.3 The internet of things
The internet of things is increasingly becoming an important source of data, with smart technology at
the forefront. Examples of technology that form the internet of things include the following:
• smart devices (gas and electricity meters, smartphones, radio-frequency identification (RFID) tags
and fitness trackers)
• software (such as applications to control the smart devices)
• sensors (such as motorway traffic sensors and ‘black boxes’ used in cars for insurance purposes)
• security devices (such as CCTV and number plate recognition technology) that can be tracked via
the internet
Smart phones, which can provide data about the location of the owners and their movements.

3 Qualities of good information

Section overview
• Information should be ACCURATE and complete. It should have a benefit that is in proportion to
its cost, and it should be targeted at its user. It should be relevant and from an authoritative
source. It should be provided at the time when it is needed, and it should be easy to use.
• What makes information valuable is its source, its ease of assimilation, its accessibility and its
relevance.
• The cost of obtaining information should be less than the benefits it brings.

Information of whatever type is of good quality if it has eight key characteristics, which are easiest to
remember if you use the mnemonic ACCURATE.
Note that the second A here stands for ‘Authoritative’, an increasingly important concern given the
huge proliferation of information sources available online today.

Quality          Example
Accurate         Figures should add up, the degree of rounding should be appropriate, there
                 should be no typographical errors, items should be allocated to the correct
                 category, and assumptions should be stated for uncertain information (no
                 spurious accuracy).
Complete         Information should include everything that it needs to include, for example
                 external data if relevant, or comparative information.
Cost-beneficial  It should not cost more to obtain the information than the benefit derived from
                 having it. Providers of information should be given efficient means of collecting
                 and analysing it. Presentation should be such that users do not waste time
                 working out what it means.
User-targeted    The needs of the user should be borne in mind, for instance senior managers
                 may require summaries, whereas junior ones may require detail.
Relevant         Information that is not needed for a decision should be omitted, no matter how
                 ‘interesting’ it may be.
Authoritative    The source of the information should be a reliable one (not, for instance, ‘Joe
                 Bloggs’ Predictions Page’ on the internet, unless Joe Bloggs is known to be a
                 reliable source for that type of information).
Timely           The information should be available when it is needed.
Easy to use      Information should be clearly presented, not excessively long, and sent using
                 the right medium and communication channel (email, phone, hard-copy report
                 etc).

Professional skills focus: Applying judgement
Exam questions may describe some information and ask you to identify which of the qualities of
good information is missing. It is therefore important to know the meaning of these terms.

Interactive question 1: Good information
Managers often complain that they are weighed down by information which they struggle to make
sense of and to use. Which of the ACCURATE qualities of good information are most often ignored in
information given to managers?
See Answer at the end of this chapter.

4 Data analysis

Section overview
This section provides an overview of data analysis, explaining the stages of data analysis, before
describing some data analysis techniques:
• Descriptive statistics, which were covered in the chapter Introduction to risk management
• Inferential statistics, which aims to make inferences about a population of data by taking a sample
from it. Inferential statistics can be either:
– exploratory data analysis, which aims to identify relationships between different variables in a
set of data; or
– confirmatory data analysis, which uses a sample of data to infer information about the population
from which the sample was drawn.

4.1 What is data analysis?
Data analysis involves obtaining useful information from data to give insights for management. A
well-planned data analysis programme involves the following stages:
(a) Identifying the information needs of the business. This will depend on the business’s objectives
and strategies (eg, an online retailer might want to know what products its target market wants).
(b) Collecting the data. This involves identifying the sources of the data. Data may already be
available within the business’s systems, or it may be necessary to invest in new sources. (See
section 2 above for sources of data.)
(c) Analysing the data – using statistical techniques to convert the data into the information
required.
(d) Presenting the information. Decisions need to be made about how best to communicate the
information to the managers of the business. Managers may not be statisticians, so it is important
that the information is presented in such a way as to be easily understood. Information could be
presented using graphics or visualisations for example, or using tables. (Presentation of
information is discussed in section 7 below.)
(e) Using the information to make better decisions, improve the performance of the business or
make better plans. (The purpose of data and information was discussed in section 1 above.)
The process above is likely to be iterative: rather than the stages occurring in strict sequence,
earlier stages may be repeated based on feedback obtained at later stages (eg, insights obtained
during the data analysis stage may lead analysts to revisit the second stage and identify additional
sources of data).
Data analysis may be based on the whole population or on a sample.
Definition
Population: The entire set of data from which a sample is selected for analysis (eg, sales
to all customers in the last year).

4.2 Analysing the data
A number of types of analysis can be used. The broad categories of statistical analysis commonly
used are as follows:

Definitions
Descriptive statistics: The use of statistics that summarise the data in a data set. Examples of
descriptive statistics are the measures of central tendency (mean, median and mode) and measures
of dispersion (range, variance and standard deviation) discussed in the chapter Introduction to risk
management.
Inferential statistics: Statistical methods that deduce the characteristics of a bigger population from
a small but representative sample.
Exploratory data analysis: Identifying relationships in a set of data – for example, patterns that the
business was not aware of that could be useful (eg, finding out that customers with particular
characteristics are more likely to churn, that is, to switch to other providers of a service).
Exploratory data analysis may use regression and correlation, which are covered in the Management
Information module.
Confirmatory data analysis: Using statistical methods to confirm a pre-determined hypothesis (eg, a
factory believes that on average, 5% of its output is faulty, and wants to investigate to see if this is
correct). Confirmatory data analysis is discussed in more detail in the section on sampling below.

4.3 Sampling

Definition
Sampling: Analysing a sample of data from a population, and based on this, making inferences
about the population.

Sometimes, the whole population may be used in data analysis (eg, audit software tools enable all
sales invoices in a particular year to be analysed). Developments in information technology and the
growth of big data over the past 20 years have made it more feasible to analyse whole populations of
data.
More often, information about the whole population is not available. It is often unfeasible or even
impossible to find out information about every item in a population. In order to perform statistical
analysis therefore, samples are taken, and inferences are made about the population based on
analysis of the sample (eg, in order to find out the mean salary of ICAEW members, a sample of
ICAEW members could be taken. The mean salary of the members of the sample would be used as
an estimate for the mean salary of all ICAEW members).

4.3.1 Representative samples
When making inferences, it is probable that the statistics obtained from the sample will not be
exactly the same as those of the population, so it has to be recognised that they are an estimate. In
order to make estimates more reliable, statisticians try to ensure that as far as possible, their samples
are representative of the population from which they are taken.

Definition
Representative sample: A sample that reflects the characteristics of the population from which it is
drawn. If a sample is representative of the population, sample results can be analysed and valid
inferences can be made about the population as a whole.

If a sample does not reflect the characteristics of the population as a whole, then incorrect inferences
could be made. For example, if we are trying to determine the average income of families in a
particular region, we might take a sample of families from that region and calculate their average
income. We would use this as our estimate of the average income of all families in the region. If our
sample was not representative however, our estimate would be incorrect. If our sample of families all
lived in a particularly affluent area, for example, then it is likely that the average income of the sample
would be higher than the average income of families of the region. Our inference about the
population based on this sample would be incorrect.
In order to make it more likely that a sample is representative, the following factors need to be
considered:
Method of sample selection: Our method of selecting items from a population to form a sample
should be such that all members of the population have an equal chance of being selected. In
practice it may often be difficult to achieve this as there may be no listing that contains all items from
a particular population. Statisticians refer to a sampling frame, which is the list of items that the
sample will be drawn from. Ideally, the sampling frame should include all items in the population.
Sample size: The general principle is that the larger the sample size is, the more likely it is to be
representative. A rule of thumb is that a sample should contain at least 30 items. As samples
become larger, they tend to reflect more accurately the characteristics of the population. Using larger
samples also allows for filtering of results – for example, analysing the sample into male and female
participants, and making inferences about these.

4.3.2 Sampling methods
Sampling methods that aim to give every item in a population the chance of being selected include:
Simple random sampling: A sample should be drawn from the whole population. Every item in the
population is assigned a number. Random number generators are then used to select a sample of
numbers, and the items with the corresponding number are selected for the sample. The
disadvantage of random sampling is that in spite of the sample being selected randomly, it may not
be representative of the population.
Systematic sampling: All items in the population are assigned a number. A random number
generator is then used to select the first item to be chosen. After this, every nth item is chosen (eg,
every 10th item). Systematic sampling could provide a sample that is more representative than a
simple random sample. For example, if the population is numbered 1 to 100, it is possible that a
random sample of 10 items might be heavily biased – eg, too many numbers below 50. In systematic
sampling, this bias will not be present.
Stratified sampling: The population is divided into sub populations (strata) based on a particular
characteristic. The number of items in the sample from each stratum is determined based on the
relative size of each stratum. The sample is then taken by randomly selecting the appropriate number
of items from each stratum. The advantage of stratified sampling is that it ensures that all strata are
represented in the sample.
For example, a population of trade receivables contains 200 customer balances. In the population of
200 customers, 40 customers (20%) have a balance greater than £1,000, 140 customers (70%) have a
balance between £0 and £1,000, and 20 customers (10%) have a credit balance. The auditors wish to
select a stratified sample of 30 items in total for testing. They will randomly select 6 (20% × 30)
customer balances from the stratum with balances greater than £1,000, 21 (70% × 30) customer
balances between £0 and £1,000, and 3 (10% × 30) credit balances.

4.3.3 Practical considerations in sample selection
While statisticians would clearly like to use sampling methods that provide the most representative
samples, practical considerations also need to be taken into account when determining sampling
methods. In particular, the costs of a particular sampling method need to be compared with the
benefits. The costs of using a sampling method that may give a less representative sample may be
considerably less than those of using a sampling method where all items in the population could be
chosen. For example, conducting a customer survey on a social network platform would involve only
including members of that social network in the sample and would ignore the opinions of customers
that do not use the platform, but the costs could be considerably lower than contacting customers by
traditional methods such as post.
4.3.4 Professional scepticism
You may recall from the chapter The accountancy profession that accountants should apply
professional scepticism to their work. This involves assessing information, estimates and
explanations critically, with a questioning mind, and being alert to possible misstatements due to
error and fraud.
When dealing with the results of statistical analysis, professional scepticism would imply knowing the
right questions to ask to assess the reliability of the analysis. If inferences have been drawn from
samples, it would be pertinent to ask about the size of the sample used, the sampling method used,
the age of the data (older data could be out of date and therefore not useful) and the questions that
were asked.

4.4 Surveys

Surveys are widely used by organisations to obtain useful information for decision making (eg,
market research surveys can help businesses to decide what products to develop). This section deals
with good practice in designing surveys.

4.4.1 Writing good survey questions
(a) Use simple, short, clear questions to ensure that the reader understands the question. There
should be no ambiguity or scope for misinterpretation of the question. Long, waffly questions
can be off-putting to respondents.
(b) Ask questions that require specific answers rather than judgement (eg, rather than asking ‘do
you eat in restaurants often?’ ask ‘how many times per month do you eat in restaurants?’). This is
because different respondents will have different opinions on what often means.
(c) Avoid broader questions such as, ‘do you like our product?’ Try to identify the facets that make
the product likeable and ask more detailed questions. For example, ‘do you like the design of
the product?’, ‘do you find the product easy to use?’, ‘does the product meet your needs?’
(d) Using scales can provide more information than requiring yes/no answers. Respondents could
be asked to place a tick in one column – for example where the columns are ‘strongly agree,
agree, neither agree nor disagree, disagree, strongly disagree’ or they could be asked to
evaluate something on a scale of 1 to 5. Avoid using very large scales (such as on a scale of 1 to
20) as most participants will choose either the two extremes, or the centre, so the remaining
values will be redundant.
(e) Avoid leading questions. A leading question is one where the phrasing of the question might
bias the answer. For example, ‘do you avoid buying products that come in single use plastic?’
would bias the reader as the question clearly suggests that they should avoid buying such
products. Instead, the question could be rephrased as, ‘how often do you buy products that
come in single use plastic?’
(f) Avoid ‘double-barrelled’ questions – a double-barrelled question is a question that is actually
two questions in one. For example, ‘do you enjoy maintaining your home and garden?’ would
be difficult to answer for a person who enjoys maintaining their garden but not their home.
(g) If trying to get opinions on new products, qualitative data can be as important as quantitative.
Rather than just knowing that 45% of people surveyed like the new product, it would be as
important to know what those 45% liked about the product (and what the other 55% did not
like). The use of focus groups can be more useful than surveys in obtaining such information. A
focus group is an informal meeting where a small group of potential customers are shown a
proposed new product or service and asked to discuss it.

4.4.2 Survey length
If surveys are too long, the respondents will suffer ‘survey fatigue’ at which point they will either stop
the survey or provide random answers to the remaining questions. It is important therefore to try to
keep the surveys short. Prioritise the questions and ask only the important ones.

4.4.3 Selecting respondents for surveys
It is important to define the target population, whose opinion the survey is trying to ascertain.
If we are conducting a survey to discover what teenagers in the UK think of a new product, our target
population is all of the teenagers in the UK.
The survey respondents are a sample of that population. The respondents of a survey should be
representative of the population as a whole. What makes a sample representative is
discussed in more detail earlier in this chapter. The following factors need to be considered when
identifying people who will be invited to participate in the sample:
1. The number of respondents should be large enough to form a representative sample. Generally, it
is better to have more respondents than fewer.
2. The respondents should belong to the target population. If the survey is trying to identify the
opinions of teenagers in the UK, there is no point in asking adults to complete the survey, for
example.

4.4.4 Problems of low response rate
When invitations are sent to people to participate in surveys, the response rates are generally low. A
response rate of 50% is considered to be excellent. A low response rate gives rise to a risk of
‘self-selection bias’, which means that those who respond to the survey are not a representative sample of
the population that the survey is aiming to find out about. It may be, for example, that people with
busy jobs and high workloads would not respond to the survey, while people with more spare time
would. This would mean that the opinions of people with busy jobs would be excluded from the
results of the survey, leading to incorrect conclusions.
Types of bias, such as self-selection bias, are discussed in more detail in the section Potential
problems with data, later in this chapter.

5 Spreadsheets

Section overview

Spreadsheet applications, such as Microsoft Excel, Google Sheets and Apple Numbers, enable the
storage and analysis of data.
A spreadsheet application provides the user with an array made up of cells. Numerical values, text or
formulae can be entered into the cells.
Within finance, spreadsheets are used for many purposes, including budgeting and forecasting and
‘what-if’ and scenario analysis. Smaller businesses may maintain their accounting records in
spreadsheets.
The use of spreadsheets is not without risks. Risks include the risk of errors in the spreadsheet, lack
of consistency about the way spreadsheets are designed and the styles used, poor design, lack of
documentation of the spreadsheet design, and loss of data.
The ICAEW has published a list of 20 principles of good spreadsheet practice, which aim to mitigate
some of these risks.

5.1 Introduction to spreadsheets

5.1.1 Exam context
You will not be required to use a spreadsheet in the BTF exam, but you may be required to show an
understanding of the principles of spreadsheets and what they can be used for. Later exams, such as
Business Strategy and Technology, will require you to use spreadsheets in the exam.

5.1.2 What is a spreadsheet?
A spreadsheet is an application that enables the user to store and analyse data. Well known
spreadsheet applications include Microsoft Excel, Google Sheets and Apple Numbers.
A spreadsheet consists of an interface made up of an array of cells into which data can be entered.
Each cell has a unique reference, based on which column and row it is in. For example, in the
spreadsheet in Figure 13.1 below, the word ‘Sunday’ is in cell A4, while sales for Tuesday are shown
in cell B6.
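The column-and-row referencing described above (for example A4 and B6) can be illustrated with a short Python sketch. The function name and the dictionary used to hold the cells are hypothetical and purely for illustration; they are not how any particular spreadsheet application stores its data.

```python
import re


def parse_cell_reference(ref):
    """Split a reference such as 'B6' into (column_number, row_number)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for letter in letters.upper():
        # Column letters work like a base-26 number: A=1 ... Z=26, AA=27, and so on
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return column, int(digits)


# A few cells from Figure 13.1, keyed by their references
sheet = {"A4": "Sunday", "B4": 0, "A6": "Tuesday", "B6": 3000}

print(parse_cell_reference("A4"))  # (1, 4): first column, fourth row
print(sheet["B6"])                 # 3000: sales for Tuesday
```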

Figure 13.1: Basic spreadsheet

      A            B          C    D    E    F    G    H    I
 1
 2    Day          Sales
 3
 4    Sunday            0
 5    Monday        2,000
 6    Tuesday       3,000
 7    Wednesday     2,500
 8    Thursday      3,100
 9    Friday        3,400
10    Saturday      4,200
11
12    Total        18,200
13

5.1.3 Types of data
Any cell in a spreadsheet can store one of the following:
• Text. A text cell usually contains words, but may contain numbers that do not represent numeric
values for calculation purposes (eg, a Part Number)
• Values. A value is a number that can be used in a calculation. Many of the cells in column B above
contain sales values.
• Formulae. A formula is an expression that calculates the value in a cell, usually referring to other
cells. For example, the formula =B4+B5 would add the values of cells B4 and B5.
• Functions are predefined formulae. In Figure 13.1, the cell B12 contains a function =SUM(B4:B10)
which adds up the values in all the cells from B4 to B10, giving the total sales for the week. (You
cannot see the function above, only the output of the function is shown.) Modern spreadsheet
applications have the ability to use dozens of formulae and functions including arithmetical,
financial and statistical functions.

5.1.4 Worksheets and workbooks
Multiple spreadsheets (worksheets) may be contained within a workbook (eg, a workbook
containing sales data might have a different worksheet for each branch). It is possible for formulae in
one worksheet to refer to data in another one.

5.1.5 Presentation of data in a spreadsheet
Spreadsheets are very flexible, enabling the overall structure of a spreadsheet to be designed in the
way the creator considers most appropriate. All spreadsheets need to be planned and then
constructed carefully.
The data within cells can be formatted in various different ways. The spreadsheet in Figure 13.2
includes several formatting features:

Figure 13.2: Presentation of data

      A                              B          C          D          E
 1
 2                                   2023       2024       2025       2026
 3                                   S$000      S$000      S$000      S$000
 4    Revenue                        47,000     54,050     62,158     71,482
 5    Commission on freelance del    17,000     19,550     22,482     25,854
 6    Total revenue                  64,000     73,600     84,640     97,336
 7    Direct costs
 8    Drivers salaries              (24,000)   (26,400)   (29,040)   (31,944)
 9    Operational workers           (19,000)   (20,900)   (22,990)   (25,289)
10    Fuel costs                     (3,000)    (3,300)    (3,795)    (3,382)
11    Depreciation trucks           (10,500)   (11,550)   (12,705)   (13,976)
12    Total direct costs            (56,500)   (62,150)   (68,530)   (75,041)
13
14                                    7,500     11,450     16,110     22,295
15                                   11.72%     15.56%     19.03%     22.91%

• The overall layout of the spreadsheet in Figure 13.2 is logical. Each column shows
the data for a particular year, and each row shows the values for a particular type of cost or
revenue. There is also a calculation of gross profit and gross profit margin in rows 14 and 15.
• The text in row 2, and in cells A4 and A7, has been shown in bold. This adds emphasis.
• The currency headings in row 3 are italicised. This helps to distinguish the headings from the
values below.
• The cells in row 5 make use of a bottom border – this is often used to show that the cells below (ie,
row 6) contain a subtotal. The cells in row 12 contain a single line border at the top and a double
line border at the bottom. This highlights that this is an important total, and it is the last total in this
data.
• The number format chosen for most of the values is a standard number, with a comma to
distinguish the 1,000s and no decimal places. Brackets are used to show a negative number
(although other conventions could also be used, such as a minus sign before the number or a
negative number could appear in red). In row 15, a percentage number format has been shown,
in this case with two decimal places.
• Several formulae and functions have been used in the spreadsheet above. Formulae and
functions all begin with an equals sign (=). B12 is the sum of the values in cells B8 to B11. The
function =SUM(B8:B11) has been used. B14, gross profit, is the difference between total revenue
(in cell B6) and total direct costs (in cell B12) and has been calculated using the formula =B6-B12.
The gross profit margin in cell B15 uses the formula =B14/B6. Similar formulae have also been
used in the corresponding cells in columns C, D and E. The formulae themselves cannot be seen
in the spreadsheet above, only the results of the calculations made by the formula, although there
is a setting in most spreadsheet packages whereby the formulae can be displayed. You will not be
tested on knowledge of formulae or functions in the exam but may be required to identify which
particular cells in a spreadsheet contain them.

5.1.6 Visualisations
A number of charts or diagrams can be created from the numbers in a spreadsheet, including pie
charts, bar charts and line charts. If the underlying data in a spreadsheet is changed, the
visualisations based on this data will be automatically updated. Visualisations are described further
in Section 7, but it should be noted here that a wide range of the visualisations described in Section 7
can be produced within a spreadsheet.

5.2 Use of spreadsheets in finance

Since spreadsheets have the ability to contain large amounts of data and give the user the flexibility
to determine how that data is presented, spreadsheets can be applied to a wide range of tasks.
Within the finance function, some common uses of spreadsheets include the following:
• accounting records for small businesses
• budgets and forecasts
• what-if analysis/scenario analysis
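The formulae described for Figure 13.2 and the what-if analysis listed above can be mirrored outside a spreadsheet in a short Python sketch. It is illustrative only: the function names are hypothetical, costs are held as positive amounts (so gross profit is revenue less costs), and the growth rates of 15% for revenue and 10% for costs are assumptions inferred from the year-on-year figures in Figure 13.2 rather than rates stated in the text.

```python
# 2023 figures from column B of Figure 13.2 (S$000)
total_revenue = 64_000                      # cell B6
direct_costs = {                            # cells B8 to B11, held here as positive amounts
    "Drivers salaries": 24_000,
    "Operational workers": 19_000,
    "Fuel costs": 3_000,
    "Depreciation trucks": 10_500,
}


def gross_profit_and_margin(revenue, costs):
    """Gross profit (revenue less costs) and gross profit margin (profit / revenue)."""
    gross_profit = revenue - costs
    return gross_profit, gross_profit / revenue


total_costs = sum(direct_costs.values())    # plays the role of =SUM(B8:B11)
gross_profit, margin = gross_profit_and_margin(total_revenue, total_costs)
print(f"{gross_profit:,} {margin:.2%}")     # 7,500 11.72%


def what_if(revenue_growth_pct, cost_growth_pct=10):
    """Recalculate next year's gross profit and margin under assumed growth rates."""
    next_revenue = total_revenue * (100 + revenue_growth_pct) / 100
    next_costs = total_costs * (100 + cost_growth_pct) / 100
    return gross_profit_and_margin(next_revenue, next_costs)


# Change one assumption at a time and let the model recalculate
for growth in (0, 10, 15):
    profit, next_margin = what_if(growth)
    print(f"revenue growth {growth}%: gross profit {profit:,.0f}, margin {next_margin:.2%}")
```

With 15% revenue growth the sketch reproduces the 2024 gross profit of 11,450 shown in Figure 13.2, which is the same recalculation a spreadsheet performs when an input cell is changed.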

5.2.1 Accounting records for small businesses
Small businesses or other small organisations with a limited number of transactions often keep their
accounting records in spreadsheets. Typically, these include worksheets showing bank transactions,
sales invoices and supplier listings. The transactions are recorded and can be analysed appropriately.
The following is an example of the use of a spreadsheet to record and analyse the bank payment
transactions of a sports club:

Figure 13.3: Analysis of bank transactions

     A            B         C                 D         E               F             G             H
1    Bank account payments
2    Date         Voucher                     Amount    Tennis Courts   Football      Electricity   Water & Waste
                                                        Maintenance     Pitch Grass                 Disposal
3    10/02/20X1             Veolia             27.90                                                    27.90
4    17/01/20X1             Michael Cook       57.00        57.00
5    17/01/20X1             Gareth Davies      30.00                        30.00
6    18/01/20X1   DD        EDF Energy         59.00                                      59.00
7
8    Totals for month                         173.90        57.00          30.00         59.00         27.90
9

In the UK, if businesses are registered for VAT, they must maintain certain records in a digital format
which are used to complete and file the business’s VAT returns. Many small businesses, which do not
use accounting software packages, keep their VAT records in spreadsheets in order to meet this
requirement.

5.2.2 Budgets and forecasts
A budget is a financial plan for a period – typically one year. In many organisations, departments have
to prepare their own budgets, which are then sent back to the finance function, which consolidates
these to produce a budget for the company overall. Spreadsheets can assist the budgeting process
in the following ways:
• Budget templates – standardised budget templates can be sent to each department to ensure
that all departmental budgets follow the same format. This helps departmental managers know
what information they need to provide in their budget and ensures that consistent information is
provided.
• Formulae perform arithmetic calculations – such as summing costs and calculating profit totals. If
any of the budget inputs is changed, the totals are automatically recalculated, saving time
performing manual arithmetic. If the underlying cost or revenue data is changed, the spreadsheet
will automatically recalculate the revised budgeted profits.
• Consolidations – departmental budgets can be saved as worksheets within a workbook, and
these can be automatically added up to calculate a budget for the whole organisation. If the
budget from one department is then amended, the consolidated budget will also be updated
automatically.
Spreadsheets can also be used for financial forecasts. As with budgets, once a forecast has been set
up in a spreadsheet, the user can change the assumptions used and the spreadsheet will
automatically recalculate the forecast based on the new assumption.

5.2.3 What-if analysis and scenario analysis
What-if analysis is used in financial modelling. A financial model can be set up in a spreadsheet, such
as a cash flow forecast for the next 10 years. The various assumptions used in the model can then be
changed in the spreadsheet to see what happens – eg, what if sales revenue growth is zero, what if
sales revenue growth is 10% etc. The spreadsheet will recalculate the cash flows based on the
different assumptions.

5.3 Risks from spreadsheet use

Where businesses rely on spreadsheets (for example in financial models) there are potential risks:
Errors in spreadsheets. There could be errors in the formulae, errors in entering data, or errors in the
logic of the spreadsheet, leading to the final outputs (the values that the spreadsheet is designed to
calculate) being incorrect.

Context example: Public Health England
During the coronavirus pandemic, Public Health England used a spreadsheet to calculate the daily
number of new infections. The data originated from test results, sent by private testing companies in
a template. These templates were automatically uploaded onto a central spreadsheet maintained by
Public Health England, which calculated the number of positive test results.
Between 25 September 2020 and 2 October 2020, the number of cases reported by Public Health
England was 50,786, but it was later discovered that the true number of cases for this period was
higher by 15,841.
The error was caused by the use of an older version of Microsoft Excel, which could only handle
around 65,000 rows of data, and since the details of each test created several rows of data, the spreadsheet
could only manage 1,400 cases. When the number of cases exceeded 1,400, further cases were
ignored and missed from the total.

Lack of consistency in style and design of spreadsheets. Where different people within an
organisation are producing spreadsheets, if there are no standards in place for the use of
spreadsheets, it can lead to inconsistent presentation. This can make it hard for users to understand
them.
Poor design. Many spreadsheets evolve over time, so little consideration is given to the overall
structure. This can lead to poor structure, for example overly complex methods or poor presentation.
Lack of documentation of inherited or reused spreadsheets. Where responsibility for updating or
maintaining a spreadsheet designed by one person is passed onto a second person, problems may
occur if the design of the spreadsheet has not been documented. The new user may not be fully
aware of the implications of changing a formula, for example.
Loss of data. If important data is kept in spreadsheets, then there is a risk that data will be lost if the
spreadsheet file is corrupted or deleted (accidentally or intentionally). Procedures should be in
place to ensure that all important spreadsheets are backed up regularly.

5.4 Principles of good spreadsheet practice

The ICAEW has published a set of principles of good spreadsheet practice. The aim of the principles
is to help reduce the amount of time wasted by poor spreadsheet design and reduce the number of
errors caused. The principles do not provide detailed guidance about good spreadsheet design.
Instead, they provide a framework for good spreadsheet development and use within organisations.
The principles are presented below with brief explanations in brackets. The principles are available
on the ICAEW website with more detailed explanations.
The spreadsheet’s business environment:
1. Determine what role spreadsheets play in your business and plan your spreadsheet standards
accordingly
2. Adopt a standard for your organisation and stick to it (eg, standards over cell formats)
3. Ensure that everyone involved in the creation or use of spreadsheets has an appropriate level of
knowledge and competence
4. Work collaboratively, share ownership, peer review
Designing and building your spreadsheet:
5. Before starting, satisfy yourself that a spreadsheet is the appropriate tool for the job
6. Identify your audience. If a spreadsheet is intended to be understood and used by others, the
design should facilitate this
7. Include an ‘About’ or ‘Welcome’ sheet to document the spreadsheet
8. Design for longevity (eg, it should be possible in future to adapt the spreadsheet for changes in
tax rates or quantities)

9. Focus on the required outputs; start spreadsheet design by identifying the required outputs and
work backwards to determine what inputs and logic are required
10. Separate and clearly identify inputs, workings and outputs (a well-structured spreadsheet is
easier to understand and maintain)
11. Be consistent in structure (eg, use columns for different years and use a specific column for year
one, with subsequent years being presented to the right of this)
12. Be consistent in the use of formulae (eg, groups of cells using different formulae should be
separated from each other)
13. Keep formulae as short and simple as practicable
14. Never embed numbers in a formula if they might change (eg, in a formula calculating sales value
from sales volume and sales prices, ensure that the sales volume and sales prices are entered into
separate cells and not within the formula cell)
15. Perform any calculation once and then refer back to that calculation (don’t have a calculation for
the same value in many different cells)
16. Avoid using advanced features when simpler features could achieve the same result
Spreadsheet risks and controls:
17. Have a system of backup and version control, which should be applied consistently within an
organisation (eg, controls to identify which is the latest version of a spreadsheet)
18. Rigorously test the workbook (to help identify any mistakes in a spreadsheet, it is recommended
that it should be tested by a peer)
19. Build in checks, controls and alerts from the outset and during the course of spreadsheet design
(eg, a formula to check that the balance sheet balances)
20. Protect parts of the worksheet that are not supposed to be changed by users (eg, cells containing
formulae can be locked so that users do not accidentally change them)

6 Potential problems with data

Section overview

As users of information, accountants need to exercise professional scepticism in relation to the
reliability of data. They need to question its comparability and whether it might contain bias.
Hypothesis testing involves making a hypothesis about a population statistic (eg, the mean) and
then testing a sample of data to see if this hypothesis is correct. The nature of hypothesis testing is
such that there is always the possibility that the wrong conclusions are drawn from the sample – the
two types of error are called type I and type II errors.

6.1 Professional scepticism
In the context of data analysis, areas that accountants may question include comparability of data
from different sources, and data bias, which means that the data is not representative of the population
from which it is drawn.

6.2 Comparability of data
One potential issue with data or statistics from different sources is that of comparability.

Definition
Comparability: The extent to which differences between statistics from different geographical areas,
non-geographic domains, or over time, can be attributed to differences between the true values of
the statistics (OECD).

The definition above makes the point that data is only comparable if it is free from differences due to
factors other than the underlying statistics themselves. If data is impacted by these, then incorrect
conclusions could be drawn.
Comparability may be distorted by two main sources:
• use of different definitions; or
• use of different measuring tools, compilation and presentation practices.

Context example: Differences in definitions
An important statistic used in economics is the rate of unemployment. Different methods are used to
measure unemployment in different countries. In some countries, households are surveyed, while in
others businesses are surveyed. In some countries, people are not classed as unemployed until they
have been out of work for three months; in other countries the period is different. This can lead to
differences in statistics simply because of the different methods used rather than differences in underlying
unemployment.

6.3 Data bias and representative samples
If sample data is used to make inferences about the population, then it is important that the sample
is representative of the population and free from bias. If not, wrong conclusions may be reached.
One important factor is the size of the sample. The larger the sample is, the more likely it is to be
representative of the population. However, increasing sample sizes can increase the costs of
collecting data significantly, so a balance needs to be struck between the size of the sample and the
costs. Users of data should question the size of the samples used.
Another issue relating to samples is bias.

Definition
Data bias: Where the data in the sample is not representative of the population for reasons other
than the size of the sample.

Context example: Unrepresentative samples
During elections, pollsters call samples of voters and ask them how they intend to vote in the
forthcoming election. They use this as a basis for predicting what the results of the election will be.
One reason that polls are often wrong is because people are not always honest when talking about
their voting intentions, perhaps feeling embarrassed about admitting which party they truly intend to
vote for.

There are several different types of data bias.

Type of bias in data      Meaning

Selection bias            This occurs when the data is not selected
                          randomly and leads to a sample that is not
                          representative of the population. In order to be
                          representative, all items in the population
                          should have an equal chance of being selected
                          for the sample.

Self-selection bias       This is a type of selection bias. It occurs when
                          individuals select themselves to be part of a
                          sample.
                          Example – online questionnaire
                          People are often provided with the opportunity
                          to be entered into a prize draw if they complete

                          an online questionnaire. Only people who are
                          interested in being entered into the draw are
                          likely to participate in the questionnaire. The
                          questionnaire will not therefore reflect the
                          opinions of those who are not.

Observer bias             This occurs when observing and recording
                          results and relates to interpretation. The
                          researcher allows their assumptions (which may
                          be unconscious) to influence their observations.

Omitted variable bias     Exploratory data analysis aims to identify
                          relationships between data – for example,
                          finding out what characteristics people who are
                          potential customers display.
                          Omitted variable bias is where key variables are
                          not included within the data to be analysed. For
                          example, the researcher may omit to record the
                          ages of people even though this might be a
                          factor that determines whether a person is a
                          potential customer.

Cognitive bias            This relates to human perception and includes
                          bias depending on how data is presented (eg,
                          infographics or the order of presentation), the
                          context in which it is presented, and ‘anchoring’,
                          where the perception of whether something is
                          good or not is influenced by being shown a
                          previous or expected value for that variable.
                          Example – recognising the context of
                          performance
                          A company’s profits for the year show a 20%
                          increase compared to the previous year. This
                          information is likely to sound impressive to
                          shareholders. If, however, shareholders are first
                          told that the market growth was 30% over the
                          last year, shareholders will recognise the
                          company’s 20% growth as less impressive.

Confirmation bias         This occurs when people see data that confirms
                          their beliefs and they ignore (consciously or
                          subconsciously) data that disagrees with their
                          beliefs.
                          Example – new product/market research
                          Managers who have already made a decision
                          (eg, to invest in a new product) may ignore
                          marketing research that suggests that the
                          product will not be successful, while fully paying
                          attention to research that confirms their
                          decision.

Survivorship bias         This is where the sample contains only items
                          that survived some previous event.
                          Example – exam results
                          An accountancy firm only lets students sit the
                          BTF exam if they achieve at least 45% in the
                          mock exam. The firm boasted that 95% of their
                          students passed BTF in the last sitting. This 95%
                          did not include the students who had not been
                          allowed to sit the exam because they had
                          achieved less than 45% in their mock exams.

6.4 Type I and type II errors
Hypothesis testing involves using data to confirm whether a predetermined idea or ‘null hypothesis’
is true, or whether an alternative hypothesis is true. A sample is taken to see if it confirms the null
hypothesis or contradicts it, in which case the alternative hypothesis is true. The null hypothesis is
rejected if the sample shows a result that is statistically significantly different from the result expected
by the hypothesis.

Definition
Statistical significance: The results generated by testing or experimentation are unlikely to occur by
chance or randomly, but occur due to a specific cause.
Note: Determining statistical significance is beyond the scope of the BTF syllabus.

In hypothesis testing there is a risk that wrong conclusions are reached as follows:
• A type I (‘false positive’) error occurs where the null hypothesis is true, but because the sample
result is significantly different, the null hypothesis is rejected.
• A type II (‘false negative’) error occurs when the null hypothesis is false, but it is accepted
because the sample result is not statistically significantly different to the null hypothesis.

Context example: Type I error
A sports retail company believes (correctly) that the average age of its customers is 28.0 years. It has
decided to test this hypothesis.
A sample of 100 customers was taken. The sample had a mean age that was significantly different
from 28 years. The hypothesis was rejected, and the company concluded wrongly that the
average age of its customers was not 28.0 years. This is a type I error.

Context example: Type II error
A sports retail company believes (incorrectly) that the average age of its customers is 28.0 years. It
has decided to test this hypothesis.
A sample of 100 customers was taken. The sample had a mean that was not significantly different to
28.0. The hypothesis was therefore accepted, and the company concluded that the
average age of its customers was 28.0 years. In actual fact, the average age of its customers was not
28 years. The company has made a type II error, in accepting an incorrect null hypothesis.

7 Presentation of information

Section overview

Thought should be given to what is the best, most effective way to present the results of data
analysis. Data can be presented using tables, charts, or a combination of the two. The objectives of
good presentation are:
• The presentation should be easy for the users to understand. This will be the case if the data is
presented using an appropriate format.

• The presentation should accurately reflect the underlying data. An appropriate scale should be used.
• The information may need to be accompanied by some commentary to help users interpret the message.

7.1 Using data visualisation

Definition

Data visualisation: Data visualisation is the use of charts and diagrams to present information.

The advantage of visualisation is that high level information can often be more quickly understood if presented in the form of charts and diagrams compared to tables of numerical data.
Many software applications are available that can produce professional looking charts and diagrams from data. In spreadsheet software, for example, many different charts are available that use data within the spreadsheet, and automatically refresh themselves when the spreadsheets are amended. While software applications are a useful tool, care needs to be taken in how they are used.
The sections below examine some more common types of chart, and discuss when they are most useful.

7.2 Bar charts

Bar charts (column charts) present information using bars, where the length of the bars represents the value of the data. Bar charts are useful for presenting discrete data where comparisons are made between different data sets – for example, sales in different periods, or sales by region. Clustered bar charts show the components that make up the total as well as the total itself. Figure 13.4 shows a clustered bar chart, where the total sales are represented by one bar for each period, and this is then broken down by region, with a different coloured bar representing each different region. The bars are repeated for several time periods. They are useful provided that there are not too many sub-components. If there are too many, the charts become cluttered and may confuse users:

[Figure 13.4: Bar Charts. Clustered bar chart of Sales for 20X8 to 20Y1, with bars for Total, North, South, East and West.]

An alternative presentation of the clustered bar chart is a component bar chart (stacked column) which shows one column representing the total, broken into the components that make up that total.

[Figure 13.5: Component Bar Charts. Stacked column chart of Sales for 20X8 to 20Y1, split into North, South, East and West.]

7.3 Pie charts

Pie charts are a useful way of showing the components that make up a total. The larger the angle of a particular component, the higher its proportion of the total. Unlike clustered and component bar charts, they only show information for one period of time. The advantage is that seeing the size of the components visually gives the user immediate perspective on the relative size of the different components.
The limitation of pie charts is that they can only analyse one variable at a time – for example, it would not be possible to see an analysis of sales and profits on the same pie chart.

[Figure 13.6: Pie Charts. Sales split into Household products, Drinks, Books and magazines, and Food products.]

7.4 Line charts

Line charts are useful for showing trends in a data series (eg, sales over time). The advantage of line charts is that it is easier to see trends in the data. A further advantage is that several data series could be presented within the same chart – for example, sales and profits could be plotted as two separate lines.
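As a rough illustration of the pie chart description in section 7.3, the angle of each slice is that component's share of the total multiplied by 360 degrees. The sketch below uses made-up sales figures for the four product groups named in Figure 13.6; the function name `pie_angles` and all of the numbers are illustrative assumptions, not data from this chapter.

```python
def pie_angles(components):
    """Return the slice angle in degrees for each component of a total."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("components must sum to a positive total")
    # Each slice's angle is its proportion of the total times 360 degrees.
    return {name: value / total * 360 for name, value in components.items()}


# Hypothetical sales by product group (not taken from the chapter).
sales = {
    "Food products": 400_000,
    "Household products": 200_000,
    "Drinks": 150_000,
    "Books and magazines": 50_000,
}

angles = pie_angles(sales)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

Because the angles depend only on proportions, a pie chart drawn this way says nothing about how the total itself changes between periods, which is the limitation noted in section 7.3.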

[Figure 13.7: Line Charts. Quarterly Sales and Profits, Q1 to Q4 for each of 20X8 to 20Y1, with one line for Sales and one for Profits.]

7.5 Principles of effective visualisations

In order to be clear and informative for users, the following principles should be adopted when using visualisations. While these may sound like common sense, it can be easy to forget them when dealing with complex data:
(a) The appropriate type of chart should be used for the data presented. An inappropriate method – such as a pie chart for multiple variables – may confuse rather than enlighten the user.
(b) An appropriate scale should be chosen. If the scale is too small, it may be that different values will not look very different.
(c) Charts should have a clear title as to data and time period, and clear labelling, with legends provided if appropriate.
(d) The use of colours or shading helps to distinguish between the different components in a chart and make them clear.
Essentially, the qualities of good information discussed above also apply to charts.

8 Big data

Section overview

The developments in technology over the past decades have enabled organisations to collect and process much larger volumes of data. The ability to store much higher volumes of data, and new methods of collecting data, such as the internet of things, have led to an explosion in the data that is collected by organisations. The increased use of social media containing commercially useful information means more types of information are available. More sophisticated analysis techniques that enable analysis of text and photographs in addition to numerical data allow greater insights to be achieved. These developments are often referred to as ‘big data’.

Definition

Big data: those datasets whose size is beyond the ability of typical… software to capture, store, manage and analyse (Manyika et al)

8.1 Characteristics of big data

There are four key characteristics of big data (the four Vs):
• Volume: The amount of big data accessible to a business is vast. It is available relatively easily and in ever-increasing volumes.
• Velocity: Big data can be streamed into the business at great speed, so data about sales, for instance, or inventory is available effectively in real time. Previously a business would have waited for periodic, summarised reports.
• Variety: Big data is available about a huge variety of issues, for instance about customers, competitors, transactions or social media activity. Because it comes from a variety of sources big data is unstructured, which means that it needs to be analysed before it can be useful.
• Veracity: This is to do with the trustworthiness or accuracy of big data. All data sets contain inaccuracies, bias, anomalies and irrelevancies (‘random noise’), and it is important that as much as possible is done to clean up this ‘dirty data’ so that it can be relied upon when analysed.
While big data is positive because it connects the business with its customers and the competitive environment, the sheer size can be overwhelming.

8.2 Types of big data

Nemschoff points out that big data is typically either structured or unstructured. He classifies big data as follows:
• Structured data is data which is obtained with a particular purpose in mind, so it has an inherent structure derived from the way in which it is collected, typically from website clicks or other particular actions:
– created data – data which has been created on purpose by an organisation, usually for product or market research (for example data created when customers log in to their online account with a retailer, or when RFID tags on inventory are logged via the internet as the inventory is moved around)
– provoked data – data obtained from people who have been given the opportunity to express their views (for example ratings and reviews left on a clothing retailer’s website by customers)
– transacted data – data collected about actual transactions such as sales, including all the steps or website traffic that led up to each transaction
– compiled data – data collected by a third party such as a market research, credit rating or polling organisation and accessed by a business
• Unstructured data is obtained without a particular objective so has no inherent structure within itself:
– captured data – data which is created passively from unrelated activity and captured without a specific purpose, for example from a smartphone which allows data to be captured about a person’s location, or from a search engine which captures which websites have been accessed by which computers
– user-generated data – data which internet users create and voluntarily place online, such as tweets, photos, etc

8.3 Sources of big data

Another way of classifying types of big data is to analyse its sources:
• processed data from information systems held by traditional business and other organisations
• open data, which refers to the release of large amounts of primarily public sector data, such as geo-spatial data, transport data, government financial data and public service data
• human-sourced data from social networks, blogs, emails, text messages and internet searches
• machine-generated data from the internet of things: from fixed and mobile sensors, and from computer and website logs

8.4 Importance of big data

ICAEW sets out the driving forces behind the increasing importance of big data:
• new sources of data, for instance the huge increase in unstructured human-sourced and machine-generated data (open data, social media and the internet of things)
• exponential growth in computing power and storage, which means that entire data sets can be captured and processed, regardless of their size and complexity
• new infrastructure for knowledge creation, such as crowdsourcing and open-source software
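The ‘veracity’ point in section 8.1 refers to cleaning up ‘dirty data’ before it is analysed. A minimal sketch of what such a clean-up step might look like is given below, assuming sales records held as Python dictionaries. The field names, the sample records and the three cleaning rules (drop incomplete rows, drop implausible amounts, drop exact duplicates) are all hypothetical choices made for illustration, not rules prescribed by this chapter.

```python
def clean_records(records, max_amount=10_000):
    """Return records with incomplete, implausible or duplicate rows removed."""
    seen = set()
    cleaned = []
    for record in records:
        # Rule 1: drop rows with a missing customer or amount.
        if record.get("customer") is None or record.get("amount") is None:
            continue
        # Rule 2: drop amounts outside an assumed plausible range.
        if not 0 < record["amount"] <= max_amount:
            continue
        # Rule 3: drop exact duplicates of a row that has already been kept.
        key = (record["customer"], record.get("date"), record["amount"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical raw data containing typical problems.
raw = [
    {"customer": "C001", "date": "20X8-01-05", "amount": 120.0},
    {"customer": "C001", "date": "20X8-01-05", "amount": 120.0},  # duplicate
    {"customer": None, "date": "20X8-01-06", "amount": 75.0},     # missing customer
    {"customer": "C002", "date": "20X8-01-07", "amount": -40.0},  # implausible amount
    {"customer": "C003", "date": "20X8-01-08", "amount": 310.5},
]

cleaned = clean_records(raw)
print(len(raw), "raw records ->", len(cleaned), "clean records")
```

In practice the rules would need to be agreed with the users of the data, since an apparently implausible value may turn out to be a genuine transaction.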

9 Data science

Section overview

Data science covers the whole life cycle of data, from acquisition and exploration to analysis and communication of the results. It is not only concerned with the tools and methods to obtain, manage and analyse data: it is also about extracting value from data and translating it from asset to product.
The importance of big data and its uses to businesses means that employees require skills in data science.

Definition

Data science: Deals with collecting, preparing, managing, analysing, interpreting and visualising large and complex datasets (Imperial College London, 2018).

9.1 Data analytics

Value is extracted from big data by data scientists through the process of data analytics. This means that data is assembled using fields within the source data itself, rather than using predetermined formats which can be very restrictive. The data assembled can then be filtered, sorted, highlighted and presented visually using, typically, bar charts and pie charts. Both the extraction of data and subsequent presentation of information are richer and more varied than was possible previously, and allow far wider and deeper analysis.

Definition

Data analytics: The process of using fields within the source data itself, rather than predetermined formats, to collect, organise and analyse large sets of data to discover patterns and other useful information which an organisation can use for its future business decisions.

9.2 Benefits of big data, data science and data analytics

As we have seen, big data, data science and data analytics are all closely related. Big data is the raw data available to organisations in vast amounts. Data science is the collection and management of that data, and data analytics is the processing of the data (often by data scientists) to create insights which are useful to the organisation.
There are a number of benefits that all three can bring to an organisation. ICAEW points out that big data and data analytics are being used to:
• gain insights;
• predict the future; and
• automate non-routine decision making.
A report by the management consultants McKinsey highlights the following ways in which big data and data analytics can be used by a business to create value:
• Enhancing transparency: Data analytics of big data create insights into issues affecting the business that may not have previously been fully understood, such as customer buying patterns or market price fluctuations.
• Performance improvement: Real-time, analysed information allows managers to make better decisions which result in better profitability.
• Market segmentation and customisation: Insights into customer needs from analysed big data allow the business to tailor product designs and prices to particular customers or groups of customers.
• Decision making: Real-time information allows managers to make better day-to-day decisions, for instance about pricing or stocking of goods.
• Innovation: Analysed big data can reveal completely new ideas which result in new products and services. Big data also underlies the growth of virtual – or ‘borderless’ – organisations which we saw in the chapter Organisational and business structures.
• Risk management: Data analytics can use big data to assist with the identification, quantification and management of risk.

9.3 Risks of big data, data science and data analytics

Despite the benefits, there are a number of risks that big data, data science and data analytics create:
• Storage: The sheer volume of big data means that organisations must monitor and flex storage levels to avoid running out of space.
• Workforce skills: Increasing amounts of data and the need for its analysis and interpretation means that organisations need to ensure that they have sufficient data scientists and analysts with an appropriate mix of knowledge and experience to get the most out of the data that they have.
• Data dependency: Organisations can become dependent on data to make business decisions, which puts them at risk of making poor decisions if there are errors in the data or if it is misinterpreted by data scientists. This links back to big data’s characteristic of veracity (or trustworthiness of data).
• Information overload: Data analytics of big data can create huge numbers of insights into the business and its environment. There is a risk that too much information becomes available which may actually hamper the speed at which business decisions can be made.
• Data privacy: Collecting, storing and analysing data puts the organisation at risk of breaching data privacy legislation, such as the Data Protection Act 2018 in the UK. Organisations need to be aware of the legislation and take all necessary steps to comply with it.
• Data security: Big data and its analysis can be a source of competitive advantage for organisations. The data and its analysis need to be protected from cyber security risks to prevent this highly valuable information from falling into the wrong hands.

Professional skills focus: Structuring problems and solutions

Exam questions could test your ability to demonstrate understanding of the business’s strategy in relation to the use of big data by presenting a list of potential risks and asking which is a risk of big data. Be aware of the risks of big data, particularly in relation to security and privacy.

9.4 Data ethics

The increasing collection and analysis of data, particularly personal data held about individuals, raises complex ethical issues. The overarching principles of data ethics, as set out by the UK government guidance for use in the public sector, are: transparency, accountability and fairness. The first two of these principles are equally applicable to the private sector.

9.4.1 Transparency
Are businesses transparent about how they use data, particularly if using techniques such as AI (see the chapter Developments in technology) in aggregated data? Information about data held and data processes should be published.

9.4.2 Fairness
The processes for collecting, storing and analysing data should aim to avoid unintended discriminatory effects on individuals and social groups. This is most obviously done by mitigating bias in data which may influence an outcome to ensure that outcomes arising from data respect the dignity of individuals and are non-discriminatory.
9.4.3 Privacy
Individuals have a right to privacy. If organisations are routinely collecting and processing information about individuals, this can threaten that privacy. Information should therefore only be collected with the consent of the individual and within relevant regulations on privacy.

9.4.4 Ownership of data
Businesses can increase their revenues by selling the data they collect to other organisations. Not only is this a further threat to privacy, but there is also a question of who owns the data. Does it belong to the data subject, or does it belong to the business simply because they collected it?

9.4.5 Consent
Do users genuinely consent to the collection of data and its use? Most web sites use cookies, which are small text files stored on the user’s device that allow data about the individual to be collected. Web sites do warn you that they use cookies and ask for consent before you can visit the site, but most people just ‘agree’ without spending the time to really understand what they are consenting to. You may be consenting to your data not only being collected, but also used to generate revenue.

9.4.6 Open data
There is a question of whether data collected should be available for all to use. Open data is data that anybody can access or share (Open Data Institute). Currently there is a movement arguing that all data should be open, as it benefits society as a whole.

Summary

Use of data in business
• Planning
• Decision making
• Controlling

Data vs information

Sources of data and information
• Internal
• External
• Internet of things

Qualities of good information
• 'Accurate' mnemonic

Data analysis
• Identify information needs
• Collect the data
• Analyse the data
• Present the information

Presentation of data
• Data visualisation
• Principles of effective visualisations

Descriptive statistics
Exploratory data analysis
Regression
Correlation
Confirmatory data analysis

Data bias
Type I errors
Type II errors
Sampling
Hypothesis testing

Big Data
• 4 Vs

Data Science
• Data analytics
• Benefits and risks

Data ethics
• Transparency
• Fairness
• Privacy
• Ownership of data

Spreadsheets
• Introduction to spreadsheets
• Use of spreadsheets in finance
• Risks from spreadsheet use
• Principles of good spreadsheet practice
Further question practice

1 Knowledge diagnostic
Before you move on to question practice, confirm you are able to answer the following questions having studied this chapter. If not, you are advised to revisit the relevant learning from the topic indicated.

Confirm your learning

1 Can you identify the three management activities that are supported by data and information? (Topic 1)

2 Can you give three examples of internal data sources and three examples of external data sources? (Topic 2)

3 Can you state what each of the letters in the ACCURATE acronym stands for, in relation to qualities of good information? (Topic 3)

4 Can you list the stages in a well-planned data analysis programme? (Topic 4)

5 Can you distinguish between the different types of data bias? (Topic 5)

6 Can you identify the meaning of type I and type II errors? (Topic 5)

7 Can you identify when bar charts, pie charts and line charts are most appropriate? (Topic 6)

8 Can you list what the four Vs stand for in the context of big data? (Topic 7)

9 Can you define data science? (Topic 8)

10 Can you list some of the ethical problems associated with data analysis? (Topic 9)

2 Chapter Self-test question practice

Aim to complete all self-test questions at the end of this chapter. Once completed, attempt all questions in the Data analysis chapter of the Business, Technology and Finance Question Bank. Refer back to the learning in this chapter for any questions which you did not answer correctly or where the suggested solution has not provided sufficient explanation to answer all your queries. Once you have attempted these questions, you can move on to the next chapter, Developments in Technology.

Technical references

Marr, B (2017) Data Strategy: How to Profit from a World of Big Data, Analytics and the Internet of Things, Kogan Page Limited.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., and Byers, A.H. (2011) Big Data: The next frontier for innovation, competition, and productivity. [Online]. Available from: www.mckinsey.com/business-functions/mckinsey-digital/our-insights/big-data-the-next-frontier-for-innovation [Accessed 5 April 2021].
Nemschoff (2014) 7 Important Types of Big Data. [Online]. Available from: https://www.smartdatacollective.com/7-important-types-big-data/ [Accessed 5 April 2021].
ICAEW (2019) Big Data and Analytics: The Impact on the Accountancy Profession. [Online]. Available from: https://www.icaew.com/-/media/corporate/files/technical/technology/thought-leadership/big-data-and-analytics.ashx [Accessed 5 April 2021].
Self-test questions

Answer the following questions.

1 Ian has to make a decision about whether to allow overtime tonight to Gonzalez, a customer service adviser, but he is unsure whether this extra time is needed between 7pm and 9pm on a Wednesday.
The type of information he needs to answer this query is:
A planning
B operational
C tactical
D strategic

2 Rachel has presented some information on how to measure performance to a panel of managers at Jab plc. She found this information on the internet the previous evening as a PowerPoint file and has presented it to the panel unedited. Within five minutes they found it to be highly informative and targeted at the issues they are concerned with.
Requirement
Which ACCURATE criterion for good information does Rachel’s information fail to meet?
A being cost beneficial
B being relevant
C being easy to use
D being authoritative

3 Pap plc makes a single product with five operatives working five machines in a 35-hour week, for which they are paid £10.60 per hour. National insurance etc, adds another £173 to the weekly labour bill. Last week the gross cost of labour was £2,200.
Requirement
To which internal source should the managers of Pap plc refer to identify why the bill was this size?
A the payroll
B the computerised accounting system
C the machine logs
D the workers

4 An analyst has produced some information about a company’s online customers for the company’s marketing department. The information includes mean age and mean spending by age.
Requirement
What type of analysis has the analyst produced?
A Descriptive statistics
B Exploratory data analysis
C Confirmatory data analysis
D Sampling

5 A large supermarket group wishes to carry out a survey to ascertain customers’ opinions about some new products that it is considering launching. It has decided to do a survey on one of the large social networking sites.
Requirement
Which of the following statements about the survey is correct?
A the survey will provide opinions that are representative of all customers
B the survey will suffer from selection bias
C the survey will suffer from survivorship bias
D the survey will take into account the opinion of the whole population

6 In the context of hypothesis testing, what does a type I error mean?
A the sample mean was calculated incorrectly
B the null hypothesis is rejected when it is actually correct
C the null hypothesis is accepted when it is actually incorrect
D the selection of the sample was biased

7 A finance director is preparing a presentation of the company’s results and wishes to produce a slide showing sales for the current and previous periods, analysed by product group. There are four main product groups.
Requirement
What type of chart would be most appropriate for this?
A a pie chart
B a component bar chart
C a line chart
D a table

8 Which of the characteristics of big data refers to the fact that the data may include text, photographs and sound files?
A Volume
B Velocity
C Variety
D Veracity

9 What is the process where value is extracted from big data known as?
A Data science
B Data extraction
C Data analytics
D Data creation

Now go back to the Introduction and ensure that you have achieved the Learning outcomes listed for this chapter.
Answers to Interactive questions

Answer to Interactive question 1
The most frequent problem encountered by managers is that the information is not targeted at the user, that is the information system has not been designed with users and their needs in mind. Information that is not actually relevant to the decisions they make is often included. Management information is frequently not easy to use, and it is late, ie, not timely.

Answers to Self-test questions

1 Correct answer(s):
B operational
By its short-term nature the information could not be planning or strategic; the fact that this is a day-to-day issue means it is not tactical.

2 Correct answer(s):
D being authoritative
Clearly the users have found the information easy to use and relevant, and the fact that Rachel spent
very little time in generating it makes it cost-beneficial.

3 Correct answer(s):
A the payroll
The first place to look is the payroll, which will show whether the variance comes from the pay rate,
the number of workers paid, or the national insurance etc. The other sources will provide further
information to back up the evidence in the payroll.
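The figures in question 3 can be used to see how large the unexplained amount is before turning to the payroll. The sketch below assumes that all five operatives worked exactly the standard 35 hours at the standard rate; the variable names are illustrative, and `Decimal` is used only to keep the pounds-and-pence arithmetic exact.

```python
from decimal import Decimal

# Figures taken from self-test question 3.
operatives = 5
hours_per_week = 35
rate_per_hour = Decimal("10.60")         # £ per hour
national_insurance_etc = Decimal("173")  # £ per week
actual_gross_cost = Decimal("2200")      # £, last week's labour bill

# Assumption: every operative worked standard hours at the standard rate.
expected_wages = operatives * hours_per_week * rate_per_hour
expected_gross_cost = expected_wages + national_insurance_etc
unexplained = actual_gross_cost - expected_gross_cost

print(f"Expected wages:       £{expected_wages:,.2f}")
print(f"Expected gross cost:  £{expected_gross_cost:,.2f}")
print(f"Actual gross cost:    £{actual_gross_cost:,.2f}")
print(f"Unexplained variance: £{unexplained:,.2f}")
```

On these assumptions the expected bill is £2,028, leaving £172 to be explained, for example by overtime, a different pay rate or an extra worker, which is the kind of detail the payroll would show.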

4 Correct answer(s):
A Descriptive statistics
Descriptive statistics describes the properties of sample and population data which is the case here,
as the analyst has provided information about the properties (mean age and spending) for the
population of customers. Exploratory analysis aims to find relationships between the data, such as
how spending might be related to age. No such analysis has been produced in this case.
Confirmatory data analysis aims to find out if an existing hypothesis is correct or not, but again no
such actions have been taken here. Sampling involves taking a sample of data and making inferences
about the population. Here, it appears that the analyst has used the whole population of customers.
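To make the idea of descriptive statistics concrete, the sketch below computes the two measures mentioned in question 4, mean age and mean spending by age, for a small invented set of customers. The records and variable names are hypothetical; only the Python standard library is used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records: (age in years, spending in £).
customers = [
    (22, 40.0),
    (22, 60.0),
    (35, 120.0),
    (35, 80.0),
    (35, 100.0),
    (48, 90.0),
]

# Mean age of all customers in the data set.
mean_age = mean(age for age, _ in customers)

# Group spending by age, then take the mean of each group.
spending_by_age = defaultdict(list)
for age, spending in customers:
    spending_by_age[age].append(spending)
mean_spending_by_age = {
    age: mean(values) for age, values in sorted(spending_by_age.items())
}

print(f"Mean age: {mean_age:.1f}")
for age, value in mean_spending_by_age.items():
    print(f"Age {age}: mean spending £{value:.2f}")
```

The output only describes the data. Asking whether spending rises with age would be exploratory analysis, and testing a stated hypothesis about it would be confirmatory analysis.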

5 Correct answer(s):
B the survey will suffer from selection bias
The survey will not be representative of all customers as it does not represent the views of any
customers who do not use the large social networking site.
The survey suffers from selection bias – not all members of the population (of customers) have a
chance of being selected for the survey.
Survivorship bias relates to a situation where people who have not survived a particular event would
not be chosen in the sample. This is not the case here.

6 Correct answer(s):
B the null hypothesis is rejected when it is actually correct
A type I error means that a hypothesis is rejected when it is actually correct, because data from a
sample is significantly different from the hypothesised value, so B is correct. C describes a type II
error. A type I error does not mean that the sample mean was calculated incorrectly, so A is wrong.
Nor does a type I error mean that the selection of the sample was biased.
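The distinction between the two error types can be summarised as a small decision table. The sketch below labels each of the four possible combinations; it takes the truth of the null hypothesis as an input purely for illustration, since in practice this is not known when the test is carried out. The function name is an assumption made for this example.

```python
def classify_test_outcome(null_is_true, null_rejected):
    """Label the outcome of a hypothesis test, given the (normally unknown) truth."""
    if null_is_true and null_rejected:
        return "type I error"   # a correct null hypothesis is rejected
    if not null_is_true and not null_rejected:
        return "type II error"  # an incorrect null hypothesis is accepted
    return "correct decision"


# Print all four combinations of truth and decision.
for null_is_true in (True, False):
    for null_rejected in (True, False):
        truth = "true" if null_is_true else "false"
        decision = "rejected" if null_rejected else "accepted"
        outcome = classify_test_outcome(null_is_true, null_rejected)
        print(f"Null hypothesis {truth}, {decision}: {outcome}")
```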

7 Correct answer(s):
B a component bar chart

A pie chart would be appropriate if the finance director wanted to show sales by product group for only one year. However, since they wish to show the previous period too, two pie charts would be required, and the comparison between the two periods would not be so clear as they are shown in different charts.
A component bar chart would be appropriate. There would be one bar for each of the two years, with the length of the bar representing total sales for that year, and the analysis of the sales would be shown within each bar by using different colours for the portion of the bar representing revenue from each of the product groups.
It would not be very easy to distinguish between the different product groups if a line chart were
used. Line charts are more appropriate for looking at one or two variables over several time periods
to show trends.
Presenting the data in a table would not provide the same impact as showing the data in a chart.

8 Correct answer(s):
C Variety
One of the issues of big data is variety – which refers to the fact that data comes in many forms, such as text and photographs, which require special systems to analyse them, so C is the correct answer. Volume refers to the amount of data, not the form. Velocity refers to the fact that big data is continuously being updated, so analysis has to occur quickly. Veracity refers to how far the information can be trusted, for example because it comes from a reliable source.

9 Correct answer(s):
C Data analytics
Value is extracted from big data through the process of data analytics by data scientists.
