Data Analytics For Accounting - Chapter 3 - Data Analytics For Accounting - Performing The Test Plan and
CHAPTER 3
DATA ANALYTICS FOR ACCOUNTING: PERFORMING THE TEST PLAN AND ANALYZING THE RESULTS
OBJECTIVES
a) Understand the four categories of Data Analytics.
b) Describe some descriptive analytics approaches, including summary statistics and data reduction.
c) Explain the diagnostic approach to data analytics, including profiling and clustering.
d) Understand predictive analytics, including regression and classification.
e) Describe the use of prescriptive analytics, including machine learning and artificial intelligence.
• Descriptive analytics are procedures that summarize existing data to determine what
has happened in the past. Some examples of descriptive analytics include summary
statistics (e.g. Count, Min, Max, Average, Median), distributions, and proportions.
• Diagnostic analytics are procedures that explore the current data to determine why
something has happened the way it has, typically comparing the data to a benchmark.
As an example, diagnostic analytics allow users to drill down into the data and see how
it compares to a budget, a competitor, or a trend.
• Predictive analytics are procedures used to generate a model that can be used to
determine what is likely to happen in the future. Examples of predictive analytics
include regression analysis, forecasting, classification, and other predictive modeling.
• Prescriptive analytics are procedures that model data to enable recommendations for
what should be done in the future. These typically include developing more advanced
machine learning and artificial intelligence models to recommend a course of action
based on a current problem.
The choice of Data Analytics model depends largely on the type of question that you’re
trying to answer and your access to the data needed to answer the question. Descriptive and
diagnostic analytics are typically paired when you would want to describe the past data and
then compare it to a benchmark to determine why the results are the way they are, similar
to the accounting concepts of planning and controlling. Likewise, predictive and prescriptive
analytics make good partners when you would want to predict an outcome and then make a
recommendation on how to follow up, similar to an auditor flagging a transaction as high risk
and then following a decision flowchart to determine whether to request additional evidence
or include it in audit findings.
B. Descriptive Analytics
Descriptive analytics help summarize what has happened in the past. For example,
a financial accountant would sum all of the sales transactions within a period to calculate
the value for Sales Revenue that appears on the income statement. An analyst would
count the number of records in a data extract to ensure the data are complete before
running a more complex analysis. An auditor would filter data to limit the scope to
transactions that represent the highest risk. In all these cases, basic analysis provides an
understanding of what has happened in the past to help decision makers achieve good
results and correct poor results.
Here we look at two main approaches that are used by accountants today:
summary statistics and data reduction.
Summary Statistics
Summary statistics describe the location, spread, shape, and dependence of a set
of observations. These commonly include the count, sum, minimum, maximum, mean or
average, standard deviation, median, quartiles, correlation, covariance, and frequency
that describe a specific measurable value:
Statistic            Excel Formula   Description
Sum                  =SUM()          The total value of all numerical values
Mean                 =AVERAGE()      The center value; sum of all observations divided by the number of observations
Median               =MEDIAN()       The middle value that divides the top half of the data from the bottom half
Minimum              =MIN()          The smallest value
Maximum              =MAX()          The largest value
Count                =COUNT()        The number of observations
Frequency            =FREQUENCY()    The number of observations in each of a series of numerical or categorical buckets
Standard deviation   =STDEV()        The variability or spread of the data from the mean; a larger standard deviation means a wider spread away from the mean
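As a rough illustration of the statistics in the table, the same values can be computed outside Excel. The sketch below uses Python's standard library; the invoice amounts and the bucket boundaries are hypothetical and chosen only for illustration.

```python
import statistics
from collections import Counter

# Hypothetical invoice amounts (illustrative data, not from the text).
amounts = [120.0, 80.0, 100.0, 150.0, 100.0, 50.0]

summary = {
    "sum": sum(amounts),                   # comparable to =SUM()
    "mean": statistics.mean(amounts),      # comparable to =AVERAGE()
    "median": statistics.median(amounts),  # comparable to =MEDIAN()
    "min": min(amounts),                   # comparable to =MIN()
    "max": max(amounts),                   # comparable to =MAX()
    "count": len(amounts),                 # comparable to =COUNT()
    "stdev": statistics.stdev(amounts),    # sample standard deviation, like =STDEV()
}


def bucket(amount):
    """Assign an amount to a hypothetical size bucket."""
    if amount < 100:
        return "under 100"
    if amount < 150:
        return "100 to 149"
    return "150 and over"


# Frequency: number of observations in each bucket.
frequency = Counter(bucket(a) for a in amounts)

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
print(dict(frequency))
```

A larger standard deviation relative to the mean would signal that the amounts are widely spread and may deserve a closer look.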
Data Reduction
As you recall, the data reduction approach attempts to reduce the amount of
detailed information considered to focus on the most critical, interesting, or abnormal
items (i.e., highest cost, highest risk, largest impact, etc.). It does this by filtering through
a large set of data (perhaps the total population) and reducing it to a smaller set that has
the vast majority of the critical information of the larger set. The data reduction approach
is done primarily using structured data—that is, data that are stored in a database or
spreadsheet and are readily searchable.
Data reduction involves the following steps (using an example of an employee
creating a fictitious vendor and submitting fake invoices):
• Identify the attribute you would like to reduce or focus on
• Filter the results
• Interpret the results
• Follow up on results
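The four steps above can be sketched for the fictitious-vendor example. This is a minimal sketch under stated assumptions: the attribute chosen (a vendor address that matches an employee address) is one plausible test among many, and all records and field names are hypothetical.

```python
# Hypothetical records; field names and values are illustrative assumptions.
employees = [
    {"employee_id": "E01", "address": "12 Oak St"},
    {"employee_id": "E02", "address": "98 Pine Ave"},
]
vendors = [
    {"vendor_id": "V100", "name": "Acme Supply", "address": "500 Market Rd"},
    {"vendor_id": "V101", "name": "Quick Parts", "address": "98 pine ave "},
    {"vendor_id": "V102", "name": "Delta Tools", "address": "7 Harbor Way"},
]
invoices = [
    {"vendor_id": "V100", "amount": 900.0},
    {"vendor_id": "V101", "amount": 1800.0},
    {"vendor_id": "V101", "amount": 2500.0},
    {"vendor_id": "V102", "amount": 300.0},
]


def normalize(address):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(address.lower().split())


# Step 1: identify the attribute to focus on (here, the mailing address).
employee_by_address = {normalize(e["address"]): e["employee_id"] for e in employees}

# Step 2: filter the vendor population down to vendors sharing an employee address.
flagged = [
    {**v, "matched_employee": employee_by_address[normalize(v["address"])]}
    for v in vendors
    if normalize(v["address"]) in employee_by_address
]

# Step 3: interpret the results by sizing the exposure per flagged vendor.
flagged_ids = {v["vendor_id"] for v in flagged}
exposure = {}
for inv in invoices:
    if inv["vendor_id"] in flagged_ids:
        exposure[inv["vendor_id"]] = exposure.get(inv["vendor_id"], 0.0) + inv["amount"]

# Step 4: follow up by producing a short list for manual review.
for v in flagged:
    print(v["vendor_id"], v["name"], "matches", v["matched_employee"],
          "total invoiced:", exposure.get(v["vendor_id"], 0.0))
```

A match is only a lead, not proof of fraud; the follow-up step is where supporting documents would be requested.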
C. Diagnostic Analytics
Diagnostic analytics provide insight into why things happened or how individual
data values relate to the general population. Once you summarize data using descriptive
techniques, you can drill down and discover the numbers that are driving an outcome.
Benchmarks give context to the data by giving analysts a reference point (or line) to
compare the data to. For example, the arithmetic mean of a data set gives you context
for a specific value. These benchmarks may be based on past activity, a comparison with
a major competitor or an entire industry.
Two primary methods of diagnostic analytics include profiling and cluster analysis.
In both of these cases the analysis provides insight into where a specific value lies relative
to the rest of the sample or population. The farther the distance from the rest of the
observations, the more interesting the individual value becomes. These outliers could
represent risk or opportunities to learn more about the business process or partnerships
driving the behavior.
Profiling
Data profiling typically involves the following steps:
• Identify the objects or activity you want to profile.
• Determine the types of profiling you want to perform.
• Set boundaries or thresholds for the activity.
• Interpret the results and monitor the activity and/or generate a list of exceptions.
• Follow up on exceptions.
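One way to sketch these profiling steps is with z-scores, using the arithmetic mean as the benchmark. The employee names, claim amounts, and the threshold of two standard deviations are all assumptions made for illustration, not values from the text.

```python
import statistics

# Step 1: the activity being profiled is monthly expense claims per employee.
# Hypothetical data for illustration only.
claims = {
    "Avery": 410.0, "Blake": 395.0, "Casey": 430.0, "Devon": 405.0,
    "Emery": 1250.0, "Finley": 420.0, "Gray": 400.0, "Harper": 415.0,
}

# Step 2: the type of profiling is distance from the mean, in standard deviations.
values = list(claims.values())
mean = statistics.mean(values)
stdev = statistics.stdev(values)
z_scores = {name: (amount - mean) / stdev for name, amount in claims.items()}

# Step 3: set a threshold (assumed here; a real test plan would justify it).
Z_THRESHOLD = 2.0

# Step 4: generate the list of exceptions.
exceptions = sorted(name for name, z in z_scores.items() if abs(z) > Z_THRESHOLD)

# Step 5: the follow-up itself is manual; the code only reports whom to review.
for name in exceptions:
    print(f"Review {name}: claim {claims[name]:.2f}, z-score {z_scores[name]:.2f}")
```

Because a single extreme value inflates both the mean and the standard deviation, small samples may need a lower threshold or a median-based benchmark.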
Cluster Analysis
The clustering data approach works to identify groups of similar data elements and the
underlying relationships of those groups. More specifically, clustering techniques are
used to group data/observations into a specific number of clusters or groups so that all
the data within any cluster are similar, while data across clusters are different. When you
are exploring the data for these patterns and don’t have a specific question, you would
use an unsupervised approach. For example, consider the question: “Do our vendors form
natural groups based on similar attributes?” In this case, there isn’t a specific target
because you don’t yet know what similarities your vendors have. You may use clustering
to evaluate the vendor attributes and see which ones are closely related. You could also
use co-occurrence grouping to match vendors by geographic region; data reduction to
simplify vendors into obvious categories, such as wholesale or retail or based on overall
volume of orders; or profiling to evaluate vendors with similar on-time delivery behavior.
In any of these cases, the data drive the analysis, and you evaluate the output to see if it
matches your intuition. These exploratory exercises may help to define better questions,
but are generally less useful for making decisions.
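To make the vendor question concrete, here is a minimal sketch of k-means, one common clustering technique (the text does not name a specific algorithm, so this choice is an assumption). The vendor attributes, the number of clusters, and the starting centroids are hypothetical.

```python
import math

# Hypothetical vendor attributes:
# (average order value in thousands, average days to deliver).
vendors = {
    "V1": (2.0, 3.0),
    "V2": (2.5, 4.0),
    "V3": (3.0, 3.5),
    "V4": (20.0, 14.0),
    "V5": (22.0, 15.0),
    "V6": (21.0, 13.0),
}


def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, initial_centroids, max_iterations=100):
    """Assign each point to its nearest centroid, recompute centroids, repeat."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the clusters are stable
        assignments = new_assignments
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = mean_point(members)
    return assignments, centroids


names = list(vendors)
points = [vendors[n] for n in names]

# Two clusters, seeded with the first and last vendor (an arbitrary choice).
assignments, centroids = k_means(points, [points[0], points[-1]])
labels = dict(zip(names, assignments))
print(labels)
print(centroids)
```

In practice the attributes would usually be standardized first so that one large-valued attribute does not dominate the distance, and the number of clusters would be compared across several runs.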
Predictive Analytics
Regression
Classification
The goal of classification is to predict whether an individual we know very little
about will belong to one class or another. For example, will a customer have his or her
balance written off? The key here is that we are predicting whether the write-off will
occur or not (in other words, there are two classes: “Write-Off” and “Good”).
Classification is a supervised method that can be used to predict the class of a new
observation. In the accompanying exhibit, blue circles represent “on-time” vendors, green
squares represent “delayed” vendors, and the gold star represents a new vendor with no history.
Classification is a little more involved as we are now dealing with machine learning and
complex probabilistic models. Here are the general steps:
• Identify the classes you wish to predict.
• Manually classify an existing set of records.
• Select a set of classification models.
• Divide your data into training and testing sets.
• Generate your model.
• Interpret the results and select the “best” model.
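The steps above can be sketched with a simple k-nearest-neighbors classifier for the on-time versus delayed vendor example. Using k-nearest neighbors, the two vendor features, the candidate values of k, and the fixed split are all assumptions for illustration; the text does not prescribe a particular model.

```python
import math
from collections import Counter

# Steps 1-2: the classes are "on-time" and "delayed"; these hypothetical records
# have already been classified by hand.
# Features: (shipping distance in hundreds of km, order lines in tens).
labeled = [
    ((1.0, 1.0), "on-time"),
    ((1.5, 2.0), "on-time"),
    ((2.0, 1.5), "on-time"),
    ((1.2, 1.8), "on-time"),
    ((6.0, 5.0), "delayed"),
    ((7.0, 6.0), "delayed"),
    ((6.5, 5.5), "delayed"),
    ((5.8, 6.2), "delayed"),
    ((2.2, 2.1), "on-time"),
    ((6.2, 5.1), "delayed"),
]

# Step 4: divide the data into training and testing sets (a fixed split keeps
# the sketch deterministic; a random split is more typical).
training, testing = labeled[:8], labeled[8:]


def knn_predict(features, k):
    """Predict the majority class among the k closest training records."""
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(k):
    """Share of testing records whose predicted class matches the known class."""
    hits = sum(knn_predict(features, k) == label for features, label in testing)
    return hits / len(testing)


# Steps 3, 5, 6: candidate models (here, different values of k), scored on the
# testing set; the best-scoring candidate is kept.
candidates = [1, 3]
scores = {k: accuracy(k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])

# Classify a new vendor with no history.
new_vendor = (1.8, 1.6)
prediction = knn_predict(new_vendor, best_k)
print(scores, best_k, prediction)
```

With only two testing records the accuracy figures are not meaningful on their own; the point is the sequence of steps, which stays the same with realistic data volumes.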
D. Prescriptive Analytics
Prescriptive analytics answer the question “What do we do next?” We have collected the
data; analyzed and profiled the data; and in some cases, developed predictive models to
estimate the proper class or target value. Once those analyses have been performed, the
decision process can be aided by rules-based decision support systems or machine learning
models, or the results can be added to an existing artificial intelligence model to improve
future predictions. These analytics are the most complex and expensive because they rely on
multiple variables and inputs, structured and unstructured data, and in some cases the ability
to understand natural language commands and translate them into data-driven queries.
Decision Support Systems
Decision support systems are information systems that support decision-making activity
within a business by combining data and expertise to solve problems and perform
calculations. They are designed to be interactive and adapt to the information collected
by the user. In the accounting domain, they are typically built around a series of rules or
If . . . then . . . branching statements that guide the user through the process to the result.
Decision support systems can help with application of accounting rules as well. For
example, when a company classifies a lease as a financing or operating lease, it must
consider whether the lease meets a number of criteria. Using a decision support system,
a controller could evaluate a new lease and answer five questions to determine the
proper classification, shown in Exhibit 3-18.
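A rules-based decision support system of this kind can be sketched as a short series of yes/no questions. Exhibit 3-18 is not reproduced here, so the sketch assumes its five questions correspond to the five finance-lease criteria in ASC 842; the question wording, key names, and function name are illustrative.

```python
# Assumption: these five questions paraphrase the ASC 842 finance-lease criteria.
# They are not copied from Exhibit 3-18.
QUESTIONS = [
    ("transfers_ownership",
     "Does ownership transfer to the lessee by the end of the lease term?"),
    ("purchase_option_reasonably_certain",
     "Is the lessee reasonably certain to exercise a purchase option?"),
    ("term_major_part_of_economic_life",
     "Is the lease term for a major part of the asset's remaining economic life?"),
    ("payments_substantially_all_fair_value",
     "Does the present value of the lease payments equal or exceed "
     "substantially all of the asset's fair value?"),
    ("specialized_asset",
     "Is the asset so specialized that it has no alternative use to the lessor?"),
]


def classify_lease(answers):
    """Return (classification, triggering questions) from yes/no answers.

    If any question is answered yes, then the lease is a financing lease;
    otherwise it is an operating lease.
    """
    missing = [key for key, _ in QUESTIONS if key not in answers]
    if missing:
        raise ValueError(f"Unanswered questions: {missing}")
    triggered = [text for key, text in QUESTIONS if answers[key]]
    classification = "financing" if triggered else "operating"
    return classification, triggered


# Hypothetical new lease evaluated by a controller.
answers = {key: False for key, _ in QUESTIONS}
answers["term_major_part_of_economic_life"] = True

classification, reasons = classify_lease(answers)
print(classification)
for reason in reasons:
    print(" -", reason)
```

Returning the triggering questions alongside the classification keeps the result explainable, which matters when the conclusion has to be documented.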