A Look at This Chapter 6
In this chapter, we focus on substantive testing within the audit setting. We highlight discussion of the audit plan,
discuss when population testing is appropriate, and attempt to understand simple audit analyses. We also discuss
the use of clustering to detect outliers and the use of Benford’s analysis.
A Look Back
In chapter 5, we introduced Data Analytics in auditing by considering how both internal and external auditors are
using technology in general, and audit analytics specifically, to evaluate firm data and generate support for
management assertions. We emphasized audit planning, audit data standards, continuous auditing, and audit
working papers.
A Look Ahead
Chapter 7 explains how to apply Data Analytics to measure performance. By measuring past performance and
comparing it to targeted goals, we are able to assess how well a company is working toward a goal. Also, we can
determine required adjustments to how decisions are made or how business processes are run, if any.
Internal auditors at Hewlett-Packard Co. (HP) understand
how data analytics can improve processes and controls.
Management identified abnormal behavior with manual journal
entries, and the internal audit department responded by working
with various governance and compliance teams to develop
dashboards that would allow them to monitor accounting activity.
The dashboard made it easier for management and the
auditors to follow trends, identify spikes in activity, and drill
down to identify the individuals posting entries. Leveraging
accounting data allows the internal audit function to focus on
the risks facing HP and act on data in real time by
implementing better controls. Audit data analytics provides
an enhanced level of control that is missing from a traditional
OBJECTIVES
After reading this chapter, you should be able to:
210 Chapter 6 Audit Data Analytics
Data may also be found in unlikely places. An auditor may be tasked with determining
whether the steps of a process are being followed. Traditional evaluation would involve
the auditor observing or interviewing the employee performing the work. Now that most
processes are handled through online systems, an auditor can perform Data Analytics on
the time stamps of the tasks and determine the sequence of approvals in a workflow
along with
the amount of time spent on each task. This form of process mining enables insight into
areas where greater efficiency can be applied. Likewise, data stored in paper documents,
such as invoices received from vendors, can be scanned and converted to tabular data
using specialized software. These new pieces of data can be joined to other transactional
data to enable new, thoughtful analytics.
There is an increasing opportunity to work with unstructured Big Data to provide
additional insight into the economic events being evaluated by the auditors, such as
surveillance video or text from e-mail, but those are still outside the scope of current Data
Analytics that an auditor would develop.
Most auditors will perform descriptive and diagnostic analytics as part of their audit
plan. On rare occasions, they may experiment with predictive and prescriptive analytics
directly. More likely, they may identify opportunities for the latter analytics and work
with data scientists to build those for future use.
Some examples of CAATs and audit procedures related to the descriptive, diagnostic,
predictive, and prescriptive analytics can be found in Table 6-2.
While many of these analyses can be performed using Excel, most CAATs are built on
generalized audit software (GAS), such as IDEA, ACL, or TeamMate Analytics. The
GAS software has two main advantages over traditional spreadsheet software. First, it
enables analysis of very large datasets. Second, it automates several common analytical
routines, so an auditor can click a few buttons to get to the results rather than writing a
complex set of formulas. GAS is also scriptable and enables auditors to record or
program common analyses that may be reused on future engagements.
Communicate Insights
Many analytics can be adapted to create an audit dashboard, particularly if the firm has
adopted continuous auditing. The primary output of CAATs is evidence used to validate
assertions about the processes and data. This evidence should be included in the audit
workpapers.
Track Outcomes
The detection and resolution of audit exceptions may be a valuable measure of the
efficiency and effectiveness of the internal audit function itself. Additional analytics may
track the number of exceptions over time and the time taken to report and resolve the
issues. For the CAATs involved, a periodic validation process should occur to ensure that
they continue to function as expected.
PROGRESS CHECK
1. Using Table 6-2 as a guide, compare and contrast descriptive and diagnostic
analytics. How might these be used in an audit?
2. In a continuous audit, how would a dashboard help to communicate audit
findings and spur a response?
In this and the next few sections, we’ll present some examples of procedures that
auditors commonly use to evaluate enterprise data. In these examples, we show the basic
process for Excel, including formulas, and IDEA. Note that in the Excel formulas, we
identify data elements in [brackets]. To use these formulas, replace the bracketed [data
element] with a value or range of values as appropriate. For example, [Aging date] would
be replaced with C3 if the data are in column C, row 3.
Age Analysis
Aging of accounts receivable and accounts payable helps determine the likelihood that a
balance will be paid. This substantive test of account balances evaluates the date of an
order and groups it into buckets based on how old it is, typically 0–30, 31–60, 61–90,
and >90 days, or similar. See Table 6-3 for an example. Extremely old accounts that haven’t been
resolved or written off should be flagged for follow-up by the auditor. It could mean that
(1) the data are bad, (2) a process is broken, (3) there’s a reason someone is holding that
account open, or (4) it was simply never resolved.
There are many ways to calculate aging in Excel, including using pivot tables. If you
have a simple list of accounts and balances, you can calculate a simple age of accounts in
Excel using the following procedure.
Data
• Customer/vendor name
• Unpaid order number
• Order date
• Amount
In Excel
1. Open your worksheet.
2. Add a cell with the aging date.
3. Add a calculated column for the days outstanding: =[Aging date]-[Order date].
4. Add four new calculated columns for the buckets:
a. 0–30 days: =IF([Aging date]-[Order date]<=30,[Amount],0).
b. 31–60 days: =IF(AND([Aging date]-[Order date]<=60,[Aging date]-[Order date]>30),[Amount],0).
c. 61–90 days: =IF(AND([Aging date]-[Order date]<=90,[Aging date]-[Order date]>60),[Amount],0).
d. >90 days: =IF([Aging date]-[Order date]>90,[Amount],0).
5. Copy the formulas for all records.
6. Add a total to the bottom of each bucket: =SUM([bucket column]).
In IDEA
1. Open your worksheet.
2. Go to Analysis > Categorize > Aging.
3. Select aging date, field containing transaction date, and amount for the field to
total amount.
4. Click OK.
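The bucket logic in the Excel procedure can also be sketched in Python. This is a minimal illustration rather than a prescribed audit tool: the function names (`age_bucket`, `age_analysis`) and the sample orders are hypothetical, and the bucket boundaries mirror the 0–30, 31–60, 61–90, and >90 day ranges described above.

```python
from datetime import date

def age_bucket(aging_date, order_date):
    """Return the aging bucket label for one unpaid order."""
    days = (aging_date - order_date).days
    if days <= 30:
        return "0-30"
    if days <= 60:
        return "31-60"
    if days <= 90:
        return "61-90"
    return ">90"

def age_analysis(orders, aging_date):
    """Total the unpaid amounts in each aging bucket."""
    totals = {"0-30": 0.0, "31-60": 0.0, "61-90": 0.0, ">90": 0.0}
    for order in orders:
        bucket = age_bucket(aging_date, order["order_date"])
        totals[bucket] += order["amount"]
    return totals

# Hypothetical unpaid orders, for illustration only.
orders = [
    {"customer": "A", "order_no": 1001, "order_date": date(2023, 12, 15), "amount": 500.0},
    {"customer": "B", "order_no": 1002, "order_date": date(2023, 11, 10), "amount": 1200.0},
    {"customer": "C", "order_no": 1003, "order_date": date(2023, 10, 20), "amount": 300.0},
    {"customer": "D", "order_no": 1004, "order_date": date(2023, 6, 1), "amount": 950.0},
]

totals = age_analysis(orders, aging_date=date(2023, 12, 31))
for bucket, amount in totals.items():
    print(f"{bucket:>6} days: {amount:,.2f}")
```

Any balance that lands in the >90 bucket, like customer D here, would be a candidate for auditor follow-up.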
Sorting
Sometimes, simply viewing the largest or smallest values can provide meaningful insight.
Sorting in ascending order shows the smallest number values first. Sorting in descending
order shows the largest values first.
Data
• Any numerical, date, or text data of interest
In Excel
1. Open your worksheet.
2. Select the data you wish to sort.
3. Go to Home > Format as Table.
4. Click the drop-down arrow next to the header or the column you want to sort.
5. Click Sort A to Z for ascending order or Sort Z to A for descending order.
In IDEA
1. Open your data table.
2. Go to Data > Order > Sort.
3. Choose your fields and direction, Ascending or Descending.
4. Click OK.
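For auditors working outside Excel or IDEA, the same ascending and descending sorts can be sketched in a few lines of Python. The transaction records below are hypothetical and exist only to show the mechanics.

```python
# Hypothetical transactions, for illustration only.
transactions = [
    {"id": "T1", "amount": 250.00},
    {"id": "T2", "amount": 9800.00},
    {"id": "T3", "amount": -40.00},
    {"id": "T4", "amount": 1200.00},
]

# Ascending order shows the smallest values first.
smallest_first = sorted(transactions, key=lambda t: t["amount"])

# Descending order shows the largest values first.
largest_first = sorted(transactions, key=lambda t: t["amount"], reverse=True)

print("Smallest:", smallest_first[0])
print("Largest: ", largest_first[0])
```

Sorting ascending also surfaces negative amounts quickly, which may point to entries made in error.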
Summary Statistics
Summary statistics provide insight into the relative size of a number compared with the
population. The mean indicates the average value, while the median gives the middle
value when all the transactions are lined up in order. The min shows the smallest value,
while the max shows the largest. Finally, a count tells how many records exist, while the
sum adds up the values to find a total. Once summary statistics are calculated, you have a
reference point for an individual record. Is the amount above or below average? What
percentage of the total does a group of transactions make up?
Data
• Any numerical data, such as a dollar amount or quantity
In Excel
1. Open your workbook.
2. Add the following calculated values:
• Mean: =AVERAGE([range]).
• Median: =MEDIAN([range]).
• Minimum: =MIN([range]).
• Maximum: =MAX([range]).
• Count: =COUNT([range]).
• Sum: =SUM([range]).
3. Alternatively, format your data as a table and show the total row at the bottom:
a. Select your data.
b. Go to Home > Styles > Format as Table.
c. Select a table style and click OK.
d. Go to Table Tools > Design > Table Style Options and click the Total Row box.
e. Click the drop-down arrow next to the column total value that appears, and choose
an appropriate statistic.
In IDEA
1. Open your worksheet.
2. In the Properties pane on the right, click Field Statistics.
3. Allow IDEA to calculate all uncalculated fields, if prompted.
4. In the output screen, you can click any blue number to locate those transactions.
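As a rough Python counterpart to the Excel formulas, the sketch below computes the same six statistics with the standard library. The `summarize` helper and the sample amounts are hypothetical.

```python
import statistics

def summarize(values):
    """Return the six summary statistics described in this section."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "count": len(values),
        "sum": sum(values),
    }

# Hypothetical transaction amounts, for illustration only.
amounts = [120.0, 75.5, 300.0, 42.25, 980.0]
stats = summarize(amounts)

for name, value in stats.items():
    print(f"{name:>6}: {value}")

# Share of the total made up by one transaction of interest.
share = 980.0 / stats["sum"]
print(f"Largest item is {share:.1%} of the total")
```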
Sampling
Sampling is useful when you have manual audit procedures, such as testing transaction details
or evaluating source documents. The idea is that if the sample is an appropriate size, the
features of the sample can be confidently generalized to the population. So, if the sample
has no errors (misstatement), then the population is unlikely to have errors as well. Of
course, sampling has its limitations. The confidence level is not a guarantee that you won’t
miss something critical like fraud. But it does limit the scope of the work the auditor
must perform.
There are three determinants for sample size: confidence level, tolerable misstatement,
and estimated misstatement.
Data
• Any list of transactions or master data
In Excel
1. Enable Analysis ToolPak:
a. Go to File > Options > Add-ins > Excel Add-ins > Go.
b. Check the box next to Analysis ToolPak, and click OK.
2. Go to Data > Analysis > Data Analysis.
3. Click Sampling, then OK.
a. Select your input range, usually the transaction number.
b. Choose Random, and input the number of samples.
c. Click OK.
4. A new worksheet will appear with a list of your randomly selected transactions.
In IDEA
1. Open your worksheet.
2. Go to Analysis > Sample > Random.
a. Input number of records to select for your sample size.
b. Change other values as needed.
c. Click OK.
3. A new worksheet will be created with your random sample.
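A simple random selection can also be sketched in Python. This illustration draws without replacement and takes the sample size as given; it does not compute the size from the confidence level, tolerable misstatement, and estimated misstatement. The `random_sample` helper and the transaction numbers are hypothetical.

```python
import random

def random_sample(transaction_ids, sample_size, seed=None):
    """Pick sample_size distinct transactions at random.

    Passing a seed makes the selection repeatable, which helps when the
    sample must be documented in the audit workpapers.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(transaction_ids, sample_size))

# Hypothetical population of 500 transaction numbers.
population = list(range(1001, 1501))
sample = random_sample(population, sample_size=25, seed=42)
print(sample)
```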
Monetary unit sampling (MUS) allows auditors to evaluate account balances. MUS
is more likely to pull accounts with large balances (higher risk and exposure) because it
focuses on dollars, not account numbers.
Data
• The book value of the financial accounts you’re evaluating
• The sample size
In Excel
1. Find the sampling interval. Divide the book value by sample size.
a. For example, 1,000,000/132 ≈ 7,575 (the sampling interval).
2. Sort the financial accounts in some type of sequence, and calculate a
cumulative balance.
a. Alphabetically by name.
b. Numerically by number.
c. By date.
3. Pick a random number between 1 and your sampling interval.
a. This will be the starting value. For example, 1,243.
4. Go down the list of cumulative balances until you pass your random number.
a. For example, test the first account that passes 1,243.
5. Continue down the list of cumulative balances until you pass the next sampling
interval.
a. For example, test the second account that passes 1,243 + 7,575 = 8,818.
6. Repeat step 5 until you run out of accounts.
a. 8,818 + 7,575 = 16,393; 16,393 + 7,575 = 23,968 . . .
In IDEA
1. Open your data table.
2. Go to Analysis > Sample > Monetary Unit > Plan.
a. Choose your monetary value field.
b. Set your confidence level, tolerable error, and expected error.
c. Click Estimate to calculate your sample size.
d. Adjust other values as needed, then click Accept.
e. Click OK.
3. A new worksheet will appear with your sample transactions.
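The manual MUS selection steps can be sketched in Python as follows. This is one possible reading of the procedure, under stated assumptions: an account is selected once even if several selection points fall inside its balance, and the `monetary_unit_sample` function, its `start` parameter, and the account data are all hypothetical.

```python
import random

def monetary_unit_sample(accounts, sample_size, start=None, seed=None):
    """Select accounts by walking cumulative balances at a fixed interval.

    accounts: list of (name, balance) pairs, already sorted in some sequence.
    start: starting value between 1 and the interval; random if omitted.
    """
    book_value = sum(balance for _, balance in accounts)
    interval = book_value / sample_size
    if start is None:
        start = random.Random(seed).uniform(1, interval)

    selected = []
    cumulative = 0.0
    target = start
    for name, balance in accounts:
        cumulative += balance
        hit = False
        # An account is tested when the cumulative balance passes the target.
        while target <= cumulative:
            hit = True
            target += interval
        if hit:
            selected.append(name)
    return interval, selected

# Hypothetical accounts sorted alphabetically, for illustration only.
accounts = [("A", 500), ("B", 3000), ("C", 200), ("D", 6000), ("E", 300)]
interval, selected = monetary_unit_sample(accounts, sample_size=4, start=1243)
print(interval, selected)
```

In this small example only the two large accounts are selected, which illustrates why MUS tends to pull high-balance items.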
PROGRESS CHECK
3. What type of descriptive analytics would you use to find negative numbers that
were entered in error?
4. How does monetary unit sampling help you isolate the items of greatest
potential significance to an auditor in evaluating materiality?
Z-Score
A standard score or Z-score is a concept from statistics that assigns a value to a number
based on how many standard deviations it stands from the mean, shown in Exhibit 6-1.
By setting the mean to 0, you can see how far a point of interest is above or below it. For
example, a point with a Z-score of 2.5 is two-and-a-half standard deviations above the
mean. Because most values that come from a large population tend to be normally
distributed (frequently skewed toward smaller values in the case of financial
transactions), nearly all (about 99.7 percent) of the values should be within plus-or-minus three
standard deviations. If a value has a Z-score of 3.9, it is very likely an outlier that
warrants scrutiny.
In Excel
1. Calculate the average: =AVERAGE([range]).
2. Calculate the standard deviation: =STDEVPA([range]).
3. Add a new column called “Z-score” next to your number range.
4. Calculate the Z-score: =STANDARDIZE([value],[mean],[standard deviation])
a. Alternatively: =([value]–[mean])/[standard deviation].
5. Sort your values by Z-score in descending order.
In IDEA
• Z-score calculation is not a default feature of IDEA.
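Where neither tool is convenient, the Z-score calculation can be sketched in Python. The example assumes the population standard deviation (the counterpart of Excel's STDEVPA on purely numeric data); the `z_scores` helper and the amounts are hypothetical.

```python
import statistics

def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std_dev for v in values]

# Hypothetical amounts with one unusually large value.
amounts = [10, 12, 11, 9, 10, 11, 10, 12, 9, 100]
scores = z_scores(amounts)

# Sort by Z-score in descending order so likely outliers appear first.
ranked = sorted(zip(amounts, scores), key=lambda pair: pair[1], reverse=True)
for amount, z in ranked[:3]:
    print(f"{amount:>5}  z = {z:.2f}")
```

Note that in a very small dataset a single extreme value inflates the standard deviation, so its Z-score is capped; the technique is more informative on large populations.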
Benford’s Law
Benford’s law states that when you have a large set of naturally occurring numbers, the
leading significant digit will likely be small. The economic intuition behind it is that
people are more likely to make $10, $100, or $1,000 purchases than $90, $900, or $9,000
purchases. This law has been shown in many settings, such as the amount of electricity
bills, street addresses, and GDP figures from around the world (as shown in Exhibit 6-2).
EXHIBIT 6-2
[Bar chart: leading-digit frequencies (digits 1–9) for Purchases and GDP 2016 compared with Benford’s Predicted distribution.]
In auditing, we can use Benford’s law to identify transactions or users with nontypical
activity based on the distribution of the first digit of the number. For example, assume
that purchases over $500 require manager approval. A cunning employee might try to
make large purchases that are just under the approval limit to avoid suspicion. She will
even be clever and make the numbers look random: $495, $463, $488, etc. What she
doesn’t realize is that the frequency of the leading digit 4 is going to be much higher
than it should be, shown in Exhibit 6-3. Benford’s law can also detect random computer-
generated numbers because those will have equally distributed first digits.
We show an illustration of how to evaluate data and their frequency with respect to
Benford’s law in both Excel and IDEA.
EXHIBIT 6-3
Using Benford’s Law
Structured purchases may look normal, but they alter the distribution under Benford’s law.
[Bar chart: leading-digit frequencies (digits 1–9) for Purchases compared with Benford’s Predicted distribution.]
Data
• Large set of numerical data, such as monetary amounts or quantities
In Excel
1. Open your spreadsheet.
2. Add a new column and extract the leading digit: =LEFT([Amount],1).
3. Create a frequency distribution:
a. Create a list on your sheet using the values shown in Table 6-4.
In IDEA
1. Open your worksheet.
2. Go to Analysis > Explore > Benford’s Law.
a. Choose the numerical field to analyze.
b. Only check First digit. Uncheck everything else.
c. Click OK.
3. A graph will appear with the Benford’s expected amount and the actual frequency
of the dataset.
4. Click any digits that are significantly above the bounds and choose Extract Records.
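The first-digit comparison can also be sketched in Python. The expected frequency of leading digit d under Benford's law is log10(1 + 1/d), a standard result. The helper names and the purchase amounts are hypothetical, and the amounts are deliberately clustered just under a $500 approval limit.

```python
import math
from collections import Counter

def leading_digit(amount):
    """Return the first significant digit of a nonzero number."""
    # Scientific notation always puts the first significant digit first.
    return int(f"{abs(amount):.15e}"[0])

def benford_expected(digit):
    """Expected frequency of a leading digit under Benford's law."""
    return math.log10(1 + 1 / digit)

def first_digit_distribution(amounts):
    """Actual frequency of each leading digit 1-9, ignoring zeros."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

# Hypothetical purchases, several just under a $500 approval limit.
purchases = [495, 463, 488, 120, 35, 210, 499, 18, 472, 150]
actual = first_digit_distribution(purchases)

for d in range(1, 10):
    print(f"{d}: actual {actual[d]:.1%}  expected {benford_expected(d):.1%}")
```

With only ten records the comparison is purely illustrative; Benford's analysis is meaningful on large sets of naturally occurring numbers.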
Bonus: Use the average expected Benford’s law value to identify specific employees
with abnormally large transactions. In this case, a user with lots of transactions should
have an average expected Benford’s law percentage of 11.1 percent or above. Employees
whose average purchases are closer to 8 or 5 percent have a lot of 7, 8, and 9 values that
are skewing their average.
In Excel
1. Open your spreadsheet with financial data that contain an employee name and
transaction amount.
2. Add a new column and extract the leading digit:
=NUMBERVALUE(LEFT([Amount],1))
3. Add the expected Benford’s law percentages to your sheet, similar to Table 6-5:
TABLE 6-5 Expected Benford’s Law Percentages
Digit    Benford Expected %
1        30.1%
2        17.6%
3        12.5%
4        9.7%
5        7.9%
6        6.7%
7        5.8%
8        5.1%
9        4.6%
4. Add a new column next to your data to look up the expected Benford’s law percentage
for your value: =INDEX([Benford Expected %], MATCH([Value],[Digit],0)).
5. Create a PivotTable to see the average % by user:
a. Select your data.
b. Go to Insert > Tables > PivotTable.
c. Click OK to add the PivotTable to a new sheet.
In IDEA
• This analysis is not possible with IDEA’s built-in tools.
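The per-employee average can be sketched in Python as an alternative to the lookup-and-PivotTable approach. Each transaction is mapped to the Benford expected percentage of its leading digit, and the percentages are averaged by employee. The function names, employee names, and amounts are hypothetical.

```python
import math
from collections import defaultdict

def leading_digit(amount):
    """Return the first significant digit of a nonzero number."""
    return int(f"{abs(amount):.15e}"[0])

def average_expected_by_employee(transactions):
    """Average the Benford expected frequency of each employee's leading digits."""
    expected = defaultdict(list)
    for employee, amount in transactions:
        digit = leading_digit(amount)
        expected[employee].append(math.log10(1 + 1 / digit))
    return {emp: sum(vals) / len(vals) for emp, vals in expected.items()}

# Hypothetical (employee, amount) pairs, for illustration only.
transactions = [
    ("Ana", 120.0), ("Ana", 1530.0), ("Ana", 245.0),
    ("Ben", 910.0), ("Ben", 875.0), ("Ben", 799.0),
]

averages = average_expected_by_employee(transactions)
# List employees from lowest to highest average; low values merit a closer look.
for employee, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{employee}: {avg:.1%}")
```

Here the employee whose amounts all begin with 7, 8, or 9 shows an average near 5 percent, well below the 11.1 percent reference point mentioned above.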
Drill Down
The most modern Data Analytics software allows auditors to drill down into specific
values by simply double-clicking a value. This lets you see the underlying transactions
that gave you the summary amount. For example, you might click the total sales amount
in an income statement to see the sales general ledger summarizing the daily totals. Click
a daily amount to see the individual transactions from that day.
Fuzzy Matching
Data needed
• Two tables/sheets with a common attribute, such as a primary key/foreign key, name,
or address
In Excel
1. Search the Internet for Fuzzy Lookup Add-In for Excel, then download and install it
to your computer.
2. Open your spreadsheet with two sheets you’d like to join using a fuzzy match. For
example, employees and vendors.
3. Go to Fuzzy Lookup > Fuzzy Lookup (Go to File > Options > Add-ins > COM Add-ins
> Go. . . and check Fuzzy Lookup Add-in For Excel if you don’t see the bar).
a. Select the sheet you want for the Left Table and a sheet that has similar values
for the Right Table.
b. Choose the columns that you expect to find matching values in the Left and
Right Columns pane. Note: For addresses, choose Address AND Zip Code for
more likely matches.
c. Select your output columns, if needed.
d. Adjust the similarity threshold, if needed.
e. Open a new worksheet.
f. Click Go.
4. Evaluate the similarity.
In IDEA
1. Fuzzy matching isn’t available by default in IDEA.
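A rough idea of fuzzy matching can be given in Python using the standard library's `difflib.SequenceMatcher`. This is not the algorithm used by the Fuzzy Lookup add-in; it is a substitute technique chosen only to illustrate matching with a similarity threshold. The `fuzzy_join` helper, the 0.8 threshold, and the addresses are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio between two strings, from 0 (different) to 1 (identical)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def fuzzy_join(left, right, threshold=0.8):
    """Pair every left value with every right value at or above the threshold."""
    matches = []
    for l_value in left:
        for r_value in right:
            score = similarity(l_value, r_value)
            if score >= threshold:
                matches.append((l_value, r_value, round(score, 2)))
    return sorted(matches, key=lambda m: m[2], reverse=True)

# Hypothetical employee and vendor addresses, for illustration only.
employee_addresses = ["123 Main Street", "45 Oak Avenue"]
vendor_addresses = ["123 Main St.", "900 Pine Road"]

matches = fuzzy_join(employee_addresses, vendor_addresses)
for left, right, score in matches:
    print(f"{left!r} ~ {right!r} (similarity {score})")
```

An employee address that nearly matches a vendor address does not prove wrongdoing, but it is the kind of result an auditor would evaluate further.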
Sequence Check
Another substantive procedure is the sequence check. This is used to validate data
integrity and test the completeness assertion, making sure that all relevant transactions are
accounted for. Simply put, sequence checks are useful for finding gaps, such as a missing
check in the cash disbursements journal, or duplicate transactions, such as duplicate
payments to vendors. This is a fairly simple procedure that can be deployed quickly and
easily with great success.
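A sequence check is straightforward to sketch in Python. The example below assumes the document numbers are integers that should run consecutively; the `sequence_check` helper and the check numbers are hypothetical.

```python
from collections import Counter

def sequence_check(numbers):
    """Return (gaps, duplicates) for a list of integer document numbers."""
    counts = Counter(numbers)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    full_range = range(min(numbers), max(numbers) + 1)
    gaps = [n for n in full_range if n not in counts]
    return gaps, duplicates

# Hypothetical check numbers from a cash disbursements journal.
check_numbers = [1001, 1002, 1004, 1005, 1005, 1007]
gaps, duplicates = sequence_check(check_numbers)

print("Missing numbers:  ", gaps)
print("Duplicate numbers:", duplicates)
```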
In Excel
PROGRESS CHECK
5. A sequence check will help us to see if there is a duplicate payment to
vendors. Why is that important for the auditor to find?
6. Let’s say a company has nine divisions, and each division has a different
check number based on its division—so one starts with “1,” another with “2,”
etc. Would Benford’s law work in this situation?
Regression
Regression allows an auditor to predict a specific dependent value based on independent
variable inputs. In other words, what would we expect behavior to be given some inputs
and does that match reality? In auditing, we could evaluate overtime booked for workers
against productivity or the value of inventory shrinkage given environmental factors.
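To make the idea concrete, the sketch below fits a simple one-variable least-squares line in Python and flags the observation that departs most from the prediction. It is only an illustration: the `fit_line` and `residuals` helpers, the choice of units produced as the independent variable, and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    """Actual value minus predicted value for each observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data: units produced and overtime hours booked per worker.
units = [100, 120, 140, 160, 180]
overtime = [10, 12, 25, 16, 18]

slope, intercept = fit_line(units, overtime)
diffs = residuals(units, overtime, slope, intercept)

# The largest gap between expectation and reality is the first item to review.
worst = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
print(f"Worker {worst}: booked {overtime[worst]}, "
      f"expected {intercept + slope * units[worst]:.1f}")
```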
Classification
Classification in auditing is going to be mainly focused on risk assessment. The predicted
classes may be low risk or high risk, where an individual transaction is classified in either
group. In the case of known fraud, auditors would classify those cases or transactions as
fraud/not fraud and develop a classification model that could predict whether similar
transactions might also be potentially fraudulent.
There is a longstanding classification method used to predict whether a company is
expected to go bankrupt or not. Altman’s Z is a calculated score that helps predict
bankruptcy and might be useful for auditors to evaluate a company’s ability to continue as a
going concern.
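The chapter does not give the formula, so the sketch below assumes the commonly cited original Altman Z-score for publicly traded manufacturers, Z = 1.2·X1 + 1.4·X2 + 3.3·X3 + 0.6·X4 + 1.0·X5, with the usual cutoffs of 1.81 and 2.99. Other variants of the model use different weights. The function names and the financial figures are hypothetical.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Original Altman Z-score (assumed formulation, see lead-in)."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

def classify(z):
    """Classify a score using the commonly cited cutoffs."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "gray"

# Hypothetical company figures, for illustration only.
z = altman_z(working_capital=200, retained_earnings=300, ebit=150,
             market_value_equity=800, sales=1200,
             total_assets=1000, total_liabilities=500)
zone = classify(z)
print(f"Z = {z:.3f} ({zone})")
```

The score turns a set of ratios into one of a few classes, which is why bankruptcy prediction is treated as a classification problem.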
When using classification models, it is important to remember that large training sets
are needed to generate relatively accurate models. Initially, this requires significant manual
classification by the auditors or business process owner so that the model can be useful for
the audit.
Probability
When talking about classification, the strength of the class can be important to the
auditor, especially when trying to limit the scope (e.g., evaluate only the 10 riskiest
transactions). Classifiers that use a rank score can identify the strength of classification
by measuring the distance from the mean. That rank order focuses the auditor’s efforts on
the items of potentially greatest significance.
Sentiment Analysis
Sentiment analysis evaluates text (e.g., a 10-K or annual report) for positive or negative
sentiment to predict positive or negative outcomes or to look for potential bias on
management’s part. There is more discussion on sentiment analysis in chapter 8.
Applied Statistics
Additional mixed distributions and nontraditional statistics may also provide insight to
the auditor. For example, an audit of inventory may reveal errors in the amount recorded
in the system. The difference between the erroneously recorded amounts and the actual
amounts may provide some valuable insight into how significant or material the problem
may be. Auditors can plot the frequency distribution of errors and use Z-scores to home in
on the cause of the most significant or outlier errors.
Artificial Intelligence
As the audit team generates more data and takes specific action, the action itself can
be modeled in a way that allows an algorithm to predict expected behavior. Artificial
intelligence is designed around the idea that computers can learn about action or
behavior from the past and predict the course of action for the future. Assume that an
experienced auditor questions management about the estimate of allowance for doubtful
accounts. The human auditor evaluates a number of inputs, such as the estimate
calculation, market factors, and the possibility of income smoothing by management. Given
these inputs, the auditor decides to challenge management’s estimate. If the auditor
consistently takes this action and it is recorded by the computer, the computer learns
from this action and makes a recommendation when a new inexperienced auditor faces
a similar situation.
Decision support systems that accountants have relied upon for years (e.g., TurboTax)
are based on a formal set of rules and then updated based on what the user decides given
several choices. Artificial intelligence can be used as a helpful assistant to auditors and
may potentially be called upon to make judgment decisions itself.
Additional Analyses
The list of Data Analytics presented in this chapter is not exhaustive by any means. There
are many other approaches to identifying interesting patterns and anomalies in enterprise
data. Many ingenious auditors have developed automated scripts that can simplify several
of the audit tasks presented here. Excel add-ins like TeamMate Analytics provide many
different techniques that apply specifically to the audit of fixed assets, inventory, sales
and purchase transactions, etc. Auditors will combine these tools with other techniques,
such as periodically testing the effectiveness of automated tools by adding erroneous or
fraudulent transactions, to enhance their audit process.
PROGRESS CHECK
7. Why would a bankruptcy prediction be considered classification? And why
would it be useful to auditors?
8. If sentiment analysis is used on a product advertisement, would you guess the
overall sentiment would be positive or negative?
Summary
This chapter discusses a number of analytical techniques that auditors use to gather
insight about controls and transaction data. These include descriptive analytics that are
used to summarize and gain insight into the data, diagnostic analytics that identify
patterns in the data that may not be immediately obvious, predictive analytics that look
for common attributes of problematic data to help identify similar events in the future,
and prescriptive analytics that provide decision support to auditors as they work to
resolve issues with the processes and controls.
Key Words
computer-assisted audit techniques (CAATs) (212) Computer-assisted audit techniques
(CAATs) are automated scripts that can be used to validate data, test controls, and enable substantive
testing of transaction details or account balances and generate supporting evidence for the audit.
descriptive analytics (212) Descriptive analytics summarize activity or master data elements
based on certain attributes.
diagnostic analytics (212) Diagnostic analytics looks for correlations or patterns of interest in the data.
fuzzy matching (213) Fuzzy matching finds matches that may be less than 100 percent matching by
finding correspondences between portions of the text or other entries.
monetary unit sampling (MUS) (218) Monetary unit sampling allows auditors to evaluate account
balances. MUS is more likely to pull accounts with large balances (higher risk and exposure) because it
focuses on dollars, not account numbers.
predictive analytics (212) Predictive analytics attempt to find hidden patterns or variables that are
linked to abnormal behavior.
prescriptive analytics (212) Prescriptive analytics use machine learning and artificial intelligence
to provide decision support that assists auditors in finding potential issues in future audits.