Auditing Data Contained in EXLs
Data contained in
Excel
Worksheets
Audit Commander
Audit Guide
Data analysis made easier…
EZ-R Stats, LLC
Auditing data on Excel worksheets
Document History
Revision History
Revision Number   Revision Date   Summary of Changes                          Author
1.0               10-17-2009      Initial Version                             M. Blakley
1.1               11-12-2009      Trend Line and additional error checking.   M. Blakley
                                  New style of input form.
Table of Contents
1.3 Purpose
1.4 Scope
2 GETTING STARTED
4 AUDIT COMMANDS
4.1 Numeric
4.1.1 Population Statistics
4.1.2 Round Numbers
4.1.3 Benford’s Law
4.1.4 Stratify
4.1.5 Summarization
4.1.6 Top and Bottom 10
4.1.7 Histograms
4.1.8 Box Plot
4.1.9 Random numbers
4.2 Date
4.2.1 Holiday Extract
4.2.2 Week days
4.2.3 Holiday summary
4.2.4 Ageing
4.2.5 Date Near
4.2.6 Date Range
4.2.7 Week days Report
4.3 Other
4.3.1 Gaps in Sequences
4.3.2 Data Extraction
4.3.3 Duplicates
4.3.4 Same, Same, Different
4.3.5 Trend Lines
4.3.6 Time Line analysis
4.3.7 Confidence Band
4.3.8 Confidence Band (Time Series)
4.3.9 Invoice Near Miss
4.3.10 Split Invoices
4.3.11 Check SSN
4.3.12 Check PO Box
4.4 Patterns
4.4.1 Round Numbers
4.4.2 Data Stratification
4.4.3 Day of Week
4.4.4 Holidays
4.4.5 Benford’s Law
4.5 Sampling
4.5.1 Attributes – Unrestricted: Stop and Go
4.5.2 Variable Sampling – Unrestricted Stop and Go
4.5.3 Stratified Variable Sampling – Population
4.5.4 Stratified Variable Sampling – Assessment
4.5.5 Stratified Attribute Sampling – Population
4.5.6 Stratified Attribute Sampling – Assessment
5.1 Overview
5.3 An example
5.6 An example
6.1 Numeric
6.2 Text
6.3 Date / Time
6.5 Combinations
• Chapter 1 – Overview
• Auditors: can use the software for a variety of common audit tasks. Altogether, over
40 useful analytical audit functions are included.
• Command and option names appear in bold type in definitions and examples.
1.3 Purpose
The purpose of this monograph is to provide a practical guide to auditing data contained on
Excel worksheets using the Audit Commander. Over 40 useful audit tests and data analyses
can be performed. Although the primary source of data will be that contained on Excel
worksheets, the techniques described also apply to certain other data sources such as Excel
workbooks, Access databases, and text files that are in a specific format (“tab separated
values”).
The auditor does not need special computer skills in order to be able to perform these tests
because they are largely menu driven with “fill in the blanks”.
Development of the software began in August 2005 when the author searched fruitlessly for a
relatively easy to use, economical software package for analyzing data on Excel worksheets
(and other sources). During its development, suggestions and improvements were made by a
variety of audit practitioners.
More information about the system is available from the website. More information is also
available about the author.
1.4 Scope
This guide explains how to install the software, the general purpose of the functions provided, as
well as examples of use.
2 Getting Started
The worksheet analyzer is generally used to analyze all or portions of single Excel
spreadsheets. However, it can also be used to analyze data contained within MS-Access
databases, as well as text files in various formats (e.g. comma separated values, tab separated
values, print format, etc.).
The worksheet analyzer derives much of its capabilities by leveraging the software provided by
Microsoft called “ActiveX Data Objects” which provides significant database capabilities. These
database capabilities are in turn incorporated into and used by the software to provide a variety
of capabilities of special interest to auditors and data analysts.
• Significantly reduced time required to perform more complex extracts and analyses
• Computations for attribute sampling are slow with populations > 1,000
Although the software is a stand-alone program, by design it is intended for use with Excel, and
is small enough that the form can reside alongside the Excel workbook which contains the data
to be examined. This is done by keeping both the Excel workbook and the Audit Commander
form open on the same screen. This makes it easier to transfer data back and forth between
the systems while doing a review.
An example screen shot is shown below to illustrate a case where a range of data on the
worksheet is being analyzed.
By intentionally keeping the Audit Commander form small, it becomes easier to transfer the
information from the Excel workbook to the form, analyze the data and then “paste” the results
back into the worksheet.
The “commands” menu item is used to select the command or type of analysis to be performed.
The remaining menu items are “forms” which are used to gather and process information. A
summary description of the purpose of each form is provided in the table below.
The typical sequence used for running an audit analysis of data on a worksheet is as follows:
1. If not already done, specify the location where the audit results are to be stored, along
with the audit title, audit step number, etc. (“Audit” form)
2. Select the type of analysis to be performed (menu of 40+ commands)
3. Select the data to be analyzed, the columns or rows to be tested, along with any
additional information required for the analysis (“Clipboard/MS/Text” form)
4. If specific criteria are to be used (i.e. the test is for an extract of the data), specify this
information (“Where” tab)
5. If the data to be tested is from the clipboard, then copy the data to be tested from the
worksheet. This is done by first highlighting the data, then copying it to the clipboard
using methods such as 1) keyboard combination “Control-C”, 2) menu selection
“Edit|Copy”, or 3) right mouse click and select “Copy”. (“Clipboard” form)
6. On the tab labeled “Form”, click the button labeled “Run” (“Clipboard” form)
7. Wait until the analysis is finished, as indicated with a status message on the Status Bar
of the Audit Commander form. (“Clipboard” form)
8. View the report (“Report” tab)
9. If desired, the output in the audit folder specified may also be viewed. This includes both
a text report as well as any charts prepared (if applicable).
10. Analysis report results can also be copied to the clipboard (“Report” tab)
11. Change audit parameters or specify different tests and repeat the steps above
Note: If the data to be tested resides in an Excel workbook, Access database or text file,
then “MS” or “File” tabs are used instead.
Clicking on the “Audit” tab displays the information used to store the results for the analysis
performed. If any of this information needs to be changed, it can be overtyped and then the
button labeled “Update” clicked to store the information. The folder shown (in this case
“C:\test\temp\”) is the location where the reports and graphics produced by the audit analysis will
be stored. The folder name can be selected by clicking on the button labeled “Folder”, or else
overtyping the name in the text box.
The step number is used to uniquely identify the output. The starting step number is shown
above, and will be increased by one every time a procedure is run.
Once the information has been entered, click on the button labeled “Update” to save the
information. An informational message will be displayed on the status bar to acknowledge that
the change has been applied. This change will be in effect until the next change is applied.
Warning: Existing report files and graphics can be overwritten if the starting step number is too
low.
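The relationship between the audit folder, the step number and the output file names can be sketched in a few lines of Python. This is only an illustration, not the Audit Commander's own code: the function names are hypothetical, and the "step-N.txt" / "step-N.png" pattern is inferred from the file names shown in the examples in this guide.

```python
from pathlib import PureWindowsPath


def output_paths(audit_folder, step):
    """Return (report, chart) paths for one step.

    Assumption: files are named step-N.txt and step-N.png, as in the
    examples shown in this guide.
    """
    folder = PureWindowsPath(audit_folder)
    return folder / f"step-{step}.txt", folder / f"step-{step}.png"


def steps_at_risk(start_step, existing_steps, runs_planned):
    """List the step numbers whose existing output would be overwritten.

    The step number increases by one for every procedure run, so a
    starting step that is too low collides with earlier output.
    """
    planned = range(start_step, start_step + runs_planned)
    return [step for step in planned if step in existing_steps]


report, chart = output_paths(r"c:\test\temp", 2)
print(report)  # c:\test\temp\step-2.txt
print(chart)   # c:\test\temp\step-2.png
print(steps_at_risk(2, {1, 2, 3}, 3))  # [2, 3]
```

In this sketch, starting at step 2 when output for steps 1 to 3 already exists would overwrite the files for steps 2 and 3, which is the situation the warning describes.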
Once the audit parameter information has been entered (or checked), the data analysis
procedures can be performed. If the data to be analyzed is contained on an Excel worksheet,
then the analysis process begins with the first tab, which is labeled “Form”.
Note: If data in Excel workbooks, Access databases or text files are to be analyzed, the
tabs “MS” and “File” should be used instead.
The first step is to select the data to be analyzed. This is done by highlighting the area on the
worksheet to be analyzed and then copying it to the clipboard using any of four methods:
Often, the data to be reviewed will be in vertical format as shown here. However, in some cases
the data will be organized horizontally (e.g. in comparative financial statements). If the data is
organized horizontally, then the checkbox “rows” on the main form needs to be checked before
the data is “pasted” into the form.
Once the data to be analyzed has been copied to the clipboard, it can then be “pasted” onto the
Audit Commander form. If the first row of the data contains column names, then the
checkbox just below the “Paste” button must be checked. When the data is pasted onto the
Once the data has been pasted onto the form, the name of the first column is shown, and any
other column can be selected from the drop down list. For this test, the second column, named
“Cost” will be selected. The test to be performed will be to identify the three largest values. So
the command “Largest values” is selected from the command drop down list.
If the column name is blanked out, then all the data pasted will be processed in accordance with
the information below:
The option to process the entire area pasted is available only for those functions which normally
process only a single column of data (list is in the table below). Depending upon the function
selected, only numeric data, date data or all data will be processed. The type of data processed
is shown in the table below.
For commands which produce a chart, the chart title and chart colors can be specified using the
“Chart” tab.
Although all commands will produce a text file report, only certain commands will also prepare a
chart. Both the title of the chart and the color scheme used can be specified. The color scheme
can be specified in three formats:
1. “pre-set” scheme selected from the drop down list, e.g. “fall”
2. A range of colors between two specified values, e.g. brown – light tan (Note that a dash
separates the color names)
3. A range of colors specified for a numbered color group, e.g. turquoise 1 – 4. This is
equivalent to the specification turquoise 1 – turquoise 4, but shorter to type. Note that
only certain color names have color groups.
A complete list of color names accepted by the system and how they appear can be seen.
Examples of color ranges and how they appear can be seen – examples show a histogram and
use a chart title which specifies the color names used in the range. Two documents showing
examples are provided; both use predominantly harmonious color schemes. The first shows
color ranges for colors in a tight range (conservative). This is a PDF document of 251 pages
and is 8.4 MB in size. The second range of colors is less conservative, but still harmonious,
and is shown in a PDF document of 226 pages which has a size of 7.6 MB.
The case for chart colors can be either upper or lower case. Spaces are ignored. Thus the
following three specifications are equivalent:
• Turquoise 2
• TURQUOISe2
• Tur quoise 2
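The matching rules above (case ignored, spaces ignored, and the shorter form for numbered color groups) can be illustrated with a short Python sketch. The function names are hypothetical and the code only mirrors the rules as described in this guide; it does not check whether a name is one of the colors the system actually accepts.

```python
import re


def normalize_color(name):
    """Lower-case a color name and drop all spaces."""
    return re.sub(r"\s+", "", name).lower()


def expand_color_range(spec):
    """Split a range specification into (start, end) color names.

    Handles both 'brown - light tan' and the shorter numbered-group
    form 'turquoise 1 - 4'. A hyphen or an en dash may separate the
    two parts. A single pre-set scheme name such as 'fall' is not a
    range and is not handled here.
    """
    start, end = (normalize_color(part)
                  for part in re.split(r"[-–]", spec, maxsplit=1))
    if end.isdigit():
        # Shorter form: reuse the base name of the starting color.
        end = start.rstrip("0123456789") + end
    return start, end


print(normalize_color("Tur quoise 2"))          # turquoise2
print(expand_color_range("brown – light tan"))  # ('brown', 'lighttan')
print(expand_color_range("turquoise 1 – 4"))    # ('turquoise1', 'turquoise4')
```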
The next step is to select the command to be processed from the command menu. The
commands are organized by function type.
Once the command has been selected, a help message is displayed on the status bar indicating
what additional information is needed. If no additional information is needed, the status bar will
read “(No additional info)” and the info text box will not be displayed. However, if additional
information is required, the help message will be displayed on the status bar and the “Info” box
will be displayed. The resulting form is as follows:
The form now displays a fourth line called “Other info” and also displays an abbreviated help
message on the status bar: “number of values, e.g. 10”. The help message indicates that the
Other info is required and consists of a single value and the default value is “10”. In other
words, for the largest value test, the largest 10 items will be selected. In this case, we want only
the largest three values, so the number 3 is then typed into the “Other info” box.
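For readers who want to cross-check the result outside the tool, the “Largest values” test with its “Other info” setting can be approximated as follows. This is a minimal sketch under stated assumptions: the function name, the row layout and the sample amounts are invented for illustration, and only the default of 10 comes from the help message described above.

```python
import heapq


def largest_values(rows, column, other_info=""):
    """Return the rows with the largest values in `column`.

    `other_info` mimics the "Other info" box: the number of values
    wanted, defaulting to 10 when the box is left empty.
    """
    count = int(other_info) if other_info.strip() else 10
    return heapq.nlargest(count, rows, key=lambda row: row[column])


# Hypothetical fixed asset rows (not taken from the guide's worksheet).
assets = [
    {"Asset": "A-1", "Cost": 1200.00},
    {"Asset": "A-2", "Cost": 98000.00},
    {"Asset": "A-3", "Cost": 450.50},
    {"Asset": "A-4", "Cost": 15000.00},
]

for row in largest_values(assets, "Cost", "3"):
    print(row["Asset"], row["Cost"])
```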
Since all the needed information has been entered, the “Run” button can be clicked in order to
perform the analysis. After clicking the “Run” button, there will be a pause while the system
processes the information. Once processing is complete, the location of the output file will be
shown on the status bar. If a chart was also produced, it will have the same name as the output
text report file, but with a suffix of “.png”. An example of the form appears as follows:
As shown on the status bar, the report has been written to the file named “c:\test\temp\step-2.txt”
in the directory requested. The initial portion of the report (up to a maximum of 2,000
characters), can also be viewed by clicking on the tab labeled “Report”.
The report lists the three largest cost items in the range selected. Remaining information
about these items can be viewed by scrolling the view to the right. Note that the report has also
been stored in the report file specified.
• Return to the “Clipboard” form and select another command to be processed, e.g.
“Benford’s Law test”
• Return to the “Clipboard” form and select another column to be processed, e.g. “AD”
(accumulated depreciation)
• Return to the “Clipboard” form and “paste” another worksheet area for processing
• Switch to any of the other tabs for additional processing.
• Go to a blank area in the current (or other) worksheet and “paste” the report results into that
worksheet.
Note: When a command is run, the results of that command can also be pasted to the
clipboard by clicking on the “Copy” button, making it easy to do further processing or
analysis by pasting this information on a worksheet.
Results are written to both a text file and a chart. In the example shown, the report was written
to the text file “c:\test\temp\step-8.txt” and a chart was produced and stored with almost the
same name, i.e. “c:\test\temp\step-8.png”. The results were stored in the directory “c:\test\temp”
because that folder was specified as the Audit folder in this instance (can be changed using the
“Audit” form).
For the population statistics command, the counts for positive, negative and zero amounts are
shown, along with the totals.
Note: The default color for the chart is blue and can be overridden using the values under
the “Chart” tab.
Clicking on the label named “Where?” causes the selection criteria help form above to be shown.
This form is useful in reminding you of the syntax for various types of selection that can be
performed. Of the templates shown, an example can be selected from the drop down list, then
modified and then copied over to the main processing form.
A complete record of the processing performed can be recorded automatically in a log file. The
log file records the processing performed in “macro” format so that it can be re-performed at a
future date or shared with others.
To perform logging, only two actions are needed:
1. Specify the name of the log file to be used (only required if a different log file is to be used
than in prior sessions).
2. For the processing performed, check the box on the form to indicate that logging is desired.
This check box can be turned on and off at will. When turned off, no logging is recorded until
the check box is turned back on.
4 Audit Commands
Types of queries
There are some 40+ queries or audit commands which can be selected for processing. These
commands are grouped into five classes based upon the type of function performed – 1)
numeric, 2) date, 3) other, 4) patterns and 5) sampling. For each command, a brief
explanation of the purpose and use of the command is provided, along with an explanation of
the meaning of any “other information” which must be provided. For each command, there are
further examples and example output contained on the CD which is distributed with the software.
4.1 Numeric
4.1.1 Population Statistics
Overview / Use in Audit Procedures
The population statistics command is the “work horse” of the system and can be used alone to
provide information for many audit steps. Just a few examples include:
The population statistics command produces three text reports and one graphic:
1. Basic statistics
2. Histogram data
3. Percentile report
Basic statistics include information such as counts, totals, minimum and maximum values, etc. This
information alone can be used to perform certain audit steps such as agreeing transaction supporting
details to ledger amounts, testing for procedural compliance, etc. In the example below, a histogram
chart and histogram data is to be prepared for fixed asset costs. The purpose of the procedure is to
obtain an overview of the fixed assets cost information, identify potential errors or extreme values and
provide information for audit planning.
Usage Example 1
In a test of fixed assets, determine the count and amount of fixed assets which have been over
depreciated.
Approach – using the “population statistics” command, obtain totals and counts where the asset cost
less accumulated depreciation is less than salvage.
Audit Command values
Column value – Cost
Text Box – (empty)
Where – (cost – ad) < salvage
Results
Counts, totals, minimum, maximum, etc. for all assets which have been over depreciated.
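The logic of Usage Example 1 can be reproduced independently as a cross-check. The sketch below is an assumption-laden illustration, not the tool's implementation: the function name and the sample asset rows are invented, and the “Where” criterion “(cost – ad) < salvage” is expressed as a Python function rather than in the tool's own selection syntax.

```python
def population_statistics(rows, column, where=None):
    """Compute basic statistics for `column` over the rows selected by `where`."""
    values = [row[column] for row in rows if where is None or where(row)]
    return {
        "count": len(values),
        "total": sum(values),
        "minimum": min(values, default=None),
        "maximum": max(values, default=None),
        "positive": sum(1 for v in values if v > 0),
        "negative": sum(1 for v in values if v < 0),
        "zero": sum(1 for v in values if v == 0),
    }


# Hypothetical fixed asset rows: cost, accumulated depreciation, salvage.
assets = [
    {"cost": 1000, "ad": 950, "salvage": 100},
    {"cost": 5000, "ad": 2000, "salvage": 500},
    {"cost": 800, "ad": 800, "salvage": 50},
    {"cost": 2500, "ad": 1000, "salvage": 250},
]

# Equivalent of: Where – (cost – ad) < salvage
over_depreciated = population_statistics(
    assets, "cost", where=lambda r: (r["cost"] - r["ad"]) < r["salvage"]
)
print(over_depreciated["count"], over_depreciated["total"])  # 2 1800
```

Two of the four sample assets have a net book value below salvage, so the statistics cover only those two rows.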
Usage Example 2
For the purposes of sample planning, determine the distribution of values for fixed asset costs in order to
be able to plan strata to use for stratified sampling.
Approach – using the “population statistics” command, obtain a histogram of fixed asset costs.
Audit Command values
Column value – Cost
Text Box – (empty)
Where – (empty)
The command shown below produces three reports for cost totals for location ‘ABC’. This is a very
basic example of the command. It is possible to specify considerably more complex selection criteria.
In addition, it is possible to prepare statistics for certain calculated amounts that are not contained in the
file or the worksheet. An example might be statistics for net book value measured by “cost – ad” (cost
less accumulated depreciation).
Output results
Population Statistics
The results above were “copied” from the form and then “pasted” into a worksheet. An alternative would
be to import the report as a text file into Excel.
Output results
Histograms
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
4.1.2 Round Numbers
Overview / Use in Audit Procedures
Round numbers are often an indicator of estimates, which may be appropriate in certain cases (e.g.
journal entries), but not appropriate in others (e.g. purchase orders, invoices, expense reports, etc.).
The system can be used to identify the extent (if any) to which round numbers are being used as well as
extract data based upon types of round numbers. The system defines a round number as one which is
a whole number (i.e. no pennies), and contains one or more zeros immediately to the left of the decimal
point, without any intervening digits other than zero. The number of such zeros determines the “order”
of the round number. The table below indicates examples of various round numbers, as well as their
“order”. If a number is not round, then it will be classified as “NR” (not round).
Example          Order
15,000.00        3
10               1
123.19           NR
1,000,000.00     6
20.19            NR
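The definition of a round number and its “order” translates directly into a short function. The sketch below follows the definition given above and reproduces the table; the function name is hypothetical, and treating zero as “NR” is an assumption, since the guide does not say how zero amounts are classified.

```python
from decimal import Decimal


def round_order(amount):
    """Return the order of a round number, or "NR" if it is not round.

    `amount` is text such as "15,000.00". The order is the count of
    consecutive zeros immediately to the left of the decimal point.
    """
    value = Decimal(amount.replace(",", ""))
    # Must be a whole number (no pennies). Zero is treated as NR here,
    # which is an assumption rather than documented behavior.
    if value != value.to_integral_value() or value == 0:
        return "NR"
    digits = str(abs(int(value)))
    order = len(digits) - len(digits.rstrip("0"))
    return order if order > 0 else "NR"


for example in ["15,000.00", "10", "123.19", "1,000,000.00", "20.19"]:
    print(example, round_order(example))
```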
Examples of tests which can be performed are provided below:
Usage Example 1
In a test of purchase orders, determine the frequency of round numbers for purchase orders. There is
an allegation relating to purchases at store number ‘123’.
Approach – using the “round numbers” command, obtain frequencies for round numbers on purchase
orders, classified as to type of round number.
Audit Command values
Column value – Purchase order amount
Text Box – (empty)
Where – [store number] = 123
Results
Frequencies of round numbers used on purchase orders for store number 123.
Usage Example 2
In a test of journal entries, determine the frequency and extent of round numbers in journal entries for
transactions relating to expenses. Expense account numbers begin with the number 3 for this
company.
Approach – using the “round numbers” command, obtain a frequency count.
Audit Command values
Output results
Round numbers
Output results (pasted into Excel worksheet)
Round Number report:
d-stat: .003704
Digits       Count     Pct
Not Round    3,660     90.37%
1            354       8.74%
2            34        0.84%
3            2         0.05%
Totals       4,050     100.00%
The report indicates that just a little under 10% of the numbers are round. The largest order of round
numbers is 3 (and there are two such numbers).
The “d-stat” value of “.003704” is a measure of the difference between the expected number of round
numbers and the actual number found. The d-stat value ranges from a low of zero (indicating conformity
with the expected frequencies) to a high of one (indicating a significant difference between observed and
expected).
Output results
Round numbers
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
4.1.3 Benford’s Law
The Benford’s Law command is generally used as part of a fraud or other forensic investigation. The
purpose is to determine whether numeric values on a schedule conform to the distribution expected
under Benford’s Law. The test should only be applied to numeric values which would be expected to
follow Benford’s Law. More information is available about Benford’s Law and its use.
There are six types of tests which can be performed for Benford’s Law. The type of test being
performed must be specified:
F1 – Test of the first digit
F2 – Test of the first two digits
F3 – Test of the first three digits
D2 – Test of the second digit only
L1 – Test of the last digit
L2 – Test of the last two digits
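For the F1 test, the expected frequencies can be computed from the standard Benford first-digit formula, P(d) = log10(1 + 1/d). The sketch below is an illustration only: the guide does not document the exact computation the software performs, the function names are hypothetical, and the inventory counts are invented sample values.

```python
import math
from collections import Counter


def expected_first_digit(digit):
    """Benford's Law expected proportion for a leading digit 1 to 9."""
    return math.log10(1 + 1 / digit)


def first_digit(value):
    """Return the leading non-zero digit of a number, or None for zero."""
    digits = "".join(ch for ch in str(abs(value)) if ch.isdigit()).lstrip("0")
    return int(digits[0]) if digits else None


def first_digit_table(values):
    """Map each digit 1 to 9 to (observed proportion, expected proportion)."""
    leading = [d for d in map(first_digit, values) if d is not None]
    counts = Counter(leading)
    total = len(leading)
    return {
        d: (counts[d] / total if total else 0.0, expected_first_digit(d))
        for d in range(1, 10)
    }


# Hypothetical inventory counts.
counts = [12, 140, 19, 230, 31, 1050, 17, 48, 110, 95]
for digit, (observed, expected) in first_digit_table(counts).items():
    print(digit, f"{observed:.3f}", f"{expected:.3f}")
```

A sample this small would not support any conclusion; it only shows how observed and expected proportions line up for comparison.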
Usage Example 1
In a test of physical inventory counts, determine if some of the counts may have been made up. It is
expected that actual inventory counts would follow Benford’s law, i.e. a frequency distribution of
inventory counts would align with that expected using Benford’s law. There is an allegation relating to
counts at warehouse 5713.
Approach – using the “benford” command, obtain frequencies for physical inventory counts and compare
those with the frequencies expected using Benford’s Law.
Audit Command values
Column value – Inventory count
Text Box – F1
Where – [warehouse] = 5713
Results
Frequencies of first digits of inventory counts, along with a chart and analysis comparing the
results with those expected using Benford’s Law.
Usage Example 2
In a test of accounts payable, determine if particular vendor invoices have leading digit frequencies as
would be expected using Benford’s Law. The vendors in question all have vendor numbers starting with
the letters “R” – “V”.
Approach – using the “benford” command, obtain a frequency count.
Audit Command values
Column value – [Invoice Amount]
Text Box – F1
Where – [Vendor number] like ‘[R-V]%’
In the example below, the auditor is testing whether the first digits of the column named cost conform to
those expected using Benford’s Law.
Output results
Benford’s Law
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
The chart indicates that the data distribution is fairly uniform (shown in the light tan) and differs
significantly from that which would be expected using Benford’s Law (shown in darker tan). The Chi
Square value is shown on the chart. Note that different chart colors and titles may be specified under
the “Chart” tab on the form.
Output results - chart
4.1.4 Stratify
Data stratification
The data stratification procedure classifies numeric amounts into “buckets” or value ranges specified by
the auditor. The purpose is to classify numeric amounts in order to determine the most frequently
occurring values, largest and smallest values, etc. Stratification is often used for sample planning
(stratified sampling, reasonableness tests) as well as audit planning in general.
The “Other info” box takes the values to be used for the strata (specified in ascending order and
separated by commas or spaces). An example strata specification is
“-1000, -500, 0, 300, 2000, 4000, 6000”. Note that the strata values do not
need to be evenly spaced. If any values are found outside the end ranges
of the strata specified, those values are reported separately.
Warning: If strata values are not numeric, or not in ascending order, invalid results may be obtained. Do
not include commas within a single value – e.g. specify 1000 NOT 1,000
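A minimal sketch of the stratification logic follows. It is an assumption-laden illustration, not the tool's implementation: the function name stratify is hypothetical, and each range is assumed to include its lower bound and exclude its upper bound (the guide does not say which convention Audit Commander uses).

```python
from bisect import bisect_right

def stratify(values, strata):
    """Count and total values per range [strata[i], strata[i+1]); tally outliers separately."""
    strata = list(strata)
    if strata != sorted(strata):
        raise ValueError("strata values must be in ascending order")
    labels = [f"below {strata[0]}"]
    labels += [f"{lo} to {hi}" for lo, hi in zip(strata, strata[1:])]
    labels.append(f"{strata[-1]} and above")
    result = {label: {"count": 0, "total": 0} for label in labels}
    for v in values:
        # bisect_right gives 0 for values below the first stratum,
        # and len(strata) for values at or above the last one
        bucket = result[labels[bisect_right(strata, v)]]
        bucket["count"] += 1
        bucket["total"] += v
    return result
```

For example, stratify(amounts, [0, 100, 500]) reports four lines: amounts below 0, 0 to 100, 100 to 500, and 500 and above.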
Usage Example 1
In a test of accounts payable, classify the invoice amounts into particular ranges for the purpose of audit
planning. Invoices less than $100 do not require a secondary authorization. Invoices over $50,000
require three authorizations. All invoices over $2,500 require a purchase order.
Approach – using the “stratify” command, obtain frequencies and totals for invoices classified into
various numeric ranges.
Audit Command values
Column value – Invoice amount
Text Box – -5000 -500 0 100 500 2500 30000 50000 100000
Where – (empty)
Results
The invoice amounts for each range specified are totaled and counted. Invoices for less than
-$5,000 or more than $100,000 (the extreme values) are tallied separately.
Usage Example 2
In a test of accounts payable, stratify the amounts of invoices for sample planning. One objective of the
analysis is to classify the amounts such that 80% of the value can be tested with one procedure and the
remaining 20% with another audit procedure. Only invoices at location ABC are to be classified.
Approach – using the “stratify” command, obtain a data stratification.
Audit Command values
Column value – [Invoice Amount]
Text Box – 0 500 20000 50000 100000
Where – location = ‘ABC’
Results
A report classifying the invoice amounts at location ‘ABC’ into the ranges specified. The results
also include a chart.
Data stratification
Data stratification
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
4.1.5 Summarization
Data summarization
The summarization function obtains not only totals by each control break (sort key) specified, but also
other information such as minimum and maximum values, averages and standard deviation. There is no
limit as to the number of columns which make up the control break. A control break (sort key) may
consist of a single column, e.g. sub-totals by vendor would be specified as just a single column name –
“vendor”. If subtotals were needed by region by vendor, then the control break specification would be
“region, vendor”.
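The control break idea can be sketched as a group-by over one or more key columns. The function name summarize and the representation of rows as dictionaries are assumptions made for illustration; the sample standard deviation is used, and groups with a single row report 0.0 (the tool's own convention may differ).

```python
import statistics
from collections import defaultdict

def summarize(rows, keys, amount):
    """Summarize the `amount` column for each distinct combination of `keys`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[amount])
    report = {}
    for key, amounts in sorted(groups.items()):
        report[key] = {
            "total": sum(amounts),
            "average": statistics.mean(amounts),
            "minimum": min(amounts),
            "maximum": max(amounts),
            "count": len(amounts),
            # sample standard deviation needs at least two values
            "std_dev": statistics.stdev(amounts) if len(amounts) > 1 else 0.0,
        }
    return report
```

A call such as summarize(rows, ["region", "vendor"], "amount") corresponds to the control break specification “region, vendor”.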
Usage Example 1
The auditor wishes to summarize sales by region and store in order to identify both the totals, as well as
the ranges of values at these stores, i.e. largest single amount and smallest single amount.
Approach – using the “summary” command, obtain totals, counts, minima, maxima, standard deviation,
average.
Audit Command values
Column value – Sales amount
Text Box – region, store
Where – (empty)
Results
The summarized amount by store by region is produced, showing also the averages, minima,
maxima, standard deviation, etc.
Usage Example 2
Expense report information is available and includes employee number, region, expense type and
expense date. The auditor wishes to summarize expense report costs by region and employee number
for the month of June, for travel expenses only (i.e. expense type = “travel”).
Approach – using the “summary” command, obtain a data summarization.
Audit Command values
A simpler example is shown in the example below – summarize cost by location and life. All rows are to
be summarized.
Output results
Data summarization
location  life     Total  Average  Minimum  Maximum  Count  Standard Deviation
AB           1         1        1        1        1      1                   1
AB           2         2        2        2        2      1                   1
AB          13        13       13       13       13      1                   1
ABC          3       648        3        3        3    216                   0
ABC          4       992        4        4        4    248                   0
ABC          5  1,285.00        5        5        5    257                   0
ABC          6  1,572.00        6        6        6    262                   0
ABC          7  1,722.00        7        7        7    246                   0
ABC          8  2,088.00        8        8        8    261                   0
ABC          9  2,115.00        9        9        9    235                   0
ABC         10  2,160.00       10       10       10    216                   0
ABC         11  2,497.00       11       11       11    227                   0
ABC         12  3,132.00       12       12       12    261                   0
CDS          3        45        3        3        3     15                   0
CDS          4        60        4        4        4     15                   0
CDS          5        80        5        5        5     16                   0
CDS          6       108        6        6        6     18                   0
CDS          7       105        7        7        7     15                   0
CDS          8        96        8        8        8     12                   0
CDS          9       162        9        9        9     18                   0
CDS         10       170       10       10       10     17                   0
Output results
The Top and Bottom 10 commands are used to identify the largest (or smallest) numeric, date or text
values from a population (and criteria can be applied). The number of items to be identified can be
specified as any value. Generally the command is used to identify extremes among the following types
of data:
• For numeric values, identify unusually large (or small) items, possible outliers or to focus on just
the most significant dollar items.
• For date values, identify the latest (or earliest) dates in order to identify date ranges, transactions
outside the cutoff date, etc.
• For text values, identify high (or low) values of text as would be shown had the data been sorted.
Note that the data being analyzed does not need to be presorted. Analysis of subsets of the data can be
readily performed. For example, the auditor may wish to know the smallest fixed asset costs for those
assets with a useful life of seven years or more and located within one or more regions or states. Other
types of criteria can also be applied, depending upon what the analyst wishes to accomplish.
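As a rough sketch of the idea, the selection can be done without sorting the whole population by keeping only the n best candidates. The function name top_n, the dictionary rows and the `where` callback are hypothetical stand-ins for the command's Column value, Text Box and Where entries.

```python
import heapq

def top_n(rows, column, n=10, where=None, largest=True):
    """Return the n rows with the largest (or smallest) value in `column`."""
    # Apply the optional selection criteria first
    candidates = [r for r in rows if where is None or where(r)]
    pick = heapq.nlargest if largest else heapq.nsmallest
    return pick(n, candidates, key=lambda r: r[column])
```

For instance, top_n(assets, "cost", 10, where=lambda r: r["life"] >= 7, largest=False) would list the ten smallest costs among assets with a useful life of seven years or more.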
Usage Example 1
For purposes of audit testing, the 10 fixed assets with the largest cost need to be identified, but only for
assets located in either Florida, Alabama or Georgia.
Approach – using the “topn” command, list the details pertaining to the ten asset records having the
largest cost. Note that the input data does not need to be pre-sorted.
Audit Command values
Column value – asset cost
Text Box – 10
Where – location in(‘FL’,’GA’,’AL’)
Results
A list of the fixed asset records for the ten assets having the greatest cost in any of the three
states specified.
In the example below, the auditor wishes to identify the ten asset records which have the largest cost
amounts.
Output results
The records with the largest ten asset costs are shown, listed in descending order. Note that if the data
pasted did not have column headers, then the largest values would be shown in the leftmost column. For
example, if an area of six columns (with no column headers) were pasted and column three (“Col003”)
were selected, then the results would be shown with column 3 as the first column, followed by columns 1,
2, 4, 5 and 6.
Output results
4.1.7 Histograms
Histograms
Histograms provide a visual representation for the values or transactions being analyzed. The results
are identical to those of the population statistics and boxplot commands, except that a different chart is
produced.
1. Basic statistics
2. Histogram data
3. Percentile report
Basic statistics include information such as counts, totals, minimum and maximum values, etc. This
information alone can be used to perform certain audit steps such as agreeing transaction supporting
details to ledger amounts, testing for procedural compliance, etc. Examples of basic statistics reports
can be found in the work papers referenced below:
Usage Example 1
For purposes of audit testing, prepare a histogram of employee expense report amounts.
Approach – using the “histo” command, prepare a chart and detail report as to expense report amounts
at region XYZ.
Audit Command values
Column value – [expense report amount]
Text Box – (empty)
Where – region = ‘XYZ’
Results
A histogram chart of expense report amounts at region XYZ, along with a text report containing
the numeric values.
Usage Example 2
For purposes of testing inventory values, prepare a histogram of inventory unit cost amounts.
Approach – using the “histo” command, prepare a chart and detail report as to inventory unit cost
amounts.
Audit Command values
Column value – [inventory cost]
Text Box – (empty)
Where – (empty)
Results
A histogram chart of unit inventory costs, along with a text report containing the numeric values.
The example below shows a histogram of cost values being prepared.
Output results
Histograms
Histograms
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
This chart indicates that the most common values are those between 9,164 and 9,997. The fewest
counts are between the values of 1 and 834.
Output results - chart
Box Plot
The Box Plot command is used to separate a population of numeric values into quartiles in order to see
the values and to also envision how the population is distributed. This provides a little more information
than just the minimum, maximum and median. Except for the chart, the command is identical to the
Population statistics and the histogram command.
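The values a box plot is drawn from can be sketched as a five-number summary. The function name five_number_summary is hypothetical. Quartile conventions vary between tools, so the figures below (Python's default “exclusive” method) may differ slightly from those Audit Commander reports.

```python
import statistics

def five_number_summary(values):
    """Return the minimum, quartiles and maximum of a list of numbers."""
    # quantiles(n=4) returns the three cut points separating the quartiles
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "minimum": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": max(values),
    }
```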
Usage Example 1
As part of an audit of accounts payable, the range of invoice costs needs to be determined.
Approach – using the “boxplot” command, prepare a chart and detail report as to invoice costs for
invoices dated after 6/30/2008.
Audit Command values
Column value – [invoice amount]
Text Box – (empty)
Where – [invoice date] > #6/30/2008#
Results
A box plot chart of invoice amounts for invoices dated after 6/30/2008, along with a text report
containing the numeric values.
Usage Example 2
Output results
Box Plot
Box Plot
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
Random numbers are commonly required as part of the sampling process. Excel has a built in
function for the generation of random numbers, “=RAND()”. The Excel RAND function generates
pseudo random numbers evenly distributed between 0 and 1. For many purposes, the pseudo
random number generated using the RAND function may be adequate.
The RAND function is just one of a number of random number generators (RNG). The quality of
a random number generator can be tested using statistical test batteries such as the “Diehard”
tests developed by George Marsaglia or the test suite published by the National Institute of
Standards and Technology (NIST). More information is available at
http://csrc.nist.gov/groups/ST/toolkit/rng/batteries_stats_test.html.
One of the free random number generators is called the Mersenne Twister. It is a pseudorandom
number generator developed by Makoto Matsumoto (松本 眞) and Takuji Nishimura (西村 拓士) that is
based on a matrix linear recurrence over a finite binary field F2. It provides for
fast generation of very high-quality pseudorandom numbers, having been designed
specifically to rectify many of the flaws found in older algorithms.
Its name derives from the fact that the period length is chosen to be a Mersenne
prime.
The commonly used variant of the Mersenne Twister, MT19937, has the following
desirable properties:
• It has a very long period of 2^19937 − 1 (this is many orders of magnitude larger than the
estimated number of particles in the observable universe, which is 10^87).
• It passes numerous tests for statistical randomness, including the Diehard tests.
It passes most, but not all, of the even more stringent TestU01 Crush randomness tests.
The Mersenne Twister algorithm has received some criticism in the computer science
field, notably by George Marsaglia. These critics claim that while it is good at generating
random numbers, it is not very elegant and is overly complex to implement.
Generation of random numbers using Audit Commander is done using the “random”
command. A seed value consisting of an integer value between 1 and 2,147,483,647 is
used to determine the starting random number. The random numbers generated will
consist of uniformly distributed numbers between zero and one.
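Seeded generation can be illustrated with Python, whose standard random module also uses the Mersenne Twister as its core generator. This is a sketch only: the function name assign_random_numbers is hypothetical, and the numbers produced will not match those generated by Audit Commander for the same seed.

```python
import random

def assign_random_numbers(rows, seed):
    """Append a uniform random number in [0, 1) to each row, reproducibly."""
    rng = random.Random(seed)  # same seed, same sequence
    return [list(row) + [rng.random()] for row in rows]
```

Because the sequence depends only on the seed, recording the seed in the work papers allows the sample selection to be re-performed later.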
Usage Example 1
For purposes of sampling, generate and assign random numbers to each row of data
contained on an Excel work sheet. The starting seed number to be used is 102935427.
Command – “random”
Column name – “N/A”
TextBox – “102935427”
Random numbers
The example command shown on the next page adds a random number value in the rightmost column.
This random number will be between 0 and 1 (exclusive). The starting number is based upon the seed
value provided (in this case 1738974). The seed value should be a whole number between 1 and
approximately 2.1 billion.
Random numbers
Output results (pasted into Excel work sheet – highlighting added for effect, not all columns shown)
Life Location Acquisition Accode DispDate Random number
7 DEF 5/17/2008 7:40 A 0 0.974683138
8 DEF 12/19/2001 A 0 0.961858645
12 DEF 1/5/2008 11:31 A 0 0.209254051
3 DEF 10/12/2009 16:33 A 0 0.451545258
8 DEF 11/20/2008 11:16 A 0 0.362094671
10 DEF 1/31/2007 6:00 A 0 0.010547096
5 DEF 8/21/2010 21:21 A 0 0.784745319
4 DEF 3/14/2000 15:07 A 0 0.269402404
3 DEF 4/4/2001 8:38 A 0 0.417646239
3 DEF 7/31/2006 6:57 A 0 0.578761123
8 DEF 11/30/2008 9:07 A 0 0.590210739
9 DEF 1/21/2004 8:09 A 0 0.690726882
7 DEF 7/29/2010 23:31 A 0 0.902005128
8 DEF 8/12/2000 19:12 A 0 0.361275228
7 DEF 7/23/2002 9:07 A 0 0.456829664
8 DEF 5/8/2001 9:07 A 0 0.503349514
8 DEF 4/13/2010 15:36 A 0 0.119554142
9 DEF 9/9/2010 15:07 I 0 0.602501919
7 DEF 12/16/2003 6:57 A 0 0.820769995
7 DEF 6/22/2006 18:28 A 0 0.944822744
Output results
4.2 Date
Holiday Extract
Often it is desirable to check if any transaction dates fall on a federal holiday such as Independence
Day, etc. Although it may be possible to visually check for these dates, it becomes more complicated
when the date falls on a weekend and is therefore celebrated on the preceding Friday (or the following
Monday). This function can analyze all the dates within a specified range and quantify the number that
fall on each of the holiday dates. There are two functions related to holidays. One prepares a summary
of counts of holiday dates and the other extracts transactions whose dates fall on federal holidays.
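The weekend adjustment described above can be sketched as follows. The example covers only the fixed-date holidays and treats both the calendar date and the observed weekday as a match; these choices, and the names FIXED_HOLIDAYS, observed and holiday_name, are assumptions for illustration rather than a description of how the “holiday” command works internally.

```python
from datetime import date, timedelta

# Fixed-date federal holidays only (month, day); floating holidays are omitted
FIXED_HOLIDAYS = {
    "New Year's": (1, 1),
    "Independence Day": (7, 4),
    "Veterans Day": (11, 11),
    "Christmas": (12, 25),
}

def observed(holiday):
    """Shift a Saturday holiday to Friday and a Sunday holiday to Monday."""
    if holiday.weekday() == 5:  # Saturday
        return holiday - timedelta(days=1)
    if holiday.weekday() == 6:  # Sunday
        return holiday + timedelta(days=1)
    return holiday

def holiday_name(d):
    """Return the holiday falling (or observed) on date d, or None."""
    # Next year's New Year's Day may be observed on December 31 of this year
    for year in (d.year, d.year + 1):
        for name, (month, day) in FIXED_HOLIDAYS.items():
            actual = date(year, month, day)
            if d in (actual, observed(actual)):
                return name
    return None
```

For example, July 4, 2009 fell on a Saturday, so holiday_name(date(2009, 7, 3)) reports Independence Day.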
Usage Example 1
In a test of general ledger, an extract of all journal postings on a federal holiday needs to be obtained.
Approach – using the “holiday” command, extract a list of all journal entries posted on holidays. The
date format being used is month – day – year (mdy).
Audit Command values
Column value – [journal posting date]
Text Box – mdy
Where – (empty)
Results
A list of any journal entry transactions which have been posted on a date which is a federal
holiday. In addition, a summary chart of holiday transactions is prepared.
Usage Example 2
Determine if any receiving reports exist for dates falling on a federal holiday. Date format is mdy.
Approach – using the “holiday” command, extract a list of receiving transactions falling on a federal
holiday.
Audit Command values
Note: The default values: US and mdy will be used if no values are specified.
The command example below checks for any records which have an acquisition date falling on a federal
holiday in the United States.
Output results
Holiday Extract
Holiday Summary
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
This chart indicates that the most frequent holiday for asset acquisitions was President’s Day (19
instances).
Output results - chart
Week days
In many instances the auditor wishes to extract just certain data within Excel based upon days of the
week. In this instance one column or row will contain dates which the auditor wishes to examine.
Usage Example 1
In a test of certain expenses, an extract is needed for expenses incurred on a Friday or Saturday.
Approach – using the “wd” command, extract a list of all such transactions. The date format being used
is month – day – year (mdy).
Audit Command values
Column value – [expense date]
Text Box – Friday, saturday
Where – (empty)
Results
A list of any expense transactions which fell on a Friday or Saturday is prepared.
Usage Example 2
An audit test is to be performed to identify any travel expense transactions on Saturdays, which are not
allowed at this company.
Approach – using the “wd” command, extract a list of all such transactions. The date format being used
is month – day – year (mdy).
Audit Command values
Column value – [expense date]
Text Box – Saturday
Where – [travel code] = ‘airline’
Results
A list of any expense transactions which fell on a Saturday is prepared.
The day of the week must include at least the first three letters of the week day name. Case does not
matter. Thus, Sunday could be specified using any of the following: “sun”, “Sunday”, “sund”, etc.
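The matching rule for day names can be sketched as a case-insensitive prefix match of at least three letters. The names parse_weekday and extract_weekdays are hypothetical, and the rows are assumed to hold Python date values in the selected column.

```python
from datetime import date

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]

def parse_weekday(text):
    """Map 'sun', 'Sund', 'SUNDAY', ... to a weekday number (Monday = 0)."""
    key = text.strip().lower()
    if len(key) < 3:
        raise ValueError("at least the first three letters are required")
    matches = [i for i, name in enumerate(DAY_NAMES) if name.startswith(key)]
    if not matches:
        raise ValueError(f"unknown week day: {text!r}")
    return matches[0]

def extract_weekdays(rows, column, days):
    """Keep rows whose date in `column` falls on one of the listed days."""
    wanted = {parse_weekday(d) for d in days.split(",")}
    return [r for r in rows if r[column].weekday() in wanted]
```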
The example below is used to extract all transactions which fall on either a Saturday or a Monday. Note
that additional selection criteria could have been applied, e.g. store = ‘ABC’ to isolate the extract to just
those transactions at store ‘ABC’. Similarly a date range could have also been applied, e.g. acqdate
between #7/1/2005# and #9/30/2005#. When specifying dates as part of the extract criteria, the date
value must be enclosed in pound signs (‘#’).
Output results
Week days
Holiday Summary
In certain instances it is desirable to extract just those transactions in a file which fall on a federal
holiday. These transactions can then be reviewed separately. The holiday extract command can be
used in conjunction with date ranges, location codes or any other criteria which should be applied as
part of the extract.
Usage Example 1
In a test of general ledger, an extract of all journal postings on a federal holiday needs to be obtained.
Approach – using the “holiday” command, extract a list of all journal entries posted on holidays. The
date format being used is month – day – year (mdy).
Audit Command values
Column value – [journal posting date]
Text Box – mdy
Where – (empty)
Results
A list of any journal entry transactions which have been posted on a date which is a federal
holiday. In addition, a summary chart of holiday transactions is prepared.
Usage Example 2
Determine if any receiving reports exist for dates falling on a federal holiday. Date format is mdy.
Approach – using the “holiday” command, extract a list of receiving transactions falling on a federal
holiday.
Audit Command values
Column value – [receiving report date]
Text Box – mdy
Where – (empty)
Results
Note: The default values: US and mdy will be used if nothing is specified.
Output results
Holiday Summary
Output results (pasted into Excel work sheet)
Holidays:
New Year's 14
Martin Luther King 13
President's Day 19
Memorial Day 14
Independence Day 9
Labor Day 8
Columbus Day 7
Veterans Day 8
Thanksgiving 9
Christmas 16
Output results
Holiday Summary
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
4.2.4 Ageing
Ageing
During a review of applications which use both dates and amounts, it is very common to "age" the data
for various purposes - e.g. reasonableness testing, checking for stale or obsolete items, data
classification, etc. The procedure to age data is straightforward and requires four pieces of information:
The date to be used for ageing “Ageing Date”
The width of the ageing range, e.g. 30 days
The name of the column with the date to be aged, e.g. “Due Date”
The name of the column with the amount to be aged, e.g. “Balance Due”
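Given those four inputs, the ageing calculation can be sketched as below. The function name age_balances and the bucket labels are hypothetical, and the rows are assumed to be dictionaries holding Python date values; the actual report layout produced by the “age” command may differ.

```python
from datetime import date

def age_balances(rows, date_col, amount_col, ageing_date, width=30):
    """Total amounts into buckets of `width` days measured back from ageing_date."""
    buckets = {}
    for row in rows:
        days = (ageing_date - row[date_col]).days
        start = (days // width) * width  # lower edge of the bucket
        label = f"{start} to {start + width - 1} days"
        count, total = buckets.get(label, (0, 0))
        buckets[label] = (count + 1, total + row[amount_col])
    return buckets
```

With an ageing date of June 30, 2008 and a width of 30 days, an invoice dated May 15, 2008 is 46 days old and falls into the “30 to 59 days” bucket.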
Usage Example 1
In a test of accounts receivable, an ageing of customer account balances is needed.
Approach – using the “age” command, prepare an ageing report for customers in ABC region. Ageing is
to be done as of June 30, 2008. Ageing width is 30 days.
Audit Command values
Column value – [invoice date]
Text Box – invoice date, invoice amount, 6/30/2008, mdy
Where – region = ‘ABC’
Results
An ageing report is prepared for those customers in region ABC as of June 30, 2008.
Usage Example 2
Output results
Ageing
Ageing
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
Date Near
Selection of a range of transactions based upon date value is a very common data extraction procedure.
Examples include cut-off testing, re-testing balances for a specified period, etc.
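There are two equivalent procedures for doing such an extraction:
1. DateRange - the auditor specifies the starting and ending dates of the range (e.g. July 1st
through July 7th)
2. DateNear - the auditor specifies a date and the maximum number of days from the date (e.g.
three days before or after July 4th)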
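The “date near” selection amounts to a date range centered on the date given. A minimal sketch, assuming rows are dictionaries holding Python date values and that both end dates are included (the function name date_near is hypothetical):

```python
from datetime import date, timedelta

def date_near(rows, column, target, days):
    """Keep rows whose date lies within `days` days before or after `target`."""
    earliest = target - timedelta(days=days)
    latest = target + timedelta(days=days)
    return [r for r in rows if earliest <= r[column] <= latest]
```

Calling date_near(sales, "sales date", date(2008, 6, 30), 5) selects June 25, 2008 through July 5, 2008.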
Usage Example 1
For cutoff testing, the auditor wants to identify any sales made within 5 days of June 30, 2008.
Approach – using the “datenear” command, prepare a list of any such transactions.
Audit Command values
Column value – [sales date]
Text Box – 6/30/2008, 5
Where – (empty)
Results
A list of any sales transactions within five days of June 30, 2008, i.e. June 25, 2008 – July 5,
2008.
Usage Example 2
For accrual testing, the auditor wants to identify any accruals posted within 15 days of June 30, 2008.
Only account numbers beginning with either a ‘2’ or a ‘3’ are to be selected.
Approach – using the “datenear” command, prepare a list of any such transactions.
Audit Command values
Column value – [journal date]
Text Box – 6/30/2008, 15
Where – [account number] like ‘[2-3]%’
Results
A list of any accruals posted within 15 days for the account numbers specified.
Output results
Date near
Output results (pasted into Excel work sheet – doesn’t show all rows or columns)
TagNo Cost AD Replace Bookval Salvage Depr Life Location Acquisition Accode
840 6032 2421.711 1810 3610.29 1206 484.3423 3 DEF 7/31/2006 6:57 A
4615 6166 2526.535 1850 3639.46 1233 505.307 8 ABC 8/2/2006 11:02 A
2145 6094 2475.97 1828 3618.03 1219 495.194 4 DFS 7/26/2006 0:43 A
1298 6144 2512.487 1843 3631.51 1229 502.4973 3 ABC 7/29/2006 12:14 A
108 6042 2430.326 1813 3611.67 1208 486.0651 8 ABC 7/30/2006 16:04 A
4426 6105 2475.607 1832 3629.39 1221 495.1214 7 ABC 8/4/2006 9:21 I
Output results
Date Range
The date range test is the same as “date near”, except specific dates are provided.
Usage Example 1
For cutoff testing, the auditor wants to identify any sales made between 6/25/2008 and 7/5/2008.
Approach – using the “daterange” command, prepare a list of any such transactions.
Audit Command values
Column value – [sales date]
Text Box – 6/25/2008, 7/5/2008
Where – (empty)
Results
A list of any sales transactions within the specified range, i.e. June 25, 2008 – July 5, 2008.
Usage Example 2
For accrual testing, the auditor wants to identify any accruals posted within 15 days of June 30, 2008.
Only account numbers beginning with either a ‘2’ or a ‘3’ are to be selected.
Approach – using the “daterange” command, prepare a list of any such transactions.
Audit Command values
Column value – [journal date]
Text Box – 6/15/2008, 7/15/2008
Where – [account number] like ‘[2-3]%’
Results
A list of any accruals posted within 15 days for the account numbers specified.
Output results
Date range
Output results (pasted into Excel work sheet – doesn’t include all columns)
Acquisition TagNo Cost AD Replace Bookval Salvage Depr
7/31/2006 6:57 840 6032 2421.711 1810 3610.29 1206 484.3423
8/11/2006 21:07 4919 6103 2466.12 1831 3636.88 1221 493.224
8/2/2006 11:02 4615 6166 2526.535 1850 3639.46 1233 505.307
8/10/2006 5:16 4376 6040 2417.777 1812 3622.22 1208 483.5554
8/8/2006 3:50 2149 6073 2445.843 1822 3627.16 1215 489.1685
8/4/2006 9:21 4426 6105 2475.607 1832 3629.39 1221 495.1214
8/11/2006 21:21 7053 6158 2510.114 1847 3647.89 1232 502.0229
8/10/2006 9:50 9235 6113 2475.591 1834 3637.41 1223 495.1182
Output results
The week days report summarizes the count of transactions by day of week. This test may be used for
reasonableness tests, audit planning, etc. The report consists of both text and a chart.
Usage Example 1
In an audit of expense reports, the counts of expenses by day of week are needed.
Approach – using the “wdreport” command, summarize such transactions.
Audit Command values
Column value – [expense report date]
Text Box – mdy
Where – (empty)
Results
A summary of counts of expense report transactions by day of week.
Usage Example 2
In an audit of purchasing, the counts of purchase orders issued by day of week are needed.
Approach – using the “wdreport” command, summarize such transactions.
Audit Command values
Column value – [purchase order date]
Text Box – mdy
Where – (empty)
Results
A summary of counts of purchase order transactions by day of week.
Date format – “mdy” for mm/dd/yyyy or “dmy” – dd/mm/yyyy
Country code – “US” or “CA”.
Note: The default values: US and mdy will be used if nothing is specified.
Output results
Week days report
Output results (pasted into Excel work sheet)
Weekday analysis:
Sunday: 539
Monday: 575
Tuesday: 514
Wednesday: 588
Thursday: 551
Friday: 583
Saturday: 536
Output results
Weekdays report
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
The chart indicates that the most common day of the week for the transactions selected was
Wednesday and the least frequent day of the week was Tuesday.
Output results - chart
4.3 Other
A prime indicator of missing documents is a "gap" in a numeric sequence, such as check numbers,
purchase orders, sales invoices, petty cash slips, receiving reports, etc. The "gaps" command is used to
check a range of data to determine if there are any "gaps" within a range of numbers.
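The gap test itself is simple to sketch: sort the numbers and report every pair of neighbors that is more than one apart. The function name find_gaps is hypothetical, and the numbers are assumed to be integers.

```python
def find_gaps(numbers):
    """Return (first missing, last missing) for each gap in a numeric sequence."""
    ordered = sorted(set(numbers))  # duplicates do not affect gap detection
    return [(a + 1, b - 1) for a, b in zip(ordered, ordered[1:]) if b - a > 1]
```

For check numbers 1, 2, 3, 7, 8 and 10, the result is [(4, 6), (9, 9)]: checks 4 through 6 and check 9 are unaccounted for.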
Usage Example 1
A check is to be made to determine if all asset tag numbers are accounted for. The purpose of the test
is to determine if there are any “gaps” in the numbers assigned for fixed asset tags. No records are to be
excluded. The name of the column for the fixed asset tag number is “Tagno”. The command box to
perform this test would be as shown below.
Usage Example 2
In an audit of cash, the auditor wishes to determine if the schedule of checks paid is complete, i.e. are
there any missing check numbers which have not been accounted for? The commands to perform this
test are shown below. Note that the name of the column which contains the check numbers is
“Check Number”. All of the data is to be tested, i.e. there are no exclusions for testing, so the “Where”
box is blank. This command does not require any other information, so that box is also blank.
Output results
Data extraction is a very common audit procedure whose purpose is to narrow down the
transactions or other data which need to be tested. Only two pieces of information are required
– the name of the command which is selected from the drop down list (“Data extraction”) and the
specific instructions which are contained in the “Other Info” column.
There are many available commands for performing data extraction and they are described in
more detail in Chapter 7. In the first example, the auditor wishes to extract fixed asset records for
those assets which were acquired during the fiscal year ended June 30, 2008, i.e. July 1, 2007 –
June 30, 2008. The column for the acquisition date is named “acquisition date”.
Example 1
Note that because the column name contains an embedded space, it must be enclosed in
brackets.
In the second example, the auditor wishes to test for a possible error condition. Few assets with
a useful life of more than 10 years would have a cost of less than $1,000. The auditor wishes to
run an extract to see if there are any such records.
In some cases, the syntax needed for the command may not be obvious. There is a “help”
facility available by clicking on the label named “Where?”. This brings up a form of examples,
where a command similar to that needed may be selected and edited.
Example output
Output will be just those rows (if any) which meet the criteria specified. At a minimum a header
row will be provided.
Data Extraction
Output results
Data Extraction
Output results (pasted into Excel work sheet – not all is shown)
This is a schedule of all assets which have been over-depreciated, i.e. cost less accumulated
depreciation is less than salvage value.
Output results
4.3.3 Duplicates
Duplicates
Often it is desirable to check if any transactions are exact duplicates. The auditor specifies what
constitutes a duplicate, as ordinarily this will depend upon the values in several columns. As an
example, a duplicate invoice might be defined as the same vendor number, same invoice date and same
invoice number. Note that one or more columns can be used in the search for duplicate transactions.
There is no limit as to the number of columns which may be involved.
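A sketch of the duplicate test, assuming rows are dictionaries and the function name find_duplicates is hypothetical: rows are grouped on the columns that define a duplicate, and every group with more than one member is reported.

```python
from collections import defaultdict

def find_duplicates(rows, key_columns):
    """Return all rows that share their key column values with another row."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_columns)].append(row)
    return [row for group in groups.values() if len(group) > 1 for row in group]
```

For the invoice example, the key columns would be the vendor number, invoice date and invoice number.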
Usage Example 1
The first example is a test performed as part of an accounts payable audit. A potential duplicate invoice
is defined as one which has the same vendor number, invoice number and invoice date. The test is
performed using the commands shown below.
The command text in the “Other info” is simply the column names separated by commas:
Results
A schedule of potential duplicate invoices, using the specification provided.
Usage Example 2
In an audit of fixed assets, an audit objective is to determine the accuracy of the records by checking for
duplicate asset tag numbers. Tag numbers should be unique within any single location. However, there
are certain “generic” tag numbers which begin with the letter “A” and these tag numbers should not be
tested.
The test is performed using the commands shown below.
The command text in the “Other info” is simply the column names separated by commas:
Output results
Duplicates
Output results (pasted into Excel work sheet – not all rows and columns are shown, highlighting
added for emphasis)
location tagno Cost AD Replace Bookval Salvage Depr
ABC 19 5766 2357.063 1730 3408.94 1153 471.4125
ABC 19 2575 1042.965 772 1532.03 515 208.5931
ABC 56 3888 1568.307 1166 2319.69 778 313.6614
ABC 56 7557 3036.653 2267 4520.35 1511 607.3306
ABC 110 2735 1102.043 820 1632.96 547 220.4085
ABC 110 5214 2101.48 1564 3112.52 1043 420.2959
ABC 122 8814 3527.223 2644 5286.78 1763 705.4446
ABC 122 2040 826.3205 612 1213.68 408 165.2641
ABC 139 7391 2966.962 2217 4424.04 1478 593.3925
ABC 139 2425 978.3281 728 1446.67 485 195.6656
ABC 233 8410 3424.003 2523 4986 1682 684.8005
ABC 233 4463 3570 1339 893 893 357.7068
ABC 258 2704 1098.159 811 1605.84 541 219.6318
ABC 258 8965 3620.646 2690 5344.35 1793 724.1293
ABC 402 6213 2531.266 1864 3681.73 1243 506.2532
ABC 402 4365 1771.483 1310 2593.52 873 354.2965
ABC 418 2952 1187.545 886 1764.46 590 237.5089
ABC 418 6729 2728.152 2019 4000.85 1346 545.6304
ABC 441 7380 3014.342 2214 4365.66 1476 602.8683
ABC 441 7263 2970.587 2179 4292.41 1453 594.1173
ABC 520 6359 2567.103 1908 3791.9 1272 513.4206
ABC 520 8120 3297.159 2436 4822.84 1624 659.4317
ABC 556 1198 486.1772 359 711.82 240 97.23544
ABC 556 3849 1576.375 1155 2272.63 770 315.2749
ABC 560 3209 1287.226 963 1921.77 642 257.4452
Output results
Unusual or error conditions may be detected using the “same, same, different” test. An example during
a review of invoice transactions would be two invoice payments which had the same vendor, same
invoice number, same date, but different amounts. Similarly, during a review of the employee master
file, two records might be identified which have the same employee last name, same employee first
name, same city, same street, but different social security numbers. The purpose of the same, same,
different procedure is to identify any such records, if they exist.
The test is performed using the names of the columns to be tested.
The Text Box holds the names of the columns to be tested for same, same, different, separated
by commas. The last column specified is that which is tested for being
different. For example, in the invoice example above, the testing
specification would be “[Vendor Number],[Invoice Number],[Invoice date],
[Invoice Amount]” (without the quotes).
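The same, same, different logic can be sketched as a grouping on all but the last column, followed by a check for more than one distinct value in the last column. The function name same_same_different and the dictionary rows are assumptions for illustration.

```python
from collections import defaultdict

def same_same_different(rows, columns):
    """Return rows that agree on every column but the last, yet differ on the last."""
    *same, different = columns
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in same)].append(row)
    return [
        row
        for group in groups.values()
        if len({r[different] for r in group}) > 1  # more than one distinct value
        for row in group
    ]
```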
Usage Example 1
In an audit of accounts payable, test for the unusual situation described above.
Approach – using the “ssd” command, analyze the transactions.
Audit Command values
Column value – [blank]
Text Box – [Vendor Number],[Invoice Number],[Invoice date],[Invoice
Amount]
Where – (empty)
Results
A schedule of any transaction pairs which have the same vendor number, invoice number,
invoice date, but a different invoice amount.
Usage Example 2
In an audit of payroll transactions, check for any pair of records which have the same employee last
name, same employee first name, same street address, but different employee numbers. Tests are to
be made only for those employees in Florida, Georgia and Alabama.
Approach – using the “ssd” command, analyze such transactions.
Audit Command values
Column value – [empty]
Text Box – [last name],[first name], [street address], [employee number]
Where – state in ('FL','GA','AL')
Results
Schedule of any such records identified.
The example below illustrates the procedure for identifying instances of fixed asset records which have
the same tag number but a different location.
Output results
Same, Same, Different
The system provides for four primary types of trend line analysis:
Trend lines
The purpose of the trend line procedure is to perform a “best fit” linear regression test on transaction
data, and then calculate both confidence intervals and prediction intervals in order to determine if any
amounts might lie outside these bounds. Any such amounts might be tested by the auditor to ensure
that they do not represent errors.
Usage Example 1
Comparative income statements exist for the last five years. In this test, a trend analysis on the Sales
amounts will be performed. (The amounts shown are actual amounts from a Standard & Poor's report for a
Fortune 500 company.)
Since the data is in horizontal format, the check box “Rows” is checked before the data is copied from
Excel and pasted into the form.
Output results
Trend Line
Output results show the basic trend line information – intercept, slope and correlation coefficient.
The slope is negative because the information goes back in time. The correlation of 83% indicates a
fairly consistent trend over time.
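The three values reported (intercept, slope and correlation coefficient) come from an ordinary least-squares fit. The sketch below shows the standard calculation; the sales figures are hypothetical and are listed most recent year first, which is why the slope comes out negative. Whether the system reports the correlation as r or in some other form is not stated in this guide.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line y = intercept + slope * x, plus Pearson correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Hypothetical sales amounts; position 1 is the most recent year.
positions = [1, 2, 3, 4, 5]
sales = [520.0, 495.0, 470.0, 455.0, 410.0]
intercept, slope, r = linear_fit(positions, sales)
print(intercept, slope, r)
```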
Output results
The purpose of the timeline analysis command is to summarize and chart key information from transaction
data over a time period in order to see underlying trends or to identify potential anomalies or errors.
Built into the functionality is the ability to “drill down” using various criteria and also to view the
summarized information using various measures such as counts, totals, averages, etc. Output is a
detail report which identifies potential variances, as well as a chart so that the summarized information
may be more easily viewed.
To run the analysis, five pieces of information are needed:
1. Name of the date column to be used, i.e. the name of the column which contains the
transaction date to be used for the analysis.
2. Name of the amount column, i.e. the column containing the numeric information
being analyzed
3. The time interval to be used for the analysis, specified as a single letter, and which
must be one of the following:
a. monthly, specified using ‘m’
b. quarterly, specified using ‘q’
c. annually, specified using ‘y’
d. weekly, specified using ‘w’
e. daily, specified using ‘d’
4. The type of metric to be applied, which must be one of the following:
a. summary, specified as ‘sum’,
b. count, specified as ‘count’
c. average, specified as ‘avg’,
d. minimum value, specified as ‘min’
e. maximum value, specified as ‘max’,
f. standard deviation, specified as ‘stdev’
5. The confidence level, a number between 0 and 1. The default value is .95, i.e. a 95%
confidence level
With this information, the system will aggregate the data using the time period specified and the type of
aggregation desired. The results will be written out as a text file and also plotted on a chart.
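The aggregation step can be pictured as follows. This is a minimal sketch, assuming that weeks are ISO weeks and that the standard deviation is the sample standard deviation; the guide does not say which definitions the system uses. The confidence level is not used in this step and is omitted here.

```python
from collections import defaultdict
from datetime import date
import statistics

def period_key(d, interval):
    """Map a date to its period for interval 'y', 'q', 'm', 'w' or 'd'."""
    if interval == "y":
        return (d.year,)
    if interval == "q":
        return (d.year, (d.month - 1) // 3 + 1)
    if interval == "m":
        return (d.year, d.month)
    if interval == "w":
        iso = d.isocalendar()          # assumption: ISO week numbering
        return (iso[0], iso[1])
    if interval == "d":
        return (d.year, d.month, d.day)
    raise ValueError("interval must be one of y, q, m, w, d")

METRICS = {
    "sum": sum,
    "count": len,
    "avg": statistics.mean,
    "min": min,
    "max": max,
    # assumption: sample standard deviation, zero for a single value
    "stdev": lambda v: statistics.stdev(v) if len(v) > 1 else 0.0,
}

def aggregate(records, interval, metric):
    """records: iterable of (date, amount) pairs."""
    buckets = defaultdict(list)
    for d, amount in records:
        buckets[period_key(d, interval)].append(amount)
    fn = METRICS[metric]
    return {k: fn(v) for k, v in sorted(buckets.items())}

# Hypothetical invoice data.
records = [(date(2007, 1, 15), 100.0), (date(2007, 2, 20), 50.0), (date(2007, 4, 2), 25.0)]
print(aggregate(records, "q", "sum"))
print(aggregate(records, "m", "count"))
```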
Usage Example 1
In an audit of accounts payable, the auditor wishes to see a trend as to invoice totals for a specified
vendor, by quarter, in order to view the overall trend and to see if there may be any unusual items such
as “spikes”, missing data, etc.
The date column to be used is called “invoice date”, and the amount column to be analyzed is called
“invoice amount”. Tests are to be done at a 95% confidence level. The command would be as follows:
[invoice date], [invoice amount], q, sum, .95
Usage Example 2
Continuing with the same example, the auditor now wants to see transaction counts by month. The
command would then be as follows:
[invoice date], [invoice amount], m, count, .95
The command box above performs a time line analysis of asset acquisitions using the “cost” column,
and specifying a period of “q” (quarterly) with a precision of 95%.
The chart produced is shown below.
The chart indicates that there were few or no asset acquisitions prior to the first quarter of 2004. To
get a more representative picture, the procedure can be re-run, specifying just asset acquisitions made
after January 1, 2004.
Output results
Time line analysis
There is also a text report which has all the details. Below is that data imported into Excel.
Output results
Queries can now be further refined. The next query obtains the same information by month,
changing only the period parameter from a ‘q’ to an ‘m’.
Confidence Band
The purpose of the confidence band procedure is to perform a linear regression test on transaction data,
and then calculate both confidence intervals and prediction intervals in order to determine if any
amounts might lie outside these bounds. Any such amounts might be tested by the auditor to ensure
that they do not represent errors.
Usage Example 1
The chart shows that there is a fair overall correlation between the data (86.3%). However, for one data
point the repair costs are well outside the expected range. This might be an area the auditor could focus
on.
Output results
Confidence Band
Output results (pasted into Excel work sheet – emphasis added, formatting performed for clarity)
Linear regression report:
Equation: y = b + mx
Intercept: 5505.15584475063
Slope: 6.61707235425678E-02
Correlation: 35%
Precision: 0.9
Desc                 X         Y Predicted Lower Prediction Lower Confidence Predicted Upper Confidence Upper Prediction  Comment
Wake         19,758.00  6,737.81  6,812.56        -1,028.65        -1,027.45  6,812.56        14,652.56        14,653.76
Mecklenberg  14,097.00  6,248.66  6,437.96         3,231.92         3,234.85  6,437.96         9,641.08         9,644.01
New Hanover  12,518.00  6,180.84  6,333.48         4,418.72         4,423.63  6,333.48         8,243.33         8,248.24
Johnston     12,121.00  6,231.25  6,307.21         4,716.58         4,722.49  6,307.21         7,891.93         7,897.84
Person       11,838.00  6,208.12  6,288.48         4,928.60         4,935.52  6,288.48         7,641.45         7,648.37
Dansbury      7,957.00  8,213.17  6,031.68         4,199.87         4,205.00  6,031.68         7,858.35         7,863.48  observed greater than upper prediction; observed greater than upper confidence
Smythe       18,731.00  6,623.40  6,744.60          -255.53          -254.19  6,744.60        13,743.39        13,744.73
Jackson       2,465.00  5,488.28  5,668.27          -658.25          -656.76  5,668.27        11,993.30        11,994.78
Gregory      14,380.00  6,323.13  6,456.69         3,019.05         3,021.78  6,456.69         9,891.60         9,894.33
Altenberg    13,612.00  6,330.88  6,405.87         3,596.66         3,600.00  6,405.87         9,211.74         9,215.08
Jamestown    16,769.00  6,691.96  6,614.77         1,221.32         1,223.06  6,614.77        12,006.49        12,008.23
Flurry        1,880.00  5,430.37  5,629.56        -1,176.03        -1,174.65  5,629.56        12,433.76        12,435.14
Snow         15,366.00  6,443.21  6,521.94         2,277.20         2,279.41  6,521.94        10,764.46        10,766.67
Bear            790.00  5,307.48  5,557.43        -2,140.82        -2,139.60  5,557.43        13,254.46        13,255.68
Rugged        3,488.00  5,615.62  5,735.96           247.16           248.87  5,735.96        11,223.05        11,224.76
PineLake      4,154.00  5,691.17  5,780.03           836.55           838.45  5,780.03        10,721.60        10,723.50
FireStorm     3,083.00  5,427.82  5,709.16          -111.28          -109.67  5,709.16        11,527.99        11,529.60
Fern Valley  10,354.00  6,032.78  6,190.29         5,993.84         6,049.51  6,190.29         6,331.06         6,386.73  observed less than lower confidence
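The Comment column can be understood as a simple comparison of the observed Y value against the four bounds on the same row. The sketch below is an assumption about that comparison, checked only against the Dansbury, Fern Valley and Wake rows of the report; the wording "observed less than lower prediction" does not appear in the report and is added by analogy.

```python
def band_comments(observed, lower_pred, lower_conf, upper_conf, upper_pred):
    """Return the list of bounds that the observed value falls outside of."""
    comments = []
    if observed > upper_pred:
        comments.append("observed greater than upper prediction")
    if observed > upper_conf:
        comments.append("observed greater than upper confidence")
    if observed < lower_conf:
        comments.append("observed less than lower confidence")
    if observed < lower_pred:
        # wording assumed by analogy; not shown in the sample report
        comments.append("observed less than lower prediction")
    return comments

# Values taken from the report rows for Dansbury, Fern Valley and Wake.
dansbury = band_comments(8213.17, 4199.87, 4205.00, 7858.35, 7863.48)
fern_valley = band_comments(6032.78, 5993.84, 6049.51, 6331.06, 6386.73)
wake = band_comments(6737.81, -1028.65, -1027.45, 14652.56, 14653.76)
print(dansbury, fern_valley, wake)
```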
Output results
The purpose of the confidence band (time series) procedure is to perform a linear regression test on
transaction data, and then calculate both confidence intervals and prediction intervals in order to
determine if any amounts might lie outside these bounds. Any such amounts might be tested by the
auditor to ensure that they do not represent errors.
Usage Example 1
In an audit of transportation expenses, there is a need to determine if there is a linear relationship
between mileage and annual maintenance expenses
Approach – using the “confband2” command, test such a relationship.
Audit Command values
Column value –N/A
Text Box – year, month, x, y
Where – (empty)
Results
A trend line chart over time with confidence and prediction intervals for the linear relationship.
Output results
Confidence Band
Confidence Band
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
The chart indicates that there is a good correlation (98.7%) between the claim amount and the ffp
amount. The correlation should be 100%. Further checking is needed at the account level.
Output results - chart
Duplicate invoices may arise due to a variety of circumstances, even when system edits are in
place. One example is where two invoices from the same vendor for the same amount are entered,
where one invoice number is a slight variation of the other, e.g. a transposition. In cases like this,
the system may not necessarily recognize that the invoices are duplicates.
The purpose of the near miss procedure is to identify potential duplicate invoices by checking for
any combination of two invoices which meet the following criteria:
First invoice - vendor 123, amount $100.00, date 8/18/2009, invoice number 10023
Second invoice - vendor 123, amount $100.00, date 9/5/2009, invoice number 10032
If the specification for the identification of duplicates were 30 days and a Levenshtein distance of 2,
these two invoices would be flagged as potential duplicates.
For this test, the input data does not need to be sorted. However, the comparison process is
computationally intensive, so invoices from any one vendor are tested in blocks of up to 200 in
count. Generally, the system will identify potentially duplicate invoices based upon the criteria
provided, but it is possible that for vendors with a large number of invoices, two potentially duplicate
invoices could be missed.
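The comparison of one pair of invoices can be sketched as below. The Levenshtein function is the standard dynamic-programming calculation; the field names and the pairing rule (same vendor, same amount, dates within the day limit, invoice numbers within the distance limit) are assumptions made for illustration, and the blocking into groups of 200 is not shown. The two sample invoices use dates 18 days apart.

```python
from datetime import date

def levenshtein(a, b):
    """Minimum number of insertions, deletions and changes turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # change (or match)
        prev = curr
    return prev[-1]

def is_near_miss(inv1, inv2, max_days=30, max_distance=2):
    """Hypothetical pairing rule for two invoice records (dicts)."""
    return (inv1["vendor"] == inv2["vendor"]
            and inv1["amount"] == inv2["amount"]
            and abs((inv1["date"] - inv2["date"]).days) <= max_days
            and levenshtein(inv1["invno"], inv2["invno"]) <= max_distance)

first = {"vendor": "123", "amount": 100.00, "date": date(2009, 8, 18), "invno": "10023"}
second = {"vendor": "123", "amount": 100.00, "date": date(2009, 9, 5), "invno": "10032"}
print(levenshtein(first["invno"], second["invno"]), is_near_miss(first, second))
```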
Output results
Invoice “Near Miss”
Output results (pasted into Excel work sheet)
Near Miss Report
Vendno Amt Inv Date Second Date Invno Suspect Invno Closeness
V200 103.02 5/31/2007 5/31/2007 2103 4
V200 103.02 6/2/2007 5/31/2007 2103 4
V200 103.02 6/2/2007 5/31/2007 0
V201 186.01 5/26/2007 5/26/2007 2186 2186 0
V202 647.82 4/29/2007 4/29/2007 20647 2647 1
V202 647.82 4/29/2007 4/29/2007 2467 2647 2
V202 647.82 4/29/2007 4/29/2007 2467 20647 2
V202 647.82 4/29/2007 4/29/2007 2647 2647 0
V202 647.82 4/29/2007 4/29/2007 2647 20647 1
V202 647.82 4/29/2007 4/29/2007 2647 2467 2
Split invoices
The purpose of the split invoice test is to determine if an invoice may have been paid as a single amount
and then also paid with multiple payments totaling the invoice amount. As an example, an invoice in the
amount of $2,700 consisting of three line items of $1,000, $900 and $800 may have been paid once as
$2,700 and then three additional payments made of $1,000, $900 and $800. The test for split invoices
uses certain auditor parameters to determine whether an invoice amount should be considered, namely
the length of time between amounts.
The parameter is the maximum number of days apart that two payments may be in order to be considered. For example, the
auditor may wish to consider only those payments to a vendor that are within 10 days of each other as
part of the test for split invoices. Any payment amounts made more than ten days apart would then not
be considered as part of the split invoice test.
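One way to picture the search is shown below. This is a sketch under stated assumptions, not the system's algorithm: amounts are held in cents to avoid rounding problems, each partial payment must fall within the day limit of the single payment, and at most three partial payments are combined. The payment data are hypothetical.

```python
from datetime import date
from itertools import combinations

def find_split_payments(payments, max_days=10, max_parts=3):
    """payments: list of (vendor, amount_in_cents, payment_date) tuples."""
    results = []
    for i, (vendor, total, paid_on) in enumerate(payments):
        # Other, smaller payments to the same vendor inside the time window.
        candidates = [p for j, p in enumerate(payments)
                      if j != i and p[0] == vendor and p[1] < total
                      and abs((p[2] - paid_on).days) <= max_days]
        for size in range(2, max_parts + 1):
            for combo in combinations(candidates, size):
                if sum(p[1] for p in combo) == total:
                    results.append(((vendor, total, paid_on), combo))
    return results

# Hypothetical payments: 186.01 paid once, then again as 100.00 + 86.01.
payments = [
    ("V201", 18601, date(2007, 5, 26)),
    ("V201", 10000, date(2007, 5, 27)),
    ("V201", 8601, date(2007, 5, 28)),
    ("V202", 64782, date(2007, 4, 29)),
]
results = find_split_payments(payments, max_days=30)
for single, parts in results:
    print(single, parts)
```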
Usage Example 1
A test of invoices is made to determine if any potential “split invoice” payments can be identified. The
names of the column values to be tested are as follows:
Column name Description
Vendor Vendor number
InvNo Invoice Number
InvDate Invoice Date
InvAmt Invoice Amount
Tests are to be made for invoices with dates up to 30 days apart.
The values entered into the form are shown below.
Output results
Split invoices
Output results (pasted into Excel work sheet)
Split Invoice Report
Vendno Inv No Inv No2 Amount Amount2 Amount 3 Diff
V201 2186 2186 86.01 186.01 100 2 30
V201 2186 2186 100 186.01 86.01 2 30
These results indicate that there was an invoice paid in the amount of $186.01. In addition, two other
invoices to the same vendor, within the specified time period were paid which also totaled to $186.01 =
$100.00 + $86.01.
Output results
The purpose of testing for Social Security number validity is to identify any Social Security numbers
which would be considered invalid according to the criteria published on the site of the Social Security
Administration. The test considers several factors:
• Ranges of numbers issued
• Certain digits or ranges which are automatically invalid
• The highest number assigned for an area
Note: The social security number ranges are published monthly by the Social Security
Administration.
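The "automatically invalid" digits can be illustrated with a structural check. The sketch below covers only the well-known rules (area 000, 666 or 900 through 999; group 00; serial 0000). It does not implement the check against the highest number assigned for an area, which depends on the lists published by the Social Security Administration, and it is not the system's own code.

```python
import re

def ssn_is_structurally_valid(ssn):
    """Check digit-pattern rules only; says nothing about actual issuance."""
    digits = re.sub(r"\D", "", ssn)   # drop dashes, spaces, etc.
    if len(digits) != 9:
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area == "000" or area == "666" or area >= "900":
        return False
    if group == "00" or serial == "0000":
        return False
    return True

# Placeholder values only; none is a real Social Security number.
for candidate in ["000-12-3456", "666-12-3456", "900-12-3456",
                  "123-00-4567", "123-45-0000", "123-45-678"]:
    print(candidate, ssn_is_structurally_valid(candidate))
```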
Usage Example 1
A test of validity of social security numbers is to be performed on data where the social security number
column is named “SSN”.
Audit Command values
Column value – [SSN]
Text Box – (empty)
Where – (empty)
Results
A list of all records where the social security number is invalid.
The input form used to perform the checking is shown below.
Output results
Output results (pasted into Excel work sheet – not all rows shown – no social security numbers shown are
valid – highlight added for emphasis)
SSN                                LASTNAME        FIRSTNAME  MIDNAME  DOB         ADDRESS                    CITY
NOT A REAL SOCIAL SECURITY NUMBER  BLACKBURN       BLAKE               1/15/1930   P O BOX 196                AGURA HILLS
NOT A REAL SOCIAL SECURITY NUMBER  NYMAN           WOODROW    A        1/24/1930   10013 S RHODES             MONMOUTH JUNCTION
NOT A REAL SOCIAL SECURITY NUMBER  MCMULLAN        CLAYBORN            1/29/1930   931 E HOPE ST              WESTPORT
NOT A REAL SOCIAL SECURITY NUMBER  WEINREB         DEBBIE              5/12/1930   818 KIRKWOOD ST            HOLLIS
NOT A REAL SOCIAL SECURITY NUMBER  DIAZ            CHARLENE            5/18/1930   C/O 3420 NE 168TH ST       PELHAM
NOT A REAL SOCIAL SECURITY NUMBER  NANCE           YVONNE     A        8/15/1930   10 RAINBOW LANE            GRANADA HILLS
NOT A REAL SOCIAL SECURITY NUMBER  RUSSELL         MELISSA    JAMES    8/30/1930   237 MASTEN RD              EGGERTSVILLE
NOT A REAL SOCIAL SECURITY NUMBER  BARBOUR         ANTHONY             10/22/1930  P O BOX 630, #79729-004    ROCKVILLE CTR
NOT A REAL SOCIAL SECURITY NUMBER  STONER          JO         MIGUEL   4/17/1931   4595 HYLAND BLVD           COLEMAN
NOT A REAL SOCIAL SECURITY NUMBER  PEPIN           LINDA      L        6/30/1931   311 BRIDGE ST              DECATUR
NOT A REAL SOCIAL SECURITY NUMBER  MCNAMARA        TIMOTHY    ALICE    12/30/1931  11120 NW LOS ALTOS ROAD    GAINESVILLE
NOT A REAL SOCIAL SECURITY NUMBER  CASTRO          LOUIS      L        1/22/1932   300 MAIN STREET            ROCHESTER
NOT A REAL SOCIAL SECURITY NUMBER  CAPLES          ANGELA              1/25/1932   P O BOX 8103               READING
NOT A REAL SOCIAL SECURITY NUMBER  SCHWANDT        LOUIS      L        1/30/1932   3000 MURWORTH DR, APT 511  SPOKANE
NOT A REAL SOCIAL SECURITY NUMBER  FISHKIN         AVANELL             4/23/1932   P O BOX 496                MIAMI
NOT A REAL SOCIAL SECURITY NUMBER  MOORE           LEROY      LANG     7/1/1932    3201 KNIGHT ST, APT 1402   KENNER
NOT A REAL SOCIAL SECURITY NUMBER  BAJZA           MEGAN      JEAN     7/9/1933    241 FARNOL ST, SW          PRESCOTT
NOT A REAL SOCIAL SECURITY NUMBER  BROWN           BRIDGETTE           8/2/1933    P O BOX 1032, #79399-004   CAMP VERDE
NOT A REAL SOCIAL SECURITY NUMBER  WHITE           MARK       K        9/7/1933    269 EAST S STREET          DOWNERS GROVE
NOT A REAL SOCIAL SECURITY NUMBER  BUTCHER         HARRIET    S        3/13/1934   5771 DEXTER CIRCLE         KNOXVILLE
NOT A REAL SOCIAL SECURITY NUMBER  VANGRAEFSCHEPE  JASON      PARAMA   3/26/1934   501 N 13TH AVENUE          CHARLESTON
Output results
The purpose of the check P.O. Box command is to examine addresses for any indication that an address is a Post
Office Box. Because there are many ways in which a Post Office Box address can be coded, a
procedure devoted to just this type of test is provided. For example, the address may contain “PO Box”,
“POB”, “P.O. Box”, etc.
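A pattern search of this kind might look like the sketch below. The regular expression is an illustration covering only the variations named above plus the spelled-out form; it is an assumption, not the pattern the system uses, and real data will contain variations it misses.

```python
import re

# Matches "PO Box", "P.O. Box", "P O BOX", "POB" and "Post Office Box",
# ignoring case. Illustrative only.
PO_BOX = re.compile(r"\b(?:P\.?\s*O\.?\s*BOX|POB|POST\s+OFFICE\s+BOX)\b",
                    re.IGNORECASE)

def looks_like_po_box(address):
    return PO_BOX.search(address) is not None

for address in ["P O BOX 196", "P.O. Box 12", "POB 77",
                "Post Office Box 9", "300 MAIN STREET", "12 POBLANO ST"]:
    print(address, looks_like_po_box(address))
```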
In audits of disbursements made based upon an accounts payable system, one of the audit tests
commonly performed is to test for vendors whose address is a post office box. Generally, vendors
should have a street address where they receive their mail. In certain instances, fraudulent payments
have been made to vendors using a post office box in order to disguise the true nature of the payment,
which may be associated with an employee of the company making the payment.
Although it is possible to visually check for post office boxes in addresses, the process can be tedious
and time consuming, especially if a large number of records are involved. One of the challenges is
simply the ability to recognize many of the variations possible in the designation of a post office box in
an address. For example, the address might be structured in any of the following formats:
Example 1
Search the column named “Address1” in the vendor master for addresses which might be post office
boxes.
Output results
Calculated Values
In many instances the auditor wishes to add a column of data, e.g. a calculated amount, based
upon values contained in other columns.
A common procedure used during the analysis of data in Excel is to insert one or more columns and
calculate their values using formulas based on values contained in other columns. Although
this procedure is effective, it has the drawback that column letters must be used instead of column
names, which makes interpreting and verifying the formulas more difficult.
The purpose of the calculated values procedure is to add one or more columns to a work sheet
using formulas with column names. Often the formula will consist of mathematical operations, but any
SQL function may be used (see list of functions in description of where clause values).
The syntax for the calculated values is "expression1 as name1, expression2 as name2" etc. where
"expression" is a calculated value. The word "as" must be used without change, and "name" must
be a description beginning with a letter and consisting of only letters, numbers and the special
characters "$", "_". If the name contains any embedded spaces, then the entire name must be enclosed
in brackets, e.g. "[cost amount]".
Examples -
Add a column called net book value computed as cost less accumulated depreciation
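For this example the specification might read "Cost - AD as [net book value]", assuming the cost and accumulated depreciation columns are named Cost and AD as in the fixed asset data used in this guide. The sketch below runs that expression with Python's built-in sqlite3 module, which serves only as a stand-in SQL engine here; the table name "assets" is hypothetical and the system's own SQL dialect may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (TagNo INTEGER, Cost REAL, AD REAL)")
conn.executemany("INSERT INTO assets VALUES (?, ?, ?)",
                 [(3504, 2438, 988.0542), (4148, 3244, 1301.21)])

# "expression as name": the name has an embedded space, so it is bracketed.
calculated = "Cost - AD as [net book value]"
rows = conn.execute(
    f"SELECT TagNo, {calculated} FROM assets ORDER BY TagNo").fetchall()
for tag, net_book_value in rows:
    print(tag, round(net_book_value, 2))
```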
Output results
Calculated Values
Output results (pasted into Excel work sheet – first column highlighted for emphasis)
property tax TagNo Cost AD Replace Bookval Salvage Depr Life Location AcqDate
72.49729037 3504 2438 988.0542 731 1449.95 488 197.6108 6 ABC 4/6/2005
97.1394758 4148 3244 1301.21 973 1942.79 649 260.2421 5 ABC 2/3/2006
274.2308104 3302 9163 3678.384 2749 5484.62 1833 735.6768 8 ABC 10/15/2004
146.6431954 3816 4937 2004.136 1481 2932.86 987 400.8272 4 ABC 7/8/2005
240.3376714 3411 8118 3311.247 2435 4806.75 1624 662.2493 5 ABC 2/9/2007
245.5702876 2547 8258 3346.594 2477 4911.41 1652 669.3188 9 ABC 5/26/2007
94.12422075 1701 3143 1260.516 943 1882.48 629 252.1031 11 ABC 9/30/2005
265.6780722 3960 8955 3641.439 2686 5313.56 1791 728.2877 3 ABC 12/8/2005
85.70210075 5056 2885 1170.958 866 1714.04 577 234.1916 5 ABC 3/24/2005
47.82652079 2996 1596 639.4696 479 956.53 319 127.8939 3 ABC 10/7/2005
93.07851995 1299 3115 1253.43 934 1861.57 623 250.6859 12 ABC 3/4/2006
66.92986036 2881 2244 905.4028 673 1338.6 449 181.0806 8 ABC 3/6/2006
30.4 2791 3039 2431 912 608 608 761.4 12 ABC 3/17/2007
155.8641946 1443 5240 2122.716 1572 3117.28 1048 424.5432 12 ABC 11/17/2004
42.23143191 1202 1416 571.3714 425 844.63 283 114.2743 6 ABC 6/5/2007
172.5694554 3567 5776 2324.611 1733 3451.39 1155 464.9222 11 ABC 12/5/2004
79.1798243 5010 2645 1061.404 794 1583.6 529 212.2807 10 ABC 9/28/2006
91.2098218 4163 3048 1223.804 914 1824.2 610 244.7607 4 ABC 12/19/2005
271.3595988 1306 9177 3749.808 2753 5427.19 1835 749.9616 7 ABC 9/17/2006
11.65 5205 1165 932 350 233 233 95.43749 8 ABC 4/8/2006
73.8564635 4219 2500 1022.871 750 1477.13 500 204.5741 3 ABC 7/10/2006
17.93122414 1384 603 244.3755 181 358.62 121 48.8751 12 ABC 1/25/2006
284.3327576 3914 9578 3891.345 2873 5686.66 1916 778.269 4 ABC 8/19/2005
44.18759538 4323 1482 598.2481 445 883.75 296 119.6496 7 ABC 3/16/2007
143.4290984 4758 4829 1960.418 1449 2868.58 966 392.0836 9 ABC 2/3/2006
79.19611735 3213 2669 1085.078 801 1583.92 534 217.0155 11 ABC 5/21/2006
Output results
The technique of measuring the difference between text values based upon Levenshtein distance
was developed by a Russian mathematician. The technique measures the number of steps required
to make two character values match based upon additions, changes and deletions of text. It is
particularly useful in identifying transpositions or other instances in which the difference between
two text strings is minimal. The number of steps required to make the change is referred to as the
"Levenshtein distance".
Usage Example 1
The difference between any two character strings may be measured using the "Levenshtein
distance". This concept was developed by the Russian mathematician Vladimir Levenshtein and defines the
distance as the minimum number of character additions, deletions and changes necessary to
transform one character string into another.
For auditors, the concept is applicable to searches for character strings which represent only very
minor differences between two character strings. For example, the name "McMillan" is similar, but
not identical to "McMillun". In this case the distance would be one, because only a single change
from the letter "a" to the letter "u" is necessary for them to be identical. As another example,
transpositions will represent a Levenshtein distance of 2, as both an insertion and a deletion are required
in order for the two strings to be identical.
Common uses for the algorithm can be found in searches where an exact match is not found, but
two or more instances may be identified which are "close". Such searches might be needed in
looking at vendor master files, checking for potentially duplicate invoice numbers or any other
situation where two or more instances might be found which are close, but not identical.
The test can be performed on either a single column by specifying the column name, or else on all
columns (by omitting the column name). If the test is to be done ignoring case, then the command
"UCASE" should be specified for the column name, e.g. Ucase(lastname). If leading and trailing
spaces are to be ignored the "TRIM" command should be specified, e.g. Trim(address).
The search specification is made by providing the text to search against, as well as the maximum
distance to be considered. The following are examples of usage:
Check for address like 108 Fallsworth, trim any spaces on left and right
Output results
This schedule is the results of a search for a record with a last name of ‘MCMILLAN’ with a Levenshtein
distance of 2. In this example, a single character ‘U’ could be replaced with an ‘I’ to obtain the match
desired. This was the only instance identified in the search that was within a Levenshtein distance of 2.
Output results
Selection of subsets of data within a worksheet based upon more complex matching patterns is possible
using the "fuzzy match" command. As an example, the auditor may wish to select all records for asset
tag numbers that begin with "98", followed by any character or digit and then contain the digit "5". Other
examples include all store locations beginning with the letters "A" through "C", followed by two digits and
then one or more of any characters. All of these matches can be done using the technique of "regular
expressions".
There is fairly extensive documentation on how regular expressions work, but they generally consist of
one or more special search characters with the following meanings -
• [!A-H] - match any single character, except the letters "A" through "H"
In order to do fuzzy matching, the auditor sets
Usage Example 1
A search is to be made of employee last names where the first letter is “H” and the second letter is
any of the characters “E” through “I”. The last name to be matched can contain two or more letters
in total. The search specification is shown in the form below.
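The two selections described in this section can be tried out with Python's fnmatch module, whose bracket syntax ([E-I], [!A-H]) happens to resemble the pattern shown above. This is a stand-in for illustration; whether the system's matching rules agree with fnmatch in every detail is not confirmed by this guide, and the sample values are hypothetical.

```python
from fnmatch import fnmatchcase

def select_matching(values, pattern):
    """Return the values that match the whole pattern (case-sensitive)."""
    return [v for v in values if fnmatchcase(v, pattern)]

# First letter "H", second letter "E" through "I", two or more letters in total.
last_names = ["HERNANDEZ", "HILL", "HARRIS", "HE", "H", "SHELTON"]
matches = select_matching(last_names, "H[E-I]*")

# Tag numbers beginning with "98", then any one character, then the digit "5".
tags = select_matching(["98A5", "9815X", "9875", "98A6", "985"], "98?5*")
print(matches, tags)
```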
Output results
Sequential invoices
Generally vendors do not issue sequentially numbered invoices to the same customer, except in
unusual situations or in cases where they have only a single customer. Sequential invoice checking is
a test to determine which vendors of your organization may have only one customer - your
organization.
The system does the checking by first sorting the invoice data by vendor and invoice number and
then checking if any two invoices represent sequential numbers, i.e. they have a numeric difference
of one. For any such instance identified, all the detail information for both invoices is listed in a
report for review.
To perform the test, only the name of the vendor number column and the name of the column
containing the invoice number need to be provided.
As a simple example, suppose that vendor invoice data is to be tested for sequential invoices and
that the name of the column identifying the vendor is called "Vend_No" and the name of the column
containing the invoice number is "Invoice_No". The command to perform the check would then be
"Vend_No, Invoice_No" (without the quotes).
Note that any non-numeric values are removed from the invoice number before a comparison is
performed. Thus an invoice number "C102345B" would be transformed to "102345" for purposes of
the test.
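The steps described above (strip non-numeric characters, sort by vendor and invoice number, look for a difference of one) can be sketched as follows. The function names and sample rows are hypothetical, and the handling of invoice numbers with no digits at all (skipped here) is an assumption.

```python
import re
from collections import defaultdict

def numeric_part(invoice_no):
    """Remove non-numeric characters, e.g. 'C102345B' -> 102345."""
    digits = re.sub(r"\D", "", str(invoice_no))
    return int(digits) if digits else None

def sequential_invoices(rows):
    """rows: iterable of (vendor, invoice_no). Returns (vendor, inv1, inv2) pairs."""
    by_vendor = defaultdict(list)
    for vendor, invoice_no in rows:
        n = numeric_part(invoice_no)
        if n is not None:               # assumption: skip numbers with no digits
            by_vendor[vendor].append((n, str(invoice_no)))
    pairs = []
    for vendor in sorted(by_vendor):
        ordered = sorted(by_vendor[vendor])
        for (n1, inv1), (n2, inv2) in zip(ordered, ordered[1:]):
            if n2 - n1 == 1:
                pairs.append((vendor, inv1, inv2))
    return pairs

# Hypothetical, unsorted input.
rows = [("V201", "2187"), ("V202", "C102345B"), ("V201", "2186"),
        ("V202", "102350"), ("V200", "2103")]
pairs = sequential_invoices(rows)
print(pairs)
```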
Example 1
Vendor invoice data is to be tested to determine if any vendor has issued sequential invoices. The input
data is not sorted. The test to be selected is “Sequential invoices” as selected from the drop down list of
commands. The name of the column for the vendor number is named “Vendor”. The test is not limited
to any records, so the “where” information is left blank. The “other information” is the name of the
Output results
Sequential invoices
Output results (pasted into Excel work sheet)
Count of sequentially numbered items
V201 : 1
The results indicate that only one vendor (“V201”) had issued a sequential invoice and that vendor
(“V201”) issued just one sequential invoice.
Output results
4.4 Patterns
An example will best illustrate the concept of pattern testing for round numbers. Consider a
case where journal entries are prepared at the end of each month. Generally, journal entry
postings will contain some round numbers. Although somewhat tedious, the auditor could
determine the count of round numbers posted for the year. For example, there might be a total
of 2,000 individual journal entry postings for the year. Of those, 100 (or 5%) were round
numbers, possibly indicating an estimate. If the round number postings were fairly evenly
spread throughout the year, this would indicate that possibly nothing unusual exists, based upon
a comparative test of round numbers. However, if the concentration is in the last month of the
fiscal year (or the first month of the next fiscal period), then this could be a different situation.
Pattern testing is based upon the overall concept outlined above. The procedure first obtains
counts or totals for the entire transaction population. Then the procedure separates the
population based upon criteria specified by the auditor (in the example above posting month)
and then systematically compares each subgroup with the overall population. The system then
reports each group based upon how different it is from the overall population as measured by
the statistical test “Chi Square”.
This same test can also be applied using metrics other than round numbers – e.g. counts by day
of week, counts by holidays, counts by data stratification, etc.
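The comparison of a subgroup with the overall population can be illustrated with the usual Chi Square statistic, the sum of (observed - expected)² / expected over the categories, where the expected count is the subgroup total multiplied by the population proportion. The sketch below is one plausible reading of the procedure, not the system's code; the monthly counts are hypothetical and build on the 2,000 postings and 100 round numbers of the example above.

```python
def chi_square(group_counts, population_counts):
    """Chi Square statistic of a subgroup against population proportions."""
    group_total = sum(group_counts)
    population_total = sum(population_counts)
    statistic = 0.0
    for observed, pop in zip(group_counts, population_counts):
        expected = group_total * pop / population_total
        if expected > 0:
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Categories: [round numbers, other numbers]
population = [100, 1900]
by_month = {
    "June": [10, 190],       # hypothetical: in line with the population (5%)
    "December": [40, 160],   # hypothetical: concentration of round numbers
}
ranked = sorted(by_month, key=lambda m: chi_square(by_month[m], population),
                reverse=True)
for month in ranked:
    print(month, round(chi_square(by_month[month], population), 4))
```

The larger the statistic, the more the subgroup differs from the population, so December is listed first in this example.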
Usage Example 1
Usage Example 2
A test is to be performed for usage of round numbers in general journal entries by the person
preparing the journal entry. The column name for the journal entry preparer is “preparer”.
Usage Example 3
A test is to be performed for usage of round numbers in fixed asset costs by location.
Output results
Pattern analysis using round numbers
An example will best illustrate the concept of pattern testing using stratification. Consider a case where
inventory is being taken at the end of each month at separate warehouse locations. Unless the
warehouses have a significantly different “mix” of items, a stratification of the inventory values by item
will generally follow the same pattern of counts and values. Although somewhat tedious, the auditor
could stratify the amounts manually and then visually compare the results. For example, one
warehouse might have a much larger number of low (or high) value items than the others. Certainly this
could be a valid situation, but it might also represent an error as well.
Pattern testing is based upon the overall concept outlined above. The procedure first obtains counts or
totals for the entire transaction population. Then the procedure separates the population based upon
criteria specified by the auditor (in the example above warehouse) and then systematically compares
each subgroup with the overall population. The system then reports each group based upon how
different it is from the overall population as measured by the statistical test “Chi Square”.
Usage Example 1
In an audit of inventory, the inventory values are known to be clustered in a certain pattern.
Approximately 20% of all inventory items have a value under $100. Then 50% have a value under $200
and 80% have a value under $500. The stratification ranges used to obtain these results were the bin
values of 0, 100, 200, 500
A test is to be made to identify the warehouse location which has inventory values which are the most
different from this pattern as measured using data stratification and the bin values above.
Approach – using the “Pattern - stratification” command, analyze the inventory values.
Audit Command values
Column value – [unit cost]
Text Box – [location],[unit cost], 0, 100, 200, 500
Where – (empty)
Results
A list, by location, of the measures of the difference between the values at that location and
Output results
An example will best illustrate the concept of pattern testing by day of week. Consider a case for the
retail environment. Generally, sales tend to be concentrated on Fridays, Saturdays and Sundays, with
much lesser amounts on say Monday and Tuesday. If the auditor is looking at a group of locations
(stores), then this test can identify which stores have sales patterns that are the most statistically
different, as measured using standard statistical tests. Although differences in patterns may be
explainable, they may also result from errors. Alternative tests can be performed using month of year
instead of store location, etc.
Usage Example 1
In an audit of revenue in a retail environment, determine which store’s revenue was the most different,
based upon analysis by day of week.
Approach – using the “patternwd” command, analyze such transactions.
Audit Command values
Column value – [trans date]
Text Box – [store number],[transdate]
Where – (empty)
Results
A listing of summary results, by store location, in descending order
Usage Example 2
In an audit of journal entries, determine which account’s postings were the most different, based upon
the day of the week they were posted.
Approach – using the “patternwd” command, analyze such transactions.
Audit Command values
Column value – [ posting date]
Text Box – [account number], [posting date]
Where – (empty)
Results
In the example below, a test was performed on asset acquisitions, by day of week.
Output results
4.4.4 Holidays
An example will best illustrate the concept of pattern testing by holiday. Consider a case for the retail
environment. In some cases, sales tend to be concentrated on certain holidays. If the auditor is looking
at a group of locations (stores), then this test can identify which stores have sales patterns that are the
most statistically different, as measured using standard statistical tests. Although differences in patterns
may be explainable, they may also result from errors. Alternative tests can be performed using month of
year instead of store location, etc.
Usage Example 1
In an audit of revenue in a retail environment, determine which store’s revenue was the most different,
based upon analysis by sales on holidays.
Approach – using the “patternhol” command, analyze such transactions.
Audit Command values
Column value – [trans date]
Text Box – [store number], [trans date]
Where – (empty)
Results
A listing of summary results, by store location, in descending order
Usage Example 2
In an audit of journal entries, determine which account’s postings were the most different, based upon
postings made on holidays.
Approach – using the “patternhol” command, analyze such transactions.
Audit Command values
Column value – [posting date]
Text Box – [account number], [posting date]
Where – (empty)
Results
A listing of summary results, by account number, in descending order
Output results
Many accounting transaction amounts tend to follow the digit distribution expected under Benford’s Law
unless there is a compelling reason that they should not (e.g. upper or lower transaction limits,
recurring amounts, etc.).
The pattern test for Benford’s law separates the population into groups and then computes the expected
and observed values using Benford’s law for that group. An example might be inventory counts taken at
various warehouses. Inventory counts should conform with that expected using Benford’s Law. By
applying a pattern test by warehouse, it is possible to identify which warehouse had inventory counts
that differed the most from that expected using Benford’s law.
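A minimal Python sketch of a first-digit (F1) pattern test follows. It groups the amounts, counts leading digits in each group, and compares the counts with the Benford expectation log10(1 + 1/d) using a chi-square statistic. The function names and the use of chi-square as the ranking measure are assumptions for illustration; the statistic Audit Commander reports may differ.

```python
import math
from collections import defaultdict

# Benford's Law: expected proportion of leading digit d is log10(1 + 1/d)
BENFORD_F1 = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Return the first significant digit of a non-zero number, else None."""
    value = abs(value)
    if value == 0:
        return None
    # Scientific notation places the first significant digit first
    return int(f"{value:.15e}"[0])


def benford_scores(records):
    """records: iterable of (group, amount). Returns [(group, chi_square)], largest first."""
    observed = defaultdict(lambda: [0] * 10)
    for group, amount in records:
        digit = first_digit(amount)
        if digit is not None:
            observed[group][digit] += 1

    scores = []
    for group, counts in observed.items():
        n = sum(counts)
        chi_square = sum(
            (counts[d] - n * BENFORD_F1[d]) ** 2 / (n * BENFORD_F1[d])
            for d in range(1, 10)
        )
        scores.append((group, chi_square))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Groups with very few items will produce unstable scores, so a minimum group size would normally be applied before ranking.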
Usage Example 1
In an audit of expense reports, a test is to be made to determine which employee’s expense reports
were the most different from all other expense reports, based upon Benford’s Law.
Approach – using the “patternben” command, analyze expense report transactions.
Audit Command values
Column value – [expense amount]
Text Box – [employee number], [expense amount], F1
Where – (empty)
Results
A listing of summary results, by employee number, in descending order
Usage Example 2
In an audit of inventory counts, a test is to be made to determine which warehouse’s inventory counts
were the most different from those of all other warehouse locations, based upon Benford’s Law.
Approach – using the “patternben” command, analyze inventory count transactions.
Audit Command values
Column value – [inventory count]
Text Box – [warehouse], [inventory count], F1
Where – (empty)
Results
A listing of summary results, by warehouse, in descending order
In the example below, the test was performed using cost amounts at various locations. The Benford’s
Law test was for first digit, F1.
Output results
4.5 Sampling
Compliance testing often relies on attribute sampling when a test is to be based upon a random
sample. If segments of a population are expected to have significantly different rates of
compliance for a tested procedure, then stratified attribute sampling may be appropriate. If not,
then unrestricted sampling will be better.
If the supporting documents for data being audited are contained in a central location, e.g. no
travel or other logistics are involved, then stop and go sampling may be a more efficient and
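effective method for random sampling for the following reasons:
1. There is no need to compute a required sample size,
2. There is no need to perform a preliminary analysis of the population attributes such as
expected error rate, and
3. There is little or no risk in "over sampling", i.e. testing more samples than required and
therefore spending excess audit time doing the testing.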
Stop and Go sampling is a statistically valid process which involves the following steps:
1. Assign a random number to each item in the population (e.g. using "Mersenne Twister"
or other statistically valid random number generator)
2. Sort the population by assigned random number, either ascending or descending
3. Select the first 10 - 20 items (auditor judgment as to number), test them and put the
results into an Excel spreadsheet.
4. Run a "stop and go" sample report and review the results (see example below)
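The first three steps can be sketched in Python as follows. Python's standard random module is itself based on the Mersenne Twister, so it fits the generator named in step 1. The function names and the optional seed are assumptions made for this illustration and are not part of Audit Commander.

```python
import random


def assign_random_order(item_ids, seed=None):
    """Attach a random number to each item and return the items sorted by it."""
    rng = random.Random(seed)  # Mersenne Twister generator
    keyed = [(rng.random(), item) for item in item_ids]
    keyed.sort(key=lambda pair: pair[0])  # ascending by assigned random number
    return [item for _, item in keyed]


def next_batch(ordered_items, already_tested, batch_size):
    """Return the next batch_size items, continuing after those already tested."""
    return ordered_items[already_tested:already_tested + batch_size]
```

Recording the seed allows the same selection order to be reproduced later, for example when the working papers are reviewed.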
The report from the Stop and Go Sample will show the intermediate results, sample
statistics as well as calculate the estimate of the population at four confidence levels -
80%, 90%, 95% and 98%. The results will also be charted for easy review. The charts
show the upper and lower bounds, as well as the point estimate for each calculation.
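How a report of this kind might arrive at its bounds can be sketched in Python. The interval formula used by Audit Commander is not shown in this section, so the sketch substitutes the Wilson score interval, a common choice for binomial proportions. The function names are hypothetical and the figures produced will not necessarily match the tool's output.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(errors, sample_size, confidence):
    """Two-sided Wilson score interval for an observed rate errors / sample_size."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = errors / sample_size
    denom = 1 + z * z / sample_size
    centre = (p + z * z / (2 * sample_size)) / denom
    half_width = z * sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def stop_and_go_report(errors, sample_size, levels=(0.80, 0.90, 0.95, 0.98)):
    """Return {confidence level: (lower bound, upper bound)} for the four levels."""
    return {level: wilson_interval(errors, sample_size, level) for level in levels}
```

Calling stop_and_go_report(2, 25) gives one interval per confidence level; each higher level yields a wider interval around the point estimate of 2 / 25.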
An example of the chart output is shown below (attribute test for signature on
documents as tested in 25 samples):
The chart above presents the results of the attribute sample test visually for four confidence
levels as follows:
1. 80% confidence the rate is between approximately .015 and .021
2. 90% confidence the rate is between approximately .014 and .022
3. 95% confidence the rate is between approximately .013 and .025
4. 98% confidence the rate is between approximately .0125 and .024
These formulas are based upon the article in The American Statistician:
Output results
Stop and Go Attribute
Output results (chart)
The chart below was specified using a custom color scheme and the title shown. These values are
provided using the “Chart” tab on the processing form.
Monetary amounts can be estimated using stratified sampling, especially if the population can be
divided into strata which have less variability. There are techniques for optimizing the selection of
sample size, such as Neyman's allocation method.
If the supporting documents for data being audited are contained in a central location, e.g. no
travel or other logistics are involved, then stop and go sampling may be a more efficient and
effective method for random sampling for the following reasons:
1. There is no need to compute a required sample size,
2. There is no need to perform a preliminary analysis of the population attributes such as
expected error rate, and
3. There is little or no risk in "over sampling", i.e. testing more samples than required and
therefore spending excess audit time doing the testing.
Stop and Go sampling is a statistically valid process which involves the following steps (but note
that it does not comply with the proposed SAS 39):
1. Assign a random number to each item in the population (e.g. using "Mersenne Twister"
or other statistically valid random number generator)
2. Sort the population by assigned random number, either ascending or descending
3. Assign a stratum number to each transaction in the population (typically based upon a
numeric range of values).
4. Obtain a suggested sample allocation based upon Neyman's allocation (or other
methodology)
5. Select the first 10 - 20 items (auditor judgment as to number), test them and put the
results into an Excel spreadsheet.
6. Run a "stop and go" sample report and review the results (see example below)
7. If the resulting sample precision is too large, then select another group of transactions by
sorted assigned random number (auditor judgment as to number)
8. Test the samples and record the results in the same Excel spreadsheet.
9. Run another "stop and go" sample and review the results.
10. Repeat steps 7 through 9 until satisfactory results have been obtained.
The report from the Stop and Go Sample will show the intermediate results, sample statistics as
well as calculate the estimate of the population at four confidence levels - 80%, 90%, 95% and
98%. The results will also be charted for easy review. The charts show the upper and lower
bounds, as well as the point estimate for each calculation.
The chart above presents the results of the variable sample test visually for four confidence
levels as follows:
1. 80% confidence the true population amount is between approximately $110,000 and $218,000
2. 90% confidence the true population amount is between approximately $95,000 and $230,000
3. 95% confidence the true population amount is between approximately $81,000 and $241,000
4. 98% confidence the true population amount is between approximately $67,000 and $259,000
Usage Example 1
The confidence interval is computed using the Student’s t-value, which is obtained from the “Cephes”
software (U.S. Department of Energy).
Output results
One of the first steps in performing a stratified variable sample is to determine the composition of
each stratum, including its variability. With this information it is then possible to perform either 1) a
proportional sample or 2) a disproportionate sample. Generally, auditors will select a disproportionate
sample, as the population typically will not be homogeneous, and thus the sampling should be concentrated
in those strata which have the most variability.
There is a formula which can be used to determine the optimal counts for sampling, which is referred to
as “Neyman’s allocation”.
The purpose of the stratified variable population command is to assess the population values by strata
and suggest a sample plan based upon Neyman’s allocation, i.e. a disproportionate stratified sample.
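A minimal Python sketch of such a sample plan is shown below, assuming uniform sampling costs. Each stratum receives a share of the total sample in proportion to its item count multiplied by its standard deviation. The function name and the input layout are hypothetical, and the rounding rule is an assumption, so the sizes may not add up exactly to the requested total.

```python
def neyman_allocation(strata, total_sample):
    """Suggest a sample size per stratum using Neyman's allocation.

    strata: dict mapping stratum name -> (population_count, standard_deviation).
    Returns a dict mapping stratum name -> suggested sample size.
    """
    # Weight of each stratum is N_h * S_h
    weights = {name: count * sd for name, (count, sd) in strata.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("all strata have zero variability; use proportional sampling")

    plan = {}
    for name, weight in weights.items():
        suggested = round(total_sample * weight / total_weight)
        # A stratum cannot supply more items than it contains
        plan[name] = min(strata[name][0], suggested)
    return plan
```

For example, with strata of 1,000 items (standard deviation 10), 500 items (standard deviation 40) and 100 items (standard deviation 300), a total sample of 60 is split 10, 20 and 30, concentrating the testing in the most variable stratum.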
The “z-score” is computed using the inverse normal function of the Cephes software (US DOE).
Neyman’s allocation is calculated using the following formula:
For purposes of the calculation, the costs of sampling (c_i and c_h) are assumed to be uniform.
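For reference, the standard textbook statement of this allocation, written with the cost terms mentioned above, is sketched here. The symbols n (total sample size), N_h (number of items in stratum h) and S_h (standard deviation within stratum h) are the conventional ones and are assumptions, not notation taken from this guide.

```latex
n_h = n \cdot \frac{N_h S_h / \sqrt{c_h}}{\sum_{i} N_i S_i / \sqrt{c_i}}
\qquad \text{which, with uniform costs, reduces to} \qquad
n_h = n \cdot \frac{N_h S_h}{\sum_{i} N_i S_i}
```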
Output results
Stratified Variable Sampling
Output results
The stratified attribute population command simply prepares a schedule showing the number of items to
be tested within each stratum. Such information provides the auditor a basis for making further decisions
as to the composition of the samples to be tested.
The data values do not have to be sorted by stratum. Also, although the strata identifiers shown here are
numeric, the strata identifiers may have any value. Each unique value will result in a separate stratum for
sample testing.
Usage Example 1
In the example below, the attribute to be tested is identified as “audited”. The name of the column
containing the strata identifier is “stratum” and the name of the column indicating whether the value in
the row is to be sampled and tested is named “Selected”.
Each value selected for sampling is indicated by placing an “X” in the column labeled “selected” (or other
name chosen). For attribute sampling, the audited value will be non-blank if the attribute being tested is
found to exist. All this is illustrated in a very simple example below:
Row  Signature  Selected  Strata
1                         A
2    X          X         B
3               X         C
4                         A
5                         B
The data being tested consists of five rows, separated into three strata “A”, “B” and “C”. Only rows 2
and 3 have been selected for sampling. The attribute being tested is a signature on a document. The
record for row 2 has a signature, the record for row 3 does not.
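The five rows described above can be tallied by stratum with a short Python sketch. The data mirrors the description (row 2 selected and signed, row 3 selected and unsigned). The function name and the dictionary keys are hypothetical, and only an "X" is treated as a selection mark, which is an assumption based on the text.

```python
from collections import defaultdict

# (row, signature, selected, stratum); blank means not present / not selected
rows = [
    (1, "", "", "A"),
    (2, "X", "X", "B"),
    (3, "", "X", "C"),
    (4, "", "", "A"),
    (5, "", "", "B"),
]


def tally_by_stratum(rows):
    """Count population items, sampled items and attribute hits per stratum."""
    summary = defaultdict(
        lambda: {"population": 0, "sampled": 0, "attribute_found": 0}
    )
    for _row, attribute, selected, stratum in rows:
        entry = summary[stratum]
        entry["population"] += 1
        if selected.strip().upper() == "X":
            entry["sampled"] += 1
            # The attribute counts as found when the audited value is non-blank
            if attribute.strip():
                entry["attribute_found"] += 1
    return dict(summary)
```

The tally shows stratum "A" with two items and none sampled, stratum "B" with two items and one sampled with the signature present, and stratum "C" with one item sampled and no signature.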
Output results
The stratified attribute assessment command uses the sample results to extrapolate the results to each
stratum and in total. For each stratum, the point estimate, as well as upper and lower limits are listed.
The data values do not have to be sorted by stratum. Also, although the strata identifiers shown here are
numeric, the strata identifiers may have any value. Each unique value will result in a separate stratum for
sample testing.
The command below prepares an extrapolation based upon attribute sampling. The name of the column
containing the stratum identifier is “stratum”, the name of the column containing the results of the test of
the attribute is called “audited”, and the name of the column indicating if the row was selected for
sampling is called “selected”. The confidence level desired for the results is 97%. Thus the command in
the text box is:
Note: By default, results are produced at three confidence levels (80%, 90% and 95%). An
additional confidence level may be specified.
Output results
5.1 Overview
The procedure for working with data contained in Access databases and Excel workbooks is
almost identical to that for working with data which has been “pasted” from the Clipboard, with
two exceptions:
• The name of the Access database or Excel workbook must be provided
• In the case of Excel, the name of the worksheet must be provided, or
• In the case of Access, the name of the table or query must be provided.
All this information is provided using a form and drop down lists.
The rest of the information (e.g. column names, textbox information and “where” information) is
identical.
The input form is contained under the “MS” tab shown below.
5.3 An example
To illustrate the process, the auditor wishes to extract information from a worksheet named “FA”
in a workbook named EWP.xls to identify fixed asset records where the fixed asset may have
been over-depreciated. Below is the process, step by step.
The last used directory is shown and the Excel workbook named fa.xls is selected.
All this information is provided using a form and drop down lists.
The rest of the information (e.g. column names, textbox information and “where” information) is
identical.
The input form is contained under the “File” tab shown below.
5.6 An example
To illustrate the process, the auditor wishes to analyze information from a text file named “FA.txt”
in the directory “c:\test\data” to identify fixed asset records where the fixed asset may have been
over-depreciated. Below is the process, step by step.
The last used directory is shown and the Excel workbook named fa.xls is selected.
Drilling down to information of interest is enabled through the use of the “Where” information. A
separate tab is provided in order to enter the information if it is lengthy or complex.
Note: This form can also be shown by clicking on the label “Where?”.
There are numerous examples of possible “where” clauses. To help, there is a drop down list of
examples which can be selected and then tailored to specific uses.
In the screen above, the auditor wishes to extract information within the last 30 days. The
example shown provides a means to do this.
All that needs to be done now is to change the name of the column to one that is of interest
(unless the column of interest is named “acquisition”).
Below are tables which provide examples of some of the functions with a brief description. More
complex criteria can be applied using combinations of the functions or “nesting” which is
described below.
6.1 Numeric
6.2 Text
6.5 Combinations
Functions can be combined using the logical tests described in section 7.4. For example, to test
asset records acquired during a specific fiscal period which also have useful lives exceeding ten
years, the criteria would be specified as follows using the “AND” connector:
([installation date] between #7/1/2007# and #6/30/2008#) and ([useful life] > 10)
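The same kind of compound test can be mirrored in a short Python predicate, which may help when checking a “where” clause against a few known records. The field names follow the example; the function name, the dictionary layout and the min_useful_life parameter are assumptions for illustration only.

```python
from datetime import date


def in_period_with_long_life(record, start, end, min_useful_life):
    """True when the asset was installed within [start, end] (inclusive, as with
    "between") AND its useful life exceeds min_useful_life."""
    installed_in_period = start <= record["installation date"] <= end
    long_lived = record["useful life"] > min_useful_life
    return installed_in_period and long_lived
```

Both conditions must hold for a record to be returned, which is the effect of the “AND” connector.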
If the last name may also have blanks to the right of the last character, then an
additional function (“trim”) could be first applied before the remaining tests:
1. Between
2. In
3. Like
The between operator allows the specification of a range of values which may be
text, numeric or date – e.g.
Between #7/1/2007# and #6/30/2008#
Between ‘A’ and ‘M’
Between 100 and 2000
The in operator allows the specification of a number of text values, each separated
by a comma, e.g. to test if a specific state code has been located:
Operator Meaning
[last name] like ‘BLA%’ Last name starts with ‘BLA’
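The behaviour of the like operator can be approximated in Python by translating the pattern into a regular expression. In this sketch “%” matches any run of characters, as in the example above; treating “_” as a single-character wildcard and ignoring case by default are assumptions borrowed from common SQL practice, not statements about Audit Commander. The function name is hypothetical.

```python
import re


def sql_like(value, pattern, ignore_case=True):
    """Return True when value matches a LIKE-style pattern."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    # fullmatch: the pattern must account for the whole value
    return re.fullmatch(regex, value, flags) is not None
```

With this sketch, sql_like("BLAKLEY", "BLA%") is true, while sql_like("ABLA", "BLA%") is false because the value does not start with “BLA”.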
Installation of the software is a straightforward process, using the standard “Setup.exe” method.
There are two types of installs:
1. “regular” install
2. “silent” install
For a “silent” install, the software is installed with all the default values – no interaction is
required.
Double clicking the file “ACSetup.exe” brings up the splash screen asking if you wish to install
the Audit Commander.
Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Step 7
Step 8
8 Comment Form
Although I am not able to respond to all such comments and suggestions, I will try
to do so as feasible. Registered users of Audit Commander will be notified as
revised versions of the manual are released.