Chapter 6 Data-Driven Fraud Detection
(Fraud Examination 6e
by Albrecht, Albrecht, Albrecht, Zimbelman)
Audit Sampling and Fraud
➢ Statistical sampling has become a standard auditing procedure.
✓ Audit sampling is an effective analysis procedure for finding routine errors spread
throughout a data set.
✓ In contrast, sampling is usually a poor analysis technique when looking for a needle
in a haystack.
✓ If you sample at a 5 percent rate, you effectively accept a 95 percent chance of
missing any one of the few fraudulent transactions.
➢ Often, fraud examiners strive to complete full-population analysis to ensure
that the “needles” are found.
➢ Given the right tools and techniques, full-population analysis is often the
preferred method in a fraud investigation.
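The 95 percent figure follows directly from the sampling rate. The Python sketch below assumes each transaction is selected independently with probability equal to the sampling rate (a simplifying assumption; real audit sampling plans differ), so a sample misses all k fraudulent transactions with probability (1 - rate)^k.

```python
def prob_miss_all(sample_rate: float, n_fraudulent: int) -> float:
    """Chance that a random sample selects none of the fraudulent transactions.

    Assumes each transaction is selected independently with probability
    `sample_rate` (Bernoulli sampling), a simplifying assumption.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return (1.0 - sample_rate) ** n_fraudulent


for k in (1, 3, 10):
    print(f"{k:>2} fraudulent transactions: "
          f"{prob_miss_all(0.05, k):.1%} chance a 5% sample finds none")
```

Under this assumption, even ten fraudulent transactions escape a 5 percent sample entirely about 60 percent of the time, while full-population analysis (a rate of 1.0) selects every transaction.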
The Data Analysis Process
➢Fraud investigators must be prepared to learn new methodologies,
software tools, and analysis techniques to successfully take advantage
of data-oriented methods.
➢Data-driven fraud detection is proactive in nature.
✓ The investigator no longer has to wait for a tip to be received.
✓ The investigator brainstorms the schemes and symptoms that might be found
and then looks for them.
➢Data-driven detection is essentially a hypothesis-testing approach:
✓ The investigator makes hypotheses and tests to see which are supported by the data.
Figure 6.1 The Proactive Method of Fraud Detection
The Data Analysis Process – Six Steps
Figure 6.2 Red Flags of Kickbacks
➢Analytical Symptoms
✓Increasing prices
✓Larger order quantities
✓Increasing purchases from favored vendor
✓Decreasing purchases from other vendors
✓Decreasing quality
➢Behavioral Symptoms
✓ Buyer doesn’t relate well to other buyers and vendors
✓ Buyer’s work habits change unexpectedly
➢Lifestyle Symptoms
✓ Buyer lives beyond known salary
➢Control Symptoms
✓ All transactions with one buyer and one vendor
✓ Use of unapproved vendors
➢Document Symptoms
✓ 1099s from vendor to buyer’s relative
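The control symptom "all transactions with one buyer and one vendor" lends itself to a simple full-population test. A minimal Python sketch, assuming purchase records reduced to hypothetical (buyer, vendor) pairs; the `single_buyer_vendors` function and its thresholds are illustrative, not from the text.

```python
from collections import Counter, defaultdict


def single_buyer_vendors(purchases, min_share=1.0, min_count=5):
    """Flag vendors whose purchases are handled (almost) entirely by one buyer.

    `purchases` is an iterable of (buyer, vendor) pairs; the record layout,
    `min_share`, and `min_count` are illustrative assumptions.
    Returns {vendor: (buyer, share)} for flagged vendors.
    """
    buyers_by_vendor = defaultdict(Counter)
    for buyer, vendor in purchases:
        buyers_by_vendor[vendor][buyer] += 1

    flagged = {}
    for vendor, counts in buyers_by_vendor.items():
        total = sum(counts.values())
        top_buyer, top_count = counts.most_common(1)[0]
        share = top_count / total
        if total >= min_count and share >= min_share:
            flagged[vendor] = (top_buyer, share)
    return flagged


purchases = [("ann", "Acme")] * 6 + [("bob", "Zenith"), ("ann", "Zenith")] * 4
print(single_buyer_vendors(purchases))  # {'Acme': ('ann', 1.0)}
```

Lowering `min_share` (for example to 0.9) catches vendors that are nearly, rather than exclusively, tied to one buyer.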
Step 4: Use Technology to Gather Data about Symptoms
➢Searching and analysis
✓ Data analysis applications
✓ Custom structured query language (SQL) queries and scripts
➢The deliverable of this step is a set of data that matches the symptoms
identified in the previous step
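As an illustration of a custom SQL query, the sketch below uses Python's built-in sqlite3 module as a stand-in for a corporate database; the tables, columns, and rows are hypothetical. The query targets the "use of unapproved vendors" symptom, and its result is the kind of deliverable this step describes.

```python
import sqlite3

# In-memory database standing in for a corporate system; the schema and
# rows are hypothetical, chosen only to illustrate the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors  (vendor_id INTEGER PRIMARY KEY, name TEXT, approved INTEGER);
    CREATE TABLE payments (payment_id INTEGER PRIMARY KEY, vendor_id INTEGER,
                           buyer TEXT, amount REAL);
    INSERT INTO vendors VALUES (1, 'Acme Supply', 1), (2, 'Quick Parts', 0);
    INSERT INTO payments VALUES (100, 1, 'ann', 1200.00),
                                (101, 2, 'bob',  950.00),
                                (102, 3, 'bob',  480.00);
""")

# Symptom: use of unapproved vendors. The LEFT JOIN keeps payments whose vendor
# is missing from the vendor master file as well as those marked unapproved.
UNAPPROVED_VENDOR_PAYMENTS = """
    SELECT p.payment_id, p.vendor_id, p.buyer, p.amount
    FROM payments AS p
    LEFT JOIN vendors AS v ON v.vendor_id = p.vendor_id
    WHERE v.vendor_id IS NULL OR v.approved = 0
    ORDER BY p.payment_id
"""
matches = conn.execute(UNAPPROVED_VENDOR_PAYMENTS).fetchall()
print(matches)  # [(101, 2, 'bob', 950.0), (102, 3, 'bob', 480.0)]
```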
Step 5: Analyze Results
• Once errors are refined and determined by the examiners to be likely
indications of fraud, they are analyzed using either traditional or
technology-based methods:
• Screening results using computer algorithms
• Real-time analysis and detection of fraud
• One advantage of the data-driven approach is its potential for reuse: once built, the same analyses can be rerun on new data.
Step 6: Investigate Symptoms
➢ The final step of the data-driven approach is investigation into the most
promising indicators.
➢ The primary advantage of the data-driven approach is that the investigator takes
charge of the fraud investigation process.
✓ Instead of waiting for tips or other indicators to become egregious enough to show
on their own, the data-driven approach can highlight frauds while they are small.
➢ The primary drawback to the data-driven approach is that it can be more
expensive and time intensive than the traditional approach.
Data Analysis Software
➢ACL Audit Analytics
✓ Powerful program for data analysis
✓ Most widely used by auditors worldwide
➢CaseWare’s IDEA
✓ Recent versions include an increasing number of fraud techniques
✓ ACL’s primary competitor
➢Microsoft Office + ActiveData
✓ A plug-in for Microsoft Office
✓ Provides data analysis procedures
✓ Based in Excel and Access
✓ Less expensive alternative to ACL and IDEA
➢Other software packages include
✓ SAS and SPSS (statistical analysis programs with available fraud modules)
✓ Traditional programming languages like Perl, Python, Ruby, and Visual Basic
✓ Other specialized data mining platforms
Data Access
➢The most important (and often most difficult) step
in data analysis is gathering the right data in the
right format during the right time period.
➢Methods include:
o Open Database Connectivity (ODBC)
o Text Import
o Hosting a Data Warehouse
Open Database Connectivity (ODBC)
➢ standard method of querying data from corporate relational databases
➢ a connector between front-end analysis applications
(such as ACL and IDEA) and back-end corporate databases
(such as Oracle, SQL Server, and MySQL)
➢ best way to retrieve data for analysis because
✓ it can retrieve data in real time
✓ it allows use of the SQL language
✓ it allows repeated pulls for iterative analysis
✓ it retrieves metadata (like column types and relationships) directly
Text Import
➢ Several text formats exist for copying data from one application (e.g., a
database) to another (e.g., an analysis application).
➢ Text Import
✓ Delimited text
o Comma-separated values (CSV):
ID, Date, First Name, Last Name, Phone Number, etc.
342, 12/23/2007, Seth, Knab, 000-000-0000, etc.
o Tab-separated values (TSV):
ID Date First Name Last Name Phone
342 12/23/2007 Seth Knab 000-000-0000
✓ Fixed-width format
✓ Extensible markup language (XML) – used in many new applications
✓ EBCDIC - Used primarily on IBM mainframes
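Python's standard csv module reads both delimited formats above, and fixed-width records can be sliced by character position. In this sketch the column positions for the fixed-width record are hypothetical; in practice they come from the source system's record layout.

```python
import csv
import io

# Delimited text, using the sample record shown above.
csv_text = ("ID, Date, First Name, Last Name, Phone Number\n"
            "342, 12/23/2007, Seth, Knab, 000-000-0000\n")
tsv_text = ("ID\tDate\tFirst Name\tLast Name\tPhone\n"
            "342\t12/23/2007\tSeth\tKnab\t000-000-0000\n")

# skipinitialspace drops the blank after each comma in the CSV sample.
csv_rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Fixed-width text: each field occupies known character positions.
# These (name, start, end) positions are hypothetical.
FIELDS = [("ID", 0, 5), ("Date", 5, 15), ("Last Name", 15, 25)]
fixed_line = "342".ljust(5) + "12/23/2007" + "Knab".ljust(10)
fixed_row = {name: fixed_line[start:end].strip() for name, start, end in FIELDS}

print(csv_rows[0])
print(tsv_rows[0])
print(fixed_row)
```

For the other formats, the standard library also includes xml.etree.ElementTree for XML and codecs such as cp037 for one common EBCDIC code page.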
Hosting a Data Warehouse
➢Many investigators simply import data directly into their analysis
application, effectively creating a simplified data warehouse.
➢While most analysis applications are capable of storing millions of records in
multiple tables, they are relatively poor data warehouses.
➢Databases are the optimal method of storing data.
➢Analysis applications like ACL and IDEA provide options for
server-based storage of data.
Data Analysis Techniques
➢ Once data are retrieved and stored in a data warehouse, analysis application, or text
file, they need to be analyzed to identify transactions that match the indicators
identified earlier in the process.
➢ Analysis techniques that are most commonly used by fraud investigators:
✓ Data Preparation
✓ Benford’s Law
✓ Digital Analysis
✓ Outlier Investigation
✓ Stratification and Summarization
✓ Time Trend Analysis
✓ Fuzzy Matching
✓ Real-Time Analysis
Data Preparation
➢One of the most important, and often most difficult, tasks in data
analysis is proper preparation of data.
➢Areas of concern
✓ Type conversion and consistency of values
✓ Descriptives about columns of data
✓ Time standardization
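A minimal sketch of the three areas of concern, assuming hypothetical raw records whose amounts arrive as text and whose dates arrive in mixed formats. The list of date formats is an assumption, and time-zone handling, also part of time standardization, is omitted here.

```python
import statistics
from datetime import datetime
from decimal import Decimal

# Hypothetical raw records: amounts as text, dates in mixed formats.
raw = [
    {"amount": "$1,200.00", "date": "12/23/2007"},
    {"amount": "950", "date": "2007-12-24"},
    {"amount": "", "date": "24 Dec 2007"},
]

# Formats assumed to occur in the source data (an illustrative list);
# %b expects English month abbreviations under the default C locale.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d %b %Y")


def to_amount(text):
    """Type conversion: drop currency symbols and separators; blank -> None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned) if cleaned else None


def to_date(text):
    """Time standardization: parse any known format into an ISO 8601 date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


clean = [{"amount": to_amount(r["amount"]), "date": to_date(r["date"])} for r in raw]

# Descriptives: simple column statistics that expose blanks and extreme values.
amounts = [r["amount"] for r in clean if r["amount"] is not None]
summary = {
    "count": len(clean),
    "blank_amounts": len(clean) - len(amounts),
    "min": min(amounts),
    "max": max(amounts),
    "mean": statistics.mean(amounts),
}
print(clean)
print(summary)
```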
Digital Analysis
➢Digital analysis is the art of analyzing the digits that make up number
sets like invoice amounts, reported hours, and costs.
➢For many kinds of financial data, Benford’s Law accurately predicts how often
each digit appears as the first digit of a number: lower digits lead far more
often than higher ones, following a predictable distribution pattern.
✓ Using Benford’s Law to detect fraud has the major advantage of being a very
inexpensive method to implement and use.
✓ The disadvantage of using Benford’s Law is that it is tantamount to hunting
fraud with a shotgun.
Table 6.1 Benford’s Law Probability
Figure 6.3 Digital Analysis—Supply
Figure 6.4 Supplier Graphs
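Benford’s Law gives the expected proportion of numbers whose first significant digit is d as log10(1 + 1/d), so a leading 1 is expected about 30.1 percent of the time and a leading 9 about 4.6 percent. The sketch below compares observed first digits with that distribution. The mean absolute deviation used as a single conformity score is one common choice, named here as an assumption rather than the text's method, and the demonstration data are synthetic values spread evenly on a log scale.

```python
import math
from collections import Counter

# Benford's Law: expected share of values whose first significant digit is d.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """First significant digit of a nonzero number, e.g. 0.0452 -> 4."""
    return int(str(abs(x)).lstrip("0.")[0])


def benford_profile(values):
    """Map each digit 1-9 to (observed share, expected share), ignoring zeros."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return {d: (counts[d] / n, BENFORD[d]) for d in range(1, 10)}


def mean_absolute_deviation(profile):
    """Average gap between observed and expected shares (an assumed score)."""
    return sum(abs(obs - exp) for obs, exp in profile.values()) / 9


# Synthetic data spread evenly on a log scale, which follows Benford closely.
values = [10 ** (k / 250) for k in range(1000)]
profile = benford_profile(values)
for d, (obs, exp) in profile.items():
    print(f"{d}: observed {obs:.3f}  expected {exp:.3f}")
print(f"MAD = {mean_absolute_deviation(profile):.4f}")
```

On real invoice amounts, digits whose observed share sits well above the expected share point to the subsets of transactions worth a closer look.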
Stratification and Summarization
➢Stratification is the splitting of complex data sets into groupings.
➢The data set must be stratified into a number of “subtables” before
analysis can be done.
➢For many data sets, stratification can result in thousands of subtables.
➢While basic programs like spreadsheets make working with this many
tables difficult and time consuming, analysis applications like ACL
and IDEA make working with lists of tables much easier.
➢Summarization is an extension of stratification.
➢Summarization runs one or more calculations on the subtables to
produce a single record representing each subtable.
➢Basic summarization usually produces a single results table with one
record per case value.
➢Pivot tables (also called cross tables) are two-dimensional views with
cases in one dimension and the calculations in the detail cells.
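A minimal sketch of stratification, summarization, and a pivot table in plain Python, using hypothetical (vendor, year, amount) records. Analysis applications such as ACL and IDEA, or a spreadsheet pivot table, perform the same grouping.

```python
from collections import defaultdict

# Hypothetical purchase records: (vendor, year, amount).
records = [
    ("Acme", 2006, 1200.0), ("Acme", 2007, 1500.0), ("Acme", 2007, 1650.0),
    ("Zenith", 2006, 400.0), ("Zenith", 2007, 380.0),
]

# Stratification: split the data set into one subtable per vendor.
subtables = defaultdict(list)
for vendor, year, amount in records:
    subtables[vendor].append((year, amount))

# Summarization: one result record per subtable (count, total, average).
summary = {
    vendor: {"count": len(rows),
             "total": sum(a for _, a in rows),
             "average": sum(a for _, a in rows) / len(rows)}
    for vendor, rows in subtables.items()
}

# Pivot (cross) table: vendors in one dimension, years in the other,
# total amount in each detail cell.
years = sorted({year for _, year, _ in records})
pivot = {vendor: {y: sum(a for yr, a in rows if yr == y) for y in years}
         for vendor, rows in subtables.items()}

for vendor, row in pivot.items():
    print(vendor, row, summary[vendor])
```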
Time Trend Analysis
➢Time trend analysis is a summarization technique that produces a
single number (such as the slope of a trend line) summarizing each graph.
➢By sorting the results table appropriately, the investigator quickly
knows which graphs need further manual investigation.
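One way to reduce each graph to a single number is the slope of a least-squares trend line; that choice is an assumption here. The sketch computes the slope of each vendor's unit-price series (hypothetical data) and sorts vendors so the steepest price increases, one of the analytical symptoms of kickbacks, come first.

```python
# Hypothetical monthly unit prices paid to each vendor for the same item.
price_history = {
    "Acme": [(1, 10.0), (2, 10.5), (3, 11.0), (4, 11.5)],
    "Zenith": [(1, 8.0), (2, 8.0), (3, 7.9), (4, 8.0)],
}


def slope(points):
    """Least-squares slope of y on x: one number summarizing a time-trend graph."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx if sxx else 0.0


trend = {vendor: slope(points) for vendor, points in price_history.items()}
# Sorting the results table puts the graphs most worth a manual look first.
ranking = sorted(trend.items(), key=lambda item: item[1], reverse=True)
for vendor, s in ranking:
    print(f"{vendor}: price change per month {s:+.3f}")
```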
Figure 6.5 Time Trend Graph
Fuzzy Matching
➢ Another common technique is fuzzy matching of textual values.
➢ This technique allows for searches to be performed that will find matches
between some text and entries in a database that are less than 100 percent identical.
➢ The first and most common method of fuzzy matching is use of the
Soundex algorithm.
➢ A more powerful technique for fuzzy matching uses n-grams. This technique
compares runs of letters in two values to get a match score from 0 to 100
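The sketch below implements the classic American Soundex coding and one n-gram score. The n-gram score uses letter pairs (bigrams) and the Dice coefficient scaled to 0 to 100; that formula is an assumption, since individual tools may score n-gram matches differently. A typical use is matching vendor names against employee names or against each other.

```python
from collections import Counter

# American Soundex digit codes; vowels, y, h, and w have no code.
SOUNDEX_CODES = {
    letter: str(digit)
    for digit, letters in enumerate(["", "bfpv", "cgjkqsxz", "dt", "l", "mn", "r"])
    for letter in letters
}


def soundex(name):
    """Four-character Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        if c not in "hw":  # h and w do not separate letters with the same code
            prev = digit
    return code.ljust(4, "0")


def ngram_score(a, b, n=2):
    """Similarity from 0 to 100 based on shared runs of n letters.

    Uses the Dice coefficient over letter n-grams, ignoring case and spaces;
    this scoring formula is an assumption, not a specific tool's method.
    """
    def grams(text):
        s = "".join(text.lower().split())
        return Counter(s[i:i + n] for i in range(len(s) - n + 1))

    ga, gb = grams(a), grams(b)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:  # both strings too short to form any n-gram
        return 0.0
    shared = sum((ga & gb).values())
    return 200.0 * shared / total


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(round(ngram_score("Acme Supply", "ACME Supply Inc"), 1))  # 85.7
```

Soundex only says whether two names sound alike, while the n-gram score ranks candidate matches, which is why it is the more powerful of the two for sorting results.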
Real-Time Analysis
➢ Data-driven investigation is one of the most powerful methods of
discovering fraud.
➢ It is usually performed during investigations or periodic audits, but it can be
integrated directly into existing systems to perform real-time analysis on transactions.
➢ Although real-time analysis is similar to traditional accounting controls
because it works at transaction time, it is a distinct technique because it
specifically analyzes each transaction for fraud (rather than for accuracy or
some other attribute).
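A minimal sketch of transaction-time screening, assuming a hypothetical `RealTimeScreen` hook that the purchasing system calls as each purchase is recorded. The two rules (unapproved vendor, unit-price increase above a threshold) are illustrative versions of the kickback red flags, and the 10 percent threshold is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Purchase:
    buyer: str
    vendor: str
    item: str
    unit_price: float


class RealTimeScreen:
    """Hypothetical hook called by the purchasing system for each new purchase."""

    def __init__(self, approved_vendors, max_increase=0.10):
        self.approved_vendors = set(approved_vendors)
        self.max_increase = max_increase  # allowed price rise vs. last purchase
        self.last_price = {}              # (vendor, item) -> last unit price

    def check(self, purchase: Purchase) -> list:
        """Return fraud alerts for one transaction at the moment it is entered."""
        alerts = []
        if purchase.vendor not in self.approved_vendors:
            alerts.append("unapproved vendor")
        key = (purchase.vendor, purchase.item)
        previous = self.last_price.get(key)
        if previous is not None and purchase.unit_price > previous * (1 + self.max_increase):
            alerts.append("price increase above threshold")
        self.last_price[key] = purchase.unit_price
        return alerts


screen = RealTimeScreen(approved_vendors={"Acme"})
incoming = [
    Purchase("ann", "Acme", "paper", 10.0),
    Purchase("ann", "Acme", "paper", 12.0),
    Purchase("bob", "Quick Parts", "bolts", 1.0),
]
alerts = [screen.check(p) for p in incoming]
print(alerts)  # [[], ['price increase above threshold'], ['unapproved vendor']]
```

Because the check runs as each transaction is entered, an alert can surface before payment rather than in a later periodic audit.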