Chapter 6 Data-Driven Fraud Detection
(Fraud Examination 6e
by Albrecht, Albrecht, Albrecht, Zimbelman)
Audit Sampling and Fraud
➢ Statistical sampling has become a standard auditing procedure.
✓ Audit sampling is an effective analysis procedure for finding routine errors spread
throughout a data set.
✓ In contrast, sampling is usually a poor analysis technique when looking for a needle
in a haystack.
✓ If you sample at a 5 percent rate, you effectively accept a 95 percent chance of
missing any given fraudulent transaction.
➢ Often, fraud examiners strive to complete full-population analysis to ensure
that the “needles” are found.
➢ Given the right tools and techniques, full-population analysis is often the
preferred method in a fraud investigation.
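The 95 percent figure can be checked with a short calculation. The sketch below assumes simple random sampling without replacement; the population size, sample size, and number of fraudulent records are hypothetical values chosen only for illustration.

```python
from math import comb


def prob_miss_all(population: int, sample: int, fraudulent: int) -> float:
    """Probability that a simple random sample (drawn without replacement)
    contains none of the fraudulent records in the population."""
    return comb(population - fraudulent, sample) / comb(population, sample)


# Hypothetical numbers: 10,000 transactions, sampled at a 5 percent rate.
POPULATION, SAMPLE = 10_000, 500
for fraud_count in (1, 3, 10):
    p = prob_miss_all(POPULATION, SAMPLE, fraud_count)
    print(f"{fraud_count} fraudulent record(s): chance of missing all = {p:.3f}")
```

With a single fraudulent record the function returns 0.95, which is the figure quoted above. With several fraudulent records the chance of missing every one of them falls, but each individual record is still missed 95 percent of the time, which is why full-population analysis is preferred.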
The Data Analysis Process
➢Fraud investigators must be prepared to learn new methodologies,
software tools, and analysis techniques to successfully take advantage
of data-oriented methods.
➢Data-driven fraud detection is proactive in nature.
✓ The investigator no longer has to wait for a tip to be received.
✓ The investigator brainstorms the schemes and symptoms that might be found
and then looks for them.
➢Data-driven detection is essentially a hypothesis-testing approach:
✓ The investigator makes hypotheses and tests to see which are supported by the
data.
Figure 6.1 The Proactive Method of Fraud Detection
The Data Analysis Process – Six Steps
Figure 6.2 Red Flags of Kickbacks
➢Analytical Symptoms
✓Increasing prices
✓Larger order quantities
✓Increasing purchases from favored vendor
✓Decreasing purchases from other vendors
✓Decreasing quality
➢Behavioral Symptoms
✓ Buyer doesn’t relate well to other buyers and vendors
✓ Buyer’s work habits change unexpectedly
➢Lifestyle Symptoms
✓ Buyer lives beyond known salary
➢Control Symptoms
✓ All transactions with one buyer and one vendor
✓ Use of unapproved vendors
➢Document Symptoms
✓ 1099s from vendor to buyer’s relative
Step 4: Use Technology to Gather Data about Symptoms
➢Searching and analysis
✓ Data analysis applications
✓ Custom structured query language (SQL) queries and scripts
➢The deliverable of this step is a set of data that matches the symptoms
identified in the previous step
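As an illustration of a custom SQL query for one kickback symptom (a buyer concentrating purchases with a favored vendor), here is a minimal sketch. The `purchases` table, its columns, the sample rows, and the 80 percent threshold are all hypothetical, and an in-memory SQLite database stands in for a corporate database.

```python
import sqlite3

# In-memory database with a hypothetical purchases table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (buyer TEXT, vendor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("B1", "V1", 900.0), ("B1", "V1", 950.0), ("B1", "V2", 150.0),
        ("B2", "V1", 300.0), ("B2", "V2", 350.0), ("B2", "V3", 350.0),
    ],
)

# For each buyer/vendor pair, compute the vendor's share of that buyer's
# total spending and keep only the pairs above the threshold.
QUERY = """
SELECT p.buyer, p.vendor, SUM(p.amount) / t.total AS share
FROM purchases AS p
JOIN (SELECT buyer, SUM(amount) AS total
      FROM purchases
      GROUP BY buyer) AS t ON t.buyer = p.buyer
GROUP BY p.buyer, p.vendor, t.total
HAVING SUM(p.amount) / t.total > ?
ORDER BY share DESC
"""

SHARE_THRESHOLD = 0.8  # hypothetical cutoff for "favored vendor"
flagged = conn.execute(QUERY, (SHARE_THRESHOLD,)).fetchall()
for buyer, vendor, share in flagged:
    print(f"{buyer} sends {share:.1%} of spending to {vendor}")
```

The rows in `flagged` are the deliverable of this step: the subset of data that matches the symptom and is carried forward for analysis.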
Step 5: Analyze Results
• Once anomalies are refined and determined by the examiners to be likely
indications of fraud, they are analyzed using either traditional or
technology-based methods:
• Screening results using computer algorithms
• Real-time analysis and detection of fraud
• One advantage of the data-driven approach is its potential reuse.
Step 6: Investigate Symptoms
➢ The final step of the data-driven approach is investigation into the most
promising indicators.
➢ The primary advantage of the data-driven approach is that the investigator
takes charge of the fraud investigation process.
✓ Instead of waiting for tips or other indicators to become egregious enough to show
on their own, the data-driven approach can highlight frauds while they are small.
➢ The primary drawback to the data-driven approach is that it can be more
expensive and time intensive than the traditional approach.
Data Analysis Software
➢ACL Audit Analytics
✓ Powerful program for data analysis
✓ Most widely used by auditors worldwide
➢CaseWare’s IDEA
✓ Recent versions include an increasing number of fraud techniques
✓ ACL’s primary competitor
➢Microsoft Office + ActiveData
✓ a plug-in for Microsoft Office
✓ provides data analysis procedures
✓ based in Excel and Access
✓ less expensive alternative to ACL and IDEA
➢Other software packages include
✓ SAS and SPSS (statistical analysis programs with available fraud modules)
✓ Traditional programming languages like Perl, Python, Ruby, and Visual Basic, as
well as specialized data mining platforms
Data Access
➢The most important (and often most difficult) step
in data analysis is gathering the right data in the
right format during the right time period.
➢Methods include:
o Open Database Connectivity (ODBC)
o Text Import
o Hosting a Data Warehouse
Open Database Connectivity (ODBC)
➢ standard method of querying data from corporate relational
databases
➢ a connector between the front-end analysis using analysis
applications (such as ACL and IDEA ) and the back-end
corporate databases (Oracle, SQL Server, and MySQL)
➢ best way to retrieve data for analysis because
✓ it can retrieve data in real time
✓ it allows use of the SQL language
✓ it allows repeated pulls for iterative analysis
✓ it retrieves metadata (like column types and relationships) directly
Text Import
➢ Several text formats exist for copying data from one application (e.g., a
database) to another (e.g., an analysis application).
➢ Common formats
✓ Delimited text
o Comma separated values (CSV):
ID, Date, First Name, Last Name, Phone Number, etc.
342, 12/23/2007, Seth, Knab, 000-000-0000, etc.
o Tab-separated values (TSV):
ID	Date	First Name	Last Name	Phone
342	12/23/2007	Seth	Knab	000-000-0000
✓ Fixed-width format
✓ Extensible markup language (XML) – used in many new applications
✓ EBCDIC - Used primarily on IBM mainframes
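A minimal sketch of importing the delimited and fixed-width formats listed above, using only the Python standard library. The sample record is the one shown on this slide; the function names and the fixed-width column widths are hypothetical.

```python
import csv
import io

CSV_TEXT = (
    "ID, Date, First Name, Last Name, Phone Number\n"
    "342, 12/23/2007, Seth, Knab, 000-000-0000\n"
)
TSV_TEXT = (
    "ID\tDate\tFirst Name\tLast Name\tPhone\n"
    "342\t12/23/2007\tSeth\tKnab\t000-000-0000\n"
)


def read_delimited(text: str, delimiter: str) -> list[dict]:
    """Parse delimited text into one dict per record, keyed by the header row.
    skipinitialspace drops the blank that follows each delimiter."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter=delimiter, skipinitialspace=True
    )
    return list(reader)


def read_fixed_width(line: str, widths: tuple[int, ...]) -> list[str]:
    """Slice one fixed-width record into fields using the given column widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


csv_rows = read_delimited(CSV_TEXT, ",")
tsv_rows = read_delimited(TSV_TEXT, "\t")

# Hypothetical layout: 5-character ID, then three 10-character columns.
WIDTHS = (5, 10, 10, 10)
fixed_line = "342".ljust(5) + "12/23/2007".ljust(10) + "Seth".ljust(10) + "Knab".ljust(10)
fixed_fields = read_fixed_width(fixed_line, WIDTHS)

print(csv_rows[0]["Last Name"], tsv_rows[0]["Date"], fixed_fields)
```

Note that every value arrives as text. Delimited and fixed-width files carry no column types, so type conversion has to happen after import.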
Hosting a Data Warehouse
➢Many investigators simply import data directly into their analysis
application, effectively creating a simplified data warehouse.
➢While most programs are capable of storing millions of records in
multiple tables, most analysis applications are relatively poor data
repositories.
➢Databases are the optimal method of storing data.
➢Accounting applications like ACL and IDEA provide options for
server-based storage of data.
Data Analysis Techniques
➢ Once data are retrieved and stored in a data warehouse, analysis application, or text
file, they need to be analyzed to identify transactions that match the indicators
identified earlier in the process.
➢ Analysis techniques that are most commonly used by fraud investigators:
✓ Data Preparation
✓ Benford’s Law
✓ Digital Analysis
✓ Outlier Investigation
✓ Stratification and Summarization
✓ Time Trend Analysis
✓ Fuzzy Matching
✓ Real-Time Analysis
Data Preparation
➢One of the most important (and often most difficult) tasks in data
analysis is proper preparation of data.
➢Areas of concern
✓ Type conversion and consistency of values
✓ Descriptive statistics for each column of data
✓ Time standardization
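The three areas of concern can be sketched in a few lines. This is an illustration only: the raw records, the accepted date formats, and the helper names are hypothetical, and weekly totals are just one assumed reading of time standardization.

```python
from collections import defaultdict
from datetime import date, datetime
from decimal import Decimal

# Hypothetical raw import: everything is text and the formats are inconsistent.
raw = [
    {"amount": "1,250.00", "date": "12/23/2007"},
    {"amount": " 80.5", "date": "2007-12-24"},
    {"amount": "$300", "date": "12/26/2007"},
]
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")  # assumed set of formats in the data


def to_amount(text: str) -> Decimal:
    """Type conversion: strip currency symbols and separators, return a Decimal."""
    return Decimal(text.strip().replace("$", "").replace(",", ""))


def to_date(text: str) -> date:
    """Consistency of values: accept any known format, return one date type."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


clean = [{"amount": to_amount(r["amount"]), "date": to_date(r["date"])} for r in raw]

# Descriptive statistics for the amount column.
amounts = [r["amount"] for r in clean]
summary = {
    "count": len(amounts),
    "min": min(amounts),
    "max": max(amounts),
    "mean": sum(amounts) / len(amounts),
}

# Time standardization (assumed here to mean equal-length periods):
# total the amounts per ISO (year, week).
weekly = defaultdict(Decimal)
for r in clean:
    year, week = r["date"].isocalendar()[:2]
    weekly[(year, week)] += r["amount"]

print(summary)
print(dict(weekly))
```

Raising an error on an unrecognized date, rather than silently skipping the record, is a deliberate choice in this sketch: in a fraud investigation a dropped record may be the one that matters.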
Digital Analysis
➢Digital analysis is the art of analyzing the digits that make up number
sets like invoice amounts, reported hours, and costs.
➢For many kinds of financial data, Benford’s Law accurately predicts how often
each digit appears as the first digit of the numbers in a set: low digits such
as 1 lead far more often than high digits such as 9.
✓ Using Benford’s Law to detect fraud has the major advantage of being a very
inexpensive method to implement and use.
✓ The disadvantage of using Benford’s Law is that it is tantamount to hunting
fraud with a shotgun.
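The first-digit distribution that Benford's Law predicts is P(d) = log10(1 + 1/d) for d = 1 through 9. The sketch below computes those expected values and compares them with the observed first digits of a set of amounts. The invoice amounts and function names are hypothetical, and a real analysis would need far more than ten records.

```python
from collections import Counter
from math import log10


def benford_expected(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 to 9)."""
    return log10(1 + 1 / digit)


def first_digit(amount: float) -> int:
    """Leading nonzero digit, read from the number's scientific notation."""
    return int(f"{abs(amount):.16e}"[0])


def first_digit_deviation(amounts: list[float]) -> dict[int, float]:
    """Observed minus expected first-digit share for each digit 1 to 9."""
    digits = [first_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("no nonzero amounts to analyze")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) - benford_expected(d) for d in range(1, 10)}


# Hypothetical invoice amounts with an unusual number of leading 9s.
invoices = [1200.50, 1875.00, 940.10, 9100.00, 9450.00,
            9800.00, 132.75, 2210.00, 9700.00, 3050.00]
deviation = first_digit_deviation(invoices)
most_overrepresented = max(deviation, key=deviation.get)

for d in range(1, 10):
    print(f"digit {d}: expected {benford_expected(d):.3f}, deviation {deviation[d]:+.3f}")
```

A large positive deviation for one digit only says that the set looks unusual, not why. That is the shotgun problem noted above: the result points at a vendor or account to examine, not at a specific transaction.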
Table 6.1 Benford’s Law Probability Values
Figure 6.3 Digital Analysis—Supply Management
Figure 6.4 Supplier Graphs
Stratification
➢Stratification is the splitting of complex data sets into groupings.
➢The data set must be stratified into a number of “subtables” before
analysis can be done.
➢For many data sets, stratification can result in thousands of subtables.
➢While basic programs like spreadsheets make working with this many
tables difficult and time consuming, analysis applications like ACL
and IDEA make working with lists of tables much easier.
Summarization
➢Summarization is an extension of stratification.
➢Summarization runs one or more calculations on the subtables to
produce a single record representing each subtable.
➢Basic summarization usually produces a single results table with one
record per case value.
➢Pivot tables (also called cross tables) are two-dimensional views with
cases in one dimension and the calculations in the detail cells.
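Stratification and summarization can be sketched together. The example below splits a hypothetical purchases list into one subtable per vendor, reduces each subtable to a single summary record, and then builds a small pivot of vendor by buyer. The data and field layout are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical purchase records: (vendor, buyer, amount).
purchases = [
    ("V1", "B1", 900.0), ("V1", "B1", 950.0), ("V2", "B1", 150.0),
    ("V1", "B2", 300.0), ("V2", "B2", 350.0),
]

# Stratification: one subtable per case value (here, per vendor).
subtables = defaultdict(list)
for vendor, buyer, amount in purchases:
    subtables[vendor].append((buyer, amount))

# Summarization: one result record per subtable.
summary = {}
for vendor, rows in subtables.items():
    total = sum(amount for _, amount in rows)
    summary[vendor] = {
        "count": len(rows),
        "total": total,
        "average": total / len(rows),
    }

# Pivot (cross) table: vendor by buyer, with total amount in each detail cell.
pivot = defaultdict(lambda: defaultdict(float))
for vendor, buyer, amount in purchases:
    pivot[vendor][buyer] += amount

for vendor, record in summary.items():
    print(vendor, record, dict(pivot[vendor]))
```

The same grouping loop scales to thousands of subtables, which is the situation where spreadsheets become impractical.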
Time Trend Analysis
➢Time trend analysis is a summarization technique that produces a
single number that summarizes each graph.
➢By sorting the results table appropriately, the investigator quickly
knows which graphs need further manual investigation.
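One possible choice for the single summarizing number is the slope of a least-squares line fitted to each series. That choice is an assumption of this sketch rather than something stated above. The vendor price series are hypothetical and assumed to be evenly spaced in time.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against period index 0, 1, 2, ...
    Requires at least two values."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical average unit price per quarter, one series per vendor.
price_series = {
    "V1": [100.0, 120.0, 150.0, 190.0],
    "V2": [200.0, 195.0, 205.0, 200.0],
    "V3": [300.0, 280.0, 260.0, 240.0],
}

# One number per graph, sorted so the steepest increases come first.
ranked = sorted(
    ((vendor, trend_slope(series)) for vendor, series in price_series.items()),
    key=lambda item: item[1],
    reverse=True,
)
for vendor, slope in ranked:
    print(f"{vendor}: slope {slope:+.1f} per period")
```

Sorting by slope puts the vendors with the fastest-rising prices at the top of the results table, so only those graphs need to be inspected by hand.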
Figure 6.5 Time Trend Graph
Fuzzy Matching
➢ Another common technique is fuzzy matching of textual values.
➢ This technique allows for searches to be performed that will find matches
between some text and entries in a database that are less than 100 percent
identical.
➢ The first and most common method of fuzzy matching is use of the
Soundex algorithm.
➢ A more powerful technique for fuzzy matching uses n-grams. This technique
compares runs of letters in two values to get a match score from 0 to 100
percent.
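Both techniques can be sketched briefly. The Soundex function follows the standard American Soundex rules. The n-gram score uses trigram sets compared with a Jaccard ratio, which is one assumed way to turn shared runs of letters into a 0 to 100 percent score. The vendor names are hypothetical.

```python
# Soundex digit for each coded consonant; vowels, Y, H, and W have no code.
SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Four-character Soundex code: first letter plus up to three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]


def ngrams(text: str, n: int = 3) -> set[str]:
    """Set of all runs of n consecutive characters in the lowercased text."""
    lowered = text.lower()
    return {lowered[i:i + n] for i in range(len(lowered) - n + 1)}


def ngram_score(a: str, b: str, n: int = 3) -> float:
    """Match score from 0 to 100: shared n-grams over all distinct n-grams."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 100.0 * len(grams_a & grams_b) / len(grams_a | grams_b)


print(soundex("Robert"), soundex("Rupert"))
print(round(ngram_score("Acme Supply Inc", "Acme Supply Incorporated"), 1))
```

Soundex gives a yes-or-no answer (two names either share a code or they do not), while the n-gram score is graded, so the investigator can set a cutoff and review the near matches.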
Real-Time Analysis
➢ Data-driven investigation is one of the most powerful methods of
discovering fraud.
➢ It is usually performed during investigations or periodic audits, but it can be
integrated directly into existing systems to perform real-time analysis on
transactions.
➢ Although real-time analysis is similar to traditional accounting controls
because it works at transaction time, it is a distinct technique because it
specifically analyzes each transaction for fraud (rather than for accuracy or
some other attribute).
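A minimal sketch of what a transaction-time fraud check might look like. The `Transaction` fields, the approved-vendor list, the approval limit, and both rules are hypothetical. A production system would call a check like this from inside the transaction-processing path.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    buyer: str
    vendor: str
    amount: float


APPROVED_VENDORS = {"V1", "V2"}   # hypothetical approved-vendor list
APPROVAL_LIMIT = 5_000.00         # hypothetical manager-approval threshold


def fraud_indicators(txn: Transaction) -> list[str]:
    """Return the fraud symptoms this transaction matches (empty if none).
    These rules look for fraud symptoms, not for bookkeeping accuracy."""
    flags = []
    if txn.vendor not in APPROVED_VENDORS:
        flags.append("unapproved vendor")
    if 0.95 * APPROVAL_LIMIT <= txn.amount < APPROVAL_LIMIT:
        flags.append("amount just under approval limit")
    return flags


# Each transaction is checked as it arrives rather than in a later audit.
incoming = [
    Transaction("B1", "V1", 120.00),
    Transaction("B1", "V9", 4_900.00),
]
for txn in incoming:
    flags = fraud_indicators(txn)
    if flags:
        print(f"review {txn}: {', '.join(flags)}")
```

Because each rule encodes a symptom identified earlier in the process, queries written for a one-time investigation can be reused as real-time checks.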