11/03/2024 11/03/2024
CHAPTER 02
Mastering Data
Contents
• How are data used and stored in the accounting cycle?
• How are data stored in relational databases?
• Data dictionaries
• What does it mean to extract, transform, and load
• Ethical considerations of data collection and use
Objectives
• Understand how data are organized in an accounting information system.
• Understand how data are stored in a relational database.
• Explain and apply extraction, transformation, and loading (ETL) techniques.
• Describe the ethical considerations of data collection and data use.
How are data used and stored in the accounting cycle?
Data can be found throughout various systems.
In most cases, you need to know which tables and attributes contain the relevant data.
Unified Modeling Language (UML) is one way to understand databases.
Exhibit 2-2 Procure-to-Pay Database Schema (Simplified)
Internal and External Data Sources
Data may come from a number of different sources, either internal or external to the organization. Internal data sources include:
• accounting information system
• supply chain management system
• customer relationship management system
• human resource management system.
Enterprise Resource Planning (ERP) (also known as Enterprise Systems) is a category of business management software that integrates applications from throughout the business (such as manufacturing, accounting, finance, human resources, etc.) into one system.
How are data stored in relational databases?
Relational databases ensure that data:
• Are complete or include all data.
• Aren’t redundant, so they don’t take up too much space.
• Follow business rules and internal controls.
• Aid communication and integration of business processes.
How are data used and stored in the accounting cycle?
• A variety of applications support relational databases; these are referred to as Relational Database Management Systems (RDBMS). Examples include Microsoft Access, SQLite, and Microsoft SQL Server.
• Other relational database management systems include Teradata, MySQL, Oracle RDBMS, IBM DB2, Amazon RDS, and PostgreSQL.
How are data stored in relational databases?
• Primary keys are unique identifiers.
• Foreign keys are attributes that point to a primary key in another table.
• Composite keys are a combination of two or more attributes to create a unique identifier.
• Descriptive attributes include everything else.
Purchase Order Table
PO_Number  Date        Created By  Approved By  Supplier ID  Employee ID  Cash Disbursement ID
1787       11/1/2020   1001        1010         1            52           2001
1788       11/1/2020   1005        1010         2            52           2003
1789       11/8/2020   1002        1010         1            52           2004
1790       11/15/2020  1005        1010         1            52           2004
Exhibit 2-4 Purchase Order Table
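The key types above can be sketched with Python's standard-library sqlite3 module (SQLite is one of the RDBMS named earlier). This is only an illustration under stated assumptions: the table and column names are hypothetical, the purchase order rows borrow the PO numbers, dates, and supplier IDs shown in Exhibit 2-4, and the supplier names and line-item rows are invented placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical schema; names are assumptions, not the textbook's actual design.
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,   -- primary key
    supplier_name TEXT                   -- descriptive attribute
);
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,     -- primary key
    po_date     TEXT,                    -- descriptive attribute
    supplier_id INTEGER REFERENCES supplier(supplier_id)  -- foreign key
);
CREATE TABLE purchase_order_detail (
    po_number  INTEGER REFERENCES purchase_order(po_number),
    product_id INTEGER,
    quantity   INTEGER,
    PRIMARY KEY (po_number, product_id)  -- composite key
);
""")

# Supplier names are placeholders invented for this sketch.
conn.executemany("INSERT INTO supplier VALUES (?, ?)",
                 [(1, "Supplier A"), (2, "Supplier B")])
conn.executemany("INSERT INTO purchase_order VALUES (?, ?, ?)",
                 [(1787, "2020-11-01", 1), (1788, "2020-11-01", 2),
                  (1789, "2020-11-08", 1), (1790, "2020-11-15", 1)])
conn.commit()


def try_insert(sql, params):
    """Attempt one insert and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


# A repeated primary key and a foreign key with no matching supplier are refused.
duplicate_po = try_insert("INSERT INTO purchase_order VALUES (?, ?, ?)",
                          (1787, "2020-11-20", 1))
unknown_supplier = try_insert("INSERT INTO purchase_order VALUES (?, ?, ?)",
                              (1791, "2020-11-20", 99))
# The composite key allows many lines per PO, but not the same product twice.
first_line = try_insert("INSERT INTO purchase_order_detail VALUES (?, ?, ?)",
                        (1787, 10, 5))
repeated_line = try_insert("INSERT INTO purchase_order_detail VALUES (?, ?, ?)",
                           (1787, 10, 1))

for label, outcome in [("duplicate PO", duplicate_po),
                       ("unknown supplier", unknown_supplier),
                       ("first line item", first_line),
                       ("repeated line item", repeated_line)]:
    print(f"{label}: {outcome}")
```

Letting the database reject bad rows is one way a relational design can enforce business rules and avoid redundant records.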
How are data stored in relational databases?
• Examples of two tables, attributes, and data. Notice the PK-FK relationship.
Exhibit 2-3 Line Items Table: Purchase Order Detail Table
Exhibit 2-4 Purchase Order Table
Data dictionaries define what data are acceptable.
• For each attribute, we learn:
  What type of key it is.
  What data are required.
  What data can be stored in it.
  How much data is stored.
Primary or Foreign Key?  Required  Attribute Name  Description                                  Data Type   Default Value  Field Size  Notes
PK                       Y         Supplier ID     Unique Identifier for each Supplier          Number      n/a            10
                         N         Supplier Name   First and Last Name                          Short Text  n/a            30
FK                       N         Supplier Type   Type Code for Different Supplier Categories  Number      Null           10          1: Vendor, 2: Misc
Exhibit 2-6 Supplier Data Dictionary
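One way to see how a data dictionary defines acceptable data is to encode it and check records against it. The sketch below transcribes the three attributes from Exhibit 2-6 into a Python dictionary. The function name, the record layout, and the reading of "Field Size" as a maximum character count are assumptions made for illustration.

```python
# Rules transcribed from Exhibit 2-6; the structure itself is an assumption.
SUPPLIER_DICTIONARY = {
    "Supplier ID":   {"key": "PK", "required": True,  "type": int, "size": 10},
    "Supplier Name": {"key": None, "required": False, "type": str, "size": 30},
    "Supplier Type": {"key": "FK", "required": False, "type": int, "size": 10},
}


def validate_record(record, dictionary=SUPPLIER_DICTIONARY):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name, rule in dictionary.items():
        value = record.get(name)
        if value is None:
            # Missing values are a problem only for required attributes.
            if rule["required"]:
                problems.append(f"{name}: required value is missing")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
            continue
        # Assumption: field size limits the number of characters or digits.
        if len(str(value)) > rule["size"]:
            problems.append(f"{name}: longer than {rule['size']} characters")
    return problems


# Hypothetical records: the first passes, the second breaks two rules.
good = {"Supplier ID": 1, "Supplier Name": "Supplier A", "Supplier Type": 1}
bad = {"Supplier Name": "X" * 31}

print(validate_record(good))
print(validate_record(bad))
```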
Lending Club Data Dictionary for Rejected Loan Data
Abbreviated Data Dictionary for Vendor Data Extract
Romney et al., 2021
Extract
Step 1: Determine the purpose and scope of the data request.
Ask a few questions before beginning the process:
• What is the purpose of the data request?
• What do you need the data to solve?
• What business problem will it address?
• What risk exists in data integrity (for example, reliability, usefulness)?
• What is the mitigation plan?
• What other information will impact the nature, timing, and extent of the data analysis?
What does it mean to extract, transform, and load
The ETL process begins with identifying which data you need and is complete when the clean data are loaded in the appropriate format into the tool to be used for analysis. Requesting data is an iterative practice involving five steps:
• Extract
  Step 1: Determine the purpose and scope of the data request.
  Step 2: Obtain the data.
• Transform
  Step 3: Validate the data for completeness and integrity.
  Step 4: Clean the data.
• Load
  Step 5: Load the data for data analysis.
Extract
Step 2: Obtain the Data – Methods
There are a couple of options:
• Obtain data through a data request to the IT department.
• Obtain data yourself.
Example Standard Data Request Form – Header
Section 1: Request Details
Requestor Name:
Requestor Contact Number:
Requestor Email Address:
Please provide a description of the information needed (indicate which tables and which fields you require):
What will the information be used for?
Frequency (circle one): One-Off / Annually / Termly / Other: ___________
Format you wish the data to be delivered in (circle one): Spreadsheet / Word Document / Text File / Other: ____________
Request Date:
Required Date:
Intended Audience:
Customer (if not requestor):
EXHIBIT 2-7 Example Standard Data Request Form
Obtain the data yourself
• If you have direct access to a data warehouse, you can use SQL and other tools to pull the data yourself.
• Identify the tables that contain the information you need. You can do this by looking through the data dictionary or the relationship model.
• Identify which attributes, specifically, hold the information you need in each table.
• Identify how those tables are related to each other.
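The steps for obtaining the data yourself (pick the tables, pick the attributes, join on the relationship) can be sketched as a single SQL query. The example below is a hedged illustration that builds a tiny in-memory SQLite database as a stand-in for a real data warehouse. The table and column names are hypothetical, and the supplier names are invented placeholders; the PO numbers, dates, and supplier IDs follow Exhibit 2-4.

```python
import sqlite3

# Stand-in for a data warehouse connection; a real extract would connect
# to an existing database instead of creating one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT
);
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,
    po_date     TEXT,
    supplier_id INTEGER REFERENCES supplier(supplier_id)
);
INSERT INTO supplier VALUES (1, 'Supplier A'), (2, 'Supplier B');
INSERT INTO purchase_order VALUES
    (1787, '2020-11-01', 1), (1788, '2020-11-01', 2),
    (1789, '2020-11-08', 1), (1790, '2020-11-15', 1);
""")

# Tables: purchase_order and supplier.
# Attributes: po_number, po_date, supplier_name.
# Relationship: purchase_order.supplier_id (FK) points to supplier.supplier_id (PK).
query = """
SELECT po.po_number, po.po_date, s.supplier_name
FROM purchase_order AS po
JOIN supplier AS s ON s.supplier_id = po.supplier_id
WHERE po.po_date >= ?
ORDER BY po.po_number
"""
rows = conn.execute(query, ("2020-11-08",)).fetchall()
for row in rows:
    print(row)
```

Passing the cutoff date as a parameter rather than pasting it into the SQL string keeps the query reusable for other periods.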
Example Standard Data Request Form – Response
Section 2: To be Completed by Information Systems Department
Request Number:
Date Received:
Received by:
Assigned to:
Initial review comments (discussion with client: revisions required? agreement to proceed? etc.):
Feedback from client (if applicable):
Work in progress comments (additional notes and comments during production of data):
Section 3: Completion Details
Date Completed:
Date Provided:
Revisions Required:
EXHIBIT 2-7 Example Standard Data Request Form
Transform
Step 3: Validating the data for completeness and integrity
• Chances are the data you request aren’t complete. Before you begin, do a little work to make sure your data are valid:
  Compare the number of records.
  Compare descriptive statistics for numeric fields.
  Validate Date/Time fields.
  Compare string limits for text fields.
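The four Step 3 checks can be sketched in a few lines of Python. Everything in the sample is hypothetical: the extract rows, the amount field, the control figures that would normally come from the source system, and the 30-character limit (borrowed from the Supplier Name field size in Exhibit 2-6). The sample deliberately contains a missing record and an impossible date so that the checks have something to find.

```python
from datetime import datetime
from statistics import mean

# Hypothetical extract; in practice these rows come from the delivered file.
extract = [
    {"po_number": "1787", "po_date": "11/1/2020", "amount": "250.00",
     "supplier_name": "Supplier A"},
    {"po_number": "1788", "po_date": "11/1/2020", "amount": "100.00",
     "supplier_name": "Supplier B"},
    {"po_number": "1789", "po_date": "11/31/2020", "amount": "75.50",
     "supplier_name": "Supplier A"},
]
# Hypothetical control figures reported by the source system.
control = {"record_count": 4, "amount_total": 525.50}
text_limits = {"supplier_name": 30}


def validate_extract(rows, control, text_limits, date_format="%m/%d/%Y"):
    """Run the four validation checks and return (statistics, findings)."""
    findings = []

    # 1. Compare the number of records.
    if len(rows) != control["record_count"]:
        findings.append(
            f"record count {len(rows)} != expected {control['record_count']}")

    # 2. Compare descriptive statistics for numeric fields.
    amounts = [float(row["amount"]) for row in rows]
    stats = {"min": min(amounts), "max": max(amounts),
             "mean": mean(amounts), "total": sum(amounts)}
    if abs(stats["total"] - control["amount_total"]) > 0.005:
        findings.append(
            f"amount total {stats['total']:.2f} != expected "
            f"{control['amount_total']:.2f}")

    # 3. Validate date fields: strptime rejects impossible dates.
    for row in rows:
        try:
            datetime.strptime(row["po_date"], date_format)
        except ValueError:
            findings.append(
                f"PO {row['po_number']}: invalid date {row['po_date']!r}")

    # 4. Compare string limits for text fields.
    for field, limit in text_limits.items():
        for row in rows:
            if len(row[field]) > limit:
                findings.append(
                    f"PO {row['po_number']}: {field} exceeds {limit} characters")

    return stats, findings


stats, findings = validate_extract(extract, control, text_limits)
for finding in findings:
    print(finding)
```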
In column 3, which of the following problems do you find?
a. data consistency error
b. data imputation error
c. data contradiction error
d. violated attribute dependencies
Transform
Step 4: Clean the data.
• Once you have valid data, there is still some work that needs to be done to make sure it is consistent and ready for analysis:
  Remove headings or subtotals.
  Clean leading zeroes and nonprintable characters.
  Format negative numbers.
  Correct inconsistencies across data, in general.
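The Step 4 cleaning tasks (removing headings or subtotals, cleaning leading zeroes and nonprintable characters, and formatting negative numbers) can be sketched as follows. The raw rows, the column layout, and the helper names are all hypothetical. The rule that a data row must start with a numeric PO number is an assumption that would need adjusting for other files, and leading zeroes should only be stripped from fields where they carry no meaning.

```python
# Hypothetical raw export: a heading row, a subtotal row, leading zeroes,
# nonprintable characters, and accounting-style negative numbers.
raw_rows = [
    ["PO Number", "Supplier", "Amount"],
    ["001787", "Supplier A\x07", "250.00"],
    ["001788", "Supplier\tB", "(100.00)"],
    ["Subtotal", "", "150.00"],
    ["001789", " Supplier A", "75.50-"],
]


def clean_text(value):
    """Replace nonprintable characters with spaces, then collapse whitespace."""
    printable = "".join(ch if ch.isprintable() else " " for ch in value)
    return " ".join(printable.split())


def parse_amount(value):
    """Convert negatives written as (100.00), -100.00, or 100.00- to floats."""
    text = clean_text(value).replace(",", "")
    negative = (
        (text.startswith("(") and text.endswith(")"))
        or text.startswith("-")
        or text.endswith("-")
    )
    number = float(text.strip("()-"))
    return -number if negative else number


def clean_rows(rows):
    """Keep only data rows and return (po_number, supplier, amount) tuples."""
    cleaned = []
    for po_number, supplier, amount in rows:
        po_number = clean_text(po_number)
        # Assumption: headings and subtotals do not start with a numeric PO number.
        if not po_number.isdigit():
            continue
        cleaned.append((
            po_number.lstrip("0") or "0",  # drop leading zeroes
            clean_text(supplier),
            parse_amount(amount),
        ))
    return cleaned


cleaned = clean_rows(raw_rows)
for row in cleaned:
    print(row)
```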
Knowledge check
In column 5, which of the following problems do you find?
a. data pivoting error
b. violated attribute dependencies
c. data consistency error
d. cryptic values
In row 8 and row 9, which of the following problems do you find?
a. data contradiction error
b. data concatenation error
c. data aggregation error
d. duplicate values
A note about data quality.
• Dates (e.g., 7/6/2023 or 6/7/2023 or 2023-07-06)
• Numbers (e.g., 1 or I, 7 or seven)
• International characters and encoding (e.g., * or “ or TAB)
• Languages and measures (e.g., Arkansas or AR, $ or €)
• Human error (e.g., 23 or 32)
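A few of these data quality issues can be handled by converting every value to one agreed representation. The sketch below is illustrative only: the helper names and the one-entry lookup tables are hypothetical, and the date format of each source must be known in advance, because a value such as 7/6/2023 cannot be interpreted safely without it.

```python
from datetime import datetime

# Hypothetical lookup tables with one entry each; real ones would be complete.
STATE_CODES = {"arkansas": "AR"}
NUMBER_WORDS = {"seven": 7}


def to_iso_date(text, source_format):
    """Parse a date using the source's known format and return YYYY-MM-DD."""
    return datetime.strptime(text.strip(), source_format).date().isoformat()


def normalize_state(text):
    """Return the two-letter code for a state name or an existing code."""
    key = text.strip().lower()
    if key.upper() in STATE_CODES.values():
        return key.upper()
    return STATE_CODES[key]  # raises KeyError for unknown values


def to_int(text):
    """Convert digits or a known number word to an integer."""
    key = text.strip().lower()
    if key in NUMBER_WORDS:
        return NUMBER_WORDS[key]
    # Look-alikes such as the letter I raise ValueError so a person can
    # review them instead of the code guessing.
    return int(key)


print(to_iso_date("7/6/2023", "%m/%d/%Y"))    # month-first source
print(to_iso_date("6/7/2023", "%d/%m/%Y"))    # day-first source
print(to_iso_date("2023-07-06", "%Y-%m-%d"))  # ISO source
print(normalize_state("Arkansas"), normalize_state("AR"))
print(to_int("seven"), to_int("7"))
```

Raising an error on values the code cannot interpret, rather than silently correcting them, leaves human errors such as 23 versus 32 to be resolved against the source documents.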
In column 2, row 7, which of the following problems do you find?
a. data threshold violation
b. data entry error
c. violated attribute dependencies
d. dichotomous variable problem
Format Cells Window in Excel
Load
Step 5: Load the data for data analysis
• Finally, you can now import your data into the tool of your choice and expect the functions to work properly.
Chapter 2 Summary
• The first step in the IMPACT cycle is to identify the questions that you intend to answer through your data analysis project. Once a data analysis problem or question has been identified, the next step in the IMPACT cycle is mastering the data, which can be broken down to mean obtaining the data needed and preparing it for analysis.
• In order to obtain the right data, it is important to have a firm grasp of what data are available to you and how that information is stored.
  • Data are often stored in a relational database, which helps to ensure that an organization’s data are complete and to avoid redundancy. Relational databases are made up of tables with uniquely identified records (this is done through primary keys) and are related through the usage of foreign keys.
• To obtain the data, you will either have access to extract the data yourself or you will need to request the data from a database administrator or the information systems team. If the latter is the case, you will complete a data request form, indicating exactly which data you need and why.
• Once you have the data, they will need to be validated for completeness and integrity; that is, you will need to ensure that all of the data you need were extracted, and that all data are correct. Sometimes when data are extracted, some formatting or sometimes even entire records will get lost, resulting in inaccuracies. Correcting the errors and cleaning the data is an integral step in mastering the data.
• Finally, after the data have been cleaned, there may be one last step of mastering the data, which is to load them into the tool that will be used for analysis. Often, the cleaning and correcting of data occur in Excel and the analysis will also be done in Excel. In this case, there is no need to load the data elsewhere. However, if you intend to do more rigorous statistical analysis than Excel provides, or if you intend to do more robust data visualization than can be done in Excel, it may be necessary to load the data into another tool following the transformation process.
Potential ethical issues surround how data are collected and how they are shared.
1. How does the company use data, and to what extent are they integrated into firm strategy?
2. Does the company send a privacy notice to individuals when their personal data are collected?
3. Does the company assess the risks linked to the specific type of data the company uses?
4. Does the company have safeguards in place to mitigate the risks of data misuse?
5. Does the company have the appropriate tools to manage the risks of data misuse?
6. Does the company conduct appropriate due diligence when sharing with or acquiring data from third parties?
Problems
• P1
• P2
• P3
• P4
• P5
• P6