CHAPTER 8 Data Structures and CAATTs

This chapter discusses data structures and their use in data extraction. It covers flat-file structures such as sequential, indexed, hashing, and pointer structures. It also discusses relational database concepts such as entities, attributes, keys, and relationships. Specific structures covered include the hierarchical, network, and relational models, as well as normalization. It also discusses using views to provide user access to normalized database structures.
CHAPTER 8

DATA STRUCTURES AND CAATTS FOR DATA EXTRACTION
LEARNING OBJECTIVES
After studying this chapter, you should:
• Understand the components of data structures and how these are
used to achieve data processing operations.
• Be familiar with structures used in flat-file systems, including
sequential, indexed, hashing, and pointer structures.
• Be familiar with relational database structures and the principles of
normalization.
• Understand the features, advantages, and disadvantages of the
embedded audit module approach to data extraction.
• Know the capabilities and primary features of generalized audit
software.
• Become familiar with the more commonly used features of audit
command language.

Data structures have two fundamental components: organization and
access method.

Organization refers to the way records are physically arranged on
the secondary storage device. This may be either sequential or
random.
The access method is the technique used to locate records and to
navigate through the database or file. While several specific
techniques are used, in general, they can be classified as either
direct access or sequential access methods.

Flat-File Structure
End users in this environment own their data files rather
than share them with other users.
Data files are structured, formatted, and arranged to suit
the specific needs of the owner or primary user.
Sequential Structure
• All records lie in contiguous storage spaces in a specified
sequence (by key field)
• Sequential files are simple and easy to process
• The application reads the file from the beginning, in sequence
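As a minimal sketch (assuming an in-memory Python list stands in for the file and a hypothetical `key` field holds the primary key), sequential access reads records from the start in key order; locating one record means scanning until its key is found or passed:

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: int    # hypothetical primary key field; the file is sorted on it
    name: str

def sequential_find(records, target):
    """Scan a key-ordered file from the beginning; return (record, reads)."""
    reads = 0
    for rec in records:
        reads += 1
        if rec.key == target:
            return rec, reads
        if rec.key > target:   # file is in key order, so the target is absent
            break
    return None, reads

inventory = [Record(100, "bolt"), Record(205, "nut"), Record(310, "washer")]
rec, reads = sequential_find(inventory, 310)
print(rec.name, reads)   # washer 3
```

The read count shows why sequential files suit batch processing of the whole file better than lookups of individual records.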

Indexed Structure
So named because, in addition to the actual data file,
there exists a separate index that is itself a file of
record addresses.
The data file itself may be organized either
sequentially or randomly.
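A minimal sketch of the indexed idea, assuming a Python list stands in for a randomly ordered data file and a dict stands in for the separate index file (both hypothetical): the index maps each key to a record address, which supports direct lookup, and reading the index in key order also supports sequential processing.

```python
# Data file stored in no particular order (random organization).
data_file = [
    {"part": 310, "desc": "washer"},
    {"part": 100, "desc": "bolt"},
    {"part": 205, "desc": "nut"},
]

# Separate index: key -> record address (here, the position in data_file).
index = {rec["part"]: addr for addr, rec in enumerate(data_file)}

def indexed_lookup(key):
    """Direct access: one index lookup, then one jump to the address."""
    addr = index.get(key)
    return None if addr is None else data_file[addr]

def sequential_pass():
    """Sequential access: walk the index in key order."""
    return [data_file[index[k]]["desc"] for k in sorted(index)]

print(indexed_lookup(205))   # {'part': 205, 'desc': 'nut'}
print(sequential_pass())     # ['bolt', 'nut', 'washer']
```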

Virtual Storage Access Method (VSAM) Structure
Used for very large files that require routine batch
processing and a moderate degree of individual record
processing.

Hashing Structure
Employs an algorithm that converts the primary key of a
record directly into a storage address
• Advantage: Fast access speed
• Disadvantage: Inefficient use of storage space
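The sketch below assumes the division-remainder method (key modulo a prime number of slots) as the hashing algorithm and lets colliding records share a slot list (separate chaining); both choices and all names are illustrative assumptions, not a specific product's scheme. It shows the direct jump to a computed address and the empty slots that make hashing wasteful of storage.

```python
NUM_SLOTS = 7  # hypothetical file size; a prime modulus is a common choice

def hash_address(key: int) -> int:
    """Division-remainder method: convert a primary key to a slot address."""
    return key % NUM_SLOTS

# Each slot is a list so that keys which collide can share it.
storage = [[] for _ in range(NUM_SLOTS)]

def store(key: int, record: str) -> None:
    storage[hash_address(key)].append((key, record))

def fetch(key: int):
    # Jump straight to the computed slot; search only that slot's entries.
    for k, rec in storage[hash_address(key)]:
        if k == key:
            return rec
    return None

for key, desc in [(100, "bolt"), (204, "nut"), (311, "washer"), (107, "screw")]:
    store(key, desc)

print(fetch(107))                               # screw
print(len(storage[2]))                          # 2  (100 and 107 collide)
print(sum(1 for slot in storage if not slot))   # 4  empty slots out of 7
```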

Pointer Structures
Stores the address (pointer) of a related record in a
field within each data record.
Pointers provide connections between records and
may be used to link records between files.
Three types of pointers:
1. Physical address pointer: contains the actual disk storage location, which allows
direct access to the record.
Advantage: access speed.
Disadvantages: if the related record moves, the pointer must be changed. Because the
pointer has no logical relationship to the records it identifies, if it is lost or destroyed,
the record it references is also lost.

2. Relative address pointer: contains the relative position of a record in the file,
which must be manipulated to convert it to the physical address.

3. Logical key pointer: contains the primary key of the related record.
The key value is converted to the physical address by hashing.
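To make the three pointer types concrete, here is a hedged sketch with hypothetical constants: a relative pointer (record position) is converted to a physical address as file base plus position times record length, and a logical key pointer is converted with a division-remainder hash. The `next` field links an invoice chain by relative position.

```python
RECORD_LENGTH = 64   # hypothetical fixed record size in bytes
FILE_BASE = 10_000   # hypothetical disk address where the file begins
NUM_SLOTS = 7        # hypothetical size of a hashed file

def relative_to_physical(position: int) -> int:
    """Relative address pointer -> physical address (base + offset)."""
    return FILE_BASE + position * RECORD_LENGTH

def logical_to_address(primary_key: int) -> int:
    """Logical key pointer -> address, via division-remainder hashing."""
    return primary_key % NUM_SLOTS

# Invoice records chained by a relative pointer field ("next") holding the
# position of the related record, or None at the end of the chain.
invoices = [
    {"inv": 501, "amt": 120.0, "next": 2},
    {"inv": 777, "amt": 15.0,  "next": None},   # belongs to another chain
    {"inv": 502, "amt": 80.0,  "next": None},
]

def follow_chain(start: int):
    """Follow the pointers, returning (invoice number, physical address)."""
    pos, found = start, []
    while pos is not None:
        found.append((invoices[pos]["inv"], relative_to_physical(pos)))
        pos = invoices[pos]["next"]
    return found

print(follow_chain(0))           # [(501, 10000), (502, 10128)]
print(logical_to_address(502))   # 5
```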
Hierarchical & Network Database Structures
Early models employed flat-file techniques and new proprietary database structures.
The major difference between the approaches is the degree of process integration and
data sharing that can be achieved.
These database models were designed to support existing flat-file systems while allowing
a move to new levels of data integration.
A many-to-many association is illustrated on the next slide.
Link files may also contain accounting data.
Relational Database Structure, Concepts and Terminology
Based on the indexed sequential file structure, which facilitates direct access to
individual records and batch processing of the file.
Multiple indexes can create an inverted list, or cross-reference, allowing even more
flexible access to data.
Represents data in two-dimensional tables and supports the relational algebra
functions of restrict, project, and join.
The Relational Algebra Functions
Restrict, Project, and Join
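Restrict selects the rows that meet a condition, project selects columns, and join combines rows from two tables that share a common attribute value. A minimal Python sketch over lists of dicts (hypothetical tables and column names; unlike formal project, duplicate rows are not removed here):

```python
inventory = [
    {"part": 1, "desc": "bolt",   "supplier": 10, "qty": 500},
    {"part": 2, "desc": "nut",    "supplier": 20, "qty": 0},
    {"part": 3, "desc": "washer", "supplier": 10, "qty": 75},
]
supplier = [
    {"supplier": 10, "name": "Acme"},
    {"supplier": 20, "name": "Bolt Co"},
]

def restrict(table, predicate):
    """Keep only the rows that satisfy a condition."""
    return [row for row in table if predicate(row)]

def project(table, columns):
    """Keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in table]

def join(left, right, key):
    """Combine rows whose values match on a common attribute."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

in_stock = restrict(inventory, lambda r: r["qty"] > 0)
report = project(join(in_stock, supplier, "supplier"), ["desc", "name"])
print(report)  # [{'desc': 'bolt', 'name': 'Acme'}, {'desc': 'washer', 'name': 'Acme'}]
```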
Relational Database Concepts

An entity is anything about which the organization wishes to capture data.
The data model is the blueprint for creating the physical database.
Occurrence is used to describe the number of instances or records pertaining to a
specific entity.
Attributes are the data elements that define an entity.
Relational Database Concepts
Association
Represented by a line connecting two entities.
Described by a verb, such as ships, requests, or receives.

Cardinality is the degree of association between two entities: the number of
possible occurrences in one table that are associated with a single occurrence
in a related table.
Four basic forms: zero or one (0,1), one and only one (1,1), zero or many (0,M),
and one or many (1,M).
Examples of entity associations
The Physical Database Tables
Physical database tables are constructed from the data model,
with each entity in the model transformed into a separate physical
table.

Linkages Between Relational Tables

Logically related tables need to be physically connected to
achieve the associations described in the data model. This is
accomplished by using a foreign key, as illustrated in Figure
8.14.
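A hedged sketch using Python's built-in `sqlite3` module: the hypothetical `sales_invoice.cust_num` column is a foreign key referencing `customer`, so the tables can be joined, and SQLite rejects an invoice pointing at a nonexistent customer once foreign-key enforcement is switched on (it is off by default in SQLite). Table and column names are assumptions, not taken from Figure 8.14.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
con.execute("CREATE TABLE customer (cust_num INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE sales_invoice (
                   invoice_num INTEGER PRIMARY KEY,
                   cust_num INTEGER REFERENCES customer(cust_num),
                   amount REAL)""")
con.execute("INSERT INTO customer VALUES (1875, 'J. Smith')")
con.execute("INSERT INTO sales_invoice VALUES (1, 1875, 800.0)")

try:
    # No customer 9999 exists, so this orphan invoice is rejected.
    con.execute("INSERT INTO sales_invoice VALUES (2, 9999, 50.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The foreign key is what lets the two tables be joined.
rows = con.execute("""SELECT c.name, s.amount
                      FROM customer c JOIN sales_invoice s
                      ON c.cust_num = s.cust_num""").fetchall()
print(rejected, rows)   # True [('J. Smith', 800.0)]
```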

User Views

A user view is the set of data that a particular user sees. A user view
(external schema) is a view of part or all of the contents of a database,
specified to facilitate a particular purpose or user activity.

Anomalies, Structural Dependencies, and Data Normalization

Database Anomalies
A database anomaly is an inconsistency in the data resulting
from an operation like an update, insertion, or deletion.

Update Anomaly

The update anomaly results from data redundancy in an
unnormalized table.
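A small illustration with hypothetical data: because a supplier's address is repeated on every part row, changing it in only one row leaves the table contradicting itself, while storing it once in a separate supplier table removes the redundancy.

```python
# Unnormalized table: the supplier address repeats on every part row.
parts = [
    {"part": 1, "desc": "bolt",   "supplier": 22, "sup_addr": "10 Main St"},
    {"part": 2, "desc": "nut",    "supplier": 22, "sup_addr": "10 Main St"},
    {"part": 3, "desc": "washer", "supplier": 22, "sup_addr": "10 Main St"},
]

# The supplier moves, but the update reaches only the first row.
parts[0]["sup_addr"] = "99 Oak Ave"

addresses = {row["sup_addr"] for row in parts if row["supplier"] == 22}
print(sorted(addresses))  # ['10 Main St', '99 Oak Ave']  (inconsistent data)

# Normalized alternative: the address is stored once, in a supplier table.
suppliers = {22: {"sup_addr": "10 Main St"}}
suppliers[22]["sup_addr"] = "99 Oak Ave"   # one update keeps the data consistent
```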

Insertion Anomaly

An insertion anomaly is the inability to add data to the
database due to the absence of other data.

Deletion Anomaly

The deletion anomaly involves the unintentional
deletion of data from a table.

Auditors and Data Normalization

Database normalization is a technical matter that is
usually the responsibility of systems professionals.
The subject, however, has implications for internal
control that make it a concern of auditors as well.

DESIGNING RELATIONAL DATABASES

Identify Entities
The four key entities:
Inventory (Inventory Status Report)
Supplier
Inventory Purchases (Purchase Order)
Inventory Receipts (Receiving Report).
Construct a Data Model Showing Entity Associations (Cardinality)
Add Primary Keys and Attributes to the Model

Add Primary Keys


The next step in the process is to assign primary keys to the entities
in the model. The analyst should select a primary key that logically
defines the nonkey attributes and uniquely identifies each
occurrence in the entity.
Add Attributes
Every attribute in an entity should appear directly or indirectly (a
calculated value) in one or more user views. Entity attributes are,
therefore, originally derived and modeled from user views.
Normalize Data Model and Add Foreign Keys
Repeating Group Data in Purchase Order
The attributes Part Number, Description, Order Quantity, and Unit Cost
are repeating group data. This means that when a particular purchase order
contains more than one item (as it usually does), multiple values must be
captured for these attributes.
Repeating Group Data in Receiving Report
The attributes Part Number, Quantity Received, and Condition Code are
repeating groups in the Receiving Report entity and were removed to a
new entity called Rec Report Item Detail.
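A minimal sketch of removing a repeating group, with hypothetical field names and values: the purchase order is split into a header holding its single-valued attributes and one detail row per line item, each carrying the PO number so the rows link back to the header (the detail key would be the PO number plus the part number).

```python
# One purchase order with a repeating group of line items (unnormalized).
po = {
    "po_num": 4568,        # hypothetical values throughout
    "supplier": 22,
    "items": [
        {"part": 1, "desc": "bolt",   "order_qty": 100, "unit_cost": 0.25},
        {"part": 3, "desc": "washer", "order_qty": 40,  "unit_cost": 0.10},
    ],
}

def remove_repeating_group(order):
    """Split a PO into a header row and one detail row per line item."""
    header = {k: v for k, v in order.items() if k != "items"}
    # Each detail row carries the PO number so it can be linked back.
    # ("desc" is still redundant with Inventory; resolving that transitive
    # dependency is a later step.)
    detail = [{"po_num": order["po_num"], **item} for item in order["items"]]
    return header, detail

header, po_item_detail = remove_repeating_group(po)
print(header)               # {'po_num': 4568, 'supplier': 22}
print(len(po_item_detail))  # 2
```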
Transitive Dependencies
The Purchase Order and Receiving Report entities contain attributes that
are redundant with data in the Inventory and Supplier entities.
Construct the Physical Database
Prepare the Physical User View

The Receiving Report, Purchase Order, and Inventory
Status Report views would all be created in this way.

The SELECT clause identifies all of the attributes to be contained in the
view.
The FROM clause identifies the tables used in creating the view.
The WHERE clause specifies how rows in the Inventory, Part-Supplier, and
Supplier tables are to be matched to create the view.
Multiple expressions may be linked with the AND, OR, and NOT operators.
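The following hedged sketch uses Python's `sqlite3` module to build a small view in the way described, joining hypothetical `inventory`, `part_supplier`, and `supplier` tables. The schema and data are assumptions for illustration: SELECT names the view's attributes, FROM names the tables, and WHERE matches rows, with AND linking the two matching conditions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE inventory (part_num INTEGER PRIMARY KEY, description TEXT,
                        qty_on_hand INTEGER);
CREATE TABLE supplier (supplier_num INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE part_supplier (part_num INTEGER, supplier_num INTEGER,
                            PRIMARY KEY (part_num, supplier_num));
INSERT INTO inventory VALUES (1, 'bolt', 500), (2, 'nut', 0);
INSERT INTO supplier VALUES (10, 'Acme'), (20, 'Bolt Co');
INSERT INTO part_supplier VALUES (1, 10), (2, 10), (2, 20);

CREATE VIEW inventory_status AS
SELECT inventory.part_num    AS part_num,
       inventory.description AS description,
       inventory.qty_on_hand AS qty_on_hand,
       supplier.name         AS supplier_name
FROM inventory, part_supplier, supplier
WHERE inventory.part_num = part_supplier.part_num
  AND part_supplier.supplier_num = supplier.supplier_num;
""")

rows = con.execute(
    "SELECT * FROM inventory_status ORDER BY part_num, supplier_name").fetchall()
print(rows)  # [(1, 'bolt', 500, 'Acme'), (2, 'nut', 0, 'Acme'), (2, 'nut', 0, 'Bolt Co')]
```

The user queries the view as if it were a single table, while the data stay in the normalized base tables.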
GLOBAL VIEW INTEGRATION
A MODERN COMPANY WOULD EMPLOY THOUSANDS OF VIEWS. THE TASK OF
COMBINING THE DATA NEEDS OF ALL USERS INTO A SINGLE ENTITY-WIDE SCHEMA
IS CALLED VIEW INTEGRATION.

THE NORMALIZED ENTITIES IN THE RESULTING DATA MODEL MUST MEET THE FOLLOWING CONDITIONS:
1. AN ENTITY MUST CONSIST OF TWO OR MORE OCCURRENCES.
2. NO TWO ENTITIES MAY HAVE THE SAME PRIMARY KEY.
3. NO NON-KEY ATTRIBUTE MAY BE ASSOCIATED WITH MORE THAN ONE ENTITY.
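A hedged sketch of checking these three conditions mechanically, assuming each integrated entity is described by a hypothetical dict holding its primary key, its non-key attributes, and an occurrence count:

```python
from collections import Counter

# Hypothetical entities produced by integrating several user views.
entities = {
    "Inventory":     {"key": "part_num",     "attrs": ["description", "qty_on_hand"],
                      "occurrences": 1200},
    "Supplier":      {"key": "supplier_num", "attrs": ["name", "address"],
                      "occurrences": 85},
    "PurchaseOrder": {"key": "po_num",       "attrs": ["order_date", "address"],
                      "occurrences": 4000},
}

def integration_problems(ents):
    problems = []
    # 1. An entity must consist of two or more occurrences.
    problems += [f"{n}: fewer than two occurrences"
                 for n, e in ents.items() if e["occurrences"] < 2]
    # 2. No two entities may have the same primary key.
    key_counts = Counter(e["key"] for e in ents.values())
    problems += [f"primary key {k} used by {c} entities"
                 for k, c in key_counts.items() if c > 1]
    # 3. No non-key attribute may be associated with more than one entity.
    attr_counts = Counter(a for e in ents.values() for a in e["attrs"])
    problems += [f"attribute {a} appears in {c} entities"
                 for a, c in attr_counts.items() if c > 1]
    return problems

print(integration_problems(entities))  # ['attribute address appears in 2 entities']
```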
COMMERCIAL DATABASE SYSTEM
COMMERCIAL SYSTEMS ARE DESIGNED TO COMPLY WITH PROVEN INDUSTRY BEST
PRACTICES AND TO SATISFY THE MOST COMMON NEEDS OF DIFFERENT CLIENT
ORGANIZATIONS. FOR EXAMPLE, ALL ORGANIZATIONS THAT SELL PRODUCTS TO
CUSTOMERS WILL NEED AN INVENTORY TABLE, A CUSTOMER TABLE, A SUPPLIER
TABLE, AND SO FORTH.
EMBEDDED AUDIT MODULE
IDENTIFY IMPORTANT TRANSACTIONS LIVE WHILE THEY ARE BEING PROCESSED
AND EXTRACT THEM.

EXAMPLES
· ERRORS
· FRAUD
· COMPLIANCE
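As a minimal sketch, assuming a hypothetical materiality threshold as the auditor's selection criterion, an embedded audit module can be pictured as a check inside the application's own processing loop that copies qualifying transactions to a separate audit file. The class and function names are illustrative only; the extra check on every transaction also hints at the operational-efficiency cost of the approach.

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    txn_id: int
    account: str
    amount: float

@dataclass
class EmbeddedAuditModule:
    materiality_limit: float   # hypothetical auditor-chosen threshold
    audit_file: list = field(default_factory=list)

    def inspect(self, txn: Transaction) -> None:
        """Copy material transactions to the audit file as they are processed."""
        if txn.amount >= self.materiality_limit:
            self.audit_file.append(txn)

def post_transactions(txns, eam):
    """Hypothetical application routine with the audit hook embedded in it."""
    ledger_total = 0.0
    for txn in txns:
        eam.inspect(txn)             # audit check runs during normal processing
        ledger_total += txn.amount   # the application's own work continues
    return ledger_total

eam = EmbeddedAuditModule(materiality_limit=10_000)
total = post_transactions(
    [Transaction(1, "1200", 950.0), Transaction(2, "1200", 25_000.0)], eam)
print(total, [t.txn_id for t in eam.audit_file])   # 25950.0 [2]
```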
DISADVANTAGES

Operational efficiency: EAMs can decrease performance, especially if testing is
extensive.

Verifying EAM integrity: this can be difficult, for example in environments with
a high level of program maintenance.
GENERALIZED AUDIT SOFTWARE
Generalized audit software (GAS) is the most widely used CAATT for IS auditing. GAS allows auditors to access
electronically coded data files and perform various operations on their contents.
Some of the more common uses for GAS include:

1. Footing and balancing entire files or selected data items
2. Selecting and reporting detailed data contained in files
3. Selecting stratified statistical samples from data files
4. Formatting results of tests into reports
5. Printing confirmations in either standardized or special wording
6. Screening data and selectively including or excluding items
7. Comparing multiple files and identifying any differences
8. Recalculating data fields
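Three of these uses (footing, recalculating, and comparing files) are sketched below in plain Python, as a hedged illustration of the operations a GAS performs rather than the syntax of any GAS product. The invoice and shipment files, field names, and control total are hypothetical.

```python
# Hypothetical invoice file extracted from a client system.
invoices = [
    {"inv": 1, "qty": 10, "price": 2.50, "extended": 25.00},
    {"inv": 2, "qty": 4,  "price": 9.00, "extended": 38.00},   # extension error
    {"inv": 3, "qty": 1,  "price": 5.00, "extended": 5.00},
]
control_total = 68.00   # hypothetical total reported by the client

def foot(records, field):
    """Footing: add up a field for the entire file."""
    return round(sum(r[field] for r in records), 2)

def recalc_exceptions(records):
    """Recalculation: recompute qty * price and flag mismatches."""
    return [r["inv"] for r in records
            if round(r["qty"] * r["price"], 2) != r["extended"]]

def compare_files(file_a, file_b, key):
    """Comparison: keys found in one file but not the other."""
    a, b = {r[key] for r in file_a}, {r[key] for r in file_b}
    return sorted(a - b), sorted(b - a)

shipments = [{"inv": 1}, {"inv": 3}, {"inv": 4}]
print(foot(invoices, "extended") == control_total)  # True
print(recalc_exceptions(invoices))                   # [2]
print(compare_files(invoices, shipments, "inv"))     # ([2], [4])
```

Note that the file foots to the control total even though invoice 2 is misstated, which is why recalculation is a separate test.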
GENERALIZED AUDIT SOFTWARE
is popular because…

• GAS SOFTWARE IS EASY TO USE AND REQUIRES LITTLE COMPUTER BACKGROUND.
• MANY GAS PRODUCTS ARE PLATFORM INDEPENDENT, WORKING ON MAINFRAMES AND PCS.
• AUDITORS CAN PERFORM TESTS INDEPENDENTLY OF IT STAFF.
• GAS CAN BE USED TO AUDIT THE DATA CURRENTLY BEING STORED IN MOST FILE
STRUCTURES AND FORMATS.
USING GAS TO ACCESS SIMPLE FILE STRUCTURES
USING GAS TO ACCESS COMPLEX FILE STRUCTURES
AUDIT COMMAND LANGUAGE
(ACL) SOFTWARE

A proprietary version of GAS

Leader in the industry

Designed as an auditor-friendly language

Access to data is generally easy with an
Open Database Connectivity
(ODBC) interface
DATA DEFINITION

ONE OF ACL’S STRENGTHS IS THE ABILITY TO READ DATA STORED IN MOST
FORMATS. ACL USES THE DATA DEFINITION FEATURE FOR THIS PURPOSE. THE DATA
DEFINITION SCREEN ALLOWS THE AUDITOR TO DEFINE IMPORTANT
CHARACTERISTICS OF THE SOURCE FILE.
CUSTOMIZING A VIEW
A VIEW IS SIMPLY A WAY OF LOOKING AT DATA IN A FILE; AUDITORS SELDOM
NEED TO USE ALL THE DATA CONTAINED IN A FILE. ACL ALLOWS THE AUDITOR TO
CUSTOMIZE THE ORIGINAL VIEW CREATED DURING DATA DEFINITION TO ONE THAT
BETTER MEETS HIS OR HER AUDIT NEEDS.
FILTERING DATA
ACL provides powerful options for filtering data that support various
audit tests. Filters are expressions that search for records that meet
the filter criteria. ACL’s expression builder allows the auditor to use
logical operators such as AND, OR, NOT and others to define and
test conditions of any complexity and to process only those records
that match specific conditions.
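ACL's own filter syntax is not reproduced here. As a hedged Python analogue with hypothetical records and fields, a filter is a Boolean expression combining AND, OR, and NOT that is evaluated against each record, and only matching records are processed:

```python
records = [
    {"inv": 1, "amount": 12_500.0, "approved": True,  "region": "North"},
    {"inv": 2, "amount": 300.0,    "approved": False, "region": "South"},
    {"inv": 3, "amount": 9_000.0,  "approved": False, "region": "North"},
]

def audit_filter(r):
    """(large OR unapproved) AND NOT from the South region."""
    return (r["amount"] > 10_000 or not r["approved"]) and not (r["region"] == "South")

selected = [r["inv"] for r in records if audit_filter(r)]
print(selected)  # [1, 3]
```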
STRATIFYING DATA
ACL’s stratification feature allows the auditor to view the
distribution of records that fall into specified strata. Data can be
stratified on any numeric field, such as sales price, unit cost,
quantity sold, and so on. The data are summarized and classified by
strata, which can be equal in size (called intervals) or vary in size
(called free).
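A minimal sketch of both kinds of strata, assuming a hypothetical list of unit costs: equal-width boundaries correspond to intervals, and hand-picked boundaries to free strata. Each stratum is half-open except the last, and values outside the overall range are ignored; those conventions are assumptions of this sketch, not ACL's documented behavior.

```python
from bisect import bisect_right

def equal_intervals(low, high, strata):
    """Boundaries for equal-sized strata ("intervals")."""
    width = (high - low) / strata
    return [low + k * width for k in range(strata + 1)]

def stratify(values, boundaries):
    """Return (counts, totals) per stratum.

    boundaries [b0, b1, ..., bn] define strata [b0, b1), [b1, b2), ...,
    with the last stratum closed at bn; values outside [b0, bn] are skipped.
    """
    n = len(boundaries) - 1
    counts, totals = [0] * n, [0.0] * n
    for v in values:
        if not boundaries[0] <= v <= boundaries[-1]:
            continue
        i = min(bisect_right(boundaries, v) - 1, n - 1)
        counts[i] += 1
        totals[i] += v
    return counts, totals

unit_costs = [5, 12, 18, 25, 40, 95, 250]
print(stratify(unit_costs, equal_intervals(0, 100, 4)))  # ([3, 2, 0, 1], [35.0, 65.0, 0.0, 95.0])
print(stratify(unit_costs, [0, 10, 50, 1000]))           # ([1, 4, 2], [5.0, 95.0, 345.0])
```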

STATISTICAL ANALYSIS
ACL offers many sampling methods for statistical analysis.
Two of the most frequently used are record sampling and
monetary unit sampling (MUS). Each method allows
random and interval sampling. The choice of method will
depend on the auditor’s strategy and the composition of the
file being audited. When records in a file are fairly evenly
distributed across strata, the auditor may want an unbiased
sample and will thus choose the record sampling approach.
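The sketch below contrasts the two methods in plain Python; it is not ACL's implementation, and the function names and data are hypothetical. Record sampling gives every record the same chance of selection (randomly or at a fixed interval), while interval monetary unit sampling selects every n-th dollar from a starting dollar, so records with larger balances are more likely to be chosen.

```python
import random

def record_interval_sample(records, interval, start):
    """Record sampling, interval method: every interval-th record from start."""
    return records[start::interval]

def record_random_sample(records, size, seed=None):
    """Record sampling, random method: each record equally likely."""
    return random.Random(seed).sample(records, size)

def mus_interval_sample(records, amount_field, interval, start):
    """Monetary unit sampling, interval method.

    Every interval-th dollar (beginning at dollar `start`) is selected, and
    the record containing that dollar enters the sample once.
    """
    selected, cumulative, next_hit = [], 0.0, start
    for rec in records:
        cumulative += rec[amount_field]
        if cumulative >= next_hit:
            selected.append(rec)
            while next_hit <= cumulative:   # skip further hits in a large record
                next_hit += interval
    return selected

recs = [
    {"id": "A", "amount": 100.0},
    {"id": "B", "amount": 5000.0},
    {"id": "C", "amount": 200.0},
    {"id": "D", "amount": 50.0},
    {"id": "E", "amount": 3000.0},
]
print([r["id"] for r in record_interval_sample(recs, interval=2, start=0)])  # ['A', 'C', 'E']
print([r["id"] for r in mus_interval_sample(recs, "amount", interval=2000, start=500)])  # ['B', 'E']
```

On this skewed file, MUS picks only the two large balances, while the record interval sample treats small and large records alike.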
SUMMARY

This chapter began with a review of data structures, which
specify (1) how records are organized in the physical file or
database and (2) the access method employed to navigate the
file.
MEMBERS

DELA CRUZ, ZYRELLE


GUTIERREZ, JANET
MUNOZ, PRECIOUS
PIDLAOAN, FATIMA
USON, VANESSA MAE
VILLANDA, BEVERLYN
