CH8 AudCis Notes in Book
1. Organization -> refers to the way records are physically arranged on the secondary storage
device. This may be either sequential or random.
The records in sequential files -> are stored in contiguous locations that occupy a specified area
of disk space.
Records in random files -> are stored without regard for their physical relationship to other records
of the same file.
-> Random files may have records distributed throughout a disk.
Access method -> is the technique used to locate records and to navigate through the database
or file. While several specific techniques are used, in general, they can be classified as either
direct access or sequential access methods.
Flat-File Structures
-> flat-file model describes an environment in which individual data files are not integrated with
other files.
-> End users in this environment own their data files rather than share them with other users.
-> Data processing is thus performed by standalone applications rather than integrated systems.
-> is a single view model that characterizes many legacy systems.
-> data files are structured, formatted, and arranged to suit the specific needs of the owner or
primary user. Such structuring, however, may omit or corrupt data attributes that are essential to
other users, thus preventing successful integration of systems across the organization
Indexed Structure
-> in addition to the actual data file, there exists a separate index that is itself a file of record
addresses.
-> This index contains the numeric value of the physical disk storage location (cylinder, surface,
and record block) for each record in the associated data file.
-> The data file itself may be organized either sequentially or randomly.
-> Records in an indexed random file are dispersed throughout a disk without regard for their
physical proximity to other related records.
-> In fact, records belonging to the same file may reside on different disks.
-> A record’s physical location is unimportant as long as the operating system software can find
it when needed.
-> Locating a record is accomplished by searching the index for the desired key value, reading
the corresponding storage location (address), and then moving the disk read-write head to the
address location. When a new record is added to the file, the data management software selects
a vacant disk location, stores the record, and adds the new address to the index.
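-> A minimal Python sketch of this lookup-and-insert behavior, using a dictionary as the index; the (cylinder, surface, block) addresses and helper names are invented for illustration only.
```python
# Illustrative sketch: an index maps each primary key to a physical storage address.
index = {}                                          # key value -> storage address
disk = {}                                           # storage address -> record (stands in for the disk)
free_locations = [(1, 0, 5), (7, 3, 2), (2, 1, 9)]  # vacant locations, in no particular order

def add_record(key, record):
    """Store the record at any vacant location and register its address in the index."""
    address = free_locations.pop()      # data management software picks a vacant spot
    disk[address] = record
    index[key] = address                # the new address is simply added to the index

def read_record(key):
    """Search the index for the key, then go to the corresponding address."""
    address = index[key]                # 1. find the key value in the index
    return disk[address]                # 2. move to that storage location and read

add_record(1875, {"name": "J. Smith", "balance": 1820.00})
print(read_record(1875))
```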
-> The physical organization of the index itself may be either sequential (by key value) or random.
Random indexes -> are easier to maintain, in terms of adding records, because new key records
are simply added to the end of the index without regard to their sequence.
Indexes in sequential order -> are more difficult to maintain because new record keys must be
inserted between existing keys.
-> One advantage of a sequential index is that it can be searched rapidly.
-> Because of its logical arrangement, algorithms can be used to speed the search through the
index to find a specific key value. This advantage becomes particularly important for large data
files with corresponding large indexes.
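-> A binary search is one such algorithm: because the index keys are in sequence, each comparison halves the remaining search range. A small sketch (the index contents are invented):
```python
# Sequential (sorted) index: list of (key, address) pairs ordered by key value.
sorted_index = [(1001, "A"), (1004, "B"), (1010, "C"), (1023, "D"), (1031, "E")]

def binary_search(index, target_key):
    """Halve the search range at each step; only works because the index is in key order."""
    low, high = 0, len(index) - 1
    while low <= high:
        mid = (low + high) // 2
        key, address = index[mid]
        if key == target_key:
            return address
        if key < target_key:
            low = mid + 1
        else:
            high = mid - 1
    return None          # key not present in the index

print(binary_search(sorted_index, 1023))   # -> "D"
```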
-> The principal advantage of indexed random files is in operations involving the processing of
individual records.
-> Another advantage is their efficient use of disk storage. Records may be placed wherever there
is space without concern for maintaining contiguous storage locations.
Virtual storage access method (VSAM) structure -> is used for very large files that require routine
batch processing and a moderate degree of individual record processing.
-> For instance, the customer file of a public utility company will be processed in batch mode for
billing purposes and directly accessed in response to individual customer queries.
-> Because of its sequential organization, the VSAM structure can be searched sequentially for
efficient batch processing.
-> is used for large files that often occupy several cylinders of contiguous storage on a disk.
-> To find a specific record location, the VSAM file uses a number of indexes that describe in
summarized form the contents of each cylinder.
-> VSAM indexes do not provide an exact physical address for a single record; instead, they identify the disk track where the record in question resides.
-> Because VSAM must read multiple indexes and search the track sequentially, the average access time for a single record is slower than that of the indexed sequential or indexed random structures.
-> A disadvantage of the VSAM structure is that it does not perform record insertion operations
efficiently.
-> because the VSAM file is organized sequentially, inserting a new record into the file requires
the physical relocation of all the records located beyond the point of insertion.
-> The indexes that describe this physical arrangement must, therefore, also be updated with
each record insertion, which is extremely time-consuming and disruptive to operations.
-> One method of dealing with this problem is to store new records in an overflow area that is
physically separate from the other data records in the file.
VSAM file -> rather than inserting a new record directly into the prime area, the data management
software places it in a randomly selected location in the overflow area.
-> it then records the address of the location in a special field (called a pointer) in the prime area.
-> Later, when searching for the record, the indexes direct the access method to the track location
where the record should reside. The pointer at that location reveals the record’s actual location in
the overflow area. Thus, accessing a record may involve searching the indexes, searching the
track in the prime data area, and finally searching the overflow area. This slows data access time
for both direct access and batch processing.
-> Periodically, the VSAM file must be reorganized by integrating the overflow records into the
prime area and then reconstructing the indexes. This process involves time, cost, and disruption
to operations.
-> when a file is highly volatile (records are added or deleted frequently), the maintenance burden
associated with the VSAM approach tends to render it impractical.
-> for large, stable files that need both direct access and batch processing, the VSAM structure
is a popular option.
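-> A toy Python sketch of the overflow-and-pointer lookup described above; the track layout, field names, and addresses are simplifications invented for illustration, not the actual VSAM on-disk format.
```python
import random

# Prime area: records in key sequence, plus optional pointers into the overflow area.
prime_track = {100: {"data": "rec-100"}, 200: {"data": "rec-200"}}
overflow = {}

def insert_record(key, data):
    """Place the new record in the overflow area and leave a pointer in the prime area."""
    slot = random.randint(1000, 9999)          # randomly selected overflow location
    overflow[slot] = {"data": data}
    prime_track[key] = {"pointer": slot}       # pointer field where the record "should" reside

def find_record(key):
    """Search the prime track first; follow the pointer to the overflow area if needed."""
    entry = prime_track.get(key)
    if entry is None:
        return None
    if "pointer" in entry:                     # record actually lives in the overflow area
        return overflow[entry["pointer"]]
    return entry

insert_record(150, "rec-150")
print(find_record(150))
```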
Hashing structure -> employs an algorithm that converts the primary key of a record directly into
a storage address.
-> Hashing eliminates the need for a separate index.
-> By calculating the address, rather than reading it from an index, records can be retrieved more
quickly.
-> The principal advantage of hashing is access speed. Calculating a record’s address is faster
than searching for it through an index. This structure is suited to applications that require rapid
access to individual records.
-> The hashing structure has two disadvantages. First, this technique does not use storage space
efficiently. The storage location chosen for a record is a mathematical function of its primary key
value. The algorithm will never select some disk locations because they do not correspond to
legitimate key values.
-> The second disadvantage is the reverse of the first. Different record keys may generate the
same (or similar) residual, which translates into the same address. This is called a "collision"
because two records cannot be stored at the same location. One solution to this problem is to
randomly select a location for the second record and place a pointer to it from the first (the
calculated) location.
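-> A toy sketch of division-remainder hashing with a pointer-based collision fix; the table size and pointer scheme are assumptions for illustration only.
```python
import random

SLOTS = 97                      # size of the (hypothetical) address space
storage = {}                    # address -> stored entry

def hash_address(primary_key):
    """Division-remainder hashing: the residual of the key becomes the address."""
    return primary_key % SLOTS

def store(primary_key, record):
    address = hash_address(primary_key)
    if address not in storage:
        storage[address] = {"key": primary_key, "record": record}
        return
    # Collision: two keys produced the same residual (address). Place the second
    # record at a randomly selected free location and point to it from the calculated one.
    free = [a for a in range(SLOTS, 2 * SLOTS) if a not in storage]
    alternate = random.choice(free)
    storage[alternate] = {"key": primary_key, "record": record}
    storage[address]["pointer"] = alternate

def fetch(primary_key):
    """Calculate the address; follow the collision pointer if the keys do not match."""
    entry = storage.get(hash_address(primary_key))
    while entry is not None and entry["key"] != primary_key:
        entry = storage.get(entry.get("pointer"))
    return entry

store(1234, "customer A")
store(1234 + SLOTS, "customer B")    # same residual as 1234 -> collision
print(fetch(1234 + SLOTS))
```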
Physical address pointer -> contains the actual disk storage location (cylinder, surface, and record
number) needed by the disk controller.
-> This physical address allows the system to access the record directly without obtaining further
information.
-> This method has the advantage of speed, since the pointer does not need to be manipulated further to determine a record’s location.
-> It has two disadvantages however: First, if the related record is moved from one disk location
to another, the pointer must be changed. This is a problem when disks are periodically
reorganized or copied.
-> Second, the physical pointers bear no logical relationship to the records they identify. If a
pointer is lost or destroyed and cannot be recovered, the record it references is also lost.
Relative address pointer -> contains the relative position of a record in the file.
Logical key pointer -> contains the primary key of the related record.
-> This key value is then converted into the record’s physical address by a hashing algorithm.
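-> A brief sketch contrasting how each pointer type is resolved; the record length and the hashing function below are invented for illustration.
```python
RECORD_LENGTH = 128              # assumed fixed-length records, in bytes

# Physical address pointer: the actual device location, usable as-is by the controller.
physical_pointer = {"cylinder": 12, "surface": 3, "record_block": 44}

# Relative address pointer: position within the file; must be converted to an offset.
relative_pointer = 57                            # the 58th record (zero-based)
byte_offset = relative_pointer * RECORD_LENGTH   # manipulation needed to locate the record

# Logical key pointer: the related record's primary key, converted by a hashing routine.
logical_key_pointer = 80312
hashed_address = logical_key_pointer % 97        # stand-in hashing algorithm

print(byte_offset, hashed_address)
```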
Two-dimensional flat files -> exist as independent data structures that are not linked logically or
physically to other files.
Database models -> were designed to support flat-file systems already in place, while allowing
the organization to move to new levels of data integration.
-> By providing linkages between logically related files, a third (depth) dimension is added to
better serve multiple-user needs.
Indexed sequential file structure -> the file structure on which relational databases are based.
-> This structure uses an index in conjunction with a sequential file organization.
-> It facilitates both direct access to individual records and batch processing of the entire file.
-> Multiple indexes can be used to create a cross-reference, called an "inverted list", which allows
even more flexible access to data.
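-> An inverted list is essentially a secondary index: for a chosen attribute, it lists the primary keys of every record holding each value. A small sketch over invented data:
```python
from collections import defaultdict

# Base table keyed by primary key (customer number).
customers = {
    1001: {"name": "Ames", "region": "East"},
    1002: {"name": "Baker", "region": "West"},
    1003: {"name": "Clark", "region": "East"},
}

# Build an inverted list on the Region attribute: attribute value -> list of primary keys.
inverted_region = defaultdict(list)
for key, row in customers.items():
    inverted_region[row["region"]].append(key)

# Flexible access: fetch all East-region customers without scanning the whole file.
print([customers[k]["name"] for k in inverted_region["East"]])   # ['Ames', 'Clark']
```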
Relational Database Theory -> E. F. Codd originally proposed the principles of the relational
model in the late 1960s.
-> The formal model has its foundations in relational algebra and set theory, which provide the theoretical basis for most of the data manipulation operations used.
-> Accordingly, a system is relational if it:
1. Represents data in the form of two-dimensional tables.
2. Supports the relational algebra functions of restrict, project, and join.
Although restrict, project, and join do not constitute the complete set of relational functions, they are a useful subset that satisfies most business information needs.
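-> The three functions can be illustrated on small in-memory tables (lists of dictionaries standing in for relational tables; the data are invented): restrict selects rows, project selects columns, and join combines two tables on a common attribute.
```python
customers = [
    {"cust_no": 1, "name": "Ames", "credit_limit": 5000},
    {"cust_no": 2, "name": "Baker", "credit_limit": 1000},
]
invoices = [
    {"invoice_no": 901, "cust_no": 1, "amount": 750},
    {"invoice_no": 902, "cust_no": 2, "amount": 1200},
]

# Restrict: extract only the rows that meet a stated condition.
high_limit = [c for c in customers if c["credit_limit"] >= 2000]

# Project: extract only the specified columns (attributes).
names_only = [{"name": c["name"]} for c in customers]

# Join: combine two tables on a common attribute (cust_no).
joined = [{**c, **i} for c in customers for i in invoices
          if c["cust_no"] == i["cust_no"]]

print(high_limit, names_only, joined, sep="\n")
```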
Data model -> is the blueprint for ultimately creating the physical database.
Entity relationship (ER) diagram -> the graphical representation used to depict the model.
Association -> the labeled line connecting two entities in a data model describes the nature of the
association between them.
-> This association is represented with a verb.
Physical database tables -> are constructed from the data model with each entity in the model
being transformed into a separate physical table.
Attributes -> Across the top of each table are attributes forming columns.
Tuple -> Intersecting the columns to form the rows of the table are tuples.
-> A tuple corresponds approximately to a record in a flat-file system.
Foreign key -> Logically related tables need to be physically connected to achieve the
associations described in the data model. This is accomplished by using a foreign key.
User view -> is the set of data that a particular user sees.
-> Examples of user views are computer screens for entering or viewing data, management
reports, or source documents such as an invoice.
-> Views may be digital or physical (paper), but in all cases, they derive from underlying database
tables.
Database Anomalies -> improperly normalized tables can cause DBMS processing problems that
restrict, or even deny, users access to the information they need. Such tables exhibit negative
operational symptoms called anomalies.
One or more of these anomalies will exist in tables that are not normalized or are normalized at a
low level, such as first normal form (1NF) or second normal form (2NF). To be free of anomalies,
tables must be normalized to the third normal form (3NF) level.
Deletion anomaly -> involves the unintentional deletion of data from a table.
-> The presence of the deletion anomaly is less conspicuous, but potentially more serious than
the update and insertion anomalies.
-> A flawed database design that prevents the insertion of records or requires the user to perform
excessive updates attracts attention quickly.
-> The deletion anomaly, however, may go undetected, leaving the user unaware of the loss of
important data until it is too late. -> This can result in the unintentional loss of critical accounting
records and the destruction of audit trails.
Dependencies -> The database anomalies described previously are symptoms of structural
problems within tables called dependencies.
Normalization process -> involves identifying and removing structural dependencies from the
table(s) under review.
The resulting tables will then meet the following two conditions:
1. All nonkey (data) attributes in the table are dependent on (defined by) the primary key.
2. All nonkey attributes are independent of the other nonkey attributes.
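-> A data sketch of these conditions (contents invented). The first table is below 3NF because supplier_name depends on supplier_no, a nonkey attribute; deleting the only row for part 207 also wipes out everything known about supplier S2 (a deletion anomaly). Splitting the table removes the dependency.
```python
# Unnormalized (below 3NF): supplier_name is defined by supplier_no, not by the key part_no.
inventory = {
    205: {"description": "Bracket", "supplier_no": "S1", "supplier_name": "Acme"},
    207: {"description": "Hinge",   "supplier_no": "S2", "supplier_name": "Zenith"},
}
del inventory[207]      # deleting the part unintentionally deletes supplier S2's data as well

# Normalized to 3NF: every nonkey attribute depends only on its own table's primary key.
inventory_3nf = {
    205: {"description": "Bracket", "supplier_no": "S1"},   # supplier_no acts as a foreign key
}
supplier = {
    "S1": {"supplier_name": "Acme"},
    "S2": {"supplier_name": "Zenith"},   # survives even if no part currently references it
}
print("S2" in supplier)   # True: the supplier data are preserved
```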
Keys in 1:1 Associations -> Where a true 1:1 association exists between tables, either (or both)
primary keys may be embedded as foreign keys in the related table.
-> On the other hand, when the lower cardinality value is zero (1:0,1) a more efficient table
structure can be achieved by placing the one-side (1:) table’s primary key in the zero-or-one (:0,1)
table as a foreign key.
Keys in 1:M Associations -> Where a 1:M (or 1:0,M) association exists, the primary key of the 1
side is embedded in the table of the M side.
Keys in M:M Associations -> To represent the M:M association between tables, a link table needs
to be created. The link table has a combined (composite) key consisting of the primary keys of
two related tables.
-> An M:M association between tables requires the creation of a separate link table because
embedding a foreign key within either table is not possible.
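-> A minimal sqlite3 sketch of these key-assignment rules; the table and column names are invented. The 1:M case embeds the Customer key in SalesOrder as a foreign key, and the M:M case uses a link table whose composite primary key combines the two related tables’ keys.
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer  (cust_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Inventory (item_no INTEGER PRIMARY KEY, description TEXT);

-- 1:M association: the primary key of the 1 side (Customer) is embedded
-- in the table on the M side (SalesOrder) as a foreign key.
CREATE TABLE SalesOrder (
    order_no INTEGER PRIMARY KEY,
    cust_no  INTEGER REFERENCES Customer(cust_no)
);

-- M:M association: a separate link table with a composite key made up of
-- the primary keys of the two related tables.
CREATE TABLE OrderLine (
    order_no INTEGER REFERENCES SalesOrder(order_no),
    item_no  INTEGER REFERENCES Inventory(item_no),
    quantity INTEGER,
    PRIMARY KEY (order_no, item_no)
);
""")
```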
Database normalization -> is a technical matter that is usually the responsibility of systems
professionals.
-> The subject, however, has implications for internal control that make it the concern of auditors
also.
-> Although most auditors will not be responsible for normalizing an organization’s databases,
they should have an understanding of the process and be able to determine whether a table is
properly normalized.
-> the auditor needs to know how the data are structured before he or she can extract data from
tables to perform audit procedures.
View Modeling -> the six phases of database design are collectively known as view modeling.
2. Construct a Data Model Showing Entity Associations (Cardinality) -> The next step in view
modeling is to determine the associations between entities and document them with an ER
diagram.
-> the organization’s business rules directly impact the structure of the database tables.
-> If the database is to function properly, its designers need to understand the organization’s
business rules as well as the specific needs of individual users.
Add Attributes -> Every attribute in an entity should appear directly or indirectly (a calculated
value) in one or more user views.
-> Entity attributes are, therefore, originally derived and modeled from user views. In other words, if stored data are not used in a document, report, or a calculation that is reported in some way, then they serve no purpose and should not be part of the database.
The query function of a relational DBMS allows the system designer to easily create user views
from tables. The designer simply tells the DBMS which tables to use, their primary and foreign
keys, and the attributes to select from each table. Older DBMSs require the designer to specify
view parameters directly in SQL. Newer systems do this visually. The designer simply points and
clicks at the tables and the attributes. From this visual representation, the DBMS generates the
SQL commands for the query to produce the view.
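-> In SQL terms, a user view is simply a stored query over the base tables. A short sqlite3 sketch (the table, column, and view names are assumptions):
```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer   (cust_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE SalesOrder (order_no INTEGER PRIMARY KEY,
                         cust_no  INTEGER REFERENCES Customer(cust_no),
                         amount   REAL);

-- A user view: the designer names the tables, the joining keys, and the
-- attributes to select; the DBMS stores the query, not a copy of the data.
CREATE VIEW customer_order_view AS
    SELECT c.name, s.order_no, s.amount
    FROM   Customer c JOIN SalesOrder s ON c.cust_no = s.cust_no;
""")
```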
View Integration -> the task of combining the data needs of all users into a single entity-wide
schema is called view integration.
-> The normalized entities in the resulting data model must meet the following conditions:
1. An entity must consist of two or more occurrences.
2. No two entities may have the same primary key. The exceptions to this are entities with composite keys that are composed of the primary keys of other entities.
3. No nonkey attribute may be associated with more than one entity.
Commercial Database Systems -> Modeling the data needs of thousands of user views is a
daunting undertaking when creating an entity-wide database from scratch.
-> To facilitate this task, modern commercial database systems come equipped with a core
schema, normalized tables, and templates for thousands of views.
-> Commercial systems are designed to comply with proven industry best practices and to satisfy
the most common needs of different client organizations.
Database vendors cannot, however, anticipate the information needs of all users in advance.
Therefore, new entities and new attributes may need to be added to the core schema. Although
configuring the core database in this fashion is far more efficient than working from scratch, the
objective is the same. The database designer must produce a set of integrated tables that are
free of the update, insert, and deletion anomalies and sufficiently rich to serve the needs of all
users.
Operational Efficiency
-> From the user’s point of view, EAMs decrease operational performance.
-> The presence of an audit module within the host application may create significant overhead,
especially when the amount of testing is extensive.
-> One approach for relieving this burden from the system is to design modules that may be turned
on and off by the auditor. Doing so will, of course, reduce the effectiveness of the EAM as an
ongoing audit tool.
Generalized audit software -> is the most widely used CAATT for IS auditing.
-> GAS allows auditors to access electronically coded data files and perform various operations
on their contents.
ACL SOFTWARE
-> Public accounting firms make extensive use of GAS.
-> Among them, ACL is an industry
leader.
ACL -> is a meta-language designed to allow auditors to access and analyze client data stored in
various digital formats.
-> In fact, many of the problems associated with accessing complex data structures have been solved by ACL’s Open Database Connectivity (ODBC) interface.
Data Definition
-> One of ACL’s strengths is the ability to read data stored in most formats.
-> ACL uses the data definition feature for this purpose.
-> To create a data definition, the auditor needs to know both where the source file physically
resides and its field structure layout.
-> Small files can be imported via text files or spreadsheets. Very large files may need to be
accessed directly from the mainframe computer. When this is the case, the auditor must obtain
access privileges to the directory in which the file resides.
-> Where possible, however, a copy of the file should be stored in a separate test directory or
downloaded to the auditor’s PC. This step usually requires the assistance of systems
professionals.
-> The auditor should ensure that he or she secures the correct version of the file, that it is
complete, and that the file structure documentation is intact. At this point, the auditor is ready to
define the file to ACL.
Data definition screen -> allows the auditor to define important characteristics of the source file,
including overall record length, the name given to each field, the type of data (i.e., numeric or
character) contained in each field, and the starting point and length of each field in the file.
-> This definition is stored in a table under a name assigned by the auditor.
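-> The same idea can be sketched in Python: a field layout (name, start position, length, type) describing a fixed-length source file is used to cut each record into named fields. The layout and sample record are invented, not an actual ACL definition.
```python
# A hypothetical data definition: field name, start position (0-based), length, data type.
layout = [
    ("cust_no",  0,  6, int),
    ("name",     6, 15, str),
    ("balance", 21,  9, float),
]
RECORD_LENGTH = 30      # overall record length

def define_record(raw_line):
    """Slice a fixed-length record into named fields according to the definition."""
    record = {}
    for name, start, length, convert in layout:
        record[name] = convert(raw_line[start:start + length].strip())
    return record

sample = "001875J. SMITH       001820.00"
assert len(sample) == RECORD_LENGTH
print(define_record(sample))
```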
Customizing a View -> A view is simply a way of looking at data in a file; auditors seldom need to
use all the data contained in a file.
-> ACL allows the auditor to customize the original view created during data definition to one that
better meets his or her audit needs.
-> The auditor can create and reformat new views without changing or deleting the data in the
underlying file.
Filtering Data -> ACL provides powerful options for filtering data that support various audit tests.
Filters -> are expressions that search for records that meet the filter criteria.
ACL’s expression builder -> allows the auditor to use logical operators such as AND, OR, NOT
and others to define and test conditions of any complexity and to process only those records that
match specific conditions.
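-> The same kind of filter logic can be sketched with ordinary boolean expressions in Python (the field names and thresholds are invented):
```python
records = [
    {"invoice_no": 1, "amount": 12500.0, "approved_by": ""},
    {"invoice_no": 2, "amount":   430.0, "approved_by": "KJB"},
    {"invoice_no": 3, "amount":  9900.0, "approved_by": "MLR"},
]

# Filter criteria: large invoices that are NOT approved, OR any negative amount.
matches = [
    r for r in records
    if (r["amount"] > 5000 and not r["approved_by"]) or r["amount"] < 0
]
print(matches)    # only invoice 1 meets the filter criteria
```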
Stratifying Data -> ACL’s stratification feature allows the auditor to view the distribution of records
that fall into specified strata.
-> Data can be stratified on any numeric field such as sales price, unit cost, quantity sold, and so
on.
-> The data are summarized and classified by strata, which can be equal in size (called intervals) or can vary in size (called free).
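-> A sketch of stratifying a numeric field into equal-size intervals (the sales data and interval width are invented):
```python
from collections import defaultdict

sales_prices = [12.0, 48.5, 51.0, 75.25, 99.0, 102.5, 180.0, 199.99]
INTERVAL = 50.0                                    # equal-size strata ("intervals")

strata = defaultdict(lambda: {"count": 0, "total": 0.0})
for price in sales_prices:
    bucket = int(price // INTERVAL)                # which stratum the record falls into
    strata[bucket]["count"] += 1
    strata[bucket]["total"] += price

for bucket in sorted(strata):
    lower = bucket * INTERVAL
    print(f"{lower:>7.2f} - {lower + INTERVAL:>7.2f}", strata[bucket])
```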
Statistical Analysis -> ACL offers many sampling methods for statistical analysis. Two of the most frequently used are record sampling and monetary unit sampling (MUS). Each method allows random and interval sampling. The choice of methods will depend on the auditor’s strategy and the composition of the file being audited. When records in a file are fairly evenly distributed across strata, the auditor may want an unbiased sample and will thus choose the record sampling approach.
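-> A sketch of the two approaches (the data and sample sizes are invented): record sampling gives every record an equal chance of selection, while monetary unit sampling steps through cumulative dollar amounts at a fixed interval, so large-value items are more likely to be selected.
```python
import random

invoices = [{"no": i, "amount": a} for i, a in
            enumerate([120.0, 45.0, 9800.0, 310.0, 72.0, 15000.0, 880.0], start=1)]

# Record sampling: each record has the same probability of selection.
record_sample = random.sample(invoices, k=3)

# Monetary unit sampling (fixed interval): step through the cumulative dollar amounts;
# a record is selected whenever a sampling dollar falls inside its amount.
total = sum(inv["amount"] for inv in invoices)
interval = total / 3                        # 3 sampling units
start = random.uniform(0, interval)         # random start within the first interval
targets = [start + i * interval for i in range(3)]

mus_sample, cumulative = [], 0.0
for inv in invoices:
    lower, cumulative = cumulative, cumulative + inv["amount"]
    if any(lower <= t < cumulative for t in targets):
        mus_sample.append(inv)

print(record_sample)
print(mus_sample)
```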