Module 3: File and Database Organization: Test-Your-Knowledge Questions
Describe how fields, records, files, and databases are organized within a data hierarchy. (Level 1)
3.2 Database organization methods
3.3 Database management systems
3.4
3.5 Database developments
Module summary
1) Entity
2) Field
3) File
4) Record
1) It is short.
2) It is a file.
3) It represents an entity.
4) It is unique.
1) Indexed
2) Flat file
3) Hierarchical
4) Relational
Solution
2. Chapter 5, Review question 2, page 210
Solution
3. Chapter 5, Review question 13, page 210
Solution
Describe how fields, records, files, and databases are organized within a data hierarchy. (Level 1)
Required reading
Databases can be used for business intelligence purposes such as analyzing product profitability, building customer
profiles, and targeting promotions. In the opening case of Chapter 5, Valero Energy uses a fully integrated
enterprise business intelligence system, called WebFocus, to make meaningful data available and accessible
throughout the organization.
Basic terms
Data is generally organized in a hierarchy that starts with a character and builds up to a database. For
illustrative purposes, let's look at the components of a student database that holds the students' names, the
courses in which they are enrolled, and their grades. A character may be alphabetic, numeric, or a symbol, and
each character occupies a single position in a field. Each letter in a student's name is a character.
A field is a group of related characters, and it is the smallest piece of information in a record. For example, in
a student file, one field could hold the first name of each student; in an accounts receivable file, one field could
hold the invoice number. A field can also hold graphics, video, or sound. Several fields together make up a
record.
A record is a collection of related data fields. It holds all the information about an entity in the file. All the
records in a file must have the same fields.
A file is a collection of related records. Each file has a unique structure. For example, a paper-based file is
identified by a folder and all the pages it holds, organized in some fashion, perhaps with a table of contents. An
electronic file on a computer is identified by a filename, and holds all the records stored under the filename.
An entity is a generalized class of people, places, or things (objects) for which data is collected, stored, and
maintained. For example, in a student database, one entity could be a student. In general, each entity has at
least one record associated with it.
An attribute is a characteristic of an entity. In the above example, each student has a student number, name,
date of birth, and so on. Each attribute is stored in a field, and the fields that describe one entity are grouped
into a record. Not only must each record in a file contain the same fields, but each field must also hold the same
type of information and have the same attributes (type, width, and structure). An example of the attributes
defined for the NAME field of a personnel file could be:
Field description: NAME
Field type: Character field
Field width: 30 characters
Field structure: initial, use the first initial only. If the name is too long to fit in the field, drop the
initial, then truncate (shorten) the first name as needed.
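The NAME field rule can be expressed as a short formatting routine. The Python sketch below is illustrative only: the beginning of the field-structure rule is missing here, so the "Last, First I" layout and the function name `format_name` are assumptions. What it does take from the field description is the 30-character width and the fallback order: keep only the first initial, drop the initial if the name is still too long, then truncate the first name.

```python
FIELD_WIDTH = 30  # width of the NAME field, in characters

def format_name(last: str, first: str, middle_names: list, width: int = FIELD_WIDTH) -> str:
    """Fit a name into a fixed-width character field.

    Assumes a hypothetical 'Last, First I' layout; only the fallback
    order (drop the initial, then truncate the first name) comes from
    the field description.
    """
    # Use the first initial only, even when there are several middle names.
    initial = middle_names[0][0] if middle_names else ""
    candidate = f"{last}, {first} {initial}" if initial else f"{last}, {first}"
    if len(candidate) <= width:
        return candidate
    # Still too long: drop the initial.
    candidate = f"{last}, {first}"
    if len(candidate) <= width:
        return candidate
    # Still too long: truncate (shorten) the first name to fit the field.
    # If the last name alone exceeds the width, this also cuts into it.
    return candidate[:width]

print(format_name("Abernathy-Montgomery", "Christopher", ["Robert"]))
```

For example, "Abernathy-Montgomery, Christopher R" is 35 characters, so the initial is dropped and the first name is then cut to "Christop" to fit the 30-character field.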
Each record can be seen as a row in a table and each field can be seen as a column. A database is an
organized collection of records in one or multiple tables.
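To make the hierarchy concrete, here is a minimal Python sketch of the student example. The class names, field names, and sample values are hypothetical. Each dataclass instance plays the role of one record (a row), each attribute one field (a column), each list one file or table, and the dictionary the database.

```python
from dataclasses import dataclass, fields

@dataclass
class StudentRecord:
    """One record: all the fields describing one student."""
    student_number: str
    name: str
    date_of_birth: str

@dataclass
class EnrolmentRecord:
    """One record linking a student to a course and a grade."""
    student_number: str
    course: str
    grade: str

# A file (table) is a collection of records that all share the same fields.
students = [
    StudentRecord("S001", "Lee, Anna", "2002-03-14"),
    StudentRecord("S002", "Park, Ben", "2001-11-02"),
]
enrolments = [
    EnrolmentRecord("S001", "MIS201", "A"),
    EnrolmentRecord("S002", "MIS201", "B"),
]

# A database is an organized collection of related files (tables).
database = {"students": students, "enrolments": enrolments}

# Walking down the hierarchy: database -> file -> record -> field -> character
first_record = database["students"][0]
name_field = first_record.name
first_character = name_field[0]
field_names = [f.name for f in fields(StudentRecord)]
```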
All databases require that every record contain at least one key. A key is a field or set of fields that identifies
the record. A primary key is a field or set of fields that uniquely identifies each record in the table. If a key
field is not unique, a secondary key can be used together with it. For example, in a file containing a student
directory, the key field could be the name and the secondary key could be the address, so that when two students
have identical names, the address can be used for sorting.
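A short Python sketch of the key discussion, using a hypothetical student directory: it checks whether a candidate key is unique, then sorts on the name, using the address as a secondary key to order students who share a name. The helper `is_unique` and the sample rows are illustrative, not part of any particular DBMS.

```python
from typing import Callable, Hashable

# Hypothetical student directory rows: (name, address, student_number)
directory = [
    ("Chan, Mei", "88 Oak Ave", "S104"),
    ("Brown, Sam", "5 Pine Rd", "S101"),
    ("Chan, Mei", "12 Elm St", "S117"),
]

def is_unique(rows, key: Callable[[tuple], Hashable]) -> bool:
    """Return True if no two rows share the same key value."""
    values = [key(row) for row in rows]
    return len(set(values)) == len(values)

name_is_unique = is_unique(directory, key=lambda row: row[0])    # False: two "Chan, Mei"
number_is_unique = is_unique(directory, key=lambda row: row[2])  # True: usable as a primary key

# Sort on the name key, then on the address (secondary key) to break ties.
sorted_directory = sorted(directory, key=lambda row: (row[0], row[1]))
```

Note that the addresses are compared as text, so "12 Elm St" sorts before "88 Oak Ave" character by character.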
Database approach
As computer applications became more complex and required the use of several related files, database
techniques were developed to meet these needs. The Data Base Task Group of the Conference on Data
Systems Languages (CODASYL) published the first formal documentation of the key features of databases in
1971. This publication, which has been updated several times, has become the model that many software
developers use to develop databases.
Unlike the file approach, the database approach allows different applications (for example, accounting,
personnel, and payroll) to access the same database. Instead of organizing the data to meet the needs of a
particular application (for example, payroll), the database approach requires the organization to analyze its
overall information requirements, and then design a common database to meet the needs of multiple
applications.
Database systems provide a centralized repository of information that is not application-specific. Data
integrity, primary and secondary key management, and indexing are all managed centrally by the database.
Various applications access the database to update information. Because the information is no longer organized
in application-specific files, it is much easier to update or change software applications, as long as they use
the data in the structure defined by the database.
The database approach requires the use of database management systems (DBMS).
Data modelling
Logical design describes logical relationships among data and groups them in a logical order, whereas
physical design takes the logical design and structures it for efficiency and effectiveness. For example, it
might be more effective to create summary totals as data are entered, rather than calculate them each time
they are required, or some data attributes could be carried in more than one entity. These are examples of
planned data redundancy, with the goal of improving system performance to meet user needs.
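The summary-total example of planned redundancy can be sketched in Python as follows. `OrderFile` is a hypothetical name. The stored `total_cents` duplicates information that could be derived from the individual amounts, trading extra storage and update work for a total that is available immediately. Amounts are kept in cents (an assumption) to avoid floating-point rounding.

```python
class OrderFile:
    """Hypothetical order file that stores a running total (planned redundancy)."""

    def __init__(self) -> None:
        self.amounts_cents = []
        self.total_cents = 0  # redundant: derivable from amounts_cents

    def add_order(self, amount_cents: int) -> None:
        self.amounts_cents.append(amount_cents)
        # Update the summary total as the data is entered...
        self.total_cents += amount_cents

    def recalculated_total(self) -> int:
        # ...instead of recalculating it each time it is required.
        return sum(self.amounts_cents)

orders = OrderFile()
for amount in (1250, 899, 4000):
    orders.add_order(amount)
```

The cost of this redundancy is that every change path (for example, deleting or correcting an order) must also adjust the stored total, or the two values drift apart.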
An important tool for database designers is a data model, which is used to show relationships between
entities. If this is done at the highest level for the organization, it is known as enterprise modelling. A
commonly used tool for modellers is an entity-relationship (ER) diagram. By using these tools, designers
can ensure that relationships are logically structured so that when databases and application programs are
developed, they will in fact meet the needs of the system's users.
Database models
The data in a database can be interrelated in many ways. Historically, databases were organized in a
hierarchical or network structure. Today, the most popular structure is a relational database. Do not be overly
concerned with the mechanics of these structures. Instead, focus on the essential differences between the
database types, and the general organization of the data.
Hierarchical database
A hierarchical database organizes information in a tree-like structure in which data elements are related to
each other in a parent (superior) to child (subordinate) relationship. A data element can be a data field, a
record, or a database file.
The hierarchical database provides one-to-many relationships in a top-down manner. To access an employee in any
department, you must specify the department, because the department is the parent of the employee. If you have no
information on the parent, it is impossible to retrieve the item, because you must access the item through its
parent.
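A minimal Python sketch of the parent-to-child access path, assuming a hypothetical company with two departments. Nested dictionaries stand in for the tree: an employee can only be reached by naming the department first, and naming the wrong parent finds nothing.

```python
from typing import Optional

# Hypothetical hierarchy: departments are parents, employees are children.
company = {
    "Accounting": {"E01": "Ali Hassan", "E02": "Beth Moreau"},
    "Sales": {"E07": "Carlos Diaz"},
}

def get_employee(department: str, employee_id: str) -> Optional[str]:
    """Reach an employee only by first going through the parent department."""
    employees = company.get(department)  # step 1: locate the parent
    if employees is None:
        return None
    return employees.get(employee_id)    # step 2: locate the child under that parent
```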
A hierarchical structure is particularly useful for databases containing structured information where access to
information is keyed to the structure, that is, the logical access is in the same hierarchy as the physical layout
of the database. The rigid structure of a hierarchical database enables it to be updated efficiently. Typically, it
is used in applications such as inventory management, where a large number (hundreds of thousands or
millions) of records are in the system.
Network database
A network database is similar to a hierarchical database except that a child in the system can have more
than one parent. Because more than one path to a particular data element exists, the database structure can
represent many-to-many relationships. Network databases are particularly efficient for looking up information
because they permit access from more than one starting point, and querying them is less restrictive than querying
a hierarchical database. A network database is appropriate for situations where queries of the database may not
follow a predetermined pattern. An example is a database of students and their course enrolment, where a student
can be enrolled in multiple courses and each course can have many students. The relationship between students and
courses is thus many-to-many, and a hierarchical database is inappropriate.
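The student and course example can be sketched in Python by storing each enrolment link in both directions, so a query can start from a student or from a course. The sample data is hypothetical, and the two dictionaries of sets are only a stand-in for the record links a network DBMS would maintain internally.

```python
from collections import defaultdict

# Hypothetical enrolments: each pair links one student to one course.
enrolments = [("S1", "ACC101"), ("S1", "MIS201"), ("S2", "MIS201"), ("S3", "ACC101")]

# Keep the links in both directions so a query can start from either side.
courses_of = defaultdict(set)   # student -> courses
students_in = defaultdict(set)  # course -> students
for student, course in enrolments:
    courses_of[student].add(course)
    students_in[course].add(student)
```

Starting from `S1` gives both of that student's courses; starting from `MIS201` gives both of its students.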
Relational database
A relational database uses two-dimensional tables called relations to store data. In the relational model,
each row of a table represents an entity, with the columns representing attributes. Each attribute can have
only certain predefined values, and these allowable values are called the domain. This provides automatic
error-checking features to all applications using the table.
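The domain idea can be illustrated with Python's built-in `sqlite3` module. In this sketch the table name, the columns, and the set of allowed grades are assumptions. A `CHECK` constraint declares the domain of the grade attribute, so the database itself rejects an out-of-domain value no matter which application tries to insert it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause defines the domain of the grade attribute:
# only these letter grades are allowed.
conn.execute("""
    CREATE TABLE enrolment (
        student_number TEXT NOT NULL,
        course         TEXT NOT NULL,
        grade          TEXT CHECK (grade IN ('A', 'B', 'C', 'D', 'F')),
        PRIMARY KEY (student_number, course)
    )
""")

conn.execute("INSERT INTO enrolment VALUES ('S001', 'MIS201', 'A')")

try:
    # 'Z' is outside the domain, so the DBMS refuses the row.
    conn.execute("INSERT INTO enrolment VALUES ('S002', 'MIS201', 'Z')")
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
```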
The relational database makes it particularly easy to answer user questions and produce reports. Basic data
manipulation includes selecting (eliminating rows that do not meet specified criteria), projecting (eliminating
columns), and joining and linking (combining tables into a new table).
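The three basic operations can be demonstrated with Python's `sqlite3` module and standard SQL. The `student` and `enrolment` tables and their rows are hypothetical. `WHERE` performs the selection, listing only some columns performs the projection (SQL keeps duplicate rows unless `DISTINCT` is added), and `JOIN ... ON` links the two tables through their common column, `student_number`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_number TEXT PRIMARY KEY, name TEXT, program TEXT);
    CREATE TABLE enrolment (student_number TEXT, course TEXT);
    INSERT INTO student VALUES ('S001', 'Lee, Anna', 'Accounting'),
                               ('S002', 'Park, Ben', 'Finance');
    INSERT INTO enrolment VALUES ('S001', 'MIS201'), ('S001', 'ACC101'),
                                 ('S002', 'MIS201');
""")

# Select: keep only the rows that meet a condition (eliminates rows).
selected = conn.execute(
    "SELECT * FROM student WHERE program = 'Accounting'").fetchall()

# Project: keep only some columns (eliminates columns).
projected = conn.execute(
    "SELECT name FROM student ORDER BY name").fetchall()

# Join: combine two tables on their common column into a new result table.
joined = conn.execute("""
    SELECT s.name, e.course
    FROM student AS s
    JOIN enrolment AS e ON s.student_number = e.student_number
    ORDER BY s.name, e.course
""").fetchall()
```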
One distinctive feature of a relational database is that any number of tables can be combined as long as they
share common fields. Two tables can be joined to form a third, provided they have a common column, and data from
multiple tables can be linked in this way to answer queries or produce reports. Using a relational database, you
can answer a complex query with a few simple commands, whereas the traditional file-based approach would require
several programs to be written and run against the various files containing the required data, and then a new
file to be created after several operations.
A relational database has properties beyond two-dimensional tables. For example, the rows in a table need not be
kept in any particular order or sequence, and the relation is a logical structure, so users need not be concerned
with physical storage details.
Because of the flexibility provided by relational databases, they are becoming the design of choice for computer
professionals. Relational databases reduce data redundancy (facilitated by the database joining capability) and
allow data tables to be added with relative ease. With relational databases, it is relatively easy to perform
queries on the data without being constrained by the actual structure of the data. Microsoft Access is a
relational database program.
Example 3.1
Choosing a database model
Francine Ong has been assigned to design a database for a new inventory control system. The following is a
partial description of the data items and their relationship:
Product items are organized by product lines, and each product can only belong to one product line. Each
salesperson is assigned one or more product lines. A product line can have more than one salesperson
assigned. Each salesperson is assigned a sales territory. For large territories, more than one salesperson can be
assigned.
Q: Of the three database models (hierarchical, network, and relational), which model is suitable for the
information described?
Solution
Exhibit 3-1 graphically represents the network model for the inventory database, while Exhibit 3-2 depicts the
relational model. Exhibit 3-3 is a short-form notation of the relational model.
Exhibit 3-1
Network model
Exhibit 3-2
Relational model
Instead of drawing the tables as shown in Exhibit 3-2, another common practice is to list the contents of each
table in short-form notation, marking the key field with an asterisk, as shown in Exhibit 3-3.
Exhibit 3-3
Short-form notation
Tables: Sales territory, Salesperson, Product line, Product assignment, Product
Sales territory: Territory code*, Territory name, Territory manager, G/L profit centre
Salesperson: Salesperson ID*, Salesperson name, Territory code, Quota for the year
Product line: Product line code*, Description
Product assignment: Product line code*, Salesperson ID*
Product: Product code*, Product name, Manufacturer, Product line code, Unit cost
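One way to turn the short-form notation into working tables is sketched below with Python's `sqlite3` module. The snake_case column names, data types, and sample rows are assumptions. The primary keys follow the asterisks in Exhibit 3-3, and the Product assignment table gets a two-part key because it links salespersons and product lines. SQLite enforces foreign keys only when they are switched on for the connection, which the sketch does with `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Column names are hypothetical snake_case versions of the Exhibit 3-3 fields.
SCHEMA = """
CREATE TABLE sales_territory (
    territory_code    TEXT PRIMARY KEY,
    territory_name    TEXT,
    territory_manager TEXT,
    gl_profit_centre  TEXT
);
CREATE TABLE salesperson (
    salesperson_id   TEXT PRIMARY KEY,
    salesperson_name TEXT,
    territory_code   TEXT REFERENCES sales_territory (territory_code),
    annual_quota     REAL
);
CREATE TABLE product_line (
    product_line_code TEXT PRIMARY KEY,
    description       TEXT
);
-- Two-part key: links salespersons and product lines (many-to-many).
CREATE TABLE product_assignment (
    product_line_code TEXT REFERENCES product_line (product_line_code),
    salesperson_id    TEXT REFERENCES salesperson (salesperson_id),
    PRIMARY KEY (product_line_code, salesperson_id)
);
CREATE TABLE product (
    product_code      TEXT PRIMARY KEY,
    product_name      TEXT,
    manufacturer      TEXT,
    product_line_code TEXT REFERENCES product_line (product_line_code),
    unit_cost         REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign-key enforcement
conn.executescript(SCHEMA)

# Hypothetical sample rows
conn.executemany("INSERT INTO sales_territory VALUES (?, ?, ?, ?)",
                 [("T1", "West", "Manager A", "PC-100"),
                  ("T2", "East", "Manager B", "PC-200")])
conn.executemany("INSERT INTO salesperson VALUES (?, ?, ?, ?)",
                 [("SP1", "Rivera", "T1", 500000),
                  ("SP2", "Singh", "T1", 400000),
                  ("SP3", "Tan", "T2", 450000)])
conn.executemany("INSERT INTO product_line VALUES (?, ?)",
                 [("PL1", "Electrical parts"), ("PL2", "Plumbing supplies")])
conn.executemany("INSERT INTO product_assignment VALUES (?, ?)",
                 [("PL1", "SP1"), ("PL1", "SP2"), ("PL2", "SP2"), ("PL2", "SP3")])
conn.execute("INSERT INTO product VALUES ('P100', 'Circuit breaker', 'Acme', 'PL1', 12.50)")

def salespersons_for(product_line_code):
    """Join the linking table to salesperson to list who sells a product line."""
    rows = conn.execute("""
        SELECT s.salesperson_name
        FROM product_assignment AS a
        JOIN salesperson AS s ON s.salesperson_id = a.salesperson_id
        WHERE a.product_line_code = ?
        ORDER BY s.salesperson_name
    """, (product_line_code,)).fetchall()
    return [name for (name,) in rows]

def insert_rejected(sql, params):
    """Return True if the DBMS refuses the row (key or reference violation)."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_rejected = insert_rejected(
    "INSERT INTO product_assignment VALUES (?, ?)", ("PL1", "SP1"))
unknown_line_rejected = insert_rejected(
    "INSERT INTO product_assignment VALUES (?, ?)", ("PL9", "SP1"))
```

With this layout, asking which salespersons handle plumbing supplies is a single join (`salespersons_for("PL2")`), and the DBMS itself refuses a duplicate assignment or an assignment to a product line that does not exist.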
Many computer database programs, such as Access, FileMaker Pro, and Paradox, provide relational capabilities.
Oracle and Microsoft SQL Server are examples of fully relational databases.
Any database is only as useful as the data it contains. Data should be accurate, complete, economical, flexible,
reliable, relevant, simple, timely, verifiable, accessible, and secure. The purpose of data cleanup is to develop
processes to ensure those characteristics. Data cleanup is particularly important when moving from a file-based
system to a database or when migrating from one database to another.
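As a small illustration of a cleanup step during migration, the Python sketch below trims stray whitespace, rejects rows with missing required fields, and rejects rows whose key duplicates one already accepted. The field names and rules are hypothetical; a real cleanup process would be driven by the organization's own data-quality requirements.

```python
# Hypothetical customer rows exported from an old file-based system.
raw_rows = [
    {"customer_id": "C01", "name": "  Acme Ltd "},
    {"customer_id": "C02", "name": ""},          # incomplete: no name
    {"customer_id": "C01", "name": "Acme Ltd"},  # duplicate key
    {"customer_id": " C03", "name": "Birch Co"},
]

REQUIRED_FIELDS = ("customer_id", "name")

def clean(rows):
    """Trim whitespace, reject incomplete rows, and reject duplicate keys."""
    seen_ids = set()
    accepted, rejected = [], []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if any(not row.get(f) for f in REQUIRED_FIELDS):
            rejected.append(row)   # incomplete record
        elif row["customer_id"] in seen_ids:
            rejected.append(row)   # duplicate of a record already accepted
        else:
            seen_ids.add(row["customer_id"])
            accepted.append(row)
    return accepted, rejected

accepted, rejected = clean(raw_rows)
```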
The goals and activities of a business should be supported by the appropriate database structure. To create,
implement, and use a database, a database management system (DBMS) is required. A DBMS is a group
of programs used as an interface between the database and either the application programs or a user. Users
include end-users, who use the information from the database or enter data into the database; programmers,
who develop applications for the database; and database administrators (DBA), who create and manage the
database. All DBMSs have certain common functions, but are classified by the type of database they support.
Concurrency controls within the DBMS ensure that two users cannot modify the same field at the same time.
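The text does not say how the DBMS prevents simultaneous changes. One common technique is locking. The Python sketch below shows a different one, optimistic concurrency control, where each record carries a version number and an update is refused if the record changed after the user read it. The class and method names are hypothetical, and the example simulates two users one after the other in a single thread; a real DBMS performs the check and the write as one atomic operation.

```python
class VersionedRecord:
    """A record that carries a version number for optimistic concurrency control."""

    def __init__(self, value: str) -> None:
        self.value = value
        self.version = 0

    def read(self):
        # Each user remembers the version they read.
        return self.value, self.version

    def write(self, new_value: str, version_read: int) -> bool:
        # Refuse the update if someone else changed the record since it was read.
        if version_read != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True

record = VersionedRecord("Credit limit: 5000")
_, version_a = record.read()   # user A reads
_, version_b = record.read()   # user B reads the same version
a_saved = record.write("Credit limit: 7000", version_a)  # A's update succeeds
b_saved = record.write("Credit limit: 6000", version_b)  # B's stale update is refused
```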
Distributed databases are technically quite complicated to implement and administer. Distributed database
storage involves storing an organization's data in several different servers that are connected via
telecommunication equipment. It is sufficient to know that such a technology exists, and that one form of
implementation is a replicated database. For the purpose of this course, the description in the text on page
203 is adequate.
Describe database developments, including data warehousing, data marts, and data mining. (Level 2)
Required reading
Business intelligence
Business intelligence (BI) is the process of getting enough of the right information in a timely manner and
usable form to support the business strategy, tactics, or operations.
Competitive intelligence is the continuous legal and ethical collection and analysis of information about
competitors for comparison purposes.
Counterintelligence is what a firm does to protect its information from the competition.
Knowledge management is a collection of techniques that captures and manages structured and
unstructured information to improve the ability of the organization to make timely and good business decisions.
Open database connectivity (ODBC) is a set of standards that supports database integration and makes it possible
to share information between databases. Software developed according to these standards can be used with any
ODBC-compliant database. This is extremely important to organizations that use a variety of database
applications. ODBC compliance is frequently a standard requirement when organizations select software.
Module 3 summary
File and database organization
This module introduces the basic concepts of files and databases, their components, and their organization.
Database characteristics, advantages, and disadvantages are reviewed, followed by a comparison of hierarchical,
network, and relational databases.
Describe how fields, records, files, and databases are organized within a data hierarchy.
Data must be organized and structured so that they can be used effectively.
Data hierarchy (from largest to smallest element):
1. Database: a group of files holding related information
2. File: a collection of related information called records
3. Record: a collection of attributes of an entity in a file. For example, in a personnel file, an employee is an
entity. Attributes of an employee include employee number, date of birth, and start date.
4. Field:
A field is the smallest piece of information in a record, corresponding to one attribute of an entity.
A primary key field is a field that uniquely identifies a record in a file, allowing quicker data access and
sorting.
A secondary key field is sometimes used for access and sorting, but it does not uniquely identify a record.
5. Entity: people, places, or objects for which data is collected, stored, and maintained
6. Attribute: a characteristic of an entity
7. Character: a letter, number, or symbol
widespread effect.
A database administrator (DBA) is required to manage the DBMS.
Three principal database models are:
hierarchical model: organizes information in a tree-like structure
network model: the database structure is many-to-many
relational model: uses two-dimensional tables called relations to store data
Which model to use depends on:
the nature of the data relationships
the need for flexibility
the volume of requests or changes to the database to be processed
the ease of use for end-users
Data warehouse
A data warehouse consolidates data from various operational systems and external sources.
It enables online analytical processing (OLAP) to provide information that meets the organization's information
needs.
It is difficult and costly to build; however, it provides a good return on investment if properly designed.
Data marts
Smaller versions of a data warehouse, called data marts, may be built first. These data marts can be used for
departmental OLAP and form the basis of the organization's data warehouse.
Data mining
Data mining is the automated discovery of patterns and relationships in data warehouses.
It searches the data for statistical "whys" by seeking patterns and then developing hypotheses to predict future
behaviour.
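To give a feel for automated pattern discovery, here is a minimal Python sketch of one building block of association-rule mining: counting how often pairs of items appear together in the same transaction and keeping the pairs that meet a minimum support. The transactions and the threshold are hypothetical, and practical data-mining tools typically use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales transactions (market baskets).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk"},
]

def frequent_pairs(baskets, min_support):
    """Count item pairs that occur together and keep those seen at least min_support times."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, e.g. ('bread', 'milk')
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = frequent_pairs(transactions, min_support=2)
```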
Solution 1
a. 2)
b. 3)
c. 4)
d. 3)
e. 4)
Solution 2
A database is a collection of integrated and related files. A database management system is the software used
to manipulate the database and provide an interface between the database and the user or application
programs. A database management system is systems software that helps organize data for effective access
and storage by multiple applications. A DBMS provides different users with different views of the data
(subschemas), avoids redundancy, encourages program independence, offers flexible access, and provides
centralized control.
Solution 3
Data mining is the automated discovery of patterns and relationships in data warehouses. Online analytical
processing (OLAP) tools can tell users what happened in their business. Data mining searches the data for
statistical "whys" by seeking patterns in the data and then developing hypotheses to predict future behaviour.
OLAP programs are used to store and deliver data warehouse information. OLAP allows users to explore corporate
data in new and innovative ways using multiple dimensions such as products, salespeople, or time.
OLAP programs include spreadsheets, reporting and analysis tools, and custom applications.
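Exploring data "using multiple dimensions" often amounts to aggregating a measure by different combinations of dimensions. The Python sketch below rolls up hypothetical sales facts by product, and by product and quarter. The field names and the `roll_up` helper are assumptions, not the interface of any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows from a data warehouse: one row per sale.
sales = [
    {"product": "Breakers", "salesperson": "Rivera", "quarter": "Q1", "amount": 1200},
    {"product": "Breakers", "salesperson": "Singh", "quarter": "Q1", "amount": 800},
    {"product": "Pipe", "salesperson": "Singh", "quarter": "Q2", "amount": 500},
    {"product": "Breakers", "salesperson": "Rivera", "quarter": "Q2", "amount": 300},
]

def roll_up(facts, dimensions):
    """Total the 'amount' measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact["amount"]
    return dict(totals)

by_product = roll_up(sales, ["product"])
by_product_and_quarter = roll_up(sales, ["product", "quarter"])
by_salesperson = roll_up(sales, ["salesperson"])
```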
Example 3.1: Choosing a database model (Solution, continued)
The hierarchical model is inappropriate in this case because of the many-to-many relationship between
salespersons and product lines: each salesperson can be assigned several product lines, and each product line can
have several salespersons. The network and relational models, however, are both suitable. The preferred model is a
relational database due to its flexibility to associate or link different types of data.