The Database Approach
As the information systems field developed during the late 1960s and early 1970s, the
concepts of the database (the information to be stored) and the database
management system (the software used to manage the database) emerged.
Today, sophisticated database management systems (DBMS) are used to handle
enormous databases.
Early database systems, like other computer software, were developed to provide a
well-defined set of functions using a specified set of data. The data were stored as
one or more computer files that were accessed by special-purpose database
software. This file processing approach to database management is illustrated in
Figure 3.3 below for a university administration application.
Figure 3.3 Sharing data files among applications in the File Processing Environment
The file processing approach has some serious drawbacks. First, since each
application program or user must directly access each data file it uses, the
program or user must know how the data in each file are stored. This can create
considerable redundancy, because the instructions for accessing a file must be
repeated in every program that uses it.
Second, if a data file is modified, the access instructions in every program that
uses it must be modified too.
A third problem arises when the same data are used by different application programs
or users, each of which can access and modify them: there is no central control,
which jeopardises data integrity and quality.
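The first two drawbacks can be illustrated with a minimal sketch. The fixed-width file layout, the two program names and the sample rows below are assumptions made up for illustration; they are not taken from Figure 3.3.

```python
# Hypothetical fixed-width student file (layout assumed for illustration):
# columns 0-3 student id, columns 4-13 last name, column 14 year of study.
STUDENT_FILE = (
    "0001Koech     4\n"
    "0002Wanjiru   2\n"
)

def read_students_registry(text):
    """Registry program: hard-codes the position of every field."""
    rows = []
    for line in text.splitlines():
        rows.append({
            "student_id": line[0:4],
            "last_name": line[4:14].strip(),
            "year": int(line[14:15]),
        })
    return rows

def count_by_year_finance(text, year):
    """Finance program: repeats the same layout knowledge independently."""
    return sum(1 for line in text.splitlines() if int(line[14:15]) == year)

# If the file is reorganised (here the name field grows to 12 characters),
# every program that reads it looks in the wrong column until it is rewritten.
WIDER_FILE = "0001Koech       4\n"
```

Both programs carry their own copy of the file layout (redundancy), and a change to the layout forces a change in each of them.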
In essence, a database system is nothing more than a
computerized record-keeping system, whereas the database itself can be regarded
as an electronic filing cabinet of some kind, i.e. a repository for a collection of
computerized data files (Date, 1986:1). The user of a database system can
perform a variety of operations on such files, including the following:
• Adding new files to the database
• Inserting new data into existing files
• Retrieving data from existing files
• Updating data in existing files
• Deleting data from existing files
• Removing existing files from the database
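As a rough sketch, the six operations listed above map onto SQL statements as follows. The example uses Python's built-in sqlite3 module with a throwaway in-memory database; the table name, column names and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Adding a new file (table) to the database
cur.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, last_name TEXT, year INTEGER)"
)

# Inserting new data into an existing file
cur.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Koech", 4), (2, "Otieno", 2)],
)

# Retrieving data from an existing file
year4 = cur.execute("SELECT last_name FROM student WHERE year = 4").fetchall()

# Updating data in an existing file
cur.execute("UPDATE student SET year = 3 WHERE student_id = 2")

# Deleting data from an existing file
cur.execute("DELETE FROM student WHERE student_id = 1")
remaining = cur.execute(
    "SELECT student_id, last_name, year FROM student"
).fetchall()

# Removing an existing file from the database
cur.execute("DROP TABLE student")
tables_left = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()
```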
Figure 3.4 Sharing data files among applications in a Data Base Management System Environment
(Aronoff, 1989)
Advantages
This concept of centralized data implies that there will be some identifiable person
who has central responsibility for the operational data. Several advantages arise
from this notion of centralized control. These include:
Redundancy can be reduced. In a database system this is achieved by integrating the
various files, for example a) the academic and administration files in a university
database, or b) the creditors and debtors files in a business organization.
Inconsistency can be avoided. Inconsistencies arise particularly during updates,
when one user or department updates its files and others do not.
Sharing means not only that existing users and applications can share the data in the
database, but also that new applications can be developed, or new users incorporated,
to use the same stored data.
With central control of the database approach it is easy to ensure that all applicable
data standards (organizational, national, regional or international) are observed in
the representation and coding of data.
Standardizing stored data is particularly desirable and can aid data interchange or
migration between systems.
Likewise, data naming (coding) and documentation standards are also very desirable
as an aid to data sharing and understandability.
Security restrictions can be applied. This implies:
a) ensuring that access to the database is through proper channels, for example
by use of passwords,
b) imposing security checks whenever access to sensitive data is
attempted,
c) controlling and streamlining updates.
To ensure smooth running and sharing, different security checks can be established for
each type of access (retrieve, update, modify, delete, etc.). Without such
checks, the security and integrity of the data are actually at higher risk than in a
traditional filing system.
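The idea of a separate check for each type of access can be sketched as a simple permission table. The roles and the permissions given to them below are hypothetical; a real DBMS provides its own mechanism for this, which is not shown here.

```python
# Hypothetical roles and permissions, chosen only for illustration.
PERMISSIONS = {
    "registry_clerk": {"retrieve", "update"},
    "lecturer": {"retrieve"},
    "dba": {"retrieve", "update", "modify", "delete"},
}

def check_access(role, operation):
    """Return True only if the role is cleared for this type of access."""
    return operation in PERMISSIONS.get(role, set())

def guarded(role, operation, action):
    """Run `action` only after the security check for `operation` passes."""
    if not check_access(role, operation):
        raise PermissionError(f"{role!r} may not {operation} this data")
    return action()
```

Because every request goes through the same check, the rules are defined once, centrally, rather than in each application.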
One of the primary functions of a database management system is to free users
and programs from the responsibility of knowing the physical details (length,
location and arrangement) of the stored data. This leaves the user free to
concentrate on the logical or informational aspects of the data.
A DBMS separates the data from the programs that access them, thus reducing access
barriers and enhancing horizontal and vertical integration.
Disadvantages
There are some disadvantages associated with the database approach including:
1. Complexity
Inevitably, the DBMS is a complex piece of software and requires database experts
both to look after it and to design and develop the database and applications.
2. Cost
3. Inefficiencies in processing
4. Rigidity
Most DBMSs were designed to manage a particular type of data: fixed-format records.
Until recently, other types of data such as text and graphics had to be excluded
from databases, and even now the ability to include them is still quite restricted.
A DBMS organises data from the same source into information suitable for different
users and uses, each of which is referred to as a database view. The figure below
illustrates this concept using a database comprising client names, addresses,
purchases and inventory information. These data are presented using two views: one
for the accounts executive, showing sales accounts, and one for the inventory
manager, showing items available.
Figure
By providing different views, the DBMS tailors the database to each user, a very
valuable function, without storing multiple copies of the same data.
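A minimal sketch of the two views, using Python's built-in sqlite3 module. The table layout, the view names and the sample rows are assumptions for illustration; only the general idea (clients, purchases and inventory presented in two ways) comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (client_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE item (item_id INTEGER PRIMARY KEY, description TEXT, in_stock INTEGER);
CREATE TABLE purchase (client_id INTEGER, item_id INTEGER, quantity INTEGER);

INSERT INTO client VALUES (1, 'A. Client', '1 Example Road');
INSERT INTO item VALUES (10, 'Widget', 7), (11, 'Gadget', 0);
INSERT INTO purchase VALUES (1, 10, 3);

-- View for the accounts executive: sales per client
CREATE VIEW sales_accounts AS
  SELECT c.name, i.description, p.quantity
  FROM purchase p
  JOIN client c ON c.client_id = p.client_id
  JOIN item i ON i.item_id = p.item_id;

-- View for the inventory manager: items currently available
CREATE VIEW items_available AS
  SELECT description, in_stock FROM item WHERE in_stock > 0;
""")

# Each user queries a view; the underlying rows are stored only once.
sales = conn.execute("SELECT * FROM sales_accounts").fetchall()
available = conn.execute("SELECT * FROM items_available").fetchall()
```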
DATABASE MODELS
As we saw in the data organization section, the contents of a data file can be
described using records and fields. A record is a group of related data items stored
together, which can be thought of as one row in a table as shown below.
Here the first record contains information for soil profile number 1. Each record
represents the information pertaining to a particular entity or element, here a soil
profile. A record is divided into fields, each of which contains an item of data. A
field defines where a particular type of data can be found in the record; for
example, the data fields in the table above include profile id, series name, pH,
depth, texture and erosion.
We introduce another concept, the key, which is a label comprising one or more
fields by means of which a record can be identified and retrieved from the database.
In our example the profile id could be designated the key. Fields which are not
designated as key fields are referred to as attribute fields.
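The terms record, field and key can be made concrete with a short sketch. The field names follow the soil profile example; the values in the two sample records are invented for illustration and do not come from the table in the text.

```python
from dataclasses import dataclass

@dataclass
class SoilProfile:
    """One record; each attribute below is one field."""
    profile_id: int      # key field
    series_name: str     # the remaining fields are attribute fields
    ph: float
    depth: float
    texture: str
    erosion: str

# Two hypothetical records (values are made up).
records = [
    SoilProfile(1, "Series A", 6.5, 120.0, "clay loam", "slight"),
    SoilProfile(2, "Series B", 5.8, 80.0, "sandy loam", "moderate"),
]

# Indexing by the key lets a record be identified and retrieved directly.
by_key = {record.profile_id: record for record in records}
first_profile = by_key[1]
```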
We start by looking at the simplest model, which is known as the hierarchical
database model.
When the data have a parent-child or one-to-many relation such as soil series within
a soil family or an academic department within a university, hierarchical methods
provide quick and convenient means of data access. Hierarchical systems of data
organization are well known in environmental science, being the methods used for
animal and plant taxonomies, soil classification, land cover classification, etc.
In the hierarchical database model the data are organized in a tree structure, as
shown in part A of the figure below. The relations among the five entities
(university, departments, students, lecturers and courses) are defined by the
organization of the hierarchy, which is encoded in the data records for each entity
as shown in part B. The field names are shown in the top half of each box and a
sample data record is shown in the lower half. In each record one field is
designated as the key field and is used to organize the hierarchy, which is
represented by the arrows connecting the key fields (bolded) in the data records.
The top of the hierarchy is referred to as the root and is composed of one entity, in
our example the University. The root may be represented by a record containing a
single data field (as in our case) or by a record containing many fields.
Except for the root, every element has one higher-level element related to it, termed
its parent, and may have one or more subordinate elements, termed children. An
element can have only one parent but can have multiple children (a one-to-many
relationship).
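The parent and child rules can be sketched with a small tree class. The class and the element names are hypothetical; the point is only that each element stores exactly one parent and any number of children.

```python
class Node:
    """An element in a hierarchy: one parent (None for the root), many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path_from_root(self):
        """Follow the single parent link upwards, then reverse the result."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))

# Hypothetical elements, one per level of the university example.
university = Node("University")
department = Node("Geography Department", university)
lecturer = Node("Lecturer L1", department)
course = Node("Course GIS101", lecturer)
```

Because each element has a single parent, there is exactly one path from the root to any element.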
Advantages
Hierarchical systems are easy to understand and are also easy to update and
expand.
Disadvantages
To retrieve information one has to traverse the tree structure. Retrieving information
on all students or all lecturers (key fields) from a specific department is very easy,
since there is a direct link between students or lecturers and their department.
However, finding all courses (a non-key field) offered by a specific department
requires a two-stage search (a search of all lecturers and then of all courses
associated with each lecturer), making the process inefficient.
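The difference between the direct lookup and the two-stage search can be sketched with nested dictionaries standing in for the hierarchy. All identifiers and sample data are hypothetical.

```python
# Hypothetical hierarchy: department -> lecturers -> courses.
hierarchy = {
    "Geography": {
        "lecturers": {
            "L1": {"courses": ["GIS101", "GEO210"]},
            "L2": {"courses": ["GIS101"]},
        },
    },
}

def lecturers_in(department):
    """Direct lookup: lecturers hang immediately below the department."""
    return sorted(hierarchy[department]["lecturers"])

def courses_in(department):
    """Two-stage search: visit every lecturer, then each lecturer's courses."""
    found = set()
    for lecturer in hierarchy[department]["lecturers"].values():  # stage 1
        found.update(lecturer["courses"])                         # stage 2
    return sorted(found)
```

The cost of `courses_in` grows with the number of lecturers, even though the question asked concerns only departments and courses.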
Searches cannot be done on attribute fields. For example we cannot search for all
second year students in the database above since the Year of Study field is not a
key field.
Consequently, hierarchical database models are only good for data retrieval if the
structure of all possible queries is known beforehand, so as to guide the
selection of key fields. In geographical analyses, data searches are often
exploratory and cannot be predicted in advance as they can in, for example,
bibliographic systems.
Some of the inflexibilities of the hierarchical database model can be overcome by the
network database model by allowing multiple parents as well as multiple children and
by doing away with the root. This allows searching of data records directly without
traversing the entire hierarchy above that record. The university database used
above is shown below using the network database model.
The Course entity can now have two parents, i.e. the Department and Lecturer entities.
A search for all courses in a specified department can now be done more directly.
Network models tend to have less redundant data storage than the corresponding
hierarchical models. However, more extensive linkage information must be stored,
adding to the size and complexity of the data files.
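A minimal sketch of the network idea, in which links are stored explicitly in both directions so that an element can have several parents. The class, the identifiers and the sample links are hypothetical.

```python
from collections import defaultdict

class Network:
    """Stores explicit parent-child links; an element may have many parents."""

    def __init__(self):
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def link(self, parent, child):
        # Every link is recorded twice: this is the extra linkage information.
        self.children[parent].add(child)
        self.parents[child].add(parent)

net = Network()
net.link(("Department", "Geography"), ("Lecturer", "L1"))
net.link(("Lecturer", "L1"), ("Course", "GIS101"))
net.link(("Department", "Geography"), ("Course", "GIS101"))  # second parent

def courses_in(department):
    """Direct search: no need to pass through the lecturers."""
    node = ("Department", department)
    return sorted(name for kind, name in net.children[node] if kind == "Course")
```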
In its simplest form, the relational database model consists of data stored in simple
records known as tuples, each containing an ordered set of attribute values. Tuples
are grouped together in two-dimensional tables known as relations. Each table is
usually a separate file.
Table 3. Student Information
Table 5. Professor Information
In the relational database model there is no hierarchy of data fields within a record;
every data field can be used as a key. A search can be made of any single
table using any of the attribute fields, singly or in combination. For example, the
Student Information table can be searched for all students in year 4 just as easily
as for all students with the last name Koech.
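A minimal sketch of searching one table on any attribute, singly or in combination, using Python's built-in sqlite3 module. The column names and the sample rows are assumptions; the actual layout of the Student Information table is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_information ("
    "student_id INTEGER PRIMARY KEY, last_name TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO student_information VALUES (?, ?, ?)",
    [(1, "Koech", 4), (2, "Mwangi", 4), (3, "Koech", 1)],  # made-up rows
)

ALLOWED_COLUMNS = {"student_id", "last_name", "year"}

def search(**criteria):
    """Return the ids of students matching every given attribute value."""
    if not criteria or not set(criteria) <= ALLOWED_COLUMNS:
        raise ValueError("search needs at least one known column")
    # Column names are checked against a fixed list; values are bound as parameters.
    where = " AND ".join(f"{column} = ?" for column in criteria)
    sql = (
        "SELECT student_id FROM student_information "
        f"WHERE {where} ORDER BY student_id"
    )
    return [row[0] for row in conn.execute(sql, tuple(criteria.values()))]
```

For example, `search(year=4)`, `search(last_name="Koech")` and `search(last_name="Koech", year=4)` all use the same table without any field having been singled out as the key in advance.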
Searches of related attributes that are stored in different tables can be done by
linking two or more tables on any attribute they share, using a join
operation. The shared attribute need not itself be part of the relation being analysed.
The figure above illustrates how the database in Figure 3.8 could be searched to
generate a student list for a specific course. Tables 1, 2 and 3 are joined by means
of the Course-ID and Student-ID attributes. In effect Table 1 is joined to Table 2 by
Course-ID, which they have in common, while Tables 2 and 3 are linked by the
Student-ID attribute. A new table, Table 6, is created from this relational join
operation.
Table 6 is in fact a “virtual table”, i.e. it does not physically exist although it can be
queried.
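The join and the resulting virtual table can be sketched with Python's built-in sqlite3 module. The three tables below only stand in for Tables 1 to 3: their names, their columns other than the course and student identifiers, and all sample rows are assumptions. The view `class_list` plays the role of Table 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-ins for Tables 1, 2 and 3 (layouts assumed for illustration)
CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE enrolment (course_id TEXT, student_id INTEGER);
CREATE TABLE student_information (
    student_id INTEGER PRIMARY KEY, last_name TEXT, year INTEGER);

INSERT INTO course VALUES ('GIS101', 'Introduction to GIS'), ('GEO210', 'Soils');
INSERT INTO student_information VALUES (1, 'Koech', 4), (2, 'Mwangi', 2);
INSERT INTO enrolment VALUES ('GIS101', 1), ('GIS101', 2), ('GEO210', 2);

-- Stand-in for Table 6: a view is stored as a query, not as rows
CREATE VIEW class_list AS
  SELECT c.course_id, c.title, s.student_id, s.last_name
  FROM course c
  JOIN enrolment e ON e.course_id = c.course_id
  JOIN student_information s ON s.student_id = e.student_id;
""")

# The view can be queried like any other table.
gis_students = conn.execute(
    "SELECT last_name FROM class_list WHERE course_id = ? ORDER BY student_id",
    ("GIS101",),
).fetchall()
```

The enrolment table is used only to link the other two; none of its own columns needs to appear in the final list beyond the shared identifiers.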
This logical join operation gives the relational database model tremendous flexibility.
It is able to accommodate diverse queries for which it was not specifically designed.
This flexibility has made this model one of the most commonly used for storing
attribute information in GIS.