Database Concepts: University of Missouri Columbia
Introduction
Very early attempts to build GIS began from scratch, using limited tools such as operating systems and compilers.
More recently, GIS have been built around existing database management systems (DBMS):
purchase or lease of the DBMS is a major part of the systems software cost
the DBMS handles many functions which would otherwise have to be programmed into the GIS
Mixed solution
some data (usually attribute tables and relationships) are accessed through the DBMS because they fit the model well
some data (usually locational) are accessed directly because they do not fit the DBMS model
The GIS adds geographical access to existing methods of search and query.
Such systems require very fast response to a limited number of queries, with little analysis.
In these areas it is often said that GIS is a database problem rather than an algorithm, analysis, data input or data display problem.
Definition
A database is a collection of non-redundant data which can be shared by different application systems
stresses the importance of multiple applications and data sharing
the spatial database becomes a common resource for an agency
Implies separation of physical storage from use of the data by an application program, i.e. program/data independence
the user, programmer or application specialist need not know the details of how the data are stored
such details are transparent to the user
Definition (continued)
Changes can be made to data without affecting other components of the system, e.g.
change the format of data items (real to integer, arithmetic operations)
change the file structure (reorganize data internally or change the mode of access)
relocate data from one device to another, e.g. from optical to magnetic storage, from tape to disk
Avoidance of inconsistencies
data must follow prescribed models, rules, standards
Security restrictions
database includes security tools to control access, particularly for writing
CONCEPTUAL VIEW
Primary means by which the database administrator builds and manages the database
Figure: users A1 and A2 work through External View A, and users B1, B2 and B3 through External View B; the Database Management System (DBMS) maps both external views onto the single Conceptual View.
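As a minimal sketch of external views over one conceptual schema, the following uses Python's built-in sqlite3 module (an assumption; no particular DBMS is implied here). The table, view names and data are hypothetical. Each user group queries only its own view, and adding a column to the underlying table does not change what either view returns, which is program/data independence in miniature.

```python
import sqlite3

# Conceptual view: one table holding everything the agency stores (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE building (id INTEGER PRIMARY KEY, address TEXT,"
    " floors INTEGER, assessed_value REAL)"
)
conn.executemany(
    "INSERT INTO building (address, floors, assessed_value) VALUES (?, ?, ?)",
    [("12 Elm St", 3, 250000.0), ("40 Oak Ave", 10, 1200000.0)],
)

# External view A: fire-department users see addresses and floor counts only.
conn.execute("CREATE VIEW fire_view AS SELECT address, floors FROM building")
# External view B: assessor users see addresses and values only.
conn.execute("CREATE VIEW assessor_view AS SELECT address, assessed_value FROM building")

# Reorganize the stored data: add a column that neither view mentions.
conn.execute("ALTER TABLE building ADD COLUMN year_built INTEGER")

fire_rows = conn.execute("SELECT * FROM fire_view ORDER BY address").fetchall()
assessor_rows = conn.execute("SELECT * FROM assessor_view ORDER BY address").fetchall()
print(fire_rows)
print(assessor_rows)
```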
Components
Data types
integer (whole numbers only)
real (decimal)
character (alphabetic and numeric characters)
date
more advanced systems may include pictures & images as data types
Example: a database of buildings for the fire department which stores a picture as well as address, number of floors, etc.
Standard Operations
Examples: sort, delete, edit, select records
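The four standard operations map directly onto SQL statements. This is a minimal sketch using Python's sqlite3 module on a hypothetical parcel table; SQL and SQLite are assumptions chosen for illustration, not a statement about any particular GIS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcel (id INTEGER PRIMARY KEY, owner TEXT, area REAL)")
conn.executemany(
    "INSERT INTO parcel (owner, area) VALUES (?, ?)",
    [("Smith", 2.5), ("Jones", 0.8), ("Brown", 5.1)],
)

# Select: retrieve the records that satisfy a condition.
large = conn.execute(
    "SELECT owner FROM parcel WHERE area > 1.0 ORDER BY owner").fetchall()
# Sort: return records ordered by an attribute.
by_area = conn.execute("SELECT owner FROM parcel ORDER BY area DESC").fetchall()
# Edit: change an attribute value of an existing record.
conn.execute("UPDATE parcel SET owner = ? WHERE owner = ?", ("Green", "Jones"))
# Delete: remove the records that satisfy a condition.
conn.execute("DELETE FROM parcel WHERE area < 1.0")
remaining = conn.execute("SELECT owner FROM parcel ORDER BY owner").fetchall()
print(large, by_area, remaining)
```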
Components (Continued)
Data Definition Language (DDL)
The language used to describe the contents of the database
Examples: attribute names and data types; this description of the data is metadata
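A DDL statement is itself the description of the database contents. In this sketch (again assuming Python's sqlite3 module and a hypothetical table) the CREATE TABLE statement declares attribute names and types, a BLOB column holds a picture as in the fire-department example, and the stored definition is read back as metadata with SQLite's PRAGMA table_info. Note that SQLite records the declared type name but has no true DATE type; it stores dates as text or numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: declare each attribute's name and data type.
# address: character data; floors: whole numbers; height_m: decimal numbers;
# inspected: a date (SQLite keeps the name only); photo: a picture as raw bytes.
conn.execute(
    """CREATE TABLE building (
        address TEXT,
        floors INTEGER,
        height_m REAL,
        inspected DATE,
        photo BLOB
    )"""
)

# The definition is kept by the DBMS and can be queried as metadata.
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(building)")]
for name, declared_type in schema:
    print(name, declared_type)
```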
Components (Continued)
Programming tools
Besides commands and queries, the database should be accessible directly from application programs through e.g. subroutine calls
File Structures
The internal structures used to organize the data
The hierarchical, network & relational models all try to deal with the same problem with tabular data:
inability to deal with more than one type of object, or with relationships between objects
Example: a database may need to handle information on aircraft, crew, flights, and passengers: four types of records with different attributes, but with relationships between them (e.g., "is booked on" between passenger and flight)
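In the relational model, covered later, the "is booked on" relationship becomes a table of its own that pairs passenger keys with flight keys. The sketch below uses Python's sqlite3 module; the table names, flight numbers and passengers are hypothetical, and the aircraft and crew record types are left out for brevity. It shows a many-to-many relationship that a single flat file cannot express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE passenger (pid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE flight (fno TEXT PRIMARY KEY, origin TEXT, destination TEXT);
    -- "is booked on": one row per (passenger, flight) pair
    CREATE TABLE booking (
        pid INTEGER REFERENCES passenger(pid),
        fno TEXT REFERENCES flight(fno),
        PRIMARY KEY (pid, fno)
    );
    INSERT INTO passenger VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO flight VALUES ('F100', 'City A', 'City B'), ('F200', 'City B', 'City C');
    INSERT INTO booking VALUES (1, 'F100'), (1, 'F200'), (2, 'F100');
    """
)

# Passengers booked on F100 (many passengers per flight) ...
on_f100 = conn.execute(
    "SELECT p.name FROM passenger p JOIN booking b ON p.pid = b.pid"
    " WHERE b.fno = 'F100' ORDER BY p.name"
).fetchall()
# ... and flights booked by Ada (many flights per passenger).
ada_flights = conn.execute(
    "SELECT b.fno FROM booking b JOIN passenger p ON p.pid = b.pid"
    " WHERE p.name = 'Ada' ORDER BY b.fno"
).fetchall()
print(on_f100, ada_flights)
```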
Hierarchical Model
In the early 1960s, IBM saw the business world organizing data in the form of a hierarchy
Rather than one record type (flat file), a business has to deal with several types which are hierarchically related to each other
Example: company has several departments, each with attributes: name of director, number of staff, address
Each department requires several parts to make its product, with attributes: part number, number in stock
Each part may have several suppliers, with attributes: address, price
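Figure: hierarchy tree with department (D) records at the top, part (P) records under each department, and supplier (S) records under each part.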
The database keeps track of different record types, their attributes, and the hierarchical relationships between them
The attribute which assigns records to levels in the database structure is called the key
Example: Is record a department, part or supplier?
Summary of Features
A set of record types
Examples: Supplier record type, department record type, part record type
A set of links connecting all record types in one data structure diagram (tree)
At most one link between two record types, hence links need not be named
For every record, there is only one parent record at the next level up in the tree
Example: every county belongs to exactly one state, every part belongs to exactly one department
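The single-parent rule can be sketched in a few lines of Python. The record types and the allowed parent type for each come from the department/part/supplier example; the class, its methods and the attribute values are hypothetical, not the interface of any hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed parent type for each record type: one link per pair of types.
PARENT_TYPE = {"department": None, "part": "department", "supplier": "part"}


@dataclass(eq=False)
class Record:
    record_type: str  # which level of the tree the record belongs to
    attrs: dict
    parent: Optional["Record"] = field(default=None, repr=False)
    children: List["Record"] = field(default_factory=list, repr=False)

    def add_child(self, child: "Record") -> "Record":
        # The child's type must sit directly below this record's type.
        if PARENT_TYPE[child.record_type] != self.record_type:
            raise ValueError(f"no link from {self.record_type} to {child.record_type}")
        # Many children are allowed, but each record has only one parent.
        if child.parent is not None:
            raise ValueError("a record may have only one parent")
        child.parent = self
        self.children.append(child)
        return child


dept = Record("department", {"director": "Lee", "staff": 12})
part = dept.add_child(Record("part", {"part_no": "P-7", "in_stock": 40}))
supplier = part.add_child(Record("supplier", {"address": "1 Main St", "price": 3.5}))
```

A supplier who serves two parts must therefore be stored twice, once under each part.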
Data access is easy via the key attribute, but difficult for other attributes
In the business case, it is easy to find a record given its type (department, part or supplier)
In the geographical case, it is easy to find a record given its geographical level (state, county, city, census tract), but difficult to find it given any other attribute
Example: find the records with population 5,000 or less
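The difference in access cost shows up in a nested-dictionary sketch of a state/county hierarchy (names and populations are made up). A lookup by key follows one path down the tree, while a query on a non-key attribute such as population must visit every record at that level.

```python
# Hypothetical two-level hierarchy: state -> county, keyed by name at each level.
tree = {
    "State A": {"County A1": {"population": 180000}, "County A2": {"population": 2000}},
    "State B": {"County B1": {"population": 1200}, "County B2": {"population": 600000}},
}


def get_by_key(tree, state, county):
    # Key access: one dictionary lookup per level, no searching.
    return tree[state][county]


def find_by_population(tree, max_pop):
    # Non-key access: every county record must be visited and tested.
    return sorted(
        (state, county)
        for state, counties in tree.items()
        for county, rec in counties.items()
        if rec["population"] <= max_pop
    )


print(get_by_key(tree, "State B", "County B1"))
print(find_by_population(tree, 5000))
```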
Linkages cannot be defined laterally or diagonally in the tree, only vertically
The only geographical relationships which can be coded easily are "is contained in" and "belongs to"
DBMSs based on the hierarchical model (e.g., System 2000) have often been used to store spatial data, but have not been very successful as bases for GIS
Network Model
Developed in the mid-1960s as part of the work of CODASYL (Conference on Data Systems Languages), which proposed the programming language COBOL (1966) and then the network model (1971)
Other aspects of database systems also proposed at this time include database administrator, data security, audit trail
The objective of the network model is to separate data structure from physical storage and to eliminate unnecessary duplication of data, with its associated errors and costs
Example: a hospital database needs to link each patient to a doctor and also to a ward
A doctor record can own many patient records
A patient record can be owned by both a doctor record and a ward record
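A toy version of the hospital example in Python: each named set type links an owner record to its member records, and a patient can be a member in two different set types at once. The set names "treats" and "houses", the class and the record names are hypothetical; the one CODASYL-style restriction kept here is that a member has at most one owner within each set type.

```python
from collections import defaultdict


class NetworkDB:
    """Toy network-model store (hypothetical): named set types link owners to members."""

    def __init__(self):
        self.members = defaultdict(lambda: defaultdict(list))  # set -> owner -> members
        self.owner = defaultdict(dict)                         # set -> member -> owner

    def connect(self, set_type, owner, member):
        # Within one set type a member has a single owner ...
        if member in self.owner[set_type]:
            raise ValueError(f"{member} already has an owner in set {set_type!r}")
        self.members[set_type][owner].append(member)
        self.owner[set_type][member] = owner


db = NetworkDB()
db.connect("treats", "Dr. Adams", "patient 1")
db.connect("treats", "Dr. Adams", "patient 2")
# ... but the same member may also belong to another set type, with another owner.
db.connect("houses", "Ward 3", "patient 1")

print(db.members["treats"]["Dr. Adams"])  # one doctor owns many patients
print(db.owner["houses"]["patient 1"], db.owner["treats"]["patient 1"])
```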
Relational Model
The most popular DBMS model for GIS
The INFO in ARC/INFO
EMPRESS in System/9
Several GIS use ORACLE
Several PC-based GIS use dBASE III
Its flexible approach to linkages between records comes closest to modeling the complexity of spatial relationships between objects
Proposed by IBM researcher E.F. Codd (1970)
More of a concept than a data structure
Internal architecture varies substantially from one RDBMS to another
Note the potential confusion: a relation is a table of records, not a linkage between records
Examples of relations:
Unary: COURSES (Subject)
Binary: PERSONS (Name, Address), OWNER (Person name, House address)
Ternary: HOUSES (Address, Price, Size)
Non-redundancy
No attribute in the key can be discarded without destroying the key's uniqueness
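The non-redundancy condition can be checked mechanically against a table's current contents. The Python sketch below (function names and data are hypothetical) tests that a candidate key is unique and that dropping any one of its attributes destroys uniqueness; because adding attributes never breaks uniqueness, testing each single-attribute removal is enough. It checks only the rows at hand; whether a key holds for all future rows is a rule the designer must declare.

```python
def is_unique(rows, attrs):
    """True if no two rows share the same values for `attrs`."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))


def is_minimal_key(rows, key):
    """True if `key` is unique and no attribute can be dropped without losing that."""
    if not is_unique(rows, key):
        return False
    return all(not is_unique(rows, [a for a in key if a != dropped]) for dropped in key)


homes = [
    {"address": "1 A St", "owner": "Smith", "price": 100},
    {"address": "2 B St", "owner": "Smith", "price": 120},
    {"address": "3 C St", "owner": "Jones", "price": 100},
]
print(is_minimal_key(homes, ["address"]))           # True: unique, nothing to drop
print(is_minimal_key(homes, ["address", "owner"]))  # False: owner is redundant
```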
A relational join is the reverse of this normalization process, where the two relations HOMES2 and COST are combined to form HOMES1
Example
Given two relations:
PROPERTY (ADDRESS, VALUE, COUNTY_ID) COUNTY (COUNTY_ID, NAME, TAX_RATE)
To answer the query "what are the taxes on property x?" the user would:
Retrieve the property record
Link the property and county records through the common attribute COUNTY_ID
Compute the taxes by multiplying VALUE from the property tuple by TAX_RATE from the linked county tuple
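The three steps correspond to a single join in SQL. Here is a minimal sketch with Python's sqlite3 module, using the PROPERTY and COUNTY relations above; the addresses, values and tax rates are hypothetical, and the helper name `property_tax` is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PROPERTY (ADDRESS TEXT PRIMARY KEY, VALUE REAL, COUNTY_ID INTEGER);
    CREATE TABLE COUNTY (COUNTY_ID INTEGER PRIMARY KEY, NAME TEXT, TAX_RATE REAL);
    INSERT INTO COUNTY VALUES (1, 'County A', 0.01), (2, 'County B', 0.02);
    INSERT INTO PROPERTY VALUES ('10 First St', 150000, 1), ('20 Second St', 90000, 2);
    """
)


def property_tax(conn, address):
    # Retrieve the property, link it to its county through COUNTY_ID,
    # and multiply VALUE by TAX_RATE, all in one query.
    row = conn.execute(
        """SELECT p.VALUE * c.TAX_RATE
           FROM PROPERTY p JOIN COUNTY c ON p.COUNTY_ID = c.COUNTY_ID
           WHERE p.ADDRESS = ?""",
        (address,),
    ).fetchone()
    return None if row is None else row[0]


print(property_tax(conn, "10 First St"), property_tax(conn, "20 Second St"))
```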
Setting up and maintaining a spatial database requires careful planning and attention to numerous issues
Many GIS were developed for a research environment of small databases
Many database issues, such as security, were not considered important in many early GIS
As a result, such systems find it difficult to grow into an environment of large, production-oriented systems
Very few database systems have been able to handle textual data
Example: descriptions of soils in the legend of a soil map can run to hundreds of words
Example: in surveying, metes and bounds descriptions are as important as numerical data in defining property lines
Data Security
Many systems for small computers, and systems specializing in geometric and geographical data, do not provide functionality necessary to maintain data integrity over long periods of time.
Integrity Constraints
Integrity constraints: rules which the database must obey in order to be meaningful
Attribute values must lie within prescribed domains
Relationships between objects must not conflict
Example: the "flows into" relationship between river segments must agree with the "is fed by" relationship
Locational data must not violate rules of planar enforcement, contours must not cross each other, etc.
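Two of these rules can be expressed as simple checks. The Python sketch below is illustrative only: it assumes each river segment flows into at most one downstream segment, and the function names and segment ids are hypothetical. It reports values outside a domain and every pair of segments on which "flows into" and "is fed by" disagree.

```python
def out_of_domain(values, low, high):
    """Return the values that fall outside the allowed range [low, high]."""
    return [v for v in values if not low <= v <= high]


def flow_conflicts(flows_into, is_fed_by):
    """Pairs (upstream, downstream) on which the two relationships disagree.

    flows_into: segment -> the single segment it drains into (an assumption)
    is_fed_by:  segment -> set of segments that drain into it
    """
    conflicts = []
    for up, down in flows_into.items():
        if up not in is_fed_by.get(down, set()):
            conflicts.append((up, down))  # recorded as "flows into" only
    for down, ups in is_fed_by.items():
        for up in sorted(ups):
            if flows_into.get(up) != down:
                conflicts.append((up, down))  # recorded as "is fed by" only
    return conflicts


print(out_of_domain([3, -1, 12], 0, 10))
print(flow_conflicts({"r1": "r3", "r2": "r3"}, {"r3": {"r1"}, "r4": {"r2"}}))
```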
Transactions
Transactions may include:
Modifications to individual data items
Addition or deletion of entire records
Addition or deletion of attributes
Changes in schema (external views of the database)
Example: addition of new tables or relations, redefinition of access keys
All of the updates or modifications made by a user are temporary until confirmed
The system checks integrity before permanently modifying the database (posting the changes to the database)
Updates and changes can be abandoned at any time prior to final confirmation
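Python's sqlite3 module (assumed here purely for illustration) shows the all-or-nothing behavior. Used as a context manager, the connection commits when the block succeeds and rolls back when an exception escapes. A CHECK constraint on a hypothetical parcel table plays the role of an integrity rule; SQLite checks it as each statement runs rather than at final confirmation, but the effect is the same: if any update in the group fails, none of them is posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause is an integrity rule enforced on every change to the table.
conn.execute("CREATE TABLE parcel (id INTEGER PRIMARY KEY, area REAL CHECK (area > 0))")
conn.execute("INSERT INTO parcel VALUES (1, 2.5)")
conn.commit()


def apply_updates(conn, updates):
    """Post all (id, new_area) updates together, or none of them."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            for pid, area in updates:
                conn.execute("UPDATE parcel SET area = ? WHERE id = ?", (area, pid))
        return True
    except sqlite3.IntegrityError:
        return False


def area_of(conn, pid):
    return conn.execute("SELECT area FROM parcel WHERE id = ?", (pid,)).fetchone()[0]


# The second update breaks the rule, so the first one is abandoned too.
print(apply_updates(conn, [(1, 3.0), (1, -1.0)]), area_of(conn, 1))
print(apply_updates(conn, [(1, 4.0)]), area_of(conn, 1))
```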
Concurrent Users
In many cases more than one user will need to access the database at any one time
This is a major advantage of multi-user systems & networks
However, if the database is being modified by several users at once, it is easy for integrity constraints to be violated unless adequate preventative measures exist
Protected: Any application may retrieve data, but only one may modify it
Example: User B should be able to query the status of fire trucks even after user A has placed a hold on one
Check-out/check-in
In GIS applications, digitizing and updating spatial objects may require intensive work on one part of the database for long periods of time
Example: a digitizer operator may spend an entire shift working on one map sheet
Work will likely be done on a workstation operating independently of the main database
Check-out/Check-in (continued)
At the beginning of a shift, the operator checks out an area from the database
At the end of work, the same area is checked in, modifying and updating the database
While an area is checked out, it should be locked by the main database
This allows other users to read the data, but not to check it out themselves for modification
This prevents conflicting updates such as the one in the following example
Check-out/Check-in (continued)
Example:
User A checks out a sheet at 8:00 a.m. and starts updating
User B checks out the same sheet at 9:00 a.m. and starts a different set of updates from the same base
If both are subsequently allowed to check the sheet back in, the second check-in may try to modify an object which no longer exists
The area is unlocked when the new version is checked in and modifies the database
The time required for check-out and check-in must be no more than a small part of a shift
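The check-out/check-in protocol can be sketched as a small lock table in Python. The class, method names and sheet contents are hypothetical, and the sketch ignores networking and persistence; it shows only the rules above: anyone may read a checked-out sheet, nobody else may check it out, and the lock is released when the holder checks in the new version.

```python
class SheetStore:
    """Toy check-out/check-in manager for map sheets (hypothetical design)."""

    def __init__(self):
        self.sheets = {}  # sheet id -> current contents
        self.locks = {}   # sheet id -> user who has it checked out

    def read(self, sheet):
        # Reading is allowed even while the sheet is checked out.
        return self.sheets.get(sheet)

    def check_out(self, sheet, user):
        if sheet in self.locks:
            raise RuntimeError(f"{sheet} is checked out by {self.locks[sheet]}")
        self.locks[sheet] = user
        return dict(self.sheets.get(sheet, {}))  # working copy for the workstation

    def check_in(self, sheet, user, contents):
        if self.locks.get(sheet) != user:
            raise RuntimeError(f"{user} has not checked out {sheet}")
        self.sheets[sheet] = contents  # post the new version ...
        del self.locks[sheet]          # ... and unlock the area


store = SheetStore()
store.sheets["sheet 12"] = {"road 5": "unpaved"}
work = store.check_out("sheet 12", "A")  # 8:00 a.m.
try:
    store.check_out("sheet 12", "B")     # 9:00 a.m.: refused
except RuntimeError as err:
    print(err)
print(store.read("sheet 12"))            # B can still read the sheet
work["road 5"] = "paved"
store.check_in("sheet 12", "A", work)
print(store.read("sheet 12"))
```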
Operations to protect against loss may be expensive, but the cost can be balanced against the value of the database
Because of the consequences of data loss in some areas (air traffic control, bank accounts), very secure systems have been devised
Loss of storage medium, due to operating or hardware defects (head crashes), or interruption during transaction processing
These occur much less often, so slower recovery is acceptable
The database is regenerated from the most recent backup, plus the transaction log if available
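Regeneration from a backup plus a transaction log amounts to replaying, in order, every committed change recorded since the backup was taken. The sketch below models the database as a Python dictionary and the log as a list of (operation, key, value) entries; this log format and the function name are hypothetical simplifications of what a real DBMS records.

```python
import copy


def recover(backup, log):
    """Rebuild the database from a backup plus the log of changes made since.

    backup: dict of record id -> record, as saved at backup time
    log:    list of (operation, record id, record) in the order they committed
    """
    db = copy.deepcopy(backup)  # never modify the backup itself
    for op, key, value in log:
        if op == "put":
            db[key] = value
        elif op == "delete":
            db.pop(key, None)
        else:
            raise ValueError(f"unknown log operation: {op!r}")
    return db


backup = {"p1": 100, "p2": 200}
log = [("put", "p3", 300), ("delete", "p1", None), ("put", "p2", 250)]
print(recover(backup, log))
```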
Unauthorized Use
Some GIS data is confidential or secret
Examples: tax records, customer lists, retail store performance data
The flexibility and complexity of many GIS applications often make it difficult to provide adequate security