CAPE NOTES Unit 2 Module 1 Database Management-1
Specific Objective 4: Explain the advantages of using a database approach compared to using traditional
file processing;
Content: Advantages including speed, efficiency, cost; data quality: completeness, validity, consistency,
timeliness and accuracy; data handling, data processing.
A traditional file-based system is one in which data is stored in separate files and handled manually or through application programs: updating, inserting, deleting records, or adding new files to the system are carried out file by file. Among its weaknesses are data redundancy, inconsistency and poor integrity.
http://wiki.answers.com/Q/What_is_a_traditional_File-Based_System
What is a Database?
A Computer Database is a structured collection of records or data that is stored in a computer system. The
structure is achieved by organizing the data according to a database model. The model in most common
use today is the relational model. Other models such as the hierarchical model and the network model
use a more explicit representation of relationships (see below for explanation of the various database
models).
A computer database relies upon software to organize the storage of data. This software is known as a
database management system (DBMS). Database management systems are categorized according to the
database model that they support. The model tends to determine the query languages that are available
to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model.
A Database Management System (DBMS) is computer software designed for the purpose of managing
databases based on a variety of data models.
http://en.wikipedia.org/wiki/Database_management_system
If some major changes were to be made to the data, the application programs may need to be rewritten. In
a database system, the database management system provides the interface between the application
programs and the data. When changes are made to the data representation, the metadata maintained by the
DBMS is changed but the DBMS continues to provide data to application programs in the previously used
way. The DBMS handles the task of transformation of data wherever necessary.
This independence between the programs and the data is called data independence. Data independence is
important because every time some change needs to be made to the data structure, the programs that were
being used before the change would continue to work. To provide a high degree of data independence, a
DBMS must include a sophisticated metadata management system.
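The idea of data independence can be sketched in a few lines of code: application programs call a stable interface, and only the DBMS layer knows how records are physically stored. (The class and method names below are invented for illustration, not from any real DBMS.)

```python
# Minimal sketch of data independence: the application only knows the
# DBMS interface; the physical storage layout can change underneath it.

class SimpleDBMS:
    """Illustrative stand-in for a DBMS that hides storage details."""
    def __init__(self):
        # Physical representation (could change to a file, a B-tree, etc.)
        self._storage = {}          # key -> record dict

    def insert(self, key, record):
        self._storage[key] = dict(record)

    def fetch(self, key):
        # If the physical layout changed, only this method would change;
        # application programs keep calling fetch() unchanged.
        return self._storage.get(key)

db = SimpleDBMS()
db.insert("E001", {"name": "A. Singh", "salary": 40000})
print(db.fetch("E001")["salary"])   # the application never sees _storage
```

If `_storage` were replaced by an indexed file on disk, `insert` and `fetch` would be rewritten internally, but every program using them would continue to work; this is the independence between programs and data described above.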
In DBMS, all files are integrated into one system thus reducing redundancies and making data
management more efficient. In addition, DBMS provides centralized control of the operational data.
Some of the advantages of data independence, integration and centralized control are:
In conventional data systems, an organization often builds a collection of application programs often
created by different programmers and requiring different components of the operational data of the
organization. The data in Conventional Data Systems (Traditional File Management System) is often
not centralised. Some applications may require data to be combined from several systems. These
several systems could well have data that is redundant as well as inconsistent (that is, different copies
of the same data may have different values). Data inconsistencies are often encountered in everyday
life. For example, we have all come across situations when a new address is communicated to an
organization that we deal with (e.g. a bank, or Telecom, or a gas company), we find that some of the
communications from that organization are received at the new address while others continue to be
mailed to the old address. Combining all the data in a database would involve reduction in redundancy
as well as inconsistency. It also is likely to reduce the costs for collection, storage and updating of
data.
A DBMS is often used to provide better service to the users. In conventional systems, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Combining several conventional systems into one centralized database improves the availability of information considerably.
Centralizing the data in a database also often means that users can obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow
users that do not know programming to interact with the data more easily.
The ability to quickly obtain new and combined information is becoming increasingly important in an
environment where various levels of governments are requiring organizations to provide more and
more information about their activities. An organization running a conventional data processing
system would require new programs to be written (or the information compiled manually) to meet
every new demand.
Changes are often necessary to the contents of data stored in any system. These changes are more
easily made in a database than in a conventional system in that these changes do not need to have any
impact on application programs.
It is much easier to respond to unforeseen requests when the data is centralized in a database than
when it is stored in conventional file systems. Although the initial cost of setting up a database can be large, the overall cost of setting up a database and developing and maintaining application programs is normally expected to be lower than for a similar service using conventional systems, since programmer productivity can be substantially higher with the non-procedural languages developed for modern DBMSs than with procedural languages.
Since all access to the database must be through the DBMS, standards are easier to enforce. Standards
may relate to the naming of the data, the format of the data, the structure of the data etc. This might
not be so when using Traditional File Storage Systems.
In conventional systems, applications are developed in an ad hoc manner. Often different systems of
an organization would access different components of the operational data. In such an environment,
enforcing security can be quite difficult.
Setting up of a database makes it easier to enforce security restrictions since the data is now
centralized. It is easier to control who has access to which parts of the database. However, setting up a
database can also make it easier for a determined person to breach security. We will discuss this in the
next section.
Integrity may be compromised in many ways. For example, someone may make a mistake in data
input and the salary of a full-time employee may be input as $4,000 rather than $40,000. A student
may be shown to have borrowed books but has no enrolment. Salary of a staff member in one
department may be coming out of the budget of another department.
If a number of users are allowed to update the same data item at the same time, there is a possibility
that the result of the updates is not quite what was intended. For example, in an airline DBMS we
could have a situation where the number of bookings made is larger than the capacity of the aircraft
that is to be used for the flight. Controls therefore must be introduced to prevent such errors from
because of concurrent updating activities. However, since all data is stored only once, it is often easier
to maintain integrity than in conventional systems.
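The overbooking problem above is the classic lost-update anomaly. A minimal sketch of the control a DBMS applies is to serialize the check-then-book step with a lock (the flight capacity and thread counts here are invented for illustration):

```python
import threading

# Sketch of concurrency control: a lock makes the check-then-book step
# atomic, so two concurrent bookings can never both take the last seat.
capacity = 100
booked = 0
lock = threading.Lock()

def book_seat():
    global booked
    with lock:                      # exclusive access to the shared count
        if booked < capacity:
            booked += 1
            return True
        return False                # flight full: booking rejected

# 150 concurrent booking attempts against 100 seats
threads = [threading.Thread(target=book_seat) for _ in range(150)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(booked)   # prints 100 -- never exceeds capacity
```

Without the lock, two threads could both observe `booked < capacity` and both increment, which is exactly how the number of bookings can exceed the aircraft's capacity.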
All enterprises have sections and departments and each of these units often consider the work of their
unit as the most important and therefore consider their needs as the most important. Once a database
has been set up with centralized control, it will be necessary to identify enterprise requirements and to
balance the needs of competing units. It may become necessary to ignore some requests for
information if they conflict with higher priority needs of the enterprise.
Perhaps the most important advantage of setting up a database system is the requirement that an
overall data model for the enterprise be built. In conventional systems, it is more likely that files will
be designed as needs of particular applications demand. The overall view is often not considered.
Building an overall view of the enterprise data, although often an expensive exercise, is usually very cost-effective in the long term.
http://www.cs.jcu.edu.au/Subjects/cp1500/1998/Lecture_Notes/databases/dbms_adv.html
Databases today allow users to retrieve information stored across many database tables by creating relationships between tables, and allow multiple sets of criteria to be used to search fields and extract the results. Compared with conventional filing systems, databases allow quicker updating of multiple records matching a criterion or set of criteria, through the use of update queries.
Speed: Databases allow records to be retrieved at high speed, allowing for faster information retrieval. A database query can combine several criteria, executed together (compulsory criteria selection) or separately (hierarchical criteria selection). Such queries can extract data from several huge tables consisting of thousands of records and quickly return the specific data required in order to make decisions.
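Extracting the records that satisfy several criteria at once can be sketched as a simple filter over a table of records (the sample table and field names below are made up for illustration):

```python
# Sketch of a multi-criteria query over a small "table" of records.
cars = [
    {"model": "Toyota", "year": 2019, "price": 15000},
    {"model": "Honda",  "year": 2015, "price": 8000},
    {"model": "Ford",   "year": 2020, "price": 22000},
    {"model": "Toyota", "year": 2012, "price": 6000},
]

def query(table, **criteria):
    """Return records matching ALL criteria (compulsory selection)."""
    return [row for row in table
            if all(row[field] == value for field, value in criteria.items())]

recent = [row for row in cars if row["year"] >= 2019]   # range criterion
toyotas = query(cars, model="Toyota")
print(len(recent), len(toyotas))    # prints 2 2
```

A real DBMS evaluates such criteria with indexes rather than a full scan, which is where the speed advantage over a conventional filing system comes from.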
Completeness:
Consistency:
With databases, changes made in the main tables are automatically reflected across all queries and reports, and a database allows for greater consistency in data entry. The use of a drop list lets you enter the same group of data more accurately (e.g. Model of Cars – Toyota, Suzuki, Ford, Honda, BMW, Mercedes, etc.).
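The drop-list idea is really a domain constraint: entries are accepted only if they come from a fixed set of allowed values. A short sketch, using the car models mentioned above:

```python
# Sketch of the consistency benefit of a drop list: data entry is
# restricted to a fixed set of allowed values, so "Toyota" can never
# be stored as "toyta" or "TOYOTA" by mistake.
ALLOWED_MODELS = {"Toyota", "Suzuki", "Ford", "Honda", "BMW", "Mercedes"}

def enter_model(value):
    if value not in ALLOWED_MODELS:
        raise ValueError(f"{value!r} is not in the drop list")
    return value

print(enter_model("Toyota"))        # accepted
# enter_model("toyta") would raise ValueError, protecting consistency
```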
Additional discussion:
Disadvantages of Databases
A database system generally provides on-line access to the database for many users. In contrast, a
conventional system is often designed to meet a specific need and therefore generally provides access to
only a small number of users. Because of the larger number of users accessing the data when a database is
used, the enterprise may involve additional risks as compared to a conventional data processing system in
the following areas.
1. Confidentiality, privacy and security – because of the high concern over these factors and the threat posed by hackers and other harmful users, a large amount of money is allocated to cover the costs associated with maintaining databases so that they provide these protections. Owing to the high cost of database maintenance, the cost is at times passed on to the customer or included in the price of the service or product.
2. Data quality – deciding who has the right to maintain the data and to periodically clean the database can pose a significant challenge. Data-cleansing software tools may have to be run regularly to ensure that the quality of the stored data is maintained and improved.
3. Data integrity – managers are not always able to ensure that the data stored is accurate and properly maintained. When the updating capacity of a DBMS is in question, the integrity of the data generated or stored is also in question.
4. Enterprise vulnerability may be higher – databases are normally high-cost investments for organizations such as banks, insurance companies, cable companies, weather monitoring stations, etc. There is no guarantee that an attack on a company's database would not severely affect its operations, and such an attack may even leave the company unable to operate.
5. The cost of using a DBMS – databases are used not just to collect, store and run queries; they are also relied upon to keep the company operating. Other costs to be considered include security, management controls and concurrency control (record, file and table locking – shared and exclusive locks on database records, tables, etc.).
When information is centralised and is made available to users from remote locations, the possibilities of
abuse are often more than in a conventional data processing system. To reduce the chances of
unauthorised users accessing sensitive information, it is necessary to take technical, administrative and,
possibly, legal measures.
Most databases store valuable information that must be protected against deliberate trespass and
destruction.
Data Quality
Since the database is accessible to users remotely, adequate controls are needed to control users updating
data and to control data quality. With increased number of users accessing data directly, there are
enormous opportunities for users to damage the data. Unless there are suitable controls, the data quality
may be compromised.
Data Integrity
Since a large number of users could be using a database concurrently, technical safeguards are necessary
to ensure that the data remain correct during operation. The main threat to data integrity comes from
several different users attempting to update the same data at the same time. The database therefore needs
to be protected against inadvertent changes by the users.
Enterprise Vulnerability
Centralising all data of an enterprise in one database may mean that the database becomes an indispensable
resource. The survival of the enterprise may depend on reliable information being available from its
database. The enterprise therefore becomes vulnerable to the destruction of the database or to unauthorised
modification of the database.
Conventional data processing systems are typically designed to run a number of well-defined, preplanned
processes. Such systems are often "tuned" to run efficiently for the processes that they were designed for.
Additional Issues:
Centralization: concurrent use of the same program by many users can sometimes lead to loss of some data; there is also the high cost of software.
Additional Content:
Although the conventional systems are usually fairly inflexible in that new applications may be difficult to
implement and/or expensive to run, they are usually very efficient for the applications they are designed
for.
We now discuss a conceptual framework for a DBMS. Several different frameworks have been suggested
over the last several years. For example, a framework may be developed based on the functions that the
various components of a DBMS must provide to its users. It may also be based on different views of data
that are possible within a DBMS. We consider the latter approach.
A commonly used view of data approach is the three-level architecture suggested by ANSI/SPARC
(American National Standards Institute/Standards Planning and Requirements Committee). ANSI/SPARC
produced an interim report in 1972 followed by a final report in 1977. The reports proposed an
architectural framework for databases. Under this approach, a database is considered as containing data
about an enterprise. The three levels of the architecture are three different views of the data: the External View (the individual user's view), the Conceptual View (the community view of the whole enterprise) and the Internal View (the physical storage view).
The three level database architecture allows a clear separation of the information meaning (conceptual
view) from the external data representation and from the physical data structure layout. A database system
that is able to separate the three different views of data is likely to be flexible and adaptable. This
flexibility and adaptability is data independence that we have discussed earlier.
The External Level is the view that the individual user of the database has. This view is often a restricted
view of the database and the same database may provide a number of different views for different classes
of users. In general, the end users and even the applications programmers are only interested in a subset of
the database. For example, a department head may only be interested in the departmental finances and
student enrolments but not the library information. The librarian would not be expected to have any
interest in the information about academic staff. The payroll office would have no interest in student
enrolments.
The Conceptual View is the information model of the enterprise and contains the view of the whole
enterprise without any concern for the physical implementation. This view is normally more stable than
the other two views. In a database, it may be desirable to change the internal view to improve performance
while there has been no change in the conceptual view of the database. The conceptual view is the overall
community view of the database and it includes all the information that is going to be represented in the
database. The conceptual view is defined by the conceptual schema which includes definitions of each of
the various types of data.
The Internal View is concerned with how the data is physically stored. Efficiency considerations are the most important at this level, and the data structures are chosen to provide
an efficient database. The internal view does not deal with the physical devices directly. Instead it views a
physical device as a collection of physical pages and allocates space in terms of logical pages.
The separation of the conceptual view from the internal view enables us to provide a logical description of
the database without the need to specify physical structures. This is often called physical data
independence. Separating the external views from the conceptual view enables us to change the
conceptual view without affecting the external views. This separation is sometimes called logical data
independence.
Assuming the three level view of the database, a number of mappings are needed to enable the users
working with one of the external views. For example, the payroll office may have an external view of the
database that consists of payroll-related information only (for example, each staff member and the applicable salary level).
The Conceptual View of the database may contain academic staff, general staff, casual staff etc. A
mapping will need to be created where all the staff in the different categories are combined into one
category for the payroll office. The conceptual view would include information about each staff's position,
the date employment started, full-time or part-time, etc. This will need to be mapped to the salary level for
the salary office. Also, if there is some change in the conceptual view, the external view can stay the same
if the mapping is changed.
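The payroll mapping described above can be sketched as a function that merges the conceptual staff categories into the single combined view the payroll office sees. (All record contents and field names below are invented for illustration.)

```python
# Sketch of an external-view mapping: the conceptual view keeps staff in
# separate categories; the payroll external view sees one merged list
# exposing only the fields it needs.
conceptual = {
    "academic": [{"name": "Dr. Rao",  "salary_level": 7, "dept": "CS"}],
    "general":  [{"name": "M. James", "salary_level": 3, "dept": "Admin"}],
    "casual":   [{"name": "K. Lee",   "salary_level": 1, "dept": "Library"}],
}

def payroll_view(db):
    """Merge all staff categories; expose only name and salary level."""
    return [{"name": s["name"], "salary_level": s["salary_level"]}
            for category in db.values() for s in category]

view = payroll_view(conceptual)
print(len(view))            # 3 staff members, one combined category
```

If a new staff category were added to the conceptual view, only `payroll_view` (the mapping) would change; the shape of the external view the payroll office works with would stay the same, which is the logical data independence discussed above.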
The database will be able to meet the demands of various users in the organization effectively only if it is
maintained and managed properly. Usually a person (or a group of persons) centrally located, with an
overall view of the database, is needed to keep the database running smoothly. Such a person is called the
Database Administrator (DBA).
The DBA would normally have a large number of tasks related to maintaining and managing the database.
These tasks would include the following:
1. Deciding and Loading the Database Contents - The DBA in consultation with senior management is
normally responsible for defining the conceptual schema of the database. The DBA would also be
responsible for making changes to the conceptual schema of the database if and when necessary.
2. Assisting and Approving Applications and Access - The DBA would normally provide assistance to
end-users interested in writing application programs to access the database. The DBA would also
approve or disapprove access to the various parts of the database by different users.
3. Deciding Data Structures - Once the database contents have been decided, the DBA would normally
make decisions regarding how data is to be stored and what indexes need to be maintained. In
addition, a DBA normally monitors the performance of the DBMS and makes changes to data
structures if the performance justifies them. In some cases, radical changes to the data structures may
be called for.
4. Backup and Recovery - Since the database is such a valuable asset, the DBA must make all the efforts
possible to ensure that the asset is not damaged or lost. This normally requires a DBA to ensure that
regular backups of a database are carried out and in case of failure (or some other disaster like fire or
flood), suitable recovery procedures are used to bring the database up with as little down time as
possible.
5. Monitor Actual Usage - The DBA monitors actual usage to ensure that policies laid down regarding
use of the database are being followed. The usage information is also used for performance tuning.
http://www.cs.jcu.edu.au/Subjects/cp1500/1998/Lecture_Notes/databases/
References:
1. J. P. Fry and E. H. Sibley (1976), "Evolution of Data-Base Management Systems", ACM Computing
Surveys, 8, March 1976, pp. 7-42.
2. D. A. Jardine, ed. (1977), "The ANSI/SPARC DBMS Model", North-Holland.
3. D. Tsichritzis and A. Klug, Eds. (1978), "The ANSI/X3/SPARC Framework", Info. Systems, Vol 3,
No 3, 1978.
Specific Objective 5: describe the different types and organization of files and databases;
Content: File types including master and transaction files; file organization including serial, sequential, random or direct, indexed sequential; database types including personal, workgroup, department and enterprise databases; database organization including hierarchical, relational, network and object-oriented.
FILE TYPES
Master File
A master file or master table contains a group of common, relatively permanent records. Item Data, Customer Data, and Supplier Data are examples of master tables.
www.ssgweb.com/CourseTerms.cfm
Contains information about an organization’s business situation. Most transactions and databases are
stored in the master file. www.nationmaster.com/encyclopedia/Transaction-Processing-System
File maintained by the Contractor that contains all essential account information.
www.gsa.gov/gsa/cm_attachments/GSA_BASIC/GLOSSARY_R2-v-h8-u_0Z5RDZ-i34K-pR.doc
A file of relatively permanent data or information that is updated periodically.
highered.mcgraw-hill.com/sites/0073010847/student_view0/chapter8/key_terms.html
A file of data which is the principal source of information for a job which is updated or amended as
necessary. www.nursing.bcs.org/inftouch/vol1/glossv1.html
(computer science) a computer file that is used as the authority in a given job and that is relatively
permanent wordnet.princeton.edu/perl/webwn
[Diagram: Old Master File + Transaction File → UPDATE → New Master File]
Transaction File
It is the collection of transaction records. It helps to update the master file and also serves as audit
trails and transaction history. en.wikipedia.org/wiki/Transaction_Processing_System
(computer science) a computer file containing relatively transient data about a particular data
processing task wordnet.princeton.edu/perl/webwn
A collection of transaction records. The data in transaction files is used to update the master files, which contain the data about the subjects of the organization (customers, employees, vendors, etc.). Transaction files also serve as audit trails and history for the organization.
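The classic batch update pictured above (old master file + transaction file → new master file) can be sketched in a few lines. The account numbers and amounts are invented for illustration:

```python
# Sketch of a batch master-file update: the transactions are applied
# to the old master file to produce a new master file, leaving the old
# master untouched as an audit trail.
old_master = {"A01": 500, "A02": 300, "A03": 120}      # account -> balance
transactions = [("A01", +50), ("A03", -20), ("A01", -10)]

def update_master(master, txns):
    new_master = dict(master)           # old master is kept unchanged
    for key, amount in txns:
        new_master[key] = new_master.get(key, 0) + amount
    return new_master

new_master = update_master(old_master, transactions)
print(new_master["A01"], new_master["A03"])    # prints 540 100
```

Keeping the old master and the transaction file around is what provides the audit trail: the new master can always be reconstructed from them.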
FILE ORGANIZATION
Serial
Serial access means starting at the beginning of the file and accessing each record in turn until the one needed is found.
If files are stored on magnetic tape then serial access is the only method of access.
Sequential
A group of elements (e.g. data in a memory array or a disk file or on a tape) is accessed in a
predetermined, ordered sequence. Sequential access is sometimes the only way of accessing the data, for
example if it is on a tape. It may also be the access method of choice, for example if we simply want to
process a sequence of data elements in order.
Random or Direct
The computer can calculate (from the key field) where the record is stored in the file, and can then access the record directly from that position. Direct access can only be used if files are stored on media such as disk, CD or DVD. Direct access of records will generally be much faster than serial access. Direct access is also known as random access.
Useful Link:
http://www.theteacher99.btinternet.co.uk/theteacher/gcse/newgcse/others/file_gens.htm
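Computing a record's position from its key field is essentially hashing. A minimal sketch (the bucket count and customer keys are invented for illustration):

```python
# Sketch of direct (random) access: the record's position is computed
# from the key field, so no scan through earlier records is needed.
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def position(key):
    return hash(key) % NUM_BUCKETS      # address computed from the key

def store(key, record):
    buckets[position(key)].append((key, record))

def fetch(key):
    for k, record in buckets[position(key)]:  # only one bucket searched
        if k == key:
            return record
    return None

store("CUST042", {"name": "R. Baptiste"})
print(fetch("CUST042")["name"])
```

This is why direct access requires disk-like media: the device must be able to jump straight to the computed position, which a tape cannot do.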
Indexed Sequential
Indexed Sequential files are important for applications where data needs to be accessed both sequentially (for example, when the whole of the file is processed to produce pay slips at the end of the month) and randomly via the index (for example, to look up a single employee's record). An Indexed Sequential file can only be stored on a random access device, e.g. magnetic disc or CD.
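An indexed sequential file can be sketched as records kept in key order plus an index mapping each key to its position, supporting both a full sequential pass and direct lookup (the employee records below are invented):

```python
# Sketch of an indexed sequential file: records are kept in key order
# (good for end-of-month sequential runs) and an index allows jumping
# straight to an individual record.
records = [("E01", 2100), ("E02", 1800), ("E03", 2500)]   # sorted by key
index = {key: pos for pos, (key, _) in enumerate(records)}

# Sequential processing: e.g. produce all pay slips in order.
total_pay = sum(pay for _, pay in records)

# Random access via the index: fetch one employee directly.
pay_for_e02 = records[index["E02"]][1]
print(total_pay, pay_for_e02)       # prints 6400 1800
```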
DATABASE TYPES
Personal – used for recording one's own information for one's own access. A database created to store phone contacts and friends' addresses would be a good example.
Workgroup – used by teams working on tasks that need access to the same data. For example, the building team of a major works project might have contractors, builders and engineers using the same database to update each other on the status of their work.
Department – used by a unit in an organization which provides some functional application of the data in
the department’s domain. The marketing department of an insurance company could have a database of
possible clients who can be contacted to discuss possible choices of policy that are offered.
Enterprise – centralized data that is shared by many users throughout the organization. An example is the information on customers' accounts shared across all branches of a bank, which allows any branch to post a transaction to the customer file and eventually update the master file, stored in a centralised or distributed manner.
DATABASE ORGANIZATION
Hierarchical
In a hierarchical data model, data is organized into a tree-like structure. The structure allows repeating
information using parent/child relationships: each parent can have many children but each child only has
one parent. All attributes of a specific record are listed under an entity type. In a database, an entity type is
the equivalent of a table; each individual record is represented as a row and an attribute as a column.
Entity types are related to each other using 1: N mapping, also known as one-to-many relationships.
http://en.wikipedia.org/wiki/Hierarchical_model
An example of a hierarchical data model would be if an organization had records of employees in a table
(entity type) called "Employees". In the table there would be attributes/columns such as First Name, Last
Name, Job Name and Wage. The company also has data about the employee’s children in a separate table
called "Children" with attributes such as First Name, Last Name, and date of birth. The Employee table
represents a parent segment and the Children table represents a Child segment. These two segments form a
hierarchy where an employee may have many children, but each child may only have one parent.
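The Employees/Children hierarchy above can be sketched as a tree in which each parent record owns its child records (the names and dates are invented):

```python
# Sketch of a hierarchical (tree) model: each parent segment owns its
# child segments, and a child has exactly one parent.
employees = [
    {"first": "Ann", "last": "Chen", "job": "Clerk", "wage": 30000,
     "children": [{"first": "Bo", "last": "Chen", "dob": "2010-05-01"}]},
    {"first": "Raj", "last": "Patel", "job": "Manager", "wage": 55000,
     "children": []},   # an employee may have zero or many children
]

# Navigating the hierarchy: children are reached only via their parent.
anns_children = employees[0]["children"]
print(len(anns_children), anns_children[0]["first"])    # prints 1 Bo
```

Note the 1:N constraint: a child record lives inside exactly one parent, so a query like "find all children born in 2010" must walk down through every employee.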
Relational
The relational model for database management is a database model based on first-order predicate
logic, first formulated and proposed in 1969 by Edgar Codd.
Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables,
describing constraints on the possible values and combinations of values. The content of the database at
any given time is a finite model (logic) of the database, i.e. a set of relations, one per predicate variable,
such that all predicates are satisfied. A request for information from the database (a database query) is
also a predicate.
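Codd's idea can be sketched directly: a relation is a set of tuples, and a query is a predicate that selects the tuples satisfying it. (The sample relation below is invented for illustration.)

```python
# Sketch of the relational idea: a relation is a set of tuples, and a
# query is a predicate selecting the tuples that satisfy it.
# Relation: employee(name, dept, salary)
employee = {
    ("Ann", "CS", 40000),
    ("Raj", "Math", 35000),
    ("Lee", "CS", 45000),
}

def select(relation, predicate):
    """Return the subrelation whose tuples satisfy the predicate."""
    return {t for t in relation if predicate(t)}

cs_staff = select(employee, lambda t: t[1] == "CS")
print(len(cs_staff))        # prints 2
```

In SQL terms this is `SELECT * FROM employee WHERE dept = 'CS'`: the WHERE clause is exactly the predicate.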
Object-oriented
One aim for this type of system is to bridge the gap between conceptual data modeling techniques such
as Entity-relationship diagram (ERD) and object-relational mapping (ORM), which often use classes
and inheritance, and relational databases, which do not directly support them.
Another, related, aim is to bridge the gap between relational databases and the object-oriented modeling
techniques used in programming languages such as Java, C++ or C#. However, a more popular
alternative for achieving such a bridge is to use standard relational database systems with some form of
ORM software.
Network
The network model is a database model conceived as a flexible way of representing objects and their
relationships. Its original inventor was Charles Bachman, and it was developed into a standard
specification published in 1969 by the CODASYL Consortium. Where the hierarchical model structures
data as a tree of records, with each record having one parent record and many children, the network
model allows each record to have multiple parent and child records, forming a lattice structure.
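The lattice structure of the network model can be sketched by letting a record list several parents, which a pure hierarchy forbids (the part and supplier records below are invented):

```python
# Sketch of the network model: unlike a tree, a record may have several
# parent records, forming a lattice rather than a hierarchy.
records = {
    "Supplier-S1": {"parents": []},
    "Supplier-S2": {"parents": []},
    "Part-P1":     {"parents": ["Supplier-S1", "Supplier-S2"]},  # two parents
}

def parents_of(name):
    return records[name]["parents"]

print(len(parents_of("Part-P1")))   # prints 2 -- impossible in a hierarchy
```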