DBMS - Lesson 1 - Data Management
Information
Information is processed data. In other words, information is processed, organized or
summarized data.
Characteristics of good/ quality information
Nowadays there is a lot of data but we lack quality information. Quality information must have
the following characteristics:
Accuracy: Accuracy means that information is free from errors and it clearly and
correctly reflects the meaning of data on which it is based, i.e. its degree of correctness is
high.
Timeliness: Timeliness means that the recipients receive the information when they need
it and within the required time frame.
Relevance: Relevance means the usefulness of the information to the corresponding
persons or intended recipients.
Meta data
Meta data is data about data. Meta data describes objects in the database and makes it easier for
those objects to be accessed or manipulated. Meta data describes the following about the data:
The database structure.
Size of data type.
Constraints.
Applications
Authorization etc.
All these items are used as an integral tool for information resource management.
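As an illustration of Meta data in practice, the following is a minimal sketch using Python's built-in sqlite3 module. The student table and the describe_table helper are hypothetical names chosen for this example. It shows that a database can be asked for its own structure, data types and constraints.

```python
import sqlite3

def describe_table(conn, table):
    """Return (column name, declared type, NOT NULL flag) for each column."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row has the form (cid, name, type, notnull, dflt_value, pk).
    return [(r[1], r[2], bool(r[3])) for r in rows]

# Hypothetical in-memory database with one table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " mobile TEXT)"
)

# sqlite_master is SQLite's catalogue table: it records the database structure.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Column names, data types and constraints are Meta data too.
columns = describe_table(conn, "student")
print(tables)
print(columns)
```

Other database systems expose the same kind of information through their own catalogue tables, so the exact queries differ from product to product.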
Types of Meta data
There are three main types of Meta data namely:
Descriptive Meta data: Descriptive Meta data is a type of Meta data that describes a
resource (data) for the purpose of being discovered, identified or selected. It includes
elements found in a traditional library catalogue, such as title, abstract, author and keywords.
Structural Meta data: Structural Meta data is a type of Meta data that describes the
relationship between the various parts of a resource (data), e.g. how the pages of a
book are ordered to form chapters.
Administrative Meta data: Administrative Meta data is a type of Meta data that
provides information to help manage a resource (data), such as when and how it was
created, file type, technical information, preservation, rights and use.
Data management
Data management is the practice of managing data as a valuable resource to unlock its
potential. It is the practice of collecting, storing, organizing, protecting, verifying and processing
essential data to ensure its accessibility, reliability and timeliness to its users, who use it to make
informed decisions.
Importance of data management
The following are some of the benefits of good data management:
Increased productivity: With good data management, organizations will be more
organized and productive. Employees will have an easier time finding, understanding and
relying on information.
Smooth operation: Good data management makes it easy for organizations to respond
quickly to the world around them. This means that organizations can respond efficiently to
market changes and react appropriately to competitors.
Reduced security risk: Proper data management helps ensure that an organization’s
information stays secure and does not end up in the wrong hands. A strong data
management system will help protect its information from theft and attacks.
Reduced data loss: With a good data management plan in place, organizations greatly
reduce the risk of losing vital information. It also ensures that important information is backed
up and retrievable in case something happens to the original copies.
Accurate decisions: Proper data management helps ensure that all employees and workers
view and analyse the same information. This helps ensure that the organization
makes the most accurate decisions based on the most accurate information.
Cost effective: Good data management can help organizations avoid unnecessary extra
costs, such as unneeded duplication, because data is easily accessible.
Elements/components of data management
The data management process is made up of the following components or concepts:
Architecture: This data management component involves defining how data is
connected, integrated, transformed, stored and used.
Modelling: This data management component involves defining key business
concepts, e.g. customers, products etc. Data is designed through different models and the
relationships between them are shown.
Figure: The file based approach. Each application program (Marketing, Manufacturing Control,
Inventory Control, Payroll) works on its own data file and produces its own reports for its users.
Program dependence: The reports produced by a file processing system are program
dependent, which means if any change in the format or structure of data and records in
the file is to be made, the programs have to be modified correspondingly. Also, a new
program will have to be developed to produce a new report.
Data dependence: The application programs in a file processing system are data
dependent, i.e. the file organization, its physical location and its retrieval from the storage
media are dictated by the requirements of the particular application. For example, a file of
employee records may be sorted by last name, which implies that any employee’s record
has to be accessed through the last name.
Limited data sharing: There are limited data sharing possibilities with the traditional file
system. Each application has its own private files and users have little choice to share
the data outside their own applications. Complex programs have to be written to obtain
data from several incompatible files.
Poor data control: In a file processing system there is no centralized control at the data
element level, hence a data field may have multiple names, defined by the different
departments of an organization and depending on the file it is in. This situation leads to
different meanings of a data field in different contexts, or the same meaning for different fields.
This causes poor data control.
Security problem: It is very difficult to enforce security checks and access rights in a file
processing system, since application programs are added in an unplanned (ad hoc)
manner.
Data manipulation capability is inadequate: The data manipulation capability is very
limited in the file based approach since it does not provide strong relationships between
data in different files.
Needs excessive programming: An excessive programming effort is needed to develop
a new application program in a file system. Each new application requires that
the developers start from scratch by designing new file formats and descriptions and then
writing the file access logic for each new file.
Data isolation: Data is scattered in different files and the files are in different formats, hence
writing a new application program to retrieve data is difficult.
Data atomicity: Ensuring that transactions either happen fully or do not happen at all is difficult,
since the information needed to roll back a transaction may not be readily available in a
file processing system.
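The program and data dependence described above can be sketched in a few lines of Python. The fixed-width record layout, the field widths and the sample values are all assumptions made for this illustration. The point is only that the file layout lives inside the program, so a change to the file silently breaks the program.

```python
# Hypothetical fixed-width layout: name (10 characters), department (5 characters).
OLD_RECORD = "Smith".ljust(10) + "SALES"

def read_department(record):
    # The file layout is hard coded in the program: the department is
    # assumed to occupy character positions 10 to 14.
    return record[10:15].strip()

# The file format changes: the name field is widened to 15 characters.
NEW_RECORD = "Smith".ljust(15) + "SALES"

old_result = read_department(OLD_RECORD)  # layout matches the program
new_result = read_department(NEW_RECORD)  # program now reads padding instead
print(repr(old_result), repr(new_result))
```

Every program that reads this file would need the same correction, which is the extra programming effort the list above refers to.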
The shared file approach
One approach to solving the problems experienced when each application has its own set of
files is to share files between different applications. This eliminates the problem of
duplicated and inconsistent data between different applications, as shown in the following
diagram.
Figure: The shared file approach (File 1, File 2 and File 3 shared between applications).
The shared file approach solves the problem of duplicated and inconsistent data
across the different versions of the same file held by different departments, but other problems
may emerge, which include the following:
File incompatibility: A file structure that suits one application may not necessarily suit
another application, e.g. when each department has its own version of a file for
processing, each department can ensure that the structure of the file suits its
specific application, but that structure may not be suitable for another.
Difficult to control access: Some applications may require access to more data than
others, hence some files will still need to contain additional information to support those
applications.
Physical data dependence: If the structure of the data needs to be changed in some
way, the alterations will need to be reflected in all application programs that use the data
file.
Difficult to implement concurrency: While a data file is being processed by one
application, the file will not be available to other applications, since if more than one
application is allowed to alter data in a file at one time, serious problems can arise in
ensuring that updates made by each application do not clash with one another. The file based
approach avoids these problems by not allowing more than one application to access a
file at the same time.
Note: All the limitations of the file based approach can be attributed to two factors:
The definition of data is embedded in the application programs rather than being stored
separately and independently.
There is no control over the access and manipulation of data beyond that imposed by
application programs.
In order to remove all limitations of the file based approach a new approach is required that must
be more effective. This approach is known as the database approach.
The database approach
In the database approach, data is defined and stored centrally in a database as shown in the
following diagram
Figure: The database approach. The Marketing, Manufacturing Control, Inventory Control and
Payroll application programs all access a central database (marketing, manufacturing, inventory
and payroll data) through the database management system and produce their reports.
The database technology eliminates many of the problems experienced in the file based approach
by serving many applications and different user groups at the same time. In this approach a pool of
related data is shared by multiple application programs. Rather than having separate files, each
application uses a collection of data that is joined or related in the database.
Characteristics of the database approach
The following are the main characteristics of the database approach:
Self-describing nature of a database system: A database contains not only the data, but
also a complete definition of the data structure, data types and data constraints. The
additional information is known as metadata which is stored in a file known as the data
dictionary.
Data abstraction: In a traditional file processing system, the structure of the data files is
hard coded in the application programs, so any change in structure requires the
related application programs to be modified accordingly. In a database system,
the application programs are independent of how the data is stored in the database. The
application programs are only concerned with “what data” is stored in the database and
not with “how the data is stored”. As long as the contents of the data remain
unchanged, the database structure can be changed without affecting the existing
application programs. This feature is called data abstraction.
Support for multiple views of the data: Depending on different needs and different
levels of authorizations, different users will be provided with different perspectives of the
same data known as views. A view refers to a subset of the stored data or set of virtual
data, i.e. data derived from the stored data.
Multiuser access and concurrency control: A multiuser Database Management System
(DBMS) allows multiple users to access the same database concurrently. This is achieved
by including concurrency control software in the DBMS, which ensures that the database
remains consistent despite access by multiple users
concurrently.
Effective system protection through access rights: Access rights are granted
to the users, to the extent required for their roles in the organization. These rights are
stored in the data dictionary itself. When a query is to be processed, the Database
Management System (DBMS) will first ensure that the user submitting the query has
sufficient rights for the processing of that query; only then is the query processed.
Support for efficient recovery: When a system is restarted after a failure, log based
recovery recovers the database efficiently.
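Two of the characteristics above, data abstraction and multiple views, can be sketched with Python's built-in sqlite3 module. The employee table, the employee_directory view and the list_directory function are hypothetical names used only for this example. SQLite has no user accounts, so the view here illustrates a restricted perspective of the data rather than enforced access rights.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee (name, salary) VALUES (?, ?)",
    [("Ann", 5000.0), ("Ben", 4200.0)])

# A view: a subset of the stored data that hides the salary column.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT id, name FROM employee")

def list_directory(conn):
    """Application code that only knows what data it wants, not how it is stored."""
    return conn.execute(
        "SELECT name FROM employee_directory ORDER BY name").fetchall()

before = list_directory(conn)

# The database structure changes: a new column is added to the table.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")

# The application code above is unchanged and still works.
after = list_directory(conn)
print(before, after)
```

Contrast this with a file based program, where adding a field to the file would normally require every program that reads the file to be modified.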
Advantages / Benefits of the database approach
The main advantages of the database approach are defined as follows:
Data redundancy is minimized: A database system keeps data in one place in the
database. The data is integrated into a single, logical structure. Different applications
refer to the data from the centrally controlled location. Storing the data
centrally minimizes data redundancy.
Data inconsistency is reduced: Minimizing data redundancy using a database system
reduces data inconsistency too. Updating a data value becomes simple and there is
no disagreement in the stored value, e.g. students’ mobile numbers are stored at a
single location and get updated centrally.
Data is shared: Data sharing means sharing the same data among more than one
user. Each user has access to the same data, though they may use it for different purposes.
A database is designed to support shared data. Authorized users are permitted to use
the data from the database. Users are provided with views of the data to facilitate its
use, e.g. the students’ mobile numbers stored in the database are shared by the
student profile system and the library system.
Data independence: It is the separation of data description (Meta data) from the
application programs that use the data. In the database approach, data descriptions are
stored in a central location called the data dictionary. This property allows an
organization’s data to change and evolve (within limits) without changing the
application programs that process the data.
Data integrity is maintained: Stored data is changed frequently for a variety of
reasons, such as adding new data item types and changing the data formats. The
integrity and consistency of the database are protected using constraints on the values that
data items can have. Data constraint definitions are maintained in the data dictionary.
Data security is improved: The database is a valuable resource that needs
protection. The database is kept secure by limiting access to the database to
authorized personnel. Authorized users are generally restricted in the particular data
they can access and in whether they can update it or not. Access is often controlled by
passwords.
Backup and recovery support: Backup and recovery are supported by the software
that logs changes to the database. This support helps in recovering the current state of
the database in case of system failure.
Standards are enforced: Since data is stored centrally it is easy to enforce standards
on the database. Standards could include naming conventions and standards for
updating, accessing and processing data. Tools are available for developing and
enforcing standards.
Application development time is reduced: The database approach greatly reduces
the cost and time for developing new business applications. Programmers can focus
on the specific functions required for the new applications, without having to worry about
design or low level implementation details, as the related data has already been designed
and implemented. Tools for generation of forms and reports are also available.
In addition to the advantages highlighted above, there are several other implications of
using the database approach, such as provision of multiple user interfaces, representation of complex
relationships and concurrent data access.
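The integrity and recovery advantages listed above can be sketched with Python's built-in sqlite3 module. The account table, the transfer function and the amounts are assumptions made for this illustration. A CHECK constraint stored with the table definition protects the data, and a transaction that violates it is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint is part of the database definition, not of any program.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move amount from source to target; return True if it succeeded."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint was violated, so neither update is kept.
        return False

ok = transfer(conn, 1, 2, 30)         # allowed: both updates are committed
rejected = transfer(conn, 1, 2, 500)  # would make account 1 negative
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, rejected, balances)
```

In the rejected transfer the credit to the target account is executed first and then undone, which is the all-or-nothing behaviour that is hard to achieve in a file processing system.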
Disadvantages / Drawbacks / Limitations of the database approach
There are some disadvantages of database systems compared with the file based system, which include:
Database systems are more vulnerable than file based systems because of the centralized
nature of a large integrated database.
If a failure occurs, the recovery process is complex and sometimes may result in lost
transactions.
Comparison of the file based approach with the database approach
The following table shows a comparison between the file based approach and the database
approach