Developing An Information System
What is a database?
The first – and most obvious – question to ask when you take up this subject is the simplest –
“What is a database?” Certainly, you will have dealt with them, indirectly, almost daily. Whether
you are in a shop in person or whether you are exploring its catalogue on the internet, when
you check whether a product is in stock, it is likely that a database will be used somewhere
within the system. Amazon and Facebook, YouTube and iTunes all use databases to deliver
products and services to their users.
The database and its structure may be quite obvious to the user for a library catalogue or an
online retailer, but it may also be serving a less direct purpose, allowing the company to keep
track of its employees and suppliers, or helping an advertiser track visitors to web pages across
different sites, tailoring their adverts to match a browser’s activity.
Activity
Before you read on, try to list some other examples of databases you have come across. What
do they have in common? Based on your examples, write down what you think are the most
important features of a database.
A database system is a system that stores data. To qualify as a database system, it must offer
at least the following features:
• find (retrieve) data
• add (insert) new data
• delete unwanted data
• change (update) data.
This definition will be refined and formalized in the sections to come, but first, we can illustrate
these features with an example.
Consider a shoe shop, specializing in trainers. The shop keeps information about the products it
sells. This information could be organized in the form of a table, as shown in Figure 1.1 (prices
are in U.K. pounds sterling, shown as £), and could be part of the shop’s database.
Figure 1.2. Part of the shoe shop’s database after some changes. Altered fields are
underlined.
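The four features listed earlier (find, add, delete, change) can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration: the table name trainers, its columns and the three sample rows are assumptions made for the example, not the contents of the shop's actual figures.

```python
import sqlite3

# In-memory database; table, columns and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trainers (code TEXT PRIMARY KEY, name TEXT, price REAL, stock INTEGER)"
)

# add (insert) new data
conn.executemany(
    "INSERT INTO trainers VALUES (?, ?, ?, ?)",
    [
        ("T1", "Road Runner", 45.00, 12),
        ("T2", "Court Classic", 60.00, 0),
        ("T3", "Trail Pro", 75.00, 5),
    ],
)

# find (retrieve) data: which trainers are in stock?
in_stock = [
    row[0]
    for row in conn.execute("SELECT code FROM trainers WHERE stock > 0 ORDER BY code")
]

# change (update) data: reduce a price
conn.execute("UPDATE trainers SET price = 40.00 WHERE code = 'T1'")

# delete unwanted data: remove a discontinued line
conn.execute("DELETE FROM trainers WHERE code = 'T2'")

remaining = conn.execute("SELECT code, price FROM trainers ORDER BY code").fetchall()
print(in_stock)   # ['T1', 'T3']
print(remaining)  # [('T1', 40.0), ('T3', 75.0)]
```

Each operation is a single statement handed to the database software, which is what lets the application ignore how the rows are physically stored.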
A database has a structure and content. The structure is represented in this example by the
table headings; the content by the body of the table. The content changes in time – it is
dynamic in nature. The structure can change, but it is far less changeable than the content. For
instance, you could add a new column to this table – the type of trainer or the activity it might
be associated with – but you would not expect to make such changes that often. The structure
of the database is called its intension and the content is called its extension.
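The difference between structure and content can be made concrete with a small sketch, again using Python's sqlite3 module. The helper names intension and extension, the trainers table and the activity column are hypothetical choices for this example. Inserting a row changes only the content, whereas adding a column changes the structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trainers (code TEXT, name TEXT, price REAL)")

def intension(conn):
    # Structure: the column names, read from SQLite's table metadata.
    return [row[1] for row in conn.execute("PRAGMA table_info(trainers)")]

def extension(conn):
    # Content: the rows currently stored.
    return conn.execute("SELECT * FROM trainers").fetchall()

structure_before = intension(conn)

# A change of content: frequent, and the structure is untouched.
conn.execute("INSERT INTO trainers VALUES ('T1', 'Road Runner', 45.0)")
structure_after_insert = intension(conn)

# A change of structure: rare; existing rows get an empty value for the new column.
conn.execute("ALTER TABLE trainers ADD COLUMN activity TEXT")
structure_after_alter = intension(conn)

print(structure_before)       # ['code', 'name', 'price']
print(structure_after_alter)  # ['code', 'name', 'price', 'activity']
print(extension(conn))        # [('T1', 'Road Runner', 45.0, None)]
```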
Although it may not be obvious from this example, a database is capable of storing a large
amount of data.
So far, a database system is, for us, nothing more than a system that manages data. But is any
system that manages data a database system? Is there anything that all database systems have
in common, that distinguishes them from other software systems? The answer is obviously yes.
In order to understand the “database approach”, we shall first have a brief look at file-based
systems. In appearance (behavior) they are similar to database systems, but they are
conceptually (qualitatively) different. We shall identify the drawbacks of the file-based
approach to data management and then introduce the database approach as a solution to
most of these drawbacks.
Activity
Consider the database examples you listed in the previous activity. For each one, think about it
as a table like the ones in Figures 1.1 and 1.2 above. What columns would the table have?
Would all the information fit in a single table, or would there be several?
File-based systems
We shall start with a definition of file-based systems.
Definition: A file-based system is a collection of application programs, each managing its own
data.
In a file-based system, permanent data is stored in various files of ad-hoc structures. Each
application program defines and handles its own data files independently of the others. This
approach is called the decentralized approach. Each application program works with its data
at the physical level, manipulating records as they are organized in persistent memory. Sharing
of data between applications is likely to be limited.
The concept of a physical level for data is one to which we will return later. The structure we
describe is not purely physical, but we use the term to indicate that it is to some extent
platform dependent, because access to files is made through the primitives (built-in
functionality) of the operating system.
Take, for example, an estate agent’s office, for which we shall consider the Sales and the
Contracts department. Each department maintains its own data in its own data files, as
depicted in Figure 1.3.
Figure 1.3. A file-based system for an estate agent’s company.
The Sales Department needs:
• detailed information about the properties for rent, so staff can give good advice to
customers (such as Type and No Of Rooms from the Property for rent file);
• detailed information about customers, so that their needs can be appropriately
matched to what is available (such as Preferred Type and Max Rent in the Renter file);
• “identification” information – such as name, address and telephone number – about
customers, the properties on offer and their owners.
The Contracts Department needs:
• detailed information about the renting contracts (in the Lease file);
• “identification” information – such as name, address and telephone number – about
customers, the contracted properties and their owners.
Some drawbacks of this solution are obvious. These are the limitations of the file-based approach in
general. The most important are enumerated below.
Duplication. Different applications might have to make use of the same information.
Because each application has its own files, data is duplicated (e.g. the “identification”
information in our example). This has at least two negative consequences. Firstly,
duplication is wasteful. Secondly, data can become inconsistent – it can have different values
in different files (belonging to different applications), even though it is supposed to give the
same piece of information. For example, the address of an owner, Mr. J. Morris, might be
updated in the Owner file belonging to the Sales Department, while the Contracts
Department might still have Mr. Morris’s old address.
NOTE: Wasting disk space is unlikely to be a significant concern in the example we have
given – storage is cheap – but in situations where the number of applications and the scale
of duplication are greater, this can become more important.
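The inconsistency problem can be sketched as follows. The two dictionaries stand in for the separate Owner files kept by each department; the owner identifier O1 and both addresses are invented for the illustration.

```python
# Each department keeps its own copy of the owner's "identification" data.
# The identifier and addresses below are hypothetical.
sales_owner_file = {"O1": {"name": "J. Morris", "address": "1 Old Road"}}
contracts_owner_file = {"O1": {"name": "J. Morris", "address": "1 Old Road"}}

# The Sales Department records a change of address in its own file only.
sales_owner_file["O1"]["address"] = "2 New Street"

# Nothing forces the Contracts copy to follow, so the two files now disagree.
inconsistent = [
    owner_id
    for owner_id in sales_owner_file
    if sales_owner_file[owner_id]["address"] != contracts_owner_file[owner_id]["address"]
]
print(inconsistent)  # ['O1']
```

Detecting the disagreement here is easy only because both copies live in one program; in a real file-based system each department's application would see only its own file.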
Separation and isolation. Data is scattered among different files, each file belonging to a
certain department. A department has access to its own files, but no access to the files of the
other departments. Files belonging to different departments cannot be used together in order
to create more complex data or analysis. Often, because they are based on different
infrastructures (platforms, development software, etc.) files belonging to different
departments cannot be transferred (copied) across.
Program-data dependence. Each file belongs to a certain application program. The (physical)
structure of data is defined inside the application program. This could easily – and usually does
– lead to incompatible file formats between applications, meaning that it becomes impossible
to share data between them. Another aspect is that data definition is embedded in the
application program. That means that if the physical structure of data is to be changed – for
instance, if instead of representing a year with two digits, it is to be represented with four – then
the application program itself must be changed. Not only that, but the methods of accessing data are
also embedded in the application program; to change them, the application program must be
modified.
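Program-data dependence can be illustrated with Python's struct module, which packs values into fixed-width byte records much as a file-based application might. The record layout (a 10-byte name followed by the year) and the sample values are assumptions made for this sketch.

```python
import struct

# The physical record layout is written into the application itself.
OLD_FORMAT = "10s2s"  # 10-byte name, 2-character year
NEW_FORMAT = "10s4s"  # same name field, 4-character year

def write_record(fmt, name, year):
    return struct.pack(fmt, name.encode("ascii"), year.encode("ascii"))

def read_record_old(raw):
    # This reader only understands the old layout.
    name, year = struct.unpack(OLD_FORMAT, raw)
    return name.rstrip(b"\x00").decode("ascii"), year.decode("ascii")

old_raw = write_record(OLD_FORMAT, "Lease42", "99")
new_raw = write_record(NEW_FORMAT, "Lease42", "1999")

print(read_record_old(old_raw))  # ('Lease42', '99')

# Once the file holds four-digit years, the unchanged reader can no longer cope.
try:
    read_record_old(new_raw)
    reader_still_works = True
except struct.error:
    reader_still_works = False
print(reader_still_works)  # False
```

Every program that reads the file would need the same modification, which is the cost the paragraph above describes.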
In the file-based approach, the emphasis is placed on functionality – provided by the application
program. Data modeling takes a lower priority. This approach leads to the drawbacks we have
listed. If the approach is inverted and we consider data as central, then these problems can be
removed. Informally, this represents the database approach.
2.2.2 Databases and database management systems
We shall start with the definition provided by Connolly and Begg:
Activity
Before you read on, try to think of an organization you know – perhaps one you’ve worked for
or studied at – that has multiple systems similar to what is described above. What problems
can occur when you have data duplication like this? Does it matter whether the separate
systems store their information in database software or spreadsheets? Do you think this is a
useful definition of database systems?
• software, serving two purposes: the management of the stored data, and further
processing of the data to the users’ needs;
• hardware, supporting both the stored data and the software components;
• users, broadly divided into two categories: developers of the database system, and
users of the system.
Data
Data can be classified into two categories, namely:
1. Primary data – the fundamental information necessary to provide the database service,
stored on permanent support, such as hard disks.
2. Derived data – information that can be inferred or calculated from primary data (and may be
recalculated at any time).
Derived data may be the output of the application programs – the result of processing the
primary data – in a form suitable for the users’ needs, but it can also be the input from users
that will then be processed by the application to be stored as primary data.
The focus of a database system is on primary data. This has to be appropriately identified,
described and implemented. The primary data has three important characteristics. It is:
• integrated, rather than existing in separate systems – it has been gathered
together into a single system;
• shared, with all the applications belonging to the information system having
common access to (at least parts of) it;
• extensive, in that database systems are usually developed for data intensive
applications, where their benefits are more clearly felt.
Stored data, as we have already seen, includes not only the raw data but also its description
– the metadata, system dictionary or catalogue.
Software
The software component can be seen as consisting of three layers (Figure 2.11):
• the operating system (OS), positioned at the base, provides the necessary routines for
accessing the hardware resources (such as file handling or memory management
routines);
• the database management system (DBMS), placed above the OS – and using the routines
that the OS makes available – provides all the necessary primitives for data
management, including languages for defining schemas, manipulating and reading data and
so on;
• application programs, above the DBMS – and using the routines made available by the
DBMS – provide data formats and computations beyond the capabilities of the DBMS.
Figure 2.11. The layered structure of the software component of a database system. Anything
below the dashed grey line is platform-dependent, and so will not be discussed here in any
detail.
The hardware and the OS are often grouped together and called the platform. There is
considerable variation between platforms, which is one reason for having the DBMS software
handle this variation and present a more abstracted interface to higher-level components. This
provides a platform independence that shields the application programs from unnecessary
physical details, and means that we need not concern ourselves with details of hardware or OS
for the remainder of these subject guides. Instead, we focus on the features provided for the
application programs by the DBMS.
The features of the DBMS will be considered in detail over the course of this chapter. Briefly, the
DBMS provides support for schema definition, data manipulation, data security and data
integrity. The application programs can be of two kinds:
1. user developed;
2. provided together with the DBMS by its developer.
The former class of applications will generally be written in a high-level programming language,
such as C, Java or Python. Support for database access in such languages is provided by means
of a data sub-language, embedded within the host language. Statements written in the
embedded sub-language are processed and passed on to the DBMS using the appropriate
routines.
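As a rough sketch of a data sub-language embedded in a host language, the example below uses Python as the host and SQL as the sub-language, with SQLite (through the standard sqlite3 module) standing in for the DBMS. The property table, its columns, the sample rows and the function name affordable are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE property (id INTEGER PRIMARY KEY, type TEXT, rooms INTEGER, rent REAL)"
)
conn.executemany(
    "INSERT INTO property (type, rooms, rent) VALUES (?, ?, ?)",
    [("flat", 2, 650.0), ("house", 4, 1200.0), ("flat", 3, 800.0)],
)

def affordable(conn, preferred_type, max_rent):
    # The SQL text is the embedded sub-language; the library passes it to the DBMS.
    rows = conn.execute(
        "SELECT id, rooms, rent FROM property "
        "WHERE type = ? AND rent <= ? ORDER BY rent",
        (preferred_type, max_rent),
    )
    # Formatting the result is done by the host language.
    return [f"Property {pid}: {rooms} rooms at £{rent:.2f}" for pid, rooms, rent in rows]

print(affordable(conn, "flat", 700))  # ['Property 1: 2 rooms at £650.00']
```

The query is passed as a string with placeholders, so the host program supplies the values while the DBMS decides how to find the matching rows.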
Programs provided by the DBMS developer allow the rapid development of user applications,
without the user writing any conventional code.
Programming tools that abstract away so much functionality that software, often
application-specific, can be constructed quickly are known generically as fourth-generation
tools. Home or small business database systems – such as Microsoft Access or
OpenOffice Base – provide graphical fourth-generation tools for this purpose.
The DBMS can also be referred to as the server or back-end, whereas the application
programs are referred to as clients, or front-ends. Clients use the services provided by a server
for data management. The division between client and server makes it possible for the server
and client to run on different machines, giving rise to the idea of distributed processing, an
issue discussed in the “Database architectures” section and elsewhere in these subject guides.
Hardware
As we have seen, the DBMS allows both the developer of a database and the
database users to operate without knowing the details of the hardware being used.
This does not remove from the system administrator the need to select hardware
and operating systems that, firstly, are capable of running the chosen software and,
secondly, can cope with the demands that will be placed upon them by the
database and associated systems. The system administrator should be satisfied that:
1. There is enough permanent storage space, for instance disk space, to store the data
and any indexes and cached derived data.
2. There is enough temporary storage space, for instance RAM, to hold intermediate
results and computations.
3. There is enough computational power to manipulate the data at the rate that will be
required.
4. There is fast enough communication between components of the
system for moving the data between them. This is usually only an issue
for particularly data-heavy applications or systems with a very large user
base.
Although DBMS vendors will provide recommendations for minimal configurations required for
different sizes of application, individual use cases will have a large impact on the system
requirements.
Users
Users, as a component of a database environment, can be classified in four categories, according
to the role they play.
Data administrator. The data administrator (DA) is a user who properly understands the data
requirements of the organization and is in charge of administering the organization’s data. This
user:
• decides which data is relevant and which is not;
• is in charge of applying the organization’s policy and standards;
• decides on the security policy, and so on.
The DA does not need to be a technical expert or a manager. Rather, the DA is somewhere in
between, liaising with the management on one hand, and with the technical team, on the other.
Database administrator. The database administrator (DBA) is the technical user in charge of the
database system. More specifically, the DBA is responsible for the database’s design,
implementation and maintenance, and deals with both the correctness of the implementation
and the efficiency of the database system. The DBA must have good technical knowledge and is
in charge of the definition of the DB schemas, integrity and security rules, access procedures,
backup and recovery procedures, performance of the system, etc.
End user. The end users are the “beneficiaries” of the database system. They may range from
technically naïve to extremely sophisticated. A technically naïve user, for example a bank
employee, may interact with the system using application programs developed for specific tasks.
A naïve user does not have to be aware of the functionality of the DBMS. All they need is
reliable and easy to use programs that they can use with minimal fuss. A sophisticated user, on
the other hand, will know how to access the database directly, through the database language
supported by the DBMS. Sometimes a sophisticated user might even develop applications, and
so become an application programmer.
Activity
“The term user is often used in software engineering to refer to one or more people playing a
particular role in interacting with software. That means that a “user” here can mean several
people, and one person can be several different “users” in different contexts, depending on the
work that she or he is doing at the time.”
At the beginning of this chapter, you were asked to list databases you had encountered in real
life. For each one, consider which group of users takes which of the above roles. Is the
separation always clear?
This is achieved by means of a data manipulation language (DML). There can be a DML at each
level of abstraction. At the external and conceptual level, the DML is concise, comprehensive
and easy to use; in other words, the emphasis is on its expressive power – on these levels,
efficiency is a secondary goal. On the other hand, at the internal level, the emphasis is placed
on the DML’s efficiency. This means that its statements are complex – and probably not that
straightforwardly expressible – but quite efficient.
These languages (DDLs and DMLs) are called data sub-languages because they do not
include constructs for the control of flow – they are computationally incomplete
(meaning they cannot be used as general purpose programming languages).
Users can use them directly in order to define and access the database. However, for
applications that require more complex data processing (and formatting) they are usually
embedded into a full high-level programming language.
Some authors prefer to further divide DMLs into two categories; namely, procedural and non-
procedural (declarative). Within a procedural language one must specify how the result to be
obtained is computed; whereas using a declarative language one only has to specify what
result must be obtained – what it looks like – the system being responsible for its computation.
Since there are neither purely declarative nor purely procedural DMLs – they range between the
two – any classifications of this kind are rather ad-hoc in nature. For example, in certain
situations SQL can be considered declarative, while in others it can be considered procedural.
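The contrast between the two styles can be sketched by computing the same result twice: once with an explicit loop that spells out how to compute it, and once with an SQL statement that only says what is wanted. The sample rows and the trainers table are assumptions for this example, and SQLite is used merely as a convenient stand-in for a DBMS.

```python
import sqlite3

# (code, price, stock) rows; the values are illustrative.
rows = [("T1", 45.0, 12), ("T2", 60.0, 0), ("T3", 75.0, 5)]

# Procedural style: we state HOW, step by step.
total = 0.0
for code, price, stock in rows:
    if stock > 0:
        total += price * stock
procedural_value = total

# Declarative style: we state WHAT is wanted; the system chooses how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trainers (code TEXT, price REAL, stock INTEGER)")
conn.executemany("INSERT INTO trainers VALUES (?, ?, ?)", rows)
declarative_value = conn.execute(
    "SELECT SUM(price * stock) FROM trainers WHERE stock > 0"
).fetchone()[0]

print(procedural_value, declarative_value)  # 915.0 915.0
```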
An important requirement for DMLs is to allow unplanned or ad-hoc queries; namely, requests
that were not foreseen at the time of design. A problem that may result from this is how to gain
reasonable efficiency for such unpredicted use.
Advantages
Reduced redundancy. In a file-based system each application has its own private files. This
often leads to data being duplicated in different files, wasting storage space. In a database
approach, all data is integrated, reducing or removing unwanted redundancy. There are
various reasons why eliminating redundancy completely is often not possible or desirable in a
DBMS – and we shall return to these in later chapters. However, where the file-based system
forces redundancy in an ad-hoc way, a DBMS should provide mechanisms for specifying
redundant data and for controlling it (to maintain the consistency of the database).
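As a minimal sketch of integration, the example below stores an owner's details once and gives each department its own view of that single copy. The owner table, the view names sales_owner and contracts_owner, and the sample data are hypothetical; SQLite stands in for the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One shared, integrated copy of the owner data (sample values are invented).
conn.execute(
    "CREATE TABLE owner (owner_id TEXT PRIMARY KEY, name TEXT, address TEXT, phone TEXT)"
)
conn.execute("INSERT INTO owner VALUES ('O1', 'J. Morris', '1 Old Road', '555-0100')")

# Each department sees the columns it needs, but no data is copied.
conn.execute("CREATE VIEW sales_owner AS SELECT owner_id, name, address, phone FROM owner")
conn.execute("CREATE VIEW contracts_owner AS SELECT owner_id, name, address FROM owner")

# A single update to the shared table...
conn.execute("UPDATE owner SET address = '2 New Street' WHERE owner_id = 'O1'")

# ...is seen consistently by both departments.
sales_address = conn.execute(
    "SELECT address FROM sales_owner WHERE owner_id = 'O1'"
).fetchone()[0]
contracts_address = conn.execute(
    "SELECT address FROM contracts_owner WHERE owner_id = 'O1'"
).fetchone()[0]
print(sales_address, contracts_address)
```

Because there is only one stored address, the two departments cannot hold different values for it.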
Disadvantages
Higher impact of failure. The database system is at the core of the information system of an
organization. All data is stored centrally, in the database. As a result, most applications rely on
this data. If the DBMS fails, the whole organization is paralyzed, unlike a decentralized system,
where a failure in one system will only directly affect the department that uses it.
Performance. DBMS software is heavily optimized for its core functionality, but it is still a
generic piece of software. A database application may be slower for an individual user than a
bespoke, perhaps local, file-based solution.
At this point, you should be in a position to develop a functional information system
using the database life cycle approach covered in Topic 2. In this context, we shall develop a student
information system. The implementation is based on the MS Access and MySQL relational database
management systems.
Gradebook
Course
3. Create the schema using the conceptual model for each entity. Enforce integrity
constraints at this stage.
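A possible shape for this step is sketched below. It is not the schema used in this guide: the student table and every column name are assumptions, and SQLite (via Python's sqlite3 module) is used only because it runs without a server, whereas the implementation here targets MS Access and MySQL. The integrity constraints shown are primary keys, foreign keys, NOT NULL and a CHECK on the mark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical schema; all table and column names are illustrative.
conn.executescript("""
CREATE TABLE student (
    student_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    course_code TEXT PRIMARY KEY,
    title       TEXT NOT NULL
);
CREATE TABLE gradebook (
    student_id  TEXT NOT NULL REFERENCES student(student_id),
    course_code TEXT NOT NULL REFERENCES course(course_code),
    mark        INTEGER NOT NULL CHECK (mark BETWEEN 0 AND 100),
    PRIMARY KEY (student_id, course_code)
);
""")

conn.execute("INSERT INTO student VALUES ('S001', 'A. Student')")
conn.execute("INSERT INTO course VALUES ('DB101', 'Databases')")
conn.execute("INSERT INTO gradebook VALUES ('S001', 'DB101', 72)")

def rejected(sql):
    # True if the DBMS refuses the statement because it violates a constraint.
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

unknown_student = rejected("INSERT INTO gradebook VALUES ('S999', 'DB101', 50)")
duplicate_entry = rejected("INSERT INTO gradebook VALUES ('S001', 'DB101', 80)")
mark_out_of_range = rejected("UPDATE gradebook SET mark = 140 WHERE student_id = 'S001'")
print(unknown_student, duplicate_entry, mark_out_of_range)  # True True True
```

Declaring the constraints in the schema means the DBMS rejects invalid rows whichever application tries to insert them.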
Example
Repeat step 4 for all tables you have created in step 3. Add as many records as possible.
A summary of all the schemas looks as follows: