Developing An Information System


What is a database?
The first – and most obvious – question to ask when you take up this subject is the simplest –
“What is a database?” Certainly, you will have dealt with them, indirectly, almost daily. Whether
you are in a shop in person or whether you are exploring its catalogue on the internet, when
you check whether a product is in stock, it is likely that a database will be used somewhere
within the system. Amazon and Facebook, YouTube and iTunes all use databases to deliver
products and services to their users.
The database and its structure may be quite obvious to the user for a library catalogue or an
online retailer, but it may also be serving a less direct purpose, allowing the company to keep
track of its employees and suppliers, or helping an advertiser track visitors to web pages across
different sites, tailoring their adverts to match a browser’s activity.

Activity
Before you read on, try to list some other examples of databases you have come across. What
do they have in common? Based on your examples, write down what you think are the most
important features of a database.

A database system is a system that stores data. To qualify as a database system, it must offer
at least the following features:
• find (retrieve) data
• add (insert) new data
• delete unwanted data
• change (update) data.
This definition will be refined and formalized in the sections to come, but first, we can illustrate
these features with an example.
Consider a shoe shop, specializing in trainers. The shop keeps information about the products it
sells. This information could be organized in the form of a table, as shown in Figure 1.1 (prices
are in U.K. pounds sterling, shown as £), and could be part of the shop’s database.

Model        Brand    Size (EU)   Location   Price    Stock level
Air Max 90   Nike     35          K1S4       £40.00   3
Air Max 90   Nike     37          K1S4       £40.00   12
Mesh         Kaplan   48          MFash1     £65.00   18
Ferris       Vans     41          WPlim2     £10.00   1

Figure 1.1. A part of a shoe shop’s database of trainers.


The “Vans Ferris” range is about to be discontinued, and the shop will no longer have them for
sale once the pair in stock is sold. At that point, the record for that shoe – the row in the table
– will be deleted.
“Kaplan Mesh” shoes have not been selling well, and the large number in stock is taking up
space, so the price will be reduced to £40. The corresponding field for the price of that product
– a cell in this table – will be updated.
A new fashion trainer, the “Asics Gel Lyte VI”, has just been released, so a new record – a new
row in the table – must be inserted. In this example, we would expect several new records – for
different sizes of shoe – to be inserted, but we will only show one, to save space.
Figure 1.2 shows the table after these operations – deletion, updating and insertion – have been
carried out.

Model          Brand     Size (EU)   Location   Price      Stock level
Air Max 90     Nike      35          K1S4       £40.00     3
Air Max 90     Nike      37          K1S4       £40.00     12
Mesh           Kaplan    48          MFash1     £40.00*    18
Gel Lyte VI*   Asics*    46*         MFash3*    £100.00*   14*
…              …         …           …          …          …

Figure 1.2. Part of the shoe shop’s database after some changes. Altered fields are marked
with an asterisk (*).
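The three operations can be tried out in a few lines of code. The following is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for the shop's DBMS; the table name trainer, its column names and the choice to store prices in pence are our own illustrative assumptions, not part of the shop's design.

```python
import sqlite3

# In-memory SQLite database standing in for the shop's DBMS (an assumption).
# Prices are stored in pence to avoid floating-point rounding.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE trainer (
           model TEXT, brand TEXT, size_eu INTEGER,
           location TEXT, price_pence INTEGER, stock INTEGER)"""
)
conn.executemany(
    "INSERT INTO trainer VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Air Max 90", "Nike", 35, "K1S4", 4000, 3),
        ("Air Max 90", "Nike", 37, "K1S4", 4000, 12),
        ("Mesh", "Kaplan", 48, "MFash1", 6500, 18),
        ("Ferris", "Vans", 41, "WPlim2", 1000, 1),
    ],
)

# Deletion: the discontinued Vans Ferris record is removed.
conn.execute("DELETE FROM trainer WHERE brand = 'Vans' AND model = 'Ferris'")
# Update: the Kaplan Mesh price is reduced to £40.00.
conn.execute("UPDATE trainer SET price_pence = 4000 WHERE brand = 'Kaplan' AND model = 'Mesh'")
# Insertion: one record for the new Asics Gel Lyte VI.
conn.execute(
    "INSERT INTO trainer VALUES (?, ?, ?, ?, ?, ?)",
    ("Gel Lyte VI", "Asics", 46, "MFash3", 10000, 14),
)
conn.commit()

# Retrieval: list the rows in insertion order, as in Figure 1.2.
rows = conn.execute(
    "SELECT model, brand, size_eu, location, price_pence, stock FROM trainer ORDER BY rowid"
).fetchall()
for model, brand, size, loc, pence, stock in rows:
    print(f"{model:<12} {brand:<7} {size:>3} {loc:<7} £{pence / 100:.2f} {stock:>3}")
```

Each statement names the rows it affects by their content (brand and model), not by where they are stored; locating them is left to the DBMS.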
A database has a structure and content. The structure is represented in this example by the
table headings; the content by the body of the table. The content changes over time – it is
dynamic in nature. The structure can change, but it is far less changeable than the content. For
instance, you could add a new column to this table – the type of trainer or the activity it might
be associated with – but you would not expect to make such changes that often. The structure
of the database is called its intension and the content is called its extension.
Although it may not be obvious from this small example, a database is capable of storing a large
amount of data.
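The difference between intension and extension can be observed directly. In this minimal sketch (again assuming SQLite via Python's sqlite3; the table and helper names are hypothetical), the column list plays the role of the intension and the set of rows the extension. Inserting and updating rows changes the extension while the intension stays the same. PRAGMA table_info is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trainer (model TEXT, brand TEXT, size_eu INTEGER, stock INTEGER)")

def intension(conn):
    """Column names of the table: its structure (SQLite-specific PRAGMA)."""
    return [row[1] for row in conn.execute("PRAGMA table_info(trainer)")]

def extension(conn):
    """Current rows of the table: its content."""
    return conn.execute("SELECT * FROM trainer").fetchall()

before = (intension(conn), extension(conn))
conn.execute("INSERT INTO trainer VALUES ('Mesh', 'Kaplan', 48, 18)")
conn.execute("UPDATE trainer SET stock = 17 WHERE model = 'Mesh'")
after = (intension(conn), extension(conn))

print(before)  # (['model', 'brand', 'size_eu', 'stock'], [])
print(after)   # (['model', 'brand', 'size_eu', 'stock'], [('Mesh', 'Kaplan', 48, 17)])
```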
So far, a database system is, for us, nothing more than a system that manages data. But is any
system that manages data a database system? Is there anything that all database systems have
in common, that distinguishes them from other software systems? The answer is yes.
In order to understand the “database approach”, we shall first have a brief look at file-based
systems. In appearance (behavior) they are similar to database systems, but they are
conceptually (qualitatively) different. We shall identify the drawbacks of the file-based
approach to data management and then introduce the database approach as a solution to
most of these drawbacks.
Activity
Consider the database examples you listed in the previous activity. For each one, think about it
as a table like the ones in Figures 1.1 and 1.2 above. What columns would the table have?
Would all the information fit in a single table, or would there be several?

File-based systems
We shall start with a definition of file-based systems.

Definition: A file-based system is a collection of application programs, each managing its own
data.
In a file-based system, permanent data is stored in various files of ad-hoc structures. Each
application program defines and handles its own data files independently of the others. This
approach is called the de-centralized approach. Each application program works with its data
at the physical level, manipulating records as they are organized in persistent memory. Sharing
of data between applications is likely to be limited.

The concept of a physical level for data is one to which we will return later. The structure we
describe is not purely physical, but we use the term to indicate that it is to some extent
platform dependent, because access to files is made through the primitives (built-in
functionality) of the operating system.
Take, for example, an estate agent’s office, for which we shall consider the Sales and the
Contracts departments. Each department maintains its own data in its own data files, as
depicted in Figure 1.3.

Figure 1.3. A file-based system for an estate agent’s company.

The Sales Department needs:
• detailed information about the properties for rent, so staff can give good advice to
customers (such as Type and No Of Rooms from the Property for rent file);
• detailed information about customers, so that their needs can be appropriately
matched to what is available (such as Preferred Type and Max Rent in the Renter file);
• “identification” information – such as name, address and telephone number – about
customers, the properties on offer and their owners.
The Contracts Department needs:
• detailed information about the renting contracts (in the Lease file);
• “identification” information – such as name, address and telephone number – about
customers, the contracted properties and their owners.

Some drawbacks of this solution are obvious. These are the limitations of the file-based approach in
general. The most important are enumerated below.
Duplication. Different applications might have to make use of the same information.
Because each application has its own files, data is duplicated (e.g. the ‘identification’
information in our example). This has at least two negative consequences. Firstly,
duplication is wasteful. Secondly, data can become inconsistent – it can have different values
in different files (belonging to different applications), even though it is supposed to give the
same piece of information. For example, the address of an owner, Mr. J. Morris, might be
updated in the Owner file belonging to the Sales Department, while the Contracts
Department might still have Mr. Morris’s old address.
NOTE: Wasting disk space is unlikely to be a significant concern in the example we have
given – storage is cheap – but where the number of applications and the scale of
duplication are greater, it can become more important.
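The inconsistency problem can be reproduced with two ordinary files. This is a minimal sketch under stated assumptions: each department's application keeps its own CSV file of owners, and the file names, layout and addresses are hypothetical, invented for illustration only.

```python
import csv
import os
import tempfile

# Each department's application keeps its own private owner file
# (hypothetical file names and layout).
workdir = tempfile.mkdtemp()
sales_file = os.path.join(workdir, "sales_owner.csv")
contracts_file = os.path.join(workdir, "contracts_owner.csv")

def write_owners(path, owners):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(owners)

def read_address(path, name):
    with open(path, newline="") as f:
        for row_name, address in csv.reader(f):
            if row_name == name:
                return address
    return None

# The same 'identification' data is duplicated in both files.
initial = [("J. Morris", "12 Old Road")]
write_owners(sales_file, initial)
write_owners(contracts_file, initial)

# Sales updates its own copy; nothing propagates the change to Contracts.
write_owners(sales_file, [("J. Morris", "7 New Street")])

sales_view = read_address(sales_file, "J. Morris")
contracts_view = read_address(contracts_file, "J. Morris")
print(sales_view, "|", contracts_view)  # 7 New Street | 12 Old Road
```

Nothing in the system knows that the two files describe the same owner, so keeping them consistent is entirely up to the people and programs involved.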

Separation and isolation. Data is scattered among different files, each file belonging to a
certain department. A department has access to its own files, but no access to the files of the
other departments. Files belonging to different departments cannot be used together in order
to create more complex data or analysis. Often, because they are based on different
infrastructures (platforms, development software, etc.) files belonging to different
departments cannot be transferred (copied) across.
Program-data dependence. Each file belongs to a certain application program. The (physical)
structure of data is defined inside the application program. This could easily – and usually does
– lead to incompatible file formats between applications, meaning that it becomes impossible
to share data between them. Another aspect is that data definition is embedded in the
application program. That means that if the physical structure of data is to be changed – for
instance, if a year is to be represented with four digits instead of two – then the application
program itself must be changed. Moreover, the methods of data access are also embedded in
the application program; to change them, the application program must be modified.
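The two-digit year example can be made concrete. In this sketch the record layout (a 10-character name followed by the year) is a hypothetical fixed-width format hard-coded in the application, as a file-based program typically would; it shows that changing the file format silently breaks the old program until the program itself is rewritten.

```python
# Hypothetical fixed-width layout hard-coded in the application:
# characters 0-9 hold a name, characters 10-11 a two-digit year.
def parse_record_v1(line):
    name = line[0:10].rstrip()
    year = 1900 + int(line[10:12])
    return name, year

old_record = "Morris    98"
print(parse_record_v1(old_record))  # ('Morris', 1998)

# The file format changes to a four-digit year...
new_record = "Morris    1998"
# ...but the unchanged program still reads two characters, so it
# silently produces a wrong value.
print(parse_record_v1(new_record))  # ('Morris', 1919)

# Only a rewritten program understands the new layout.
def parse_record_v2(line):
    return line[0:10].rstrip(), int(line[10:14])

print(parse_record_v2(new_record))  # ('Morris', 1998)
```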
In the file-based approach, the emphasis is placed on functionality – provided by the application
program. Data modeling takes a lower priority. This approach leads to the drawbacks we have
listed. If the approach is inverted and we consider data as central, then these problems can be
removed. Informally, this represents the database approach.
2.2.2 Databases and database management systems
We shall start with the definition provided by Connolly and Begg:

Database. “A database is a (shared) collection of logically-related persistent data (including its
description) as part of the information system of an organization.”
Let us now explain this definition. A database is a large repository of data, in which data is
defined once and stored once. Data that was scattered in different files – with different formats
and owners – in the file-based approach, is now integrated with minimum redundancy
(duplication), as a single resource. Different application programs will share this common
resource, usually concurrently (at the same time).
There are times when redundant data is necessary, some of which will be explored in this guide.
You might choose to store intermediate results that are called for often (using snapshots) or to
ensure the atomicity of a set of operations (transactions). In these cases, the redundant data is
intentionally included to achieve something extra, and even so requires special treatment. The
diagrams in Figure 2.4 and Figure 2.5 illustrate the differences between the database approach
and the file-based approach.

Figure 2.4. The file-based approach.


In the database approach (see Figure 2.5), the raw data is integrated in a common database for
all applications. The data is managed by a database management system (DBMS), which
provides shared access to it, for all the applications in the system.

Figure 2.5. The database approach.


A database management system is a software system that provides a set of primitives
(built-in functionality) for defining, accessing and maintaining a database.
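Shared access through a DBMS can be contrasted with the file-based situation. The sketch below assumes SQLite (via Python's sqlite3) as the DBMS, with two connections standing in for the Sales and Contracts applications; the database file name, table and addresses are hypothetical. Because there is only one stored copy of the address, an update made by one application is immediately visible to the other.

```python
import os
import sqlite3
import tempfile

# One shared database file managed by the DBMS (SQLite as a stand-in).
path = os.path.join(tempfile.mkdtemp(), "agency.db")

sales = sqlite3.connect(path)      # the Sales application's connection
contracts = sqlite3.connect(path)  # the Contracts application's connection

sales.execute("CREATE TABLE owner (name TEXT PRIMARY KEY, address TEXT)")
sales.execute("INSERT INTO owner VALUES ('J. Morris', '12 Old Road')")
sales.commit()

# Sales updates the single stored copy of the address...
sales.execute("UPDATE owner SET address = '7 New Street' WHERE name = 'J. Morris'")
sales.commit()

# ...and Contracts, reading through the same DBMS, sees the change.
(address,) = contracts.execute(
    "SELECT address FROM owner WHERE name = 'J. Morris'"
).fetchone()
print(address)  # 7 New Street
```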
A database stores both the raw data and its description. We say that the information
stored in a database is self-describing. The description of the raw data is known as the system
dictionary, data dictionary or metadata.
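The self-describing nature of a database can be seen by querying its catalogue with the same language used for the data. In this sketch (assuming SQLite via Python's sqlite3; the owner table is hypothetical), the catalogue is the sqlite_master table, stored inside the same database as the raw data. Other DBMSs expose their metadata differently; MySQL, for example, provides INFORMATION_SCHEMA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owner (name TEXT PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO owner VALUES ('J. Morris', '7 New Street')")

# SQLite keeps its catalogue in sqlite_master, inside the same database
# as the raw data, and it can be queried like any other table.
catalogue = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'owner'"
).fetchall()
print(catalogue)
# [('table', 'owner', 'CREATE TABLE owner (name TEXT PRIMARY KEY, address TEXT)')]
```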
The consequence of this approach is program-data independence. This means that the
structure of data may change without affecting the application programs that use it. This basic
definition is going to be refined and better explained later in this chapter.
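A small illustration of program-data independence, under the same assumptions (SQLite via Python's sqlite3, hypothetical table and function names): the application function depends only on the columns it names, so it keeps working unchanged after the table's structure is extended with a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owner (name TEXT, address TEXT)")
conn.execute("INSERT INTO owner VALUES ('J. Morris', '7 New Street')")

def owner_addresses(conn):
    """Application code: depends only on the columns it names."""
    return conn.execute("SELECT name, address FROM owner").fetchall()

before = owner_addresses(conn)
# The structure changes: a new column is added to the table.
conn.execute("ALTER TABLE owner ADD COLUMN phone TEXT")
after = owner_addresses(conn)
print(before == after)  # True
```

Compare this with the fixed-width file example earlier, where a change to the stored format forced the program to be rewritten.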
This approach – separating the data definition from application programs – is similar to data
abstraction in programming, where the internal definition of an object is kept separate from its
external definition. An outside system can only see the exterior of the object. As long as the
external definition remains unchanged, any changes in the object’s internal definition do not
affect the outside system.
A database management system automatically performs a lot of the housekeeping tasks that
would otherwise be the responsibility of the application programmer. As a result, the user – i.e.
the person who defines and uses the database – is presented with a clean and powerful set of
tools for database development and exploitation. A more detailed description of both database
systems and database management systems is provided in the following sections. At this stage,
it is important that you broadly understand why database systems are needed and what their
main benefits are.
This definition, with its distinction between file-based and database approaches, is quite
high-level and functional. Following the discussion above, many organizations that use database
software would still be classed as having a file-based system, if different departments use
different database implementations for storing similar data.

Activity
Before you read on, try to think of an organization you know – perhaps one you’ve worked for
or studied at – that has multiple systems similar to what is described above. What problems
can occur when you have data duplication like this? Does it matter whether the separate
systems store their information in database software or spreadsheets? Do you think this is a
useful definition of database systems?

• It should be possible to change the physical representation of data without affecting
users, as long as its logical structure is preserved.
• As we have seen, the database integrates all the information required within an
organization. Individual users will often only need (or be allowed) access to certain
parts of this pool of information. Each user, then, needs to have a customized view
of the database, and it should be possible to change that view without affecting
other users.
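Both aims can be illustrated with ordinary SQL objects. This is a loose analogy rather than a definitive mapping, assuming SQLite via Python's sqlite3 and a hypothetical employee table: a view gives one group of users a customized window on the data (here, without salaries), and an index changes how data is stored and accessed without any user's view being redefined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The organization-wide structure of the data (hypothetical table).
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Ann", "Sales", 30000), ("Bob", "Contracts", 32000)],
)

# A customized view for users who should not see salaries.
conn.execute("CREATE VIEW staff_directory AS SELECT name, dept FROM employee")

# A physical change: an index alters how data is accessed, but neither
# the table nor the view has to be redefined.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

directory = conn.execute("SELECT * FROM staff_directory ORDER BY name").fetchall()
print(directory)  # [('Ann', 'Sales'), ('Bob', 'Contracts')]
```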
These aims were formalized in the early 1970s and codified and adopted as a standard in 1975
as the ANSI/SPARC three-level architecture. The architecture forms a basis for most modern
DBMS.
The ANSI/SPARC architecture consists of three levels of abstraction (see Figure 2.6). The
external level represents the way data is viewed by individual users. The conceptual level
represents the way the organizational data (i.e. all data that is relevant for the organization) is
structured. The internal level represents the way data is physically stored.
2.2 The components of a database system

Figure 1.10 The components of a database system.

At the highest, most general level, a database environment consists of:
• data, representing the information needed for an organization;
• software, serving two purposes: the management of the stored data, and further
processing of the data to meet the users’ needs;
• hardware, supporting both the stored data and the software components;
• users, broadly divided into two categories: developers of the database system, and
users of the system.

Data
Data can be classified into two categories, namely:
1. Primary data – the fundamental information necessary to provide the database service,
stored on permanent storage, such as hard disks.
2. Derived data – information that can be inferred or calculated from primary data (and may be
recalculated at any time).
Derived data may be the output of the application programs – the result of processing the
primary data – in a form suitable for the users’ needs, but it can also be the input from users
that will then be processed by the application to be stored as primary data.
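The distinction can be sketched with the shoe shop data. Assuming SQLite via Python's sqlite3 and hypothetical table and function names, prices and stock levels are primary data, stored permanently, while the total value of the stock is derived data, recalculated from them whenever it is requested, so it can never be out of date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Primary data: stored permanently (prices in pence).
conn.execute("CREATE TABLE trainer (model TEXT, price_pence INTEGER, stock INTEGER)")
conn.executemany(
    "INSERT INTO trainer VALUES (?, ?, ?)",
    [("Air Max 90", 4000, 3), ("Mesh", 4000, 18)],
)

def stock_value_pence(conn):
    """Derived data: recomputed from the primary data whenever it is needed."""
    (total,) = conn.execute("SELECT SUM(price_pence * stock) FROM trainer").fetchone()
    return total

print(stock_value_pence(conn))  # 84000
conn.execute("UPDATE trainer SET stock = 17 WHERE model = 'Mesh'")
print(stock_value_pence(conn))  # 80000
```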
The focus of a database system is on primary data. This has to be appropriately identified,
described and implemented. The primary data has three important characteristics. It is:
• integrated, rather than existing in separate systems – it has been gathered
together into a single system;
• shared, with all the applications belonging to the information system having
common access to (at least parts of) it;
• extensive, in that database systems are usually developed for data-intensive
applications, where their benefits are more clearly felt.
Stored data, as we have already seen, does not include only the raw data, but also its description
– the metadata, system dictionary or catalogue.
Software
The software component can be seen as consisting of three layers (Figure 2.11):
• the operating system (OS), positioned at the base, provides the necessary routines for
accessing the hardware resources (such as file handling or memory management
routines);
• the database management system (DBMS), placed above the OS – and using the routines
that the OS makes available – provides all the necessary primitives for data
management, including languages for defining schemas, manipulating and reading data and
so on;
• application programs, above the DBMS – and using the routines made available by the
DBMS – provide data formats and computations beyond the capabilities of the DBMS.
Figure 2.11. The layered structure of the software component of a database system. Anything
below the dashed grey line is platform-dependent, and so will not be discussed here in any
detail.
The hardware and the OS are often grouped together and called the platform. There is
considerable variation between platforms, which is one reason for having the DBMS software
handle this variation and present a more abstracted interface to higher-level components. This
provides a platform independence that shields the application programs from unnecessary
physical details, and means that we need not concern ourselves with details of hardware or OS
for the remainder of these subject guides. Instead, we focus on the features provided for the
application programs by the DBMS.
The features of the DBMS will be considered in detail over the course of this chapter. Briefly, the
DBMS provides support for schema definition, data manipulation, data security and data
integrity. The application programs can be of two kinds:
1. user developed;
2. provided together with the DBMS by its developer.
The former class of applications will generally be written in a high-level programming language,
such as C, Java or Python. Support for database access in such languages is provided by means
of a data sub-language, embedded within the host language. Statements written in the
embedded sub-language are processed and passed on to the DBMS using the appropriate
routines.
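A host language working with a data sub-language can be sketched in Python. Strictly speaking, Python's sqlite3 module is a call-level interface rather than precompiled embedded SQL, but the division of labour is the same: the host language supplies variables and control flow (the loop), while the SQL statements passed to the DBMS do the data access. The renter table and its values are hypothetical; the ? placeholders let the DBMS bind host-language values safely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE renter (name TEXT, preferred_type TEXT, max_rent INTEGER)")

# Host-language data and control flow (a Python loop)...
renters = [("A. Smith", "Flat", 600), ("B. Jones", "House", 900)]
for name, ptype, max_rent in renters:
    # ...with SQL statements passed to the DBMS through the API.
    conn.execute("INSERT INTO renter VALUES (?, ?, ?)", (name, ptype, max_rent))

# Which renters could afford a property at this monthly rent?
rent = 700
matches = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM renter WHERE max_rent >= ? ORDER BY name", (rent,)
    )
]
print(matches)  # ['B. Jones']
```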
Programs provided by the DBMS developer allow the rapid development of user applications,
without the user writing any conventional code.

Programming tools that abstract away much of the underlying functionality, so that (often
application-specific) software can be constructed quickly, are known generically as fourth-
generation tools. Home or small business database systems – such as Microsoft Access or
OpenOffice Base – provide graphical fourth-generation tools for this purpose.
The DBMS can also be referred to as the server or back end, whereas the application
programs are referred to as clients, or front ends. Clients use the services provided by a server
for data management. The division between client and server makes it possible for the server
and client to run on different machines, giving rise to the idea of distributed processing, an
issue discussed in the ‘Database architectures’ section and elsewhere in these subject guides.

Hardware
As we have seen, the DBMS allows both the developer of a database and the
database users to operate without knowing the details of the hardware being used.
This does not remove from the system administrator the need to select hardware
and operating systems that, firstly, are capable of running the chosen software; and
secondly that can cope with the demands that will be placed upon it by the
database and associated systems. The system administrator should be satisfied that:
1. There is enough permanent storage space, for instance disk space, to store the data
and any indexes and cached derived data.
2. There is enough temporary storage space, for instance RAM, to hold intermediate
results and computations.
3. There is enough computational power to manipulate the data at the rate that will be
required.
4. There is fast enough communication between components of the
system for moving the data between them. This is usually only an issue
for particularly data-heavy applications or systems with a very large user
base.
Although DBMS vendors will provide recommendations for minimal configurations required for
different sizes of application, individual use cases will have a large impact on the system
requirements.

Users
Users, as a component of a database environment, can be classified into four categories, according
to the role they play.

Data administrator. The data administrator (DA) is a user who properly understands the data
requirements of the organization and is in charge of administering the organization’s data. This
user:
• decides which data is relevant and which is not;
• is in charge of applying the organization’s policy and standards;
• decides on the security policy, and so on.
The DA does not need to be a technical expert or a manager. Rather, the DA is somewhere in
between, liaising with the management on one hand, and with the technical team, on the other.
Database administrator. The database administrator (DBA) is the technical user in charge of the
database system. More specifically, the DBA is responsible for the database’s design,
implementation and maintenance, and deals with both the correctness of the implementation
and the efficiency of the database system. The DBA must have good technical knowledge and is
in charge of the definition of the DB schemas, integrity and security rules, access procedures,
backup and recovery procedures, performance of the system, etc.

Application programmer. The application programmer writes programs that perform
more complex processing of data (either computations or formatting). For this, they use
either a third-generation language, embedded with a database language, or a fourth-
generation tool. The resulting programs are for use by end users.

End user. The end users are the “beneficiaries” of the database system. They may range from
technically naïve to extremely sophisticated. A technically naïve user, for example a bank
employee, may interact with the system using application programs developed for specific tasks.
A naïve user does not have to be aware of the functionality of the DBMS. All they need is
reliable and easy to use programs that they can use with minimal fuss. A sophisticated user, on
the other hand, will know how to access the database directly, through the database language
supported by the DBMS. Sometimes a sophisticated user might even develop applications, and
so become an application programmer.

Activity
“The term user is often used in software engineering to refer to one or more people playing a
particular role in interacting with software. That means that a ‘user’ here can mean several
people, and one person can be several different users in different contexts, depending on the
work that she or he is doing at the time.”
At the beginning of this chapter, you were asked to list databases you had encountered in real
life. For each one, consider which group of users takes which of the above roles. Is the
separation always clear?

DBMSs and database languages


The database management system is the software through which all access to the
database is made. This is a concise but limited definition. In reality the DBMS is
responsible for much more. Some of its important features are presented below.
Data definition. The DBMS must provide support for defining or modifying the database
schema. Schema definition includes specifying data types, structures, constraints and
security restrictions. This is achieved by means of a data definition language (DDL). The
statements made in DDL for a specific database system represent the system’s catalogue
(or data dictionary). In theory, there should be a DDL at each level of abstraction (i.e. external,
conceptual and internal), but in practice there usually exists a single DDL that allows
definitions at any level.
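A short DDL example shows how types, keys and constraints are declared once, in the schema, and then enforced by the DBMS for every application. This is a minimal sketch assuming SQLite via Python's sqlite3; the property table, its columns and the property numbers are hypothetical. The exact wording of the error message depends on the SQLite version, so it is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: data types, a key and integrity constraints declared in the schema.
conn.execute(
    """CREATE TABLE property (
           property_no TEXT PRIMARY KEY,
           type        TEXT NOT NULL,
           rooms       INTEGER CHECK (rooms > 0),
           rent        INTEGER CHECK (rent >= 0))"""
)
conn.execute("INSERT INTO property VALUES ('PG4', 'Flat', 3, 650)")

rejected = False
try:
    # Violates CHECK (rooms > 0): the DBMS, not the application, refuses it.
    conn.execute("INSERT INTO property VALUES ('PG5', 'House', 0, 900)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

(count,) = conn.execute("SELECT COUNT(*) FROM property").fetchone()
print(rejected, count)  # True 1
```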
Data manipulation. The DBMS must provide support for data manipulation. In particular it has
to support:
• retrieval of existing data, contained in the database
• deletion of old data from the database
• insertion of new data
• modification of out-of-date data.

This is achieved by means of a data manipulation language (DML). There can be a DML at each
level of abstraction. At the external and conceptual levels, the DML is concise, comprehensive
and easy to use; in other words, the emphasis is on its expressive power – on these levels,
efficiency is a secondary goal. On the other hand, at the internal level, the emphasis is placed
on the DML’s efficiency. This means that its statements are complex – and probably not that
straightforwardly expressible – but quite efficient.
These languages (DDLs and DMLs) are called data sub-languages because they do not
include constructs for the control of flow – they are computationally incomplete
(meaning they cannot be used as general purpose programming languages).
Users can use them directly in order to define and access the database. However, for
applications that require more complex data processing (and formatting) they are usually
embedded into a full high-level programming language.
Some authors prefer to further divide DMLs into two categories; namely, procedural and non-
procedural (declarative). Within a procedural language one must specify how the result to be
obtained is computed; whereas using a declarative language one only has to specify what
result must be obtained – what it looks like – the system being responsible for its computation.
Since there are neither purely declarative nor purely procedural DMLs – they range between the
two – any classification of this kind is rather ad hoc. For example, in certain
situations SQL can be considered declarative, while in others it can be considered procedural.
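The contrast between the two styles can be shown by computing the same result twice. In this sketch (SQLite via Python's sqlite3; the table is a hypothetical cut-down version of the shoe shop data), the procedural version spells out how to accumulate stock per brand, row by row, while the declarative SQL query only states what result is wanted and leaves the computation to the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trainer (model TEXT, brand TEXT, stock INTEGER)")
conn.executemany(
    "INSERT INTO trainer VALUES (?, ?, ?)",
    [("Air Max 90", "Nike", 3), ("Air Max 90", "Nike", 12), ("Mesh", "Kaplan", 18)],
)

# Procedural style: we spell out HOW to compute the result, row by row.
totals = {}
for brand, stock in conn.execute("SELECT brand, stock FROM trainer"):
    totals[brand] = totals.get(brand, 0) + stock
procedural = sorted(totals.items())

# Declarative style: we state WHAT result we want; the DBMS decides how.
declarative = conn.execute(
    "SELECT brand, SUM(stock) FROM trainer GROUP BY brand ORDER BY brand"
).fetchall()

print(procedural == declarative)  # True
```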

An important requirement for DMLs is to allow unplanned or ad-hoc queries; namely, requests
that were not foreseen at the time of design. A problem that may result from this is how to gain
reasonable efficiency for such unpredicted use.

Other features that the DBMS must provide include:


• support for data integrity – the system must ensure that there are no
“contradictions” between the data values in the database; this is achieved based on a
set of integrity constraints (part of the data dictionary)
• support for security control – the system must ensure that data is not accessed by
unauthorized users or applications; this is achieved based on a set of security rules (part of
the data dictionary)
• recovery services – the ability to restore the database to a previously correct state in the
case of a crash or error
• concurrency facilities – allowing the database to be accessed by more than one user at a
time
• support for data communication
• user-accessible data dictionary.
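Recovery services rely heavily on transactions. The sketch below, assuming SQLite via Python's sqlite3 and a hypothetical account table, shows the simplest case: a group of updates fails part-way through, and the DBMS rolls the whole group back so the database returns to its previous correct state. In Python's sqlite3, using the connection in a with statement commits if the block succeeds and rolls back if it raises an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    # 'with conn' commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))

try:
    # B only has 50, so the second UPDATE violates the CHECK constraint.
    transfer(conn, "B", "A", 80)
except sqlite3.IntegrityError:
    pass

# The first UPDATE (crediting A) has been undone as well.
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)  # {'A': 100, 'B': 50}
```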

Advantages and disadvantages of database systems


The main advantages and disadvantages associated with the database approach are not
described in detail here. You are advised to keep this issue in your mind throughout the whole
course, and to continuously review it.

Advantages
Reduced redundancy. In a file-based system each application has its own private files. This
often leads to data being duplicated in different files, wasting storage space. In a database
approach, all data is integrated, reducing or removing unwanted redundancy. There are
various reasons why eliminating redundancy completely is often not possible or desirable in a
DBMS – and we shall return to these in later chapters. However, where the file-based system
forces redundancy in an ad-hoc way, a DBMS should provide mechanisms for specifying
redundant data and for controlling it (to maintain the consistency of the database).

Avoiding inconsistency. This is largely a result of the reduced redundancy. A database is in
an inconsistent state if the same item of information is stored in at least two places in the
database, but with different values. The database approach dramatically reduces that sort of
repetition, making the risk of inconsistent data smaller. Even where redundant information is
stored, the repetition can be made known to the DBMS, so that the system automatically
enforces consistency: whenever changes are made to one copy of the data, the same
changes are propagated to the copies duplicated elsewhere. The support
provided by most current DBMSs for preventing inconsistencies is limited to a relatively small
number of categories, but the mechanism is present.
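One mechanism that many DBMSs offer for making repetition known to the system is a trigger; the text does not prescribe a particular mechanism, so this is only one possibility. In the sketch below (SQLite via Python's sqlite3; the owner and lease tables, the owner number CO46 and the addresses are hypothetical), the lease table deliberately repeats the owner's address, and a trigger propagates every change of address to that copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE owner (owner_no TEXT PRIMARY KEY, address TEXT);
    -- Deliberately redundant copy of the owner's address.
    CREATE TABLE lease (lease_no TEXT PRIMARY KEY, owner_no TEXT, owner_address TEXT);

    -- Make the redundancy known to the DBMS: propagate address changes.
    CREATE TRIGGER sync_owner_address AFTER UPDATE OF address ON owner
    BEGIN
        UPDATE lease SET owner_address = NEW.address WHERE owner_no = NEW.owner_no;
    END;

    INSERT INTO owner VALUES ('CO46', '12 Old Road');
    INSERT INTO lease VALUES ('L1', 'CO46', '12 Old Road');
    """
)

conn.execute("UPDATE owner SET address = '7 New Street' WHERE owner_no = 'CO46'")
(copy,) = conn.execute("SELECT owner_address FROM lease WHERE lease_no = 'L1'").fetchone()
print(copy)  # 7 New Street
```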
Improved data sharing. Since all data is centralized, the restrictions on which applications and
users can see it are ones of security constraints rather than those of system and network
architecture. In contrast to having a set of separate file-based systems, here all data is
integrated, meaning that more information can be derived from the same amount of data. Both
aspects considerably improve the accessibility of data.
Data independence. As we have seen in earlier sections, a database approach provides
protection for applications from changes in both the physical and – at least to some extent –
the logical structure of the data (physical and logical data independence).
Some other benefits of the database approach are:
• the maintenance of the overall information system can be improved due to data
independence
• integrity can be maintained – any DBMS should allow the specification of integrity
constraints on data
• more detailed and coherent security restrictions can be applied – a DBMS should allow
the specification of security rules on data and on users
• standards can be enforced
• concurrent access can be more easily achieved
• better recovery mechanisms can be devised
• large-scale requirement conflicts for the information system of an organization can be
balanced and resolved.
Disadvantages
Complexity. In the database approach the information needed by an organization is modeled
and implemented as a whole. Where the file-based approach can often be achieved piece by
piece as individual departments develop a need and budget, the process of developing a
database system is by its nature a single, unifying and more complex process, which will
include:
• data acquisition
• data modeling and design
• database implementation
• database maintenance.
The greater complexity of this process means that errors in implementation, design and data
acquisition are more likely to occur, and may be harder to get fixed within the organization.
Depending on the organization and its data needs, the database approach may require extra
hardware and IT infrastructure, along with new maintenance contracts. Depending on the
system being replaced, this can make a database approach more expensive, in terms of either
initial or ongoing costs. The DBMS software itself may cost no money, since there are many
free and open source options, but the system built around it will require developer time and
may also incorporate other, paid-for software. In some cases, the integration of several
systems may represent a reduction in costs, as separate contracts and IT structures are
rationalized and unified.

Higher impact of failure. The database system is at the core of the information system of an
organization. All data is stored centrally, in the database. As a result, most applications rely on
this data. If the DBMS fails, the whole organization is paralyzed, unlike a decentralized system,
where a failure in one system will only directly affect the department that uses it.

Performance. DBMS software is heavily optimized for its core functionality, but it is still a
generic piece of software. A database application may be slower for an individual user than a
bespoke, perhaps local, file-based solution.

At this point, you should be in a position to build a functional information system
using the database life cycle approach covered in Topic 2. In this context, we shall develop a student
information system. The implementation is based on the MS Access and MySQL relational database
management systems.

NB: The following implementation is done using MySQL.


1. Identify key entities in Students information system
Example
• Personal details
• Program
• School
• Grade book
• Course
2. Represent each entity in its conceptual model.
Example
Personal details

Gradebook

Course

3. Create the schema using the conceptual model for each entity. Enforce integrity
constraints at this stage.
Example

CREATE TABLE personal_details (
admno varchar(13) NOT NULL PRIMARY KEY,
fname varchar(255),
lname varchar(255),
email varchar(255),
dob date,
yos int
);
Enforcing referential integrity
Example
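The admno attribute of Gradebook references admno of personal_details. The general syntax
for declaring a foreign key is:
CREATE TABLE table_name (
column1 data_type,
column2 data_type,
...,
FOREIGN KEY (column_name)
REFERENCES referenced_table_name (referenced_column_name)
);

For a table that already exists, the foreign key is added with ALTER TABLE:
ALTER TABLE Gradebook
ADD FOREIGN KEY (admno) REFERENCES personal_details(admno);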
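What the foreign key buys us can be checked with a quick experiment. This sketch uses SQLite via Python's sqlite3 rather than MySQL, so it is an approximation of the setup above: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas MySQL's InnoDB engine enforces them by default. The gradebook columns course and grade, the course code DB101 and the second admission number are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE personal_details (admno TEXT PRIMARY KEY, fname TEXT)")
conn.execute(
    """CREATE TABLE gradebook (
           admno  TEXT REFERENCES personal_details (admno),
           course TEXT,
           grade  TEXT)"""
)
conn.execute("INSERT INTO personal_details VALUES ('P04/0002/2023', 'Someone')")
conn.execute("INSERT INTO gradebook VALUES ('P04/0002/2023', 'DB101', 'A')")

try:
    # No student with this admission number exists, so the row is refused.
    conn.execute("INSERT INTO gradebook VALUES ('P04/9999/2023', 'DB101', 'B')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

print(orphan_accepted)  # False
```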


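Cascade on update/delete operations
MySQL enforces foreign keys in the InnoDB storage engine (not in MyISAM), so the table’s engine
is set first:
ALTER TABLE `employees`.`result` ENGINE=InnoDB;
ALTER TABLE `employees`.`result`
ADD FOREIGN KEY (`admno`) REFERENCES `employees`.`personal_details`(`admno`)
ON UPDATE CASCADE ON DELETE CASCADE;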
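The effect of the two cascade options can be seen in a small experiment. This is a sketch in SQLite via Python's sqlite3 as a stand-in for MySQL, with hypothetical course and grade columns in result and a hypothetical corrected admission number: ON UPDATE CASCADE copies a changed admission number into the dependent rows, and ON DELETE CASCADE removes them when the student is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE personal_details (admno TEXT PRIMARY KEY, fname TEXT)")
conn.execute(
    """CREATE TABLE result (
           admno TEXT REFERENCES personal_details (admno)
                 ON UPDATE CASCADE ON DELETE CASCADE,
           course TEXT, grade TEXT)"""
)
conn.execute("INSERT INTO personal_details VALUES ('P04/0002/2023', 'Someone')")
conn.execute("INSERT INTO result VALUES ('P04/0002/2023', 'DB101', 'A')")

# ON UPDATE CASCADE: a corrected admission number is copied into result.
conn.execute(
    "UPDATE personal_details SET admno = 'P04/0003/2023' WHERE admno = 'P04/0002/2023'"
)
after_update = conn.execute("SELECT admno FROM result").fetchall()

# ON DELETE CASCADE: deleting the student deletes the dependent results.
conn.execute("DELETE FROM personal_details WHERE admno = 'P04/0003/2023'")
after_delete = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]

print(after_update, after_delete)  # [('P04/0003/2023',)] 0
```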
4. Insert data into the respective tables
Example
INSERT INTO `employees`.`personal_details` (`admno`, `fname`, `lname`,
`email`, `dob`, `yos`)
VALUES
('P04/0002/2023', 'Someone', 'Okello', '[email protected]',
'2004-06-23', 1);

Repeat step 4 for all the tables you created in step 3, adding as many records as possible.
A summary of all the schemas looks as follows:
