Introduction To Databases: 1 Basic Definitions
Introduction to Databases
Databases and database applications are everywhere. When a cashier rings you up at the
supermarket, they are accessing a database; when you enroll in a class, that information is
stored in a computer database; all the bills that come to you are generated by database
applications.
This book introduces computer databases, mainly from the perspective of a database designer
and software developer. We assume you will be writing database applications; that is,
programs that access a database.
1 Basic Definitions
Most computer applications store and manipulate data, in one way or another. Initially, as
people wrote different applications, each group of programmers would write routines to store
data, usually in files in a file system; these routines would be completely different for each
application (chances are, you were asked to write a program that reads and writes records
from a file, for your introductory programming classes).
As we gained more experience, we realized that it would make sense to standardize some of
those routines, still at a file level, and we created libraries and programming languages with
standard routines to access files, where each file contained records of the same type;
COBOL and RPG are examples of such languages. These routines and languages were much better than
starting from scratch, but were still limited.
Eventually, these routines grew into complex pieces of software, designed to store and
manipulate large amounts of data, to control multi-user access and to provide for easy ways to
produce ad-hoc reports; we now call this software a Database Management System.
When dealing with computer databases it is useful to distinguish between the actual database
(the data) and the database applications (the programs that access that data). Since database
applications are so important, we have found it convenient to create pieces of software that
deal with databases in general, and so make it easier to create database applications; we call
such a piece of software a DataBase Management System (DBMS).
More formally, we can define those concepts as follows:
● A Computer Database (DB) is a collection of data stored in a computer system. In this book,
we assume all databases will be stored in a computer, so we don't distinguish between databases
and computer databases.
● A Database Management System (DBMS) is a piece of software that is able to provide access to
databases in general.
● A Database Application is a piece of software that provides value to a user and accesses one
(or a small number of) specific databases.
Although we need to distinguish these three terms so we can study them separately, it is
extremely common for people to use the term 'database' to refer to each one of them. For most
users of a database application, the application is the database, and many programmers refer to
the DBMS, or to the computer on which the DBMS runs, as the database.
By Orlando Karam, Licensed under Creative Commons, Attribution, Share-Alike
http://creativecommons.org/licenses/by-sa/3.0/
When designing database applications, it is useful to distinguish between data, that is the
individual facts, and information, which is processed data (hopefully so as to be more useful
to the users). For the most part, what will be input into a database application is raw data,
but what we want to get out of it is information. Notice this is not a hard and crisp
distinction. For many users, the data is also information, and what is information for one user
may be raw data for another.
For example, imagine a university, with several departments, which offers many classes. Data
about individual students, which classes they take and what grades they get would be
considered data, since it is what we put into the system. For the students, this is also
information, since they want to know their grades. For the teachers, the individual grades are
information, and so are summaries such as the class GPA. For a department chair, which particular
students are taking each course (raw data) may not be important, only the number of students in
each course. This number of students in each course may in turn be considered raw data for the
president of the university, who may be concerned just with the average number of students in a
department's courses.
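The university example can be sketched in a few lines of Python, using the standard sqlite3 module. The table name, the column names and the sample rows below are hypothetical, invented only for illustration. The point is that the same raw rows yield different information for each kind of user.

```python
import sqlite3

# Hypothetical raw data: one row per student per course.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment ("
    " student TEXT, course TEXT, department TEXT, grade_points REAL)"
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?)",
    [
        ("ann", "CS101", "CS", 4.0),
        ("bob", "CS101", "CS", 3.0),
        ("ann", "CS201", "CS", 3.0),
        ("cat", "MA101", "Math", 2.0),
    ],
)

# Teacher's view: average grade per course (the class GPA).
class_gpa = dict(conn.execute(
    "SELECT course, AVG(grade_points) FROM enrollment GROUP BY course"
))

# Chair's view: number of students in each course.
course_sizes = dict(conn.execute(
    "SELECT course, COUNT(*) FROM enrollment GROUP BY course"
))

# President's view: average course size per department,
# computed from the per-course counts the chair looks at.
dept_avg_size = dict(conn.execute(
    "SELECT department, AVG(n) FROM ("
    " SELECT department, course, COUNT(*) AS n"
    " FROM enrollment GROUP BY department, course"
    ") GROUP BY department"
))

print(class_gpa)
print(course_sizes)
print(dept_avg_size)
```

Notice that the president's query takes the chair's information (the per-course counts) as its own raw data.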
An important kind of data is metadata, that is, data about other data. This includes computer
metadata (the data type used, the number of bits or bytes per field, etc.), that is, metadata
that is directly used by computer systems that manipulate the data, and human documentation
(what the fields in the database actually mean, where each piece of data comes from, who is
authorized to change the definitions, etc.).
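As a small illustration of computer metadata, the sketch below asks a DBMS to describe one of its own tables. It assumes Python's standard sqlite3 module and a hypothetical student table; other DBMSs expose the same kind of metadata through different commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only to have some metadata to inspect.
conn.execute(
    "CREATE TABLE student ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " gpa REAL)"
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(student)").fetchall()

# Declared data type of each column.
column_types = {name: col_type for _, name, col_type, _, _, _ in columns}

# Columns explicitly declared NOT NULL.
required = [name for _, name, _, notnull, _, _ in columns if notnull]

print(column_types)
print(required)
```

The human documentation (what gpa actually means, who may change its definition) is not stored here; it usually lives in separate documents.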
Another important distinction when thinking about databases is between a database's schema, that
is, the description of the values it may contain, and its instance, which is the data present in
that particular database at a given moment in time. Obviously, the data in a database changes
over time, while the schema changes very little, if at all.
Formally:
● A Database Schema is a description of all the possible values in a database, along with the
constraints those values must satisfy.
● A Database Instance is the set of values present in a particular database at a particular
moment in time.
Sometimes, we call a database's schema its intension and the particular instance its extension.
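The difference between schema and instance can be sketched as follows, again assuming Python's standard sqlite3 module. The grade table and its 0 to 4 range are hypothetical. The schema is declared once; the instance is whatever rows happen to be present when we look.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: which values are possible, and the constraints they must satisfy.
conn.execute(
    "CREATE TABLE grade ("
    " student TEXT NOT NULL,"
    " course TEXT NOT NULL,"
    " points REAL CHECK (points BETWEEN 0 AND 4))"
)

def instance(connection):
    """Return the current instance: the rows present right now."""
    return connection.execute(
        "SELECT student, course, points FROM grade ORDER BY student, course"
    ).fetchall()

before = instance(conn)  # empty instance
conn.execute("INSERT INTO grade VALUES ('ann', 'CS101', 3.5)")
after = instance(conn)   # same schema, different instance

# A value that violates the schema's constraints is rejected,
# so the instance is left unchanged.
try:
    conn.execute("INSERT INTO grade VALUES ('bob', 'CS101', 7.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(before, after, rejected)
```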
The basic operations a database application performs are to Create (or
add), Read, Update and Delete data, which we can remember with the
acronym CRUD. Most database applications do not provide ways to alter the database's
schema within the application, just the extension of the database (the actual data inside the
database). DBMSs, of course, provide ways to create new databases and to alter their
schema, besides CRUD operations on data.
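The four CRUD operations map directly onto four SQL statements. The following is a minimal sketch, assuming Python's standard sqlite3 module and a hypothetical student table; a real application would add error handling and transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Creating the table changes the schema; this is a DBMS-level operation,
# not one of the CRUD operations on data.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")

# Create: add a new row.
conn.execute("INSERT INTO student (id, name, gpa) VALUES (?, ?, ?)", (1, "ann", 3.2))

# Read: fetch the row back.
created = conn.execute("SELECT name, gpa FROM student WHERE id = ?", (1,)).fetchone()

# Update: change an existing row.
conn.execute("UPDATE student SET gpa = ? WHERE id = ?", (3.6, 1))
updated = conn.execute("SELECT name, gpa FROM student WHERE id = ?", (1,)).fetchone()

# Delete: remove the row.
conn.execute("DELETE FROM student WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(created, updated, remaining)
```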
1 A better name would be program-metadata independence, since the program is independent of the format of
the data, that is, the metadata; however, program-data independence is the commonly used term.
advantages are potential, and depend not just on database technology but also on
organizational issues.
1. Program-data Independence
2. Minimized Data Redundancy
3. Improved Data Consistency, Standards and Quality
4. Improved Data Sharing
5. Improved Data Accessibility and Decision Support (ad-hoc queries, etc.)
6. Reduced Program Maintenance
The main disadvantage of the database approach is the need for specialized software, the
DBMS. This means people need to be trained to use it (but that's what you're doing right now),
and there will probably be a need for a database administrator (DBA), who takes care of the DBMS
and the databases managed by it. It used to be that installing and administering a DBMS was
a very specialized task, but nowadays it is relatively simple, although, of course, an
experienced DBA will probably be able to achieve much better performance and reliability from
a given DBMS. The DBMS may also need specialized backup procedures.
3 Database Sizes
Most database applications share common characteristics, which is why we classify those
applications as database applications; however, depending on a number of factors, most
notably the number of users, the size of the database and the application's dependability
requirements, we may need to use different technologies, so it is often convenient to classify
the applications according to their size.
Notice that as technology changes, it may become convenient to classify in a different way or
to adjust the categories. However, in 2008, we find it convenient to classify the applications as
follows:
● Personal Database Applications – These will be accessed by one person only.
Nowadays this usually means palmtops and mobile devices, since on PCs we develop
most databases to allow for more than one user at a time. The database
itself will usually be very small and the application doesn't need network connectivity, so
the DBMS may be embedded in the application, rather than using client-server
technology.
● Workgroup Database Applications – These are databases accessed by a relatively
small number of people at a time (say, fewer than 100), most of them on the local
network. Here we normally want client-server technology, with a DBMS server, but we
do not need specialized hardware, and almost any DBMS will do.
● Enterprise Database Applications – Enterprise applications are those that, when not
working, bring down the whole company; therefore, we need a special emphasis on
reliability. We usually need specialized hardware (real servers, oftentimes Unix machines or
mainframes), and we oftentimes have a large number of users, many of them
accessing the database remotely.
● Internet Database Applications – Here we have a very large number of users, but
each of them accesses the application only occasionally. All of the users connect remotely,
and their client is a web browser.
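To make the embedded option mentioned for personal applications concrete, here is a minimal sketch. It assumes Python's standard sqlite3 module, which wraps SQLite, a DBMS that runs as a library inside the application's own process. The file name and the note table are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name; the whole database lives in this one local file.
db_path = os.path.join(tempfile.mkdtemp(), "personal.db")

# First run of the application: there is no server to start or connect to.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE note (body TEXT)")
conn.execute("INSERT INTO note VALUES ('buy milk')")
conn.commit()
conn.close()

# A later run reopens the same file and finds the data still there.
conn = sqlite3.connect(db_path)
notes = [body for (body,) in conn.execute("SELECT body FROM note")]
conn.close()

print(notes)
```

A workgroup or enterprise application would instead connect over the network to a separate DBMS server, which is what allows many users to share the data at the same time.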
Keep in mind these categories are fuzzy, and they are changing. Technology has been
converging rapidly. Advanced DBMSs have become cheaper and easier to use, so we use
them even for small applications; we are becoming more adept at developing web-based
applications, so nowadays we develop many applications as if they were internet applications,
even though they will be used as workgroup or even personal applications.