ch1 - Database System Concepts
Polytechnic, Chandwad
Chapter 1 Marks 12
Introduction to Database System
(1) Data is distinct pieces of information, usually formatted in a special way. Data can exist in a
variety of forms -- as numbers or text on pieces of paper, as bits and bytes stored in electronic
memory, or as facts stored in a person's mind.
Strictly speaking, data is the plural of datum, a single piece of information. In practice,
however, people use data as both the singular and plural form of the word.
(2) The term data is often used to distinguish binary machine-readable information from
textual human-readable information. For example, some applications make a distinction
between data files (files that contain binary data) and text files (files that contain ASCII
data).
(3) In database management systems, data files are the files that store the database
information, whereas other files, such as index files and data dictionaries, store
administrative information, known as metadata.
3. Data isolation.
Because data are scattered in various files, and files may be in different formats, writing
new application programs to retrieve the appropriate data is difficult.
4. Integrity problems.
The data values stored in the database must satisfy certain types of consistency
constraints. For example, the balance of certain types of bank accounts may never fall below a
prescribed amount (say, Rs 2500). Developers enforce these constraints in the system by
adding appropriate code in the various application programs. However, when new constraints
are added, it is difficult to change the programs to enforce them. The problem is compounded
when constraints involve several data items from different files.
5. Atomicity problems.
A computer system, like any other mechanical or electrical device, is subject to failure.
In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent
Mr. P R Sali 2 DMS (313302)
SNJB’s Shri H.H.J.B. Polytechnic, Chandwad
7. Security problems.
Not every user of the database system should be able to access all the data. For
example, in a banking system, payroll personnel need to see only that part of the database
that has information about the various bank employees. They do not need access to
information about customer accounts. But, since application programs are added to the
file-processing system in an ad hoc manner, enforcing such security constraints is difficult.
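The integrity and atomicity problems above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module; the table name account, the helper names, and the sample balances are assumptions made for illustration only. The minimum-balance rule (Rs 2500, as in the text) is declared once as a CHECK constraint instead of being coded in every application program, and a transfer is run as one transaction so that a failure leaves no partial update behind.

```python
import sqlite3

MIN_BALANCE = 2500  # prescribed minimum from the text (Rs 2500)


def open_bank():
    """Create an in-memory database with a hypothetical 'account' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " acc_no INTEGER PRIMARY KEY,"
        f" balance INTEGER NOT NULL CHECK (balance >= {MIN_BALANCE}))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 10000), (2, 3000)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst. Return True on success, False if rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            # Credit first, then debit: if the debit violates the CHECK
            # constraint, the credit made a moment earlier is undone as well.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {acc_no: balance} for all accounts."""
    return dict(conn.execute("SELECT acc_no, balance FROM account"))
```

A transfer that would push the source account below Rs 2500 is rejected by the database itself, and both balances stay exactly as they were before the attempt.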
1. Independence of Data and Program: This is a prime advantage of a database. Both the
database and the user program can be altered independently of each other, thus saving the time
and money that would otherwise be required to keep them consistent.
2. Data Shareability and Non-redundancy of Data: The ideal situation is to enable
applications to share an integrated database containing all the data needed by the
applications and thus eliminate as much as possible the need to store data redundantly.
3. Integrity: With many different users sharing various portions of the database, it is
impossible for each user to be responsible for the consistency of the values in the database
and for maintaining the relationships of the user data items to all other data items, some of
which may be unknown or even prohibited for the user to access.
4. Centralized Control: With central control of the database, the DBA can ensure that
standards are followed in the representation of data.
5. Security: Having control over the database the DBA can ensure that access to the database
is through proper channels and can define the access rights of any user to any data items or
defined subset of the database. The security system must prevent corruption of the existing
data either accidentally or maliciously.
6. Performance and Efficiency: In view of the size of databases and of demanding database
access requirements, good performance and efficiency are major requirements. Knowing
the overall requirements of the organization, as opposed to the requirements of any individual
user, the DBA can structure the database system to provide an overall service that is 'best for
the enterprise'.
• Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.
• Credit card transactions: For purchases on credit cards and generation of monthly
statements.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
• Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds; also for storing real-time market data to enable on-line trading by
customers and automated trading by the firm.
• Manufacturing: For management of the supply chain and for tracking production of items in
factories, inventories of items in warehouses and stores, and orders for items.
• Human resources: For information about employees, salaries, payroll taxes, benefits, and for
generation of paychecks.
1. Physical level. The lowest level of abstraction describes how the data are actually
stored. The physical level describes complex low-level data structures in detail.
2. Logical level. The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple structures.
Although implementation of the simple structures at the logical level may involve
complex physical-level structures, the user of the logical level does not need to be
aware of this complexity.
3. View level. The highest level of abstraction describes only part of the entire database.
Even though the logical level uses simpler structures, complexity remains because of the
variety of information stored in a large database. Many users of the database system do
not need all this information; instead, they need to access only a part of the database.
The view level of abstraction exists to simplify their interaction with the system. The
system may provide many views for the same database.
Figure 1.1 shows the relationship among the three levels of abstraction.
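The three levels can be made concrete with a short sketch using Python's built-in sqlite3 module. The table employee, the view employee_directory, the index name, and the sample rows are all hypothetical. The table definition stands for the logical level, the view exposes only part of the data (view level), and adding an index changes how the data are stored and reached (physical level) without changing what a query returns.

```python
import sqlite3


def build_company_db():
    """Logical level: one table describing what data are stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        " emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [(1, "Asha", "Sales", 30000), (2, "Ravi", "Sales", 32000), (3, "Meera", "HR", 28000)],
    )
    # View level: users of this view never see the salary column.
    conn.execute(
        "CREATE VIEW employee_directory AS SELECT emp_id, name, dept FROM employee"
    )
    return conn


def view_columns(conn, view):
    """Return the column names visible through a view."""
    cur = conn.execute(f'SELECT * FROM "{view}"')
    return [d[0] for d in cur.description]


def sales_names(conn):
    """A query written purely against the view."""
    rows = conn.execute(
        "SELECT name FROM employee_directory WHERE dept = 'Sales' ORDER BY name"
    )
    return [r[0] for r in rows]


def add_physical_index(conn):
    """Physical level: a new access structure; no query has to change."""
    conn.execute("CREATE INDEX idx_employee_dept ON employee(dept)")
```

Calling sales_names before and after add_physical_index gives the same answer, which is the point of separating the levels.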
A relational database management system must manage its stored data using only its
relational capabilities. The system must qualify as relational, as a database, and as a
management system. For a system to qualify as a relational database management system
(RDBMS), that system must use its relational facilities (exclusively) to manage the
database.
All information in a relational database (including table and column names) is represented
in only one way, namely as a value in a table.
All data must be accessible. This rule is essentially a restatement of the fundamental
requirement for primary keys. It says that every individual scalar value in the database
must be logically addressable by specifying the name of the containing table, the name of
the containing column and the primary key value of the containing row.
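As a minimal sketch of this rule, the helper below reaches a single value using nothing but a table name, a column name, and a primary key value. It uses Python's built-in sqlite3 module; the student table, its rows, and the function names are assumptions for illustration. The name of the key column is passed in as well, since SQL needs it to express the lookup.

```python
import sqlite3


def make_student_db():
    """Create a hypothetical 'student' table with roll_no as primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
    )
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)", [(1, "Asha", 78), (2, "Meera", 91)]
    )
    return conn


def fetch_value(conn, table, column, key_column, key_value):
    """Return the single value addressed by (table, column, primary key value).

    Identifiers cannot be bound as SQL parameters, so they are quoted into the
    statement; this sketch assumes the names come from trusted code, not users.
    """
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{key_column}" = ?'
    row = conn.execute(sql, (key_value,)).fetchone()
    return None if row is None else row[0]
```

For example, fetch_value(conn, "student", "name", "roll_no", 2) addresses exactly one cell of the table.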
The DBMS must allow each field to remain null (or empty). Specifically, it must support a
representation of "missing information and inapplicable information" that is systematic,
distinct from all regular values (for example, "distinct from zero or any other number", in
the case of numeric values), and independent of data type. It is also implied that such
representations must be manipulated by the DBMS in a systematic way.
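The following sketch shows how one SQL system (SQLite, through Python's built-in sqlite3 module) treats nulls as distinct from zero and from the empty string, for numeric and text columns alike. The reading table and its rows are hypothetical.

```python
import sqlite3


def null_demo():
    """Return a few query results that show how NULL differs from regular values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, temp REAL, note TEXT)")
    conn.executemany(
        "INSERT INTO reading VALUES (?, ?, ?)",
        [(1, 0.0, ""), (2, None, None), (3, 21.5, "ok")],  # row 2 has missing data
    )

    def ids(where):
        return [r[0] for r in conn.execute(f"SELECT id FROM reading WHERE {where} ORDER BY id")]

    count_all, count_temp, avg_temp = conn.execute(
        "SELECT COUNT(*), COUNT(temp), AVG(temp) FROM reading"
    ).fetchone()
    return {
        # Comparing NULL with a value yields NULL (unknown), not true or false.
        "null_equals_zero": conn.execute("SELECT NULL = 0").fetchone()[0],
        "temp_is_zero": ids("temp = 0"),
        "temp_is_null": ids("temp IS NULL"),
        "note_is_empty": ids("note = ''"),
        "note_is_null": ids("note IS NULL"),
        # Aggregates skip NULLs systematically instead of treating them as 0.
        "count_all": count_all,
        "count_temp": count_temp,
        "avg_temp": avg_temp,
    }
```

The average is taken over the two known temperatures only, so the missing reading does not pull it toward zero.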
The system must support an online, inline, relational catalog that is accessible to
authorized users by means of their regular query language. That is, users must be able to
access the database's structure (catalog) using the same query language that they use to
access the database's data.
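As an illustration, SQLite keeps its catalog in a table named sqlite_master, which can be queried with ordinary SELECT statements. The sketch below uses Python's built-in sqlite3 module; the student and course tables and the helper names are hypothetical, and pragma_table_info is SQLite-specific (other systems expose their catalog differently, for example through an information schema).

```python
import sqlite3


def make_college_db():
    """Create two hypothetical tables so the catalog has something to describe."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")
    return conn


def table_names(conn):
    """List user tables by querying the catalog with a regular SELECT."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master"
        " WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    return [r[0] for r in rows]


def column_info(conn, table):
    """List (column name, declared type) pairs for one table, again via SELECT."""
    rows = conn.execute("SELECT name, type FROM pragma_table_info(?)", (table,))
    return [(r[0], r[1]) for r in rows]
```

The same query language that reads student rows also reads the description of the student table.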
All views that are theoretically updatable must be updatable by the system.
The system must support set-at-a-time insert, update, and delete operators. This means
that data can be retrieved from a relational database in sets constructed of data from
multiple rows and/or multiple tables. This rule states that insert, update, and delete
operations should be supported for any retrievable set rather than just for a single row in
a single table.
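A brief sketch of set-at-a-time operations, using Python's built-in sqlite3 module: each statement below acts on every row that satisfies a condition, with no row-by-row loop in the application. The tables employee and high_paid, the sample rows, and the salary figures are assumptions for illustration.

```python
import sqlite3


def set_operations_demo():
    """Run one set-level UPDATE, INSERT, and DELETE; return what each one did."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        " emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
    )
    conn.execute("CREATE TABLE high_paid (emp_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [
            (1, "Asha", "Sales", 30000),
            (2, "Ravi", "Sales", 32000),
            (3, "Meera", "HR", 28000),
            (4, "John", "HR", 15000),
        ],
    )

    # UPDATE applied to the whole set of Sales employees in one statement.
    updated = conn.execute(
        "UPDATE employee SET salary = salary + 1000 WHERE dept = 'Sales'"
    ).rowcount

    # INSERT whose source is a retrieved set rather than a single row.
    conn.execute(
        "INSERT INTO high_paid SELECT emp_id, name FROM employee WHERE salary > 30000"
    )
    high_paid = [r[0] for r in conn.execute("SELECT name FROM high_paid ORDER BY emp_id")]

    # DELETE applied to the whole set of HR employees in one statement.
    deleted = conn.execute("DELETE FROM employee WHERE dept = 'HR'").rowcount

    remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
    return updated, high_paid, deleted, remaining
```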
Changes to the physical level (how the data is stored, whether in arrays, linked lists, and so on)
must not require a change to an application based on the structure.
Changes to the logical level (tables, columns, rows, and so on) must not require a change
to an application based on the structure. Logical data independence is more difficult to
achieve than physical data independence.
Integrity constraints must be specified separately from application programs and stored in
the catalog. It must be possible to change such constraints as and when appropriate
without unnecessarily affecting existing applications.
If the system provides a low-level (record-at-a-time) interface, then that interface cannot
be used to subvert the system, for example, bypassing a relational security or integrity
constraint.
4. Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
The storage manager components provide the interface between the low-level data
stored in the database and the application programs and queries submitted to the system. The
storage manager components include:
1. Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
2. Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed without
conflicting.
3. File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
4. Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in memory.
In addition, several data structures are required as part of the physical system
implementation:
1. Data files, which store the database itself.
2. Data dictionary, which stores metadata about the structure of the database. The data
dictionary is used heavily. Therefore, great emphasis should be placed on developing a good
design and efficient implementation of the dictionary.
3. Indices, which provide fast access to data items that hold particular values.
4. Statistical data, which store statistical information about the data in the database. This
information is used by the query processor to select efficient ways to execute a query.
Figure 1.2 shows these components and the connections among them.
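The role of indices and statistical data can be observed in a small experiment with SQLite, through Python's built-in sqlite3 module. The account table, the index name, and the generated rows are hypothetical; EXPLAIN QUERY PLAN, ANALYZE, and the sqlite_stat1 table are SQLite-specific features, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

QUERY = "SELECT balance FROM account WHERE branch = ?"


def build_accounts(n=1000):
    """Create a hypothetical 'account' table with n rows spread over 50 branches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, branch TEXT, balance INTEGER)"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(i, f"branch{i % 50}", 2500 + i) for i in range(1, n + 1)],
    )
    return conn


def plan(conn, sql, params=()):
    """Return the human-readable steps the query processor intends to take."""
    # The detail text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def add_index(conn):
    """Index: fast access to rows holding a particular branch value."""
    conn.execute("CREATE INDEX idx_account_branch ON account(branch)")


def collect_statistics(conn):
    """Statistical data: ANALYZE stores summaries the query processor can consult."""
    conn.execute("ANALYZE")
    return conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
```

Before add_index the plan is a scan of the whole table; afterwards the plan mentions idx_account_branch, showing that the query processor chose the index on its own.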
Database functionality can be broadly divided into two parts, the front-end and the back-end,
as shown in Figure 1.4. The back-end manages access structures, query evaluation and
optimization, concurrency control, and recovery. The front-end of a database system consists of
tools such as forms, report writers, and graphical user-interface facilities. The interface between
the front-end and the back-end is through SQL, or through an application program. Server
systems can be broadly categorized as transaction servers and data servers.
1. Transaction-server systems, also called query-server systems, provide an interface to which
clients can send requests to perform an action, in response to which they execute the action and
send back results to the client. Users may specify requests in SQL, or through an application
program interface, using a remote-procedure-call mechanism.
2. Data-server systems allow clients to interact with the servers by making requests to read or
update data, in units such as files or pages. For example, file servers provide a file-system
interface where clients can create, update, read, and delete files. Data servers for database
systems offer much more functionality; they support units of data (such as pages, tuples, or
objects) that are smaller than a file. They provide indexing facilities for data, and provide
transaction facilities so that the data are never left in an inconsistent state if a client machine or
process fails.
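The difference between the two kinds of server lies mainly in the shape of the interface. The sketch below contrasts them with two toy in-process classes; the class names, method names, and the 4096-byte page size are assumptions for illustration, and a real server would sit behind a network or remote-procedure-call layer and add indexing and recovery.

```python
import sqlite3


class TransactionServer:
    """Clients send a whole request (here SQL text); the server runs it and returns results."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")

    def request(self, sql, params=()):
        with self._conn:  # each request is committed or rolled back as a unit
            return self._conn.execute(sql, params).fetchall()


class DataServer:
    """Clients read and write units of data (here pages) and do the processing themselves."""

    PAGE_SIZE = 4096  # assumed page size for this sketch

    def __init__(self):
        self._pages = {}

    def read_page(self, page_no):
        # Pages that were never written read back as all zero bytes.
        return self._pages.get(page_no, bytes(self.PAGE_SIZE))

    def write_page(self, page_no, data):
        if len(data) != self.PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        self._pages[page_no] = bytes(data)
```

With the transaction server the client ships the action (for example, a SELECT) and gets back rows; with the data server the client fetches a page, changes it locally, and ships the page back.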