1 File Processing Systems: COMP 378 Database Systems Notes For Chapter 1 of Database System Concepts
Database Systems
Notes for Chapter 1 of Database System Concepts
Introduction
A database management system (DBMS) is a collection of data and an integrated set of programs
that access that data. The collection of data is often referred to as the database.
Example: Dickinson keeps information about each student: name, Banner ID number, standing, address
...
• this data makes up the database
• programs (or parts of programs) are used to add new students, to change a student’s address or
standing, to retrieve information about a student ...
Goals of a DBMS:
• manage large bodies of information
• provide convenient and efficient ways to store and access information
• secure information against system failure or tampering
• permit data to be shared among multiple users
It is difficult to prevent such problems unless programs (example: withdrawal and deposit) are
coordinated or integrated.
• atomicity problems - ensuring that a system failure during a database update does not leave the
database in an inconsistent state
• security problems
• integrity problems
2 Data Abstraction
abstraction:
• physical level
• logical level
• view level
[Figure: the three levels of data abstraction. Views 1 through N (view level) sit above the logical
level, which sits above the physical level.]
4 Data Models
A data model is a collection of conceptual tools for describing:
• data
• data relationships
• data semantics
• consistency constraints
Data models:
• provide a way of thinking about data that isn’t linked to the implementation of the database
• data is viewed as sets of entities that represent things in the real world, and sets of relationships among
entities
• constructing new kinds of objects from old kinds (via inheritance) is emphasized
• sets of entities and sets of relationships between entities are represented as tables
• each data item is represented as a fixed format record stored in a row of such a table
For example, a table storing customer information could have columns for the customer name, address and
social security number. Each row in such a table is then a record representing one customer.
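The customer table described above can be sketched with Python's built-in sqlite3 module. This is a
minimal illustration, not part of the original notes: the table name, column names and the two sample
customers are all hypothetical.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")

# One column per attribute of a customer (hypothetical names).
conn.execute("""
    CREATE TABLE customer (
        ssn     TEXT PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL
    )
""")

# Each row is one fixed-format record representing one customer (made-up data).
conn.executemany(
    "INSERT INTO customer (ssn, name, address) VALUES (?, ?, ?)",
    [
        ("000-00-0001", "Ada Example", "1 First St"),
        ("000-00-0002", "Bob Example", "2 Second St"),
    ],
)
conn.commit()

rows = conn.execute(
    "SELECT name, address, ssn FROM customer ORDER BY name"
).fetchall()
for row in rows:
    print(row)
```

Every row has the same three fields, which is what "fixed format record" means here.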
Databases are often designed using the entity-relationship model, and then the design is translated to
the relational model for implementation.
The object-relational model is a combination of the object and relational models. Objects of a class can
be stored in a table with one column for each field (data member) of the class.
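As a rough sketch of storing objects of a class in a table with one column per field, the example
below derives the column list from a Python dataclass. It only imitates the idea on top of a plain
relational engine (SQLite); a real object-relational DBMS provides this support itself. The Student
class, its fields and the helper names save and load_all are assumptions made for illustration.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Student:
    # Hypothetical class; each field becomes one column.
    banner_id: int
    name: str
    standing: str


columns = [f.name for f in fields(Student)]
placeholders = ", ".join("?" for _ in columns)

conn = sqlite3.connect(":memory:")
# One column for each field (data member) of the class.
conn.execute(f"CREATE TABLE student ({', '.join(columns)})")


def save(student):
    """Store one object as one row."""
    conn.execute(f"INSERT INTO student VALUES ({placeholders})", astuple(student))


def load_all():
    """Rebuild objects from rows, relying on column order matching field order."""
    cursor = conn.execute("SELECT * FROM student ORDER BY banner_id")
    return [Student(*row) for row in cursor]


save(Student(2, "Bob Example", "junior"))
save(Student(1, "Ada Example", "senior"))
students = load_all()
print(students)
```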
5 Database Languages
5.1 Data Definition Language
A data definition language (DDL) is the language used to define and modify the logical schema of the
database.
• often used to define/modify subschemas (views) and to specify consistency constraints
• the definition of the logical schema (written in the DDL) is compiled into a file or set of tables called
the data dictionary
• the data dictionary is consulted for schema information whenever data is read or modified
• a separate language called the data storage and definition language is used to specify the storage
structure and access methods, that is, the physical schema of the database
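The points above can be illustrated with SQL DDL statements run through Python's sqlite3 module. This
is a sketch under assumptions: SQLite stands in for a generic DBMS, its sqlite_master table plays the
role of the data dictionary, and the student table, its columns and the view name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a logical schema, including a consistency constraint (CHECK).
conn.execute("""
    CREATE TABLE student (
        banner_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        standing  TEXT NOT NULL
                  CHECK (standing IN ('first-year', 'sophomore', 'junior', 'senior'))
    )
""")

# DDL: modify the logical schema.
conn.execute("ALTER TABLE student ADD COLUMN address TEXT")

# DDL: define a subschema (view).
conn.execute("CREATE VIEW student_directory AS SELECT name, address FROM student")

# The schema definitions are recorded in SQLite's catalog (its data dictionary).
catalog = dict(conn.execute("SELECT name, type FROM sqlite_master").fetchall())
column_names = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(catalog)
print(column_names)

# The constraint is checked when data is modified.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Ada Example', 'graduate', NULL)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True
print(constraint_enforced)
```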
– detects when information in the database or data dictionary is lost or corrupted due to disk crash,
power failure, software errors ...
– restores the database to a previous consistent state using the log
The database manager for a small system typically does not implement all of these functions.
• the DDL interpreter, which handles DDL statements and records the effect in the data dictionary
• the DML compiler, which translates DML statements into low-level commands to the query
evaluation engine. The DML compiler also does query optimization - evaluating alternative query
evaluation plans, and choosing the plan with the lowest cost (typically in terms of disk accesses).
• the query evaluation engine, which executes the query evaluation plan produced by the DML
compiler.
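To see a query optimizer choose between evaluation plans, one option is to ask SQLite for its plan
before and after an index is created. This is only an illustration with SQLite as a stand-in: EXPLAIN
QUERY PLAN is SQLite-specific, the wording of its output varies between versions, and the table, index
and helper names below are hypothetical.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the human-readable steps of the plan SQLite chose for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The description is the last column of each plan row.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (banner_id INTEGER PRIMARY KEY, name TEXT, standing TEXT)"
)

query = "SELECT banner_id FROM student WHERE name = ?"

# Without an index on name, the only available plan is a full table scan.
before = plan(conn, query, ("Ada Example",))

# With an index, a cheaper plan (an index search) becomes available.
conn.execute("CREATE INDEX idx_student_name ON student (name)")
after = plan(conn, query, ("Ada Example",))

print(before)
print(after)
```

The query text is identical in both cases; only the plan selected for it changes.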
7 Transaction Management
A transaction is a sequence of DML commands that forms a logical unit of work. Example: transferring
money from one bank account to another.
Critical properties of transactions:
• atomicity - a transaction must execute completely or not at all (in terms of the final effect on the
database state).
• consistency - once a transaction completes successfully, the database must be in a consistent state.
The database may be in an inconsistent state while a transaction is executing. For example, during a
transaction that transfers money from account A to account B, the total funds held by the bank may
be inconsistent (after account A has been debited, but before account B has been credited). Note that
such consistency is largely the responsibility of the application programmer.
• isolation - a transaction must not be affected by other transactions that are executing concurrently.
• durability - once a transaction completes successfully, its effect must persist even in the presence of
system failures.
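The transfer example can be sketched as a single transaction using Python's sqlite3 module. This is a
minimal sketch under assumptions: the account table, the starting balances and the transfer and
balances helpers are hypothetical, and it demonstrates only atomicity and consistency for a single
connection, not isolation or durability.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one logical unit of work."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        # At this point src has been debited but dst has not been credited,
        # so the database is temporarily in an inconsistent state.
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

transfer(conn, "A", "B", 30)       # completes: both updates take effect
after_success = balances(conn)

try:
    transfer(conn, "A", "B", 500)  # fails midway: the debit is rolled back
except ValueError:
    pass
after_failure = balances(conn)

print(after_success, after_failure)
```

The failed transfer leaves no trace: the debit of account A is undone, so the total held by the bank
is unchanged.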
8 Data Mining
Data mining is the process of analyzing large volumes of data to find useful patterns (for example, that
young males who purchase video game systems are also likely to purchase HD televisions). Roughly speaking,
this process has three steps:
1. preprocessing data (often from multiple databases) to put it into the format expected by the data
mining algorithm
2. running the data mining algorithm, which typically uses statistical techniques to find patterns in the
data
The SQL:1999 standard specifies constructs to support data mining, and many commercial tools for data
analysis and visualization are available.
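The two steps listed above can be sketched in a few lines of Python. The purchase records, the item
names and the functions to_baskets and confidence are made up for illustration, and the "algorithm" is
just a simple conditional frequency rather than any particular data mining tool.

```python
from collections import defaultdict


def to_baskets(*sources):
    """Step 1: merge (customer, item) records from several sources into one
    set of items per customer, normalizing the item names."""
    baskets = defaultdict(set)
    for source in sources:
        for customer, item in source:
            baskets[customer].add(item.strip().lower())
    return list(baskets.values())


def confidence(baskets, antecedent, consequent):
    """Step 2: of the customers who bought antecedent, return the fraction
    who also bought consequent."""
    having = [b for b in baskets if antecedent in b]
    if not having:
        return 0.0
    return sum(consequent in b for b in having) / len(having)


# Hypothetical purchase records from two separate databases.
store_a = [("c1", "Game Console"), ("c1", "HD TV"), ("c2", "game console")]
store_b = [("c2", "HD TV "), ("c3", "game console"), ("c4", "toaster")]

baskets = to_baskets(store_a, store_b)
rule_strength = confidence(baskets, "game console", "hd tv")
print(rule_strength)
```

Here two of the three customers who bought a game console also bought an HD television, so the
pattern holds for two thirds of them.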
9 Database Architecture
Modern database systems typically employ a client-server architecture:
• users interact with client machines, which connect to the server via a network
• two-tier architecture
• three-tier architecture
– the application program communicates with an application server (Oracle WebLogic, IBM
WebSphere, JBoss), but contains no direct database calls
– the application server communicates with the database server
As the application server contains reusable business logic and Web interface components, this is an
effective strategy for developing large applications.
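The three-tier layering can be mimicked inside one Python process to show where the database calls
live. This is a toy sketch: real tiers run on separate machines and talk over a network, and the class
names, method names and student table are hypothetical.

```python
import sqlite3


class DatabaseServer:
    """Tier 3: the only component that touches the stored data."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE student (banner_id INTEGER PRIMARY KEY, address TEXT)"
        )

    def execute(self, sql, params=()):
        with self._conn:  # commit each statement on success
            return self._conn.execute(sql, params).fetchall()


class ApplicationServer:
    """Tier 2: reusable business logic; issues all database calls."""

    def __init__(self, db):
        self._db = db

    def register_student(self, banner_id, address):
        if not address.strip():
            raise ValueError("address must not be empty")
        self._db.execute(
            "INSERT INTO student (banner_id, address) VALUES (?, ?)",
            (banner_id, address),
        )

    def address_of(self, banner_id):
        rows = self._db.execute(
            "SELECT address FROM student WHERE banner_id = ?", (banner_id,)
        )
        return rows[0][0] if rows else None


def client_program(app):
    """Tier 1: talks only to the application server; contains no SQL."""
    app.register_student(1, "1 First St")
    return app.address_of(1)


result = client_program(ApplicationServer(DatabaseServer()))
print(result)
```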
• sophisticated users: interact with the DBMS using the DML directly.
– CAD
– expert systems
– graphical or audio data
– temporal data
– ...
• naive users: interact with the database through application programs (including Web interfaces).
• schema definition