Lec1 - Introduction To Database Systems

Database Programming
Fall 2024/2025
Lecture 1: Introduction to database systems
Dr. Samir Tartir
What is a database

• A database is nothing more than a collection of information that exists over a long period of time, often many years.
• In informatics, a database refers to a collection of data that is managed by a DBMS.
Introduction: Database system
• Databases today are essential to every business.
• The power of databases comes from a body of knowledge and technology that has developed over several decades.
• This knowledge is embodied in specialized software called a database management system (DBMS).
• A DBMS is a powerful tool for:
  • Creating and managing large amounts of data efficiently
  • Allowing it to persist over long periods of time, safely
Expectation of DBMS

• Allow users to create new databases and specify their schemas.
• Give users the ability to query the data and modify the data, using an appropriate language.
• Support the storage of very large amounts of data over a long period of time, allowing efficient access to the data for queries and database modifications.
• Enable durability and the recovery of the database.
• Control access to data from many users at once:
  - Without allowing unexpected interactions among users (isolation)
  - Without allowing actions on the data to be performed partially but not completely (atomicity)
Early DBMS

• The first commercial database management systems appeared in the late 1960s, evolving from file systems.
• File systems store data over a long period of time and allow the storage of large amounts of data.
• File systems do not:
  • Generally guarantee that data cannot be lost if it is not backed up
  • Support efficient access to data items whose location in a particular file is not known
  • Directly support a query language
  • Control access to data from many users at once
  • Support isolation and atomicity
• File systems’ support for a schema for the data is limited to the creation of directory structures for files.
Early DBMS

[Figure: file processing. The Library, Examination, and Registration applications each work on their own separate data files.]
Disadvantages of File Processing
• Program-Data Dependency
  • File structure is defined in the program code.
  • All programs maintain metadata for each file they use.
• Duplication of Data (Data Redundancy)
  • Different systems/programs have separate copies of the same data.
  • Same data is held by different programs.
  • Wasted space and potentially different values and/or different formats for the same item.
• Limited Data Sharing
  • No centralized control of data.
  • Programs are written in different languages, and so cannot easily access each other’s files.
Disadvantages of File Processing
• Lengthy Development Times
  • Programmers must design their own file formats.
• Excessive Program Maintenance
  • Accounts for 80% of the information systems budget.
• Vulnerable to Inconsistency
  • A change in one table needs matching changes in the corresponding tables as well; otherwise the data will be inconsistent.
Application of an early DBMS
Banking systems: maintaining accounts and making sure that system failures
do not cause money to disappear.

Airline reservation systems: require assurance that data will not be lost, and
they must accept very large volumes of small actions by customers.

Corporate record keeping: such as in employment and tax records.


Relational DBMS

[Figure: the Library, Examination, and Registration applications all access one University Students Database through a shared Database Management System. Benefits shown: data sharing, data independence, controlled redundancy, better data integrity.]
Relational DBMS
• Following a famous paper written by Ted Codd in 1970, database systems changed significantly.
• Codd proposed that database systems should present the user with a view of data organized as tables called relations.
• Behind the scenes, there might be a complex data structure that allowed rapid response to a variety of queries.
• Queries could be expressed in a very high-level language, which greatly increased the efficiency of database programmers.
Heading to smaller systems

• DBMSs were large, expensive software systems running on large computers.
• The size was necessary, because to store a gigabyte of data required a large computer system.
• Today, hundreds of gigabytes fit on a single disk, and it is quite feasible to run a DBMS on a personal computer.
• Database systems based on the relational model have become available for even very small machines.
• Use of documents (e.g., XML)
Bigger systems are different than before

• On the other hand, a gigabyte is not that much data anymore.
• Corporate databases routinely store terabytes (10^12 bytes). Yet there are many databases that store petabytes (10^15 bytes) of data and serve it all to users.
• Some important examples:
1. Google holds petabytes of data (images, videos, webpages, etc.). This data is not held in a traditional DBMS, but in
specialized structures optimized for search-engine queries.
2. Satellites send down petabytes of information for storage in specialized systems.
3. Repositories such as Flickr store millions of pictures and support search of those pictures. Even a database like Amazon’s has
millions of pictures of products to serve.
4. Sites such as YouTube hold hundreds of thousands, or millions, of videos.
Outline of a DBMS
• Single boxes -> system components
• Double boxes -> in-memory data structures
• Solid lines -> control and data flow
• Dashed lines -> data flow only

• At the top, there are two distinct sources of commands to the DBMS:
1. Conventional users and application programs that ask for data or modify data.
2. Database administrator: a person or persons responsible for the structure or schema of the database.
Data-Definition Language Commands
• A DBA for a university registrar’s database might decide that there should be:
  • A table or relation with columns for a student, a course the student has taken, and a grade for that student in that course
  • A constraint that the only allowable grades are A, B, C, D, and F
• This structure and constraint information is all part of the schema of the database.
• The DBA needs special authority to execute schema-altering commands, since these can have profound effects on the database.
• These schema-altering Data-Definition Language (DDL) commands are:
  • Parsed by a DDL processor
  • Then passed to the execution engine
  • Then sent through the index/file/record manager to alter the metadata, that is, the schema information for the database
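The registrar example can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name `enrollment`, its column names, and the sample rows are made up here, and SQLite stands in for whatever DBMS the course uses.

```python
import sqlite3

# Hypothetical registrar schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE enrollment (
        student TEXT NOT NULL,
        course  TEXT NOT NULL,
        grade   TEXT NOT NULL CHECK (grade IN ('A', 'B', 'C', 'D', 'F')),
        PRIMARY KEY (student, course)
    )
    """
)

# The schema (metadata) is itself stored by the DBMS and can be read back.
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'enrollment'"
).fetchone()[0]

conn.execute("INSERT INTO enrollment VALUES ('Sally', 'CS101', 'A')")

# A grade outside the allowed set violates the constraint and is rejected.
try:
    conn.execute("INSERT INTO enrollment VALUES ('Omar', 'CS101', 'E')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```

The CHECK clause is one way to express the "only allowable grades" constraint as part of the schema, so the DBMS enforces it for every program that touches the table.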
Overview of Query Processing
• A user or an application program initiates some action, using the DML (data-manipulation language).
• DML commands do not affect the schema of the database but may affect the content of the database.
• DML statements are handled by two separate subsystems:
  1. Answering the query
  2. Transaction processing
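As a small illustration of the point that DML changes content but not schema, the sketch below uses Python's sqlite3 module. The `enrollment` table, the helper `schema`, and the sample rows are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT, grade TEXT)")

def schema(connection):
    """Return the stored CREATE statements, i.e. the schema metadata."""
    return connection.execute(
        "SELECT sql FROM sqlite_master ORDER BY name"
    ).fetchall()

schema_before = schema(conn)

# Modifications: these change the content of the database.
conn.execute("INSERT INTO enrollment VALUES ('Sally', 'CS101', 'B')")
conn.execute("INSERT INTO enrollment VALUES ('Omar', 'CS101', 'A')")
conn.execute("UPDATE enrollment SET grade = 'A' WHERE student = 'Sally'")
conn.execute("DELETE FROM enrollment WHERE student = 'Omar'")

# Query: this reads the content without changing it.
rows = conn.execute("SELECT student, course, grade FROM enrollment").fetchall()

schema_after = schema(conn)
```

After the four modifications, `rows` holds the single remaining record while `schema_before` and `schema_after` are identical.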
Transaction Processing
• It is normal to group one or more database operations into a transaction, which is a unit of work.
• A transaction, however, must be executed atomically and in apparent isolation from other transactions.
• In addition, a DBMS offers the guarantee of durability: the work of a completed transaction will never be lost.
• The transaction manager therefore accepts transaction commands from an application, which:
  • Tell the transaction manager when transactions begin and end
  • Provide information about the expectations of the application
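Atomicity can be sketched with a bank transfer, echoing the banking example earlier in the lecture. The sketch uses Python's sqlite3 module; the `account` table, the function `transfer`, and the balances are hypothetical, and the non-negative balance constraint is an assumption added to force a failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(connection, src, dst, amount):
    """Move money as one unit of work: both updates happen or neither does."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is undone

balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the credit to bob has already been applied when the debit violates the constraint; the rollback removes it, so no money appears or disappears.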
Transaction Processing
The transaction processor performs the following tasks:
1. Logging
2. Concurrency Control
3. Deadlock Resolution
Transaction Processing - Logging

• In order to assure durability, every change in the database is logged separately on disk.
• The log manager follows one of several policies designed to assure that, no matter when a system failure or “crash” occurs, a recovery manager will be able to examine the log of changes and restore the database to some consistent state.
• The log manager initially writes the log in buffers and negotiates with the buffer manager to make sure that buffers are written to disk (where data can survive a crash) at appropriate times.
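The recovery idea can be sketched as a toy redo log. This is a simplification under stated assumptions, not how any particular DBMS does it: the list `log_on_disk` stands in for a log file that survives a crash, the record layout is invented, buffering is ignored, the database is assumed to start empty, and redo logging is only one of the several policies mentioned above.

```python
# Toy redo log. `log_on_disk` stands in for the log file that survives a crash.
log_on_disk = []

def log_update(txn, key, new_value):
    """Record the new value a transaction wants to write."""
    log_on_disk.append(("UPDATE", txn, key, new_value))

def log_commit(txn):
    """Record that a transaction completed."""
    log_on_disk.append(("COMMIT", txn))

def recover(log):
    """Rebuild a consistent state by redoing only committed transactions."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    database = {}
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, key, new_value = rec
            database[key] = new_value
    return database

# T1 completes; T2 is interrupted by a crash before it commits.
log_update("T1", "alice", 70)
log_update("T1", "bob", 80)
log_commit("T1")
log_update("T2", "alice", 0)
# -- crash here: main-memory state is lost, the log is not --

recovered = recover(log_on_disk)
```

Because T2 never logged a commit record, its update is ignored during recovery, and the work of the completed transaction T1 is preserved.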
Transaction Processing – Concurrency Control

• Transactions must appear to execute in isolation.
• But in most systems, there will in truth be many transactions executing at once.
• Thus, the concurrency-control manager (scheduler) must assure that the individual actions of multiple concurrent transactions are executed properly (as if they were done sequentially).
• A typical scheduler does its work by maintaining locks on certain pieces of the database.
• These locks prevent two transactions from accessing the same piece of data in ways that interact badly.
• Locks are generally stored in a main-memory lock table
• The scheduler affects the execution of queries and other database operations by forbidding the execution
engine from accessing locked parts of the database.
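A main-memory lock table might look roughly like the sketch below. The class `LockTable`, the shared/exclusive lock modes, and the item name `row42` are assumptions for illustration; the slides do not specify which lock modes a scheduler uses, and a real scheduler would queue a refused request instead of just returning False.

```python
from collections import defaultdict

class LockTable:
    """Toy main-memory lock table with shared (S) and exclusive (X) locks."""

    def __init__(self):
        self.holders = defaultdict(dict)  # item -> {txn: "S" or "X"}

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if it conflicts."""
        others = {t: m for t, m in self.holders[item].items() if t != txn}
        if mode == "S":
            conflict = "X" in others.values()
        else:  # an exclusive lock conflicts with any lock held by another txn
            conflict = bool(others)
        if conflict:
            return False
        # Keep the stronger mode if the transaction already holds a lock.
        if self.holders[item].get(txn) != "X":
            self.holders[item][txn] = mode
        return True

    def release_all(self, txn):
        """Drop every lock held by a finished transaction."""
        for locks in self.holders.values():
            locks.pop(txn, None)

table = LockTable()
r1 = table.request("T1", "row42", "S")   # granted
r2 = table.request("T2", "row42", "S")   # granted: readers can share
r3 = table.request("T2", "row42", "X")   # refused: T1 still reads row42
table.release_all("T1")
r4 = table.request("T2", "row42", "X")   # granted after T1 releases
```

The refused request is the point where the scheduler forbids the execution engine from touching a locked piece of data.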
Transaction Processing: Deadlock Resolution
• As transactions compete for resources through the locks that the scheduler grants, they can get into a situation where none can proceed because each needs something another transaction has.
• The transaction manager has the responsibility to intervene and cancel (“rollback” or “abort”) one or more transactions to let the others proceed.
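One common way to detect this situation is to look for a cycle in a "waits-for" graph. The sketch below is an assumption-laden illustration: the graph representation, the functions `find_cycle` and `abort`, and the victim rule (abort the transaction with the highest name, treated here as the youngest) are all chosen for this example and are not taken from the slides.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    visited = set()

    def visit(txn, path):
        if txn in path:                 # reached a transaction already on the path
            return path[path.index(txn):]
        if txn in visited:              # fully explored earlier, no cycle through it
            return None
        visited.add(txn)
        for other in sorted(waits_for.get(txn, ())):
            cycle = visit(other, path + [txn])
            if cycle:
                return cycle
        return None

    for start in sorted(waits_for):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

def abort(waits_for, victim):
    """Roll back `victim`: it stops waiting and nobody waits for it any more."""
    waits_for.pop(victim, None)
    for waiting_on in waits_for.values():
        waiting_on.discard(victim)

# T1 waits for T2, T2 for T3, T3 for T1; T4 merely waits for T1.
waits_for = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}, "T4": {"T1"}}

cycle = find_cycle(waits_for)
victim = max(cycle)          # hypothetical policy: abort the "youngest" name
abort(waits_for, victim)
cycle_after = find_cycle(waits_for)
```

Once the victim is rolled back, the remaining transactions no longer form a cycle and can proceed in turn.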
Query Answering
• The portion of the DBMS that most affects the performance that the user sees is the query processor.
• The query processor is represented by two components:
1. The query compiler
2. The execution engine
Query Answering: The Query Compiler
• The query is parsed and optimized by a query compiler.
• The query compiler translates the query into an internal form called a query plan, or sequence of actions the DBMS will perform to answer the query.
• Often the operations in a query plan are implementations of “relational algebra” operations.
• The query compiler consists of three major units:
1. A query parser, which builds a tree structure from the textual form of the query
2. A query preprocessor, which:
  • Performs semantic checks on the query
  • Performs some tree transformations to turn the parse tree into a tree of algebraic operators
3. A query optimizer, which transforms the initial query plan into the best available sequence of operations on the actual data

• The query compiler uses metadata and statistics about the data to decide which sequence of operations is likely to be the fastest.
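Many systems let you inspect the plan the query compiler chose. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the `enrollment` table, the index name, and the helper `plan` are assumptions for this example, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [(f"student{i}", "CS101", "A") for i in range(1000)],
)

query = "SELECT * FROM enrollment WHERE student = 'student500'"

def plan(connection, sql):
    """Return the human-readable plan steps SQLite chose for `sql`."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]

plan_without_index = plan(conn, query)

# Adding an index changes the metadata the optimizer can exploit.
conn.execute("CREATE INDEX idx_enrollment_student ON enrollment (student)")
plan_with_index = plan(conn, query)
```

Without the index the plan is a scan of the whole table; once the index exists, the optimizer switches to a search that uses it, even though the query text is unchanged.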
Query Answering: The Execution Engine
• The execution engine has the responsibility for executing each of the steps in the chosen query plan.
• The execution engine interacts with most of the other components of the DBMS, either directly or through the buffers:
  • It requests data from the buffer manager in order to manipulate it
  • It needs to interact with the scheduler to avoid accessing data that is locked
  • It also interacts with the log manager to make sure that all database changes are properly logged
Storage and Buffer Management

• The requests for data are passed to the buffer manager

• The buffer manager’s task is to bring appropriate portions of the data from secondary
storage (disk) where it is kept permanently, to the main-memory buffers

• Normally, the page (or disk block) is the unit of transfer between buffers and disk

• The buffer manager communicates with a storage manager to get data from disk

• The storage manager might involve operating-system commands, but more typically, the
DBMS issues commands directly to the disk controller
Storage and Buffer Management

• The data of a database normally resides in secondary storage (disk).
• To perform any useful operation on data, that data must be in main memory.
• It is the job of the storage manager to control the placement of data on disk and its movement between disk and main memory.
• In a simple database system, the storage manager might be nothing more than the file system of the underlying operating system.
• For efficiency purposes, DBMSs normally control storage on the disk directly, at least under some circumstances.
• The storage manager keeps track of the location of files on the disk and obtains the block or blocks containing a file on request from the buffer manager.
Storage and Buffer Management

• The buffer manager is responsible for partitioning the available main memory into buffers, which are page-sized
regions into which disk blocks can be transferred
• All DBMS components that need information from the disk will interact with the buffers and the buffer manager,
either directly or through the execution engine
• The kinds of information that various components may need include:
1. Data: the contents of the database itself
2. Metadata: the database schema that describes the structure of, and constraints on, the database
3. Log Records: information about recent changes to the database; these support durability of the database
4. Statistics: information gathered and stored by the DBMS about data properties (e.g., sizes, values, etc.)
5. Indexes: data structures that support efficient access to the data
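The buffer manager's role can be sketched as a small pool of page-sized buffers in front of a "disk". Everything here is an assumption for illustration: the class `BufferManager`, the dictionary standing in for the disk, and in particular the least-recently-used eviction rule, which the slides do not prescribe. Writing modified pages back to disk is left out.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool: a fixed number of page-sized buffers over a 'disk'."""

    def __init__(self, disk, capacity):
        self.disk = disk                  # block number -> block contents
        self.capacity = capacity          # number of buffers in main memory
        self.buffers = OrderedDict()      # block number -> contents, oldest first
        self.disk_reads = 0

    def get_page(self, block_no):
        """Return a block, reading it from disk only if it is not buffered."""
        if block_no in self.buffers:
            self.buffers.move_to_end(block_no)   # mark as most recently used
            return self.buffers[block_no]
        if len(self.buffers) >= self.capacity:
            self.buffers.popitem(last=False)     # evict least recently used
        self.disk_reads += 1
        self.buffers[block_no] = self.disk[block_no]
        return self.buffers[block_no]

disk = {n: f"block-{n}" for n in range(10)}
pool = BufferManager(disk, capacity=2)
for n in (0, 1, 0, 2, 0, 1):
    pool.get_page(n)
```

Six page requests cost only four disk reads in this run, because block 0 is found in a buffer twice.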

Acknowledgment
Slides were prepared by Eng. Lina Hammad
Modified by Dr. Ala’a Al-Habashna
