
01a - Introduction - CSC 343

Uploaded by Partha Kuri

Week 1 – Part 1: An Introduction to Databases and DBMSs

- Databases and DBMSs
- Data Models and Data Independence
- Concurrency Control and Database Transactions
- Structure of a DBMS
- DBMS Languages

Database Systems
- Database: A very large, integrated collection of data.
- Examples: databases of customers, products, ...
- There are huge databases out there, for satellite and other scientific data, digitized movies, ...; up to exabytes of data (i.e., 10^18 bytes).
- A database usually models (some part of) a real-world enterprise:
  - Entities (e.g., students, courses)
  - Relationships (e.g., Paolo is taking CS564)
- A Database Management System (DBMS) is a software package designed to store and manage databases.

©2005 John Mylopoulos, CSC343 Introduction to Databases, University of Toronto

Why Use a DBMS?
- Data independence and efficient access: You don't need to know the implementation of the database to access data; queries are optimized.
- Reduced application development time: Queries can be expressed declaratively; the programmer doesn't have to specify how they are evaluated.
- Data integrity and security: (Certain) constraints on the data are enforced automatically.
- Uniform data administration.
- Concurrent access, recovery from crashes: Many users can access/update the database at the same time without any interference.

Why Study Databases?
- Shift from computation to information: Computers were initially conceived as neat devices for doing scientific calculations; more and more they are used as data managers.
- Datasets increasing in diversity and volume: Digital libraries, interactive video, Human Genome project, EOS project, ... The need for DBMS technology is exploding!
- DBMS technology encompasses much of Computer Science: OS, languages, theory, AI, multimedia, logic, ...
Data Models
- A data model is a collection of concepts for describing data.
- A database schema is a description of the data that are contained in a particular database.
- The relational model of data is the most widely used data model today.
  - Main concept: relation, basically a table with rows and columns.
  - A relation schema describes the columns, or attributes, or fields of a relation.

Levels of Abstraction
- Many views, single logical schema and physical schema.
  - Views (also called external schemas) describe how users see the data.
  - Logical schema* defines logical structure.
  - Physical schema describes the files and indexes used.
* Called conceptual schema back in the old days.
[Figure: three levels of abstraction. View 1, View 2 and View 3 sit on top of the Logical Schema, which sits on top of the Physical Schema.]


Example: University Database
- Logical schema:
  - Students(Sid: String, Name: String, Login: String, Age: Integer, Gpa: Real)
  - Courses(Cid: String, Cname: String, Credits: Integer)
  - Enrolled(Sid: String, Cid: String, Grade: String)
- Physical schema:
  - Relations stored as unordered files.
  - Index on first column of Students.
- (One) External Schema (View):
  - CourseInfo(Cid: String, Enrollment: Integer)

Tables Represent Relations

Students
  Sid     Name    Login   Age   Gpa
  00243   Paolo   pg      21    4.0
  01786   Maria   mf      20    3.6
  02699   Klaus   klaus   19    3.4
  02439   Eric    eric    19    3.1

Courses
  Cid      Cname               Credits
  csc340   Rqmts Engineering   4
  csc343   Databases           6
  ece268   Operating Systems   3
  csc324   Programming Langs   4
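A minimal sketch of the three schema levels for this example, assuming SQLite through Python's standard sqlite3 module. The tables follow the logical schema; the index stands in for the physical schema (SQLite stores tables as B-trees keyed by rowid rather than as unordered files, so only the index is mirrored); and CourseInfo is a view, i.e., an external schema. The Enrolled rows are hypothetical because the slide lists none, and the index name students_sid is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Logical schema: the three relations from the slide.
CREATE TABLE Students (Sid TEXT, Name TEXT, Login TEXT, Age INTEGER, Gpa REAL);
CREATE TABLE Courses  (Cid TEXT, Cname TEXT, Credits INTEGER);
CREATE TABLE Enrolled (Sid TEXT, Cid TEXT, Grade TEXT);

-- Physical schema: an index on the first column of Students
-- (SQLite itself decides how the table storage is laid out).
CREATE INDEX students_sid ON Students(Sid);

-- External schema: the CourseInfo view, derived from the logical schema.
CREATE VIEW CourseInfo AS
    SELECT c.Cid AS Cid, COUNT(e.Sid) AS Enrollment
    FROM Courses c LEFT JOIN Enrolled e ON e.Cid = c.Cid
    GROUP BY c.Cid;
""")

conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("00243", "Paolo", "pg", 21, 4.0),
    ("01786", "Maria", "mf", 20, 3.6),
    ("02699", "Klaus", "klaus", 19, 3.4),
    ("02439", "Eric", "eric", 19, 3.1),
])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
    ("csc340", "Rqmts Engineering", 4),
    ("csc343", "Databases", 6),
    ("ece268", "Operating Systems", 3),
    ("csc324", "Programming Langs", 4),
])
# Hypothetical enrollments: the slide does not list any Enrolled rows.
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("00243", "csc343", "A"),
    ("01786", "csc343", "B"),
    ("02699", "csc324", "A"),
])

# A user of the external schema only sees CourseInfo(Cid, Enrollment).
course_info = conn.execute(
    "SELECT Cid, Enrollment FROM CourseInfo ORDER BY Cid").fetchall()
print(course_info)
```

Courses with no enrollments still appear (with 0) because the view uses a LEFT JOIN; an inner join would silently drop them.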
Data Independence
Applications are insulated from how data is structured and stored (see also the 3-layer schema structure):
- Logical data independence: Protection from changes in the logical structure of data.
- Physical data independence: Protection from changes in the physical structure of data.
One of the most important benefits of database technology!

Concurrency Control
- Concurrent execution of user programs is essential for good DBMS performance.
  - Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
- Interleaving actions of different user programs can lead to inconsistency: e.g., a cheque is cleared while the account balance is being computed.
- The DBMS ensures that such problems don't arise: users can pretend they are using a single-user system.
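Both kinds of data independence from the Data Independence slide can be sketched with a hypothetical application function, honour_roll, that queries Students. The sketch assumes SQLite; the 3.5 GPA cutoff and every object name other than Students are made up for illustration. Adding an index is a physical change; splitting Students into two tables and re-creating the old relation as a view is a logical change. In both cases the application code is untouched and returns the same answer.

```python
import sqlite3

def honour_roll(conn):
    """Hypothetical application code: it only knows the relation Students."""
    return conn.execute(
        "SELECT Name FROM Students WHERE Gpa >= 3.5 ORDER BY Name").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Sid TEXT, Name TEXT, Login TEXT, Age INTEGER, Gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
    ("00243", "Paolo", "pg", 21, 4.0),
    ("01786", "Maria", "mf", 20, 3.6),
    ("02699", "Klaus", "klaus", 19, 3.4),
    ("02439", "Eric", "eric", 19, 3.1),
])
before = honour_roll(conn)

# Physical change: a new index. At most the optimizer's access path changes.
conn.execute("CREATE INDEX students_gpa ON Students(Gpa)")
after_index = honour_roll(conn)

# Logical change: split Students in two, then re-create the old relation as a view.
conn.executescript("""
ALTER TABLE Students RENAME TO StudentsOld;
CREATE TABLE StudentInfo (Sid TEXT, Name TEXT, Login TEXT, Age INTEGER);
CREATE TABLE StudentGrades (Sid TEXT, Gpa REAL);
INSERT INTO StudentInfo SELECT Sid, Name, Login, Age FROM StudentsOld;
INSERT INTO StudentGrades SELECT Sid, Gpa FROM StudentsOld;
DROP TABLE StudentsOld;
CREATE VIEW Students AS
    SELECT i.Sid, i.Name, i.Login, i.Age, g.Gpa
    FROM StudentInfo i JOIN StudentGrades g ON g.Sid = i.Sid;
""")
after_split = honour_roll(conn)

print(before, after_index, after_split)
```

The view only shields readers here; updates through a multi-table view need extra machinery (in SQLite, INSTEAD OF triggers), which is one reason logical independence is harder to achieve than physical independence.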

Database Transactions
- The key concept is the transaction, which is an atomic sequence of database actions (reads/writes).
- Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.
- Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
- Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).
- Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!

Scheduling Concurrent Transactions
The DBMS ensures that execution of {T1, ..., Tn} is equivalent to some serial execution of T1, ..., Tn.
- Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2-phase locking protocol.)
- Idea: If an action of Ti (say, writing X) affects Tk (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tk is forced to wait until Ti completes; this effectively orders the transactions.
- What if Tk, while waiting for X, already holds a lock on Y, and Ti later requests a lock on Y? Each transaction now waits for the other (deadlock!), so Ti or Tk is aborted and restarted.
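The locking rules on the Scheduling Concurrent Transactions slide can be sketched as a toy lock manager. This is a simplified illustration, not how any particular DBMS implements it: it assumes exclusive locks only (real systems also have shared locks for reads), lets each transaction wait on at most one lock at a time, detects deadlock by following the wait-for chain, and aborts the requester, which is just one possible victim policy. The names LockManager, request and release_all are hypothetical.

```python
from collections import defaultdict

class LockManager:
    """Toy strict-2PL lock manager: exclusive locks only, one pending request per txn."""

    def __init__(self):
        self.owner = {}                  # object -> transaction holding its lock
        self.held = defaultdict(set)     # transaction -> objects it has locked
        self.waits_for = {}              # waiting transaction -> lock holder it waits for

    def request(self, txn, obj):
        """Return 'granted', 'wait', or 'deadlock' (the requester should then abort)."""
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn
            self.held[txn].add(obj)
            self.waits_for.pop(txn, None)
            return "granted"
        # Follow the wait-for chain from the holder; reaching txn means a cycle.
        t = holder
        while t is not None:
            if t == txn:
                return "deadlock"
            t = self.waits_for.get(t)
        self.waits_for[txn] = holder
        return "wait"

    def release_all(self, txn):
        """Strict 2PL: all of txn's locks are released together, at commit/abort."""
        for obj in self.held.pop(txn, set()):
            del self.owner[obj]
        # Drop wait edges involving txn; former waiters must re-issue their request.
        self.waits_for = {w: h for w, h in self.waits_for.items()
                          if w != txn and h != txn}

# The slide's scenario: Tk holds Y, Ti holds X, then each asks for the other's object.
lm = LockManager()
steps = [
    lm.request("Tk", "Y"),   # granted
    lm.request("Ti", "X"),   # granted
    lm.request("Tk", "X"),   # wait: Tk now waits for Ti
    lm.request("Ti", "Y"),   # Ti would wait for Tk, closing the cycle
]
lm.release_all("Ti")                  # abort Ti (it would be restarted later)
steps.append(lm.request("Tk", "X"))   # Tk can now proceed
print(steps)
```

Because a new wait edge is refused whenever it would close a cycle, the wait-for graph stays acyclic and the chain walk in request always terminates.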

Ensuring Atomicity
- DBMSs ensure atomicity (the all-or-nothing property), even if the system crashes in the middle of a transaction.
- Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of transactions:
  - Before a change is made to the database, the corresponding log entry is forced to a safe location. (Write-ahead logging, or WAL, protocol; OS support for this is often inadequate.)
  - After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if a log entry wasn't saved before the crash, the corresponding change was not applied to the database!)

The Log
- The following actions are recorded in the log:
  - Ti writes an object: the old value and the new value; the log record must go to disk before the changed page!
  - Ti commits/aborts: a log record indicating this action.
- Log records are chained together by transaction id, so it's easy to undo a specific transaction (e.g., to resolve a deadlock).
- The log is often duplexed and archived on "stable" storage.
- All log-related activities (and in fact, all concurrency control (CC) activities such as lock/unlock, dealing with deadlocks, etc.) are handled transparently by the DBMS.
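A minimal undo-only sketch of the log and the WAL rule, assuming a Python dict for the database pages and a Python list standing in for the log on stable storage. Every change is logged (old and new value) before it is applied, so after a crash, undoing the logged updates of transactions without a commit record restores a consistent state. Real recovery managers also redo committed work (that is what the new value is for) and follow per-transaction back-chains instead of scanning the whole log; WalStore and its methods are hypothetical names, and explicit aborts are omitted (an unfinished transaction is simply undone).

```python
class WalStore:
    """Toy undo logging that obeys the WAL rule."""

    def __init__(self, initial):
        self.disk = dict(initial)   # the database pages: object -> value
        self.log = []               # stands in for the log on stable storage

    def write(self, txn, obj, new_value):
        old_value = self.disk.get(obj)
        # WAL rule: the log record (old and new value) is forced first ...
        self.log.append(("update", txn, obj, old_value, new_value))
        # ... and only then is the changed page written.
        self.disk[obj] = new_value

    def commit(self, txn):
        self.log.append(("commit", txn))

    def recover(self):
        """Undo, newest first, every update of a transaction with no commit record."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, obj, old_value, _ = rec
                if old_value is None:
                    self.disk.pop(obj, None)    # the object did not exist before
                else:
                    self.disk[obj] = old_value

store = WalStore({"A": 100, "B": 50})
store.write("T1", "A", 70)      # T1 moves 30 from A to B ...
store.write("T1", "B", 80)
store.commit("T1")              # ... and commits
store.write("T2", "A", 0)       # T2 is partway through when the system crashes
store.recover()
print(store.disk)
```

Undoing in reverse log order matters when one transaction writes the same object twice: the oldest before-image is restored last, so it wins.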

Databases Make Folks Happy...
- End users and DBMS vendors
- Database application programmers, e.g., smart webmasters
- Database administrators (DBAs)
  - Design logical/physical schemas
  - Handle security and authorization
  - Data availability, crash recovery
  - Database tuning as needs evolve
Must understand how a DBMS works!

Structure of a DBMS
- A typical DBMS has a layered architecture.
- The figure does not show the concurrency control and recovery components.
- This is one of several possible architectures; each system has its own variation.
[Figure: layered DBMS architecture, from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]
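The Buffer Management layer in the architecture figure sits between Files and Access Methods and Disk Space Management. A minimal sketch of what it does, assuming LRU replacement (one common policy, not the only one) and a dict standing in for the disk: pages are pinned while in use, only unpinned pages can be evicted, and a dirty page is written back before its frame is reused. BufferPool, pin and unpin are hypothetical names.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: a fixed number of frames, LRU replacement of unpinned pages."""

    def __init__(self, disk, capacity):
        self.disk = disk               # page_id -> contents (stands in for the disk layer)
        self.capacity = capacity
        self.frames = OrderedDict()    # page_id -> [contents, pin_count, dirty]; LRU first
        self.disk_reads = 0

    def pin(self, page_id):
        """Bring the page into memory if needed, pin it and return its contents."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)      # hit: now most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], 0, False]
            self.disk_reads += 1
        frame = self.frames[page_id]
        frame[1] += 1
        return frame[0]

    def unpin(self, page_id, new_contents=None):
        """Release one pin; passing new_contents marks the page dirty."""
        frame = self.frames[page_id]
        frame[1] -= 1
        if new_contents is not None:
            frame[0], frame[2] = new_contents, True

    def _evict(self):
        victim = next((pid for pid, f in self.frames.items() if f[1] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        contents, _, dirty = self.frames.pop(victim)
        if dirty:
            self.disk[victim] = contents          # write back before reusing the frame

disk = {1: "page 1", 2: "page 2", 3: "page 3"}
pool = BufferPool(disk, capacity=2)
pool.pin(1); pool.unpin(1)
pool.pin(2); pool.unpin(2, new_contents="page 2 (updated)")
pool.pin(1); pool.unpin(1)      # hit: page 1 becomes most recently used
pool.pin(3); pool.unpin(3)      # evicts page 2 (LRU, unpinned) and writes it back
print(pool.disk_reads, list(pool.frames), disk[2])
```

In a real DBMS the write-back of a dirty page is exactly where the WAL rule applies: the page's log records must reach stable storage first.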

Database Languages
A DBMS supports several languages and several modes of use:
- Interactive textual languages, such as SQL;
- Interactive commands embedded in a host programming language (Pascal, C, Cobol, Java, etc.);
- Interactive commands embedded in ad-hoc development languages (known as 4GL), usually with additional features (e.g., for the production of forms, menus, reports, ...);
- Form-oriented, non-textual user-friendly languages such as QBE.

SQL, an Interactive Language

    SELECT Course, Room, Building
    FROM Rooms, Courses
    WHERE Code = Room
      AND Rooms.Floor = 'Ground'

ROOMS
  Code   Building   Floor
  DS1    Ex-OMI     Ground
  N3     Ex-OMI     Ground
  G      Science    Third

COURSES
  Course     Room   Floor
  Networks   N3     Ground
  Systems    N3     Ground
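The query above can be tried in SQLite from Python's standard sqlite3 module, using the ROOMS and COURSES tables as they appear on the slide. Because both tables have a Floor column, the condition has to name Rooms.Floor; the sketch also shows that SQLite rejects the unqualified version as ambiguous. The choice of SQLite and the added ORDER BY (for a deterministic result) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rooms   (Code TEXT, Building TEXT, Floor TEXT);
CREATE TABLE Courses (Course TEXT, Room TEXT, Floor TEXT);
INSERT INTO Rooms VALUES ('DS1', 'Ex-OMI', 'Ground'),
                         ('N3', 'Ex-OMI', 'Ground'),
                         ('G', 'Science', 'Third');
INSERT INTO Courses VALUES ('Networks', 'N3', 'Ground'),
                           ('Systems', 'N3', 'Ground');
""")

# The slide's query, with the Floor column qualified by its table.
rows = conn.execute("""
    SELECT Course, Room, Building
    FROM Rooms, Courses
    WHERE Code = Room
      AND Rooms.Floor = 'Ground'
    ORDER BY Course
""").fetchall()
print(rows)

# Leaving Floor unqualified is rejected, because both tables have a Floor column.
try:
    conn.execute("SELECT Course FROM Rooms, Courses WHERE Code = Room AND Floor = 'Ground'")
    ambiguous = False
except sqlite3.OperationalError as err:
    ambiguous = True
    print(err)
```

The query says only what is wanted; the DBMS chooses how to evaluate the join, which is the declarative style the "Why Use a DBMS?" slide refers to.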


SQL Embedded in Pascal

    { host variables city, name, salary and raise are declared elsewhere (not shown) }
    write('city name? '); readln(city);
    EXEC SQL DECLARE E CURSOR FOR
        SELECT NAME, SALARY
        FROM EMPLOYEES
        WHERE CITY = :city
        FOR UPDATE OF SALARY;
    EXEC SQL OPEN E;
    EXEC SQL FETCH E INTO :name, :salary;
    while SQLCODE = 0 do begin          { SQLCODE stays 0 while FETCH returns a row }
        write('employee: ', name, ' raise? ');
        readln(raise);
        { update the EMPLOYEES row the cursor is positioned on }
        EXEC SQL UPDATE EMPLOYEES SET SALARY = SALARY + :raise
            WHERE CURRENT OF E;
        EXEC SQL FETCH E INTO :name, :salary;
    end;
    EXEC SQL CLOSE E;

SQL Embedded in an Ad-Hoc Language (Oracle PL/SQL)

    declare
        Salary number;                   -- local variable; the column itself is Emp.Sal
    begin
        select Sal into Salary from Emp where Code = '5788'
            for update of Sal;
        if Salary > 30000000 then        -- "30M" on the original slide
            update Emp set Sal = Salary * 1.1 where Code = '5788';
        else
            update Emp set Sal = Salary * 1.2 where Code = '5788';
        end if;
        commit;
    exception
        when no_data_found then
            insert into Errors
                values ('No employee has given code', sysdate);
    end;
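A rough Python counterpart of the two embedded-SQL examples, assuming SQLite in place of Oracle: ? placeholders play the role of host variables such as :city, and an explicit transaction plays the role of the PL/SQL block's commit. SQLite has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE is used to take the write lock before reading. The give_raise function, the THRESHOLD constant, the Errors table layout and the employee rows are hypothetical; unlike the slide, the sketch also commits the Errors row.

```python
import sqlite3

THRESHOLD = 30_000_000   # stands in for the slide's "30M"

def give_raise(conn, code):
    """Hypothetical mirror of the PL/SQL block: +10% above THRESHOLD, +20% otherwise."""
    conn.execute("BEGIN IMMEDIATE")   # take the write lock before reading, like FOR UPDATE
    try:
        row = conn.execute("SELECT Sal FROM Emp WHERE Code = ?", (code,)).fetchone()
        if row is None:                            # PL/SQL: when no_data_found
            conn.execute("INSERT INTO Errors VALUES (?, datetime('now'))",
                         ("No employee has given code",))
            new_salary = None
        else:
            pct = 110 if row[0] > THRESHOLD else 120
            new_salary = row[0] * pct // 100       # integer arithmetic avoids float rounding
            conn.execute("UPDATE Emp SET Sal = ? WHERE Code = ?", (new_salary, code))
        conn.execute("COMMIT")
        return new_salary
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None: the sqlite3 module issues no implicit BEGIN, so the
# explicit BEGIN/COMMIT/ROLLBACK above delimit each transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE Emp (Code TEXT PRIMARY KEY, Sal INTEGER);
CREATE TABLE Errors (Msg TEXT, At TEXT);
""")
conn.executemany("INSERT INTO Emp VALUES (?, ?)",
                 [("5788", 40_000_000), ("1234", 20_000_000)])   # hypothetical rows

results = [give_raise(conn, "5788"), give_raise(conn, "1234"), give_raise(conn, "0000")]
errors = conn.execute("SELECT COUNT(*) FROM Errors").fetchone()[0]
print(results, errors)
```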
Form-Based Interface (in Access)
[Figure: a form-based interface in Access.]

DBMS Languages
- DML: data manipulation language
- DDL: data definition language (allows definition of database schema)
- 4GL: fourth generation language, useful for declarative query processing, report generation
[Figure: how the Host Programming Language, DDL, DML and 4GL relate to the DBMS and the Database.]
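The DDL/DML split can be seen in a few statements, assuming SQLite via Python's sqlite3 module, with Python acting as the host language. DDL statements change the schema recorded in the catalog (sqlite_master in SQLite), while DML statements change the rows stored under that schema. The Emp columns and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define (and evolve) the schema. This changes the catalog, not the data.
conn.execute("CREATE TABLE Emp (Code TEXT PRIMARY KEY, Name TEXT, Sal INTEGER)")
conn.execute("ALTER TABLE Emp ADD COLUMN City TEXT")

# DML: manipulate the data stored under that schema.
conn.execute("INSERT INTO Emp (Code, Name, Sal, City) "
             "VALUES ('5788', 'Rossi', 40000000, 'Toronto')")
conn.execute("UPDATE Emp SET Sal = Sal + 1000 WHERE Code = '5788'")
conn.commit()

# SQLite keeps the schema in its own catalog table, sqlite_master.
schema = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'Emp'").fetchone()[0]
salary = conn.execute("SELECT Sal FROM Emp WHERE Code = '5788'").fetchone()[0]
print(schema)
print(salary)
```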


DBMS Technology: Pros and Cons
Pros
- Data are handled as a common resource.
- Centralized management and economy of scale.
- Availability of integrated services, reduction of redundancies and inconsistencies.
- Data independence (useful for the development and maintenance of applications).
Cons
- Costs of DBMS products (and associated tools), also of data migration.
- Difficulty in separating features and services (with potential lack of efficiency).

Conventional Files vs Databases
Files
- Advantages: many already exist; good for simple applications; very efficient.
- Disadvantages: data duplication; hard to evolve; hard to build for complex applications.
Databases
- Advantages: good for data integration; allow for more flexible formats (not just records).
- Disadvantages: high cost; drawbacks in a centralized facility.
The future is with databases!

Types of DBMSs
- Conventional: relational, network, hierarchical; consist of records of many different record types (database looks like a collection of files).
- Object-Oriented: database consists of objects (and possibly associated programs); database schema consists of classes (which can be objects too).
- Multimedia: database can store formatted data (i.e., records) but also text, pictures, ...
- Active databases: database includes event-condition-action rules.
- Deductive databases*: like large Prolog programs, not available commercially.

The Hierarchical Data Model
Database consists of hierarchical record structures; a field may have as value a list of records; every record has at most one parent.
[Figure: hierarchical record example with record types Book (B365, War & Peace, $8.99), Borrower (38 Elm, Toronto) and Borrowing (Jan 28, 1994; Feb 24, 1994), linked by parent/children edges.]
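A minimal sketch of a hierarchical record structure in Python, assuming one reading of the figure (Book as root, Borrower below it, Borrowing below that; the slide does not say what the two dates mean). The children list is a field whose value is a list of records, and add() enforces the rule that every record has at most one parent. Record, walk and the second book are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    rtype: str
    values: tuple
    children: list = field(default_factory=list)   # a field whose value is a list of records
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)

    def add(self, child):
        """Attach child under this record, enforcing 'at most one parent'."""
        if child.parent is not None:
            raise ValueError("a record has at most one parent")
        child.parent = self
        self.children.append(child)
        return child

def walk(rec, depth=0):
    """Pre-order traversal: records are reached only by descending from the root."""
    yield depth, rec.rtype, rec.values
    for child in rec.children:
        yield from walk(child, depth + 1)

# Assumed reading of the slide's figure: Book -> Borrower -> Borrowing.
book = Record("Book", ("B365", "War & Peace", "$8.99"))
borrower = book.add(Record("Borrower", ("38 Elm", "Toronto")))
borrower.add(Record("Borrowing", ("Jan 28, 1994", "Feb 24, 1994")))

for depth, rtype, values in walk(book):
    print("  " * depth + rtype, values)

other_book = Record("Book", ("B366", "Anna Karenina", "$9.99"))   # hypothetical
try:
    other_book.add(borrower)      # would give the Borrower a second parent
except ValueError as err:
    print(err)
```

The single-parent rule is what makes many-to-many facts awkward here: a borrower of two books must be stored twice, once under each book.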

The Network Data Model
A database now consists of records with pointers (links) to other records. Offers a navigational view of a database.
[Figure: network schema with record types Customer, Order, Ordered Part, Part, Sales History and Region connected by 1::n links; cycles of links are allowed.]

Comparing Data Models
- The oldest DBMSs were hierarchical, dating back to the mid-60s. IMS (IBM product) is the most popular among them. Many old databases are hierarchical.
- The network data model came next (early '70s). It views the database programmer as a "navigator", chasing links (pointers, actually) around a database.
- The network model was found to be too implementation-oriented, not sufficiently insulating the programmer from implementation features of network DBMSs.
- The relational model is the most recent arrival. Relational databases are cleaner because they don't allow links/pointers (necessarily implementation-dependent).
- Even though the relational model was proposed in 1970, it didn't take over the database market till the 80s.
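The contrast drawn on the comparison slide, navigating pointers versus querying by values, can be sketched with hypothetical data. The network side uses Python object references as the links and simplifies the figure by folding Ordered Part into a list on each Order; the relational side stores the same facts in SQLite tables (an assumed engine) and lets a join replace the pointers. All names and values are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field

# Network style: records point directly at other records (1::n links).
@dataclass(eq=False)
class Order:
    number: int
    parts: list = field(default_factory=list)    # simplification: Ordered Part folded in

@dataclass(eq=False)
class Customer:
    name: str
    orders: list = field(default_factory=list)   # 1::n link Customer -> Order

acme = Customer("Acme")                           # hypothetical data
acme.orders.append(Order(1, ["bolt", "nut"]))
acme.orders.append(Order(2, ["washer"]))

def parts_ordered_navigational(customer):
    """The programmer acts as 'navigator', following each link by hand."""
    found = []
    for order in customer.orders:
        for part in order.parts:
            found.append(part)
    return sorted(found)

# Relational style: the same facts as tables; matching values replace pointers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (Number INTEGER, Customer TEXT);
CREATE TABLE OrderedPart (OrderNumber INTEGER, Part TEXT);
INSERT INTO Orders VALUES (1, 'Acme'), (2, 'Acme');
INSERT INTO OrderedPart VALUES (1, 'bolt'), (1, 'nut'), (2, 'washer');
""")
parts_relational = [row[0] for row in conn.execute("""
    SELECT p.Part
    FROM Orders o JOIN OrderedPart p ON p.OrderNumber = o.Number
    WHERE o.Customer = 'Acme'
    ORDER BY p.Part
""")]

print(parts_ordered_navigational(acme), parts_relational)
```

If the storage of orders changes (say, orders move into a different structure), the navigational function must be rewritten, while the SQL query only depends on the relation names and columns.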
Summary
- DBMSs are used to maintain and query large datasets.
- Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
- Levels of abstraction give data independence.
- A DBMS typically has a layered architecture.
- DBAs hold responsible jobs and are well-paid!
- DBMS R&D is one of the broadest, most exciting areas in CS.

