Lecture 2
DBMS
|
|
Definitions
Data: Meaningful facts, text, graphics,
images, sound, video segments
Database: An organized collection of
logically related data
Information: Data processed to be
useful in decision making
Metadata: Properties/characteristics
that describe data
|
|
What is a database (DB)?
A collection of data that exists over a
long period of time, often many years
Managed through a database
management system
|
|
What is a database management
system (DBMS)?
A powerful tool for creating and
managing [and manipulating] large
amounts of data [(several gigabytes)]
efficiently and allowing it to persist
over long periods of time, safely
Focus on secondary, rather than main,
memory
Powerful, but simple, programming
interface
|
|
A Simple Data Management
Problem
Suppose we want to save Names, Phone
Numbers...
Solution 1 (Paper based)
- A blank notebook OR a phone/address book
- Entries recorded by pen, in time order
Advantages
- Cheap, simple, private, reliable, space efficient
Disadvantages
- Hard to search, update, share, expand
- Hard to add information, e.g. email addresses
|
|
Another approach
The Traditional File
Processing Environment
Use of Notepad, MS Word, MS Excel
|
|
DBMS vs. `just a file system'
DBMS's evolved from file systems
file systems also store large amounts of data
over a long period of time in secondary
memory
however, file systems
- can lack efficient access
- have no direct support for queries
- limit organization to directory creation and
hierarchical organization
- have no sophisticated support for concurrency
- do not ensure durability
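A minimal sketch of how a DBMS addresses the search, update, and expand problems of the phone-book example, using Python's built-in sqlite3 module. The `contacts` table, its columns, and the sample rows are assumptions made up for illustration; they are not part of the lecture.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT PRIMARY KEY, phone TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Ada", "555-0100"), ("Bob", "555-0101"), ("Cy", "555-0102")],
)

# Search: a declarative query instead of scanning a file by hand.
phone = conn.execute(
    "SELECT phone FROM contacts WHERE name = ?", ("Bob",)
).fetchone()[0]

# Update: change one entry in place.
conn.execute("UPDATE contacts SET phone = ? WHERE name = ?", ("555-0199", "Bob"))

# Expand: add a new kind of information (email) without rewriting old entries.
conn.execute("ALTER TABLE contacts ADD COLUMN email TEXT")
conn.execute(
    "UPDATE contacts SET email = ? WHERE name = ?", ("ada@example.org", "Ada")
)

rows = conn.execute(
    "SELECT name, phone, email FROM contacts ORDER BY name"
).fetchall()
print(phone)
print(rows)
```

Entries that existed before the `email` column was added simply hold NULL (Python `None`) for it, which is the kind of expansion the paper notebook made hard.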
|
|
ACID properties
( All good DBMS's should guarantee these )
Atomicity
- should not be able to execute half of an operation
- either all or none of the effects of a transaction are made permanent
Consistency
- The consistency property ensures that any transaction the database performs will take it from one
consistent state to another.
- there should be no surprises in the world, e.g., no GPA > 4.0, no balance < 0, and cats should never
have more than 1 tail!
- the effect of concurrent transactions is equivalent to some serial execution
- use constraints, triggers, active DB elements (context-free)
Isolation
- Isolation refers to the requirement that other operations cannot access data that has been
modified during a transaction that has not yet completed.
- concurrency control
- transactions should not be able to observe the partial effects of other transactions
- use locks (whole relations or individual tuples?)
Durability
- Durability is the ability of the DBMS to recover the committed transaction updates against any kind
of system failure (hardware or software).
- if power goes out, nothing bad should happen
- once accepted, the effects of a transaction are permanent (until, of course, changed by another
transaction)
- use logs
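Atomicity and consistency can be sketched together with a bank transfer, again using Python's sqlite3 module. The `accounts` table, the `transfer` and `balances` helpers, and the amounts are hypothetical. The database is in-memory, so isolation between concurrent transactions and durability across a crash are not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Consistency: a CHECK constraint rejects any state with a negative balance.
conn.execute(
    "CREATE TABLE accounts (owner TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by owner."""
    return dict(conn.execute("SELECT owner, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # valid: both updates commit
after_ok = balances(conn)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
after_failed = balances(conn)
print(ok, after_ok)
print(failed, after_failed)
```

The credit is deliberately applied before the debit so that the failing transfer has a partial effect to undo: after the rollback, neither half of the transfer is visible.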
|
|
Applications of database
systems
reservation systems, banking systems
Network simulations / Experiments
record/book keeping (corporate, university, medical), statistics
bioinformatics, e.g., gene databases
criminal justice
- fingerprint matching
- how do you encode `looks like'?
multimedia systems
- require terabytes (10^12 bytes) of storage
- tertiary storage devices, e.g., CDs, DVDs
- image/audio/video retrieval
- streaming, interactivity
satellite imaging; can require petabytes (10^15 bytes) of storage
the web
- client-server and multi-tier architectures
- almost all data-intensive websites are database-driven; IMDB.com is an exception
information integration
- over the web
- legacy systems; must deal with issues of
  - synonymy: different words having the same meaning, e.g., coffee shop vs. café
  - polysemy: same word (homonym) having different meanings, e.g., shot
- data warehouses
- data mining (KDD, Knowledge Discovery in Databases), e.g., association rules: `diapers → beer';
we pass these on to the marketing folks
in sum, databases are everywhere!
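A small sketch of how an association rule such as the diapers/beer example under data mining is usually scored, using the standard support and confidence measures. The basket data below is a made-up toy set, not real sales data, and the helper names are illustrative.

```python
# Toy market-basket data; hypothetical, for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Among baskets containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)


s = support({"diapers", "beer"}, baskets)
c = confidence({"diapers"}, {"beer"}, baskets)
print(f"support = {s:.2f}, confidence = {c:.2f}")
```

In this toy set, 2 of 5 baskets contain both items, and 2 of the 3 baskets with diapers also contain beer. A rule is typically reported only when both measures exceed chosen thresholds.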
|
|
Three classical data models
hierarchical model
network model
- each tuple is a separate record
- no separation between logical and physical views
- used record-at-a-time languages
- too low-level
relational model
- most popular and successful model
- de facto standard for databases
- (relational) databases are one of the most popular success stories of
simple theoretical ideas
semistructured data and XML
- semistructured data is self-describing
- web data tends to be semistructured
- in between structured and unstructured data (free text)
- the study of the storage and retrieval of unstructured data is called IR
(Information Retrieval)
|
|
Main themes of relational database
management systems (RDBMS's)
data stored in a relation (for now, a table), e.g., a simple relation
[Figure: three levels, from top to bottom: views, relations, physical storage]
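The three levels named on this slide (views, relations, physical storage) can be sketched with Python's sqlite3 module. The `Students` columns `id`, `major`, and `GPA` follow the lecture's later query; the `name` column, the sample rows, the index `idx_major`, and the view `CPSStudents` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation level: the logical table that queries are written against.
conn.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT, major TEXT, GPA REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [(1, "Ann", "CPS", 3.9), (2, "Ben", "MATH", 3.4), (3, "Cat", "CPS", 3.2)],
)

# Physical level: an index changes how rows are located, not what queries return.
conn.execute("CREATE INDEX idx_major ON Students(major)")

# View level: a named query that exposes only part of the relation.
conn.execute(
    "CREATE VIEW CPSStudents AS SELECT id, name FROM Students WHERE major = 'CPS'"
)

view_rows = conn.execute("SELECT id, name FROM CPSStudents ORDER BY id").fetchall()
print(view_rows)
```

Users of `CPSStudents` never see GPAs or other majors, and the index could be dropped without changing any query result. That separation between levels is the point of the diagram.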
|
|
Contd...
gives rise to powerful, yet declarative, relation-at-a-time
query languages, e.g., SQL (Structured Query
Language; pronounced `sequel')
a simple SQL query illustrating the SELECT-FROM-WHERE
construct
- SELECT id FROM Students WHERE major = 'CPS' AND GPA > 3.7;
relational query languages (QLs) are declarative
- you specify what you want, not how to get it (à la PROLOG)
- e.g., SQL
closure property: the result of a query on relations is itself a relation, so queries can be nested
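The SELECT-FROM-WHERE query from this slide can be run as written with Python's sqlite3 module. The table definition and sample rows are assumptions, since the slide gives only the query. An `ORDER BY id` is added solely to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; the slide specifies only the query itself.
conn.execute("CREATE TABLE Students (id INTEGER PRIMARY KEY, major TEXT, GPA REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [(1, "CPS", 3.9), (2, "CPS", 3.5), (3, "MATH", 3.8), (4, "CPS", 3.75)],
)

# Declarative: we state which rows we want, not how to find them.
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM Students WHERE major = 'CPS' AND GPA > 3.7 ORDER BY id"
    )
]

# Closure: a query result is itself a relation, so it can feed another query.
count = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT id FROM Students WHERE major = 'CPS' AND GPA > 3.7)"
).fetchone()[0]

print(ids, count)
```

The nested query treats the inner result exactly like a stored table, which is what the closure property allows.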
|
|
How can we study database
systems?
design of databases, i.e., how do you
structure your data in a database?
- entity-relationship (E/R) model
- relational model
database programming
- how do you use a DBMS?
- study query languages such as SQL
|
|