CPS 116 Introduction To Database Systems
Course goals
Random things you might do (for fun or profit) after taking this course
Develop your own database-driven Web sites (like Amazon, eBay, etc.)
Be a power user of commercial database systems
Upgrade your Web sites with XML
Explain to friends why MySQL is not a real database system without InnoDB or Berkeley DB support
Course roadmap
Relational databases
Relational algebra, database design, SQL, application programming
XML
Data model and query languages, application programming, interplay between XML and relational databases
Database internals
Storage, indexing, query processing and optimization, concurrency control and recovery
Research topics
Data warehousing and mining, stream data processing, etc.
Sounds simple!
1001#Springfield#Mr. Morgan
... ...
00987-00654#Ned Flanders#2500.00
00123-00456#Homer Simpson#400.00
00142-00857#Montgomery Burns#1000000000.00
... ...
Query
What happens when the query changes to: What's the balance in account 00142-00857?
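A minimal sketch of answering that query directly against the flat file, assuming each account record has the form `id#owner#balance` as in the sample lines. The function name and the in-memory list standing in for the file are illustrative, not part of the course material.

```python
from decimal import Decimal

# Stand-in for the account lines of the data file (copied from the sample above).
ACCOUNT_LINES = [
    "00987-00654#Ned Flanders#2500.00",
    "00123-00456#Homer Simpson#400.00",
    "00142-00857#Montgomery Burns#1000000000.00",
]


def find_balance(lines, account_id):
    """Linear scan: split each record on '#' and compare the first field."""
    for line in lines:
        fields = line.split("#")
        if fields[0] == account_id:
            return Decimal(fields[2])
    return None  # no such account


print(find_balance(ACCOUNT_LINES, "00142-00857"))
```

Every lookup reads records until it finds a match, and a lookup on a different field needs different code. That is the kind of repeated work the observations that follow are about.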
Observations
Tons of tricks (not only in query processing, but also in storage, concurrency control, recovery, etc.)
Different tricks may work better in different usage scenarios (example?)
Same tricks get used over and over again in different applications
We need a library, or better yet, a server (to support sharing, backup, etc.)
Early efforts
Factoring out data management functionalities from applications and standardizing these functionalities is an important first step
CODASYL standard (circa 1960s)
Bachman got a Turing Award for this in 1973
But getting the abstraction right (the API between applications and the DBMS) is still tricky
CODASYL
Query: Who has accounts with 0 balance managed by a branch in Springfield?
Pseudo-code of a CODASYL application:
Use index on account(balance) to get accounts with 0 balance;
For each account record:
    Get the branch id of this account;
    Use index on branch(id) to get the branch record;
    If the branch record's location field reads Springfield:
        Output the owner field of the account record.
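The navigation above can be sketched in Python, with dictionaries standing in for the two indexes. The record fields, the sample rows, and the function name are made up for illustration. The point is that the application itself chooses which index to start from and in what order to follow pointers.

```python
from collections import defaultdict

# Made-up sample records (assumption: accounts carry a branch_id field).
branches = [
    {"id": 1001, "location": "Springfield"},
    {"id": 1002, "location": "Shelbyville"},
]
accounts = [
    {"id": "A-1", "owner": "Homer Simpson", "balance": 0, "branch_id": 1001},
    {"id": "A-2", "owner": "Ned Flanders", "balance": 2500, "branch_id": 1001},
    {"id": "A-3", "owner": "Moe Szyslak", "balance": 0, "branch_id": 1002},
]

# Hand-built "indexes": account(balance) and branch(id).
account_by_balance = defaultdict(list)
for account in accounts:
    account_by_balance[account["balance"]].append(account)
branch_by_id = {branch["id"]: branch for branch in branches}


def zero_balance_owners(location):
    """Start from the balance index, then navigate to each account's branch."""
    owners = []
    for account in account_by_balance.get(0, []):
        branch = branch_by_id[account["branch_id"]]
        if branch["location"] == location:
            owners.append(account["owner"])
    return owners


print(zero_balance_owners("Springfield"))
```

If most accounts had a zero balance, starting from the branches in Springfield would likely be cheaper, but switching strategies means rewriting this function.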
What's wrong?
When data/workload characteristics change
The best navigation strategy changes
The best way of organizing the data changes
Programmer specifies what answers a query should return, but not how the query is executed
DBMS picks the best execution strategy based on availability of indexes, data/workload characteristics, etc.
Provides physical data independence
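As a hedged illustration of the declarative style, the sketch below states the zero-balance-in-Springfield query in SQL, using Python's built-in `sqlite3` module. The table names, columns, and rows are assumptions made up for this example. Adding an index afterwards changes nothing in the query text; the system is free to use it or not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE account (
        id TEXT PRIMARY KEY,
        owner TEXT,
        balance NUMERIC,
        branch_id INTEGER REFERENCES branch(id)
    );
    INSERT INTO branch VALUES (1001, 'Springfield'), (1002, 'Shelbyville');
    INSERT INTO account VALUES
        ('A-1', 'Homer Simpson', 0, 1001),
        ('A-2', 'Ned Flanders', 2500, 1001),
        ('A-3', 'Moe Szyslak', 0, 1002);
""")

# The query says WHAT to return, not which index to use or where to start.
QUERY = """
    SELECT a.owner
    FROM account AS a JOIN branch AS b ON a.branch_id = b.id
    WHERE a.balance = 0 AND b.location = ?
    ORDER BY a.owner
"""


def zero_balance_owners(location):
    return [row[0] for row in conn.execute(QUERY, (location,))]


owners_before = zero_balance_owners("Springfield")

# Change the physical design; the application code above stays untouched.
conn.execute("CREATE INDEX account_balance_idx ON account(balance)")
owners_after = zero_balance_owners("Springfield")

print(owners_before, owners_after)
```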
Applications should not need to worry about how data is physically structured and stored
Applications should work with a logical data model and declarative query language
Leave the implementation details and optimization to the DBMS
The single most important reason behind the success of DBMS today
And a Turing Award for E. F. Codd
What else?
DBMS is multi-user
Example
get account balance from database;
if balance > amount of withdrawal then
    balance = balance - amount of withdrawal;
    dispense cash;
    store new balance into database;
Homer at ATM1 withdraws $100
Marge at ATM2 withdraws $50
Initial balance = $400, final balance = ?
Homer withdraws $100, Marge withdraws $50, with their steps interleaved:
Homer: read balance; $400
Marge: read balance; $400
Homer: if balance > amount then balance = balance - amount; $300
Homer: write balance; $300
Marge: if balance > amount then balance = balance - amount; $350
Marge: write balance; $350
Final balance = $350 (Marge's write overwrites Homer's; the correct result is $250)
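A small deterministic sketch of this anomaly, assuming one particular interleaving: both ATMs read before either writes, and Homer's write lands first. The function names and the dictionary standing in for the database are illustrative. A serial run of the same two withdrawals is included for comparison.

```python
def run_interleaved(initial, homer_amount, marge_amount):
    """Both sessions read the balance before either one writes it back."""
    db = {"balance": initial}

    homer_balance = db["balance"]          # Homer: read balance
    marge_balance = db["balance"]          # Marge: read balance

    if homer_balance > homer_amount:       # Homer: compute and write
        homer_balance -= homer_amount
    db["balance"] = homer_balance

    if marge_balance > marge_amount:       # Marge: compute from her stale read
        marge_balance -= marge_amount
    db["balance"] = marge_balance          # overwrites Homer's update

    return db["balance"]


def run_serial(initial, amounts):
    """Each withdrawal runs to completion before the next one starts."""
    db = {"balance": initial}
    for amount in amounts:
        balance = db["balance"]
        if balance > amount:
            balance -= amount
        db["balance"] = balance
    return db["balance"]


print(run_interleaved(400, 100, 50), run_serial(400, [100, 50]))
```

The interleaved run loses Homer's withdrawal, while the serial run does not. Concurrency control in a DBMS aims to allow interleaving for performance while producing results equivalent to some serial order.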
Recovery in DBMS
Example: balance transfer
decrement the balance of account X by $100;
increment the balance of account Y by $100;
Scenario 1: Power goes out after the first instruction
Scenario 2: DBMS buffers and updates data in memory (for efficiency); before they are written back to disk, power goes out
Log updates; undo/redo during recovery
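Scenario 1 can be approximated with Python's built-in `sqlite3` module, under loud assumptions: a raised exception stands in for the power failure, and SQLite's own rollback mechanism stands in for the undo step of recovery. The table layout and the `transfer` helper are made up for this sketch; it shows the all-or-nothing outcome, not how a log is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('X', 500), ('Y', 200)")
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            if fail_midway:
                # Stand-in for the power going out after the first instruction.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the partial update has already been undone


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


transfer(conn, "X", "Y", 100, fail_midway=True)
after_failure = balances(conn)

transfer(conn, "X", "Y", 100)
after_success = balances(conn)

print(after_failure, after_success)
```

After the failed attempt neither balance has changed, so the $100 is not lost; after the successful attempt both updates are visible together.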
Massive amounts of data (terabytes to petabytes)
High throughput (thousands to millions of transactions per minute)
High availability (99.999% uptime)
Layers: DBMS, OS, disk(s)
The OS layer is often bypassed for performance and safety
Many details will be filled in the DBMS box
End users: query/update databases through application user interfaces (e.g., Amazon.com, 1-800-DISCOVER, etc.)
Database designers: design database schema to model aspects of the real world
Database application developers: build applications that interface with databases
Database administrators (a.k.a. DBAs): load, back up, and restore data, fine-tune databases for performance
DBMS implementors: develop the DBMS or specialized data management software, implement new techniques for query processing and optimization
Course information
Book
Database Systems: The Complete Book, by H. Garcia-Molina, J. D. Ullman, and J. Widom
Get the value-pack edition with free access to Gradiance when it comes out (check Web site for updates)
Web site
http://www.cs.duke.edu/courses/fall04/cps116/
Course information; tentative syllabus and reference sections in GMUW; lecture slides, assignments, programming notes
Course load
Four homework assignments (35%)
Include written and programming problems as well as online exercises on Gradiance