01-introduction-annotated
Course Introduction
Welcome!
Today’s Agenda
Course Introduction Course Outline & Logistics
• You want to learn how to make database systems scalable, for example, to support
web or mobile applications with millions of users.
• You want to make applications that are highly available (i.e., minimizing downtime)
and operationally robust.
• You have a natural curiosity for the way things work and want to know what goes on
inside major websites and online services.
• You are looking for ways of making systems easier to maintain in the long run, even as
they grow and as requirements and technologies change.
• If you are good enough to write code for a database system, then you can write code
for almost anything else.
Course Objectives
Course Topics
Background
• I assume that you have already taken an intro course on database systems (e.g., GT
4400).
• We will discuss modern variations of classical algorithms that are designed for today’s
hardware.
• Things that we will not cover: SQL, Relational Algebra, Basic Algorithms + Data
Structures.
Background
Course Logistics
Course Logistics
• Course Policies
▶ The programming assignments and exercise sheets must be your own work.
▶ They are not group assignments.
▶ You may not copy source code from other people or the web.
▶ Plagiarism will not be tolerated.
• Academic Honesty
▶ Refer to the Georgia Tech Academic Honor Code.
▶ If you are not sure, ask me.
Late Policy
• You are allowed ten total slip days (for programming assignments and exercise sheets).
• You lose 25% of an assignment’s points for every 24 hours it is late.
• Mark on your submission (1) how many days you are late and (2) how many late days
you have left.
Teaching Assistants
Course Rubric
• Project (20%)
• Programming Assignments (45%)
• Exercise Sheets (15%)
• Mid-term Exam (20%)
Project – Outline
Project – Outline
• You don’t have to pick a topic until midway through the course.
• We will provide sample project topics.
• This project can be a conversation starter in job interviews.
Project – Deliverables
Project – Proposal
• Five-minute presentation to the class that discusses the high-level topic.
• Each proposal must discuss:
▶ What is the problem being addressed by the project?
▶ Why is this problem important?
▶ How will the team solve this problem?
• Five-minute presentation to update the class on the current status of your project.
• Each presentation should include:
▶ Current development status.
▶ Whether anything in your plan has changed.
▶ Anything that surprised you.
• Ten-minute presentation on the final status of your project during finals week.
• You’ll want to include any performance measurements or benchmarking numbers for
your implementation.
• Demos are always hot too.
Programming Assignments
Exercise Sheets
Exercise Sheet #1
Course Introduction History of Database Systems
• Reference
• Design decisions in early database systems are still relevant today.
• The “SQL vs. NoSQL” debate is reminiscent of the “Relational vs. CODASYL” debate.
• Old adage: he who does not understand history is condemned to repeat it.
• Goal: ensure that future researchers avoid replaying history.
• Advantages
▶ No need to reinvent the wheel for every application.
▶ Logical data independence: new record types can be added as the logical requirements
of an application change over time.
• Limitations
▶ Information is repeated.
▶ Tree-structured data model is very restrictive: a tuple cannot exist without its parent tuple.
▶ No physical data independence: the storage organization cannot be changed freely to tune a
database application, because there is no guarantee that the applications will continue to
run.
▶ Optimization: a tuple-at-a-time user interface forces the programmer to do manual query
optimization, which is often hard.
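To make these limitations concrete, here is a minimal Python sketch of a tree-structured store. The supplier and part records, field names, and helper functions are hypothetical and chosen only for illustration; they do not reproduce any real system's API. Each part is stored under every supplier that sells it (repeated information), a part has no place to live without a parent supplier, and the lookup has to walk parents and children one record at a time.

```python
# Hypothetical tree-structured database: each supplier record owns its part
# records. All names and values are made up for illustration.
suppliers = [
    {"sno": 16, "name": "General Supply", "parts": [
        {"pno": 27, "pname": "Power saw", "qty": 100},
    ]},
    {"sno": 24, "name": "Special Supply", "parts": [
        {"pno": 27, "pname": "Power saw", "qty": 50},   # same part, stored again
        {"pno": 42, "pname": "Bolts", "qty": 1000},
    ]},
]


def suppliers_of_part(db, pno):
    """Tuple-at-a-time navigation: the programmer picks the access path."""
    result = []
    for supplier in db:                  # "get next" parent record
        for part in supplier["parts"]:   # "get next" child within that parent
            if part["pno"] == pno:
                result.append(supplier["name"])
                break                    # manual optimization: stop at first match
    return result


def copies_of_part(db, pno):
    """Count how many times the same part description is physically stored."""
    return sum(1 for s in db for p in s["parts"] if p["pno"] == pno)


print(suppliers_of_part(suppliers, 27))
print(copies_of_part(suppliers, 27))
```

If the storage layout changed (for example, parts became the parents and suppliers the children), `suppliers_of_part` would have to be rewritten, which is the lack of physical data independence described above.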
1960s – IDS
1960s – CODASYL
• Advantages
▶ Graph-structured data models are less restrictive.
• Limitations
▶ Poorer physical and logical data independence: cannot freely change the storage
organization or the application schema.
▶ Slow loading and recovery: data is typically stored in one large network. This much
larger object has to be bulk-loaded all at once, leading to very long load times.
• Advantages
▶ Set-at-a-time languages are good, regardless of the data model, since they offer physical data
independence.
▶ Logical data independence is easier with a simple data model than with a complex one.
▶ Query optimizers can beat all but the best tuple-at-a-time DBMS application
programmers.
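As a rough illustration of the first and third points, the sketch below uses Python's built-in sqlite3 module, with SQL standing in as the set-at-a-time language. The schema, data, and index name are assumptions made up for this example. The query states which rows are wanted, not how to find them, so adding an index changes the storage organization without touching the application's query; the access path is left to the query optimizer.

```python
import sqlite3

# In-memory database with a hypothetical supplier/part schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE part     (pno INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE supply   (sno INTEGER, pno INTEGER, qty INTEGER);
    INSERT INTO supplier VALUES (16, 'General Supply'), (24, 'Special Supply');
    INSERT INTO part     VALUES (27, 'Power saw'), (42, 'Bolts');
    INSERT INTO supply   VALUES (16, 27, 100), (24, 27, 50), (24, 42, 1000);
""")

# Declarative, set-at-a-time query: no loops, no chosen access path.
QUERY = """
    SELECT s.sname
    FROM supplier AS s JOIN supply AS y ON s.sno = y.sno
    WHERE y.pno = ?
    ORDER BY s.sname
"""


def suppliers_of_part(pno):
    """Return the names of all suppliers that supply the given part."""
    return [row[0] for row in conn.execute(QUERY, (pno,))]


before = suppliers_of_part(27)

# Change the physical organization; the query text above stays the same.
conn.execute("CREATE INDEX supply_by_part ON supply (pno)")

after = suppliers_of_part(27)
print(before == after)
```

Each part description is also stored exactly once in `part`, in contrast to a tree-structured layout where it would be repeated under every parent.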
2010s – NewSQL
• Provide the same performance for OLTP workloads as NoSQL DBMSs without giving up
ACID:
▶ Relational / SQL
▶ Distributed
▶ Usually closed-source
• Shared-disk DBMSs
• Embedded DBMSs
• Time-Series DBMSs
• Multi-Model DBMSs
• Blockchain DBMSs
Course Introduction Conclusion
Conclusion
Parting Thoughts
• There are many innovations that come from both industry and academia.
▶ Lots of ideas start in academia but few build complete DBMSs to verify them.
▶ IBM was the vanguard during the 1970s and 1980s, but now there is no single trendsetter.
▶ The era of cloud systems has begun.
• The relational model has won for operational databases.
Next Class