00 Introduction
INTRODUCTION TO COURSE
SYSTEMS GROUP @ TU DARMSTADT http://tuda.systems/
Lecturers:
Teaching Assistants:
Muhammad El-Hindi, Nils Boeschen, Adrian Lutsch
Systems@TUDa http://tuda.systems/
THE COURSE IN A NUTSHELL
Systems-oriented internals of scalable data systems
Course Logistics
• Organization
• Grading
Volume of data/information created …. worldwide from 2010 to 2020, with forecasts from 2021 to 2025
Source: https://www.statista.com/statistics/871513/worldwide-data-created/
Scalable Databases to Analyse Petabytes of Structured (Tabular) Data
Modern Cloud DBMSs
LARGE SCALE AI: GENERATIVE MODELS / LLMs
LLMs (e.g., GPT-4) are trained on large collections of text and images
WHY NOW?
Game changer: Exponential growth in data & technology to process data
[Figure: exponential curve, Growth (y-axis) over Time (x-axis)]
EFFECTS OF EXPONENTIAL GROWTH?
The legend of the king and the chessboard
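The effect can be made concrete with a small sketch. It assumes the usual telling of the legend (one grain on the first square, doubled on every following square); the function names are illustrative and not part of the course material.

```rust
/// Grains on a single square (1-based): 1 on square 1, doubling on each square.
fn grains_on_square(square: u32) -> u64 {
    assert!((1..=64).contains(&square), "square must be in 1..=64");
    1u64 << (square - 1)
}

/// Total grains on squares 1..=last_square.
/// The sum 1 + 2 + ... + 2^(n-1) equals 2^n - 1; u128 avoids overflow for n = 64.
fn grains_up_to(last_square: u32) -> u128 {
    assert!((1..=64).contains(&last_square), "square must be in 1..=64");
    (1u128 << last_square) - 1
}

fn main() {
    let first_half = grains_up_to(32);
    let whole_board = grains_up_to(64);
    let second_half = whole_board - first_half;

    println!("grains on square 32: {}", grains_on_square(32));
    println!("grains on square 64: {}", grains_on_square(64));
    println!("first half (squares 1-32):   {}", first_half);
    println!("second half (squares 33-64): {}", second_half);
}
```

The first half of the board holds about 4.3 billion grains in total, while square 33 alone already holds more than the entire first half. That is the intuition behind the "second half" of the chessboard.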
THE "SECOND HALF" OF THE CHESSBOARD
Exponential growth in data & resources changes what we can do!
[Figure: DATA (y-axis) over Time (x-axis)]
THIS COURSE: WHAT ARE YOU LEARNING?
THE INFRASTRUCTURE: CLOUD DATA CENTERS
Data Centers in the Cloud: Thousands of machines connected via
high-speed networks. How can we use them for data processing?
COURSE SCHEDULE
Week 1: Introduction
Single-Node + Distributed DBMS:
Week 2: DBMS Storage - Single Node
Week 3: DBMS Storage - Distributed
Week 4: DBMS Query Processing - Single Node
Week 5: DBMS Query Processing - Distributed
Week 6: DBMS Query Optimization - Single Node & Distributed
Week 7: DBMS Transaction Processing - Single Node
Week 8: DBMS Transaction Processing - Distributed
Cloud DBMSs:
Week 9: Cloud DBMS - Data Centers & DBMS Architectures
Week 10: Cloud DBMS - Scalable Query Processing
Week 11: Cloud DBMS - Scalable Transaction Processing
Week 12: Cloud DBMS - Secure DBMSs
Other Workloads:
Week 13: Other Workloads - MapReduce / Streaming
Week 14: Other Workloads - Distributed AI
DISTRIBUTED DBMS: QUERY PROCESSING
[Figure: the Coordinator compiles the SQL query (SELECT * ...) and each Worker applies the selection σ amount>50 to its local partition: Partition1 of the Orders table on Worker1, Partition2 of the Orders table on Worker2]
How to enable this for more complex queries (e.g., with joins)?
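A minimal sketch of the simple case shown on the slide, before joins enter the picture. It assumes the query is a plain selection with the predicate amount > 50 over a partitioned Orders table; the `Order` struct, its `id` field, and the function names are hypothetical, and threads stand in for worker machines.

```rust
use std::thread;

/// Hypothetical row of the Orders table; only `amount` appears on the slide.
#[derive(Debug, Clone, PartialEq)]
struct Order {
    id: u32,
    amount: u32,
}

/// Worker side: apply the selection amount > 50 to one local partition.
fn select_amount_gt_50(partition: Vec<Order>) -> Vec<Order> {
    partition.into_iter().filter(|o| o.amount > 50).collect()
}

/// Coordinator side: run the selection on every partition in parallel
/// (one thread stands in for one worker) and concatenate the partial results
/// in partition order.
fn run_query(partitions: Vec<Vec<Order>>) -> Vec<Order> {
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|p| thread::spawn(move || select_amount_gt_50(p)))
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let partition1 = vec![Order { id: 1, amount: 20 }, Order { id: 2, amount: 75 }];
    let partition2 = vec![Order { id: 3, amount: 120 }, Order { id: 4, amount: 50 }];
    for order in run_query(vec![partition1, partition2]) {
        println!("id={} amount={}", order.id, order.amount);
    }
}
```

A selection is easy to distribute because every row can be checked independently of all other rows. A join is harder: matching rows may live on different workers, so data has to be moved between them first.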
DBMS MARKET: THE CLOUD IS TAKING OVER
Source: https://blogs.gartner.com/merv-adrian/2022/04/16/dbms-market-transformation-2021-the-big-picture/
CLOUD DBMS: DISAGGREGATED SYSTEMS
[Figure: a SQL Query enters the Service Layer (Virtual Machines), which contains Optimizer, Metadata, …, Security; label: Scale]
BIG DATA: BEYOND SQL
Jeffrey Dean (Lead of Google AI): Keynote 2008
Challenge: Build a Search Index for Google over ~20 billion web pages
Main Problem:
• 20 billion x 20KB (per web page) = 400TB of raw data
• Average read rate of a commodity disk is 30-35MB/s =>
~ 4 months to just process the web crawl with 1 machine!
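The estimate can be checked with a few lines of arithmetic. This is a sketch that assumes decimal units (1KB = 1,000 bytes, 1MB = 1,000,000 bytes) and a purely sequential scan; the function names and the 1,000-machine scenario are illustrative and do not come from the keynote.

```rust
/// Total raw size in bytes for `pages` pages of `bytes_per_page` bytes each.
fn raw_size_bytes(pages: u64, bytes_per_page: u64) -> u64 {
    pages * bytes_per_page
}

/// Days needed to read `total_bytes` when each of `machines` machines
/// reads its share sequentially at `bytes_per_sec`.
fn scan_days(total_bytes: u64, bytes_per_sec: u64, machines: u64) -> f64 {
    let seconds = total_bytes as f64 / (bytes_per_sec as f64 * machines as f64);
    seconds / 86_400.0 // seconds per day
}

fn main() {
    // 20 billion pages x 20KB per page
    let total = raw_size_bytes(20_000_000_000, 20_000);
    println!("raw data: {} TB", total / 1_000_000_000_000);

    // One machine at 35MB/s and at 30MB/s
    println!("1 machine @ 35MB/s: {:.0} days", scan_days(total, 35_000_000, 1));
    println!("1 machine @ 30MB/s: {:.0} days", scan_days(total, 30_000_000, 1));

    // Assumed scenario: the same scan spread evenly over 1,000 machines
    let hours = scan_days(total, 35_000_000, 1_000) * 24.0;
    println!("1000 machines @ 35MB/s: {:.1} hours", hours);
}
```

With one disk the scan takes about 132 to 154 days, which is four to five months and matches the slide's rough figure. Spread evenly over 1,000 machines, the same scan would take only a few hours, which motivates processing the crawl in parallel.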
BIG DATA: BEYOND SQL
Example: Compute Word Frequencies
Input-File:
the quick brown fox
the fox ate the mouse
how now brown cow
Output-File:
brown, 2
fox, 2
how, 1
now, 1
the, 3
ate, 1
cow, 1
mouse, 1
quick, 1
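On a single machine, the word-frequency example needs only a few lines. The sketch below assumes words are separated by whitespace and are already lower-case; `word_frequencies` is an illustrative name. A `BTreeMap` is used so that the output order is deterministic (alphabetical), which differs from the order on the slide.

```rust
use std::collections::BTreeMap;

/// Count how often each whitespace-separated word occurs in `text`.
fn word_frequencies(text: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let input = "the quick brown fox\nthe fox ate the mouse\nhow now brown cow";
    for (word, count) in word_frequencies(input) {
        println!("{}, {}", word, count);
    }
}
```

This works as long as the input fits on one machine. The open question is how to do the same when the input is hundreds of terabytes spread over many machines.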
BIG DATA: MAP REDUCE
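MapReduce was developed by Google to run scalable data processing beyond SQL
Pipeline: Input-File (Distributed) → Map → Shuffle → Reduce → Output-File (Distributed)
Map (one Map task per 64MB block):
• Block 1 "the quick brown fox" → the, 1 / quick, 1 / brown, 1 / fox, 1
• Block 2 "the fox ate the mouse" → the, 1 / fox, 1 / ate, 1 / the, 1 / mouse, 1
• Block 3 "how now brown cow" → how, 1 / now, 1 / brown, 1 / cow, 1
Shuffle (group values by word):
• To the first Reduce task: brown, {1,1} / fox, {1,1} / …
• To the second Reduce task: ate, {1} / cow, {1} / …
Reduce (sum per word):
• First Reduce task → brown, 2 / fox, 2 / how, 1 / now, 1 / the, 3
• Second Reduce task → ate, 1 / cow, 1 / mouse, 1 / quick, 1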
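The three phases can be sketched as ordinary functions. This is a sequential, single-process illustration of the data flow only, not Google's implementation: a real system runs the Map and Reduce tasks on different machines and moves data over the network during the shuffle. All function names are illustrative.

```rust
use std::collections::BTreeMap;

/// Map: emit (word, 1) for every word in one input block.
fn map_block(block: &str) -> Vec<(String, u32)> {
    block.split_whitespace().map(|w| (w.to_string(), 1)).collect()
}

/// Shuffle: group all emitted values by key, e.g. brown -> [1, 1].
fn shuffle(pairs: Vec<(String, u32)>) -> BTreeMap<String, Vec<u32>> {
    let mut groups: BTreeMap<String, Vec<u32>> = BTreeMap::new();
    for (word, one) in pairs {
        groups.entry(word).or_default().push(one);
    }
    groups
}

/// Reduce: sum the grouped values of one key.
fn reduce(values: &[u32]) -> u32 {
    values.iter().sum()
}

/// Run map, shuffle, and reduce one after another over all blocks.
fn word_count(blocks: &[&str]) -> BTreeMap<String, u32> {
    let pairs: Vec<(String, u32)> = blocks.iter().flat_map(|b| map_block(b)).collect();
    shuffle(pairs)
        .into_iter()
        .map(|(word, values)| (word, reduce(&values)))
        .collect()
}

fn main() {
    let blocks = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"];
    for (word, count) in word_count(&blocks) {
        println!("{}, {}", word, count);
    }
}
```

Each call to `map_block` depends only on its own block, and each call to `reduce` depends only on the values of one key. That independence is what lets a MapReduce system run many Map and Reduce tasks in parallel.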
STRUCTURE OF COURSE
Part 1: Single-Node & Distributed Database Architectures
TODAY’S AGENDA
Course Overview
Course Logistics
• Organization
• Grading
COURSE LOGISTICS
Organization:
• Lectures: Tuesday 15:20-17:00
• Exercises: Friday 9:50-11:30 (Exercise Sheets + Programming Labs)
• Exercise Sheets: Preparation for Final Exam
• Programming Labs (Graded): Practical Implementation of Concepts in Lecture
Pre-requisites:
• Solid programming skills required
• Lecture “Information Management” or equivalent (Intro to Databases/SQL)
• Sufficient time to work on course assignments (i.e., labs)
COURSE GRADING AND REGISTRATION
Grading:
• 60 Points for Final Exam
• 40 Points for Programming Labs (3 Labs: 15P + 15P + 10P)
• Up to 6 Bonus Points for Leaderboard (3 Labs x 0-2P each)
• 2 Bonus Points for Warm-up Lab
• 50 Points overall required to pass
PROGRAMMING LABS: RUST
Why Rust?
• Alternatives for scalable DBMSs: C++, Java (Not anymore)
• Rust: As efficient as C++ but with enhanced safety and modern features
FINAL EXAM
Exam questions based on lecture and exercises
Closed-book
No additional notes
COURSE MATERIAL & INFRASTRUCTURE
Moodle (of CS Department): https://moodle.informatik.tu-darmstadt.de/course/view.php?id=1663
Details on
• Lecture (Slides) and Exercise/Lab Material
• Forum for Q&A
Lab Infrastructure:
• What you get: Code framework / Automated testing (in Rust)
• What you hand in: Your code via GitLab (will be automatically tested)
• Warm-up lab: get to know Rust + lab/hand-in setup
Details in the exercise session this Friday! (You need to attend for the setup.)
QUESTIONS