Lect01-Annotated DB
001
Introduction to Databases
Fall 2023
Class Resources
● Class website:
https://www.cs.columbia.edu/~gravano/cs4111
● Discussion board: Ed Discussion, which you
can access from CourseWorks, at
https://courseworks.columbia.edu/
● Announcements from class staff: on
CourseWorks
Your Instructor: Luis Gravano
● https://www.cs.columbia.edu/~gravano
● [email protected]
● Office hours:
● Mondays, 9:30-10:30 a.m. ET in person at
706 Schapiro CEPSR
● Wednesdays, 9:30-10:30 a.m. ET online
● By appointment by email
● Details and links on class website
Your TAs
Office hours and their location or links will soon be on a Google Calendar in a
CourseWorks “Announcement”
Class Information: Prerequisites
COMS W3134 – Data Structures in Java,
COMS W3136 – Essential Data Structures in
C/C++, or
COMS W3137 – Data Structures and Algorithms
(equivalent courses taken elsewhere are
acceptable as well)
Class Information: Lectures
● Tuesdays and Thursdays, 1:10-2:25 p.m. ET
● Classroom: 301 Pupin
● As recordings: videos in “Video Library” section
of CourseWorks
• Python is much easier to work with than Java for our database projects
• Python is a great, easy-to-learn, widely used language
• If you are fluent in Java, you will be able to easily learn the (not-so-deep)
level of Python needed for our projects
More details will be announced soon; please be patient and wait until the
projects are announced
Projects (cont.)
• To be fair to all students in the class, I will grant
no extensions or exceptions for project
submission
• Instead, you have three grace late days total for
projects that you can use as you wish
throughout the semester; weekends and
university holidays are not counted
• After using all grace days, you will get a 25%
grade deduction for each additional late day
Check the full details on the class website
Collaboration Policy
• Please check the “Collaboration and Academic Honesty”
page, linked from the main webpage for the class
• Exams are to be done individually
• Projects are done in teams; no collaboration between
different teams
• We will not tolerate cheating, which would be wrong and
unfair to the rest of the class. Check the CS Department
policies and procedures regarding academic honesty at
http://www.cs.columbia.edu/education/honesty; they
fully apply to this course.
Project 1 Contest
● Four best projects chosen as contest winners
● If you win:
● You will have the option to discuss and
demonstrate your project in class
● You will get a 10% boost in your Project 1
grade
Ongoing Feedback
• Don’t wait until the end-of-semester course
evaluations to complain or give feedback on how
to improve the course (it’s too late then!)
• Talk to me early on during my office hours or
send me email with your concerns and
suggestions, or ask a TA to forward them to me
Thanks to Raghu Ramakrishnan, Johannes
Gehrke, and our own Eugene Wu for the
basis of some of the slides!
What Is a Database and a Database
Management System (DBMS)?
● A database is a generally large, integrated
collection of data that models a real-world
enterprise
● Entities (e.g., students, courses)
● Relationships (e.g., Juana González is taking
cs4111)
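As a rough illustration of entities and relationships, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the student id are hypothetical choices made for this example only; they are not taken from the course materials.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity sets
    CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses  (cid TEXT PRIMARY KEY, title TEXT NOT NULL);
    -- Relationship: which student is taking which course
    CREATE TABLE takes (
        sid INTEGER REFERENCES students(sid),
        cid TEXT REFERENCES courses(cid),
        PRIMARY KEY (sid, cid)
    );
""")
conn.execute("INSERT INTO students VALUES (1, 'Juana González')")
conn.execute("INSERT INTO courses VALUES ('cs4111', 'Introduction to Databases')")
conn.execute("INSERT INTO takes VALUES (1, 'cs4111')")

# Follow the relationship to list who is taking what.
rows = conn.execute("""
    SELECT s.name, c.cid
    FROM students s
    JOIN takes t   ON s.sid = t.sid
    JOIN courses c ON c.cid = t.cid
""").fetchall()
print(rows)
```

The two entity tables hold the "things" being modeled, and the takes table records the relationship between them.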
Why Study Databases?
• Shift from computation to information
• Data sets increasing in diversity and volume
• The Web, online activity and commerce, social
media, …
• ... need for DBMS exploding
• DBMS encompasses most of CS
• OS, languages, theory, AI, machine learning, natural
language processing, multimedia, …
Why Study Databases?
• Most structured information on the web lives in databases
• Projects 1 and 2 will give you the opportunity to understand their
potential for data and a domain of your interest
• Databases are critical to organize, query, and perform
data analysis (e.g., data mining) of scientific, business and
financial, environmental, health data, and much more
• Also extremely helpful to organize large-scale
experimental results and data
• Other types of data repositories, such as text content,
covered in COMS E6111
Why are DBMSs Necessary?
How About Efficiency?
Providing “Transactional Guarantees”?
Expected Behaviors if You and Your Spouse Withdraw
Money or Pay a Bill on Same Account Simultaneously?
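One way to picture the expected behavior is the sketch below, again with Python's sqlite3 module. The accounts table, the withdraw helper, and the amounts are all hypothetical. The two withdrawals run one after the other on a single connection, so this only illustrates the "check and update as one atomic step" idea; it does not reproduce true simultaneous access, which a real DBMS handles through its concurrency control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one shared account with a balance of 100.
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

def withdraw(conn, account_id, amount):
    """Withdraw inside a transaction; return True on success, False otherwise."""
    with conn:  # commits on normal exit, rolls back if an exception is raised
        cur = conn.execute(
            # The balance check and the update happen in a single statement,
            # so the balance can never drop below zero.
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        return cur.rowcount == 1

first = withdraw(conn, 1, 80)   # you withdraw 80
second = withdraw(conn, 1, 80)  # your spouse tries to withdraw 80 as well
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(first, second, balance)
```

Under these assumptions, only one of the two withdrawals goes through and the account is never overdrawn, which is the kind of guarantee we would otherwise have to code by hand in every application.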
Yes, We Could Write All Code for Our
Applications Ourselves ...
… in a few decades!
Database Courses at Columbia
(This course) COMS W4111-Introduction to Databases
Prerequisites: CS3134, CS3136, or CS3137
COMS W4112-Database System Implementation
Prerequisites: CS4111; fluency in Java or C++; recommended:
CS3827
• Storage Methods and Indexing
• Query Processing and Optimization for 1NF Relations,
including External Sorting
• Materialized View Maintenance, Selection, and Use in Query
Optimization
• Query Processing and Optimization for ORDBMSs
• Transaction Processing and Recovery
• Parallel and Distributed Databases: Query Processing and
Optimization
• Parallel and Distributed Databases: Transaction Processing
• Performance Considerations Beyond I/Os
COMS E6111-Advanced Database Systems
Prerequisites: CS4111; Working Knowledge of Python
• Information Retrieval
• Information Extraction
• Web Search
• Data Mining
• Data Warehousing, OLAP, Decision Support
• Time Series Analysis and Mining
• Spatial Data Management
• …