Lect01-Annotated DB

This document provides information for COMS W4111 Introduction to Databases taught in the fall 2023 semester at Columbia University, including details about the instructor Luis Gravano, class resources, prerequisites, lectures, projects, grading, and optional textbook. Students will complete two team projects using Python on the Google Cloud platform and be evaluated based on exams, projects, and optional homework assignments.

COMS W4111.001
Introduction to Databases
Fall 2023

Computer Science Department
Columbia University
Your Instructor: Luis Gravano
(he/him/his)
● Ph.D. in Computer Science, Stanford U.
● Professor, Computer Science Department
(at Columbia U. since Fall 1997)
● At Google: Senior Research Scientist (2001),
Visiting Faculty Researcher (2018-19)

● Research interests: Databases, Web Search,
Information Extraction, Social Media

Class Resources
● Class website:
https://www.cs.columbia.edu/~gravano/cs4111
● Discussion board: Ed Discussion, which you
can access from CourseWorks, at
https://courseworks.columbia.edu/
● Announcements from class staff: on
CourseWorks

Your Instructor: Luis Gravano
● https://www.cs.columbia.edu/~gravano
[email protected]

● Office hours:
● Mondays, 9:30-10:30 a.m. ET in person at
706 Schapiro CEPSR
● Wednesdays, 9:30-10:30 a.m. ET online
● By appointment by email
● Details and links on class website

Your TAs

● Rajeswari Bose (she, her, hers)
● Rohit Gopalakrishnan (he, him, his)
● Prateek Jain (he, him, his)
● Teng Jiang (he, him, his)
● Prahlad Koratamaddi (he, him, his)
● Pranav Sukumar (he, him, his)
● Sarah Tang (she, her, hers)
● Vikram Waradpande (he, him, his)
● Vicky (Jingyi) Zhou (she, her, hers)

Office hours and their location or links will soon be on a Google Calendar in
CourseWorks “Announcement”
Class Information: Prerequisites
COMS W3134 – Data Structures in Java,
COMS W3136 – Essential Data Structures in
C/C++, or
COMS W3137 – Data Structures and Algorithms
(equivalent courses taken elsewhere are
acceptable as well)

You need permission from the instructor if you
don’t have the prerequisites

Class Information: Lectures
● Tuesdays and Thursdays, 1:10-2:25 p.m. ET
● Classroom: 301 Pupin

Lectures also available:
● On Zoom: link in “Zoom Class Sessions” section
of CourseWorks
● As recordings: videos in “Video Library” section
of CourseWorks

Best to attend in person in the classroom, to ask
questions and participate in discussions
Grading Information
• Midterm (Thu Oct 26, during lecture time): 20%
• Final (Tue Dec 19, 1:10-3:10 p.m. ET, cumulative): 30%
• Projects (2): 50%
Project 1: 40%, Project 2: 10%

• Midterm and final are closed book, closed notes


• I will publish 4 (optional) homework assignments and
their solutions, so that you can practice the contents of
the class; homework is not submitted or graded

• Median grade will be a B+ or slightly higher


• Alternative or make-up exams will not be given
• Project 1 has higher weight than Project 2
2 Projects, in Teams of 2 Students
• You will do the projects on the Google Cloud platform
• You will have more-than-enough free credit through individual codes that
I will distribute once enrollments stabilize
• Projects will have a non-programming option
• If you follow the programming option, which I strongly recommend, you
will program in Python (only language option):

• Python is much easier to work with than Java for our database projects
• Python is a great, easy-to-learn, widely used language
• If you are fluent in Java, you will be able to easily learn the (not-so-deep)
level of Python needed for our projects

• Project 1: model and build an application of your choice on top of a
database system, using “traditional” relational database features
• Project 2: expand Project 1 to use substantial, advanced database
system features

More details will be announced soon; please be patient and wait until the
projects are announced
Projects (cont.)
• To be fair to all students in the class, I will grant
no extensions or exceptions for project
submission
• Instead, you have three grace late days total for
projects that you can use as you wish
throughout the semester; weekends and
university holidays are not counted
• After using all grace days, you will get a 25%
grade deduction for each additional late day
Check full details on website
Collaboration Policy
• Please check “Collaboration and Academic Honesty”
page from the main webpage for the class
• Exams are to be done individually
• Projects are done in teams; no collaboration between
different teams
• We will not tolerate cheating, which would be wrong and
unfair to the rest of the class. Check the CS Department
policies and procedures regarding academic honesty at
http://www.cs.columbia.edu/education/honesty; they
fully apply to this course.

• Contact the instructor right away if you have any
questions
Optional Textbook
Avi Silberschatz, Henry F. Korth, S. Sudarshan:
Database System Concepts, Seventh Edition,
McGraw-Hill, 2019; ISBN: 9780078022159

Textbook is optional: lectures will cover all material needed for
homework assignments and exams
Textbook “on reserve” at Science & Engineering Library

Textbook homepage has useful resources:
https://db-book.com/
Project 1 Contest
● Four best projects chosen as contest winners

● If you win:
● You will have the option to discuss and
demonstrate your project in class
● You will get a 10% boost in your Project 1
grade

Ongoing Feedback
• Don’t wait until the end-of-semester course
evaluations to complain or give feedback on how
to improve the course (it’s too late then!)
• Talk to me early on during my office hours or
send me email with your concerns and
suggestions, or ask a TA to forward them to me

Thanks to Raghu Ramakrishnan, Johannes
Gehrke, and our own Eugene Wu for the
basis of some of the slides!
What Is a Database and a Database
Management System (DBMS)?
● A database is a generally large, integrated
collection of data that models a real-world
enterprise
● Entities (e.g., students, courses)
● Relationships (e.g., Juana González is taking cs4111)

● A database management system (DBMS) is a
software system designed to store, manage,
and interact with databases
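
As a rough illustration of entities and relationships, the sketch below stores the slide's example in SQLite through Python's standard sqlite3 module. The table names (students, courses, takes), the column names, and the numeric student id are assumptions made up for this example; they are not part of the course material.

```python
import sqlite3

# In-memory SQLite database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT);   -- entity
CREATE TABLE courses  (cid TEXT PRIMARY KEY, title TEXT);     -- entity
CREATE TABLE takes (                                          -- relationship
    sid INTEGER REFERENCES students(sid),
    cid TEXT REFERENCES courses(cid),
    PRIMARY KEY (sid, cid)
);
INSERT INTO students VALUES (1, 'Juana González');
INSERT INTO courses  VALUES ('cs4111', 'Introduction to Databases');
INSERT INTO takes    VALUES (1, 'cs4111');
""")

# "Who is taking which course?" is answered by joining through the relationship.
enrolled = conn.execute(
    "SELECT s.name, c.cid "
    "FROM students s "
    "JOIN takes t ON s.sid = t.sid "
    "JOIN courses c ON c.cid = t.cid"
).fetchall()
print(enrolled)
```

Each entity gets its own table, and the relationship is a third table that references both.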
Why Use a DBMS?
• Data independence and efficient access
• Reduced application development time
• Data integrity and security
• Uniform data administration
• Concurrent access, recovery from crashes

Why Study Databases?
• Shift from computation to information
• Data sets increasing in diversity and volume
• The Web, online activity and commerce, social
media, …
• ... need for DBMS exploding
• DBMS encompasses most of CS
• OS, languages, theory, AI, machine learning, natural
language processing, multimedia, …

Why Study Databases?
• Most structured information on the web lives in databases
• Projects 1 and 2 will give you the opportunity to understand their
potential for data and a domain of your interest
• Databases are critical to organize, query, and perform
data analysis (e.g., data mining) of scientific, business and
financial, environmental, health data, and much more
• Also extremely helpful to organize large-scale
experimental results and data
• Other types of data repositories, such as text content,
covered in COMS E6111

Why are DBMSs Necessary?

Consider a bank with a simple “relation” with
account information:

accountNo   balance   type
12345       $1000     savings
54321       $250      checking
12345       $150      checking
...         ...       ...

● Users can withdraw money from an account
● Users can transfer money between accounts
● Bank can ask queries such as Q1: “list all accounts with a balance
over $10K”
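
To make Q1 concrete, here is a minimal sketch that loads the relation above into SQLite (via Python's standard sqlite3 module) and asks the query in SQL. The table name accounts, the integer dollar balances, and the extra row for account 99999 are assumptions added for illustration; the slide does not specify them.

```python
import sqlite3

# In-memory database; column names mirror the relation on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (accountNo INTEGER, balance INTEGER, type TEXT)"
)
rows = [
    (12345, 1000, "savings"),
    (54321, 250, "checking"),
    (12345, 150, "checking"),
    (99999, 25000, "savings"),  # hypothetical row so that Q1 returns something
]
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)


def accounts_over(conn, threshold):
    """Return all accounts whose balance exceeds the threshold (in dollars)."""
    cur = conn.execute(
        "SELECT accountNo, balance, type FROM accounts "
        "WHERE balance > ? ORDER BY accountNo, balance",
        (threshold,),
    )
    return cur.fetchall()


# Q1: list all accounts with a balance over $10K.
q1_result = accounts_over(conn, 10_000)
print(q1_result)
```

The query states what is wanted, not how to scan the data; choosing an access strategy is left to the DBMS.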
Why not implement all this with a file
system with some ad-hoc software?

How About Efficiency?

Providing “Transactional Guarantees”?

Expected Behaviors if You and Your Spouse Withdraw
Money or Pay a Bill on Same Account Simultaneously?
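
One behavior we would expect is that two withdrawals never jointly overdraw the account. The sketch below shows one way a DBMS transaction can enforce this, using SQLite through Python's sqlite3 module. The withdraw function, the accounts table, and the single-row setup are assumptions for illustration; the two calls run one after another here, standing in for two requests that the DBMS would serialize.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode,
# so we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (accountNo INTEGER, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (54321, 250)")


def withdraw(conn, account_no, amount):
    """Apply the withdrawal atomically; return False if funds are insufficient."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock up front
    try:
        # The balance check and the update happen in one statement,
        # so no other writer can slip in between them.
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE accountNo = ? AND balance >= ?",
            (amount, account_no, amount),
        )
        applied = cur.rowcount == 1
        conn.execute("COMMIT")
        return applied
    except Exception:
        conn.execute("ROLLBACK")
        raise


# You and your spouse each try to withdraw $200 from the same $250 account.
first = withdraw(conn, 54321, 200)
second = withdraw(conn, 54321, 200)
final_balance = conn.execute(
    "SELECT balance FROM accounts WHERE accountNo = 54321"
).fetchone()[0]
print(first, second, final_balance)
```

Only one of the two withdrawals succeeds, and the balance never goes negative.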

Yes, We Could Write All Code for Our
Applications Ourselves ...

… in a few decades!

Luckily, this functionality (queries, query
processing, transaction processing) is common
across many applications, so we can factor it out
Database Courses at Columbia
(This course) COMS W4111-Introduction to Databases
Prerequisites: CS3134, CS3136, or CS3137

• The Entity-Relationship Model


• The Relational Model
• The Relational Algebra
• SQL: Queries, Constraints, Triggers
• Embedded SQL, Cursors, SQL APIs
• Schema Refinement and Normal Forms
• Object-Relational DBMS: Database Design
• Introduction to Query Processing and Optimization
• Introduction to Transaction Processing

COMS W4112-Database System Implementation
Prerequisites: CS4111; fluency in Java or C++; recommended:
CS3827
• Storage Methods and Indexing
• Query Processing and Optimization for 1NF Relations,
including External Sorting
• Materialized View Maintenance, Selection, and Use in Query
Optimization
• Query Processing and Optimization for ORDBMSs
• Transaction Processing and Recovery
• Parallel and Distributed Databases: Query Processing and
Optimization
• Parallel and Distributed Databases: Transaction Processing
• Performance Considerations Beyond I/Os

COMS E6111-Advanced Database Systems
Prerequisites: CS4111; Working Knowledge of Python

• Information Retrieval
• Information Extraction
• Web Search
• Data Mining
• Data Warehousing, OLAP, Decision Support
• Time Series Analysis and Mining
• Spatial Data Management
• …
