Lecture 0 - CS50's Introduction To Databases With SQL
What is a Database?
SQL
SELECT
LIMIT
WHERE
NULL
LIKE
RANGES
ORDER BY
AGGREGATE FUNCTIONS
Introduction
Databases (and SQL) are tools that can be used to interact with, store, and manage information.
Although the tools we’re using in this course are new, a database is an age-old idea.
Look at this diagram from a few thousand years ago. It has rows and columns, and seems to
contain stipends for workers at a temple. One could call this diagram a table, or even a
spreadsheet.
Let us now consider a modern context. Say you are a librarian tasked with organizing
information about the book titles and authors in this diagram.
One way of organizing the information would be to have each book title followed by its author,
as below.
Notice that each book is now a row in this table.
Every row has two columns — each a different attribute of the book (book title and
author).
In today’s information age, we can store our tables using software like Google Sheets instead of
paper or stone tablets. However, in this course we will talk about databases and not
spreadsheets.
Three reasons to move beyond spreadsheets to databases are
Scale: Databases can store not just tens of thousands of items but millions and even
billions.
Update Capacity: Databases can handle many updates to their data every second.
Speed: Databases allow faster look-up of information. This is because databases
provide us with access to different algorithms to retrieve information.
What is a Database?
A database is a way of organizing data such that you can perform four operations on it:
create
read
update
delete
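Here is a minimal sketch of these four operations. It assumes Python's built-in sqlite3 module and a hypothetical two-column "books" table with made-up rows. In SQL, create, read, update, and delete correspond to the statements INSERT, SELECT, UPDATE, and DELETE.

```python
import sqlite3

# Hypothetical in-memory database; the table and rows are made up for illustration.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "books" ("title" TEXT, "author" TEXT)')

# Create: add rows.
db.execute('INSERT INTO "books" VALUES (?, ?)', ("Book One", "Author A"))
db.execute('INSERT INTO "books" VALUES (?, ?)', ("Book Two", "Author B"))

# Read: query the rows back, sorted by title.
rows = db.execute('SELECT "title", "author" FROM "books" ORDER BY "title"').fetchall()
print(rows)  # [('Book One', 'Author A'), ('Book Two', 'Author B')]

# Update: change a value in an existing row.
db.execute('UPDATE "books" SET "author" = ? WHERE "title" = ?', ("Author C", "Book Two"))

# Delete: remove a row.
db.execute('DELETE FROM "books" WHERE "title" = ?', ("Book One",))

remaining = db.execute('SELECT "title", "author" FROM "books"').fetchall()
print(remaining)  # [('Book Two', 'Author C')]
```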
A database management system (DBMS) is a way to interact with a database using a graphical
interface or textual language.
Examples of DBMSs include MySQL, Oracle, PostgreSQL, SQLite, Microsoft Access, and MongoDB.
The choice of a DBMS depends on factors like
Cost: proprietary vs. free software,
Amount of support: free and open source software like MySQL, PostgreSQL and SQLite
come with the downside of having to set up the database yourself,
Weight: more fully-featured systems like MySQL or PostgreSQL are heavier and require
more computation to run than systems like SQLite.
In this course, we will start with SQLite and then move on to MySQL and PostgreSQL.
SQL
SQL stands for Structured Query Language. It is a language used to interact with databases,
via which you can create, read, update, and delete data in a database. Some important notes
about SQL
it is structured, as we’ll see in this course,
it has some keywords that can be used to interact with the database, and
it is a query language — it can be used to ask questions of data inside a database.
In this lesson, we will learn how to write some simple SQL queries.
SELECT
What data is actually in our database? To answer this, we will use our first SQL keyword,
SELECT, which allows us to select some (or all) rows from a table inside the database.
In the SQLite environment, run
SELECT *
FROM "longlist";
This selects all the rows from the table called longlist.
The output we get contains all the columns of all the rows in this table, which is a lot of data.
We can simplify it by selecting a particular column, say the title, from the table. Let’s try
SELECT "title"
FROM "longlist";
Now, we see a list of the titles in this table. But what if we want to see titles and authors in
our search results? For this, we run
SELECT "title", "author"
FROM "longlist";
The database schema contains the structure of the database, including table and column
names. Later in this course, we will learn how to get the database schema and understand it.
SQL keywords are case-insensitive in SQLite, so select works just as well as SELECT. However, we do follow some style conventions. Observe this query:
SELECT *
FROM "longlist";
SQL keywords are written in capital letters. This is especially useful in improving the readability
of longer queries. Table and column names are in lowercase.
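The queries above can be tried without the course database. This sketch uses Python's sqlite3 module with a hypothetical two-row stand-in for "longlist" (the rows are made up). It shows three things:
SELECT * returns every column.
Naming a column narrows the output.
Lowercase keywords behave the same as uppercase ones.

```python
import sqlite3

# Hypothetical stand-in for "longlist" with only two columns and made-up rows.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT, "author" TEXT)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?)',
               [("Title A", "Author A"), ("Title B", "Author B")])

# SELECT * returns every column of every row.
all_rows = db.execute('SELECT * FROM "longlist";').fetchall()

# Selecting one column returns one-element tuples.
titles = db.execute('SELECT "title" FROM "longlist";').fetchall()

# Keywords are case-insensitive: lowercase select/from gives the same rows.
titles_lower = db.execute('select "title" from "longlist";').fetchall()

print(sorted(all_rows))        # [('Title A', 'Author A'), ('Title B', 'Author B')]
print(sorted(titles))          # [('Title A',), ('Title B',)]
print(titles == titles_lower)  # True
```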
LIMIT
If a database had millions of rows, it might not make sense to select all of its rows. Instead, we
might want to merely take a peek at the data it contains. We use the SQL keyword LIMIT to
specify the number of rows in the query output.
SELECT "title"
FROM "longlist"
LIMIT 10;
This query gives us the first 10 titles in the database. Without an ORDER BY clause, SQLite
typically returns rows in the order they are stored in the table, though SQL itself does not
guarantee any particular order.
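Here is a small sketch of LIMIT. It again uses Python's sqlite3 module and a hypothetical table of 25 made-up titles. The query returns at most the requested number of rows. If you ask for more rows than exist, it simply returns all of them.

```python
import sqlite3

# Hypothetical table of 25 made-up titles, standing in for a large "longlist".
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT)')
db.executemany('INSERT INTO "longlist" VALUES (?)',
               [(f"Title {n}",) for n in range(1, 26)])

# LIMIT caps the number of rows returned.
first_ten = db.execute('SELECT "title" FROM "longlist" LIMIT 10;').fetchall()
print(len(first_ten))  # 10

# A LIMIT larger than the table simply returns every row.
everything = db.execute('SELECT "title" FROM "longlist" LIMIT 100;').fetchall()
print(len(everything))  # 25
```

To make "the first 10" refer to a specific set of rows, LIMIT is usually paired with ORDER BY, which appears later in this lecture.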
WHERE
The keyword WHERE is used to select rows based on a condition; it outputs the rows for
which the specified condition is true. For example,
SELECT "title", "author"
FROM "longlist"
WHERE "year" = 2023;
This gives us the titles and authors for the books longlisted in 2023. Note that 2023 is not in
quotes because it is an integer, not a string or identifier.
The operators that can be used to specify conditions in SQL are = (“equal to”), != (“not equal
to”), and <> (also “not equal to”).
To select the books that are not hardcovers, we can run the query
Note that hardcover is in single quotes because it is an SQL string and not an identifier.
!= can be replaced with the operator <> to get the same results. The modified query would
be
Yet another way to get the same results is to use the SQL keyword NOT. The modified query
would be
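The following sketch assumes the book format is stored in a hypothetical column named "format" and uses made-up rows. It checks that the !=, <>, and NOT versions of the condition select the same books. In SQLite, NOT binds more loosely than =, so NOT "format" = 'hardcover' means NOT ("format" = 'hardcover').

```python
import sqlite3

# Hypothetical table: the column name "format" and the rows are assumptions
# made for illustration only.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT, "format" TEXT)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?)', [
    ("Title A", "hardcover"),
    ("Title B", "paperback"),
    ("Title C", "ebook"),
])

queries = [
    """SELECT "title" FROM "longlist" WHERE "format" != 'hardcover';""",
    """SELECT "title" FROM "longlist" WHERE "format" <> 'hardcover';""",
    """SELECT "title" FROM "longlist" WHERE NOT "format" = 'hardcover';""",
]

# All three conditions select the same (non-hardcover) rows.
results = [sorted(db.execute(q).fetchall()) for q in queries]
for r in results:
    print(r)  # [('Title B',), ('Title C',)] each time
```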
To select the books longlisted in 2022 or 2023 that were not hardcovers
Here, the parentheses indicate that the OR clause should be evaluated before the AND clause.
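To see why the parentheses matter, here is a sketch on made-up rows. It again assumes a hypothetical "format" column. AND binds more tightly than OR, so dropping the parentheses changes which rows match.

```python
import sqlite3

# Made-up rows; the "format" column name is an assumption.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT, "year" INTEGER, "format" TEXT)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?, ?)', [
    ("Title A", 2022, "hardcover"),
    ("Title B", 2022, "paperback"),
    ("Title C", 2023, "hardcover"),
    ("Title D", 2023, "paperback"),
    ("Title E", 2021, "paperback"),
])

# With parentheses: the OR is evaluated first, then the AND.
with_parens = db.execute("""
    SELECT "title" FROM "longlist"
    WHERE ("year" = 2022 OR "year" = 2023) AND "format" != 'hardcover';
""").fetchall()

# Without parentheses, AND binds more tightly than OR, so this means
# "year" = 2022 OR ("year" = 2023 AND "format" != 'hardcover').
without_parens = db.execute("""
    SELECT "title" FROM "longlist"
    WHERE "year" = 2022 OR "year" = 2023 AND "format" != 'hardcover';
""").fetchall()

print(sorted(with_parens))     # [('Title B',), ('Title D',)]
print(sorted(without_parens))  # [('Title A',), ('Title B',), ('Title D',)]
```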
NULL
It is possible that tables may have missing data. NULL is a type used to indicate that certain
data does not have a value, or does not exist in the table.
For example, the books in our database have a translator column along with an author column.
However, only some of the books have been translated to English. For other books, the
translator value will be NULL.
Conditions used with NULL are IS NULL and IS NOT NULL.
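To select the books for which translators don’t exist, we can run
SELECT "title", "translator"
FROM "longlist"
WHERE "translator" IS NULL;
Let’s try the reverse: selecting the books for which translators do exist.
SELECT "title", "translator"
FROM "longlist"
WHERE "translator" IS NOT NULL;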
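Here is a minimal sketch of both conditions. It uses Python's sqlite3 module, where None is stored as NULL, and made-up rows. It also shows why IS NULL is needed: comparing with = NULL never evaluates to true, so that query matches no rows.

```python
import sqlite3

# Made-up rows: Python's None is stored as NULL in SQLite.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT, "translator" TEXT)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?)', [
    ("Title A", "Translator X"),
    ("Title B", None),
    ("Title C", None),
])

untranslated = db.execute(
    'SELECT "title" FROM "longlist" WHERE "translator" IS NULL;').fetchall()
translated = db.execute(
    'SELECT "title" FROM "longlist" WHERE "translator" IS NOT NULL;').fetchall()

# "= NULL" yields NULL (not true) for every row, so nothing matches.
equals_null = db.execute(
    'SELECT "title" FROM "longlist" WHERE "translator" = NULL;').fetchall()

print(sorted(untranslated))  # [('Title B',), ('Title C',)]
print(translated)            # [('Title A',)]
print(equals_null)           # []
```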
LIKE
This keyword is used to select data that matches a specified pattern. For example,
LIKE could be used to select books that have a certain word or phrase in their title.
LIKE is combined with the wildcards % (matches zero or more characters) and _
(matches exactly one character).
To select the books with the word “love” in their titles, we can run
SELECT "title"
FROM "longlist"
WHERE "title" LIKE '%love%';
% matches 0 or more characters, so this query would match book titles that have 0 or more
characters before and after “love” — that is, titles that contain “love”.
To select the books whose titles begin with “The”, we can run
SELECT "title"
FROM "longlist"
WHERE "title" LIKE 'The%';
The above query may also return books whose titles begin with “Their” or “They”. To select only
the books whose titles begin with the word “The”, we can add a space.
SELECT "title"
FROM "longlist"
WHERE "title" LIKE 'The %';
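You can check these patterns on a few made-up titles with Python's sqlite3 module. By default, SQLite's LIKE ignores case for ASCII letters. As a result, '%love%' matches "Love", and it also matches the "love" inside a word such as "Gloves".

```python
import sqlite3

# Made-up titles chosen to exercise the patterns above.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT)')
db.executemany('INSERT INTO "longlist" VALUES (?)', [
    ("The Glass Orchard",),
    ("Their Quiet Hours",),
    ("They Came at Dawn",),
    ("A Season of Love",),
    ("Gloves Off",),
])

def titles(pattern):
    """Return the sorted titles matching a LIKE pattern."""
    rows = db.execute('SELECT "title" FROM "longlist" WHERE "title" LIKE ?;', (pattern,))
    return sorted(r[0] for r in rows)

print(titles('The%'))    # ['The Glass Orchard', 'Their Quiet Hours', 'They Came at Dawn']
print(titles('The %'))   # ['The Glass Orchard']
print(titles('%love%'))  # ['A Season of Love', 'Gloves Off']
```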
Given that there is a book in the table whose name is either “Pyre” or “Pire”, we can select it by
running
SELECT "title"
FROM "longlist"
WHERE "title" LIKE 'P_re';
This query could also return book titles like “Pore” or “Pure” if they existed in our database,
because _ matches any single character.
Questions
Can we use multiple % or _ symbols in a query?
Yes, we can! Example 1: If we wanted to select books whose titles begin with “The” and have
“love” somewhere in the middle, we could run
SELECT "title"
FROM "longlist"
WHERE "title" LIKE 'The%love%';
Note: No book from our current database matches this pattern, so this query returns nothing.
Example 2: If we knew there was a book in the table whose title begins with “T” and has four
letters in it, we could try to find it by running
SELECT "title"
FROM "longlist"
WHERE "title" LIKE 'T___';
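Here is a sketch of the _ wildcard on made-up titles, using Python's sqlite3 module. Each _ matches exactly one character. So 'P_re' matches any four-letter title of that shape, and 'T___' matches only four-letter titles starting with "T".

```python
import sqlite3

# Made-up titles, not the course data.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT)')
db.executemany('INSERT INTO "longlist" VALUES (?)',
               [("Pyre",), ("Pure",), ("Pore",), ("Prayer",), ("Tide",), ("Tides",)])

def titles(pattern):
    """Return the sorted titles matching a LIKE pattern."""
    rows = db.execute('SELECT "title" FROM "longlist" WHERE "title" LIKE ?;', (pattern,))
    return sorted(r[0] for r in rows)

# Each _ stands for exactly one character, so longer titles do not match.
print(titles('P_re'))  # ['Pore', 'Pure', 'Pyre']
print(titles('T___'))  # ['Tide']
```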
Ranges
We can also use the operators <, >, <=, and >= in our conditions to match a range of values.
For example, to select all the books longlisted between the years 2019 and 2022 (inclusive), we
can run
SELECT "title", "author"
FROM "longlist"
WHERE "year" >= 2019 AND "year" <= 2022;
Another way to get the same results is using the keywords BETWEEN and AND to specify
inclusive ranges. We can run
SELECT "title", "author"
FROM "longlist"
WHERE "year" BETWEEN 2019 AND 2022;
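Here is a quick check on made-up rows, with one title per year from 2017 to 2024. It confirms that BETWEEN includes both endpoints and so matches the >= / <= version.

```python
import sqlite3

# Made-up rows: one hypothetical title per year.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT, "year" INTEGER)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?)',
               [(f"Title {y}", y) for y in range(2017, 2025)])

explicit = db.execute(
    'SELECT "title" FROM "longlist" WHERE "year" >= 2019 AND "year" <= 2022;').fetchall()
between = db.execute(
    'SELECT "title" FROM "longlist" WHERE "year" BETWEEN 2019 AND 2022;').fetchall()

print(sorted(explicit) == sorted(between))  # True
print(sorted(between))  # [('Title 2019',), ('Title 2020',), ('Title 2021',), ('Title 2022',)]
```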
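To select the books that have a rating of 4.0 or higher, we can run
SELECT "title", "rating"
FROM "longlist"
WHERE "rating" >= 4.0;
To further limit the selected books by number of votes, and have only those books with at least
10,000 votes, we can run
SELECT "title", "rating", "votes"
FROM "longlist"
WHERE "rating" >= 4.0 AND "votes" >= 10000;
To select the books that have fewer than 300 pages, we can run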
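Here is a sketch of these range conditions on made-up rows, using Python's sqlite3 module. The page count is assumed to live in a column named "pages"; that name is hypothetical, since this section does not show the schema. Note that a rating of exactly 4.0 passes >= 4.0 but would fail > 4.0.

```python
import sqlite3

# Made-up rows; the "pages" column name is an assumption for illustration.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" '
           '("title" TEXT, "rating" REAL, "votes" INTEGER, "pages" INTEGER)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?, ?, ?)', [
    ("Title A", 4.2, 15000, 250),
    ("Title B", 4.0, 8000, 410),
    ("Title C", 3.7, 22000, 180),
    ("Title D", 4.5, 10000, 320),
])

high_rated = db.execute(
    'SELECT "title" FROM "longlist" WHERE "rating" >= 4.0;').fetchall()
popular = db.execute(
    'SELECT "title" FROM "longlist" WHERE "rating" >= 4.0 AND "votes" >= 10000;').fetchall()
short = db.execute(
    'SELECT "title" FROM "longlist" WHERE "pages" < 300;').fetchall()

print(sorted(high_rated))  # [('Title A',), ('Title B',), ('Title D',)]
print(sorted(popular))     # [('Title A',), ('Title D',)]
print(sorted(short))       # [('Title A',), ('Title C',)]
```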
Questions
For range operators like < and >, do the values in the database have to be integers?
No, the values can be integers or floating-point (i.e., “decimal” or “real”) numbers. While
creating a database, there are ways to set these data types for columns.
ORDER BY
The ORDER BY keyword allows us to organize the returned rows in some specified order.
The following query selects the bottom 10 books in our database by rating.
SELECT "title", "rating"
FROM "longlist"
ORDER BY "rating" LIMIT 10;
Note that we get the bottom 10 books because ORDER BY chooses ascending order by default.
Instead, to select the top 10 books, we can run
SELECT "title", "rating"
FROM "longlist"
ORDER BY "rating" DESC LIMIT 10;
Note the use of the SQL keyword DESC to specify descending order. ASC can be used to
explicitly specify ascending order.
To select the top 10 books by rating and also include number of votes as a tie-break, we can
run
SELECT "title", "rating", "votes"
FROM "longlist"
ORDER BY "rating" DESC, "votes" DESC
LIMIT 10;
Note that for each column in the ORDER BY clause, we specify ascending or descending order.
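Here is a small sketch of sort order on made-up rows. Without DESC, the lowest ratings come first. When two books share a rating, the second ORDER BY column (votes, descending) decides which comes first.

```python
import sqlite3

# Made-up rows; Title A and Title C share a rating of 4.1.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT, "rating" REAL, "votes" INTEGER)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?, ?)', [
    ("Title A", 4.1, 500),
    ("Title B", 3.2, 900),
    ("Title C", 4.1, 1200),
    ("Title D", 3.9, 300),
])

# Default (ascending): lowest rating first.
bottom = db.execute('SELECT "title" FROM "longlist" ORDER BY "rating" LIMIT 2;').fetchall()

# Descending rating, with votes (also descending) breaking ties.
top = db.execute("""
    SELECT "title" FROM "longlist"
    ORDER BY "rating" DESC, "votes" DESC
    LIMIT 3;
""").fetchall()

print(bottom)  # [('Title B',), ('Title D',)]
print(top)     # [('Title C',), ('Title A',), ('Title D',)]
```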
Questions
To sort books by title alphabetically, can we use ORDER BY?
Yes. We can run
SELECT "title"
FROM "longlist"
ORDER BY "title";
Aggregate Functions
COUNT, AVG, MIN, MAX, and SUM are called aggregate functions and allow us to perform the
corresponding operations over multiple rows of data. By their very nature, each of the following
aggregate functions will return only a single output: the aggregated value.
To find the average rating of all books in the database
SELECT AVG("rating")
FROM "longlist";
To round the average rating to two decimal places
SELECT ROUND(AVG("rating"), 2)
FROM "longlist";
To select the maximum rating in the database
SELECT MAX("rating")
FROM "longlist";
To select the minimum rating in the database
SELECT MIN("rating")
FROM "longlist";
To find the total number of votes in the database
SELECT SUM("votes")
FROM "longlist";
To count the number of books in the database
SELECT COUNT(*)
FROM "longlist";
Remember that we used * to select every row and column from the database. In this
case, we are trying to count every row in the database and hence we use the *.
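All of these aggregates can be computed in one query. This sketch uses Python's sqlite3 module and three made-up rows. Each function collapses the rows into a single value, and ROUND trims the average to two decimal places.

```python
import sqlite3

# Three made-up rows, not the course data.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT, "rating" REAL, "votes" INTEGER)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?, ?)', [
    ("Title A", 3.5, 100),
    ("Title B", 4.0, 250),
    ("Title C", 4.25, 50),
])

# One row comes back, holding one value per aggregate.
avg_rating, rounded, lowest, highest, total_votes, n_books = db.execute("""
    SELECT AVG("rating"), ROUND(AVG("rating"), 2), MIN("rating"),
           MAX("rating"), SUM("votes"), COUNT(*)
    FROM "longlist";
""").fetchone()

print(rounded, lowest, highest, total_votes, n_books)  # 3.92 3.5 4.25 400 3
```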
To count the number of translators
SELECT COUNT("translator")
FROM "longlist";
We observe that the number of translators is fewer than the number of rows in the
database. This is because the COUNT function does not count NULL values.
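Here is a minimal sketch of the difference between COUNT(*) and COUNT("translator") on made-up rows. The first counts rows; the second skips NULLs.

```python
import sqlite3

# Made-up rows: two books have no translator (stored as NULL).
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT, "translator" TEXT)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?)', [
    ("Title A", "Translator X"),
    ("Title B", None),
    ("Title C", "Translator Y"),
    ("Title D", None),
])

rows, translators = db.execute(
    'SELECT COUNT(*), COUNT("translator") FROM "longlist";').fetchone()
print(rows, translators)  # 4 2
```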
To count the number of publishers in the database
SELECT COUNT("publisher")
FROM "longlist";
As with translators, this query will count the number of publisher values that are not NULL.
However, this may include duplicates. Another SQL keyword, DISTINCT, can be used to ensure
that only distinct values are counted.
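One way to apply DISTINCT here is inside the aggregate, as COUNT(DISTINCT "publisher"). Here is a sketch on made-up rows, where one publisher appears twice and one row has no publisher.

```python
import sqlite3

# Made-up rows: "Publisher P" is repeated and one publisher is NULL.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "longlist" ("title" TEXT, "publisher" TEXT)')
db.executemany('INSERT INTO "longlist" VALUES (?, ?)', [
    ("Title A", "Publisher P"),
    ("Title B", "Publisher Q"),
    ("Title C", "Publisher P"),
    ("Title D", None),
])

# COUNT skips the NULL; DISTINCT additionally collapses the duplicate.
all_values, distinct_values = db.execute(
    'SELECT COUNT("publisher"), COUNT(DISTINCT "publisher") FROM "longlist";').fetchone()
print(all_values, distinct_values)  # 3 2
```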