Lecture 3 - CS50's Introduction To Databases With SQL
Lecture 3 - CS50's Introduction To Databases With SQL
OpenCourseWare
Donate (https://cs50.harvard.edu/donate)
Lecture 3
• Introduction
• Database Schema
• Inserting Data
◦ Questions
• Other Constraints
• Inserting Multiple Rows
◦ Questions
• Deleting Data
◦ Questions
• Updating Data
• Fin
Introduction
Last week, we learned how to create our own database schema. In this lecture, we’ll explore how to
add, update, and delete data in our databases.
The Boston MFA (Museum of Fine Arts) is a century-old museum in Boston. The MFA manages a vast
collection of historical and contemporary artifacts and artwork. They likely use a database of some
kind to store data about their art and artifacts.
When a new artifact is added to their collection, we can imagine they would insert the
corresponding data to their database. Similarly, there are use cases in which data might need to be
read, updated or deleted.
We will focus now on the creation (or insertion) of data in a Boston MFA database.
Database Schema
Consider this schema that the MFA might use for its collection.
Each row of data contains the title for a piece of artwork along with the accession_number which is
a unique ID used by the museum internally. There is, too, a date indicating when the art was
acquired.
The table contains an ID which serves as the primary key.
We can imagine that the database administrator of the MFA runs an SQL query to insert each of
these pieces of artwork into the table.
To understand how this works, let us �rst create a database called mfa.db . Next, we read the
schema �le schema.sql into the database. This schema �le, already given to us, helps us create the
table collections .
To con�rm that the table has been created, we can select from the table.
This should give us an empty result, because the table doesn’t have any data yet.
Inserting Data
The SQL statement INSERT INTO is used to insert a row of data into a given table.
We can see that this command requires the list of columns in the table that will receive new data
and the values to be added to each column, in the same order.
Running the INSERT INTO command returns nothing, but we can run a query to con�rm that the row
is now present in collections .
We can add more rows to the database by inserting multiple times. However, typing out the value of
the primary key manually (as 1, 2, 3 etc.) might result in errors. Thankfully, SQLite can �ll out the
primary key values automatically. To make use of this functionality, we omit the ID column
altogether while inserting a row.
INSERT INTO "collections" ("title", "accession_number", "acquired")
VALUES ('Farmers working at dawn', '11.6152', '1911-08-03');
We can check that this row has been inserted with an id of 2 by running
Notice that the way SQLite �lls out the primary key values is by incrementing the previous primary
key—in this case, 1.
Questions
If we delete a row with the primary key 1, will SQLite automatically assign a primary key of 1 to the
next inserted row?
No, SQLite actually selects the highest primary key value in the table and increments it to generate
the next primary key value.
Other Constraints
Opening the �le schema.sql will pull up the schema for the database.
It is speci�ed that the accession number is unique. If we try to insert a row with a repeated accession
number, we will trigger a error that looks like Runtime error: UNIQUE constraint failed:
collections.accession_number (19) .
This error informs us that the row we are trying to insert violates a constraint in the schema—
speci�cally the UNIQUE constraint in this scenario.
Similarly, we can try to add a row with a NULL title, violating the NOT NULL constraint.
On running this, we will again see an error that looks like Runtime error: NOT NULL constraint
failed: collections.title (19) .
In this manner, the schema constraints are guardrails that protect us from adding rows that do not
follow the schema of our database.
The museum may not always know exactly when a painting was acquired, hence it is possible for the
acquired value to be NULL , as is the case for the �rst painting we just inserted.
To see the updated table, we can select all rows from the table as always.
SQLite makes it possible to import a CSV �le directly into our database. To do this, we need to start
from scratch. Let us leave this database mfa.db and then remove it.
We already have a CSV �le called mfa.csv that contains the data we need. On opening up this �le,
we can note that the �rst row contains the column names, which match exactly with the column
names of our table collections as per the schema.
First, let us create again the database mfa.db and read the schema �le as we did earlier.
Next, we can import the CSV by running a SQLite command.
The �rst argument, --csv indicates to SQLite that we are importing a CSV �le. This will help SQLite
parse the �le correctly. The second argument indicates that the �rst row of the CSV �le (the header
row) needs to be skipped, or not inserted into the table.
We can select all the data from the collections table to see that every painting from mfa.csv has
been successfully imported into the table.
The CSV �le we just inserted contained primary key values (1, 2, 3 etc.) for each row of data.
However, it is more likely that CSV �les we work with will not contain the ID or primary key values.
How can we have SQLite insert them automatically?
To try this out, let’s open up mfa.csv in our codespace and delete the id column from the header
row, along with the values in each column. This is what mfa.csv should look like once we �nish
editing:
title,accession_number,acquired
Profusion of flowers,56.257,1956-04-12
Farmers working at dawn,11.6152,1911-08-03
Spring outing,14.76,1914-01-08
Imaginative landscape,56.496,
Peonies and butterfly,06.1899,1906-01-01
We will also delete all the rows that are already within the collections table.
Now, we want to import this CSV �le into a table. However, the collections table (as per our
schema) must have four columns in every row. This new CSV �le contains only three columns for
every row. Hence, we cannot proceed to import in the same way we did before.
To successfully import the CSV �le without ID values, we will to use a temporary table:
Notice how we don’t use the argument --skip 1 with this command. This is because SQLite is
capable of recognizing the very �rst row of CSV data as the header row, and converts those into the
column names of the new temp table.
We can see the data within the temp table by querying it.
Next, we will select the data (without primary keys) from temp and move it to collections , which
was the goal all along! We can use the following command to achieve this.
In this process, SQLite will automatically add the primary key values in the id column.
Just to clean up our database, we can also drop the temp table once we’re done moving data.
Questions
Can we place columns in speci�c positions while inserting into a table?
While we can change the ordering of values in the INSERT INTO command, we usually can’t change
the ordering of the column names themselves. The order of column names follows the same order
used while creating the table.
What happens if one of the multiple rows we are trying to insert violates a table constraint?
While trying to insert multiple rows into a table, if even one of them violates a constraint, the
insertion command will result in an error and none of the rows will be inserted!
After inserting data from the CSV, one of the cells was empty and not NULL . Why did this happen?
When we imported data from the CSV �le, one of the acquired values was missing! This was
interpreted as text and hence, read into the table as an empty text value. We can run queries on the
table after importing to convert these empty values into NULL if required.
Deleting Data
We saw previously that running the following command deleted all rows from the table
collections . (We don’t want to actually run this command now or we’ll lose all the data in the
table!)
We can also delete rows that match speci�c conditions. For example, to delete the painting “Spring
outing” from our table collections we can run:
To delete any paintings with the date acquired as NULL we can run
As we always do, we will make sure the deletion worked as expected by selecting all data from the
table.
We see that the “Spring outing” and “Imaginative landscape” paintings are not in the table anymore.
To delete rows pertaining to paintings older than 1909, we can run
Using the < operator here, we are �nding the paintings acquired before January 1, 1909. These are
the paintings that will be deleted on running the query.
There might be cases where deleting some data could impact the integrity of a database. Foreign
key constraints are a good example. A foreign key column references the primary key of a different
table. If we were to delete the primary key, the foreign key column would have nothing to reference!
Consider now an updated schema for the MFA database, containing information not just about
artwork but also artists. The two entities Artist and Collection have a many-to-many relationship—a
painting can be created by many artists and a single artist can also create many pieces of artwork.
The artists and collections tables have primary keys—the ID columns. The created table
references these IDs in its two foreign key columns.
Given this database, if we choose to delete the unidenti�ed artist (with the ID 3), what would
happen to the rows in the table created with an artist_id of 3? Let’s try it out.
After opening up mfa.db , we can now see the updated schema by running the .schema command.
The created table does indeed have two foreign key constraints, one for the artist ID and one for
the collection ID.
Now, we can try to delete from the artists table.
On running this, we get an error very similar to ones we have seen before in this class: Runtime
error: FOREIGN KEY constraint failed (19) . This error noti�es us that deleting this data would
violate the foreign key constraint set up in the created table.
How do we ensure that the constraint is not violated? One possibility is to delete the corresponding
rows from the created table before deleting from the artists table.
This query effectively deletes the artist’s af�liation with their work. Once the af�liation no longer
exists, we can delete the artist’s data without violating the foreign key constraint. To do this, we can
run
In another possibility, we can specify the action to be taken when an ID referenced by a foreign key
is deleted. To do this, we use the keyword ON DELETE followed by the action to be taken.
ON DELETE RESTRICT : This restricts us from deleting IDs when the foreign key constraint is
violated.
ON DELETE NO ACTION : This allows the deletion of IDs that are referenced by a foreign key and
nothing happens.
ON DELETE SET NULL : This allows the deletion of IDs that are referenced by a foreign key and
sets the foreign key references to NULL .
ON DELETE SET DEFAULT : This does the same as the previous, but allows us to set a default
value instead of NULL .
ON DELETE CASCADE : This allows the deletion of IDs that are referenced by a foreign key and
also proceeds to cascadingly delete the referencing foreign key rows. For example, if we used
this to delete an artist ID, all the artist’s af�liations with the artwork would also be deleted
from the created table.
The latest version of the schema �le implements the above method. The foreign key constraints now
look like
Now running the following DELETE statement will not result in an error, and will cascade the
deletion from the artists table to the created table:
To check that this cascading deletion worked, we can query the created table:
We observe that none of the rows have an ID of 3 (the ID of the artist deleted from the artists
table).
Questions
We just deleted an artist with the ID of 3. Is there any way to make the next inserted row have an ID of
3?
By default, as we discussed before, SQLite will select the largest ID present in the table and
increment it to obtain the next ID. But we can use the AUTOINCREMENT keyword while creating a
column to indicate that any deleted ID should be repurposed for a new row being inserted into the
table.
Updating Data
We can easily imagine scenarios in which data in a database would need to be updated. Perhaps, in
the case of the MFA database, we �nd out that the painting “Farmers working at dawn” originally
mapped to an “Unidenti�ed artist” was actually created by the artist Li Yin.
We can use the update command to make changes to say, the af�liation of a painting. Here is the
syntax of the update command.
Let’s change this af�liation for “Farmers working at dawn” in the created table using the above
syntax.
UPDATE "created"
SET "artist_id" = (
SELECT "id"
FROM "artists"
WHERE "name" = 'Li Yin'
)
WHERE "collection_id" = (
SELECT "id"
FROM "collections"
WHERE "title" = 'Farmers working at dawn'
);
The �rst part of this query speci�es the table to be updated. The next part retrieves the ID of Li Yin
to set as the new ID. The last part selects the row(s) in created which will be updated with the ID of
Li Yin, which is the painting “Farmers working at dawn”!
Fin
This brings us to the conclusion of Lecture 3 about Writing in SQL!