Relational Database SQL
Imagine a situation where you are part of the school administration team in the olden days,
having to manage hundreds and thousands of physical files of staff and student records in
multiple cabinets.
With the advancement of technology, we no longer have to keep and manage physical records,
as the data can be stored in databases instead.
In general, there are two types of databases: relational and non-relational. In this chapter,
we shall look into the former.
A relational (SQL) database is a collection of relational tables with a fixed schema, which is
the precise description of the data to be stored and the relationships between them. In this
model, the data are stored in relational tables and represented in the form of tuples as follows.
<TableName>(<Field1>, <Field2>, …)
Attributes of Relational Database
A table is a two-dimensional representation of data stored in rows and columns. Each table is
made up of records and fields.
A record is a complete set of data about a single entity in the table. In the table above, there
are 10 records, each referring to the complete set of data of a particular student.
A field or column refers to one type of data about the entities in the table. In the table above,
there are 4 fields: RegNo, Name, Gender and MobileNo.
Quick Check
Express the table StudentMD10 using the tuple representation mentioned earlier.
Keys in Relational Database
A candidate key is a minimal set of fields that can uniquely identify each record in a table. It
should never be empty.
A primary key is a candidate key that is most appropriate to become the main key for a table.
It uniquely identifies each record in a table and should not change over time. That is, a primary
key tells a particular record apart from another record.
Quick Check
Which of the fields in the table StudentMD10 is a suitable primary key?
RegNo
A composite key is a combination of two or more fields in a table that can be used to uniquely
identify each record in a table. Uniqueness is only guaranteed when the fields are combined.
When taken individually, the fields do not guarantee uniqueness.
Quick Check
A table called StudentMD1011 is shown below.
Which two fields form the composite key for the table?
RegNo and FormClass
A foreign key is a field in one table that refers to the primary key in another table.
To illustrate this concept, take a look at another table below called ClassInfo with
FormClass chosen to be the primary key.
Notice that the primary key (PK) in the table ClassInfo is related or linked to the FormClass
field in table StudentMD1011. This makes FormClass in the table StudentMD1011 a foreign
key (FK).
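The link between the two tables can be tried out in Python with the built-in sqlite3 module. The following is a minimal sketch rather than the actual school database: the column types, the FormTutor column and the sample rows are assumptions made for illustration. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been run.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.execute("CREATE TABLE ClassInfo "
             "(FormClass TEXT PRIMARY KEY, FormTutor TEXT)")
conn.execute("CREATE TABLE StudentMD1011 "
             "(RegNo INTEGER, Name TEXT, FormClass TEXT, "
             "PRIMARY KEY (RegNo, FormClass), "
             "FOREIGN KEY (FormClass) REFERENCES ClassInfo(FormClass))")

conn.execute("INSERT INTO ClassInfo VALUES ('MD10', 'Peter Lim')")
conn.execute("INSERT INTO StudentMD1011 VALUES (1, 'Adam', 'MD10')")  # accepted

try:
    # 'MD99' is not a FormClass in ClassInfo, so the foreign key rejects this row.
    conn.execute("INSERT INTO StudentMD1011 VALUES (2, 'Bala', 'MD99')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
conn.close()
```

The second student is rejected because MD99 does not exist as a primary key value in ClassInfo.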
Data Redundancy
Data redundancy refers to the same data being stored more than once.
As we can see, the data for FormClass and FormTutor are repeated for students who are
in the same form class. This may lead to potential issues on insertion, updating and deletion
of data, such as:
Insertion: A new student cannot be inserted unless a form class and a form tutor have been
assigned.
Update: Should Mr Peter Lim quit the school, every record that lists him as form tutor would
need to be updated. Should we miss any record, it would lead to inconsistent data.
Deletion: Should all the records in the table be deleted, information on form class and form
tutor would be lost.
Data Dependency
Functional dependency
Attribute Y is functionally dependent on attribute X (usually the primary key), if for every
valid instance of X, the value of X uniquely determines the value of Y, i.e. X → Y.
MatricNo uniquely identifies Name because if we know the MatricNo, we can know the
Name associated with it. Therefore, we can say Name is functionally dependent on MatricNo,
i.e. MatricNo → Name
Transitive dependency
If X → Y and Y → Z, then Z is transitively dependent on X through Y. For example:
MatricNo → FormClass
FormClass → FormTutor
Therefore, MatricNo → FormTutor
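A dependency of this kind can be checked against sample data with a few lines of Python. The helper is_functionally_dependent below is a hypothetical name, and the rows are taken from the student examples in this chapter. Passing the check only shows that the sample is consistent with the dependency; it does not prove that the dependency holds for all possible data.

```python
def is_functionally_dependent(rows, x, y):
    """Return True if every value of field x maps to exactly one value of field y."""
    seen = {}
    for row in rows:
        if row[x] in seen and seen[row[x]] != row[y]:
            return False  # same x value paired with two different y values
        seen[row[x]] = row[y]
    return True


rows = [
    {"MatricNo": 1, "Name": "Adam", "FormClass": "MD10", "FormTutor": "Peter Lim"},
    {"MatricNo": 2, "Name": "Adrian", "FormClass": "MD10", "FormTutor": "Peter Lim"},
    {"MatricNo": 3, "Name": "Adam", "FormClass": "MD11", "FormTutor": "Susan Tan"},
]

fd_matric_name = is_functionally_dependent(rows, "MatricNo", "Name")
fd_name_matric = is_functionally_dependent(rows, "Name", "MatricNo")
fd_class_tutor = is_functionally_dependent(rows, "FormClass", "FormTutor")
print(fd_matric_name, fd_name_matric, fd_class_tutor)
```

Name does not determine MatricNo here because two different students are both called Adam.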
Normalisation
First Normal Form (1NF)
For a table to be in 1NF, all columns must be atomic, i.e. the information cannot be broken
down further.
For this example, assume that every form class has only one form tutor, and each CCA has
only one teacher IC.
The table above is not in 1NF because the CCAInfo column contains multiple values.
In order for the table to be in 1NF, we can split CCAInfo into two single-value columns:
CCAName and CCATeacherIC. Notice that the students with MatricNo 2 and 5 have multiple
CCAs. We keep this information intact by splitting their records into multiple records, each
corresponding to a different CCA. The resulting table is shown below.
The values for CCAName and CCATeacherIC are now atomic for each record.
The primary key for the above table shall be the composite key formed by MatricNo and
CCAName.
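The split into atomic values can be sketched in Python. The exact layout of the original CCAInfo column is not reproduced here: the sketch assumes that each student's CCAInfo is held as a list of (CCA name, teacher IC) pairs, and it shows only two students.

```python
# Un-normalised records: the third item holds several values at once.
unnormalised = [
    (1, "Adam", [("Tennis", "Adrian Tan")]),
    (2, "Adrian", [("Choir", "Sanjay Vittal"), ("Art Club", "Nur Fauziah")]),
]

# One output record per (student, CCA) pair, so every field is atomic.
rows_1nf = [
    (matric_no, name, cca_name, teacher_ic)
    for matric_no, name, cca_info in unnormalised
    for cca_name, teacher_ic in cca_info
]

# The composite key (MatricNo, CCAName) should be unique across the records.
keys = {(row[0], row[2]) for row in rows_1nf}
key_is_unique = len(keys) == len(rows_1nf)

for row in rows_1nf:
    print(row)
```

The student with MatricNo 2 now appears in two records, one for each CCA.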
Second Normal Form (2NF)
Name, Gender, FormClass, FormTutor and BaseClass are dependent on only part of the
primary key, namely MatricNo. Such partial dependencies are not allowed in 2NF.
Thus, we decompose the 1NF table into three tables shown below.
Student
MatricNo  Name     Gender  FormClass  FormTutor  BaseClass
1         Adam     M       MD10       Peter Lim  F3.1
2         Adrian   M       MD10       Peter Lim  F3.1
3         Adam     M       MD11       Susan Tan  F3.2
4         Bala     M       MD11       Susan Tan  F3.2
5         Bee Lay  F       MD11       Susan Tan  F3.2
StudentCCA
MatricNo  CCAName
1         Tennis
2         Choir
2         Art Club
3         Rugby
4         Tech Council
5         Choir
5         Chess
CCAInfo
CCAName       CCATeacherIC
Tennis        Adrian Tan
Choir         Sanjay Vittal
Art Club      Nur Fauziah
Rugby         Zoe Lim
Tech Council  Lilian Phua
Chess         Edison Poh
Quick Check
What should be the primary or composite key for each of the three tables above?
The composite key for table StudentCCA should be MatricNo and CCAName.
Third Normal Form (3NF)
Quick Check
Explain the transitive dependency found in the Student table.
To remove the transitive dependency, we decompose the 2NF Student table into two tables
shown below.
Student
MatricNo  Name     Gender  FormClass
1         Adam     M       MD10
2         Adrian   M       MD10
3         Adam     M       MD11
4         Bala     M       MD11
5         Bee Lay  F       MD11
FormInfo
FormClass  FormTutor  BaseClass
MD10       Peter Lim  F3.1
MD11       Susan Tan  F3.2
StudentCCA(MatricNo, CCAName)
CCAInfo(CCAName, CCATeacherIC)
The primary key of StudentCCA is the composite key formed by MatricNo and CCAName, while
the primary key of CCAInfo is CCAName.
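One benefit of the 3NF design can be sketched with sqlite3: each form tutor is stored only once, so a change of tutor touches a single record. The sketch assumes TEXT and INTEGER column types, uses a hypothetical replacement tutor called 'New Tutor', and creates only the Student and FormInfo tables with a few of the sample records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
conn.execute("CREATE TABLE FormInfo "
             "(FormClass TEXT PRIMARY KEY, FormTutor TEXT, BaseClass TEXT)")
conn.execute("CREATE TABLE Student "
             "(MatricNo INTEGER PRIMARY KEY, Name TEXT, Gender TEXT, "
             "FormClass TEXT REFERENCES FormInfo(FormClass))")
conn.execute("INSERT INTO FormInfo VALUES "
             "('MD10', 'Peter Lim', 'F3.1'), ('MD11', 'Susan Tan', 'F3.2')")
conn.execute("INSERT INTO Student VALUES "
             "(1, 'Adam', 'M', 'MD10'), (2, 'Adrian', 'M', 'MD10'), "
             "(3, 'Adam', 'M', 'MD11')")

# Replacing the tutor of MD10 changes exactly one record in FormInfo.
cursor = conn.execute("UPDATE FormInfo SET FormTutor = 'New Tutor' "
                      "WHERE FormClass = 'MD10'")
records_changed = cursor.rowcount

# Every MD10 student picks up the new tutor when the tables are joined.
tutors = conn.execute(
    "SELECT Student.MatricNo, FormInfo.FormTutor "
    "FROM Student INNER JOIN FormInfo "
    "ON Student.FormClass = FormInfo.FormClass "
    "ORDER BY Student.MatricNo").fetchall()
print(records_changed, tutors)
conn.close()
```

In the un-normalised table, the same change would have required one update per student in the form class.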
Note:
In the H2 Computing 9569 syllabus, candidates are required to reduce data redundancy to
3NF only. Nevertheless, going through 1NF and 2NF may help in some situations.
Entity-Relationship (E-R) Diagram
An entity-relationship (E-R) diagram is a data modelling technique that illustrates the entities
of a database and the relationships among those entities. It is useful in the planning of the
design of relational databases.
For the purpose of the syllabus, we shall only cover a simplified convention for the drawing of
E-R diagrams using crow’s foot notation.
An entity is a specific object of interest. Nouns are usually used to name entities. Entities are
represented by rectangles.
e.g. Student
A relationship describes the link between two entities. One of the following relationships can
exist between two entities:
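one-to-one
[Diagram: Entity 1 and Entity 2 joined by a one-to-one relationship]
For example, at a concert with reserved seating, each ticket entitles someone to a
particular seat and each seat is linked to only one ticket.
[Diagram: Ticket and Seat joined by a one-to-one relationship]
one-to-many
[Diagram: Entity 1 and Entity 2 joined by a one-to-many relationship]
For example, a form class can have many students, but a student can belong to only
one form class.
many-to-many
[Diagram: Entity 1 and Entity 2 joined by a many-to-many relationship]
For example, a CCA can have many students, and a student can join many CCAs.
[Diagram: Student and CCAInfo joined by a many-to-many relationship]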
To implement a many-to-many relationship in a relational database, we usually
decompose a many-to-many relationship into two (or more) one-to-many relationships.
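e.g. the many-to-many relationship between Student and CCAInfo can be decomposed into two
one-to-many relationships through the StudentCCA table.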
Quick Check
Refer to the following normalised tables covered earlier.
StudentCCA(MatricNo, CCAName)
CCAInfo(CCAName, CCATeacherIC)
Draw an E-R diagram to model the simple school database described above.
Quick Check
A school library contains books that can be on loan to borrowers.
Draw an E-R diagram to model the school library database described above.
[E-R diagram: entities Book, Borrower, Loan and Publisher]
Structured Query Language (SQL) is a standard computer language for the operation and
management of relational databases. It is used to query, insert, update and delete
data.
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of
the International Organization for Standardization (ISO) in 1987. Since then, the standard has
been updated several times. Most major relational databases support this standard, but have their
own proprietary extensions.
There are many types of SQL database engines. A database engine is the software that a
database management system (DBMS) uses to create, read, update and delete (CRUD) data
from a database.
We are going to use SQLite, a widely used database engine, for the purpose of the syllabus.
It is a popular choice as embedded database software for local/client storage in application
software, such as web browsers.
To visualise the databases that we are going to encounter throughout the course of this study,
we shall make use of DB Browser for SQLite.
Database Operations
In industry-based database applications, all four categories of SQL commands listed below
are required.
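SQL Commands
DDL (Data Definition Language), e.g. CREATE and DROP
DML (Data Manipulation Language), e.g. INSERT, SELECT, UPDATE and DELETE
DCL (Data Control Language), e.g. GRANT and REVOKE
TCL (Transaction Control Language), e.g. COMMIT and ROLLBACK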
Some of the more advanced commands under DCL and TCL are more relevant to industry-
specific roles, such as database administrators.
For the purpose of our learning, we only need to be able to understand and apply these basic
CRUD database operations:
SQL Data Types
Each field in an SQL table has to be associated with one data type. The following table shows
some of the common data types.
Refer to the school library database that we have discussed earlier.
Open sql_lecture.db in DB Browser for SQLite. Three tables (Book, Publisher and
Unused, which shall be deleted later on) have been defined.
The summary of the tables required in this particular database, together with the fields and
their constraints, are shown below.
Borrower
Field       Data Type  Constraint
BorrowerID  Numeric    PRIMARY KEY, AUTOINCREMENT
FirstName   String     NOT NULL
Surname     String     NOT NULL
ContactNum  Numeric
Loan
Field         Data Type  Constraint
LoanID        Numeric    PRIMARY KEY, AUTOINCREMENT
BorrowerID    Numeric    FOREIGN KEY to BorrowerID in Borrower table
BookID        Numeric    FOREIGN KEY to BookID in Book table
DateBorrowed  String     (Desired format: YYYYMMDD)
Book
Field        Data Type  Constraint
BookID       Numeric    PRIMARY KEY, AUTOINCREMENT
BookTitle    String     NOT NULL
PublisherID  Numeric    FOREIGN KEY to PublisherID in Publisher table
Damaged      Numeric    NOT NULL (0 means undamaged, 1 means damaged)
Publisher
Field          Data Type  Constraint
PublisherID    Numeric    PRIMARY KEY, AUTOINCREMENT
PublisherName  String     NOT NULL
DDL: CREATE
PRIMARY KEY
FOREIGN KEY … REFERENCES …
NOT NULL
A value must be inserted into the field.
UNIQUE
No two records can repeat the same value within the field.
AUTOINCREMENT
When no value is specified, the database automatically assigns the next integer (1 more than
the largest value used so far).
The following SQL statements, separated by a semi-colon, create the Borrower and Loan
tables respectively in the database.
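Statements of this kind might look like the sketch below, which is run through Python's sqlite3 module against a temporary in-memory database instead of sql_lecture.db. It follows the field summary given earlier, but the choice of INTEGER for "Numeric" and TEXT for "String" is an assumption (SQLite requires INTEGER PRIMARY KEY for AUTOINCREMENT), and the statements in the original notes may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: the lecture uses sql_lecture.db

# Two statements separated by a semi-colon, run together with executescript().
conn.executescript("""
CREATE TABLE Borrower (
    BorrowerID INTEGER PRIMARY KEY AUTOINCREMENT,
    FirstName  TEXT NOT NULL,
    Surname    TEXT NOT NULL,
    ContactNum INTEGER
);
CREATE TABLE Loan (
    LoanID       INTEGER PRIMARY KEY AUTOINCREMENT,
    BorrowerID   INTEGER,
    BookID       INTEGER,
    DateBorrowed TEXT,
    FOREIGN KEY (BorrowerID) REFERENCES Borrower(BorrowerID),
    FOREIGN KEY (BookID) REFERENCES Book(BookID)
);
""")

# List the user-defined tables (SQLite's internal tables start with "sqlite_").
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
print(tables)
conn.close()
```

The Book table does not exist in this temporary database; SQLite still accepts the FOREIGN KEY clause when the table is created and only checks it when records are modified with foreign key enforcement switched on.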
DDL: DROP
The DROP command allows us to delete an entire table and all the records inside.
DML: INSERT
PublisherID  PublisherName
1            NPH
2            Unpop
3            Appleson
4            Squirrel
5            Yellow Flame
As a quick exercise, insert the following records into the Borrower and Loan tables.
Borrower
BorrowerID  FirstName  Surname  ContactNum
1           Peter      Tan      999
2           Sarah      Lee      81111123
3           Kumara     Ravi     94456677
4           Some       User
Loan
LoanID  BorrowerID  BookID  DateBorrowed
1       3           2       20190220
2       3           1       20181215
3       2           3       20181231
4       1           5       20190111
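Two common forms of the INSERT command can be sketched as follows, again using sqlite3 and a temporary in-memory database. The column types are assumptions, and only two of the borrowers are inserted, so the automatically assigned BorrowerID differs from the one in the exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Borrower "
             "(BorrowerID INTEGER PRIMARY KEY AUTOINCREMENT, "
             "FirstName TEXT NOT NULL, Surname TEXT NOT NULL, "
             "ContactNum INTEGER)")

# Form 1: a value is given for every field, in the order the fields were defined.
conn.execute("INSERT INTO Borrower VALUES (1, 'Peter', 'Tan', 999)")

# Form 2: only the named fields are given. BorrowerID is assigned
# automatically and ContactNum is left empty (NULL).
conn.execute("INSERT INTO Borrower (FirstName, Surname) "
             "VALUES ('Some', 'User')")
conn.commit()

rows = conn.execute("SELECT * FROM Borrower ORDER BY BorrowerID").fetchall()
print(rows)
conn.close()
```

An empty field comes back to Python as None.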
DML: SELECT
To select only one or a subset of fields, we use the field names separated by commas.
To order the selected records according to some field values in ascending or descending order,
we use ORDER BY … ASC/DESC.
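These variations of SELECT can be sketched with the Publisher records shown earlier. The sketch uses sqlite3 with a temporary in-memory database, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publisher "
             "(PublisherID INTEGER PRIMARY KEY, PublisherName TEXT NOT NULL)")
conn.execute("INSERT INTO Publisher VALUES "
             "(1, 'NPH'), (2, 'Unpop'), (3, 'Appleson'), "
             "(4, 'Squirrel'), (5, 'Yellow Flame')")

# All fields of one record, chosen with a WHERE condition.
one_record = conn.execute(
    "SELECT * FROM Publisher WHERE PublisherID = 3").fetchall()

# A single field, sorted in descending alphabetical order.
names_desc = conn.execute(
    "SELECT PublisherName FROM Publisher "
    "ORDER BY PublisherName DESC").fetchall()

print(one_record)
print(names_desc)
conn.close()
```

Replacing DESC with ASC (or leaving it out) sorts the names in ascending order instead.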
DML: UPDATE
The UPDATE command allows us to edit the data values in a database. One or more records
may be updated at the same time.
UPDATE <table_name>
SET <column1_name = column1_value, column2_name = column2_value, …>
WHERE <condition(s)>
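The effect of the WHERE clause can be sketched with sqlite3. The new surname 'Tay' and contact number 91234567 are made-up values, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Borrower "
             "(BorrowerID INTEGER PRIMARY KEY, FirstName TEXT NOT NULL, "
             "Surname TEXT NOT NULL, ContactNum INTEGER)")
conn.execute("INSERT INTO Borrower VALUES "
             "(1, 'Peter', 'Tan', 999), (2, 'Sarah', 'Lee', 81111123), "
             "(3, 'Kumara', 'Ravi', 94456677)")

# Update two fields of the one record that meets the condition.
one = conn.execute("UPDATE Borrower "
                   "SET Surname = 'Tay', ContactNum = 91234567 "
                   "WHERE BorrowerID = 1")
updated_one = one.rowcount  # number of records changed by the statement

# Without WHERE, every record in the table is updated.
everyone = conn.execute("UPDATE Borrower SET ContactNum = NULL")
updated_all = everyone.rowcount
conn.commit()

peter = conn.execute("SELECT Surname, ContactNum FROM Borrower "
                     "WHERE BorrowerID = 1").fetchone()
print(updated_one, updated_all, peter)
conn.close()
```

Leaving out the WHERE clause by mistake is a common way to overwrite an entire table, so it is worth checking the condition before running an UPDATE.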
DML: DELETE
Quick Check
For the Book table, write an SQL statement to insert an undamaged book titled “Eleventh
Night” with BookID no. 8 and PublisherID no. 2.
For the Book table, write an SQL statement to update the condition of the book titled
“Eleventh Night” to damaged.
UPDATE Book
SET Damaged = 1
WHERE BookTitle = 'Eleventh Night'
For the Book table, write an SQL statement to retrieve the titles of all the books that have
publishers and are damaged.
For the Borrower table, write an SQL statement to delete all the records without contact
numbers.
DROP TABLE deletes the table and all the records inside. Since the table has been deleted,
it is no longer possible to add records into Table1 anymore.
DELETE FROM does not delete the table, but only all the records inside. That means it is
possible to add records again into Table2.
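The difference can be confirmed with a short sqlite3 sketch. Table1 and Table2 here are throwaway tables with a single assumed ID column, not the tables from the original exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Table2 (ID INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO Table1 VALUES (1)")
conn.execute("INSERT INTO Table2 VALUES (1)")

conn.execute("DROP TABLE Table1")   # the table and its records are gone
conn.execute("DELETE FROM Table2")  # the records are gone, the table remains

try:
    conn.execute("INSERT INTO Table1 VALUES (2)")
    table1_usable = True
except sqlite3.OperationalError:    # raised because Table1 no longer exists
    table1_usable = False

conn.execute("INSERT INTO Table2 VALUES (2)")  # still works
table2_rows = conn.execute("SELECT ID FROM Table2").fetchall()
print(table1_usable, table2_rows)
conn.close()
```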
JOIN
Without a join condition, an inner join returns the Cartesian product of rows from the tables,
i.e. it combines each row in the first table with each row in the second table.
For example, to check the name of the publisher of each of the books in the library database,
we can write the following SQL statement.
The resulting table is a big table in which most records pair a book with a publisher whose
PublisherID does not match.
In order to retrieve only the useful records, we can add a condition as follows.
The table above is more meaningful as it links the book titles to the correct publishers.
However, notice that H2 Computing Ten Year Series has been omitted as it has no
PublisherID.
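In such a case, we need to use a left outer join, which returns all the records from the first
(left) table together with the records from the other table that meet the join conditions.
Where a record has no match, the fields from the other table are left empty (NULL).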
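The three kinds of result described in this section can be compared side by side with sqlite3. This is a sketch under stated assumptions: 'Sample Book A' and 'Sample Book B' are made-up titles, only two publishers are used, and the join condition is written with ON, which may differ from the form used in the original statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publisher (PublisherID INTEGER PRIMARY KEY,
                        PublisherName TEXT NOT NULL);
CREATE TABLE Book (BookID INTEGER PRIMARY KEY, BookTitle TEXT NOT NULL,
                   PublisherID INTEGER, Damaged INTEGER NOT NULL);
INSERT INTO Publisher VALUES (1, 'NPH'), (2, 'Unpop');
INSERT INTO Book VALUES (1, 'Sample Book A', 1, 0),
                        (2, 'Sample Book B', 2, 1),
                        (3, 'H2 Computing Ten Year Series', NULL, 0);
""")

fields = "SELECT Book.BookTitle, Publisher.PublisherName FROM Book "
condition = "ON Book.PublisherID = Publisher.PublisherID ORDER BY Book.BookID"

# No condition: every book is paired with every publisher (3 x 2 records).
product = conn.execute(fields + "INNER JOIN Publisher").fetchall()

# With a condition: only books with a matching publisher are kept.
inner = conn.execute(fields + "INNER JOIN Publisher " + condition).fetchall()

# Left outer join: every book is kept, with None where there is no publisher.
left = conn.execute(fields + "LEFT OUTER JOIN Publisher " + condition).fetchall()

print(len(product), inner, left)
conn.close()
```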
Quick Check
Write an SQL statement to retrieve the titles of all the books that are not damaged with their
publisher names.
AGGREGATE FUNCTIONS
There are a few aggregate functions that we can use in SQL statements to calculate results
from a given database:
MIN (minimum value)
MAX (maximum value)
SUM (sum of all values)
COUNT (number of values)
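The four functions can be tried on a small Book table. This is a sketch with made-up sample titles and assumed column types; note that COUNT applied to a field skips empty (NULL) values, while COUNT(*) counts every record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book "
             "(BookID INTEGER PRIMARY KEY, BookTitle TEXT NOT NULL, "
             "PublisherID INTEGER, Damaged INTEGER NOT NULL)")
conn.execute("INSERT INTO Book VALUES "
             "(1, 'Sample Book A', 1, 0), (2, 'Sample Book B', 2, 1), "
             "(3, 'H2 Computing Ten Year Series', NULL, 0), "
             "(4, 'Sample Book C', 1, 1)")

summary = conn.execute(
    "SELECT COUNT(*), COUNT(PublisherID), SUM(Damaged), "
    "MIN(BookID), MAX(BookID) FROM Book").fetchone()

total_books, books_with_publisher, damaged_books, lowest_id, highest_id = summary
print(summary)
conn.close()
```

Since Damaged is 0 or 1, SUM(Damaged) gives the number of damaged books.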
OPERATORS
We have seen some operators being used in the examples earlier. These operators are often
used in the SELECT statements, but can be used in other statements like UPDATE. The
following are the three types of operators that we are expected to know.
Comparison Operators
=, !=, <, <=, >, >=
Logical Operators
AND, OR, IS, IS NOT, || (string concatenation)
Arithmetic Operators
+, -, *, /, %
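A few of these operators are combined in the sketch below, which uses the Borrower records from the earlier exercise in a temporary in-memory database. The queries themselves are illustrations and are not taken from the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Borrower "
             "(BorrowerID INTEGER PRIMARY KEY, FirstName TEXT NOT NULL, "
             "Surname TEXT NOT NULL, ContactNum INTEGER)")
conn.execute("INSERT INTO Borrower VALUES "
             "(1, 'Peter', 'Tan', 999), (2, 'Sarah', 'Lee', 81111123), "
             "(3, 'Kumara', 'Ravi', 94456677), (4, 'Some', 'User', NULL)")

# IS NULL finds empty fields; || joins strings together.
no_contact = conn.execute(
    "SELECT FirstName || ' ' || Surname FROM Borrower "
    "WHERE ContactNum IS NULL").fetchall()

# AND with the remainder operator %: odd IDs that have a contact number.
odd_with_contact = conn.execute(
    "SELECT BorrowerID FROM Borrower "
    "WHERE ContactNum IS NOT NULL AND BorrowerID % 2 = 1 "
    "ORDER BY BorrowerID").fetchall()

# OR with a comparison operator.
tan_or_later = conn.execute(
    "SELECT BorrowerID FROM Borrower "
    "WHERE Surname = 'Tan' OR BorrowerID >= 4 "
    "ORDER BY BorrowerID").fetchall()

print(no_contact, odd_with_contact, tan_or_later)
conn.close()
```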
Python and SQLite
DB Browser for SQLite is a convenient program for us to experiment with SQL statements and
examine the results. However, it is not an appropriate program to use if we want to customise
or restrict how the contents of a database are modified or presented.
Suppose we have a database that stores information about the books in a library. We should
not let users search the database with DB Browser for SQLite, as not everyone is familiar
with SQL statements. That aside, malicious users may run harmful statements, e.g. DROP
TABLE to delete tables from the database.
As such, a developer typically writes a custom program, with an interface that is easy to
understand and use, to control how users interact with a database. Based on the users’
inputs, the program would then generate the appropriate SQL statements in the background
and run them to produce the intended results. In this way, the users are prevented from
modifying the database directly.
We shall learn how to write Python programs that can interact with SQLite databases using
the built-in sqlite3 module.
Quick Check
Which of the following is not a valid reason why DB Browser for SQLite should not be
accessible to the users of a public library?
A Users may use the program to insert fake data into the database.
B Users may use the program to drop tables from the database.
C Users may use the program to perform a query that returns nothing.
D Users may not know how to perform the query using the program.
Loading a Database
Program 1: load_example.py
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.close()
The connect() function (line 3) takes in a string that contains the path and filename of a
database file and returns a Connection object. If no path is included, the file is assumed to
be in the current working directory, which is usually the directory of the Python file.
Furthermore, if the specified file does not exist, an empty file will be created with the given
filename instead.
After all operations with the database are complete, the close() method (line 4) of the
Connection object should then be called. This ensures that the database file is closed
properly, but does not save any modifications that have been made to the data.
Executing SQL Statements
Program 2: create_example.py
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.execute("CREATE TABLE Book " +
5 "(ID INTEGER PRIMARY KEY, Title TEXT)")
6 connection.commit()
7 connection.close()
The execute() method (line 4) takes in a string containing the SQL statement we wish to
run.
The commit() method (line 6) saves the change(s) made to the database.
After running the program above, we can use DB Browser for SQLite to check that a table
called Book has indeed been created.
However, if we try to run the program again, we will get the following error:
This demonstrates that calling execute() is just like running regular SQL statements in the
"Execute SQL" tab of DB Browser for SQLite. Any errors caused by running SQL statements
are reported as Python exceptions.
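Such an exception can be caught like any other Python exception. The following is a minimal sketch that uses a temporary in-memory database instead of library.db, so that it can be run repeatedly with the same result.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
create_sql = "CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT)"

connection.execute(create_sql)  # first attempt succeeds

try:
    connection.execute(create_sql)  # second attempt fails: Book already exists
    error_message = None
except sqlite3.OperationalError as error:
    error_message = str(error)

print(error_message)
connection.close()
```

Catching the exception lets the program show a friendlier message or carry on with the existing table instead of stopping.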
Program 3: insert_example_incomplete.py
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.execute("INSERT INTO Book(ID, Title) " +
5 "VALUES(0, 'Example Book')")
6 connection.close()
The program above runs with no errors. However, if we open library.db using DB Browser
for SQLite, we can see that the inserted data is missing from the Book table.
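A transaction is a unit of work that is performed against a database. Using an INSERT, UPDATE
or DELETE command opens a transaction that can either be committed or rolled back. Program 3
closes the connection without committing its transaction, so the inserted data is discarded.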
Program 4: insert_example.py
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.execute("INSERT INTO Book(ID, Title) " +
5 "VALUES(0, 'Example Book')")
6 connection.commit()
7 connection.close()
With a call to commit() added on line 6, the data are inserted and saved correctly.
Program 5: rollback_example.py
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4
5 connection.execute("INSERT INTO Book(ID, Title) " +
6 "VALUES(1, 'Rollback Book')")
7 connection.execute("INSERT INTO Book(ID, Title) " +
8 "VALUES(2, 'Also Rollback Book')")
9 connection.rollback()
10
11 connection.execute("INSERT INTO Book(ID, Title) " +
12 "VALUES(3, 'Committed Book')")
13 connection.commit()
14
15 connection.close()
The rollback() method (line 9) discards any changes done by the preceding SQL
statements. In the example shown above, the first two INSERT statements are rolled back so
that they have no effect on the database. On the other hand, the last INSERT statement is
committed so it does affect the database.
This behaviour of SQLite is useful as sometimes we may wish to discard any modifications
since the last transaction was opened. For instance, in our library example, we may start the
process of placing a book on loan, but discover partway that the borrower has already reached
the limit of borrowed books. We can discard all the changes made since the transaction was
opened by calling the Connection object's rollback() method.
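The loan scenario can be sketched as follows. LOAN_LIMIT and the borrow() function are hypothetical names introduced for illustration, the Loan table is simplified, and the ? placeholders are explained in the section on parameter substitution.

```python
import sqlite3

LOAN_LIMIT = 2  # hypothetical maximum number of books per borrower

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Loan "
                   "(LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER, "
                   "BookID INTEGER)")
# Borrower 3 already has two books on loan.
connection.execute("INSERT INTO Loan (BorrowerID, BookID) VALUES (3, 1)")
connection.execute("INSERT INTO Loan (BorrowerID, BookID) VALUES (3, 2)")
connection.commit()


def borrow(conn, borrower_id, book_id):
    """Record a loan, undoing it if the borrower ends up over LOAN_LIMIT."""
    conn.execute("INSERT INTO Loan (BorrowerID, BookID) VALUES (?, ?)",
                 (borrower_id, book_id))
    count = conn.execute("SELECT COUNT(*) FROM Loan WHERE BorrowerID = ?",
                         (borrower_id,)).fetchone()[0]
    if count > LOAN_LIMIT:
        conn.rollback()  # discard the INSERT made above
        return False
    conn.commit()        # keep the INSERT
    return True


first = borrow(connection, 1, 5)   # borrower 1 has no other loans
second = borrow(connection, 3, 4)  # borrower 3 is already at the limit
total = connection.execute("SELECT COUNT(*) FROM Loan").fetchone()[0]
print(first, second, total)
connection.close()
```

Only the successful loan is saved, so the table ends up with three records.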
Warning: Starting with Python 3.6, commands that control the structure of the database, such
as CREATE TABLE and DROP TABLE, do not open a transaction and will generally take effect
immediately. This means that, by default, it is not possible to roll back such changes
automatically.
Parameter Substitution
Program 6: delete_example.py
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4
5 # Insert some rows first so we have something to delete
6 connection.execute("INSERT INTO Book(ID, Title) " +
7 "VALUES(4, 'Extra Book')")
8 connection.execute("INSERT INTO Book(ID, Title) " +
9 "VALUES(5, 'Also Extra Book')")
10 connection.commit()
11
12 # Ask for ID and delete the corresponding row
13 book_id = input("Enter Book ID to delete: ")
14 connection.execute("DELETE FROM Book WHERE ID = ?", (book_id,))
15 connection.commit()
16
17 connection.close()
We often need to include some data that are provided by the user. For instance, we may want
the user to enter the ID of a book to delete from the database. This requires us to generate a
DELETE statement with the entered ID in its WHERE clause.
We may be tempted to use string concatenation to generate the required SQL statement.
Unfortunately, this is insecure as special characters or keywords in the user's input are not
escaped, thus malicious users can use this loophole to inject their own SQL statements.
We should use parameter substitution to safely include data that is provided by the user. To
do this, we use the question-mark character ? as placeholders for any data provided by the
user. We then provide a second argument to execute() that is a tuple of values to fill in the
placeholders.
Parameter substitution follows the same order in which the placeholders appear in the SQL
statement. This is illustrated by the following diagram:
execute("DELETE FROM Book WHERE ID > ? AND ID < ?", (2, 4))
Quick Check
As mentioned previously, the following string concatenation is not safe.
Suggest an input for book_id that will delete all the rows in the Book table.
1 or 1
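This answer can be verified with a short sketch that compares the two approaches. The unsafe statement is assumed to be built as "DELETE FROM Book WHERE ID = " + book_id, the helper functions make_books() and count_books() are hypothetical, and the three sample records are made up for illustration.

```python
import sqlite3


def make_books():
    """Return a fresh in-memory database with three sample books."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT)")
    conn.execute("INSERT INTO Book VALUES (1, 'Example Book'), "
                 "(2, 'Extra Book'), (3, 'Also Extra Book')")
    conn.commit()
    return conn


def count_books(conn):
    return conn.execute("SELECT COUNT(*) FROM Book").fetchone()[0]


book_id = "1 or 1"  # malicious input typed in by the user

# Unsafe: the input becomes part of the SQL text, so the condition
# reads "ID = 1 or 1", which is true for every record.
unsafe = make_books()
unsafe.execute("DELETE FROM Book WHERE ID = " + book_id)
unsafe_left = count_books(unsafe)

# Safe: the whole input is treated as one value to compare with ID,
# and no ID is equal to the text '1 or 1'.
safe = make_books()
safe.execute("DELETE FROM Book WHERE ID = ?", (book_id,))
safe_left = count_books(safe)

print(unsafe_left, safe_left)
unsafe.close()
safe.close()
```

With concatenation every record is deleted, while with parameter substitution all three records remain.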
As we have already learned, the SELECT command is used to select data from the database.
When we run a SELECT command in DB Browser for SQLite, the selected rows are usually
displayed in a table.
In Python, however, we must access the selected rows using a Cursor object that is returned
by the execute() method. This cursor can go through the selected rows, one by one, using
either a for loop or the fetchone() method. Each iteration returns a tuple of the columns
in the current row.
The two programs below print out all the book titles in the Book table.
Program 7: forloop_example.py
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 cursor = connection.execute("SELECT ID, Title FROM Book")
5 for row in cursor:
6     print(row[1])  # Title is the second item in the tuple
7 connection.close()
Program 8: fetchone_example.py
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 cursor = connection.execute("SELECT ID, Title FROM Book")
5 row = cursor.fetchone()
6 while row is not None:
7     print(row[1])  # Title is the second item in the tuple
8     row = cursor.fetchone()
9 connection.close()
The fetchone() method (Program 8 line 5) will advance the cursor to the next row, so calling
it repeatedly will iterate through the selected rows until the cursor reaches the end and returns
None.
Program 9: fetchall_example.py
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 cursor = connection.execute("SELECT ID, Title FROM Book")
5 rows = cursor.fetchall()
6 for row in rows:
7     print(row[1])  # Title is the second item in the tuple
8 connection.close()
Alternatively, instead of going through the rows one by one using a cursor, we may wish to
fetch all the rows at once and keep them in a list.
The fetchall() method (line 5) returns a list of tuples with each tuple containing the
selected columns for a single row.
1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.row_factory = sqlite3.Row
5 cursor = connection.execute("SELECT ID, Title FROM Book")
6 for row in cursor:
7     print(row["Title"])  # row can now be indexed by column name
8 connection.close()
Yet another alternative is to configure the SQLite connection so that each row is retrieved as
a dictionary-like object that maps column names to field values instead. To do this, we set the
connection object's row_factory attribute to the built-in sqlite3.Row class (line 4). This
lets us change the ordering of columns in the SELECT statement without having to modify the
code for extracting individual column values.
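The behaviour of sqlite3.Row can be explored with the sketch below, which uses a temporary in-memory database and a single sample record. The columns are deliberately selected in a different order from the earlier programs.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row
connection.execute("CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT)")
connection.execute("INSERT INTO Book(ID, Title) VALUES(0, 'Example Book')")

# Title is now the first selected column instead of the second.
row = connection.execute("SELECT Title, ID FROM Book").fetchone()

by_name = row["Title"]  # access by column name still works
by_index = row[0]       # access by position is also supported
columns = row.keys()    # names of the selected columns, in order
as_dict = dict(row)     # convert to a real dictionary if one is needed

print(by_name, by_index, columns, as_dict)
connection.close()
```

Code that uses row["Title"] keeps working after the column order changes, whereas code that uses row[1] would now print the ID.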
Quick Check
Refer to Program 10.
The SQL statement on line 5 is replaced with one of the following options. Which option
would cause an error on line 7 when the program is run?
sqlite3 Module Summary
close()                     Closes (but does not save changes to) SQLite file
execute(sql, values_tuple)  Runs the given SQL statement (first argument) after
                            substituting question mark(s) with the corresponding
                            value(s) in the given tuple (second argument) and
                            returns a Cursor object