Relational Database SQL

This document provides an overview of relational databases, focusing on SQL, and explains key concepts such as tables, records, fields, keys (primary, candidate, secondary, composite, and foreign), data redundancy, and normalization (1NF, 2NF, 3NF). It illustrates these concepts with examples related to student records and emphasizes the importance of organizing data to reduce redundancy and maintain consistency. Additionally, it introduces entity-relationship diagrams as a method for modeling database entities and their relationships.

Uploaded by

Darren Siow

2021 JC2 H2 Computing 9569

24. Relational Database: SQL


Introduction

Imagine a situation where you are part of the school administration team in the olden days,
having to manage hundreds and thousands of physical files of staff and student records in
multiple cabinets.

What are some of the issues that you may encounter?

With the advancement of technology, we no longer have to keep and manage physical records.

A database is a collection of data stored in an organised or logical manner. Storing data in a
database allows us to access and manage the data. Some real-life examples of databases
include student records, supermarket inventories and contact lists.

In general, there are two types of databases: relational and non-relational. In this chapter,
we shall look into the former.

A relational (SQL) database is a collection of relational tables with a fixed schema, which is
the precise description of the data to be stored and the relationships between them. In this
model, the data are stored in relational tables and represented in the form of tuples as follows.

<TableName>(<Field1>, <Field2>, …)

Attributes of Relational Database

A table is a two-dimensional representation of data stored in rows and columns. Each table is
made up of records and fields.

Below is an example of a table called StudentMD10, showing data of students from an
imaginary form class MD10.

RegNo Name Gender MobileNo


1 Adam M 92313291
2 Adrian M 92585955
3 Agnes F 83324112
4 Aisha F 88851896
5 Ajay M 94191061
6 Alex M 98675171
7 Alice F 95029176
8 Amy F 98640883
9 Andrew M 95172444
10 Andy M 95888639

A record is a complete set of data about a single entity in the table. In the table above, there
are 10 records, each referring to the complete set of data of a particular student.

A field or column refers to one type of data about the entities in the table. In the table above,
there are 4 fields: RegNo, Name, Gender and MobileNo.

Quick Check
Express the table StudentMD10 using the tuple representation introduced earlier.

StudentMD10(RegNo, Name, Gender, MobileNo)

Keys in Relational Database

A candidate key is a minimal set of fields that can uniquely identify each record in a table. It
should never be empty.

A primary key is a candidate key that is most appropriate to become the main key for a table.
It uniquely identifies each record in a table and should not change over time. That is, a primary
key tells a particular record apart from another record.

Quick Check
Which of the fields in the table StudentMD10 is a suitable primary key?

RegNo

A secondary key is a candidate key that is not selected as a primary key.

A composite key is a combination of two or more fields in a table that can be used to uniquely
identify each record in a table. Uniqueness is only guaranteed when the fields are combined.
When taken individually, the fields do not guarantee uniqueness.

Quick Check
A table called StudentMD1011 is shown below.

RegNo Name Gender FormClass


1 Adam M MD10
2 Adrian M MD10
3 Agnes F MD10
4 Aisha F MD10
5 Ajay M MD10
6 Alex M MD10
7 Alice F MD10
8 Amy F MD10
9 Andrew M MD10
10 Andy M MD10
1 Adam M MD11
2 Bala M MD11
3 Bee Lay F MD11
4 Ben M MD11
5 Boon Kiat M MD11
6 Boon Lim M MD11
7 Chee Seng M MD11
8 Colin M MD11
9 Daniel M MD11
10 Eleanor F MD11

Which two fields form the composite key for the table? RegNo and FormClass
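The claim that uniqueness is only guaranteed when the fields are combined can be checked mechanically. The following is a minimal sketch in Python using four rows copied from StudentMD1011; the helper name `is_unique` is hypothetical. Note that such a check can only show that a set of fields fails to be unique in the sample data. Choosing a key remains a design decision.

```python
# Four rows from StudentMD1011: (RegNo, Name, Gender, FormClass)
rows = [
    (1, "Adam", "M", "MD10"),
    (2, "Adrian", "M", "MD10"),
    (1, "Adam", "M", "MD11"),
    (2, "Bala", "M", "MD11"),
]
FIELDS = ("RegNo", "Name", "Gender", "FormClass")


def is_unique(rows, field_names):
    """Return True if the chosen fields, taken together, differ for every row."""
    positions = [FIELDS.index(name) for name in field_names]
    seen = set()
    for row in rows:
        key = tuple(row[p] for p in positions)
        if key in seen:
            return False
        seen.add(key)
    return True


regno_alone = is_unique(rows, ["RegNo"])
formclass_alone = is_unique(rows, ["FormClass"])
combined = is_unique(rows, ["RegNo", "FormClass"])
print(regno_alone, formclass_alone, combined)  # False False True
```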

A foreign key is a field in one table that refers to the primary key in another table.

To illustrate this concept, take a look at another table below called ClassInfo with
FormClass chosen to be the primary key.

FormClass FormTutor BaseClass


MD10 Peter Lim F3.1
MD11 Susan Tan F3.2

Notice that the primary key (PK) in the table ClassInfo is related or linked to the FormClass
field in table StudentMD1011. This makes FormClass in the table StudentMD1011 a foreign
key (FK).
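The link between the two tables can be tried out with Python's built-in sqlite3 module, which these notes introduce later. The sketch below is only an illustration: it uses a temporary in-memory database, the column types are assumptions, and the form class 'MD99' is a made-up value that does not exist in ClassInfo. The PRAGMA statement is a SQLite-specific detail, since SQLite does not enforce foreign keys unless asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # temporary database, nothing is saved to disk
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("""
    CREATE TABLE ClassInfo (
        FormClass TEXT PRIMARY KEY,
        FormTutor TEXT,
        BaseClass TEXT
    )
""")
conn.execute("""
    CREATE TABLE StudentMD1011 (
        RegNo INTEGER,
        Name TEXT,
        Gender TEXT,
        FormClass TEXT,
        PRIMARY KEY (RegNo, FormClass),
        FOREIGN KEY (FormClass) REFERENCES ClassInfo(FormClass)
    )
""")
conn.execute("INSERT INTO ClassInfo VALUES ('MD10', 'Peter Lim', 'F3.1')")
conn.execute("INSERT INTO StudentMD1011 VALUES (1, 'Adam', 'M', 'MD10')")  # accepted

try:
    # 'MD99' (hypothetical) is not a FormClass in ClassInfo, so the row is rejected
    conn.execute("INSERT INTO StudentMD1011 VALUES (2, 'Adrian', 'M', 'MD99')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM StudentMD1011").fetchone()[0]
print(rejected, student_count)  # True 1
conn.close()
```

The foreign key therefore guarantees that every student record refers to a form class that actually exists.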

Data Redundancy

Data redundancy refers to the same data being stored more than once.

Take a look at the table below.

RegNo Name Gender FormClass FormTutor


1 Adam M MD10 Peter Lim
2 Adrian M MD10 Peter Lim
3 Agnes F MD10 Peter Lim
4 Aisha F MD10 Peter Lim
5 Ajay M MD10 Peter Lim
6 Alex M MD10 Peter Lim
7 Alice F MD10 Peter Lim
8 Amy F MD10 Peter Lim
9 Andrew M MD10 Peter Lim
10 Andy M MD10 Peter Lim

As we can see, the data for FormClass and FormTutor are repeated for students who are
in the same form class. This may lead to problems when inserting, updating and deleting
data:

Insertion: A new student cannot be inserted unless a form class and a form tutor have been
assigned.
Update: Should Mr Peter Lim leave the school, every record in the table would need to
be updated. Missing any record would leave the data inconsistent.
Deletion: Should all the records in the table be deleted, the information on the form class and
form tutor would be lost as well.

Data Dependency

Suppose we have the following table:

Student(MatricNo, Name, Gender, FormClass, FormTutor)

MatricNo is a unique number assigned to every student in the college.

Functional dependency

Attribute Y is functionally dependent on attribute X (usually the primary key), if for every
valid instance of X, the value of X uniquely determines the value of Y, i.e. X → Y.

MatricNo uniquely identifies Name because if we know the MatricNo, we can know the
Name associated with it. Therefore, we can say Name is functionally dependent on MatricNo,
i.e.

MatricNo → Name

Transitive dependency

A functional dependency is said to be transitive if it is indirectly formed by two functional
dependencies. Z is transitively dependent on X if Y is functionally dependent on X, but X is
not functionally dependent on Y, and Z is functionally dependent on Y. In other words, X → Z
is a transitive dependency if the following hold true:
• X → Y
• Y does not determine X (Y ↛ X)
• Y → Z

FormClass is functionally dependent on MatricNo, but MatricNo is not functionally
dependent on FormClass, i.e.

MatricNo → FormClass

On the other hand, FormTutor is functionally dependent on FormClass, i.e.

FormClass → FormTutor

Therefore, we can conclude that FormTutor is transitively dependent on MatricNo, i.e.

MatricNo → FormTutor
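A functional dependency can be tested against sample data: X → Y fails as soon as one value of X appears with two different values of Y. The sketch below assumes three sample rows of the Student table and a hypothetical helper called `determines`. Sample data can only disprove a dependency. Whether it truly holds depends on the rules of the real-world situation.

```python
# Sample rows of Student(MatricNo, Name, Gender, FormClass, FormTutor)
students = [
    {"MatricNo": 1, "Name": "Adam", "Gender": "M", "FormClass": "MD10", "FormTutor": "Peter Lim"},
    {"MatricNo": 2, "Name": "Adrian", "Gender": "M", "FormClass": "MD10", "FormTutor": "Peter Lim"},
    {"MatricNo": 3, "Name": "Adam", "Gender": "M", "FormClass": "MD11", "FormTutor": "Susan Tan"},
]


def determines(rows, x, y):
    """Return True if, in these rows, each value of x maps to exactly one value of y."""
    mapping = {}
    for row in rows:
        # setdefault stores the first y seen for this x, then returns the stored value
        if mapping.setdefault(row[x], row[y]) != row[y]:
            return False
    return True


print(determines(students, "MatricNo", "FormClass"))   # True
print(determines(students, "FormClass", "MatricNo"))   # False: MD10 has MatricNo 1 and 2
print(determines(students, "FormClass", "FormTutor"))  # True
print(determines(students, "Name", "FormClass"))       # False: two students are named Adam
```

The first three results match the three conditions for the transitive dependency of FormTutor on MatricNo.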

Normalisation

Normalisation is the process of organising the tables in a database to reduce data
redundancy and prevent inconsistent data. There are at least three normal forms:
• first normal form (1NF)
• second normal form (2NF)
• third normal form (3NF)

First Normal Form (1NF)

For a table to be in 1NF, all columns must be atomic, i.e. the information cannot be broken
down further.

Consider the following table.

MatricNo  Name     Gender  FormClass  FormTutor  BaseClass  CCAInfo
1         Adam     M       MD10       Peter Lim  F3.1       Tennis (Teacher IC = Adrian Tan)
2         Adrian   M       MD10       Peter Lim  F3.1       Choir (Teacher IC = Sanjay Vittal),
                                                            Art Club (Teacher IC = Nur Fauziah)
3         Adam     M       MD11       Susan Tan  F3.2       Rugby (Teacher IC = Zoe Lim)
4         Bala     M       MD11       Susan Tan  F3.2       Tech Council (Teacher IC = Lilian Phua)
5         Bee Lay  F       MD11       Susan Tan  F3.2       Choir (Teacher IC = Sanjay Vittal),
                                                            Chess (Teacher IC = Edison Poh)

For this example, assume that every form class has only one form tutor, and each CCA has
only one teacher IC.

The table above is not in 1NF because the CCAInfo column contains multiple values.

In order for the table to be in 1NF, we can split CCAInfo into two single-value columns:
CCAName and CCATeacherIC. Notice that the students with MatricNo 2 and 5 have multiple
CCAs. We keep this information intact by splitting their records into multiple records, each
corresponding to a different CCA. The resulting table is shown below.

MatricNo  Name     Gender  FormClass  FormTutor  BaseClass  CCAName       CCATeacherIC
1         Adam     M       MD10       Peter Lim  F3.1       Tennis        Adrian Tan
2         Adrian   M       MD10       Peter Lim  F3.1       Choir         Sanjay Vittal
2         Adrian   M       MD10       Peter Lim  F3.1       Art Club      Nur Fauziah
3         Adam     M       MD11       Susan Tan  F3.2       Rugby         Zoe Lim
4         Bala     M       MD11       Susan Tan  F3.2       Tech Council  Lilian Phua
5         Bee Lay  F       MD11       Susan Tan  F3.2       Choir         Sanjay Vittal
5         Bee Lay  F       MD11       Susan Tan  F3.2       Chess         Edison Poh

The values for CCAName and CCATeacherIC are now atomic for each record.

The primary key for the above table shall be the composite key formed by MatricNo and
CCAName.
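The splitting step can be sketched in Python. Here the multi-valued CCAInfo field is modelled as a list of (CCAName, CCATeacherIC) pairs, which is an assumption made purely for illustration, and only three of the students are included.

```python
# (MatricNo, Name, CCAInfo), where CCAInfo holds one or more (CCAName, CCATeacherIC) pairs
unnormalised = [
    (1, "Adam", [("Tennis", "Adrian Tan")]),
    (2, "Adrian", [("Choir", "Sanjay Vittal"), ("Art Club", "Nur Fauziah")]),
    (5, "Bee Lay", [("Choir", "Sanjay Vittal"), ("Chess", "Edison Poh")]),
]

# One output record per (student, CCA) pair, so every field holds a single value
first_normal_form = [
    (matric_no, name, cca_name, teacher_ic)
    for matric_no, name, cca_info in unnormalised
    for cca_name, teacher_ic in cca_info
]
for record in first_normal_form:
    print(record)

# MatricNo alone now repeats, but (MatricNo, CCAName) is unique for every record
keys = [(matric_no, cca_name) for matric_no, _, cca_name, _ in first_normal_form]
print(len(keys) == len(set(keys)))  # True
```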

Second Normal Form (2NF)

For a table to be in 2NF, it must satisfy two conditions:

• The table should already be in 1NF.
• Every non-key attribute must be fully dependent on the entire primary key. This means
  no attribute can depend on part of the primary key only.

In the 1NF table above, Name, Gender, FormClass, FormTutor and BaseClass are dependent
on only part of the primary key, MatricNo.

CCATeacherIC, on the other hand, is dependent only on CCAName.

Thus, we decompose the 1NF table into three tables shown below.

Student
MatricNo  Name     Gender  FormClass  FormTutor  BaseClass
1         Adam     M       MD10       Peter Lim  F3.1
2         Adrian   M       MD10       Peter Lim  F3.1
3         Adam     M       MD11       Susan Tan  F3.2
4         Bala     M       MD11       Susan Tan  F3.2
5         Bee Lay  F       MD11       Susan Tan  F3.2

StudentCCA
MatricNo  CCAName
1         Tennis
2         Choir
2         Art Club
3         Rugby
4         Tech Council
5         Choir
5         Chess

CCAInfo
CCAName       CCATeacherIC
Tennis        Adrian Tan
Choir         Sanjay Vittal
Art Club      Nur Fauziah
Rugby         Zoe Lim
Tech Council  Lilian Phua
Chess         Edison Poh

Quick Check
What should be the primary or composite key for each of the three tables above?

The primary key for table Student should be MatricNo.

The composite key for table StudentCCA should be MatricNo and CCAName.

The primary key for table CCAInfo should be CCAName.
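The decomposition amounts to keeping selected columns of the 1NF table and dropping repeated rows, so that each key value appears only once. A minimal sketch in Python, assuming the 1NF rows are held as tuples and using a hypothetical helper called `project`:

```python
# Rows of the 1NF table:
# (MatricNo, Name, Gender, FormClass, FormTutor, BaseClass, CCAName, CCATeacherIC)
first_nf = [
    (1, "Adam", "M", "MD10", "Peter Lim", "F3.1", "Tennis", "Adrian Tan"),
    (2, "Adrian", "M", "MD10", "Peter Lim", "F3.1", "Choir", "Sanjay Vittal"),
    (2, "Adrian", "M", "MD10", "Peter Lim", "F3.1", "Art Club", "Nur Fauziah"),
    (3, "Adam", "M", "MD11", "Susan Tan", "F3.2", "Rugby", "Zoe Lim"),
    (4, "Bala", "M", "MD11", "Susan Tan", "F3.2", "Tech Council", "Lilian Phua"),
    (5, "Bee Lay", "F", "MD11", "Susan Tan", "F3.2", "Choir", "Sanjay Vittal"),
    (5, "Bee Lay", "F", "MD11", "Susan Tan", "F3.2", "Chess", "Edison Poh"),
]


def project(rows, positions):
    """Keep only the chosen columns and drop repeated rows, preserving order."""
    result = []
    for row in rows:
        projected = tuple(row[p] for p in positions)
        if projected not in result:
            result.append(projected)
    return result


student = project(first_nf, [0, 1, 2, 3, 4, 5])  # attributes that depend on MatricNo only
student_cca = project(first_nf, [0, 6])          # the composite key itself
cca_info = project(first_nf, [6, 7])             # attributes that depend on CCAName only

print(len(student), len(student_cca), len(cca_info))  # 5 7 6
```

Each student and each CCA is now described exactly once, while StudentCCA records which student takes which CCA.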

Third Normal Form (3NF)

For a table to be in 3NF, it must satisfy two conditions:

• The table should already be in 2NF.
• The table should not have transitive dependencies.

Quick Check
Explain the transitive dependency found in the Student table.

FormTutor and BaseClass are dependent on FormClass and FormClass is dependent
on MatricNo. Therefore, FormTutor and BaseClass are transitively dependent on
MatricNo.

To remove the transitive dependency, we decompose the 2NF Student table into two tables
shown below.

Student
MatricNo  Name     Gender  FormClass
1         Adam     M       MD10
2         Adrian   M       MD10
3         Adam     M       MD11
4         Bala     M       MD11
5         Bee Lay  F       MD11

FormInfo
FormClass  FormTutor  BaseClass
MD10       Peter Lim  F3.1
MD11       Susan Tan  F3.2

MatricNo remains the primary key for the Student table.


FormClass shall be the primary key of the newly formed table called FormInfo.

The final design after normalisation is represented below.

Student(MatricNo, Name, Gender, FormClass)

FormInfo(FormClass, FormTutor, BaseClass)

StudentCCA(MatricNo, CCAName)

CCAInfo(CCAName, CCATeacherIC)

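The primary key for each table is indicated by underlining one or more attributes: MatricNo
in Student, FormClass in FormInfo, MatricNo and CCAName together in StudentCCA, and
CCAName in CCAInfo.

Each foreign key is indicated by using a dashed underline: FormClass in Student (referring
to FormInfo), MatricNo in StudentCCA (referring to Student) and CCAName in StudentCCA
(referring to CCAInfo).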
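As a rough illustration, the final design can be written as SQL table definitions and run through Python's built-in sqlite3 module, both of which are covered later in these notes. The column types are assumptions, only a few sample records are inserted, and the PRAGMA statement is a SQLite-specific switch that turns on foreign key checking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # temporary in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the foreign keys
conn.executescript("""
    CREATE TABLE FormInfo (
        FormClass TEXT PRIMARY KEY,
        FormTutor TEXT,
        BaseClass TEXT
    );
    CREATE TABLE Student (
        MatricNo INTEGER PRIMARY KEY,
        Name TEXT,
        Gender TEXT,
        FormClass TEXT,
        FOREIGN KEY (FormClass) REFERENCES FormInfo(FormClass)
    );
    CREATE TABLE CCAInfo (
        CCAName TEXT PRIMARY KEY,
        CCATeacherIC TEXT
    );
    CREATE TABLE StudentCCA (
        MatricNo INTEGER,
        CCAName TEXT,
        PRIMARY KEY (MatricNo, CCAName),
        FOREIGN KEY (MatricNo) REFERENCES Student(MatricNo),
        FOREIGN KEY (CCAName) REFERENCES CCAInfo(CCAName)
    );
""")
conn.executemany("INSERT INTO FormInfo VALUES (?, ?, ?)",
                 [("MD10", "Peter Lim", "F3.1"), ("MD11", "Susan Tan", "F3.2")])
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)",
                 [(1, "Adam", "M", "MD10"), (2, "Adrian", "M", "MD10")])
conn.executemany("INSERT INTO CCAInfo VALUES (?, ?)",
                 [("Tennis", "Adrian Tan"), ("Choir", "Sanjay Vittal"),
                  ("Art Club", "Nur Fauziah")])
conn.executemany("INSERT INTO StudentCCA VALUES (?, ?)",
                 [(1, "Tennis"), (2, "Choir"), (2, "Art Club")])

# Each form tutor is stored once per class, yet can still be looked up for every student
tutors = conn.execute("""
    SELECT Student.Name, FormInfo.FormTutor
    FROM Student, FormInfo
    WHERE Student.FormClass = FormInfo.FormClass
    ORDER BY Student.MatricNo
""").fetchall()
print(tutors)  # [('Adam', 'Peter Lim'), ('Adrian', 'Peter Lim')]
conn.close()
```

If the form tutor of MD10 changes, only one record in FormInfo needs to be updated, which avoids the update problem described under data redundancy.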

Note:
In the H2 Computing 9569 syllabus, candidates are required to reduce data redundancy to
3NF only. Nevertheless, going through 1NF and 2NF may help in some situations.

Entity-Relationship (E-R) Diagram

An entity-relationship (E-R) diagram is a data modelling technique that illustrates the entities
of a database and the relationships among those entities. It is useful in the planning of the
design of relational databases.

For the purpose of the syllabus, we shall only cover a simplified convention for the drawing of
E-R diagrams using crow’s foot notation.

An entity is a specific object of interest. Nouns are usually used to name entities. Entities are
represented by rectangles.

e.g. Student

A relationship describes the link between two entities. One of the following relationships can
exist between two entities:

• one-to-one

[Figure: Entity 1 and Entity 2 joined by a one-to-one line]

For example, at a concert with reserved seating, each ticket entitles someone to a
particular seat and each seat is linked to only one ticket.

[Figure: Ticket and Seat joined by a one-to-one line]

• one-to-many

[Figure: Entity 1 and Entity 2 joined by a one-to-many line]

For example, a form class can have many students, but a student can belong to only
one form class.

[Figure: Form Class (one) joined to Student (many)]

• many-to-many

[Figure: Entity 1 and Entity 2 joined by a many-to-many line]

For example, a CCA can have many students, and a student can join many CCAs.

[Figure: Student (many) joined to CCAInfo (many)]

To implement a many-to-many relationship in a relational database, we usually
decompose a many-to-many relationship into two (or more) one-to-many relationships.

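e.g.

[Figure: Student (one) joined to StudentCCA (many), and CCAInfo (one) joined to StudentCCA (many)]

Other symbols used to describe relationships include:

[Figure: additional crow's foot symbols]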

Quick Check
Refer to the following normalised tables covered earlier.

Student(MatricNo, Name, Gender, FormClass)

FormInfo(FormClass, FormTutor, BaseClass)

StudentCCA(MatricNo, CCAName)

CCAInfo(CCAName, CCATeacherIC)

Draw an E-R diagram to model the simple school database described above.

[Figure: E-R diagram. FormInfo (one) to Student (many); Student (one) to StudentCCA (many);
CCAInfo (one) to StudentCCA (many)]

Quick Check
A school library contains books that can be on loan to borrowers.

• A borrower can take one or more loans.
• Each loan record belongs to only one borrower.
• A book can be loaned many times.
• A publisher publishes one or more books.
• A book can be published by zero or one publisher
  (e.g. exam papers and lecture notes are not published by an official publishing house).

Draw an E-R diagram to model the school library database described above.

[Figure: E-R diagram. Borrower (one) to Loan (many); Book (one) to Loan (many);
Publisher (zero or one) to Book (one or more)]

Structured Query Language (SQL)

Structured Query Language (SQL) is a standard computer language for the operation and
management of relational databases. It is a language used to query, insert, update and delete
data.

SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of
the International Organisation for Standardisation (ISO) in 1987. Since then, the standard has
been updated several times. Most major relational databases support this standard, but have
their own proprietary extensions.

There are many types of SQL database engines. A database engine is the software that a
database management system (DBMS) uses to create, read, update and delete (CRUD) data
from a database.

We are going to use SQLite, a widely used database engine, for the purpose of the syllabus.
It is a popular choice as embedded database software for local/client storage in application
software, such as web browsers.

Python’s standard library comes with a built-in module for SQLite called sqlite3.

To visualise the databases that we are going to encounter throughout the course of this study,
we shall make use of DB Browser for SQLite.

Database Operations

In industry-based database applications, all four categories of SQL commands listed below
are required.

• Data Definition Language (DDL) defines database schemas.
• Data Manipulation Language (DML) is used to retrieve and modify data.
• Data Control Language (DCL) is used to control access to a database.
• Transaction Control Language (TCL) is used to manage changes to a database,
  usually at transactional level.

SQL Commands

DDL       DML           DCL     TCL
CREATE    SELECT        GRANT   COMMIT
ALTER     INSERT        REVOKE  SAVEPOINT
DROP      UPDATE                ROLLBACK
RENAME    DELETE
TRUNCATE  MERGE
COMMENT   CALL
          EXPLAIN PLAN
          LOCK TABLE

Some of the more advanced commands under DCL and TCL are more relevant to industry-
specific roles, such as database administrators.

For the purpose of our learning, we only need to be able to understand and apply these basic
CRUD database operations:

Operation SQL Command


CREATE INSERT
READ SELECT
UPDATE UPDATE
DELETE DELETE

SQL Data Types

Each field in an SQL table has to be associated with one data type. The following table shows
some of the common data types.

Data Type  SQL Syntax  Description
String     CHAR(x)     Fixed-length characters (x can be from 1 to 255)
           VARCHAR(x)  Variable-length characters (x can be from 1 to 65535)
           TEXT        Equivalent to VARCHAR(65535)
Numeric    INTEGER     Integers
           REAL        Real numbers
Boolean    BOOL        True or False

Note that SQLite accepts these declarations but does not enforce the declared lengths.

Creating and Manipulating SQL Database

Refer to the school library database that we discussed earlier.

Open sql_lecture.db in DB Browser for SQLite. Three tables (Book, Publisher and
Unused, which shall be deleted later on) have been defined.

The summary of the tables required in this particular database, together with the fields and
their constraints, are shown below.

Borrower
Field Data Type Constraint
BorrowerID Numeric PRIMARY KEY, AUTOINCREMENT
FirstName String NOT NULL
Surname String NOT NULL
ContactNum Numeric

Loan
Field Data Type Constraint
LoanID Numeric PRIMARY KEY, AUTOINCREMENT
BorrowerID Numeric FOREIGN KEY to BorrowerID in Borrower table
BookID Numeric FOREIGN KEY to BookID in Book table
DateBorrowed String (Desired format: YYYYMMDD)

Book
Field Data Type Constraint
BookID Numeric PRIMARY KEY, AUTOINCREMENT
BookTitle String NOT NULL
PublisherID Numeric FOREIGN KEY to PublisherID in Publisher table
Damaged Numeric NOT NULL
(0 means undamaged, 1 means damaged)

Publisher
Field Data Type Constraint
PublisherID Numeric PRIMARY KEY, AUTOINCREMENT
PublisherName String NOT NULL

DDL: CREATE

The CREATE command allows us to make a new table.

CREATE TABLE <table_name> (
    <column1_name COLUMN1_TYPE COLUMN1_CONSTRAINT(S)>,
    <column2_name COLUMN2_TYPE COLUMN2_CONSTRAINT(S)>,
    …
    PRIMARY KEY (<column1_name>, <column2_name>, …),
    FOREIGN KEY (<column_name>) REFERENCES <table_name>(<column_name>)
)

The field constraints that we need to know are as follows:

• PRIMARY KEY
• FOREIGN KEY … REFERENCES …
• NOT NULL
  A value must be inserted into the field.
• UNIQUE
  No two records can have the same value in the field.
• AUTOINCREMENT
  When no value is specified, the database automatically assigns the next integer (+1).

The following SQL statements, separated by a semi-colon, create the Borrower and Loan
tables respectively in the database.

CREATE TABLE Borrower (
    BorrowerID INTEGER PRIMARY KEY AUTOINCREMENT,
    FirstName VARCHAR(30) NOT NULL,
    Surname VARCHAR(30) NOT NULL,
    ContactNum INTEGER
);

CREATE TABLE Loan (
    LoanID INTEGER PRIMARY KEY AUTOINCREMENT,
    BorrowerID INTEGER NOT NULL,
    BookID INTEGER NOT NULL,
    DateBorrowed VARCHAR(30) NOT NULL,
    FOREIGN KEY (BorrowerID) REFERENCES Borrower(BorrowerID),
    FOREIGN KEY (BookID) REFERENCES Book(BookID)
)
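To see the constraints at work, the Borrower definition from the summary of tables can be run from Python using the built-in sqlite3 module (covered later in these notes). This is a minimal sketch that uses a temporary in-memory database instead of sql_lecture.db. The query on sqlite_master, SQLite's internal catalogue of tables, is included only to confirm that the table exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, so sql_lecture.db is untouched
conn.execute("""
    CREATE TABLE Borrower (
        BorrowerID INTEGER PRIMARY KEY AUTOINCREMENT,
        FirstName VARCHAR(30) NOT NULL,
        Surname VARCHAR(30) NOT NULL,
        ContactNum INTEGER
    )
""")

# List the user-defined tables (names starting with sqlite_ are internal to SQLite)
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]

# ContactNum has no NOT NULL constraint, so it may be left out
conn.execute("INSERT INTO Borrower(FirstName, Surname) VALUES ('Some', 'User')")

# Surname is NOT NULL, so leaving it out is rejected
try:
    conn.execute("INSERT INTO Borrower(FirstName) VALUES ('Peter')")
    not_null_enforced = False
except sqlite3.IntegrityError:
    not_null_enforced = True

print(table_names, not_null_enforced)  # ['Borrower'] True
conn.close()
```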

DDL: DROP

The DROP command allows us to delete an entire table and all the records inside.

DROP TABLE <table_name>

e.g. DROP TABLE Unused

DML: INSERT

The INSERT command allows us to insert a new record in a table.

INSERT INTO <table_name>(<column1_name, column2_name, …>)
VALUES (<column1_value, column2_value, …>)

Refer to the Publisher table below.

PublisherID PublisherName
1 NPH
2 Unpop
3 Appleson
4 Squirrel
5 Yellow Flame

e.g. INSERT INTO Publisher VALUES (6, 'BigBooks')
OR
INSERT INTO Publisher(PublisherName) VALUES ('BigBooks')

Either statement inserts a new publisher named 'BigBooks' with PublisherID = 6. It is not
necessary to specify PublisherID in this case since it is incremented automatically.
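The automatic numbering can be observed from Python with the built-in sqlite3 module (covered later in these notes). The sketch below rebuilds the Publisher table in a temporary in-memory database, with TEXT assumed as the type of PublisherName, and then inserts 'BigBooks' without giving a PublisherID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Publisher (
        PublisherID INTEGER PRIMARY KEY AUTOINCREMENT,
        PublisherName TEXT NOT NULL
    )
""")
# The five existing publishers receive PublisherID 1 to 5 in order of insertion
for name in ["NPH", "Unpop", "Appleson", "Squirrel", "Yellow Flame"]:
    conn.execute("INSERT INTO Publisher(PublisherName) VALUES (?)", (name,))

# PublisherID is omitted, so the database assigns the next value
cursor = conn.execute("INSERT INTO Publisher(PublisherName) VALUES ('BigBooks')")
new_id = cursor.lastrowid  # the ID given to the record just inserted
conn.commit()

publishers = conn.execute("SELECT * FROM Publisher ORDER BY PublisherID").fetchall()
print(new_id)          # 6
print(publishers[-1])  # (6, 'BigBooks')
conn.close()
```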

As a quick exercise, insert the following records into the Borrower and Loan tables.

Borrower
BorrowerID FirstName Surname ContactNum
1 Peter Tan 999
2 Sarah Lee 81111123
3 Kumara Ravi 94456677
4 Some User

Loan
LoanID BorrowerID BookID DateBorrowed
1 3 2 20190220
2 3 1 20181215
3 2 3 20181231
4 1 5 20190111

DML: SELECT

The SELECT command allows us to retrieve data from the database.

SELECT <column1_name, column2_name, …>
FROM <table_name>
WHERE <condition(s)>
ORDER BY <column_name> ASC/DESC

Refer to the Book table below.

BookID  BookTitle                     PublisherID  Damaged
1       The Lone Gatsby               5            0
2       A Winter’s Slumber            4            1
3       Life of Pie                   4            0
4       A Brief History of Primates   3            0
5       To Praise a Mocking Bird      2            0
6       The Catcher in the Eye        1            1
7       H2 Computing Ten Year Series  NULL         0

To select all fields from a table, we use *.

e.g. SELECT * FROM Book

To select only one or a subset of fields, we use the field names separated by commas.

e.g. SELECT BookTitle FROM Book


SELECT BookID, BookTitle FROM Book

To select only rows meeting certain conditions, we use WHERE.

e.g. SELECT BookTitle FROM Book WHERE Damaged = 1
This statement returns the titles of all the damaged books.

SELECT * FROM Book WHERE PublisherID IS NOT NULL
This statement returns all the books that have a PublisherID.

SELECT * FROM Book WHERE PublisherID = 4 AND Damaged = 0
This statement returns all the undamaged books published by the publisher with
PublisherID 4.

To order the selected records according to some field values in ascending or descending order,
we use ORDER BY … ASC/DESC.

e.g. SELECT BookID, BookTitle FROM Book ORDER BY PublisherID ASC
This statement returns all the book IDs and titles arranged in ascending order of
PublisherID.
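The SELECT statements above can be tried out from Python using the built-in sqlite3 module (covered later in these notes). The sketch below rebuilds the Book table in a temporary in-memory database with assumed column types. An ORDER BY clause is added to each query so that the order of the results is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Book (
        BookID INTEGER PRIMARY KEY AUTOINCREMENT,
        BookTitle TEXT NOT NULL,
        PublisherID INTEGER,
        Damaged INTEGER NOT NULL
    )
""")
books = [
    (1, "The Lone Gatsby", 5, 0),
    (2, "A Winter’s Slumber", 4, 1),
    (3, "Life of Pie", 4, 0),
    (4, "A Brief History of Primates", 3, 0),
    (5, "To Praise a Mocking Bird", 2, 0),
    (6, "The Catcher in the Eye", 1, 1),
    (7, "H2 Computing Ten Year Series", None, 0),  # None is stored as NULL
]
conn.executemany("INSERT INTO Book VALUES (?, ?, ?, ?)", books)

damaged = conn.execute(
    "SELECT BookTitle FROM Book WHERE Damaged = 1 ORDER BY BookID").fetchall()
undamaged_from_4 = conn.execute(
    "SELECT BookTitle FROM Book WHERE PublisherID = 4 AND Damaged = 0").fetchall()
by_publisher = conn.execute(
    "SELECT BookID FROM Book WHERE PublisherID IS NOT NULL "
    "ORDER BY PublisherID ASC, BookID ASC").fetchall()

print(damaged)           # [('A Winter’s Slumber',), ('The Catcher in the Eye',)]
print(undamaged_from_4)  # [('Life of Pie',)]
print(by_publisher)      # [(6,), (5,), (4,), (2,), (3,), (1,)]
conn.close()
```

Each record is returned as a tuple, even when only one field is selected.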

DML: UPDATE

The UPDATE command allows us to edit the data values in a database. One or more records
may be updated at the same time.

UPDATE <table_name>
SET <column1_name = column1_value, column2_name = column2_value, …>
WHERE <condition(s)>

e.g. UPDATE Book SET Damaged = 1
WHERE BookTitle = 'To Praise a Mocking Bird'
This statement updates the condition of the book titled 'To Praise a Mocking Bird' to
damaged.

UPDATE Book SET BookTitle = 'Book: ' || BookTitle
As there is no WHERE clause, this statement updates the BookTitle of every record such
that each book title now starts with 'Book: '. Note the use of || for string concatenation.
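The difference between an UPDATE with and without WHERE can be checked from Python using the built-in sqlite3 module (covered later in these notes). This is a minimal sketch on a temporary in-memory copy of two Book records. The `rowcount` attribute of the returned cursor reports how many records a statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (BookID INTEGER PRIMARY KEY, BookTitle TEXT NOT NULL, "
             "PublisherID INTEGER, Damaged INTEGER NOT NULL)")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?, ?)", [
    (3, "Life of Pie", 4, 0),
    (5, "To Praise a Mocking Bird", 2, 0),
])

# With WHERE: only the matching record changes
cursor = conn.execute(
    "UPDATE Book SET Damaged = 1 WHERE BookTitle = 'To Praise a Mocking Bird'")
rows_changed = cursor.rowcount

# Without WHERE: every record changes
conn.execute("UPDATE Book SET BookTitle = 'Book: ' || BookTitle")
conn.commit()

result = conn.execute("SELECT BookTitle, Damaged FROM Book ORDER BY BookID").fetchall()
print(rows_changed)  # 1
print(result)        # [('Book: Life of Pie', 0), ('Book: To Praise a Mocking Bird', 1)]
conn.close()
```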

DML: DELETE

The DELETE command allows us to delete existing records in a table.

DELETE FROM <table_name>


WHERE <condition(s)>

e.g. DELETE FROM Publisher WHERE PublisherID = 6


This statement deletes the record having PublisherID = 6.

DELETE FROM Publisher


This statement deletes all the records in the Publisher table.

Quick Check
For the Book table, write an SQL statement to insert an undamaged book titled “Eleventh
Night” with BookID no. 8 and PublisherID no. 2

INSERT INTO Book


VALUES (8, 'Eleventh Night', 2, 0)

For the Book table, write an SQL statement to update the condition of the book titled
“Eleventh Night” to damaged.

UPDATE Book
SET Damaged = 1
WHERE BookTitle = 'Eleventh Night'

For the Book table, write an SQL statement to retrieve the titles of all the damaged books
that have publishers.

SELECT BookTitle FROM Book


WHERE PublisherID IS NOT NULL AND Damaged = 1

For the Borrower table, write an SQL statement to delete all the records without contact
numbers.

DELETE FROM Borrower


WHERE ContactNum IS NULL

What is the difference between the two commands below?

DROP TABLE Table1

DELETE FROM Table2

DROP TABLE deletes the table and all the records inside. Since the table has been deleted,
it is no longer possible to add records into Table1.

DELETE FROM does not delete the table, but only all the records inside. That means it is
possible to add records again into Table2.
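The difference can be demonstrated from Python using the built-in sqlite3 module (covered later in these notes). In this minimal sketch, Table1 and Table2 are throwaway tables with a single assumed column called Value, created in a temporary in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Value INTEGER)")
conn.execute("CREATE TABLE Table2 (Value INTEGER)")
conn.execute("INSERT INTO Table1 VALUES (1)")
conn.execute("INSERT INTO Table2 VALUES (1)")
conn.commit()

conn.execute("DROP TABLE Table1")   # removes the table itself
conn.execute("DELETE FROM Table2")  # removes only the records
conn.commit()

# Table2 still exists (empty), so a new record can be inserted
conn.execute("INSERT INTO Table2 VALUES (2)")
table2_rows = conn.execute("SELECT Value FROM Table2").fetchall()

# Table1 no longer exists, so inserting into it raises an error
try:
    conn.execute("INSERT INTO Table1 VALUES (2)")
    table1_exists = True
except sqlite3.OperationalError:
    table1_exists = False

print(table2_rows, table1_exists)  # [(2,)] False
conn.close()
```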

JOIN

The JOIN command allows us to combine data from two tables.

Listing two tables in the FROM clause without any condition returns the Cartesian product of
their rows, i.e. it combines each row in the first table with each row in the second table. An
inner join keeps only those combinations that satisfy a join condition.

For example, suppose we wish to check the name of the publisher of each of the books in the
library database. As a first attempt, we can write the following SQL statement.

SELECT * FROM Book, Publisher

BookID BookTitle PublisherID Damaged PublisherID PublisherName


1 The Lone Gatsby 5 0 1 NPH
1 The Lone Gatsby 5 0 2 Unpop
1 The Lone Gatsby 5 0 3 Appleson
1 The Lone Gatsby 5 0 4 Squirrel
1 The Lone Gatsby 5 0 5 Yellow Flame
2 A Winter’s Slumber 4 1 1 NPH
2 A Winter’s Slumber 4 1 2 Unpop
2 A Winter’s Slumber 4 1 3 Appleson
2 A Winter’s Slumber 4 1 4 Squirrel
2 A Winter’s Slumber 4 1 5 Yellow Flame
3 Life of Pie 4 0 1 NPH
3 Life of Pie 4 0 2 Unpop
3 Life of Pie 4 0 3 Appleson
3 Life of Pie 4 0 4 Squirrel
3 Life of Pie 4 0 5 Yellow Flame
4 A Brief History Of Primates 3 0 1 NPH
4 A Brief History Of Primates 3 0 2 Unpop
4 A Brief History Of Primates 3 0 3 Appleson
4 A Brief History Of Primates 3 0 4 Squirrel
4 A Brief History Of Primates 3 0 5 Yellow Flame
5 To Praise a Mocking Bird 2 0 1 NPH
5 To Praise a Mocking Bird 2 0 2 Unpop
5 To Praise a Mocking Bird 2 0 3 Appleson
5 To Praise a Mocking Bird 2 0 4 Squirrel
5 To Praise a Mocking Bird 2 0 5 Yellow Flame
6 The Catcher in the Eye 1 1 1 NPH
6 The Catcher in the Eye 1 1 2 Unpop
6 The Catcher in the Eye 1 1 3 Appleson
6 The Catcher in the Eye 1 1 4 Squirrel
6 The Catcher in the Eye 1 1 5 Yellow Flame
7 H2 Computing Ten Year Series 0 1 NPH
7 H2 Computing Ten Year Series 0 2 Unpop
7 H2 Computing Ten Year Series 0 3 Appleson
7 H2 Computing Ten Year Series 0 4 Squirrel
7 H2 Computing Ten Year Series 0 5 Yellow Flame

The resulting table is large, and most of its records pair a book with the wrong publisher, i.e.
the two PublisherID values do not match. In order to retrieve only the useful records, we can
add a condition as follows.

SELECT * FROM Book, Publisher


WHERE Book.PublisherID = Publisher.PublisherID

BookID BookTitle PublisherID Damaged PublisherID PublisherName


1 The Lone Gatsby 5 0 5 Yellow Flame
2 A Winter’s Slumber 4 1 4 Squirrel
3 Life of Pie 4 0 4 Squirrel
4 A Brief History Of Primates 3 0 3 Appleson
5 To Praise a Mocking Bird 2 0 2 Unpop
6 The Catcher in the Eye 1 1 1 NPH

The table above is more meaningful as it links the book titles to the correct publishers.
However, notice that H2 Computing Ten Year Series has been omitted as it has no
PublisherID.

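In such a case, we need to use a left outer join, which returns all the records from the first
(left) table together with the matching records from the second table. Where a record in the
left table has no match, the fields from the second table are left empty (NULL).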

SELECT <column1_name, column2_name, …>


FROM <Table_A>
INNER / LEFT OUTER JOIN <Table_B>
ON <condition(s)>

SELECT * FROM Book


LEFT OUTER JOIN Publisher
ON Book.PublisherID = Publisher.PublisherID

BookID BookTitle PublisherID Damaged PublisherID PublisherName


1 The Lone Gatsby 5 0 5 Yellow Flame
2 A Winter’s Slumber 4 1 4 Squirrel
3 Life of Pie 4 0 4 Squirrel
4 A Brief History Of Primates 3 0 3 Appleson
5 To Praise a Mocking Bird 2 0 2 Unpop
6 The Catcher in the Eye 1 1 1 NPH
7 H2 Computing Ten Year Series 0
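The three behaviours (no condition, inner join and left outer join) can be compared side by side from Python using the built-in sqlite3 module (covered later in these notes). This is a minimal sketch on a cut-down copy of the data, with three books and two publishers in a temporary in-memory database and assumed column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publisher (PublisherID INTEGER PRIMARY KEY, PublisherName TEXT)")
conn.execute("CREATE TABLE Book (BookID INTEGER PRIMARY KEY, BookTitle TEXT, "
             "PublisherID INTEGER, Damaged INTEGER)")
conn.executemany("INSERT INTO Publisher VALUES (?, ?)",
                 [(4, "Squirrel"), (5, "Yellow Flame")])
conn.executemany("INSERT INTO Book VALUES (?, ?, ?, ?)", [
    (1, "The Lone Gatsby", 5, 0),
    (3, "Life of Pie", 4, 0),
    (7, "H2 Computing Ten Year Series", None, 0),  # no publisher
])

# No condition: every book is paired with every publisher
cross_count = conn.execute("SELECT COUNT(*) FROM Book, Publisher").fetchone()[0]

inner = conn.execute("""
    SELECT BookTitle, PublisherName FROM Book
    INNER JOIN Publisher ON Book.PublisherID = Publisher.PublisherID
    ORDER BY BookID
""").fetchall()

left = conn.execute("""
    SELECT BookTitle, PublisherName FROM Book
    LEFT OUTER JOIN Publisher ON Book.PublisherID = Publisher.PublisherID
    ORDER BY BookID
""").fetchall()

print(cross_count)  # 6 (3 books x 2 publishers)
print(inner)        # [('The Lone Gatsby', 'Yellow Flame'), ('Life of Pie', 'Squirrel')]
print(left[-1])     # ('H2 Computing Ten Year Series', None)
conn.close()
```

The NULL publisher name of the last book is returned to Python as None.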

Quick Check
Write an SQL statement to retrieve the titles of all the books that are not damaged with their
publisher names.

SELECT BookTitle, PublisherName FROM Book, Publisher


WHERE Book.PublisherID = Publisher.PublisherID AND Book.Damaged = 0

AGGREGATE FUNCTIONS

There are a few aggregate functions that we can use in SQL statements to calculate results
from a given database:
• MIN (minimum value)
• MAX (maximum value)
• SUM (sum of all values)
• COUNT (number of values)

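As an illustration, the four aggregate functions can be applied to a few records of the Book table from Python, using the built-in sqlite3 module covered later in these notes. The sketch below uses a temporary in-memory database with assumed column types. Note that COUNT(*) counts records, whereas COUNT applied to a field skips NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (BookID INTEGER PRIMARY KEY, BookTitle TEXT, "
             "PublisherID INTEGER, Damaged INTEGER)")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?, ?)", [
    (1, "The Lone Gatsby", 5, 0),
    (2, "A Winter’s Slumber", 4, 1),
    (6, "The Catcher in the Eye", 1, 1),
    (7, "H2 Computing Ten Year Series", None, 0),  # NULL PublisherID
])

summary = conn.execute(
    "SELECT MIN(BookID), MAX(BookID), SUM(Damaged), COUNT(*), COUNT(PublisherID) "
    "FROM Book").fetchone()
print(summary)  # (1, 7, 2, 4, 3)

# Aggregate functions can be combined with WHERE
damaged_count = conn.execute(
    "SELECT COUNT(*) FROM Book WHERE Damaged = 1").fetchone()[0]
print(damaged_count)  # 2
conn.close()
```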
OPERATORS

We have seen some operators being used in the examples earlier. These operators are often
used in the SELECT statements, but can be used in other statements like UPDATE. The
following are the three types of operators that we are expected to know.

Comparison Operators

=    <    >
!=   <=   >=

Logical Operators

OR    IS        ||  (string concatenation)
AND   IS NOT

Arithmetic Operators

+    *    %
-    /

Python and SQLite

DB Browser for SQLite is a convenient program for us to experiment with SQL statements and
examine the results. However, it is not an appropriate program to use if we want to customise
or restrict how the contents of a database are modified or presented.

Suppose we have a database that stores information about the books in a library. We should
not use DB Browser for SQLite for users to search the database as not everyone is familiar
with SQL statements. That aside, malicious users may run harmful statements, e.g. DROP
TABLE to delete the database.

As such, a developer typically writes a custom program to control how users interact with a
database, which has an interface that is easy to understand and use. Based on the users’
inputs, the program would then generate the appropriate SQL statements in the background
and run them to produce the intended results. In this way, the users are prevented from
modifying the database directly.

We shall learn how to write Python programs that can interact with SQLite databases using
the built-in sqlite3 module.

Quick Check
Which of the following is not a valid reason why DB Browser for SQLite should not be
accessible to the users of a public library?

A Users may use the program to insert fake data into the database.

B Users may use the program to drop tables from the database.

C Users may use the program to perform a query that returns nothing.

D Users may not know how to perform the query using the program.

Loading a Database

Program 1: load_example.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.close()

The connect() function (line 3) takes in a string that contains the path and filename of a
database file and returns a Connection object. If no path is included, the file is assumed to
be in the current working directory, which is usually the directory containing the Python file.
Furthermore, if the specified file does not exist, an empty file will be created with the given
filename instead.

After all operations with the database are complete, the close() method (line 4) of the
Connection object should then be called. This ensures that the database file is closed
properly. Note that close() does not save any uncommitted modifications that have been
made to the data.

Executing SQL Statements

Program 2: create_example.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.execute("CREATE TABLE Book " +
5                    "(ID INTEGER PRIMARY KEY, Title TEXT)")
6 connection.commit()
7 connection.close()

The execute() method (line 4) takes in a string containing the SQL statement we wish to
run.

The commit() method (line 6) saves the change(s) made to the database.

After running the program above, we can use DB Browser for SQLite to check that a table
called Book has indeed been created.

However, if we try to run the program again, we will get the following error:

Traceback (most recent call last):
  File "create_example.py", line 5, in <module>
    "(ID INTEGER PRIMARY KEY, Title TEXT)")
sqlite3.OperationalError: table Book already exists

This demonstrates that calling execute() is just like running regular SQL statements in the
"Execute SQL" tab of DB Browser for SQLite. Any errors caused by running SQL statements
are reported as Python exceptions.
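Since the error is an ordinary Python exception, it can be caught with try and except. The sketch below is a minimal illustration that uses a temporary in-memory database in place of library.db. It also shows CREATE TABLE IF NOT EXISTS, a form supported by SQLite that goes beyond the syntax covered in these notes and makes the statement safe to repeat.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # in-memory stand-in for library.db
create_sql = "CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT)"
connection.execute(create_sql)

# Running the same statement again raises an exception, which we catch
try:
    connection.execute(create_sql)
    error_message = None
except sqlite3.OperationalError as error:
    error_message = str(error)
print(error_message)  # table Book already exists

# With IF NOT EXISTS, the statement does nothing when the table is already there
connection.execute("CREATE TABLE IF NOT EXISTS Book (ID INTEGER PRIMARY KEY, Title TEXT)")
connection.close()
```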

Committing Changes and Rolling Back

Program 3: insert_example_incomplete.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.execute("INSERT INTO Book(ID, Title) " +
5                    "VALUES(0, 'Example Book')")
6 connection.close()

The program above runs with no errors. However, if we open library.db using DB Browser
for SQLite, we can see that the inserted data is missing from the Book table.

A transaction is a unit of work that is performed against a database. Using an INSERT,
UPDATE or DELETE command opens a transaction that can either be committed or rolled
back. As Program 3 closes the connection without committing, its INSERT is discarded.

Program 4: insert_example.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.execute("INSERT INTO Book(ID, Title) " +
5                    "VALUES(0, 'Example Book')")
6 connection.commit()
7 connection.close()

With a call to commit() added on line 6, the data are inserted and saved correctly.

Program 5: rollback_example.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4
5 connection.execute("INSERT INTO Book(ID, Title) " +
6 "VALUES(1, 'Rollback Book')")
7 connection.execute("INSERT INTO Book(ID, Title) " +
8 "VALUES(2, 'Also Rollback Book')")
9 connection.rollback()
10
11 connection.execute("INSERT INTO Book(ID, Title) " +
12 "VALUES(3, 'Committed Book')")
13 connection.commit()
14
15 connection.close()

The rollback() method (line 9) discards any changes done by the preceding SQL
statements. In the example shown above, the first two INSERT statements are rolled back so
that they have no effect on the database. On the other hand, the last INSERT statement is
committed so it does affect the database.

This behaviour of SQLite is useful as sometimes we may wish to discard any modifications
made since the current transaction was opened. For instance, in our library example, we may
start the process of placing a book on loan, but discover partway that the borrower has
already reached their limit of borrowed books. We can discard all the changes made since the
transaction was opened by calling the Connection object's rollback() method.
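
The loan scenario might look like the sketch below. The Loan table, its columns, the LOAN_LIMIT constant and the lend_book() function are all hypothetical and are not part of the library database used elsewhere in these notes. Values are passed to the SQL statements using ? placeholders (parameter substitution).

```python
import sqlite3

LOAN_LIMIT = 2  # hypothetical maximum number of loans per borrower

def lend_book(connection, borrower, book_id):
    """Record a loan, rolling back if the borrower exceeds LOAN_LIMIT."""
    # This INSERT opens a transaction.
    connection.execute("INSERT INTO Loan(Borrower, BookID) VALUES(?, ?)",
                       (borrower, book_id))
    cursor = connection.execute(
        "SELECT COUNT(*) FROM Loan WHERE Borrower = ?", (borrower,))
    if cursor.fetchone()[0] > LOAN_LIMIT:
        connection.rollback()  # discard the INSERT above
        return False
    connection.commit()        # keep the INSERT
    return True

# Assumption: an in-memory database with a made-up Loan table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Loan (Borrower TEXT, BookID INTEGER)")

results = [lend_book(connection, "Alice", book_id) for book_id in (1, 2, 3)]
loan_count = connection.execute("SELECT COUNT(*) FROM Loan").fetchone()[0]
connection.close()
print(results, loan_count)
```

The third call exceeds the limit, so its INSERT is rolled back and only two loans remain recorded.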

Warning: Starting with Python 3.6, commands that control the structure of the database, such
as CREATE TABLE and DROP TABLE, do not open a transaction and will generally take effect
immediately. This means that, by default, it is not possible to roll back such changes
automatically.
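
One way to observe this is through the Connection object's in_transaction attribute, which reports whether a transaction is currently open. The sketch below assumes Python 3.6 or later with the default connection settings and an in-memory database.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

connection.execute("CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT)")
after_create = connection.in_transaction   # CREATE TABLE opened no transaction

connection.execute("INSERT INTO Book(ID, Title) " +
                   "VALUES(1, 'Rollback Book')")
after_insert = connection.in_transaction   # INSERT opened a transaction

connection.rollback()

# The INSERT is undone, but the table itself is still there.
tables = connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
row_count = connection.execute("SELECT COUNT(*) FROM Book").fetchone()[0]
connection.close()
print(after_create, after_insert, tables, row_count)
```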

Parameter Substitution

Program 6: delete_example.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4
5 # Insert some rows first so we have something to delete
6 connection.execute("INSERT INTO Book(ID, Title) " +
7 "VALUES(4, 'Extra Book')")
8 connection.execute("INSERT INTO Book(ID, Title) " +
9 "VALUES(5, 'Also Extra Book')")
10 connection.commit()
11
12 # Ask for ID and delete the corresponding row
13 book_id = input("Enter Book ID to delete: ")
14 connection.execute("DELETE FROM Book WHERE ID = ?", (book_id,))
15 connection.commit()
16
17 connection.close()

We often need to include some data that are provided by the user. For instance, we may want
the user to enter the ID of a book to delete from the database. This requires us to generate a
DELETE statement with the entered ID in its WHERE clause.

We may be tempted to use string concatenation to generate the required SQL statement,

e.g. connection.execute("DELETE FROM Book WHERE ID = " + book_id)

Unfortunately, this is insecure because special characters or keywords in the user's input
are not escaped, so a malicious user can exploit this loophole to inject their own SQL
into the statement.

We should use parameter substitution to safely include data that is provided by the user. To
do this, we use the question-mark character ? as a placeholder for each piece of data provided
by the user. We then provide a second argument to execute() that is a tuple of values to fill
in the placeholders. Note that a tuple with a single value needs a trailing comma, as in
(book_id,) on line 14 of Program 6.

Parameter substitution follows the same order in which the placeholders appear in the SQL
statement. This is illustrated by the following diagram:

execute("DELETE FROM Book WHERE ID > ? AND ID < ?", (2, 4))

1st tuple item (2) replaces the 1st placeholder
2nd tuple item (4) replaces the 2nd placeholder
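
The statement above can be tried out with the following sketch, which assumes an in-memory database filled with five made-up books (IDs 1 to 5).

```python
import sqlite3

# Assumption: an in-memory database with five illustrative rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT)")
for book_id in range(1, 6):
    connection.execute("INSERT INTO Book(ID, Title) VALUES(?, ?)",
                       (book_id, "Book " + str(book_id)))
connection.commit()

# 2 fills the first ? and 4 fills the second ?,
# so only the row with ID 3 matches and is deleted.
connection.execute("DELETE FROM Book WHERE ID > ? AND ID < ?", (2, 4))
connection.commit()

remaining_ids = [row[0] for row in
                 connection.execute("SELECT ID FROM Book ORDER BY ID")]
connection.close()
print(remaining_ids)
```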

Quick Check
As mentioned previously, the following string concatenation is not safe.

connection.execute("DELETE FROM Book WHERE ID = " + book_id)

Suggest an input for book_id that will delete all the rows in the Book table.

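Answer: 1 or 1

The statement becomes DELETE FROM Book WHERE ID = 1 or 1, and its condition is true for
every row.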
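
The effect of such an input can be compared for both approaches. This sketch assumes an in-memory database with three made-up rows; make_library() and count_books() are hypothetical helpers written only for this comparison.

```python
import sqlite3

def make_library():
    """Create an in-memory database with three illustrative books."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE Book " +
                       "(ID INTEGER PRIMARY KEY, Title TEXT)")
    for number in (1, 2, 3):
        connection.execute("INSERT INTO Book(ID, Title) VALUES(?, ?)",
                           (number, "Book " + str(number)))
    connection.commit()
    return connection

def count_books(connection):
    return connection.execute("SELECT COUNT(*) FROM Book").fetchone()[0]

book_id = "1 or 1"  # input typed by a malicious user

# String concatenation: the input becomes part of the SQL statement.
unsafe = make_library()
unsafe.execute("DELETE FROM Book WHERE ID = " + book_id)
unsafe_count = count_books(unsafe)
unsafe.close()

# Parameter substitution: the input is treated purely as a value,
# and no book has the text '1 or 1' as its ID.
safe = make_library()
safe.execute("DELETE FROM Book WHERE ID = ?", (book_id,))
safe_count = count_books(safe)
safe.close()

print(unsafe_count, safe_count)
```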

Retrieving Data from a Database

As we have already learned, the SELECT command is used to select data from the database.
When we run a SELECT command in DB Browser for SQLite, the selected rows are usually
displayed in a table.

In Python, however, we must access the selected rows using a Cursor object that is returned
by the execute() method. This cursor can go through the selected rows, one by one, using
either a for loop or the fetchone() method. Each iteration returns a tuple of the columns
in the current row.

The two programs below print out all the book titles in the Book table.

Program 7: forloop_example.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 cursor = connection.execute("SELECT ID, Title FROM Book")
5 for row in cursor:
6     print(row[1]) # Title is the second item in the tuple
7 connection.close()

Program 8: fetchone_example.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 cursor = connection.execute("SELECT ID, Title FROM Book")
5 row = cursor.fetchone()
6 while row is not None:
7     print(row[1]) # Title is the second item in the tuple
8     row = cursor.fetchone()
9 connection.close()

The fetchone() method (Program 8 line 5) will advance the cursor to the next row, so calling
it repeatedly will iterate through the selected rows until the cursor reaches the end and returns
None.
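
Since fetchone() returns None when there is no row left, it is also convenient for queries that are expected to match at most one row. The sketch below assumes an in-memory database with one sample book; find_title() is a hypothetical helper, not part of the sqlite3 module.

```python
import sqlite3

def find_title(connection, book_id):
    """Return the title of the book with the given ID, or None if absent."""
    cursor = connection.execute("SELECT Title FROM Book WHERE ID = ?",
                                (book_id,))
    row = cursor.fetchone()
    if row is None:      # no matching row
        return None
    return row[0]        # Title is the only selected column

# Assumption: an in-memory database with a single illustrative row.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT)")
connection.execute("INSERT INTO Book(ID, Title) VALUES(0, 'Example Book')")
connection.commit()

found = find_title(connection, 0)
missing = find_title(connection, 99)
connection.close()
print(found, missing)
```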

Program 9: fetchall_example.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 cursor = connection.execute("SELECT ID, Title FROM Book")
5 rows = cursor.fetchall()
6 for row in rows:
7     print(row[1]) # Title is the second item in the tuple
8 connection.close()

Alternatively, instead of going through the rows one by one using a cursor, we may wish to
fetch all the rows at once and keep them in a list.

The fetchall() method (line 5) returns a list of tuples with each tuple containing the
selected columns for a single row.
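
Keeping the rows in a list means they can be counted or reused after the cursor has been used up. A small sketch, assuming an in-memory database with two sample books:

```python
import sqlite3

# Assumption: an in-memory database with two illustrative rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT)")
connection.execute("INSERT INTO Book(ID, Title) VALUES(0, 'Example Book')")
connection.execute("INSERT INTO Book(ID, Title) VALUES(3, 'Committed Book')")
connection.commit()

cursor = connection.execute("SELECT ID, Title FROM Book ORDER BY ID")
rows = cursor.fetchall()       # list of (ID, Title) tuples
leftover = cursor.fetchall()   # the cursor has no rows left to give
connection.close()

# The list remains usable after the connection is closed.
number_of_books = len(rows)
titles = [row[1] for row in rows]
print(number_of_books, titles, leftover)
```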

Program 10: row_factory_example.py

1 import sqlite3
2
3 connection = sqlite3.connect("library.db")
4 connection.row_factory = sqlite3.Row
5 cursor = connection.execute("SELECT ID, Title FROM Book")
6 for row in cursor:
7     print(row["Title"]) # row is now a dictionary-like Row object
8 connection.close()

Yet another alternative is to configure the SQLite connection so that each row is retrieved as
a Row object, which behaves like a dictionary that maps column names to field values while
still allowing access by position. To do this, we set the connection object's row_factory
attribute to the built-in sqlite3.Row class (line 4). This lets us change the ordering of
columns in the SELECT statement without having to modify the code for extracting individual
column values.
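
A sqlite3.Row object supports lookup by column name as well as by position, and it can be converted to an ordinary dictionary with dict(). The sketch below assumes an in-memory database with one sample book.

```python
import sqlite3

# Assumption: an in-memory database with a single illustrative row.
connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row
connection.execute("CREATE TABLE Book (ID INTEGER PRIMARY KEY, Title TEXT)")
connection.execute("INSERT INTO Book(ID, Title) VALUES(0, 'Example Book')")
connection.commit()

# Title is deliberately selected first to show that
# lookup by name does not depend on column order.
row = connection.execute("SELECT Title, ID FROM Book").fetchone()
by_name = row["Title"]       # lookup by column name
by_index = row[0]            # lookup by position still works
column_names = row.keys()    # list of column names in SELECT order
as_dict = dict(row)          # convert to an ordinary dictionary
connection.close()
print(by_name, by_index, column_names, as_dict)
```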

Quick Check
Refer to Program 10.

The SQL statement on line 5 is replaced with one of the following options. Which option
would cause an error on line 7 when the program is run?

A SELECT * FROM Book

B SELECT ID FROM Book

C SELECT Title FROM Book

D SELECT Title, ID FROM Book

sqlite3 Module Summary

connect(filename)           Creates a Connection object using the SQLite file with
                            the given filename

Row                         Can be used as a Connection object’s row_factory so
                            that fetchone() returns a dictionary-like Row object
                            that maps column names to field values instead of
                            returning a tuple of values

Connection Class Summary

commit()                    Saves changes to (but does not close) SQLite file

close()                     Closes (but does not save changes to) SQLite file

execute(sql)                Runs the given SQL statement on the database and
                            returns a Cursor object

execute(sql, values_tuple)  Runs the given SQL statement (first argument) after
                            substituting question mark(s) with the corresponding
                            value(s) in the given tuple (second argument) and
                            returns a Cursor object

rollback()                  Undoes any changes made since the last call to
                            commit()

row_factory                 Can be set to Row so that fetchone() returns a
                            dictionary-like Row object that maps column names to
                            field values instead of returning a tuple of values

Cursor Class Summary

fetchone()                  Returns a tuple of values from the next row of the
                            query result, or None if there are no more rows (or a
                            dictionary-like Row object that maps column names to
                            field values if row_factory is set to Row)

fetchall()                  Calls fetchone() repeatedly until it returns None
                            and returns a list of the non-None results
