Introduction
• Why SQL?
• What about Python? R?
• Data Analytics
• Relational Database
• What is a database?
• Terminology
• SQLite
• Exercise 1
• SQL
• Data Definition Language (DDL)
• Exercise 2
• Data Manipulation Language (DML)
• Exercise 3
• Open Data Portal
• How I prepared for today
Why SQL? :
• Simple • Accessible • Applicable • Powerful • Pervasive • Valuable • Universal
Why not Python? R? :
• Difficult for beginners
• Complicated syntax
• Requires programming knowledge (logic, algorithms)
• Is SQL better than Python or R?
• SQL is good for some things
• Python/R is good for other things
• They complement each other
• SQL is a great starting point.
Data Analytics :
• Analytics is the discovery, interpretation, and communication of meaningful patterns in
data, and the process of applying those patterns towards effective decision making.
• Organizations may apply analytics to business data to describe, predict, and improve
business performance.
What is a database?
A relational “database” management system (RDBMS) organizes data.
• The logical structure of the database is based upon the information needs of an
organization.
• Entities (“things” of interest to the organization), and
• Relationships (how the entities are associated with each other).
Advantages of an RDBMS :
• Establishes a centralized, logical view of data
• Minimizes data duplication (i.e., “redundancy”)
• Promotes data accuracy and integrity
• Capacity of the database
• Superior multi-user or concurrent access
• Security
• Retrieve information quickly
• Inter-operability
Database Terminology:
• Table - Entity, Relation (similar to an Excel Worksheet)
• Row - Record, Instance
• Column - Field, Attribute
• Primary Key – unique and mandatory
• Foreign Key – a column that references the primary key of another table, creating a cross-reference between the two tables
• Relationship – created through foreign keys
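The terminology above can be sketched in a few lines. This is only an illustration: the TRAP and SAMPLE tables and their columns are hypothetical names, not part of the workshop database, and Python's built-in sqlite3 module is used here simply as a way to run the SQL outside the SQLite browser.

```python
import sqlite3

# Hypothetical two-table example: TRAP and SAMPLE are illustrative names,
# not tables from the workshop database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.execute("""
    CREATE TABLE TRAP (
        TRAP_ID   INTEGER PRIMARY KEY,   -- primary key: identifies each row
        TRAP_NAME TEXT
    )
""")
conn.execute("""
    CREATE TABLE SAMPLE (
        SAMPLE_ID INTEGER PRIMARY KEY,
        TRAP_ID   INTEGER REFERENCES TRAP (TRAP_ID),  -- foreign key to TRAP
        TOTAL     INTEGER
    )
""")

conn.execute("INSERT INTO TRAP (TRAP_ID, TRAP_NAME) VALUES (1, 'Lagoon')")
conn.execute("INSERT INTO SAMPLE (TRAP_ID, TOTAL) VALUES (1, 3)")

# A row that points at a trap that does not exist breaks the relationship.
try:
    conn.execute("INSERT INTO SAMPLE (TRAP_ID, TOTAL) VALUES (99, 5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

sample_count = conn.execute("SELECT COUNT(*) FROM SAMPLE").fetchone()[0]
print(rejected, sample_count)
```

The second INSERT into SAMPLE is refused because no row in TRAP has a TRAP_ID of 99, which is how the foreign key protects the relationship between the two tables.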
How to introduce SQL?
• Microsoft Access : https://products.office.com/en-ca/access
• Microsoft SQL Server : https://www.microsoft.com/en-us/sql-server/sql-server-2017
• MariaDB, MySQL : https://mariadb.org/ , https://www.mysql.com/
• PostgreSQL: https://www.postgresql.org/
• Oracle: https://www.oracle.com/database/
• Hadoop, Spark, Hive, Pig: https://hadoop.apache.org/
A database that :
• Has billions and billions of deployments
• Is a single-file database
• Has public domain source code
• Has a small footprint
• Has a max DB size of 140 terabytes
• Has a max row size of 1 gigabyte
• Is faster than direct file access
• Has aviation-grade quality and testing
• Requires zero configuration
• Has ACID (Atomic, Consistent, Isolated, and Durable) transactions, even after power loss
• Has a stable, enduring file format
• Has extensive, detailed documentation
• Has long-term support (to the year 2050)
SQLite :
• “SQLite is the most widely deployed database in the world with more applications than we
can count, including several high-profile projects.”
• https://www.sqlite.org/famous.html
• “SQLite is an in-process library that implements a self-contained, serverless, zero-
configuration, transactional SQL database engine.”
• https://www.sqlite.org/about.html
• Perfect for learning SQL (the foundation of data analytics)
Exercise 1: Download and Run SQLite :
• Extract the ZIP archive to the Desktop
• Start SQLite: SQLiteDatabaseBrowserPortable.exe
• Create a New database
• open_data_day_2019.db
• Save the database in the Data folder
• Click Cancel when prompted to create a table
• Done!
What is SQL?
• SQL stands for Structured Query Language.
• SQL is pronounced S-Q-L or sequel.
• SQL is a standard language for managing, manipulating, and
querying databases.
• Developed at IBM in the early 1970s.
• ANSI adopted the “Database Language SQL” standard in 1986, and ISO followed in 1987.
• Most SQL databases add their own proprietary extensions to the SQL standard.
• SQL is the language used to ask questions (queries) of a database, which will return
answers (results).
Why is SQL the foundation of Data Analytics?
• Data engineers and database administrators will use SQL to ensure that everybody in their
organization can access the needed data.
• Data scientists will use SQL to load data into their models.
• Data analysts will use SQL to query tables of data and derive insights from them.
Components of SQL :
• SQL consists of three components that offer everything required to manage, maintain,
and use a database
1. Data Definition Language
2. Data Manipulation Language
3. Data Control Language
Data Definition Language (DDL)
• This component is used to define the structure (or schema) of the database.
• For tables, there are three main commands.
• CREATE TABLE table_name
• To create a table in the database
• ALTER TABLE table_name
• To add or remove columns from a table in the database
• DROP TABLE table_name
• To remove a table from the database
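The three commands above can be tried in sequence on a throwaway table. The sketch below is an illustration only: the DEMO table and its columns are made-up names, and Python's built-in sqlite3 module stands in for the Execute SQL tab so the effect of each statement can be checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing is saved to disk


def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# CREATE TABLE: define a new table (DEMO is a hypothetical name).
conn.execute("CREATE TABLE DEMO (ID INTEGER PRIMARY KEY, NAME TEXT)")
after_create = column_names("DEMO")

# ALTER TABLE: add a column to the existing table.
conn.execute("ALTER TABLE DEMO ADD COLUMN TOTAL INTEGER")
after_alter = column_names("DEMO")

# DROP TABLE: remove the table and everything in it.
conn.execute("DROP TABLE DEMO")
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'DEMO'"
).fetchone()[0]

print(after_create, after_alter, remaining)
```

Each SQL string can also be pasted on its own into the Execute SQL tab.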
Exercise 2: Data Definition Language
• Select the Execute SQL tab in SQLite.
• Type or copy/paste the CREATE TABLE statement into the empty SQLite Execute SQL
window.
• Click the Execute SQL button on the toolbar.
• If the table is created successfully, you should receive the following message:
• Query executed successfully: CREATE TABLE "MOSQUITO_TRAP_DATA"
• Click Write Changes to commit the changes permanently.
• View the changes in the Database Structure tab.
CREATE TABLE "MOSQUITO_TRAP_DATA" (
`SAMPLEID` INTEGER PRIMARY KEY AUTOINCREMENT,
`TRAP_DATE` NUMERIC,
`GENUS` TEXT,
`SPECIES` TEXT,
`TYPE` TEXT,
`GENDER` TEXT);
• Select the Execute SQL tab in SQLite.
• Type or copy/paste the ALTER TABLE statements into the empty SQLite Execute SQL
window.
• Click the Execute SQL button on the toolbar.
• If the table is altered successfully, you should receive the following message:
• Query executed successfully: ALTER TABLE "MOSQUITO_TRAP_DATA"
• Click Write Changes to make the changes permanent.
• View the changes in the Database Structure tab.
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RURALNORTHWEST` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RURALNORTHEAST` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RURALSOUTHEAST` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RIVERVALLEYEAST` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RIVERVALLEYWEST` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RESIDENTIALNORTH` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RURALSOUTHWEST` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `LAGOON` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `GOLFCOURSE` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `INDUSTRIALPARK` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `RESIDENTIALSOUTH` INTEGER;
ALTER TABLE "MOSQUITO_TRAP_DATA" ADD COLUMN `TOTAL` INTEGER;
• Select the Execute SQL tab in SQLite
• Type or copy/paste the DROP TABLE statement into the empty SQLite Execute SQL window
• Click the Execute SQL button on the toolbar
• If the table is dropped successfully, you should receive the following message:
• Query executed successfully: DROP TABLE "MOSQUITO_TRAP_DATA"
• Click Write Changes to make the changes permanent
• View the changes in the Database Structure tab
DROP TABLE "MOSQUITO_TRAP_DATA";
• Create the MOSQUITO_TRAP_DATA table again using the DDL below.
• Click Write Changes to make the changes permanent.
• View the changes in the Database Structure tab
• Done!
CREATE TABLE "MOSQUITO_TRAP_DATA" (
`SAMPLEID` INTEGER PRIMARY KEY AUTOINCREMENT,
`TRAP_DATE` NUMERIC,
`GENUS` TEXT,
`SPECIES` TEXT,
`TYPE` TEXT,
`GENDER` TEXT,
`RURALNORTHWEST` INTEGER,
`RURALNORTHEAST` INTEGER,
`RURALSOUTHEAST` INTEGER,
`RIVERVALLEYEAST` INTEGER,
`RIVERVALLEYWEST` INTEGER,
`RESIDENTIALNORTH` INTEGER,
`RURALSOUTHWEST` INTEGER,
`LAGOON` INTEGER,
`GOLFCOURSE` INTEGER,
`INDUSTRIALPARK` INTEGER,
`RESIDENTIALSOUTH` INTEGER,
`TOTAL` INTEGER);
Data Manipulation Language
• This component is used to manipulate data within a table. There are four main commands:
• SELECT - To select rows of data from a table.
• INSERT - To insert rows of data into a table.
• UPDATE - To change rows of data in a table.
• DELETE - To remove rows of data from a table
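As a quick preview of the four commands before the exercises, the sketch below runs each one against a small throwaway table. The DEMO table is a hypothetical name used only for this illustration, and Python's built-in sqlite3 module is just a convenient way to run the SQL and inspect the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DEMO is a hypothetical table used only for this illustration.
conn.execute("CREATE TABLE DEMO (ID INTEGER PRIMARY KEY, SPECIES TEXT, TOTAL INTEGER)")

# INSERT: add rows of data.
conn.execute("INSERT INTO DEMO (SPECIES, TOTAL) VALUES ('vexans', 290)")
conn.execute("INSERT INTO DEMO (SPECIES, TOTAL) VALUES ('fitchii', 7)")

# SELECT: read rows back.
rows_after_insert = conn.execute(
    "SELECT SPECIES, TOTAL FROM DEMO ORDER BY ID"
).fetchall()

# UPDATE: change values in existing rows.
conn.execute("UPDATE DEMO SET TOTAL = 8 WHERE SPECIES = 'fitchii'")
updated_total = conn.execute(
    "SELECT TOTAL FROM DEMO WHERE SPECIES = 'fitchii'"
).fetchone()[0]

# DELETE: remove rows.
conn.execute("DELETE FROM DEMO WHERE SPECIES = 'vexans'")
rows_after_delete = conn.execute("SELECT SPECIES, TOTAL FROM DEMO").fetchall()

print(rows_after_insert, updated_total, rows_after_delete)
```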
SELECT Data Manipulation Language :
• Select the Execute SQL tab in SQLite.
• Type or copy/paste the SELECT statement into the empty SQLite Execute SQL window.
• SELECT COUNT(*) FROM MOSQUITO_TRAP_DATA;
• Click the Execute SQL button on the toolbar.
• Do you get an answer? Why not?
Exercise 3: INSERT Data Manipulation Language
• Add some data to the MOSQUITO_TRAP_DATA table created in Exercise 2
• Type or copy/paste the INSERT statement into the empty SQLite Execute SQL window
• Click the Execute SQL button on the toolbar
• Click Write Changes to make the changes permanent
• View the changes in the Browse Data tab
• The MOSQUITO_TRAP_DATA table now has seven rows of data.
INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER,
RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST,
RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE,
INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL)
VALUES ('2014-07-01','Aedes','spencerii','Black legs','Female',0,0,0,0,0,1,0,0,0,1,1,3);
INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER,
RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST,
RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE,
INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL)
VALUES ('2014-07-01','Aedes','dorsalis','Banded legs','Female',0,1,0,0,0,0,2,0,0,0,0,3);
INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER,
RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST,
RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE,
INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL)
VALUES ('2014-07-01','Aedes','euedes','Banded legs','Female',1,1,0,0,2,0,0,0,0,0,0,4);
INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER,
RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST,
RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE,
INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL)
VALUES ('2014-07-01','Aedes','excrucians','Banded legs','Female',1,2,0,0,2,1,0,0,0,1,0,7);
INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER,
RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST,
RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE,
INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL)
VALUES ('2014-07-01','Aedes','fitchii','Banded legs','Female',0,2,0,0,1,0,0,0,0,0,4,7);
INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER,
RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST,
RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE,
INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL)
VALUES ('2014-07-01','Aedes','flavescens','Banded legs','Female',6,5,8,0,0,0,5,0,0,3,1,28);
INSERT INTO "MOSQUITO_TRAP_DATA" (TRAP_DATE, GENUS, SPECIES, TYPE, GENDER,
RURALNORTHWEST, RURALNORTHEAST, RURALSOUTHEAST, RIVERVALLEYEAST,
RIVERVALLEYWEST, RESIDENTIALNORTH, RURALSOUTHWEST, LAGOON, GOLFCOURSE,
INDUSTRIALPARK, RESIDENTIALSOUTH, TOTAL)
VALUES ('2014-07-01','Aedes','vexans','Banded legs','Female',3,168,1,21,38,8,16,0,0,3,32,290);
• Type or copy/paste the SELECT statement into the empty SQLite Execute SQL window.
• SELECT COUNT(*) FROM MOSQUITO_TRAP_DATA;
• Click the Execute SQL button on the toolbar.
• When you execute the query, you ask the database a question.
• Can you tell me the number of rows in the MOSQUITO_TRAP_DATA table?
• The database gives you an answer (the result): the count should be 7.
What if you want to see all the rows in your table?
• SELECT * FROM MOSQUITO_TRAP_DATA;
• Returns all columns and rows in a table.
• You should receive the following message:
• 7 rows returned in 1ms from: SELECT * FROM MOSQUITO_TRAP_DATA;
• What if you only want to see each row's Genus, Species, and Total?
• SELECT GENUS, SPECIES, TOTAL FROM MOSQUITO_TRAP_DATA;
• Returns only the GENUS, SPECIES, and TOTAL columns for each row in a table.
Data Manipulation Language :
• The WHERE clause.
• Uses operators to extract only those records that fulfill a specified condition.
• Used to ask more complicated questions.
• SQL will do exactly what you ask, not always what you expect.
• “I do not think it means what you think it means.”
• Inigo Montoya.
• Show the rows that have a mosquito TYPE of “Black legs”
• SELECT * FROM MOSQUITO_TRAP_DATA WHERE TYPE = 'Black legs';
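To illustrate how WHERE conditions behave, including the point that SQL does exactly what you ask, here is a small sketch. It uses a cut-down, three-column version of MOSQUITO_TRAP_DATA (an assumption made to keep the example short) loaded with four of the workshop rows, and runs the SQL through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down version of the workshop table: only three columns, for illustration.
conn.execute("CREATE TABLE MOSQUITO_TRAP_DATA (SPECIES TEXT, TYPE TEXT, TOTAL INTEGER)")
conn.executemany(
    "INSERT INTO MOSQUITO_TRAP_DATA VALUES (?, ?, ?)",
    [
        ("spencerii", "Black legs", 3),
        ("dorsalis", "Banded legs", 3),
        ("flavescens", "Banded legs", 28),
        ("vexans", "Banded legs", 290),
    ],
)

# Equality: rows whose TYPE is exactly 'Black legs'.
black_legs = conn.execute(
    "SELECT SPECIES FROM MOSQUITO_TRAP_DATA WHERE TYPE = 'Black legs'"
).fetchall()

# Combining conditions with AND and a comparison operator.
large_banded = conn.execute(
    "SELECT SPECIES FROM MOSQUITO_TRAP_DATA "
    "WHERE TYPE = 'Banded legs' AND TOTAL > 10 ORDER BY TOTAL"
).fetchall()

# Text comparison with = is case-sensitive by default in SQLite,
# so a lowercase 'black legs' matches nothing.
wrong_case = conn.execute(
    "SELECT SPECIES FROM MOSQUITO_TRAP_DATA WHERE TYPE = 'black legs'"
).fetchall()

print(black_legs, large_banded, wrong_case)
```

The last query is valid SQL and runs without error; it simply returns no rows, which is a typical case of a query doing what was asked rather than what was meant.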
YOUR TURN :
• Write and execute a DML statement to answer the question below:
• Which mosquito species were caught in the traps placed in the west river valley?
UPDATE Data Manipulation Language :
• Select the Execute SQL tab in SQLite.
• Type or copy/paste the UPDATE statement into an empty SQLite Execute SQL window.
• Click the Execute SQL button on the toolbar.
• You should receive the following message:
• Query executed successfully: … (took 1ms, 4 rows affected)
UPDATE MOSQUITO_TRAP_DATA SET GENDER = 'Male' WHERE SAMPLEID IN (1,3,5,7);
• The GROUP BY clause - Used with the SELECT statement to arrange identical data into groups.
• The GROUP BY clause is often used with aggregate functions such as COUNT and SUM.
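A minimal sketch of GROUP BY with aggregate functions follows. It groups by TYPE rather than GENDER, so it does not answer the exercise question. The three-column table is a cut-down assumption, filled with four of the workshop rows, and the SQL is run through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down version of the workshop table: only three columns, for illustration.
conn.execute("CREATE TABLE MOSQUITO_TRAP_DATA (SPECIES TEXT, TYPE TEXT, TOTAL INTEGER)")
conn.executemany(
    "INSERT INTO MOSQUITO_TRAP_DATA VALUES (?, ?, ?)",
    [
        ("spencerii", "Black legs", 3),
        ("dorsalis", "Banded legs", 3),
        ("flavescens", "Banded legs", 28),
        ("vexans", "Banded legs", 290),
    ],
)

# One output row per distinct TYPE; COUNT and SUM are computed within each group.
totals_by_type = conn.execute(
    "SELECT TYPE, COUNT(*), SUM(TOTAL) "
    "FROM MOSQUITO_TRAP_DATA "
    "GROUP BY TYPE "
    "ORDER BY TYPE"
).fetchall()

print(totals_by_type)
```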
YOUR TURN
• Write and execute a DML statement to answer the question below:
• How many mosquitoes of each gender were caught in traps throughout the city?
• Select the Execute SQL tab in SQLite.
• Type or copy/paste the DELETE statement into an empty SQLite Execute SQL window.
• Click the Execute SQL button on the toolbar.
• You should receive the following message:
• Query executed successfully: … (took 0ms, 4 rows affected).
DELETE FROM MOSQUITO_TRAP_DATA WHERE GENDER = 'Male';
YOUR TURN
• Write and execute a DML statement to answer the question below:
• At which traps were more mosquitoes caught? Rural north east or rural north west?
• Done!
Advanced SQL
• The MOSQUITO database only has one table.
• Databases with more than one table require tables to be joined.
• Foreign keys create relationships between tables; the related tables must be joined in a DML statement.
• Download the LED Streetlight Conversion database called odd_streetlight.db.
• Execute the query below:
SELECT LED_STREETLIGHT.STREETLIGHT_ID, LED_STREETLIGHT.TYPE, LOCATION.LOCATION
FROM LED_STREETLIGHT, LOCATION
WHERE LED_STREETLIGHT.STREETLIGHT_ID = LOCATION.STREETLIGHT_ID
AND LED_STREETLIGHT.STREETLIGHT_ID = 12;
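The query above joins the two tables by listing both in FROM and matching the key columns in WHERE. The sketch below compares that form with an explicit JOIN ... ON, which is an alternative way to write the same join. The table and column names come from the query, but the table definitions and the two sample rows are invented for illustration and will not match the real odd_streetlight.db; Python's built-in sqlite3 module is used only to run the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified table definitions with made-up rows (not the real data).
conn.executescript("""
    CREATE TABLE LED_STREETLIGHT (STREETLIGHT_ID INTEGER PRIMARY KEY, TYPE TEXT);
    CREATE TABLE LOCATION (
        STREETLIGHT_ID INTEGER REFERENCES LED_STREETLIGHT (STREETLIGHT_ID),
        LOCATION TEXT
    );
    INSERT INTO LED_STREETLIGHT VALUES (12, 'LED'), (13, 'HPS');
    INSERT INTO LOCATION VALUES (12, 'Example Street'), (13, 'Sample Avenue');
""")

# Join written as in the workshop query: tables listed in FROM, keys matched in WHERE.
implicit_join = conn.execute("""
    SELECT LED_STREETLIGHT.STREETLIGHT_ID, LED_STREETLIGHT.TYPE, LOCATION.LOCATION
    FROM LED_STREETLIGHT, LOCATION
    WHERE LED_STREETLIGHT.STREETLIGHT_ID = LOCATION.STREETLIGHT_ID
      AND LED_STREETLIGHT.STREETLIGHT_ID = 12
""").fetchall()

# The same join with explicit JOIN ... ON syntax.
explicit_join = conn.execute("""
    SELECT LED_STREETLIGHT.STREETLIGHT_ID, LED_STREETLIGHT.TYPE, LOCATION.LOCATION
    FROM LED_STREETLIGHT
    JOIN LOCATION ON LED_STREETLIGHT.STREETLIGHT_ID = LOCATION.STREETLIGHT_ID
    WHERE LED_STREETLIGHT.STREETLIGHT_ID = 12
""").fetchall()

print(implicit_join, explicit_join)
```

Both forms return the same rows; the explicit JOIN keeps the join condition separate from the filter, which some readers find easier to follow.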
Using the Open Data Portal
• https://data.edmonton.ca/
• Data sets are usually available in comma-separated value (CSV) format.
• Using a dataset requires cleaning, importing, exploring, and understanding it.
• Workshop: Exploring & Cleaning Data with OpenRefine.
• Requires work.
Data Work Flow :
How I prepared the data sets for today:
• Selected data sets from the Open Data Portal.
• Downloaded the CSV and surveyed in Google Sheets.
• Cleaned the data set.
• E.g., reformatted dates from MMM DD YYYY to YYYY-MM-DD.
• Imported directly into SQLite tables.
• Added primary keys.
• Explored the dataset using DML.
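The steps above were done with Google Sheets and the SQLite browser. As one possible alternative, the sketch below does the date cleanup and the import in Python. It is an assumption-laden illustration: the inline CSV text, its column subset, and the second data row are invented, and the real portal export may be laid out differently.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Made-up CSV sample standing in for a downloaded file; the real export may differ.
raw_csv = io.StringIO(
    "TRAP_DATE,GENUS,SPECIES,TOTAL\n"
    "Jul 01 2014,Aedes,vexans,290\n"
    "Jul 08 2014,Aedes,fitchii,7\n"
)


def to_iso(text):
    # Reformat 'MMM DD YYYY' (e.g. 'Jul 01 2014') as 'YYYY-MM-DD'.
    return datetime.strptime(text, "%b %d %Y").strftime("%Y-%m-%d")


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MOSQUITO_TRAP_DATA (
        SAMPLEID  INTEGER PRIMARY KEY AUTOINCREMENT,  -- primary key added on import
        TRAP_DATE TEXT,
        GENUS     TEXT,
        SPECIES   TEXT,
        TOTAL     INTEGER
    )
""")

# Clean each row while importing it.
for row in csv.DictReader(raw_csv):
    conn.execute(
        "INSERT INTO MOSQUITO_TRAP_DATA (TRAP_DATE, GENUS, SPECIES, TOTAL) "
        "VALUES (?, ?, ?, ?)",
        (to_iso(row["TRAP_DATE"]), row["GENUS"], row["SPECIES"], int(row["TOTAL"])),
    )

dates = [r[0] for r in conn.execute(
    "SELECT TRAP_DATE FROM MOSQUITO_TRAP_DATA ORDER BY SAMPLEID"
)]

# Explore with DML: strftime only understands dates stored as YYYY-MM-DD.
by_year = conn.execute(
    "SELECT strftime('%Y', TRAP_DATE) AS YEAR, SUM(TOTAL) "
    "FROM MOSQUITO_TRAP_DATA GROUP BY YEAR"
).fetchall()

print(dates, by_year)
```

Storing dates as YYYY-MM-DD is what lets SQLite's strftime function pull out the year in the queries that follow.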
Some “Mosquito Trap Data” questions
• How many mosquitoes were caught in 2014?
SELECT strftime('%Y', TRAP_DATE) AS YEAR, SUM(TOTAL)
FROM MOSQUITO_TRAP_DATA
WHERE TOTAL <> '' AND TOTAL > 0
GROUP BY YEAR;
• How many mosquitoes of each species were caught?
• Which traps caught the most mosquitoes?
Some “LED Streetlight Conversion” questions
• How many total streetlights?
• How many streetlights are converted to LED?
• How many streetlights were converted by year?
SELECT strftime('%Y', STARTDATE) AS YEAR, TYPE, COUNT(STREETLIGHT_ID)
FROM LED_STREETLIGHT
WHERE TYPE = 'LED'
GROUP BY YEAR;
SQL and Climate Change
• Connecting and linking various data sets.
• Builds an understanding of what that data means.
• Data is a universal language, and climate change is a global problem.
Next steps
• Playing with data and SQL forces you to think and understand the data (builds knowledge)
• The relationships between data
• The meaning of those relationships
• The validity of the data
• SQL is iterative, often a “trial and error” process
• Don’t be afraid to make mistakes
• Team sport – discuss, share, question, collaborate
• Data is everywhere, which raises questions of privacy, security, and ethics