01 Python 03 SQL Basics
SQL Basics
Databases
SQL (1970-now):
SQLite
PostgreSQL
MySQL / MariaDB
Microsoft SQL Server
Oracle
IBM DB2
NoSQL (https://en.wikipedia.org/wiki/NoSQL) (2000-now):
Document (CouchDB, MongoDB, etc.)
Key-value (Couchbase, Dynamo, Redis, Riak, etc.)
Graph (Neo4j, etc.)
Relational Databases
A relational database is a digital database based on the relational model of data, as proposed
by E. F. Codd in 1970.
[...]
This model organizes data into one or more tables (or "relations") of columns and rows,
with a unique primary key identifying each row.
[...]
The primary keys within a database are used to define the relationships among the tables.
When a PK is used in another table, it is named a foreign key. This design pattern can
represent either a one-to-one or one-to-many relationship.
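To make the primary key / foreign key idea concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The two-table schema and the league names are made up for illustration; only the Country rows resemble the dataset used later. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

# In-memory database, so the sketch does not touch any file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked to

# Hypothetical schema: one country has many leagues (one-to-many).
conn.execute("CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE League (
        id INTEGER PRIMARY KEY,
        name TEXT,
        country_id INTEGER REFERENCES Country(id)  -- foreign key to Country's primary key
    )
    """
)

conn.execute("INSERT INTO Country (id, name) VALUES (1, 'Belgium')")
conn.execute("INSERT INTO League (id, name, country_id) VALUES (10, 'Example League', 1)")

# A league pointing at a country that does not exist is rejected.
try:
    conn.execute("INSERT INTO League (id, name, country_id) VALUES (11, 'Orphan League', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

The `country_id` column is what ties each League row to exactly one Country row, while a Country row may be referenced by any number of leagues.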
SQLite
A database stored in a single file.
👉 sqlite.org (https://www.sqlite.org/index.html)
https://kitt.lewagon.com/camps/1173/lectures/content/01-Python_03-SQL-Basics.html 1/8
09/05/2023 09:49 01-Python_03-SQL-Basics
SQLite DB Example
👉 European Soccer Database (https://www.kaggle.com/hugomathien/soccer/) on Kaggle
Exploration
Let's use DBeaver (https://dbeaver.io/), a universal database client for developers, SQL programmers,
database administrators and analysts.
ERD Diagram
When discovering a new database, a data scientist should explore and draw the Entity Relationship
Diagram (https://www.visual-paradigm.com/guide/data-modeling/what-is-entity-relationship-diagram/).
With DBeaver
Open the SQL Editor and write your first SQL query.
Execute the query (click on ▶️ or use the keyboard shortcut Ctrl + Enter)
id   |name       |
-----|-----------|
    1|Belgium    |
 1729|England    |
 4769|France     |
[...]
With Python
Reaching for the sqlite3 (https://docs.python.org/3.7/library/sqlite3.html) module from the Python standard library.
In [ ]:
import sqlite3
conn = sqlite3.connect('data/soccer.sqlite')
c = conn.cursor()
In [ ]:
Out[ ]:
[(1, 'Belgium'),
(1729, 'England'),
(4769, 'France'),
(7809, 'Germany'),
(10257, 'Italy'),
(13274, 'Netherlands'),
(15722, 'Poland'),
(17642, 'Portugal'),
(19694, 'Scotland'),
(21518, 'Spain'),
(24558, 'Switzerland')]
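A query along the following lines would return rows in this shape. This is only a sketch: it uses an in-memory database with a three-row stand-in for the Country table instead of `data/soccer.sqlite`, and the `ORDER BY` is added here purely to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for 'data/soccer.sqlite'
c = conn.cursor()

# Hypothetical miniature of the Country table, for illustration only.
c.execute("CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)")
c.executemany(
    "INSERT INTO Country (id, name) VALUES (?, ?)",
    [(1, "Belgium"), (1729, "England"), (4769, "France")],
)

# execute() runs the query; fetchall() returns every row as a list of tuples.
c.execute("SELECT * FROM Country ORDER BY id")
rows = c.fetchall()
print(rows)
```

With the default settings each row comes back as a plain tuple, so columns are accessed by position (`rows[0][1]`), not by name.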
In [ ]:
conn = sqlite3.connect('data/soccer.sqlite')
conn.row_factory = sqlite3.Row
c = conn.cursor()
In [ ]:
In [ ]:
first_row['name']
Out[ ]:
'Belgium'
In [ ]:
tuple(first_row)
Out[ ]:
(1, 'Belgium')
In [ ]:
1 - Belgium
In [ ]:
None
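The outputs above are consistent with fetching rows one at a time from a connection whose `row_factory` is `sqlite3.Row`. The sketch below is one possible way to reproduce them, assuming a one-row stand-in for the Country table in an in-memory database; the variable names `label` and `no_row` are introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for 'data/soccer.sqlite'
conn.row_factory = sqlite3.Row      # rows now support access by column name
c = conn.cursor()

# Hypothetical one-row Country table, for illustration only.
c.execute("CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT)")
c.execute("INSERT INTO Country (id, name) VALUES (1, 'Belgium')")

c.execute("SELECT * FROM Country ORDER BY id")
first_row = c.fetchone()  # a sqlite3.Row, or None when no row is left

# Access by column name instead of by position.
label = f"{first_row['id']} - {first_row['name']}"
print(label)

# The result set is now exhausted, so fetchone() returns None.
no_row = c.fetchone()
print(no_row)
```

A `sqlite3.Row` still behaves like a tuple when needed, which is why `tuple(first_row)` works.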
SQL
Projection
Choosing which columns the query shall return.
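As a small illustration of projection, the sketch below lists two columns after `SELECT` instead of `*`. It assumes a Player table with the `player_name`, `height` and `weight` columns used elsewhere in this lecture; the single row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Player (id INTEGER PRIMARY KEY, player_name TEXT, height REAL, weight REAL)"
)
conn.execute("INSERT INTO Player VALUES (1, 'Example Player', 180.0, 75.0)")  # made-up row

# Only the listed columns come back, in the order they are listed.
cursor = conn.execute("SELECT player_name, height FROM Player")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
print(rows)
```

`cursor.description` exposes the names of the returned columns, which is handy for checking what a projection actually produced.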
Selection
Selecting which rows the query shall return.
SELECT *
FROM "Match" AS matches
WHERE matches.country_id = 4769
SELECT *
FROM "Match" AS matches
WHERE matches.country_id = 1
OR matches.country_id = 1729
Alternative:
SELECT *
FROM "Match" AS matches
WHERE matches.country_id IN (1, 1729)
SELECT *
FROM Player
WHERE UPPER(Player.player_name) LIKE 'JOHN %'
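When a selection like the last one is run from Python, the value in the `WHERE` clause can be passed as a parameter with a `?` placeholder rather than pasted into the SQL string. The sketch below assumes a tiny Player table with made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Player (id INTEGER PRIMARY KEY, player_name TEXT)")
conn.executemany(
    "INSERT INTO Player (player_name) VALUES (?)",
    [("John Doe",), ("Johnny Roe",), ("Jane Doe",)],  # made-up names
)

# '%' matches any run of characters, so 'JOHN %' means "starts with JOHN and a space".
pattern = "JOHN %"
rows = conn.execute(
    "SELECT player_name FROM Player WHERE UPPER(player_name) LIKE ? ORDER BY id",
    (pattern,),
).fetchall()

print(rows)
```

Only "John Doe" is selected: "Johnny Roe" has no space right after "JOHN", so the pattern does not match it.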
Counting
Counting the number of rows matching the selection
SELECT COUNT(Player.id)
FROM Player
WHERE Player.height >= 200
Sorting
Sorting the rows based on a column (or a group of columns)
SELECT *
FROM Player
ORDER BY Player.weight DESC
LIMIT 10
Grouping
Grouping rows on a given column C: rows that share the same value of C are aggregated into a single row
with a function such as COUNT, SUM or AVG.
🤔 How many matches were played on a per-country basis, ignoring countries with fewer than 3000 matches?
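One way to answer this kind of question is `GROUP BY` combined with `HAVING`, which filters groups after aggregation (whereas `WHERE` filters rows before it). The sketch below is an assumption about how such a query could look, not the lecture's own query: it uses a four-row stand-in for the Match table and a threshold of 2 instead of 3000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Match" (id INTEGER PRIMARY KEY, country_id INTEGER)')
conn.executemany(
    'INSERT INTO "Match" (country_id) VALUES (?)',
    [(1,), (1,), (1,), (1729,)],  # made-up data: 3 matches for country 1, 1 for country 1729
)

min_matches = 2  # the question uses 3000; lowered to fit the toy data

query = """
    SELECT matches.country_id, COUNT(matches.id) AS match_count
    FROM "Match" AS matches
    GROUP BY matches.country_id
    HAVING COUNT(matches.id) >= ?
    ORDER BY match_count DESC
"""
rows = conn.execute(query, (min_matches,)).fetchall()
print(rows)
```

Country 1729 is dropped because its group holds a single match, which is below the threshold.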
SELECT
COUNT(matches.id) AS outcome_count,
CASE
WHEN matches.home_team_goal > matches.away_team_goal
THEN 'home_win'
WHEN matches.home_team_goal = matches.away_team_goal
THEN 'draw'
ELSE 'away_win'
END AS outcome
FROM "Match" AS matches
GROUP BY outcome
ORDER BY outcome_count DESC
🤔 How many matches were played in each league (with their respective country)?
SELECT
League.id,
League.name AS league_name,
COUNT(matches.id) AS match_count,
Country.name AS country_name
FROM "Match" AS matches
JOIN League ON matches.league_id = League.id
JOIN Country ON League.country_id = Country.id
GROUP BY League.id
ORDER BY
match_count DESC,
country_name ASC
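To see what the two joins do, the query above can be run from Python against a toy database. The sketch below assumes minimal versions of the three tables with made-up league names and match rows; only the columns the query touches are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # access result columns by their aliases

# Hypothetical miniature schema and data, for illustration only.
conn.executescript(
    """
    CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE League (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    CREATE TABLE "Match" (id INTEGER PRIMARY KEY, league_id INTEGER);
    INSERT INTO Country VALUES (1, 'Belgium'), (1729, 'England');
    INSERT INTO League VALUES (1, 'League A', 1), (1729, 'League B', 1729);
    INSERT INTO "Match" (league_id) VALUES (1), (1729), (1729);
    """
)

query = """
    SELECT
        League.id,
        League.name AS league_name,
        COUNT(matches.id) AS match_count,
        Country.name AS country_name
    FROM "Match" AS matches
    JOIN League ON matches.league_id = League.id
    JOIN Country ON League.country_id = Country.id
    GROUP BY League.id
    ORDER BY
        match_count DESC,
        country_name ASC
"""
result = [
    (row["league_name"], row["match_count"], row["country_name"])
    for row in conn.execute(query)
]
print(result)
```

Each match row is first matched to its league, then each league to its country, and only then are the rows grouped and counted per league.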
Your turn!
You will learn to:
Explore a new database and draw its schema with kitt.lewagon.com/db (https://kitt.lewagon.com/db)
Query the database in SQL using a GUI client like DBeaver
Use Python to execute SQL queries against the database