Using SQL and Python For Data Analytics
SQLite in Jupyter Notebook
• Enter the following commands in code cells:
• Installing SQL module in the notebook
!pip install ipython-sql
• Loading the SQL module
%load_ext sql
• The above magic command loads the ipython-sql
extension. We can connect to any database
which is supported by SQLAlchemy.
• Connect to a SQLite database:
%sql sqlite://
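Outside the notebook magics, a comparable in-memory SQLite database can be opened from plain Python with the standard-library sqlite3 module. This is a minimal sketch; the variable names are illustrative only.

```python
import sqlite3

# ":memory:" opens a temporary in-memory database, comparable to `%sql sqlite://`
conn = sqlite3.connect(":memory:")

# Ask SQLite for its version to confirm that the connection works
version = conn.execute("SELECT sqlite_version()").fetchone()[0]
print("SQLite version:", version)
conn.close()
```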
Creating a Table
Table and column name restrictions
– Names cannot exceed 30 characters in some DBMSs (for example, older Oracle releases); SQLite does not impose this limit
Dropping a Table
Can correct errors by dropping (deleting) a table
and starting over
Useful when table is created before errors are
discovered
The DROP TABLE command is followed by the name of the
table to be dropped and a semicolon
Any data in the table is also deleted
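The drop-and-recreate cycle can be sketched with Python's sqlite3 module. The rep table, its columns, and the misspelled column name below are hypothetical, chosen only to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(connection):
    """Return the names of all tables currently in the database."""
    rows = connection.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]

# Hypothetical table created with a misspelled column name (last_nam)
conn.execute("CREATE TABLE rep (rep_num CHAR(2), last_nam CHAR(15))")
conn.execute("INSERT INTO rep VALUES ('20', 'Kaiser')")
before = table_names(conn)

# Dropping the table removes its definition and every row stored in it
conn.execute("DROP TABLE rep")
after = table_names(conn)

# Start over with the corrected definition; the new table is empty
conn.execute("CREATE TABLE rep (rep_num CHAR(2), last_name CHAR(15))")
row_count = conn.execute("SELECT COUNT(*) FROM rep").fetchone()[0]
print(before, after, row_count)
```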
Using Data Types
For each column, the type of data must be
defined
Common data types
– CHAR(n)
– VARCHAR(n)
– DATE
– DECIMAL(p,q)
– INT
– SMALLINT
Using Nulls
A special value to represent a situation when the
actual value is not known for a column
Can specify whether to allow nulls in the
individual columns
Should not allow nulls for primary key columns
Use NOT NULL clause in CREATE TABLE
command to exclude the use of nulls in a column
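A CREATE TABLE command that combines data types with NOT NULL might look like the following sketch. Only customer_num and customer_name appear elsewhere in this material; the remaining column names and all column sizes are assumptions. Note that SQLite treats declared types such as CHAR(3) as loose "affinities" and does not enforce the length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names other than customer_num and customer_name are assumed.
# NOT NULL is stated on the primary key as well, because SQLite does not
# reject nulls in a non-INTEGER primary key unless told to.
conn.execute("""
    CREATE TABLE customer (
        customer_num  CHAR(3) PRIMARY KEY NOT NULL,
        customer_name CHAR(35) NOT NULL,
        city          CHAR(15),
        balance       DECIMAL(8,2),
        credit_limit  DECIMAL(8,2)
    )
""")

# A row with a null customer_name violates the NOT NULL clause
try:
    conn.execute(
        "INSERT INTO customer VALUES ('148', NULL, 'Fillmore', 6550.00, 7500.00)"
    )
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)
```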
Adding Rows to a Table
INSERT Command
– INSERT INTO followed by table name
– VALUES clause followed by specific values in
parentheses
– Values for character columns in single quotation
marks
Inserting a Row that Contains Nulls
Use a special format of the INSERT command to
enter a null value in a table
After the table name, list the names of the columns that will
receive non-null values, and then list only those
values in the VALUES clause
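The column-list form of INSERT can be sketched as follows. The rep table and its data are hypothetical; the point is that a column left out of the list receives a null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: commission is allowed to be null
conn.execute(
    "CREATE TABLE rep (rep_num CHAR(2) NOT NULL, last_name CHAR(15), commission DECIMAL(7,2))"
)

# Ordinary INSERT: a value for every column, in table order
conn.execute("INSERT INTO rep VALUES ('20', 'Kaiser', 20542.50)")

# Column-list INSERT: commission is omitted, so it is stored as NULL
conn.execute("INSERT INTO rep (rep_num, last_name) VALUES ('85', 'Webb')")

row = conn.execute(
    "SELECT rep_num, last_name, commission FROM rep WHERE rep_num = '85'"
).fetchone()
print(row)  # the null commission is returned to Python as None
```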
Viewing Table Data
Use SELECT command
– Can display all the rows and columns in a table
Correcting Errors in a Table
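A common way to correct errors without dropping the whole table is to change rows with UPDATE and remove rows with DELETE. The sketch below assumes a hypothetical rep table containing one misspelled name and one row entered by mistake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rep (rep_num CHAR(2), last_name CHAR(15))")
conn.executemany("INSERT INTO rep VALUES (?, ?)", [
    ("20", "Kaiser"),
    ("35", "Hul"),       # misspelled name
    ("99", "Mistake"),   # row that should not exist
])

# UPDATE changes the value in the rows selected by the WHERE clause
conn.execute("UPDATE rep SET last_name = 'Hull' WHERE rep_num = '35'")

# DELETE removes the rows selected by the WHERE clause
conn.execute("DELETE FROM rep WHERE rep_num = '99'")

rows = conn.execute("SELECT rep_num, last_name FROM rep ORDER BY rep_num").fetchall()
print(rows)
```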
Creating the Remaining
Database Tables: Figure 3-26
%%sql
INSERT INTO CUSTOMER
VALUES
('148','Al''s Appliance and Sport','2837 Greenway','Fillmore','FL','33336',6550.00,7500.00,'20');
INSERT INTO CUSTOMER
VALUES
('282','Brookings Direct','3827 Devon','Grove','FL','33321',431.50,10000.00,'35');
INSERT INTO CUSTOMER
VALUES
('356','Ferguson''s','382 Wildwood','Northfield','FL','33146',5785.00,7500.00,'65');
INSERT INTO CUSTOMER
VALUES
('408','The Everything Shop','1828 Raven','Crystal','FL','33503',5285.25,5000.00,'35');
INSERT INTO CUSTOMER
VALUES
('462','Bargains Galore','3829 Central','Grove','FL','33321',3412.00,10000.00,'65');
INSERT INTO CUSTOMER
VALUES
('524','Kline''s','838 Ridgeland','Fillmore','FL','33336',12762.00,15000.00,'20');
INSERT INTO CUSTOMER
VALUES
('608','Johnson''s Department Store','372 Oxford','Sheldon','FL','33553',2106.00,10000.00,'65');
INSERT INTO CUSTOMER
VALUES
('687','Lee''s Sport and Appliance','282 Evergreen','Altonville','FL','32543',2851.00,5000.00,'35');
INSERT INTO CUSTOMER
VALUES
('725','Deerfield''s Four Seasons','282 Columbia','Sheldon','FL','33553',248.00,7500.00,'35');
INSERT INTO CUSTOMER
VALUES
('842','All Season','28 Lakeview','Grove','FL','33321',8221.00,7500.00,'20');
Entities, Attributes, and Relationships
(cont'd)
A relation is a two-dimensional table in which:
– Entries in the table are single-valued
– Each column has a distinct name
– All values in a column are values of the same attribute
– The order of the columns is immaterial
– Each row is distinct
– The order of the rows is immaterial
Functional Dependence
An attribute, B, is functionally dependent on
another attribute (or collection), A, if a value for A
determines a single value for B at any one time
B is functionally dependent on A
A → B
A functionally determines B
Cannot determine from sample data; must know
the users’ policies
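The limitation of sample data can be illustrated with a small check. The helper below is a hypothetical illustration, not a standard function: it can show that sample rows contradict a dependency, but a result of True only means that no counterexample appears in the sample. The rows reuse a few values from the CUSTOMER data, with assumed column names.

```python
def fd_holds_in_sample(rows, determinant, dependent):
    """Return True if no two rows share a determinant value but differ
    in the dependent value. True does NOT prove the dependency."""
    seen = {}
    for row in rows:
        key = row[determinant]
        if key in seen and seen[key] != row[dependent]:
            return False
        seen[key] = row[dependent]
    return True

sample = [
    {"customer_num": "148", "city": "Fillmore", "postal_code": "33336", "rep_num": "20"},
    {"customer_num": "282", "city": "Grove", "postal_code": "33321", "rep_num": "35"},
    {"customer_num": "462", "city": "Grove", "postal_code": "33321", "rep_num": "65"},
    {"customer_num": "524", "city": "Fillmore", "postal_code": "33336", "rep_num": "20"},
]

# Consistent with the sample, but only the users' policies can confirm it
city_to_postal = fd_holds_in_sample(sample, "city", "postal_code")

# Contradicted by the sample: Grove customers have reps 35 and 65
city_to_rep = fd_holds_in_sample(sample, "city", "rep_num")
print(city_to_postal, city_to_rep)
```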
Primary Keys
Unique identifier for a table
Column (attribute) A (or a collection of columns)
is the Primary Key (PK) for a table (relation), R,
if:
1) All columns in R are functionally dependent on A
2) No subcollection of the columns in A (assuming that
A is a collection of columns and not just a single
column) also has Property 1
Candidate Keys
Like a primary key, a candidate key is a column
(or collection of columns) on which all columns in
the table are functionally dependent—the
definition for primary key really defines candidate
key as well.
From all the candidate keys, you would choose
one to be the primary key.
Example:
Student table with Student# and UTORid.
Design Method
1. Read the requirements, identify the entities (objects)
involved, and name the entities
2. Identify the unique identifiers for the entities identified
in step 1
3. Identify the attributes for all the entities
4. Identify the functional dependencies that exist among
the attributes
5. Use the functional dependencies to identify the tables
by placing each attribute with the attribute or minimum
combination of attributes on which it is functionally
dependent
6. Identify any relationships between tables.
Normalization
Identifies the existence of potential problems
Provides a method for correcting problems
Goal
– Convert unnormalized relations (tables that contain
repeating groups) into various types of normal forms
1NF
– Better than unnormalized
2NF
– Better than 1NF
3NF
– Better than 2NF
First Normal Form
A relation is in first normal form (1NF) if it does
not contain any repeating groups
To convert an unnormalized relation to 1NF,
expand the PK to include the PK of the repeating
group
– This effectively eliminates the repeating group from
the relation
Second Normal Form
Redundancy causes problems
Update Anomalies
– Update
– Inconsistent data
– Additions
– Deletions
A relation is in second normal form (2NF) if it is
in 1NF and no nonkey attribute is dependent
on only a portion of the primary key
or …
All nonkey attributes are functionally
dependent on the entire primary key
Third Normal Form
Update anomalies still possible
Determinant
– An attribute (or collection) that functionally determines
another attribute
Diagrams for Database Design
Graphical illustration
Entity-relationship (E-R) diagram
– Rectangles represent entities
– Arrows represent relationships
Constructing Simple Queries (cont'd)
SELECT-FROM-WHERE statement
– SELECT columns to include in result
– FROM table containing columns
– WHERE any conditions to apply to the data
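The three clauses can be seen together in a short sqlite3 sketch. The rows are a subset of the CUSTOMER data, and the credit_limit column name is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, customer_name TEXT, credit_limit REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("148", "Al's Appliance and Sport", 7500.00),
    ("282", "Brookings Direct", 10000.00),
    ("408", "The Everything Shop", 5000.00),
])

rows = conn.execute("""
    SELECT customer_num, customer_name   -- columns to include in the result
    FROM customer                        -- table containing the columns
    WHERE credit_limit = 7500            -- condition the rows must satisfy
""").fetchall()
print(rows)
```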
Retrieving Certain Columns and
Rows (cont'd)
Retrieving All Columns and Rows
(cont'd)
Using a WHERE Clause (cont'd)
Using Compound Conditions
Compound conditions
– Connect two or more simple conditions
with AND, OR, and NOT operators
AND operator: all simple conditions are true
OR operator: any simple condition is true
NOT operator: reverses the truth of the original
condition
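The three operators can be compared side by side in a small sqlite3 sketch. The rows are taken from the CUSTOMER data; the city and balance column names and the nums helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, city TEXT, balance REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("148", "Fillmore", 6550.00),
    ("282", "Grove", 431.50),
    ("408", "Crystal", 5285.25),
    ("462", "Grove", 3412.00),
])

def nums(condition):
    """Return the customer numbers that satisfy the given WHERE condition."""
    sql = "SELECT customer_num FROM customer WHERE " + condition + " ORDER BY customer_num"
    return [row[0] for row in conn.execute(sql)]

both = nums("city = 'Grove' AND balance > 1000")    # both conditions must be true
either = nums("city = 'Grove' OR balance > 6000")   # at least one must be true
negated = nums("NOT city = 'Grove'")                # reverses the condition
print(both, either, negated)
```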
Using the BETWEEN Operator
Use instead of two comparisons combined with the AND operator
Use when searching a range of values
Makes SELECT commands simpler to construct
Inclusive
– When using BETWEEN 2000 AND 5000, values of
exactly 2000 or 5000 satisfy the condition
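The inclusive behavior can be checked with a sketch like the one below, which uses credit limits from the CUSTOMER data (the credit_limit column name is assumed) and a range of 5000 to 7500 so that both endpoints occur in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, credit_limit REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", 7500.00),
    ("282", 10000.00),
    ("408", 5000.00),
    ("524", 15000.00),
])

# BETWEEN includes both endpoints (5000 and 7500)
with_between = [row[0] for row in conn.execute(
    "SELECT customer_num FROM customer "
    "WHERE credit_limit BETWEEN 5000 AND 7500 ORDER BY customer_num")]

# The equivalent compound condition written with AND
with_and = [row[0] for row in conn.execute(
    "SELECT customer_num FROM customer "
    "WHERE credit_limit >= 5000 AND credit_limit <= 7500 ORDER BY customer_num")]
print(with_between, with_and)
```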
Using Computed Columns
Computed column
– Does not exist in the database but is computed
using data in existing columns
Arithmetic operators
– + for addition
– - for subtraction
– * for multiplication
– / for division
Use the AS clause to assign a name to a computed column
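A computed column with a name might be sketched as follows. The balance and credit_limit column names and the available_credit alias are assumptions; the values come from the CUSTOMER data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, balance REAL, credit_limit REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("148", 6550.00, 7500.00),
    ("408", 5285.25, 5000.00),
])

# available_credit is not stored in the table; it is computed for each row
cursor = conn.execute("""
    SELECT customer_num, credit_limit - balance AS available_credit
    FROM customer
    ORDER BY customer_num
""")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(column_names, rows)
```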
Using the LIKE Operator
Used for pattern matching
LIKE '%Central%' retrieves rows whose value contains those
characters anywhere (% matches any sequence of characters)
– “3829 Central” or “Centralia”
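A pattern search of this kind can be sketched with sqlite3. The street column name is assumed, and customer 900 is a hypothetical row added so that a "Centralia" value appears in the data. In SQLite, LIKE ignores case for ASCII letters by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, street TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", "2837 Greenway"),
    ("462", "3829 Central"),
    ("900", "12 Centralia"),   # hypothetical row for illustration
])

# % stands for any sequence of characters, including none
rows = conn.execute("""
    SELECT customer_num, street
    FROM customer
    WHERE street LIKE '%Central%'
    ORDER BY customer_num
""").fetchall()
print(rows)
```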
Using the IN Operator
Concise phrasing of OR conditions
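The equivalence between IN and a chain of OR conditions can be sketched as follows; the credit_limit column name is assumed and the values come from the CUSTOMER data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, credit_limit REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", 7500.00),
    ("282", 10000.00),
    ("408", 5000.00),
    ("462", 10000.00),
])

# IN lists the acceptable values once
with_in = [row[0] for row in conn.execute(
    "SELECT customer_num FROM customer "
    "WHERE credit_limit IN (5000, 10000) ORDER BY customer_num")]

# The same condition spelled out with OR
with_or = [row[0] for row in conn.execute(
    "SELECT customer_num FROM customer "
    "WHERE credit_limit = 5000 OR credit_limit = 10000 ORDER BY customer_num")]
print(with_in, with_or)
```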
Sorting
By default, no defined order in which results are
displayed
Use ORDER BY clause to list data in a specific
order
Ascending is default sort order
Additional Sorting Options
Possible to sort data by more than one key
Major sort key and minor sort key
List sort keys in order of importance in the
ORDER BY clause
For descending order sort, use DESC
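A sort on two keys can be sketched as follows, with credit limit as the major key (descending) and customer name as the minor key (ascending). The credit_limit column name is assumed; the rows come from the CUSTOMER data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, customer_name TEXT, credit_limit REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("148", "Al's Appliance and Sport", 7500.00),
    ("282", "Brookings Direct", 10000.00),
    ("408", "The Everything Shop", 5000.00),
    ("462", "Bargains Galore", 10000.00),
])

# Major sort key first (descending), then the minor sort key (ascending by default)
ordered = [row[0] for row in conn.execute("""
    SELECT customer_num
    FROM customer
    ORDER BY credit_limit DESC, customer_name
""")]
print(ordered)
```

The two customers with a 10000 credit limit are tied on the major key, so the minor key decides their relative order.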
Using Functions
Aggregate functions
– Apply to groups of rows
Using the COUNT Function
Counts the number of rows in a table
Can use asterisk (*) to represent any column
Using the SUM Function
Used to calculate totals of columns
Column must be specified and must be numeric
Null values are ignored
Using the AVG, MAX, and MIN
Functions
AVG: numeric columns only; MAX and MIN also work with character and date columns
Ignores nulls
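How the aggregate functions treat nulls can be sketched with a tiny table. The balances are simplified, hypothetical values, with one balance left unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, balance REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", 100.0),
    ("282", 300.0),
    ("408", None),   # balance not known (stored as NULL)
])

# COUNT(*) counts rows; the other functions skip the null balance
total_rows, known_balances, total, average, smallest, largest = conn.execute("""
    SELECT COUNT(*), COUNT(balance), SUM(balance),
           AVG(balance), MIN(balance), MAX(balance)
    FROM customer
""").fetchone()
print(total_rows, known_balances, total, average, smallest, largest)
```

The average is computed over the two known balances, not over all three rows.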
Using the DISTINCT Operator
(cont'd)
Nesting Queries
Some queries require two or more steps to produce their results
Subquery: an inner query placed inside another
query
Outer query uses subquery results
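A two-step question such as "which customers have a balance above the average balance?" can be sketched with a subquery. The balance column name is assumed; the values come from the CUSTOMER data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, balance REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", 6550.00),
    ("282", 431.50),
    ("408", 5285.25),
    ("462", 3412.00),
])

# Step 1 (inner query): compute the average balance.
# Step 2 (outer query): keep the customers whose balance exceeds it.
above_average = [row[0] for row in conn.execute("""
    SELECT customer_num
    FROM customer
    WHERE balance > (SELECT AVG(balance) FROM customer)
    ORDER BY customer_num
""")]
print(above_average)
```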
Using the GROUP BY Clause
Using a HAVING Clause
Used to restrict groups that will be included
Having vs. Where
WHERE: limit rows
HAVING: limit groups
Can use together if condition involves both rows
and groups
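The difference between the two clauses can be sketched by counting customers per sales rep. The rep_num and credit_limit column names are assumed; the values come from the CUSTOMER data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, rep_num TEXT, credit_limit REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("148", "20", 7500.00), ("282", "35", 10000.00), ("356", "65", 7500.00),
    ("408", "35", 5000.00), ("462", "65", 10000.00), ("524", "20", 15000.00),
    ("608", "65", 10000.00),
])

# GROUP BY: one result row per rep
groups = conn.execute("""
    SELECT rep_num, COUNT(*) FROM customer
    GROUP BY rep_num ORDER BY rep_num
""").fetchall()

# HAVING: keep only the groups with more than two customers
big_groups = conn.execute("""
    SELECT rep_num, COUNT(*) FROM customer
    GROUP BY rep_num HAVING COUNT(*) > 2 ORDER BY rep_num
""").fetchall()

# WHERE filters rows before grouping; HAVING filters the resulting groups
both = conn.execute("""
    SELECT rep_num, COUNT(*) FROM customer
    WHERE credit_limit = 10000
    GROUP BY rep_num HAVING COUNT(*) > 1 ORDER BY rep_num
""").fetchall()
print(groups, big_groups, both)
```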
Nulls
Condition that involves a column that can be null
– IS NULL
– IS NOT NULL
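The reason for the special IS NULL syntax can be sketched as follows: a comparison such as street = NULL is never true, so it finds no rows. The street column name and customer 999 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, street TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", "2837 Greenway"),
    ("999", None),   # hypothetical customer whose street is unknown
])

def nums(condition):
    """Return the customer numbers that satisfy the given WHERE condition."""
    sql = "SELECT customer_num FROM customer WHERE " + condition + " ORDER BY customer_num"
    return [row[0] for row in conn.execute(sql)]

is_null = nums("street IS NULL")
is_not_null = nums("street IS NOT NULL")
equals_null = nums("street = NULL")   # comparing with NULL never yields true
print(is_null, is_not_null, equals_null)
```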
Joining Two Tables
SELECT clause
– List all columns to display
FROM clause
– List all tables involved in query
WHERE clause
– Restrict to rows that have common values in
matching columns
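A two-table join written with the WHERE clause can be sketched as follows. The rep table, its last_name column, and the rep names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rep (rep_num TEXT, last_name TEXT)")
conn.execute("CREATE TABLE customer (customer_num TEXT, customer_name TEXT, rep_num TEXT)")
conn.executemany("INSERT INTO rep VALUES (?, ?)", [
    ("20", "Kaiser"), ("35", "Hull"), ("65", "Perez"),
])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("148", "Al's Appliance and Sport", "20"),
    ("282", "Brookings Direct", "35"),
])

# rep_num exists in both tables, so it is qualified with the table name
rows = conn.execute("""
    SELECT customer_num, customer_name, rep.rep_num, last_name
    FROM customer, rep
    WHERE customer.rep_num = rep.rep_num
    ORDER BY customer_num
""").fetchall()
print(rows)
```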
Comparing JOINS, IN, and EXISTS
We can join tables using:
– WHERE clause
Using the IN Operator
Using the EXISTS Operator
Correlated subquery
– Subquery involves a table listed in outer query
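A correlated subquery with EXISTS, and the same question answered with IN, can be sketched as follows. The customer_num, order_num, and order_date column names follow the LEFT JOIN example in this material; the order data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_num TEXT, order_date TEXT, customer_num TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", "Al's Appliance and Sport"),
    ("282", "Brookings Direct"),
    ("408", "The Everything Shop"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("21608", "2015-10-20", "148"),   # hypothetical orders
    ("21610", "2015-10-20", "408"),
])

# Correlated subquery: the inner query refers to customer from the outer query
with_exists = [row[0] for row in conn.execute("""
    SELECT customer_num FROM customer
    WHERE EXISTS (SELECT * FROM orders
                  WHERE orders.customer_num = customer.customer_num)
    ORDER BY customer_num
""")]

# The same result using IN with an uncorrelated subquery
with_in = [row[0] for row in conn.execute("""
    SELECT customer_num FROM customer
    WHERE customer_num IN (SELECT customer_num FROM orders)
    ORDER BY customer_num
""")]
print(with_exists, with_in)
```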
Using a Subquery within a
Subquery (cont'd)
A Comprehensive Example
Using an Alias
An alternate name for a table
Use in FROM clause
Type the name of the table, a space, and then
the alias
Simplifies queries by shortening references to the table
Joining a Table to Itself (cont'd)
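Aliases are what make a self-join expressible, because the same table must be referred to under two different names. The sketch below finds pairs of customers located in the same city; the aliases f and s and the city column name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", "Fillmore"),
    ("282", "Grove"),
    ("462", "Grove"),
    ("524", "Fillmore"),
])

# f and s are two aliases for the same customer table.
# f.customer_num < s.customer_num lists each pair once and skips self-pairs.
pairs = conn.execute("""
    SELECT f.customer_num, s.customer_num, f.city
    FROM customer f, customer s
    WHERE f.city = s.city
      AND f.customer_num < s.customer_num
    ORDER BY f.customer_num
""").fetchall()
print(pairs)
```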
Joining Several Tables
Condition relates columns for each pair of tables
In SELECT clause, list all columns to display
Qualify any column names if needed
In FROM clause, list all tables
Include tables used in the WHERE clause, even
if they are not in the SELECT clause
Set Operations
Union
– The union of two tables is a table containing
every row that is in either the first table, the
second table, or both tables
Intersection
– Intersection of two tables is a table containing all
rows that are in both tables
Difference
– Difference of two tables is a table containing set
of all rows that are in first table but not in second
table
– Oracle uses the MINUS operator
– MINUS is not supported by SQLite, SQL Server, or
Microsoft Access
» SQLite and SQL Server use EXCEPT instead
» In Microsoft Access, use an alternate approach
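The three set operations can be tried in SQLite with a sketch like the following, which combines "customers of rep 35" with "customers whose credit limit is 10000". The column names and the combine helper are assumptions; the values come from the CUSTOMER data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, credit_limit REAL, rep_num TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("148", 7500.00, "20"),
    ("282", 10000.00, "35"),
    ("408", 5000.00, "35"),
    ("462", 10000.00, "65"),
])

rep_35 = "SELECT customer_num FROM customer WHERE rep_num = '35'"
limit_10000 = "SELECT customer_num FROM customer WHERE credit_limit = 10000"

def combine(operator):
    """Combine the two queries with the given set operator."""
    sql = f"{rep_35} {operator} {limit_10000} ORDER BY customer_num"
    return [row[0] for row in conn.execute(sql)]

union = combine("UNION")             # rows in either result
intersection = combine("INTERSECT")  # rows in both results
difference = combine("EXCEPT")       # rows in the first result but not the second
print(union, intersection, difference)
```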
Special Operations
• Self-join
• Inner join
• Outer join
• Product
Sources:
https://theartofpostgresql.com/blog/2019-09-sql-joins/
https://www.tutorialgateway.org/sql-joins/
Inner Join
Compares the tables in FROM clause and lists
only those rows that satisfy condition in WHERE
clause
INNER JOIN command
– Introduced in the SQL-92 standard
(ANSI SQL-92 explicit join syntax)
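The two ways of writing an inner join can be compared directly. In this sketch the rep table and its data are hypothetical; the WHERE form and the INNER JOIN ... ON form are expected to return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rep (rep_num TEXT, last_name TEXT)")
conn.execute("CREATE TABLE customer (customer_num TEXT, rep_num TEXT)")
conn.executemany("INSERT INTO rep VALUES (?, ?)", [("20", "Kaiser"), ("35", "Hull")])
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", "20"), ("282", "35"), ("524", "20"),
])

# Join condition placed in the WHERE clause
where_form = conn.execute("""
    SELECT customer_num, last_name
    FROM customer, rep
    WHERE customer.rep_num = rep.rep_num
    ORDER BY customer_num
""").fetchall()

# Explicit join syntax: the condition moves to the ON clause
inner_join_form = conn.execute("""
    SELECT customer_num, last_name
    FROM customer
    INNER JOIN rep ON customer.rep_num = rep.rep_num
    ORDER BY customer_num
""").fetchall()
print(where_form == inner_join_form, inner_join_form)
```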
Outer Joins
Left (outer) join: all rows from the table on the left
(listed first in the query) will be included; matching
rows only from the table on the right will be
included
Right (outer) join: all rows from the table on the
right will be included; matching rows only from the
table on the left will be included
Full outer join: all rows from both tables will be
included regardless of matches
Left (Outer) Join
In SQLite versions before 3.39.0, only INNER and LEFT (OUTER)
JOINs are implemented
RIGHT and FULL OUTER JOINs are supported starting with
SQLite 3.39.0
Example:
%%sql
SELECT customer.customer_num, customer_name,
order_num, order_date
FROM customer
LEFT JOIN orders
ON customer.customer_num = orders.customer_num
ORDER BY customer.customer_num;
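The same LEFT JOIN can be run from plain Python to see how unmatched rows appear. The order data below is hypothetical; a customer without orders is still listed, with None (null) in the order columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_num TEXT, order_date TEXT, customer_num TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [
    ("148", "Al's Appliance and Sport"),
    ("282", "Brookings Direct"),          # has no orders in this sample
])
conn.execute("INSERT INTO orders VALUES ('21608', '2015-10-20', '148')")  # hypothetical order

rows = conn.execute("""
    SELECT customer.customer_num, customer_name,
           order_num, order_date
    FROM customer
    LEFT JOIN orders
    ON customer.customer_num = orders.customer_num
    ORDER BY customer.customer_num
""").fetchall()
for row in rows:
    print(row)
```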
Product
The product (Cartesian product) of two tables is
the combination of all rows in the first table and
all rows in the second table
Omit the WHERE clause to form a product, i.e.,
in the “old” notation:
SELECT ...
FROM table1 t1, table2 t2
equivalent to the “modern” notation:
SELECT ...
FROM table1 t1 CROSS JOIN table2 t2
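Both notations can be checked against each other in a short sketch. The table contents are hypothetical; with 2 customers and 3 reps, the product should contain 2 × 3 = 6 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num TEXT)")
conn.execute("CREATE TABLE rep (rep_num TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)", [("148",), ("282",)])
conn.executemany("INSERT INTO rep VALUES (?)", [("20",), ("35",), ("65",)])

# "Old" notation: list both tables and omit the WHERE clause
old_notation = conn.execute("""
    SELECT c.customer_num, r.rep_num
    FROM customer c, rep r
    ORDER BY c.customer_num, r.rep_num
""").fetchall()

# "Modern" notation: explicit CROSS JOIN
modern_notation = conn.execute("""
    SELECT c.customer_num, r.rep_num
    FROM customer c CROSS JOIN rep r
    ORDER BY c.customer_num, r.rep_num
""").fetchall()
print(len(old_notation), old_notation == modern_notation)
```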
https://stackoverflow.com/questions/44437397/why-do-cross-join-conditions-not-work-in-the-on-clause-only-the-where-clause
Using a Self-Join on a Primary
Key (cont'd)
SQL JOINs
https://stackoverflow.com/questions/30358982/sql-server-replaces-left-join-for-left-outer-join-in-view-query