Using SQL and Python For Data Analytics


There are several recommended options to work on the coding examples:
1. On your own computer: Install ANACONDA
https://www.anaconda.com/download/ (Python 3.7)
-> Launch Jupyter Notebook
2. Use Google Colaboratory (runs in the cloud)
https://colab.research.google.com
You need a Google account!

1
SQLite in Jupyter Notebook
• Enter the following commands in code cells:
• Installing SQL module in the notebook
!pip install ipython-sql
• Loading the SQL module
%load_ext sql
• The above magic command loads the ipython-sql
extension. We can connect to any database
which is supported by SQLAlchemy.
• Connect to a SQLite database:
%sql sqlite://
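Outside a notebook, a comparable in-memory SQLite database can be opened with Python's built-in sqlite3 module. The following is a minimal sketch under that assumption; it does not use ipython-sql, and the variable names are only illustrative.

```python
import sqlite3

# ":memory:" creates a temporary database that exists only in RAM,
# comparable to the bare "sqlite://" URL passed to %sql above.
conn = sqlite3.connect(":memory:")

# Ask SQLite for its version to confirm that the connection works.
version = conn.execute("SELECT sqlite_version()").fetchone()[0]
print("Connected to SQLite", version)

conn.close()
```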

2
Creating a Table (cont'd)
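• Table and column name restrictions
– Maximum name length depends on the DBMS (30 characters
in older Oracle versions; SQLite enforces no such limit)
– Must start with a letter
– Can contain letters, numbers, and underscores (_)
– Cannot contain spaces
– SQL keywords and unquoted names are not case sensitive;
comparisons of character data can be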

3
Creating a Table (cont'd)

4
Creating a Table (cont'd)

5
Dropping a Table
 Can correct errors by dropping (deleting) a table
and starting over
 Useful when table is created before errors are
discovered
 Command DROP TABLE is followed by the table
to be dropped and a semicolon
 Any data in table also deleted

6
Using Data Types
 For each column, the type of data must be
defined
 Common data types
– CHAR(n)
– VARCHAR(n)
– DATE
– DECIMAL(p,q)
– INT
– SMALLINT
7
Using Nulls
 A special value to represent a situation when the
actual value is not known for a column
 Can specify whether to allow nulls in the
individual columns
 Should not allow nulls for primary key columns

8
Using Nulls (continued)
• Use NOT NULL clause in CREATE TABLE
command to exclude the use of nulls in a column
• Default is to allow null values
• If a column is defined as NOT NULL, system will
reject any attempt to store a null value there

9
Using Nulls (cont'd)

CREATE TABLE REP
(REP_NUM CHAR(2) PRIMARY KEY,
LAST_NAME CHAR(15) NOT NULL,
FIRST_NAME CHAR(15) NOT NULL,
STREET CHAR(15),
CITY CHAR(15),
STATE CHAR(2),
ZIP CHAR(5),
COMMISSION DECIMAL(7,2),
RATE DECIMAL(3,2) );
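To see the NOT NULL clause at work, the REP table above can be created with Python's sqlite3 module and fed a row whose LAST_NAME is null. This is a sketch only: the rep number '99' and first name 'Pat' are made-up sample values, and the exact error text may differ in other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE REP
    (REP_NUM CHAR(2) PRIMARY KEY,
     LAST_NAME CHAR(15) NOT NULL,
     FIRST_NAME CHAR(15) NOT NULL,
     STREET CHAR(15),
     CITY CHAR(15),
     STATE CHAR(2),
     ZIP CHAR(5),
     COMMISSION DECIMAL(7,2),
     RATE DECIMAL(3,2))
""")

# LAST_NAME is declared NOT NULL, so SQLite should reject this row.
# The values '99' and 'Pat' are invented for illustration.
try:
    conn.execute(
        "INSERT INTO REP (REP_NUM, LAST_NAME, FIRST_NAME) "
        "VALUES ('99', NULL, 'Pat')"
    )
    rejected = False
    message = ""
except sqlite3.IntegrityError as exc:
    rejected = True
    message = str(exc)

# The rejected row was never stored.
row_count = conn.execute("SELECT COUNT(*) FROM REP").fetchone()[0]
print(rejected, message, row_count)
conn.close()
```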

10
Adding Rows to a Table
• INSERT command
– INSERT INTO followed by the table name
– VALUES keyword followed by the specific values in
parentheses, in the order of the table's columns
– Values for character columns go in single quotation
marks; a quote inside a value is doubled (e.g., 'Al''s')

11
Inserting a Row that Contains Nulls
 Use a special format of INSERT command to
enter a null value in a table
 Identify the names of the columns that accept
non-null values and then list only the non-null
values after the VALUES command
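The column-list form of INSERT can be tried with Python's sqlite3 module. In this sketch the REP table is the one defined earlier, while the sample rep ('85', 'Webb', 'Tina') is a hypothetical row; every column left out of the list ends up null, which Python reports as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE REP
    (REP_NUM CHAR(2) PRIMARY KEY,
     LAST_NAME CHAR(15) NOT NULL,
     FIRST_NAME CHAR(15) NOT NULL,
     STREET CHAR(15),
     CITY CHAR(15),
     STATE CHAR(2),
     ZIP CHAR(5),
     COMMISSION DECIMAL(7,2),
     RATE DECIMAL(3,2))
""")

# Name only the columns that receive values; the rest default to NULL.
# The rep shown here is a made-up example row.
conn.execute(
    "INSERT INTO REP (REP_NUM, LAST_NAME, FIRST_NAME) "
    "VALUES ('85', 'Webb', 'Tina')"
)

row = conn.execute(
    "SELECT REP_NUM, LAST_NAME, FIRST_NAME, STREET, COMMISSION FROM REP"
).fetchone()
print(row)
conn.close()
```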

12
Inserting a Row that Contains Nulls
(cont'd)

13
Viewing Table Data
• Use SELECT command
– Can display all the rows and columns in a table
• SELECT * FROM followed by the name of the
table
• Ends with a semicolon

14
Viewing Table Data (continued)

15
Correcting Errors in a Table

• UPDATE command is used to update a value in
a table
• DELETE command allows you to delete a record
• INSERT command allows you to add a record
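The three correction commands can be sketched with Python's sqlite3 module. The reduced CUSTOMER table below (three columns only), the misspelled city, and the stray test row '999' are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer
    (customer_num CHAR(3) PRIMARY KEY,
     customer_name CHAR(35),
     city CHAR(15))
""")

# INSERT adds records; the first city is deliberately misspelled and
# the last row is a stray test record (both invented for this sketch).
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("148", "Al's Appliance and Sport", "Filmore"),
        ("282", "Brookings Direct", "Grove"),
        ("999", "Test Customer", "Nowhere"),
    ],
)

# UPDATE corrects a value in an existing record.
conn.execute("UPDATE customer SET city = 'Fillmore' WHERE customer_num = '148'")

# DELETE removes a record that should not be there.
conn.execute("DELETE FROM customer WHERE customer_num = '999'")

rows = conn.execute(
    "SELECT customer_num, city FROM customer ORDER BY customer_num"
).fetchall()
print(rows)
conn.close()
```

Without a WHERE clause, UPDATE and DELETE would affect every row in the table, so the condition is worth double-checking before running either command.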

16
Correcting Errors in a Table (cont'd)

17
Correcting Errors in a Table
(cont'd)

18
Creating the Remaining
Database Tables: Figure 3-26
%%sql
INSERT INTO CUSTOMER
VALUES
('148','Al''s Appliance and Sport','2837 Greenway','Fillmore','FL','33336',6550.00,7500.00,'20');
INSERT INTO CUSTOMER
VALUES
('282','Brookings Direct','3827 Devon','Grove','FL','33321',431.50,10000.00,'35');
INSERT INTO CUSTOMER
VALUES
('356','Ferguson''s','382 Wildwood','Northfield','FL','33146',5785.00,7500.00,'65');
INSERT INTO CUSTOMER
VALUES
('408','The Everything Shop','1828 Raven','Crystal','FL','33503',5285.25,5000.00,'35');
INSERT INTO CUSTOMER
VALUES
('462','Bargains Galore','3829 Central','Grove','FL','33321',3412.00,10000.00,'65');
INSERT INTO CUSTOMER
VALUES
('524','Kline''s','838 Ridgeland','Fillmore','FL','33336',12762.00,15000.00,'20');
INSERT INTO CUSTOMER
VALUES
('608','Johnson''s Department Store','372 Oxford','Sheldon','FL','33553',2106.00,10000.00,'65');
INSERT INTO CUSTOMER
VALUES
('687','Lee''s Sport and Appliance','282 Evergreen','Altonville','FL','32543',2851.00,5000.00,'35');
INSERT INTO CUSTOMER
VALUES
('725','Deerfield''s Four Seasons','282 Columbia','Sheldon','FL','33553',248.00,7500.00,'35');
INSERT INTO CUSTOMER
VALUES
('842','All Season','28 Lakeview','Grove','FL','33321',8221.00,7500.00,'20');

19
Entities, Attributes, and Relationships
(cont'd)
 Relation is a two-dimensional table
– Entries in the table are single-valued
– Each column has a distinct name
– All values in a column are values of the same attribute
– The order of the columns is immaterial
– Each row is distinct
– The order of the rows is immaterial

23
Functional Dependence
• An attribute, B, is functionally dependent on
another attribute (or collection), A, if a value for A
determines a single value for B at any one time
• B is functionally dependent on A
• A → B
• A functionally determines B
• Cannot determine from sample data; must know
the users’ policies
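Sample data can refute a functional dependence but never prove one, which is why the users' policies are needed. The sketch below illustrates this with a hypothetical helper, consistent_with_fd, applied to four CUSTOMER rows from this deck; the dictionary keys zip, city, and rep_num are assumed column names.

```python
def consistent_with_fd(rows, a, b):
    """Return False if some value of a appears with two different values of b."""
    seen = {}
    for row in rows:
        key = row[a]
        if key in seen and seen[key] != row[b]:
            return False  # one A value, two B values: A -> B is refuted
        seen.setdefault(key, row[b])
    return True  # no counterexample found, which is NOT a proof


# City, ZIP, and rep number of customers 148, 282, 524, and 842.
sample = [
    {"city": "Fillmore", "zip": "33336", "rep_num": "20"},
    {"city": "Grove", "zip": "33321", "rep_num": "35"},
    {"city": "Fillmore", "zip": "33336", "rep_num": "20"},
    {"city": "Grove", "zip": "33321", "rep_num": "20"},
]

zip_to_city = consistent_with_fd(sample, "zip", "city")
rep_to_city = consistent_with_fd(sample, "rep_num", "city")
print(zip_to_city, rep_to_city)
```

A result of True only means the sample contains no counterexample; whether ZIP really determines CITY is still a question about policy, not about these four rows.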

24
Functional Dependence (cont'd)

25
Primary Keys
 Unique identifier for a table
 Column (attribute) A (or a collection of columns)
is the Primary Key (PK) for a table (relation), R,
if:
1) All columns in R are functionally dependent on A
2) No subcollection of the columns in A (assuming that
A is a collection of columns and not just a single
column) also has Property 1

26
Candidate Keys
 Like a primary key, a candidate key is a column
(or collection of columns) on which all columns in
the table are functionally dependent—the
definition for primary key really defines candidate
key as well.
 From all the candidate keys, you would choose
one to be the primary key.
 Example:
Student table with Student# and UTORid.

27
Design Method
1. Read the requirements, identify the entities (objects)
involved, and name the entities
2. Identify the unique identifiers for the entities identified
in step 1
3. Identify the attributes for all the entities
4. Identify the functional dependencies that exist among
the attributes
5. Use the functional dependencies to identify the tables
by placing each attribute with the attribute or minimum
combination of attributes on which it is functionally
dependent
6. Identify any relationships between tables.
28
Normalization
 Identify the existence of potential problems
 Provides a method for correcting problems
 Goal
– Convert unnormalized relations (tables that contain
repeating groups) into various types of normal forms

29
Normalization (cont'd)
 1 NF
– Better than unnormalized
 2 NF
– Better than 1 NF
 3 NF
– Better than 2 NF

30
First Normal Form
 A relation is in first normal form (1NF) if it does
not contain any repeating groups
 To convert an unnormalized relation to 1NF,
expand the PK to include the PK of the repeating
group
– This effectively eliminates the repeating group from
the relation

31
First Normal Form (cont'd)

32
First Normal Form (cont'd)

33
Second Normal Form
 Redundancy causes problems
 Update Anomalies
– Update
– Inconsistent data
– Additions
– Deletions

34
Second Normal Form (cont'd)
 A relation is in second normal form (2NF) if it is
in 1NF and no nonkey attribute is dependent
on only a portion of the primary key
or …
 All nonkey attributes are functionally
dependent on the entire primary key

35
Second Normal Form (cont'd)

Table is in First Normal Form but not in Second Normal Form


A 1NF relation with a primary key that is a single field is in 2NF
automatically
36
Second Normal Form (cont'd)

37
Third Normal Form
 Update anomalies still possible
 Determinant
– An attribute (or collection) that functionally determines
another attribute

38
Third Normal Form (cont'd)

Table is in Second Normal Form but not in Third Normal Form


39
Third Normal Form (cont'd)
 A relation is in third normal form (3NF) if it is in
2NF and the only determinants it contains are
candidate keys
 Boyce-Codd normal form (BCNF) is the true
name for this version of 3NF

40
Third Normal Form (cont'd)

41
Diagrams for Database Design
 Graphical illustration
 Entity-relationship (E-R) diagram
– Rectangles represent entities
– Arrows represent relationships

42
Diagrams for Database Design (cont'd)

43
Diagrams for Database Design (cont'd)

44
Diagrams for Database Design (cont'd)

45
Constructing Simple Queries (cont'd)

 SELECT-FROM-WHERE statement
– SELECT columns to include in result
– FROM table containing columns
– WHERE any conditions to apply to the data

WHERE clause is optional
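A SELECT-FROM-WHERE statement can be run from Python with the sqlite3 module. This sketch uses four CUSTOMER rows from the INSERT commands earlier in the deck; the reduced table layout and the column names balance and credit_limit are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer
    (customer_num CHAR(3) PRIMARY KEY,
     customer_name CHAR(35) NOT NULL,
     balance DECIMAL(8,2),
     credit_limit DECIMAL(8,2))
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("148", "Al's Appliance and Sport", 6550.00, 7500.00),
        ("282", "Brookings Direct", 431.50, 10000.00),
        ("356", "Ferguson's", 5785.00, 7500.00),
        ("408", "The Everything Shop", 5285.25, 5000.00),
    ],
)

# SELECT names the columns, FROM names the table, WHERE filters the rows.
rows = conn.execute(
    "SELECT customer_num, customer_name "
    "FROM customer "
    "WHERE credit_limit = 7500"
).fetchall()

# Dropping the WHERE clause returns every row.
all_rows = conn.execute("SELECT customer_num FROM customer").fetchall()
print(rows, len(all_rows))
conn.close()
```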

46
Retrieving Certain Columns and
Rows (cont'd)

47
Retrieving All Columns and Rows
(cont'd)

48
Using a WHERE Clause (cont'd)

49
Using a WHERE Clause (cont'd)

50
Using a WHERE Clause (cont'd)

 Simple conditions can compare columns

51
Using Compound Conditions
 Compound conditions
– Connect two or more simple conditions
with AND, OR, and NOT operators
 AND operator: all simple conditions are true
 OR operator: any simple condition is true
 NOT operator: reverses the truth of the original
condition
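The three operators can be compared side by side with Python's sqlite3 module. In this sketch the rows come from the CUSTOMER data in this deck, while the reduced table layout and the helper customer_nums are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, city CHAR(15), balance DECIMAL(8,2))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("148", "Fillmore", 6550.00),
        ("282", "Grove", 431.50),
        ("408", "Crystal", 5285.25),
        ("462", "Grove", 3412.00),
    ],
)


def customer_nums(condition):
    """Return the sorted customer numbers that satisfy a WHERE condition."""
    sql = "SELECT customer_num FROM customer WHERE " + condition
    return sorted(row[0] for row in conn.execute(sql))


# AND: both simple conditions must be true.
and_rows = customer_nums("city = 'Grove' AND balance > 1000")
# OR: at least one simple condition must be true.
or_rows = customer_nums("city = 'Grove' OR balance > 6000")
# NOT: reverses the truth of the condition that follows it.
not_rows = customer_nums("NOT city = 'Grove'")

print(and_rows, or_rows, not_rows)
conn.close()
```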

52
Using Compound Conditions
(cont'd)

54
Using Compound Conditions
(cont'd)

55
Using Compound Conditions
(cont'd)

56
Using the BETWEEN Operator
• Use instead of two comparisons joined with the AND operator
• Use when searching a range of values
• Makes SELECT commands simpler to construct
• Inclusive
– With BETWEEN 2000 AND 5000, values of exactly
2000 or 5000 satisfy the condition
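That BETWEEN includes both endpoints can be checked with Python's sqlite3 module. This sketch uses credit limits from the CUSTOMER data in this deck with a range of 5000 to 7500 (chosen so that rows sit exactly on both endpoints); the column name credit_limit is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, credit_limit DECIMAL(8,2))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("148", 7500.00), ("282", 10000.00), ("356", 7500.00), ("408", 5000.00)],
)

# BETWEEN is inclusive: limits of exactly 5000 and 7500 both qualify.
between_rows = sorted(
    row[0]
    for row in conn.execute(
        "SELECT customer_num FROM customer "
        "WHERE credit_limit BETWEEN 5000 AND 7500"
    )
)

# The same range written with two comparisons and AND.
and_rows = sorted(
    row[0]
    for row in conn.execute(
        "SELECT customer_num FROM customer "
        "WHERE credit_limit >= 5000 AND credit_limit <= 7500"
    )
)

print(between_rows, and_rows)
conn.close()
```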

57
Using the BETWEEN Operator
(cont'd)

60
Using Computed Columns
 Computed column
– Does not exist in the database but is computed
using data in existing columns
 Arithmetic operators
– + for addition
– - for subtraction
– * for multiplication
– / for division

61
Using Computed Columns
(cont'd)

62
Using Computed Columns (cont'd)
 Use AS clause to assign a name
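A computed column with an AS name can be sketched with Python's sqlite3 module. The name available_credit and the column names balance and credit_limit are assumptions; the numbers come from the CUSTOMER rows in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, "
    "balance DECIMAL(8,2), credit_limit DECIMAL(8,2))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("148", 6550.00, 7500.00),
        ("282", 431.50, 10000.00),
        ("408", 5285.25, 5000.00),
    ],
)

# credit_limit - balance is not stored anywhere; it is computed per row,
# and AS gives the result column a readable name.
cursor = conn.execute(
    "SELECT customer_num, credit_limit - balance AS available_credit "
    "FROM customer"
)
column_names = [description[0] for description in cursor.description]
available = dict(cursor.fetchall())

print(column_names)
print(available)
conn.close()
```

Customer 408 comes out negative because its balance exceeds its credit limit.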

63
Using the LIKE Operator
• Used for pattern matching
• Percent sign (%) represents any sequence of zero or
more characters
– LIKE '%Central%' retrieves values that contain
"Central", such as "3829 Central" or "Centralia"
• Underscore (_) represents any single character
– 'T_M' matches TIM or TOM or T3M
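Both wildcards can be tried with Python's sqlite3 module. The street values below come from the CUSTOMER rows in this deck; the reduced table layout and the helper matching are assumptions for this sketch. Note that SQLite's LIKE ignores case for ASCII letters by default, which other systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_num CHAR(3) PRIMARY KEY, street CHAR(15))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [
        ("148", "2837 Greenway"),
        ("408", "1828 Raven"),
        ("462", "3829 Central"),
        ("842", "28 Lakeview"),
    ],
)


def matching(condition):
    """Return the sorted customer numbers that satisfy a WHERE condition."""
    sql = "SELECT customer_num FROM customer WHERE " + condition
    return sorted(row[0] for row in conn.execute(sql))


# % stands for any run of characters, including none at all.
contains_central = matching("street LIKE '%Central%'")
starts_with_28 = matching("street LIKE '28%'")
# _ stands for exactly one character.
eight_any_two = matching("customer_num LIKE '8_2'")

print(contains_central, starts_with_28, eight_any_two)
conn.close()
```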

64
Using the LIKE Operator (cont'd)

65
Using the IN Operator
 Concise phrasing of OR conditions

66
Sorting
 By default, no defined order in which results are
displayed
 Use ORDER BY clause to list data in a specific
order
 Ascending is default sort order

67
Additional Sorting Options
 Possible to sort data by more than one key
 Major sort key and minor sort key
 List sort keys in order of importance in the
ORDER BY clause
 For descending order sort, use DESC
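A major and a minor sort key can be combined as in the following sketch, which uses Python's sqlite3 module. The rows come from the CUSTOMER data in this deck; the column name credit_limit is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, customer_name CHAR(35), "
    "credit_limit DECIMAL(8,2))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("148", "Al's Appliance and Sport", 7500.00),
        ("282", "Brookings Direct", 10000.00),
        ("408", "The Everything Shop", 5000.00),
        ("462", "Bargains Galore", 10000.00),
        ("842", "All Season", 7500.00),
    ],
)

# Major key: credit_limit, highest first (DESC).
# Minor key: customer_name, ascending by default, used to order
# customers that share the same credit limit.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT customer_num FROM customer "
        "ORDER BY credit_limit DESC, customer_name"
    )
]
print(ordered)
conn.close()
```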

68
Additional Sorting Options
(cont'd)

69
Using Functions
 Aggregate functions
– Apply to groups of rows

70
Using the COUNT Function
 Counts the number of rows in a table
 Can use asterisk (*) to represent any column

71
Using the SUM Function
 Used to calculate totals of columns
 Column must be specified and must be numeric
 Null values are ignored

72
Using the AVG, MAX, and MIN
Functions
• AVG requires a numeric column; MAX and MIN also
work on character and date columns
• Ignore nulls
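The aggregate functions and their treatment of nulls can be checked with Python's sqlite3 module. In this sketch three balances come from the CUSTOMER data in this deck; the fourth row ('999', with a null balance) is invented purely to show that nulls are skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, balance DECIMAL(8,2))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [
        ("148", 6550.00),
        ("282", 431.50),
        ("408", 5285.25),
        ("999", None),  # hypothetical customer with an unknown balance
    ],
)

count_all, count_balance, total, average, highest, lowest = conn.execute(
    "SELECT COUNT(*), COUNT(balance), SUM(balance), "
    "AVG(balance), MAX(balance), MIN(balance) "
    "FROM customer"
).fetchone()

# COUNT(*) counts rows; COUNT(balance) counts only non-null balances.
print(count_all, count_balance)
# SUM, AVG, MAX, and MIN are computed over the three non-null balances.
print(total, average, highest, lowest)
conn.close()
```

Because nulls are ignored, AVG divides the total by 3 here rather than by 4.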

73
Using the DISTINCT Operator
(cont'd)

74
Nesting Queries
• Some queries require two or more steps to
produce their results
• Subquery: an inner query placed inside another
query
• Outer query uses subquery results
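A typical two-step question is "which customers have a balance above the average balance?". The sketch below answers it with a subquery, using Python's sqlite3 module and four balances from the CUSTOMER data in this deck; the column name balance is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, balance DECIMAL(8,2))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("148", 6550.00), ("282", 431.50), ("408", 5285.25), ("462", 3412.00)],
)

# Step 1 (inner query): compute the average balance.
average = conn.execute("SELECT AVG(balance) FROM customer").fetchone()[0]

# Steps 1 and 2 combined: the outer query compares each balance with
# the single value produced by the subquery in parentheses.
above_average = sorted(
    row[0]
    for row in conn.execute(
        "SELECT customer_num FROM customer "
        "WHERE balance > (SELECT AVG(balance) FROM customer)"
    )
)
print(average, above_average)
conn.close()
```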

75
Nesting Queries (cont'd)

76
Nesting Queries (cont'd)

77
Using the GROUP BY Clause

• Group data on a particular column
• Calculate statistics
• Grouping: creates groups of rows that share
common characteristics
• Calculations in the SELECT command are
performed for the entire group

78
Using the GROUP BY Clause
(cont'd)

79
Using a HAVING Clause
 Used to restrict groups that will be included

80
Having vs. Where
 WHERE: limit rows
 HAVING: limit groups
 Can use together if condition involves both rows
and groups
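The difference between limiting rows and limiting groups shows up when both clauses appear in one query. This sketch uses Python's sqlite3 module and the credit limits and rep numbers of the ten CUSTOMER rows in this deck; the column names credit_limit and rep_num are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, credit_limit DECIMAL(8,2), "
    "rep_num CHAR(2))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("148", 7500.00, "20"), ("282", 10000.00, "35"),
        ("356", 7500.00, "65"), ("408", 5000.00, "35"),
        ("462", 10000.00, "65"), ("524", 15000.00, "20"),
        ("608", 10000.00, "65"), ("687", 5000.00, "35"),
        ("725", 7500.00, "35"), ("842", 7500.00, "20"),
    ],
)

# GROUP BY alone: one result row per rep, with the number of customers.
per_rep = dict(
    conn.execute("SELECT rep_num, COUNT(*) FROM customer GROUP BY rep_num")
)

# HAVING limits groups: keep only reps with more than three customers.
big_groups = dict(
    conn.execute(
        "SELECT rep_num, COUNT(*) FROM customer "
        "GROUP BY rep_num HAVING COUNT(*) > 3"
    )
)

# WHERE limits rows first (credit limit under 10000); HAVING then
# limits the groups formed from the remaining rows.
combined = dict(
    conn.execute(
        "SELECT rep_num, COUNT(*) FROM customer "
        "WHERE credit_limit < 10000 "
        "GROUP BY rep_num HAVING COUNT(*) >= 2"
    )
)
print(per_rep, big_groups, combined)
conn.close()
```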

81
Having vs. Where (cont'd)

82
Nulls
• A condition on a column that can be null must test for
null explicitly; a comparison such as = NULL never matches
– IS NULL
– IS NOT NULL
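The need for IS NULL becomes clear when it is compared with an ordinary equality test. The following sketch uses Python's sqlite3 module; the two reps and their values are hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rep "
    "(rep_num CHAR(2) PRIMARY KEY, last_name CHAR(15) NOT NULL, "
    "street CHAR(15))"
)
# Made-up rows: the second rep has no street on file.
conn.executemany(
    "INSERT INTO rep VALUES (?, ?, ?)",
    [("20", "Kaiser", "624 Randall"), ("85", "Webb", None)],
)


def rep_nums(condition):
    """Return the sorted rep numbers that satisfy a WHERE condition."""
    sql = "SELECT rep_num FROM rep WHERE " + condition
    return sorted(row[0] for row in conn.execute(sql))


# "= NULL" is never true, so it matches nothing, not even rep 85.
equals_null = rep_nums("street = NULL")
# IS NULL and IS NOT NULL are the tests that actually work.
is_null = rep_nums("street IS NULL")
is_not_null = rep_nums("street IS NOT NULL")

print(equals_null, is_null, is_not_null)
conn.close()
```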

83
Joining Two Tables
 SELECT clause
– List all columns to display

 FROM clause
– List all tables involved in query

 WHERE clause
– Restrict to rows that have common values in
matching columns

84
Joining Two Tables (cont'd)

85
Joining Two Tables (cont'd)

86
Joining Two Tables (cont'd)

87
Comparing JOINS, IN, and EXISTS
 We can join tables using:
– WHERE clause

– IN operator with a subquery

– EXISTS operator with a subquery
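The three approaches can be compared on one question: which customers have placed at least one order? This sketch uses Python's sqlite3 module; the customer rows come from this deck, while the reduced table layouts and all order rows are assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, customer_name CHAR(35))"
)
conn.execute(
    "CREATE TABLE orders "
    "(order_num CHAR(5) PRIMARY KEY, order_date DATE, customer_num CHAR(3))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [
        ("148", "Al's Appliance and Sport"),
        ("282", "Brookings Direct"),
        ("356", "Ferguson's"),
    ],
)
# Hypothetical orders: customer 148 has two, 356 has one, 282 has none.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("21608", "2010-10-20", "148"),
        ("21610", "2010-10-20", "356"),
        ("21619", "2010-10-23", "148"),
    ],
)


def nums(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# 1) Join in the WHERE clause; DISTINCT removes the duplicate for 148.
join_rows = nums(
    "SELECT DISTINCT customer.customer_num FROM customer, orders "
    "WHERE customer.customer_num = orders.customer_num"
)
# 2) IN with a subquery.
in_rows = nums(
    "SELECT customer_num FROM customer "
    "WHERE customer_num IN (SELECT customer_num FROM orders)"
)
# 3) EXISTS with a correlated subquery.
exists_rows = nums(
    "SELECT customer_num FROM customer "
    "WHERE EXISTS (SELECT * FROM orders "
    "WHERE orders.customer_num = customer.customer_num)"
)
print(join_rows, in_rows, exists_rows)
conn.close()
```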

88
Comparing JOINS, IN, and EXISTS
(cont'd)
 WHERE clause

89
Using the IN Operator

90
Using the EXISTS Operator

91
Using the EXISTS Operator (cont'd)
• Correlated subquery
– Subquery involves a table listed in the outer query
• In Figure 5-7, the ORDERS table, listed in the FROM
clause of the outer query, is used in the subquery
• Must qualify the ORDER_NUM column in the subquery
as ORDERS.ORDER_NUM

92
Using a Subquery within a
Subquery (cont'd)

93
A Comprehensive Example

94
Using an Alias
 An alternate name for a table
 Use in FROM clause
 Type name of table, press Spacebar, and then
type name of alias
 Allows for simplicity
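Aliases are more than a convenience when the same table appears twice in one FROM clause, because the two copies need different names. The sketch below uses Python's sqlite3 module and city values from the CUSTOMER data in this deck; the aliases f and s and the pairing question are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_num CHAR(3) PRIMARY KEY, city CHAR(15))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [
        ("148", "Fillmore"),
        ("282", "Grove"),
        ("462", "Grove"),
        ("524", "Fillmore"),
    ],
)

# "customer f" and "customer s" give the same table two aliases.
# The < comparison keeps each pair once and stops a customer from
# being paired with itself.
pairs = conn.execute(
    "SELECT f.customer_num, s.customer_num, f.city "
    "FROM customer f, customer s "
    "WHERE f.city = s.city "
    "AND f.customer_num < s.customer_num "
    "ORDER BY f.customer_num"
).fetchall()
print(pairs)
conn.close()
```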

95
Using an Alias (cont'd)

96
Joining a Table to Itself (cont'd)

97
Joining Several Tables
 Condition relates columns for each pair of tables

98
Joining Several Tables (cont'd)
 In SELECT clause, list all columns to display
 Qualify any column names if needed
 In FROM clause, list all tables
 Include tables used in the WHERE clause, even
if they are not in the SELECT clause

99
Set Operations
• Union
– The union of two tables is a table containing
every row that is in either the first table, the
second table, or both tables
– Use UNION operator
– Tables must be union compatible; that is, the
same number of columns and corresponding
columns have identical data types and lengths

100
Set Operations (cont'd)

101
Set Operations (cont'd)
 Intersection
– Intersection of two tables is a table containing all
rows that are in both tables

– Uses the INTERSECT operator

– Not supported by Microsoft Access


» Use an alternate approach

102
Set Operations (cont'd)

103
Set Operations (cont'd)
• Difference
– Difference of two tables is a table containing all
rows that are in the first table but not in the second
table
– Oracle uses the MINUS operator
– SQLite and SQL Server use the standard EXCEPT
operator instead of MINUS
– Not supported by Microsoft Access
» Use an alternate approach
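All three set operations can be tried in SQLite through Python's sqlite3 module, with EXCEPT standing in for MINUS. In this sketch the two sets are "customers of rep 35" and "customers with a 7500 credit limit", built from CUSTOMER rows in this deck; the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, credit_limit DECIMAL(8,2), "
    "rep_num CHAR(2))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("148", 7500.00, "20"),
        ("282", 10000.00, "35"),
        ("408", 5000.00, "35"),
        ("725", 7500.00, "35"),
    ],
)

# Two union-compatible queries: one column each, of the same type.
first = "SELECT customer_num FROM customer WHERE rep_num = '35'"
second = "SELECT customer_num FROM customer WHERE credit_limit = 7500"


def combine(operator):
    """Combine the two queries with a set operator and sort the result."""
    sql = f"{first} {operator} {second}"
    return sorted(row[0] for row in conn.execute(sql))


union_rows = combine("UNION")          # in either query, or both
intersect_rows = combine("INTERSECT")  # in both queries
except_rows = combine("EXCEPT")        # in the first but not the second

print(union_rows, intersect_rows, except_rows)
conn.close()
```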

104
Set Operations (cont'd)

105
Special Operations
• Self-join

• Inner join

• Outer join

• Product

https://theartofpostgresql.com/blog/2019-09-sql-joins/
from
https://www.tutorialgateway.org/sql-joins/

106
Inner Join
 Compares the tables in FROM clause and lists
only those rows that satisfy condition in WHERE
clause
 INNER JOIN command
– Update to SQL standard 1992
(ANSI SQL-92 explicit join syntax)

107
Inner Join (cont'd)

108
Outer Joins
 Left (outer) join: all rows from the table on the left
(listed first in the query) will be included; matching
rows only from the table on the right will be
included
 Right (outer) join: all rows from the table on the
right will be included; matching rows only from the
table on the left will be included
 Full outer join: all rows from both tables will be
included regardless of matches

109
Left (Outer) Join
• SQLite has long implemented INNER and LEFT (OUTER)
JOINs
• RIGHT and FULL OUTER JOINs are supported only from
SQLite 3.39.0 (2022) onward; older versions reject them
• Example:
%%sql
SELECT customer.customer_num, customer_name,
order_num, order_date
FROM customer
LEFT JOIN orders
ON customer.customer_num = orders.customer_num
ORDER BY customer.customer_num;
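The same LEFT JOIN can be run outside the notebook with Python's sqlite3 module to see what happens to a customer without orders. In this sketch the reduced table layouts and the single order row are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer "
    "(customer_num CHAR(3) PRIMARY KEY, customer_name CHAR(35))"
)
conn.execute(
    "CREATE TABLE orders "
    "(order_num CHAR(5) PRIMARY KEY, order_date DATE, customer_num CHAR(3))"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("148", "Al's Appliance and Sport"), ("282", "Brookings Direct")],
)
# Hypothetical order: only customer 148 has one.
conn.execute("INSERT INTO orders VALUES ('21608', '2010-10-20', '148')")

rows = conn.execute(
    "SELECT customer.customer_num, customer_name, order_num, order_date "
    "FROM customer "
    "LEFT JOIN orders "
    "ON customer.customer_num = orders.customer_num "
    "ORDER BY customer.customer_num"
).fetchall()

# Customer 282 has no matching order, so its order columns are NULL,
# which Python shows as None.
for row in rows:
    print(row)
conn.close()
```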
110
Product
 The product (Cartesian product) of two tables is
the combination of all rows in the first table and
all rows in the second table
 Omit the WHERE clause to form a product, i.e.,
in the “old” notation:
SELECT ...
FROM table1 t1, table2 t2
 equivalent to the “modern” notation:
SELECT ...
FROM table1 t1 CROSS JOIN table2 t2
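That the two notations produce the same product can be checked with Python's sqlite3 module. In this sketch table1 and table2 follow the placeholder names used above; their single columns a and b and their contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INT)")
conn.execute("CREATE TABLE table2 (b CHAR(1))")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [("x",), ("y",)])

# "Old" notation: list both tables and omit the WHERE clause.
old_style = conn.execute(
    "SELECT t1.a, t2.b FROM table1 t1, table2 t2"
).fetchall()

# "Modern" notation: say CROSS JOIN explicitly.
modern_style = conn.execute(
    "SELECT t1.a, t2.b FROM table1 t1 CROSS JOIN table2 t2"
).fetchall()

# 3 rows times 2 rows gives 6 combinations either way.
print(len(old_style), len(modern_style))
conn.close()
```

The row count of a product is the product of the two row counts, so an accidentally omitted join condition on large tables can produce a very large result.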
https://stackoverflow.com/questions/44437397/why-do-cross-join-conditions-not-work-in-the-on-clause-only-the-where-clause
111
Using a Self-Join on a Primary
Key (cont'd)

112
SQL JOINs

https://stackoverflow.com/questions/30358982/sql-server-replaces-left-join-for-left-outer-join-in-view-query
113
