03 Relational Databases and SQL
Class objectives
1. Ask a question
2. Acquire data
3. Transform data
4. Analyze data
5. Present findings
Customers (primary key: CustID)
CustID  FirstName  LastName  Email
1       John       Smith     [email protected]
2       Steven     Goldfish  [email protected]
3       Paula      Brown     [email protected]
4       James      Smith     [email protected]
5       Mary       Stevens   [email protected]
Orders (primary key: OrderID; foreign key: CustID)
OrderID  CustID  Date       Amount
1        2       2/15/2015  100.22
2        1       3/18/2015  99.95
3        3       5/11/2015  122.95
4        3       7/28/2015  100.00
5        4       8/18/2015  555.55
6        7       7/4/2015   1000.00
Each row is a record; each column is a field.
Relationship
[Figure: a relationship links the primary key field of the parent table to the foreign key of the child table.]
PwC | Relational Databases and SQL
Relationship types
• One-to-One
- Each instance of one entity relates to only one instance of the second
entity
- Not frequently used in database systems
- May be used to split a table with many fields in order to isolate part of
the table
• One-to-Many
- Each instance of one entity relates to one or more instances of the
second entity
- The most common type of relationship; it relates one record from the
‘primary’ table to many records in the ‘related’ table.
- In a one-to-many relationship, a record (‘parent’) in Table A can have
many matching records (‘children’) in Table B, but a record (‘child’) in
Table B has only one matching record (‘parent’) in Table A.
• Many-to-Many
- Multiple instances of one entity relate to multiple instances of the second
entity
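A one-to-many relationship can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the course material: the column types are assumptions, and it relies on turning on SQLite's foreign key enforcement, which is off by default. Each customer (parent) may have many orders (children), and an order that points at a customer who does not exist is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    CustID    INTEGER PRIMARY KEY,   -- primary key of the parent table
    FirstName TEXT,
    LastName  TEXT
);
CREATE TABLE orders (
    OrderID INTEGER PRIMARY KEY,
    CustID  INTEGER REFERENCES customers (CustID),  -- foreign key in the child table
    Date    TEXT,
    Amount  REAL
);
INSERT INTO customers VALUES (1, 'John', 'Smith'), (3, 'Paula', 'Brown');
INSERT INTO orders VALUES
    (2, 1, '3/18/2015', 99.95),
    (3, 3, '5/11/2015', 122.95),
    (4, 3, '7/28/2015', 100.00);
""")

# One parent row can match many child rows.
orders_per_customer = conn.execute(
    "SELECT CustID, COUNT(*) FROM orders GROUP BY CustID ORDER BY CustID"
).fetchall()

# A child row whose CustID has no parent violates the relationship.
try:
    conn.execute("INSERT INTO orders VALUES (6, 7, '7/4/2015', 1000.00)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orders_per_customer, orphan_rejected)
```

Without the foreign key constraint, an order for a CustID with no matching customer row would be accepted silently.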
Entity relationship diagram
• Data lineage refers to how the information flows from source systems to
destination systems, such as analytical environments
[Figure: Source Systems 1, 2 and 3 each feed a Data Warehouse through a transform step; the Data Warehouse feeds a Destination System (e.g. a Transactions Monitoring system) through a further transform step.]
Key questions:
• Are all needed records from the source system transferred to the
destination system?
• Are Critical Data Elements required in the destination system
transferred?
• Is data transferred from source to destination at the right frequency?
• While transforming the data, are required business rules applied to the
attributes?
Enterprise Resource Planning
General Ledger
• Users can also book manual journal entries directly to the general ledger
• Because these entries are not entered through the normal sub-ledger
business processes, these types of entries typically present higher risk
Customer Billing and Receivables
• SAP
• Oracle EBS
• Oracle JD Edwards
• Oracle PeopleSoft
• Other ERPs include:
- QuickBooks
- Microsoft Dynamics
- NetSuite
- Workday
www.sqlite.org:
• “SQLite is a self-contained, high-reliability,
embedded, full-featured, public-domain, SQL
database engine”
• “SQLite is the most used database engine in the
world”
sqlitebrowser.org:
• “DB Browser for SQLite [DB4S] is a high quality,
visual, open source tool to create, design, and edit
database files compatible with SQLite”
- Create, define, modify and delete tables
- Browse, edit, add and delete records
- Search records
- Import and export tables from/to CSV files
- Issue SQL queries and inspect the results
SQLite/DB4S basics
[Screenshot: DB4S window with callouts for ‘Browse tables, columns, and records’, ‘SQL statement’, ‘Execute SQL statement (also F5 or SHIFT + F5)’, and ‘Result set’]
• Chinook is a digital
media store with
several employees
• Customers can buy
individual tracks from a
variety of musical artists
• The database contains
master and
transactional data for
the business
• Download the database
and open the
Chinook_Sqlite.sqlite
file in DB4S
• A CREATE TABLE ... AS statement adds the result set from a query to
the database as a new table
• Complicated queries can be broken down by creating intermediate tables
• The ability to create a table depends on the level of access granted to a
user, so it may not be possible in some situations
CREATE TABLE new_table AS
SELECT column1, column2, column3 FROM table_name
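As a minimal sketch of the statement above, the following uses Python's built-in sqlite3 module with an in-memory database. The table name large_orders and the 100 threshold are hypothetical choices made for illustration; the rows are taken from the Orders sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (OrderID INTEGER, CustID INTEGER, Date TEXT, Amount REAL);
INSERT INTO orders VALUES
    (1, 2, '2/15/2015', 100.22),
    (2, 1, '3/18/2015', 99.95),
    (3, 3, '5/11/2015', 122.95);

-- Intermediate table (hypothetical name) holding only the larger orders
CREATE TABLE large_orders AS
SELECT OrderID, CustID, Amount FROM orders WHERE Amount >= 100;
""")

# The new table can now be queried like any other table.
large_orders = conn.execute(
    "SELECT OrderID, CustID, Amount FROM large_orders ORDER BY OrderID"
).fetchall()
print(large_orders)
```

Later queries can read from large_orders instead of repeating the filter, which is the point of breaking a complicated query into intermediate tables.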
• The COUNT function returns the number of records in the result set
• The input parameter can be * or 1
SELECT COUNT(*) FROM table_name
SELECT COUNT(1) FROM table_name
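The sketch below compares COUNT(*) and COUNT(1) using Python's built-in sqlite3 module. It also includes COUNT(Email), which is not covered in the slide, to show that counting a specific column skips NULL values. The e-mail addresses are placeholders invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustID INTEGER, LastName TEXT, Email TEXT);
INSERT INTO customers VALUES
    (1, 'Smith',    'a@example.com'),   -- placeholder address
    (2, 'Goldfish', NULL),              -- missing value
    (3, 'Brown',    'c@example.com');   -- placeholder address
""")

# COUNT(*) and COUNT(1) count every row; COUNT(Email) ignores the NULL.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(Email) FROM customers"
).fetchone()
print(counts)
```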
Wildcard Description
% A substitute for zero or more characters
_ (underscore) A substitute for a single character
[charlist] Sets and ranges of characters to match
[^charlist] Matches only a character NOT specified within the brackets
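One caveat when trying these wildcards in SQLite: its LIKE operator supports only % and _. The bracket forms [charlist] and [^charlist] belong to other dialects (for example SQL Server's LIKE); in SQLite the closest equivalent is the GLOB operator, which uses * and ? in place of % and _. The sketch below uses Python's built-in sqlite3 module; the surname 'Smyth' is a hypothetical addition to the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (LastName TEXT);
INSERT INTO customers VALUES
    ('Smith'), ('Smyth'), ('Stevens'), ('Brown'), ('Goldfish');
""")

def names_where(condition):
    """Return the last names matching a WHERE condition, sorted."""
    sql = f"SELECT LastName FROM customers WHERE {condition} ORDER BY LastName"
    return [row[0] for row in conn.execute(sql)]

starts_with_s = names_where("LastName LIKE 'S%'")     # % = zero or more characters
one_char      = names_where("LastName LIKE 'Sm_th'")  # _ = exactly one character
in_set        = names_where("LastName GLOB '[BG]*'")  # first letter is B or G
not_in_set    = names_where("LastName GLOB '[^S]*'")  # first letter is not S

print(starts_with_s, one_char, in_set, not_in_set)
```

LIKE is case-insensitive for ASCII letters by default in SQLite, whereas GLOB is case-sensitive.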
• NULL means that a value has not been entered for a particular column in a
row, i.e. the value is missing
• NULL is not numeric or any other value so it can’t equal itself or anything
else
• NOT creates the negation of an expression
• To test for a NULL value you will want to use the expressions IS NULL or
IS NOT NULL
SELECT column1, column2 FROM table_name
WHERE column2 IS NULL
SELECT column1, column2 FROM table_name
WHERE column2 IS NOT NULL
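The difference between comparing with = and testing with IS NULL can be seen in a small sketch using Python's built-in sqlite3 module. The three sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_name (column1 INTEGER, column2 TEXT);
INSERT INTO table_name VALUES (1, 'a'), (2, NULL), (3, 'c');
""")

def column1_where(condition):
    """Return the column1 values of rows that satisfy the condition."""
    sql = f"SELECT column1 FROM table_name WHERE {condition} ORDER BY column1"
    return [row[0] for row in conn.execute(sql)]

equals_null = column1_where("column2 = NULL")       # never true, even for the NULL row
is_null     = column1_where("column2 IS NULL")      # finds the missing value
is_not_null = column1_where("column2 IS NOT NULL")  # finds the populated rows

print(equals_null, is_null, is_not_null)
```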
Operator Description
+ Adds values on either side of the operator
- Subtracts right hand operand from left hand operand
* Multiplies values on either side of the operator
/ Divides left hand operand by right hand operand
% Divides left hand operand by right hand operand and returns remainder (“modulus”)
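A quick way to try the operators is to select literal values, as in this sketch using Python's built-in sqlite3 module. One behaviour worth noting in SQLite: dividing two integers yields an integer, so at least one operand must be a real number to keep the fractional part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each expression exercises one operator from the table above.
arithmetic = conn.execute(
    "SELECT 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 / 2.0, 7 % 2"
).fetchone()
print(arithmetic)
```

Here 7 / 2 gives 3 (integer division) while 7 / 2.0 gives 3.5.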
• The RTRIM and LTRIM functions remove spaces from the right side or left
side of a string
RTRIM(<string>)
LTRIM(<string>)
• Use SUBSTR to return a portion of a string starting at a given position and
for a specified number of characters
SUBSTR(<string>,<start location>,<length>)
• Use LENGTH to return the number of characters in a string (other SQL
implementations may use LEN)
LENGTH(<string>)
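The string functions above can be tried on literal values, as in this sketch using Python's built-in sqlite3 module. The sample strings are arbitrary; note that SUBSTR counts positions from 1, not 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

trimmed_right, trimmed_left, first_four, name_length = conn.execute(
    """
    SELECT RTRIM('  abc  '),           -- trailing spaces removed
           LTRIM('  abc  '),           -- leading spaces removed
           SUBSTR('Goldfish', 1, 4),   -- 4 characters starting at position 1
           LENGTH('Goldfish')          -- number of characters
    """
).fetchone()
print(repr(trimmed_right), repr(trimmed_left), first_four, name_length)
```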
• The CAST function is used for explicitly changing the data type of a
column, often from text to numeric and vice versa
• Available data types include:
- INT - Integer
- CHAR - Text
- TEXT - Text
- REAL - Numeric
- FLOAT - Numeric
SELECT column1, CAST(column1 AS INT) AS int_col FROM table_name
SELECT column2, CAST(column2 AS CHAR) AS char_col FROM
table_name
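The following sketch applies CAST to literal values using Python's built-in sqlite3 module. The values are arbitrary; the last expression shows that casting a real number to INT in SQLite truncates toward zero rather than rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

text_to_int, int_to_text, text_to_real, real_to_int = conn.execute(
    """
    SELECT CAST('42' AS INT),     -- text to integer
           CAST(42 AS CHAR),      -- integer to text
           CAST('12.5' AS REAL),  -- text to floating point
           CAST(3.7 AS INT)       -- fractional part is dropped
    """
).fetchone()
print(text_to_int, repr(int_to_text), text_to_real, real_to_int)
```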
• Use the CASE expression to evaluate a list of conditions and return the
value for the first one that evaluates to true
• CASE can be used inside aggregations to generate a statistic (e.g. sum)
only for certain records
SELECT column1, column2,
CASE <test expression>
WHEN <comparison expression1> THEN <return value1>
WHEN <comparison expression2> THEN <return value2>
ELSE <default value>
END AS new_field_name
FROM table_name
SELECT column1, SUM(
CASE <test expression>
WHEN <comparison expression1> THEN 1 ELSE 0
END
) AS new_field_name FROM table_name GROUP BY column1
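Both uses of CASE can be sketched with Python's built-in sqlite3 module and a few rows from the Orders sample. The labels and the 100 threshold are hypothetical. The first query uses the simple form shown above (CASE <test expression> WHEN ...); the second uses the searched form (CASE WHEN <condition> ...) inside SUM to count only the orders of at least 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (OrderID INTEGER, CustID INTEGER, Amount REAL);
INSERT INTO orders VALUES
    (1, 2, 100.22), (2, 1, 99.95), (3, 3, 122.95), (4, 3, 100.00);
""")

# Simple CASE: compare CustID against a list of values.
labels = conn.execute("""
    SELECT OrderID,
           CASE CustID
               WHEN 1 THEN 'John'
               WHEN 2 THEN 'Steven'
               ELSE 'other'
           END AS customer_label
    FROM orders
    ORDER BY OrderID
""").fetchall()

# CASE inside SUM: count only the rows that meet the condition.
large_order_counts = conn.execute("""
    SELECT CustID,
           SUM(CASE WHEN Amount >= 100 THEN 1 ELSE 0 END) AS large_orders
    FROM orders
    GROUP BY CustID
    ORDER BY CustID
""").fetchall()

print(labels)
print(large_order_counts)
```

Note that the alias follows the closing parenthesis of SUM; placing AS inside the parentheses is a syntax error.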
Exercise #4
A JOIN combines fields from two tables using a key column shared by the
tables
SELECT table_a.column1, table_a.column2, table_b.column3
FROM table_a JOIN table_b ON table_a.column1 = table_b.column1
INNER JOIN
Selects rows from both tables as long as there is a match between the key
columns in both tables
LEFT JOIN
Returns all rows from the left table (1), with the matching rows in the right
table (2); the result is NULL in the right side when there is no match
RIGHT JOIN
Returns all rows from the right table (2), with the matching rows in the left
table (1); the result is NULL in the left side when there is no match
FULL OUTER JOIN
Returns all rows from the left table (1) and from the right table (2); the
result is NULL in the left or right side when there is no match
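The difference between INNER JOIN and LEFT JOIN can be sketched with Python's built-in sqlite3 module and the customers and orders sample data (e-mail column omitted). RIGHT JOIN and FULL OUTER JOIN are only available in SQLite from version 3.39.0 onward, so this sketch sticks to the two join types that work everywhere; a RIGHT JOIN can be written as a LEFT JOIN with the table order swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, CustID INTEGER, Date TEXT, Amount REAL);
INSERT INTO customers VALUES
    (1, 'John', 'Smith'), (2, 'Steven', 'Goldfish'), (3, 'Paula', 'Brown'),
    (4, 'James', 'Smith'), (5, 'Mary', 'Stevens');
INSERT INTO orders VALUES
    (1, 2, '2/15/2015', 100.22), (2, 1, '3/18/2015', 99.95),
    (3, 3, '5/11/2015', 122.95), (4, 3, '7/28/2015', 100.00),
    (5, 4, '8/18/2015', 555.55), (6, 7, '7/4/2015', 1000.00);
""")

# INNER JOIN: only customer/order pairs with a matching CustID.
inner_rows = conn.execute("""
    SELECT c.CustID, o.OrderID
    FROM customers AS c
    INNER JOIN orders AS o ON c.CustID = o.CustID
    ORDER BY o.OrderID
""").fetchall()

# LEFT JOIN: every customer, with NULL (None in Python) where no order matches.
left_rows = conn.execute("""
    SELECT c.CustID, o.OrderID
    FROM customers AS c
    LEFT JOIN orders AS o ON c.CustID = o.CustID
    ORDER BY c.CustID, o.OrderID
""").fetchall()

# A common use of LEFT JOIN: find parent rows with no children.
customers_without_orders = conn.execute("""
    SELECT c.FirstName, c.LastName
    FROM customers AS c
    LEFT JOIN orders AS o ON c.CustID = o.CustID
    WHERE o.OrderID IS NULL
""").fetchall()

print(inner_rows)
print(left_rows)
print(customers_without_orders)
```

Order 6 (CustID 7) appears in neither result because no customer has that ID and orders is the right-hand table in both joins.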
customers
CustID  FirstName  LastName  Email
1       John       Smith     [email protected]
2       Steven     Goldfish  [email protected]
3       Paula      Brown     [email protected]
4       James      Smith     [email protected]
5       Mary       Stevens   [email protected]
orders
OrderID  CustID  Date       Amount
1        2       2/15/2015  100.22
2        1       3/18/2015  99.95
3        3       5/11/2015  122.95
4        3       7/28/2015  100.00
5        4       8/18/2015  555.55
6        7       7/4/2015   1000.00
SELECT
customers.CustID,
customers.FirstName,
customers.LastName,
orders.Date,
orders.Amount
FROM customers
LEFT JOIN orders ON customers.CustID = orders.CustID
Query Result:
CustID  FirstName  LastName  Date       Amount
1       John       Smith     3/18/2015  99.95
2       Steven     Goldfish  2/15/2015  100.22
3       Paula      Brown     5/11/2015  122.95
3       Paula      Brown     7/28/2015  100.00
4       James      Smith     8/18/2015  555.55
5       Mary       Stevens   NULL       NULL
SELECT
orders.OrderID,
customers.FirstName,
customers.LastName,
orders.Date,
orders.Amount
FROM customers
RIGHT JOIN orders ON customers.CustID = orders.CustID
Query Result:
OrderID  FirstName  LastName  Date       Amount
1        Steven     Goldfish  2/15/2015  100.22
2        John       Smith     3/18/2015  99.95
3        Paula      Brown     5/11/2015  122.95
4        Paula      Brown     7/28/2015  100.00
5        James      Smith     8/18/2015  555.55
6        NULL       NULL      7/4/2015   1000.00
SELECT
customers.CustID,
orders.OrderID,
customers.FirstName,
customers.LastName,
orders.Date,
orders.Amount
FROM customers
FULL OUTER JOIN orders ON customers.CustID = orders.CustID
Query Result:
CustID  OrderID  FirstName  LastName  Date       Amount
1       2        John       Smith     3/18/2015  99.95
2       1        Steven     Goldfish  2/15/2015  100.22
3       3        Paula      Brown     5/11/2015  122.95
3       4        Paula      Brown     7/28/2015  100.00
4       5        James      Smith     8/18/2015  555.55
5       NULL     Mary       Stevens   NULL       NULL
NULL    6        NULL       NULL      7/4/2015   1000.00
orders
OrderID  CustID  Date        Amount
1        2       2/15/2015   100.22
2        1       3/18/2015   99.95
3        3       5/11/2015   122.95
4        3       7/28/2015   100
5        4       8/18/2015   555.55
6        7       7/4/2015    1000
7        3       7/15/2015   458
8        1       10/18/2015  323
9        2       5/11/2015   355
10       3       3/28/2015   77
11       7       8/18/2015   113
12       2       4/4/2015    130
13       4       8/18/2015   448
14       3       1/4/2015    449
SELECT
o.CustID,
c.FirstName,
c.LastName,
SUM(o.Amount) AS TotalAmount
FROM orders AS o
INNER JOIN customers AS c ON o.CustID = c.CustID
GROUP BY o.CustID, c.FirstName, c.LastName
ORDER BY TotalAmount DESC
Query Result:
CustID  FirstName  LastName  TotalAmount
3       Paula      Brown     1206.95
4       James      Smith     1003.55
2       Steven     Goldfish  585.22
1       John       Smith     422.95
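The aggregate query can be run against the sample data with Python's built-in sqlite3 module, as sketched below. The customers rows come from the earlier customers table (e-mail column omitted). The totals are rounded in Python after the query because the amounts are stored as floating-point numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, CustID INTEGER, Date TEXT, Amount REAL);
INSERT INTO customers VALUES
    (1, 'John', 'Smith'), (2, 'Steven', 'Goldfish'), (3, 'Paula', 'Brown'),
    (4, 'James', 'Smith'), (5, 'Mary', 'Stevens');
INSERT INTO orders VALUES
    (1, 2, '2/15/2015', 100.22), (2, 1, '3/18/2015', 99.95),
    (3, 3, '5/11/2015', 122.95), (4, 3, '7/28/2015', 100),
    (5, 4, '8/18/2015', 555.55), (6, 7, '7/4/2015', 1000),
    (7, 3, '7/15/2015', 458),    (8, 1, '10/18/2015', 323),
    (9, 2, '5/11/2015', 355),    (10, 3, '3/28/2015', 77),
    (11, 7, '8/18/2015', 113),   (12, 2, '4/4/2015', 130),
    (13, 4, '8/18/2015', 448),   (14, 3, '1/4/2015', 449);
""")

# Total order amount per customer, largest first.
totals = conn.execute("""
    SELECT o.CustID, c.FirstName, c.LastName, SUM(o.Amount) AS TotalAmount
    FROM orders AS o
    INNER JOIN customers AS c ON o.CustID = c.CustID
    GROUP BY o.CustID, c.FirstName, c.LastName
    ORDER BY TotalAmount DESC
""").fetchall()

for cust_id, first_name, last_name, total in totals:
    print(cust_id, first_name, last_name, round(total, 2))
```

The two orders for CustID 7 drop out of the result because the INNER JOIN finds no matching customer, and Mary Stevens is absent because she has no orders.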
You have been hired by the Office of the Mayor in New York City to review
data from city operations for opportunities to improve services. In particular,
the Mayor has heard complaints from some citizens that film shoots in the
city are causing too much disruption. The city has provided historical data on
film permits and noise complaints.
Your assignment is to perform initial profiling and descriptive analysis of the
data to understand more about the situation. Submissions are due by
XXXXXXXXXX.
1. Import ‘NYC_Film_Permits.csv’ into Excel. Import the same file and
‘NYC_Noise_Complaints.tsv’ into SQLite.
2. Answer questions 1-30 in ‘Assignment_1_Answers.xlsx’ and add a brief
description of how you determined the answer. This file should only include
your answers, not the data. Add your student ID # to the end of the file
name.
3. Upload the Excel workbook, including the data, that you used to answer
questions 1-20. This file should be called
‘Assignment_1_Excel_<STUDENTID>.xlsx’.
4. Upload the SQL script that you used to answer questions 21-30. Your script …
Assignment #1 – Inputs and Outputs
Inputs
File Name
Assignment_1_Answers.xlsx
Assignment_1_SQL.sql
NYC_Film_Permits.csv
NYC_Noise_Complaints.tsv
Outputs
File Name
Assignment_1_Answers_ss12345.xlsx
Assignment_1_Excel_ss12345.xlsx
Assignment_1_SQL_ss12345.sql
Follow these steps to import the delimited text files into SQLite:
1. Download and unzip the files.
2. Click ‘New Database’
3. Specify a name and location for the .db file and click ‘Save’
4. Click ‘Cancel’ on the dialog box that appears
5. File > Import > Table from CSV…
6. For the .tsv file, select ‘All Files (*)’ from the drop-down (Windows) or
Options (Mac)
7. Select the file and click ‘Open’
8. Update ‘Field Separator’ to comma (.csv) or tab (.tsv)
9. Check box for ‘Column names in first line’
10. Wait for the data to be inserted into a database table
11. Check the results on the ‘Database Structure’ tab and run a simple query
against the table.
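The same import can also be scripted. The sketch below is an alternative to the DB4S steps, not part of them; it uses Python's built-in csv and sqlite3 modules. The helper name import_delimited, the sample file and its column names (PermitID, Borough) are hypothetical stand-ins for the real assignment files. Because no column types are declared, every value is loaded as text.

```python
import csv
import os
import sqlite3
import tempfile

def import_delimited(conn, path, table, delimiter):
    """Load a delimited text file into a new table.

    The first line of the file is assumed to hold the column names.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()

# Hypothetical tab-separated sample standing in for a real .tsv file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.tsv")
with open(sample_path, "w", newline="", encoding="utf-8") as handle:
    handle.write("PermitID\tBorough\n1\tBrooklyn\n2\tQueens\n")

conn = sqlite3.connect(":memory:")
import_delimited(conn, sample_path, "sample", "\t")

# Check the result with a simple query, as in the final step above.
imported_rows = conn.execute(
    'SELECT PermitID, Borough FROM "sample" ORDER BY PermitID'
).fetchall()
print(imported_rows)
```

For a .csv file the delimiter argument would be "," instead of "\t".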
Summary
http://sqlitebrowser.org/
Table Relationships
Primary key
Column 1
Column 2
Foreign key