03 Relational Databases and SQL

Uploaded by

Muhammad Saad

www.pwc.com

Relational Databases
and SQL
Class objectives

• Demonstrate knowledge of relational databases, data lineage, and ERP systems
• Demonstrate how to interpret an entity relationship diagram
• Demonstrate how to use SQL to transform and analyze data

PwC | Relational Databases and SQL 2


Review



Data analytics process

1. Ask a question
2. Acquire data
3. Transform data
4. Analyze data
5. Present findings


Relational databases



What is a relational database?

• A relational database organizes data in tables and maintains relationships
between the tables for information retrieval
• A relational database management system (RDBMS) is computer software that
enables users to create, modify, and analyze data in a relational database
• Enterprise RDBMS include:
- Microsoft SQL Server
- Oracle Database
- Oracle MySQL
- IBM DB2
- SAP Sybase
- Teradata
- PostgreSQL
• Lightweight RDBMS include:
- Microsoft Access
- SQLite

Key terms

• A table is the basic unit of data storage in a database. Data is stored in
rows and columns. You can specify rules for each column of a table, such as
establishing its data type, forcing a column to contain a value in every row,
or ensuring unique records
• A view is a tailored presentation of data selected from one or more tables,
possibly including other views. A view contains no actual data but rather
derives what it shows from the tables and views on which it is based.
Therefore, a view can be thought of as a virtual table
• A schema is a “container” within a database similar to a directory on a file
system that groups tables together
• The connection between tables is made by a Primary Key – Foreign Key
pair, where the Foreign Key in a table is the Primary Key of another table
• A query is used to create, manipulate, or view database objects
• A report is a formatted view of data from the database within user-defined
parameters, often generated in an external file format such as .txt
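
The key terms above can be sketched with Python's built-in sqlite3 module and an in-memory database. The table, column, and view names (customers, orders, big_orders) are illustrative assumptions, not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")         # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    cust_id INTEGER PRIMARY KEY,           -- unique identifier for each row
    email   TEXT NOT NULL UNIQUE           -- column rules: required and unique
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    cust_id  INTEGER NOT NULL REFERENCES customers(cust_id),  -- foreign key
    amount   REAL
);
-- A view stores no data; it is derived from orders each time it is queried
CREATE VIEW big_orders AS SELECT * FROM orders WHERE amount > 100;
""")
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders VALUES (1, 1, 250.0)")
conn.execute("INSERT INTO orders VALUES (2, 1, 50.0)")

view_rows = conn.execute("SELECT order_id, amount FROM big_orders").fetchall()

# An order that points at a non-existent customer violates the foreign key
try:
    conn.execute("INSERT INTO orders VALUES (3, 99, 10.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Here the view returns only the 250.00 order, and the insert for customer 99 is rejected because no such primary key exists in the parent table.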



Tables

• Tables consist of one or more records or rows of data
• The data for each row is organized into a defined set of attributes, stored
as columns
• Data types include:
- String/Text/Character
- Numeric (Integer, Float)
- Date/Time
- Boolean (Yes/No, True/False)

CustomerID  CompanyName                 ContactName    ContactTitle
GREAL       Great Lakes Food Market     Howard Snyder  Marketing Manager
HUNGC       Hungry Coyote Import Store  Yoshi Latimer  Sales Representative
LAZYK       Lazy K Kountry Store        John Steel     Marketing Manager
LETSS       Let's Stop N Shop           Jaime Yorres   Owner
LONEP       Lonesome Pine Restaurant    Fran Wilson    Sales Manager


Relational tables

Customers (primary key: CustID)

CustID  FirstName  LastName  Email
1       John       Smith     [email protected]
2       Steven     Goldfish  [email protected]
3       Paula      Brown     [email protected]
4       James      Smith     [email protected]
5       Mary       Stevens   [email protected]

Orders (primary key: OrderID; foreign key: CustID)

OrderID  CustID  Date       Amount
1        2       2/15/2015  100.22
2        1       3/18/2015  99.95
3        3       5/11/2015  122.95
4        3       7/28/2015  100.00
5        4       8/18/2015  555.55
6        7       7/4/2015   1000.00

Each row is a record and each column is a field. The relationship links the
foreign key Orders.CustID to the primary key Customers.CustID.


Primary key

• A primary key is a unique identifier of records in a table
• Primary key values may be generated manually or automatically
• A primary key can consist of more than one field



Foreign key

[Diagram: the primary key field of a parent table is linked by a relationship
to the foreign key field of a child table]

Relationship types

• One-to-One
- Each instance of one entity relates to only one instance of the second
entity
- Not frequently used in database systems
- May be used to split a table with many fields in order to isolate part of
the table
• One-to-Many
- Each instance of one entity relates to one or more instances of the
second entity
- The most common type of relationship; it relates one record from the
‘primary’ table to many records in the ‘related’ table
- A record (‘parent’) in Table A can have many matching records (‘children’)
in Table B, but a record (‘child’) in Table B has only one matching record
(‘parent’) in Table A
• Many-to-Many
- Multiple instances of one entity relate to multiple instances of the second
entity

Entity relationship diagram



Data lineage

• Data lineage refers to how the information flows from source systems to
destination systems, such as analytical environments
[Diagram: Source Systems 1, 2, and 3 → Transform → Data Warehouse → Transform →
Destination System (e.g. Transactions Monitoring system)]

Key questions:
• Are all needed records from the source system transferred to the
destination system?
• Are the Critical Data Elements required in the destination system
transferred?
• Is data transferred from source to destination at the right frequency?
• While transforming the data, are required business rules applied to the
attributes?

Enterprise Resource Planning



What is an ERP system?

• An Enterprise Resource Planning (ERP) system provides a platform to
integrate the internal and external management of information across an
organization
• Information in the ERP system can include:
- Finance and accounting
- Supply chain
- Manufacturing
- Sales
- Human resources



General ledger and sub-ledgers

• ERP modules (or sub-ledgers) store information unique to various business
processes
• All the modules roll up to the General Ledger
• For example, the Property, Plant and Equipment (PP&E) module stores
information about an asset’s useful life, salvage value, date placed in
service, depreciation, etc.
[Diagram: four sub-ledgers roll up to the General Ledger: Customer Billing and
Receivables; Procurement and Accounts Payable; Property, Plant and Equipment;
Production and Inventory]


Automated journal entries

• Sub-ledger transactions account for a substantial amount of the journal
entries posted to the general ledger
• Transactions in the subledger can be a result of automated entries or user
activity
• Subledger activity then flows into the General Ledger

[Diagram: sub-ledger entries flow into the General Ledger from Customer Billing
and Receivables; Procurement and Accounts Payable; Property, Plant and
Equipment; Production and Inventory]


Manual journal entries

• Users can also book manual journal entries directly to the general ledger
• Because these entries are not entered through the normal sub-ledger
business processes, these types of entries typically present higher risk

[Diagram: Manual JEs are posted directly to the General Ledger, alongside the
sub-ledgers: Customer Billing and Receivables; Procurement and Accounts
Payable; Property, Plant and Equipment; Production and Inventory]


Transactions and master data

• ERP modules include multiple tables with different types of information
about a transaction
• Key data fields sit in multiple data tables, creating relationships across
the data
• Multiple data tables are read, updated, etc. even for simple transactions
• Within the ERP, each accounting process has configuration settings that
ensure transactions are processed correctly and consistently

Transactional data: generated as a result of a specific transaction
Master data: not related to specific events but required for transactions of
a given type

[Diagram: tables in the Customer Billing and Receivables module: Customer
Master, Invoices, Cash Receipts, Sales Orders, Goods Shipments]


Common ERP systems

• SAP
• Oracle EBS
• Oracle JD Edwards
• Oracle PeopleSoft
• Other ERP systems include:
- QuickBooks
- Microsoft Dynamics
- NetSuite
- Workday



Structured Query Language



What is SQL?

• Structured Query Language (SQL) is the programming language used when
interacting with data in a relational database
• SQL implementations for different databases may have minor differences in
functions and syntax
• SQL statements can be categorized into two groups: Data Definition
Language (DDL) and Data Manipulation Language (DML)
- DDL statements are used to create and/or alter table structures in a
database
- DML statements are used to add, modify, query, or remove data from a
database



SQL basics

• A statement is a complete sequence of functions, keywords, operators, and
values that can be evaluated by the database engine
• A query returns records from one or more tables in the database
• A clause is a component of a statement or query
• An expression is a combination of one or more values, operators, and
functions that evaluates to a value
• Extra white space (spaces, new lines, etc.) is not evaluated
• SQL keywords and function names are typically case-insensitive, but they are
capitalized by convention
• SQL statements are often required to end with a semicolon (;)



SQLite and DB Browser for SQLite

www.sqlite.org:
• “SQLite is a self-contained, high-reliability,
embedded, full-featured, public-domain, SQL
database engine”
• “SQLite is the most used database engine in the
world”
sqlitebrowser.org:
• “DB Browser for SQLite [DB4S] is a high quality,
visual, open source tool to create, design, and edit
database files compatible with SQLite”
- Create, define, modify and delete tables
- Browse, edit, add and delete records
- Search records
- Import and export tables from/to CSV files
- Issue SQL queries and inspect the results

SQLite/DB4S basics

[Screenshot of DB4S with callouts: Open database file; Write and execute SQL
statements; File > Import > Table from CSV file…; Browse tables, columns, and
records; Execute SQL statement (also F5 or SHIFT + F5); SQL statement; Result
set]


Comments!

• Adding comments is a best practice when writing code, in any programming
language
• These statements are not evaluated by the SQL engine but help the reader
understand the intent and the approach behind an individual line of code or
the script as a whole
• In SQL, single-line comments are generally preceded by a double dash
(“--”), while block comments across multiple lines are marked by “/* … */”



Chinook sample database

• Chinook is a digital media store with several employees
• Customers can buy individual tracks from a variety of musical artists
• The database contains master and transactional data for the business
• Download the database and open the Chinook_Sqlite.sqlite file in DB4S



SELECT statement

• A SELECT statement is used to fetch records from a database table in the
form of a table called a result set
• One or more columns may be specified, or the asterisk (*) can be used to
return all columns from the table
• Use the LIMIT keyword to return only the first n rows from the result set
• The result set is not stored in the database unless an additional statement,
such as CREATE TABLE, is used
SELECT column1, column2, column3 FROM table_name
SELECT * FROM table_name
SELECT column1, column2, column3 FROM table_name LIMIT 5
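
A minimal sketch of the three SELECT forms above, run through Python's built-in sqlite3 module. The tracks table and its rows are made up for illustration and are not the Chinook data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (track_id INTEGER, name TEXT, unit_price REAL)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [(1, "Alpha", 0.99), (2, "Bravo", 0.99), (3, "Charlie", 1.99)],
)

# Named columns: only those columns come back, for every row
names = conn.execute("SELECT name, unit_price FROM tracks").fetchall()

# Asterisk: every column; LIMIT caps the number of rows returned
first_two = conn.execute("SELECT * FROM tracks LIMIT 2").fetchall()
```

The first result set has two columns and three rows; the second has all three columns but only two rows.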



CREATE TABLE statement

• A CREATE TABLE ... AS statement is used to add the result set from a query
to the database as a new table
• Complicated queries can be broken down by creating intermediate tables
• The ability to create a table depends on the level of access granted to a
user, so it may not be possible in some situations
CREATE TABLE new_table AS
SELECT column1, column2, column3 FROM table_name
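
As a sketch of using an intermediate table, the example below stores a filtered result set and then queries it. The invoices and large_invoices names and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, country TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "Brazil", 5.0), (2, "Chile", 25.0), (3, "Brazil", 30.0)],
)

# Persist the result set of a query as a new intermediate table
conn.execute(
    "CREATE TABLE large_invoices AS "
    "SELECT invoice_id, total FROM invoices WHERE total > 20"
)

# The new table can now be queried like any other table
large_count = conn.execute("SELECT COUNT(*) FROM large_invoices").fetchone()[0]
```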



ORDER BY

• The ORDER BY keyword can be added to a SELECT statement to sort the records
• The typical default order is ascending (ASC), but you can also specify
descending (DESC); the direction applies only to the column it follows
SELECT column1, column2, column3 FROM table_name
ORDER BY column2, column1, column3 DESC



COUNT

• The COUNT function returns the number of records in the result set
• The input parameter can be * or 1
SELECT COUNT(*) FROM table_name
SELECT COUNT(1) FROM table_name



Exercise #1

Answer the following questions:


1. How many columns does the customer table have?
2. If you select only the first 5 rows, what is the country in the last row?
3. If you sort the table on last name, what is the last name in the third row?
4. How many rows are in the invoice table?



WHERE clause

• The WHERE clause can be added to a SELECT statement to filter records
that meet certain conditions, i.e. return a subset of the table
• We use expressions to define these conditions
- Most conditional operators are self-explanatory (=, <, >, etc.)
- A common operator for “not equals” is <> or !=
SELECT column1, column2, column3 FROM table_name WHERE column1 = 'ABC'
SELECT column1, column2, column3 FROM table_name WHERE column2 > 100
SELECT column1, column2, column3 FROM table_name WHERE column3 <> 0
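
The WHERE conditions on this slide can be tried on a small made-up table. The invoices table and the ids helper are hypothetical; building SQL with an f-string is acceptable only because the conditions here are fixed literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, country TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "Brazil", 5.0), (2, "Chile", 25.0), (3, "Brazil", 0.0)],
)

def ids(where_clause):
    """Return the sorted invoice IDs that satisfy the given WHERE condition."""
    sql = f"SELECT invoice_id FROM invoices WHERE {where_clause}"
    return sorted(row[0] for row in conn.execute(sql))

equal_text = ids("country = 'Brazil'")  # text literals use straight single quotes
greater = ids("total > 20")
not_equal = ids("total <> 0")           # != is accepted as well
```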


Comparative operators

Operator  Description                                         Example (a = 10, b = 20)
=         True if the two operands are equal                  (a = b) is not true
!=        True if the two operands are not equal              (a != b) is true
<>        True if the two operands are not equal              (a <> b) is true
>         True if the left operand is greater than the right  (a > b) is not true
<         True if the left operand is less than the right     (a < b) is true


AND/OR

• AND and OR operators can be added to a WHERE clause to filter records
based on multiple conditions
• The AND operator returns all records where both conditions are true
• The OR operator returns all records where either the first condition or the
second condition is true
SELECT column1, column2, column3 FROM table_name
WHERE a_condition AND another_condition
SELECT column1, column2, column3 FROM table_name
WHERE a_condition OR another_condition



IN

• The IN operator can be used to specify multiple values in a WHERE clause
• This is equivalent to a series of OR conditions
SELECT column1, column2, column3 FROM table_name
WHERE column1 IN (value1, value2, …, valueN)
SELECT column1, column2, column3 FROM table_name
WHERE column1 = value1 OR column1 = value2 OR … OR column1 = valueN



BETWEEN

• The BETWEEN operator can be used in a WHERE clause to select values
within a range, including both endpoints
SELECT column1, column2, column3 FROM table_name
WHERE column2 BETWEEN value1 AND value2
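
The AND/OR, IN, and BETWEEN operators from the last three slides can be compared side by side. This is a sketch on a hypothetical invoices table; the ids helper is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, country TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "Brazil", 5.0), (2, "Chile", 25.0), (3, "Canada", 12.0), (4, "Argentina", 18.0)],
)

def ids(where_clause):
    """Return invoice IDs matching the condition, in ascending order."""
    sql = f"SELECT invoice_id FROM invoices WHERE {where_clause} ORDER BY invoice_id"
    return [row[0] for row in conn.execute(sql)]

# IN is shorthand for a chain of OR conditions on the same column
with_in = ids("country IN ('Brazil', 'Argentina', 'Chile')")
with_or = ids("country = 'Brazil' OR country = 'Argentina' OR country = 'Chile'")

# BETWEEN includes both endpoints; AND requires both conditions to hold
between = ids("total BETWEEN 12 AND 18")
both = ids("total > 10 AND total < 20")
```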



LIKE

• The LIKE operator can be used in a WHERE clause to search for a
specified pattern in a column
• LIKE supports several wildcards
SELECT column1, column2, column3 FROM table_name
WHERE column2 LIKE '%value1%'

Wildcard        Description
%               A substitute for zero or more characters
_ (underscore)  A substitute for a single character
[charlist]      Sets and ranges of characters to match (SQL Server; not
                supported by LIKE in SQLite)
[^charlist]     Matches only a character NOT specified within the brackets
                (SQL Server; not supported by LIKE in SQLite)
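
A short sketch of the two wildcards that SQLite's LIKE supports, using a made-up invoices table. The countries helper is hypothetical and passes the pattern as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, "Canada"), (2, "Chile"), (3, "Czech Republic"), (4, "Brazil")],
)

def countries(pattern):
    """Return the countries matching a LIKE pattern, in alphabetical order."""
    sql = "SELECT country FROM invoices WHERE country LIKE ? ORDER BY country"
    return [row[0] for row in conn.execute(sql, (pattern,))]

starts_with_c = countries("C%")     # % matches zero or more characters
five_letters = countries("_____")   # each _ matches exactly one character
contains_an = countries("%an%")
```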



NULL and NOT

• NULL means that a value has not been entered for a particular column in a
row, i.e. the value is missing
• NULL is not a number or any other value, so a comparison with = is never
true for NULL, not even against another NULL
• NOT creates the negation of an expression
• To test for a NULL value, use the expressions IS NULL or IS NOT NULL
SELECT column1, column2 FROM table_name
WHERE column2 IS NULL
SELECT column1, column2 FROM table_name
WHERE column2 IS NOT NULL
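
The difference between = NULL and IS NULL is easy to check. The sketch below uses a hypothetical customers table in which two of three rows have no company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, company TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, None), (3, None)],  # Python None is stored as NULL
)

def count(where_clause):
    """Count the rows that satisfy the given WHERE condition."""
    sql = f"SELECT COUNT(*) FROM customers WHERE {where_clause}"
    return conn.execute(sql).fetchone()[0]

equals_null = count("company = NULL")      # never true, even for NULL rows
is_null = count("company IS NULL")
is_not_null = count("company IS NOT NULL")
```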



Exercise #2

Answer the following questions:


1. How many invoices have totals greater than $20?
2. How many invoices have totals greater than $10 and less than $20?
3. How many invoices are from customers in Brazil, Argentina, or Chile?
4. How many invoices have a billing country that starts with “C”?
5. How many customers list their company?



Expressions

• An expression is a combination of one or more values, operators, and SQL
functions that evaluates to a value
• Any time the data must be displayed, filtered, or ordered in a way that is
different from how it is stored, you can use expressions and functions to
manipulate it
• We can use expressions to create calculated or derived fields, as in the
example below:
- end_bal - beg_bal is the expression
- Minus (-) is an operator
- end_bal and beg_bal are columns in the existing table
- AS "difference" provides a name for a new column in the result set
SELECT column1, end_bal, end_bal - beg_bal AS "difference"
FROM table_name
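
The derived-field example above can be run as follows. The balances table and its two rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (account TEXT, beg_bal REAL, end_bal REAL)")
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?)",
    [("cash", 100.0, 150.0), ("rent", 80.0, 60.0)],
)

# The expression end_bal - beg_bal is computed per row and named "difference"
rows = conn.execute(
    "SELECT account, end_bal, end_bal - beg_bal AS difference "
    "FROM balances ORDER BY account"
).fetchall()
```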



Arithmetic operators

Operator Description
+ Adds values on either side of the operator
- Subtracts right hand operand from left hand operand
* Multiplies values on either side of the operator
/ Divides left hand operand by right hand operand
% Divides left hand operand by right hand operand and returns remainder (“modulus”)



String functions

• The RTRIM and LTRIM functions remove spaces from the right side or left
side of a string
RTRIM(<string>)
LTRIM(<string>)
• Use SUBSTR to return a portion of a string starting at a given position and
for a specified number of characters
SUBSTR(<string>,<start location>,<length>)
• Use LENGTH to return the number of characters in a string (other SQL
implementations may use LEN)
LENGTH(<string>)



String functions (continued)

• Use UPPER and LOWER to change a string to either uppercase or lowercase
• You may need to display all uppercase data in a report, for example
UPPER(<string>)
LOWER(<string>)
• Use REPLACE to substitute one string value for another
• Use REPLACE to clean up data; for example, you may need to replace
slashes (/) in a phone number column with hyphens (-) for a report
REPLACE(<string value>,<string to replace>,<replacement>)
SELECT column1, REPLACE(column1, '/', '-') AS new_col
FROM table_name
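
The string functions from the last two slides can be checked on literal values without creating a table. The phone number and name below are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT RTRIM('abc   '),                     -- trailing spaces removed
           LTRIM('   abc'),                     -- leading spaces removed
           SUBSTR('+1 (780) 428-9482', 10, 8),  -- 8 characters from position 10
           LENGTH('Peacock'),
           UPPER('abc'),
           REPLACE('780/428/9482', '/', '-')
    """
).fetchone()
```

Positions in SUBSTR start at 1, so position 10 is the first digit after the area code.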



Numeric functions

• Use ROUND to round a value to a given number of decimal places:
SELECT column1, ROUND(column1, 2) AS new_col
FROM table_name
• Use RANDOM in combination with the absolute value function ABS and
modulus operator % to generate a random integer in a specific range
(0 to 99 here):
SELECT *, ABS(RANDOM() % 100) AS new_col
FROM table_name



Date functions

• Different SQL implementations have varying functions and operators for
dates and times
• In SQLite, dates are produced by DATE and related functions, and
arithmetic operators are used to do calculations involving dates
• DATE('now') returns the current date (in UTC by default); DATETIME('now')
returns the current date and time
• JULIANDAY converts a date in the standard YYYY-MM-DD format to a Julian
day number, so subtracting two results gives the number of days between
the dates
• STRFTIME provides flexible options for managing dates and times using the
substitutions shown below

Sub  Description
%d   day of month: 01-31
%f   fractional seconds: SS.SSS
%H   hour: 00-24
%j   day of year: 001-366
%J   Julian day number
%m   month: 01-12
%M   minute: 00-59
%s   seconds since 1970-01-01
%S   seconds: 00-59
%w   day of week 0-6 with Sunday = 0
%W   week of year: 00-53
%Y   year: 0000-9999


Example – Date functions

SELECT JULIANDAY('now') - JULIANDAY('1776-07-04');
SELECT column1, STRFTIME('%Y-%m-%d', column1) AS new_col
FROM table_name
SELECT column1, STRFTIME('%w', column1) AS new_col
FROM table_name
SELECT column1, STRFTIME('%W', column1) AS new_col
FROM table_name
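
To make the date functions concrete, the sketch below applies them to fixed dates instead of 'now' so the results are repeatable. The dates are arbitrary choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT JULIANDAY('2015-03-01') - JULIANDAY('2015-02-01'),  -- days between
           STRFTIME('%Y', '2015-07-04'),   -- year
           STRFTIME('%w', '2015-07-04'),   -- day of week, Sunday = 0
           STRFTIME('%j', '2015-07-04')    -- day of year
    """
).fetchone()
```

STRFTIME returns text, so the weekday comes back as the string '6' (a Saturday), not the number 6.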



CAST

• The CAST function is used for explicitly changing the data type of a
column, often from text to numeric and vice versa
• Available data types include:
- INT - Integer
- CHAR - Text
- TEXT - Text
- REAL - Numeric
- FLOAT - Numeric
SELECT column1, CAST(column1 AS INT) AS int_col FROM table_name
SELECT column2, CAST(column2 AS CHAR) AS char_col FROM table_name
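
A small sketch of CAST on literal values in SQLite. Casting a decimal number to INT truncates it rather than rounding it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT CAST('42' AS INT),     -- text to integer
           CAST('3.7' AS REAL),   -- text to floating point
           CAST(3.7 AS INT),      -- truncates toward zero
           CAST(20 AS TEXT)       -- number to text
    """
).fetchone()
```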



Exercise #3

Answer the following questions:


1. How many employees does Chinook have?
2. Create a new column that shows their phone numbers without the area
code.
3. How long is the longest last name?
4. How many employees were born on a Sunday?
5. How old was the oldest employee when they were hired?



DISTINCT

• The DISTINCT keyword returns the unique values in a column or unique
records across multiple columns
• The combination of COUNT and DISTINCT returns the number of unique
values
SELECT DISTINCT column1 FROM table_name
SELECT COUNT(DISTINCT column1) FROM table_name



Aggregate functions

• The following functions, including COUNT, calculate totals or statistics by
column
• These statistics can apply to all rows or groupings of rows using GROUP
BY
• Other SQL implementations may include additional statistics (standard
deviation, variance, median, etc.)

Function  Description
COUNT()   Counts the total number of records
SUM()     Sum of numeric values
AVG()     Finds the average
MIN()     Returns the smallest value
MAX()     Returns the largest value



GROUP BY clause

• The SQL GROUP BY clause is used together with the SELECT statement to
aggregate by groups (similar to Excel’s pivot table)
• The GROUP BY clause follows the WHERE clause and precedes the ORDER
BY clause (if present)
• This clause regularly makes use of SQL’s aggregate functions; when used
with GROUP BY, each aggregate function produces a single value for each
group
SELECT column1, column2, count(1) as ct, sum(column1) as sum1
FROM table_name
WHERE a_condition
GROUP BY column1, column2
ORDER BY column2, column1



Example – GROUP BY with aggregate function

Orders

OrderID  CustID  Date        Amount
1        2       2/15/2015   100.22
2        1       3/18/2015   99.95
3        3       5/11/2015   122.95
4        3       7/28/2015   100
5        4       8/18/2015   555.55
6        7       7/4/2015    1000
7        3       7/15/2015   458
8        1       10/18/2015  323
9        2       5/11/2015   355
10       3       3/28/2015   77
11       7       8/18/2015   113
12       2       4/4/2015    130
13       4       8/18/2015   448
14       3       1/4/2015    449

SELECT
CustID, SUM(Amount) AS TotalAmount
FROM Orders
GROUP BY CustID
ORDER BY TotalAmount DESC

Result set:

CustID  TotalAmount
3       1206.95
7       1113
4       1003.55
2       585.22
1       422.95
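
The result set above can be reproduced with Python's built-in sqlite3 module. In this sketch the dates are rewritten in YYYY-MM-DD form, the date column is named OrderDate, and ROUND is added to keep floating-point sums tidy; all three are assumptions beyond the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER, CustID INTEGER, OrderDate TEXT, Amount REAL)"
)
orders = [
    (1, 2, "2015-02-15", 100.22), (2, 1, "2015-03-18", 99.95),
    (3, 3, "2015-05-11", 122.95), (4, 3, "2015-07-28", 100.0),
    (5, 4, "2015-08-18", 555.55), (6, 7, "2015-07-04", 1000.0),
    (7, 3, "2015-07-15", 458.0), (8, 1, "2015-10-18", 323.0),
    (9, 2, "2015-05-11", 355.0), (10, 3, "2015-03-28", 77.0),
    (11, 7, "2015-08-18", 113.0), (12, 2, "2015-04-04", 130.0),
    (13, 4, "2015-08-18", 448.0), (14, 3, "2015-01-04", 449.0),
]
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", orders)

# One output row per customer, largest total first
totals = conn.execute(
    "SELECT CustID, ROUND(SUM(Amount), 2) AS TotalAmount "
    "FROM Orders GROUP BY CustID ORDER BY TotalAmount DESC"
).fetchall()
```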



HAVING clause

• The HAVING clause can be added to GROUP BY to filter groups that
meet certain conditions, i.e. return a subset of the groupings
• Operators are the same as the WHERE clause
SELECT column1, column2, count(1) as counts FROM table_name
GROUP BY column1, column2 HAVING counts > 5
SELECT column1, count(1) as counts, sum(column2) as sum2
FROM table_name GROUP BY column1
HAVING counts > 1 and sum2 > 10000
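
The sketch below contrasts WHERE, which filters rows before grouping, with HAVING, which filters the groups afterwards. The orders table is made up. SQLite accepts a column alias in HAVING as on the slide; repeating the aggregate, as done here, is likely the more portable spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (cust_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50.0), (1, 70.0), (2, 500.0), (3, 20.0), (3, 30.0), (3, 40.0)],
)

rows = conn.execute(
    "SELECT cust_id, COUNT(*) AS n, SUM(amount) AS total "
    "FROM orders "
    "WHERE amount >= 30 "     # drops the 20.00 order before grouping
    "GROUP BY cust_id "
    "HAVING COUNT(*) > 1 "    # drops customer 2, who has a single order
    "ORDER BY cust_id"
).fetchall()
```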



CASE

• Use the CASE expression to evaluate a list of conditions and return the
value for the first one that evaluates to true
• CASE can be used inside aggregations to generate a statistic (e.g. sum)
only for certain records
SELECT column1, column2,
CASE <test expression>
WHEN <comparison expression1> THEN <return value1>
WHEN <comparison expression2> THEN <return value2>
ELSE <default value>
END AS new_field_name
FROM table_name
SELECT column1, SUM(
CASE <test expression>
WHEN <comparison expression1> THEN 1 ELSE 0
END
) AS new_field_name FROM table_name GROUP BY column1
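
A sketch of CASE inside an aggregation, counting only the invoices above 20 per country. It uses the "searched" form (CASE WHEN condition ...), a variant of the template on the slide; the invoices table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (country TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("Brazil", 5.0), ("Brazil", 25.0), ("Chile", 30.0), ("Chile", 40.0)],
)

rows = conn.execute(
    """
    SELECT country,
           COUNT(*) AS n_invoices,
           SUM(CASE WHEN total > 20 THEN 1 ELSE 0 END) AS n_large
    FROM invoices
    GROUP BY country
    ORDER BY country
    """
).fetchall()
```

Each row contributes 1 or 0 to the sum, so n_large is a conditional count within each group.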

Exercise #4

Answer the following questions:


1. Return a list of countries by the total value of invoices in descending order.
2. How many invoice line items are there?
3. How many distinct prices are there?
4. How many downloads of each price have happened?
5. Which tracks have been downloaded the most times?
6. What is the range of invoice dates covered by the database?



JOIN clause

A JOIN combines fields from two tables using a key column shared by the
tables
SELECT table_a.column1, table_a.column2, table_b.column3
FROM table_a JOIN table_b ON table_a.column1 = table_b.column1



Aliases

• SQL aliases are used to temporarily refer to a table or a column by a
different name
• Aliases are commonly used in join clauses to save space and keystrokes

Without aliases:
SELECT
Employee.BusinessEntityID,
FirstName,
LastName,
JobTitle,
BirthDate
FROM HumanResources.Employee
INNER JOIN Person.Person ON
Employee.BusinessEntityID = Person.BusinessEntityID

With aliases:
SELECT
e.BusinessEntityID,
FirstName,
LastName,
JobTitle,
BirthDate
FROM HumanResources.Employee AS e
INNER JOIN Person.Person AS p ON
e.BusinessEntityID = p.BusinessEntityID


JOIN types

INNER JOIN
Selects rows from both tables as long as there is a match between the key
columns in both tables
LEFT JOIN
Returns all rows from the left table (1), with the matching rows in the
right table (2); the result is NULL in the right side when there is no match
RIGHT JOIN
Returns all rows from the right table (2), with the matching rows in the
left table (1); the result is NULL in the left side when there is no match
FULL OUTER JOIN
Returns all rows from the left table (1) and from the right table (2); the
result is NULL in the left or right side when there is no match


INNER join

customers

CustID  FirstName  LastName  Email
1       John       Smith     [email protected]
2       Steven     Goldfish  [email protected]
3       Paula      Brown     [email protected]
4       James      Smith     [email protected]
5       Mary       Stevens   [email protected]

orders

OrderID  CustID  Date       Amount
1        2       2/15/2015  100.22
2        1       3/18/2015  99.95
3        3       5/11/2015  122.95
4        3       7/28/2015  100.00
5        4       8/18/2015  555.55
6        7       7/4/2015   1000.00

SELECT
customers.CustID,
customers.FirstName,
customers.LastName,
orders.Date,
orders.Amount
FROM customers
INNER JOIN orders ON customers.CustID = orders.CustID

Query Result:

CustID  FirstName  LastName  Date       Amount
1       John       Smith     3/18/2015  99.95
2       Steven     Goldfish  2/15/2015  100.22
3       Paula      Brown     5/11/2015  122.95
3       Paula      Brown     7/28/2015  100.00
4       James      Smith     8/18/2015  555.55


LEFT join

Using the same customers and orders tables as in the INNER join example:

SELECT
customers.CustID,
customers.FirstName,
customers.LastName,
orders.Date,
orders.Amount
FROM customers
LEFT JOIN orders ON customers.CustID = orders.CustID

Query Result:

CustID  FirstName  LastName  Date       Amount
1       John       Smith     3/18/2015  99.95
2       Steven     Goldfish  2/15/2015  100.22
3       Paula      Brown     5/11/2015  122.95
3       Paula      Brown     7/28/2015  100.00
4       James      Smith     8/18/2015  555.55
5       Mary       Stevens   NULL       NULL
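
The INNER and LEFT join results can be compared with the sketch below, which loads the slide's customers and orders rows (without the Email and Date columns, for brevity) into an in-memory SQLite database. RIGHT and FULL OUTER joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (CustID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.execute("CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, CustID INTEGER, Amount REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "John", "Smith"), (2, "Steven", "Goldfish"), (3, "Paula", "Brown"),
     (4, "James", "Smith"), (5, "Mary", "Stevens")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 2, 100.22), (2, 1, 99.95), (3, 3, 122.95),
     (4, 3, 100.00), (5, 4, 555.55), (6, 7, 1000.00)],
)

# Same query for both joins; only the join type changes
query = (
    "SELECT c.CustID, c.FirstName, o.Amount "
    "FROM customers AS c {join} JOIN orders AS o ON c.CustID = o.CustID "
    "ORDER BY c.CustID, o.OrderID"
)
inner_rows = conn.execute(query.format(join="INNER")).fetchall()
left_rows = conn.execute(query.format(join="LEFT")).fetchall()
```

The LEFT join returns the five INNER join rows plus one extra row for Mary, whose order columns come back as NULL (None in Python). The order for the unknown customer 7 appears in neither result.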



RIGHT join

Using the same customers and orders tables as in the INNER join example:

SELECT
orders.OrderID,
customers.FirstName,
customers.LastName,
orders.Date,
orders.Amount
FROM customers
RIGHT JOIN orders ON customers.CustID = orders.CustID

Query Result:

OrderID  FirstName  LastName  Date       Amount
1        Steven     Goldfish  2/15/2015  100.22
2        John       Smith     3/18/2015  99.95
3        Paula      Brown     5/11/2015  122.95
4        Paula      Brown     7/28/2015  100.00
5        James      Smith     8/18/2015  555.55
6        NULL       NULL      7/4/2015   1000.00


FULL OUTER join

Using the same customers and orders tables as in the INNER join example:

SELECT
customers.CustID,
orders.OrderID,
customers.FirstName,
customers.LastName,
orders.Date,
orders.Amount
FROM customers
FULL OUTER JOIN orders ON customers.CustID = orders.CustID

Query Result:

CustID  OrderID  FirstName  LastName  Date       Amount
1       2        John       Smith     3/18/2015  99.95
2       1        Steven     Goldfish  2/15/2015  100.22
3       3        Paula      Brown     5/11/2015  122.95
3       4        Paula      Brown     7/28/2015  100.00
4       5        James      Smith     8/18/2015  555.55
5       NULL     Mary       Stevens   NULL       NULL
NULL    6        NULL       NULL      7/4/2015   1000.00


Example – Join with GROUP BY

orders

OrderID  CustID  Date        Amount
1        2       2/15/2015   100.22
2        1       3/18/2015   99.95
3        3       5/11/2015   122.95
4        3       7/28/2015   100
5        4       8/18/2015   555.55
6        7       7/4/2015    1000
7        3       7/15/2015   458
8        1       10/18/2015  323
9        2       5/11/2015   355
10       3       3/28/2015   77
11       7       8/18/2015   113
12       2       4/4/2015    130
13       4       8/18/2015   448
14       3       1/4/2015    449

customers

CustID  FirstName  LastName  Email
1       John       Smith     [email protected]
2       Steven     Goldfish  [email protected]
3       Paula      Brown     [email protected]
4       James      Smith     [email protected]
5       Mary       Stevens   [email protected]

SELECT
o.CustID,
c.FirstName,
c.LastName,
SUM(o.Amount) AS "TotalAmount"
FROM orders AS o
LEFT JOIN customers AS c ON o.CustID = c.CustID
GROUP BY o.CustID, c.FirstName, c.LastName
ORDER BY TotalAmount DESC

A LEFT JOIN from orders is needed here: customer 7 has orders but no row in
customers, so an INNER JOIN would drop that group from the result.

Query Result:

CustID  FirstName  LastName  TotalAmount
3       Paula      Brown     1206.95
7       NULL       NULL      1113
4       James      Smith     1003.55
2       Steven     Goldfish  585.22
1       John       Smith     422.95


Combining queries

• SELECT statements can be combined to avoid hard-coding values or
creating intermediate tables
• In a simple case, an additional SELECT statement returns a value for use
in a condition:
SELECT * FROM table_name WHERE column1 >
(SELECT AVG(column1) FROM table_name);
• More complex queries with multiple layers of nesting may require the use of
aliases
SELECT COUNT(*) FROM (SELECT * FROM table_name WHERE column1 >
(SELECT AVG(column1) FROM table_name)) temp;
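
Both nested forms above can be tried on a small made-up invoices table. The alias above_avg_rows is a hypothetical name chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 100.0)],
)

# The inner query computes the average (40.0); the outer query keeps rows above it
above_avg = conn.execute(
    "SELECT invoice_id FROM invoices "
    "WHERE total > (SELECT AVG(total) FROM invoices)"
).fetchall()

# A subquery in FROM acts as a temporary table and is given an alias
n_above = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT * FROM invoices WHERE total > (SELECT AVG(total) FROM invoices)"
    ") AS above_avg_rows"
).fetchone()[0]
```

The average is computed at query time, so the threshold never has to be hard-coded.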



Exercise #5

Answer the following questions:


1. Return the top five most downloaded tracks with the full name of the track
and number of downloads.
2. How many distinct albums have actually been downloaded?
3. What is the most popular genre?



Assignment



Assignment #1 – Excel and SQL

You have been hired by the Office of the Mayor in New York City to review
data from city operations for opportunities to improve services. In particular,
the Mayor has heard complaints from some citizens that film shoots in the
city are causing too much disruption. The city has provided historical data on
film permits and noise complaints.
Your assignment is to perform initial profiling and descriptive analysis of the
data to understand more about the situation. Submissions are due by
XXXXXXXXXX.
1. Import ‘NYC_Film_Permits.csv’ into Excel. Import the same file and
‘NYC_Noise_Complaints.tsv’ into SQLite.
2. Answer questions 1-30 in ‘Assignment_1_Answers.xlsx’ and add a brief
description of how you determined the answer. This file should only include
your answers, not the data. Add your student ID # to the end of the file
name.
3. Upload the Excel workbook, including the data, that you used to answer
questions 1-20. This file should be called ‘Assignment_1_Excel_<STUDENTID>.xlsx’.
4. Upload the SQL script that you used to answer questions 21-30. Your script

Assignment #1 – Inputs and Outputs

Inputs
File Name
Assignment_1_Answers.xlsx
Assignment_1_SQL.sql
NYC_Film_Permits.csv
NYC_Noise_Complaints.tsv

Outputs
File Name
Assignment_1_Answers_ss12345.xlsx
Assignment_1_Excel_ss12345.xlsx
Assignment_1_SQL_ss12345.sql



Assignment #1 – Importing data to SQLite

Follow these steps to import the delimited text files into SQLite:
1. Download and unzip the files.
2. Click ‘New Database’
3. Specify a name and location for the .db file and click ‘Save’
4. Click ‘Cancel’ on the dialog box that appears
5. File > Import > Table from CSV…
6. For the .tsv file, select ‘All Files (*)’ from the drop-down (Windows) or
Options (Mac)
7. Select the file and click ‘Open’
8. Update ‘Field Separator’ to comma (.csv) or tab (.tsv)
9. Check box for ‘Column names in first line’
10. Wait for the data to be inserted into a database table
11. Check the results on the ‘Database Structure’ tab and run a simple query
against the table.
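
The same import can also be scripted. The sketch below is one possible approach using Python's built-in csv and sqlite3 modules, not the DB4S procedure itself. The import_delimited helper, the table names, and the assignment1.db file name are assumptions, and every column is created without a declared type, so values are stored as text.

```python
import csv
import sqlite3


def import_delimited(con, path, table, delimiter=","):
    """Load a delimited text file with a header row into a new table.

    Returns the number of rows in the table after the import.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)  # first line holds the column names
        # Quote column names so spaces or punctuation in headers are allowed.
        cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
        marks = ", ".join("?" for _ in header)
        con.execute(f'CREATE TABLE "{table}" ({cols})')
        con.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    con.commit()
    return con.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Example usage (file and table names are assumptions):
# con = sqlite3.connect("assignment1.db")
# import_delimited(con, "NYC_Film_Permits.csv", "film_permits")
# import_delimited(con, "NYC_Noise_Complaints.tsv", "noise_complaints",
#                  delimiter="\t")
```

As in step 11, a quick SELECT COUNT(*) against each new table is a reasonable check that the row count matches the source file.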
Summary



Acquire data

Task | Description | SQL
Data access | Connect to a data source |
Importing data | Read the data into an analytical environment | New Database > File > Import > Table from CSV File
Data profiling | Review data dimensions and summary statistics | COUNT; DISTINCT; MIN, MAX, SUM, AVG, etc.; ORDER BY
Data quality assessment | Identify aspects of the data that pose challenges for subsequent analysis | IS NULL; IS NOT NULL
Data simulation | Generate data based on analytical requirements | RANDOM
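
A few of these profiling, quality, and simulation keywords can be combined in a short script. This is a hedged sketch using Python's built-in sqlite3 module; the permits table, its columns, and its rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE permits (borough TEXT, days INTEGER)")  # hypothetical
con.executemany(
    "INSERT INTO permits VALUES (?, ?)",
    [("Manhattan", 3), ("Brooklyn", 1), ("Manhattan", None),
     ("Queens", 5), (None, 2)],
)

# Data profiling: row count, distinct values, and summary statistics.
# Aggregates other than COUNT(*) skip NULL values.
profile = con.execute("""
    SELECT COUNT(*), COUNT(DISTINCT borough),
           MIN(days), MAX(days), SUM(days), AVG(days)
    FROM permits
""").fetchone()

# Data quality assessment: count rows with a missing value in either column.
missing = con.execute("""
    SELECT COUNT(*) FROM permits
    WHERE borough IS NULL OR days IS NULL
""").fetchone()[0]

# Data simulation: RANDOM() returns a signed 64-bit integer;
# ABS(RANDOM() % 6) + 1 maps it to an integer from 1 to 6.
simulated = [r[0] for r in con.execute(
    "SELECT ABS(RANDOM() % 6) + 1 FROM permits"
)]

print(profile, missing, simulated)
```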



Transform data

Task | Description | SQL
Cleaning data | Address data quality issues to facilitate analysis | LTRIM, RTRIM, UPPER, LOWER, etc.; ROUND
Changing data types | Convert a value to the appropriate format for analysis | CAST AS
Filtering data | Create subsets of records and features based on specified conditions | WHERE; IN, BETWEEN, LIKE; HAVING
Deriving data | Create new features from original features | AS; REPLACE, SUBSTR, etc.; JULIANDAY, etc.; CASE, IF
Scaling data | Put features with different ranges of values on the same scale while preserving relative values | MIN, MAX, SUM, AVG, etc.
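
Several of these transformations can appear in a single SELECT. The sketch below uses Python's built-in sqlite3 module; the raw table and its contents are hypothetical, and min-max scaling is just one way to put values on a common 0 to 1 scale.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE raw (name TEXT, amount TEXT, start_date TEXT, end_date TEXT)"
)
con.executemany("INSERT INTO raw VALUES (?, ?, ?, ?)", [
    ("  alice ", "10.456", "2015-01-01", "2015-01-11"),
    ("BOB", "99.5", "2015-03-01", "2015-03-02"),
    ("carol  ", "250", "2015-05-10", "2015-05-20"),
])

# Cleaning (trim, upper-case, round), type conversion (CAST),
# deriving (date difference, CASE), and filtering (BETWEEN) together.
rows = con.execute("""
    SELECT UPPER(LTRIM(RTRIM(name)))                   AS clean_name,
           ROUND(CAST(amount AS REAL), 1)              AS amount_num,
           JULIANDAY(end_date) - JULIANDAY(start_date) AS duration_days,
           CASE WHEN CAST(amount AS REAL) >= 100
                THEN 'large' ELSE 'small' END          AS size_band
    FROM raw
    WHERE CAST(amount AS REAL) BETWEEN 5 AND 500
    ORDER BY clean_name
""").fetchall()

# Scaling: min-max scaling maps the smallest amount to 0 and the largest to 1.
scaled = [r[0] for r in con.execute("""
    SELECT (CAST(amount AS REAL) - lo) / (hi - lo)
    FROM raw,
         (SELECT MIN(CAST(amount AS REAL)) AS lo,
                 MAX(CAST(amount AS REAL)) AS hi
          FROM raw)
    ORDER BY CAST(amount AS REAL)
""")]

print(rows, scaled)
```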



Transform data (continued)

Task | Description | SQL
Sampling data | Create subsets of records based on a probability distribution | RANDOM
Aggregating data | Return a statistic or value for one feature according to different values of another feature | GROUP BY; MIN, MAX, SUM, AVG, etc.
Reshaping data | Change whether values are represented in different records or different features |
Concatenating data | Combine data sets through juxtaposition | UNION; UNION ALL
Merging data | Combine data sets by matching records on a common identifier | INNER JOIN; LEFT JOIN
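
Concatenating, merging, aggregating, and sampling can be illustrated on three small tables. This is a sketch using Python's built-in sqlite3 module; the table names, columns, and values are all hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales_2014 (region TEXT, amount REAL);
    CREATE TABLE sales_2015 (region TEXT, amount REAL);
    CREATE TABLE managers (region TEXT, manager TEXT);
    INSERT INTO sales_2014 VALUES ('East', 100), ('West', 50);
    INSERT INTO sales_2015 VALUES ('East', 100), ('East', 25), ('North', 10);
    INSERT INTO managers VALUES ('East', 'Ana'), ('West', 'Raj');
""")

# Concatenating: UNION ALL keeps every row; UNION removes duplicate rows.
n_union_all = con.execute("""
    SELECT COUNT(*) FROM (SELECT * FROM sales_2014
                          UNION ALL SELECT * FROM sales_2015)
""").fetchone()[0]
n_union = con.execute("""
    SELECT COUNT(*) FROM (SELECT * FROM sales_2014
                          UNION SELECT * FROM sales_2015)
""").fetchone()[0]

# Merging and aggregating: LEFT JOIN keeps regions without a manager,
# and GROUP BY returns one total per region.
totals = con.execute("""
    SELECT s.region, m.manager, SUM(s.amount) AS total
    FROM (SELECT * FROM sales_2014
          UNION ALL SELECT * FROM sales_2015) AS s
    LEFT JOIN managers AS m ON s.region = m.region
    GROUP BY s.region, m.manager
    ORDER BY s.region
""").fetchall()

# Sampling: shuffle the rows with RANDOM() and keep the first two.
sample = con.execute(
    "SELECT * FROM sales_2015 ORDER BY RANDOM() LIMIT 2"
).fetchall()

print(n_union_all, n_union, totals, sample)
```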



Analyze data

Task | Description | SQL
Summary analysis | Calculate representative statistics for features of interest | MIN, MAX, SUM, AVG, etc.
Perform statistical tests | Estimate the probability that the data supports a specific claim |
Clustering | Identify similar groups of records |
Predictive modeling | Use one set of features to predict the value of another feature |
Network analysis | Examine relationships between entities |



Present findings

Task | Description | SQL
Data visualization | Display data using lines, shapes, colors, and other abstract representations |
Dashboarding | Create a collection of dynamic visualizations |
Exporting data | Produce output from an analytical environment for future use | Copy/paste; File > Export > Table(s) as CSV file…
Make recommendations | Use results of data analysis to guide decision-making |



Appendix



Step #1 – Download and install DB4S

http://sqlitebrowser.org/



Step #2 – Start DB4S and open the sample.sqlite file



Step #3 – Execute a simple query

1. Go to ‘Execute SQL’ tab
2. Enter the following SQL query:
    select count(*) from Artist;
3. Click the forward button to execute the current line and check for results
4. Always add a semicolon (;) at the end of a query



Exercise – Entity relationship diagram

• Work in groups to create an entity relationship diagram for an Accounts
Payable module with at least 5 tables

Diagram legend: Table (Primary key, Column 1, Column 2, Foreign key); Relationships



© 2018 PwC. All rights reserved. PwC refers to the US member firm or one of its subsidiaries
or affiliates, and may sometimes refer to the PwC network. Each member firm is a separate
legal entity. Please see www.pwc.com/structure for further details.
