3 Common data types and SQL
IMS Information Management School
3.1 Types of data
Transactional data
§ Records in a database or other data structure (e.g., a CSV file)
that capture a transaction of any kind, for example, a purchase
Other data types
§ Time-series data (e.g., stock exchange data)
§ Data streams (e.g., video surveillance and sensor data)
§ Spatial data (e.g., maps)
§ Video
§ Images
§ Audio
§ Semi-structured data (e.g., XML or JSON files)
§ Other unstructured data (e.g., text or website content)
3.2 Common semi-structured data formats
CSV
CSV files
§ CSV stands for Comma-Separated Values
§ Typically, a CSV file is composed of:
  § One “header” row with the names of the variables/fields
  § Many rows of data (also known as records or observations)
§ In the header row and in the data rows, the fields are separated by
commas
§ In some cases, the separator is another character, such as a
semicolon (;), pipe (|), or tab
§ When the separator is the tab character, the files are also called
TSV files
Example
Header (first row):
Name,Age,TotalPurchased,Recency,Frequency
Data (subsequent rows):
Mario Costa,48,1000.00,30,20
Anne Marie,27,2132.00,25,10
Elisabeth James,34,3334.50,10,15
Paul Phillips,54,783.00,5,6
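A minimal sketch of reading the example file, assuming Python and its standard `csv` module (the slides do not prescribe a tool). The function name `read_records` is hypothetical; the rows are the ones from the example above, held in a string so nothing has to exist on disk.

```python
import csv
import io

# The sample rows from the CSV example, kept in a string for convenience.
CSV_TEXT = """Name,Age,TotalPurchased,Recency,Frequency
Mario Costa,48,1000.00,30,20
Anne Marie,27,2132.00,25,10
Elisabeth James,34,3334.50,10,15
Paul Phillips,54,783.00,5,6
"""


def read_records(text, delimiter=","):
    """Return one dict per data row, keyed by the header names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


records = read_records(CSV_TEXT)
# Every CSV value arrives as a string, so numeric fields need converting.
total_purchased = sum(float(r["TotalPurchased"]) for r in records)
print(len(records), total_purchased)
```

Passing `delimiter="\t"` or `delimiter=";"` to the same function would handle TSV or semicolon-separated files.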
XML
Origin
§ XML stands for Extensible Markup Language
§ A subset of SGML (Standard Generalized Markup Language), an ISO
standard defined in 1986
§ Not owned by any company or institution
§ XML standards are defined by the W3C (http://www.w3.org)
Why
§ To structure data:
  § Data taken out of context may not make sense or may be
  interpreted incorrectly
  § Structured data can be read and used by other applications
§ To create a language to describe data (metadata)
§ To help:
  § Enterprise integration
  § B2B ecommerce
  § Information distribution
What it solved
Organizes data in a way that:
§ Is simultaneously “human” and “machine” readable
§ Defines the structure and the content
§ Emphasizes relationships
§ Separates structure from presentation
§ Is open and extensible
What is a “Markup language”?
§ It is a markup language because it describes data with tags, for example:
<Booking ID="12345">
<From>Agency XYZ</From>
B2B Example: Booking.com reservation
<bookings hotelId="12123">
  <booking id="JU3XPE" distributorId="1234" currency="EUR"
      date="2014-03-24T11:53:11" totalAmount="856.0000" paidAmount="0.0000"
      dueAmount="856.0000" origin="Booking.com" originId="9192"
      lastModificationDate="2014-03-24T11:53:11" paxCount="2">
    <rooms>
      <room id="38673" uniqueId="01" status="Create">
        <stays>
          <stay from="2014-06-13" to="2014-06-21" quantity="1"
              unitPrice="107.0000" rateCode="BAR3" />
        </stays>
        <guests>
          <guest firstName="Brian" lastName="Black" title="MR" ageRange="0" />
          <guest firstName="Carol" lastName="Black" title="MRS" ageRange="0" />
        </guests>
      </room>
    </rooms>
    <customer firstName="Brian" lastName="Black" title="MR">
      <contact email="[email protected]" phone="4412345667">
        <address country="GB" />
      </contact>
    </customer>
  </booking>
</bookings>
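To show how an application might consume a message like this reservation, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The XML string is a shortened, hypothetical version of the reservation (fewer attributes, no customer block), and the choice of Python is an assumption, not part of the course material.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the reservation message.
XML_TEXT = """
<bookings hotelId="12123">
  <booking id="JU3XPE" currency="EUR" totalAmount="856.0000" paxCount="2">
    <rooms>
      <room id="38673">
        <stays>
          <stay from="2014-06-13" to="2014-06-21" unitPrice="107.0000" />
        </stays>
        <guests>
          <guest firstName="Brian" lastName="Black" />
          <guest firstName="Carol" lastName="Black" />
        </guests>
      </room>
    </rooms>
  </booking>
</bookings>
"""

root = ET.fromstring(XML_TEXT)
for booking in root.findall("booking"):
    # Attribute values are always strings; convert them where needed.
    total = float(booking.get("totalAmount"))
    # ".//guest" searches every descendant of the booking element.
    guests = [g.get("firstName") for g in booking.findall(".//guest")]
    stay = booking.find(".//stay")
    print(booking.get("id"), total, guests, stay.get("from"), stay.get("to"))
```

The nesting of elements carries the relationships (a booking has rooms, a room has stays and guests), while the attributes carry the values.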
JSON
What is JSON
§ JSON stands for JavaScript Object Notation
§ A lightweight data-interchange format
§ A syntax for storing and exchanging data
§ An easier-to-use alternative to XML
§ Based on a subset of JavaScript syntax
§ A “text file” of type .json
§ Open standard described at http://www.json.org
Data types
§ Number: integer or floating point
§ String: any kind of text
§ Boolean: true or false value
§ Null: an empty value (no value)
§ Object: a collection of key-value pairs, separated by commas and
enclosed in curly brackets
§ Array: an ordered sequence of values, separated by commas and
enclosed in square brackets
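As an illustration of these data types, the sketch below parses a small hypothetical JSON document with Python's standard `json` module and reports which Python type each value becomes. The keys and values are invented for the example.

```python
import json

# A hypothetical document that uses every JSON data type once.
JSON_TEXT = """
{
  "id": 1234,
  "price": 107.5,
  "name": "John",
  "paid": false,
  "notes": null,
  "guest": {"firstName": "Brian", "lastName": "Black"},
  "nights": [13, 14, 15]
}
"""

data = json.loads(JSON_TEXT)
# Map each key to the name of the Python type the parser produced.
type_names = {key: type(value).__name__ for key, value in data.items()}
print(type_names)
```

Numbers become `int` or `float`, strings `str`, booleans `bool`, null `None`, objects `dict`, and arrays `list`.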
Example: JSON vs. XML
JSON:
{
  "bookings": [
    {"id": 1234,
     "name": "John",
     "arrival": "2018-02-01",
     "nights": 3
    },
    {"id": 1235,
     "name": "Mary",
     "arrival": "2018-02-02",
     "nights": 4}
  ]
}
XML:
<bookings>
  <booking>
    <id>1234</id>
    <name>John</name>
    <arrival>2018-02-01</arrival>
    <nights>3</nights>
  </booking>
  <booking>
    <id>1235</id>
    <name>Mary</name>
    <arrival>2018-02-02</arrival>
    <nights>4</nights>
  </booking>
</bookings>
JSON vs. XML
JSON
§ Simpler to read and write
§ Supports array data structures
§ Easily read by “humans”
§ Supports structured data through arrays and objects
§ Native object support
XML
§ Does not support arrays
§ Tends to be more complicated for “human” reading
§ Relies on XML schemas to define data types
§ Objects are expressed by conventions (attributes and elements)
3.3 Databases and SQL
Introduction
What is a database
§A database is an organized collection of data, composed of tables,
queries, reports, views, and other objects
Database management systems (DBMS)
A DBMS is the software that interfaces with applications, users, and
the database itself to capture and analyze data
DBMS types: relational and non-relational
Relational DBMS
Relational database
Entity: Something of interest to the database users (e.g., customers, products, etc.)
Row (record): A set of columns that together describe an entity or some action on an entity
Relational database
Table: A set of rows, held either in memory (nonpersistent) or on permanent storage (persistent)

Customer
id  fname   lname
1   George  Blake
2   Sue     Smith

Order
id  date        cust_id
1   2020/01/01  1
2   2020/01/02  1
3   2020/01/02  2

Order item
ord_id  prod_id  qty
1       1        1
1       2        1
1       4        2
2       2        1
2       3        1
3       1        1
3       2        2
3       3        1
3       4        3

Product
id  name     cat_id
1   Sugar    1
2   Beer     2
3   Wine     2
4   Cookies  1

Category
id  name
1   Food
2   Beverages
Relational database
Primary key (PK): One or more columns that can be used as a unique identifier for each row in a table
Foreign key (FK): One or more columns that can be used to identify a single row in another table
Keys in the example tables:
§ Customer: PK id
§ Order: PK id; FK cust_id (references Customer)
§ Order item: PK ord_id + prod_id; FK ord_id (references Order); FK prod_id (references Product)
§ Product: PK id; FK cat_id (references Category)
§ Category: PK id
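The two definitions can be tried out with a small sketch. It assumes Python's standard `sqlite3` module and an in-memory SQLite database, which is not the DBMS used later in the exercises. The table name `orders` replaces "Order" (a reserved word in SQL), and the helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only checks foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    fname TEXT,
    lname TEXT
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    date    TEXT,
    cust_id INTEGER REFERENCES customer(id)
);
INSERT INTO customer VALUES (1, 'George', 'Blake'), (2, 'Sue', 'Smith');
INSERT INTO orders VALUES (1, '2020-01-01', 1), (2, '2020-01-02', 1),
                          (3, '2020-01-02', 2);
""")


def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


# A second customer with id 1 violates the primary key.
duplicate = try_insert("INSERT INTO customer VALUES (1, 'Ann', 'Lee')")
# An order for customer 99 violates the foreign key: no such customer exists.
orphan = try_insert("INSERT INTO orders VALUES (4, '2020-01-03', 99)")
valid = try_insert("INSERT INTO orders VALUES (4, '2020-01-03', 2)")
print(duplicate, orphan, valid, sep="\n")
```

The primary key guarantees that each customer row can be identified; the foreign key guarantees that every order points at a customer that actually exists.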
Structured Query Language (SQL)
SQL
Standard language for querying and manipulating data
§ Developed in the 1970s by IBM
§ Several ANSI standards have been introduced since 1986
§ Vendors support different subsets
Benefits of a standardized language
§Reduces training costs
§Improves productivity
§Application portability
§Application longevity
§Reduces vendor dependency
§Cross-systems communication
SQL components
§ Schema: The structure that contains the descriptions of objects
(tables, views, constraints, etc.)
§ Catalog: A set of schemas that constitute the description of a
database
§ Data Definition Language (DDL): Commands that define the
database: create, alter, and drop tables and their attributes
§ Data Manipulation Language (DML): Commands to maintain and
query a database
§ Data Control Language (DCL): Commands that control a
database, including administering privileges and committing data
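A rough sketch of how the DDL and DML categories look in practice, assuming an in-memory SQLite database driven from Python's `sqlite3` module. The `product` table and its rows are illustrative only. SQLite has no user privileges, so privilege commands such as GRANT cannot be shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and then alter the structure of a table.
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE product ADD COLUMN cat_id INTEGER")

# DML: maintain the rows and query them.
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "Sugar", 1), (2, "Beer", 2), (3, "Wine", 2)])
conn.execute("UPDATE product SET name = 'Red wine' WHERE id = 3")
conn.execute("DELETE FROM product WHERE id = 1")
conn.commit()  # makes the changes above permanent
rows = conn.execute("SELECT id, name, cat_id FROM product ORDER BY id").fetchall()
print(rows)

# The catalog can be queried too: sqlite_master lists the schema objects.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
```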
SQL query example
Get all customers whose last name is “Blake”
SELECT *
FROM Customer
WHERE lname='Blake'
Customer
id fname lname
1 George Blake
2 Sue Smith
Non-relational database
ACID:
§ Atomicity: a transaction is indivisible; either all of its operations take effect or none do
§ Consistency: a transaction can only change data in allowed ways
§ Isolation: determines which parts of the data are visible to other users while a transaction is in progress (e.g., purchase order headers become visible only after all lines are entered)
§ Durability: guarantees that all committed transactions will survive permanently
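Atomicity and consistency are easiest to see with a failing transaction. The sketch below is an assumption-laden illustration: it uses Python's `sqlite3` module, a hypothetical `account` table, and a hypothetical `transfer` function. The CHECK constraint plays the role of "allowed ways" to change data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(amount, source, target):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, target))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(30.0, 1, 2)       # both updates are applied
failed = transfer(500.0, 1, 2)  # would overdraw account 1, so both are undone
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 had already been executed when the debit violated the constraint; the rollback removes it again, so no partial transfer is ever stored.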
Exercise: create database
Online SQL
Instead of installing a local DBMS, we will use SQLiteonline, but there are
several alternatives:
§ https://sqliteonline.com: user-friendly interface for data science
(SQLite, MariaDB/MySQL, PostgreSQL, MSSQL)
§ https://www.tutorialspoint.com/execute_sql_online.php: SQLite
platform for coding
§ http://sqlfiddle.com: multi-SQL compatible platform for database
training
§ https://www.db-fiddle.com: user-friendly interface for multi-SQL
database training
§ If you prefer to install a local version, you could install, for example,
MySQL Community Edition (free) from
https://www.mysql.com/products/community
Database creation
1. With a text editor, open the script file
mysqlsampledatabase_noDBcreation.sql (if you
want to create a new DB in a local DBMS, use the
file mysqlsampledatabase.sql)
2. Read the SQL code in the script file. It not only
creates the tables, but also populates them with data
3. Open SQLiteonline, select MariaDB (compatible
with MySQL) and click on “Click to connect”. It will
open a database with a table named “Demo”
4. Click the “Import” button and select the file
mentioned in point 1. Then click “Ok” to run the
script.
5. All tables will be created and populated with data.
Entity Relationship Diagram (ERD)
MySQL Sample Database Diagram
Exercise: queries and operators
Reading data from a database
SELECT <list of columns>
FROM <table(s) or view(s) from which to read data>
WHERE <conditions to include rows>
GROUP BY <categorize results by>
HAVING <conditions under which a group will be included>
ORDER BY <criteria to sort results>
<...>
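The clause order of a SELECT statement can be seen in one small query. This is a sketch under stated assumptions: an in-memory SQLite database accessed through Python's `sqlite3` module, and a made-up `customers` table with seven invented rows (not the sample database used in the exercises).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, "
             "customerName TEXT, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Alpha", "Spain"), (2, "Beta", "Spain"), (3, "Gamma", "USA"),
    (4, "Delta", "USA"), (5, "Epsilon", "USA"), (6, "Zeta", "France"),
    (7, "Eta", "Spain"),
])

query = """
SELECT country, COUNT(*) AS n      -- columns to return
FROM customers                     -- table to read
WHERE customerNumber > 1           -- filters rows before grouping
GROUP BY country                   -- one result row per country
HAVING COUNT(*) >= 2               -- filters groups after aggregation
ORDER BY n DESC, country           -- sorts the final result
"""
result = conn.execute(query).fetchall()
print(result)
```

WHERE removes customer 1 before any counting happens, while HAVING removes France afterwards because its group holds a single row.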
Selecting all columns
Select all columns from the table customers
select * from customers
Selecting specific columns
Select the contact's last name and the contact's first name from
the table customers
select contactLastName, contactFirstName
from customers
Selecting unique values from a column
§ Select dates when there were orders made by customers
§ Only show each date once (even if there is more than one
order on the date)
select distinct orderDate from orders
Selecting rows (1/2)
Select all columns from customers from Spain
select * from customers where country='Spain'
Selecting rows (2/2)
Select all columns from customers who have a customerNumber
smaller than 125
select * from customers where customerNumber<125
Selecting rows with “Like”
Select all columns from customers where the customerName
includes the letters “yo” in any part of the name
select * from customers where customerName like '%yo%'
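A short sketch of how the LIKE wildcards behave, assuming Python's `sqlite3` module and four invented customer names. The helper `names_like` is hypothetical. In SQLite, as with the default settings in MariaDB/MySQL, LIKE ignores letter case for ASCII text, which is why "York" matches `%yo%`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerName TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [
    ("Tokyo Collectables",), ("Toys of York",), ("Young Models",), ("Mini Wheels",),
])


def names_like(pattern):
    """Return the customer names matching a LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT customerName FROM customers WHERE customerName LIKE ? "
        "ORDER BY customerName", (pattern,))
    return [r[0] for r in rows]


anywhere = names_like("%yo%")  # % matches any run of characters, even none
prefix = names_like("yo%")     # the name must start with "yo"
second = names_like("_i%")     # _ matches exactly one character
print(anywhere, prefix, second, sep="\n")
```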
Selecting rows with logic operators (1/4)
Selecting rows with logic operators (2/4)
Select all columns from customers from the “USA” with a
customerNumber smaller than 125
select * from customers
where customerNumber<125 and country='USA'
Selecting rows with logic operators (3/4)
Select all columns from customers from the “USA” or with a
customerNumber smaller than 125
select * from customers
where customerNumber<125 or country='USA'
Selecting rows with logic operators (4/4)
Select all columns from customers from the “USA” with a
customerNumber smaller than 125, or customers from Spain,
regardless of their number
select * from customers
where (customerNumber<125 and country='USA')
or country='Spain'
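The effect of AND, OR, and parentheses can be checked on a handful of rows. This sketch assumes Python's `sqlite3` module; the five rows and the helper `numbers` are invented for the illustration. The helper pastes fixed condition strings into the query, which is acceptable here only because no user input is involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerNumber INTEGER, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [
    (103, "France"), (112, "USA"), (124, "USA"), (141, "Spain"), (151, "USA"),
])


def numbers(where):
    """Return the customerNumber values selected by a WHERE condition."""
    sql = f"SELECT customerNumber FROM customers WHERE {where} ORDER BY customerNumber"
    return [r[0] for r in conn.execute(sql)]


both = numbers("customerNumber < 125 AND country = 'USA'")
either = numbers("customerNumber < 125 OR country = 'USA'")
grouped = numbers("(customerNumber < 125 AND country = 'USA') OR country = 'Spain'")
# AND binds more tightly than OR, so dropping the parentheses changes nothing...
ungrouped = numbers("customerNumber < 125 AND country = 'USA' OR country = 'Spain'")
# ...but moving them does change the result.
regrouped = numbers("customerNumber < 125 AND (country = 'USA' OR country = 'Spain')")
print(both, either, grouped, ungrouped, regrouped, sep="\n")
```

Writing the parentheses explicitly, as in the exercise, keeps the intent readable even where they are not strictly required.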
Sorting results (1/2)
Select customerName and country from customers in
ascending order of the customerName
select customerName, country
from customers order by customerName
Sorting results (2/2)
Select customerName and country from customers in
descending order of the customerName
select customerName, country
from customers order by customerName desc
Join selections (1/2)
§ Find all customers who made orders before 2003-02-01
§ Return customer name, order number, and order date
select
customers.customerName,
orders.orderNumber,
orders.orderDate
from orders
inner join customers on
customers.customerNumber=orders.customerNumber
where orderDate<'2003-02-01'
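To see what the inner join returns, the sketch below builds two tiny tables with invented rows and runs an equivalent query. It assumes Python's `sqlite3` module and SQLite, where dates are stored as ISO-formatted text and therefore compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, orderDate TEXT,
                     customerNumber INTEGER REFERENCES customers(customerNumber));
INSERT INTO customers VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO orders VALUES (100, '2003-01-06', 1), (101, '2003-01-29', 2),
                          (102, '2003-02-11', 1);
""")

rows = conn.execute("""
    SELECT c.customerName, o.orderNumber, o.orderDate
    FROM orders o
    INNER JOIN customers c ON c.customerNumber = o.customerNumber
    WHERE o.orderDate < '2003-02-01'
    ORDER BY o.orderNumber
""").fetchall()
print(rows)
```

Customer Gamma has no orders and so never appears: an inner join keeps only the rows that have a match on both sides.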
Join selections (2/2)
source: imgur.com
Exercise: aggregate functions
COUNT (1/2)
Count the number of customers from Spain
select count(*) from customers
where country='Spain'
COUNT (2/2)
Count the number of distinct cities in Spain that customers are from
select count(DISTINCT city)
from customers where country='Spain'
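The three forms of COUNT differ in what they count, which a few invented rows make visible. A minimal sketch, assuming Python's `sqlite3` module; one Spanish customer deliberately has no city recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerName TEXT, city TEXT, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    ("Alpha", "Madrid", "Spain"), ("Beta", "Madrid", "Spain"),
    ("Gamma", "Sevilla", "Spain"), ("Delta", None, "Spain"),
    ("Epsilon", "Lisboa", "Portugal"),
])

sql = """
SELECT COUNT(*), COUNT(city), COUNT(DISTINCT city)
FROM customers
WHERE country = 'Spain'
"""
all_rows, with_city, distinct_cities = conn.execute(sql).fetchone()
print(all_rows, with_city, distinct_cities)
```

`COUNT(*)` counts rows, `COUNT(city)` skips rows where city is NULL, and `COUNT(DISTINCT city)` counts each city once.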
SUM, MAX, MIN, and AVG
What is the total value of the orders made?
select
sum(od.priceEach*od.quantityOrdered) as totalPrice
from orderdetails od
Aggregation with subquery
§ Find the products whose buy price is higher than the average
buy price
§ Return the product code, product name, and buy price,
ordered by buy price (descending)
select productCode, productName, buyPrice
from products
where buyPrice>(
select avg(buyPrice)
from products)
order by buyPrice desc
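The subquery pattern can be traced step by step on four invented products. This sketch assumes Python's `sqlite3` module; the product codes, names, and prices are illustrative and not taken from the sample database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (productCode TEXT PRIMARY KEY, "
             "productName TEXT, buyPrice REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    ("P1", "Car", 10.0), ("P2", "Plane", 20.0),
    ("P3", "Ship", 40.0), ("P4", "Train", 50.0),
])

# Run on its own, the inner query yields a single value: the average price.
average = conn.execute("SELECT AVG(buyPrice) FROM products").fetchone()[0]

# Used as a subquery, that value becomes the threshold of the outer WHERE.
rows = conn.execute("""
    SELECT productCode, productName, buyPrice
    FROM products
    WHERE buyPrice > (SELECT AVG(buyPrice) FROM products)
    ORDER BY buyPrice DESC
""").fetchall()
print(average, rows)
```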
GROUP BY (1/2)
§ Get the total ordered, date of first order, and the date of last
order, per customer
§ Return customer number, customer name, total ordered,
date of first order, and the date of the last order
select c.customerNumber,
c.customerName,
sum(od.priceEach*od.quantityOrdered) as totalOrdered,
min(o.orderDate) as dateFirstOrder,
max(o.orderDate) as dateLastOrder
from orderdetails od
inner join orders o on
o.orderNumber=od.orderNumber
inner join customers c on
c.customerNumber=o.customerNumber
group by c.customerNumber
order by c.customerName
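A worked miniature of grouping over joined tables, assuming Python's `sqlite3` module and invented rows (two customers, three orders, four order lines). The query also lists `customerName` in GROUP BY, a small portability assumption for databases that require every non-aggregated column to appear there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, orderDate TEXT,
                     customerNumber INTEGER);
CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                           quantityOrdered INTEGER, priceEach REAL);
INSERT INTO customers VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO orders VALUES (100, '2003-01-06', 1), (101, '2003-03-10', 1),
                          (102, '2003-02-01', 2);
INSERT INTO orderdetails VALUES (100, 'P1', 2, 10.0), (100, 'P2', 1, 5.0),
                                (101, 'P1', 1, 10.0), (102, 'P3', 4, 2.5);
""")

rows = conn.execute("""
    SELECT c.customerNumber, c.customerName,
           SUM(od.priceEach * od.quantityOrdered) AS totalOrdered,
           MIN(o.orderDate) AS firstOrder,
           MAX(o.orderDate) AS lastOrder
    FROM orderdetails od
    INNER JOIN orders o ON o.orderNumber = od.orderNumber
    INNER JOIN customers c ON c.customerNumber = o.customerNumber
    GROUP BY c.customerNumber, c.customerName
    ORDER BY c.customerName
""").fetchall()
for row in rows:
    print(row)
```

The joins first produce one row per order line; GROUP BY then collapses those rows into one per customer, and each aggregate is computed within its group.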
GROUP BY (2/2)
§ Get the number of orders, total ordered, average per order,
and average per order line, per customer
§ Return customer number, customer name, number of orders,
total ordered, average per order, and average per order line
select c.customerNumber,
c.customerName,
count(DISTINCT o.orderNumber) as orders,
sum(od.priceEach*od.quantityOrdered) as totalOrdered,
sum(od.priceEach*od.quantityOrdered)/count(DISTINCT o.orderNumber) as avgPerOrder,
avg(od.priceEach*od.quantityOrdered) as avgPerOrderLine
from orders o
inner join orderdetails od on
od.orderNumber=o.orderNumber
inner join customers c on
c.customerNumber=o.customerNumber
group by c.customerNumber
order by c.customerName
Combining LEFT JOIN and aggregation
§ Get the number of orders, number of products, number of
distinct products, and total quantities, per customer and
year (even for customers with no orders)
§ Return customer name, year, number of orders, number of
products, number of distinct products, total quantity
select customers.customerName as Name,
ifnull(year(orders.orderDate),0) as Year,
count(DISTINCT orders.orderNumber) as Orders,
count(orderdetails.productCode) as ProductsOrdered,
count(DISTINCT orderdetails.productCode) as DistinctProductsOrdered,
ifnull(SUM(orderdetails.quantityOrdered),0) as QtyOrdered
from customers
left join orders on
orders.customerNumber=customers.customerNumber
left join orderdetails on
orderdetails.orderNumber=orders.orderNumber
group by customers.customerName, year(orders.orderDate)
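Two details of this pattern are easy to miss: a LEFT JOIN keeps customers with no orders, and counting the order number after joining to the order lines counts lines unless DISTINCT is used. The sketch below shows both on invented rows. It assumes Python's `sqlite3` module, and because SQLite has no `year()` function, `strftime('%Y', ...)` stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, orderDate TEXT,
                     customerNumber INTEGER);
CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                           quantityOrdered INTEGER);
INSERT INTO customers VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO orders VALUES (100, '2003-01-06', 1);
INSERT INTO orderdetails VALUES (100, 'P1', 2), (100, 'P2', 3);
""")

rows = conn.execute("""
    SELECT c.customerName,
           IFNULL(strftime('%Y', o.orderDate), '0') AS orderYear,
           COUNT(o.orderNumber) AS orderLines,          -- one per joined row
           COUNT(DISTINCT o.orderNumber) AS orders,     -- one per order
           IFNULL(SUM(od.quantityOrdered), 0) AS qtyOrdered
    FROM customers c
    LEFT JOIN orders o ON o.customerNumber = c.customerNumber
    LEFT JOIN orderdetails od ON od.orderNumber = o.orderNumber
    GROUP BY c.customerName, strftime('%Y', o.orderDate)
    ORDER BY c.customerName
""").fetchall()
for row in rows:
    print(row)
```

Customer Beta has no orders, yet still appears with zero counts; with an INNER JOIN that row would be missing from the result.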
Data Science for Marketing
© 2021-2024 Nuno António (Rev. 2024-08-26)
Instituto Superior de Estatística e Gestão da Informação
Universidade Nova de Lisboa