Date - Time Functions
______________________________________________________________________________
Problem Statement:
You are a Data Analyst at Amazon Fresh. You have been tasked with studying
the Farmer’s Market.
______________________________________________________________________________
So far…
● You now know how to aggregate data, group data, use JOINs to get
data from multiple tables, and do windowed calculations.
But
● We haven’t yet dealt with date and time data types. Handling temporal
data is one of the most important and common problems data scientists
face.
______________________________________________________________________________
DateTime format -
EXTRACT()
Question: What if you only want to see the hour at which the
market started and ended on each date?
For BigQuery:
SELECT
EXTRACT(HOUR FROM market_start_datetime) AS start_hr,
EXTRACT(HOUR FROM market_end_datetime) AS end_hr
FROM
`farmers_market.datetime_demo`;
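The question asks for the hours on each date, but the query above returns only the hours, so the rows cannot be matched to their dates. A minimal variation, sketched under the assumption that you also want the date on each row, adds it with the DATE() function and sorts by it:

```sql
-- DATE() drops the time portion, leaving the market date for each row
SELECT
  DATE(market_start_datetime) AS market_date,
  EXTRACT(HOUR FROM market_start_datetime) AS start_hr,
  EXTRACT(HOUR FROM market_end_datetime) AS end_hr
FROM
  `farmers_market.datetime_demo`
ORDER BY
  market_date;
```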
______________________________________________________________________________
For BigQuery:
SELECT
market_start_datetime,
EXTRACT(DATE FROM market_start_datetime) AS date,
EXTRACT(TIME FROM market_start_datetime) AS time,
EXTRACT(YEAR FROM market_start_datetime) AS year_no,
EXTRACT(QUARTER FROM market_start_datetime) AS q_no,
EXTRACT(MONTH FROM market_start_datetime) AS month_no,
EXTRACT(DAY FROM market_start_datetime) AS day_no,
EXTRACT(WEEK FROM market_start_datetime) AS week_no, -- weeks start on Sunday
EXTRACT(DAYOFWEEK FROM market_start_datetime) AS week_day, -- 1 = Sunday ... 7 = Saturday
EXTRACT(HOUR FROM market_start_datetime) AS hr,
EXTRACT(MINUTE FROM market_start_datetime) AS minute,
EXTRACT(SECOND FROM market_start_datetime) AS second
FROM farmers_market.datetime_demo;
Note: Depending on the database system you are using, the function that
retrieves different portions of a datetime value may be called EXTRACT
(MySQL, Oracle, and BigQuery), DATE_PART (Redshift and PostgreSQL), or
DATEPART (SQL Server).
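To see what each date part returns, it can help to run EXTRACT on a single known value rather than on a table column. This sketch uses a hypothetical literal, DATETIME '2019-04-03 08:15:30' (a Wednesday), that is not taken from the dataset; the comments show the values that BigQuery's rules for these parts should give:

```sql
-- Hypothetical literal value, not from the dataset: Wednesday, April 3, 2019, 08:15:30
SELECT
  EXTRACT(YEAR FROM dt) AS year_no,        -- 2019
  EXTRACT(QUARTER FROM dt) AS q_no,        -- 2
  EXTRACT(MONTH FROM dt) AS month_no,      -- 4
  EXTRACT(DAY FROM dt) AS day_no,          -- 3
  EXTRACT(WEEK FROM dt) AS week_no,        -- 13 (weeks start on Sunday; days before the first Sunday are week 0)
  EXTRACT(DAYOFWEEK FROM dt) AS week_day,  -- 4 (1 = Sunday ... 7 = Saturday)
  EXTRACT(HOUR FROM dt) AS hr,             -- 8
  EXTRACT(MINUTE FROM dt) AS minute_no,    -- 15
  EXTRACT(SECOND FROM dt) AS second_no     -- 30
FROM (SELECT DATETIME '2019-04-03 08:15:30' AS dt);
```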
______________________________________________________________________________
Question: For each market date, your manager wants to see which day of
the week and which month it was. Display the names of the day and the
month.
For BigQuery:
SELECT
market_start_datetime,
FORMAT_DATETIME("%B", market_start_datetime) as
mktsrt_month_name,
FORMAT_DATETIME("%A", market_start_datetime) as
mktsrt_day_name
FROM
`farmers_market.datetime_demo`
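The %B and %A elements return full names. As a small sketch on a hypothetical literal value, %b and %a give abbreviated names, and several elements can be combined into one label. If your column is a TIMESTAMP rather than a DATETIME, FORMAT_TIMESTAMP accepts the same format elements.

```sql
-- Hypothetical literal value, not from the dataset
SELECT
  FORMAT_DATETIME("%b", dt) AS month_abbr,       -- Apr
  FORMAT_DATETIME("%a", dt) AS day_abbr,         -- Wed
  FORMAT_DATETIME("%A, %B %d, %Y", dt) AS label  -- Wednesday, April 03, 2019
FROM (SELECT DATETIME '2019-04-03 08:15:30' AS dt);
```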
______________________________________________________________________________
We also have shortcuts for extracting the entire date and entire time from
the datetime field, so you don’t have to extract each part and re-concatenate
it together.
For BigQuery:
SELECT
market_start_datetime,
DATE(market_start_datetime) AS mktsrt_date,
TIME(market_start_datetime) AS mktsrt_time
FROM
`farmers_market.datetime_demo`
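Going the other way, BigQuery’s DATETIME() function can combine a DATE and a TIME into a single DATETIME. This sketch assumes market_start_datetime is a DATETIME column (not a TIMESTAMP); splitting it and rebuilding it should give back the original value:

```sql
-- Assumes market_start_datetime is a DATETIME column
SELECT
  market_start_datetime,
  DATETIME(DATE(market_start_datetime), TIME(market_start_datetime)) AS rebuilt_datetime,
  -- Expected to be true on every row with a non-NULL start datetime
  DATETIME(DATE(market_start_datetime), TIME(market_start_datetime))
    = market_start_datetime AS same_value
FROM
  `farmers_market.datetime_demo`;
```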
______________________________________________________________________________
The powerful thing about storing string dates as datetime values (or
converting them using SQL) is that you can do date calculations, which is not
possible when they are stored as numbers, punctuation, and letters in a
string field.
Here, we’ll use the market_start_datetime and market_end_datetime fields
to demonstrate.
We can use SQL to add 30 minutes to the start time by passing the datetime,
the number of units we want to add, and the interval unit (minutes, in this
case) to the DATE_ADD function:
SELECT market_start_datetime,
DATE_ADD(market_start_datetime, INTERVAL 30 MINUTE) AS
mktstrt_date_plus_30min
FROM farmers_market.datetime_demo;
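BigQuery also has type-specific versions of this function: DATETIME_ADD for DATETIME values and TIMESTAMP_ADD for TIMESTAMP values, both taking the same arguments. Assuming market_start_datetime is a DATETIME column, a sketch with DATETIME_ADD, with a second interval unit for comparison, looks like this:

```sql
-- Assumes market_start_datetime is a DATETIME column
SELECT
  market_start_datetime,
  DATETIME_ADD(market_start_datetime, INTERVAL 30 MINUTE) AS mktstrt_date_plus_30min,
  DATETIME_ADD(market_start_datetime, INTERVAL 2 HOUR) AS mktstrt_date_plus_2hr
FROM farmers_market.datetime_demo;
```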
Note:
● If we instead wanted to do a calculation that required looking 30 days
past a date (for example, to determine whether a customer made a second
purchase within 30 days of their first purchase), we could change the
interval unit from MINUTE to DAY and add 30 days instead:
SELECT market_start_datetime,
DATE_ADD(market_start_datetime, INTERVAL 30 DAY) AS
mktstrt_date_plus_30days
FROM farmers_market.datetime_demo;
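The customer example mentioned in the note can be sketched with the customer_purchases table used later in this section, assuming market_date is a DATE column. The CTE names customer_dates and first_purchase are hypothetical, chosen for illustration; the query finds each customer’s first purchase date and flags whether any later purchase date falls within 30 days of it:

```sql
-- Hypothetical CTE names; assumes market_date is a DATE column
WITH customer_dates AS (
  -- One row per customer per market date
  SELECT DISTINCT customer_id, market_date
  FROM farmers_market.customer_purchases
),
first_purchase AS (
  -- Each customer's earliest market date
  SELECT customer_id, MIN(market_date) AS first_purchase_date
  FROM customer_dates
  GROUP BY customer_id
)
SELECT
  f.customer_id,
  f.first_purchase_date,
  -- TRUE if at least one later purchase date is within 30 days of the first one
  COUNTIF(
    d.market_date > f.first_purchase_date
    AND d.market_date <= DATE_ADD(f.first_purchase_date, INTERVAL 30 DAY)
  ) > 0 AS returned_within_30_days
FROM first_purchase AS f
JOIN customer_dates AS d
  ON d.customer_id = f.customer_id
GROUP BY f.customer_id, f.first_purchase_date
ORDER BY f.customer_id;
```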
The following query demonstrates that using DATE_ADD() to add -30 days to
a date has the same effect as using DATE_SUB() to subtract 30 days from a
date:
SELECT market_start_datetime,
DATE_ADD(market_start_datetime, INTERVAL -30 DAY) AS
mktstrt_date_plus_neg30days,
DATE_SUB(market_start_datetime, INTERVAL 30 DAY) AS
mktstrt_date_minus_30days
FROM farmers_market.datetime_demo;
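A quick way to confirm the equivalence without touching a table is to compare both expressions over a generated range of dates. This sketch checks every day of 2019 (an arbitrary choice of range) and returns true when the two expressions agree on all of them:

```sql
-- Compare DATE_ADD(-30 days) and DATE_SUB(30 days) for every day of 2019
SELECT
  COUNTIF(DATE_ADD(d, INTERVAL -30 DAY) = DATE_SUB(d, INTERVAL 30 DAY))
    = COUNT(*) AS always_equal  -- true
FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2019-01-01', DATE '2019-12-31')) AS d;
```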
______________________________________________________________________
DATE_DIFF()
Question: Find the number of days between the first and last
market dates.
For BigQuery:
SELECT
x.first_market,
x.last_market,
-- DATE() turns each datetime into a date before DATE_DIFF counts the days
DATE_DIFF(DATE(x.last_market), DATE(x.first_market), DAY) AS days_first_to_last
FROM
(
SELECT
MIN(market_start_datetime) AS first_market,
MAX(market_start_datetime) AS last_market
FROM farmers_market.datetime_demo
) x;
Here, the inner query (by which I mean the query inside parentheses, aliased
“x”) returns the first and last market dates from the datetime_demo table,
and the outer query (which is selecting from “x”) calculates the difference
between those two dates in days using DATE_DIFF.
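DATE_DIFF subtracts its second argument from its first, so the argument order determines the sign, and the last argument sets the unit. A sketch on two hypothetical dates:

```sql
-- Hypothetical dates, not from the dataset
SELECT
  DATE_DIFF(DATE '2019-12-31', DATE '2019-04-03', DAY) AS days_diff,      -- 272
  DATE_DIFF(DATE '2019-12-31', DATE '2019-04-03', MONTH) AS months_diff,  -- 8
  DATE_DIFF(DATE '2019-04-03', DATE '2019-12-31', DAY) AS reversed_diff;  -- -272
```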
____________________________________________________________________________
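The DATE_DIFF function works on DATE values. For TIMESTAMP values, BigQuery
provides TIMESTAMP_DIFF, which returns the difference between two timestamps
in a chosen unit, from microseconds up to days. For DATETIME values, the
matching function is DATETIME_DIFF, which also supports larger units such as
WEEK, MONTH, and YEAR.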
Here, we calculate the hours and minutes between the market start and
end times on each market date.
For BigQuery:
SELECT market_start_datetime, market_end_datetime,
TIMESTAMP_DIFF(market_end_datetime, market_start_datetime, HOUR)
AS market_duration_hours,
TIMESTAMP_DIFF(market_end_datetime, market_start_datetime, MINUTE)
AS market_duration_mins
FROM farmers_market.datetime_demo;
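Assuming market_start_datetime and market_end_datetime are DATETIME columns, the same durations can be computed with DATETIME_DIFF, which takes the same three arguments. Both functions return a whole number of the chosen unit; dividing the minute difference by 60 (the / operator returns FLOAT64 in BigQuery) gives fractional hours. The second query uses hypothetical literal values:

```sql
-- Assumes both columns are DATETIME values
SELECT
  market_start_datetime,
  market_end_datetime,
  DATETIME_DIFF(market_end_datetime, market_start_datetime, HOUR) AS market_duration_hours,
  DATETIME_DIFF(market_end_datetime, market_start_datetime, MINUTE) AS market_duration_mins,
  -- Fractional hours, e.g. 5.5 for a five-and-a-half-hour market
  DATETIME_DIFF(market_end_datetime, market_start_datetime, MINUTE) / 60 AS market_duration_hours_decimal
FROM farmers_market.datetime_demo;

-- Hypothetical literal values
SELECT
  DATETIME_DIFF(DATETIME '2019-04-03 13:00:00', DATETIME '2019-04-03 08:00:00', HOUR) AS hrs,    -- 5
  DATETIME_DIFF(DATETIME '2019-04-03 13:00:00', DATETIME '2019-04-03 08:00:00', MINUTE) AS mins; -- 300
```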
____________________________________________________________________________
PARSE_DATETIME()
For BigQuery:
SELECT *,
PARSE_DATETIME("%Y-%m-%d %H:%M:%S", CONCAT(market_date," ",
transaction_time)) AS market_date_trans_time
FROM farmers_market.customer_purchases;
● PARSE_DATETIME("%Y-%m-%d %H:%M:%S", CONCAT(market_date," ",
transaction_time)):
○ CONCAT joins the market_date and transaction_time values, with a
space between them, into a single string of the form
"YYYY-MM-DD HH:MM:SS".
○ PARSE_DATETIME then parses this string into a DATETIME value using
the specified format.
● %Y-%m-%d %H:%M:%S is the format used to describe how the datetime
string should be interpreted.
○ It breaks down as follows:
■ %Y: Year with century (e.g., 2023)
■ %m: Month (01-12)
■ %d: Day of the month (01-31)
■ %H: Hour (00-23)
■ %M: Minute (00-59)
■ %S: Second (00-59)
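These format elements can be tried on hypothetical string values before applying them to table columns. The format string must describe the text exactly: the second expression below uses a month/day/year layout, and fields left out of the format (here, seconds) default to zero. The SAFE. prefix makes PARSE_DATETIME return NULL instead of raising an error when a string does not match, which helps with messy text columns.

```sql
-- Hypothetical string values, not from the dataset
SELECT
  PARSE_DATETIME("%Y-%m-%d %H:%M:%S", "2019-04-03 08:15:30") AS parsed_iso,        -- 2019-04-03T08:15:30
  PARSE_DATETIME("%m/%d/%Y %H:%M", "04/03/2019 08:15") AS parsed_us_style,          -- 2019-04-03T08:15:00
  -- The string does not match the format, so SAFE. returns NULL instead of an error
  SAFE.PARSE_DATETIME("%Y-%m-%d %H:%M:%S", "04/03/2019 08:15") AS parsed_mismatch;  -- NULL
```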
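____________________________________________________________________________
The following query combines these functions to count, for each market date,
the purchases made no later than 30 minutes after the market’s start time.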
SELECT
market_date,
COUNT(*) AS num_sales
FROM (
SELECT
c.market_date,
time(PARSE_DATETIME('%I:%M %P', m.market_start_time)) AS
market_start_time,
time(PARSE_DATETIME('%I:%M %P', m.market_end_time)) AS
market_end_time,
time(c.transaction_time) AS transaction_time,
PARSE_DATETIME('%Y-%m-%d %I:%M %P',
CONCAT(c.market_date, " ", m.market_start_time )) AS
market_start_datetime,
PARSE_DATETIME('%Y-%m-%d %I:%M %P',
CONCAT(c.market_date, " ", m.market_end_time )) AS
market_end_datetime,
PARSE_DATETIME('%Y-%m-%d %H:%M:%S',
CONCAT(c.market_date, " ", c.transaction_time )) AS
market_date_transaction_time,
c.product_id,
c.vendor_id,
c.customer_id,
c.quantity,
c.cost_to_customer_per_qty
FROM farmers_market.customer_purchases c
LEFT JOIN farmers_market.market_date_info m
ON c.market_date = m.market_date)
WHERE market_date_transaction_time <=
DATETIME_ADD(market_start_datetime, INTERVAL 30 MINUTE) -- both sides are DATETIME values from PARSE_DATETIME
GROUP BY market_date
ORDER BY market_date;
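The same logic can be written with a common table expression (CTE), which keeps the parsing separate from the filtering. This is a sketch rather than a drop-in replacement: it reuses the parsing expressions from the query above, so it assumes the same column formats; it keeps only the columns the count needs; and it uses BETWEEN, so any transactions recorded before the start time are excluded. It also uses an inner JOIN, since in the original, rows without a matching market_date_info row get a NULL start time and are dropped by the WHERE clause anyway. The CTE name purchases is hypothetical.

```sql
-- Sketch: purchases made between each market's start time and 30 minutes later.
-- Reuses the parsing expressions from the query above (same column formats assumed).
WITH purchases AS (
  SELECT
    c.market_date,
    PARSE_DATETIME('%Y-%m-%d %I:%M %P',
      CONCAT(c.market_date, " ", m.market_start_time)) AS market_start_datetime,
    PARSE_DATETIME('%Y-%m-%d %H:%M:%S',
      CONCAT(c.market_date, " ", c.transaction_time)) AS transaction_datetime
  FROM farmers_market.customer_purchases AS c
  JOIN farmers_market.market_date_info AS m
    ON c.market_date = m.market_date
)
SELECT
  market_date,
  COUNT(*) AS num_sales
FROM purchases
-- From opening time up to and including 30 minutes later
WHERE transaction_datetime BETWEEN market_start_datetime
  AND DATETIME_ADD(market_start_datetime, INTERVAL 30 MINUTE)
GROUP BY market_date
ORDER BY market_date;
```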
____________________________________________________________________________
In this section, we’ll explore a few ways that you can use date functions when
summarizing data.
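For example, for each customer we can get the first and last purchase dates,
the number of distinct purchase dates, and the number of days between the
first and last purchase.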
For BigQuery:
SELECT customer_id,
MIN(market_date) AS first_purchase,
MAX(market_date) AS last_purchase,
COUNT(DISTINCT market_date) AS count_of_purchase_dates,
DATE_DIFF(MAX(market_date), MIN(market_date), day) AS
days_between_first_last_purchase
FROM farmers_market.customer_purchases
GROUP BY customer_id
Question: What if we also wanted to know how long it has been since each
customer last made a purchase?
For BigQuery:
SELECT customer_id,
MIN(market_date) AS first_purchase,
MAX(market_date) AS last_purchase,
COUNT(DISTINCT market_date) AS count_of_purchase_dates,
DATE_DIFF(MAX(market_date), MIN(market_date), day) AS
days_between_first_last_purchase,
DATE_DIFF(CURRENT_DATE(), MAX(market_date), day) AS
days_since_last_purchase
FROM farmers_market.customer_purchases
GROUP BY customer_id
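CURRENT_DATE() measures recency as of the day the query runs, which produces very large gaps if the data you are analyzing ends well before today. To reproduce an analysis as of a fixed day, you can substitute a date literal; the date below is an assumption borrowed from the coupon question later in this section, and the WHERE clause ignores purchases after it:

```sql
-- Recency as of a fixed day; DATE '2019-05-31' is an assumed "as of" date
SELECT
  customer_id,
  MAX(market_date) AS last_purchase,
  DATE_DIFF(DATE '2019-05-31', MAX(market_date), DAY) AS days_since_last_purchase
FROM farmers_market.customer_purchases
WHERE market_date <= DATE '2019-05-31'
GROUP BY customer_id
ORDER BY customer_id;
```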
____________________________________________________________________________
Step 1: How do we get the previous purchase date for each purchase? - LAG()
SELECT customer_id,
market_date,
LAG(market_date, 1) OVER (PARTITION BY customer_id ORDER BY
market_date) AS last_purchase
FROM farmers_market.customer_purchases;
Step 2: How do we get the number of days between the current and the
previous purchase date? - DATE_DIFF()
SELECT
customer_id,
market_date,
LAG(market_date, 1) OVER (PARTITION BY customer_id ORDER
BY market_date) AS last_purchase,
DATE_DIFF(market_date, (LAG(market_date, 1) OVER (PARTITION
BY customer_id ORDER BY market_date)), DAY) AS count_bw_prchs
FROM farmers_market.customer_purchases;
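Because a customer can buy several products on the same market date,
customer_purchases can contain several rows per customer per date, so the
queries above compare some dates with themselves and return 0 days for those
rows. Selecting DISTINCT customer_id and market_date in a subquery first gives
one row per customer per market date:
For BigQuery: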
SELECT
x.customer_id,
x.market_date,
LAG(x.market_date, 1) OVER(PARTITION BY x.customer_id ORDER
BY x.market_date) AS last_prchs,
DATE_DIFF(x.market_date, LAG(x.market_date, 1)
OVER(PARTITION BY x.customer_id ORDER BY x.market_date), DAY) AS
days_bw_prch
FROM
( SELECT
DISTINCT customer_id,
market_date
FROM farmers_market.customer_purchases
) AS x
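The query above spells out the same window specification twice. BigQuery’s WINDOW clause lets you name it once and refer to it with OVER; this sketch (the window name w is arbitrary) returns the same columns, sorted for readability:

```sql
SELECT
  customer_id,
  market_date,
  LAG(market_date, 1) OVER w AS last_prchs,
  DATE_DIFF(market_date, LAG(market_date, 1) OVER w, DAY) AS days_bw_prch
FROM (
  -- One row per customer per market date
  SELECT DISTINCT customer_id, market_date
  FROM farmers_market.customer_purchases
)
-- Named window, defined once and reused by both LAG calls
WINDOW w AS (PARTITION BY customer_id ORDER BY market_date)
ORDER BY customer_id, market_date;
```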
____________________________________________________________________________
Question: Today’s date is May 31, 2019, and the marketing director
of the farmer’s market wants to give infrequent customers (with
only 1 purchase) an incentive to return to the market in June.
Pull up a list of everyone who made purchases on only one market date during
the previous month, because the director wants to email all of those
customers a coupon for a discount on a purchase made in June.
● First, we must find everyone who made a purchase within the 31 days up to
May 31, 2019.
● Then, we need to filter that list to those who came to the market and
made purchase(s) on a single market date.
This query would retrieve a list of one row per market date per customer
within that date range:
For BigQuery:
SELECT DISTINCT customer_id, market_date
FROM farmers_market.customer_purchases
WHERE DATE_DIFF(DATE '2019-05-31', market_date, DAY) BETWEEN 0 AND 31
Then, we could query the results of that query, count the distinct
market_date values per customer during that time, and filter to those with
exactly one market date, using the HAVING clause.
For BigQuery:
SELECT x.customer_id,
COUNT(x.market_date) AS market_count
FROM (
SELECT DISTINCT customer_id, market_date
FROM farmers_market.customer_purchases
WHERE DATE_DIFF(DATE '2019-05-31', market_date, DAY) BETWEEN 0 AND 31
)x
GROUP BY x.customer_id
HAVING COUNT(DISTINCT market_date) = 1
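Because COUNT(DISTINCT market_date) already collapses repeated purchases on the same date, the subquery can be folded into a single query. This sketch also writes the 31-day window as a range on market_date using DATE_SUB, which selects the same dates as DATE_DIFF(...) BETWEEN 0 AND 31:

```sql
SELECT
  customer_id,
  COUNT(DISTINCT market_date) AS market_count
FROM farmers_market.customer_purchases
-- Same dates as DATE_DIFF(DATE '2019-05-31', market_date, DAY) BETWEEN 0 AND 31
WHERE market_date BETWEEN DATE_SUB(DATE '2019-05-31', INTERVAL 31 DAY)
                      AND DATE '2019-05-31'
GROUP BY customer_id
HAVING COUNT(DISTINCT market_date) = 1;
```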
____________________________________________________________________________