Date - Time Functions

date time function in SQL data

Uploaded by

AMIT ADIKANE

Date & Time Functions

______________________________________________________________________________

Problem Statement:
You are a Data Analyst at Amazon Fresh. You have been tasked to study
the Farmer’s Market.

Dataset: Farmer’s Market database

______________________________________________________________________________

So far…
● You now know how to aggregate data, group data, use JOINs to get
data from multiple tables and how to do windowed calculations

But
● We haven’t yet dealt with date and time data types. One of the most
important and common problems for data scientists is dealing with
temporal data.

● Many ML algorithms are “trained” to identify patterns in data from
the past and use those patterns to predict future outcomes. In order to
build a dataset for that purpose, we have to be able to filter queries by
time range.
● Often, datasets that are built for predictive models include summaries
of activities within dynamic date ranges—for example, how many
times a user ordered an iPhone on Amazon during each of the past
three months.
● Or, in the case of time-series analysis, an input dataset might include
one row per time period (hour, day, week, month) with a count of
something associated with each time period; for example, the number
of patients a doctor sees per week.

In our database, I have created one more table called datetime_demo to
showcase how you can work with datetime data types.

Inside this table,
● We have data from the “market_date_info” table, i.e. the market_date,
market_start_time, and market_end_time columns, along with two
additional columns: market_start_datetime and market_end_datetime.

Here’s what the “datetime_demo” table looks like:

SELECT * FROM farmers_market.datetime_demo;

______________________________________________________________________________
DateTime format -

Previously, we saw that
● the market date was in “YYYY-MM-DD”,
● the market start and end times were in “hr:mins AM/PM”, and
● the market start and end datetimes were in “YYYY-MM-DD HH:MM:SS” format.

But do we always get date and time in the same format?
No, it might differ from one dataset to another, depending on how the
DateTime values were saved.

Let us have a look at some of the commonly used DateTime format elements:
● %Y: Year with century (e.g., 2023)
● %m: Month (01-12)
● %d: Day of the month (01-31)
● %I: Hour on a 12-hour clock (01-12)
● %H: Hour on a 24-hour clock (00-23)
● %M: Minute (00-59)
● %S: Second (00-59)
● %p: AM or PM
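
As a small sketch of how the format elements listed here combine, the query below formats one literal datetime in two different ways. The literal value and the column aliases are made up for illustration and are not taken from the Farmer’s Market data.

```sql
-- Illustrative literal only; not a row from the farmers_market dataset.
SELECT
  -- Year-first layout with a 24-hour clock
  FORMAT_DATETIME('%Y-%m-%d %H:%M:%S', DATETIME '2019-04-03 15:30:00') AS iso_24h,
  -- Day-first layout with a 12-hour clock and an AM/PM marker
  FORMAT_DATETIME('%d/%m/%Y %I:%M %p', DATETIME '2019-04-03 15:30:00') AS day_first_12h;
-- iso_24h       -> 2019-04-03 15:30:00
-- day_first_12h -> 03/04/2019 03:30 PM
```

The same stored value can therefore appear in very different shapes, which is why the format string has to match the data you are reading or writing.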
____________________________________________________________________________

Oftentimes, you will encounter datetime data types, such as timestamps, in
the databases you work with and might only need a portion of the stored
date and time value.

Let’s answer a few questions to understand:

EXTRACT()

Question: Suppose you want to know the range of years covered by
the data in our database.

To extract the year, we can use the EXTRACT() function, which is available
in BigQuery as well as MySQL.


For BigQuery:
SELECT
  MIN(EXTRACT(YEAR FROM market_start_datetime)) AS min_year,
  MAX(EXTRACT(YEAR FROM market_start_datetime)) AS max_year
FROM
  `farmers_market.datetime_demo`;
______________________________________________________________________________

Question: What if you only want to see the hour at which the
market started and ended on each date?

For BigQuery:
SELECT
  EXTRACT(HOUR FROM market_start_datetime) AS start_hr,
  EXTRACT(HOUR FROM market_end_datetime) AS end_hr
FROM
  `farmers_market.datetime_demo`;
______________________________________________________________________________

Similarly, we can extract the following parts from
‘market_start_datetime’ as well:
● Date
● Time
● Year
● Quarter
● Month
● Day
● Week
● Day Of Week
● Hour
● Minute
● Second

Here’s how the syntax goes:

For BigQuery:

SELECT
  market_start_datetime,
  EXTRACT(DATE FROM market_start_datetime) AS date,
  EXTRACT(TIME FROM market_start_datetime) AS time,
  EXTRACT(YEAR FROM market_start_datetime) AS year_no,
  EXTRACT(QUARTER FROM market_start_datetime) AS q_no,
  EXTRACT(MONTH FROM market_start_datetime) AS month_no,
  EXTRACT(DAY FROM market_start_datetime) AS day_no,
  EXTRACT(WEEK FROM market_start_datetime) AS week_no,
  EXTRACT(DAYOFWEEK FROM market_start_datetime) AS week_day,
  EXTRACT(HOUR FROM market_start_datetime) AS hr,
  EXTRACT(MINUTE FROM market_start_datetime) AS minute,
  EXTRACT(SECOND FROM market_start_datetime) AS second
FROM farmers_market.datetime_demo;

Note: Depending on the database system you are using, the function that
retrieves different portions of a datetime value may be called EXTRACT
(MySQL, Oracle, BigQuery), DATE_PART (Redshift, PostgreSQL), or DATEPART
(SQL Server).
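
To make the returned values concrete, here is a minimal sketch that runs EXTRACT on a single literal datetime instead of the table. The literal is an assumption chosen for illustration; in BigQuery, DAYOFWEEK counts from 1 (Sunday) to 7 (Saturday).

```sql
-- Illustrative literal only; April 3, 2019 was a Wednesday.
SELECT
  EXTRACT(YEAR      FROM DATETIME '2019-04-03 08:00:00') AS year_no,   -- 2019
  EXTRACT(QUARTER   FROM DATETIME '2019-04-03 08:00:00') AS q_no,      -- 2
  EXTRACT(MONTH     FROM DATETIME '2019-04-03 08:00:00') AS month_no,  -- 4
  EXTRACT(DAYOFWEEK FROM DATETIME '2019-04-03 08:00:00') AS week_day,  -- 4 (Wednesday)
  EXTRACT(HOUR      FROM DATETIME '2019-04-03 08:00:00') AS hr;        -- 8
```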
______________________________________________________________________________

Question: For each market date, your manager wants to see which
day of the week and which month it fell in. Display the names of
the day and the month.

For BigQuery:
SELECT
  market_start_datetime,
  FORMAT_DATETIME("%B", market_start_datetime) AS mktsrt_month_name,
  FORMAT_DATETIME("%A", market_start_datetime) AS mktsrt_day_name
FROM
  `farmers_market.datetime_demo`;
______________________________________________________________________________

DATE() AND TIME()

Question: Suppose you only have the ‘market_start_datetime’
column in the table and you want to view the date and time fields
separately.

We also have shortcuts for extracting the entire date and entire time from
the datetime field, so you don’t have to extract each part and re-concatenate
it together.

For BigQuery:
SELECT
market_start_datetime,
DATE(market_start_datetime) AS mktsrt_date,
TIME(market_start_datetime) AS mktsrt_time
FROM
`farmers_market.datetime_demo`
______________________________________________________________________________

DATE_ADD and DATE_SUB

The powerful thing about storing string dates as datetime values (or
converting them using SQL) is that you can do date calculations, which is not
possible when they are stored as numbers, punctuation, and letters in a
string field.
Here, we’ll use the market_start_datetime and market_end_datetime fields
to demonstrate.

The DATE_ADD() function adds a time/date interval to a date and then
returns the resulting date.

We can use SQL to add 30 minutes to the start time by passing the datetime,
the interval (minutes, in this case), and the number of minutes we want to
add into the DATE_ADD function:

SELECT
  market_start_datetime,
  DATE_ADD(market_start_datetime, INTERVAL 30 MINUTE) AS mktstrt_date_plus_30min
FROM farmers_market.datetime_demo;

Note:
● If we instead wanted to do a calculation that required looking 30 days
past a date (for example, calculating 30 days past a customer’s first
purchase to determine whether they made a second purchase within that
time frame), we could change the interval parameter from MINUTE to
DAY, and add 30 days instead:

SELECT
  market_start_datetime,
  DATE_ADD(market_start_datetime, INTERVAL 30 DAY) AS mktstrt_date_plus_30days
FROM farmers_market.datetime_demo;

There is also a related function called DATE_SUB() that subtracts intervals
from datetimes.

However, instead of switching to DATE_SUB(), you could also just add a
negative number to the datetime if you prefer.

The following query demonstrates that using DATE_ADD() to add –30 days to
a date has the same effect as using DATE_SUB() to subtract 30 days from a
date:
SELECT
  market_start_datetime,
  DATE_ADD(market_start_datetime, INTERVAL -30 DAY) AS mktstrt_date_plus_neg30days,
  DATE_SUB(market_start_datetime, INTERVAL 30 DAY) AS mktstrt_date_minus_30days
FROM farmers_market.datetime_demo;
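
The equivalence can also be checked without the table. The sketch below uses a made-up literal date (an assumption for illustration) and compares the two results directly.

```sql
-- Illustrative literal only; both expressions land on the same day.
SELECT
  DATE_ADD(DATE '2019-04-03', INTERVAL -30 DAY) AS plus_neg30days,  -- 2019-03-04
  DATE_SUB(DATE '2019-04-03', INTERVAL 30 DAY)  AS minus_30days,    -- 2019-03-04
  DATE_ADD(DATE '2019-04-03', INTERVAL -30 DAY)
    = DATE_SUB(DATE '2019-04-03', INTERVAL 30 DAY) AS same_result;  -- true
```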
______________________________________________________________________

DATE_DIFF()

Question: Find the number of days between the first and last
market dates.

DATE_DIFF is a SQL function available in BigQuery that accepts two date or
DateTime values along with a date_part, and returns the number of
date_part intervals between them.

For BigQuery:
SELECT
  x.first_market,
  x.last_market,
  DATE_DIFF(x.last_market, x.first_market, DAY) AS days_first_to_last
FROM
  (
    SELECT
      MIN(market_start_datetime) AS first_market,
      MAX(market_start_datetime) AS last_market
    FROM farmers_market.datetime_demo
  ) AS x;

Here, the inner query (by which I mean the query inside parentheses, aliased
“x”) returns the first and last market dates from the datetime_demo table,
and the outer query (which is selecting from “x”) calculates the difference
between those two dates in days using DATE_DIFF.
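
One detail worth seeing in isolation is that DATE_DIFF counts how many date_part boundaries lie between the two values, so the chosen part changes the answer considerably. The literals below are assumptions chosen for illustration only.

```sql
-- Illustrative literals only.
SELECT
  DATE_DIFF(DATE '2019-05-31', DATE '2019-04-30', DAY)   AS days_apart,    -- 31
  DATE_DIFF(DATE '2019-05-31', DATE '2019-04-30', MONTH) AS months_apart,  -- 1
  -- Only one day has passed, but one year boundary was crossed
  DATE_DIFF(DATE '2019-01-01', DATE '2018-12-31', YEAR)  AS years_apart;   -- 1
```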
____________________________________________________________________________

Question: But what if we want the difference between the first
and last market dates in hours instead of days?

DATE_DIFF is aimed at date-level differences, but BigQuery also has
TIMESTAMP_DIFF, which returns the difference between two timestamps in a
chosen interval such as HOUR or MINUTE, and DATETIME_DIFF, which takes
the same arguments for DATETIME values.

To illustrate, we calculate the hours and minutes between the market start
and end times on each market date.

For BigQuery:
SELECT
  market_start_datetime,
  market_end_datetime,
  TIMESTAMP_DIFF(market_end_datetime, market_start_datetime, HOUR) AS market_duration_hours,
  TIMESTAMP_DIFF(market_end_datetime, market_start_datetime, MINUTE) AS market_duration_mins
FROM farmers_market.datetime_demo;
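
As a self-contained sketch of the same calculation, the query below uses DATETIME_DIFF, BigQuery’s counterpart of TIMESTAMP_DIFF for DATETIME values, on two literal datetimes. The opening and closing times are assumptions for illustration, not values read from the table.

```sql
-- Illustrative literals only: a market open from 08:00 to 14:00.
SELECT
  DATETIME_DIFF(DATETIME '2019-04-03 14:00:00',
                DATETIME '2019-04-03 08:00:00', HOUR)   AS market_duration_hours,  -- 6
  DATETIME_DIFF(DATETIME '2019-04-03 14:00:00',
                DATETIME '2019-04-03 08:00:00', MINUTE) AS market_duration_mins;   -- 360
```

Note that the later value goes first; swapping the arguments returns negative numbers.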
____________________________________________________________________________

PARSE_DATETIME()

Let’s have a look at the transaction_time column in the
“customer_purchases” table.

SELECT transaction_time FROM farmers_market.customer_purchases;


Observation: Notice that transaction_time is in 24-hour format, whereas
market_start_time and market_end_time are in 12-hour format.

Question: Create a new column market_date_trans_time that
contains the market date and transaction time in a proper datetime
format.

For BigQuery:
SELECT
  *,
  PARSE_DATETIME("%Y-%m-%d %H:%M:%S",
    CONCAT(market_date, " ", transaction_time)) AS market_date_trans_time
FROM farmers_market.customer_purchases;
● PARSE_DATETIME("%Y-%m-%d %H:%M:%S", CONCAT(market_date," ",
transaction_time)):
○ This function combines the values from the market_date and
transaction_time columns, concatenating them into a single
string with the format "YYYY-MM-DD HH:MM:SS".
○ It then parses this string into a datetime value using the specified
format.
● %Y-%m-%d %H:%M:%S is the format used to describe how the datetime
string should be interpreted.
○ It breaks down as follows:
■ %Y: Year with century (e.g., 2023)
■ %m: Month (01-12)
■ %d: Day of the month (01-31)
■ %H: Hour (00-23)
■ %M: Minute (00-59)
■ %S: Second (00-59)
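
A minimal sketch of the parsing step on literal strings is shown here, covering both the 24-hour and the 12-hour layout. The strings are made up for illustration; the important point is that the format string must mirror the input exactly.

```sql
-- Illustrative literals only; both expressions describe the same moment.
SELECT
  -- 24-hour input, as in transaction_time
  PARSE_DATETIME('%Y-%m-%d %H:%M:%S',
                 CONCAT('2019-04-03', ' ', '16:32:00')) AS from_24h,  -- 2019-04-03T16:32:00
  -- 12-hour input with an AM/PM marker, as in market_start_time
  PARSE_DATETIME('%Y-%m-%d %I:%M %p',
                 '2019-04-03 04:32 PM') AS from_12h;                  -- 2019-04-03T16:32:00
```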

____________________________________________________________________________

Complex Question: Let’s say you want to calculate how many
sales occurred within the first 30 minutes after the farmer’s
market opened. How would you dynamically determine what
cutoff time to use? (automatically calculate it for every market
date in your database)

Again, this is where the DATE_ADD and PARSE_DATETIME functions come in.


For BigQuery:

SELECT
  market_date,
  COUNT(*) AS num_sales
FROM (
  SELECT
    c.market_date,
    -- 12-hour strings such as "8:00 AM" parsed into TIME values
    TIME(PARSE_DATETIME('%I:%M %p', m.market_start_time)) AS market_start_time,
    TIME(PARSE_DATETIME('%I:%M %p', m.market_end_time)) AS market_end_time,
    TIME(c.transaction_time) AS transaction_time,
    -- Market date combined with each time to build full datetimes
    PARSE_DATETIME('%Y-%m-%d %I:%M %p',
      CONCAT(c.market_date, " ", m.market_start_time)) AS market_start_datetime,
    PARSE_DATETIME('%Y-%m-%d %I:%M %p',
      CONCAT(c.market_date, " ", m.market_end_time)) AS market_end_datetime,
    PARSE_DATETIME('%Y-%m-%d %H:%M:%S',
      CONCAT(c.market_date, " ", c.transaction_time)) AS market_date_transaction_time,
    c.product_id,
    c.vendor_id,
    c.customer_id,
    c.quantity,
    c.cost_to_customer_per_qty
  FROM farmers_market.customer_purchases c
  LEFT JOIN farmers_market.market_date_info m
    ON c.market_date = m.market_date)
-- Keep only sales made up to 30 minutes after that day's opening time
WHERE market_date_transaction_time <=
  DATE_ADD(market_start_datetime, INTERVAL 30 MINUTE)
GROUP BY market_date
ORDER BY market_date;
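
The cutoff logic can be tried on a tiny inline dataset before running it against the real tables. In the sketch below, the purchases and market_info rows are invented for illustration, the dates are cast to strings explicitly, and DATETIME_ADD is used because the parsed values are DATETIMEs.

```sql
-- Hypothetical sample rows; not taken from the farmers_market dataset.
WITH purchases AS (
  SELECT DATE '2019-04-03' AS market_date, '08:10:00' AS transaction_time UNION ALL
  SELECT DATE '2019-04-03', '08:45:00' UNION ALL
  SELECT DATE '2019-04-06', '08:05:00'
),
market_info AS (
  SELECT DATE '2019-04-03' AS market_date, '08:00 AM' AS market_start_time UNION ALL
  SELECT DATE '2019-04-06', '08:00 AM'
)
SELECT
  p.market_date,
  COUNT(*) AS num_sales
FROM purchases AS p
JOIN market_info AS m
  ON p.market_date = m.market_date
WHERE
  -- Transaction datetime built from the date and the 24-hour time string
  PARSE_DATETIME('%Y-%m-%d %H:%M:%S',
    CONCAT(CAST(p.market_date AS STRING), ' ', p.transaction_time))
  -- Cutoff: that day's opening datetime plus 30 minutes
  <= DATETIME_ADD(
       PARSE_DATETIME('%Y-%m-%d %I:%M %p',
         CONCAT(CAST(p.market_date AS STRING), ' ', m.market_start_time)),
       INTERVAL 30 MINUTE)
GROUP BY p.market_date
ORDER BY p.market_date;
-- 2019-04-03 -> 1 (the 08:45 sale falls outside the window)
-- 2019-04-06 -> 1
```

Because the cutoff is derived from each row’s own opening time, nothing needs to change if a market opens at a different hour on some dates.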

____________________________________________________________________________

Date Functions in Aggregate Summaries and Window Functions :-

In this section, we’ll explore a few ways that you can use date functions when
summarizing data.

Question: Let’s say we wanted to get a profile of each farmer’s
market customer’s habits over time:
● First purchase date
● Last purchase date
● Count of distinct purchase dates

Q1. How long has each person been a customer of the farmer’s market?

We can get the difference between the first and last purchase dates.

For BigQuery:
SELECT
  customer_id,
  MIN(market_date) AS first_purchase,
  MAX(market_date) AS last_purchase,
  COUNT(DISTINCT market_date) AS count_of_purchase_dates,
  DATE_DIFF(MAX(market_date), MIN(market_date), DAY) AS days_between_first_last_purchase
FROM farmers_market.customer_purchases
GROUP BY customer_id;

Q2. How long has it been since each customer last made a purchase?

We can use the CURRENT_DATE() function.

CURRENT_DATE() can be used to represent the current system date in any
calculation that requires a date or datetime parameter.

For BigQuery:
SELECT
  customer_id,
  MIN(market_date) AS first_purchase,
  MAX(market_date) AS last_purchase,
  COUNT(DISTINCT market_date) AS count_of_purchase_dates,
  DATE_DIFF(MAX(market_date), MIN(market_date), DAY) AS days_between_first_last_purchase,
  DATE_DIFF(CURRENT_DATE(), MAX(market_date), DAY) AS days_since_last_purchase
FROM farmers_market.customer_purchases
GROUP BY customer_id;
____________________________________________________________________________

With Window Functions :-

Question: Write a query that gives us the days between each
purchase a customer makes.

Step 1: How can we get the previous purchase? - LAG()

SELECT
  customer_id,
  market_date,
  LAG(market_date, 1) OVER (PARTITION BY customer_id ORDER BY market_date) AS last_purchase
FROM farmers_market.customer_purchases;

Step 2: How to get the no. of days between current and the last purchase
date? - DATE_DIFF()

SELECT
  customer_id,
  market_date,
  LAG(market_date, 1) OVER (PARTITION BY customer_id ORDER BY market_date) AS last_purchase,
  DATE_DIFF(market_date,
    LAG(market_date, 1) OVER (PARTITION BY customer_id ORDER BY market_date),
    DAY) AS count_bw_prchs
FROM farmers_market.customer_purchases;

- Here, we didn’t quite accomplish the goal of finding the difference
between each purchase date and the previous purchase date,
- because there are multiple rows with the same date in cases where the
customer purchased multiple items on the same date.

We can resolve this in a few ways.

● One approach is to remove the duplicates by using the DISTINCT
keyword, and then use a WHERE clause filter to remove rows where the
two dates (current and previous purchase) are the same (because multiple
purchases were made on the same date).

● Another is to remove duplicates in the initial dataset and use a
subquery (a query inside a query) to get the date differences. Doing this
and moving the window function to the outer query means LAG steps
through each purchase date rather than each purchase row.

For BigQuery:
SELECT
  x.customer_id,
  x.market_date,
  LAG(x.market_date, 1) OVER (PARTITION BY x.customer_id ORDER BY x.market_date) AS last_prchs,
  DATE_DIFF(x.market_date,
    LAG(x.market_date, 1) OVER (PARTITION BY x.customer_id ORDER BY x.market_date),
    DAY) AS days_bw_prch
FROM
  (
    SELECT DISTINCT
      customer_id,
      market_date
    FROM farmers_market.customer_purchases
  ) AS x;
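
To see the effect of de-duplicating first, here is a sketch on a few invented rows for a single customer, including two purchases on the same date. The purchases data is hypothetical and only stands in for customer_purchases.

```sql
-- Hypothetical sample rows; customer 1 bought two items on 2019-04-03.
WITH purchases AS (
  SELECT 1 AS customer_id, DATE '2019-04-03' AS market_date UNION ALL
  SELECT 1, DATE '2019-04-03' UNION ALL
  SELECT 1, DATE '2019-04-10' UNION ALL
  SELECT 1, DATE '2019-04-24'
)
SELECT
  x.customer_id,
  x.market_date,
  DATE_DIFF(x.market_date,
    LAG(x.market_date, 1) OVER (PARTITION BY x.customer_id ORDER BY x.market_date),
    DAY) AS days_bw_prch
FROM
  (SELECT DISTINCT customer_id, market_date FROM purchases) AS x
ORDER BY x.market_date;
-- 2019-04-03 -> NULL (no earlier purchase date)
-- 2019-04-10 -> 7
-- 2019-04-24 -> 14
```

Without the DISTINCT subquery, the duplicate 2019-04-03 row would produce an extra row with a gap of 0 days.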
____________________________________________________________________________

Question: Today’s date is May 31, 2019, and the marketing director
of the farmer’s market wants to give infrequent customers (with
only 1 purchase) an incentive to return to the market in June.

Pull up a list of everyone who purchased on only one market date during
the previous month, because the director wants to email all of those
customers a coupon for a discount on a purchase made in June.
● First, we must find everyone who made a purchase in the 31 days prior
to May 31, 2019.
● Then, we need to filter that list to those who came to the market and
made purchase(s) on a single market date.

This query would retrieve a list of one row per market date per customer
within that date range:

For BigQuery:
SELECT DISTINCT customer_id, market_date
FROM farmers_market.customer_purchases
WHERE DATE_DIFF(DATE '2019-05-31', market_date, DAY) BETWEEN 0 AND 31;

Then, we could query the results of that query, count the distinct
market_date values per customer during that time, and filter to those with
exactly one market date, using the HAVING clause.

For BigQuery:
SELECT
  x.customer_id,
  COUNT(x.market_date) AS market_count
FROM (
  SELECT DISTINCT customer_id, market_date
  FROM farmers_market.customer_purchases
  WHERE DATE_DIFF(DATE '2019-05-31', market_date, DAY) BETWEEN 0 AND 31
) AS x
GROUP BY x.customer_id
HAVING COUNT(DISTINCT x.market_date) = 1;
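
The 31-day window can also be written as a plain date range, which some readers find easier to check. The sketch below uses literal dates (assumptions for illustration) to show that both forms agree for a sample date inside the window.

```sql
-- Illustrative literals only.
SELECT
  DATE_SUB(DATE '2019-05-31', INTERVAL 31 DAY) AS window_start,          -- 2019-04-30
  -- Range form of the filter
  DATE '2019-05-15' BETWEEN DATE_SUB(DATE '2019-05-31', INTERVAL 31 DAY)
                        AND DATE '2019-05-31' AS in_window_range,        -- true
  -- DATE_DIFF form of the filter, as used in the query
  DATE_DIFF(DATE '2019-05-31', DATE '2019-05-15', DAY)
    BETWEEN 0 AND 31 AS in_window_diff;                                  -- true
```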
____________________________________________________________________________
