8-Week SQL Challenge Data Bank. Transaction Data Analysis-Case Study #4 by Chisom Promise MLearning - Ai Apr, 2023 Medium
Published in MLearning.ai
1 of 26 5/1/2023, 11:19 AM
8-Week SQL Challenge: Data Bank. Transaction Data Analysis—Case ... https://medium.com/mlearning-ai/8-week-sql-challenge-data-bank-abcf...
Introduction
Neo-Banks are a new innovation in the financial industry: new-age,
digital-only banks without physical branches.
Danny thought that there should be some sort of intersection between these new-
age banks, cryptocurrency, and the data world…so he decides to launch a new
initiative — Data Bank!
Data Bank runs just like any other digital bank, but it isn’t only for banking
activities: it also has the world’s most secure distributed data storage platform!
Customers are allocated cloud data storage limits which are directly linked to how
much money they have in their accounts. There are a few interesting caveats that go
with this business model, and this is where the Data Bank team needs your help!
The management team at Data Bank wants to increase its total customer base — but
also needs some help tracking just how much data storage its customers will need.
As mentioned above, this case study is all about calculating metrics and
growth, and helping the business analyze its data in a smart way to better forecast
and plan for future developments!
Available Data
The Data Bank team has prepared a data model for this case study as well as a few
example rows from the complete dataset below to get you familiar with their tables.
Before we dive into the analysis, let’s take a moment to understand the different tables that
were eventually created in the database.
Table 1: Regions
Just like popular cryptocurrency platforms — Data Bank is also run off a network of
nodes where both money and data are stored across the globe. In a traditional
banking sense — you can think of these nodes as bank branches or stores that exist
around the world.
This regions table contains the region_id and their respective region_name values.
The random distribution of customers across nodes changes frequently to reduce the
risk of hackers getting into Data Bank’s system and stealing customers’ money and data!
B. Customer Transactions
E. Extra Challenge
F. Extension Request
Solution
Solution
-- Count node allocation rows per region (not distinct nodes)
SELECT c.region_id,
       r.region_name,
       COUNT(c.node_id) AS num_of_nodes
FROM customer_nodes c
INNER JOIN regions r
    ON c.region_id = r.region_id
GROUP BY c.region_id, r.region_name
ORDER BY num_of_nodes DESC;
Australia had the highest number of node occurrences (770), followed by America
(735), with Europe having the fewest (616).
Solution
steps:
Australia had the most customers allocated to it, followed by
America, while Europe had the fewest.
Solution
• First of all, look at the unique start dates and end dates.
• The date is incorrect, possibly a typo, and therefore needs to be excluded
from the query.
5. What is the median, 80th, and 95th percentile for this same reallocation days metric for
each region?
Solution
• Use PERCENTILE_CONT and WITHIN GROUP to find the median, 80th, and 95th
percentile
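To make the percentile step concrete, here is a minimal Python sketch of the linear interpolation that PERCENTILE_CONT performs over an ordered set. It is not the article's query: the function name percentile_cont and the sample durations are hypothetical and chosen only for illustration.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile: interpolate linearly between neighbouring rows."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    # Fractional (0-based) position of the requested percentile in the ordered set.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# Hypothetical reallocation durations (in days) for a single region.
days = [1, 5, 10, 15, 20, 25, 30]
summary = {p: percentile_cont(days, p) for p in (0.5, 0.8, 0.95)}
print(summary)
```

In the SQL version, the same calculation would run once per region, which is what grouping or partitioning by region provides.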
The output shows that all regions have the same median and 95th percentile for the
reallocation days metric. Africa and Europe have an 80th percentile of 24 days,
while America, Asia, and Australia have an 80th percentile of 23 days.
B. Customer Transactions
1. What is the unique count and total amount for each transaction type?
Solution
• Use SUM to find the total amount for each transaction type
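The aggregation described above can be sketched as follows. This is an illustration rather than the author's query: it runs against an in-memory SQLite database through Python's sqlite3 module, the rows are made up, and the column names customer_id, txn_type and txn_amount are assumptions (only txn_date is named in this article).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema; only txn_date is named in the article itself.
conn.execute(
    "CREATE TABLE customer_transactions ("
    "customer_id INTEGER, txn_date TEXT, txn_type TEXT, txn_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_transactions VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, "2020-01-02", "deposit", 300),
        (1, "2020-01-15", "purchase", 100),
        (2, "2020-01-03", "deposit", 500),
        (2, "2020-02-10", "withdrawal", 200),
        (2, "2020-02-20", "deposit", 50),
    ],
)

query = """
SELECT txn_type,
       COUNT(*)        AS txn_count,     -- number of transactions of this type
       SUM(txn_amount) AS total_amount   -- total amount for this type
FROM customer_transactions
GROUP BY txn_type
ORDER BY txn_count DESC, txn_type;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```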
Deposits were the most common transaction type (2,671), followed by purchases
(1,617) and then withdrawals (1,580).
2. What is the average total historical deposit counts and amounts for all customers?
Solution
• Create a CTE named deposit_summary that calculates the total deposit counts
and amounts for each customer.
• Use the deposit_summary CTE in the outer query to calculate the average total
deposit counts and amounts for all customers using the AVG function.
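The two steps above might look like the sketch below. The CTE name deposit_summary comes from the article; everything else (the SQLite engine, the column names customer_id, txn_type and txn_amount, and the sample rows) is assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_transactions ("
    "customer_id INTEGER, txn_date TEXT, txn_type TEXT, txn_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_transactions VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, "2020-01-02", "deposit", 100),
        (1, "2020-01-09", "deposit", 200),
        (1, "2020-01-20", "purchase", 75),
        (2, "2020-01-03", "deposit", 500),
        (3, "2020-01-04", "deposit", 100),
        (3, "2020-02-04", "deposit", 100),
        (3, "2020-03-04", "deposit", 200),
    ],
)

query = """
WITH deposit_summary AS (
    -- One row per customer: how many deposits they made and their total value.
    SELECT customer_id,
           COUNT(*)        AS deposit_count,
           SUM(txn_amount) AS deposit_amount
    FROM customer_transactions
    WHERE txn_type = 'deposit'
    GROUP BY customer_id
)
SELECT AVG(deposit_count)  AS avg_deposit_count,
       AVG(deposit_amount) AS avg_deposit_amount
FROM deposit_summary;
"""
averages = conn.execute(query).fetchone()
print(averages)
```

Note that customers with no deposits at all never appear in deposit_summary, so they do not pull the averages down.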
The average deposit count for a customer is 5 and the average deposit amount for a
customer is 2,718.
3. For each month — how many Data Bank customers make more than 1 deposit and either
one purchase or withdrawal in a single month?
Solution
• Use the main query to filter the customer activity CTE to include only customers
who made more than 1 deposit and either 1 purchase or 1 withdrawal in a single
month.
• We then group the results by month number and month name and count the
number of unique customers who meet this criterion.
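A sketch of this filter is shown below, under several assumptions: it uses SQLite through Python rather than the article's database, the CTE name customer_activity and all column names except txn_date are hypothetical, and "either one purchase or withdrawal" is read as "at least one purchase or at least one withdrawal". SQLite has no month-name function, so only the month number is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_transactions ("
    "customer_id INTEGER, txn_date TEXT, txn_type TEXT, txn_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_transactions VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, "2020-01-02", "deposit", 100),
        (1, "2020-01-10", "deposit", 50),
        (1, "2020-01-20", "purchase", 30),
        (2, "2020-01-05", "deposit", 200),
        (2, "2020-01-25", "withdrawal", 20),
        (2, "2020-02-03", "deposit", 10),
        (2, "2020-02-14", "deposit", 10),
        (2, "2020-02-20", "withdrawal", 5),
        (3, "2020-01-07", "deposit", 40),
        (3, "2020-01-08", "deposit", 40),
        (3, "2020-02-09", "deposit", 40),
        (3, "2020-02-11", "deposit", 40),
        (3, "2020-02-12", "purchase", 15),
    ],
)

query = """
WITH customer_activity AS (
    -- Per customer and month: count each transaction type separately.
    SELECT customer_id,
           CAST(strftime('%m', txn_date) AS INTEGER) AS month_id,
           SUM(CASE WHEN txn_type = 'deposit'    THEN 1 ELSE 0 END) AS deposits,
           SUM(CASE WHEN txn_type = 'purchase'   THEN 1 ELSE 0 END) AS purchases,
           SUM(CASE WHEN txn_type = 'withdrawal' THEN 1 ELSE 0 END) AS withdrawals
    FROM customer_transactions
    GROUP BY customer_id, strftime('%m', txn_date)
)
SELECT month_id,
       COUNT(DISTINCT customer_id) AS customers
FROM customer_activity
WHERE deposits > 1
  AND (purchases >= 1 OR withdrawals >= 1)
GROUP BY month_id
ORDER BY month_id;
"""
rows = conn.execute(query).fetchall()
print(rows)
```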
March had the highest number of customers (192) who made more than 1
deposit and either 1 purchase or 1 withdrawal, while April had the fewest
such customers (70).
4. What is the closing balance for each customer at the end of the month?
Solution
• Use a CTE to aggregate the customer transaction data by customer and by
month.
- Use the DATEADD function to truncate the txn_date column to the beginning of
the month. This groups the transactions by month, regardless of the actual
day of the month on which each transaction was made.
- Use the SUM function to calculate the net transaction amount for each
customer within each month. A CASE expression distinguishes deposits
from withdrawals so that withdrawals are subtracted from the total amount.
• Use a final query to calculate the closing balance of a customer for a specific
month, with the closing balance being the sum of all transaction amounts up to
and including that month. In the final query:
- Use the MONTH and DATENAME functions to extract the month id and name
from the month_start column of the CTE.
- Use the SUM function with the OVER clause to calculate the running total of the
total_amount column, partitioned by the customer id and ordered
by the month start. This running total gives the closing balance for each customer at
the end of the month.
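The steps above can be sketched as follows. This is an approximation, not the author's code: it uses SQLite via Python, so date(txn_date, 'start of month') stands in for the DATEADD truncation and the month name is omitted. The CTE name monthly_totals, the column names other than txn_date, month_start and total_amount, and the choice to subtract purchases as well as withdrawals are all assumptions. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_transactions ("
    "customer_id INTEGER, txn_date TEXT, txn_type TEXT, txn_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_transactions VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, "2020-01-02", "deposit", 300),
        (1, "2020-01-15", "purchase", 100),
        (1, "2020-03-05", "withdrawal", 50),
        (2, "2020-01-03", "deposit", 500),
        (2, "2020-02-10", "withdrawal", 200),
        (2, "2020-02-20", "deposit", 50),
    ],
)

query = """
WITH monthly_totals AS (
    -- Net movement per customer per month: deposits add, everything else subtracts.
    SELECT customer_id,
           date(txn_date, 'start of month') AS month_start,
           SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount
                    ELSE -txn_amount END) AS total_amount
    FROM customer_transactions
    GROUP BY customer_id, date(txn_date, 'start of month')
)
SELECT customer_id,
       month_start,
       -- Running total of the monthly net amounts = closing balance.
       SUM(total_amount) OVER (
           PARTITION BY customer_id
           ORDER BY month_start
       ) AS closing_balance
FROM monthly_totals
ORDER BY customer_id, month_start;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One limitation of this sketch: a month in which a customer made no transactions produces no row, so customer 1 has no February entry here.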
The output above isn’t complete. I had to cut it because it was long.
5. What is the percentage of customers who increase their closing balance by more than
5%?
Solution
• Use the closing_balances CTE to calculate the closing balance for each customer
for each month by summing up the transactions from the previous months.
• Use pct_increase CTE to calculate the percentage increase in closing balance for
each customer from the previous month.
• Use the pct_increase in the final query to calculate the percentage of customers
whose closing balance increased by more than 5% compared to the previous
month. It does this by counting the number of distinct customers whose
pct_increase is greater than 5 and dividing that by the total number of distinct
customers.
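A minimal sketch of the last two steps follows. To keep it short it starts from a hypothetical, already-computed closing_balances table instead of rebuilding it from transactions. The column names, the helper CTE named previous, the use of LAG, and dividing by the absolute value of the previous balance (so a negative starting balance does not flip the sign) are assumptions, not details confirmed by the article. It runs on SQLite 3.25 or newer through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE closing_balances ("
    "customer_id INTEGER, month_start TEXT, closing_balance INTEGER)"
)
conn.executemany(
    "INSERT INTO closing_balances VALUES (?, ?, ?)",
    [  # hypothetical month-end balances
        (1, "2020-01-01", 100), (1, "2020-02-01", 120),   # +20%
        (2, "2020-01-01", 200), (2, "2020-02-01", 204),   # +2%
        (3, "2020-01-01", 100), (3, "2020-02-01", 90),    # -10%
        (3, "2020-03-01", 99),                            # +10%
        (4, "2020-01-01", 50),                            # no previous month
    ],
)

query = """
WITH previous AS (
    SELECT customer_id,
           closing_balance,
           LAG(closing_balance) OVER (
               PARTITION BY customer_id ORDER BY month_start
           ) AS prev_balance
    FROM closing_balances
),
pct_increase AS (
    -- NULL when there is no previous month or the previous balance is zero.
    SELECT customer_id,
           100.0 * (closing_balance - prev_balance)
               / NULLIF(ABS(prev_balance), 0) AS pct_change
    FROM previous
)
SELECT ROUND(
           100.0 * COUNT(DISTINCT CASE WHEN pct_change > 5 THEN customer_id END)
               / COUNT(DISTINCT customer_id),
           1
       ) AS pct_customers
FROM pct_increase;
"""
pct_customers = conn.execute(query).fetchone()[0]
print(pct_customers)
```

With this sample data, customers 1 and 3 each have at least one month-over-month increase above 5%, so two of the four customers qualify.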
75.6 percent of the customers had their closing balance increase by more than 5%
compared to the previous month.
following data elements to help the Data Bank team estimate how much data will
need to be provisioned for each option:
1. A running customer balance column that includes the impact of each transaction
Solution
• Calculate the running balance for each customer based on the order of their
transactions.
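One way to sketch this running balance is a window SUM over each customer's transactions in date order. The example is illustrative only: it runs on SQLite 3.25 or newer through Python, the column names other than txn_date are assumed, the rows are made up, and treating every non-deposit as a debit is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_transactions ("
    "customer_id INTEGER, txn_date TEXT, txn_type TEXT, txn_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_transactions VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, "2020-01-02", "deposit", 300),
        (1, "2020-01-15", "purchase", 100),
        (1, "2020-02-01", "deposit", 50),
        (2, "2020-01-03", "deposit", 500),
        (2, "2020-02-10", "withdrawal", 200),
    ],
)

query = """
SELECT customer_id,
       txn_date,
       -- Signed amount accumulated row by row within each customer.
       SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount
                ELSE -txn_amount END) OVER (
           PARTITION BY customer_id
           ORDER BY txn_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_balance
FROM customer_transactions
ORDER BY customer_id, txn_date;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If a customer can have several transactions on the same date, the ORDER BY would need a tie-breaker column to make the row order deterministic.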
Solution
• Calculate the closing balance for each customer for each month.
3. minimum, average, and maximum values of the running balance for each customer
Solution
• Use a CTE to find the running balance of each customer based on the order of
transactions.
• Then calculate the minimum, maximum, and average balance for each
customer.
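The two steps above can be sketched by wrapping the running balance in a CTE and aggregating it per customer. The CTE name running, the column names other than txn_date, the sample rows, and the SQLite engine (3.25 or newer, via Python) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_transactions ("
    "customer_id INTEGER, txn_date TEXT, txn_type TEXT, txn_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_transactions VALUES (?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, "2020-01-02", "deposit", 300),
        (1, "2020-01-15", "purchase", 100),
        (1, "2020-02-01", "deposit", 50),
        (2, "2020-01-03", "deposit", 500),
        (2, "2020-02-10", "withdrawal", 200),
    ],
)

query = """
WITH running AS (
    -- Balance after each transaction, in date order, per customer.
    SELECT customer_id,
           SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount
                    ELSE -txn_amount END) OVER (
               PARTITION BY customer_id
               ORDER BY txn_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_balance
    FROM customer_transactions
)
SELECT customer_id,
       MIN(running_balance) AS min_balance,
       AVG(running_balance) AS avg_balance,
       MAX(running_balance) AS max_balance
FROM running
GROUP BY customer_id
ORDER BY customer_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```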
The above data points in Part C are the data points we need to carry out the Data
Allocation experiment, whose full code you’ll find on my GitHub repo.
The full code and explanations for Part D, the Extra Challenge, can also be found in my
GitHub repo via the above link.
The final part of the case study, the extension request, is a PowerPoint presentation
that will be used as marketing material for both external investors who might want
to buy Data Bank shares and prospective customers who might want to bank
with Data Bank. I hope that after you’ve gone through my presentation here,
you’ll want to invest in Data Bank.
I also created a four-page dashboard on Power BI that Data Bank’s team can use to
understand the performance of the business and gain insights on the data allocation
options that were tested out in an experiment.