SQL Objective and Subjective Questions
Output:
Avg_num_of_products
1.53
4. Determine the churn rate by gender for the most recent year in
the dataset.
A table chart was used to determine the churn rate by gender for the most
recent year available in the dataset, 2019. The churn rate for females (25.05%)
exceeded that for males (15.37%). The total count of exited customers was 658.
Although male customers outnumber female customers, fewer males than females
exited.
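The figures above came from a table chart; a roughly equivalent SQL query could look like the sketch below. It is not the original method. It assumes that bank_churn.Exited is 1 for exited customers and 0 otherwise, and that "most recent year" means the latest year found in BankDOJ.

```sql
-- Sketch only: churn rate by gender for the latest joining year.
-- Assumptions: Exited = 1 marks an exited customer; the year is taken from BankDOJ.
SELECT
    g.GenderCategory,
    COUNT(*) AS TotalCustomers,
    SUM(CASE WHEN b.Exited = 1 THEN 1 ELSE 0 END) AS ExitedCustomers,
    -- 100.0 forces decimal division so the rate is not truncated to an integer
    ROUND(100.0 * SUM(CASE WHEN b.Exited = 1 THEN 1 ELSE 0 END) / COUNT(*), 2) AS ChurnRatePct
FROM customerinfo c
JOIN gender g ON c.GenderID = g.GenderID
JOIN bank_churn b ON c.CustomerId = b.CustomerId
WHERE EXTRACT(YEAR FROM c.BankDOJ) = (
    SELECT MAX(EXTRACT(YEAR FROM BankDOJ)) FROM customerinfo
)
GROUP BY g.GenderCategory
ORDER BY ChurnRatePct DESC;
```

Returning the customer and exit counts next to the rate makes it easy to check the percentage against the raw numbers.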
Output:
ExitCategory AvgCreditScore
Exit 645.35
Retain 651.85
While the difference in average credit scores between the two groups is
relatively small, it appears that customers who remain have a slightly higher
average credit score compared to those who have exited.
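A comparison like the one above could be produced with a query along these lines. This is a sketch, not the query behind the table. It assumes the credit score is stored in a column named CreditScore in bank_churn, which does not appear in the visible queries.

```sql
-- Sketch only: average credit score for exited vs. retained customers.
-- Assumption: bank_churn has a CreditScore column (name is hypothetical).
SELECT
    ec.ExitCategory,
    ROUND(AVG(b.CreditScore), 2) AS AvgCreditScore
FROM bank_churn b
JOIN exitcustomer ec ON b.Exited = ec.ExitID
GROUP BY ec.ExitCategory
ORDER BY ec.ExitCategory;
```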
FROM customerinfo c
JOIN gender g
ON c.GenderID = g.GenderID
It calculates the average estimated salary and the count of active accounts for
each gender category. It achieves this by joining three tables: customerinfo,
gender, and bank_churn. The gender table is linked to the customerinfo table
via the GenderID column, providing the gender category for each customer.
Meanwhile, the bank_churn table is connected to the customerinfo table
through the CustomerId column, allowing for the retrieval of churn-related
data. The query utilizes aggregate functions such as AVG and SUM, along with
conditional statements within the CASE expression, to compute the average
estimated salary and count the active accounts. Finally, the results are grouped
by gender category.
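Since only part of the query is shown above, one possible complete form matching the description is sketched below. It is not necessarily the query that was run. It assumes that IsActiveMember = 1 marks an active account.

```sql
-- Sketch only: average estimated salary and active-account count per gender.
-- Assumption: IsActiveMember = 1 means the account is active.
SELECT
    g.GenderCategory,
    ROUND(AVG(c.EstimatedSalary), 2) AS AvgEstimatedSalary,
    SUM(CASE WHEN b.IsActiveMember = 1 THEN 1 ELSE 0 END) AS ActiveAccounts
FROM customerinfo c
JOIN gender g ON c.GenderID = g.GenderID
JOIN bank_churn b ON c.CustomerId = b.CustomerId
GROUP BY g.GenderCategory;
```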
Output:
GeographyLocation ActiveAccounts
France 1575
Spain 796
Germany 800
Output:
10. For customers who have exited, what is the most common
number of products they have used?
SELECT b.NumOfProducts,
COUNT(CASE WHEN ec.ExitCategory = 'Exit' THEN 1 END) AS ExitedCustomers
FROM bank_churn b
JOIN exitcustomer ec
ON b.Exited = ec.ExitID
GROUP BY b.NumOfProducts
ORDER BY ExitedCustomers DESC;
It retrieves data from the bank_churn and exitcustomer tables and
groups it by the number of products held by customers. Using a conditional
statement, it counts the number of customers who have churned within each
product category. The results are then ordered in descending order based on
the count of exited customers.
Output:
NumOfProducts ExitedCustomers
1 1409
2 348
3 220
4 60
11. Examine the trend of customers joining over time and identify
any seasonal patterns (yearly or monthly). Prepare the data
through SQL and then visualize it.
SELECT
EXTRACT(YEAR FROM BankDOJ) AS JoinYear,
EXTRACT(MONTH FROM BankDOJ) AS JoinMonth,
COUNT(*) AS JoinCount
FROM customerinfo
GROUP BY JoinYear, JoinMonth
ORDER BY JoinCount DESC;
It examines the trend of customers joining over time by extracting the year and
month from the "BankDOJ" (Bank Date of Joining) column in the
"customerinfo" table. It counts the number of customers who joined in each
year-month combination and then groups the results by year and month.
Finally, it orders the results based on the join count in descending order.
Output:
December 2019 had the highest number of customer joinings (470),
followed by November 2018 (368), December 2017 (334), and November 2016
(313). This suggests a seasonal pattern in customer acquisition, with more
joinings typically observed in the months of September, November, and
December.
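To check the monthly pattern independently of the year, the joins can be aggregated by calendar month alone. The sketch below uses only the BankDOJ column that already appears in the query for this question.

```sql
-- Sketch only: total joins per calendar month, summed across all years.
SELECT
    EXTRACT(MONTH FROM BankDOJ) AS JoinMonth,
    COUNT(*) AS JoinCount
FROM customerinfo
GROUP BY EXTRACT(MONTH FROM BankDOJ)
ORDER BY JoinMonth;
```

Ordering by month rather than by count keeps the rows in calendar order, which is easier to plot as a seasonal curve.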
Out of the 7963 retained customers, 3117 have a zero balance. This
indicates that some customers who have remained with the bank have fully
utilized their funds or closed their accounts. 4846 have a non-zero balance.
This suggests that the majority of customers who have remained with the bank
still have funds in their accounts.
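A split like the one described above could be computed as sketched below. It assumes that Exited = 0 marks a retained customer and that a zero balance is stored as exactly 0.

```sql
-- Sketch only: zero vs. non-zero balance among retained customers.
-- Assumptions: Exited = 0 means retained; an empty balance is stored as 0.
SELECT
    CASE WHEN b.Balance = 0 THEN 'Zero balance' ELSE 'Non-zero balance' END AS BalanceCategory,
    COUNT(*) AS RetainedCustomers
FROM bank_churn b
WHERE b.Exited = 0
GROUP BY CASE WHEN b.Balance = 0 THEN 'Zero balance' ELSE 'Non-zero balance' END;
```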
14. How many different tables are given in the dataset, out of these
tables which table only consists of categorical variables?
There are seven tables in the dataset. Among these, "Gender",
"Geography", "ExitCustomer", and "ActiveCustomer" likely consist only of
categorical variables.
15. Using SQL, write a query to find out the gender-wise average
income of males and females in each geography id. Also, rank the
gender according to the average value. (SQL)
SELECT c.GeographyID, g.GenderCategory,
AVG(c.EstimatedSalary) AS AvgIncome,
RANK() OVER (PARTITION BY c.GeographyID ORDER BY
AVG(c.EstimatedSalary) DESC) AS GenderRank
FROM customerinfo c
JOIN gender g ON c.GenderID = g.GenderID
GROUP BY c.GeographyID, g.GenderCategory
ORDER BY c.GeographyID, GenderRank;
It joins the customerinfo table with the gender table on the GenderID
column and then calculates the average income (AVG(c.EstimatedSalary)) for
each combination of GeographyID and GenderCategory using the GROUP BY
clause. The RANK() window function ranks the genders within each geographic
location by their average income. The PARTITION BY clause ensures that
ranking is done separately for each geographic location, so the genders are
ranked against each other within a location. Finally, the results are ordered
by GeographyID and GenderRank.
Output:
17. Is there any direct correlation between salary and the balance
of the customers? And is it different for people who have exited or
not?
On average, customers who have exited the bank have a slightly higher
estimated salary compared to those who have remained. Customers who have
exited the bank have a notably higher average balance compared to those who
have remained. This suggests that customers with higher balances are more
likely to leave the bank.
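Comparing averages does not measure correlation directly. One way to quantify it is the Pearson coefficient, computed separately for exited and retained customers, as in the sketch below. It uses the population form of the formula, (AVG(xy) - AVG(x)*AVG(y)) / (STDDEV_POP(x) * STDDEV_POP(y)), and assumes the database provides STDDEV_POP, as MySQL and PostgreSQL do.

```sql
-- Sketch only: Pearson correlation between salary and balance, per exit status.
-- Assumption: STDDEV_POP is available in the database in use.
SELECT
    b.Exited,
    (AVG(c.EstimatedSalary * b.Balance) - AVG(c.EstimatedSalary) * AVG(b.Balance))
      -- NULLIF avoids a division-by-zero error if either column is constant in a group
      / NULLIF(STDDEV_POP(c.EstimatedSalary) * STDDEV_POP(b.Balance), 0) AS SalaryBalanceCorr
FROM customerinfo c
JOIN bank_churn b ON c.CustomerId = b.CustomerId
GROUP BY b.Exited;
```

A value near 0 indicates no linear relationship, while values near 1 or -1 indicate a strong positive or negative one.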
Output:
18. Is there any correlation between the salary and the Credit score
of customers?
The table shows no clear relationship between customers' credit scores
and their average estimated salary. In fact, the lowest credit score has the
highest average estimated salary.
Output:
FROM (
SELECT CASE
END AS CreditScoreBucket,
CustomerId,Exited
FROM bank_churn
) AS ScoreBuckets
GROUP BY CreditScoreBucket
ORDER BY Ranks;
This SQL query categorizes customers into credit score buckets ('Poor',
'Fair', 'Good', 'Excellent', 'Unknown') based on their credit scores. It then
calculates the number of churned customers within each bucket. Additionally,
it ranks the buckets by the count of churned customers, so that rank 1 goes
to the bucket with the most churn. The query utilizes a subquery to assign
credit score buckets to customers and then joins it with the exitcustomer table
to identify churned customers. Finally, it groups the results by credit score
bucket and orders them by rank.
Output:
The "Fair" credit score bucket is ranked first because it has the highest
number of churned customers compared to other credit score buckets, with
685 customers exiting the bank. It is noteworthy that a significant portion of
customers fall within the 600-700 credit score range, which corresponds to the
"Fair" credit score category.
Output:
21. Rank the Locations as per the number of people who have
churned the bank and average balance of the customers.
SELECT GeographyLocation,Num_Churned_Customers,Avg_Balance,
RANK() OVER (ORDER BY Num_Churned_Customers DESC, Avg_Balance
DESC) AS Location_Rank
FROM(SELECT
geo.GeographyLocation,
COUNT(*) AS Num_Churned_Customers,
ROUND(AVG(bc.Balance),2) AS Avg_Balance
FROM bank_churn bc
JOIN CustomerInfo ci ON bc.CustomerId = ci.CustomerId
JOIN Geography geo ON ci.GeographyID = geo.GeographyID
WHERE bc.Exited = 1
GROUP BY geo.GeographyLocation) AS LocationStats;
This SQL query ranks the geographic locations by the number of customers
who have churned (Num_Churned_Customers) and by their average balance
(Avg_Balance). It uses the RANK() function to assign a rank to each location,
ordering first by the number of churned customers in descending order and
then by the average balance in descending order.
Output:
Germany has the highest number of churned customers (814) and the
highest average balance among them (120,361.08), thus it is ranked first.
Output:
24. Were there any missing values in the data, using which tool did
you replace them and what are the ways to handle them?
The dataset does not contain any missing values. The only change made to the
dataset was converting the datatype of the BankDOJ column to a date
datatype.
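A missing-value check can also be done in SQL by counting NULLs per column. The sketch below covers three customerinfo columns that appear in the queries in this document; the same pattern extends to the remaining columns and tables.

```sql
-- Sketch only: count NULLs per column in customerinfo.
SELECT
    COUNT(*) AS TotalRows,
    SUM(CASE WHEN Surname IS NULL THEN 1 ELSE 0 END) AS MissingSurname,
    SUM(CASE WHEN EstimatedSalary IS NULL THEN 1 ELSE 0 END) AS MissingEstimatedSalary,
    SUM(CASE WHEN BankDOJ IS NULL THEN 1 ELSE 0 END) AS MissingBankDOJ
FROM customerinfo;
```

If every Missing column returns 0, the checked columns contain no NULLs. Empty strings or placeholder values would need a separate check.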
25. Write the query to get the customer IDs, their last name, and
whether they are active or not for the customers whose surname
ends with “on”.
SELECT ci.CustomerId,ci.Surname,
MAX(ac.ActiveCategory) AS ActiveCategory
FROM customerinfo ci
JOIN bank_churn b ON ci.CustomerId = b.CustomerId
JOIN activecustomer ac ON b.IsActiveMember = ac.ActiveID
WHERE ci.Surname LIKE '%on'
GROUP BY ci.CustomerId,ci.Surname;
This SQL query selects unique combinations of CustomerId and Surname from
the customerinfo table where the Surname ends with "on" (the pattern '%on'
has no trailing wildcard). It retrieves the maximum ActiveCategory associated
with each combination from the activecustomer table.
Sample Output:
This query groups results by CustomerId and Surname, selecting the highest
ActiveCategory for each group, ensuring unique CustomerId entries in the
output.
Subjective Questions:
1. Customer Behavior Analysis: What patterns can be observed in
the spending habits of long-term customers compared to new
customers, and what might these patterns suggest about customer
loyalty?
Long-term customers who have stayed with the bank have purchased
more products than new customers.
The number of customers and their activity levels are also higher in this
category.
The low average balance relative to other categories might be because
these customers have purchased more products.
There are high numbers of customers with only one or two products.
This suggests that there is a strong association between the number of
products a customer holds and their relationship with the bank. Targeted
marketing campaigns and personalized recommendations can be tailored
based on customers' existing product holdings to encourage them to expand
their relationships with the bank.
In regions with high churn rates, the bank may need to focus on improving
customer retention strategies, such as enhancing customer service, offering
competitive rates, or introducing loyalty programs.
Similarly, in regions with lower counts of active accounts, the bank may need
to invest in marketing efforts, product innovation, or partnerships to attract
new customers and expand its market share.
CASE
END AS AgeBin,
COUNT(*) AS CustomerCount,
AVG(bc.balance) AS AverageBalance
FROM customerinfo c
GROUP BY g.GenderCategory,
CASE
END;
SELECT
-- Demographic Segmentation
CASE
ELSE 'Unknown'
END AS Age_Group,
g.GeographyLocation,
gi.GenderCategory AS Gender,
CASE
ELSE 'Unknown'
END AS Balance_Category,
cc.Category AS Credit_Card_Status
FROM customerinfo ci
11. What is the current churn rate per year and overall as well in
the bank? Can you suggest some insights to the bank about which
kind of customers are more likely to churn and what different
strategies can be used to decrease the churn rate?
The churn rates vary across age groups each year. Generally,
older customers (50 or above) consistently exhibit higher churn rates
than younger age groups (18-30 and 30-50). While churn rates fluctuate
from year to year, there is no significant upward or downward trend in any
specific age group.
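The yearly and overall churn rates asked for in the question could be computed as sketched below. It assumes that Exited = 1 marks an exited customer and that the year is taken from the joining date BankDOJ, since no exit date appears in the visible queries.

```sql
-- Sketch only: churn rate per joining year.
-- Assumptions: Exited = 1 marks an exited customer; the year comes from BankDOJ.
SELECT
    EXTRACT(YEAR FROM c.BankDOJ) AS JoinYear,
    COUNT(*) AS TotalCustomers,
    SUM(CASE WHEN b.Exited = 1 THEN 1 ELSE 0 END) AS ExitedCustomers,
    ROUND(100.0 * SUM(CASE WHEN b.Exited = 1 THEN 1 ELSE 0 END) / COUNT(*), 2) AS ChurnRatePct
FROM customerinfo c
JOIN bank_churn b ON c.CustomerId = b.CustomerId
GROUP BY EXTRACT(YEAR FROM c.BankDOJ)
ORDER BY JoinYear;

-- Overall churn rate across all customers.
SELECT
    ROUND(100.0 * SUM(CASE WHEN Exited = 1 THEN 1 ELSE 0 END) / COUNT(*), 2) AS OverallChurnRatePct
FROM bank_churn;
```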
Go to the "Home" tab and click on "Transform Data" to open the query editor.
In the query editor, find the table "Bank_Churn" from the list of queries on the
left-hand side.
Once you've renamed the column, click on "Close & Apply" to save your
changes and close the query editor.