
ABA2024 Final Project

Advanced business analytics

Uploaded by lai.nguyen200153
Copyright © All Rights Reserved

Advanced Business Analytics (ABA) Final Project

(Due 11:59 pm, Dec 14th, 2024)

Question 1: Database (20 points)

You are a social media marketing specialist. You have collected data on the performance of various campaigns that your company has run for its clients
in the past few months. A sample of the data is given below:

Marketing Campaigns Table:

| CampaignID | CampaignName | StartDate | EndDate | ClientID | ClientName | ClientAddress | RegionID | RegionName | ChannelID | ChannelName | Spend | Impressions | Conversions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Summer Sale | 2024-06-01 | 2024-06-30 | 301 | Green Leaf Foods | 123 Elm St, New York | R1 | North East | C1 | Social Media | 5000 | 150000 | 3000 |
| 2 | Winter Promo | 2024-12-01 | 2024-12-31 | 302 | Urban Style | 456 Oak St, California | R2 | West Coast | C2 | Email | 4000 | 100000 | 2500 |
| 3 | Holiday Discount | 2024-12-15 | 2024-12-31 | 301 | Green Leaf Foods | 123 Elm St, New York | R1 | North East | C1 | Social Media | 3000 | 80000 | 2000 |
| 4 | Flash Sale | 2024-07-10 | 2024-07-20 | 303 | Sunshine Apparel | 789 Pine St, Florida | R3 | South East | C3 | PPC | 2000 | 60000 | 1000 |
| 5 | Back-to-School | 2024-08-01 | 2024-08-31 | 304 | FreshTech Supplies | 101 Maple Ave, Texas | R4 | South West | C2 | Email | 3500 | 90000 | 1800 |
| 6 | Spring Refresh | 2024-03-15 | 2024-04-15 | 305 | Global Media Group | 987 Cedar St, Illinois | R5 | Midwest | C1 | Social Media | 6000 | 160000 | 3200 |
| 7 | Holiday Giveaway | 2024-12-20 | 2025-01-05 | 306 | Bright Home Goods | 333 Willow St, Nevada | R2 | West Coast | C3 | PPC | 4500 | 120000 | 2800 |
| 8 | Fall Frenzy | 2024-09-01 | 2024-09-30 | 307 | Blue Ridge Apparel | 222 Birch St, Georgia | R3 | South East | C2 | Email | 3000 | 95000 | 2100 |
| 9 | Cyber Monday Deals | 2024-11-25 | 2024-11-26 | 308 | Digital Future LLC | 654 Aspen Rd, Oregon | R2 | West Coast | C3 | PPC | 7000 | 200000 | 4500 |
| 10 | Winter Clearance | 2024-12-01 | 2024-12-31 | 309 | Harmony Electronics | 555 Maple St, Michigan | R5 | Midwest | C1 | Social Media | 5000 | 130000 | 2600 |

Q1.1. Normalize the data in the table above. In your normalized design, you need to:

● List and name all potential tables.
● Describe all the columns in each table.
● Highlight the primary key in each table.
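One way to check a normalized design is to test the functional dependencies it relies on. The standard-library Python sketch below loads the first three rows of the sample table, confirms that ClientID determines the client's name, address, and region, and then projects the rows into Region, Channel, Client, and Campaign tables keyed by their IDs. The four-table split and the assumption that each client belongs to a single region (true in the sample, where client 301 appears twice with R1 both times) are illustrative rather than the required answer; the helpers `determines` and `project` are hypothetical names.

```python
COLS = ["CampaignID", "CampaignName", "StartDate", "EndDate", "ClientID", "ClientName",
        "ClientAddress", "RegionID", "RegionName", "ChannelID", "ChannelName",
        "Spend", "Impressions", "Conversions"]

# First three rows of the sample table above.
RAW = [
    (1, "Summer Sale", "2024-06-01", "2024-06-30", 301, "Green Leaf Foods",
     "123 Elm St, New York", "R1", "North East", "C1", "Social Media", 5000, 150000, 3000),
    (2, "Winter Promo", "2024-12-01", "2024-12-31", 302, "Urban Style",
     "456 Oak St, California", "R2", "West Coast", "C2", "Email", 4000, 100000, 2500),
    (3, "Holiday Discount", "2024-12-15", "2024-12-31", 301, "Green Leaf Foods",
     "123 Elm St, New York", "R1", "North East", "C1", "Social Media", 3000, 80000, 2000),
]
ROWS = [dict(zip(COLS, r)) for r in RAW]


def determines(rows, lhs, rhs):
    """True if the columns in `lhs` functionally determine the columns in `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(rows, key, cols):
    """Distinct rows of a new table as {primary key: other columns}; assumes key -> cols holds."""
    return {row[key]: {c: row[c] for c in cols} for row in rows}


regions = project(ROWS, "RegionID", ["RegionName"])
channels = project(ROWS, "ChannelID", ["ChannelName"])
clients = project(ROWS, "ClientID", ["ClientName", "ClientAddress", "RegionID"])  # RegionID is a FK
campaigns = project(ROWS, "CampaignID", ["CampaignName", "StartDate", "EndDate", "ClientID",
                                         "ChannelID", "Spend", "Impressions", "Conversions"])

# ClientID -> client attributes holds, so those columns can move out of the campaign table.
print(determines(ROWS, ["ClientID"], ["ClientName", "ClientAddress", "RegionID", "RegionName"]))
print(len(clients), "clients,", len(campaigns), "campaigns")
```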

Q1.2. Please list the relationships between the tables and highlight the foreign keys. You should use an
ERD (Entity Relationship Diagram) to describe the relationships.
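To make the relationships concrete, the hypothetical schema below declares the same four tables in SQLite through Python's built-in `sqlite3` module. Client references Region, and Campaign references Client and Channel, so each foreign key sits on the "many" side of a one-to-many relationship. SQLite is only a convenient stand-in here; other relational databases declare keys with similar `REFERENCES` syntax, and SQLite enforces them only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Region  (RegionID  TEXT PRIMARY KEY, RegionName  TEXT NOT NULL);
CREATE TABLE Channel (ChannelID TEXT PRIMARY KEY, ChannelName TEXT NOT NULL);
CREATE TABLE Client (
    ClientID      INTEGER PRIMARY KEY,
    ClientName    TEXT NOT NULL,
    ClientAddress TEXT,
    RegionID      TEXT NOT NULL REFERENCES Region(RegionID)      -- many clients : one region
);
CREATE TABLE Campaign (
    CampaignID    INTEGER PRIMARY KEY,
    CampaignName  TEXT NOT NULL,
    StartDate     TEXT,
    EndDate       TEXT,
    ClientID      INTEGER NOT NULL REFERENCES Client(ClientID),  -- many campaigns : one client
    ChannelID     TEXT NOT NULL REFERENCES Channel(ChannelID),   -- many campaigns : one channel
    Spend         REAL,
    Impressions   INTEGER,
    Conversions   INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Region VALUES ('R1', 'North East')")
conn.execute("INSERT INTO Channel VALUES ('C1', 'Social Media')")
conn.execute("INSERT INTO Client VALUES (301, 'Green Leaf Foods', '123 Elm St, New York', 'R1')")
conn.execute("INSERT INTO Campaign VALUES "
             "(1, 'Summer Sale', '2024-06-01', '2024-06-30', 301, 'C1', 5000, 150000, 3000)")
conn.commit()

# Each foreign_key_list row: (id, seq, parent table, child column, parent column, ...).
fks = [(r[2], r[3], r[4]) for r in conn.execute("PRAGMA foreign_key_list(Campaign)")]
print("Campaign foreign keys:", fks)

# A campaign pointing at a client that does not exist should be rejected.
try:
    conn.execute("INSERT INTO Campaign VALUES (99, 'Orphan', NULL, NULL, 999, 'C1', 0, 0, 0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print("orphan campaign rejected:", orphan_rejected)
```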
Question 2: SQL (20 points)

Answer the questions below using the Metabase instance hosted at https://metabase-xy1u.onrender.com/ and select the dvdrental database. The ERD of the database is shown below:

[Figure: ERD of the dvdrental database]
Please include screenshots of the resulting table/chart and the SQL query used for each question.

1. Find the payment with the highest amount
2. Find all inactive customers
3. List all unique first names of actors in alphabetical order
4. Find all films that cost more than $20 to replace but have a rental rate below $3
5. Retrieve the titles of all films whose description contains the word 'Epic'
6. Find the film with the longest duration.
7. Find the average rating of the films
8. Find the average rental duration of all films
9. List all films along with the number of actors who acted in each film
10. Find the most popular film based on the number of rentals
11. Identify the customers who have made more than 5 payments
12. Find the average rental rate of films in each category
13. Retrieve the film categories along with the number of films in each category
14. Find all customers who rented films and display their first name, last name, and the rental date.
15. Find the total amount of payments made by each customer, ordered by the highest total first
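The queries themselves are run in Metabase against dvdrental, the PostgreSQL sample database. For practicing the query shapes offline, the sketch below builds a tiny in-memory SQLite stand-in that uses a few real dvdrental column names (film.title, film.description, film.rental_rate, film.replacement_cost, film_actor.film_id, payment.customer_id, payment.amount) with made-up rows, and runs queries in the style of items 1, 4, 5, 9, and 15. Dialect details differ between SQLite and PostgreSQL, so treat these as sketches to adapt rather than verified answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, description TEXT,
                   rental_rate REAL, replacement_cost REAL);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
-- Made-up rows for illustration; not real dvdrental data.
INSERT INTO film VALUES (1, 'Alpha', 'An Epic Drama', 2.99, 24.99),
                        (2, 'Beta', 'A Quiet Story', 4.99, 22.99),
                        (3, 'Gamma', 'A Short Comedy', 0.99, 9.99);
INSERT INTO film_actor VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO payment VALUES (1, 10, 2.99), (2, 10, 7.99), (3, 11, 11.99), (4, 12, 0.99);
""")

QUERIES = {
    # Item 1: the single largest payment.
    "q1": "SELECT payment_id, amount FROM payment ORDER BY amount DESC LIMIT 1",
    # Item 4: costs more than $20 to replace but rents for under $3.
    "q4": "SELECT title FROM film WHERE replacement_cost > 20 AND rental_rate < 3",
    # Item 5: SQLite LIKE ignores ASCII case; PostgreSQL LIKE is case-sensitive (ILIKE is not).
    "q5": "SELECT title FROM film WHERE description LIKE '%Epic%'",
    # Item 9: LEFT JOIN keeps films with no actors (count 0).
    "q9": """SELECT f.title, COUNT(fa.actor_id) AS n_actors
             FROM film f LEFT JOIN film_actor fa ON fa.film_id = f.film_id
             GROUP BY f.film_id, f.title ORDER BY f.title""",
    # Item 15: total paid per customer, highest total first.
    "q15": """SELECT customer_id, SUM(amount) AS total
              FROM payment GROUP BY customer_id ORDER BY total DESC""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```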
Question 3: Exploratory Data Analysis (20 points)

Instructions:
- Marketing data in CSV (comma-separated values) format:
http://lms.vnuk.edu.vn/courses/1384/files/94518?module_item_id=41774
- Template Jupyter notebook for Google Colab:
http://lms.vnuk.edu.vn/courses/1384/files/94519?module_item_id=41775
- Questions 3-5 use the same dataset and Jupyter notebook template for analysis
- Don’t be intimidated by the code; focus on the questions you want to answer in your report. The code
changes should be minimal
- Using Python for your analysis is encouraged but optional. For example, using Excel for exploratory
data analysis is fine (no penalty), but we would rather you give Python a try.

Dataset Overview: The dataset provided includes customer data collected by a marketing analytics
team. The main goal is to analyze the behavior of customers based on their demographic and
transactional data to derive insights that can improve marketing strategies. The dataset contains
columns describing customer demographics (e.g., Income, Education, Marital), transactional data (e.g.,
MntWines, NumCatalogPurchases), and their responses to marketing campaigns (e.g., AcceptedCmp1,
AcceptedCmp2). The Response column serves as a target variable, indicating whether the customer
accepted the offer in the last campaign.

Key Columns:

● AcceptedCmp1-5: whether the customer accepted the offer in each of the five campaigns.
● Response: whether the customer accepted the offer in the last campaign (binary).
● Income: Customer's yearly household income.
● Kidhome, Teenhome: Number of small children and teenagers in the household.
● MntWines, MntMeatProducts, etc.: Amount spent on various product categories in the last 2
years.
● NumWebPurchases, NumStorePurchases, etc.: Number of purchases across various
channels.
● NumWebVisitsMonth: Number of visits to the company's website in the last month.
● Recency: Number of days since the last purchase.
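For those working in Python, the first step is loading the CSV and deriving any aggregate columns. The standard-library sketch below reads a small made-up sample that uses some of the column names listed above (MntSweetProducts is an assumed name for the sweets category). It leaves blank cells, such as a missing Income, as strings so they can be filtered out, and adds MntTotal as the sum of all Mnt* columns. With the real file, pass a handle from `open(path, newline="")` instead of the `io.StringIO` sample.

```python
import csv
import io
import statistics

# Made-up rows using column names from the list above; the third customer has no Income.
SAMPLE = """Income,Kidhome,Teenhome,MntWines,MntMeatProducts,MntSweetProducts,Recency,Response
50000,0,0,600,500,80,60,1
40000,1,1,10,5,1,40,0
,0,0,400,100,20,25,0
"""


def load_rows(handle):
    """Read CSV rows into dicts, converting numeric-looking values to float."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {}
        for key, value in raw.items():
            try:
                row[key] = float(value)
            except ValueError:
                row[key] = value  # text (e.g. Education) or a blank cell stays a string
        rows.append(row)
    return rows


def add_mnt_total(rows):
    """Add MntTotal as the sum of every Mnt* spending column (excluding MntTotal itself)."""
    for row in rows:
        row["MntTotal"] = sum(v for k, v in row.items() if k.startswith("Mnt") and k != "MntTotal")
    return rows


rows = add_mnt_total(load_rows(io.StringIO(SAMPLE)))
incomes = [r["Income"] for r in rows if isinstance(r["Income"], float)]  # drop missing incomes
print("customers:", len(rows), "with income:", len(incomes))
print("median income:", statistics.median(incomes))
print("MntTotal per customer:", [r["MntTotal"] for r in rows])
```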

Questions:

Perform an exploratory data analysis on the data, such as visualizing the distribution of values in
selected columns, or differences in the distribution of a column across sub-groups of customers.

Below are some suggested questions you can try answering. You don’t have to answer all of them; aim to
answer 5 questions in your report, which may or may not come from the list below. Focus on the quality
and depth of your answers, not just quantity.

a. Demographic Analysis:
- What is the distribution of customer income? Are there any noticeable patterns or
outliers?
- How are customers distributed across different education levels and marital statuses?
b. Spending Behavior:
- Which product category (e.g., wines, meats, sweets) has the highest and lowest average
spending?
- What is the total spending across all categories (MntTotal) for customers with varying income
levels?
c. Campaign Effectiveness:
- What percentage of customers accepted offers in each campaign (AcceptedCmp1-5)? Is there a
trend in acceptance rates across campaigns?
- Are customers who accepted previous campaigns (AcceptedCmp1-5) more likely to accept the
most recent campaign (Response)?
d. Channel Preferences:
- Which purchase channel (e.g., web, catalog, store) is the most frequently used across
customers?
- How does the number of website visits (NumWebVisitsMonth) relate to the number of website
purchases (NumWebPurchases)?
e. Customer Segments:
- How do family dynamics (e.g., Kidhome, Teenhome) affect customer spending on specific
product categories like wines or sweets?
- Are customers who have complained in the last two years (Complain) spending significantly more
or less than those who haven’t?
f. Recency and Response:
- Is there a relationship between the recency of a customer’s last purchase (Recency) and their
likelihood of accepting the latest campaign (Response)?
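Both campaign-effectiveness questions in (c) reduce to proportions: the share of customers with a 1 in each AcceptedCmp column, and the Response rate among customers who did or did not accept any earlier campaign. The sketch below assumes these columns hold 0/1 integers, as the key-column descriptions suggest; the five customers are invented.

```python
CMP_COLS = [f"AcceptedCmp{i}" for i in range(1, 6)]

# Invented 0/1 flags for five customers: AcceptedCmp1..5 followed by Response.
RAW = [
    (1, 0, 0, 1, 0, 1),
    (0, 0, 0, 0, 0, 0),
    (0, 0, 1, 0, 0, 1),
    (0, 0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1, 0),
]
customers = [dict(zip(CMP_COLS + ["Response"], r)) for r in RAW]


def acceptance_rates(rows):
    """Percentage of customers who accepted each of the five campaigns."""
    n = len(rows)
    return {c: 100 * sum(r[c] for r in rows) / n for c in CMP_COLS}


def response_rate_by_history(rows):
    """Response rate for customers with and without any earlier campaign acceptance."""
    groups = {"accepted_before": [], "never_accepted": []}
    for r in rows:
        key = "accepted_before" if any(r[c] for c in CMP_COLS) else "never_accepted"
        groups[key].append(r["Response"])
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


print(acceptance_rates(customers))
print(response_rate_by_history(customers))
```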
Question 4: Linear Regression Analysis (supervised learning) (20 points)

The same notebook from question 3 includes a template for linear regression analysis. The intention of
this section is to check whether there are statistically significant linear relationships between variables in the
dataset.

Below are some questions you can try answering. You don’t have to answer all of them; aim to answer 3
questions in your report (besides the analysis and visualization extracted from the notebook), which may
or may not come from the list below. Focus on the quality and depth of your answers, not just quantity.

● Income vs. Spending: how does a customer’s yearly income predict their spending?
● Website Visits vs. Website Purchases: can the number of website visits in the last month
predict the number of website purchases?
● Family Size vs. Spending: does the number of children and teenagers in a household predict
spending on certain products?
● Discount Purchases vs. Total Purchases: does the number of discount purchases predict the
total number of purchases made in specific channels or across all channels?
● Recency vs. Campaign Response: does the number of days since a customer’s last purchase
influence their likelihood of accepting the latest campaign offer?
Question 5: Customer Segmentation (unsupervised learning) (20 points)

The same notebook from question 3 includes a template for customer segmentation. The segmentation
(clustering) is performed based on RFM (recency, frequency, monetary) aspects of the customers. The
intention of this section is to identify meaningful clusters of users.
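The template's own clustering code is not reproduced here. As a self-contained stand-in, the sketch below runs plain k-means (Lloyd's algorithm with farthest-first seeding) on standardized RFM features, so that days, purchase counts, and dollars carry comparable weight, and then reports the mean raw R, F, and M of each cluster. The six (Recency, Frequency, Monetary) triples are invented, and reading Frequency as a total purchase count and Monetary as total spend is an assumption about how the notebook defines them.

```python
import math
import random
import statistics


def standardize(points):
    """Z-score each feature so days, counts, and dollars get comparable weight."""
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [tuple((v - m) / s for v, m, s in zip(p, means, sds)) for p in points]


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with farthest-first seeding; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:  # next seed: the point farthest from all chosen seeds
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Mean of the members; keep the old centroid if the cluster is empty.
            new.append(tuple(map(statistics.fmean, zip(*members))) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return [nearest(p, centroids) for p in points], centroids


# Invented (Recency days, Frequency purchases, Monetary $) triples for six customers.
rfm = [(5, 25, 1800), (10, 22, 1500), (8, 20, 1650), (80, 4, 60), (90, 3, 40), (70, 5, 90)]
labels, _ = kmeans(standardize(rfm), k=2)
for j in sorted(set(labels)):
    members = [p for p, lab in zip(rfm, labels) if lab == j]
    means = [round(statistics.fmean(col), 1) for col in zip(*members)]
    print(f"cluster {j}: n={len(members)}, mean Recency/Frequency/Monetary = {means}")
```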

Below are some questions you can try answering. You don’t have to answer all of them, but aim to answer
at least the first 2 questions in your report. Focus on the quality and depth of your answers, not just
quantity.

● Retention Strategies: which clusters in your analysis should be targeted with retention
strategies, and how?
● Promotional Targeting: which clusters can be targeted with promotional campaigns, and how?
● High-Value Customers: who are the high-value customers and what are their spending
patterns?
● Channel Preferences: how do purchase behaviors differ across clusters? Can specific channels
be optimized for certain customer groups?
● Campaign Effectiveness: how does campaign acceptance vary across clusters? Are high-value
clusters more likely to respond positively to campaigns?
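Once each customer carries a cluster label, most of these questions become per-cluster summaries. The sketch below assumes hypothetical records of (cluster, MntTotal, Response, NumWebPurchases, NumStorePurchases) with invented values, and reports cluster size, mean spend, response rate, and the web share of web-plus-store purchases. In such a table, a high-spend, high-response cluster might suit promotional campaigns, while a low-spend cluster with long Recency might call for retention efforts.

```python
import statistics
from collections import defaultdict

# Invented records: (cluster, MntTotal, Response, NumWebPurchases, NumStorePurchases)
RECORDS = [
    (0, 1800, 1, 8, 10),
    (0, 1500, 0, 6, 12),
    (0, 1650, 1, 9, 9),
    (1, 60, 0, 1, 3),
    (1, 40, 0, 2, 2),
    (1, 90, 1, 1, 4),
]


def profile(records):
    """Per-cluster size, mean spend, response rate, and share of purchases made on the web."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    out = {}
    for cluster, recs in sorted(groups.items()):
        web = sum(r[3] for r in recs)
        store = sum(r[4] for r in recs)
        out[cluster] = {
            "n": len(recs),
            "mean_spend": statistics.fmean(r[1] for r in recs),
            "response_rate": statistics.fmean(r[2] for r in recs),
            "web_share": web / (web + store) if web + store else 0.0,
        }
    return out


for cluster, summary in profile(RECORDS).items():
    print(cluster, summary)
```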
Question 6: Format and Writing Style

- Please compile all your answers into a single report in PDF format
- You can include your code (SQL, Python or Excel) describing how the analysis was performed in
a separate file if you don’t want to clutter your report
- Your report should include relevant visualizations (e.g., graphs) to support your
interpretation and recommendations
- You are encouraged to use AI assistance to improve your writing. However, use AI wisely: we do
not want (and will penalize) overly verbose paragraphs with little meaning.

Question 7: Peer evaluation

As this is a group assignment, your work must include a peer evaluation. Clearly state how each
member contributed to the group work and the percentage of the total work that should be credited to
each member.

This evaluation should be sent individually and confidentially via email to the instructors at
[email protected] and [email protected]

| Team Member | Contribution Description | % Contribution (Total 100%) |
|---|---|---|
| Member 1 | Wrote part 1, collected data, edited the paper | 25% |
| Member 2 | Wrote part 2, collected data | 20% |
| Member 3 | Wrote part 3, collected data | 20% |
| … | … | … |
