Capstones AIML and DS Capstone Projects

The document discusses four different domains - Real Estate, Healthcare, Retail, and EdTech. For each domain, it provides the problem statement, objective, tools used (Jupyter notebook and Tableau), and learning objective. The learning objectives involve tasks like exploratory data analysis, model building, dashboard creation, and more. The overall goal is to work on predictive analytics and business intelligence problems in different domains.

Uploaded by

Kaushik Das


Data Science (All categories)

Problem statement, Objective, Tools used, Learning objective

Domain: Real Estate

Problem statement: A banking institution requires actionable insights into mortgage-backed
securities, geographic business investment, and real estate analysis. The mortgage bank would
like to identify potential monthly mortgage expenses for each region based on monthly family
income and rental of the real estate.

Also you need to create a dashboard that demonstrates relationships and trends for the key
metrics as follows: number of loans, average rental income, monthly mortgage and owner’s
cost, family income vs mortgage cost comparison across different regions. The metrics
described here do not limit the dashboard to these few.

Objective: A statistical model needs to be created to predict the potential
demand for the amount of loan in dollars for each of the regions in the USA. Also, there is a
need to create a dashboard which would refresh periodically, post data retrieval from the
agencies.

Tool used: Jupyter notebook and Tableau

Learning Objective: In this capstone, you will perform exploratory data analysis (EDA) and data
preprocessing prior to model building, and then build linear regression models that predict total
monthly expenditure for home mortgage loans. You will also create a dashboard in Tableau.
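
As a rough illustration of the linear regression step, the sketch below fits a one-variable least-squares line in plain Python. The income and mortgage figures are made up for the example and are not taken from the capstone dataset; the names fit_line and predict_mortgage are hypothetical. In a notebook you would more likely use a regression library, so treat this only as a view of what the fit computes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative values only (dollars per month), one pair per region.
monthly_family_income = [4000, 5200, 6100, 7500, 9000]
monthly_mortgage_cost = [1100, 1400, 1650, 1950, 2400]

slope, intercept = fit_line(monthly_family_income, monthly_mortgage_cost)


def predict_mortgage(income):
    """Predicted monthly mortgage cost for a given monthly family income."""
    return slope * income + intercept
```

A fitted line always passes through the point of means, which is a quick sanity check before moving on to models with more predictors.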

Domain: Healthcare

Problem statement: NIDDK (National Institute of Diabetes and Digestive and Kidney Diseases)
research creates knowledge about and treatments for the most chronic, costly, and
consequential diseases. The datasets consist of several medical predictor variables and one
target variable (Outcome). Predictor variables include the number of pregnancies the patient
has had, their BMI, insulin level, age, and more.

Objective: In this capstone you will have to predict whether or not a patient has diabetes, based
on certain diagnostic measurements included in the dataset. Build a model to accurately predict
whether the patients in the dataset have diabetes or not.

Tool used: Jupyter notebook and Tableau

Learning objective: You will perform descriptive analysis to explore the dataset
variables using histograms. You will also create scatter charts between pairs of variables to
understand their relationships. Report the data by creating a dashboard in Tableau.
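
To make the histogram step concrete, here is a minimal sketch of the counting that a histogram performs, written in plain Python. The BMI readings are invented for illustration, and histogram_counts is a hypothetical helper rather than part of any library; a plotting library would normally do this binning for you.

```python
def histogram_counts(values, bin_width, start=0):
    """Count values per fixed-width bin; returns {bin_lower_edge: count}."""
    counts = {}
    for v in values:
        # Lower edge of the bin that contains v.
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Made-up BMI readings, not taken from the NIDDK dataset.
bmi = [22.5, 31.0, 27.4, 35.2, 29.9, 24.1, 33.3, 26.0]
bmi_hist = histogram_counts(bmi, bin_width=5, start=20)
```

Looking at the counts per bin before plotting can reveal implausible values, such as a cluster of zeros in a measurement that cannot be zero.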

Domain: Retail

Problem statement: Customer segmentation is the practice of segregating the customer base
into groups of individuals based on some common characteristics such as age, gender,
interests, and spending habits. It is a critical requirement for business to understand the value
derived from a customer. RFM is a method used for analyzing customer value.

RFM: ‘Recency’ measures when a customer placed their last order and is an
important customer attribute to consider for segmentation. It is the number of days since the
customer made their last purchase.

How often a customer purchases from the store should also be taken into account for the
customer segmentation exercise. This is termed ‘Frequency’: the
number of purchases in a given period. A higher frequency indicates a more
engaged customer.

Can we draw conclusions about customer value based on Recency and Frequency only? Maybe not,
because we must also incorporate the amount a customer paid for the purchases, which is the
monetary value. ‘Monetary’ is the total amount of money a customer spent in the given time
period.

Objective: Perform customer segmentation using RFM analysis. The resulting segments
can be ordered from most valuable (highest recency, frequency, and monetary scores, where a
high recency score means a recent purchase) to least valuable (lowest recency, frequency,
and monetary scores).

Tool used: Jupyter notebook and Tableau

Learning Objective: You must conduct a preliminary data inspection and data cleaning.
After performing cohort analysis (a cohort is a group of subjects who share a defining
characteristic), you will be asked to build an RFM (Recency Frequency Monetary) model
and calculate RFM metrics. At the end you will create a dashboard in Tableau by choosing
appropriate chart types and metrics useful for the business.
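
The three RFM metrics defined above can be computed directly from a transaction log. The following is a minimal sketch in plain Python; the transactions, the snapshot date, and the rfm_table helper are all hypothetical and stand in for whatever the capstone dataset provides.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, order_date, amount).
transactions = [
    ("C1", date(2023, 1, 5), 120.0),
    ("C1", date(2023, 3, 20), 80.0),
    ("C2", date(2023, 2, 11), 40.0),
    ("C3", date(2023, 3, 28), 300.0),
    ("C3", date(2023, 3, 30), 150.0),
    ("C3", date(2023, 1, 15), 50.0),
]
snapshot = date(2023, 4, 1)  # assumed reference date for recency


def rfm_table(rows, snapshot_date):
    """Return {customer: {"recency", "frequency", "monetary"}}."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        # Track the most recent order date per customer.
        if customer not in last_order or day > last_order[customer]:
            last_order[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: {
            "recency": (snapshot_date - last_order[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
        }
        for customer in last_order
    }


rfm = rfm_table(transactions, snapshot)
```

In this toy data, customer C3 bought most recently, most often, and spent the most, so it would land in the most valuable segment once the metrics are scored.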

AIML (Masters)

Problem statement, Objective, Tools used, Learning objective

Domain: E-commerce

Problem statement: Amazon is an online shopping website that now caters to millions of
people everywhere. Over 34,000 consumer reviews for Amazon brand products like Kindle,
Fire TV Stick and more are provided. The dataset has attributes like brand, categories,
primary categories, reviews.title, reviews.text, and the sentiment. Sentiment is a categorical
variable with three levels: "Positive", "Negative", and "Neutral". For unseen data, the
sentiment needs to be predicted.

Objective: You are required to predict Sentiment or Satisfaction of a purchase based on
multiple features and review text.

Tools used: Jupyter notebook, Amazon SageMaker lab

Learning Objective: Perform an EDA (Exploratory Data Analysis) on the dataset and tackle
the class imbalance problem in it. You will also run multinomial Naive
Bayes, SVM, and Random Forest classifiers. You will also get an introduction to deep
learning by building an LSTM (long short-term memory) network. At the end you will be
asked to compare the accuracy of neural nets with traditional ML-based algorithms.
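
One common way to tackle class imbalance is random oversampling, where minority-class examples are duplicated until every class is as large as the biggest one. The brief does not say which technique is expected, so the sketch below is only one option; the reviews are invented and random_oversample is a hypothetical helper.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes have equal size."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw extra copies (with replacement) for under-represented classes.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Invented reviews with a skewed sentiment distribution.
reviews = ["great tablet", "love it", "works well", "fast delivery",
           "stopped working", "it is okay"]
sentiments = ["Positive"] * 4 + ["Negative", "Neutral"]

balanced_reviews, balanced_sentiments = random_oversample(reviews, sentiments)
class_counts = Counter(balanced_sentiments)
```

Oversampling is usually applied to the training split only, so that duplicated reviews do not leak into the data used for evaluation.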

Domain: Finance

Problem statement: Finance Industry is the biggest consumer of AIML engineers. It faces
constant attack by fraudsters, who try to trick the system. Correctly identifying fraudulent
transactions is often compared with finding a needle in a haystack because of the low
event rate. It is important that credit card companies are able to recognize fraudulent credit
card transactions so that the customers are not charged for items that they did not
purchase.

Objective: You are required to try various techniques such as supervised models with
oversampling, unsupervised anomaly detection, and heuristics to get good accuracy at
fraud detection.

Tools used: Jupyter notebook, Amazon SageMaker lab

Learning Objective: You will perform an EDA on the dataset. You will be required to
create models such as Naive Bayes, Logistic Regression, and SVM. Determine which one
performs the best. You will also be asked to predict store sales using ANN (Artificial Neural
Network). Aside from that, you will be required to implement anomaly detection algorithms.
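
Because the event rate is low, plain accuracy can look good even for a model that never flags fraud. The sketch below, using synthetic labels and a hypothetical evaluate helper, shows why precision and recall are worth reporting alongside accuracy when comparing the models.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary fraud label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Synthetic example: 2 fraudulent transactions out of 100.
y_true = [1, 1] + [0] * 98
always_legitimate = [0] * 100  # a "model" that never flags fraud

report = evaluate(y_true, always_legitimate)
```

Here the do-nothing model scores 98% accuracy while catching none of the fraudulent transactions, which is why recall on the fraud class is the more telling number.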

Domain: Retail

Problem Statement: Demand Forecast is one of the key tasks in Supply Chain and Retail
Domain in general. It is key in effective operation and optimization of retail supply chain.
Effectively solving this problem requires knowledge about a wide range of tricks in Machine
learning and good understanding of ensemble techniques.

Objective: You are required to predict sales for each Store-Day level for one month. All the
features will be provided and actual sales that happened during that month will also be
provided for model evaluation.

Tool used: Jupyter notebook, Amazon SageMaker lab

Learning Objective: You will be transforming the dataset variables using data
manipulation techniques such as One-Hot Encoding and conducting an EDA (Exploratory
Data Analysis) to determine the impact of variables on Sales. You will be applying Linear
Regression to predict the store sales. You will investigate Non-Linear Regressors such as
Random Forest or other Tree-based Regressors and compare the performance of Linear
and Non-Linear Regressors based on previous observations. To understand the
significance of deep neural network algorithms you will be using ANN (Artificial Neural
Network) to predict Store Sales.
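
One-Hot Encoding turns a categorical column into one 0/1 column per category so that regressors can consume it. A minimal plain-Python sketch follows; the store records, the column name store_type, and the one_hot_encode helper are assumptions made for illustration, not fields confirmed by the brief.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical store-day records.
store_days = [
    {"store": 1, "store_type": "a", "promo": 1},
    {"store": 2, "store_type": "c", "promo": 0},
    {"store": 3, "store_type": "a", "promo": 0},
]

encoded = one_hot_encode(store_days, "store_type")
```

Each encoded row has exactly one indicator set to 1, which keeps the categories from being read as an ordered numeric scale by a linear model.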

AIML (Bootcamp and PG)

Problem statement, Objective, Tools used, Learning objective

Domain: EdTech

Problem statement and Objective: Simplilearn would like to assess the quality of
E-Learning videos freely available on YouTube. This would give them ideas for preparing
video content that is more engaging for students. As a pilot study, they have
handpicked playlists corresponding to various Computer Science subjects from an NPTEL
channel. Videos will be assessed on various fronts like instructor presence
in the video, body language, use of blackboard, use of slides, etc.

Tools used: Jupyter notebook, Amazon SageMaker lab

Learning Objective: To proceed with the analysis, you need to employ uniform time
sampling to segment the MP4 videos into keyframes, and then perform clustering. As all
videos on YouTube are freely accessible, you can extract the comments and replies related
to them using YouTube API v3.
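
Uniform time sampling means grabbing one frame at a fixed time interval regardless of what is on screen. The sketch below only computes which timestamps and frame indices to extract; the duration, frame rate, and interval are assumed values, both function names are hypothetical, and the actual decoding of the MP4 file is left to whichever video library you choose.

```python
import math


def uniform_sample_times(duration_seconds, interval_seconds):
    """Timestamps (seconds) of the frames to keep, starting at 0."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    count = math.ceil(duration_seconds / interval_seconds)
    return [i * interval_seconds for i in range(count)]


def frame_indices(duration_seconds, fps, interval_seconds):
    """Frame numbers corresponding to the sampled timestamps."""
    times = uniform_sample_times(duration_seconds, interval_seconds)
    return [int(round(t * fps)) for t in times]


# Assumed values: a 10-second clip at 30 frames per second, one frame every 2.5 s.
times = uniform_sample_times(10, 2.5)
indices = frame_indices(10, 30, 2.5)
```

The sampled frames can then be turned into feature vectors and passed to the clustering step.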

Domain: Healthcare

Problem Statement: ICMR wants to analyze different types of cancers, such as breast
cancer, renal cancer, colon cancer, lung cancer, and prostate cancer, which have become a
cause of worry in recent years. The input dataset contains 802 samples, one for each of 802
people who have been diagnosed with different types of cancer. Each sample contains
expression values of more than 20K genes.

Each sample has one of the following tumor types: BRCA, KIRC, COAD, LUAD, and PRAD.

Objective: Determine the most likely cause of these cancers in terms of the genes
responsible for each type of cancer. This would lead to earlier detection of each type of
cancer, lowering the mortality rate.

Tool used: Jupyter notebook

Learning Objective: Your task is to conduct an Exploratory Data Analysis (EDA) on the
dataset. Afterward, you need to use feature selection algorithms like forward selection and
backward elimination to narrow down the selected attributes. You will then perform
dimensionality reduction using techniques such as PCA, LDA, and t-SNE. Your goal is to
identify groups of genes and sample distributions that exhibit similar behavior. To achieve
this, you will apply clustering techniques such as k-means, hierarchical, and mean shift
clustering on genes and samples. Ultimately, your objective is to develop a strong
classification model that can accurately identify different types of cancer.
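
Of the clustering techniques named above, k-means is the simplest to write out. The sketch below is a bare-bones version in plain Python with fixed starting centroids so the result is reproducible; the six two-dimensional points are invented stand-ins for samples after dimensionality reduction, and a real analysis would normally rely on a library implementation.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Basic k-means: returns (centroids, label per point)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = []
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            labels.append(distances.index(min(distances)))
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for index, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == index]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Invented 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
```

With real data the starting centroids are usually chosen at random or by a seeding heuristic, and the number of clusters has to be chosen by the analyst.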

Domain: Cyber Security

Problem Statement: Book-My-Show will enable ads on their website, but they are also
very cautious about user privacy and information about who visits their website. Some
ad URLs could contain a malicious link that can trick a recipient and lead to a malware
installation, freezing the system as part of a ransomware attack, or revealing
sensitive information.

Objective: Book-My-Show now wants to analyze whether a particular URL is prone to
phishing (malicious) or not. The input dataset contains 11k samples corresponding to
11k URLs. Each sample contains 32 features, each giving a different description of
the URL and taking one of the values -1, 0, or 1:
-1: Phishing
0: Suspicious
1: Legitimate

Tool used: Jupyter notebook

Learning Objective: Your task is to conduct an Exploratory Data Analysis (EDA) on the
dataset. Identify the correlated features present in the data and remove features whose
correlation with another feature exceeds some threshold. Finally, you will be asked to build a
robust classification system that classifies whether the URL sample is a phishing site or not.
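
The correlated-feature step can be sketched as follows: compute the Pearson correlation between feature columns and keep a feature only if it is not too strongly correlated with one already kept. The feature names, values, and the 0.9 threshold are assumptions for illustration; the brief leaves the threshold open.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # constant column: treated here as uncorrelated
    return cov / (spread_x * spread_y)


def drop_correlated(columns, threshold=0.9):
    """Keep features in order, skipping any too correlated with a kept one."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical features encoded as -1, 0, 1; the third duplicates the first.
features = {
    "has_ip_address": [1, -1, 1, -1, 1, 1],
    "url_length": [1, 0, -1, -1, 1, 0],
    "has_ip_copy": [1, -1, 1, -1, 1, 1],
}

kept = drop_correlated(features)
```

Which member of a correlated pair survives depends on column order in this sketch, so it is worth checking that the dropped feature is not the more interpretable one.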
