
Data Science Assignment

Introduction:

We’re excited to have you participate in this interview assignment! The purpose of this task is to
assess your skills in Data Science, MLOps, and Generative AI. You will have the option to work
on any two of the problem statements outlined below. Each task is designed to test different aspects
of your technical abilities, from data manipulation and machine learning to deploying AI models.

Timelines:

• 3-5 days after receiving the assignment.

Instructions:

• Choose any two tasks from the assignment to complete.

• Submit your solution using one of the following methods:

o GitHub Repository: Provide a link to the repository containing your code.

o Zip Folder: Submit a zip file that contains your project code.

• You may use Google Colab or any other open-source notebook environment if needed.

Ensure your submission includes:

• A Jupyter Notebook that explains your process and walks through the solution; the
notebook should also contain the cell outputs.

• Any additional Python scripts/files if required for deployment or additional functionalities.

Good luck, and we look forward to reviewing your solutions!


Problem Statement 1: Data Science Task

You will be working with a dataset related to a company's customer churn. The goal is to predict
whether a customer is likely to churn.

Dataset:

Open-source dataset from Kaggle: https://www.kaggle.com/datasets/blastchar/telco-customer-churn/data

Objective:

Build a Machine Learning / Deep Learning Model to predict customer churn.

Suggested Timelines:

• 2 days

Tasks:

1. Data Preprocessing:
• Perform EDA (Exploratory Data Analysis) to understand the dataset and handle
missing values, outliers, and feature transformations.
• Encode categorical variables and scale numerical features if necessary.
2. Feature Engineering:
• Create new features that may improve model performance. For example, can you
create a feature that indicates customer tenure length in months?
3. Model Training and Selection:
• Train at least two to three different models (including at least one deep learning
model) and compare their performance.
• Optimize hyperparameters using cross-validation or a grid search method.
4. Model Evaluation:
• Evaluate your models using metrics such as accuracy, precision, recall, F1-score,
and AUC-ROC.
• Create a confusion matrix for all models and discuss the results.
5. Model Interpretation:
• Explain the most important features that contribute to the predictions.
• Use techniques like SHAP values or feature importance plots to explain the model’s
decisions.
6. Bonus:
• Suggest potential business actions the company could take based on your analysis
to reduce customer churn.
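The preprocessing, training, and evaluation steps above can be sketched as follows. This is a minimal illustration with scikit-learn on a tiny synthetic stand-in for the Telco CSV (the column names are assumptions based on that dataset), not a full solution — a real submission would load the Kaggle file, handle all its features, and add a deep learning model.

```python
# Minimal sketch of tasks 1-4, assuming scikit-learn and pandas.
# The columns below are assumptions modeled on the Telco dataset.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic stand-in for the Kaggle CSV.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 28, 62, 13, 16, 58],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70, 99.65,
                       89.10, 104.80, 56.15, 49.95, 18.95, 100.35],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Month-to-month", "Month-to-month", "Month-to-month",
                 "Month-to-month", "One year", "Month-to-month",
                 "Two year", "One year"],
    "Churn": [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0],
})
X, y = df.drop(columns="Churn"), df["Churn"]

# Task 1: scale numeric features, one-hot encode categoricals.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Task 3: hyperparameter search via cross-validated grid search.
grid = GridSearchCV(model, {"clf__C": [0.1, 1.0, 10.0]}, cv=3, scoring="roc_auc")
grid.fit(X_tr, y_tr)

# Task 4: evaluate with AUC-ROC plus precision/recall/F1.
proba = grid.predict_proba(X_te)[:, 1]
print("AUC-ROC:", roc_auc_score(y_te, proba))
print(classification_report(y_te, grid.predict(X_te), zero_division=0))
```

For model interpretation (task 5), the fitted pipeline's coefficients or a SHAP explainer can be applied to the transformed features in the same way.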
Deliverables:

• A Jupyter Notebook with your EDA, model training, evaluation, and model interpretation
steps.
• A report summarizing your findings and recommendations.

Problem Statement 2: MLOps Task

The second part involves setting up a basic MLOps pipeline to ensure that your machine learning
model can be continuously improved and deployed into production.

Scenario:

You have been asked to deploy the churn prediction model as a REST API for your company's
marketing team. They should be able to send customer data and get a prediction of whether the
customer is likely to churn.

Suggested Timelines:

• 1 day

Tasks:

1. Model Packaging:

• Package the trained machine learning model (from Part 1) using a Python-based
framework like Flask, FastAPI, or Django.

• Ensure the model can be served as a REST API that accepts customer data and
returns a prediction.

2. Containerization:

• Containerize the API using Docker, ensuring that the application can run in a
reproducible environment.
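An illustrative Dockerfile for the API might look like the following; the file names (`app.py` holding the FastAPI app, `requirements.txt` pinning dependencies) and the port are assumptions to adapt to your project layout.

```dockerfile
# Minimal illustrative Dockerfile; file names and port are assumptions.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Build and run with `docker build -t churn-api .` followed by `docker run -p 8000:8000 churn-api`.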

3. CI/CD Pipeline (Bonus):

• Set up a CI/CD pipeline using tools like GitHub Actions or Jenkins to automatically
test and deploy the model when changes are made.

• Integrate a testing framework to validate the API before deployment.


4. Model Monitoring:

• Suggest how you would monitor the model’s performance in production. For
example, tracking model drift, API latency, and request logs.

5. Scaling (Bonus):

• Discuss how you would scale this API in a production environment. Include tools
like Kubernetes or auto-scaling with cloud platforms (e.g., AWS, GCP).

Deliverables:

• A GitHub repository or zip file of the project containing:

o The REST API code for model deployment.

o A Dockerfile for containerizing the API.

o Optional CI/CD setup files (if part of the bonus tasks).

o Instructions to run the API locally.

Problem Statement 3: Generative AI

In this task, you will use the “Enron Email Dataset” to build a system that can either summarize
long email threads or generate responses to common emails. The goal is to explore the capabilities
of a generative language model to handle everyday email tasks.

Dataset:

Open-source dataset from Kaggle: https://www.kaggle.com/datasets/wcukierski/enron-email-dataset

Scenario:

Your company wants to automate part of its email workflow by using AI to either summarize long
email threads or generate responses for common email types.

Suggested Timelines:

• 2 days
Objective:

• Create a pipeline using a pre-trained language model to perform one of the following tasks:

1. Summarize long email threads.

2. Generate automated responses to common email types.

Tasks:

1. Dataset Exploration & Preprocessing:

• Download the Enron Email Dataset from Kaggle.

• Pick a subset of emails that are either part of long threads (3+ replies) or cover common
topics (e.g., meeting requests, project updates).

• Clean the email text by:

o Removing unnecessary parts (e.g., signatures, metadata).

o Simplifying the content for model input.
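The cleaning step can be sketched with only the standard library: the Enron CSV's "message" column holds raw RFC 2822 email text, so Python's `email` module can split headers from the body. The signature and quoting heuristics below are assumptions that will need tuning on the real data.

```python
# Rough email-cleaning sketch; the cutoff heuristics are assumptions.
import email
import re

def clean_email(raw: str) -> str:
    msg = email.message_from_string(raw)
    body = msg.get_payload()  # Enron messages are plain text, not multipart
    lines = []
    for line in body.splitlines():
        if line.strip().startswith(">"):  # drop quoted replies
            continue
        if re.match(r"^-+\s*Original Message\s*-+", line, re.I):
            break  # drop forwarded tail
        if re.match(r"^(--|__|Best regards|Thanks,|Regards,)", line.strip(), re.I):
            break  # crude signature cutoff
        lines.append(line)
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()

raw = """Message-ID: <123.JavaMail@thyme>
From: alice@enron.com
To: bob@enron.com
Subject: Meeting tomorrow

Can we move the 10am sync to 2pm?

> Sure, what time works?

Thanks,
Alice
"""
print(clean_email(raw))  # only the new, unquoted content survives
```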

2. Email Summarization Task:

Goal: Summarize long email threads into concise, actionable summaries.

• Steps:

o Use a pre-trained, open-source model for text summarization.

o Input email threads into the model and generate a summary.

o Evaluate the summary by checking if it captures the key points (e.g., decisions,
action items).
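These steps might be sketched as below. The chunking helper is plain Python; `summarize_thread` assumes the Hugging Face `transformers` library is installed, and the checkpoint name is just one common open choice, not a requirement. Long threads are summarized chunk by chunk, then the partial summaries are summarized again.

```python
# Summarization sketch; the model checkpoint and chunk size are assumptions.
def chunk_text(text: str, max_words: int = 400) -> list[str]:
    """Split a long thread into word-bounded chunks the model can handle."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def summarize_thread(thread: str) -> str:
    """Two-pass summary of a cleaned thread; assumes `transformers` + torch."""
    from transformers import pipeline  # pip install transformers torch
    summarizer = pipeline("summarization",
                          model="sshleifer/distilbart-cnn-12-6")
    partial = [summarizer(c, max_length=80, min_length=15)[0]["summary_text"]
               for c in chunk_text(thread)]
    # A second pass over the concatenated partials yields the final digest.
    return summarizer(" ".join(partial),
                      max_length=100, min_length=20)[0]["summary_text"]
```

Evaluation can then compare each summary against the thread's decisions and action items, either manually or with a metric such as ROUGE.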

3. Response Generation Task:

Goal: Automatically generate responses to common email types.

• Steps:

o Select a set of common email topics (e.g., meeting requests, status updates).

o Use a pre-trained model to generate an automated response based on the email
content.

o Evaluate the responses by checking if they are relevant and appropriate for the
context.

4. Model Evaluation:
• Summarization Task:

o Assess the quality of the summaries (using metrics or manual review). Does the
summary capture the main points? Is it concise and accurate?

• Response Generation Task:

o Check if the responses are coherent, contextually appropriate, and relevant to
the email.

5. Bonus (Optional): Simple API Deployment:

• Deploy the summarization or response generation system as a simple Flask or FastAPI
app.

o The app should allow users to input an email thread and receive a summary or
an automated response.

Deliverables:

• A Jupyter Notebook or Python script with:

o Data preprocessing steps.

o Implementation of the email summarization and response generation task.

o Manual evaluation of the results (or use of advanced metrics).

• A GitHub repository containing:

o The code for email summarization and response generation.

o (Optional) API code if the bonus task is completed.
