Team Presentation Slides

The document presents a comprehensive analysis of a dataset related to students participating in Excelerate's internship programs, focusing on data cleaning, exploratory data analysis, and predictive modeling. Key findings include insights on student engagement, churn analysis revealing a 40% churn rate, and the development of machine learning models to predict churn behavior. Recommendations for improving student engagement include personalized alerts, targeted course suggestions, and incentivizing participation.

Uploaded by

rahulvetal204

Excelerate Students Data Analysis
RIT 1410 AI Team 5A

Overview

1. Introduction
2. Exploratory Data Analysis (EDA)
3. Insight Generation
4. Hypothesis Development
5. Churn Analysis
6. Predictive Modeling
7. Recommendations
8. Conclusion

1. Introduction

1.1: Understanding the Dataset

- The dataset originally had 8,559 rows and 33 columns.
- It describes students who participated in one of the internship programs powered by Excelerate in June 2023.

1.2: Key Challenges in the Dataset

- The dataset had many empty cells.
- Several columns had incorrect data types that could have caused issues later in the analysis.
- The dataset had not yet been analyzed to extract insights.

2. Exploratory Data Analysis (EDA)

2.1: Initial Data Cleaning

- Changing the data types of the columns to appropriate formats
  - Dates to datetime64[ns], Zip Code to int, etc.
- Dropping columns that had a lot of empty data or were of no use
- Validating the data according to each column's type, e.g.:
  - Ensuring names are alphabetic and have at least 3 characters
  - Ensuring Zip Codes are numeric
  - Ensuring there are no extra characters such as quotation marks (‘’)
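
A minimal pandas sketch of these cleaning rules. The column names `First Name`, `Zip Code`, and `Date of Birth` are assumptions, since the slides do not show the raw schema; the function name `clean_students` is hypothetical.

```python
import pandas as pd

QUOTES = r"[\"'‘’“”]"  # straight and curly quotation marks to strip


def clean_students(df: pd.DataFrame) -> pd.DataFrame:
    """Convert types and drop rows that fail simple validation rules."""
    out = df.copy()
    # Missing values become empty strings so they fail validation instead of turning into "nan".
    for col in ["First Name", "Zip Code"]:
        out[col] = (
            out[col].fillna("").astype(str).str.replace(QUOTES, "", regex=True).str.strip()
        )
    # Dates become datetime64[ns]; unparseable values become NaT.
    out["Date of Birth"] = pd.to_datetime(out["Date of Birth"], errors="coerce").astype(
        "datetime64[ns]"
    )
    # Names must be alphabetic with at least 3 characters; zip codes must be all digits.
    valid_name = out["First Name"].str.fullmatch(r"[A-Za-z]{3,}")
    valid_zip = out["Zip Code"].str.fullmatch(r"\d+")
    out = out[valid_name & valid_zip].copy()
    out["Zip Code"] = out["Zip Code"].astype(int)
    return out
```

Converting Zip Code to `int` follows the slide, but note that it drops leading zeros; keeping it as a validated string is an alternative if that matters.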

2.2: Context-Based EDA

- Ensuring that no column had any empty values
- Converting the columns to numeric form for the machine learning models
- Dropping the columns with a lot of empty data
- Dropping the columns that were irrelevant to the model development phase
- Creating new columns from existing ones just before the model development phase
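
One way to combine the column-dropping and "no empty values" steps, sketched in pandas. The 50% missing-value cutoff and the `irrelevant` list are hypothetical parameters; the slides do not say what threshold was used.

```python
import pandas as pd


def remove_missing(df: pd.DataFrame, max_missing: float = 0.5, irrelevant=()) -> pd.DataFrame:
    """Drop sparse or irrelevant columns, then drop rows that still have gaps."""
    # Fraction of missing values per column.
    missing_share = df.isna().mean()
    kept = df.loc[:, missing_share <= max_missing]
    # Columns judged irrelevant to modeling (e.g. IDs) are removed explicitly.
    kept = kept.drop(columns=[c for c in irrelevant if c in kept.columns])
    # Afterwards, no column should contain an empty value.
    return kept.dropna().reset_index(drop=True)
```

Dropping rows is only one option for the remaining gaps; imputation would keep more students at the cost of introducing estimated values.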

2.3: Feature Engineering

2.3.1: Creating New Features

- Creating a new column, “Age of Learner”, from “Date of Birth”
- Creating a new column, “Engagement Duration”, from the time between “Apply Date” and “Starting Date”
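
A sketch of both derived columns in pandas. The slides do not say which date age is measured against, so `reference_date` (defaulting to the end of June 2023) is an assumption, as is measuring Engagement Duration in whole days.

```python
import pandas as pd


def add_learner_features(df: pd.DataFrame, reference_date: str = "2023-06-30") -> pd.DataFrame:
    out = df.copy()
    dob = pd.to_datetime(out["Date of Birth"])
    ref = pd.Timestamp(reference_date)
    # Whole years of age at the reference date: subtract 1 if the birthday is still ahead.
    birthday_passed = (dob.dt.month < ref.month) | (
        (dob.dt.month == ref.month) & (dob.dt.day <= ref.day)
    )
    out["Age of Learner"] = ref.year - dob.dt.year - (~birthday_passed).astype(int)
    # Days elapsed between applying and starting.
    start = pd.to_datetime(out["Starting Date"])
    applied = pd.to_datetime(out["Apply Date"])
    out["Engagement Duration"] = (start - applied).dt.days
    return out
```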

2.3.2: Data Encoding and Normalization

- Applying FeatureHasher to first and last names, location, institution name, and current major
- Applying MinMaxScaler() to 'Engagement Score', 'Opportunity Participation Count', 'Days Since Last Engagement', 'Age of Learner', and 'Engagement Duration'
- Applying one-hot encoding to the remaining columns
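
A scikit-learn sketch of the three transformations. For brevity it hashes only two of the listed text columns and scales two of the numeric ones; the bucket count `n_hash` and the column subsets are illustrative assumptions, not the team's settings.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction import FeatureHasher
from sklearn.preprocessing import MinMaxScaler

HASHED = ["First Name", "Last Name"]  # subset of the hashed columns, for brevity
SCALED = ["Engagement Score", "Age of Learner"]  # subset of the scaled columns


def encode_features(df: pd.DataFrame, n_hash: int = 16) -> np.ndarray:
    # Hash each high-cardinality text column into n_hash buckets (one token per row).
    hasher = FeatureHasher(n_features=n_hash, input_type="string")
    hashed = [hasher.transform([[v] for v in df[c].astype(str)]).toarray() for c in HASHED]
    # Rescale numeric columns to the [0, 1] range.
    scaled = MinMaxScaler().fit_transform(df[SCALED])
    # One-hot encode whatever columns remain.
    rest = df.drop(columns=HASHED + SCALED)
    onehot = pd.get_dummies(rest).to_numpy(dtype=float)
    return np.hstack(hashed + [scaled, onehot])
```

Hashing avoids creating one column per distinct name, at the cost of occasional collisions between different values that land in the same bucket.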

2.3.3: Other Feature Engineering Processes

- Extracting date-based features, i.e., SignUp Month and Year
- Calculating “Engagement Score” from “Time in Opportunity” and “Age of Learner”
- Calculating “Time in Opportunity” in days from the Opportunity Start and End dates
- Adding a new column, “Time Since Last Engagement”
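
A pandas sketch of the date-based features. The column names `Signup Date`, `Opportunity Start Date`, `Opportunity End Date`, and `Last Engagement Date`, and the snapshot date `as_of`, are hypothetical. The Engagement Score formula is not given in the slides, so it is left out.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame, as_of: str = "2023-12-31") -> pd.DataFrame:
    out = df.copy()
    signup = pd.to_datetime(out["Signup Date"])
    out["SignUp Month"] = signup.dt.month
    out["SignUp Year"] = signup.dt.year
    # Length of the opportunity in days.
    start = pd.to_datetime(out["Opportunity Start Date"])
    end = pd.to_datetime(out["Opportunity End Date"])
    out["Time in Opportunity"] = (end - start).dt.days
    # Days between the last recorded engagement and a fixed snapshot date (assumption).
    last = pd.to_datetime(out["Last Engagement Date"])
    out["Time Since Last Engagement"] = (pd.Timestamp(as_of) - last).dt.days
    return out
```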

3. Insight Generation

3.1: Simple Insight Generation

3.2: Advanced Insight Generation

4. Hypothesis Development

4.1: Correlation Analysis

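- Hypothesis: Engagement scores decline as the time since signup increases.
- Result: a correlation of 0.33. Because the correlation is positive, scores tend to rise rather than decline as time since signup increases.
- Does not support the hypothesis.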
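
A minimal check of this hypothesis in pandas. The slides do not say which correlation coefficient was used; this sketch assumes Pearson (the pandas default), and the function name is hypothetical.

```python
import pandas as pd


def decline_hypothesis_check(days_since_signup, engagement_score):
    """Return (r, supported): the hypothesis predicts a negative correlation."""
    r = pd.Series(days_since_signup, dtype=float).corr(pd.Series(engagement_score, dtype=float))
    return r, bool(r < 0)
```

Under this reading, r = 0.33 gives `supported == False`, which matches the slide's conclusion.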

4.2: Cohort Analysis

- Hypothesis: Students who are highly engaged in their first 30 days are more likely to remain active long-term.
- Supports the hypothesis.
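
A sketch of a two-cohort retention comparison. The slides do not define "highly engaged" or "active long-term", so the columns `First 30 Days Score` and `Active Long-Term` (boolean) and the `early_threshold` cutoff are all hypothetical.

```python
import pandas as pd


def cohort_retention(df: pd.DataFrame, early_threshold: float) -> pd.Series:
    # Hypothetical definition: "highly engaged" means an early-window score at or above the cutoff.
    high_early = df["First 30 Days Score"] >= early_threshold
    labels = high_early.map({True: "high early engagement", False: "low early engagement"})
    # Share of each cohort that is still active long-term.
    return df.groupby(labels)["Active Long-Term"].mean()
```

The hypothesis is supported when the high-engagement cohort shows the higher retention rate.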

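4.3: T-Test for Age Groups

- Hypothesis: Older students have higher engagement scores than younger students.
- T-statistic: -0.1182, p-value: 0.9059
- The p-value is far above common significance levels such as 0.05, so the difference between the age groups is not statistically significant.
- Does not support the hypothesis.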
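
A sketch of the age-group comparison with SciPy. The age `cutoff` that splits "older" from "younger" is a hypothetical parameter, and Welch's variant (`equal_var=False`) is an assumption; the slides do not say which t-test was run.

```python
import numpy as np
from scipy import stats


def age_group_ttest(ages, scores, cutoff):
    ages, scores = np.asarray(ages), np.asarray(scores)
    older = scores[ages >= cutoff]
    younger = scores[ages < cutoff]
    # Welch's two-sample t-test (does not assume equal variances).
    t, p = stats.ttest_ind(older, younger, equal_var=False)
    return t, p
```

Because the hypothesis is directional, passing `alternative="greater"` to `ttest_ind` would test "older > younger" directly.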

4.4: Regression Analysis

- Hypothesis: Students who have engaged recently have higher overall engagement scores.
- The upward slope of the regression line indicates that the data support the hypothesis.
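
A sketch of the slope estimate with `scipy.stats.linregress`. The slides do not state which variable is on the x-axis. The hypothetical `recency` input is any numeric recency measure, and the slope sign that supports the hypothesis depends on its direction: higher-means-more-recent gives a positive slope, while raw days since last engagement would give a negative one.

```python
from scipy import stats


def recency_regression(recency, scores):
    # Ordinary least-squares fit of engagement score against the recency measure.
    fit = stats.linregress(recency, scores)
    return fit.slope, fit.intercept, fit.rvalue
```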

4.5: Chi-Square Test

- Hypothesis: Students who sign up early in the year have higher long-term engagement.
- χ² = 127.542, p-value = 5.47 × 10^-22
- Supports the hypothesis.
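
A sketch of the test using a contingency table of signup timing against long-term engagement. Treating January to June as "early", and the boolean column `Long-Term Engaged`, are assumptions; the slides do not give either definition.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def signup_timing_chi2(df: pd.DataFrame):
    # Hypothetical split: "early" means a signup month of January through June.
    early = pd.to_datetime(df["Signup Date"]).dt.month <= 6
    table = pd.crosstab(early, df["Long-Term Engaged"])
    # Pearson chi-square test of independence (Yates-corrected for 2x2 tables by default).
    chi2, p, dof, expected = chi2_contingency(table)
    return chi2, p, dof
```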

5. Churn Analysis

- Performed churn analysis using three columns, with each threshold set at the quartile boundary that isolates the least-engaged 25% of values in that column:
  - Engagement Score: below 2.112069
  - Days Since Last Engagement: more than 126
  - Opportunity Participation Count: fewer than 2
- Result: 878 of 2,195 students churned, a churn rate of 40%.
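
A sketch of quartile-based churn labeling in pandas. Using the 25th percentile for Engagement Score and Participation Count and the 75th for Days Since Last Engagement follows the slide's description; flagging a student when any one condition holds is an assumption, since the slides do not state how the three conditions were combined.

```python
import pandas as pd


def label_churn(df: pd.DataFrame) -> pd.Series:
    # Quartile cutoffs on the low-engagement side of each column.
    score_cut = df["Engagement Score"].quantile(0.25)
    days_cut = df["Days Since Last Engagement"].quantile(0.75)
    count_cut = df["Opportunity Participation Count"].quantile(0.25)
    # Assumption: a student counts as churned if ANY of the three conditions holds.
    churn = (
        (df["Engagement Score"] < score_cut)
        | (df["Days Since Last Engagement"] > days_cut)
        | (df["Opportunity Participation Count"] < count_cut)
    )
    return churn.astype(int)
```

The churn rate is then `label_churn(df).mean()`; for the reported figures, 878 / 2,195 = 0.40.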

6. Predictive Modeling

6.1: Machine Learning Models

The following machine learning models were used:

- K-Nearest Neighbors (metric: Euclidean)
- Random Forest
- Logistic Regression
- Decision Trees
- Support Vector Machines (SVMs), linear kernel
- Library used: Scikit-Learn
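
A Scikit-Learn sketch that fits the five listed model types on one train/test split and reports test accuracy. The hyperparameters (default settings, `max_iter=1000`, a 20% stratified test split) are illustrative assumptions, not the team's configuration.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

MODELS = {
    "KNN (euclidean)": KNeighborsClassifier(metric="euclidean"),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Linear SVM": SVC(kernel="linear"),
}


def compare_models(X, y, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # clone() gives each run a fresh, unfitted copy; score() returns test accuracy.
    return {name: clone(m).fit(X_tr, y_tr).score(X_te, y_te) for name, m in MODELS.items()}


if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    print(compare_models(X, y))
```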

6.2: Deep Learning Models

- Artificial Neural Network (ANN)
- Recurrent Neural Network (RNN)
- Autoencoder
- Multilayer Perceptron (MLP)
- Libraries used: TensorFlow, Keras

6.3: How Did We Evaluate the Models?

- Evaluated the models using the confusion matrix, accuracy, precision, recall, and F1-score.
- Applied Dropout layers in the deep learning models to reduce overfitting.
- Evaluated each model on held-out test data.
- In the end, all models reached an accuracy of 1.0, which we attribute to thorough data cleaning and validation.
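
A sketch computing the listed metrics with `sklearn.metrics`, treating churn as the positive class (label 1). The helper name `evaluate` is hypothetical.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)


def evaluate(y_true, y_pred):
    return {
        # Rows are true classes and columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```

For example, with true labels `[1, 0, 1, 1, 0]` and predictions `[1, 0, 0, 1, 1]`, there are 2 true positives, 1 false negative, 1 false positive, and 1 true negative, giving accuracy 0.6 and precision, recall, and F1 of 2/3 each.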

7. Recommendations

- Personalized Re-Engagement Alerts: Notify students with low engagement or long inactivity to encourage them to re-engage.
- Targeted Course Suggestions: Recommend courses and opportunities based on each student's interests and participation history.
- Incentivize Participation: Encourage involvement by rewarding students for completing activities or courses.

8. Conclusion

- Thoroughly cleaned and analyzed the data using established data analysis and statistical techniques.
- Developed models to predict churn, following best practices during model development.
- Based on the analysis, made recommendations to help Excelerate reduce student churn.

Thank You
