
Team Presentation Slides

The document presents a comprehensive analysis of a dataset related to students participating in Excelerate's internship programs, focusing on data cleaning, exploratory data analysis, and predictive modeling. Key findings include insights on student engagement, churn analysis revealing a 40% churn rate, and the development of machine learning models to predict churn behavior. Recommendations for improving student engagement include personalized alerts, targeted course suggestions, and incentivizing participation.

Uploaded by

rahulvetal204
Copyright © All Rights Reserved

Excelerate Students

Data Analysis
RIT 1410 AI Team 5A

Overview

1. Introduction
2. Exploratory Data Analysis (EDA)
3. Insight Generation
4. Hypothesis Development
5. Churn Analysis
6. Predictive Modeling
7. Recommendations
8. Conclusion

1. Introduction

1.1: Understanding the Dataset

- The dataset originally had 8,559 rows and 33 columns.
- It covers students who participated in one of the internship programs powered by Excelerate in June 2023.

1.2: Key Challenges in the Dataset

- The dataset had many empty cells.
- Several columns had incorrect data types that could have caused problems later in the analysis.
- The dataset had not yet been analyzed for insights.

2. Exploratory Data Analysis (EDA)

2.1: Initial Data Cleaning

- Converting columns to appropriate data types
  - Dates to datetime64[ns], Zip Code to int, etc.
- Dropping columns that were mostly empty or not useful
- Validating data according to each column's type, e.g.:
  - Ensuring names are alphabetical and at least 3 characters long
  - Ensuring Zip Codes are numeric
  - Removing extra characters such as quotation marks (‘’)
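
A minimal pandas sketch of these cleaning steps follows. The column names ("First Name", "Zip Code", "Date of Birth"), the 50% missing-value cutoff for dropping a column, and the helper names are assumptions for illustration, not the team's actual code.

```python
import pandas as pd

QUOTE_CHARS = "'\"‘’“”"  # straight and curly quotation marks to strip


def _strip_quotes(value):
    """Trim whitespace and stray quotation marks from strings; leave other values alone."""
    if isinstance(value, str):
        return value.strip().strip(QUOTE_CHARS)
    return value


def clean_students(df: pd.DataFrame, max_missing_frac: float = 0.5) -> pd.DataFrame:
    # Assumed column names and cutoff; the deck does not give either.
    df = df.copy()
    # Drop columns whose fraction of missing values exceeds the cutoff.
    df = df.loc[:, df.isna().mean() <= max_missing_frac]
    # Remove stray quotation marks from every string cell.
    for col in df.columns:
        df[col] = df[col].map(_strip_quotes)
    # Dates -> datetime64[ns]; unparseable values become NaT.
    df["Date of Birth"] = pd.to_datetime(df["Date of Birth"], errors="coerce").astype("datetime64[ns]")
    # Zip codes -> nullable integers; non-numeric values become <NA>.
    df["Zip Code"] = pd.to_numeric(df["Zip Code"], errors="coerce").astype("Int64")
    # Keep rows whose first name is alphabetic and at least 3 characters long.
    valid_name = df["First Name"].str.fullmatch(r"[A-Za-z]{3,}", na=False)
    return df[valid_name].reset_index(drop=True)
```

Storing zip codes as integers, as described above, silently drops leading zeros (for example "02139" becomes 2139); keeping them as zero-padded strings is an alternative if that matters.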

2.2: Context-Based EDA

- Ensuring that no column had any empty values
- Converting columns to numeric form for the machine learning models
- Dropping columns with a lot of missing data
- Dropping columns that were irrelevant to model development
- Creating new columns derived from existing ones just before model development
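
The sketch below shows one way to satisfy the "no empty values" and "numeric columns" steps. The imputation rules (column median for numeric columns, the placeholder "Unknown" for text) and the helper names `numerize` and `prepare_for_model` are hypothetical; the deck does not say how the remaining gaps were filled.

```python
import pandas as pd


def numerize(series: pd.Series) -> pd.Series:
    """Convert a text column to numbers only if every non-missing value parses."""
    converted = pd.to_numeric(series, errors="coerce")
    if converted.notna().sum() == series.notna().sum():
        return converted
    return series


def prepare_for_model(df: pd.DataFrame, drop_cols=()) -> pd.DataFrame:
    # Which columns to drop and how to impute are assumptions for illustration.
    df = df.drop(columns=list(drop_cols)).copy()
    for col in df.columns:
        # Only text columns are candidates for numeric conversion.
        if pd.api.types.is_object_dtype(df[col]) or pd.api.types.is_string_dtype(df[col]):
            df[col] = numerize(df[col])
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna("Unknown")
    # Fail loudly if any empty value survived.
    if df.isna().any().any():
        raise ValueError("empty values remain after imputation")
    return df
```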

2.3: Feature Engineering

2.3.1: Creating New Features

- Creating a new column “Age of Learner” from “Date of Birth”
- Creating a new column “Engagement Duration” from the time between “Apply Date” and “Starting Date”
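
A sketch of the two derived columns, assuming the source columns are named as in the bullets above and that ages are measured as of a chosen reference date (the deck does not state one, so `reference_date` is a hypothetical parameter).

```python
import pandas as pd


def add_learner_features(df: pd.DataFrame, reference_date: str) -> pd.DataFrame:
    # reference_date (the "as of" date for ages) is an assumption.
    df = df.copy()
    ref = pd.Timestamp(reference_date)
    dob = pd.to_datetime(df["Date of Birth"])
    # Completed years: subtract 1 if the birthday has not yet occurred in the reference year.
    birthday_passed = (dob.dt.month < ref.month) | (
        (dob.dt.month == ref.month) & (dob.dt.day <= ref.day)
    )
    df["Age of Learner"] = ref.year - dob.dt.year - (~birthday_passed).astype(int)
    # Engagement Duration: whole days from "Apply Date" to "Starting Date".
    apply_date = pd.to_datetime(df["Apply Date"])
    start_date = pd.to_datetime(df["Starting Date"])
    df["Engagement Duration"] = (start_date - apply_date).dt.days
    return df
```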

2.3.2: Data Normalization

- Applying FeatureHasher to first and last names, location, institution name, and current major.
- Applying MinMaxScaler() to 'Engagement Score', 'Opportunity Participation Count', 'Days Since Last Engagement', 'Age of Learner', and 'Engagement Duration'.
- Applying one-hot encoding to the remaining columns.

2.3.3: Other Feature Engineering Processes

- Extracting date-based features, i.e., SignUp Month and Year
- Calculating “Time in Opportunity” in days from the Opportunity Start and End dates
- Calculating Engagement Score from “Time in Opportunity” and “Age of Learner”
- Adding a new column “Time Since Last Engagement”
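
A pandas sketch of the date-derived columns. The column names "Learner SignUp DateTime", "Opportunity Start Date", "Opportunity End Date", and "Last Engagement Date", as well as the reference date, are hypothetical. The deck does not give the Engagement Score formula, so it is left out here.

```python
import pandas as pd


def add_date_features(df: pd.DataFrame, reference_date: str) -> pd.DataFrame:
    # Hypothetical column names and reference date, for illustration only.
    df = df.copy()
    signup = pd.to_datetime(df["Learner SignUp DateTime"])
    df["SignUp Month"] = signup.dt.month
    df["SignUp Year"] = signup.dt.year
    # Time in Opportunity: whole days between the opportunity's start and end dates.
    start = pd.to_datetime(df["Opportunity Start Date"])
    end = pd.to_datetime(df["Opportunity End Date"])
    df["Time in Opportunity"] = (end - start).dt.days
    # Time Since Last Engagement: days from the last engagement to the reference date.
    last = pd.to_datetime(df["Last Engagement Date"])
    df["Time Since Last Engagement"] = (pd.Timestamp(reference_date) - last).dt.days
    return df
```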

3. Insight Generation

3.1: Simple Insight Generation

3.2: Advanced Insight Generation

4. Hypothesis Development

4.1: Correlation Analysis

- Hypothesis: Engagement scores decline as the time since signup increases.
- Correlation of 0.33: positive, whereas the hypothesis predicts a negative relationship
- Does not support the hypothesis
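
The check reduces to the sign of the correlation: a decline would show up as a negative coefficient, so a positive value such as 0.33 cannot support it. A minimal sketch, assuming Pearson correlation (the deck does not name the method) and ignoring significance testing; `correlation_verdict` is a hypothetical helper.

```python
import numpy as np


def correlation_verdict(days_since_signup, engagement_scores):
    """Pearson r between time since signup and engagement, plus whether its sign fits the hypothesis."""
    r = float(np.corrcoef(days_since_signup, engagement_scores)[0, 1])
    # The hypothesis predicts that engagement declines over time, i.e. r < 0.
    return r, r < 0
```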

4.2: Cohort Analysis

- Hypothesis: Students who are highly engaged in the first 30 days are more likely to remain active long-term.
- Supports the hypothesis.
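
One way to run this cohort comparison is to split students by early engagement and compare long-term retention rates. In the sketch below the column names, the median split between "high" and "low" cohorts, and the boolean long-term activity flag are all assumptions.

```python
import pandas as pd


def retention_by_early_engagement(df: pd.DataFrame,
                                  early_col: str = "First 30 Days Engagement",
                                  active_col: str = "Active Long Term") -> pd.Series:
    # Hypothetical columns: engagement in the first 30 days and a boolean long-term activity flag.
    # Students at or above the median early engagement form the "high" cohort (an assumed split).
    is_high = df[early_col] >= df[early_col].median()
    cohort = is_high.map({True: "high", False: "low"}).rename("early engagement cohort")
    # Share of each cohort that remained active long-term.
    return df.groupby(cohort)[active_col].mean()
```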

4.3: T-Test for Age Groups

- Hypothesis: Older students have higher engagement scores than younger students.
- T-statistic: -0.1182, p-value: 0.9059
- Does not support the hypothesis (the p-value is far above the usual 0.05 significance level)
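
A SciPy sketch of a two-sample t-test between age groups. The split age and the choice of Welch's test (`equal_var=False`) are assumptions. The reported p-value of 0.9059 for t = -0.1182 appears consistent with a two-sided test, which is what `ttest_ind` computes by default.

```python
import numpy as np
from scipy import stats


def age_group_ttest(ages, scores, split_age):
    # split_age (the boundary between "younger" and "older") is an assumption.
    ages = np.asarray(ages)
    scores = np.asarray(scores, dtype=float)
    older = scores[ages >= split_age]
    younger = scores[ages < split_age]
    # Welch's two-sample t-test (no equal-variance assumption), two-sided.
    result = stats.ttest_ind(older, younger, equal_var=False)
    return float(result.statistic), float(result.pvalue)
```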

4.4: Regression Analysis

- Hypothesis: Students who have engaged recently have higher overall engagement scores.
- The upward slope of the regression line indicates support for the hypothesis.
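
A sketch using `scipy.stats.linregress` to fit the line and test whether its slope differs from zero. Which sign supports the hypothesis depends on how the x-axis is defined, which the slide does not state: if x is days since last engagement, support requires a negative slope; if x is a recency score where larger means more recent, a positive slope. The function name and inputs are hypothetical.

```python
from scipy import stats


def recency_regression(x, engagement_scores):
    # x is a recency measure whose exact definition is an assumption.
    fit = stats.linregress(x, engagement_scores)
    # pvalue tests the null hypothesis that the slope is zero.
    return fit.slope, fit.intercept, fit.pvalue
```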

4.5: Chi-Square Test

- Hypothesis: Students who sign up early in the year have higher long-term engagement.
- χ²: 127.542, p-value: ≈ 5.47 × 10⁻²²
- Supports the hypothesis
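
A sketch of the chi-square test of independence on a 2×2 contingency table (signup timing vs. long-term engagement). Treating January to June as "early in the year" and the column names are assumptions. Note that `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def signup_timing_chi2(df: pd.DataFrame,
                       month_col: str = "SignUp Month",
                       engaged_col: str = "Long-Term Engaged"):
    # Assumption: "early in the year" means a signup month from January to June.
    early = (df[month_col] <= 6).rename("signed up early")
    # 2x2 contingency table: signup timing vs. long-term engagement.
    table = pd.crosstab(early, df[engaged_col])
    chi2, p, dof, _expected = chi2_contingency(table)
    return float(chi2), float(p), int(dof), table
```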

5. Churn Analysis

- Performed churn analysis using three columns:
  - Engagement Score: below 2.112069
  - Days Since Last Engagement: more than 126
  - Opportunity Participation Count: less than 2
- Each threshold marks the least-engaged 25% of values in its column (the lowest quartile for score and count, the highest quartile for days since last engagement)
- 878 of 2,195 students churned, a churn rate of 40%
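
The thresholds are quartile cutoffs, and the headline figure checks out: 878 / 2,195 = 0.40. The pandas sketch below reproduces the flagging logic under one stated assumption: a student is counted as churned if any of the three conditions holds, since the deck does not say whether the rules were combined with OR or AND. `flag_churn` is a hypothetical helper.

```python
import pandas as pd


def flag_churn(df: pd.DataFrame, quantile: float = 0.25):
    # Assumption: churned if ANY condition holds (the deck does not say OR vs. AND).
    score_cut = df["Engagement Score"].quantile(quantile)
    days_cut = df["Days Since Last Engagement"].quantile(1 - quantile)
    count_cut = df["Opportunity Participation Count"].quantile(quantile)
    churned = (
        (df["Engagement Score"] < score_cut)
        | (df["Days Since Last Engagement"] > days_cut)
        | (df["Opportunity Participation Count"] < count_cut)
    )
    thresholds = {"score": score_cut, "days": days_cut, "count": count_cut}
    return churned, thresholds
```

Usage: `churned, cuts = flag_churn(df)` followed by `churned.mean()` gives the churn rate.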

6. Predictive Modeling

6.1: Machine Learning Models

The following machine learning models were used:
- K-Nearest Neighbors (metric: Euclidean)
- Random Forest
- Logistic Regression
- Decision Trees
- Support Vector Machines (SVMs) with a linear kernel
- Library used: Scikit-Learn
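
A scikit-learn sketch that trains the five listed classifiers and compares test accuracy. It uses synthetic data from `make_classification` in place of the real features, and apart from the Euclidean KNN metric and the linear SVM kernel named on the slide, the settings are illustrative choices rather than the team's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared churn features (the real data is not reproduced here).
X, y = make_classification(n_samples=400, n_features=10, class_sep=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "K-Nearest Neighbors": KNeighborsClassifier(metric="euclidean"),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Linear SVM": SVC(kernel="linear"),
}

# Fit each model on the training split and score it on the held-out test split.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
```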

6.2: Deep Learning Models

- Artificial Neural Network (ANN)
- Recurrent Neural Network (RNN)
- Autoencoder
- Multilayer Perceptron (MLP)
- Libraries used: TensorFlow, Keras
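
The team built these networks with TensorFlow and Keras. To keep the example light, the sketch below swaps in scikit-learn's `MLPClassifier` to show the MLP idea on synthetic data; the layer sizes and regularization strength are hypothetical, and this is not the team's architecture.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the prepared churn features.
X, y = make_classification(n_samples=400, n_features=10, class_sep=2.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Two hidden layers (32 and 16 units, ReLU by default). MLPClassifier has no
# dropout layer, so its L2 penalty (alpha) serves as the regularizer here.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), alpha=1e-3, max_iter=1000, random_state=1),
)
mlp.fit(X_train, y_train)
test_accuracy = mlp.score(X_test, y_test)
```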

6.3: Evaluation

- Evaluated the models using the confusion matrix, accuracy, precision, recall, and F1-score.
- Applied Dropout layers in the deep learning models to reduce overfitting.
- Evaluated the models on held-out test data to measure their accuracy.
- All models reached an accuracy of 1.0, which we attributed to thorough data cleaning and validation.
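
The first function below computes the listed metrics with scikit-learn. The second part is a hedged sanity check on the 1.0 accuracy: the churn label was defined by thresholds on Engagement Score, Days Since Last Engagement, and Opportunity Participation Count, so if those same columns were also model inputs, a sufficiently flexible model could rediscover the rule and score perfectly regardless of cleaning quality. The synthetic integer data and threshold rule are assumptions chosen to mimic that setup.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate(y_true, y_pred):
    """Confusion matrix plus accuracy, precision, recall, and F1 for a binary churn label."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    return {
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Leakage illustration on synthetic integer features: the label is computed from
# the same three columns the model receives, mimicking a threshold-based churn rule.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(1000, 3))
y = ((X[:, 0] < 1) | (X[:, 1] > 2) | (X[:, 2] < 1)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
leaky_report = evaluate(y_test, tree.predict(X_test))
```

Retraining without the three label-defining columns and comparing scores would show whether a perfect accuracy reflects genuine predictive signal.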

7. Recommendations

- Personalized Re-Engagement Alerts: Notify students with low engagement or long inactivity to prompt them to re-engage.
- Targeted Course Suggestions: Recommend courses and opportunities based on each student's interests and participation history.
- Incentivize Participation: Encourage involvement by offering rewards for completing activities or courses.

8. Conclusion

- Cleaned and analyzed the data thoroughly, using exploratory data analysis and statistical tests (correlation, cohort analysis, t-test, regression, and chi-square).
- Developed machine learning and deep learning models to predict churn behavior.
- Based on the analysis, recommended steps Excelerate can take to reduce student churn.

Thank You
