The internship report details a project focused on predicting the conversion of free subscription users to paid subscribers using the Random Forest algorithm. The study aims to improve conversion rates by analyzing user data, identifying key factors influencing conversion, and enhancing user engagement strategies. The report includes sections on data collection, analysis, methodology, and the advantages of using Random Forest over existing predictive models.


INNOVATION ON PYTHON, MACHINE LEARNING AND AI

VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELAGAVI

INTERNSHIP REPORT
ON
“INNOVATION ON PYTHON, MACHINE LEARNING AND AI”
Submitted in the partial fulfilment for the award of degree (21****)

BACHELOR OF ENGINEERING

COMPUTER SCIENCE AND ENGINEERING

Submitted by

SARUMATHI SREE S (1SP21CS093)

UNDER THE GUIDANCE OF

PROF JAYAKUMAR B L
DEPT OF CSE

Department of Computer Science and Engineering

S.E.A COLLEGE OF ENGINEERING AND TECHNOLOGY
BENGALURU-560049

2023-2024

DEPT OF CSE SEACET 2023- 2024 Page 1

S.E.A COLLEGE OF ENGINEERING AND TECHNOLOGY

Ekta Nagar, Basavanapura, K.R.Puram, Bangalore, Karnataka, 560059
Department of Computer Science and Engineering

CERTIFICATE

This is to certify that the Internship entitled "INNOVATION ON PYTHON,
MACHINE LEARNING AND AI" has been successfully carried out by Indoskill Team,
Aqmenz Automation Pvt Ltd, in partial fulfilment for the award of Bachelor of Engineering
in Computer Science under Visvesvaraya Technological University, Belgaum during the
year 2023-2024 under my supervision. It is certified that all corrections/suggestions
indicated have been incorporated in the report. The project report has been approved as it
satisfies the academic requirements in respect of Internship prescribed for the course
Internship / Professional Practice (21INT49).

__________ _ _______________

SIGNATURE OF GUIDE SIGNATURE OF HOD SIGNATURE OF PRINCIPAL

CERTIFICATION:

ABSTRACT

The prediction model will help the company improve its conversion rate by
identifying potential users who are likely to convert to paid subscriptions. The model can also
be used to identify factors that are important in predicting conversion and can help the company
to focus on those factors to improve its user engagement and retention strategies.

ACKNOWLEDGEMENT

This Internship is a result of the accumulated guidance, direction and support of several
important persons. We take this opportunity to express our gratitude to all who have helped us
to complete the Internship.

We express our sincere thanks to our Principal, for providing us adequate facilities to
undertake this Internship.

We would like to thank our Head of Department, Computer Science, for providing us an
opportunity to carry out the Internship and for his valuable guidance and support.

We express our deep and profound gratitude to our guide, Guide name,
Assistant/Associate Prof, for her keen interest and encouragement at every step in completing
the Internship.

We would like to thank all the faculty members of our department for the support
extended during the course of the Internship.

We would like to thank the non-teaching members of our department, for helping us during the
Internship.

Last but not least, we would like to thank our parents and friends, without whose
constant help the completion of the Internship would not have been possible.

DECLARATION

I, Sarumathi Sree.S, 3rd year student of Computer Science, SEA College Of Engineering &
Technology, declare that the Internship has been successfully completed, in
“ENTREPRENEURSHIP” under Indoskill platform conducted by Aqmenz Automation
Private Limited Technology. This report is submitted in partial fulfilment of the requirements
for the award of the Bachelor's Degree in Computer Science Engineering, during the academic year
2023-2024.

Date: 07/11/2023

Place: Bangalore

USN: 1SP21CS093

Name: Sarumathi Sree.S

TABLE OF CONTENTS:

1. INTRODUCTION

2. COMPANY PROFILE

3. TOOLS EXPOSED

4. DATA COLLECTION AND PREPARATION

5. EXPLORATORY DATA ANALYSIS

6. METHODOLOGY

7. CODING

8. TESTING

9. CONCLUSION

10. REFERENCES

1. INTRODUCTION:

The global pandemic has drastically changed the way students learn worldwide,
and online learning has become widespread. Students from around the world have
suddenly shifted from classroom learning to online learning.

The COVID-19 pandemic has made it essential for educational institutions to provide their
students with a user-friendly online learning platform to sustain quality learning. An online
learning platform is a digital space that allows course creators to market, sell, and deliver their
eLearning courses. eLearning platforms like 365DataScience, Udemy Business, Coursera, and
Skillshare offer a wide range of features, including assignments, quizzes, learning
interactions, and completion certificates.

Subscription-based business models have become increasingly popular in recent years,
offering customers access to products and services in exchange for a recurring fee. In this
business model, it is crucial to identify which free plan subscribers are likely to convert to paid
subscribers, as this can significantly impact the company's revenue.

The goal of this study is to evaluate the effectiveness of the Random Forest algorithm
in predicting the conversion of free plan subscribers to paid subscribers. The results of this
study have implications for subscription-based businesses, as they can use the findings to target
their marketing efforts and improve the conversion rate of free plan subscribers to paid
subscribers.

the students and employees to meet the mandatory necessities of future human
resources and skill demands.
We are in the 4th industrial revolution. The pace of technological change is greater than
ever before, hence continuous awareness of the need to upgrade skills is essential.
Aqmenz Automation Pvt. Ltd. is working to help and enhance the potential of students and
employees, so that future human resources will be beneficial, purposeful and profitable
to the nation.

1.5 Objectives
• AAPL has trust in the Skill India mission and vision, hence our utmost priority is to add skills to
the young generation and make them profitable and productive for the nation.
• We aim to provide Industrial Automation Training Skill module kits to institutions,
universities and college lab facilities at the lowest possible price for the benefit of technical
students.
• Identifying young entrepreneurs, and motivating and training them to establish start-ups that create
employment as well as prosperity for the nation.
• Consultation, sourcing and supplying highly skilled manpower to industry for better
efficiency and productivity.
• Providing low-cost and precise industrial automation solutions.
• Very eager to fetch solution for most complex industrial problems in a mode

1.6 Major Milestones

We have undertaken many industrial projects. Our major clients are BIAL (Bangalore
International Airport Limited), GE (General Electric) and Amics technologies.

1.9 Services offered

• Provides Design & Automation solutions.
• All types of automation projects for companies using PLCs, SCADA and embedded systems.
• We provide robots and robotic solutions to small and medium scale companies.
• Embedded solutions to companies like GE.
• We conduct technical skill-oriented training programs for engineering colleges.
• We also provide robotics and automation lab equipment for colleges.

Number of people working in the company and their responsibilities: there are 20 persons in this
company, including:
• Mohan Shamanna, Chief Executive Officer (CEO)
• Mohammed Azhar Hussain, Chief Technology Officer (CTO)

Ongoing projects
• Automation related projects
• CNC Machines
• Open-source Custom Robots
• Garment Industry slider Project

2.1 STATEMENT OF THE PROBLEM

The problem is to predict whether a user of a free subscription plan will convert to a
paid subscriber or not using machine learning with the Random Forest algorithm. The data
available for analysis includes various user attributes such as demographics, usage patterns,
and activities on the platform. The objective is to build a model that can accurately predict
the conversion of a free user to a paid subscriber. The prediction model will help the company
improve its conversion rate by identifying potential users who are likely to convert to
paid subscriptions. The model can also be used to identify factors that are important in
predicting conversion and can help the company to focus on those factors to improve its user
engagement and retention strategies.

3. SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

The existing system for predicting whether a Free Plan user would convert to a paid
subscriber or not using machine learning techniques involves various approaches. One of the
most common approaches is the use of logistic regression, where the data is modelled using
a logistic function to predict the probability of conversion.
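
For reference, the logistic model mentioned above can be written as follows. The symbols are notation introduced here for illustration and do not come from the report: x is the vector of user features, w the learned weights, and b the bias term.

```latex
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b)
                         = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}}
```

A user is typically predicted to convert when this probability exceeds a chosen threshold, commonly 0.5.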

Support Vector Machines (SVMs) are also commonly used in predicting customer
conversion, where the algorithm tries to find a hyperplane that separates the data into two
classes. SVMs have high accuracy and can handle high-dimensional data, but they may not
be suitable for large datasets due to their high computational complexity.

Random forest is another popular approach that overcomes the limitations of single decision
trees by using an ensemble of trees, where each tree is built on a random subset of the data
and a random subset of the features. This approach reduces overfitting and provides better
predictions.

Overall, the existing system for predicting customer conversion using machine learning
techniques involves a variety of approaches, and the choice of the algorithm depends on the
specific characteristics of the data and the problem at hand.

3.2 LIMITATIONS OF EXISTING SYSTEM

* Limited accuracy
* Slow training and prediction
* High computational complexity

3.4 ADVANTAGES OF PROPOSED SYSTEM

* Improved accuracy: Random Forest is known to be a highly accurate machine learning
algorithm, and it is expected to improve the accuracy of the predictions compared to the
existing system.
* Handling non-linearity: Random Forest can handle non-linear relationships between
variables, which is useful for predicting complex user behavior.
* Feature importance: Random Forest provides a measure of feature importance, which can
be useful in identifying the most important variables that drive user conversion.
* Robustness: Random Forest is robust to noise and outliers in the data, which can be a
common problem in real-world data sets.
* Scalability: Random Forest can handle large and complex data sets, making it suitable for
applications with a large number of users or variables.
* Flexibility: Random Forest can handle both categorical and continuous variables, making it
suitable for a wide range of data sets.
Overall, the proposed system using Random Forest, which reached an accuracy of about 89% in
this study, has the potential to improve the accuracy and robustness of the predictions for
whether a Free Plan user would convert to a paid subscriber or not, leading to better
decision-making and improved business outcomes.
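
The feature-importance advantage listed above can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses synthetic data: the feature names and the labelling rule below are hypothetical stand-ins, not the report's dataset or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Hypothetical engagement features (synthetic, not the report's data).
minutes_watched = rng.exponential(scale=30.0, size=n)
days_engaged = rng.integers(0, 30, size=n)
noise = rng.normal(size=n)  # a feature unrelated to the label
X = np.column_stack([minutes_watched, days_engaged, noise])

# Synthetic label that depends only on minutes watched.
y = (minutes_watched > 30.0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances; they sum to 1 across features.
feature_names = ["minutes_watched", "days_engaged", "noise"]
importances = dict(zip(feature_names, model.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Because the synthetic label is driven by minutes watched alone, that feature should receive the largest importance; on real data the ranking has to be read from the fitted model.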

4. DATA COLLECTION AND PREPARATION

4.1 DATA SOURCES

For this case study, we have chosen the Real-Time data set from the 365 Data Science
learning platform. There are 11 datasets in CSV format. The datasets are heavily
imbalanced. There are many attributes in the datasets, and we need to identify which datasets and
features might contribute to better accuracy.
4.2 DATA PROFILE

Data Types:
• Minutes watched: Numeric (continuous).
• Number of days engaged with the platform: Numeric (continuous).
• Engaging with the quiz: Boolean (categorical).

• Engaged with the exam: Boolean (categorical).

• Engaged with Q/A hub: Boolean (categorical).
• Purchased: Boolean (categorical)

Data Range:
• Minutes watched: 0 and above (no fixed upper bound).
• The number of days engaged with the platform: 0 and above (no fixed upper bound).
• Engaging with the quiz: True (1) or False (0).
• Engaged with the exam: True (1) or False (0).
• Engaged with Q/A hub: True (1) or False (0).
• Purchased: True (1) or False (0).

Data Distribution:

• Minutes watched: The distribution is likely to be skewed to the right, with a few users
watching a lot of minutes and most users watching less.
• Number of days engaged with the platform: The distribution is also likely to be skewed
to the right, with some users engaging with the platform for many days and most users
engaging for fewer days.
• Engaging with the quiz, engaged with the exam, engaged with Q/A hub, and purchasing:
These are categorical variables, so the distribution will be in the form of frequency
counts of True (1) and False (0) values.

Data Quality:

• Missing Values: There are some null values in the dataset, which were filled with mean values.
• Outliers: There are a few outliers in the minutes watched and the number of days
engaged with the platform attributes, as some users may have much higher values than
others.
• Imbalance: The dataset is highly imbalanced, so we used imbalanced-learning or
resampling methods to balance the data.

Data Relationships:

• Minutes watched and the number of days engaged with the platform are likely to be
positively correlated, as users who watch more minutes are likely to engage with the
platform for more days.
• Engaging with quizzes, engaging with exams, and engaging with the Q/A hub are likely
to be correlated with each other, as users who engage with one are more likely to engage
with the others as well.
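
The distribution and relationship claims in this profile can be checked with a few pandas calls. The sketch below uses synthetic data and hypothetical column names (`minutes_watched`, `days_engaged`); it only shows how skewness and correlation might be measured and does not reproduce the report's figures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

# Synthetic stand-ins: days on the platform, and minutes that grow with days
# through a right-skewed (exponential) per-day watch time.
days = rng.integers(1, 60, size=n)
minutes = days * rng.exponential(scale=10.0, size=n)
df = pd.DataFrame({"minutes_watched": minutes, "days_engaged": days})

# Positive skewness indicates a long right tail (a few very heavy users).
skew = df["minutes_watched"].skew()

# Pearson correlation between the two engagement measures.
corr = df["minutes_watched"].corr(df["days_engaged"])

print(f"skewness of minutes watched: {skew:.2f}")
print(f"correlation with days engaged: {corr:.2f}")
```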

4.3 DATA CLEANING AND PREPROCESSING

4.3.1 Data Preparation

Figure. Data Preprocessing (diagram label: Group by () -> Index)

Step 1: Merging all datasets on Student ID, using an outer merge.

Step 2: Filling null values with zero (0).

Step 3: Type casting each column to the required format.
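
The three steps above can be sketched with pandas as follows. The tiny tables and the column names `student_id`, `minutes_watched` and `quiz_engaged` are hypothetical stand-ins for the 11 CSV files, whose actual names and schemas are not given in this report; only `st_purch` appears in the report as the target variable.

```python
from functools import reduce

import pandas as pd

# Three small hypothetical tables standing in for the 11 CSV files.
minutes = pd.DataFrame({"student_id": [1, 2, 3], "minutes_watched": [12.5, 0.0, 48.0]})
quizzes = pd.DataFrame({"student_id": [1, 3], "quiz_engaged": [1, 1]})
purchases = pd.DataFrame({"student_id": [3, 4], "st_purch": [1, 1]})

# Step 1: an outer merge on the student ID keeps every student seen in any table.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="student_id", how="outer"),
    [minutes, quizzes, purchases],
)

# Step 2: fill the nulls introduced by the outer merge with zero.
merged = merged.fillna(0)

# Step 3: cast each column to its required type.
merged = merged.astype(
    {"minutes_watched": "float64", "quiz_engaged": "int64", "st_purch": "int64"}
)
merged = merged.sort_values("student_id").reset_index(drop=True)
print(merged)
```

With real files, each table would presumably be loaded with `pd.read_csv` before the merge.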

5. EXPLORATORY DATA ANALYSIS

5.1 Checking for null values

5.2 Dataset Information

5.3 Dataset Description
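
The checks named in 5.1 to 5.3 are commonly done with three pandas calls. The sketch below is an assumption about how such checks might look, run on a small made-up frame with hypothetical column names rather than on the report's data.

```python
import numpy as np
import pandas as pd

# Small made-up frame with a few missing entries.
df = pd.DataFrame({
    "minutes_watched": [10.0, np.nan, 35.5, 0.0],
    "days_engaged": [1, 3, np.nan, 0],
    "st_purch": [0, 1, 0, 0],
})

null_counts = df.isnull().sum()  # 5.1: number of nulls per column
df.info()                        # 5.2: column dtypes and non-null counts
summary = df.describe()          # 5.3: count, mean, std, min, quartiles, max

print(null_counts)
print(summary)
```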

5.4 Handling Imbalanced Dataset

Handling Imbalanced Data:
To deal with imbalanced data, the resample function from the sklearn.utils module can
be used to resample the data so that the classes are balanced.
There are two main types of resampling methods: oversampling and undersampling.
Oversampling: This method involves increasing the number of instances in the minority
class by repeating instances or generating synthetic instances. The aim is to balance the class
distribution by having an equal number of instances from both classes.
Undersampling: This method involves reducing the number of instances in the majority
class. The aim is to balance the class distribution by having an equal number of instances from
both classes.
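
Both resampling strategies can be sketched with `sklearn.utils.resample`. The frame below is made up for illustration; only the target name `st_purch` comes from the report. Note that `resample` only repeats existing rows when oversampling and does not generate synthetic instances.

```python
import pandas as pd
from sklearn.utils import resample

# Made-up imbalanced data: 8 non-purchasers, 2 purchasers.
df = pd.DataFrame({
    "minutes_watched": [5, 0, 12, 3, 60, 1, 2, 90, 4, 7],
    "st_purch":        [0, 0, 0, 0, 1, 0, 0, 1, 0, 0],
})
majority = df[df["st_purch"] == 0]
minority = df[df["st_purch"] == 1]

# Oversampling: draw minority rows with replacement up to the majority size.
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
oversampled = pd.concat([majority, minority_up])

# Undersampling: draw majority rows without replacement down to the minority size.
majority_down = resample(majority, replace=False, n_samples=len(minority), random_state=0)
undersampled = pd.concat([majority_down, minority])

print(oversampled["st_purch"].value_counts())
print(undersampled["st_purch"].value_counts())
```

A common precaution is to resample only the training split, so that repeated rows do not leak into the test set.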

5.5 DATA VISUALIZATION TECHNIQUES

5.5.1 Checking the value distribution count of the target variable 'st_purch'

5.5.2 Count plot for the target variable value distribution: sns.countplot(x=data.st_purch)

5.5.3 Imbalanced to Balanced Plots

6. METHODOLOGY
Data collection: The first step is to collect relevant data that can be used to build the model.
The data should include information about the free plan subscribers such as their demographics,
usage patterns, and any other relevant features.
Data pre-processing: The collected data must be pre-processed to handle missing values, deal
with outliers, and convert categorical variables into numerical ones. The pre-processing step is
critical for ensuring the data quality and the model’s accuracy.
Feature selection: The next step is to select the features that will be used to build the model.
This step involves identifying the most important features that significantly impact the target
variable (i.e., whether a free plan subscriber would convert to a paid subscriber).
Splitting the data: The data must be split into two parts: training data and testing data. The
training data will be used to build the model, while the testing data will be used to evaluate its
performance.
Model training: The next step is to train the Random Forest model using the training data. The
model will use the selected features to learn the relationship between the features and the target
variable.
Model evaluation: The trained model must be evaluated using the testing data. This step will
provide insights into the model's performance and allow for any necessary adjustments. The
evaluation metrics used to assess the performance of the model may include accuracy, precision,
recall, and F1 score.
Model deployment: If the model's performance is satisfactory, it can be deployed in a real-world
setting to make predictions on new data. The model can be used to predict whether a new free
plan subscriber would convert to a paid subscriber.
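
The split, train and evaluate steps above can be sketched end to end with scikit-learn. The data here is synthetic and the labelling rule is an assumption made purely so the example runs; the split ratio and hyperparameters are likewise illustrative, not the values used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 600

# Synthetic features standing in for the engagement attributes.
X = np.column_stack([
    rng.exponential(scale=30.0, size=n),  # minutes watched
    rng.integers(0, 30, size=n),          # days engaged
    rng.integers(0, 2, size=n),           # engaged with quiz (0/1)
])
# Assumed rule: heavy watchers who also took a quiz convert.
y = ((X[:, 0] > 25.0) & (X[:, 2] == 1)).astype(int)

# Splitting the data: hold out 25% for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Model training on the training split only.
model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(X_train, y_train)

# Model evaluation on the held-out split.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f}")
```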

6.1 DATA MODELS

There are several data models in machine learning that can be used to predict whether
a user of a free subscription plan will convert to a paid subscriber or not.
Logistic Regression: Logistic regression is a commonly used classification algorithm that can
be used to predict the probability of a user converting to a paid subscriber. It works by
modelling the relationship between the independent variables, such as minutes watched, the
number of days engaged with the platform, engagement with the quiz, engagement with the exam,
and engagement with the Q/A hub, and the dependent variable (purchased), expressed as the
probability of a user converting.

Decision Trees: Decision trees are another classification algorithm that can be used to predict
whether a user will convert or not. Decision trees work by recursively splitting the data based
on the most significant features until a certain threshold is reached. These splits are based on
features such as minutes watched, number of days engaged with the platform, engagement
with the quiz, engagement with the exam, and engagement with the Q/A hub.
Random Forests: Random forests are an extension of decision trees and work by aggregating
multiple decision trees to improve the accuracy of the prediction. Each decision tree in the
random forest is trained on a random subset of the data and a random subset of the features.
Support Vector Machines (SVMs): SVMs are powerful classification algorithms that work by
finding the best hyperplane that separates the data into different classes. SVMs can be used to
predict whether a user will convert or not based on their behavior on the platform.
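
One hedged way to compare the four model families described above is cross-validated accuracy on the same data. The sketch below uses a synthetic dataset from `make_classification` and default-like hyperparameters, so the scores it prints say nothing about the report's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data with five features.
X, y = make_classification(
    n_samples=400, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(),
}

# Mean accuracy over 5 stratified folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```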

6.2 Model Building

Figure. Model Building

Data acquisition: The first step in any machine learning project is to acquire the relevant data.
This could involve collecting data from different data sources such as Kaggle and KDnuggets.
It is important to ensure that the data is of high quality and free of errors or biases. In this
project, we have taken datasets from the 365datascience.com website.

Data preparation: Once the data has been acquired, it needs to be cleaned and preprocessed
before it can be used to train a machine learning model. This might involve removing missing
values, handling outliers, normalizing the data, and converting categorical variables into
numerical ones.
Feature extraction: After the data has been cleaned and preprocessed, relevant features need to
be extracted from the data. Feature extraction involves selecting the most relevant variables
that can help the machine learning algorithm learn patterns in the data. This could involve
techniques such as principal component analysis (PCA), feature scaling, or feature selection.
Split the dataset: Once the data has been preprocessed and features have been extracted, the
dataset needs to be split into training and testing sets. The training set is used to train the
machine learning model, while the testing set is used to evaluate the performance of the model.
Training model: With the data split into training and testing sets, the next step is to train the
machine learning model. This could involve using algorithms such as logistic regression,
decision trees, and support vector machines. Random forest is one of the ensemble methods
that can be used to improve the accuracy of the model.
Evaluation of model: After training the machine learning model, it is important to evaluate its
performance on the testing set. This could involve using metrics such as accuracy, precision,
recall and F1 score. It is important to ensure that the model is not overfitting the training data
and is able to generalize to new data.
Data visualization: Data visualization is an important step in any machine learning project. It
involves using visual tools such as scatter plots, histograms, and heatmaps to explore the data
and identify patterns or relationships. Visualization can help in feature selection and
determining the most important features.
Building front-end interface: Finally, after building the machine learning model and evaluating
its performance, it is important to build a user interface that allows end-users to interact with
the model. This could involve building a web application or a mobile app that provides real-
time predictions based on user input.
6.3 MODEL SELECTION AND EVALUATION
6.3.1 Model selection
We selected Random Forest for this project. Random Forest is a powerful machine learning
algorithm that is often used for classification, regression, and feature selection. Random forest
is an ensemble learning method that combines multiple decision trees to improve the
accuracy and robustness of the prediction. Each decision tree is trained on a random subset of
the data and a random subset of the features, and the final prediction is based on the majority
vote of the individual trees. Random forest is a popular choice because it is
robust to noise and overfitting, and it can handle both categorical and continuous variables.
Figure. Random Forest Architecture
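
The majority-vote idea can be made concrete by collecting the predictions of the individual trees. This is a sketch on synthetic data, assuming scikit-learn; note that scikit-learn's `RandomForestClassifier` averages the per-tree class probabilities rather than counting hard votes, so the two can occasionally differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# An odd number of trees avoids tied votes in a two-class problem.
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# One row of 0/1 predictions per tree: shape (n_trees, n_samples).
tree_votes = np.array([tree.predict(X) for tree in forest.estimators_])

# Hard majority vote: class 1 wins when more than half the trees vote for it.
majority_vote = (tree_votes.mean(axis=0) > 0.5).astype(int)

# Fraction of samples where the hard vote matches the forest's own prediction.
agreement = (majority_vote == forest.predict(X)).mean()
print(f"agreement with forest.predict: {agreement:.3f}")
```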

6.3.2 Model evaluation

Random forest is a popular machine learning algorithm that can be used for predicting
whether a user of a free subscription plan will convert to a paid subscriber or not. In order to
evaluate the performance of a random forest model for this problem, we can use a variety of
metrics.
One common evaluation metric for random forest models is accuracy. Accuracy
measures the proportion of correctly classified instances out of all the instances. In our case, it
would measure the proportion of users who were correctly classified as either converting to a
paid subscription or not.

However, accuracy can be misleading in situations where the classes are imbalanced,
meaning one class has significantly more examples than the other. In our case, if the majority
of users do not convert to a paid subscription, accuracy may not be the best metric to evaluate
our model.

So, we also used evaluation metrics suited to binary classification problems like this, including
precision, recall, and F1 score.
Precision measures the proportion of correctly predicted positive instances (i.e., users
who converted to a paid subscription) out of all the instances predicted as positive. This metric
is useful when the cost of false positives is high, meaning we want to minimize the number of
false positives. We obtained a precision of 92.22%.
Recall measures the proportion of correctly predicted positive instances out of all the
actual positive instances. This metric is useful when the cost of false negatives is high, meaning
we want to minimize the number of false negatives. We obtained a recall of 86.35%.
F1 score is the harmonic mean of precision and recall and provides a balance between
the two metrics. It is a useful metric when we want to balance both false positives and false
negatives. We obtained an F1 score of 89%.
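
In terms of the confusion-matrix counts (true positives TP, false positives FP, false negatives FN), the three metrics described above are defined as:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```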

6.4 Result

The confusion matrix is a performance evaluation tool that helps to measure the
accuracy of a classification model. It is a table that is used to evaluate the performance of a
machine-learning algorithm. The confusion matrix is made up of four values: true positive (TP),
true negative (TN), false positive (FP), and false negative (FN). These values help us to
understand how well the model is performing and where it is making errors.

In the above confusion matrix, we have two classes - subscribed and not subscribed.
True negatives are the number of cases where the model correctly predicted that the customer
will not subscribe, and the actual value is also not subscribed. False positives are the cases
where the model predicted that the customer would subscribe, but the actual value is not
subscribed. False negatives are the cases where the model predicted that the customer will not
subscribe, but the actual value is subscribed. True positives are the cases where the model
correctly predicted that the customer would subscribe, and the actual value is also subscribed.

The confusion matrix shows that the model has 6233 true negatives, which means that
the model correctly predicted that 6233 customers will not subscribe, and the actual value is
also not subscribed. The model has 475 false positives, which means that the model predicted
that 475 customers will subscribe but the actual value is not subscribed. The model has 891
false negatives, which means that the model predicted that 891 customers will not subscribe,
but the actual value is subscribed. Finally, the model has 5638 true positives, which means that
the model correctly predicted that 5638 customers will subscribe, and the actual value is also
subscribed.
In this confusion matrix, the labels are as follows:
True Positive (TP): The model predicted that the user would subscribe to a paid plan,
and the user actually did subscribe i.e. 5638.
True Negative (TN): The model predicted that the user would not subscribe to a paid
plan, and the user actually did not subscribe i.e. 6233.
False Positive (FP): The model predicted that the user would subscribe to a paid plan,
but the user actually did not subscribe i.e. 475.
False Negative (FN): The model predicted that the user would not subscribe to a paid
plan, but the user actually did subscribe i.e. 891.
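
As a worked check, the evaluation metrics can be recomputed directly from the four counts listed above. The variable names are illustrative; the counts are the ones reported in this section.

```python
# Confusion-matrix counts as reported above.
tp, tn, fp, fn = 5638, 6233, 475, 891

total = tp + tn + fp + fn
accuracy = (tp + tn) / total          # share of all predictions that are correct
precision = tp / (tp + fp)            # share of predicted subscribers who subscribed
recall = tp / (tp + fn)               # share of actual subscribers who were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"total samples: {total}")
print(f"accuracy:  {accuracy:.2%}")
print(f"precision: {precision:.2%}")
print(f"recall:    {recall:.2%}")
print(f"f1 score:  {f1:.2%}")
```

These values come out at roughly 92.2% precision, 86.4% recall and 89.2% F1, in line with the figures quoted in 6.3.2.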

Figure. Accuracy

The accuracy of different machine learning algorithms can be measured using a
confusion matrix, which is a table that summarizes the number of correct and incorrect
predictions made by the model. In this case, we have considered the performance of six
different algorithms, namely Naive Bayes, SVM, Logistic Regression, KNN, Decision Tree,
and Random Forest, which were evaluated.

The accuracy obtained by each algorithm was Naive Bayes - 86.67%, SVM - 88.17%,
Logistic Regression - 89%, KNN - 93.47%, Decision Tree - 89.33%, and Random Forest -
89.63%. Among these algorithms, KNN performed the best with an accuracy of 93.47%,
followed by Random Forest with an accuracy of 89.63%.

These results indicate that machine learning algorithms can be used effectively to predict
whether a free plan user converts to a paid subscriber or not. They also suggest that KNN and
Random Forest are the most effective algorithms for this task.

However, we selected Random Forest because it can handle larger datasets and, in our
assessment, offers a good balance of accuracy and computational speed compared with the other models.

7. CODING

8. TESTING
Testing is an essential step in the predictive modeling process, and it plays a crucial
role in evaluating the performance of a Random Forest model when predicting whether a user
of a free subscription plan will convert to a paid subscriber or not, using the given dataset
attributes.

To perform testing, we usually split the dataset into two parts: a training set and a testing
set. The training set is used to train the Random Forest model, and the testing set is used to
evaluate the model's performance. In this case, the independent variables are minutes watched,
number of days engaged with the platform, engaging with quiz, engaged with exam, and
engaged with Q/A hub, and the dependent variable is Purchase.

After splitting the dataset, we train the Random Forest model on the training set and use
it to make predictions on the testing set. We can then evaluate the performance of the model
using various evaluation metrics. We measured accuracy, precision, recall, and F1-score.

Accuracy measures the percentage of correctly classified samples out of all samples in the
testing set; we obtained 89%. Precision measures the percentage of true positive predictions
out of all positive predictions; we obtained 92.22%. Recall measures the percentage of true
positive predictions out of all actual positive samples; we obtained 86.35%. The F1-score is
the harmonic mean of precision and recall; the value we recorded was 78%, although the harmonic
mean of the precision and recall above is about 89.2%, so the recorded F1 figure appears to be
inconsistent with them.
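
The four metrics can also be computed directly from the counts of true positives, false
positives, false negatives, and true negatives. The sketch below uses hypothetical counts
(the report does not give its confusion matrix), and the function name classification_metrics
is chosen only for illustration. The last lines take the harmonic mean of the precision and
recall quoted above, which comes to about 0.892.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # correct positives among predicted positives
    recall = tp / (tp + fn)      # correct positives among actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1


# Hypothetical counts, for illustration only.
acc, prec, rec, f1 = classification_metrics(tp=80, fp=10, fn=20, tn=90)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")

# Harmonic mean of the precision and recall quoted in the text.
reported_precision = 0.9222
reported_recall = 0.8635
harmonic = (
    2 * reported_precision * reported_recall
    / (reported_precision + reported_recall)
)
print(f"harmonic mean of quoted precision and recall: {harmonic:.4f}")
```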

By analyzing the results of these evaluation metrics, we determined that the model achieved
89% accuracy in predicting whether a user of a free subscription plan will convert to a paid
subscriber.

9. Conclusion

Machine learning is a technique of training machines to perform the activities a human brain
can do, albeit a bit faster and better than an average human being. It has applications in
nearly every field of study and is already being implemented commercially, because machine
learning can solve problems that are too difficult or time-consuming for humans to solve.
Machine learning is a subfield of artificial intelligence, which is broadly defined as the
capability of a machine to imitate intelligent human behavior. The subject is vast and has a
lot of depth, but each topic can be learned in a few hours. Practicing one topic at a time is
the best way to start studying machine learning.

10. References

https://userpilot.com/blog/churn-prediction/
https://neptune.ai/blog/how-to-implement-customer-churn-prediction
https://analyticsindiamag.com/customer-event-prediction-in-onlinesubscription-products/
https://www.chargebee.com/blog/subscription-revenue-forecasting/
