Great Step Data Abstract

1. The team analyzed safety data and evaluated several machine learning models for classification, including Naive Bayes, Decision Tree, and Support Vector Machines (SVMs) with different kernels.
2. The best performing models were Decision Tree with 95.06% accuracy and SVM with a polynomial kernel achieving 92.93% accuracy.
3. To further improve the SVM model, the team tuned hyperparameters like cost, epsilon, degree, and kernel type and found the polynomial kernel with specified values for these hyperparameters achieved the best accuracy.

GREAT STEP – SAFETY DATA ANALYTICS ABSTRACT SUBMISSION

TEAM MATES:

SRI CHANDRA DUDDU – 14AG36001

SRICHANDRA CHILAPPAGARI – 13EC35014

Abstract Submission:

1. We reviewed the predictor variables and dropped ‘Id’ and ‘Phone number’, since
they are unique to each customer and carry no predictive information. This is
also evident from the variable-importance plot produced by the randomForest
package in R.
2. In the same importance plot, ‘Area Code’ is the least important variable, with
< 5% importance.
3. One-hot encoding the categorical variable ‘State’ decreased the accuracy, so we
dropped this variable as well.
4. There are no missing values in the data. We performed stratified sampling with
‘createDataPartition’ and divided the dataset into train and test sets with a
70:30 split.
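The split above was done in R with caret's createDataPartition. As an illustrative sketch only, the same stratified 70:30 split can be written with scikit-learn's train_test_split; the toy frame and the column name ‘churn’ below are assumptions standing in for the real data, not the actual schema.

```python
# Illustrative Python equivalent of caret::createDataPartition (the original
# work was in R). The data frame and column names here are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the data; the real set has no missing values.
df = pd.DataFrame({
    "account_length": range(100),
    "churn": [False] * 85 + [True] * 15,   # imbalanced classes, as in our data
})

X = df.drop(columns="churn")
y = df["churn"]

# stratify=y preserves the class ratio in both splits, which is what a
# stratified createDataPartition split does.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # 70 30
```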

RESULTS-

1. For Naive Bayes:

Reference
Prediction False True
False 1225 116
True 62 96

Accuracy: 88.12 %
Precision: 91.13 %
Recall: 95.2 %
2. For Decision Tree:

Reference
Prediction False True
False 1269 56
True 18 156

Accuracy: 95.06 %
Precision: 95.8 %
Recall: 98.6 %

3. For SVM – radial kernel:

Reference
Prediction False True
False 1275 117
True 12 95

Accuracy: 91.39 %
Precision: 91.6 %
Recall: 99.1 %

4. For SVM – polynomial kernel:

Reference
Prediction False True
False 1280 123
True 7 89

Accuracy: 91.32 %
Precision: 91.2 %
Recall: 99.5 %
5. For SVM – Linear kernel:

Reference
Prediction False True
False 1287 212
True 0 0

Accuracy: 85.85 %
Precision: 85.9 %
Recall: 100 %

6. For SVM – sigmoid kernel:

Reference
Prediction False True
False 1195 190
True 92 22

Accuracy: 81.18 %
Precision: 86.3 %
Recall: 92.9 %
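The accuracy, precision, and recall figures above follow directly from each confusion matrix when ‘False’ is treated as the positive class (predictions in rows, reference in columns). A small sketch that reproduces the Decision Tree numbers:

```python
# Recompute accuracy/precision/recall from a 2x2 confusion matrix, with
# 'False' as the positive class, matching the tables above.

def metrics(tp, fn, fp, tn):
    """tp: predicted False, actually False; fn: predicted True, actually False;
    fp: predicted False, actually True; tn: predicted True, actually True."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)   # of rows predicted False, how many were False
    recall = tp / (tp + fn)      # of actual False cases, how many we caught
    return accuracy, precision, recall

# Decision Tree table: rows (1269, 56) and (18, 156)
acc, prec, rec = metrics(tp=1269, fn=18, fp=56, tn=156)
print(f"{acc:.2%} {prec:.2%} {rec:.2%}")  # 95.06% 95.77% 98.60%
```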

There are no parameters to be tuned in rpart and Naive Bayes. To improve the
accuracy of the support vector classifier, we needed to select the best
parameters for the model. We trained many models over pairs of ϵ (epsilon) and
cost values, and chose the best one based on the root-mean-square error (RMSE).
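The search above was run in R with e1071. A comparable sketch with scikit-learn's GridSearchCV is shown below; the grid values, the synthetic data, and the choice of cross-validated accuracy as the selection score are illustrative assumptions, not the grid or criterion actually used (epsilon belongs to e1071's interface and has no counterpart in sklearn's SVC).

```python
# Illustrative hyperparameter grid search for a polynomial-kernel SVM.
# The original tuning used e1071 in R; the grid below is a placeholder.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Imbalanced toy data standing in for the real training set.
X, y = make_classification(n_samples=300, n_features=10, weights=[0.85],
                           random_state=0)

grid = {
    "C": [1, 4, 16],          # 'cost' in e1071
    "degree": [2, 3],
    "kernel": ["poly"],
}
search = GridSearchCV(SVC(gamma="scale"), grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```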

For SVM:

Best possible accuracy: 92.93%

Gamma: 0.0556
Cost: 16
Epsilon: 0
Degree: 3
Kernel: polynomial
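For reference, a model with the tuned values above could be written in scikit-learn roughly as follows; this is a sketch on synthetic data, not the fit we actually ran in R, and epsilon (an e1071 parameter) has no counterpart in SVC for classification.

```python
# Sketch of the tuned polynomial-kernel SVM in scikit-learn terms
# (the original was fit in R with e1071; e1071's 'cost' maps to C).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic placeholder data; the real fit used our training split.
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

clf = SVC(kernel="poly", degree=3, C=16, gamma=0.0556)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```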

In the figure below, dark-blue regions represent the SVM models with lower RMSE
values: the darker the region, the lower the model's RMSE.
