
Rplots

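The document contains plots from a diabetes prediction analysis. A random forest was tuned over mtry, splitrule (gini or extratrees), and min.node.size, with tuning AUPR values between roughly 0.65 and 0.70; extratrees appears here as a splitrule setting of the random forest, not as a separate algorithm. A regularized classification model (glmnet) was fit with alpha = 0 and lambda = 0.0168. Two prediction-density plots report AUPR = 0.72 with AUROC = 0.85, and AUPR = 0.7 with AUROC = 0.88. The top features by random forest importance were plasma glucose, age, pregnancies, and pedigree; the regularized model's coefficient plot lists plasma glucose, pregnancies, and pedigree first.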

Uploaded by

veyeho1025

Figure: Density of predicted probability of diabetes, by actual diabetes status (N, Y). x-axis: Predicted Probability of diabetes (0.00 to 1.00); y-axis: density. AUPR = 0.72, AUROC = 0.85. A second copy of the plot marks the Predicted N and Predicted Y regions.
Figure: Percent of Observations Missing, by variable (x-axis: 0% to 50%). Variables in plotted order: insulin, skinfold, diastolic_bp, weight_class, plasma_glucose, pregnancies, pedigree, patient_id, diabetes, age.
Figure: Random Forest hyperparameter tuning. Panels: mtry (axis ticks at 3 and 6), splitrule (extratrees, gini), min.node.size (axis ticks at 5, 10, 15, 20). x-axis: AUPR (0.65 to 0.70).
Figure: Coefficients for regularized classification model of diabetes. x-axis: Coefficient Estimate (−1 to 1). Terms in plotted order: plasma_glucose, pregnancies, pedigree, weight_class_morbidly.obese (vs. obese), age, skinfold, insulin, diastolic_bp, weight_class_missing (vs. obese), weight_class_overweight (vs. obese), weight_class_normal (vs. obese), weight_class_underweight (vs. obese). Hyperparameter values: alpha = 0 and lambda = 0.0168.
Figure: Random Forest variable importance. x-axis: Relative Importance (0 to 100). Variables in plotted order: plasma_glucose, age, pregnancies, pedigree, insulin, skinfold, diastolic_bp, weight_class_normal, weight_class_overweight, weight_class_morbidly.obese, weight_class_missing, weight_class_underweight.
Figure: predicted_diabetes (y-axis, 0.0 to 0.8) against plasma_glucose (x-axis, 75 to 175), grouped by weight_class (morbidly obese, obese, NA, overweight, normal). Predictions made by: glmnet.
Figure: Density of predicted probability of diabetes, by actual diabetes status (N, Y), with risk groups labeled low, moderate, high, and extreme. x-axis: Predicted Probability of diabetes (0.00 to 1.00); y-axis: density (0.0 to 2.5). AUPR = 0.7, AUROC = 0.88.
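
The AUROC and AUPR values reported with the prediction plots are both computed from the actual labels and the predicted probabilities. The following is a minimal sketch in Python of one common way to compute each, not the code that produced these plots. The labels and scores are made-up toy values, the function names are hypothetical, and average precision is only one of several estimators of the area under the precision-recall curve, so the plotting package may use a different one.

```python
def auroc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumes no tied scores; precision is averaged over the ranks at which
    a positive case is found.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical toy data: 1 = diabetes (Y), 0 = no diabetes (N).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.35, 0.40, 0.60, 0.80, 0.90]

print("AUROC:", round(auroc(labels, scores), 3))
print("AUPR (average precision):", round(average_precision(labels, scores), 3))
```

AUROC depends only on how positives rank against negatives, while AUPR also depends on the share of positive cases, so the two can order models differently, as they do for the two pairs of values reported here.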
