Generative AI Binary Classification
2 Description
The data used here is the Pima Indians Diabetes Dataset, downloaded from:
https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv
It is a binary (2-class) classification problem. There are 768 observations with 8 input variables
and 1 output variable.
The variable names are as follows:
1. Number of times pregnant.
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test.
3. Diastolic blood pressure (mm Hg).
4. Triceps skinfold thickness (mm).
5. 2-Hour serum insulin (mu U/ml).
6. Body mass index (weight in kg/(height in m)^2).
7. Diabetes pedigree function.
8. Age (years).
9. Class variable (0 or 1).
[2]: # load data
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv'
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 767 entries, 0 to 766
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   6       767 non-null    int64
 1   148     767 non-null    int64
 2   72      767 non-null    int64
 3   35      767 non-null    int64
 4   0       767 non-null    int64
 5   33.6    767 non-null    float64
 6   0.627   767 non-null    float64
 7   50      767 non-null    int64
 8   1       767 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
   6  148  72  35    0  33.6  0.627  50  1
0  1   85  66  29    0  26.6  0.351  31  0
1  8  183  64   0    0  23.3  0.672  32  1
2  1   89  66  23   94  28.1  0.167  21  0
3  0  137  40  35  168  43.1  2.288  33  1
4  5  116  74   0    0  25.6  0.201  30  0
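The column labels in the output above (6, 148, 72, ...) are the first data record: the CSV has no header row, so a default read uses that record as the header and leaves 767 rows instead of 768. A minimal sketch of one way to keep the first record, assuming `pandas.read_csv` with `header=None`; the column names and the `load_pima` helper are hypothetical and chosen only for readability:

```python
import io
import pandas as pd

# Hypothetical column names; the CSV itself carries no header.
COLUMNS = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
           "insulin", "bmi", "pedigree", "age", "outcome"]

def load_pima(source):
    """Read the headerless CSV so the first record stays in the data."""
    return pd.read_csv(source, header=None, names=COLUMNS)

# The first two records, taken from the output above, stand in for the URL.
sample = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)
df = load_pima(sample)
print(df.shape)
```

Passing the dataset URL instead of the in-memory buffer should then yield all 768 observations described earlier.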
StandardScaler: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
[3]:          0         1         2         3         4         5         6         7
0    -0.843726 -1.122086 -0.160249  0.532023 -0.693559 -0.683729 -0.364265 -0.188940
1     1.234240  1.944476 -0.263578 -1.286882 -0.693559 -1.102301  0.604701 -0.103795
2    -0.843726 -0.996920 -0.160249  0.155698  0.122357 -0.493469 -0.919684 -1.040393
3    -1.140579  0.505069 -1.503534  0.908349  0.764674  1.409132  5.482732 -0.018650
4     0.343683 -0.152051  0.253070 -1.286882 -0.693559 -0.810569 -0.817052 -0.274086
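As a small illustration of what the standardized values above represent, the sketch below applies `StandardScaler` to the first four rows of two columns from the earlier output and checks the result against the formula (x - mean) / std computed by hand. The tiny sample is only for illustration; the notebook's own scaling cell is not shown.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First four rows of the first two columns from the raw data shown earlier.
X = np.array([[1.0, 85.0], [8.0, 183.0], [1.0, 89.0], [0.0, 137.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Same computation by hand: subtract the column mean and divide by the
# population standard deviation (ddof=0), which is what StandardScaler uses.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))
```

After scaling, each column has mean 0 and unit variance, so features measured on very different scales contribute comparably to the network's inputs.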
X_data: (767, 8)
Y_data: (767,)
X_train: (575, 8)
y_train: (575,)
X_test: (192, 8)
y_test: (192,)
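The shapes above are consistent with a 75/25 split of the 767 rows. A minimal sketch, assuming scikit-learn's `train_test_split` with `test_size=0.25` (its default); the actual arguments used in the notebook are not shown, the arrays are placeholders, and `random_state=0` is an arbitrary choice for repeatability:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shapes as X_data and Y_data.
X_data = np.zeros((767, 8))
Y_data = np.zeros(767, dtype=int)

# test_size=0.25 is an assumption that reproduces the reported shapes.
X_train, X_test, y_train, y_test = train_test_split(
    X_data, Y_data, test_size=0.25, random_state=0
)
print(X_train.shape, X_test.shape)
```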
[ ]: # declaring model
basic_model = Sequential()
# First layer: 8 neurons/perceptrons that take the input and use the 'sigmoid' activation function.
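The rest of the model definition is not shown here. To make concrete what a stack of Dense layers with sigmoid activations computes, the following is a plain NumPy forward pass, not the notebook's Keras code. The first layer width (8) comes from the comment above and the single output unit from the prediction shape reported later; the middle width of 4, the random weights, and the helper names are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Apply each (weights, bias) pair followed by a sigmoid."""
    a = x
    for W, b in layers:
        a = sigmoid(a @ W + b)
    return a

rng = np.random.default_rng(0)
sizes = [8, 8, 4, 1]  # input width, then three layer widths (4 is assumed)
layers = [
    (rng.normal(size=(m, n)), np.zeros(n))
    for m, n in zip(sizes[:-1], sizes[1:])
]

x = rng.normal(size=(5, 8))  # five random stand-ins for standardized samples
probs = forward(x, layers)
print(probs.shape)
```

Because the last layer also ends in a sigmoid, every output lies strictly between 0 and 1 and can be read as a class-1 probability.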
[14]: # Plot accuracy vs epochs (DIY)
epochRange = range(1, epochs + 1)
plt.plot(epochRange, history.history['accuracy'])
plt.plot(epochRange, history.history['val_accuracy'])
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.grid()
plt.xlim((1, epochs))
plt.legend(['Train', 'Test'])
plt.show()
[15]: # Evaluate loss and accuracy on the test set
loss_and_metrics = basic_model.evaluate(X_test, y_test)
print('Loss = ', loss_and_metrics[0])
print('Accuracy = ', loss_and_metrics[1])
Name: 1, dtype: int64
[[0.71106774]
[0.30598542]
[0.7271708 ]
[0.34965125]
[0.25360548]]
[1, 0, 1, 0, 0]
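The class labels above follow from the predicted probabilities by thresholding. A minimal sketch, assuming a cut-off of 0.5 (the notebook's own thresholding code is not shown):

```python
import numpy as np

# Predicted probabilities copied from the output above.
probs = np.array([
    [0.71106774],
    [0.30598542],
    [0.7271708],
    [0.34965125],
    [0.25360548],
])

# 0.5 is an assumed cut-off: above it predict class 1, otherwise class 0.
labels = (probs.ravel() > 0.5).astype(int).tolist()
print(labels)  # [1, 0, 1, 0, 0]
```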
7 Conclusion
An artificial neural network with 3 Dense layers and sigmoid activation
functions was built for a binary classification task and tested on the Pima
Indians Diabetes dataset. The model reached an accuracy of 79%.