
Generative AI Binary Classification

The document describes building an artificial neural network using the Pima Indians Diabetes dataset to perform binary classification. It involves preprocessing the data, designing a model with 3 dense layers and sigmoid activations, training the model for 120 epochs, evaluating performance by plotting loss and accuracy curves, and achieving 79% accuracy on the test set. Key steps include data scaling, splitting into train/test sets, model compilation, training, and classification performance metrics.

Uploaded by

Cyborg Ultra
Copyright
© All Rights Reserved

Experiment - 1

1 Aim: Build an Artificial Neural Network to implement a binary classification task using the back-propagation algorithm, and test it on an appropriate dataset.

2 Description
The data used here is the 'Pima Indians Diabetes Dataset', downloaded from:
https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv
It is a binary (2-class) classification problem. There are 768 observations, each with 8 input variables
and 1 output variable.
The variable names are as follows:
1. Number of times pregnant.
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test.
3. Diastolic blood pressure (mm Hg).
4. Triceps skinfold thickness (mm).
5. 2-Hour serum insulin (mu U/ml).
6. Body mass index (weight in kg/(height in m)^2).
7. Diabetes pedigree function.
8. Age (years).
9. Class variable (0 or 1).
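Each observation is one line of nine comma-separated values, in the order listed above. The sketch below loads rows in this layout with explicit column names and separates the 8 inputs from the class variable. The short column names are hypothetical labels chosen here for readability and do not appear in the CSV. The three sample rows are the first lines of the file, as they appear in the output of cell [2].

```python
import io

import pandas as pd

# Hypothetical short names for the nine variables listed above (not part of the CSV).
COLUMNS = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "outcome",
]

# First three lines of pima-indians-diabetes.csv; the file has no header line.
SAMPLE = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
    "8,183,64,0,0,23.3,0.672,32,1\n"
)

# header=None plus names= keeps every line as data and labels the columns.
df = pd.read_csv(SAMPLE, header=None, names=COLUMNS)
X = df.drop(columns="outcome")   # the 8 input variables
y = df["outcome"]                # class variable (0 or 1)
print(df.shape, X.shape, y.tolist())  # (3, 9) (3, 8) [1, 0, 1]
```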

3 Data Import and Processing


[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

[2]: # load data
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv'

# data_pd = pd.read_csv(url, header=None)

# Note: the CSV has no header row. Without header=None, pandas uses the first
# observation as column names, so only 767 of the 768 rows are loaded below.
data_pd = pd.read_csv('pima-indians-diabetes.csv')
print(data_pd.info())
print(data_pd.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 767 entries, 0 to 766
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   6       767 non-null    int64
 1   148     767 non-null    int64
 2   72      767 non-null    int64
 3   35      767 non-null    int64
 4   0       767 non-null    int64
 5   33.6    767 non-null    float64
 6   0.627   767 non-null    float64
 7   50      767 non-null    int64
 8   1       767 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
   6  148  72  35    0  33.6  0.627  50  1
0  1   85  66  29    0  26.6  0.351  31  0
1  8  183  64   0    0  23.3  0.672  32  1
2  1   89  66  23   94  28.1  0.167  21  0
3  0  137  40  35  168  43.1  2.288  33  1
4  5  116  74   0    0  25.6  0.201  30  0
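The output above reports 767 entries, one fewer than the 768 observations in the dataset. The column labels are also the values 6, 148, 72, ... from the first data line. Both effects come from pandas' default header='infer', which treats the first line of a header-less CSV as column names. The commented-out URL line in cell [2] passes header=None for this reason. A small sketch on the first three lines of the file shows the difference:

```python
import io

import pandas as pd

# First three lines of the file, as shown in the head() output above.
csv_text = (
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
    "8,183,64,0,0,23.3,0.672,32,1\n"
)

# Default header='infer': the first line becomes the (string) column labels.
default_df = pd.read_csv(io.StringIO(csv_text))
# header=None: every line is kept as data and columns are numbered 0..8.
fixed_df = pd.read_csv(io.StringIO(csv_text), header=None)

print(len(default_df), list(default_df.columns)[:3])  # 2 ['6', '148', '72']
print(len(fixed_df), list(fixed_df.columns)[:3])      # 3 [0, 1, 2]
```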
StandardScaler: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

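[3]: # Scale the 8 numerical input columns to zero mean and unit variance.
# Note: the scaler is fitted on all rows before the train/test split, so
# statistics of the test rows also influence the scaling.
from sklearn.preprocessing import StandardScaler
std = StandardScaler()
scaled = std.fit_transform(data_pd.iloc[:, 0:8])
scaled = pd.DataFrame(scaled)
scaled.head()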

[3]:
          0         1         2         3         4         5         6         7
0 -0.843726 -1.122086 -0.160249  0.532023 -0.693559 -0.683729 -0.364265 -0.188940
1  1.234240  1.944476 -0.263578 -1.286882 -0.693559 -1.102301  0.604701 -0.103795
2 -0.843726 -0.996920 -0.160249  0.155698  0.122357 -0.493469 -0.919684 -1.040393
3 -1.140579  0.505069 -1.503534  0.908349  0.764674  1.409132  5.482732 -0.018650
4  0.343683 -0.152051  0.253070 -1.286882 -0.693559 -0.810569 -0.817052 -0.274086
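StandardScaler replaces each value x in a column with z = (x - mean) / std. Here mean and std are that column's mean and population standard deviation (ddof=0). A NumPy sketch of the same transform is below. It is applied to five raw rows of the first three columns from the head() output of cell [2], so its values differ from the table above, whose statistics come from all 767 rows.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score z = (x - mean) / std, mirroring what StandardScaler computes."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # population std (ddof=0), the StandardScaler convention
    std[std == 0.0] = 1.0        # constant columns are left unscaled, as StandardScaler does
    return (X - mean) / std, mean, std

# Rows 0-4 of the first three raw columns (pregnancies, glucose, blood pressure).
X_raw = np.array([[1, 85, 66], [8, 183, 64], [1, 89, 66], [0, 137, 40], [5, 116, 74]])
Z, mean, std = standardize(X_raw)
print(Z.mean(axis=0), Z.std(axis=0))  # approximately 0 and 1 per column
```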

[4]: X_data = scaled.to_numpy()
print('X_data:', np.shape(X_data))
Y_data = data_pd.iloc[:, 8]
print('Y_data:', np.shape(Y_data))

X_data: (767, 8)
Y_data: (767,)

[5]: # Split data into X_train, X_test, y_train, y_test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_data, Y_data, test_size=0.25, random_state=0)

[6]: # Check the dimension of the sets


print('X_train:',np.shape(X_train))
print('y_train:',np.shape(y_train))
print('X_test:',np.shape(X_test))
print('y_test:',np.shape(y_test))

X_train: (575, 8)
y_train: (575,)
X_test: (192, 8)
y_test: (192,)
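The set sizes follow from scikit-learn rounding the test fraction up: ceil(0.25 × 767) = 192 test rows, leaving 575 for training. Cell [3] fits the scaler before splitting. A leakage-free order splits first and computes the scaling statistics on the training rows only. With scikit-learn, that means std.fit_transform(X_train) followed by std.transform(X_test). The NumPy sketch below illustrates the idea on random placeholder features, not the Pima data.

```python
import math

import numpy as np

n_samples, test_size = 767, 0.25
n_test = math.ceil(test_size * n_samples)   # 192, as in the output above
n_train = n_samples - n_test                # 575

rng = np.random.default_rng(0)
X = rng.normal(loc=120.0, scale=30.0, size=(n_samples, 8))   # placeholder features
perm = rng.permutation(n_samples)
X_train, X_test = X[perm[:n_train]], X[perm[n_train:]]

# Fit the scaling statistics on the training rows only ...
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
# ... and reuse them unchanged for the test rows.
X_train_s = (X_train - mean) / std
X_test_s = (X_test - mean) / std
print(n_train, n_test, X_train_s.shape, X_test_s.shape)  # 575 192 (575, 8) (192, 8)
```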

4 Design the Model


[ ]: import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import binary_crossentropy, categorical_crossentropy

[ ]: # declaring model
basic_model = Sequential()

For a similar example, see: https://github.com/urjeet/Pima-Diabetes-Keras-Model/blob/master/pima_diabetes_keras_model.py

[10]: # Adding layers to the model (DIY)
# First layer: 8 neurons that take the 8 inputs and use the 'sigmoid'
# activation function.
basic_model.add(Dense(8, input_shape=(8,), activation='sigmoid'))
# Second layer: 4 neurons, 'sigmoid' activation function.
basic_model.add(Dense(4, activation='sigmoid'))
# Final layer: 1 sigmoid neuron whose output is the probability of class 1.
basic_model.add(Dense(1, activation='sigmoid'))
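Each Dense layer computes a = sigmoid(x·W + b). The three layers therefore hold (8×8+8) + (8×4+4) + (4×1+1) = 113 trainable parameters, the total that basic_model.summary() should report. Below is a NumPy sketch of the forward pass through the same 8-8-4-1 stack. The initialisation imitates Keras's default glorot_uniform kernels and zero biases. The random weights are placeholders, not trained values.

```python
import numpy as np

LAYER_SIZES = [8, 8, 4, 1]  # inputs, first layer, second layer, output (as in cell [10])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(sizes, seed=0):
    """Glorot-uniform weights and zero biases for each consecutive pair of sizes."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        limit = np.sqrt(6.0 / (n_in + n_out))
        params.append((rng.uniform(-limit, limit, size=(n_in, n_out)), np.zeros(n_out)))
    return params

def forward(X, params):
    a = X
    for W, b in params:
        a = sigmoid(a @ W + b)   # every layer uses sigmoid, as in the model above
    return a

params = init_params(LAYER_SIZES)
n_params = sum(W.size + b.size for W, b in params)
probs = forward(np.zeros((2, 8)), params)
print(n_params, probs.shape)  # 113 (2, 1)
```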

[ ]: # compiling the model (DIY)
basic_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
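With loss='binary_crossentropy', training minimises the mean over N samples of -[y log p + (1 - y) log(1 - p)]. Here y is the true label and p is the sigmoid output. Below is a NumPy sketch of that quantity, evaluated on the first five test labels and probabilities printed in cell [16]. The clipping constant eps = 1e-7 is an assumption chosen to avoid log(0).

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy -(1/N) * sum(y*log(p) + (1-y)*log(1-p)).
    Probabilities are clipped to [eps, 1 - eps] so log(0) cannot occur."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    p = np.clip(np.asarray(y_prob, dtype=float).ravel(), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

# First five test labels and predicted probabilities from cell [16].
y5 = [1, 0, 1, 1, 0]
p5 = [0.71106774, 0.30598542, 0.7271708, 0.34965125, 0.25360548]
loss5 = binary_crossentropy(y5, p5)
print(round(loss5, 4))  # 0.4736
```

The fourth sample (label 1, probability 0.35) contributes most of this loss, since -log(0.35) ≈ 1.05.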

5 Train the Model


[ ]: # training the model
epochs = 120
history = basic_model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs)
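The Aim names back-propagation. Inside fit(), Keras computes the gradients automatically and applies the Adam optimizer. The NumPy sketch below spells out the backward pass by hand for the same all-sigmoid 8-8-4-1 network trained on mean binary cross-entropy. It substitutes plain full-batch gradient descent for Adam. The learning rate, epoch count, initialisation and the synthetic placeholder data are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, y, sizes=(8, 8, 4, 1), lr=0.5, epochs=500, seed=0):
    """Full-batch gradient descent with back-propagation for an all-sigmoid MLP
    trained on mean binary cross-entropy. Returns weights, biases and loss history."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n_out) for n_out in sizes[1:]]
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = X.shape[0]
    losses = []
    for _ in range(epochs):
        # Forward pass, keeping every layer's activation for the backward pass.
        acts = [X]
        for W, b in zip(Ws, bs):
            acts.append(sigmoid(acts[-1] @ W + b))
        p = np.clip(acts[-1], 1e-7, 1 - 1e-7)
        losses.append(float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))))

        # Backward pass: for a sigmoid output with BCE, dL/dz_out = (p - y) / n.
        delta = (acts[-1] - y) / n
        for i in reversed(range(len(Ws))):
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Chain rule through W_i (before it is updated) and the previous
                # sigmoid layer, whose derivative is a * (1 - a).
                delta = (delta @ Ws[i].T) * acts[i] * (1 - acts[i])
            Ws[i] -= lr * grad_W
            bs[i] -= lr * grad_b
    return Ws, bs, losses

# Placeholder data (not the Pima set): 200 samples, 8 features, label from a linear rule.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 8))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
Ws, bs, losses = train_backprop(X_demo, y_demo, epochs=500)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Storing every activation in the forward pass lets each layer reuse it in the backward pass. That reuse is what makes back-propagation cost about the same as one extra forward pass.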

6 Evaluate the Model


[13]: # plot loss vs epochs
epochRange = range(1, epochs + 1)
plt.plot(epochRange,history.history['loss'])
plt.plot(epochRange,history.history['val_loss'])
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid()
plt.xlim((1,epochs))
plt.legend(['Train','Test'])
plt.show()

[14]: # Plot accuracy vs epochs (DIY)
epochRange = range(1, epochs + 1)
plt.plot(epochRange,history.history['accuracy'])
plt.plot(epochRange,history.history['val_accuracy'])
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.grid()
plt.xlim((1,epochs))
plt.legend(['Train','Test'])
plt.show()

[15]: # Test, Loss and accuracy
loss_and_metrics = basic_model.evaluate(X_test, y_test)
print('Loss = ',loss_and_metrics[0])
print('Accuracy = ',loss_and_metrics[1])

6/6 [==============================] - 0s 2ms/step - loss: 0.4515 - accuracy: 0.7917
Loss = 0.4515419006347656
Accuracy = 0.7916666865348816

6.0.1 Classification Model Performance measures


[16]: y_pred = basic_model.predict(X_test)
print(y_test.iloc[:5])
print(y_pred[:5])

6/6 [==============================] - 0s 1ms/step
661    1
122    0
113    1
14     1
529    0
Name: 1, dtype: int64
[[0.71106774]
[0.30598542]
[0.7271708 ]
[0.34965125]
[0.25360548]]

[17]: # Threshold the predicted probabilities at 0.5 to get class labels.
y_pred = [1 if p >= 0.5 else 0 for p in y_pred.ravel()]
print(y_pred[:5])

[1, 0, 1, 0, 0]
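Cell [17] thresholds in a Python loop. NumPy can do the same with one vectorised comparison, which is also how the 'accuracy' metric treats a single sigmoid output: 0.5 is the cut-off. The sketch uses the five probabilities and labels printed in cell [16]. On these five samples the fourth (true label 1, probability 0.35) is misclassified, so 4 of 5 are correct.

```python
import numpy as np

# First five predicted probabilities and true labels, as printed in cell [16].
probs = np.array([[0.71106774], [0.30598542], [0.7271708], [0.34965125], [0.25360548]])
y_true = np.array([1, 0, 1, 1, 0])

# Vectorised equivalent of the list comprehension in cell [17].
labels = (probs >= 0.5).astype(int).ravel()
acc = float((labels == y_true).mean())
print(labels, acc)  # [1 0 1 0 0] 0.8
```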

[18]: from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.82      0.89      0.85       131
           1       0.71      0.59      0.64        61

    accuracy                           0.79       192
   macro avg       0.76      0.74      0.75       192
weighted avg       0.79      0.79      0.79       192
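Every figure in the report follows from four confusion-matrix counts. The notebook does not print them, but they can be reconstructed from the output. Accuracy 0.7917 × 192 gives 152 correct predictions. A class-1 recall of 0.59 on 61 samples gives 36 true positives. That leaves 116 true negatives, 15 false positives and 25 false negatives. The sketch recomputes the report from those reconstructed counts. For each class, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean.

```python
# Confusion-matrix counts reconstructed from the report above (not printed by the notebook).
tn, fp = 116, 15   # true class 0: 131 samples
fn, tp = 25, 36    # true class 1: 61 samples

def prf(tp, fp, fn):
    """Precision, recall and F1 for one class treated as 'positive'."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p1, r1, f1_1 = prf(tp, fp, fn)   # class 1 as the positive class
p0, r0, f1_0 = prf(tn, fn, fp)   # class 0 as positive: the roles of FP and FN swap
accuracy = (tp + tn) / (tp + tn + fp + fn)
macro_f1 = (f1_0 + f1_1) / 2
weighted_f1 = (131 * f1_0 + 61 * f1_1) / 192

print(f"class 0: P={p0:.2f} R={r0:.2f} F1={f1_0:.2f}")  # class 0: P=0.82 R=0.89 F1=0.85
print(f"class 1: P={p1:.2f} R={r1:.2f} F1={f1_1:.2f}")  # class 1: P=0.71 R=0.59 F1=0.64
print(f"accuracy={accuracy:.4f} macro F1={macro_f1:.2f} weighted F1={weighted_f1:.2f}")
# accuracy=0.7917 macro F1=0.75 weighted F1=0.79
```

The low class-1 recall means 25 of the 61 positive cases in the test set are predicted as class 0.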

7 Conclusion
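An Artificial Neural Network with 3 Dense layers and sigmoid activations was built for the binary classification task. It was trained by back-propagation (Adam optimizer, binary cross-entropy loss) and tested on the Pima Indians Diabetes dataset. The model reaches an accuracy of 79% on the held-out test set of 192 samples. Recall on the positive class (1) is lower, at 0.59, than on the negative class (0), at 0.89.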
