MACHINE LEARNING LAB ASSIGNMENT
CSE-716
S. M. SHAFKAT RAIHAN
ID: 16701041
SESSION: 2015-16
PROBLEM DESCRIPTION
Implementing Artificial Neural Network Classification Method
Using Python Language
FULL DATASET
age income isStudent credit_rating Buys_Computer
youth high No Fair No
youth high No Excellent No
middle_aged high No Fair Yes
senior medium No Fair Yes
senior low Yes Fair Yes
senior low Yes Excellent No
middle_aged low Yes Excellent Yes
youth medium No Fair No
youth low Yes Fair Yes
senior medium Yes Fair Yes
youth medium Yes Excellent Yes
middle_aged medium No Excellent Yes
middle_aged high Yes Fair Yes
senior medium No Excellent No
TRAINING DATA
age income isStudent credit_rating class
youth high No Fair No
youth high No Excellent No
middle_aged high No Fair Yes
senior medium No Fair Yes
senior low Yes Fair Yes
senior low Yes Excellent No
middle_aged low Yes Excellent Yes
youth medium No Fair No
youth low Yes Fair Yes
senior medium Yes Fair Yes
TEST DATA
age income isStudent credit_rating class
youth medium Yes Excellent Yes
middle_aged medium No Excellent Yes
middle_aged high Yes Fair Yes
senior medium No Excellent No
CODE & EXPLANATION
import pandas as pd  # Data analysis library; used here to read the CSV files
from sklearn.preprocessing import LabelEncoder  # Converts categorical text values into integer codes
from sklearn.metrics import confusion_matrix, accuracy_score  # For computing the confusion matrix and accuracy
import numpy  # Numerical library for operations on multidimensional arrays
from keras.models import Sequential  # Keras is a Python deep learning library; Sequential builds a linear stack of layers
from keras.layers import Dense  # Dense implements a densely (fully) connected layer
numpy.random.seed(7)  # Fix the seed of NumPy's random number generator (7 is the seed value, not a number of weights)
train = pd.read_csv('C:/Users/USER/Desktop/Testing_ML_Lab_Files/data/Training2.csv')  # Read the training data with pandas read_csv()
test = pd.read_csv('C:/Users/USER/Desktop/Testing_ML_Lab_Files/data/UnknownData.csv')  # Read the test data with pandas read_csv()
# LabelEncoder object used to convert categorical text values into numeric values.
number = LabelEncoder()
# Convert every column of the training data to integer codes.
# fit() learns the distinct values of a column (kept in sorted order).
# transform() replaces each value with the index of its class.
# fit_transform() performs both steps in one call.
# astype('str') converts all attribute values to strings first.
for i in train:
    train[i] = number.fit_transform(train[i].astype('str'))
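The encoding step can be sketched without scikit-learn. The helper names `fit_codes` and `encode` below are hypothetical and only mimic the behaviour of LabelEncoder, which assigns integer codes in sorted order of the distinct values; this is an illustration, not the library's implementation. The sketch uses the income column, which in the tables above takes the values high, low and medium in the training data but only high and medium in the test data, to show that a mapping fitted separately on the test column can differ from the training mapping.

```python
def fit_codes(values):
    """Hypothetical helper: map each distinct value to an integer code, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def encode(values, codes):
    """Hypothetical helper: replace each value with its integer code."""
    return [codes[v] for v in values]


# income column of the TRAINING DATA and TEST DATA tables, in row order
train_income = ["high", "high", "high", "medium", "low",
                "low", "low", "medium", "low", "medium"]
test_income = ["medium", "medium", "high", "medium"]

train_codes = fit_codes(train_income)   # fitted on the training column
refit_codes = fit_codes(test_income)    # fitted again on the test column

print(train_codes)                      # mapping learned from the training data
print(refit_codes)                      # 'medium' gets a different code here
print(encode(test_income, train_codes)) # test column encoded with the training mapping
```

Reusing the mapping fitted on the training column is one way to keep a value such as medium on the same code in both files.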
CODE & EXPLANATION
# Split input and output columns; x = input columns, y = output column.
x_train = train.iloc[:, :-1]  # All rows, every column except the last
y_train = train.iloc[:, -1]  # All rows, only the last column (the class label)
# Do the same for the test dataset.
# Note: fit_transform() refits the encoder on each test column, so the codes match the
# training codes only when a column has the same set of distinct values in both files.
for i in test:
    test[i] = number.fit_transform(test[i].astype('str'))
x_test = test.iloc[:, :-1]  # Same split as for x_train
y_test = test.iloc[:, -1]  # Same split as for y_train
model = Sequential()  # Create a sequential ANN model
# First hidden layer: dense, input shape (*, 4), output shape (*, 10),
# with the rectified linear unit (ReLU) as activation function.
model.add(Dense(10, input_dim=4, activation='relu'))
model.add(Dense(4, activation='relu'))  # Second hidden layer; output shape = (*, 4)
# Output layer; output shape = (*, 1). The sigmoid (logistic) activation gives a value between 0 and 1.
model.add(Dense(1, activation='sigmoid'))
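To make the layer shapes concrete, the following is a minimal pure-Python sketch of one forward pass through a 4-10-4-1 network with ReLU hidden layers and a sigmoid output. The weights are random placeholders, and the helpers `dense` and `random_layer` are hypothetical names introduced for illustration; Keras initialises and trains its own weights, so the printed value has no relation to the trained model.

```python
import math
import random


def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(sum_i inputs[i] * weights[i][j] + biases[j])."""
    return [
        activation(sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j])
        for j in range(len(biases))
    ]


def random_layer(n_in, n_out, rng):
    """Hypothetical helper: placeholder weights in [-1, 1] and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


rng = random.Random(7)
w1, b1 = random_layer(4, 10, rng)   # 4 inputs  -> 10 units
w2, b2 = random_layer(10, 4, rng)   # 10 units  -> 4 units
w3, b3 = random_layer(4, 1, rng)    # 4 units   -> 1 output

# "youth high No Fair" with sorted-order integer codes: youth=2, high=0, No=0, Fair=1
sample = [2.0, 0.0, 0.0, 1.0]

h1 = dense(sample, w1, b1, relu)
h2 = dense(h1, w2, b2, relu)
out = dense(h2, w3, b3, sigmoid)

print(len(h1), len(h2), len(out))   # layer widths: 10 4 1
print(out[0])                       # a value strictly between 0 and 1
```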
CODE & EXPLANATION
# compile() configures the model for training.
# The 'binary_crossentropy' loss function is used because there are just 2 class labels.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train the model: epochs=150 is the number of passes over the training data;
# batch_size=10 is the number of training examples used per weight update.
model.fit(x_train, y_train, epochs=150, batch_size=10)
predictions = model.predict(x_test)  # Predict with the trained model
predicted = [int(round(x[0])) for x in predictions]  # Round each output to an integer (0 or 1), as the y_test values are integers
# Build the confusion matrix. Layout (rows = actual value, columns = predicted value):
#   Actual \ Predicted    No                Yes
#   No                    True Negative     False Positive
#   Yes                   False Negative    True Positive
cfm = confusion_matrix(y_test, predicted)
# Calculate accuracy: Accuracy = (TP + TN) / total number of samples
acc = accuracy_score(y_test, predicted)
# Print the accuracy and the confusion matrix
print('Accuracy:', acc)
print('Prediction No Yes')
print(' No {} {}'.format(cfm[0][0], cfm[0][1]))
print(' Yes {} {}'.format(cfm[1][0], cfm[1][1]))
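The loss and the training settings used above can be illustrated with a short pure-Python sketch. The function below is a plain restatement of the binary cross-entropy formula, not Keras's own code. The labels are the four test-set classes (Yes, Yes, Yes, No encoded as 1, 1, 1, 0), while the probabilities in `y_prob` are hypothetical values chosen only for illustration. The last lines work out how many weight updates 150 epochs produce, assuming the 10 training rows listed earlier and a batch size of 10.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)); p is clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


y_true = [1, 1, 1, 0]            # test-set classes: Yes, Yes, Yes, No
y_prob = [0.9, 0.8, 0.4, 0.2]    # hypothetical sigmoid outputs, for illustration only

loss = binary_crossentropy(y_true, y_prob)
predicted = [int(round(p)) for p in y_prob]   # same rounding rule as in the script

n_train, batch_size, epochs = 10, 10, 150
updates_per_epoch = math.ceil(n_train / batch_size)  # one batch covers all 10 rows
total_updates = updates_per_epoch * epochs

print(loss)
print(predicted)
print(updates_per_epoch, total_updates)
```

With 10 training rows and a batch size of 10, each epoch amounts to a single weight update under these assumptions.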
OUTPUT & EXPLANATION
Output
Accuracy: 0.75
Prediction No Yes
No 1 0
Yes 1 2
Explanation
Total Number of samples in test set = 4
Adding the elements of the principal diagonal of the confusion matrix gives the number of correctly
identified negative and positive samples: TN + TP = 1 + 2 = 3
Accuracy = (TP + TN) / Total = 3/4 = 0.75
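The same arithmetic can be checked with a few lines of Python. This sketch simply re-enters the confusion matrix from the output above (rows = actual No/Yes, columns = predicted No/Yes); it does not rerun the model.

```python
# Confusion matrix copied from the reported output
cfm = [[1, 0],   # actual No:  1 true negative,  0 false positives
       [1, 2]]   # actual Yes: 1 false negative, 2 true positives

tn, fp = cfm[0]
fn, tp = cfm[1]

total = tn + fp + fn + tp          # number of test samples
accuracy = (tp + tn) / total       # sum of the principal diagonal over the total

print("Accuracy:", accuracy)       # prints: Accuracy: 0.75
```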