Import Numpy As NP

The document contains Python code that reads a CSV dataset, splits it into a 60% training set and a 40% test set, and classifies the test rows with a naive Bayes approach: it estimates the probabilities of the targets "yes" and "no" from the training data, multiplies each by the per-attribute conditional probabilities, and predicts the target with the larger product. It prints each prediction next to the actual target and reports a classification accuracy of 50% on the test set.
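
As a rough illustration of that scoring step, the sketch below multiplies a class prior by per-attribute conditional probabilities and picks the larger product. The helper name `class_score` and the hard-coded fractions are assumptions made for this example and are not part of the original program; the fractions are the counts obtained by hand for the second test row (rain, mild, normal, weak) from the eight training rows listed in the output.

```python
import math

def class_score(prior, conditionals):
    """Hypothetical helper: prior times the product of the conditionals."""
    return prior * math.prod(conditionals)

# Second test row: outlook=rain, temprature=mild, humidity=normal, wind=weak.
# Each fraction is (matching training rows in the class) / (rows in the class).
score_yes = class_score(4 / 8, [2 / 4, 1 / 4, 2 / 4, 3 / 4])
score_no = class_score(4 / 8, [1 / 4, 1 / 4, 1 / 4, 2 / 4])

# Same decision rule as the program: 'no' only if its score is strictly larger.
prediction = 'no' if score_no > score_yes else 'yes'
print(score_yes, score_no, prediction)
```

The 'yes' score is the larger of the two, which agrees with the prediction printed for instance 2.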
import csv

import numpy as np

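def read_data(filename):
    """Read a CSV file and return (header row, list of data rows)."""
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile)
        metadata = next(datareader)  # first row holds the attribute names
        traindata = []
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)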

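def splitdataset(dataset, splitratio):
    """Split dataset in file order: the first rows train, the rest test."""
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    testset = list(dataset)
    i = 0
    # Move rows from the front of testset until trainset is full (no shuffling).
    while len(trainset) < trainsize:
        trainset.append(testset.pop(i))
    return [trainset, testset]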

def classifydata(data, test):
    """Train a naive Bayes classifier on data and evaluate it on test.

    Both arguments are 2-D arrays of strings whose last column is the target.
    """
    total_size = data.shape[0]
    print("\n")
    print("training data size=", total_size)
    print("test data size=", test.shape[0])

    # Class priors: count the 'yes' and 'no' targets in the training data.
    countyes = 0
    countno = 0
    print("\n")
    print("target count probability")
    for x in range(data.shape[0]):
        if data[x, -1] == 'yes':
            countyes = countyes + 1
        if data[x, -1] == 'no':
            countno = countno + 1
    probyes = countyes / total_size
    probno = countno / total_size
    print("yes", "\t", countyes, "\t", probyes)
    print("no", "\t", countno, "\t", probno)

    # Per-attribute conditional probabilities for the current test row.
    prob0 = np.zeros((test.shape[1] - 1))
    prob1 = np.zeros((test.shape[1] - 1))
    accuracy = 0
    print("\n")
    print("instance prediction target")
    for t in range(test.shape[0]):
        for k in range(test.shape[1] - 1):
            # Count training rows that share attribute k's value, per class.
            count1 = count0 = 0
            for j in range(data.shape[0]):
                if test[t, k] == data[j, k] and data[j, -1] == 'no':
                    count0 = count0 + 1
                if test[t, k] == data[j, k] and data[j, -1] == 'yes':
                    count1 = count1 + 1
            prob0[k] = count0 / countno
            prob1[k] = count1 / countyes

        # Score each class: prior times the product of the conditionals.
        probNo = probno
        probYes = probyes
        for i in range(test.shape[1] - 1):
            probNo = probNo * prob0[i]
            probYes = probYes * prob1[i]

        # Ties, including 0 versus 0, fall through to 'yes'.
        if probNo > probYes:
            predict = 'no'
        else:
            predict = 'yes'
        print(t + 1, "\t", predict, "\t", test[t, -1])
        if predict == test[t, -1]:
            accuracy += 1

    final_accuracy = (accuracy / test.shape[0]) * 100
    print("accuracy", final_accuracy, "%")
    return

metadata, traindata = read_data("3-dataset.csv")
print("attribute names of the traning dta are:", metadata)

splitratio = 0.6
trainingset, testset = splitdataset(traindata, splitratio)

training = np.array(trainingset)
print("\n tarining data set are")
for x in trainingset:
    print(x)

testing = np.array(testset)
print("\n the test data set are:")
for x in testing:
    print(x)

classifydata(training, testing)
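
The split in the listing takes the first rows in file order, so the same rows always land in the training set. If a random split is wanted, a minimal sketch could look like the following; `shuffled_split` and its `seed` parameter are hypothetical names introduced here and are not part of the original program.

```python
import random

def shuffled_split(dataset, splitratio, seed=None):
    """Hypothetical variant of splitdataset that shuffles before splitting."""
    rows = list(dataset)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)  # a fixed seed makes the split repeatable
    trainsize = int(len(rows) * splitratio)
    return rows[:trainsize], rows[trainsize:]

# Usage with 14 placeholder rows, the size of the dataset in the output.
rows = [[str(n)] for n in range(14)]
train, test = shuffled_split(rows, 0.6, seed=0)
print(len(train), len(test))
```

With so few rows, a shuffled split can leave one class with no training examples, in which case the divisions by the class counts in `classifydata` would fail; that case would need handling separately.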

attribute names of the traning dta are: ['outlook', 'temprature', 'humidity', 'wind', 'Target']

tarining data set are
['sunny', 'hot', 'high', 'weak', 'no']
['sunny', 'hot', 'high', 'strong', 'no']
['overcast', 'hot', 'high', 'weak', 'yes']
['rain ', 'mild', 'high', 'weak', 'yes']
['rain ', 'cool', 'normal', 'weak', 'yes']
['rain ', 'cool', 'normal', 'strong', 'no']
['overcast', 'cool', 'normal', 'strong', 'yes']
['sunny', 'mild', 'high', 'weak', 'no']

the test data set are:
['sunny' 'cool' 'normal' 'weak' 'yes']
['rain ' 'mild' 'normal' 'weak' 'yes']
['sunny' 'mild' 'normal' 'strong' 'yes']
['overcast' 'mild' 'high' 'strong' 'yes']
['overcast' 'hot' 'normal' 'weak' 'yes']
['rain' 'mild' 'high' 'strong' 'no']

training data size= 8
test data size= 6

target    count    probability
yes       4        0.5
no        4        0.5

instance    prediction    target
1           no            yes
2           yes           yes
3           no            yes
4           yes           yes
5           yes           yes
6           yes           no
accuracy 50.0 %
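
In the output above, several outlook values appear as 'rain ' with a trailing space, while the last test row has 'rain'. The exact string comparison then finds no matching training row for that value, both class products come out as zero, and the tie falls to 'yes'. One possible remedy, sketched below under that reading of the data, is to strip whitespace from every field before classifying; `strip_fields` is a hypothetical helper and not part of the original program.

```python
def strip_fields(rows):
    """Hypothetical helper: strip leading and trailing whitespace from each field."""
    return [[field.strip() for field in row] for row in rows]

# Two rows copied from the output: a training row and the last test row.
train_row = ['rain ', 'mild', 'high', 'weak', 'yes']
test_row = ['rain', 'mild', 'high', 'strong', 'no']

# Compared as read from the file, the outlook values differ.
print(train_row[0] == test_row[0])

# Compared after stripping, they agree.
clean_train, clean_test = strip_fields([train_row, test_row])
print(clean_train[0] == clean_test[0])
```

Applying such a helper to `traindata` right after `read_data` returns would leave the rest of the program unchanged. Working through the counts by hand for the last test row after stripping gives 0.5 x 1/4 x 1/4 x 3/4 x 2/4 for 'no' against 0.5 x 2/4 x 1/4 x 2/4 x 1/4 for 'yes', so that row's prediction would change to 'no'.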
