Machine Learning Lab Manual
Laboratory Manual
For VI Semester
Experiment No.1
Implement and demonstrate the FIND-S algorithm for finding the
most specific hypothesis based on a given set of training data
samples. Read the training data from a .CSV file.
Ans:-
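Before the decision-tree material below, here is a minimal sketch of the FIND-S algorithm asked for in the experiment statement. The file name training_data.csv, the assumption that the last column is the class label, and the 'yes' label convention are illustrative, not taken from the manual.
import csv

# Read training examples from a CSV file (assumed name 'training_data.csv',
# last column assumed to be the class label).
with open('training_data.csv') as f:
    examples = list(csv.reader(f))

# Start with the most specific hypothesis: one 'phi' per attribute.
num_attributes = len(examples[0]) - 1
hypothesis = ['phi'] * num_attributes

for row in examples:
    if row[-1].strip().lower() == 'yes':      # consider positive examples only
        for i in range(num_attributes):
            if hypothesis[i] == 'phi':        # first positive example: copy its values
                hypothesis[i] = row[i]
            elif hypothesis[i] != row[i]:     # disagreement: generalise to '?'
                hypothesis[i] = '?'

print("Most specific hypothesis:", hypothesis)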
Decision Tree
A decision tree is basically an inverted tree in which each internal node represents a feature (attribute), while the leaf nodes represent the output (class labels).
Except for the leaf nodes, the remaining nodes act as decision-making nodes.
Algorithms
CART (Gini Index)
ID3 (Entropy, Information Gain)
Note:- Here we will understand the ID3 algorithm
Algorithm Concepts
1. To understand this concept, we take an example, assuming we have a data set (the play-tennis data set linked in the manual).
2. Based on this data, we have to find out whether we can play on a given day or not.
3. We have four attributes in the data set. Now how do we decide which attribute we should put at the root node?
4. For this, we will calculate the information gain of all the attributes (features); the attribute with the maximum information gain will be our root node.
Step 1: Creating a root node
Entropy (entropy of the whole data set):
Entropy(S) = -(p/(p+n)) * log2(p/(p+n)) - (n/(p+n)) * log2(n/(p+n))
p - the number of positive examples.
n - the number of negative examples.
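A minimal sketch of this entropy calculation in Python; the example counts used in the final print (9 positive and 5 negative examples) are those of the play-tennis data set:
from math import log2

def entropy(p, n):
    # Entropy of a set containing p positive and n negative examples.
    if p == 0 or n == 0:   # a pure set has zero entropy
        return 0.0
    total = p + n
    return -(p / total) * log2(p / total) - (n / total) * log2(n / total)

print(entropy(9, 5))   # entropy of the whole play-tennis data set, about 0.940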
Step 2: For every attribute/feature
Average information (average information of a particular attribute):
I(Attribute) = Sum over values of { ((pi+ni)/(p+n)) * Entropy(Attribute value) }
pi - the number of positive examples for a particular value of the attribute.
ni - the number of negative examples for a particular value of the attribute.
Entropy(Attribute value) - the entropy of the examples having that attribute value, calculated in the same way as for the whole data set.
Information Gain
Gain = Entropy(S) - I(Attribute)
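A minimal sketch of the average information and gain calculation, reusing the entropy function sketched above. The per-value counts shown are those of the Outlook attribute in the play-tennis data and are given only as an example:
def average_information(value_counts, p, n):
    # value_counts: a list of (pi, ni) pairs, one per value of the attribute.
    return sum(((pi + ni) / (p + n)) * entropy(pi, ni) for pi, ni in value_counts)

def information_gain(value_counts, p, n):
    return entropy(p, n) - average_information(value_counts, p, n)

# Outlook: Sunny (2+, 3-), Overcast (4+, 0-), Rain (3+, 2-); gain is about 0.246.
print(information_gain([(2, 3), (4, 0), (3, 2)], 9, 5))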
Step 3: Base cases
1. If all examples are positive, return the single-node tree with label = +.
2. If all examples are negative, return the single-node tree with label = -.
3. If the attribute list is empty, return the single-node tree with label = the most common value of the target attribute in the examples.
Step 4: Pick the highest-gain attribute
1. The attribute with the highest information gain becomes the node; the examples are then grouped by the values of that attribute, and each group is processed in the same way as we did for the parent (root) node.
2. Again, the feature with the maximum information gain within each group becomes the next node, and this process continues until we reach a leaf node.
Step 5: Repeat until we reach the final node (leaf node).
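A minimal pandas-based sketch of the ID3 procedure described in Steps 1 to 5. The signature id3(data, target, attribute_names) and the dictionary-of-dictionaries tree shape match the calls in the accuracy code further below, but the body here is only an illustrative reconstruction, not the manual's exact implementation:
import pandas as pd
from math import log2

def column_entropy(series):
    # Entropy of a pandas Series of class labels.
    probabilities = series.value_counts(normalize=True)
    return -sum(p * log2(p) for p in probabilities if p > 0)

def attribute_gain(df, attribute, target):
    # Gain = entropy of the whole set minus the weighted entropy of each subset.
    total_entropy = column_entropy(df[target])
    weighted = sum((len(subset) / len(df)) * column_entropy(subset[target])
                   for _, subset in df.groupby(attribute))
    return total_entropy - weighted

def id3(df, target, attribute_names, default_class=None):
    labels = df[target].unique()
    if len(labels) == 1:                  # all examples positive or all negative
        return labels[0]
    if df.empty or not attribute_names:   # no data or no attributes left
        return default_class
    default_class = df[target].mode()[0]  # most common label at this node
    # Pick the attribute with the highest information gain.
    best = max(attribute_names, key=lambda a: attribute_gain(df, a, target))
    remaining = [a for a in attribute_names if a != best]
    tree = {best: {}}
    for value, subset in df.groupby(best):  # one branch per attribute value
        tree[best][value] = id3(subset, target, remaining, default_class)
    return tree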
Let's take a look at how our tree will look.
ACCURACY
# Defining a function that classifies an instance with the tree (used below to calculate accuracy)
def classify(instance, tree, default=None):
    attribute = next(iter(tree))   # attribute tested at this node
    print("Key:", tree.keys())
    print("Attribute:", attribute)
    print("Instance of Attribute:", instance[attribute], attribute)
    if instance[attribute] in tree[attribute].keys():
        result = tree[attribute][instance[attribute]]
        print("Instance Attribute:", instance[attribute], "TreeKeys:", tree[attribute].keys())
        if isinstance(result, dict):   # internal node: recurse into the subtree
            return classify(instance, result, default)
        else:                          # leaf node: return the class label
            return result
    else:
        return default                 # unseen attribute value: fall back to the default
Note
# Classify every row with the tree; unseen attribute values default to 'No'
df_tennis['predicted'] = df_tennis.apply(classify, axis=1, args=(tree, 'No'))
print(df_tennis['predicted'])
print('\n Accuracy is:\n' + str(sum(df_tennis['PT'] == df_tennis['predicted']) / (1.0 * len(df_tennis.index))))
df_tennis[['PT', 'predicted']]
Note:-
# Hold out the last four rows as a test set and rebuild the tree on the rest
training_data = df_tennis.iloc[1:-4]
test_data = df_tennis.iloc[-4:]
train_tree = id3(training_data, 'PT', attribute_names)
test_data['predicted2'] = test_data.apply(classify, axis=1, args=(train_tree, 'Yes'))
print('\n\n Accuracy is : ' + str(sum(test_data['PT'] == test_data['predicted2']) / (1.0 * len(test_data.index))))
Output
Experiment No. 4
Note:
1. First we imported the OpenCV module, and after that we imported the matplotlib library to plot and show the picture.
2. After that we read the image; our image was a coloured image, which we then converted into a grayscale image.
3. You can see the difference between the grayscale image and the coloured image from the output.
4. In OpenCV, all images are first converted into numpy arrays, which is why we were able to see their shape.
5. When we printed the gray image, you can see that we got a 2D array; likewise, if you print a colour image you will get a 3D array. A sketch of this loading step is given below.
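A minimal sketch of the loading and grayscale-conversion step described in notes 1 to 5 above; the file name photo.jpg is an assumption:
import cv2
import matplotlib.pyplot as plt

# Read the colour image; OpenCV loads it as a numpy array (BGR order).
image = cv2.imread('photo.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

print("Colour image shape:", image.shape)     # 3D array: height x width x 3
print("Grayscale image shape:", gray.shape)   # 2D array: height x width

plt.imshow(gray, cmap='gray')   # show the grayscale image
plt.axis('off')
plt.show()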
# Create a Cascade Classifier; it contains all the face features
haar_face_cascade = cv2.CascadeClassifier('face.xml')
# Search for the co-ordinates of the face
faces = haar_face_cascade.detectMultiScale(image, scaleFactor=1.06, minNeighbors=5)
print("face found", len(faces))
# Create a rectangle outline on the face
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
plt.imshow(image)  # show the image
plt.axis('off')
plt.show()
Note: (let's understand what we are doing in this code)
1. We first created a Cascade classifier, which contains all the face features, from "face.xml".
2. The smaller the value of scaleFactor, the greater the accuracy.
3. As we know, our image is a numpy array; we are now searching it for the coordinates of the faces we want to detect.
4. After that we draw a rectangular frame on each face; (0, 255, 0) is the frame colour and 2 is the width of the frame.
Output
Experiment No.5
Write a program to implement the naïve Bayesian classifier for a
sample training data set stored as a .CSV file. Compute the
accuracy of the classifier, considering a few test data sets.
Note:
1. Here we have looked at the values of the samples for our 13 features, printing only the first five rows (a loading sketch is given below).
2. Based on these features, the wine samples fall into three classes: 0, 1, and 2.
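A minimal sketch of loading and inspecting the wine data, as described in the two notes above, using scikit-learn's built-in load_wine data set:
from sklearn.datasets import load_wine
import pandas as pd

wine = load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)

print(df.head())           # first five rows of the 13 features
print(wine.target_names)   # the three classes: class_0, class_1, class_2
print(wine.target[:5])     # class labels of the first few samples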
# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.30, random_state=109)
Note: We split our data into training data and testing data: 70% training data and 30% testing data. Our model learns from the training data, and from the testing data we can see how much our model has learned.
Note:
1. To check how good our model is, we have computed the accuracy of our model.
2. Here we calculated both the confusion matrix and the accuracy (a sketch of this step follows below).
3. We can see from the confusion matrix that our model predicted a total of 5 values incorrectly, and the rest are correct predictions.
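A minimal sketch of the training and evaluation step summarised in this note, reusing the train/test split above. Gaussian naive Bayes is assumed here, since the exact classifier code is not shown in this extract:
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

# Train a Gaussian naive Bayes classifier on the training split.
model = GaussianNB()
model.fit(X_train, y_train)

# Predict on the held-out test set and evaluate.
y_pred = model.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print(metrics.confusion_matrix(y_test, y_pred))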
Experiment No.6
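The code below uses count_vect, tfidf_transformer, clf, and news_test, which were built in the part of this experiment not shown in this extract. A minimal sketch of that setup, assuming the 20 Newsgroups data set and a multinomial naive Bayes classifier:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Load the training and test portions of the 20 Newsgroups corpus.
news_train = fetch_20newsgroups(subset='train')
news_test = fetch_20newsgroups(subset='test')

# Bag-of-words counts, then TF-IDF weighting, fitted on the training text.
count_vect = CountVectorizer()
X_train_tf = count_vect.fit_transform(news_train.data)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_tf)

# Train a multinomial naive Bayes classifier on the TF-IDF features.
clf = MultinomialNB().fit(X_train_tfidf, news_train.target)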
# Transform the test documents with the fitted vectorizer and TF-IDF transformer, then predict their classes
X_test_tf = count_vect.transform(news_test.data)
X_test_tfidf = tfidf_transformer.transform(X_test_tf)
predicted = clf.predict(X_test_tfidf)
predicted
Output
Note: It classifies our test data into the classes we have; you can also use any text other than the test data.
ACCURACY
Let's check how correct our model is.
# Accuracy score and confusion matrix for the test predictions
from sklearn import metrics
from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(news_test.target, predicted))
print(metrics.confusion_matrix(news_test.target, predicted))
Output
Note: You can see that here we have printed the accuracy and the confusion matrix.
Experiment No.7