6CS4-22 - ML Lab Manual

The Machine Learning Lab Manual for Poornima College of Engineering outlines the structure, objectives, and experiments for the course 6CS4-22 in the Department of Advance Computing for the 2024-25 session. It includes the institute's vision and mission, lab outcomes, evaluation schemes, and specific instructions for lab conduct and safety. The manual also details a series of programming experiments aimed at teaching students the fundamentals of Python and machine learning algorithms.

Poornima College of Engineering, Jaipur ML Lab Manual

ISI-6, RIICO Institutional Area, Sitapura, Jaipur-302022, Rajasthan


Phone/Fax: 0141-2770790-92, www.pce.poornima.org

Machine Learning LAB


(Lab Code: 6CS4-22)
6th Semester, 3rd Year

Department of Advance Computing


Session: 2024-25


TABLE OF CONTENTS

S.No. Topic/Name of Experiment

GENERAL DETAILS

1 Vision & Mission of Institute and Department

2 RTU Syllabus and Marking Scheme

3 Lab Outcomes and their Mapping with POs and PSOs

4 Lab Conduction Plan

5 General Lab Instructions

6 Lab Specific Safety Rules

LIST OF EXPERIMENTS WITH VIVA QUESTIONS (AS PER RTU SYLLABUS)

A Zero Lecture

1 Write a program to demonstrate basic data types in Python.

2 Write a program to compute the distance between two points, taking input from the user. Write a program add.py that takes 2 numbers as command-line arguments and prints their sum.

3 Write a program for checking whether a given number is an even number or not. Using a for loop, write a program that prints out the decimal equivalents of 1/2, 1/3, 1/4, ..., 1/10.

4 Write a program to demonstrate list and tuple in Python. Write a program using a for loop that loops over a sequence. Write a program using a while loop that asks the user for a number and prints a countdown from that number to zero.

5 Find the sum of all the primes below two million. By considering the terms in the Fibonacci sequence whose values do not exceed four million, WAP to find the sum of the even-valued terms.

6 Write a program to count the number of characters in a string and store them in a dictionary data structure. Write a program to use the split and join methods on a string and trace a person's birthday with a dictionary data structure.

7 Write a program to count the frequency of characters in a given file. Can you use character frequency to tell whether the given file is a Python program file, a C program file or a text file?

8 Write a program to print each line of a file in reverse order. Write a program to compute the number of characters, words and lines in a file.

9 Write a function nearly_equal to test whether two strings are nearly equal. Two strings a and b are nearly equal when a can be generated by a single mutation on b. Write functions to compute the GCD and LCM of two numbers. Each function shouldn't exceed one line.

10 Write a program to implement Merge sort. Write a program to implement Selection sort and Insertion sort.

11 Beyond the Syllabus Experiment-1

12 Beyond the Syllabus Experiment-2



VISION & MISSION
INSTITUTE VISION & MISSION

VISION
To create a knowledge-based society with scientific temper, team spirit and dignity of
labor to face the global competitive challenges.

MISSION
To evolve and develop skill-based systems for effective delivery of knowledge so as
to equip young professionals with dedication & commitment to excellence in all
spheres of life.

DEPARTMENT VISION & MISSION

VISION
Evolve as a center of excellence with wider recognition and adapt to the rapid innovations
in Computer Engineering.

MISSION

• To provide a learning-centered environment that will enable students and faculty
members to achieve their goals, empowering them to compete globally for the most
desirable careers in academia and industry.

• To contribute significantly to research and the discovery of new arenas of knowledge
and methods in the rapidly developing field of Computer Engineering.

• To support society through participation and transfer of advanced technology from one
sector to another.


RTU SYLLABUS AND MARKING SCHEME

EVALUATION SCHEME

Mid Term Examination (I+II): Experiment 20, Viva 10, Total 30
Attendance and Performance: Attendance 10, Performance 5, Total 15
End Term Examination: Experiment 20, Viva 10, Total 30
Total Marks: 75

DISTRIBUTION OF MARKS FOR EACH EXPERIMENT

Attendance: 2, Record: 3, Performance: 5, Total: 10


LAB OUTCOME AND ITS MAPPING WITH PO & PSO

LAB OUTCOMES

Objectives of this lab are as follows:


• To study the various Python libraries and tools used in machine learning.
• To learn the basics of Python as a programming language.
• To learn the fundamental activities to develop programs using Python data types.

Outcomes:
• Students will develop basic Python programs such as maximum, sum, prime, and even/odd checks.
• Students will apply knowledge of the different data types in Python.
• Students can use the extensive Python library ecosystem, which is open source and easily integrated.
Lab Outcomes:

LO1 To choose basic Python libraries and commands used in Machine Learning

LO2 To apply knowledge of machine learning algorithms to the problem statements provided

LO3 To analyze various Supervised and Unsupervised Machine Learning algorithms

LO4 To evaluate Machine Learning algorithms for real-world problems

LO-PO-PSO MAPPING MATRIX OF COURSE


LO/PO-PSO  PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2  PSO3
LO1         -    -    -    -    3    -    -    -    -    -     -     -     3     -     -
LO2         -    -    3    -    -    -    -    -    -    -     -     -     3     -     3
LO3         -    -    -    3    -    -    -    -    -    -     -     -     3     3     -
LO4         -    -    -    -    -    -    3    -    -    -     -     -     3     -     3

PROGRAM OUTCOMES (POs)

PO1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
Problem analysis: Identify, formulate, review research literature, and analyze complex
PO2 engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.


PO3 Design/development of solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs
with appropriate consideration for the public health and safety, and the cultural,
societal, and environmental considerations.
Conduct investigations of complex problems: Use research-based knowledge and
PO4 research methods including design of experiments, analysis and interpretation of data,
and synthesis of the information to provide valid conclusions.

Modern tool usage: Create, select, and apply appropriate techniques, resources, and
PO5 modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.

The engineer and society: Apply reasoning informed by the contextual knowledge to
PO6 assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional engineering practice.

PO7 Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for, sustainable development.

Ethics: Apply ethical principles and commit to professional ethics and responsibilities
PO8 and norms of the engineering practice.
Individual and teamwork: Function effectively as an individual, and as a member or
PO9 leader in diverse teams, and in multidisciplinary settings.
Communication: Communicate effectively on complex engineering activities with the
PO10 engineering community and with society at large, such as, being able to comprehend
and write effective reports and design documentation, make effective presentations,
and give and receive clear instructions.
PO11 Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one's own work, as a
member and leader in a team, to manage projects and in multidisciplinary
environments.
PO12 Life-long learning: Recognize the need for, and have the preparation and ability to
engage in, independent and life-long learning in the broadest context of technological
change.

PROGRAM SPECIFIC OUTCOMES (PSOs)

PSO1 Design, analyze and innovate solutions to technical issues in Thermal, Production
and Design engineering.

PSO2 Exhibit knowledge and skills in the field of Mechanical & Allied engineering.
PSO3 Apply knowledge and skills in HVAC&R and Automobile engineering.


LAB CONDUCTION PLAN

Total number of experiments: 10

Total number of turns required: 13
Number of turns required for each experiment:

Experiment Number  Turns  Scheduled Week

Exp. 1   2   Week 1,2
Exp. 2   2   Week 3,4
Exp. 3   1   Week 5
Exp. 4   1   Week 6
Exp. 5   1   Week 7
Exp. 6   1   Week 8
Exp. 7   1   Week 9
Exp. 8   1   Week 10
Exp. 9   1   Week 11
Exp. 10  2   Week 12,13

DISTRIBUTION OF LAB HOURS

S. No.  Activity  Time (out of 120 minutes)
1 Attendance 5
2 Explanation of features of language 15
3 Explaining the Experiment 15
4 Performance of experiment 70
5 Viva/Quiz/Queries 15


GENERAL LAB INSTRUCTIONS

DO’S
• Enter the lab on time and leave at the proper time.
• Wait for the previous class to leave before the next class enters.
• Keep bags outside in the respective racks.
• Utilize lab hours for the corresponding experiment only.
• Turn off the machine before leaving the lab unless a member of lab staff has specifically
told you not to do so.
• Leave the labs at least as nice as you found them.
• If you notice a problem with a piece of equipment (e.g. a computer doesn't respond) or the
room in general (e.g. cooling, heating, lighting), please report it to lab staff immediately.
Do not attempt to fix the problem yourself.

DON’TS
• Don't abuse the equipment.
• Do not adjust the heat or air conditioners. If you feel the temperature is not properly set,
inform lab staff; we will attempt to maintain a balance that is healthy for people and
machines.
• Do not attempt to reboot a computer. Report problems to lab staff.
• Do not remove or modify any software or file without permission.
• Do not remove printers and machines from the network without being explicitly told to do
so by lab staff.
• Don't monopolize equipment. If you're going to be away from your machine for more than
10 or 15 minutes, log out before leaving. This is both for the security of your account and
to ensure that others are able to use the lab resources while you are not.
• Don't use the internet or internet chat of any kind during your regular lab schedule.
• Do not download or upload MP3, JPG or MPEG files.
• No games are allowed in the lab sessions.
• No hardware including USB drives can be connected or disconnected in the labs without
prior permission of the lab in-charge.
• No food or drink is allowed in the lab or near any of the equipment. Aside from the fact that
it leaves a mess and attracts pests, spilling anything on a keyboard or other piece of computer
equipment could cause permanent, irreparable, and costly damage (and in fact has). If you
need to eat or drink, take a break and do so in the canteen.
• Don't bring any external material into the lab, except your lab record, copy and books.
• Don't bring mobile phones into the lab. If necessary, keep them in silent mode.
• Please be considerate of those around you, especially in terms of noise level. While labs are
a natural place for conversations of all types, kindly keep the volume turned down.

If you are having problems or questions, please go to either the faculty, lab in-charge or the lab
supporting staff. They will help you. We need your full support and cooperation for smooth
functioning of the lab.


LAB SPECIFIC SAFETY RULES

Before entering in the lab

• All the students are supposed to prepare the theory regarding the next experiment/program.
• Students are supposed to bring their lab records as per their lab schedule.
• The previous experiment/program should be written in the lab record.
• If applicable, trace paper/graph paper must be pasted in the lab record with proper labeling.
• All the students must follow the instructions, failing which they may not be allowed in
the lab.

While working in the lab

• Adhere to the experimental schedule as instructed by the lab in-charge/faculty.
• Get the previously performed experiment/program signed by the faculty/lab in-charge.
• Get the output of the current experiment/program checked by the faculty/lab in-charge in
the lab copy.
• Each student should work on his/her assigned computer at each turn of the lab.
• Take responsibility for valuable accessories.


Zero Lecture

Topic to be covered: Machine Learning Approaches

1. Decision tree learning


Decision tree learning uses a decision tree as a predictive model, which maps observations about an
item to conclusions about the item's target value.
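For a quick illustration, the sketch below fits a decision tree with scikit-learn on a made-up toy dataset (the data and its integer encoding are hypothetical; Experiment 3 later builds an ID3 tree from scratch):

# Minimal decision-tree sketch (toy data, for illustration only)
from sklearn.tree import DecisionTreeClassifier

# each row: [outlook, windy] encoded as small integers; target: play (1) / don't play (0)
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = [1, 0, 1, 1, 0, 0]

clf = DecisionTreeClassifier(criterion='entropy')  # entropy criterion gives ID3-style splits
clf.fit(X, y)
print(clf.predict([[1, 0]]))  # map a new observation to a target value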

2. Association rule learning


Association rule learning is a method for discovering interesting relations between variables in large
databases.
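As a minimal sketch, the support and confidence of one candidate rule can be computed directly; the market-basket transactions below are made up for illustration:

# Support and confidence of the rule {milk} -> {bread} on toy transactions
transactions = [
    {'milk', 'bread'}, {'milk', 'eggs'}, {'bread', 'butter'},
    {'milk', 'bread', 'butter'}, {'bread', 'eggs'},
]
antecedent, consequent = {'milk'}, {'bread'}

n = len(transactions)
both = sum(1 for t in transactions if (antecedent | consequent) <= t)
ante = sum(1 for t in transactions if antecedent <= t)

support = both / n         # fraction of transactions containing milk and bread: 0.4
confidence = both / ante   # estimate of P(bread | milk): 2/3
print(support, confidence)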

3. Artificial neural networks


An artificial neural network (ANN), usually called a "neural network" (NN), is a learning
algorithm that is vaguely inspired by biological neural networks. Computations are structured
in terms of an interconnected group of artificial neurons, processing information using a connectionist
approach to computation. Modern neural networks are non-linear statistical data modeling tools. They
are usually used to model complex relationships between inputs and outputs, to find patterns in data, or
to capture the statistical structure in an unknown joint probability distribution between observed
variables.

4. Deep learning
Falling hardware prices and the development of GPUs for personal use in the last few years have
contributed to the development of the concept of deep learning which consists of multiple hidden layers
in an artificial neural network. This approach tries to model the way the human brain processes light
and sound into vision and hearing. Some successful applications of deep learning are computer vision
and speech recognition.

5. Inductive logic programming


Inductive logic programming (ILP) is an approach to rule learning using logic programming as a
uniform representation for input examples, background knowledge, and hypotheses. Given an encoding
of the known background knowledge and a set of examples represented as a logical database of facts,
an ILP system will derive a hypothesized logic program that entails all positive and no negative
examples. Inductive programming is a related field that considers any kind of programming languages
for representing hypotheses (and not only logic programming), such as functional programs.

6. Support vector machines


Support vector machines (SVMs) are a set of related supervised learning methods used for classification
and regression. Given a set of training examples, each marked as belonging to one of two categories,
an SVM training algorithm builds a model that predicts whether a new example falls into one category
or the other.
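A minimal sketch with scikit-learn's SVC on a made-up two-category dataset (the data and kernel choice are illustrative only):

from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]  # two well-separated groups
y = [0, 0, 0, 1, 1, 1]                                # category of each training example

clf = SVC(kernel='linear')  # linear kernel: find a separating hyperplane
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))  # predicts [0 1]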

7. Clustering
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that
observations within the same cluster are similar according to some pre-designated criterion or criteria,
while observations drawn from different clusters are dissimilar. Different clustering techniques make
different assumptions on the structure of the data, often defined by some similarity metric and evaluated
for example by internal compactness (similarity between members of the same cluster) and separation
between different clusters. Other methods are based on estimated density and graph connectivity.
Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.

8. Bayesian networks
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical
model that represents a set of random variables and their conditional independencies via a directed
acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships
between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities
of the presence of various diseases. Efficient algorithms exist that perform inference and learning.

9. Reinforcement learning
Reinforcement learning is concerned with how an agent ought to take actions in an environment so as
to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a
policy that maps states of the world to the actions the agent ought to take in those states. Reinforcement
learning differs from the supervised learning problem in that correct input/output pairs are never
presented, nor sub-optimal actions explicitly corrected.
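A minimal tabular Q-learning sketch on a made-up five-state corridor (the environment, reward and hyperparameters are illustrative only):

import random

n_states = 5                               # states 0..4; state 4 is the goal
Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[s][a]; action 0 = left, 1 = right
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration rate

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy action selection
        a = random.choice([0, 1]) if random.random() < epsilon else Q[s].index(max(Q[s]))
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else 0.0    # reward only at the goal
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([q.index(max(q)) for q in Q])  # learned policy maps each state to "move right"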

10. Similarity and metric learning


In this problem, the learning machine is given pairs of examples that are considered similar and pairs
of less similar objects. It then needs to learn a similarity function (or a distance metric function) that
can predict if new objects are similar. It is sometimes used in recommendation systems.

11. Genetic algorithms


A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection, and uses
methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions
to a given problem. In machine learning, genetic algorithms found some uses in the 1980s and 1990s.
Conversely, machine learning techniques have been used to improve the performance of genetic and
evolutionary algorithms.
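A minimal GA sketch maximizing the number of 1-bits in a bit string (a standard toy problem; population size, string length and rates are arbitrary):

import random

def fitness(bits):
    return sum(bits)  # count of 1-bits; the maximum is 20

pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for generation in range(50):
    pop.sort(key=fitness, reverse=True)  # selection: keep the fittest half
    survivors = pop[:15]
    children = []
    while len(children) < 15:
        p1, p2 = random.sample(survivors, 2)
        cut = random.randrange(1, 20)    # single-point crossover
        child = p1[:cut] + p2[cut:]
        i = random.randrange(20)         # point mutation: flip one bit
        child[i] = 1 - child[i]
        children.append(child)
    pop = survivors + children

print(max(fitness(ind) for ind in pop))  # best fitness approaches 20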

12. Rule-based machine learning


Rule-based machine learning is a general term for any machine learning method that identifies, learns,
or evolves "rules" to store, manipulate or apply knowledge. The defining characteristic of a rule-based
machine learner is the identification and utilization of a set of relational rules that collectively represent
the knowledge captured by the system. This is in contrast to other machine learners that commonly
identify a singular model that can be universally applied to any instance in order to make a prediction.
Rule-based machine learning approaches include learning classifier systems, association rule learning,
and artificial immune systems.

13. Feature selection approach


Feature selection is the process of selecting an optimal subset of relevant features for use in model
construction. It is assumed the data contains some features that are either redundant or irrelevant, and
can thus be removed to reduce calculation cost without incurring much loss of information. Common
optimality criteria include accuracy, similarity and information measures.
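A minimal filter-style sketch with scikit-learn on the built-in iris data (the choice of score function and k is illustrative):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=mutual_info_classif, k=2)  # keep the 2 most informative features
X_new = selector.fit_transform(X, y)
print(selector.get_support())  # boolean mask over the original 4 features
print(X_new.shape)             # (150, 2)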


Experiment–1

AIM: Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a
.CSV file.
Program:

import csv

def read_data(filename):
    # load the training examples from a CSV file
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        traindata = []
        for row in datareader:
            traindata.append(row)
    return traindata

# helper predicates over a 'phi'/'any' hypothesis representation
def isConsistent(h, d):
    if len(h) != len(d) - 1:
        print('Number of attributes are not same in hypothesis.')
        return False
    matched = 0
    for i in range(len(h)):
        if h[i] == d[i] or h[i] == 'any':
            matched += 1
    return matched == len(h)

def makeConsistent(h, d):
    for i in range(len(h)):
        if h[i] == 'phi':
            h[i] = d[i]
        elif h[i] != d[i]:
            h[i] = 'any'
    return h

h = ['phi', 'phi', 'phi', 'phi', 'phi', 'phi']
print('Begin : Hypothesis :', h)

filename = "finds.csv"
dataset = read_data(filename)
print(dataset)

num_attributes = len(dataset[0]) - 1  # last column is the target label
hypothesis = ['0'] * num_attributes
print("Initial Hypothesis")
print(hypothesis)
print("The Hypothesis are")
for i in range(len(dataset)):
    target = dataset[i][-1]
    if target == 'Yes':  # FIND-S generalizes only on positive examples
        for j in range(num_attributes):
            if hypothesis[j] == '0':
                hypothesis[j] = dataset[i][j]
            if hypothesis[j] != dataset[i][j]:
                hypothesis[j] = '?'
        print(i + 1, '=', hypothesis)
print("Final Hypothesis")
print(hypothesis)
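For reference, the program assumes a finds.csv laid out as below (reconstructed from the dataset printed in the output, header row included):

sky,Airtemp,Humidity,Wind,Water,Forecast,WaterSport
Cloudy,Cold,High,Strong,Warm,Change,Yes
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Cloudy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes
Rain,Mild,High,Weak,Cool,Change,No
Rain,Cool,Normal,Weak,Cool,Same,No
Overcast,Cool,Normal,Strong,Warm,Same,Yes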

Output:

Attributes = ['Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast']


[['sky', 'Airtemp', 'Humidity', 'Wind', 'Water', 'Forecast', 'WaterSport'],
['Cloudy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'Yes'],
['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes'],
['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes'],
['Cloudy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No'],
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes'],
['Rain', 'Mild', 'High', 'Weak', 'Cool', 'Change', 'No'],
['Rain', 'Cool', 'Normal', 'Weak', 'Cool', 'Same', 'No'],
['Overcast', 'Cool', 'Normal', 'Strong', 'Warm', 'Same', 'Yes']]
Initial Hypothesis
['0', '0', '0', '0', '0', '0']
The Hypothesis are
2 = ['Cloudy', 'Cold', 'High', 'Strong', 'Warm', 'Change']
3 = ['?', '?', '?', 'Strong', 'Warm', '?']
4 = ['?', '?', '?', 'Strong', 'Warm', '?']
6 = ['?', '?', '?', 'Strong', '?', '?']
9 = ['?', '?', '?', 'Strong', '?', '?']
Final Hypothesis
['?', '?', '?', 'Strong', '?', '?']


Experiment–2

AIM: For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the
training examples.

Program:

import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('finds1.csv'))
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        # a positive example generalizes specific_h and prunes general_h
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        # a negative example specializes general_h
        if target[i] == "No":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
    print("steps of Candidate Elimination Algorithm", i + 1)
    print("Specific_h ", i + 1, "\n")
    print(specific_h)
    print("general_h ", i + 1, "\n")
    print(general_h)
    # drop rows of general_h that remained fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

ML Lab (6CS4-23) Manual Department of Advance Computing


Poornima College of Engineering, Jaipur Department of Advance Computing

Output:

initialization of specific_h and general_h


['Cloudy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
steps of Candidate Elimination Algorithm 8
Specific_h 8
['?' '?' '?' 'Strong' '?' '?']
general_h 8
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['?' '?' '?' 'Strong' '?' '?']
Final General_h:
[['?', '?', '?', 'Strong', '?', '?']]


Experiment–3

AIM: Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to
classify a new sample.

Program:
import pandas as pd
import numpy as np

dataset = pd.read_csv('playtennis.csv', names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

def entropy(target_col):
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i]/np.sum(counts))*np.log2(counts[i]/np.sum(counts))
                      for i in range(len(elements))])
    return entropy

def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    Weighted_Entropy = np.sum([(counts[i]/np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    # all examples share one class: return that class
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # no examples left: return the majority class of the original dataset
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    # no features left: return the parent node's majority class
    elif len(features) == 0:
        return parent_node_class
    else:
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # information gain of each remaining feature
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        tree = {best_feature: {}}
        features = [i for i in features if i != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

tree = ID3(dataset, dataset, dataset.columns[:-1])
print(' \nDisplay Tree\n', tree)

Output:
Display Tree
{'outlook': {'Overcast': 'Yes', 'Rain': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny':
{'humidity': {'High': 'No', 'Normal': 'Yes'}}}}


Experiment–4

AIM: Build an Artificial Neural Network by implementing the Back propagation Algorithm
and test the same using appropriate data sets.

Program:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)  # maximum of X array longitudinally
y = y/100                 # scale targets into the 0-1 range

# Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

# Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000  # Setting training iterations
lr = 0.1      # Setting learning rate
inputlayer_neurons = 2   # number of features in data set
hiddenlayer_neurons = 3  # number of hidden layer neurons
output_neurons = 1       # number of neurons at output layer

# weight and bias initialization
# draws a random range of numbers uniformly of dim x*y
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)  # how much hidden layer wts contributed to error
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr  # dot product of next-layer error and current-layer output
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Output:

Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1. 0.66666667]]
Actual Output:
[[ 0.92]
[ 0.86]
[ 0.89]]
Predicted Output:
[[ 0.89559591]
[ 0.88142069]
[ 0.8928407 ]]


Experiment–5

AIM: Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
Program:
import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers)/float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x-avg, 2) for x in numbers])/float(len(numbers)-1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]  # drop the summary of the class column
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x-mean, 2)/(2*math.pow(stdev, 2))))
    return (1 / (math.sqrt(2*math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct/float(len(testSet))) * 100.0

def main():
    filename = 'data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: {0}%'.format(accuracy))

main()
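Here calculateProbability evaluates the Gaussian probability density that the classifier assumes for each numeric attribute, written in LaTeX notation:

P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

where \mu and \sigma are the per-class mean and standard deviation computed by summarizeByClass.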
OUTPUT :
Split 306 rows into train=205 and test=101 rows
Accuracy: 72.27722772277228%


Experiment-6

AIM: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program. Calculate
the accuracy, precision, and recall for your data set.

Program:
import pandas as pd
msg=pd.read_csv('naivetext1.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum
print(X)
print(y)
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is',metrics.accuracy_score(ytest,predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest,predicted))
print(metrics.precision_score(ytest,predicted))

ML Lab (6CS4-23) Manual Department of Advance Computing


Poornima College of Engineering, Jaipur Department of Advance Computing

Output:

The dimensions of the dataset (18, 2)

0 I love this sandwich


1 This is an amazing place
2 I feel very good about these beers
3 This is my best work
4 What an awesome view
5 I do not like this restaurant
6 I am tired of this stuff
7 I can't deal with this
8 He is my sworn enemy
9 My boss is horrible
10 This is an awesome place
11 I do not like the taste of this juice
12 I love to dance
13 I am sick and tired of this place
14 What a great holiday
15 That is a bad locality to stay
16 We will have good fun tomorrow
17 I went to my enemy's house today

Name: message, dtype: object


0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10 1
11 0
12 1
13 0
14 1
15 0
16 1

17 0

Name: labelnum, dtype: int64


(5,)
(13,)
(5,)
(13,)

Accuracy metrics

Accuracy of the classifier is 0.8

Confusion matrix
[[3 1]
[0 1]]

Recall and Precision


1.0
0.5


Experiment-7

AIM: Write a program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data
Set. You can use Java/Python ML library classes/API.
Program:

import numpy as np
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
         'exang', 'oldpeak', 'slope', 'ca', 'thal', 'heartdisease']
heartDisease = pd.read_csv('heart.csv', names=names)
heartDisease = heartDisease.replace('?', np.nan)  # mark missing values

# network structure: risk factors -> heartdisease -> observed measurements
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'), ('sex', 'trestbps'),
                       ('exang', 'trestbps'), ('trestbps', 'heartdisease'),
                       ('fbs', 'heartdisease'), ('heartdisease', 'restecg'),
                       ('heartdisease', 'thalach'), ('heartdisease', 'chol')])

# learn the conditional probability tables from the data
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

HeartDisease_infer = VariableElimination(model)
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 37, 'sex': 0})
print(q['heartdisease'])  # older pgmpy versions return a dict of factors from query()

Output:
╒════════════════╤═════════════════════╕
│ heartdisease   │   phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │              0.5593 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │              0.4407 │
╘════════════════╧═════════════════════╛


Experiment-8

AIM: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment
on the quality of clustering. You can add Java/Python ML library classes/API in the program.

Program:
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
import pandas as pd
X=pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()
#code for EM
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n",gmm.means_)
print('\n')
print("Covariances\n",gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:,0], X[:,1],c=em_predictions,s=50)
plt.show()
#code for Kmeans
import matplotlib.pyplot as plt1
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
plt.title('KMEANS')
plt1.scatter(X[:,0], X[:,1], c=kmeans.labels_, cmap='rainbow')
plt1.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], color='black')
plt1.show()

ML Lab (6CS4-23) Manual Department of Advance Computing


Poornima College of Engineering, Jaipur Department of Advance Computing

Output:

EM predictions
[0 0 0 1 0 1 1 1 2 1 2 2 1 1 2 1 2 1 0 1 0 1 1]
mean:
[[57.70629058 25.73574491]
[52.12044022 22.46250453]
[46.4364858 39.43288647]]
Covariances
[[[83.51878796 14.926902 ]
[14.926902 2.70846907]]
[[29.95910352 15.83416554]
[15.83416554 67.01175729]]
[[79.34811849 29.55835938]
[29.55835938 18.17157304]]]
[[71.24 28. ]
[52.53 25. ]
[64.54 27. ]
[55.69 22. ]
[54.58 25. ]
[41.91 10. ]
[58.64 20. ]
[52.02 8. ]
[31.25 34. ]
[44.31 19. ]
[49.35 40. ]
[58.07 45. ]
[44.22 22. ]
[55.73 19. ]
[46.63 43. ]
[52.97 32. ]
[46.25 35. ]
[51.55 27. ]
[57.05 26. ]
[58.45 30. ]
[43.42 23. ]
[55.68 37. ]
[55.15 18. ]]

centroid and predictions

[[57.74090909 24.27272727]
[48.6 38. ]
[45.176 16.4 ]]
[0 0 0 0 0 2 0 2 1 2 1 1 2 0 1 1 1 0 0 0 2 1 0]


Experiment-9

AIM: Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.

Program:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

import pandas as pd
dataset = pd.read_csv("iris.csv")
# split into features and label (assumes the species label is the last column)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)

classifier = KNeighborsClassifier(n_neighbors=8, p=3, metric='euclidean')
classifier.fit(X_train, y_train)

# predict the test results
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print('Confusion matrix is as follows\n', cm)
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
print("correct prediction", accuracy_score(y_test, y_pred))
print("wrong prediction", (1 - accuracy_score(y_test, y_pred)))

Output:

Confusion matrix is as follows


[[13 0 0]
[ 0 15 1]
[ 0 0 9]]
Accuracy Metrics
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 13
Iris-versicolor 1.00 0.94 0.97 16
Iris-virginica 0.90 1.00 0.95 9
avg / total 0.98 0.97 0.97 38
correct prediction 0.9736842105263158
wrong prediction 0.02631578947368418


Experiment-10

AIM: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.

Program:
import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

def local_regression(x0, X, Y, tau):
    # add bias term
    x0 = np.r_[1, x0]  # Add one to avoid the loss in information
    X = np.c_[np.ones(len(X)), X]
    # fit model: normal equations with kernel
    xw = X.T * radial_kernel(x0, X, tau)    # XTranspose * W
    beta = np.linalg.pinv(xw @ X) @ xw @ Y  # @ Matrix Multiplication or Dot Product
    # predict value
    return x0 @ beta  # @ Matrix Multiplication or Dot Product for prediction

def radial_kernel(x0, X, tau):
    # Weight or Radial Kernel Bias Function
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set ( 10 Samples) X :\n", X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n", Y[1:10])
X += np.random.normal(scale=.1, size=n)  # jitter X with Gaussian noise
print("Normalised (10 Samples) X :\n", X[1:10])
domain = np.linspace(-3, 3, num=300)
print(" Xo Domain Space(10 Samples) :\n", domain[1:10])

def plot_lwr(tau):
    # prediction through regression
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(plot_width=400, plot_height=400)
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

# Plotting the curves with different tau
show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]
]))
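For reference, local_regression solves the locally weighted least-squares problem at each query point x_0, in LaTeX notation:

w_i(x_0) = \exp\!\left(-\frac{\lVert x_i - x_0 \rVert^2}{2\tau^2}\right), \qquad \hat{\beta}(x_0) = (X^{T} W X)^{-1} X^{T} W Y, \qquad \hat{y}(x_0) = x_0^{T} \hat{\beta}(x_0)

where W is the diagonal matrix of kernel weights w_i(x_0); smaller \tau makes the fit more local, as the four plots with different tau values show.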


Output:

The Data Set ( 10 Samples) X :


[-2.99399399 -2.98798799 -2.98198198 -2.97597598 -2.96996997 -2.96396396
-2.95795796 -2.95195195 -2.94594595]
The Fitting Curve Data Set (10 Samples) Y :
[2.13582188 2.13156806 2.12730467 2.12303166 2.11874898 2.11445659
2.11015444 2.10584249 2.10152068]
Normalised (10 Samples) X :
[-3.10518137 -3.00247603 -2.9388515 -2.79373602 -2.84946247 -2.85313888
-2.9622708 -3.09679502 -2.69778859]
Xo Domain Space(10 Samples) :
[-2.97993311 -2.95986622 -2.93979933 -2.91973244 -2.89966555 -2.87959866
-2.85953177 -2.83946488 -2.81939799]

ML Lab (6CS4-23) Manual Department of Advance Computing


Poornima College of Engineering, Jaipur Department of Advance Computing

Beyond the Syllabus Experiment-1

AIM: WAP to implement the Random Forest algorithm

Program:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

data = pd.read_csv('Salaries.csv')
print(data)

# extract features and target (assumes Salaries.csv holds a numeric
# position-level column at index 1 followed by the salary column)
x = data.iloc[:, 1:2].values
y = data.iloc[:, 2].values

# create regressor object
regressor = RandomForestRegressor(n_estimators=100, random_state=0)

# fit the regressor with x and y data
regressor.fit(x, y)

Y_pred = regressor.predict(np.array([6.5]).reshape(1, 1))

X_grid = np.arange(min(x), max(x), 0.01)

# reshape the data into a len(X_grid)*1 array,
# i.e. make a column out of the X_grid values
X_grid = X_grid.reshape((len(X_grid), 1))

# scatter plot for original data
plt.scatter(x, y, color='blue')

# plot predicted data
plt.plot(X_grid, regressor.predict(X_grid), color='green')
plt.title('Random Forest Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

from sklearn.ensemble import RandomForestClassifier

# X_train and y_train are assumed to come from a prior train/test split
RandomForest = RandomForestClassifier(oob_score=True)
RandomForest.fit(X_train, y_train)
print(RandomForest.oob_score_)


Beyond the Syllabus Experiment-2

AIM: WAP to implement XGBoost for regression.

Program:

import numpy as np
import pandas as pd
import xgboost as xg
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE

# Load the data
dataset = pd.read_csv("boston_house.csv")
X, y = dataset.iloc[:, :-1], dataset.iloc[:, -1]

# Splitting
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=123)

# Instantiation
xgb_r = xg.XGBRegressor(objective='reg:linear',  # 'reg:squarederror' in newer xgboost releases
                        n_estimators=10, seed=123)

# Fitting the model
xgb_r.fit(train_X, train_y)

# Predict the model
pred = xgb_r.predict(test_X)

# RMSE Computation
rmse = np.sqrt(MSE(test_y, pred))
print("RMSE : % f" % (rmse))

Output:

129043.2314

