ML Lab Manual


LABORATORY MANUAL

B.TECH CSE
(4TH YEAR - 8TH SEM)
(2023-24)

Machine Learning with Python Lab
(LC-CSE-412G)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BIMLA DEVI EDUCATION SOCIETY’S GROUP OF INSTITUTIONS


J B KNOWLEDGE PARK, MANJHAWALI, FARIDABAD
Approved by the AICTE, Ministry of HRD, Government of India & affiliated to M.D. University,
Rohtak, a State Govt University Accredited with an 'A+' grade by NAAC.

Checklist for Lab Manual

Sr. No.   Particulars

1   Mission and Vision

2   Course Outcomes

3   Guidelines for the student

4   List of Programs as per University

5   Sample copy of File

Department of Computer Science & Engineering

Vision and Mission of the Department
Vision
To be a Model in Quality Education for producing highly talented and globally
recognizable students with sound ethics, latest knowledge, and innovative ideas in
Computer Science & Engineering.
MISSION
To be a Model in Quality Education by
M1: Imparting a sound theoretical basis and wide-ranging practical experience to the students for fulfilling the upcoming needs of society in the various fields of Computer Science & Engineering.
M2: Offering the students an overall background suitable for making a successful career in Industry/Research/Higher Education in India and abroad.
M3: Providing opportunities to the students for learning beyond the curriculum and improving communication skills.
M4: Engaging students in learning, understanding and applying novel ideas.
Course: Machine Learning with Python Lab
Course Code: LC-CSE-412G

CO (Course Outcomes)                                                               RBT* - Revised Bloom's Taxonomy
CO1  To describe the implementation procedures for the Machine Learning algorithms.   L2 (Understand)
CO2  To apply appropriate data sets to the Machine Learning algorithms.               L3 (Apply)
CO3  To use Machine Learning algorithms to solve real-world problems.                 L3 (Apply)
CO4  To outline predictions using machine learning algorithms.                        L4 (Analyze)
CO5  To design Java/Python programs for various Machine Learning algorithms.          L6 (Create)

CO-PO/PSO Articulation Matrices
Course (POs) PSOs
Outcomes
(COs) PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1 3 2 2 1 3 3
CO2 3 3 2 1 3 2
CO3 3 2 3 2 1 3 2
CO4 3 3 2 1 3 2
CO5 3 2 3 1 3 2


Guidelines for Students

1. Students should be regular and come prepared for the lab practice.
2. In case a student misses a class, it is his/her responsibility to complete the missed experiment(s).
3. Students should bring the observation book, lab journal and lab manual. The prescribed textbook and class notes can be kept ready for reference if required.
4. They should implement the given program individually.
5. While conducting the experiments, students should see that their programs meet the following criteria:
   - Programs should be interactive with appropriate prompt messages, error messages if any, and descriptive messages for outputs.
   - Programs should perform input validation (data type, range error, etc.), give appropriate error messages and suggest corrective actions.
   - Comments should be used to give the statement of the problem, and every function should indicate its purpose, inputs and outputs.
   - Statements within the program should be properly indented.
   - Use meaningful names for variables and functions.
   - Make use of constants and type definitions wherever needed.
6. Once the experiment(s) are executed, students should show the program and results to the instructors and copy the same into their observation book.
7. Questions for lab tests and the exam need not necessarily be limited to the questions in the manual, but could involve some variations and/or combinations of the questions.

Machine learning
Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome.

Machine learning tasks
Machine learning tasks are typically classified into two broad categories, depending on whether there is a learning "signal" or "feedback" available to a learning system:

Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. As special cases, the input signal can be only partially available, or restricted to special feedback:

Semi-supervised learning: the computer is given only an incomplete training signal: a training set with some (often many) of the target outputs missing.

Active learning: the computer can only obtain training labels for a limited set of instances (based on a budget), and also has to optimize its choice of objects to acquire labels for. When used interactively, these can be presented to the user for labeling.

Reinforcement learning: training data (in the form of rewards and punishments) is given only as feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing a game against an opponent.

Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).

Supervised learning: Find-S algorithm, Candidate Elimination algorithm, Decision tree algorithm, Backpropagation algorithm, Naïve Bayes algorithm

Unsupervised learning: EM algorithm, K-means algorithm

Instance-based learning: Locally Weighted Regression algorithm, K-nearest neighbour algorithm (lazy learning algorithm)

Machine learning applications
In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more (multi-label classification) of these classes. This is typically tackled in a supervised manner. Spam filtering is an example of classification, where the inputs are email (or other) messages and the classes are "spam" and "not spam". In regression, also a supervised problem, the outputs are continuous rather than discrete.

In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task. Density estimation finds the distribution of inputs in some space. Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional space. Topic modeling is a related problem, where a program is given a list of human language documents and is tasked with finding out which documents cover similar topics.
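
As a rough illustration of dimensionality reduction, the following sketch (illustrative only, assuming scikit-learn and its bundled iris data are available) maps the 4-dimensional iris measurements into a 2-dimensional space with PCA:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
print("Original shape:", iris.data.shape)          # (150, 4)

# Map the 4-dimensional inputs into a 2-dimensional space
pca = PCA(n_components=2)
reduced = pca.fit_transform(iris.data)
print("Reduced shape:", reduced.shape)              # (150, 2)
print("Explained variance ratio:", pca.explained_variance_ratio_)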

Machine learning approaches

Decision tree learning

Decision tree learning uses a decision tree as a predictive model, which maps observations about an item to conclusions about the item's target value.
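
A minimal decision tree sketch using scikit-learn is shown below for orientation (illustrative only; Problem 3 in this manual asks for ID3 without built-in classes). It assumes scikit-learn and its bundled iris data set:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)

# Fit a tree that maps observations to a target class
clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
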
Association rule learning

Association rule learning is a method for discovering interesting relations between variables in large databases.

Artificial neural networks

An artificial neural network (ANN) learning algorithm, usually called a "neural network" (NN), is a learning algorithm that is vaguely inspired by biological neural networks. Computations are structured in terms of an interconnected group of artificial neurons, processing information using a connectionist approach to computation. Modern neural networks are non-linear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.
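
For orientation, a small feed-forward network can be trained with scikit-learn's MLPClassifier as sketched below (illustrative only; Problem 4 asks for backpropagation to be implemented by hand). It assumes scikit-learn and the bundled iris data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=1)

# One hidden layer of interconnected artificial neurons
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=1)
mlp.fit(X_train, y_train)
print("Test accuracy:", mlp.score(X_test, y_test))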

Deep learning

Falling hardware prices and the development of GPUs for personal use in the last few years have contributed to the development of the concept of deep learning, which consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.

Inductive logic programming

Inductive logic programming (ILP) is an approach to rule learning using logic programming as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples. Inductive programming is a related field that considers any kind of programming language for representing hypotheses (and not only logic programming), such as functional programs.

Support vector machines

Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
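
A short SVM sketch with scikit-learn (illustrative only, assuming scikit-learn and its bundled breast cancer data set) shows the two-category setup described above:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two-category data: each training example is marked as malignant or benign
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

model = SVC(kernel='linear', C=1.0)
model.fit(X_train, y_train)
print("Predicted category of first test example:", model.predict(X_test[:1]))
print("Test accuracy:", model.score(X_test, y_test))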

Clustering

Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often defined by some similarity metric and evaluated, for example, by internal compactness (similarity between members of the same cluster) and separation between different clusters. Other methods are based on estimated density and graph connectivity. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
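
A brief clustering sketch (illustrative only, assuming scikit-learn; the observations are synthetic blobs) shows grouping without known labels and one internal quality measure:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic observations with no class labels
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=0)

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)
labels = kmeans.fit_predict(X)

# Internal compactness/separation summarized by the silhouette coefficient
print("Silhouette score:", silhouette_score(X, labels))
print("Cluster centres:\n", kmeans.cluster_centers_)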

Bayesian networks

A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.
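
The disease and symptom example can be made concrete with plain Bayes' rule; the probabilities below are invented purely for illustration:

# Hypothetical numbers: P(disease), P(symptom | disease), P(symptom | no disease)
p_disease = 0.01
p_symptom_given_disease = 0.9
p_symptom_given_no_disease = 0.05

# P(symptom) by total probability, then P(disease | symptom) by Bayes' rule
p_symptom = (p_symptom_given_disease * p_disease +
             p_symptom_given_no_disease * (1 - p_disease))
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom

print("P(disease | symptom) =", round(p_disease_given_symptom, 4))   # about 0.1538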

Reinforcement learning

Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
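
A toy sketch of reinforcement learning is given below (illustrative only; the environment, rewards and parameters are made up). A tabular Q-learning agent learns a policy that maps the states of a small corridor to actions:

import random

# Toy 1-D corridor: states 0..4, actions 0=left / 1=right, reward only at state 4
n_states, n_actions, goal = 5, 2, 4
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(500):
    state = 0
    while state != goal:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = 0 if Q[state][0] >= Q[state][1] else 1
        next_state = max(0, state - 1) if action == 0 else min(goal, state + 1)
        reward = 1.0 if next_state == goal else 0.0
        # Q-learning update: move Q(s, a) towards reward + gamma * max_a' Q(s', a')
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

# The learned policy should prefer moving right in every non-goal state
print([("right" if q[1] > q[0] else "left") for q in Q[:goal]])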

Similarity and metric learning

In this problem, the learning machine is given pairs of examples that are considered similar and pairs of less similar objects. It then needs to learn a similarity function (or a distance metric function) that can predict if new objects are similar. It is sometimes used in recommendation systems.

Genetic algorithms

A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection, and uses methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms found some uses in the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.
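
A toy genetic algorithm sketch (illustrative only; the fitness function and parameters are made up) shows selection, single-point crossover and mutation on bit-string genotypes:

import random

# Toy problem: evolve a 20-bit genotype towards all ones (fitness = number of ones)
LENGTH, POP_SIZE, GENERATIONS, MUTATION_RATE = 20, 30, 40, 0.02

def fitness(genotype):
    return sum(genotype)

def crossover(parent1, parent2):
    point = random.randrange(1, LENGTH)        # single-point crossover
    return parent1[:point] + parent2[point:]

def mutate(genotype):
    return [1 - g if random.random() < MUTATION_RATE else g for g in genotype]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP_SIZE)]
for generation in range(GENERATIONS):
    # Selection: keep the fitter half, then refill the population with offspring
    population.sort(key=fitness, reverse=True)
    survivors = population[:POP_SIZE // 2]
    offspring = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                 for _ in range(POP_SIZE - len(survivors))]
    population = survivors + offspring

print("Best fitness:", fitness(max(population, key=fitness)))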

Rule-based machine learning

Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves "rules" to store, manipulate or apply knowledge. The defining characteristic of a rule-based machine learner is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. This is in contrast to other machine learners that commonly identify a singular model that can be universally applied to any instance in order to make a prediction. Rule-based machine learning approaches include learning classifier systems, association rule learning, and artificial immune systems.

Feature selection approach

Feature selection is the process of selecting an optimal subset of relevant features for use in model construction. It is assumed the data contains some features that are either redundant or irrelevant, and can thus be removed to reduce calculation cost without incurring much loss of information. Common optimality criteria include accuracy, similarity and information measures.
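
A short feature selection sketch (illustrative only, assuming scikit-learn and its bundled iris data) keeps the two features that carry the most information about the class label:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

iris = load_iris()

# Keep the 2 features with the highest mutual information with the target
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(iris.data, iris.target)

print("Original number of features:", iris.data.shape[1])
print("Selected features:", [iris.feature_names[i] for i in selector.get_support(indices=True)])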

Description (if any):

1. The programs can be implemented in either Java or Python.
2. For Problems 1 to 6 and 10, programs are to be developed without using the built-in classes or APIs of Java/Python.
3. Data sets can be taken from standard repositories (https://archive.ics.uci.edu/ml/datasets.html) or constructed by the students.

Lab Experiments:

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

5. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.

6. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

7. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/API.

8. Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

9. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.

10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
import csv

with open('tennis.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

# Start with the most specific hypothesis
h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":          # consider only positive examples
        j = 0
        for x in i:
            if x != "True":
                if x != h[0][j] and h[0][j] == '0':
                    h[0][j] = x          # fill an unconstrained attribute
                elif x != h[0][j] and h[0][j] != '0':
                    h[0][j] = '?'        # generalize a conflicting attribute
                else:
                    pass
                j = j + 1

print("Most specific hypothesis is")
print(h)

Output

'Sunny','Warm','Normal','Strong','Warm','Same',True
'Sunny','Warm','High','Strong','Warm','Same',True
'Rainy','Cold','High','Strong','Warm','Change',False
'Sunny','Warm','High','Strong','Cool','Change',True

Maximally Specific set
[['Sunny','Warm','?','Strong','?', '?']]

2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

class Holder:
    factors = {}        # dictionary mapping each attribute to its possible values
    attributes = ()     # tuple of attribute names

    def __init__(self, attr):
        '''Constructor of class Holder; self refers to the instance of the class.'''
        self.attributes = attr
        for i in attr:
            self.factors[i] = []

    def add_values(self, factor, values):
        self.factors[factor] = values


class CandidateElimination:
    Positive = {}   # positive examples
    Negative = {}   # negative examples

    def __init__(self, data, fact):
        self.num_factors = len(data[0][0])
        self.factors = fact.factors
        self.attr = fact.attributes
        self.dataset = data

    def run_algorithm(self):
        '''Initialize the specific and general boundaries, and loop the dataset
        against the algorithm.'''
        G = self.initializeG()
        S = self.initializeS()

        for trial_set in self.dataset:
            if self.is_positive(trial_set):     # positive example
                # remove inconsistent hypotheses from the general boundary
                G = self.remove_inconsistent_G(G, trial_set[0])
                S_new = S[:]                    # copy of the specific boundary
                print(S_new)
                for s in S:
                    if not self.consistent(s, trial_set[0]):
                        S_new.remove(s)
                        generalization = self.generalize_inconsistent_S(s, trial_set[0])
                        generalization = self.get_general(generalization, G)
                        if generalization:
                            S_new.append(generalization)
                S = S_new[:]
                S = self.remove_more_general(S)
                print(S)

            else:                               # negative example
                # remove inconsistent hypotheses from the specific boundary
                S = self.remove_inconsistent_S(S, trial_set[0])
                G_new = G[:]                    # copy of the general boundary
                print(G_new)
                for g in G:
                    if self.consistent(g, trial_set[0]):
                        G_new.remove(g)
                        specializations = self.specialize_inconsistent_G(g, trial_set[0])
                        specializations = self.get_specific(specializations, S)
                        if specializations != []:
                            G_new += specializations
                G = G_new[:]
                G = self.remove_more_specific(G)
                print(G)

        print(S)
        print(G)

    def initializeS(self):
        '''Initialize the specific boundary.'''
        S = tuple(['-' for factor in range(self.num_factors)])   # most specific hypothesis
        return [S]

    def initializeG(self):
        '''Initialize the general boundary.'''
        G = tuple(['?' for factor in range(self.num_factors)])   # most general hypothesis
        return [G]

    def is_positive(self, trial_set):
        '''Check if a given training trial_set is positive.'''
        if trial_set[1] == 'Y':
            return True
        elif trial_set[1] == 'N':
            return False
        else:
            raise TypeError("invalid target value")

    def match_factor(self, value1, value2):
        '''Check whether two attribute values match; needed while checking the
        consistency of a training trial_set with a hypothesis.'''
        if value1 == '?' or value2 == '?':
            return True
        elif value1 == value2:
            return True
        return False

    def consistent(self, hypothesis, instance):
        '''Check whether the instance is covered by the hypothesis.'''
        for i, factor in enumerate(hypothesis):
            if not self.match_factor(factor, instance[i]):
                return False
        return True

    def remove_inconsistent_G(self, hypotheses, instance):
        '''For a positive trial_set, the hypotheses in G inconsistent with it
        should be removed.'''
        G_new = hypotheses[:]
        for g in hypotheses:
            if not self.consistent(g, instance):
                G_new.remove(g)
        return G_new

    def remove_inconsistent_S(self, hypotheses, instance):
        '''For a negative trial_set, the hypotheses in S inconsistent with it
        should be removed.'''
        S_new = hypotheses[:]
        for s in hypotheses:
            if self.consistent(s, instance):
                S_new.remove(s)
        return S_new

    def remove_more_general(self, hypotheses):
        '''After generalizing S for a positive trial_set, any hypothesis in S
        more general than another in S should be removed.'''
        S_new = hypotheses[:]
        for old in hypotheses:
            for new in S_new:
                if old != new and self.more_general(new, old):
                    S_new.remove(new)
        return S_new

    def remove_more_specific(self, hypotheses):
        '''After specializing G for a negative trial_set, any hypothesis in G
        more specific than another in G should be removed.'''
        G_new = hypotheses[:]
        for old in hypotheses:
            for new in G_new:
                if old != new and self.more_specific(new, old):
                    G_new.remove(new)
        return G_new

    def generalize_inconsistent_S(self, hypothesis, instance):
        '''When an inconsistent hypothesis for a positive trial_set is seen in the
        specific boundary S, it should be generalized to be consistent with the
        trial_set; this yields one hypothesis.'''
        hypo = list(hypothesis)     # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '-':
                hypo[i] = instance[i]
            elif not self.match_factor(factor, instance[i]):
                hypo[i] = '?'
        generalization = tuple(hypo)   # convert list back to tuple for immutability
        return generalization

    def specialize_inconsistent_G(self, hypothesis, instance):
        '''When an inconsistent hypothesis for a negative trial_set is seen in the
        general boundary G, it should be specialized to be consistent with the
        trial_set; this yields a set of hypotheses.'''
        specializations = []
        hypo = list(hypothesis)     # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '?':
                values = self.factors[self.attr[i]]
                for j in values:
                    if instance[i] != j:
                        hyp = hypo[:]
                        hyp[i] = j
                        hyp = tuple(hyp)   # convert list back to tuple for immutability
                        specializations.append(hyp)
        return specializations

    def get_general(self, generalization, G):
        '''Check if there is a more general hypothesis in G for a generalization
        of an inconsistent hypothesis in S (positive trial_set) and return the
        valid generalization.'''
        for g in G:
            if self.more_general(g, generalization):
                return generalization
        return None

    def get_specific(self, specializations, S):
        '''Check if there is a more specific hypothesis in S for each hypothesis
        in the specializations of an inconsistent hypothesis in G (negative
        trial_set) and return the valid specializations.'''
        valid_specializations = []
        for hypo in specializations:
            for s in S:
                if self.more_specific(s, hypo) or s == self.initializeS()[0]:
                    valid_specializations.append(hypo)
        return valid_specializations

    def exists_general(self, hypothesis, G):
        '''Check if there exists a more general hypothesis in the general
        boundary of the version space.'''
        for g in G:
            if self.more_general(g, hypothesis):
                return True
        return False

    def exists_specific(self, hypothesis, S):
        '''Check if there exists a more specific hypothesis in the specific
        boundary of the version space.'''
        for s in S:
            if self.more_specific(s, hypothesis):
                return True
        return False

    def more_general(self, hyp1, hyp2):
        '''Check whether hyp1 is more general than hyp2.'''
        hyp = zip(hyp1, hyp2)
        for i, j in hyp:
            if i == '?':
                continue
            elif j == '?':
                if i != '?':
                    return False
            elif i != j:
                return False
            else:
                continue
        return True

    def more_specific(self, hyp1, hyp2):
        '''hyp1 is more specific than hyp2 exactly when hyp2 is more general
        than hyp1.'''
        return self.more_general(hyp2, hyp1)


dataset = [(('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), 'Y'),
           (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), 'Y'),
           (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), 'N'),
           (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), 'Y')]

attributes = ('Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast')

f = Holder(attributes)
f.add_values('Sky', ('sunny', 'rainy', 'cloudy'))    # Sky can be sunny, rainy or cloudy
f.add_values('Temp', ('cold', 'warm'))               # Temp can be cold or warm
f.add_values('Humidity', ('normal', 'high'))         # Humidity can be normal or high
f.add_values('Wind', ('weak', 'strong'))             # Wind can be weak or strong
f.add_values('Water', ('warm', 'cold'))              # Water can be warm or cold
f.add_values('Forecast', ('same', 'change'))         # Forecast can be same or change

a = CandidateElimination(dataset, f)   # pass the dataset to the algorithm class
a.run_algorithm()                      # and call the run_algorithm method

Output

[('sunny','warm','normal','strong','warm','same')]
[('sunny','warm','normal','strong', 'warm','same')]
[('sunny','warm','?','strong','warm','same')]
[('?','?','?','?','?','?')]
[('sunny','?','?','?','?','?'),('?','warm','?','?','?','?'),('?','?','?','?','?','same')]
[('sunny','warm','?','strong','warm','same')]
[('sunny','warm','?','strong','?','?')]
[('sunny','warm','?','strong','?','?')]
[('sunny','?','?','?','?','?'),('?','warm','?','?','?','?')]

3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

import numpy as np
import math
from data_loader import read_data


class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute


def subtables(data, col, delete):
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)

    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1

    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")

        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1

        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)

    return items, dict


def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0

    counts = np.zeros((items.shape[0], 1))
    sums = 0

    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)

    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums


def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)

    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))

    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)

    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)

    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]

    return total_entropy / iv


def create_node(data, metadata):
    # if all examples have the same class, return a leaf node with that answer
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node

    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)

    split = np.argmax(gains)

    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)

    items, dict = subtables(data, split, delete=True)

    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))

    return node


def empty(size):
    s = ""
    for x in range(size):
        s += "   "
    return s


def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return

    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)


metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

Data_loader.py

import csv


def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)

    return (metadata, traindata)

Tennis.csv

outlook,temperature,humidity,wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no

Output
outlook
overcast
b'yes'
rain
wind
b'strong'
b'no'
b'weak'
b'yes'
sunny
humidity
b'high'
b'no'
b'normal'
b'yes'

4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)   # maximum of X array longitudinally
y = y / 100

# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000                 # Setting training iterations
lr = 0.1                     # Setting learning rate
inputlayer_neurons = 2       # number of features in data set
hiddenlayer_neurons = 3      # number of hidden layer neurons
output_neurons = 1           # number of neurons at output layer

# weight and bias initialization
# draws a random range of numbers uniformly of dim x*y
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)   # how much hidden layer weights contributed to error
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr        # dot product of next layer error and current layer output
    # bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    # bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)

Output
Input:
[[0.66666667 1.        ]
 [0.33333333 0.55555556]
 [1.         0.66666667]]
Actual Output:
[[0.92]
 [0.86]
 [0.89]]
Predicted Output:
[[0.89559591]
 [0.88142069]
 [0.8928407 ]]

5. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.

import csv
import random
import math


def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset


def splitDataset(dataset, splitRatio):
    # 67% training size
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # generate indices for the dataset list randomly to pick elements for training data
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]


def separateByClass(dataset):
    separated = {}
    # creates a dictionary of classes 1 and 0 where the values are the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated


def mean(numbers):
    return sum(numbers) / float(len(numbers))


def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)


def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries


def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries is a dict of tuples (mean, std) for each class value
        summaries[classValue] = summarize(instances)
    return summaries


def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent


def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():   # class and attribute information as mean and sd
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]   # take mean and sd of every attribute for class 0 and 1 separately
            x = inputVector[i]                # test vector's i-th attribute
            probabilities[classValue] *= calculateProbability(x, mean, stdev)   # use normal distribution
    return probabilities


def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():   # assign the class with the highest probability
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel


def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions


def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0


def main():
    filename = '5data.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)

    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is: {0}%'.format(accuracy))


main()

Output
confusion matrix is as follows
[[17 0 0]
 [0 17 0]
 [0 0 11]]
Accuracy metrics
             precision  recall  f1-score  support

0                 1.00    1.00      1.00       17
1                 1.00    1.00      1.00       17
2                 1.00    1.00      1.00       11

avg / total       1.00    1.00      1.00       45

6. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

import pandas as pd

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

# splitting the dataset into train and test data
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)

# output of CountVectorizer is a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print(count_vect.get_feature_names())

df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())
print(df)            # tabular representation
print(xtrain_dtm)    # sparse matrix representation

# Training Naive Bayes (NB) classifier on training data
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

# printing accuracy metrics
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted))
print(metrics.precision_score(ytest, predicted))

'''docs_new = ['I like this place', 'My boss is not my saviour']
X_new_counts = count_vect.transform(docs_new)
predictednew = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predictednew):
    print('%s -> %s' % (doc, msg.labelnum[category]))'''

I love this sandwich,pos
This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg

OUTPUT

['about','am','amazing','an','and','awesome','beers','best','boss','can','deal',
'do','enemy','feel','fun','good','have','horrible','house','is','like','love','my',
'not','of','place','restaurant','sandwich','sick','stuff','these','this','tired','to',
'today','tomorrow','very','view','we','went','what','will','with','work']
  about am amazing an and awesome beers best boss can ... today \
0 1 0 0 0 0 01 0 0 0 ... 0
1 0 0 0 0 0 00 1 0 0 ... 0
2 0 0 1 1 0 0 0 0 0 0 ... 0
3 0 0 0 0 0 00 0 0 0 ... 1
4 0 0 0 0 0 00 0 0 0 ... 0
5 01 0 01 0 0 0 0 0 ... 0
6 0 0 0 0 0 00 0 0 1 ... 0
7 0 0 0 0 0 00 0 0 0 ... 0
8 0 1 0 0 0 00 0 0 0 ... 0
9 0 0 0 1 0 10 0 0 0 ... 0
10 0 0 0 0 0 0 0 0 0 0 ... 0
11 0 0 0 0 0 0 0 0 1 0 ... 0
12 0 0 0 1 0 1 0 0 0 0 ... 0

  tomorrow very view we went what will with work
0 010 00 00 00
1 0 0 0 0 00 0 0 1
2 0 0 0 0 00 0 0 0
3 0 0 0 0 10 0 0 0
4 0 0 0 0 00 0 0 0
5 0 0 0 0 00 0 0 0
6 0 0 0 0 00 0 1 0
7 1 0 0 1 00 1 0 0
8 0 0 0 0 00 0 0 0

7. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/API.

from pomegranate import *

asia = DiscreteDistribution({'True': 0.5, 'False': 0.5})

# For each parent value, the two conditional probabilities must sum to 1
tuberculosis = ConditionalProbabilityTable(
    [['True', 'True', 0.2],
     ['True', 'False', 0.8],
     ['False', 'True', 0.01],
     ['False', 'False', 0.99]], [asia])

smoking = DiscreteDistribution({'True': 0.5, 'False': 0.5})

lung = ConditionalProbabilityTable(
    [['True', 'True', 0.75],
     ['True', 'False', 0.25],
     ['False', 'True', 0.02],
     ['False', 'False', 0.98]], [smoking])

bronchitis = ConditionalProbabilityTable(
    [['True', 'True', 0.92],
     ['True', 'False', 0.08],
     ['False', 'True', 0.03],
     ['False', 'False', 0.97]], [smoking])

tuberculosis_or_cancer = ConditionalProbabilityTable(
    [['True', 'True', 'True', 1.0],
     ['True', 'True', 'False', 0.0],
     ['True', 'False', 'True', 1.0],
     ['True', 'False', 'False', 0.0],
     ['False', 'True', 'True', 1.0],
     ['False', 'True', 'False', 0.0],
     ['False', 'False', 'True', 1.0],
     ['False', 'False', 'False', 0.0]], [tuberculosis, lung])

xray = ConditionalProbabilityTable(
    [['True', 'True', 0.885],
     ['True', 'False', 0.115],
     ['False', 'True', 0.04],
     ['False', 'False', 0.96]], [tuberculosis_or_cancer])

dyspnea = ConditionalProbabilityTable(
    [['True', 'True', 'True', 0.96],
     ['True', 'True', 'False', 0.04],
     ['True', 'False', 'True', 0.89],
     ['True', 'False', 'False', 0.11],
     ['False', 'True', 'True', 0.96],
     ['False', 'True', 'False', 0.04],
     ['False', 'False', 'True', 0.89],
     ['False', 'False', 'False', 0.11]], [tuberculosis_or_cancer, bronchitis])

# One State per random variable (the original listing only created the first
# three; the remaining states and edges below follow the parent lists of the
# conditional probability tables defined above)
s0 = State(asia, name="asia")
s1 = State(tuberculosis, name="tuberculosis")
s2 = State(smoking, name="smoker")
s3 = State(lung, name="lung")
s4 = State(bronchitis, name="bronchitis")
s5 = State(tuberculosis_or_cancer, name="tuberculosis_or_cancer")
s6 = State(xray, name="xray")
s7 = State(dyspnea, name="dyspnea")

network = BayesianNetwork("asia")
network.add_states(s0, s1, s2, s3, s4, s5, s6, s7)
network.add_edge(s0, s1)   # asia -> tuberculosis
network.add_edge(s2, s3)   # smoker -> lung
network.add_edge(s2, s4)   # smoker -> bronchitis
network.add_edge(s1, s5)   # tuberculosis -> tuberculosis_or_cancer
network.add_edge(s3, s5)   # lung -> tuberculosis_or_cancer
network.add_edge(s5, s6)   # tuberculosis_or_cancer -> xray
network.add_edge(s5, s7)   # tuberculosis_or_cancer -> dyspnea
network.add_edge(s4, s7)   # bronchitis -> dyspnea
network.bake()
print(network.predict_proba({'tuberculosis': 'True'}))

8. Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=100, centers=4, cluster_std=0.60, random_state=0)
X = X[:, ::-1]   # flip axes for better plotting

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=4).fit(X)
labels = gmm.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis')

probs = gmm.predict_proba(X)
print(probs[:5].round(3))

size = 50 * probs.max(1) ** 2   # square emphasizes differences
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=size)

from matplotlib.patches import Ellipse

def draw_ellipse(position, covariance, ax=None, **kwargs):
    """Draw an ellipse with a given position and covariance."""
    ax = ax or plt.gca()
    # Convert covariance to principal axes
    if covariance.shape == (2, 2):
        U, s, Vt = np.linalg.svd(covariance)
        angle = np.degrees(np.arctan2(U[1, 0], U[0, 0]))
        width, height = 2 * np.sqrt(s)
    else:
        angle = 0
        width, height = 2 * np.sqrt(covariance)

    # Draw the ellipse at 1, 2 and 3 standard deviations
    for nsig in range(1, 4):
        ax.add_patch(Ellipse(position, nsig * width, nsig * height, angle, **kwargs))

def plot_gmm(gmm, X, label=True, ax=None):
    ax = ax or plt.gca()
    labels = gmm.fit(X).predict(X)
    if label:
        ax.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis', zorder=2)
    else:
        ax.scatter(X[:, 0], X[:, 1], s=40, zorder=2)
    ax.axis('equal')

    w_factor = 0.2 / gmm.weights_.max()
    for pos, covar, w in zip(gmm.means_, gmm.covariances_, gmm.weights_):
        draw_ellipse(pos, covar, alpha=w * w_factor)

gmm = GaussianMixture(n_components=4, random_state=42)
plot_gmm(gmm, X)

gmm = GaussianMixture(n_components=4, covariance_type='full', random_state=42)
plot_gmm(gmm, X)

Output

[[1,0,0, 0]
[0,0,1, 0]
[1,0,0, 0]
[1,0,0, 0]
[1,0, 0, 0]]

K-means
from sklearn.cluster import KMeans
# from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv("kmeansdata.csv")
df1 = pd.DataFrame(data)
print(df1)

f1 = df1['Distance_Feature'].values
f2 = df1['Speeding_Feature'].values

X = np.matrix(list(zip(f1, f2)))
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.ylabel('Speeding_Feature')
plt.xlabel('Distance_Feature')
plt.scatter(f1, f2)
plt.show()

# create new plot and data
plt.plot()
colors = ['b', 'g', 'r']
markers = ['o', 'v', 's']

# KMeans algorithm with K = 3
kmeans_model = KMeans(n_clusters=3).fit(X)

plt.plot()
for i, l in enumerate(kmeans_model.labels_):
    plt.plot(f1[i], f2[i], color=colors[l], marker=markers[l], ls='None')
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.show()

Driver_ID,Distance_Feature,Speeding_Feature
3423311935,71.24,28
3423313212,52.53,25
3423313724,64.54,27
3423311373,55.69,22
3423310999,54.58,25

3423313857,41.91,10
3423312432,58.64,20
3423311434,52.02,8
3423311328,31.25,34
3423312488,44.31,19
3423311254,49.35,40
3423312943,58.07,45
3423312536,44.22,22
3423311542,55.73,19
3423312176,46.63,43
3423314176,52.97,32
3423314202,46.25,35
3423311346,51.55,27
3423310666,57.05,26
3423313527,58.45,30
3423312182,43.42,23
3423313590,55.68,37
3423312268,55.15,18

9. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.
import csv
import random
import math
import operator


def loadDataset(filename, split, trainingSet=[], testSet=[]):
    with open(filename, 'r') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset) - 1):
            for y in range(4):
                dataset[x][y] = float(dataset[x][y])
            # randomly assign each record to the training or the test set
            if random.random() < split:
                trainingSet.append(dataset[x])
            else:
                testSet.append(dataset[x])


def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)


def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance) - 1
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors


def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    # the class with the most votes among the k neighbours wins
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]


def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0


def main():
    # prepare data
    trainingSet = []
    testSet = []
    split = 0.67
    loadDataset('knndat.data', split, trainingSet, testSet)
    print('Train set: ' + repr(len(trainingSet)))
    print('Test set: ' + repr(len(testSet)))
    # generate predictions
    predictions = []
    k = 3
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
        print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: ' + repr(accuracy) + '%')


main()

OUTPUT
Confusion matrix is as follows

[[11 0 0]
 [0 9 1]
 [0 1 8]]

Accuracy metrics

0            1.00  1.00  1.00  11
1            0.90  0.90  0.90  10
2            0.89  0.89  0.89   9

Avg/Total    0.93  0.93  0.93  30

10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

from numpy import *
import operator
from os import listdir
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np1
import numpy.linalg as np
from scipy.stats.stats import pearsonr


def kernel(point, xmat, k):
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye((m)))
    for j in range(m):
        diff = point - X[j]
        # Gaussian kernel: nearby points get larger weights
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights


def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (X.T * (wei * X)).I * (X.T * (wei * ymat.T))
    return W


def localWeightRegression(xmat, ymat, k):
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred


# load data points
data = pd.read_csv('data10.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)

# preparing and add 1 in bill
mbill = np1.mat(bill)
mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))

# set k here
ypred = localWeightRegression(X, mtip, 2)

SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]
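
The listing above stops after computing SortIndex and xsort. A plotting step along the following lines (a sketch, not part of the original listing, reusing the variables defined above) can be used to draw the graph the experiment asks for:

# Possible plotting step: scatter the data and overlay the locally weighted fit
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=2)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()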

Output

Viva Questions
1. What is machine learning?
2. Define supervised learning.
3. Define unsupervised learning.
4. Define semi-supervised learning.
5. Define reinforcement learning.
6. What do you mean by hypotheses?
7. What is classification?
8. What is clustering?
9. Define precision, accuracy and recall.
10. Define entropy.
11. Define regression.
12. How is KNN different from k-means clustering?
13. What is concept learning?
14. Define specific boundary and general boundary.
15. Define target function.
16. Define decision tree.
17. What is an ANN?
18. Explain gradient descent approximation.
19. State Bayes' theorem.
20. Define Bayesian belief networks.
21. Differentiate hard and soft clustering.
22. Define variance.
23. What is inductive machine learning?
24. Why is the K-nearest neighbour algorithm a lazy learning algorithm?
25. Why is naïve Bayes naïve?
26. Mention classification algorithms.
27. Define pruning.
28. Differentiate clustering and classification.
29. Mention clustering algorithms.
30. Define bias.

