UNIT 1 Notes
CONCEPT LEARNING
Definition: Concept learning - inferring a Boolean-valued function from training examples of its input and output.
A CONCEPT LEARNING TASK
Consider the example task of learning the target concept "Days on which Aldo enjoys his favorite water sport".

Table: Positive and negative training examples for the target concept EnjoySport.

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.
What hypothesis representation is provided to the learner?
• Let's consider a simple representation in which each hypothesis consists of a conjunction of constraints on the instance attributes.
• Let each hypothesis be a vector of six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
For each attribute, the hypothesis will either
• indicate by a "?" that any value is acceptable for this attribute,
• specify a single required value (e.g., Warm) for the attribute, or
• indicate by a "∅" that no value is acceptable.
The hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity is represented by the expression
(?, Cold, High, ?, ?, ?)
The most general hypothesis - that every day is a positive example - is represented by
(?, ?, ?, ?, ?, ?)
and the most specific hypothesis - that no day is a positive example - is represented by
(∅, ∅, ∅, ∅, ∅, ∅)
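As a minimal illustration (a sketch, not part of the original notes; the tuple encoding and the name satisfies are choices made here), a hypothesis can be held as a Python tuple and evaluated against an instance:

    # "?" accepts any value; None stands in for the "∅" (no value acceptable) constraint.
    h_cold_high    = ("?", "Cold", "High", "?", "?", "?")   # cold days with high humidity
    h_most_general = ("?", "?", "?", "?", "?", "?")         # every day is positive

    def satisfies(h, x):
        # h(x) = 1 iff every attribute value of instance x meets h's constraint.
        return all(c == "?" or c == v for c, v in zip(h, x))

    x = ("Sunny", "Cold", "High", "Strong", "Cool", "Same")
    print(satisfies(h_cold_high, x))     # True
    print(satisfies(h_most_general, x))  # True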
Notation
• The set of items over which the concept is defined is called the set of instances, denoted by X. Example: X is the set of all possible days, each represented by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
• The concept or function to be learned is called the target concept, denoted by c. c can be any Boolean-valued function defined over the instances X:
c : X → {0, 1}
• Instances for which c(x) = 1 are called positive examples, or members of the target concept.
• Instances for which c(x) = 0 are called negative examples, or non-members of the target concept.
• The ordered pair (x, c(x)) describes the training example consisting of the instance x and its target concept value c(x).
• D denotes the set of available training examples.
• The symbol H denotes the set of all possible hypotheses that the learner may consider regarding the identity of the target concept. Each hypothesis h in H represents a Boolean-valued function defined over X:
h : X → {0, 1}
• Given:
• Instances X: possible days, each described by the attributes
• Sky (with possible values Sunny, Cloudy, and Rainy),
• AirTemp (with values Warm and Cold),
• Humidity (with values Normal and High),
• Wind (with values Strong and Weak),
• Water (with values Warm and Cool),
• Forecast (with values Same and Change).
• Target concept c: EnjoySport : X → {0, 1}
• Training examples D: positive and negative examples of the target function
• Determine:
• A hypothesis h in H such that h(x) = c(x) for all x in X.
Table: The EnjoySport concept learning task.
The inductive learning hypothesis
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
CONCEPT LEARNING AS SEARCH
• Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
• The goal of this search is to find the hypothesis that best fits the training examples.
Example:
Consider the instances X and hypotheses H in the EnjoySport learning task. The attribute Sky has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two possible values, so the instance space X contains exactly
3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances.
Counting the "?" and "∅" constraints as two additional values per attribute, H contains
5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses.
Every hypothesis containing one or more "∅" symbols represents the empty set of instances; that is, it classifies every instance as negative. Collapsing all such hypotheses into one leaves
1 + (4 · 3 · 3 · 3 · 3 · 3) = 973 semantically distinct hypotheses.
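These counts can be verified by brute force. The following sketch (illustrative, assuming the attribute domains listed above) enumerates both spaces and counts distinct extensions:

    from itertools import product

    domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
               ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

    instances = list(product(*domains))
    print(len(instances))      # 96 = 3*2*2*2*2*2

    # Each attribute constraint may also be "?" or None (standing in for "∅").
    hyp_domains = [vals + ("?", None) for vals in domains]
    hypotheses = list(product(*hyp_domains))
    print(len(hypotheses))     # 5120 = 5*4*4*4*4*4

    def satisfies(h, x):
        return all(c == "?" or c == v for c, v in zip(h, x))

    # Hypotheses are semantically equal iff they cover the same set of instances;
    # every hypothesis containing None covers the empty set, so all of them collapse.
    extensions = {frozenset(x for x in instances if satisfies(h, x)) for h in hypotheses}
    print(len(extensions))     # 973 = 1 + 4*3*3*3*3*3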
General-to-Specific Ordering of Hypotheses
Consider the two hypotheses
h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
• Consider the sets of instances that are classified positive by h1 and by h2.
• Because h2 imposes fewer constraints on the instance, it classifies more instances as positive. So, any instance classified positive by h1 will also be classified positive by h2. Therefore, h2 is more general than h1.
Given hypotheses hj and hk, hj is more-general-than-or-equal-to hk if and only if any instance that satisfies hk also satisfies hj.
Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is more-general-than-or-equal-to hk (written hj ≥ hk) if and only if
(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
• In the figure, the box on the left represents the set X of all instances, and the box on the right the set H of all hypotheses.
• Each hypothesis corresponds to some subset of X - the subset of instances that it classifies positive.
• The arrows connecting hypotheses represent the more-general-than relation, with the arrow pointing toward the less general hypothesis.
• Note that the subset of instances characterized by h2 subsumes the subset characterized by h1; hence h2 is more-general-than h1.
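For this conjunctive representation the ≥ relation can be decided syntactically, constraint by constraint. A minimal sketch (assuming ∅-free hypotheses; the function name is illustrative):

    def more_general_or_equal(hj, hk):
        # hj >= hk iff each constraint of hj is "?" or equals hk's constraint, so
        # every instance that satisfies hk also satisfies hj (∅-free hypotheses).
        return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

    h1 = ("Sunny", "?", "?", "Strong", "?", "?")
    h2 = ("Sunny", "?", "?", "?", "?", "?")
    print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
    print(more_general_or_equal(h1, h2))  # False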
FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS
FIND-S Algorithm
• The first step of FIND-S is to initialize h to the most specific hypothesis in H
h ← (∅, ∅, ∅, ∅, ∅, ∅)
• Consider the first training example
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
Observing the first training example, it is clear that hypothesis h is too specific. None of the "∅" constraints in h are satisfied by this example, so each is replaced by the next more general constraint that fits the example
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
• Consider the second training example
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
The second training example forces the algorithm to further generalize h, this time substituting a "?" in place of any attribute value in h that is not satisfied by the new example
h2 = <Sunny, Warm, ?, Strong, Warm, Same>
• Consider the third training example
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
Upon encountering the third training example, the algorithm makes no change to h. The FIND-S algorithm simply ignores every negative example.
h3 = <Sunny, Warm, ?, Strong, Warm, Same>
• Consider the fourth training example
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +
The fourth example leads to a further generalization of h
h4 = <Sunny, Warm, ?, Strong, ?, ?>
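The trace above can be reproduced with a short sketch of FIND-S (illustrative; None again stands in for ∅, and the examples are the four rows of the EnjoySport table):

    def find_s(examples, n_attrs=6):
        h = [None] * n_attrs                # most specific hypothesis (∅, ..., ∅)
        for x, positive in examples:
            if not positive:
                continue                    # FIND-S ignores every negative example
            for i, v in enumerate(x):
                if h[i] is None:
                    h[i] = v                # ∅ -> the observed attribute value
                elif h[i] != v:
                    h[i] = "?"              # violated single-value constraint -> "?"
        return tuple(h)

    examples = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
        (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
        (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
    ]
    print(find_s(examples))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?') = h4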
The key property of the FIND-S algorithm
• FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
• FIND-S algorithm's final hypothesis will also be consistent with the negative examples, provided the correct target concept is contained in H and provided the training examples are correct.
Unanswered by FIND-S
1. Has the learner converged to the correct target concept?
2. Why prefer the most specific hypothesis?
3. Are the training examples consistent?
4. What if there are several maximally specific consistent hypotheses?
VERSION SPACES AND THE CANDIDATE-ELIMINATION ALGORITHM
The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the set of all hypotheses consistent with the training examples.
Representation
Definition: A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example (x, c(x)) in D.
Consistent(h, D) ≡ (∀(x, c(x)) ∈ D) h(x) = c(x)
Note the difference between the definitions of consistent and satisfies:
• An example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is a positive or negative example of the target concept.
• An example x is said to be consistent with hypothesis h iff h(x) = c(x).
Definition: version space - The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}
The LIST-THEN-ELIMINATE Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example (x, c(x)):
   remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
• The LIST-THEN-ELIMINATE algorithm works in principle, so long as the version space is finite.
• However, since it requires an exhaustive enumeration of all hypotheses in H, it is not feasible in practice.
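For the tiny EnjoySport hypothesis space the exhaustive approach is still runnable. A self-contained sketch (∅-containing hypotheses are omitted, since they are inconsistent with any positive example):

    from itertools import product

    hyp_domains = [("Sunny", "Cloudy", "Rainy", "?"), ("Warm", "Cold", "?"),
                   ("Normal", "High", "?"), ("Strong", "Weak", "?"),
                   ("Warm", "Cool", "?"), ("Same", "Change", "?")]

    def satisfies(h, x):
        return all(c == "?" or c == v for c, v in zip(h, x))

    examples = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
        (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
        (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
    ]

    # Step 2: eliminate every hypothesis h with h(x) != c(x) for some example.
    version_space = [h for h in product(*hyp_domains)
                     if all(satisfies(h, x) == c for x, c in examples)]
    print(len(version_space))  # 6 hypotheses remain for these four examples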
A More Compact Representation for Version Spaces
Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃g' ∈ H) [(g' >g g) ∧ Consistent(g', D)]}
Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.
S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃s' ∈ H) [(s >g s') ∧ Consistent(s', D)]}
Version space representation theorem: with ≥g denoting more-general-than-or-equal-to and >g its strict form,
VS_{H,D} = {h ∈ H | (∃s ∈ S) (∃g ∈ G) (g ≥g h ≥g s)}
To prove:
1. Every h satisfying the right-hand side of the above expression is in VS_{H,D}.
2. Every member of VS_{H,D} satisfies the right-hand side of the expression.
Sketch of proof:
1. Let g, h, s be arbitrary members of G, H, S respectively, with g ≥g h ≥g s.
• By the definition of S, s must be satisfied by all positive examples in D. Because h ≥g s, h must also be satisfied by all positive examples in D.
• By the definition of G, g cannot be satisfied by any negative example in D, and because g ≥g h, h cannot be satisfied by any negative example in D. Because h is satisfied by all positive examples in D and by no negative examples in D, h is consistent with D, and therefore h is a member of VS_{H,D}.
2. It can be proven by assuming some h in VS_{H,D} that does not satisfy the right-hand side of the expression, and then showing that this leads to an inconsistency.
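As a sanity check on the theorem, a small sketch (illustrative, not from the notes) can recover S and G as the minimal and maximal members of the exhaustively computed version space for the four EnjoySport examples, and confirm that every member lies between the boundaries:

    from itertools import product

    hyp_domains = [("Sunny", "Cloudy", "Rainy", "?"), ("Warm", "Cold", "?"),
                   ("Normal", "High", "?"), ("Strong", "Weak", "?"),
                   ("Warm", "Cool", "?"), ("Same", "Change", "?")]

    def satisfies(h, x):
        return all(c == "?" or c == v for c, v in zip(h, x))

    def more_general_or_equal(hj, hk):
        return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

    examples = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
        (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
        (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
    ]

    vs = [h for h in product(*hyp_domains)
          if all(satisfies(h, x) == c for x, c in examples)]

    # Boundary sets: no other member strictly below (S) or strictly above (G) them.
    S = [h for h in vs if not any(h2 != h and more_general_or_equal(h, h2)
                                  and not more_general_or_equal(h2, h) for h2 in vs)]
    G = [h for h in vs if not any(h2 != h and more_general_or_equal(h2, h)
                                  and not more_general_or_equal(h, h2) for h2 in vs)]

    # Representation theorem: every h in VS lies between some s in S and g in G.
    print(all(any(more_general_or_equal(g, h) and more_general_or_equal(h, s)
                  for s in S for g in G) for h in vs))  # True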
CANDIDATE-ELIMINATION Learning Algorithm
The CANDIDATE-ELIMINATION algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do:
• If d is a positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
• Remove s from S
• Add to S all minimal generalizations h of s such that
• h is consistent with d, and some member of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
• Remove g from G
• Add to G all minimal specializations h of g such that
• h is consistent with d, and some member of S is more specific than h
• Remove from G any hypothesis that is less general than another hypothesis in G
Table: The CANDIDATE-ELIMINATION algorithm using version spaces.
An Illustrative Example
Initialize the G boundary set to contain the most general hypothesis in H:
G0 = {(?, ?, ?, ?, ?, ?)}
Initialize the S boundary set to contain the most specific (least general) hypothesis:
S0 = {(∅, ∅, ∅, ∅, ∅, ∅)}
• When the first positive training example is observed, the S boundary is generalized just enough to cover it, while G remains unchanged.
• When the second training example is observed, it has a similar effect, generalizing S further to S2 and leaving G again unchanged, i.e., G2 = G1 = G0.
• Consider the third training example. This negative example reveals that the G boundary of the version space is overly general; that is, the hypothesis in G incorrectly predicts that this new example is a positive example.
• The hypothesis in the G boundary must therefore be specialized until it correctly classifies this new negative example.
Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3?
For example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a minimal specialization of G2 that correctly labels the new example as a negative example, but it is not included in G3. The reason this hypothesis is excluded is that it is inconsistent with the previously encountered positive examples.
• Consider the fourth training example.
• This positive example further generalizes the S boundary of the version space. It also results in removing one member of the G boundary, because this member fails to cover the new positive example.
After processing these four examples, the boundary sets S4 and G4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples.
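The full trace above can be reproduced with a compact sketch of CANDIDATE-ELIMINATION for this conjunctive representation (a sketch with illustrative helper names; None stands in for ∅, and positive examples are assumed to precede negatives as in the EnjoySport data):

    DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
               ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

    def satisfies(h, x):
        return all(c == "?" or c == v for c, v in zip(h, x))

    def more_general_or_equal(hj, hk):  # for ∅-free hypotheses
        return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

    def min_generalize(s, x):
        # Unique minimal generalization of s that covers the positive example x.
        return tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))

    def min_specializations(g, x):
        # All minimal specializations of g that exclude the negative example x.
        return [g[:i] + (val,) + g[i + 1:]
                for i, (c, v) in enumerate(zip(g, x)) if c == "?"
                for val in DOMAINS[i] if val != v]

    def candidate_elimination(examples):
        S = [tuple(None for _ in DOMAINS)]   # most specific boundary S0
        G = [tuple("?" for _ in DOMAINS)]    # most general boundary G0
        for x, positive in examples:
            if positive:
                G = [g for g in G if satisfies(g, x)]
                S = [min_generalize(s, x) for s in S]
                # Keep only s's below some member of G (S stays a singleton here).
                S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            else:
                S = [s for s in S if not satisfies(s, x)]
                # Specialize each g that wrongly covers x; keep h only if it
                # still sits above some member of S.
                G = [h for g in G
                     for h in ([g] if not satisfies(g, x) else min_specializations(g, x))
                     if any(more_general_or_equal(h, s) for s in S)]
                # Remove any g strictly less general than another member of G.
                G = [g for g in G
                     if not any(g2 != g and more_general_or_equal(g2, g)
                                and not more_general_or_equal(g, g2) for g2 in G)]
        return S, G

    examples = [
        (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
        (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
        (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
        (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
    ]
    S, G = candidate_elimination(examples)
    print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')] = S4
    print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')] = G4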
INDUCTIVE BIAS
The fundamental questions for inductive inference
1. What if the target concept is not contained in the hypothesis space?
2. Can we avoid this difficulty by using a hypothesis space that includes every possible hypothesis?
3. How does the size of this hypothesis space influence the ability of the algorithm to generalize to unobserved instances?
4. How does the size of the hypothesis space influence the number of training examples that must be observed?
A Biased Hypothesis Space
• Suppose the target concept is not contained in the hypothesis space H; then an obvious solution is to enrich the hypothesis space to include every possible hypothesis.
• Consider the EnjoySport example in which the hypothesis space is restricted to include only conjunctions of attribute values. Because of this restriction, the hypothesis space is unable to represent even simple disjunctive target concepts such as "Sky = Sunny or Sky = Cloudy."
• Given the following three training examples of this disjunctive target concept, the algorithm finds that there are zero hypotheses in the version space:

Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No

• If the CANDIDATE-ELIMINATION algorithm is applied, it ends up with an empty version space. After the first two training examples,
S2 = {(?, Warm, Normal, Strong, Cool, Change)}
• This new hypothesis is overly general: it incorrectly covers the third (negative) training example. So H does not include the appropriate c.
• In this case, a more expressive hypothesis space is required (see the sketch below).
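The failure can be checked directly with the same exhaustive enumeration used for LIST-THEN-ELIMINATE (a self-contained, illustrative sketch):

    from itertools import product

    hyp_domains = [("Sunny", "Cloudy", "Rainy", "?"), ("Warm", "Cold", "?"),
                   ("Normal", "High", "?"), ("Strong", "Weak", "?"),
                   ("Warm", "Cool", "?"), ("Same", "Change", "?")]

    def satisfies(h, x):
        return all(c == "?" or c == v for c, v in zip(h, x))

    disjunctive_examples = [
        (("Sunny",  "Warm", "Normal", "Strong", "Cool", "Change"), True),
        (("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change"), True),
        (("Rainy",  "Warm", "Normal", "Strong", "Cool", "Change"), False),
    ]

    vs = [h for h in product(*hyp_domains)
          if all(satisfies(h, x) == c for x, c in disjunctive_examples)]
    print(len(vs))  # 0 -- no conjunctive hypothesis is consistent with all three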
An Unbiased Learner
• The solution to the problem of assuring that the target concept is in the hypothesis space H is to provide a hypothesis space capable of representing every teachable concept, that is, every possible subset of the instances X.
• The set of all subsets of a set X is called the power set of X.
• In the EnjoySport learning task, the size of the instance space X of days described by the six attributes is 96 instances.
• Thus, there are 2^96 distinct target concepts that could be defined over this instance space, and the learner might be called upon to learn any of them.
• The conjunctive hypothesis space is able to represent only 973 of these - a biased hypothesis space indeed.
• Let us reformulate the EnjoySport learning task in an unbiased way by defining a new hypothesis space H' that can represent every subset of instances, i.e., by allowing arbitrary disjunctions, conjunctions, and negations of the earlier hypotheses.
• The target concept "Sky = Sunny or Sky = Cloudy" could then be described as
(Sunny, ?, ?, ?, ?, ?) ∨ (Cloudy, ?, ?, ?, ?, ?)
The Futility of Bias-Free Learning
Inductive learning requires some form of prior assumptions, or inductive bias.
Definition:
Consider a concept learning algorithm L for the set of instances X.
• Let c be an arbitrary concept defined over X.
• Let D_c = {(x, c(x))} be an arbitrary set of training examples of c.
• Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on the data D_c.
• The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c:
(∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
The figure below illustrates:
• Modelling inductive systems by equivalent deductive systems.
• The input-output behavior of the CANDIDATE-ELIMINATION algorithm using a hypothesis space H is identical to that of a deductive theorem prover utilizing the assertion "H contains the target concept." This assertion is therefore called the inductive bias of the CANDIDATE-ELIMINATION algorithm.
• Characterizing inductive systems by their inductive bias allows modelling them by their equivalent deductive systems. This provides a way to compare inductive systems according to their policies for generalizing beyond the observed training data.