2-Inductive Learning

This document provides an introduction to machine learning concepts including different types of learning problems and algorithms. It discusses supervised learning in detail, including classification and regression as standard formulations. Supervised learning involves learning a function from labeled input-output examples. Unsupervised learning models inputs without labels by tasks like clustering. Reinforcement learning involves an agent learning through trial-and-error interactions with an environment. The document also summarizes representation, evaluation, and optimization as key components of machine learning algorithms.


Introduction to Machine Learning: CS 436/580L
Inductive (Supervised) Learning: Hypothesis Spaces
Instructor: Arti Ramesh
Binghamton University
Administrivia  
•  Homework 0 is available on myCourses
–  Worth 4/40 points
–  Due Sep 6th, Wednesday, 11:59 pm
–  Late penalty is 10% after the deadline
 
 

2
Probability  Review  Recap  
•  Mutual  Exclusion  
•  Independence  
•  Conditional Independence
•  Expectation
•  Variance  

3
Probability  Review  
The weather on a particular day can be sunny, cloudy, or rainy. It can be sunny with probability 0.3, cloudy with probability 0.4, and rainy with probability 0.3. A concert is planned to be held in the city. If the weather is sunny, the concert will be held 100%. If the weather is cloudy or rainy, it will be held with probability 0.8 and 0.5, respectively.
What is the probability that the concert will be held?
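
Worked solution (added for reference), using the law of total probability:

P(held) = P(held | sunny)·P(sunny) + P(held | cloudy)·P(cloudy) + P(held | rainy)·P(rainy)
        = (1.0)(0.3) + (0.8)(0.4) + (0.5)(0.3)
        = 0.30 + 0.32 + 0.15 = 0.77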
4
Probability  Review  
Let X denote the sum of two fair dice. What is the expectation of X?
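
Worked solution (added for reference), using linearity of expectation:

E[X] = E[D1] + E[D2], where each die has E[Di] = (1+2+3+4+5+6)/6 = 3.5,
so E[X] = 3.5 + 3.5 = 7.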
   

5
Recap  
•  Different definitions of machine learning and all are correct!!!
•  Slight variations according to the type of learning
•  Types of Learning
–  Supervised Learning
–  Unsupervised Learning
–  Reinforcement Learning
–  Semi-Supervised Learning

6
Types  of  Learning  
•  Supervised  Learning  
–  problem: the learner is required to learn a function which maps a vector into one of several classes by looking at several input-output examples of the function
–  standard formulation of the supervised learning task: classification

7
Types  of  Learning  
•  Unsupervised  Learning  
–  models a set of inputs: labeled examples are not available
–  standard formulation of the unsupervised learning task: clustering
•  Semi-supervised Learning
–  combines both labeled and unlabeled examples to generate an appropriate function or classifier

8
Types  of  Learning  
•  Reinforcement  Learning  
–  the algorithm learns a policy of how to act given an observation of the world
–  every action has some impact on the environment, and the environment provides feedback that guides the learning algorithm
 

9
Which  type  of  learning  is  best?  
•  Determining the best move to make in a game

•  Distinguishing between pictures of dogs, cats, and horses

•  Elevator scheduling

•  Agent in the field trying to defuse a bomb

•  Speech analysis of telephone conversations (400 hours of annotation time for each hour of speech)

10
ML  in  a  Nutshell  
•  Tens of thousands of machine learning algorithms
•  Hundreds of new ones every year
•  Every machine learning algorithm has three components:
–  Representation
–  Evaluation
–  Optimization

11
Representation
•  Decision  trees  
•  Sets  of  rules  /  Logic  programs  
•  Instances  
•  Graphical  models  (Bayes/Markov  nets)  
•  Neural  networks  
•  Support  vector  machines  
•  Model  ensembles  
•  Etc.  

12
Evaluation
•  Accuracy  
•  Precision  and  recall  
•  Squared  error  
•  Likelihood  
•  Posterior  probability  
•  Cost / Utility
•  Margin  
•  Entropy  
•  K-L divergence
•  Etc.  

13
Optimization
•  Combinatorial optimization
–  E.g.: Greedy search
•  Convex optimization
–  E.g.: Gradient descent
•  Constrained optimization
–  E.g.: Linear programming

14
Supervised  Learning  
•  Given: training examples (x, f(x)) for some unknown function f
•  Find: a good approximation to f
Example Applications
•  Credit risk assessment
–  x: properties of customer and proposed purchase
–  f(x): approve/reject purchase
•  Disease diagnosis
–  x: properties of patient (symptoms, lab tests)
–  f(x): disease diagnosis, recommended therapy
•  Face recognition
–  x: bitmap picture of person's face
–  f(x): person's name

15
Appropriate Applications for Supervised Learning
•  Situations where there is no human expert
–  x: bond graph for a new molecule
–  f(x): predicted binding strength to AIDS protease molecule
•  Situations where humans can perform the task but can't describe how to do it
–  x: bitmap picture of handwritten character
–  f(x): ASCII code of character
•  Situations where the desired f(x) is changing rapidly
–  x: description of stock prices and trades for the last 10 days
–  f(x): recommended stock transactions
•  Situations where each user needs a customized f
–  x: incoming email message
–  f(x): importance score for presenting to user

16
Example: A dataset for supervised learning

Sepal Length   Sepal Width   Petal Length   Petal Width   Class
5.1            3.5           1.4            0.2           Iris-Setosa
6.1            3.0           4.6            1.4           Iris-Versicolor
7.2            3.6           6.1            2.5           Iris-Virginica

•  Columns are called input variables, features, or attributes
•  The type of flower {Iris-Setosa, Iris-Versicolor, Iris-Virginica} is called the target variable, output variable, or label
•  A row in the table is called a training example
•  The whole table is called the (training, validation, test, or evaluation) data set
•  The problem of predicting the label is called classification

17
Supervised Learning: Formal Definition

[Diagram: training points <x,y> sampled from P(x,y) feed a learning algorithm that outputs f; at test time a point <x,y> is drawn, the classifier predicts ŷ = f(x), and a loss function compares ŷ with y as L(ŷ,y)]

•  Training examples are drawn independently at random according to an unknown probability distribution P(x,y)
•  The learning algorithm analyzes the examples and produces a classifier f or hypothesis h
•  Given a new data point <x,y> drawn from P (independently and at random), the classifier is given x and predicts ŷ = f(x)
•  The loss L(ŷ,y) is then measured
•  Goal of the learning algorithm: find the f that minimizes the expected loss
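
Written as a formula (added for clarity; a standard way to state the slide's goal):

f* = argmin_f  E_(x,y)~P(x,y) [ L(f(x), y) ]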
18
A learning problem!

X 0 X    X 0 X    X X X
0 X 0    X X 0    0 X X    f(x) = 1
0 X X    X 0 0    0 0 0

0 X 0    0 0 X    0 X X
X 0 X    X X 0    X 0 0    f(x) = 0
0 X X    0 X X    0 X X

0 X X
0 X 0    f(x) = ?
X X 0

(Each 3×3 grid of X's and 0's is one example: the first three are positive, the next three are negative, and the last grid's label is to be predicted.)
19
A Learning Problem!

X1  X2  X3  X4  X5  X6  X7  X8  X9  f(x)
X   0   X   0   X   0   0   X   X   1
X   0   X   X   X   0   X   0   0   1
X   X   X   0   X   X   0   0   0   1
0   X   0   X   0   X   0   X   X   0
0   0   X   X   X   0   0   X   X   0
0   X   X   X   0   0   0   X   X   0
0   X   X   0   X   0   X   X   0   ?

•  x: a 9-dimensional vector
•  f(x): a function or a program that takes the vector as input and outputs either a 0 or a 1
•  Task: given the training examples, find a good approximation to f so that if you see an unseen vector x in the future, you will be able to figure out the value of f(x)

20
Example of a learning problem

A simpler example for analysis! A classification problem.

Given data or examples, find the function f.
21
How to find a good approximation to f?
•  A possible/plausible technique:

[Diagram: the unknown function f: X → Y generates training examples/data (x, f(x)), which are fed to a learning algorithm; the algorithm searches a hypothesis space H (the set of candidate functions — your assumptions about f) and outputs a good approximation h ≈ f]
22
Hypothesis Spaces

You are assuming that the unknown function f could be any one of the 2^16 possible functions! It turns out that, out of the 2^16 possible functions, 2^9 classify all points in the training data correctly!

(Why these numbers: with 4 binary features there are 2^4 = 16 possible inputs, hence 2^16 boolean functions; the 7 training examples fix the labels of 7 of the 16 inputs, leaving 2^9 functions consistent with the data.)
23
Hypothesis  Spaces  
•  10,000 features
•  Features are binary
•  Output is binary

Number of boolean functions?
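
Worked answer (added for reference): with n binary features there are 2^n distinct inputs, and a boolean function may independently assign 0 or 1 to each input, so there are 2^(2^n) boolean functions. For n = 10,000 that is 2^(2^10000).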

24
Hypothesis Spaces

You are assuming that the unknown function f could be any one of the 16 conjunctive rules! (Over 4 boolean features, each feature is either included in the conjunction or left out, giving 2^4 = 16 rules.)

Unfortunately, none of them work.

25
Hypothesis Spaces

m-of-n rules: at least m of the n chosen variables must be true.

You are assuming that the unknown function f could be any one of the 32 m-of-n rules! (Choosing k of the 4 variables and a threshold m ≤ k gives 4·1 + 6·2 + 4·3 + 1·4 = 32 rules.)

Only one of them, the one marked by "***", works!
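
A minimal sketch of an m-of-n rule as a predicate, assuming the definition above; the variable indices and threshold are illustrative, not from the slide:

```python
# An m-of-n rule outputs 1 when at least m of its n chosen boolean
# variables are true.

def m_of_n(x, indices, m):
    """Return 1 if at least m of the variables x[i], i in indices, are true."""
    return 1 if sum(x[i] for i in indices) >= m else 0

# Example: the rule "at least 2 of {x1, x3, x4}" applied to x = (1, 0, 1, 0)
x = (1, 0, 1, 0)
print(m_of_n(x, indices=(0, 2, 3), m=2))   # -> 1 (x1 and x3 are true)
```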

26
Two  Views  of  Learning  
•  Learning is the removal of uncertainty
•  Learning requires guessing a good, small hypothesis class
•  We could be wrong!
–  Our prior knowledge might be wrong!
–  Our guess for the hypothesis class could be wrong!
•  The smaller the hypothesis class, the more likely we are to be wrong!

27
Strategies  for  Machine  Learning  
•  Strategy 1: Develop languages for expressing prior knowledge: rule grammars and stochastic models
•  Strategy 2: Develop flexible hypothesis spaces: nested collections of hypotheses – decision trees, rules, neural networks, …
•  In either case:
–  Develop algorithms for finding a hypothesis that fits the data!

28
Terminology
•  Training example: an example of the form (x, f(x))
•  Target function (target concept): the true function f
•  Hypothesis: a proposed function h believed to be similar to f
•  Concept: a boolean function. Examples for which f(x) = 1 are called positive examples or positive instances of the concept; examples for which f(x) = 0 are called negative examples or negative instances of the concept
•  Classifier: a discrete-valued function. The possible values of f are called class labels: f(x) ∈ {1, 2, ..., K}
•  Hypothesis space: the space of hypotheses that can be output by a learning algorithm

29
Key Issues in Machine Learning
•  What are good hypothesis spaces?
–  Which spaces are useful in practical applications, and why?
•  What algorithms can work in these spaces?
–  Are there general design principles for machine learning algorithms?
•  How can we optimize accuracy on future data points?
–  This is sometimes called the problem of overfitting
•  How can we have confidence in the results?
–  How much training data is required to find an accurate hypothesis?
•  Are some learning problems computationally intractable? (the computational question!)
•  How can we formulate application problems as machine learning problems? (the engineering question!)
30
Steps in Supervised Learning
1.  Determine the representation for (x, f(x)) and determine what x to use (Feature Engineering)
2.  Gather a training set (not all data is kosher) (Data Cleaning)
3.  Select a suitable evaluation method
4.  Find a suitable learning algorithm among a plethora of available choices
–  Issues discussed on the previous slide
 
31
Feature Engineering is the Key
•  Most effort in ML projects goes into constructing features
•  Black art: intuition and creativity required
–  Understand the properties of the task at hand
–  Understand how the features interact with or limit the algorithm you are using
•  ML is an iterative process
–  Try different types of features, experiment with each, and then decide which feature set/algorithm combination to use

32
A Sample Machine Learning Algorithm
•  2-way classification problem
–  +ve and -ve classes
•  Representation: lines (Ax + By + C = 0)
–  Specifically:
•  if Ax + By + C > 0, then classify as "+ve"
•  else classify as "-ve"
•  Evaluation: number of misclassified examples
•  Optimization: an algorithm that searches for the three parameters A, B, and C

33
Toy Example

[Scatter plot: Age on the x-axis (20, 40, 60, 80) vs. Income on the y-axis ($15K–$60K); a line splits the plane into a region where Ax + By + C > 0 and a region where Ax + By + C < 0]

Blue circles: good credit (low risk)
Red circles: bad credit (high risk)

Problem: fit a line that separates the two classes such that the error is minimized.
34
How do machine learners solve this problem?
•  Try different lines until you find one that separates the data into two
•  A more plausible alternative (see the sketch below):
–  Begin with a random line
–  Repeat until there are no errors:
•  For each point:
–  If the current line says +ve and the point is -ve, then decrease A, B, and C
–  If the current line says -ve and the point is +ve, then increase A, B, and C
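
A minimal sketch of this loop, assuming a perceptron-style reading of the "increase/decrease A, B, C" rule (the adjustment here is scaled by the point's coordinates, which is the standard perceptron update). The data, learning rate, and epoch cap are illustrative, not from the lecture:

```python
def fit_line(points, labels, lr=0.01, max_epochs=1000):
    """Find A, B, C so that sign(A*x + B*y + C) matches the labels (+1/-1)."""
    A, B, C = 0.0, 0.0, 0.0                  # start from an arbitrary line
    for _ in range(max_epochs):
        errors = 0
        for (x, y), t in zip(points, labels):
            pred = 1 if A * x + B * y + C > 0 else -1
            if pred != t:                    # misclassified: nudge the line
                A += lr * t * x              # increase for a missed +ve,
                B += lr * t * y              # decrease for a missed -ve
                C += lr * t
                errors += 1
        if errors == 0:                      # repeat until no errors
            break
    return A, B, C

# Toy data in the spirit of the slide: (age, income in $K);
# +1 = good credit (blue), -1 = bad credit (red)
points = [(25, 18), (35, 25), (30, 20), (50, 55), (60, 62), (45, 50)]
labels = [-1, -1, -1, 1, 1, 1]
print(fit_line(points, labels))
```

If the data is linearly separable, this loop eventually stops with zero errors; the epoch cap simply keeps it from running forever on non-separable data.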

35
Toy Example: More Data

[Scatter plot: Age on the x1-axis vs. Income on the x2-axis, with many more points]

Blue circles: good credit (low risk)
Red circles: bad credit (high risk)

Problem: fit a line that separates the two classes such that the error is minimized.
36
Learning = Representation + Evaluation + Optimization
•  Combinations of just three elements:

Representation        Evaluation          Optimization
Instances             Accuracy            Greedy search
Hyperplanes           Precision/Recall    Branch & bound
Decision trees        Squared error       Gradient descent
Sets of rules         Likelihood          Quasi-Newton
Neural networks       Posterior prob.     Linear progr.
Graphical models      Margin              Quadratic progr.
Etc.                  Etc.                Etc.

37
