
Chapter 1

The Learning Problem

If you show a picture to a three-year-old and ask if there is a tree in it, you will
likely get the correct answer. If you ask a thirty-year-old what the definition
of a tree is, you will likely get an inconclusive answer. We didn't learn what
a tree is by studying the mathematical definition of trees. We learned it by
looking at trees. In other words, we learned from 'data'.
Learning from data is used in situations where we don't have an analytic
solution, but we do have data that we can use to construct an empirical
solution. This premise covers a lot of territory, and indeed learning from data is
one of the most widely used techniques in science, engineering, and economics,
among other fields.
In this chapter, we present examples of learning from data and formalize
the learning problem. We also discuss the main concepts associated with
learning, and the different paradigms of learning that have been developed.

1.1 Problem Setup

What do financial forecasting, medical diagnosis, computer vision, and search
engines have in common? They all have successfully utilized learning from
data.
data. The repertoire of such applications is quite impressive. Let us open the
discussion with a real-life application to see how learning from data works.
Consider the problem of predicting how a movie viewer would rate the
various movies out there. This is an important problem if you are a company
that rents out movies, since you want to recommend to different viewers the
movies they will like. Good recommender systems are so important to business
that the movie rental company Netflix offered a prize of one million dollars to
anyone who could improve their recommendations by a mere 10%.
The main difficulty in this problem is that the criteria that viewers use to
rate movies are quite complex. Trying to model those explicitly is no easy task,
so it may not be possible to come up with an analytic solution. However, we


Figure 1.1: A model for how a viewer rates a movie. Viewer factors are
matched against movie factors, and the contributions from each factor are
added up to produce the predicted rating.

know that the historical rating data reveal a lot about how people rate movies,
so we may be able to construct a good empirical solution. There is a great
deal of data available to movie rental companies, since they often ask their
viewers to rate the movies that they have already seen.
Figure 1.1 illustrates a specific approach that was widely used in the
million-dollar competition. Here is how it works. You describe a movie as
a long array of different factors, e.g., how much comedy is in it, how
complicated is the plot, how handsome is the lead actor, etc. Now, you describe
each viewer with corresponding factors; how much do they like comedy, do
they prefer simple or complicated plots, how important are the looks of the
lead actor, and so on. How this viewer will rate that movie is now estimated
based on the match/mismatch of these factors. For example, if the movie is
pure comedy and the viewer hates comedies, the chances are he won't like it.
If you take dozens of these factors describing many facets of a movie's content
and a viewer's taste, the conclusion based on matching all the factors will be
a good predictor of how the viewer will rate the movie.
The power of learning from data is that this entire process can be
automated, without any need for analyzing movie content or viewer taste. To do
so, the learning algorithm 'reverse-engineers' these factors based solely on
previous ratings. It starts with random factors, then tunes these factors to make
them more and more aligned with how viewers have rated movies before, until
they are ultimately able to predict how viewers rate movies in general. The
factors we end up with may not be as intuitive as 'comedy content', and in
fact can be quite subtle or even incomprehensible. After all, the algorithm is
only trying to find the best way to predict how a viewer would rate a movie,
not necessarily explain to us how it is done. This algorithm was part of the
winning solution in the million-dollar competition.
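To make the factor-matching idea concrete, here is a minimal sketch in Python. The three factors, their numeric values, and the rescaling to a star rating are all illustrative assumptions; the actual prize-winning systems used many more factors, learned automatically rather than hand-assigned.

```python
# Hypothetical factors: [comedy content, plot complexity, lead-actor looks]
viewer_factors = [0.9, 0.2, 0.5]   # how much this viewer likes each facet
movie_factors = [0.8, 0.1, 0.7]    # how much the movie exhibits each facet

def predict_rating(viewer, movie, lo=1.0, hi=5.0):
    """Match viewer and movie by adding the contribution from each factor,
    then rescale the raw match score onto a star-rating range (illustrative)."""
    match = sum(v * m for v, m in zip(viewer, movie))
    return lo + (hi - lo) * match / len(viewer)

print(round(predict_rating(viewer_factors, movie_factors), 2))
```

A strong comedy movie shown to a comedy-loving viewer yields a high match score; a mismatch on the dominant factors drives the prediction down, which is exactly the behavior described above.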

1.1.1 Components of Learning
The movie rating application captures the essence of learning from data, and
so do many other applications from vastly different fields. In order to abstract
the common core of the learning problem, we will pick one application and
use it as a metaphor for the different components of the problem. Let us take
credit approval as our metaphor.
Suppose that a bank receives thousands of credit card applications every
day, and it wants to automate the process of evaluating them. Just as in the
case of movie ratings, the bank knows of no magical formula that can pinpoint
when credit should be approved, but it has a lot of data. This calls for learning
from data, so the bank uses historical records of previous customers to figure
out a good formula for credit approval.
Each customer record has personal information related to credit, such as
annual salary, years in residence, outstanding loans, etc. The record also keeps
track of whether approving credit for that customer was a good idea, i.e., did
the bank make money on that customer. This data guides the construction of
a successful formula for credit approval that can be used on future applicants.
Let us give names and symbols to the main components of this learning
problem. There is the input x (customer information that is used to make
a credit decision), the unknown target function f: X → Y (ideal formula for
credit approval), where X is the input space (set of all possible inputs x), and Y
is the output space (set of all possible outputs, in this case just a yes/no
decision). There is a data set D of input-output examples (x1, y1), ..., (xN, yN),
where yn = f(xn) for n = 1, ..., N (inputs corresponding to previous customers
and the correct credit decision for them in hindsight). The examples are often
referred to as data points. Finally, there is the learning algorithm that uses the
data set D to pick a formula g: X → Y that approximates f. The algorithm
chooses g from a set of candidate formulas under consideration, which we call
the hypothesis set H. For instance, H could be the set of all linear formulas
from which the algorithm would choose the best linear fit to the data, as we
will introduce later in this section.
When a new customer applies for credit, the bank will base its decision
on g (the hypothesis that the learning algorithm produced), not on f (the
ideal target function which remains unknown). The decision will be good only
to the extent that g faithfully replicates f. To achieve that, the algorithm

Figure 1.2: Basic setup of the learning problem. The unknown target
function f: X → Y (ideal credit approval formula) generates the training
examples (x1, y1), ..., (xN, yN). The learning algorithm uses these examples,
together with the hypothesis set H (set of candidate formulas), to produce
the final hypothesis g ≈ f (learned credit approval formula).

chooses g that best matches f on the training examples of previous customers,
with the hope that it will continue to match f on new customers. Whether
or not this hope is justified remains to be seen. Figure 1.2 illustrates the
components of the learning problem.
Exercise 1.1
Express each of the following tasks in the framework of learning from data by
specifying the input space X, output space Y, target function f: X → Y,
and the specifics of the data set that we will learn from.
(a) Medical diagnosis: A patient walks in with a medical history and some
symptoms, and you want to identify the problem.
(b) Handwritten digit recognition (for example postal zip code recognition
for mail sorting).
(c) Determining if an email is spam or not.
(d) Predicting how an electric load varies with price, temperature, and
day of the week.
(e) A problem of interest to you for which there is no analytic solution,
but you have data from which to construct an empirical solution.


We will use the setup in Figure 1.2 as our definition of the learning problem.
Later on, we will consider a number of refinements and variations to this basic
setup as needed. However, the essence of the problem will remain the same.
There is a target to be learned. It is unknown to us. We have a set of examples
generated by the target. The learning algorithm uses these examples to look
for a hypothesis that approximates the target.

1.1.2 A Simple Learning Model


Let us consider the different components of Figure 1.2. Given a specific
learning problem, the target function and training examples are dictated by the
problem. However, the learning algorithm and hypothesis set are not. These
are solution tools that we get to choose. The hypothesis set and learning
algorithm are referred to informally as the learning model.
Here is a simple model. Let X = ℝ^d be the input space, where ℝ^d is the
d-dimensional Euclidean space, and let Y = {+1, -1} be the output space,
denoting a binary (yes/no) decision. In our credit example, different
coordinates of the input vector x ∈ ℝ^d correspond to salary, years in residence,
outstanding debt, and the other data fields in a credit application. The
binary output y corresponds to approving or denying credit. We specify the
hypothesis set H through a functional form that all the hypotheses h ∈ H
share. The functional form h(x) that we choose here gives different weights to
the different coordinates of x, reflecting their relative importance in the credit
decision. The weighted coordinates are then combined to form a 'credit score'
and the result is compared to a threshold value. If the applicant passes the
threshold, credit is approved; if not, credit is denied:

    Approve credit if  ∑_{i=1}^{d} w_i x_i > threshold,

    Deny credit if  ∑_{i=1}^{d} w_i x_i < threshold.

This formula can be written more compactly as

    h(x) = sign( (∑_{i=1}^{d} w_i x_i) + b ),        (1.1)

where x_1, ..., x_d are the components of the vector x; h(x) = +1 means 'approve
credit' and h(x) = -1 means 'deny credit'; sign(s) = +1 if s > 0 and
sign(s) = -1 if s < 0.¹ The weights are w_1, ..., w_d, and the threshold is
determined by the bias term b since in Equation (1.1), credit is approved if
∑_{i=1}^{d} w_i x_i > -b.
This model of H is called the perceptron, a name that it got in the context
of artificial intelligence. The learning algorithm will search H by looking for
¹ The value of sign(s) when s = 0 is a simple technicality that we ignore for the moment.


Figure 1.3: Perceptron classification of linearly separable data in a
two-dimensional input space. (a) Misclassified data: some training examples
will be misclassified (blue points in red region and vice versa) for certain
values of the weight parameters which define the separating line.
(b) Perfectly classified data: a final hypothesis that classifies all training
examples correctly.

weights and bias that perform well on the data set. Some of the weights
w_1, ..., w_d may end up being negative, corresponding to an adverse effect on
credit approval. For instance, the weight of the 'outstanding debt' field should
come out negative since more debt is not good for credit. The bias value b
may end up being large or small, reflecting how lenient or stringent the bank
should be in extending credit. The optimal choices of weights and bias define
the final hypothesis g ∈ H that the algorithm produces.
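As a concrete sketch of this hypothesis, Equation (1.1) translates directly into code. The weight values, bias, and applicant fields below are made-up numbers for illustration, not taken from any real credit model:

```python
def sign(s):
    # The s = 0 case is a technicality we ignore, as noted in the text.
    return 1 if s > 0 else -1

def perceptron(weights, bias, x):
    """h(x) = sign(sum_i w_i * x_i + b): weighted credit score vs. threshold -b."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return sign(score + bias)

# Made-up weights for [salary, years in residence, outstanding debt]:
w = [0.4, 0.2, -0.5]    # debt gets a negative weight, as argued above
b = -1.0                # a fairly stringent bank: large negative bias

applicant = [3.0, 2.0, 1.0]   # hypothetical rescaled data fields
print(perceptron(w, b, applicant))
```

Raising the bias b makes the bank more lenient (more borderline applicants pass); lowering it makes the bank more stringent, which is the parameter asked about in Exercise 1.2(c) for the spam setting.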

Exercise 1.2
Suppose that we use a perceptron to detect spam messages. Let's say
that each email message is represented by the frequency of occurrence of
keywords, and the output is +1 if the message is considered spam.
(a) Can you think of some keywords that will end up with a large positive
weight in the perceptron?
(b) How about keywords that will get a negative weight?
(c) What parameter in the perceptron directly affects how many border-
line messages end up being classified as spam?

Figure 1.3 illustrates what a perceptron does in a two-dimensional case (d = 2).
The plane is split by a line into two regions, the +1 decision region and the -1
decision region. Different values for the parameters w_1, w_2, b correspond to
different lines w_1 x_1 + w_2 x_2 + b = 0. If the data set is linearly separable, there
will be a choice for these parameters that classifies all the training examples
correctly.


To simplify the notation of the perceptron formula, we will treat the bias b
as a weight w_0 = b and merge it with the other weights into one vector
w = [w_0, w_1, ..., w_d]ᵀ, where ᵀ denotes the transpose of a vector, so w is a
column vector. We also treat x as a column vector and modify it to become
x = [x_0, x_1, ..., x_d]ᵀ, where the added coordinate x_0 is fixed at x_0 = 1. Formally
speaking, the input space is now

    X = {1} × ℝ^d = { [x_0, x_1, ..., x_d]ᵀ : x_0 = 1, x_1 ∈ ℝ, ..., x_d ∈ ℝ }.

With this convention, wᵀx = ∑_{i=0}^{d} w_i x_i, and so Equation (1.1) can be
rewritten in vector form as

    h(x) = sign(wᵀx).        (1.2)
We now introduce the perceptron learning algorithm (PLA). The algorithm
will determine what w should be, based on the data. Let us assume that the
data set is linearly separable, which means that there is a vector w that
makes (1.2) achieve the correct decision h(xn) = yn on all the training
examples, as shown in Figure 1.3.
Our learning algorithm will find this w using a simple iterative method.
Here is how it works. At iteration t, where t = 0, 1, 2, ..., there is a current
value of the weight vector, call it w(t). The algorithm picks an example
from (x1, y1), ..., (xN, yN) that is currently misclassified, call it (x(t), y(t)), and
uses it to update w(t). Since the example is misclassified, we have
y(t) ≠ sign(wᵀ(t)x(t)). The update rule is

    w(t + 1) = w(t) + y(t)x(t).        (1.3)

This rule moves the boundary in the direction of classifying x(t) correctly, as
depicted in the figure above. The algorithm continues with further iterations
until there are no longer misclassified examples in the data set.
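The iterative process just described can be sketched in a few lines (a minimal implementation assuming linearly separable data; the tiny data set at the bottom is our own illustrative example, separable by the line x_2 = x_1):

```python
def sign(s):
    return 1 if s > 0 else -1

def pla(X, y):
    """Perceptron learning algorithm on linearly separable data.
    X: input vectors (without the x0 coordinate), y: labels in {+1, -1}."""
    X = [[1.0] + list(x) for x in X]      # prepend x0 = 1 (bias coordinate)
    w = [0.0] * len(X[0])                 # initialize w(0) to the zero vector
    while True:
        mis = [(x, yn) for x, yn in zip(X, y)
               if sign(sum(wi * xi for wi, xi in zip(w, x))) != yn]
        if not mis:
            return w                      # no misclassified examples left
        x, yn = mis[0]                    # cycle: first misclassified example
        w = [wi + yn * xi for wi, xi in zip(w, x)]   # update rule (1.3)

# Tiny separable data set: points above the line x2 = x1 are +1, below are -1.
X = [(0.0, 1.0), (1.0, 2.0), (1.0, 0.0), (2.0, 1.0)]
y = [1, 1, -1, -1]
w = pla(X, y)
print(w)
```

On this data the loop terminates after two updates; the convergence guarantee for general linearly separable data is the subject of Problem 1.3.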


Exercise 1.3

The weight update rule in (1.3) has the nice interpretation that it moves
in the direction of classifying x(t) correctly.
(a) Show that y(t)wᵀ(t)x(t) < 0. [Hint: x(t) is misclassified by w(t).]
(b) Show that y(t)wᵀ(t + 1)x(t) > y(t)wᵀ(t)x(t). [Hint: Use (1.3).]
(c) As far as classifying x(t) is concerned, argue that the move from w(t)
to w(t + 1) is a move 'in the right direction'.
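For reference, the computation behind this interpretation can be sketched as follows, using the update rule (1.3) and the fact that x_0(t) = 1, so x(t) ≠ 0:

```latex
% (a) x(t) is misclassified by w(t), i.e. \mathrm{sign}(w^{\mathsf{T}}(t)x(t)) \neq y(t),
%     so y(t) and w^{\mathsf{T}}(t)x(t) have opposite signs, hence
y(t)\,w^{\mathsf{T}}(t)\,x(t) < 0.

% (b) Substituting w(t+1) = w(t) + y(t)x(t) from (1.3):
y(t)\,w^{\mathsf{T}}(t+1)\,x(t)
  = y(t)\,w^{\mathsf{T}}(t)\,x(t) + y(t)^{2}\,\lVert x(t)\rVert^{2}
  > y(t)\,w^{\mathsf{T}}(t)\,x(t),

% since y(t)^{2} = 1 and \lVert x(t)\rVert^{2} > 0 (as x_{0}(t) = 1).
% (c) The quantity y(t)\,w^{\mathsf{T}}(t)\,x(t) strictly increases toward
% being positive, i.e. toward classifying x(t) correctly.
```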

Although the update rule in (1.3) considers only one training example at a
time and may 'mess up' the classification of the other examples that are not
involved in the current iteration, it turns out that the algorithm is guaranteed
to arrive at the right solution in the end. The proof is the subject of
Problem 1.3. The result holds regardless of which example we choose from among
the misclassified examples in (x1, y1), ..., (xN, yN) at each iteration, and
regardless of how we initialize the weight vector to start the algorithm. For
simplicity, we can pick one of the misclassified examples at random (or cycle
through the examples and always choose the first misclassified one), and we
can initialize w(0) to the zero vector.
Within the infinite space of all weight vectors, the perceptron algorithm
manages to find a weight vector that works, using a simple iterative process.
This illustrates how a learning algorithm can effectively search an infinite
hypothesis set using a finite number of simple steps. This feature is character­
istic of many techniques that are used in learning, some of which are far more
sophisticated than the perceptron learning algorithm.

Exercise 1.4
Let us create our own target function f and data set D and see how the
perceptron learning algorithm works. Take d = 2 so you can visualize the
problem, and choose a random line in the plane as your target function,
where one side of the line maps to +1 and the other maps to -1. Choose
the inputs xn of the data set as random points in the plane, and evaluate
the target function on each xn to get the corresponding output yn.
Now, generate a data set of size 20. Try the perceptron learning algorithm
on your data set and see how long it takes to converge and how well the
final hypothesis g matches your target f. You can find other ways to play
with this experiment in Problem 1.4.
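One possible way to set up this experiment is sketched below (a hedged sketch, not the canonical solution: the seed, the [-1, 1] ranges, and the random-misclassified-pick are all our own choices):

```python
import random

def sign(s):
    return 1 if s > 0 else -1

random.seed(0)   # reproducible run (our choice)

# Random target line a*x1 + b*x2 + c = 0; one side maps to +1, the other to -1.
a, b, c = (random.uniform(-1, 1) for _ in range(3))
f = lambda x1, x2: sign(a * x1 + b * x2 + c)

# Data set of size 20: random points in the plane, labeled by the target.
X = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
y = [f(x1, x2) for x1, x2 in X]

# PLA with augmented inputs [1, x1, x2] and w(0) = 0.
w = [0.0, 0.0, 0.0]
updates = 0
while True:
    mis = [(x, yn) for x, yn in zip(X, y)
           if sign(w[0] + w[1] * x[0] + w[2] * x[1]) != yn]
    if not mis:
        break                                   # converged: g classifies D correctly
    (x1, x2), yn = random.choice(mis)           # a random misclassified example
    w = [w[0] + yn, w[1] + yn * x1, w[2] + yn * x2]   # update rule (1.3)
    updates += 1

print("updates until convergence:", updates)
```

Plotting the target line against the learned line w[0] + w[1]x1 + w[2]x2 = 0 shows how well g matches f; rerunning with different seeds and data set sizes is the "other ways to play" the exercise mentions.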

The perceptron learning algorithm succeeds in achieving its goal: finding a
hypothesis that classifies all the points in the data set D = {(x1, y1), ..., (xN, yN)}
correctly. Does this mean that this hypothesis will also be successful in
classifying new data points that are not in D? This turns out to be the key question
in the theory of learning, a question that will be thoroughly examined in this
book.


Figure 1.4: The learning approach to coin classification. (a) Coin data:
training data of pennies, nickels, dimes, and quarters (1, 5, 10, and 25 cents)
are represented in a size-mass space where they fall into clusters.
(b) Learned classifier: a classification rule is learned from the data set by
separating the four clusters. A new coin will be classified according to the
region in the size-mass plane that it falls into.

1.1.3 Learning versus Design


So far, we have discussed what learning is. Now, we discuss what it is not. The
goal is to distinguish between learning and a related approach that is used for
similar problems. While learning is based on data, this other approach does
not use data. It is a 'design' approach based on specifications, and is often
discussed alongside the learning approach in pattern recognition literature.
Consider the problem of recognizing coins of different denominations, which
is relevant to vending machines, for example. We want the machine to
recognize quarters, dimes, nickels and pennies. We will contrast the 'learning from
data' approach and the 'design from specifications' approach for this
problem. We assume that each coin will be represented by its size and mass, a
two-dimensional input.
In the learning approach, we are given a sample of coins from each of
the four denominations and we use these coins as our data set. We treat
the size and mass as the input vector, and the denomination as the output.
Figure 1.4(a) shows what the data set may look like in the input space. There
is some variation of size and mass within each class, but by and large coins
of the same denomination cluster together. The learning algorithm searches
for a hypothesis that classifies the data set well. If we want to classify a new
coin, the machine measures its size and mass, and then classifies it according
to the learned hypothesis in Figure 1.4(b).
In the design approach, we call the United States Mint and ask them about
the specifications of different coins. We also ask them about the number


Figure 1.5: The design approach to coin classification. (a) Probabilistic
model of data: a probabilistic model for the size, mass, and denomination of
coins is derived from known specifications. The figure shows the high
probability region for each denomination (1, 5, 10, and 25 cents) according to the
model. (b) Inferred classifier: a classification rule is derived analytically to
minimize the probability of error in classifying a coin based on size and
mass. The resulting regions for each denomination are shown.

of coins of each denomination in circulation, in order to get an estimate of
the relative frequency of each coin. Finally, we make a physical model of
the variations in size and mass due to exposure to the elements and due to
errors in measurement. We put all of this information together and compute
the full joint probability distribution of size, mass, and coin denomination
(Figure 1.5(a)). Once we have that joint distribution, we can construct the
optimal decision rule to classify coins based on size and mass (Figure 1.5(b)).
The rule chooses the denomination that has the highest probability for a given
size and mass, thus achieving the smallest possible probability of error.²
The main difference between the learning approach and the design
approach is the role that data plays. In the design approach, the problem is well
specified and one can analytically derive f without the need to see any data.
In the learning approach, the problem is much less specified, and one needs
data to pin down what f is.
Both approaches may be viable in some applications, but only the learning
approach is possible in many applications where the target function is
unknown. We are not trying to compare the utility or the performance of the
two approaches. We are just making the point that the design approach is
distinct from learning. This book is about learning.

² This is called Bayes optimal decision theory. Some learning models are based on the
same theory by estimating the probability from data.


Exercise 1.5
Which of the following problems are more suited for the learning approach
and which are more suited for the design approach?
(a) Determining the age at which a particular medical test should be
performed
(b) Classifying numbers into primes and non-primes
(c) Detecting potential fraud in credit card charges
(d) Determining the time it would take a falling object to hit the ground
(e) Determining the optimal cycle for traffic lights in a busy intersection

1.2 Types of Learning

The basic premise of learning from data is the use of a set of observations to
uncover an underlying process. It is a very broad premise, and difficult to fit
into a single framework. As a result, different learning paradigms have arisen
to deal with different situations and different assumptions. In this section, we
introduce some of these paradigms.
The learning paradigm that we have discussed so far is called supervised
learning. It is the most studied and most utilized type of learning, but it is
not the only one. Some variations of supervised learning are simple enough
to be accommodated within the same framework. Other variations are more
profound and lead to new concepts and techniques that take on lives of their
own. The most important variations have to do with the nature of the data
set.

1.2.1 Supervised Learning

When the training data contains explicit examples of what the correct output
should be for given inputs, then we are within the supervised learning
setting that we have covered so far. Consider the hand-written digit recognition
problem (task (b) of Exercise 1.1). A reasonable data set for this problem is
a collection of images of hand-written digits, and for each image, what the
digit actually is. We thus have a set of examples of the form (image, digit).
The learning is supervised in the sense that some 'supervisor' has taken the
trouble to look at each input, in this case an image, and determine the correct
output, in this case one of the ten categories {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
While we are on the subject of variations, there is more than one way that
a data set can be presented to the learning process. Data sets are typically
created and presented to us in their entirety at the outset of the learning process.
For instance, historical records of customers in the credit-card application,
and previous movie ratings of customers in the movie rating application, are
already there for us to use. This protocol of a 'ready' data set is the most
