Lecture 02: Supervised Learning (27/10/2022)
1. ML in a Nutshell
2. Representation, Evaluation, Optimization
3. Types of Learning
4. Trade-offs in Machine Learning
Supervised Learning
A model applies a function f() to an item x drawn from an input space X and returns an item y = f(x) drawn from an output space Y. An algorithm is required to design the model.
We consider models that apply a function f() to input items x and return an output y = f(x).
In (supervised) machine learning, we deal with systems whose f(x) is learned from examples.
We typically use machine learning when the function f(x) we want to apply is unknown to us, and we cannot simply "think" it up ourselves.
Supervised Learning Settings
Produce useful predictions (on unseen data)
Supervised Learning
Example of a labelled dataset (ID, sex, age, amount, binary attributes/label):
001  M  19  20,00  0  0
021  F  20  22,00  1  0
031  F  34  45,00  1  0
041  M  23  25,00  0  0
082  M  22  22,00  1  0
092  F  21  21,00  0  0
120  M  50  60,00  0  0
Supervised Learning: Examples
Disease diagnosis
x: Properties of patient (symptoms, lab tests)
f : Disease (or maybe: recommended therapy)
Part-of-Speech tagging
x: An English sentence (e.g., The can will rust)
f : The part of speech of a word in the sentence
Face recognition
x: Bitmap picture of person’s face
f : The name of the person (or maybe: a property of the person)
Automatic Steering
x: Bitmap picture of road surface in front of car
f : Degrees to turn the steering wheel
Good features are essential
Ranking:
Labels are ordinal
Learn an ordering f(x1) > f(x2) over inputs
Views of Learning
Learning is the removal of our remaining uncertainty:
Suppose we knew that the unknown function was an m-of-n Boolean
function, then we could use the training data to infer which function it is.
Learning requires guessing a good, small hypothesis class:
We can start with a very small class and enlarge it until it contains a hypothesis that fits the data.
We could be wrong!
Our prior knowledge might be wrong.
Our guess of the hypothesis space could be wrong.
If our guess about the unknown function is wrong, then we will make errors when we are given new examples and are asked to predict the value of the function.
Spam Detection Example
Suppose there are 10,000 email messages, each with a label, either "spam" or "not_spam" (such labels could be added manually).
We convert each email message into a feature vector.
How to convert a real-world entity, such as an email message, into a feature vector?
One common way to convert a text into a feature vector, called bag of words, is to take a dictionary of English words (let's say it contains 20,000 alphabetically sorted words) and stipulate that, in the feature vector:
the first feature is equal to 1 if the email message contains the word "a"; otherwise, this feature is 0;
the second feature is equal to 1 if the email message contains the word "aaron"; otherwise, this feature equals 0;
...
the feature at position 20,000 is equal to 1 if the email message contains the word "zulu"; otherwise, this feature is equal to 0.
Repeating the above procedure for every email message in the collection gives us 10,000 feature vectors (each with a dimensionality of 20,000), each paired with a label ("spam"/"not_spam").
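A minimal Python sketch of this bag-of-words encoding; the tiny vocabulary and the two example messages below are made up for illustration (a real dictionary would hold about 20,000 alphabetically sorted words):

```python
# A binary bag-of-words encoding (hypothetical 6-word vocabulary for illustration).
vocabulary = ["a", "aaron", "buy", "cheap", "meeting", "zulu"]

def to_feature_vector(message, vocab):
    # 1 if the vocabulary word occurs in the message, otherwise 0
    words = set(message.lower().split())
    return [1 if w in words else 0 for w in vocab]

emails = [("buy cheap meds now", "spam"),
          ("meeting agenda for monday", "not_spam")]

# One (feature vector, label) pair per email; vector dimensionality = len(vocabulary).
dataset = [(to_feature_vector(text, vocabulary), label) for text, label in emails]
print(dataset)
```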
Spam Detection
Now the input data is ready, but the labels still need to be expressed in a form the learning algorithm can work with.
For example, some algorithms require numbers like 0 (to represent the label "not_spam") and 1 (to represent the label "spam").
The algorithm used here to illustrate supervised learning is called Support Vector Machine (SVM).
This algorithm requires that the positive label (in our case, "spam") has the numeric value of +1 (one), and the negative label ("not_spam") has the value of -1 (minus one).
Once we have a dataset and a learning algorithm, we apply the learning algorithm to the dataset to get the model.
SVM
SVM sees every feature vector as a point in a high-dimensional space (in our case, the space is 20,000-dimensional).
The algorithm puts all feature vectors on an imaginary 20,000- dimensional plot and draws an
imaginary 20,000-dimensional line (a hyperplane) that separates examples with positive
labels from examples with negative labels.
In machine learning, the boundary separating the examples of different classes is called the
decision boundary.
The equation of the hyperplane is given by two parameters: a real-valued vector w of the same dimensionality as our input feature vector x, and a real number b, like this:
wx − b = 0
where the expression wx means w(1)x(1) + w(2)x(2) + ... + w(D)x(D), and D is the number of dimensions of the feature vector x.
Now, the predicted label for some input feature vector x is given like this:
y = sign(wx − b)
where sign is a mathematical operator that takes any value as input and returns +1 if the input is a positive number or -1 if the input is a negative number.
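As a sketch, this prediction rule can be written in a few lines of Python; the weight vector, bias and input below are made-up toy values, not learned parameters:

```python
import numpy as np

def predict(w, b, x):
    # y = sign(wx - b): +1 means "spam", -1 means "not_spam"
    return 1 if np.dot(w, x) - b > 0 else -1

# Made-up 3-dimensional toy values (a real spam model would be 20,000-dimensional).
w = np.array([0.4, -1.2, 0.7])
b = 0.1
x = np.array([1, 0, 1])
print(predict(w, b, x))  # prints 1
```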
SVM
The goal of the learning algorithm (SVM in this case) is to leverage the dataset and find the optimal values w* and b* for the parameters w and b. Once the learning algorithm identifies these optimal values, the model f(x) is then defined as:
f(x) = sign(w*x − b*)
Now, how does the machine find w* and b*? It solves an optimization problem. Machines are good at optimizing functions under constraints.
So what are the constraints we want to satisfy here? First of all, we want the model to predict the labels of our 10,000 examples correctly. Remember that each example i = 1, ..., 10000 is given by a pair (xi, yi), where xi is the feature vector of example i and yi is its label, which takes the value -1 or +1. So the constraints are naturally:
wxi − b ≥ +1 if yi = +1, and
wxi − b ≤ −1 if yi = −1.
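A small illustrative check of these constraints (they can be written compactly as yi(wxi − b) ≥ 1), assuming NumPy; the parameters and the two examples are made up, not the result of training:

```python
import numpy as np

def satisfies_constraints(w, b, X, y):
    # Checks y_i * (w·x_i - b) >= 1 for every training example,
    # which is equivalent to the two constraints written above.
    return all(yi * (np.dot(w, xi) - b) >= 1 for xi, yi in zip(X, y))

# Made-up parameters and a two-example toy dataset.
X = np.array([[2.0, 1.0], [-1.5, -2.0]])
y = np.array([+1, -1])
print(satisfies_constraints(np.array([1.0, 1.0]), 0.0, X, y))  # True
```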
SVM
Preferably, the hyperplane should separate positive examples from negative ones with the largest margin.
The margin is the distance between the closest examples of the two classes, as defined by the decision boundary. A large margin contributes to better generalization, that is, how well the model will classify new examples in the future. To achieve a large margin, the SVM minimizes the Euclidean norm of w, denoted ||w||.
Why, by minimizing the norm of w, do we find the highest margin between the two classes? Geometrically, the equations wx − b = 1 and wx − b = −1 define two parallel hyperplanes, and the distance between them is given by 2/||w||, so the smaller the norm ||w||, the larger the distance between these two hyperplanes.
This particular version of the algorithm builds the so-called linear model. It’s called linear
because the decision boundary is a straight line (or a plane, or a hyperplane).
SVM can also incorporate kernels that can make the decision boundary arbitrarily non-linear. In some cases, it may be impossible to perfectly separate the two groups of points because of noise in the data, labeling errors, or outliers (examples very different from a "typical" example in the dataset).
Another version of SVM can also incorporate a penalty hyperparameter for misclassification of
training examples of specific classes.
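For illustration, a kernelized SVM with a misclassification penalty can be fit with scikit-learn's SVC (assuming scikit-learn is installed; the four training points below are made up):

```python
from sklearn.svm import SVC

# Made-up 2-dimensional training data with +1/-1 labels.
X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [-1, +1, +1, -1]

# kernel="rbf" allows a non-linear decision boundary;
# C is the penalty hyperparameter for misclassified training examples.
model = SVC(kernel="rbf", C=1.0)
model.fit(X, y)
print(model.predict([[0.9, 0.8]]))
```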
Evaluating the Performance of Supervised Learning
Machines learn by means of a loss function, which is a method of evaluating how well a specific algorithm models the given data. If the predictions deviate too much from the actual results, the loss function produces a very large number. Gradually, with the help of an optimization function, the parameters are adjusted so as to reduce the error in prediction.
There is no one-size-fits-all loss function in machine learning. Various factors are involved in choosing a loss function for a specific problem, such as the type of machine learning algorithm chosen, the ease of calculating the derivatives and, to some degree, the percentage of outliers in the dataset.
Loss functions play an important role in any statistical model: they define an objective against which the performance of the model is evaluated, and the parameters learned by the model are determined by minimizing the chosen loss function.
Loss Functions
There are two major categories, depending on the type of learning task we are dealing with:
Regression losses: regression deals with predicting a continuous value. A few well-known loss functions are Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Bias Error (MBE) and Mean Squared Logarithmic Error (MSLE).
Classification losses: in classification, we deal with categorical values. A few well-known loss functions are Binary Cross Entropy Loss and Hinge Loss.
Mean Absolute Error (MAE) / L1 Loss
Regression problems may involve variables that are not strictly Gaussian in nature due to the presence of outliers (values that are very different from the rest of the data).
Mean Absolute Error can be an ideal option in such cases because it does not take the direction of the errors into account and is not dominated by unrealistically high positive or negative values.
MAE is the average of the absolute differences between the actual and the predicted values. For a true value yi, its predicted value ŷi, and n the total number of data points in the dataset, the mean absolute error is defined as:
MAE = (1/n) Σi |yi − ŷi|
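A minimal NumPy sketch of this definition (the values are made up for illustration):

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # MAE = (1/n) * sum(|y_i - ŷ_i|)
    return np.mean(np.abs(np.array(y_true) - np.array(y_pred)))

print(mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # 0.8333...
```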
Mean Squared Error (MSE) / L2 Loss
Preferred by researchers, because most variables can be modelled by a Gaussian distribution.
Mean Squared Error is the average of the squared differences between the actual and the predicted values. For a true value yi, its predicted value ŷi, and n the total number of data points in the dataset, the mean squared error is defined as:
MSE = (1/n) Σi (yi − ŷi)²
It is only concerned with the average magnitude of the errors, irrespective of their direction. However, due to the squaring, predictions that are far away from the actual values are penalized heavily in comparison with less deviated predictions. MSE also has nice mathematical properties that make it easier to calculate gradients.
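The corresponding NumPy sketch for MSE, using the same made-up values as the MAE example so the heavier penalty on the larger error is visible:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # MSE = (1/n) * sum((y_i - ŷ_i)^2); squaring penalizes large errors heavily
    return np.mean((np.array(y_true) - np.array(y_pred)) ** 2)

print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # 1.4166...
```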
Mean Absolute Error, in contrast, is measured as the average of the absolute differences between predictions and actual observations. Like MSE, it measures the magnitude of the error without considering its direction. Unlike MSE, MAE needs more complicated tools, such as linear programming, to compute the gradients. On the other hand, MAE is more robust to outliers since it does not involve squaring.
Mean Bias Error
Mean Bias Error takes the actual difference between the target and the predicted value, not the absolute difference. One has to be cautious, as the positive and the negative errors can cancel each other out, which is why it is one of the lesser-used loss functions.
Mean Bias Error is used to calculate the average bias in the model. Bias, in a nutshell, is overestimating or underestimating a parameter. Corrective measures can be taken to reduce the bias after evaluating the model using MBE.
MBE = (1/n) Σi (yi − ŷi)
where yi is the true value, ŷi is the predicted value and n is the total number of data points in the dataset.
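A minimal NumPy sketch of MBE (made-up values; note how positive and negative errors partially cancel):

```python
import numpy as np

def mean_bias_error(y_true, y_pred):
    # MBE = (1/n) * sum(y_i - ŷ_i); signed errors can cancel each other out
    return np.mean(np.array(y_true) - np.array(y_pred))

print(mean_bias_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))  # -0.5 (over-prediction on average)
```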
Mean Squared Logarithmic Error (MSLE)
Sometimes, one may not want to penalize the model too much for predicting unscaled quantities directly. Relaxing the penalty on huge differences can be done with the help of Mean Squared Logarithmic Error.
Calculating the Mean Squared Logarithmic Error is the same as calculating the Mean Squared Error, except that the natural logarithms of the predicted and actual values are used rather than the raw values:
MSLE = (1/n) Σi (log(yi + 1) − log(ŷi + 1))²
where yi is the true value, ŷi is the predicted value and n is the total number of data points in the dataset.
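A minimal NumPy sketch of MSLE (made-up values; np.log1p computes log(1 + x)):

```python
import numpy as np

def mean_squared_log_error(y_true, y_pred):
    # MSLE = (1/n) * sum((log(1 + y_i) - log(1 + ŷ_i))^2)
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

# A large absolute error (100 vs 90) contributes far less than it would under MSE.
print(mean_squared_log_error([100.0, 10.0], [90.0, 11.0]))
```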
Binary Cross Entropy Loss
Entropy is a measure of the randomness in the information being processed, and cross entropy is a measure of the difference in randomness between two random variables.
The cross-entropy loss increases as the predicted probability diverges from the actual label. For example, predicting a probability of 0.011 when the actual observation label is 1 would result in a high loss value. In an ideal situation, a "perfect" model would have a log loss of 0. Looking at the loss function makes things even clearer:
BCE = −(1/n) Σi [ yi log(ŷi) + (1 − yi) log(1 − ŷi) ]
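A minimal NumPy sketch of binary cross entropy, reproducing the 0.011-versus-1 example above:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # BCE = -(1/n) * sum(y_i*log(ŷ_i) + (1 - y_i)*log(1 - ŷ_i))
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Predicting 0.011 when the true label is 1 gives a large loss (≈ 4.51).
print(binary_cross_entropy([1.0], [0.011]))
```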
Hinge Loss / Multi-class SVM Loss
Hinge loss was primarily developed for support vector machines, for calculating the maximum margin from the hyperplane to the classes.
Loss functions penalize wrong predictions and do not penalize right predictions. So, the score of the target label should be greater than the score of each incorrect label by a margin of (at least) one:
SVM loss for example i = Σ over j ≠ yi of max(0, sj − syi + 1)
Hinge Loss / Multi-class SVM Loss
Consider an example where we have three training examples and three classes to predict: dog, cat and horse. Below are the values predicted by our algorithm for each of the classes. Computing the hinge losses for all 3 training examples:
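Since the table of predicted scores does not survive in these notes, the sketch below uses hypothetical scores for a single training example to show how the per-example hinge loss would be computed:

```python
import numpy as np

def multiclass_hinge_loss(scores, correct_index, margin=1.0):
    # Sum over the incorrect classes of max(0, s_j - s_correct + margin)
    s_correct = scores[correct_index]
    return sum(max(0.0, s - s_correct + margin)
               for j, s in enumerate(scores) if j != correct_index)

# Hypothetical scores for the classes (dog, cat, horse); the true class is "dog".
scores = np.array([-0.39, 1.49, 4.21])
print(multiclass_hinge_loss(scores, correct_index=0))  # 2.88 + 5.60 = 8.48
```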
Summary
Learning?