
Module 3

Fitting a Model to Data


What is a model?

In certain fields of Statistics & Economics, the bare model with unspecified
parameters is called “The Model”.

The Model: the model is a convenient fiction that necessarily glosses over
some of the details of the actual thing being modeled. It is meant to capture
the structure of the data as simply as possible.

The basic structure of a statistical model is:


data = model + error

We generally describe a model in terms of its parameters, which are values that we can change in order to modify the predictions of the model.
What is Model Fitting?
Model fitting measures how well a machine learning model generalizes to data similar to the data on which it was trained. The fitting process is generally built into models and is automatic. A well-fit model accurately approximates the output when given new data, producing more precise results.
Classification via Mathematical Functions
Decision Boundaries
A main purpose of creating homogeneous regions is that we can predict the target variable of a new, unseen instance by determining which segment it falls into.

For example, in Figure 4-1, if a new customer falls into the lower-left segment, we can conclude that the target value is very likely to be "•". Similarly, if it falls into the upper-right segment, we can predict its value as "+".
Classification via Mathematical Functions

[Figure panels: Instance Space; Linear Classifier]
Linear Discriminant Functions
Our goal is going to be to fit our model to the data, and to do so it is quite helpful to represent the model mathematically.

You may recall that the equation of a line in two dimensions is y = mx + b, where m is the slope of the line and b is the y intercept (the y value when x = 0).

The line in Figure 4-3 can be expressed in this form (with Balance in thousands) as:

Age = 1.5 × Balance − 60   (equivalently, 1.0 × Age − 1.5 × Balance + 60 = 0, as in the discriminant below)

We would classify an instance x as a + if it is above the line, and as a • if it is below the line.
For this example decision boundary, the classification solution is the linear discriminant:

class(x) = +  if 1.0 × Age − 1.5 × Balance + 60 > 0
class(x) = ●  if 1.0 × Age − 1.5 × Balance + 60 ≤ 0

We now have a parameterized model: the weights of the linear function are the parameters.

For our purposes, the important thing is that we can express the model as a weighted sum of the attribute values.

The weights are often loosely interpreted as importance indicators of the features.
Thus, this linear model is a different sort of multivariate supervised segmentation.

For example, consider our feature vector x, with the individual component features being xi. A linear model then can be written as follows in Equation 4-2.

Equation 4-2. A general linear model:

f(x) = w0 + w1·x1 + w2·x2 + ⋯

The concrete example from Equation 4-1 can be written in this form:

f(x) = 60 + 1.0 × Age − 1.5 × Balance

To use this model as a linear discriminant, for a given instance represented by a feature vector x, we check whether f(x) is positive or negative. In the two-dimensional case, this corresponds to seeing whether the instance x falls above or below the line.
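A minimal sketch in Python of using Equation 4-2 as a linear discriminant, with the Age/Balance weights from the example above; the helper names are our own.

```python
import numpy as np

def f(x, w0, w):
    """Equation 4-2: a weighted sum of the attribute values plus an intercept."""
    return w0 + np.dot(w, x)

def classify(x, w0, w):
    """Linear discriminant: '+' if f(x) is positive, '•' otherwise."""
    return "+" if f(x, w0, w) > 0 else "•"

# Weights from the concrete example above: f(x) = 60 + 1.0*Age - 1.5*Balance
# (Balance in thousands), so x = [Age, Balance].
w0, w = 60.0, np.array([1.0, -1.5])

print(classify(np.array([50.0, 40.0]), w0, w))   # f = 60 + 50 - 60 = 50   -> '+'
print(classify(np.array([25.0, 90.0]), w0, w))   # f = 60 + 25 - 135 = -50 -> '•'
```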
In Figure 4-5, there actually are many different linear discriminants that
can separate the classes perfectly.

They have very different slopes and intercepts, and each represents a
different model of the data. In fact, there are infinitely many lines (models)
that classify this training set perfectly.

Which should we pick?


Optimizing an Objective Function
What should be our goal or objective in choosing the parameters?

Our general procedure will be to define an objective function that represents our goal and can be calculated for a particular set of weights and a particular set of data. We will then find the optimal value for the weights by maximizing or minimizing the objective function.

What can easily be overlooked is that these weights are "best" only if we believe that the objective function truly represents what we want to achieve. Unfortunately, creating an objective function that matches the true goal of the data mining is usually impossible.

Several choices have been shown to be remarkably effective. One of these choices creates the so-called "support vector machine."

Linear regression, logistic regression, and support vector machines are all very similar instances of our basic fundamental technique: fitting a (linear) model to data.

The key difference is that each uses a different objective function.
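A rough sketch of that difference using scikit-learn (assumed available): on the same made-up data, three estimators all learn a weighted linear function (coef_ and intercept_), but each chooses its weights by optimizing a different objective.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                          # two made-up features
y = (X[:, 0] - 1.5 * X[:, 1] + 0.6 > 0).astype(int)    # a linearly separable label

# All three models are "a weighted sum of the attribute values";
# they differ in the objective function used to pick the weights.
models = {
    "least squares (squared error)": LinearRegression(),
    "logistic regression (log loss)": LogisticRegression(),
    "linear SVM (hinge loss / max margin)": SVC(kernel="linear"),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, "weights:", np.ravel(model.coef_), "intercept:", np.ravel(model.intercept_))
```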


Regression

• Technique used for the modeling and analysis of numerical data.
• Exploits the relationship between two or more variables so that we can gain information about one of them through knowing values of the other.
• Regression can be used for prediction, estimation, hypothesis testing, and modeling causal relationships.

Y = X1 + X2 + X3

Names for Y              Names for X1, X2, X3
Dependent Variable       Independent Variable
Outcome Variable         Predictor Variable
Response Variable        Explanatory Variable
Linear Regression
Linear regression is the type of regression that forms a relationship between the target variable and one or more independent variables using a straight line. The following equation represents the linear regression model:

Y = a + b*X + e

where a represents the intercept, b represents the slope of the regression line, e represents the error, and X and Y represent the predictor and target variables, respectively.

If X consists of more than one variable, the model is termed multiple linear regression.

In linear regression, the best-fit line is found using the least squares method, which minimizes the total sum of the squares of the deviations from each data point to the regression line. Here, the positive and negative deviations do not cancel out because all the deviations are squared.
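A minimal sketch of a least squares fit with NumPy; the X and Y values are made up for illustration, and a is the intercept, b the slope.

```python
import numpy as np

# Made-up data for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.0, 9.9])

# Least squares: choose a and b to minimize sum((Y - (a + b*X))**2).
# np.polyfit solves this in closed form; it returns coefficients in
# decreasing powers, so [slope, intercept] for a degree-1 fit.
b, a = np.polyfit(X, Y, deg=1)

residuals = Y - (a + b * X)          # the error term e for each point
print("intercept a:", a, "slope b:", b)
print("sum of squared errors:", np.sum(residuals ** 2))
```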
Linear Regression Line
A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:

Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, the relationship is termed a positive linear relationship.

Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, the relationship is termed a negative linear relationship.
Logistic Regression

This type of statistical model (also known as the logit model) is often used for classification and predictive analytics.

Logistic regression estimates the probability of an event occurring, such as voted or didn't vote, based on a given dataset of independent variables. Since the outcome is a probability, the dependent variable is bounded between 0 and 1.

In logistic regression, a logit transformation is applied to the odds, that is, the probability of success divided by the probability of failure. This is also commonly known as the log odds, or the natural logarithm of the odds. The logit and its inverse, the logistic (sigmoid) function, are given by the following formulas:

logit(pi) = ln(pi / (1 − pi)) = β0 + β1·x1 + ⋯ + βk·xk

pi = 1 / (1 + exp(−(β0 + β1·x1 + ⋯ + βk·xk)))
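A small sketch of the two formulas above; the function names are our own.

```python
import numpy as np

def logit(p):
    """Log odds: the natural logarithm of p / (1 - p)."""
    return np.log(p / (1 - p))

def logistic(z):
    """Logistic (sigmoid) function, the inverse of the logit; bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

p = 0.8
z = logit(p)                   # log odds of p = 0.8 is ln(4), about 1.386
print(z, logistic(z))          # applying the inverse recovers 0.8
```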
Logistic regression is a misnomer
• The distinction between classification and regression is whether the value of the target variable is categorical or numeric.
• For logistic regression, the model produces a numeric estimate.
• However, the values of the target variable in the data are categorical.
• Logistic regression estimates the probability of class membership (a numeric quantity) over a categorical class.
• Logistic regression is a class probability estimation model and not a regression model.
An Example of Mining a Linear Discriminant from Data

• To illustrate linear discriminant functions, we use an adaptation of the Iris dataset.
• This is an old and fairly simple dataset representing various types of iris, a genus of flowering plant.
• The original dataset includes three species of irises represented with four attributes, and the data mining problem is to classify each instance as belonging to one of the three species based on the attributes.
• For this illustration we'll use just two species of irises, Iris Setosa and Iris Versicolor.

An Example of Mining a Linear Discriminant from Data
• The dataset describes a collection of flowers of these two species, each described with two measurements: the Petal width and the Sepal width (Figure 4-6).
Classifying Flowers
• The flower dataset is plotted in Figure 4-7, with these two attributes on the x and y axes, respectively.
• Each instance is one flower and corresponds to one dot on the graph.
• The filled dots are of the species Iris Setosa and the circles are instances of the species Iris Versicolor.
Classifying Flowers
• Two different separation lines are shown in the figure, one generated by logistic regression and the second by another linear method, a support vector machine (which will be described shortly).
• Note that the data comprise two fairly distinct clumps, with a few outliers. Logistic regression separates the two classes completely: all the Iris Versicolor examples are to the left of its line and all the Iris Setosa to the right.
• The support vector machine line is almost midway between the clumps, though it misclassifies the starred point at (3, 1).
• Notice that the methods produce different boundaries because they're optimizing different functions.
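A rough sketch of this comparison with scikit-learn (assumed available). As in the figures, we keep only Iris Setosa and Iris Versicolor and use petal width and sepal width; the fitted coefficients will differ in detail from the lines in the figure.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

iris = load_iris()
mask = iris.target < 2                      # keep Iris Setosa (0) and Iris Versicolor (1)
X = iris.data[mask][:, [3, 1]]              # petal width (cm), sepal width (cm)
y = iris.target[mask]

for model in (LogisticRegression(), LinearSVC(max_iter=10000)):
    model.fit(X, y)
    w, b = model.coef_[0], model.intercept_[0]
    # Each decision boundary is the line w[0]*petal_width + w[1]*sepal_width + b = 0;
    # the two methods give different lines because they optimize different objectives.
    print(type(model).__name__, "weights:", w, "intercept:", b)
```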
Ranking Instances and Probability Class Estimation
• In many applications, we don't simply want a yes or no prediction of whether an instance belongs to the class; we want some notion of which examples are more or less likely to belong to the class. For example:
• Which consumers are most likely to respond to this offer?
• Which customers are most likely to leave when their contracts expire?

• Ranking
• Tree induction
• Linear discriminant functions (e.g., linear regressions, logistic regressions,
SVMs)
• Ranking is free
• Class Probability Estimation
• Tree induction
• Logistic regression
The many faces of classification:
Classification / Probability Estimation / Ranking

Increasing difficulty: Classification → Ranking → Probability estimation

Ranking:
• Business context determines the number of actions (“how far down the
list”)

Probability:
• You can always rank / classify if you have probabilities!
Ranking: Examples
• Search engines
• Whether a document is relevant to a topic / query
Class Probability Estimation: Examples
• MegaTelCo
• Ranking vs. Class Probability Estimation.

• Identify accounts or transactions as likely to have been defrauded.
• The director of the fraud control operation may want the analysts to focus not simply on the cases most likely to be fraud, but on accounts where the expected monetary loss is higher.
• We need to estimate the actual probability of fraud.


Logistic regression (“sigmoid”) curve
Application of Logistic Regression
• The Wisconsin Breast Cancer Dataset

Wisconsin Breast Cancer dataset
• From each of these basic characteristics, three values were computed: the mean (_mean), the standard error (_SE), and the "worst" or largest value (_worst).
Support Vector Machines
In machine learning, support vector machines (SVMs, also support vector networks) are
supervised learning models with associated learning algorithms that analyze data for
classification and regression analysis.

Find a linear hyperplane (decision boundary) that will separate the data
Support Vector Machines

[Figures: several possible linear decision boundaries, labeled B1 and B2, each of which separates the training data]

Which one is better, B1 or B2? How do you define "better"?
Definitions

Define the hyperplane H such that:
    xi · w + b ≥ +1 when yi = +1
    xi · w + b ≤ −1 when yi = −1

H1 and H2 are the planes:
    H1: xi · w + b = +1
    H2: xi · w + b = −1

The points on the planes H1 and H2 are the support vectors.

d+ = the shortest distance to the closest positive point
d− = the shortest distance to the closest negative point

The margin of a separating hyperplane is d+ + d−.
Maximizing the margin
We want a classifier with as big a margin as possible.

Recall that the distance from a point (x0, y0) to the line Ax + By + c = 0 is |Ax0 + By0 + c| / sqrt(A² + B²).

The distance between H and H1 is: |w · x + b| / ||w|| = 1 / ||w||
The distance between H1 and H2 is: 2 / ||w||

In order to maximize the margin, we need to minimize ||w||, with the condition that there are no data points between H1 and H2:
    xi · w + b ≥ +1 when yi = +1
    xi · w + b ≤ −1 when yi = −1
These can be combined into yi(xi · w + b) ≥ 1.
Optimization Problem
Support Vector Machines

For the boundary B1, the separating hyperplane is w · x + b = 0, with the margin hyperplanes w · x + b = −1 and w · x + b = +1 (labeled b11 and b12 in the figure).

f(x) = +1 if w · x + b ≥ 1
f(x) = −1 if w · x + b ≤ −1

Margin = 2 / ||w||
Support Vector Machines

[Figure: boundaries B1 and B2 with their margin hyperplanes (b11, b12 and b21, b22) and the margin of B1]

Find the hyperplane that maximizes the margin => B1 is better than B2.
Support Vector Machines

We want to maximize: Margin = 2 / ||w||

which is equivalent to minimizing: L(w) = ||w||² / 2

subject to the following constraints:
    w · xi + b ≥ +1 if yi = +1
    w · xi + b ≤ −1 if yi = −1

This is a constrained optimization problem; numerical approaches (e.g., quadratic programming) are used to solve it.
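A small sketch that solves this problem numerically with scikit-learn's linear SVM (the hard margin is approximated with a large C on made-up, separable data), then recovers w, b, and the margin 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (made-up data), labels in {-1, +1}.
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5],
              [4.0, 4.0], [4.5, 3.5], [5.0, 4.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin problem:
# minimize ||w||^2 / 2 subject to y_i (w . x_i + b) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
print("w:", w, "b:", b)
print("margin 2/||w||:", 2.0 / np.linalg.norm(w))
print("support vectors (the points on H1 and H2):")
print(clf.support_vectors_)
```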
Loss Functions
• Loss functions define what a good prediction is and isn't. In short, choosing the right loss function dictates how well your estimator performs.
• Loss functions measure how far an estimated value is from its true value.
• A loss function maps decisions to their associated costs.
• Loss functions are not fixed; they change depending on the task at hand and the goal to be met.
Loss Functions
• Zero-one loss assigns a loss of zero for a correct decision and one for an incorrect decision.
• Squared error specifies a loss proportional to the square of the distance from the boundary.
• Squared error loss usually is used for numeric value prediction (regression), rather than classification.
• The squaring of the error has the effect of greatly penalizing predictions that are grossly wrong.
Hinge Loss functions
• In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification.
• Support vector machines use hinge loss.
• The hinge loss is a special type of cost function that not only penalizes misclassified samples but also correctly classified ones that are within a defined margin from the decision boundary.

Hinge Loss functions
• Hinge loss incurs no penalty for an example that is not on the wrong side of the margin.
• The hinge loss only becomes positive when an example is on the wrong side of the boundary and beyond the margin.
• Loss then increases linearly with the example's distance from the margin.
• It penalizes points more the farther they are from the separating boundary, as in the sketch below.
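A compact sketch of the three losses for a single example with true label y in {−1, +1} and raw score f(x); the function names are our own.

```python
def zero_one_loss(y, score):
    """Zero for a correct decision (same sign as y), one for an incorrect one."""
    return 0.0 if y * score > 0 else 1.0

def squared_loss(y, score):
    """Loss proportional to the squared difference between target and prediction."""
    return (y - score) ** 2

def hinge_loss(y, score):
    """No penalty beyond the margin on the correct side (y * score >= 1);
    otherwise the loss grows linearly with the distance from the margin."""
    return max(0.0, 1.0 - y * score)

y = +1
for score in (2.0, 0.5, -1.0):   # confidently correct, inside the margin, wrong side
    print(score, zero_one_loss(y, score), squared_loss(y, score), hinge_loss(y, score))
```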
Characteristics of SVM
The learning problem is formulated as a convex optimization problem
• Efficient algorithms are available to find the global minima.
• Many of the other methods use greedy approaches and find locally
optimal solutions.
• High computational complexity for building the model.
• Robust to noise.

• Overfitting is handled by maximizing the margin of the decision boundary.

• SVM can handle irrelevant and redundant attributes better than many
other techniques.

• The user needs to provide the type of kernel function and cost function.
• Difficult to handle missing values.
Simple Neural Network
Non-linear Functions
• Linear functions can actually represent nonlinear models if we include more complex features in the functions, as sketched below.