Module 3.1
Figure 4-3. The dataset of Figure 4-2 with a single linear split.
• This is called a linear classifier and is essentially a weighted sum of the values for
the various attributes.
Linear Discriminant Functions
• Our goal is going to be to fit our model to the data, and to do so it is quite helpful to
represent the model mathematically. You may recall that the equation of a line in two
dimensions is y = mx + b, where m is the slope of the line and b is the y intercept (the y
value when x = 0). The line in Figure 4-3 can be expressed in this form (with Balance in
thousands) as:
• Age = ( - 1.5) × Balance + 60
• We would classify an instance x as a + if it is above the line, and as a • if it is below the
line. Rearranging this mathematically leads to the function that is the basis of all the
techniques discussed in this chapter. First, for this example, the classification function is
shown in Equation 4-1.
Equation 4-1. Classification function

class(x) = + if 1.0 × Age - 1.5 × Balance + 60 > 0
           • if 1.0 × Age - 1.5 × Balance + 60 ≤ 0
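• To make Equation 4-1 concrete, here is a minimal Python sketch of the same rule; the function name and the use of Balance in thousands are illustrative choices, not code from the original example.

```python
def classify(age, balance_in_thousands):
    """Equation 4-1: the sign of the weighted sum decides the class."""
    f = 1.0 * age - 1.5 * balance_in_thousands + 60
    return "+" if f > 0 else "•"   # "•" is the other (filled-dot) class

print(classify(40, 20))   # "+"  (40 - 30 + 60 = 70 > 0)
print(classify(40, 80))   # "•"  (40 - 120 + 60 = -20 ≤ 0)
```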
• This is called a linear discriminant because it discriminates between the classes,
and the function of the decision boundary is a linear combination (a weighted sum)
of the attributes.
• A linear discriminant function is a numeric classification model. For example,
consider our feature vector x, with the individual component features being xi. A
linear model then can be written as follows in Equation 4-2.
• Equation 4-2. A general linear model

f(x) = w0 + w1x1 + w2x2 + ⋯
Figure 4-4. A basic instance space in two dimensions containing points of two classes.
• The concrete example from Equation 4-1 can be written in this form:
• f (x) = 60 + 1.0 × Age - 1.5 × Balance
• To use this model as a linear discriminant, for a given instance represented by a
feature vector x, we check whether f(x) is positive or negative. As discussed above,
in the two-dimensional case, this corresponds to seeing whether the instance x falls
above or below the line.
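• As a sketch of the general form in Equation 4-2, the same check can be written with a weight vector and a dot product (NumPy is assumed; the numbers simply reuse the Age/Balance example above).

```python
import numpy as np

def f(x, w0, w):
    """General linear model (Equation 4-2): f(x) = w0 + w1*x1 + w2*x2 + ..."""
    return w0 + np.dot(w, x)

w0, w = 60.0, np.array([1.0, -1.5])   # intercept and weights on (Age, Balance in thousands)
x = np.array([40.0, 80.0])            # Age = 40, Balance = $80,000
print(f(x, w0, w))                    # -20.0: negative, so the "•" class
```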
• The data mining procedure will “fit” this parameterized model to a particular dataset,
meaning specifically that it will find a good set of weights on the features.
Figure 4-5. Many different possible linear boundaries can separate the two
groups of points of Figure 4-4.
Optimizing an Objective Function
• Our general procedure will be to define an objective function that represents our
goal, and can be calculated for a particular set of weights and a particular set of
data. We will then find the optimal value for the weights by maximizing or
minimizing the objective function.
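• As a minimal sketch of this idea (not the procedure any particular package uses), the objective below is squared error for a linear model, minimized with plain gradient descent; the learning rate and step count are arbitrary choices.

```python
import numpy as np

def objective(w, X, y):
    """Squared-error objective for a linear model (intercept column already in X)."""
    return np.sum((X @ w - y) ** 2)

def fit(X, y, steps=2000, lr=0.01):
    """Find weights that approximately minimize the objective by gradient descent."""
    X = np.column_stack([np.ones(len(X)), X])    # prepend a column of 1s for w0
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)    # gradient of the mean squared error
        w -= lr * grad
    return w                                     # w[0] is the intercept
```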
• Logistic regression doesn’t really do what we call regression, which is the
estimation of a numeric target value. Logistic regression applies linear models to
class probability estimation, which is particularly useful for many applications.
Figure 4-8. The points of Figure 4-2 and the maximal margin classifier.
• The idea of maximizing the margin is intuitively satisfying for the following
reason. The training dataset is just a sample from some population. In predictive
modeling, we are interested in predicting the target for instances that we have not
yet seen. These instances will be scattered about. Hopefully they will be distributed
similarly to the training data, but they will in fact be different points. In particular,
some of the positive examples will likely fall closer to the discriminant boundary
than any positive example we have yet seen.
• The penalty for a misclassified point is proportional to the distance from the
decision boundary, so if possible the SVM will make only “small” errors.
Technically, this error function is known as hinge loss.
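• A small sketch of the two loss functions in Figure 4-9, written as functions of the signed distance from the boundary (positive means the correct side); the specific margin values are made up for illustration.

```python
import numpy as np

def zero_one_loss(margin):
    """1 for a misclassified point, 0 otherwise; ignores how far off the point is."""
    return np.where(margin > 0, 0.0, 1.0)

def hinge_loss(margin):
    """Zero once the point is on the correct side by a sufficient margin;
    grows linearly the farther the point is on the wrong side."""
    return np.maximum(0.0, 1.0 - margin)

margins = np.array([2.0, 0.5, -1.0, -3.0])
print(zero_one_loss(margins))   # [0. 0. 1. 1.]
print(hinge_loss(margins))      # [0.  0.5 2.  4. ]
```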
Figure 4-9. Two loss functions illustrated. The x axis shows the distance from the
decision boundary. The y axis shows the loss incurred by a negative instance as a
function of its distance from the decision boundary. (The case of a positive instance is
symmetric.) If the negative instance is on the negative side of the boundary, there is
no loss. If it is on the positive (wrong) side of the boundary, the different loss
functions penalize it differently.
Regression via Mathematical Functions
• The linear regression model structure is exactly the same as for the linear
discriminant function.
• We use p+(x) to represent the model’s estimate of the probability of class membership of a
data item represented by feature vector x.
• The estimated probability of the event not occurring is therefore 1 - p+(x).
• Equation 4-3. Log-odds linear function

log( p+(x) / (1 - p+(x)) ) = f(x) = w0 + w1x1 + w2x2 + ⋯
• Thus, Equation 4-3 specifies that for a particular data item, described by feature-vector x, the
log-odds of the class is equal to our linear function, f(x). Since often we actually want the
estimated probability of class membership, not the log-odds, we can solve for p+(x) in
Equation 4-3. This yields the not-so-pretty quantity in Equation 4-4.
• Equation 4-4. The logistic function

p+(x) = 1 / (1 + e^(-f(x)))
Figure 4-10. Logistic regression’s estimate of class probability as a function of
f(x), (i.e., the distance from the separating boundary). This curve is called a
“sigmoid” curve because of its “S” shape, which squeezes the probabilities into
their correct range (between zero and one).
• Figure 4-10 plots the estimated probability p+(x) (vertical axis) as a function of the
distance from the decision boundary (horizontal axis). The figure shows that at the
decision boundary (at distance x = 0), the probability is 0.5 (a coin toss).
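• A quick sketch of Equation 4-4 confirms the coin-toss value at the boundary (NumPy assumed).

```python
import numpy as np

def p_plus(f_x):
    """Equation 4-4: the logistic (sigmoid) function mapping f(x) to a probability."""
    return 1.0 / (1.0 + np.exp(-f_x))

print(p_plus(0.0))    # 0.5, a coin toss at the decision boundary
print(p_plus(3.0))    # ~0.95, well onto the positive side
print(p_plus(-3.0))   # ~0.05, well onto the negative side
```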
• Consider the following function, which computes the “likelihood” that a particular
labeled example belongs to the correct class, given a set of parameters w that produces
class probability estimates p+(x):

g(x, w) = p+(x) if x is a +
          1 - p+(x) if x is a •
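• Summing the logs of g over a labeled training set gives the log-likelihood that logistic regression maximizes. A minimal sketch, assuming the labels y are coded 1 for the + class and 0 for the • class:

```python
import numpy as np

def log_likelihood(w0, w, X, y):
    """Sum of log g(x, w) over the data: log p+(x) for positive examples,
    log(1 - p+(x)) for the others. Training searches for w that maximizes this."""
    p = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))          # p+(x) for every row of X
    return np.sum(np.where(y == 1, np.log(p), np.log(1.0 - p)))
```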
Example: Logistic Regression versus Tree Induction
• A classification tree uses decision boundaries that are perpendicular to the instance
space axes (see Figure 4-1), whereas the linear classifier can use decision
boundaries of any direction or orientation (see Figure 4-3).
• A classification tree is a “piecewise” classifier that segments the instance space
recursively when it has to, using a divide-and-conquer approach.
Figure 4-11. One of the cell images from which the Wisconsin Breast Cancer
dataset was derived. (Image courtesy of Nick Street and Bill Wolberg.)
Each example describes characteristics of a cell nucleus image, which has
been labeled as either benign or malignant (cancerous), based on an expert’s
diagnosis of the cells. A sample cell image is shown in Figure 4-11.
Table 4-3. The attributes of the Wisconsin Breast Cancer dataset.
• Attribute name Description
• RADIUS Mean of distances from center to points on the perimeter
• TEXTURE Standard deviation of grayscale values
• PERIMETER Perimeter of the mass
• AREA Area of the mass
• SMOOTHNESS Local variation in radius lengths
• COMPACTNESS Computed as: perimeter²/area – 1.0
• CONCAVITY Severity of concave portions of the contour
• CONCAVE POINTS Number of concave portions of the contour
• SYMMETRY A measure of the symmetry of the nucleus
• FRACTAL DIMENSION 'Coastline approximation' – 1.0
• DIAGNOSIS (Target) Diagnosis of cell sample: malignant or benign
• Table 4-4. Linear equation learned by logistic regression on the Wisconsin Breast
Cancer dataset (see text and Table 4-3 for a description of the attributes).
• Attribute Weight (learned parameter)
• SMOOTHNESS_worst 22.3
• CONCAVE_mean 19.47
• CONCAVE_worst 11.68
• SYMMETRY_worst 4.99
• CONCAVITY_worst 2.86
• CONCAVITY_mean 2.34
• RADIUS_worst 0.25
• TEXTURE_worst 0.13
• AREA_SE 0.06
• TEXTURE_mean 0.03
• TEXTURE_SE –0.29
• COMPACTNESS_mean –7.1
• COMPACTNESS_SE –27.87
• w0 (intercept) –17.7
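• As a rough illustration (not the exact run behind Table 4-4), scikit-learn ships a copy of the Wisconsin Diagnostic Breast Cancer data, so a comparable model can be fit in a few lines; the learned weights will differ from the table because of scaling and solver details.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data.data, data.target)

# The learned weights play the same role as Table 4-4 (values will not match it).
weights = model.named_steps["logisticregression"].coef_[0]
top = sorted(zip(data.feature_names, weights), key=lambda nw: -abs(nw[1]))[:5]
for name, w in top:
    print(f"{name}: {w:.2f}")
```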
Nonlinear Functions, Support Vector Machines, and Neural Networks
• In Figure 4-12 we show that such linear functions can actually represent nonlinear
models, if we include more complex features in the functions.
• Here both linear models are given an additional feature, Sepal width², so the resulting
model is a curved line (a parabola) in the original feature space. We also added a single
data point to the original dataset, an Iris Versicolor example.
• The two most common families of techniques that are based on fitting the
parameters of complex, nonlinear functions are nonlinear support vector machines
and neural networks.
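• For instance, with scikit-learn (just one possible library), swapping a linear kernel for a nonlinear RBF kernel is a one-argument change; the Iris data here simply stands in for any dataset.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
linear_svm = SVC(kernel="linear").fit(X, y)          # linear decision boundaries
rbf_svm = SVC(kernel="rbf").fit(X, y)                # nonlinear boundaries via the kernel trick
print(linear_svm.score(X, y), rbf_svm.score(X, y))   # training accuracy of each
```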
Figure 4-12. The Iris dataset with a nonlinear feature. In this figure, logistic
regression and a support vector machine (both linear models) are provided an
additional feature, Sepal width², which gives both the freedom to create more
complex, nonlinear models (boundaries), as shown.
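• A minimal sketch of the same trick, assuming scikit-learn’s copy of the Iris data and treating Iris Versicolor versus the rest as the target: adding Sepal width² as an extra feature lets a linear model draw a parabolic boundary in the original two dimensions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, [0, 1]]                   # sepal length, sepal width
y = (iris.target == 1).astype(int)         # 1 = Iris Versicolor, 0 = other species

X_sq = np.column_stack([X, X[:, 1] ** 2])  # add Sepal width^2 as a third feature

flat = LogisticRegression(max_iter=1000).fit(X, y)        # straight-line boundary
curved = LogisticRegression(max_iter=1000).fit(X_sq, y)   # parabola in the original space
print(flat.score(X, y), curved.score(X_sq, y))
```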