Week 4

M2Lab

Machine Learning
(112-1: EE5184)
劉子毓 Joyce Liu

1

Outline
● A quick review of the materials last week

● Some methods related to LDA and QDA


○ Kernel density classification
○ Naive Bayes

● Generalized additive models


○ Smoothing Splines
○ Additive logistic regression algorithm

● Tree-based methods

Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. New York: springer, 2017. [Author’s pdf link]
2

A quick review of the materials last week

3

Linear regression of an indicator matrix


Can we apply the linear regression method to this problem?

● Suppose each of the response categories is coded via an indicator variable. If the class output variable G has K classes, there will be K indicators, Yk, k = 1, . . . , K, with Yk = 1 if G = k and 0 otherwise. These are collected together in a vector Y = (Y1, . . . , YK).
● Fit a linear regression model to each of the columns of Y simultaneously.

● X is the model matrix with p+1 columns corresponding to the p inputs, and a leading column of 1’s for the intercept.
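For reference, the least-squares fit in this setting (ESL, Section 4.2) is
\[
\hat{\mathbf{Y}} = \mathbf{X}(\mathbf{X}^{\mathsf T}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{Y} = \mathbf{X}\hat{\mathbf{B}},
\]
and a new observation with input x is classified by computing the fitted vector \(\hat f(x)^{\mathsf T} = (1, x^{\mathsf T})\hat{\mathbf{B}}\) and choosing the largest component, \(\hat G(x) = \arg\max_k \hat f_k(x)\).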

4

Linear discriminant analysis (LDA)


● Suppose fk(x) is the class-conditional density of X in class G = k, and let
πk be the prior probability of class k.

● The class posteriors can be written via Bayes’ theorem, as shown below.

● Assume that we model each class density as a multivariate Gaussian.

● Linear discriminant analysis (LDA) arises in the special case when we assume that the classes have a common covariance matrix.
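In the textbook’s notation, the Bayes-theorem expression for the class posteriors is
\[
\Pr(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{\ell=1}^{K} f_\ell(x)\,\pi_\ell}.
\]
With Gaussian class densities sharing a common covariance matrix, the log posterior odds between any two classes are linear in x, which is why the resulting decision boundaries are linear.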
5

Linear discriminant analysis (LDA)


● Now we know how to make a prediction given the parameters of the Gaussian distributions. The remaining question is: how do we estimate these parameters from the training data?
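For reference, the standard estimates (ESL, Section 4.3), with Nk the number of class-k observations, are
\[
\hat\pi_k = \frac{N_k}{N}, \qquad
\hat\mu_k = \frac{1}{N_k}\sum_{g_i = k} x_i, \qquad
\hat\Sigma = \frac{1}{N-K}\sum_{k=1}^{K}\sum_{g_i = k} (x_i - \hat\mu_k)(x_i - \hat\mu_k)^{\mathsf T}.
\]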

6

Quadratic discriminant analysis (QDA)


Note that in LDA, we assumed that the covariance matrices of each class are the same. Without
the assumption, the convenient cancellation would not occur, and the pieces quadratic in x remain.
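For reference, the two discriminant functions, written in the textbook’s notation, are
\[
\text{LDA: } \delta_k(x) = x^{\mathsf T}\Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^{\mathsf T}\Sigma^{-1}\mu_k + \log\pi_k,
\qquad
\text{QDA: } \delta_k(x) = -\tfrac{1}{2}\log|\Sigma_k| - \tfrac{1}{2}(x-\mu_k)^{\mathsf T}\Sigma_k^{-1}(x-\mu_k) + \log\pi_k,
\]
and in both cases we classify to the class with the largest \(\delta_k(x)\): linear in x for LDA, quadratic in x for QDA.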


7

Logistic regression

For a binary response with a 0/1 coding as above, if we use linear regression, some of our estimates might be outside the [0, 1] interval, making them hard to interpret as probabilities!

8

Logistic regression
After some calculations, one can show that the class probabilities take the form below, and these probabilities sum to 1. In what follows, we denote the full parameter set by θ and write pk(x; θ) = Pr(G = k | X = x; θ).
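For reference, the multiclass logistic model expresses the posteriors as
\[
\Pr(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^{\mathsf T} x)}{1 + \sum_{\ell=1}^{K-1}\exp(\beta_{\ell 0} + \beta_\ell^{\mathsf T} x)}, \quad k = 1, \ldots, K-1,
\qquad
\Pr(G = K \mid X = x) = \frac{1}{1 + \sum_{\ell=1}^{K-1}\exp(\beta_{\ell 0} + \beta_\ell^{\mathsf T} x)},
\]
with parameter set \(\theta = \{\beta_{10}, \beta_1^{\mathsf T}, \ldots, \beta_{(K-1)0}, \beta_{K-1}^{\mathsf T}\}\).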

9

Separating hyperplanes

We have seen that LDA and logistic regression both estimate linear decision boundaries in similar but slightly different ways. Now, we will describe separating hyperplane classifiers, in which we construct linear decision boundaries that explicitly try to separate the data into different classes as much as possible.

10

Optimal separating hyperplanes


The optimal separating hyperplane separates the two classes and maximizes the
distance to the closest point from either class.

Note that this optimization is conducted over all training points. The set of
conditions ensure that all the points are at least a signed distance M from the
decision boundary defined by beta and beta_0.
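For reference, the optimization problem being described is
\[
\max_{\beta,\,\beta_0,\,\lVert\beta\rVert = 1} \; M
\quad \text{subject to} \quad
y_i(x_i^{\mathsf T}\beta + \beta_0) \ge M, \quad i = 1, \ldots, N.
\]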

11

Support vector classifier


● Define the slack variables ξi ≥ 0 and incorporate them into the optimization problem.
The two formulations, the optimal separating hyperplane versus the support vector classifier, are contrasted below.
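For reference, the constraints in the two problems are
\[
\text{optimal separating hyperplane: } y_i(x_i^{\mathsf T}\beta + \beta_0) \ge M,
\qquad
\text{support vector classifier: } y_i(x_i^{\mathsf T}\beta + \beta_0) \ge M(1 - \xi_i),
\]
with \(\xi_i \ge 0\) and \(\sum_{i=1}^{N}\xi_i\) bounded by a constant, so that a limited total amount of margin violation is tolerated.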

12

Support vector machines (SVM)


● The support vector classifier we described so far finds linear boundaries in the
input feature space.

● We can represent the support vector classifier optimization problem and its solution in a special way that only involves the input features via inner products.

● Suppose we transform our input features x first using h(x). The solution f(x) can then be written as shown below.
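For reference, the solution function has the form
\[
f(x) = h(x)^{\mathsf T}\hat\beta + \hat\beta_0
      = \sum_{i=1}^{N} \hat\alpha_i\, y_i \,\langle h(x), h(x_i)\rangle + \hat\beta_0,
\]
so that only inner products of the transformed features are needed; replacing the inner product by a kernel \(K(x, x') = \langle h(x), h(x')\rangle\) gives \(f(x) = \sum_i \hat\alpha_i y_i K(x, x_i) + \hat\beta_0\).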

13

The SVM as a penalization method


Consider the optimization problem shown below, in which the “+” notation indicates the positive part. This formulation has the loss + penalty format; the loss term is known as the hinge loss. The optimization problem is equivalent to the support vector classifier problem described earlier.
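For reference, the penalization form of the problem is
\[
\min_{\beta_0,\,\beta} \;\; \sum_{i=1}^{N}\bigl[\,1 - y_i f(x_i)\,\bigr]_{+} \;+\; \frac{\lambda}{2}\lVert\beta\rVert^2,
\qquad f(x) = h(x)^{\mathsf T}\beta + \beta_0,
\]
where the tuning parameter λ plays the same role as the cost/budget parameter in the support vector classifier formulation.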

14

Some methods related to LDA and QDA

15

Kernel density classification


● Kernel density estimation:
○ Suppose we have a random sample x1, x2, …, xN drawn from a probability density function
fx(x), and we wish to estimate fx at a point x0.
○ A natural local estimate has the form below, in which N(x0) is a small metric neighborhood
around x0 of width λ.

○ The above estimate could be bumpy, and often a smooth estimate is preferred.

○ A popular choice for Kλ is the Gaussian kernel.
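For reference, the two estimates being referred to are the rough local estimate
\[
\hat f_X(x_0) = \frac{\#\{x_i \in N(x_0)\}}{N\lambda},
\]
and its smooth (Parzen) counterpart
\[
\hat f_X(x_0) = \frac{1}{N\lambda}\sum_{i=1}^{N} K_\lambda(x_0, x_i),
\]
which, with the Gaussian kernel \(K_\lambda(x_0, x) = \phi\!\bigl(|x - x_0|/\lambda\bigr)\), counts each observation with a weight that decays smoothly with its distance from x0.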


16

Kernel density classification


● An example of kernel density
estimation is shown on the right.

● Kernel density classification: We can apply kernel density estimation to classification by using Bayes’ theorem.

● Suppose we fit a separate density estimate f̂j(X) in each class j, together with estimates π̂j of the class priors (usually the class sample proportions). The resulting posterior estimate is shown below.
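Combining the per-class density estimates with the priors via Bayes’ theorem gives
\[
\widehat{\Pr}(G = j \mid X = x_0) = \frac{\hat\pi_j \hat f_j(x_0)}{\sum_{k=1}^{J}\hat\pi_k \hat f_k(x_0)}.
\]
A minimal Python sketch of this idea using scikit-learn’s KernelDensity (the function name and the bandwidth value here are illustrative, not from the slides):

import numpy as np
from sklearn.neighbors import KernelDensity

def kde_classify(X_train, y_train, X_test, bandwidth=1.0):
    # Fit one Gaussian kernel density estimate per class and combine with the
    # estimated class priors via Bayes' theorem.
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    classes = np.unique(y_train)
    priors = np.array([np.mean(y_train == c) for c in classes])
    kdes = [KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_train[y_train == c])
            for c in classes]
    # score_samples returns log f_hat_j(x); add the log priors and take the argmax
    log_post = np.column_stack([kde.score_samples(X_test) for kde in kdes]) + np.log(priors)
    return classes[np.argmax(log_post, axis=1)]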

17

Naive Bayes
Another popular technique is to assume that given a class G=j, the features Xk are
independent.
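Concretely, the independence assumption means that the class-conditional density factors into a product of marginals:
\[
f_j(X) = \prod_{k=1}^{p} f_{jk}(X_k).
\]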

The assumption is generally not true, but it simplifies the estimation.


● The individual class-conditional marginal densities fjk can each be estimated separately using one-dimensional kernel density estimates. The original naive Bayes model uses univariate Gaussians to represent the marginals.
● If an input Xk is discrete, an appropriate histogram estimate can be used instead.

18

Naive Bayes
We can further derive the logit-transform as follows.
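For reference, with a base class J the derivation gives
\[
\log\frac{\Pr(G=\ell \mid X)}{\Pr(G=J \mid X)}
= \log\frac{\pi_\ell f_\ell(X)}{\pi_J f_J(X)}
= \log\frac{\pi_\ell}{\pi_J} + \sum_{k=1}^{p}\log\frac{f_{\ell k}(X_k)}{f_{Jk}(X_k)}
= \alpha_\ell + \sum_{k=1}^{p} g_{\ell k}(X_k).
\]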

It has the form of a generalized additive model, which we will discuss further later
in the class.
19

Generalized additive models

20

Smoothing splines
● We have made use of models linear in the input features, both for regression and
classification. For example, linear regression, linear discriminant analysis, logistic regression
and separating hyperplanes all rely on a linear model.

● Here we will take a look at an example for moving beyond linearity.

● The idea is to augment/replace the vector of inputs X with additional variables, which are
transformations of X, and then use linear models in this new space of derived input features.

21

Smoothing splines

● This is a linear basis expansion in X.

● Once the basis functions hm have been determined, the models are linear in these new
variables, and the fitting proceeds as before.

● Examples of commonly used basis functions hm are given below.

Read Section 5.2 on piecewise polynomials and splines.
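For reference, the linear basis expansion referred to above is
\[
f(X) = \sum_{m=1}^{M}\beta_m h_m(X),
\]
and typical choices of the basis functions include \(h_m(X) = X_m\) (recovering the original linear model), \(h_m(X) = X_j^2\) or \(X_j X_k\) (polynomial terms), \(h_m(X) = \log(X_j)\) or \(\sqrt{X_j}\) (nonlinear transformations), and \(h_m(X) = I(L_m \le X_k < U_m)\) (piecewise-constant indicators).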

22

Smoothing splines
Consider the following problem.

The first term measures closeness to the data, while the second term penalizes
curvature in the function, and λ establishes a tradeoff between the two.
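For reference, the criterion is the penalized residual sum of squares
\[
\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N}\bigl\{y_i - f(x_i)\bigr\}^2 + \lambda\int \bigl\{f''(t)\bigr\}^2\,dt.
\]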

23

Smoothing splines
● The solution is a natural cubic spline with knots at the unique values of the xi.

● Nj(x) are an N-dimensional set of basis functions for representing this family
of natural splines (See section 5.2).

● The loss function RSS, its solution, and the fitted spline then take the matrix form shown below.
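Writing \(\{\mathbf N\}_{ij} = N_j(x_i)\) and \(\{\Omega_N\}_{jk} = \int N_j''(t)N_k''(t)\,dt\), we have
\[
f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j,
\qquad
\mathrm{RSS}(\theta, \lambda) = (\mathbf y - \mathbf N\theta)^{\mathsf T}(\mathbf y - \mathbf N\theta) + \lambda\,\theta^{\mathsf T}\Omega_N\theta,
\]
with solution
\[
\hat\theta = (\mathbf N^{\mathsf T}\mathbf N + \lambda\Omega_N)^{-1}\mathbf N^{\mathsf T}\mathbf y,
\qquad
\hat f(x) = \sum_{j=1}^{N} N_j(x)\,\hat\theta_j.
\]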


24

Smoothing splines
A smoothing spline is an example of a linear smoother. This is because the estimated parameters are a linear combination of the yi.

The N-vector of the fitted values at the training predictors xi is also linear in y.
The finite linear operator S𝛌 is known as the smoother matrix.
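For reference, the fitted vector can be written as
\[
\hat{\mathbf f} = \mathbf N(\mathbf N^{\mathsf T}\mathbf N + \lambda\Omega_N)^{-1}\mathbf N^{\mathsf T}\mathbf y \equiv \mathbf S_\lambda \mathbf y.
\]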

25

Smoothing splines

The figure on the right shows the results of applying a cubic smoothing spline to some air pollution data. The larger lambda is, the smoother the fit becomes.
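As a practical aside (not from the slides), one convenient way to fit a cubic smoothing spline in Python is scipy’s UnivariateSpline; the data below are synthetic, and the smoothing factor s controls the roughness, with larger s giving a smoother fit, similar in spirit to increasing λ:

import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))            # predictors must be in increasing order
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

rough = UnivariateSpline(x, y, k=3, s=2)        # small s: wigglier fit
smooth = UnivariateSpline(x, y, k=3, s=20)      # large s: smoother fit
grid = np.linspace(0, 10, 200)
rough_fit, smooth_fit = rough(grid), smooth(grid)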

26

Generalized additive models


● We just discussed the smoothing spline, in which the model is a flexible, possibly nonlinear function of a single predictor.

● What about the case when we have multiple predictors?


● Recall that in the 2nd week, we extended linear regression to multiple regression.

● Here, we can extend the smoothing spline to generalized additive models.

27

Generalized additive models

● This is helpful since in real life, effects are often not linear.

● In general, the conditional mean 𝜇(X) of a response Y, e.g., 𝜇(X) = P(Y=1|X), is related to an additive function of the predictors via a link function g, as shown below.

● For example, using the logit link g(𝜇) = logit(𝜇) = log(𝜇(X)/(1-𝜇(X))) together with a linear additive function is equivalent to ordinary logistic regression.

● The additive logistic regression model replaces each linear term by a more general functional form.

● In the generalized additive model, not all of the functions fj need to be nonlinear; we can mix in linear and other nonlinear terms.
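For reference, the three forms being referred to are the additive model
\[
Y = \alpha + \sum_{j=1}^{p} f_j(X_j) + \varepsilon,
\]
the general link form
\[
g\bigl[\mu(X)\bigr] = \alpha + f_1(X_1) + \cdots + f_p(X_p),
\]
and, with the logit link, the additive logistic regression model
\[
\log\frac{\mu(X)}{1-\mu(X)} = \alpha + f_1(X_1) + \cdots + f_p(X_p).
\]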
28

Fitting additive models


The additive model, and a criterion like the penalized sum of squares for fitting it given observations (xi, yi), are written below.

The idea is to iteratively solve for each function fj. When solving for a function fj, we
can adopt what we learned in smoothing splines.
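For reference, the model and the penalized criterion are
\[
Y = \alpha + \sum_{j=1}^{p} f_j(X_j) + \varepsilon,
\qquad
\mathrm{PRSS}(\alpha, f_1, \ldots, f_p) = \sum_{i=1}^{N}\Bigl(y_i - \alpha - \sum_{j=1}^{p} f_j(x_{ij})\Bigr)^2 + \sum_{j=1}^{p}\lambda_j\int f_j''(t_j)^2\,dt_j.
\]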

29

Fitting additive models
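The iterative procedure just described is the backfitting algorithm: initialize α̂ as the mean of the yi and set each f̂j to zero; then repeatedly cycle over j = 1, ..., p, smoothing the partial residuals yi − α̂ − Σk≠j f̂k(xik) against Xj and re-centering each f̂j to have mean zero. A minimal Python sketch of this idea, using scipy smoothing splines as the per-coordinate smoothers (function and parameter names are illustrative, not from the slides):

import numpy as np
from scipy.interpolate import UnivariateSpline

def backfit(X, y, n_iter=20, s=None):
    # Backfitting sketch for an additive model y ~ alpha + sum_j f_j(X_j),
    # with univariate smoothing splines as the coordinate-wise smoothers.
    N, p = X.shape
    alpha = y.mean()
    fitted = np.zeros((N, p))                # current values f_j(x_ij)
    for _ in range(n_iter):
        for j in range(p):
            # partial residuals: remove the intercept and all other functions
            r = y - alpha - fitted.sum(axis=1) + fitted[:, j]
            order = np.argsort(X[:, j])      # UnivariateSpline needs increasing x
            spline = UnivariateSpline(X[order, j], r[order], k=3, s=s)
            fj = spline(X[:, j])
            fitted[:, j] = fj - fj.mean()    # center each f_j for identifiability
    return alpha, fitted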

30

Tree-based methods

31

Tree-based methods
Tree-based methods partition the feature space into a set of rectangles, and then
fit a simple model in each one. Here we introduce tree-based regression and
classification, known as CART (Classification And Regression Trees).
● Suppose we have a regression problem with
continuous response Y and inputs X1 and
X2.

● In each partition element we can model Y with a different constant.

● However, not every partitioning line has a simple description like X1 = c. Some of the resulting regions are complicated to describe.
32

Tree-based methods
To simplify, we restrict to recursive binary partitions.

● We first split the space into two regions and model the response by the mean of Y in each region.
● Then one or both of these regions are split further
into two more regions.
● Continue the process.

The result of this process is a partition into the five regions R1, ..., R5, and the corresponding regression model predicts Y with a constant cm in region Rm.

Now, let’s look at the details for regression trees and classification trees respectively.

33

Regression tree
● Our data consists of p inputs and a response, for each of N observations:

(xi, yi) for i =1, 2,..., N, with xi=(xi1, xi2, …, xip)

● Suppose we have a partition into M regions, R1,..., RM, and we model the
response as a constant cm.

● If we adopt as our criterion minimization of the sum of squares, the solution is the average of yi in region Rm.
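For reference, the regression-tree model and the optimal constants are
\[
f(x) = \sum_{m=1}^{M} c_m\, I(x \in R_m),
\qquad
\hat c_m = \operatorname{ave}(y_i \mid x_i \in R_m).
\]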

34

Regression tree
Finding the best binary partition in terms of minimum sum of squares is generally
computationally infeasible. We proceed with a greedy algorithm.

● Start with all of the data, consider a splitting variable j and a split point s, and define the pair of half-planes shown below.

● Seek the splitting variable j and split point s that solve the minimization problem below.

● For any choice of j and s, the inner minimization is solved by the average of yi within each half-plane.

● Having found the best split, we partition the data into the two resulting regions and repeat the splitting process on each of the two regions.
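For reference, the half-planes, the splitting criterion, and the inner solutions are
\[
R_1(j, s) = \{X \mid X_j \le s\}, \qquad R_2(j, s) = \{X \mid X_j > s\},
\]
\[
\min_{j,\, s}\Bigl[\;\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\Bigr],
\qquad
\hat c_1 = \operatorname{ave}(y_i \mid x_i \in R_1(j,s)), \quad \hat c_2 = \operatorname{ave}(y_i \mid x_i \in R_2(j,s)).
\]
A minimal Python sketch of the greedy search for a single best split (the function name is illustrative, not from the slides):

import numpy as np

def best_split(X, y):
    # Scan every variable j and every observed value s of X_j, and return the
    # split minimizing the total within-region sum of squares. Real CART code
    # also handles minimum node sizes, ties, categorical inputs, etc.
    N, p = X.shape
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(p):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss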
35

Regression tree
We have the algorithm to grow the tree. But how large should we grow the tree?

● Strategy 1: Split tree nodes only if the decrease in sum-of-squares due to the
split exceeds some threshold.
○ However, sometimes a seemingly worthless split might lead to a good split below it. This
strategy can be short-sighted.

● Strategy 2: Grow a large tree. Stop the splitting process only when some
minimum node size is reached. Then prune the tree using cost-complexity
pruning.

36

Regression tree

The parameter α governs the tradeoff between tree size and its goodness of fit to the data.
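For reference, the cost-complexity criterion is defined, for a subtree T of the large tree with |T| terminal nodes, by
\[
N_m = \#\{x_i \in R_m\}, \qquad
\hat c_m = \frac{1}{N_m}\sum_{x_i \in R_m} y_i, \qquad
Q_m(T) = \frac{1}{N_m}\sum_{x_i \in R_m} (y_i - \hat c_m)^2,
\]
\[
C_\alpha(T) = \sum_{m=1}^{|T|} N_m\, Q_m(T) + \alpha\,|T|.
\]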

37

Classification tree
If the target is a classification outcome taking values 1, 2, …, K, the only changes needed in the tree algorithm pertain to the criteria for splitting nodes and pruning the tree.

● In a node m, representing a region Rm with Nm observations, let p̂mk denote the proportion of class k observations in node m, as defined below.

● We classify the observations in node m to the majority class in that node.

● We need to define the measure of node impurity suitable for classification.
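For reference, the node proportions and the classification rule are
\[
\hat p_{mk} = \frac{1}{N_m}\sum_{x_i \in R_m} I(y_i = k),
\qquad
k(m) = \arg\max_k \hat p_{mk}.
\]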

38

Classification tree

For two classes, suppose p is the proportion in the second class.

● Misclassification error = 1 - max(p, 1-p)
● Gini index = 2p(1-p)
● Cross-entropy = -p log p - (1-p) log(1-p)

39

Classification tree

● Cross-entropy and the Gini index are differentiable and hence more amenable to
numerical optimization.
● Cross-entropy and the Gini index are more sensitive to changes in the node
probabilities than the misclassification rate.
○ Suppose we have (400, 400) observations, i.e., 400 observations in each class in a 2-class problem.
■ Scenario 1: a split that created nodes (300, 100) and (100, 300)
■ Scenario 2: a split that created nodes (200, 400) and (200, 0)
○ Both Gini index and cross-entropy are lower for scenario 2. As for misclassification error, these two
scenarios achieved the same error rate. Note that the 2nd scenario produces a pure node and is
probably preferable.
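A minimal Python sketch that checks these numbers (written for this example, not taken from the slides): it computes the size-weighted impurity of each candidate split under the three measures.

import numpy as np

def weighted_impurity(nodes, measure):
    # nodes: list of (n_class1, n_class2) counts; returns the size-weighted impurity
    total = sum(n1 + n2 for n1, n2 in nodes)
    out = 0.0
    for n1, n2 in nodes:
        n = n1 + n2
        p = n2 / n                               # proportion in the second class
        if measure == "misclass":
            imp = 1 - max(p, 1 - p)
        elif measure == "gini":
            imp = 2 * p * (1 - p)
        else:                                    # cross-entropy, with 0 log 0 = 0
            imp = -sum(q * np.log(q) for q in (p, 1 - p) if q > 0)
        out += (n / total) * imp
    return out

splits = {"scenario 1": [(300, 100), (100, 300)],
          "scenario 2": [(200, 400), (200, 0)]}
for name, nodes in splits.items():
    print(name, {m: round(weighted_impurity(nodes, m), 3)
                 for m in ("misclass", "gini", "entropy")})
# Misclassification error is 0.25 for both splits, while the Gini index and
# cross-entropy are lower for scenario 2, which produces a pure node.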
40

Homework

● Reading materials
● Colab experiment

41

Reading materials after this week’s class


● Some methods related to LDA and QDA
○ Kernel density classification (Ch 6.6.2)
○ Naive Bayes (Ch 6.6.3)

● Generalized additive models (Ch 9.1)


○ Smoothing Splines (Ch 5.4)
○ Additive logistic regression algorithm (Ch 9.1)

● Tree-based methods (Ch 9.2)

Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. New York: springer, 2017. [Author’s pdf link]

42

Colab Experiment: Breast cancer classification


● Data: Breast cancer wisconsin dataset for classification
○ The detailed information can be found below
○ https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer

● Compare the classification performance of different methods that we have discussed in class. You should include at least the following.
○ Logistic regression
○ SVM
○ Decision tree

● You are welcome to explore different regularization terms in each method, and even other
classification methods.

● Classification performance: Use 5-fold CV to evaluate the performance of each method in terms of error rate. Report the mean and standard deviation of the error rate over these 5 folds.
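A minimal starting-point sketch for this experiment (model settings such as max_iter=1000, the standardization step, and random_state are illustrative choices, not requirements from the slides):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5)     # 5-fold CV accuracy per fold
    err = 1 - acc                                # error rate per fold
    print(f"{name}: mean error {err.mean():.3f}, std {err.std():.3f}")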
43
