


LOGISTIC REGRESSION AND KERNEL LOGISTIC REGRESSION
A comparative study of logistic regression and kernel logistic regression for binary classification

Ezukwoke K.I.¹, Zareian S.J.²
¹,² Department of Computer Science, Machine Learning and Data Mining
{ifeanyi.ezukwoke, samaneh.zareian.jahromi}@etu.univ-st-etienne.fr
Université Jean Monnet, Saint-Étienne, France

Preprint, December 2019. DOI: 10.13140/RG.2.2.28668.28808

Abstract

Logistic regression is a linear binary classification algorithm frequently used for classification problems. In this paper we present its kernel version, which is used for classification of non-linearly separable problems. We briefly introduce the concept of multiple kernel learning and apply it to kernel logistic regression. We elaborate on the performance differences between classical and kernel logistic regression, and between each and its stochastic variant.

Keywords

Classification, logistic regression, kernel logistic regression, multi-kernel learning.

1 INTRODUCTION

Linear regression is a statistical method used for univariate and multivariate analysis. Given a set of observations {x_i, y_i}_{i=1}^n, where {x_i} are the independent variables (feature space) and y_i is the dependent (response) variable, usually continuous or discrete, linear regression models estimate the parameter β that best maps the predictors to the response variable y_i.

    y = X_1 β_1 + X_2 β_2 + · · · + X_N β_N    (1)

Using Ordinary Least Squares (OLS) we can estimate the unknown parameters of the linear regression problem [1]. It does this by minimizing the sum of squared differences between the predictors and the response variable.¹

    min_β ||Xβ − y||²    (2)

We minimize this objective function and derive the following closed-form solution.

    β = (X^T X)^{−1} X^T y    (3)

This returns a model that produces a straight line mapping the predictors to the response [1]. However, linear regression is only sufficient for explaining the relationship in observations with a continuous response. For observations with categorical variables it becomes impossible to adopt this model. Logistic regression solves this limitation of linear regression for categorical variables using maximum likelihood estimation of the log-probability function. This idea is further explained in the next sections. Our focus, however, is on its kernel version and how we explore the inner product of the independent variables to classify non-separable data.

¹ Source code for the project is available on GitHub.
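For illustration, the following is a minimal NumPy sketch of the closed-form solution in equation (3); the synthetic data and variable names are illustrative assumptions, not part of the original implementation.

    import numpy as np

    # Illustrative linear data: y = 2*x1 - 3*x2 + noise
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=100)

    # Closed-form OLS estimate of equation (3): beta = (X^T X)^{-1} X^T y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)  # approximately [2, -3]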
2 CLASSIFICATION

Classification is a supervised machine learning approach for categorizing data into a distinct number of classes, where we can assign a label to each class. Given a set of data {x^(i), y^(i)}, x is the feature space of dimension m × (n + 1) and y is the classification output, such that y ∈ {0, 1} for binary output or {1, 2, ..., n} for multiclass output. Classification algorithms are most used for spam detection, voice and image recognition, sentiment analysis, fraud detection and many more.

Logistic regression is a linear binary classification algorithm that maps a set of predictors to their corresponding categorical response variables. The algorithm is capable of classifying linearly separable datasets. However, linear logistic regression is not able to accurately classify non-linear data, therefore we use kernel logistic regression for non-linearly separable data classification.

Kernel logistic regression is similar to support vector machines in its operational output [4]. Existing papers already implement kernel logistic regression using the Newton-Raphson method [3], Sequential Minimal Optimization (SMO) [4] and the truncated Newton method [5].

In this paper, however, we solve logistic regression and kernel logistic regression using gradient descent (GD) and stochastic gradient descent (SGD) optimization techniques.

2.1 LOGISTIC REGRESSION

Logistic regression is a discriminative model since it focuses only on the posterior probability of each class Pr(Y|x; β). It is also a generalized linear model, mapping the output of linear multiple regression to the posterior probability of each class, Pr(Y|x; β) ∈ {0, 1} [2]. The probability of a data sample belonging to class 1 is given by:

    Pr(Y = 1|X = x; β) = σ(z), where z = β^T x    (4)

    P(Y = 1|X = x; β) = σ(β^T x)    (5)

where

    Pr(Y = 1|X = x; β) + Pr(Y = 0|X = x; β) = 1    (6)

    Pr(Y = 0|X = x; β) = 1 − Pr(Y = 1|X = x; β)    (7)

Hence, the probability of a data sample belonging to class 0 is given by:

    Pr(Y = 0|X = x; β) = 1 − σ(z)    (8)

σ(z) is called the logistic sigmoid function and is given by

    σ(z) = 1 / (1 + exp(−z))    (9)

The uniqueness of this function is that it maps all real numbers R into the range (0, 1). Again, we know

    log(odds(Pr(Y = 1|X = x; β))) = Pr(Y = 1|X = x; β) / Pr(Y = 0|X = x; β)
                                  = Pr(Y = 1|X = x; β) / (1 − Pr(Y = 1|X = x; β))

Assuming P(Y = 1|X = x; β) = p(x), the next most obvious idea is to let log p(x) be a linear function of x, so that changing an input variable multiplies the probability by a fixed amount. This is done by taking a log transformation of p(x). Formally, logit(p(x)) = β_0 + β^T x, making

    logit(p(x)) = log( p(x) / (1 − p(x)) ) = β_0 + β^T x    (10)
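To make equations (4) to (10) concrete, the following short sketch computes the sigmoid and the implied class probabilities; the helper names are our own, not the paper's.

    import numpy as np

    def sigmoid(z):
        # Logistic sigmoid of equation (9); maps R into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def prob_class1(X, beta, beta0=0.0):
        # Pr(Y = 1 | X = x; beta) = sigma(beta0 + x . beta), equations (5) and (10)
        return sigmoid(beta0 + X @ beta)

    X = np.array([[0.5, -1.2], [2.0, 0.3]])
    beta = np.array([1.0, -0.5])
    p1 = prob_class1(X, beta)
    print(p1)        # class-1 probabilities
    print(1.0 - p1)  # class-0 probabilities, equation (7)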
Simplifying for p(x) and 1 − p(x) we have

    p(x) / (1 − p(x)) = exp(β_0 + β^T x)    (11)

    p(x) = (1 − p(x)) exp(β_0 + β^T x)    (12)

    p(x) = exp(β_0 + β^T x) − p(x) · exp(β_0 + β^T x)    (13)

    p(x) + p(x) · exp(β_0 + β^T x) = exp(β_0 + β^T x)    (14)

    p(x)(1 + exp(β_0 + β^T x)) = exp(β_0 + β^T x)    (15)

    p(x) = exp(β_0 + β^T x) / (1 + exp(β_0 + β^T x)) = 1 / (1 + exp(−(β_0 + β^T x)))

    1 − p(x) = 1 / (1 + exp(β_0 + β^T x))    (16)

2.2 Learning Logistic regression

We assume that P(Y = 1|X = x; β) = P(x; β) for some probability function P(x; β) parameterized by β; the conditional likelihood function is then given by a Bernoulli sequence:

    Π_{i=1}^n Pr(Y = y_i|X = x_i; β) = Π_{i=1}^n p(x_i; β)^{y_i} (1 − p(x_i; β))^{(1−y_i)}

The probability of a class is p if y_i = 1, or 1 − p if y_i = 0. The likelihood is then

    L(β_0, β) = Π_{i=1}^n p(x_i)^{y_i} (1 − p(x_i))^{(1−y_i)}    (17)

Taking the log of this likelihood we have

    l(β_0, β) = Σ_{i=1}^n y_i log p(x_i) + (1 − y_i) log(1 − p(x_i))
              = Σ_{i=1}^n y_i log p(x_i) − y_i log(1 − p(x_i)) + log(1 − p(x_i))
              = Σ_{i=1}^n log(1 − p(x_i)) + y_i log( p(x_i) / (1 − p(x_i)) )    (18)

We replace log( p(x) / (1 − p(x)) ) with β_0 + x · β, as seen in equation (10), and (1 − p(x)) with 1 / (1 + exp(β_0 + x · β)). Hence,

    l(β_0, β) = Σ_{i=1}^n log( 1 / (1 + exp(β_0 + x · β)) ) + y(β_0 + x · β)
              = Σ_{i=1}^n −log(1 + exp(β_0 + x · β)) + y(β_0 + x · β)    (19)

    ∇l(β_0, β) = −Σ_{i=1}^n [ exp(β_0 + x · β) / (1 + exp(β_0 + x · β)) ] x_ij + Σ_{i=1}^n y_i x_ij

    ∇_β l = Σ_{i=1}^n (y_i − p(x_i; β_0, β)) x_ij    (20)

Since this is a transcendental equation with no closed-form solution, we apply the gradient descent optimization algorithm, taking the first-order derivative of the objective function, ∇_β l = Σ_{i=1}^n (y_i − p(x_i; β_0, β)) x_ij, as the update direction.

Algorithm 1: Logistic regression via Gradient Descent (GD)
    Input:  x ∈ X, y ∈ Y
    Output: β
    begin
        β_j ← β_0
        while not converged do
            β_{j+1} = β_j − α ∇_β l
        end
    end
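A minimal NumPy sketch of Algorithm 1, using the gradient of equation (20) with a fixed learning rate; the convergence test and the scaling by the sample size are our own choices, not prescribed by the paper.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_logistic_gd(X, y, alpha=0.1, max_iter=1000, tol=1e-6):
        # Gradient ascent on the log-likelihood l(beta0, beta) of equation (19),
        # using the gradient sum_i (y_i - p_i) x_i of equation (20).
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # column of ones for beta_0
        beta = np.zeros(Xb.shape[1])
        for _ in range(max_iter):
            p = sigmoid(Xb @ beta)
            grad = Xb.T @ (y - p)
            beta_new = beta + alpha * grad / len(y)
            if np.linalg.norm(beta_new - beta) < tol:  # simple convergence test
                return beta_new
            beta = beta_new
        return beta

Predictions then follow by thresholding sigmoid(Xb @ beta) at 0.5.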
We also solve it using a stochastic approach.

Algorithm 2: Logistic regression via Stochastic Gradient Descent (SGD)
    Input:  x ∈ X, y ∈ Y
    Output: β
    begin
        β_j ← β_0
        while not converged do
            for i ∈ randshuffle({1, . . . , N}) do
                for k ∈ {1, . . . , i} do
                    β_{j+1} = β_j − α ∇_β l_k
                end
            end
        end
    end

2.3 KERNEL LOGISTIC REGRESSION

Classical logistic regression will fail to accurately classify non-linearly separable data; therefore, we prefer to use its kernel version. It also has a direct probabilistic interpretation that makes it suited for Bayesian design [4].
The vector space can be expressed as a linear combination of the input vectors, such that

    β = Σ_{i=1}^N α_i φ(x_i)    (21)

where α ∈ R^{n×1} is the dual variable. The function φ(x_i) maps the data points from a lower dimension to a higher dimension.

    φ : x ∈ R^D → φ(x) ∈ F ⊂ R^{D′}    (22)

Let κ(x_i, x) be the kernel function resulting from the inner product of φ(x_i) and φ(x_j), such that

    κ(x_i, x) = ⟨φ(x_i), φ(x_j)⟩    (23)

From the representer theorem we know that

    F = β^T φ(x) = α ⟨φ(x_i), φ(x_j)⟩ = α κ(x_i, x_j)

We can now express p(x; β) in the subspace of the input vectors only, such that

    p(φ; α) = 1 / (1 + e^{−α_i κ(x_i, x_j)})    (24)

and

    1 − p(φ; α) = 1 / (1 + e^{α_i κ(x_i, x_j)})    (25)

The logit function is mapped into the kernel space as

    logit( p(φ; α) / (1 − p(φ; α)) ) = α κ(x_i, x)    (26)

Deriving the equation of kernel logistic regression requires the regularized logistic regression, precisely an l2-norm penalty on the log-likelihood. This is in comparison to the SVM objective function used in [3].

    L_α = Σ_{i=1}^n y_i log p(x_i) + (1 − y_i) log(1 − p(x_i)) − (λ/2) α^T κ(x_i, x) α
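The dual parameterization of equations (21) to (26) can be sketched as follows, using an RBF Gram matrix as an example kernel; the helper names and parameter values are assumptions made for illustration only.

    import numpy as np

    def rbf_kernel(X1, X2, gamma=1.0):
        # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2); see section 2.5
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def prob_class1_dual(K, alpha):
        # Kernelized probability of equation (24): p(phi; alpha) = sigma(K @ alpha)
        return 1.0 / (1.0 + np.exp(-(K @ alpha)))

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(5, 2))
    alpha = rng.normal(size=5)
    K = rbf_kernel(X_train, X_train)
    print(prob_class1_dual(K, alpha))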
2.4 Learning kernel logistic regression

As mentioned earlier, methods for finding the maximum likelihood estimate include gradient descent (GD) and the iteratively re-weighted least squares (IRLS) method, the latter being based on the Newton-Raphson algorithm. Here we employ gradient descent and its stochastic variant.

2.4.1 Optimization problem

    L_α = Σ_{i=1}^n y_i log p(x_i) + (1 − y_i) log(1 − p(x_i)) − (λ/2) α^T κ(x_i, x) α

We can expand the objective function as follows:

    L_α = y log( p / (1 − p) ) + log(1 − p(x_i)) − (λ/2) α^T κ(x_i, x) α
        = y log( p / (1 − p) ) + log( 1 / (1 + e^{α κ(x_i, x)}) ) − (λ/2) α^T κ(x_i, x) α
        = y α κ(x_i, x) − log(1 + e^{α κ(x_i, x)}) − (λ/2) α^T κ(x_i, x) α

The first-order derivative of the log-likelihood is

    ∇_α L = y κ(x_i, x) − [ κ(x_i, x) e^{α κ(x_i, x)} / (1 + e^{α κ(x_i, x)}) ] − λ α κ(x_i, x)
          = y κ(x_i, x) − p κ(x_i, x) − λ α κ(x_i, x)

    ∇_α L = κ(x_i, x)(y − p) − λ α κ(x_i, x)

We apply the gradient descent and stochastic gradient descent algorithms.

Algorithm 3: Kernel logistic regression using Gradient Descent
    Input:  κ, y, α
    Output: α
    begin
        α_j ← α_0
        while not converged do
            α_{j+1} = α_j − lr ∇_α L
        end
    end

Algorithm 4: Kernel logistic regression using Stochastic Gradient Descent
    Input:  κ, y, α_j
    Output: α
    begin
        α_j ← α_0
        while not converged do
            for i ∈ randshuffle({1, . . . , N}) do
                for k ∈ {1, . . . , i} do
                    α_{j+1} = α_j − lr ∇_α L_k
                end
            end
        end
    end

In Algorithms 3 and 4, lr is the learning rate.

2.4.2 Prediction

Still using the representer theorem, we compute the posterior probability of a new data point:

    y = sign( 1 / (1 + exp(−α κ(x_i, x))) )    (27)

Here, the prediction depends only on α and the kernel.
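A sketch of Algorithm 3 and the prediction rule of equation (27) in NumPy, using the gradient ∇_α L = κ(y − p) − λακ derived above; the RBF kernel choice, step scaling and function names are our own assumptions, not the released code.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def rbf_kernel(X1, X2, gamma=1.0):
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def fit_klr_gd(X, y, lam=1e-5, lr=0.01, gamma=1.0, max_iter=500):
        # Gradient steps on the penalized log-likelihood L_alpha of section 2.4.1;
        # gradient in matrix form: K @ (y - p) - lam * K @ alpha
        K = rbf_kernel(X, X, gamma)
        alpha = np.zeros(X.shape[0])
        for _ in range(max_iter):
            p = sigmoid(K @ alpha)
            grad = K @ (y - p) - lam * (K @ alpha)
            alpha += lr * grad / len(y)  # ascent on L_alpha (descent on -L_alpha)
        return alpha

    def predict_klr(X_train, X_new, alpha, gamma=1.0):
        # Posterior for new points via the representer theorem, equation (27)
        K_new = rbf_kernel(X_new, X_train, gamma)
        return (sigmoid(K_new @ alpha) >= 0.5).astype(int)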
2.5 Kernels

We introduce the commonly used kernels and give a brief overview of the multiple kernels used.

• Linear kernel

    κ(x_i, x_j) = x_i x_j^T    (28)

• Polynomial kernel

    κ(x_i, x_j) = (x_i x_j^T + c)^d    (29)

  where c ≥ 0 and d is the degree of the polynomial, usually greater than 2.

• RBF (Radial Basis Function) kernel
  Sometimes referred to as the Gaussian kernel.

    κ(x_i, x_j) = exp(−γ ||x_i − x_j||²)    (30)

  where γ = 1/(2σ²).

• Sigmoid kernel

    κ(x_i, x_j) = tanh(γ x_i x_j^T + c)    (31)

  where c ≥ 0 and γ = 1/(2σ²).

• Laplace kernel

    κ(x_i, x_j) = exp(−γ ||x_i − x_j||)    (32)

  where γ = 1/(2σ²).

2.6 Multi-kernel

The reason behind the use of multiple kernels is similar to the notion of multi-classification, where cross-validation is used to select the best performing classifier [6]. By using multiple kernels, we hope to learn different similarities in the kernel space that are not easily observed when using a single kernel.
We can prove from Mercer's theorem that a kernel is Positive Semi-Definite (PSD) if u^T κ(x_i, x_j) u ≥ 0. Hence, by performing arithmetic or any mathematical operation on two or more kernel matrices, we obtain a new kernel capable of exploiting different properties or similarities of the training data.
Given a kernel κ, we prove that κ is PD if

    ⟨u, κu⟩ ≥ 0    (33)

Proposition: A symmetric function κ : χ → R is positive semi-definite if and only if ⟨u, κu⟩ ≥ 0.

Proof: Suppose that κ is a kernel which is the inner product of the mapping functions, ⟨φ(x_i), φ(x_j)⟩. κ is a kernel if its inner products are positive and the solution of κu = λu gives non-negative eigenvalues. So that,

    ⟨u, κu⟩ = Σ_{i=1}^N u_i · (κu)_i    (34)

            = Σ_{i=1}^N Σ_{j=1}^N u_i ⟨φ(x_i), φ(x_j)⟩_H u_j    (35)

where H represents the Hilbert space into which we project the kernel [7].

            = ⟨ Σ_{i=1}^N u_i φ(x_i), Σ_{j=1}^N u_j φ(x_j) ⟩_H    (36)

    ⟨u, κu⟩ = || Σ_{i=1}^N u_i φ(x_i) ||²_H ≥ 0    (37)

Therefore κ is positive definite.
Using this property of the kernel κ, we introduce the multiple kernel combinations as follows.

• LinearRBF
  Here we combine two kernels, precisely the Linear and RBF kernels, using their inner product.

    K̂_linrbf = κ(x_i, x_j) × κ(x_i, x_l)    (38)

• RBFPoly
  Here we combine the RBF and Polynomial kernels using their inner product.

    K̂_rbfpoly = κ(x_i, x_j) × κ(x_i, x_l)    (39)

• EtaKernel
  The EtaKernel is a composite combination of LinearRBF, RBFPoly and RBFCosine, and it is given by

    K̂_etarbf = K̂_linrbf × K̂_rbfpoly + K̂_rbfpoly × K̂_rbfcosine
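A hedged sketch of the kernels of section 2.5 and one combined kernel of section 2.6; the element-wise product of Gram matrices is our reading of equations (38) and (39), and such a product is again PSD by the Schur product theorem.

    import numpy as np

    def linear_kernel(X1, X2):
        return X1 @ X2.T                                    # equation (28)

    def poly_kernel(X1, X2, c=1.0, d=3):
        return (X1 @ X2.T + c) ** d                         # equation (29)

    def rbf_kernel(X1, X2, gamma=1.0):
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)                          # equation (30)

    def sigmoid_kernel(X1, X2, gamma=1.0, c=1.0):
        return np.tanh(gamma * (X1 @ X2.T) + c)             # equation (31)

    def laplace_kernel(X1, X2, gamma=1.0):
        d = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1))
        return np.exp(-gamma * d)                           # equation (32)

    def linrbf_kernel(X1, X2, gamma=1.0):
        # Combined kernel: element-wise product of two Gram matrices,
        # our reading of equation (38)
        return linear_kernel(X1, X2) * rbf_kernel(X1, X2, gamma)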
3 EXPERIMENT

3.1 Dataset

We perform a comparison between classical logistic regression and its kernel version using benchmark datasets including moons, blobs, circles and classification. Given N, the total number of samples: Circle (N = 1000), Moon (N = 1000) and Classification (N = 1000); we split each dataset into training and test samples of 70%-30%.

3.1.1 Data description

Each dataset contains a binary class (exactly 2 groups of data), and each data sample contains exactly two feature values.

3.2 Logistic and kernel logistic regression result (Non-stochastic)

We begin by passing all data into one pipeline and run this procedure N number of times. We do this because of the non-deterministic result we get from the random initialization of β, and of α for the stochastic version. Using the configuration learning rate = 10, γ = 1 and λ = 0.00001, the algorithm returns the result in figure 1.

3.3 Logistic and kernel logistic regression result (Stochastic)

Figure 2 below shows the result of the stochastic version for logistic regression and its kernel versions. After N runs using the configuration learning rate = 10, γ = 1 and λ = 0.00001, the algorithm returns the result in figure 2.

Figure 1: Non-stochastic logistic and kernel logistic regression. 3 iterations for all except blob data (10 iterations), 0.01 learning rate.

Figure 2: Stochastic logistic and kernel logistic regression. 3 iterations for all except blob data (50 iterations), 0.01 learning rate.
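The benchmark datasets and the 70%-30% split described in section 3.1 can be reproduced along the following lines, assuming scikit-learn's dataset generators; the exact generator arguments are our guesses, not the paper's configuration.

    from sklearn.datasets import make_moons, make_circles, make_blobs, make_classification
    from sklearn.model_selection import train_test_split

    N = 1000
    datasets = {
        "moons":          make_moons(n_samples=N, noise=0.2, random_state=0),
        "circles":        make_circles(n_samples=N, noise=0.1, factor=0.5, random_state=0),
        "blobs":          make_blobs(n_samples=N, centers=2, n_features=2, random_state=0),
        "classification": make_classification(n_samples=N, n_features=2, n_redundant=0,
                                               n_informative=2, random_state=0),
    }

    splits = {}
    for name, (X, y) in datasets.items():
        # 70% training / 30% test split as in section 3.1
        splits[name] = train_test_split(X, y, test_size=0.3, random_state=0)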
3.4 Performance Analysis

We compare the performance of logistic regression and its kernel versions. We also show that although stochastic logistic regression gives us a more stable result compared to non-stochastic logistic regression, it takes a considerable amount of time to compute and is hence slower than the non-stochastic version.

3.4.1 Evaluation metric

We use the F1-score as our evaluation metric to compare the performance of classical logistic regression and its kernel versions. The F1-score is the harmonic mean of precision and recall and is given by

    F1-score = (2 × precision × recall) / (precision + recall)    (40)

where precision and recall are given by

    precision = TP / (TP + FP)    (41)

    recall = TP / (TP + FN)    (42)

TP: True Positives, TN: True Negatives, FP: False Positives, FN: False Negatives.
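For completeness, equations (40) to (42) computed with scikit-learn (a usage sketch; the label arrays are placeholders).

    from sklearn.metrics import f1_score, precision_score, recall_score

    y_true = [0, 1, 1, 0, 1, 1]          # placeholder labels
    y_pred = [0, 1, 0, 0, 1, 1]          # placeholder predictions

    p = precision_score(y_true, y_pred)  # TP / (TP + FP), equation (41)
    r = recall_score(y_true, y_pred)     # TP / (TP + FN), equation (42)
    f1 = f1_score(y_true, y_pred)        # harmonic mean of p and r, equation (40)
    print(p, r, f1, 2 * p * r / (p + r))  # the last two values agree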

Non-Stochastic F1-Score (%)

    dataset \ kernel   linear   rbf    poly   sigmoid   laplace   rbfpoly   linrbf   etakernel
    Moons                81      72     68      72        63        67        69        66
    Blobs                98      41      0      66        68         1        97         0
    Circle               50      93      0      69         1         0         0         0
    Classification       88       0     79      64         0         5         0         4

Non-Stochastic Running Time (secs)

    dataset \ kernel   linear   rbf    poly   sigmoid   laplace   rbfpoly   linrbf   etakernel
    Moons               0.003   0.02   0.04    0.006     0.02      0.6       0.02      0.37
    Blobs               0.003   0.02   0.04    0.007     0.03      0.05      0.03      0.12
    Circle              0.003   0.02   0.04    0.05      0.02      0.06      0.02      0.38
    Classification      0.003   0.08   0.04    0.005     0.12      0.17      0.08      0.81

Stochastic F1-Score (%)

    dataset \ kernel   linear   rbf    poly   sigmoid   laplace   rbfpoly   linrbf   etakernel
    Moons                86      23     68      69        87        67        69        66
    Blobs                98      75      0      66        95        88        99        84
    Circle               48      49      0      17        69         0         0         0
    Classification       89       0     78      17         0         0         0         0

Stochastic Running Time (secs)

    dataset \ kernel   linear   rbf    poly   sigmoid   laplace   rbfpoly   linrbf   etakernel
    Moons               0.62    0.05   0.07    0.02      0.09      0.21      0.08      0.27
    Blobs               0.60    0.37   0.06    0.05      0.38      0.06      0.05      0.08
    Circle              0.61    0.08   0.250   0.01      0.09      0.29      0.07      0.35
    Classification      0.63    0.25   0.04    0.01      0.28      0.13      0.15      0.23

We observe from the F1-score tables that linear logistic regression is suitable for almost all datasets except the circle data. This is because the circle data is not linearly separable; rbf kernel logistic regression is the most suitable for classifying it, with a score of 93%.
In terms of running time, it is obvious that classical (linear) logistic regression is the fastest in computation compared to its kernel versions. This is due to the time taken in computing the kernel matrix, O(m × n)^d, where d is the degree (used for rbf and its variants with d = 2, and the polynomial kernel with d ≥ 2). The sigmoid kernel still has the fastest running time of all kernels.
However, we have considered different data types, and the performance of the algorithm can be better evaluated when each dataset is considered individually with different configurations of learning rate, γ and polynomial degree.

3.5 Convergence rate

We compare the convergence rate of logistic regression with that of its kernel and stochastic (kernel) versions. We observe that the speed of convergence of stochastic logistic regression is faster than that of its gradient descent version; in other words, the stochastic gradient version of LR reaches the optimum solution faster than its gradient descent version. We can make the same argument for kernel logistic regression (stochastic) versus its non-stochastic version: the stochastic version converges faster towards zero than the gradient descent version.

4 CONCLUSION

We demonstrate the use of logistic regression, kernel logistic regression and the stochastic versions of logistic and kernel logistic regression. We conclude that kernel logistic regression is the best performing algorithm for classifying non-linearly separable data. Its classical version, however, has a faster computational time but only serves best for linear binary classification. One of the advantages of using the stochastic gradient descent version over the non-stochastic version is that it does not require a large number of iterations to converge.
We introduced the notion of multiple kernel learning and saw that it can also outperform classical logistic regression on the F1-score evaluation metric. Stochastic logistic and kernel logistic regression both behave similarly to their non-stochastic versions but can be much more stable than their non-stochastic counterparts. We noted that the convergence rate of stochastic logistic and kernel logistic regression is faster than that of the non-stochastic versions.
References

[1] Goldberger, Arthur S. Classical Linear Regression. Econometric Theory. John Wiley & Sons, pp. 158, ISBN 0-471-31101-4, 1964.

[2] Scott Menard. Applied Logistic Regression Analysis. SAGE Publications, ISBN 0-7619-2208-3, 2001.

[3] Zhu, J., Hastie, T. Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics, 14:185-205, 2005.

[4] Keerthi, S., Duan, K., Shevade, S., and Poo, A. A Fast Dual Algorithm for Kernel Logistic Regression. International Conference on Machine Learning, 19, 2002.

[5] Maher Maalouf, Theodore B. Trafalis, Indra Adrianto. Kernel logistic regression using truncated Newton method. Computational Management Science, 8:415-428, 2009.

[6] Mehmet Gönen, Ethem Alpaydın. Multiple Kernel Learning Algorithms. Journal of Machine Learning Research, 12:2211-2268, 2011.

[7] John Shawe-Taylor, Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, ISBN 9780511809682, pp. 47-83, 2011.

