Unit-2: Logistic Regression
Logistic regression
• Logistic regression is one of the most popular Machine Learning
algorithms, which comes under the Supervised Learning
technique. It is used for predicting the categorical dependent
variable using a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent
variable. Therefore the outcome must be a categorical or discrete
value. It can be either Yes or No, 0 or 1, true or False, etc. but
instead of giving the exact value as 0 and 1, it gives the
probabilistic values which lie between 0 and 1.
• Logistic Regression is very similar to Linear Regression except for how they are used: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
It maps any real value into another value within a range of 0 and 1.
The output of logistic regression must be between 0 and 1 and cannot go beyond this limit, so it forms a curve like the "S" form. The S-form curve is called the Sigmoid function or the logistic function.
In logistic regression, we use the concept of a threshold value, which separates the two classes: values above the threshold tend towards 1, and values below the threshold tend towards 0.
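As a minimal sketch of this mapping (assuming NumPy and the common 0.5 threshold, which is an illustrative choice rather than part of the notes):

import numpy as np

def sigmoid(z):
    # Maps any real value into the range (0, 1), giving the S-shaped curve.
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(z, threshold=0.5):
    # Values at or above the threshold map to class 1, values below to class 0.
    return (sigmoid(z) >= threshold).astype(int)

# Illustrative raw linear scores (w.x + b)
scores = np.array([-2.0, 0.0, 3.0])
print(sigmoid(scores))        # probabilities strictly between 0 and 1
print(predict_class(scores))  # [0 1 1] with the 0.5 threshold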
Assumptions for Logistic Regression:
The dependent variable must be categorical in nature.
The independent variables should not have multicollinearity.
Type of Logistic Regression:
• On the basis of the categories, Logistic Regression
can be classified into three types:
• Binomial: In binomial Logistic regression, there can
be only two possible types of the dependent
variables, such as 0 or 1, Pass or Fail, etc.
• Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
• Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".
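A brief sketch of the binomial case with scikit-learn; the dataset, the scaling step and the default solver settings below are assumptions made for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary (0/1) target, so this is the binomial case.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling just helps the solver converge; LogisticRegression does the classification.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

print(clf.predict_proba(X_test[:3]))  # probabilistic values between 0 and 1
print(clf.predict(X_test[:3]))        # thresholded class labels (0 or 1)
print(clf.score(X_test, y_test))      # classification accuracy

For the multinomial and ordinal cases, the model would instead be fit against a target with three or more (unordered or ordered) categories.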
Perceptron
• A perceptron is the simplest model of an Artificial Neural Network. It consists of a single artificial neuron with the Heaviside step function as the activation function.
The perceptron is a linear binary classifier. The training phase of the perceptron performs multiple iterations over the training data points.
A Perceptron is an algorithm used for supervised learning of binary classifiers. Binary classifiers decide whether an input, usually represented by a series of vectors, belongs to a specific class.
Perceptron Learning Algorithm
• where f_i is the probability of the input belonging to class i.
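A minimal sketch of the perceptron learning rule with the Heaviside step activation; the learning rate, epoch count and the toy AND data are illustrative assumptions:

import numpy as np

def heaviside(z):
    # Heaviside step activation: 1 if z >= 0, else 0
    return np.where(z >= 0, 1, 0)

def train_perceptron(X, y, lr=0.1, epochs=10):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):                  # multiple iterations over the training points
        for xi, target in zip(X, y):
            error = target - heaviside(np.dot(w, xi) + b)
            w += lr * error * xi             # weights change only when the prediction is wrong
            b += lr * error
    return w, b

# Toy linearly separable data: the AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(heaviside(X @ w + b))  # expected: [0 0 0 1]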
Exponential kernel
The exponential kernel is closely related to the Gaussian kernel, with only the square of the
norm left out. It is also a radial basis function kernel.
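As a sketch of the difference between the two kernels (σ is an assumed bandwidth parameter):

import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel: uses the squared Euclidean norm
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def exponential_kernel(x, y, sigma=1.0):
    # Exponential kernel: same form, but with the square of the norm left out
    return np.exp(-np.linalg.norm(x - y) / (2 * sigma ** 2))

x, y = np.array([1.0, 2.0]), np.array([2.0, 4.0])
print(gaussian_kernel(x, y), exponential_kernel(x, y))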
Model selection and feature selection.
Model selection
• Given a set of models, choose the model that is expected to give the best results.
• Choosing among different learning algorithms, e.g. choosing kNN over other classification algorithms.
• Choosing parameters within the same learning model, e.g. choosing the value of k in kNN.
Feature Selection- Selecting a useful subset from all the features.
Why Feature Selection?
• Some algorithms scale (computationally) poorly with increased dimension
• Irrelevant features can confuse some algorithms
• Redundant features adversely affect regularization
• Removal of features can increase (relative) margin (and generalization)
• Reduces data set and resulting model size
• Note: Feature Selection is different from Feature Extraction. The latter transforms the original features to get a small set of new features.
How?
• Remove a binary feature if nearly all of its values are the same.
• Use some criteria to rank features and keep the top-ranked features.
• Wrapper Methods: require repeated runs of the learning algorithm with different sets of features.
A short sketch of the first two approaches is given below.
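The sketch below uses scikit-learn; the dataset, the variance threshold and the choice of k are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# 1. Drop (near-)constant features: a variance threshold stands in for
#    "remove a binary feature if nearly all of its values are the same".
X_var = VarianceThreshold(threshold=0.1).fit_transform(X)

# 2. Rank features by a univariate criterion (ANOVA F-score) and keep the top k.
selector = SelectKBest(score_func=f_classif, k=2)
X_top = selector.fit_transform(X, y)

print(X.shape, X_var.shape, X_top.shape)  # feature counts before and after selection
print(selector.scores_)                   # the per-feature ranking scores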
Combining classifiers: Bagging, Boosting (the AdaBoost algorithm), Ensemble Models
Bagging
• Its objective is to create several subsets of data from the training sample, chosen randomly with replacement. Each subset of data is used to train its own decision tree, so we get an ensemble of different models. The average of all the predictions from the different trees is used, which is more robust than a single decision tree classifier.
Steps:
• 1. Given the observations and features in the training data set, a sample from the training data set is taken randomly with replacement.
• 2. A subset of features is selected randomly, and whichever feature gives the best split is used to split the node iteratively.
• 3. The tree is grown to the largest extent possible.
• 4. The above steps are repeated a number of times, and the prediction is given based on the aggregation of predictions from all of the trees.
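A compact sketch of these steps with scikit-learn; the dataset, the number of trees and the random seed are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample (drawn randomly with replacement);
# the ensemble aggregates the predictions of all trees.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # base learner
    n_estimators=50,
    bootstrap=True,
    random_state=0,
)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))

Adding the random feature subset at each split (step 2) on top of bagging is what a random forest does; scikit-learn's RandomForestClassifier could be swapped in here for that behaviour.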
Advantages:
• Reduces over-fitting of the model
• Handles higher dimensionality data very well
• Maintains accuracy for missing data
Disadvantages:
• Since the final prediction is based on averaging the predictions from the subset trees, it won't give precise values for the classification and regression model.
Boosting
• It is used to create a collection of predictors. Learners are trained sequentially, with early learners fitting simple models to the data, and the data is then analysed for errors. Consecutive trees are fit and, at every step, the goal is to improve the accuracy over the prior tree. When an input is misclassified by a hypothesis, its weight is increased so that the next hypothesis is more likely to classify it correctly. This process converts weak learners into a better performing model.
Steps:
• 1. Draw a random subset of training samples without replacement from the training set to train the first weak learner.
• 2. Draw a second random training subset without replacement from the training set, add to it a portion of the samples that were previously misclassified, and use it to train the second weak learner.
• 3. Find the training samples d3 in the training set D on which the first two weak learners disagree, and use them to train a third weak learner.
• 4. Combine all the weak learners via majority voting.
Advantages
• Supports different loss function
• Works well with interactions.
Disadvantages
• Prone to over-fitting
• Requires careful tuning of different hyper-parameters
Adaboost
• Weak models are added sequentially, trained using the
weighted training data.
• The training weights are updated giving more weight to
incorrectly predicted instances, and less weight to correctly
predicted instances.
• The process continues until a pre-set number of weak
learners have been created (a user parameter) or no further
improvement can be made on the training dataset.
• Once completed, you are left with a pool of weak learners
each with a stage value.
• A stage value is calculated for the trained model which
provides a weighting for any predictions that the model
makes.
• Predictions are made by calculating the weighted average of
the weak classifiers.
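A short sketch of AdaBoost with decision-stump weak learners in scikit-learn; the stump depth, number of rounds and learning rate are assumptions for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weak learners are added sequentially; after each round the sample weights of
# misclassified points are increased, and each weak learner gets a stage value
# that weights its vote in the final prediction.
ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner (decision stump)
    n_estimators=100,                     # pre-set number of weak learners
    learning_rate=0.5,
    random_state=0,
)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))
print(ada.estimator_weights_[:5])  # the stage values of the first few weak learners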
Evaluating and debugging learning algorithms, Classification errors
Evaluating your machine learning algorithm is an essential part of any project. Your model may give satisfying results when evaluated using one metric, say accuracy_score, but poor results when evaluated against other metrics such as logarithmic_loss. Most of the time we use classification accuracy to measure the performance of our model; however, it is not enough to truly judge the model. In this section, we will cover the different types of evaluation metrics available.
Classification Accuracy
• Classification Accuracy is what we usually mean when we use the term accuracy. It is the ratio of the number of correct predictions to the total number of input samples.
Logarithmic Loss
• Logarithmic Loss, or Log Loss, works by penalising false classifications. It works well for multi-class classification. When working with Log Loss, the classifier must assign a probability to each class for all the samples. Suppose there are N samples belonging to M classes; then the Log Loss is calculated as
Log Loss = -(1/N) * Σ_i Σ_j y_ij * log(p_ij)   (summing i over the N samples and j over the M classes)
where,
y_ij indicates whether sample i belongs to class j or not
p_ij indicates the probability of sample i belonging to class j
Log Loss has no upper bound and it exists on the range [0, ∞). A Log Loss nearer to 0 indicates higher accuracy, whereas a Log Loss far from 0 indicates lower accuracy.
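A quick sketch of both metrics with scikit-learn; the label and probability arrays are purely illustrative:

import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y_true = np.array([0, 1, 1, 0, 1])      # illustrative ground-truth labels
y_pred = np.array([0, 1, 0, 0, 1])      # hard predictions, used for accuracy
y_prob = np.array([[0.9, 0.1],          # per-class probabilities, used for log loss
                   [0.2, 0.8],
                   [0.6, 0.4],
                   [0.7, 0.3],
                   [0.1, 0.9]])

print(accuracy_score(y_true, y_pred))   # correct predictions / total samples = 0.8
print(log_loss(y_true, y_prob))         # nearer to 0 means better probability estimates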
Naive Bayes,
• It is a supervised learning algorithm which is based on Bayes' theorem and is used for solving classification problems. It is mainly used in text classification, which involves a high-dimensional training dataset. It is one of the simplest and most effective classification algorithms. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. Examples are spam filtering, sentiment analysis, and classifying articles. It assumes that the occurrence of a certain feature is independent of the occurrence of other features, and it uses Bayes' theorem:
p(Ck|x) = p(Ck) p(x|Ck) / p(x)
Naive Bayes classifier is based on Bayes theorem which says that
P(H|E) = P(E|H) * P(H) / P(E)
where H is some hypothesis based on some evidence E e.g.
evidence=fever, hypothesis=dengue.
• P(E), P(H) and P(E|H) are prior probabilities which are used to calculate the conditional (posterior) probability P(H|E).
In Naive Bayes, we have to predict the class (C) of an example(X), so
the equation can be re-written as
• P(C|X) = P(X|C) * P(C) / P(X)
We have to build a classifier using the above training set, i.e. we have to calculate the probabilities P(C), P(X|C) and P(X). As we have only two classes in our training dataset, P(C) is P(yes) and P(no). The test instance is X = (sunny, cool, high, true).
Case I : Yes
P(yes|sunny,cool,high,true) = P(yes) * P(sunny|yes) * P(cool|yes) * P(high|yes) * P(true|yes) / (P(sunny) * P(cool) * P(high) * P(true))
Case II : No
P(no|sunny,cool,high,true) = P(no) * P(sunny|no) * P(cool|no) * P(high|no) * P(true|no) / (P(sunny) * P(cool) * P(high) * P(true)) = 5/14 * 3/5 * 1/5 * 4/5 * 3/5 / ΠP(X)
Result:
As P(X) is the same in both equations, we can ignore it, giving
P(yes|sunny,cool,high,true) = 0.00529
P(no|sunny,cool,high,true) = 0.02057
As P(no|sunny,cool,high,true) > P(yes|sunny,cool,high,true), therefore we assign label "no" to it.
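A minimal sketch of the same calculation pattern in Python; only the "no"-class numbers stated above are reused, and the "yes" class would be computed identically from its own counts in the training table:

from math import prod

def unnormalized_posterior(prior, likelihoods):
    # P(C|X) is proportional to P(C) * Π P(x_i|C); P(X) is dropped because it is the
    # same for every class and does not change which class scores higher.
    return prior * prod(likelihoods)

# P(no), P(sunny|no), P(cool|no), P(high|no), P(true|no) from the worked example above
p_no = unnormalized_posterior(5 / 14, [3 / 5, 1 / 5, 4 / 5, 3 / 5])
print(round(p_no, 5))  # 0.02057, matching P(no|sunny,cool,high,true) above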