15) Machine Learning Algorithms - Google Docs
● Logistic Regression
● Support Vector Machines (SVM)
● K-Nearest Neighbors (KNN)
● Decision Trees
● Random Forests
● XGBoost, etc.
Logistic Regression
Logistic regression is a popular statistical algorithm used for binary classification problems. It
is a supervised learning algorithm that predicts the probability of an input belonging to one of
two classes, typically represented as 0 or 1.
The key idea behind logistic regression is to model the relationship between the input
features and the probability of the input belonging to a specific class. Unlike linear
regression, which predicts a continuous value, logistic regression uses a logistic function
(also called a sigmoid function) to map the output to a value between 0 and 1.
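As a minimal sketch of the mapping described above, the logistic (sigmoid) function can be written with the Python standard library alone. The function name `sigmoid` and the sample inputs are illustrative choices, not part of the original notes.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    # Two branches so that exp() is never called on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative scores approach 0, large positive scores approach 1.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```

A linear regression output could be any real number; passing it through `sigmoid` is what lets the result be read as a probability.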
● Data Preparation: You start with a dataset that contains input features (independent
variables) and corresponding class labels (0 or 1). The features could be any
measurable characteristics or attributes that are relevant to the classification task.
● Model Training: During the training phase, logistic regression estimates the
parameters (coefficients) that define the relationship between the input features and
the class probabilities. It uses a process called maximum likelihood estimation to find
the optimal parameter values that maximize the likelihood of the observed data.
● Logistic Function: The logistic function is the core component of logistic regression.
It transforms the linear combination of input features and their associated coefficients
into a value between 0 and 1. The logistic function equation is:
P(Y=1|X) = 1 / (1 + e^-(b0 + b1*X1 + b2*X2 + ... + bn*Xn))
In this equation, P(Y=1|X) represents the probability of the input belonging to class 1
given the input features X. b0 is the intercept, b1, b2, ..., bn are the coefficients
associated with each feature, and X1, X2, ..., Xn are the corresponding feature values.
● Decision Boundary: To make a prediction, logistic regression uses a decision
boundary. Typically, if the predicted probability is greater than a specified threshold
(often 0.5), the input is classified as belonging to class 1; otherwise, it is classified as
belonging to class 0.
● Model Evaluation: After training, you evaluate the logistic regression model using a
separate dataset to assess its performance. Common evaluation metrics for binary
classification include accuracy, precision, recall, and F1-score.
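The five steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions rather than a reference implementation: the toy data is made up, the helper names (`fit`, `classify`, `evaluate`) are hypothetical, and maximum likelihood is approximated with simple gradient ascent, whereas real libraries typically use more sophisticated optimizers.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(coefs, x):
    """P(Y=1|X): coefs[0] is b0, coefs[1:] are b1..bn."""
    z = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))
    return sigmoid(z)

def fit(X, y, lr=0.5, epochs=5000):
    """Approximate maximum likelihood by gradient ascent on the mean log-likelihood."""
    coefs = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(coefs)
        for x, label in zip(X, y):
            # Label minus predicted probability: the per-example gradient
            # of the log-likelihood with respect to the linear score.
            err = label - predict_proba(coefs, x)
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        coefs = [b + lr * g / len(X) for b, g in zip(coefs, grad)]
    return coefs

def classify(coefs, x, threshold=0.5):
    """Decision boundary: class 1 if the probability exceeds the threshold."""
    return 1 if predict_proba(coefs, x) > threshold else 0

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

# Made-up toy data: one feature per row, labels 0 or 1.
X_train = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y_train = [0, 0, 0, 1, 0, 1, 1, 1]
# Separate evaluation set, kept apart from the training rows.
X_test = [[0.75], [1.25], [2.0], [3.25], [3.75]]
y_test = [0, 0, 1, 1, 1]

coefs = fit(X_train, y_train)
y_pred = [classify(coefs, x) for x in X_test]
metrics = evaluate(y_test, y_pred)
print("coefficients:", coefs)
print("predictions:", y_pred)
print("metrics:", metrics)
```

The evaluation set deliberately includes one borderline positive example (feature value 2.0) so that the metrics are not all perfect and precision and recall differ.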
Logistic regression is widely used due to its simplicity and interpretability. It can model a
nonlinear (S-shaped) relationship between the input features and the class probabilities.
However, it assumes that the relationship is linear on the log-odds scale, meaning the
logarithm of the odds, log(P / (1 - P)), is a linear function of the input features.
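To make the log-odds statement concrete, here is a small sketch with hypothetical coefficients `b0` and `b1` (chosen only for illustration). The probability follows an S-shaped curve, while the log-odds rises by exactly `b1` for each one-unit increase in the feature.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit: the logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

b0, b1 = -1.0, 0.8  # hypothetical intercept and coefficient

for x in (0.0, 1.0, 2.0, 3.0):
    p = sigmoid(b0 + b1 * x)
    # The probability changes by a different amount at each step,
    # but the log-odds always changes by b1.
    print(f"x={x}  P(Y=1|x)={p:.3f}  log-odds={log_odds(p):.3f}")
```

Equivalently, each one-unit increase in the feature multiplies the odds by e^b1, which is why coefficients are often reported as odds ratios.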
Types of Logistic Regression
There are several variations of logistic regression, chosen according to the specific
requirements of the classification problem. Common types include:
1. Binary Logistic Regression: Binary logistic regression is the most basic form of
logistic regression. It is used when the target variable has only two classes or
categories. The logistic function in binary logistic regression maps the input features
to the probability of belonging to one of the two classes.
2. Multinomial Logistic Regression: Multinomial logistic regression is used when the
target variable has more than two mutually exclusive classes. It extends binary
logistic regression to handle multi-class classification problems. The logistic function
is generalized to multiple classes, typically by using the softmax function.
3. Ordinal Logistic Regression: Ordinal logistic regression is used when the target
variable has ordered or ordinal categories. In this case, the classes have a natural
ordering or ranking. The logistic regression model is adapted to estimate the
cumulative probabilities of each class relative to the others.
4. Penalized Logistic Regression: Penalized logistic regression, also known as
regularized logistic regression, includes penalty terms in the model to prevent
overfitting and improve generalization. Common penalty terms include L1
regularization (as in Lasso) and L2 regularization (as in Ridge). The L1 penalty
encourages sparsity by driving some coefficients to exactly zero, while the L2 penalty
shrinks the coefficients toward zero.
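The multi-class and ordinal variants in the list above can be sketched as follows. Both functions are illustrative: `softmax` turns one linear score per class into class probabilities, and `ordinal_probs` assumes the common cumulative-logit (proportional odds) form, P(Y <= k) = sigmoid(cutpoint_k - score), with made-up cutpoints given in increasing order.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn one linear score per class into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ordinal_probs(score, cutpoints):
    """Class probabilities from cumulative probabilities P(Y <= k).

    Assumes cutpoints are sorted in increasing order; with K - 1 cutpoints
    there are K ordered classes.
    """
    cumulative = [sigmoid(c - score) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        # P(Y = k) is the difference of adjacent cumulative probabilities.
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

print(softmax([2.0, 1.0, 0.1]))          # three unordered classes
print(ordinal_probs(0.5, [-1.0, 1.0]))   # three ordered classes
```

With two classes and scores `[z, 0]`, `softmax` reduces to the ordinary sigmoid of `z`, which is the sense in which multinomial logistic regression extends the binary case.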
These are some of the common types of logistic regression. The choice among them
depends on the nature of the data, the number of classes, whether the categories are
ordered, the need for regularization, and other specific requirements of the classification
problem at hand.
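Where regularization is needed, the penalized objective can be sketched as the usual negative log-likelihood plus penalty terms. The function name `penalized_loss`, the strengths `l1` and `l2`, and the toy numbers are all hypothetical, and leaving the intercept unpenalized is a common convention assumed here rather than something stated in these notes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def penalized_loss(coefs, X, y, l1=0.0, l2=0.0):
    """Mean negative log-likelihood plus L1 and L2 penalties on b1..bn."""
    nll = 0.0
    for x, label in zip(X, y):
        p = sigmoid(coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x)))
        nll -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    nll /= len(X)
    weights = coefs[1:]  # assumption: the intercept b0 is not penalized
    l1_term = l1 * sum(abs(b) for b in weights)   # pushes coefficients to exactly 0
    l2_term = l2 * sum(b * b for b in weights)    # shrinks coefficients toward 0
    return nll + l1_term + l2_term

# Made-up data and coefficients, for illustration only.
X = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [0.5, 3.0]]
y = [0, 1, 1, 0]
coefs = [0.1, 1.2, -0.7]

print("unpenalized:", penalized_loss(coefs, X, y))
print("with L1:    ", penalized_loss(coefs, X, y, l1=0.1))
print("with L2:    ", penalized_loss(coefs, X, y, l2=0.1))
```

Training then minimizes this combined quantity, so larger `l1` or `l2` values trade some fit on the training data for smaller coefficients.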
Logistic regression makes several key assumptions to ensure the validity and reliability of its
results. Here are the main assumptions of logistic regression:
➢ Binary or Ordinal Outcome: Logistic regression assumes that the dependent
variable (outcome) is binary (two categories) or ordinal (ordered categories). It is not
suitable for continuous outcome variables.
➢ Linearity of Log-Odds: Logistic regression assumes that the relationship between
the independent variables and the log-odds of the outcome is linear. This means that
the effect of the independent variables is additive on the log-odds scale. If there are
non-linear relationships, appropriate transformations or higher-order terms may be
needed.
➢ Large Sample Size: Logistic regression performs well with large sample sizes. A
rule of thumb is to have at least 10-15 outcome events (cases) per independent
variable to ensure stable estimates and reliable statistical inference. Insufficient
sample size may lead to overfitting or unreliable results.
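The sample-size rule of thumb above is easy to check before fitting. In this sketch the helper `events_per_variable` is hypothetical, the counts are made up, and "events" is taken to mean the rarer of the two outcomes, which is a common reading of the rule rather than something the notes spell out.

```python
def events_per_variable(labels, n_predictors):
    """Count of the rarer outcome divided by the number of predictors."""
    positives = sum(1 for v in labels if v == 1)
    negatives = len(labels) - positives
    return min(positives, negatives) / n_predictors

# Made-up example: 40 events among 200 rows, 5 independent variables.
labels = [1] * 40 + [0] * 160
epv = events_per_variable(labels, n_predictors=5)
print(epv, epv >= 10)  # prints: 8.0 False
```

Here the ratio falls below 10, which by this rule of thumb would suggest using fewer predictors, collecting more data, or applying regularization.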
It's important to note that violating these assumptions may impact the validity and reliability
of the logistic regression results. Therefore, it is advisable to assess and address these
assumptions appropriately when applying logistic regression in practice.
Logistic regression, like any statistical model, has certain limitations that should be
considered when applying it to a classification problem. Here are some common limitations
of logistic regression:
➢ Linear Relationship Assumption: Logistic regression assumes a linear relationship
between the independent variables and the log-odds of the outcome. If the
relationship is non-linear, logistic regression may not capture it accurately. In such
cases, more flexible models like decision trees or neural networks might be more
appropriate.
➢ Sensitivity to Outliers: Logistic regression can be sensitive to outliers, especially if
they are influential in affecting the estimated coefficients. Outliers with extreme
values can disproportionately influence the model, leading to biased coefficient
estimates. Robust regression techniques or outlier detection methods can be
employed to mitigate this issue.
➢ Imbalanced Data: Logistic regression may not perform well with imbalanced
datasets, where one class is significantly more prevalent than the other. It tends to be
biased towards the majority class, leading to poor predictions for the minority class.
Techniques like oversampling, undersampling, or using weighted loss functions can
help address this issue.
➢ Probability Calibration Issues: Logistic regression provides predicted probabilities,
but these probabilities may not be well-calibrated, meaning they may not accurately
reflect the true probabilities of the outcome.
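For the imbalanced-data limitation, one of the remedies mentioned above is a weighted loss function. The sketch below uses a common "balanced" heuristic, weight = n_samples / (n_classes * class_count); the function names and the 90/10 split are made up for illustration.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), so rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(y_true, probs, weights):
    """Weighted average of the per-example negative log-likelihood."""
    total, weight_sum = 0.0, 0.0
    for label, p in zip(y_true, probs):
        w = weights[label]
        total -= w * (label * math.log(p) + (1 - label) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum

# Made-up imbalanced labels: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)

# A model that always predicts a low probability looks acceptable under the
# unweighted loss but is penalized more once the minority class is up-weighted.
always_low = [0.1] * len(y)
print(weighted_log_loss(y, always_low, {0: 1.0, 1: 1.0}))
print(weighted_log_loss(y, always_low, weights))
```

Minimizing the weighted loss during training makes mistakes on the minority class cost more, which counteracts the bias towards the majority class.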
Despite these limitations, logistic regression remains a widely used and valuable tool for
binary and ordinal classification tasks. It provides interpretable results, is computationally
efficient, and works well in scenarios where the assumptions are reasonably met. However,
it's essential to carefully evaluate these limitations and consider alternative models when
necessary to ensure accurate and robust predictions.
Linear Regression | Logistic Regression
The response variable is continuous in nature. | The response variable is categorical in nature.
It helps estimate the dependent variable when the independent variable changes. | It helps calculate the probability of a particular event taking place.