Classification Algorithms II

This document summarizes key aspects of logistic regression and its use in machine learning. Logistic regression can be used for binary classification by predicting the probability that an observation belongs to a class. It fits a linear model within a logistic function to output probabilities between 0 and 1. The Iris dataset is commonly used for demonstrations and contains flower measurements for three species. Logistic regression can be extended to multiclass classification using one-vs-rest or multinomial approaches. Regularization is also discussed as a method to reduce overfitting by adding a penalty term to the loss function.


The following study notes are from the textbook "Machine Learning with Python Cookbook".

Logistic regression as a binary classifier


Despite being called a regression, logistic regression is a widely used supervised
classification technique. Logistic regression and its extensions, like multinomial logistic
regression, allow us to predict the probability that an observation is of a certain class using
a straightforward and well-understood approach.
In a logistic regression, a linear model (e.g., $\beta_0 + \beta_1 x$) is included in a logistic (also called sigmoid) function, $\frac{1}{1 + e^{-z}}$, such that:

$$P(y_i = 1 \mid X = x_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}},$$

where $P(y_i = 1 \mid X = x_i)$ is the probability of the $i$th observation's target value, $y_i$, being class 1, $X$ is the training data, and $\beta_0$, $\beta_1$ are the parameters to be learned. The effect of the logistic function is to constrain the value of the function's output to between 0 and 1 so that it can be interpreted as a probability. If $P(y_i = 1 \mid X = x_i)$ is greater than or equal to 0.5, class 1 is predicted; otherwise, class 0 is predicted.
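
As a quick illustration (not from the textbook), the following sketch computes the sigmoid by hand with NumPy and applies the 0.5 threshold; the values of beta_0 and beta_1 are made up purely for demonstration.

# Load library
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

# Hypothetical learned parameters (for illustration only)
beta_0, beta_1 = -1.0, 2.0

# A few observations of a single feature
x = np.array([-2.0, 0.0, 0.5, 3.0])

# Probability that each observation belongs to class 1
probabilities = sigmoid(beta_0 + beta_1 * x)
print(probabilities)

# Apply the 0.5 threshold to obtain class predictions
print((probabilities >= 0.5).astype(int))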
Iris dataset
This is a classical dataset included in scikit-learn in the datasets module. We can load it
by calling the load_iris() function:
from sklearn.datasets import load_iris
iris = load_iris()

The iris object that is returned by load_iris is a Bunch object, which is very similar to a
dictionary. It contains keys and values:
print("Keys of iris_dataset: \n{}".format(iris.keys()))

Keys of iris_dataset:
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR',
'feature_names', 'filename', 'data_module'])

The value of the key DESCR is a short description of the dataset. We show the beginning of
the description here (feel free to look up the rest yourself):
print(iris['DESCR'][:193] + "\n...")

.. _iris_dataset:

Iris plants dataset


--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)


:Number of Attributes: 4 numeric, pre
...

The value of the key target_names is an array of strings, containing the species of flower
that we want to predict:
print("Target names: {}".format(iris['target_names']))

Target names: ['setosa' 'versicolor' 'virginica']

The value of feature_names is a list of strings, giving the description of each feature:
print("Feature names: \n{}".format(iris['feature_names']))

Feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal
width (cm)']

The data itself is contained in the target and data fields. data contains the numeric
measurements of sepal length, sepal width, petal length, and petal width in a NumPy array:
print("Type of data: {}".format(type(iris['data'])))

Type of data: <class 'numpy.ndarray'>

The rows in the data array correspond to flowers, while the columns represent the four
measurements that were taken for each flower:
print("Shape of data: {}".format(iris['data'].shape))

Shape of data: (150, 4)

We see that the array contains measurements for 150 different flowers. The following are the feature values for the first five samples:
print("First five rows of data:\n{}".format(iris['data'][:5]))

First five rows of data:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]]

From this data, we can see that all of the first five flowers have a petal width of 0.2 cm and
that the first flower has the longest sepal, at 5.1 cm.
print("Shape of target: {}".format(iris['target'].shape))
Shape of target: (150,)

The species are encoded as integers from 0 to 2:


print("Target:\n{}".format(iris['target']))

Target:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

The meanings of the numbers are given by the iris['target_names'] array: 0 means
"setosa", 1 means "versicolor", and 2 means "virginica".
In scikit-learn, we can learn a logistic regression model using LogisticRegression.
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Load data with only two classes
features = iris.data[:100,:]
target = iris.target[:100]

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object
logreg = LogisticRegression(random_state=0)

# Train model
model = logreg.fit(features_standardized, target)

Once it is trained, we can use the model to predict the class of new observations.
# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
model.predict(new_observation)

array([1])
In this example, our observation was predicted to be class 1. Additionally, we can see the
probability that an observation is a member of each class:
# View predicted probabilities
model.predict_proba(new_observation)

array([[0.17738424, 0.82261576]])

Our observation had a 17.7% chance of being class 0 and an 82.3% chance of being class 1.
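
The predicted class is simply the column with the largest probability; a quick check (not in the original notes) confirms that predict and predict_proba agree:

# Load library
import numpy as np

# The predicted class is the column with the largest probability
np.argmax(model.predict_proba(new_observation), axis=1)

array([1])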

Multiclass classifier
On their own, logistic regressions are only binary classifiers, meaning they cannot handle
target vectors with more than two classes. However, two clever extensions to logistic
regression do just that.
First, in one-vs-rest logistic regression (OVR) a separate model is trained for each class to predict whether an observation is that class or not (thus making it a binary classification problem). It assumes that each classification problem (e.g., class 0 or not) is independent. To make a prediction, all binary classifiers are run on a test point. The classifier with the highest score on its single class "wins", and that class label is returned as the prediction.
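
To see what OVR does conceptually, here is a minimal hand-rolled sketch (an illustration only, not scikit-learn's internal implementation): one binary classifier is trained per class, and the class whose classifier produces the highest decision score wins.

# Load libraries
import numpy as np
from sklearn.linear_model import LogisticRegression

def ovr_predict(X_train, y_train, X_test):
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        # Binary problem: "class c" vs. "not class c"
        binary_target = (y_train == c).astype(int)
        clf = LogisticRegression().fit(X_train, binary_target)
        # decision_function gives each classifier's confidence for its own class
        scores.append(clf.decision_function(X_test))
    # The class whose classifier scores highest wins
    return classes[np.argmax(np.vstack(scores), axis=0)]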
Alternatively, in multinomial logistic regression (MLR), the logistic function is replaced with a softmax function:

$$P(y_i = k \mid X = x_i) = \frac{e^{\beta_k x_i}}{1 + \sum_{j=1}^{K-1} e^{\beta_j x_i}},$$

where $P(y_i = k \mid X = x_i)$ is the probability of the $i$th observation's target value, $y_i$, being class $k$, and $K$ is the total number of classes. One practical advantage of MLR is that the probabilities predicted by the predict_proba method are more reliable (i.e., better calibrated).
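A minimal NumPy sketch of the softmax idea (added here for illustration; this is the common form that normalizes over all K classes, which is equivalent to the reference-class form above):

# Load library
import numpy as np

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

# Hypothetical linear scores (one per class) for a single observation
scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))        # roughly [0.66 0.24 0.10]
print(softmax(scores).sum())  # probabilities sum to 1.0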
When using LogisticRegression we can select which of the two techniques we want through the multi_class argument: the examples below use ovr for OVR and switch to an MLR by setting the argument to multinomial. (In recent scikit-learn versions the default is "auto", which selects multinomial whenever the solver supports it.)
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(
    features_standardized, target, stratify=target, random_state=66)

print ("Training data shape:{}".format(X_train.shape))


print ("Test data shape :{}".format(X_test.shape))

Training data shape:(112, 4)


Test data shape :(38, 4)

# Create one-vs-rest logistic regression object
logreg = LogisticRegression(multi_class="ovr")

# Train model
model = logreg.fit(X_train, y_train)

print("Test set predictions: {}".format(model.predict(X_test)))


# We calculate the predictions for y_test.

print("Test set accuracy: {:.2f}".format(logreg.score(X_test,


y_test)))
# To evaluate how well our model generalizes, we call the score method
with the test data together
# with the test labels.

Test set predictions: [1 0 1 0 0 2 0 1 2 2 1 2 2 2 2 2 0 0 1 0 1 0 2 2
 0 0 0 2 1 2 1 2 2 0 1 2 2 0]
Test set accuracy: 0.87

# Create multinomial logistic regression object
logreg = LogisticRegression(random_state=0, multi_class="multinomial")

# Train model
model = logreg.fit(X_train, y_train)

print("Test set predictions: {}".format(model.predict(X_test)))


# We calculate the predictions for y_test.

print("Test set accuracy: {:.2f}".format(logreg.score(X_test,


y_test)))
# To evaluate how well our model generalizes, we call the score method
with the test data together
# with the test labels.
Test set predictions: [1 0 1 0 0 2 0 1 2 2 1 2 2 2 2 2 0 0 1 0 1 0 2 2
 0 0 0 2 1 1 1 2 2 0 1 1 1 0]
Test set accuracy: 0.95

Reducing variance through regularization


Regularization is a method of penalizing complex models to reduce their variance. Specifically, a penalty term is added to the loss function we are trying to minimize, typically the L1 or L2 penalty.
The L1 penalty is:

$$\alpha \sum_{j=1}^{p} |\hat{\beta}_j|,$$

where $\hat{\beta}_j$ is the parameter of the $j$th of $p$ features being learned and $\alpha$ is a hyperparameter denoting the regularization strength. The L2 penalty is:

$$\alpha \sum_{j=1}^{p} \hat{\beta}_j^2.$$

Higher values of $\alpha$ increase the penalty for larger parameter values (i.e., more complex models). For LogisticRegression, the trade-off parameter that determines the strength of the regularization is called C, where C is the inverse of the regularization strength, $C = \frac{1}{\alpha}$, and higher values of C correspond to less regularization. In other words, with a high value of the parameter C, LogisticRegression tries to fit the training set as well as possible, while with low values of C the model puts more emphasis on finding a coefficient vector (β) that is close to zero.
There is another interesting aspect of how the parameter C acts. Using low values of C will cause the algorithm to adjust to the "majority" of data points, while using a higher value of C stresses the importance of classifying each individual data point correctly.
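
Before the full worked example below, here is a quick illustrative comparison (not from the textbook) of coefficient sizes under a very small and a very large C; exact numbers depend on the data, but the small-C coefficients should sit much closer to zero.

# Load libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = load_iris()
features_standardized = StandardScaler().fit_transform(iris.data)
target = iris.target

for C in [0.01, 100]:
    clf = LogisticRegression(C=C).fit(features_standardized, target)
    # Smaller C means stronger regularization and coefficients shrunk toward zero
    print("C={}: mean |coefficient| = {:.3f}".format(C, np.abs(clf.coef_).mean()))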
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(
    features_standardized, target, stratify=target, random_state=66)

print ("Training data shape:{}".format(X_train.shape))


print ("Test data shape :{}".format(X_test.shape))

Training data shape:(112, 4)


Test data shape :(38, 4)

# Create multinomial logistic regression object
logreg = LogisticRegression(penalty='l2', C=100, random_state=0,
multi_class="multinomial")

# Train model
model = logreg.fit(X_train, y_train)

print("Test set predictions: {}".format(model.predict(X_test)))


# We calculate the predictions for y_test.

print("Test set accuracy: {:.2f}".format(logreg.score(X_test,


y_test)))
# To evaluate how well our model generalizes, we call the score method
with the test data together
# with the test labels.

Test set predictions: [1 0 1 0 0 2 0 1 2 2 1 2 2 2 2 2 0 0 1 0 2 0 2 2
 0 0 0 2 1 1 1 2 2 0 1 1 1 0]
Test set accuracy: 0.97

If we desire a more interpretable model, using L1 regularization might help, as it limits the model to using only a few features.
# Create multinomial logistic regression object
logreg = LogisticRegression(penalty='l1', solver="saga", C=100,
random_state=0, multi_class="multinomial")

# Train model
model = logreg.fit(X_train, y_train)

print("Test set predictions: {}".format(model.predict(X_test)))


# We calculate the predictions for y_test.

print("Test set accuracy: {:.2f}".format(logreg.score(X_test,


y_test)))
# To evaluate how well our model generalizes, we call the score method
with the test data together
# with the test labels.

Test set predictions: [1 0 1 0 0 2 0 1 2 2 1 2 2 2 2 2 0 0 1 0 2 0 2 2
 0 0 0 2 1 1 1 2 2 0 1 1 1 0]
Test set accuracy: 0.97

/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/
_sag.py:352: ConvergenceWarning: The max_iter was reached which means
the coef_ did not converge
warnings.warn(

The ConvergenceWarning above indicates that the saga solver reached its iteration limit before fully converging; increasing the max_iter argument of LogisticRegression usually resolves it. As in regression, the penalty parameter influences the regularization and whether the model will use all available features or select only a subset.
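
To check how sparse the L1-penalized model actually is, we can count the coefficients driven exactly to zero (a small addition; with C=100 the penalty is weak, so lowering C, e.g. to 0.1, would typically zero out more of them).

# Load library
import numpy as np

# Count coefficients that the L1 penalty has driven exactly to zero
print("Zero coefficients: {} of {}".format(np.sum(model.coef_ == 0), model.coef_.size))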

Handling imbalanced classes


Like many other learning algorithms in scikit-learn, LogisticRegression comes with
a built-in method of handling imbalanced classes. If we have highly imbalanced classes and
have not addressed it during preprocessing, we have the option of using the
class_weight parameter to weight the classes to make certain we have a balanced mix of
each class. Specifically, the balanced argument will automatically weight classes in inverse proportion to their frequency:
$$w_j = \frac{n}{k\,n_j},$$

where $w_j$ is the weight of class $j$, $n$ is the number of observations, $n_j$ is the number of observations in class $j$, and $k$ is the total number of classes.
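scikit-learn exposes this calculation directly through compute_class_weight; before the full example, a quick sketch (added here) checks the formula on an artificial imbalanced target vector.

# Load libraries
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Artificial imbalanced target: 10 observations of class 0, 90 of class 1
y = np.array([0] * 10 + [1] * 90)

# balanced weights: n / (k * n_j) = 100/(2*10) = 5.0 and 100/(2*90) = 0.56
print(compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y))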
# Load libraries
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Make class highly imbalanced by removing first 40 observations
features = features[40:,:]
target = target[40:]
# Create target vector indicating if class 0, otherwise 1
target = np.where((target == 0), 0, 1)

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(
    features_standardized, target, stratify=target, random_state=66)

print ("Training data shape:{}".format(X_train.shape))


print ("Test data shape :{}".format(X_test.shape))

Training data shape:(82, 4)


Test data shape :(28, 4)

# Create classifier object
logreg = LogisticRegression(random_state=0, class_weight="balanced",
multi_class="multinomial", n_jobs=-1)

# Train model
model = logreg.fit(X_train, y_train)

print("Test set predictions: {}".format(model.predict(X_test)))


# We calculate the predictions for y_test.

print("Test set accuracy: {:.2f}".format(logreg.score(X_test,


y_test)))
# To evaluate how well our model generalizes, we call the score method
with the test data together
# with the test labels.

Test set predictions: [1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1
 1 1 1 1]
Test set accuracy: 1.00
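
With imbalanced classes, overall accuracy can be misleading (always predicting the majority class would already score roughly 89% on this test set), so it is worth also inspecting a per-class breakdown, for example with a confusion matrix (a small addition to the notes):

# Load library
from sklearn.metrics import confusion_matrix

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, model.predict(X_test)))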
