Classification Algorithms II
where P(y_i = 1 | X = x_i) is the probability of the i-th observation's target value, y_i, being class
1, X is the training data, and β_0, β_1 are the parameters to be learned. The effect of the
logistic function is to constrain the value of the function's output to between 0 and 1 so that
it can be interpreted as a probability. If P(y_i = 1 | X = x_i) is greater than or equal to 0.5, class 1
is predicted; otherwise, class 0 is predicted.
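As a quick illustration of this decision rule (a minimal sketch added here, with made-up parameter values, not taken from the text), the logistic function can be written out and thresholded directly:
# A minimal sketch of the logistic (sigmoid) function and the 0.5 decision rule
import numpy as np

def sigmoid(z):
    # Squashes any real number into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

# Hypothetical learned parameters beta_0, beta_1 and a single feature value x_i
beta_0, beta_1, x_i = -1.0, 2.0, 0.8
p = sigmoid(beta_0 + beta_1 * x_i)
predicted_class = 1 if p >= 0.5 else 0
print(p, predicted_class)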
Iris dataset
This is a classical dataset included in scikit-learn in the datasets module. We can load it
by calling the load_iris() function:
from sklearn.datasets import load_iris
iris = load_iris()
The iris object that is returned by load_iris is a Bunch object, which is very similar to a
dictionary. It contains keys and values:
print("Keys of iris_dataset: \n{}".format(iris.keys()))
Keys of iris_dataset:
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR',
'feature_names', 'filename', 'data_module'])
The value of the key DESCR is a short description of the dataset. We show the beginning of
the description here (feel free to look up the rest yourself):
print(iris['DESCR'][:193] + "\n...")
.. _iris_dataset:
The value of the key target_names is an array of strings, containing the species of flower
that we want to predict:
print("Target names: {}".format(iris['target_names']))
The value of feature_names is a list of strings, giving the description of each feature:
print("Feature names: \n{}".format(iris['feature_names']))
Feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal
width (cm)']
The data itself is contained in the target and data fields. data contains the numeric
measurements of sepal length, sepal width, petal length, and petal width in a NumPy array:
print("Type of data: {}".format(type(iris['data'])))
The rows in the data array correspond to flowers, while the columns represent the four
measurements that were taken for each flower:
print("Shape of data: {}".format(iris['data'].shape))
We see that the array contains measurements for 150 different flowers. The following are
the feature values for the first five samples:
print("First five rows of data:\n{}".format(iris['data'][:5]))
From this data, we can see that all of the first five flowers have a petal width of 0.2 cm and
that the first flower has the longest sepal, at 5.1 cm.
print("Shape of target: {}".format(iris['target'].shape))
Shape of target: (150,)
print("Target:\n{}".format(iris['target']))
Target:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
The meanings of the numbers are given by the iris['target_names'] array: 0 means
"setosa", 1 means "versicolor", and 2 means "virginica".
In scikit-learn, we can learn a logistic regression model using LogisticRegression.
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
# Load data with only two classes (so that this is a binary problem)
iris = datasets.load_iris()
features = iris.data[:100,:]
target = iris.target[:100]
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)
# Create logistic regression object
logreg = LogisticRegression(random_state=0)
# Train model
model = logreg.fit(features_standardized, target)
Once it is trained, we can use the model to predict the class of new observations.
# Create new observation
new_observation = [[.5, .5, .5, .5]]
# Predict class
model.predict(new_observation)
array([1])
In this example, our observation was predicted to be class 1. Additionally, we can see the
probability that an observation is a member of each class:
# View predicted probabilities
model.predict_proba(new_observation)
array([[0.17738424, 0.82261576]])
Our observation had a 17.7% chance of being class 0 and an 82.2% chance of being class 1.
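The class returned by predict corresponds to the column with the higher predicted probability; a small added check (not part of the original text) makes that connection explicit:
# Load library
import numpy as np
# The predicted class is the index of the largest predicted probability
np.argmax(model.predict_proba(new_observation), axis=1)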
Multiclass classifier
On their own, logistic regressions are only binary classifiers, meaning they cannot handle
target vectors with more than two classes. However, two clever extensions to logistic
regression do just that.
First, in one-vs-rest logistic regression (OVR) a separate model is trained for each class to
predict whether an observation is that class or not (thus making it a binary classification
problem). It assumes that each classification problem (e.g., class 0 or not) is independent.
To make a prediction, all binary classifiers are run on a test point. The classifier that has the
highest score on its single class "wins", and this class label is returned as the prediction.
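scikit-learn also exposes this strategy explicitly through OneVsRestClassifier; the following sketch (an added illustration, not code from the text) fits one binary logistic regression per class:
# Load libraries
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
# Load and standardize the full three-class iris data
iris = datasets.load_iris()
features_standardized = StandardScaler().fit_transform(iris.data)
target = iris.target
# Fit one binary classifier per class; the highest-scoring class wins
ovr = OneVsRestClassifier(LogisticRegression(random_state=0))
model_ovr = ovr.fit(features_standardized, target)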
Alternatively, in multinomial logistic regression (MLR), the logistic function is replaced
with a softmax function:
\[
P(y_i = k \mid X = x_i) = \frac{e^{\beta_k x_i}}{1 + \sum_{j=1}^{K-1} e^{\beta_j x_i}},
\]
where P(y_i = k | X = x_i) is the probability of the i-th observation's target value, y_i, being class
k, and K is the total number of classes. One practical advantage of the MLR is that its
predicted probabilities using the predict_proba method are more reliable (i.e., better
calibrated).
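To make the formula concrete, here is a tiny numeric sketch (the scores are made up for illustration) using class K as the reference class, as in the equation above:
# Load library
import numpy as np
# Hypothetical linear scores beta_j * x_i for the K-1 non-reference classes
scores = np.array([1.2, -0.3])
denominator = 1 + np.exp(scores).sum()
p_classes = np.exp(scores) / denominator   # probabilities of classes 1..K-1
p_reference = 1 / denominator              # probability of the reference class K
print(p_classes, p_reference, p_classes.sum() + p_reference)  # probabilities sum to 1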
When using LogisticRegression we can select which of the two techniques we want via the
multi_class argument, with ovr selecting one-vs-rest logistic regression (the default in older
versions of scikit-learn). We can switch to an MLR by setting the argument to multinomial.
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    features_standardized, target, random_state=0)
# Create one-vs-rest logistic regression object
# (setting multi_class="multinomial" instead would give an MLR)
logreg = LogisticRegression(random_state=0, multi_class="ovr")
# Train model
model = logreg.fit(X_train, y_train)
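Because we held out a test set above, one quick added check (not part of the original text) is to score the fitted model on it:
# Evaluate accuracy on the held-out test set
model.score(X_test, y_test)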
Regularization
Regularization shrinks the model's coefficients by adding a penalty term, scaled by a parameter
α, to the loss function. Higher values of α increase the penalty for larger parameter values (i.e.,
more complex models). For LogisticRegression, the trade-off parameter that determines the strength
of the regularization is called C, where C is the inverse of the regularization strength: C = 1/α,
and higher values of C correspond to less regularization. In other words, when you use a
high value for the parameter C, LogisticRegression tries to fit the training set as best as
possible, while with low values of the parameter C, the models put more emphasis on
finding a coefficient vector (β) that is close to zero.
There is another interesting aspect of how the parameter C acts. Using low values of C will
cause the algorithms to try to adjust to the "majority" of data points, while using a higher
value of C stresses the importance that each individual data point be classified correctly.
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    features_standardized, target, random_state=0)
# Create logistic regression object (this value of C is only illustrative)
logreg = LogisticRegression(C=10, random_state=0)
# Train model
model = logreg.fit(X_train, y_train)
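To see the effect described above, a small added experiment (the specific values of C are arbitrary) compares the size of the learned coefficients for different settings of C:
# Load library
import numpy as np
# Smaller C = stronger regularization = coefficients pulled toward zero
for C in [0.01, 1, 100]:
    model_c = LogisticRegression(C=C, random_state=0).fit(features_standardized, target)
    print(C, np.abs(model_c.coef_).sum())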
If we desire a more interpretable model, using L1 regularization might help, as it limits the
model to using only a few features.
# Create multinomial logistic regression object
logreg = LogisticRegression(penalty='l1', solver="saga", C=100,
                            random_state=0, multi_class="multinomial")
# Train model
model = logreg.fit(X_train, y_train)
/usr/local/lib/python3.8/dist-packages/sklearn/linear_model/
_sag.py:352: ConvergenceWarning: The max_iter was reached which means
the coef_ did not converge
warnings.warn(
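The warning means the saga solver stopped at its iteration limit before fully converging. One way to address it (this adjustment is an addition, not shown above) is to allow more iterations via max_iter:
# Allow the saga solver more iterations (the max_iter value is illustrative)
logreg = LogisticRegression(penalty='l1', solver="saga", C=100,
                            random_state=0, multi_class="multinomial",
                            max_iter=5000)
model = logreg.fit(X_train, y_train)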
As in regression, the penalty parameter influences the regularization and whether the
model will use all available features or select only a subset.
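To check whether the L1 penalty actually dropped some features, an added illustrative step is to inspect the fitted coefficients and count how many were set exactly to zero:
# Load library
import numpy as np
# Coefficients that are exactly zero correspond to features the model ignores
print(model.coef_)
print("Number of zeroed coefficients:", np.sum(model.coef_ == 0))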