Lab 4 - Logistic Regression - KNN - Notes
(Semester 2, 2024/2025)
==================================================================
LogisticRegression
Logistic regression predicts the probability of an outcome that can take only two
values. The prediction is based on one or several features (numerical and
categorical).
Build a model:
y_pred = classifier.predict(X_test)
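The lines that create and fit classifier were lost in extraction; a minimal sketch, assuming the data has already been split into X_train and y_train (variable names are assumptions):

from sklearn.linear_model import LogisticRegression

# Build the model and fit it on the training split
classifier = LogisticRegression()
classifier.fit(X_train, y_train)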
Confusion matrix:
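The confusion-matrix snippet is also missing; a minimal sketch using y_test and y_pred from the steps above:

from sklearn.metrics import confusion_matrix

# Rows correspond to the true classes, columns to the predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)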
Accuracy:
from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, y_pred))
==================================================================
KNeighborsClassifier
Syntax:
class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)
where,
n_neighbors: int, default=5. Number of neighbors to use by default for kneighbors queries.
weights:{‘uniform’, ‘distance’} or callable, default=’uniform’. Weight function used in
prediction. Possible values:
• ‘uniform’: uniform weights. All points in each neighborhood are weighted equally.
• ‘distance’: Weight points by the inverse of their distance. In this case, neighbors
closer to a query point will have a greater influence than neighbors further away.
• [callable]: a user-defined function that accepts an array of distances, and returns an
array of the same shape containing the weights.
algorithm: {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, default=’auto’. Algorithm used to
compute the nearest neighbors:
• ‘ball_tree’ will use BallTree, a data structure for nearest-neighbor searches in
multi-dimensional space. It partitions the space into "balls" rather than
hyperrectangles, which makes it suitable when the dimensionality (d) is medium
(roughly 10-100).
• ‘kd_tree’ will use KDTree (k-dimensional tree), a binary tree for partitioning the
data space. It splits the space into hyperrectangles along the coordinate axes and
is efficient for data with low dimensionality.
• ‘brute’ will use a brute-force search.
• ‘auto’ will attempt to decide the most appropriate algorithm based on the
values passed to fit method.
Note: fitting on sparse input will override the setting of this parameter, using brute
force.
leaf_size: int, default=30
Leaf size passed to BallTree or KDTree. This can affect the speed of the construction
and query, as well as the memory required to store the tree. The optimal value
depends on the nature of the problem.
p: int, default=2
Power parameter for the Minkowski metric. When p = 1, this is equivalent to using
manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p,
minkowski_distance (l_p) is used.
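For reference (not in the original notes), the Minkowski distance between two points x and y in n dimensions is

d_p(x, y) = ( \sum_{i=1}^{n} |x_i - y_i|^p )^{1/p}

so p = 1 reduces to the Manhattan (l1) distance and p = 2 to the Euclidean (l2) distance.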
metric: str or callable, default=’minkowski’
The distance metric to use for the tree. The default metric is Minkowski, and with p=2
is equivalent to the standard Euclidean metric. For a list of available metrics, see the
documentation of DistanceMetric. If the metric is “precomputed”, X is assumed to
be a distance matrix and must be square during fit. X may be a sparse graph, in which
case only “nonzero” elements may be considered neighbors.
Usage:
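The usage snippet was lost in extraction; a minimal sketch with illustrative parameter values, assuming X_train, X_test, y_train and y_test already exist:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# 5 neighbors, inverse-distance weighting, Euclidean distance (illustrative choices)
knn = KNeighborsClassifier(n_neighbors=5, weights='distance', metric='minkowski', p=2)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))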
==================================================================
DecisionTreeClassifier
Syntax:
class sklearn.tree.DecisionTreeClassifier(*, criterion='gini', splitter='best',
max_depth=None, min_samples_split=2, min_samples_leaf=1,
min_weight_fraction_leaf=0.0, max_features=None, random_state=None,
max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0)
where,
criterion: {'gini', 'entropy'}, default='gini'. The function to measure the quality of a split.
Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain.
For the "entropy" criterion (information gain):

Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i),  where  p_i = |C_{i,D}| / |D|

Info_A(D) = \sum_{j=1}^{v} (|D_j| / |D|) \cdot Info(D_j)

For the "gini" criterion:

Gini(D) = 1 - \sum_{i=1}^{m} p_i^2

− p_i: the nonzero probability that an arbitrary tuple in D belongs to class C_i.
− A binary split on A partitions D into D_1 and D_2; the Gini index of D given that
partitioning is:

Gini_A(D) = (|D_1| / |D|) \cdot Gini(D_1) + (|D_2| / |D|) \cdot Gini(D_2)
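A quick worked example of both measures (hypothetical counts, not from the lab data): for a node D with 10 tuples, 6 of class C_1 and 4 of class C_2, so p_1 = 0.6 and p_2 = 0.4:

Info(D) = -(0.6 \log_2 0.6 + 0.4 \log_2 0.4) ≈ 0.971
Gini(D) = 1 - (0.6^2 + 0.4^2) = 1 - 0.52 = 0.48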
splitter: {'best', 'random'}, default='best'. The strategy used to choose the split at each node.
Supported strategies are "best" to choose the best split and "random" to choose the best random split.
max_depth: int, default=None. The maximum depth of the tree. If None, then nodes are expanded
until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_weight_fraction_leaf: float, default=0.0. The minimum weighted fraction of the sum total of
weights (of all the input samples) required to be at a leaf node. Samples have equal weight
when sample_weight is not provided.
max_features: int, float or {“auto”, “sqrt”, “log2”}, default=None. The number of features to
consider when looking for the best split:
Note: the search for a split does not stop until at least one valid partition of the node
samples is found, even if it requires effectively inspecting more than
max_features features.
max_leaf_nodes: int, default=None. Grow a tree with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity. If None, then an unlimited number of leaf nodes is allowed.
min_impurity_decrease: float, default=0.0. A node will be split if this split induces a decrease
of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:

N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current
node, N_t_L is the number of samples in the left child, and N_t_R is the number of
samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum if sample_weight is passed.
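For example (hypothetical numbers): with N = 100, N_t = 50, a parent impurity of 0.5, a left child of N_t_L = 30 samples with impurity 0.3 and a right child of N_t_R = 20 samples with impurity 0.2, the weighted decrease is 50/100 * (0.5 - 20/50 * 0.2 - 30/50 * 0.3) = 0.5 * 0.24 = 0.12, so the node is split only if min_impurity_decrease <= 0.12.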
class_weight: dict, list of dict or 'balanced', default=None. Weights associated with classes
in the form {class_label: weight}. If None, all classes are supposed to have weight one.
For multi-output problems, a list of dicts can be provided in the same order as the columns of y.
Usage:
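The usage snippet was lost in extraction; a minimal sketch with illustrative parameter values, assuming X_train, X_test, y_train and y_test already exist:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Entropy criterion and a small depth limit (illustrative choices)
dt = DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=0)
dt.fit(X_train, y_train)

y_pred = dt.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))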