
CSE445 NSU Week - 2

The document discusses ZeroR and OneR classifiers, which are simple classification methods used in supervised learning. ZeroR predicts the majority category without considering predictors, serving as a baseline for model performance, while OneR generates a single rule for each predictor and selects the one with the smallest total error. Additionally, it covers model evaluation techniques, confusion matrices, and tools like WEKA and Anaconda for machine learning applications.

Uploaded by Rabiul


ZeroR and OneR Classifier
Classifier (Recall)

►Supervised learning data consists of pairs (X, y)

►X is the data, made up of instances with attributes <x1, x2, ..., xn> (n features)
►y is the label
►If y is discrete/categorical, the problem is a classification problem, and we need a classifier to predict it.
►If y is continuous, the problem is a regression problem.

Categorical/Discrete variables
►A categorical/discrete variable is one that has two or more categories (values).
►There are two kinds of categorical variable:
►Nominal
►A nominal variable has no intrinsic ordering to its categories.
►E.g., gender is a categorical variable with values {Male, Female} that have no intrinsic ordering.
►Ordinal (related to order)
►An ordinal variable has a clear ordering of its categories.
►E.g., Temperature = {low, medium, high}

ZeroR classifier
►ZeroR stands for Zero Rule.
►It is the simplest classification method: it relies only on the target (output, y) and ignores all predictors (features, X).
►The ZeroR classifier simply predicts the majority category (class).
►Although ZeroR has no predictive power, it is useful for establishing a baseline performance as a benchmark for other classification methods.
Algorithm
Construct a frequency table for the target and select the most frequent value.
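The frequency-table algorithm above can be sketched from scratch in a few lines of Python. This is a minimal illustration under stated assumptions, not a library implementation: the class name `ZeroR` and the toy labels are hypothetical, and the features are accepted only to show that they are ignored. scikit-learn's `DummyClassifier(strategy='most_frequent')`, used later in these slides, plays the same role.

```python
from collections import Counter


class ZeroR:
    """Hypothetical minimal ZeroR: ignores the features and predicts the majority class."""

    def fit(self, X, y):
        # Frequency table of the target; X is accepted but deliberately ignored.
        self.frequency_table_ = Counter(y)
        # most_common(1) returns [(label, count)] for the most frequent label.
        self.majority_class_ = self.frequency_table_.most_common(1)[0][0]
        return self

    def predict(self, X):
        # Every instance gets the same prediction: the majority class.
        return [self.majority_class_ for _ in X]


# Hypothetical toy data: the feature values play no part in the prediction.
X = [[1], [2], [3], [4], [5]]
y = ["yes", "yes", "no", "yes", "no"]

model = ZeroR().fit(X, y)
predictions = model.predict(X)
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)

print(model.frequency_table_)  # Counter({'yes': 3, 'no': 2})
print(model.majority_class_)   # yes
print(accuracy)                # 0.6
```

Because the prediction never depends on X, the training accuracy is simply the majority class's share of the labels (3 of 5 here).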

ZeroR classifier
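For binary classification, ZeroR is evaluated with accuracy.
This is especially revealing on an imbalanced dataset, where always predicting the majority class can already give a high accuracy.
Training accuracy, read off the frequency table of the target:
$\text{Training accuracy} = \frac{\text{number of instances in the majority class}}{\text{total number of instances}}$
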
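As a worked instance of the formula, assume the 14-instance “play golf” dataset that appears later in these slides. Its class totals, 9 Yes and 5 No, follow from the OneR confusion matrix given there (TP + FN = 7 + 2 actual Yes, FP + TN = 2 + 3 actual No), so ZeroR always predicts Yes:

```latex
\text{Training accuracy}_{\text{ZeroR}} = \frac{9}{9 + 5} = \frac{9}{14} \approx 0.643
```

Under this assumption, any classifier trained on the same data needs to exceed roughly 64% accuracy to beat the baseline.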
Classification Model Evaluation
►Models need to be evaluated, so model evaluation techniques must be in place.
►Confusion Matrix
►A confusion matrix shows the number of correct and incorrect predictions made by the classification model compared to the actual outcomes (target values) in the data.
►The matrix is N × N, where N is the number of classes in the target variable.

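To make the N × N layout concrete, here is a small pure-Python sketch that tallies a confusion matrix. The helper name `build_confusion_matrix` and the animal labels are hypothetical. It puts actual classes on the rows and predicted classes on the columns, which is the orientation of scikit-learn's `confusion_matrix`; the 2 × 2 table on the following slide uses the transposed layout (predicted values on the rows).

```python
def build_confusion_matrix(actual, predicted, labels):
    """Hypothetical helper: N x N confusion matrix as a list of lists.

    Rows index the actual class, columns index the predicted class.
    """
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1  # one count per (actual, predicted) pair
    return matrix


# Hypothetical 3-class example (N = 3).
actual = ["cat", "dog", "bird", "cat", "dog", "cat"]
predicted = ["cat", "cat", "bird", "cat", "dog", "dog"]
cm = build_confusion_matrix(actual, predicted, labels=["bird", "cat", "dog"])

for row in cm:
    print(row)
# [1, 0, 0]
# [0, 2, 1]
# [0, 1, 1]
```

The diagonal holds the correct predictions (1 + 2 + 1 = 4 of 6); every off-diagonal cell is a specific kind of mistake, e.g. row "cat", column "dog" counts cats predicted as dogs.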
Terminologies (Related to Confusion Matrix)
►Accuracy: the proportion of all predictions that were correct.
►Positive Predictive Value or Precision: the proportion of cases predicted positive that are actually positive.
►Precision = TP / (TP + FP)
►Negative Predictive Value: the proportion of cases predicted negative that are actually negative.
►Sensitivity or Recall: the proportion of actual positive cases that are correctly identified.
►Recall = TP / (TP + FN)
►Specificity: the proportion of actual negative cases that are correctly identified.
►F1-score: the harmonic mean of precision and recall, F1 = 2 × Precision × Recall / (Precision + Recall)
2 × 2 Confusion Matrix for two classes (Positive and Negative)
Rows: Model (Predicted value); Columns: Target (Actual value)

                      Actual Positive    Actual Negative
Predicted Positive    a (TP)             b (FP)             Positive Predictive Value = a/(a+b)
Predicted Negative    c (FN)             d (TN)             Negative Predictive Value = d/(c+d)
                      Sensitivity        Specificity
                      = a/(a+c)          = d/(b+d)          Accuracy = (a+d)/(a+b+c+d)

a = True Positive (TP)
b = False Positive (FP)
c = False Negative (FN)
d = True Negative (TN)
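The table's formulas translate directly into code. Below is a minimal sketch using the slide's a, b, c, d notation; the function name `binary_metrics` and the counts are hypothetical, and, as an assumption for brevity, it does not guard against a zero denominator (for example, a model that never predicts Positive).

```python
def binary_metrics(a, b, c, d):
    """Metrics from a 2 x 2 confusion matrix: a = TP, b = FP, c = FN, d = TN.

    No zero-denominator guard (simplifying assumption).
    """
    precision = a / (a + b)              # positive predictive value
    npv = d / (c + d)                    # negative predictive value
    recall = a / (a + c)                 # sensitivity / true positive rate
    specificity = d / (b + d)            # true negative rate
    accuracy = (a + d) / (a + b + c + d)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "npv": npv,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = binary_metrics(a=40, b=10, c=20, d=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts precision (0.8) and recall (about 0.667) differ, and F1 (8/11, about 0.727) lands between them, closer to the smaller value, as a harmonic mean does.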
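Terminologies (Related to Confusion Matrix): Precision-recall trade-off
►Precision and recall are important for imbalanced/skewed datasets
►E.g., cancer prediction, spam email prediction
►Raising the decision threshold usually increases precision at the cost of recall, and vice versa
►TPR (True Positive Rate) = TP / (TP + FN), the same as recall
►FPR (False Positive Rate) = FP / (FP + TN) = 1 - specificity
►ROC curve: TPR vs. FPR
►AUC: Area under the ROC Curve
►The closer the ROC AUC is to 1, the better.
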
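One way to see how an ROC curve arises is to sort instances by the model's score and lower the decision threshold one instance at a time, recording (FPR, TPR) after each step. The sketch below assumes binary labels coded 1/0, a score where higher means “more likely positive”, and no tied scores; the helper names and data are hypothetical. In practice, `roc_curve` and `roc_auc_score` in scikit-learn's `sklearn.metrics` do this, including tie handling.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the decision threshold down through the scores.

    Assumes labels are 1 (positive) / 0 (negative) and that no two scores tie.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Highest score first: each step admits one more instance as "predicted positive".
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal-rule area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(area_under_curve(points))  # approximately 0.889 (= 8/9)
```

The AUC of 8/9 here also equals the fraction of (positive, negative) pairs in which the positive instance gets the higher score (8 of 9 pairs), which is a useful way to read the number.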
Confusion Matrix of the ZeroR Classifier for the “play golf” dataset

Confusion Matrix for a multiclass classification problem

Precision and recall are calculated for EACH class/category.

The arithmetic (macro) average then weights every class equally.
The weighted average takes into account how many samples/instances of each class there are (the class's support), so a class with fewer instances has less impact on the weighted average precision/recall/F1 score.

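The difference between the macro and weighted averages is easiest to see on numbers. The sketch below derives per-class precision and recall from an N × N confusion matrix and averages them both ways. It assumes rows = actual class and columns = predicted class, and reports 0.0 when a class is never predicted or never occurs (an assumption made here); the function names and the matrix are hypothetical.

```python
def per_class_scores(cm):
    """Per-class precision, recall and support from an N x N confusion matrix.

    Assumes rows = actual class and columns = predicted class.
    """
    n = len(cm)
    precision, recall, support = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column total
        actual_k = sum(cm[k])                           # row total (support)
        precision.append(tp / predicted_k if predicted_k else 0.0)
        recall.append(tp / actual_k if actual_k else 0.0)
        support.append(actual_k)
    return precision, recall, support


def macro_average(values):
    # Every class counts equally.
    return sum(values) / len(values)


def weighted_average(values, support):
    # Each class counts in proportion to its number of instances.
    return sum(v * s for v, s in zip(values, support)) / sum(support)


# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted).
cm = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 0, 2],
]
precision, recall, support = per_class_scores(cm)
print("support:", support)  # support: [6, 6, 2]
print("macro recall:", macro_average(recall))
print("weighted recall:", weighted_average(recall, support))
print("macro precision:", macro_average(precision))
print("weighted precision:", weighted_average(precision, support))
```

The small third class (2 instances) has perfect recall, which lifts the macro recall to 7/9 (about 0.778) while the weighted recall stays at 10/14 (about 0.714). Weighted recall always equals overall accuracy, since each class's recall is multiplied back by its own support.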
ML problem steps
►EDA: Exploratory Data Analysis
►Data Preprocessing: remove duplicate entries, missing entries, and incorrect data; detect outliers and noise
►Outlier: a data point that differs significantly from other observations
►Label and one-hot encoding: convert categorical values to numerical data
►Feature scaling: bring the independent features onto a common scale, e.g., a fixed range [0, 1] (min-max normalization)
►Hyperparameter optimization/tuning: choosing a set of optimal hyperparameters for a learning algorithm

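The encoding and scaling steps above can be sketched in plain Python to show what each transformation does to a column. This is an illustration under stated assumptions: the helper names and the sample columns are hypothetical, categories are ordered alphabetically, and a constant column is mapped to 0.0 to avoid dividing by zero.

```python
def label_encode(values):
    """Map each distinct category to an integer, using sorted category order."""
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Create one 0/1 column per category, with exactly one 1 in each row."""
    classes = sorted(set(values))
    rows = [[1 if v == c else 0 for c in classes] for v in values]
    return rows, classes


def min_max_scale(values):
    """Linearly rescale numbers into the fixed range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has no spread; map it to 0.0 instead of dividing by zero.
    return [(v - lo) / span if span else 0.0 for v in values]


# Hypothetical sample columns for illustration.
outlook = ["sunny", "overcast", "rainy", "sunny"]
temperature = [85, 64, 72, 70]

codes, mapping = label_encode(outlook)
one_hot, columns = one_hot_encode(outlook)
scaled = min_max_scale(temperature)

print(mapping)           # {'overcast': 0, 'rainy': 1, 'sunny': 2}
print(codes)             # [2, 0, 1, 2]
print(columns, one_hot)  # ['overcast', 'rainy', 'sunny'] [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(scaled)
```

Label encoding imposes an order (overcast < rainy < sunny here) that a nominal variable does not have, so one-hot encoding is usually the safer choice for nominal predictors, while an explicit ordered mapping suits ordinal ones such as {low, medium, high}. Library equivalents include pandas `get_dummies` and scikit-learn's `LabelEncoder`, `OneHotEncoder`, and `MinMaxScaler`.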
ML library
►WEKA: Machine Learning Software in Java
►Contains tools for data preparation, classification, regression, clustering,
association rules mining, and visualization
►Developed at the University of Waikato, New Zealand
►Anaconda: Anaconda is a distribution of the Python and R programming
languages for scientific computing (data science, machine learning applications,
large-scale data processing, predictive analytics, etc.), that aims to simplify
package management and deployment
►Anaconda Navigator is a desktop graphical user interface (GUI) included in
Anaconda distribution that allows users to launch applications and manage
conda packages, environments and channels without using command-line
commands

WEKA

►Weka is a collection of machine learning algorithms for data mining tasks.
►It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.
►Weka supports deep learning!
►Built in Java
►Link: https://www.cs.waikato.ac.nz/ml/weka/

Weka Explorer

Weka File Selection

Weka

Weka Results

Weka Results

Anaconda libraries
►Jupyter Notebook: the Jupyter Notebook App is a server-client application that allows editing and running notebook documents via a web browser
►Pandas: reading and writing data, data preprocessing (data handling)
►Matplotlib, seaborn: data visualization libraries
►Sklearn (scikit-learn): machine learning library
►TensorFlow: library for machine learning and deep neural networks
►NumPy (Numerical Python): support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays
►Change to your working directory from the Anaconda Prompt (e.g., with cd) before launching Jupyter Notebook
►In Jupyter Notebook, a green cell border means edit mode and a blue border means command mode

Google Colab
►A free, cloud-hosted notebook service from Google
►Provides free GPU support
►Kaggle: a subsidiary of Google LLC; an online community of data scientists and machine learning practitioners.
►Kaggle allows users to find and publish datasets, explore and build models in a web-based data science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

OneR Classifier
►OneR stands for “One Rule”
►A simple, yet accurate, classification algorithm that generates one rule for each predictor in the data, then selects the rule with the smallest total error as its “one rule”.
OneR Algorithm:
For each predictor,
    For each value of that predictor, make a rule as follows:
        Count how often each value of the target (class) appears
        Find the most frequent class
        Make the rule assign that class to this value of the predictor
    Calculate the total error of the rules of each predictor
Choose the predictor with the smallest total error.
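The pseudocode above maps onto a short from-scratch Python sketch. It assumes the standard 14-row nominal weather (“play golf”) dataset, typed in below; with these rows it reproduces the per-predictor error totals given in the following slides, including the 4-4 tie between Outlook and Humidity. The function name `one_r` is hypothetical, and breaking ties by column order is a choice made here, not something the algorithm specifies.

```python
from collections import Counter, defaultdict

# Assumed data: the standard 14-row nominal weather ("play golf") dataset.
COLUMNS = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def one_r(rows, columns, target):
    """Return (best predictor, its rule, total error of every predictor)."""
    t = columns.index(target)
    rules, errors = {}, {}
    for p, name in enumerate(columns):
        if p == t:
            continue
        # Frequency table: predictor value -> counts of each class.
        table = defaultdict(Counter)
        for row in rows:
            table[row[p]][row[t]] += 1
        rule, error = {}, 0
        for value, counts in table.items():
            majority, hits = counts.most_common(1)[0]
            rule[value] = majority                # assign the most frequent class
            error += sum(counts.values()) - hits  # instances this rule misclassifies
        rules[name], errors[name] = rule, error
    # min() returns the first minimum, so on a tie the earlier predictor wins.
    best = min(errors, key=errors.get)
    return best, rules[best], errors


best, rule, errors = one_r(ROWS, COLUMNS, "Play")
print(errors)  # {'Outlook': 4, 'Temp': 5, 'Humidity': 4, 'Windy': 5}
print(best)    # Outlook
print(rule)    # {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'}
```

The resulting rule (sunny → no, overcast → yes, rainy → yes) is the one whose predictions give the TP/FP/FN/TN counts in the evaluation slide.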
OneR classifier

OneR classifier
Outlook_error = 2 + 0 + 2 = 4 (smallest error, tied with Humidity; Outlook is taken as the winning predictor)
Temp_error = 2 + 2 + 1 = 5
Humidity_error = 3 + 1 = 4
Windy_error = 2 + 3 = 5

Model Evaluation (OneR classifier)

TP = 7
FP = 2
FN = 2
TN = 3
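Plugging these counts into the confusion-matrix formulas gives the OneR model's training-set scores, assuming Play = Yes is the positive class (TP + FN = 9 actual Yes, FP + TN = 5 actual No):

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + FN + TN} = \frac{7 + 3}{14} = \frac{10}{14} \approx 0.714 \\
\text{Precision} &= \frac{TP}{TP + FP} = \frac{7}{7 + 2} \approx 0.778 \\
\text{Recall (Sensitivity)} &= \frac{TP}{TP + FN} = \frac{7}{7 + 2} \approx 0.778 \\
\text{Specificity} &= \frac{TN}{TN + FP} = \frac{3}{3 + 2} = 0.600 \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \cdot 7}{2 \cdot 7 + 2 + 2} = \frac{14}{18} \approx 0.778
\end{aligned}
```

For comparison, always predicting the majority class Yes scores 9/14 ≈ 0.643 accuracy on the same 14 instances, so the single Outlook rule improves on that baseline. Both figures are measured on the training data and are likely optimistic for unseen cases.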

Machine learning in Python

Scikit-learn: a free machine learning library for the Python programming language

Pandas: a software library written for the Python programming language for data manipulation and analysis

Matplotlib: a plotting library for the Python programming language

NumPy: a library for the Python programming language used for working with arrays and matrices
ZeroR classifier using Python

imports
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import pandas as pd

ZeroR classifier using Python

Reading dataset
df = pd.read_csv(r'E:\ML-teaching\Sample Datasets\weather.csv')  # raw string: the backslashes in the Windows path are not escape sequences

Try the following

print(df)
print(df.shape)
print(df.head())
print(df.describe())

ZeroR classifier using Python

Separate X and y

X = df.drop(columns = ['Play'])
y = df['Play']

ZeroR classifier using Python

Train and Test

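model = DummyClassifier(strategy='most_frequent', random_state=0)  # ZeroR baseline; random_state has no effect with 'most_frequent'
model.fit(X, y)                          # learns only the majority class of y; X is ignored
predictions = model.predict(X)           # the majority class for every row
score = accuracy_score(y, predictions)   # training accuracy
print(score)
print(confusion_matrix(y, predictions))
print(classification_report(y, predictions, zero_division=0))  # the minority class is never predicted, so its precision is reported as 0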
Future

We should ideally divide the dataset into a training set and a test set.
The training set should be used to build the model.
The test set should be used to measure how well the model predicts unseen (unlabeled) cases.
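A holdout split of this kind can be sketched in plain Python, here paired with a ZeroR baseline that is fitted on the training part only and scored on the test part. The helper name `holdout_split`, the 30% test fraction, the seed, and the 9 Yes / 5 No toy labels are all assumptions for illustration. scikit-learn's `sklearn.model_selection.train_test_split` (with `test_size`, `random_state`, and optionally `stratify`) is the usual library tool for this.

```python
import random
from collections import Counter


def holdout_split(X, y, test_size=0.3, seed=0):
    """Shuffle row indices with a fixed seed and hold out test_size of them for testing."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(y) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


# Hypothetical data: 14 instances whose single feature is just the row number,
# with 9 "yes" and 5 "no" labels.
X = list(range(14))
y = ["yes"] * 9 + ["no"] * 5

X_train, X_test, y_train, y_test = holdout_split(X, y)

# Fit ZeroR on the training set only...
majority = Counter(y_train).most_common(1)[0][0]
# ...and measure accuracy on the held-out test set.
test_accuracy = sum(label == majority for label in y_test) / len(y_test)
print(len(X_train), len(X_test), majority, test_accuracy)
```

The test labels are never used to build the model, so the test accuracy estimates performance on new cases. With only 14 rows the estimate is noisy; `stratify=y` in `train_test_split` keeps the class proportions similar in both parts.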
