
Lecture 03 – Classification
Recall – Classification
• Definition: Classification is a supervised learning task where
the goal is to predict the category or label of a given input
based on learned patterns from a dataset
• Spam Detection (Spam vs. Not Spam)
• Fault Detection (Fault vs. Not Fault)

• Output: The model assigns a class label to the input (e.g., class "A" or "B").
Classification algorithms
• Some popular classification algorithms include:
• Decision Trees
• Logistic Regression
• Support Vector Machines (SVM)
• k-Nearest Neighbors (k-NN)
• Neural Networks
• In this lecture we will focus on Decision Trees
Decision Tree
• A decision tree is a supervised machine
learning algorithm used for classification.
• It models decisions and their possible
consequences in a tree-like structure.
• A decision tree is a model composed of a
collection of "questions" organized
hierarchically in the shape of a tree

https://developers.google.com/machine-learning/decision-forests/decision-trees
Decision Tree
• Root Node: The topmost node
representing the entire dataset
• Internal Nodes: Nodes that
perform tests on features
• Branches: Edges connecting
nodes, representing the outcome
of a test
• Leaf Nodes: Terminal nodes that
provide the final decision or
prediction
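As a concrete illustration of these parts, here is a minimal Python sketch (not from the slides) of how a binary decision-tree node could be represented; the field names (prediction, feature, threshold, left, right) are illustrative assumptions.

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass
    class Node:
        # Leaf node: holds the final class label and has no children
        prediction: Optional[Any] = None
        # Internal node: the feature index and threshold it tests
        feature: Optional[int] = None
        threshold: Optional[float] = None
        # Branches: left = test is true (<= threshold), right = test is false
        left: Optional["Node"] = None
        right: Optional["Node"] = None

        def is_leaf(self) -> bool:
            return self.prediction is not None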
Types of Decision Trees
• In practice, decision trees are generally binary trees (each internal node has exactly two branches), and we will focus on those in this lecture
Building a Decision Tree
• We split the data based on questions like:
• "Is the pet a cat or dog?"
• "Is the temperature high or low?"
• Goal: We want each split to make the groups as pure as
possible (like grouping similar items together)

What criterion should we use to split the data?


Understanding Entropy
1. What is Entropy in Thermodynamics?
• In physics, entropy measures disorder in a system (how mixed or random
things are)
• For example, a room with scattered objects has high entropy (high
disorder)
2. Entropy in Decision Trees:
• Entropy helps us measure how mixed or uncertain our data is.
• Lower entropy means the data is more organized (purer), just like a clean room has
lower disorder
Goal: We want to reduce entropy at each step (make the groups less
mixed)
Information Gain (How We Use Entropy)
1. Information Gain: When we split the data, we calculate how
much entropy (disorder) we reduced. This is called
information gain (IG).
2. Why Use Entropy?
• By choosing the splits that reduce entropy the most, we’re making
the data more organized and easier to classify.

Final Goal: We keep splitting the data until each group is as pure as possible, resulting in a well-organized tree.
Entropy Calculation
• If all data points in a group belong to the same category (very organized), the entropy is low (close to 0).
• If the data points are evenly split between two categories (very mixed), the entropy is high (close to 1).
• For a two-class group with proportions p and 1 − p, the entropy is H = −(p · log2(p) + (1 − p) · log2(1 − p)).
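A minimal Python sketch of this calculation (the helper name entropy and the example counts are just for illustration):

    import math

    def entropy(counts):
        """Entropy (in bits) of a group, given its class counts, e.g. [6, 4]."""
        total = sum(counts)
        h = 0.0
        for c in counts:
            if c > 0:
                p = c / total          # proportion of this class in the group
                h -= p * math.log2(p)
        return h

    print(entropy([6, 4]))    # ~0.971: mixed group, high entropy
    print(entropy([10, 0]))   # 0.0: pure group, low entropy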
Information Gain
• Information Gain tells us how much entropy is reduced after
splitting the data
• The goal is to make the data more organized after each split

Information Gain (IG) = Entropy (before split) − Entropy (after split)

• This is a measure of “goodness” of our split.


• We will use this metric to build our decision tree.
Information Gain Example
• Initial Group:
• We have 10 animals: 6 dogs and 4 cats.
• The initial entropy (before split):

H = −(0.6 · log2(0.6) + 0.4 · log2(0.4)) ≈ 0.971


• Split by tail length:
• Short Tail Group: 4 dogs, 1 cat
• Long Tail Group: 2 dogs, 3 cats

Let's see if we gain any information by splitting on the tail length feature.
Information Gain Example
• Short tail group (4 dogs, 1 cat): H = −(0.8 · log2(0.8) + 0.2 · log2(0.2)) ≈ 0.722
• Long tail group (2 dogs, 3 cats): H = −(0.4 · log2(0.4) + 0.6 · log2(0.6)) ≈ 0.971
Weighted Average Entropy (after split):

H(after split) = (5/10) × 0.722 + (5/10) × 0.971 ≈ 0.846

Information gain: 0.971 – 0.846 = 0.125 (so a useful split!)
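Reusing the entropy helper from the earlier sketch, the example's numbers can be reproduced like this (the counts come straight from the example):

    h_before = entropy([6, 4])                     # 6 dogs, 4 cats -> ~0.971
    h_short  = entropy([4, 1])                     # short-tail group -> ~0.722
    h_long   = entropy([2, 3])                     # long-tail group -> ~0.971
    h_after  = (5/10) * h_short + (5/10) * h_long  # weighted average -> ~0.846
    info_gain = h_before - h_after                 # ~0.125
    print(round(info_gain, 3))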


Decision Tree Algorithm (pseudo code)
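The pseudocode figure from the original slide is not reproduced here. Below is a minimal Python sketch of the standard greedy algorithm it refers to (pick the split with the highest information gain, recurse until a group is pure or a depth limit is reached). It reuses the Node and entropy helpers sketched earlier, and the names best_split, build_tree and predict are illustrative, not from the slides.

    from collections import Counter

    def best_split(X, y):
        """Find the (feature, threshold) pair with the highest information gain."""
        best_feature, best_threshold, best_gain = None, None, 0.0
        h_parent = entropy(list(Counter(y).values()))
        for f in range(len(X[0])):                   # try every feature...
            for t in sorted({row[f] for row in X}):  # ...and every observed value as a threshold
                left  = [label for row, label in zip(X, y) if row[f] <= t]
                right = [label for row, label in zip(X, y) if row[f] > t]
                if not left or not right:
                    continue
                h_after = ((len(left) / len(y)) * entropy(list(Counter(left).values()))
                           + (len(right) / len(y)) * entropy(list(Counter(right).values())))
                gain = h_parent - h_after            # information gain of this split
                if gain > best_gain:
                    best_feature, best_threshold, best_gain = f, t, gain
        return best_feature, best_threshold

    def build_tree(X, y, depth=0, max_depth=3):
        """Grow the tree greedily until a group is pure or max_depth is reached."""
        if len(set(y)) == 1 or depth == max_depth:
            return Node(prediction=Counter(y).most_common(1)[0][0])  # leaf: majority class
        feature, threshold = best_split(X, y)
        if feature is None:                                          # no split reduces entropy
            return Node(prediction=Counter(y).most_common(1)[0][0])
        left  = [(row, label) for row, label in zip(X, y) if row[feature] <= threshold]
        right = [(row, label) for row, label in zip(X, y) if row[feature] > threshold]
        return Node(feature=feature, threshold=threshold,
                    left=build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
                    right=build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

    def predict(node, row):
        """Walk from the root to a leaf to classify a single input row."""
        while not node.is_leaf():
            node = node.left if row[node.feature] <= node.threshold else node.right
        return node.prediction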
Types of Data
• Categorical data
• Data that represents distinct categories or labels
• Material Type: "Steel", "Aluminum", "Plastic"
• Machine Status: "Running", "Stopped", "Maintenance"
• How to Handle in Decision Trees
• Split data based on categories:
• Example:
• "Is the material type Steel?" or "Is the machine in Maintenance?"
Types of Data
• Numerical Data
• Data that represents numerical values, which can be continuous or discrete
• Temperature: 50°C, 100°C, 150°C
• Pressure: 5 bar, 10 bar, 20 bar

• How to Handle in Decision Trees


• Splitting Strategy
• Use comparison operators (e.g., <= or >) to divide the data into two groups (e.g., based on the
mean, median, etc.)
• Example: "Is the pressure less than or equal to 10 bar?"
• Binning Strategy
• Group values into ranges if appropriate
• Example: "Low Pressure (0-10 bar)", "Medium Pressure (11-20 bar)", "High Pressure (21-30
bar)".
Overfitting

[Figure: two example decision trees, Tree 1 and Tree 2, with splits on Weight (< 1350 vs. > 1350) and on Color]
Overfitting
• Overfitting occurs when a model learns the details and noise in
the training data to the extent that it negatively impacts
performance on new, unseen data.
• The model becomes too complex, capturing even irrelevant patterns,
which reduces its ability to generalize to unseen data.
Depth of the tree
• Lower depth (e.g., 2 to 7):
• Easier to interpret
• Ideal for scenarios where model transparency is essential
• Larger depth (e.g., > 7):
• Can capture complex relationships
• More prone to overfitting and does not generalize well
• May learn noise in the data
Train-Test Split
• To prevent overfitting, we split the dataset into two parts:
• Training set: Used to train the model (e.g., 70% of the data)
• Test set: Used to evaluate the model's performance on unseen data
(e.g., 30% of the data)
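A minimal sketch of this workflow with scikit-learn, using a synthetic dataset purely for illustration (the 70/30 split and the max_depth value are illustrative choices, not prescribed by the slides):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Synthetic stand-in for a real dataset
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # 70% of the data for training, 30% held out for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Limiting the depth helps keep the tree from fitting noise (see the next slide)
    model = DecisionTreeClassifier(max_depth=4, random_state=42)
    model.fit(X_train, y_train)

    print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
    print("test accuracy: ", accuracy_score(y_test, model.predict(X_test)))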
Overfitting in Decision Trees
• The tree is grown too deep, creating many branches that
capture noise and irrelevant patterns in the training data
• Solution: Limit depth of the tree and rely on testing accuracy

Source: https://machinelearningmastery.com/overfitting-machine-learning-models/
Pros and cons
Pros
• Easy to Understand and Interpret
• Handles Both Numerical and Categorical Data
• Works well for small datasets
Cons
• Prone to overfitting
• Computationally expensive for large trees
