MACHINE LEARNING (BCS602)

Module 3
Chapter 4 : Similarity Based Learning
4.2 Nearest-Neighbor Learning:
K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that classifies
unlabeled data by finding the most similar labeled examples
The value of k is critical in KNN, as it determines the number of neighbors to consider when
making predictions (this choice governs the bias-variance trade-off).
Example
Imagine you’re deciding which fruit it is based on its shape and size. You compare it to fruits
you already know.
• If k = 3, the algorithm looks at the 3 closest fruits to the new one.
• If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an
apple, because most of its neighbors are apples. A small code sketch of this idea follows.
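The sketch below is illustrative only; the toy fruit data, the feature values, and the knn_predict helper are assumptions made for this example, not part of the original notes:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - query, axis=1)    # Euclidean distance to every point
    nearest = np.argsort(distances)[:k]                    # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority label

# Toy "fruit" data: [size in cm, roundness score]
X_train = np.array([[7.0, 0.90], [6.8, 0.95], [7.2, 0.85], [15.0, 0.30]])
y_train = np.array(["apple", "apple", "apple", "banana"])

print(knn_predict(X_train, y_train, np.array([7.1, 0.9]), k=3))   # -> apple
```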

4.1 KNN Algorithm

Problem based on KNN
Example: You are trying to classify a new point based on its features, using k = 3.

4.3 Weighted K-Nearest-Neighbor Algorithm


• Normal k-NN: In regular k-NN, each of the k nearest neighbors gets equal weight in
making the decision.
• Weighted k-NN: In weighted k-NN, closer neighbors have a higher weight, meaning their
influence on the final prediction is stronger. Typically, weights are inversely proportional
to the distance from the new point, as in the sketch below.
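This sketch uses inverse-distance weights (one common choice; kernel-based weights are another); the toy data and the small eps guard are assumptions for illustration:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, query, k=3, eps=1e-8):
    """Weighted k-NN: each of the k nearest neighbours votes with weight 1/distance."""
    distances = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = {}
    for idx in nearest:
        weight = 1.0 / (distances[idx] + eps)               # closer neighbour -> larger weight
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + weight
    return max(votes, key=votes.get)

# One very close point of class A outweighs two distant points of class B
X_train = np.array([[1.0, 1.0], [4.0, 4.0], [4.2, 4.1]])
y_train = np.array(["A", "B", "B"])
print(weighted_knn_predict(X_train, y_train, np.array([1.2, 1.1]), k=3))   # -> A
```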

Problems on Weighted KNN


4.4 Nearest Centroid Classifier


• The Nearest Centroid Classifier is a simple machine learning classification algorithm that
assigns a class to a data point by finding the centroid (mean) of each class and then
classifying the point based on which centroid it is closest to.
• This classifier is often used for classification tasks where the classes have well-separated
centroids.

Problems on Nearest Centroid Classifier:


1. We want to classify a new data point with x=5 and y=5

Centroid of Class A:
We calculate the centroid for Class A using the average of all x and y values for the points in
Class A.
The data points for Class A are:
• (1, 2)
• (2, 3)
• (3, 3)
The centroid of Class A (μA) is the average of the x-coordinates and y-coordinates:
μA = (6, 8)/3 ≈ (2, 2.67)
Centroid of Class B:
We calculate the centroid for Class B using the average of all x and y values for the points in
Class B.
The data points for Class B are:
• (6, 6)
• (7, 7)
• (8, 6)
The centroid of Class B (μB) is:
μB = (21, 19)/3 ≈ (7, 6.33)
Step 2: Calculate the Distance to Each Centroid
Now, we calculate the Euclidean distance from the new point (5, 5) to each of the centroids.
Distance to Class A Centroid (2,2.67):
The Euclidean distance is given by:
dA = sqrt((5 - 2)² + (5 - 2.67)²) = sqrt(14.43) ≈ 3.8
Distance to Class B Centroid (7,6.33):
The Euclidean distance is:
dB = sqrt((5 - 7)² + (5 - 6.33)²) = sqrt(5.77) ≈ 2.4
Step 3: Classify the New Point
• The distance to the Class A centroid is 3.8.
• The distance to the Class B centroid is 2.4.
Since the new point is closer to the Class B centroid, we classify the new point as Class B. The
new point (5, 5) is classified as Class B using the Nearest Centroid Classifier.
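The worked example above can be checked with a few lines of Python (the values are taken directly from the problem; the variable names are illustrative):

```python
import numpy as np

class_A = np.array([[1, 2], [2, 3], [3, 3]])
class_B = np.array([[6, 6], [7, 7], [8, 6]])
new_point = np.array([5, 5])

mu_A = class_A.mean(axis=0)                 # ~ (2.00, 2.67)
mu_B = class_B.mean(axis=0)                 # = (7.00, 6.33)

d_A = np.linalg.norm(new_point - mu_A)      # ~ 3.8
d_B = np.linalg.norm(new_point - mu_B)      # ~ 2.4
print("Class A" if d_A < d_B else "Class B")   # -> Class B
```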

2. Consider the sample data shown in the table, with two features x and y. The target classes
are A or B. Predict the class of the given point (6, 5) using the Nearest Centroid Classifier.

4.5 Locally Weighted Regression (LWR)


• Locally Weighted Linear Regression (LWLR) is a non-parametric, memory-based
algorithm designed to capture non-linear relationships in data. Unlike traditional regression
models that fit a single global line across the dataset, LWLR creates localized models for
subsets of data points near the query point. Each query point has its own regression line
based on weighted contributions from nearby data points.
• LWLR assigns weights to data points based on their proximity to the query point:
• Points closer to the query point have higher weights.
• Points farther away have lower weights.
Steps Involved in Locally Weighted Linear Regression
1. Data Collection and Preparation
• Gather a dataset with relevant features and a target variable.
• Preprocess the data by handling missing values and normalizing features to ensure a
consistent scale, which improves the weighting process.
2. Choose the Kernel and Bandwidth (Tau)
• Kernel Function: Determines how weights are assigned to data points based on their
distance from the query point.
 Gaussian Kernel: Assigns weights using the formula:
wi = exp( -(xi - x)² / (2τ²) ), where x is the query point

• Here, wi is the weight for data point xi, and τ (the bandwidth) controls the rate of weight
decay.
• Bandwidth (τ): A critical parameter that governs how localized the regression is:
• Small τ: focuses on nearby points, capturing finer detail but risking overfitting.
• Large τ: includes more distant points, reducing variance but increasing bias.
3. Weight Calculation
For a given query point x, compute weights for all data points using the chosen kernel
function. Points closer to x will have higher weights.
4. Model Fitting: Using the computed weights, fit a weighted least squares regression to the
data. The goal is to minimize the weighted sum of squared errors, Σ wi (yi - β0 - β1xi)².
5. Prediction: Once the localized model is fitted, use it to predict the target value for the query
point. A minimal sketch of these steps is given below.
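This sketch applies a Gaussian kernel and a weighted least squares fit at a single query point; the synthetic data, the bandwidth value, and the lwr_predict helper are assumptions for illustration:

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Locally weighted linear regression prediction at a single query point."""
    Xb = np.c_[np.ones(len(X)), X]                      # design matrix with intercept column
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))    # Gaussian kernel weights
    W = np.diag(w)
    # Weighted normal equations: beta = (X^T W X)^-1 X^T W y
    beta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return np.array([1.0, x_query]) @ beta

# Noisy non-linear data (illustrative)
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 60)
y = np.sin(X) + 0.1 * rng.standard_normal(60)
print(lwr_predict(X, y, x_query=3.0))                   # close to sin(3.0) ≈ 0.14
```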
LWR Algorithm:

Problems on LWR:


Chapter 5:
Regression Analysis
5.1 Introduction to Regression
Regression analysis is a statistical technique used to model and analyse the relationships
between variables. It is one of the oldest supervised learning techniques.
It helps in understanding how the dependent variable (outcome) changes when one or more
independent variables (predictors) are modified. Given a training dataset D containing N
training points (xi, yi), where i = 1…N, regression models the relationship between one or more
independent variables xi and a dependent variable yi, represented by a function y = f(x).
This technique is widely used in various fields such as economics, finance, machine learning,
and social sciences to make predictions and infer relationships between variables.
Types of Regression
 Simple Linear Regression: Involves one independent variable and models a linear
relationship (e.g., predicting sales based on advertising spend).
 Multiple Linear Regression: Involves multiple independent variables (e.g., predicting
house prices based on location, size, and number of rooms).
 Logistic Regression: Used when the dependent variable is categorical (e.g., predicting
whether a customer will buy a product: Yes/No).
 Polynomial Regression: Models nonlinear relationships by adding polynomial terms.
 Ridge and Lasso Regression: Used for regularization to prevent overfitting.
5.2 Introduction to Linearity, Correlation, and Causation
Understanding the relationships between variables is crucial in data analysis and statistics.
Three fundamental concepts that help explain these relationships are linearity, correlation, and
causation. Each concept plays a unique role in interpreting data and making accurate
predictions.
1. Linearity
Linearity refers to a straight-line relationship between two variables. In a linear relationship,
changes in one variable result in proportional changes in another. This relationship can be
expressed using a linear equation:
Y=β0+β1X+ε
Where:
 Y is the dependent variable, X is the independent variable, β0 is the intercept, β1 is the slope
(rate of change), and ε is the error term.
Examples of Linearity:
 A company’s revenue increasing proportionally with its marketing budget.

 Temperature rising linearly with time during a steady heat increase.
Nonlinear Relationships:
 Some relationships do not follow a straight-line pattern. For example, the relationship
between study hours and test scores might flatten after a certain point.
2. Correlation
Correlation between two variables can be examined effectively using a scatter plot, which plots
the explanatory variable against the response variable. It is a 2D graph showing the relationship
between the two variables: the x-axis of the scatter plot carries the independent (input or
predictor) variable, and the y-axis carries the dependent (output or predicted) variable.
Typical scatter plots are shown in the figure below.

The correlation coefficient (r) measures the strength and direction of the relationship between
two variables and ranges from -1 to 1:
 r=1 → Perfect positive correlation (as one variable increases, the other increases).
 r=−1 → Perfect negative correlation (as one variable increases, the other decreases).
 r=0 → No correlation (no relationship between variables).
Examples of Correlation:
 Height and weight often show a positive correlation.
 Increase in gas prices and demand for electric cars may have a negative correlation.
However, correlation does not imply causation—just because two variables move together
does not mean one causes the other.
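As a quick illustration, the correlation coefficient r can be computed with NumPy; the small height/weight dataset below is invented for demonstration:

```python
import numpy as np

height = np.array([150, 160, 165, 170, 180])   # cm
weight = np.array([50, 58, 63, 66, 75])        # kg

r = np.corrcoef(height, weight)[0, 1]          # Pearson correlation coefficient
print(round(r, 3))                             # close to +1: strong positive correlation
```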
3. Causation
Causation means that a change in one variable directly causes a change in another. Unlike
correlation, which only shows an association, causation requires experimental or observational
evidence.

Examples of Causation:
 Smoking causes lung cancer.
 Increasing fertilizer use leads to higher crop yield.
Mistaking Correlation for Causation:
 Ice cream sales and drowning incidents are correlated—but ice cream does not cause
drowning. Instead, hot weather (a third factor) increases both.
 Higher education levels correlate with higher income—but education alone does not
"cause" wealth; factors like skills, experience, and networking also play roles.
Linear and Non-Linear Relationships
Linearity means the relationship between the dependent and independent variables can be
visualized as a straight line. A line of the form y = ax + b can be fitted to the data points to
indicate the relationship between x and y. By linearity, it is meant that as one variable
increases, the other variable also increases in a linear manner.
A linear relationship is shown in figure (a). A non-linear relationship exists in functions such as
the exponential function and the power function, shown in figures (b) and (c); in each plot the
x-axis carries the x data and the y-axis the y data.

Types of Regression Methods:

1. Linear Regression
 Purpose: Predict a continuous outcome based on one predictor.
 Equation:
y=β0+β1x+ε
Use case: Straight-line relationship (e.g., predicting salary based on years of experience).
2. Multiple Linear Regression
 Purpose: Predict a continuous outcome using multiple predictors.
 Equation:
y=β0+β1x1+β2x2+…+βnxn+ε
Use case: Predicting house price based on area, number of rooms, location, etc.
3. Polynomial Regression
 Purpose: Handle nonlinear relationships by adding polynomial terms.
 Equation (2nd degree example):
y=β0+β1x+β2x²+ε
Use case: Modelling curves like the growth rate of a business over time.
4. Logistic Regression
 Purpose: Classification (not traditional regression).
 Output: Probability (between 0 and 1) that is mapped to classes (e.g., 0 or 1).
 Equation (Sigmoid function):
p = 1 / (1 + e^-(β0 + β1x))
 Use case: Predicting whether an email is spam or not.


5. Lasso Regression (L1 Regularization)
 Purpose: Prevent overfitting & perform feature selection by shrinking some coefficients
to 0.

 Penalty term added: λ Σ |βj|
 Use case: When you have many features, but only a few are actually useful.
6. Ridge Regression (L2 Regularization)
 Purpose: Handle multicollinearity and reduce model complexity by shrinking
coefficients.

 Penalty term added: λ Σ βj²
 Use case: When all features contribute a bit, but you want to control overfitting. A small
sketch contrasting the two penalties follows.
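The sketch below assumes scikit-learn is available; the synthetic data and the alpha values are illustrative choices, not taken from the notes:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
# Only the first two features matter; the remaining three are pure noise
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("Ridge:", np.round(ridge.coef_, 2))   # all coefficients shrunk, none exactly zero
print("Lasso:", np.round(lasso.coef_, 2))   # irrelevant coefficients driven to exactly zero
```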
Limitations of Regression Model:
1. Outliers
Outliers are extreme values that can distort regression results by heavily influencing the
regression line. They can lead to misleading coefficients and poor predictions. Handling them
involves detection, removal, or using robust regression techniques.
2. Number of Cases (Sample Size)
A small sample size can cause overfitting and unreliable results. Regression models need
enough data to be accurate and generalizable. Ideally, there should be at least 10–15
observations per predictor variable.
3. Missing Data
Missing values reduce the effectiveness of a regression model and can bias results. Simply
dropping them wastes data, while imputation methods like mean filling or multiple imputation
help retain valuable information.
4. Multicollinearity
When predictors are highly correlated, it becomes hard to separate their individual effects. This
causes unstable coefficients and inflated standard errors. It can be addressed by removing
correlated variables or using techniques like Ridge regression.

5.3 Introduction to Linear Regression
Linear regression is a method used to find the best-fitting straight line through a set of data
points. The goal is to model the relationship between a dependent variable y and an independent
variable x using a straight line:
y=a0+a1x + e
 a0 is the intercept
 a1 is the slope
 e is the error in prediction
The assumptions of linear regression are listed as follows:
1. The observations (y) are random and are mutually independent.
2. The difference between the predicted and true values is called the error. The errors are
mutually independent and share the same distribution, such as a normal distribution with
zero mean and constant variance.
3. The distribution of the error term is independent of the joint distribution of explanatory
variables.
4. The unknown parameters of the regression models are constants.
The idea of linear regression is based on the Ordinary Least Squares (OLS) approach. The data
points are modelled using a straight line, but any arbitrarily drawn line is not the optimal line.
In Fig. 5.4, three data points and their errors (e1, e2, e3) are shown.
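A short sketch of estimating a0 and a1 with the OLS closed-form formulas; the toy data are an assumption for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])     # roughly y = 2x

# OLS estimates for y = a0 + a1*x
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()
print(a0, a1)    # intercept ~ 0.15, slope ~ 1.95
```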

5.4 Validation of Regression Methods:
1. Mean Absolute Error (MAE)
• The average of the absolute differences between actual values yi and predicted values ŷi:
MAE = (1/n) Σ |yi - ŷi|
Interpretation: Lower MAE means better model performance.


2. Mean Squared Error (MSE)
• The average of the squared differences between actual and predicted values:
MSE = (1/n) Σ (yi - ŷi)²
Interpretation: Larger errors get penalized more due to squaring. MSE is always positive.
3. Standard Error (explained in words)
• Refers to the standard deviation of residuals (errors between actual and predicted
values).
• Ideally, it should be small, meaning predictions are close to actual values.
• If residuals follow a normal distribution with mean zero, it's a good sign.
4. Root Mean Squared Error (RMSE)
Measures the average magnitude of prediction errors, in other words how far off the model's
predictions are from the actual values. It is especially useful because it penalizes larger errors
more than smaller ones (due to squaring).
RMSE = sqrt( (1/n) Σ (yi - ŷi)² )
5. Relative Mean Squared Error (RelMSE)
RelMSE = Σ (yi - ŷi)² / Σ (yi - ȳ)²
RelMSE shows how well your model performs compared to a naive model that always
predicts the average value. It’s a normalized metric, useful when you want to compare error
relative to the scale of your data.

6. Coefficient of Variation (CV)
CV = RMSE / ȳ
CV helps you understand the error size relative to the average value of the data. It makes
RMSE dimensionless by dividing it by the mean of the target variable. So, you can use it to
compare errors across different datasets or scales.
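A small sketch computing all of the above metrics for a set of actual and predicted values; the numbers are illustrative:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.0, 9.5])

mae    = np.mean(np.abs(y_true - y_pred))          # Mean Absolute Error
mse    = np.mean((y_true - y_pred) ** 2)           # Mean Squared Error
rmse   = np.sqrt(mse)                              # Root Mean Squared Error
relmse = np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # Relative MSE
cv     = rmse / y_true.mean()                      # Coefficient of Variation

print(mae, mse, rmse, relmse, cv)
```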


5.6 Polynomial Regression


If the relationship between the independent and dependent variables is not linear, then linear
regression cannot be used directly, as it will result in large errors. The problem of non-linear
regression can be solved by two methods (a sketch of the second follows):
1. Transformation of the non-linear data, so that linear regression can handle it.
2. Using polynomial regression.
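The sketch below uses numpy.polyfit for a degree-2 fit; the data are generated from a known quadratic (an assumption for illustration) so the recovered coefficients can be checked:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = 1.0 + 2.0 * x + 0.5 * x ** 2          # underlying quadratic relationship

coeffs = np.polyfit(x, y, deg=2)          # fits y = b2*x^2 + b1*x + b0
print(np.round(coeffs, 3))                # ~ [0.5, 2.0, 1.0]
print(np.polyval(coeffs, 2.5))            # prediction at a new point, ~ 9.125
```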


Chapter 6
Decision Tree Learning
Decision Tree Learning is a widely used predictive model for supervised learning that spans
a number of practical applications in various areas. It is used for both classification and
regression tasks. The decision tree model basically represents logical rules that predict the value
of a target variable by inferring from data features. This chapter provides a keen insight into
how to construct a decision tree and infer knowledge from the tree.
Learning Objectives
 Understand the structure of the decision tree
 Know about the fundamentals of Entropy
 Learn and understand popular univariate Decision Tree Induction algorithms such as
ID3, C4.5, and multivariate decision tree algorithm such as CART
 Deal with continuous attributes using improved C4.5 algorithm
 Construct Classification and Regression Tree (CART) for classifying both categorical
and continuous-valued target variables
 Construct regression trees where the target feature is a continuous-valued variable
 Understand the basics of validating and pruning of decision trees

6.1 INTRODUCTION TO DECISION TREE LEARNING MODEL

 Decision tree learning model, one of the most popular supervised predictive learning
models, classifies data instances with high accuracy and consistency. The model
performs an inductive inference that reaches a general conclusion from observed
examples. This model is widely used for solving complex classification applications.
 Decision tree is a concept tree which summarizes the information contained in the
training dataset in the form of a tree structure. Once the concept model is built, test data
can be easily classified.
 This model can be used to classify both categorical target variables and continuous-
valued target variables. Given a training dataset X, this model computes a hypothesis
function f(X) as a decision tree.
 Inputs to the model are data instances or objects with a set of features or attributes which
can be discrete or continuous, and the output of the model is a decision tree which
predicts or classifies the target class for the test data object.
 In statistical terms, attributes or features are called independent variables. The target
feature or target class is called the response variable, which indicates the category we
need to predict for a test object.
 The decision tree learning model generates a complete hypothesis space in the form of
a tree structure with the given training dataset and allows us to search through the
possible set of hypotheses which in fact would be a smaller decision tree as we walk
through the tree. This kind of search bias is called preference bias.

6.1.1 Structure of a Decision Tree

A decision tree has a structure that consists of a root node, internal nodes/decision nodes,
branches, and terminal nodes/leaf nodes. The topmost node in the tree is the root node. Internal
nodes are the test nodes and are also called decision nodes. These nodes represent a choice
or test of an input attribute and the outcome or outputs of the test condition are the branches
emanating from this decision node. The branches are labelled as per the outcomes or output
values of the test condition. Each branch represents a sub-tree or subsection of the entire tree.
Every decision node is part of a path to a leaf node. The leaf nodes represent the labels or the
outcome of a decision path. The labels of the leaf nodes are the different target classes a data
instance can belong to.

Every path from root to a leaf node represents a logical rule that corresponds to a conjunction
of test attributes and the whole tree represents a disjunction of these conjunctions. The decision
tree model, in general, represents a collection of logical rules of classification in the form of a
tree structure.

Decision networks, otherwise called influence diagrams, have a directed graph structure with
nodes and links. It is an extension of Bayesian belief networks that represents information about
each node’s current state, its possible actions, the possible outcome of those actions, and their
utility. The concept of Bayesian Belief Network (BBN) is discussed in Chapter 9.

Figure 6.1 shows symbols that are used in this book to represent different nodes in the
construction of a decision tree. A circle is used to represent a root node, a diamond symbol is
used to represent a decision node or the internal nodes, and all leaf nodes are represented with
a rectangle.

Building the Tree

Goal: Construct a decision tree with the given training dataset. The tree is constructed in a top-
down fashion. It starts from the root node. At every level of tree construction, we need to find
the best split attribute or best decision node among all attributes. This process is recursive and
continues until we reach the last level of the tree or find a leaf node which cannot be
split further. The tree construction is complete when all the test conditions lead to a leaf node.

The leaf node contains the target class or output of classification.


Output: Decision tree representing the complete hypothesis space.

Knowledge Inference or Classification

Goal: Given a test instance, infer to the target class it belongs to.

Classification: Inferring the target class for the test instance or object is based on inductive
inference on the constructed decision tree. In order to classify an object, we need to start
traversing the tree from the root. We traverse as we evaluate the test condition on every decision
node with the test object attribute value and walk to the branch corresponding to the test’s
outcome. This process is repeated until we end up in a leaf node which contains the target class
of the test object.
Output: Target label of the test instance.
Advantages of Decision Trees
1. Easy to model and interpret
2. Simple to understand
3. The input and output attributes can be discrete or continuous predictor variables
4. Can model a high degree of nonlinearity in the relationship between the target variables
and the predictor variables
5. Quick to train
Disadvantages of Decision Trees
Some of the issues that generally arise with a decision tree learning are that:
1. It is difficult to determine how deeply a decision tree can be grown or when to stop growing
it.
2. If training data has errors or missing attribute values, then the decision tree constructed may
become unstable or biased.
3. If the training data has continuous-valued attributes, handling it is computationally complex
and has to be discretized.
4. A complex decision tree may also be over-fitting with the training data.
5. Decision tree learning is not well suited for classifying multiple output classes.
6. Learning an optimal decision tree is also known to be NP-complete.
How to draw a decision tree to predict a student's academic performance based on the given
information such as class attendance, class assignments, home-work assignments, tests,
participation in competitions or other events, group activities such as projects and
presentations, etc.
Solution:
The target feature is the student performance in the final examination—whether he will pass or
fail in the examination. The decision nodes are test nodes which check for conditions like:
 “What’s the student’s class attendance?”
 “How did he perform in his class assignments?”

 “Did he do his home assignments properly?”
 “What about his assessment results?”
 “Did he participate in competitions or other events?”
 “What is the performance rating in group activities such as projects and presentations?”

The leaf nodes represent the outcomes, that is, either ‘pass’, or ‘fail’.
A decision tree would be constructed by following a set of if-else conditions which may or
may not include all the attributes, and the decision nodes can have two or more outcomes.
Hence, the tree is not a binary tree.
Note: A decision tree is not always a binary tree; it is a tree which can have more than two
branches.
Example 6.2:
Predict a student’s academic performance of whether he will pass or fail based on the given
information such as ‘Assessment’ and ‘Assignment’. The following Table 6.2 shows the
independent variables, Assessment and Assignment, and the target variable Exam Result with
their values. Draw a binary decision tree.
Table 6.2: Attributes and Associated Values


6.1.2 Fundamentals of Entropy


When constructing a decision tree, we need to select the best feature to split the dataset at each
level. The goal is to best describe the target class for the given test instances.
 The best split feature is the one that contains more information about how to divide
the data effectively so the target class is accurately identified.
 This means choosing features that make the resulting subsets purer in terms of their
classification.
 This continues recursively until a stopping condition is met.
Entropy is used as a measure of this information:
 It is a measure of uncertainty or randomness in a system.
 It also reflects the homogeneity of the data.
 Lower entropy indicates less randomness and more homogeneity (better
classification).
Example:
 A coin flip (2 outcomes) has lower entropy than rolling a die (6 outcomes), because
it's simpler and has fewer possible outcomes.
Higher Entropy → Higher Uncertainty
Lower Entropy → Lower Uncertainty
Entropy is a measure of the purity or impurity in a dataset:
 If all instances belong to the same class, entropy = 0 (pure).
 If classes are evenly distributed (e.g., 50%-50%), entropy = 1 (impure, max
uncertainty).

Example:
Given 10 data instances:
 6 belong to the positive class
 4 belong to the negative class
Entropy is calculated using:
Entropy = -Σ pi log2(pi) = -(6/10) log2(6/10) - (4/10) log2(4/10) ≈ 0.971
Interpretation:
 If the dataset is homogeneous, entropy = 0
 If the dataset is evenly split, entropy = 1
 The value of entropy lies between 0 and 1
 A lower entropy means a better (purer) split
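A quick check of the 6-positive / 4-negative example above (a minimal sketch; the entropy helper is written here only for illustration):

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([6, 4]), 3))   # 0.971 -> fairly impure, close to the maximum of 1
print(entropy([10, 0]))            # 0.0   -> homogeneous (pure)
print(entropy([5, 5]))             # 1.0   -> evenly split (maximum uncertainty)
```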


6.2 Decision Tree Induction Algorithms


Several decision tree algorithms are commonly used:
 ID3 (Iterative Dichotomiser 3)
 C4.5
 CART (Classification and Regression Trees)
 Others: CHAID, QUEST, GUIDE, CRUISE, CTREE
 ID3 (1986) by J.R. Quinlan uses Information Gain to decide splits.
 C4.5 (1993) is an improvement over ID3, using Gain Ratio as the criterion.
 CART (1984) uses GINI Index and supports both classification and regression.
 C4.5 is better suited for handling continuous values and missing data.

Univariate vs. Multivariate Decision Trees:

 Univariate trees (like ID3, C4.5): Split based on one attribute at a time.
 Multivariate trees (like CART): Split based on a combination of attributes.

ID3 and Attribute Types

 ID3 works well when features are categorical (discrete).


 For continuous attributes, they need to be discretized (partitioned into ranges).
 It uses a purity measure called Information Gain to build trees.
 Ideal for nominal attributes with no missing values.
 Best for large datasets, but:
o Can lead to overfitting on small datasets.
o Performs poorly with missing values and outliers.

Note: No pruning is done in ID3, making it prone to overfitting.


C4.5 and CART handle both categorical and continuous attributes, and can handle missing
values.

6.2.1 ID3 Tree Construction

 Supervised learning algorithm.


 Uses a greedy approach to choose the best attribute (one at a time).
 Only works with categorical features.
 Builds axis-aligned splits (one feature per decision).

Procedure to Construct a Decision Tree using ID3

1. Compute Entropy_Info for the entire dataset (based on the target attribute).
2. Compute Entropy_Info and Information Gain for each attribute.
3. Choose the attribute with the lowest entropy / highest gain as the best split.
4. Place it as the root node.
5. Branch the dataset into subsets based on the values of the root attribute.
6. Repeat recursively for each subset until:
o A leaf node is formed.
o No instances remain.
o Entropy is 0.

Note: Stop branching when entropy = 0. At each step, choose the attribute with the highest
Information Gain; a small sketch of this computation follows the definitions below.

Definitions

 Let T be the training dataset.


 Let A = {A₁, A₂, ..., Aₙ} be the set of attributes.
 Let m be the number of classes.
 Let Pᵢ be the probability that a data instance belongs to class Cᵢ:
Pᵢ = dcᵢ / |T|
where dcᵢ is the number of instances of class Cᵢ and |T| is the total number of instances in the
dataset.
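The sketch below computes the dataset entropy, the weighted entropy after a split, and the resulting Information Gain used in steps 1-3; the tiny Assessment/Assignment dataset is invented for illustration and is not the worked example from the notes:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Information gain obtained by splitting `rows` on the attribute at `attr_index`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Illustrative dataset: (Assessment, Assignment) -> Exam Result
rows   = [("Good", "Yes"), ("Good", "No"), ("Average", "Yes"), ("Poor", "No")]
labels = ["Pass", "Pass", "Pass", "Fail"]

print(information_gain(rows, labels, 0))   # split on Assessment -> higher gain, chosen as root
print(information_gain(rows, labels, 1))   # split on Assignment -> lower gain
```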
