
2. THEORETICAL FRAMEWORK

2.1. Decision Tree

The Decision Tree is a supervised machine learning algorithm that utilizes a tree-
like structure to make decisions based on input data. At each node, the model identifies
an attribute to split the data into smaller subsets, aiming to make these subsets more
homogeneous according to an evaluation criterion. The leaf nodes at the end of the tree
contain the output value, representing either a classification label or a predicted value in
regression tasks.

Working Mechanism

The process of constructing a decision tree involves a series of iterative steps, where, at each step, the best attribute is chosen to split the data. The best attribute is determined based on a criterion that minimizes uncertainty or enhances the purity of the data, such as Entropy, Gini Index, or Mean Squared Error.

 Entropy and Information Gain:

Entropy measures the impurity (disorder) of a dataset S and is calculated as:

Entropy(S) = − Σ_i p_i log2(p_i)

where p_i is the proportion of samples in S belonging to class i.

Information Gain is the difference in Entropy before and after splitting the data on an attribute A, representing the degree of improvement achieved by the split:

IG(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of S for which attribute A takes the value v.

 Gini Index: an alternative purity measure to Entropy, calculated according to the formula:

Gini(S) = 1 − Σ_i p_i^2

 Mean Squared Error (MSE): for regression problems, the decision tree optimizes splits by minimizing the MSE:

MSE = (1/n) Σ_i (y_i − ŷ_i)^2
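To make these splitting criteria concrete, the following is a minimal Python sketch (assuming NumPy is available) that computes Entropy, the Gini Index, and Information Gain for a small toy label array; the function names and the toy data are illustrative, not part of any particular library.

import numpy as np

def entropy(labels):
    # Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in S
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini(S) = 1 - sum(p_i^2)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(labels, feature_values):
    # IG = Entropy(parent) - weighted sum of Entropy(children after the split)
    total = len(labels)
    weighted_child_entropy = 0.0
    for v in np.unique(feature_values):
        child = labels[feature_values == v]
        weighted_child_entropy += (len(child) / total) * entropy(child)
    return entropy(labels) - weighted_child_entropy

# Toy example: binary labels split by a categorical feature
y = np.array([1, 1, 0, 0, 1, 0])
x = np.array(["a", "a", "b", "b", "a", "b"])
print(entropy(y), gini(y), information_gain(y, x))

In this toy case the split on x separates the classes perfectly, so the Information Gain equals the parent Entropy (1.0 for a balanced binary set).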

Popular Decision Tree Construction Algorithms

ID3 (Iterative Dichotomiser 3)


ID3 is one of the earliest algorithms for building decision trees, relying on Entropy and Information Gain. The algorithm selects the attribute with the highest Information Gain to split the data at each node.

C4.5
C4.5 is an extension of ID3 that improves upon it by using Gain Ratio – a variant
of Information Gain designed to avoid bias towards attributes with many unique values.
Additionally, C4.5 can handle continuous attributes by determining threshold values for
splitting.

CART (Classification and Regression Trees)


CART is an algorithm that uses the Gini Index to evaluate attributes in
classification problems and employs Mean Squared Error (MSE) for regression tasks. It
is widely implemented due to its flexibility and efficiency.
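As a brief illustration, the sketch below fits CART-style trees with scikit-learn (assumed available here; its tree module implements an optimized CART variant). The datasets (iris and diabetes) are chosen only for demonstration.

from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree using the Gini Index as the splitting criterion
X_c, y_c = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X_c, y_c)
print("classification accuracy (train):", clf.score(X_c, y_c))

# Regression tree minimizing squared error (MSE) at each split
X_r, y_r = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3, random_state=0).fit(X_r, y_r)
print("regression R^2 (train):", reg.score(X_r, y_r))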

2.2. Random Forest

Random Forest is a machine learning algorithm belonging to the ensemble learning methods, where multiple decision trees are built and combined to create a more robust model. Each tree in the Random Forest is constructed using a different dataset generated through bootstrap sampling – a method of random sampling with replacement from the original dataset.

Working Mechanism

Constructing independent decision trees: For each tree, a subset of the training data is created using bootstrap sampling. At each node in the tree, only a random subset of features is considered to select the best attribute for splitting.

Aggregating results from the trees: For classification problems, Random Forest uses majority voting, where the class predicted by the majority of the trees is chosen as the output. For regression problems, the output is the average of the predictions from all the trees.

Key Characteristics of Random Forest

Random selection of datasets and attributes at each node increases the diversity among trees, reducing the risk of overfitting.

Combining weak learners into a strong model: While individual decision trees
might not perform well, their combination in a forest leads to a robust and stable model.

Random Forest Construction Algorithm

Bootstrap Sampling: Generate multiple training datasets by randomly sampling with replacement from the original dataset.

Random Subset of Features: At each node in the tree, only a small subset of attributes is considered for splitting, minimizing the correlation among trees.

Voting or Averaging: Aggregate the outputs from the trees by voting (for classification) or averaging (for regression).
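The following is a minimal hand-rolled sketch of these three steps, assuming NumPy and scikit-learn are available; it uses scikit-learn's DecisionTreeClassifier as the base learner (with max_features="sqrt" to randomize the features considered at each node) and the iris dataset purely for illustration.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n_trees, n_samples = 25, X.shape[0]
trees = []

# Steps 1 and 2: bootstrap sampling plus a random feature subset at each node
for i in range(n_trees):
    idx = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: majority voting over the individual tree predictions
all_preds = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
votes = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)
print("training accuracy of the hand-rolled forest:", np.mean(votes == y))

For a regression task, the final line would average the tree predictions instead of taking a majority vote; in practice, scikit-learn's RandomForestClassifier/RandomForestRegressor implement this procedure directly.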

2.3. Ridge Regression

Ridge Regression is a type of linear regression that includes a regularization term to address the problem of multicollinearity and overfitting in high-dimensional datasets. It modifies the ordinary least squares (OLS) regression by adding a penalty term, which constrains the magnitude of the model's coefficients. This ensures a more generalized model that performs well on unseen data.

Working Mechanism

The Ridge Regression model minimizes a cost function that balances the trade-off between fitting the data and keeping the model coefficients small. The cost function is expressed as:

J(β) = Σ_i (y_i − ŷ_i)^2 + λ Σ_j β_j^2

where:

y_i: actual value of the dependent variable for observation i.

ŷ_i: predicted value of the dependent variable for observation i.

β_j: coefficient of the j-th feature.

λ: regularization parameter (penalty term).

The first term in the cost function is the residual sum of squares (RSS), which measures the model's error. The second term is the penalty term, proportional to the squared magnitudes of the coefficients, which discourages large coefficients.
Effect of Regularization Parameter (λ)

When λ = 0, Ridge Regression reduces to Ordinary Least Squares, and no regularization is applied. As λ increases, the penalty term becomes more significant, forcing the coefficients to shrink closer to zero. Unlike Lasso Regression, Ridge Regression does not reduce coefficients to exactly zero, meaning all predictors remain in the model.
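This shrinkage effect can be observed with a short sketch, assuming scikit-learn is available (note that scikit-learn calls the regularization parameter alpha rather than λ); the diabetes dataset is used only as an example.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

# The coefficient norm shrinks as the regularization strength grows;
# as alpha approaches 0 the fit approaches ordinary least squares.
for alpha in [0.01, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"lambda={alpha:>6}: ||beta|| = {np.linalg.norm(model.coef_):.2f}")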

Key Characteristics of Ridge Regression

Multicollinearity Handling: Ridge Regression is particularly effective in scenarios where predictors are highly correlated. By adding a penalty term, it reduces the variance of the model, resulting in more stable predictions.

Feature Shrinkage: the regularization term shrinks the coefficients, which helps prevent overfitting, especially when the dataset contains noise or irrelevant features.

Solution Stability: the penalty term mitigates the matrix inversion problem in the normal equation, making the solution numerically stable even with collinear predictors.

Ridge Regression Construction Algorithm

Data Standardization: Standardize the features so that all predictors are on the same scale. This step is essential because the penalty term depends on the magnitude of the coefficients, which in turn depends on the scale of the features.

Defining the Cost Function: Construct the cost function as the sum of the RSS
and the penalty term.

Optimization: Solve the optimization problem using techniques such as gradient descent or closed-form matrix methods. The Ridge Regression solution can be expressed as:

β̂ = (XᵀX + λI)⁻¹ Xᵀy

Here, X is the design matrix, y is the target vector, and I is the identity matrix.
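A minimal NumPy sketch of this closed-form solution is shown below, assuming the features are already standardized and the intercept is handled separately; the synthetic near-collinear data and the function name are illustrative only.

import numpy as np

def ridge_closed_form(X, y, lam):
    # beta_hat = (X^T X + lambda * I)^(-1) X^T y
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)  # solve() is more stable than an explicit inverse

# Tiny synthetic example with two nearly collinear features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)  # near-collinear columns
y = 3 * X[:, 0] + rng.normal(size=100)

print(ridge_closed_form(X, y, lam=0.0))  # ill-conditioned OLS-like solution
print(ridge_closed_form(X, y, lam=1.0))  # stabilized ridge solution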

Hyperparameter Tuning: Choose an appropriate value for λ using methods such as cross-validation to balance bias and variance.

Prediction: Use the optimized coefficients to make predictions on new data.
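The last three steps (tuning, fitting, prediction) can be combined in a short scikit-learn sketch, assuming that library is available; RidgeCV selects λ (alpha in scikit-learn's naming) by cross-validation over a grid of candidate values, and the diabetes dataset is used only as an example.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize the features, then let RidgeCV pick lambda by 5-fold cross-validation
model = make_pipeline(StandardScaler(),
                      RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5))
model.fit(X_train, y_train)

print("chosen lambda:", model.named_steps["ridgecv"].alpha_)
print("test R^2:", model.score(X_test, y_test))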
