UNIT III
Classification and Regression Trees – Ensemble Learning – Boosting – Bagging – Different ways to Combine
Classifiers – Basic Statistics – Gaussian Mixture Models – Nearest Neighbor Methods –
Unsupervised Learning – K-means Algorithms
Decision Tree
A decision tree is a simple diagram that shows different choices and their possible results,
helping you make decisions easily.
Decision Trees are a type of supervised machine learning model in which the data is
continuously split according to a certain parameter.
✔ The tree can be explained by two entities, namely decision nodes and leaves.
✔ The leaves are the decisions or final outcomes, and the decision nodes are where
the data is split.
A decision tree is a graphical representation of different options for solving a problem and
shows how different factors are related.
Root Node: The starting point that represents the entire dataset.
Branches: These are the lines that connect nodes; they show the flow from one
decision to another.
Internal Nodes: Points where decisions are made based on the input features.
Leaf Nodes: These are the terminal nodes at the end of branches that represent final
outcomes or predictions.
Now, let’s take an example to understand the decision tree. Imagine you want to decide
whether to drink coffee based on the time of day and how tired you feel. First the tree
checks the time of day. If it is morning, it asks whether you are tired: if you are, the tree
suggests drinking coffee; if not, it says there is no need. Similarly, in the afternoon the tree
again asks whether you are tired: if you are, it recommends drinking coffee; if not, it
concludes no coffee is needed.
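Below is a minimal Python sketch of this coffee decision tree written as plain if/else rules; the function name and inputs are illustrative, not part of any library.

# Coffee decision tree as nested conditionals.
# Root node: time of day; internal nodes: tiredness; leaves: drink or not.
def should_drink_coffee(time_of_day: str, tired: bool) -> bool:
    if time_of_day == "morning":
        return tired        # leaf: coffee only if tired
    elif time_of_day == "afternoon":
        return tired        # leaf: coffee only if tired
    return False            # any other time: no coffee

print(should_drink_coffee("morning", tired=True))     # True -> drink coffee
print(should_drink_coffee("afternoon", tired=False))  # False -> no coffee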
We have mainly two types of decision tree based on the nature of the target
variable: classification trees and regression trees.
Classification trees: Classification is used when you want to categorize data into
different classes or groups. For example, classifying emails as "spam" or "not spam",
or predicting whether a patient has a certain disease based on their symptoms.
Regression trees: Regression is used when the target variable is a continuous value,
for example predicting a house price from its size and location.
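As a small illustration, here is a hedged sketch using scikit-learn (assuming it is installed); the toy data is invented purely for demonstration.

# Classification tree: predicts a class label.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 1], [1, 1], [1, 0], [0, 0]]          # two binary features
y = ["spam", "spam", "not spam", "not spam"]  # class labels
clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(clf.predict([[1, 1]]))                  # -> ['spam']

# Regression tree: predicts a continuous value instead of a class.
y_cont = [3.0, 2.5, 0.5, 0.2]
reg = DecisionTreeRegressor(max_depth=2).fit(X, y_cont)
print(reg.predict([[1, 1]]))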
Advantages:
Simplicity: Easy to understand, interpret, and visualize.
Versatility: Handles both numerical and categorical data with little preprocessing.
Disadvantages:
Overfitting: Can become too complex and perform poorly on new data.
Instability: Small data changes can lead to big variations in predictions.
Bias Toward Many-Level Features: Might focus too much on features with many
categories, reducing accuracy.
Applications:
Bank Loan Approval: Uses customer details (income, credit score, etc.) to decide
loan approval.
Medical Diagnosis: Helps predict diseases like diabetes based on test results.
Ensemble Learning
Ensemble learning combines multiple individual models (weak learners) into one overall
model. The idea is that a group of weak learners can perform better than any single weak
learner. There are two main ways to build an ensemble:
1. Bagging (Bootstrap Aggregating):
Models are trained independently on different random subsets of the training data.
Their results are then combined, usually by averaging (for regression) or voting (for
classification), as shown in the sketch after this list. This helps reduce variance and
prevents overfitting.
2. Boosting:
Models are trained one after another. Each new model focuses on fixing the errors
made by the previous ones. The final prediction is a weighted combination of all
models, which helps reduce bias and improve accuracy.
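The two combination rules mentioned above can be sketched in a few lines of Python; the prediction lists here are made-up illustrations, not outputs of real models.

from collections import Counter

def majority_vote(predictions):
    # Classification: return the label most models agree on.
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    # Regression: return the mean of the models' outputs.
    return sum(predictions) / len(predictions)

print(majority_vote(["spam", "not spam", "spam"]))  # -> spam
print(average([2.5, 3.0, 2.0]))                     # -> 2.5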
Think of boosting like a class where the teacher focuses more on the weak students to
improve their academic performance; boosting works in a similar way.
While strong learners have higher prediction accuracy, boosting converts a system of
weak learners into a single strong learning system. A strong learner is a model that
tries to overcome the weaknesses and errors of the weak models to give better
predictions.
AdaBoost (Adaptive Boosting)
AdaBoost works by weighting the instances in the training dataset based on the
accuracy of previous classifications.
It is a boosting technique that initially assigns equal weights to all training samples and
iteratively adjusts these weights, focusing the next model more on the misclassified data
points. It effectively reduces bias and variance, making it useful for classification tasks, but it
can be sensitive to noisy data and outliers.
There are several types of boosting algorithms; one of the most famous and useful is
AdaBoost, whose steps are given below.
ALGORITHM
1. Initialise the dataset and assign equal weight to each data point.
2. Provide this as input to the model and identify the wrongly classified data points.
3. Increase the weights of the wrongly classified data points and decrease the weights of
correctly classified data points. Then normalize the weights of all data points.
4. If the required results have been achieved, go to step 5; otherwise, train the next model
on the re-weighted data and go to step 2.
5. End
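A minimal sketch of these steps using scikit-learn's AdaBoostClassifier (assuming scikit-learn is installed); the synthetic dataset is generated purely for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic two-class dataset, invented for the example.
X, y = make_classification(n_samples=200, random_state=0)

# By default each weak learner is a depth-1 decision tree (a "stump").
# Each new stump is trained on re-weighted data that emphasises the
# points the previous stumps misclassified (steps 2-4 above).
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X, y)
print(ada.score(X, y))  # accuracy on the training data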
BAGGING
Bagging (Bootstrap Aggregating) can be used for both regression and classification tasks.
It helps reduce variance, avoid overfitting, and boost accuracy in machine learning.
Bootstrap Sampling: Different training data subsets are selected from the entire training
dataset using row sampling with replacement (random sampling).
Final Prediction: Merges all model outputs to make a strong, stable final prediction.
Bagging tries to solve the over-fitting problem: if the classifier is unstable (high variance),
then apply bagging.
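As a hedged sketch, bagging can be demonstrated with scikit-learn's BaggingClassifier (assuming scikit-learn is installed); by default its base model is a decision tree, exactly the kind of unstable, high-variance classifier bagging helps.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic two-class dataset, invented for the example.
X, y = make_classification(n_samples=200, random_state=0)

# Each of the 25 trees sees its own bootstrap sample
# (rows drawn with replacement); predictions are combined by voting.
bag = BaggingClassifier(
    n_estimators=25,
    bootstrap=True,   # row sampling with replacement
    random_state=0,
)
bag.fit(X, y)
print(bag.score(X, y))  # accuracy on the training data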