Logistic Regression Vs Decision Tree

The document compares Logistic Regression and Decision Trees, highlighting their model types, decision boundaries, interpretability, and handling of non-linearity. It also outlines the K-Means algorithm for clustering, including its steps and the use of Bayesian Information Criterion (BIC) for determining the number of clusters. Additionally, it describes the Boosting technique, specifically AdaBoost, which combines weak learners to improve model performance.


Logistic Regression vs Decision Tree

1. Model Type:
● Logistic Regression is a linear model that predicts the probability of class membership using a sigmoid function.
● Decision Tree is a non-linear, tree-based model that splits the data based on feature values.

2. Decision Boundary:
● Logistic Regression creates a straight-line (linear) boundary.
● Decision Tree can form complex, non-linear boundaries through recursive splitting.

3. Interpretability:
● Logistic Regression is highly interpretable, with meaningful coefficients.
● Decision Tree is also interpretable but can become complex if grown too deep.

4. Handling Non-linearity and Interactions:
● Logistic Regression struggles with non-linearity unless features are transformed.
● Decision Tree naturally handles non-linear relationships and feature interactions.

Conclusion:
● Use Logistic Regression when the data is linearly separable and model simplicity is key.
● Use a Decision Tree for non-linear data and when interpretability through rules is preferred.
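As a quick illustration of the comparison above, here is a minimal sketch (not part of the original answer; it assumes scikit-learn and the make_moons toy dataset) that fits both models on non-linearly separable data. The tree's non-linear boundary typically yields higher accuracy here, while the logistic regression coefficients remain directly interpretable.

```python
# Sketch: logistic regression vs decision tree on non-linear data (assumed setup).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Logistic Regression: linear decision boundary, interpretable coefficients.
log_reg = LogisticRegression().fit(X_train, y_train)

# Decision Tree: non-linear, axis-aligned splits on feature values.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

print("Logistic Regression accuracy:", log_reg.score(X_test, y_test))
print("Decision Tree accuracy:     ", tree.score(X_test, y_test))
print("Logistic Regression coefficients:", log_reg.coef_)  # one weight per feature
```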

3(a) Write the K-Means Algorithm for Clustering

K-Means Algorithm Steps:

1. Initialize: Choose the number of clusters k and randomly initialize k centroids.

2. Assignment Step: Assign each data point to the nearest centroid based on distance (usually Euclidean).

3. Update Step: Calculate new centroids by taking the mean of all points assigned to each cluster.

4. Repeat Steps 2 and 3 until the centroids do not change significantly (i.e., convergence).
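The four steps above can be written out directly. Below is a minimal NumPy sketch (illustrative only; the function name, toy data, and convergence check are assumptions, not part of the original answer):

```python
# Sketch: plain K-Means (Lloyd's algorithm) following steps 1-4 above.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose k and randomly pick k data points as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2 (Assignment): nearest centroid by Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3 (Update): new centroid = mean of the points in each cluster
        # (keep the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop when the centroids no longer change (convergence).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```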
3(b) Prove that your algorithm will converge in a finite number of iterations

Proof of Finite Convergence:

● At each iteration, K-Means:
  ○ minimizes the within-cluster sum of squared distances (WCSS);
  ○ reassigns points to the nearest centroid → the cost function decreases or stays the same;
  ○ updates centroids to the cluster means → further reduces the WCSS.
● Since there are only finitely many ways to assign n points to k clusters, and the cost function never increases, K-Means must converge after a finite number of steps, even if the result is not optimal (a local minimum).
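As an empirical companion to this argument (a sketch assuming scikit-learn, not part of the original proof), scikit-learn's KMeans reports how many iterations Lloyd's algorithm actually needed before stopping, along with the final WCSS (exposed as inertia_):

```python
# Sketch: K-Means stops after a finite (and usually small) number of iterations.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (0, 4, 8)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("iterations until convergence:", km.n_iter_)  # finite, typically small
print("final WCSS (inertia):", km.inertia_)
```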

3(c) Use of Bayesian Information Criterion (BIC) to Determine the Number of Clusters

BIC (Bayesian Information Criterion) is used to select the model (number of clusters) that best balances fit and complexity.

Formula:

BIC = ln(n) · p - 2 · ln(L)

Where:

● n: number of data points
● p: number of parameters (depends on the number of clusters)
● L: likelihood of the model

Explanation:

● BIC penalizes more complex models (higher k) to avoid overfitting.
● Choose the number of clusters k where the BIC is minimized.
● It helps find a balance between model complexity and accuracy.
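Here is a sketch of this model-selection procedure (assuming scikit-learn; the toy data and the range of k are assumptions). Since K-Means itself does not define a likelihood, a Gaussian mixture model is used as the probabilistic clustering model; its bic() method implements the same criterion, ln(n) · p - 2 · ln(L):

```python
# Sketch: choose the number of clusters k by minimizing BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(150, 2)) for c in (0, 4, 8)])

bic_scores = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic_scores[k] = gmm.bic(X)  # BIC = ln(n) * p - 2 * ln(L)

best_k = min(bic_scores, key=bic_scores.get)
print(bic_scores)
print("k with minimum BIC:", best_k)  # typically 3 for this well-separated toy data
```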
Boosting and AdaBoost

Boosting is an ensemble learning technique that combines multiple weak learners (usually decision trees) to form a strong learner. A weak learner is a model that performs only slightly better than random guessing (e.g., a small decision tree).

Working Mechanism of AdaBoost:

1. Initialize sample weights: Assign equal weights to all training samples.

2. Train a weak learner: Fit a weak classifier (e.g., a decision stump) on the weighted training data.

3. Calculate error: Compute the weighted error rate ε of the weak learner (misclassified samples count according to their weights).

4. Compute learner weight: More accurate learners get a higher weight α in the final model.

Formula:

α = (1/2) · ln((1 - ε) / ε)

5. Update sample weights: Increase the weights of misclassified samples so that the next learner focuses more on them, and decrease the weights of correctly classified samples.

6. Repeat steps 2–5: For a fixed number of rounds or until the desired performance is reached.

7. Final Model: Combine all weak learners using a weighted vote (based on each learner's accuracy).
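The seven steps can be followed almost line by line in code. The sketch below is illustrative rather than a production implementation (it assumes scikit-learn and NumPy, encodes the binary labels as ±1, and uses a depth-1 tree as the decision stump); in practice sklearn.ensemble.AdaBoostClassifier would normally be used.

```python
# Sketch: AdaBoost from scratch, mirroring steps 1-7 above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)              # labels in {-1, +1}

n = len(X)
w = np.full(n, 1.0 / n)                  # Step 1: equal sample weights
learners, alphas = [], []

for _ in range(25):                      # Step 6: repeat steps 2-5 for a fixed number of rounds
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)     # Step 2: train on the weighted data
    pred = stump.predict(X)
    eps = np.sum(w * (pred != y)) / np.sum(w)   # Step 3: weighted error rate
    eps = np.clip(eps, 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - eps) / eps)       # Step 4: learner weight
    w *= np.exp(-alpha * y * pred)              # Step 5: raise weights of mistakes,
    w /= w.sum()                                #         lower weights of correct ones
    learners.append(stump)
    alphas.append(alpha)

def predict(X_new):
    # Step 7: final model = weighted vote over all weak learners.
    scores = sum(a * m.predict(X_new) for a, m in zip(alphas, learners))
    return np.sign(scores)

print("training accuracy:", np.mean(predict(X) == y))
```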
