Unit 4: Classification
What is Classification?
Classification refers to the process of categorizing data into predefined classes or groups
based on certain characteristics or features. It is a supervised learning technique, meaning it
requires labeled data for training.
Types of Nodes in a Decision Tree
1. Root node – the topmost node, which poses the main question
2. Branch node – an intermediate node that tests a further condition
3. Leaf node – a terminal node that gives the answer (the class label)
Example:
Credit score rating
A = Average, B = Bad, C = Good, D = Excellent
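To show how root, branch, and leaf nodes map onto code, here is a minimal sketch of the credit-rating example as nested if/else tests in Python. The score thresholds are hypothetical, chosen only for illustration.

def credit_rating(score):
    # Root node: the main question asked of every input.
    if score >= 750:
        return "D"   # Leaf node: Excellent
    # Branch nodes: intermediate questions.
    elif score >= 650:
        return "C"   # Leaf node: Good
    elif score >= 550:
        return "A"   # Leaf node: Average
    else:
        return "B"   # Leaf node: Bad

print(credit_rating(700))  # -> C (Good)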
6. Handling Missing Values: Decision tree algorithms may not handle missing values effectively,
potentially leading to biased models.
7. Prone to Local Optima: Greedy algorithms used in decision tree induction may lead to
suboptimal splits.
8. Interpretability vs. Accuracy: While decision trees are interpretable, their simplicity may limit
predictive accuracy compared to more complex models.
Addressing these issues often involves using techniques such as pruning, ensemble methods,
regularization, and careful hyper-parameter tuning.
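As one concrete way to apply these remedies, the sketch below fits a scikit-learn decision tree with a depth cap, a minimum leaf size, and cost-complexity pruning. It assumes scikit-learn is installed; the dataset and parameter values are illustrative, not prescriptive.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Limit depth and prune weak branches to reduce overfitting and variance.
tree = DecisionTreeClassifier(
    max_depth=3,          # cap tree depth
    min_samples_leaf=5,   # require enough samples per leaf
    ccp_alpha=0.01,       # cost-complexity pruning strength
    random_state=42,
)
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))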
Disadvantages:
1. Overfitting: Prone to overfitting, especially with deep trees.
2. High Variance: Sensitive to variations in the training data.
3. Bias-Variance Tradeoff: Balancing bias and variance can be challenging.
4. Instability: Small changes in data can lead to different tree structures.
5. Limited Expressiveness: May struggle to capture complex relationships.
6. Greedy Nature: May not find the globally optimal tree structure.
7. Difficulty with Imbalanced Data: May perform poorly on imbalanced datasets.
1. Bayes's Theorem: Bayes's theorem is expressed mathematically by the following equation:

P(H | X) = [P(X | H) × P(H)] / P(X)

where P(H | X) is the posterior probability of hypothesis H (the class) given the observed data X, P(X | H) is the likelihood of the data given the hypothesis, P(H) is the prior probability of the hypothesis, and P(X) is the probability of the data (the evidence).
2. Naive Bayes Classifier: The Naive Bayes classifier is one of the simplest Bayesian classifiers. It
assumes that the features are conditionally independent given the class label. Despite this strong
assumption (which is often not true in practice), Naive Bayes classifiers are surprisingly effective
and efficient for many real-world problems. Common variants include:
o Gaussian Naive Bayes: Assumes that the likelihood of the features follows a Gaussian
distribution.
o Multinomial Naive Bayes: Suitable for features that represent counts or frequencies (e.g.,
text classification).
o Bernoulli Naive Bayes: Assumes that features are binary.
3. Bayesian Network Classifiers: These are more complex models that represent the dependencies
between features using a directed acyclic graph (DAG). Each node in the graph represents a
random variable (feature), and edges represent probabilistic dependencies between them.
Bayesian network classifiers can capture more complex relationships between features but require
more sophisticated inference algorithms.
Bayesian classification methods are particularly useful when dealing with small datasets or when
interpretability is important. They provide a principled way to incorporate prior knowledge into the
classification process and can produce well-calibrated probability estimates. However, they may not always
perform as well as more complex models like deep neural networks on very large datasets.
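A minimal sketch of the Gaussian Naive Bayes variant described above, using scikit-learn's GaussianNB on a small made-up dataset (the feature values are illustrative only):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy training data: two numeric features per sample, two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

model = GaussianNB()  # assumes each feature's likelihood is Gaussian per class
model.fit(X, y)

# The classifier applies Bayes's theorem under the conditional-independence assumption.
print(model.predict([[1.1, 2.0]]))        # predicted class label
print(model.predict_proba([[1.1, 2.0]]))  # posterior probabilities P(class | features)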
Rule-Based Classification
Rule-based classification is a method of classifying data based on a set of predefined rules. These rules
are typically derived from domain knowledge or extracted from the data itself. Rule-based classification
systems make decisions by applying these rules to the input data and assigning a class label based on the
conditions satisfied.
Here's an overview of how rule-based classification works:
1. Rule Representation: Rules are typically represented in the form of "if-then" statements. Each
rule consists of a condition (antecedent) and an action (consequent). For example (these rules are
implemented in the code sketch at the end of this section):
o If (age < 30) and (income > $50,000), then class = "young and affluent"
o If (age >= 30) and (age < 50), then class = "middle-aged"
o If (age >= 50) and (income < $30,000), then class = "senior with low income"
2. Rule Induction: Rule-based classification systems can be built using rule induction algorithms,
which automatically generate rules from training data. These algorithms analyze the data to
identify patterns and relationships between features and class labels. Common rule induction
algorithms include:
o Decision Trees: Decision trees can be converted into rule sets by traversing the tree from
the root to the leaf nodes, where each path represents a rule.
o Sequential Covering Algorithms: These algorithms iteratively generate rules to cover
different subsets of the data, often using techniques like exhaustive search or heuristics to
select the best rule at each step.
o Association Rule Mining: Association rule mining algorithms, such as Apriori, identify
rules that describe relationships between different attributes in the data.
3. Rule Refinement: Once initial rules are generated, they may be refined to improve their accuracy
or interpretability. This can involve pruning redundant or irrelevant rules, optimizing rule
conditions, or combining rules to create more general or specific rules.
4. Rule Evaluation: Rule-based classifiers are evaluated using metrics such as accuracy, precision,
recall, and F1-score on a held-out test dataset. The performance of the classifier can be assessed
by comparing the predicted class labels with the true class labels in the test data.
5. Interpretability: One of the key advantages of rule-based classification is its interpretability.
Since the classification decisions are based on explicit rules, it is easy to understand why a
particular decision was made. This makes rule-based classifiers particularly useful in domains
where interpretability is important, such as healthcare and finance.
While rule-based classification systems have the advantage of interpretability, they may struggle to
capture complex relationships in the data compared to more flexible models like neural networks.
Additionally, designing effective rule sets can require significant domain expertise, and rule-based
classifiers may not perform as well as other methods on very large or high-dimensional datasets.
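To make the if-then representation concrete, here is a minimal sketch of a rule-based classifier in Python. It encodes the example rules from step 1 above; the first-matching-rule-wins ordering and the "unknown" default class are assumptions chosen for illustration.

# Each rule is a (condition, class label) pair; rules are checked in order.
rules = [
    (lambda p: p["age"] < 30 and p["income"] > 50_000, "young and affluent"),
    (lambda p: 30 <= p["age"] < 50, "middle-aged"),
    (lambda p: p["age"] >= 50 and p["income"] < 30_000, "senior with low income"),
]

def classify(person, default="unknown"):
    # Apply rules in order; the first condition that matches decides the class.
    for condition, label in rules:
        if condition(person):
            return label
    return default  # no rule fired

print(classify({"age": 25, "income": 60_000}))  # -> young and affluent
print(classify({"age": 40, "income": 20_000}))  # -> middle-aged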
K-Nearest Neighbors (KNN)
The choice of k involves a tradeoff: a small value of k follows the training data closely and captures local patterns well but might be sensitive to noise, while a larger value of k leads to a smoother decision boundary but may miss local variations.
Suppose we have a new data point that we need to assign to one of two categories, A or B. The KNN procedure works as follows:
First, we choose the number of neighbors; here we take k = 5.
Next, we calculate the Euclidean distance between the new point and the existing data points. The Euclidean distance is the straight-line distance between two points, familiar from geometry. For two points (x1, y1) and (x2, y2) it can be calculated as:

d = √((x2 − x1)² + (y2 − y1)²)
By calculating the Euclidean distances, we find the five nearest neighbors: three belong to category A and two belong to category B.
Since the majority (three of the five) of the nearest neighbors are from category A, the new data point is assigned to category A.
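A minimal sketch of this procedure using scikit-learn's KNeighborsClassifier; the coordinates below are made up to mimic the example, not taken from the original figure:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D training points in categories A and B (illustrative coordinates).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.5, 2.0], [6.0, 5.0], [7.0, 6.0], [6.5, 5.5]])
y = np.array(["A", "A", "A", "B", "B", "B"])

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5; Euclidean distance is the default metric
knn.fit(X, y)

new_point = [[2.0, 2.0]]
print(knn.predict(new_point))  # majority vote among the 5 nearest neighbors -> ['A']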
Evaluation Metrics
2. Precision
Definition: Precision is the ratio of correctly predicted positive observations to the total predicted
positives.
Formula: Precision = TP / (TP + FP)
Interpretation: Precision answers the question: "What proportion of positive identifications was actually
correct?"
3. Recall
Definition: Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class.
Formula: Recall = TP / (TP + FN)
Interpretation: Recall answers the question: "What proportion of actual positives was identified correctly?"
Confusion Matrix
To make these concepts clearer, consider the confusion matrix, which is a summary of prediction results
on a classification problem.
Using the labels TP (true positive), FN (false negative), FP (false positive), and TN (true negative), the matrix has the following layout:

                      Predicted Positive   Predicted Negative
Actual Positive (P)          TP                   FN
Actual Negative (N)          FP                   TN

For example, suppose a classifier produces these counts on a test set of 100 samples:

                      Predicted Positive   Predicted Negative
Actual Positive (P)          50                   10
Actual Negative (N)           5                   35
Reading off the matrix: TP = 50, TN = 35, FP = 5, FN = 10.
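Plugging these counts into the formulas above gives the worked values:

Precision = TP / (TP + FP) = 50 / (50 + 5) ≈ 0.91
Recall = TP / (TP + FN) = 50 / (50 + 10) ≈ 0.83
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (50 + 35) / 100 = 0.85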