Decision Tree Classification Example
Features (Inputs):
Age
Gender (Male/Female)
Class (1st, 2nd, 3rd)
Fare (Ticket price)
Labels (Output):
Survived (Yes/No)
Decision Tree Steps:
1. Root Node: Check Gender.
Female → Likely Yes (survived).
Male → Move to next step.
2. Class Split (Male):
1st Class → Likely Yes.
3rd Class → Likely No.
2nd Class → Check Age/Fare.
Example Paths:
Male, 3rd Class, Age 30, Paid $7 → No.
Female, 1st Class, Age 40, Paid $80 → Yes.
Outcome: Classify passengers as survived or not survived based on features.
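The steps above can be sketched as a small hand-written rule function. This is only an illustration of how a path through the tree is followed, not a trained model. The function name predict_survival is made up for this sketch, and because the notes only say "Check Age/Fare" for 2nd-class males, the age and fare thresholds used there are hypothetical placeholders.

```python
def predict_survival(gender, pclass, age, fare):
    """Follow the hand-written tree: gender at the root, then class for males."""
    # Root node: check gender.
    if gender.lower() == "female":
        return "Yes"

    # Class split (males only).
    if pclass == 1:
        return "Yes"
    if pclass == 3:
        return "No"

    # 2nd-class males: the notes only say "Check Age/Fare".
    # ASSUMPTION: these thresholds are invented for illustration.
    if age < 16 or fare > 30:
        return "Yes"
    return "No"


# The two example paths from the notes.
print(predict_survival("Male", 3, 30, 7))     # No
print(predict_survival("Female", 1, 40, 80))  # Yes
```

In practice the splits and thresholds would be learned from labeled passenger data rather than written by hand.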
Confusion Matrix
A confusion matrix is a tool used to evaluate the performance of a classification model. It summarizes the results by
showing how well the predicted classifications match the actual classifications. It is typically a 2x2 matrix for binary
classification tasks but can be expanded for multi-class classification.
Components:
1. True Positives (TP): Correctly predicted positive cases.
2. True Negatives (TN): Correctly predicted negative cases.
3. False Positives (FP): Negative cases incorrectly predicted as positive (Type I error).
4. False Negatives (FN): Positive cases incorrectly predicted as negative (Type II error).
Metrics Derived from the Matrix:
Accuracy: (TP + TN) / Total predictions.
Precision: TP / (TP + FP).
Recall: TP / (TP + FN).
F1-Score: Harmonic mean of precision and recall, 2 × (Precision × Recall) / (Precision + Recall).
The confusion matrix provides insights into where the model is making errors and helps in tuning its performance.
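As a rough sketch of how the four counts and the metrics fit together, the code below tallies TP, TN, FP and FN from two label lists and then applies the formulas. The function names confusion_counts and metrics, and the eight sample labels, are made up for this example.

```python
def confusion_counts(y_true, y_pred, positive="Yes"):
    """Count TP, TN, FP, FN for a binary task."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Apply the standard formulas, returning 0.0 when a denominator is zero."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only (not real data).
y_true = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"]
y_pred = ["Yes", "No", "Yes", "No", "Yes", "No", "No", "Yes"]

counts = confusion_counts(y_true, y_pred)
print(counts)            # (3, 3, 1, 1)
print(metrics(*counts))  # every metric is 0.75 for this sample
```

Returning 0.0 for an undefined ratio is one common convention; some libraries raise a warning instead.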
Factors Affecting Classifier Performance:
1. Data Quality: Clean, relevant, labeled data.
2. Feature Selection: Choosing the right features.
3. Model Complexity: A model that is too complex overfits; one that is too simple underfits.
4. Training Data Size: Sufficient and diverse data.
5. Hyperparameter Tuning: Optimize parameters for better results.
Correlation vs. Causation
Correlation:
Relationship/association between two variables.
Does not imply causation.
Causation:
One variable directly influences another.
Indicates a cause-effect relationship.
Example:
Correlation:
Ice Cream Sales ↔ Drowning Incidents (both increase in summer).
No causation: Ice cream sales don’t cause drowning.
Causation:
Smoking → Lung Cancer (smoking directly increases risk).
Summary:
Correlation: Link between variables.
Causation: Direct cause-effect relationship.
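The ice cream example can be illustrated with a small simulation. In this sketch, temperature stands in for "summer" and drives both quantities. All numbers (slopes, noise levels, sample size) are invented for illustration, and the helper names pearson and residuals are hypothetical. The two series are strongly correlated, but the correlation largely disappears once the effect of temperature is removed from each.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# ASSUMPTION: synthetic data. Temperature influences both variables,
# and neither variable influences the other.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw = pearson(ice_cream, drownings)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(drownings, temperature))

print(f"correlation, raw:                     {raw:.2f}")
print(f"correlation, temperature removed:     {adjusted:.2f}")
```

A high raw correlation here comes entirely from the shared cause, which is why correlation alone cannot establish causation.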
Types of Machine Learning
1. Supervised Learning:
Data Type: Labeled data (input-output pairs).
Objective: Predict outputs for new, unseen inputs.
Examples: Classification, Regression.
2. Unsupervised Learning:
Data Type: Unlabeled data (only inputs).
Objective: Identify patterns/structures.
Examples: Clustering, Dimensionality Reduction.
3. Semi-supervised Learning:
Combines labeled and unlabeled data.
Example: Image classification with limited labels.
4. Reinforcement Learning:
Learns through actions in an environment to maximize rewards.
Examples: Game playing, Robotics.
5. Deep Learning:
Uses neural networks with multiple layers; it is a family of models that can be applied within any of the learning types above.
Examples: CNNs, RNNs.
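To make the difference between the first two types concrete, here is a minimal sketch on one-dimensional data. The supervised part uses labels to learn one average per class (a nearest-centroid classifier); the unsupervised part sees only the numbers and splits them into two groups (a two-cluster k-means). The fares, labels, and function names are invented for illustration, and two_means assumes the data contains at least two distinct values.

```python
def nearest_centroid_fit(points, labels):
    """Supervised: use the labels to compute one mean value per class."""
    sums, counts = {}, {}
    for x, label in zip(points, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def nearest_centroid_predict(centroids, x):
    """Predict the class whose mean is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(points, iterations=10):
    """Unsupervised: split unlabeled 1-D points into two clusters."""
    low, high = min(points), max(points)  # initial cluster centers
    for _ in range(iterations):
        group_low = [x for x in points if abs(x - low) <= abs(x - high)]
        group_high = [x for x in points if abs(x - low) > abs(x - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return low, high


# ASSUMPTION: made-up ticket fares, not real passenger data.
fares = [7, 8, 9, 10, 70, 80, 90, 100]
classes = ["3rd", "3rd", "3rd", "3rd", "1st", "1st", "1st", "1st"]

# Supervised learning: inputs AND labels are given.
centroids = nearest_centroid_fit(fares, classes)
print(nearest_centroid_predict(centroids, 12))  # 3rd

# Unsupervised learning: only the inputs are given.
print(two_means(fares))  # (8.5, 85.0)
```

Both methods find the same two groups here, but only the supervised one can name them, because the names come from the labels.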