Machine Learning INTRO
"Traditional Programming is like a recipe for the Computer: Data goes in,
Program processes it, and Output comes out."
"In Machine Learning, the Computer learns from Data to produce Output.
It's like teaching the Computer to Program itself."
Applications of Machine Learning:
Web search
Computational biology
Finance
E-commerce
Space exploration
Robotics
Information extraction
Social networks
Debugging software
[Your favorite area]
A well-posed learning problem is defined by three components (Mitchell):
Task (T)
Performance metric (P)
Experience (E)
Deep Learning: learning with multi-layer neural networks that build up successively richer representations of the data.
Supervised Learning: learning from labelled examples, where each input comes with the desired output.
Unsupervised Learning: learning from unlabelled examples, discovering structure in the data alone.
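As a minimal sketch of the supervised/unsupervised distinction, assuming scikit-learn (the library, the tiny dataset, and the model choices are illustrative assumptions, not part of these notes):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([0, 0, 1, 1])  # labels are available in the supervised setting

# Supervised: learn a mapping from attributes X to labels y
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[1.2, 1.9]]))  # -> [0]

# Unsupervised: no labels; discover structure in X alone
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)  # cluster assignments, found without any y
```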
Assume training and test examples are independently drawn from the
same overall distribution (i.i.d.).
When the examples are not independent, the setting is called collective classification.
When the test distribution differs from the training distribution, the setting is called transfer learning.
Problem Setting: a set of records (the training set), each described by attribute values and labelled with a class.
Test Set: previously unseen records, used to assess how well the trained model generalizes.
The goal of the classification task is to train a model using the training set so that it
can accurately predict the class labels for the test set based on the values of the
attributes. This is typically done using machine learning algorithms such as
decision trees.
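A minimal sketch of this train-then-predict workflow, assuming scikit-learn and its bundled iris dataset (both assumptions chosen for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Split the labelled records into a training set and a held-out test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Train the model on the training set only
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict class labels for the unseen test set and measure accuracy
y_pred = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
```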
2. Classification Techniques:
Decision Tree-based Methods
Rule-based Methods
Memory-based Reasoning
Neural Networks
Naïve Bayes and Bayesian Belief Networks
Support Vector Machines
Each applies a learning algorithm to identify the best model for
predicting class labels.
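Most of these techniques expose the same fit/predict interface in common libraries; a hedged sketch assuming scikit-learn, with k-nearest neighbours standing in for memory-based reasoning (rule-based methods and Bayesian belief networks have no built-in scikit-learn estimator and are omitted here):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier   # memory-based reasoning
from sklearn.neural_network import MLPClassifier     # a small neural network
from sklearn.naive_bayes import GaussianNB           # naive Bayes
from sklearn.svm import SVC                          # support vector machine

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.3, random_state=42)

models = {
    "decision tree": DecisionTreeClassifier(random_state=42),
    "k-NN": KNeighborsClassifier(n_neighbors=3),
    "neural net": MLPClassifier(max_iter=2000, random_state=42),
    "naive Bayes": GaussianNB(),
    "SVM": SVC(),
}

# Each technique applies its own learning algorithm to the same training data
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))  # test-set accuracy
```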
3. Evaluating Classification Models:
4. Confusion Matrix:

                     PREDICTED
                     TRUE    FALSE
    ACTUAL   TRUE     TP      FN
             FALSE    FP      TN
True Positives (TP): instances that are actually positive (class "Yes") and are correctly classified as positive.
False Positives (FP): instances that are actually negative (class "No") but are incorrectly classified as positive.
False Negatives (FN): instances that are actually positive (class "Yes") but are incorrectly classified as negative.
True Negatives (TN): instances that are actually negative (class "No") and are correctly classified as negative.
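A small sketch of reading these four counts out of a binary prediction, assuming scikit-learn's confusion_matrix; the toy labels are invented for illustration:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # actual labels (1 = "Yes", 0 = "No")
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # predicted labels

# With labels=[1, 0] the matrix is laid out exactly as above:
# rows = actual (TRUE, FALSE), columns = predicted (TRUE, FALSE)
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
(tp, fn), (fp, tn) = cm
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=3
```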
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Error rate = (FP + FN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
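These four formulas computed directly from the counts (the numbers simply continue the toy example above):

```python
tp, fn, fp, tn = 3, 1, 1, 3  # counts from a binary confusion matrix

accuracy   = (tp + tn) / (tp + tn + fp + fn)  # fraction of correct predictions
error_rate = (fp + fn) / (tp + tn + fp + fn)  # equals 1 - accuracy
precision  = tp / (tp + fp)  # of predicted positives, how many are truly positive
recall     = tp / (tp + fn)  # of actual positives, how many were found

print(accuracy, error_rate, precision, recall)  # 0.75 0.25 0.75 0.75
```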
Tree Induction:
Uses a greedy strategy: records are split based on the attribute test that
optimizes a chosen criterion, such as the GINI index or information gain.
Challenges include how to specify the attribute test conditions and how to
identify the best split.
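As a concrete instance of the greedy step, a sketch that scores candidate thresholds on a single numeric attribute with the GINI index (plain Python; the toy data and the restriction to binary numeric splits are assumptions made for brevity):

```python
from collections import Counter

def gini(labels):
    """GINI impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Greedily pick the threshold with the lowest weighted child impurity."""
    best = (float("inf"), None)
    for t in sorted(set(values))[:-1]:  # every candidate cut point
        left  = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = min(best, (score, t))
    return best  # (weighted GINI, threshold)

print(best_split([1, 2, 3, 8, 9], ["A", "A", "A", "B", "B"]))  # (0.0, 3)
```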
Stopping Criteria for Tree Induction:
Cease node expansion when all records belong to the same class or have
similar attribute values.
Early termination methods may be applied.
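The stopping test itself is simple to state in code; a minimal sketch, where the max_depth cut-off stands in for an early-termination method (an assumption, not something these notes prescribe):

```python
def should_stop(labels, rows, depth, max_depth=5):
    """Decide whether to stop expanding a node during tree induction."""
    if len(set(labels)) == 1:            # all records in one class -> pure leaf
        return True
    if len(set(map(tuple, rows))) == 1:  # identical attribute values -> nothing to split on
        return True
    return depth >= max_depth            # early termination

print(should_stop(["A", "A"], [[1, 2], [3, 4]], depth=0))  # True: pure node
```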
Decision Tree Based Classification:
Advantages:
Inexpensive to construct.
Fast at classifying unknown records.
Easy to interpret for small trees.
Accuracy comparable to other techniques on many simple datasets.
Practical Issues of Classification:
Underfitting and Overfitting:
Achieving a balance between training error and generalization error is
crucial.
Underfitting occurs when the model is too simple; overfitting produces
overly complex models, commonly caused by noise in the data or
insufficient training examples.
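The balance can be made visible by sweeping model complexity; a sketch assuming scikit-learn, where a decision tree's max_depth controls complexity and label noise (flip_y) gives the deep tree something to overfit:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data, so an overly flexible tree can memorize the noise
X, y = make_classification(n_samples=400, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, 10, None):  # None = grow until leaves are pure (most complex)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 2), round(tree.score(X_te, y_te), 2))
# Shallow trees underfit (both scores low); very deep trees overfit
# (training score near 1.0 while the test score drops).
```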