Machine Learning Intro & Evaluation Metrics
COURSE OVERVIEW
Machine learning is a branch of computer science that uses algorithms to imitate the way in
which humans learn. It uses statistical methods to train algorithms and make predictions. The
accuracy of these predictions typically improves as more data becomes available.
This course covers the basic concepts and techniques of machine learning from both theoretical
and practical perspectives.
TOPICS
Introduction to Artificial Intelligence and Machine Learning
Data Preprocessing:
• Feature Scaling.
• Missing Data.
• Dummy Variable.
• Imbalanced Data.
Feature Engineering.
• Backward Elimination.
• Forward Selection.
• Model Validation
Supervised Learning:
• Linear Regression.
• Logistic Regression.
• Naive Bayes.
• Regression and Classification Metrics, Confusion Matrix, ROC and AUC.
Ensemble Learning
• Decision Tree.
• Random Forest.
• Gradient Boosting, XGBoost, and LightGBM (LGBM).
Unsupervised learning
• K-means Clustering.
• Hierarchical Clustering.
• Gaussian Mixture Model (GMM).
• Principal Component Analysis (PCA).
DEFINITION
• Arthur Samuel (1959). Machine Learning: Field of
study that gives computers the ability to learn
without being explicitly programmed.
WHY ?
• Flood of available data (especially with the advent of the Internet): roughly “2.5
quintillion bytes / day” (2.5 × 10^18)
• Increasing computational power
• Growing progress in available algorithms and theory developed by
researchers
• Increasing support from industries
HOW ?
E.g. a spam mail filter
Experience = Data
SUPERVISED LEARNING
• Classification and regression are both forms of supervised learning from examples.
• Supervision: the data (observations, measurements, etc.) are labeled with predefined
classes, as if a “teacher” provided the classes (supervision).
• Test data are classified into these same classes (in the case of classification).
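A minimal sketch of learning from labeled examples, assuming a toy spam data set and a 1-nearest-neighbour rule. Both the data and the choice of rule are illustrative assumptions, not part of the course material.

```python
# Hypothetical labeled examples: (suspicious_word_count, link_count) -> class.
# The labels play the role of the "teacher".
training_data = [
    ((0.0, 1.0), "ham"),
    ((1.0, 0.0), "ham"),
    ((6.0, 4.0), "spam"),
    ((7.0, 5.0), "spam"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(features, labeled_examples):
    """Return the label of the closest labeled example (1-nearest neighbour)."""
    _, label = min(labeled_examples,
                   key=lambda example: squared_distance(example[0], features))
    return label

# Unseen (test) observations are assigned to one of the predefined classes.
print(predict((1.0, 1.0), training_data))
print(predict((6.0, 5.0), training_data))
```

The model never invents a new class: every prediction is one of the labels seen during training.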
DATA SET
MODEL LEARNING
MODEL TESTING
[Diagram labels: Cross Validation & Testing Set; Classifier; Classification Model; Evaluation]
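A rough sketch of how cross validation splits the available data, assuming plain k-fold splitting without shuffling. The function name `k_fold_indices` is hypothetical and is not taken from any library.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=6, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample is used for validation exactly once. A separate testing set would normally be held out before this step and used only for the final evaluation.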
PREDICTION / REGRESSION
QUESTION
➢ A hospital application for suspected cancer cases receives hospital data containing
many features and attributes, and must decide efficiently whether a tumor is
malignant or benign. What type of problem is this?
UNSUPERVISED LEARNING
• Given the data, can you derive certain groups / clusters?
UNSUPERVISED LEARNING
• Studies how systems can learn to represent particular input patterns in a way
that reflects the statistical structure of the overall collection of input patterns.
• It is likely to be much more common in the brain than supervised learning.
• Examples:
• Clustering.
• Blind signal separation.
• Self-organising maps.
• Etc.
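As an illustration of clustering, here is a minimal k-means sketch on one-dimensional points. The data and the starting centers are made-up assumptions, and real implementations usually choose the initial centers more carefully.

```python
def k_means_1d(points, centers, iterations=10):
    """Plain k-means on 1-D points, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# No labels are given; the groups are derived from the data alone.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```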
REINFORCEMENT LEARNING
• Learning from interaction with an environment to achieve some long-term
goal that is related to the state of the environment
• The goal is defined by a reward signal, which must be maximized
• The agent must be able to partially/fully sense the environment state and take
actions to influence the environment state
• The state is typically described by a feature vector
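The agent/environment interaction can be sketched with tabular Q-learning on a toy corridor. The environment, the reward of 1.0 at the goal, and all hyperparameters are illustrative assumptions, and Q-learning is only one of many reinforcement learning methods. Here the state is a single integer rather than a feature vector.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values from interaction with the corridor."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Discounted value of the best action in the next state.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future
                                           - q[(state, action)])
            state = next_state
    return q

q_table = train()
greedy_policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)])
                 for s in range(N_STATES - 1)}
print(greedy_policy)
```

After training, the greedy policy moves right in every cell, which is the behaviour that maximizes the long-term reward in this toy setting.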
Evaluation Metrics
ACCURACY
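Accuracy is the fraction of predictions that are correct: (TP + TN) / (TP + TN + FP + FN). The sketch below computes it from confusion-matrix counts; the labels and predictions are made-up values for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(confusion_counts(y_true, y_pred))
print(accuracy(y_true, y_pred))

# On imbalanced data, a model that always predicts the majority class
# still scores well on accuracy while finding no positives at all.
always_negative = [0] * len(y_true)
print(accuracy(y_true, always_negative))
```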
QUESTION 1 OF 2
In the medical example, what is worse, a False
Positive, or a False Negative?
Answer: False Negative (a sick patient is told they are healthy)
QUESTION 2 OF 2
In the spam detector example, what is
worse, a False Positive, or a False
Negative?
Answer: False Positive (a legitimate email is sent to the spam folder)
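The two questions map onto precision and recall. When false negatives are the costlier error (the medical case), recall matters most; when false positives are costlier (the spam case), precision matters most. A small sketch with hypothetical counts:

```python
def precision(tp, fp):
    """Of everything predicted positive, the fraction that is truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Of everything truly positive, the fraction that was found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn = 60, 20, 40

p = precision(tp, fp)   # lowered by false positives
r = recall(tp, fn)      # lowered by false negatives
print(p, r)
```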
F1 SCORE
Harmonic Mean(X, Y) = $\frac{2XY}{X + Y}$
F1 score = Harmonic Mean(Precision, Recall)
[Figure: Harmonic Mean compared with Arithmetic Mean(Precision, Recall) on a scale between X and Y]
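A short sketch comparing the two means. The precision and recall values are invented to show how the harmonic mean is pulled toward the smaller of the two numbers.

```python
def arithmetic_mean(x, y):
    return (x + y) / 2

def harmonic_mean(x, y):
    """2XY / (X + Y); defined as 0.0 here when both inputs are 0."""
    return 2 * x * y / (x + y) if (x + y) else 0.0

def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return harmonic_mean(precision, recall)

# Hypothetical model with perfect precision but poor recall.
precision_value, recall_value = 1.0, 0.2

mean_value = arithmetic_mean(precision_value, recall_value)
f1_value = f1_score(precision_value, recall_value)
print(mean_value, f1_value)
```

The arithmetic mean still looks respectable, while the F1 score stays low because one of the two metrics is low.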
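F-BETA SCORE
[Figure: F-beta scale running from Precision to Recall; beta = 0.5 lies closer to Precision, beta = 1.0 gives the F1 Score, and beta = 2 lies closer to Recall]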
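The standard F-beta formula is (1 + beta^2) · Precision · Recall / (beta^2 · Precision + Recall). The sketch below evaluates it for a few beta values; the precision and recall numbers are hypothetical.

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more, beta > 1 weights recall more,
    and beta = 1 gives the F1 score.
    """
    b2 = beta ** 2
    denominator = b2 * precision + recall
    return (1 + b2) * precision * recall / denominator if denominator else 0.0

# Hypothetical values: low precision, perfect recall.
precision_value, recall_value = 0.5, 1.0

for beta in (0.0, 0.5, 1.0, 2.0):
    print(beta, f_beta(precision_value, recall_value, beta))
```

With beta = 0 the score equals precision, and as beta grows it moves toward recall.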
True Positive Rate (TPR) is a synonym for recall (sensitivity) and is therefore defined as follows:
$TPR = \frac{TP}{TP + FN}$
False Positive Rate (FPR), which equals 1 - specificity, is defined as follows:
$FPR = \frac{FP}{FP + TN}$
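A ROC curve plots TPR against FPR as the decision threshold varies. The sketch below computes a few such points; the labels, scores, and thresholds are made up for illustration.

```python
def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Hypothetical true labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# Each threshold yields one point on the ROC curve.
roc_points = [tpr_fpr(y_true, scores, t) for t in (1.0, 0.75, 0.5, 0.0)]
for point in roc_points:
    print(point)
```

Lowering the threshold can only keep or raise both rates, which is why the curve runs from (0, 0) to (1, 1).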
Q&A
Any Questions?