
05 - Machine Learning

The document provides an overview of machine learning approaches, including classical, reinforcement, and ensemble learning, as well as neural networks and deep learning. It discusses the differences between supervised and unsupervised learning, highlighting their applications, drawbacks, and evaluation methods. Additionally, it addresses issues related to data preparation, classification, prediction, and the challenges of overfitting and underfitting in model performance.
Copyright © All Rights Reserved

Introduction to data science

Overview of Machine Learning


Machine Learning Approaches

◼ Classical learning
◼ Reinforcement learning
◼ Ensemble learning
◼ Neural nets and deep learning
Machine Learning Approaches: Classical learning

◼ Supervised learning
◼ Unsupervised learning
◼ Semi-supervised learning
Machine Learning Approaches: Ensemble learning

◼ Boosting
◼ Bagging
◼ Stacking
Machine Learning Approaches: Reinforcement learning

◼ Genetic Algorithm (GA)
◼ Q-Learning
◼ …
Machine Learning Approaches: Neural nets (NN) and deep learning

◼ Back Propagation NN
◼ Feed forward NN
◼ Convolutional NN
◼ Recurrent NN
◼ …
Supervised vs. Unsupervised Learning

◼ Supervised learning (classification)
  ◼ Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation
  ◼ New data is classified based on the training set
◼ Unsupervised learning (clustering)
  ◼ The class labels of the training data are unknown
  ◼ Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
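A minimal sketch of the difference, using made-up 1-D exam scores (the data and all function names are hypothetical, not from the slides). The supervised half learns one centroid per labeled class and classifies a new value by the nearest centroid; the unsupervised half runs plain k-means on the same values with the labels removed.

```python
# Hypothetical 1-D data (exam scores); the labels exist only in the supervised case.
LABELED = [(35, "fail"), (42, "fail"), (48, "fail"), (71, "pass"), (78, "pass"), (85, "pass")]
UNLABELED = [x for x, _ in LABELED]


def fit_centroids(samples):
    """Supervised: average the feature value of each labeled class."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(centroids, x):
    """Assign x the label of the closest class centroid."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def kmeans_1d(xs, k, iterations=10):
    """Unsupervised: group unlabeled values into k clusters (Lloyd's k-means)."""
    step = max(1, len(xs) // k)
    centers = sorted(xs)[::step][:k]  # naive initialisation from spread-out values
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(centers[j] - x))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers, clusters


centroids = fit_centroids(LABELED)
print(classify(centroids, 50), classify(centroids, 70))
centers, clusters = kmeans_1d(UNLABELED, k=2)
print(clusters)  # groups are discovered, but they carry no class names
```

The clustering recovers the same two groups, but only the supervised model can name them "fail" and "pass".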
Supervised Learning: Classification vs. Prediction

◼ Classification
  ◼ Predicts categorical class labels (discrete or nominal)
  ◼ Constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data
◼ Prediction (regression)
  ◼ Models continuous-valued functions, i.e., predicts unknown or missing numeric values
◼ Typical applications
  ◼ Credit approval
  ◼ Target marketing
  ◼ Medical diagnosis
  ◼ Fraud detection
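To contrast the two tasks on the same hypothetical data (all names and values are illustrative assumptions), the sketch below uses a 1-nearest-neighbour rule to predict a categorical label and a least-squares line to predict a continuous value. Neither technique is prescribed by the slides; they are simply short examples of each task.

```python
# Hypothetical training data: (years of experience, salary in thousands, seniority label)
DATA = [(1, 30.0, "junior"), (3, 40.0, "junior"), (5, 50.0, "senior"), (7, 60.0, "senior")]


def classify_1nn(years):
    """Classification: return the categorical label of the nearest training sample."""
    nearest = min(DATA, key=lambda row: abs(row[0] - years))
    return nearest[2]


def fit_line():
    """Prediction (regression): least-squares line salary = a + b * years."""
    xs = [row[0] for row in DATA]
    ys = [row[1] for row in DATA]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b * mean_x, b


a, b = fit_line()
print(classify_1nn(2.5))  # a discrete class label
print(a + b * 4)          # a continuous value
```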
Supervised Learning: Drawbacks

◼ Supervised learning requires human expertise: expert annotators play an invaluable role in guiding a model's training, but they can be difficult to recruit.
◼ Supervised learning is labor-intensive: you need a team that is large enough, and has the relevant expertise, to label large datasets accurately.
◼ Supervised learning is time-intensive: beyond skilled annotators, you need the time to annotate the dataset accurately so that the model produces reliable predictions.
Classification: A Two-Step Process

◼ Model construction: describing a set of predetermined classes
  ◼ Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  ◼ The set of tuples used for model construction is the training set
  ◼ The model is represented as classification rules, decision trees, or mathematical formulae
◼ Model usage: classifying future or unknown objects
  ◼ Estimate the accuracy of the model
    ◼ The known label of each test sample is compared with the class predicted by the model
    ◼ Accuracy rate is the percentage of test set samples that are correctly classified by the model
    ◼ The test set must be independent of the training set; otherwise the accuracy estimate is optimistic and overfitting goes undetected
  ◼ If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
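The two steps can be sketched as follows, assuming a toy one-feature dataset and a hypothetical threshold learner (not an algorithm named in the slides). The split keeps the test set disjoint from the training set, and `accuracy_rate` computes the percentage of correctly classified test samples.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into disjoint training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def learn_threshold(training_set):
    """Step 1 (model construction): put a cut-off between the two classes seen in training."""
    largest_small = max(x for x, label in training_set if label == "small")
    smallest_big = min(x for x, label in training_set if label == "big")
    cut = (largest_small + smallest_big) / 2
    return lambda x: "big" if x > cut else "small"


def accuracy_rate(model, test_set):
    """Step 2 (model usage): percentage of test samples whose predicted label is correct."""
    correct = sum(1 for x, label in test_set if model(x) == label)
    return 100.0 * correct / len(test_set)


rows = [(x, "big" if x >= 5 else "small") for x in range(10)]
training_set, test_set = train_test_split(rows)
model = learn_threshold(training_set)
print(f"accuracy on held-out data: {accuracy_rate(model, test_set):.1f}%")
```

The exact accuracy depends on which rows the seeded shuffle places in the test set, which is why the model is evaluated only on rows it never saw during construction.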
Process (1): Model Construction

Flow: Training data → Classification algorithms → Classifier (model)

Training data:
NAME   RANK            YEARS  TENURED
Mike   Assistant Prof  3      no
Mary   Assistant Prof  7      yes
Bill   Professor       2      yes
Jim    Associate Prof  7      yes
Dave   Assistant Prof  6      no
Anne   Associate Prof  3      no

Classifier (model):
IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
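The classifier on this slide can be written directly as a function and checked against the six training tuples. This sketch only encodes the resulting rule; it does not implement the classification algorithm that produced it, and returning "no" for tuples that do not satisfy the IF condition is an assumption.

```python
# Training tuples transcribed from the slide: (name, rank, years, tenured)
TRAINING_DATA = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]


def tenured(rank, years):
    """Classifier from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# Check that the rule reproduces every class label in the training set.
mismatches = [name for name, rank, years, label in TRAINING_DATA if tenured(rank, years) != label]
print("misclassified training tuples:", mismatches)
```

Note that Dave (6 years) is correctly labeled "no" only because the rule uses a strict `years > 6`.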
Process (2): Using the Model in Prediction

Flow: Testing data → Classifier; Unseen data → Classifier → Tenured?

Testing data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) → Tenured?
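Applying the same rule to the testing data gives an accuracy estimate before the model is used on unseen data. The sketch below transcribes the four test tuples; worked by hand, Merlisa (Associate Prof, 7 years, not tenured) is the only tuple the rule gets wrong, so the estimate is 3/4 = 75%, and Jeff (Professor, 4 years) is classified as tenured. The "no" default branch is the same assumption as in model construction.

```python
# Testing tuples transcribed from the slide: (name, rank, years, tenured)
TEST_DATA = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def tenured(rank, years):
    """Classifier from Process (1): IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# Step 1: estimate accuracy on the independent test set.
correct = [tenured(rank, years) == label for _, rank, years, label in TEST_DATA]
accuracy = 100.0 * sum(correct) / len(correct)
print(f"test accuracy: {accuracy:.0f}%")  # Merlisa (7 years, not tenured) is misclassified

# Step 2: if the accuracy is acceptable, classify the unseen tuple.
print("Jeff tenured?", tenured("Professor", 4))
```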
Machine learning in data mining
Issues regarding classification and prediction
Issues: Data Preparation

◼ Data cleaning
  ◼ Preprocess data to reduce noise and handle missing values
◼ Relevance analysis (feature selection)
  ◼ Remove irrelevant or redundant attributes
◼ Data transformation
  ◼ Generalize and/or normalize data
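A minimal sketch of the three preparation steps on a hypothetical column-oriented table (a dict of lists). Mean imputation, dropping constant attributes and min-max scaling are common, simple choices assumed here for illustration; real feature selection usually needs more than a constant-column check.

```python
def impute_mean(values):
    """Data cleaning: replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def drop_constant_columns(table):
    """Relevance analysis (crude): drop attributes whose value never changes."""
    return {name: col for name, col in table.items() if len(set(col)) > 1}


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical table: "years" has one missing value, "dept" is the same for every row.
table = {"years": [3, 7, None, 7, 6, 3], "dept": ["CS"] * 6}
table = drop_constant_columns(table)
table["years"] = min_max_normalize(impute_mean(table["years"]))
print(table)
```

Here the missing value becomes the mean 5.2 before scaling, so it maps to 0.55 on the [0, 1] range.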
Issues: Evaluating Classification Methods

◼ Accuracy
  ◼ Classifier accuracy: how well the model predicts class labels
  ◼ Predictor accuracy: how well the model estimates the value of the predicted attribute
◼ Speed
  ◼ Time to construct the model (training time)
  ◼ Time to use the model (classification/prediction time)
◼ Robustness: handling noise and missing values
◼ Scalability: efficiency on large, disk-resident databases
◼ Interpretability
  ◼ Understanding and insight provided by the model
◼ Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules
Issues: Evaluating Classification Methods
Confusion matrix (rows: predicted class, columns: actual class):

                Actual +                              Actual –
Predicted +     True Positive (TP)                    False Positive (FP), Type I error
Predicted –     False Negative (FN), Type II error    True Negative (TN)
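The four cells of the matrix can be tallied directly from paired actual/predicted labels. This sketch assumes binary labels written as "+" and "-"; the function name and sample labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive="+"):
    """Tally the four cells of a binary confusion matrix as (TP, FP, FN, TN)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted +, actually -: Type I error
        elif a == positive:
            fn += 1  # predicted -, actually +: Type II error
        else:
            tn += 1
    return tp, fp, fn, tn


actual_labels = ["+", "+", "-", "-", "+", "-"]
predicted_labels = ["+", "-", "+", "-", "+", "-"]
print(confusion_counts(actual_labels, predicted_labels))
```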
Issues: Evaluating Classification Methods
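◼ Miss detection rate: $\frac{FN}{TP + FN}$, the fraction of actual positives that the classifier misses

◼ False alarm rate: $\frac{FP}{FP + TN}$, the fraction of actual negatives wrongly classified as positive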
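Assuming the standard definitions, the miss detection rate is FN / (TP + FN) (the Type II error rate) and the false alarm rate is FP / (FP + TN) (the Type I error rate). The counts in this sketch are hypothetical, not taken from the slides.

```python
def miss_detection_rate(tp, fn):
    """Share of actual positives that the classifier misses: FN / (TP + FN)."""
    return fn / (tp + fn)


def false_alarm_rate(fp, tn):
    """Share of actual negatives wrongly flagged as positive: FP / (FP + TN)."""
    return fp / (fp + tn)


# Hypothetical counts, not taken from the slides.
tp, fp, fn, tn = 40, 5, 10, 45
print(miss_detection_rate(tp, fn), false_alarm_rate(fp, tn))
```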


Issues: Evaluating Classification Methods

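Example: Given a confusion matrix, calculate accuracy, precision, recall and F1-score.

Accuracy = $\frac{TP + TN}{TP + TN + FP + FN}$
Precision = $\frac{TP}{TP + FP}$
Recall = $\frac{TP}{TP + FN}$
F1-Score = $\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$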
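The four metrics follow directly from the confusion-matrix counts using their standard definitions. The slide's actual matrix is not available in this text, so the counts below are hypothetical; the sketch also assumes TP + FP and TP + FN are non-zero (otherwise precision or recall is undefined).

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts: the slide's confusion-matrix values are not available here.
acc, prec, rec, f1 = classification_metrics(tp=40, fp=5, fn=10, tn=45)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

With these counts, F1 equals 2·TP / (2·TP + FP + FN) = 80/95, which is an equivalent way to compute it.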
Issues: Evaluating Regression Methods

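Mean Squared Error (MSE): $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

Mean Absolute Error (MAE): $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert$

Root Mean Square Error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples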
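A plain-Python sketch of the three error measures, following their standard definitions; the sample values are hypothetical, not the exercise data.

```python
import math


def mse(actual, predicted):
    """Mean squared error: average of (y_i - y_hat_i)^2."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the MSE, in the units of y."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical values, not the slide's exercise data.
y = [3.0, 5.0, 2.5, 7.0]
y_hat = [2.5, 5.0, 3.0, 8.0]
print(mse(y, y_hat), mae(y, y_hat), rmse(y, y_hat))
```

Because errors are squared, the single error of 1.0 contributes two thirds of the MSE here, while it contributes only half of the MAE.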


Issues: Evaluating Regression Methods

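Mean Absolute Percentage Error (MAPE): $\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert$

R² (R-squared): $R^2 = 1 - \frac{SSR}{SST}$, with $SSR = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ and $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the actual values; SSR is the sum of squared residuals and SST is the total sum of squares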
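The same hypothetical values can be used for MAPE and R². Note that MAPE divides by each actual value, so it is undefined when any $y_i$ is 0, and that this sketch follows the slide in calling the residual sum of squares SSR.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    return 100.0 / len(actual) * sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted))


def r_squared(actual, predicted):
    """R^2 = 1 - SSR / SST (SSR: sum of squared residuals, SST: total sum of squares)."""
    mean_y = sum(actual) / len(actual)
    ssr = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ssr / sst


y = [3.0, 5.0, 2.5, 7.0]      # hypothetical actual values
y_hat = [2.5, 5.0, 3.0, 8.0]  # hypothetical predictions
print(f"MAPE={mape(y, y_hat):.2f}%  R2={r_squared(y, y_hat):.4f}")
```

Here SSR = 1.5 and SST = 12.6875, so the predictions explain roughly 88% of the variance in the actual values.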
Issues: Evaluating Regression Methods

Calculate MSE, MAE, RMSE and R²


Issues: Overfitting and underfitting

▪ Underfitting happens when a model is too simple to capture the underlying patterns in the data
→ Poor performance on both the training and test sets
▪ Overfitting occurs when a model is too complex and memorizes the training data, including its noise
→ Good performance on the training set but poor performance on the test set
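Both effects can be reproduced with a tiny synthetic experiment. This sketch swaps in 1-D k-nearest-neighbour regression (not a model from the slides) because its single parameter k moves it between the two extremes: k = 1 memorizes every training point, while k = 10 averages all ten points and ignores the input. The data are synthetic, and comparing against noise-free test targets is an assumption made to keep the arithmetic exact.

```python
def knn_predict(train_x, train_y, x, k):
    """k-nearest-neighbour regression in 1-D: average the targets of the k closest points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return sum(train_y[i] for i in order[:k]) / k


def mse(model, xs, ys):
    """Mean squared error of model over the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: true relation y = 2x, training targets carry alternating +1/-1 noise.
train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]
test_x = [x + 0.25 for x in range(9)]
test_y = [2 * x for x in test_x]  # noise-free targets for the unseen points

results = {}
for k in (1, 3, 10):
    model = lambda x, k=k: knn_predict(train_x, train_y, x, k)
    results[k] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"k={k:2d}  train MSE={results[k][0]:6.3f}  test MSE={results[k][1]:6.3f}")
```

On this data, k = 1 reaches zero training error but a higher test error than k = 3 (overfitting), while k = 10 has a large error on both sets (underfitting).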
Other machine learning models

▪ Ensemble learning: