Ch01 ICS422 02
PREDICTIVE ANALYTICS
[3-0-0-3]
CLASS 02
Presented by
Dr. Selvi C
Assistant Professor
TYPES OF ML ALGORITHMS
🞆 Supervised Learning/Predictive Learning
🞆 Unsupervised Learning/ Descriptive Learning
🞆 Semi-Supervised Learning
🞆 Reinforcement Learning
SUPERVISED LEARNING
• The machine has a "supervisor" or a "teacher" who gives the machine all the answers, such as whether the picture shows an apple or an orange.
• The teacher has already divided (labeled) the data into oranges and apples, and the machine uses these examples to learn, one by one.
SUPERVISED LEARNING
SUPERVISED LEARNING
🞆 When an algorithm learns from example data and
associated target responses in order to later predict the
correct response when posed with new examples
SUPERVISED LEARNING
SUPERVISED LEARNING
House Rent Prediction
Sq. feet    Rent
100         1500
200         3000
[Chart: Square feet vs. House Rent]
Regression
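Below is a minimal regression sketch for this example, assuming Python with scikit-learn; the two (sq. feet, rent) pairs come from the table above, and the 300 sq. ft. query is only an illustrative new input.

```python
# Minimal regression sketch: learn rent from square footage.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[100], [200]])   # square feet (feature)
y = np.array([1500, 3000])     # rent (continuous target)

model = LinearRegression().fit(X, y)
print(model.predict([[300]]))  # predicted rent for an unseen house size
```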
SUPERVISED LEARNING
A. Regression
B. Classification
C. Classification
SUPERVISED LEARNING
A. Regression
B. Classification
C. Regression
D. Classification
E. Classification
SOLVE THIS…
UNSUPERVISED LEARNING
UNSUPERVISED LEARNING- CLUSTERING
UNSUPERVISED LEARNING
UNSUPERVISED LEARNING
Clustering
🞆 Detecting potentially useful clusters of input examples.
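A minimal clustering sketch, assuming Python with scikit-learn and a small made-up set of unlabeled 2-D points; k-means groups them with no target labels at all.

```python
# Minimal clustering sketch: k-means on unlabeled points.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])   # unlabeled observations

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster index assigned to each point
print(kmeans.cluster_centers_)   # the two discovered cluster centers
```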
UNSUPERVISED LEARNING -
DIMENSIONALITY REDUCTION
• Assembles specific low-level features into a smaller number of high-level ones
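A minimal dimensionality-reduction sketch, assuming Python with scikit-learn and randomly generated data; PCA compresses five original features into two higher-level components.

```python
# Minimal dimensionality-reduction sketch: PCA from 5 features down to 2.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # 100 samples, 5 original features

pca = PCA(n_components=2)               # keep 2 high-level components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # share of variance each component keeps
```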
UNSUPERVISED LEARNING -
ASSOCIATION RULE LEARNING
SOLVE THIS…
Supervised                               Unsupervised
Labelled data                            No labels
Direct feedback                          No feedback
Predict outcome                          Find hidden structure in data
Helps solve various types of             Unlabeled data is easier to get from a
real-world computation problems          computer than labeled data, which needs
                                         manual intervention
SEMI-SUPERVISED LEARNING
🞆 If some learning samples are labeled but some others are not, then it is semi-supervised learning.
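A minimal semi-supervised sketch, assuming Python with scikit-learn and made-up data; unlabeled samples are marked with -1 and LabelPropagation spreads the two known labels to them.

```python
# Minimal semi-supervised sketch: only two samples carry labels (-1 = unlabeled).
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]])
y = np.array([0, -1, -1, 1, -1, -1])     # most samples are unlabeled

model = LabelPropagation().fit(X, y)
print(model.transduction_)               # labels inferred for every sample
```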
REINFORCEMENT LEARNING
🞆 Accompany an example with positive or negative feedback
according to the
solution the algorithm proposes
🞆 Learning by trial and error: The system evaluates its
performance based on the feedback responses and reacts
accordingly
🞆 “how to act or behave when given occasional reward or
punishment signals”
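A minimal trial-and-error sketch, assuming a made-up two-armed bandit problem (not from the slides); an epsilon-greedy agent learns which arm pays off more using only occasional reward signals.

```python
# Minimal reinforcement-learning sketch: epsilon-greedy learning by trial and error.
import random

true_reward_prob = [0.3, 0.7]   # hidden reward probabilities of the two arms
value_estimate = [0.0, 0.0]     # the agent's learned value of each arm
counts = [0, 0]
epsilon = 0.1                    # exploration rate

for step in range(1000):
    if random.random() < epsilon:
        arm = random.randrange(2)                        # explore
    else:
        arm = value_estimate.index(max(value_estimate))  # exploit the best arm so far
    reward = 1 if random.random() < true_reward_prob[arm] else 0
    counts[arm] += 1
    # incremental average update of the chosen arm's estimated value
    value_estimate[arm] += (reward - value_estimate[arm]) / counts[arm]

print(value_estimate)   # should approach [0.3, 0.7]
```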
7 STEPS OF MACHINE LEARNING
Color      Shape             Apple or Orange?
Red        Round, conical    Apple
Orange     Round             Orange
STEP – 1 DATA COLLECTION
⮚ For collecting data on color, we may use a spectrometer and,
for the shape data, we may use pictures of the fruits so that
they can be treated as 2D figures.
⮚ For the purpose of collecting data, we would try to get as many different types of apples and oranges as possible in order to create diverse data sets for our features. For this purpose, we may try to search the markets for oranges and apples that may be from different parts of the world.
STEP – 2 DATA PREPARATION
• Load our data into a suitable place and prepare it for use in
our machine learning training
• Randomize the ordering of the data – this will improve the model
• Visualize your data – check for relevant relationships and for data imbalances
• Example: if there are more data points about apples than oranges, the model we train will be biased
• Split the data into two parts (a minimal split sketch follows this list):
• The first part, used for training our model, will be the majority of the dataset – the Train data
• The second part will be used for evaluating our trained model’s performance – the Test data
• Do not use the same data for training and testing
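A minimal split sketch, assuming Python with scikit-learn and placeholder X/y arrays; it randomizes the ordering and holds out 20% of the data as the Test set.

```python
# Minimal data-preparation sketch: shuffle and split into Train/Test.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # placeholder features
y = np.array([0, 1] * 5)           # placeholder labels

# randomize the ordering and keep 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42)

print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)
```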
STEP – 3 CHOOSING A MODEL
• Understand your constraints
STEP – 4 TRAINING
STEP – 7 PREDICTION / INFERENCE
Color: Red
Shape: Round and Conical
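A minimal prediction sketch, assuming Python with scikit-learn and a hypothetical 0/1 encoding of the color and shape attributes from step 1; the trained tree is then asked about a new fruit that is red, round and conical.

```python
# Minimal inference sketch: classify a new fruit from encoded color/shape features.
from sklearn.tree import DecisionTreeClassifier

# features: [is_red, is_round_and_conical]  (hypothetical encoding)
X_train = [[1, 1],   # red, round & conical  -> apple
           [0, 0]]   # orange-colored, round -> orange
y_train = ["Apple", "Orange"]

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict([[1, 1]]))   # -> ['Apple']
```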
ML TERMINOLOGIES
ML TERMINOLOGIES
🞆 Algorithm
A method, function, or series of instructions used to generate
a machine learning model. Examples include linear
regression, decision trees, support vector machines, and
neural networks.
🞆 Attribute
A quality describing an observation (e.g. color, size, weight).
In Excel terms, these are column headers.
🞆 Dimension
The number of features you have in your data.
ML TERMINOLOGIES
🞆 Training set
A set of observations used to generate machine learning models.
🞆 Test set
A set of observations used at the end of model training and
validation to assess the predictive power of your model. How
generalizable is your model to unseen data?
🞆 Validation set
A set of observations used during model training to provide
feedback on how well the current parameters generalize beyond
the training set. If training error decreases but validation error
increases, your model is likely overfitting and you should pause
training.
ML TERMINOLOGIES
🞆 Dataset Split
✔ First split the dataset into 2 — Train and Test
✔ Keep aside the Test set
✔ Randomly choose X% of the Train dataset to be the actual Train set and the remaining (100-X)% to be the Validation set
✔ The model is then iteratively trained and validated on these different sets (a minimal split sketch follows this list)
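A minimal sketch of this two-stage split, assuming Python with scikit-learn, placeholder data, and illustrative percentages (80/20 for Train/Test, then 75/25 for Train/Validation).

```python
# Minimal Train/Validation/Test split sketch.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)   # placeholder features
y = np.arange(100) % 2               # placeholder labels

# 1) set aside the Test set (20% here)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 2) carve a Validation set out of the remaining Train data (25% of it here)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 60 20 20
```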
ML TERMINOLOGIES
🞆 Dataset Split
Cross Validation: K Fold
The training set is split into k smaller sets
The following procedure is followed for each of the k “folds”:
🞆 A model is trained using k-1 of the folds as training data;
🞆 The resulting model is validated on the remaining fold
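A minimal k-fold sketch, assuming Python with scikit-learn and a placeholder training set; each fold takes a turn as the validation data while the other k-1 folds are used for training.

```python
# Minimal k-fold cross-validation sketch.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # placeholder training set
kf = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    print(f"fold {fold}: train on {train_idx}, validate on {val_idx}")
```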
ML TERMINOLOGIES
🞆 Parameters
Parameters are properties of training data learned by training
a machine learning model or classifier. They are adjusted
using optimization algorithms and unique to each experiment.
Examples of parameters include:
o weights in an artificial neural network
o support vectors in a support vector machine
o coefficients in a linear or logistic regression
ML TERMINOLOGIES
🞆 Overfitting
Overfitting occurs when your model learns the training data
too well and incorporates details and noise specific to your
dataset. You can tell a model is overfitting when it performs
great on your training/validation set, but poorly on your test
set (or new real-world data).
🞆 Underfitting
The counterpart of overfitting; it happens when a machine learning model is not complex enough to accurately capture the relationships between a dataset’s features and its target variables.
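A minimal illustration of both effects, assuming Python with NumPy and made-up noisy samples of a sine curve: a degree-1 fit underfits (high error everywhere), while a degree-9 fit typically overfits (near-zero training error but a much larger test error).

```python
# Minimal under/overfitting sketch: polynomial fits of increasing degree.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0, 1, 50)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 50)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```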
ML TERMINOLOGIES
🞆 Bias
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the model. It always leads to high error on training and test data.
🞆 Variance
Variance is the variability of model prediction for a given data point, or a value which tells us the spread of our data. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.
THANK YOU