
ICS422 APPLIED PREDICTIVE ANALYTICS
[3-0-0-3]

CLASS 02

Presented by
Dr. Selvi C
Assistant Professor
TYPES OF ML ALGORITHMS
🞆 Supervised Learning/Predictive Learning
🞆 Unsupervised Learning/Descriptive Learning
🞆 Semi-Supervised Learning
🞆 Reinforcement Learning

SUPERVISED LEARNING
• The machine has a "supervisor" or a "teacher" who gives the machine all the answers, like whether it's an apple or an orange in the picture.
• The teacher has already divided (labeled) the data into apples and oranges, and the machine uses these examples to learn, one by one.

SUPERVISED LEARNING
🞆 An algorithm learns from example data and associated target responses in order to later predict the correct response when posed with new examples.

🞆 Labeled data: Data consisting of a set of training examples, where each example is a pair consisting of an input and a desired output value (also called the supervisory signal, label, etc.)
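As an illustration that is not part of the original slides, labeled data can be written down as a small list of (input, label) pairs; the feature names and values below are made up for the apple-vs-orange example.

```python
# A minimal, hypothetical labeled dataset for the apple-vs-orange example.
# Each training example is a pair: (input features, desired output label).
labeled_data = [
    ({"color": "red",    "shape": "round-conical"}, "apple"),
    ({"color": "orange", "shape": "round"},         "orange"),
    ({"color": "red",    "shape": "round-conical"}, "apple"),
]

for features, label in labeled_data:
    print(features, "->", label)  # the label is the supervisory signal
```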
SUPERVISED LEARNING

🞆 Classification: Attempts to find the appropriate class label, such as analyzing positive/negative sentiment, male and female persons, benign and malignant tumors, secure and insecure loans, etc.
  Maps input variables onto discrete categories.

🞆 Regression: Predicts a continuous-valued response, for example predicting real estate prices.
  Maps input variables onto a continuous function.
SUPERVISED LEARNING
House Rent Prediction
Sq. feet    Rent
100         1500
200         3000
500         5000
800         6000
1200        12500
2000        18000
700         ?

[Chart: scatter plot of square feet vs house rent]

Regression
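A minimal sketch of how the table above could be fit with simple linear regression to estimate the missing rent for 700 sq. feet. This assumes scikit-learn and NumPy are available and is purely illustrative, not part of the original slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Square footage (input) and rent (continuous target) from the table above.
X = np.array([[100], [200], [500], [800], [1200], [2000]])
y = np.array([1500, 3000, 5000, 6000, 12500, 18000])

model = LinearRegression().fit(X, y)

# Predict the unknown rent for a 700 sq. feet house.
predicted_rent = model.predict([[700]])[0]
print(f"Predicted rent for 700 sq. feet: {predicted_rent:.0f}")
```

Fitting a straight line to the six known points and then calling predict on a new input is exactly the regression setting the slide illustrates.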
SUPERVISED LEARNING

A. Given a picture of a male/female, predict his/her age on the basis of the given picture.
B. Given a picture of a male/female, predict whether he/she is of high school, college, or graduate age.
C. Banks have to decide whether or not to give a loan to someone on the basis of his credit history.

A. Regression
B. Classification
C. Classification
SUPERVISED LEARNING

A. Predicting the results of a game.
B. Predicting whether a tumour is malignant or benign.
C. Predicting prices in domains like real estate, stocks, etc.
D. Classifying an email as spam or not.
E. Face recognition

A. Regression
B. Classification
C. Regression
D. Classification
E. Classification
SOLVE THIS…

You’re running a company, and you want to develop learning algorithms to address each of two problems.

Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.

Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?

A. Treat both as classification problems.
B. Treat problem 1 as a classification problem, problem 2 as a regression problem.
C. Treat problem 1 as a regression problem, problem 2 as a classification problem.
D. Treat both as regression problems.
SUPERVISED LEARNING - ALGORITHMS
🞆 k-Nearest Neighbours
🞆 Decision Trees
🞆 Naive Bayes
🞆 Logistic Regression
🞆 Linear Regression
🞆 Support Vector Machines

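As a quick, hedged illustration of one of the listed algorithms, here is a k-Nearest Neighbours classifier on a made-up fruit dataset (assuming scikit-learn is installed); the numeric feature encoding is invented for illustration only.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy features: [weight in grams, colour score (0 = orange, 1 = red)].
X_train = [[150, 0.9], [170, 0.8], [140, 0.1], [130, 0.2]]
y_train = ["apple", "apple", "orange", "orange"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Classify a new, unseen fruit from its features.
print(knn.predict([[160, 0.85]]))  # expected: ['apple']
```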
UNSUPERVISED LEARNING

UNSUPERVISED LEARNING - CLUSTERING

Divides objects based on unknown features; the machine chooses the best way to group them.
UNSUPERVISED LEARNING

🞆 Learns from the data – no labels
🞆 Discovers interesting structures in the data – knowledge discovery
🞆 Does not require a human expert to label the data

In unsupervised learning, there is no instructor or teacher, and the algorithm must learn to make sense of the data without this guide.
UNSUPERVISED LEARNING
Clustering
🞆 Detecting potentially useful clusters of input examples. For example, a taxi agent might gradually develop a concept of “good traffic days” and “bad traffic days” without ever being given labeled examples of each by a teacher.

— Pages 694-695, Artificial Intelligence: A Modern Approach, 3rd edition, 2015.
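A minimal sketch of clustering with k-means, assuming scikit-learn; the two-dimensional points are made up and simply stand in for unlabeled input examples.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled 2-D points; no teacher provides class labels.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [8.0, 8.2], [8.1, 7.9], [7.9, 8.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster index chosen by the machine for each point
print(kmeans.cluster_centers_)  # the discovered group centres
```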
UNSUPERVISED LEARNING - DIMENSIONALITY REDUCTION
• Assembles specific features into higher-level ones
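As an illustrative sketch (not from the slides), PCA is one way to assemble several correlated features into fewer higher-level components; the data below are randomly generated and scikit-learn/NumPy are assumed.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 100 samples with 5 correlated features (3 are linear mixes of the first 2).
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3))])

# Reduce the 5 original features to 2 higher-level components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)  # (100, 5) -> (100, 2)
print(pca.explained_variance_ratio_)   # how much structure each component keeps
```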
UNSUPERVISED LEARNING - ASSOCIATION RULE LEARNING

• "Look for patterns in the stream of orders"
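A plain-Python sketch of this idea: counting which pairs of items are frequently bought together in a stream of orders. The item names and the support threshold are invented for illustration.

```python
from itertools import combinations
from collections import Counter

# Hypothetical stream of customer orders (market baskets).
orders = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

# Count how often each pair of items appears in the same order.
pair_counts = Counter()
for order in orders:
    for pair in combinations(sorted(order), 2):
        pair_counts[pair] += 1

# Pairs co-occurring in at least 3 of the 5 orders are candidate patterns.
for pair, count in pair_counts.most_common():
    if count >= 3:
        print(pair, "support =", count / len(orders))
```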
SOLVE THIS…

Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)

A. Given email labeled as spam/not spam, learn a spam filter.
B. Given a set of news articles found on the web, group them into sets of articles about the same stories.
C. Given a database of customer data, automatically discover market segments and group customers into different market segments.
D. Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.
SUPERVISED VS UNSUPERVISED

Supervised                                Unsupervised
Labelled data                             No labels
Direct feedback                           No feedback
Predict outcome                           Find hidden structure in data
Helps you to solve various types of       It is easier to get unlabeled data from a
real-world computation problems           computer than labeled data, which needs
                                          manual intervention
SEMI-SUPERVISED LEARNING
🞆 If some learning samples are labeled but others are not, it is semi-supervised learning.

🞆 It makes use of a large amount of unlabeled data together with a small amount of labeled data for training.

❖ Trained upon a combination of labeled and unlabeled data
❖ First, cluster similar data using an unsupervised learning algorithm
❖ Then use the existing labeled data to label the rest of the unlabeled data (a sketch of this recipe follows below)

Semi-supervised learning is applied in cases where it is expensive to acquire a fully labeled dataset, while it is more practical to label a small subset.
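A rough sketch of the cluster-then-label recipe described above: cluster all points with an unsupervised algorithm, then propagate the few known labels to the unlabeled points in each cluster. The data, labels, and majority-vote rule are assumptions for illustration only; scikit-learn is assumed.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

# Six points; only two of them are labeled (None = unlabeled).
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
              [8.0, 8.0], [7.9, 8.1], [8.2, 7.8]])
labels = ["apple", None, None, "orange", None, None]

# Step 1: cluster similar points without using the labels.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: give every unlabeled point the label seen most often in its cluster.
cluster_votes = {c: Counter() for c in set(clusters)}
for cluster, label in zip(clusters, labels):
    if label is not None:
        cluster_votes[cluster][label] += 1

pseudo_labels = [
    label if label is not None else cluster_votes[cluster].most_common(1)[0][0]
    for cluster, label in zip(clusters, labels)
]
print(pseudo_labels)
```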
REINFORCEMENT LEARNING
🞆 Each example is accompanied by positive or negative feedback according to the solution the algorithm proposes
🞆 Learning by trial and error: the system evaluates its performance based on the feedback responses and reacts accordingly (see the sketch below)
🞆 "How to act or behave when given occasional reward or punishment signals"
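A tiny trial-and-error sketch in the spirit of reinforcement learning: an epsilon-greedy agent that adjusts its action preferences from occasional reward signals. The two-armed bandit environment and its reward probabilities are invented for illustration.

```python
import random

random.seed(0)

# Hidden reward probabilities of two possible actions (unknown to the agent).
true_reward_prob = [0.3, 0.7]

value_estimates = [0.0, 0.0]   # the agent's current guess of each action's value
counts = [0, 0]
epsilon = 0.1                  # how often the agent explores at random

for step in range(1000):
    # Trial: mostly exploit the best-looking action, sometimes explore.
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: value_estimates[a])

    # Feedback: the environment returns a reward (1) or punishment (0).
    reward = 1 if random.random() < true_reward_prob[action] else 0

    # React: nudge the estimate for the chosen action towards the reward.
    counts[action] += 1
    value_estimates[action] += (reward - value_estimates[action]) / counts[action]

print(value_estimates)  # should end up close to [0.3, 0.7]
```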
7 STEPS OF MACHINE LEARNING

Problem: Classification of Oranges and Apples.


STEP – 1 GATHERING DATA
• The first real step of machine learning is gathering data
• Quality and quantity of data matter

Color       Shape             Apple or Orange?
Red         Round, conical    Apple
Orange      Round             Orange

Data Collection
⮚ For collecting data on color, we may use a spectrometer and, for the shape data, we may use pictures of the fruits so that they can be treated as 2D figures.
⮚ To collect the data, we would try to get as many different types of apples and oranges as possible in order to create diverse data sets for our features. For this purpose, we may try to search the markets for oranges and apples that may be from different parts of the world.
STEP – 2 DATA PREPARATION

• Load our data into a suitable place and prepare it for use in our machine learning training
• Randomize the ordering – this will improve the model
• Visualize your data – look for relevant relationships and data imbalances
  • Example: with more data points about apples than oranges, the model we train will be biased
• Split the data into two parts (see the sketch below):
  • The first part, used for training our model, will be the majority of the dataset – train data
  • The second part will be used for evaluating our trained model's performance – test data
• Do not use the same data for training and testing
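A short sketch of the randomize-and-split step using scikit-learn's train_test_split; the toy arrays below are placeholders for the real fruit features and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix (10 fruits, 2 features) and labels.
X = np.arange(20).reshape(10, 2)
y = np.array(["apple"] * 5 + ["orange"] * 5)

# shuffle=True randomizes the ordering; 20% of the data is held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)

print(len(X_train), "training examples,", len(X_test), "test examples")
```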
STEP – 3 CHOOSING A MODEL

• Categorize the problem
  • Categorize by input
  • Categorize by output
• Understand your constraints
• Find the available algorithms
STEP – 4 TRAINING

• Use our data to incrementally improve our model's ability by adjusting:
  • Weights
  • Biases
• Each iteration or cycle of updating the weights and biases is called one training "step" (a minimal sketch follows below)
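A minimal sketch of what one training "step" can look like for a single weight and bias, using plain gradient descent on a tiny made-up dataset. This is only an illustration of the idea, not the slides' actual training procedure.

```python
# Toy data: y is roughly 2*x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

weight, bias = 0.0, 0.0   # model: prediction = weight * x + bias
learning_rate = 0.05

for step in range(2000):
    # Gradients of the mean squared error with respect to weight and bias.
    grad_w = sum(2 * (weight * x + bias - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (weight * x + bias - y) for x, y in zip(xs, ys)) / len(xs)

    # One training step: move the weight and bias against the gradient.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b

print(f"weight = {weight:.2f}, bias = {bias:.2f}")  # roughly recovers y = 2x + 1
```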
STEP – 5 AND 6

Step 5: Evaluation (a short sketch of computing these metrics follows below)
• Classification accuracy
• Logarithmic loss
• Confusion matrix
• Area under curve
• F1 score
• Mean absolute error
• Mean squared error

Step 6: Parameter tuning
• Further improvement to training
• Many times we run through the training dataset
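A brief sketch of computing a few of the listed evaluation metrics with scikit-learn; the true and predicted labels/values below are made up.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical classification results (1 = apple, 0 = orange).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))

# Hypothetical regression results (e.g. predicted rents).
r_true = [1500, 3000, 5000]
r_pred = [1600, 2800, 5200]

print("MAE:", mean_absolute_error(r_true, r_pred))
print("MSE:", mean_squared_error(r_true, r_pred))
```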
STEP – 7 PREDICTION / INFERENCE

Color: Red
Shape: Round and conical
ML TERMINOLOGIES

🞆 Algorithm
A method, function, or series of instructions used to generate a machine learning model. Examples include linear regression, decision trees, support vector machines, and neural networks.

🞆 Attribute
A quality describing an observation (e.g. color, size, weight). In Excel terms, these are column headers.

🞆 Dimension
The number of features you have in your data.
ML TERMINOLOGIES
🞆 Training set
A set of observations used to generate machine learning models.

🞆 Test set
A set of observations used at the end of model training and
validation to assess the predictive power of your model. How
generalizable is your model to unseen data?

🞆 Validation set
A set of observations used during model training to provide
feedback on how well the current parameters generalize beyond
the training set. If training error decreases but validation error
increases, your model is likely overfitting and you should pause
training.

ML TERMINOLOGIES
🞆 Dataset Split
✔ First split the dataset into two – train and test
✔ Keep aside the test set
✔ Randomly choose X% of the train dataset to be the actual train set and the remaining (100-X)% to be the validation set
✔ The model is then iteratively trained and validated on these different sets
ML TERMINOLOGIES
🞆 Dataset Split
Cross Validation: K-Fold (a short sketch follows below)
The training set is split into k smaller sets, and the following procedure is followed for each of the k "folds":
🞆 A model is trained using k-1 of the folds as training data;
🞆 The resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).
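A compact sketch of the k-fold procedure above using scikit-learn's KFold with a logistic regression model; the toy dataset is generated only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic binary target

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kfold.split(X):
    # Train on k-1 folds ...
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    # ... and validate on the remaining fold.
    scores.append(model.score(X[val_idx], y[val_idx]))

print("Fold accuracies:", [round(s, 2) for s in scores])
print("Mean accuracy:", round(sum(scores) / len(scores), 2))
```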
ML TERMINOLOGIES
🞆 Parameters
Parameters are properties of training data learned by training
a machine learning model or classifier. They are adjusted
using optimization algorithms and unique to each experiment.
Examples of parameters include:
o weights in an artificial neural network
o support vectors in a support vector machine
o coefficients in a linear or logistic regression

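As a brief illustration (assuming scikit-learn), the learned parameters of a fitted model can be inspected directly, e.g. the coefficient and intercept of a linear regression trained on a small made-up dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])        # exactly y = 2x + 1

model = LinearRegression().fit(X, y)

# These learned parameters are what training has adjusted.
print("Coefficient (weight):", model.coef_)   # approximately [2.]
print("Intercept (bias):", model.intercept_)  # approximately 1.0
```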
ML TERMINOLOGIES
🞆 Overfitting
Overfitting occurs when your model learns the training data
too well and incorporates details and noise specific to your
dataset. You can tell a model is overfitting when it performs
great on your training/validation set, but poorly on your test
set (or new real-world data).

🞆 Underfitting
The counterpart of overfitting; it happens when a machine learning model is not complex enough to accurately capture relationships between a dataset's features and the target variable.
ML TERMINOLOGIES

🞆 Bias
Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the model, which leads to high error on training and test data.
🞆 Variance
Variance is the variability of model prediction for a given data point, or a value that tells us the spread of our data. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.
THANK YOU
