ML Intro

The document outlines the fundamentals of machine learning, including definitions of learning and the data science process. It distinguishes between supervised, unsupervised, and semi-supervised learning, and discusses the stages of machine learning such as training and testing. Additionally, it provides examples related to cancer diagnosis to illustrate the application of machine learning algorithms in classification tasks.

Uploaded by

rtzvdpsw2x
Copyright
© All Rights Reserved

Machine Learning

Data Science Process


Learning?
• Herbert Simon: “Learning is any process by which a system improves performance from experience.”
• There are two ways that a system can improve:
1. By acquiring new knowledge (e.g. acquiring new facts)
2. By adapting its behavior (e.g. solving problems more accurately)
• How can a machine learn from data?
Main types of Machine Learning
• Supervised learning (with a teacher): uses a series of labeled examples with direct feedback
• Unsupervised learning, e.g. clustering (without a teacher): no feedback
• Semi-supervised learning: in between supervised and unsupervised learning (some data is labeled but most of it is unlabeled)
Supervised vs Unsupervised
• How many groups do we have in this figure?
• Can we apply supervised learning?
• What will you get if you apply unsupervised learning?
What do you think now?
Supervised vs Unsupervised
• Can you separate this data into two groups?
Supervised vs Unsupervised vs Semi-supervised
Example
• We have a dataset with two columns x1 and x2
X1 X2
1 2
5 3
… …
• We plot the data in two-dimensional space as follows
• Q1) Can you divide the data into two groups?
• Try to separate the points based on the distance between the data points
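One way to make the distance-based grouping in Q1 concrete is a two-cluster k-means loop. This is a minimal sketch, not the method used on the slides: only the points (1, 2) and (5, 3) come from the table, the other four points are made up for illustration, and the function name `two_means` and the choice of starting centers are assumptions.

```python
import math

# (1, 2) and (5, 3) are from the slide table; the other points are hypothetical.
points = [(1, 2), (1.5, 1.5), (2, 1), (5, 3), (5.5, 2.5), (6, 4)]

def two_means(points, iterations=10):
    """Split points into two groups by repeatedly assigning each point
    to its nearest center and moving each center to its group's mean."""
    centers = [points[0], points[-1]]  # simple deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = min((0, 1), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups, centers

groups, centers = two_means(points)
print(groups[0])
print(groups[1])
```

No labels are used anywhere in this loop, which is what makes it unsupervised: the groups come only from the distances between points.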

X1 X2
1 2
5 3
... ...
Q2) If we give you the labels (a new column which provides the class of each row), can you draw a line that separates the two classes?

X1 X2 X3 (Label)
1 2 normal (blue)
5 3 abnormal (red)
... ... ...
Examples of ML Algorithms
Usual ML stages
• Hypothesis, data
• Training or learning (requires examples/data)
• Testing or generalization
Training
• Training is the acquisition of knowledge, skills, and competencies as a result of the teaching of practical skills and knowledge that relate to specific useful competencies (Wikipedia)
• Training requires scenarios or examples (data)
• In machine learning we learn from the available data or examples

Training: The figure shows how the separating line is updated over several training steps
[Figure panels: initial random line; the line after one training step; training is complete]
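The idea of starting from a random line and updating it step by step can be sketched with the classic perceptron rule. The slides do not say which algorithm produced the figure, so the perceptron is an assumption here. The data points are also hypothetical, apart from (1, 2) and (5, 3), which appear in the earlier example table.

```python
import random

# Hypothetical, linearly separable data: -1 = normal (blue), +1 = abnormal (red).
points = [(1, 2), (2, 1), (1.5, 1.5), (5, 3), (6, 4), (5.5, 2.5)]
labels = [-1, -1, -1, 1, 1, 1]

def classify(w1, w2, b, x1, x2):
    """Return +1 or -1 depending on which side of the line w1*x1 + w2*x2 + b = 0 the point lies."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1

def train_perceptron(points, labels, epochs=1000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Initial random line.
    w1, w2, b = (rng.uniform(-1, 1) for _ in range(3))
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if classify(w1, w2, b, x1, x2) != y:
                # One training step: nudge the line toward the misclassified point.
                w1 += lr * y * x1
                w2 += lr * y * x2
                b += lr * y
                mistakes += 1
        if mistakes == 0:
            break  # Training is complete: every training point is on the correct side.
    return w1, w2, b

w1, w2, b = train_perceptron(points, labels)
predictions = [classify(w1, w2, b, x1, x2) for x1, x2 in points]
print(predictions == labels)
```

The update rule only terminates with zero mistakes when a separating line exists, which holds for this toy data.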
Testing
• How well does the learned system work?
• Generalization
• Performance on unseen or unknown scenarios or data

• Which model performs the best?


Types of testing
• Evaluate performance by testing on data NOT used for training (both should be randomly sampled)
• Cross-validation methods for small data sets
• The more (relevant) data the better.
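The two testing strategies above can be sketched in plain Python. This is an illustrative sketch only: the helper names `train_test_split` and `k_fold_indices`, the 25% test fraction, and the choice of 5 folds are assumptions, not values taken from the slides.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Randomly hold out a fraction of the rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs; each row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx

rows = list(range(103))  # stand-in for 103 data rows
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))

fold_sizes = [len(test_idx) for _, test_idx in k_fold_indices(len(rows))]
print(fold_sizes)
```

With a single hold-out split, the score depends on which rows happen to land in the test set. Cross-validation averages over k different splits, which is why it is preferred for small data sets.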
Defining the Learning Task
Improve on task, T, with respect to performance metric, P, based on experience, E.

T: Recognizing handwritten words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver

T: Categorizing email messages as spam or legitimate
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels
Suppose that we are done with EDA and the data is ready for modelling. What is next?
Cancer diagnosis
This is our data (103 rows × 5 columns)
Patient ID # of Tumors Avg Area Avg Density Diagnosis
1 5 20 118 M
2 3 15 130 B
3 7 10 52 B
4 2 30 100 M
... ... ... ... ...
100 3 19 100 M
101 4 16 95 M
102 9 22 125 B
103 1 14 80 M
Recall ML stages
Supervised Learning Classification

Training Set

• Use this training set to learn how to classify patients whose diagnosis is not known:

Test Set
Patient ID # of Tumors Avg Area Avg Density Diagnosis
101 4 16 95 ?
102 9 22 125 ?
103 1 14 80 ?

Input Data: the feature columns
Will be predicted by our model: the Diagnosis column
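The train-then-predict workflow can be sketched on the rows that the slides actually show. This is only a sketch under stated assumptions: a 1-nearest-neighbour rule stands in for the slides' model, which is not specified; the patient ID is treated as an identifier rather than an input feature; and patients 5 to 99 are elided in the source, so they are left out here as well.

```python
# Rows shown on the slides: (patient_id, n_tumors, avg_area, avg_density, diagnosis).
training_set = [
    (1, 5, 20, 118, "M"),
    (2, 3, 15, 130, "B"),
    (3, 7, 10, 52, "B"),
    (4, 2, 30, 100, "M"),
    (100, 3, 19, 100, "M"),
]
# Test rows: the diagnosis is unknown to the model.
test_set = [
    (101, 4, 16, 95),
    (102, 9, 22, 125),
    (103, 1, 14, 80),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_1nn(features, training_rows):
    """Return the diagnosis of the training row closest to the given features."""
    nearest = min(training_rows, key=lambda row: squared_distance(features, row[1:4]))
    return nearest[4]

predictions = {row[0]: predict_1nn(row[1:], training_set) for row in test_set}
print(predictions)
```

On these few rows the sketch labels all three test patients M. The features are used unscaled, so Avg Density dominates the distance; a real pipeline would normally rescale the columns first.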
Breast Cancer Diagnosis Linear Separation

Line produced by our model to separate the two classes

The plot of the training data in 2D, where red represents M cases and blue represents B cases
Predict the test data

The gray circles represent the test set

• The model predicts the test data as follows:

Patient ID # of Tumors Avg Area Avg Density Diagnosis (predicted by the model)
101 4 16 95 M
102 9 22 125 M
103 1 14 80 M

Actual diagnosis (from the data table above): 101 M, 102 B, 103 M

• How good is our model?
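One simple answer is the percentage of test cases correctly classified, the metric P used earlier for the classification tasks. The sketch below compares the model's predictions for patients 101 to 103 with the diagnoses listed for the same patients in the full data table. The function name `accuracy` is an illustrative choice.

```python
# Predictions from the slide, and the actual diagnoses from the full data table.
predicted = {101: "M", 102: "M", 103: "M"}
actual = {101: "M", 102: "B", 103: "M"}

def accuracy(predicted, actual):
    """Fraction of patients whose predicted diagnosis matches the actual one."""
    correct = sum(1 for pid in actual if predicted[pid] == actual[pid])
    return correct / len(actual)

score = accuracy(predicted, actual)
print(f"{score:.0%} of test patients correctly classified")
```

Two of the three test patients are classified correctly; patient 102 is predicted M but is actually B. With only three test rows the estimate is very coarse, which is one reason more (relevant) test data helps.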


Examples of ML Algorithms
