L 1 Intro Machine Learning

Introduction to machine learning

Uploaded by

Prateek Singh

Unit Coverage

Machine Learning Basics


Machine Learning Applications
Supervised and Unsupervised learning
Gradient Descent For learning
Copyright @ gdeepak.com 1 6/4/2014 6:53 PM
Machine Learning
Machine learning, a branch of artificial intelligence,
concerns the construction and study of systems that can
learn from data. -Wikipedia
It is a type of AI that provides computers with the ability to
learn without being explicitly programmed. Machine
learning focuses on the development of computer programs
that can teach themselves to grow and change when
exposed to new data.
Machine Learning-Abstract Definition
A machine (computer program) is said to learn from experience E with respect to some task T and performance measure P if its performance at T, as measured by P, improves with experience E.
Example: Someone writes a program to classify and filter your emails as spam or not spam, based on how you mark individual emails. What are T, E and P in this example?
- Your marking of an email as spam or not
- Percentage of emails that are true positives for spam
- Recording of your labelling or classification of emails as spam or not
Characteristics of Machine Learning
- The way learning happens is critical
- Applies to tasks that cannot be defined well except by examples
- Finds relationships and correlations that may be hidden in the data
- Learns in proportion to experience, e.g. becoming a better player after playing many games
- Results may vary widely with different learning paths or different ML algorithms
Two Basic Types
Supervised Learning
In general, this means that we supervise the learning mechanism by supplying guidelines, parameters or labels for the data.
In supervised learning, training patterns that give the inputs and the corresponding correct outputs are available.
Unsupervised Learning
Learning happens automatically, and the structures hidden in the data are recognised by the system.
The system must find interesting and/or significant patterns in the data without any feedback as to what is right.
Grading Example
[Figure: grades (A, B, C, D) plotted against marks]
Handwriting Recognition
[Figure: the letter S written by different people]
Supervised Learning
For each data point, we tell the machine whether it is a star or a smiley.
Classification or regression
When we have discrete outputs then the problem is a
classification problem
When we have continuous outputs then it is a regression
problem
Traffic Time Prediction
You give 10 actual travel times from Ambala to Delhi for departures at different times of the day, starting at 9 A.M.
Now you want to predict the travel time for some other departure time.
You may use different curve fits.
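The curve fitting idea can be sketched as below. The ten departure hours and travel times are made-up numbers for illustration only, and the use of NumPy polynomial fits (a straight line and a parabola) is an assumption about what "different curve fits" might mean here.

```python
import numpy as np

# Hypothetical departure hours (24-hour clock) and travel times in hours.
# These numbers are made up for illustration; they are not real measurements.
start_hour = np.array([9, 10, 11, 12, 13, 14, 15, 16, 17, 18], dtype=float)
travel_hours = np.array([4.5, 4.2, 4.0, 3.9, 3.8, 3.9, 4.1, 4.4, 4.8, 5.0])

# Centre the inputs so the polynomial fits stay numerically well behaved.
centre = start_hour.mean()

def fit_curve(degree):
    """Least-squares polynomial of the given degree through the ten timings."""
    return np.poly1d(np.polyfit(start_hour - centre, travel_hours, degree))

line = fit_curve(1)       # straight-line fit
parabola = fit_curve(2)   # quadratic fit

# Predict the travel time for a departure time that is not in the data.
query_hour = 12.5
pred_line = line(query_hour - centre)
pred_parabola = parabola(query_hour - centre)
print(f"linear fit: {pred_line:.2f} h, quadratic fit: {pred_parabola:.2f} h")
```

Different curve shapes give different predictions for the same departure time, which is why the choice of fit matters.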
Speech Recognition
Database of requests: a user speaks something, and you need to identify the request
You need to capture the individual recordings from a meeting of four people
You need to separate the conversation from background noise or music
You need to understand the speech in a particular language and do the text labelling
You need to convert the speech into some other language
Medical Diagnosis
Given a symptom and disease database: a new patient comes with some symptoms, and you need to identify the disease
To diagnose the disease from test results and by analysing images from medical equipment
To recognise a disability by looking at a photograph of the person
Different machine learning tests for various disabilities, e.g. a hearing test
Unsupervised Learning
Each data point is given but not labelled; the machine is supposed to find some structure in the data, which in this example takes the form of clusters.
News.google.com
All similar stories are clustered in one place
Participant Segmentation
If I give all the registration form data of IWMLDA to an unsupervised learning program, it comes out with groupings based on the distinguishing features given in the registration form.
Social Network Analysis
To find groups of a certain kind on the network based on their activity, cohesiveness, type of chatter, type of likes, etc.
Sentiment Analysis
You want to buy a product, and you want to know the sentiments of people who have previously used or bought that product.
Different kinds of sentiment are expressed online; they may relate to sports, politics, tragedy, or agitation/revolution.
Semi-Supervised Learning
Semi-supervised learning is a class of supervised learning
tasks and techniques that also make use of unlabeled data
for training - typically a small amount of labeled data with a
large amount of unlabeled data.
In practice, the choice depends on the type and size of the data available.
Training Set-Old car price example
Mileage    Car Price
2000       300000
20000      200000
18000      220000
100000     100000
50000      150000
80000      130000
10000      250000
Some Terminology
m = number of training records
x = input features / input values of the variables (can be more than one)
y = output value (can be more than one)
(x, y) is one training record
$(x^{(i)}, y^{(i)})$ is the i-th training example
$\theta_i$ is the parameter of feature $x_i$
What are $y^{(4)}$ and $x^{(2)}$ on the previous slide?
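As a small illustration of this notation, the old car price training set can be held as a list of (x, y) pairs. The helper name `training_example` is hypothetical, and the sketch assumes the superscript index counts records from 1, top to bottom.

```python
# Training set from the old car price slide: (mileage, car price) pairs.
training_set = [
    (2000, 300000),
    (20000, 200000),
    (18000, 220000),
    (100000, 100000),
    (50000, 150000),
    (80000, 130000),
    (10000, 250000),
]

m = len(training_set)  # number of training records

def training_example(i):
    """Return (x^(i), y^(i)); records are assumed to count from 1, Python lists from 0."""
    return training_set[i - 1]

x_2, _ = training_example(2)   # x^(2): mileage of the second record
_, y_4 = training_example(4)   # y^(4): price of the fourth record
print(m, x_2, y_4)
```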
General Model
[Figure: training records feed a learning algorithm, which outputs a hypothesis h; h maps mileage (x) to car price (y)]
Hypothesis Parameters
The only point to keep in mind while selecting $\theta_i$ is that the hypothesis should give values as close as possible to y in the training records.
Cost Function using Squared Error Function
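A minimal sketch of a squared error cost, under two assumptions that are not stated on the slides: a linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$, and the common $\frac{1}{2m}$ scaling, so that $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$. The function names and the two parameter guesses are illustrative.

```python
import numpy as np

# Training records from the old car price example.
mileage = np.array([2000, 20000, 18000, 100000, 50000, 80000, 10000], dtype=float)
price = np.array([300000, 200000, 220000, 100000, 150000, 130000, 250000], dtype=float)

def hypothesis(theta0, theta1, x):
    """Assumed linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def squared_error_cost(theta0, theta1, x, y):
    """J = (1 / 2m) * sum of squared differences between h(x) and y."""
    m = len(x)
    residuals = hypothesis(theta0, theta1, x) - y
    return float(np.sum(residuals ** 2) / (2 * m))

# Two hand-picked parameter guesses (illustrative, not fitted values).
flat_guess = squared_error_cost(200000.0, 0.0, mileage, price)
sloped_guess = squared_error_cost(270000.0, -1.9, mileage, price)
print(flat_guess, sloped_guess)
```

The guess with the lower cost is the one whose predictions sit closer to the prices in the training records.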
Contour Plots
[Figure: contour figure]
Another Example
[Figure: another contour example]
Gradient Descent
To minimize the cost function:
$\min_{\theta_0, \theta_1, \theta_2, \ldots, \theta_n} J(\theta_0, \theta_1, \theta_2, \ldots, \theta_n)$
Start with some initial values of $\theta_0, \theta_1, \theta_2, \ldots, \theta_n$
Keep applying gradient descent until we reach a minimum of the cost function, which may be the global optimum or only a local one.
Different Shape Bowls
For convergence
Repeat until convergence: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1, \ldots, \theta_n)$, for $j = 0, \ldots, n$
where $\alpha$ is the learning rate. The learning rate determines how slowly or quickly gradient descent converges, and there is always a trade-off. With a small learning rate, the algorithm may take many iterations and will be slow; with a large learning rate, it may be fast, but it may overshoot the local or global minimum and fail to converge at all.
All values of $\theta_j$, for j from 0 to n, should be updated simultaneously.
[Figure: concept of learning rate on a bowl-shaped curve]
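The learning rate trade-off can be seen on a simple bowl. This sketch assumes a one-parameter cost $J(\theta) = \theta^2$, whose derivative is $2\theta$; the function name, starting point and the three learning rates are illustrative choices.

```python
def gradient_descent_1d(alpha, theta=10.0, steps=20):
    """Run gradient descent on the bowl J(theta) = theta**2 (dJ/dtheta = 2*theta)."""
    history = [theta]
    for _ in range(steps):
        theta = theta - alpha * 2 * theta   # theta := theta - alpha * dJ/dtheta
        history.append(theta)
    return history

small = gradient_descent_1d(alpha=0.01)   # slow but steady progress
medium = gradient_descent_1d(alpha=0.4)   # quick convergence
large = gradient_descent_1d(alpha=1.1)    # overshoots and moves away

for name, run in [("small", small), ("medium", medium), ("large", large)]:
    print(f"{name:>6}: |theta| after {len(run) - 1} steps = {abs(run[-1]):.4g}")
```

On this bowl, each step multiplies theta by (1 - 2 * alpha), so any alpha above 1 makes the distance from the minimum grow instead of shrink.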
Gradient Descent with linear regression
We repeat the gradient descent update, with J taken as the squared error cost of the linear hypothesis, until it converges for all values of $\theta_j$.
If each step of gradient descent uses all the training records, the algorithm falls under the category of batch gradient descent.
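A minimal sketch of batch gradient descent for one feature. It assumes the linear hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and a $\frac{1}{2m}$ squared error cost, which give the update $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$ with $x_0^{(i)} = 1$. The rescaling of the data, the learning rate and the iteration count are illustrative assumptions.

```python
import numpy as np

# Old car price training set, rescaled to units of 100000 so that a single
# learning rate suits both parameters (the rescaling is an assumption).
x = np.array([2000, 20000, 18000, 100000, 50000, 80000, 10000]) / 1e5
y = np.array([300000, 200000, 220000, 100000, 150000, 130000, 250000]) / 1e5

def batch_gradient_descent(x, y, alpha=0.5, iterations=5000):
    """Fit h(x) = theta0 + theta1 * x using every record in each step."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y          # h(x^(i)) - y^(i) for all i
        grad0 = error.sum() / m                    # dJ/dtheta0
        grad1 = (error * x).sum() / m              # dJ/dtheta1
        # Simultaneous update: both gradients were computed from the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

theta0, theta1 = batch_gradient_descent(x, y)
print(f"h(x) = {theta0:.3f} + {theta1:.3f} * x  (in units of 100000)")
```

Because every step sums over all m records, this is the batch variant; the tuple assignment keeps the two parameter updates simultaneous.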
Dealing with multiple variables
Mileage    Car Price    Engine Size (No. of Cylinders)    Original Price    Accessory Cost
2000       300000       4                                 500000            40000
20000      200000       6                                 600000            3000
18000      220000       4                                 450000            100000
100000     100000       4                                 400000            50000
50000      150000       8                                 800000            110000
Feature scaling
Since the range of values of raw data varies widely, in some machine learning algorithms the objective functions will not work properly without normalization. For example, many classifiers calculate the distance between two points by the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by this particular feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.
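To see how one wide-ranging feature governs the distance, this sketch compares two cars from the multiple-variable table using only mileage and number of cylinders. The choice of these two features and the helper name `euclidean` are illustrative.

```python
import math

# (mileage, number of cylinders) for the first two cars in the
# multiple-variable table.
car_a = (2000, 4)
car_b = (20000, 6)

def euclidean(p, q):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d_ab = euclidean(car_a, car_b)
mileage_gap = abs(car_a[0] - car_b[0])

# The cylinder difference (2) barely moves the distance away from the
# mileage difference (18000), so mileage alone decides how far apart
# the two cars appear.
print(d_ab, mileage_gap, d_ab - mileage_gap)
```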
How to do feature scaling

$x' = \frac{x - \mu}{s}$
where $\mu$ is the mean of the training values of that feature and s is the range (max - min) of that feature's training values. We try to get every feature into approximately the $-1 \le x' \le 1$ range. However, if the feature ranges do not differ too much, we may decide not to apply feature scaling.
Feature Scaling Example
Accessory cost average: 303000 / 5 = 60600
Max - Min = 110000 - 3000 = 107000
Mileage    Car Price    Engine Size (No. of Cylinders)    Accessory Cost    After Scaling
2000       300000       4                                 40000             -0.19
20000      200000       6                                 3000              -0.54
18000      220000       4                                 100000            +0.37
100000     100000       4                                 50000             -0.10
50000      150000       8                                 110000            +0.46
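The scaling of the accessory cost column can be reproduced with a few lines. The function name `mean_normalize` is hypothetical; the formula is the (value - mean) / (max - min) rule used in this example.

```python
# Accessory cost column from the feature scaling example.
accessory_cost = [40000, 3000, 100000, 50000, 110000]

def mean_normalize(values):
    """Scale each value as (x - mean) / (max - min)."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    return [(v - mean) / value_range for v in values]

scaled = mean_normalize(accessory_cost)
print([round(s, 2) for s in scaled])
```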
Combining Features
Some features carry the same information but are given in different units, e.g. height in cm and height in inches. Similarly, some features have parallel values, e.g. the length of a string and the number of characters in the string.
Other Important Points Regarding Convergence of Gradient Descent
For a sufficiently small learning rate, $J(\theta)$ should decrease on every iteration of the algorithm.
A learning rate that is too small or too large has its own issues, as discussed before.
The number of iterations needed may vary from two digits to many digits.
If $J(\theta)$ decreases by less than 0.001 in one iteration, we can declare convergence, because any further change will be too small to matter.
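The convergence test can be sketched as a loop that stops once the cost drops by less than the 0.001 threshold. The function name, the one-parameter bowl $J(\theta) = (\theta - 3)^2$ and the learning rate of 0.1 are assumptions chosen for illustration.

```python
def run_until_converged(cost, step, theta, tolerance=1e-3, max_iterations=100000):
    """Apply `step` repeatedly; stop once the cost drops by less than `tolerance`."""
    previous = cost(theta)
    for iteration in range(1, max_iterations + 1):
        theta = step(theta)
        current = cost(theta)
        if current > previous:
            raise ValueError("cost increased; the learning rate is probably too large")
        if previous - current < tolerance:
            return theta, iteration
        previous = current
    return theta, max_iterations

# Illustrative one-parameter bowl J(theta) = (theta - 3)**2 with learning rate 0.1.
alpha = 0.1

def cost(t):
    return (t - 3.0) ** 2

def step(t):
    return t - alpha * 2.0 * (t - 3.0)   # theta := theta - alpha * dJ/dtheta

theta, iterations = run_until_converged(cost, step, theta=0.0)
print(theta, iterations)
```

Raising an error when the cost goes up reflects the first point on this slide: with a small enough learning rate, the cost should never increase.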
Question
Does the learning rate remain the same, or does it change over time? In either case, why?
Question
Should the test sample (green circle) be classified into the first class (blue squares) or the second class (red triangles) using the k-NN technique?
- If k = 3 (solid line circle)
- If k = 5 (dashed line circle)
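A minimal k-NN sketch shows how the answer can change with k. The coordinates below are made up and are not read off the figure on the slide; the function name `knn_classify` is hypothetical.

```python
from collections import Counter
import math

# Made-up labelled points; they are not read off the figure on the slide.
training_points = [
    ((1.0, 0.5), "triangle"),
    ((-0.8, 0.9), "triangle"),
    ((0.2, -1.0), "square"),
    ((1.8, -1.2), "square"),
    ((-1.5, -1.6), "square"),
    ((3.0, 3.0), "triangle"),
    ((-3.5, 2.0), "square"),
]

def knn_classify(query, points, k):
    """Label `query` by majority vote among its k nearest points."""
    by_distance = sorted(points, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

test_sample = (0.0, 0.0)
label_k3 = knn_classify(test_sample, training_points, k=3)
label_k5 = knn_classify(test_sample, training_points, k=5)
print(label_k3, label_k5)
```

With two classes, an odd k avoids tied votes.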
Question
What will be your criteria to decide whether to use feature
scaling or not?
Questions, Suggestions and Comments