Unit Coverage
Machine Learning Basics
Machine Learning Applications
Supervised and Unsupervised Learning
Gradient Descent for Learning

Copyright @ gdeepak.com, 6/4/2014 6:53 PM

Machine Learning
"Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data." (Wikipedia)
It is a type of AI that gives computers the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.

Machine Learning: Abstract Definition
A machine (computer program) learns from experience E with respect to some task T and performance measure P if its performance on T, as measured by P, improves as E increases.
Example: someone writes a program that classifies and filters your emails as spam or not spam, based on how you mark individual emails.
What are T, E and P in this example? Options:
 - Your marking of an email as spam or not
 - The percentage of emails correctly identified as spam (true positives)
 - The recording of your labelling or classification of emails as spam or not

Characteristics of Machine Learning
 - The way learning happens is critical
 - It applies to tasks that cannot be defined well except by examples
 - It finds relationships and correlations that may be hidden in the data
 - It learns in proportion to experience, e.g. becoming a better player after playing many games
 - Results may vary widely if we apply different learning paths or different ML algorithms

Two Basic Types

Supervised Learning
In general, this means that we supervise the learning mechanism by supplying guidelines, parameters or labels for the data. In supervised learning, training patterns that give the inputs and the corresponding correct outputs are available.

Unsupervised Learning
Learning happens automatically, and the system recognises the structures hidden in the data. The system must find interesting and/or significant patterns in the data without any feedback as to what is right.

Grading Example
[Figure: grades A, B, C, D plotted against marks]

Handwriting Recognition
[Figure: the letter S written by different people]

Supervised Learning
For each data point, we let the machine know whether it is a star or a smiley.

Classification or Regression
When the outputs are discrete, the problem is a classification problem. When the outputs are continuous, it is a regression problem.

Traffic Time Prediction
You give 10 actual travel times from Ambala to Delhi, recorded for different starting times of the day beginning at 9 A.M. Now you want to predict the travel time for some other starting time. You may use different curve fittings.

Speech Recognition
 - Given a database of requests, the user speaks something and you need to identify the request
 - You need to capture the individual recordings from a meeting of four people
 - You need to separate the conversation from background noise or music
 - You need to understand speech in a particular language and label it as text
 - You need to convert the speech into some other language

Medical Diagnosis
 - Given a symptom and disease database, a new patient with some symptoms comes in and you need to identify the disease
 - Diagnose the disease from test results and by analysing images from medical equipment
 - Recognise disability by looking at a photograph of the person
 - Different machine learning tests for various disabilities,
e.g. a hearing test

Unsupervised Learning
Each data point is given but not labelled; the machine is supposed to find some structure in the data, called clusters in this example.
News.google.com: all similar stories are clustered in one place.

Participant Segmentation
If I give all the registration form data of IWMLDA to some unsupervised-learning-based program, it comes out with some grouping based on the distinguishing features given in the registration form.

Social Network Analysis
Find groups of a certain kind on the network based on their activity, cohesiveness, type of chatter, type of likes, etc.

Sentiment Analysis
You want to buy a product and want to know the sentiments of the people who have previously used or bought it. Different kinds of sentiment are expressed online; they may relate to sports, politics, tragedy, or agitation/revolution.

Semi-Supervised Learning
Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. In practice, the choice depends on the type and size of the data available.

Training Set: Old Car Price Example

Mileage    Car Price
2000       300000
20000      200000
18000      220000
100000     100000
50000      150000
80000      130000
10000      250000

Some Terminology
m = number of training records
x = input features / input values of the variables (can be more than one)
y = output value (can be more than one)
(x, y) is one training record
$(x^{(i)}, y^{(i)})$ is the i-th training record
$\theta_i$ is the parameter of feature x
What are $y^{(4)}$ and $x^{(2)}$ on the previous slide?

General Model
[Figure: training records, learning algorithm, hypothesis (h) and hypothesis parameters; input mileage (x), output car price (y)]
The only point to keep in mind while selecting $\theta_i$ is that the hypothesis should give a value as close as possible to y for each training record.

Cost Function using Squared Error Function

Contour Plots
[Figure: Contour Figure]

Another Example

Gradient Descent
To minimize the cost function: $\min_{\theta_0, \theta_1, \ldots, \theta_n} J(\theta_0, \theta_1, \ldots, \theta_n)$
 - Start with some initial values of $\theta_0, \theta_1, \ldots, \theta_n$
 - Keep applying gradient descent until we reach the minimum possible value, which may be the optimal value of the cost function

Different Shape Bowls

For Convergence
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1, \ldots, \theta_n)$
where $\alpha$ is the learning rate. The learning rate governs how slowly or quickly gradient descent converges, and there is always a trade-off. With a small learning rate, the algorithm may take many iterations and be slow. With a large learning rate, it may be fast, but it may overshoot the local or global minimum and fail to converge at all.
All $\theta_j$, for j from 0 to n, should be updated simultaneously.
Concept of learning rate on a bowl-shaped curve

Gradient Descent with Linear Regression
We repeat the gradient descent update for all values of $\theta_j$ until the cost function converges.
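
As a rough sketch of that repeated update for one-feature linear regression: the slide's own formulas were not preserved, so the code below assumes the common hypothesis h(x) = theta0 + theta1 * x and the squared-error cost J = (1/2m) * sum((h(x_i) - y_i)^2). The function names, the learning rate, the iteration count and the rescaling of both columns to units of 100,000 are illustrative choices, not values from the slides. The data is the old car price training set.

```python
def cost(xs, ys, theta0, theta1):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2) (assumed form)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def batch_gradient_descent(xs, ys, alpha=0.5, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x using every training record per step."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # arbitrary initial values
    for _ in range(iterations):
        # Prediction errors for all m records (this is what makes it "batch").
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old thetas.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Old car price training set from the slides.
mileage = [2000, 20000, 18000, 100000, 50000, 80000, 10000]
price = [300000, 200000, 220000, 100000, 150000, 130000, 250000]

# Assumption: rescale both columns to units of 100,000 so that a moderate
# learning rate converges; on the raw values this alpha would diverge.
xs = [v / 100000 for v in mileage]
ys = [v / 100000 for v in price]

theta0, theta1 = batch_gradient_descent(xs, ys)
print("theta0 =", round(theta0, 3), "theta1 =", round(theta1, 3))
print("final cost =", round(cost(xs, ys, theta0, theta1), 4))
```

The fitted slope comes out negative, which matches the table: price falls as mileage rises. Trying a much larger `alpha` is a quick way to see the non-convergence the slides warn about.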
If each step of gradient descent uses all the training records, the algorithm falls under the category of batch gradient descent.

Dealing with Multiple Variables

Mileage    Car Price    Engine Size (No. of Cylinders)    Original Price    Accessory Cost
2000       300000       4                                 500000            40000
20000      200000       6                                 600000            3000
18000      220000       4                                 450000            100000
100000     100000       4                                 400000            50000
50000      150000       8                                 800000            110000

Feature Scaling
Since the range of values of raw data varies widely, the objective functions of some machine learning algorithms will not work properly without normalization. For example, many classifiers measure how far apart two points are using the Euclidean distance. If one feature has a broad range of values, the distance will be governed by that feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.

How to Do Feature Scaling
$x' = \frac{x - \mu}{s}$
where $\mu$ is the mean of the training values of that feature and s is the range (max - min) of that feature's training values. We try to get every feature into a similar range; with this formula, every scaled value lies between -1 and 1. However, if the feature ranges do not differ too much, we may decide not to apply feature scaling.

Feature Scaling Example
For the Accessory Cost feature: average = 303000 / 5 = 60600, max - min = 110000 - 3000 = 107000.

Mileage    Car Price    Engine Size (No. of Cylinders)    Accessory Cost    Accessory Cost After Scaling
2000       300000       4                                 40000             -0.19
20000      200000       6                                 3000              -0.54
18000      220000       4                                 100000            +0.37
100000     100000       4                                 50000             -0.10
50000      150000       8                                 110000            +0.46

Combining Features
A few features may carry the same values in different units, e.g. height in cm and height in inches. Similarly, a few features have parallel values, e.g. the length of a string and the number of characters in the string.

Other Important Points Regarding Convergence of Gradient Descent
 - For a sufficiently small learning rate, $J(\theta)$ should decrease on every iteration of the algorithm
 - A learning rate that is too small or too large has its own issues, as discussed before
 - The number of iterations needed may vary from two digits to many digits
 - If $J(\theta)$ decreases by less than 0.001 in one iteration, we can declare convergence, because further changes will be too small to matter

Question
Does the learning rate remain the same, or does it change over time? If yes, why? If no, why?

Question
[Figure: k-NN example with a green circle test sample, blue squares and red triangles; a solid-line circle for k = 3 and a dashed-line circle for k = 5]
Using the k-NN technique, should the test sample (green circle) be classified into the first class of blue squares or the second class of red triangles
 - if k = 3 (solid-line circle)?
 - if k = 5 (dashed-line circle)?

Question
What will be your criteria for deciding whether or not to use feature scaling?

Questions, Suggestions and Comments
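
The Feature Scaling Example can be checked with a few lines of Python. This is a minimal sketch of the mean-and-range scaling described in "How to Do Feature Scaling", applied to the Accessory Cost column; the function name `mean_normalize` is made up for illustration and the guard against a zero range is an added assumption.

```python
def mean_normalize(values):
    """Scale values as (x - mean) / (max - min), per the slide's description."""
    mu = sum(values) / len(values)
    s = max(values) - min(values)
    if s == 0:
        # Assumption: a constant feature carries no information to scale.
        raise ValueError("feature has zero range; scaling is undefined")
    return [(v - mu) / s for v in values]


# Accessory Cost column from the multi-variable car table.
accessory_cost = [40000, 3000, 100000, 50000, 110000]

scaled = mean_normalize(accessory_cost)
print([round(v, 2) for v in scaled])  # [-0.19, -0.54, 0.37, -0.1, 0.46]
```

Because the divisor is the full range, every scaled value is guaranteed to fall between -1 and 1, and the scaled column always averages to zero.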