01 and 02: Introduction, Regression Analysis, and Gradient Descent

Introduction to the course

- We'll learn about
  - The state of the art
  - How to do the implementation
- Applications of machine learning include
  - Search
  - Photo tagging
  - Spam filters
- The AI dream is building machines as intelligent as humans
  - Many people believe the best way to do that is to mimic how humans learn
- What the course covers
  - State-of-the-art algorithms
  - But the algorithms and math alone are no good
  - You need to know how to get these to work on real problems

Why is ML so prevalent?

- Grew out of AI
- Goal: build intelligent machines
  - You can program a machine to do some simple thing
  - For the most part, hard-wiring AI is too difficult
  - The best way to do it is to have some way for machines to learn things themselves
    - A mechanism for learning - if a machine can learn from input, then it does the hard work for you

Examples

- Database mining
  - Machine learning has recently become so big partly because of the huge amount of data being generated
  - Large datasets come from the growth of automation and the web
  - Sources of data include
    - Web data (click-stream or click-through data)
      - Mined to understand users better
      - A huge segment of Silicon Valley
    - Medical records
      - Electronic records -> turn records into knowledge
    - Biological data
      - Gene sequences; ML algorithms give a better understanding of the human genome
    - Engineering info
      - Data from sensors, log reports, photos, etc.
- Applications that we cannot program by hand
  - Autonomous helicopter
  - Handwriting recognition
    - This is very inexpensive because when you write an envelope, algorithms can automatically route it through the postal system
  - Natural language processing (NLP)
    - AI pertaining to language
  - Computer vision
    - AI pertaining to vision
- Self-customizing programs
  - Netflix
  - Amazon
  - iTunes Genius
  - Take user info and learn from your behavior
- Understanding human learning and the brain
  - If we can build systems that mimic (or try to mimic) how the brain works, this may push forward our own understanding of the associated neurobiology

What is machine learning?

- Here we
  - Define what it is
  - Decide when to use it
- There is no single well-defined definition
  - Here are a couple of examples of how people have tried to define it
- Arthur Samuel (1959)
  - Machine learning: "Field of study that gives computers the ability to learn without being explicitly programmed"
  - Samuel wrote a checkers-playing program
    - Had the program play 10,000 games against itself
    - Worked out which board positions were good and bad depending on wins/losses
- Tom Mitchell (1998)
  - Well-posed learning problem: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
  - The checkers example
    - E = 10,000 games played against itself
    - T is playing checkers
    - P is whether you win or not
- Several types of learning algorithms
  - Supervised learning
    - Teach the computer how to do something, then let it use its new-found knowledge to do it
  - Unsupervised learning
    - Let the computer learn how to do something, and use this to determine structure and patterns in data
  - Reinforcement learning
  - Recommender systems
- This course
  - Looks at practical advice for applying learning algorithms
  - Learning a set of tools and how to apply them

Supervised learning - introduction

- Probably the most common problem type in machine learning
- Starting with an example
  - How do we predict housing prices?
    - Collect data regarding housing prices and how they relate to size in square feet
  - [Plot: housing price prediction - price ($) in 1000s vs. size in feet², showing scattered training points]
- Example problem: "Given this data, a friend has a house of 750 square feet - how much can they be expected to get?"
- What approaches can we use to solve this? (A short Octave sketch of this example appears at the end of this section, after the classification examples.)
  - Straight line through the data
    - Maybe $150,000
  - Second-order polynomial
    - Maybe $200,000
  - One thing we discuss later - how to choose between a straight or curved line
  - Each of these approaches represents a way of doing supervised learning
- What does this mean?
  - We gave the algorithm a data set where a "right answer" was provided
    - So we know the actual prices for the houses in the training set
    - The idea is that we can learn what makes the price a certain value from the training data
  - The algorithm should then produce more right answers for new data where we don't know the price already
    - i.e. predict the price
- We also call this a regression problem
  - Predict a continuous-valued output (price)
  - No real discrete delineation
- Another example
  - Can we classify a breast tumor as malignant or benign based on tumor size?
  - [Plot: malignant? (1 = yes, 0 = no) vs. tumor size]
- Looking at the data
  - Five examples of each class
  - Can you estimate the prognosis based on tumor size?
  - This is an example of a classification problem
    - Classify data into one of two discrete classes - no in between, either malignant or not
    - In classification problems, the output can take on a discrete number of possible values
      - e.g. maybe there are four values
        - 0 - benign
        - 1 - type 1
        - 2 - type 2
        - 3 - type 3
- In classification problems we can plot the data in a different way
  - [Plot: the same data on a single tumor-size axis, with the two classes marked differently]
- Here we use only one attribute (size)
  - In other problems we may have multiple attributes
  - We may also, for example, know age as well as tumor size
  - [Plot: age vs. tumor size, with malignant and benign cases marked differently]
- Based on that data, you can try to define separate classes by
  - Drawing a straight line between the two groups
  - Using a more complex function to define the two groups (which we'll discuss later)
  - Then, when you have an individual with a specific tumor size and a specific age, you can hopefully use that information to place them into one of your classes
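To make the housing-price example above concrete, here is a minimal Octave sketch. The data points are made up to roughly follow the scatter plot (they are not the course's dataset), and it simply fits the two candidate hypotheses mentioned above - a straight line and a second-order polynomial - then predicts a price for the friend's 750 square-foot house under each.

```octave
% Made-up training examples roughly following the scatter plot above
% (sizes in units of 1000 feet^2, prices in $1000s) - illustrative only
sizes  = [0.6; 1.0; 1.25; 1.4; 1.7; 2.0; 2.4];
prices = [135; 190;  248; 260; 335; 395; 470];

% Two candidate hypotheses from the notes: a straight line and a quadratic
p1 = polyfit(sizes, prices, 1);
p2 = polyfit(sizes, prices, 2);

% Predict the price for the friend's 750 square-foot (0.75) house under each model
new_size = 0.75;
fprintf('Straight line: about $%.0fk\n', polyval(p1, new_size));
fprintf('Quadratic:     about $%.0fk\n', polyval(p2, new_size));
```

The exact numbers will differ from the rough $150,000 / $200,000 estimates in the lecture because the data here is invented; the point is the workflow - fit a hypothesis to labeled ("right answer") training data, then query it for a new input.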
- You might have many features to consider, e.g.
  - Clump thickness
  - Uniformity of cell size
  - Uniformity of cell shape
- The most exciting algorithms can deal with an infinite number of features
  - How do you deal with an infinite number of features?
  - There is a neat mathematical trick in the support vector machine (which we discuss later)
    - Even if you have an infinitely long list of features, we can develop an algorithm to deal with that
- Summary
  - In supervised learning, the training data comes with the "right" answers
  - Regression problems
  - Classification problems

Unsupervised learning - introduction

- Second major problem type
- In unsupervised learning, we get unlabeled data
  - Just told - here is a data set, can you find structure in it?
- One way of doing this is to cluster the data into groups
  - This is a clustering algorithm

Clustering algorithm

- Examples of clustering algorithms
  - Google News
    - Groups news stories into cohesive groups
  - Used in many other problems as well
    - Genomics
      - Microarray data
        - Have a group of individuals
        - For each, measure the expression of a gene
        - Run the algorithm to cluster individuals into types of people
    - Organize computer clusters
      - Identify potential weak spots or distribute workload effectively
    - Social network analysis
    - Customer data
    - Astronomical data analysis
  - The algorithms give amazing results
- Basically
  - Can you automatically generate structure from the data?
  - Because we don't give it the answer, it's unsupervised learning

Cocktail party algorithm

- Cocktail party problem
  - Lots of overlapping voices - hard to hear what everyone is saying
    - Two people talking
    - Microphones at different distances from the speakers
- Each microphone records a slightly different version of the conversation depending on where it is
  - But the recordings overlap nonetheless
- Take the recordings of the conversation from each microphone
  - Give them to a cocktail party algorithm
  - The algorithm processes the audio recordings
    - Determines there are two audio sources
    - Separates out the two sources
- Is this a very complicated problem?
  - The algorithm can be done with one line of code!
    - [W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
  - Not easy to come up with
    - But programs can be short!
- Using Octave (or MATLAB) for examples
  - Often prototype algorithms in Octave/MATLAB to test, as it gives much faster, more agile development
  - Only once you've shown it works do you migrate it to C++
- Understanding this algorithm
  - svd - a linear algebra routine which is built into Octave
    - In C++ this would be very complicated!
  - This shows that using Octave/MATLAB to prototype is a really good way to work

Linear Regression

- Housing price data example used earlier
  - A supervised learning regression problem
- What do we start with?
  - Training set (this is your data set)
  - Notation (used throughout the course; a short Octave illustration of it appears just below)
    - m = number of training examples
    - x's = input variables / features
    - y's = output variable, the "target" variable
    - (x, y) - a single training example
    - (x^(i), y^(i)) - the i-th training example
      - i is an index into the training set

    Size in feet² (x) | Price ($) in 1000s (y)
    ------------------|-----------------------
    2104              | 460
    1416              | 232
    1534              | 315
    852               | 178
    ...               | ...

- With our training set defined - how do we use it?
  - Take the training set
  - Pass it into a learning algorithm
  - The algorithm outputs a function, denoted h (h = hypothesis)
    - This function takes an input (e.g. the size of a new house)
    - It tries to output the estimated value of y
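As a minimal illustration of the notation above, here is how the training set from the table could be held in Octave (the variable names are my own, not from the course):

```octave
% Training set from the table above: size in feet^2 (x) and price in $1000s (y)
x = [2104; 1416; 1534; 852];   % input variable / feature
y = [ 460;  232;  315;  178];  % output / "target" variable
m = length(y);                 % m = number of training examples (here 4)

% (x^(i), y^(i)) is the i-th training example, e.g. the 2nd one:
i = 2;
fprintf('x^(%d) = %d, y^(%d) = %d\n', i, x(i), i, y(i));
```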
- How do we represent the hypothesis h?
  - We write it as
    - hθ(x) = θ0 + θ1·x
    - h(x) is shorthand for hθ(x)
- What does this mean?
  - It means y is modeled as a linear function of x
  - The θi are the parameters
    - θ0 is the zero condition (the intercept)
    - θ1 is the gradient (the slope)
- This kind of function is linear regression with one variable
  - Also called univariate linear regression
- So in summary, a hypothesis
  - Takes in some variable x
  - Uses parameters determined by a learning system
  - Outputs a prediction based on that input

Linear regression - implementation (cost function)

- A cost function lets us figure out how to fit the best straight line to our data
- Choosing values for the θi (the parameters)
  - Different values give you different functions
  - If θ0 is 1.5 and θ1 is 0 then we get a straight line parallel with the x axis at 1.5
  - If θ1 is > 0 then we get a positive slope
- Based on our training set we want to generate parameters which make the straight line fit the data
  - Choose these parameters so hθ(x) is close to y for our training examples
    - Basically, use the x's in the training set with hθ(x) to give an output which is as close to the actual y value as possible
    - Think of hθ(x) as a "y imitator" - it tries to convert the x into y, and since we already have y we can evaluate how well hθ(x) does this
- To formalize this
  - We want to solve a minimization problem
  - Minimize the squared difference (hθ(x) - y)²
    - i.e. minimize the difference between hθ(x) and y for each/any/every example
  - Sum this over the training set:
    - minimize over θ0, θ1:  (1/2m) · Σ_{i=1..m} (hθ(x^(i)) - y^(i))²
  - Minimize the squared difference between the predicted house price and the actual house price
    - The 1/m means we take the average
    - The extra 1/2 makes the math a bit easier, and doesn't change the values we determine at all (i.e. half the smallest value is still the smallest value!)
  - Minimizing over θ0 and θ1 means we get the values of θ0 and θ1 which, on average, give the minimal deviation of the prediction hθ(x) from y when we use those parameters in our hypothesis function
- More cleanly, this is a cost function:
  - J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (hθ(x^(i)) - y^(i))²
- And we want to minimize this cost function
  - Our cost function is (because of the summation term) inherently looking at ALL the data in the training set at any time
- So to recap
  - Hypothesis - is like your prediction machine; throw in an x value, get a putative y value
    - [Plot: hθ(x) vs. x for a simple data set, e.g. a line of slope 1 through the origin]
  - Cost - is a way to, using your training data, determine values for your θ parameters which make the hypothesis as accurate as possible
    - minimize over θ0, θ1:  J(θ0, θ1)   <- the cost function
    - This cost function is also called the squared error cost function
      - This cost function is a reasonable choice for most regression problems
      - Probably the most commonly used cost function
  - In case J(θ0, θ1) is a bit abstract, we go into what it does, why it works and how we use it in the coming sections
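As a small sketch of the squared error cost function just defined (compute_cost is my own helper name, not code from the course), J(θ0, θ1) could be evaluated in Octave like this:

```octave
% Squared error cost function J(theta0, theta1) for univariate linear regression
compute_cost = @(x, y, theta0, theta1) ...
    (1 / (2 * length(y))) * sum(((theta0 + theta1 * x) - y) .^ 2);

% Example: the training set from the table earlier (size in feet^2, price in $1000s)
x = [2104; 1416; 1534; 852];
y = [ 460;  232;  315;  178];

compute_cost(x, y, 0, 0)     % hypothesis predicts 0 everywhere -> large cost
compute_cost(x, y, 0, 0.2)   % slope roughly matching price/size -> much smaller cost
```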
Cost function - a deeper look

- Let's consider some intuition about the cost function and why we want to use it
  - The cost function determines the parameters
  - The values of the parameters determine how your hypothesis behaves; different values generate different functions
- Simplified hypothesis
  - Assume θ0 = 0, so
    - hθ(x) = θ1·x
    - J(θ1) = (1/2m) · Σ_{i=1..m} (hθ(x^(i)) - y^(i))²
    - Goal: minimize J(θ1) over θ1
- The cost function and goal here are very similar to when we have both θ0 and θ1, but with a simpler parameter set
  - The simplified hypothesis makes visualizing the cost function J() a bit easier
- So the hypothesis passes through (0, 0)
- There are two key functions we want to understand
  - hθ(x)
    - The hypothesis is a function of x - a function of what the size of the house is
  - J(θ1)
    - Is a function of the parameter θ1
  - So for example, with the toy data points (1, 1), (2, 2) and (3, 3) used in the lecture plots
    - When θ1 = 1 the hypothesis matches the data exactly, so J(θ1) = 0
    - When θ1 = 0, J(θ1) ≈ 2.3
  - If we compute J(θ1) over a range of values and plot J(θ1) vs. θ1, we get a polynomial (it looks like a quadratic)
  - [Plot: J(θ1) vs. θ1 - a bowl-shaped curve with its minimum at θ1 = 1]
- The optimization objective for the learning algorithm is to find the value of θ1 which minimizes J(θ1)
  - Here θ1 = 1 is the best value for θ1

A deeper insight into the cost function - simplified cost function

- This section assumes you're familiar with contour plots or contour figures
  - We use the same cost function, hypothesis and goal as previously
  - It's OK to skip parts of this section if you don't understand contour plots
- Using our original, more complex hypothesis with two parameters
  - So J(θ0, θ1)
  - Example: say
    - θ0 = 50
    - θ1 = 0.06
- Previously we plotted our cost function by plotting
  - θ1 vs. J(θ1)
- Now we have two parameters
  - The plot becomes a bit more complicated
  - It generates a 3D surface plot, where the axes are
    - x = θ1
    - z = θ0
    - y = J(θ0, θ1)
  - [Plot: 3D bowl-shaped surface of J(θ0, θ1) over θ0 and θ1]
- We can see that the height (y) indicates the value of the cost function, so we want to find where it is at a minimum
- Instead of a surface plot we can use contour figures/plots
  - A set of ellipses in different colors
  - Each colour marks the same value of J(θ0, θ1), but obviously plotted at different locations because θ0 and θ1 vary
  - Imagine a bowl-shaped function coming out of the screen, so the middle of the concentric ellipses is the minimum
  - [Plot: contour plot of J(θ0, θ1) as a function of the parameters θ0, θ1]
- Each point on the contour plot represents a pair of parameter values for θ0 and θ1
  - Our example here put the values at
    - θ0 ≈ 800
    - θ1 ≈ -0.15
  - Not a good fit
    - i.e. these parameters give a value on our contour plot far from the center
  - If instead we have
    - θ0 ≈ 360
    - θ1 = 0
    - This gives a better hypothesis, but still not great - it's not in the center of the contour plot
  - Finally we find the minimum, which gives the best hypothesis
- Doing this by eye/hand is a pain
  - What we really want is an efficient algorithm for finding the minimum over θ0 and θ1 - that is what gradient descent (next section) gives us
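Before moving on to gradient descent, here is a small Octave sketch (my own, not from the course) of the brute-force version of this search for the simplified one-parameter case: sweep θ1 over a range, evaluate J(θ1) at each value, and look for the bottom of the bowl.

```octave
% Toy data from the cost-function intuition section
x = [1; 2; 3];
y = [1; 2; 3];
m = length(y);

% Evaluate J(theta1) for a range of theta1 values (theta0 fixed at 0)
theta1_vals = -0.5:0.05:2.5;
J_vals = arrayfun(@(t) (1 / (2 * m)) * sum((t * x - y) .^ 2), theta1_vals);

plot(theta1_vals, J_vals);
xlabel('theta_1'); ylabel('J(theta_1)');

% The curve is a parabola whose minimum J = 0 sits at theta1 = 1
[J_min, idx] = min(J_vals);
fprintf('Minimum J = %.3f at theta1 = %.2f\n', J_min, theta1_vals(idx));
```

This only works because there is a single parameter over a small range; with more parameters an exhaustive sweep quickly becomes impractical, which is the motivation for gradient descent below.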
Gradient descent algorithm

- Minimize the cost function J
- Gradient descent
  - Used all over machine learning for minimization
- Start by looking at a general J() function
- Problem
  - We have J(θ0, θ1)
  - We want to get min J(θ0, θ1)
- Gradient descent applies to more general functions too
  - J(θ0, θ1, θ2, ..., θn)
  - min J(θ0, θ1, θ2, ..., θn)

How does it work?

- Start with initial guesses
  - Start at (0, 0) (or any other value)
  - Keep changing θ0 and θ1 a little bit to try and reduce J(θ0, θ1)
- Each time you change the parameters, you move in the direction which reduces J(θ0, θ1) the most
- Repeat
- Do so until you converge to a local minimum
- Has an interesting property
  - Where you start can determine which minimum you end up in
  - [Plot: two gradient descent paths on a J(θ0, θ1) surface, starting from nearby points but ending in different local minima]
  - One initialization point led to one local minimum
  - The other led to a different one

A more formal definition

- Repeat the following until convergence:
  - θj := θj - α · (∂/∂θj) J(θ0, θ1)    (for j = 0 and j = 1)
- What does this all mean?
  - Update θj by subtracting α times the partial derivative of the cost function with respect to θj
- Notation
  - :=
    - Denotes assignment
    - NB: a = b is a truth assertion, not an assignment
  - α (alpha)
    - Is a number called the learning rate
    - Controls how big a step you take
      - If α is big, you take aggressive gradient descent steps
      - If α is small, you take tiny steps
- Derivative term
  - (∂/∂θj) J(θ0, θ1)
  - Not going to talk about it now; we derive it later
- There is a subtlety about how this gradient descent algorithm is implemented
  - Do this for θ0 and θ1
  - "For j = 0 and j = 1" means we simultaneously update both
  - How do we do this?
    - Compute the right-hand side for both θ0 and θ1
      - So we need temp values
    - Then update θ0 and θ1 at the same time
    - We show this below:
      - temp0 := θ0 - α · (∂/∂θ0) J(θ0, θ1)
      - temp1 := θ1 - α · (∂/∂θ1) J(θ0, θ1)
      - θ0 := temp0
      - θ1 := temp1
- If you implement the non-simultaneous update it's not gradient descent, and it will behave weirdly
  - But it might look sort of right - so it's important to remember this!

Understanding the algorithm

- To understand gradient descent, we'll return to a simpler function where we minimize one parameter, to help explain the algorithm in more detail
  - min over θ1 of J(θ1), where θ1 is a real number
- Two key terms in the algorithm
  - Alpha
  - The derivative term
- Notation nuances
  - Partial derivative vs. derivative
    - Use partial derivative when we have multiple variables but only differentiate with respect to one
    - Use derivative when we are differentiating with respect to all the variables
- Derivative term
  - (d/dθ1) J(θ1)
  - The derivative says: take the tangent at the current point and look at the slope of that line
  - If the slope is positive (we're to the right of the minimum), the update decreases θ1; if the slope is negative (we're to the left of the minimum), the update increases θ1
    - Alpha is always positive, so either way the update moves θ1 towards the minimum
- Alpha term (α)
  - What happens if alpha is too small or too large?
  - Too small
    - Take baby steps
    - Takes too long
  - Too large
    - Can overshoot the minimum and fail to converge
- When you get to a local minimum
  - The gradient of the tangent (the derivative) is 0
  - So the derivative term = 0
  - alpha · 0 = 0
  - So θ1 := θ1 - 0, i.e. θ1 remains the same
- As you approach the global minimum, the derivative term gets smaller, so your update gets smaller, even with alpha fixed
  - This means that as the algorithm runs you take smaller steps as you approach the minimum
  - So there is no need to change alpha over time

Linear regression with gradient descent

- Apply gradient descent to minimize the squared error cost function J(θ0, θ1)
- Now we have a partial derivative term to work out
- So here we're just expanding out the first expression
  - J(θ0, θ1) = (1/2m) · Σ_{i=1..m} (hθ(x^(i)) - y^(i))²
  - hθ(x^(i)) = θ0 + θ1·x^(i)
- So we need to determine the derivative for each parameter, i.e.
  - When j = 0
  - When j = 1
- Figure out what this partial derivative is for the θ0 and θ1 cases
  - When we derive this expression in terms of j = 0 and j = 1 we get the following:
    - (∂/∂θ0) J(θ0, θ1) = (1/m) · Σ_{i=1..m} (hθ(x^(i)) - y^(i))
    - (∂/∂θ1) J(θ0, θ1) = (1/m) · Σ_{i=1..m} (hθ(x^(i)) - y^(i)) · x^(i)
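Putting the update rules above together, here is a minimal Octave sketch of batch gradient descent for univariate linear regression with the simultaneous update. The function name, learning rate and iteration count are my own illustrative choices, not values from the course.

```octave
% Batch gradient descent for h(x) = theta0 + theta1 * x
%   x, y      - column vectors of training data
%   alpha     - learning rate
%   num_iters - number of iterations to run
function [theta0, theta1] = gradient_descent(x, y, alpha, num_iters)
  m = length(y);
  theta0 = 0;  theta1 = 0;            % start at (0, 0)
  for iter = 1:num_iters
    h = theta0 + theta1 * x;          % current predictions for all m examples
    % Compute both updates first (temp0/temp1), then assign simultaneously
    temp0 = theta0 - alpha * (1 / m) * sum(h - y);
    temp1 = theta1 - alpha * (1 / m) * sum((h - y) .* x);
    theta0 = temp0;
    theta1 = temp1;
  end
end

% Example usage (e.g. at the Octave prompt, once the function is defined):
%   x = [1; 2; 3];  y = [1; 2; 3];
%   [t0, t1] = gradient_descent(x, y, 0.1, 1000);   % t0 -> ~0, t1 -> ~1
```

Note the temp0/temp1 step: both partial derivatives are computed from the same (θ0, θ1) before either parameter is overwritten - this is the simultaneous update the notes warn about.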
- To check these derivations you need to know multivariate calculus
- We can plug these derivatives back into the gradient descent algorithm
- How does it work?
  - In general there is a risk of converging to different local optima depending on initialization
  - But the linear regression cost function is always a convex function - it always has a single minimum
    - Bowl shaped
    - One global optimum
    - So gradient descent will always converge to the global optimum
- In action
  - Initialize the values to something like θ0 = 900 and θ1 = -0.1, a poor downward-sloping fit
  - [Plot: left, hθ(x) over the housing data for the fixed θ0, θ1 (a function of x); right, the contour plot of J(θ0, θ1) (a function of the parameters θ0, θ1), with the current hypothesis marked]
  - As gradient descent runs, the point on the contour plot moves towards the center and the hypothesis line fits the data better and better
  - We end up at the global minimum
- This is actually Batch Gradient Descent
  - The name refers to the fact that at each step you look at all the training data
    - Each step computes a sum over the m training examples
  - Sometimes non-batch versions exist, which look at small subsets of the data
    - We'll look at other forms of gradient descent (to use when m is too large) later in the course
- There also exists a numerical method for finding the minimum directly
  - The normal equations method
  - Gradient descent scales better to large data sets though
  - Gradient descent is used in lots of contexts across machine learning

What's next - important extensions

Two extensions to the algorithm:

- 1) Normal equation for a numeric solution
  - To solve the minimization problem min J(θ0, θ1), we can solve it exactly using a numeric method which avoids the iterative approach used by gradient descent
  - The normal equations method
  - Has advantages and disadvantages
    - Advantages
      - No longer an alpha term
      - Can be much faster for some problems
    - Disadvantage
      - Much more complicated
  - We discuss the normal equation in the linear regression with multiple features section
- 2) We can learn with a larger number of features
  - So we may have other features which contribute towards the price
    - e.g. with houses
      - Size
      - Age
      - Number of bedrooms
      - Number of floors
    - x1, x2, x3, x4
  - With multiple features it becomes hard to plot
    - Can't really plot in more than 3 dimensions
    - The notation becomes more complicated too
      - The best way to deal with this is the notation of linear algebra
      - It gives a notation and a set of operations you can do with matrices and vectors
  - For example, the training data can be organized into a matrix X and a vector y (see the sketch below):

        X = [ 2104  5  1  45 ]        y = [ 460 ]
            [ 1416  3  2  40 ]            [ 232 ]
            [ 1534  3  2  30 ]            [ 315 ]
            [  852  2  1  36 ]            [ 178 ]

  - We see here this matrix shows us
    - Size
    - Number of bedrooms
    - Number of floors
    - Age of home
  - All in one variable
    - A block of numbers; take all the data and organize it into one big block
  - Vector
    - Shown as y
    - Shows us the prices
- We need linear algebra for more complex linear regression models
- Linear algebra is good for making computationally efficient models (as seen later too)
  - It provides a good way to work with large data sets
  - Typically, vectorization of a problem is a common optimization technique
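As a small illustration of that last point, here is a hedged Octave sketch (the variable names and the parameter values are my own, purely for illustration) of holding the multi-feature training data as a matrix and computing every prediction with a single vectorized operation:

```octave
% Design matrix X: one row per training example, one column per feature
% (size in feet^2, number of bedrooms, number of floors, age of home in years)
X = [2104  5  1  45;
     1416  3  2  40;
     1534  3  2  30;
      852  2  1  36];
y = [460; 232; 315; 178];        % prices in $1000s

% Add a column of ones so the first parameter plays the role of theta0 (the intercept)
m = size(X, 1);
X = [ones(m, 1), X];

% With some parameter vector theta, all m predictions are computed in one line -
% no explicit loop over the training examples (illustrative theta, not learned)
theta = [0; 0.2; 5; 10; -1];
predictions = X * theta;
```

The same vectorized pattern (X * theta instead of a loop) is what makes these models efficient on large data sets.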
