Unit 3 Machine Learning Notes

The document covers decision tree learning (terminology, entropy, information gain and the ID3 algorithm, with a worked PlayTennis example), instance-based learning and the K-Nearest Neighbour (K-NN) algorithm, including the steps for implementing K-NN, how to choose the value of K and the algorithm's advantages and disadvantages, and case-based reasoning (CBR) with its life cycle, applications and challenges.

Decision Tree Learning (Supervised Learning)

Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.

Terminology

- Root node: represents the entire population or sample, which gets further divided into two or more sets.
- Splitting: the process of dividing a node into two or more sub-nodes.
- Decision node: when a sub-node splits into further sub-nodes, it is called a decision node.
- Leaf / terminal node: nodes which do not split are called leaf or terminal nodes.
- Pruning: removing the sub-nodes of a decision node is called pruning; it reduces the size of the tree.
- Branch / sub-tree: a subsection of the entire tree is called a branch or sub-tree.
- Parent and child nodes: a node which is divided into sub-nodes is called a parent node; the sub-nodes of a parent node are called child nodes.

Entropy

Entropy measures the uncertainty of a random variable or event and describes the homogeneity (impurity) of the data. For example, when a coin is tossed there are only two possible outcomes, so the entropy is lower compared to a die, which has six possible outcomes.

    Entropy(S) = -Σ_i P(x_i) · log2 P(x_i)

Information Gain

Information gain IG(S, A) is the reduction (decrease) in entropy obtained by splitting the data set S on an attribute A:

    Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

ID3 Algorithm (Iterative Dichotomiser 3)

Step 1: Calculate the entropy of the data set and the information gain of each attribute.
Step 2: Choose the attribute for which the information gain is maximum; this is the best split attribute.
Step 3: Place the best split attribute at the root node.
Step 4: Branch the root node into subtrees, one for each outcome of the root node attribute, and partition the data set accordingly.
Step 5: Recursively apply the same operation to each subtree with the remaining attributes until a leaf node is derived.

Question: Make a decision tree for the given PlayTennis data set.

Day | Outlook  | Temperature | Humidity | Wind   | PlayTennis
D1  | Sunny    | Hot         | High     | Weak   | No
D2  | Sunny    | Hot         | High     | Strong | No
D3  | Overcast | Hot         | High     | Weak   | Yes
D4  | Rain     | Mild        | High     | Weak   | Yes
D5  | Rain     | Cool        | Normal   | Weak   | Yes
D6  | Rain     | Cool        | Normal   | Strong | No
D7  | Overcast | Cool        | Normal   | Strong | Yes
D8  | Sunny    | Mild        | High     | Weak   | No
D9  | Sunny    | Cool        | Normal   | Weak   | Yes
D10 | Rain     | Mild        | Normal   | Weak   | Yes
D11 | Sunny    | Mild        | Normal   | Strong | Yes
D12 | Overcast | Mild        | High     | Strong | Yes
D13 | Overcast | Hot         | Normal   | Weak   | Yes
D14 | Rain     | Mild        | High     | Strong | No

Entropy of the whole set: S = [9+, 5-]

    Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94

Root Node Selection

Attributes = Outlook, Temperature, Humidity, Wind; Target = PlayTennis. The attribute which gives the highest information gain is selected as the root node.

Outlook: Values(Outlook) = Sunny, Overcast, Rain
- S_sunny    = [2+, 3-], Entropy(S_sunny)    = 0.971
- S_overcast = [4+, 0-], Entropy(S_overcast) = 0
- S_rain     = [3+, 2-], Entropy(S_rain)     = 0.971

    Gain(S, Outlook) = Entropy(S) - (5/14)·Entropy(S_sunny) - (4/14)·Entropy(S_overcast) - (5/14)·Entropy(S_rain)
                     = 0.94 - 0.347 - 0 - 0.347 = 0.2464

Similarly we can calculate the information gain of the remaining three attributes: Temperature, Humidity and Wind; a short code sketch of the calculation is given below.
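To make these calculations concrete before continuing with the remaining attributes, here is a minimal Python sketch of the entropy and information-gain formulas above. The function names and the `data` variable are illustrative, not part of the original notes.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum p_i * log2(p_i) over the class labels in S."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    total_entropy = entropy([row[target] for row in examples])
    remainder = 0.0
    for value in {row[attribute] for row in examples}:
        subset = [row[target] for row in examples if row[attribute] == value]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return total_entropy - remainder

# With the 14 PlayTennis rows loaded as dicts, e.g.
# data = [{"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High",
#          "Wind": "Weak", "PlayTennis": "No"}, ...]
# information_gain(data, "Outlook", "PlayTennis")  # ≈ 0.246
```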
Temperature: Values(Temperature) = Hot, Mild, Cool
- S_hot  = [2+, 2-], Entropy = 1.0
- S_mild = [4+, 2-], Entropy = 0.918
- S_cool = [3+, 1-], Entropy = 0.811

    Gain(S, Temperature) = 0.94 - (4/14)(1.0) - (6/14)(0.918) - (4/14)(0.811) = 0.029

Humidity: Values(Humidity) = High, Normal
- S_high   = [3+, 4-], Entropy = 0.985
- S_normal = [6+, 1-], Entropy = 0.592

    Gain(S, Humidity) = 0.94 - (7/14)(0.985) - (7/14)(0.592) = 0.151

Wind: Values(Wind) = Strong, Weak
- S_strong = [3+, 3-], Entropy = 1.0
- S_weak   = [6+, 2-], Entropy = 0.811

    Gain(S, Wind) = 0.94 - (6/14)(1.0) - (8/14)(0.811) = 0.048

Outlook has the highest information gain (0.2464), so it is selected as the root node and the data set is partitioned on its values:

- Sunny    → (D1, D2, D8, D9, D11),  [2+, 3-]
- Overcast → (D3, D7, D12, D13),     [4+, 0-]  → pure node, label Yes
- Rain     → (D4, D5, D6, D10, D14), [3+, 2-]

Splitting the Sunny branch: S_sunny = [2+, 3-], Entropy(S_sunny) = 0.97

Temperature: S_hot = [0+, 2-] (Entropy 0), S_mild = [1+, 1-] (Entropy 1.0), S_cool = [1+, 0-] (Entropy 0)

    Gain(S_sunny, Temperature) = 0.97 - (2/5)(0) - (2/5)(1.0) - (1/5)(0) = 0.570

Humidity: S_high = [0+, 3-] (Entropy 0), S_normal = [2+, 0-] (Entropy 0)

    Gain(S_sunny, Humidity) = 0.97 - (3/5)(0) - (2/5)(0) = 0.97

Wind: S_strong = [1+, 1-] (Entropy 1.0), S_weak = [1+, 2-] (Entropy 0.918)

    Gain(S_sunny, Wind) = 0.97 - (2/5)(1.0) - (3/5)(0.918) = 0.0192

Humidity gives the highest gain under Sunny, so it is placed at that node: High → (D1, D2, D8) → No; Normal → (D9, D11) → Yes. Repeating the same procedure on the Rain branch selects Wind (Weak → Yes, Strong → No).

Final decision tree:

    Outlook
    ├── Sunny    → Humidity: High → No, Normal → Yes
    ├── Overcast → Yes
    └── Rain     → Wind: Weak → Yes, Strong → No
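The finished tree can also be read as a set of if/else rules. A small sketch follows; the function name play_tennis is chosen for illustration and is not part of the notes.

```python
def play_tennis(outlook, humidity, wind):
    """Classify a day with the decision tree learned above (ID3 on the PlayTennis data)."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    return None  # unseen Outlook value

# Example: D10 = (Rain, Normal, Weak) is classified as "Yes"
print(play_tennis("Rain", "Normal", "Weak"))
```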
Issues in Decision Tree Learning

- Avoiding overfitting the data (e.g. reduced-error pruning).
- Incorporating continuous-valued attributes.
- Alternative measures for selecting attributes.
- Handling training examples with missing attribute values.
- Handling attributes with differing costs.

Inductive Bias in Decision Tree Learning

When more than one tree is consistent with the training data, ID3 must prefer one of them to classify new examples; this preference is called the inductive bias of the ID3 algorithm:
(a) Shorter trees are preferred over longer trees.
(b) A closer approximation to the inductive bias of ID3: trees that place attributes with high information gain close to the root are preferred over those that do not.

Instance-Based Learning

Instance-based (similarity-based) classifiers use similarity measures to locate the nearest neighbours among the stored training examples and classify a test instance; this works differently from other learning mechanisms such as decision trees. The advantage of this kind of learning is that processing occurs only when a new sample has to be classified, which is useful when the whole data set is not available at the beginning but is collected incrementally. The disadvantage is that it requires a large amount of memory, because all training data must be stored and compared whenever a new instance arrives. Typical approaches are k-nearest neighbour, locally weighted regression, radial basis function networks and case-based reasoning. Because processing is postponed until a new instance arrives, it is also called lazy learning or memory-based learning.

K-Nearest Neighbor (K-NN) Algorithm for Machine Learning

○ K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
○ The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
○ The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
○ The K-NN algorithm can be used for regression as well as for classification, but mostly it is used for classification problems.
○ K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
○ It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
○ At the training phase the KNN algorithm just stores the dataset, and when it gets new data it classifies that data into the category that is most similar to the new data.

Example: Suppose we have an image of a creature that looks similar to a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, since it works on a similarity measure. Our KNN model will find the features of the new image that are similar to the cat and dog images, and based on the most similar features it will put the image in either the cat or the dog category.

[Figure: KNN classifier — input value → KNN classifier → predicted output]

Why do we need a K-NN algorithm?

Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem we need a K-NN algorithm. With the help of K-NN we can easily identify the category or class of a particular data point: before applying K-NN the new data point is unassigned; after applying K-NN it is assigned to Category A.

How does K-NN work?

The working of K-NN can be explained on the basis of the following algorithm:

○ Step 1: Select the number K of neighbours.
○ Step 2: Calculate the Euclidean distance from the new point to the training points.
○ Step 3: Take the K nearest neighbours as per the calculated Euclidean distance.
○ Step 4: Among these K neighbours, count the number of data points in each category.
○ Step 5: Assign the new data point to the category for which the number of neighbours is maximum.
○ Step 6: Our model is ready.

Suppose we have a new data point and we need to put it in the required category:

○ Firstly, we choose the number of neighbours, say K = 5.
○ Next, we calculate the Euclidean distance between the data points. The Euclidean distance between two points A(x1, y1) and B(x2, y2) is

    d(A, B) = sqrt((x2 - x1)^2 + (y2 - y1)^2)

○ By calculating the Euclidean distances we get the nearest neighbours: three nearest neighbours in Category A and two nearest neighbours in Category B.
○ Since the three nearest neighbours are from Category A, the new data point must belong to Category A.
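As a concrete illustration of Steps 1-5, here is a small from-scratch sketch in Python; the toy data points, labels and the helper name knn_predict are hypothetical examples, not taken from the notes.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(p, query), label)              # Step 2: Euclidean distance
        for p, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda d: d[0])            # Step 3: k nearest neighbours
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]  # Steps 4-5: majority category

# Hypothetical 2-D points in Category A and Category B
X = [(1, 2), (2, 3), (3, 3), (6, 7), (7, 8), (8, 8)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, query=(3, 4), k=5))       # -> "A" (3 A neighbours vs 2 B)
```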
How to select the value of K in the K-NN algorithm?

Below are some points to remember while selecting the value of K:

○ There is no particular way to determine the best value for K, so we need to try several values to find the best one. The most commonly preferred value for K is 5.
○ A very low value for K, such as K = 1 or K = 2, can be noisy and expose the model to the effects of outliers.
○ Large values for K are generally good, but they can cause difficulties (e.g. smoothing over small categories).

Advantages of the KNN algorithm:

○ It is simple to implement.
○ It is robust to noisy training data.
○ It can be more effective if the training data is large.

Disadvantages of the KNN algorithm:

○ The value of K always needs to be determined, which can sometimes be complex.
○ The computation cost is high, because the distance to every training sample must be calculated for each new data point.

Worked examples (handwritten in the original): for a given test sample, the Euclidean distance to every row of the training table is computed, e.g. d = sqrt((a1 - x1)^2 + (a2 - x2)^2) for a row with attribute values (a1, a2) and test point (x1, x2). The distances are then sorted in increasing order, the K nearest rows are selected (K = 3 in the first example, whose class labels are True/False, and K = 5 in the second, whose class labels are Red/Blue), and the test sample is assigned the majority class among those K neighbours.
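The same distance–sort–vote procedure can also be done with a library call. A minimal sketch, assuming scikit-learn is installed and reusing the same hypothetical toy points as above:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 2], [2, 3], [3, 3], [6, 7], [7, 8], [8, 8]]
y = ["A", "A", "A", "B", "B", "B"]

model = KNeighborsClassifier(n_neighbors=5)  # K = 5, Euclidean (Minkowski p=2) distance by default
model.fit(X, y)                              # "training" only stores the data (lazy learner)
print(model.predict([[3, 4]]))               # -> ['A']
```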
Fig. 12.7. Radial Basis Function (RBF) Network

12.5 CASE-BASED LEARNING OR CASE-BASED REASONING (CBR)

Case-based reasoning (CBR) is used for classification and regression. It is the process of solving new problems based on the solutions of similar past problems. CBR is an advanced instance-based learning method which is used to solve more complex problems; it does not use the Euclidean distance metric. When a new case arrives to be classified, an identical case is first checked in memory. If a similar case is found in stored memory, its solution is retrieved as well.

Fig. 12.8. Case-based Reasoning (CBR) life cycle

12.5.1 Steps in CBR

- Retrieve: Gather data from memory; check for any previous solution similar to the current problem.
- Reuse: Suggest a solution based on that experience, adapting it to meet the demands of the new situation.
- Revise: Evaluate the use of the solution in the new context.
- Retain: Store this new problem-solving method in the memory system.

12.5.2 Applications of CBR

- Customer service helpdesks for diagnosis of problems.
- Engineering and law, for technical designs and legal rules.
- Medical science, for patient case histories and treatment.

12.5.3 CBR Example (Smart Software Agent)

A common example of CBR is a helpdesk system. The user calls about a computer-related service problem, for example a printer problem or an internet connection problem. CBR is used by the software assistant to diagnose the problem, and the assistant then recommends possible solutions to the current problem.

Case Based Reasoning (CBR)

As we know, Nearest Neighbour classifiers store training tuples as points in Euclidean space. Case-Based Reasoning classifiers (CBR), by contrast, use a database of problem solutions to solve new problems, storing the tuples or cases for problem-solving as complex symbolic descriptions.

How does CBR work?

When a new case arises to be classified, a case-based reasoner will first check whether an identical training case exists. If one is found, the accompanying solution to that case is returned. If no identical case is found, the CBR searches for training cases having components similar to those of the new case; conceptually, these training cases may be considered neighbours of the new case. If cases are represented as graphs, this involves searching for subgraphs that are similar to subgraphs within the new case. The CBR then tries to combine the solutions of the neighbouring training cases to propose a solution for the new case. If incompatibilities arise between the individual solutions, backtracking to search for other solutions may be necessary. The CBR may also employ background knowledge and problem-solving strategies to propose a feasible solution.

Applications of CBR include:

1. Problem resolution for customer service help desks, where cases describe product-related diagnostic problems.
2. Engineering and law, where cases are technical designs or legal rulings, respectively.
3. Medical education, where patient case histories and treatments are used to help diagnose and treat new patients.

Challenges with CBR

- Finding a good similarity metric (e.g. for matching subgraphs) and suitable methods for combining solutions.
- Selecting salient features for indexing training cases and developing efficient indexing techniques.

There is also a trade-off between accuracy and efficiency as the number of stored cases becomes very large: CBR becomes more capable with more cases, but after a certain point the system's efficiency suffers, because the time required to search for and process relevant cases increases.
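To make the retrieve/reuse/revise/retain cycle concrete, here is a toy sketch of a helpdesk-style case base. The case structure, the similarity function and the stored cases are illustrative assumptions, not part of the notes.

```python
def similarity(case_a, case_b):
    """Fraction of matching symptom features (a stand-in for a real similarity metric)."""
    keys = set(case_a) | set(case_b)
    return sum(case_a.get(k) == case_b.get(k) for k in keys) / len(keys)

case_base = [
    {"problem": {"device": "printer", "symptom": "no output"},     "solution": "reinstall driver"},
    {"problem": {"device": "router",  "symptom": "no connection"}, "solution": "restart router"},
]

def solve(new_problem):
    best = max(case_base, key=lambda c: similarity(c["problem"], new_problem))  # Retrieve
    solution = best["solution"]                                                 # Reuse (adapt if needed)
    case_base.append({"problem": new_problem, "solution": solution})            # Retain after Revise
    return solution

print(solve({"device": "printer", "symptom": "no output"}))  # -> 'reinstall driver'
```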
