Machine Learning
"Learning is any process by which a system improves performance from experience." - Herbert Simon

Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that
* improve their performance P
* at some task T
* with experience E.
A well-defined learning task is given by <P, T, E>.

Examples of well-defined learning tasks (improve on task T, with respect to performance metric P, based on experience E):

T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver

T: Categorizing email messages as spam or legitimate
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels

Types of learning
* Supervised (inductive) learning: given training data + desired outputs (labels)
* Unsupervised learning: given training data (without desired outputs)
* Reinforcement learning: rewards from a sequence of actions

Supervised learning: regression
* Given (x1, y1), (x2, y2), ..., (xn, yn)
* Learn a function f(x) to predict y given x
* y is real-valued == regression
[Figure: September Arctic sea-ice extent (1,000,000 sq km) versus year, 1970-2020]

Supervised learning: classification
* Given (x1, y1), (x2, y2), ..., (xn, yn)
* Learn a function f(x) to predict y given x
* y is categorical == classification
* Example: predicting whether a breast tumor is malignant or benign from the tumor size.
[Figure: malignant and benign cases plotted against tumor size, with a benign/malignant prediction for a new tumor size]
* x can be multi-dimensional: each dimension corresponds to an attribute (e.g., tumor size together with other patient attributes).

Unsupervised learning
* Given x1, x2, ..., xn (without labels)
* Output hidden structure behind the x's, e.g., clustering.

Reinforcement learning
* Given a sequence of states and actions with (delayed) rewards, output a policy.
* A policy is a mapping from states to actions that tells you what to do in a given state.
* Examples:
  — credit assignment problem
  — game playing
  — robot in a maze
  — balancing a pole on your hand

[Figure: a small labeled table (attributes such as Home Owner and Marital Status, class label Yes/No) used as a running classification example]

Types of data
* Categorical (qualitative): nominal, ordinal, binary
* Numeric (quantitative): discrete, continuous

Data reduction strategies
* Dimensionality reduction: remove unimportant attributes.
* Numerosity reduction: replace the data with a smaller representation.
* Data compression.

Data quality problems
* Incomplete data: attribute values may be missing because the measuring equipment malfunctioned, the values were never entered, or they were deleted for being inconsistent with other recorded data. Missing values can be handled by ignoring the record, filling the value in manually, or filling it in automatically with a global constant (e.g., "UNKNOWN"), the attribute mean, or the most probable value.
* Noisy data: random error or variance in a measured variable, caused for example by faulty instruments or by problems during data entry and transmission.
* Inconsistent data: values that contradict each other, e.g., Age = "42" recorded alongside Birthday = "2010/07/03", or discrepancies between duplicate records.
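The following small sketch is not from the slides; it illustrates one of the missing-value strategies listed above (filling in with the attribute mean) using scikit-learn's SimpleImputer. The toy matrix and the meaning of its two columns are made up for the example.

    import numpy as np
    from sklearn.impute import SimpleImputer

    # Toy data with missing entries marked as np.nan; the two columns are
    # hypothetical attributes (say, age and income).
    X = np.array([[25.0, 50000.0],
                  [32.0, np.nan],
                  [np.nan, 61000.0],
                  [41.0, 58000.0]])

    # Replace each missing value with the mean of its column.
    imputer = SimpleImputer(strategy="mean")
    X_filled = imputer.fit_transform(X)
    print(X_filled)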
Major data preprocessing tasks
* Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.
* Data integration: combine data from multiple sources into a coherent store.
* Data transformation: normalization and aggregation.
* Data reduction: obtain a reduced representation of the data that yields (almost) the same analytical results:
  1. Dimensionality reduction, e.g., wavelet transforms, principal component analysis, attribute subset selection.
  2. Numerosity reduction, e.g., regression and log-linear models, histograms, clustering, sampling.
  3. Data compression.

Normalization
* Normalization scales the values of an attribute so that they fall within a small, specified range, such as [0.0, 1.0] or [-1, 1].
* Common methods: min-max normalization, z-score normalization, and normalization by decimal scaling.

Min-max normalization
* A linear transformation of the original data: a value v of attribute A is mapped from [min_A, max_A] to [new_min_A, new_max_A] by
  v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A
* Example: suppose the minimum and maximum values of income are $12,000 and $98,000 and we want to map income to [0.0, 1.0]. Then $73,600 is transformed to
  (73,600 - 12,000) / (98,000 - 12,000) * (1.0 - 0) + 0 = 0.716

z-score normalization (zero-mean normalization)
* The values of attribute A are normalized using the mean and standard deviation of A:
  v' = (v - mean_A) / std_A
* Example: if the mean and standard deviation of income are $54,000 and $16,000, then $73,600 is transformed to
  (73,600 - 54,000) / 16,000 = 1.225

Normalization by decimal scaling
* Normalizes by moving the decimal point of the values of A; the number of places moved depends on the maximum absolute value of A:
  v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1
* Example: suppose the recorded values of A range from -986 to 917. The maximum absolute value is 986, so every value is divided by 1,000 (j = 3): -986 is normalized to -0.986 and 917 to 0.917.
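The three normalization schemes can be checked with a few lines of Python. This sketch is not from the slides; it simply re-does the worked income example, taking 12,000 / 98,000 as the assumed minimum / maximum and 54,000 / 16,000 as the assumed mean / standard deviation, plus the decimal-scaling example for values between -986 and 917.

    # Min-max normalization of a single income value to [0.0, 1.0],
    # with min = 12,000 and max = 98,000 as in the example above.
    v = 73600.0
    min_a, max_a = 12000.0, 98000.0
    new_min, new_max = 0.0, 1.0
    v_minmax = (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min
    print(round(v_minmax, 3))    # 0.716

    # z-score normalization, with mean = 54,000 and std = 16,000.
    mean_a, std_a = 54000.0, 16000.0
    v_zscore = (v - mean_a) / std_a
    print(round(v_zscore, 3))    # 1.225

    # Decimal scaling for an attribute whose values range from -986 to 917:
    # the maximum absolute value is 986, so j = 3 and we divide by 10**3.
    values = [-986.0, 917.0]
    j = 3
    print([x / 10**j for x in values])   # [-0.986, 0.917]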
Classification
* Classification is a supervised task: given a set of records (the training set), each described by attribute values and a class label, learn a model that assigns a class label to previously unseen records.
* Common classification methods:
  * K-nearest neighbor
  * Decision Trees
  * Logistic regression
  * Naive Bayes
  * Neural networks
  * SVM

Nearest neighbors: "Tell me who your friends are and I'll tell you who you are!"
* Suppose we're given a novel input vector x we'd like to classify.
* The idea: find the nearest input vector to x in the training set and copy its label.
* We can formalize "nearest" in terms of Euclidean distance:
  ||x - x^(i)||_2 = sqrt( sum_j (x_j - x_j^(i))^2 )
[Figure: with a single nearest neighbor, every example in the shaded region around a mislabeled blue point is misclassified as the blue class, while examples around correctly labeled red points are classified correctly]
* Nearest neighbors are sensitive to noise or mis-labeled data ("class noise"). Solution? Smooth by having the k nearest neighbors vote.

Algorithm (kNN):
1. Find the k examples {(x^(i), t^(i))} closest to the test instance x.
2. The classification output is the majority class:
   y = argmax_{t^(z)} sum_{i=1}^{k} delta(t^(z), t^(i))

A worked kNN example
* Each sample is described by two attributes, X1 = acid durability (seconds) and X2 = strength (kg/square meter), and labeled Good or Bad:

  X1 (seconds) | X2 (kg/square meter) | Y = Classification
  7            | 7                    | Bad
  7            | 4                    | Bad
  3            | 4                    | Good
  1            | 4                    | Good

* A new sample with X1 = 3 and X2 = 7 is to be classified with kNN.
* Step 1: compute the Euclidean distance between the new sample (3, 7) and every training sample:
  (7, 7): sqrt((7-3)^2 + (7-7)^2) = 4
  (7, 4): sqrt((7-3)^2 + (4-7)^2) = 5
  (3, 4): sqrt((3-3)^2 + (4-7)^2) = 3
  (1, 4): sqrt((1-3)^2 + (4-7)^2) = 3.61
* Step 2: rank the training samples by distance and keep the K = 3 nearest neighbors: (3, 4) → Good, (1, 4) → Good, and (7, 7) → Bad.
* Step 3: take the majority vote among the K nearest neighbors: two Good votes against one Bad, so the new sample is classified as Good.

Tradeoffs in choosing k?
* Small k
  — Good at capturing fine-grained patterns
  — May overfit, i.e. be sensitive to random idiosyncrasies in the training data
* Large k
  — Makes stable predictions by averaging over lots of examples
  — May underfit, i.e. fail to capture important regularities
* Balancing k
  — The optimal choice of k depends on the number of data points n.
  — Nice theoretical properties if k → ∞ and k/n → 0.
  — Rule of thumb: choose k < sqrt(n).
  — We can choose k using a validation set (next slides).

kNN in scikit-learn (the example from the slides, cleaned up):

    from sklearn import datasets
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split

    # Load the iris dataset
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    # Split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Create a kNN classifier with 5 neighbors
    knn = KNeighborsClassifier(n_neighbors=5)

    # Fit the classifier to the training data
    knn.fit(X_train, y_train)

    # Evaluate the classifier on the test data
    accuracy = knn.score(X_test, y_test)
    print(f"Accuracy: {accuracy:.2f}")
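The worked Good/Bad example above can be reproduced directly. The sketch below is not from the slides; it computes the four Euclidean distances and the k = 3 majority vote with NumPy.

    import numpy as np
    from collections import Counter

    # Training samples: (acid durability, strength) and their labels.
    X_train = np.array([[7, 7], [7, 4], [3, 4], [1, 4]], dtype=float)
    y_train = ["Bad", "Bad", "Good", "Good"]

    # Query point from the worked example.
    x_new = np.array([3, 7], dtype=float)

    # Euclidean distance from the query to every training sample.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    print(distances)    # 4, 5, 3, 3.61 (rounded), matching the worked example

    # Indices of the k = 3 nearest neighbors and their majority label.
    k = 3
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    print(votes.most_common(1)[0][0])   # Good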
Decision trees
* A decision tree classifies instances using a tree structure:
  — each internal (non-leaf) node tests an attribute,
  — each branch corresponds to one of the possible outcomes of that test,
  — each leaf node assigns a class label.
* To classify a new instance, start from the root of the tree, test the attribute at the current node, follow the branch matching the instance's value, and repeat until a leaf is reached; the label of that leaf is the prediction.

Example 1: a decision tree model
[Figure: a decision tree ("Model: Decision Tree") induced from a small table of records with attributes such as Home Owner and Marital Status and a Yes/No class label; a test record is classified by starting from the root of the tree and following the branches that match its attribute values]

Example 2: the Play Tennis data
* The attributes are Outlook, Temperature, Humidity, and Wind; a test record such as (Outlook = Overcast, Temperature = Hot, Humidity = High, Wind = False) is classified by walking down the tree.
* In the learned tree, Outlook is tested at the root with branches Sunny, Overcast, and Rain; the Sunny branch is then split on Humidity (High → No, Normal → Yes), Overcast leads directly to Yes, and the Rain branch is split on Wind.
* The tree corresponds to the rule
  (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
  i.e., a decision tree expresses a disjunction (OR) of conjunctions (AND) of constraints on the attribute values.

Example 3: the Iris dataset
* The Iris dataset contains 150 samples of iris flowers, 50 from each of three species; each sample is described by four measured features (the length and width of the sepal and of the petal).
* The dataset ships with scikit-learn and can be loaded with:

    from sklearn.datasets import load_iris
    dataset = load_iris()

Decision Tree Induction
* Most decision-tree learners build the tree top-down with a greedy strategy: at every node, the attribute that best separates the classes is chosen for the split.
* Well-known algorithms:
  * Hunt's algorithm
  * ID3 (Iterative Dichotomiser 3)
  * C4.5
  * CART (Classification And Regression Tree)

Entropy
* Entropy measures the impurity of a set of examples S:
  Entropy(S) = - sum_i p_i * log2(p_i)
[Figure: three buckets of balls; a pure bucket has entropy 0, a moderately mixed bucket has entropy about 0.811, and a 50/50 bucket has entropy 1]
* Example: the Play Tennis training set has 14 examples, 9 labeled Yes and 5 labeled No, so
  Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Information gain
* The information gain of an attribute A is the expected reduction in entropy caused by partitioning S on A:
  Gain(S, A) = Entropy(S) - sum_{v in Values(A)} (|S_v| / |S|) * Entropy(S_v)
* At each node, the attribute with the highest information gain is selected for the split.
* For the training set above, with E(Sunny) = 0.722, E(Overcast) = 0, and E(Rain) = 0.722:
  Gain(S, Outlook) = 0.94 - (0.258 + 0 + 0.258) = 0.424
  Gain(S, Wind) = 0.94 - (0.491 + 0.346) = 0.102
  Gain(S, Humidity) = 0.94 - (0.431 + 0.493) = 0.016
  Gain(S, Temperature) = 0.008
  Outlook has the highest gain and is chosen as the root attribute.

Growing the tree below the root
* The Overcast branch is pure (all Yes), so it becomes a leaf.
* For the Sunny branch, S_sunny = {D1, D2, D8, D9, D11} with entropy 0.970, and
  Gain(S_sunny, Humidity) = 0.970 - (3/5)(0.0) - (2/5)(0.0) = 0.970
  Gain(S_sunny, Temperature) = 0.970 - (2/5)(0.0) - (2/5)(1.0) - (1/5)(0.0) = 0.570
  Gain(S_sunny, Wind) = 0.970 - (2/5)(1.0) - (3/5)(0.918) = 0.019
  so Humidity is selected for the Sunny branch; the Rain branch is handled in the same way.
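The entropy and information-gain figures can be checked with a short helper. The sketch below is not from the slides; the split counts passed to information_gain (High → 0 Yes / 3 No, Normal → 2 Yes / 0 No for the Sunny subset) are assumed from the standard Play Tennis data.

    import math

    def entropy(counts):
        # Entropy (in bits) of a class distribution given as a list of counts.
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    def information_gain(parent_counts, child_counts_list):
        # Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)
        total = sum(parent_counts)
        remainder = sum(sum(child) / total * entropy(child)
                        for child in child_counts_list)
        return entropy(parent_counts) - remainder

    # The full training set has 9 Yes and 5 No examples.
    print(f"{entropy([9, 5]):.3f}")    # 0.940

    # Gain of splitting the Sunny subset (2 Yes, 3 No) on Humidity,
    # assuming High -> (0 Yes, 3 No) and Normal -> (2 Yes, 0 No).
    print(f"{information_gain([2, 3], [[0, 3], [2, 0]]):.3f}")   # 0.971 (the slides round to 0.970)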
Decision trees in practice
* A tree grown without restrictions can memorize the training data (training set accuracy: 1.000) while scoring lower on a held-out test set.
[Output residue: an array of per-feature values printed from the fitted tree]

Evaluating a classifier
* accuracy = (# correct predictions) / (# test instances)
* error = 1 - accuracy = (# incorrect predictions) / (# test instances)

Confusion Matrix
* Given a dataset of P positive instances and N negative instances:

                      Predicted class
                      Yes    No
  Actual class  Yes   TP     FN
                No    FP     TN

  precision = TP / (TP + FP)
  recall = TP / (TP + FN)
  accuracy = (TP + TN) / (P + N)

Training and test data
* Training data: data used to build the model
* Test data: new data, not used in the training process
* Training performance is often a poor indicator of generalization performance
  — Generalization is what we really care about in ML
  — It is easy to overfit the training data
  — Performance on test data is a good indicator of generalization performance
  — i.e., test accuracy is more important than training accuracy
* Idea: split the full data set into training data and test data; train each model on the training data, and then test each model's accuracy on the test data.

k-Fold Cross-Validation
* Why just choose one particular "split" of the data?
  — In principle, we should do this multiple times since performance may be different for each split
* k-Fold Cross-Validation (e.g., k = 10)
  — randomly partition the full data set of n instances into k disjoint subsets (each roughly of size n/k)
  — choose each fold in turn as the test set; train the model on the other folds and evaluate it
  — compute statistics over the k test performances, or choose the best of the k models
* Example: 3-fold CV
[Figure: the full data set is split into a 1st, 2nd, and 3rd partition; each partition in turn is held out as the test set, and summary statistics are computed over the k test performances]

Generalization
* How well does a learned model generalize from the data it was trained on to a new test set (labels unknown)?
* As model complexity grows, error on the training data keeps decreasing, while error on the test data first decreases and then rises again: low complexity underfits, high complexity overfits, and the ideal range for model complexity lies in between.
[Figure: predictive error versus model complexity, with curves for error on training data and error on test data]
* The same effect appears in decision trees: as the size of the tree (number of nodes) grows, accuracy on the training data keeps improving while accuracy on the test data eventually degrades.
[Figure: accuracy on training data and on test data versus tree size]

Regression
* Regression models the relationship between a dependent (response) variable and one or more independent (explanatory) variables.
* Types of regression models:
  — Simple linear regression (a single explanatory variable):
    Y = β0 + β1 X + ε
  — Multiple linear regression (several explanatory variables):
    Y = β0 + β1 X1 + β2 X2 + ... + βp Xp + ε
  — Logistic regression (categorical response): models the probability
    p = P(Y = 1 | X1, ..., Xp)

Simple linear regression example (one variable)
[Table residue: a small set of (x, y) pairs used to fit the line]
* The hypothesis is a line:
  hθ(x) = θ0 + θ1 x
* With several input variables the hypothesis becomes
  hθ(x) = θ0 + θ1 x1 + θ2 x2 + ...

Logistic regression for classification
* To turn the model output hθ(x) into a class decision, threshold it at 0.5:
  — predict y = 1 if hθ(x) ≥ 0.5
  — predict y = 0 if hθ(x) < 0.5
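As a small illustration of the 0.5 decision threshold, the sketch below (not from the slides, with made-up one-dimensional data) fits scikit-learn's LogisticRegression and compares the estimated probabilities P(y = 1 | x) against the threshold.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy one-dimensional data: small x values belong to class 0, large ones to class 1.
    X = np.array([[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    model = LogisticRegression()
    model.fit(X, y)

    # Estimated P(y = 1 | x) for two new points, and the corresponding hard
    # decisions: predict 1 exactly when the estimated probability is >= 0.5.
    X_new = np.array([[2.5], [7.5]])
    probs = model.predict_proba(X_new)[:, 1]
    print(probs)                          # one value below 0.5, one above
    print((probs >= 0.5).astype(int))     # in practice the same labels model.predict(X_new) returns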

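Finally, a sketch that ties the evaluation ideas together: k-fold cross-validation of the earlier kNN classifier on the Iris data. It is not from the slides; cross_val_score performs the k train/test splits and returns one accuracy per fold, and k = 5 is an arbitrary choice here.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    iris = load_iris()

    # 5-fold cross-validation: the data are partitioned into 5 disjoint folds,
    # each fold serves once as the test set, and we get one accuracy per fold.
    knn = KNeighborsClassifier(n_neighbors=5)
    scores = cross_val_score(knn, iris.data, iris.target, cv=5)

    print(scores)           # five per-fold accuracies
    print(scores.mean())    # summary statistic over the folds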