
Data Science Theory Ashwin

Uploaded by

Snimy Stephen
Copyright
© © All Rights Reserved
# PROBABILITY

Probability = number of favourable cases / total number of possible outcomes.

Eg 1. Two dice: total number of possible outcomes = $6 \times 6 = 36$.
Eg 2. One die, A = {outcome is prime} = {2, 3, 5}: $P(A) = 3/6 = 1/2$.
Eg 3. Two coins: the possible outcomes are HH, HT, TH, TT.

# Event: outcome of an experiment. Two kinds are used below: mutually exclusive events and independent events.
# Set: collection of well-defined and unique elements.
# Union (OR): $A \cup B$ holds every element that is in A or in B.
# Intersection (AND): $A \cap B$ holds only the elements common to A and B.

# MUTUALLY EXCLUSIVE EVENTS (MEE)
#1. Only one event can happen at a time.
#2. Multiple events cannot occur at the same time.
#3. For MEE: $P(A \cup B) = P(A) + P(B)$.

Eg. Coin: A = {outcome is H}, B = {outcome is T}. A and B are MEE.

Eg. MEE. Die: A = {outcome is even}, B = {outcome is odd}.
$P(A) = 3/6$, $P(B) = 3/6$.
LHS: $P(A \cup B) = 1$. RHS: $P(A) + P(B) = 1/2 + 1/2 = 1$. LHS = RHS.

Eg. Not MEE. Die: A = {outcome is prime} = {2, 3, 5}, B = {outcome is odd} = {1, 3, 5}.
$P(A) = 1/2$, $P(B) = 1/2$.
$A \cup B = \{1, 2, 3, 5\}$, so $P(A \cup B) = 4/6$, while $P(A) + P(B) = 1$.
Hence $P(A \cup B) \neq P(A) + P(B)$.

# INDEPENDENT EVENTS
#1. One event does not depend on the outcome of the other event.
#2. The events can occur in parallel.
#3. For independent events: $P(A \cap B) = P(A) \cdot P(B)$.

Eg. Two coins: A = {first outcome is head}, B = {second outcome is head}.
Sample space = {HH, HT, TH, TT}.
$P(A) = P(\{HH, HT\}) = 2/4$, $P(B) = P(\{HH, TH\}) = 2/4$.
LHS: $P(A \cap B) = P(\{HH\}) = 1/4$. RHS: $P(A) \cdot P(B) = 1/2 \cdot 1/2 = 1/4$. LHS = RHS.

# PROBLEM: consider two students S1 and S2 who pass independently, with $P(A) = 1/2$ (S1 passes) and $P(B) = 2/3$ (S2 passes). Find the probability that: both pass, both fail, exactly one passes (which is the same event as exactly one fails), at least one passes.

#1. Both will pass.
$P(A \cap B) = P(A) \cdot P(B) = 1/2 \cdot 2/3 = 1/3$.

#2. Both will fail.
$P(A^c \cap B^c) = P(A^c) \cdot P(B^c) = 1/2 \cdot 1/3 = 1/6$.

#3. Exactly one will pass. $A \cap B^c$ and $A^c \cap B$ are MEE, so
$P(A \cap B^c) + P(A^c \cap B) = P(A) P(B^c) + P(A^c) P(B) = 1/6 + 2/6 = 1/2$.

#4. At least one will pass.
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 1/2 + 2/3 - 1/3 = 5/6$.
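The two-student answers above can be checked by listing every pass/fail pattern. This is only a sketch: it assumes the pass probabilities 1/2 and 2/3 read off the worked answers, and the names `p_pass` and `outcome_probability` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed pass probabilities (taken from the worked answers, not given explicitly).
p_pass = {"S1": Fraction(1, 2), "S2": Fraction(2, 3)}

def outcome_probability(outcome):
    """Probability of one pass/fail pattern, assuming the students are independent."""
    prob = Fraction(1)
    for student, passed in zip(p_pass, outcome):
        prob *= p_pass[student] if passed else 1 - p_pass[student]
    return prob

# Every (S1 passes?, S2 passes?) pattern with its probability.
table = {o: outcome_probability(o) for o in product([True, False], repeat=2)}

both_pass = table[(True, True)]
both_fail = table[(False, False)]
exactly_one = sum(p for o, p in table.items() if sum(o) == 1)
at_least_one = 1 - both_fail

print(both_pass, both_fail, exactly_one, at_least_one)  # 1/3 1/6 1/2 5/6
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculation.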
# PROBLEM: consider 3 students A, B, C who pass independently, with $P(A) = 1/2$, $P(B) = 2/3$, $P(C) = 3/4$. Find the probability that: all pass, all fail, exactly one passes, exactly two pass.

#1. All will pass.
$P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C) = 1/2 \cdot 2/3 \cdot 3/4 = 1/4$.

#2. All will fail.
$P(A^c \cap B^c \cap C^c) = 1/2 \cdot 1/3 \cdot 1/4 = 1/24$.

#3. Exactly one will pass.
$P(A \cap B^c \cap C^c) + P(A^c \cap B \cap C^c) + P(A^c \cap B^c \cap C) = 1/24 + 2/24 + 3/24 = 6/24 = 1/4$.

#4. Exactly two of them pass.
$P(A \cap B \cap C^c) + P(A \cap B^c \cap C) + P(A^c \cap B \cap C) = 2/24 + 3/24 + 6/24 = 11/24$.

# DATA

# TYPES OF DATA STRUCTURE
#1. Structured data: can be stored in a table.
#2. Unstructured data: cannot be stored in a table.
#3. Semi-structured data.

# DESCRIPTIVE STATISTICS: summary of the data.
Numerical measures: mean, median, mode, standard deviation, variance, range, quantile.
Graphical measures: pie chart, bar chart.

#1. Mean: average of the data.
#2. Median: middle value of the data.

# The median is more robust than the mean. A single extreme value is enough to contaminate the mean, but almost 50% of the data has to be changed to contaminate the median.
# Adding the median itself to the data set does not change the median: the median remains the same.

#3. Variance: measure of change (spread) in the data.
# Significance of variance: it tells how far the data points are from the mean.
# If the data points are very far away from the mean, the variance is high.

$\mathrm{Var}(X) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
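The robustness claim and the variance formula above can be tried with Python's `statistics` module. The numbers below are hypothetical (the figures in the original example are not legible); `statistics.variance` uses the same $n - 1$ denominator as the formula.

```python
import statistics

clean = [10, 20, 30, 40, 50]            # hypothetical data
contaminated = [10, 20, 30, 40, 5000]   # one value replaced by an outlier

# The mean moves a lot, the median does not move at all.
mean_clean = statistics.mean(clean)
mean_contaminated = statistics.mean(contaminated)
median_clean = statistics.median(clean)
median_contaminated = statistics.median(contaminated)

# Adding the median itself to the data leaves the median unchanged.
median_after_adding = statistics.median(clean + [median_clean])

# Sample variance: sum of squared distances from the mean, divided by n - 1.
x_bar = mean_clean
variance_by_hand = sum((x - x_bar) ** 2 for x in clean) / (len(clean) - 1)
variance_library = statistics.variance(clean)

print(mean_clean, mean_contaminated)      # 30 1020
print(median_clean, median_contaminated)  # 30 30
print(variance_by_hand, variance_library)
```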

# QUANTILES
#1. 1st quantile: 25% quantile.
#2. 2nd quantile: 50% quantile = median.
#3. 3rd quantile: 75% quantile.

# BOX PLOT: used to detect outliers. It is drawn from the 1st quantile and the 3rd quantile.
# IQR means the distance between the 1st and the 3rd quantile.
# Buffer: 1.5 x IQR is added on each side, which gives the data some extra spread. Points below $Q_1 - 1.5 \cdot IQR$ or above $Q_3 + 1.5 \cdot IQR$ are treated as outliers.

# PYTHON
Contents: basic algebraic operations, variables, data types, if-else condition, functions, map function, filter function, exception handling, default value allocation, iter function.

#1. BASIC ALGEBRAIC OPERATIONS
1. Addition: a + b
2. Subtraction: a - b
3. Multiplication: a * b
4. Division: a / b
5. Power: a ** b or pow(a, b)
6. Remainder: a % b
7. Quotient: a // b

#2. VARIABLES
# Naming:
1. my_name (underscore)
2. myname (small letters)
3. Myname (capital letter)

# Swapping of values.
Other languages:
    temp = a
    a = b
    b = temp
Python:
    a = 10
    b = 20
    a, b = b, a

# Problem:
    x = 10
    y = 10
    x, y = x + 10, y + 10
Output: x = 20, y = 20.

#3. DATA TYPES
1. int 2. float 3. string 4. list 5. tuple 6. set 7. dictionary

# Conversion of integer to float:
    x = 10          # type(x) is int
    x = float(x)    # x = 10.0
# Conversion of float to integer:
    x = 10.0        # type(x) is float
    x = int(x)      # x = 10

# STRING
# Collection of characters, e.g. a = "12345".
    a = "my name is Aswin"
    a.upper()        # "MY NAME IS ASWIN"
    a.lower()        # "my name is aswin"
    a.capitalize()   # "My name is aswin"

    a = "aswinkumar"
    a.index("i")     # 3
    a.count(x)       # number of times x occurs in a
    a.find(x)        # position of x in a

# Methods that return True / False: a.isupper(), a.islower(), a.isalpha(), a.isalnum().

    a = "aswin"
    b = "kumar"
    a + b            # "aswinkumar"
    a * 2            # "aswinaswin"

# Indexing operation: a[start : end : stepsize]
    a = "aswin"
    a[::-1]          # "niwsa"

# LIST
# Collection of objects.
# A list can contain any kind of datatype.
# A list can contain another list.
    a = [1, 2.5, True, "aswin", [1, 2]]

# Properties
#1. A list is mutable.
    a = [10, 20]
    a[0] = 100                # a = [100, 20]
#2. a.insert()
    a = [10, 20]
    a.insert(0, 17)           # a = [17, 10, 20]
    a = [10, 20]
    a.insert(1, [1, 2])       # a = [10, [1, 2], 20]
#3. a.append()
    a = [10, 20]
    a.append("aswin")         # a = [10, 20, "aswin"]
#4. a.pop()
    a = [17, 28, 9, 11]
    a.pop(2)                  # argument is the index; a = [17, 28, 11]
#5. a.remove()
    a = [17, 28, 9, 11]
    a.remove(11)              # argument is the value; a = [17, 28, 9]
#6. a.extend()
    a = [17, 28, 9, 11]
    a.extend([1, 2, 3])       # a = [17, 28, 9, 11, 1, 2, 3]
    a = [17, 28, 9, 11]
    a.append([1, 2, 3])       # a = [17, 28, 9, 11, [1, 2, 3]]

# TUPLE
#1. Fixed number of elements.
#2. Faster compared to a list.
#3. Immutable, i.e. it cannot be changed over time.
    a = (10, 17, 28)
    a.index(17)               # 1
    a.count(17)               # 1

# SET
# Collection of non-repetitive, well-defined elements.
# A set does not support indexing.
# Operations:
    a = {1, 2, 3}
    b = {3, 4, 5}
    a.union(b)                # {1, 2, 3, 4, 5}
    a.intersection(b)         # {3}
    a - b                     # {1, 2}
    b - a                     # {4, 5}

# DICTIONARY
    dictionary_name = {"key": "value"}
    dictionary_name.keys()    # all keys
    dictionary_name.values()  # all values
    dictionary_name.update(b) # add the entries of dictionary b

    words = {"a": ["apple", "aswin"], "b": ["bat", "ball"], "c": ["cat", "current"]}
    del words["c"][1]         # delete one value from the list under key "c"
    del words["c"]            # delete the key "c" and its values

# CONVERSION OF DATA TYPE
    a = "100"
    a = int(a)                # 100
    a = str(a)                # "100"
    a = float(a)              # 100.0

# Conversion of string to list:
    a = "aswin"
    a = list(a)               # ['a', 's', 'w', 'i', 'n']
    "".join(a)                # "aswin" (string)

# Conversion of list to tuple:
    a = [1, 2, 3]             # list
    a = tuple(a)              # (1, 2, 3) (tuple)

#4. FOR LOOP
#1. Add a number to each element.
    a = [1, 2, 3, 4]
    number = 10
    b = []
    for i in a:
        b.append(i + number)
    print(b)
#2. Multiply each element by a number.
    a = [1, 2, 3, 4]
    b = []
    for i in a:
        b.append(i * number)
    print(b)
#3. Sum of numbers.
    a = [1, 2, 3]
    total = 0
    for i in a:
        total = total + i
    print(total)
# Conversion of tuple back to list:
    a = (1, 2, 3)
    a = list(a)               # [1, 2, 3]
# Conversion of list to set and back:
    a = [1, 2, 3]
    a = set(a)
    a = list(a)

#4. Multiplication of all numbers.
    a = [1, 2, 3]
    MUL = 1
    for i in a:
        MUL = MUL * i
    print(MUL)

#5. IF-ELSE STATEMENT
# Used to check conditions.
# Prints output based on the condition, when the condition is satisfied.

# PROBLEMS
#1. Separate even and odd numbers from a list.
    a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    even = []
    odd = []
    for i in a:
        if i % 2 == 0:
            even.append(i)
        elif i % 2 == 1:
            odd.append(i)
    print(even)
    print(odd)
Output: [2, 4, 6, 8] and [1, 3, 5, 7, 9].

#2. Separate even, odd, fraction and string.
    a = [1, 2, 3, 4, 5, 6, "aswin", 10.7]
    even = []
    odd = []
    fraction = []
    string = []
    for i in a:
        if type(i) == str:
            string.append(i)
        elif type(i) == float:
            fraction.append(i)
        elif i % 2 == 0:
            even.append(i)
        else:
            odd.append(i)
Output: even = [2, 4, 6], odd = [1, 3, 5], fraction = [10.7], string = ["aswin"].
(The type checks come first, because "aswin" % 2 raises an error.)

#6. FUNCTIONS
# Set of definitions to execute an operation.
    def power(base, index):
        return base ** index

    def add(x1, x2):
        return x1 + x2

# Difference between print and return: print just shows the output; return saves the value, so it can also be used later.

#1. Even / odd.
    def even_odd(x):
        if x % 2 == 0:
            return "EVEN"
        elif x % 2 == 1:
            return "ODD"
#2. Sum of a list.
    def sumlist(x):
        total = 0
        for i in x:
            total = total + i
        return total
#3. Sum of a list with a type check.
    def sum_list(x):
        if type(x) != list:
            return "Provide list"
        total = 0
        for i in x:
            total = total + i
        return total
#4. Prime number.
    def prime(x):
        if type(x) != int:
            return "Provide integer"
        if x <= 1:
            return "NON PRIME"
        for i in range(2, x):
            if x % i == 0:
                return "NON PRIME"
        return "PRIME"
# Optimization: a divisor of x other than x itself can never be larger than x // 2, so the loop can stop there.
    def prime(x):
        if type(x) != int:
            return "Provide integer"
        if x <= 1:
            return "NON PRIME"
        if x > 2 and x % 2 == 0:
            return "NON PRIME"
        for i in range(2, (x // 2) + 1):
            if x % i == 0:
                return "NON PRIME"
        return "PRIME"
(Return "PRIME" only after the whole loop has finished; an else inside the loop would return after testing the first divisor only.)
#5. Square every element of a list.
    def fun(x):
        return [i ** 2 for i in x]
    fun([1, 2, 3])            # [1, 4, 9]

#7. LIST COMPREHENSION (LC)
# Alternative for the FOR LOOP.
# A for loop can perform all the operations of an LC, but an LC puts all the code in a single line.
For-loop version:
    a = [1, 2, 3, 4]
    even = []
    for i in a:
        if i % 2 == 0:
            even.append(i)
#1. List comprehension: even numbers.
    a = [1, 2, 3, 4]
    b = [i for i in a if i % 2 == 0]        # [2, 4]
#2. Numbers divisible by a given number n.
    b = [i for i in a if i % n == 0]
#3. Square only the odd numbers.
    L = [1, 2, 3, 4, 5, 6]
    b = [i ** 2 for i in L if i % 2 == 1]   # [1, 9, 25]

#8. MAP FUNCTION
# Applies a function to every element.
# Even / odd / fraction:
    def even_odd_frac(x):
        if x % 2 == 0:
            return "Even"
        elif x % 2 == 1:
            return "Odd"
        else:
            return "Fraction"

    L = [1, 2, 3, 4, 10.5]
    output = map(even_odd_frac, L)
    list(output)     # ["Odd", "Even", "Odd", "Even", "Fraction"]

#9. FILTER FUNCTION
    def is_even(x):
        return x % 2 == 0

    L = [1, 2, 3, 4]
    a = filter(is_even, L)
    list(a)          # [2, 4]
# The function passed to filter must return a boolean, i.e. True / False.

# ITER OBJECT
# map and filter return iterator objects.
# An iterator does not store all the output in memory; it computes the next item only when it is asked for, so it needs less memory.
    L = [i ** 2 for i in range(10)]     # list: all values are stored
    L = (i ** 2 for i in range(10))     # generator: values are produced one at a time
    next(L)          # 0
    next(L)          # 1

#10. EXCEPTION HANDLING
# Lets the program get past errors without stopping.
    def div(x, y):
        try:
            return x / y
        except Exception:
            return "ERROR"

    div(10, 2)           # 5.0
    div("Aswin", 2)      # "ERROR"

# Problem: PALINDROME
    def palindrome(x):
        if type(x) != str:
            return "Provide string"
        if x == x[::-1]:
            return "PALINDROME"
        else:
            return "NON PALINDROME"

#11. DEFAULT VALUE ASSIGNMENT
    def ADD(x=0, y=0):
        return x + y
    ADD(10)              # 10
    ADD()                # 0

    def MUL(x, y=1):
        return x * y
    MUL(10)              # 10

# STATISTICS II
Contents:
1. Random variable (discrete random variable, continuous random variable)
2. Normal distribution
3. Standard normal distribution
4. Confidence interval
5. Hypothesis testing
6. p-value
7. Two-sample t-test
8. ANOVA
9. Chi-squared test

# RANDOM VARIABLE
Any experiment whose outcome cannot be predicted is called a random experiment, and the outcomes of such experiments are called random variables.
Eg. Tossing a coin. Rolling a die.

# PROBABILITY DISTRIBUTION
#1. The sum of the probabilities of all the events in a single experiment is one: total probability = 1.
#2. A negative value of probability is not possible: $P \geq 0$.
#3. Each of the outcomes with a probability is possible.
# TYPES OF RANDOM VARIABLE
#1. Discrete RV (Bernoulli, binomial)
#2. Continuous RV

# DISCRETE RANDOM VARIABLE
A random variable which can take only a few values.

# BERNOULLI
A random variable which can take exactly two values or outputs.
Eg. Tossing a coin [Head / Tail]. Outcome of a match [Win / Loss]. Exam [Pass / Fail].

# BINOMIAL
#1. Used when the random variable can take more than two values, i.e. it counts the number of successes in a number of trials.
#2. A binomial distribution can be written as a sum of Bernoullis.

# Consider the results of 4 exams, each one pass (1) or fail (0) with probability 1/2:
Result $= X_1 + X_2 + X_3 + X_4$.

# Fail in all subjects:
$P[(X_1 = 0) \cap (X_2 = 0) \cap (X_3 = 0) \cap (X_4 = 0)] = P(X_1 = 0) \cdot P(X_2 = 0) \cdot P(X_3 = 0) \cdot P(X_4 = 0) = (1/2)^4$.

# Pass in all subjects: $(1/2)^4$.
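The "sum of Bernoullis" idea above can be checked by brute force. The sketch below assumes 4 independent exams with pass probability 1/2 each, as in the example; the function names are illustrative. It enumerates every pass/fail pattern and compares the result with the usual binomial formula $\binom{n}{k} p^k (1-p)^{n-k}$.

```python
from itertools import product
from math import comb

n, p = 4, 0.5   # 4 exams, assumed pass probability 1/2 for each

def pmf_by_enumeration(k):
    """P(exactly k passes), adding up every 0/1 pattern whose sum is k."""
    total = 0.0
    for outcome in product([0, 1], repeat=n):   # each X_i is a Bernoulli
        if sum(outcome) == k:
            prob = 1.0
            for x in outcome:
                prob *= p if x == 1 else 1 - p
            total += prob
    return total

def pmf_by_formula(k):
    """Binomial probability mass function."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for k in range(n + 1):
    print(k, pmf_by_enumeration(k), pmf_by_formula(k))
```

Failing all subjects is the `k = 0` row and passing all subjects is the `k = 4` row; both equal $(1/2)^4 = 0.0625$.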
# CONTINUOUS RANDOM VARIABLE
# The probability of a single event or value is approximately zero in the continuous case.
Eg. P(temperature is exactly 27.1 tomorrow at 12:00 noon) $\approx 0$.
# For a continuous RV the probability is given over a range / interval.
Eg. P(temperature lies in (27.0, 29.0)).

# DENSITY CURVE
#1. Shows the regions which have high probability.
#2. Total area under the curve = 1.
#3. The probability of an interval $(a, b)$ is the area under the curve between $a$ and $b$: $P(a < X < b)$.

# NORMAL DISTRIBUTION: BELL SHAPED CURVE
# PROPERTIES
#1. Symmetrical curve around the mean / median ($\mu$). For any symmetrical data, mean = median.
#2. Spread of the curve: high spread = high variance, less spread = low variance.
# When the spread is less, the variance is less, so the curve grows taller to keep the area equal to 1.
# When the spread is high, the variance is high, so the curve has a lower height.

# STANDARD NORMAL DISTRIBUTION
$Z \sim N(0, 1)$: normal distribution with mean zero and variance one.

# PROPERTIES, for $X \sim N(\mu, \sigma^2)$:
#1. Shifting: $X + c \sim N(\mu + c, \sigma^2)$. The curve shifts by $c$ units without changing its shape: the mean changes by $c$ units, the variance remains the same.
#2. Scaling: $cX \sim N(c\mu, c^2\sigma^2)$. The mean changes $c$ times, the variance changes $c^2$ times.
#3. Addition: for independent $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, $X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.

# Any normal distribution can be converted into the standard normal distribution by subtracting its mean and dividing by its standard deviation:
$Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.

# PROBLEM: $\mu = 50$, $\sigma^2 = 400$ (so $\sigma = 20$).
Convert to the standard normal: $\dfrac{70 - 50}{20} = 1$ and $\dfrac{30 - 50}{20} = -1$, so
$P(30 < X < 70) = P(-1 < Z < 1) \approx 0.68$.

# INFERENTIAL STATISTICS
# To understand the properties of a population it is impossible to study all the individuals.
# So we take a sample from the population and use it to study the properties of the population.
# Sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$.

# t-DISTRIBUTION
# Compared with the standard normal distribution, the t-distribution has heavier (higher) tails.
# DEGREES OF FREEDOM: number of values which can be chosen independently.
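The standardisation step in the problem above ($\mu = 50$, $\sigma = 20$) can be checked with `statistics.NormalDist` from the standard library. This is a sketch that assumes the interval of interest is 30 to 70, i.e. one standard deviation either side of the mean.

```python
from statistics import NormalDist

mu, sigma = 50, 20           # sigma^2 = 400
low, high = 30, 70           # assumed interval: mean +/- one standard deviation

# Direct calculation on X ~ N(50, 20^2).
X = NormalDist(mu, sigma)
p_direct = X.cdf(high) - X.cdf(low)

# Same probability after converting to Z = (X - mu) / sigma ~ N(0, 1).
z_low = (low - mu) / sigma
z_high = (high - mu) / sigma
Z = NormalDist()             # mean 0, standard deviation 1
p_standard = Z.cdf(z_high) - Z.cdf(z_low)

print(z_low, z_high)                              # -1.0 1.0
print(round(p_direct, 4), round(p_standard, 4))   # both about 0.6827
```

Both routes give the same number, which is the point of standardising: one table (or one `NormalDist()`) serves every normal distribution.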
# For a larger amount of data the dof is high, and in that case $t \approx z$.
# For $n \geq 30$: use either the normal or the t-distribution.
# For $n < 30$: if $\sigma^2$ is known use the normal distribution; if $\sigma^2$ is unknown use the t-distribution.

# CHI-SQUARE DISTRIBUTION (CSD)
# The square of a standard normal variable follows the chi-square distribution.
$X_1 \sim N(0, 1) \Rightarrow X_1^2 \sim \chi^2$ with dof = 1
$X_1^2 + X_2^2 \sim \chi^2$ with dof = 2
$X_1^2 + X_2^2 + \dots + X_n^2 \sim \chi^2$ with dof = $n$

# F-DISTRIBUTION
# Ratio of two chi-square distributions, each divided by its dof.
# Consider $\chi^2_{n_1}$ with $n_1$ dof and $\chi^2_{n_2}$ with $n_2$ dof:
$F = \dfrac{\chi^2_{n_1} / n_1}{\chi^2_{n_2} / n_2} \sim F(n_1, n_2)$

# FLOW CHART: normal $\to$ chi-square (sum of squares) $\to$ F (ratio of chi-squares).

# HYPOTHESIS TESTING
# DEFINITION: hypothesis testing is a method of inferential statistics used to decide on a possible conclusion which has to be statistically proven or disproven.

# TYPES: there are two types of hypothesis.
#1. NULL HYPOTHESIS: the existing belief; the result or outcome is already known or accepted.
Eg. The existing drug is better than the new drug.
#2. ALTERNATE HYPOTHESIS: a new event or experiment; the outcome of this new experiment has to be proven.
Eg. The new drug is better than the existing drug.

# PURPOSE OF HYPOTHESIS TEST
# The purpose is to test whether the null hypothesis can be rejected or not.
# If the null hypothesis is rejected, the research (alternate) hypothesis can be accepted.

# HYPOTHESIS TYPES
#1. TYPE 1: $H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$.
#2. TYPE 2: $H_0: \mu \leq \mu_0$, $H_a: \mu > \mu_0$.
#3. TYPE 3: $H_0: \mu \geq \mu_0$, $H_a: \mu < \mu_0$.

# Use the t-distribution in place of the z-distribution only when the degrees of freedom are less than 30.

# TEST STATISTIC
$Z = \dfrac{\bar{X}_n - \mu_0}{\sigma / \sqrt{n}}$
For a one-sided test at 95% confidence the critical value is 1.64.
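A minimal sketch of the test statistic above for a Type 2 test ($H_a: \mu > \mu_0$), assuming $\sigma$ is known. The sample values and the function names are hypothetical, chosen so that the arithmetic is easy to follow by hand.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_statistic(sample, mu0, sigma):
    """Z = (sample mean - mu0) / (sigma / sqrt(n)), with sigma assumed known."""
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

def upper_tail_p_value(z):
    """P(Z >= z) under the null hypothesis, for H_a: mu > mu0."""
    return 1 - NormalDist().cdf(z)

sample = [50, 54] * 8          # hypothetical data: n = 16, sample mean = 52
mu0, sigma = 50, 4             # hypothetical null value and known sigma

z = z_statistic(sample, mu0, sigma)        # (52 - 50) / (4 / 4) = 2.0
critical = NormalDist().inv_cdf(0.95)      # about 1.645 (rounded to 1.64 in the notes)
p = upper_tail_p_value(z)
reject_h0 = z > critical

print(z, round(critical, 3), round(p, 4), reject_h0)
```

Comparing `z` with the critical value and comparing `p` with 0.05 are two views of the same decision.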
# DERIVATION
Consider data $x_1, x_2, \dots, x_n \sim N(\mu, \sigma^2)$. Then $\bar{X}_n \sim N(\mu, \sigma^2 / n)$.
Assume the null hypothesis is true, i.e. $\mu = \mu_0$.
Subtract $\mu_0$ on both sides and divide by $\sigma / \sqrt{n}$:
$\dfrac{\bar{X}_n - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$

# Decision for Type 3 ($H_a: \mu < \mu_0$):
# NULL HYPOTHESIS TRUE: if the value of $\dfrac{\bar{X}_n - \mu_0}{\sigma / \sqrt{n}}$ is more than $-1.64$, the null hypothesis is not rejected. Since $Z \sim N(0, 1)$, we can say $Z \geq -1.64$ with 95% confidence.
# NULL HYPOTHESIS NOT TRUE: if the value is less than $-1.64$, the null hypothesis is rejected and the alternative hypothesis is accepted.

# FOR $\sigma^2$ UNKNOWN
$T = \dfrac{\bar{X}_n - \mu_0}{S / \sqrt{n}}$, where $S^2$ is the sample variance. $T$ follows the t-distribution with $n - 1$ dof.

# CALL CENTER PROBLEM
[Worked example; the figures are not legible.]

# p-VALUE
# In null hypothesis testing, the p-value is the probability of obtaining test results at least as extreme as the observed results, under the assumption that the null hypothesis is true.
# Eg. Drug test: we may need 99.99% confidence to say that the new drug is better than the existing drug. For 95% confidence the range is roughly $-2$ to $+2$. In such cases making the decision is critical, so instead of reporting only the decision we report the p-value.
# The range of the p-value is 0 to 1.
# If the p-value is low: the data are unlikely under the null hypothesis, so the null hypothesis is rejected.
# If the p-value is high: the data are consistent with the null hypothesis, so it is not rejected.

# p-VALUE FOR THE THREE TYPES, with observed test statistic $T_s = \dfrac{\bar{X}_n - \mu_0}{\sigma / \sqrt{n}}$:
#1. Type 1 ($H_a: \mu \neq \mu_0$): $p = P(|Z| \geq |T_s|)$.
#2. Type 2 ($H_a: \mu > \mu_0$): $p = P(Z \geq T_s)$.
#3. Type 3 ($H_a: \mu < \mu_0$): $p = P(Z \leq T_s)$.

# TWO SAMPLE t-TEST
DATA 1: $x_1, x_2, \dots, x_n \sim N(\mu_1, \sigma^2)$
DATA 2: $y_1, y_2, \dots, y_n \sim N(\mu_2, \sigma^2)$
$\bar{X}_n - \bar{Y}_n \sim N\left(\mu_1 - \mu_2, \dfrac{2\sigma^2}{n}\right)$
$\dfrac{(\bar{X}_n - \bar{Y}_n) - (\mu_1 - \mu_2)}{\sigma \sqrt{2 / n}} \sim N(0, 1)$
# Range for 95%: $-2 \leq z \leq 2$.

# ANOVA
# Analysis of variance.
# In the two-sample t-test we compared two products or data sets. When we have to compare more than two data sets, we use ANOVA.
# ANOVA is a tool used to compare several products, eg. Car 1, Car 2, Car 3, Car 4.

# CASES
# In case 1 and case 2 the absolute distance between the means is the same.

# BASIC THEORY
#1. Consider 3 data sets C1, C2, C3 and check the distributions of C1, C2, C3.
#2. In case 1 the variance is low, so the plots of C1, C2, C3 are clearly separated and we conclude that the means are not the same.
#3. In case 2 the variance is high, so the plots overlap and we conclude that the means are the same, even though the distance between the means is the same as in case 1.
#4. So the concept of measuring only the absolute distance between the means fails.
#5. To overcome this we use the relative distance.

# CONCEPT OF RELATIVE DISTANCE
# Consider both the distance between the distributions and their variance.
# If the distance between the means of the distributions is small compared to the variance of the distributions, we conclude that the means are the same.
# In other words: if the inter-group distance is small compared to the intra-group distance, the means are the same.

# INTER-GROUP DISTANCE FOR 3 DISTRIBUTIONS
Consider $X_1, \dots, X_n$ with $\mu_1 = \dfrac{X_1 + X_2 + \dots + X_n}{n}$, $Y_1, \dots, Y_n$ with mean $\mu_2$, and $Z_1, \dots, Z_n$ with mean $\mu_3$.
Calculate the overall mean $\mu = \dfrac{\mu_1 + \mu_2 + \mu_3}{3}$.
Inter-group distance $= (\mu_1 - \mu)^2 + (\mu_2 - \mu)^2 + (\mu_3 - \mu)^2$.
# For $k$ distributions: inter-group distance $= \sum_{j=1}^{k} (\mu_j - \mu)^2$.
# The inter-group distance is also called the distance between groups.
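To make the relative-distance idea concrete, the sketch below compares a between-group spread with a within-group spread for two made-up cases whose group means are identical. It uses the usual one-way ANOVA normalisation (between-group sum of squares weighted by group size and divided by $k - 1$, within-group sum of squares divided by $N - k$); that normalisation and the name `f_ratio` are assumptions brought in from the textbook definition, not something the notes spell out.

```python
from statistics import mean

def f_ratio(groups):
    """Between-group mean square divided by within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Inter-group: how far each group mean is from the overall mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Intra-group: how far the points are from their own group mean.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_total - k)
    return between / within

# Case 1: low variance inside each group (hypothetical data, means 2, 3, 7).
tight = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
# Case 2: same group means, but high variance inside each group.
wide = [[-8, 2, 12], [-7, 3, 13], [-3, 7, 17]]

print(f_ratio(tight))   # large ratio: the means look different
print(f_ratio(wide))    # small ratio: the means look the same
```

The absolute distances between the means are the same in both cases; only the ratio separates them.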
# INTRA-GROUP DISTANCE
# Distances inside each group:
Group 1: $(x_1 - \mu_1)^2 + \dots + (x_n - \mu_1)^2$
Group 2: $(y_1 - \mu_2)^2 + \dots + (y_n - \mu_2)^2$
Group 3: $(z_1 - \mu_3)^2 + \dots + (z_n - \mu_3)^2$
# Average intra-group distance:
$\dfrac{\sum (x_i - \mu_1)^2 + \sum (y_i - \mu_2)^2 + \sum (z_i - \mu_3)^2}{3(n - 1)}$
# The intra-group distance is also called the distance within groups.

# F-RATIO
F-ratio $= \dfrac{\text{between-group distance}}{\text{within-group distance}}$
Null hypothesis $H_0$: $\mu_1 = \mu_2 = \mu_3$.
Alternate hypothesis $H_a$: the means are not all equal.
# F-ratio is small (between $\ll$ within): accept $H_0$. Null hypothesis: high p-value.
# F-ratio is large (within $\ll$ between): reject $H_0$. Alternate hypothesis: low p-value.

# CHI-SQUARE TEST
# A testing technique for categorical variables.

# DEFINITION: the chi-square test of independence checks whether two qualitative variables are independent, i.e. whether a relationship exists between the categorical values.
# NULL HYPOTHESIS $H_0$: the variables are independent. There is no relationship between the two categorical variables; knowing the value of one variable does not help in predicting the value of the other.
# ALTERNATE HYPOTHESIS $H_1$: the variables are dependent, i.e. there is a relationship between the two categorical variables.

# PROBLEM: we have two categorical variables, smoker (S / NS) and athlete (A / NA), for 28 people. Check whether smoker and athlete are dependent or not.
$P(S) = 14/28$, $P(NS) = 14/28$, $P(A) = 18/28$, $P(NA) = 10/28$.
# Expected values under independence:
$E(A \cap S) = P(A) \cdot P(S) \cdot 28 = \dfrac{18}{28} \cdot \dfrac{14}{28} \cdot 28 = 9$
$E(A \cap NS) = P(A) \cdot P(NS) \cdot 28 = 9$
$E(NA \cap S) = P(NA) \cdot P(S) \cdot 28 = 5$
$E(NA \cap NS) = P(NA) \cdot P(NS) \cdot 28 = 5$

# Distance between the observed values and the expected values:
$\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$, where $O_i$ = observed and $E_i$ = expected.
# dof = (no. of rows $- 1$)(no. of columns $- 1$).
# Find $\chi^2$ and calculate the p-value.
# p is high: accept $H_0$.
# p-value $\leq 0.05$: reject the null hypothesis.

# EXPLORATORY DATA ANALYSIS (EDA)
Contents: basic EDA, data type conversion, duplicates, drop columns, rename the columns, outlier detection, missing values imputation, scatter plot and correlation analysis, transformation.

#1. BASIC EDA
    Data.dtypes      # data type of each column
    Data.shape       # number of rows and columns
    type(Data)

#2. DATATYPE CONVERSION
    Data.info()
    Data.isna().sum()      # number of null values per column
    Data["col_name"] = pd.to_numeric(Data["col_name"], errors="coerce")
    Data["col_name"] = Data["col_name"].astype("category")
# errors="coerce" changes the value to numeric if possible; if that is not possible it puts a null value.

#3. DUPLICATES
# Identical (repeated) rows are called duplicated rows.
    Data.duplicated()              # True / False for every row
    Data[Data.duplicated()]        # display the duplicated rows
    Data[~Data.duplicated()]       # remove the duplicated rows (~ turns True into False and False into True)
    Data.drop_duplicates()         # easy way to delete duplicates

#4. DROP COLUMNS
    Data.drop([1, 8], axis=0)              # axis=0: rows
    Data.drop(["col_name"], axis=1)        # axis=1: columns

#5. RENAME THE COLUMNS
    Data.rename(columns={"old_name": "new_name"})

#6. MISSING VALUES: mean imputation
    for i in Data.columns:
        try:
            mean = Data[i].mean()
            Data[i] = Data[i].fillna(mean)
        except Exception:
            pass

# SCALING
#1. STANDARD SCALER
Consider $x_1, x_2, \dots, x_n$ with mean $\mu$ and standard deviation $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{n}}$.
Scaled value: $\dfrac{x_i - \mu}{\sigma}$. This converts the data to the standard scale of the normal distribution (mean 0, standard deviation 1).
    array = Data.values
    scaler = StandardScaler().fit(array)
    rescaled = scaler.transform(array)

#2. MIN MAX SCALER
Consider $x_1, x_2, \dots, x_n$. Find $a = \max(x_1, \dots, x_n)$ and $b = \min(x_1, \dots, x_n)$.
Scaled value: $\dfrac{x_i - b}{a - b}$. This converts the data to the range $(0, 1)$, since $b \leq x_i \leq a \Rightarrow 0 \leq x_i - b \leq a - b \Rightarrow 0 \leq \dfrac{x_i - b}{a - b} \leq 1$.
    array = Data.values
    scaler = MinMaxScaler()
    rescaled = scaler.fit_transform(array)

#3. ROBUST SCALER
# The presence of outliers can contaminate the mean of the data, so we use the median instead of the mean.
# Instead of the standard deviation we use the IQR $= Q_3 - Q_1$, the distance between the 25% quantile and the 75% quantile.
Scaled value: $\dfrac{x_i - \text{median}}{IQR}$.

# MODEL
# A model is used to draw inference about the future or the unknown.
# Types: 1. statistical models, 2. machine learning models, 3. deep learning models.

# REGRESSION
# TYPES OF REGRESSION MODEL
1. SIMPLE LINEAR REGRESSION
2. MULTIPLE LINEAR REGRESSION

# SIMPLE LINEAR REGRESSION
Consider the house price problem.
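The three scaling formulas given earlier on this page (standard, min-max, robust) can be written out directly, which makes it easier to see what each one does to an outlier. This is a plain-Python sketch with hypothetical data, not the scikit-learn implementation; the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is one of several quartile conventions.

```python
import statistics

def standard_scale(xs):
    """(x - mean) / standard deviation, with the population formula (divide by n)."""
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

def min_max_scale(xs):
    """(x - min) / (max - min): every value ends up between 0 and 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def robust_scale(xs):
    """(x - median) / IQR: the median and the IQR barely react to one outlier."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in xs]

data = [1, 2, 3, 4, 100]      # hypothetical data with one outlier

standard = standard_scale(data)
min_max = min_max_scale(data)
robust = robust_scale(data)

print(standard)
print(min_max)   # the outlier squeezes the other values close to 0
print(robust)    # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

With min-max scaling the four ordinary values are pushed into a narrow band near 0, while the robust scaler keeps them evenly spaced around 0.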
# PROPERTIES OF A STRAIGHT LINE
$y = mx + c$, where $m$ = coefficient (slope) and $c$ = intercept.

# MODEL TYPES
1. REGRESSION: predicts a continuous variable / data. Eg. temperature (2024).
2. CLASSIFICATION: predicts a categorical variable / data. Eg. World Cup winner (2024).

# REGRESSION MODEL
# A model that predicts any continuous variable is called a regression model.
Eg. Temperature, height, price, weight.

# PROPERTIES OF LINEAR REGRESSION
#1. The linear model says that when two variables have a positive or a negative correlation, we can predict the values using a line.
#2. STAGES OF A PREDICTION PROBLEM:
1. FIND CORRELATION
2. CREATE MODEL
3. FIND $R^2$ VALUE

# BEST LINE FOR A LINEAR MODEL
# The difference between the actual value and the predicted value is called the loss:
loss $= (y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \dots + (y_n - \hat{y}_n)^2$
# The BEST MODEL is the one with minimum loss: it makes the actual values and the predicted values close to each other.
# The predicted value is $\hat{y}_i = a x_i + b$, with $a$ = slope and $b$ = intercept, so
loss $= \sum_{i=1}^{n} (y_i - a x_i - b)^2$
# The values of $y_i$ and $x_i$ are known; $a$ and $b$ are unknown.
# Find $a$ and $b$ such that they give the minimum loss.

# HOW TO FIND THE MINIMUM
Consider $f(x) = x^2$ and find the point where $f$ takes its minimum value. The derivative of $x^2$ is $2x$; setting $2x = 0$ gives $x = 0$, so $f$ is minimum at $x = 0$.
Similarly, for $\sum (y_i - a x_i - b)^2$ the minimising $a$ and $b$ are found from
$\dfrac{\partial\, \text{loss}}{\partial a} = 0$ and $\dfrac{\partial\, \text{loss}}{\partial b} = 0$.
If this distance (loss) is minimum, we have the best line.

# $R^2$: OLD LOSS AND NEW LOSS
# Old loss: loss of the average model, which always predicts $\bar{y}$. New loss: loss of the fitted model.
# Good model: new loss $\ll$ old loss, so $R^2 \approx 1$.
# Bad model: new loss $\approx$ old loss, so $R^2 \approx 0$.
#1. If the new model is much better than the old model, the value of the new loss is small compared to the old loss.
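Setting the two partial derivatives of the loss to zero gives the standard closed-form least-squares solution $a = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b = \bar{y} - a\bar{x}$. The sketch below applies it to a few hypothetical points; `fit_line` and `loss` are illustrative names.

```python
def fit_line(xs, ys):
    """Slope a and intercept b that minimise sum((y - a*x - b)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    return a, b

def loss(xs, ys, a, b):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]          # hypothetical feature values
ys = [3, 5, 7, 9]          # generated from y = 2x + 1

a, b = fit_line(xs, ys)
print(a, b)                                   # 2.0 1.0
print(loss(xs, ys, a, b))                     # 0.0
print(loss(xs, ys, a + 0.1, b))               # any other line has a larger loss
```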
#2. If the model is only a little better than the average model, the value of the new loss is almost equal to the old loss.

# $R^2$ METRIC
# $R^2$ is used to measure the accuracy of the model. It is called the COEFFICIENT OF DETERMINATION.
# Range: $0 \leq R^2 \leq 1$.
# $R^2 \approx 1$: GOOD MODEL. $R^2 \approx 0$: BAD MODEL.

# CALCULATION OF $R^2$ VALUE
Data columns: $x_i$, ACTUAL $y_i$, AVERAGE $\bar{y}$, PREDICTED $\hat{y}_i$.
Old loss $= \sum (y_i - \bar{y})^2$
New loss $= \sum (y_i - \hat{y}_i)^2$
$R^2 = 1 - \dfrac{\text{new loss}}{\text{old loss}}$

# CORRELATION (X vs Y)
# Range: $-1 \leq \text{corr} \leq +1$.
# Correlation $-1$: strong negative correlation.
# Correlation $+1$: strong positive correlation.
# Correlation $0$: no correlation.
$\text{corr} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$

# MULTIPLE LINEAR REGRESSION (MLR)
# In MLR we have multiple features or attributes.
# DRAWBACK OF SLR: SLR consists of one feature ($x$), $y = ax + b$, but a quantity such as price depends on several features. This is overcome by MLR.
# SLR model: $y = ax + b$.
# MLR model: $y = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b$.

# ASSUMPTIONS
#1. ERROR $\sim N(0, \sigma^2)$
# Actual $-$ predicted = error, i.e. $y = ax + b + e$.
# The error comes from a normal distribution $N(0, \sigma^2)$.
# The error can be positive as well as negative, and high as well as low.
# On average the value of the error should be approximately equal to zero.

#2. ASSUMPTION ON FEATURES (X): no collinearity
# Suppose $y = 2x_1$ and $x_2 = x_1$. Then $y = 2x_2$ and $y = x_1 + x_2$ are also valid.
# All three regression models are correct and give the same predictions.
# When two features are collinear there are multiple representations, so finding a unique model is difficult.
# COLLINEARITY: two features are dependent or collinear. Result: delete one of them.

# DETECTION OF COLLINEARITY
1. PAIR PLOT
2. VIF (VARIANCE INFLATION FACTOR)

# HOW TO CHECK whether $x_1, x_2, x_3, \dots, x_n$ are independent or not?
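The $R^2$ and correlation formulas above can be computed by hand on a small data set. The sketch below uses hypothetical numbers and a least-squares line fitted inside the code; for simple linear regression fitted this way, $R^2$ comes out equal to the square of the correlation, which gives a handy cross-check.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]       # hypothetical feature values
ys = [2, 4, 5, 4, 5]       # hypothetical actual values

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares line, used to get the predicted values.
a = s_xy / s_xx
b = y_bar - a * x_bar
predicted = [a * x + b for x in xs]

old_loss = s_yy                                            # average model: always predict y_bar
new_loss = sum((y - p) ** 2 for y, p in zip(ys, predicted))
r_squared = 1 - new_loss / old_loss

corr = s_xy / sqrt(s_xx * s_yy)

print(round(r_squared, 4), round(corr, 4), round(corr ** 2, 4))   # 0.6 0.7746 0.6
```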
# VARIANCE INFLATION FACTOR (VIF)
# Predict $X_1$ using the other features: $X_1 = a_2 X_2 + a_3 X_3 + \dots + b$, and calculate $R^2$.
# If $R^2$ is high, this model is good.
# If the model is good, $X_1$ can be predicted using the other features, so $X_1$ is related to (dependent on) the other features.
# Similarly, perform this for every other feature.
# Remove the features which have a high $R^2$ value.

$VIF = \dfrac{1}{1 - R^2}$
# If $R^2$ is high, the VIF is also high. Since $0 \leq R^2 < 1$, we have $0 < 1 - R^2 \leq 1$, so $VIF \geq 1$.
# If the VIF is greater than about 10 to 20, the particular feature is dependent on / related to the other features.
# Threshold: $R^2 > 0.9$ means DEPENDENT FEATURE (this corresponds to $VIF > 10$).

#3. ERROR IS INDEPENDENT
# The error is independent of the features.
# The error is uncorrelated with the features, because all the errors are random.
# How to find whether $e$ and $x$ are uncorrelated? Use a scatter plot of $e$ against $x$: the correlation in the plot should be zero (null correlation).

# HOW TO TEST WHETHER A DISTRIBUTION COMES FROM A NORMAL DISTRIBUTION?
#1. Assume that the data $e_1, e_2, \dots, e_n \sim N(0, \sigma^2)$. We want to check that the data really are normally distributed.
#2. Standardize the data: divide by the standard deviation, so that $e_1/\sigma, e_2/\sigma, \dots, e_n/\sigma \sim N(0, 1)$.
#3. Q-Q PLOT (quantile-quantile plot):
EXPECTED: $z_1, z_2, \dots, z_n \sim N(0, 1)$.
ACTUAL: the standardized errors.
# Since the expected and the actual data are both assumed to come from $N(0, 1)$, their properties are also assumed to be the same.
# The quantile values of the expected data and the actual data are compared; if they are the same (the points lie on a straight line), the data actually come from a normal distribution.

#4. TEST WHETHER THE MODEL IS ACTUALLY LINEAR
# How to check whether $y$ is linear in $x_1$? Check the relationship between $y$ and $x_1$ after removing the effect of the other features:
1. Remove $x_1$ and perform linear regression of $y$ on the remaining features; keep the error $e_y$.
2. Perform linear regression of $x_1$ on the remaining features; keep the error $e_{x_1}$.
3. Plot $e_y$ against $e_{x_1}$.

#5. CONSTANT VARIANCE (homoscedasticity)
# $e_i \sim N(0, \sigma^2)$ for every observation: the variance of the error should not change.
# If $e_y$ and $e_{x_1}$ have a correlation, we can conclude that $y$ and $x_1$ also have a correlation.

# TYPES OF RESIDUAL PLOTS (error against the predicted value or a feature)
#1. Plot 1: the value increases but there is no change in the error. This is the desired case.
#2. Plots 2 and 3: the range of the error increases or decreases as the value changes. The variance is not constant.
#3. Plot 4: the error follows a curve. The mean of the error changes, so the relationship is not linear.

# OUTLIERS (deletion diagnostics)
# This is used to detect influential outliers in the data.
# A box plot can detect an outlier in a single variable.
# The presence of an outlier in the data shifts the linear model: the fitted line with the outlier is different from the line without it.
# So, to detect such an outlier, we delete each data point one at a time, from the first to the last, refit the model, and measure the change in the model.

# LOGISTIC REGRESSION
# Logistic regression is used to predict a categorical value.
# The value of $Y$ is either 0 or 1.
# Can we use linear regression to predict a categorical value?

# PROBLEM 1: HIGH ERROR (RESIDUAL)
# The reason for the high residuals is fitting a line to 0/1 data.
# To solve this problem, replace the line with a curve.

# PROBLEM 2: INTERPRETATION PROBLEM
# We are predicting categorical data using a continuous value.
# The only known values are $Y = 0$ and $Y = 1$; a prediction such as $Y = 0.6$ has no meaning.

# SOLUTION OF THE INTERPRETATION PROBLEM
# Instead of predicting $Y$, predict the probability $P(Y = 1)$ or $P(Y = 0)$.
# Eg. for some value of $x$ we get 0.7: $P(Y = 1)$ is 70% and $P(Y = 0)$ is 30%. $P(Y = 1)$ is higher than $P(Y = 0)$, so we predict 1.

# STEPS
#1. Replace the line with a (continuous) curve.
Predict P(y=1) and P(y=0) instead of Y.
# The range of the sigmoid is (0, 1), so its values can be read as probabilities.
# Why sigmoid (S-shaped curve)?
Ans: Since the range of the sigmoid (0, 1) is the same as the range of a probability (0, 1).
# Equation of sigmoid: P(y=1) = 1 / (1 + e^-(a·x + b))
# Change in a: scaling factor (changes the slope). Change in b: shift.
# CURVE SHAPE: the shape of the curve depends on a and b. By changing a and b, the curve changes.
# Parameters a and b should be found such that the loss function is minimum, i.e. such that the curve is a good fit for the data.

LOSS FUNCTION
# Good fit means: when the actual y = 1, P(y=1) should be high; when the actual y = 0, P(y=0) should be high.
# Likelihood = P(y=1)^y · (1 - P(y=1))^(1-y)
# Find a and b such that the likelihood is maximum: max [ y·log P(y=1) + (1 - y)·log(1 - P(y=1)) ]
# Equivalently, the error or loss is minimum: min -[ y·log P(y=1) + (1 - y)·log(1 - P(y=1)) ]
# This loss for categorical data (classification) is the BINARY CROSS ENTROPY.

# PROBLEM WITH ACCURACY
# When the class distribution is uneven (a large number of 0s and very few 1s), a model that predicts 0 every time still gets a high accuracy.
# Eg: cancer prediction, fraud prediction. The accuracy looks high, but no positive case is found.
# Accuracy = (TP + TN) / (TP + TN + FP + FN)
# RECALL:
# Recall (class 1) = TP / (TP + FN)
# Recall (class 0) = TN / (TN + FP)
# PRECISION:
# Precision (class 1) = TP / (TP + FP)

WITH MULTIPLE FEATURES
P(y=1) = 1 / (1 + e^-(a1·x1 + a2·x2 + ... + an·xn + b))
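The sigmoid, the binary cross entropy loss and the accuracy problem described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the tiny 9-to-1 data set are hypothetical and are not taken from the notes.

```python
import math

def sigmoid(x, a=1.0, b=0.0):
    # P(y=1) = 1 / (1 + e^-(a*x + b)); a scales the slope, b shifts the curve
    return 1.0 / (1.0 + math.exp(-(a * x + b)))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # mean of -[y*log(p) + (1-y)*log(1-p)] over all samples
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def confusion_counts(y_true, y_pred):
    # returns (TP, TN, FP, FN) with class 1 as the positive class
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical uneven data: nine 0s and one 1; the "model" predicts 0 every time.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
tp, tn, fp, fn = confusion_counts(y_true, y_pred)

print("accuracy:", accuracy(tp, tn, fp, fn))
print("recall (class 1):", recall(tp, fn))
print("loss, confident and right:", binary_cross_entropy([1, 0], [0.9, 0.1]))
print("loss, confident and wrong:", binary_cross_entropy([1, 0], [0.1, 0.9]))
```

The all-zero predictor scores well on accuracy while its recall for class 1 is zero, which is the reason recall and precision are reported alongside accuracy.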
# Accuracy: misleading when the distribution of the data is uneven.
# Recall: predict all the profitable stocks (find every actual positive).
# Precision: the stocks predicted by me should be profitable.
# For more than two classes (0, 1, 2) use the confusion matrix.

MODEL
# SUPERVISED: REGRESSION, CLASSIFICATION
# CLUSTERING
# RECOMMENDATION
# DIMENSIONALITY REDUCTION: PCA & t-SNE

# CLUSTERING
# Grouping of data into discrete groups.
# Why do we need a math algorithm to cluster the data instead of manual clustering?
# Plotting of two features (X vs Y) can be performed manually.
# Manual plotting is possible only for 2 or 3 features.
# When we have multiple features, plotting is impossible.
# Why clustering?
# Clustering is necessary to understand the different groups in the data. The algorithm can detect the structure of the data and divide the data into groups.
# Eg: Bank customer data. Understand the different types of customers: profitable customers, defaulters.
# Example: customer segregation. Find which customers are expected to unsubscribe.

# CLUSTERING ALGORITHMS
HIERARCHICAL ALGORITHM
DBSCAN ALGORITHM
K-MEANS ALGORITHM

#1. HIERARCHICAL ALGORITHM
# Clustering the data points: points in the same group possess similar characteristics or properties.
EXAMPLE: consider 5 data points.
Groups: {1} {2} {3} {4} {5}
# At each step the two closest groups are merged, e.g. {1,2} {3,4} {5}, until all the points are in one group.
Groups: {1,2,3,4,5}

(1) HOW TO CALCULATE DISTANCE BETWEEN TWO POINTS
(a) EUCLIDEAN (L2) DISTANCE
AB² = AC² + BC²
AB² = (xa - xb)² + (ya - yb)²
d = √((xa - xb)² + (ya - yb)²)
(b) MANHATTAN (L1) DISTANCE
d = |xa - xb| + |ya - yb|
# L2: distance is measured directly from A to B.
# L1: distance is measured from A to C and from C to B.
# L1 distance is generally higher than L2 distance (it is never lower).

(2) HOW TO CALCULATE DISTANCE BETWEEN A POINT AND A GROUP
#1. Average linkage
#2. Complete linkage

#2. K-MEANS ALGORITHM
# The K-Means algorithm uses a predefined number of clusters to divide the data.
# Scaling the data is necessary.
# STEPS:
# The number of clusters to create should be specified. Suppose the number of clusters is three, N = 3.
#1. Randomly choose three points as the cluster centres.
#2. Calculate the distance between each cluster centre and each data point, and assign each data point to the nearest cluster centre.
#3. Update the cluster centres.
#4. Repeat steps 2 and 3 until the cluster centres converge.

# HOW TO CALCULATE DISTANCE BETWEEN A CLUSTER AND A DATA POINT (d1, d2, d3 are the distances from the point to the points of the cluster)
# Average linkage: avg(d1, d2, d3)
# Complete linkage: max(d1, d2, d3)
#3) Single linkage: min(d1, d2, d3)

# DENDROGRAM
[Figure: dendrogram with horizontal cut lines marking different numbers of clusters]

# DBSCAN
# DBSCAN is called Density Based Clustering: it groups points according to the density of points.
# A point that has at least the minimum number of points within its radius is a core point.
# If there is any fine structure in the data, DBSCAN can detect that structure.

# HOW TO FIND THE OPTIMAL NUMBER OF CLUSTERS
cc - cluster centre, DP - data point
# How the technique works: measure the average distance between cc and DP. This provides a score for each number of clusters formed.
# As the number of clusters increases, the distance between cc and DP keeps decreasing, so the smallest distance alone cannot pick the number of clusters.
# To overcome this problem we use the elbow method: the point where the curve bends is taken as the optimal number of clusters for this case.
CRITERIA FOR ELBOW METHOD
#1. Find that number of clusters where there is the maximum decrease in distance between cc and the data points.

SILHOUETTE SCORE
#1. Pick a point P1 inside the cluster.
#2. Choose a point in the cluster such that it has the maximum distance from P1 (intra-cluster max distance).
#3. Calculate another distance, the minimum distance between the point P1 and a point outside the cluster.
a - max distance within the cluster
b - min distance outside the cluster
# In the standard definition, a is the mean distance from P1 to the other points of its own cluster and b is the mean distance from P1 to the points of the nearest other cluster.
# GOOD CLUSTER: b > a
SCORE = (b - a) / max(a, b)
# For a good cluster: b >> a, SCORE = (b - a) / b ≈ 1
# For a bad cluster: a >> b,
SCORE = (b - a) / a ≈ -1
RANGE OF SCORE: -1 ≤ SCORE ≤ 1

CALINSKI HARABASZ SCORE
#1. Good cluster: the distance between cc and DP is small → high score.
#2. Bad cluster → low score.

DIMENSIONALITY REDUCTION
# Eg: 100 features → using PCA (Principal Component Analysis) → 3 features.
# PCA is an unsupervised technique. We use the concept of variance.
# It is one of the techniques to reduce the dimension of the data (the number of features).
# How does the PCA technique work?
X1, X2, ... X100 → Z1, Z2, ... Z100 (descending order of importance)
# The new features are more relevant than the old features: the newly constructed features (z1, z2, ...) contain the information of all the old features.
# After the construction of PCA, only the first few features may contain about 95% of the information, so the other features can be removed.

GEOMETRICAL REPRESENTATION
# z1 indicates the direction in which the data has the highest variance.
# So PCA constructs z1 such that the direction of z1 has the maximum variance.
# z2 is constructed in the direction orthogonal to z1, so z1 and z2 are independent of each other.
# According to the plot we can drop z2 because it carries almost none of the information in the data.

CONDITION FOR PCA
#1. The features (x1, x2, ... xn) should be dependent: the features should have some kind of correlation (positive or negative).

PROPERTIES OF MATRIX
#1. ADDITION: the two matrices must have the same number of rows and the same number of columns.
#2. SUBTRACTION: the two matrices must have the same number of rows and the same number of columns.
#3. MULTIPLICATION: the number of columns of the first matrix must equal the number of rows of the second matrix; (m×n)·(n×p) gives (m×p).
# Dimension of X transpose X: X is (n×p), X' is (p×n), so X'X = (p×n)·(n×p) = (p×p).
# Consider a matrix and a point (2, 2): multiply the point by the matrix.

SIGNIFICANCE OF MATRIX
#1. If you multiply any vector by a matrix, the matrix rotates and magnifies the vector.
#2.
When a vector is multiplied by a matrix, both the direction and the magnitude will in general be changed (the magnitude is increased or decreased).

EIGEN VECTOR
# Eigen vectors are those vectors which do not change their direction when multiplied by the matrix; only the magnitude changes.
# A·v = λ·v
v - eigen vector
λ - eigen value (the change in magnitude)
# For the matrix X'X, the eigen vector with the largest eigen value shows the direction of maximum variance in the data; the eigen vector with the smallest eigen value shows the direction of minimum variance.
# The eigen vectors give the orthogonal directions that PCA needs.
# X is (n×p) and X' is (p×n), so X'X is (p×p) and has p eigen vectors v1, v2, ... vp.

PCA FEATURES
z1 = X·v1 with eigen value λ1
z2 = X·v2 with eigen value λ2
...
zp = X·vp with eigen value λp
λ1 > λ2 > ... > λp
# In PCA we study the ordering.
# Among the original features (x1, x2, ... xp) ordering is not possible.
# In z1, z2, ... zp ordering is possible because the eigen values tell the importance: z1 is the most important and zp is the least important.

RECOMMENDATION SYSTEM
# TYPES OF RECOMMENDATION
1. General Recommendation
2. Personalised Recommendation

# MARKET BASKET ANALYSIS: APRIORI ALGORITHM
#1. SUPPORT(A) = (no. of events containing A) / (no. of events)
#2. CONFIDENCE(A → B) = (no. of events containing A and B) / (no. of events containing A)
#3. LIFT(A → B) = CONFIDENCE(A → B) / SUPPORT(B)
# LIFT > 1: product A is positively influencing product B.
# LIFT < 1: product A is negatively influencing product B.
# Antecedent - product A. Consequent - product B.

EXAMPLE:
C1. BREAD, BUTTER, MILK
C2. BREAD, MILK, FRUIT JUICE
C3. FRUIT JUICE, EGG
C4. BREAD, BUTTER, MILK, EGG

#1) SUPPORT
support(Milk) = 3/4
support(Bread) = 3/4
support(Butter) = 2/4
support(Egg) = 2/4
#2) CONFIDENCE
confidence(Bread → Egg) = 1/3
confidence(Bread → Butter) = 2/3
#3) LIFT
LIFT(Fruit juice → Butter) = 0
LIFT(Bread → Butter) = (2/3) / (2/4) = 4/3 = 1.33
LIFT(Bread → Milk) = (3/3) / (3/4) = 4/3 = 1.33
# LIFT > 1 (LIFT = 1.33) means Bread is positively influencing the sale of Milk.
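The support, confidence and lift values of the bread and milk example can be checked with a short script. This is a minimal sketch: the four baskets are the ones from the example above, while the function names are hypothetical helpers and not part of any Apriori library.

```python
# The four baskets C1..C4 from the example above.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "milk", "fruit juice"},
    {"fruit juice", "egg"},
    {"bread", "butter", "milk", "egg"},
]

def support(items, baskets):
    # fraction of baskets that contain every item in `items`
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)

def confidence(a, b, baskets):
    # of the baskets that contain A, the fraction that also contain B
    return support({a, b}, baskets) / support({a}, baskets)

def lift(a, b, baskets):
    # confidence(A -> B) divided by support(B); above 1 means A helps B
    return confidence(a, b, baskets) / support({b}, baskets)

print("support(milk):", support({"milk"}, baskets))
print("confidence(bread -> butter):", confidence("bread", "butter", baskets))
print("lift(bread -> milk):", lift("bread", "milk", baskets))
print("lift(fruit juice -> butter):", lift("fruit juice", "butter", baskets))
```

Working on sets keeps the subset test `items <= basket` close to the definition of support, at the cost of scanning every basket for every query.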
# We can pitch Milk to the customers who are buying Bread.

PERSONALISED RECOMMENDATION
# Personalised recommendation is constructed mainly from the past data of the customers.
# INTERACTION MATRIX
EXAMPLE: USER × ITEM. Consider 20 products and 10 customers.
# What should I recommend C1 to buy? Find the customers who are like C1.
# Find the customers who are similar to C1 in buying pattern.
# If we find customers with a similar buying pattern, then we can recommend C1 to buy the products bought by them.
# The similarity between customers can be found using a method called the JACCARD METRIC.

JACCARD METRIC
# The Jaccard metric is a mathematical method which determines the distance between two customers (similarity and dissimilarity).
similarity = (no. of common products) / (total no. of unique products)
similarity(C1, C10) = |C1 ∩ C10| / |C1 ∪ C10|
# Ratio is high → similarity. Ratio is low → dissimilarity.
DISTANCE = 1 - SIMILARITY
# Higher distance → low similarity. Lower distance → high similarity.

RECOMMENDATION BASED ON QUANTITY PURCHASED
# Consider 20 products and 10 customers.
[Table: customers C1 ... C10 as rows, products P1 ... P20 as columns, each cell the quantity purchased]
# From the plot, customers 3 and 10 are closest to C1, so we recommend to C1 the products purchased by C3 and C10.
C1: P1 P2 P10 P12
C3: P1 P3 P20 P14
C10: P1 P2 P20 P14
# C3 ∪ C10 → recommend P20 & P14 to C1.

MEASURE OF DISTANCE
1. COSINE SIMILARITY
2. CO-RELATION
# NEGATIVE CO-RELATION: customer A purchases a product in high quantity but customer B purchases it in very low quantity, and vice versa.

(1) COSINE SIMILARITY
Customer A: (x1, y1). Customer B: (x2, y2).
cos θ = (x1·x2 + y1·y2) / (√(x1² + y1²) · √(x2² + y2²))
# If θ = 0, cos 0 = 1 → high similarity.
# If θ = 90, cos 90 = 0 → low similarity.
# θ high → distance high → similarity low.
# θ low → distance low → similarity high.
# Customer A and customer B are similar if they buy the products in similar proportions.
# cos θ - measure of similarity.
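The Jaccard and cosine measures described above can be sketched as follows. The product lists and quantity vectors in the example calls are hypothetical and only serve to exercise the functions.

```python
import math

def jaccard_similarity(a, b):
    # common products divided by total unique products
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def jaccard_distance(a, b):
    # distance = 1 - similarity
    return 1.0 - jaccard_similarity(a, b)

def cosine_similarity(u, v):
    # dot product divided by the product of the two vector lengths
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a customer with no purchases is treated as not similar
    return dot / (norm_u * norm_v)

# Hypothetical product lists for two customers.
c1 = ["P1", "P2", "P10", "P12"]
c3 = ["P1", "P3", "P20", "P14"]
print("jaccard similarity:", jaccard_similarity(c1, c3))
print("jaccard distance:", jaccard_distance(c1, c3))

# Hypothetical quantities of two products bought by customers A and B.
print("same proportions:", cosine_similarity([2, 4], [1, 2]))
print("nothing in common:", cosine_similarity([3, 0], [0, 5]))
```

Cosine similarity ignores the overall volume bought and looks only at proportions, so a small buyer and a bulk buyer with the same mix of products count as similar.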
# 1 - cos θ - measure of distance.
# Customer A and customer B are dissimilar if they buy the products in different proportions.

FOR MULTIPLE CO-ORDINATES
Customer A: (x1, x2, ... xn). Customer B: (z1, z2, ... zn).
cos θ = (x1·z1 + x2·z2 + ... + xn·zn) / (√(x1² + x2² + ... + xn²) · √(z1² + z2² + ... + zn²))

(2) CO-RELATION MATRIX
#1. POSITIVE CO-RELATION
# If C1 purchases a product in bulk quantity, then C2 should also buy that product in bulk quantity.
# If C1 is buying a product in low quantity, then C2 should also buy that product in low quantity.
# +ve correlation - high similarity
# -ve correlation - low similarity

DECISION TREE
HOW DOES A DECISION TREE WORK?
LIMITATION OF LOGISTIC REGRESSION
1. LINEAR DATA
2. NON-LINEAR DATA
# Logistic regression performs excellently on linear data, because LR primarily tries to find a boundary between the data points that separates the data into two classes.
# For non-linear data, a single line cannot separate the data into the different classes.
# To overcome this problem with non-linear data, the Decision Tree was introduced.
# Consider a random point (2, 3): x1 = 2, x2 = 3, CLASS = RED.
# Since the number of features is 2, plotting and calculation are easy.
# But with multiple features plotting the data is impossible, so for non-linear data we adopt a math method called GINI IMPURITY.

GINI IMPURITY
# Gini impurity is a method which provides a score for every candidate condition (X > 0: score, X > 1: score, ...), so that the best condition for separation can be chosen.
# Gini impurity tells the purity of the buckets.
# Gini = 1 - Σ pi², where pi is the proportion of class i in the bucket.
# BUCKET 1: FIND THE PROPORTION OF RED & BLUE
P(RED) = 1/4
P(BLUE) = 3/4
Gini = 1 - ((1/4)² + (3/4)²) = 0.375
# Total impurity after the split: the weighted sum of the bucket impurities, G = (n1/n)·G1 + (n2/n)·G2.
# After segregation the impurity is reduced.
# To find the best segregation, choose the value of X for which the decrease in impurity between the initial and the final Gini is maximum.
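The Gini calculation above can be sketched as a small search over candidate split values. This is a minimal sketch, not the notes' own procedure: the helper names and the six-point data set are hypothetical, and a real decision tree would repeat the search for every feature and every bucket.

```python
from collections import Counter

def gini(labels):
    # Gini = 1 - sum of squared class proportions in the bucket
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    # total impurity after a split: bucket impurities weighted by bucket size
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(xs, labels):
    # try the condition "x <= t" for every observed value t, keep the lowest impurity
    best = None
    for t in sorted(set(xs)):
        left = [lab for x, lab in zip(xs, labels) if x <= t]
        right = [lab for x, lab in zip(xs, labels) if x > t]
        if not left or not right:
            continue  # a split must put data in both buckets
        score = split_gini(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Bucket from the notes: P(RED) = 1/4, P(BLUE) = 3/4
print("bucket gini:", gini(["red", "blue", "blue", "blue"]))

# Hypothetical one-feature data set
xs = [1, 2, 3, 4, 5, 6]
labels = ["red", "red", "red", "blue", "blue", "blue"]
print("initial gini:", gini(labels))
print("best (threshold, impurity):", best_threshold(xs, labels))
```

Minimising the weighted impurity after the split is the same as maximising the decrease from the initial impurity, because the initial impurity is fixed for a given bucket.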
# Decrease in impurity = initial impurity - final impurity.

WHEN TO STOP SPLITTING
#1. When the impurity of a bucket becomes zero, we stop splitting that bucket.
#2. When the number of samples inside a bucket is equal to min_samples (e.g. 5), stop the splitting.
#3. When the tree reaches the maximum height (e.g. 10).
# LEAF: a leaf is the point at which the tree cannot grow further.
# If the growth of the tree is not limited, it can divide the entire data into discrete buckets: the algorithm memorizes the training data but performs poorly on test data or new data.

METHODS TO CHECK IMPURITY
1. GINI IMPURITY
2. ENTROPY
# ENTROPY: E = -Σ pi·log(pi)
# More entropy → more impurity.
# Less entropy → less impurity.

TWO PROBLEMS OF DT
1. OVERFITTING
2. CONSISTENCY
#1. OVERFITTING
# The model performs well on the training data but very poorly on new data or test data.
# The accuracy on the training data will be very high and the accuracy on the test data very poor, because the DT tends to memorize the training data.
#2. CONSISTENCY
# Team 1, Team 2, Team 3 and Team 4 build a model on the same data and report different accuracies, e.g. 60%, 73%, 78%.
# The accuracy is inconsistent (not the same) when we give the same data to all four models with different training and testing splits.

# RESOLUTION FOR DT PROBLEMS
1. BAGGING
2. RANDOM FOREST
3. BOOSTING

(1) BAGGING
THEORY:
# Bagging addresses the two flaws above.
# In order to reduce the error or variance, instead of constructing one decision tree we construct multiple independent DTs and calculate the average of all the DTs; the error or the variance then shows a drastic reduction.
# Models: DT1, DT2, ... DTn, with errors E1, E2, ... En.
# The predictions of all the models are combined: averaged error = (E1 + E2 + ... + En) / n.
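The averaging idea behind bagging can be sketched as below. Two assumptions are labelled loudly here: each "model" is only the mean of its training sample, standing in for a decision tree, and each model is trained on a bootstrap sample (drawn with replacement), which is the usual way bagging makes the models differ. The data values and function names are hypothetical.

```python
import random
import statistics

def bootstrap_sample(data, rng):
    # draw len(data) items with replacement (assumption: standard bootstrap)
    return [rng.choice(data) for _ in data]

def bagged_prediction(data, n_models, seed=0):
    rng = random.Random(seed)
    # stand-in "model": the mean of its own bootstrap sample, in place of a tree
    model_predictions = [
        statistics.mean(bootstrap_sample(data, rng)) for _ in range(n_models)
    ]
    # bagging: combine the independent models by averaging their predictions
    return statistics.mean(model_predictions), model_predictions

# Hypothetical target values with one extreme point
data = [10.0, 12.0, 9.0, 11.0, 30.0, 10.5, 9.5, 11.5]
combined, individual = bagged_prediction(data, n_models=25, seed=0)

print("lowest single model:", min(individual))
print("highest single model:", max(individual))
print("bagged prediction:", combined)
print("spread of single models:", statistics.pstdev(individual))
```

Single models swing depending on whether the extreme point lands in their sample, while the averaged prediction always sits inside that range; this is the consistency that bagging is after.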
