ADBMS Assignment 3 (C24036)

Adv DB Solutions 2

Uploaded by

katha.sagar000
Copyright
© All Rights Reserved
Assignment: 3
Name: Sanjivanee Mohan Jarhad
Std: [illegible]
Roll no: C24036
Subject: Advanced Database Management System

Q.1) Explain in detail the KDD process.
- KDD stands for Knowledge Discovery in Databases.
- The KDD process is the process of discovering useful (hidden) patterns. It covers exploratory data analysis, information harvesting and unsupervised pattern recognition.
- KDD is the process of finding useful information and patterns in data. Data mining is the use of algorithms to extract the information and patterns derived by the KDD process.
- The KDD process consists of the following steps:

i) Selection
- The target subset of data and the attributes of interest are identified by examining the entire raw dataset.
- The data needed for the data mining process may be obtained from many different and heterogeneous data sources.
- This step obtains the data from various databases, files and non-electronic sources.

ii) Data cleaning
- Noise and outliers are removed, field values are transformed to common units, and some new fields are created by combining existing fields to facilitate analysis.
- The data is typically put into a relational format, and several tables might be combined in a denormalization step.

iii) Data preprocessing
- The data to be used by the process may have incorrect or missing data.
- There may be anomalous data from multiple sources, and several different activities may be performed at this stage.
- Erroneous data may be corrected or removed, whereas missing data must be supplied or predicted.

iv) Data transformation
- Data from different sources must be converted into a common format for processing.
- The data may be encoded or transformed into more usable formats. Data reduction may be used to reduce the number of possible data values being considered.

v) Data mining
- Based on the data mining task being performed, this step applies algorithms to the transformed data to generate the desired results.
- We apply the DM algorithms to extract interesting patterns.
- For example, association rule learning can be applied to identify items that are frequently bought together.

vi) Interpretation / evaluation
- The patterns are presented to end users in an understandable form.
- How the DM results are presented to the users is extremely important, because the usefulness of the results depends on it. Various visualization and GUI strategies are used.

[Figure: the KDD process, from initial data through transformed data to knowledge]

- The KDD process is vital for discovering valuable insights from data, which can significantly enhance decision-making and strategy development in various fields.

Q.2) Write a short note on:

i) Data Processing
- Data processing involves transforming raw data into useful information.
- Data processing means processing data in order to convert it into a usable format.
- Data processing is the series of operations performed on data to transform, analyse and organize it into a useful format.
- The goal is to extract pertinent information that can be applied in decision-making processes or used to support existing technologies.
- The following is the process of data processing:
  1) Data collection
  2) Data preparation
  3) Data input
  4) Data processing
  5) Data output and interpretation
  6) Data storage
- The following are the types of data processing:
a) Real-time processing: It is essential for tasks that require immediate handling of data upon receipt, providing instant processing and feedback.
b) Multiprocessing (parallel processing): It involves utilizing multiple processing units or CPUs to handle various tasks simultaneously. This approach allows for more efficient data processing, particularly for complex computations that can be broken down into smaller, concurrent tasks, thereby speeding up overall processing time.
c) Distributed processing: It involves spreading computational tasks across multiple computers or devices to improve processing speed and reliability.
d) Manual data processing: It requires human intervention for the input, processing and output of data, typically without the aid of electronic devices.

ii) Data Cleaning
- Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted or duplicate data within a dataset.
- It is a process that removes data that does not belong in your dataset.
- Data cleaning is also known as data cleansing and data scrubbing.
- The following are the ways to clean data:
  i) Remove duplicate or irrelevant observations
  ii) Fix structural errors
  iii) Filter unwanted outliers
  iv) Handle missing data
  v) Validate and QA
- It is a crucial step in the data preparation process, playing an important role in ensuring the accuracy, reliability and overall quality of a dataset.
- Moreover, clean data facilitates more effective modeling and pattern recognition, as algorithms perform optimally when fed high-quality, error-free input.
- The following are the techniques for cleaning data:
  a) Ignore the tuples
  b) Fill in the missing values
  c) Binning method
  d) Regression
  e) Clustering
- The following are some uses of data cleaning:
  a) Data integration
  b) Data migration
  c) Data transformation
  d) Data debugging in ETL processes
- The characteristics of data cleaning are accuracy, coherence, validity, uniformity, data verification and clean data backflow.

iii) Data Reduction
- Data reduction is a method of reducing the size of the original data so that it may be represented in a much smaller space.
- It aims to describe the data more compactly. While preserving the integrity of the original data, data reduction techniques are used to generate a reduced version of the dataset that is substantially smaller in volume.
- The outcome of DM is unaffected by data reduction.
- The goal of data reduction is to make information more compact.
- It is easier to use sophisticated and computationally expensive algorithms when the amount of data is smaller.
- The data can be reduced in terms of the number of rows (records) or the number of columns (dimensions).
- Following are some techniques of data reduction:
  a) Data sampling
  b) Dimensionality reduction
  c) Data compression
  d) Data discretization
  e) Feature selection
- Methods of data reduction:
  a) Data cube aggregation
  b) Dimension reduction
     - Step-wise forward selection
     - Step-wise backward selection
     - Combination of forward and backward selection
  c) Data compression
  d) Numerosity reduction
  e) Discretization and concept hierarchy operation

Q.3) Explain dimensionality reduction in detail.
- Dimensionality reduction is a process or technique to reduce the number of dimensions or features in a dataset.
- The goal is to decrease the dataset's complexity by reducing the number of features while keeping the most important properties of the original data.
- Dimensionality reduction is a way of converting a higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides similar information.
- The following are the methods of dimensionality reduction:
  i) Feature selection
  ii) Feature extraction

i) Feature selection
- It selects a subset of the original features based on their importance.
- The common methods include:
a) Filter methods: Use statistical measures like correlation, mutual information or a variance threshold. Variance thresholding removes features with low variance.
b) Wrapper methods: Evaluate subsets of features based on model performance, e.g. RFE (Recursive Feature Elimination).
c) Embedded methods: Include techniques like lasso regression that penalize less important features during model training.

ii) Feature extraction
- It includes transforming the data from a high-dimensional space to a lower-dimensional space.
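As a minimal illustration of mapping data from a higher-dimensional space to a lower-dimensional one, the sketch below projects 2-D points onto their direction of maximum variance, which is the idea behind principal component analysis. It is a toy under stated assumptions, not a general implementation: the function names and sample points are hypothetical, and the closed-form angle used here only works for two features.

```python
import math

def first_principal_component(points):
    """Return (mean, unit direction of maximum variance) for 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mean_x, mean_y), (math.cos(theta), math.sin(theta))

def project_to_1d(points):
    """Reduce each 2-D point to a single coordinate along the main direction."""
    (mean_x, mean_y), (dx, dy) = first_principal_component(points)
    return [(x - mean_x) * dx + (y - mean_y) * dy for x, y in points]

# Hypothetical sample: points lying exactly on the line y = 2x.
sample = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, direction = first_principal_component(sample)
reduced = project_to_1d(sample)
print(direction)  # unit vector along the line y = 2x
print(reduced)    # one number per point instead of two
```

Because the sample points are collinear, the single retained coordinate keeps all of the variance; with real data some variance is lost, and the choice of how many components to keep becomes a trade-off.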
- The common techniques are:
a) PCA (Principal Component Analysis): Projects data onto orthogonal axes called principal components. The first component captures the maximum variance, followed by the subsequent components, each capturing less.
b) LDA (Linear Discriminant Analysis): Similar to PCA but focuses on maximizing class separability. It is a supervised method that aims to find the linear combination of features that best separates different classes.
c) t-Distributed Stochastic Neighbor Embedding (t-SNE): A non-linear technique for high-dimensional data visualization. It reduces dimensionality while maintaining the relative distance between data points.
d) Autoencoders: Neural networks designed to learn a compressed representation of input data. They reduce dimensionality through a bottleneck layer.

- Applications:
  i) Data compression
  ii) Noise reduction
  iii) Improved visualization
  iv) Preprocessing step

Q.4) Explain the application of data mining in business intelligence.
- Data mining plays a crucial role in Business Intelligence (BI) by enabling organizations to extract valuable insights from large datasets.
- These insights support better decision-making, improve operational efficiency, enhance customer experiences and drive competitive advantage.
- The following are the applications of DM in BI:
  i) Customer Relationship Management (CRM)
  ii) Sales and marketing optimization
  iii) Fraud detection and prevention
  iv) Financial analysis and forecasting
  v) Supply chain and operations management
  vi) Human resource analytics

i) CRM
- DM helps businesses understand customer behavior and preferences, leading to improved customer satisfaction and loyalty.
- Its applications include customer behavior segmentation, churn prediction and personalized marketing.

ii) Sales and marketing optimization
- DM provides insights into sales patterns, marketing effectiveness and market trends.
- It is based on market basket analysis, campaign effectiveness and sales forecasting.

iii) Fraud detection and prevention
- DM identifies unusual patterns in financial transactions which could indicate fraud.
- Predictive models assign risk scores to transactions or entities, helping to prioritize investigations.

iv) Financial analysis and forecasting
- Predictive analytics determines creditworthiness by analyzing historical data, payment history and demographic information.
- Clustering and classification algorithms help identify the most and least profitable products, services or customer segments.

v) Supply chain and operations management
- DM optimizes various aspects of supply chain operations, enhancing efficiency and reducing costs.
- DM identifies inefficiencies in production or logistics and suggests improvements.

Q.5) Explain Bayes' theorem and the Naive Bayes theorem.
- The Naive Bayes algorithm is based on conditional probabilities. It uses Bayes' theorem, a formula that calculates a probability by counting the frequency of values and combinations of values in the historical data.
- The advantage of Naive Bayes is its speed.
- The "Naive" part of the name indicates the simplifying assumption made by the Naive Bayes classifier.
- The "Bayes" part of the name refers to Reverend Thomas Bayes, an 18th-century statistician and theologian who formulated Bayes' theorem.
- The general statement of Bayes' theorem is: "The conditional probability of an event A, given the occurrence of another event B, is equal to the product of the probability of B given A and the probability of A, divided by the probability of event B", i.e.

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]

Derivation of Bayes' theorem
- The proof of Bayes' theorem is as follows. According to the conditional probability formula,

\[ P(E_i \mid A) = \frac{P(E_i \cap A)}{P(A)} \tag{1} \]

- Then, by using the multiplication rule of probability, we get

\[ P(E_i \cap A) = P(E_i)\, P(A \mid E_i) \tag{2} \]

- Now, by the total probability theorem,

\[ P(A) = \sum_{k} P(E_k)\, P(A \mid E_k) \tag{3} \]

- Substituting the values of P(E_i ∩ A) and P(A) from eq. (2) and eq. (3) into eq. (1), we get

\[ P(E_i \mid A) = \frac{P(E_i)\, P(A \mid E_i)}{\sum_{k} P(E_k)\, P(A \mid E_k)} \]

- Bayes' theorem is also known as the formula for the probability of "causes".
- The E_i's are a partition of the sample space S, and at any given time only one of the events E_i occurs.

Terms related to Bayes' theorem
i) Hypotheses: The events E_1, E_2, ..., E_n happening in the sample space are called the hypotheses.
ii) Prior probability: It is the initial probability of an event occurring before any new data is taken into account. P(E_i) is the prior probability of hypothesis E_i.
iii) Posterior probability: Considering new information, P(E_i | A) is considered the posterior probability of hypothesis E_i.

The following are the assumptions of Naive Bayes:
- Features are independent.
- No missing data.
- Continuous features are normally distributed.
- Discrete features have multinomial distributions.
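
The posterior formula and the frequency-counting idea behind Naive Bayes can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the priors and likelihoods, and the small weather table are all hypothetical, the features are categorical, and no smoothing is applied, so a value never seen with a class gives that class a probability of zero.

```python
from collections import Counter

def posterior(priors, likelihoods):
    """P(E_i | A) for every hypothesis E_i, using Bayes' theorem."""
    # Total probability: P(A) = sum over k of P(E_k) * P(A | E_k).
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

def train(rows, labels):
    """Count class frequencies and (feature, value, class) frequencies."""
    class_counts = Counter(labels)
    value_counts = Counter()
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, value, label)] += 1
    return class_counts, value_counts

def predict(model, row):
    """Return P(class | row) under the naive independence assumption."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class), estimated by counting frequencies.
            score *= value_counts[(i, value, label)] / count
        scores[label] = score
    evidence = sum(scores.values())
    if evidence == 0:  # every class was ruled out by an unseen value
        return scores
    return {label: s / evidence for label, s in scores.items()}

# Hypothetical two-hypothesis example: P(E1) = 0.3, P(E2) = 0.7.
post = posterior([0.3, 0.7], [0.8, 0.1])
print(post)

# Hypothetical historical data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool"),
        ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "yes", "no"]
result = predict(train(rows, labels), ("rainy", "hot"))
print(result)
```

In the weather example the class "no" comes out with posterior 4/7 and "yes" with 3/7, since the prior times the per-feature conditional frequencies gives 2/21 and 1/14 respectively before normalizing.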
