Module 5 (NLP)

Uploaded by

prajna.sg241
Module 5: INFORMATION RETRIEVAL

Information retrieval (IR) deals with the organization, storage, retrieval and evaluation of information relevant to a user's query. Traditionally, IR systems are not expected to return the actual information, only documents containing that information. An IR system "DOES NOT INFORM THE USER ON THE SUBJECT OF HER INQUIRY. IT MERELY INFORMS ON THE EXISTENCE AND WHEREABOUTS OF DOCUMENTS RELATING TO HER REQUEST."

"Document" as referred to here includes text and non-textual information such as images and speech.

Retrieval systems can be categorized as:
(i) Information retrieval system
(ii) Data retrieval system
(iii) Question answering system

Information retrieval: the queries submitted are vague and imprecise. The system retrieves information on the existence and whereabouts of documents relating to the user's query.
Data retrieval: the questions are very specific in nature. The system retrieves [...].
Question answering: the questions are very specific in nature. The system provides users with answers to specific questions.

A user's information need: the user formulates a query based on the need. The IR system returns documents related to the need (relevant documents). Retrieval is performed by matching the query representation with the document representation.

Documents in a collection are frequently represented through a set of keywords. Keywords can be single words or multi-word phrases. They might be extracted automatically or manually. Such a representation provides a logical view of the document. The process of transforming document text into some representation of it is known as indexing. The data structure most commonly used in IR systems is the inverted index.

To obtain the logical view of the document, the text has to be processed. The operations are:
(i) Stop word elimination: removes grammatical or functional words.
(ii) Stemming: reduces words to their root forms.
Zipf's law can be applied to further reduce the size of the index set.
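The indexing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the notes' own procedure: the stop word list, the `tokenize` helper and the document ids are assumptions made for the example, and stemming is left out.

```python
from collections import defaultdict

# Illustrative stop word list (an assumption; real systems use longer lists).
STOP_WORDS = {"a", "an", "and", "by", "in", "is", "of", "the", "to", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()

def index_terms(text):
    """Return the tokens of a document that survive stop word elimination."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]

def build_inverted_index(docs):
    """Map each index term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# Hypothetical three-document collection.
docs = {
    "D1": "Vector space model",
    "D2": "Probabilistic retrieval model",
    "D3": "Intelligent techniques in information retrieval",
}
index = build_inverted_index(docs)
print(index["model"])      # ['D1', 'D2']
print(index["retrieval"])  # ['D2', 'D3']
print("in" in index)       # False: "in" was removed as a stop word
```

Storing postings as sorted lists keeps later set operations (intersection, union) simple, which is one common reason the inverted index is preferred over scanning documents.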
To identify the relevance of the terms in the document, they are quantified by assigning them numerical values (term weighting). These values are called weights.

Automatic indexing: Luhn introduced the concept of automatic indexing of documents based on their content. He proposed that the discrimination power of index terms is a function of the rank order of their frequency of occurrence, and that middle-frequency terms have the highest discrimination power. The model extracts [...] terms from a document. Thus indexing is the representation of text [...] the meaning or content of the original text.

A term can be a single word or a multi-word phrase. Consider the sentence "Design features of information retrieval systems".
Single words can be: design, features, information, retrieval, systems.
Words can be represented as a set of terms: design, features, information retrieval, information retrieval systems.

Multi-word terms can be obtained by looking at frequently appearing sequences of words, or by using part-of-speech (POS) tags. Though statistical approaches to phrase extraction are efficient, they fail to handle word order changes and structural variations. Syntactic approaches are better at handling structural variations.

A commonly used phrase extraction method:
* Any pair of adjacent non-stop words is regarded as a potential phrase.
* The final list of phrases is composed of those pairs of words that occur in n or more documents in the collection (where n is [...]).

Stop words [...] do not contribute to the semantic content of the document in a keyword representation. Eliminating stop words reduces the number of index terms considerably. The drawback of eliminating stop words is that it can sometimes result in the elimination of useful index terms. For example, in "Vitamin A", if "A" is assumed to be the article, it is removed and only "vitamin" is considered to be the index term, which results in improper retrieval of the document.
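The phrase extraction method described above (adjacent non-stop-word pairs that occur in n or more documents) can be sketched as follows. The stop word list, the sample documents and the threshold n = 2 are assumptions chosen only to keep the example small.

```python
from collections import defaultdict

# Illustrative stop word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def candidate_phrases(text):
    """Return the set of adjacent non-stop-word pairs found in one document."""
    tokens = text.lower().split()
    pairs = set()
    for left, right in zip(tokens, tokens[1:]):
        if left not in STOP_WORDS and right not in STOP_WORDS:
            pairs.add((left, right))
    return pairs

def extract_phrases(docs, n):
    """Keep candidate pairs that occur in at least n documents."""
    doc_freq = defaultdict(int)
    for text in docs:
        # A pair is counted once per document, however often it repeats there.
        for pair in candidate_phrases(text):
            doc_freq[pair] += 1
    return sorted(" ".join(pair) for pair, df in doc_freq.items() if df >= n)

# Hypothetical collection.
docs = [
    "design features of information retrieval systems",
    "evaluation of information retrieval systems",
    "information retrieval models",
]
print(extract_phrases(docs, n=2))  # ['information retrieval', 'retrieval systems']
```

Because only adjacency is checked, a reordered phrase such as "retrieval of information" would not be matched, which is the weakness of statistical approaches noted in the text.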
Some phrases [...] consist entirely of stop words. Eliminating stop words in such cases makes it impossible to search for a document [...].

Stemming: [...] the terms used to represent text are stems, not the actual words. The most popular stemming algorithm was developed by Porter. Stemming helps to conflate similar terms, resulting in increased recall, but it reduces precision. The major drawback of stemming is that it may throw away useful distinctions. For example, since "computation" and "computer" both have "comput" as their stem, a query consisting of the term "computation" may result in the retrieval of documents containing the term "computer".

Zipf's law: states that the frequency of a word is inversely proportional to its rank.
* High-frequency words, being common, have less discriminating power and are thus not useful for indexing.
* Low-frequency words are less likely to be included in the query and are also not useful for indexing.
Dropping these terms (with high frequency and with low frequency) reduces the index size considerably. Stop word elimination is an implementation of Zipf's law, in which high-frequency terms are dropped from the set of index terms.

INFORMATION RETRIEVAL MODELS:

An IR model is a pattern that defines several aspects of the retrieval procedure. Models differ in the way documents and queries are represented and in the way retrieval is performed. Some models consider documents as sets of terms and perform retrieval based merely on the presence or absence of one or more query terms in the document. Others represent the document as a vector of term weights and perform retrieval based on the numeric score assigned to each document, representing the similarity between the query and the document.

These models can be classified as:
* Classical models of IR
* Non-classical models of IR
* Alternative models of IR

The Boolean, probabilistic and vector space models are classical models of IR. These models are based on mathematical knowledge that is easily recognized and well understood. They are simple, efficient and easy to implement.
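The frequency-based pruning described under Zipf's law above can be illustrated with a small sketch. The token list and the cutoff values `low` and `high` are assumptions for the example; the notes do not prescribe particular thresholds.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, term, frequency) triples, most frequent term first."""
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, term, freq) for rank, (term, freq) in enumerate(ordered, 1)]

def prune_by_frequency(tokens, low, high):
    """Keep terms whose collection frequency f satisfies low <= f <= high."""
    counts = Counter(tokens)
    return sorted(term for term, freq in counts.items() if low <= freq <= high)

# Hypothetical token stream: one very common word, two mid-frequency words,
# and one rare word.
tokens = ("the " * 6 + "retrieval " * 3 + "model " * 2 + "zipf").split()

for rank, term, freq in rank_frequency(tokens):
    # Under Zipf's law, rank * frequency stays roughly constant.
    print(rank, term, freq, rank * freq)

print(prune_by_frequency(tokens, low=2, high=3))  # ['model', 'retrieval']
```

Here the high cutoff removes "the" (the role played by stop word elimination) and the low cutoff removes the rare term, leaving the middle-frequency terms as index terms.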
These models are called classical models because most of the commercial systems are based on these mathematical models of IR.

Boolean model: documents are represented [...] as a set of keywords, stored in an inverted index. An inverted index is a list of keywords and the identifiers of the documents in which they occur. Users express their queries as Boolean expressions consisting of keywords connected with the Boolean logical operators (AND, OR, NOT). Retrieval is performed based on whether or not a document contains the query terms.

Given a finite set of index terms $T = \{t_1, t_2, \ldots, t_m\}$ and a finite set of documents $D = \{d_1, d_2, \ldots, d_n\}$, a Boolean expression in normal form represents a query $Q$ as follows:

$Q = \bigwedge \left( \bigvee \theta_i \right), \quad \theta_i \in \{t_i, \neg t_i\}$

Retrieval is performed in two steps:
(i) The set $R_i$ of documents that contain or do not contain the term $t_i$ is obtained:
$R_i = \{ d_j \mid \theta_i \in d_j \}, \quad \theta_i \in \{t_i, \neg t_i\}$, where $\neg t_i \in d_j$ means $t_i \notin d_j$.
(ii) Set operations are used to retrieve documents [...].

Let $D = \{D_1, D_2, D_3\}$ be the set of documents, where
D1 = Information retrieval is concerned with the organization, storage, [...]
D2 = A user having [...] formulates a request written in natural language.
D3 = The system responds by retrieving [...] that seems relevant to the [...]

[...] more frequently in the [...]
3. Stemming: reduces the [...] to obtain index terms.
4. Term weighting: assigns weights to terms according to their importance in the document, in the collection, [...].

Example: Consider the following set of documents. Show the vector representation of the documents after stemming.
D1: Vector space model
D2: Probabilistic retrieval model
D3: Intelligent techniques in information retrieval

Stemmed terms | D1 | D2 | D3
inform        | 0  | 0  | 1
intellig      | 0  | 0  | 1
model         | 1  | 1  | 0
probabilist   | 0  | 1  | 0
retriev       | 0  | 1  | 1
space         | 1  | 0  | 0
techniqu      | 0  | 0  | 1
vector        | 1  | 0  | 0

If both the document and query vectors have been cosine-normalized, then the inner product yields the cosine similarity. The cosine measure divides the numerator by the product of the lengths of the vectors.
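The two classical retrieval styles above can be tried on the three example documents. The sketch below is illustrative only: it uses binary term weights, takes the stemmed terms as given rather than running a stemmer, and the helper names (`boolean_and_not`, `to_vector`, `normalize`, `cosine`) are hypothetical.

```python
import math

# Stemmed index terms per document, taken from the example above.
doc_terms = {
    "D1": {"vector", "space", "model"},
    "D2": {"probabilist", "retriev", "model"},
    "D3": {"intellig", "techniqu", "inform", "retriev"},
}
vocabulary = sorted(set().union(*doc_terms.values()))

def boolean_and_not(include, exclude):
    """Answer the Boolean query `include AND NOT exclude` with set operations."""
    all_docs = set(doc_terms)
    r_include = {d for d, terms in doc_terms.items() if include in terms}
    # Documents satisfying NOT exclude are those that do not contain the term.
    r_not_exclude = all_docs - {d for d, terms in doc_terms.items() if exclude in terms}
    return sorted(r_include & r_not_exclude)

def to_vector(terms):
    """Binary term vector over the shared vocabulary (an assumed weighting)."""
    return [1.0 if t in terms else 0.0 for t in vocabulary]

def normalize(vec):
    """Scale a vector to unit length (cosine normalization)."""
    length = math.sqrt(sum(w * w for w in vec))
    return [w / length for w in vec] if length else vec

def cosine(u, v):
    """Inner product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    lengths = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / lengths if lengths else 0.0

vectors = {d: to_vector(terms) for d, terms in doc_terms.items()}

print(boolean_and_not("model", "probabilist"))          # ['D1']
print(round(cosine(vectors["D1"], vectors["D2"]), 3))   # 0.333

# For cosine-normalized vectors the plain inner product gives the same value.
u, v = normalize(vectors["D1"]), normalize(vectors["D2"])
print(round(sum(a * b for a, b in zip(u, v)), 3))       # 0.333
```

D1 and D2 share only the term "model", so the cosine is 1 / (sqrt(3) * sqrt(3)) = 1/3, while D1 and D3 share no term and score 0.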
This normalization tends to give low similarities to long vectors (vectors with many terms). The overlap coefficient instead divides the numerator by the sum of weights of the vector having the smaller sum of weights:

$\mathrm{sim}(d_j, q_k) = \dfrac{\sum_i w_{ij}\, w_{ik}}{\min\left(\sum_i w_{ij},\ \sum_i w_{ik}\right)}$

NON-CLASSICAL MODELS OF IR:

Non-classical models perform retrieval based on principles other than those used by classical models (similarity, probability, Boolean operations). These models are based on a special logic technique, situation theory, or the concept of interaction.

(a) Information logic model: is based on a special logic technique called logical imaging. Retrieval is performed by making inferences from document to query. Unlike the usual implication, which is true in all cases except when the antecedent is true and the consequent is false, this inference is uncertain. Hence a measure of uncertainty is associated with this inference. The principle (van Rijsbergen):

"GIVEN ANY TWO SENTENCES X AND Y, A MEASURE OF THE UNCERTAINTY OF Y -> X RELATIVE TO A GIVEN DATA SET IS DETERMINED BY THE MINIMAL EXTENT TO WHICH ONE HAS TO ADD INFORMATION TO THE DATA SET TO ESTABLISH THE TRUTH OF Y -> X."

(b) Situation theory model: is also based on van Rijsbergen's principle. Retrieval is considered as a flow of information from document to query. A structure called an infon is used to describe the situation and to model the information flow. An infon represents an n-ary relation and its polarity. The polarity is 1 or 0, indicating whether the infon carries positive or negative information. For example, consider the sentence "Adil is serving a dish". The corresponding infon is $\iota = \langle\langle \text{serving}, \text{Adil}, \text{dish}; 1 \rangle\rangle$. The polarity of an infon depends on its support: an infon $\iota$ is supported by a situation $s$, written $s \models \iota$.

Solution:
[Table with columns: Rank, Document, Relevance, Precision at observed recall point, Recall point. The entries are not legible.]
Non-interpolated average precision = (1 + 1 + 0.6 + 0.4 + [...]) / [...]

The most widely used recall levels are 0.0, 0.1, ..., 1.0.
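Both averaging schemes used in this evaluation example can be sketched in Python. The ranked relevance list below is a hypothetical one (four relevant documents retrieved at ranks 1, 2, 5 and 10, out of five relevant documents in the collection); it is not claimed to be the table from the notes, and the function names are illustrative.

```python
def average_precision(relevance, total_relevant):
    """Non-interpolated average precision over a ranked list.

    relevance[i] is True when the document at rank i + 1 is relevant.
    Relevant documents that are never retrieved contribute a precision of 0.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant

def eleven_point_average(relevance, total_relevant):
    """11-point interpolated average precision (recall levels 0.0 to 1.0)."""
    points = []  # (recall, precision) observed at each relevant document
    hits = 0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    levels = [i / 10 for i in range(11)]
    interpolated = []
    for level in levels:
        # Interpolated precision: best precision at any recall >= this level.
        candidates = [p for r, p in points if r >= level]
        interpolated.append(max(candidates, default=0.0))
    return sum(interpolated) / len(levels)

# Hypothetical ranking of ten retrieved documents.
ranking = [True, True, False, False, True, False, False, False, False, True]

print(round(average_precision(ranking, total_relevant=5), 3))     # 0.6
print(round(eleven_point_average(ranking, total_relevant=5), 3))  # 0.636
```

In this example the precisions at the relevant ranks are 1, 1, 0.6 and 0.4, and the fifth relevant document is never retrieved, so the non-interpolated average is 3.0 / 5 = 0.6.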
The precision values are calculated at each of the 11 standard recall levels and then averaged. This is known as 11-point interpolated average precision.

[The following passages come from scanned textbook pages on lexical resources. Only part of the text is legible.]

WordNet: [...] relations such as synonymy, antonymy, hypernymy/hyponymy and meronymy/holonymy [...]. For example, the word "read" has one sense as a noun and 11 senses as a verb. Glosses help differentiate meanings.

Figures 12.2, 12.3 and 12.4 show some of the relationships that hold between nouns, verbs, and adjectives and adverbs. Nouns and verbs are organized into hierarchies based on the hypernymy/hyponymy relation, whereas adjectives are organized into clusters based on antonym pairs (or triplets). Figure 12.5 shows a hypernym chain for "river" extracted from WordNet. Figure 12.6 shows the troponym relations for the verb "laugh".

[Figures 12.2 to 12.6: tables and diagrams of WordNet relations; entries not legible.]
FrameNet (continued): [...] frames contain core and non-core frame elements [...]. A frame may inherit roles from another frame. For example, a STATEMENT frame [...] a COMMUNICATION frame, and contains roles such as [...] ADDRESSEE and [...].

Gildea et al. (2002) used FrameNet [...] for automatic semantic parsing. The shallow semantic roles obtained from FrameNet can play an important role in information extraction. Consider the sentences:
"The umpire stopped the match."
"The match stopped due to bad weather."
In the first sentence the word "match" is the object, while it is the subject in the second. Though the semantic role is the same, the syntactic role is different.

Semantic roles may also help in question-answering systems. For example, the verbs "send" and "receive" would share the semantic roles (SENDER, RECIPIENT, GOODS, etc.) (Gildea and Jurafsky 2002) when defined with respect to a common TRANSFER frame. Such common frames allow a question-answering system to answer a question such as "Who sent the packet to Khushbu?" using the sentence "Khushbu received a packet from the examination cell".
Applications of WordNet (continued):
Document structuring and categorization: The semantic information extracted from WordNet, and the WordNet conceptual representation of knowledge, have been used for categorization (Scott and Matwin 1998).
Document summarization: WordNet has found useful application in text summarization. The approach presented by Barzilay and Elhadad (1997) utilizes information from WordNet to compute lexical chains.

FrameNet: FrameNet is a large database of semantically annotated English sentences. It is based on the principles of frame semantics. [...] Each word evokes a particular situation with particular participants. [...] As an example, consider a sentence annotated with the semantic roles [...].

Part-of-speech taggers (continued):
Tree-Tagger (Schmid 1994): a probabilistic tagger [...] problems faced by the Markov model [...].
ACOPOST: a collection of POS taggers [...].

12.5.5 CLAWS Part-of-Speech Tagger for English
The constituent likelihood automatic word-tagging system (CLAWS) is one of the earliest probabilistic taggers for English. It was developed at the University of Lancaster [...]. It can be considered a hybrid tagger, as it involves both probabilistic and [...] elements. CLAWS has achieved 96-97% accuracy [...]. It was used to tag the British National Corpus [...]. For more details, see Garside [...], Leech, Garside, and Bryant [...], and Garside and Smith (1997).
[...] Part-of-speech tagger for English: a log-linear tagger based on maximum entropy Markov models [...] bidirectional inference [...] (Tsuruoka and [...] 2005) [...] accuracy reported on the Penn Treebank WSJ [...].

12.5.4 [...] Tagger
Brill (1992) described a rule-based tagger that obtained performance comparable to that of stochastic taggers. It uses transformation-based learning to automatically induce rules. A number of extensions to this rule-based tagger have been proposed by Brill (1994). He describes a method for expressing lexical relations in tagging that stochastic taggers are currently unable to express. It implements a rule-based approach to tagging unknown words. It also demonstrates how the tagger can be extended into a k-best tagger, where multiple tags can be assigned to words in some cases of uncertainty. The tagger is available for download at the link [...].

Maximum Entropy Tagger (MET)
This tagger is based on a framework suggested by Ratnaparkhi [...]. It uses an iterative procedure to successively improve parameters [...] of features that help to distinguish between relevant contexts.

Trigram Tagger (T3)
This tagger is based on HMM. The states in the model are tag pairs that emit words.
The technique has been suggested by Rabiner (1990); the implementation is influenced by Brants (2000).

Error-driven Transformation-based Tagger (TBT) [...]

Example-based Tagger (ET)
The underlying assumption of example-based models (also called memory-based, instance-based or distance-based models) [...] by Daelemans et al. [...]

12.5.7 POS Tagger for Indian Languages
The automatic text processing of Hindi and other Indian languages is constrained heavily by the lack of basic tools and large annotated corpuses. Research groups are now focusing on removing these bottlenecks. The work on the development of tools and techniques [...] several places such as CDAC, IIT Bombay, IIIT Hyderabad, University of Hyderabad, CIIL Mysore, and University [...] involved in the development of morphology analy[...] taggers for Hindi and Marathi. Both these [...] morphological structures. Their approach is based on bootstrapping a small corpus tagged by a rule-based tagger and then applying [...] techniques to train a machine. More information can be found [...] been reported by Hardie (2003) and Baker et al. (2004).

Research corpora have been developed [...] tasks. In the following section, we point out [...].

We have already provided a list of IR test document collections in Chapter 8. Glasgow University, UK, maintains a list of freely available IR test collections. Table 12.2 lists the sources of those and a few more IR test collections.

[LETOR is a] package of benchmark data sets released by Microsoft Research Asia. It consists of two datasets, OHSUMED and TREC (TD2003 and TD2004). LETOR is packaged with extracted features for each query-document pair in the collection, baseline results of several learning-to-rank algorithms on the data, and evaluation tools. The data set is aimed at supporting future research in the area of learning ranking functions for information retrieval.

Table 12.2 IR test collections
LETOR: [...]
[...]: [...]

12.6.2 Summarization Data
Evaluating a text summarizing system requires the existence of "gold summaries".
DUC provides document collections with known extracts and abstracts, which are used for evaluating the performance of summarization systems submitted at TREC conferences. Figure 12.11 shows a sample document and [...] from DUC 2002 summarization data.

[Figure 12.11: Sample document from DUC 2002 and its extract.]

12.6.3 Word Sense Disambiguation
SEMCOR is a sense-tagged corpus used in disambiguation. It is a subset of the Brown corpus, sense-tagged with WordNet synsets. Open Mind Word Expert attempts to create a very large sense-tagged corpus. It collects word sense tagging from the general public over the Web.

12.6.4 Asian Language Corpora
The multilingual EMILLE corpus is the result of the Enabling Minority Language Engineering (EMILLE) project at Lancaster University [...]. The project focuses on the generation of data, software resources and basic language engineering tools for NLP of South Asian languages. The Central Institute for Indian Languages (CIIL), the Indian partner in the project, extended the set of target languages to include a number of Indian languages [...]. The data sources that EMILLE made available include monolingual written and spoken corpuses, and parallel and annotated corpuses. The EMILLE/CIIL corpus is available for free, but for research use only, at the link [...]. Further details can be found in the manual at [...]. The text [...] published in multiple [...]. The parallel corpus includes [...] part-of-speech tagging, and a Hindi corpus annotated to show the nature of demonstrative use.

JOURNALS AND CONFERENCES IN THE AREA
A wide number of conference proceedings and [...] for computing [...] downloads [...].
The ACM SIGIR Conference is one of [...] and development forum[...]. [...] was held on 23-27 July 2007 at Amsterdam. The P[...] Retrieval Conferences (TRECs) are another important [...] information. These proceedings report evaluations organized by the US government. Th[...] organized regularly since 1992 as a part of the TIPSTER [...]. They were earlier known as the Document Understanding [...] Message Understanding Conferences. The conference series [...] by the National Institute of Standards and Technology (NIST) with additional support from other US government agencies.
The ACM Special Interest Group on Information Retrieval (ACM-SIGIR) focuses on [...]-related tasks, and ECIR is its European counterpart. The NTCIR focuses on [...] languages.

KES International Conferences in Knowledge-Based and Intelligent Engineering & Information Systems have been a regular feature since [...]. The conference mainly focuses on applications of intelligent systems. The topics covered by KES include general intelligent topics like neural networks, fuzzy techniques, genetic algorithms, knowledge representation and management, applications using intelligent techniques (e.g., speech [...] and synthesis and NLP) and emerging intelligent technologies [...] information retrieval, intelligent web mining and applications, intelligent user interfaces, etc.

HLT-NAACL is sponsored by the North American chapter of the Association for Computational Linguistics.

The Journal of Computational Linguistics is a leading publication focusing on theoretical and linguistic aspects. More practical applications are covered in the Natural Language Engineering journal. Information Retrieval by Kluwer, Information Processing and Management by Elsevier, ACM's Transactions on Information Systems (TOIS), and the Journal of American Society for Information Sciences are major journals covering a wide range of information processing applications. Other journals in the area include Information Sciences, etc.

www.sigir2007.org/
[...].gov/
kesinternational.org/conferences.php

SUMMARY
* Lexical resources such as WordNet and FrameNet can be used in a number of information processing tasks such as information retrieval, text summarization, and text categorization.
* Widely known stemmers include Porter's and Lovins stemmers.
* A part-of-speech tagger is used to assign a part of speech, such as noun, verb, pronoun, preposition, adverb, and adjective, to each word in a sentence.
* Taggers include the Stanford log-linear part-of-speech tagger, TnT, CLAWS, and Brill's tagger.
* TREC and SIGIR conferences offer useful resources for a number of information processing-related tasks.

REFERENCES
[...]nan, [...] and Durgesh Rao, 2[...], '[...] weight stemmer for [...],' Workshop on Computational Linguistics for South Asian Languages, The 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL'03), ACL, Morristown, NJ.
[...], A.M. McEnery, and B.D. Jayaram, 2004, 'Corpus [...] and South Asian languages: corpus creation and tool development,' Literary and Linguistic Computing, 19(4), 509-24.
[Barzilay,] Regina and Michael Elhadad, 1997, 'Using lexical chains for summarization,' Proceedings of the Intelligent Scalable Text Summarization Workshop (ISTS'97), ACL, Madrid.
Brants, Thorsten, 2000, 'TnT: a statistical part-of-speech tagger,' Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000), Seattle, WA.
Brill, E., 1992, 'A simple rule-based part-of-speech tagger,' Proceedings of the Third Conference on Applied Natural Language Processing, ACL, Budapest, Hungary.
