Tao
Hamish Cunningham, Diana Maynard, Kalina Bontcheva, Valentin Tablan, Niraj Aswani, Ian Roberts, Genevieve Gorrell, Adam Funk, Angus Roberts, Danica Damljanovic, Thomas Heitz, Mark A. Greenwood, Horacio Saggion, Johann Petrak, Yaoyong Li, Wim Peters
et al
http://gate.ac.uk/
This user manual is free, but please consider making a donation. HTML version: http://gate.ac.uk/userguide
Work on GATE has been partly supported by EPSRC grants GR/K25267 (Large-Scale Information Extraction), GR/M31699 (GATE 2), RA007940 (EMILLE), GR/N15764/01 (AKT) and GR/R85150/01 (MIAKT), AHRB grant APN16396 (ETCSL/GATE), Matrixware, the Information Retrieval Facility and several EU-funded projects: (SEKT, TAO, NeOn, MediaCampaign, Musing, KnowledgeWeb, PrestoSpace, h-TechSight, and enIRaF).
This work is licenced under the Creative Commons Attribution-No Derivative Licence. You are free to copy, distribute, display, and perform the work under the following conditions:
Attribution. You must give the original author credit.
No Derivative Works. You may not alter, transform, or build upon this work.
Waiver
Any of the above conditions can be waived if you get permission from the copyright holder.
Other Rights
In no way are any of the following rights affected by the license: your fair dealing or fair use rights; the author's moral rights; rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.
Notice
For any reuse or distribution, you must make clear to others the licence terms of this work.
For more information about the Creative Commons Attribution-No Derivative License, please visit this web address: http://creativecommons.org/licenses/by-nd/2.0/uk/
Brief Contents
I GATE Basics
1 Introduction
2 Installing and Running GATE
3 Using GATE Developer
4 CREOLE: the GATE Component Model
5 Language Resources: Corpora, Documents and Annotations
6 ANNIE: a Nearly-New Information Extraction System
19 Tools for Alignment Tasks
20 Combining GATE and UIMA
21 More (CREOLE) Plugins
Appendices
A Change Log
B Version 5.1 Plugins Name Map
C Obsolete CREOLE Plugins
D Design Notes
E Ant Tasks for GATE
F Named-Entity State Machine Patterns
G Part-of-Speech Tags used in the Hepple Tagger
References
Contents
I GATE Basics
1 Introduction
1.1 How to Use this Text
1.2 Context
1.3 Overview
1.3.1 Developing and Deploying Language Processing Facilities
1.3.2 Built-In Components
1.3.3 Additional Facilities in GATE Developer/Embedded
1.3.4 An Example
1.4 Some Evaluations
1.5 Recent Changes
1.5.1 Version 7.1 (November 2012)
1.6 Further Reading
2 Installing and Running GATE
2.1 Downloading GATE
2.2 Installing and Running GATE
2.2.1 The Easy Way
2.2.2 The Hard Way (1)
2.2.3 The Hard Way (2): Subversion
2.2.4 Running GATE Developer on Unix/Linux
2.3 Using System Properties with GATE
2.4 Configuring GATE
2.5 Building GATE
2.5.1 Using GATE with Maven/Ivy
2.6 Uninstalling GATE
2.7 Troubleshooting
3 Using GATE Developer
3.1 The GATE Developer Main Window
3.2 Loading and Viewing Documents
3.3 Creating and Viewing Corpora
3.4 Working with Annotations
3.4.1 The Annotation Sets View
3.4.2 The Annotations List View
3.4.3 The Annotations Stack View
3.4.4 The Co-reference Editor
3.4.5 Creating and Editing Annotations
3.4.6 Schema-Driven Editing
3.4.7 Printing Text with Annotations
3.5 Using CREOLE Plugins
3.6 Installing and updating CREOLE Plugins
3.7 Loading and Using Processing Resources
3.8 Creating and Running an Application
3.8.1 Running an Application on a Datastore
3.8.2 Running PRs Conditionally on Document Features
3.8.3 Doing Information Extraction with ANNIE
3.8.4 Modifying ANNIE
3.9 Saving Applications and Language Resources
3.9.1 Saving Documents to File
3.9.2 Saving and Restoring LRs in Datastores
3.9.3 Saving Application States to a File
3.9.4 Saving an Application with its Resources (e.g. GATECloud.net)
3.10 Keyboard Shortcuts
3.11 Miscellaneous
3.11.1 Stopping GATE from Restoring Developer Sessions/Options
3.11.2 Working with Unicode
4 CREOLE: the GATE Component Model
4.1 The Web and CREOLE
4.2 The GATE Framework
4.3 The Lifecycle of a CREOLE Resource
4.4 Processing Resources and Applications
4.5 Language Resources and Datastores
4.6 Built-in CREOLE Resources
4.7 CREOLE Resource Configuration
4.7.1 Configuration with XML
4.7.2 Configuring Resources using Annotations
4.7.3 Mixing the Configuration Styles
4.7.4 Loading Third-Party Libraries using Apache Ivy
4.8 Tools: How to Add Utilities to GATE Developer
4.8.1 Putting your tools in a sub-menu
5 Language Resources: Corpora, Documents and Annotations
5.1 Features: Simple Attribute/Value Data
5.2 Corpora: Sets of Documents plus Features
5.3 Documents: Content plus Annotations plus Features
5.4 Annotations: Directed Acyclic Graphs
5.4.1 Annotation Schemas
5.4.2 Examples of Annotated Documents
5.4.3 Creating, Viewing and Editing Diverse Annotation Types
5.5 Document Formats
5.5.1 Detecting the Right Reader
5.5.2 XML
5.5.3 HTML
5.5.4 SGML
5.5.5 Plain text
5.5.6 RTF
5.5.7 Email
5.5.8 PDF Files and Office Documents
5.5.9 UIMA CAS Documents
5.5.10 CoNLL/IOB Documents
5.6 XML Input/Output
7.1 Quick Start with GATE Embedded
7.2 Resource Management in GATE Embedded
7.3 Using CREOLE Plugins
7.4 Language Resources
7.4.1 GATE Documents
7.4.2 Feature Maps
7.4.3 Annotation Sets
7.4.4 Annotations
7.4.5 GATE Corpora
7.5 Processing Resources
7.6 Controllers
7.7 Modelling Relations between Annotations
7.8 Duplicating a Resource
7.8.1 Sharable properties
7.9 Persistent Applications
7.10 Ontologies
7.11 Creating a New Annotation Schema
7.12 Creating a New CREOLE Resource
7.13 Adding Support for a New Document Format
7.14 Using GATE Embedded in a Multithreaded Environment
7.15 Using GATE Embedded within a Spring Application
7.15.1 Duplication in Spring
7.15.2 Spring pooling
7.15.3 Further reading
7.16 Using GATE Embedded within a Tomcat Web Application
7.16.1 Recommended Directory Structure
7.16.2 Configuration Files
7.16.3 Initialization Code
7.17 Groovy for GATE
7.17.1 Groovy Scripting Console for GATE
7.17.2 Groovy scripting PR
7.17.3 The Scriptable Controller
7.17.4 Utility methods
7.18 Saving Config Data to gate.xml
7.19 Annotation merging through the API
8.1 The Left-Hand Side
8.1.1 Matching Entire Annotation Types
8.1.2 Using Features and Values
8.1.3 Using Meta-Properties
8.1.4 Building complex patterns from simple patterns
8.1.5 Matching a Simple Text String
8.1.6 Using Templates
8.1.7 Multiple Pattern/Action Pairs
8.1.8 LHS Macros
8.1.9 Multi-Constraint Statements
8.1.10 Using Context
8.1.11 Negation
8.1.12 Escaping Special Characters
8.2 LHS Operators in Detail
8.2.1 Equality Operators
8.2.2 Comparison Operators
8.2.3 Regular Expression Operators
8.2.4 Contextual Operators
8.2.5 Custom Operators
8.3 The Right-Hand Side
8.3.1 A Simple Example
8.3.2 Copying Feature Values from the LHS to the RHS
8.3.3 Optional or Empty Labels
8.3.4 RHS Macros
8.4 Use of Priority
8.5 Using Phases Sequentially
8.6 Using Java Code on the RHS
8.6.1 A More Complex Example
8.6.2 Adding a Feature to the Document
8.6.3 Finding the Tokens of a Matched Annotation
8.6.4 Using Named Blocks
8.6.5 Java RHS Overview
8.7 Optimising for Speed
8.8 Ontology Aware Grammar Transduction
8.9 Serializing JAPE Transducer
8.9.1 How to Serialize?
8.9.2 How to Use the Serialized Grammar File?
8.10 Notes for Montreal Transducer Users
8.11 JAPE Plus
9 ANNIC: ANNotations-In-Context
9.1 Instantiating SSD
9.2 Search GUI
9.2.1 Overview
9.2.2 Syntax of Queries
9.2.3 Top Section
9.2.4 Central Section
9.2.5 Bottom Section
9.3 Using SSD from GATE Embedded
9.3.1 How to instantiate a searchabledatastore
9.3.2 How to search in this datastore
10.1 Metrics for Evaluation in Information Extraction
10.1.1 Annotation Relations
10.1.2 Cohen's Kappa
10.1.3 Precision, Recall, F-Measure
10.1.4 Macro and Micro Averaging
10.2 The Annotation Diff Tool
10.2.1 Performing Evaluation with the Annotation Diff Tool
10.2.2 Creating a Gold Standard with the Annotation Diff Tool
10.3 Corpus Quality Assurance
10.3.1 Description of the interface
10.3.2 Step by step usage
10.3.3 Details of the Corpus statistics table
10.3.4 Details of the Document statistics table
10.3.5 GATE Embedded API for the measures
10.3.6 sec:eval:qapr
10.4 Corpus Benchmark Tool
10.4.1 Preparing the Corpora for Use
10.4.2 Defining Properties
10.4.3 Running the Tool
10.4.4 The Results
10.5 A Plugin Computing Inter-Annotator Agreement (IAA)
10.5.1 IAA for Classification
10.5.2 IAA For Named Entity Annotation
10.5.3 The BDM-Based IAA Scores
10.6 A Plugin Computing the BDM Scores for an Ontology
10.7 Quality Assurance Summariser for Teamware
11.1 Overview
11.1.1 Features
11.1.2 Limitations
11.2 Graphical User Interface
11.3 Command Line Interface
11.4 Application Programming Interface
11.4.1 Log4j.properties
11.4.2 Benchmark log format
11.4.3 Enabling profiling
11.4.4 Reporting tool
12 Developing GATE
12.1 Reporting Bugs and Requesting Features
12.2 Contributing Patches
12.3 Creating New Plugins
12.3.1 What to Call your Plugin
12.3.2 Writing a New PR
12.3.3 Writing a New VR
12.3.4 Writing a 'Ready Made' Application
12.3.5 Distributing Your New Plugins
12.4 Updating this User Guide
12.4.1 Building the User Guide
12.4.2 Making Changes to the User Guide
14.2 Ontology Event Model
14.2.1 What Happens when a Resource is Deleted?
14.3 The Ontology Plugin: Current Implementation
14.3.1 The OWLIMOntology Language Resource
14.3.2 The ConnectSesameOntology Language Resource
14.3.3 The CreateSesameOntology Language Resource
14.3.4 The OWLIM2 Backwards-Compatible Language Resource
14.3.5 Using Ontology Import Mappings
14.3.6 Using BigOWLIM
14.3.7 The sesameCLI command line interface
14.4 The Ontology_OWLIM2 plugin: backwards-compatible implementation
14.4.1 The OWLIMOntologyLR Language Resource
14.5 GATE Ontology Editor
14.6 Ontology Annotation Tool
14.6.1 Viewing Annotated Text
14.6.2 Editing Existing Annotations
14.6.3 Adding New Annotations
14.6.4 Options
14.7 Relation Annotation Tool
14.7.1 Description of the two views
14.7.2 Create new annotation and instance from text selection
14.7.3 Create new annotation and add label to existing instance from text selection
14.7.4 Create and set properties for annotation relation
14.7.5 Delete instance, label or property
14.7.6 Differences with OAT and Ontology Editor
14.8 Using the ontology API
14.9 Using the ontology API (old version)
14.10 Ontology-Aware JAPE Transducer
14.11 Annotating Text with Ontological Information
14.12 Populating Ontologies
14.13 Ontology API and Implementation Changes
14.13.1 Differences between the implementation plugins
14.13.2 Changes in the Ontology API
15.1 Language Identification
15.1.1 Fingerprint Generation
15.2 French Plugin
15.3 German Plugin
15.4 Romanian Plugin
15.5 Arabic Plugin
15.6 Chinese Plugin
15.6.1 Chinese Word Segmentation
16.1 Biomedical Support
16.1.1 ABNER
16.1.2 MetaMap
16.1.3 GSpell biomedical spelling suggestion and correction
16.1.4 BADREX
16.1.5 MiniChem/Drug Tagger
16.1.6 AbGene
16.1.7 GENIA
16.1.8 Penn BioTagger
16.1.9 MutationFinder
16.1.10 NormaGene
17 Parsers
17.1 Minipar Parser
17.1.1 Platform Supported
17.1.2 Resources
17.1.3 Parameters
17.1.4 Prerequisites
17.1.5 Grammatical Relationships
17.2 RASP Parser
17.3 SUPPLE Parser
17.3.1 Requirements
17.3.2 Building SUPPLE
17.3.3 Running the Parser in GATE
17.3.4 Viewing the Parse Tree
17.3.5 System Properties
17.3.6 Configuration Files
17.3.7 Parser and Grammar
17.3.8 Mapping Named Entities
17.3.9 Upgrading from BuChart to SUPPLE
17.4 Stanford Parser
17.4.1 Input Requirements
17.4.2 Initialization Parameters
17.4.3 Runtime Parameters
18 Machine Learning
18.1 ML Generalities
18.1.1 Some Definitions
18.1.2 GATE-Specific Interpretation of the Above Definitions
18.2 Batch Learning PR
18.2.1 Batch Learning PR Configuration File Settings
18.2.2 Case Studies for the Three Learning Types
18.2.3 How to Use the Batch Learning PR in GATE Developer
18.2.4 Output of the Batch Learning PR
18.2.5 Using the Batch Learning PR from the API
18.3 Machine Learning PR
18.3.1 The DATASET Element
18.3.2 The ENGINE Element
18.3.3 The WEKA Wrapper
18.3.4 The MAXENT Wrapper
18.3.5 The SVM Light Wrapper
18.3.6 Example Configuration File
19 Tools for Alignment Tasks
19.1 Introduction
19.2 The Tools
19.2.1 Compound Document
19.2.2 CompoundDocumentFromXml
19.2.3 Compound Document Editor
19.2.4 Composite Document
19.2.5 DeleteMembersPR
19.2.6 SwitchMembersPR
19.2.7 Saving as XML
19.2.8 Alignment Editor
19.2.9 Saving Files and Alignments
19.2.10 Section-by-Section Processing
20 Combining GATE and UIMA
20.1 Embedding a UIMA AE in GATE
20.1.1 Mapping File Format
20.1.2 The UIMA Component Descriptor
20.1.3 Using the AnalysisEnginePR
20.2 Embedding a GATE CorpusController in UIMA
20.2.1 Mapping File Format
20.2.2 The GATE Application Definition
20.2.3 Configuring the GATEApplicationAnnotator
21 More (CREOLE) Plugins
21.1 Verb Group Chunker
21.2 Noun Phrase Chunker
21.2.1 Differences from the Original
21.2.2 Using the Chunker
21.3 TaggerFramework
21.3.1 TreeTagger - Multilingual POS Tagger
21.3.2 GENIA and Double Quotes
21.4 Chemistry Tagger
21.4.1 Using the Tagger
21.5 Zemanta Semantic Annotation Service
21.6 Lupedia Semantic Annotation Service
21.7 Annotating Numbers
21.7.1 Numbers in Words and Numbers
21.7.2 Roman Numerals
21.8 Annotating Measurements
21.9 Annotating and Normalizing Dates
21.10 Snowball Based Stemmers
21.10.1 Algorithms
21.11 GATE Morphological Analyzer
21.11.1 Rule File
21.12 Flexible Exporter
21.13 Configurable Exporter
21.14 Annotation Set Transfer
21.15 Schema Enforcer
21.16 Information Retrieval in GATE
21.16.1 Using the IR Functionality in GATE
21.16.2 Using the IR API
21.17 Websphinx Web Crawler
21.17.1 Using the Crawler PR
21.17.2 Proxy configuration
21.18 WordNet in GATE
21.18.1 The WordNet API
21.19 Kea - Automatic Keyphrase Detection
21.19.1 Using the 'KEA Keyphrase Extractor' PR
21.19.2 Using Kea Corpora
21.20 Annotation Merging Plugin
21.21 Copying Annotations between Documents
21.22 OpenCalais Plugin
21.23 LingPipe Plugin
21.23.1 LingPipe Tokenizer PR
21.23.2 LingPipe Sentence Splitter PR
21.23.3 LingPipe POS Tagger PR
21.23.4 LingPipe NER PR
21.23.5 LingPipe Language Identifier PR
21.24 OpenNLP Plugin
21.24.1 Init parameters and models
21.24.2 OpenNLP PRs
21.24.3 Obtaining and generating models
21.25 Content Detection Using Boilerpipe
21.26 Inter Annotator Agreement
21.27 Schema Annotation Editor
21.28 Coref Tools Plugin
21.29 Pubmed Format
21.30 MediaWiki Format
21.31 TermRaider term extraction tools
21.31.1 Termbank language resources
21.31.2 Termbank Score Copier
24 GATE Mímir
Appendices
A Change Log
A.1 Version 7.1 (November 2012)
A.1.1 New plugins
A.1.2 Library updates
A.1.3 GATE Embedded API changes
A.2 Version 7.0 (February 2012)
A.2.1 Major new features
A.2.2 Removal of deprecated functionality
A.2.3 Other enhancements and bug fixes
A.3 Version 6.1 (April 2011)
A.3.1 New CREOLE Plugins
A.3.2 Other new features and improvements
A.4 Version 6.0 (November 2010)
A.4.1 Major new features
A.4.2 Breaking changes
A.4.3 Other new features and bugfixes
A.5 Version 5.2.1 (May 2010)
A.6 Version 5.2 (April 2010)
A.6.1 JAPE and JAPE-related
A.6.2 Other Changes
A.7 Version 5.1 (December 2009)
A.7.1 New Features
A.7.2 JAPE improvements
A.7.3 Other improvements and bug fixes
A.8 Version 5.0 (May 2009)
A.8.1 Major New Features
A.8.2 Other New Features and Improvements
A.8.3 Specific Bug Fixes
A.9 Version 4.0 (July 2007)
A.9.1 Major New Features
A.9.2 Other New Features and Improvements
A.9.3 Bug Fixes and Optimizations
A.10 Version 3.1 (April 2006)
A.10.1 Major New Features
A.10.2 Other New Features and Improvements
A.10.3 Bug Fixes
A.11 January 2005
A.12 December 2004
A.13 September 2004
A.14 Version 3 Beta 1 (August 2004)
A.15 July 2004
A.16 June 2004
A.17 April 2004
A.18 March 2004
A.19 Version 2.2 - August 2003
A.20 Version 2.1 - February 2003
A.21 June
Contents
C.1 Ontotext JapeC Compiler
C.2 Google Plugin
C.3 Yahoo Plugin
C.3.1 Using the YahooPR
C.4 Gazetteer Visual Resource - GAZE
C.4.1 Display Modes
C.4.2 Linear Definition Pane
C.4.3 Linear Definition Toolbar
C.4.4 Operations on Linear Definition Nodes
C.4.5 Gazetteer List Pane
C.4.6 Mapping Definition Pane
C.5 Google Translator PR
D Design Notes
D.1 Patterns
D.1.1 Components
D.1.2 Model, view, controller
D.1.3 Interfaces
D.2 Exception Handling
E.1 Declaring the Tasks
E.2 The packagegapp task - bundling an application with its dependencies
E.2.1 Introduction
E.2.2 Basic Usage
E.2.3 Handling Non-Plugin Resources
E.2.4 Streamlining your Plugins
E.2.5 Bundling Extra Resources
E.3 The expandcreoles Task - Merging Annotation-Driven Config into creole.xml
F.1 main.jape
F.2 first.jape
F.3 firstname.jape
F.4 name.jape
F.4.1 Person
F.4.2 Location
F.4.3 Organization
F.4.4 Ambiguities
F.4.5 Contextual information
F.5 name_post.jape
F.6 date_pre.jape
F.7 date.jape
F.8 reldate.jape
F.9 number.jape
F.10 address.jape
F.11 url.jape
F.12 identifier.jape
F.13 jobtitle.jape
F.14 final.jape
F.15 unknown.jape
F.16 name_context.jape
F.17 org_context.jape
F.18 loc_context.jape
F.19 clean.jape
Chapter 1 Introduction
Software documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing. (Anonymous.)
There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies; the other way is to make it so complicated that there are no obvious deficiencies. (C.A.R. Hoare)
A computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for expressing ideas about methodology. Thus, programs must be written for people to read, and only incidentally for machines to execute. (The Structure and Interpretation of Computer Programs, H. Abelson, G. Sussman and J. Sussman, 1985.)
If you try to make something beautiful, it is often ugly. If you try to make something useful, it is often beautiful. (Oscar Wilde)[1]
GATE[2] is an infrastructure for developing and deploying software components that process human language. It is nearly 15 years old and is in active use for all types of computational task involving human language. GATE excels at text analysis of all shapes and sizes. From large corporations to small startups, from multi-million research consortia to undergraduate projects, our user community is the largest and most diverse of any system of this type, and is spread across all but one of the continents[3]. GATE is open source free software; users can obtain free support from the user and developer community via GATE.ac.uk or on a commercial basis from our industrial partners. We are the biggest open source language processing project with a development team more than double the size of the largest comparable projects (many of which are integrated with
[1] These were, at least, our ideals; of course we didn't completely live up to them...
[2] If you've read the overview at http://gate.ac.uk/overview.html, you may prefer to skip to Section 1.1.
[3] Rumours that we're planning to send several of the development team to Antarctica on one-way tickets are false, libellous and wishful thinking.
GATE[4]). More than 5 million has been invested in GATE development[5]; our objective is to make sure that this continues to be money well spent for all GATE's users. The GATE family of tools has grown over the years to include a desktop client for developers, a workflow-based web application, a Java library, an architecture and a process. GATE is:
• an IDE, GATE Developer[6]: an integrated development environment for language processing components bundled with a very widely used Information Extraction system and a comprehensive set of other plugins
• a cloud computing solution for hosted large-scale text processing, GATE Cloud (http://gatecloud.net/). See also Chapter 22.
• a web app, GATE Teamware: a collaborative annotation environment for factory-style semantic annotation projects built around a workflow engine and a heavily-optimised backend service infrastructure. See also Chapter 23.
• a multi-paradigm search repository, GATE Mímir, which can be used to index and search over text, annotations, semantic schemas (ontologies), and semantic meta-data (instance data). It allows queries that arbitrarily mix full-text, structural, linguistic and semantic queries and that can scale to terabytes of text. See also Chapter 24.
• a framework, GATE Embedded: an object library optimised for inclusion in diverse applications giving access to all the services used by GATE Developer and more.
• an architecture: [...] composition.
• a process [...]
We also develop:
• a wiki/CMS, GATE Wiki (http://gatewiki.sf.net/), mainly to host our own websites and as a testbed for some of our experiments
For more information on the GATE family see http://gate.ac.uk/family/ and also Part IV of this book.
One of our original motivations was to remove the necessity for solving common engineering problems before doing useful research, or re-engineering before deploying research results into applications. Core functions of GATE take care of the lion's share of the engineering:
[4] Our philosophy is reuse not reinvention, so we integrate and interoperate with other systems e.g.: LingPipe, OpenNLP, UIMA, and many more specific tools.
[5] This is the figure for direct Sheffield-based investment only and therefore an underestimate.
[6] GATE Developer and GATE Embedded are bundled, and in older distributions were referred to just as `GATE'.
• modelling and persistence of specialised data structures
• measurement, evaluation, benchmarking (never believe a computing researcher who hasn't measured their results in a repeatable and open setting!)
• visualisation and editing of annotations, ontologies, parse trees, etc.
• a finite state transduction language for rapid prototyping and efficient implementation of shallow analysis methods (JAPE)
• extraction of training instances for machine learning
• pluggable machine learning implementations (Weka, SVM Light, ...)
On top of the core functions GATE includes components for diverse language processing tasks, e.g. parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. GATE Developer and Embedded are supplied with an Information Extraction system (ANNIE) which has been adapted and evaluated very widely (numerous industrial systems, research systems evaluated in MUC, TREC, ACE, DUC, Pascal, NTCIR, etc.). ANNIE is often used to create RDF or OWL (metadata) for unstructured content (semantic annotation).
GATE version 1 was written in the mid-1990s; at the turn of the new millennium we completely rewrote the system in Java; version 5 was released in June 2009; and version 6 in November 2010. We believe that GATE is the leading system of its type, but as scientists we have to advise you not to take our word for it; that's why we've measured our software in many of the competitive evaluations over the last decade-and-a-half (MUC, TREC, ACE, DUC and more; see Section 1.4 for details). We invite you to give it a try, to get involved with the GATE community, and to contribute to human language science, engineering and development.
This book describes how to use GATE to develop language processing components, test their performance and deploy them as parts of other applications. In the rest of this chapter:
• Section 1.1 describes the best way to use this book;
• Section 1.2 briefly notes that the context of GATE is applied language processing, or Language Engineering;
• Section 1.3 gives an overview of developing using GATE;
• Section 1.4 lists publications describing GATE performance in evaluations;
• Section 1.5 outlines what is new in the current version of GATE;
• Section 1.6 lists other publications about GATE.
Note: if you don't see the component you need in this document, or if we mention a component that you can't see in the software, contact gate-users@lists.sourceforge.net[7] - various components are developed by our collaborators, who we will be happy to put you in contact with. (Often the process of getting a new component is as simple as typing the URL into GATE Developer; the system will do the rest.)
1.1
The material presented in this book ranges from the conceptual (e.g. `what is software architecture?') to practical instructions for programmers (e.g. how to deal with GATE exceptions) and linguists (e.g. how to write a pattern grammar). Furthermore, GATE's highly extensible nature means that new functionality is constantly being added in the form of new plugins. Important functionality is as likely to be located in a plugin as it is to be integrated into the GATE core. This presents something of an organisational challenge. Our (no doubt imperfect) solution is to divide this book into three parts. Part I covers installation, using the GATE Developer GUI and using ANNIE, as well as providing some background and theory. We recommend the new user to begin with Part I. Part II covers the more advanced of the core GATE functionality; the GATE Embedded API and JAPE pattern language among other things. Part III provides a reference for the numerous plugins that have been created for GATE. Although ANNIE provides a good starting point, the user will soon wish to explore other resources, and so will need to consult this part of the text. We recommend that Part III be used as a reference, to be dipped into as necessary. In Part III, plugins are grouped into broad areas of functionality.
1.2
Context
GATE can be thought of as a Software Architecture for Language Engineering [Cunningham 00]. `Software Architecture' is used rather loosely here to mean computer infrastructure for software development, including development environments and frameworks, as well as the more usual use of the term to denote a macro-level organisational structure for software systems [Shaw & Garlan 96]. Language Engineering (LE) may be defined as:
. . . the discipline or act of engineering software systems that perform tasks involving processing human language. Both the construction process and its outputs are measurable and predictable. The literature of the field relates to both application of relevant scientific results and a body of practice. [Cunningham 99]
[7] Follow the `support' link from http://gate.ac.uk/ to subscribe to the mailing list.
The relevant scientific results in this case are the outputs of Computational Linguistics, Natural Language Processing and Artificial Intelligence in general. Unlike these other disciplines, LE, as an engineering discipline, entails predictability, both of the process of constructing LE-based software and of the performance of that software after its completion and deployment in applications. Some working definitions:
1. Computational Linguistics (CL): science of language that uses computation as an investigative tool.
2. Natural Language Processing (NLP): science of computation whose subject matter is data structures and algorithms for computer processing of human language.
3. Language Engineering (LE): building NLP systems whose cost and outputs are measurable and predictable.
4. Software Architecture: macro-level organisational principles for families of systems. In this context is also used as infrastructure.
5. Software Architecture for Language Engineering (SALE): software infrastructure, architecture and development tools for applied CL, NLP and LE.
(Of course the practice of these fields is broader and more complex than these definitions.)
In the scientific endeavours of NLP and CL, GATE's role is to support experimentation. In this context GATE's significant features include support for automated measurement (see Chapter 10), providing a `level playing field' where results can easily be repeated across different sites and environments, and reducing research overheads in various ways.
1.3
Overview
Components are reusable software chunks with well-defined interfaces, and are a popular architectural form, used in Sun's Java Beans and Microsoft's .Net, for example. GATE components are specialised types of Java Bean, and come in three flavours:
• LanguageResources (LRs) represent entities such as lexicons, corpora or ontologies;
• ProcessingResources (PRs) represent entities that are primarily algorithmic, such as parsers, generators or ngram modellers;
• VisualResources (VRs) represent visualisation and editing components that participate in GUIs.
These definitions can be blurred in practice as necessary. Collectively, the set of resources integrated with GATE is known as CREOLE: a Collection of REusable Objects for Language Engineering. All the resources are packaged as Java Archive (or `JAR') files, plus some XML configuration data. The JAR and XML files are made available to GATE by putting them on a web server, or simply placing them in the local file space. Section 1.3.2 introduces GATE's built-in resource set.
When using GATE to develop language processing functionality for an application, the developer uses GATE Developer and GATE Embedded to construct resources of the three types. This may involve programming, or the development of Language Resources such as grammars that are used by existing Processing Resources, or a mixture of both. GATE Developer is used for visualisation of the data structures produced and consumed during processing, and for debugging, performance measurement and so on. For example, figure 1.1 is a screenshot of one of the visualisation tools.
GATE Developer is analogous to systems like Mathematica for Mathematicians, or JBuilder for Java programmers: it provides a convenient graphical environment for research and development of language processing software.
When an appropriate set of resources have been developed, they can then be embedded in the target client application using GATE Embedded. GATE Embedded is supplied as a series of JAR files.[9] To embed GATE-based language processing facilities in an application, these JAR files are all that is needed, along with JAR files and XML configuration files for the various resources that make up the new facilities.
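The division of labour between language resources and processing resources can be illustrated with a minimal, self-contained Java sketch. It deliberately does not use the GATE API: the names Doc, Processor, Tokeniser, NameLookup and pipeline are hypothetical stand-ins, chosen only to show a piece of data (the language resource) being handed to a sequence of algorithms (the processing resources). Visual resources are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: none of these types belong to the GATE API.
final class ComponentSketch {

    /** Stand-in for a language resource: a text plus the annotations made on it. */
    static final class Doc {
        final String text;
        final List<String> annotations = new ArrayList<>();

        Doc(String text) {
            this.text = text;
        }
    }

    /** Stand-in for a processing resource: an algorithm that annotates a Doc. */
    interface Processor {
        void process(Doc doc);
    }

    /** Marks every run of non-whitespace characters as a Token. */
    static final class Tokeniser implements Processor {
        private static final Pattern WORD = Pattern.compile("\\S+");

        @Override
        public void process(Doc doc) {
            Matcher m = WORD.matcher(doc.text);
            while (m.find()) {
                doc.annotations.add("Token[" + m.start() + "," + m.end() + "]");
            }
        }
    }

    /** Gazetteer-like lookup: marks each occurrence of a listed name. */
    static final class NameLookup implements Processor {
        private final List<String> names;

        NameLookup(List<String> names) {
            this.names = names;
        }

        @Override
        public void process(Doc doc) {
            for (String name : names) {
                if (name.isEmpty()) {
                    continue; // an empty entry would match everywhere
                }
                int at = doc.text.indexOf(name);
                while (at >= 0) {
                    doc.annotations.add("Lookup[" + at + "," + (at + name.length()) + "]");
                    at = doc.text.indexOf(name, at + name.length());
                }
            }
        }
    }

    /** Runs the given processors over a document one after another. */
    static Processor pipeline(Processor... steps) {
        return doc -> {
            for (Processor step : steps) {
                step.process(doc);
            }
        };
    }

    public static void main(String[] args) {
        Doc doc = new Doc("Ask Ada Lovelace or Alan Turing");
        pipeline(new Tokeniser(),
                 new NameLookup(Arrays.asList("Ada Lovelace", "Alan Turing")))
            .process(doc);
        doc.annotations.forEach(System.out::println);
    }
}
```

Keeping the data and the algorithms behind separate interfaces is what lets one of them be swapped without touching the other, for example replacing the name list while keeping the tokeniser.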
[9] The main JAR file (gate.jar) supplies the framework. Built-in resources and various 3rd-party libraries are supplied as separate JARs; for example guk.jar, the GATE Unicode Kit, contains Unicode support (e.g. additional input methods for languages not currently supported by the JDK). They are separate because the latter has to be a Java extension with a privileged security profile.
1.3.4 An Example
This section gives a very brief example of a typical use of GATE to develop and deploy language processing capabilities in an application, and to generate quantitative results for scientific publication.
Let's imagine that a developer called Fatima is building an email client[11] for Cyberdyne Systems' large corporate Intranet. In this application she would like to have a language processing system that automatically spots the names of people in the corporation and transforms them into mailto hyperlinks.
A little investigation shows that GATE's existing components can be tailored to this purpose. Fatima starts up GATE Developer, and creates a new document containing some example emails. She then loads some processing resources that will do named-entity recognition (a tokeniser, gazetteer and semantic tagger), and creates an application to run these components on the document in sequence. Having processed the emails, she can see the results in one of several viewers for annotations.
The GATE components are a decent start, but they need to be altered to deal specially with people from Cyberdyne's personnel database. Therefore Fatima creates new `cyber-' versions of the gazetteer and semantic tagger resources, using the `bootstrap' tool. This tool creates a directory structure on disk that has some Java stub code, a Makefile and an XML
[10] JDK: Java Development Kit, Sun Microsystem's Java implementation. Unicode support is being actively improved by Sun, but at the time of writing many languages are still unsupported. In fact, Unicode itself doesn't support all languages, e.g. Sylheti; hopefully this will change in time.
[11] Perhaps because Outlook Express trashed her mail folder again, or because she got tired of Microsoft-specific viruses and hadn't heard of Gmail or Thunderbird.
configuration file. After several hours struggling with badly written documentation, Fatima manages to compile the stubs and create a JAR file containing the new resources. She tells GATE Developer the URL of these files[12], and the system then allows her to load them in the same way that she loaded the built-in resources earlier on.
Fatima then creates a second copy of the email document, and uses the annotation editing facilities to mark up the results that she would like to see her system producing. She saves this and the version that she ran GATE on into her serial datastore. From now on she can follow this routine:
1. Run her application on the email test corpus.
2. Check the performance of the system by running the `annotation diff' tool to compare her manual results with the system's results. This gives her both percentage accuracy figures and a graphical display of the differences between the machine and human outputs.
3. Make edits to the code, pattern grammars or gazetteer lists in her resources, and recompile where necessary.
4. Tell GATE Developer to re-initialise the resources.
5. Go to 1.
To make the alterations that she requires, Fatima re-implements the ANNIE gazetteer so that it regenerates itself from the local personnel data. She then alters the pattern grammar in the semantic tagger to prioritise recognition of names from that source. This latter job involves learning the JAPE language (see Chapter 8), but as this is based on regular expressions it isn't too difficult.
Eventually the system is running nicely, and her accuracy is 93% (there are still some problem cases, e.g. when people use nicknames, but the performance is good enough for production use). Now Fatima stops using GATE Developer and works instead on embedding the new components in her email application using GATE Embedded. This application is written in Java, so embedding is very easy[13]: the GATE JAR files are added to the project CLASSPATH, the new components are placed on a web server, and with a little code to do initialisation, loading of components and so on, the job is finished in half a day -
the code to talk to GATE takes up only around 150 lines of the eventual application, most of which is just copied from the example in the sheffield.examples.StandAloneAnnie class.
Because Fatima is worried about Cyberdyne's unethical policy of developing Skynet to help the large corporates of the West strengthen their strangle-hold over the World, she wants to get a job as an academic instead (so that her conscience will only have to cope with the
[12] While developing, she uses a file:/... URL; for deployment she can put them on a web server.
[13] Languages other than Java require an additional interface layer, such as JNI, the Java Native Interface, which is in C.
torture of students, as opposed to humanity). She takes the accuracy measures that she has attained for her system and writes a paper for the Journal of Nasturtium Logarithm Incitement describing the approach used and the results obtained. Because she used GATE for development, she can cite the repeatability of her experiments and offer access to example binary versions of her software by putting them on an external web server.
And everybody lived happily ever after.
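The comparison at the heart of step 2 of Fatima's routine, manual annotations against system annotations, can be sketched in a few lines of plain Java. This is only an illustration of the idea, not the annotation diff tool itself: the Span and Score classes are hypothetical, and the sketch assumes strict matching (same offsets and same type) and reports precision, recall and the balanced F-measure, whereas the real tool may count matches differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a strict-match comparison, not GATE's annotation diff tool.
final class AnnotationDiffSketch {

    /** A hypothetical annotation: character offsets plus a type label. */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) {
                return false;
            }
            Span s = (Span) o;
            return start == s.start && end == s.end && type.equals(s.type);
        }

        @Override
        public int hashCode() {
            return 31 * (31 * start + end) + type.hashCode();
        }
    }

    /** Precision, recall and balanced F-measure for one comparison. */
    static final class Score {
        final double precision;
        final double recall;
        final double f1;

        Score(double precision, double recall, double f1) {
            this.precision = precision;
            this.recall = recall;
            this.f1 = f1;
        }
    }

    /** Scores the system output (response) against the manual annotations (key). */
    static Score score(Set<Span> key, Set<Span> response) {
        Set<Span> correct = new HashSet<>(key);
        correct.retainAll(response); // annotations present in both sets
        double p = response.isEmpty() ? 0.0 : (double) correct.size() / response.size();
        double r = key.isEmpty() ? 0.0 : (double) correct.size() / key.size();
        double f = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new Score(p, r, f);
    }

    public static void main(String[] args) {
        Set<Span> key = new HashSet<>(Arrays.asList(
            new Span(0, 11, "Person"), new Span(20, 30, "Person"), new Span(40, 52, "Person")));
        Set<Span> response = new HashSet<>(Arrays.asList(
            new Span(0, 11, "Person"), new Span(20, 30, "Person"),
            new Span(40, 50, "Person"), new Span(60, 65, "Person")));
        Score s = score(key, response);
        System.out.println("precision=" + s.precision + " recall=" + s.recall + " f1=" + s.f1);
    }
}
```

In the example, the system finds two of the three manual annotations and proposes two spurious ones, so recall is two thirds and precision one half. Note that the near miss at offsets 40 to 50 counts as an error under strict matching.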
1.4
Some Evaluations
This section contains an incomplete list of publications describing systems that used GATE in competitive quantitative evaluation programmes. These programmes have had a significant impact on the language processing field and the widespread presence of GATE is some measure of the maturity of the system and of our understanding of its likely performance on diverse text processing tasks.
[Li et al. 07d] describes the performance of an SVM-based learning system in the NTCIR-6 Patent Retrieval Task. The system achieved the best result on two of three measures used in the task evaluation, namely the R-Precision and F-measure. The system obtained close to the best result on the remaining measure (A-Precision).
[...] clustering. It uses GATE for information extraction and the SUMMA system to create summaries and semantic representations of documents. One system configuration ranked 4th in the Web People Search 2007 evaluation.
[...] components and the Arabic plugin available in GATE to produce summaries in English from a mixture of English and Arabic documents.
[...] of research into open-domain question answering. GATE has formed the basis of much of this research resulting in systems which have ranked highly during independent evaluations since 1999. The first successful question answering system developed at the University of Sheffield was evaluated as part of TREC 8 and used the LaSIE information extraction system (the forerunner of ANNIE) which was distributed with GATE [Humphreys et al. 99]. Further research was reported in [Scott & Gaizauskas 00], [Greenwood et al. 02], [Gaizauskas et al. 03], [Gaizauskas et al. 04] and [Gaizauskas et al. 05]. In 2004 the system was ranked 9th out of 28 participating groups.
[Saggion 04] describes techniques for answering definition questions. The system uses definition patterns manually implemented in GATE as well as learned JAPE patterns induced from a corpus. In 2004, the system was ranked 4th in the TREC/QA evaluations.
[...] implemented using summarization components compatible with GATE (the SUMMA system). The system was ranked 2nd in the Document Understanding Evaluation programmes.
[Maynard et al. 03e] and [Maynard et al. ...] [...] surprise language program. ANNIE was adapted to Cebuano with four person days of effort, and achieved an F-measure of 77.5%. Unfortunately, ours was the only system participating!
[Maynard et al. 02] and [Maynard et al. ...] [...] designed for the ACE task (Automatic Content Extraction). Although a comparison to other participating systems cannot be revealed due to the stipulations of ACE, results show 82%-86% precision and recall.
[Humphreys et al. 98] describes the LaSIE-II system used in MUC-7.
[Gaizauskas et al. 95] describes the LaSIE-II system used in MUC-6.
1.5
Recent Changes
This section details recent changes made to GATE. Appendix A provides a complete change log.
Support for reading a number of new document formats has also been added:
• PubMed and the Cochrane Library
• CoNLL IOB
• MediaWiki markup, both plain text and XML dump files such as those from Wikipedia (see Section 21.30).
In addition, ready-made applications have been added to many existing plugins (notably the Lang_* non-English language plugins) to make it easier to experiment with their PRs.
Library updates
• Updated the Stanford Parser plugin (see Section 17.4) to version 2.0.4 of the parser itself, and added run-time parameters to the PR to control the parser's dependency options.
• The Measurement and Number taggers have been upgraded to use JAPE+ instead of JAPE. This should result in faster processing, and also allows for more memory efficient duplication of PR instances, i.e. when a pool of applications is created.
• The OpenNLP plugin has been completely revised to use Apache OpenNLP 1.5.2 and the corresponding set of models. See Section 21.24 for details.
• The native launcher for GATE on Mac OS X now works with Oracle Java 7 as well as Apple Java 6.
• The PRs defined in the ANNIE plugin are now described by annotations on the Java classes rather than explicitly inside creole.xml. The main reason for this change is to enable the definitions to be inherited by any subclasses of these PRs. Creating an empty subclass is a common way of providing a PR with a different set of default parameters (this is used extensively in the language plugins to provide custom gazetteers and named entity transducers). This has the added benefit of ensuring that new features also automatically percolate down to these subclasses. If you have developed your own PR that extends one of the ANNIE ones you may find it has acquired new parameters that were not there previously; you may need to use the @HiddenCreoleParameter annotation to suppress them.
• The corpus parameter of LanguageAnalyser (an interface most, if not all, PRs implement) is now annotated as @Optional as most implementations do not actually require the parameter to be set.
• When saving an application the plugins are now saved in the same order in which they were originally loaded into GATE. This ensures that dependencies between plugins are correctly maintained when applications are restored.
• API support for working with relations between annotations was added. See Section 7.7 for more details.
• The method of populating a corpus from a single file has been updated to allow any mime type to be used when creating the new documents.
And numerous smaller bug fixes and performance improvements...
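Why annotation-based definitions are inherited by subclasses, and why a subclass may need a way to hide an inherited parameter, can be shown with a small reflection sketch in plain Java. Everything in it is an assumption made for illustration: Param and HiddenParam are hypothetical annotations (not GATE's own), the gazetteer classes and the "lists.def" default are invented, and the lookup logic is not how GATE reads its metadata.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: Param and HiddenParam are hypothetical annotations,
// not the ones GATE defines, and this is not how GATE reads its metadata.
final class AnnotatedParamsSketch {

    /** Marks a setter as a configurable parameter with a default value. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Param {
        String defaultValue() default "";
    }

    /** Placed on an overriding setter to suppress an inherited parameter. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface HiddenParam {
    }

    static class BaseGazetteer {
        @Param(defaultValue = "lists.def")
        public void setListsUrl(String url) { }

        @Param(defaultValue = "true")
        public void setCaseSensitive(String flag) { }
    }

    /** An empty subclass keeps every parameter declared by its parent. */
    static class CustomGazetteer extends BaseGazetteer { }

    /** A subclass that hides one inherited parameter. */
    static class FixedCaseGazetteer extends BaseGazetteer {
        @HiddenParam
        @Override
        public void setCaseSensitive(String flag) { }
    }

    /** Collects parameter setters, letting the most specific class decide. */
    static Set<String> visibleParams(Class<?> type) {
        Set<String> visible = new TreeSet<>();
        Set<String> hidden = new HashSet<>();
        // Walk from the given class up to Object so subclass choices are seen first.
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Method m : c.getDeclaredMethods()) {
                if (m.isAnnotationPresent(HiddenParam.class)) {
                    hidden.add(m.getName());
                } else if (m.isAnnotationPresent(Param.class)
                        && !hidden.contains(m.getName())) {
                    visible.add(m.getName());
                }
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        System.out.println(visibleParams(CustomGazetteer.class));    // [setCaseSensitive, setListsUrl]
        System.out.println(visibleParams(FixedCaseGazetteer.class)); // [setListsUrl]
    }
}
```

Because the metadata travels with the class rather than with a separate XML file, the empty subclass picks up both parameters for free, and a parameter added to the base class later would appear in every subclass unless explicitly hidden.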
1.6
Further Reading
Lots of documentation lives on the GATE web site, including:
• GATE online tutorials;
• the main system documentation tree;
• JavaDoc API documentation;
• HTML of the source code;
• comprehensive list of GATE plugins.
For more details about Sheffield University's work in human language processing see the NLP group pages or A Definition and Short History of Language Engineering ([Cunningham 99]). For more details about Information Extraction see IE, a User Guide or the GATE IE pages.
A list of publications on GATE and projects that use it (some of which are available on-line from http://gate.ac.uk/gate/doc/papers.html):
2010
[Bontcheva et al. 10] [...] environment, emphasising the different roles that users play in the corpus annotation process.
[Damljanovic 10] presents the use of GATE in the development of controlled natural language interfaces. There is other related work by Damljanovic, Agatonovic, and Cunningham on using GATE to build natural language interfaces for querying ontologies.
[Aswani & Gaizauskas 10] discusses the use of GATE to process South Asian languages (Hindi and Gujarati).
2009
[Saggion & Funk 09] focuses in detail on the use of GATE for mining opinions and facts for business intelligence gathering from web content.
[Aswani & Gaizauskas 09] presents in more detail the text alignment component of GATE.
[Bontcheva et al. 09] is the `Human Language Technologies' chapter of `Semantic Knowledge Management' (John Davies, Marko Grobelnik and Dunja Mladenić eds.)
[Damljanovic et al. 09] [...]
[...] communication research, focusing on the roles played by email in information management, and commercial and research efforts to integrate a semantic-based approach to email.
[Li et al. 09] investigates two techniques for making SVMs more suitable for language learning tasks. Firstly, an SVM with uneven margins (SVMUM) is proposed to deal with the problem of imbalanced training data. Secondly, SVM active learning is employed in order to alleviate the difficulty in obtaining labelled training data. The algorithms are presented and evaluated on several Information Extraction (IE) tasks.
2008
[Agatonovic et al. 08] presents our approach to automatic patent enrichment, tested in large-scale, parallel experiments on USPTO and EPO documents.
[Damljanovic et al. 08] [...]
[...] an open-source software engineering project with the goal of exploring methods for assisting open-source developers and software users to learn and maintain the system without major effort.
[Della Valle et al. 08] presents ServiceFinder.
[...] developed successfully to adapt SVM for the specific features of the F-term patent classification task.
[Li & Bontcheva 08] reviews the recent developments in applying geometric and quantum mechanics methods for information retrieval and natural language processing.
[Maynard 08] investigates the state of the art in automatic textual annotation tools, and examines the extent to which they are ready for use in the real world.
[Maynard et al. 08] discusses methods of measuring the performance of ontology-based information extraction systems, focusing particularly on the Balanced Distance Metric (BDM), a new metric we have proposed which aims to take into account the more flexible nature of ontologically-based applications.
[Maynard et al. 08] investigates NLP techniques for ontology population, using a combination of rule-based approaches and machine learning.
[Tablan et al. 08] [...] accessing structured information, that is domain independent and easy to use without training.
2007
[... et al. 07] describes an ontologically based approach to multi-source, multilingual information extraction.
[... 07] presents a controlled language for ontology editing and a software implementation, based partly on standard NLP tools, for processing that language and manipulating an ontology.
[Maynard et al. 07] [...] induced by changes to the ontologies, and (2) the evolution of the ontology induced by changes to the underlying metadata.
[Maynard et al. 07] [...] domain ontologies, which enables the extraction of relevant information to be fed into models for analysis of financial and operational risk and other business intelligence applications such as company intelligence, by means of the XBRL standard.
[...] 2007. Our cross-document coreference system uses an in-house agglomerative clustering implementation to group documents referring to the same entity.
[Saggion et al. 07] [...] the context of a practical e-business application for the EU MUSING Project where the goal is to gather international company intelligence and country/region information.
[Li et al. 07] introduces a hierarchical learning approach for IE, which uses the target ontology as an essential part of the extraction process, by taking into account the relations between concepts.
[Li et al. 07] [...] classification labels, which can be seen as the label relation sensitive version of important measures such as averaged precision and F-measure, and presents the results of applying the new evaluation measures to all submitted runs for the NTCIR-6 F-term patent classification task.
[Li et al. 07] describes the algorithms and linguistic features used in our participating system for the opinion analysis pilot task at NTCIR-6.
[Li et al. 07d] describes our SVM-based system and the techniques we used to adapt the approach for the specifics of the F-term patent classification subtask at NTCIR-6 Patent Retrieval Task.
[...] Kernel Canonical Correlation Analysis (KCCA), a method of correlating linear relationships between two variables in kernel defined feature spaces.
2006
[Aswani et al. 06] (Proceedings of the 5th International Semantic Web Conference (ISWC2006)) In this paper the problem of disambiguating author instances in ontology is addressed. We describe a web-based approach that uses various features such as publication titles, abstract, initials and co-authorship information.
[Bontcheva et al. 06] `Semantic Annotation and Human Language Technology', contribution to `Semantic Web Technology: Trends and Research' (Davies, Studer and Warren, eds.)
[Bontcheva et al. 06] `Semantic Information Access', contribution to `Semantic Web Technology: Trends and Research' (Davies, Studer and Warren, eds.)
[...] of information sources associated with software projects and 2) relies on techniques that are portable across application domains.
[Davis et al. 06] describes work in progress concerning the application of Controlled Language Information Extraction - CLIE to a Personal Semantic Wiki - Semper-Wiki, the goal being to permit users who have no specialist knowledge in ontology tools or languages to semi-automatically annotate their respective personal Wiki pages.
[...] language information retrieval. The algorithm is applied to Japanese-English cross-language information retrieval.
[Maynard et al. 06] discusses existing evaluation metrics, and proposes a new method for evaluating the ontology population task, which is general enough to be used in a variety of situations, yet more precise than many current metrics.
[Tablan et al. 06] [...] simply by using a restricted version of the English language. The controlled language described is based on an open vocabulary and a restricted set of grammatical constructs.
[Tablan et al. 06] describes the creation of linguistic analysis and corpus search tools for Sumerian, as part of the development of the ETCSL.
[Wang et al. 06] proposes an SVM based approach to hierarchical relation extraction, using features derived automatically from a number of GATE-based open-source language processing tools.
2005
[Aswani et al. 05] (Proceedings of Fifth International Conference on Recent Advances in Natural Language Processing (RANLP2005)) It is a full-featured annotation indexing and search engine, developed as a part of the GATE. It is powered with Apache Lucene technology and indexes a variety of documents supported by the GATE.
[Bontcheva 05] presents the ONTOSUM system which uses Natural Language Generation (NLG) techniques to produce textual summaries from Semantic Web ontologies.
[Cunningham 05] is an overview of the field of Information Extraction for the 2nd Edition of the Encyclopedia of Language and Linguistics.
[Cunningham & Bontcheva 05] is an overview of the field of Software Architecture for Language Engineering for the 2nd Edition of the Encyclopedia of Language and Linguistics.
[... et al. 05] (Euro Interactive Television Conference Paper) A system which can [...]
[... 05] (World Wide Web Conference Paper) The Web is used to assist the [...]
[... 05] (Second European Semantic Web Conference Paper) A system that semantically annotates television news broadcasts using news websites as a resource to aid in the annotation process.
[Li et al. 05] [...] based IE system which uses the SVM with uneven margins as learning component and the GATE as NLP processing module.
[Li et al. 05] [...] Learning (CoNLL-2005)) uses the uneven margins versions of two popular learning algorithms SVM and Perceptron for IE to deal with the imbalanced classification problems derived from IE.
[Li et al. 05] [...] (Sighan-05)) a system for Chinese word segmentation based on Perceptron learning, a simple, fast and effective learning algorithm.
[Polajnar et al. 05] [...] Friendly Ontology Authoring Using a Controlled Language.
[Saggion & Gaizauskas 05] describes experiments on content selection for producing biographical summaries from multiple documents.
[Ursu et al. 05] (Proceedings of the 2nd European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies (EWIMT 2005)) Digital Media Preservation and Access through Semantically Enhanced Web-Annotation.
[Wang et al. 05] (Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2005)) Extracting a Domain Ontology from Linguistic Resource Based on Relatedness Measurements.
2004
Bontcheva 04 (LREC 2004) describes lexical and ontological resources in GATE used for Natural Language Generation.
Bontcheva et al.
Cunningham & Scott 04 (JNLE) is a collection of papers covering many important areas of Software Architecture for Language Engineering.
Cunningham & Scott 04 (JNLE) is the introduction to the above collection.
Dimitrov et al.
coreference resolution.
Li et al. 04 (Machine Learning Workshop 2004) describes an SVM based learning algorithm for IE using GATE.
Maynard et al.
04 (LREC 2004) presents algorithms for the automatic induction of
04 (ESWS 2004) discusses ontology-based IE in the hTechSight project.
Maynard et al. 04 (AIMSA 2004) presents automatic creation and monitoring of semantic metadata in a dynamic knowledge portal.
Saggion & Gaizauskas 04 describes an approach to mining definitions.
Saggion & Gaizauskas 04 describes a sentence extraction system that produces two sorts of multi-document summaries; a general-purpose summary of a cluster of related documents and an entity-based summary of documents related to a particular person.
Wood et al.
2003
Bontcheva et al.
Cunningham et al. 03 (Corpus Linguistics 2003) describes GATE as a tool for collaborative corpus annotation.
Manov et al. 03 (HLT-NAACL 2003) describes experiments with geographic knowledge for IE.
Maynard et al.
03 (EACL 2003) looks at the distinction between information and content extraction.
03 (Recent Advances in Natural Language Processing 2003) looks at
et al. 03e (ACL Workshop 2003) describes NE extraction without training data on a language you don't speak (!).
tion.
et al. 03 (Data and Knowledge Engineering) discusses multimedia indexing and search from multisource multilingual data.
Saggion et al. 03 (EACL 2003) discusses event co-reference in the MUMIS project.
Tablan et al. 03 (HLT-NAACL 2003) presents the OLLIE on-line learning for IE system.
Wood et al. 03 (Recent Advances in Natural Language Processing 2003) discusses using parallel texts to improve IE recall.
2002
Baker et al. 02 (LREC 2002) report results from the EMILLE Indic languages corpus
Bontcheva et al. 02 (ACL 2002 Workshop) describes how GATE can be used as an environment for teaching NLP, with examples of and ideas for future student projects developed within GATE.
Bontcheva et al. 02 (NLIS 2002) discusses how GATE can be used to create HLT modules for use in information systems.
Bontcheva et al. 02, Dimitrov 02 and Dimitrov 02 (TALN 2002, DAARC 2002, MSc thesis) describe the shallow named entity coreference modules in GATE: the orthomatcher which resolves pronominal coreference, and the pronoun resolution module.
Cunningham et al. 02 (ACL 2002) describes the GATE framework and graphical development environment as a tool for robust NLP applications.
Lal 02 (Master Thesis) looks at text summarisation using GATE.
Lal & Ruger 02 (ACL 2002) looks at text summarisation using GATE.
Maynard et al. 02 (ACL 2002 Summarisation Workshop) describes using GATE to build a portable IE-based summarisation system in the domain of health and safety.
Maynard et al. 02 (AIMSA 2002) describes the adaptation of the core ANNIE modules within GATE to the ACE (Automatic Content Extraction) tasks.
Maynard et al. 02d (Nordic Language Technology) describes various Named Entity recognition projects developed at Sheffield using GATE.
Maynard et al.
presents GATE as an example of a system which contributes to robustness and to low overhead systems development.
Pastra et al.
Saggion et al. 02 and Saggion et al. 02 (LREC 2002, SPLPT 2002) describes how ANNIE modules have been adapted to extract information for indexing multimedia material.
Tablan et al.
Bontcheva
describe a prototype of GATE version 2 that integrated with the EUDICO multimedia markup tool from the Max Planck Institute.
Cunningham 00 (PhD thesis) defines the field of Software Architecture for Language Engineering, reviews previous work in the area, presents a requirements analysis for such systems (which was used as the basis for designing GATE versions 2 and 3), and evaluates the strengths and weaknesses of GATE version 1.
Cunningham et al. 00, Cunningham et al. 98 and Peters et al. 98 (OntoLex 2000, LREC 1998) presents GATE's model of Language Resources, their access and distribution.
Gambäck & Olsson 00 (LREC 2000) discusses experiences in the Svensk project, which used GATE version 1 to develop a reusable toolbox of Swedish language processing components.
Maynard et al.
McEnery et al. 00 (Vivek) presents the EMILLE project in the context of which GATE's Unicode support for Indic languages has been developed.
98 and Cunningham et al.
et al. 97 (ANLP 1997) presents motivation for GATE and GATE-like infrastructural systems for Language Engineering.
et al. 96 (manual) was the guide to developing CREOLE components for GATE version 1.
96 (TIPSTER) discusses a selection of projects in Sheffield using
Cunningham et al. 96, Cunningham et al. 96d, Cunningham et al. 95 (COLING 1996, AISB Workshop 1996, technical report) report early work on GATE version 1.
97, Cunningham et al. 96e (ICTAI
Humphreys
software engineering issues such as reuse, and framework construction, are important for language processing R&D.
2.2
GATE will run anywhere that supports Java 6 or later, including Solaris, Linux, Mac OS X and Windows platforms. We don't run tests on other platforms, but have had reports of successful installs elsewhere.
Prerequisites:
• A conforming Java 2 environment,
  - version 1.4.2 or above for GATE 3.1
  - version 5.0 for GATE 4.0 beta 1 or later.
  - version 6.0 for GATE 6.1 or later.
  available free from Oracle or from your UNIX supplier. (We test on various Sun JDKs on Solaris, Linux and Windows XP.)
• Binaries from the GATE distribution you downloaded: gate.jar (which can be found in the directory called bin). You will also need the lib directory, containing various libraries that GATE depends on.
• A suitable Apache ANT installation (version 1.8.1 or newer). You will need to add an environment variable named ANT_HOME pointing to your ANT installation, and add ANT_HOME/bin to your PATH.
• An open mind and a sense of humour.
Using the binary distribution:
• Unpack the distribution, creating a directory containing jar files and scripts.
• To run GATE Developer: on Windows, start a Command Prompt window, change to the directory where you unpacked the GATE distribution and run 'ant.bat run'; on UNIX/Linux or Mac open a terminal window and run 'ant run'. Alternatively, you can use the bin/gate.sh script on UNIX/Linux systems (see section 2.2.4), or bin/gate.bat on Windows.
• To embed GATE as a library (GATE Embedded), put gate.jar and all the libraries in the lib directory in your CLASSPATH.
The Ant scripts that start GATE Developer (ant.bat or ant) require you to set the JAVA_HOME environment variable to point to the top level directory of your JAVA installation. The value of GATE_CONFIG is passed to the system by the scripts using either a -i command-line option, or the Java property gate.site.config.
-h show usage information
-ld create or use the files .gate.session and .gate.xml in the current directory as the session and configuration files.
-ln name create or use name.session and name.xml in the current directory as the session and configuration files.
-ll if the current directory contains a file named log4j.properties then use it instead of the default (GATE_HOME/bin/log4j.properties) to configure logging. Alternately, you can specify any log4j configuration file by setting the log4j.configuration property explicitly (see below).
-rh location set the resources home directory to the location provided. If a resources home location is provided, the URLs in a saved application are saved relative to this location instead of relative to the application state file (see section 3.9.3). This is equivalent to setting the property gate.user.resourceshome to this location.
-d URL loads the CREOLE plugin at the given URL during the start-up process.
-i file uses the specified file as the site configuration.
are passed on to the java command. This can be used to e.g. set properties using the java option -D. For example to set the maximum amount of
Running GATE Developer with either the -ld or the -ln option from different directories is useful to keep several projects separate and can be used to run multiple instances of GATE Developer (or even different versions of GATE Developer) in succession or even simultaneously without the configuration files getting mixed up between them.
2.3
During initialisation, GATE reads several Java system properties in order to decide where to find its configuration files. Here is a list of the properties used, their default values and their meanings:
gate.home sets the location of the GATE install directory. This should point to the top level directory of your GATE installation. This is the only property that is required. If this is not set, the system will display an error message and then it will attempt to guess the correct value.
gate.plugins.home points to the location of the directory containing installed plugins (a.k.a. CREOLE directories). If this is not set then the default value of {gate.home}/plugins is used.
gate.site.config points to the location of the configuration file containing the site-wide options. If not set this will default to {gate.home}/gate.xml. The site configuration file must exist!
gate.user.config points to the file containing the user's options. If not specified, or if the specified file does not exist at startup time, the default value of gate.xml (.gate.xml on Unix platforms) in the user's home directory is used.
gate.user.session points to the file containing the user's saved session. If not specified, the default value of gate.session (.gate.session on Unix) in the user's home directory is used. When starting up GATE Developer, the session is reloaded from this file if it exists, and when exiting GATE Developer the session is saved to this file (unless the user has disabled 'save session on exit' in the configuration dialog). The session is not used when using GATE Embedded.
of GATE Developer to the specified directory instead of the user's operating-system specific default directory.
listed here will be loaded as CREOLE plugins during initialisation. This has similar functionality to the -d command line option.
directory. This is the location of the creole.xml file that defines the fundamental GATE resource types, such as documents, document format handlers, controllers and the basic visual resources that make up GATE. The default points to a location inside gate.jar and should not generally need to be overridden.
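The fallback described for gate.user.config (use the configured file if it is set and exists, otherwise the default file in the user's home directory) can be sketched in plain Java. This is only an illustration under stated assumptions, not GATE's own code: the class UserConfigLocator and its locate method are hypothetical names, and the platform check is reduced to a boolean argument.

```java
import java.io.File;

/** Hypothetical helper: illustrates the documented fallback, not GATE's implementation. */
final class UserConfigLocator {

    /**
     * Returns the file named by the property value when it is set and exists;
     * otherwise gate.xml (.gate.xml on Unix-like platforms) in the home directory.
     */
    static File locate(String propertyValue, File homeDir, boolean unixLike) {
        if (propertyValue != null) {
            File configured = new File(propertyValue);
            if (configured.exists()) {
                return configured;
            }
        }
        String defaultName = unixLike ? ".gate.xml" : "gate.xml";
        return new File(homeDir, defaultName);
    }

    public static void main(String[] args) {
        File home = new File(System.getProperty("user.home"));
        // Assumption: treat any platform using '/' as its separator as Unix-like.
        boolean unixLike = File.separatorChar == '/';
        System.out.println(locate(System.getProperty("gate.user.config"), home, unixLike));
    }
}
```

Running the class with -Dgate.user.config=... pointing at a missing file prints the default location, which mirrors the behaviour described in the list.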
When using GATE Embedded, you can set the values for these properties before you call Gate.init(). Alternatively, you can set the values programmatically using the static methods setGateHome(), setPluginsHome(), setSiteConfigFile(), etc. before calling Gate.init(). See the Javadoc documentation for details. If you want to set these values from the command line you can use the following syntax for setting gate.home for example:
java -Dgate.home=/my/new/gate/home/directory -cp... gate.Main
When running GATE Developer, you can set the properties by creating a file build.properties in the top level GATE directory. In this file, any system properties which are prefixed with 'run.' will be passed to GATE. For example, to set an alternative user config file, put the following line in build.properties¹:
run.gate.user.config=${user.home}/alternative-gate.xml
This facility is not limited to the GATE-specific properties listed above, for example the following line changes the default temporary directory for GATE (note the use of forward slashes, even on Windows platforms):
run.java.io.tmpdir=d:/bigtmp
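The effect of the 'run.' prefix can be illustrated with a few lines of Java. This is a sketch of the idea only: in a real installation the forwarding is done by the Ant build, and the class name RunPrefix and method forwarded are hypothetical. The sketch does not expand ${...} references such as ${user.home}.

```java
import java.util.Properties;

/** Hypothetical illustration of how 'run.'-prefixed entries map to system properties. */
final class RunPrefix {

    /** Returns the entries whose key starts with "run.", with that prefix stripped. */
    static Properties forwarded(Properties buildProperties) {
        Properties result = new Properties();
        for (String key : buildProperties.stringPropertyNames()) {
            if (key.startsWith("run.")) {
                result.setProperty(key.substring("run.".length()),
                                   buildProperties.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties build = new Properties();
        build.setProperty("run.java.io.tmpdir", "d:/bigtmp");
        build.setProperty("unrelated.setting", "ignored");
        // Only java.io.tmpdir is kept, without its "run." prefix.
        forwarded(build).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```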
When running GATE Developer from the command line via ant or via the gate.sh script you can set properties using -D. Note that the run prefix is required when using ant:
ant run -Drun.gate.user.config=/my/path/to/user/config.file
but not when using gate.sh:
./bin/gate.sh -Dgate.user.config=/my/path/to/user/config.file
1 In this specific case, the alternative config file must already exist when GATE starts up, so you should copy your standard gate.xml
The GATE Developer launcher also supports the system property gate.class.path to specify additional classpath entries that should be added to the classloader that is used to load GATE classes. This is expected to be in the normal classpath format, i.e. a list of directory or jar file paths separated by semicolons on Windows and colons on other platforms. The standard Java 6 shorthand of /path/to/directory/*² to include all .jar files from a given directory is also supported. As an alternative to this system property, the environment variable GATE_CLASSPATH can be used, but the environment variable is only read if the system property is not set.
./bin/gate.sh -Dgate.class.path=/shared/lib/myclasses.jar
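To make the classpath format concrete, the following sketch splits such a value on a separator and expands a trailing /* into the .jar files of that directory. It is a simplified illustration, not the launcher's actual code: ClassPathEntries and parse are hypothetical names, and only lower-case .jar names are matched.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of parsing a gate.class.path style value. */
final class ClassPathEntries {

    /** Splits the value on the separator; an entry ending in "/*" becomes the jars in that directory. */
    static List<File> parse(String classPath, String separator) {
        List<File> entries = new ArrayList<>();
        for (String item : classPath.split(Pattern.quote(separator))) {
            if (item.isEmpty()) {
                continue;
            }
            if (item.endsWith("/*")) {
                File dir = new File(item.substring(0, item.length() - 2));
                File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
                if (jars != null) {          // null when the directory does not exist
                    Arrays.sort(jars);       // deterministic order
                    entries.addAll(Arrays.asList(jars));
                }
            } else {
                entries.add(new File(item));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // File.pathSeparator is ";" on Windows and ":" elsewhere.
        String value = System.getProperty("gate.class.path", "");
        System.out.println(parse(value, File.pathSeparator));
    }
}
```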
2.4
Configuring GATE
When GATE Developer is started, or when Gate.init() is called from GATE Embedded, GATE loads various sorts of configuration data stored as XML in files generally called something like gate.xml or .gate.xml. This data holds information such as:
• whether to save settings on exit;
• whether to save session on exit;
• what fonts GATE Developer should use;
• plugins to load at start;
• colours of the annotations;
• locations of files for the file chooser;
• and a lot of other GUI related options.
This type of data is stored at two levels (in order from general to specific):
• the site-wide level, which by default is located in the gate.xml file in the top level directory of the GATE installation (i.e. the GATE home). This location can be overridden by the Java system property gate.site.config;
• the user level, which lives in the user's HOME directory on UNIX or their profile directory on Windows (note that parts of this file are overwritten when saving user settings). The default location for this file can be overridden by the Java system property gate.user.config.
Where configuration data appears on several different levels, the more specific ones overwrite the more general. This means that you can set defaults for all GATE users on your system, for example, and allow individual users to override those defaults without interfering with others.
2 Remember to protect the * from expansion by your shell if necessary.
Configuration data can be set from the GATE Developer GUI via the 'Options' menu then 'Configuration'. The user can change the appearance of the GUI in the 'Appearance' tab, which includes the options of font and the 'look and feel'. The 'Advanced' tab enables the user to include annotation features when saving the document and preserving its format, to save the selected Options automatically on exit, and to save the session automatically on exit. The 'Input Methods' submenu from the 'Options' menu enables the user to change the default language for input. These options are all stored in the user's .gate.xml file.
When using GATE Embedded, you can also set the site config location using Gate.setSiteConfigFile(File) prior to calling Gate.init().
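The rule that user-level settings override site-wide ones can be pictured as a simple map merge. The sketch below is an assumption-laden illustration rather than GATE's configuration loader: the class ConfigLevels and the option names used in main are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: the more specific (user) level overrides the more general (site) level. */
final class ConfigLevels {

    /** Starts from the site-wide options and lets user options replace any matching keys. */
    static Map<String, String> effective(Map<String, String> site, Map<String, String> user) {
        Map<String, String> merged = new LinkedHashMap<>(site);
        merged.putAll(user);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> site = new LinkedHashMap<>();
        site.put("saveSessionOnExit", "true");   // made-up option names
        site.put("font", "SansSerif");
        Map<String, String> user = new LinkedHashMap<>();
        user.put("font", "Monospaced");
        // The user's font wins; the site default for the session option is kept.
        System.out.println(effective(site, user));
    }
}
```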
2.5
Building GATE
Note that you don't need to build GATE unless you're doing development on the system itself.
Prerequisites:
• A conforming Java environment as above.
• A copy of the GATE sources and the build scripts: either the SRC distribution package from the nightly snapshots or a copy of the code obtained through Subversion (see Section 2.2.3).
• A working installation of Apache ANT version 1.8.1 or newer. You will need to add an environment variable named ANT_HOME pointing to your ANT installation, and add ANT_HOME/bin to your PATH. It is advisable that you also set your JAVA_HOME environment variable to point to the top-level directory of your Java installation.
• An appreciation of natural beauty.
ant
ant test
ant doc
ant run
(The details of the build process are all specified by the build.xml file in the gate directory.) You can also use a development environment like Eclipse (the required .project file and other metadata are included with the sources), but note that it's still advisable to use ant to generate documentation, the jar file and so on. Also note that the run configurations have the location of a gate.xml site configuration file hard-coded into them, so you may need to change these for your site.
In addition you will require the matching versions of any GATE plugins you wish to use in your application. These are not managed by Maven or Ivy, but can be obtained from the standard GATE release download or downloaded using the GATE Developer plugin manager as appropriate. Nightly snapshot builds of gate-core are available from our own Maven repository at http://repo.gate.ac.uk/content/groups/public.
2.6
Uninstalling GATE
or just delete the whole of the installation directory (the one containing bin, lib, Uninstaller, etc.). The installer doesn't install anything outside this directory, but for completeness you might also want to delete the settings files GATE creates in your home directory (.gate.xml and .gate.session).
2.7
Troubleshooting
See the FAQ on the GATE Wiki for frequent questions about running and using GATE.
annotation types such as 'Name' or 'Date', annotation sets comprising groups of annotations, processing resources that manipulate and create annotations on documents, and applications, comprising sequences of processing resources, that can be applied to a document or corpus.
What is considered to be the end result of the process varies depending on the task, but for the purposes of this chapter, output takes the form of the annotated document/corpus. Researchers might be more interested in figures demonstrating how successfully their application compares to a 'gold standard' annotation set; Chapter 10 in Part II will cover ways of comparing annotation sets to each other and obtaining measures such as F1. Implementers might be more interested in using the annotations programmatically; Chapter 7, also in Part II, talks about working with annotations from GATE Embedded. For the purposes of this chapter, however, we will focus only on creating the annotated documents themselves, and creating GATE applications for future use.
GATE includes a complete information extraction system that you are free to use, called ANNIE (a Nearly-New Information Extraction System). Many users find this is a good starting point for their own application, and so we will cover it in this chapter. Chapter 6 talks in a lot more detail about the inner workings of ANNIE, but we aim to get you started using ANNIE from inside of GATE Developer in this chapter.
We start the chapter with an exploration of the GATE Developer GUI, in Section 3.1. We describe how to create documents (Section 3.2) and corpora (Section 3.3). We talk about viewing and manually creating annotations (Section 3.4). We then talk about loading the plugins that contain the processing resources you will use to construct your application, in Section 3.5. We then talk about instantiating processing resources (Section 3.7). Section 3.8 covers applications, including using ANNIE (Section 3.8.3). Saving applications and language resources (documents and corpora) is covered in Section 3.9. We conclude with a few assorted topics that might be useful to the GATE Developer user, in Section 3.11.
3.1
Figure 3.1 shows the main window of GATE Developer, as you will see it when you first run it. There are five main areas:
1. at the top, the menus bar and tools bar with menus 'File', 'Options', 'Tools', 'Help' and icons for the most frequently used actions;
2. on the left side, a tree starting from 'GATE' and containing 'Applications', 'Language Resources' etc.: this is the resources tree;
3. in the bottom left corner, a rectangle, which is the small resource viewer;
4. in the center, containing tabs with 'Messages' or the name of a resource from the resources tree, the main resource viewer;
5. at the bottom, the messages bar.
The menu and the messages bar do the usual things. Longer messages are displayed in the messages tab in the main resource viewer area. The resource tree and resource viewer areas work together to allow the system to display diverse resources in various ways. The many resources integrated with GATE can have either a small view, a large view, or both. At any time, the main viewer can also be used to display other information, such as messages, by clicking on the appropriate tab at the top of the main window. If an error occurs in processing, the messages tab will flash red, and an additional popup error message may also occur.
In the options dialogue from the Options menu you can choose if you want to link the selection in the resources tree and the selected main view.
3.2
Figure 3.2: Making a New Document
If you right-click on 'Language Resources' in the resources pane, select 'New' then 'GATE Document', the window 'Parameters for the new GATE Document' will appear as shown in figure 3.2. Here, you can specify the GATE document to be created. Required parameters are indicated with a tick. The name of the document will be created for you if you do not specify it. Enter the URL of your document or use the file browser to indicate the file you wish to use for your document source. For example, you might use 'http://gate.ac.uk', or browse to a text or XML file you have on disk. Click on 'OK' and a GATE document will be created from the source you specified. See also the movie for creating documents.
The document editor is contained in the central tabbed pane in GATE Developer. Double-click on your document in the resources pane to view the document editor. The document editor consists of a top panel with buttons and icons that control the display of different views and the search box. Initially, you will see just the text of your document, as shown in figure 3.3. Click on 'Annotation Sets' and 'Annotations List' to view the annotation sets to the right and the annotations list at the bottom. You will see a view similar to figure 3.4. In place of the annotations list, you can also choose to see the annotations stack. In place of the annotation sets, you can also choose to view the co-reference editor. More information about this functionality is given in Section 3.4.
Several options can be set from the small triangle icon at the top right corner. With 'Save Current Layout' you store the way the different views are shown and the annotation types highlighted in the document. Then if you set 'Restore Layout Automatically' you will get the same views and annotation types each time you open a document. The layout is saved to the user preferences file, gate.xml. This means that you can give this file to a new user so that s/he will have a preconfigured document editor.
Another setting makes the document editor 'Read-only'. If enabled, you won't be able to edit the text but you will still be able to edit annotations. This is useful to avoid involuntarily modifying the original text.
The option 'Right To Left Orientation' is useful for changing the orientation of the text for languages such as Arabic and Urdu. Selecting this option changes the orientation of the text of the currently visible document.
Finally you can choose between 'Insert Append' and 'Insert Prepend'. That setting is only relevant when you're inserting text at the very border of an annotation. If you place the cursor at the start of an annotation, in one case the newly entered text will become part of the annotation, in the other case it will stay outside. If you place the cursor at the end of an annotation, the opposite will happen.
Let us use this sentence: 'This is an [annotation].' with the square brackets denoting the boundaries of the annotation. If we insert an 'x' just before the 'a' or just after the 'n' of 'annotation', here's what we get:
Append
• This is an x[annotation].
• This is an [annotationx].
Prepend
• This is an [xannotation].
• This is an [annotation]x.
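The border behaviour can be expressed as a small offset calculation. The sketch below assumes the reading given in the example: with Append the typed text attaches to whatever precedes the cursor, with Prepend to whatever follows it. It is an illustration only; BorderInsert and shift are hypothetical names and the code is not taken from the GATE document editor.

```java
/** Hypothetical sketch of how an annotation's offsets move when text is typed at its border. */
final class BorderInsert {

    enum Mode { APPEND, PREPEND }

    /**
     * Returns {newStart, newEnd} for an annotation covering [start, end)
     * after len characters are inserted at the given offset.
     */
    static int[] shift(int start, int end, int offset, int len, Mode mode) {
        if (offset < start || (offset == start && mode == Mode.APPEND)) {
            // The text lands before the annotation, so both boundaries move right.
            return new int[] { start + len, end + len };
        }
        if (offset < end || (offset == end && mode == Mode.APPEND)) {
            // The text lands inside the annotation (or is appended to it): only the end moves.
            return new int[] { start, end + len };
        }
        // The text lands after the annotation, which is left unchanged.
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        // In "This is an annotation." the annotation covers offsets 11 to 21.
        for (Mode mode : Mode.values()) {
            int[] atStart = shift(11, 21, 11, 1, mode);
            int[] atEnd = shift(11, 21, 21, 1, mode);
            System.out.println(mode + ": insert at start -> [" + atStart[0] + "," + atStart[1]
                    + "), insert at end -> [" + atEnd[0] + "," + atEnd[1] + ")");
        }
    }
}
```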
Figure 3.4: The Document Editor with Annotation Sets and Annotations List
Text in a loaded document can be edited in the document viewer. The usual platform specific cut, copy and paste keyboard shortcuts should also work, depending on your operating system (e.g. CTRL-C, CTRL-V for Windows). The last icon, a magnifying glass, at the top of the document editor is for searching in the document. To prevent the new annotation windows popping up when a piece of text is selected, hold down the CTRL key. Alternatively, you can hide the annotation sets view by clicking on its button at the top of the document view; this will also cause the highlighted portions of the text to become un-highlighted.
See also Section 19.2.3 for the compound document editor.
3.3
You can create a new corpus in a similar manner to creating a new document; simply right-click on 'Language Resources' in the resources pane, select 'New' then 'GATE corpus'. A brief dialogue box will appear in which you can optionally give a name for your corpus (if you leave this blank, a corpus name will be created for you) and optionally add documents to the corpus from those already loaded into GATE.
There are three ways of adding documents to a corpus:
1. When creating the corpus, clicking on the icon next to the "documentsList" input field brings up a popup window with a list of the documents already loaded into GATE Developer. This enables the user to add any documents to the corpus.
2. Alternatively, the corpus can be loaded first, and documents added later by double clicking on the corpus and using the + and - icons to add or remove documents to the corpus. Note that the documents must have been loaded into GATE Developer before they can be added to the corpus.
3. Once loaded, the corpus can be populated by right clicking on the corpus and selecting 'Populate'. With this method, documents do not have to have been previously loaded into GATE Developer, as they will be loaded during the population process.
If you right-click on your corpus in the resources pane, you will see that you have the option to 'Populate' the corpus. If you select this option, you will see a dialogue box in which you can specify a directory in which GATE will search for documents. You can specify the extensions allowable; for example, XML or TXT. This will restrict the corpus population to only those documents with the extensions you wish to load. You can choose whether to recurse through the directories contained within the target directory or restrict the population to those documents contained in the top level directory. Click on 'OK' to populate your corpus. This option provides a quick way to create a GATE Corpus from a directory of documents.
Additionally, right-clicking on a loaded document in the tree and selecting the 'New corpus with this document' option creates a new transient corpus named Corpus for document name containing just this document.
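The file selection performed by 'Populate' (a directory, a set of allowed extensions and an optional recursion into sub-directories) can be approximated with the standard Java file API. This sketch only shows the selection step and makes no claim about how GATE implements it; PopulateSketch and its methods are hypothetical names, and extensions are compared case-insensitively as an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of choosing the files that a corpus population would load. */
final class PopulateSketch {

    /** Lists regular files under dir whose extension is allowed, optionally recursing. */
    static List<Path> select(Path dir, Set<String> extensions, boolean recurse) throws IOException {
        int depth = recurse ? Integer.MAX_VALUE : 1;   // depth 1 = top level directory only
        try (Stream<Path> paths = Files.walk(dir, depth)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> extensions.contains(extensionOf(p)))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** Returns the lower-cased text after the last dot, or "" when there is no dot. */
    static String extensionOf(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

Each selected path would then be turned into a document and added to the corpus; that step depends on the GATE API and is left out here.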
Figure 3.5: Corpus Editor
Double click on your corpus in the resources pane to see the corpus editor, shown in figure 3.5. You will see a list of the documents contained within the corpus. In the top left of the corpus editor, plus and minus buttons allow you to add documents to the corpus from those already loaded into GATE and remove documents from the corpus (note that removing a document from a corpus does not remove it from GATE). Up and down arrows at the top of the view allow you to reorder the documents in the corpus. The rightmost button in the view opens the currently selected document in a document editor.
At the bottom, you will see that tabs entitled 'Initialisation Parameters' and 'Corpus Quality Assurance' are also available in addition to the corpus editor tab you are currently looking at. Clicking on the 'Initialisation Parameters' tab allows you to view the initialisation parameters for the corpus. The 'Corpus Quality Assurance' tab allows you to calculate agreement measures between the annotations in your corpus. Agreement measures are discussed in depth in Chapter 10. The use of corpus quality assurance is discussed in Section 10.3.
3.4
In this section, we will talk in more detail about viewing annotations, as well as creating and editing them manually. As discussed at the start of the chapter, the main purpose of GATE is annotating documents. Whilst applications can be used to annotate the documents entirely automatically, annotation can also be done manually, e.g. by the user, or semi-automatically, by running an application over the corpus and then correcting/adding new annotations manually. Section 3.4.5 focuses on manual annotation. In Section 3.7 we talk about running processing resources on our documents. We begin by outlining the functionality around viewing annotations, organised by the GUI area to which the functionality pertains.
list of the annotations associated with it, from which one can select an annotation to view in the annotation editor, or if there is only one, the annotation editor for that annotation. Figure 3.6 shows the annotation editor.
only one feature to display by double-clicking on the annotation type in the first column. Right-click on an annotation in the annotations stack view to edit it. Control-Shift-click to delete it. Double-click to copy it to another annotation set. Control-click on a feature value that contains a URL to display it in your browser. All of these mouse shortcuts make it easier to create a gold standard annotation set.
Figure 3.8: Co-reference editor inside a document editor. The popup window in the document under the word 'EPSRC' is used to add highlighted annotations to a co-reference chain. Here the annotation type 'Organization' of the annotation set 'Default' is highlighted and also the co-references 'EC' and 'GATE'.
A combo box near the top of the co-reference editor allows the user to select an annotation type from the current set. When the Show button is selected all the annotations of the selected type will be highlighted. Now when the mouse pointer is placed over one of those annotations, a pop-up box will appear giving the user the option of adding the annotation to a co-reference chain. The annotation can be added to an existing chain by typing the name of the chain (as shown in the list on the right) in the pop-up box. Alternatively, if the user presses the down cursor key, a list of all the existing annotations appears, together with the option [New Chain]. Selecting the [New Chain] option will cause a new chain to be created containing the selected annotation as its only element.
Each annotation can only be added to a single chain, but annotations of different types can be added to the same chain, and the same text can appear in more than one chain if it is referenced by two or more annotations.
The movie for inspecting results is also useful for learning about viewing annotations.
The type of the annotation, by default, will be the same as the last annotation you created, unless there is none, in which case it will be 'New'. You can enter any annotation type name you wish in the text box, unless you are using schema-driven annotation (see Section 3.4.6). You can add or change features and their values in the table below.
To delete an annotation, click on the red X icon at the top of the popup window. To grow/shrink the span of the annotation at its start use the two arrow icons on the left, or the right and left keys. Use the two arrow icons on the right to change the annotation end, or the alt+right and alt+left keys. Add the shift and control+shift keys to make the span increment bigger. The red X icon is for removing the annotation. The pin icon is to pin the window so that it remains where it is. If you drag and drop the window, this automatically pins it too. Pinning it means that even if you select another annotation (by hovering over it in the main resource viewer) it will still stay in the same position.
The popup menu only contains annotation types present in the Annotation Schema and those already listed in the relevant Annotation Set. To create a new Annotation Schema, see Section 3.4.6. The popup menu can be edited to add a new annotation type, however.
The new annotation created will automatically be placed in the annotation set that has been selected (highlighted) by the user. To create a new annotation set, type the name of the new set to be created in the box below the list of annotation sets, and click on 'New'.
Figure 3.10 demonstrates adding an 'Organization' annotation for the string 'EPSRC' (highlighted in green) to the default annotation set (blank name in the annotation set view on the right) and a feature name 'type' with a value about to be added.
To add a second annotation to a selected piece of text, or to add an overlapping annotation to an existing one, press the CTRL key to avoid the existing annotation popup appearing, and then select the text and create the new annotation. Again by default the last annotation type to have been used will be displayed; change this to the new annotation type. When a piece of text has more than one annotation associated with it, on mouseover all the annotations will be displayed. Selecting one of them will bring up the relevant annotation popup.
To search and annotate the document automatically, use the search and annotate function as shown in figure 3.11:
• Create and/or select an annotation to be used as a model to annotate.
• Open the panel at the bottom of the annotation editor window.
• Change the expression to search if necessary.
• Use the [First] button or Enter key to select the first expression to annotate.
• Use the [Annotate] button if the selection is correct, otherwise the [Next] button. After a few cycles of [Annotate] and [Next], use the [Ann. all next] button.
Note that after using the [First] button you can move the caret in the document and use the [Next] button to avoid continuing the search from the beginning of the document. The [?] button at the end of the search text field will help you to build powerful regular expressions to search.
schema are available to edit. Where a feature is declared as having an enumerated type the available enumeration values are presented as an array of buttons, making it easy to select the required value quickly.
Finally open the XML file in your browser and print it. Note that overlapping annotations cannot be expressed correctly with inline XML tags and thus won't be displayed correctly.
3.5
sn qeiD proessing resoures re used to utomtilly rete nd mnipulte nnottions on doumentsF e will tlk out proessing resoures in the next setionF roweverD we must (rst introdue giyvi pluginsF sn most sesD in order to use prtiulr proessing resoure @nd ertin lnguge resouresA you must (rst lod the giyvi plugin tht ontins itF his setion tlks out using giyvi pluginsF henD in etion QFUD we will tlk out reting nd using proessing resouresF he de(nitions of giyvi resoures @eFgF proessing resoures suh s tggers nd prsersD see ghpter RA re stored in giyvi diretories @diretories ontining n wv (le deE sriing the resouresD the tv rhive with the ompiled exeutle ode nd whtever lirries re required y the resouresAF lugins n hve one or more of the following sttes in reltion with qeiX
known plugins are those plugins that the system knows about. These include all the plugins in the plugins directory of the GATE installation and those installed in the user's personal plugins folder.

loaded plugins are the plugins currently loaded in the system. All CREOLE resource types from the loaded plugins are available for use. All known plugins can easily be loaded and unloaded using the user interface.

auto-loadable plugins are the list of plugins that the system loads automatically during initialisation, which can be configured via the load.plugin.path system property.

core plugins are the plugins distributed with GATE, found in the plugins directory of the installation, although the default location can be modified using the gate.plugins.home system property.

user plugins are plugins that have been installed by the user into their personal plugins folder. The location of this folder can be set either through the configuration tab of the CREOLE manager interface or via the gate.user.plugins system property.

local plugins are those plugins located on disk but which aren't in either the core plugins or user plugin folder.

remote plugins are plugins which are loaded via http from a remote machine.
The CREOLE plugins can be managed through the graphical user interface, which can be activated by selecting 'Manage CREOLE Plugins' from the 'File' menu. This will bring up a window listing all the known plugins. For each plugin there are two check-boxes: one labelled 'Load Now', which will load the plugin, and the other labelled 'Load Always', which will add the plugin to the list of auto-loadable plugins. A 'Delete' button is also provided, which will remove the plugin from the list of known plugins. This operation does not delete the actual plugin directory. Installed plugins are found automatically when GATE is started; if an installed plugin is deleted from the list, it will re-appear next time GATE is launched.

If you select a plugin, you will see in the pane on the right the list of resources that plugin contains. For example, in figure 3.12, the 'ANNIE' plugin is selected, and you can see that it contains 17 resources. If you wish to use a particular resource you will have to ascertain which plugin contains it. This list can be useful for that. Alternatively, the GATE website provides a directory of plugins and their processing resources.

Having loaded the plugins you need, the resources they define will be available for use. Typically, to the GATE Developer user, this means that they will appear on the 'New' menu when you right-click on 'Processing Resources' in the resources pane, although some special plugins have different effects; for example, the Schema Annotation Editor (see Section 3.4.6).
3.6
While GATE is distributed with a number of core plugins (see Part III), there are many more plugins developed and made available by other GATE users. Some of these additional plugins can easily be installed into your local copy of GATE through the CREOLE plugin manager.

Plugin developers can offer their plugins by maintaining a plugin repository. The address of a plugin repository can then be added to your GATE installation through the configuration tab of the plugin manager. For example, in the following screenshot you can see that two plugin repositories have been added, although only one is currently enabled. References to a number of plugin repositories are provided within the GATE distribution, although they are initially disabled (footnote 2).

Once a plugin repository is enabled, the plugins which can be installed are listed on the 'Available' tab. Installing new plugins is simply a case of checking the box and clicking 'Apply All'. Note that plugins are installed into the user plugins directory, which must have been correctly configured before you can try installing new plugins.

Once a plugin is installed it will appear in the list of 'Installed Plugins' and can be loaded in the same way as any other CREOLE plugin (see Section 3.7). If a new version of a plugin you have installed becomes available, the new version will be offered as an update. These updates can be installed in the same way as a new plugin.

2 Currently three plugin repositories are listed in the main distribution. To have your repository included in the list, send an e-mail with the address to the GATE developers mailing list.
3.7
This section describes how to load and run CREOLE resources not present in ANNIE. To load ANNIE, see Section 3.8.3. For technical descriptions of these resources, see the appropriate chapter in Part III (e.g. Chapter 21). First ensure that the necessary plugins have been loaded (see Section 3.5). If the resource you require does not appear in the list of Processing Resources, then you probably do not have the necessary plugin loaded. Processing resources are loaded by selecting them from the set of Processing Resources: right click on Processing Resources or select 'New Processing Resource' from the File menu.

For example, use the Plugin Console Manager to load the 'Tools' plugin. When you right click on 'Processing Resources' in the resources pane and select 'New', you have the option to create any of the processing resources that plugin provides. You may choose to create a 'GATE Morphological Analyser', with the default parameters. Having done this, an instance of the GATE Morphological Analyser appears under 'Processing Resources'. This processing resource, or PR, is now available to use. Double-clicking on it in the resources pane reveals its initialisation parameters, see figure 3.14.

This processing resource is now available to be added to applications. It must be added to an application before it can be applied to documents. You may create as many of a particular processing resource as you wish, for example with different initialisation parameters. Section 3.8 talks about creating and running applications.

See also the movie for loading processing resources.
3.8
Once all the resources you need have been loaded, an application can be created from them, and run on your corpus. Right click on 'Applications' and select 'New' and then either 'Corpus Pipeline' or 'Pipeline'. A pipeline application can only be run over a single document, while a corpus pipeline can be run over a whole corpus.

To build the pipeline, double click on it, and select the resources needed to run the application (you may not necessarily wish to use all those which have been loaded). Transfer the necessary components from the set of 'loaded components' displayed on the left hand side of the main window to the set of 'selected components' on the right, by selecting each component and clicking on the left and right arrows, or by double-clicking on each component.

Ensure that the components selected are listed in the correct order for processing (starting from the top). If not, select a component and move it up or down the list using the up/down arrows at the left side of the pane.

Ensure that any parameters necessary are set for each processing resource (by clicking on the resource from the list of selected resources and checking the relevant parameters from the pane below). For example, if you wish to use annotation sets other than the Default one, these must be defined for each processing resource.

Note that if a corpus pipeline is used, the corpus needs only to be set once, using the drop-down menu beside the 'corpus' box. If a pipeline is used, the document must be selected for each processing resource used.

Finally, click on 'Run' to run the application on the document or corpus.

See also the movie for loading and running processing resources.

For how to use the conditional versions of the pipelines see Section 3.8.2, and for saving/restoring the configuration of an application see Section 3.9.3.
When you run the application on the corpus datastore, each document will be loaded, processed, saved, then unloaded. So at any time there will be only one document from the datastore corpus loaded. This prevents memory shortages but is also a little slower than if all your documents were already loaded.

The processed documents are automatically saved back to the datastore, so you may want to use a copy of the datastore to experiment.

Be very careful: if you have some documents from the datastore corpus already loaded before running the application, they will be neither unloaded nor saved. To save such a document you have to right click on it in the resources tree view and save it to the datastore.
3.9
In this section, we will describe how applications and language resources can be saved for use outside of GATE and for use with GATE at a later time. Section 3.9.1 talks about saving documents to file. Section 3.9.2 outlines how to use datastores. Section 3.9.3 talks about saving application states (resource parameter states), and Section 3.9.4 talks about exporting applications together with referenced files and resources to a ZIP file.
The annotations are saved as normal document tags, using the annotation type as the tag name. If the advanced option 'Include annotation features for "Save Preserving Format"' (see Section 2.4) is set to true, then the annotation features will also be saved as tag attributes.

Using this operation for GATE documents that were not created from an HTML or XML file results in a plain text file, with in-line tags for the saved annotations.

Note that GATE's model of annotation allows graph structures, which are difficult to represent in XML (XML is a tree-structured representation format). During the dump process, annotations that cross each other in ways that cannot be represented in legal XML will be discarded, and a warning message printed.
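To make the notion of "crossing" concrete, here is a minimal sketch. It is not GATE's own code: the class and method names are hypothetical, and offsets are assumed to be half-open character ranges. Two annotations can be written as XML elements when they are disjoint or when one contains the other; they cross when they overlap without nesting.

```java
// Hypothetical illustration, not part of GATE: decides whether two
// annotation spans could both be written as well-formed inline XML tags.
final class SpanCrossing {

    /**
     * Returns true if the spans [s1, e1) and [s2, e2) overlap without
     * one containing the other, i.e. they cannot be nested as elements.
     */
    static boolean crosses(long s1, long e1, long s2, long e2) {
        boolean overlap = s1 < e2 && s2 < e1;
        boolean firstContainsSecond = s1 <= s2 && e2 <= e1;
        boolean secondContainsFirst = s2 <= s1 && e1 <= e2;
        return overlap && !firstContainsSecond && !secondContainsFirst;
    }

    public static void main(String[] args) {
        // Overlapping but not nested: one of the two would have to be dropped.
        System.out.println(crosses(0, 10, 5, 15));
        // Properly nested: both can be kept.
        System.out.println(crosses(0, 10, 2, 8));
    }
}
```

A document whose annotations never cross in this sense can be dumped without losing any of them.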
In this way, all resource files that are part of GATE are always used correctly, no matter where GATE is installed. Resource files which are not part of GATE and are used by an application do not need to be in the same location as when the application was initially created, but rather in the same location relative to the location of the application file.

In addition, if your application uses a project-specific location for global resources or project-specific plugins, the java property gate.user.resourceshome can be set to this location and the application will be stored so that this location will also always be used correctly, no matter where the application state file is copied to. To set the resources home directory, the -rh location option for the Linux script gate.sh to start GATE can be used. The combination of these features allows the creation and deployment of portable applications by keeping the application file and the resource files used by the application together.

Note that GATE resources that are used by your application may change between different releases of GATE. If your application depends on a specific version of resources that come with the GATE distribution, consider copying them to your project directory in order to ensure the correct version is used. The option 'Export for GATECloud.net' (see Section 3.9.4) supports this by creating a ZIP file that contains a copy of all GATE resources used by the application, including GATE plugins.

When an application is restored from an application state file, GATE uses the keyword $relpath$ for paths relative to the location of the gapp file, $gatehome$ for paths relative to the GATE home installation directory and $resourceshome$ for paths relative to the location to which the property gate.user.resourceshome is set.

There are other keywords that can be useful in some cases; to use them you will need to edit the gapp file manually. The keywords are $gateplugins$ and $sysprop:...$. The latter refers to any java system property, for example $sysprop:user.home$.

If you want to save your application along with all the resources it requires you can use the 'Export for GATECloud.net' option (see Section 3.9.4).

See also the movie for saving and restoring applications.
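The keyword scheme can be pictured with the following sketch. It is only an illustration of the idea and not GATE's loader: the class name, the `bases` map and the assumption that a keyword is always a prefix of the stored path are ours. Each keyword is mapped to a base directory, and the remainder of the stored string is resolved against it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical sketch of keyword-prefixed path resolution; not GATE code.
final class GappPathSketch {

    /**
     * Resolves a stored path such as "$relpath$resources/main.jape".
     * "bases" maps keyword names (relpath, gatehome, resourceshome,
     * gateplugins) to directories chosen by the caller.
     */
    static Path resolve(String stored, Map<String, Path> bases) {
        if (!stored.startsWith("$")) {
            return Paths.get(stored); // no keyword: use the path as written
        }
        int end = stored.indexOf('$', 1);
        if (end < 0) {
            throw new IllegalArgumentException("Unterminated keyword in: " + stored);
        }
        String keyword = stored.substring(1, end);
        // Drop leading slashes so the remainder is always treated as relative.
        String rest = stored.substring(end + 1).replaceFirst("^/+", "");
        Path base;
        if (keyword.startsWith("sysprop:")) {
            String property = keyword.substring("sysprop:".length());
            String value = System.getProperty(property);
            if (value == null) {
                throw new IllegalArgumentException("System property not set: " + property);
            }
            base = Paths.get(value);
        } else {
            base = bases.get(keyword);
            if (base == null) {
                throw new IllegalArgumentException("Unknown keyword: " + keyword);
            }
        }
        return base.resolve(rest).normalize();
    }
}
```

Because only the base directories change between machines, the same stored strings keep working when the application file and its resources are moved together.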
When you export an application in this way, GATE Developer produces a ZIP file containing the saved application state (in the same format as 'Save application state'). Any plugins and resource files that the application refers to are also included in the zip file, and the relative paths in the saved state are rewritten to point to the correct locations within the package. The resulting package is therefore self-contained and can be copied to another machine and unpacked there, or passed to GATECloud.net for deployment.

As well as selecting the location where you want to save the package, the 'Export for GATECloud.net' option will also prompt you to select the annotation sets that your application uses for input and output. For example, if your application makes use of the unpacked XML markup in source documents and creates annotations in the default set, then you would select 'Original markups' as an input set and the '<Default annotation set>' as an output set. GATE Developer will try to make an educated guess at the correct sets but you should check and amend the lists as necessary.

There are a few important points to note about the export process:

The complete contents of all the plugin directories that are loaded when you perform the export will be included in the resulting package. Use the plugin manager to unload any plugins your application is not using before you export it.

If your application refers to a resource file in a directory that is not under one of the loaded plugins, the entire contents of this directory will be recursively included in the package. If you have a number of unrelated resources in a single directory (e.g. many sets of large gazetteer lists) you may want to separate them into separate directories so that only the relevant ones are included in the package.

The packager only knows about resources that your application refers to directly in its parameters. For example, if your application includes a multi-phase JAPE grammar the packager will only consider the main grammar file, not any of its sub-phases. If the sub-phases are not contained in the same directory as the main grammar you may find they are not included. If indirect references of this kind are all to files under the same directory as the 'master' file it will work OK.

If you require more flexibility than this option provides you should read Section E.2, which describes the underlying Ant task that the exporter uses.
3.10
Keyboard Shortcuts
You can use various keyboard shortcuts for common tasks in GATE Developer. These are listed in this section.
F1: Display help page for the selected component
Alt+F4: Exit the application without confirmation
Tab: Put the focus on the next component or frame
Shift+Tab: Put the focus on the previous component or frame
F6: Put the focus on the next frame
Shift+F6: Put the focus on the previous frame
Alt+F: Show the File menu
Alt+O: Show the Options menu
Alt+T: Show the Tools menu
Alt+H: Show the Help menu
F10: Show the first menu
3.11
Miscellaneous
If a problem occurs and the saved data prevents GATE Developer from starting, you can fix this by deleting the configuration and session data files. These are stored in your home directory, and are called gate.xml and gate.session or .gate.xml and .gate.session depending on platform. On Windows your home is:
In GATE Developer, select 'Unicode editor' from the 'Tools' menu. This will display an editor window and, when a language with a custom input method is selected for input (see next section), a virtual keyboard window with the characters of the language assigned to the keys on the keyboard. You can enter data either by typing as normal, or with mouse clicks on the virtual keyboard.

In the editor and in GATE Developer's main window, the 'Options' menu has an 'Input methods' choice. All supported input languages (a superset of the JDK languages) are available here. Note that you need to use a font capable of displaying the language you select. By default GATE Developer will choose a Unicode font if it can find one on the platform you're running on. Otherwise, select a font manually from the 'Options' menu 'Configuration' choice.
When you create a document from a URL pointing to textual data in GATE, you have to tell the system what character encoding the text is stored in. By default, GATE will set this parameter to be the empty string. This tells Java to use the default encoding for whatever platform it is running on at the time, e.g. on Western versions of Windows this will be ISO-8859-1, and on Eastern ones ISO-8859-9. On Linux systems, the default encoding is influenced by the LANG environment variable, e.g. when this variable is set to en_US.utf-8 the default encoding used will be UTF-8.

When GATE is started using the bin/ant run command or (on Linux) through the gate.sh script or a link to it, you can change the default encoding used by GATE to UTF-8 by adding -Drun.file.encoding=utf-8 as a parameter.

A popular way to store Unicode documents is in UTF-8, which is a superset of ASCII (but can still store all Unicode data); if you get an error message about document I/O during reading, try setting the encoding to UTF-8, or some other locally popular encoding. (To see a list of available encodings, try opening a document in GATE's unicode editor; you will be prompted to select an encoding.)
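The effect of the encoding parameter can be seen with plain Java, independently of GATE. The sketch below is an illustration only; the class and method names are hypothetical. It follows the rule described above, where an empty encoding means "use the platform default", and shows how the same bytes read differently under two encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of "empty string means platform default".
final class EncodingSketch {

    /** Picks the platform default for a null or empty name, else the named charset. */
    static Charset chooseCharset(String encoding) {
        return (encoding == null || encoding.isEmpty())
                ? Charset.defaultCharset()
                : Charset.forName(encoding);
    }

    /** Decodes raw document bytes using the chosen charset. */
    static String decode(byte[] bytes, String encoding) {
        return new String(bytes, chooseCharset(encoding));
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, stored as UTF-8 (5 bytes).
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, "UTF-8"));      // decoded correctly
        System.out.println(decode(utf8, "ISO-8859-1")); // accent becomes two characters
        System.out.println("Platform default: " + Charset.defaultCharset());
    }
}
```

If a document looks wrong after loading, comparing the declared encoding with the one the file was actually written in is usually the first thing to check.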
GATE components are one of three types:

LanguageResources (LRs) represent entities such as lexicons, corpora or ontologies;
ProcessingResources (PRs) represent entities that are primarily algorithmic, such as parsers, generators or ngram modellers;
VisualResources (VRs) represent visualisation and editing components that participate in GUIs.

The distinction between language resources and processing resources is explored more fully in section D.1.1. Collectively, the set of resources integrated with GATE is known as CREOLE: a Collection of REusable Objects for Language Engineering.

In the rest of this chapter:

Section 4.3 describes the lifecycle of GATE components;
Section 4.4 describes how Processing Resources can be grouped into applications;
Section 4.5 describes the relationship between Language Resources and their datastores;
Section 4.6 summarises GATE's set of built-in components;
Section 4.7 describes how configuration data for Resource types is supplied to GATE.
4.1
GATE allows resource implementations and Language Resource persistent data to be distributed over the Web, and uses Java annotations and XML for configuration of resources (and GATE itself).

Resource implementations are grouped together as 'plugins', stored at a URL (when the resources are in the local file system this can be a file:/ URL). When a plugin is loaded into GATE it looks for a configuration file called creole.xml relative to the plugin URL and uses the contents of this file to determine what resources this plugin declares and where to find the classes that implement the resource types (typically these classes are stored in a JAR file in the plugin directory). Configuration data for the resources may be stored directly in the creole.xml file, or it may be stored as Java annotations on the resource classes themselves; in either case GATE retrieves this configuration information and adds the resource definitions to the CREOLE register. When a user requests an instantiation of a resource, GATE creates an instance of the resource class in the virtual machine.

Language resource data can be stored in binary serialised form in the local file system.
4.2
We can think of the GATE framework as a backplane into which users can plug CREOLE components. The user gives the system a list of URLs to search when it starts up, and components at those locations are loaded by the system.

The backplane performs these functions:

component discovery, bootstrapping, loading and reloading;
management and visualisation of native data structures for common information types;
generalised data storage and process execution.

A set of components plus the framework is a deployment unit which can be embedded in another application.

At their most basic, all GATE resources are Java Beans, the Java platform's model of software components. Beans are simply Java classes that obey certain interface conventions:

beans must have no-argument constructors;
beans have properties, defined by pairs of methods named by the convention setProp and getProp.

GATE uses Java Beans conventions to construct and configure resources at runtime, and defines interfaces that different component types must implement.
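The two bean conventions are what make it possible to build and configure an object knowing only its class and a set of named values. The following sketch uses the standard java.beans introspection API to show the principle; it is not how GATE itself is implemented, and DemoResource, its properties and the create method are hypothetical names invented for the example.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;

// Hypothetical illustration of configuring a bean by property name.
final class BeanConfigSketch {

    /** An invented resource class that follows the JavaBeans conventions. */
    public static class DemoResource {
        private String encoding;
        private Integer maxTokens;

        public DemoResource() { }  // the required no-argument constructor

        public String getEncoding() { return encoding; }
        public void setEncoding(String encoding) { this.encoding = encoding; }
        public Integer getMaxTokens() { return maxTokens; }
        public void setMaxTokens(Integer maxTokens) { this.maxTokens = maxTokens; }
    }

    /** Instantiates the class and calls setX for every entry named "x". */
    static <T> T create(Class<T> type, Map<String, Object> params) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(type).getPropertyDescriptors()) {
                if (params.containsKey(pd.getName()) && pd.getWriteMethod() != null) {
                    pd.getWriteMethod().invoke(bean, params.get(pd.getName()));
                }
            }
            return bean;
        } catch (ReflectiveOperationException | IntrospectionException e) {
            throw new IllegalStateException("Could not configure " + type.getName(), e);
        }
    }
}
```

Properties that are not mentioned in the map are simply left at whatever value the constructor gave them.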
4.3
CREOLE resources exhibit a variety of forms depending on the perspective they are viewed from. Their implementation is as a Java class plus an XML metadata file living at the same URL. When using GATE Developer, resources can be loaded and viewed via the resources tree (left pane) and the 'create resource' mechanism. When programming with GATE Embedded, they are Java objects that are obtained by making calls to GATE's Factory class. These various incarnations are the phases of a CREOLE resource's 'lifecycle'.

Depending on what sort of task you are using GATE for, you may use resources in any or all of these phases. For example, you may only be interested in getting a graphical view of what GATE's ANNIE Information Extraction system (see Chapter 6) does; in this case you will use GATE Developer to load the ANNIE resources, load a document, create an ANNIE application and run it on the document. If, on the other hand, you want to create your own resources, or modify the Java code of an existing resource (as opposed to just modifying its grammar, for example), you will need to deal with all the lifecycle phases.

The various phases may be summarised as:
Creating a new resource from scratch (bootstrapping). To create the binary image of a resource (a Java class in a JAR file), and the XML file that describes the resource to GATE, you need to create the appropriate .java file(s), compile them and package them as a .jar. GATE provides a bootstrap tool to start this process (see Section 7.12). Alternatively you can simply copy code from an existing resource.

Instantiating a resource in GATE Embedded. To create a resource in your own Java code, use GATE's Factory class (this takes care of parameterising the resource, restoring it from a database where appropriate, etc.). Section 7.2 describes how to do this.

Loading a resource into GATE Developer. To load a resource into GATE Developer, use the various 'New ... resource' options from the File menu and elsewhere. See Section 3.1.

Resource configuration and implementation. The bootstrap tool creates an empty resource that does nothing. In order to achieve the behaviour you require, you'll need to change the configuration of the resource (by editing the creole.xml file) and/or change the Java code that implements the resource. See section 4.7.
4.4
PRs can be combined into applications. Applications model a control strategy for the execution of PRs. In GATE, applications are called 'controllers' accordingly.

Currently only sequential, or pipeline, execution is supported. There are two main types of pipeline:

Simple pipelines simply group a set of PRs together in order and execute them in turn. The implementing class is called SerialController.

Corpus pipelines are specific to LanguageAnalysers, i.e. PRs that are applied to documents and corpora. A corpus pipeline opens each document in the corpus in turn, sets that document as a runtime parameter on each PR, runs all the PRs on the document, then closes the document. The implementing class is called SerialAnalyserController.
Conditional versions of these controllers are also available. These allow processing resources to be run conditionally on document features. See Section 3.8.2 for how to use these. If more flexibility is required, the Groovy plugin provides a scriptable controller (see section 7.17.3) whose execution strategy is specified using the Groovy programming language.

Controllers are themselves PRs (in particular a simple pipeline is a standard PR and a corpus pipeline is a LanguageAnalyser) so one pipeline can be nested in another. This is particularly useful with conditional controllers to group together a set of PRs that can all be turned on or off as a group.

There is also a real-time version of the corpus pipeline. When creating such a controller, a timeout parameter needs to be set which determines the maximum amount of time (in milliseconds) allowed for the processing of a document. Documents that take longer to process are simply ignored, and the execution moves to the next document after the timeout interval has lapsed.

All controllers have special handling for processing resources that implement the interface gate.creole.ControllerAwarePR. This interface provides methods that are called by the controller at the start and end of the whole application's execution. For a corpus pipeline, this means before any document has been processed and after all documents in the corpus have been processed, which is useful for PRs that need to share data structures across the whole corpus, build aggregate statistics, etc. For full details, see the JavaDoc documentation for ControllerAwarePR.
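The control flow of a corpus pipeline, including the start and end callbacks, can be sketched in a few lines. This is a simplified model and not GATE's controller classes: the interfaces Analyser and CorpusAware, the Counter class and the use of StringBuilder as a stand-in for a document are all hypothetical, and error or abort handling is left out.

```java
import java.util.List;

// Hypothetical model of sequential corpus processing; not GATE code.
final class PipelineSketch {

    /** Stand-in for a PR that is applied to one document at a time. */
    interface Analyser {
        void execute(StringBuilder document);
    }

    /** Stand-in for the callbacks made at the start and end of a whole run. */
    interface CorpusAware {
        void started();
        void finished();
    }

    /** Example PR that keeps a statistic across the whole corpus. */
    static final class Counter implements Analyser, CorpusAware {
        int seen;
        boolean done;
        public void started() { seen = 0; done = false; }
        public void execute(StringBuilder document) { seen++; }
        public void finished() { done = true; }
    }

    /** Runs every analyser, in list order, over every document in turn. */
    static void runCorpus(List<Analyser> analysers, List<StringBuilder> corpus) {
        for (Analyser a : analysers) {
            if (a instanceof CorpusAware) ((CorpusAware) a).started();
        }
        for (StringBuilder document : corpus) {
            for (Analyser a : analysers) {
                a.execute(document);
            }
        }
        for (Analyser a : analysers) {
            if (a instanceof CorpusAware) ((CorpusAware) a).finished();
        }
    }
}
```

The order of the list matters: a later analyser sees whatever the earlier ones have already done to the document, which is why the selected components must be listed in processing order.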
4.5
Language Resources can be stored in Datastores. Datastores are an abstract model of disk-based persistence, which can be implemented by various types of storage mechanism. Here are the types implemented:
4.6
GATE comes with various built-in components:

Language Resources modelling Documents and Corpora, and various types of Annotation Schema (see Chapter 5).
Processing Resources that are part of the ANNIE system (see Chapter 6).
Gazetteers (see Chapter 13).
Ontologies (see Chapter 14).
Machine Learning resources (see Chapter 18).
Alignment tools (see Chapter 19).
Parsers and taggers (see Chapter 17).
Other miscellaneous resources (see Chapter 21).
4.7
This section describes how to supply GATE with the configuration data it needs about a resource, such as what its parameters are, how to display it if it has a visualisation, etc. Several GATE resources can be grouped into a single plugin, which is a directory containing an XML configuration file called creole.xml. Configuration data for the plugin's resources can be given in the creole.xml file or directly in the Java source file using Java annotations.

A creole.xml file has a root element <CREOLE-DIRECTORY>. Traditionally this element didn't contain any attributes, but with the introduction of installable plugins (see Sections 3.6 and 12.3.5) the following attributes can now be provided.
ID: A string that uniquely identifies this plugin. This should be formatted in a similar way to fully specified Java class names. The class portion (i.e. everything after the last dot) will be used as the name of the plugin in the GUI. For example, the obsolete RASP plugin could have the ID gate.obsolete.RASP. Note that unlike Java class names the plugin name can contain spaces for the purpose of presentation.

VERSION: The version number of the plugin. For example, 3, 3.1, 3.11, 3.12-SNAPSHOT etc.

DESCRIPTION: A short description of the resources provided by the plugin. Note that there is really only space for a single sentence in the GUI.

HELPURL: The URL of a web page giving more details about this plugin.

GATE-MIN: The earliest version of GATE that this plugin is compatible with. This should be in the same format as the version shown in the GATE titlebar, i.e. 6.1 or 6.2-SNAPSHOT. Do not include the build number information.

GATE-MAX: The last version of GATE which the plugin is compatible with. This should be in the same format as GATE-MIN.
Currently all these attributes are optional, unless you intend to make the plugin available through a plugin repository (see Section 12.3.5), in which case the ID and VERSION attributes must be provided. We would, however, suggest that developers start to add these attributes to all the plugins they develop, as the information is likely to be used in more places throughout GATE Developer and GATE Embedded in the future.

Child elements of the <CREOLE-DIRECTORY> depend on the configuration style. The following three sections discuss the different styles: all-XML, all-annotations and a mixture of the two.
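As an illustration of the root-element attributes, the sketch below reads them with the standard Java XML parser. It is not the code GATE uses: the class and method names are hypothetical and the plugin ID in the usage note is invented. It also encodes the two rules stated above, namely that the display name is the part of the ID after the last dot, and that a plugin offered through a repository needs both ID and VERSION.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper for inspecting the root element of a creole.xml file.
final class CreoleAttributesSketch {

    /** Returns the named attribute of the root element, or null if absent. */
    static String attribute(String creoleXml, String name) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            creoleXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return root.hasAttribute(name) ? root.getAttribute(name) : null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse creole.xml content", e);
        }
    }

    /** The name shown to the user: everything after the last dot of the ID. */
    static String displayName(String id) {
        return id.substring(id.lastIndexOf('.') + 1);
    }

    /** ID and VERSION are the two attributes a repository plugin must provide. */
    static boolean readyForRepository(String creoleXml) {
        return attribute(creoleXml, "ID") != null
                && attribute(creoleXml, "VERSION") != null;
    }
}
```

For example, a root element written as <CREOLE-DIRECTORY ID="org.example.Demo Plugin" VERSION="3.1"/> would pass the repository check and be displayed as "Demo Plugin".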
        COMMENT="Name of the Minipar command file">
        java.net.URL
      </PARAMETER>
      <PARAMETER NAME="annotationInputSetName" RUNTIME="true" OPTIONAL="true"
        COMMENT="Name of the input Source">
        java.lang.String
      </PARAMETER>
      <PARAMETER NAME="annotationOutputSetName" RUNTIME="true" OPTIONAL="true"
        COMMENT="Name of the output AnnotationSetName">
        java.lang.String
      </PARAMETER>
      <PARAMETER NAME="annotationTypeName" RUNTIME="false"
        DEFAULT="DepTreeNode" COMMENT="Annotations to store with this type">
        java.lang.String
      </PARAMETER>
    </RESOURCE>
  </CREOLE>
</CREOLE-DIRECTORY>
Basic Resource-Level Data

Each resource must give a name, a Java class and the JAR file that it can be loaded from. The above example is taken from the Parser_Minipar plugin, and defines a single resource with a number of parameters. The full list of valid elements under <RESOURCE> is as follows:
NAME: the name of the resource, as it will appear in the 'New' menu in GATE Developer. If omitted, defaults to the bare name of the resource class (without a package name).

CLASS: the fully qualified name of the Java class that implements this resource.

JAR: names JAR files required by this resource (paths are relative to the location of creole.xml). Typically this will be the JAR file containing the class named by the <CLASS> element, but additional <JAR> elements can be used to name third-party JAR files that the resource depends on.

COMMENT: a descriptive comment about the resource, which will appear as the tooltip when hovering over an instance of this resource in the resources tree in GATE Developer. If omitted, no comment is used.

HELPURL: a URL to a help document on the web for this resource. It is used in the help browser inside GATE Developer.

INTERFACE: the interface type implemented by this resource, for example new types of document would specify <INTERFACE>gate.Document</INTERFACE>.

ICON: the icon used to represent this resource in GATE Developer. This is a path inside the plugin's JAR file, for example <ICON>/some/package/icon.png</ICON>. If the path specified does not start with a forward slash, it is assumed to name an icon from the GATE default set, which is located in gate.jar at gate/resources/img. If no icon is specified, a generic language resource or processing resource icon (as appropriate) is used.

PRIVATE: if present, this resource type is hidden in the GATE Developer GUI, i.e. it is not shown in the 'New' menus. This is useful for resource types that are intended to be created internally by other resources, or for resources that have parameters of a type that cannot be set in the GUI. <PRIVATE/> resources can still be created in Java code using the Factory.

AUTOINSTANCE (and HIDDEN-AUTOINSTANCE): tells GATE to automatically create instances of this resource when the plugin is loaded. Any number of auto instances may be defined; GATE will create them all. Each <AUTOINSTANCE> element may optionally contain <PARAM NAME="..." VALUE="..." /> elements giving parameter values to use when creating the instance. Any parameters not specified explicitly will take their default values. Use <HIDDEN-AUTOINSTANCE> if you want the auto instances not to show up in GATE Developer. This is useful for things like document formats where there should only ever be a single instance in GATE and that instance should not be deleted.

TOOL: if present, this resource type is considered to be a tool. Tools can contribute items to the Tools menu in GATE Developer.
For visual resources, a <GUI> element should also be provided. This takes a TYPE attribute, which can have the value LARGE or SMALL. LARGE means that the visual resource is a large viewer and should appear in the main part of the GATE Developer window on the right hand side; SMALL means the VR is a small viewer which appears in the space below the resources tree in the bottom left. The <GUI> element supports the following sub-elements:
RESOURCE_DISPLAYED: the type of GATE resource this VR can display. Any resource whose type is assignable to this type will be displayed with this viewer, so for example a VR that can display all types of document would specify gate.Document, whereas a VR that can only display the default GATE document implementation would specify gate.corpora.DocumentImpl.

MAIN_VIEWER: if present, GATE will consider this VR to be the main viewer for the given resource type, and will ensure that if several different viewers are all applicable to this resource, this viewer will be the one that is initially visible.

For annotation viewers, you should specify an <ANNOTATION_TYPE_DISPLAYED> element giving the annotation type that the viewer can display (e.g. Sentence).
Resource Parameters

Resources may also have parameters of various types. These resources, from the GATE distribution, illustrate the various types of parameters:
<RESOURCE>
  <NAME>GATE document</NAME>
  <CLASS>gate.corpora.DocumentImpl</CLASS>
  <INTERFACE>gate.Document</INTERFACE>
  <COMMENT>GATE transient document</COMMENT>
  <OR>
    <PARAMETER NAME="sourceUrl"
      SUFFIXES="txt;text;xml;xhtm;xhtml;html;htm;sgml;sgm;mail;email;eml;rtf"
      COMMENT="Source URL">java.net.URL</PARAMETER>
    <PARAMETER NAME="stringContent"
      COMMENT="The content of the document">java.lang.String</PARAMETER>
  </OR>
  <PARAMETER
    COMMENT="Should the document read the original markup"
    NAME="markupAware" DEFAULT="true">java.lang.Boolean</PARAMETER>
  <PARAMETER NAME="encoding" OPTIONAL="true"
    COMMENT="Encoding" DEFAULT="">java.lang.String</PARAMETER>
  <PARAMETER NAME="sourceUrlStartOffset"
    COMMENT="Start offset for documents based on ranges"
    OPTIONAL="true">java.lang.Long</PARAMETER>
  <PARAMETER NAME="sourceUrlEndOffset"
    COMMENT="End offset for documents based on ranges"
    OPTIONAL="true">java.lang.Long</PARAMETER>
  <PARAMETER NAME="preserveOriginalContent"
    COMMENT="Should the document preserve the original content"
    DEFAULT="false">java.lang.Boolean</PARAMETER>
  <PARAMETER NAME="collectRepositioningInfo"
    COMMENT="Should the document collect repositioning information"
    DEFAULT="false">java.lang.Boolean</PARAMETER>
  <ICON>lr.gif</ICON>
</RESOURCE>
<RESOURCE>
Parameters may be optional, and may have default values (and may have comments to describe their purpose, which are displayed by GATE Developer during interactive parameter setting).

Some parameters are execution time (RUNTIME), some are initialisation time. E.g. at execution time a doc is supplied to a language analyser; at initialisation time a grammar may be supplied to a language analyser.

The <PARAMETER> tag takes the following attributes:
NAME: name of the JavaBean property that the parameter refers to, i.e. for a parameter named 'someParam' the class must have setSomeParam and getSomeParam methods (footnote 1).

DEFAULT: default value (see below).

RUNTIME: doesn't need setting at initialisation time, but must be set before calling execute(). Only meaningful for PRs.

OPTIONAL: not required.

COMMENT: for display purposes.

ITEM_CLASS_NAME: (only applies to parameters whose type is java.util.Collection or a type that implements or extends this) this specifies the type of elements the collection contains, so GATE can use the right type when parameters are set. If omitted, GATE will pass in the elements as Strings.

SUFFIXES: a semicolon-separated list of file suffixes that this parameter typically accepts, used as a filter in the file chooser provided by GATE Developer to select a local file as the parameter value.
It is possible for two or more parameters to be mutually exclusive (i.e. a user must specify one or the other but not both). In this case the <PARAMETER> elements should be grouped together under an <OR> element.
1 The JavaBeans spec allows is instead of get for properties of the primitive type boolean, but GATE does not support parameters with primitive types. Parameters of type java.lang.Boolean (the wrapper class) are permitted, but these have get accessors anyway.
The type of the parameter is specified as the text of the <PARAMETER> element, and the type supplied must match the return type of the parameter's get method. Any reference type (class, interface or enum) may be used as the parameter type, including other resource types; in this case GATE Developer will offer a list of the loaded instances of that resource as options for the parameter value. Primitive types (char, boolean, ...) are not supported; instead you should use the corresponding wrapper type (java.lang.Character, java.lang.Boolean, ...). If the getter returns a parameterized type (e.g. List<Integer>) you should just specify the raw type (java.util.List) here (footnote 2).

The DEFAULT string is converted to the appropriate type for the parameter: java.lang.String parameters use the value directly, primitive wrapper types e.g. java.lang.Integer use their respective valueOf methods, and other built-in Java types can have defaults specified provided they have a constructor taking a String.

The type java.net.URL is treated specially: if the default string is not an absolute URL (e.g. http://gate.ac.uk/) then it is treated as a path relative to the location of the creole.xml file. Thus a DEFAULT of 'resources/main.jape' in the file file:/opt/MyPlugin/creole.xml is treated as the absolute URL file:/opt/MyPlugin/resources/main.jape.

For Collection-valued parameters multiple values may be specified, separated by semicolons, e.g. 'foo;bar;baz'; if the parameter's type is an interface (Collection or one of its sub-interfaces, e.g. List) a suitable concrete class (e.g. ArrayList, HashSet) will be chosen automatically for the default value.

For parameters of type gate.FeatureMap multiple name=value pairs can be specified, e.g. 'kind=word;orth=upperInitial'. For enum-valued parameters the default string is taken as the name of the enum constant to use. Finally, if no DEFAULT attribute is specified, the default value is null.
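The conversion rules above can be mimicked with standard Java calls. The sketch below is not GATE's converter: the class and method names are hypothetical, it returns a java.net.URI rather than a URL to avoid checked exceptions, and it uses plain String maps in place of gate.FeatureMap. It reproduces the worked examples from the text.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of turning DEFAULT strings into typed values.
final class DefaultValueSketch {

    /** Strings are used directly; wrapper types go through valueOf. */
    static Object wrapper(String text, Class<?> type) {
        if (type == String.class)  return text;
        if (type == Integer.class) return Integer.valueOf(text);
        if (type == Long.class)    return Long.valueOf(text);
        if (type == Boolean.class) return Boolean.valueOf(text);
        if (type == Double.class)  return Double.valueOf(text);
        throw new IllegalArgumentException("Type not covered by this sketch: " + type);
    }

    /** Relative values are resolved against the location of creole.xml. */
    static URI resolveUrl(String text, URI creoleXml) {
        return creoleXml.resolve(text);
    }

    /** 'foo;bar;baz' becomes a three-element list (a concrete ArrayList). */
    static List<String> splitCollection(String text) {
        return new ArrayList<>(Arrays.asList(text.split(";")));
    }

    /** 'kind=word;orth=upperInitial' becomes an ordered name-to-value map. */
    static Map<String, String> parseFeatures(String text) {
        Map<String, String> features = new LinkedHashMap<>();
        for (String pair : text.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Expected name=value but got: " + pair);
            }
            features.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return features;
    }

    /** For enum types the string is the name of the constant. */
    static <E extends Enum<E>> E enumDefault(Class<E> type, String text) {
        return Enum.valueOf(type, text);
    }
}
```

With file:/opt/MyPlugin/creole.xml as the base, resolveUrl maps 'resources/main.jape' to file:/opt/MyPlugin/resources/main.jape and leaves http://gate.ac.uk/ unchanged, matching the example in the text.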
2 In this case you would specify java.lang.Integer as the ITEM_CLASS_NAME.
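The DEFAULT conversions described above can be sketched with plain JDK classes. This is only an illustration of the rules as stated in the text, not GATE's own implementation; the class and method names (DefaultValueSketch, resolveUrlDefault, splitCollectionDefault, parseFeatureDefault) are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; none of these names are part of the GATE API.
final class DefaultValueSketch {

  // Resolve a URL default against the location of creole.xml.
  // An absolute default (one with its own scheme) is returned unchanged.
  static URI resolveUrlDefault(String creoleXmlLocation, String defaultValue) {
    return URI.create(creoleXmlLocation).resolve(defaultValue);
  }

  // Split a collection default such as "foo;bar;baz" on semicolons.
  static List<String> splitCollectionDefault(String defaultValue) {
    return new ArrayList<>(Arrays.asList(defaultValue.split(";")));
  }

  // Parse "name=value;name=value" pairs, as described for feature-map defaults.
  static Map<String, String> parseFeatureDefault(String defaultValue) {
    Map<String, String> features = new LinkedHashMap<>();
    for (String pair : defaultValue.split(";")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        features.put(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }
    return features;
  }

  public static void main(String[] args) {
    System.out.println(resolveUrlDefault("file:/opt/MyPlugin/creole.xml", "resources/main.jape"));
    System.out.println(splitCollectionDefault("foo;bar;baz"));
    System.out.println(parseFeatureDefault("kind=word;orth=upperInitial"));
  }
}
```

The relative-path case relies on standard URI resolution, which is why a default of resources/main.jape ends up next to creole.xml rather than next to the class file.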
This tells GATE to load myPlugin.jar and scan its contents looking for resource classes annotated with @CreoleResource. Other JAR files required by the plugin can be specified using other <JAR> elements without SCAN="true". In a GATE Embedded application it is possible to register a single @CreoleResource annotated class without using a creole.xml file by calling
Gate.getCreoleRegister().registerComponent(MyResource.class);
GATE will extract the configuration from the annotations on the class and make it available for use as if it had been defined in a plugin.

Basic Resource-Level Data

To mark a class as a CREOLE resource, simply use the @CreoleResource annotation (in the gate.creole.metadata package), for example:
    import gate.creole.AbstractLanguageAnalyser;
    import gate.creole.metadata.*;

    @CreoleResource(name = "GATE Tokeniser",
        comment = "Splits text into tokens and spaces")
    public class Tokeniser extends AbstractLanguageAnalyser {
      ...
The @CreoleResource annotation provides slots for all the values that can be specified under <RESOURCE> in creole.xml, except <CLASS> (inferred from the name of the annotated class) and <JAR> (taken to be the JAR containing the class):

name (String) the name of the resource, as it will appear in the 'New' menu in GATE Developer. If omitted, defaults to the bare name of the resource class (without a package name). (XML equivalent <NAME>)

comment (String) a descriptive comment about the resource, which will appear as the tooltip when hovering over an instance of this resource in the resources tree in GATE Developer. If omitted, no comment is used. (XML equivalent <COMMENT>)

isPrivate (boolean) should this resource type be hidden from the GATE Developer GUI, so it does not appear in the 'New' menus? If omitted, defaults to false (i.e. not hidden). (XML equivalent <PRIVATE/>)

icon (String) the icon to use to represent the resource in GATE Developer. If omitted, a generic language resource or processing resource icon is used. (XML equivalent <ICON>, see the description above for details)

interfaceName (String) the interface type implemented by this resource, for example a new type of document would specify "gate.Document" here. (XML equivalent <INTERFACE>)

autoInstances (array of @AutoInstance annotations) definitions for any instances of this resource that should be created automatically when the plugin is loaded. If omitted, no auto-instances are created by default. (XML equivalent, one or more <AUTOINSTANCE> and/or <HIDDEN-AUTOINSTANCE> elements, see the description above for details)

resourceDisplayed (String) the class name of the resource type that this VR displays, e.g. "gate.Corpus". (XML equivalent <RESOURCE_DISPLAYED>)

mainViewer (boolean) is this the 'most important' viewer for its displayed resource type? (XML equivalent <MAIN_VIEWER/>, see above for details)

For annotation viewers, you should specify an annotationTypeDisplayed element giving the annotation type that the viewer can display (e.g. Sentence).
Resource Parameters

Parameters are declared by placing annotations on their JavaBean set methods. To mark a setter method as a parameter, use the @CreoleParameter annotation, for example:

    @CreoleParameter(comment = "The location of the list of abbreviations")
    public void setAbbrListUrl(URL listUrl) {
      ...

GATE will infer the parameter's name from the name of the JavaBean property in the usual way (i.e. strip off the leading set and convert the following character to lower case, so in this example the name is abbrListUrl). The parameter name is not taken from the name of the method parameter. The parameter's type is inferred from the type of the method parameter (java.net.URL in this case). The annotation elements of @CreoleParameter correspond to the attributes of the <PARAMETER> tag in the XML configuration style:
comment (String) an optional descriptive comment about the parameter. (XML equivalent COMMENT)

defaultValue (String) the optional default value for this parameter. The value is specified as a string but is converted to the relevant type by GATE according to the conversions described in the previous section. Note that relative path default values for URL-valued parameters are still relative to the location of the creole.xml file, not the annotated class3. (XML equivalent DEFAULT)

suffixes (String) for URL-valued parameters, a semicolon-separated list of default file suffixes that this parameter accepts. (XML equivalent SUFFIXES)

collectionElementType (Class) for Collection-valued parameters, the type of the elements in the collection. This can usually be inferred from the generic type information, for example public void setIndices(List<Integer> indices), but must be specified if the set method's parameter has a raw (non-parameterized) type. (XML equivalent ITEM_CLASS_NAME)
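The name and type inference described earlier in this section (strip the leading set, lower-case the next character, take the type from the method parameter) can be sketched with plain reflection. This is a minimal illustration under the rule as stated in the text, not GATE's code; ParameterNameSketch, SampleResource, parameterName and parameterType are hypothetical names.

```java
import java.lang.reflect.Method;
import java.net.URL;

// Hypothetical illustration; not part of the GATE API.
final class ParameterNameSketch {

  // Stand-in resource class with one setter, mirroring the example in the text.
  static class SampleResource {
    public void setAbbrListUrl(URL listUrl) { }
  }

  // Strip the leading "set" and lower-case the character that follows it.
  static String parameterName(String setterName) {
    if (!setterName.startsWith("set") || setterName.length() == 3) {
      throw new IllegalArgumentException("not a setter: " + setterName);
    }
    String rest = setterName.substring(3);
    return Character.toLowerCase(rest.charAt(0)) + rest.substring(1);
  }

  // The parameter type is the type of the setter's single argument,
  // or null if the class has no such one-argument public method.
  static Class<?> parameterType(Class<?> resourceClass, String setterName) {
    for (Method m : resourceClass.getMethods()) {
      if (m.getName().equals(setterName) && m.getParameterCount() == 1) {
        return m.getParameterTypes()[0];
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(parameterName("setAbbrListUrl"));
    System.out.println(parameterType(SampleResource.class, "setAbbrListUrl"));
  }
}
```

Note that the argument name (listUrl) plays no part in either result, which matches the statement that the parameter name is not taken from the method parameter.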
Mutually-exclusive parameters (such as would be grouped in an <OR> in creole.xml) are handled by adding a disjunction="label" and priority=n to the @CreoleParameter annotation - all parameters that share the same label are grouped in the same disjunction, and will be offered in order of priority. The parameter with the smallest priority value will be the one listed first, and thus the one that is offered initially when creating a resource of this type in GATE Developer. For example, the following is a simplified extract from gate.corpora.DocumentImpl:

    @CreoleParameter(disjunction = "src", priority = 1)
    public void setSourceUrl(URL src) { /* */ }

    @CreoleParameter(disjunction = "src", priority = 2)
    public void setStringContent(String content) { /* */ }
This declares the parameters stringContent and sourceUrl as mutually-exclusive, and when creating an instance of this resource in GATE Developer the parameter that will be shown initially is sourceUrl. To set stringContent instead the user must select it from the drop-down list. Parameters with the same declared priority value will appear next to each other in the list, but their relative ordering is not specified. Parameters with no explicit priority are always listed after those that do specify a priority.

3 When registering a class using CreoleRegister.registerComponent there is no creole.xml location against which relative defaults can be resolved; in such a resource it may be better to use Class.getResource to construct the default URLs if no value is supplied for the parameter by the user.

Optional and runtime parameters are marked using extra annotations, for example:

[Code listing lost in extraction: a set method carrying extra annotations such as @Optional and @RunTime alongside @CreoleParameter.]
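The ordering rule for a disjunction (smallest priority first, parameters without an explicit priority last) can be sketched as a simple sort. This is an illustration only; the Param class and the use of null to model "no explicit priority" are assumptions made for the sketch, not GATE's internal representation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; not part of the GATE API.
final class DisjunctionOrderSketch {

  static final class Param {
    final String name;
    final Integer priority; // null models "no explicit priority" (an assumption)

    Param(String name, Integer priority) {
      this.name = name;
      this.priority = priority;
    }
  }

  // Order the members of one disjunction: ascending priority, unprioritised last.
  static List<String> displayOrder(List<Param> disjunction) {
    List<Param> sorted = new ArrayList<>(disjunction);
    sorted.sort(Comparator.comparing(
        (Param p) -> p.priority,
        Comparator.nullsLast(Comparator.<Integer>naturalOrder())));
    List<String> names = new ArrayList<>();
    for (Param p : sorted) {
      names.add(p.name);
    }
    return names;
  }

  public static void main(String[] args) {
    List<Param> src = new ArrayList<>();
    src.add(new Param("stringContent", 2));
    src.add(new Param("sourceUrl", 1));
    System.out.println(displayOrder(src));
  }
}
```

The first element of the returned list corresponds to the parameter offered initially. The sort happens to be stable, but since the text leaves the relative order of equal priorities unspecified, callers should not rely on that.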
Inheritance

Unlike with pure XML configuration, when using annotations a resource will inherit any configuration data that was not explicitly specified from annotations on its parent class and on any interfaces it implements. Specifically, if you do not specify a comment, interfaceName, icon, annotationTypeDisplayed or the GUI-related elements (guiType and resourceDisplayed) on your @CreoleResource annotation then GATE will look up the class tree for other @CreoleResource annotations, first on the superclass, its superclass, etc., then at any implemented interfaces, and use the first value it finds. This is useful if you are defining a family of related resources that inherit from a common base class. The resource name and the isPrivate and mainViewer flags are not inherited.

Parameter definitions are inherited in a similar way. This is one of the big advantages of annotation configuration over pure XML - if one resource class extends another then with pure XML configuration all the parent class's parameter definitions must be duplicated in the subclass's creole.xml definition. With annotations, parameters are inherited from the parent class (and its parent, etc.) as well as from any interfaces implemented. For example, the gate.LanguageAnalyser interface provides two parameter definitions via annotated set methods, for the corpus and document parameters. Any @CreoleResource annotated class that implements LanguageAnalyser, directly or indirectly, will get these parameters automatically.

Of course, there are some cases where this behaviour is not desirable, for example if a subclass calculates a value for a superclass parameter rather than having the user set it directly. In this case you can hide the parameter by overriding the set method in the subclass and using a marker annotation:

[Code listing lost in extraction: a subclass set method overridden and marked with @HiddenCreoleParameter.]

The overriding method will typically just call the superclass one, as its only purpose is to provide a place to put the @HiddenCreoleParameter annotation. Alternatively, you may want to override some of the configuration for a parameter but inherit the rest from the superclass. Again, this is handled by trivially overriding the set method and re-annotating it:
    // superclass
    @CreoleParameter(comment = "Location of the grammar file",
        suffixes = "jape")
    public void setGrammarUrl(URL grammarLocation) {
      ...
    }

    @Optional
    @RunTime
    @CreoleParameter(comment = "Feature to set on success")
    public void setSuccessFeature(String name) {
      ...
    }

    // subclass

    // override the default value, inherit everything else
    @CreoleParameter(defaultValue = "resources/defaultGrammar.jape")
    public void setGrammarUrl(URL url) {
      super.setGrammarUrl(url);
    }

    // we want the parameter to be required in the subclass
    @Optional(false)
    @CreoleParameter
    public void setSuccessFeature(String name) {
      super.setSuccessFeature(name);
    }
Note that for backwards compatibility, data is only inherited from superclass annotations if the subclass is itself annotated with @CreoleResource. If the subclass is not annotated then GATE assumes that all its configuration is contained in creole.xml in the usual way.

The default value for the listUrl parameter in the annotated class will be replaced by your value.

External AUTOINSTANCEs

For resources like document formats, where there should always and only be one instance in GATE at any time, it makes sense to put the auto-instance definitions in the @CreoleResource annotation. But if the automatically created instances are a convenience rather than a necessity it may be better to define them in XML so other users can disable them without re-compiling the class:
    <CREOLE-DIRECTORY>
      <JAR SCAN="true">myPlugin.jar</JAR>
      <RESOURCE>
        <CLASS>com.acme.AutoPR</CLASS>
        <AUTOINSTANCE>
          <PARAM NAME="type" VALUE="Sentence" />
        </AUTOINSTANCE>
        <AUTOINSTANCE>
          <PARAM NAME="type" VALUE="Paragraph" />
        </AUTOINSTANCE>
      </RESOURCE>
    </CREOLE-DIRECTORY>

Inheriting Parameters

If you would prefer to use XML configuration for your own resources, but would like to benefit from the parameter inheritance features of the annotation-driven approach, you can write a normal creole.xml file with all your configuration and just add a blank @CreoleResource annotation to your class. For example:
    package com.acme;
    import gate.*;
    import gate.creole.metadata.CreoleResource;

    @CreoleResource
    public class MyPR implements LanguageAnalyser {
      ...
    }

    <!-- creole.xml -->
    <CREOLE-DIRECTORY>
      <CREOLE>
        <RESOURCE>
          <NAME>My Processing Resource</NAME>
          <CLASS>com.acme.MyPR</CLASS>
          <COMMENT>...</COMMENT>
          <PARAMETER NAME="annotationSetName"
                     RUNTIME="true" OPTIONAL="true">java.lang.String</PARAMETER>
          <!--
            don't need to declare document and corpus parameters, they
            are inherited from LanguageAnalyser
          -->
        </RESOURCE>
      </CREOLE>
    </CREOLE-DIRECTORY>
No attempt is made here to explain the workings of Ivy or the format of the ivy.xml file. For full details you should refer to the appropriate section of the Ivy manual. Incorporating an Ivy file within a CREOLE plugin is as simple as referencing it from within creole.xml. Assuming you have used the default filename of ivy.xml then you can reference it via a simple <IVY> element.
    <CREOLE-DIRECTORY>
      <JAR SCAN="true">myPlugin.jar</JAR>
      <IVY/>
    </CREOLE-DIRECTORY>

If you have used an alternative filename then you can specify it as the text content of the <IVY> element. For example, if the filename is plugin-ivy.xml you would reference it as follows:

    <CREOLE-DIRECTORY>
      <JAR SCAN="true">myPlugin.jar</JAR>
      <IVY>plugin-ivy.xml</IVY>
    </CREOLE-DIRECTORY>

When the plugin is loaded into GATE, Ivy resolves the dependencies, downloads the appropriate libraries (if necessary) and then makes them available to the plugin. Once the plugin is loaded it behaves exactly the same as any other plugin.

Note that if you export an application (see Section 3.9.4) then to ensure that it is self-contained and usable within any processing environment the Ivy based dependencies are expanded; the libraries are downloaded into the plugin's lib folder, appropriate entries are added to creole.xml and the <IVY> element is removed.
4.8
Visual Resources allow a developer to provide a GUI to interact with a particular resource type (PR or LR), but sometimes it is useful to provide general utilities for use in the GATE Developer GUI that are not tied to any specific resource type. Examples include the annotation diff tool and the Groovy console (provided by the Groovy plugin), both of which are self-contained tools that display in their own top-level window. To support this, the CREOLE model has the concept of a tool.

A resource type is marked as a tool by using the <TOOL/> element in its creole.xml definition, or by setting tool = true if using the @CreoleResource annotation configuration style. If a resource is declared to be a tool, and written to implement the gate.gui.ActionsPublisher interface, then whenever an instance of the resource is created its published actions will be added to the Tools menu in GATE Developer.

Since the published actions of every instance of the resource will be added to the tools menu, it is best not to use this mechanism on resource types that can be instantiated by the user. The tool marker is best used in combination with the private flag (to hide the resource from the list of available types in the GUI) and one or more hidden autoinstance definitions to create a limited number of instances of the resource when its defining plugin is loaded. See the GroovySupport resource in the Groovy plugin for an example of this.

    action.putValue(GateConstants.MENU_PATH_KEY,
        new String[] {"Acme toolkit", "Statistics"});

The key must be GateConstants.MENU_PATH_KEY and the value must be an array of strings. Each string in the array represents the name of one level of sub-menus. Thus in the example above the action would be placed under Tools → Acme toolkit → Statistics. If no MENU_PATH_KEY value is provided the action will be placed directly on the Tools menu.
This chapter documents GATE's model of corpora, documents and annotations on documents. Section 5.1 describes the simple attribute/value data model that corpora, documents and annotations all share. Section 5.2, Section 5.3 and Section 5.4 describe corpora, documents and annotations on documents respectively. Section 5.5 describes GATE's support for diverse document formats, and Section 5.5.2 describes facilities for XML input/output.
5.1
GATE has a single model for information that describes documents, collections of documents (corpora), and annotations on documents, based on attribute/value pairs. Attribute names are strings; values can be any Java object. The API for accessing this feature data is Java's Map interface (part of the Collections API).
5.2
A Corpus in GATE is a Java Set whose members are Documents. Both Corpora and Documents are types of LanguageResource (LR); all LRs have a FeatureMap (a Java Map) associated with them that stores attribute/value information about the resource. FeatureMaps are also used to associate arbitrary information with ranges of documents (e.g. pieces of text) via the annotation model (see below). Documents have a DocumentContent which is a text at present (future versions may add support for audiovisual content) and one or more AnnotationSets which are Java Sets.
5.3
Documents are modelled as content plus annotations (see Section 5.4) plus features (see Section 5.1). The content of a document can be any subclass of DocumentContent.
5.4
Annotations are organised in graphs, which are modelled as Java sets of Annotation. Annotations may be considered as the arcs in the graph; they have a start Node and an end Node, an ID, a type and a FeatureMap. Nodes have pointers into the source document, e.g. character offsets.
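The shape of this model (an annotation as an arc between two offset-bearing nodes, carrying an ID, a type and a feature map) can be sketched with a few stand-in classes. These are deliberately simplified and are not GATE's own Annotation or Node types; all names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration; not the GATE classes of the same name.
final class AnnotationModelSketch {

  static final class Node {
    final int offset; // character offset into the document content

    Node(int offset) {
      this.offset = offset;
    }
  }

  static final class Annotation {
    final int id;
    final String type;
    final Node start;
    final Node end;
    // Attribute names are strings; values can be any Java object.
    final Map<String, Object> features = new HashMap<>();

    Annotation(int id, String type, Node start, Node end) {
      this.id = id;
      this.type = type;
      this.start = start;
      this.end = end;
    }

    // The text an annotation "covers" is the content between its two nodes.
    String coveredText(String content) {
      return content.substring(start.offset, end.offset);
    }
  }

  public static void main(String[] args) {
    String content = "A TEENAGER yesterday accused his parents of cruelty";
    Annotation date = new Annotation(1, "Date", new Node(11), new Node(20));
    date.features.put("kind", "date");
    System.out.println(date.type + " " + date.features + " -> " + date.coveredText(content));
  }
}
```

Because annotations only point at offsets, several annotations can share nodes or overlap arbitrarily, which is what makes the structure a graph rather than a tree.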
Date Schema

    <?xml version="1.0"?>
    <schema xmlns="http://www.w3.org/2000/10/XMLSchema">
      <!-- XSchema definition for Date-->
      <element name="Date">
        <complexType>
          <attribute name="kind" use="optional">
            <simpleType>
              <restriction base="string">
                <enumeration value="date"/>
                <enumeration value="time"/>
                <enumeration value="dateTime"/>
              </restriction>
            </simpleType>
          </attribute>
        </complexType>
      </element>
    </schema>

Person Schema

    <?xml version="1.0"?>
    <schema xmlns="http://www.w3.org/2000/10/XMLSchema">
      <!-- XSchema definition for Person-->
      <element name="Person" />
    </schema>

Address Schema

    <?xml version="1.0"?>
    <schema xmlns="http://www.w3.org/2000/10/XMLSchema">
      <!-- XSchema definition for Address-->
      <element name="Address">
        <complexType>
          <attribute name="kind" use="optional">
            <simpleType>
              <restriction base="string">
                <enumeration value="email"/>
                <enumeration value="url"/>
                <enumeration value="phone"/>
                <enumeration value="ip"/>
                <!-- four further enumeration entries; their values were lost in extraction -->
              </restriction>
            </simpleType>
          </attribute>
        </complexType>
      </element>
    </schema>
[Tables 5.1 to 5.3: example texts with their annotations (columns: Text; Annotations with Span Start and End; Features, e.g. ddmmyy=101194). The table bodies were not recovered.]
In most cases, the hierarchical structure could be recovered from the spans. However, it may be desirable to record this structure directly through a constituents feature whose value is a sequence of annotations representing the immediate constituents of the initial annotation. For the annotations of type parse, the constituents are either non-terminals (other annotations in the parse group) or tokens. For the sentence annotation, the constituents feature points to the constituent tokens. A reference to another annotation is represented in the table as "[ Annotation Id]"; for example, "[3]" represents a reference to annotation 3. Where the value of a feature is a sequence of items, these items are separated by commas. No special operations are provided in the current architecture for manipulating constituents.

At a less esoteric level, annotations can be used to record the overall structure of documents, including in particular documents which have structured headers, as is shown in the third example (Table 5.3). If the Addressee, Source, ... annotations are recorded when the document is indexed for retrieval, it will be possible to perform retrieval selectively on information in particular fields.

Our final example (Table 5.4) involves an annotation which effectively modifies the document. The current architecture does not make any specific provision for the modification of the original text. However, some allowance must be made for processes such as spelling correction. This information will be recorded as a correction feature on token annotations and possibly on name annotations:

[Table 5.4: example text with its annotations; the table body was not recovered.]
5.5
Document Formats
The following document formats are supported by GATE: Plain Text, HTML, SGML, XML, RTF, Email.

By default GATE will try to identify the type of the document, then strip and convert any markup into GATE's annotation format. To disable this process, set the markupAware parameter on the document to false.

When reading a document of one of these types, GATE extracts the text between tags (where such exist) and creates a GATE annotation filled as follows: the name of the tag will constitute the annotation's type, all the tag's attributes will materialize in the annotation's features and the annotation will span over the text covered by the tag. A few exceptions to this rule apply for the RTF, Email and Plain Text formats, which will be described later in the input section of these formats.

The text between tags is extracted and appended to the GATE document's content and all annotations created from tags will be placed into a GATE annotation set named 'Original markups'.
Example:
The startNode and endNode are created from offsets referring to the beginning and the end of 'A piece of text' in the document's content.

The documents supported by GATE have to be in one of the encodings accepted by Java. The most popular is the 'UTF-8' encoding which is also the most storage efficient one for UNICODE. If, when loading a document in GATE, the encoding parameter is set to '' (the empty string), then the default encoding of the platform will be used.
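The encoding rule above (a named Java charset, or the platform default when the parameter is the empty string) can be sketched as follows. This is a plain-JDK illustration, not GATE code; EncodingSketch, charsetFor and decode are hypothetical names, and treating null like the empty string is an extra assumption of the sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the GATE API.
final class EncodingSketch {

  // Empty string means "use the platform default"; null is treated the
  // same way here, which is an assumption of this sketch.
  static Charset charsetFor(String encoding) {
    if (encoding == null || encoding.isEmpty()) {
      return Charset.defaultCharset();
    }
    return Charset.forName(encoding);
  }

  static String decode(byte[] bytes, String encoding) {
    return new String(bytes, charsetFor(encoding));
  }

  public static void main(String[] args) {
    byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
    System.out.println(decode(bytes, "UTF-8"));
    System.out.println(charsetFor(""));
  }
}
```

Relying on the platform default makes results machine-dependent, so naming the encoding explicitly is usually the safer choice when the source encoding is known.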
    static public DocumentFormat getDocumentFormat(gate.Document aGateDocument,
                                                   URL url)

    static public DocumentFormat getDocumentFormat(gate.Document aGateDocument,
                                                   String fileSuffix)
The first two methods try to detect the right MimeType for the GATE document, and after that, they call the third one to return the reader associated with a MimeType. Of course, if an explicit mimeType parameter was specified, GATE calls the third form of the method directly, passing the specified type. GATE uses the implementation from 'http://jigsaw.w3.org' for mime types.

The magic numbers test is performed using the information from the magic2mimeTypeMap map. Each key from this map is searched for in the first bufferSize (the default value is 2048) chars of text. The method that does this is called runMagicNumbers(InputStreamReader aReader) and it belongs to the DocumentFormat class. More details about it can be found in the GATE API documentation.

In order to activate a reader to perform the unpacking, the creole definition of a GATE document defines a parameter called 'markupAware' initialized with a default value of true. This parameter forces GATE to detect a proper reader for the document being read. If no reader is found, the document's content is loaded and presented to the user, just like any other text editor (this for textual documents).

You can also use Tika format auto-detection by setting the mimeType of a document to "application/tika". Then the document will be parsed only by Tika.

The next subsections investigate particularities for each format and describe the file extensions registered with each document format.
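The magic numbers test described above amounts to searching the first bufferSize characters for each known signature. A minimal sketch of that idea follows; it is not the body of runMagicNumbers, the class and method names are hypothetical, and the "text/xml" value is an assumed MIME type (the signatures themselves and the other two types come from the format sections of this chapter).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the GATE implementation.
final class MagicNumberSketch {

  static final int BUFFER_SIZE = 2048; // default quoted in the text

  // Signature -> MIME type, checked in insertion order.
  static final Map<String, String> MAGIC_TO_MIME = new LinkedHashMap<>();
  static {
    MAGIC_TO_MIME.put("<?xml version=\"1.0\"", "text/xml"); // MIME value assumed
    MAGIC_TO_MIME.put("<html", "text/html");
    MAGIC_TO_MIME.put("{\\rtf1", "text/rtf");
  }

  // Read at most BUFFER_SIZE chars and return the first matching type, or null.
  static String detect(Reader reader) throws IOException {
    char[] buffer = new char[BUFFER_SIZE];
    int total = 0;
    while (total < BUFFER_SIZE) {
      int n = reader.read(buffer, total, BUFFER_SIZE - total);
      if (n < 0) {
        break;
      }
      total += n;
    }
    String head = new String(buffer, 0, total);
    for (Map.Entry<String, String> entry : MAGIC_TO_MIME.entrySet()) {
      if (head.contains(entry.getKey())) {
        return entry.getValue();
      }
    }
    return null;
  }

  // Convenience overload for in-memory text.
  static String detect(String text) {
    try {
      return detect(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(detect("<?xml version=\"1.0\"?><doc/>"));
    System.out.println(detect("just some words"));
  }
}
```

A signature that first appears after the buffer limit is not seen, which is one reason the test can fail on documents with long preambles.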
5.5.2 XML
Input

GATE permits the processing of any XML document and offers support for XML namespaces. It benefits from the power of Apache's Xerces parser and also makes use of Sun's JAXP layer. Changing the XML parser in GATE can be achieved by simply replacing the value of a Java system property ('javax.xml.parsers.SAXParserFactory').

GATE will accept any well formed XML document as input. Although it has the possibility to validate XML documents against DTDs it does not do so because the validating procedure is time consuming and in many cases it issues messages that are annoying for the user.

There is an open problem with the general approach of reading XML, HTML and SGML documents in GATE. As we previously said, the text covered by tags/elements is appended to the GATE document content and a GATE annotation refers to this particular span of text. When appending, in cases such as 'end.</P><P>Start' it might happen that the ending word of the previous annotation is concatenated with the beginning phrase of the annotation currently being created, resulting in garbage input for GATE processing resources that operate at the text surface.

Let's take another example in order to better understand the problem:
<title>This is a title</title><p>This is a paragraph</p><a href="#link">Here is an useful link</a>
When the markup is transformed to annotations, it is likely that the text from the document's content will be as follows:

    This is a titleThis is a paragraphHere is an useful link
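The concatenation effect described above is easy to reproduce outside GATE by removing the tags from the example with a regular expression. This is only a demonstration of the problem, not of how GATE parses markup; MarkupConcatenationSketch and stripTags are hypothetical names.

```java
// Hypothetical illustration of the word-joining problem; not GATE's parser.
final class MarkupConcatenationSketch {

  // Remove every tag and keep only the text between tags, with nothing inserted.
  static String stripTags(String markup) {
    return markup.replaceAll("<[^>]+>", "");
  }

  public static void main(String[] args) {
    String markup = "<title>This is a title</title><p>This is a paragraph</p>"
        + "<a href=\"#link\">Here is an useful link</a>";
    // The last word of one element runs into the first word of the next.
    System.out.println(stripTags(markup));
  }
}
```

A tokeniser working on this surface text would see "titleThis" and "paragraphHere" as single tokens, even though annotations built from the tags still point at the right spans.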
The magic numbers test searches inside the document for the XML (<?xml version="1.0") signature. It is also able to detect if the XML document uses the semantics described in the GATE document format DTD (see 5.5.2 below) or uses other semantics.

Namespace handling

By default, GATE will retain the namespace prefix and namespace URIs of XML elements when creating annotations and features within the Original markups annotation set. For example, the element

    <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Document title</dc:title>

However, as the colon character ':' is a reserved meta-character in JAPE, it is not possible to write a JAPE rule that will match the dc:title element or its namespace URI. If you need to match namespace-prefixed elements in the Original markups AS, you can alter the default namespace deserialization behaviour to remove the namespace prefix and add it as a feature (along with the namespace URI), by specifying the following attributes in the <GATECONFIG> element of gate.xml or local configuration file:

addNamespaceFeatures - set to "true" to deserialize namespace prefix and URI information as features.

namespaceURI - the feature name to use that will hold the namespace URI of the element, e.g. "namespace"

namespacePrefix - the feature name to use that will hold the namespace prefix of the element, e.g. "prefix"

i.e.
<GATECONFIG addNamespaceFeatures="true" namespaceURI="namespace" namespacePrefix="prefix" />
For example
<dc:title>Document title</dc:title>
would create in Original markups AS (assuming the xmlns:dc URI has been defined in the document root or parent element)

    title(prefix=dc, namespace=http://purl.org/dc/elements/1.1/)

when using the 'Save preserving document format' XML output option (see 5.5.2 below).
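The split of a prefixed element into a bare annotation type plus prefix and namespace features can be illustrated with the JDK's namespace-aware DOM parser. This sketch only mirrors the outcome described above; it is not how GATE performs the deserialization, and the feature names "prefix" and "namespace" are simply the example values from the configuration shown earlier.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical illustration; not the GATE deserialization code.
final class NamespaceFeatureSketch {

  // Render the root element as type(prefix=..., namespace=...).
  static String describeRoot(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true); // needed for prefix and URI lookups
      Element root = factory.newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
      return root.getLocalName()
          + "(prefix=" + root.getPrefix()
          + ", namespace=" + root.getNamespaceURI() + ")";
    } catch (Exception e) {
      throw new IllegalStateException("could not parse example XML", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(describeRoot(
        "<dc:title xmlns:dc=\"http://purl.org/dc/elements/1.1/\">Document title</dc:title>"));
  }
}
```

With the prefix moved into a feature, the annotation type no longer contains a colon and can be matched by name in a JAPE rule.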
Output

GATE is capable of ensuring persistence for its resources. The types of persistent storage used for Language Resources are: Java serialization; XML serialization. We describe the latter case here.

XML persistence doesn't necessarily preserve all the objects belonging to the annotations, documents or corpora. Their features can be of all kinds of objects, with various layers of nesting. For example, lists containing lists containing maps, etc. Serializing these arbitrary data types in XML is not a simple task; GATE does the best it can, and supports native Java types such as Integers and Booleans, but where complex data types are used, information may be lost (the types will be converted into Strings). GATE provides full serialization of certain types of features such as collections, strings and numbers. It is possible to serialize only those collections containing strings or numbers. The rest of the features are serialized using their string representation and when read back, they will all be strings instead of being the original objects. Consequences of this might be observed when performing evaluations (see Chapter 10).

When GATE outputs an XML document it may do so in one of two ways:

When the original document that was imported into GATE was an XML document, GATE can dump that document back into XML (possibly with additional markup added);

For all document formats, GATE can dump its internal representation of the document into XML.

In the former case, the XML output will be close to the original document. In the latter case, the format is a GATE-specific one which can be read back by the system to recreate all the information that GATE held internally for the document.
In order to understand why there are two types of XML serialization, one needs to understand the structure of a GATE document. GATE allows a graph of annotations that refer to parts of the text. Those annotations are grouped under annotation sets. Because of this structure, sometimes it is impossible to save a document as XML using tags that surround the text referred to by the annotation, because tag crossover situations could appear (XML is essentially a tree-based model of information, whereas GATE uses graphs). Therefore, in order to preserve all annotations in a GATE document, a custom type of XML document was developed.

The problem of crossover tags appears with GATE's second option (the preserve format one), which is implemented at the cost of losing certain annotations. The way it is applied in GATE is that it tries to restore the original markup and where it is possible, to add in the same manner annotations produced by GATE.
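The crossover condition can be stated precisely in terms of offsets: two spans cross when they overlap but neither contains the other, so they cannot both be written as properly nested tags. A minimal sketch of that check, with hypothetical names and no claim to match GATE's internal test:

```java
// Hypothetical illustration of the crossover test; not GATE's implementation.
final class CrossoverSketch {

  // True when [s1, e1) and [s2, e2) overlap without one containing the other.
  static boolean crosses(long s1, long e1, long s2, long e2) {
    boolean overlap = s1 < e2 && s2 < e1;
    boolean firstContainsSecond = s1 <= s2 && e2 <= e1;
    boolean secondContainsFirst = s2 <= s1 && e1 <= e2;
    return overlap && !firstContainsSecond && !secondContainsFirst;
  }

  public static void main(String[] args) {
    System.out.println(crosses(0, 10, 5, 15)); // partial overlap
    System.out.println(crosses(0, 10, 2, 5));  // nested
    System.out.println(crosses(0, 5, 5, 10));  // adjacent
  }
}
```

Nested and adjacent spans map cleanly onto an XML tree; only the partially overlapping case forces a choice about which annotation to drop.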
How to Access and Use the Two Forms of XML Serialization

Save as XML Option

This option is available in GATE Developer in the pop-up menu associated with each language resource (document or corpus). Saving a corpus as XML is done by calling 'Save as XML' on each document of the corpus. This option saves all the annotations of a document together with their features (applying the restrictions previously discussed), using the GateDocument.dtd:
    <!ELEMENT GateDocument (GateDocumentFeatures,
                            TextWithNodes, (AnnotationSet+))>
    <!ELEMENT GateDocumentFeatures (Feature+)>
    <!ELEMENT Feature (Name, Value)>
    <!ELEMENT Name (#PCDATA)>
    <!ELEMENT Value (#PCDATA)>
    <!ELEMENT TextWithNodes (#PCDATA | Node)*>
    <!ELEMENT AnnotationSet (Annotation*)>
    <!ATTLIST AnnotationSet Name CDATA #IMPLIED>
    <!ELEMENT Annotation (Feature*)>
    <!ATTLIST Annotation Type CDATA #REQUIRED
                         StartNode CDATA #REQUIRED
                         EndNode CDATA #REQUIRED>
    <!ELEMENT Node EMPTY>
    <!ATTLIST Node id CDATA #REQUIRED>
The document is saved under a name chosen by the user and it may have any extension. However, the recommended extension would be 'xml'.

Using GATE Embedded, this option is available by calling gate.Document's toXml() method. This method returns a string which is the XML representation of the document on which the method was called.

Note: It is recommended that the string representation be saved on the file system using the UTF-8 encoding, as the first line of the string is: <?xml version="1.0" encoding="UTF-8"?>
Example of such a GATE format document:
    <?xml version="1.0" encoding="UTF-8" ?>
    <GateDocument>
    <!-- The document's features-->
    <GateDocumentFeatures>
      <Feature>
        <Name className="java.lang.String">MimeType</Name>
        <Value className="java.lang.String">text/plain</Value>
      </Feature>
      <Feature>
        <Name className="java.lang.String">gate.SourceURL</Name>
        <Value className="java.lang.String">file:/G:/tmp/example.txt</Value>
      </Feature>
    </GateDocumentFeatures>

    <!-- The document content area with serialized nodes -->
    <TextWithNodes>
    <Node id="0"/>A TEENAGER <Node id="11"/>yesterday<Node id="20"/> accused his parents of cruelty by feeding him a daily diet of chips which sent his weight ballooning to 22st at the age of l2<Node id="146"/>.<Node id="147"/>
    </TextWithNodes>

    <!-- The default annotation set -->
    <AnnotationSet>
      <Annotation Type="Date" StartNode="11" EndNode="20">
        <Feature>
          <Name className="java.lang.String">rule2</Name>
          <Value className="java.lang.String">DateOnlyFinal</Value>
        </Feature>
        <Feature>
          <Name className="java.lang.String">rule1</Name>
          <Value className="java.lang.String">GazDateWords</Value>
        </Feature>
        <Feature>
          <Name className="java.lang.String">kind</Name>
          <Value className="java.lang.String">date</Value>
        </Feature>
      </Annotation>
      <Annotation Type="Sentence" StartNode="0" EndNode="147">
      </Annotation>
      <Annotation Type="Split" StartNode="146" EndNode="147">
        <Feature>
          <Name className="java.lang.String">kind</Name>
          <Value className="java.lang.String">internal</Value>
        </Feature>
      </Annotation>
      <Annotation Type="Lookup" StartNode="11" EndNode="20">
        <Feature>
          <Name className="java.lang.String">majorType</Name>
          <Value className="java.lang.String">date_key</Value>
        </Feature>
      </Annotation>
    </AnnotationSet>

    <!-- Named annotation set -->
    <AnnotationSet Name="Original markups" >
      <Annotation Type="paragraph" StartNode="0" EndNode="147">
      </Annotation>
    </AnnotationSet>
    </GateDocument>
Note: One must know that all features that are not collections containing numbers or strings or that are not numbers or strings are discarded. With this option, GATE does not preserve those features it cannot restore back.

The Preserve Format Option

This option is available in GATE Developer from the popup menu of the annotations table. If no annotation in this table is selected, then the option will restore the document's original markup. If certain annotations are selected, then the option will attempt to restore the original markup and insert all the selected ones. When an annotation violates the crossed over condition, that annotation is discarded and a message is issued.

This option makes it possible to generate an XML document with tags surrounding the annotation's referenced text and features saved as attributes. All features which are collections, strings or numbers are saved, and the others are discarded. However, when read back, only the attributes under the GATE namespace (see below) are reconstructed back differently to the others. That is because GATE does not store in the XML document the information about the feature's class and for collections the class of the items. So, when read back, all features will become strings, except those under the GATE namespace.

One will notice that all generated tags have an attribute called 'gateId' under the namespace 'http://www.gate.ac.uk'. The attribute is used when the document is read back in GATE, in order to restore the annotation's old ID. This feature is needed because it works in close cooperation with another attribute under the same namespace, called 'matches'. This attribute indicates annotations/tags that refer to the same entity1. They are under this namespace because GATE is sensitive to them and treats them differently to all other elements with their attributes which fall under the general reading algorithm described at the beginning of this section.

The 'gateId' under GATE namespace is used to create an annotation which has as ID the value indicated by this attribute. The 'matches' attribute is used to create an ArrayList in which the items will be Integers, representing the ID of annotations that the current one matches.

1 It's not an XML entity but an information extraction named entity.
Example:
Under GATE Embedded, this option is available by calling gate.Document's toXml(Set aSetContainingAnnotations) method. This method returns a string which is the XML representation of the document on which the method was called. If called with null as a parameter, then the method will attempt to restore only the original markup. If the parameter is a set that contains annotations, then each annotation is tested against the crossover restriction, and for those found to violate it, a warning will be issued and they will be discarded.

In the next subsections we will show how this option applies to the other formats supported by GATE.
5.5.3 HTML
Input

HTML documents are parsed by GATE using the NekoHTML parser. The documents are read and created in GATE the same way as the XML documents.

The extensions associated with the HTML reader are: htm, html

The web server content type associated with html documents is: text/html.

The magic numbers test searches inside the document for the HTML (<html) signature. There are certain HTML documents that do not contain the HTML tag, so the magic numbers test might not hold.

There is a certain degree of customization for HTML documents in that GATE introduces new lines into the document's text content in order to obtain a readable form. The annotations will refer to the pieces of text as described in the original document but there will be a few extra new line characters inserted.

After reading H1, H2, H3, H4, H5, H6, TR, CENTER, LI, BR and DIV tags, GATE will introduce a new line (NL) char into the text. After a TITLE tag it will introduce two NLs. With P tags, GATE will introduce one NL at the beginning of the paragraph and one at the end of the paragraph. All newly added NLs are not considered to be part of the text contained by the tag.
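The newline rules above can be written down as a small lookup. This sketch only encodes the rules as listed in the text; it is not taken from GATE's HTML handler, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the listed rules; not GATE's HTML handler.
final class HtmlNewlineSketch {

  // Tags after which a single new line is introduced.
  static final Set<String> ONE_NEWLINE_AFTER = new HashSet<>(Arrays.asList(
      "h1", "h2", "h3", "h4", "h5", "h6", "tr", "center", "li", "br", "div"));

  // Number of new lines added after the given tag.
  static int newlinesAfter(String tag) {
    String t = tag.toLowerCase(Locale.ROOT);
    if (t.equals("title")) {
      return 2;
    }
    if (t.equals("p") || ONE_NEWLINE_AFTER.contains(t)) {
      return 1;
    }
    return 0;
  }

  // Only paragraphs also get a new line at their beginning.
  static int newlinesBefore(String tag) {
    return tag.toLowerCase(Locale.ROOT).equals("p") ? 1 : 0;
  }

  public static void main(String[] args) {
    System.out.println(newlinesAfter("TITLE") + " " + newlinesAfter("li") + " " + newlinesBefore("P"));
  }
}
```

Because the added new lines lie outside the annotated spans, offsets computed against the original tag content need to account for them when comparing with the source HTML.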
yutput
he ve s wv9 option works extly the sme for ll qei9s douments so there is no prtiulr oservtion to e mde for the rwv formtsF hen ttempting to preserve the originl mrkup formttingD qei will generte the doE ument in xhtmlF he html doument will look the sme with ny rowser fter proessed y qei ut it will e in nother syntxF
III
5.5.4 SGML
Input
The SGML support in GATE is fairly light as there is no freely available Java SGML parser. GATE uses a light converter attempting to transform the input SGML file into a well formed XML. Because it does not make use of a DTD, the conversion might not always be good. It is advisable to perform an SGML2XML conversion outside the system (using some other specialized tools) before using the SGML document inside GATE. The extensions associated with the SGML reader are:
sgm
sgml
The web server content type associated with SGML documents is: text/sgml. There is no magic numbers test for SGML.
Output
When attempting to preserve the original markup formatting, GATE will generate the document as XML because the real input of an SGML document inside GATE is an XML one.
then two 'paragraph' type annotations will be created in the 'Original markups' annotation set (referring to the first and second paragraphs) with an empty feature map. The extensions associated with the plain text reader are:
txt
text
The web server content type associated with plain text documents is: text/plain. There is no magic numbers test for plain text.
Output
When attempting to preserve the original markup formatting, GATE will dump XML markup that surrounds the text referred to. The procedure described above applies both for plain text and RTF documents.
5.5.6 RTF
Input
Accessing RTF documents is performed by using Java's RTF editor kit. It only extracts the document's text content from the RTF document. The extension associated with the RTF reader is 'rtf'. The web server content type associated with RTF documents is: text/rtf. The magic numbers test searches for {\\rtf1.
Output
Same as the plain text output.
5.5.7 Email
Input
GATE is able to read email messages packed in one document (UNIX mailbox format). It detects multiple messages inside such documents and for each message it creates annotations for all the fields composing an e-mail, like date, from, to, subject, etc. The message's body is analyzed and a paragraph detection is performed (just like in the plain text case). All annotations created have as type the name of the e-mail's fields and they are placed in the Original markup annotation set.
Example:
From [email protected] Wed Sep 6 10:35:50 2000

Date: Wed, 6 Sep2000 10:35:49 +0100 (BST)
From: forename1 surname2 <[email protected]>
To: forename2 surname2 <[email protected]>
Subject: A subject
Message-ID: <Pine.SOL.3.91.1000906103251.26010A-100000@servername>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

This text belongs to the e-mail body....

This is a paragraph in the body of the e-mail

This is another paragraph.
GATE attempts to detect lines such as 'From [email protected] Wed Sep 6 10:35:50 2000' in the e-mail text. Those lines separate e-mail messages contained in one file. After that, for each field in the e-mail message annotations are created as follows: the annotation type will be the name of the field, the feature map will be empty and the annotation will span from the end of the field until the end of the line containing the e-mail field.
Example:
a1.type = "date"; a1 spans between the two ^ ^.
Date:^ Wed, 6Sep2000 10:35:49 +0100 (BST)^
a2.type = "from"; a2 spans between the two ^ ^.
From:^ forename1 surname2 <[email protected]>^
The extensions associated with the email reader are:
eml
email
mail
The web server content type associated with email documents is: text/email. The magic numbers test searches for keywords like Subject:, etc.
Output
Same as plain text output.
<CAS version="2"> and xmlns:cas=
GATE interprets this format quite flexibly: the columns can be separated by any whitespace sequence, and the number of columns can vary. The strings from the leftmost column become strings in the document content, with spaces interposed, and Token and SpaceToken annotations (with string and length features) are created appropriately in the Original markups set. Each blank line (empty or containing only whitespace) in the original data becomes a newline in the document content. The tags in subsequent columns are transformed into annotations. A chunk tag (beginning with B- and followed by zero or more matching I- tags) produces an annotation whose type is determined by the rest of the tag (NP or VP in the above example, but any string with no whitespace is acceptable), with a kind = chunk feature. Other tags produce annotations with the tag name as the type and a kind = token feature.
Every annotation derived from a tag has a column feature whose int value indicates the source column in the data (numbered from 0 for the string column). An O tag closes all open chunk tags at the end of the previous token. This document format is associated with MIME-type text/x-conll and filename extensions .conll and .iob.
2 http://ifarm.nl/signll/conll/
5.6 XML Input/Output
Support for input from and output to XML is described in Section 5.5.2. In short:
GATE will read any well-formed XML document (it does not attempt to validate XML documents). Markup will by default be converted into native GATE format.
GATE will write back into XML in one of two ways:
1. Preserving the original format and adding selected markup (for example to add the results of some language analysis process to the document).
2. In GATE's own XML serialisation format, which encodes all the data in a GATE Document (as far as this is possible within a tree-structured paradigm - for 100% non-lossy data storage use GATE's RDBMS or binary serialisation facilities - see Section 4.5).
When using GATE Embedded, object representations of XML documents such as DOM or jDOM, or query and transformation languages such as X-Path or XSLT, may be used in parallel with GATE's own Document representation (gate.Document) without conflicts.
GATE was originally developed in the context of Information Extraction (IE) R&D, and IE systems in many languages and shapes and sizes have been created using GATE with the IE components that have been distributed with it (see [Maynard et al. 00] for descriptions of some of these projects).1
1 The principal architects of the IE systems in GATE version 1 were Robert Gaizauskas and Kevin
Humphreys. This work lives on in the LaSIE system. (A derivative of LaSIE was distributed with GATE
GATE is distributed with an IE system called ANNIE, A Nearly-New IE system (developed by Hamish Cunningham, Valentin Tablan, Diana Maynard, Kalina Bontcheva, Marin Dimitrov and others). ANNIE relies on finite state algorithms and the JAPE language (see Chapter 8). ANNIE components form a pipeline which appears in figure 6.1. ANNIE components are included with GATE (though the linguistic resources they rely on are generally more simple than the ones we use in-house). The rest of this chapter describes these components.
Figure 6.1: ANNIE and LaSIE
6.1 Document Reset
The document reset resource enables the document to be reset to its original state, by removing all the annotation sets and their contents, apart from the one containing the document format analysis (Original Markups). An optional parameter, keepOriginalMarkupsAS, allows users to decide whether to keep the Original Markups AS or not while resetting the document. The parameter annotationTypes can be used to specify a list of annotation types to remove from all the sets instead of the whole sets.
version 1 under the name VIE, a Vanilla IE system.)
Alternatively, if the parameter setsToRemove is not empty, the other parameters except annotationTypes are ignored and only the annotation sets specified in this list will be removed. If annotationTypes is also specified, only those annotation types in the specified sets are removed. In order to specify that you want to reset the default annotation set, just click the "Add" button without entering a name - this will add <null> which denotes the default annotation set. This resource is normally added to the beginning of an application, so that a document is reset before an application is rerun on that document.
6.2 Tokeniser
The tokeniser splits the text into very simple tokens such as numbers, punctuation and words of different types. For example, we distinguish between words in uppercase and lowercase, and between certain types of punctuation. The aim is to limit the work of the tokeniser to maximise efficiency, and enable greater flexibility by placing the burden on the grammar rules, which are more adaptable.
Details about the primitive constructs available are given in the tokeniser file (DefaultTokeniser.Rules). The following tokeniser rule is for a word beginning with a single capital letter:
"UPPERCASE_LETTER" "LOWERCASE_LETTER"* > Token;orth=upperInitial;kind=word;
It states that the sequence must begin with an uppercase letter, followed by zero or more lowercase letters. This sequence will then be annotated as type 'Token'. The attribute 'orth' (orthography) has the value 'upperInitial'; the attribute 'kind' has the value 'word'.
Word
A word is defined as any set of contiguous upper or lowercase letters, including a hyphen (but no other forms of punctuation). A word also has the attribute 'orth', for which four values are defined:
upperInitial - initial letter is uppercase, rest are lowercase
allCaps - all uppercase letters
lowerCase - all lowercase letters
mixedCaps - any mixture of upper and lowercase letters not included in the above categories
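The four 'orth' values can be illustrated with a small stand-alone sketch. This is not the tokeniser's implementation (which is driven by the rules file); the class and method names are hypothetical, and the sketch assumes a non-empty word made of letters only.

```java
public class OrthSketch {

    /**
     * Classifies a non-empty, letters-only word into one of the four
     * 'orth' values described above. Hypothetical helper, not GATE API.
     */
    public static String orth(String word) {
        boolean restLower = word.substring(1).chars().allMatch(Character::isLowerCase);
        if (Character.isUpperCase(word.charAt(0)) && restLower) {
            return "upperInitial";   // capital letter followed by zero or more lowercase letters
        }
        if (word.chars().allMatch(Character::isUpperCase)) {
            return "allCaps";
        }
        if (word.chars().allMatch(Character::isLowerCase)) {
            return "lowerCase";
        }
        return "mixedCaps";          // anything not covered by the categories above
    }

    public static void main(String[] args) {
        for (String w : new String[] {"London", "GATE", "tokeniser", "McDonald"}) {
            System.out.println(w + " -> " + orth(w));
        }
    }
}
```

The checks are ordered so that a single capital letter counts as upperInitial, mirroring the rule above in which the lowercase part may be empty.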
Number
A number is defined as any combination of consecutive digits. There are no subdivisions of numbers.
Symbol
Two types of symbol are defined: currency symbol (e.g. '$', '£') and symbol (e.g. '&', '^'). These are represented by any number of consecutive currency or other symbols (respectively).
Punctuation
Three types of punctuation are defined: start_punctuation (e.g. '('), end_punctuation (e.g. ')'), and other punctuation (e.g. ':'). Each punctuation symbol is a separate token.
SpaceToken
White spaces are divided into two types of SpaceToken - space and control - according to whether they are pure space characters or control characters. Any contiguous (and homogeneous) set of space or control characters is defined as a SpaceToken.
The above description applies to the default tokeniser. However, alternative tokenisers can be created if necessary. The choice of tokeniser is then determined at the time of text processing.
6.3 Gazetteer
The role of the gazetteer is to identify entity names in the text based on lists. The ANNIE gazetteer is described here, and also covered in Chapter 13 in Section 13.2. The gazetteer lists used are plain text files, with one entry per line. Each list represents a set of names, such as names of cities, organisations, days of the week, etc. Below is a small section of the list for units of currency:
Ecu
European Currency Units
FFr
Fr
German mark
German marks
New Taiwan dollar
New Taiwan dollars
NT dollar
NT dollars
An index file (lists.def) is used to access these lists; for each list, a major type is specified and, optionally, a minor type. It is also possible to include a language in the same way (fourth column), where lists for different languages are used, though ANNIE is only concerned with monolingual recognition. By default, the Gazetteer creates a Lookup annotation for every gazetteer entry it finds in the text. One can also specify an annotation type (fifth column) specific to an individual list. In the example below, the first column refers to the list name, the second column to the major type, and the third to the minor type. These lists are compiled into finite state machines. Any text tokens that are matched by these machines will be annotated with features specifying the major and minor types. Grammar rules then specify the types to be identified in particular circumstances. Each gazetteer list should reside in the same directory as the index file.
currency_prefix.lst:currency_unit:pre_amount
currency_unit.lst:currency_unit:post_amount
date.lst:date:specific
day.lst:date:day
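To make the column layout of the index file concrete, the following sketch splits one index line into its five possible columns (list file, major type, minor type, language, annotation type). It is only an illustration under simple assumptions: the class name and the array layout are hypothetical, and GATE's own reader is not shown here. The default annotation type "Lookup" follows the description above.

```java
public class ListsDefSketch {

    /**
     * Splits one index line into
     * {listFile, majorType, minorType, language, annotationType}.
     * Missing optional columns stay null, except the annotation type,
     * which falls back to "Lookup". Hypothetical helper, not GATE API.
     */
    public static String[] parse(String line) {
        String[] cols = line.trim().split(":");
        String[] entry = {null, null, null, null, "Lookup"};
        for (int i = 0; i < cols.length && i < entry.length; i++) {
            if (!cols[i].isEmpty()) {
                entry[i] = cols[i];
            }
        }
        return entry;
    }

    public static void main(String[] args) {
        String[] e = parse("day.lst:date:day");
        System.out.println("list=" + e[0] + " major=" + e[1] + " minor=" + e[2]
                + " language=" + e[3] + " annotationType=" + e[4]);
    }
}
```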
So, for example, if a specific day needs to be identified, the minor type 'day' should be specified in the grammar, in order to match only information about specific days; if any kind of date needs to be identified, the major type 'date' should be specified, to enable tokens annotated with any information about dates to be identified. More information about this can be found in the following section.
In addition, the gazetteer allows arbitrary feature values to be associated with particular entries in a single list. ANNIE does not use this capability, but to enable it for your own gazetteers, set the optional gazetteerFeatureSeparator parameter to a single character (or an escape sequence such as \t or \uNNNN) when creating a gazetteer. In this mode, each line in a .lst file can have feature values specified, for example, with the following entry in the index file:
software_company.lst:company:software
and gazetteerFeatureSeparator set to &, the gazetteer will annotate Red Hat as a Lookup with features majorType=company, minorType=software and stockSymbol=RHAT. Note that you do not have to provide the same features for every line in the file; in particular, it is possible to provide extra features for some lines in the list but not others.
Here is a full list of the parameters used by the Default Gazetteer:
Init-time parameters
listsURL - A URL pointing to the index file (usually lists.def) that contains the list of pattern lists.
encoding - The character encoding to be used while reading the pattern lists.
gazetteerFeatureSeparator - The character used to add arbitrary features to gazetteer entries. See above for an example.
caseSensitive - Should the gazetteer be case sensitive during matching.
Run-time parameters
document - The document to be processed.
annotationSetName - The name for the annotation set where the resulting Lookup annotations will be created.
wholeWordsOnly - Should the gazetteer only match whole words? If set to true, a string segment in the input document will only be matched if it is bordered by characters that are not letters, non spacing marks, or combining spacing marks (as identified by the Unicode standard).
longestMatchOnly - Should the gazetteer only match the longest possible string starting from any position. This parameter is only relevant when the list of lookups contains proper prefixes of other entries (e.g. when both 'Dell' and 'Dell Europe' are in the lists). The default behaviour (when this parameter is set to true) is to only match the longest entry, 'Dell Europe' in this example. This is the default GATE gazetteer behaviour since version 2.0. Setting this parameter to false will cause the gazetteer to match all possible prefixes.
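The effect of longestMatchOnly can be sketched without GATE. The example below is a simplified stand-in (a plain prefix scan over a list, not the finite state machines the gazetteer actually compiles); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LongestMatchSketch {

    /**
     * Returns the gazetteer entries that match the text at the given start
     * offset, shortest first. With longestMatchOnly set, only the longest
     * matching entry is returned. Hypothetical helper, not GATE API.
     */
    public static List<String> matchesAt(String text, int start,
                                         Collection<String> entries,
                                         boolean longestMatchOnly) {
        List<String> found = new ArrayList<>();
        for (String entry : entries) {
            if (text.startsWith(entry, start)) {
                found.add(entry);
            }
        }
        found.sort(Comparator.comparingInt(String::length));
        if (longestMatchOnly && !found.isEmpty()) {
            return Collections.singletonList(found.get(found.size() - 1));
        }
        return found;
    }
}
```

With the entries 'Dell' and 'Dell Europe' and a text beginning "Dell Europe opened ...", the call with longestMatchOnly set to true yields only 'Dell Europe', while the call with false yields both entries.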
6.4 Sentence Splitter
The sentence splitter is a cascade of finite-state transducers which segments the text into sentences. This module is required for the tagger. The splitter uses a gazetteer list of abbreviations to help distinguish sentence-marking full stops from other kinds.
Each sentence is annotated with the type 'Sentence'. Each sentence break (such as a full stop) is also given a 'Split' annotation. It has a feature 'kind' with two possible values: 'internal' for any combination of exclamation and question mark or one to four dots and 'external' for a newline.
The sentence splitter is domain and application-independent.
There is an alternative ruleset for the Sentence Splitter which considers newlines and carriage returns differently. In general this version should be used when a new line on the page indicates a new sentence. To use this alternative version, simply load the main-single-nl.jape from the default location instead of main.jape (the default file) when asked to select the location of the grammar file to be used.
6.5 RegEx Sentence Splitter
The RegEx sentence splitter is an alternative to the standard ANNIE Sentence Splitter. Its main aim is to address some performance issues identified in the JAPE-based splitter, mainly to do with improving the execution time and robustness, especially when faced with irregular input.
As its name suggests, the RegEx splitter is based on regular expressions, using the default Java implementation.
The new splitter is configured by three files containing (Java style, see http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html) regular expressions, one regex per line. The three different files encode patterns for:
internal splits - sentence splits that are part of the sentence, such as sentence ending punctuation;
external splits - sentence splits that are NOT part of the sentence, such as 2 consecutive new lines;
non splits - text fragments that might be seen as splits but they should be ignored (such as full stops occurring inside abbreviations).
The new splitter comes with an initial set of patterns that try to emulate the behaviour of the original splitter (apart from the situations where the original one was obviously wrong, like not allowing sentences to start with a number).
Here is a full list of the parameters used by the RegEx Sentence Splitter:
Init-time parameters
encoding - The character encoding to be used while reading the pattern lists.
externalSplitListURL - URL for the file containing the list of external split patterns;
internalSplitListURL - URL for the file containing the list of internal split patterns;
nonSplitListURL - URL for the file containing the list of non split patterns;
Run-time parameters
document - The document to be processed.
outputASName - The name for the annotation set where the resulting Split and Sentence annotations will be created.
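The interplay between internal splits and non splits can be sketched with plain java.util.regex. This is a minimal illustration, not the splitter's actual code: the two patterns are made up for the example (they are not the ones shipped with GATE), external splits are left out, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSplitSketch {

    // Hypothetical patterns for illustration only.
    static final Pattern INTERNAL = Pattern.compile("[.!?]+");
    static final Pattern NON_SPLIT = Pattern.compile("\\b(?:Mr|Mrs|Dr)\\.");

    public static List<String> split(String text) {
        // Collect the spans that must never be treated as splits.
        List<int[]> protectedSpans = new ArrayList<>();
        Matcher non = NON_SPLIT.matcher(text);
        while (non.find()) {
            protectedSpans.add(new int[] {non.start(), non.end()});
        }

        List<String> sentences = new ArrayList<>();
        int sentenceStart = 0;
        Matcher m = INTERNAL.matcher(text);
        while (m.find()) {
            if (overlaps(protectedSpans, m.start(), m.end())) {
                continue; // e.g. the full stop inside an abbreviation
            }
            // An internal split is part of the sentence it terminates.
            sentences.add(text.substring(sentenceStart, m.end()).trim());
            sentenceStart = m.end();
        }
        String rest = text.substring(sentenceStart).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    private static boolean overlaps(List<int[]> spans, int start, int end) {
        for (int[] s : spans) {
            if (start < s[1] && end > s[0]) {
                return true;
            }
        }
        return false;
    }
}
```

For the input "Dr. Smith arrived. He was late!" the full stop after "Dr" is covered by a non split span and ignored, so two sentences are produced.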
6.6 Part of Speech Tagger
The tagger [Hepple 00] is a modified version of the Brill tagger, which produces a part-of-speech tag as an annotation on each word or symbol. The list of tags used is given in Appendix G. The tagger uses a default lexicon and ruleset (the result of training on a large corpus taken from the Wall Street Journal). Both of these can be modified manually if necessary. Two additional lexicons exist - one for texts in all uppercase (lexicon_cap), and one for texts in all lowercase (lexicon_lower). To use these, the default lexicon should be replaced with the appropriate lexicon at load time. The default ruleset should still be used in this case.
The ANNIE Part-of-Speech tagger requires the following parameters.
encoding - encoding to be used for reading rules and lexicons (init-time)
lexiconURL - The URL for the lexicon file (init-time)
rulesURL - The URL for the ruleset file (init-time)
document - The document to be processed (run-time)
inputASName - The name of the annotation set used for input (run-time)
outputASName - The name of the annotation set used for output (run-time). This is an optional parameter. If the user does not provide any value, new annotations are created under the default annotation set.
baseTokenAnnotationType - The name of the annotation type that refers to Tokens in a document (run-time, default = Token)
baseSentenceAnnotationType - The name of the annotation type that refers to Sentences in a document (run-time, default = Sentence).
If
- (inputASName == outputASName) AND (outputAnnotationType == baseTokenAnnotationType)
then
- New features are added on existing annotations of type 'baseTokenAnnotationType'.
otherwise
- The tagger searches for the annotation of type 'outputAnnotationType' under the 'outputASName' annotation set that has the same offsets as that of the annotation with type 'baseTokenAnnotationType'. If it succeeds, it adds a new feature on the found annotation, and otherwise, it creates a new annotation of type 'outputAnnotationType' under the 'outputASName' annotation set.
6.7 Semantic Tagger
ANNIE's semantic tagger is based on the JAPE language - see Chapter 8. It contains rules which act on annotations assigned in earlier phases, in order to produce outputs of annotated entities.
6.8 Orthographic Coreference (OrthoMatcher)
(Note: this component was previously known as a 'NameMatcher'.)
The Orthomatcher module adds identity relations between named entities found by the semantic tagger, in order to perform coreference. It does not find new named entities as such, but it may assign a type to an unclassified proper name, using the type of a matching name.
The matching rules are only invoked if the names being compared are both of the same type, i.e. both already tagged as (say) organisations, or if one of them is classified as 'unknown'. This prevents a previously classified name from being recategorised.
6.8.2 Resources
A lookup table of aliases is used to record non-matching strings which represent the same entity, e.g. 'IBM' and 'Big Blue', 'Coca-Cola' and 'Coke'. There is also a table of spurious matches, i.e. matching strings which do not represent the same entity, e.g. 'BT Wireless' and 'BT Cellnet' (which are two different organizations). The list of tables to be used is a load time parameter of the orthomatcher: a default list is set but can be changed as necessary.
6.8.3 Processing
The wrapper builds an array of the strings, types and IDs of all name annotations, which is then passed to a string comparison function for pairwise comparisons of all entries.
6.9 Pronominal Coreference
The pronominal coreference module performs anaphora resolution using the JAPE grammar formalism. Note that this module is not automatically loaded with the other ANNIE modules, but can be loaded separately as a Processing Resource. The main module consists of three submodules:
quoted text module
pleonastic it module
pronominal resolution module
The first two modules are helper submodules for the pronominal one, because they do not perform anything related to coreference resolution except the location of quoted fragments and pleonastic it occurrences in text. They generate temporary annotations which are used by the pronominal submodule (such temporary annotations are removed later).
The main coreference module can operate successfully only if all ANNIE modules were already executed. The module depends on the following annotations created from the respective ANNIE modules:
For each pronoun (anaphor) the coreference module generates an annotation of type 'Coreference' containing two features:
antecedent offset - this is the offset of the starting node for the annotation (entity) which is proposed as the antecedent, or null if no antecedent can be proposed.
matches - this is a list of annotation IDs that comprise the coreference chain comprising this anaphor/antecedent pair.
- inspect the proper appropriate context for all candidate antecedents for this kind of pronoun;
Preprocessing
The preprocessing task includes the following subtasks:
Identifying the sentences in the document being processed. The sentences are identified with the help of the Sentence annotations generated from the Sentence Splitter. For each sentence a data structure is prepared that contains three lists. The lists contain the annotations for the person/organization/location named entities appearing in the sentence. The named entities in the sentence are identified with the help of the Person, Location and Organization annotations that are already generated from the Named Entity Transducer and the OrthoMatcher.
The gender of each person in the sentence is identified and stored in a global data structure. It is possible that the gender information is missing for some entities - for example if only the person family name is observed then the Named Entity transducer will be unable to deduce the gender. In such cases the list with the matching entities generated by the OrthoMatcher is inspected and if some of the orthographic matches contains gender information it is assigned to the entity being processed.
Pronoun Resolution
This task includes the following subtasks:
Retrieving all the pronouns in the document. Pronouns are represented as annotations of type 'Token' with feature 'category' having value 'PRP$' or 'PRP'. The former classifies possessive adjectives such as my, your, etc. and the latter classifies personal, reflexive etc. pronouns. The two types of pronouns are combined in one list and sorted according to their offset in the text.
For each pronoun in the list the following actions are performed:
If the pronoun is 'it', then the module performs a check to determine if this is a pleonastic occurrence. If it is, then no further attempt for resolution is made.
The proper context is determined. The context size is expressed in the number of sentences it will contain. The context always includes the current sentence (the one containing the pronoun), the preceding sentence and zero or more preceding sentences.
Depending on the type of pronoun, a set of candidate antecedents is proposed. The candidate set includes the named entities that are compatible with this pronoun. For example if the current pronoun is she then only the Person annotations with 'gender' feature equal to 'female' or 'unknown' will be considered as candidates.
From all candidates, one is chosen according to evaluation criteria specific for the pronoun.
For each pair, the orthographic matches (if any) of the antecedent entity are retrieved and then extended with the anaphor of the pair (i.e. the pronoun). The result is the coreference chain for the entity. The coreference chain contains the IDs of the annotations (entities) that co-refer.
A new Coreference annotation is created for each chain. The annotation contains a single feature 'matches' whose value is the coreference chain (the list with IDs). The annotations are exported in a pre-specified annotation set.
The resolution of she, her, her$, he, him, his, herself and himself are similar because an analysis of a corpus showed that these pronouns are related to their antecedents in a similar manner. The characteristics of the resolution process are:
Context inspected is not very big - cases where the antecedent is found more than 3 sentences back from the anaphor are rare.
Recency factor is heavily used - the candidate antecedents that appear closer to the anaphor in the text are scored better.
Anaphora have higher priority than cataphora. If there is an anaphoric candidate and a cataphoric one, then the anaphoric one is preferred, even if the recency factor scores the cataphoric candidate better.
The resolution process performs the following steps:
Inspect the context of the anaphor for candidate antecedents. Every Person annotation is considered to be a candidate. Cases where she/her refers to an inanimate entity (ship for example) are not handled.
For each candidate perform a gender compatibility check - only candidates having 'gender' feature equal to 'unknown' or compatible with the pronoun are considered for further evaluation.
Evaluate each candidate with the best candidate so far. If the two candidates are anaphoric for the pronoun then choose the one that appears closer. The same holds for the case where the two candidates are cataphoric relative to the pronoun. If one is anaphoric and the other is cataphoric then choose the former, even if the latter appears closer to the pronoun.
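The candidate evaluation step can be sketched as a pairwise comparison on text offsets. This is a simplified illustration of the preference rules stated above (anaphoric before cataphoric, then recency), not the module's implementation; it assumes candidates have already passed the gender check and are represented only by their start offsets, and all names are hypothetical.

```java
public class AntecedentChoiceSketch {

    /**
     * Compares the best candidate so far with a new candidate and returns
     * the preferred offset. A candidate is anaphoric if it precedes the
     * pronoun and cataphoric if it follows it.
     */
    public static int better(int pronoun, int best, int candidate) {
        boolean bestAnaphoric = best < pronoun;
        boolean candidateAnaphoric = candidate < pronoun;
        if (bestAnaphoric != candidateAnaphoric) {
            // One anaphoric, one cataphoric: the anaphoric one wins,
            // even if the cataphoric one is closer to the pronoun.
            return bestAnaphoric ? best : candidate;
        }
        // Same direction: the closer candidate wins (recency).
        return Math.abs(pronoun - candidate) < Math.abs(pronoun - best) ? candidate : best;
    }

    /** Returns the chosen antecedent offset, or -1 if there are no candidates. */
    public static int choose(int pronoun, int[] candidates) {
        int best = -1;
        for (int c : candidates) {
            best = (best < 0) ? c : better(pronoun, best, c);
        }
        return best;
    }
}
```

For a pronoun at offset 100 and candidates at offsets 20, 80 and 105, the sketch selects 80: it is the closest anaphoric candidate, and the cataphoric candidate at 105 is rejected despite being nearer.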
Try to locate a candidate in the text preceding the quoted fragment (third pattern). Choose the closest one to the beginning of the quote. If found then set as antecedent and exit.
Try to locate antecedents in the unquoted part of the sentence preceding the sentence where the quote starts (second pattern). Give preference to the one closest to the end of the quote (if any) in the preceding sentence or closest to the sentence beginning.
6.10 A Walk-Through Example
Let us take an example of a 3-stage procedure using the tokeniser, gazetteer and named-entity grammar. Suppose we wish to recognise the phrase '800,000 dollars' as an entity of type 'Number', with the feature 'money'.
First of all, we give an example of a grammar rule (and corresponding macros) for money, which would recognise this type of pattern.
Macro: MILLION_BILLION
({Token.string == "m"}|
 {Token.string == "million"}|
 {Token.string == "b"}|
 {Token.string == "billion"}
)

Macro: AMOUNT_NUMBER
({Token.kind == number}
 (({Token.string == ","}|
   {Token.string == "."})
  {Token.kind == number})*
 (({SpaceToken.kind == space})?
  (MILLION_BILLION)?)
)

Rule: Money1
// e.g. 30 pounds
(
 (AMOUNT_NUMBER)
 ({SpaceToken.kind == space})?
 ({Lookup.majorType == currency_unit})
)
:money -->
 :money.Number = {kind = "money", rule = "Money1"}
Embedding GATE-based language processing in other applications using GATE Embedded (the GATE API) is straightforward:
add $GATE_HOME/bin/gate.jar and the JAR files in $GATE_HOME/lib to the Java CLASSPATH ($GATE_HOME is the GATE root directory)
tell Java that the GATE Unicode Kit is an extension: -Djava.ext.dirs=$GATE_HOME/lib/ext
N.B. This is only necessary for GUI applications that need to support Unicode text input; other applications such as command line or web applications don't generally need GUK.
initialise GATE with gate.Gate.init();
program to the framework API.
For example, this code will create the ANNIE extraction system:
import java.io.File;
import gate.Gate;
import gate.creole.ANNIEConstants;
import gate.creole.SerialAnalyserController;
import gate.util.persistence.PersistenceManager;

// initialise the GATE library
Gate.init();

// load ANNIE as an application from a gapp file
SerialAnalyserController controller = (SerialAnalyserController)
  PersistenceManager.loadObjectFromFile(new File(new File(
    Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR),
    ANNIEConstants.DEFAULT_FILE));
GATE Embedded
If you want to use resources from any plugins, you need to load the plugins before calling createResource:
import java.io.File;
import gate.Factory;
import gate.Gate;
import gate.ProcessingResource;

Gate.init();

// need Tools plugin for the Morphological analyser
Gate.getCreoleRegister().registerDirectories(
  new File(Gate.getPluginsHome(), "Tools").toURI().toURL()
);

...

ProcessingResource morpher = (ProcessingResource)
  Factory.createResource("gate.creole.morph.Morph");
Instead of creating your processing resources individually using the Factory, you can create your application in GATE Developer, save it using the 'save application state' option (see Section 3.9.3), and then load the saved state from your code. This will automatically reload any plugins that were loaded when the state was saved; you do not need to load them manually.
import java.io.File;
import gate.CorpusController;
import gate.Gate;
import gate.util.persistence.PersistenceManager;

Gate.init();

// loadObjectFromUrl is also available
CorpusController controller = (CorpusController)
  PersistenceManager.loadObjectFromFile(new File("savedState.xgapp"));
There are many examples of using GATE Embedded available at: http://gate.ac.uk/wiki/code-repository/.
See Section 2.3 for details of the system properties GATE uses to find its configuration files.
7.2 Resource Management
Language Resources: (LRs) entities that hold linguistic data.
Processing Resources: (PRs) entities that process data.
Visual Resources: (VRs) components used for building graphical interfaces.
These resources are collectively named CREOLE1 resources.
1 CREOLE stands for Collection of REusable Objects for Language Engineering
All CREOLE resources have some associated meta-data in the form of an entry in a special XML file named creole.xml. The most important role of that meta-data is to specify the set of parameters that a resource understands, which of them are required and which not, if they have default values and what those are. The valid parameters for a resource are described in the resource's section of its creole.xml file or in Java annotations on the resource class - see Section 4.7.
All resource types have creation-time parameters that are used during the initialisation phase. Processing Resources also have run-time parameters that get used during execution (see Section 7.5 for more details).
Controllers are used to define GATE applications and have the role of controlling the execution flow (see Section 7.6 for more details).
This section describes how to create and delete CREOLE resources as objects in a running Java virtual machine. This process involves using GATE's Factory class2, and, in the case of LRs, may also involve using a DataStore.
CREOLE resources are Java Beans; creation of a resource object involves using a default constructor, then setting parameters on the bean, then calling an init() method. The Factory takes care of all this, makes sure that the GATE Developer GUI is told about what is happening (when GUI components exist at runtime), and also takes care of restoring LRs from DataStores. A programmer using GATE Embedded should never call the
Creating a resource involves providing the following information:
fully qualified class name for the resource. This is the only required value. For all the rest, defaults will be used if actual values are not provided.
values for the creation time parameters.
initial values for resource features. For an explanation on features see Section 7.4.2.
a name for the new resource.
Parameters and features need to be provided in the form of a GATE Feature Map which is essentially a Java Map (java.util.Map) implementation; see Section 7.4.2 for more details on Feature Maps.
Creating a resource via the Factory involves passing values for any create-time parameters that require setting to the Factory's createResource method. If no parameters are passed, the defaults are used. So, for example, the following code creates a default ANNIE part-of-speech tagger:
import java.io.File;
import gate.*;
import gate.creole.ANNIEConstants;

Gate.getCreoleRegister().registerDirectories(new File(
  Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR).toURI().toURL());
FeatureMap params = Factory.newFeatureMap(); // empty map: default params
ProcessingResource tagger = (ProcessingResource)
  Factory.createResource("gate.creole.POSTagger", params);

2 Fully qualified name: gate.Factory
Note that if the resource created here had any parameters that were both mandatory and had no default value, the createResource call would throw an exception. In this case, all the information needed to create a tagger is available in default values given in the tagger's XML definition (in plugins/ANNIE/creole.xml):
<RESOURCE>
  <NAME>ANNIE POS Tagger</NAME>
  <COMMENT>Mark Hepple's Brill-style POS tagger</COMMENT>
  <CLASS>gate.creole.POSTagger</CLASS>
  <PARAMETER NAME="document"
    COMMENT="The document to be processed"
    RUNTIME="true">gate.Document</PARAMETER>
  ....
  <PARAMETER NAME="rulesURL"
    DEFAULT="resources/heptag/ruleset"
    COMMENT="The URL for the ruleset file"
    OPTIONAL="true">java.net.URL</PARAMETER>
</RESOURCE>
Here the two parameters shown are either 'runtime' parameters, which are set before a PR is executed, or have a default value (in this case the default rules file is distributed with GATE itself).
When creating a Document, however, the URL of the source for the document must be provided3. For example:

import java.net.URL;
import gate.Document;
import gate.Factory;
import gate.FeatureMap;

URL u = new URL("http://gate.ac.uk/hamish/");
FeatureMap params = Factory.newFeatureMap();
params.put("sourceUrl", u);
Document doc = (Document)
  Factory.createResource("gate.corpora.DocumentImpl", params);
Note that the document created here is transient: when you quit the JVM the document will no longer exist. If you want the document to be persistent, you need to store it in a DataStore (see Section 7.4.5).
Apart from createResource() methods with different signatures, Factory also provides some shortcuts for common operations, listed in table 7.1.

Method - Purpose
newFeatureMap() - Creates a new Feature Map (as used in the example above).
newDocument(String content) - Creates a new GATE Document starting from a String value that will be used to generate the document content.
newDocument(URL sourceUrl) - Creates a new GATE Document using the text pointed to by a URL to generate the document content.
newDocument(URL sourceUrl, String encoding) - Same as above but allows the specification of an encoding to be used while downloading the document content.
newCorpus(String name) - Creates a new GATE Corpus with a specified name.

GATE maintains various data structures that allow the retrieval of loaded resources. When a resource is no longer required, it needs to be removed from those structures in order to remove all references to it, thus making it a candidate for garbage collection. This is achieved using the deleteResource(Resource res) method on Factory.
Simply removing all references to a resource from the user code will NOT be enough to make the resource collectable. Not calling Factory.deleteResource() will lead to memory leaks!

3 Alternatively a string giving the document source may be provided.
7.3
As shown in the examples above, in order to use a CREOLE resource the relevant CREOLE plugin must be loaded. Processing Resources, Visual Resources and Language Resources other than Document, Corpus and DataStore all require that the appropriate plugin is first loaded. When using Document, Corpus or DataStore, you do not need to first load a plugin. The API calls listed in table 7.2 are relevant to working with CREOLE plugins.

If you are writing a GATE Embedded application and have a single resource class that will only be used from your embedded code (and so does not need to be distributed as a complete plugin), and all the configuration for that resource is provided as Java annotations on the class, then it is possible to register the class with the CreoleRegister at runtime without needing to package it in a JAR and provide a creole.xml file. You can pass the Class object representing your resource class to the Gate.getCreoleRegister().registerComponent() method and then create instances of the resource in the usual way using Factory.createResource. Note that resources cannot be registered this way in the developer GUI, and cannot be included in saved application states (see section 7.9 below).
Class gate.Gate

Method / Purpose

public static void addKnownPlugin(URL pluginURL)
    Adds the plugin to the list of known plugins.
public static void removeKnownPlugin(URL pluginURL)
    Tells the system to 'forget' about one previously known directory. If the specified directory was loaded, it will be unloaded as well - i.e. all the metadata relating to resources defined by this directory will be removed from memory.
public static void addAutoloadPlugin(URL pluginUrl)
    Adds a new directory to the list of plugins that are loaded automatically at start-up.
public static void removeAutoloadPlugin(URL pluginURL)
    Tells the system to remove a plugin URL from the list of plugins that are loaded automatically at system start-up. This will be reflected in the user's configuration data file.

Class gate.CreoleRegister

public void registerDirectories(URL directoryUrl)
    Loads a new CREOLE directory. The new plugin is added to the list of known plugins if not already there.
public void registerComponent(Class<? extends Resource> cls)
    Registers a single @CreoleResource annotated class without the need for a creole.xml file.
public void removeDirectory(URL directory)
    Unloads a loaded CREOLE plugin.
7.4
Language Resources
Objects that have features in GATE implement the gate.util.FeatureBearer interface which has only the two accessor methods for the object features: FeatureMap getFeatures() and void setFeatures(FeatureMap features).
Content Manipulation

Method / Purpose

DocumentContent getContent()
    Gets the Document content.
void edit(Long start, Long end, DocumentContent replacement)
    Modifies the Document content.
void setContent(DocumentContent newContent)
    Replaces the entire content.

Annotations Manipulation

public AnnotationSet getAnnotations()
    Returns the default annotation set.
public AnnotationSet getAnnotations(String name)
    Returns a named annotation set.
public Map getNamedAnnotationSets()
    Returns all the named annotation sets.
void removeAnnotationSet(String name)
    Removes a named annotation set.

Input Output

String toXml()
    Serialises the Document in XML format.
String toXml(Set aSourceAnnotationSet)
    Generates XML from a set of annotations only, trying to preserve the original format of the file used to create the document.
Getting a particular feature from an object

Object obj = ...; // the object to inspect
String featureName = "length";
if (obj instanceof FeatureBearer) {
  // the feature map may be null if no features were ever set
  FeatureMap features = ((FeatureBearer) obj).getFeatures();
  Object value = (features == null) ? null : features.get(featureName);
}
Figure 7.1: The Annotation Graph model.

An annotation set holds a number of annotations and maintains a series of indices in order to provide fast access to the contained annotations. The GATE Annotation Sets are defined by the gate.AnnotationSet interface and there is a default implementation provided (gate.annotation.AnnotationSetImpl).
Method / Purpose

Integer add(Long start, Long end, String type, FeatureMap features)
    Creates a new annotation between two offsets, adds it to this set and returns its id.
Integer add(Node start, Node end, String type, FeatureMap features)
    Creates a new annotation between two nodes, adds it to this set and returns its id.
boolean remove(Object o)
    Removes an annotation from this set.

Nodes

Node firstNode()
    Gets the node with the smallest offset.
Node lastNode()
    Gets the node with the largest offset.
Node nextNode(Node node)
    Gets the first node that is relevant for this annotation set and which has an offset larger than that of the node provided.

Set implementation

    The standard java.util.Set methods (for example iterator() and size()).
Tables 7.4 and 7.5 list the most used Annotation Set functions.

Iterating from left to right over all annotations of a given type

// persSet is assumed to hold the annotations of the required type
// Sort the annotations by offset
List<Annotation> persList = new ArrayList<Annotation>(persSet);
Collections.sort(persList, new gate.util.OffsetComparator());

// Iterate
Iterator<Annotation> persIter = persList.iterator();
while (persIter.hasNext()) {
  ...
}
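The copy-then-sort pattern above can be tried without GATE on the classpath. The following is a minimal sketch in plain Java; Span and OffsetOrder are hypothetical stand-ins invented for this illustration (they are not GATE classes), and breaking ties by end offset is a choice made here for determinism rather than a statement about how gate.util.OffsetComparator behaves.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an annotation: only what is needed for ordering.
final class Span {
    final String type;
    final long start;
    final long end;

    Span(String type, long start, long end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

final class OffsetOrder {
    // Copy the spans of one type into a list, then sort the copy by start
    // offset (ties broken by end offset). The input list is left untouched.
    static List<Span> leftToRight(List<Span> all, String type) {
        List<Span> selected = new ArrayList<>();
        for (Span s : all) {
            if (s.type.equals(type)) {
                selected.add(s);
            }
        }
        selected.sort(Comparator.comparingLong((Span s) -> s.start)
                .thenComparingLong(s -> s.end));
        return selected;
    }
}
```

Sorting a copy matters because a set gives no ordering guarantee of its own; the list is what provides the left-to-right traversal.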
7.4.4 Annotations
An annotation is a form of meta-data attached to a particular section of document content. The connection between the annotation and the content it refers to is made by means of two pointers that represent the start and end locations of the covered content. An annotation must also have a type (or a name) which is used to create classes of similar annotations, usually linked together by their semantics.

Searching

Method / Purpose

AnnotationSet get(Long offset)
    Select annotations by offset. This returns the set of annotations whose start node is the least such that it is less than or equal to offset. If a positional index doesn't exist it is created. If there are no nodes at or beyond the offset parameter then it will return null.
AnnotationSet get(Long startOffset, Long endOffset)
    Select annotations by offset. This returns the set of annotations that overlap totally or partially with the interval defined by the two provided offsets. The result will include all the annotations that either:
    - start before the start offset and end strictly after it
    - start at a position between the start and the end offsets
AnnotationSet get(String type)
    Returns all annotations of the specified type.
AnnotationSet get(Set types)
    Returns all annotations of the specified types.
AnnotationSet get(String type, FeatureMap constraints)
    Selects annotations by type and features.
Set getAllTypes()
    Gets a set of java.lang.String objects representing all the annotation types present in this annotation set.
AnnotationSet getContained(Long startOffset, Long endOffset)
    Select annotations contained within an interval.
AnnotationSet getCovering(String neededType, Long startOffset, Long endOffset)
    Select annotations of the given type that completely span the range.

An Annotation is defined by:
- start node: a location in the document content defined by an offset.
- end node: a location in the document content defined by an offset.
- type: a String value.
- features: (see Section 7.4.2).
- ID: an Integer value. All annotation IDs are unique inside an annotation set.
In GATE Embedded, annotations are defined by the gate.Annotation interface and implemented by the gate.annotation.AnnotationImpl class. Annotations exist only as members of annotation sets (see Section 7.4.3) and they should not be directly created by means of a constructor. Their creation should always be delegated to the containing annotation set.
- gate.corpora.CorpusImpl: used for transient corpora.
- gate.corpora.SerialCorpusImpl: used for persistent corpora that are stored in a serial datastore (i.e. as a directory in a file system).

Apart from the implementation of the standard List methods, Corpus also implements the methods in table 7.6.
Corpus corpus = Factory.newCorpus("My XML Files");
File directory = ...;
// accept only files with the "xml" extension
ExtensionFileFilter filter = new ExtensionFileFilter("XML files", "xml");
URL url = directory.toURI().toURL();
// null encoding: use the default; false: do not recurse into sub-directories
corpus.populate(url, filter, null, false);
Using a DataStore

Assuming that you have a DataStore already open called myDataStore, this code will ask the datastore to take over persistence of your document, and to synchronise the memory representation of the document with the disk storage:
Method / Purpose

String getDocumentName(int index)
    Gets the name of a document in this corpus.
List getDocumentNames()
    Gets the names of all the documents in this corpus.
void populate(URL directory, FileFilter filter, String encoding, boolean recurseDirectories)
    Fills this corpus with documents created on the fly from selected files in a directory. Uses a FileFilter to select which files will be used and which will be ignored. A simple file filter based on extensions is provided in the GATE distribution (gate.util.ExtensionFileFilter).
void populate(URL singleConcatenatedFile, String documentRootElement, String encoding, int numberOfDocumentsToExtract, String documentNamePrefix, DocType documentType)
    Fills the provided corpus with documents extracted from the provided single concatenated file. Uses the content between the start and end of the element as specified by documentRootElement for each document. The parameter documentType specifies if the resulting files are html, xml or of any other type. The user can also restrict the number of documents to extract by providing the relevant value for the numberOfDocumentsToExtract parameter.
When you want to restore a document (or other LR) from a datastore, you make the same createResource call to the Factory as for the creation of a transient resource, but this time you tell it the datastore the resource came from, and the ID of the resource in that datastore:
URL u = ....; // URL of a serial datastore directory
SerialDataStore sds = new SerialDataStore(u.toString());
sds.open();

// getLrIds returns a list of LR Ids, so we get the first one
Object lrId = sds.getLrIds("gate.corpora.DocumentImpl").get(0);

// we need to tell the factory about the LR's ID in the data
// store, and about which datastore it is in - we do this
// via a feature map:
FeatureMap features = Factory.newFeatureMap();
features.put(DataStore.LR_ID_FEATURE_NAME, lrId);
features.put(DataStore.DATASTORE_FEATURE_NAME, sds);

// read the document back
Document doc = (Document)
    Factory.createResource("gate.corpora.DocumentImpl", features);
7.5
Processing Resources
Processing Resources (PRs) represent entities that are primarily algorithmic, such as parsers, generators or ngram modellers. They are created using the GATE Factory in a manner similar to the Language Resources. Besides the creation-time parameters they also have a set of run-time parameters that are set by the system just before executing them. Analysers are a particular type of processing resource in the sense that they always have a document and a corpus among their run-time parameters. The most used methods for Processing Resources are presented in table 7.7.

Method / Purpose

void setParameterValue(String parameterName, Object parameterValue)
    Sets the value for a specified parameter. Method inherited from gate.Resource.
void setParameterValues(FeatureMap parameters)
    Sets the values for more parameters in one step. Method inherited from gate.Resource.
Object getParameterValue(String parameterName)
    Gets the value of a named parameter of this resource. Method inherited from gate.Resource.
Resource init()
    Initialise this resource, and return it. Method inherited from gate.Resource.
void reInit()
    Reinitialises the processing resource. After calling this method the resource should be in the state it is after calling init. If the resource depends on external resources (such as rules files) then the resource will re-read those resources. If the data used to create the resource has changed since the resource was created then the resource will change too after calling reInit().
void execute()
    Starts the execution of this Processing Resource.
void interrupt()
    Notifies this PR that it should stop its execution as soon as possible.
boolean isInterrupted()
    Checks whether this PR has been interrupted since the last time its Executable.execute() method was called.

7.6
Controllers
Controllers are used to create GATE applications. A Controller handles a set of Processing Resources and can execute them following a particular strategy. GATE provides a series of serial controllers (i.e. controllers that run their PRs in sequence):

- gate.creole.SerialController: a serial controller that takes any kind of PRs.
- gate.creole.SerialAnalyserController: a serial controller that only accepts Language Analysers as member PRs.
- gate.creole.ConditionalSerialController: a serial controller that accepts all types of PRs and that allows the inclusion or exclusion of member PRs from the execution chain according to certain run-time conditions (currently features on the document being processed are used).
- gate.creole.ConditionalSerialAnalyserController: a serial controller that only accepts Language Analysers and that allows the conditional run of member PRs.
- gate.creole.RealtimeCorpusController: a SerialAnalyserController that allows you to specify a timeout parameter. If processing for a document takes longer than this timeout then it will be forcibly terminated and the controller will move on to the next document. Also, if an exception occurs while processing a document this will simply cause the controller to move on to the next document rather than failing the entire corpus processing.
Additionally there is a scriptable controller provided by the Groovy plugin. See section 7.17.3 for details.
// Load the ANNIE plugin
Gate.getCreoleRegister().registerDirectories(
    new File(Gate.getPluginsHome(), "ANNIE").toURI().toURL());

// Create a serial analyser controller to run ANNIE with
SerialAnalyserController annieController = (SerialAnalyserController)
    Factory.createResource("gate.creole.SerialAnalyserController",
        Factory.newFeatureMap(), Factory.newFeatureMap(), "ANNIE");

// Load each PR named in ANNIEConstants, using default parameters
for (int i = 0; i < ANNIEConstants.PR_NAMES.length; i++) {
  FeatureMap params = Factory.newFeatureMap();
  ProcessingResource pr = (ProcessingResource)
      Factory.createResource(ANNIEConstants.PR_NAMES[i], params);
  annieController.add(pr);
}

// Run ANNIE (a corpus must have been set on the controller first)
annieController.execute();
7.7
Most text processing tasks in GATE model metadata associated with text snippets as annotations. In some cases, however, it is useful to have another layer of metadata, associated with the annotations themselves. One such case is the modelling of relations between annotations. One typical example of relations between annotations is that of co-reference. Two annotations of type Person may be referring to the same actual person; in this case the two annotations are said to be co-referring.

Starting with version 7.1, GATE Embedded supports the representation of relations between annotations. Similar to the annotations, the relations are associated with a document, and are grouped in relation sets. Relation sets are obtained using their name. By convention, the default relations set corresponding to an annotation set has the same name as the annotation set. Consequently, the relation set for the default annotation set uses the value null as its name. The actual relations data is stored as a specially named document feature.

The classes supporting relations can be found in the gate.relations package. A relation, as described by the gate.relations.Relation interface, is defined by the following values:

- type: a String value, for example 'coref'.
- members: an int[] array, containing the annotation IDs for the annotations referred to by the relation. Note that relations are not guaranteed to be symmetric, so the ordering in the members array is relevant.
Relation sets are modelled by the gate.relations.RelationSet class. The principal API calls published by this class include:

public static RelationSet getRelations(Document document, String name)
    Static factory method. Gets the relation set for the given document and given name. An arbitrary number of relation sets can be associated with a document, but they all must have different names. By convention, the default relation set associated with an annotation set bears the same name as the annotation set. This can be obtained by calling the method described below.
public static RelationSet getRelations(AnnotationSet annSet)
    Static factory method. Gets the default relation set associated with a given annotation set.
public Relation addRelation(String type, int... members)
    Creates a new relation with the specified type and member annotations. Returns the newly created relation object.
public void addRelation(Relation rel)
    Adds to this relation set an externally-created relation. This method is provided to support the use of custom implementations of the gate.relations.Relation interface.
public boolean deleteRelation(Relation relation)
    Deletes the specified relation from this relation set.
public List<Relation> getRelations(String type)
    Gets all relations with the specified type contained in this relation set.
public List<Relation> getRelations(int... members)
    Gets relations by members. Gets all relations which have the specified members on the specified positions. The required members are represented as an int[], where each required annotation ID is placed on its required position. For unconstrained positions, the constant value gate.relations.RelationSet.ANY should be used.
public List<Relation> getRelations(String type, int... members)
    Gets all relations with the specified type and members.
public int getMaximumArity()
    Gets the maximum arity (number of members) for all relations in this relation set.
Included next is a simple code snippet that illustrates the RelationSet API. The function of the example code is to:

- find all the Sentence annotations inside a document;
- for each sentence, find all the contained Token annotations;
- for each sentence and contained token, add a new relation named contained between the token and the sentence.

// get the document
Document doc = Factory.newDocument(
    new File("documents/file.xml").toURI().toURL());
// get the annotation set
AnnotationSet annSet = doc.getAnnotations();
// get the relations set
RelationSet relSet = RelationSet.getRelations(annSet);
// get all sentences
AnnotationSet sentences = annSet.get(
    ANNIEConstants.SENTENCE_ANNOTATION_TYPE);
for (Annotation sentence : sentences) {
  // get all the tokens
  AnnotationSet tokens = annSet.get(
      ANNIEConstants.TOKEN_ANNOTATION_TYPE,
      sentence.getStartNode().getOffset(),
      sentence.getEndNode().getOffset());
  for (Annotation token : tokens) {
    // for each sentence and token, add the contained relation
    relSet.addRelation("contained",
        new int[] {token.getId(), sentence.getId()});
  }
}
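The positional matching described for getRelations(int... members) can be illustrated without GATE. This is only a sketch of the idea in plain Java: RelationMatch and its ANY constant are hypothetical names invented here (the real constant is gate.relations.RelationSet.ANY, whose value is not assumed), relations are reduced to bare int[] member arrays, and the treatment of a pattern shorter than the member array is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class RelationMatch {
    // Hypothetical wildcard value, used by this sketch only.
    static final int ANY = -1;

    // True when 'members' agrees with 'pattern' at every constrained position.
    // Assumption: positions beyond the end of the pattern are unconstrained.
    static boolean matches(int[] members, int... pattern) {
        if (pattern.length > members.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (pattern[i] != ANY && pattern[i] != members[i]) {
                return false;
            }
        }
        return true;
    }

    // Keep only the member arrays that match the positional pattern.
    static List<int[]> select(List<int[]> relations, int... pattern) {
        List<int[]> hits = new ArrayList<>();
        for (int[] members : relations) {
            if (matches(members, pattern)) {
                hits.add(members);
            }
        }
        return hits;
    }
}
```

With the "contained" relations built above, where the token ID is in position 0 and the sentence ID in position 1, a pattern of (ANY, sentenceId) would pick out every token of one sentence, which is why the ordering of the members array matters.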
7.8
Duplicating a Resource
Sometimes, particularly in a multi-threaded application, it is useful to be able to create an independent copy of an existing PR, controller or LR. The obvious way to do this is to call createResource again, passing the same class name, parameters, features and name, and for many resources this will do the right thing. However there are some resources for which this may be insufficient (e.g. controllers, which also need to duplicate their PRs), unsafe (if a PR uses temporary files, for instance), or simply inefficient. For example for a large gazetteer this would involve loading a second copy of the lists into memory and compiling them into a second identical state machine representation, but a much more efficient way to achieve the same behaviour would be to use a SharedDefaultGazetteer (see section 13.10), which can re-use the existing state machine.

The GATE Factory provides a duplicate method which takes an existing resource instance and creates and returns an independent copy of the resource. By default it uses the algorithm described above, extracting the parameter values from the template resource and calling createResource to create a duplicate (the actual algorithm is slightly more complicated than this, see the following section). However, if a particular resource type knows of a better way to duplicate itself it can implement the CustomDuplication interface, and provide its own duplicate method which the factory will use instead of performing the default duplication algorithm. A caller who needs to duplicate an existing resource can simply call Factory.duplicate to obtain a copy, which will be constructed in the appropriate way depending on the resource type.

Note that the duplicate object returned by Factory.duplicate will not necessarily be of the same class as the original object. However the contract of Factory.duplicate specifies that where the original object implements any of a list of core GATE interfaces, the duplicate can be assumed to implement the same ones - if you duplicate a DefaultGazetteer the result may not be an instance of DefaultGazetteer but it is guaranteed to implement the Gazetteer interface.

Full details of how to implement a custom duplicate method in your own resource type can be found in the JavaDoc documentation for the CustomDuplication interface and the Factory.duplicate method.
6. If the original resource is a PR, extract its runtime parameter values (except those that are marked as sharable, which have already been dealt with above), and recursively duplicate any resource values in the map.
7. Set the resulting runtime parameter values on the duplicate resource.

The duplication process keeps track of any recursively-duplicated resources, such that if the same original resource is used in several places (e.g. when duplicating a controller with several JAPE transducer PRs that all refer to the same ontology LR in their runtime parameters) then the same duplicate (ontology) will be used in the same places in the duplicated resource (i.e. all the duplicate transducers will refer to the same ontology LR, which will be a duplicate of the original one).
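The bookkeeping in the last paragraph, where a shared original maps to exactly one shared duplicate, is a standard identity-keyed memo. The sketch below shows that idea in plain Java under stated assumptions: Res and Duplicator are hypothetical types invented for the illustration, and nothing here is taken from the actual Factory.duplicate implementation.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resource: a name plus references to other resources.
final class Res {
    final String name;
    final List<Res> refs = new ArrayList<>();

    Res(String name) {
        this.name = name;
    }
}

final class Duplicator {
    // Maps each original (compared by identity) to the copy already made for it.
    private final Map<Res, Res> done = new IdentityHashMap<>();

    Res duplicate(Res original) {
        Res copy = done.get(original);
        if (copy != null) {
            return copy; // seen before: reuse the same duplicate
        }
        copy = new Res(original.name);
        done.put(original, copy); // register before recursing into references
        for (Res r : original.refs) {
            copy.refs.add(duplicate(r));
        }
        return copy;
    }
}
```

One Duplicator instance corresponds to one top-level duplication: within it, two transducers that referred to the same ontology end up referring to the same ontology copy, while a fresh Duplicator would produce a separate set of copies.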
7.9
Persistent Applications
GATE Embedded allows the persistent storage of applications in a format based on XML serialisation. This is particularly useful for application management and distribution. A developer can save the state of an application when he/she stops working on its design and continue developing it in the next session. When the application reaches maturity it can be deployed to the client site using the same method.

When an application (i.e. a Controller) is saved, GATE will actually only save the values for the parameters used to create the Processing Resources that are contained in the application. When the application is reloaded, all the PRs will be re-created using the saved parameters.

Many PRs use external resources (files) to define their behaviour and, in most cases, these files are identified using URLs. During the saving process, all the URLs are converted to relative URLs based on the location of the application file. This way, if the resources are packaged together with the application file, the entire application can be reliably moved to a different location.

API access to application saving and loading is provided by means of two static methods on the gate.util.persistence.PersistenceManager class, listed in table 7.8.

Saving and loading a GATE application
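// Where to save the application?
File file = ...;
// What to save?
Controller theApplication = ...;

// save
gate.util.persistence.PersistenceManager.saveObjectToFile(theApplication, file);
// the in-memory application is no longer needed
Factory.deleteResource(theApplication);

// load
theApplication = (Controller)
    gate.util.persistence.PersistenceManager.loadObjectFromFile(file);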
Method / Purpose

saveObjectToFile
    Saves the data needed to re-create the provided GATE object to the specified file. The Object provided can be any type of Language or Processing Resource or a Controller. The procedures may work for other types of objects as well (e.g. it supports most Collection types).
loadObjectFromFile
    Parses the file specified (which needs to be a file created by the above method) and creates the necessary object(s) as specified by the data in the file. Returns the root of the object tree.
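The effect of storing URLs relative to the application file can be seen with java.net.URI alone. The following is a sketch of the general idea only: RelativePaths and the example paths are hypothetical, and it does not reproduce how PersistenceManager actually encodes relative locations in the saved file.

```java
import java.net.URI;

final class RelativePaths {
    // Express 'resource' relative to the directory that contains 'appFile'.
    // If the resource is not under that directory, it is returned unchanged.
    static URI toRelative(URI appFile, URI resource) {
        URI baseDir = appFile.resolve("."); // strip the file name
        return baseDir.relativize(resource);
    }

    // Resolve a stored relative location against the (possibly new)
    // location of the application file.
    static URI toAbsolute(URI appFile, URI relative) {
        return appFile.resolve(relative);
    }
}
```

Saving under one directory and loading under another then yields a resource location that has moved along with the application file, which is the property that makes a packaged application relocatable.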
7.10
Ontologies
Starting from GATE version 3.1, support for ontologies has been added. Ontologies are nominally Language Resources but are quite different from documents and corpora and are detailed in chapter 14.

Classes related to ontologies are to be found in the gate.creole.ontology package and its sub-packages. The top level package defines an abstract API for working with ontologies while the sub-packages contain concrete implementations. A client program should only use the classes and methods defined in the API and never any of the classes or methods from the implementation packages.

The entry point to the ontology API is the gate.creole.ontology.Ontology interface which is the base interface for all concrete implementations. It provides methods for accessing the class hierarchy, listing the instances and the properties.

Ontology implementations are available through plugins. Before an ontology language resource can be created using the gate.Factory and before any of the classes and methods in the API can be used, one of the implementing ontology plugins must be loaded. For details see chapter 14.
7.11
An annotation schema (see Section 3.4.6) can be brought inside GATE through the creole.xml file. By using the AUTOINSTANCE element, one can create instances of resources defined in creole.xml. The gate.creole.AnnotationSchema (which is the Java representation of an annotation schema file) initializes with some predefined annotation definitions (annotation schemas) as specified by the GATE team.

Example from GATE's internal creole.xml (in src/gate/resources/creole):
<!-- Annotation schema -->
<RESOURCE>
  <NAME>Annotation schema</NAME>
  <CLASS>gate.creole.AnnotationSchema</CLASS>
  <COMMENT>An annotation type and its features</COMMENT>
  <PARAMETER NAME="xmlFileUrl"
    COMMENT="The url to the definition file"
    SUFFIXES="xml;xsd">java.net.URL</PARAMETER>
  <AUTOINSTANCE>
    <PARAM NAME="xmlFileUrl" VALUE="schema/AddressSchema.xml" />
  </AUTOINSTANCE>
  <AUTOINSTANCE>
    <PARAM NAME="xmlFileUrl" VALUE="schema/DateSchema.xml" />
  </AUTOINSTANCE>
  <AUTOINSTANCE>
    <PARAM NAME="xmlFileUrl" VALUE="schema/FacilitySchema.xml" />
  </AUTOINSTANCE>
  <!-- etc. -->
</RESOURCE>
In order to create a gate.creole.AnnotationSchema object from a schema annotation file, one must use the gate.Factory class:

// FeatureMap is an interface, so the map is obtained from the Factory
FeatureMap params = Factory.newFeatureMap();
params.put("xmlFileUrl", annotSchemaFile.toURL());
AnnotationSchema annotSchema = (AnnotationSchema)
    Factory.createResource("gate.creole.AnnotationSchema", params);
Note: All the elements and their values must be written in lower case, as XML is defined as case sensitive and the parser used for XML Schema inside GATE is case sensitive.

In order to be able to write XML Schema definitions, the ones defined in GATE (resources/creole/schema) can be used as a model, or the user can have a look at http://www.w3.org/2000/10/XMLSchema for a proper description of the semantics of the elements used.

Some examples of annotation schemas are given in Section 5.4.1.
7.12
To create a new resource you need to:

- write a Java class that implements GATE's beans model;
- compile the class, and any others that it uses, into a Java Archive (JAR) file;
- write some XML configuration data for the new resource;
- tell GATE the URL of the new JAR and XML files.

GATE Developer helps you with this process by creating a set of directories and files that implement a basic resource, including a Java code file and a Makefile. This process is called 'bootstrapping'.

For example, let's create a new component called GoldFish, which will be a Processing Resource that looks for all instances of the word 'fish' in a document and adds an annotation of type 'GoldFish'.

First start GATE Developer (see Section 2.2). From the 'Tools' menu select 'BootStrap Wizard', which will pop up the dialogue in figure 7.2.

Figure 7.2: BootStrap Wizard Dialogue

The meaning of the data entry fields:

- The 'resource name' will be displayed when GATE Developer loads the resource, and will be the name of the directory the resource lives in. For our example: GoldFish.
- 'Resource package' is the Java package that the class representing the resource will be created in. For our example: sheffield.creole.example.
- 'Resource type' must be one of Language, Processing or Visual Resource. In this case we're going to process documents (and add annotations to them), so we select ProcessingResource.
- 'Implementing class name' is the name of the Java class that represents the resource. For our example: GoldFish.
- The 'interfaces implemented' field allows you to add other interfaces (e.g. gate.creole.ControllerAwarePR) that you would like your new resource to implement. In this case we just leave the default (which is to implement the gate.ProcessingResource interface).
- The last field selects the directory that you want the new resource created in. For our example: z:/tmp.

Now we need to compile the class and package it into a JAR file. The bootstrap wizard creates an Ant build file that makes this very easy - so long as you have Ant set up properly, you can simply run

ant jar

This will compile the Java source code and package the resulting classes into GoldFish.jar. If you don't have your own copy of Ant, you can use the one bundled with GATE - suppose your GATE is installed at /opt/gate-5.0-snapshot, then you can use /opt/gate-5.0-snapshot/bin/ant jar to build.

You can now load this resource into GATE; see Section 3.7. The default Java code that was created for our GoldFish resource looks like this:
/*
 *  GoldFish.java
 *
 *  You should probably put a copyright notice here. Why not use the
 *  GNU licence? (See http://www.gnu.org/.)
 *
 *  howto.tex,v
 */

package sheffield.creole.example;

import java.util.*;
import gate.*;
import gate.creole.*;
import gate.creole.metadata.*; // needed for @CreoleResource
import gate.util.*;

/**
 * This class is the implementation of the resource GOLDFISH.
 */
@CreoleResource(name = "GoldFish",
    comment = "Add a descriptive comment about this resource")
public class GoldFish extends AbstractProcessingResource
    implements ProcessingResource {

} // class GoldFish
The directory structure containing these files is shown in figure 7.3. GoldFish.java lives in the src/sheffield/creole/example directory. creole.xml and build.xml are in the top GoldFish directory. The lib directory is for libraries; the classes directory is where Java class files are placed; the doc directory is for documentation. These last two, plus GoldFish.jar, are created by Ant.

Figure 7.3: BootStrap directory tree

This process has the advantage that it creates a complete source tree and build structure for the component, and the disadvantage that it creates a complete source tree and build structure for the component. If you already have a source tree, you will need to chop out the bits you need from the new tree (in this case GoldFish.java and creole.xml) and copy them into your existing one.

See the example code at http://gate.ac.uk/wiki/code-repository/.
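The generated GoldFish class has an empty body, so the search for 'fish' still has to be written. The core of that search is independent of GATE and might look like the sketch below; FishFinder is a hypothetical helper invented here, and matching whole words case-insensitively is an assumption of this illustration, not something the wizard or the text prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FishFinder {
    // Assumption: whole-word, case-insensitive matches of "fish".
    private static final Pattern FISH =
            Pattern.compile("\\bfish\\b", Pattern.CASE_INSENSITIVE);

    // Returns {start, end} character offsets for every match, in text order.
    static List<long[]> find(String text) {
        List<long[]> spans = new ArrayList<>();
        Matcher m = FISH.matcher(text);
        while (m.find()) {
            spans.add(new long[] {m.start(), m.end()});
        }
        return spans;
    }
}
```

Inside a processing resource, each returned pair would presumably be passed as the start and end offsets of an add(Long start, Long end, String type, FeatureMap features) call on the document's annotation set, with "GoldFish" as the type.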
7.13
In order to add a new document format, one needs to extend the gate.DocumentFormat class and to implement an abstract method called unpackMarkup(Document doc).
This method is supposed to implement the functionality of each format reader and to create annotations on the document. Finally the document's old content will be replaced with a new one containing only the text between markups.

To add a new textual reader, extend gate.corpora.TextualDocumentFormat and override the unpackMarkup(doc) method. This class needs to be implemented under the Java bean specifications because it will be instantiated by GATE using the Factory.createResource() method.

The init() method that one needs to add and implement is very important because this is where the reader defines the means by which GATE can select it. What one needs to do is to add some specific information into certain static maps defined in the DocumentFormat class, that will be used at reader detection time. After that, a definition of the reader is placed into one's creole.xml file and the reader becomes available to GATE.

The rest of this section presents a complete three-step example of adding such a reader. The reader described here is an XML reader.
Step 1
Create a new class called XmlDocumentFormat that extends gate.corpora.TextualDocumentFormat.

Step 2
Implement the unpackMarkup(Document doc) method, which performs the required functionality for the reader. Add XML detection means in the init() method:
public Resource init() throws ResourceInstantiationException {
  // Register the XML mime type
  MimeType mime = new MimeType("text", "xml");

  // Register the class handler for this mime type
  mimeString2ClassHandlerMap.put(
      mime.getType() + "/" + mime.getSubtype(), this);

  // Register the file suffixes for this mime type
  suffixes2mimeTypeMap.put("xml", mime);
  suffixes2mimeTypeMap.put("xhtm", mime);
  suffixes2mimeTypeMap.put("xhtml", mime);

  return this;
} // init()
More details about the information from those maps can be found in Section 5.5.1.

Step 3
Add the following creole definition in the creole.xml document.
<RESOURCE>
  <NAME>My XML Document Format</NAME>
  <CLASS>mypackage.XmlDocumentFormat</CLASS>
  <AUTOINSTANCE/>
  <PRIVATE/>
</RESOURCE>
More information on the operation of GATE's document format analysers may be found in Section 5.5.
7.14
qei imedded n e used in multithreded pplitionsD so long s you oserve few restritionsF pirstD you must initilise qei y lling qteFinit@A exactly once in your pE plitionD typilly in the pplition strtup phse efore ny onurrent proessing threds re strtedF eondlyD you must not mke lls tht 'et the glol stte of qei @eFgF loding or unloding pluginsA in more thn one thred t timeF eginD you would typilly lod ll the plugins your pplition requires t initilistion timeF st is sfe to rete instances of resoures in multiple threds onurrentlyF
GATE Embedded
ITS
Thirdly, it is important to note that individual GATE processing resources, language resources and controllers are by design not thread safe (it is not possible to use a single instance of a controller/PR/LR in multiple threads at the same time), but for a well written resource it should be possible to use several different instances of the same resource at once, each in a different thread. When writing your own resource classes you should bear the following in mind, to ensure that your resource will be usable in this way.

- Avoid static data. Where possible, you should avoid using static fields in your class, and you should try to take all configuration data via the CREOLE parameters you declare in your creole.xml file. System properties may be appropriate for truly static configuration, such as the location of an external executable, but even then it is generally better to stick to CREOLE parameters: a user may wish to use two different instances of your PR, each talking to a different executable.
- Read parameters at the correct time. Init-time parameters should be read in the init() (and reInit()) method, and for processing resources runtime parameters should be read at each execute().
- Use temporary files correctly. If your resource makes use of external temporary files you should create them using File.createTempFile() at init or execute time, as appropriate. Do not use hardcoded file names for temporary files.
- If there are objects that can be shared between different instances of your resource, make sure these objects are accessed either read-only, or in a thread-safe way. In particular you must be very careful if your resource can take other resource instances as init or runtime parameters (e.g. the Flexible Gazetteer, Section 13.6).

Of course, if you are writing a PR that is simply a wrapper around an external library that imposes these kinds of limitations there is only so much you can do. If your resource cannot be made safe you should document this fact clearly.

All the standard ANNIE PRs are safe when independent instances are used in different threads concurrently, as are the standard transient document, transient corpus and controller classes. A typical pattern of development for a multithreaded GATE-based application is:

- Develop your GATE processing pipeline in GATE Developer.
- Save your pipeline as a .gapp file.
- In your application's initialisation phase, load n copies of the pipeline using PersistenceManager.loadObjectFromFile() (see the Javadoc documentation for details), or load the pipeline once and then make copies of it using Factory.duplicate as described in section 7.8, and either give one copy to each thread or store them in a pool (e.g. a LinkedList).
- When you need to process a text, get one copy of the pipeline from the pool, and return it to the pool when you have finished processing.

Alternatively you can use the Spring Framework as described in the next section to handle the pooling for you.
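The pooling pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions: PipelinePool, withPipeline and idleCount are hypothetical names, and the type parameter P stands in for whatever controller type your application loads. In a real application the loader passed to the constructor would be where you call PersistenceManager.loadObjectFromFile() or Factory.duplicate.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool that lends each pipeline copy to one thread at a time. */
final class PipelinePool<P> {
    private final BlockingQueue<P> idle = new LinkedBlockingQueue<>();

    /** Fill the pool up front by calling the loader once per copy. */
    PipelinePool(int copies, Supplier<P> loader) {
        for (int i = 0; i < copies; i++) {
            idle.add(loader.get());
        }
    }

    /** Borrow a copy, run the task on it, and always hand the copy back. */
    <R> R withPipeline(Function<P, R> task) {
        P pipeline;
        try {
            pipeline = idle.take(); // blocks while every copy is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pipeline", e);
        }
        try {
            return task.apply(pipeline);
        } finally {
            idle.add(pipeline); // returned even if the task throws
        }
    }

    /** Number of copies currently available for borrowing. */
    int idleCount() {
        return idle.size();
    }
}
```

Returning the copy in a finally block matters: a pipeline that is not returned after an exception is lost to the pool, and once every copy has been lost all callers block forever.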
7.15 Using GATE Embedded within a Spring Application
GATE Embedded provides helper classes to allow GATE resources to be created and managed by the Spring framework. For Spring 2.0 or later, GATE Embedded provides a custom namespace handler that makes them extremely easy to use. To use this namespace, put the following declarations in your bean definition file:
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:gate="http://gate.ac.uk/ns/spring"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans.xsd
         http://gate.ac.uk/ns/spring
         http://gate.ac.uk/ns/spring.xsd">
The gate-home, user-config-file, etc. and the <value> elements under <gate:preload-plugins> are interpreted as Spring "resource" paths. If the value is not an absolute URL then Spring will resolve the path in an appropriate way for the type of application context: in a web application they are taken as being relative to the web app root, and you would typically use locations within WEB-INF as shown in the example above. To use an absolute path for gate-home it is not sufficient to use a leading slash (e.g. /opt/gate); for backwards-compatibility reasons Spring will still resolve this relative to your web application. Instead you must specify it as a full URL, i.e. file:/opt/gate.
The attributes gate-home, plugins-home, site-config-file, user-config-file and builtin-creole-dir refer directly to the similarly-named setter methods on gate.Gate. Any of these that are not specified will take their usual GATE Embedded default values (i.e. gate-home will be the parent of the directory containing gate.jar, plugins-home will be the plugins subdirectory of GATE home, user-config-file will be .gate.xml in the current user's home directory, etc.). Therefore it is highly recommended to specify at least user-config-file in order to isolate your application from the configuration used by GATE Developer. Alternatively, you can specify run-in-sandbox="true" (see the JavaDocs) which will tell GATE not to attempt to read any configuration from files at startup.
<gate:preload-plugins> specifies CREOLE plugins that should be loaded after GATE has been initialised. An alternative way to specify extra plugins is to provide separate <gate:extra-plugin> elements, for example:
<gate:init gate-home="WEB-INF" user-config-file="WEB-INF/user.xml" />
<gate:extra-plugin>WEB-INF/ANNIE</gate:extra-plugin>
You can freely mix the two styles: nested <gate:preload-plugins> definitions are processed first, followed by all the <gate:extra-plugin> definitions found in the application context. This is useful if, for example, you are providing additional configuration as a separate bean definition file from the one containing the main <gate:init> definition and need to load extra plugins without editing this main definition.

To create a GATE resource, use the <gate:resource> element.
<gate:resource id="sharedOntology" scope="singleton"
    resource-class="gate.creole.ontology.owlim.OWLIMOntologyLR">
  <gate:parameters>
    <entry key="rdfXmlURL">
      <gate:url>WEB-INF/ontology.rdf</gate:url>
    </entry>
  </gate:parameters>
  <gate:features>
    <entry key="ontologyVersion" value="0.1.3" />
    <entry key="mainOntology">
      <value type="java.lang.Boolean">true</value>
    </entry>
  </gate:features>
</gate:resource>
The children of <gate:parameters> are Spring <entry/> elements, just as you would write when configuring a bean property of type Map<String,Object>. <gate:url> provides a way to construct a java.net.URL from a resource path as discussed above. If it is possible to resolve the resource path as a file: URL then this form will be preferred, as there are a number of areas within GATE which work better with file: URLs than with other types of URL (for example plugins that run external processes, or that use a URL parameter to point to a directory in which they will create new files).

The <gate:parameters> and <gate:features> elements define GATE FeatureMaps. When using the simple <entry key="..." value="..." /> form, the entry values will be treated as strings. Spring can convert strings into many other types of object using the standard Java Beans property editor mechanism, but since a FeatureMap can hold any kind of values you must use an explicit <value type="...">...</value> to tell Spring what type the value should be.
A note about types:
There is an additional twist for <gate:parameters>: GATE has its own internal logic to convert strings to other types required for resource parameters (see the discussion of default parameter values in section 4.7.1). So for parameter values you have a choice: you can either use an explicit <value type="..."> to make Spring do the conversion, or you can pass the parameter value as a string and let GATE do the conversion. For resource parameters whose type is java.net.URL, if you pass a string value that is not an absolute URL (starting file:, http:, etc.) then GATE will treat the string as a path relative to the creole.xml file of the plugin that defines the resource type whose parameter you are setting. If this is not what you intended then you should use <gate:url> to use Spring to resolve the path to a URL before passing it to GATE. For example, for a JAPE transducer, <entry key="grammarURL" value="grammars/main.jape" /> would resolve to something like file:/path/to/webapp/WEB-INF/plugins/ANNIE/grammars/main.jape, whereas
<entry key="grammarURL">
  <gate:url>grammars/main.jape</gate:url>
</entry>
"Customisers" are used to customise the application after it is loaded. In the example above, we load a singleton copy of an ontology which is then shared between all the separate instances of the (prototype) application. The <gate:set-parameter> customiser accepts all the same ways to provide a value as the standard Spring <property> element (a "value" or "ref" attribute, or a sub-element: <value>, <list>, <bean>, <gate:resource> ...).
The <gate:add-pr> customiser provides support for the case where most of the application is in a saved state, but we want to create one or two extra PRs with Spring (maybe to inject other Spring beans as init parameters) and add them to the pipeline.
<gate:saved-application ...>
  <gate:customisers>
    <gate:add-pr add-before="OrthoMatcher" ref="myPr" />
  </gate:customisers>
</gate:saved-application>
By default, the <gate:add-pr> customiser adds the target PR at the end of the pipeline, but an add-before or add-after attribute can be used to specify the name of a PR before (or after) which this PR should be placed. Alternatively, an index attribute places the PR at a specific (0-based) index into the pipeline. The PR to add can be specified either as a "ref" attribute, or with a nested <bean> or <gate:resource> element.
The <gate:duplicate> tag acts like a prototype bean definition, in that each time it is fetched or injected it will call Factory.duplicate to create a new duplicate of its template resource (declared as a nested element or referenced by the template-ref attribute). However the tag also keeps track of all the duplicate instances it has returned over its lifetime, and will ensure they are released (using Factory.deleteResource) when the Spring context is shut down.

The <gate:duplicate> tag also supports customisers, which will be applied to the newly-created duplicate resource before it is returned. This is subtly different from applying the customisers to the template resource itself, which would cause them to be applied once to the original resource before it is first duplicated.
Finally, <gate:duplicate> takes an optional boolean attribute return-template. If set to false (or omitted, as this is the default behaviour), the tag always returns a duplicate: the original template resource is used only as a template and is not made available for use. If set to true, the first time the bean defined by the tag is injected or fetched, the original template resource is returned. Subsequent uses of the tag will return duplicates. Generally speaking, it is only safe to set return-template="true" when there are no customisers, and when the duplicates will all be created up-front before any of them are used. If the duplicates will be created asynchronously (e.g. with a dynamically expanding pool, see below) then it is possible that, for example, a template application may be duplicated in one thread whilst it is being executed by another thread, which may lead to unpredictable behaviour.
<bean id="processor" class="gate.util.LanguageAnalyserDocumentProcessor">
  <property name="analyser" ref="theApp" />
  <gate:pooled-proxy max-size="5" />
</bean>
The <gate:pooled-proxy> element decorates a singleton bean definition. It converts the original definition to prototype scope and replaces it with a singleton proxy delegating to a pool of instances of the prototype bean. The pool parameters are controlled by attributes of the <gate:pooled-proxy> element, the most important ones being:
max-size: the maximum size of the pool. If more than this number of threads try to call methods on the proxy at the same time, the others will (by default) block until an object is returned to the pool.
when-exhausted-action-name: what to do when the pool is exhausted (i.e. there are already max-size concurrent calls in progress and another one arrives). Should be set to one of WHEN_EXHAUSTED_BLOCK (the default, meaning block the excess requests until an object becomes free), WHEN_EXHAUSTED_GROW (create a new object anyway, even though this pushes the pool beyond max-size) or WHEN_EXHAUSTED_FAIL (cause the excess calls to fail with an exception).
Many more options are available, corresponding to the properties of the Spring CommonsPoolTargetSource class. These allow you, for example, to configure a pool that dynamically grows and shrinks as necessary, releasing objects that have been idle for a set amount of time. See the JavaDoc documentation of CommonsPoolTargetSource (and the documentation for Apache commons-pool) for full details.

Note that the <gate:pooled-proxy> technique is not tied to GATE in any way. It is simply an easy way to configure standard Spring beans and can be used with any bean that needs to be pooled, not just objects that make use of GATE.
7.16 Using GATE Embedded within a Tomcat Application
Embedding GATE in a Tomcat web application involves several steps.

1. Put the necessary JAR files (gate.jar and all or most of the jars in gate/lib) in your webapp/WEB-INF/lib.
2. Put the plugins that your application depends on in a suitable location (e.g. webapp/WEB-INF/plugins).
3. Create suitable gate.xml configuration files for your environment.
4. Set the appropriate paths in your application before calling Gate.init().

This process is detailed in the following sections.
<?xml version="1.0" encoding="UTF-8" ?>
<GATE>
  <GATECONFIG Save_options_on_exit="false" Save_session_on_exit="false" />
</GATE>
This way, you can control exactly which plugins are loaded in your webapp code.
// imports
...
public class MyServlet extends HttpServlet {
  private static boolean gateInited = false;

  public void init() throws ServletException {
    if (!gateInited) {
      try {
        ServletContext ctx = getServletContext();

        File gateHome = new File(ctx.getRealPath("/WEB-INF"));
        Gate.setGateHome(gateHome);
        // thus webapp/WEB-INF/plugins is the plugins directory, and
        // webapp/WEB-INF/gate.xml is the site config file.

        Gate.setUserConfigFile(new File(gateHome, "user-gate.xml"));

        Gate.init();
        Gate.getCreoleRegister().registerDirectories(
            ctx.getResource("/WEB-INF/plugins/ANNIE"));

        gateInited = true;
      } catch (Exception ex) {
        throw new ServletException("Exception initialising GATE", ex);
      }
    }
  }
}
Once initialized, you can create GATE resources using the Factory in the usual way (for example, see Section 7.1 for an example of how to create an ANNIE application). You should also read Section 7.14 for important notes on using GATE Embedded in a multithreaded application.

Instead of an initialization servlet you could also consider doing your initialization in a ServletContextListener, or using Spring (see Section 7.15).
7.17 Groovy for GATE
Groovy is a dynamic programming language based on Java. Groovy is not used in the core GATE distribution, so to enable the Groovy features in GATE you must first load the Groovy plugin. Loading this plugin:

- provides access to the Groovy scripting console (configured with some extensions for GATE) from the GATE Developer "Tools" menu.
- provides a PR to run a Groovy script over documents.
- provides a controller which uses a Groovy DSL to define its execution strategy.
- enhances a number of core GATE classes with additional convenience methods that can be used from any Groovy code including the console, the script PR, and any Groovy class that uses the GATE Embedded API.

This section describes these features in detail, but assumes that the reader already has some knowledge of the Groovy language. If you are not already familiar with Groovy you should read this section in conjunction with Groovy's own documentation at http://groovy.codehaus.org/.
To help scripting GATE in Groovy, the console is pre-configured to import all classes from the gate, gate.annotation, gate.util, gate.jape and gate.creole.ontology packages of the core GATE API. This means you can refer to classes and interfaces such as Factory, AnnotationSet, Gate, etc. without needing to prefix them with a package name. In addition, the following (read-only) variable bindings are pre-defined in the Groovy Console.

corpora: a list of loaded corpora LRs (Corpus)
docs: a list of all loaded document LRs (DocumentImpl)
prs: a list of all loaded PRs
apps: a list of all loaded Applications (AbstractController)

These variables are automatically updated as resources are created and deleted in GATE.

Here's an example script. It finds all documents with a feature "annotator" set to "fred", and puts them in a new corpus called "fredsDocs".
Factory.newCorpus("fredsDocs").addAll(
  docs.findAll {
    it.features.annotator == "fred"
  }
)
You can find other examples (and add your own) in the Groovy script repository on the GATE Wiki: http://gate.ac.uk/wiki/groovy-recipes/.
Why won't the "Groovy executing" dialog go away? Sometimes, when you execute a Groovy script through the console, a dialog will appear, saying "Groovy is executing. Please wait". The dialog fails to go away even when the script has ended, and cannot be closed by clicking the "Interrupt" button. You can, however, continue to use the Groovy Console, and the dialog will usually go away next time you run a script. This is not a GATE problem: it is a Groovy problem.
Parameters
The Groovy scripting PR has a single initialisation parameter:

scriptURL: the path to a valid Groovy script

It has three runtime parameters:

inputASName: an optional annotation set intended to be used as input by the PR (but note that the PR has access to all annotation sets)
outputASName: an optional annotation set intended to be used as output by the PR (but note that the PR has access to all annotation sets)
scriptParams: optional parameters for the script. In a creole.xml file, these should be specified as key=value pairs, each pair separated by a comma. For example: 'name=fred,type=person'. In the GATE GUI, these are specified via a dialog.
Script bindings
As with the Groovy console described above, and with JAPE right-hand-side Java code, Groovy scripts run by the scripting PR implicitly import all classes from the gate, gate.annotation, gate.util, gate.jape and gate.creole.ontology packages of the core GATE API. The Groovy scripting PR also makes available the following bindings, which you can use in your scripts:

doc: the current document (Document)
corpus: the corpus containing the current document
content: the string content of the current document
inputAS: the annotation set specified by inputASName in the PR's runtime parameters
outputAS: the annotation set specified by outputASName in the PR's runtime parameters

Note that inputAS and outputAS are intended to be used as input and output AnnotationSets. This is, however, a convention: there is nothing to stop a script writing to or reading from any AnnotationSet. Also, although the script has access to the corpus containing the document it is running over, it is not generally necessary for the script to iterate over the documents in the corpus itself; the reference is provided to allow the script to access data stored in the FeatureMap of the corpus. Any other variables assigned to within the script code will be added to the binding, and values set while processing one document can be used while processing a later one.
Controller callbacks
A Groovy script may wish to do some pre- or post-processing before or after processing the documents in a corpus, for example if it is collecting statistics about the corpus. To support this, the script can declare methods beforeCorpus and afterCorpus, taking a single parameter. If the beforeCorpus method is defined and the script PR is running in a corpus pipeline application, the method will be called before the pipeline processes the first document. Similarly, if the afterCorpus method is defined it will be called after the pipeline has completed processing of all the documents in the corpus. In both cases the corpus will be passed to the method as a parameter. If the pipeline aborts with an exception the afterCorpus method will not be called, but if the script declares a method aborted(c) then this will be called instead.

Note that because the script is not processing a particular document when these methods are called, the usual doc, corpus, inputAS, etc. are not available within the body of the methods (though the corpus is passed to the method as a parameter). The scriptParams variable is available.

The following example shows how this technique could be used to build a simple tf/idf index for a GATE corpus. The example is available in the GATE distribution as plugins/Groovy/resources/scripts/tfidf.groovy. The script makes use of some of the utility methods described in section 7.17.4.
// reset variables
void beforeCorpus(c) {
  // list of maps (one for each doc) from term to frequency
  frequencies = []
  // sorted map from term to the set of docs that contain it
  docMap = new TreeMap()
  // index of the current doc in the corpus
  docNum = 0
}

// start the frequency map for this document
frequencies << [:]

// count each annotation of the requested type
inputAS[scriptParams.annotationType].each {
  def str = doc.stringFor(it)
  frequencies[docNum][str] = (frequencies[docNum][str] ?: 0) + 1

  // keep track of which documents this term appears in
  if (!docMap[str]) {
    docMap[str] = new LinkedHashSet()
  }
  docMap[str] << docNum
}

// normalise the counts by document length
def docLength = inputAS[scriptParams.annotationType].size()
frequencies[docNum].each { freq ->
  freq.value = ((double) freq.value) / docLength
}

// increment the counter for the next document
docNum++

// compute the IDFs and store the table as a corpus-level feature
void afterCorpus(c) {
  def tfIdf = [:]
  docMap.each { term, docsWithTerm ->
    def idf = Math.log((double) docNum / docsWithTerm.size())
    tfIdf[term] = [:]
    docsWithTerm.each { docId ->
      tfIdf[term][docId] = frequencies[docId][term] * idf
    }
  }
  c.features.freqTable = tfIdf
}
Examples
The plugin directory Groovy/resources/scripts contains some example scripts. Below is the code for a naive regular expression PR.
matcher = content =~ scriptParams.regex
while (matcher.find())
  outputAS.add(matcher.start(),
               matcher.end(),
               scriptParams.type,
               Factory.newFeatureMap())
The script needs to have the runtime parameter scriptParams set with keys and values as follows:

regex: the Groovy regular expression that you want to match, e.g. [^\s]*ing
type: the type of the annotation to create for each regex match, e.g. regexMatch

When the PR is run over a document, the script will first make a matcher over the document content for the regular expression given by the regex parameter. It will iterate over all matches for this regular expression, adding a new annotation for each, with a type as given by the type parameter.
Running a single PR
To run a single PR from the scriptable controller's list of PRs, simply use the PR's name as a Groovy method call.
If the PR's name contains spaces or any other character that is not valid in a Groovy identifier, or if the name is a reserved word (such as "import") then you must enclose the name in single or double quotes. You may prefer to rename the PRs so their names are valid identifiers. Also, if there are several PRs in the controller's list with the same name, they will all be run in the order in which they appear in the list.
You can optionally provide a Map of named parameters to the call, and these will override the corresponding runtime parameter values for the PR (the original values will be restored after the PR has been executed).
The block of code (in fact a Groovy closure) is executed once for each document in the corpus exactly as a standard corpus pipeline application would operate. The current document is available to the script in the variable doc and the corpus in the variable corpus, and in addition any calls to PRs that implement the LanguageAnalyser interface will set the PR's document and corpus parameters appropriately.
eachDocument {
  def lang = doc.features.language ?: 'generic'
  "${lang} Tokeniser"()
  "${lang} Gazetteer"()
}
As another example, suppose you have a particular JAPE grammar that you know is slow on documents that mention a large number of locations, so you only want to run it on documents with up to 100 Location annotations, and use a faster but less accurate one on others:
eachDocument {
  annotateLocations()
  if (doc.annotations["Location"].size() <= 100) {
    fullLocationClassifier()
  }
  else {
    fastLocationClassifier()
  }
}
You can have more than one call to eachDocument, for example a controller that pre-processes some documents, then collects some corpus-level statistics, then further processes the documents based on those statistics.

As a final example, consider a controller to post-process data from a manual annotation task. Some of the documents have been annotated by one annotator, some by more than one (the annotations are in sets named "annotator1", "annotator2", etc., but the number of sets varies from document to document).
eachDocument {
  // find all the annotatorN sets on this document
  def annotators = doc.annotationSetNames.findAll {
    it ==~ /annotator\d+/
  }

  annotators.each { asName ->
    postProcessingGrammar(
        inputASName: asName,
        outputASName: asName)
  }
}
Global variables
There are a number of variables that are pre-defined in the control script.
prs (read-only): an unmodifiable list of the processing resources in the pipeline.
corpus (read-write): a reference to the corpus (if any) currently set on the controller, and over which any eachDocument loops will iterate. This variable is a direct alias to the controller's getCorpus/setCorpus methods, so for example a script could build a new corpus (using a web crawler or similar), then use eachDocument to iterate over this corpus and process the documents.
In addition, as mentioned above, within the scope of an eachDocument loop there is a doc variable giving access to the document being processed in the current iteration. Note that if this controller is nested inside another controller (see the previous section) then the doc variable will be available throughout the script.
Ignoring errors
By default, if an exception or error occurs while processing (either thrown by a PR or occurring directly within the controller's script) then the controller's execution will terminate with an exception. If this occurs during an eachDocument then the remaining documents will not be processed. In some circumstances it may be preferable to ignore the error and simply continue with the next document. To support this you can use ignoringErrors:
eachDocument {
  ignoringErrors {
    tokeniser()
    sentenceSplitter()
    myTransducer()
  }
}
Any exceptions or errors thrown within the ignoringErrors block will be logged but not rethrown. So in the example above if myTransducer fails with an exception the controller will continue with the next document. Note that it is important to nest the blocks correctly: if the nesting were reversed (with the eachDocument inside the ignoringErrors) then an exception would terminate the whole eachDocument loop and the remaining documents would not be processed.
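The effect of the nesting order can be illustrated outside Groovy with an ordinary loop and try/catch. The sketch below is plain Java, not GATE code; ErrorNesting and its two methods are hypothetical names, documents are represented by strings, and the "PR" is any Consumer that may throw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of where the error handler sits relative to the loop. */
final class ErrorNesting {
    /** Handler inside the loop: a failure skips that one document only. */
    static List<String> catchPerDocument(List<String> docs, Consumer<String> pr) {
        List<String> processed = new ArrayList<>();
        for (String doc : docs) {
            try {
                pr.accept(doc);
                processed.add(doc);
            } catch (RuntimeException e) {
                // a real controller would log here; the loop carries on
            }
        }
        return processed;
    }

    /** Handler around the loop: the first failure abandons all remaining documents. */
    static List<String> catchAroundLoop(List<String> docs, Consumer<String> pr) {
        List<String> processed = new ArrayList<>();
        try {
            for (String doc : docs) {
                pr.accept(doc);
                processed.add(doc);
            }
        } catch (RuntimeException e) {
            // by the time we get here the loop has already been terminated
        }
        return processed;
    }
}
```

With three documents of which the second fails, the first method processes the first and third, while the second method processes only the first.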
Realtime behaviour
Some GATE processing resources can be very slow when operating on large or complex documents. In many cases it is possible to use heuristics within your controller's script to spot likely "problem" documents and avoid running such PRs over them (see the fast vs. full location classifier example above), but for situations where this is not possible you can use the timeLimit method to put a blanket limit on the time that PRs will be allowed to consume, in a similar way to the real-time controller.
eachDocument {
  ignoringErrors {
    annotateLocations()
    timeLimit(soft:30.seconds, hard:30.seconds) {
      classifyLocations()
    }
  }
}
A call to timeLimit will attempt to limit the running time of its associated code block. You can specify three different kinds of limit:

soft: if the block is still executing after this time, attempt to interrupt it gently. This uses Thread.interrupt() and also calls the interrupt() method of the currently executing PR (if any).
exception: if the block is still executing after this time beyond the soft limit, attempt to induce an exception by setting the corpus and document parameters of the currently running PR to null. This is useful to deal with PRs that do not properly respect the interrupt call.
hard: if the block is still executing after this time beyond the previous limit, forcibly terminate it using Thread.stop. This is inherently dangerous and prone to memory leakage but may be the only way to stop particularly stubborn PRs. It should be used with caution.
Limits can be specified using Groovy's TimeCategory notation as shown above (e.g. 10.seconds, 2.minutes, 1.minute+45.seconds), or as simple numbers (of milliseconds). Each limit starts counting from the end of the last, so in the example above the hard limit is 30 seconds after the soft limit, or 1 minute after the start of execution. If no hard limit is specified the controller will wait indefinitely for the block to complete.

Note also that when a timeLimit block is terminated it will throw an exception. If you do not wish this exception to terminate the execution of the controller as a whole you will need to wrap the timeLimit block in an ignoringErrors block.
timeLimit blocks, particularly ones with a hard limit specified, should be regarded as a last resort. If there are heuristic methods you can use to avoid running slow PRs in the first place it is a good idea to use them as a first defence, possibly wrapping them in a timeLimit block if you need hard guarantees (for example when you are paying per hour for your compute time in a cloud computing system).
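Because each limit counts from the end of the previous one, it can help to work out when each stage would actually fire. The following plain Java sketch does that arithmetic; it is not part of GATE, LimitSchedule and absoluteOffsets are hypothetical names, and a null argument is assumed to mean "limit not specified".

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that converts chained limits into offsets from the start. */
final class LimitSchedule {
    /**
     * Each limit is relative to the end of the previous one (soft, then
     * exception, then hard). Unspecified (null) limits are skipped.
     */
    static Map<String, Duration> absoluteOffsets(Duration soft, Duration exception, Duration hard) {
        Map<String, Duration> relative = new LinkedHashMap<>();
        relative.put("soft", soft);
        relative.put("exception", exception);
        relative.put("hard", hard);

        Map<String, Duration> offsets = new LinkedHashMap<>();
        Duration elapsed = Duration.ZERO;
        for (Map.Entry<String, Duration> e : relative.entrySet()) {
            if (e.getValue() == null) {
                continue; // this kind of limit was not requested
            }
            elapsed = elapsed.plus(e.getValue());
            offsets.put(e.getKey(), elapsed);
        }
        return offsets;
    }
}
```

For the example in the text (soft 30 seconds, hard 30 seconds, no exception limit) this yields a soft offset of 30 seconds and a hard offset of 1 minute from the start of execution.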
the start or end offset of an annotation (or set), etc. These methods do not use any Groovy-specific types, so they are usable from pure Java code in the usual way as well as being mixed in for use in Groovy. Additionally, the class gate.groovy.GateGroovyMethods (part of the Groovy plugin) provides methods that use Groovy types such as closures and ranges. The added methods include:

- Unified access to the start and end offsets of an Annotation, AnnotationSet or Document: e.g. someAnnotation.start() or anAnnotationSet.end()
- Simple access to the DocumentContent or string covered by an annotation or annotation set: document.stringFor(anAnnotation), document.contentFor(annotationSet)
- Simple access to the length of an annotation or document, either as an int (annotation.length()) or a long (annotation.lengthLong()).
- A method to construct a FeatureMap from any map, to support constructions like def params = [sourceUrl:'http://gate.ac.uk', encoding:'UTF-8'].toFeatureMap()
- A method to convert an annotation set into a List of annotations in the order they appear in the document, for iteration in a predictable order: annSet.inDocumentOrder().collect { it.type }
- The each, eachWithIndex and collect methods for a corpus have been redefined to properly load and unload documents if the corpus is stored in a datastore.
- Various getAt methods to support constructions like annotationSet["Token"] (get all Token annotations from the set), annotationSet[15..20] (get all annotations between offsets 15 and 20), documentContent[0..10] (get the document content between offsets 0 and 10).
- A withResource method for any resource, which calls a closure with the resource passed as a parameter, and ensures that the resource is properly deleted when the closure completes (analogous to the default Groovy method InputStream.withStream).
For full details, see the source code or javadoc documentation for these two classes.
7.18 Saving Config Data to gate.xml
Arbitrary feature/value data items can be saved to the user's gate.xml file via the following API calls:

To get the config data: Map configData = Gate.getUserConfig();
To add config data simply put pairs into the map: configData.put("my new config key", "value");
To write the config data back to the XML file: Gate.writeUserConfig();
Note that new config data will simply override old values, where the keys are the same. In this way defaults can be set up by putting their values in the main gate.xml file, or the site gate.xml file; they can then be overridden by the user's gate.xml file.
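The override behaviour described above amounts to merging key/value layers in a fixed order, with later layers winning. The sketch below shows that idea in plain Java and does not use the GATE API; LayeredConfig and merge are hypothetical names, and the three layers are assumed to be supplied in the order main, site, user.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of layered configuration with per-key override. */
final class LayeredConfig {
    /** Merge layers in order; a later layer replaces an earlier value for the same key. */
    static Map<String, String> merge(List<Map<String, String>> layers) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            effective.putAll(layer); // same key: the newer value wins
        }
        return effective;
    }
}
```

Keys that appear only in an earlier layer survive untouched, which is what makes the main and site files usable as a source of defaults.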
7.19 Annotation merging through the API
If we have annotations about the same subject on the same document from different annotators, we may need to merge those annotations to form a unified annotation. Two approaches for merging annotations are implemented in the API, via static methods in the class gate.util.AnnotationMerging.

The two methods have very similar input and output parameters. Each of the methods takes an array of annotation sets, which should be the same annotation type on the same document from different annotators, as input. A single feature can also be specified as a parameter (or given as null if no feature is to be specified). The output is a map, the key of which is one merged annotation and the value of which represents the annotators (in terms of the indices of the array of annotation sets) who support the annotation. The methods also have a boolean input parameter to indicate whether or not the annotations from different annotators are based on the same set of instances, which can be determined by the static method public boolean isSameInstancesForAnnotators(AnnotationSet[] annsA) in the class gate.util.IaaCalculation. One instance corresponds to all the annotations with the same span. If the annotation sets are based on the same set of instances, the merging methods will ensure that the merged annotations are on the same set of instances.

The two methods correspond to those described for the Annotation Merging plugin in Section 21.20. They are:

- The method public static void mergeAnnotation(AnnotationSet[] annsArr, String nameFeat, HashMap<Annotation,String> mergeAnns, int numMinK, boolean isTheSameInstances) merges the annotations stored in the array annsArr. The merged annotation is put into the map mergeAnns, with a key of the merged annotation and a value of a string containing the indices of elements in the annotation set array annsArr which contain that annotation. numMinK specifies the minimal number of annotators supporting one merged annotation. The boolean parameter isTheSameInstances indicates whether or not the annotation sets for merging are based on the same instances.
- The second method selects the annotations which the majority of the annotators agree on. The meanings of the parameters are the same as those in the above method.
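To make the idea of "support" concrete, here is a deliberately simplified sketch of merging by minimum support and by majority. It is not the gate.util.AnnotationMerging implementation and ignores the same-instances handling entirely. Everything in it is an assumption made for illustration: the SpanVoting class, the representation of each annotation as a string key such as "0-5:Person", the comma-separated format of the supporter indices, and the reading of "majority" as strictly more than half of the annotators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for support-based annotation merging. */
final class SpanVoting {
    /**
     * annotators.get(i) holds the annotation keys produced by annotator i.
     * Returns key -> comma-separated indices of the annotators that produced it,
     * keeping only keys supported by at least minSupport annotators.
     */
    static Map<String, String> merge(List<Set<String>> annotators, int minSupport) {
        Map<String, List<Integer>> supporters = new TreeMap<>();
        for (int i = 0; i < annotators.size(); i++) {
            for (String key : annotators.get(i)) {
                supporters.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
            }
        }
        Map<String, String> merged = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : supporters.entrySet()) {
            if (e.getValue().size() < minSupport) {
                continue; // not enough annotators agree on this annotation
            }
            StringBuilder indices = new StringBuilder();
            for (int idx : e.getValue()) {
                if (indices.length() > 0) {
                    indices.append(',');
                }
                indices.append(idx);
            }
            merged.put(e.getKey(), indices.toString());
        }
        return merged;
    }

    /** Majority variant: require strictly more than half of the annotators. */
    static Map<String, String> mergeMajority(List<Set<String>> annotators) {
        return merge(annotators, annotators.size() / 2 + 1);
    }
}
```

With three annotators, an annotation produced by annotators 0 and 2 passes a minimum support of 2 (and the majority test) but is dropped when the minimum support is 3.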
JAPE is a Java Annotation Patterns Engine. JAPE provides finite state transduction over annotations based on regular expressions. JAPE is a version of CPSL (Common Pattern Specification Language) [1]. This chapter introduces JAPE, and outlines the functionality available. (You can find an excellent tutorial here; thanks to Dhaval Thakker, Taha Osmin and Phil Lakin).
[1] A good description of the original version of this language is in Doug Appelt's TextPro manual. Doug was a great help to us in implementing JAPE. Thanks Doug!
JAPE allows you to recognise regular expressions in annotations on documents. Hang on, there's something wrong here: a regular language can only describe sets of strings, not graphs, and GATE's model of annotations is based on graphs. Hmmm. Another way of saying this: typically, regular expressions are applied to character strings, a simple linear sequence of items, but here we are applying them to a much more complex data structure. The result is that in certain cases the matching process is non-deterministic (i.e. the results are dependent on random factors like the addresses at which data is stored in the virtual machine): when there is structure in the graph being matched that requires more than the power of a regular automaton to recognise, JAPE chooses an alternative arbitrarily. However, this is not the bad news that it seems to be, as it turns out that in many useful cases the data stored in annotation graphs in GATE (and other language processing systems) can be regarded as simple sequences, and matched deterministically with regular expressions.

A JAPE grammar consists of a set of phases, each of which consists of a set of pattern/action rules. The phases run sequentially and constitute a cascade of finite state transducers over annotations. The left-hand-side (LHS) of the rules consists of an annotation pattern description. The right-hand-side (RHS) consists of annotation manipulation statements. Annotations matched on the LHS of a rule may be referred to on the RHS by means of labels that are attached to pattern elements. Consider the following example:
Phase: Jobtitle
Input: Lookup
Options: control = appelt debug = true

Rule: Jobtitle1
(
  {Lookup.majorType == jobtitle}
  (
    {Lookup.majorType == jobtitle}
  )?
)
:jobtitle
-->
  :jobtitle.JobTitle = {rule = "JobTitle1"}
The LHS is the part preceding the '-->' and the RHS is the part following it. The LHS specifies a pattern to be matched to the annotated GATE document, whereas the RHS specifies what is to be done to the matched text. In this example, we have a rule entitled 'Jobtitle1', which will match text annotated with a 'Lookup' annotation with a 'majorType' feature of 'jobtitle', followed optionally by further text annotated as a 'Lookup' with 'majorType' of 'jobtitle'. Once this rule has matched a sequence of text, the entire sequence is allocated a label by the rule, and in this case, the label is 'jobtitle'. On the RHS, we refer to this span of text using the label given in the LHS: 'jobtitle'. We say that this text is to be given an annotation of type 'JobTitle' and a 'rule' feature set to 'JobTitle1'.
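As a rough mental model of what this LHS pattern accepts, the following plain Java sketch scans a sequence of Lookup annotations and groups one jobtitle, optionally followed by a second, into a single match. It is not how JAPE is implemented. JobtitleSketch and match are hypothetical names, the input is reduced to a list of majorType values in document order, and the choice to prefer the longer alternative and resume after the end of each match is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner for the pattern: jobtitle (jobtitle)? */
final class JobtitleSketch {
    /**
     * majorTypes holds the majorType of each Lookup annotation in document order.
     * Returns {start, endExclusive} index pairs, one per match.
     */
    static List<int[]> match(List<String> majorTypes) {
        List<int[]> matches = new ArrayList<>();
        int i = 0;
        while (i < majorTypes.size()) {
            if (!"jobtitle".equals(majorTypes.get(i))) {
                i++; // no match can start here
                continue;
            }
            int end = i + 1;
            // the optional second element: take it when it is present
            if (end < majorTypes.size() && "jobtitle".equals(majorTypes.get(end))) {
                end++;
            }
            matches.add(new int[] {i, end});
            i = end; // resume scanning after the matched span
        }
        return matches;
    }
}
```

Each returned pair corresponds to the span that the rule would label 'jobtitle' and that the RHS would then turn into a JobTitle annotation.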
We began the JAPE grammar by giving it a phase name, e.g. 'Phase: Jobtitle'. JAPE grammars can be cascaded, and so each grammar is considered to be a 'phase' (see Section 8.5). The phase name makes up part of the Java class name for the compiled RHS actions. Because of this, it must contain alphanumeric characters and underscores only, and cannot start with a number.

We also provide a list of the annotation types we will use in the grammar. In this case, we say 'Input: Lookup' because the only annotations we use on the LHS are Lookup annotations. If no annotations are defined, all annotations will be matched.

Then, several options are set:

  Control: in this case, 'appelt'. This defines the method of rule matching (see Section 8.4).
  Debug: when set to true, if the grammar is running in Appelt mode and there is more than one possible match, the conflicts will be displayed on the standard output.

A wide range of functionality can be used with JAPE, making it a very powerful system. Section 8.1 gives an overview of some common LHS tasks. Section 8.2 talks about the various operators available for use on the LHS. After that, Section 8.3 outlines RHS functionality. Section 8.4 talks about priority and Section 8.5 talks about phases. Section 8.6 talks about using Java code on the RHS, which is the main way of increasing the power of the RHS. We conclude the chapter with some miscellaneous JAPE-related topics of interest.
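The restriction on phase names mentioned above (alphanumeric characters and underscores only, not starting with a number) can be checked mechanically before a grammar is loaded. The following is a minimal sketch only: the class PhaseNameCheck and its method are hypothetical helpers, not part of GATE, and the sketch assumes that only ASCII letters count as alphanumeric.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of GATE) that checks the phase-name rule. */
final class PhaseNameCheck {
    // Letters, digits and underscores only; the first character must not be a digit.
    // Assumption: only ASCII letters are accepted in this sketch.
    private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidPhaseName(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPhaseName("Jobtitle"));   // true
        System.out.println(isValidPhaseName("2ndPass"));    // false: starts with a digit
        System.out.println(isValidPhaseName("job-title"));  // false: contains a hyphen
    }
}
```

A check of this kind is cheap to run over a directory of grammar files, and it reports a bad name before the RHS actions are compiled into Java classes.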
8.1
The LHS of a JAPE grammar aims to match the text span to be annotated, whilst avoiding undesirable matches. There are various tools available to enable you to do this. This section outlines how you would approach various common tasks on the LHS of your JAPE grammar.
matches a Token annotation of category IN followed by a Location annotation. Note that 'followed by' in JAPE depends on the annotation types specified in the Input line: the above pattern matches a Token annotation and a Location annotation provided there are no intervening annotations of a type listed in the Input line. The Token and Location will not necessarily be immediately adjacent (they would probably be separated by an intervening space). In particular the pattern would not match if SpaceToken were specified in the Input line.

The vertical bar | is used to denote alternatives. For example
Rule: InOrAdjective
(
 {Token.category == "IN"} | {Token.category == "JJ"}
):inLoc
would match
either
or
matches a Token with one or other of the two category values, followed by a Location, whereas:
Repetition

JAPE also provides repetition operators to allow a pattern in parentheses to be optional (?), or to match zero or more (*), one or more (+) or some specified number of times. In the following example, you can see the '|' and '?' operators being used:
Rule: LocOrganization
Priority: 50
(
 ({Lookup.majorType == location} |
  {Lookup.majorType == country_adj})
 {Lookup.majorType == organization}
 ({Lookup.majorType == organization})?
)
:orgName -->
 :orgName.TempOrganization =
  {kind = "orgName", rule=LocOrganization}
Range Notation

Repetition ranges are specified using square brackets.
({Token})[1,3]
Since we are matching annotations and not text, you must be careful that the strings you ask for are in fact single tokens. In the example above, {Token.string == "://"} would never match (assuming the default ANNIE Tokeniser) as the three characters are treated as separate tokens.
The JAPE grammar parser substitutes the template values for their references when the grammar is parsed. Thus the example rule is equivalent to
Rule: InterestingLocation
(
 {Location.score >= 0.6}
):loc
-->
:loc.Entity = { type = Location, source = "Interesting entity finder" }
The advantage of using templates is that if there are many rules in the grammar that all reference the threshold template then it is possible to change the threshold for all rules by simply changing the template definition.

The name "template" stems from the fact that templates whose value is a string can contain parameters, specified using ${name} notation:
Template: url = "http://gate.ac.uk/${path}"
When a template containing parameters is referenced, values for the parameters may be specified:
... --> :anchor.Reference = { page = [url path = "userguide"] }
This is equivalent to page = "http://gate.ac.uk/userguide". Multiple parameter value assignments are separated by commas, for example:
The parser will report an error if a value is specified for a parameter that is not declared by the referenced template, for example [proton module="km"] would not be permitted in the above example.
(This example is inspired by the ontology-aware JAPE matching mode described in section 14.10.)

In a multi-phase JAPE grammar, templates defined in earlier phases may be referenced in later phases. This makes it possible to declare constants (such as the PROTON URIs above) in one place and reference them throughout a complex grammar.
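The template substitution described in this section can be pictured as plain string expansion. The sketch below is a simplified model and not the JAPE parser's code: the class TemplateSketch and its method are hypothetical, and the choice to leave a ${name} placeholder untouched when no value is supplied is an assumption of the sketch. It does mirror the one rule stated above, that a value supplied for an undeclared parameter is an error.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of ${name} template expansion (not GATE code). */
final class TemplateSketch {
    // Matches ${name} placeholders and captures the parameter name.
    private static final Pattern PARAM = Pattern.compile("\\$\\{(\\w+)\\}");

    static String expand(String template, Map<String, String> values) {
        // Collect the parameters that the template declares.
        Set<String> declared = new HashSet<>();
        Matcher m = PARAM.matcher(template);
        while (m.find()) {
            declared.add(m.group(1));
        }
        // A value for a parameter the template does not declare is an error.
        for (String name : values.keySet()) {
            if (!declared.contains(name)) {
                throw new IllegalArgumentException("undeclared template parameter: " + name);
            }
        }
        // Substitute supplied values; placeholders without a value are kept (assumption).
        StringBuffer out = new StringBuffer();
        m.reset();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Under these assumptions, expanding the url template with path set to "userguide" yields the same string as the equivalent page value given above.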
annotation is labelled 'jobtitle' and is given the new annotation JobTitle; the TempPerson annotation is labelled 'person' and is given the new annotation 'Person'.
Rule: PersonJobTitle
Priority: 20
(
 {Lookup.majorType == jobtitle}
):jobtitle
(
 {TempPerson}
):person
-->
 :jobtitle.JobTitle = {rule = "PersonJobTitle"},
 :person.Person = {kind = "personName", rule = "PersonJobTitle"}
Similarly, labelled patterns can be nested, as in the example below, where the whole pattern is annotated as Person, but within the pattern, the jobtitle is annotated as JobTitle.
Rule: PersonJobTitle2
Priority: 20
(
 (
  {Lookup.majorType == jobtitle}
 ):jobtitle
 {TempPerson}
):person
-->
 :jobtitle.JobTitle = {rule = "PersonJobTitle"},
 :person.Person = {kind = "personName", rule = "PersonJobTitle"}
Macro: MILLION_BILLION
({Token.string == "m"}|
 {Token.string == "million"}|
 {Token.string == "b"}|
 {Token.string == "billion"}|
 {Token.string == "bn"}|
 {Token.string == "k"}|
 {Token.string == "K"}
)

Macro: NUMBER_WORDS
(
 (({Lookup.majorType == number}
   ({Token.string == "-"})?
  )*
  {Lookup.majorType == number}
  {Token.string == "and"}
 )*
 ({Lookup.majorType == number}
  ({Token.string == "-"})?
 )*
 {Lookup.majorType == number}
)

Macro: AMOUNT_NUMBER
(({Token.kind == number}
  (({Token.string == ","}|
    {Token.string == "."}
   )
   {Token.kind == number}
  )*
  |
  (NUMBER_WORDS)
 )
 (MILLION_BILLION)?
)

Rule: MoneyCurrencyUnit
(
 (AMOUNT_NUMBER)
 ({Lookup.majorType == currency_unit})
)
:number -->
 :number.Money = {kind = "number", rule = "MoneyCurrencyUnit"}
However, it is equally acceptable to have multiple constraints in a statement. In this example, the 'majorType' of 'Lookup' must be 'name' and the 'minorType' must be 'surname':
Rule: Surname
(
 {Lookup.majorType == "name",
  Lookup.minorType == "surname"}
):surname
-->
 :surname.Surname = {}
Multiple constraints on the same annotation type must all be satisfied by the same annotation in order for the pattern to match.

The constraints may refer to different annotations, and for the pattern as a whole to match the constraints must be satisfied by annotations that start at the same location in the document. In this example, in addition to the constraints on the 'majorType' and 'minorType' of 'Lookup', we also have a constraint on the 'string' of 'Token':
Rule: SurnameStartingWithDe
(
 {Token.string == "de",
  Lookup.majorType == "name",
  Lookup.minorType == "surname"}
):de
-->
 :de.Surname = {prefix = "de"}
This rule would match anywhere where a Token with string 'de' and a Lookup with majorType 'name' and minorType 'surname' start at the same offset in the text. Both the Lookup and Token annotations would be included in the :de binding, so the Surname annotation generated would span the longer of the two. As before, constraints on the same annotation type must be satisfied by a single annotation, so in this example there must be a single Lookup matching both the major and minor types; the rule would not match if there were two different lookups at the same location, one of them satisfying each constraint.
Similarly, the following rule (assuming an appropriate macro for 'email') would mean that an email address would only be recognised if it occurred inside angled brackets (which would not themselves form part of the entity):
Rule: Emailaddress1
({Token.string == "<"})
(
 (EMAIL)
)
:email
({Token.string == ">"})
-->
 :email.Address= {kind = "email", rule = "Emailaddress1"}
It is important to remember that context is consumed by the rule, so it cannot be reused in another rule within the same phase. So, for example, right context for one rule cannot be used as left context for another rule.
8.1.11 Negation
All the examples in the preceding sections involve constraints that require the presence of certain annotations to match. JAPE also supports 'negative' constraints which specify the absence of annotations. A negative constraint is signalled in the grammar by a '!' character.

Negative constraints are used in combination with positive ones to constrain the locations at which the positive constraint can match. For example:
Rule: PossibleName
(
 {Token.orth == "upperInitial", !Lookup}
):name
-->
 :name.PossibleName = {}
This rule would match any uppercase-initial Token, but only where there is no Lookup annotation starting at the same location. The general rule is that a negative constraint matches at any location where the corresponding positive constraint would not match. Negative constraints do not contribute any annotations to the bindings: in the example above, the :name binding would contain only the Token annotation (see footnote 2).

Any constraint can be negated, for example:
Rule: SurnameNotStartingWithDe
(
 {Surname, !Token.string ==~ "[Dd]e"}
):name
-->
 :name.NotDe = {}
This would match any Surname annotation that does not start at the same place as a Token with the string 'de' or 'De'. Note that this is subtly different from {Surname, Token.string !=~ "[Dd]e"}, as the second form requires a Token annotation to be present, whereas the first form (!Token...) will match if there is no Token annotation at all at this location (see footnote 3).

As with positive constraints, multiple negative constraints on the same annotation type must all match the same annotation in order for the overall pattern match to be blocked. For example:

2 The exception to this is when a negative constraint is used alone, without any positive constraints in the combination. In this case it binds all the annotations at the match position that do not match the constraint. Thus, {!Lookup} would bind all the annotations starting at this location except Lookups. In general negative constraints should only be used in combination with positive ones.
{Name, !Lookup.majorType == "person", !Lookup.minorType == "female"}
would match a Name annotation, but only if it does not start at the same location as a Lookup with majorType "person" and minorType "female". A Lookup with majorType "person" and minorType "male" would not block the pattern from matching. However negated constraints on different annotation types are independent:
{Person, !Organization, !Location}
would match a Person annotation, but only if there is no Organization annotation and no Location annotation starting at the same place.
Note: Prior to GATE 7.0, negated constraints on the same annotation type were considered independent, i.e. in the Name example above any Lookup of majorType "person" would block the match, irrespective of its minorType. If you have existing grammars that depend on this behaviour you should add negationGrouping = false to the Options line at the top of the JAPE phase in question.
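The difference between grouped and independent negation for the Name example above can be stated as two small predicates. This is a self-contained model for illustration, not GATE's matcher: the classes NegationGroupingSketch and Lookup are hypothetical stand-ins, and the list passed in is assumed to hold the Lookup annotations that start at the candidate match position.

```java
import java.util.List;

/** Hypothetical model of grouped vs. independent negated constraints (not GATE code). */
final class NegationGroupingSketch {
    /** Minimal stand-in for a Lookup annotation starting at the match position. */
    static final class Lookup {
        final String majorType;
        final String minorType;
        Lookup(String majorType, String minorType) {
            this.majorType = majorType;
            this.minorType = minorType;
        }
    }

    /** Grouped behaviour: blocked only if ONE Lookup satisfies both negated constraints. */
    static boolean blockedGrouped(List<Lookup> lookupsAtStart) {
        for (Lookup l : lookupsAtStart) {
            if ("person".equals(l.majorType) && "female".equals(l.minorType)) {
                return true;
            }
        }
        return false;
    }

    /** Independent behaviour (negationGrouping = false): either constraint alone can block. */
    static boolean blockedIndependent(List<Lookup> lookupsAtStart) {
        for (Lookup l : lookupsAtStart) {
            if ("person".equals(l.majorType) || "female".equals(l.minorType)) {
                return true;
            }
        }
        return false;
    }
}
```

With a single Lookup of majorType "person" and minorType "male", blockedGrouped returns false and blockedIndependent returns true, which is the change in behaviour the note describes.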
Although JAPE provides an operator to look for the absence of a single annotation type, there is no support for a general negative operator to prevent a rule from firing if a particular sequence of annotations is found. One solution to this is to create a 'negative rule' which has higher priority than the matching 'positive rule'. The style of matching must be Appelt for this to work. To create a negative rule, simply state on the LHS of the rule the pattern that should NOT be matched, and on the RHS do nothing. In this way, the positive rule cannot be fired if the negative pattern matches, and vice versa, which has the same end result as using a negative operator. A useful variation for developers is to create a dummy annotation on the RHS of the negative rule, rather than to do nothing, and to give the dummy annotation a rule feature. In this way, it is obvious that the negative rule has fired. Alternatively, use Java code on the RHS to print a message when the rule fires.

An example of a matching negative and positive rule follows. Here, we want a rule which matches a surname followed by a comma and a set of initials. But we want to specify that the initials shouldn't have the POS category PRP (personal pronoun). So we specify a negative rule that will fire if the PRP category exists, thereby preventing the positive rule from firing.
Rule: NotPersonReverse
Priority: 20
// we don't want to match 'Jones, I'
(
 {Token.category == NNP}
 {Token.string == ","}
 {Token.category == PRP}
)
:foo
-->
{}

Rule: PersonReverse
Priority: 5
// we want to match 'Jones, F.W.'
(
 {Token.category == NNP}
 {Token.string == ","}
 (INITIALS)?
)
:person -->

3 In the Montreal transducer, the two forms were equivalent.
will match a double quote. For other special characters, such as '$', enclose it in double quotes, e.g.
{Token.category == "PRP$"}
8.2
This section gives more detail on the behaviour of the matching operators used on the left-hand side of JAPE rules. Matching operators are used to specify how matching must take place between a JAPE pattern and an annotation in the document. Equality ('==' and '!=') and comparison ('<', '<=', '>=' and '>') operators can be used, as can regular expression matching and contextual operators ('contains' and 'within').
For any of these operators, the right-hand value (Y in the above examples) can be a full constraint itself. For example {X contains {Y.foo==bar}} is also accepted. The operators can be used in a multi-constraint statement (see Section 8.1.9) just like any of the traditional ones, so {X.f1 != "something", X contains {Y.foo==bar}} is valid.
8.3
The RHS of the rule contains information about the annotation to be created/manipulated. Information about the text span to be annotated is transferred from the LHS of the rule using the label just described, and annotated with the entity type (which follows it). Finally, attributes and their corresponding values are added to the annotation. Alternatively, the RHS of the rule can contain Java code to create or manipulate annotations, see Section 8.6.
This will set the 'type' feature of the generated location to the value of the 'minorType' feature from the 'Lookup' annotation bound to the loc label. If the Lookup has no minorType, the Location will have no 'type' feature. The behaviour of newFeat = :bind.Type.oldFeat is:

  Find all the annotations of type Type from the left hand side binding bind.
  Find one of them that has a non-null value for its oldFeat feature (if there is more than one, which one is chosen is up to the JAPE implementation).
  If such a value exists, set the newFeat feature of our newly created annotation to this value.
  If no such non-null value exists, do not set the newFeat feature at all.

Notice that the behaviour is deliberately underspecified if there is more than one Type annotation in bind. If you need more control, or if you want to copy several feature values from the same left hand side annotation, you should consider using Java code on the right hand side of your rule (see Section 8.6).

In addition to copying feature values you can also copy meta-properties (see section 8.1.3):
Rule: LocationType
(
 {Lookup.majorType == location}
):loc
-->
 :loc.Location = {rule = "LocationType", text = :loc.Lookup@cleanString}
The syntax "feature = :label.AnnotationType@string" assigns to the specified feature the text covered by the annotation of this type in the binding with this label. The @cleanString and @length properties are similar. As before, if more than one annotation of the given type is bound to the same label then one of them will be chosen arbitrarily.

The ".AnnotationType" may be omitted, for example
Rule: LocationType
(
 {Token.category == IN}
 {Lookup.majorType == location}
):loc
-->
 :loc.InLocation = {rule = "InLoc", text = :loc@string, size = :loc@length}
In this case the string, cleanString or length is that covered by the whole label, i.e. the same span as would be covered by an annotation created with ":label.NewAnnotation = {}".
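The feature-copying behaviour of newFeat = :bind.Type.oldFeat described earlier in this section can be modelled in a few lines. The sketch below is not JAPE's implementation: FeatureCopySketch is a hypothetical class, annotations are reduced to their feature maps, and taking the first non-null value is one arbitrary choice among those the text leaves open.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model of "newFeat = :bind.Type.oldFeat" (not GATE code). */
final class FeatureCopySketch {
    /**
     * boundAnnotations holds the feature maps of the Type annotations in the binding.
     * newFeatures is the feature map of the annotation being created.
     */
    static void copyFeature(List<? extends Map<String, Object>> boundAnnotations,
                            String oldFeat,
                            Map<String, Object> newFeatures,
                            String newFeat) {
        for (Map<String, Object> features : boundAnnotations) {
            Object value = features.get(oldFeat);
            if (value != null) {
                // Which annotation is used is unspecified; this sketch takes the first.
                newFeatures.put(newFeat, value);
                return;
            }
        }
        // No non-null value was found, so newFeat is deliberately left unset.
    }
}
```

Because the choice among several candidate annotations is arbitrary, code that needs a particular one should select it explicitly, which is the case for using Java on the RHS.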
This rule can match a sequence consisting of only one Token whose category feature (POS tag) starts with NN; in this case the :det binding is null and the :adjs binding is an empty annotation set, and both of them are silently ignored when the RHS of the rule is executed.
Macro: UNDERSCORES_OKAY   // separate
:match                    // lines
{
  AnnotationSet matchedAnns = bindings.get("match");

  int begOffset = matchedAnns.firstNode().getOffset().intValue();
  int endOffset = matchedAnns.lastNode().getOffset().intValue();
  String mydocContent = doc.getContent().toString();
  String matchedString = mydocContent.substring(begOffset, endOffset);

  FeatureMap newFeatures = Factory.newFeatureMap();

  if(matchedString.equals("Spanish")) {
    newFeatures.put("myrule", "Lower");
  }
  else {
    newFeatures.put("myrule", "Upper");
  }
}
Rule: Lower
(
 ({Token.string == "Spanish"})
:match)-->UNDERSCORES_OKAY // no label here, only macro name

Rule: Upper
(
 ({Token.string == "SPANISH"})
:match)-->UNDERSCORES_OKAY // no label here, only macro name
8.4
Use of Priority
Each grammar has one of 5 possible control styles: 'brill', 'all', 'first', 'once' and 'appelt'. This is specified at the beginning of the grammar. If no control style is specified, the default is brill, but we would recommend always specifying a control style for sake of clarity.

The Brill style means that when more than one rule matches the same region of the document, they are all fired. The result of this is that a segment of text could be allocated more than one entity type, and that no priority ordering is necessary. Brill will execute all matching rules starting from a given position and will advance and continue matching from the position in the document where the longest match finishes.

The 'all' style is similar to Brill, in that it will also execute all matching rules, but the matching will continue from the next offset to the current one.

For example, where [] are annotations of type Ann
[aaa[bbb]] [ccc[ddd]]
then a rule matching {Ann} and creating {Ann-2} for the same spans will generate:
BRILL: [aaabbb] [cccddd]
ALL: [aaa[bbb]] [ccc[ddd]]
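The difference in how the two styles advance through the document can be reproduced with a small simulation. This is a simplified model under stated assumptions, not the JAPE matcher: ControlStyleSketch is a hypothetical class, the only rule is one that matches a single Ann annotation, and each annotation is an int pair of start and end offsets.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how 'brill' and 'all' advance after a match (not GATE code). */
final class ControlStyleSketch {
    /** Returns the annotations for which a single-annotation rule would fire. */
    static List<int[]> fire(List<int[]> anns, int docLength, boolean brill) {
        List<int[]> fired = new ArrayList<>();
        int pos = 0;
        while (pos < docLength) {
            int longestEnd = -1;
            // Every annotation starting at the current position produces a match.
            for (int[] a : anns) {
                if (a[0] == pos) {
                    fired.add(a);
                    longestEnd = Math.max(longestEnd, a[1]);
                }
            }
            if (brill && longestEnd > pos) {
                pos = longestEnd;   // brill: resume where the longest match finishes
            } else {
                pos++;              // all (or no match here): resume at the next offset
            }
        }
        return fired;
    }
}
```

For the text "aaabbb cccddd" with annotations at offsets 0-6, 3-6, 7-13 and 10-13, this model fires on the two outer annotations under brill and on all four under 'all', matching the output shown above.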
With the 'first' style, a rule fires for the first match that's found. This makes it inappropriate for rules that end in '+' or '?' or '*'. Once a match is found the rule is fired; it does not attempt to get a longer match (as the other two styles do).

With the 'once' style, once a rule has fired, the whole JAPE phase exits after the first match.

With the appelt style, only one rule can be fired for the same region of text, according to a set of priority rules. Priority operates in the following way.

1. From all the rules that match a region of the document starting at some point X, the one which matches the longest region is fired.
2. If more than one rule matches the same region, the one with the highest priority is fired.
3. If there is more than one rule with the same priority, the one defined earlier in the grammar is fired.

An optional priority declaration is associated with each rule, which should be a positive integer. The higher the number, the greater the priority. By default (if the priority declaration is missing) all rules have the priority -1 (i.e. the lowest priority).

For example, the following two rules for location could potentially match the same text.
Rule: Location1
Priority: 25
(
 ({Lookup.majorType == loc_key, Lookup.minorType == pre}
  {SpaceToken})?
 {Lookup.majorType == location}
 ({SpaceToken}
  {Lookup.majorType == loc_key, Lookup.minorType == post})?
)
:locName -->
 :locName.Location = {kind = "location", rule = "Location1"}

Rule: GazLocation
Priority: 20
(
 ({Lookup.majorType == location}):location
)
--> :location.Name = {kind = "location", rule=GazLocation}
Assume we have the text 'China sea', that 'China' is defined in the gazetteer as 'location', and that sea is defined as a 'loc_key' of type 'post'. In this case, rule Location1 would apply, because it matches a longer region of text starting at the same point ('China sea', as opposed to just 'China'). Now assume we just have the text 'China'. In this case, both rules could be fired, but the priority for Location1 is highest, so it will take precedence. In this case, since both rules produce the same annotation, it is not so important which rule is fired, but this is not always the case.

One important point of which to be aware is that prioritisation only operates within a single grammar. Although we could make priority global by having all the rules in a single grammar, this is not ideal due to other considerations. Instead, we currently combine all the rules for each entity type in a single grammar. An index file (main.jape) is used to define which grammars should be used, and in which order they should be fired.

Note also that depending on the control style, firing a rule may 'consume' that part of the text, making it unavailable to be matched by other rules. This can be a problem for example if one rule uses context to make it more specific, and that context is then missed by later rules, having been consumed due to use of for example the 'Brill' control style. 'All', on the other hand, would allow it to be matched.
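The three Appelt criteria (longest match, then highest priority, then earliest definition) amount to a lexicographic comparison, which the 'China sea' example can be used to check. The sketch below is illustrative only and is not how the JAPE transducer is implemented: AppeltSketch and Candidate are hypothetical names, and each candidate is assumed to start at the same document position.

```java
import java.util.List;

/** Hypothetical model of Appelt-style rule selection (not GATE code). */
final class AppeltSketch {
    /** One rule's match at a given start position. */
    static final class Candidate {
        final String rule;
        final int matchLength;       // length of the matched region
        final int priority;          // the rule's Priority declaration
        final int declarationOrder;  // 0 for the first rule in the grammar, 1 for the next, ...
        Candidate(String rule, int matchLength, int priority, int declarationOrder) {
            this.rule = rule;
            this.matchLength = matchLength;
            this.priority = priority;
            this.declarationOrder = declarationOrder;
        }
    }

    /** Picks the single candidate that fires, or null if there are none. */
    static Candidate select(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null || beats(c, best)) {
                best = c;
            }
        }
        return best;
    }

    private static boolean beats(Candidate a, Candidate b) {
        if (a.matchLength != b.matchLength) return a.matchLength > b.matchLength; // 1. longest
        if (a.priority != b.priority) return a.priority > b.priority;             // 2. priority
        return a.declarationOrder < b.declarationOrder;                           // 3. earliest
    }
}
```

In this model Location1 wins on 'China sea' because of length alone, and wins on 'China' because its priority of 25 exceeds GazLocation's 20 once the lengths are equal.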
3. Order of rules. In the case where the above two factors do not distinguish between two rules, the order in which the rules are stated applies. Rules stated first have higher priority.

Because priority can only operate within a single grammar, this can be a problem for dealing with ambiguity issues. One solution to this is to create a temporary set of annotations in initial grammars, and then manipulate this temporary set in one or more later phases (for example, by converting temporary annotations from different phases into permanent annotations in a single final phase). See the default set of grammars for an example of this.

If two possible ways of matching are found for the same text string, a conflict can arise. Normally this is handled by the priority mechanism (test length, rule priority and finally rule precedence). If all these are equal, JAPE will simply choose a match at random and fire it. This leads to non-deterministic behaviour, which should be avoided.
8.5
A JAPE grammar consists of a set of sequential phases. The list of phases is specified (in the order in which they are to be run) in a file, conventionally named main.jape. When loading the grammar into GATE, it is only necessary to load this main file; the phases will then be loaded automatically. It is, however, possible to omit this main file, and just load the phases individually, but this is much more time-consuming. The grammar phases do not need to be located in the same directory as the main file, but if they are not, the relative path should be specified for each phase.

One of the main reasons for using a sequence of phases is that a pattern can only be used once in each phase, but it can be reused in a later phase. Combined with the fact that priority can only operate within a single grammar, this can be exploited to help deal with ambiguity issues.

The solution currently adopted is to write a grammar phase for each annotation type, or for each combination of similar annotation types, and to create temporary annotations. These temporary annotations are accessed by later grammar phases, and can be manipulated as necessary to resolve ambiguity or to merge consecutive annotations. The temporary annotations can either be removed later, or left and simply ignored.

Generally, annotations about which we are more certain are created earlier on. Annotations which are more dubious may be created temporarily, and then manipulated by later phases as more information becomes available.

An annotation generated in one phase can be referred to in a later phase, in exactly the same way as any other kind of annotation (by specifying the name of the annotation within curly braces). The features and values can be referred to or omitted, as with all other annotations.
If the Input specification is used in the grammar, make sure that the annotation to be referred to is included in the list.
8.6
The RHS of a JAPE rule can consist of any Java code. This is useful for removing temporary annotations and for percolating and manipulating features from previous annotations.

The first rule below matches a first person name, e.g. 'Fred', and adds a gender feature depending on the value of the minorType from the gazetteer list in which the name was found. We first get the bindings associated with the person label (i.e. the Lookup annotation). We then create a new annotation called 'personAnn' which contains this annotation, and create a new FeatureMap to enable us to add features. Then we get the minorType feature (and its value) from the personAnn annotation (in this case, the feature will be 'gender' and the value will be 'male'), and add this value to a new feature called 'gender'. We create another feature 'rule' with value 'FirstName'. Finally, we add all the features to a new annotation 'FirstPerson' which attaches to the same nodes as the original 'person' binding.

Note that inputAS and outputAS represent the input and output annotation set. Normally, these would be the same (by default when using ANNIE, these will be the 'Default' annotation set). Since the user is at liberty to change the input and output annotation sets in the parameters of the JAPE transducer at runtime, it cannot be guaranteed that the input and output annotation sets will be the same, and therefore we must specify the annotation set we are referring to.
Rule: FirstName
(
 {Lookup.majorType == person_first}
):person
-->
{
  AnnotationSet person = bindings.get("person");
  Annotation personAnn = person.iterator().next();
  FeatureMap features = Factory.newFeatureMap();
  features.put("gender", personAnn.getFeatures().get("minorType"));
  features.put("rule", "FirstName");
  outputAS.add(person.firstNode(), person.lastNode(), "FirstPerson",
               features);
}
The second rule (contained in a subsequent grammar phase) makes use of annotations produced by the first rule described above. Instead of percolating the minorType from the annotation produced by the gazetteer lookup, this time it percolates the feature from the annotation produced by the previous grammar rule. So here it gets the 'gender' feature value from the 'FirstPerson' annotation, and adds it to a new feature (again called 'gender' for convenience), which is added to the new annotation (in outputAS) 'TempPerson'. At the end of this rule, the existing input annotations (from inputAS) are removed because they are no longer needed. Note that in the previous rule, the existing annotations were not removed, because it is possible they might be needed later on in another grammar phase.
Rule: GazPersonFirst
(
 {FirstPerson}
)
:person
-->
{
  AnnotationSet person = bindings.get("person");
  Annotation personAnn = person.iterator().next();
  FeatureMap features = Factory.newFeatureMap();

  features.put("gender", personAnn.getFeatures().get("gender"));
  features.put("rule", "GazPersonFirst");
  outputAS.add(person.firstNode(), person.lastNode(), "TempPerson",
               features);
  inputAS.removeAll(person);
}
You can combine Java blocks and normal assignments (separating each block or assignment from the next with a comma), so the above RHS could be more simply expressed as
-->
:person.TempPerson = { gender = :person.FirstPerson.gender,
                       rule = "GazPersonFirst" },
{
  inputAS.removeAll(bindings.get("person"));
}
gender of the title in preference to the gender of the first name, if it is present. So, on the RHS, we first look for the gender of the title by getting all Title annotations which have a gender feature attached. If a gender feature is present, we add the value of this feature to a new gender feature on the Person annotation we are going to create. If no gender feature is present, we look for the gender of the first name by getting all firstPerson annotations which have a gender feature attached, and adding the value of this feature to a new gender feature on the Person annotation we are going to create. If there is no firstPerson annotation and the title has no gender information, then we simply create the Person annotation with no gender feature.
Rule: PersonTitle
Priority: 35
/* allows Mr. Jones, Mr Fred Jones etc. */
(
 (TITLE)
 (FIRSTNAME | FIRSTNAMEAMBIG | INITIALS2)*
 (PREFIX)?
 {Upper}
 ({Upper})?
 (PERSONENDING)?
)
:person -->
{
  FeatureMap features = Factory.newFeatureMap();
  AnnotationSet personSet = bindings.get("person");

  // get all Title annotations that have a gender feature
  HashSet fNames = new HashSet();
  fNames.add("gender");
  AnnotationSet personTitle = personSet.get("Title", fNames);

  // if the gender feature exists
  if (personTitle != null && personTitle.size()>0)
  {
    Annotation personAnn = personTitle.iterator().next();
    features.put("gender", personAnn.getFeatures().get("gender"));
  }
  else
  {
    // get all firstPerson annotations that have a gender feature
    AnnotationSet firstPerson = personSet.get("FirstPerson", fNames);
    if (firstPerson != null && firstPerson.size()>0)
    // create a new gender feature and add the value from firstPerson
Priority:100
({Organization} | {Person} | {Location}):entity
-->
{
  //get the annotation set
  AnnotationSet annSet = bindings.get("entity");

  //get the only annotation from the set
  Annotation entityAnn = annSet.iterator().next();

  AnnotationSet tokenAS = inputAS.get("Token",
      entityAnn.getStartNode().getOffset(),
      entityAnn.getEndNode().getOffset());
  List<Annotation> tokens = new ArrayList<Annotation>(tokenAS);
  //if no tokens to match, do nothing
  if (tokens.isEmpty())
    return;
  Collections.sort(tokens, new gate.util.OffsetComparator());

  Annotation curToken=null;
  for (int i=0; i < tokens.size(); i++)
  {
A label :<label> on a Java block creates a local variable <label>Annots within the Java block which is the AnnotationSet bound to the <label> label. Also, the Java code in the block is only executed if there is at least one annotation bound to the label, so you do not need to check this condition in your own code. Of course, if you need more flexibility, e.g. to perform some action in the case where the label is not bound, you will need to use an unlabelled block and perform the bindings.get() yourself.
method doit and will work in context of this method. When a particular rule is fired, the method doit will be executed.

Method doit is specified by the interface gate.jape.RhsAction. Each action class implements this interface and is generated with roughly the following template:
import java.io.*;
import java.util.*;
import gate.*;
import gate.jape.*;
import gate.creole.ontology.*;
import gate.annotation.*;
import gate.util.*;

class <AutogeneratedActionClassName>
    implements java.io.Serializable, gate.jape.RhsAction {
  private ActionContext ctx;
  public ActionContext getActionContext() { ... }
  public String ruleName() { .. }
  public String phaseName() { .. }
  public void doit(
      gate.Document doc,
      java.util.Map<java.lang.String, gate.AnnotationSet> bindings,
      gate.AnnotationSet annotations,
      gate.AnnotationSet inputAS,
      gate.AnnotationSet outputAS,
      gate.creole.ontology.Ontology ontology) throws JapeException {
    // your RHS Java code will be embedded here ...
  }
}
Method doit has the following parameters that can be used in RHS Java code:

  gate.Document doc - the document that is currently being processed
  java.util.Map<String, AnnotationSet> bindings - a map of binding variables where the key is the (String) name of a binding variable and the value is the (AnnotationSet) set of annotations corresponding to this binding variable (see footnote 6)
  gate.AnnotationSet annotations - Do not use this (it's a synonym for outputAS that is still used in some grammars but is now deprecated).
  gate.AnnotationSet inputAS - input annotations
  gate.AnnotationSet outputAS - output annotations
  gate.creole.ontology.Ontology ontology - the GATE transducer's ontology
6 Prior to GATE 5.2 this parameter was a plain Map
In addition, the field ctx provides the ActionContext object to the RHS code (see the ActionContext JavaDoc for more). The ActionContext object can be used to access the controller and the corpus and the name and the feature map of the processing resource.

In your Java RHS you can use short names for all Java classes that are imported by the action class (plus Java classes from the packages that are imported by default according to JVM specification: java.lang.*, java.math.*). But you need to use fully qualified Java class names for all other classes. For example:
-->
{
  // VALID line examples
  AnnotationSet as = ...
  InputStream is = ...
  java.util.logging.Logger myLogger =
    java.util.logging.Logger.getLogger("JAPELogger");
  java.sql.Statement stmt = ...

  // INVALID line examples
  Logger myLogger = Logger.getLogger("JapePhaseLogger");
  Statement stmt = ...
}
In order to add additional Java import or import static statements to the Java RHS of every rule in a JAPE grammar file, you can use the following code at the beginning of the JAPE file:
Imports: {
  import java.util.logging.Logger;
  import java.sql.*;
}
These import statements will be added to the default import statements for each action class generated for a RHS and the corresponding classes can be used in the RHS Java code without the need to use fully qualified names. A useful class to know about is gate.Utils (see the javadoc documentation for details), which provides static utility methods to simplify some common tasks that are frequently used in RHS Java code. Adding an import static gate.Utils.*; to the Imports block allows you to use these methods without any prefix, for example:
AnnotationSet lookups = bindings.get("lookup");
outputAS.add(start(lookups), end(lookups), "Person",
    featureMap("text", stringFor(doc, lookups)));
You can do the same with your own utility classes; JAPE rules can import any class available to GATE, including classes defined in a plugin.

The predefined methods ruleName() and phaseName() allow you to easily access the rule and phase name in your Java RHS.
A JAPE file can optionally also contain Java code blocks for handling the events of when the controller (pipeline) running the JAPE processing resource starts processing, finishes processing, or processing is aborted (see the JavaDoc for ControllerAwarePR for more information and warnings about using this feature). These code blocks have to be defined after any Imports: block but before the first phase in the file using the ControllerStarted:, ControllerFinished: and ControllerAborted: keywords:
1 2 3 4 5 6 7 8 9 10
/ / code to run right before the controller nishes / after all transducing
The Java code in each of these blocks can access the following predefined fields:

controller: the Controller object running this JAPE transducer.
corpus: the Corpus object on which this JAPE transducer is run, if it is run by a CorpusController, null otherwise.
ontology: the Ontology object if an Ontology LR has been specified as a runtime parameter for this JAPE transducer, null otherwise.
ctx: the ActionContext object. The method ctx.isPREnabled() can be used to find out if the PR is not disabled in a conditional controller (note that even when a PR is disabled the ControllerStarted/Finished blocks are still executed!).
throwable: inside the ControllerAborted block, the Throwable which signalled the aborting exception.

Note that these blocks are invoked even when the JAPE processing resource is disabled in a conditional pipeline. If you want to adapt or avoid the processing inside a block in case the processing resource is disabled, use the method ctx.isPREnabled() to check if the processing resource is not disabled.
8.7
The way in which grammars are designed can have a huge impact on the processing speed. Some simple tricks to keep the processing as fast as possible are:

... use ({Token})[0,3] if you can predict that you won't need to recognise a string of Tokens longer than 3. Using * and + on very common annotations (especially Token) is also the most common cause of out-of-memory errors in JAPE transducers.

Avoid specifying unnecessary elements such as SpaceTokens where you can. To do this, use the Input specification at the beginning of the grammar to stipulate the annotations that need to be considered. If no Input specification is used, all annotations will be considered (so, for example, you cannot match two tokens separated by a space unless you specify the SpaceToken in the pattern). If, however, you specify Tokens but not SpaceTokens in the Input, SpaceTokens do not have to be mentioned in the pattern to be recognised. If, for example, there is only one rule in a phase that requires SpaceTokens to be specified, it may be judicious to move that rule to a separate phase where the SpaceToken can be specified as Input.

Avoid the shorthand syntax for copying feature values (newFeat = :bind.Type.oldFeat), particularly if you need to copy multiple features from the left to the right hand side of your rule.
8.8
GATE supports two different methods for ontology aware grammar transduction. Firstly, it is possible to use the ontology feature both in grammars and annotations, while using the default transducer. Secondly, it is possible to use an ontology aware transducer by passing an ontology language resource to one of the subsumes methods in SimpleFeatureMapImpl. This second strategy does not check for ontology features, which makes the writing of grammars easier, as there is no need to specify the ontology when writing them. More information about the ontology-aware transducer can be found in Section 14.10.
8.9
JAPE grammars are written as files with the extension '.jape', which are parsed and compiled at run-time to execute them over the GATE document(s). Serialization of the JAPE Transducer adds the capability to serialize such grammar files and use them later to bootstrap new JAPE transducers, where they do not need the original JAPE grammar file. This allows people to distribute the serialized version of their grammars without disclosing the actual contents of their jape files. This is implemented as part of the JAPE Transducer PR. The following sections describe how to serialize and deserialize them.
8.10
In June 2008, the standard JAPE transducer implementation gained a number of features inspired by Luc Plamondon's 'Montreal Transducer', which was available as a GATE plugin for several years, and was made obsolete in Version 5.1. If you have existing Montreal Transducer grammars and want to update them to work with the standard JAPE implementation you should be aware of the following differences in behaviour:

Quantifiers (*, + and ?) in the Montreal transducer are always greedy, but this is not necessarily the case in standard JAPE.

The Montreal Transducer defines {Type.feature != value} to be the same as {!Type.feature == value} (and likewise the !~ operator in terms of =~). In standard JAPE these constructs have different semantics. {Type.feature != value} will only match if there is a Type annotation whose feature feature does not have the given value, and if it matches it will bind the single Type annotation. {!Type.feature == value} will match if there is no Type annotation at a given place with this feature (including when there is no Type annotation at all), and if it matches it will bind every other annotation that starts at that location. If you have used != in your Montreal grammars and want them to continue to behave the same way you must change them to use the prefix-! form instead (see Section 8.1.11).
8.11
JAPE Plus
Version 7.0 of GATE Developer/Embedded saw the introduction of the JAPE_Plus plugin, which includes a new JAPE execution engine, in the form of the JAPE-Plus Transducer. The JAPE-Plus Transducer should be a drop-in replacement for the standard JAPE Transducer: it accepts the same language (i.e. JAPE grammars) and it has a similar set of parameters. The JAPE-Plus Transducer includes a series of optimisations designed to speed up the execution:

FSM Minimisation: the finite state machine used internally to represent the JAPE grammars is minimised, reducing the number of tests that need to be performed at execution time.

Annotation Graph Indexing: JAPE Plus uses a special data structure for holding input annotations which is optimised for the types of tests performed during the execution of JAPE grammars.

Predicate Caching: JAPE pattern elements are converted into atomic predicates, i.e. tests that cannot be further sub-divided (such as testing if the value of a given annotation feature has a certain value). The truth value for all predicates for each input annotation is cached once calculated, using dynamic-programming techniques. This avoids the same test being evaluated multiple times for the same annotation.

Compilation of the State Machine: the finite state machine used during matching is converted into Java code that is then compiled on the fly. This allows the inlining of constants and the unwinding of execution loops. Additionally, the Java JIT optimisations can also apply in this set-up.
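The predicate caching idea can be illustrated with a small, self-contained sketch. It is not JAPE Plus code: the class name PredicateCacheSketch, the use of integer ids to stand in for predicates and annotations, and the evaluation counter are all assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical illustration of predicate caching; not part of GATE. */
final class PredicateCacheSketch {
    // Atomic tests, indexed by predicate id (an assumption of this sketch).
    private final IntPredicate[] predicates;
    // Cache keyed by (predicate id, annotation id) packed into one long.
    private final Map<Long, Boolean> cache = new HashMap<>();
    // Counts how many times a predicate was really evaluated.
    private int evaluations = 0;

    PredicateCacheSketch(IntPredicate... predicates) {
        this.predicates = predicates;
    }

    /** Returns the truth value of a predicate for an annotation, computing it at most once. */
    boolean test(int predicateId, int annotationId) {
        long key = ((long) predicateId << 32) | (annotationId & 0xffffffffL);
        Boolean cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        evaluations++;
        boolean result = predicates[predicateId].test(annotationId);
        cache.put(key, result);
        return result;
    }

    int evaluations() {
        return evaluations;
    }

    public static void main(String[] args) {
        // Predicate 0 stands in for a test such as "feature has a certain value".
        PredicateCacheSketch cache = new PredicateCacheSketch(id -> id % 2 == 0);
        cache.test(0, 4);
        cache.test(0, 4); // answered from the cache, no second evaluation
        System.out.println("evaluations = " + cache.evaluations());
    }
}
```

The second call with the same predicate and annotation is served from the map, which is the saving the optimisation aims for when many rules share the same atomic test.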
There are a few small differences in the behaviour of JAPE and JAPE Plus:

JAPE Plus behaves in a more deterministic fashion. There are cases where multiple paths inside the annotation graph can be matched with the same precedence, e.g. when the same JAPE rule matches different sets of annotations using different branches of a disjunction in the rule. In such situations, the standard JAPE engine will pick one of the possible paths at random and apply the rule using it. Separate executions of the same grammar over the same document can thus lead to different results. By contrast, JAPE Plus will always choose the same matching set of annotations. It is however not possible to know a priori which one will be chosen, unless the rules are re-written to remove the ambiguity (a solution which is also possible with the standard JAPE engine).

JAPE Plus is capable of matching zero-length annotations, i.e. annotations for which the start and end offsets are the same, so they cover no document text. The standard JAPE engine simply ignores such annotations, while JAPE Plus allows their use in rules. This can be useful in matching annotations converted from the original markup; for example, HTML <br> tags will never have any text content.
Figure 8.1: JAPE and JAPE Plus execution speed for document length

It is not possible to accurately quantify the speed differential between JAPE and JAPE Plus in the general case, as that depends on the complexity of the JAPE grammars used and of the input documents. To get one useful data point we performed an experiment where we processed just over 8,000 web pages from the BBC News web site, with the ANNIE NE grammars, using both JAPE and JAPE Plus. On average the execution speed was 4 times faster when using JAPE Plus. The smallest speed differential was 1 (i.e. JAPE Plus was as fast as JAPE), the highest was 9 times faster. Figure 8.1 plots the execution speed for both engines against document length. As can be seen, JAPE Plus is consistently faster on all document sizes. Figure 8.2 includes a histogram showing the number of documents for each speed differential. For the vast majority of documents, JAPE Plus was 3 times or more faster than JAPE.
ANNIC: ANNotations-In-Context
SSD has an advanced graphical interface that allows users to issue queries over the SSD. Below we explain the parameters required by SSD and how to instantiate it, how to use its graphical interface and how to use SSD programmatically.
9.1
Instantiating SSD
Steps:
1. In GATE Developer, right click on 'Datastores' and select 'Create Datastore'.
2. From the drop-down list select 'Lucene Based Searchable DataStore'.
3. Here, you will see a file dialog. Please select an empty folder for your datastore. This is similar to the procedure for creating a serial datastore.
4. After this, you will see an input window. Please provide these parameters:
(a) DataStore URL: This is the URL of the datastore folder selected in the previous step.
(b) Index Location: By default, the location of the index is calculated from the datastore location, by appending '-index' to the datastore location. If the user wants to change this location, it is possible to do so by clicking on the folder icon and selecting another empty folder. If the selected folder exists already, the system will check if it is an empty folder. If the selected folder does not exist, the system tries to create it.
(c) Annotation Sets: Here, you can provide one or more annotation sets that you wish to index or exclude from being indexed. By default, the default annotation set and the 'Key' annotation set are included. The user can change this selection by clicking on the edit list icon and removing or adding appropriate annotation set names. In order to be able to re-add the default annotation set, you must click on the edit list icon and add an empty field to the list. If there are no annotation sets provided, all the annotation sets in all documents are indexed.
(d) Base-Token Type: (e.g. Token or Key.Token) These are the basic tokens of any document. Your documents must have the annotations of Base-Token-Type in order to get indexed. These basic tokens are used for displaying contextual information while searching patterns in the corpus. When indexing more than one annotation set, the user can specify the annotation set from which the tokens should be taken (e.g. Key.Token: annotations of type Token from the annotation set called Key). If the user does not provide any annotation set name (e.g. Token), the system searches in all the annotation sets to be indexed, and the base-tokens from the first annotation set with the base token annotations are taken. Please note that documents with no base-tokens are not indexed. However, if the 'create tokens automatically' option is selected, the SSD creates base-tokens automatically. Here, each string delimited with white space is considered as a token.
(e) Index Unit Type: (e.g. Sentence, Key.Sentence) This specifies the unit of the index. In other words, annotations lying within the boundaries of these annotations are indexed (e.g. in the case of 'Sentences', no annotations that span the boundaries of two sentences are considered for indexing). The user can specify from which annotation set the index unit annotations should be considered. If the user does not provide any annotation set, the SSD searches among all annotation sets for index units. If this field is left empty or the SSD fails to locate index units, the entire document is considered as a single unit.
(f) Features: Finally, users can specify the annotation types and features that should be indexed or excluded from being indexed (e.g. SpaceToken and Split). If the user wants to exclude only a specific feature of a specific annotation type, he/she can specify it using a '.' separator between the annotation type and its feature (e.g. Person.matches).
5. Click OK. If all parameters are OK, a new empty SSD will be created.
6. Create an empty corpus and save it to the SSD.
7. Populate it with some documents. Each document added to the corpus and eventually to the SSD is indexed automatically. If the document does not have the required annotations, that document is skipped and not indexed.

SSDs are portable and can be moved across different systems. However, the relative positions of both the datastore folder and the respective index folder must be maintained. If it is not possible to maintain the relative positions, the new location of the index must be specified inside the '__GATE_SerialDataStore__' file inside the datastore folder.
9.2
Search GUI
9.2.1 Overview
Figure 9.1 shows the search GUI for a datastore. The top section contains a text area to write a query, lists to select the corpus and annotation set to search in, sliders to set the size of the results and context, and icons to execute and clear the query. The central section shows a graphical visualisation of stacked annotations and feature values for the result row selected in the bottom results table. There is a configuration window where you define which annotation type and feature to display in the central section.

The bottom section contains the results table of the query, i.e. the text that matches the query with its left and right contexts. The bottom section also contains a tabbed pane of statistics.
JAPE patterns also support the | (OR) operator. For instance, {A} ({B} | {C}) is a pattern of two annotations where the first is an annotation of type A followed by an annotation of type either B or C. ANNIC supports two operators, + and *, to specify the number of times a particular annotation or a sub pattern should appear in the main query pattern. Here, ({A})+n means one and up to n occurrences of annotation {A} and ({A})*n means zero or up to n occurrences of annotation {A}.

Below we explain the steps to search in SSD.

1. Double click on SSD. You will see an extra tab 'Lucene DataStore Searcher'. Click on it to activate the searcher GUI.
2. Here you can specify a query to search in your SSD. The query here is the L.H.S. part of the JAPE grammar. Here are some examples:
(a) {Person}: This will return annotations of type Person from the SSD.
(b) {Token.string == Microsoft}: This will return all occurrences of Microsoft from the SSD.
(c) {Person}({Token})*2{Organization}: Person followed by zero or up to two tokens followed by Organization.
(d) {Token.orth==upperInitial, Organization}: Token with feature orth with value set to upperInitial and which is also annotated as Organization.

Auto-completion, as shown in figure 9.2, is triggered for annotation types when typing '{' or ',' and for features when typing '.' after a valid annotation type. It shows only the annotation types and features related to the selected corpus and annotation set. If you right-click on an expression it will automatically select the shortest valid enclosing brace, and if you click on a selection it will offer to add quantifiers allowing the expression to appear zero, one or more times.

To execute the query, click on the magnifying glass icon, or use the Enter key or the Alt+Enter key combination. To clear the query, click on the red icon or use the Alt+Backspace key combination.

It is possible to have more than one corpus, each containing a different set of documents, stored in a single datastore. ANNIC, by providing a drop down box with a list of stored corpora, also allows searching within a specific corpus. Similarly a document can have more than one annotation set indexed, and therefore ANNIC also provides a drop down box with a list of indexed annotation sets for the selected corpus.

A large corpus can have many hits for a given query. This may take a long time to refresh the GUI and may create inconvenience while browsing through results. Therefore you can specify the number of results to retrieve. Use the Next Page of Results button to iterate through results. Due to technical complexities, it is not possible to visit a previous page. To retrieve all the results at the same time, push the results slider to the right end.
9.3
// create an instance of the datastore
LuceneDataStoreImpl ds = (LuceneDataStoreImpl)
    Factory.createDataStore("gate.persist.LuceneDataStoreImpl", dsLocation);

// set the indexing parameters
Map parameters = new HashMap();

// specify the index location
parameters.put(Constants.INDEX_LOCATION_URL, new URL(indexLocation));

// specify the base token type, and create tokens automatically
// when they are not found in the document
parameters.put(Constants.BASE_TOKEN_ANNOTATION_TYPE, "Token");
parameters.put(Constants.CREATE_TOKENS_AUTOMATICALLY, Boolean.TRUE);

// specify the index unit type
...

// specifying the annotation sets "Key" and "Default Annotation Set"
// to be indexed
List<String> setsToInclude = new ArrayList<String>();
setsToInclude.add("Key");
setsToInclude.add("<null>");
parameters.put(Constants.ANNOTATION_SETS_NAMES_TO_INCLUDE, setsToInclude);
parameters.put(Constants.ANNOTATION_SETS_NAMES_TO_EXCLUDE,
    new ArrayList<String>());

// all features should be indexed
parameters.put(Constants.FEATURES_TO_INCLUDE, new ArrayList<String>());
parameters.put(Constants.FEATURES_TO_EXCLUDE, new ArrayList<String>());

// obtain the searcher and prepare a fresh parameter map for searching
Searcher searcher = ds.getSearcher();
Map parameters = new HashMap();

// obtain the location of the index from the indexer parameters
String indexLocation = new File(((URL) ds.getIndexer().getParameters()
    .get(Constants.INDEX_LOCATION_URL)).getFile()).getAbsolutePath();
ArrayList indexLocations = new ArrayList();
indexLocations.add(indexLocation);

// the annotation set to search in
String annotationSet2SearchIn = "Key";

// set the search parameters
parameters.put(Constants.INDEX_LOCATIONS, indexLocations);
parameters.put(Constants.CORPUS_ID, corpus2SearchIn);
parameters.put(Constants.ANNOTATION_SET_ID, annotationSet2SearchIn);
parameters.put(Constants.CONTEXT_WINDOW, contextWindow);
parameters.put(Constants.NO_OF_PATTERNS, noOfPatterns);

// run the query
Hit[] hits = searcher.search(query, parameters);
GATE provides a variety of tools for automatic evaluation. The Annotation Diff tool compares two annotation sets within a document. Corpus QA extends Annotation Diff to an entire corpus. The Corpus Benchmark tool also provides functionality for comparing annotation sets over an entire corpus. Additionally, two plugins cover similar functionality; one implements inter-annotator agreement, and the other, the balanced distance metric.

These tools are particularly useful not just as a final measure of performance, but as a tool to aid system development by tracking progress and evaluating the impact of changes as they are made. Applications include evaluating the success of a machine learning or language engineering application by comparing its results to a gold standard, and also comparing annotations prepared by two human annotators to each other to ensure that the annotations are reliable.

This chapter begins by introducing the relevant concepts and metrics, before describing each of the tools in turn.
10.1
When we evaluate the performance of a processing resource such as a tokeniser, POS tagger, or a whole application, we usually have a human-authored 'gold standard' against which to compare our software. However, it is not always easy or obvious what this gold standard should be, as different people may have different opinions about what is correct. Typically, we solve this problem by using more than one human annotator, and comparing their annotations. We do this by calculating inter-annotator agreement (IAA), also known as inter-rater reliability.

IAA can be used to assess how difficult a task is. This is based on the argument that if two humans cannot come to agreement on some annotation, it is unlikely that a computer could ever do the same annotation 'correctly'. Thus, IAA can be used to find the ceiling for computer performance.

There are many possible metrics for reporting IAA, such as Cohen's Kappa, prevalence, and bias [Eugenio & Glass 04]. Kappa is the best metric for IAA when all the annotators have identical exhaustive sets of questions on which they might agree or disagree. In other words, it is a classification task. This could be a task like 'are these names male or female names'. However, sometimes there is disagreement about the set of questions, e.g. when the annotators themselves determine which text spans they ought to annotate, such as in named entity extraction. That could be a task like 'read over this text and mark up all references to politics'. When annotators determine their own sets of questions, it is appropriate to use precision, recall, and F-measure to report IAA. Precision, recall and F-measure are also appropriate choices when assessing performance of an automated application against a trusted gold standard.

In this section, we will first introduce some relevant terms, before outlining Cohen's Kappa and similar measures, in Section 10.1.2. We will then introduce precision, recall and F-measure in Section 10.1.3.
Coextensive: Two annotations are coextensive if they hit the same span of text in a document. Basically, both their start and end offsets are equal.

Overlaps: Two annotations overlap if they share a common span of text.

Compatible: Two annotations are compatible if they are coextensive and if the features of one (usually the ones from the key) are included in the features of the other (usually the response).

Missing: This applies only to the key annotations. A key annotation is missing if either ...

Spurious: This applies only to the response annotations. A response annotation is spurious if either it is not coextensive or overlapping, or if one or more features from the key are not included in the response annotation.
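                          Annotator-2
                          cat1    cat2    marginal sum
Annotator-1   cat1        a       b       a+b
              cat2        c       d       c+d
marginal sum              a+c     b+d     a+b+c+d

Table 10.1: Two annotators and two categories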
Observed agreement is the portion of the instances on which the annotators agree. For the two annotators and two categories as shown in Table 10.1, it is defined as

\[ A_o = \frac{a + d}{a + b + c + d} \tag{10.1} \]
The extension of the above formula to more than two categories is straightforward. The extension to more than two annotators is usually taken as the mean of the pair-wise agreements [Fleiss 75], which is the average agreement across all possible pairs of annotators. An alternative compares each annotator with the majority opinion of the others [Fleiss 75].

However, the observed agreement has two shortcomings. One is that a certain amount of agreement is expected by chance. The Kappa measure is a chance-corrected agreement. Another is that it sums up the agreement on all the categories, but the agreements on each category may differ. Hence the category specific agreement is needed.
Specific agreement quantifies the degree of agreement for each of the categories separately. For example, the specific agreement for the two categories listed in Table 10.1 is the following, respectively,

\[ A_{cat1} = \frac{2a}{2a + b + c}; \qquad A_{cat2} = \frac{2d}{b + c + 2d} \tag{10.2} \]
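As a quick check on equations 10.1 and 10.2, the following Java sketch computes the observed and specific agreement from the four cell counts of a two-category table. The class and method names are hypothetical and not part of the GATE API, and the sample counts a = 1, b = 2, c = 3, d = 4 are chosen only for illustration.

```java
/** Hypothetical helper for two annotators and two categories; not part of GATE. */
final class AgreementSketch {
    private AgreementSketch() {}

    /** Observed agreement: (a + d) / (a + b + c + d). */
    static double observed(double a, double b, double c, double d) {
        return (a + d) / (a + b + c + d);
    }

    /** Specific agreement on the first category: 2a / (2a + b + c). */
    static double specificCat1(double a, double b, double c) {
        return 2 * a / (2 * a + b + c);
    }

    /** Specific agreement on the second category: 2d / (b + c + 2d). */
    static double specificCat2(double b, double c, double d) {
        return 2 * d / (b + c + 2 * d);
    }

    public static void main(String[] args) {
        // Illustrative counts only: a = 1, b = 2, c = 3, d = 4.
        System.out.println("Ao     = " + observed(1, 2, 3, 4));
        System.out.println("A_cat1 = " + specificCat1(1, 2, 3));
        System.out.println("A_cat2 = " + specificCat2(2, 3, 4));
    }
}
```

With these counts the annotators agree on half the instances overall, yet agreement on the first category (2/7) is far lower than on the second (8/13), which is the kind of difference the overall figure hides.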
Kappa is defined as the observed agreement A_o minus the agreement expected by chance A_e, and is normalized as a number between -1 and 1.

\[ \kappa = \frac{A_o - A_e}{1 - A_e} \tag{10.3} \]

\kappa = 1 means perfect agreement, \kappa = 0 means the agreement is equal to chance, \kappa = -1 means 'perfect' disagreement.
There are two different ways of computing the chance agreement A_e (for a detailed explanation see [Eugenio & Glass 04]; however, a quick outline will be given below). Cohen's Kappa is based on the individual distribution of each annotator, while Siegel & Castellan's Kappa is based on the assumption that all the annotators have the same distribution. The former is more informative than the latter and has been used widely.

Let us consider an example:

                          Annotator-2
                          cat1    cat2    marginal sum
Annotator-1   cat1        1       2       3
              cat2        3       4       7
marginal sum              4       6       10
Cohen's Kappa requires that the expected agreement be calculated as follows. Divide the marginal sums by the total to get the portion of the instances that each annotator allocates to each category. Multiply the annotators' proportions together to get the likelihood of chance agreement, then total these figures. Table 10.3 gives a worked example. The formula can easily be extended to more than two categories.
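The calculation just described can be sketched in a few lines of Java. This is an illustration under stated assumptions, not GATE's implementation: the class name CohenKappaSketch is hypothetical, and the input is a square matrix of counts in which m[i][j] is the number of instances that the first annotator put in category i and the second annotator put in category j. The sample matrix uses the counts from the example above.

```java
/** Hypothetical sketch of Cohen's Kappa from a square count matrix; not part of GATE. */
final class CohenKappaSketch {
    private CohenKappaSketch() {}

    /** Observed agreement: the diagonal total divided by the grand total. */
    static double observedAgreement(int[][] m) {
        double agree = 0.0;
        double total = 0.0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m.length; j++) {
                total += m[i][j];
                if (i == j) {
                    agree += m[i][j];
                }
            }
        }
        return agree / total;
    }

    /** Expected agreement: sum over categories of the product of the two marginal proportions. */
    static double expectedAgreement(int[][] m) {
        int n = m.length;
        double[] rowSum = new double[n]; // first annotator's totals per category
        double[] colSum = new double[n]; // second annotator's totals per category
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
        }
        double ae = 0.0;
        for (int k = 0; k < n; k++) {
            ae += (rowSum[k] / total) * (colSum[k] / total);
        }
        return ae;
    }

    /** Kappa: (Ao - Ae) / (1 - Ae). */
    static double kappa(int[][] m) {
        double ao = observedAgreement(m);
        double ae = expectedAgreement(m);
        return (ao - ae) / (1.0 - ae);
    }

    public static void main(String[] args) {
        int[][] example = {{1, 2}, {3, 4}};
        System.out.println("Ae    = " + expectedAgreement(example));
        System.out.println("kappa = " + kappa(example));
    }
}
```

For the example counts the expected agreement is (3/10)(4/10) + (7/10)(6/10) = 0.54, slightly above the observed agreement of 0.5, so Kappa comes out slightly negative (about -0.087).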
... 02). Siegel & Castellan's Kappa differs from Cohen's Kappa only in how the expected agreement is calculated. Table 10.4 shows a worked example. Annotator totals are added together and divided by the number of decisions to form joint proportions. These are then squared and totalled.

              cat1                 cat2                  Total
Ann-1         3                    7
Ann-2         4                    6
Sum           7                    13
Joint Prop    7/20                 13/20
JP-Squared    49/400 = 0.1225      169/400 = 0.4225      218/400 = 0.545
Table 10.4: Calculating Expected Agreement for Siegel & Castellan's Kappa (Scott's Pi)

Kappa suffers from the prevalence problem, which arises because an imbalanced distribution of categories in the data increases A_e. The prevalence problem can be alleviated by reporting the positive and negative specific agreement on each category besides the Kappa [Hripcsak & Heitjan 02, Eugenio & Glass 04]. In addition, the so-called bias problem affects Cohen's Kappa, but not S&C's. The bias problem arises when one annotator prefers one particular category more than another annotator does. [Eugenio & Glass 04] advised computing S&C's Kappa and the specific agreements along with Cohen's Kappa in order to handle these problems.

Despite the problems mentioned above, Cohen's Kappa remains a popular IAA measure. Kappa can be used for more than two annotators based on pair-wise figures, e.g. the mean of all the pair-wise Kappa values as an overall Kappa measure. Cohen's Kappa can also be extended to the case of more than two annotators by using the following single formula [Davies & Fleiss 82]

\[ \kappa = 1 - \frac{IJ^2 - \sum_i \sum_c Y_{ic}^2}{I \left( J(J-1) \sum_c \left( p_c (1 - p_c) \right) + \sum_c \sum_j (p_{cj} - p_c)^2 \right)} \tag{10.4} \]
where I and J are the number of instances and annotators, respectively; Y_{ic} is the number of annotators who assign the category c to the instance i; p_{cj} is the probability of the annotator j assigning category c; p_c is the probability of assigning category c by all annotators (i.e. averaging p_{cj} over all annotators).

Krippendorff's alpha, another variant of Kappa, differs only slightly from S&C's Kappa on nominal category problems (see [Carletta 96, Eugenio & Glass 04]).

However, note that Kappa (and the observed agreement) is not applicable to some tasks. Named entity annotation is one such task [Hripcsak & Rothschild 05]. In the named entity annotation task, annotators are given some text and are asked to annotate some named entities (and possibly their categories) in the text. Different annotators may annotate different instances of the named entity. So, if one annotator annotates one named entity in the text but another annotator does not annotate it, then that named entity is a non-entity for the latter. However, generally the non-entity in the text is not a well-defined term, e.g. we don't know how many words should be contained in the non-entity. On the other hand, if we want to compute Kappa for named entity annotation, we need the non-entities. This is why people don't compute Kappa for the named entity task.
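Returning briefly to the two-annotator case, the Siegel & Castellan calculation of Table 10.4 can be sketched in the same way. The class name ScottPiSketch and its method signatures are hypothetical, not part of GATE; the inputs are each annotator's per-category totals, and the observed agreement of 0.5 used in main is the value implied by the example counts.

```java
/** Hypothetical sketch of Siegel & Castellan's Kappa (Scott's Pi); not part of GATE. */
final class ScottPiSketch {
    private ScottPiSketch() {}

    /**
     * Expected agreement: add the two annotators' totals per category, divide by
     * the number of decisions to form joint proportions, then square and total.
     */
    static double expectedAgreement(int[] annotator1Totals, int[] annotator2Totals) {
        int decisions = 0;
        for (int k = 0; k < annotator1Totals.length; k++) {
            decisions += annotator1Totals[k] + annotator2Totals[k];
        }
        double ae = 0.0;
        for (int k = 0; k < annotator1Totals.length; k++) {
            double joint = (double) (annotator1Totals[k] + annotator2Totals[k]) / decisions;
            ae += joint * joint;
        }
        return ae;
    }

    /** Kappa: (Ao - Ae) / (1 - Ae). */
    static double kappa(double observed, double expected) {
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        double ae = expectedAgreement(new int[] {3, 7}, new int[] {4, 6});
        System.out.println("Ae    = " + ae);
        System.out.println("kappa = " + kappa(0.5, ae));
    }
}
```

The pooled expected agreement (0.545) is a little higher than the 0.54 obtained from the annotators' individual distributions, so this variant gives a slightly lower Kappa on the same data.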
Precision measures the number of correctly identified items as a percentage of the number of items identified. In other words, it measures how many of the items that the system identified were actually correct, regardless of whether it also failed to retrieve correct items. The higher the precision, the better the system is at ensuring that what is identified is correct.

Error rate is the inverse of precision, and measures the number of incorrectly identified ...

Recall measures the number of correctly identified items as a percentage of the total number of correct items. In other words, it measures how many of the items that should have been identified actually were identified, regardless of how many spurious identifications were made. The higher the recall rate, the better the system is at not missing correct items.

Clearly, there must be a tradeoff between precision and recall, for a system can easily be made to achieve 100% precision by identifying nothing (and so making no mistakes in what it identifies), or 100% recall by identifying everything (and so not missing anything). The F-measure [van Rijsbergen 79] is often used in conjunction with Precision and Recall, as a weighted average of the two.

False positives are a useful metric when dealing with a wide variety of text types, because the metric is not dependent on relative document richness in the same way that precision is. By this we mean the relative number of entities of each type to be found in a set of documents. When comparing different systems on the same document set, relative document richness is unimportant, because it is equal for all systems. When comparing a single system's performance on different documents, however, it is much more crucial, because if a particular document type has a significantly different number of any type of entity, the results for that entity type can become skewed. Compare the impact on precision of one error where the total number of correct entities = 1, and one error where the total = 100. Assuming the document length is the same, then the false positive score for each text, on the other hand, should be identical.

Common metrics for evaluation of IE systems are defined as follows:
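\[ Precision = \frac{Correct + \frac{1}{2} Partial}{Correct + Spurious + Partial} \tag{10.5} \]

\[ Recall = \frac{Correct + \frac{1}{2} Partial}{Correct + Missing + Partial} \tag{10.6} \]

\[ F\text{-}measure = \frac{(\beta^2 + 1) \, P \cdot R}{(\beta^2 \cdot P) + R} \tag{10.7} \]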
where \beta reflects the weighting of P vs. R. If \beta is set to 1, the two are weighted equally. With \beta set to 0.5, precision weights twice as much as recall. And with \beta set to 2, recall weights twice as much as precision.

\[ FalsePositive = \frac{Spurious}{c} \tag{10.8} \]

where c is some constant independent from document richness, e.g. the number of tokens or sentences in the document.

Note that we consider annotations to be partially correct if the entity type is correct and the spans are overlapping but not identical. Partially correct responses are normally allocated a half weight.
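A minimal Java sketch of these metrics follows, assuming the usual definitions in which a partially correct response counts half towards the numerator and fully towards the denominator. The class PrfSketch and its methods are hypothetical names, not the GATE API, and returning 0 when a denominator is zero is a convention chosen here for illustration.

```java
/** Hypothetical sketch of precision, recall, F-measure and false positives; not part of GATE. */
final class PrfSketch {
    private PrfSketch() {}

    /** Precision with half weight for partially correct responses. */
    static double precision(int correct, int partial, int spurious) {
        int denominator = correct + spurious + partial;
        return denominator == 0 ? 0.0 : (correct + 0.5 * partial) / denominator;
    }

    /** Recall with half weight for partially correct responses. */
    static double recall(int correct, int partial, int missing) {
        int denominator = correct + missing + partial;
        return denominator == 0 ? 0.0 : (correct + 0.5 * partial) / denominator;
    }

    /** F-measure: ((beta^2 + 1) * P * R) / (beta^2 * P + R). */
    static double fMeasure(double p, double r, double beta) {
        double b2 = beta * beta;
        double denominator = b2 * p + r;
        return denominator == 0.0 ? 0.0 : (b2 + 1.0) * p * r / denominator;
    }

    /** False positive score: spurious responses divided by a richness-independent constant c. */
    static double falsePositive(int spurious, int c) {
        return (double) spurious / c;
    }

    public static void main(String[] args) {
        // Illustrative counts only: 6 correct, 2 partial, 2 spurious, 4 missing.
        double p = precision(6, 2, 2);
        double r = recall(6, 2, 4);
        System.out.println("P  = " + p);
        System.out.println("R  = " + r);
        System.out.println("F1 = " + fMeasure(p, r, 1.0));
        System.out.println("FP = " + falsePositive(2, 100)); // c = 100 tokens, say
    }
}
```

Because precision exceeds recall in this example, lowering beta towards 0.5 raises the F-measure and raising it towards 2 lowers it, matching the weighting described above.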
The method of choice depends on the priorities of the case in question. Macro averaging tends to increase the importance of shorter documents. It is also possible to calculate a macro average across annotation types; that is to say, precision, recall and F-measure are calculated separately for each annotation type and the results then averaged.
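The difference between the two averaging methods can be seen in a small Java sketch. It assumes the usual definitions: the macro average is the mean of the per-document scores, while the micro average pools the counts of all documents and computes a single score. The class AveragingSketch is hypothetical, partial matches are ignored for brevity, and each document is represented simply as a pair {correct, spurious}.

```java
/** Hypothetical sketch contrasting micro and macro averaged precision; not part of GATE. */
final class AveragingSketch {
    private AveragingSketch() {}

    /** Precision from counts; 0 when nothing was identified (a convention assumed here). */
    static double precision(int correct, int spurious) {
        int identified = correct + spurious;
        return identified == 0 ? 0.0 : (double) correct / identified;
    }

    /** Macro average: the mean of the per-document precision values. */
    static double macroPrecision(int[][] docs) {
        if (docs.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int[] d : docs) {
            sum += precision(d[0], d[1]);
        }
        return sum / docs.length;
    }

    /** Micro average: pool all counts as if the corpus were one big document. */
    static double microPrecision(int[][] docs) {
        int correct = 0;
        int spurious = 0;
        for (int[] d : docs) {
            correct += d[0];
            spurious += d[1];
        }
        return precision(correct, spurious);
    }

    public static void main(String[] args) {
        // A short document (1 correct, 1 spurious) and a long one (90 correct, 10 spurious).
        int[][] docs = {{1, 1}, {90, 10}};
        System.out.println("macro = " + macroPrecision(docs));
        System.out.println("micro = " + microPrecision(docs));
    }
}
```

In this example the short document pulls the macro average down to 0.7, while the micro average (91/102, about 0.89) is dominated by the long document.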
10.2
The Annotation Diff tool enables two sets of annotations in one or two documents to be compared, in order either to compare a system-annotated text with a reference (hand-annotated) text, or to compare the output of two different versions of the system (or two different systems). For each annotation type, figures are generated for precision, recall and F-measure. Each of these can be calculated according to 3 different criteria: strict, lenient and average. The reason for this is to deal with partially correct responses in different ways.

The Strict measure considers all partially correct responses as incorrect (spurious).
The Lenient measure considers all partially correct responses as correct.
The Average measure allocates a half weight to partially correct responses (i.e. it takes the average of strict and lenient).

It can be accessed both from GATE Developer and from GATE Embedded. Annotation Diff compares sets of annotations with the same type. When performing the comparison, the annotation offsets and their features will be taken into consideration. After that, the comparison process is triggered. All annotations from the key set are compared with the ones from the response set, and those found to have the same start and end offsets are displayed on the same line in the table. Then, the Annotation Diff evaluates if the features of each annotation from the response set subsume those features from the key set, as specified by the feature names you provide.

To use the annotation diff tool, see Section 10.2.1. To create a gold standard, see Section 10.2.2. To compare more than two annotation sets, see Section 3.4.3.
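The three criteria differ only in the weight given to partially correct responses, which a short Java sketch makes explicit. This is an illustration for precision only, under the assumption that strict, lenient and average correspond to partial weights of 0, 1 and 0.5; the class CriteriaSketch is hypothetical and is not how Annotation Diff is implemented.

```java
/** Hypothetical sketch of strict, lenient and average precision; not part of GATE. */
final class CriteriaSketch {
    private CriteriaSketch() {}

    /** Precision where each partially correct response counts as partialWeight of a match. */
    static double precision(int correct, int partial, int spurious, double partialWeight) {
        int identified = correct + partial + spurious;
        return identified == 0 ? 0.0 : (correct + partialWeight * partial) / identified;
    }

    /** Strict: partially correct responses count as incorrect. */
    static double strict(int correct, int partial, int spurious) {
        return precision(correct, partial, spurious, 0.0);
    }

    /** Lenient: partially correct responses count as correct. */
    static double lenient(int correct, int partial, int spurious) {
        return precision(correct, partial, spurious, 1.0);
    }

    /** Average: partially correct responses get half weight. */
    static double average(int correct, int partial, int spurious) {
        return precision(correct, partial, spurious, 0.5);
    }

    public static void main(String[] args) {
        // Illustrative counts only: 6 correct, 2 partially correct, 2 spurious.
        System.out.println("strict  = " + strict(6, 2, 2));
        System.out.println("lenient = " + lenient(6, 2, 2));
        System.out.println("average = " + average(6, 2, 2));
    }
}
```

For precision, the half-weight figure is exactly the mean of the strict and lenient figures, which is why the third criterion is called the average.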
Figure 10.1: Annotation diff window with the parameters at the top, the comparison table in the center and the statistics panel at the bottom.

... documents to be used (note that both must have been previously loaded into the system), the annotation sets to be used for each, and the annotation type to be compared.

Note that the tool automatically intersects all the annotation types from the selected key annotation set with all types from the response set.

On a separate note, you can perform a diff on the same document, between two different annotation sets. One annotation set could contain the key type and another could contain the response one.

After the type has been selected, the user is required to decide how the features will be compared. It is important to know that the tool compares them by analysing if features from the key set are contained in the response set. It checks for both the feature name and feature value to be the same. There are three basic options to select:

To take 'all' the features from the key set into consideration.
To take only 'some' user selected features.
To take 'none' of the features from the key set.
The weight for the F-Measure can also be changed: by default it is set to 1.0 (i.e. to give precision and recall equal weight). Finally, click on 'Compare' to display the results. Note that the window may need to be resized manually, by dragging the window edges as appropriate.

In the main window, the key and response annotations will be displayed. They can be sorted by any category by clicking on the central column header: '=?'. The key and response annotations will be aligned if their indices are identical, and are color coded according to the legend displayed at the bottom.

Precision, recall and F-measure are also displayed below the annotation tables, each according to 3 criteria: strict, lenient and average. See Sections 10.2 and 10.1 for more details about the evaluation metrics.

The results can be saved to an HTML file by using the 'Export to HTML' button. This creates an HTML snapshot of what the Annotation Diff table shows at that moment. The columns and rows in the table will be shown in the same order, and the hidden columns will not appear in the HTML file. The colours will also be the same.

If you need more details or context you can use the button 'Show document' to display the document and the annotations selected in the annotation diff drop down lists and table.

Figure 10.2: Annotation diff window with the parameters at the top, the comparison table in the center and the adjudication panel at the bottom.

Figure 10.3: Corpus Quality Assurance showing the document statistics table
10.3
PSI
You may now choose the annotation types you are interested in. If you don't choose any then all will be used. If you wish, you may check the box 'present in every selected set' to reduce the annotation types list to only those present in every selected annotation set.

You can choose the annotation features you wish to include in the calculation. If you choose features, then for an annotation to be considered a match to another, their feature values must also match. If you select the box 'present in every selected type' the features list will be reduced to only those present in every type you selected. For the classification measures you must select only one type and one feature.

The 'Measures' list allows you to choose whether to calculate strict or lenient figures or average the two. You may choose as many as you wish, and they will be included as columns in the table to the left. The BDM measures allow a match to be accepted when the two concepts are close enough in an ontology even if their names are different. See Section 10.6.

An 'Options' button above the 'Measures' list lets you change settings such as the beta for the F-score or the BDM file.

Finally, click on the 'Compare' button to recalculate the tables. The figures that appear in the several tables (one per tab) are described below.

Micro averaging treats the entire corpus as one big document, whereas macro averaging, on this table, is the arithmetic mean of the per-type figures. See Section 10.1.4 for more detail on the distinction between micro and macro average.
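The difference between the two averages is easiest to see with numbers. The sketch below computes precision both ways from per-type counts; the class AveragingSketch and the count layout are illustrative assumptions and do not correspond to GATE classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: AveragingSketch is a hypothetical helper, not a GATE class.
class AveragingSketch {

    /**
     * Micro average: pool all counts as if the corpus were one document.
     * Each value is {correct responses, all responses} for one annotation type.
     */
    static double microPrecision(Map<String, int[]> countsByType) {
        long correct = 0;
        long responses = 0;
        for (int[] counts : countsByType.values()) {
            correct += counts[0];
            responses += counts[1];
        }
        return responses == 0 ? 0.0 : (double) correct / responses;
    }

    /** Macro average: arithmetic mean of the per-type precisions. */
    static double macroPrecision(Map<String, int[]> countsByType) {
        if (countsByType.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (int[] counts : countsByType.values()) {
            sum += counts[1] == 0 ? 0.0 : (double) counts[0] / counts[1];
        }
        return sum / countsByType.size();
    }

    public static void main(String[] args) {
        Map<String, int[]> counts = new LinkedHashMap<>();
        counts.put("Person", new int[] {90, 100}); // frequent type, precision 0.9
        counts.put("Date", new int[] {1, 4});      // rare type, precision 0.25
        System.out.println("micro = " + microPrecision(counts));
        System.out.println("macro = " + macroPrecision(counts));
    }
}
```

In this example the micro average (91/104) is dominated by the frequent type, while the macro average gives the rare type the same weight as the frequent one.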
Here bdmFileUrl is a URL to a file of the format described in Section 10.6.

Methods for computing the measures:

differ.calculateDiff(Collection key, Collection response)
classificationMeasures.calculateConfusionMatrix(AnnotationSet key, AnnotationSet response, String type, String feature, boolean verbose)
ontologyMeasures.calculateBdm(Collection<AnnotationDiffer> differs)

Set verbose to true if you want the ignored annotations to be printed on the standard output stream.

Constructors, useful for micro average. There is no need to call the calculate methods afterwards, as they must already have been called on the objects passed in:

AnnotationDiffer(Collection<AnnotationDiffer> differs)
ClassificationMeasures(Collection<ClassificationMeasures> tables)
OntologyMeasures(Collection<OntologyMeasures> measures)

Here measures is an array of String with values to choose from:

F1.0-score strict
F1.0-score lenient
F1.0-score average
F1.0-score strict BDM
F1.0-score lenient BDM
F1.0-score average BDM
Observed agreement
Cohen's Kappa
Pi's Kappa

Note that the numeric value '1.0' represents the beta coefficient in the F-score. See Section 10.1 for more information on these measures.

Method only for ClassificationMeasures:
List<List<String>> getConfusionMatrix(String title)
The following example is taken from gate.gui.CorpusQualityAssurance#compareAnnotation but has not been run, so it may need some corrections.
final int FSCORE_MEASURES = 0;
final int CLASSIFICATION_MEASURES = 1;
ArrayList<String> documentNames = new ArrayList<String>();
TreeSet<String> types = new TreeSet<String>();
Set<String> features = new HashSet<String>();
    differsByDocThenType.add(differsByType);
    differ = new AnnotationDiffer(differsByType.values());
    List<String> measuresRow;
    if (useBdm) {
      OntologyMeasures ontologyMeasures = new OntologyMeasures();
      ontologyMeasures.setBdmFile(bdmFileUrl);
      ontologyMeasures.calculateBdm(differsByType.values());
      measuresRow = ontologyMeasures.getMeasuresRow(
        measures, documentNames.get(documentNames.size() - 1));
    } else {
      measuresRow = differ.getMeasuresRow(
        measures, documentNames.get(documentNames.size() - 1));
    }
    System.out.println(Arrays.deepToString(measuresRow.toArray()));
  } else if (measuresType == CLASSIFICATION_MEASURES
          && !keys.isEmpty() && !responses.isEmpty()) {
    ClassificationMeasures classificationMeasures = new ClassificationMeasures();
    classificationMeasures.calculateConfusionMatrix(
      (AnnotationSet) keys, (AnnotationSet) responses,
      types.first(), features.iterator().next(), false);
    List<String> measuresRow = classificationMeasures.getMeasuresRow(
      measures, documentNames.get(documentNames.size() - 1));
    System.out.println(Arrays.deepToString(measuresRow.toArray()));
    List<List<String>> matrix = classificationMeasures.getConfusionMatrix(
      documentNames.get(documentNames.size() - 1));
    for (List<String> matrixRow : matrix) {
      System.out.println(Arrays.deepToString(matrixRow.toArray()));
    }
  }
// classification document table
See method gate.gui.CorpusQualityAssurance#printSummary for micro and macro average like in the Corpus Quality Assurance.
10.4
Like the Corpus Quality Assurance functionality, the corpus benchmark tool enables evaluation to be carried out over a whole corpus rather than a single document. Unlike Corpus QA, it uses matched corpora to achieve this, rather than comparing annotation sets within a corpus. It enables tracking of the system's performance over time. It provides more detailed information regarding the annotations that differ between versions of the corpus (e.g. annotations created by different versions of an application) than the Corpus QA tool does.

The basic idea with the tool is to evaluate an application with respect to a 'gold standard'. You have a 'marked' corpus containing the gold standard reference annotations; you have a 'clean' copy of the corpus that does not contain the annotations in question, and you have an application that creates the annotations in question. Now you can see how you are getting on, by comparing the result of running your application on 'clean' to the 'marked' annotations.
main: you should have a main directory containing subdirectories for your matched corpora. It does not matter what this directory is called. This is the directory you will select when the program prompts, 'Please select a directory which contains the documents to be evaluated'.

clean: make a directory called 'clean' (case-sensitive), and in it, make a copy of your corpus that does not contain the annotations that your application creates (though it may contain other annotations). The corpus benchmark tool will apply your application to this corpus, so it is important that the annotations it creates are not already present in the corpus. You can create this corpus by copying your 'marked' corpus and deleting the annotations in question from it.

marked: you should have a 'gold standard' copy of your corpus in a directory called 'marked' (case-sensitive), containing the annotations to which the program will compare those produced by your application. The idea of the corpus benchmark tool is to tell you how good your application performance is relative to this annotation set. The 'marked' corpus should contain exactly the same documents as the 'clean' set.

processed: this directory contains a third version of the corpus. This directory will be created by the tool itself, when you run 'store corpus for future evaluation'. We will explain how to do this in Section 10.4.3.
The default annotation set has to be represented by an empty string. The outputSetName and annotSetName must be different, and cannot both be the default annotation set. (If they are the same, then use the Annotation Set Transfer PR to change one of them.) If you omit any line (or just leave the value blank), that property reverts to default. For example, 'annotSetName=' is the same as leaving that line out.

An example file is shown below:
threshold=0.7
annotSetName=Key
outputSetName=ANNIE
annotTypes=Person;Organization;Location;Date;Address;Money
annotFeatures=type;gender
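As an illustration of how such a file can be read, the sketch below loads the same keys with java.util.Properties and splits the semicolon-separated lists. It is not the tool's own loading code; the class BenchmarkConfigSketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Sketch only: BenchmarkConfigSketch is a hypothetical helper, not part of GATE.
class BenchmarkConfigSketch {

    /** Parses key=value lines in the standard Java properties syntax. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** Splits a semicolon-separated value; a missing or blank value gives an empty list. */
    static List<String> list(Properties properties, String key) {
        String value = properties.getProperty(key, "").trim();
        if (value.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split(";"));
    }

    public static void main(String[] args) {
        Properties p = parse(
            "threshold=0.7\n"
            + "annotSetName=Key\n"
            + "outputSetName=ANNIE\n"
            + "annotTypes=Person;Organization;Location;Date;Address;Money\n"
            + "annotFeatures=type;gender\n");
        System.out.println(Double.parseDouble(p.getProperty("threshold")));
        System.out.println(list(p, "annotTypes"));
        // A blank value and a missing line are treated the same way here.
        System.out.println(list(parse("annotFeatures=\n"), "annotFeatures").isEmpty());
    }
}
```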
We will describe these options in a different order to that in which they appear on the menu, to facilitate explanation.

Store Corpus for Future Evaluation populates the 'processed' directory with a datastore containing the result of running your application on the 'clean' corpus. If a 'processed' directory exists, the results will be placed there; if not, one will be created. This creates a record of the current application performance. You can rerun this operation any time to update the stored set.

Human Marked Against Stored Processing Results compares the stored 'processed' set with the 'marked' set. This mode assumes you have already run 'Store corpus for future evaluation'. It performs a diff between the 'marked' directory and the 'processed' directory and prints out the metrics.

Human Marked Against Current Processing Results compares the 'marked' set with the result of running the application on the 'clean' corpus. It runs your application on the documents in the 'clean' directory, creating a temporary annotated corpus, and performs a diff with the documents in the 'marked' directory. After the metrics (recall, precision, etc.) are calculated and printed out, it deletes the temporary corpus.

Default Mode runs 'Human Marked Against Current Processing Results' and 'Human Marked Against Stored Processing Results' and compares the results of the two, showing you where things have changed between versions. This is one of the main purposes of the benchmark tool: to show the difference in performance between different versions of your application.

Once the mode has been selected, the program prompts, 'Please select a directory which contains the documents to be evaluated'. Choose the main directory containing your corpus directories. (Do not select 'clean', 'marked', or 'processed'.) Then (except in 'Human marked against stored processing results' mode) you will be prompted to select the file containing your application (e.g. an .xgapp file).

The tool can be used either in verbose or non-verbose mode, by selecting or unselecting the verbose option from the menu. In verbose mode, for any precision/recall figure below the user's pre-defined threshold (stored in the corpus_tool.properties file) the tool will show the non-coextensive annotations (and their corresponding text) for that entity type, thereby enabling the user to see where problems are occurring.
In each mode, the following statistics will be output:

1. Per-document figures, itemised by type: precision and recall, as well as detailed information about the differing annotations;
2. Summary by type ('Statistics'): correct, partially correct, missing and spurious totals, as well as whole corpus (micro-average) precision, recall and f-measure (F1), itemised by type;
3. Overall average figures: precision, recall and F1 calculated as macro-average (arithmetic average) of the individual document precisions and recalls.

In 'Default' mode, information is also provided about whether the figures have increased or decreased in comparison with the 'Marked' corpus.
10.5
The interannotator agreement plugin, 'Inter_Annotator_Agreement', computes the F-measures, namely precision, recall and F1, suitable for named entity annotations (see Section 10.1.3), and agreement, Cohen's kappa and Scott's pi, suitable for text classification tasks (see Section 10.1.2). In the latter case, a confusion matrix is also provided. In this section we describe those measures and the output results from the plugin. But first we explain how to load the plugin, and the input to and the parameters of the plugin.

First you need to load the plugin named 'Inter_Annotator_Agreement' into GATE Developer using the tool Manage CREOLE Plugins, if it is not already loaded. Then you can create a PR for the plugin from the 'IAA Computation' in the existing PR list. After that you can put the PR into a Corpus Pipeline to use it.

The IAA Computation PR differs from the Corpus Benchmark Tool in the data preparation required. As in the Corpus Benchmark Tool, the idea is to compare annotation sets, for example, prepared by different annotators, but in the IAA Computation PR, these annotation sets should be on the same set of documents. Thus, one corpus is loaded into GATE on which the PR is run. Different annotation sets contain the annotations which will be compared. These should (obviously) have different names.

It falls to the user to decide whether to use annotation type or an annotation feature as class; are two annotations considered to be in agreement because they have the same type and the same span? Or do you want to mark up your data with an annotation type such as 'Mention', thus defining the relevant annotations, then give it a 'class' feature, the value of which should be matched in order that they are considered to agree? This is a matter of convenience. For example, data from the Batch Learning PR (see Section 18.2) uses a single annotation type and a class feature. In other contexts, using annotation type might feel more natural; the annotation sets should agree about what is a 'Person', what is a 'Date' etc. It is also possible to mix the two, as you will see below.

The IAA plugin has two runtime parameters annSetsForIaa and annTypesAndFeats for specifying the annotation sets and the annotation types and features, respectively. Values should be separated by semicolons. For example, to specify annotation sets 'Ann1', 'Ann2' and 'Ann3' you should set the value of annSetsForIaa to 'Ann1;Ann2;Ann3'. Note that more than two annotation sets are possible. Specify the value of annTypesAndFeats as 'Per' to compute the IAA for the three annotation sets on the annotation type Per. You can also specify more than one annotation type and separate them by ';' too, and optionally specify an annotation feature for a type by attaching a '->' followed by feature name to the end of the annotation name. For example, 'Per->label;Org' specifies two annotation types Per and Org and also a feature name label for the type Per. If you specify an annotation feature for an annotation type, then two annotations of the same type will be regarded as being different if they have different values of that feature, even if the two annotations occupy exactly the same position in the document. On the other hand, if you do not specify any annotation feature for an annotation type, then the two annotations of the type will be regarded as the same if they occupy the same position in the document.

The parameter measureType specifies the type of measure computed. There are two measure types; the F-measure (i.e. Precision, Recall and F1), and the observed agreement and Cohen's Kappa. For classification tasks such as document or sentence classification, the observed agreement and Cohen's Kappa is often used, though the F-measure is applicable too. In these tasks, the targets are already identified, and the task is merely to classify them correctly. However, for the named entity recognition task, only the F-measure is applicable. In such tasks, finding the 'named entities' (text to be annotated) is as much a part of the task as correctly labelling it. Observed agreement and Cohen's kappa are not suitable in this case. See Section 10.1.2 for further discussion. The parameter has two values, FMEASURE and AGREEMENTANDKAPPA. The default value of the parameter is FMEASURE.

Another parameter verbosity specifies the verbosity level of the plugin's output. Level 2 displays the most detailed output, including the IAA measures on each document and the macro-averaged results over all documents. Level 1 only displays the IAA measures averaged over all documents. Level 0 does not have any output. The default value of the parameter is 1. In the following we will explain the outputs in detail.

Yet another runtime parameter bdmScoreFile specifies the URL for a file containing the BDM scores used for the BDM based IAA computation. The BDM score file should be produced by the BDM computation plugin, which is described in Section 10.6. The BDM-based IAA computation will be explained below. If the parameter is not assigned any value, or is assigned a file which is not a BDM score file, the PR will not compute the BDM based IAA.
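The annTypesAndFeats syntax described above, for example 'Per->label;Org', can be illustrated with a small parser. This is only a sketch of the syntax, not the plugin's own parsing code; the class TypesAndFeatsSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: TypesAndFeatsSketch is a hypothetical helper, not part of the IAA plugin.
class TypesAndFeatsSketch {

    /**
     * Maps each annotation type to its feature name.
     * A type given without '->' is mapped to null (no feature specified).
     */
    static Map<String, String> parse(String value) {
        Map<String, String> featureByType = new LinkedHashMap<>();
        for (String item : value.split(";")) {
            String trimmed = item.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray semicolons
            }
            int arrow = trimmed.indexOf("->");
            if (arrow < 0) {
                featureByType.put(trimmed, null);
            } else {
                String type = trimmed.substring(0, arrow).trim();
                String feature = trimmed.substring(arrow + 2).trim();
                featureByType.put(type, feature);
            }
        }
        return featureByType;
    }

    public static void main(String[] args) {
        // Type Per is compared on the feature "label"; type Org on position only.
        System.out.println(parse("Per->label;Org"));
    }
}
```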
For each document, it displays one annotation type and optionally an annotation feature if specified, and then the results for that type and that feature. Note that the IAA computations are based on the pairwise comparison of annotators. In other words, we compute the IAA for each pair of annotators. The first results for one document and one annotation type are the macro-averaged ones over all pairs of annotators, which have three numbers for the three types of IAA measures, namely Observed agreement, Cohen's kappa and Scott's pi. Then for each pair of annotators, it outputs the three types of measures, a confusion matrix (or contingency table), and the specific agreements for each label.

The labels are obtained from the annotations of that particular type. For each annotation type, if a feature is specified, then the labels are the values of that feature. Please note that two terms may be added to the label list: one is the empty one obtained from those annotations which have the annotation feature but do not have a value for the feature; the other is 'Non-cat', corresponding to those annotations not having the feature at all. If no feature is specified, then two labels are used: 'Anns' corresponding to the annotations of that type, and 'Non-cat' corresponding to those annotations which are annotated by one annotator but are not annotated by another annotator.

After displaying the results for each document, the plugin prints out the macro-averaged results over all documents. First, for each annotation type, it prints out the results for each pair of annotators, and the macro-averaged results over all pairs of annotators. Finally it prints out the macro-averaged results over all pairs of annotators, all types and all documents.

Please note that the classification problem can be evaluated using the F-measure too. If you want to evaluate a classification problem using the F-measure, you just need to set the run time parameter measureType to FMEASURE.
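To make the three agreement measures concrete, the sketch below computes them for one pair of annotators from a square confusion matrix. It assumes the standard textbook definitions: observed agreement is the diagonal mass, Cohen's kappa uses each annotator's own label distribution for the expected agreement, and Scott's pi uses the pooled distribution. The class AgreementSketch is hypothetical and is not the plugin's implementation.

```java
// Sketch only: AgreementSketch is a hypothetical helper, not part of the IAA plugin.
class AgreementSketch {

    /** Proportion of items on which both annotators chose the same label. */
    static double observedAgreement(long[][] matrix) {
        long diagonal = 0;
        long total = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix.length; j++) {
                total += matrix[i][j];
                if (i == j) {
                    diagonal += matrix[i][j];
                }
            }
        }
        return total == 0 ? 0.0 : (double) diagonal / total;
    }

    /** Expected agreement by chance; pooled = true gives Scott's variant. */
    private static double expectedAgreement(long[][] matrix, boolean pooled) {
        int k = matrix.length;
        double[] row = new double[k];
        double[] col = new double[k];
        double total = 0.0;
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                row[i] += matrix[i][j];
                col[j] += matrix[i][j];
                total += matrix[i][j];
            }
        }
        if (total == 0.0) {
            return 0.0;
        }
        double expected = 0.0;
        for (int c = 0; c < k; c++) {
            if (pooled) {
                double p = (row[c] + col[c]) / (2.0 * total);
                expected += p * p;
            } else {
                expected += (row[c] / total) * (col[c] / total);
            }
        }
        return expected;
    }

    private static double chanceCorrected(double observed, double expected) {
        // Assumed convention: report 1.0 when chance agreement is already total.
        return expected == 1.0 ? 1.0 : (observed - expected) / (1.0 - expected);
    }

    static double cohenKappa(long[][] matrix) {
        return chanceCorrected(observedAgreement(matrix), expectedAgreement(matrix, false));
    }

    static double scottPi(long[][] matrix) {
        return chanceCorrected(observedAgreement(matrix), expectedAgreement(matrix, true));
    }

    public static void main(String[] args) {
        // Rows: labels chosen by the first annotator; columns: by the second.
        long[][] matrix = {{20, 5}, {10, 15}};
        System.out.println("observed = " + observedAgreement(matrix));
        System.out.println("kappa    = " + cohenKappa(matrix));
        System.out.println("pi       = " + scottPi(matrix));
    }
}
```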
an overall measure. The computation of the F-measures (e.g. Precision, Recall and F1) are shown in Section 10.1. As noted in [Hripcsak & Rothschild 05], the F1 computed for two annotators for one specific category is equivalent to the positive specific agreement of the category.

The outputs of the IAA plugins for named entity annotation are similar to those for classification. But the outputs are the F-measures, such as Precision, Recall and F1, instead of the agreements and Kappas. It first prints out the results for each document. For one document, it prints out the results for each annotation type, macro-averaged over all pairs of annotators, then the results for each pair of annotators. In the last part, the micro-averaged results over all documents are displayed. Note that the results are reported in both the strict measure and the lenient measure, as defined in Section 10.2.

Please note that, for computing the F-measures for the named entity annotations, the IAA plugin carries out the same computation as the Corpus Benchmark tool. The IAA plugin is simpler than the Corpus benchmark tool in the sense that the former needs only one set of documents with two or more annotation sets, whereas the latter needs three sets of the same documents, one without any annotation, another with one annotation set, and the third one with another annotation set. Additionally, the IAA plugin can deal with more than two annotation sets but the Corpus benchmark tool can only deal with two annotation sets.
10.6
The BDM (balanced distance metric) measures the closeness of two concepts in an ontology or taxonomy [Maynard 05, Maynard et al. 06]. It is a real number between 0 and 1. The closer the two concepts are in an ontology, the greater their BDM score is. For detailed explanation about the BDM, see the papers [Maynard 05, Maynard et al. 06]. The BDM can be seen as an improved version of the learning accuracy [Cimiano et al. 03]. It is dependent on the length of the shortest path connecting the two concepts and also the deepness of the two concepts in ontology. It is also normalised with the size of ontology and also takes into account the concept density of the area containing the two involved concepts.

The BDM has been used to evaluate the ontology based information extraction (OBIE) system [Maynard et al. 06]. The OBIE identifies the instances for the concepts of an ontology. It's possible that an OBIE system identifies an instance successfully but does not assign it the correct concept. Instead it assigns the instance a concept being close to the correct one. For example, the entity 'London' is an instance of the concept Capital, and an OBIE system assigns it the concept City which is close to the concept Capital in some ontology. In that case the OBIE should obtain some credit according to the closeness of the two concepts. That is where the BDM can be used. The BDM has also been used to evaluate the hierarchical classification system [Li et al. 07]. It can also be used for ontology learning and alignment.

The BDM computation plugin computes BDM score for each pair of concepts in an ontology. It has two run time parameters:

- ontology: its value should be the ontology that one wants to compute the BDM scores for.
- outputBDMFile: its value is the URL of a file which will store the BDM scores computed.

The plugin has the name Ontology_BDM_Computation and the corresponding processing resource's name is BDM Computation PR. The PR can be put into a Pipeline. If it is put into a Corpus Pipeline, the corpus used should contain at least one document.

The BDM computation used the formula given in [Maynard et al. 06]. The resulting file specified by the runtime parameter outputBDMFile contains the BDM scores. It is a text file. The first line of the file gives some meta information such as the name of ontology used for BDM computation. From the second line of the file, each line corresponds to one pair of concepts. One line is like
key=Service, response=Object, bdm=0.6617647, msca=Object, cp=1, dpk=1, dpr=0, n0=2.0, n1=2.0, n2=2.8333333, bran=1.9565217
It first shows the names of the two concepts (one as key and another as response), and the BDM score, and then other parameters' values used for the computation. Note that, since the BDM is symmetric for the two concepts, the resulting file contains only one line for each pair. So if you want to look for the BDM score for one pair of concepts, you can choose one as key and another as response. If you cannot find the line for the pair, you have to change the order of the two concepts and search the file again.
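The lookup procedure just described (try one order, then the other) can be sketched as follows. The parser assumes the comma-separated name=value layout of the example line and that concept names contain no commas; the class BdmFileSketch is hypothetical and is not GATE's own reader.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: BdmFileSketch is a hypothetical helper, not a GATE class.
class BdmFileSketch {

    private final Map<String, Double> scores = new HashMap<>();

    /** Reads one data line such as "key=Service, response=Object, bdm=0.66, ...". */
    void addLine(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String part : line.split(",")) {
            int equals = part.indexOf('=');
            if (equals > 0) {
                fields.put(part.substring(0, equals).trim(), part.substring(equals + 1).trim());
            }
        }
        String key = fields.get("key");
        String response = fields.get("response");
        String bdm = fields.get("bdm");
        // Lines without all three fields (e.g. the header line) are skipped.
        if (key != null && response != null && bdm != null) {
            scores.put(pair(key, response), Double.valueOf(bdm));
        }
    }

    /** Looks the pair up in the given order first, then in the reverse order. */
    Double lookup(String conceptA, String conceptB) {
        Double score = scores.get(pair(conceptA, conceptB));
        return score != null ? score : scores.get(pair(conceptB, conceptA));
    }

    private static String pair(String first, String second) {
        return first + "\t" + second;
    }

    public static void main(String[] args) {
        BdmFileSketch file = new BdmFileSketch();
        file.addLine("key=Service, response=Object, bdm=0.6617647, msca=Object, "
            + "cp=1, dpk=1, dpr=0, n0=2.0, n1=2.0, n2=2.8333333, bran=1.9565217");
        // Only one line exists per pair, so the reverse order must also succeed.
        System.out.println(file.lookup("Object", "Service"));
    }
}
```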
10.7
When documents are annotated using Teamware, anonymous annotation sets are created for the annotating annotators. This makes it impossible to run Quality Assurance on such documents as annotation sets with same names in different documents may refer to the annotations created by different annotators. This is especially the case when a requirement is to compute Inter Annotator Agreement (IAA). The QA Summariser for Teamware PR generates a summary of agreements among annotators. It does this by pairing individual annotators involved in the annotation task. It also compares annotations of each individual annotator with those available in the consensus annotation set in the respective documents.

The PR is available from the Teamware_Tools plugin. It internally uses the QualityAssurancePR to calculate agreement statistics. The user has to provide the following run-time parameters:

- annotationTypes: annotation types for which the IAA has to be computed.
- featureNames: features of annotations that should be used in IAA computations. If no value is provided, only annotation boundaries for same annotation types are compared.
- measure: one of the six pre-defined measures: F1_STRICT, F1_AVERAGE, F1_LENIENT, F05_STRICT, F05_AVERAGE and F05_LENIENT.
- outputFolderUrl: the PR produces a summary in this folder. More information on the generated file is provided below.

The PR generates an index.html file in the output folder. This html file contains a table that summarises the agreement statistics. Both the first row and the first column contain names of annotators who were involved in the annotation task. For each pair of annotators who did the annotations together on at least one document, both the micro and macro averages are produced. The last two columns in each row give average macro and micro agreements of the respective annotator with all the other annotators he or she did annotations together.
These figures are color coded. The color green is used for a cell background to indicate full agreement (i.e. 1.0). The background color becomes lighter as the agreement reduces towards 0.5. At 0.5 agreement, the background color of a cell is fully white. From 0.5 downwards, the color red is used and as the agreement reduces further, the color becomes darker with dark red at 0.0 agreement. Use of such color coding makes it easy for the user to get an idea of how annotators are performing and to locate specific pairs of annotators who need more training, or maybe someone who deserves a pat on his/her back.

For each pair of annotators, the summary table provides a link (with caption document) to another html document that summarises annotations of the two respective annotators on a per document basis. The details include the number of annotations they agreed and disagreed on and the scores for recall, precision and f-measure. Each document name in this summary is linked with another html document with an in-depth comparison of annotations. The user can actually see the annotations on which the annotators had agreed and disagreed.
This is a reporting tool for GATE processing resources. It reports the total time taken by processing resources and the time taken for each document to be processed by an application of type corpus pipeline.

GATE uses log4j, a logging system, to write profiling information to a file. The GATE profiling reporting tool uses the file generated by log4j and produces a report on the processing resources. It profiles JAPE grammars at the rule level, enabling the user to precisely identify the performance bottlenecks. It also produces a report on the time taken to process each document, to find problematic documents.

The initial code for the reporting tool was written by Intelius employees Andrew Borthwick and Chirag Viradiya and generously released under the LGPL licence to be part of GATE.
11.1.1 Features
Ability to generate the following two reports

- Sort order by time or by execution.
- Show or hide processing elements which took 0 milliseconds.
- Generate HTML report with a collapsible tree.

Report on documents processed specific features

- Limit the number of document to show from the most time consuming.
- Filter the PR to display statistics for.

Features common to both reports

- Generate report as indented text or in HTML format.
- Generate report only on the log entries from the last logical run of GATE.
- All processing times are reported in milliseconds and in terms of percentage (rounded to nearest 0.1%) of total time.
- Command line interface and API.
- Detect if the benchmark.txt file is modified while generating the report.
11.1.2 Limitations
Be aware that the profiling only supports applications of type corpus pipeline. There is indeed no interest in profiling a non corpus pipeline that works on one or no document at all. To get meaningful results you should run your corpus pipeline on at least 10 documents.
11.2
The activation of the profiling and the creation of profiling reports are accessible from the 'Tools' menu in GATE with the submenu 'Profiling Reports'.

You can 'Start Profiling Applications' and 'Stop Profiling Applications' at any time. The logging is cumulative so if you want to get a new report you must use the 'Clear Profiling History' menu item when the profiling is stopped.

Be very careful that you must start the profiling before you load your application or you will need to reload every Processing Resource that uses a Transducer. Otherwise you will get an Exception similar to:
java.lang.IndexOutOfBoundsException: Index: 2, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at gate.jape.SinglePhaseTransducer.updateRuleTime(SinglePhaseTransducer.java:678)
Two types of reports are available: 'Report on Processing Resources' and 'Report on Documents Processed'. See the previous section for more information.
11.3
tions
Options:

-i input file path (default: benchmark.txt in the user's .gate directory; see footnote 2)
-m print media - html/text (default: html)
-d number of docs, use -1 for all docs (default: 10 docs)
-p processing resource name to be matched (default: all_prs)
-o output file path (default: report.html/txt in the system temporary directory)
-l logical start (not set by default)
-h show help

Examples

Run report 1: Report on total time taken by each processing element across corpus

java -cp "gate/bin:gate/lib/GnuGetOpt.jar" gate.util.reporting.PRTimeReporter -i benchmark.txt -o report.txt -m text

Run report 2: Report on time taken by document within given corpus.

java -cp "gate/bin:gate/lib/GnuGetOpt.jar" gate.util.reporting.DocTimeReporter -i benchmark.txt -o report.html -m html
11.4
11.4.1 Log4j.properties
This is required to direct the profiling information to the benchmark.txt file. The benchmark.txt generated by GATE will be used as input for the GATE profiling report tool.

# File appender that outputs only benchmark messages
log4j.appender.benchmarklog=org.apache.log4j.RollingFileAppender
log4j.appender.benchmarklog.Threshold=DEBUG
log4j.appender.benchmarklog.File=${user.home}/.gate/benchmark.txt
2 GATE versions up to 5.2 placed benchmark.txt in the execution directory.
with the timestamp being the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC.

Example:
1257269774770 START Sections_splitter
1257269774773 0 Sections_splitter.doc_EP-1026523-A1_xml_00008.documentLoaded gate.creole.SerialAnalyserController {corpusName=Corpus for EP-1026523-A1.xml_00008, documentName=EP-1026523-A1.xml_00008}
...
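Because the leading field is a count of milliseconds since the epoch, it can be turned into a readable time with java.time. The sketch below only interprets the first field and checks for the START marker seen in the example; it makes no assumption about the remaining fields. The class BenchmarkLineSketch is hypothetical and is not part of the GATE reporting tool.

```java
import java.time.Instant;

// Sketch only: BenchmarkLineSketch is a hypothetical helper, not a GATE class.
class BenchmarkLineSketch {

    /** Converts the leading millisecond timestamp of a log line to an Instant. */
    static Instant timestampOf(String line) {
        String firstField = line.trim().split("\\s+", 2)[0];
        return Instant.ofEpochMilli(Long.parseLong(firstField));
    }

    /** True when the second field is the START marker shown in the example. */
    static boolean isStart(String line) {
        String[] fields = line.trim().split("\\s+", 3);
        return fields.length >= 2 && fields[1].equals("START");
    }

    public static void main(String[] args) {
        String line = "1257269774770 START Sections_splitter";
        System.out.println(timestampOf(line) + " start=" + isStart(line));
    }
}
```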
12.1
The GATE bug tracker can be found on SourceForge. When reporting bugs, please give as much detail as possible. Include the GATE version number and build number, the platform on which you observed the bug, and the version of Java you were using (1.6.0_03, etc.). Include steps to reproduce the problem, and a full stack trace of any exceptions, including 'Caused by . . . '. You may wish to first check whether the bug is already fixed in the latest nightly build. You may also request new features.
12.2
Contributing Patches
Patches may be submitted on SourceForge. The best format for patches is an SVN diff against the latest subversion. The diff can be saved as a file and attached; it should not be pasted into the bug report. Note that we generally do not accept patches against earlier versions of GATE. Also, GATE is intended to be compatible with Java 6, so if you regularly develop using a later version of Java it is very important to compile and test your patches on Java 6. Patches that use features from a later version of Java and do not compile and run on Java 6 will not be accepted.

If you intend to submit larger changes, you might prefer to become a committer! We welcome input to the development process of GATE. The code is hosted on SourceForge, providing
Developing GATE
anonymous Subversion access (see Section 2.2.3). We're happy to give committer privileges to anyone with a track record of contributing good code to the project. We also make the current version available nightly on the ftp site.
12.3
GATE provides a flexible structure where new resources can be plugged in very easily. There are three types of resources: Language Resource (LR), Processing Resource (PR) and Visual Resource (VR). In the following subsections we describe the necessary steps to write new PRs and VRs, and to add plugins to the nightly build. The guide on writing new LRs will be available soon.
package example;

import gate.*;
import gate.creole.*;
import gate.creole.metadata.*;

import java.net.URL;

/**
 * Processing Resource. The @CreoleResource annotation marks this
 * class as a GATE Resource, and gives the information GATE needs
 * to configure the resource appropriately.
 */
@CreoleResource(name = "Example PR",
    comment = "An example processing resource")
public class NewPlugin extends AbstractLanguageAnalyser {

  /*
   * Initialises the resource. Mandatory init-time parameters
   * are checked here.
   */
  public Resource init() throws ResourceInstantiationException {
    if (this.rulesURL == null)
      throw new ResourceInstantiationException("rules URL null");
    return this;
  }

  /*
   * this method should provide the actual functionality of the PR
   * (from where the main execution begins). This method
   * gets called when user click on the "RUN" button in the
   * GATE Developer GUI's application window.
   */
  public void execute() throws ExecutionException {
    // write code here
  }

  /*
   * There are two types of parameters:
   * 1. Init time parameters: values for these parameters need to be
   *    provided at the time of initializing a new resource and are
   *    not supposed to be changed afterwards.
   * 2. Runtime parameters: values for these parameters are provided
   *    at the time of executing the PR and can be changed before
   *    starting the execution (i.e. before you click on the "RUN"
   *    button).
   *
   * A parameter myParam is specified by a pair of methods getMyParam
   * and setMyParam (with the first letter of the parameter name
   * capitalized in the normal Java Beans style), with the setter
   * annotated with a @CreoleParameter annotation.
   *
   * for example to set a value for outputAnnotationSetName
   */
  String outputAnnotationSetName;

  // getter and setter methods

  /* get<parameter name with first letter Capital> */
  public String getOutputAnnotationSetName() {
    return outputAnnotationSetName;
  }

  /*
   * The setter is annotated to tell GATE that it defines an
   * optional runtime parameter.
   */
  @Optional
  @RunTime
  @CreoleParameter(
      comment = "name of the annotationSet used for output")
  public void setOutputAnnotationSetName(String setName) {
    this.outputAnnotationSetName = setName;
  }

  /** Init time parameter */
  URL rulesURL;

  // getter and setter methods
  public URL getRulesURL() {
    return rulesURL;
  }

  @CreoleParameter(comment = "example of an init time parameter")
  public void setRulesURL(URL rulesURL) {
    this.rulesURL = rulesURL;
  }
}
Creole Entry

The creole.xml file simply needs to tell GATE which JAR file to look in to find the PR.
<?xml version="1.0"?>
<CREOLE-DIRECTORY>
  <JAR SCAN="true">newplugin.jar</JAR>
</CREOLE-DIRECTORY>
Alternatively the configuration can be given in the XML file directly instead of using source annotations. Section 4.7 gives the full details.

Context Menu

Each resource (LR, PR) has some predefined actions associated with it. These actions appear in a context menu that appears in GATE Developer when the user right clicks on any of the resources. For example if the selected resource is a Processing Resource, there will be at least four actions available in its context menu: 1. Close 2. Hide 3. Rename and 4. Reinitialize. New actions in addition to the predefined actions can be added by implementing the gate.gui.ActionsPublisher interface in either the LR/PR itself or in any associated VR. Then the user has to implement the following method.

Here the variable actions should contain a list of instances of type javax.swing.AbstractAction. A string passed in the constructor of an AbstractAction object appears in the context menu. Adding a null element adds a separator in the menu.
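The shape of such an actions list can be sketched with plain Swing classes. So that the example compiles without GATE on the classpath, it declares a local stand-in interface, ActionsPublisherLike, which is only assumed to resemble gate.gui.ActionsPublisher; the class names and menu captions are likewise hypothetical.

```java
import java.awt.event.ActionEvent;
import java.util.ArrayList;
import java.util.List;
import javax.swing.AbstractAction;
import javax.swing.Action;

// Sketch only: none of these types are GATE classes.
class ContextMenuSketch {

    /** Local stand-in, assumed to resemble gate.gui.ActionsPublisher. */
    interface ActionsPublisherLike {
        List<Action> getActions();
    }

    static class ExampleResource implements ActionsPublisherLike {
        private final List<Action> actions = new ArrayList<>();
        int timesRun = 0;

        ExampleResource() {
            // The string passed to the constructor is the menu caption.
            actions.add(new AbstractAction("Show statistics") {
                @Override
                public void actionPerformed(ActionEvent event) {
                    timesRun++;
                }
            });
            // A null element stands for a separator in the menu.
            actions.add(null);
            actions.add(new AbstractAction("Export results") {
                @Override
                public void actionPerformed(ActionEvent event) {
                    timesRun++;
                }
            });
        }

        @Override
        public List<Action> getActions() {
            return actions;
        }
    }

    public static void main(String[] args) {
        for (Action action : new ExampleResource().getActions()) {
            System.out.println(action == null ? "--------" : action.getValue(Action.NAME));
        }
    }
}
```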
visteners
here re t lest four importnt listeners whih should e implemented in order to listen to the vrious relevnt events hppening in the kgroundF hese inludeX greolevistener greoleEregister keeps informtion out instnes of vrious resoures nd refreshes itself on new dditions nd deletionsF sn order to listen to these eventsD lss should implement the gate.event.CreoleListenerF smplementing greolevistener requires users to implement the following methodsX
PVP
Developing GATE
- public void resourceUnloaded(CreoleEvent creoleEvent);
- public void resourceRenamed(Resource resource, String oldName, String newName);
- public void datastoreOpened(CreoleEvent creoleEvent);
- public void datastoreCreated(CreoleEvent creoleEvent);
- public void datastoreClosed(CreoleEvent creoleEvent);

DocumentListener
A traditional GATE document contains text and a set of AnnotationSets. To get notified about changes in any of these resources, a class should implement the gate.event.DocumentListener. This requires users to implement the following methods:

- public void contentEdited(DocumentEvent event);
- public void annotationSetAdded(DocumentEvent event);
- public void annotationSetRemoved(DocumentEvent event);

AnnotationSetListener
As the name suggests, AnnotationSet is a set of annotations. To listen to the addition and deletion of annotations, a class should implement the gate.event.AnnotationSetListener and therefore the following methods:
Class Definition

Below we show a template class definition, which can be used in order to write a new Visual Resource.
package example.gui;

import gate.*;
import gate.creole.*;
import gate.creole.metadata.*;

/*
 * An example Visual Resource for the New Plugin.
 * Note that here we extend the AbstractVisualResource class.
 * The @CreoleResource annotation associates this VR with the
 * underlying PR type it displays.
 */
@CreoleResource(name = "Visual resource for new plugin",
    guiType = GuiType.LARGE,
    resourceDisplayed = "example.NewPlugin",
    mainViewer = true)
public class NewPluginVR extends AbstractVisualResource {

  /*
   * An init method called when the GUI is initialized for
   * the first time.
   */
  public Resource init() {
    // initialize GUI components
    return this;
  }

  /*
   * Here target is the PR class to which this Visual Resource
   * belongs. This method is called after the init() method.
   */
  public void setTarget(Object target) {
    // and initialize local data structures if required
  }
}
Every document has its own document viewer associated with it. It comes with a single component that shows the text of the original document. GATE provides a way to attach new GUI plugins to the document viewer, for example the AnnotationSet viewer, the AnnotationList viewer and the Co-Reference editor. These are examples of DocumentViewer plugins shipped as part of the core GATE build. These plugins can be displayed either on the right or on top of the document viewer. They can also replace the text viewer in the center (see figure 12.1). A separate button is added at the top of the document viewer which can be pressed to display the GUI plugin.

Below we show a template class definition, which can be used to develop a new DocumentViewer plugin.
/* Implementers should override this method and use it for */

/* Returns the type of this view */
// it can be any of the following constants
// from the gate.gui.docview.DocumentView
// CENTRAL, VERTICAL, HORIZONTAL

/* Returns the actual UI component this view represents. */

/* This method called whenever view becomes active. */

/* This method called whenever view becomes inactive. */
@CreoleResource(name = "ANNIE+Measurements", icon = "measurements",
    autoinstances = @AutoInstance(parameters = {
        @AutoInstanceParam(name = "pipelineURL",
            value = "resources/annie-measurements.xgapp"),
        @AutoInstanceParam(name = "menu", value = "ANNIE")}))
public class ANNIEMeasurements extends PackagedController {
}
The menu parameter is used to specify the folder structure in which the menu item will be placed. This is a list and works in the same fashion as adding tools to the Tools menu (see Section 4.8.1).
plus the compiled JAR file and javadocs. The clean target should clean up everything, including the compiled JAR and any generated sources, etc. You should also add your plugin to 'plugins.to.build' in the top-level build.xml to include it in the build. This is by design - not all the plugins have build files, and of the ones that do, not all are suitable for inclusion in the nightly build (viz. SUPPLE, Section 17.3). Note that if you are currently building gate by doing 'ant jar', be aware that this does not build the plugins. Running just 'ant' or 'ant all' will do so.
Hopefully the structure of this file is fairly self explanatory. Each CreolePlugin element must contain a url attribute which points to a CREOLE directory, i.e. a directory which contains a creole.xml file as described in Section 4.7: note that for plugins distributed via this method the ID and VERSION attributes of the CREOLE-DIRECTORY element must be provided. The URL can be either absolute (as in the first example) or relative; relative URLs will be resolved against the location of the XML file.

Each CreolePlugin can also, optionally, contain a downloadURL attribute. If present this should point to a zip file containing a compiled copy of the plugin. If the downloadURL is not present then we assume that it can be found as a file called creole.zip in the directory referenced by the url attribute. Regardless of the location of the zip file containing the plugin, it should, at the top level, contain a single directory which in turn contains the full plugin including creole.xml etc.
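The resolution rules above can be sketched with java.net.URI. This is only an illustration under stated assumptions: the class and method names are hypothetical, the example.org addresses are placeholders, and resolving a relative downloadURL against the XML file is an assumption rather than something the text states.

```java
import java.net.URI;

// Hypothetical helper illustrating how the url and downloadURL attributes could be resolved.
class PluginUrlSketch {

  // Resolves the url attribute against the location of the repository XML file.
  // A trailing slash is added so the result is treated as a directory.
  static URI pluginDirectory(URI xmlLocation, String urlAttribute) {
    String dir = urlAttribute.endsWith("/") ? urlAttribute : urlAttribute + "/";
    return xmlLocation.resolve(dir);
  }

  // Uses downloadURL when present, otherwise falls back to creole.zip
  // inside the directory referenced by the url attribute.
  static URI downloadLocation(URI xmlLocation, String urlAttribute,
                              String downloadUrlAttribute) {
    if (downloadUrlAttribute != null) {
      // Assumption: a relative downloadURL is resolved like a relative url.
      return xmlLocation.resolve(downloadUrlAttribute);
    }
    return pluginDirectory(xmlLocation, urlAttribute).resolve("creole.zip");
  }
}
```

An absolute url attribute is returned unchanged by the resolve step, so both forms mentioned in the text go through the same code path.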
12.4
The GATE User Guide is maintained in the GATE subversion repository at SourceForge. If you are a developer at Sheffield you do not need to check out the userguide explicitly, as it will appear under the tao directory when you check out sale. For others, you can check it out as follows:

svn checkout https://svn.sourceforge.net/svnroot/gate/userguide/trunk userguide
The user guide is written in LaTeX and translated to PDF using pdflatex and to HTML using tex4ht. The main file that ties it all together is tao_main.tex, which defines the various macros used in the rest of the guide and inputs the other .tex files, one per chapter.

• The BibTeX database big.bib. It must be located in the directory above where you have checked out the userguide, i.e. if the guide sources are in /home/bob/svn/userguide then big.bib needs to go in /home/bob/svn. Sheffield developers will find that it is already in the right place, under sale; others will need to download it from http://gate.ac.uk/sale/big.bib.
• The file http://gate.ac.uk/sale/utils.tex.
• A bit of luck.

Once these are all assembled it should be a case of running make to perform the actual build. To build the PDF do make tao.pdf, for the one page HTML do make index.html and for the several pages HTML do make split.html. The PDF build generally works without problems, but the HTML build is known to hang on some machines for no apparent reason. If this happens to you try again on a different machine.
and would have the persistent URL http://gate.ac.uk/userguide/sec:misc-creole:fish.

If your changes are to document a bug fix or a new (or removed) feature then you should also add an entry to the change log in recent-changes.tex. You should include a reference to the full documentation for your change, in the same way as the existing changelog entries do. You should find yourself adding to the changelog every time except where you are just tidying up or rewording existing documentation. Unlike in the other source files, if you add a section or subsection you should use the \rcSect or \rcSubsect. Recent changes appear both in the introduction and the appendix, so these commands enable nesting to be done appropriately.

Section/subsection labels should comprise 'sec' followed by the chapter label and a descriptive section identifier, each colon-separated. New chapter labels should begin 'chap:'.

Try to avoid changing chapter/section/subsection labels where possible, as this may break links to the section. If you need to change a label, add it in the file 'sections.map'. Entries in this file are formatted one per line, with the old section label followed by a tab followed by the new section label.

The quote marks used should be ` and '.

Titles should be in title case (capitalise the first word, nouns, pronouns, verbs, adverbs and adjectives but not articles, conjunctions or prepositions). When referring to a numbered chapter, section, subsection, figure or table, capitalise it, e.g. 'Section 3.1'. When merely using the words chapter, section, subsection, figure or table, e.g. 'the next chapter', do not capitalise them. Proper nouns should be capitalised ('Java', 'Groovy'), as should strings where the capitalisation is significant, but not terms like 'annotation set' or 'document'.

The user guide is rebuilt automatically whenever changes are checked in, so your change should appear in the online version of the guide within 20 or 30 minutes.
Chapter 13 Gazetteers
...neurobiologists still go on openly studying reflexes and looking under the hood, not huddling passively in the trenches. Many of them still keep wondering: how does the inner life arise? Ever puzzled, they oscillate between two major fictions: (1) The brain can be understood; (2) We will never come close. Meanwhile they keep pursuing brain mechanisms, partly from habit, partly out of faith. Their premise: The brain is the organ of the mind. Clearly, this three-pound lump of tissue is the source of our 'insight information' about our very being. Somewhere in it there might be a few hidden guidelines for better ways to lead our lives.

Zen and the Brain,
13.1
Introduction to Gazetteers
A gazetteer consists of a set of lists containing names of entities such as cities, organisations, days of the week, etc. These lists are used to find occurrences of these names in text, e.g. for the task of named entity recognition. The word 'gazetteer' is often used interchangeably for both the set of entity lists and for the processing resource that makes use of those lists to find occurrences of the names in text.

When a gazetteer processing resource is run on a document, annotations of type Lookup are created for each matching string in the text. Gazetteers usually do not depend on Tokens or on any other annotation and instead find matches based on the textual content of the document (the Flexible Gazetteer, described in section 13.6, being the exception to the rule). This means that an entry may span more than one word and may start or end within a word. If a gazetteer that directly works on text does respect word boundaries, the way word boundaries are found might differ from the way the GATE tokeniser finds word boundaries. A Lookup annotation will only be created if the entire gazetteer entry is matched in the text. The details of how gazetteer entries match text depend on the gazetteer processing resource and its parameters. In this chapter, we will cover several gazetteers.
13.2
ANNIE Gazetteer
The rest of this introductory section describes the ANNIE Gazetteer which is part of ANNIE and also described in section 6.3. The ANNIE gazetteer is part of and provided by the ANNIE plugin.

Each individual gazetteer list is a plain text file, with one entry per line. Below is a section of the list for units of currency:

Ecu
European Currency Units
FFr
Fr
German mark
German marks
New Taiwan dollar
New Taiwan dollars
NT dollar
NT dollars
An index file (usually called lists.def) is used to describe all such gazetteer list files that belong together. Each gazetteer list should reside in the same directory as the index file.

The gazetteer index file describes for each list the major type and, optionally, a minor type, a language and an annotation type, separated by colons. In the example below, the first column refers to the list name, the second column to the major type, the third to the minor type, the fourth column to the language and the fifth column to the annotation type. These lists are compiled into finite state machines. Any text strings matched by these machines will be annotated with features specifying the major and minor types.
currency_prefix.lst:currency_unit:pre_amount
currency_unit.lst:currency_unit:post_amount
date.lst:date:specific_date::Date
day.lst:date:day
monthen.lst:date:month:en
monthde.lst:date:month:de
season.lst:date:season
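To make the column layout concrete, here is a minimal sketch of how one line of such an index file could be split into its five fields. It is an illustration only, not the parser GATE itself uses. The class name ListDefinition is hypothetical, and treating an empty column as "not specified" (with Lookup as the fallback annotation type) is an assumption based on the description in this section.

```java
// Hypothetical value class for one line of a lists.def index file.
class ListDefinition {
  final String listName;
  final String majorType;
  final String minorType;      // null when not specified
  final String language;       // null when not specified
  final String annotationType; // "Lookup" when not specified

  private ListDefinition(String listName, String majorType, String minorType,
                         String language, String annotationType) {
    this.listName = listName;
    this.majorType = majorType;
    this.minorType = minorType;
    this.language = language;
    this.annotationType = annotationType;
  }

  // Splits a colon-separated line; the limit -1 keeps empty columns such as "::".
  static ListDefinition parse(String line) {
    String[] fields = line.trim().split(":", -1);
    if (fields.length < 2 || fields[0].isEmpty() || fields[1].isEmpty()) {
      throw new IllegalArgumentException("list name and major type are required: " + line);
    }
    String annotationType = field(fields, 4);
    return new ListDefinition(fields[0], fields[1], field(fields, 2), field(fields, 3),
        annotationType == null ? "Lookup" : annotationType);
  }

  // Returns the column at the given index, or null if it is missing or empty.
  private static String field(String[] fields, int index) {
    return (index < fields.length && !fields[index].isEmpty()) ? fields[index] : null;
  }
}
```

For the line date.lst:date:specific_date::Date this gives an empty language column and the annotation type Date, while day.lst:date:day falls back to Lookup.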
The major and minor type as well as the language will be added as features to any Lookup annotation generated from a matching entry from the respective list. For example, if an entry from the currency_unit.lst gazetteer list matches some text in a document, the gazetteer processing resource will generate a Lookup annotation spanning the matching text and assign the features major="currency_unit" and minor="post_amount" to that annotation.

By default the ANNIE Gazetteer creates Lookup annotations. However, if a user has specified a specific annotation type for a list, the Gazetteer uses the specified annotation type to annotate entries that are part of the specified list and appear in the document being processed.

Grammar rules (JAPE rules) can specify the types to be identified in particular circumstances. The major and minor types enable this identification to take place, by giving access to items stored in particular lists or combinations of lists.

For example, if a day needs to be identified, the minor type 'day' would be specified in the grammar, in order to match only information about specific days. If any kind of date needs to be identified, the major type 'date' would be specified. This might include weeks, months, years etc. as well as days of the week, and would give access to all the items stored in day.lst, month.lst, season.lst, and date.lst in the example shown.
When selecting a list in the left table you get its content displayed in the right table. You can sort both tables by clicking on their column headers. A text field 'Filter' at the bottom of the right table allows you to display only the rows that contain the expression you typed.

To edit a value in a table, double click on a cell or press F2, then press Enter when finished editing the cell. To add a new row in both tables use the text field at the top and press Enter or use the 'New' button next to it. When adding a new list you can select from the list of existing gazetteer lists in the current directory or type a new file name. To delete a row, press Shift+Delete or use the context menu. To delete more than one row, select them first.

You can reload a modified list by selecting it and right-clicking for the context menu item 'Reload List' or by pressing Control+R. When a list is modified its name in the left table is coloured in red.

If you have set the 'gazetteerFeatureSeparator' parameter then the right table will show a 'Feature' and a 'Value' column for each feature. To add a new pair of columns use the button 'Add Cols'.

Note that in the left table, you can only select one row at a time.

The gazetteer, like other language resources, has a context menu in the resources tree to 'Reinitialise', 'Save' or 'Save as...' the resource.

The right table has a context menu for the current selection to help you create a new gazetteer. It is similar to the actions found in a spreadsheet application, like 'Fill Down Selection', 'Clear Selection', 'Copy Selection', 'Paste Selection', etc.
13.3
OntoGazetteer
The Ontogazetteer, or Hierarchical Gazetteer, is a processing resource which can associate the entities from a specific gazetteer list with a class in a GATE ontology language resource. The OntoGazetteer assigns classes rather than major or minor types, and is aware of mappings between lists and class IDs. The Gaze visual resource can display the lists, ontology mappings and the class hierarchy of the ontology for an OntoGazetteer processing resource and provides ways of editing these components.
13.4
This section describes the Gaze gazetteer editor when it displays an OntoGazetteer processing resource. The editor consists of two parts: one for the editing of the lists and the mapping of lists, and one for editing the ontology. These two parts are described in the following subsections.

Left pane: A single ontology is visualized in the left pane of the VR. The mapping between a list and a class is displayed by showing the list as a subclass with a different icon. The mapping is specified by drag and drop from the linear definition pane (in the middle) and/or by right click menu.

Middle pane: The middle pane displays the nodes/lines in the linear definition file. By double clicking on a node the corresponding list is opened. Editing of the line/node is done by right clicking and choosing edit: a dialogue appears (lower part of the scheme) allowing the modification of the members of the node.

Right pane: In the right pane a single gazetteer list is displayed. It can be edited and parts of it can be cut/copied/pasted.
Left pane: The various ontologies loaded are listed here. On double click, or right click and edit from the menu, the ontology is visualized in the Right pane.

Right pane: Besides the visualization of the class hierarchy of the ontology the following operations are allowed:

• expanding/collapsing parts of the ontology
• adding a class in the hierarchy: by right clicking on the intended parent of the new class and choosing add sub class.
• removing a class: via right clicking on the class and choosing remove.

As a result of this VR, the ontology definition file is affected/altered.
13.5
Hash Gazetteer
The Hash Gazetteer is a gazetteer implemented by the OntoText Lab (http://www.ontotext.com/). Its implementation is based on simple lookup in several java.util.HashMap objects, and is inspired by the strange idea of Atanas Kiryakov, that searching in HashMaps may be faster than in a Finite State Machine (FSM). The Hash Gazetteer processing resource is part of the ANNIE plugin.

This gazetteer processing resource is implemented in the following way: every phrase, i.e. every list entry, is separated into several parts. The parts are determined by the whitespaces lying among them; e.g., the phrase form is emptiness has three parts: form, is, and emptiness. There is also a list of HashMaps, mapsList, which has as many elements as the longest (in terms of 'count of parts') phrase in the lists. So the first part of a phrase is placed in the first map. The first part + space + second part is placed in the second map, etc. The full phrase is placed in the appropriate map, and a reference to a Lookup object is attached to it.

On first sight it seems that this algorithm is certainly much more memory-consuming than a finite state machine (FSM) with the parts of the phrases as transitions, but this is actually not so important since the average length of the phrases (in parts) in the lists is 1.1. On the other hand, one advantage of the algorithm is that, although unconventional, it takes less memory and may be slightly faster, especially if you have a very large gazetteer (e.g., 100,000s of entries).
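The scheme described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Hash Gazetteer's actual code: the class name HashGazetteerSketch is hypothetical, a plain string stands in for the Lookup object, the input text is split on whitespace only, and every complete match is reported rather than only the longest one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the list-of-HashMaps lookup scheme.
class HashGazetteerSketch {
  // mapsList.get(i) holds every phrase prefix consisting of i + 1 parts.
  // The value is the type of a complete entry, or null for a mere prefix.
  private final List<Map<String, String>> mapsList = new ArrayList<>();

  void add(String phrase, String type) {
    String[] parts = phrase.trim().split("\\s+");
    StringBuilder prefix = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) prefix.append(' ');
      prefix.append(parts[i]);
      while (mapsList.size() <= i) mapsList.add(new HashMap<>());
      Map<String, String> map = mapsList.get(i);
      String key = prefix.toString();
      if (i == parts.length - 1) {
        map.put(key, type);          // full phrase: attach the lookup information
      } else if (!map.containsKey(key)) {
        map.put(key, null);          // prefix only: remember that longer entries exist
      }
    }
  }

  // Returns "matched text=type" for every entry found in the text.
  List<String> lookup(String text) {
    String[] words = text.trim().split("\\s+");
    List<String> hits = new ArrayList<>();
    for (int start = 0; start < words.length; start++) {
      StringBuilder candidate = new StringBuilder();
      for (int i = 0; start + i < words.length && i < mapsList.size(); i++) {
        if (i > 0) candidate.append(' ');
        candidate.append(words[start + i]);
        String key = candidate.toString();
        Map<String, String> map = mapsList.get(i);
        if (!map.containsKey(key)) break; // no entry starts with this prefix
        String type = map.get(key);
        if (type != null) hits.add(key + "=" + type);
      }
    }
    return hits;
  }
}
```

Each lookup step is a single hash probe on the growing prefix, so the cost per starting word is bounded by the number of parts in the longest entry.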
13.5.1 Prerequisites
The phrases to be recognised should be listed in a set of files, one for each type of occurrence (as for the standard gazetteer).

The gazetteer is built with the information from a file that contains the set of lists (which are files as well) and the associated type for each list. The file defining the set of lists should have the following syntax: each list definition should be written on its own line and should contain:

• the file name (required)
• the major type (required)
• the minor type (optional)
• the language(s) (optional)

The elements of each definition are separated by ':'. The following is an example of a valid definition:
personmale.lst:person:male:english
Each file named in the lists definition file is just a list containing one entry per line.

When this gazetteer is run over some input text (a GATE document) it will generate annotations of type Lookup having the attributes specified in the definition file.
13.5.2 Parameters
The Hash Gazetteer processing resource allows the specification of the following parameters when it is created:

• encoding: the encoding of the gazetteer lists
• listsURL: the URL of the list definitions (index) file, i.e. the file that contains the filenames, major types and optionally minor types and languages of all the list files.
There is one run-time parameter, annotationSetName, that allows the specification of the annotation set in which the Lookup annotations will be created. If nothing is specified the default annotation set will be used.

Note that the Hash Gazetteer does not have the longestMatchOnly and wholeWordsOnly parameters; if you need to configure these options, you should use another gazetteer that supports them, such as the standard ANNIE Gazetteer (see section 13.2).
13.6
Flexible Gazetteer
The Flexible Gazetteer provides users with the flexibility to choose their own customized input and an external Gazetteer. For example, the user might want to replace words in the text with their base forms (which is an output of the Morphological Analyser) before running the Gazetteer.

The Flexible Gazetteer performs lookup over a document based on the values of an arbitrary feature of an arbitrary annotation type, by using an externally provided gazetteer. It is important to use an external gazetteer as this allows the use of any type of gazetteer (e.g. an Ontological gazetteer).

Input to the Flexible Gazetteer:

Runtime parameters:

• Document - the document to be processed
• inputASName - the AnnotationSet where the Flexible Gazetteer should search for the AnnotationType.feature specified in the inputFeatureNames.
• outputASName - the AnnotationSet where Lookup annotations should be placed.

Creation time parameters:

• inputFeatureNames - when selected, these feature values are used to replace the corresponding original text. For each feature, a temporary document is created from the values of the specified features on the specified annotation types. For example: for Token.root the temporary document will have the content of every Token replaced with its root value. In case of overlapping annotations of the same type in the input, only the value of the first annotation is considered. Here, please note that the order of annotations is decided by using the gate.util.OffsetComparator class.
• gazetteerInst - the actual gazetteer instance, which should run over a temporary document. This generates the Lookup annotations with features. This must be an instance of gate.creole.gazetteer.Gazetteer which has already been created. All such instances will be shown in the dropdown menu for this parameter in GATE Developer.

Once the external gazetteer has annotated text with Lookup annotations, Lookup annotations on the temporary document are converted to Lookup annotations on the original document. Finally the temporary document is deleted.
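The temporary-document idea can be sketched as follows. This is a toy model under stated assumptions rather than the Flexible Gazetteer's implementation: the class and its nested Token type are hypothetical, tokens are assumed to be sorted and non-overlapping, and a plain substring search stands in for the external gazetteer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: replace token text by a feature value, look up, map offsets back.
class FlexibleLookupSketch {

  // A span in the original text plus the feature value (e.g. root) that replaces it.
  static final class Token {
    final int start;
    final int end;
    final String root;
    Token(int start, int end, String root) {
      this.start = start;
      this.end = end;
      this.root = root;
    }
  }

  // Returns {start, end} offsets in the ORIGINAL text for every match of entry
  // in the temporary text built from the token roots.
  static List<int[]> lookup(String text, List<Token> tokens, String entry) {
    List<int[]> hits = new ArrayList<>();
    if (entry.isEmpty()) return hits;
    StringBuilder temp = new StringBuilder();
    List<Integer> startMap = new ArrayList<>(); // original start offset per temp character
    List<Integer> endMap = new ArrayList<>();   // original end offset per temp character
    int pos = 0;
    for (Token t : tokens) {
      for (; pos < t.start; pos++) {            // text between tokens is copied verbatim
        temp.append(text.charAt(pos));
        startMap.add(pos);
        endMap.add(pos + 1);
      }
      for (int i = 0; i < t.root.length(); i++) { // replaced text maps to the whole token
        temp.append(t.root.charAt(i));
        startMap.add(t.start);
        endMap.add(t.end);
      }
      pos = t.end;
    }
    for (; pos < text.length(); pos++) {
      temp.append(text.charAt(pos));
      startMap.add(pos);
      endMap.add(pos + 1);
    }
    for (int from = temp.indexOf(entry); from >= 0; from = temp.indexOf(entry, from + 1)) {
      int last = from + entry.length() - 1;
      hits.add(new int[] { startMap.get(from), endMap.get(last) });
    }
    return hits;
  }
}
```

With the text "The cats were running" and roots the, cat, be, run, the entry cat is found in the temporary text and reported on the original span covering "cats".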
13.7
The gazetteer list collector, found in the Tools plugin, collects occurrences of entities directly from a set of annotated training documents and populates gazetteer lists with the entities. The entity types and structure of the gazetteer lists are defined as necessary by the user. Once the lists have been collected, a semantic grammar can be used to find the same entities in new texts.

The target gazetteer must contain a list corresponding exactly to each annotation type to be collected (for example, Person.lst for the Person annotations, Organization.lst for the Organization annotations, etc.). You can use the gazetteer editor to create new empty lists for types that are not already in your gazetteer. Note that if you do this, you will need to 'Save and Reinitialise' the gazetteer later (the collector updates the *.lst files on disk, but not the lists.def file).

If a list in the gazetteer already contains entries, the collector will add new entries, but it will only collect one occurrence of each new entry; it checks that the entry is not present already before adding it.

There are 4 runtime parameters:

• annotationTypes: a list of the annotation types that should be collected
• gazetteer: the gazetteer where the results will be stored (this must be already loaded in GATE)
• markupASname: the annotation set from which the annotation types should be collected
• theLanguage: sets the language feature of the gazetteer lists to be created to the appropriate language (in the case where lists are collected for different languages)

Figure 13.2 shows a screenshot of a set of lists collected automatically for the Hindi language. It contains 4 lists: Person, Organisation, Location and a list of stopwords. Each list has a majorType whose value is the type of list, a minorType 'inferred' (since the lists have been inferred from the text), and the language 'Hindi'.

The list collector also has a facility to split the Person names that it collects into their individual tokens, so that it adds both the entire name to the list, and adds each of the tokens to the list (i.e. each of the first names, and the surname) as a separate entry. When the grammar annotates Persons, it can require them to be at least 2 tokens or 2 consecutive Person Lookups. In this way, new Person names can be recognised by combining a known first name with a known surname, even if they were not in the training corpus. Where only a single token is found that matches, an Unknown entity is generated, which can later be matched with an existing longer name via the orthomatcher component which performs orthographic coreference between named entities. This same procedure can also be used for other entity types. For example, parts of Organisation names can be combined together in different ways. The facility for splitting Person names is hardcoded in the file gate/src/gate/creole/GazetteerListsCollector.java and is commented.
13.8
OntoRoot Gazetteer
OntoRoot Gazetteer is a type of dynamically created gazetteer that is, in combination with a few other generic GATE resources, capable of producing ontology-based annotations over the given content with regard to the given ontology. This gazetteer is a part of the 'Gazetteer_Ontology_Based' plugin that has been developed as a part of the TAO project.
Classes, Instances, Properties) and extract their human-understandable lexicalisations.

As a precondition for extracting human-understandable content from the ontology, first a list of the following is being created:

• names of all ontology resources, i.e. fragment identifiers, and
• assigned property values for all ontology resources (e.g., label and datatype property values)

Each item from the list is further processed so that:

• any name containing dash ("-") or underline ("_") character(s) is processed so that each of these characters is replaced by a blank space. For example, Project_Name or Project-Name would become Project Name.
• any name that is written in camelCase style is actually split into its constituent words, so that ProjectName becomes Project Name (optional).
• any name that is a compound name such as 'POS Tagger for Spanish' is split so that both 'POS Tagger' and 'Tagger' are added to the list for processing. In this example, 'for' is a stop word, and any words after it are ignored (optional).

Each item from this list is analysed separately by the Onto Root Application (ORA) on execution (see figure 13.3). The Onto Root Application first tokenises each linguistic term, then assigns part-of-speech and lemma information to each token.

As a result of that pre-processing, each token in the terms will have an additional feature named 'root', which contains the lemma as created by the morphological analyser. It is this lemma or a set of lemmas which are then added to the dynamic gazetteer list, created from the ontology.

For instance, if there is a resource with a short name (i.e., fragment identifier) ProjectName, without any assigned properties, the created list before executing the OntoRoot gazetteer collection will contain the following strings: 'ProjectName', 'Project Name'
Figure 13.3: Building Ontology Resource Root (OntoRoot) Gazetteer from the Ontology

Each of the items from the list is then analysed separately and the results would be the same as the input strings, as all of the entries are nouns given in singular form.
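The name clean-up steps for ontology resource names (separator replacement and camelCase splitting) can be sketched as below. This is an illustrative approximation only: the class and method names are hypothetical, and the regular expression used for camelCase splitting is an assumption, not taken from the OntoRoot Gazetteer source.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the name pre-processing for ontology resource names.
class OntologyNameSketch {

  // Replaces each dash or underline character by a blank space.
  static String replaceSeparators(String name) {
    return name.replace('-', ' ').replace('_', ' ');
  }

  // Collects the strings to analyse for one resource name, in insertion order.
  static Set<String> candidates(String name, boolean separateCamelCasedWords) {
    Set<String> result = new LinkedHashSet<>();
    String spaced = replaceSeparators(name);
    result.add(spaced);
    if (separateCamelCasedWords) {
      // Insert a space wherever a lower-case letter is followed by an upper-case one.
      result.add(spaced.replaceAll("(?<=[a-z])(?=[A-Z])", " "));
    }
    return result;
  }
}
```

For the fragment identifier ProjectName this yields the two strings ProjectName and Project Name, matching the example in the text; Project_Name and Project-Name both collapse to Project Name.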
• Ontology to be processed;
• Tokeniser, POS Tagger and GATE Morphological Analyser to be used during processing (if these are also used in a pipeline, their input and output parameters must remain set to the default annotation set);
• default is set to true - should this gazetteer analyse resource URIs or not;
• considerProperties, default is set to true - should this gazetteer consider properties or not;
• propertiesToInclude - checked only if considerProperties is set to true - this parameter contains the list of property names (URIs) to be included, comma separated;
• propertiesToExclude - checked only if considerProperties is set to true - this parameter contains the list of property names to be excluded, comma separated;
• caseSensitive;
• separateCamelCasedWords, default set to true - should this gazetteer separate camelCased words, e.g. 'ProjectName' into 'Project Name';
• considerHeuristicRules, default set to false - should this gazetteer consider several heuristic rules or not. Rules include splitting the words containing spaces, and using prepositions as stop words; for example, if 'pos tagger for spanish' would be analysed, 'for' would be considered as a stop word; heuristically derived would be 'pos tagger' and this would be further used to add 'pos tagger' to the gazetteer list, with a feature heuristical level set to be 0, and 'tagger' with heuristical level 1; at runtime a lower heuristical level should be preferred. NOTE: setting considerHeuristicRules to true can cause a lot of noise for some ontologies and is likely to require implementing an additional filtering resource that will prefer the annotations with the lower heuristic level;
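The heuristic rule for compound names can be illustrated with a short sketch. It is a guess at the behaviour from the 'pos tagger for spanish' example only: the class name is hypothetical and the stop-word list is invented for illustration, since the prepositions actually used are not listed in the text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of deriving entries and heuristical levels from a compound name.
class HeuristicRulesSketch {
  // Assumed stop words; not taken from the OntoRoot Gazetteer itself.
  private static final Set<String> STOP_WORDS =
      new HashSet<>(Arrays.asList("for", "of", "in", "with"));

  // Maps each derived entry to its heuristical level (0 = closest to the original name).
  static Map<String, Integer> derive(String name) {
    String[] words = name.trim().split("\\s+");
    int end = 0;
    // Keep only the words before the first stop word.
    while (end < words.length && !STOP_WORDS.contains(words[end].toLowerCase())) {
      end++;
    }
    Map<String, Integer> levels = new LinkedHashMap<>();
    // Drop one leading word per level: "pos tagger" (0), then "tagger" (1).
    for (int level = 0; level < end; level++) {
      levels.put(String.join(" ", Arrays.copyOfRange(words, level, end)), level);
    }
    return levels;
  }
}
```

A downstream filter that prefers the lowest level would then pick pos tagger over tagger when both match the same text.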
The OntoRoot Gazetteer's initialization preprocesses strings from the ontology and runs the tokenizer, POS tagger, and morphological analyser over them. These PRs must remain set to use the default annotation set for input and output, or the OntoRoot Gazetteer will throw a ResourceInstantiationException. If you change the parameters of these PRs in a pipeline, you will not be able to create OntoRoot Gazetteers with them afterwards; in this case, you should create separate instances of the three PRs and use them only for instantiating OntoRoot Gazetteers without adding them to a pipeline. (As long as the PRs are not used in a pipeline, the runtime parameters for input and output remain set for the default annotation set, even though you cannot see or set them in the GUI.) It may be helpful to give the special PRs different names from the defaults so you can clearly distinguish them from the ones used in the pipeline.
Easy way

For a quick start with the OntoRoot Gazetteer, consider running it from GATE Developer (GATE GUI):

• Start GATE
Figure 13.4: Sample ontology-based annotation as a result of running OntoRoot Gazetteer. Feature URI refers to the URI of the ontology resource, while type identifies the type of the resource such as class, instance, property, or datatypePropertyValue

• Load a sample application from the resources folder (exampleApp.xgapp). This will load the CAT App application.
• Run the CAT App application and open query-doc to see a set of Lookup annotations generated as a result (see Figure 13.4).

Hard way
OntoRoot Gazetteer can easily be set up to be used with any ontology. To generate a GATE application which demonstrates the use of the OntoRoot Gazetteer, follow these steps:

1. Start GATE

2. Load necessary plugins: Click on Manage CREOLE plugins:
   • Tools
   • Ontology
   • Ontology_Based_Gazetteer
   • Ontology_Tools (optional); this parameter is required in order to view the ontology using the GATE Ontology Editor.
   • ANNIE.
   Make sure that these plugins are loaded from the GATE/plugins/plugin_name folder.
3. Load an ontology. Right click on Language Resource, and select the last option to create an OWLIM Ontology LR. Specify the format of the ontology, for example rdfXmlURL, and give the correct path to the ontology: either the absolute path on your local machine such as c:/myOntology.owl or the URL such as http://gate.ac.uk/ns/gate-ontology. Specify the name such as myOntology (this is optional).

4. Create Processing Resources: right click on Processing Resource and create the following PRs (with default parameters):
   • Document Reset PR
   • ANNIE English Tokeniser
   • ANNIE POS Tagger
   • GATE Morphological Analyser
   • RegEx Sentence Splitter (or ANNIE Sentence Splitter)

5. Create an Onto Root Gazetteer Processing Resource and set its parameters:
   • Ontology: select previously created myOntology;
   • Tokeniser: select previously created Tokeniser;
   • POS Tagger: select previously created POS Tagger;
   OntoRoot gazetteer is quite flexible in that it can be configured using the optional parameters. The list of all parameters is detailed in Section 13.8.2. When all parameters are set click OK. It can take some time to initialise the OntoRoot Gazetteer. For example, loading the GATE knowledge base from http://gate.ac.uk/ns/gate-kb takes around 6-15 seconds. Larger ontologies can take much longer.

6. Create another PR which is a Flexible Gazetteer. As init parameters it is mandatory to select the previously created OntoRoot Gazetteer for gazetteerInst. For another parameter, inputFeatureNames, click on the button on the right and, when prompted with a window, add 'Token.root' in the provided textbox, then click the Add button. Click OK, give a name to the new PR (optional) and then click OK.

7. Create an application. Right click on Application, then New Pipeline (or Corpus Pipeline). Add the following PRs to the application in this particular order:
   • Document Reset PR
   • RegEx Sentence Splitter (or ANNIE Sentence Splitter)
   • ANNIE English Tokeniser
   • ANNIE POS Tagger
   • GATE Morphological Analyser
   • Flexible Gazetteer

8. Create a document to process with the new application; for example, if the ontology was http://gate.ac.uk/ns/gate-kb, then the document could be the GATE home page: http://gate.ac.uk. Run the application and then investigate the results further. All annotations are of type Lookup, with additional features that give details about the resources they are referring to in the given ontology.
13.9
Large KB Gazetteer
The large KB gazetteer provides support for ontology-aware NLP. You can load any ontology from RDF and then use the gazetteer to obtain lookup annotations that have both instance and class URI.

The large KB gazetteer is available as the plugin Gazetteer_LKB.

The current version of the large KB gazetteer does not use GATE ontology language resources. Instead, it uses its own mechanism to load and process ontologies. The current version is likely to change significantly in the near future.

The Large KB gazetteer grew from a component in the semantic search platform Ontotext KIM. The gazetteer is developed by people from the KIM team (see http://nmwiki.ontotext.com/lkb_gazetteer/team-list.html). You may find the name kim left in several places in the source code, documentation or source files.
The gazetteer will create annotations with type 'Lookup' and two features: 'inst', which contains the URI of the ontology instance, and 'class', which contains the URI of the ontology class that instance belongs to.
If you want to see examples of how to use local RDF files, please check samples/dictionary_from_local_ontology/config.ttl. The Sesame repository configuration section configures a local Ontotext SwiftOWLIM database that loads a list of RDF files. Simply create a list of your RDF files and reuse the rest of the configuration. The sample configuration supports datasets with 10,000,000 triples with acceptable performance. For working with larger datasets, advanced users can substitute SwiftOWLIM with another Sesame RDF engine. In that case, make sure you add the necessary JARs to the list in GATE_HOME/plugins/Gazetteer_LKB/creole.xml. For example, Ontotext BigOWL is a Sesame RDF engine that can load billions of triples on desktop hardware.

Since any Sesame repository can be configured in config.ttl, the Large KB Gazetteer can extract dictionaries from all significant RDF databases. See the page on database compatibility for more information.

query.txt contains a SPARQL query. You can write any query you like, as long as its projection contains at least two columns in the following order: label and instance. As an option, you can also add a third column for the ontology class of the RDF entity. Below you can see a sample query, which creates a dictionary from the names and the unique identifiers of 10,000 entertainers in DBpedia.
PREFIX opencyc: <http://sw.opencyc.org/2008/06/10/concept/en/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?Name ?Person WHERE { ?Person a opencyc:Entertainer ; rdfs:label ?Name . FILTER (lang(?Name) = "en") } LIMIT 10000
When you load the dictionary configuration in GATE for the first time, it creates a binary snapshot of the dictionary. Thereafter it will load only this binary snapshot. If the dictionary configuration is changed, the snapshot will be reinitialized automatically. For more information, please see the dictionary lifecycle specification.
Parameters
inputASName: the annotation set whose annotations will be processed.
server: the URL of the Sesame 2 HTTP repository. Support for generic SPARQL endpoints can be implemented if required.
repositoryId: the ID of the Sesame repository.
annotationTypes: a list of types of annotation that will be processed.
query: a SPARQL query pattern. The query will be processed like this: String.format(query, uriFromAnnotation), so you can use parameters like %s or %1$s.
deleteOnNoRelations: whether to delete the annotations that were not enriched. Helps to clean up the input annotations.
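The way the query parameter is expanded can be illustrated with a minimal sketch. It only demonstrates the String.format(query, uriFromAnnotation) call mentioned above; the class name, the query text and the example URI are hypothetical and not part of the plugin.

```java
final class QueryPatternDemo {
    /** Expands the pattern as described in the text: String.format(query, uri). */
    static String expand(String queryPattern, String uriFromAnnotation) {
        return String.format(queryPattern, uriFromAnnotation);
    }

    public static void main(String[] args) {
        // Hypothetical pattern: %1$s lets the same URI appear more than once.
        String pattern =
            "SELECT ?label WHERE { <%1$s> ?p ?label . FILTER(?label != <%1$s>) }";
        System.out.println(expand(pattern, "http://example.org/inst/42"));
    }
}
```

Because the pattern goes through String.format, a literal percent sign in the query would presumably have to be written as %%.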
13.10
The DefaultGazetteer (and its subclasses such as the OntoRootGazetteer) compiles its gazetteer data into a finite state matcher at initialization time. For large gazetteers this FSM requires a considerable amount of memory. However, once the FSM has been built then (as long as you do not modify it dynamically using Gaze) it is accessed in a read-only manner at runtime. For a multi-threaded application that requires several identical copies of its processing resources (see section 7.14), GATE provides a mechanism whereby a single compiled FSM can be shared between several gazetteer PRs that can then be executed concurrently in different threads, saving the memory that would otherwise be required to load the lists several times. This feature is not available in the GATE Developer GUI, as it is only intended for use in embedded code. To make use of it, first create a single instance of the regular DefaultGazetteer or OntoRootGazetteer:
FeatureMap params = Factory.newFeatureMap();
params.put("listsUrl", listsDefLocation);
LanguageAnalyser mainGazetteer = (LanguageAnalyser) Factory.createResource(
    "gate.creole.gazetteer.DefaultGazetteer", params);
Then create any number of SharedDefaultGazetteer instances, passing this regular gazetteer as a parameter:
// A separate feature map, so this snippet can follow the previous one in the same scope
FeatureMap sharedParams = Factory.newFeatureMap();
sharedParams.put("bootstrapGazetteer", mainGazetteer);
LanguageAnalyser sharedGazetteer = (LanguageAnalyser) Factory.createResource(
    "gate.creole.gazetteer.SharedDefaultGazetteer", sharedParams);
The SharedDefaultGazetteer instance will re-use the FSM that was built by the mainGazetteer instead of loading its own.
GATE ontology support aims to simplify the use of ontologies both within the set of GATE tools and for programmers using the GATE ontology API. The GATE ontology API hides the details of the actual backend implementation and allows a simplified manipulation of ontologies by modeling ontology resources as easy-to-use Java objects. Ontologies can be loaded from and saved to various serialization formats.

The GATE ontology support roughly conforms to the representation, manipulation and inference that is supported in OWL-Lite (see http://www.w3.org/TR/owl-features/). This means that a user can represent information in an ontology that conforms to OWL-Lite and that the GATE ontology model will provide inferred information equivalent to what an OWL-Lite reasoner would provide. The GATE ontology model also attempts, to some extent, to provide useful information for ontologies that do not conform to OWL-Lite: RDFS, OWL-DL, OWL-Full or OWL2 ontologies can be loaded, but GATE might ignore part or all of the contents of those ontologies, or might provide only partial or incorrect inferred facts for such ontologies.

If an ontology is loaded that contains a restriction not supported by OWL-Lite, like oneOf, unionOf, intersectionOf, or complementOf, the classes to which such restrictions apply will not be found in some situations because the Ontology API has no way of representing such restrictions. For example, such classes will not show up when requesting the direct subclasses of a given class. In other situations, e.g. when retrieved directly using the URI, the class will be found. Using the Ontology plugin with ontologies that do not conform to OWL-Lite should be avoided to prevent such confusing behavior.

The GATE API tries to prevent clients from modifying an ontology that conforms to OWL-Lite to become OWL-DL or OWL-Full and also tries to prevent or warn about some of the most common errors that would make the ontology inconsistent. However, the current implementation is not able to prevent all such errors and has no way of finding out if an ontology conforms to OWL-Lite or is inconsistent.
14.1
all the superclasses of that class, as well as all the superclasses of its direct superclasses, and so on until no more are found. This calculation is finite, the upper bound being the set of all the classes in the ontology. A class that has no superclasses is called a top class. An ontology can have several top classes. Although the GATE ontology API can deal with cycles in the hierarchy graph, these can cause problems for processes using the API and probably indicate an error in the definition of the ontology. Also, other components of GATE, like the ontology editor, cannot deal with cyclic class structures and will terminate with an error. Care should be taken to avoid such situations.

A pair of ontology classes can also have an equivalentClassAs relation, which indicates that the two classes are virtually the same and all their properties and instances should be shared.

A restriction (represented by Restriction objects in the GATE ontology API) is an anonymous class (i.e., the class is not identified by a URI/IRI) and is set on an object or a datatype property to restrict some instances of the specified domain of the property to have only certain values (also known as a value constraint) or a certain number of values (also known as a cardinality restriction) for the property. Thus for each restriction there exist at least three triples in the repository: one that defines the resource as a restriction, another one that indicates on which property the restriction is specified, and finally a third one that indicates what constraint is set on the cardinality or value of the property. There are six types of restrictions:

1. Cardinality Restriction (owl:cardinalityRestriction): the only valid values for this restriction in OWL-Lite are 0 and 1. A cardinality restriction set to either 0 or 1 implies both a MinCardinality Restriction and a MaxCardinality Restriction set to the same value.
2. MinCardinality
3. MaxCardinality
4. HasValue
5. AllValuesFrom
6. SomeValuesFrom
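The transitive superclass calculation described at the start of this section, including its termination in the presence of cycles, can be sketched as follows. This is a minimal illustration over a plain map of class names and does not use the GATE ontology API; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SuperClassClosure {
    /**
     * Returns every class reachable from start by following direct-superclass
     * links. The visited set bounds the work by the number of classes in the
     * hierarchy and makes the walk terminate even if the hierarchy has a cycle.
     */
    static Set<String> allSuperClasses(Map<String, List<String>> directSupers, String start) {
        Set<String> closure = new LinkedHashSet<>();
        Deque<String> pending =
            new ArrayDeque<>(directSupers.getOrDefault(start, List.of()));
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (closure.add(current)) { // expand each class only once
                pending.addAll(directSupers.getOrDefault(current, List.of()));
            }
        }
        return closure;
    }

    /** A top class is one with no direct superclasses. */
    static boolean isTopClass(Map<String, List<String>> directSupers, String cls) {
        return directSupers.getOrDefault(cls, List.of()).isEmpty();
    }
}
```

In this sketch a class that takes part in a cycle ends up in its own closure, which is one simple way to detect the kind of definition error the text warns about.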
14.1.2 Instances
Instances, also often called individuals, are objects that belong to classes. Like named classes, each instance is identified by a URI. Each instance can belong to one or more classes and can have properties with values. Two instances can have the sameInstanceAs relation, which indicates that the property values assigned to both instances should be shared and that all the properties applicable to one instance are also valid for the other. In addition, there is a differentInstanceAs relation, which declares the instances as disjoint.

Instances are represented by OInstance objects in the API. API methods are provided for getting all the instances in an ontology, all the ones that belong to a given class, and all the property values for a given instance. There is also a method to retrieve a list of classes that the instance belongs to, using either transitive or direct closure.
A datatype property is associated with an ontology instance and can have a Literal value that is compatible with its data type. A data type can be one of the pre-defined data types in the GATE ontology API:
http://www.w3.org/2001/XMLSchema#boolean
http://www.w3.org/2001/XMLSchema#byte
http://www.w3.org/2001/XMLSchema#date
http://www.w3.org/2001/XMLSchema#decimal
http://www.w3.org/2001/XMLSchema#double
http://www.w3.org/2001/XMLSchema#duration
http://www.w3.org/2001/XMLSchema#float
http://www.w3.org/2001/XMLSchema#int
http://www.w3.org/2001/XMLSchema#integer
http://www.w3.org/2001/XMLSchema#long
http://www.w3.org/2001/XMLSchema#negativeInteger
http://www.w3.org/2001/XMLSchema#nonNegativeInteger
http://www.w3.org/2001/XMLSchema#nonPositiveInteger
http://www.w3.org/2001/XMLSchema#positiveInteger
http://www.w3.org/2001/XMLSchema#short
http://www.w3.org/2001/XMLSchema#string
http://www.w3.org/2001/XMLSchema#time
http://www.w3.org/2001/XMLSchema#unsignedByte
http://www.w3.org/2001/XMLSchema#unsignedInt
http://www.w3.org/2001/XMLSchema#unsignedLong
http://www.w3.org/2001/XMLSchema#unsignedShort
A set of ontology classes can be specified as a property's domain; in that case the property can be associated only with instances belonging to all of the classes specified in that domain (the intersection of the set of domain classes). Datatype properties can have other datatype properties as subproperties.

3. Object Property: An object property is associated with an ontology instance and has an instance as value. A set of ontology classes can be specified as the property's domain and range. Then the property can only be associated with the instances belonging to all of the classes specified as the domain. Similarly, only the instances that belong to all the classes specified in the range can be set as values. Object properties can have other object properties as subproperties.

4. RDF Property: RDF properties are more general than datatype or object properties. The GATE ontology API uses RDFProperty objects to hold datatype properties, object properties, annotation properties or actual RDF properties (rdf:Property).
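The intersection semantics of property domains (and, by the same rule, ranges) can be made concrete with a small sketch. It works on plain sets of class names rather than on GATE API objects, and the class and method names are hypothetical.

```java
import java.util.Set;

final class DomainCheck {
    /**
     * True if the instance belongs to every class listed in the domain,
     * i.e. it lies in the intersection of the domain classes. The set of
     * instance classes is assumed to already include inherited superclasses.
     * An empty domain places no constraint on the instance.
     */
    static boolean inDomain(Set<String> instanceClasses, Set<String> domainClasses) {
        return instanceClasses.containsAll(domainClasses);
    }
}
```

The same check applied to a candidate value and the range classes would decide whether that value may be set for an object property.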
All properties (except the annotation properties) can be marked as functional properties, which means that for a given instance in their domain, they can only take at most one value, i.e. they define a function in the algebraic sense. Properties inverse to functional properties are marked as inverse functional. If one likens ontology properties to algebraic relations, the semantics of these become apparent.
14.1.4 URIs
URIs are used to identify resources (instances, classes, properties) in an ontology. All URIs that identify classes, instances, or properties in an ontology must consist of two parts:

a name part: this is the part after the last slash (/) or the first hash (#) in the URI. This part of the URI is often used as a shorthand name for the entity (e.g. in the ontology editor) and is often called a fragment identifier.
a namespace part: the part that precedes the name, including the trailing slash or hash character.

URIs uniquely identify resources: each resource can have at most one URI and each URI can be associated with at most one resource.

URIs are represented by OURI objects in the API. The Ontology object provides factory methods to create OURIs from a complete URI string or by appending a name to the default namespace of the ontology. However, it is the responsibility of the caller to ensure that any strings that are passed to these factory methods do in fact represent valid URIs. GATE provides some helper methods in the OUtils class to help with encoding and decoding URI strings.
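The split into namespace part and name part can be sketched with plain string handling. This is not the OURI or OUtils implementation; the class name is hypothetical, and the sketch assumes that a hash, when present, takes precedence over the last slash.

```java
final class UriParts {
    /** Index of the separator that ends the namespace part, or -1 if there is none. */
    private static int splitIndex(String uri) {
        int hash = uri.indexOf('#');               // first hash, if any
        return hash >= 0 ? hash : uri.lastIndexOf('/'); // otherwise last slash
    }

    /** The namespace part, including the trailing slash or hash character. */
    static String namespacePart(String uri) {
        return uri.substring(0, splitIndex(uri) + 1);
    }

    /** The name part that follows the separator. */
    static String namePart(String uri) {
        return uri.substring(splitIndex(uri) + 1);
    }
}
```

For a string with neither separator, this sketch returns an empty namespace and treats the whole string as the name; a real implementation would more likely reject such input.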
14.2
An Ontology Event Model (OEM) is implemented and incorporated into the new GATE ontology API. Under the new OEM, events are fired when a resource is added, modified or deleted from the ontology. An interface called OntologyModificationListener is created with five methods (see below) that need to be implemented by the listeners of ontology events.
This method is invoked whenever an ontology resource (a class, property or instance) is removed from the ontology. Deleting one resource can also result in the deletion of other dependent resources. For example, deleting a class should also delete all its instances (more details on how deletion works are explained later). The second parameter, an array of strings, provides a list of URIs of resources deleted from the ontology.
public void resourceAdded(Ontology ontology, OResource resource);
This method is invoked whenever a new resource is added to the ontology. The parameters provide references to the ontology and the resource being added to it.
public void ontologyRelationChanged(Ontology ontology, OResource resource1, OResource resource2, int eventType);
This method is invoked whenever a relation between two resources (e.g. OClass and OClass, RDFProperty and RDFProperty, etc.) is changed. Example events are the addition or removal of a subclass or a subproperty, two classes or properties being set as equivalent or different, and two instances being set as same or different. The first parameter is the reference to the ontology, the next two parameters are the resources being affected and the final parameter is the event type. Please refer to the list of events specified below for the different types of events.
public void resourcePropertyValueChanged(Ontology ontology, OResource resource, RDFProperty property, Object value, int eventType);
This method is invoked whenever any property value is added to or removed from a resource. The first parameter provides a reference to the ontology in which the event took place. The second provides a reference to the resource affected, the third parameter provides a reference to the property for which the value is added or removed, the fourth parameter is the actual value being set on the resource and the fifth parameter identifies the type of event.
public void ontologyReset(Ontology ontology)
This method is called whenever the ontology is reset, in other words when all resources of the ontology are deleted using the ontology.cleanup method.

The OConstants class defines the static constants, listed below, for various event types.
public static final int OCLASS_ADDED_EVENT;
public static final int ANONYMOUS_CLASS_ADDED_EVENT;
// ... (the remaining event-type constants are declared in the same way)
An ontology is responsible for firing various ontology events. Objects wishing to listen to the ontology events must implement the methods above and must be registered with the ontology using the following method.
addOntologyModificationListener(OntologyModificationListener oml);
Taking these various relations into account, a change in one resource can affect other resources in the ontology. Below we describe what the GATE ontology API does when a resource is deleted.

When a class is deleted:

• A list of all its super classes is obtained. For each class in this list, a list of its subclasses is obtained and the deleted class is removed from it.
• All subclasses of the deleted class are removed from the ontology.
• A list of all its equivalent classes is obtained. For each class in this list, a list of its equivalent classes is obtained and the deleted class is removed from it.
• All instances of the deleted class are removed from the ontology.
• All properties are checked to see if they contain the deleted class as a member of their domain or range. If so, the respective property is also deleted from the ontology.

When an instance is deleted:

• A list of all its same instances is obtained. For each instance in this list, a list of its same instances is obtained and the deleted instance is removed.
• A list of all instances set as different from the deleted instance is obtained. For each instance in this list, a list of instances set as different from it is obtained and the deleted instance is removed.
• All the instances of the ontology are checked to see if any of their set properties have the deleted instance as value. If so, the respective set property is altered to remove the deleted instance.

When a property is deleted:

• A list of all its super properties is obtained. For each property in this list, a list of its sub properties is obtained and the deleted property is removed.
• All sub properties of the deleted property are removed from the ontology.
• A list of all its equivalent properties is obtained. For each property in this list, a list of its equivalent properties is obtained and the deleted property is removed.
• All instances and resources of the ontology are checked to see if they have the deleted property set on them. If so, the respective property is deleted.
14.3
The plugin Ontology contains the current ontology API implementation. This implementation provides the additions and enhancements introduced into the GATE ontology API as of release 5.1. It is based on a backend that uses Sesame version 2 and OWLIM version 3.
Before any ontology-based functionality can be used, the plugin must be loaded into GATE. To do this in the GATE Developer GUI, select the 'Manage CREOLE Plugins' option from the 'File' menu and check the 'Load now' checkbox for the 'Ontology' plugin, then click OK. After this, the context menu for Language Resources will include the following ontology language resources:
OWLIMOntology: this is the standard language resource to use in most situations. It allows the user to create a new ontology backed by files in a local directory and optionally load ontology data into it.

OWLIMOntology DEPRECATED: this language resource has the same functionality as OWLIMOntology but uses exactly the same package and class name as the language resource in the plugin Ontology_OWLIM2. This LR is provided to allow an easier upgrade of existing pipelines to the new implementation, but users should move to the OWLIMOntology LR as soon as possible.

ConnectSesameOntology: this language resource allows the use of ontologies that are already stored in a Sesame2 repository which is either stored in a directory or accessible from a server. This is useful for quickly re-using a very large ontology that has been previously created as a persistent OWLIMOntology language resource.

CreateSesameOntology: this language resource allows the user to create a new empty ontology by specifying the repository configuration for creating the Sesame repository. Note: This ...
Each of these language resources is explained in more detail in the following sections.

To make the plugin available to your GATE Embedded application, load the plugin prior to creating one of the ontology language resources using the following code:
// Find the directory for the Ontology plugin
File pluginHome =
    new File(new File(Gate.getGateHome(), "plugins"), "Ontology");
// Register that directory so the plugin is loaded
Gate.getCreoleRegister().registerDirectories(pluginHome.toURI().toURL());
To create a new OWLIM Ontology resource, select 'OWLIM Ontology' from the right-click 'New' menu for language resources. A dialog as shown in Figure 14.1 appears with the following parameters to fill in or change:
Name (optional): if no name is given, a default name will be generated: if an ontology is loaded from a URL, based on that URL, otherwise based on the language resource name.

baseURI (optional): the URI to be used for resolving relative URI references in the ontology during loading.

dataDirectoryName (optional): the name of an existing directory on the file system in which the directory that backs the ontology store will be created. The name of the directory that will be created within the data directory will be GATE_OWLIMOntology_ followed by a string representation of the system time. If this parameter is not specified, the value of the system property java.io.tmpdir is used; if this is not set either, an error is raised.

loadImports (optional): either true or false. If set to false, all ontology import specifications found in the loaded ontology are ignored. This parameter is ignored if no ontology is loaded when the language resource is created.

mappingsURL (optional): the URL of a text file containing import mappings specifications. See section 14.3.5 for a description of the mappings file. If no URL is specified, GATE will interpret each import URI found as a URL and try to import the data from that URL. If the URI is not absolute it will get resolved against the base URI.

persistent (optional): true or false: if false, the directory created inside the data directory is removed when the language resource is closed, otherwise that directory is kept. The ConnectSesameOntology language resource can be used at a later time to connect to such a directory and create an ontology language resource for it (see Section 14.3.2).

rdfXmlUrl (optional): a URL specifying the location of an ontology in RDF/XML serialization format (see http://www.w3.org/TR/rdf-syntax-grammar/) from which to load initial ontology data. The parameter name can be changed from rdfXmlUrl to n3Url to indicate N3 serialization format (see http://www.w3.org/DesignIssues/Notation3.html), to ntriplesUrl to indicate N-Triples format (see http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/#ntriples), and to turtleUrl to indicate TURTLE serialization format (see http://www.w3.org/TeamSubmission/turtle/). If this is left blank, no ontology is loaded and an empty ontology language resource is created.
Note: you could create a language resource such as OWLIM Ontology from GATE Developer successfully, but you will not be able to browse/edit the ontology unless you loaded the Ontology Tools plugin beforehand.
Additional ontology data can be loaded into an existing ontology language resource by selecting the 'Load' option from the language resource's context menu. This will show the dialog shown in figure 14.2. The parameters in this dialog correspond to the parameters in the dialog for creating a new ontology, with the addition of one new parameter: 'load as import'. If this parameter is checked, the ontology data is loaded specifically as an ontology import. Ontology imports can be excluded from what is saved at a later time.
Figure 14.3 shows the ontology save dialog that is shown when the option 'Save as...' is selected from the language resource's context menu. The parameter 'include imports' allows the user to specify if the data that has been loaded through imports should be included in the saved data or not.
the name of the Sesame repository holding the ontology store. For a backing store created with the 'OWLIM Ontology' language resource, this is always 'owlim3'.

repositoryLocation: the URL of the location where to find the repository holding the ontology store. The URL can either specify a local directory or an HTTP server. For a backing store created with the 'OWLIM Ontology' language resource this is the directory that was created inside the data directory (the name of the directory starting with GATE_OWLIMOntology). If the URL specifies an HTTP server which requires authentication, the user-ID and password have to be included in the URL (e.g. http://userid:passwd@localhost:8080/openrdf-sesame).
Note that this ontology language resource is only supported when connected with an OWLIM3 repository configured to use the owl-max ruleset and with partialRDFS optimizations disabled! Connecting to any other repository is experimental and for expert users only! Also note that connecting to a repository that is already in use by GATE or any other application is not supported and might result in unwanted or erroneous behavior!
In some cases one might want to suppress the import of certain ontologies or one might want to load the data from a different location, e.g. from a file on the local file system instead. With the OWLIMOntology language resource this can be achieved by specifying an import mappings file when creating the ontology.

An import mappings file (see figure 14.6 for an example) is a plain file that maps specific import URIs to URLs or to nothing at all. Each line that is not empty and does not start with a hash (#) indicating a comment line must contain a URI. If the URI is not followed by anything, this URI will be ignored when processing imports. If the URI is followed by something, this is interpreted as a URL that is used for resolving the import of the URI. Local files can be specified as file: URLs or by just giving the absolute or relative pathname of the file in Linux path notation (forward slashes as path separators). At the moment, filenames with embedded whitespace are not supported. If a pathname is relative it will be resolved relative to the directory which contains the mappings file.
# map this import to another web url
http://proton.semanticweb.org/2005/04/protont http://mycompany.com/owl/protont.owl
# map this import to a file in the same directory as the mappings file
http://proton.semanticweb.org/2005/04/protons protons.owl
# ignore this import
http://somewhere.com/reallyhugeimport
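The line format of the mappings file described above is simple enough to sketch a reader for it. The following is only an illustration of the rules as stated in the text, not the plugin's own parser; the class name is hypothetical, and resolving relative pathnames against the directory of the mappings file is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class ImportMappings {
    /**
     * Parses mappings-file lines into a map from import URI to an optional
     * replacement location. An empty Optional means "ignore this import".
     */
    static Map<String, Optional<String>> parse(List<String> lines) {
        Map<String, Optional<String>> mappings = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment line
            }
            // Whitespace separates the URI from its target. File names with
            // embedded whitespace are not supported, so two fields suffice.
            String[] fields = line.split("\\s+", 2);
            Optional<String> target =
                fields.length > 1 ? Optional.of(fields[1]) : Optional.empty();
            mappings.put(fields[0], target);
        }
        return mappings;
    }
}
```

Applied to the example file, the first two URIs would map to a web URL and to protons.owl respectively, while the last one would map to an empty Optional.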
clear: Clear the repository and remove all triples from it.
ask: Perform an ASK query. The result of the ASK query is printed to standard output.
query: Perform a SELECT query. The result of the query is printed in tabular form to standard output. The default column separation character is a tab, and if the column separator or a new line character occurs in a value it is changed to a space.
update: Perform a SPARQL update query (INSERT, DELETE).
import: Import data into the repository from a file.
export: Export data from the repository into a file.
create: Create a new repository using a TURTLE repository configuration file.
delete: Delete a repository. Note that due to a Sesame limitation, the actual files for the repository may not be removed from the disk for remote ontologies on a server.
14.4
is the base URI to be used for all new items that are only mentioned using their local name. This can safely be left empty, in which case, while adding new resources to the ontology, users are asked to provide name spaces for each new resource.
2. As indicated earlier, OWLIM supports four different formats: RDF/XML, NTriples, Turtle and N3. According to the format of the ontology file, the user should select one of the four URL options (rdfXmlURL, ntriplesURL, turtleURL and n3URL (not supported yet)) and provide a URL pointing to the ontology data.

Once an ontology is created, additional data can be loaded that will be merged with the existing information. This can be done by right-clicking on the ontology in the resources tree in GATE Developer and selecting 'Load ... data' where '...' is one of the supported formats.

Other options available are cleaning the ontology (deleting all the information from it) and saving it to a file in one of the supported formats.

An ontology can be saved in different formats (rdf/xml, ntriples, n3 and turtle) using the options provided in the context menu that can be invoked by right-clicking on the instance of an ontology in GATE Developer. All the changes made to the ontology are logged and stored as an ontology feature. Users can also export these changes to a file by selecting the 'Save Ontology Event Log' option from the context menu. Similarly, users can also load the exported event log and apply the changes on a different ontology by using the 'Load Ontology Event Log' option. Any change made to the ontology can be described by a set of triples either added to or deleted from the repository. For example, in GATE Embedded, the addition of a new instance results in the addition of two statements to the repository:
// Adding a new instance "Rec1" of type "Recognized"
// Here + indicates the addition
+ <http://proton.semanticweb.org/2005/04/protons#Rec1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://proton.semanticweb.org/2005/04/protons#Recognized>
// Adding a label (annotation property) to the instance with
// value "Rec Instance"
+ <http://proton.semanticweb.org/2005/04/protons#Rec1> <http://www.w3.org/2000/01/rdf-schema#label> <Rec Instance> <http://www.w3.org/2001/XMLSchema#string>
The event log therefore contains a list of such triples, the latest change being at the bottom of the change log. Each triple consists of a subject followed by a predicate followed by an object. Below we give an illustration explaining the syntax used for recording the changes.
// Adding a new instance "Rec1" of type "Recognized"
// Here + indicates the addition
+ <http://proton.semanticweb.org/2005/04/protons#Rec1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://proton.semanticweb.org/2005/04/protons#Recognized>
// Adding a label (annotation property) to the instance with
// value "Rec Instance"
+ <http://proton.semanticweb.org/2005/04/protons#Rec1> <http://www.w3.org/2000/01/rdf-schema#label> <Rec Instance> <http://www.w3.org/2001/XMLSchema#string>
// Adding a new class called TrustSubClass
+ <http://proton.semanticweb.org/2005/04/protons#TrustSubClass> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class>
// TrustSubClass is a subClassOf the class Trusted
+ <http://proton.semanticweb.org/2005/04/protons#TrustSubClass> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://proton.semanticweb.org/2005/04/protons#Trusted>
// Deleting a property called hasAlias and all relevant statements
// Here - indicates the deletion
// * indicates any value in place
- <http://proton.semanticweb.org/2005/04/protons#hasAlias> <*> <*>
- <*> <http://proton.semanticweb.org/2005/04/protons#hasAlias> <*>
- <*> <*> <http://proton.semanticweb.org/2005/04/protons#hasAlias>
// Deleting a label set on the instance Rec1
- <http://proton.semanticweb.org/2005/04/protons#Rec1> <http://www.w3.org/2000/01/rdf-schema#label> <Rec Instance> <http://www.w3.org/2001/XMLSchema#string>
// Resetting the entire ontology (Deleting all statements)
- <*> <*> <*>
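The change-log syntax illustrated above can be read back with a few lines of code. The sketch below is not GATE's own log reader; the class name is hypothetical, and it assumes that each change occupies a single line beginning with + or -, that comment lines start with //, and that every term is enclosed in angle brackets.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EventLogEntry {
    // Matches one <...> term; literal values may contain spaces.
    private static final Pattern TERM = Pattern.compile("<([^>]*)>");

    final boolean addition;   // true for '+', false for '-'
    final List<String> terms; // subject, predicate, object, optional datatype

    private EventLogEntry(boolean addition, List<String> terms) {
        this.addition = addition;
        this.terms = Collections.unmodifiableList(terms);
    }

    /** Parses one log line; returns empty for blank lines and // comments. */
    static Optional<EventLogEntry> parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("//")) {
            return Optional.empty();
        }
        char marker = trimmed.charAt(0);
        if (marker != '+' && marker != '-') {
            throw new IllegalArgumentException("Expected '+' or '-': " + line);
        }
        List<String> terms = new ArrayList<>();
        Matcher matcher = TERM.matcher(trimmed);
        while (matcher.find()) {
            terms.add(matcher.group(1));
        }
        if (terms.size() < 3) {
            throw new IllegalArgumentException("Expected at least three terms: " + line);
        }
        return Optional.of(new EventLogEntry(marker == '+', terms));
    }
}
```

A term equal to * stands for "any value", so a consumer of these entries would treat it as a wildcard when applying deletions.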
14.5
GATE's ontology support also includes a viewer/editor that can be used within GATE Developer to navigate an ontology and quickly inspect the information relating to any of the objects defined in it: classes and restrictions, instances and their properties. Also, resources can be deleted and new resources can be added through the viewer.

Before the ontology editor can be used, one of the ontology implementation plugins must be loaded. In addition, the Ontology_Tools plugin must be loaded.
Note: To make it possible to show a loaded ontology in the ontology editor, the Ontology_Tools plugin must be loaded before the ontology language resource is created.
The viewer is divided into two areas. The one on the left shows separate tabs for the hierarchy of classes and instances and (as of Gate 4) for the hierarchy of properties. The view on the right-hand side shows the details pertaining to the object currently selected in the other two.

The first tab on the left view displays a tree which shows all the classes and restrictions defined in the ontology. The tree can have several root nodes, one for each top class in the ontology. The same tree also shows the instances of each class. Note: Instances that belong to several classes are shown as children of all the classes they belong to.
The second tab on the left view displays a tree of all the properties defined in the ontology. This tree can also have several root nodes, one for each top property in the ontology. Different types of properties are distinguished by using different icons.

Whenever an item is selected in the tree view, the right-hand view is populated with the details that are appropriate for the selected object. For an ontology class, the details include brief information about the resource such as the URI of the selected class, the type of the selected class etc., the set of direct superclasses, the set of all superclasses using the transitive closure, the set of direct subclasses, the set of all the subclasses, the set of equivalent classes, the set of applicable property types, the set of property values set on the selected class, and the set of instances that belong to the selected class. For a restriction, in addition to the above information, it displays the property the restriction applies to and the type of the restriction.

For an instance, the details displayed include brief information about the instance, the set of direct types (the list of classes this instance is known to belong to), the set of all types this instance belongs to (through the transitive closure of the set of direct types), the set of same instances, the set of different instances and the values for all the properties that are set.

When a property is selected, different information is displayed in the right-hand view according to the property type. It includes brief information about the property itself, the set of direct superproperties, the set of all superproperties (obtained through the transitive closure), the set of direct subproperties, the set of all subproperties (obtained through the transitive closure), the set of equivalent properties, and domain and range information.

As mentioned in the description of the data model, properties are not directly linked to the classes, but rather define their domain of applicability through a set of domain restrictions. This means that the list of properties should not really be listed as a detail for class objects but only for instances. It is however quite useful to have an indication of the types of properties that could apply to instances of a given class. Because of the semantics of property domains, it is not possible to calculate precisely the list of applicable properties for a given class, but only an estimate of it. If a property, for instance, requires its domain instances to belong to two different classes, then it cannot be known with certitude whether it is applicable to either of the two classes: it does not apply to all instances of any of those classes, but only to those instances the two classes have in common. Because of this, such properties will not be listed as applicable to any class.

The information listed in the details pane is organised in sub-lists according to the type of the items. Each sub-list can be collapsed or expanded by clicking on the little triangular button next to the title. The ontology viewer is dynamic and will update the information displayed whenever the underlying ontology is changed through the API.

When you double-click on any resource in the details table, the respective resource is selected in the class or in the property tree and the selected resource's details are shown in the details table. To change a property value, the user can double-click on a value of the property (second column), and a window is shown where the user is asked to provide a new value. Along with each property value, a button (with a red X caption) is provided. If the user wants to remove a property value, he or she can click on the button and the property value is deleted.

A new toolbar has been added at the top of the ontology viewer, which contains the following buttons to add and delete ontology resources:

Add new top class (TC)
Add new subclass (SC)
Add new instance (I)
Add new restriction (R)
Add new Annotation property (A)
Add new Datatype property (D)
Add new Object property (O)
Add new Symmetric property (S)
The tree components allow the user to select more than one node, but the details table on the right-hand side of the GATE Developer GUI only shows the details of the first selected node. The buttons in the toolbar are enabled and disabled based on the user's selection of nodes in the tree.

1. Creating a new top class: A window appears which asks the user to provide details for its namespace (default name space if specified), and class name. If there is already a class with the same name in the ontology, GATE Developer shows an appropriate message.

2. Creating a new subclass: A class can have multiple super classes. Therefore, selecting multiple classes in the ontology tree and then clicking on the 'SC' button automatically considers the selected classes as the super classes. The user is then asked for details for its namespace and class name.

3. Creating a new instance: An instance can belong to more than one class. Therefore, selecting multiple classes in the ontology tree and then clicking on the 'I' button automatically considers the selected classes as the types of the new instance. The user is then prompted to provide details such as namespace and instance name.

4. Creating a new restriction: As described above, a restriction is a type of anonymous class and is specified on a property with a constraint set on either the number of values it can take or the type of value allowed for instances to have for that property. The user can click on the blue 'R' square button, which shows a window for creating a new restriction. The user can select a type of restriction, a property and a value constraint for it. Please note that restrictions are considered as anonymous classes, so the user does not have to specify any URI for them; restrictions are named automatically by the system.

5. Creating a new property: The editor allows creating five different types of properties:
- Annotation property: Since an annotation property cannot have any domain or range constraints, clicking on the new annotation property button brings up a dialog that asks the user for information such as the namespace and the annotation property name.
- Datatype property: A datatype property can have one or more ontology classes as its domain and one of the pre-defined datatypes as its range. Selecting one or more classes and clicking on the new Datatype property icon brings up a window where the selected classes in the tree are taken as the property's domain. The user is then asked to provide information such as the namespace and the property name. A drop down box allows users to select one of the data types from the list.
- Object, Symmetric and Transitive properties: These properties can have one or more classes as their domain and range. For a symmetric property the domain and range are the same. Clicking on any of these options brings up a window where the user is asked to provide information such as the namespace and the property name. The user is also given two buttons to select one or more classes as values for domain and range.

6. Removing the selected resources: All the selected nodes are removed when the user clicks on the 'X' button. Please note that since ontology resources are related in various ways, deleting a resource can affect other resources in the ontology; for example, deleting a resource can cause other resources in the same ontology to be deleted too.

7. Searching in ontology: The Search button allows users to search for resources in the ontology. A window pops up with an input text field that allows incremental searching. In other words, as the user types in the name of the resource, the drop-down list refreshes itself to contain only the resources that start with the typed string. Selecting one of the resources in this list and pressing OK selects the appropriate resource in the editor. The Search function also allows selecting resources by the property values set on them.

8. Refresh Ontology: The refresh button reloads the ontology and updates the editor.

9. Setting properties on instances/classes: Right-clicking on an instance brings up a menu that provides a list of properties that are inherited and applicable to its classes. Selecting a specific property from the menu allows the user to provide a value for that property. For example, if the property is an Object property, a new window appears which allows the user to select one or more instances which are compatible with the range of the selected property. The selected instances are then set as property values. For classes, all the properties (e.g. annotation and RDF properties) are listed on the menu.

10. Setting relations among resources: Two or more classes, or two or more properties, can be set as equivalent; similarly two or more instances can be marked as the same. Right-clicking on a resource brings up a menu with an appropriate option (Equivalent Class for ontology classes, Same As Instance for instances and Equivalent Property for properties) which when clicked then
14.6 Ontology Annotation Tool
The Ontology Annotation Tool (OAT) is a GATE plugin available from the Ontology Tools plugin set, which enables a user to manually annotate a text with respect to one or more ontologies. The required ontology must be selected from a pull-down list of available ontologies. The OAT tool supports annotation with information about the ontology classes, instances and properties.
user needs to make sure that they still check the classes and instances of annotations further down in the text, in case the same string has a different meaning (e.g., bank as a building vs. bank as a river bank).

The edit dialogue also allows correcting annotation offset boundaries. In other words, the user can expand or shrink the annotation offsets' boundaries by clicking on the relevant arrow buttons.

OAT also allows users to assign property values as annotation features to the existing class and instance annotations. In the case of a class annotation, all annotation properties from the ontology are displayed in the table. In the case of instance annotations, all properties from the ontology applicable to the selected instance are shown in the table. The table also shows existing features of the selected annotation. The user can then add, delete or edit any value(s) of the selected feature. In the case of a property, the user is allowed to provide an arbitrary number of values. By clicking on the editList button, the user can add, remove or edit any value of the property. In the case of object properties, users are only allowed to select values from a pre-selected list of values (i.e. instances which satisfy the selected property's range constraints).
14.6.4 Options
There are several options that control the OAT behaviour (see Figure 14.11):
14.7 Relation Annotation Tool
This tool is designed to annotate a document with ontology instances and to create relations between annotations with ontology object properties. It is close to and compatible with OAT but focuses on relations between annotations; see section 14.6 for OAT. To use it you must load the Ontology Tools plugin, load a document and an ontology, then show the document and, in the document editor, click on the button named 'RAT-C' (Relation Annotation Tool Class view), which will also display the 'RAT-I' view (Relation Annotation Tool Instance view).
The right vertical view shows the loaded ontologies as trees. To show/hide the annotations in the document, use the class checkbox. The selection of a class and the ticking of a checkbox are independent and work the same as in the annotation sets view. To change the annotation set used to load/save the annotations, use the drop down list at the bottom of the vertical view. To hide/show the classes in the tree in order to decrease the number of elements displayed, use the context menu on the class selection. The setting is saved in the user preferences.

The bottom horizontal view shows two tables: one for instances and one for properties. The instances table shows the instances and their labels for the selected class in the ontology trees, and the properties table shows the property values for the selected instance in the instances table. Two buttons allow you to add a new instance from the text selection in the document, or to add the selection as a new label for the selected instance. To filter on instance labels, use the filter text field. You can clear the field with the X button at the end of the field. You can use 'Show In Ontology Editor' on the context menu of an instance in the instance table. Then in the ontology editor you can add class or object properties.
14.7.3 Create new annotation and add label to existing instance from text selection
- select a class in the ontology tree at the right
- select some text in the document editor and hover the mouse on it
- if the instances table is empty then clear the filter text field
- select an existing instance in the instances table
- use the button 'Add to Selected Inst.' in the view at the bottom
- in the bottom left table you have your new label
- don't forget to save your document AND the ontology before you quit
14.8
The following code demonstrates how to use the GATE API to create an instance of the OWLIM Ontology language resource. This example shows how to use the current version of the API and ontology implementation. For an example of using the old API and the backwards compatibility plugin, see 14.9.
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import gate.Factory;
import gate.FeatureMap;
import gate.Gate;
import gate.creole.ontology.*;

// step 1: initialize GATE
if (!Gate.isInitialized()) {
  Gate.init();
}

// step 2: load the Ontology plugin that contains the implementation
File ontoHome = new File(Gate.getPluginsHome(), "Ontology");
Gate.getCreoleRegister().addDirectory(ontoHome.toURI().toURL());

// step 3: set the parameters
FeatureMap fm = Factory.newFeatureMap();
fm.put("rdfXmlURL", urlOfTheOntology);
fm.put("baseURI", theBaseURI);
fm.put("mappingsURL", urlOfTheMappingsFile);
// .. any other parameters

// step 4: finally create an instance of ontology
Ontology ontology = (Ontology) Factory.createResource(
    "gate.creole.ontology.impl.sesame.OWLIMOntology", fm);

// retrieving a list of top classes
Set<OClass> topClasses = ontology.getOClasses(true);

// for all top classes, printing their direct sub classes
for (OClass c : topClasses) {
  Set<OClass> dcs = c.getSubClasses(OConstants.DIRECT_CLOSURE);
  for (OClass sClass : dcs) {
    System.out.println(sClass.getONodeID().toTurtle());
  }
}

// creating a new class from a full URI
OURI aURI1 = ontology.createOURI("http://sample.en/owlim#Organization");
OClass organizationClass = ontology.addOClass(aURI1);

// create a new class from a name and the default name space set for
// the ontology
OURI aURI2 = ontology.createOURIForName("someOtherName");
OClass someOtherClass = ontology.addOClass(aURI2);
someOtherClass.setLabel("some other name", OConstants.ENGLISH);

// creating a new Datatype property called name
// with domain set to Organization
// with datatype set to string
OURI dURI = ontology.createOURI("http://sample.en/owlim#Name");
Set<OClass> domain = new HashSet<OClass>();
domain.add(organizationClass);
DatatypeProperty dp = ontology.addDatatypeProperty(
    dURI, domain, DataType.getStringDataType());

// creating a new instance of class organization called IBM
OURI iURI = ontology.createOURI("http://sample.en/owlim#IBM");
OInstance ibm = ontology.addOInstance(iURI, organizationClass);

// printing the values of all Datatype properties set on the instance ibm
Set<DatatypeProperty> dps = ontology.getDatatypeProperties();
for (DatatypeProperty p : dps) {
  List<Literal> values = ibm.getDatatypePropertyValues(p);
  System.out.println("DP : " + p.getOURI());
  for (Literal l : values) {
    System.out.println("Value : " + l.getValue());
    System.out.println("Datatype : " + l.getDataType().getXmlSchemaURI());
  }
}

// writing the ontology data to a file in Turtle format
BufferedWriter writer = new BufferedWriter(new FileWriter(someFile));
ontology.writeOntologyData(writer, OConstants.OntologyFormat.TURTLE);
writer.close();
14.9
The following code demonstrates how to use the GATE API to create an instance of the OWLIM Ontology language resource. This example shows how to use the API with the backwards-compatibility plugin Ontology_OWLIM2. For how to use the API with the current implementation plugin, see 14.8.
// step 1: initialize GATE
Gate.init();

// step 2: load the plugin
File ontoHome = new File(Gate.getPluginsHome(), "Ontology_OWLIM2");
Gate.getCreoleRegister().addDirectory(ontoHome.toURL());

// step 3: set the parameters
FeatureMap fm = Factory.newFeatureMap();
fm.put("rdfXmlURL", urlOfTheOntology);

// step 4: finally create an instance of ontology
Ontology ontology = (Ontology) Factory.createResource(
    "gate.creole.ontology.owlim.OWLIMOntologyLR", fm);

// retrieving a list of top classes
Set<OClass> topClasses = ontology.getOClasses(true);

// for all top classes, printing their direct sub classes
Iterator<OClass> iter = topClasses.iterator();
while (iter.hasNext()) {
  Set<OClass> dcs = iter.next().getSubClasses(OConstants.DIRECT_CLOSURE);
  for (OClass aClass : dcs) {
    System.out.println(aClass.getURI().toString());
  }
}

// creating a new class
// false indicates that it is not an anonymous URI
URI aURI = new URI("http://sample.en/owlim#Organization", false);
OClass organizationClass = ontology.addOClass(aURI);

// creating a new Datatype property called name
// with domain set to Organization
// with datatype set to string
URI dURI = new URI("http://sample.en/owlim#Name", false);
Set<OClass> domain = new HashSet<OClass>();
domain.add(organizationClass);
DatatypeProperty dp = ontology.addDatatypeProperty(
    dURI, domain, DataType.getStringDataType());

// writing the ontology data to a file in N-Triples format
BufferedWriter writer = new BufferedWriter(new FileWriter(someFile));
String output = ontology.getOntologyData(OConstants.ONTOLOGY_FORMAT_NTRIPLES);
writer.write(output);
writer.flush();
writer.close();
14.10
One of the GATE components that makes use of the ontology support is the JAPE transducer (see Chapter 8). Combining the power of ontologies with JAPE's pattern matching mechanisms can ease the creation of applications.

In order to use ontologies with JAPE, one needs to load an ontology in GATE before loading the JAPE transducer. Once the ontology is known to the system, it can be set as the value for the optional ontology parameter for the JAPE grammar. Doing so slightly alters the way the matching occurs when the grammar is executed. If a transducer is ontology-aware (i.e. it has a value set for the 'ontology' parameter) it will treat all occurrences of the feature named class differently from the other features of annotations. The values for the feature class on any type of annotation will be considered as referring to classes in the ontology as follows:
- if the class feature value is a valid URI (e.g. http://sample.en/owlim#Organization) then it is treated as a reference to the class (if any) with that URI in the ontology.
- otherwise, it is treated as a name in the ontology's default namespace. The default namespace is prepended to the value to give a URI and the feature is treated as referring to the class with that URI.

For example, if the default namespace of the ontology is http://gate.ac.uk/example# then a class feature with the value Person refers to the http://gate.ac.uk/example#Person class in the ontology. If the ontology imports other ontologies then it may be useful to define templates for the various namespace URIs to avoid excessive repetition. There is an example of this for the PROTON ontology in section 8.1.6.

In ontology-aware mode the matching between two class values will not be based on simple equality but rather on hierarchical compatibility. For example, if the ontology contains a class named 'Politician', which is a sub class of the class 'Person', then a pattern of {Entity.class == 'Person'} will successfully match an annotation of type Entity with a feature class having the value 'Politician'. If the JAPE transducer were not ontology-aware, such a test would fail.

This behaviour allows a larger degree of generalisation when designing a set of rules. Rules that apply to several types of entities mentioned in the text can be written using the most generic class they apply to and need not be repeated for each subtype of entity. One could have rules applying to Locations without needing to know whether a particular location happens to be a country or a city.

If a domain ontology is available at the time of building an application, using it in conjunction with the JAPE transducers can significantly simplify the set of grammars that need to be written.

The ontology does not normally affect actions on the right hand side of JAPE rules, but when Java is used on the right hand side, the ontology becomes accessible via a local variable named ontology, which may be referenced from within the right-hand-side code. In Java code, the class feature should be referenced using the static final variable LOOKUP_CLASS_FEATURE_NAME, which is defined in gate.creole.ANNIEConstants.
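The two behaviours described above (resolving a class feature value against the default namespace, and matching by hierarchical compatibility rather than equality) can be illustrated with a small stand-alone sketch. It does not use the GATE API: the class name OntologyAwareMatchSketch, the hard-coded hierarchy and the simplified "contains ://" URI test are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of ontology-aware matching of "class" feature values (not GATE code). */
class OntologyAwareMatchSketch {
  // Default namespace taken from the example in the text.
  static final String DEFAULT_NS = "http://gate.ac.uk/example#";

  // Hypothetical hierarchy: child class URI -> parent class URI
  // (single inheritance only, to keep the sketch short).
  static final Map<String, String> SUPER_CLASS = new HashMap<>();
  static {
    SUPER_CLASS.put(DEFAULT_NS + "Politician", DEFAULT_NS + "Person");
    SUPER_CLASS.put(DEFAULT_NS + "City", DEFAULT_NS + "Location");
    SUPER_CLASS.put(DEFAULT_NS + "Country", DEFAULT_NS + "Location");
  }

  /** Values that look like URIs are kept; bare names get the default namespace prepended. */
  static String resolve(String classFeatureValue) {
    // Simplified URI check, an assumption of this sketch.
    boolean looksLikeUri = classFeatureValue.contains("://");
    return looksLikeUri ? classFeatureValue : DEFAULT_NS + classFeatureValue;
  }

  /** True if the annotation's class is the pattern's class or one of its descendants. */
  static boolean matches(String patternClass, String annotationClass) {
    String wanted = resolve(patternClass);
    String current = resolve(annotationClass);
    while (current != null) {
      if (current.equals(wanted)) {
        return true;
      }
      current = SUPER_CLASS.get(current); // walk up towards the root
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(matches("Person", "Politician"));
    System.out.println(matches("Politician", "Person"));
  }
}
```

In this toy hierarchy a pattern on Person accepts an annotation whose class is Politician, while the reverse test fails, which mirrors the {Entity.class == 'Person'} example above.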
14.11
The ontology-aware JAPE transducer enables the text to be linked to classes in an ontology by means of annotations. Essentially this means that each annotation can have a class and an ontology feature. Adding the relevant class feature to an annotation is very easy: simply add a feature 'class' with the class name as its value. To add the relevant ontology, use ontology.getURL().

Below is a sample rule which looks for a Location annotation and identifies it as a 'Mention' annotation with the class 'Location' and the ontology loaded with the ontology-aware JAPE transducer (via the runtime parameter of the transducer).
Rule: Location
({Location}):mention
-->
:mention{
  // create the ontology and class features
  FeatureMap features = Factory.newFeatureMap();
  features.put("ontology", ontology.getURL());
  features.put("class", "Location");

  // create the new annotation
  try {
    annotations.add(mentionAnnots.firstNode().getOffset(),
        mentionAnnots.lastNode().getOffset(), "Mention", features);
  } catch(InvalidOffsetException e) {
    throw new JapeException(e);
  }
}
14.12
Populating Ontologies
Another typical application that combines the use of ontologies with NLP techniques is finding mentions of entities in text. The scenario is that one has an existing ontology and wants to use Information Extraction to populate it with instances whenever entities belonging to classes in the ontology are mentioned in the input texts.

Let us assume we have an ontology and an IE application that marks the input text with annotations of type 'Mention' having a feature 'class' specifying the class of the entity mentioned. The task we are seeking to solve is to add instances in the ontology for every Mention annotation.

The example presented here is based on a JAPE rule that uses Java code on the action side in order to access the GATE ontology API directly:
Rule: FindEntities
({Mention}):mention
-->
:mention{
  // find the annotation matched by the LHS
  // we know the annotation set returned
  // will always contain a single annotation
  Annotation mentionAnn = mentionAnnots.iterator().next();

  // find the class of the mention
  String className = (String) mentionAnn.getFeatures().get(
      gate.creole.ANNIEConstants.LOOKUP_CLASS_FEATURE_NAME);
  // should normalize class name and avoid invalid class names here!
  OClass aClass = ontology.getOClass(ontology.createOURIForName(className));
  if (aClass == null) {
    System.err.println("Error class \"" + className + "\" does not exist!");
    return;
  }

  // find the text covered by the annotation
  String theMentionText = gate.Utils.stringFor(doc, mentionAnn);

  // when creating a URI from text that came from a document you must take care
  // to ensure that the name does not contain any characters that are illegal
  // in a URI. The following method does this nicely for English but you may
  // want to do your own normalization instead if you have non-English text.
  String mentionName = OUtils.toResourceName(theMentionText);

  // get the property used to store the mention text on mention instances
  DatatypeProperty prop = ontology.getDatatypeProperty(
      ontology.createOURIForName("mentionText"));

  OURI mentionURI = ontology.createOURIForName(mentionName);
  // if that mention instance does not already exist, add it
  if (!ontology.containsOInstance(mentionURI)) {
    OInstance inst = ontology.addOInstance(mentionURI, aClass);
    // add the actual mention text to the instance
    try {
      inst.addDatatypePropertyValue(prop,
          new Literal(theMentionText, OConstants.ENGLISH));
    } catch (InvalidValueException e) {
      throw new JapeException(e);
    }
  }
}
This will match each annotation of type Mention in the input and assign it to the label 'mention'. That label is then used in the right hand side to find the annotation that was matched by the pattern; the value for the class feature of the annotation is used to identify the ontological class name; and the annotation span is used to extract the text covered in the document. Once all these pieces of information are available, the addition to the ontology can be done. First the right class in the ontology is identified using the class name, and then a new instance for that class is created.
Beside JAPE, another tool that could play a part in this application is the Ontological Gazetteer (see Section 13.3), which can be useful in bootstrapping the IE application that finds entity mentions.

The solution presented here is purely pedagogical as it does not address many issues that would be encountered in a real life application solving the same problem. For instance, it is naive to assume that the name for the entity would be exactly the text found in the document. In many cases entities have several aliases; for example, the same person name can be written in a variety of forms depending on whether titles, first names, or initials are used. A process of name normalisation would probably need to be employed in order to make sure that the same entity, regardless of the textual form it is mentioned in, will always be linked to the same ontology instance.

For a detailed description of the GATE ontology API, please consult the JavaDoc documentation.
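As a rough illustration of the name normalisation step mentioned above, the following stand-alone sketch maps several surface forms of a person name to one key that could then be turned into an instance name. It is not part of GATE: the class MentionNormalizerSketch, the tiny title list and the underscore-joined key format are all assumptions of this sketch, and it deliberately ignores harder cases such as initials.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Minimal, hypothetical normaliser for person mentions (not GATE code). */
class MentionNormalizerSketch {
  // Deliberately small, illustrative list of titles to ignore.
  static final Set<String> TITLES =
      new HashSet<>(Arrays.asList("mr", "mrs", "ms", "dr", "prof", "sir"));

  /** Lower-cases, strips punctuation and titles, and joins the remaining tokens with '_'. */
  static String normalize(String mentionText) {
    StringBuilder key = new StringBuilder();
    for (String token : mentionText.trim().split("\\s+")) {
      // Keep only letters and digits, so "Dr." becomes "dr".
      String t = token.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT);
      if (t.isEmpty() || TITLES.contains(t)) {
        continue;
      }
      if (key.length() > 0) {
        key.append('_');
      }
      key.append(t);
    }
    return key.toString();
  }

  public static void main(String[] args) {
    System.out.println(normalize("Dr. John Smith"));
    System.out.println(normalize("john   SMITH"));
  }
}
```

Both calls in main produce the same key, so the two mentions would be linked to the same instance. A form such as "J. Smith" would still yield a different key, which is why a real application would need a richer alias-resolution strategy.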
14.13
This section describes the changes in the API and the implementation made in GATE Developer version 5.1. The most important change is that the implementation of the ontology API has been removed from the GATE core and is now being made available as plugins. Currently the plugin Ontology_OWLIM2 provides the implementation that was previously present in the GATE core, and the plugin Ontology provides a new and upgraded implementation that also implements some new features that were added to the API. The Ontology_OWLIM2 plugin is intended to provide maximum backwards compatibility but will not be developed further and will be phased out in the future, while the Ontology plugin provides the current, actively developed implementation.
Before any ontology-related function can be used in GATE, one of the ontology implementation plugins must be loaded.
ters and Java package as the language resource OWLIMOntologyLR in the backwards-compatibility plugin Ontology_OWLIM2. This allows existing pipelines and applications to be tested with the new implementation without the need to adapt the names of the language resource or parameters.

The implementation in the plugin Ontology makes various attempts to reduce the amount of memory needed to load an ontology. This will allow significantly larger ontologies to be loaded into GATE. This comes at the price of some methods needing more time than before, as the implementation no longer caches all ontology entities in GATE's memory.

The new implementation no longer provides access to any implementation detail; the method getSesameRepository will therefore throw an exception. The return type of this method in the old implementation has been changed to Object to remove the dependency on a Sesame class in the GATE API.

The default namespace URI is now set automatically from the ontology if possible, and the API allows getting and setting the ontology URI.

The ontology API now offers methods for getting an iterator when accessing some ontology resources, e.g. when getting all classes in the ontology. This helps to prevent the excessive use of memory when retrieving a large number of such resources from a large ontology.

Ontology objects no longer internally store copies of all ontology resources in hash maps. This means that re-fetching ontology resources will be a slower operation, and old methods that rely on this mechanism are either deprecated (getOResourcesByName, getOResourceByName) or do not work at all any more (getOResourceFromMap, addOResourceToMap, removeOResourceFromMap).
15.1
Language Identification
A common problem when handling multiple languages is determining the language of a document or section of a document. For example, patent documents often contain the abstract in more than one language. In such cases you may want to process only those sections written in English, or you may want to run different processing resources over the different sections depending on the language they are written in. Once documents or sections are annotated with their language, it is easy to apply different processing resources to the different sections using either a Conditional Corpus Pipeline or the Section-By-Section PR (Section 19.2.10). The problem is, of course, identifying the language.

The Language_Identification plugin contains a TextCat based PR for performing language identification. The choice of languages used for categorization is specified through a configuration file, the URL of which is the PR's only initialization parameter. The PR has the following runtime parameters.

annotationType: If this is supplied, the PR classifies the text underlying each annotation of the specified type and stores the result as a feature on that annotation. If this is left blank (null or empty), the PR classifies the text of each document and stores the result as a document feature.

Unlike most other PRs (which produce annotations), this one adds either document features or annotation features. (To classify both whole documents and spans within them, use two instances of this PR.) Note that classification accuracy is better over long spans of text (paragraphs rather than sentences, for example).
Note that an alternative language identification PR is available in the LingPipe plugin, which is documented in Section 21.23.5.
annotationType: If this is supplied, the PR uses only the text underlying each annotation of the specified type to build the language fingerprint. If this is left blank (null or empty), the PR will instead use the whole of each document to create the fingerprint.
15.2
French Plugin
The French plugin contains two applications for NE recognition: one which includes the TreeTagger for POS tagging in French (french+tagger.gapp), and one which does not (french.gapp). Simply load the application required from the plugins/Lang_French directory. You do not need to load the plugin itself from the GATE Developer's Plugin Management Console. Note that the TreeTagger must first be installed and set up correctly (see Section 21.3 for details). Check that the runtime parameters are set correctly for your TreeTagger in your application. The applications both contain resources for tokenisation, sentence splitting, gazetteer lookup, NE recognition (via JAPE grammars) and orthographic coreference. Note that they are not intended to produce high quality results; they are simply a starting point for a developer working on French. Some sample texts are contained in the plugins/Lang_French/data directory.
15.3
German Plugin
The German plugin contains two applications for NE recognition: one which includes the TreeTagger for POS tagging in German (german+tagger.gapp), and one which does not (german.gapp). Simply load the application required from the plugins/Lang_German/resources directory. You do not need to load the plugin itself from the GATE Developer's Plugin Management Console. Note that the TreeTagger must first be installed and set up correctly (see Section 21.3 for details). Check that the runtime parameters are set correctly for your TreeTagger in your application. The applications both contain resources for tokenisation, sentence splitting, gazetteer lookup, compound analysis, NE recognition (via JAPE grammars) and orthographic coreference. Some sample texts are contained in the plugins/Lang_German/data directory. We are grateful to Fabio Ciravegna and the Dot.KOM project for use of some of the components for the German plugin.
15.4
Romanian Plugin
The Romanian plugin contains an application for Romanian NE recognition (romanian.gapp). Simply load the application from the plugins/Lang_Romanian/resources directory. You do not need to load the plugin itself from the GATE Developer's Plugin Management Console. The application contains resources for tokenisation, gazetteer lookup, NE recognition (via JAPE grammars) and orthographic coreference. Some sample texts are contained in the plugins/romanian/corpus directory.
15.5
Arabic Plugin
The Arabic plugin contains a simple application for Arabic NE recognition (arabic.gapp). Simply load the application from the plugins/Lang_Arabic/resources directory. You do not need to load the plugin itself from the GATE Developer's Plugin Management Console. The application contains resources for tokenisation, gazetteer lookup, NE recognition (via JAPE grammars) and orthographic coreference. Note that there are two types of gazetteer used in this application: one which was derived automatically from training data (Arabic inferred gazetteer), and one which was created manually. Note that there are some other applications included which perform quite specific tasks (but can generally be ignored). For example, arabic-for-bbn.gapp and arabic-for-muse.gapp make use of a very specific set of training data and convert the result to a special format. There is also an application to collect new gazetteer lists from training data (arabic_lists_collector.gapp). For details of the gazetteer list collector please see Section 13.7.
15.6
Chinese Plugin
The Chinese plugin contains two components: a simple application for Chinese NE recognition (chinese.gapp) and a component called Chinese Segmenter. In order to use the former, simply load the application from the plugins/Lang_Chinese/resources directory. You do not need to load the plugin itself from the GATE Developer's Plugin Management Console. The application contains resources for tokenisation, gazetteer lookup, NE recognition (via JAPE grammars) and orthographic coreference. The application makes use of some gazetteer lists (and a grammar to process them) derived automatically from training data, as well as regular hand-crafted gazetteer lists. There are also applications (lists_collector.gapp, adj_collector.gapp and nounperson_collector.gapp) to create such lists, and various other applications to perform special tasks such as coreference evaluation (coreference_eval.gapp) and converting the output to a different format (ace-to-muse.gapp).
The following PAUM models are distributed with the plugin and are available as compressed zip files under the plugins/Lang_Chinese/resources/models directory. Please unzip them before use. In detail, those models were learned using the PAUM learning algorithm from the corpora provided by the Sighan-05 bakeoff task.
- the PAUM model learned from the PKU training data, using the PAUM learning algorithm and the UTF-8 encoding, is available as model-paum-pku-utf8.zip.
- the PAUM model learned from the PKU training data, using the PAUM learning algorithm and the GB2312 encoding, is available as model-paum-pku-gb.zip.
- the PAUM model learned from the AS training data, using the PAUM learning algorithm and the UTF-8 encoding, is available as model-as-utf8.zip.
- the PAUM model learned from the AS training data, using the PAUM learning algorithm and the BIG5 encoding, is available as model-as-big5.zip.

As you can see, those models were learned using different training data and different Chinese text encodings of the same training data. The PKU training data are news articles published in mainland China and use simplified Chinese, while the AS training data are news articles published in Taiwan and use traditional Chinese. If your text is in simplified Chinese, you can use the models trained on the PKU data. If your text is in traditional Chinese, you need to use the models trained on the AS data. If your data are in GB2312 encoding or any compatible encoding, you need to use the model trained on the corpus in GB2312 encoding.
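The selection rule described above (script decides between the PKU and AS models, encoding decides between the variants) can be written down as a small lookup. This is only a sketch: the class and method names are hypothetical, the file names are the ones listed above, and "compatible" encodings other than the three named ones are not handled.

```java
import java.util.Locale;

/** Hypothetical helper that picks a segmenter model file name (not part of GATE). */
class SegmenterModelChoiceSketch {
  /**
   * @param traditionalChinese true for traditional Chinese text (AS models),
   *                           false for simplified Chinese text (PKU models)
   * @param encoding           "UTF-8", "GB2312" or "BIG5" (case-insensitive)
   */
  static String modelFor(boolean traditionalChinese, String encoding) {
    // Normalise spellings such as "utf-8" and "UTF8" to the same token.
    String enc = encoding.trim().toUpperCase(Locale.ROOT).replace("-", "");
    if (enc.equals("UTF8")) {
      return traditionalChinese ? "model-as-utf8.zip" : "model-paum-pku-utf8.zip";
    }
    if (enc.equals("GB2312") && !traditionalChinese) {
      return "model-paum-pku-gb.zip";
    }
    if (enc.equals("BIG5") && traditionalChinese) {
      return "model-as-big5.zip";
    }
    throw new IllegalArgumentException(
        "No listed model for encoding " + encoding
            + (traditionalChinese ? " (traditional)" : " (simplified)"));
  }

  public static void main(String[] args) {
    System.out.println(modelFor(false, "GB2312"));
    System.out.println(modelFor(true, "utf-8"));
  }
}
```

Throwing on unlisted combinations, rather than silently falling back to a UTF-8 model, is a design choice of this sketch: a model applied to text in the wrong encoding would produce poor segmentation without any visible error.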
Note that the segmented Chinese text (either used as training data or produced by this plugin) uses the blank space to separate a word from its surrounding words. Hence, if your data are in Unicode such as UTF-8, you can use the GATE Unicode Tokeniser to process the segmented text to add Token annotations to your text to represent the Chinese words. Once you have the annotations for all the Chinese words, you can perform further processing such as POS tagging and named entity recognition.
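To make the blank-space convention concrete, the sketch below splits a segmented string into words and records character offsets for each one, which is the kind of information a Token annotation carries. It is a stand-alone illustration and not the GATE Unicode Tokeniser: the class SegmentedTextSketch and its Span type are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits blank-separated segmented text into words with offsets (not GATE code). */
class SegmentedTextSketch {
  /** One word with its start offset (inclusive) and end offset (exclusive). */
  static final class Span {
    final int start;
    final int end;
    final String text;

    Span(int start, int end, String text) {
      this.start = start;
      this.end = end;
      this.text = text;
    }
  }

  static List<Span> words(String segmented) {
    List<Span> spans = new ArrayList<>();
    int i = 0;
    int n = segmented.length();
    while (i < n) {
      // Skip the blank space(s) separating words.
      while (i < n && Character.isWhitespace(segmented.charAt(i))) {
        i++;
      }
      int start = i;
      // Consume one word, i.e. everything up to the next blank space.
      while (i < n && !Character.isWhitespace(segmented.charAt(i))) {
        i++;
      }
      if (i > start) {
        spans.add(new Span(start, i, segmented.substring(start, i)));
      }
    }
    return spans;
  }

  public static void main(String[] args) {
    // Two Chinese words followed by "GATE", written with Unicode escapes.
    for (Span s : words("\u6211\u4eec \u559c\u6b22 GATE")) {
      System.out.println(s.start + "-" + s.end + " " + s.text);
    }
  }
}
```

The offsets are counted in Java chars over the segmented string, so they refer to the text that still contains the separating blanks.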
15.7
Hindi Plugin
The Hindi plugin ('Lang_Hindi') contains a set of resources for basic Hindi NE recognition which mirror the ANNIE resources but are customised to the Hindi language. You need to have the ANNIE plugin loaded first in order to load any of these PRs. With the Hindi plugin, you can create an application similar to ANNIE but replacing the ANNIE PRs with the default PRs from the plugin.
resources are often required in order to extract useful or interesting information. This chapter documents GATE resources that have been developed for specific domains.
16.1
Biomedical Support
Documents from the biomedical domain offer a number of challenges, including a highly specialised vocabulary, words that include mixed case and numbers requiring unusual tokenization, as well as common English words used with a domain-specific sense. Many of these problems can only be solved through the use of domain-specific resources.

Some of the processing resources documented elsewhere in this user guide can be adapted with little or no effort to help with processing biomedical documents. The Large Knowledge Base Gazetteer (Section 13.9) can be initialized against a biomedical ontology such as Linked Life Data in order to annotate many different domain-specific concepts. The Language Identification PR (Section 15.1) can also be trained to differentiate between document domains instead of languages, which could help target specific resources to specific documents using a conditional corpus pipeline. Also many plugins can be used as is to extract information from biomedical documents. For example, the Measurements Tagger (Section 21.8) can be used to extract information about the dose of a medication, or the weight of patients participating in a study.

The rest of this section, however, documents the resources included with or available to GATE which are focused purely on processing biomedical documents.
16.1.1 ABNER
ABNER is A Biomedical Named Entity Recogniser [Settles 05]. It uses machine learning (linear-chain conditional random fields, CRFs) to find entities such as genes, cell types, and DNA in text. Full details of ABNER can be found at http://pages.cs.wisc.edu/~bsettles/abner/

To use ABNER within GATE, first load the Tagger_Abner plugin through the plugins console, and then create a new ABNER Tagger PR in the usual way. The ABNER Tagger PR has no initialization parameters and it does not require any other PRs to be run prior to execution. Configuration of the tagger is performed using the following runtime parameters:

abnerMode: The ABNER model that will be used for tagging. The plugin can use one of two previously trained machine learning models for tagging text, as provided by ABNER:
16.1.2 MetaMap
MetaMap, from the National Library of Medicine (NLM), maps biomedical text to the UMLS Metathesaurus and allows Metathesaurus concepts to be discovered in a text corpus [Aronson & Lang 10].
The Tagger_MetaMap plugin for GATE wraps the MetaMap Java API client to allow GATE to communicate with a remote (or local) MetaMap PrologBeans mmserver and MetaMap distribution. This allows the content of specified annotations (or the entire document content) to be processed by MetaMap and the results converted to GATE annotations and features.
To use this plugin, you will need access to a remote MetaMap server, or install one locally by downloading and installing the complete distribution:
http://metamap.nlm.nih.gov/
and Java PrologBeans mmserver
http://metamap.nlm.nih.gov/README_javaapi.html
The default mmserver location and port are localhost and 8066. To use a different server location and/or port, see the above API documentation and specify the --metamap_server_host and --metamap_server_port options within the metaMapOptions run-time parameter.
Run-time parameters
1. annotateNegEx: set this to true to add NegEx features to annotations (NegExType and NegExTrigger). See http://code.google.com/p/negex/ for more information on NegEx.
2. annotatePhrases: set to true to output MetaMap phrase-level annotations (generally noun-phrase chunks). Only phrases containing a MetaMap mapping will be annotated. Can be useful for post-coordination of phrase-level terms that do not exist in a pre-coordinated form in UMLS.
3. inputASName: input Annotation Set name. Use in conjunction with inputASTypes (see below). Unless specified, the entire document content will be sent to MetaMap.
4. inputASTypes: only send the content of these annotations within inputASName to MetaMap and add new MetaMap annotations inside each. Unless specified, the entire document content will be sent to MetaMap.
5. inputASTypeFeature: send the content of this feature within inputASTypes to MetaMap and wrap a new MetaMap annotation around each annotation in inputASTypes. If the feature is empty or does not exist, then the annotation content is sent instead.
6. metaMapOptions: set parameter-less MetaMap options here. Default is -Xdt (truncate Candidates mappings, disallow derivational variants and do not use full text parsing). See http://metamap.nlm.nih.gov/README_javaapi.html for more details. NB: only set the -y parameter (word-sense disambiguation) if wsdserverctl is running.
7. outputASName: output Annotation Set name.
8. outputASType: output annotation name to be used for all MetaMap annotations.
9. outputMode: determines which mappings are output as annotations in the GATE document, for each phrase:
   AllCandidatesAndMappings: annotate both Candidate and final mappings. This will usually result in multiple, overlapping annotations for each term/phrase.
   AllMappings: annotate all the final MetaMap Mappings for each phrase. This will result in fewer annotations with higher precision (e.g. for 'lung cancer' only the complete phrase will be annotated as Neoplastic Process [neop]).
   HighestMappingOnly: annotate only the highest scoring MetaMap Mapping for each phrase. If two Mappings have the same score, the first returned by MetaMap is output.
   HighestMappingLowestCUI: where there is more than one highest-scoring mapping, return the mapping where the head word/phrase map event has the lowest CUI.
   HighestMappingMostSources: where there is more than one highest-scoring mapping, return the mapping where the head word/phrase map event has the highest number of source vocabulary occurrences.
   AllCandidates: annotate all Candidate mappings and not the final Mappings. This will result in more annotations with less precision (e.g. for 'lung cancer' both 'lung' (bpoc) and 'lung cancer' (neop) will be annotated).
10. taggerMode: determines whether all term instances are processed by MetaMap, the first instance only, or the first instance with coreference annotations added. Only used if the inputASTypes parameter has been set.
   FirstOccurrenceOnly: only process and annotate the first instance of each term in the document.
   CoReference: process and annotate the first instance and coreference following instances.
   AllOccurrences: process and annotate all term instances independently.
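The tie-breaking behaviour of the outputMode values above can be illustrated with a small sketch. This is not the plugin's code: the Mapping class, its field names and the select_mappings function are hypothetical, scores are assumed to be plain numbers where larger is better (the real MetaMap client may use a different scale), and CUIs are assumed to compare as strings.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Hypothetical stand-in for one MetaMap mapping of a phrase."""
    cui: str        # concept identifier of the head map event
    score: float    # assumption: larger means better
    n_sources: int  # number of source vocabulary occurrences


def select_mappings(mappings, output_mode):
    """Return the mappings that would be annotated for one phrase."""
    if not mappings:
        return []
    if output_mode == "AllMappings":
        return list(mappings)
    best = max(m.score for m in mappings)
    top = [m for m in mappings if m.score == best]
    if output_mode == "HighestMappingOnly":
        return [top[0]]  # on a tie, the first one returned wins
    if output_mode == "HighestMappingLowestCUI":
        return [min(top, key=lambda m: m.cui)]
    if output_mode == "HighestMappingMostSources":
        return [max(top, key=lambda m: m.n_sources)]
    raise ValueError(f"unsupported output mode: {output_mode}")
```

The candidate-based modes (AllCandidates, AllCandidatesAndMappings) are left out of the sketch because they depend on a second list of candidates that is not modelled here.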
16.1.4 BADREX
BADREX (identifying Biomedical Abbreviations using Dynamic Regular Expressions) [Gooch 12] is a GATE plugin that annotates, expands and coreferences term-abbreviation pairs using parameterisable regular expressions that generalise and extend the Schwartz-Hearst algorithm [Schwartz & Hearst 03]. In addition it uses a subset of the inner-outer selection rules described in the [Ao & Takagi 05] ALICE algorithm. Rather than simply extracting terms and their abbreviations, it annotates them in situ and adds the corresponding long-form and short-form text as features on each. In coreference mode BADREX expands all abbreviations in the text that match the short form of the most recently matched long-form/short-form pair. In addition, there is the option of annotating and classifying common medical abbreviations extracted from Wikipedia. BADREX can be downloaded from GitHub.
16.1.6 AbGene
Support for using AbGene [Tanabe & Wilbur 02] (a modified version of the Brill tagger), to annotate gene names, within GATE is provided by the Tagger Framework plugin (Section 21.3). AbGene needs to be downloaded (see footnote 1) and installed externally to GATE and then the example AbGene GATE application, provided in the resources directory of the Tagger Framework plugin, needs to be modified accordingly.
16.1.7 GENIA
1 ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe/AbGene/
A number of different biomedical language processing tools have been developed under the auspices of the GENIA Project. Support is provided within GATE for using both the GENIA sentence splitter and the tagger, which provides tokenization, part-of-speech tagging, shallow parsing and named entity recognition.
To use either the GENIA sentence splitter (footnote 2) or tagger (footnote 3) within GATE you need to have downloaded and compiled the appropriate programs which can then be called by the GATE PRs.
The GATE GENIA plugin provides the sentence splitter PR. The PR is configured through the following runtime parameters:
annotationSetName: the name of the annotation set in which the Sentence annotations should be created
debug: if true then details of calling the external process will be reported within the message pane
splitterBinary: the location of the GENIA sentence splitter binary
Support for the GENIA tagger within GATE is handled by the Tagger Framework which is documented in Section 21.3.
Together these two components in a GATE pipeline provide a biomedical equivalent of ANNIE (minus the orthographic coreference component). Such a pipeline is provided as an example within the GENIA plugin (footnote 4).
For more details on the GENIA tagger and its performance over biomedical text see [Tsuruoka et al. 05].
5 http://www.seas.upenn.edu/~strctlrn/BioTagger/BioTagger.html
All three taggers are configured in the same way, via one init parameter and two runtime parameters, as follows:
modelURL: the location of the model used by the tagger
inputASName: the annotation set to use as input to the tagger (must contain Token annotations)
outputASName: the annotation set in which new annotations are created via the tagger
16.1.9 MutationFinder
MutationFinder is a high-performance IE tool designed to extract mentions of point mutations from free text [Caporaso et al. 07].
The MutationFinder PR is configured via a single init parameter:
regexURL: this init parameter specifies the location of the regular expression file used by MutationFinder. Note that the default value points to the file supplied with MutationFinder.
Once created, the runtime behaviour of the PR can be controlled via the following runtime parameter:
annotationSetName: the name of the annotation set in which the Mutation annotations should be created
16.1.10 NormaGene
NormaGene is a web service, provided by the BiTeM group in Geneva. The service provides tools for both gene tagging and normalization, although currently only tagging is supported by this GATE wrapper.
The NormaGene Tagger PR is configured via two runtime parameters as follows:
annotationSetName: the name of the annotation set in which the Gene annotations should be created.
threshold: the threshold at which an entity will be considered a gene (defaults to 0.6). Minimize the threshold parameter with short text input to receive better results. Tuning the threshold down helps to find more complex gene names in the text but it also increases the time taken to process the text.
Chapter 17 Parsers
17.1 MiniPar Parser
MiniPar is a shallow parser. In its shipped version, it takes one sentence as an input and determines the dependency relationships between the words of a sentence. It parses the sentence and brings out information such as:
the lemma of the word;
the part of speech of the word;
the head modified by this word;
the name of the dependency relationship between this word and the head;
the lemma of the head.
In the version of MiniPar integrated in GATE (the 'Parser_Minipar' plugin), it generates annotations of type 'DepTreeNode' and annotations of type 'relation' that exist between the head and the child node. The document is required to have annotations of type 'Sentence', where each annotation consists of the string of the sentence.
MiniPar takes one sentence at a time as an input and generates the tokens of type 'DepTreeNode'. Later it assigns relations between these tokens. Each DepTreeNode has a feature called 'word': this is the actual text of the word.
An annotation of type 'Rel' is created for each relation, where 'Rel' is obj, pred etc. This is the name of the dependency relationship between the child word and the head word (see Section 17.1.5). Every 'Rel' annotation is assigned four features:
child_word: this is the text of the child annotation;
child_id: IDs of the annotations which modify the current word (if any);
head_word: this is the text of the head annotation;
head_id: ID of the annotation modified by the child word (if any).
17.1.2 Resources
MiniPar in GATE is shipped with four basic resources:
MiniparWrapper.jar: this is a JAVA Wrapper for MiniPar;
creole.xml: this defines the required parameters for MiniPar Wrapper;
minipar.linux: this is a modified version of pdemo.cpp;
minipar-windows.exe: this is a modified version of pdemo.cpp compiled to work on Windows.
17.1.3 Parameters
The MiniPar wrapper takes six parameters:
annotationTypeName: new annotations are created with this type, default is "DepTreeNode";
annotationInputSetName: annotations of Sentence type are provided as an input to MiniPar and are taken from the given annotationSet;
annotationOutputSetName: all annotations created by Minipar Wrapper are stored under the given annotationOutputSet;
document: the GATE document to process;
miniparBinary: location of the MiniPar binary file (i.e. either minipar.linux or minipar-windows.exe; these files are available under the gate/plugins/minipar/ directory);
miniparDataDir: location of the 'data' directory under the installation directory of MINIPAR. Default is "%MINIPAR_HOME%/data".
17.1.4 Prerequisites
The MiniPar wrapper requires the MiniPar library to be available on the underlying Linux/Windows machine. It can be downloaded from the MiniPar homepage.
17.2 RASP Parser
RASP (Robust Accurate Statistical Parsing) is a robust parsing system for English, developed by the Natural Language and Computational Linguistics group at the University of Sussex.
This plugin, 'Parser_RASP', developed by DigitalPebble, provides four wrapper PRs that call the RASP modules as external programs, as well as a JAPE component that translates the output of the ANNIE POS Tagger (Section 6.6).
Sentence Splitter (Section 6.4); their output is compatible with the other PRs in this plugin).
produced by the ANNIE POS Tagger (see Section 6.6) and creates WordForm annotations in the RASP Format. The ANNIE POS Tagger and this Converter can together be used as a substitute for the RASP2 POS Tagger.
Here are some examples of corpus pipelines that can be correctly constructed with these PRs.
Pipeline A:
1. RegEx Sentence Splitter
2. RASP2 Tokenizer
3. RASP2 POS Tagger
4. RASP2 Morphological Analyser
5. RASP2 Parser
Pipeline B:
1. RegEx Sentence Splitter
2. RASP2 Tokenizer
3. ANNIE POS Tagger
4. RASP POS Converter
5. RASP2 Morphological Analyser
6. RASP2 Parser
Pipeline C:
1. ANNIE Tokenizer
2. ANNIE Sentence Splitter
3. RASP2 POS Tagger
4. RASP2 Morphological Analyser
5. RASP2 Parser
Pipeline D:
1. ANNIE Tokenizer
2. ANNIE Sentence Splitter
3. ANNIE POS Tagger
4. RASP POS Converter
5. RASP2 Morphological Analyser
6. RASP2 Parser
Further documentation is included in the directory gate/plugins/Parser_RASP/doc/.
The RASP package, which provides the external programs, is available from the RASP web page.
RASP is only supported for Linux operating systems. Trying to run it on any other operating system will generate an exception with the message: 'The RASP cannot be run on any other operating systems except Linux.'
It must be correctly installed on the same machine as GATE, and must be installed in a directory whose path does not contain any spaces (this is a requirement of the RASP scripts as well as the wrapper). Before trying to run scripts for the first time, edit rasp.sh and rasp_parse.sh to set the correct value for the shell variable RASP, which should be the file system pathname where you have installed the RASP tools (for example, RASP=/opt/RASP or RASP=/usr/local/RASP). You will need to enter the same path for the initialization parameter raspHome for the POS Tagger, Morphological Analyser, and Parser PRs.
(On some systems the arch command used in the scripts is not available; a work-around is to comment that line out and add arch='ix86_linux', for example.)
(The previous version of the RASP plugin can now be found in plugins/Obsolete/rasp.)
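The two installation constraints above (Linux only, no spaces in the installation path) can be checked before a pipeline is built. The following is only a sketch: the function name rasp_install_problems is hypothetical and is not part of the plugin, and the check does not verify that the RASP scripts themselves are present.

```python
import platform


def rasp_install_problems(rasp_home, system=None):
    """Return a list of problems with a candidate RASP installation path.

    `system` defaults to the current platform name; it is a parameter only
    so that the check can be exercised on any machine.
    """
    system = platform.system() if system is None else system
    problems = []
    if system != "Linux":
        problems.append("RASP is only supported on Linux")
    if any(ch.isspace() for ch in rasp_home):
        problems.append("installation path must not contain spaces")
    return problems
```

An empty list means that neither of the two documented constraints is violated for the path that would be given as raspHome.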
17.3 SUPPLE Parser
SUPPLE is a bottom-up parser that constructs syntax trees and logical forms for English sentences. The parser is complete in the sense that every analysis licensed by the grammar is produced. In the current version only the 'best' parse is selected at the end of the parsing process. The English grammar is implemented as an attribute-value context free grammar which consists of subgrammars for noun phrases (NP), verb phrases (VP), prepositional phrases (PP), relative phrases (R) and sentences (S). The semantics associated with each grammar rule allow the parser to produce logical forms composed of unary predicates to denote entities and events (e.g., chase(e1), run(e2)) and binary predicates for properties (e.g. lsubj(e1,e2)). Constants (e.g., e1, e2) are used to represent entity and event identifiers. The GATE SUPPLE Wrapper stores syntactic information produced by the parser in the GATE document in the form of parse annotations containing a bracketed representation of the parse, and semantics annotations that contain the logical forms produced by the parser. It also produces SyntaxTreeNode annotations that allow viewing of the parse tree for a sentence (see Section 17.3.4).
17.3.1 Requirements
The SUPPLE parser is written in Prolog, so you will need a Prolog interpreter to run the parser. A copy of PrologCafe (http://kaminari.scitec.kobe-u.ac.jp/PrologCafe/), a pure Java Prolog implementation, is provided in the distribution. This should work on any platform but it is not particularly fast. SUPPLE also supports the open-source SWI Prolog (http://www.swi-prolog.org) and the commercially licenced SICStus prolog (http://www.sics.se/sicstus, SUPPLE supports versions 3 and 4), which are available for Windows, Mac OS X, Linux and other Unix variants. For anything more than the simplest cases we recommend installing one of these instead of using PrologCafe.
splitter
POS-tagger
Morphology
SUPPLE Parser with parameters:
   mapping file (config/mapping.config)
   feature table file (config/feature_table.config)
   parser file (supple.plcafe or supple.sicstus or supple.swi)
   prolog implementation (shef.nlp.supple.prolog.PrologCafe, shef.nlp.supple.prolog.SICStusProlog3, shef.nlp.supple.prolog.SICStusProlog4, shef.nlp.supple.prolog.SWIProlog or shef.nlp.supple.prolog.SWIJavaProlog (footnote 1)).
You can take a look at build.xml to see examples of invocation for the different implementations.
Note that prior to GATE 3.1, the parser file parameter was of type java.io.File. From 3.1 it is of type java.net.URL. If you have a saved application (.gapp file) from before GATE 3.1 which includes SUPPLE it will need to be updated to work with the new version. Instructions on how to do this can be found in the README file in the SUPPLE plugin directory.
for SICStus: supple.sicstus.executable - default is to look for sicstus.exe (Windows) or sicstus (other platforms) on the PATH.
for SWI: supple.swi.executable - default is to look for plcon.exe (Windows) or swipl (other platforms) on the PATH.
If your prolog is installed under a different name, you should specify the correct name in the relevant system property. For example, when installed from the source distribution, the Unix version of SWI prolog is typically installed as pl; most binary packages install it as swipl, though some use the name swi-prolog. You can also use the properties to specify the full path to prolog (e.g. /opt/swi-prolog/bin/pl) if it is not on your default PATH.
For details of how to pass system properties to GATE, see the end of Section 2.3.
Mapping File
The mapping file specifies how annotations produced using GATE are to be passed to the parser. The file is composed of a number of pairs of lines. The first line in a pair specifies a GATE annotation we want to pass to the parser. It includes the AnnotationSet (or default), the AnnotationType, and a number of features and values that depend on the AnnotationType. The second line of the pair specifies how to encode the GATE annotation in a SUPPLE syntactic category; this line also includes a number of features and values. As an example consider the mapping:
Gate;AnnotationType=Token;category=DT;string=&S
SUPPLE;category=dt;m_root=&S;s_form=&S
It specifies how a determinant ('DT') will be translated into a category 'dt' for the parser. The construct '&S' is used to represent a variable that will be instantiated to the appropriate value during the mapping process. More specifically a token like 'The' recognised as a DT by the POS-tagging will be mapped into the following category:
dt(s_form:'The',m_root:'The',m_affix:'_',text:'_').
Gate;AnnotationType=Lookup;majorType=person_first;minorType=female;string=&S
SUPPLE;category=list_np;s_form=&S;ne_tag=person;ne_type=person_first;gender=female
It specifies that an annotation of type 'Lookup' in GATE is mapped into a category 'list_np' with specific features and values. More specifically a token like 'Mary' identified in GATE as a Lookup will be mapped into the following SUPPLE category:
list_np(s_form:'Mary',m_root:'_',m_affix:'_', text:'_',ne_tag:'person',ne_type:'person_first',gender:'female').
Feature Table
The feature table file specifies SUPPLE 'lexical' categories and their features. As an example an entry in this file is:
n;s_form;m_root;m_affix;text;person;number
which specifies which features and in which order a noun category should be written. In this case:
n(s_form:...,m_root:...,m_affix:...,text:...,person:...,number:....).
where the number and type of features is dependent on the category type (see Section 5.1). All categories will have the features s_form (surface form) and m_root (morphological root); nominal and verbal categories will also have person and number features; verbal categories will also have tense and vform features; and adjectival categories will have a degree feature. The list_np category has the same features as other nominal categories plus ne_tag and ne_type.
Syntactic rules are specified in Prolog with the predicate rule(LHS, RHS) where LHS is a syntactic category and RHS is a list of syntactic categories. A rule such as BNP_HEAD -> N ('a basic noun phrase head is composed of a noun') is written as follows:
rule(bnp_head(sem:E^[[R,E],[number,E,N]],number:N), [n(m_root:R,number:N)]).
where the feature 'sem' is used to construct the semantics while the parser processes input, and E, R, and N are variables to be instantiated during parsing.
The full grammar of this distribution can be found in the prolog/grammar directory; the file load.pl specifies which grammars are used by the parser. The grammars are compiled when the system is built and the compiled version is used for parsing.
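How a mapping pair and a feature table entry combine to produce a SUPPLE category can be sketched as follows. This is an illustration and not the wrapper's implementation: the function names, the dictionary shape used for an annotation (with the annotation type stored under a hypothetical "type" key) and the feature order for 'dt' are assumptions, the last one inferred from the example output shown in the Mapping File section.

```python
def parse_fields(line):
    """Split 'Head;k1=v1;k2=v2' into (head, {k1: v1, k2: v2})."""
    head, *pairs = line.strip().split(";")
    return head, dict(p.split("=", 1) for p in pairs)


def parse_feature_table_entry(line):
    """Split 'n;s_form;m_root' into (category, ordered feature names)."""
    category, *features = line.strip().split(";")
    return category, features


def map_annotation(gate_line, supple_line, feature_order, annotation):
    """Render one annotation as a SUPPLE category, or None if it does not match.

    `annotation` is a plain dict of feature name -> value (assumed shape).
    """
    _, conditions = parse_fields(gate_line)
    _, outputs = parse_fields(supple_line)
    category = outputs.pop("category")
    bindings = {}
    for key, expected in conditions.items():
        actual = annotation.get("type" if key == "AnnotationType" else key)
        if actual is None:
            return None
        if expected.startswith("&"):
            bindings[expected] = actual  # variable: bind to the actual value
        elif actual != expected:
            return None                  # literal: must match exactly
    values = {k: bindings.get(v, v) for k, v in outputs.items()}
    # Features without a value are written as the anonymous '_'.
    args = ",".join(f"{name}:'{values.get(name, '_')}'" for name in feature_order)
    return f"{category}({args})."
```

With the Token mapping pair from the Mapping File section, the feature order s_form, m_root, m_affix, text and an annotation for the string 'The' tagged DT, the sketch produces the same dt(...) term as the one shown in that section.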
which maps a named entity 'Date' into a syntactic category 'sem_cat'. A grammar file called semantic_rules.pl is provided to map sem_cat into the appropriate syntactic category expected by the phrasal rules. The following rule for example:
rule(ne_np(s_form:F,sem:X^[[name,X,NAME],[KIND,X]]),[ sem_cat(s_form:F,text:TEXT,type:'Date',kind:KIND,name:NAME)]).
is used to parse a 'Date' into a named entity in SUPPLE which in turn will be parsed into a noun phrase.
The GATE wrapper parameter buchartFile is now SUPPLEFile, and it is now of type java.net.URL rather than java.io.File. Details of how to compensate for this in existing saved applications are given in the SUPPLE README file.
The Prolog wrappers now start shef.nlp.supple.prolog instead of shef.nlp.buchart.prolog
The mapping.conf file now has lines starting SUPPLE; instead of Buchart;
Most importantly the main wrapper class is now called nlp.shef.supple.SUPPLE
Making these changes to existing code should be trivial and allow applications to benefit from future improvements to SUPPLE.
17.4 Stanford Parser
The Stanford Parser is a probabilistic parsing system implemented in Java by Stanford University's Natural Language Processing Group. Data files are available from Stanford for parsing Arabic, Chinese, English, and German.
This plugin, 'Parser_Stanford', developed by the GATE team, provides a PR (gate.stanford.Parser) that acts as a wrapper around the Stanford Parser (version 2.0.4) and translates GATE annotations to and from the data structures of the parser itself. The plugin is supplied with the unmodified jar file and one English data file obtained from Stanford. Stanford's software itself is subject to the full GPL.
The parser itself can be trained on other corpora and languages, as documented on the website, but this plugin does not provide a means of doing so. Trained data files are not necessarily compatible between different versions of the parser; in particular files from versions before 2.0 are probably incompatible with the current software. (GATE switched from 1.6 to 1.6.1 at build 3120 in January 2009, to 1.6.5 in December 2010, to 1.6.8 in August 2011, and to 2.0.1 in March 2012.)
The current versions of the Stanford parser and this PR are threadsafe. Multiple instances of the PR with the same or different model files can be used simultaneously.
compatible with Stanford's parser data files for English (which also use the Penn treebank tagset).
mappingFile: the optional path to a mapping file: a flat, two-column file which the wrapper can use to 'translate' tags. A sample file is included (footnote 3). By default this value is null and mapping is ignored.
extract the dependency relations from the constituency structures. The default value is compatible with the English data file supplied. Please refer to the Stanford NLP Group's documentation and the parser's javadoc for further explanation.
debug: a boolean value which controls the verbosity of the wrapper's output.
reusePosTags: if true, the wrapper will read category features (produced by an earlier POS-tagging PR) from the Token annotations and force the parser to use them.
useMapping: if this is true and a mapping file was loaded when the PR was initialized, the POS and syntactic tags produced by the parser will be translated using that file. If no mapping file was loaded, this parameter is ignored.
The following boolean parameters switch on and off the various types of output that the parser can produce. Any or all of them can be true, but if all are false the PR will simply print a warning to save time (instead of running the parser).
addPosTags: if this is true, the wrapper will add category features to the Token annotations.
2 resources/englishPCFG.ser.gz
3 resources/english-tag-map.txt
with SyntaxTreeNode annotations that are compatible with the Syntax Tree Viewer (see Section 17.3.4).
Typed
includeExtraDependencies: this has no effect with the AllTyped mode; for the others, it determines whether to include extras such as control dependencies; if they are included, the complete set of dependencies may not follow a tree structure.
Two sample GATE applications for English are included in the plugins/Parser_Stanford directory: sample_parser_en.gapp runs the Regex Sentence Splitter and ANNIE Tokenizer and then this PR to annotate POS tags and constituency and dependency structures, whereas sample_pos+parser_en.gapp also runs the ANNIE POS Tagger and makes the parser re-use its POS tags.
4 http://nlp.stanford.edu/software/parser-faq.shtml
Machine Learning
outputs of the Batch Learning PR for the four usage modes; namely training, application, evaluation and producing feature files only, and in particular, the format of the feature files and label list file produced by the Batch Learning PR. Section 18.3 outlines the original Machine Learning PR in GATE.
18.1 ML Generalities
There are two main types of ML: supervised learning and unsupervised learning. Supervised learning is more effective and much more widely used in NLP. Classification is a particular example of supervised learning, in which the set of training examples is split into multiple subsets (classes) and the algorithm attempts to distribute new examples into the existing classes. This is the type of ML that is used in GATE, and all further references to ML actually refer to classification.
An ML algorithm 'learns' about a phenomenon by looking at a set of occurrences of that phenomenon that are used as examples. Based on these, a model is built that can be used to predict characteristics of future (unseen) examples of the phenomenon.
An ML implementation has two modes of functioning: training and application. The training phase consists of building a model (e.g. a statistical model, a decision tree, a rule set, etc.) from a dataset of already classified instances. During application, the model built during training is used to classify new instances.
Machine Learning in NLP falls broadly into three categories of task type: text classification, chunk recognition, and relation extraction.
Text classification classifies text into pre-defined categories. The process can be equally well applied at the document, sentence or token level. Typical examples of text classification might be document classification, opinionated sentence recognition, POS tagging of tokens and word sense disambiguation.
Chunk recognition often consists of two steps. First, it identifies the chunks of interest in the text. It then assigns a label or labels to these chunks. However some problems comprise simply the first step: identifying the relevant chunks. Examples of chunk recognition include named entity recognition (and more generally, information extraction), NP chunking and Chinese word segmentation.
Relation extraction determines whether or not a pair of terms in the text has some type(s) of pre-defined relations. Two examples are named entity relation extraction and co-reference resolution.
Typically, the three types of NLP learning use different linguistic features and feature representations. For example, it has been recognised that for text classification the so-called tf-idf representation of n-grams is very effective (e.g. with SVM). For chunk recognition, identifying the start token and the end token of the chunk by using the linguistic features of the token itself and the surrounding tokens is effective and efficient. Relation extraction benefits from both the linguistic features from each of the two terms involved in the relation and the features of the two terms combined.
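As an illustration of the tf-idf representation of n-grams mentioned above, the following sketch computes one common variant (raw term frequency multiplied by the logarithm of the inverse document frequency). It is not the weighting code used by the learning plugin, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (joined with a space) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents, n=1):
    """documents: list of token lists. Returns one {ngram: weight} dict per document."""
    counts = [Counter(ngrams(doc, n)) for doc in documents]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    total = len(documents)
    return [{g: tf * math.log(total / df[g]) for g, tf in c.items()} for c in counts]
```

In this variant an n-gram that occurs in every document receives weight zero, which reflects the intuition that it does not help to distinguish between classes.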
The rest of this section explains some basic definitions in ML and their specification in the ML plugin.
18.2 Batch Learning PR
This section describes the newest machine learning PR in GATE. The implementation focuses on the three main types of learning in NLP, namely chunk recognition (e.g. named entity recognition), text classification and relation extraction. The implementation for chunk recognition is based on our work using support vector machines (SVM) for information extraction [Li et al. 05]. The text classification is based on our work on opinionated sentence classification and patent document classification (see [Li et al. 07] and [Li et al. 07d], respectively). The relation extraction is based on our work on named entity relation extraction [Wang et al. 06].
The Batch Learning PR, given a set of documents, can also produce feature files, containing linguistic features and feature vectors, and labels if there are any in the documents. It can also produce document-term matrices and n-gram based language models. Feature files are in text format and can be used outside of GATE. Hence, users can use GATE-produced feature files off-line, for their own purpose, e.g. evaluating new learning algorithms.
The PR also provides facilities for active learning, based on support vector machines (SVM), mainly ranking the unlabelled documents according to the confidence scores of the current SVM models for those documents.
The primary learning algorithm implemented is SVM, which has achieved state of the art performances for many NLP learning tasks. The training of SVM uses a Java version of the SVM package LibSVM [CC001]. Application of SVM is implemented by ourselves. The PAUM (Perceptron Algorithm with Uneven Margins) is also included [Li et al. 02], and on our test datasets has consistently produced a performance to rival the SVM with much reduced training times. Moreover, the ML implementation provides an interface to the open-source machine learning package Weka [Witten & Frank 99], and can use machine learning algorithms implemented in Weka. Three widely-used learning algorithms are available in the current implementation: Naive Bayes, KNN and the C4.5 decision tree algorithm.
Access to ML implementations is provided in GATE by the 'Batch Learning PR' (in the 'learning' plugin). The PR handles training and application of an ML model, evaluation of learning on GATE documents, producing feature files and ranking documents for Active Learning. It also makes it possible to view the primal forms of a linear SVM. This PR is a Language Analyser so it can be used in all default types of GATE controllers.
In order to use the Batch Learning processing resource, the user has to do three things. First, the user has to annotate some training documents with the labels that s/he wants the learning system to annotate in new documents. Those label annotations should be GATE annotations. Secondly, the user may need to pre-process the documents to obtain linguistic features for the learning. Again, these features should be in the form of GATE annotations. GATE's plugin ANNIE might be helpful for producing the linguistic features. Other resources such as the NP Chunker and parser may also be helpful. By providing the machine learning algorithm with more and better information on which to base learning, chances of a good result are increased, so this preprocessing stage is important. Finally the user has to create a configuration file for setting the ML PR, e.g. selecting the learning algorithm and defining the linguistic features used in learning. Three example configuration files are presented in this section; it might be helpful to take one of them as a starting point and modify it.
inputASName is the annotation set containing the annotations for the linguistic features to be used and the class labels.
outputASName is the annotation set in which the results of applying the models will be put. Note that it should be set the same as the inputASName when doing the evaluation (i.e. setting the learningMode as 'EVALUATION').
learningMode is a run-time parameter. It can be set as one of the following values: 'TRAINING', 'APPLICATION', 'EVALUATION', 'ProduceFeatureFilesOnly', 'MITRAINING', 'VIEWPRIMALFORMMODELS' and 'RankingDocsForAL'. The default learning mode is 'TRAINING'.
models into a file called 'learnedModels.save' under the sub-directory 'savedFiles' of the working directory.
- If the user wants to apply the learned model to the data, s/he should select APPLICATION mode. In application mode, the PR reads the learned model from the file 'learnedModels.save' in the subdirectory 'savedFiles' and then applies the model to the data.
the corpus provided (the method of the evaluation is specified in the configuration file, see below), and output the evaluation results to the messages window of GATE Developer, or standard out when using GATE Embedded, and into the log file. When using evaluation mode, please make sure that the outputASName is set to the same annotation set as the inputASName.
- If the user only wants to produce feature data and feature vectors but does not want to train or apply a model, s/he may select the ProduceFeatureFilesOnly mode. The feature files that the PR produces will be explained in detail in Section 18.2.4.
appended to the end of any existing feature file. In contrast, in training mode, the training data created in the current session overwrite any existing feature file. Consequently, mixed initiative training mode uses both the training data obtained in this session and the data that existed in the feature file before starting the session. Hence, training mode is for batch learning, while mixed initiative training mode can be used for on-line (or adaptive, or mixed-initiative) learning. There is one parameter for mixed initiative training mode specifying the minimal number of newly added documents before starting the learning procedure to update the learned model. The parameter can be defined in the configuration file.
NLP features in the learned models. In the current implementation, the mode is only valid with the linear SVM model, in which the most salient NLP features correspond to the biggest (absolute values of) weights in the weight vector. In the configuration file one can specify two parameters to determine the number of displayed NLP features for positive and negative weights. Note that if e.g. the number for negative weight is set as 0, then no NLP feature is displayed for negative weights.
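The idea of picking the most salient features of a linear model from its weight vector can be sketched as follows. The function name and the dictionary representation of the weight vector are assumptions made for the illustration; the PR's own output format is not reproduced here.

```python
def salient_features(weights, num_positive, num_negative):
    """Pick the features with the largest positive and negative weights.

    weights: {feature name: weight} taken from a linear model (assumed shape).
    Returns (positive features, negative features), each ordered from the
    largest absolute weight downwards.
    """
    positive = sorted((f for f, w in weights.items() if w > 0),
                      key=lambda f: -weights[f])
    negative = sorted((f for f, w in weights.items() if w < 0),
                      key=lambda f: weights[f])
    return positive[:num_positive], negative[:num_negative]
```

Setting num_negative to 0 returns an empty list of negative features, which mirrors the behaviour described above for the corresponding configuration value.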
In most cases it is not safe to run more than one instance of the batch learning PR with the same working directory at the same time, because the PR needs to update the model (in TRAINING, MITRAINING or EVALUATION mode) or other data files. It is safe to run multiple instances at once provided they are all in APPLICATION mode (see footnote 1).
Order of document processing
In the usual case, in a GATE corpus pipeline application, documents are processed one at a time, and each PR is applied in turn to the document, processing it fully, before moving on to the next document. The Batch Learning PR breaks from this rule. ML training algorithms, including SVM, typically run as a batch process over a training set, and require all the data to be fully prepared and passed to the algorithm in one go. This means that in training (or evaluation) mode, the Batch Learning PR will wait for all the documents to be processed and will then run as a single operation at the end. Therefore, the Batch Learning PR needs to be positioned last in the pipeline. Post-processing cannot be done within the pipeline after the Batch Learning PR. Where further processing needs to be done, this should take the form of a separate application, and be applied to the data afterwards.
There is an exception to the above, however. In application mode, the situation is slightly different, since the ML model has already been created, and the PR only applies it to the data. This can be done on a document by document basis, in the manner of a normal PR. However, although it can be done document by document, there may be advantages in terms of efficiency to grouping documents into batches before applying the algorithm. A parameter in the configuration file, BATCH-APP-INTERVAL, described later, allows the user to specify the size of such batches, and by default this is set to 1; in other words, by default, the Batch Learning PR in application mode behaves like a normal PR and processes each document separately. There may be substantial efficiency gains to be had through increasing this parameter (although higher values require more memory consumption), but if the Batch Learning PR is applied in application mode and the parameter BATCH-APP-INTERVAL is set to 1, the PR can be treated like any other, and other PRs may be positioned after it in a pipeline.
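The effect of the batch size in application mode can be pictured with a small sketch. The names batches and apply_in_batches are hypothetical, as is the apply_model callable, which stands in for applying an already trained model to a list of documents; none of this is the PR's code.

```python
def batches(documents, interval=1):
    """Yield consecutive lists of at most `interval` documents."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == interval:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def apply_in_batches(documents, apply_model, interval=1):
    """Apply a trained model batch by batch and collect the results in order."""
    results = []
    for batch in batches(documents, interval):
        results.extend(apply_model(batch))
    return results
```

With interval set to 1 each document is handled on its own, as a normal PR would; larger values hold more documents in memory at once in exchange for fewer calls to the model.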
1 This is only true for GATE 5.2 or later; in earlier versions all modes were unsafe for multiple instances
of the PR.
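As a minimal sketch of the batching described above, the fragment below sets the batch size used in application mode. The value 10 is an arbitrary illustrative choice and not a recommendation made by this guide. With any value other than 1, the Batch Learning PR presumably has to remain the last PR in the pipeline, as in training mode.

```xml
<!-- Illustrative value only: collect 10 documents, then apply the model to them as one batch. -->
<BATCH-APP-INTERVAL num="10"/>
```

Larger values trade memory for speed, so the setting is worth tuning against the size of the documents being processed.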
Machine Learning
Optional Settings in the Configuration File

The Batch Learning PR provides a variety of optional settings, which facilitate different tasks. Every optional setting has a default value; if an optional setting is not specified in the configuration file, the Batch Learning PR will adopt its default value. Each of the following optional settings can be set as an element in the XML configuration file.
SURROUND should be set to 'true' if the user wants the Batch Learning PR to learn chunks by identifying the start token and the end token of the chunk. This approach to chunk learning (for example, named entity recognition, where a span of several tokens is to be identified) often produces better results than trying to learn every token in the chunk. For classification problems and relation extraction, set its value to 'false'. This element appears in the configuration file as:
<SURROUND value='X'/>
where the variable X has two possible values: 'true' or 'false'. The default value is 'false'.

FILTERING relates to SVM training. Where the ratio of positive examples to negative examples is low, i.e. the instances belonging to the class are much outweighed by instances outside of the class (e.g. when 'one against others' is used, see multiClassification2Binary below), SVMs can run into difficulties. The positive examples may be swamped by outlying negative examples. The ML plugin provides functionality developed through research (e.g. Li & Bontcheva 08) to assist in such cases. One example is the FILTERING parameter. The filtering functionality performs initial SVM training, then removes negative examples on the basis of their position relative to the separator. It then retrains on the smaller dataset. Typically, negative instances close to the boundary are removed. Note that this two-step process takes longer than simple training. However, the second training step will be quicker than the first, as it is performed on a somewhat reduced dataset. If the item dis is set to 'near', the PR selects and removes those negative examples which are closest to the SVM hyper-plane. If it is set to 'far', those negative examples that are furthest from the SVM hyper-plane are removed. The value of the item ratio determines what proportion of negative examples will be filtered out. This element appears in the configuration file as:
<FILTERING ratio='X' dis='Y'/>
where X represents a number between 0 and 1 and Y can be set to 'near' or 'far'. If the
filtering element is not present in the configuration file, or the value of ratio is set to 0.0, the PR does not perform filtering. The default value of ratio is 0.0. The default value of dis is 'far'.

EVALUATION As outlined above, if the learning mode parameter learningMode is set to 'EVALUATION', the PR will perform evaluation of the ML model; it will split the documents in the corpus into two parts, the training dataset and the test dataset, learn a model from the training dataset, apply the model to the test dataset, and finally compare the annotations assigned by the model on the test set with the true annotations and output measures of success (e.g. F-measure). The evaluation element specifies the method of splitting the corpus. The item method determines which method to use for evaluation. Currently two commonly used methods are implemented, namely k-fold cross-validation and hold-out test. In k-fold cross-validation the PR segments the corpus into k partitions of equal size, and uses each of the partitions in turn as a test set, with all the remaining documents as a training set. For hold-out test, the system randomly selects some documents as testing data and uses all other documents as training data. The value of the item runs specifies the number 'k' for k-fold cross-validation. The value of the item ratio specifies the ratio of the data used for training in the hold-out test method. The element in the configuration file appears as so:
<EVALUATION method="X" runs="Y" ratio="Z"/>
where the variable X has two possible values, 'kfold' and 'holdout', Y is a positive integer, and Z is a floating-point number between 0 and 1. The default value of method is 'holdout'. The default value of runs is '1'. The default value of ratio is '0.66'.

multiClassification2Binary Certain machine learning algorithms, including SVM, are designed to operate on two-class problems; they find a separator between two groups of instances. In order to use such algorithms to classify items into a larger number of classes, the problem has to be converted into a series of 'binary' (two-class) problems. The ML plugin implements two common methods for converting a multi-class problem into several binary problems, namely one against others and one against another. The two methods may have slightly different names in other publications, but the principle is the same. Suppose we have a multi-class classification problem with n classes. For the one against others method, one binary classification problem is derived for each of the n classes. Examples belonging to the class in question are considered to be positive examples and all other examples in the training set are negative examples. In contrast, for the one against another method, one binary classification problem is derived for each pair (c1, c2) of the n classes. Training examples belonging to the class c1 are the positive examples and those belonging to the other class, c2, are the negative examples. The user can select one of the two methods by specifying the value of the item method of the element. The element appears as so:
<multiClassification2Binary method="X" thread-pool-size="N"/>
where the variable X has two values, 'one-vs-others' and 'one-vs-another'. Note that depending on the sample size, the two methods may differ greatly in their speed of execution. The default method is the one-vs-others method. If the configuration file does not have the element or the item method is missing, then the PR will use the one-
vs-others method. Since the derived binary classifiers are independent, it is possible to learn several of them in parallel. The 'thread-pool-size' attribute gives the number of threads that will be used to learn and apply the binary classifiers. If omitted, a single thread will be used to process all the classifiers in sequence.

thresholdProbabilityBoundary sets a confidence threshold on start and end tokens for chunk learning. It is used in post-processing the learning results. Only those boundary tokens whose confidence level is above the threshold are selected as candidates for the entities. The element in the configuration file appears as so:
<PARAMETER name="thresholdProbabilityBoundary" value="X"/>
The value X is between 0 and 1. The default value is 0.4.

thresholdProbabilityEntity sets a confidence threshold on chunks (the confidence of a chunk is the product of the probabilities of its start token and end token) for chunk learning. Only those entities whose confidence level is above the threshold are selected as candidate entities. The element in the configuration file appears as so:
<PARAMETER name="thresholdProbabilityEntity" value="X"/>
The value X is between 0 and 1. The default value is 0.2.

thresholdProbabilityClassification is the confidence threshold for classification (e.g. text classification and relation extraction tasks); in contrast, the two thresholds above are for the chunk recognition task. The corresponding element in the configuration file appears as so:
<PARAMETER name="thresholdProbabilityClassification" value="X"/>
The value X is between 0 and 1. The default value is 0.5.

IS-LABEL-UPDATABLE is a Boolean parameter. If its value is set to 'true', the label list is updated from the labels in the training data. Otherwise, a pre-defined label list will be used and cannot be updated from the training data. The configuration element appears as so:
<IS-LABEL-UPDATABLE value="X"/>
The value X is 'true' or 'false'. The default value is 'true'.

IS-NLPFEATURELIST-UPDATABLE is a Boolean parameter. If its value is set to 'true', the NLP feature list is updated from the features in the training or application data. Otherwise, a pre-defined NLP feature list will be used and cannot be updated. The configuration element appears as so:
<IS-NLPFEATURELIST-UPDATABLE value="X"/>
The value X is 'true' or 'false'. The default value is 'true'.

VERBOSITY specifies the verbosity level of the output of the system, both to the message window of GATE Developer (or standard out when using GATE Embedded) and into the log file. Currently there are three verbosity levels. Level 0 only allows the output of warning messages. Level 1 outputs some important setting information and the results for evaluation mode. Level 2 is used for debugging purposes.
The element in the configuration file appears as so:
<VERBOSITY level="X"/>
The value X can be set to 0, 1 or 2. The default value is 1.
MI-TRAINING-INTERVAL specifies the minimal number of newly added documents needed to trigger retraining the model. This parameter is used in MITRAINING mode. The number is specified by the value of the feature 'num' as so:
<MI-TRAINING-INTERVAL num="X"/>
The default value of X is 1.

BATCH-APP-INTERVAL is used in application mode, and specifies the number of documents to be collected and passed as a batch for classification. Please refer to Section 18.2.1 for a detailed explanation of this option. The corresponding element in the configuration file is:
<BATCH-APP-INTERVAL num="X"/>
The default value of X is 1.

DISPLAY-NLPFEATURES-LINEARSVM relates to 'VIEWPRIMALFORMMODELS' mode. In this mode, the most significant features are displayed for each class. For more information about this mode see Section 18.2.1. Two numbers are specified: the number of positively weighted features to display and the number of negatively weighted features to display. It has the following form in the configuration file:
<DISPLAY-NLPFEATURES-LINEARSVM numP="X" numN="Y"/>
where X and Y represent the numbers of positively and negatively weighted features to display, respectively. The default values of X and Y are 10 and 0.

ACTIVELEARNING specifies the settings for active learning. Active learning ranks documents based on the average of a sample of ML annotation confidence scores. A larger sample gives a more accurate ranking but takes longer to calculate. The option has the following form:
<ACTIVELEARNING numExamplesPerDoc='X'/>
where X represents the number of examples per document used to obtain the confidence score with respect to the learned model. The default value of numExamplesPerDoc is 3.
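To show how several of the optional settings above combine, here is a sketch of the opening part of a configuration file for a chunk-learning task. The particular values (4 folds, 2 threads, a filtering ratio of 0.1) are illustrative assumptions rather than recommendations, and the attribute names follow the forms given above.

```xml
<?xml version="1.0"?>
<ML-CONFIG>
  <!-- Learn chunks by their start and end tokens. -->
  <SURROUND value="true"/>
  <!-- Remove the 10% of negative examples closest to the SVM hyper-plane, then retrain. -->
  <FILTERING ratio="0.1" dis="near"/>
  <!-- 4-fold cross-validation; the ratio item only applies to the hold-out method. -->
  <EVALUATION method="kfold" runs="4"/>
  <!-- One binary classifier per pair of classes, learned on two threads. -->
  <multiClassification2Binary method="one-vs-another" thread-pool-size="2"/>
  <!-- Confidence thresholds used when post-processing chunk boundaries and chunks. -->
  <PARAMETER name="thresholdProbabilityBoundary" value="0.4"/>
  <PARAMETER name="thresholdProbabilityEntity" value="0.2"/>
  <!-- Report settings and evaluation results, but no debugging output. -->
  <VERBOSITY level="1"/>
  <!-- The ENGINE and DATASET elements would follow here. -->
</ML-CONFIG>
```

Any setting left out of such a file simply falls back to the default stated in its description.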
The ENGINE Element

The ENGINE element specifies which ML algorithm will be used, and also allows the options to be set for that algorithm.
For SVM learning, the user can choose one of two learning engines. We will discuss the two SVM learning engines below. Note that only linear and polynomial kernels are supported. This is despite the fact that the original SVM packages implemented other types of kernel. Linear and polynomial kernels are popular in natural language learning, and other types of kernel are rarely used. However, if you want to experiment with other types of kernel, you
can do so by first running the Batch Learning PR in GATE to produce the training and testing data, then using the data with the SVM implementation outside of GATE.

The configuration files in the test directory (i.e. plugins/learning/test/ under the main gate directory) contain examples for setting the learning engine.

The ENGINE element in the configuration file is specified as follows:
<ENGINE nickname='X' implementationName='Y' options='Z'/>
It has three items:

nickname can be the name of the learning algorithm or whatever the user wants it to be.

implementationName refers to the implementation of the particular learning algorithm that the user wants to use. Its value should be one of the following:
• SVMLibSvmJava, the binary SVM implementation based on the Java version of the LibSVM package.

• SVMExec, a binary SVM implementation of your choice, potentially in a language other than Java, run as a separate process outside of GATE. Currently it can use the SVMlight SVM package (see note 2); see the XML file in the GATE distribution (at gate/plugins/learning/test/chunklearning/engines-svm-svmlight.xml) for an example of how to specify the learning engine to be used. The learning engines SVMExec and SVMLibSvmJava should produce the same results in theory but may give slightly different results in practice due to implementation differences. SVMLibSvmJava tends to be faster than SVMExec for smaller training sets. There may, however, be cases where it is an advantage to run SVM as a separate process, in which case SVMExec would be preferable.

• PAUM, the Perceptron with uneven margins, a simple and fast classification learning algorithm. (For details about the learning algorithm PAUM, see Li et al. 02.)

• PAUMExec, a binary PAUM implementation of your choice, potentially in a language other than Java, run as a separate process outside of GATE. The relationship between PAUM and PAUMExec is similar to that of SVMLibSvmJava and SVMExec. You may download and use an implementation in C from http://www.dcs.shef.ac.uk/~yaoyong/paum/paum-learning.zip. See the XML file in the GATE distribution (at gate/plugins/learning/test/chunklearning/engines-paum-exec.xml) for an example of how to specify the learning engine to be used.

• NaiveBayesWeka, the Naive Bayes learning algorithm implemented in Weka.

• KNNWeka, the K nearest neighbour (KNN) algorithm implemented in Weka.

• C4.5Weka, the C4.5 decision tree algorithm implemented in Weka.
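As an illustration of how the nickname, implementationName and options items fit together, the fragment below sketches three alternative ENGINE elements. A configuration file would normally contain only one of them. The nicknames are free-form labels chosen for this sketch, and the option values are illustrative; the option flags themselves are those described in the options discussion for each engine.

```xml
<!-- Alternative 1: LibSVM-based engine with a quadratic (degree 2 polynomial) kernel
     and an uneven margins parameter of 0.4. -->
<ENGINE nickname="SVM" implementationName="SVMLibSvmJava"
        options=" -t 1 -d 2 -c 1 -m 100 -tau 0.4 "/>

<!-- Alternative 2: Perceptron with uneven margins, positive margin 50 and negative margin 5. -->
<ENGINE nickname="PAUM" implementationName="PAUM"
        options=" -p 50 -n 5 -optB 0.3 "/>

<!-- Alternative 3: Weka's K nearest neighbour algorithm with 5 neighbours. -->
<ENGINE nickname="KNN" implementationName="KNNWeka"
        options=" -k 5 "/>
```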
2 The SVM package SVMlight.
• The options for SVMLibSvmJava are similar to those for LibSVM, with the exception that, since SVMLibSvmJava implements the uneven margins SVM algorithm described in Li & Shawe-Taylor 03, it takes the uneven margins parameter as an option. SVMLibSvmJava options are as follows:

  * -s svm_type; whether the SVM should be binary or multiclass. Default value is 0. Since only binary is supported, the option should be set to 0 or excluded.
  * -t kernel_type; 0 for linear kernel or 1 for polynomial kernel. Default value is 0. Note that the current implementation does not support other kernel types such as radial and sigmoid functions.
  * -d degree; the degree of the polynomial kernel, e.g. 2 for a quadratic kernel. Default value is 3.
  * -c cost; the cost parameter C in the SVM. Default value is 1. This parameter determines the cost associated with allowing training errors ('soft margins'). Allowing some points to be misclassified by the SVM may produce a more generalizable result.
  * -m cachesize; the cache memory size in MB (default 100).
  * -tau value; the value of the uneven margins parameter τ of the SVM. τ = 1 corresponds to the standard SVM. If the training data has just a small number of positive examples and a large number of negative examples, setting the parameter τ to a value less than 1 (e.g. τ = 0.4) often results in a better F-measure than the standard SVM (see Li & Shawe-Taylor 03).
• The options for SVMExec, using SVMlight, are similar to those for using SVMlight directly for training. Options set the type of kernel, the parameters in the kernel function, the cost parameter, the memory used, etc. The parameter tau is also included, to set the uneven margins parameter, as explained above. The last two terms in the options are the training data file and the model file. An example of the options for SVMExec might be '-c 0.7 -t 0 -m 100 -v 0 -tau 0.6 /yaoyong/software/svm-light/data_svm.dat /yaoyong/software/svm-light/model_svm.dat', meaning that the learner uses a linear kernel, the uneven margins parameter is set to 0.6, and the two data files /yaoyong/software/svm-light/data_svm.dat and /yaoyong/software/svm-light/model_svm.dat are used for writing and reading data. Note that both the data files specified here are temporary files, which are used only by the svm-light training program, can be anywhere on your computer, and are independent of the data files produced by the GATE learning plugin. SVMExec also takes a further argument, executableTraining, which specifies the SVM learning program svm_learn.exe in the SVMlight package.
  For example, 'executableTraining=/yaoyong/software/svm-light/svm_learn.exe' specifies one particular svm_learn.exe obtained from the SVMlight package.
• The PAUM engine has three options: '-p' for the positive margin, '-n' for the negative margin, and '-optB' for the modification of the bias term. For example, options='-p 50 -n 5 -optB 0.3' means τ+ = 50, τ− = 5 and b = b + 0.3 in the PAUM algorithm.

• The KNN algorithm has one option: the number of neighbours used. It is set via '-k X'. The default value is 1.

• There are no options for the Naive Bayes and C4.5 algorithms.

The DATASET Element

The DATASET element defines the type of annotation to be used as training instance and the set of attributes that characterise the instances. The INSTANCE-TYPE sub-element is used to select the annotation type to be used for instances. There will be one training instance for every one of the instance annotations in the corpus. For example, if INSTANCE-TYPE has 'Token' as its value, there will be one training instance in the document per token. This also means that the positions (see below) are defined in relation to tokens. INSTANCE-TYPE can be seen as the basic unit to be taken into account for machine learning. The attributes of the instance are defined by a sequence of ATTRIBUTE, ATTRIBUTE_REL or ATTRIBUTELIST elements.
Different NLP learning tasks may have different instance types and use different kinds of attribute elements. Chunk recognition often uses the token as instance type and the linguistic features of 'Token' and other annotations as features. Text classification's instance type is the text unit for classification, e.g. the whole document, or sentence, or token. If classifying, for example, a sentence, n-grams (see below) are often a good feature representation for many statistical learning algorithms. For relation extraction, the instance type is a pair of terms that may be related, and the features come from not only the linguistic features of each of the two terms but also those related to both terms taken together.

The DATASET element should define an INSTANCE-TYPE sub-element, it should define an ATTRIBUTE sub-element or an ATTRIBUTE_REL sub-element as class, and it should define some linguistic feature related sub-elements ('linguistic feature' or 'NLP feature' is used here to distinguish features or attributes used for machine learning from features in the sense of a feature of a GATE annotation). All the annotation types involved in the dataset definition should be in the same annotation set. Each of the sub-elements defining the linguistic features (attributes) should contain an element defining the annotation TYPE to be used and an element defining the FEATURE of the annotation type to use. For instance, TYPE might be 'Person' and FEATURE might be 'gender'. For an ATTRIBUTE sub-element, if you do not specify FEATURE, the entire sub-element will be ignored. Therefore, if an annotation type you want to use does not have any annotation features, you should add an annotation feature to it and assign the same value to the feature for all annotations of that type. Note that if blank spaces are contained in the values of the annotation features, they will be replaced by the character '_' in each occurrence. So it is advisable that the
values of the annotation features used, in particular for the class label, do not contain any blank space.

Below, we explain all the sub-elements one by one. Please also refer to the example configuration files presented in the next section. Note that each sub-element should have a unique name, if it requires a name, unless we explicitly state otherwise.

The INSTANCE-TYPE sub-element is defined as
<INSTANCE-TYPE>X</INSTANCE-TYPE>
where X is the annotation type used as instance unit for learning, for example 'Token'. For relation extraction, the user should also specify the two arguments of the relation, as so:
<INSTANCE-ARG1>A</INSTANCE-ARG1>
<INSTANCE-ARG2>B</INSTANCE-ARG2>
The values of A and B should be identifiers for the first and second terms of the relation, respectively. These names will be used later in the configuration file. An example can be found at /gate/plugins/learning/test/relation-learning/engines-svm.xml.
An ATTRIBUTE element has the following sub-elements:

• NAME; the name of the attribute. Its value should not end with 'gram', since this is reserved for n-gram features as mentioned below. This attribute name will appear in output files, so it is useful to give a descriptive name.

• SEMTYPE; type of the attribute value. It can be 'NOMINAL' or 'NUMERIC'. Currently only nominal is supported.

• TYPE; the annotation type used to extract the attribute.

• FEATURE; the value of the attribute will be the value of the named feature on the annotation of the specified type.

• POSITION; the position of the instance annotation from which to extract the feature, relative to the current instance annotation. 0 refers to the current instance annotation, -1 refers to the preceding instance annotation, 1 refers to the following one and so forth. Recall that we defined INSTANCE-TYPE at the start of the DATASET element. This type might for example be 'Token'. In the current ATTRIBUTE element we are defining an annotation type to use to get the feature from, separate and possibly different from the INSTANCE-TYPE. For example, we might be interested in the 'majorType' of 'Lookup'. By specifying -1, we would be saying, move to the preceding 'Token' and then try to extract the 'majorType' of the 'Lookup' on that token. The default value of the parameter is 0. Note that if our INSTANCE-TYPE were to be, for example, a named entity annotation comprising multiple tokens, and we wanted to extract a feature on the 'Token' annotation, then all the tokens within it would be considered to be in the zero position relative to the current instance annotation, and the current implementation would simply pick the first. (Useful in this case might be the NGRAM attribute type, described later, which can be used to extract features for each
member of a multi-token annotation.) In the current implementation, features are weighted according to their distance from the current instance annotation. In other words, features which are further removed from the current instance annotation are given reduced importance. The component value in the feature vector for one attribute feature is 1 if the attribute's position p is 0. Otherwise its value is 1.0/|p|.

• <CLASS/>; an empty element that marks the attribute as the class attribute. There can only be one attribute marked as class in a dataset definition. The attribute, as described above, has a specified TYPE and FEATURE; the values of the feature are the class labels. Since only one attribute can be marked as class, it may be necessary to preprocess your data to put all class labels into a feature of one type of annotation, e.g. you might create a 'Mention' annotation, with the feature 'Class', which is set to the class name.
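The sub-elements above can be sketched as follows, assuming 'Token' is the INSTANCE-TYPE. The attribute names GazCurrent and GazMinus2 are hypothetical names chosen for this sketch; the 'Lookup', 'majorType', 'Mention' and 'class' names are the ones used as examples in this section.

```xml
<!-- Feature from the current instance (position 0): component value 1 in the feature vector. -->
<ATTRIBUTE>
  <NAME>GazCurrent</NAME>
  <SEMTYPE>NOMINAL</SEMTYPE>
  <TYPE>Lookup</TYPE>
  <FEATURE>majorType</FEATURE>
  <POSITION>0</POSITION>
</ATTRIBUTE>

<!-- Same feature taken two instances to the left (position -2): component value 1.0/|-2| = 0.5. -->
<ATTRIBUTE>
  <NAME>GazMinus2</NAME>
  <SEMTYPE>NOMINAL</SEMTYPE>
  <TYPE>Lookup</TYPE>
  <FEATURE>majorType</FEATURE>
  <POSITION>-2</POSITION>
</ATTRIBUTE>

<!-- The single class attribute: class labels are the values of the 'class' feature on 'Mention'. -->
<ATTRIBUTE>
  <NAME>Class</NAME>
  <SEMTYPE>NOMINAL</SEMTYPE>
  <TYPE>Mention</TYPE>
  <FEATURE>class</FEATURE>
  <POSITION>0</POSITION>
  <CLASS/>
</ATTRIBUTE>
```

Each attribute carries its own unique NAME, and only the last one contains the <CLASS/> marker.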
The ATTRIBUTELIST element is similar to ATTRIBUTE except that it has no POSITION sub-element but instead a RANGE element. This will be converted into several attributes with position ranging from the value of 'from' to the value of 'to'. It defines a 'context window' containing several consecutive examples. The ATTRIBUTELIST should be preferred when defining a context window for features, not only because it avoids the duplication of ATTRIBUTE elements, but also because processing is speeded up (see the discussion of the element WINDOWSIZE below).

The WINDOWSIZE element specifies the size of the context window. This will override the context window size defined in every ATTRIBUTELIST. If the WINDOWSIZE element is not present in the configuration file, the window size defined in each ATTRIBUTELIST element will be used; otherwise, the window size specified by this element will be used for each ATTRIBUTELIST if it contains one ATTRIBUTE at position 0 (otherwise the ATTRIBUTELIST will be ignored). This element can be used for speeding up the process of extracting the feature vectors from the documents. The element has two features specifying the length of the left and right sides of the context window. It has the following form:
<WINDOWSIZE windowSizeLeft="X" windowSizeRight="Y"/>
where X and Y represent the length of the left and right sides of the context window, respectively. For example, if X = 2 and Y = 1, then the context window will be from position -2 to 1 (i.e. from the second token on the left through the current token to the first token on the right).

An NGRAM feature is used for characterising an instance annotation in terms of constituent sequences of subsumed feature annotations. It is essentially a reversal of the ATTRIBUTELIST principle; where ATTRIBUTELIST uses a sequence surrounding an instance in order to classify the instance, NGRAM uses sequences within the instance as features. It simply creates a series of attributes that constitute a sliding window across the entirety of the current instance annotation. For example, INSTANCE-TYPE might be sentences, in sentence classification, and the NGRAM attribute specification could be used, for example, to create a series of unigram features for the sentence, effectively
a 'bag of words' representation. Conventionally, one would use the string of the token, or perhaps its lemma, as the feature for the NGRAM; however, it is possible to specify multiple features of choice, as shown below.
• NAME; name of the n-gram. Its value should end with 'gram'.

• NUMBER; the 'n' of the n-gram, with value 1 for unigram, 2 for bigram, etc.

• CONSNUM; the number of annotation features on which the n-gram is based. A value of k is followed by k sub-elements CONS-1 to CONS-k, each giving the TYPE and FEATURE of one constituent.

• WEIGHT; an optional weight for the n-gram. By default the n-gram part of the feature vector for one instance is normalised, thus having a default value of 1.0. If the user wants to adjust the contribution of the n-gram to the whole feature vector, s/he can do so by setting the WEIGHT parameter. For example, if the user is doing sentence classification and uses two features, the unigram of tokens in a sentence and the length of the sentence, by default the entire NGRAM attribute specification is given only the same importance as the sentence length feature. In order to experiment with increasing the importance of the n-gram element, the user can set the weight sub-element of the n-gram element to a number bigger than 1.0 (like 10.0). Then every component of the n-gram part of the feature vector would be multiplied by the parameter.
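The ATTRIBUTELIST and NGRAM elements described above can be sketched as two independent fragments. The first assumes a 'Token' instance type and the second a sentence-level instance type; the names Form and Sent2gram and the weight of 2.0 are illustrative assumptions made for this sketch.

```xml
<!-- Context window over token strings: expands to four attributes at positions -2, -1, 0 and 1. -->
<ATTRIBUTELIST>
  <NAME>Form</NAME>
  <SEMTYPE>NOMINAL</SEMTYPE>
  <TYPE>Token</TYPE>
  <FEATURE>string</FEATURE>
  <RANGE from="-2" to="1"/>
</ATTRIBUTELIST>

<!-- Bigrams of token strings inside the current instance. The NAME ends with 'gram',
     and the n-gram part of the feature vector is multiplied by 2.0. -->
<NGRAM>
  <NAME>Sent2gram</NAME>
  <NUMBER>2</NUMBER>
  <CONSNUM>1</CONSNUM>
  <CONS-1>
    <TYPE>Token</TYPE>
    <FEATURE>string</FEATURE>
  </CONS-1>
  <WEIGHT>2.0</WEIGHT>
</NGRAM>
```

The ATTRIBUTELIST looks outward from the instance, while the NGRAM looks at what the instance contains, which is why they tend to suit token-level and sentence-level tasks respectively.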
The ValueTypeNgram element specifies the type of value used in the n-gram. Currently it can take one of three types: binary, tf, and tf-idf, which are explained in Section 18.2.4. The value is specified by the X in
<ValueTypeNgram>X</ValueTypeNgram>
X = 1 for binary, X = 2 for tf, and X = 3 for tf-idf. The default value is 3.

The FEATURES-ARG1 element defines the features related to the first argument of the relation for relation learning. It should include one ARG sub-element referring to the GATE annotation of the argument (see below for a detailed explanation). It may include other sub-elements, such as ATTRIBUTE, ATTRIBUTELIST and/or NGRAM, to define the linguistic features related to the argument. Features pertaining particularly to one or the other argument of a relation should be defined in FEATURES-ARG1 or FEATURES-ARG2 as appropriate. Features relating to both arguments should be defined using an ATTRIBUTE_REL.

The FEATURES-ARG2 element defines the features related to the second argument of a relation. Like the element FEATURES-ARG1, it should include one ARG sub-element. It may also include other sub-elements. The ARG sub-element in FEATURES-ARG2 should have a unique name which is different from the name for
the ARG sub-element in FEATURES-ARG1. However, other sub-elements may have the same name as corresponding ones in FEATURES-ARG1, if they refer to the same annotation type and feature in the text.

The ARG element is used in both FEATURES-ARG1 and FEATURES-ARG2. It specifies the annotation corresponding to one argument of a relation. It has four sub-elements, as follows:
• NAME; a unique name for the argument (e.g. 'ARG1').

• SEMTYPE; the type of the arg value. This can be 'NOMINAL' or 'NUMERIC'. Currently only nominal is implemented.

• TYPE; the annotation type for the argument.

• FEATURE; the value of the named feature on the annotation of the specified type is the identifier of the argument. Only if the value of the feature is the same as the value of the feature specified in the sub-element <INSTANCE-ARG1>A</INSTANCE-ARG1> (or <INSTANCE-ARG2>B</INSTANCE-ARG2>) is the argument regarded as one argument of the relation instance considered.
The ATTRIBUTE_REL element is similar to the ATTRIBUTE element. However, it does not have the POSITION sub-element, and it has two other sub-elements, ARG1 and ARG2, relating to the two argument features of the (relation) instance type. The feature defined in an ATTRIBUTE_REL sub-element is assigned to the instance considered if and only if the value X in the sub-element <ARG1>X</ARG1> is the same as the value A in the first argument instance <INSTANCE-ARG1>A</INSTANCE-ARG1> and the value Y in the sub-element <ARG2>Y</ARG2> is the same as the value B in the second argument instance <INSTANCE-ARG2>B</INSTANCE-ARG2>. For relation learning, an ATTRIBUTE_REL is denoted as the class attribute by including <CLASS/>.
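The relation-learning elements above can be pulled together in a rough sketch of a DATASET element. Every annotation type and feature name here (RelationCandidate, arg1, arg2, Entity, id, relType) is hypothetical and chosen only for illustration; the file engines-svm.xml mentioned earlier in this section remains the authoritative example.

```xml
<DATASET>
  <!-- Hypothetical instance annotation, one per candidate pair of terms. -->
  <INSTANCE-TYPE>RelationCandidate</INSTANCE-TYPE>
  <INSTANCE-ARG1>arg1</INSTANCE-ARG1>
  <INSTANCE-ARG2>arg2</INSTANCE-ARG2>

  <!-- Features of the first argument; the ARG names must differ between the two blocks. -->
  <FEATURES-ARG1>
    <ARG>
      <NAME>ARG1</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>Entity</TYPE>
      <FEATURE>id</FEATURE>
    </ARG>
  </FEATURES-ARG1>

  <!-- Features of the second argument. -->
  <FEATURES-ARG2>
    <ARG>
      <NAME>ARG2</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>Entity</TYPE>
      <FEATURE>id</FEATURE>
    </ARG>
  </FEATURES-ARG2>

  <!-- Class attribute for the pair: no POSITION, but ARG1 and ARG2 matching the instance arguments. -->
  <ATTRIBUTE_REL>
    <NAME>Class</NAME>
    <SEMTYPE>NOMINAL</SEMTYPE>
    <TYPE>RelationCandidate</TYPE>
    <ARG1>arg1</ARG1>
    <ARG2>arg2</ARG2>
    <FEATURE>relType</FEATURE>
    <CLASS/>
  </ATTRIBUTE_REL>
</DATASET>
```

Further ATTRIBUTE, ATTRIBUTELIST or NGRAM sub-elements could be added inside each FEATURES-ARG block to describe the individual arguments.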
Information Extraction

The first example is for information extraction. The corpus is prepared with annotations providing class information as well as the features to be used. Class information is provided in the form of a single annotation type, 'Mention', which contains a feature 'class'. Within the class feature is the name of the class of the textual chunk. Other annotations in the dataset
include 'Token' and 'Lookup' annotations as provided by ANNIE. All of these annotations are in the same annotation set, the name of which will be passed as a runtime parameter.

The configuration file is given below. The optional settings are in the first part. It first specifies surround mode as 'true'; we will find the chunks that correspond to our entities by using machine learning to locate the start and end of the chunks. Then it specifies the filtering settings. Since we are going to use SVM in this problem, we can filter our data to remove some of the negative instances that can cause problems if they are too dominant. The ratio's value is '0.1' and the dis's value is 'near', meaning that an initial SVM learning step will be executed and the 10% of negative examples which are closest to the learned SVM hyper-plane will be removed in the filtering stage, before the final learning is executed. The threshold probabilities for the boundary tokens and information entity are set to '0.4' and '0.2', respectively; boundary tokens found with a lower confidence than the threshold will be rejected. The threshold probability for classification is also set to '0.5'; this, however, will not be used in this case, since we are doing chunk learning with surround mode set to 'true'. The parameter will be ignored. multiClassification2Binary is set to 'one-vs-others', meaning that the ML API will convert the multi-class classification problem into a series of binary classification problems using the one against others approach. In evaluation mode, '2-fold' cross-validation will be used, dividing the corpus into two equal parts and running two training/test cycles with each part as the training data.

The second part is the sub-element ENGINE, specifying the learning algorithm. The PR will use the LibSVM SVM implementation. The options determine that it will use the linear kernel with the cost C as 0.7 and the cache memory as 100M. Additionally it will use uneven margins, with τ as 0.4.

The last part is the DATASET sub-element, defining the linguistic features used. It first specifies the 'Token' annotation as instance type. The first ATTRIBUTELIST allows the token's string as a feature of an instance. The range from '-5' to '5' means that the strings of the current token instance as well as its five preceding tokens and its five ensuing tokens will be used as features for the current token instance. The next two attribute lists define features based on the tokens' capitalisation information and types. The ATTRIBUTELIST named 'Gaz' uses as attributes the values of the feature 'majorType' of the annotation type 'Lookup'. The final ATTRIBUTE feature defines the class attribute; it has the sub-element <CLASS/>. The values of the feature 'class' of the annotation type 'Mention' are the class labels.
<?xml version="1.0"?>
<ML-CONFIG>
  <SURROUND value="true"/>
  <FILTERING ratio="0.1" dis="near"/>
  <PARAMETER name="thresholdProbabilityEntity" value="0.2"/>
  <PARAMETER name="thresholdProbabilityBoundary" value="0.4"/>
  <PARAMETER name="thresholdProbabilityClassification" value="0.5"/>
  <multiClassification2Binary method="one-vs-others"/>
  <EVALUATION method="kfold" runs="2"/>
  <ENGINE nickname="SVM" implementationName="SVMLibSvmJava"
          options=" -c 0.7 -t 0 -m 100 -tau 0.4 "/>
  <DATASET>
    <INSTANCE-TYPE>Token</INSTANCE-TYPE>
    <ATTRIBUTELIST>
      <NAME>Form</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>Token</TYPE>
      <FEATURE>string</FEATURE>
      <RANGE from="-5" to="5"/>
    </ATTRIBUTELIST>
    <ATTRIBUTELIST>
      <NAME>Orthography</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>Token</TYPE>
      <FEATURE>orth</FEATURE>
      <RANGE from="-5" to="5"/>
    </ATTRIBUTELIST>
    <ATTRIBUTELIST>
      <NAME>Tokenkind</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>Token</TYPE>
      <FEATURE>kind</FEATURE>
      <RANGE from="-5" to="5"/>
    </ATTRIBUTELIST>
    <ATTRIBUTELIST>
      <NAME>Gaz</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>Lookup</TYPE>
      <FEATURE>majorType</FEATURE>
      <RANGE from="-5" to="5"/>
    </ATTRIBUTELIST>
    <ATTRIBUTE>
      <NAME>Class</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>Mention</TYPE>
      <FEATURE>class</FEATURE>
      <POSITION>0</POSITION>
      <CLASS/>
    </ATTRIBUTE>
  </DATASET>
</ML-CONFIG>
Sentence Classification

We will now consider the case of sentence classification. The corpus in this example is annotated with 'Sentence' annotations, which contain the feature 'sent_size', as well as the class of the sentence. Furthermore, 'Token' annotations are applied, having features 'category' and 'root'. As before, all annotations are in the same set, and the annotation set name will be passed to the PR at run time.

Below is an example configuration file. It first specifies surround mode as 'false', because it is a text classification problem; we are interested in classifying single instances rather than chunks of instances. Our targets of interest, sentences, have already been found (unlike in the information extraction example, where identifying the limits of the entity was part of the problem). The next two options allow the label list and the NLP feature list to be updated from the training data when retraining. It also specifies probability thresholds for entity and entity boundary. Note that these two specifications will not be used in this case. However, their presence is not problematic; they will simply be ignored. The probability threshold for classification is set to '0.5'. This will be used to decide which classifications to accept and which to reject as being too unlikely. (Altering this parameter can trade off precision against recall and vice versa.)

The evaluation will use the hold-out test method. It will randomly select 66% of the documents from the corpus for training, and the other 34% of the documents will be used for testing. It will run the evaluation twice, and average the results over the two runs. Note that it does not specify the method of converting a multi-class classification problem into several binary class problems, meaning that it will adopt the default (namely one against all others).

The configuration file specifies KNN (K-Nearest Neighbours) as the learning algorithm. It also specifies the number of neighbours used as 5. Of course other learning algorithms can be used as well. For example, the ENGINE element in the previous example, which specifies SVM as learning algorithm, can be put into this configuration file to replace the current one.

In the DATASET element, the annotation 'Sentence' is used as instance type. Two kinds of linguistic features are defined; one is NGRAM and the other is ATTRIBUTE. The n-gram is based on the annotation 'Token'. It is a unigram, as its NUMBER element has the value 1. This means that a 'bag of words' feature will be formed from the tokens comprising the sentence. It is based on the two features, 'root' and 'category', of the annotation 'Token'. This introduces a new aspect to the n-gram. The n-gram feature comprises counts of the unigrams appearing in the sentence. For example, if the sentence were "the man walked the dog", the unigram feature would contain the information that 'the' appeared twice, and 'man', 'walked' and 'dog' appeared once. However, since our n-gram has two features, 'root' and 'category', two tokens will be considered the same term if and only if they have the same 'root' feature and the same 'category' feature. The weight of the n-gram is set to 10.0, meaning its contribution is ten times that of the contribution of the other feature, the sentence length. The feature 'sent_size' of the annotation 'Sentence' is given as an ATTRIBUTE feature. Finally the values of the feature 'class' of the annotation 'Sentence' are nominated as the class labels.
Machine Learning
<?xml version="1.0"?>
<ML-CONFIG>
  <SURROUND value="false"/>
  <IS-LABEL-UPDATABLE value="true"/>
  <IS-NLPFEATURELIST-UPDATABLE value="true"/>
  <PARAMETER name="thresholdProbabilityEntity" value="0.2"/>
  <PARAMETER name="thresholdProbabilityBoundary" value="0.42"/>
  <PARAMETER name="thresholdProbabilityClassification" value="0.5"/>
  <EVALUATION method="holdout" runs="2" ratio="0.66"/>
  <ENGINE nickname="KNN" implementationName="KNNWeka" options=" -k 5 "/>
  <DATASET>
    <INSTANCE-TYPE>Sentence</INSTANCE-TYPE>
    <NGRAM>
      <NAME>Sent1gram</NAME>
      <NUMBER>1</NUMBER>
      <CONSNUM>2</CONSNUM>
      <CONS-1>
        <TYPE>Token</TYPE>
        <FEATURE>root</FEATURE>
      </CONS-1>
      <CONS-2>
        <TYPE>Token</TYPE>
        <FEATURE>category</FEATURE>
      </CONS-2>
      <WEIGHT>10.0</WEIGHT>
    </NGRAM>
    <ATTRIBUTE>
      <NAME>Class</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>Sentence</TYPE>
      <FEATURE>sent_size</FEATURE>
      <POSITION>0</POSITION>
    </ATTRIBUTE>
    <ATTRIBUTE>
      <NAME>Class</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>Sentence</TYPE>
      <FEATURE>class</FEATURE>
      <POSITION>0</POSITION>
      <CLASS/>
    </ATTRIBUTE>
  </DATASET>
</ML-CONFIG>
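To make the two-feature unigram concrete, the following is a minimal sketch of how terms could be counted when a term is identified by both 'root' and 'category'. It is not the plugin's implementation: the class `UnigramTermCounter`, the `Tok` stand-in for a 'Token' annotation, the part-of-speech values and the `root_category` key format are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UnigramTermCounter {

    /** Hypothetical stand-in for a 'Token' annotation; keeps only the two features the n-gram uses. */
    static final class Tok {
        final String root;
        final String category;

        Tok(String root, String category) {
            this.root = root;
            this.category = category;
        }
    }

    /** Counts unigram terms; two tokens are the same term only if root AND category both match. */
    static Map<String, Integer> countTerms(List<Tok> sentence) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Tok t : sentence) {
            String term = t.root + "_" + t.category; // illustrative key format, not GATE's
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "the man walked the dog", with assumed root and category values
        List<Tok> sentence = List.of(
            new Tok("the", "DT"), new Tok("man", "NN"), new Tok("walk", "VBD"),
            new Tok("the", "DT"), new Tok("dog", "NN"));
        System.out.println(countTerms(sentence));
    }
}
```

With a single feature (for instance only 'root') the key would collapse to the root alone, so tokens sharing a root but differing in category would be merged into one term.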
Relation Extraction
The last example is for relation extraction. The relation extraction support in the PR is based on the work described in [Wang et al. 06].

Two concepts are key in a relation extraction corpus. Entities are the things that may be related, and relations describe the relationship between the entities, if any. In our example, entities are pre-identified, and the task is to identify the relationships between them. The corpus for this example is annotated with the following:

- 'ACEEntity' annotations indicate the entities of interest in the corpus.
- 'RE_INS' annotations form the instances, and there is an instance for every pair of 'ACEEntities' within a sentence. 'RE_INS' annotations span the entire text between and including their 'ACEEntity' annotations. For example, 'the commander of Israeli troops' might be a potential relationship between a person, 'the commander', and an entity, 'Israeli troops'. Its 'RE_INS' annotation covers the whole of this text. It contains 'arg1' and 'arg2' features containing the numerical identifiers of the two 'ACEEntities' to which it pertains. These numerical identifiers match the 'MENTION_ID' feature of the 'ACEEntity' annotation.
- 'ACERelation' annotations indicate the relations we wish to learn, and also span the entire text involved in the relationship. They include the features 'MENTION_ARG1' and 'MENTION_ARG2', which, again, contain the numerical identifier found in the 'MENTION_ID' feature of the 'ACEEntity' annotations, as well as 'Relation_type', indicating the type of the relation.
- Various ANNIE-style annotations are also included.

Our task is to select the 'RE_INS' instances that match the 'ACERelations'. You will see that throughout the configuration file, annotation types are specified in conjunction with argument identifiers. This is because we need to ensure that the annotation in question pertains to the right entities. Therefore, argument identifiers are used to constrain the match.

The configuration file does not specify any optional settings, meaning that it uses all the default values for those settings (see Section 18.2.1 for the default values of all possible settings):

- it sets the surround mode as 'false';
- both the label list and NLP feature list are updatable;
- the probability threshold for classification is set as 0.5;
- it uses 'one against others' for converting a multi-class problem into binary class problems for SVM learning;
- for evaluation it uses hold-out testing with a ratio of 0.66 and only one run.
The configuration file specifies the learning algorithm as the Naive Bayes method implemented in Weka. However, other learning algorithms could equally well be used.

We begin by defining 'RE_INS' as the instance type. Next, we provide the numeric identifiers of each argument of the relationship by specifying elements INSTANCE-ARG1 and INSTANCE-ARG2 as the feature names 'arg1' and 'arg2' respectively. This indicates that the argument identifiers of the instances can be found in the 'arg1' and 'arg2' features of the 'RE_INS' annotations.

Attributes might pertain to the entire relation or they might pertain to one or other argument within the relation. We are going to begin by defining the features specific to each argument of the relation. Recall that our 'RE_INS' annotations have as arguments two 'ACEEntity' annotations, and that these are identified by their 'MENTION_ID' being the same as the 'arg1' or 'arg2' features of the 'RE_INS'. It is from these 'ACEEntity' annotations that we wish to obtain argument-specific features. FEATURES-ARG1 and FEATURES-ARG2 elements begin by specifying which annotation we are referring to. We use the ARG element to explain this. We are interested in annotations of type 'ACEEntity', and their 'MENTION_ID' must match 'arg1' or 'arg2' of 'RE_INS' as appropriate. Having identified precisely which 'ACEEntity' we are interested in, we can go on to give argument-specific features; in this case, unigrams of the 'Token' feature 'string'.

We now wish to define features pertaining to the entire relation. We indicate that the 't12' feature of 'RE_INS' annotations is to be used (this feature contains type information derived from 'ACEEntity'). Again, rather than just specifying the 'RE_INS' annotation, we also indicate that the 'arg1' and 'arg2' feature values must match the argument identifiers of the instance, as defined in the INSTANCE-ARG1 and INSTANCE-ARG2 elements at the beginning. This ensures that we are taking our features from the correct annotation.

Finally, we define the class attribute. We indicate that the class attribute is contained in the 'Relation_type' feature of the 'ACERelation' annotation. The 'ACERelation' annotation type has features 'MENTION_ARG1' and 'MENTION_ARG2', indicating its arguments. Again, we use the elements ARG1 and ARG2 to indicate that it is these features that must be matched to the arguments of the instance if that instance is to be considered a positive example of the class.
<?xml version="1.0"?>
<ML-CONFIG>
  <ENGINE nickname="NB" implementationName="NaiveBayesWeka"/>
  <DATASET>
    <INSTANCE-TYPE>RE_INS</INSTANCE-TYPE>
    <INSTANCE-ARG1>arg1</INSTANCE-ARG1>
    <INSTANCE-ARG2>arg2</INSTANCE-ARG2>
    <FEATURES-ARG1>
      <ARG>
        <NAME>ARG1</NAME>
        <SEMTYPE>NOMINAL</SEMTYPE>
        <TYPE>ACEEntity</TYPE>
        <FEATURE>MENTION_ID</FEATURE>
      </ARG>
      <ATTRIBUTE>
        <NAME>Form</NAME>
        <SEMTYPE>NOMINAL</SEMTYPE>
        <TYPE>Token</TYPE>
        <FEATURE>string</FEATURE>
        <POSITION>0</POSITION>
      </ATTRIBUTE>
    </FEATURES-ARG1>
    <FEATURES-ARG2>
      <ARG>
        <NAME>ARG2</NAME>
        <SEMTYPE>NOMINAL</SEMTYPE>
        <TYPE>ACEEntity</TYPE>
        <FEATURE>MENTION_ID</FEATURE>
      </ARG>
      <ATTRIBUTE>
        <NAME>Form</NAME>
        <SEMTYPE>NOMINAL</SEMTYPE>
        <TYPE>Token</TYPE>
        <FEATURE>string</FEATURE>
        <POSITION>0</POSITION>
      </ATTRIBUTE>
    </FEATURES-ARG2>
    <ATTRIBUTE_REL>
      <NAME>EntityCom1</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>RE_INS</TYPE>
      <ARG1>arg1</ARG1>
      <ARG2>arg2</ARG2>
      <FEATURE>t12</FEATURE>
    </ATTRIBUTE_REL>
    <ATTRIBUTE_REL>
      <NAME>Class</NAME>
      <SEMTYPE>NOMINAL</SEMTYPE>
      <TYPE>ACERelation</TYPE>
      <ARG1>MENTION_ARG1</ARG1>
      <ARG2>MENTION_ARG2</ARG2>
      <FEATURE>Relation_type</FEATURE>
parameter. After that you can put the PR into a Corpus Pipeline application to use it. Add the corpus containing the training documents to the application too. Set the inputASName to the annotation set containing the annotations for linguistic features and class labels.

7. Set the run-time parameter learningMode to 'TRAINING' to learn a model from the training data, or set learningMode to 'EVALUATION' to do evaluation on the training data and get figures indicating the success of the learning. When using evaluation mode, make sure that the outputASName is the same as the inputASName. (Tip: it may save time if you first try evaluation mode on a small number of documents to make sure that the ML PR works well on your problem and outputs reasonable results before training on the large data.)

8. If you want to apply the learned model to new documents, load those new documents into GATE and pre-process them in the same way as the training documents, to ensure that the same features are present. (Class labels need not be present, of course.) Then set learningMode to 'APPLICATION' and run the PR on this corpus. The application results, namely the new annotations containing the class labels, will be added into the annotation set specified by the outputASName.

9. If you just want the feature files produced by the system and do not want to do any learning or application, select the learning mode 'ProduceFeatureFilesOnly'.
Training results
When the Batch Learning PR is used in training mode, its main output is the learned model, stored in a file named 'learnedModels.save'. For the SVM algorithm, the learned model file is a text file. For the learning algorithms implemented in Weka, the model file is a binary file. The output also includes the feature files described in Section 18.2.4.
Application Results
The main application result is the annotations added to the documents. Those annotations are the results of applying the ML model to the documents. In the configuration file, the annotation type and feature of the class labels are specified; class labels must be the value of a feature of an annotation type. In application mode, those annotation types are created in the new documents, and the feature specified will hold the class label. An additional feature will also be included on the specified annotation type; 'prob' will hold the confidence level for the annotation.
Evaluation Results
The Batch Learning PR outputs the evaluation results for each run and also the averaged results over all runs. For each run, it first prints a message about the names of the documents in the training and testing corpora respectively. Then it displays the evaluation results of this run; first the results for each class label and then the micro-averaged results over all labels. For each label, it presents the name of the label, the number of instances belonging to the label in the training data and results on the test data; the numbers of correct, partially correct, spurious and missing instances in the testing data, and the precision, recall and F1, calculated using correct only (strict) and correct plus partial (lenient). The F-measure results are obtained using the AnnotationDiff Tool which is described in Chapter 10. Finally, the system presents the means of the results of all runs for each label and the micro-averaged results.
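The relationship between the four counts and the strict and lenient figures can be sketched as follows. This is an illustration, not the AnnotationDiff code: the class and method names are hypothetical, and it assumes the usual convention that partially correct responses count as wrong under strict scoring and as right under lenient scoring, while staying in the denominator in both cases.

```java
final class StrictLenientScores {

    /** Precision over all responses (correct + partial + spurious). */
    static double precision(int correct, int partial, int spurious, boolean lenient) {
        int responses = correct + partial + spurious;
        if (responses == 0) {
            return 0.0;
        }
        double matched = lenient ? correct + partial : correct;
        return matched / responses;
    }

    /** Recall over all key annotations (correct + partial + missing). */
    static double recall(int correct, int partial, int missing, boolean lenient) {
        int keys = correct + partial + missing;
        if (keys == 0) {
            return 0.0;
        }
        double matched = lenient ? correct + partial : correct;
        return matched / keys;
    }

    /** Balanced F-measure: harmonic mean of precision and recall. */
    static double f1(double p, double r) {
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int correct = 6, partial = 2, spurious = 2, missing = 4; // made-up counts
        for (boolean lenient : new boolean[] {false, true}) {
            double p = precision(correct, partial, spurious, lenient);
            double r = recall(correct, partial, missing, lenient);
            System.out.println((lenient ? "lenient" : "strict") + " P=" + p + " R=" + r + " F1=" + f1(p, r));
        }
    }
}
```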
Feature Files
The Batch Learning PR is able to produce several feature files. These feature files could be used for evaluating learning algorithms not implemented in this plugin. We describe the formats of those feature files below. Note that all the data files described below can be obtained by setting the run time parameter learningMode to 'ProduceFeatureFilesOnly', but some may be produced as part of other learning modes.

The NLP feature file, named NLPFeatureData.save, contains the NLP features of the instances defined in the configuration file. Below is an example of the first few lines of an NLP feature file for information extraction:
Class(es) Form(-1) Form(0) Form(1) Ortho(-1) Ortho(0) Ortho(1)
0 ft-airlines-27-jul-2001.xml 512
1 Number_BB _NA[-1] _Form_Seven _Form_UK[1] _NA[-1] _Ortho_upperInitial _Ortho_allCaps[1]
1 Country_BB _Form_Seven[-1] _Form_UK _Form_airlines[1] _Ortho_upperInitial[-1] _Ortho_allCaps _Ortho_lowercase[1]
0 _Form_UK[-1] _Form_airlines _Form_including[1] _Ortho_allCaps[-1] _Ortho_lowercase _Ortho_lowercase[1]
0 _Form_airlines[-1] _Form_including _Form_British[1] _Ortho_lowercase[-1] _Ortho_lowercase _Ortho_upperInitial[1]
1 Airline_BB _Form_including[-1] _Form_British _Form_Airways[1] _Ortho_lowercase[-1] _Ortho_upperInitial _Ortho_upperInitial[1]
1 Airline _Form_British[-1] _Form_Airways _Form_[1], _Ortho_upperInitial[-1] _Ortho_upperInitial _NA[1]
0 _Form_Airways[-1] _Form_, _Form_Virgin[1] _Ortho_upperInitial[-1] _NA _Ortho_upperInitial[1]
The first line of the NLP feature file lists the names of all features used. These names are the names the user gave to their features in the configuration file. The number in the parenthesis following a feature name indicates the position of the feature. For example, 'Form(-1)' means the Form feature of the token which is immediately before the current token, and 'Form(0)' means the Form feature of the current token. The NLP features for all instances are listed for one document before moving on to the next. For each document, the first line shows the index of the document, the document's name and the number of instances in the document, as shown in the second line above. After that, each line corresponds to an instance in the document, in their order of appearance. The first item on the line is a number n, representing the number of class labels of the instance. Then, the following n items are the labels. If the current instance is the first instance of an entity, its corresponding label has a suffix '_BB'. The other items following the label item(s) are the NLP features of the instance, in the order listed in the first line of the file. Each NLP feature contains the feature's name and value, separated by '_'. At the end of one NLP feature, there may be an integer in square brackets, which represents the position of the feature relative to the current instance. If there is no square-bracketed integer at the end of one NLP feature, then the feature is at the position 0.

The Feature vector file has the file name 'featureVectorsData.save', and stores the feature vector in sparse format for each instance. The first few lines of the feature vector file corresponding to the NLP feature file shown above are as follows:
0 512 ft-airlines-27-jul-2001.xml
1 2 1 2 439:1.0 761:1.0 100300:1.0 100763:1.0
2 2 3 4 300:1.0 763:1.0 50439:1.0 50761:1.0 100440:1.0 100762:1.0
3 0 440:1.0 762:1.0 50300:1.0 50763:1.0 100441:1.0 100762:1.0
4 0 441:1.0 762:1.0 50440:1.0 50762:1.0 100020:1.0 100761:1.0
5 1 5 20:1.0 761:1.0 50441:1.0 50762:1.0 100442:1.0 100761:1.0
6 1 6 442:1.0 761:1.0 50020:1.0 50761:1.0 100066:1.0
7 0 66:1.0 50442:1.0 50761:1.0 100443:1.0 100761:1.0
The feature vectors are also listed for each document in sequence. For each document, the first line shows the index of the document, the number of instances in the document and the document's name. Each of the following lines is for one of the instances in the document. The first item in the line is the index of the instance in the document. The second item is a number n, representing the number of labels the instance has. The following n items are indices representing the class labels.

For text classification and relation learning, the label's index comes directly from the label list file, described below. For chunk learning, the label's index presented in the feature vector file is a bit more complicated. If an instance (e.g. token) is the first one of a chunk with label k, then the instance has as the label's index 2*k - 1, as shown in the fifth instance. If it is the last instance of the chunk, it has the label's index as 2*k, as shown in the sixth instance. If the instance is both the first one and the last one of the chunk (namely the chunk consists of one instance), it has two label indices, 2*k - 1 and 2*k, as shown in the first and second instances.

The items following the label(s) are the non-zero components of the feature vector. Each component is represented by two numbers separated by ':'. The first number is the dimension (position) of the component in the feature vector, and the second one is the value of the component.

The Label list file has the name 'LabelsList.save', and stores a list of labels and their indices. The following is a part of a label list. Each line shows one label name and its index in the label list.
Airline 3
Bank 13
CalendarMonth 11
CalendarYear 10
Company 6
Continent 8
Country 2
CountryCapital 15
Date 21
DayOfWeek 4
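The instance lines of the feature vector file and the chunk-learning label indices can be read back with a few lines of code. The sketch below is only an illustration of the format as described above; `FeatureVectorLine` and its methods are hypothetical names and not part of the Learning plugin.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FeatureVectorLine {
    final int instanceIndex;
    final List<Integer> labelIndices = new ArrayList<>();
    final Map<Integer, Double> components = new LinkedHashMap<>();

    private FeatureVectorLine(int instanceIndex) {
        this.instanceIndex = instanceIndex;
    }

    /** Parses one instance line: index, n, n label indices, then dimension:value pairs. */
    static FeatureVectorLine parse(String line) {
        String[] items = line.trim().split("\\s+");
        FeatureVectorLine v = new FeatureVectorLine(Integer.parseInt(items[0]));
        int n = Integer.parseInt(items[1]);
        for (int i = 0; i < n; i++) {
            v.labelIndices.add(Integer.parseInt(items[2 + i]));
        }
        for (int i = 2 + n; i < items.length; i++) {
            int colon = items[i].indexOf(':');
            v.components.put(Integer.parseInt(items[i].substring(0, colon)),
                             Double.parseDouble(items[i].substring(colon + 1)));
        }
        return v;
    }

    /** Chunk learning only: maps an index 2*k - 1 or 2*k back to the label index k. */
    static int labelOf(int chunkIndex) {
        return (chunkIndex + 1) / 2;
    }

    /** Chunk learning only: odd indices (2*k - 1) mark the first instance of a chunk. */
    static boolean isChunkStart(int chunkIndex) {
        return chunkIndex % 2 == 1;
    }

    public static void main(String[] args) {
        FeatureVectorLine v = parse("5 1 5 20:1.0 761:1.0 50441:1.0 50762:1.0 100442:1.0 100761:1.0");
        int idx = v.labelIndices.get(0);
        System.out.println("instance " + v.instanceIndex + ", label k=" + labelOf(idx)
            + ", start=" + isChunkStart(idx) + ", components=" + v.components.size());
    }
}
```

For the fifth instance in the example, index 5 maps back to k = 3, which is 'Airline' in the label list shown above.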
The NLP feature list has the name 'NLPFeaturesList.save', and contains a list of NLP features and their indices in the list. The following are the first few lines of an NLP feature list file.
totalNumDocs=14915
_EntityType_Date 13 1731
_EntityType_Location 170 1081
_EntityType_Money 523 3774
_EntityType_Organization 12 2387
_EntityType_Person 191 421
_EntityType_Unknown 76 218
_Form_' 112 775
_Form_\$ 527 74
_Form_' 508 37
_Form_'s 63 731
_Form_( 526 111
The first line of the file shows the number of instances from which the NLP features were collected. The number of instances will be used for computing the idf (inverse document frequency) in document or sentence classification. The following lines are for the NLP features. Each line is for one unique feature. The first item in the line represents the NLP feature, which is a combination of the feature's name defined in the configuration file and the value of the feature. The second item is a positive integer representing the index of the feature in the list. The last item is the number of times that the feature occurs, which is needed for computing the idf.

The N-grams (or language model) file has the name 'NgramList.save', and can only be produced by setting the learning mode to 'ProduceFeatureFilesOnly'. In order to produce n-gram data, the user may use a very simple configuration file, i.e. it need only contain the DATASET element, and the data element need contain only an NGRAM element to specify the type of n-gram and the INSTANCE-TYPE element to define the annotation type from which the n-gram data are created (e.g. sentence). The NGRAM element in the configuration file specifies what type of n-grams the PR produces (see Section 18.2.1 for the explanation of the n-gram definition). For example, if you specify a bigram based on the string form of 'Token', you will obtain a list of bigrams from the corpus you used. The following are the first lines of a bigram list based on the token annotation's 'string' feature, calculated over 3 documents.
## The following 2-gram were obtained from 3 documents or examples
Aug<>, 3
Female<>; 3
Human<>; 3
2004<>Aug 3
;<>Female 3
.<>The 3
of<>a 3
)<>: 3
,<>and 3
to<>be 3
;<>Human 3
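A list in this shape (two terms joined by '<>', followed by a count, most frequent first) could be produced from tokenised text roughly as follows. This is a sketch under assumptions rather than the PR's own code: the class `BigramList` is hypothetical, bigrams are counted within each token sequence only, and ties are broken alphabetically, which the source does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BigramList {

    /** Counts adjacent token pairs and returns them ordered by descending frequency. */
    static List<Map.Entry<String, Integer>> bigrams(List<List<String>> sequences) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> tokens : sequences) {
            for (int i = 0; i + 1 < tokens.size(); i++) {
                counts.merge(tokens.get(i) + "<>" + tokens.get(i + 1), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            // Tie-break alphabetically so that the output is deterministic (an assumption).
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<List<String>> sequences = List.of(List.of("to", "be", "or", "not", "to", "be"));
        for (Map.Entry<String, Integer> e : bigrams(sequences)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```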
The two terms of the bigram are separated by '<>'. The number following one n-gram is the number of occurrences of that n-gram in the corpus. The n-gram list is ordered according to the number of occurrences of the n-gram terms. The most frequent terms in the corpus are therefore at the start of the list.

The n-gram data produced can be based on any features of annotations available in the documents. Hence it can not only produce the conventional n-gram data based on the token's form or lemma, but also n-grams based on e.g. the token's POS, or a combination of the token's POS and form, or any feature of the 'sentence' annotation (see Section 18.2.1 for how to define different types of n-gram).

The Document-term matrix file has the name 'documentByTermMatrix.save', and can only be produced by setting the learning mode to 'ProduceFeatureFilesOnly'. The document-term matrix presents the weights of terms appearing in each document (see Section 21.16 for more explanation). Currently three types of weight are implemented; binary, term frequency (tf) and tf-idf. The binary weight is simply 1 if the term appears in a document and 0 if it does not. tf (term frequency) refers to the number of occurrences of one term in a document. tf-idf is popular in information retrieval and text mining. It is a multiplication of term frequency and inverse document frequency. Inverse document frequency is calculated as follows:
\[ \mathrm{idf}_i = \log \frac{|D|}{|\{ d_j : t_i \in d_j \}|} \]
where $|D|$ is the total number of documents in the corpus, and $|\{ d_j : t_i \in d_j \}|$ is the number of documents in which the term $t_i$ appears. The type of weight is specified by the sub-element ValueTypeNgram in the DATASET element in the configuration file (see Section 18.2.1).

Like the n-gram data, in order to produce the document-term matrix, the user may use a very simple configuration file, i.e. it need only contain the DATASET element, and the data element need only contain two elements; the INSTANCE-TYPE element, to define the annotation type from which the terms are counted, and an NGRAM element to specify the type of n-gram. As mentioned previously, the element ValueTypeNgram specifies the type of value used in the matrix. If it is not present, the default type tf-idf will be used. The conventional document-term matrix can be produced using a unigram based on the token's form or lemma and the instance type covering the whole document. In other words, INSTANCE-TYPE is set to an annotation type such as for example 'body', which covers the entire document, and the n-gram definition then specifies the 'string' feature of the 'Token' annotation type.

The following was extracted from the beginning of a document-term matrix file, produced using unigrams of the token's form. It presents a part of the matrix of terms and their term frequency values in the document named '27.xml'. Each term and its term frequency are separated by ':'. The terms are in alphabetic order.
0 Documentname="27.xml", has 1 parts: ":2 (:6 ):6 ,:14 -:1 .:16 /:1
124:1 2004:1 22:1 29:1 330:1 54:1 8:2 ::5 ;:11 Abstract:1 Adaptation:1 Adult:1 Atopic:2 Attachment:3 Aug:1 Bindungssicherheit:1 Cross-:1 Dermatitis:2 English:1 F-SOZU:1 Female:1 Human:1 In:1 Index:1 Insecure:1 Interpersonal:1 Irrespective:1 It:1 K-:1 Lebensqualitat:1 Life:1 Male:1 NSI:2 Neurodermitis:2 OT:1 Original:1 Patients:1 Psychological:1 Psychologie:1 Psychosomatik:1 Psychotherapie:1 Quality:1 Questionnaire:1 RSQ:1 Relations:1 Relationship:1 SCORAD:1 Scales:1 Sectional:1 Securely:1 Severity:2 Skindex-:1 Social:1 Studies:1 Suffering:1 Support:1 The:1 Title:1 We:3 [:1 ]:1 a:4 absence:1 affection:1 along:2 amount:1 an:1 and:9 as:1 assessed:1 association:2 atopic:5 attached:7
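The three weight types and the 'term:frequency' entries shown above can be illustrated with a short sketch. It is not the plugin's code: `TermWeights`, its `ValueType` enum and the method names are hypothetical, and the natural logarithm is assumed because the source does not state the base of the log.

```java
import java.util.Map;

final class TermWeights {

    /** Hypothetical names for the three weight types described in the text. */
    enum ValueType { BINARY, TF, TF_IDF }

    /** idf = log(|D| / number of documents containing the term); natural log assumed. */
    static double idf(int totalDocs, int docsWithTerm) {
        return Math.log((double) totalDocs / docsWithTerm);
    }

    /** Weight of one term in one document, given its term frequency in that document. */
    static double weight(ValueType type, int tf, int totalDocs, int docsWithTerm) {
        switch (type) {
            case BINARY:
                return tf > 0 ? 1.0 : 0.0;
            case TF:
                return tf;
            default:
                return tf * idf(totalDocs, docsWithTerm);
        }
    }

    /** Splits an entry such as "attached:7"; the last ':' is used because a term may itself be ':'. */
    static Map.Entry<String, Integer> parseEntry(String entry) {
        int colon = entry.lastIndexOf(':');
        return Map.entry(entry.substring(0, colon), Integer.parseInt(entry.substring(colon + 1)));
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> e = parseEntry("attached:7");
        // The document counts (100 documents, 10 containing the term) are made up for the example.
        System.out.println(e.getKey() + " tf-idf=" + weight(ValueType.TF_IDF, e.getValue(), 100, 10));
    }
}
```

Splitting on the last colon matters for entries such as '::5' in the listing above, where the term is the colon character itself.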
A list of names of documents processed can also be obtained. The file has the name 'docsName.save', and can only be produced by setting the learning mode to 'ProduceFeatureFilesOnly'. It contains the names of all the documents processed. The first line shows the number of documents in the list. Then, each line lists one document's name. The first lines of an example file are shown below:
##totalDocs=3
ft-bank-of-england-02-aug-2001.xml
ft-airtours-08-aug-2001.xml
ft-airlines-27-jul-2001.xml
A list of names of the selected documents for active learning purposes can also be produced. The file has the name 'ALSelectedDocs.save'. It is a text file. It is produced in 'ProduceFeatureFilesOnly' mode. The file contains the names of documents which have been selected for annotating and training in the active learning process. It is used by the 'RankingDocsForAL' learning mode to exclude those selected documents from the ranked documents for active learning purposes. When one or more documents are selected for annotating and training, their names should be put into this file, one line per document.

A list of names of ranked documents for active learning purposes can also be produced; the file has the name 'ALRankedDocs.save', and is produced in 'RankingDocsForAL' mode. The file contains the list of names of the documents ranked for active learning, according to their usefulness for learning. Those at the front of the list are the most useful documents for learning. The first line in the file shows the total number of documents in the list. Each of the other lines in the file lists one document and the averaged confidence score for classifying the document. An example of the file is shown below:
##numDocsRanked=3
ft-airlines-27-jul-2001.xml_000201 8.61744
ft-bank-of-england-02-aug-2001.xml_000221 8.672693
ft-airtours-08-aug-2001.xml_000211 9.82562
// Assumes java.io.File, gate.Factory, gate.FeatureMap and gate.learning.RunMode
// are imported, and that 'corpus' is an already populated GATE corpus.
File configFile = new File("/home/you/ml_config.xml"); // Wherever it is
RunMode mode = RunMode.EVALUATION; // or TRAINING, or APPLICATION ..

// Make a pipeline and add the corpus
FeatureMap pfm = Factory.newFeatureMap();
pfm.put("corpus", corpus);
gate.creole.SerialAnalyserController pipeline =
    (gate.creole.SerialAnalyserController) gate.Factory.createResource(
        "gate.creole.SerialAnalyserController", pfm);

// Create the learning PR from the configuration file and mode, then add it to the pipeline
FeatureMap fm = Factory.newFeatureMap();
fm.put("configFileURL", configFile.toURI().toURL());
fm.put("learningMode", mode);
gate.learning.LearningAPIMain learner =
    (gate.learning.LearningAPIMain) gate.Factory.createResource(
        "gate.learning.LearningAPIMain", fm);
pipeline.add(learner);

// Run the pipeline over the corpus
pipeline.execute();
Having run the PR in EVALUATION mode, you can access the results programmatically:
EvaluationBasedOnDocs ev = learner.getEvaluation();
System.out.println(
    ev.macroMeasuresOfResults.precision + "," +
    ev.macroMeasuresOfResults.recall + "," +
    ev.macroMeasuresOfResults.f1 + "," +
    ev.macroMeasuresOfResults.precisionLenient + "," +
    ev.macroMeasuresOfResults.recallLenient + "," +
    ev.macroMeasuresOfResults.f1Lenient + "\n");
18.3 Machine Learning PR
The 'Machine Learning PR' is GATE's earlier machine learning PR. It handles both the training and application of an ML model on GATE documents. This PR is a Language Analyser so it can be used in all default types of GATE controllers. It can be found in the 'Machine_Learning' plugin.

In order to allow for more flexibility, all the configuration parameters for the Machine Learning PR are set through an external XML file and not through the normal PR parameterisation. The root element of the file needs to be called 'ML-CONFIG' and it contains two elements: 'DATASET' and 'ENGINE'. An example XML configuration file is given in Section 18.3.6.
Semantically, there are three types of attributes:

- nominal: both type and features are defined and a list of allowed values is provided;
- numeric: both type and features are defined but no list of allowed values is provided; it is assumed that the feature can be converted to a number (a double value);
- boolean: no feature or list of values is provided; the attribute will take one of the 'true' or 'false' values based on the presence (or absence) of the specified annotation type at the required position.

Figure 18.1 gives some examples of what the values of specified attributes would be in a situation when 'Token' annotations are used as instances.
Figure 18.1: Sample attributes and their values

An ATTRIBUTELIST element is similar to ATTRIBUTE except that it has no POSITION sub-element but a RANGE element. This will be converted into several ATTRIBUTE elements with position ranging from the value of the attribute 'from' to the value of the attribute 'to'. This can be used in order to avoid the duplication of ATTRIBUTE elements.
data sets separately from the model, but if there should be a need to do this, the WEKA wrapper can be used to collect the data. Training a MAXENT model follows the same general procedure as for WEKA models, but the following difference should be noted. MAXENT models are not updateable, so the model will always be created and trained the first time a classification is attempted. The training of the model might take a considerable amount of time, depending on the amount of training data and the parameters of the model.
programs in the right sequence, passing the data back and forth in temporary files. The <WRAPPER> value for this engine is gate.creole.ml.svmlight.SVMLightWrapper. The SVM Light binaries themselves are not distributed with GATE; you should download the version for your platform from http://svmlight.joachims.org and place svm_learn and svm_classify on your path.

Classifying documents using the SVMLightWrapper is a two phase procedure. In its first phase, the wrapper collects data from the pre-annotated documents and builds the SVM model using the collected data to classify the unseen documents in its second phase. Below we describe briefly an example of classifying the start time of the seminar in a corpus of email announcing seminars and provide more details later in the section.

Figure 18.2 explains step by step the process of collecting training data for the SVM classifier. GATE documents, which are pre-annotated with the annotations of type Class and feature type='stime', are used as the training data. In order to build the SVM model, we require start and end annotations for each stime annotation. We use a pre-processor JAPE transduction script to mark the sTimeStart and sTimeEnd annotations on stime annotations. Following this step, the Machine Learning PR (SVMLightWrapper) with training mode set to true collects the training data from all training documents. A GATE corpus pipeline, given a set of documents and PRs to execute on them, executes all PRs one by one, only on one document at a time. Unless provided in a separate pipeline, it makes it impossible to send all training data (i.e. collected from all documents) altogether to the wrapper using the same pipeline to build the SVM model. This results in the model not being built at the time of collecting training data. The state of the wrapper can be saved to an external file once the training data is collected.
Figure 18.2: Flow diagram explaining the SVM training data collection

Before classifying any unseen document, SVM requires the SVM model to be available. In the absence of an up-to-date SVM model, the wrapper builds a new one using the command line SVM_learn utility and the training data collected from the training corpus. In other words, the first SVM model is built when a user tries to classify the first document. At this point the user has an option to save the model somewhere. This is to enable reloading of the model prior to classifying other documents and to avoid rebuilding the SVM model every time the user classifies a new set of documents. Once the model becomes available, the wrapper classifies the unseen documents, which creates new sTimeStart and sTimeEnd annotations over the text. Finally, a post-processor JAPE transduction script is used to combine them into the sTime annotation. Figure 18.3 explains this process.

The wrapper allows support vector machines to be created which do either boolean classification or regression (estimation of numeric parameters), and so the class attribute can be boolean or numeric. Additionally, when learning a classifier, SVM Light supports transduction, whereby additional examples can be presented during training which do not have the value of the class attribute marked. Presenting such examples can, in some circumstances, greatly improve the performance of the classifier. To make use of this, the class attribute can be a three value nominal, in which case the first value specified for that nominal in the configuration file will be interpreted as true, the second as false and the third as unknown. Transduction will be used with any instances for which this attribute is set to the unknown value. It is also possible to use a two value nominal as the class attribute, in which case it will simply be interpreted as true or false.

The other attributes can be boolean, numeric or nominal, or any combination of these. If an attribute is nominal, each value of that attribute maps to a separate SVM Light feature. Each of these SVM Light features will be given the value 1 when the nominal attribute has the corresponding value, and will be omitted otherwise. If the value of the nominal is not specified in the configuration file or there is no value for an instance, then no feature will be added.

An extension to the basic functionality of SVM Light is that each attribute can receive a weighting. These weightings can be specified in the configuration file by adding <WEIGHTING> tags to the parts of the XML file specifying each attribute. The weighting for the attribute must be specified as a numeric value, and be placed between an opening <WEIGHTING> tag and a closing </WEIGHTING> one. Giving an attribute a greater weighting will cause it to play a greater role in learning the model and classifying data. This is achieved by multiplying the value of the attribute by the weighting before creating the training or test data that is passed to SVM Light. Any attribute left without an explicitly specified weighting is given
a default weighting of one. Support for these weightings is contained in the Machine Learning PR itself, and so is available to other wrappers, though at time of writing only the SVM Light wrapper makes use of weightings.

As with the MAXENT wrapper, SVM Light models are not updateable, so the model will be trained at the first classification attempt. The SVM Light wrapper supports <BATCH-MODE-CLASSIFICATION />, which should be used unless you have a very good reason not to.

The SVM Light wrapper allows both data sets and models to be loaded and saved to files in the same formats as those used by SVM Light when it is run from the command line. When a model is saved, a file will be created which contains information about the state of the SVM Light Wrapper, and which is needed to restore it when the model is loaded again. This file does not, however, contain any information about the SVM Light model itself. If an SVM Light model exists at the time of saving, and that model is up to date with respect to the current state of the training data, then it will be saved as a separate file, with the same name as the file containing information about the state of the wrapper, but with .NativePart appended to the filename. These files are in the standard SVM Light model format, and can be used with SVM Light when it is run from the command line. When a model is reloaded by GATE, both of these files must be available, and in the same directory, otherwise an error will result. However, if an up to date trained model does not exist at the time the model is saved, then only one file will be created upon saving, and only that file is required when the model is reloaded. So long as at least one training instance exists, it is possible to bring the model up to date at any point simply by classifying one or more instances (i.e. running the model with the training parameter set to false).
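The way nominal values and weightings turn into SVM Light input can be sketched as follows. This is an illustration only, not the wrapper's code: the class `SvmLightLineSketch`, its methods and the feature numbering scheme are assumptions. What it relies on from SVM Light itself is the example-file format, in which each line holds a target followed by `feature:value` pairs in increasing feature order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SvmLightLineSketch {

    /** Numeric attribute: the value is multiplied by the attribute's weighting. */
    static void addNumeric(Map<Integer, Double> features, int index, double value, double weighting) {
        features.put(index, value * weighting);
    }

    /**
     * Nominal attribute: each allowed value owns one feature (baseIndex + its position in the list).
     * Only the feature for the actual value is emitted, with value 1 times the weighting;
     * an unlisted or missing value adds nothing.
     */
    static void addNominal(Map<Integer, Double> features, int baseIndex,
                           List<String> allowedValues, String value, double weighting) {
        int pos = (value == null) ? -1 : allowedValues.indexOf(value);
        if (pos >= 0) {
            features.put(baseIndex + pos, 1.0 * weighting);
        }
    }

    /** Renders one example line with features sorted by increasing feature number. */
    static String toLine(boolean positive, Map<Integer, Double> features) {
        StringBuilder sb = new StringBuilder(positive ? "+1" : "-1");
        for (Map.Entry<Integer, Double> e : new TreeMap<>(features).entrySet()) {
            sb.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, Double> features = new HashMap<>();
        // Hypothetical attributes: a nominal majorType with weighting 2, and a numeric one with weighting 1.
        addNominal(features, 1, List.of("date", "time", "year"), "time", 2.0);
        addNumeric(features, 4, 0.5, 1.0);
        System.out.println(toLine(true, features));
    }
}
```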
      <!-- The type of annotation used as attribute -->
      <TYPE>Lookup</TYPE>
      <!-- The position relative to the instance annotation -->
      <POSITION>0</POSITION>
    </ATTRIBUTE>
    <ATTRIBUTE>
      <!-- The name given to the attribute -->
      <NAME>Lookup_MT(-1)</NAME>
      <!-- The type of annotation used as attribute -->
      <TYPE>Lookup</TYPE>
      <!-- Optional: the feature name for the feature used to extract values for the attribute -->
      <FEATURE>majorType</FEATURE>
      <!-- The position relative to the instance annotation -->
      <POSITION>-1</POSITION>
      <!-- The list of permitted values. if present, marks a nominal attribute;
           if absent, the attribute is numeric (double) -->
      <VALUES>
        <!-- One permitted value -->
        <VALUE>address</VALUE> <VALUE>cdg</VALUE> <VALUE>country_adj</VALUE> <VALUE>currency_unit</VALUE>
        <VALUE>date</VALUE> <VALUE>date_key</VALUE> <VALUE>date_unit</VALUE> <VALUE>facility</VALUE>
        <VALUE>facility_key</VALUE> <VALUE>facility_key_ext</VALUE> <VALUE>govern_key</VALUE> <VALUE>greeting</VALUE>
        <VALUE>ident_key</VALUE> <VALUE>jobtitle</VALUE> <VALUE>loc_general_key</VALUE> <VALUE>loc_key</VALUE>
        <VALUE>location</VALUE> <VALUE>number</VALUE> <VALUE>org_base</VALUE> <VALUE>org_ending</VALUE>
        <VALUE>org_key</VALUE> <VALUE>org_pre</VALUE> <VALUE>organization</VALUE> <VALUE>organization_noun</VALUE>
        <VALUE>percent</VALUE> <VALUE>person_ending</VALUE> <VALUE>person_first</VALUE> <VALUE>person_full</VALUE>
        <VALUE>phone_prefix</VALUE> <VALUE>sport</VALUE> <VALUE>spur</VALUE> <VALUE>spur_ident</VALUE>
        <VALUE>stop</VALUE> <VALUE>surname</VALUE> <VALUE>time</VALUE> <VALUE>time_modifier</VALUE>
        <VALUE>time_unit</VALUE> <VALUE>title</VALUE> <VALUE>year</VALUE>
      </VALUES>
      <!-- Optional: if present marks the attribute used as CLASS
           Only one attribute can be marked as class -->
    </ATTRIBUTE>
    <ATTRIBUTE>
      <!-- The name given to the attribute -->
      <NAME>Lookup_MT(0)</NAME>
      <!-- The type of annotation used as attribute -->
      <TYPE>Lookup</TYPE>
      <!-- Optional: the feature name for the feature used to extract values for the attribute -->
      <FEATURE>majorType</FEATURE>
      <!-- The position relative to the instance annotation -->
      <POSITION>0</POSITION>
      <!-- The list of permitted values. if present, marks a nominal attribute;
           if absent, the attribute is numeric (double) -->
      <VALUES>
        <!-- One permitted value -->
        <VALUE>address</VALUE> <VALUE>cdg</VALUE> <VALUE>country_adj</VALUE> <VALUE>currency_unit</VALUE>
        <VALUE>date</VALUE> <VALUE>date_key</VALUE> <VALUE>date_unit</VALUE> <VALUE>facility</VALUE>
        <VALUE>facility_key</VALUE> <VALUE>facility_key_ext</VALUE> <VALUE>govern_key</VALUE> <VALUE>greeting</VALUE>
        <VALUE>ident_key</VALUE> <VALUE>jobtitle</VALUE> <VALUE>loc_general_key</VALUE> <VALUE>loc_key</VALUE>
        <VALUE>location</VALUE> <VALUE>number</VALUE> <VALUE>org_base</VALUE> <VALUE>org_ending</VALUE>
        <VALUE>org_key</VALUE> <VALUE>org_pre</VALUE> <VALUE>organization</VALUE> <VALUE>organization_noun</VALUE>
        <VALUE>percent</VALUE> <VALUE>person_ending</VALUE> <VALUE>person_first</VALUE> <VALUE>person_full</VALUE>
        <VALUE>phone_prefix</VALUE> <VALUE>sport</VALUE> <VALUE>spur</VALUE> <VALUE>spur_ident</VALUE>
        <VALUE>stop</VALUE> <VALUE>surname</VALUE> <VALUE>time</VALUE> <VALUE>time_modifier</VALUE>
        <VALUE>time_unit</VALUE> <VALUE>title</VALUE> <VALUE>year</VALUE>
      </VALUES>
      <!-- Optional: if present marks the attribute used as CLASS
           Only one attribute can be marked as class -->
    </ATTRIBUTE>
    <ATTRIBUTE>
      <!-- The name given to the attribute -->
      <NAME>Lookup_MT(1)</NAME>
      <!-- The type of annotation used as attribute -->
      <TYPE>Lookup</TYPE>
      <!-- Optional: the feature name for the feature used to extract values for the attribute -->
      <FEATURE>majorType</FEATURE>
      <!-- The position relative to the instance annotation -->
      <POSITION>1</POSITION>
      <!-- The list of permitted values. if present, marks a nominal attribute;
           if absent, the attribute is numeric (double) -->
      <VALUES>
        <!-- One permitted value -->
        <VALUE>address</VALUE> <VALUE>cdg</VALUE> <VALUE>country_adj</VALUE> <VALUE>currency_unit</VALUE>
        <VALUE>date</VALUE> <VALUE>date_key</VALUE> <VALUE>date_unit</VALUE> <VALUE>facility</VALUE>
        <VALUE>facility_key</VALUE> <VALUE>facility_key_ext</VALUE> <VALUE>govern_key</VALUE> <VALUE>greeting</VALUE>
        <VALUE>ident_key</VALUE> <VALUE>jobtitle</VALUE> <VALUE>loc_general_key</VALUE> <VALUE>loc_key</VALUE>
        <VALUE>location</VALUE> <VALUE>number</VALUE> <VALUE>org_base</VALUE> <VALUE>org_ending</VALUE>
        <VALUE>org_key</VALUE> <VALUE>org_pre</VALUE> <VALUE>organization</VALUE> <VALUE>organization_noun</VALUE>
        <VALUE>percent</VALUE> <VALUE>person_ending</VALUE> <VALUE>person_first</VALUE> <VALUE>person_full</VALUE>
        <VALUE>phone_prefix</VALUE> <VALUE>sport</VALUE> <VALUE>spur</VALUE> <VALUE>spur_ident</VALUE>
        <VALUE>stop</VALUE> <VALUE>surname</VALUE> <VALUE>time</VALUE> <VALUE>time_modifier</VALUE>
        <VALUE>time_unit</VALUE> <VALUE>title</VALUE> <VALUE>year</VALUE>
      </VALUES>
      <!-- Optional: if present marks the attribute used as CLASS
           Only one attribute can be marked as class -->
    </ATTRIBUTE>
    <ATTRIBUTE>
      <!-- The name given to the attribute -->
      <NAME>POS_category(-1)</NAME>
      <!-- The type of annotation used as attribute -->
      <TYPE>Token</TYPE>
      <!-- Optional: the feature name for the feature used to extract values for the attribute -->
      <FEATURE>category</FEATURE>
      <!-- The position relative to the instance annotation -->
      <POSITION>-1</POSITION>
      <!-- The list of permitted values.
           if present, marks a nominal attribute;
           if absent, the attribute is numeric (double) -->
      <VALUES>
        <!-- One permitted value -->
        <VALUE>NN</VALUE> <VALUE>NNP</VALUE> <VALUE>NNPS</VALUE> <VALUE>NNS</VALUE> <VALUE>NP</VALUE> <VALUE>NPS</VALUE>
        <VALUE>JJ</VALUE> <VALUE>JJR</VALUE> <VALUE>JJS</VALUE> <VALUE>JJSS</VALUE> <VALUE>RB</VALUE> <VALUE>RBR</VALUE>
        <VALUE>RBS</VALUE> <VALUE>VB</VALUE> <VALUE>VBD</VALUE> <VALUE>VBG</VALUE> <VALUE>VBN</VALUE> <VALUE>VBP</VALUE>
        <VALUE>VBZ</VALUE> <VALUE>FW</VALUE> <VALUE>CD</VALUE> <VALUE>CC</VALUE> <VALUE>DT</VALUE> <VALUE>EX</VALUE>
        <VALUE>IN</VALUE> <VALUE>LS</VALUE> <VALUE>MD</VALUE> <VALUE>PDT</VALUE> <VALUE>POS</VALUE> <VALUE>PP</VALUE>
        <VALUE>PRP</VALUE> <VALUE>PRP$</VALUE> <VALUE>PRPR$</VALUE> <VALUE>RP</VALUE> <VALUE>TO</VALUE> <VALUE>UH</VALUE>
        <VALUE>WDT</VALUE> <VALUE>WP</VALUE> <VALUE>WP$</VALUE> <VALUE>WRB</VALUE> <VALUE>SYM</VALUE> <VALUE>\"</VALUE>
        <VALUE>#</VALUE> <VALUE>$</VALUE> <VALUE>'</VALUE> <VALUE>(</VALUE> <VALUE>)</VALUE> <VALUE>,</VALUE>
        <VALUE>--</VALUE> <VALUE>-LRB-</VALUE> <VALUE>.</VALUE> <VALUE>''</VALUE> <VALUE>:</VALUE> <VALUE>::</VALUE>
        <VALUE>`</VALUE>
      </VALUES>
      <!-- Optional: if present marks the attribute used as CLASS
           Only one attribute can be marked as class -->
    </ATTRIBUTE>
    <ATTRIBUTE>
      <!-- The name given to the attribute -->
      <NAME>POS_category(0)</NAME>
      <!-- The type of annotation used as attribute -->
      <TYPE>Token</TYPE>
      <!-- Optional: the feature name for the feature used to extract values for the attribute -->
      <FEATURE>category</FEATURE>
      <!-- The position relative to the instance annotation -->
      <POSITION>0</POSITION>
      <!-- The list of permitted values. if present, marks a nominal attribute;
           if absent, the attribute is numeric (double) -->
      <VALUES>
        <!-- One permitted value -->
        <VALUE>NN</VALUE> <VALUE>NNP</VALUE> <VALUE>NNPS</VALUE> <VALUE>NNS</VALUE> <VALUE>NP</VALUE> <VALUE>NPS</VALUE>
        <VALUE>JJ</VALUE> <VALUE>JJR</VALUE> <VALUE>JJS</VALUE> <VALUE>JJSS</VALUE> <VALUE>RB</VALUE> <VALUE>RBR</VALUE>
        <VALUE>RBS</VALUE> <VALUE>VB</VALUE> <VALUE>VBD</VALUE> <VALUE>VBG</VALUE> <VALUE>VBN</VALUE> <VALUE>VBP</VALUE>
        <VALUE>VBZ</VALUE> <VALUE>FW</VALUE> <VALUE>CD</VALUE> <VALUE>CC</VALUE> <VALUE>DT</VALUE> <VALUE>EX</VALUE>
        <VALUE>IN</VALUE> <VALUE>LS</VALUE> <VALUE>MD</VALUE> <VALUE>PDT</VALUE> <VALUE>POS</VALUE> <VALUE>PP</VALUE>
        <VALUE>PRP</VALUE> <VALUE>PRP$</VALUE> <VALUE>PRPR$</VALUE> <VALUE>RP</VALUE> <VALUE>TO</VALUE> <VALUE>UH</VALUE>
        <VALUE>WDT</VALUE> <VALUE>WP</VALUE> <VALUE>WP$</VALUE> <VALUE>WRB</VALUE> <VALUE>SYM</VALUE> <VALUE>\"</VALUE>
        <VALUE>#</VALUE> <VALUE>$</VALUE> <VALUE>'</VALUE> <VALUE>(</VALUE> <VALUE>)</VALUE> <VALUE>,</VALUE>
        <VALUE>--</VALUE> <VALUE>-LRB-</VALUE> <VALUE>.</VALUE> <VALUE>''</VALUE> <VALUE>:</VALUE>
        <VALUE>::</VALUE> <VALUE>`</VALUE>
      </VALUES>
      <!-- Optional: if present marks the attribute used as CLASS
           Only one attribute can be marked as class -->
    </ATTRIBUTE>
    <ATTRIBUTE>
      <!-- The name given to the attribute -->
      <NAME>POS_category(1)</NAME>
      <!-- The type of annotation used as attribute -->
      <TYPE>Token</TYPE>
      <!-- Optional: the feature name for the feature used to extract values for the attribute -->
      <FEATURE>category</FEATURE>
      <!-- The position relative to the instance annotation -->
      <POSITION>1</POSITION>
      <!-- The list of permitted values. if present, marks a nominal attribute;
           if absent, the attribute is numeric (double) -->
      <VALUES>
        <!-- One permitted value -->
        <VALUE>NN</VALUE> <VALUE>NNP</VALUE> <VALUE>NNPS</VALUE> <VALUE>NNS</VALUE> <VALUE>NP</VALUE> <VALUE>NPS</VALUE>
        <VALUE>JJ</VALUE> <VALUE>JJR</VALUE> <VALUE>JJS</VALUE> <VALUE>JJSS</VALUE> <VALUE>RB</VALUE> <VALUE>RBR</VALUE>
        <VALUE>RBS</VALUE> <VALUE>VB</VALUE> <VALUE>VBD</VALUE> <VALUE>VBG</VALUE> <VALUE>VBN</VALUE> <VALUE>VBP</VALUE>
        <VALUE>VBZ</VALUE> <VALUE>FW</VALUE> <VALUE>CD</VALUE> <VALUE>CC</VALUE> <VALUE>DT</VALUE> <VALUE>EX</VALUE>
        <VALUE>IN</VALUE> <VALUE>LS</VALUE> <VALUE>MD</VALUE> <VALUE>PDT</VALUE> <VALUE>POS</VALUE> <VALUE>PP</VALUE>
        <VALUE>PRP</VALUE> <VALUE>PRP$</VALUE> <VALUE>PRPR$</VALUE> <VALUE>RP</VALUE> <VALUE>TO</VALUE> <VALUE>UH</VALUE>
        <VALUE>WDT</VALUE> <VALUE>WP</VALUE> <VALUE>WP$</VALUE> <VALUE>WRB</VALUE> <VALUE>SYM</VALUE> <VALUE>\"</VALUE>
        <VALUE>#</VALUE>
        <VALUE>$</VALUE> <VALUE>'</VALUE> <VALUE>(</VALUE> <VALUE>)</VALUE> <VALUE>,</VALUE> <VALUE>--</VALUE>
        <VALUE>-LRB-</VALUE> <VALUE>.</VALUE> <VALUE>''</VALUE> <VALUE>:</VALUE> <VALUE>::</VALUE> <VALUE>`</VALUE>
      </VALUES>
      <!-- Optional: if present marks the attribute used as CLASS
           Only one attribute can be marked as class -->
    </ATTRIBUTE>
    <ATTRIBUTE>
      <!-- The name given to the attribute -->
      <NAME>Entity(0)</NAME>
      <!-- The type of annotation used as attribute -->
      <TYPE>Entity</TYPE>
      <!-- The position relative to the instance annotation -->
      <POSITION>0</POSITION>
      <CLASS/>
      <!-- Optional: if present marks the attribute used as CLASS
           Only one attribute can be marked as class -->
    </ATTRIBUTE>
  </DATASET>
  <ENGINE>
    <WRAPPER>gate.creole.ml.weka.Wrapper</WRAPPER>
    <OPTIONS>
      <CLASSIFIER OPTIONS="-S -C 0.25 -B -M 2">weka.classifiers.trees.J48</CLASSIFIER>
      <CONFIDENCE-THRESHOLD>0.85</CONFIDENCE-THRESHOLD>
    </OPTIONS>
  </ENGINE>
</ML-CONFIG>
This chapter introduces a new plugin called 'Alignment' that comprises tools to perform text alignment at various levels (e.g. word, phrase, sentence etc.). It allows users to integrate other tools that can be useful for speeding up the alignment process.
Text alignment can be achieved at the document, section, paragraph, sentence and word level. Given two parallel corpora, where the first corpus contains documents in a source language and the other in a target language, the first task is to find the parallel documents and align them at the document level. For these tasks one would need to refer to more than one document at the same time. Hence, a need arises for Processing Resources (PRs) which can accept more than one document as parameters. For example, given two documents, a source and a target, a Sentence Alignment PR would need to refer to both of them to identify which sentence of the source document aligns with which sentence of the target document. However, a problem occurs when such a PR is part of a corpus pipeline. In a corpus pipeline, only one document from the selected corpus at a time is set on the member PRs. Once the PRs have completed their execution, the next document in the corpus is taken and set on the member PRs. Thus it is not possible to use a corpus pipeline and at the same time supply more than one document to the underlying PRs.
19.2 The Tools
We have introduced a few new resources in GATE that allow processing of parallel data. These include resources such as CompoundDocument, CompositeDocument, and a new AlignmentEditor, to name a few. Below we describe these components. Please note that all these resources are distributed as part of the 'Alignment' plugin and therefore users should load the plugin first in order to use these resources.
For example, if a user provides three document IDs (e.g. 'en', 'hi' and 'gu') and selects a file with the name 'File.en.xml', the CompoundDocument will search for the rest of the documents (i.e. 'File.hi.xml' and 'File.gu.xml'). The file name (i.e. 'File') and the extension (i.e. 'xml') remain common for all three members of the compound document.
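The naming convention in this example can be sketched in a few lines of Java. This is not the plugin's implementation: the class MemberFileNames and its resolve method are hypothetical names, and the sketch assumes the document ID always sits between the file name and the extension, as in 'File.en.xml'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates the naming convention only.
final class MemberFileNames {
    // Given "File.en.xml", the selected ID "en" and the full list of IDs,
    // returns the file name expected for each ID, in the order given.
    static List<String> resolve(String selectedName, String selectedId, List<String> ids) {
        String marker = "." + selectedId + ".";
        int at = selectedName.lastIndexOf(marker);
        if (at < 0) {
            throw new IllegalArgumentException(
                selectedName + " does not contain document ID " + selectedId);
        }
        String base = selectedName.substring(0, at);               // e.g. "File"
        String ext = selectedName.substring(at + marker.length()); // e.g. "xml"
        List<String> names = new ArrayList<>();
        for (String id : ids) {
            names.add(base + "." + id + "." + ext);
        }
        return names;
    }
}
```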
Figure 19.1 shows a snapshot for instantiating a compound document from GATE Developer. A compound document provides various methods that help in accessing its individual members.
public Document getDocument(String docid);
The following method returns a map of documents where the key is a document ID and the value is its respective document.
public Map getDocuments();
Please note that only one member document in a compound document can have focus set on it. All the standard document methods of the gate.Document interface then apply to the document with focus set on it. For example, if there are two documents, 'hi' and 'en', and the focus is set on the document 'hi', then the getAnnotations() method will return the default annotation set of the 'hi' document. One can use the following method to switch the focus of a compound document to a different document:
public void setCurrentDocument(String documentID);
public Document getCurrentDocument();
As explained above, new documents can be added to or removed from the compound document using the following methods:
public void addDocument(String documentID, Document document);
public void removeDocument(String documentID);
The following code snippet demonstrates how to create a new compound document using GATE Embedded:
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import gate.Document;
import gate.Factory;
import gate.FeatureMap;
import gate.Gate;

// step 1: initialize GATE
Gate.init();

// step 2: load the Alignment plugin
File alignmentHome = new File(Gate.getPluginsHome(), "Alignment");
Gate.getCreoleRegister().addDirectory(alignmentHome.toURI().toURL());

// step 3: set the parameters
FeatureMap fm = Factory.newFeatureMap();

// for example you want to create a compound document for
// File.id1.xml and File.id2.xml
List<String> docIDs = new ArrayList<String>();
docIDs.add("id1");
docIDs.add("id2");
fm.put("documentIDs", docIDs);
fm.put("sourceUrl", new URL("file:///url/to/File.id1.xml"));

// step 4: finally create an instance of compound document
Document aDocument = (gate.compound.CompoundDocument)
    Factory.createResource("gate.compound.impl.CompoundDocumentImpl", fm);
19.2.2 CompoundDocumentFromXml
As described later in the chapter, the entire compound document can be saved in a single xml file. In order to load such a compound document from the saved xml file, we provide a language resource called CompoundDocumentFromXml. This is the same as the Compound Document. The only difference is in the parameters needed to instantiate this resource. This LR requires only one parameter called compoundDocumentUrl. The parameter is the url to the xml file.
to the compound document. The Remove button removes the currently visible member from the document. The buttons Save and Save As XML allow saving the documents individually and in a single xml document respectively. The Switch button allows changing focus of the compound document from one member to the other (this functionality is explained later). Finally, the Alignment Editor button allows one to start the alignment editor to align text.
class name that implements the CombiningMethod interface. The CombiningMethod tells the CombineMembersPR how to combine texts and create a new composite document. For example, a default implementation of the CombiningMethod, called DefaultCombiningMethod, takes the following parameters and puts the text of the compound document's members into a new composite document.
unitAnnotationType=Sentence
inputASName=Key
copyUnderlyingAnnotations=true;
The first parameter tells the combining method that it is the 'Sentence' annotation type whose text needs to be merged, the second that it should be taken from the 'Key' annotation set, and the third that all the underlying annotations of every Sentence annotation must be copied into the composite document. If there are two members of a compound document (e.g. 'hi' and 'en'), given the above parameters, the combining method finds all the annotations of type Sentence in each document and sorts them in ascending order, and one annotation from each document is put one after another in the composite document. This operation continues until all the annotations have been traversed.
Document en    Document hi
Sen1           Shi1
Sen2           Shi2
Sen3           Shi3
The composite document also maintains a mapping of text offsets such that if someone adds a new annotation to or removes any annotation from the composite document, they are added to or removed from their respective documents. Finally, the newly created composite document becomes a member of the same compound document.
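The interleaving performed by the default combining method can be sketched as follows. This is a simplified model working on plain strings rather than GATE annotations; the class InterleaveSketch is hypothetical, and the handling of members with different numbers of sentences is an assumption, since the text above only describes the case where the members line up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the round-robin merge; not the plugin's code.
final class InterleaveSketch {
    // Each inner list holds one member's unit texts, already sorted by offset.
    // The result takes the first unit of every member, then the second, and so on.
    static List<String> combine(List<List<String>> members) {
        int longest = 0;
        for (List<String> m : members) {
            longest = Math.max(longest, m.size());
        }
        List<String> composite = new ArrayList<>();
        for (int i = 0; i < longest; i++) {
            for (List<String> m : members) {
                if (i < m.size()) { // assumption: shorter members are simply skipped
                    composite.add(m.get(i));
                }
            }
        }
        return composite;
    }
}
```

For the two documents shown above, this yields the order Sen1, Shi1, Sen2, Shi2, Sen3, Shi3.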
19.2.5 DeleteMembersPR
This PR allows deletion of a specific member of the compound document. It takes a parameter called 'documentID' and deletes the document with this name.
19.2.6 SwitchMembersPR
As described above, only one member of the compound document can have focus set on it. PRs calling the getDocument() method get a pointer to the compound document; however, all the other methods of the compound document give access to the information of the member document with the focus set on it. So if a user wants to process a particular member of the compound document with some PRs, he or she should use the SwitchMembersPR, which takes one parameter called documentID and sets focus to the document with that specific ID.
can be stored with different names (e.g. word-alignment-user1, word-alignment-user2 etc.). Alignment objects can be used for: aligning and unaligning two annotations; checking if the two annotations are aligned with each other; obtaining all the aligned annotations in a document; obtaining all the annotations that are aligned to a particular annotation.
Given a compound document containing a source and a target document, the alignment editor starts in the alignment viewer mode. In this mode the texts of the two documents are shown side-by-side in parallel windows. The purpose of the alignment viewer is to highlight the annotations that are already aligned. Figure 19.3 shows the alignment viewer. In this case the selected documents are English and Hindi, titled en and hi respectively.
Figure 19.3: Alignment Viewer
To see alignments, the user needs to select the alignment object that he or she wants to see alignments from. Along with this, the user also needs to select annotation sets, one for the source document and one for the target document. Given these parameters, the alignment viewer highlights the annotations that belong to the selected annotation sets and have been aligned in the selected alignment object. When the mouse is placed on one of the aligned annotations, the selected annotation and the annotations that are aligned to the selected annotation are highlighted in red. In this case (see figure 19.3) the word go is aligned with the words chalate hein.
Before the alignment process can be started, the tool needs to know a few parameters about the alignment task.
Unit Of Alignment: this is the annotation type that users want to perform alignment at.
Data Source: generally, if performing a word alignment task, people consider a pair of aligned sentences one at a time and align words within sentences. If the sentences are annotated, for example as Sentence, the Sentence annotation type is called the Parent of Unit of Alignment. The Data Source contains information about the aligned parents of the unit of alignment. In this case, it would refer to the alignment object that contains alignment information about the annotations of type Sentence. The editor iterates through the aligned sentences and forms pairs of parents of the unit of alignment to be shown to the user one by one. If the user does not provide any data source, a single pair is formed containing the entire documents.
Alignment Feature Name: this is the name given to the alignment object where the information about new alignments is stored.
The editor comes with three different views for performing alignment, which the user can select at the time of creating a new alignment task: the Links view (see figure 19.4, suitable for character, word and phrase level alignments), the Parallel view (see figure 19.5, suitable for annotations which have longer texts, e.g. sentences, paragraphs, sections) and the Matrix view (see figure 19.6, suitable for character, word and phrase level alignment).
Let us assume that the user wants to align words in sentences using the Links view. The first thing they need to do is to create a new alignment task. This can be achieved by clicking on the File menu and selecting the New Task option. The user is asked to provide certain parameters as discussed above. The editor also allows task configurations to be stored in an xml file which can be reloaded in the alignment editor at a later stage. Also, if more than one task has been created, the editor allows users to switch between them.
To align one or more words in the source language with one or more words in the target language, the user needs to select individual words by clicking on them individually. Clicking on words highlights them with an identical colour. Right clicking on any of the selected words brings up a menu with the two default options: Reset Selection and Align. Different colours are used for highlighting different pairs of alignments. This helps in distinguishing one set of aligned words from other sets of aligned pairs. Also, a link between the aligned words in the two texts is drawn to show the alignment.
To unalign, the user needs to right click on the aligned words and click on the Remove Alignment option. Only the word on which the user right-clicks is taken out of the alignment and the rest of the words in the pair remain unaffected. We use the term Orphaned Annotation to refer to an annotation which does not have any alignment in the target document. If, after removing an annotation from an alignment pair, there are any orphaned annotations in the alignment pair, they are unaligned too.
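The unalign behaviour, including the treatment of orphaned annotations, can be modelled with a small symmetric map. The sketch below is an assumption-laden illustration and not the plugin's Alignment class: the name AlignmentSketch, its methods, and the use of plain strings such as "en:go" as annotation identifiers are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of an alignment object.
final class AlignmentSketch {
    // Symmetric relation: each annotation id maps to the ids it is aligned with.
    private final Map<String, Set<String>> links = new HashMap<>();

    void align(String source, String target) {
        links.computeIfAbsent(source, k -> new LinkedHashSet<>()).add(target);
        links.computeIfAbsent(target, k -> new LinkedHashSet<>()).add(source);
    }

    boolean areAligned(String a, String b) {
        return links.getOrDefault(a, Collections.<String>emptySet()).contains(b);
    }

    Set<String> alignedTo(String a) {
        return Collections.unmodifiableSet(
            links.getOrDefault(a, Collections.<String>emptySet()));
    }

    // Takes only the given annotation out of its alignments. Partners that are
    // left with no alignment at all (orphans) are unaligned as well.
    void removeFromAlignment(String a) {
        Set<String> partners = links.remove(a);
        if (partners == null) {
            return;
        }
        for (String p : partners) {
            Set<String> back = links.get(p);
            if (back == null) {
                continue;
            }
            back.remove(a);
            if (back.isEmpty()) {
                links.remove(p); // orphaned annotation
            }
        }
    }
}
```

With go aligned to both chalate and hein, removing hein leaves go aligned with chalate; removing go afterwards leaves chalate orphaned, so it is unaligned too.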
Advanced Features
The options Align, Reset Selection and Remove Alignment are available by default. The Align and the Reset Selection options appear when the user wants to align new annotations. The Remove Alignment option only appears when the user right clicks on already aligned annotations. The first two actions are available when there is at least one annotation selected in the source language and another one is selected in the target language. Apart from these three basic actions, the editor also allows adding more actions to the editor.
There are four different types of actions: actions that should be taken before the user starts aligning words (PreDisplayAction); actions that should be taken when the user aligns annotations (AlignmentAction); actions that should be taken when the user has completed aligning all the words in the given sentence pair (FinishedAlignmentAction); and actions to publish any data or statistics to the user. For example, to help users in the alignment process by suggesting word alignments, one may want to wrap a pre-trained statistical word alignment model as a PreDisplayAction. Similarly, actions of the type AlignmentAction can be used for submitting examples to the model in order for the model to update itself. When all the words in a sentence pair are aligned, one may want to sign off the pair and take actions such as comparing all the alignments in that sentence pair with the alignments carried out by some other user for the same pair. Similarly, while collecting data in the background, one might want to display some information to the user (e.g. statistics for the collected data or some suggestions that help users in the alignment process).
When users click on the next or the previous button, the editor obtains the next or the previous pair that needs to be shown from the data source. Before the pair is displayed in the editor, the editor calls the registered instances of the PreDisplayAction and the current pair object is passed on to them. Please note that this only happens when the pair is not already signed off. Once the instances of PreDisplayAction have been executed, the editor collects the alignment information from the compound document and displays it in the editor.
As explained earlier, when users right click on units of alignment in the editor, a popup menu with default options (e.g. Align, Reset Selection and Remove Alignment) is shown. The editor allows adding new actions to this menu. It is also possible that users may want to take extra actions when they click on any of the Align or the Remove Alignment options. The AlignmentAction makes it possible to achieve this. Below we list some of the parameters of the AlignmentAction. The implementation is called depending on these parameters.
invokeForAlignedAnnotation - the action appears in the options menu when the user right clicks on an aligned annotation.
invokeForHighlightedUnalignedAnnotation - the action appears in the options menu when the user right clicks on a highlighted but unaligned annotation.
invokeForUnhighlightedUnalignedAnnotation - the action appears in the options menu when the user right clicks on an unhighlighted and unaligned annotation.
invokeWithAlignAction - the action is executed whenever the user aligns some annotations.
invokeWithRemoveAction - the action is executed whenever the user removes any alignment.
caption - in the case of the first three options, the caption is used in the options menu. In the case of the fourth and the fifth options, the caption appears as a check box under the actions tab.
These methods can be used, for example, for building up a dictionary in the background while aligning word pairs.
When users click on the next button, they are asked if the pair they were aligning has been aligned completely (i.e. signed off for further alignment). If the user replies yes, the actions registered as FinishedAlignmentAction are executed one after the other. This could be helpful, for instance, to write an alignment exporter that exports alignment results in an appropriate format or to update the dictionary with new alignments.
Users can point the editor to a file that contains a list of actions and the parameters needed to initialize them. A configuration file is a simple text file with a fully-qualified class name and the required parameters specified in it. Below we give an example of such a configuration file.
gate.alignment.actions.AlignmentCache,$relpath$/align-cache.txt,root
The first argument is the name of the class that implements one of the actions described above. The second parameter is the name of the file in which the alignment cache should store its results. Finally, the third argument instructs the alignment cache to store root forms of the words in the dictionary so that different forms of the same words can be matched easily. All the parameters (comma separated) after the class name are passed to the action. The relpath parameter is resolved at runtime.
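How such a line breaks down into a class name and parameters can be sketched as follows. The class ActionConfigLine is hypothetical and is not the editor's own parser. In particular, the text above does not say what $relpath$ resolves to, so the sketch simply assumes the caller supplies a directory to substitute for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for one line of an actions configuration file.
final class ActionConfigLine {
    final String className;        // fully-qualified action class name
    final List<String> parameters; // everything after the class name

    private ActionConfigLine(String className, List<String> parameters) {
        this.className = className;
        this.parameters = parameters;
    }

    // relPath is an assumption: the directory that replaces $relpath$.
    static ActionConfigLine parse(String line, String relPath) {
        String[] parts = line.trim().split(",");
        List<String> params = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            params.add(parts[i].trim().replace("$relpath$", relPath));
        }
        return new ActionConfigLine(parts[0].trim(), params);
    }
}
```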
AlignmentCache is one such example of both a FinishedAlignmentAction and a PreDisplayAction. This is an inbuilt alignment cache in the editor which collects alignment pairs that the users annotate. The idea is to cache such pairs and later align them automatically if they appear in subsequent pairs, thus reducing the effort needed from humans to annotate the same pair again. By default the alignment cache is disabled. Users wishing to enable it should look into plugins/Alignment/resources/actions.conf and uncomment the appropriate line.
Users wishing to implement their own actions should refer to the implementation of the AlignmentCache.
Figure 19.7: Word Alignment XML File
When aligning words in sentences, it is possible to have one or more source sentences aligned with one or more target sentences in a pair. This is achieved by having Source and Target elements within the Pair element, each of which can have one or more Sentence elements. Each word or token within these sentences is marked with a Token element. Every Token element has a unique id assigned to it which is used when aligning words. It is possible to have 1:1, 1:many and many:1 alignments. The Alignment element is used for mentioning every alignment pair, with source and target attributes that refer to one of the source token ids and one of the target token ids respectively. For example, according to the first alignment entry, the source token markets with id 3 is aligned with the target token bAzAr with id 3. The exporter does not export any entry for the unaligned words.
abstracts in all these publications are already identified. If not, you would have to do some processing to identify them prior to using the following steps. In the following example, we assume that the abstract boundaries have been annotated as 'Abstract' annotations and stored under the 'Original markups' annotation set.
Steps:
1. Create a new corpus and populate it with a set of publications that you would like to process with ANNIE.
2. Load the ANNIE application.
3. Load the 'Alignment' plugin.
4. Create an instance of the 'Segment Processing PR' by selecting it from the list of processing resources.
5. Create a corpus pipeline.
6. Add the 'Segment Processing PR' into the pipeline and provide the following parameters:
   (a) Provide the corpus with publication documents in it as a parameter to the corpus controller.
   (b) Select the 'ANNIE' controller for the 'controller' parameter.
   (c) Type 'Abstract' in the 'segmentAnnotationType' parameter.
   (d) Type 'Original markups' in the 'inputASName' parameter.
7. Run the application.
Now, you should see that the ANNIE application has only processed the text in each document that was annotated as 'Abstract'.
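The idea behind processing only the annotated segments can be sketched without GATE as follows. This is a conceptual model, not the Segment Processing PR itself: the class SegmentSketch, its Span type and the finder function (standing in for the ANNIE controller) are hypothetical, and the sketch assumes results found inside a segment are shifted back to offsets in the full document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of segment-wise processing.
final class SegmentSketch {
    // A span of text: [start, end) offsets.
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Runs the finder on each segment's text only, then maps the spans it
    // returns (relative to the segment) back to full-document offsets.
    static List<Span> processSegments(String text, List<Span> segments,
                                      Function<String, List<Span>> finder) {
        List<Span> results = new ArrayList<>();
        for (Span seg : segments) {
            String segmentText = text.substring(seg.start, seg.end);
            for (Span found : finder.apply(segmentText)) {
                results.add(new Span(seg.start + found.start, seg.start + found.end));
            }
        }
        return results;
    }
}
```

Text outside the given segments is never passed to the finder, which mirrors the observation in the final step above.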
The two components operate in very similar ways. Given a document in the source form (either a GATE Document or a UIMA CAS), a document in the target form is created with a copy of the source document's text. Some of the annotations from the source are transferred to the target, according to a mapping defined by the user, and the target component is then run. Finally, some of the annotations on the updated target document are then transferred back to the source, according to the user-defined mapping.
The rest of this document describes this process in more detail. Section 20.1 describes the GATE AE wrapper, and Section 20.2 describes the UIMA CorpusController wrapper.
20.1
Embedding a UIMA analysis engine in a GATE application is a two step process. First, you must construct a mapping descriptor XML file to define how to map annotations between the UIMA CAS and the GATE Document. This mapping file, along with the analysis engine descriptor, is used to instantiate an AnalysisEnginePR which calls the analysis engine on an appropriately initialized CAS. Examples of all the XML files discussed in this section are available in examples/conf under the UIMA plugin directory.
The mapping descriptor has the following overall structure:
<uimaGateMapping>
  <inputs>
    <uimaAnnotation type="..." gateType="..." indexed="true|false">
      <feature name="..." kind="string|int|float|fs">
        <!-- element defining the feature value goes here -->
      </feature>
      ...
    </uimaAnnotation>
  </inputs>
  <outputs>
    <added>
      <gateAnnotation type="..." uimaType="...">
        <feature name="...">
          <!-- element defining the feature value goes here -->
        </feature>
        ...
      </gateAnnotation>
    </added>
    <updated>
      ...
    </updated>
    <removed>
      ...
    </removed>
  </outputs>
</uimaGateMapping>
Input Definitions
Each input definition takes the following form:
<uimaAnnotation type="uima.Type" gateType="GATEType" indexed="true|false">
  <feature name="..." kind="string|int|float|fs">
    <!-- element defining the feature value goes here -->
  </feature>
  ...
</uimaAnnotation>
When a document is processed, this will create one UIMA annotation of type uima.Type in the CAS for each GATE annotation of type GATEType in the input annotation set, covering the same offsets in the text. If indexed is true, GATE will keep a record of which GATE annotation gave rise to which UIMA annotation. If you wish to be able to track updates to this annotation's features and transfer the updated values back into GATE, you must specify indexed="true". The indexed attribute defaults to false if omitted.
Each contained feature element will cause the corresponding feature to be set on the generated annotation. UIMA features can be string, integer or float valued, or can be a reference to another feature structure, and this must be specified in the kind attribute. The feature's value is specified using a nested element, but exactly how this value is handled is determined by the kind.
There are various options for setting feature values:
<string value="fixed string" />
    The simplest case: a fixed Java String.
<docFeatureValue name="featureName" />
    The value of the given named feature of the current GATE document.
<gateAnnotFeatureValue name="featureName" />
    The value of a given feature on the current GATE annotation (i.e. the one on which the offsets of the UIMA annotation are based).
<featureStructure type="uima.fs.Type">...</featureStructure>
    A feature structure of the given type. The featureStructure element can itself contain feature elements recursively.
The value is assigned to the feature according to the feature's kind:
string: The value object's toString() method is called, and the resulting String is set as the string value of the feature.
int: If the value object is a Number, its intValue() method is called, and the result is set as the integer value of the feature. If the value object is not a Number, its toString() method is called, and the resulting String is parsed using Integer.parseInt(). If this succeeds, the integer result is used; if it fails, the feature is set to zero.
float: As for int, except that Numbers are converted by calling floatValue(), and non-Numbers are parsed using Float.parseFloat().
In particular, <featureStructure> value elements should only be used with features of kind fs. While nothing will stop you using them with string features, the result will probably not be what you expected.
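The conversion rules for the string, int and float kinds can be sketched in plain Java. This is only an illustration of the behaviour described above, not the plugin's own code: the class and method names are hypothetical, a null value is simply passed through String.valueOf(), and falling back to zero for an unparseable float is an assumption drawn from the phrase "as for int".

```java
// Hypothetical sketch of the kind-based value conversion described in the text.
final class KindConversionSketch {

    // kind="string": use the value object's string form.
    static String toStringKind(Object value) {
        return String.valueOf(value);
    }

    // kind="int": Numbers are converted with intValue(); anything else is
    // converted to a String and parsed, falling back to zero on failure.
    static int toIntKind(Object value) {
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        try {
            return Integer.parseInt(String.valueOf(value));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // kind="float": as for int, but using floatValue() and Float.parseFloat().
    // The zero fallback is an assumption, mirroring the int case.
    static float toFloatKind(Object value) {
        if (value instanceof Number) {
            return ((Number) value).floatValue();
        }
        try {
            return Float.parseFloat(String.valueOf(value));
        } catch (NumberFormatException e) {
            return 0f;
        }
    }

    public static void main(String[] args) {
        System.out.println(toIntKind(3.9));
        System.out.println(toIntKind("42"));
        System.out.println(toIntKind("not a number"));
        System.out.println(toFloatKind("1.5"));
    }
}
```

Note that a Number is truncated rather than rounded when it is converted with intValue().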
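Output Definitions
The output definitions take a similar form. There are three groups:
added: Annotations which have been added by the AE, and for which corresponding new annotations are to be created in the GATE document.
updated: Annotations whose feature values have been modified by the AE, and these values are to be transferred back to the original GATE annotations.
removed: Annotations which have been removed from the UIMA annotation index by the AE, and whose originating GATE annotations are to be removed in turn.
The definition elements for these three types all take the same form: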
<gateAnnotation type="GATEType" uimaType="uima.Type">
  <feature name="featureName">
    <!-- element defining the feature value goes here -->
  </feature>
  ...
</gateAnnotation>
For added annotations, this has the mirror-image effect to the input definition: for each UIMA annotation of the given type, create a GATE annotation at the same offsets and set its feature values as specified by feature elements. For a gateAnnotation the feature elements do not have a kind, as features in GATE can have arbitrary Objects as values. The possible feature value elements for a gateAnnotation are:
<string value="fixed string" /> A fixed string, as before.
<uimaFSFeatureValue name="uima.Type:FeatureName" kind="string|int|float" /> The value of the given feature of the current UIMA annotation. The feature name must be specified in fully-qualified form, including the type on which it is defined. The kind is used in a similar way as in input definitions:
string: The Java String object returned as the string value of the feature is used.
int: An Integer object is created from the integer value of the feature.
float: A Float object is created from the float value of the feature.
1 Strictly speaking, removed from the annotation index, as feature structures cannot be removed from the
CAS entirely.
objects are not guaranteed to be valid once the CAS has been cleared, a downstream GATE component must extract the relevant information from the feature structure before the next document is processed. You have been warned.
Feature names in uimaFSFeatureValue must be qualified with their type name, as the feature may have been defined on a supertype of the feature's own type, rather than the type itself. For example, consider the following:
<gateAnnotation type="Entity" uimaType="com.example.Entity">
  <feature name="type">
    <uimaFSFeatureValue name="com.example.Entity:Type" kind="string" />
  </feature>
  <feature name="startOffset">
    <uimaFSFeatureValue name="uima.tcas.Annotation:begin" kind="int" />
  </feature>
</gateAnnotation>
For updated annotations, there must have been an input definition with indexed="true" with the same GATE and UIMA types. In this case, for each GATE annotation of the appropriate type, the UIMA annotation that was created from it is found in the CAS. The feature definitions are then used as in the added case, but here, the feature values are set on the original GATE annotation, rather than on a newly created annotation.
For removed annotations, the feature definitions are ignored, and the annotation is removed from GATE if the UIMA annotation which it gave rise to has been removed from the UIMA annotation index.
A Complete Example
Figure 20.2 shows a complete example mapping descriptor for a simple UIMA AE that takes tokens as input and adds a feature to each token giving the number of lower case letters in the token's string (footnote 2). In this case the UIMA feature that holds the number of lower case letters is called LowerCaseLetters, but the GATE feature is called numLower. This demonstrates that the feature names do not need to agree, so long as a mapping between them can be defined.
examples/conf.
The AE descriptor
engine descriptor, or as a specifier giving the location of a remote Vinci or SOAP service. It is up to the developer to ensure that the types and features used in the mapping descriptor are compatible with the type system and capabilities of the AE, or a runtime error is likely to occur.
The PR takes the following runtime parameter (in addition to the document parameter which is set automatically by a CorpusController):
from this set, and any output mappings place their new annotations in this set (added outputs) or update the input annotations in this set (updated or removed). If not specified, the default (unnamed) annotation set is used.
The Annotator implementation must be available for GATE to load. For an annotator written in Java, this means that the JAR file containing the annotator class (and any other classes it depends on) must be present in the GATE classloader. The easiest way to achieve this is to put the JAR file or files in a new directory, and create a creole.xml file in the same directory to reference the JARs:
<CREOLE-DIRECTORY>
  <JAR>my-annotator.jar</JAR>
  <JAR>classes-it-uses.jar</JAR>
</CREOLE-DIRECTORY>
This directory should then be loaded in GATE as a CREOLE plugin. Note that, due to the complex mechanics of classloaders in Java, putting your JARs in GATE's lib directory will not work.
For annotators written in C++ you need to ensure that the C++ enabler libraries (available separately from http://incubator.apache.org/uima/) and the shared library containing your annotator are in a directory which is on the PATH (Windows) or LD_LIBRARY_PATH (Linux) when GATE is run.
20.2
The process of embedding a GATE controller in a UIMA application is more or less the mirror image of the process detailed in the previous section. Again, the developer must supply a mapping descriptor defining how to map between UIMA and GATE annotations, and pass this, plus the GATE controller definition, to an AE which performs the translation and calls the GATE controller.
extra attribute, annotationSetName, which allows inputs to be taken from, and outputs to be placed in, different annotation sets. For example, the following hypothetical example maps com.example.Person annotations into the default set and com.example.html.Anchor annotations to 'a' tags in the 'Original markups' set.
<inputs>
  <gateAnnotation type="Person" uimaType="com.example.Person">
    <feature name="kind">
      <uimaFSFeatureValue name="com.example.Person:Kind" kind="string"/>
    </feature>
  </gateAnnotation>
  <gateAnnotation type="a" annotationSetName="Original markups" uimaType="com.example.html.Anchor">
    <feature name="href">
      <uimaFSFeatureValue name="com.example.html.Anchor:hRef" kind="string" />
    </feature>
  </gateAnnotation>
</inputs>
Figure 20.3 shows a mapping descriptor for an application that takes tokens and sentences produced by some UIMA component and runs the GATE part of speech tagger to tag them with Penn TreeBank POS tags (footnote 3). In the example, no features are copied from the UIMA tokens, but they are still indexed="true" as the POS feature must be copied back from GATE.
.gapp
test/conf
UIMA
plugin, along with the mapping file and the AE descriptor that will run it.
<uimaGateMapping>
  <inputs>
    <gateAnnotation type="Token" uimaType="com.ibm.uima.examples.tokenizer.Token" indexed="true" />
    <gateAnnotation type="Sentence" uimaType="com.ibm.uima.examples.tokenizer.Sentence" />
  </inputs>
  <outputs>
    <updated>
      <uimaAnnotation type="com.ibm.uima.examples.tokenizer.Token" gateType="Token">
        <feature name="POS" kind="string">
          <gateAnnotFeatureValue name="category" />
        </feature>
      </uimaAnnotation>
    </updated>
  </outputs>
</uimaGateMapping>
GateApplication: The .gapp file containing the saved application state.
MappingDescriptor: The mapping descriptor XML file.
These must be bound to suitable URLs, either by editing the resourceManagerConfiguration section of the primitive descriptor, or by supplying the binding in an aggregate descriptor that includes the GATEApplicationAnnotator as one of its delegates. In addition, you may need to set the following Java system properties:
This defaults to
Classpath Notes
In addition to the usual UIMA library JAR files, GATEApplicationAnnotator requires a number of JAR files from the GATE distribution in order to function. In the first instance, you should include gate.jar from GATE's bin directory, and also all the JAR files from GATE's lib directory on the classpath. If you use the supplied Ant build file, ant documentanalyser will run the document analyser with this classpath. Depending on exactly which GATE plugins your application uses, you may be able to exclude some of the lib JAR files (for example, you will not need Weka if you do not use the machine learning plugin), but it is safest to start with them all. GATE will load plugin JAR files through its own classloader, so these do not need to be on the classpath.
This chapter describes additional CREOLE resources which do not form part of ANNIE, and have not been covered in previous chapters.
21.1
The rule-based verb chunker is based on a number of grammars of English [Cobuild 99, Azar 89]. We have developed 68 rules for the identification of non-recursive verb groups. The rules cover finite ('is investigating'), non-finite ('to investigate'), participles ('investigated'), and special verb constructs ('is going to investigate'). All the forms may include adverbials and negatives. The rules have been implemented in JAPE. The finite state analyser produces an annotation of type 'VG' with features and values that encode syntactic information ('type', 'tense', 'voice', 'neg', etc.). The rules use the output of the POS tagger as well as information about the identity of the tokens (e.g. the token 'might' is used to identify modals).
The grammar for verb group identification can be loaded as a JAPE grammar into the GATE architecture and can be used in any application: the module is domain independent. The grammar file is located within the ANNIE plugin, in the directory plugins/ANNIE/resources/VP/.
21.2
The NP Chunker application is a Java implementation of the Ramshaw and Marcus BaseNP chunker (in fact the files in the resources directory are taken straight from their original distribution) which attempts to insert brackets marking noun phrases in text which have been marked with POS tags in the same format as the output of Eric Brill's transformational tagger. The output from this version should be identical to the output of the original C++/Perl version released by Ramshaw and Marcus.
For more information about baseNP structures and the use of transformation-based learning to derive them, see [Ramshaw & Marcus 95].
which should be set automatically. There are five runtime parameters which should be set prior to executing the chunker.
annotationName: name of the annotation the chunker should create to identify noun phrases in the text.
inputASName: The chunker requires certain types of annotations (e.g. Tokens with part of speech tags) for identifying noun chunks. This parameter tells the chunker which annotation set to use to obtain such annotations from.
outputASName: This is where the results (i.e. new noun chunk annotations) will be stored.
posFeature: Name of the feature that holds POS tag information.
unknownTag: it works as specified in the previous section.
The chunker requires the following PRs to have been run first: tokeniser, sentence splitter, POS tagger.
21.3
TaggerFramework
The Tagger Framework is an extension of work originally developed in order to provide support for the TreeTagger plugin within GATE. Rather than focusing on providing support for a single external tagger this plugin provides a generic wrapper that can easily be customised (no Java code is required) to incorporate many different taggers within GATE. The plugin currently provides example applications (see plugins/TaggerFramework/resources) for the following taggers: GENIA (a biomedical tagger), Hunpos (providing support for English and Hungarian), TreeTagger (supporting German, French, Spanish and Italian as well as English), and the Stanford Tagger (supporting English, German and Arabic).
The basic idea behind this plugin is to allow the use of many external taggers. Providing such a generic wrapper requires a few assumptions. Firstly we assume that the external tagger will read from a file and that the contents of this file will be one annotation per line (i.e. one token or sentence per line). Secondly we assume that the tagger will write its response to stdout and that it will also be based on one annotation per line, although there is no assumption that the input and output annotation types are the same.
An important issue with most external taggers is tokenisation: generally, when using a native GATE tagger in a pipeline, Token annotations are first generated by a tokeniser, and then processed by a POS tagger. Most external taggers, on the other hand, have built-in code to perform their own tokenisation. In this case, there are generally two options: (1) use the tokens generated by the external tagger and import them back into GATE (typically into a Token annotation type). Or (2), if the tagger accepts pre-tokenised text, the Tagger Framework can be configured to pass the annotations as generated by a GATE tokeniser to the external tagger. For details on this, please refer to the 'updateAnnotations' runtime parameter described below. However, if the tokenisation strategies are significantly different, this may lead to a degradation of the tagger's performance.
Initialization Parameters
Runtime Parameters
- debug: if set to true then a whole heap of useful information will be printed to the messages tab as the tagger runs. Defaults to false.
- encoding: this must be set to the encoding that the tagger expects the input/output files to use. If this is incorrectly set it is highly likely that either the tagger will fail or the results will be meaningless. Defaults to ISO-8859-1 as this seems to be the most commonly required encoding.
document which cannot be represented in the selected encoding. If the parameter is true (the default), unmappable characters cause the wrapper to throw an exception and fail. If set to false, unmappable characters are replaced by question marks when the document is passed to the tagger. This is useful if your documents are largely OK but contain the odd character from outside the Latin-1 range.
ExecutionException if no input Annotations are found and instead only log a single warning message per session and a debug message per document that has no input annotations (default = true).
- inputTemplate: template string describing how to build the line of input for the tagger corresponding to a single annotation. The template contains placeholders of the form ${feature} which will be replaced by the value of the corresponding feature from the annotation. The default template is ${string}, which simply passes the string feature of each annotation to the tagger. Typical variants would be ${string}\t${category} for an entity tagger that requires the string and the part of speech tag for each token, separated by a tab (footnote 1). If a particular annotation does not have one of the specified features, the corresponding slot in the template will be left blank (i.e. replaced by an empty string). It is only an error if a particular annotation contains none of the features specified by the template.
1 Java string escape sequences such as \t will be decoded before the template is expanded.
- regex: this should be a Java regular expression that matches a single line in the output from the tagger. Capturing groups should be used to define the sections of the expression which match the useful output.
- featureMapping: a mapping from feature names to capturing groups in the regular expression. Each feature will be added to the output annotations with a value equal to the specified capturing group. For example, the TreeTagger uses a regular expression (.+)\t(.+)\t(.+) to capture the three column output. This is then combined with the feature mapping {string=1, category=2, lemma=3} to add the appropriate feature/values to the output annotations.
- inputASName: the name of the annotation set which should be used for input. If not specified the default (i.e. un-named) annotation set will be used.
- inputAnnotationType: the name of the annotation used as input to the tagger. This will usually be Token. Note that the input annotations must contain a string feature which will be used as input to the tagger. Tokens usually have this feature but if, for example, you wish to use Sentence as the input annotation then you will need to add the string feature. JAPE grammars for doing this are provided in plugins/TaggerFramework/resources.
- outputASName: the name of the annotation set which should be used for output. If not specified the default (i.e. un-named) annotation set will be used.
- outputAnnotationType: the name of the annotation to be provided as output. This is usually Token.
- taggerBinary: the external tagger program to execute. This is usually a shell script which may perform extra processing before executing the tagger. The plugins/TaggerFramework/resources directory contains example scripts (where needed) for the supported taggers. These scripts may need editing (for example, to set the installation directory of the tagger) before they can be used.
- taggerDir: the directory from which the tagger must be executed. This can be left unspecified.
- taggerFlags: an ordered set of flags that should be passed to the tagger as command line options.
- updateAnnotations: If set to true then the plugin will attempt to update existing output annotations. This can fail if the output from the tagger and the existing annotations are created differently (i.e. the tagger does its own tokenization). Setting this option to false will make the plugin create new output annotations, removing any existing ones, to prevent the two sets getting out of sync. This is also useful when the tagger is domain specific and may do a better job than GATE. For example, the GENIA tagger is better at tokenising biomedical text than the ANNIE tokeniser. Defaults to true.
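The interplay of inputTemplate, regex and the feature mapping can be sketched in plain Java. This is a minimal illustration of the behaviour described above, not the plugin's implementation: the class name, method names and sample token are hypothetical, and features are modelled as a simple Map of strings rather than a GATE FeatureMap.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: build one tagger input line and parse one output line.
final class TaggerLineSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Expands ${feature} placeholders. A missing feature becomes an empty
    // string; it is an error only if none of the named features is present.
    static String expandTemplate(String template, Map<String, String> features) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        int placeholders = 0;
        boolean anyPresent = false;
        while (m.find()) {
            placeholders++;
            String value = features.get(m.group(1));
            if (value != null) {
                anyPresent = true;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        if (placeholders > 0 && !anyPresent) {
            throw new IllegalArgumentException("annotation has none of the template's features");
        }
        return out.toString();
    }

    // Matches one output line and copies the mapped capturing groups into
    // features. Returns null when the line does not match the expression.
    static Map<String, String> parseLine(String line, Pattern regex,
                                         Map<String, Integer> featureMapping) {
        Matcher m = regex.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> features = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : featureMapping.entrySet()) {
            features.put(e.getKey(), m.group(e.getValue()));
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, String> token = new HashMap<>();
        token.put("string", "houses");
        token.put("category", "NNS");
        // The \t here is already a real tab character in the Java literal.
        System.out.println(expandTemplate("${string}\t${category}", token));

        Map<String, Integer> mapping = new LinkedHashMap<>();
        mapping.put("string", 1);
        mapping.put("category", 2);
        mapping.put("lemma", 3);
        System.out.println(parseLine("houses\tNNS\thouse",
                Pattern.compile("(.+)\\t(.+)\\t(.+)"), mapping));
    }
}
```

Returning null for a non-matching line is a choice made for the sketch; the text does not say how the wrapper treats such lines.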
By default the GenericTagger PR simply tries to execute the taggerBinary using the normal Java Runtime.exec() mechanism. This works fine on Unix-style platforms such as Linux or Mac OS X, but on Windows it will only work if the taggerBinary is a .exe file. Attempting to invoke other types of program fails on Windows with a rather cryptic "error=193".
To support other types of tagger programs such as shell scripts or Perl scripts, the GenericTagger PR supports a Java system property shell.path. If this property is set then instead of invoking the taggerBinary directly the PR will invoke the program specified by shell.path and pass the tagger binary as the first command-line parameter.
If the tagger program is a shell script then you will need to install the appropriate interpreter, such as sh.exe from the Cygwin tools, and set the shell.path system property to point to sh.exe. For GATE Developer you can do this by adding the following line to build.properties (see Section 2.3, and note the extra backslash before each backslash and colon in the path):
run.shell.path: C\:\\cygwin\\bin\\sh.exe
Similarly, for Perl or Python scripts you should install a suitable interpreter and set shell.path to point to that.
You can also run taggers that are invoked using a Windows batch file (.bat). To use a batch file you do not need to use the shell.path system property, but instead set the taggerBinary runtime parameter to point to C:\WINDOWS\system32\cmd.exe and set the first two taggerFlags entries to "/c" and the Windows-style path to the tagger batch file (e.g. C:\MyTagger\runTagger.bat). This will cause the PR to run cmd.exe /c runTagger.bat which is the way to run batch files from Java.
In general most of the complexities of configuring a number of external taggers have already been determined and example pipelines are provided in the plugin's resources directory. To use one of the supported taggers simply load one of the example applications and then check the runtime parameters of the TaggerFramework PR in order to set paths correctly to your copy of the tagger you wish to use.
Some taggers require more complex configuration, details of which are covered in the remainder of this section.
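The way shell.path changes the command that is executed can be sketched as follows. This is an illustration under stated assumptions rather than the PR's own code: the class and method names are hypothetical, only the interpreter, tagger binary and flags are modelled, and any further arguments the wrapper may pass are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of how the command line could be assembled.
final class TaggerCommandSketch {

    // If shellPath is set, the interpreter comes first and the tagger binary
    // becomes its first argument; otherwise the binary is invoked directly.
    static List<String> buildCommand(String shellPath, String taggerBinary,
                                     List<String> taggerFlags) {
        List<String> command = new ArrayList<>();
        if (shellPath != null && !shellPath.isEmpty()) {
            command.add(shellPath);
        }
        command.add(taggerBinary);
        command.addAll(taggerFlags);
        return command;
    }

    public static void main(String[] args) {
        // Shell script case: shell.path points at an interpreter such as sh.exe.
        // The script name run-tagger.sh is a made-up example.
        System.out.println(buildCommand("C:\\cygwin\\bin\\sh.exe",
                "run-tagger.sh", new ArrayList<String>()));

        // Batch file case: no shell.path; cmd.exe is the binary and the
        // first two flags are "/c" and the batch file path.
        System.out.println(buildCommand(System.getProperty("shell.path"),
                "C:\\WINDOWS\\system32\\cmd.exe",
                Arrays.asList("/c", "C:\\MyTagger\\runTagger.bat")));
    }
}
```

A list built this way could be handed to java.lang.ProcessBuilder, which accepts the command as a List of strings.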
Tagger Framework, you can choose between passing Tokens generated within GATE to the TreeTagger for POS tagging or letting the TreeTagger perform tokenisation as well, importing the generated Tokens into GATE annotations. If you need to pass the Tokens generated by GATE to the TreeTagger, it is important that you create your own command scripts to skip the tokenisation step done by default in the TreeTagger command scripts (the ones in the TreeTagger's cmd directory). A few example scripts for passing GATE Tokens to the TreeTagger are available under plugins/TaggerFramework/resources/TreeTagger, for example, tree-tagger-german-gate runs the German parameter file with existing Token annotations.
Note that you must set the paths in these command files to point to the location where you installed the TreeTagger:
BIN=/usr/local/durmtools/TreeTagger/bin
CMD=/usr/local/durmtools/TreeTagger/cmd
LIB=/usr/local/durmtools/TreeTagger/lib
The Tagger Framework will run the TreeTagger on any platform that supports the TreeTagger tool, including Linux, Mac OS X and Windows, but the GATE-specific scripts require a POSIX-style Bourne shell with the gawk, tr and grep commands, plus Perl for the Spanish tagger. For Windows this means that you will need to install the appropriate parts of the Cygwin environment from http://www.cygwin.com and set the system property treetagger.sh.path to contain the path to your sh.exe (typically C:\cygwin\bin\sh.exe).
POS Tags. For English the POS tagset is a slightly modified version of the Penn Treebank tagset, where the second letter of the tags for verbs distinguishes between 'be' verbs (B), 'have' verbs (H) and other verbs (V).
The tagsets for other languages can be found on the TreeTagger web site. Figure 21.1 shows a screenshot of a French document processed with the TreeTagger.
Figure 21.1: A French document processed by the TreeTagger through the Tagger Framework
can be used either as a standalone grammar or as the post-process initialization feature of the TaggerFramework PR.
21.4
Chemistry Tagger
This GATE module is designed to tag a number of chemistry items in running text. Currently the tagger tags compound formulas (e.g. SO2, H2O, H2SO4 ...), ions (e.g. Fe3+, Cl-) and element names and symbols (e.g. Sodium and Na). Limited support for compound names is also provided (e.g. sulphur dioxide) but only when followed by a compound formula (in parentheses or commas).
21.5
There are a number of state-of-the-art methods for semantic annotation and linking to DBpedia (e.g. DBpedia Spotlight, YAGO, and MusicBrainz). In addition, commercial web services such as AlchemyAPI, OpenCalais, and Zemanta are also relevant. A recent evaluation of all state-of-the-art LOD-based methods and tools showed that DBpedia Spotlight and Zemanta have the best accuracy on annotating texts with the corresponding URIs from DBpedia.
Zemanta API (http://developer.zemanta.com) allows application developers to query the Zemanta engine for contextual information about the text that users enter. Given a piece of text, it identifies entities in the text and annotates these entities with their respective URIs in DBPedia.
In GATE, we have provided a wrapper for the Zemanta API. This wrapper, internally, sends the entire document text in a number of batches to the Zemanta service and translates its response into GATE annotations. Further details on the Zemanta service can be found at http://developer.zemanta.com/docs/.
The Zemanta Service PR can be found under the Tagger_Zemanta plugin in GATE. Below, we describe the various initialization and run time parameters of the PR.
apiKey: Since Zemanta is a commercial service, any non-commercial usage of the service has a constraint on the number of requests that can be made to the Zemanta service. As of 27 November 2012, this limit is set at one thousand queries per day. In order to be able to use the PR, you are required to obtain such a key and provide it to the PR. The key can be obtained by visiting http://developer.zemanta.com/docs/ and creating an account on the website.
numberOfSentencesInBatch: Since Zemanta is a web service, only a certain size of text can be sent across for processing. The number of sentences to be processed in a single batch can be specified using this parameter. By default, this is set to 10 sentences per batch.
numberOfSentencesInContext: Zemanta utilises contextual information to identify entities and assign each of them a unique URI (from DBPedia). This parameter indicates the additional number of sentences to be sent, both from the left and right contexts, along with the text to be disambiguated.
inputASName: This is the annotation set where the PR looks for Sentences to be processed.
outputASName: The PR creates annotations of type Mention for every entity it identifies in the text. Such Mention annotations are then stored under the annotation set as specified by the outputASName parameter.
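How numberOfSentencesInBatch and numberOfSentencesInContext might interact can be sketched as a simple batching plan over sentence indices. This is a guess at the general shape of the process, written for illustration only: the class and method names are hypothetical, and the wrapper's real batching logic may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: plan batches of sentences with surrounding context.
final class SentenceBatchSketch {

    // Each entry is {contextStart, batchStart, batchEnd, contextEnd}, using
    // zero-based sentence indices with exclusive end positions.
    static List<int[]> plan(int sentenceCount, int batchSize, int contextSize) {
        if (batchSize <= 0 || contextSize < 0) {
            throw new IllegalArgumentException("batchSize must be positive, contextSize non-negative");
        }
        List<int[]> batches = new ArrayList<>();
        for (int start = 0; start < sentenceCount; start += batchSize) {
            int end = Math.min(start + batchSize, sentenceCount);
            int contextStart = Math.max(0, start - contextSize);
            int contextEnd = Math.min(sentenceCount, end + contextSize);
            batches.add(new int[] {contextStart, start, end, contextEnd});
        }
        return batches;
    }

    public static void main(String[] args) {
        // 25 sentences, the default batch size of 10, and 2 context sentences.
        for (int[] b : plan(25, 10, 2)) {
            System.out.println("send sentences " + b[0] + ".." + (b[3] - 1)
                    + ", annotate " + b[1] + ".." + (b[2] - 1));
        }
    }
}
```

With a fixed daily query limit, a larger batch size means fewer requests per document, at the cost of more text per request.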
21.6
Lupedia is a Text Enrichment Service developed by Ontotext. The service uses Ontotext's LKB Gazetteer to look up words against DBpedia and LinkedMDB (Linked Movie Database) entities. It supports multiple languages, such as English, Italian and French. As part of their service, they provide various output filters, weights and heuristics to allow accurate matching. The service is aimed at performing lookup but no named entity recognition. Ontotext's evaluation of their Lupedia API suggests that it is better than at least two other similar services: AlchemyAPI and OpenCalais (see http://www.ontotext.com/sites/default/files/publications/lupedia-eval-results.pdf for more details on their evaluation).
In GATE, we have developed a wrapper around their online API. The wrapper sends document content to the service and transforms the response into GATE annotations. The wrapper is called Lupedia Service PR and can be found under the Tagger_Lupedia plugin in GATE. Below, we describe various run time parameters of the PR.
caseSensitive: This parameter indicates whether the lookup performed against DBPedia and LinkedMDB should be case sensitive or not.
datasets: By default, the PR looks up matches of types Person, Event, Place, Organisation and Work and their subtypes as defined in the DBPedia ontology.
keepFirstAndLongestMatch: This heuristic allows performing longest match. If set to false, it will annotate every possible match.
keepHighest: It is possible to have multiple possible URIs for a given string. If this parameter is set to true, only the one with the highest score is kept and the remaining low score ones are deleted.
keepSpecific: If this parameter is set to true, only the match with the most specific URI is preserved.
lang: As specified earlier, the PR supports three languages: English, French and Italian. The lang parameter is to specify the language of the content of the document.
outputASName: The PR produces annotations of type Mention. The annotations are stored under the annotation set with the name specified through this parameter.
singleGreedyMatch: Another heuristic which affects the way the lookup procedure is carried out.
skipShortWords: If set to true, this parameter ensures that short words (less than 3 characters) are skipped.
skipStopWords: If set to true, stop words are skipped during the lookup procedure.
threshold: The PR assigns every match a score. This parameter specifies the minimum score for mentions to be considered as possible candidates.
21.7
Annotating Numbers
The Tagger_Numbers creole repository contains a number of processing resources which are designed to annotate numbers appearing within documents. As well as annotating a given span as being a number the PRs also determine the exact numeric value of the number and add this as a feature of the annotation. This makes the annotations created by these PRs ideal for building more complex annotations such as measurements or monetary units.
All the PRs in this plugin produce Number annotations with the following standard features:
type: this describes the types of tokens that make up the number, e.g. roman, words, numbers
String
32
101
3,000
3.3e3
1/4
91/2
4x103
5.5*45
thirty one
three hundred
four thousand one hundred and two
3 million
fünfundzwanzig
4 score
value: this is the actual value (stored as a Double) of the number that has been annotated
Each PR might also create other features which are described, along with the PR, in the following sections.
<config>
  <description>Basic Example</description>
  <imports>
    <url encoding="UTF-8">symbols.xml</url>
  </imports>
  <words>
    <word value="0">zero</word>
    <word value="1">one</word>
    <word value="2">two</word>
    <word value="3">three</word>
    <word value="4">four</word>
    <word value="5">five</word>
    <word value="6">six</word>
    <word value="7">seven</word>
    <word value="8">eight</word>
    <word value="9">nine</word>
    <word value="10">ten</word>
  </words>
  <multipliers>
    <word value="2">hundred</word>
    <word value="2">hundreds</word>
    <word value="3">thousand</word>
    <word value="3">thousands</word>
    ...
  </multipliers>
  <conjunctions>
    <word whole="true">and</word>
  </conjunctions>
  <decimalSymbol>.</decimalSymbol>
  <digitGroupingSymbol>,</digitGroupingSymbol>
</config>
The configuration file is an XML document that specifies the words that can be used as numbers or multipliers (such as hundred, thousand, ...) and conjunctions that can then be used to combine sequences of numbers together. An example configuration file can be seen in Figure 21.2. This configuration file specifies a handful of words and multipliers and a single conjunction. It also imports another configuration file (in the same format) defining Unicode symbols.
The words are self-explanatory but the multipliers and conjunctions need further clarification. There are three possible types of multiplier:
e: This is the default multiplier type (i.e. is used if the type is missing) and signifies base 10 exponential notation. For example, if the specified value is 2 then this is expanded to 10^2, hence converting the text 3 hundred into 3 x 10^2 or 300.
/: This type allows you to define fractions. For example you would define a half using the value 2 (i.e. you divide by 2). This allows text such as three halves to be normalized to 1.5 (i.e. 3/2). Note that you can also use this type of multiplier to specify multiples greater than one. For example, the text four score should be normalized to 80 as a score represents 20 years. To specify such a multiplier we use the fraction type with a value of 0.05. This leads to a normalized value being calculated as 4/0.05 which is 80. To determine the value use the simple formula (100/multiple)/100.
^: Multipliers of this type allow you to specify powers. For example, you could define squared with a value of 2 to allow the text three squared to be normalized to the number 9.
In English conjunctions are whole words, that is they require white space on either side of them, e.g. three hundred and one. In other languages, however, numbers can be joined into a single word using a conjunction. For example, in German the conjunction 'und' can appear in a number without white space, e.g. twenty one is written as einundzwanzig. If the conjunction is a whole word, as in English, then the whole attribute should be set to true, but for conjunctions like 'und' the attribute should be set to false.
In order to support different number formats the symbols used to group numbers and to represent the decimal point can also be configured. These are optional elements in the XML configuration file which if not supplied default to a comma for the digit group symbol and a full stop for the decimal point. Whilst these are appropriate for many languages, if you wanted, for example, to parse documents written in Bulgarian you would want to specify that the decimal symbol was a comma and the grouping symbol was a space in order to recognise numbers such as 1 000 000,303.
Once created an instance of the PR can then be configured using the following runtime parameters:
allowWithinWords: digits can often occur within words (for example part numbers, chemical equations etc.) where they should not be interpreted as numbers. If this parameter is set to true then these instances will also be annotated as numbers (useful for annotating money and measurements where spaces are often omitted), however, the parameter defaults to false.
annotationSetName: the annotation set to use as both input and output for this PR (due to the way this PR works the two sets have to be the same)
failOnMissingInputAnnotations: if the input annotations (Tokens and Sentences) are missing should this PR fail or just not do anything, defaults to true to allow obvious mistakes in pipeline configuration to be captured at an early stage.
useHintsFromOriginalMarkups: often the original markups will provide hints that may be useful for correctly interpreting numbers within documents (i.e. numeric powers may be in <sup></sup> tags), if this parameter is set to true then these hints will be used to help parse the numbers, defaults to true.
There are no extra annotation features which are specific to this numbers PR. The type feature can take one of three values based upon the text that is annotated: words, numbers, wordsAndNumbers.
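The arithmetic behind the three multiplier types described for the configuration file (e, / and ^) can be checked with a few lines of Java. This is a sketch of the calculations only; the class and method names are hypothetical and it makes no claim about how the PR parses text.

```java
// Hypothetical sketch of the three multiplier types from the configuration file.
final class MultiplierSketch {

    // Applies a multiplier of the given type ('e', '/' or '^') to a number.
    static double apply(double number, char type, double value) {
        switch (type) {
            case 'e': // base 10 exponent: 3 hundred -> 3 x 10^2
                return number * Math.pow(10, value);
            case '/': // fraction: three halves -> 3 / 2
                return number / value;
            case '^': // power: three squared -> 3^2
                return Math.pow(number, value);
            default:
                throw new IllegalArgumentException("unknown multiplier type: " + type);
        }
    }

    // The formula from the text for expressing a multiple greater than one
    // as a fraction-type value: (100 / multiple) / 100.
    static double fractionValueForMultiple(double multiple) {
        return (100.0 / multiple) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(apply(3, 'e', 2));      // 3 hundred
        System.out.println(apply(3, '/', 2));      // three halves
        System.out.println(apply(3, '^', 2));      // three squared
        double score = fractionValueForMultiple(20);
        System.out.println(apply(4, '/', score));  // four score
    }
}
```

The formula simply takes the reciprocal of the multiple, so dividing by the resulting value is the same as multiplying by the multiple itself.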
21.8
Annotating Measurements
Measurements mentioned in text documents can be difficult to deal with accurately. As well as the numerous ways in which numeric values can be written, each type of measurement (distance, area, time etc.) can be written using a variety of different units. For example, lengths can be measured in metres, centimetres, inches, yards, miles, furlongs and chains, to mention just a few. Whilst measurements may all have different units and values they can, in theory, be compared to one another. Extracting, normalizing and comparing measurements can be a useful IE process in many different domains. The Measurement Tagger (which can be found in the Tagger_Measurements plugin) attempts to provide such annotations for use within IE applications.
The Measurements Tagger uses a parser based upon a modified version of the Java port of the GNU Units package. This allows us to not only recognise and annotate spans of text as being a measurement but also to normalize the units to allow for easy comparison of different measurement values.
This PR actually produces two different annotations: Measurement and Ratio.
Measurement annotations represent measurements that involve a unit, e.g. 3mph, three pints, 4 m^3. Single measurements (i.e. those not referring to a range or interval) are referred to as scalar measurements and have the following features:
type: for scalar measurements is always scalar
unit: the unit as recognised from the text. Note that this won't necessarily be the annotated text. For example, an annotation spanning the text three miles would have a unit feature of mile.
value: a Double holding the value of the measurement (this usually comes directly from the value feature of a Number annotation).
dimension: the measurement's dimension, e.g. speed, volume, area, length, time etc.
normalizedUnit: to enable measurements of the same dimension but specified in different units to be compared the PR reduces all units to their base form. A base form usually consists of a combination of SI units. For example, centimetre, mm, and kilometre are all normalized to m (for metre).
normalizedValue: a Double instance holding the normalized value, such that the combination of the normalized value and normalized unit represent the same measurement as the original value and unit.
normalized: a String representing the normalized measurement (usually a simple space separated concatenation of the normalized value and unit).
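The relationship between the value, unit, normalizedValue, normalizedUnit and normalized features can be illustrated with a toy length converter. This is only a sketch: the real tagger relies on a parser derived from GNU Units, whereas the class name and the small conversion table below are hypothetical and cover just the length units named above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reducing length measurements to a base unit (m).
final class NormalizeSketch {

    // Illustrative conversion factors to metres; not the tagger's unit data.
    private static final Map<String, Double> TO_METRES = new HashMap<>();
    static {
        TO_METRES.put("m", 1.0);
        TO_METRES.put("mm", 0.001);
        TO_METRES.put("centimetre", 0.01);
        TO_METRES.put("kilometre", 1000.0);
    }

    // The value expressed in the base unit, as in the normalizedValue feature.
    static double normalizedValue(double value, String unit) {
        Double factor = TO_METRES.get(unit);
        if (factor == null) {
            throw new IllegalArgumentException("unknown unit: " + unit);
        }
        return value * factor;
    }

    // Space separated concatenation of normalized value and unit.
    static String normalized(double value, String unit) {
        return normalizedValue(value, unit) + " m";
    }

    public static void main(String[] args) {
        System.out.println(normalized(3, "kilometre"));
        System.out.println(normalized(250, "centimetre"));
    }
}
```

Once two measurements share a normalized unit, comparing them reduces to comparing their normalized values.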
Annotations which represent an interval or range have a slightly different set of features. The type feature is set to interval, there is no normalized or unit feature, and the value features (including the normalized version) are replaced by the following features, the values of which are simply copied from the Measurement annotations which mark the boundaries of the interval.

normalizedMinValue: a Double representing the minimum normalized number that forms part of the interval.
normalizedMaxValue: a Double representing the maximum normalized number that forms part of the interval.

Interval annotations do not replace scalar measurements and so multiple Measurement annotations may well overlap. They can of course be distinguished by the type feature.

As well as Measurement annotations the tagger also adds Ratio annotations to documents. Ratio annotations cover measurements that do not have a unit. Percentages are the most common ratios to be found in documents, but amounts such as 300 parts per million are also annotated.

A Ratio annotation has the following features:

value: a Double holding the actual value of the ratio. For example, 20% will have a value of 0.2.
numerator: the numerator of the ratio. For example, 20% will have a numerator of 20.
denominator: the denominator of the ratio. For example, 20% will have a denominator of 100.

An instance of the measurements tagger is created using the following initialization parameters:

commonURL: this file defines units that are also common words and so should not be annotated as a measurement unless they form a compound unit involving two or more unit symbols. For example, C is the accepted abbreviation for coulomb but often appears in documents as part of a reference to a table or figure, i.e. Figure 3C, which should not be annotated as a measurement. The default file was hand tuned over a large patent corpus but may need to be edited when used with different domains.
encoding: the encoding to use when reading both of the configuration files, defaults to UTF-8.
The PR does not attempt to recognise or annotate numbers; instead it relies on Number annotations being present in the document. Whilst these annotations could be generated by any resource executed prior to the measurements tagger, we recommend using the Numbers Tagger described in Section 21.7. If you choose to produce Number annotations in some other way, note that they must have a value feature containing a Double representing the value of the number. An example GATE application, showing how to configure and use the two PRs together, is provided with the measurements plugin.

Once created, an instance of the tagger can be configured using the following runtime parameters:

consumeNumberAnnotations: if true then Number annotations used to find measurements will be consumed and removed from the document, defaults to true.
failOnMissingInputAnnotations: if the input annotations (Tokens) are missing should this PR fail or just not do anything, defaults to true to allow obvious mistakes in pipeline configuration to be captured at an early stage.
ignoredAnnotations: a list of annotation types in which a measurement can never occur, defaults to a set containing Date and Money.
inputASName: the annotation set used as input to this PR.
outputASName: the annotation set to which new annotations will be added.

The ability to prevent the tagger from annotating measurements which occur within other annotations is a very useful feature. The runtime parameters, however, only allow you to specify the names of annotations and not to restrict on feature values or any other information you may know about the documents being processed. Internally, ignoring sections of a document is controlled by adding CannotBeAMeasurement annotations that span the text to be ignored. If you need greater control over the process than the ignoredAnnotations parameter allows then you can create CannotBeAMeasurement annotations prior to running the measurement tagger, for example with a JAPE grammar placed before the tagger in the pipeline. Note that these annotations will be deleted by the measurements tagger once processing has completed.
21.9 Date Normalizer
Many information extraction tasks benefit from or require the extraction of accurate date information. While ANNIE (Chapter 6) does produce Date annotations, no attempt is made to normalize these dates, i.e. to firmly fix all dates, even partial or relative ones, to a timeline using a common date representation. The PR in the Tagger_DateNormalizer plugin attempts to fill this gap by normalizing dates against the date of the document (see below for details on how this is determined) in order to tie each Date annotation to a specific date. This includes normalizing dates such as April 1st, today, yesterday, and next Tuesday, as well as converting fully specified dates (ones in which the day, month and year are specified) into a common format.

Different cultures/countries have different conventions for writing dates, and different languages use different words for the days of the week and the months of the year. The parser underlying this PR makes use of this locale-specific information when parsing documents. When initializing an instance of the Date Normalizer you can specify the locale to use using ISO language and country codes along with Java specific variants (for details of these codes see the Java Locale documentation). So for example, to specify British English (which means the day usually comes before the month in a date) use en_GB, or for American English (where the month usually appears before the day in a date) specify en_US. If you need to override the locale on a per document basis then you can do this by setting a document feature called locale to a string encoded as above. If neither the initialization parameter nor the document feature is present, or they do not represent a valid locale, then the default locale of the JVM running GATE will be used.

Once initialized and added to a pipeline the Date Normalizer has the following runtime parameters that can be used to control its behaviour.

annotationName: the annotation type created by this PR, defaults to Date.
dateFormat: the format that dates should be normalized to. The format of this parameter is the same as that used by the Java SimpleDateFormat, whose documentation describes the full range of possible formats (note you must use MM for month and not mm). This defaults to dd/MM/yyyy. Note that this parameter is only required if the numericOutput parameter is set to false.
failOnMissingInputAnnotations: if the input annotations (Tokens) are missing should this PR fail or just not do anything, defaults to true to allow obvious mistakes in pipeline configuration to be captured at an early stage.
inputASName: the annotation set used as input to this PR.
normalizedDocumentFeature: if set then the normalized version of the document date will be stored in a document feature with this name. This parameter defaults to normalized-date although it can be left blank to suppress storage of the document date.
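The warning about MM versus mm in the dateFormat parameter can be checked with plain java.text.SimpleDateFormat, independently of GATE. The sketch below is only an illustration: the class name, the fixed sample date (1 April 2011, 09:30 UTC) and the helper methods are assumptions, not part of the plugin.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class DateFormatSketch {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Formats a date with the given SimpleDateFormat pattern, pinned to UTC.
    static String format(String pattern, Date date) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, Locale.UK);
        formatter.setTimeZone(UTC);
        return formatter.format(date);
    }

    // A fixed sample instant: 1 April 2011, 09:30:00 UTC.
    static Date sampleDate() {
        Calendar calendar = new GregorianCalendar(UTC, Locale.UK);
        calendar.clear();
        calendar.set(2011, Calendar.APRIL, 1, 9, 30, 0);
        return calendar.getTime();
    }

    public static void main(String[] args) {
        // MM is the month of the year.
        System.out.println(format("dd/MM/yyyy", sampleDate()));
        // mm is the minute of the hour, so this pattern silently prints the minutes.
        System.out.println(format("dd/mm/yyyy", sampleDate()));
    }
}
```

With the lowercase pattern no error is raised; the minutes simply appear where the month was expected, which is why the mistake is easy to miss.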
It is important to note that this plugin creates new Date annotations and so if you run it in the same pipeline as the ANNIE NE Transducer you will likely end up with overlapping Date annotations. Depending on your needs it may be that you need a JAPE grammar to delete ANNIE Date annotations before running this PR. In practice we have found that the Date annotations added by ANNIE can be a good source of document dates and so a JAPE grammar that uses ANNIE Dates to add new DocumentDate annotations and to delete other Date annotations can be a useful step before running this PR.

The annotations created by this PR have the following features:

normalize: the normalized date in the format specified through the relevant runtime parameters of the PR.
inferred: an integer which specifies which parts of the date had to be inferred. The value is actually a bit mask created from the following flags: day = 1, month = 2, and year = 4. You can find which (if any) flags are set by using the code (inferred & FLAG) == FLAG, i.e. to see if the day of the month had to be inferred you would do (inferred & 1) == 1.
complete: if no part of the date had to be inferred (i.e. inferred = 0) then this will be true, false otherwise.
relative: can take the values past, present or future to show how this specific date relates to the document date.
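The bit mask stored in the inferred feature can be decoded with a few lines of Java. This is a small sketch of the test given above, (inferred & FLAG) == FLAG; the class, constant and method names are hypothetical and only the flag values (day = 1, month = 2, year = 4) come from the text.

```java
final class InferredFlagsSketch {
    // Flag values as listed for the inferred feature.
    static final int DAY = 1;
    static final int MONTH = 2;
    static final int YEAR = 4;

    // True if the given part of the date had to be inferred.
    static boolean isInferred(int inferred, int flag) {
        return (inferred & flag) == flag;
    }

    // Mirrors the complete feature: nothing had to be inferred.
    static boolean isComplete(int inferred) {
        return inferred == 0;
    }

    public static void main(String[] args) {
        int inferred = DAY | YEAR; // e.g. only the month appeared in the text
        System.out.println("day inferred:   " + isInferred(inferred, DAY));
        System.out.println("month inferred: " + isInferred(inferred, MONTH));
        System.out.println("year inferred:  " + isInferred(inferred, YEAR));
        System.out.println("complete:       " + isComplete(inferred));
    }
}
```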
21.10 Snowball Based Stemmers
The stemmer plugin, 'Stemmer_Snowball', consists of a set of stemmer PRs for the following European languages: Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish and Swedish. These take the form of wrappers for the Snowball stemmers freely available from http://snowball.tartarus.org. Each Token is annotated with a new feature 'stem', with the stem for that word as its value. The stemmers should be run like other PRs, on a document that has been tokenised.

There are three runtime parameters which should be set prior to executing the stemmer on a document.

annotationType: this is the type of annotations that represent tokens in the document. The default value is 'Token'.
annotationFeature: this is the name of a feature that contains the tokens' strings. The stemmer uses the value of this feature as the string to be stemmed. The default value is 'string'.
annotationSetName: this is where the stemmer expects the annotations of the type specified in the annotationType parameter to be.
21.10.1 Algorithms
The stemmers are based on the Porter stemmer for English [Porter 80], with rules implemented in Snowball, e.g.

define Step_1a as (
    [substring] among (
        'sses' (<-'ss')
        'ies'  (<-'i')
        'ss'   ()
        's'    (delete)
    )
)
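For readers unfamiliar with Snowball notation, the Step_1a rule shown above can be restated in plain Java. This is a hand-written sketch of that single step of the Porter algorithm, not the code generated by Snowball or shipped in the plugin; the class and method names are assumptions. The suffixes are tested longest first, which is what the among construct does.

```java
final class PorterStep1aSketch {
    // Applies Porter step 1a to a single lower-case word.
    static String apply(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2); // sses -> ss
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2); // ies -> i
        }
        if (word.endsWith("ss")) {
            return word;                                 // ss -> ss (no change)
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1); // s -> (deleted)
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "caress", "cats"}) {
            System.out.println(w + " -> " + apply(w));
        }
    }
}
```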
21.11 GATE Morphological Analyzer
The Morphological Analyser PR can be found in the Tools plugin. It takes as input a tokenized GATE document. Considering one token and its part of speech tag at a time, it identifies its lemma and an affix. These values are then added as features on the Token annotation. Morpher is based on certain regular expression rules. These rules were originally implemented by Kevin Humphreys in GATE1 in a programming language called Flex. Morpher has the capability to interpret these rules, with the extension of allowing users to add new rules or modify the existing ones based on their requirements. In order to allow these operations with as little effort as possible, we changed the way these rules are written. More information on how to write these rules is given later in Section 21.11.1.

Two types of parameters, init-time and run-time, are required to instantiate and execute the PR.

rulesFile (init-time): the rule file has several regular expression patterns. Each pattern has two parts, L.H.S. and R.H.S. The L.H.S. defines the regular expression and the R.H.S. the function name to be called when the pattern matches the word under consideration. Please see 21.11.1 for more information on the rule file.
caseSensitive (init-time): by default, all tokens under consideration are converted into lowercase to identify their lemma and affix. If the user sets caseSensitive to true, words are no longer converted into lowercase.
document (run-time): the document must be an instance of a GATE document.
affixFeatureName (run-time): name of the feature that should hold the affix value.
rootFeatureName (run-time): name of the feature that should hold the root value.
annotationSetName (run-time): name of the annotation set that contains Tokens.
considerPOSTag (run-time): each rule in the rule file has a separate tag, which specifies which rule to consider with what part-of-speech tag. If this option is set to false, all rules are considered and matched with all words. This option is very useful. For example, the word "singing" can be used as a noun as well as a verb. In the case where it is identified as a verb, its lemma would be "sing" and the affix "ing", but otherwise there would not be any affix.
failOnMissingInputAnnotations (run-time): if set to true (the default) the PR will terminate with an Exception if none of the required input Annotations are found in a document. If set to false the PR will not terminate and instead logs a single warning message per session and a debug message per document that has no input annotations.
Variables
The user can define various types of variables under the section defineVars. These variables can be used as a part of the regular expressions in rules. There are three types of variables:

1. Range: with this type of variable, the user can specify a range of characters, e.g. A ==> [-a-z0-9]
2. Set: with this type of variable, the user can also specify a set of characters, where one character at a time from this set is used as a value for the given variable. When this variable is used in any regular expression, all values are tried one by one to generate the string which is compared with the contents of the document, e.g. A ==> [abcdqurs09123]
3. Strings: whereas in the two types explained above variables can hold only one character from the given set or range at a time, this type allows strings to be specified as possibilities for the variable, e.g. A ==> "bb" OR "cc" OR "dd"
Rules
All rules are declared under the section defineRules. Every rule has two parts, LHS and RHS. The LHS specifies the regular expression and the RHS the function to be called when the LHS matches the given word. '==>' is used as the delimiter between the LHS and RHS.

The LHS consists of a part-of-speech tag in angle brackets followed by a regular expression. Strings, variables and the Kleene operators '+' and '*' can be used to generate the regular expressions. Below we give a few examples of L.H.S. expressions.

<verb>"bias"
<verb>"canvas"{ESEDING}
"ESEDING" is a variable defined under the defineVars section. Note: variables are enclosed with "{" and "}".
<noun>({A}*"metre")
"A" is a variable followed by the Kleene operator "*", which means "A" can occur zero or more times.
<noun>({A}+"itis")
"A" is a variable followed by the Kleene operator "+", which means "A" can occur one or more times.
<*>"aches"
"<*>" indicates that the rule should be considered for all part-of-speech tags.

On the RHS of the rule, the user has to specify one of the functions listed below. These functions are hard-coded in the Morph PR in GATE and are invoked if the regular expression on the LHS matches a particular word.

stem(n, string, affix)
where
n = number of characters to be truncated from the end of the string.
string = the string that should be concatenated after the word to produce the root.
affix = affix of the word.

irregstem(root, affix)
where
root = root of the word.
affix = affix of the word.
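The effect of the two RHS functions can be sketched in plain Java. This is not the Morph PR's implementation; the class, the Analysis holder and the example rule arguments are assumptions chosen to match the parameter descriptions above (truncate n characters from the end of the word, append string to obtain the root, and record the affix).

```java
final class MorphFunctionsSketch {
    // Root and affix pair, as they would be stored on a Token.
    static final class Analysis {
        final String root;
        final String affix;

        Analysis(String root, String affix) {
            this.root = root;
            this.affix = affix;
        }
    }

    // stem(n, string, affix): drop n trailing characters, then append 'string'.
    static Analysis stem(String word, int n, String string, String affix) {
        if (n < 0 || n > word.length()) {
            throw new IllegalArgumentException("cannot truncate " + n + " characters from " + word);
        }
        String root = word.substring(0, word.length() - n) + string;
        return new Analysis(root, affix);
    }

    // irregstem(root, affix): the rule itself supplies the root for an irregular form.
    static Analysis irregStem(String root, String affix) {
        return new Analysis(root, affix);
    }

    public static void main(String[] args) {
        Analysis singing = stem("singing", 3, "", "ing"); // hypothetical rule arguments
        System.out.println(singing.root + " + " + singing.affix);
        Analysis flies = stem("flies", 3, "y", "s");      // hypothetical rule arguments
        System.out.println(flies.root + " + " + flies.affix);
    }
}
```

In the first call the verb "singing" loses its last three characters and nothing is appended, giving the lemma "sing" and the affix "ing" mentioned in the considerPOSTag description.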
21.12 Flexible Exporter
The Flexible Exporter enables the user to save a document (or corpus) in its original format with added annotations. The user can select the name of the annotation set from which these annotations are to be taken, which annotations from this set are to be included, whether features are to be included, and various renaming options such as renaming the annotations and the file.

At load time, the following parameters can be set for the flexible exporter:

includeFeatures - if set to true, features are included with the annotations exported; if false (the default status), they are not.
useSuffixForDumpFiles - if set to true (the default status), the output files have the suffix defined in suffixForDumpFiles; if false, no suffix is defined, and the output file simply overwrites the existing file (but see the outputFileUrl runtime parameter for an alternative).
suffixForDumpFiles - this defines the suffix if useSuffixForDumpFiles is set to true. By default the suffix is .gate.
useStandOffXML - if true then the format will be the GATE XML format that separates nodes and annotations inside the file, which allows overlapping annotations to be saved.

The following runtime parameters can also be set (after the file has been selected for the application):

annotationSetName - this enables the user to specify the name of the annotation set which contains the annotations to be exported. If no annotation set is defined, it will use the Default annotation set.
annotationTypes - this contains a list of the annotations to be exported. By default it is set to Person, Location and Date.
dumpTypes - this contains a list of names for the exported annotations. If the annotation name is to remain the same, this list should be identical to the list in annotationTypes. The list of annotation names must be in the same order as the corresponding annotation types in annotationTypes.
outputDirectoryUrl - this enables the user to specify the export directory where the file is exported with its original name and an extension (provided as a parameter) appended at the end of the filename. Note that you can also save a whole corpus in one go. If not provided, the temporary directory is used.
21.13 Configurable Exporter
The Configurable Exporter allows the user to export arbitrary annotation texts and feature values according to a format specified in a configuration file. It is written with machine learning in mind, where features might be required in a comma separated format or similar, though it could be equally well applied to any purpose where data are required in a spreadsheet format or a simple format for further processing. An example of the kind of output that can be obtained using the PR is given below, although significant variation on the theme is possible, showing typical instance IDs, classes and attributes:

..., A, "Some text .."
..., A, "Some more text .."
..., B, "Further text .."
..., B, "Additional text .."
..., B, "Yet more text .."

Central to the PR is the concept of an instance; each line of output will relate to an instance, which might be a document for example, or an annotation type within a GATE document such as a sentence, tweet, or indeed any other annotation type. The instance is specified as a runtime parameter (see below). Whatever you want one per line of, that is your instance.

The PR has one required initialisation parameter, which is the location of the configuration file. If you edit your configuration file, you must reinitialise the PR. The configuration file comprises a single line specifying the output format. Annotation and feature names are surrounded by triple angle brackets, indicating that they are to be replaced with the annotation/feature. The rest of the text in the configuration file is passed unchanged into the output file. Where an annotation type is specified without a feature, the text spanned by that annotation will be used. Dot notation is used to indicate that a feature value is to be used. The example output given above might be obtained by a configuration file something like this, in which index, class and content are annotation types:

<<<index>>>, <<<class>>>, "<<<content>>>"
The PR has the following runtime parameters:

inputASName - this is the annotation set which will be used to create the export file. All annotations must be in this set, both instance annotations and export annotations. If left blank, the default annotation set will be used.
instanceName - this is the annotation type to be used as instance. If left blank, the document will be used as instance.
outputURL - this is the location of the output file to which the data will be exported. If left blank, data will be output to the messages tab/standard out.

Note that where more than one annotation of the specified type occurs within the span of the instance annotation, the first will be used to create the output. Outputting more than one annotation of the same type per instance is not currently supported. If you need to export, for example, all the words in the sentence, then you would have to export the sentence rather than the individual words.
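The substitution performed on the configuration line (names in triple angle brackets replaced, all other text passed through unchanged) can be sketched without GATE. The code below is an illustration under stated assumptions: the class and method names are hypothetical, the values come from a plain map rather than from annotations, and a name with no value is replaced by an empty string, which may differ from what the PR does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExportTemplateSketch {
    // A name (optionally using dot notation) enclosed in triple angle brackets.
    private static final Pattern PLACEHOLDER = Pattern.compile("<<<(.+?)>>>");

    // Builds one output line for one instance from the configuration line.
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer line = new StringBuffer();
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), "");
            // quoteReplacement stops '$' and '\' in the value being treated specially.
            matcher.appendReplacement(line, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(line);
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, String> instance = new HashMap<>();
        instance.put("index", "7");               // hypothetical instance ID
        instance.put("class", "A");
        instance.put("content", "Some text ..");
        System.out.println(render("<<<index>>>, <<<class>>>, \"<<<content>>>\"", instance));
    }
}
```

In the real PR each instance annotation would supply its own set of values, so the same configuration line yields one output line per instance.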
21.14 Annotation Set Transfer
The Annotation Set Transfer allows copying or moving annotations to a new annotation set if they lie between the beginning and the end of an annotation of a particular type (the covering annotation). For example, this can be used when a user only wants to run a processing resource over a specific part of a document, such as the Body of an HTML document. The user specifies the name of the annotation set and the annotation which covers the part of the document they wish to transfer, and the name of the new annotation set. All the other annotations corresponding to the matched text will be transferred to the new annotation set. For example, we might wish to perform named entity recognition on the body of an HTML text, but not on the headers. After tokenising and performing gazetteer lookup on the whole text, we would use the Annotation Set Transfer to transfer those annotations (created by the tokeniser and gazetteer) into a new annotation set, and then run the remaining NE resources, such as the semantic tagger and coreference modules, on them.

The Annotation Set Transfer has no loadtime parameters. It has the following runtime parameters:

inputASName - this defines the annotation set from which annotations will be transferred (copied or moved). If nothing is specified, the Default annotation set will be used.
For example, suppose we wish to perform named entity recognition on only the text covered by the BODY annotation from the Original Markups annotation set in an HTML document. We have to run the gazetteer and tokeniser on the entire document, because these resources do not depend on any other annotations and so we cannot specify an input annotation set for them to use. We therefore transfer these annotations to a new annotation set (Filtered) and then perform the NE recognition over these annotations, by specifying this annotation set as the input annotation set for all the following resources. In this example, we would set the following parameters (assuming that the annotations from the tokeniser and gazetteer are initially placed in the Default annotation set).

inputASName: Default
outputASName: Filtered
tagASName: Original markups
copyAnnotations: true or false (depending on whether we want to keep the Token and Lookup annotations in the Default annotation set)
copyAllUnlessFound: true

The AST PR makes a shallow copy of the feature map for each transferred annotation, i.e. it creates a new feature map containing the same keys and values as the original. It does not clone the feature values themselves, so if your annotations have a feature whose value is a collection and you need to make a deep copy of the collection value then you will not be able to use the AST PR to do this. Similarly, if you are copying annotations and do in fact want to share the same feature map between the source and target annotations then the AST PR is not appropriate. In these sorts of cases a JAPE grammar or Groovy script would be a better choice.
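The practical consequence of a shallow copy can be demonstrated with ordinary Java maps standing in for GATE feature maps. The sketch below is an assumption-laden illustration, not the AST PR's code: the class and method names are hypothetical and the deep-copy helper only handles List values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FeatureCopySketch {
    // Shallow copy: a new map, but the very same value objects.
    static Map<String, Object> shallowCopy(Map<String, Object> features) {
        return new HashMap<>(features);
    }

    // One level deeper: List values are duplicated, everything else is shared.
    static Map<String, Object> copyWithListsCloned(Map<String, Object> features) {
        Map<String, Object> copy = new HashMap<>();
        for (Map.Entry<String, Object> entry : features.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof List) {
                copy.put(entry.getKey(), new ArrayList<Object>((List<?>) value));
            } else {
                copy.put(entry.getKey(), value);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<Object> matches = new ArrayList<>();
        matches.add(1);
        Map<String, Object> source = new HashMap<>();
        source.put("matches", matches);

        Map<String, Object> shallow = shallowCopy(source);
        Map<String, Object> deeper = copyWithListsCloned(source);
        matches.add(2); // modify the collection held by the source annotation

        System.out.println("shallow copy sees: " + shallow.get("matches"));
        System.out.println("cloned list sees:  " + deeper.get("matches"));
    }
}
```

After the source list is modified, the shallow copy reflects the change because both maps point at the same list, while the cloned list does not.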
21.15 Schema Enforcer
One common use of the Annotation Set Transfer (AST) PR (see Section 21.14) is to create a 'clean' or final annotation set for a GATE application, i.e. an annotation set containing only those annotations which are required by the application without any temporary or intermediate annotations which may also have been created. Whilst really useful, the AST suffers from two problems: 1) it can be complex to configure and 2) it offers no support for modifying or removing features of the annotations it copies.

Many GATE applications are developed through a process which starts with experts manually annotating documents in order for the application developer to understand what is required and which can later be used for testing and evaluation. This is usually done using either GATE Teamware or within GATE Developer using the Schema Annotation Editor (Section 3.4.6). Either approach requires that each of the annotation types being created is described by an XML based Annotation Schema. The Schema Enforcer (part of the Schema_Tools plugin) uses these same schemas to create an annotation set, the contents of which strictly match the provided schemas.

The Schema Enforcer will copy an annotation if and only if:

the type of the annotation matches one of the supplied schemas
all required features are present and valid (i.e. meet the requirements for being copied to the 'clean' annotation)

Each feature of an annotation is copied to the new annotation if and only if it conforms to the schema describing that annotation.
The Schema Enforcer has no initialization parameters and is configured via the following runtime parameters:

inputASName - this defines the annotation set from which annotations will be copied. If nothing is specified, the default annotation set will be used.
outputASName - this defines the annotation set to which the annotations will be transferred. This must be an empty or non-existent annotation set.
schemas - a list of schemas that will be enforced when duplicating the input annotation set.
useDefaults - if true then the default value for required features (specified using the value attribute in the XML schema) will be used to help complete an otherwise invalid annotation, defaults to false.

Whilst this PR makes the creation of a clean output set easy (given the schemas) it is worth noting that schemas can only define features which have basic types: string, integer, boolean, float, double, short, and byte. This means that you cannot define a feature which has an object as its value. For example, this prevents you defining a feature as a list of numbers. If this is an issue then it is trivial to write JAPE to copy extra features not specified in the schemas, as the annotations have the same ID in both the input and output annotation sets. An example JAPE file for copying the matches feature created by the Orthomatcher (see Section 6.8) is provided.
21.16 Information Retrieval in GATE
GATE comes with a full-featured Information Retrieval (IR) subsystem that allows queries to be performed against GATE corpora. This combination of IE and IR means that documents can be retrieved from the corpora not only based on their textual content but also according to their features or annotations. For example, a search over the Person annotations for 'Bush' will return documents with higher relevance, compared to a search in the content for the string 'bush'. The current implementation is based on the most popular open source full-text search engine, Lucene (available at http://jakarta.apache.org/lucene/), but other implementations may be added in the future.
An Information Retrieval system is most often considered a system that accepts as input a set of documents (corpus) and a query (combination of search terms) and returns as output only those documents from the corpus which are considered relevant according to the query. Usually, in addition to the documents, a proper relevance measure (score) is returned for each document. There exist many relevance metrics, but usually documents which are considered more relevant, according to the query, are scored higher.

Figure 21.3 shows the results from running a query against an indexed corpus in GATE.
Figure 21.3: Documents with scores, returned from a search over a corpus

Information Retrieval systems usually perform some preprocessing on the input corpus in order to create the document-term matrix for the corpus. A document-term matrix is usually presented as in Table 21.2, where doc_i is a document from the corpus, term_j is a word that is considered important and representative for the document and w_{i,j} is the weight assigned to the term in the document. There are many ways to define the term weight function, but most often it depends on the term frequency in the document and in the whole corpus (i.e. the local and the global frequency). Note that the machine learning plugin described in Chapter 18 can produce such a document-term matrix (for a detailed description of the matrix produced, see Section 18.2.4).

Note that not all of the words appearing in the document are considered terms. There are many words (called 'stop-words') which are ignored, since they are observed too often and are not representative enough. Such words are articles, conjunctions, etc. During the preprocessing phase which identifies such words, usually a form of stemming is performed in order to minimize the number of terms and to improve the retrieval recall. Various forms of the same word (e.g. 'play', 'playing' and 'played') are considered identical and multiple occurrences of the same term (probably 'play') will be observed.

It is recommended that the user reads the relevant Information Retrieval literature for a detailed explanation of stop words, stemming and term weighting.

IR systems, in a way similar to IE systems, are evaluated with the help of the precision and recall measures (see Section 10.1 for more details).
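To make the idea of a weight w_{i,j} that depends on both local and global frequency concrete, here is a minimal sketch that builds a document-term matrix using the common tf-idf weighting (term frequency in the document multiplied by the logarithm of the inverse document frequency). This particular formula is an assumption chosen for illustration; it is not claimed to be the weighting used by Lucene or by the GATE IR subsystem, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DocTermMatrixSketch {
    // Returns w[i][j]: the tf-idf weight of terms.get(j) in docs.get(i).
    static double[][] weights(List<List<String>> docs, List<String> terms) {
        int n = docs.size();
        double[][] w = new double[n][terms.size()];
        for (int j = 0; j < terms.size(); j++) {
            String term = terms.get(j);
            // Global frequency: number of documents that contain the term.
            int documentFrequency = 0;
            for (List<String> doc : docs) {
                if (doc.contains(term)) {
                    documentFrequency++;
                }
            }
            double idf = documentFrequency == 0 ? 0.0 : Math.log((double) n / documentFrequency);
            for (int i = 0; i < n; i++) {
                // Local frequency: occurrences of the term in this document.
                int termFrequency = Collections.frequency(docs.get(i), term);
                w[i][j] = termFrequency * idf;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("play", "play", "game"),
                Arrays.asList("game", "score"));
        List<String> terms = Arrays.asList("play", "game");
        for (double[] row : weights(docs, terms)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Note how a term that occurs in every document ('game' here) receives a weight of zero under this scheme, which is the same intuition that motivates ignoring stop-words.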
To index a corpus, specify the index location and the set of properties that will be indexed, such as document features, content, etc. (the same properties will be indexed for each document in the corpus). Once the corpus is indexed, you may start running queries against it. Note that the directory specified for the index data should exist and be empty. Otherwise an error will occur during the index creation.
Figure 21.4: Indexing a corpus by specifying the index location and indexed features (and content)
The corpus that will be queried.
The query that will be executed.
The maximum number of documents returned.
+body:government +author:CNN

will inspect the document content for the term 'government' (together with variations such as 'governments' etc.) and the index field named 'author' for the term 'CNN'. The 'author' field is specified at index creation time, and is either a document feature or another document property.

After the Search PR is initialized, running the application executes the specified query over the specified corpus. Finally, the results are displayed (see Figure 21.3) after a double-click on the Search processing resource.
import gate.*;
import gate.creole.ir.*;
import gate.creole.ir.lucene.*;
import gate.persist.SerialDataStore;
import java.net.URL;
import java.util.Iterator;

// open a serial datastore (serial datastores need a file: URL)
SerialDataStore sds = (SerialDataStore) Factory.openDataStore(
    "gate.persist.SerialDataStore", "file:///tmp/datastore1");
sds.open();

// create a test document and set its author feature
Document doc0 = Factory.newDocument(new URL("file:///tmp/documents/doc0.html"));
doc0.getFeatures().put("author", "John Smith");

// put the document in a corpus and store the corpus in the datastore
Corpus corp0 = Factory.newCorpus("TestCorpus");
corp0.add(doc0);
Corpus serialCorpus = (Corpus) sds.adopt(corp0, null);
sds.sync(serialCorpus);

// index the document content and the author feature
IndexedCorpus indexedCorpus = (IndexedCorpus) serialCorpus;
DefaultIndexDefinition did = new DefaultIndexDefinition();
did.setIrEngineClassName(
    gate.creole.ir.lucene.LuceneIREngine.class.getName());
did.setIndexLocation("/tmp/index1");
did.addIndexField(new IndexField("content", new DocumentContentReader(), false));
did.addIndexField(new IndexField("author", null, false));
indexedCorpus.setIndexDefinition(did);
indexedCorpus.getIndexManager().createIndex();

// the corpus is now indexed
// search the corpus
Search search = new LuceneSearch();
search.setCorpus(indexedCorpus);
QueryResultList res = search.search("+content:government +author:John");

// print the ID and score of each document returned
Iterator it = res.getQueryResults();
while (it.hasNext()) {
  QueryResult qr = (QueryResult) it.next();
  System.out.println("DOCUMENT_ID=" + qr.getDocumentID()
      + ", score=" + qr.getScore());
}
21.17 Websphinx Web Crawler
The 'Web_Crawler_Websphinx' plugin enables GATE to build a corpus from a web crawl. It is based on Websphinx, a JAVA-based, customizable, multi-threaded web crawler.

Note: if you are using this plugin via an IDE, you may need to make sure that the websphinx.jar file is on the IDE's classpath, or add it to the IDE's lib directory.

The basic idea is to specify a source URL (or set of documents created from web URLs) and a depth and maximum number of documents to build the initial corpus upon which further processing could be done. The PR itself provides a number of other parameters to regulate the crawl.
This PR now uses the HTTP Content-Type headers to determine each web page's encoding and MIME type before creating a GATE Document from it. It also adds to each document a Date feature (with a java.util.Date value) based on the HTTP Last-Modified header (if available) or the current timestamp, an originalMimeType feature taken from the Content-Type header, and an originalLength feature indicating the size in bytes of the downloaded document.
depth - the depth (integer) to which the crawl should proceed.
dfs - a boolean: if true the crawler visits links with a depth-first strategy; if false the crawler visits links with a breadth-first strategy.
domain - an enum value, presented as a pull-down list in the GUI:
    SUBTREE: the crawler visits only the descendents of the pages specified as the roots for the crawl.
    WEB: the crawler can visit any pages on the web.
    SERVER: the crawler can visit only pages that are present on the server where the root pages are located.
max - the maximum number (integer) of pages to be kept: the crawler will stop when it has stored this number of documents in the output corpus. Use -1 to ignore this limit.
maxPageSize - the maximum page size in kB; pages over this limit will be ignored (even as roots of the crawl) and their links will not be crawled. If your crawl does not add any documents (even the seeds) to the output corpus, try increasing this value. (A 0 or negative value here means no limit.)
stopAfter - the maximum number (integer) of pages to be fetched: the crawler will stop when it has visited this number of pages. Use -1 to ignore this limit. If max > stopAfter > 0 then the crawl will store at most stopAfter (not max) documents.
root - a string containing one URL to start the crawl.
source - a corpus that contains the documents whose gate.sourceURL features will be used to start the crawl. If you use both root and source parameters, both the root value and the URLs collected from the source documents will seed the crawl.
outputCorpus - the corpus in which the fetched documents will be stored.
keywords - a List<String> for matching against crawled documents. If this list is empty or null, all documents fetched will be kept. Otherwise, only documents that contain one of these strings will be stored in the output corpus. (Documents that are fetched but not kept are still scanned for further links.)
keywordsCaseSensitive - this boolean determines whether keyword matching is case-sensitive or not.
convertXmlTypes - GATE's XmlDocumentFormat only accepts certain MIME types. If this parameter is true, the crawl converts other XML types (such as application/atom+xml.xml) to text/xml before trying to instantiate the GATE document (this allows GATE to handle RSS feeds, for example).
userAgent - if this parameter is blank, the crawler will use the default Websphinx user-agent header. Set this parameter to spoof the header.

Once the parameters are set, the crawl can be run and the documents fetched (and matched to the keywords, if that list is in use) are added to the specified corpus. Documents that are fetched but not matched are discarded after scanning them for further links.

Note that you must use a simple Pipeline, and not a Corpus Pipeline. In order to process the corpus of crawled documents, you need to build a separate Corpus Pipeline and run it after crawling. You could combine the two functions by carefully developing a Scriptable Controller (see section 7.17.3 for details).
21.18 WordNet in GATE
GATE currently supports versions 1.6 and newer of WordNet, so in order to use WordNet in GATE, you must first install a compatible version of WordNet on your computer. WordNet is available at http://wordnet.princeton.edu/. The next step is to configure GATE to work with your local WordNet installation. Since GATE relies on the Java WordNet Library (JWNL) for WordNet access, this step consists of providing one special xml file that is used internally by JWNL. This file describes the location of your local copy of the WordNet index files. An example of this wn-config.xml file is shown below:
2 See http://docs.oracle.com/javase/6/docs/technotes/guides/net/proxies.html
<?xml version="1.0" encoding="UTF-8"?>
<jwnl_properties language="en">
  <version publisher="Princeton" number="3.0" language="en"/>
  <dictionary class="net.didion.jwnl.dictionary.FileBackedDictionary">
    <param name="morphological_processor"
           value="net.didion.jwnl.dictionary.morph.DefaultMorphologicalProcessor">
      <param name="operations">
        <param value="net.didion.jwnl.dictionary.morph.LookupExceptionsOperation"/>
        <param value="net.didion.jwnl.dictionary.morph.DetachSuffixesOperation">
          <param name="noun"
                 value="|s=|ses=s|xes=x|zes=z|ches=ch|shes=sh|men=man|ies=y|"/>
          <param name="verb"
                 value="|s=|ies=y|es=e|es=|ed=e|ed=|ing=e|ing=|"/>
          <param name="adjective" value="|er=|est=|er=e|est=e|"/>
          <param name="operations">
            <param value="net.didion.jwnl.dictionary.morph.LookupIndexWordOperation"/>
            <param value="net.didion.jwnl.dictionary.morph.LookupExceptionsOperation"/>
          </param>
        </param>
        <param value="net.didion.jwnl.dictionary.morph.TokenizerOperation">
          <param name="delimiters">
            <param value=" "/>
            <param value="-"/>
          </param>
          <param name="token_operations">
            <param value="net.didion.jwnl.dictionary.morph.LookupIndexWordOperation"/>
            <param value="net.didion.jwnl.dictionary.morph.LookupExceptionsOperation"/>
            <param value="net.didion.jwnl.dictionary.morph.DetachSuffixesOperation">
              <param name="noun"
                     value="|s=|ses=s|xes=x|zes=z|ches=ch|shes=sh|men=man|ies=y|"/>
              <param name="verb"
                     value="|s=|ies=y|es=e|es=|ed=e|ed=|ing=e|ing=|"/>
              <param name="adjective" value="|er=|est=|er=e|est=e|"/>
              <param name="operations">
                <param value="net.didion.jwnl.dictionary.morph.LookupIndexWordOperation"/>
                <param value="net.didion.jwnl.dictionary.morph.LookupExceptionsOperation"/>
              </param>
            </param>
          </param>
        </param>
      </param>
    </param>
    <param name="dictionary_element_factory"
           value="net.didion.jwnl.princeton.data.PrincetonWN17FileDictionaryElementFactory"/>
    <param name="file_manager"
           value="net.didion.jwnl.dictionary.file_manager.FileManagerImpl">
      <param name="file_type"
             value="net.didion.jwnl.princeton.file.PrincetonRandomAccessDictionaryFile"/>
      <param name="dictionary_path" value="/home/mark/WordNet-3.0/dict/"/>
    </param>
  </dictionary>
  <resource class="PrincetonResource"/>
</jwnl_properties>
SHP
There are three things in this file which you need to configure based upon the version of WordNet you wish to use. Firstly, change the number attribute of the version element to match the version of WordNet you are using. Then edit the value of the dictionary_path parameter to point to your local installation of WordNet (this is /usr/share/wordnet/ if you have installed the Ubuntu or Debian wordnet-base package).
Finally, if you want to use version 1.6 of WordNet then you also need to alter the dictionary_element_factory to use net.didion.jwnl.princeton.data.PrincetonWN16FileDiction...

For full details of the format of the configuration file see the JWNL documentation at http://sourceforge.net/projects/jwordnet.

After configuring GATE to use WordNet, you can start using the built-in WordNet browser or API. In GATE Developer, load the WordNet plugin via the Plugin Management Console. Then load WordNet by selecting it from the set of available language resources. Set the value of the parameter to the path of the xml properties file which describes the WordNet location (wn-config).

Once WordNet is loaded in GATE Developer, the well-known interface of WordNet will appear. You can search WordNet by typing a word in the box next to the label 'SearchWord' and then pressing 'Search'. All the senses of the word will be displayed in the window below. Buttons for the possible parts of speech for this word will also be activated at this point. For instance, for the word 'play', the buttons 'Noun', 'Verb' and 'Adjective' are activated. Pressing one of these buttons will activate a menu with hyponyms, hypernyms, meronyms for nouns or verb groups, and cause for verbs, etc. Selecting an item from the menu will display the results in the window below.

To upgrade any existing GATE applications to use this improved WordNet plugin, simply replace your existing configuration file with the example above and configure for WordNet 1.6. This will then give results identical to the previous version; unfortunately it was not possible to provide a transparent upgrade procedure.

More information about WordNet can be found at http://wordnet.princeton.edu/
More information about the JWNL library can be found at http://sourceforge.net/projects/jwordnet
An example of using the WordNet API in GATE is available on the GATE examples page at http://gate.ac.uk/wiki/code-repository/index.html.
WordNet: the main WordNet class. Provides methods for getting the synsets of a lemma, for accessing the unique beginners, etc.
Word: offers access to the word's lemma and senses.
WordSense: gives access to the synset, the word, POS and lexical relations.
Synset: gives access to the word senses (synonyms) in the synset, the semantic relations, POS etc.
Verb: gives access to the verb frames (not working properly at present).
Adjective: gives access to the adj. position (attributive, predicative, etc.).
Relation: abstract relation such as type, symbol, inverse relation, set of POS tags, etc. to which it is applicable.
LexicalRelation
SemanticRelation
VerbFrame
21.19
Kea is a tool for automatic detection of key phrases developed at the University of Waikato in New Zealand. The home page of the project can be found at http://www.nzdl.org/Kea/.

This user guide section only deals with the aspects relating to the integration of Kea in GATE. For the inner workings of Kea, please visit the Kea web site and/or contact its authors.

In order to use Kea in GATE Developer, the 'Keyphrase_Extraction_Algorithm' plugin needs to be loaded using the plugins management console. After doing that, two new resource types are available for creation: the 'KEA Keyphrase Extractor' (a processing resource) and the 'KEA Corpus Importer' (a visual resource associated with the PR).
document The document to be processed.
inputAS The input annotation set. This parameter is only relevant when the PR is running in training mode and it specifies the annotation set containing the keyphrase annotations.
outputAS The output annotation set. This parameter is only relevant when the PR is running in application mode (i.e. when the 'trainingMode' parameter is set to false) and it specifies the annotation set where the generated keyphrase annotations will be saved.
minPhraseLength the minimum length (in number of words) for a keyphrase.
minNumOccur the minimum number of occurrences of a phrase for it to be a keyphrase.
maxPhraseLength the maximum length of a keyphrase.
phrasesToExtract how many different keyphrases should be generated.
keyphraseAnnotationType the type of annotations used for keyphrases.
dissallowInternalPeriods should internal periods be disallowed.
Source Directory the directory containing the text and key files. This can be typed in or selected by pressing the folder button next to the text field.
Extension for text files the extension used for text files (by default .txt).
Extension for keyphrase files the extension for the files listing keyphrases.
Encoding for input files the encoding to be used when reading the files.
Corpus name the name for the GATE corpus that will be created.
Output annotation set the name for the annotation set that will contain the keyphrases read from the input files.
21.20
If we have annotations about the same subject on the same document from different annotators, we may need to merge the annotations. This plugin implements two approaches for annotation merging.

MajorityVoting selects one annotation from those annotations with the same span, which the majority of the annotators support. Note that if one annotator did not create the annotation with the particular span, we count it as one non-support of the annotation with the span. If it turns out that the majority of the annotators did not support the annotation with that span, then no annotation with the span would be put into the merged annotations.

MergingByAnnotatorNum takes a parameter numMinK and selects the annotation on which at least numMinK annotators agree. If two or more merged annotations have the same span, then the annotation with the most supporters is kept and other annotations with the same span are discarded.

The annotation merging methods are available via the Annotation Merging plugin. The plugin can be used as a PR in a pipeline or corpus pipeline. To use the PR, each document in the pipeline or the corpus pipeline should have the annotation sets for merging. The annotation merging PR has no loading parameters but has several run-time parameters, explained further below.

The annotation merging methods are implemented in the GATE API, and are available in GATE Embedded as described in Section 7.19.
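The two merging strategies described above can be sketched in a few lines of standalone Python. This is only an illustration of the selection logic, not the GATE API: the data layout (each annotator's work as a list of `(span, label)` pairs), the function names, the reading of "majority" as a strict majority of all annotators, and the tie-breaking behaviour are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def _votes_by_span(annotation_sets):
    """Map each span to a Counter of labels, one vote per annotator."""
    votes = defaultdict(Counter)
    for annotations in annotation_sets:
        # set(): an annotator supports a given (span, label) at most once
        for span, label in set(annotations):
            votes[span][label] += 1
    return votes


def merge_by_annotator_num(annotation_sets, min_k):
    """Keep, per span, the best-supported label if >= min_k annotators agree."""
    merged = {}
    for span, counter in _votes_by_span(annotation_sets).items():
        label, support = counter.most_common(1)[0]
        if support >= min_k:
            merged[span] = label
    return merged


def merge_by_majority(annotation_sets):
    """Keep, per span, the label supported by a strict majority (assumption)."""
    total = len(annotation_sets)
    merged = {}
    for span, counter in _votes_by_span(annotation_sets).items():
        label, support = counter.most_common(1)[0]
        # Annotators with no annotation on this span count as non-support.
        if support > total / 2:
            merged[span] = label
    return merged


annotator_1 = [((0, 5), "Person"), ((10, 15), "Location")]
annotator_2 = [((0, 5), "Person"), ((10, 15), "Organization")]
annotator_3 = [((0, 5), "Person")]
sets = [annotator_1, annotator_2, annotator_3]
```

With these three hypothetical annotators, only the span `(0, 5)` survives majority voting, because neither label on `(10, 15)` is supported by more than one annotator.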
Parameters

annSetOutput: the annotation set in the current document for storing the merged annotations. You should not use an existing annotation set, as the contents may be deleted or overwritten.

annSetsForMerging: the annotation sets in the document for merging. It is an optional parameter. If it is not assigned any value, the annotation sets for merging are all the annotation sets in the document except the default annotation set. If specified, it is a sequence of the names of the annotation sets for merging, separated by ';'. For example, the value 'a-1;a-2;a-3' represents three annotation sets, 'a-1', 'a-2' and 'a-3'.

annTypeAndFeats: the annotation types in the annotation sets for merging. It is an optional parameter. For each type specified, it may also specify an annotation feature of the type. The parameter is a sequence of names of annotation types, separated by ';'. A single annotation feature can be specified immediately following the annotation type's name, separated by '->' in the sequence. For example, the value 'SENT->senRel;OPINION_OPR;OPINION_SRC->type' specifies three annotation types, 'SENT', 'OPINION_OPR' and 'OPINION_SRC', and specifies the annotation features 'senRel' and 'type' for the two types SENT and OPINION_SRC, respectively, but does not specify any feature for the type OPINION_OPR. If the annTypeAndFeats parameter is not set, the annotation types for merging are all the types in the annotation sets for merging, and no annotation feature for each type is specified.

keepSourceForMergedAnnotations: should source annotations be kept in the annSetsForMerging annotation sets when merged? True by default.

mergingMethod: specifies the method used for merging. Possible values are MajorityVoting and MergingByAnnotatorNum, referring to the two merging methods described above, respectively.

minimalAnnNum: specifies the minimal number of annotators who agree on one annotation in order to put the annotation into the merged set, which is needed by the merging method MergingByAnnotatorNum. If the value of the parameter is smaller than 1, the parameter is taken as 1. If the value is bigger than the total number of annotation sets for merging, it is taken to be the total number of annotation sets. If no value is assigned, a default value of 1 is used. Note that the parameter does not have any effect on the other merging method MajorityVoting.
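To make the value syntax of these parameters concrete, the following standalone Python sketch parses the two ';'-separated strings and applies the clamping rule described for minimalAnnNum. The function names are hypothetical and the code does not call GATE; it only mirrors the rules stated in the parameter descriptions above.

```python
def parse_ann_sets(value):
    """'a-1;a-2;a-3' -> ['a-1', 'a-2', 'a-3'].

    Returns None for a blank value, meaning "all non-default sets".
    """
    if not value or not value.strip():
        return None
    return [name.strip() for name in value.split(";") if name.strip()]


def parse_types_and_feats(value):
    """'SENT->senRel;OPINION_OPR' -> {'SENT': 'senRel', 'OPINION_OPR': None}.

    Returns None for a blank value, meaning "all types, no feature".
    """
    if not value or not value.strip():
        return None
    result = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        type_name, arrow, feature = item.partition("->")
        result[type_name.strip()] = feature.strip() if arrow else None
    return result


def clamp_min_ann_num(value, num_sets):
    """Apply the documented bounds: default 1, at least 1, at most num_sets."""
    if value is None:
        return 1
    return max(1, min(value, num_sets))
```

For instance, `parse_types_and_feats("SENT->senRel;OPINION_OPR;OPINION_SRC->type")` yields a mapping in which OPINION_OPR has no feature attached.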
21.21
Sometimes a document has two copies, each of which was annotated by different annotators for the same task. We may want to copy the annotations in one copy to the other copy of the document. This could be in order to use less resources, or so that we can process them with some other plugin, such as annotation merging or IAA. The Copy_Annots_Between_Docs plugin does exactly this.

The plugin is available with the GATE distribution. When loading the plugin into GATE, it is represented as a processing resource, Copy Anns to Another Doc PR. You need to put the PR into a Corpus Pipeline to use it. The plugin does not have any initialisation parameters. It has several run-time parameters, which specify the annotations to be copied, the source documents and target documents. In detail, the run-time parameters are:

sourceFilesURL specifies a directory in which the source documents are located. The source documents must be GATE xml documents. The plugin copies the annotations from these source documents to target documents.
inputASName specifies the name of the annotation set in the source documents. Whole annotations or parts of annotations in the annotation set will be copied.
annotationTypes specifies one or more annotation types in the annotation set inputASName which will be copied into target documents. If no value is given, the plugin will copy all annotations in the annotation set.
outputASName specifies the name of the annotation set in the target documents, into which the annotations will be copied. If there is no such annotation set in the target documents, the annotation set will be created automatically.

The Corpus parameter of the Corpus Pipeline application containing the plugin specifies a corpus which contains the target documents. Given one (target) document in the corpus, the plugin tries to find a source document in the source directory specified by the parameter sourceFilesURL, according to the similarity of the names of the source and target documents. The similarity of two file names is calculated by comparing the two strings of names from the start to the end of the strings. Two names have greater similarity if they share more characters from the beginning of the strings. For example, suppose two target documents have the names aabcc.xml and abcab.xml and three source files have names abacc.xml, abcbb.xml and aacc.xml, respectively. Then the target document aabcc.xml has the corresponding source document aacc.xml, and abcab.xml has the corresponding source document abcbb.xml.
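The name-matching rule can be illustrated with a short standalone Python sketch that scores each candidate by the length of the prefix it shares with the target name. The function names are hypothetical, and the behaviour on ties (the first best candidate wins) is an assumption, since the text above does not say how the plugin resolves them.

```python
def common_prefix_length(a, b):
    """Number of leading characters that the two strings share."""
    count = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        count += 1
    return count


def best_source_for(target_name, source_names):
    """Pick the source name sharing the longest prefix with the target.

    On ties, max() keeps the first candidate (an assumption of this sketch).
    """
    return max(source_names,
               key=lambda name: common_prefix_length(target_name, name))


sources = ["abacc.xml", "abcbb.xml", "aacc.xml"]
match_1 = best_source_for("aabcc.xml", sources)
match_2 = best_source_for("abcab.xml", sources)
```

Run on the file names from the example above, the sketch pairs aabcc.xml with aacc.xml (shared prefix "aa") and abcab.xml with abcbb.xml (shared prefix "abc").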
21.22
OpenCalais Plugin
OpenCalais provides a web service for semantic annotation of text. The user submits a document to the web service, which returns entity and relations annotations in RDF, JSON or some other format. Typically, users integrate OpenCalais annotation of their web pages to provide additional links and 'semantic functionality'. OpenCalais can be found at http://www.opencalais.com

The GATE OpenCalais PR submits a GATE document to the OpenCalais web service, and adds the annotations from the OpenCalais response as GATE annotations in the GATE document. It therefore provides OpenCalais semantic annotation functionality within GATE, for use by other PRs.

The PR only supports OpenCalais entities, not relations, although this should be straightforward for a competent Java programmer to add. Each OpenCalais entity is represented in GATE as an OpenCalais annotation, with features as given in the OpenCalais documentation.

The PR can be loaded with the CREOLE plugin manager dialog, from the creole directory in the gate distribution, gate/plugins/Tagger_OpenCalais. In order to use the PR, you will need to have an OpenCalais account, and request an OpenCalais service key. You can do this from the OpenCalais web site at http://www.opencalais.com. Provide your service key as an initialisation parameter when you create a new OpenCalais PR in GATE.

OpenCalais places restrictions on the number of requests you can make to their web service. See the OpenCalais web page for details.

Initialisation parameters are:
openCalaisURL This is the URL of the OpenCalais REST service, and should not need to be changed, unless OpenCalais moves it!
licenseID Your OpenCalais service key. This has to be requested from OpenCalais and is specific to you.

Various runtime parameters are available from the OpenCalais API, and are named the same as in that API. See the OpenCalais documentation for further details.
21.23
LingPipe Plugin
LingPipe is a suite of Java libraries for the linguistic analysis of human language[3]. We have provided a plugin called 'LingPipe' with wrappers for some of the resources available in the LingPipe library. In order to use these resources, please load the 'LingPipe' plugin. Currently, we have integrated the following five processing resources.
LingPipe Tokenizer PR
LingPipe Sentence Splitter PR
LingPipe POS Tagger PR
LingPipe NER PR
LingPipe Language Identifier PR
3 see http://alias-i.com/lingpipe/
Please note that most of the resources in the LingPipe library allow learning of new models. However, in this version of the GATE plugin for LingPipe, we have only integrated the application functionality. You will need to learn new models with LingPipe outside of GATE. We have provided some example models under the 'resources' folder which were downloaded from LingPipe's website. For more information on licensing issues related to the use of these models, please refer to the licensing terms under the LingPipe plugin directory.

The LingPipe system can be loaded from the GATE GUI by simply selecting the 'Load LingPipe System' menu item under the 'File' menu. This is similar to loading the ANNIE application with default values.
Two models for Bulgarian are now available in GATE: bulgarian-full.model and bulgarian-simplified.model, trained on a transformed version of the BulTreeBank-DP [Osenova & Simov 04, Simov & Osenova 03, Simov et al. 02, Simov et al. 04]. The full model uses the complete tagset [Simov et al. 04] whereas the simplified model uses tags truncated before any hyphens (for example, Pca--p, Pca--s-f, Pca--s-m, Pca--s-n, and Pce-as-m are all merged to Pca) to improve performance. This reduces the set from 573 to 249 tags and saves memory.

This PR has the following run-time parameters.
inputASName The name of the annotation set with Token and Sentence annotations.
applicationMode The POS tagger can be applied on the text in three different modes.
FIRSTBEST The tagger produces one tag for each token (the one that it calculates is best) and stores it as a simple String in the category feature.
CONFIDENCE The tagger produces the best five tags for each token, with confidence scores, and stores them as a Map<String, Double> in the category feature. This application mode requires more memory than the others.
NBEST The tagger produces the five best taggings for the whole document and then stores one to five tags for each token (with document-based scores) as a Map<String, List<Double>> in the category feature. This application mode is noticeably slower than the others.
left blank (null or empty), the PR classifies the text of each document and stores the result as a document feature.
is blank.
21.24
OpenNLP Plugin
OpenNLP provides java-based tools for sentence detection, tokenization, pos-tagging, chunking, parsing, named-entity detection, and coreference. See the OpenNLP website for details.

In order to use these tools as GATE processing resources, load the 'OpenNLP' plugin via the Plugin Management Console. Alternatively, the OpenNLP system for English can be loaded from the GATE GUI by simply selecting Applications → Ready Made Applications → OpenNLP → OpenNLP IE System. Two sample applications are also provided for Dutch and German in this plugin's resources directory, although you need to download the relevant models from Sourceforge.

We have integrated five OpenNLP tools into GATE processing resources:
OpenNLP Tokenizer
OpenNLP Sentence Splitter
OpenNLP POS Tagger
OpenNLP Chunker
OpenNLP NER (named entity recognition)

In general, these PRs can be mixed with other PRs of similar types. For example, you could create a pipeline that uses the OpenNLP Tokenizer, and the ANNIE POS Tagger. You may occasionally have problems with some combinations, and different OpenNLP models use different POS and chunk tags. Notes on compatibility and prerequisites are given for each PR in the sections below.

Note also that some of the OpenNLP tools use quite large machine learning models, which the PRs need to load into memory. You may find that you have to give additional memory to GATE in order to use the OpenNLP PRs comfortably. See the FAQ on the GATE Wiki for an example of how to do this.
OpenNLP POS Tagger

This PR adds a category feature to each Token annotation.

This PR requires Sentence and Token annotations to be present in the annotation set specified by inputASName. (They do not have to come from OpenNLP PRs.) If the out... Token annotation and add the category ...
OpenNLP NER (NameFinder)

This PR finds standard named entities and adds annotations for them.

This PR requires Sentence and Token annotations to be present in the annotation set specified by the inputASName run-time parameter. (They do not have to come from OpenNLP PRs.) The Token annotations do not need to have a category feature (so a POS tagger is not a prerequisite to this PR).

This PR creates annotations in the outputASName run-time parameter's set with types specified in the configuration file, whose URL was specified as an init parameter so it cannot be changed after initialization. (The contents of the config file and the files it points to, however, can be changed: reinitializing the PR clears out any models in memory, reloads the config file, and loads the models now specified in that file.)

A configuration file should consist of two whitespace-separated columns, as in this example.

The first entry in each row contains a path to a model file (relative to the directory where the config file is located, so in this example the models are all in the same directory with the config file), and the second contains the annotation type to be generated from that model. More than one model file can generate the same annotation type.
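As a rough illustration of the two-column layout just described, the standalone Python sketch below reads such a file's text and resolves each model path against the directory of the config file. It is not the plugin's own loader; the function name, the config location and the model file names in the sample are hypothetical, and skipping rows that do not have exactly two columns is an assumption of the sketch.

```python
from pathlib import Path


def parse_ner_config(config_path, text):
    """Return (model_path, annotation_type) pairs from two-column config text.

    Model paths are resolved relative to the directory of config_path.
    """
    base_dir = Path(config_path).parent
    entries = []
    for line in text.splitlines():
        columns = line.split()          # split on any run of whitespace
        if len(columns) != 2:
            continue                    # assumption: ignore blank/odd rows
        model_file, annotation_type = columns
        entries.append((base_dir / model_file, annotation_type))
    return entries


# Hypothetical file names, for illustration only.
sample = """person-a.bin   Person
person-b.bin   Person

place.bin      Location
"""
entries = parse_ner_config("/models/ner.conf", sample)
```

Note that two rows in the sample map to the same annotation type, which the description above explicitly allows.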
OpenNLP Chunker

This PR marks noun, verb, and other chunks using features on Token annotations.

This PR requires Sentence and Token annotations to be present in the inputASName run-time parameter's set, and requires category features on the Token annotations (so a POS tagger is a prerequisite).

If the outputASName and inputASName run-time parameters are the same, the PR adds a feature named according to the chunkFeature run-time parameter to each Token annotation. If the annotation sets are different, the PR copies each Token and adds the feature to the output copy. The feature uses the common BIO values, as in the following examples:

B-NP token begins a noun phrase;
I-NP token is inside a noun phrase;
B-VP token begins a verb phrase;
I-VP token is inside a verb phrase;
O token is outside any phrase;
B-PP token begins a prepositional phrase;
B-ADVP token begins an adverbial phrase.
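Downstream code often needs whole chunks rather than per-token BIO values. The following standalone Python sketch groups a sequence of such values into chunk spans; it operates on a plain list of strings rather than on GATE Token annotations, and treating a stray I- tag as the start of a new chunk is an assumption of the sketch.

```python
def bio_to_chunks(tags):
    """Group BIO tags into (chunk_type, start_index, end_index_exclusive)."""
    chunks = []
    current_type, start = None, None
    for index, tag in enumerate(tags):
        prefix, _, chunk_type = tag.partition("-")
        continues = prefix == "I" and chunk_type == current_type
        if continues:
            continue
        # The running chunk (if any) ends just before this token.
        if current_type is not None:
            chunks.append((current_type, start, index))
        if prefix in ("B", "I"):
            # B- opens a chunk; a stray I- is treated the same way.
            current_type, start = chunk_type, index
        else:
            current_type, start = None, None
    if current_type is not None:
        chunks.append((current_type, start, len(tags)))
    return chunks


tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "O"]
chunks = bio_to_chunks(tags)
```

Here the seven tags are grouped into a one-token noun phrase, a one-token verb phrase and a four-token noun phrase, with the final O token left outside any chunk.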
21.25
When working in a closed domain it is often possible to craft a few JAPE rules to separate real document content from the boilerplate headers, footers, menus, etc. that often appear, especially when dealing with web documents. As the number of document sources increases, however, it becomes difficult to separate content from boilerplate using hand crafted rules and a more general approach is required.

The 'Tagger_Boilerpipe' plugin contains a PR that can be used to apply the boilerpipe library (see http://code.google.com/p/boilerpipe/) to GATE documents in order to annotate the content sections. The boilerpipe library is based upon work reported in [Kohlschütter et al. 10], although it has seen a number of improvements since then. Due to the way in which the library works, not all features are currently available through the GATE PR.

The PR is configured using the following runtime parameters:

allContent: this parameter defines how the mime type parameter should be interpreted and if documents should, instead of being processed, be assumed to contain nothing but actual content. Defaults to 'If Mime Type is NOT Listed', which means that any document with a mime type not listed is assumed to be all content.
annotateBoilerplate: should we annotate the boilerplate sections of the document, defaults to false.
annotateContent: should we annotate the main content of the document, defaults to true.
boilerplateAnnotationName: the name of the annotation type to annotate sections determined to be boilerplate, defaults to 'Boilerplate'. Whilst this parameter is optional it must be specified if annotateBoilerplate is set to true.
contentAnnotationName: the name of the annotation type to annotate sections determined to be content, defaults to 'Content'. Whilst this parameter is optional it must be specified if annotateContent is set to true.
debug: if true then annotations created by the PR will contain debugging info, defaults to false.
extractor: specifies the boilerpipe extractor to use, defaults to the default extractor.
failOnMissingInputAnnotations: if the input annotations (Tokens) are missing should this PR fail or just not do anything, defaults to true to allow obvious mistakes in pipeline configuration to be captured at an early stage.
inputASName: the name of the input annotation set.
mimeTypes: set of mime types that control document processing, defaults to text/html. The exact behaviour of the PR is dependent upon both this parameter and the value of the allContent parameter.
outputASName: the name of the output annotation set.
useHintsFromOriginalMarkups: often the original markups will provide hints that may be useful for correctly identifying the main content of the document. If true, useful markup (currently the title, body, and anchor tags) will be used by the PR to help detect content, defaults to true.
21.26
The IAA plugin, Inter_Annotator_Agreement, computes interannotator agreement measures for various tasks. For named entity annotations, it computes the F-measures, namely Precision, Recall and F1, for two or more annotation sets. For text classification tasks, it computes Cohen's kappa and some other IAA measures which are more suitable than the F-measures for the task. This plugin is fully documented in Section 10.5. Chapter 10 introduces various measures of interannotator agreement and describes a range of tools provided in GATE for calculating them.
21.27
The plugin 'Schema_Annotation_Editor' constrains the annotation editor to permitted types. See Section 3.4.6 for more information.
21.28
The 'Coref_Tools' plugin provides a framework for co-reference type tasks, with a main focus on time efficiency. Included is the OrthoRef PR, that uses the Coref Framework to perform orthographic co-reference, in a manner similar to the Orthomatcher PR (Section 6.8).

The principal elements of the Coref Framework are defined as follows:

co-reference two anaphors are said to be co-referring when they refer to the same entity.
Tagger a software module that emits a set of tags (arbitrary strings) when provided with an anaphor. When two anaphors have tags in common, that is an indication that they may be co-referring.
Matcher a software module that checks whether two anaphors are co-referring or not.

The plugin also includes the gate.creole.core.CorefBase abstract class that implements the following workflow:

1. enumerate all anaphors in the input document. This selects all annotations of types marked as input in the configuration file, and sorts them in the order they appear in the document.
2. for each anaphor:
(a) obtain the set of associated tags, by interrogating all taggers for that annotation type;
(b) construct a list of antecedents, containing the previous anaphors that have tags in common with the current anaphor. For each of them: find all the matchers registered for the correct anaphor and antecedent annotation type; antecedents for which at least one matcher confirms a positive match get added to the list of candidates.
(c) generate a coref relation between the current anaphor and the most recent candidate.
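The workflow above can be sketched as a small standalone Python function. This is a simplified illustration, not the CorefBase implementation: anaphors are plain strings, taggers and matchers are ordinary callables applied to every anaphor regardless of annotation type, and linking to the most recent candidate is an assumption of the sketch.

```python
def resolve_corefs(anaphors, taggers, matchers):
    """Return (anaphor_index, antecedent_index) links.

    anaphors: items already sorted in document order.
    taggers:  callables mapping an anaphor to a set of tag strings.
    matchers: callables (anaphor, antecedent) -> bool.
    """
    tags_so_far = []
    links = []
    for index, anaphor in enumerate(anaphors):
        # (a) collect the tags emitted by every tagger
        current_tags = set()
        for tagger in taggers:
            current_tags |= set(tagger(anaphor))
        tags_so_far.append(current_tags)
        # (b) antecedents share a tag; candidates also pass some matcher
        candidates = [
            previous for previous in range(index)
            if tags_so_far[previous] & current_tags
            and any(match(anaphor, anaphors[previous]) for match in matchers)
        ]
        # (c) assumption: link to the most recent candidate
        if candidates:
            links.append((index, candidates[-1]))
    return links


def strip_dots(text):
    return text.replace(".", "")


mentions = ["IBM", "Microsoft", "I.B.M.", "IBM"]
taggers = [lambda text: {strip_dots(text)}]
matchers = [lambda a, b: strip_dots(a) == strip_dots(b)]
links = resolve_corefs(mentions, taggers, matchers)
```

Because tags are compared before any matcher runs, most pairs of anaphors are discarded cheaply, which is where the framework's emphasis on time efficiency comes from.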
annotationSetName a String value, representing the name of the annotation set that contains the anaphor annotations. The resulting relations are produced in the relation set associated with this annotation set (see Section 7.7 for technical details).

configFileUrl a URL pointing to the configuration file that specifies the taggers and matchers to be used.

maxLookBehind an Integer value, specifying the maximum distance between the current anaphor and the most distant antecedent that should be considered. A value of 1 requires the system to only consider the immediately preceding antecedent; the default value is 10. To disable this function, set this parameter to a negative value, in which case all antecedents will be considered. This is probably not a good idea in the general co-reference setting, as it will likely produce undesired results. The execution speed will also be negatively affected on very large documents.
The most important parameter listed above is configFileUrl, which should point to a file describing which taggers and matchers should be used. The file should be in XML format, and the easiest way of producing one is to modify the provided example. From a technical point of view, the configuration file is actually an XML serialisation of a gate.creole.coref.Config object, using the XStream library (http://xstream.codehaus.org/). The XStream serialiser is configured to make the XML file more user-friendly and less verbose. A shortened example is included below for reference:
<coref.Config>
  <taggers>
    <default.taggers.DocumentText annotationType="Organization"/>
    <default.taggers.Initials annotationType="Organization"/>
    <default.taggers.MwePart annotationType="Organization"/>
    ...
  </taggers>
  <matchers>
    <!-- ## Organization ## -->
    <!-- Identity -->
    ...
  </matchers>
</coref.Config>
Actual co-reference PRs can be implemented by extending the CorefBase class and providing appropriate default values for some of the parameters, and, if required, additional functionality.

The Coref_Tools plugin includes some ready-made Tagger and Matcher implementations.
The following Taggers are available:

Alias This tagger requires an external configuration file, containing aliases, e.g. person names and associated nicknames. Each line in the configuration file contains the base form, the alias, and optionally a confidence score, all separated by tab characters. If the document text for the provided anaphor (or any of its parts in the case of multi-word expressions) is a known base form or an alias, then the tagger will emit both the base form and the alias as tags.
AnnType A tagger that simply returns the annotation type for the given anaphor.
Collate A compound tagger that wraps a list of sub-taggers. For each anaphor it produces a set of tags that consists of all possible combinations of tags produced by its sub-taggers.
DocumentText A simple tagger that uses the normalised document text as a tag. The normalisation performed includes removing whitespace at the start and end of the annotations, and replacing all internal sequences of whitespace with a single space character.
FixedTags A tagger that always returns the same fixed set of tags, regardless of the provided anaphor.
Initials If the document text for the provided anaphor is a multi-word-expression, where each constituent starts with an upper case letter, this tagger returns two tags: one containing the initials, and the other containing the initials, each followed by a full stop. For example, International Business Machines would produce IBM and I.B.M.
MwePart If the document text for the provided anaphor is a multi-word-expression, where each constituent starts with an upper case letter, this tagger returns the set of constituent parts as tags.
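To show what some of these taggers emit, here is a standalone Python sketch of the DocumentText normalisation and of the Initials and MwePart behaviour as described above. It works on plain strings and is not the plugin's code; the function names are hypothetical.

```python
import re


def normalise(text):
    """Trim the text and collapse internal whitespace runs to one space."""
    return re.sub(r"\s+", " ", text.strip())


def _capitalised_parts(text):
    """Parts of a multi-word expression whose words all start upper case."""
    parts = normalise(text).split(" ")
    if len(parts) > 1 and all(part[:1].isupper() for part in parts):
        return parts
    return []


def initials_tags(text):
    """Two tags: the bare initials, and the initials with full stops."""
    parts = _capitalised_parts(text)
    if not parts:
        return set()
    letters = [part[0] for part in parts]
    return {"".join(letters), "".join(letter + "." for letter in letters)}


def mwe_part_tags(text):
    """The constituent parts of a capitalised multi-word expression."""
    return set(_capitalised_parts(text))
```

Two anaphors such as "International Business Machines" and "IBM" end up sharing the tag IBM once the second is also tagged with its document text, which is what makes them candidates for the matchers.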
The following Matchers are available:

Alias A matcher that matches when the document text for the anaphor and the antecedent (or their constituent parts, in the case of multi-word expressions) are aliases of each other.
And A compound matcher that matches when all of its sub-matchers match.
AnnType A matcher that matches when the annotation type for the anaphor and its antecedent are the same.
DocumentText A matcher that matches if the normalised document text of the anaphor and its antecedent are the same.
False A matcher that never matches.
Initials A matcher that matches when the document texts for the anaphor and its antecedent are initials of each other.
MwePart A matcher that matches when the anaphor and its antecedent are a multi-word-expression and one of its parts, respectively.
Or A compound matcher that matches when any of its sub-matchers match.
TransitiveAnd A matcher that wraps a sub-matcher. Given an anaphor and an antecedent, the following workflow is followed: calculate the coref transitive closure for the antecedent (a set containing the antecedent, and all the annotations that are in a coref relation with another annotation from this set); return a positive match if and only if the provided anaphor matches all the antecedents in the closure set, according to the wrapped sub-matcher.
... co-referent or not based on similarities between their surface forms (the document text). The OrthoRef PR also serves as an example of how to use the Coref framework.

Also included with the Coref_Tools plugin is a Processing Resource named Legacy Coref Data Writer. Its role is to convert the relations-based co-reference data into document features in the legacy format used by the Coref Editor. This constitutes a bridge between the new relations-based data model and the old document features based one.
21.29
Pubmed Format
This plugin contains format analysers for the textual formats used by PubMed[6] and the Cochrane Library[7]. The title and abstract of the input document are used to produce the content for the GATE document; all other fields are converted into GATE document features.

To use it, simply load the Format_Pubmed plugin; this will register the document formats with GATE.

If the input files use .pubmed.txt or .cochrane.txt extensions, then GATE should automatically find the correct document format. If your files come with different extensions, then you can force the use of the correct document format by explicitly specifying the mime type value as text/x-pubmed or text/x-cochrane, as appropriate. This will work both when directly creating a new GATE document and when populating a corpus.
21.30
MediaWiki Format
This plugin contains format analysers for documents using MediaWiki markup[8].

To use it, simply load the Format_MediaWiki plugin; this will register the document formats with GATE. When loading a document into GATE you must then specify the appropriate mime type: text/x-mediawiki for plain text documents containing MediaWiki markup, or text/xml+mediawiki for XML dump files (such as those produced by Wikipedia[9]). This will work both when directly creating a new GATE document and when populating a corpus.

Note that if loading an XML dump file containing more than one page, currently only the final page within the file will be loaded. If you wish to populate a corpus from a single MediaWiki XML dump, use the option to populate from a single file, set the root element to text, the mime type to text/x-mediawiki and don't include the root element in the created documents.
6 http://www.ncbi.nlm.nih.gov/pubmed/
7 http://www.thecochranelibrary.com/
8 http://www.mediawiki.org/wiki/Help:Formatting
9 http://en.wikipedia.org/wiki/Wikipedia:Database_download
21.31
TermRaider is a set of term extraction and scoring tools developed in the NeOn and ARCOMEM projects. Although the plugin is still experimental, we are now including it in GATE as a response to frequent requests from GATE users who have read publications related to those projects.

Please note that although the TermRaider GUI and API are themselves fairly stable, they are subject to change and the output formats are unstable.

The easiest way to test TermRaider is to populate a corpus with related documents, load the sample application (plugins/TermRaider/applications/termraider-eng.gapp), and run it. This application will process the documents and create instances of three termbank language resources with sensible parameters.
TfIdf Termbank

This termbank calculates tf.idf scores over all the term candidates in the set of corpora. It has the following additional init parameters.

idfCalculation: an enum (pull-down menu in the GUI) with the following options for inverted document frequency: Natural, Logarithmic, LogarithmicPlus1.
tfCalculation: an enum (pull-down) with the following options for term frequency:
Natural = $tf$;
Logarithmic = $1 + \log_2 tf$.

For these calculations, tf is the term frequency (number of occurrences of the term in the corpora), df is the document frequency (number of documents containing the term), and n is the total number of documents.
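The scoring can be illustrated with a short standalone Python sketch. The two term-frequency options follow the definitions given above. The three idf formulas are not spelled out in this section, so the sketch assumes the conventional forms 1/df, log2(n/df) and 1 + log2(n/df); these, the function names and the default modes are assumptions and may differ from what TermRaider actually computes.

```python
import math


def tf_weight(tf, mode="Logarithmic"):
    """Term-frequency weight, using the two options defined in the text."""
    if mode == "Natural":
        return float(tf)
    if mode == "Logarithmic":
        return 1.0 + math.log2(tf)
    raise ValueError("unknown tfCalculation: " + mode)


def idf_weight(df, n, mode="Logarithmic"):
    """Inverse-document-frequency weight (formulas ASSUMED, see lead-in)."""
    if mode == "Natural":
        return 1.0 / df
    if mode == "Logarithmic":
        return math.log2(n / df)
    if mode == "LogarithmicPlus1":
        return 1.0 + math.log2(n / df)
    raise ValueError("unknown idfCalculation: " + mode)


def tf_idf(tf, df, n, tf_mode="Logarithmic", idf_mode="Logarithmic"):
    """Product of the chosen tf and idf weights."""
    return tf_weight(tf, tf_mode) * idf_weight(df, n, idf_mode)


# A term occurring 8 times, in 2 of 8 documents.
score = tf_idf(tf=8, df=2, n=8)
```

Under these assumptions a term that occurs 8 times and appears in 2 of 8 documents gets a tf weight of 4 and an idf weight of 2.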
Annotation Termbank
This termbank collects the values of scoring features on all the term candidates and selects the minimum or maximum score or averages them, according to the mergingMode parameter. It has the following additional init parameters.

inputScoreFeature: an annotation feature whose value should be a Number or interpretable as a number.

mergingMode: an enum (pull-down menu in the GUI) with the options MINIMUM, MEAN, or MAXIMUM.
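The three merging modes can be sketched as follows. This is only an illustration of the selection logic described above, not the plugin's code; the function name and the plain-list input are assumptions made for the example.

```python
from statistics import mean


def merge_scores(scores, merging_mode="MEAN"):
    """Merge the score values collected for one term candidate.

    `scores` may hold numbers or values interpretable as numbers
    (for example the string "3"), mirroring inputScoreFeature.
    """
    values = [float(s) for s in scores]
    if not values:
        raise ValueError("no scores to merge")
    if merging_mode == "MINIMUM":
        return min(values)
    if merging_mode == "MAXIMUM":
        return max(values)
    if merging_mode == "MEAN":
        return mean(values)
    raise ValueError(f"unknown mergingMode: {merging_mode}")
```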
Hyponymy Termbank
This termbank calculates KYOTO Domain Relevance [Bosma & Vossen 10] over all the term candidates. It has the following additional init parameter.

inputHeadFeatures (List<String>): annotation features on term candidates containing the head of the expression.
Head information is generated by the multiword JAPE grammar included in the application. We consider T1 a hyponym of T2 if and only if T2's head feature value ends with T1's head or string feature value.
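The hyponymy rule stated above can be written down directly as a string test. The sketch below assumes each term candidate is represented as a dictionary with "head" and "string" entries; that representation and the function name are hypothetical, chosen only to make the rule concrete.

```python
def is_hyponym(t1, t2):
    """Return True if t1 counts as a hyponym of t2 under the rule above.

    The rule: t2's head feature value ends with t1's head feature value
    or with t1's string feature value.
    """
    t2_head = t2["head"]
    return t2_head.endswith(t1["head"]) or t2_head.endswith(t1["string"])
```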
GATE Cloud
GATE is (and always will be) free, but machine time, training, dedicated support and bespoke development is not. Using GATECloud you can rent cloud time to process large batches of documents on vast server farms, or academic clusters. You can push a terabyte of annotated data into an index server and replicate the data across the world. Or just purchase training services and support for the various tools in the GATE family.
22.1
At the time of writing, there are several kinds of services offered from GATE Cloud, but they will be growing significantly over the course of the next six months, so for an up-to-date list see https://gatecloud.net/shopfront.

- GATE Annotation Services: these allow you to run a GATE application, on the cloud, over large document collections. The GATE application can be created by the user in GATE Developer and uploaded on the cloud, or users can use some pre-packaged applications, e.g., ANNIE, ANNIE with Number and Measurement add-ons.
- GATE Teamware (Chapter 23): a web-based collaborative annotation tool that supports distributed teams of manual annotators and data managers/curators to produce gold-standard corpora for evaluation and training.
- GATE MIMIR (Chapter 24): a multi-paradigm information management index and repository which can be used to index and search over text, annotations, semantic schemas (ontologies), and semantic meta-data (instance data). It allows queries that arbitrarily mix full-text, structural, linguistic and semantic queries and that can scale to terabytes of text.
22.2
There are several other text-analysis-as-a-service systems out there that do some of what we do. Here are some differences:

- We're the only open source solution.
- We're the only customisable solution: we support a bring-your-own-annotator option (a GATE pipeline) as well as pre-packaged entity annotation services like other systems.
- We're the only end-to-end full lifecycle solution. We don't just do entity extraction: we do data preparation, inter-annotator agreement, quality assurance and control, data visualisation, indexing and search of full text/annotation graph/ontology/instance store, etc.
- Bulk upload of documents to process, no need to use programming APIs.
- No recurring monthly costs, pay-per-use, billed per hour.
- No daily limit on number of documents to process.
- No limit on document size.
- Costs of processing dependent on overall data size, not number of documents.
- Web-based collaborative annotation tool to correct mistakes and create training and evaluation data (see Chapter 23).
- Speed: other systems price per document (we price on processing time), which makes it impossible to compare like with like (do you really want to compare the processing of individual tweets against 200 page technical reports?!). GATECloud is also heavily optimised for high volumes; if you want to do low volumes, you can do them on your netbook.
- Community: we've been here for more than 15 years, and our community of developers, users, third party suppliers and so on is second to none.
22.3
Before you can buy any of our cloud based offerings you need to create an account on GATECloud.net: use the Register link at the top right of any page and follow the instructions. Once registered and logged in you can browse through the shop and decide on the services you wish to purchase.

The shop does not handle money but works instead with vouchers bought from the University of Sheffield's on-line shop. Vouchers are available in multiples of £5; the amount you need to purchase will depend upon the services you wish to use. Once you are ready to buy time on GATECloud.net, create an account with the University shop and then buy the appropriate amount of credit vouchers. Be sure to use the same email address when buying vouchers as when registering for a GATECloud.net account so that credit you purchase can automatically be added to your GATECloud.net account.

Once you have enough credit you can click through to the checkout where you can review your basket before finalizing your order. Annotation job purchases should appear instantly within your dashboard. Teamware servers take a little longer to create and we will e-mail you when the server is ready for use. All past purchases can be monitored and controlled via your dashboard.
22.4
We run your jobs in the cloud and we pass on the cloud costs, plus a small premium. We do not have our own private cloud, so each job we run costs us money. Therefore we can't run a zero cost service, but we do supply discounts and freebies for people wanting to try the service. To get a discount:

- create an account
- use the GATE Cloud contact page to send us your user name and a request for a discount
- we apply a pricing rule to your account
- you then shop in the normal manner, as described in Section 22.3 above.

A last word on pricing: the underlying software is all open source, so there's nothing to stop you rolling your own if you can't afford the cloud costs.
22.5
GATECloud.net annotation jobs provide a way to quickly process large numbers of documents using a GATE application, with the results exported to files in GATE XML or XCES format and/or sent to a Mimir server for indexing. Annotation jobs are optimized for the processing of large batches of documents (tens of thousands or more) rather than processing a small number of documents on the fly (GATE Developer is best suited for the latter).

To submit an annotation job you first choose which GATE application you want to run. GATECloud.net provides some standard pre-packaged applications (e.g., ANNIE), or you can provide your own application (see Section 22.6). You then upload the documents you wish to process packaged up into ZIP or (optionally compressed) TAR archives, or ARC files (as produced by the Heritrix web crawler), and decide which annotations you would like returned as output, and in what format.

When the job is started, GATECloud.net takes the document archives you provided and divides them up into manageable-sized batches of up to 15,000 documents. Each batch is then processed using the GATE paralleliser and the generated output files are packaged up and made available for you to download from the GATECloud.net site when the job has completed.
of funds at any time, all your currently-executing annotation jobs will be suspended. You will be able to resume the suspended jobs once you have topped up your account to clear the negative balance. Note that it is not possible to download the result files from completed jobs if your GATECloud.net account is overdrawn.
- First a single "split" task which takes the initial document archives that were provided when the job was configured and splits them into manageable batches for processing.
  - ZIP files that are smaller than 50MB will not be split, and will be processed as a single processing task.
  - ZIP files larger than 50MB, and all TAR files, will be split into chunks of maximum size 50MB (compressed size) or 15,000 documents, whichever is the smaller. Each chunk will be processed as a separate processing task.
- One or more processing tasks, as determined by the split task described above. Each processing task will run the GATE application over the documents from its input chunk as defined by the input specification, and save any output files in ZIP archives of no more than 100MB, which will be available to download once the job is complete.
- A final "join" task to collate the execution logs from the processing tasks and produce an overall summary report.

Note that because ZIP and TAR input files may be split into chunks, it is important that each input document in the archive should be self-contained; for example, XML files should not refer to a DTD stored elsewhere in the ZIP file. If your documents do have external dependencies such as DTDs then you have two choices: you can either (a) use GATE Developer to load your original documents and re-save them as GATE XML format (which is self contained), or (b) use a custom annotation job (see below) and include the additional files in your application ZIP, and refer to them using absolute paths.
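The chunking rule used by the split task (at most 50MB of compressed data or 15,000 documents per chunk, whichever limit is reached first) can be sketched as a simple greedy pass over the archive entries. This is a minimal illustration, not the GATECloud.net implementation: the function name, the (name, compressed size) input format, and the reading of 1MB as 2**20 bytes are all assumptions.

```python
MAX_CHUNK_BYTES = 50 * 1024 * 1024  # 50MB, assuming 1MB = 2**20 bytes
MAX_CHUNK_DOCS = 15_000


def split_into_chunks(entries, max_bytes=MAX_CHUNK_BYTES, max_docs=MAX_CHUNK_DOCS):
    """Group (name, compressed_size) entries into chunks, in archive order.

    A new chunk is started whenever adding the next entry would exceed
    the size limit, or the current chunk already holds max_docs entries.
    An entry larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, current_bytes = [], [], 0
    for name, size in entries:
        too_big = current_bytes + size > max_bytes
        too_many = len(current) >= max_docs
        if current and (too_big or too_many):
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then correspond to one processing task.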
22.6
GATECloud.net provides a way for you to run pretty much any GATE application on the cloud. You develop your application in the usual way using GATE Developer and then save it as a single self-contained ZIP file, typically using the "Export for GATECloud.net" option. This section tells you what you need to know to ensure that your application will run on GATECloud.net.
Figure 22.2: Application ZIP structure

The easiest way to build such a package is simply to save your application in GATE Developer using the "Export for GATECloud.net" option, which produces a ZIP file containing an application.xgapp and all its required resources in one click.
Hardware and software
GATECloud.net annotation jobs are executed on virtual 64-bit (x86_64) Linux servers in the cloud, specifically Ubuntu 10.10 (Maverick Meerkat). The GATE application is run using the open-source GCP tool (footnote 1) on Sun Java 6 (1.6.0_21). The current offering uses the Amazon EC2 cloud, and runs jobs on their 'm1.xlarge' machines which provide 4 virtual CPU cores and 15GB of memory, of which 13GB is available to the GCP process. The GCP (GATE Cloud Paralleliser) process is configured for 'headless' operation (-Djava.awt.headless=true), and your code should not assume that a GUI display is available.

GCP loads one copy of your application.xgapp in the usual way using the PersistenceManager. It then uses the GATE duplication mechanism to make a further 5 independent copies of the loaded application, and runs 6 parallel threads to process your documents. For most PRs this duplication process is essentially equivalent to loading the original application.xgapp 6 times, but if you are writing a custom PR you may wish to consider implementing a custom duplication strategy.
Directories
The application ZIP file will always be unpacked in a directory named /gatecloud/application on the cloud server. Thus the application file will always be /gatecloud/application/application.xgapp and if any of your components need to know the absolute path to their resource files you can work this out by prepending /gatecloud/application/ to the path of the entry inside your ZIP package. The user account that runs the GCP process has full read and write access in the /gatecloud/application directory, so if any of your components need to create temporary files then this is a good place to put them. Any files created under /gatecloud/application will be lost when the current batch of documents has been processed.

The directory /gatecloud/batch/output is where GCP will write any output files specified by the output definitions you supply when running an annotation job. All files created under this directory will be packaged up into ZIP files when the batch of documents has been processed and made available for download when the job has completed. Thus, any additional output files that your application creates and that need to be returned to the user should be placed under /gatecloud/batch/output.

Your code should not assume it has permission to read and write any files outside these two locations.
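The path rules above amount to simple string manipulation. The sketch below shows how a ZIP entry name maps to its absolute location on the cloud server, and how a component might check that a path stays inside the two permitted directories. The helper names are hypothetical; only the two directory names come from the text.

```python
import posixpath

APP_ROOT = "/gatecloud/application"
OUTPUT_ROOT = "/gatecloud/batch/output"


def resource_path(zip_entry):
    """Absolute path on the cloud server for an entry inside the application ZIP."""
    return posixpath.join(APP_ROOT, zip_entry.lstrip("/"))


def is_permitted_location(path):
    """True if `path` lies under one of the two directories named above."""
    norm = posixpath.normpath(path)  # collapses "..", "." and repeated slashes
    return any(
        norm == root or norm.startswith(root + "/")
        for root in (APP_ROOT, OUTPUT_ROOT)
    )
```

Normalising the path before the check means that something like /gatecloud/application/../etc is correctly rejected.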
1 Source code is available in the subversion repository at https://gate.svn.sourceforge.net/svnroot/gate/gcp/trunk
Here, tagger.sh and postprocessor.pl are scripts that need to be marked as executable, so we could create a file plugins/MyTagger/.executables containing the two lines:

resources/tagger.sh
resources/postprocessor.pl
Either way, the effect would be to make the GATECloud.net processing machine mark the relevant files as executable before running your application.
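To make the effect concrete, the sketch below reads a listing file such as plugins/MyTagger/.executables and sets the executable bits on each file it names. It is an illustration only, not the code the processing machine runs. It assumes that paths in the listing are relative to the directory containing the listing file, as the example above suggests, and the function name is hypothetical.

```python
import stat
from pathlib import Path


def mark_executables(listing_file):
    """Set the executable bits on every file named in a .executables listing.

    ASSUMPTION: each non-empty line is a path relative to the directory
    that contains the listing file.
    """
    listing = Path(listing_file)
    base = listing.parent
    marked = []
    for line in listing.read_text().splitlines():
        rel = line.strip()
        if not rel:
            continue  # ignore blank lines
        target = base / rel
        mode = target.stat().st_mode
        target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        marked.append(target)
    return marked
```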
Security and privacy
GATECloud.net does not run a separate machine for each annotation job. Instead it splits each annotation job up into manageable pieces (referred to as tasks), puts these tasks into a queue, and runs a collection of processing machines (referred to as "nodes") that simply take the next task from the queue whenever they have finished processing their previous task. While a task is running it has exclusive use of that particular node (we never run more than one task on the same node at the same time), but once the task is complete the same node will then run another task (which may or may not be part of the same annotation job). To ensure the security and privacy of your code and data, the node takes the following precautions:

- All GCP processes are run as an unprivileged user account which only has write permission in a restricted area of the filesystem (see above).
- At the end of every task, all processes running under that user ID are forcibly terminated (so there's no risk of a stray or malicious background process started by a previous task being able to read your data).
- The /gatecloud/application and /gatecloud/batch directories are completely deleted at the end of every task (whether the task completed successfully or failed) so your data will not be left for the following task to see.
https://gate.svn.sourceforge.net/svnroot/gate/teamware/trunk
23.1
Introduction
For the past ten years, NLP development frameworks such as OpenNLP, GATE, and UIMA have been providing tool support and facilitating NLP researchers with the task of implementing new algorithms, sharing, and reusing them. At the same time, Information Extraction (IE) research and computational linguistics in general has been driven forward by the growing volume of annotated corpora, produced by research projects and through evaluation initiatives such as MUC [Marsh & Perzanowski 98], ACE (footnote 1), DUC [DUC 01], and CoNLL shared tasks. Some of the NLP frameworks (e.g., AGTK [Maeda & Strassel 04], GATE [Cunningham et al. 02]) even provide text annotation user interfaces. However, much more is needed in order to produce high quality annotated corpora: a stringent methodology, annotation guidelines, inter-annotator agreement measures, and in some cases, annotation adjudication (or data curation) to reconcile differences between annotators.

Current tools demonstrate that annotation projects can be approached in a collaborative fashion successfully. However, we believe that this can be improved further by providing a unified environment that provides a multi-role methodological framework to support the different phases and actors in the annotation process. The multi-role support is particularly important, as it enables the most efficient use of the skills of the different people and lowers overall annotation costs through having simple and efficient web-based annotation UIs for non-specialist annotators. This also enables role-based security, project management and performance measurement of annotators, which are all particularly important in corporate environments.

This chapter presents Teamware, a web-based software suite and methodology for the implementation and support of complex annotation projects. In addition to its research uses, it has also been tested as a framework for cost-effective commercial annotation services, supplied either as in-house units or as outsourced specialist activities. In comparison to previous work Teamware is a novel general purpose, web-based annotation framework, which:
- structures the roles of the different actors involved in large-scale corpus annotation (e.g., annotators, editors, managers) and supports their interactions in a unified environment;
- provides a set of general purpose text annotation tools, tailored to the different user roles, e.g., a curator management tool with inter-annotator agreement metrics and adjudication facilities and a web-based document tool for inexperienced annotators;
- supports complex annotation workflows and provides a management console with business process statistics, such as time spent per document by each of its annotators, percentage of completed documents, etc.;
- offers methodological support, to complement the diverse technological tool support.
1 http://www.ldc.upenn.edu/Projects/ACE/
23.2
As discussed above, collaborative corpus annotation is a complex process, which involves different kinds of actors (e.g., annotators, editors, managers) and also requires a diverse range of pre-processing, user interface, and evaluation tools. Here we structure all these into a coherent set of key requirements, which arise from our goal to provide cost-effective corpus annotation.

Firstly, due to the multiple actors involved and their complex interactions, a collaborative environment needs to support these different roles through user groups, access privileges, and corresponding user interfaces. Secondly, since many annotation projects manipulate hundreds of documents, there needs to be remote, efficient data storage. Thirdly, significant cost savings can be achieved through pre-annotating corpora automatically, which in turn requires support for automatic annotation services and their flexible configuration. Last, but not least, a flexible workflow engine is required to capture the complex requirements and interactions.

Next we discuss the four high-level requirements in finer-grained detail.
Annotators are given a set of annotation guidelines and often work on the same document independently. This is needed in order to get more reliable results and/or measure how well humans perform the annotation task (see more on Inter-Annotator Agreement (IAA) below). Consequently, manual annotation is a slow and error-prone task, which makes overall corpus production very expensive. In order to allow the involvement of less-specialised annotators, the manual annotation user interface needs to be simple to learn and use. In addition, there needs to be an automatic training mode for annotators where their performance is compared against a known gold standard and all mistakes are identified and explained to the annotator, until they have mastered the guidelines.

Since the annotators and the corpus editors are most likely working at different locations, there needs to be a communication channel between them, e.g., instant messaging. If an editor/manager is not available, an annotator should also be able to mark an annotation as requiring discussion and then all such annotations should be shown automatically in the editor console. In addition, the annotation environment needs to restrict annotators to working on a maximum of n documents (given as a number or percentage), in order to prevent an over-zealous annotator from taking over a project and introducing individual bias. Annotators also need to be able to save their work and, if they close the annotation tool, the same document must be presented to them for completion the next time they log in.

From the user interface perspective, there needs to be support for annotating document-level metadata (e.g., language identification), word-level annotations (e.g., named entities, POS tags), and relations and trees (e.g., co-reference, syntax trees). Ideally, the interface should offer some generic components for all these, which can be customised with the project-specific tags and values via an XML schema or other similar declarative mechanism. The UI also needs to be extensible, so specialised UIs can easily be plugged in, if required.
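The per-annotator limit described above, given either as an absolute number or as a percentage of the corpus, could be checked along these lines. This is a hypothetical sketch of the requirement, not Teamware code; the function names and the "20%" string convention are assumptions made for the example.

```python
import math


def max_documents(limit, corpus_size):
    """Resolve a limit given as a count (e.g. 5) or a percentage string (e.g. "20%")."""
    if isinstance(limit, str) and limit.endswith("%"):
        return math.floor(corpus_size * float(limit[:-1]) / 100)
    return int(limit)


def may_take_another(already_annotated, limit, corpus_size):
    """True if the annotator is still below their allowed number of documents."""
    return already_annotated < max_documents(limit, corpus_size)
```

With a 20% limit on a 100-document corpus, an annotator who has completed 19 documents may take one more, and is then refused further documents.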
notation adjudication, gold-standard production, and annotator training. They also need to communicate with annotators when questions arise. Therefore, they need to have wider privileges in the system. In addition to the standard annotation interfaces, they need to have access to the actual corpus and its documents and run IAA metrics. They also need a specialised adjudication interface which helps them identify and reconcile differences in multiply annotated documents. For some annotation projects, they also need to be able to send a problematic document back for re-annotation.
their workflows, monitoring their progress, and dealing with performance issues. Depending on project specifics, they may work together with the curators and define the annotation guidelines, the associated schemas (or set of tags), and prepare and upload the corpus to be annotated. They also make methodological choices: whether to have multiple annotators per document; how many; which automatic NLP services need to be used to pre-process the data; and what is the overall workflow of annotation, quality assurance, adjudication, and corpus delivery.

Managers need a project monitoring tool where they can see:

- Whether a corpus is currently assigned to a project or, what annotation projects have been run on the corpus with links to these projects or their archive reports (if no longer active). Also links to the annotation schemas for all annotation types currently in the corpus.
- Project completion status (e.g., 80% manually annotated, 20% adjudicated).
- Annotator statistics within and across projects: which annotator worked on each of the documents, what schemas they used, how long they took, and what was their IAA (if measured).
- The ability to lock a corpus from further editing, either during or after a project.
- Ability to archive project reports, so projects can be deleted from the active list. Archives should preserve information on what was done and by whom, how long it took, etc.
23.3
Teamware is a web-based collaborative annotation and curation environment, which allows unskilled annotators to be trained and then used to lower the cost of corpus annotation projects. Further cost reductions are achieved by bootstrapping with relevant automatic annotation services, where these exist, and/or through mixed initiative learning methods. It has a service-based architecture which is parallel, distributed, and also scalable (via service replication) (see Figure 23.1).

As shown in Figure 23.1, the Teamware architecture consists of SOAP web services for data storage, a set of web-based user interfaces (UI Layer), and an executive layer in the middle where the workflows of the specific annotation projects are defined. The UI Layer is connected with the Executive Layer for exchanging command and control messages (such as requesting the ID for a document that needs to be annotated next), and also it connects directly to the services layer for data-intensive communication (such as downloading the actual document data, and uploading back the annotations produced).
The endpoint, message queue and worker(s) are conceptually and logically separate, and may be physically hosted within the same Java Virtual Machine (VM), within separate VMs on the same physical host, or on separate hosts connected over a network. When a service is first deployed it will typically be as a single worker which resides in the same VM as the service endpoint. This may be adequate for simple or lightly-loaded services but for more heavily-loaded services additional workers may be added dynamically without shutting down the web service, and similarly workers may be removed when no longer required. All workers that are configured to consume jobs from the same endpoint will transparently share the load. Multiple workers also provide fault-tolerance: if a worker fails, its in-progress jobs will be returned to the queue and will be picked up and handled by other workers.
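The fault-tolerance behaviour described above (a failed worker's in-progress jobs go back on the queue for other workers) can be modelled in a few lines. The sketch is a single-process toy model under stated assumptions, with hypothetical class and method names; it says nothing about how Teamware actually implements its message queue.

```python
from collections import deque


class JobQueue:
    """Toy model of a shared job queue consumed by several workers."""

    def __init__(self, jobs):
        self.pending = deque(jobs)
        self.in_progress = {}  # worker name -> jobs it is currently handling

    def take(self, worker):
        """Hand the next pending job to `worker`, or None if the queue is empty."""
        if not self.pending:
            return None
        job = self.pending.popleft()
        self.in_progress.setdefault(worker, []).append(job)
        return job

    def complete(self, worker, job):
        """Record that `worker` finished `job`."""
        self.in_progress[worker].remove(job)

    def worker_failed(self, worker):
        """Return the failed worker's unfinished jobs to the queue."""
        for job in self.in_progress.pop(worker, []):
            self.pending.append(job)
```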
been developed to meet most of the requirements discussed in Section 23.2.4 above. Firstly, it provides dynamic workflow management: create, read, update, delete (CRUD) workflow definitions, and workflow actions. Secondly, it supports business process monitoring, i.e., measures how long annotators take, how good they are at annotating, as well as reporting the overall progress and costs. Thirdly, there is a workflow execution engine which runs the actual annotation projects. As part of the execution process, the project manager selects the number of annotators per document; the annotation schemas; the set of annotators and curator(s) involved in the project; and the corpus to be annotated.

Figure 23.2 shows an example workflow template. The diagram on the right shows the choice points in workflow templates: whether to do automatic annotation or manual or both; which automatic annotation services to execute and in what sequence; and for manual annotation, what schemas to use, how many annotators per document, whether they can reject annotating a document, etc. The left-hand side shows the actual selections made for this particular workflow, i.e., use both automatic and manual annotation; annotate measurements, references, and sections; and have one annotator per document. Once this template is saved by the project manager, then it can be executed by the workflow engine on a chosen corpus and list of annotators and curators. The workflow engine will first call the automatic annotation service to bootstrap and then its results will be corrected by human annotators.
The rationale behind having an executive layer rather than defining authentication and workflow management as services similar to the storage and ontology ones comes from the fact that Teamware services are all SOAP web services, whereas elements of the executive layer are only in part implemented as SOAP services with the rest being browser based. Conceptually also the workflow manager acts like a middleman that ties together all the different services and communicates with the user interfaces.
or if an annotator needs to log off prior to completing a document. The next time they log in and request a new task, they will be given this document to complete first.
yet to be completed (see Figure 23.5). Per annotator statistics are also available: time spent per document, overall time worked, average IAA, etc. These requirements were discussed in further detail in Section 23.2.1 above.
23.4
Practical Applications
Teamware has already been used in practice in over 10 corpus annotation projects of varying complexity and size; due to space limitations, here we focus on three representative ones. Firstly, we tested the robustness of the data layer and the workflow manager in the face of simultaneous concurrent access. For this we annotated 100 documents, 2 annotators per document, with 60 active annotators requesting documents to annotate and saving their results on the server. There were no latency or concurrency issues reported.

Once the current version was considered stable, we ran several corpus annotation projects to produce gold standards for IE evaluation in three domains: business intelligence, fisheries, and bio-informatics. The latter involved 10 bio-informatics students who were first given a brief training session and were then allowed to work from home. The project had 2 annotators per document, working with 6 entity types and their features. Overall, 109 Medline abstracts of around 200-300 words each were annotated with an average annotation speed of 9 minutes per abstract. This project revealed several shortcomings of Teamware which will be addressed in the forthcoming version 2:

- IAA is calculated per document, but there is no easy way to see how it changes across the entire corpus.
- The datastore layer can sometimes leave the data in an inconsistent state following an error, due to the underlying binary Java serialisation format. A move towards XML file-based storage is being investigated.
- There needs to be a limit on the proportion of documents which any given annotator is allowed to work on, since one over-zealous annotator ended up introducing a significant bias by annotating more than 80% of all documents.

The most versatile and still ongoing practical use of Teamware has been in a commercial context, where a company has two teams of 5 annotators each (one in China and one in the Philippines). The annotation projects are being defined and overseen by managers in the USA, who also act occasionally as curators. They have found that the standard double-annotated agreement-based approach is a good foundation for their commercial needs (e.g., in the early stages of the project and continuously for gold standard production), while they also use very simple workflows where the results of automatic services are being corrected by annotators, working only one per document to maximise volume and lower the costs.

In the past few months they have annotated over 1,400 documents, many of which according to multiple schemas and annotation guidelines. For instance, 400 patent documents were doubly annotated both with measurements (IAA achieved 80-95%) and bio-informatics entities, and then curated and adjudicated to create a gold standard. They also annotated 1000 Medline abstracts with species information where they measured an average speed of 5-7 minutes per document. The initial annotator training in Teamware was between 30 minutes and one hour, following which they ran several small-scale experimental projects to train the annotators in the particular annotation guidelines (e.g., measurements in patents). Annotation speed also improved over time, as the annotators became more proficient with the guidelines: the Teamware annotator statistics registered improvements of between 15 and 20%. Annotation quality (measured through inter-annotator agreement) remained high, even when annotators have worked on many documents over time.
https://gate.svn.sourceforge.net/svnroot/gate/mimir/trunk
GATE Mímir
A.1
Change Log
markup, both plain text and XML dump files such as those from Wikipedia (see Section 21.30).
MediaWiki
In addition, ready-made applications have been added to many existing plugins (notably the Lang_* non-English language plugins) to make it easier to experiment with their PRs.
subclass is a common way of providing a PR with a different set of default parameters (this is used extensively in the language plugins to provide custom gazetteers and named entity transducers). This has the added benefit of ensuring that new features also automatically percolate down to these subclasses. If you have developed your own PR that extends one of the ANNIE ones you may find it has acquired new parameters that were not there previously; you may need to use the @HiddenCreoleParameter annotation to suppress them.

The corpus parameter of LanguageAnalyser (an interface most, if not all, PRs implement) is now annotated as @Optional as most implementations do not actually require the parameter to be set.

When saving an application the plugins are now saved in the same order in which they were originally loaded into GATE. This ensures that dependencies between plugins are correctly maintained when applications are restored.

API support for working with relations between annotations was added. See Section 7.7 for more details.

The method of populating a corpus from a single file has been updated to allow any mime type to be used when creating the new documents.

And numerous smaller bug fixes and performance improvements...
A.2
SSV
Improved the support for processing biomedical text by adding new PRs to incorporate the following tools: AbGene, the NormaGene tagger, the GENIA sentence splitter, MutationFinder and the Penn BioTagger (contains a tokenizer and three taggers for gene, malignancy and variation). For full details of these new resources see section 16.1.

The Flexible Gazetteer PR has been rewritten to provide a better and faster implementation. The two parameters inputAnnotationSetName and outputAnnotationSetName have been renamed to inputASName and outputASName; however, old applications with the old parameters should still work. Please see Section 13.6 for more details.
The Segment Processing PR has two additional run-time parameters called segmentAnnotationFeatureName and segmentAnnotationFeatureValue. These features allow users to specify a constraint on feature name and feature value. If the user has provided values for these parameters, only the annotations with the specified feature name and feature value are processed with the Segment Processing PR. Also, the parameter controller has been renamed to analyser, which means the Segment Processing PR can now also run an individual PR on the specified segments (footnote 1). See 19.2.10 for more information on section-by-section processing.

The Hash Gazetteer (section 13.5) now properly supports the caseSensitive parameter (previously the parameter could be set but had no effect).

The Document Reset PR (Section 6.1) now defaults to keeping the Key set as well as Original markups. This makes working with pre-annotated gold standard documents less dangerous (assuming you put the gold standard annotations in a set called Key).

Updated the Stanford Parser plugin (see Section 17.4) to version 1.6.8.

The TextCat based Language Identification PR now supports generating new language fingerprints. See section 15.1 for full details.

Added support for reading CAS and XMI-format documents created by UIMA. See section 5.5.9 for details.

Various improvements to the GATE Developer GUI:

- added support in the document editor to switch the principal text orientation, to better support documents written in right-to-left languages such as Arabic, Hebrew or Urdu (section 3.2).
- added new mouse shortcuts to the Annotation Stack view in the document editor to speed up the curation process (section 3.4.3).
- the document editor layout is now saved to the user preferences file, gate.xml. It means that you can give this file to a new user so s/he will have a preconfigured document editor (section 3.2).
- the script behind an instance of the Groovy Scripting PR (section 7.17.2) can now be edited from within GATE Developer through a new visual resource which supports syntax highlighting.

The rule and phase names are now accessible in JAPE Java RHS by the ruleName() and phaseName() methods and the name of the JAPE processing resource executing the JAPE transducer is accessible through the action context getPRName() method. See section 8.6.5.
1 Existing saved applications using the controller parameter will still work provided the controller in question implements the LanguageAnalyser interface. The CorpusController implementations supplied as
A.3
(5.5 metres, 1 minute 30 seconds, 10 to 15 pounds, etc.) along with their normalized values in SI units. See section 21.8 for full details.
Tagger_DateNormalizer to annotate and normalize dates within a document. See se

Schema_Tools providing a Schema Enforcer PR that can be used to create a clean output annotation set based on a set of annotation schemas. See section 21.15 for full details.
There are numerous performance enhancements and bug fixes detailed in section 16.1.2. Note that this version of the plugin is not compatible with the version provided in GATE 6.0, though this earlier version is still available in the Obsolete directory if required.
- addNamespaceFeatures - set to true to deserialize namespace prefix and URI information as features.
- namespaceURI - The feature name to use that will hold the namespace URI of the element, e.g. namespace
- namespacePrefix - The feature name to use that will hold the namespace prefix of the element, e.g. prefix

Setting these attributes will alter GATE's default namespace deserialization behaviour to remove the namespace prefix and add it as a feature, along with the namespace URI. This allows namespace-prefixed elements in the Original markups annotation set to be matched with JAPE expressions, and also allows namespace scope to be added to new annotations when serialized to XML. See 5.5.2 for details.

Searchable Serial Datastores (Lucene-based) are now portable and can be moved across different systems. Also, several GUI improvements have been made to ease the creation of Lucene datastores. See chapter 9 for details.

The populate method that allowed populating a corpus from a trecweb file has been made more generic to accept a tag. The method extracts content between the start and end of this tag to create new documents. In GATE Developer, right-clicking on an instance of the Corpus and choosing the option "Populate from Single Concatenated File" allows users to populate the corpus using this functionality. See Section 7.4.5 for more details.

Fixed a regression in the JAPE parser that prevented the use of RHS macros that refer to a LHS label (named blocks :label { ... } and assignments :label.Type = {}).

Enhanced the Groovy scriptable controller with some features inspired by the realtime controller, in particular the ability to ignore exceptions thrown by PRs and the ability to limit the running time of certain PRs. See section 7.17.3 for details.

The Ontology and Gazetteer_LKB plugins have been upgraded to use Sesame 3.2.3 and OWLIM 3.5.

The Websphinx Crawler PR (section 21.17) has new runtime parameters for controlling the maximum page size and spoofing the user-agent.

A few bug fixes and improvements to the recover logic of the packagegapp Ant task (see section E.2).

... and many other smaller bugfixes.
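The idea behind populating a corpus from a single concatenated file, where the content between each start and end of a given tag becomes one document, can be sketched with a regular expression. This is an illustration of the concept only and not GATE's implementation; the function name, the include_root flag and the simple regex-based matching (which does not handle nested occurrences of the tag) are assumptions.

```python
import re


def split_concatenated(text, tag, include_root=False):
    """Split `text` into one string per <tag>...</tag> element.

    With include_root=False only the content between the start and end
    tags is returned; with include_root=True the tags are kept.
    """
    pattern = re.compile(
        r"<{0}(?:\s[^>]*)?>(.*?)</{0}>".format(re.escape(tag)),
        re.DOTALL,
    )
    return [m.group(0) if include_root else m.group(1)
            for m in pattern.finditer(text)]
```

Each returned string would then be used as the content of a new document.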
Note: As of version 6.1, GATE Developer and Embedded require Java 6 or later and will no longer run on Java 5. If you require Java 5 compatibility you should use GATE 6.0.
A.4
Changed the semantics of the ontology-aware matching mode in JAPE to take account of the default namespace in an ontology. Now class feature values that are not complete URIs will be treated as naming classes within the default namespace of the target ontology only, and not (as previously) any class whose URI ends with the specified name. This is more consistent with the way OWL normally works, as well as being much more efficient to execute. See section 14.10 for more details.
Updated the WordNet plugin to support more recent releases of WordNet than 1.6. The format of the configuration file has changed; if you are using the previous WordNet 1.6 support you will need to update your configuration. See section 21.18 for details.
The deprecated Tagger_TreeTagger plugin has been removed; applications that used it will need to be updated to use the Tagger_Framework plugin instead. See section 21.3 for details of how to do this.
from the input annotations. See section 21.3 for details.
Added new parameters and options to the LingPipe Language Identifier PR (section 21.23.5), and corrected the documentation for the LingPipe POS Tagger (section 21.23.3).
In the document editor, fixed several exceptions so that editing text with annotations highlighted now works. You should now be able to edit the text, and the annotations should behave correctly, that is to say move, expand or disappear according to the text insertions and deletions.
Options for the document editor: read-only and insert append/prepend have been moved from the options dialogue to the document editor toolbar, at the top right on the triangle icon that displays a menu with the options. See section 3.2.
Added new parameters and options to the Crawl PR and document features to its output; see section 21.17 for details.
Fixed a bug where ontology-aware JAPE rules worked correctly when the target annotation's class was a subclass of the class specified in the rule, but failed when the two class names matched exactly.
Improved support for conditional pipelines containing non-LanguageAnalyser processing resources.
Added the current Corpus to the script binding for the Groovy Script PR, allowing a Groovy script to access and set corpus-level features. Also added callbacks that a Groovy script can implement to do additional pre- or post-processing before the first and after the last document in a corpus. See section 7.17 for details.
A.5
This is a bugfix release to resolve several bugs that were reported shortly after the release of version 5.2:
Fixed some bugs with the automatic create instance feature in OAT (the ontology annotation tool) when used with the new Ontology plugin.
Added validation to datatype property values of the date, time and datetime types.
Fixed a bug with Gazetteer_LKB that prevented it working when the dictionaryPath contained spaces.
Added a utility class to handle common cases of encoding URIs for use in ontologies, and fixed the example code to show how to make use of this. See chapter 14 for details.
The annotation set transfer PR now copies the feature map of each annotation it transfers, rather than re-using the same FeatureMap (this means that when used to copy annotations rather than move them, the copied annotation is independent from the original and modifying the features of one does not modify the other). See section 21.14 for details.
The Log4J log files are now created by default in the .gate directory under the user's home directory, rather than being created in the current directory when GATE starts, to be more friendly when GATE is installed in a shared location where the user does not have write permission.
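The difference between re-using a feature map and copying it can be illustrated with a small, self-contained sketch. It deliberately does not use GATE's own classes: SketchAnnotation, transferSharing and transferCopying are hypothetical names introduced only for this illustration, and the assumption that the copy is shallow is ours, not something the change log states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an annotation: a type plus a mutable feature map.
// This is NOT GATE's Annotation class; it only illustrates the aliasing issue.
class SketchAnnotation {
    final String type;
    final Map<String, Object> features;

    SketchAnnotation(String type, Map<String, Object> features) {
        this.type = type;
        this.features = features;
    }
}

class FeatureMapCopySketch {
    // Earlier behaviour: the transferred annotation re-uses the same map instance.
    static SketchAnnotation transferSharing(SketchAnnotation original) {
        return new SketchAnnotation(original.type, original.features);
    }

    // Later behaviour: the transferred annotation gets its own copy of the map
    // (assumed here to be a shallow copy).
    static SketchAnnotation transferCopying(SketchAnnotation original) {
        return new SketchAnnotation(original.type, new HashMap<>(original.features));
    }

    public static void main(String[] args) {
        Map<String, Object> features = new HashMap<>();
        features.put("kind", "person");
        SketchAnnotation original = new SketchAnnotation("Mention", features);

        SketchAnnotation shared = transferSharing(original);
        SketchAnnotation copied = transferCopying(original);

        // Modify the original after the transfer.
        original.features.put("kind", "organization");

        System.out.println(shared.features.get("kind")); // organization (aliased map)
        System.out.println(copied.features.get("kind")); // person (independent map)
    }
}
```

With a shared map, an edit to either annotation is visible through both; with a copied map, the two annotations evolve independently, which is the behaviour described above for copy-mode transfers.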
This release also fixes some shortcomings in the Groovy support added by 5.2, in particular:
The corpora variable in the console now includes persistent corpora (loaded from a datastore) as well as transient corpora.
The subscript notation for annotation sets works with long values as well as ints, so someAS[annotation.start()..annotation.end()] works as expected.
A.6
A.7
Version 5.1 is a major increment with lots of new features and integration of a number of important systems from 3rd parties (e.g. LingPipe, OpenNLP, OpenCalais, a revised UIMA connector). We've stuck with the 5 series (instead of jumping to 6.0) because the core remains stable and backwards compatible. Other highlights include:
an entirely new ontology API from Johann Petrak of OFAI (the old one is still available but as a plugin)
new benchmarking facilities for JAPE from Andrew Borthwick and colleagues at Intelius
new quality assurance tools from Thomas Heitz and colleagues at Ontotext and Sheffield
a generic tagger integration framework from René Witte of Concordia University
several new code contributions from Ontotext, including a large knowledge-based gazetteer and various plugin wrappers from Marin Nozchev, Georgi Georgiev and colleagues
a revised and reordered user guide, amalgamated with the programmers' guide and other materials
Groovy support, application composition, section-by-section processing and lots of other bits and pieces
OpenNLP Support
OpenNLP provides tools for sentence detection, tokenization, pos-tagging, chunking and parsing, named-entity detection, and coreference. The tools use Maximum Entropy modelling. We have provided a plugin called 'OpenNLP' with wrappers for some of the resources available in the OpenNLP Tools library. For more details, see section 21.24.
OpenCalais Support
We added a new PR called 'OpenCalais PR'. This will process a document through the OpenCalais service, and add OpenCalais entity annotations to the document. For more details, see Section 21.22.
Ontology API
The ontology API (package gate.creole.ontology) has been changed. The existing ontology implementation based on Sesame1 and OWLIM2 (package gate.creole.ontology.owlim) has been moved into the plugin Ontology_OWLIM2. An upgraded implementation based on Sesame2 and OWLIM3 that also provides a number of new features has been added as plugin Ontology. See Section 14.13 for a detailed description of all changes.
Benchmarking Improvements
A number of improvements have been made to the benchmarking support in GATE. JAPE transducers now log the time spent in individual phases of a multi-phase grammar and by individual rules within each phase. Other PRs that use JAPE grammars internally (the pronominal coreferencer, English tokeniser) log the time taken by their internal transducers. A reporting tool, called 'Profiling Reports' under the 'Tools' menu, makes summary information easily available. For more details, see chapter 11.
GUI improvements
To deal with quality assurance of annotations, one component has been updated and two new components have been added.
The annotation diff tool has a new mode to copy annotations to a consensus set, see section 10.2.1.
An annotation stack view has been added in the document editor; it makes overlapping annotations easier to see and allows annotations to be copied to a consensus set, see section 3.4.3.
A corpus view has been added for all corpora to get statistics like precision, recall and F-measure, see section 10.3.
ABNER Support
ABNER is A Biomedical Named Entity Recogniser, for finding entities such as genes in text. We have provided a plugin called 'AbnerTagger' with a wrapper for ABNER. For more details, see section 16.1.1.
Section-by-Section Processing
We have added a new PR called 'Segment Processing PR'. As the name suggests, this PR allows processing individual segments of a document independently of one another. For more details, please look at section 19.2.10.
Application Composition
The gate.Controller implementations provided with the main GATE distribution now also implement the gate.ProcessingResource interface. This means that an application can now contain another application as one of its components.
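The idea that a controller is itself a processing resource is an instance of the composite pattern, and can be sketched without any GATE dependency. In the sketch below, SketchProcessingResource and SketchPipeline are hypothetical stand-ins for gate.ProcessingResource and a gate.Controller implementation; they are not the real interfaces, and the log-based execute method is an assumption made purely so the nesting is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a processing resource: something that can be executed.
interface SketchProcessingResource {
    void execute(StringBuilder log);
}

// Hypothetical stand-in for a controller that is ALSO a processing resource,
// so it can be placed inside another controller.
class SketchPipeline implements SketchProcessingResource {
    private final String name;
    private final List<SketchProcessingResource> components = new ArrayList<>();

    SketchPipeline(String name) {
        this.name = name;
    }

    SketchPipeline add(SketchProcessingResource pr) {
        components.add(pr);
        return this;
    }

    @Override
    public void execute(StringBuilder log) {
        log.append(name).append('[');
        // Run each component in order; a component may itself be a pipeline.
        for (SketchProcessingResource pr : components) {
            pr.execute(log);
        }
        log.append(']');
    }
}

class CompositionSketch {
    // A trivial leaf resource that just records its label.
    static SketchProcessingResource step(String label) {
        return log -> log.append(label).append(' ');
    }

    static String run() {
        SketchPipeline inner = new SketchPipeline("inner")
                .add(step("tokenise"))
                .add(step("split"));
        // The inner application is added to the outer one like any other component.
        SketchPipeline outer = new SketchPipeline("outer")
                .add(inner)
                .add(step("tag"));
        StringBuilder log = new StringBuilder();
        outer.execute(log);
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // outer[inner[tokenise split ]tag ]
    }
}
```

Because the nested pipeline satisfies the same interface as a leaf resource, the outer pipeline needs no special handling for it.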
Groovy Support
Groovy is a dynamic programming language based on Java. You can now use it as a scripting language for GATE, via the Groovy Console. For more details, see Section 7.17.
A.8
Ontology-Based Gazetteer
Added a new plugin 'Gazetteer_Ontology_Based', which contains OntoRoot Gazetteer, a dynamically created gazetteer which is, in combination with a few other generic resources, capable of producing ontology-aware annotations over the given content with regards to the given ontology. For more details see Section 13.8.
GUI Improvements
A new schema-driven tool to streamline manual annotation tasks (see Section 3.4.6).
Context-sensitive help on elements in the resource tree and when pressing the F1 key. Search in the mailing list from the Help menu. Help is displayed in your browser, or in a Java browser if you don't have one.
Improved search function inside documents with a regular expression builder. Search and replace annotation function in all annotation editors.
Remember for each resource type the last path used when loading/saving a resource.
Remember the last annotations selected in the annotation set view when you shift click on the annotation set view button.
Improved context menu and, when possible, added drag and drop in: resource tree, annotation set view, annotation list view, corpus view, controller view. The context menu key can now be used if you have Java 1.6.
New dialog box for error messages with user-oriented messages, optional display of the configuration, and proposals for some useful actions. This will progressively replace the old stack trace dump into the message panel, which is still here for the moment but should be hidden by default in the future.
Added a read-only document mode that can be enabled from the Options menu.
Added a selection filter in the status bar of the annotations list table to easily select rows based on the text you enter.
Added the last five applications loaded/saved to the context menu of the language resources in the resources tree.
Display more information on what is going on in the waiting dialog box when running an application. The goal is to improve it to get a global progress bar and estimated time.
Added a new 'getCovering' method to AnnotationSet. This method returns annotations that completely span the provided range. An optional annotation type parameter can be provided to further limit the returned set.
Complete redesign of the ANNIC GUI. More details in Section 9.
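The "completely span the provided range" semantics of getCovering can be sketched as follows. This is an illustration only, not GATE's implementation: Span and covering are hypothetical names, and treating the boundaries as inclusive (a span starting exactly at the range start, or ending exactly at the range end, still covers it) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class CoveringSketch {
    // Hypothetical stand-in for an annotation: a type and character offsets.
    static final class Span {
        final String type;
        final long start;
        final long end;

        Span(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Returns the spans that completely cover [start, end].
    // If type is non-null, only spans of that type are considered.
    static List<Span> covering(List<Span> all, String type, long start, long end) {
        List<Span> result = new ArrayList<>();
        for (Span s : all) {
            boolean typeMatches = (type == null) || type.equals(s.type);
            boolean covers = s.start <= start && s.end >= end; // inclusive bounds assumed
            if (typeMatches && covers) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> all = new ArrayList<>();
        all.add(new Span("Sentence", 0, 50));
        all.add(new Span("Token", 10, 15));
        all.add(new Span("Person", 10, 20));

        System.out.println(covering(all, null, 10, 15).size());     // 3: every span covers 10..15
        System.out.println(covering(all, "Person", 10, 15).size()); // 1: type filter applied
        System.out.println(covering(all, null, 5, 20).size());      // 1: only the Sentence span
    }
}
```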
A.9
Ontology API
A new ontology API, based on OWL In Memory (OWLIM), which offers a better API, a revised ontology event model and an improved ontology editor, to name but a few. See Chapter 14 for more details.
OCAT
Ontology-based Corpus Annotation Tool to help annotators to manually annotate documents using ontologies. For more details please see Section 14.6.
Alignment Tools
A new set of components (e.g. CompoundDocument, AlignmentEditor etc.) that help in building alignment tools and in carrying out cross-document processing. See Chapter 19 for more details.
Java 5.0 Support
GATE now requires Java 5.0 or later to compile and run. This brings a number of benefits:
Java 5.0 syntax is now available on the right hand side of JAPE rules with the default Eclipse compiler. See Section 8.6 for details.
enum types are now supported for resource parameters. See Section 7.12 for details on defining the parameters of a resource.
AnnotationSet and the CreoleRegister take advantage of generic types. The AnnotationSet interface is now an extension of Set<Annotation> rather than just Set, which should make for cleaner and more type-safe code when programming to the API, and the CreoleRegister now uses parameterized types, which are backwards-compatible but provide better type-safety for new code.
A new interface has been added that lets PRs receive notification at the start and end of execution of their containing controller. This is useful for PRs that need to do cleanup or other processing after a whole corpus has been processed. See Section 4.4 for details.
The GATE Developer GUI does not call System.exit() any more when it is closed. Instead an effort is made to stop all active threads and to release all GUI resources, which leads to the JVM exiting gracefully. This is particularly useful when GATE is embedded in other systems, as closing the main GATE window will not kill the JVM process any more.
The set of AnnotationSchemas that used to be included in the core gate.jar and loaded as builtins have now been moved to the ANNIE plugin. When the plugin is loaded, the default annotation schemas are instantiated automatically and are available when doing manual annotation.
There is now support in creole.xml files for automatically creating instances of a resource that are hidden (i.e. do not show in the GUI). One example of this can be seen in the creole.xml file of the ANNIE plugin, where the default annotation schemas are defined.
A couple of helper classes have been added to assist in using GATE within a Spring application. Section 7.15 explains the details.
Improvements have been made to the thread-safety of some internal components, which mean that it is now safe to create resources in multiple threads (though it is not safe to use the same resource instance in more than one thread). This is a big advantage when using GATE in a multithreaded environment, such as a web application. See Section 7.14 for details.
Plugins can now provide custom icons for their PRs and LRs in the plugin JAR file. See Section 7.12 for details.
It is now possible to override the default location for the saved session file using a system property. See Section 2.3 for details.
The TreeTagger plugin ('Tagger_TreeTagger') supports a system property to specify the location of the shell interpreter used for the tagger shell script. In combination with Cygwin this makes it much easier to use the tagger on Windows.
The Buchart plugin has been removed. It is superseded by SUPPLE, and instructions on how to upgrade your applications from Buchart to SUPPLE are given in Section 17.3. The probability finder plugin has also been removed, as it is no longer maintained.
The bootstrap wizard now creates a basic plugin that builds with Ant. Since a Unix-style make command is no longer required, this means that the generated plugin will build on Windows without needing Cygwin or MinGW.
The GATE source code has moved from CVS into Subversion. See Section 2.2.3 for details of how to check out the code from the new repository.
An optional parameter, keepOriginalMarkupsAS, has been added to the DocumentReset PR which allows users to decide whether to keep the Original Markups AS or not while resetting the document. See Section 6.1 for more details.
- The document UI refreshes faster than before.
- The presence of the GUI for a document induces a smaller performance penalty than it used to. Due to a better threading implementation, machines benefiting from multiple CPUs (e.g. dual CPU, dual core or hyperthreading machines) should only see a negligible increase in processing time when a document is displayed compared to the situations where the document view is not shown. In the previous version, displaying a document while it was processed used to increase execution time by an order of magnitude.
- The strange exceptions that used to occur occasionally while working with the document GUI should not happen any more.
And as always there are many smaller bugfixes too numerous to list here...
A.10
New Ontology API
The ontology layer has been rewritten in order to provide an abstraction layer between the model representation and the tools used for input and output of the various representation formats. An implementation that uses Jena 2 (http://jena.sourceforge.net/ontology) for reading and writing OWL and RDF(S) is provided.
Limited support for loading PDF and Microsoft Word document formats. Only the text is extracted from the documents; no formatting information is preserved.
The Buchart parser has been deprecated and replaced by a new plugin called SUPPLE - the Sheffield University Prolog Parser for Language Engineering. Full details, including information on how to move your application from Buchart to SUPPLE, are in Section 17.3.
The Hepple POS Tagger is now open-source. The source code has been included in the GATE Developer/Embedded distribution, under src/hepple/postag. More information about the POS Tagger can be found in Section 6.6.
Minipar is now supported on Windows. minipar-windows.exe, a modified version of pdemo.cpp, is added under the gate/plugins/Parser_Minipar directory to allow users to run Minipar on the Windows platform. While using Minipar on Windows, this binary should be provided as the value for the miniparBinary parameter. For full information on Minipar in GATE, see Section 17.1.
The XmlGateFormat writer (Save As Xml from the GATE Developer GUI, gate.Document.toXml() from the GATE Embedded API) and reader have been modified to write and read GATE annotation IDs. For backward compatibility reasons the old reader has been kept. This change fixes a bug which manifested in the following situation: if a GATE document had annotations carrying features whose values were numbers representing other GATE annotation IDs, after a save and a reload of the document to and from XML, the former values of the features could have become invalid by pointing to other annotations. By saving and restoring the GATE annotation ID, the former consistency of the GATE document is maintained. For more information, see Section 5.5.2.
The NP chunker and chemistry tagger plugins have been updated. Mark A. Greenwood has relicenced them under the LGPL, so their source code has been moved into the GATE Developer/Embedded distribution. See Sections 21.2 and 21.4 for details.
The Tree Tagger wrapper has been updated with an option to be less strict when characters that cannot be represented in the tagger's encoding are encountered in the document.
JAPE Transducers can be serialized into binary files. The option to load a serialized version of a JAPE Transducer (an init-time parameter binaryGrammarURL) is also implemented, which can be used as an alternative to the parameter grammarURL. More information can be found in Section 8.9.
On Mac OS X, GATE Developer now behaves more 'naturally'. The application menu items and keyboard shortcuts for About and Preferences now do what you would expect, and exiting GATE Developer with command-Q or the Quit menu item properly saves your options and current session.
Updated versions of Weka (3.4.6) and Maxent (2.4.0).
Optimisation in gate.creole.ml: now faster.
It is now possible to create your own implementation of Annotation, and have GATE use this instead of the default implementation. See AnnotationFactory and AnnotationSetImpl in the gate.annotation package for details.
A.11
January 2005
Release of version 3.
New plugins for processing in various languages (see 15). These are not full IE systems but are designed as starting points for further development (French, German, Spanish, etc.), or as sample or toy applications (Cebuano, Hindi, etc.).
Other new plugins:
Chemistry Tagger (21.4)
Montreal Transducer (since retired)
RASP Parser (17.2)
MiniPar (17.1)
Buchart Parser (17.3)
MinorThird (Version 5.1: removed)
NP Chunker (21.2)
Stemmer (21.10)
TreeTagger
Probability Finder
Crawler (21.17)
Google PR (C.2)
Support for SVM Light, a support vector machine implementation, has been added to the machine learning plugin 'Learning' (see section 18.3.5).
A.12
December 2004
GATE no longer depends on the Sun Java compiler to run, which means it will now work on any Java runtime environment of at least version 1.4. JAPE grammars are now compiled using the Eclipse JDT Java compiler by default.
A welcome side-effect of this change is that it is now much easier to integrate GATE-based processing into web applications in Tomcat. See Section 7.16 for details.
A.13
September 2004
GATE applications are now saved in XML format using the XStream library, rather than by using native Java serialization. On loading an application, GATE will automatically detect whether it is in the old or the new format, and so applications in both formats can be loaded. However, older versions of GATE will be unable to load applications saved in the XML format. (A java.io.StreamCorruptedException: invalid stream header exception will occur.) It is possible to get new versions of GATE to use the old format by setting a flag in the source code. (See the Gate.java file for details.) This change has been made because it allows the details of an application to be viewed and edited in a text editor, which is sometimes easier than loading the application into GATE.
A.14
Version 3 incorporates a lot of new functionality and some reorganisation of existing components.
Note that Beta 1 is feature-complete but needs further debugging (please send us bug reports!).
Highlights include: completely rewritten document viewer/editor; extensive ontology support; a new plugin management system; separate .jar files and a Tomcat classloading fix; lots more CREOLE components (and some more to come soon).
Almost all the changes are backwards-compatible; some recent classes have been renamed (particularly the ontologies support classes) and a few events added (see below); datastores created by version 3 will probably not read properly in version 2. If you have problems use the mailing list and we'll help you fix your code!
The gory details:
Anonymous CVS is now available. See Section 2.2.3 for details.
CREOLE repositories and the components they contain are now managed as plugins. You can select the plugins the system knows about (and add new ones) by going to 'Manage CREOLE Plugins' on the file menu.
The gate.jar file no longer contains all the subsidiary libraries and CREOLE component resources. This makes it easier to replace library versions and/or not load them when not required (libraries used by CREOLE builtins will now not be loaded unless you ask for them from the plugins manager console).
ANNIE and other bundled components now have their resource files (e.g. pattern files, gazetteer lists) in a separate directory in the distribution - gate/plugins.
Some testing with Sun's JDK 1.5 pre-releases has been done and no problems reported.
The gate:// URL system used to load CREOLE and ANNIE resources in past releases is no longer needed. This means that loading in systems like Tomcat is now much easier.
Mac OS X is now properly supported by the installer and the runtime.
An Ontology-based Corpus Annotation Tool (OCAT) has been implemented as a plugin. Documentation of its functionality is in Section 14.6.
The NLG Lexical tools from the MIAKT project have now been released.
The Features viewer/editor has been completely updated - see Section 3.4.5 for details.
The Document editor has been completely rewritten - see Section 3.2 for more information.
The datastore viewer is now a full-size VR - see Section 3.9.2 for more information.
A.15
July 2004
GATE documents now fire events when the document content is edited. This was added in order to support the new facility of editing documents from the GUI. This change will break backwards compatibility by requiring all DocumentListener implementations to implement a new method:
A.16
June 2004
A new algorithm has been implemented for the AnnotationDiff function. A new, more usable, GUI is included, and an 'Export to HTML' option added. More details about the AnnotationDiff tool are in Section 10.2.1.
A new build process, based on ANT (http://ant.apache.org/), is now available. The old build process, based on make, is now unsupported. See Section 2.5 for details of the new build process.
A Jape Debugger from Ontos AG has been integrated. You can turn integration ON with the command line option '-j'. If you run GATE Developer with this option, the new menu item for the Jape Debugger GUI will appear in the Tools menu. The default value of integration is OFF. We are currently awaiting documentation for this.
NOTE! Keep in mind there is a ClassCastException if you try to debug ConditionalCorpusPipeline. Jape Debugger is designed for Corpus Pipeline only. The Ontos code needs to be changed to allow debugging of ConditionalCorpusPipeline.
A.17
April 2004
There are now two alternative strategies for ontology-aware grammar transduction:
using the ontology feature both in grammars and annotations, with the default Transducer;
using the ontology-aware transducer, passing an ontology LR to a new subsume method in the SimpleFeatureMapImpl. The latter strategy does not check for ontology features (this will make the writing of grammars easier - no need to specify ontology).
The changes are in:
SinglePhaseTransducer (always call subsume with ontology - if null then the ordinary subsumption takes place)
SimpleFeatureMapImpl (new subsume method using an ontology LR)
More information about the ontology-aware transducer can be found in Section 14.10.
A morphological analyser PR has been added. This finds the root and affix values of a token and adds them as features to that token.
A flexible gazetteer PR has been added. This performs lookup over a document based on the values of an arbitrary feature of an arbitrary annotation type, by using an externally provided gazetteer. See 13.6 for details.
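The ontology-aware subsumption idea above (fall back to ordinary subsumption when no ontology is given, otherwise let a class feature match any subclass) can be sketched in plain Java. This is a simplified illustration and not the SimpleFeatureMapImpl code: the class and method names, the use of a parent map as a toy ontology, and the choice of "class" as the feature name are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class OntologySubsumeSketch {
    // Toy ontology: maps each class name to its direct parent (assumed acyclic).
    static boolean isSameOrSubClass(Map<String, String> parents, String candidate, String ancestor) {
        String current = candidate;
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = parents.get(current); // walk up one level; null at the root
        }
        return false;
    }

    // True when every constraint is satisfied by the annotation's features.
    // With a null ontology this is ordinary equality-based subsumption;
    // with an ontology, the "class" feature also matches subclasses.
    static boolean subsumes(Map<String, String> ontology,
                            Map<String, String> annotationFeatures,
                            Map<String, String> constraints) {
        for (Map.Entry<String, String> c : constraints.entrySet()) {
            String actual = annotationFeatures.get(c.getKey());
            if (actual == null) {
                return false;
            }
            boolean ok;
            if (ontology != null && "class".equals(c.getKey())) {
                ok = isSameOrSubClass(ontology, actual, c.getValue());
            } else {
                ok = actual.equals(c.getValue());
            }
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> ontology = new HashMap<>();
        ontology.put("City", "Location");

        Map<String, String> annotation = new HashMap<>();
        annotation.put("class", "City");

        Map<String, String> rule = new HashMap<>();
        rule.put("class", "Location");

        System.out.println(subsumes(ontology, annotation, rule)); // true: City is a Location
        System.out.println(subsumes(null, annotation, rule));     // false: plain equality fails
    }
}
```

Note that the subclass walk returns true on an exact name match as well, so a rule naming the annotation's own class is also satisfied.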
A.18
March 2004
Support was added for the MAXENT machine learning library. (See 18.3.4 for details.)
A.19
Note that GATE 2.2 works with JDK 1.4.0 or above. Version 1.4.2 is recommended, and is the one included with the latest installers.
GATE has been adapted to work with Postgres 7.3. The compatibility with PostgreSQL 7.2 has been preserved. Note that as of Version 5.1 PostgreSQL is no longer supported.
New library version - Lucene 1.3 (rc1).
A bug in gate.util.Javac has been fixed in order to account for situations when String literals require an encoding different from the platform default. Temporary .java files used to compile JAPE RHS actions are now saved using UTF-8 and the '-encoding UTF-8' option is passed to the javac compiler.
A custom tools.jar is no longer necessary.
Minor changes have been made to the look and feel of GATE Developer to improve its appearance with JDK 1.4.2.
A.20
Integration of Machine Learning PR and WEKA wrapper (see Section 18.3).
Addition of DAML+OIL exporter.
Integration of WordNet (see Section 21.18).
The syntax tree viewer has been updated to fix some bugs.
A.21
June 2002
Conditional versions of the controllers are now available (see Section 3.8.2). These allow processing resources to be run conditionally on document features.
PostgreSQL Datastores are now supported. These store data into a PostgreSQL RDBMS. (As of Version 5.1 PostgreSQL is no longer supported.)
Addition of OntoGazetteer (see Section 13.3), an interface which makes ontologies visible within GATE Developer, and supports basic methods for hierarchy management and traversal.
Integration of Protégé, so that people with developed Protégé ontologies can use them within GATE.
Addition of IR facilities in GATE (see Section 21.16).
Modification of the corpus benchmark tool (see Section 10.4.3), which now takes an application as a parameter.
See also for details of other recent bug fixes.
Old Name
abner
alignment
annotationMerging
arabic
bdmComputation
cebuano
Chemistry_Tagger
chinese
chineseSegmenter
copyAS2AnoDoc
crawl
french
german
google
hindi
iaaPlugin
italian
Kea
learning
lkb_gazetteer
Minipar
NP_Chunking
Ontology_Based_Gazetteer
OpenCalais
openNLP
rasp
romanian
Stanford
Stemmer
SUPPLE
TaggerFramework
TreeTagger
uima
yahoo
C.1
Note: the JapeC compiler does not currently support the new JAPE language features introduced in July-September 2008. If you need to use negation, the accessors, the contextual operators, or comparison operators other than ==, then you will need to use the standard JAPE transducer instead of JapeC.
JapeC is an alternative implementation of the JAPE language which works by compiling JAPE grammars into Java code. Compared to the standard implementation, these compiled grammars can be several times faster to run. At Ontotext, a modified version of the ANNIE sentence splitter using compiled grammars has been found to run up to five times as fast as the standard version. The compiler can be invoked manually from the command line, or used through the 'Ontotext Japec Compiler' PR in the Jape_Compiler plugin.
The 'Ontotext Japec Transducer' (com.ontotext.gate.japec.JapecTransducer) is a processing resource that is designed to be an alternative to the original Jape Transducer. You can simply replace gate.creole.Transducer with com.ontotext.gate.japec.JapecTransducer in your GATE application and it should work as expected.
The Japec transducer takes the same parameters as the standard JAPE transducer:
grammarURL - the URL from which the grammar is to be loaded. Note that the Japec Transducer will only work on file: URLs. Also, the alternative binaryGrammarURL parameter of the standard transducer is not supported.
encoding - the character encoding used to load the grammars.
ontology - the ontology used for ontology-aware transduction.
Its runtime parameters are likewise the same as those of the standard transducer:
document - the document to process.
inputASName - name of the AnnotationSet from which input annotations to the transducer are read.
outputASName - name of the AnnotationSet to which output annotations from the transducer are written.
The Japec compiler itself is written in Haskell. Compiled binaries are provided for Windows, Linux (x86) and Mac OS X (PowerPC), so no Haskell interpreter is required to run Japec on these platforms. For other platforms, or if you make changes to the compiler source code, you can build the compiler yourself using the Ant build file in the Jape_Compiler plugin directory. You will need to install the latest version of the Glasgow Haskell Compiler (see footnote 1) and associated libraries. The japec compiler can then be built by running:
C.2
Google Plugin
This plugin is no longer operational because the functionality, provided by Google, on which it depends, is no longer available.
C.3
Yahoo Plugin
The Yahoo API is now integrated with GATE, and can be used as a PR-based plugin. This plugin, 'Web_Search_Yahoo', allows the user to query Yahoo and build a document corpus that contains the search results returned by Yahoo for the query. For more information about the Yahoo API please refer to http://developer.yahoo.com/search/. In order to use the Yahoo PR, you need to obtain an application ID.
1 GHC version 6.4.1 was used to build the supplied binaries for Windows, Linux and Mac
The Yahoo PR can be used for a number of different application scenarios. For example, one use case is where a user wants to find the different named entities that can be associated with a particular individual. In this example, the user could build a collection of documents by querying Yahoo with the individual's name and then running ANNIE over the collection. This would annotate the results and show the different Organization, Location and other entities that are associated with the query.
C.4
Gaze is a tool for editing the gazetteer lists, definitions and mapping to ontology. It is suitable for use both for Plain/Linear Gazetteers (Default and Hash Gazetteers) and Ontology-enabled Gazetteers (OntoGazetteer). The Gazetteer PR associated with the viewer is reinitialised every time a save operation is performed. Note that GAZE does not scale up to very large lists (we suggest not using it to view over 40,000 entries and not to copy inside more than 10,000 entries).
Gaze is part of and provided by the ANNIE plugin. To make it possible to visualize gazetteers with the Gaze visualizer, the ANNIE plugin must be loaded first. Double clicking on a gazetteer PR that uses a gazetteer definition (index) file will display the contents of the gazetteer in the main window. The first pane will display the definition file, while the right pane will display whichever gazetteer list has been selected from it.
A gazetteer list can be modified simply by typing in it. It can be saved by clicking the Save button. When a list is saved, the whole gazetteer is automatically reinitialised (and will be ready for use in GATE immediately).
To edit the definition file, right click inside the pane and choose from the options (Insert, Edit, Remove). A pop-up menu will appear to guide you through the remaining process. Save the definition file by selecting Save. Again, the gazetteer will be reinitialised automatically.
New - Pressing New invokes a file dialog where the location of the new definition is specified.
Load - Pressing Load invokes a file dialog, and after locating the new definition it is loaded by pressing Open.
Save - Pressing Save saves the definition to the location from which it has been read.
Save As - Pressing Save As allows another location to be chosen, and the definition saved there.
List, Major Type, Minor Type and Languages. The mandatory fields are List and Major Type. After pressing OK, a new linear node is added to the definition.
Remove - On right-click over a node and choosing Remove, the selected linear node is removed from the definition.
Edit - On right-click over a node and choosing Edit, a dialog is displayed allowing changes of the fields List, Major Type, Minor Type and Languages.
C.5
Google Translator PR
The Google Translator PR allows users to translate their documents into many other languages using the Google translation service. It is based on a library called google-translate-api-java which is distributed under the LGPL licence and is available to download from http://code.google.com/p/google-api-translate-java/. The PR is included in the plugin called Web_Translate_Google and depends on the Alignment plugin (chapter 19).
Suppose a user wants to translate an English document into French using the Google Translator PR. The first thing the user needs to do is to create an instance of CompoundDocument with the English document as a member of it. The CompoundDocument in GATE provides a convenient way to group parallel documents that are translations of one another (see chapter 19 for more information). The idea is to use text from one of the members of the provided compound document, translate it using the Google translation service and create another member with the translated text. In the process, the PR also aligns the chunks of parallel texts. Here, a chunk could be a sentence, paragraph, section or the entire document.
siteReferrer is the only init-time parameter required to instantiate the PR. It has to be a valid website address. The value of this parameter is required to inform Google about the users using their service. There are eight run-time parameters:
document - an instance of the compound document with a member document containing source text.
sourceDocumentId - id of the source member document that needs to be translated.
targetDocumentId - id of the target member document. This document is created by the PR and contains the translated text.
sourceLanguage - the language of the source document.
targetLanguage - the language into which the source document should be translated.
unitOfTranslation - annotation type used for identifying chunks of texts to be translated and aligned.
inputASName - name of the annotation set which contains units of translation.
alignmentFeatureName - name of the alignment feature used for storing the alignment information. The alignment feature is a document feature stored on the compound document.
GATE is a backplane into which specialised Java Beans plug. These beans are loose-coupled with respect to each other - they communicate entirely by means of the GATE framework. Inter-component communication is handled by model components - LanguageResources, and events. Components are defined by conformance to various interfaces (e.g. LanguageResource), ensuring separation of interface and implementation. The reason for adding to the normal bean initialisation mechanism is that LRs, PRs and VRs all have characteristic parameterisation phases; the GATE resources/components model makes these phases explicit.
D.1
Patterns
GATE is structured around a number of what we might call principles, or patterns, or alternatively, clever ideas stolen from better minds than mine. These patterns are:
Design Notes
modelling most things as extensible sets of components (cf. Section D.1.1);
separating components into model, view, or controller (cf. Section D.1.2) types;
hiding implementation behind interfaces (cf. Section D.1.3).

Four interfaces in the top-level package describe the GATE view of components: Resource, ProcessingResource, LanguageResource and VisualResource.
D.1.1 Components
Architectural Principle
Wherever users of the architecture may wish to extend the set of a particular type of entity, those types should be expressed as components. Another way to express this is to say that the architecture is based on agents. I've avoided this in the past because of an association between this term and the idea of bits of code moving around between machines of their own volition. I take this to be somewhat pointless, and probably the result of an anthropomorphic obsession with mobility as a correlate of intelligence. If we drop this connotation, however, we can say that GATE is an agent-based architecture. If we want to, that is.
Framework Expression
Many of the classes in the framework are components, by which we mean classes that conform to an interface with certain standard properties. In our case these properties are based on the Java Beans component architecture, with the addition of component metadata, automated loading and standardised storage, threading and distribution. All components inherit from Resource, via one of the three sub-interfaces LanguageResource (LR), VisualResource (VR) or ProcessingResource (PR). VisualResources (VRs) are straightforward (they represent visualisation and editing components that participate in GUIs), but the distinction between language and processing resources merits further discussion.
Like other software, LE programs consist of data and algorithms. The current orthodoxy in software development is to model both data and algorithms together, as objects1. Systems that adopt the new approach are referred to as Object-Oriented (OO), and there are good reasons to believe that OO software is easier to build and maintain than other varieties [Booch 94, Yourdon 96].
1 Older development methods like Jackson Structured Design [Jackson 75] or Structured Analysis
[Yourdon 89] kept them largely separate.
In the domain of human language processing R&D, however, the terminology is a little more complex. Language data, in various forms, is of such significance in the field that it is frequently worked on independently of the algorithms that process it. For example: a treebank2 can be developed independently of the parsers that may later be trained from it; a thesaurus can be developed independently of the query expansion or sense tagging mechanisms that may later come to use it. This type of data has come to have its own term, Language Resources (LRs) [LREC-1 98], covering many data sources, from lexicons to corpora. In recognition of this distinction, we will adopt the following terminology:
Language Resource (LR): refers to data-only resources such as lexicons, corpora, thesauri or ontologies. Some LRs come with software (e.g. Wordnet has both a user query interface and C and Prolog APIs), but where this is only a means of accessing the underlying data we will still define such resources as LRs.
Processing Resource (PR): refers to resources whose character is principally programmatic or algorithmic, such as lemmatisers, generators, translators, parsers or speech recognisers. For example, a part-of-speech tagger is best characterised by reference to the process it performs on text. PRs typically include LRs, e.g. a tagger often has a lexicon; a word sense disambiguator uses a dictionary or thesaurus.
Additional terminology worthy of note in this context: language data refers to LRs which are at their core examples of language in practice, or 'performance data', e.g. corpora of texts or speech recordings (possibly including added descriptive information as markup); data about language refers to LRs which are purely descriptive, such as a grammar or lexicon. PRs can be viewed as algorithms that map between different types of LR, and which typically use LRs in the mapping process. An MT engine, for example, maps a monolingual corpus into a multilingual aligned corpus using lexicons, grammars, etc.3
Further support for the PR/LR terminology may be gleaned from the argument in favour of declarative data structures for grammars, knowledge bases, etc. This argument was current in the late 1980s and early 1990s [Gazdar & Mellish 89], partly as a response to what has been seen as the overly procedural nature of previous techniques such as augmented transition networks. Declarative structures represent a separation between data about language and the algorithms that use the data to perform language processing tasks; a similar separation to that used in GATE.
Adopting the PR/LR distinction is a matter of conforming to established domain practice and terminology. It does not imply that we cannot model the domain (or build software to support it) in an Object-Oriented manner; indeed the models in GATE are themselves Object-Oriented.
2 A corpus of texts annotated with syntactic analyses. 3 This point is due to Wim Peters.
directly, because we use the Swing toolkit for the GUIs;
By analogy, where LRs are models, VRs are views and PRs are controllers. Of these, the latter sits least easily with the MVC scheme, as PRs may indeed be controllers but may also not be.
D.1.3 Interfaces
Architectural Principle
The implementation of types should generally be hidden from the clients of the architecture.
Framework Expression
With a few exceptions (such as for utility classes), clients of the framework work with the gate.* package. This package is mostly composed of interface definitions. Instantiations of these interfaces are obtained via the Factory class. The subsidiary packages of GATE provide the implementations of the gate.* interfaces that are accessed via the factory. They themselves avoid directly constructing classes from other packages (with a few exceptions, such as JAPE's need for unattached annotation sets). Instead they use the factory.
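The principle above can be illustrated with a minimal, self-contained sketch. It is not the gate.* API: every name in it (Resource, LanguageResource, SimpleDocument, ResourceFactory) is a hypothetical stand-in chosen for illustration, and the registry is only one plausible way a factory might locate implementations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names throughout: this shows the pattern, not the gate.* API.
interface Resource {
    String getName();
}

interface LanguageResource extends Resource {}

// Implementation class; client code never names it, only the factory does.
final class SimpleDocument implements LanguageResource {
    private final String name;
    SimpleDocument(String name) { this.name = name; }
    @Override public String getName() { return name; }
}

final class ResourceFactory {
    // Maps a type key to a constructor for the hidden implementation.
    private static final Map<String, Function<String, Resource>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("document", SimpleDocument::new);
    }

    private ResourceFactory() {}

    // Clients ask for a resource by type key and receive it as an interface.
    static Resource create(String type, String name) {
        Function<String, Resource> constructor = REGISTRY.get(type);
        if (constructor == null) {
            throw new IllegalArgumentException("Unknown resource type: " + type);
        }
        return constructor.apply(name);
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        Resource doc = ResourceFactory.create("document", "example");
        System.out.println(doc.getName() + " is an LR: " + (doc instanceof LanguageResource));
    }
}
```

Because FactoryDemo depends only on the interfaces and the factory, the implementation class could be replaced without touching client code, which is the separation this section argues for.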
D.2
Exception Handling
When and how to use exceptions? Borrowing from Bill Venners, here are some guidelines (with examples):
1. Exceptions exist to refer problem conditions up the call stack to a level at which they may be dealt with. "If your method encounters an abnormal condition that it can't handle, it should throw an exception." If the method can handle the problem rationally, it should catch the exception and deal with it.
Example:
If the creation of a resource such as a document requires a URL as a parameter, the method that does the creation needs to construct the URL and read from it. If there is an exception during this process, the GATE method should abort by throwing its own exception. The exception will be dealt with higher up the food chain, e.g. by asking the user to input another URL, or by aborting the batch script.
2. All GATE exceptions should inherit from gate.util.GateException (a descendant of java.lang.Exception, hence a checked exception) or gate.util.GateRuntimeException (a descendant of java.lang.RuntimeException, hence an unchecked exception). This rule means that clients of GATE code can catch all sorts of exceptions thrown by the system with only two catch statements. (This rule may be broken by methods that are not public, so long as their callers catch the non-GATE exceptions and deal with them or convert them to GateException/GateRuntimeException.) Almost all exceptions thrown by GATE should be checked exceptions: the point of an exception is that clients of your code get to know about it, so use a checked exception to make the compiler force them to deal with it. The exception to this rule is covered by guideline 3.
Example:
With reference to the previous example, a problem using the URL will be signalled by something like an UnknownHostException or an IOException. These should be caught and re-thrown as descendants of GateException.
3. In a situation where an exceptional condition is an indication of a bug in the GATE library, or in the implementation of some other library, then it is permissible to throw an unchecked exception.
Example:
If a method is creating annotations on a document, and before creating the annotations it checks that their start and end points are valid ranges in relation to the content of the document (i.e. they fall within the offset space of the document, and the end is after the start), then if the method receives an InvalidOffsetException from the AnnotationSet.add call, something is seriously wrong. In such cases it may be best to throw a GateRuntimeException.
4. Where you are inheriting from a non-GATE class and therefore have the exception signatures fixed for you, you may add a new exception deriving from a non-GATE class.
Example:
The SAX XML parser API uses SAXException. Implementing a SAX parser for a document type involves overriding methods that throw this exception. Where you want to have a subtype for some problem which is specific to GATE processing, you could use GateSaxException which extends SAXException.
5. Test code is different: in the JUnit test cases it is fine just to declare that each method throws Exception and leave it at that. The JUnit test runner will pick up the exceptions and report them to you. Test methods should, however, try to ensure that the exceptions thrown are meaningful. For example, avoid null pointer exceptions in the test code itself, e.g. by using assertNotNull.
Example:
public void testComments() throws Exception {
  ResourceData docRd = (ResourceData) reg.get("gate.Document");
  // fail with a clear message rather than a NullPointerException below
  assertNotNull("testComments: couldn't find document res data", docRd);
  String comment = docRd.getComment();
  // assertTrue replaces the old JUnit assert(String, boolean), which
  // no longer compiles now that assert is a Java keyword
  assertTrue(
    "testComments: incorrect or missing COMMENT on document",
    comment != null && comment.equals("GATE document")
  );
} // testComments()
See also the testing notes.
6. "Throw a different exception type for each abnormal condition." You can go too far on this one (a hundred exception types per package would certainly be too much), but in general you should create a new exception type for each different sort of problem you encounter.
Example:
The gate.creole package has a ResourceInstantiationException, which deals with all problems to do with creating resources. We could have had "ResourceUrlProblem" and "ResourceParameterProblem" but that would probably have ended up with too many. On the other hand, just throwing everything as GateException is too coarse (Hamish take note!).
7. Put exceptions in the package that they're thrown from (unless they're used in many packages, in which case they can go in gate.util). This makes it easier to find them in the documentation and prevents name clashes.
Example:
gate.jape.ParserException is correctly placed; if it was in gate.util it might clash with, for example, gate.xml.ParserException if there was such.
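Guidelines 1, 2 and 6 can be sketched together in a few lines. This is an illustrative sketch only: AppException, AppRuntimeException and ResourceCreationException are hypothetical stand-ins for gate.util.GateException, gate.util.GateRuntimeException and a problem-specific subtype, and the failing Reader merely simulates an unreachable URL so that the example runs without network access.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-ins for the two framework base exceptions (guideline 2).
class AppException extends Exception {
    AppException(String message, Throwable cause) { super(message, cause); }
}

class AppRuntimeException extends RuntimeException {
    AppRuntimeException(String message) { super(message); }
}

// Guideline 6: a specific checked type for one sort of problem.
class ResourceCreationException extends AppException {
    ResourceCreationException(String message, Throwable cause) { super(message, cause); }
}

class ExceptionDemo {
    // Low-level IOExceptions are caught and re-thrown as a framework exception,
    // keeping the original exception as the cause.
    static String readAll(Reader in) throws ResourceCreationException {
        StringBuilder content = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                content.append((char) c);
            }
        } catch (IOException e) {
            throw new ResourceCreationException("could not read resource content", e);
        }
        return content.toString();
    }

    // A Reader that always fails, simulating a broken connection.
    static Reader failingReader() {
        return new Reader() {
            @Override public int read(char[] buf, int off, int len) throws IOException {
                throw new IOException("simulated network failure");
            }
            @Override public void close() {}
        };
    }

    public static void main(String[] args) {
        // Two catch clauses cover everything the framework may throw.
        try {
            System.out.println(readAll(new StringReader("GATE document")));
            readAll(failingReader());
        } catch (AppException e) {
            System.out.println("checked: " + e.getMessage());
        } catch (AppRuntimeException e) {
            System.out.println("unchecked: " + e.getMessage());
        }
    }
}
```

The caller in main decides what to do with the failure (here it only prints a message), which mirrors guideline 1: the method that cannot handle the problem refers it up the call stack.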
E.1
To use the GATE Ant tasks in your build file you must include the following <typedef> (where ${gate.home} is the location of your GATE installation):

<typedef resource="gate/util/ant/antlib.xml">
  <classpath>
    <pathelement location="${gate.home}/bin/gate.jar" />
    <fileset dir="${gate.home}/lib" includes="*.jar" />
  </classpath>
</typedef>

If you have problems with library conflicts you should be able to reduce the JAR files included from the lib directory to just jdom, xstream and jaxen.
E.2
E.2.1 Introduction
GATE saved application states (GAPP files) are an XML representation of the state of a GATE application. One of the features of a GAPP file is that it holds references to the external resource files used by the application as paths relative to the location of the GAPP file itself (or relative to the location of the GATE home directory where appropriate). This is useful in many cases but if you want to package up a copy of an application to send to a third party or to use in a web application, etc., then you need to be very careful to save the file in a directory above all its resources, and package the resources up with the GAPP file at the same relative paths. If the application refers to resources outside its own file tree (i.e. with relative paths that include ..) then you must either maintain this structure or manually edit the XML to move the resource references around and copy the files to the right places to match. This can be quite tedious and error-prone.
The packagegapp Ant task aims to automate this process. It extracts all the relative paths from a GAPP file, writes a modified version of the file with these paths rewritten to point to locations below the new GAPP file location (i.e. with no .. path segments) and copies the referenced files to their rewritten locations. The result is a directory structure that can be easily packaged into a zip file or similar and moved around as a self-contained unit.
This Ant task is the underlying driver for the 'Export for GATECloud.net' option described in Section 3.9.4. Export for GATECloud.net does the equivalent of:
<packagegapp src="sourceFile.gapp"
             destfile="{tempdir}/application.xgapp"
             copyPlugins="yes"
             copyResourceDirs="yes"
             onUnresolved="recover" />
followed by packaging the temporary directory into a zip file. These options are explained in detail below. The packagegapp task requires Ant 1.7 or later.
Note that the parent directory of the destfile (in this case package) must already exist. It will not be created automatically. The value for the gatehome attribute should be the path to your GATE installation (the directory containing build.xml, the bin, lib and plugins directories, etc.). If you know that the gapp file you want to package does not reference any resources relative to the GATE home directory1 then this attribute may be omitted.
1 You can check this by searching for the string $gatehome$ in the XML
2. For each plugin referred to by relative path, foo/bar/MyPlugin, rewrite the plugin location to be plugins/MyPlugin (relative to the location of the destfile). If the application refers to two plugins in different original locations with the same name, one of them will be renamed to avoid a name clash. If one plugin is a subdirectory of another plugin, this nesting will be maintained in the relocated directory structure.
3. For each resource file referred to by the gapp, see if it lives under the original location of one of the plugins moved in the previous step. If so, rewrite its location relative to the new location of the plugin.
4. If there are any relative resource paths that are not accounted for by the above rule (i.e. they do not live inside a referenced plugin), the build fails (see Section E.2.3 for how to change this behaviour).
5. Write out the modified GAPP to the destfile.
6. Recursively copy the whole content of each of the plugins from step 2 to their new locations2.
This means that all the relative paths in the new GAPP file (package/target.xgapp) will point to plugins/Something. You can now bundle up the whole package directory and take it elsewhere.
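The path rewriting in steps 2 to 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the packagegapp implementation: the class and method names are hypothetical, and the sketch ignores name clashes between plugins, nested plugins, and the actual copying of files.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; names are illustrative and not part of GATE.
class RelocateDemo {
    // Step 2 (simplified): a plugin at any relative location moves to
    // plugins/<last path segment>.
    static Path relocatePlugin(Path original) {
        return Paths.get("plugins").resolve(original.getFileName());
    }

    // Step 3: a resource under a moved plugin is rewritten relative to the
    // plugin's new location. Returns null when no plugin contains the
    // resource, which corresponds to the unresolved case in step 4.
    static Path relocateResource(Path resource, Map<Path, Path> movedPlugins) {
        for (Map.Entry<Path, Path> entry : movedPlugins.entrySet()) {
            Path oldLocation = entry.getKey();
            if (resource.startsWith(oldLocation)) {
                return entry.getValue().resolve(oldLocation.relativize(resource));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Path plugin = Paths.get("foo", "bar", "MyPlugin");
        Map<Path, Path> moved = new LinkedHashMap<>();
        moved.put(plugin, relocatePlugin(plugin));

        Path grammar = plugin.resolve(Paths.get("resources", "main.jape"));
        System.out.println(relocateResource(grammar, moved));
        System.out.println(relocateResource(Paths.get("elsewhere", "list.lst"), moved));
    }
}
```

Path.startsWith compares whole path segments rather than raw characters, so a sibling directory such as foo/bar/MyPlugin2 is not mistaken for content of foo/bar/MyPlugin.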
fail (default): the build fails if an unresolved relative path is found.
absolute: unresolved relative paths are left pointing to the same location as in the original file, but as an absolute rather than relative URL. The same file will be used even if you move the GAPP file to a different directory. This option is useful if the resource in question is visible at the same absolute location on the machine where you will be putting the packaged file (for example a very large dictionary or ontology held on a network share).
In this example, ~/my-app-v1/grammar/main.jape would be mapped to the location resources/my-app/grammar/main.jape (as always, relative to the output GAPP file). You can also hint that certain resources should be converted to absolute paths rather than being packaged with the application, using absolute="yes". The from and to values refer to directories: you cannot hint a single file, nor put two files from the same original directory into different directories in the packaged GAPP.
Explicit hints override the default plugin-based hints. For example given the hint from="${gate.home}/plugins/ANNIE/resources" to="resources/ANNIE", resources in the ANNIE plugin would be mapped into resources/ANNIE, but the plugin creole.xml itself would still be mapped into plugins/ANNIE.
As well as providing the hints inline in the build file you can also read them from a file in the normal Java Properties format3, using

<hint file="hints.properties" />

The keys in the property file are the from paths (in this case, relative paths are resolved against the project base directory, as with the location attribute of a property task) and the values are the to paths relative to the output file location.
The order of the <hint> elements is significant: if more than one hint could apply to the same resource file, the one defined first is used. For example, given the hints
<hint from="${gate.home}/plugins/ANNIE/resources/tokeniser" to="tokeniser" />
<hint from="${gate.home}/plugins/ANNIE/resources" to="annie" />
the resource plugins/ANNIE/resources/tokeniser/DefaultTokeniser.rules would be mapped into the tokeniser directory, but if the hints were reversed it would instead be mapped into annie/tokeniser. Note, however, that this does not necessarily extend to hints loaded from property files, as the order in which hints from a single property file are applied is not specified. Given

<hint file="file1.properties" />
<hint file="file2.properties" />

the relative precedence of two hints from file1 is not fixed, but it is the case that all hints in file1 will be applied before those in file2.
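The first-match rule for inline hints can be sketched in a few lines. This is only an illustration of the ordering semantics described above: the HintDemo and Hint names are hypothetical, and plain string prefixes stand in for the directory resolution that the real task performs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of first-match hint resolution; not the packagegapp code.
class HintDemo {
    // One mapping hint: resources under directory 'from' are placed under 'to'.
    static final class Hint {
        final String from;
        final String to;
        Hint(String from, String to) { this.from = from; this.to = to; }
    }

    // Applies the first hint whose 'from' directory contains the resource.
    // Returns null when no hint applies.
    static String map(List<Hint> hints, String resource) {
        for (Hint hint : hints) {
            String prefix = hint.from.endsWith("/") ? hint.from : hint.from + "/";
            if (resource.startsWith(prefix)) {
                return hint.to + "/" + resource.substring(prefix.length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Hint tokeniser = new Hint("plugins/ANNIE/resources/tokeniser", "tokeniser");
        Hint annie = new Hint("plugins/ANNIE/resources", "annie");
        String resource = "plugins/ANNIE/resources/tokeniser/DefaultTokeniser.rules";

        // prints "tokeniser/DefaultTokeniser.rules"
        System.out.println(map(Arrays.asList(tokeniser, annie), resource));
        // prints "annie/tokeniser/DefaultTokeniser.rules"
        System.out.println(map(Arrays.asList(annie, tokeniser), resource));
    }
}
```

Both hints match the resource, so only their order decides the outcome, which is why the more specific hint is normally listed first.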
3 The hint tag supports all the attributes of the standard Ant property tag so can load the hints from a file on disk or from a resource in a JAR file.
In this mode, the packager task will copy only the following files from each plugin:
creole.xml
any JAR files referenced from <JAR> elements in creole.xml4
In addition it will of course copy any files directly referenced by the GAPP, but not files referenced indirectly (the classic examples being .lst files used by a gazetteer .def, or the individual phases of a multiphase JAPE grammar) or files that are referenced by the creole.xml itself as AUTOINSTANCE parameters (e.g. the annotation schemas in ANNIE). You will need to name these extra files explicitly as extra resources (see the next section).
extraresourcespath.
The <extraresourcespath> allows you to specify specific extra files that should be included in the package:

<packagegapp src="original.xgapp" destfile="package/target.xgapp">
  <extraresourcespath>
    <pathelement location="${user.home}/common-files/README" />
    <fileset dir="${user.home}/my-app-v1" includes="grammar/*.jape" />
  </extraresourcespath>
</packagegapp>

As the name suggests, this is a path-like structure and supports all the usual elements and attributes of an Ant <path>, including multiple nested fileset, filelist, pathelement and other path elements. For specific types of indirect references, there are helper elements that can be included under extraresourcespath. Currently the only one of these is gazetteerlists, which takes the path to a gazetteer definition file and returns the set of .lst files the definition uses:

<path id="extra.files">
  ...
</path>
<packagegapp ...>
  <extraresourcespath refid="extra.files" />
</packagegapp>
Resources declared in the extraresourcespath and directories included using copyResourceDirs are treated exactly the same as resources that are referenced by the GAPP file: their target locations in the package are determined by the mapping hints, default plugin-based hints, and the onUnresolved setting as above. If you want to put extra resource files at specific locations in the package tree, independent of the mapping hints mechanism, you should do this with a separate <copy> task after the <packagegapp> task has done its work.
E.3
The expandcreoles task processes a number of creole.xml files from plugins, processes any @CreoleResource and @CreoleParameter annotations on the declared resource classes, and merges this configuration with the original XML configuration into a new copy of the creole.xml. It is not necessary to do this in the normal use of GATE, and this task is documented here simply for completeness. It is intended simply for use with non-GATE tools that can process the creole.xml file format to extract information about plugins (the prime use case for this is to generate the GATE plugins information page automatically from the plugin definitions).
The typical usage of this task (taken from the GATE build.xml) is:

<expandcreoles todir="build/plugins" gatehome="${basedir}">
  <fileset dir="plugins" includes="*/creole.xml" />
</expandcreoles>

This will initialise GATE with the given GATE_HOME directory, then read each file from the nested fileset, parse it as a creole.xml, expand it from any annotation configuration, and write it out to a file under build/plugins. Each output file will be generated at the same location relative to the todir as the original file was relative to the dir of its fileset.
This chapter describes the individual grammars used in GATE for Named Entity Recognition, and how they are combined together. It relates to the default NE grammar for ANNIE, but should also provide guidelines for those adapting or creating new grammars. For documentation about specific grammars other than this core set, use this document in combination with the comments in the relevant grammar files. Chapter 8 also provides information about designing new grammar rules and tips for ensuring maximum processing speed.
F.1
Main.jape
This file contains a list of the grammars to be used, in the correct processing order. The ordering of the grammars is crucial, because they are processed in series, and later grammars may depend on annotations produced by earlier grammars. The default grammar consists of the following phases:
first.jape
firstname.jape
name.jape
name_post.jape
date_pre.jape
date.jape
reldate.jape
number.jape
address.jape
url.jape
identifier.jape
jobtitle.jape
final.jape
unknown.jape
name_context.jape
org_context.jape
loc_context.jape
clean.jape
F.2
first.jape
This grammar must always be processed first. It can contain any general macros needed for the whole grammar set. This should consist of a macro defining how space and control characters are to be processed (and may consequently be different for each grammar set, depending on the text type). Because this is defined first of all, it is not necessary to restate this in later grammars. This has a big advantage: it means that default grammars can be used for specialised grammar sets, without having to be adapted to deal with e.g. different treatment of spaces and control characters. In this way, only the first.jape file needs to be changed for each grammar set, rather than every individual grammar.
The first.jape grammar also has a dummy rule in. This is never intended to fire; it is simply added because every grammar set must contain rules, but there are no specific rules we wish to add here. Even if the rule were to match the pattern defined, it is designed not to produce any output (due to the empty RHS).
F.3
firstname.jape
This grammar contains rules to identify first names and titles via the gazetteer lists. It adds a gender feature where appropriate from the gazetteer list. This gender feature is used later in order to improve co-reference between names and pronouns. The grammar creates separate annotations of type FirstPerson and Title.
F.4
name.jape
This grammar contains initial rules for organization, location and person entities. These rules all create temporary annotations, some of which will be discarded later, but the majority of which will be converted into final annotations in later grammars. Rules beginning with 'Not' are negative rules: this means that we detect something and give it a special annotation (or no annotation at all) in order to prevent it being recognised as a name. This is because we have no negative operator (we have '=' but not '!=').
F.4.1 Person
We first define macros for initials, first names, surnames, and endings. We then use these to recognise combinations of first names from the previous phase, and surnames from their POS tags or case information. Persons get marked with the annotation 'TempPerson'. We also percolate feature information about the gender from the previous annotations if known.
F.4.2 Location
The rules for Location are fairly straightforward, but we define them in this grammar so that any ambiguity can be resolved at the top level. Locations are often combined with other entity types, such as Organisations. This is dealt with by annotating the two entity types separately, and then combining them in a later phase. Locations are recognised mainly by gazetteer lookup, using not only lists of known places, but also key words such as mountain, lake, river, city etc. Locations are annotated as TempLocation in this phase.
F.4.3 Organization
Organizations tend to be defined either by straight lookup from the gazetteer lists, or, for the majority, by a combination of POS or case information and key words such as 'company', 'bank', 'Services', 'Ltd.' etc. Many organizations are also identified by contextual information in the later phase org_context.jape. In this phase, organizations are annotated as TempOrganization.
F.4.4 Ambiguities
Some ambiguities are resolved immediately in this grammar, while others are left until later phases. For example, a Christian name followed by a possible Location is resolved by default to a person rather than a Location (e.g. 'Ken London'). On the other hand, a Christian name followed by a possible organisation ending is resolved to an Organisation (e.g. 'Alexandra Pottery'), though this is a slightly less sure rule.
F.5
name_post.jape
This grammar runs after the name grammar to fix some erroneous annotations that may have been created. Of course, a more elegant solution would be not to create the problem in the first instance, but this is a workaround. For example, if the surname of a Person contains certain stop words, e.g. 'Mary And', then only the first name should be recognised as a Person. However, it might be that the firstname is also an Organization (and has been tagged with TempOrganization already), e.g. 'U.N.'. If this is the case, then the annotation is left untouched, because this is correct.
F.6
date_pre.jape
This grammar precedes the date phase, because it includes extra context to prevent dates being recognised erroneously in the middle of longer expressions. It mainly treats the case where an expression is already tagged as a Person, but could also be tagged as a date (e.g. 16th Jan).
F.7
date.jape
This grammar contains the base rules for recognising times and dates. Given the complexity of potential patterns representing such expressions, there are a large number of rules and macros. Although times and dates can be mutually ambiguous, we try to distinguish between them as early as possible. Dates, times and years are generally tagged separately (as TempDate, TempTime and TempYear respectively) and then recombined to form a final Date annotation in a later phase. This is because dates, times and years can be combined together in many different ways, and also because there can be much ambiguity between the three. For example, 1312 could be a time or a year, while 9-10 could be a span of time or date, or a fixed time or date.
F.8
reldate.jape
This grammar handles relative rather than absolute date and time sequences, such as 'yesterday morning', '2 hours ago', 'the first 9 months of the financial year' etc. It uses mainly explicit key words such as 'ago' and items from the gazetteer lists.
F.9
number.jape
This grammar covers rules concerning money and percentages. The rules are fairly straightforward, using keywords from the gazetteer lists, and there is little ambiguity here, except for example where 'Pound' can be money or weight, or where there is no explicit currency denominator.
F.10
address.jape
Rules for Address cover ip addresses, phone and fax numbers, and postal addresses. In general, these are not highly ambiguous, and can be covered with simple pattern matching, although phone numbers can require use of contextual information. Currently only UK formats are really handled, though handling of foreign zipcodes and phone number formats is envisaged in future. The annotations produced are of type Email, Phone etc. and are then replaced in a later phase with final Address annotations with 'phone' etc. as features.
F.11
url.jape
Rules for email addresses and Urls are in a separate grammar from the other address types, for the simple reason that SpaceTokens need to be identified for these rules to operate, whereas this is not necessary for the other Address types. For speed of processing, we place them in separate grammars so that SpaceTokens can be eliminated from the Input when they are not required.
F.12
identifier.jape
This grammar identifies 'Identifiers', which basically means any combination of numbers and letters acting as an ID, reference number etc. not recognised as any other entity type.
F.13
jobtitle.jape
This grammar simply identifies Jobtitles from the gazetteer lists, and adds a JobTitle annotation, which is used in later phases to aid recognition of other entity types such as Person and Organization. It may then be discarded in the Clean phase if not required as a final annotation type.
F.14
final.jape
This grammar uses the temporary annotations previously assigned in the earlier phases, and converts them into final annotations. The reason for this is that we need to be able to resolve ambiguities between different entity types, so we need to have all the different entity types handled in a single grammar somewhere. Ambiguities can be resolved using prioritisation techniques. Also, we may need to combine previously annotated elements, such as dates and times, into a single entity.
The rules in this grammar use Java code on the RHS to remove the existing temporary annotations, and replace them with new annotations. This is because we want to retain the features associated with the temporary annotations. For example, we might need to keep track of whether a person is male or female, or whether a location is a city or country. It also enables us to keep track of which rules have been used, for debugging purposes.
For the sake of obfuscation, although this phase is called final, it is not the final phase!
F.15
unknown.jape
This short grammar finds proper nouns not previously recognised, and gives them an Unknown annotation. This is then used by the namematcher: if an Unknown annotation can be matched with a previously categorised entity, its annotation is changed to that of the matched entity. Any remaining Unknown annotations are useful for debugging purposes, and can also be used as input for additional grammars or processing resources.
F.16
name_context.jape
This grammar looks for Unknown annotations occurring in certain contexts which indicate they might belong to Person. This is a typical example of a grammar that would benefit from learning or automatic context generation, because useful contexts are (a) hard to find manually and may require large volumes of training data, and (b) often very domain-specific. In this core grammar, we confine the use of contexts to fairly general uses, since this grammar should not be domain-dependent.
F.17
org_context.jape
This grammar operates on a similar principle to name_context.jape. It is slightly oriented towards business texts, so does not quite fulfil the generality criteria of the previous grammar. It does, however, provide some insight into more detailed use of contexts.
F.18
loc_context.jape
This grammar also operates in a similar manner to the preceding two, using general context such as coordinated pairs of locations, and hyponymic types of information.
F.19
clean.jape
This grammar comes last of all, and simply aims to clean up (remove) some of the temporary annotations that may not have been deleted along the way.
PDT - predeterminer: Determiner-like elements preceding an article or possessive pronoun; 'all/PDT his marbles', 'quite/PDT a mess'.
POS - possessive ending: Nouns ending in ''s' or '''.
PP - personal pronoun
PRPR$ - unknown, but probably possessive pronoun
PRP - unknown, but probably possessive pronoun
PRP$ - unknown, but probably possessive pronoun, such as 'my', 'your', 'his', 'its', 'one's', 'our', and 'their'.
RB - adverb: most words ending in '-ly'. Also 'quite', 'too', 'very', 'enough', 'indeed', 'not', '-n't', and 'never'.
RBR - adverb - comparative: adverbs ending with '-er' with a comparative meaning.
RBS - adverb - superlative
RP - particle: Mostly monosyllabic words that also double as directional adverbs.
STAART - start state marker (used internally)
SYM - symbol: technical symbols or expressions that aren't English words.
TO - literal to
UH - interjection: Such as 'my', 'oh', 'please', 'uh', 'well', 'yes'.
VBD - verb - past tense: includes conditional form of the verb 'to be'; 'If I were/VBD rich...'.
VBG - verb - gerund or present participle
VBN - verb - past participle
VBP - verb - non-3rd person singular present
VB - verb - base form: subsumes imperatives, infinitives and subjunctives.
VBZ - verb - 3rd person singular present
WDT - 'wh'-determiner
WP$ - possessive 'wh'-pronoun: includes 'whose'
WP - 'wh'-pronoun: includes 'what', 'who', and 'whom'.
WRB - 'wh'-adverb: includes 'how', 'where', 'why'. Includes 'when' when used in a temporal sense.
:: - literal colon
, - literal comma
$ - literal dollar sign
- - literal double-dash
" - literal double quotes
` - literal grave
( - literal left parenthesis
. - literal period
# - literal pound sign
) - literal right parenthesis
' - literal single quote or apostrophe
References
[Agatonovic et al. 08] M. Agatonovic, N. Aswani, K. Bontcheva, H. Cunningham, T. Heitz, Y. Li, I. Roberts, and V. Tablan. Large-scale, parallel automatic patent annotation. In Proceedings of the 1st ACM workshop on Patent information retrieval, PaIR '08, pages 1-8, New York, NY, USA, October 2008. ACM.
[Ao & Takagi 05] H. Ao and T. Takagi. ALICE: an algorithm to extract abbreviations from MEDLINE. J Am Med Inform Assoc, 12(5):576-586, 2005.
[Aronson & Lang 10] A. R. Aronson and F.-M. Lang. An overview of MetaMap: historical perspective and recent advances. Journal of the American Medical Informatics Association (JAMIA), 17:229-236, 2010.
[Aswani & Gaizauskas 09] N. Aswani and R. Gaizauskas. Evolving a General Framework for Text Alignment: Case Studies with Two South Asian Languages. In Proceedings of the International Conference on Machine Translation: Twenty-Five Years On, Cranfield, Bedfordshire, UK, November 2009.
[Aswani & Gaizauskas 10] N. Aswani and R. Gaizauskas. Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages. In 7th Language Resources and Evaluation Conference (LREC), La Valletta, Malta, May 2010. ELRA.
[Aswani et al. 05] N. Aswani, V. Tablan, K. Bontcheva, and H. Cunningham. Indexing and Querying Linguistic Metadata and Document Content. In Proceedings of Fifth International Conference on Recent Advances in Natural Language Processing (RANLP2005), Borovets, Bulgaria, 2005.
[Aswani et al. 06] N. Aswani, K. Bontcheva, and H. Cunningham. Mining information for instance unification. In 5th International Semantic Web Conference (ISWC2006), Athens, Georgia, USA, 2006.
[Azar 89] S. Azar. Understanding and Using English Grammar. Prentice Hall Regents, 1989.
[Baker et al. 02] P. Baker, A. Hardie, T. McEnery, H. Cunningham, and R. Gaizauskas. EMILLE, A 67-Million Word Corpus of Indic Languages: Data Collection, Mark-up and Harmonisation. In Proceedings of 3rd Language Resources and Evaluation Conference (LREC'2002), pages 819-825, 2002.
[Bird & Liberman 99] S. Bird and M. Liberman. A Formal Framework for Linguistic Annotation. Technical Report MS-CIS-99-01, Department of Computer and Information Science, University of Pennsylvania, 1999. http://xxx.lanl.gov/abs/cs.CL/9903003.
[Bontcheva & Sabou 06] K. Bontcheva and M. Sabou. Learning Ontologies from Software Artifacts: Exploring and Combining Multiple Sources. In Workshop on Semantic Web Enabled Software Engineering (SWESE), Athens, G.A., USA, November 2006.
[Bontcheva 04] K. Bontcheva. Open-source Tools for Creation, Maintenance, and Storage of Lexical Resources for Language Generation from Ontologies. In Proceedings of 4th Language Resources and Evaluation Conference (LREC'04), 2004.
[Bontcheva 05] K. Bontcheva. Generating Tailored Textual Summaries from Ontologies. In Second European Semantic Web Conference (ESWC'2005), 2005.
[Bontcheva et al. 00] K. Bontcheva, H. Brugman, A. Russel, P. Wittenburg, and H. Cunningham. An Experiment in Unifying Audio-Visual and Textual Infrastructures for Language Processing R&D. In Proceedings of the Workshop on Using Toolsets and Architectures To Build NLP Systems at COLING-2000, Luxembourg, 2000. http://gate.ac.uk/.
[Bontcheva et al. 02a] K. Bontcheva, H. Cunningham, V. Tablan, D. Maynard, and O. Hamza. Using GATE as an Environment for Teaching NLP. In Proceedings of the ACL Workshop on Effective Tools and Methodologies in Teaching NLP, 2002. http://gate.ac.uk/sale/acl02/gate4teaching.pdf.
[Bontcheva et al. 02b] K. Bontcheva, H. Cunningham, V. Tablan, D. Maynard, and H. Saggion. Developing Reusable and Robust Language Processing Components for Information Systems using GATE. In Proceedings of the 3rd International Workshop on Natural Language and Information Systems (NLIS'2002), Aix-en-Provence, France, 2002. IEEE Computer Society Press. http://gate.ac.uk/sale/nlis/nlis.ps.
[Bontcheva et al. 02c] K. Bontcheva, M. Dimitrov, D. Maynard, V. Tablan, and H. Cunningham. Shallow Methods for Named Entity Coreference Resolution. In Chaînes de références et résolveurs d'anaphores, workshop TALN 2002, Nancy, France, 2002. http://gate.ac.uk/sale/taln02/taln-ws-coref.pdf.
[Bontcheva et al. 03] K. Bontcheva, A. Kiryakov, H. Cunningham, B. Popov, and M. Dimitrov. Semantic web enabled, open source language technology. In EACL workshop on Language Technology and the Semantic Web: NLP and XML, Budapest, Hungary, 2003. http://gate.ac.uk/sale/eacl03-semweb/bontcheva-etal-final.pdf.
[Bontcheva et al. 04] K. Bontcheva, V. Tablan, D. Maynard, and H. Cunningham. Evolving GATE to Meet New Challenges in Language Engineering. Natural Language Engineering, 10(3/4):349-373, 2004.
[Bontcheva et al. 06a] K. Bontcheva, H. Cunningham, A. Kiryakov, and V. Tablan. Semantic Annotation and Human Language Technology. In J. Davies, R. Studer, and P. Warren, editors, Semantic Web Technology: Trends and Research. John Wiley and Sons, 2006.
[Bontcheva et al. 06b] K. Bontcheva, J. Davies, A. Duke, T. Glover, N. Kings, and I. Thurlow. Semantic Information Access. In J. Davies, R. Studer, and P. Warren, editors, Semantic Web Technologies. John Wiley and Sons, 2006.
[Bontcheva et al. 09] K. Bontcheva, B. Davis, A. Funk, Y. Li, and T. Wang. Human Language Technologies. In J. Davies, M. Grobelnik, and D. Mladenic, editors, Semantic Knowledge Management, pages 37-49. 2009.
[Bontcheva et al. 10] K. Bontcheva, H. Cunningham, I. Roberts, and V. Tablan. Web-based collaborative corpus annotation: Requirements and a framework implementation. In Proceedings of the Workshop on New Challenges for NLP Frameworks, pages 20-27, Valletta, Malta, May 2010.
[Booch 94] G. Booch. Object-Oriented Analysis and Design 2nd Edn. Benjamin/Cummings, 1994.
[Bosma & Vossen 10] W. Bosma and P. Vossen. Bootstrapping language-neutral term extraction. In 7th Language Resources and Evaluation Conference (LREC), Valletta, Malta, 2010.
[Brugman et al. 99] H. Brugman, K. Bontcheva, P. Wittenburg, and H. Cunningham. Integrating Multimedia and Textual Software Architectures for Language Technology. Technical report MPI-TG99-1, Max-Planck Institute for Psycholinguistics, Nijmegen, Netherlands, 1999.
[Caporaso et al. 07] J. G. Caporaso, W. A. Baumgartner Jr., D. A. Randolph, K. B. Cohen, and L. Hunter. MutationFinder: A high-performance system for extracting point mutation mentions from text. Bioinformatics, 23(14):1862-1865, 2007.
[Carletta 96] J. Carletta. Assessing agreement on classification tasks: the Kappa statistic. Computational Linguistics, 22(2):249-254, 1996.
[CC001] LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[Chinchor 92] N. Chinchor. MUC-4 Evaluation Metrics. In Proceedings of the Fourth Message Understanding Conference, pages 22-29, 1992.
[Cimiano et al. 03] P. Cimiano, S. Staab, and J. Tane. Automatic Acquisition of Taxonomies from Text: FCA meets NLP. In Proceedings of the ECML/PKDD Workshop on Adaptive Text Extraction and Mining, pages 10-17, Cavtat-Dubrovnik, Croatia, 2003.
[Cobuild 99] C. Cobuild, editor. English Grammar. Harper Collins, 1999.
[Cunningham & Bontcheva 05] H. Cunningham and K. Bontcheva. Computational Language Systems, Architectures. Encyclopedia of Language and Linguistics, 2nd Edition, pages 733-752, 2005.
[Cunningham & Scott 04a] H. Cunningham and D. Scott. Introduction to the Special Issue on Software Architecture for Language Engineering. Natural Language Engineering, 2004. http://gate.ac.uk/sale/jnle-sale/intro/intro-main.pdf.
[Cunningham & Scott 04b] H. Cunningham and D. Scott, editors. Special Issue of Natural Language Engineering on Software Architecture for Language Engineering. Cambridge University Press, 2004.
[Cunningham 94] H. Cunningham. Support Software for Language Engineering Research. Technical Report 94/05, Centre for Computational Linguistics, UMIST, Manchester, 1994.
[Cunningham 99a] H. Cunningham. A Definition and Short History of Language Engineering. Journal of Natural Language Engineering, 5(1):1-16, 1999.
[Cunningham 99b] H. Cunningham. JAPE: a Java Annotation Patterns Engine. Research Memorandum CS-99-06, Department of Computer Science, University of Sheffield, May 1999.
[Cunningham 00] H. Cunningham. Software Architecture for Language Engineering. Unpublished PhD thesis, University of Sheffield, 2000. http://gate.ac.uk/sale/thesis/.
[Cunningham 02] H. Cunningham. GATE, a General Architecture for Text Engineering. Computers and the Humanities, 36:223-254, 2002.
[Cunningham 05] H. Cunningham. Information Extraction, Automatic. Encyclopedia of Language and Linguistics, 2nd Edition, pages 665-677, 2005.
[Cunningham et al. 94] H. Cunningham, M. Freeman, and W. Black. Software Reuse, Object-Oriented Frameworks and Natural Language Processing. In New Methods in Language Processing (NeMLaP-1), September 1994, pages 357-367, Manchester, 1994. UCL Press.
[Cunningham et al. 95] H. Cunningham, R. Gaizauskas, and Y. Wilks. A General Architecture for Text Engineering (GATE) - a new approach to Language Engineering R&D. Technical Report CS-95-21, Department of Computer Science, University of Sheffield, 1995. http://xxx.lanl.gov/abs/cs.CL/9601009.
[Cunningham et al. 96a] H. Cunningham, K. Humphreys, R. Gaizauskas, and M. Stower. CREOLE Developer's Manual. Technical report, Department of Computer Science, University of Sheffield, 1996. http://www.dcs.shef.ac.uk/nlp/gate.
[Cunningham et al. 96b] H. Cunningham, K. Humphreys, R. Gaizauskas, and Y. Wilks. TIPSTER-Compatible Projects at Sheffield. In Advances in Text Processing, TIPSTER Program Phase II. DARPA, Morgan Kaufmann, California, 1996.
[Cunningham et al. 96c] H. Cunningham, Y. Wilks, and R. Gaizauskas. GATE - a General Architecture for Text Engineering. In Proceedings of the 16th Conference on Computational Linguistics (COLING-96), Copenhagen, August 1996. ftp://ftp.dcs.shef.ac.uk/home/hamish/auto_papers/Cun96b.ps.
[Cunningham et al. 96d] H. Cunningham, Y. Wilks, and R. Gaizauskas. Software Infrastructure for Language Engineering. In Proceedings of the AISB Workshop on Language Engineering for Document Analysis and Recognition, Brighton, U.K., April 1996.
[Cunningham et al. 96e] H. Cunningham, Y. Wilks, and R. Gaizauskas. New Methods, Current Trends and Software Infrastructure for NLP. In Proceedings of the Conference on New Methods in Natural Language Processing (NeMLaP-2), Bilkent University, Turkey, September 1996. ftp://ftp.dcs.shef.ac.uk/home/hamish/auto_papers/Cun96c.ps.
[Cunningham et al. 97a] H. Cunningham, K. Humphreys, R. Gaizauskas, and Y. Wilks. GATE - a TIPSTER-based General Architecture for Text Engineering. In Proceedings of the TIPSTER Text Program (Phase III) 6 Month Workshop. DARPA, Morgan Kaufmann, California, May 1997. ftp://ftp.dcs.shef.ac.uk/home/hamish/auto_papers/Cun97e.ps.
[Cunningham et al. 97b] H. Cunningham, K. Humphreys, R. Gaizauskas, and Y. Wilks. Software Infrastructure for Natural Language Processing. In Proceedings of the 5th Conference on Applied Natural Language Processing (ANLP-97), March 1997. ftp://ftp.dcs.shef.ac.uk/home/hamish/auto_papers/Cun97a.ps.gz.
[Cunningham et al. 98a] H. Cunningham, W. Peters, C. McCauley, K. Bontcheva, and Y. Wilks. A Level Playing Field for Language Resource Evaluation. In Workshop on Distributing and Accessing Lexical Resources at Conference on Language Resources Evaluation, Granada, Spain, 1998. http://www.dcs.shef.ac.uk/~hamish/dalr.
[Cunningham et al. 98b] H. Cunningham, M. Stevenson, and Y. Wilks. Implementing a Sense Tagger within a General Architecture for Language Engineering. In Proceedings of the Third Conference on New Methods in Language Engineering (NeMLaP-3), pages 59-72, Sydney, Australia, 1998.
[Cunningham et al. 99] H. Cunningham, R. Gaizauskas, K. Humphreys, and Y. Wilks. Experience with a Language Engineering Architecture: Three Years of GATE. In Proceedings of the AISB'99 Workshop on Reference Architectures and Data Standards for NLP, Edinburgh, April 1999. The Society for the Study of Artificial Intelligence and Simulation of Behaviour. http://www.dcs.shef.ac.uk/~hamish/GateAisb99.html.
[Cunningham et al. 00a] H. Cunningham, K. Bontcheva, W. Peters, and Y. Wilks. Uniform language resource access and distribution in the context of a General Architecture for Text Engineering (GATE). In Proceedings of the Workshop on Ontologies and Language Resources (OntoLex'2000), Sozopol, Bulgaria, September 2000. http://gate.ac.uk/sale/ontolex/ontolex.ps.
[Cunningham et al. 00b] H. Cunningham, K. Bontcheva, V. Tablan, and Y. Wilks. Software Infrastructure for Language Resources: a Taxonomy of Previous Work and a Requirements Analysis. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC-2), pages 815-824, Athens, 2000.
[Cunningham et al. 00c] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, and Y. Wilks. Experience of using GATE for NLP R&D. In Proceedings of the Workshop on Using Toolsets and Architectures To Build NLP Systems at COLING-2000, Luxembourg, 2000. http://gate.ac.uk/.
[Cunningham et al. 00d] H. Cunningham, D. Maynard, and V. Tablan. JAPE: a Java Annotation Patterns Engine (Second Edition). Research Memorandum CS-00-10, Department of Computer Science, University of Sheffield, November 2000.
[Cunningham et al. 02] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. Gate: an architecture for development of robust hlt applications. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 168-175, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
[Cunningham et al. 03] H. Cunningham, V. Tablan, K. Bontcheva, and M. Dimitrov. Language Engineering Tools for Collaborative Corpus Annotation. In Proceedings of Corpus Linguistics 2003, Lancaster, UK, 2003. http://gate.ac.uk/sale/cl03/distrib-ollie-cl03.doc.
[Damljanovic & Bontcheva 08] D. Damljanovic and K. Bontcheva. Enhanced Semantic Access to Software Artefacts. In Workshop on Semantic Web Enabled Software Engineering (SWESE), Karlsruhe, Germany, October 2008.
[Damljanovic 10] D. Damljanovic. Towards Portable Controlled Natural Languages for Querying Ontologies. In M. Rosner and N. Fuchs, editors, Second Workshop on Controlled Natural Languages, volume 622 of CEUR Workshop Pre-Proceedings ISSN 1613-0073. http://ceur-ws.org, Marettimo Island, Italy, September 2010.
[Damljanovic et al. 08] D. Damljanovic, V. Tablan, and K. Bontcheva. A Text-based Query Interface to OWL Ontologies. In 6th Language Resources and Evaluation Conference (LREC), Marrakech, Morocco, May 2008. ELRA.
[Damljanovic et al. 09] D. Damljanovic, F. Amardeilh, and K. Bontcheva. CA Manager Framework: Creating Customised Workflows for Ontology Population and Semantic Annotation. In Proceedings of The Fifth International Conference on Knowledge Capture (KCAP'09), California, USA, September 2009.
[Davies & Fleiss 82] M. Davies and J. Fleiss. Measuring Agreement for Multinomial Data. Biometrics, 38:1047-1051, 1982.
[Davis et al. 06] B. Davis, S. Handschuh, H. Cunningham, and V. Tablan. Further use of Controlled Natural Language for Semantic Annotation of Wikis. In Proceedings of the 1st Semantic Authoring and Annotation Workshop at ISWC2006, Athens, Georgia, USA, November 2006.
[Day et al. 97] D. Day, J. Aberdeen, L. Hirschman, R. Kozierok, P. Robinson, and M. Vilain. Mixed-Initiative Development of Language Processing Systems. In Proceedings of the 5th Conference on Applied Natural Language Processing (ANLP-97), 1997.
[Della Valle et al. 08] E. Della Valle, D. Cerizza, I. Celino, A. Turati, H. Lausen, N. Steinmetz, M. Erdmann, and A. Funk. Realizing Service-Finder: Web service discovery at web scale. In European Semantic Technology Conference (ESTC), Vienna, September 2008.
[Dimitrov 02a] M. Dimitrov. A Light-weight Approach to Coreference Resolution for Named Entities in Text. MSc Thesis, University of Sofia, Bulgaria, 2002. http://www.ontotext.com/ie/thesis-m.pdf.
[Dimitrov 02b] M. Dimitrov. A Light-weight Approach to Coreference Resolution for Named Entities in Text. MSc Thesis, University of Sofia, Bulgaria, 2002. http://www.ontotext.com/ie/thesis-m.pdf.
[Dimitrov et al. 02] M. Dimitrov, K. Bontcheva, H. Cunningham, and D. Maynard. A Light-weight Approach to Coreference Resolution for Named Entities in Text. In Proceedings of the Fourth Discourse Anaphora and Anaphor Resolution Colloquium (DAARC), Lisbon, 2002.
[Dimitrov et al. 04] M. Dimitrov, K. Bontcheva, H. Cunningham, and D. Maynard. A Light-weight Approach to Coreference Resolution for Named Entities in Text. In A. Branco, T. McEnery, and R. Mitkov, editors, Anaphora Processing: Linguistic, Cognitive and Computational Modelling. John Benjamins, 2004.
[Dowman et al. 05a] M. Dowman, V. Tablan, H. Cunningham, and B. Popov. Content augmentation for mixed-mode news broadcasts. In Proceedings of the 3rd European Conference on Interactive Television: User Centred ITV Systems, Programmes and Applications, Aalborg University, Denmark, 2005. http://gate.ac.uk/sale/euro-itv-2005/content-augmentation-for-mixed-mode-news-broadcast-c
[Dowman et al. 05b] M. Dowman, V. Tablan, H. Cunningham, and B. Popov. Web-assisted annotation, semantic indexing and search of television and radio news. In Proceedings of the 14th International World Wide Web Conference, Chiba, Japan, 2005.
[Dowman et al. 05c] M. Dowman, V. Tablan, H. Cunningham, C. Ursu, and B. Popov. Semantically enhanced television news through web and video integration. In Second European Semantic Web Conference (ESWC'2005), 2005.
[DUC 01] NIST. Proceedings of the Document Understanding Conference, September 13 2001.
[Eugenio & Glass 04] B. D. Eugenio and M. Glass. The kappa statistic: a second look. Computational Linguistics, 1(30), 2004. (squib).
[Fleiss 75] J. L. Fleiss. Measuring agreement between two judges on the presence or absence of a trait. Biometrics, 31:651-659, 1975.
[Frakes & Baeza-Yates 92] W. Frakes and R. Baeza-Yates, editors. Information retrieval, data structures and algorithms. Prentice Hall, New York, Englewood Cliffs, N.J., 1992.
[Funk et al. 07a] A. Funk, D. Maynard, H. Saggion, and K. Bontcheva. Ontological integration of information extracted from multiple sources. In Multi-source Multilingual Information Extraction and Summarization (MMIES) workshop at Recent Advances in Natural Language Processing (RANLP07), pages 9-15, Borovets, Bulgaria, September 2007.
[Funk et al. 07b] A. Funk, V. Tablan, K. Bontcheva, H. Cunningham, B. Davis, and S. Handschuh. CLOnE: Controlled Language for Ontology Editing. In Proceedings of the 6th International Semantic Web Conference (ISWC 2007), Busan, Korea, November 2007.
[Gaizauskas et al. 95] R. Gaizauskas, T. Wakao, K. Humphreys, H. Cunningham, and Y. Wilks. Description of the LaSIE system as used for MUC-6. In Proceedings of the Sixth Message Understanding Conference (MUC-6), pages 207-220. Morgan Kaufmann, California, 1995.
[Gaizauskas et al. 96a] R. Gaizauskas, P. Rodgers, H. Cunningham, and K. Humphreys. GATE User Guide. http://www.dcs.shef.ac.uk/nlp/gate, 1996.
[Gaizauskas et al. 96b] R. Gaizauskas, H. Cunningham, Y. Wilks, P. Rodgers, and K. Humphreys. GATE - an Environment to Support Research and Development in Natural Language Engineering. In Proceedings of the 8th IEEE International Conference on Tools with Artificial Intelligence (ICTAI-96), Toulouse, France, October 1996. ftp://ftp.dcs.shef.ac.uk/home/robertg/ictai96.ps.
[Gaizauskas et al. 03] R. Gaizauskas, M. A. Greenwood, M. Hepple, I. Roberts, H. Saggion, and M. Sargaison. The University of Sheffield's TREC 2003 Q&A Experiments. In Proceedings of the 12th Text REtrieval Conference, 2003.
[Gaizauskas et al. 04] R. Gaizauskas, M. A. Greenwood, M. Hepple, I. Roberts, H. Saggion, and M. Sargaison. The University of Sheffield's TREC 2004 Q&A Experiments. In Proceedings of the 13th Text REtrieval Conference, 2004.
[Gaizauskas et al. 05] R. Gaizauskas, M. A. Greenwood, M. Hepple, H. Harkema, H. Saggion, and A. Sanka. The University of Sheffield's TREC 2005 Q&A Experiments. In Proceedings of the 14th Text REtrieval Conference, 2005.
[Gambäck & Olsson 00] B. Gambäck and F. Olsson. Experiences of Language Engineering Algorithm Reuse. In Second International Conference on Language Resources and Evaluation (LREC), pages 155-160, Athens, Greece, 2000.
[Gazdar & Mellish 89] G. Gazdar and C. Mellish. Natural Language Processing in Prolog. Addison-Wesley, Reading, MA, 1989.
[Gooch 12] P. Gooch. Badrex: In situ expansion and coreference of biomedical abbreviations using dynamic regular expressions. Technical report, City University London, 2012.
[Greenwood et al. 02] M. A. Greenwood, I. Roberts, and R. Gaizauskas. The University of Sheffield's TREC 2002 Q&A Experiments. In Proceedings of the 11th Text REtrieval Conference, 2002.
[Grishman 97] R. Grishman. TIPSTER Architecture Design Document Version 2.3. Technical report, DARPA, 1997. http://www.itl.nist.gov/div894/894.02/related_projects/tipster/.
[Hepple 00] M. Hepple. Independence and commitment: Assumptions for rapid training and execution of rule-based POS taggers. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000), Hong Kong, October 2000.
[Hripcsak & Heitjan 02] G. Hripcsak and D. Heitjan. Measuring agreement in medical informatics reliability studies. Journal of Biomedical Informatics, 35:99-110, 2002.
[Hripcsak & Rothschild 05] G. Hripcsak and A. S. Rothschild. Agreement, the F-measure, and Reliability in Information Retrieval. Journal of the American Medical Informatics Association, 12(3):296-298, 2005.
[Humphreys et al. 96] K. Humphreys, R. Gaizauskas, H. Cunningham, and S. Azzam. CREOLE Module Specifications. http://www.dcs.shef.ac.uk/nlp/gate/, 1996.
[Humphreys et al. 98] K. Humphreys, R. Gaizauskas, S. Azzam, C. Huyck, B. Mitchell, H. Cunningham, and Y. Wilks. Description of the LaSIE system as used for MUC-7. In Proceedings of the Seventh Message Understanding Conference (MUC-7). http://www.itl.nist.gov/iaui/894.02/related_projects/muc/index.html, 1998.
[Humphreys et al. 99] K. Humphreys, R. Gaizauskas, M. Hepple, and M. Sanderson. The University of Sheffield TREC-8 Q&A System. In Proceedings of the 8th Text REtrieval Conference, 1999.
[Ide et al. 00] N. Ide, P. Bonhomme, and L. Romary. XCES: An XML-based Standard for Linguistic Corpora. In Proceedings of the Second International Language Resources and Evaluation Conference (LREC), pages 825-830, Athens, Greece, 2000.
[Jackson 75] M. Jackson. Principles of Program Design. Academic Press, London, 1975.
[Jin et al. 06] Y. Jin, R. T. McDonald, K. Lerman, M. A. Mandel, S. Carroll, M. Y. Liberman, F. C. Pereira, R. S. Winters, and P. S. White. Automated recognition of malignancy mentions in biomedical literature. BMC Bioinformatics, 7:492-499, 2006.
[Kiryakov 03] A. Kiryakov. Ontology and Reasoning in MUMIS: Towards the Semantic Web. Technical Report CS-03-03, Department of Computer Science, University of Sheffield, 2003. http://gate.ac.uk/gate/doc/papers.html.
[Kohlschütter et al. 10] C. Kohlschütter, P. Fankhauser, and W. Nejdl. Boilerplate Detection using Shallow Text Features. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, 2010.
[Laclavik & Maynard 09] M. Laclavik and D. Maynard. Motivating intelligent email in business: an investigation into current trends for email processing and communication research. In Proceedings of Workshop on Emails in e-Commerce and Enterprise Context, 11th IEEE Conference on Commerce and Enterprise Computing, Vienna, Austria, 2009.
[Lal & Rüger 02] P. Lal and S. Rüger. Extract-based summarization with simplification. In Proceedings of the ACL 2002 Automatic Summarization / DUC 2002 Workshop, 2002. http://www.doc.ic.ac.uk/~srueger/pr-p.lal-2002/duc02-final.pdf.
[Lal 02] P. Lal. Text summarisation. Unpublished M.Sc. thesis, Imperial College, London, 2002.
[Li & Bontcheva 08] Y. Li and K. Bontcheva. Adapting support vector machines for f-term-based classification of patents. ACM Transactions on Asian Language Information Processing, 7(2):7:1-7:19, 2008.
[Li & Cunningham 08] Y. Li and H. Cunningham. Geometric and Quantum Methods for Information Retrieval. SIGIR Forum, 42(2):22-32, 2008.
[Li & Shawe-Taylor 03] Y. Li and J. Shawe-Taylor. The SVM with Uneven Margins and Chinese Document Categorization. In Proceedings of The 17th Pacific Asia Conference on Language, Information and Computation (PACLIC17), Singapore, Oct. 2003.
[Li & Shawe-Taylor 06] Y. Li and J. Shawe-Taylor. Using KCCA for Japanese-English Cross-language Information Retrieval and Document Classification. Journal of Intelligent Information Systems, 27(2):117-133, 2006.
[Li & Shawe-Taylor 07] Y. Li and J. Shawe-Taylor. Advanced Learning Algorithms for Cross-language Patent Retrieval and Classification. Information Processing and Management, 43(5):1183-1199, 2007.
[Li et al. 02] Y. Li, H. Zaragoza, R. Herbrich, J. Shawe-Taylor, and J. Kandola. The Perceptron Algorithm with Uneven Margins. In Proceedings of the 9th International Conference on Machine Learning (ICML-2002), pages 379-386, 2002.
[Li et al. 04] Y. Li, K. Bontcheva, and H. Cunningham. An SVM Based Learning Algorithm for Information Extraction. Machine Learning Workshop, Sheffield, 2004. http://gate.ac.uk/sale/ml-ws04/mlw2004.pdf.
[Li et al. 05a] Y. Li, K. Bontcheva, and H. Cunningham. SVM Based Learning System For Information Extraction. In M. N. J. Winkler and N. Lawrence, editors, Deterministic and Statistical Methods in Machine Learning, LNAI 3635, pages 319-339. Springer Verlag, 2005.
[Li et al. 05b] Y. Li, K. Bontcheva, and H. Cunningham. Using Uneven Margins SVM and Perceptron for Information Extraction. In Proceedings of Ninth Conference on Computational Natural Language Learning (CoNLL-2005), 2005.
[Li et al. 05c] Y. Li, C. Miao, K. Bontcheva, and H. Cunningham. Perceptron Learning for Chinese Word Segmentation. In Proceedings of Fourth SIGHAN Workshop on Chinese Language processing (Sighan-05), pages 154-157, Korea, 2005.
[Li et al. 07a] Y. Li, K. Bontcheva, and H. Cunningham. Hierarchical, Perceptron-like Learning for Ontology Based Information Extraction. In 16th International World Wide Web Conference (WWW2007), pages 777-786, May 2007.
[Li et al. 07b] Y. Li, K. Bontcheva, and H. Cunningham. Cost Sensitive Evaluation Measures for F-term Patent Classification. In The First International Workshop on Evaluating Information Access (EVIA 2007), pages 44-53, May 2007.
[Li et al. 07c] Y. Li, K. Bontcheva, and H. Cunningham. Experiments of opinion analysis on the corpora MPQA and NTCIR-6. In Proceedings of the Sixth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, pages 323-329, May 2007.
[Li et al. 07d] Y. Li, K. Bontcheva, and H. Cunningham. SVM Based Learning System for F-term Patent Classification. In Proceedings of the Sixth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, pages 396-402, May 2007.
[Li et al. 09] Y. Li, K. Bontcheva, and H. Cunningham. Adapting SVM for Data Sparseness and Imbalance: A Case Study on Information Extraction. Natural Language Engineering, 15(2):241-271, 2009.
[Lombard et al. 02] M. Lombard, J. Snyder-Duch, and C. C. Bracken. Content analysis in mass communication: Assessment and reporting of intercoder reliability. Human Communication Research, 28:587-604, 2002.
[LREC-1 98] Conference on Language Resources Evaluation (LREC-1), Granada, Spain, 1998.
[LREC-2 00] Second Conference on Language Resources Evaluation (LREC-2), Athens, 2000.
[Maeda & Strassel 04] K. Maeda and S. Strassel. Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium. In Proceedings of 4th Language Resources and Evaluation Conference (LREC'2004), 2004.
[Manning & Schütze 99] C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT press, Cambridge, MA, 1999. Supporting materials available at http://www.sultry.arts.usyd.edu.au/fsnlp/.
[Manov et al. 03] D. Manov, A. Kiryakov, B. Popov, K. Bontcheva, and D. Maynard. Experiments with geographic knowledge for information extraction. In Workshop on Analysis of Geographic References, HLT/NAACL'03, Edmonton, Canada, 2003. http://gate.ac.uk/sale/hlt03/paper03.pdf.
[Marsh & Perzanowski 98] E. Marsh and D. Perzanowski. Muc-7 evaluation of ie technology: Overview of results. In Proceedings of the Seventh Message Understanding Conference (MUC-7). http://www.itl.nist.gov/iaui/894.02/related_projects/muc/index.html, 1998.
[Maynard 05] D. Maynard. Benchmarking ontology-based annotation tools for the semantic web. In UK e-Science Programme All Hands Meeting (AHM2005) Workshop on Text Mining, e-Research and Grid-enabled Language Technology, Nottingham, UK, 2005.
[Maynard 08] D. Maynard. Benchmarking textual annotation tools for the semantic web. In Proc. of 6th International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco, 2008.
[Maynard et al. 00] D. Maynard, H. Cunningham, K. Bontcheva, R. Catizone, G. Demetriou, R. Gaizauskas, O. Hamza, M. Hepple, P. Herring, B. Mitchell, M. Oakes, W. Peters, A. Setzer, M. Stevenson, V. Tablan, C. Ursu, and Y. Wilks. A Survey of Uses of GATE. Technical Report CS-00-06, Department of Computer Science, University of Sheffield, 2000.
[Maynard et al. 01] D. Maynard, V. Tablan, C. Ursu, H. Cunningham, and Y. Wilks. Named Entity Recognition from Diverse Text Types. In Recent Advances in Natural Language Processing 2001 Conference, pages 257-274, Tzigov Chark, Bulgaria, 2001.
[Maynard et al. 02a] D. Maynard, K. Bontcheva, H. Saggion, H. Cunningham, and O. Hamza. Using a Text Engineering Framework to Build an Extendable and Portable IE-based Summarisation System. In Proceedings of the ACL Workshop on Text Summarisation, pages 19-26, Philadelphia, Pennsylvania, 2002. ACM.
[Maynard et al. 02b] D. Maynard, H. Cunningham, K. Bontcheva, and M. Dimitrov. Adapting a robust multigenre NE system for automatic content extraction. In Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, Applications (AIMSA'02), Varna, Bulgaria, Sep 2002.
[Maynard et al. 02c] D. Maynard, H. Cunningham, K. Bontcheva, and M. Dimitrov. Adapting A Robust Multi-Genre NE System for Automatic Content Extraction. In Proceedings of the Tenth International Conference on Artificial Intelligence: Methodology, Systems, Applications (AIMSA 2002), 2002.
[Maynard et al. 02d] D. Maynard, H. Cunningham, and R. Gaizauskas. Named entity recognition at sheffield university. In H. Holmboe, editor, Nordic Language Technology Årbog for Nordisk Sprogtechnologisk Forskningsprogram 2002-2004, pages 141-145. Museum Tusculanums Forlag, 2002.
[Maynard et al. 02e] D. Maynard, V. Tablan, H. Cunningham, C. Ursu, H. Saggion, K. Bontcheva, and Y. Wilks. Architectural Elements of Language Engineering Robustness. Journal of Natural Language Engineering Special Issue on Robust Methods in Analysis of Natural Language Data, 8(2/3):257-274, 2002.
[Maynard et al. 03a] D. Maynard, K. Bontcheva, and H. Cunningham. From information extraction to content extraction. Submitted to EACL'2003, 2003.
[Maynard et al. 03b] D. Maynard, K. Bontcheva, and H. Cunningham. Towards a semantic extraction of named entities. In G. Angelova, K. Bontcheva, R. Mitkov, N. Nicolov, and N. Nikolov, editors, Proceedings of Recent Advances in Natural Language Processing (RANLP'03), pages 255-261, Borovets, Bulgaria, Sep 2003. http://gate.ac.uk/sale/ranlp03/ranlp03.pdf.
[Maynard et al. 03c] D. Maynard, K. Bontcheva, and H. Cunningham. Towards a semantic extraction of Named Entities. In Recent Advances in Natural Language Processing, Bulgaria, 2003.
[Maynard et al. 03d] D. Maynard, V. Tablan, K. Bontcheva, and H. Cunningham. Rapid customisation of an Information Extraction system for surprise languages. Special issue of ACM Transactions on Asian Language Information Processing: Rapid Development of Language Capabilities: The Surprise Languages, 2:295-300, 2003.
[Maynard et al. 03e] D. Maynard, V. Tablan, and H. Cunningham. NE recognition without training data on a language you don't speak. In ACL Workshop on Multilingual and Mixed-language Named Entity Recognition: Combining Statistical and Symbolic Models, Sapporo, Japan, 2003.
[Maynard et al. 04a] D. Maynard, K. Bontcheva, and H. Cunningham. Automatic Language-Independent Induction of Gazetteer Lists. In Proceedings of 4th Language Resources and Evaluation Conference (LREC'04), Lisbon, Portugal, 2004. ELRA.
[Maynard et al. 04b] D. Maynard, H. Cunningham, A. Kourakis, and A. Kokossis. Ontology-Based Information Extraction in h-TechSight. In First European Semantic Web Symposium (ESWS 2004), Heraklion, Crete, 2004.
[Maynard et al. 04c] D. Maynard, M. Yankova, N. Aswani, and H. Cunningham. Automatic Creation and Monitoring of Semantic Metadata in a Dynamic Knowledge Portal. In Proceedings of the 11th International Conference on Artificial Intelligence: Methodology, Systems, Applications (AIMSA 2004), Varna, Bulgaria, 2004.
[Maynard et al. 06] D. Maynard, W. Peters, and Y. Li. Metrics for evaluation of ontology-based information extraction. In WWW 2006 Workshop on Evaluation of Ontologies for the Web (EON), Edinburgh, Scotland, 2006.
[Maynard et al. 07a] D. Maynard, W. Peters, M. d'Aquin, and M. Sabou. Change management for metadata evolution. In ESWC International Workshop on Ontology Dynamics (IWOD), Innsbruck, Austria, June 2007.
[Maynard et al. 07b] D. Maynard, H. Saggion, M. Yankova, K. Bontcheva, and W. Peters. Natural Language Technology for Information Integration in Business Intelligence. In 10th International Conference on Business Information Systems (BIS-07), Poznan, Poland, 25-27 April 2007.
[Maynard et al. 08a] D. Maynard, W. Peters, and Y. Li. Evaluating evaluation metrics for ontology-based applications: Infinite reflection. In Proc. of 6th International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco, 2008.
[Maynard et al. 08b] D. Maynard, Y. Li, and W. Peters. NLP Techniques for Term Extraction and Ontology Population. In P. Buitelaar and P. Cimiano, editors, Bridging the Gap between Text and Knowledge - Selected Contributions to Ontology Learning and Population from Text. IOS Press, 2008.
[Maynard et al. 09] D. Maynard, A. Funk, and W. Peters. SPRAT: a tool for automatic semantic pattern-based ontology population. In International Conference for Digital Libraries and the Semantic Web, Trento, Italy, September 2009.
[McDonald & Pereira 05] R. McDonald and F. Pereira. Identifying Gene and Protein Mentions in Text Using Conditional Random Fields. BMC Bioinformatics, 6(Suppl 1):S6, 2005.
[McDonald et al. 04] R. T. McDonald, R. S. Winters, M. Mandel, Y. Jin, P. S. White, and F. Pereira. An entity tagger for recognizing acquired genomic variations in cancer literature. Bioinformatics, 20(17):3249-3251, 2004.
[McEnery et al. 00] A. McEnery, P. Baker, R. Gaizauskas, and H. Cunningham. EMILLE: Building a Corpus of South Asian Languages. Vivek, A Quarterly in Artificial Intelligence, 13(3):23–32, 2000.
[Osenova & Simov 04] P. Osenova and K. Simov. BulTreeBank stylebook. Technical Report BTB-TR05, BulTreeBank Project, May 2004.
[Pastra et al. 02] K. Pastra, D. Maynard, H. Cunningham, O. Hamza, and Y. Wilks. How feasible is the reuse of grammars for named entity recognition? In Proceedings of the 3rd Language Resources and Evaluation Conference, 2002. http://gate.ac.uk/sale/lrec2002/reusability.ps.
[Peters et al. 98] W. Peters, H. Cunningham, C. McCauley, K. Bontcheva, and Y. Wilks. Uniform Language Resource Access and Distribution. In Workshop on Distributing and Accessing Lexical Resources at Conference on Language Resources Evaluation, Granada, Spain, 1998.
[Polajnar et al. 05] T. Polajnar, V. Tablan, and H. Cunningham. User-friendly ontology authoring using a controlled language. Technical Report CS Report No. CS-05-10, University of Sheffield, Sheffield, UK, 2005.
[Porter 80] M. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.
[Ramshaw & Marcus 95] L. Ramshaw and M. Marcus. Text Chunking Using Transformation-Based Learning. In Proceedings of the Third ACL Workshop on Very Large Corpora, 1995.
[Saggion & Funk 09] H. Saggion and A. Funk. Extracting opinions and facts for business intelligence. RNTI Journal, E(17):119–146, November 2009.
[Saggion & Gaizauskas 04a] H. Saggion and R. Gaizauskas. Mining on-line sources for definition knowledge. In Proceedings of the 17th FLAIRS 2004, Miami Beach, Florida, USA, May 17-19 2004. AAAI.
[Saggion & Gaizauskas 04b] H. Saggion and R. Gaizauskas. Multi-document summarization by cluster/profile relevance and redundancy removal. In Proceedings of the Document Understanding Conference 2004. NIST, 2004.
[Saggion & Gaizauskas 05] H. Saggion and R. Gaizauskas. Experiments on statistical and pattern-based biographical summarization. In Proceedings of EPIA 2005, pages 611–621, 2005.
[Saggion 04] H. Saggion. Identifying definitions in text collections for question answering. In Proceedings of Language Resources and Evaluation Conference. ELDA, 2004.
[Saggion 06] H. Saggion. Multilingual Multidocument Summarization Tools and Evaluation. In Proceedings of LREC 2006, 2006.
[Saggion 07] H. Saggion. Shef: Semantic tagging and summarization techniques applied to cross-document coreference. In Proceedings of SemEval 2007, Association for Computational Linguistics, pages 292–295, June 2007.
[Saggion et al. 02a] H. Saggion, H. Cunningham, K. Bontcheva, D. Maynard, C. Ursu, O. Hamza, and Y. Wilks. Access to Multimedia Information through Multisource and Multilanguage Information Extraction. In Proceedings of the 7th Workshop on Applications of Natural Language to Information Systems (NLDB 2002), Stockholm, Sweden, 2002.
[Saggion et al. 02b] H. Saggion, H. Cunningham, D. Maynard, K. Bontcheva, O. Hamza, C. Ursu, and Y. Wilks. Extracting Information for Information Indexing of Multimedia Material. In Proceedings of 3rd Language Resources and Evaluation Conference (LREC'2002), 2002. http://gate.ac.uk/sale/lrec2002/mumis_lrec2002.ps.
[Saggion et al. 03a] H. Saggion, K. Bontcheva, and H. Cunningham. Robust Generic and Query-based Summarisation. In Proceedings of the European Chapter of Computational Linguistics (EACL), Research Notes and Demos, 2003.
[Saggion et al. 03b] H. Saggion, H. Cunningham, K. Bontcheva, D. Maynard, O. Hamza, and Y. Wilks. Multimedia Indexing through Multisource and Multilingual Information Extraction; the MUMIS project. Data and Knowledge Engineering, 48:247–264, 2003.
[Saggion et al. 03c] H. Saggion, J. Kuper, H. Cunningham, T. Declerck, P. Wittenburg, M. Puts, F. DeJong, and Y. Wilks. Event-coreference across Multiple, Multi-lingual Sources in the Mumis Project. In Proceedings of the European Chapter of Computational Linguistics (EACL), Research Notes and Demos, 2003.
[Saggion et al. 07] H. Saggion, A. Funk, D. Maynard, and K. Bontcheva. Ontology-based information extraction for business applications. In Proceedings of the 6th International Semantic Web Conference (ISWC 2007), Busan, Korea, November 2007.
[Schwartz & Hearst 03] A. S. Schwartz and M. A. Hearst. A simple algorithm for identifying abbreviation definitions in biomedical text. In Pacific Symposium on Biocomputing, pages 451–462, 2003.
[Scott & Gaizauskas. 00] S. Scott and R. Gaizauskas. The University of Sheffield TREC-9 Q&A System. In Proceedings of the 9th Text REtrieval Conference, 2000.
[Settles 05] B. Settles. ABNER: An open source tool for automatically tagging genes, proteins, and other entity names in text. Bioinformatics, 21(14):3191–3192, 2005.
[Shaw & Garlan 96] M. Shaw and D. Garlan. Software Architecture. Prentice Hall, New York, 1996.
[Simov & Osenova 03] K. Simov and P. Osenova. Practical annotation scheme for an HPSG treebank of Bulgarian. In Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC-2003), Budapest, Hungary, 2003.
[Simov et al. 02] K. Simov, G. Popova, and P. Osenova. HPSG-based syntactic treebank of Bulgarian (BulTreeBank). In A. Wilson, P. Rayson, and T. McEnery, editors, A Rainbow of Corpora: Corpus Linguistics and the Languages of the World, pages 135–142. Lincom-Europa, Munich, 2002.
[Simov et al. 04a] K. Simov, P. Osenova, A. Simov, and M. Kouylekov. Design and implementation of the Bulgarian HPSG-based treebank. Journal of Research on Language and Computation, 2(4):495–522, December 2004.
[Simov et al. 04b] K. Simov, P. Osenova, and M. Slavcheva. BulTreeBank morphosyntactic tagset. Technical Report BTB-TR03, BulTreeBank Project, March 2004.
[Stevenson et al. 98] M. Stevenson, H. Cunningham, and Y. Wilks. Sense tagging and language engineering. In Proceedings of the 13th European Conference on Artificial Intelligence (ECAI-98), pages 185–189, Brighton, U.K., 1998.
[Tablan et al. 02] V. Tablan, C. Ursu, K. Bontcheva, H. Cunningham, D. Maynard, O. Hamza, T. McEnery, P. Baker, and M. Leisher. A Unicode-based Environment for Creation and Use of Language Resources. In 3rd Language Resources and Evaluation Conference, Las Palmas, Canary Islands, Spain, 2002. ELRA. http://gate.ac.uk/sale/iesl03/iesl03.pdf.
[Tablan et al. 03] V. Tablan, K. Bontcheva, D. Maynard, and H. Cunningham. Ollie: on-line learning for information extraction. In SEALTS '03: Proceedings of the HLT-NAACL 2003 workshop on Software engineering and architecture of language technology systems, volume 8, pages 17–24, Morristown, NJ, USA, 2003. Association for Computational Linguistics. http://gate.ac.uk/sale/hlt03/ollie-sealts.pdf.
[Tablan et al. 06a] V. Tablan, W. Peters, D. Maynard, H. Cunningham, and K. Bontcheva. Creating tools for morphological analysis of Sumerian. In 5th Language Resources and Evaluation Conference (LREC), Genoa, Italy, May 2006. ELRA.
[Tablan et al. 06b] V. Tablan, T. Polajnar, H. Cunningham, and K. Bontcheva. User-friendly Ontology Authoring Using a Controlled Language. In 5th Language Resources and Evaluation Conference (LREC), Genoa, Italy, May 2006. ELRA.
[Tablan et al. 08] V. Tablan, D. Damljanovic, and K. Bontcheva. A Natural Language Query Interface to Structured Information. In Proceedings of the 5th European Semantic Web Conference (ESWC 2008), volume 5021 of Lecture Notes in Computer Science, pages 361–375, Tenerife, Spain, June 2008. Springer-Verlag New York Inc.
[Tanabe & Wilbur 02] L. Tanabe and W. J. Wilbur. Tagging Gene and Protein Names in Full Text Articles. In Proceedings of the ACL-02 workshop on Natural Language Processing in the biomedical domain - Volume 3, pages 9–13. Association for Computational Linguistics, 2002.
[Tsuruoka et al. 05] Y. Tsuruoka, Y. Tateishi, J.-D. Kim, T. Ohta, J. McNaught, S. Ananiadou, and J. Tsujii. Developing a robust part-of-speech tagger for biomedical text. In P. Bozanis and E. Houstis, editors, Advances in Informatics, volume 3746 of Lecture Notes in Computer Science, pages 382–392. Springer Berlin Heidelberg, 2005.
[Ursu et al. 05] C. Ursu, T. Tablan, H. Cunningham, and B. Popav. Digital media preservation and access through semantically enhanced web-annotation. In Proceedings of the 2nd European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies (EWIMT 2005), London, UK, December 01 2005.
[van Rijsbergen 79] C. van Rijsbergen. Information Retrieval. Butterworths, London, 1979.
[Wang et al. 05] T. Wang, D. Maynard, W. Peters, K. Bontcheva, and H. Cunningham. Extracting a domain ontology from linguistic resource based on relatedness measurements. In Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2005), pages 345–351, Compiegne, France, September 2005.
[Wang et al. 06] T. Wang, Y. Li, K. Bontcheva, H. Cunningham, and J. Wang. Automatic Extraction of Hierarchical Relations from Text. In Proceedings of the Third European Semantic Web Conference (ESWC 2006), Budva, Montenegro, 2006.
[Witten & Frank 99] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 1999.
[Wood et al. 03] M. M. Wood, S. J. Lydon, V. Tablan, D. Maynard, and H. Cunningham. Using parallel texts to improve recall in IE. In Recent Advances in Natural Language Processing, Bulgaria, 2003.
[Wood et al. 04] M. Wood, S. Lydon, V. Tablan, D. Maynard, and H. Cunningham. Populating a Database from Parallel Texts using Ontology-based Information Extraction. In Proceedings of NLDB 2004, 2004. http://gate.ac.uk/sale/nldb2004/NLDB.pdf.
[Yourdon 89] E. Yourdon. Modern Structured Analysis. Prentice Hall, New York, 1989.
[Yourdon 96] E. Yourdon. The Rise and Resurrection of the American Programmer. Prentice Hall, New York, 1996.
Colophon
Formal semantics (henceforth FS), at least as it relates to computational language understanding, is in one way rather like connectionism, though without the crucial prop Sejnowski's work (1986) is widely believed to give to the latter: both are old doctrines returned, like the Bourbons, having learned nothing and forgotten nothing. But FS has nothing to show as a showpiece of success after all the intellectual groaning and effort.
On Keeping Logic in its Place
(in Theoretical Issues in Natural Language Processing, ed. Wilks), Yorick Wilks, 1989 (p.130).
We used LaTeX to produce this document, along with TeX4HT for the HTML production. Thank you Don Knuth, Leslie Lamport and Eitan Gurari.