ACL 2015 Handbook
InterContinental
Beijing Beichen
Handbook assembled by Xianpei Han, Kang Liu and Zhuoyu Wei
Cover designed by XinXing Deng
Contents
Table of Contents i
In Memoriam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1 Conference Information 19
Message from the General Chair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Message from the Program Committee Co-Chairs . . . . . . . . . . . . . . . . . . . . . . 21
Organizing Committee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Program Committee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Meal Info . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Social Event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Local Organizer
The Chinese Information Processing Society of China
(CIPS) was officially founded in June 1981, with the
objective to unite the domestic and overseas researchers
in the field of Chinese information processing.
Currently, CIPS has 15 professional committees, 5
working committees, 61 group members, and more than
3,000 individual members. The Journal of Chinese
Information Processing is the official journal of CIPS.
Website: http://www.cipsc.org.cn/
Google Inc. http://research.google.com
Google's mission is to organize the world's information and make it universally accessible and
useful. Perhaps as remarkable as two Stanford research students having the ambition to found a
company with such a lofty objective is the progress the company has made to that end. Ten
years ago, Larry Page and Sergey Brin applied their research to an interesting problem and
invented the world's most popular search engine. The same spirit holds true at Google today.
The mission of research at Google is to deliver cutting-edge innovation that improves Google
products and enriches the lives of all who use them. We publish innovation through industry
standards, and our researchers are often helping to define not just today's products but also
tomorrow's.
Noah's Ark Lab of Huawei Technologies
The Noah's Ark Lab is a research lab of Huawei Technologies, located in Hong Kong and
Shenzhen.
The mission of the lab is to make significant contributions to both the company and society by
innovating in data mining, artificial intelligence, and related fields. Mainly driven by long-term,
high-impact projects, research in the lab also tries to advance the state of the art in these fields as
well as to harness the products and services of the company at each stage of the innovation process.
As a world-class research lab, we are pushing the frontier of research and development in all areas
that we work in. We dare to address both the challenges and opportunities in this era of big data,
to revolutionize the ways in which people work and live, and the ways in which companies do
business, through the intelligentization of all processes, with the slogan 'from big data to deep
knowledge'.
Research areas of the lab mainly include machine learning, data mining, speech and language
processing, intelligent systems, information and knowledge management, and human-computer
interaction.
Founded in 2012, the lab has now grown to be a research organization with many significant
achievements in both academia and industry. We welcome talented researchers and engineers to
join us to realize their dreams.
In Memoriam
The Association for Computational Linguistics mourns the passing of Adam Kilgarriff, a remarkable
computational linguist and long-term ACL member.
A long time ago now (maybe 1988?), Gerald (Gazdar) and I supervised Adam's DPhil at the
University of Sussex. Adam was my age, give or take a year, having come to academia a little late,
and was my first doctoral student. Adam's topic was polysemy, and I'm not really sure that much
supervision was actually required, though I recall fun exchanges trying to model the subtleties of
word meaning using symbolic knowledge representation techniques - an experience that was clearly
enough to convince Adam later that this was a bad idea. In fact, Adam's thesis title itself was
Polysemy. Much as we encourage concision in thesis titles, pulling off the one-word title is a tall
order, requiring a unique combination of focus and coverage, breadth and depth, and most of all,
authority. Adam completely nailed it, at least from the perspective of the pre-empirical
Computational Linguistics of the early nineties.
Three years later, after a spell working for dictionary publishers, Adam joined me as a research
fellow, now at the University of Brighton. I had a project to explore the automatic enrichment of
lexical databases to support the latest trends in language analysis, and in particular, task-specific
lexical resources. I was really pleased and excited to recruit Adam - he had lost none of his intellectual
independence, a quality I particularly valued. Within a few weeks he came to me with his own plan
for the research - a detour, as he put it, from the original workplan. I still have the email, dated
April 6 1995, in which he proposed that, instead of chasing a prescriptive notion of a single lexical
resource that needed to be customised to a domain, we should let the domain determine the lexicon,
providing lexicographic tools to explore words, and particularly word senses, that were significant
for that domain. In that email, Computational Lexicography at Brighton was born.
Over the next eight years or so, Computational Lexicography became a key part of our group's
success, increasingly under Adam's direct leadership. The key project, WASPS, developed the
WASPbench - the direct precursor to the Sketch Engine, recruiting David (Tugwell) to the team. In addition,
Adam was one of the founding organisers of SENSEVAL, an initiative to bring international teams of
researchers together to work in friendly competition on a pre-determined word sense disambiguation
task (and which has now transformed into SEMEVAL). Together we secured funding to support the
first two rounds of SENSEVAL. Each round required the preparation of standardised datasets, guided
by Adam's highly tuned intuitions about lexical data preparation and management. And we engaged
somewhat in the European funding merry-go-round, most fondly in the CONCEDE project, working
on dictionaries for Central European languages with amazing teams from the MULTEXT-EAST
consortium, and with Georgian and German colleagues in the GREG project.
But Adam was not entirely comfortable in academia, or at least not in a version of academia which
didn't share his drive for the practical as well as the theoretical. He didn't have tenure, nor any clear
route to achieve tenure, which meant that he could not apply for and hold grants in his own right
(although he freely admitted he was happy not to have the associated administrative responsibilities);
he set up a high-quality master's programme in Computational Lexicography, which ran for a couple
of years, but the funding model didn't really work, and it quickly evolved into the highly successful,
but independent, Lexicom workshop series; and he couldn't engage university support for developing
the WASPbench as a commercial product. So in 2003, he spread his wings, left the university, and
set up Lexical Computing Ltd.
For many people, Lexical Computing and the Sketch Engine are what Adam is best known for. He
spent eleven years tirelessly developing the company, the software, the methodology, the resources,
the discipline. It was an environment in which he seemed completely at ease, sometimes the
shameless promoter of his wares, sometimes the astute academic authority, often the source of practical
solutions to real problems, and the instigator of new initiatives, and always the generous facilitator,
educator and friend. For me personally, though, this was a time when our friendship was more
prominent than our professional relationship. We would meet for the odd drink, usually in the Constant
Service pub (Adam's favourite), and chat about life, family, sometimes work, and occasional schemes
for new collaborations, though the company didn't leave him very much time for that. It was one of
those relaxed, undemanding friendships that just picks up whenever and wherever we find the time to
meet, but remains strong nevertheless.
Adam's illness was as unexpected to him as to anyone. Over the summer of 2014, he was making
plans for new directions and projects. And then, a brief hiatus in communication before we heard the
news in early November. And yet, already, he seemed reconciled - not resigned, but resolved, calm,
dignified. I was upset, angry, helpless - hopeless really, and feeling very selfish in my distress. I saw
Adam three times after he became ill and they are all good, strong memories, and that is more to his
credit than mine.
The first was in his kitchen, with early spring sunshine, drinking strong coffee he had made very
meticulously, watching the winter birds scavenging in the garden, just chatting about nothing in
particular, and gossiping about work for a couple of hours. The second was a surprise trip to the pub
- the surprise being that Adam was strong enough to get there (and back) on his own, and drink a
couple of pints too. We went to the Constant Service, as always, and it was one of our occasional
NLP group outings, so a good crowd was there. The third was back in his kitchen, this time for
work a few weeks later. Ironically, the university system that struggled to engage with Adam's
practical drive is now fully signed up to demonstrating the Impact of its research. Adam's work
on Computational Lexicography, at Brighton and afterwards through Lexical Computing, featured
as an Impact Case Study in recent national evaluations, and has subsequently been selected for a
wider national initiative showcasing Computer Science research. Adam was happy to cooperate with
this, in part to alleviate boredom, and we arranged a Skype call with a technical author from his
kitchen. Adam was on excellent form describing his work, his passion, and still full of ideas for
gentle academic engagement if his retirement would allow it.
Shortly after that meeting we heard the news of Adam's relapse and his decision not to continue
treatment. Like everyone else, I followed the blog, and also emailed a little privately. I arranged to
go and visit again, but Adam wasn't well enough, so we cancelled. Like everyone else, I waited for
the inevitable blog post.
Adam's funeral was in a modest church in the village of Rottingdean, just along the coast from
Brighton. A beautiful setting and a sunny afternoon. The church was absolutely packed - standing
room only - we estimate about 250 people: family, friends and colleagues from far and wide. As
Adam was a committed atheist, the service focused on fond memories of him from those closest to him,
with just one hymn - Immortal, Invisible, as all his blog readers will understand. A beautiful and fitting
farewell to a man who, it seems, was to everyone a friend first, and a colleague, boss, or antagonist,
second.
There have been many comments on Adam's blog, on Twitter, and in academic forums, which say much
more and so much better than I can. Some have said that Adam will be remembered for the Sketch
Engine and the amazing data resources that have been built up around it. I would say that his real
legacy is much more deeply intellectual than that. Adam would probably smile with satisfaction that
the two things can co-exist so comfortably - a rare combination of the intellectual and practitioner, a
real giant of the field.
In Memoriam
The Association for Computational Linguistics (ACL) mourns the passing of Jane J. Robinson, for-
mer president of the ACL.
Jane Robinson, a pioneering computational linguist, made major contributions to machine transla-
tion, natural language, and speech systems research programs at the RAND Corporation, IBM, and
in the AI Center at SRI International. She served as ACL president in 1982.
Jane became a computational linguist accidentally. She had a Ph.D. in history from UCLA, but
could not obtain a faculty position in that field because those were reserved for men. Instead, she
took positions teaching English, first at UCLA and then at California State College, Los Angeles.
While at LA State, where she was tasked with teaching engineers how to write, Jane noticed an
announcement for a talk on Chomsky's transformational grammar. She went to the talk thinking
this work on grammar might help her teach better. Although its subject matter did not match her
expectations, the talk marked a turning point in her career.
In the late 1950s, Jane became a consultant to the RAND Corporation group working on ma-
chine translation under Dave Hays (ACL president, 1964). From the beginning, Jane was concerned
with identifying connections between different traditions in formal grammars and their correspond-
ing detailed linguistic realizations. Her 1965 International Conference on Computational Linguistics
(COLING) paper, "Endocentric Constructions and the Cocke Parsing Logic" [Robinson1965], is a
beautiful example of connecting specific linguistic phenomena to parsing strategies in a way that
preserves the nature of the linguistic phenomena, endocentric constructions. While at RAND, Jane
became colleague and friend to many in the machine translation and emerging computational
linguistics world, including Susumu Kuno (ACL president, 1967), Martin Kay (ACL president, 1969), Joyce
Friedman (ACL president, 1971), and Karen Sparck Jones (ACL president, 1994).
In the late 1960s Jane moved to the Automata Theory and Computability Group at the IBM
Thomas J. Watson Research Center, Yorktown Heights, NY. She used her knowledge of formal
work on grammars and parsing to draw correspondences between Dependency Grammars and Phrase
Structure Grammars. Although Jane came from the Dependency Grammar tradition, her balanced,
careful analysis of tradeoffs enabled others to bridge the approaches. Her 1967 COLING paper,
"Methods for Obtaining Corresponding Phrase Structure and Dependency Grammars" [Robinson1967],
is a wonderful example of her understanding of the seminal issues underlying these different systems.
She subsequently published the classic paper connecting dependency structure and transformational
rules, "Dependency Structures and Transformational Rules" [Robinson1970a]. This paper
exemplifies Jane's scholarship and her deftness in dealing with the formal and computational issues of
language processing in a very fair and informative manner. Her 1970 paper, "Case, category and
configuration" [Robinson1970b], demonstrated in a very convincing way the possibility of formally
interpreting Fillmore's case grammar in terms of dependencies, in a more economical fashion and
without any loss of information.
In 1973, Don Walker (ACL secretary-treasurer, 1976-1993) recruited Jane to the speech group in
the AI Center (AIC) at SRI International. Jane remained a key member of the AIC's natural language
group until she retired in the mid-1980s. She made major contributions to a wide range of research,
ranging from grammars for speech understanding systems and dialogues to such discourse issues as
codes and clues in contexts. For several of the NLP systems SRI developed in the 1970s and 1980s,
the grammar was the coordinating point for all knowledge about the language, so Jane interacted
with everyone developing any component of the system, from the architecture through semantics and
discourse. Speech processing was a bit "deaf" in those days, and Jane frequently remarked that she
had to write not only a grammar for English, but also its dual (one for non-English to rule out bad
parsings). During her time at SRI, Jane wrote some of the most comprehensive grammars for NLP
systems.
One of us (Barbara Grosz) notes that Jane's contributions to the AIC's natural language group
went far beyond her official grammar-writing responsibilities. She served as mentor (before that
word was widely used in academia) for a large group of "young Turks" as she referred to those of us
in the younger cohort involved in building NLP systems; she was our in-house expert in linguistics;
provided critiques of drafts of papers, making them shorter, clearer and more scholarly; debugged
our ideas across the full spectrum of system components; and introduced us to the most senior people
in linguistics and computational linguistics.
Another of us (Aravind Joshi, ACL president, 1975) recalls meeting Jane in September 1975 at
an NLP workshop at the University of Michigan. At that time, he and his students were working
on two separate areas of CL. One of them concerned the minimal formal machinery needed for
representing structural aspects of language and the other one dealt with some aspects of cooperation
in NL interfaces to databases. After just a brief discussion with Jane, when she asked what he was
doing, it became clear to him, "that I was in the presence of someone who had already worked on
such diverse areas. From thereon, whenever I had an opportunity to meet with Jane, I took advantage
of her deep understanding."
And the third of us (Eva Hajicova, ACL president, 1998) recalls the important ties Jane formed with
Praguian linguists interested in formal grammar starting in 1965, when the founder of the Prague
group, Petr Sgall, first visited the United States and met Jane at RAND. Their common research
interests in computational linguistics, particularly Dependency Grammar, forged deep personal
relationships between Jane and Sgall's linguistics group in Prague. Jane visited Prague twice, once
before and once after the change of the communist regime. During difficult political times, Jane
provided the Prague group with linguistics literature published in the West, and she introduced them
to her colleagues and students, yielding additional important connections, which have continued to
this day. Eva notes that, "only those who have the same historical experience as we in Prague have
had can appreciate fully how important such activities were for our research and for our students."
Jane read broadly and her training as a historian made her a careful and deep scholar. Throughout
her life she would go to talks that seemed a bit far afield and come back with new ideas. For those
who worked with her she was an invaluable source of out-of-the-box thinking as well as the go-to
person for what to read in linguistics. Jane often said that had she been born in a later generation, she
would have become an astronomer. Given her love of exploration, she might have been an astronaut.
(You can see this bent in the poem she wrote for her poetry-class friend's funeral, "Time To Go"
[Robinson2008]). Lucky for all of us, she wound up in Computational Linguistics.
Jane was the mother of four children and the proud grandmother of two grandsons, one now a
lawyer, the other an actor. She extended her family to encompass her colleagues, building
camaraderie through dinners at her home (lamb stew, mostly) and picnics at Foothills Park in Palo Alto,
activities which drew the families of AIC researchers together and yielded many lifelong friendships.
Jane built such friendships throughout her career. In responding to our questions about Jane's time at
RAND, Dave Hays's wife Rita noted that Jane remained close friends with her and Dave even after
Dave went to SUNY Buffalo and Jane to IBM Yorktown Heights and then SRI. Rita and Jane were
traveling and hiking companions into Jane's 90s.
To celebrate her 60th birthday, Jane "got in shape" to hike in the Himalayas around Annapurna.
She came back with beautiful photos and the desire to join the young Turks who went backpacking in
Yosemite. (Although she had been a regular visitor to Yosemite since her 40s, she had not backpacked
before.) She offered to drive everyone in the huge Chrysler Imperial she had gotten to feel safe driving
in New York. Jane backpacked into her late 70s, then switched to the luxury of the High Sierra Camp
tent cabins and later to a small cabin with an inside shower. For those lucky enough to visit Yosemite
with her, she was as much a guide to the mountains as she had been a guide to linguistics.
For people who worked with Jane at SRI, she was a towering figure in the field, a wonderful
colleague who imparted deep wisdom as well as linguistic facts, and a dear friend. She was senior
to most of the members of the AIC. Looking back on her arrival, Peter Hart (a director of the AIC)
noted that Jane's presence changed the tone of the early SRI AI Center. She brought not only a keen
intellect and depth of knowledge, but "also a gentleness, openness, and generosity of spirit." Gary
Hendrix (who led the NLP group at SRI for several years) remembers that for many who worked
with her at SRI, "Jane was like a second mother, loving and giving and nurturing." Jerry Hobbs (ACL
president, 1990) recalls that, "one of the things I learned from her, though imperfectly, was how to be
tough with grace." Several remember Jane as one of a handful of elder statesmen that their generation
could look up to.
Jane was a colleague and friend of Ray Perrault (ACL president, 1983, and current AIC
director) from the time he was a Ph.D. student at the University of Michigan. In the early 1970s, Jane
frequently visited Joyce Friedman (ACL president, 1971), her old friend and his advisor. Ray
remembers that Jane was a gentle but firm critic of his thesis, and he fondly recalls her passing through
in her huge Chrysler on her move from IBM to SRI. Ray was ACL vice-president when Jane was
president, and she was delighted when he decided to join the young Turks at SRI. Subsequently, she
"even tolerated me as her manager until her retirement and became doting godmother to my son."
Jane made a difference in people's lives, not just their research. Her death marks the end of an era
and the passing of an icon. We will miss her greatly.
Acknowledgments: This remembrance was a composite of the reminiscences of a number of
people, including Tom Garvey, Marguerite Hays, Peter Hart, Gary Hendrix, Jerry Hobbs, David Israel,
Ray Perrault, Candy Sidner, and Marty Tenenbaum.
References
[Robinson1965] Robinson, Jane J. 1965. Endocentric constructions and the Cocke parsing logic. In
Proceedings of the 1965 Conference on Computational Linguistics, COLING '65, pages 1-23.
[Robinson1967] Robinson, Jane J. 1967. Methods for obtaining corresponding phrase structure and
dependency grammars. In Proceedings of the 1967 Conference on Computational Linguistics, COLING '67,
pages 1-25.
[Robinson1970b] Robinson, Jane J. 1970. Case, category, and configuration. Journal of Linguistics,
6(1):57-80.
[Robinson1970a] Robinson, Jane J. 1970. Dependency structures and transformational rules.
Language, 46(2):259-285.
[Robinson2008] Robinson, Jane J. 2008. Time to go.
http://poemsofjanerobinson.blogspot.com/2008/06/time-to-go.html.
IN MEMORIAM: PAUL CHAPIN
1938-2015
Memorial statement prepared by Tom Bever, Merrill Garrett, and Cecile McKee (University of
Arizona)
The Association for Computational Linguistics (ACL) mourns the July 1, 2015 death of Paul
Chapin, former ACL president (1977).
The language sciences lost a truly valued defender and friend on July 1, 2015, with the death of
Paul Gipson Chapin from Acute Myeloid Leukemia, in Tucson, Arizona.
Paul was born December 27, 1938, in El Paso, Texas, son of John Letcher Chapin and Velma
Gipson Chapin. In 1962, he married Susan Levy of New York, his beloved spouse of 53 years
who survives him and will continue to reside in Tucson. Other survivors are: sister Clare Ratliff
of Santa Fe, NM; children Ellen Endress of Beltsville, MD, John Chapin of Alexandria, VA, Robin
Chapin of Honolulu, HI, and Noelle Green of Sherman Oaks, CA; and grandchildren Kasey Chapin
of Woodland, WA and Malia Green of Sherman Oaks, CA.
Paul received his B.A. from Drake University in 1960 and his Ph.D. from MIT in 1967 as a student
of Noam Chomsky. He served as Assistant Professor at UCSD from 1967 to 1975, in a newly
forming department. During this period, he developed a particular interest in psycholinguistics, and made
early contributions to its initial growth. He then had an opportunity at NSF to be of use as a broad
organizer of the language sciences in general. He directed the National Science Foundation's
Linguistics Program from 1975 to 1999, declining several offers to move up to higher positions in NSF.
Between 1999 and his retirement in 2001, Paul supported cross-directorate activities at NSF. When
he retired, NSF gave him the Director's Superior Accomplishment Award. The Linguistic Society of
America gave him the first Victoria A. Fromkin Award for Distinguished Service to the Profession
that same year. He later served as Secretary-Treasurer of the LSA from 2008 to 2013. He was elected
a fellow of the Linguistic Society of America, the American Association for the Advancement of
Science, and the Association for Psychological Sciences.
As a 1967 Ph.D. graduate from MIT, Paul could have expressed a particular professional bias for
generative grammar while at NSF. But even as a student, he was eclectic and was as interested in lan-
guages as in their structure. His first graduate program was in Philosophy at Harvard for a year. He
subsequently became a student at MIT, but worked for most of that time in the MITRE Corporation's
pioneering lab in computational linguistics; indeed, he later became President of the Association for
Computational Linguistics in 1977. His dissertation, "On the syntax of word-derivation in English",
argued against the then-prevailing view that transformations preserve meaning; he showed that, in
terms of the theory current at the time (essentially an early version of the Aspects model), there is a cycle
of transformations internal to complex words that modify their meaning: this can be seen as a
premonition of Generative Semantics' treatment of lexical items, and of today's principles of Derivational
Morphology. During his period as an assistant professor in the linguistics program at UCSD, Paul
published papers on a range of topics, including articles from his dissertation, analyses of Samoan,
the history of Polynesian languages, methodological papers on computational topics (e.g., automatic
morpheme stripping), experimental studies of sentence comprehension (e.g., on click location
during sentence comprehension), and several important review articles. Theoretical frameworks for his
investigations included transformational grammar, case grammar, and generative semantics, among
others.
Thus, no one could have been more broadly trained to take on the task of managing the funding
of NSF's Linguistics Program. As much as any leading academic, he must be credited with shaping
the field as it is today. His positive influence on linguistics cannot be overstated. Since NSF is the
primary source of government support for the field, the NSF program director has a supremely
important influence: Paul used this influence with a great sense of critical judgment but with an equal sense
of impartiality in a field rife with academic conflicts. The 25-year period of his stewardship of the
program witnessed some of the most extreme disagreements within the language sciences, pitting
rationalists against associationists, structuralists against functionalists, nativists against empiricists:
They all were struggling for NSF support during a time of increasingly limited resources. Paul stood
above these arguments, and insisted on supporting any affordable proposal that had promise for im-
portant results, both theoretical and empirical, whatever the philosophical stripe of the investigators.
He could see past the intellectual commercials accompanying a project into its value in propelling
the field forward. Without him, research support could easily have fallen into one camp or another
with an ultimate loss for everyone.
Paul was first known to many of us professionally in his capacity as director of the NSF Linguistics
Program. He was a consistent chaperone of ideas and research projects, gentle with his advice,
generous with assistance in helping applicants modify their proposals to become more successful.
His mellifluous basso profundo voice on the telephone (the earlier days were prior to email) still
resonates in our memory of discussions about why our most recent proposal did not get funded,
or needed some changes, or in fact did get funded. His tone was always the same, quiet, factual,
friendly, and concerned to be helpful. Indeed, a few years after his official retirement, he published
an extraordinary book on how to write grant proposals and use them to formulate coherent research
programs.
Paul was a witty and engaging personal friend, with wide ranging interests. He had a lifelong love
of music, as a flute player, a singer, and in retirement serving on the board of the Desert Chorale
in Santa Fe, NM. He enjoyed great food, whether haute cuisine or ethnic. He could always tell you
where and when he had eaten his favorite version of any particular dish. He found most published
crossword puzzles too easy. He collaborated with an online community from 2003-2012 to follow
the Samuel Pepys diary on a day by day basis.
Family and friends remember Paul for his deep caring for others and his lifelong commitment to
social progress. He will be sorely missed.
Memorial donations may be made to the MD Anderson Cancer Center in Houston, TX.
Conference Information
1
Message from the General Chair
ACL first came to Asia fifteen years ago, in 2000. The conference in Hong Kong was
a very exciting one and attracted lots of people. It was a great opportunity for a number of Asian
NLP researchers to meet face-to-face in such a large-scale meeting. The establishment of AFNLP (the
Asian Federation of Natural Language Processing) was discussed soon after this wonderful event, and
AFNLP then started IJCNLP (the International Joint Conference on Natural Language Processing) as its
biennial flagship conference. ACL's three-year regional rotation and IJCNLP's two-year
cycle meet every six years, and this is the second joint ACL-IJCNLP conference, following the first
held in Singapore in 2009. ACL meetings in Asia and IJCNLPs are now a propelling force of NLP
research in the Asian region, and provide valuable experience, especially to young researchers and
students attending a conference of this size for the first time.
The success of ACL-IJCNLP owes a great deal to the hard work and dedication of many people. I
would like to thank all of them for their time and contribution to this joint ACL-AFNLP conference.
Priscilla Rasmussen (the ACL Business Manager), Gertjan van Noord (ACL Past President), Chris
Manning (ACL President), Graeme Hirst (ACL Treasurer), Dragomir Radev (ACL Secretary), Keh-
Yih Su (AFNLP Past President), Kam-Fai Wong (AFNLP President), all other ACL and AFNLP
Executive Committee members and ACL-AFNLP Conference Coordinating Committee members
(forgive me for not listing all their names) have always been very helpful, guiding me whenever I
missed something or fell behind schedule and giving me appropriate advice. Without their help,
I could not have fulfilled even half of my duties.
I was very lucky to have a wonderful team of chairs, who have done a fantastic job in making
this conference an invaluable one. I would like to express my deepest gratitude to Michael Strube
and Chengqing Zong (Program Committee Co-Chairs), Le Sun and Yang Liu (Local Arrangement
Co-Chairs), Hang Li and Sebastian Riedel (Workshop Co-Chairs), Kevin Duh and Eneko Agirre (Tu-
torial Co-Chairs), Hsin-Hsi Chen and Katja Markert (System Demonstration Co-Chairs), Wanxiang
Che and Guodong Zhou (Publications Co-Chairs), Stephan Oepen, Chin-Yew Lin and Emily Bender
(Student Research Workshop Faculty Advisors), Kuan-Yu Chen, Angelina Ivanova and Ellie Pavlick
(Student Research Workshop Co-Chairs), Francis Bond (Mentoring Chair), Xianpei Han and Kang
Liu (Publicity Co-Chairs), Zhiyuan Liu (Webmaster), and all the team members of the Local Orga-
nizing Committee. Thanks to their dedicated efforts, we now have a great conference consisting of
the Presidential Address (by Chris Manning), two Keynote Addresses (by Marti Hearst and Jiawei
Han), 173 long and 145 short papers, 13 TACL papers, 7 Student Research Workshop papers, 25
system demonstrations, 8 tutorials, 15 workshops, one collocated conference (CoNLL-2015), and a
not yet known Lifetime Achievement Awardees speech.
I am also grateful to our sponsors for their generous contributions. ACL-IJCNLP 2015 is supported
by six Platinum Sponsors (CreditEase, Baidu, Tencent, Alibaba Group, SAMSUNG, and
Microsoft), four Gold Sponsors (Google, Facebook, SinoVoice, and Huawei), three Silver Sponsors
(Nuance, Amazon, and Sogou), one Bronze Sponsor (Yandex), one Overseas Student Fellowship
Sponsor (Baobab), and one Best Paper Sponsor (IBM). I would like to express special thanks to Yiqun Liu
(Local Sponsorship Chair) and all members of the International Sponsorship Committee (Ting Liu,
Hideto Kazawa, Asli Celikyilmaz, Julia Hockenmaier, and Alessandro Moschitti).
Finally, I would like to thank the two keynote speakers, the area chairs of the main conference, the
workshop organizers, the tutorial presenters, the authors of main conference and demo papers, the
reviewers for their contributions, and all the attendees for their participation. I hope everyone has a great
time and enjoys this conference.
Message from the Program Committee Co-Chairs
Welcome to the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th
International Joint Conference on Natural Language Processing of the Asian Federation of Natural
Language Processing (ACL-IJCNLP)! This year ACL-IJCNLP received 692 long paper submissions
and 648 short paper submissions, which sets a new ACL record for both long and short papers! We
are pleased to observe that our community continues to grow. Of the long papers, 173 were accepted
for presentation at ACL-IJCNLP: 105 as oral and 68 as poster presentations. 145 short papers
were accepted: 50 as oral and 95 as poster presentations. In addition, ACL-IJCNLP also features 12
presentations of papers accepted in the Transactions of the Association for Computational Linguistics
(TACL).
The submissions were reviewed under different categories and using different review forms for
empirical/data-driven, theoretical, applications/tools, resources/evaluation, and survey papers. This
year we introduced the item "MENTORING" to the review form to indicate whether a paper needs
the help of a mentor in its writing, organization or presentation. For short papers, following up on last
year's successful experience, we also welcomed submissions describing negative results. We are
glad to see that the community is becoming more open towards negative results, so that such papers
have a chance to be published and other researchers do not fall into the same traps.
We view posters as an integral part of ACL-IJCNLP. Half of the papers have been accepted as
posters. Hence, we spent a great deal of time ensuring that the poster sessions will be a good experience
for poster presenters and their audience. Following last year's exciting poster session, we also
organized the posters into two large poster sessions to accommodate the high-quality submissions
accepted in poster presentation format. We hope attendees and authors will benefit from this additional
presentation time and have more time to discuss with each other.
ACL-IJCNLP 2015 will have two distinguished invited speakers: Marti Hearst (Professor at UC
Berkeley in the School of Information and EECS) and Jiawei Han (Abel Bliss Professor at the University
of Illinois at Urbana-Champaign). We are grateful that they accepted our invitation.
There are many individuals to thank for their contributions to ACL-IJCNLP 2015. We would like
to thank the 37 area chairs for their hard work on recruiting reviewers, leading the discussion process,
and carefully ranking the submissions. We would also like to thank the 751 primary and the 137
secondary reviewers, on whose efforts we depended to select high-quality and timely scientific work.
This year we specifically acknowledge the roughly 18.2% of reviewers who went the extra mile and
provided extremely helpful reviews (their names are marked with a * in the organization section of
the proceedings). The ACL coordinating committee members, including Dragomir Radev, Graeme
Hirst, Jian Su, and Gertjan van Noord, were invaluable on various issues relating to the organization.
We would like to thank the prior conference chairs Kristina Toutanova and Hua Wu and their predecessors
for their advice. We are very grateful for the guidance and support of the general chair Yuji
Matsumoto, the ACL Business Manager Priscilla Rasmussen, who knew practically everything,
the local chairs Le Sun and Yang Liu, the publication chairs Wanxiang Che and Guodong Zhou,
and the webmaster Zhiyuan Liu. We would also like to thank Jiajun Zhang, who helped with reviewer
assignment and numerous other tasks. Rich Gerber and Paolo Gai from Softconf were extremely
responsive to all of our requests, and we are grateful for that.
We are indebted to the best paper award committee, which consisted of Eneko Agirre, Tim Baldwin,
Philipp Koehn, Joakim Nivre, and Yue Zhang. They read the candidate papers, ranked them, and
provided comments on very short notice.
We hope you will enjoy ACL-IJCNLP 2015 in Beijing!
Organizing Committee
General Chair
Yuji Matsumoto, Nara Institute of Science and Technology
Program Committee Co-chairs
Chengqing Zong, Institute of Automation, Chinese Academy of Sciences
Michael Strube, Heidelberg Institute for Theoretical Studies
Local Co-chairs
Le Sun, Institute of Software, Chinese Academy of Sciences
Yang Liu, Tsinghua University
Workshop Co-chairs
Hang Li, Huawei
Sebastian Riedel, University College London
Tutorial Co-chairs
Kevin Duh, Nara Institute of Science and Technology
Eneko Agirre, University of the Basque Country
Publications Co-chairs
Wanxiang Che, Harbin Institute of Technology
Guodong Zhou, Suzhou University
Demonstration Co-Chairs
Hsin-Hsi Chen, National Taiwan University
Katja Markert, University of Leeds
Sponsorship Chair
Yiqun Liu, Tsinghua University
Publicity Co-Chairs
Xianpei Han, Institute of Software, Chinese Academy of Sciences
Kang Liu, Institute of Automation, Chinese Academy of Sciences
Student Research Workshop Co-chairs
Student Co-chairs
Kuan-Yu Chen, National Taiwan University
Angelina Ivanova, University of Oslo
Ellie Pavlick, University of Pennsylvania
Faculty Advisors
Stephan Oepen, University of Oslo
Chin-Yew Lin, Microsoft Research Asia
Emily Bender, University of Washington
Mentoring Chair
Francis Bond, Nanyang Technological University
Student Volunteer Co-Chairs
Erhong Yang, Beijing Language and Culture University
Dong Yu, Beijing Language and Culture University
Webmasters
Zhiyuan Liu, Tsinghua University
Qi Zhang, Fudan University
Entertainment Chair
Binyang Li, University of International Relations
Space Management Co-Chairs
Jiajun Zhang, Institute of Automation, Chinese Academy of Sciences
Wenbin Jiang, Institute of Computing Technology, Chinese Academy of Sciences
Qiuye Zhao, Institute of Computing Technology, Chinese Academy of Sciences
Graphic Design
Yi Han, Tsinghua University
Ying Lin, Beijing University of Posts and Telecommunications
Business Manager
Priscilla Rasmussen
Program Committee
Sentiment Analysis and Opinion Mining
Jing Jiang, Singapore Management University
Lun-Wei Ku, Academia Sinica
Spoken Language Processing, Dialogue and Interactive Systems, and Multimodal NLP
David Schlangen, University of Bielefeld
Julia Hirschberg, Columbia University
Summarization and Generation
Xiaojun Wan, Peking University
Mirella Lapata, University of Edinburgh
Tagging, Chunking, Syntax and Parsing
Yusuke Miyao, National Institute of Informatics
Anders Soegaard, University of Copenhagen
Mark Dras, Macquarie University
Meal Info
2 Tutorials: Sunday, July 26
Overview
Sentiment and Belief: How to Think about, Represent, and Annotate Private States 302B
Owen Rambow and Janyce Wiebe
What You Need to Know about Chinese for Chinese Language Processing 305
Chu-Ren Huang
Tutorials
Sunday, July 26, 2015
Tutorial 1
301A+B
Historically, Natural Language Processing (NLP) has focused on understanding unstructured data (speech and text),
while Data Mining (DM) has mainly focused on massive, structured or semi-structured
datasets. The general research directions of these two fields have also followed different philosophies
and principles. For example, NLP aims at deep understanding of individual words, phrases and
sentences ("micro-level"), whereas DM aims to conduct a high-level understanding, discovery and
synthesis of the most salient information from a large set of documents when working on text data
("macro-level"). But they share the same goal of distilling knowledge from data. In the past five
years, these two areas have had intensive interactions and have thus mutually enhanced each other through
many successful text mining tasks. This positive progress mainly benefits from some innovative
intermediate representations such as "heterogeneous information networks" [Han et al., 2010; Sun et
al., 2012b].
However, successful collaborations between any two fields require substantial mutual understanding,
patience and passion among researchers. As with the application of machine learning techniques
in NLP, there is usually a gap of at least several years between the creation of a new DM approach and
its first successful application in NLP. More importantly, many DM approaches, such as gSpan [Yan
and Han, 2002] and RankClus [Sun et al., 2009a], have demonstrated their power on structured data,
but they remain relatively unknown in the NLP community, even though there are many obvious
potential applications. On the other hand, compared to DM, the NLP community has paid more
attention to developing large-scale data annotations, resources, and shared tasks that cover a wide range
of genres and domains. NLP can also provide the basic building blocks for many
DM tasks such as text cube construction [Tao et al., 2014]. Therefore, in many scenarios, for the
same approach the NLP experimental setting is often much closer to real-world applications than its
DM counterpart.
We would like to share the experiences and lessons from our extensive inter-disciplinary collaborations
over the past five years. The primary goal of this tutorial is to bridge the knowledge gap between
these two fields and speed up the transition process. We will introduce two types of DM methods: (1)
state-of-the-art DM methods that have already been proven effective for NLP; and (2)
newly developed DM methods that we believe will fit some specific NLP problems. In addition,
we aim to suggest some new research directions in order to better marry these two areas and lead to
more fruitful outcomes. The tutorial will thus be useful for researchers from both communities. We
will try to provide a concise roadmap of recent perspectives and results, as well as point to the related
DM software and resources, and NLP data sets that are available to both research communities.
Jiawei Han is the Abel Bliss Professor in the Department of Computer Science at the University of
Illinois. He has been researching data mining, information network analysis, and database systems,
with over 600 publications. He served as the founding Editor-in-Chief of ACM Transactions
on Knowledge Discovery from Data (TKDD). He has received the ACM SIGKDD Innovation Award
(2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W.
Wallace McDowell Award (2009), and the Daniel C. Drucker Eminent Faculty Award at UIUC (2011).
He is a Fellow of the ACM and a Fellow of the IEEE. He is currently the Director of the Information Network
Academic Research Center (INARC), supported by the Network Science-Collaborative Technology
Alliance (NS-CTA) program of the U.S. Army Research Lab, and also the Director of KnowEnG, an NIH
Center of Excellence in big data computing, part of the NIH Big Data to Knowledge (BD2K) initiative.
His co-authored textbook "Data Mining: Concepts and Techniques" (Morgan Kaufmann) has been
adopted worldwide. He has delivered tutorials at many reputed international conferences, including
WWW'14, SIGMOD'14 and KDD'14.
Heng Ji is the Edward H. Hamilton Development Chair Associate Professor in the Computer Science Department
of Rensselaer Polytechnic Institute. She received the "AI's 10 to Watch" Award in 2013, an NSF
CAREER award in 2009, Google Research Awards in 2009 and 2014, and IBM Watson Faculty
Awards in 2012 and 2014. In the past five years she has collaborated extensively with Prof.
Jiawei Han and Prof. Yizhou Sun on applying data mining techniques to NLP problems, jointly
publishing 15 papers, including a "Best of SDM 2013" paper and a "Best of ICDM 2013" paper. She
has delivered tutorials at COLING 2012, ACL 2014 and NLPCC 2014.
Yizhou Sun is an assistant professor in the College of Computer and Information Science of Northeastern
University. She received her Ph.D. in Computer Science from the University of Illinois at
Urbana-Champaign in 2012. Her principal research interest is in mining information and social networks,
and more generally in data mining, database systems, statistics, machine learning, information
retrieval, and network science, with a focus on modeling novel problems and proposing scalable
algorithms for large-scale, real-world applications. Yizhou has over 60 publications in books, journals,
and major conferences. Tutorials based on her thesis work on mining heterogeneous information
networks have been given at several premier conferences, including EDBT 2009, SIGMOD 2010,
KDD 2010, ICDE 2012, VLDB 2012, and ASONAM 2012. She received the 2012 ACM SIGKDD
Best Student Paper Award, the 2013 ACM SIGKDD Doctoral Dissertation Award, and a 2013 Yahoo ACE
(Academic Career Enhancement) Award.
Tutorial 2
302A
Statistical natural language processing relies on probabilistic models of linguistic structure. More
complex models can help capture our intuitions about language, by adding linguistically meaningful
interactions and latent variables. However, inference and learning in the models we want often poses
a serious computational challenge. Belief propagation (BP) and its variants provide an attractive
approximate solution, especially using recent training methods. These approaches can handle joint
models of interacting components, are computationally efficient, and have extended the state-of-the-art
on a number of common NLP tasks, including dependency parsing, modeling of morphological
paradigms, CCG parsing, phrase extraction, semantic role labeling, and information extraction
(Smith and Eisner, 2008; Dreyer and Eisner, 2009; Auli and Lopez, 2011; Burkett and Klein, 2012;
Naradowsky et al., 2012; Stoyanov and Eisner, 2012).
This tutorial delves into BP with an emphasis on recent advances that enable state-of-the-art performance
in a variety of tasks. Our goal is to elucidate how these approaches can easily be applied
to new problems. We also cover the theory underlying them. Our target audience is researchers in
human language technologies; we do not assume familiarity with BP. In the first three sections, we
discuss applications of BP to NLP problems, the basics of modeling with factor graphs and message
passing, and the theoretical underpinnings of "what BP is doing" and how it relates to other inference
techniques. In the second three sections, we cover key extensions to the standard BP algorithm to
enable modeling of linguistic structure, efficient inference, and approximation-aware training. We
survey a variety of software tools and introduce a new software framework that incorporates many of
the modern approaches covered in this tutorial.
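The factor-graph and message-passing basics mentioned above can be made concrete with a small sketch. Everything below — the three-variable chain, the potential tables, the variable names — is an invented toy example, not material from the tutorial itself; on a tree-structured graph such as this chain, sum-product BP computes exact marginals, which the brute-force enumeration at the end confirms.

```python
import itertools

# Chain factor graph: X1 -- f12 -- X2 -- f23 -- X3, binary variables,
# with unary potentials u and pairwise potentials f12, f23 (made-up numbers).
u = {1: [0.7, 0.3], 2: [0.4, 0.6], 3: [0.5, 0.5]}
f12 = [[1.0, 0.5], [0.5, 1.0]]   # f12[x1][x2]
f23 = [[0.8, 0.2], [0.3, 0.7]]   # f23[x2][x3]

def msg(factor, incoming, transpose=False):
    """Sum-product message through a pairwise factor.
    incoming: the message arriving at the factor from one endpoint."""
    if transpose:  # sum over the factor's second index instead of its first
        factor = [[factor[a][b] for a in range(2)] for b in range(2)]
    return [sum(factor[a][b] * incoming[a] for a in range(2)) for b in range(2)]

# Forward pass (left to right), then backward pass (right to left).
m_f12_x2 = msg(f12, u[1])                                        # f12 -> X2
m_f23_x3 = msg(f23, [u[2][i] * m_f12_x2[i] for i in range(2)])   # f23 -> X3
m_f23_x2 = msg(f23, u[3], transpose=True)                        # f23 -> X2
m_f12_x1 = msg(f12, [u[2][i] * m_f23_x2[i] for i in range(2)], transpose=True)

def normalize(b):
    z = sum(b)
    return [v / z for v in b]

# Belief at a variable = unary potential times all incoming messages.
belief1 = normalize([u[1][i] * m_f12_x1[i] for i in range(2)])
belief2 = normalize([u[2][i] * m_f12_x2[i] * m_f23_x2[i] for i in range(2)])
belief3 = normalize([u[3][i] * m_f23_x3[i] for i in range(2)])

# Brute-force check: enumerate all 8 joint configurations.
scores = {(a, b, c): u[1][a] * u[2][b] * u[3][c] * f12[a][b] * f23[b][c]
          for a, b, c in itertools.product(range(2), repeat=3)}
z = sum(scores.values())
exact1 = [sum(s for (a, _, _), s in scores.items() if a == v) / z for v in range(2)]
assert all(abs(belief1[v] - exact1[v]) < 1e-12 for v in range(2))
```

On graphs with cycles the same message updates are iterated to a fixed point ("loopy" BP) and the beliefs become approximate — the regime where the approximation-aware training covered in the tutorial becomes relevant.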
Matt Gormley is a PhD student at Johns Hopkins University working with Mark Dredze and Jason
Eisner. His current research focuses on joint modeling of multiple linguistic strata in learning settings
where supervised resources are scarce. He has authored papers in a variety of areas including topic
modeling, global optimization, semantic role labeling, relation extraction, and grammar induction.
http://www.cs.jhu.edu/~mrg/
Jason Eisner is a Professor in Computer Science and Cognitive Science at Johns Hopkins University,
where he has received two school-wide awards for excellence in teaching. His 90+ papers have
presented many models and algorithms spanning numerous areas of NLP. His goal is to develop the
probabilistic modeling, inference, and learning techniques needed for a unified model of all kinds of
linguistic structure. In particular, he and his students introduced structured belief propagation (which
incorporates classical NLP models and their associated dynamic programming algorithms), as well as
loss-calibrated training for use with belief propagation. http://www.cs.jhu.edu/~jason/
Tutorial 3
Sentiment and Belief: How to Think about, Represent, and Annotate Private States
302B
Over the last ten years, there has been an explosion of interest in sentiment analysis, with many
interesting and impressive results. For example, the first twenty publications on Google Scholar
returned for the query "sentiment analysis" all date from 2003 or later, and have a total citation
count of 12,140. The total number of publications is in the thousands. Partly, this interest is driven
by the immediate commercial applications of sentiment analysis.
Sentiment is a "private state" (Wiebe 1990). However, it is not the only private state that has received
attention in the computational literature; others include belief and intention. In this tutorial, we propose
to provide a deeper understanding of what a private state is. We will concentrate on sentiment
and belief. Belief is very closely related to factuality, and also to notions such as veridicality, modality,
and hedging. We will provide background that will allow the tutorial participants to understand
the notion of a private state as a cognitive phenomenon, which can be manifested linguistically in
various ways. We will explain the formalization in terms of a triple of state, source, and target. We
will discuss how to model the source and the target. We will then explain in some detail the annotations
that have been made. The issue of annotation is crucial for private states: while the MPQA
corpus (Wiebe et al. 2005) has been around for some time, most research using it does not make
use of many of its features. We believe this is because the MPQA annotation is quite complex and
requires a deeper understanding of the phenomenon of "private state", which is what the annotation
is getting at. Furthermore, there are currently several efforts underway to create new versions of the
annotations, which we will also present.
Owen Rambow is a Senior Research Scientist at the Center for Computational Learning Systems
at Columbia University. He is also the co-chair of the Center for New Media at the Data Science
Institute at Columbia University. He has long been interested in modeling cognitive states in relation
to language, initially in the context of natural language generation (Rambow 1993; Walker and Rambow 1994).
More recently, he has studied belief in the context of recognizing beliefs
in language (Diab et al. 2009, Prabhakaran et al. 2010, Danlos and Rambow 2011, Prabhakaran et al.
2012). He is currently leading the DARPA DEFT Belief group, working with other researchers and
with the LDC to define annotation standards and evaluations. He was recently involved in the pilot
evaluation for belief recognition (in English) in the DARPA DEFT program.
Janyce Wiebe is Professor of Computer Science and Professor and Co-Director of the Intelligent
Systems Program at the University of Pittsburgh. She has worked on issues related to private states for some
time, originally in the context of tracking point of view in narrative (Wiebe 1994), and later in the
context of recognizing sentiment in other genres such as news articles (Wilson et al. 2005). She
has approached the area from the perspectives of corpus annotation (Wiebe et al. 2005, Deng et al.
2013), lexical semantics (Wiebe and Mihalcea 2006), and discourse (Somasundaran et al. 2009). In
addition to continuing these lines of research, she has recently begun investigating implicatures in
opinion analysis (Deng and Wiebe 2014). http://people.cs.pitt.edu/~wiebe/
The larger goal of this tutorial is to allow the tutorial participants to gain a deeper understanding of
the role of private states in human communication, and to encourage them to use this deeper understanding
in their computational work. The immediate goal of this tutorial is to allow the participants
to make more complete use of available annotated resources. These include the MPQA corpus, the
LU Corpus (Diab et al. 2009), FactBank (Saurí and Pustejovsky 2009), and the corpora under development
at the LDC which include sentiment and belief. We propose to achieve these goals by
concentrating on annotated corpora, since this will allow participants to understand both the underlying
content (achieving the larger goal) and the technical details of the annotations (achieving the
immediate goal).
Tutorial 4
305
This tutorial presents a pattern-based empirical approach to meaning representation and computation.
It is a response to the finding by corpus linguists that "most meanings require the presence of more
than one word for their normal realization". The tutorial shows how patterns are built from corpus
evidence using machine learning methods, and discusses potential applications of patterns. It is
intended for an audience with heterogeneous competences but with a common interest in corpus
linguistics and computational models for meaning-related tasks in NLP. The goal is to equip the
audience with a better understanding of the role played by patterns in natural language, an operative
command of the methodology used to acquire patterns, and a forum in which to discuss their utility
in NLP applications.
The relatively recent explosion of corpus-driven research has shown that intermediate text representations
(ITRs), built from the bottom up using corpus examples towards a complex representation
of phrases, play an important role in dealing with the meaning disambiguation problem. It has been
shown that it is possible to identify and to learn corpus patterns that encode the information that
accounts for the senses of a verb and its arguments in context. These patterns link the syntactic
structure of verbal phrases and the semantic types of their argument fillers via the role that each of
these play in the disambiguation of the phrase as a whole. The available solutions developed so far
range from supervised to totally unsupervised approaches. The patterns obtained encode the necessary
information for handling the meaning of each word individually as well as that of the phrase
as a whole. As such, they are instrumental in building better language models in the contexts
matched by such patterns. The semantic types used in pattern representation play a discriminative
role; therefore the patterns are sense-discriminative, and as such they can be used in word sense disambiguation
and other meaning-related tasks. The meaning of a pattern as a whole is expressed as a
set of basic implicatures. The implicatures are instrumental in textual entailment, semantic similarity
Patrick Hanks is a lexicographer and corpus linguist. He currently holds two research professorships: one at
the Research Institute of Information and Language Processing in the University of Wolverhampton, the other
at the Bristol Centre for Linguistics in the University of the West of England (UWE, Bristol).
Elisabetta Jezek has been teaching Syntax and Semantics and Applied Linguistics at the University of Pavia
since 2001. Her research interests and areas of expertise are lexical semantics, verb classification, theory of
argument structure, event structure in syntax and semantics, corpus annotation, and computational lexicography.
Daisuke Kawahara is an Associate Professor at Kyoto University. He is an expert in the areas of parsing,
knowledge acquisition and information analysis. He teaches graduate classes in natural language processing.
His current work is focused on automatic induction of semantic frames, semantic parsing, verb polysemic
classes, and verb sense disambiguation.
Octavian Popescu is a researcher at IBM's T. J. Watson Research Center, Yorktown, working on computational
semantics with a focus on corpus patterns for meaning processing. His work is focused on models for word
sense disambiguation, textual entailment and paraphrase acquisition. He has taught various NLP graduate courses
in computational semantics at Trento University (IT), the University of Colorado Boulder (US) and the University of
Bucharest (RO).
Tutorial 5
Guillaume Bouchard, Jason Naradowsky, Sebastian Riedel, Tim Rocktäschel, and Andreas Vlachos
301A+B
Tensor and matrix factorization methods have attracted a lot of attention recently thanks to their
successful applications to information extraction, knowledge base population, lexical semantics and
dependency parsing. In the first part, we will first cover the basics of matrix and tensor factorization
theory and optimization, and then proceed to more advanced topics involving convex surrogates and
alternative losses. In the second part we will discuss recent NLP applications of these methods and
show the connections with other popular methods such as transductive learning, topic models and
neural networks. The aim of this tutorial is to present applied factorization methods in detail, as well
as to introduce more recently proposed methods that are likely to be useful for NLP applications.
Guillaume Bouchard is a senior researcher in statistics and machine learning at Xerox, focusing on
statistical learning using low-rank models for large relational databases. His research includes text
understanding, user modeling, and social media analytics. The theoretical part of his work relates
to efficient algorithms for computing high-dimensional integrals, essential for dealing with uncertainty
(missing and noisy data, latent variable models, Bayesian inference). The main application areas of
his work include the design of virtual conversational agents, link prediction (predictive algorithms
for relational data), social media monitoring and transportation analytics. His web page is available
at http://www.xrce.xerox.com/people/bouchard.
Jason Naradowsky is a postdoc at the Machine Reading group at UCL. Having previously obtained
a PhD at UMass Amherst under the supervision of David Smith and Mark Johnson, his current
research aims to improve natural language understanding by performing task-specific training of
word representations and parsing models. He is also interested in semi-supervised learning, joint
inference, and semantic parsing. His web page is available at http://narad.github.io/.
Sebastian Riedel is a senior lecturer at University College London and an Allen Distinguished Investigator,
leading the Machine Reading Lab. Before that, he was a postdoc and research scientist with
Andrew McCallum at UMass Amherst, a researcher at Tokyo University and DBCLS with Tsujii
Junichi, and a PhD student with Ewan Klein at the University of Edinburgh. He is interested in
teaching machines how to read and works at the intersection of Natural Language Processing (NLP)
and Machine Learning, investigating various stages of the NLP pipeline, in particular those that
require structured prediction, as well as fully probabilistic architectures of end-to-end reading and
reasoning systems. Recently he became interested in new ways to represent textual knowledge using
low-rank embeddings and how to reason with such representations. His web page is available at
http://www.riedelcastro.org/.
Tim Rocktäschel is a PhD student in Sebastian Riedel's Machine Reading group at University College
London. Before that he worked as a research assistant in the Knowledge Management in Bioinformatics
group at Humboldt-Universität zu Berlin, where he also obtained his Diploma in Computer
Science. He is broadly interested in representation learning (e.g. matrix/tensor factorization, deep
learning) for NLP and automated knowledge base completion, and in how these methods can take
advantage of symbolic background knowledge. His web page is available at http://rockt.github.io/.
Andreas Vlachos is a postdoc at the Machine Reading group at UCL, working with Sebastian Riedel
on automated fact-checking using low-rank factorization methods. Before that he was a postdoc at
the Natural Language and Information Processing group at the University of Cambridge and at the
University of Wisconsin-Madison. He is broadly interested in natural language understanding (e.g.
information extraction, semantic parsing) and in machine learning approaches that would help us
towards this goal. He has also worked on active learning, clustering and biomedical text mining. His
web page is available at http://sites.google.com/site/andreasvlachos/.
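As a toy illustration of the factorization setting this tutorial covers, the sketch below fits a low-rank factorization of a small relation-style matrix by gradient descent on the squared loss. The matrix, rank, learning rate, and regularization constant are all invented for illustration; real systems factorize huge sparse matrices of (entity pair, relation) cells with more sophisticated losses.

```python
import random

# Toy low-rank matrix factorization M ~ U V^T learned by gradient descent.
# M is a made-up 4x3 binary "relation matrix" of rank 2, so a rank-2
# factorization can fit it almost exactly.
random.seed(0)

M = [[1.0, 0.0, 1.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0],
     [1.0, 1.0, 1.0]]   # last row = row 1 + row 3, hence rank 2
rows, cols, rank = len(M), len(M[0]), 2

# Small random initialization of the two factors breaks symmetry.
U = [[random.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(rows)]
V = [[random.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(cols)]

def predict(i, j):
    return sum(U[i][k] * V[j][k] for k in range(rank))

lr, reg = 0.05, 0.001
for epoch in range(1000):
    for i in range(rows):
        for j in range(cols):
            err = M[i][j] - predict(i, j)   # residual on this cell
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # gradient step on squared loss with L2 regularization
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)

mse = sum((M[i][j] - predict(i, j)) ** 2
          for i in range(rows) for j in range(cols)) / (rows * cols)
```

In knowledge base population, unobserved cells of such a matrix are then scored by the same inner product `predict(i, j)`, which is what makes the factorization useful for predicting missing facts.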
Tutorial 6
302A
Much of NLP tries to map structured input (sentences) to some form of structured output (tag sequences,
parse trees, semantic graphs, or translated/paraphrased/compressed sentences). Thus structured
prediction and its learning algorithms are of central importance to NLP researchers. However,
when applying traditional machine learning to structured domains, we often face scalability issues
for two reasons:
1. Exact structured inference (such as parsing and translation) is too slow for repeated use on the
training data, but approximate search (such as beam search) unfortunately breaks down the nice
theoretical properties (such as convergence) of existing machine learning algorithms.
2. Even with inexact search, the scale of the training data in NLP still makes pure online learning
(such as perceptron and MIRA) too slow on a single CPU.
This tutorial reviews recent advances that address these two challenges. In particular, we will cover
principled machine learning methods that are designed to work under vastly inexact search, and
parallelization algorithms that speed up learning on multiple CPUs. We will also extend structured
learning to the latent-variable setting, where in many NLP applications, such as translation, the gold-
standard derivation is hidden.
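The combination of structured learning and inexact search described above can be illustrated with a minimal sketch: a structured perceptron for sequence tagging that decodes greedily (an extreme beam of size 1) instead of running exact Viterbi search. The three-sentence dataset, tag set, and features below are all invented for illustration.

```python
# Structured perceptron with greedy (inexact) decoding on a toy tagging task.
# Tags: D(eterminer), N(oun), V(erb). Features: current word and previous tag.
train = [(["the", "dog", "barks"], ["D", "N", "V"]),
         (["a", "cat", "sleeps"], ["D", "N", "V"]),
         (["the", "cat", "barks"], ["D", "N", "V"])]
tags = ["D", "N", "V"]
w = {}  # feature weights keyed by (feature, tag)

def feats(word, prev_tag):
    return ["w=" + word, "prev=" + prev_tag]

def score(word, prev_tag, tag):
    return sum(w.get((f, tag), 0.0) for f in feats(word, prev_tag))

def greedy_decode(words):
    out, prev = [], "<s>"
    for word in words:
        best = max(tags, key=lambda t: score(word, prev, t))  # beam of size 1
        out.append(best)
        prev = best
    return out

for _ in range(5):                      # perceptron epochs
    for words, gold in train:
        pred = greedy_decode(words)
        if pred != gold:                # standard perceptron update:
            prev_g = prev_p = "<s>"     # reward gold features, penalize predicted
            for word, g, p in zip(words, gold, pred):
                for f in feats(word, prev_g):
                    w[(f, g)] = w.get((f, g), 0.0) + 1.0
                for f in feats(word, prev_p):
                    w[(f, p)] = w.get((f, p), 0.0) - 1.0
                prev_g, prev_p = g, p

# Generalizes to an unseen word combination from the same toy grammar.
assert greedy_decode(["the", "dog", "sleeps"]) == ["D", "N", "V"]
```

With greedy or beam search, plain perceptron updates lose their convergence guarantee because the predicted sequence may never be compared against the gold path fairly; remedies such as early update and violation-fixing updates, which restrict updates to prefixes where the gold falls off the beam, are among the techniques this tutorial covers.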
Liang Huang is an Assistant Professor at the City University of New York (CUNY). He graduated
in 2008 from Penn and has worked as a Research Scientist at Google and a Research Assistant
Professor at USC/ISI. His work is mainly on the theoretical aspects (algorithms and formalisms) of
computational linguistics, as well as theory and algorithms of structured learning. He has received a
Best Paper Award at ACL 2008, several best paper nominations (ACL 2007, EMNLP 2008, and ACL
2010), two Google Faculty Research Awards (2010 and 2013), and a University Graduate Teaching
Prize at Penn (2005). He has given three tutorials at COLING 2008, NAACL 2009 and ACL 2014.
James Cross is a Ph.D. candidate at the City University of New York (CUNY) Graduate Center,
working under the direction of Liang Huang in the area of natural language processing. He has
undergraduate degrees in computer science and French from Auburn University, a law degree from
New York University, and a master's degree in computer science from the City College of New York.
He was a summer intern at the IBM T.J. Watson Research Center in 2014 and is a summer research
intern at Facebook in 2015.
Sunday, July 26, 2015
Tutorial 7
Arjun Mukherjee
302B
With the advent of Web 2.0, consumer reviews have become an important resource of public opinion that influences our decisions across an extremely wide spectrum of daily and professional activities: e.g., where to eat, where to stay, which products to purchase, which doctors to see, which books to read, which universities to attend, and so on. Positive/negative reviews directly translate into financial gains/losses for companies. This unfortunately creates strong incentives for opinion spamming, which refers to illegal human activities (e.g., writing fake reviews and giving false ratings) that try to mislead customers by promoting/demoting certain entities (e.g., products and businesses). The problem has been widely reported in the news. Despite recent research efforts on detection, the problem is far from solved. What is worse, opinion spamming is widespread, whereas credit card fraud, by comparison, is as rare as 0.2%.
Major review hosting sites and e-commerce vendors have already made some progress in detecting fake reviews. However, the task is still extremely challenging because it is very difficult to obtain large-scale ground-truth samples of deceptive opinions for algorithm development and evaluation, or to conduct large-scale domain-expert evaluations. Further, in contrast to other kinds of spamming (e.g., Web and link spam, social/blog spam, email spam, etc.), opinion spam has a very unique flavor, as it involves the fluid sentiments of users and their evaluations. Thus, it requires a very different treatment. Since our first paper on the topic in 2007 (Jindal and Liu, 2007), our group and many other researchers have proposed several algorithms and bridged algorithmic methodologies from various scientific disciplines, including computational linguistics (Ott et al., 2011), social and behavioral sciences (Jindal and Liu, 2008; Mukherjee et al., 2013a), and machine learning, data mining, and Bayesian statistics (Mukherjee et al., 2012; Fei et al., 2013; Mukherjee et al., 2013b; Li et al., 2014b; Li et al., 2014a) to solve the problem. The field of deceptive opinion spam has gained a lot of interest in the communications (Hancock et al., 2008), psycholinguistics (Gokhman et al., 2012), and economic analysis (Wang, 2010) communities, apart from mainstream NLP and Web mining, as attested by publications in top-tier venues in the respective communities. The problem has far-reaching implications for various allied NLP topics, including lie detection, forensic linguistics, opinion trust and veracity verification, and plagiarism detection. However, owing to the inherent nature of the problem, a unique blend of NLP, data mining, machine learning, social, behavioral, and statistical techniques is required, which many NLP researchers may not be familiar with.
In this tutorial, we aim to cover the problem in its full depth and width, covering the diverse algorithms that have been developed over the past seven years. The most attractive quality of these techniques is that many of them can be adapted for cross-domain and unsupervised settings. Some of the methods are even in use by startups and established companies. Our focus is on insight and understanding, using illustrations and intuitive deductions. The goal of the tutorial is to make the inner workings of these techniques transparent, intuitive, and their results interpretable.

Arjun Mukherjee is an Assistant Professor in the Department of Computer Science at the University of Houston. He is an active researcher in the areas of opinion spam, sentiment analysis, and Web mining. He is the lead author behind several influential works on opinion spam research, including group opinion spam, commercial fake-review filters (e.g., Yelp), and various statistical models for detecting singular opinion spammers, burstiness patterns, and campaigns. His work on opinion mining, including deception detection, has also received significant media attention (e.g., ACM Tech News, NYTimes, LATimes, Business Week, CNet, etc.). Mukherjee has also served as a program committee member for WWW, ACL, EMNLP, and IJCNLP.
Sunday, July 26, 2015
Tutorial 8
Chu-Ren Huang
305
The synergy between language sciences and language technology has been an elusive one for the
computational linguistics community, especially when dealing with a language other than English.
The reasons are two-fold: the lack of an accessible, comprehensive, and robust account of a specific language that would allow strategic linking between a processing task and linguistic devices, and the lack of successful computational studies taking advantage of such links. With a fast-growing number of
available online resources, as well as a rapidly increasing number of members of the CL community
who are interested in and/or working on Chinese language processing, the time is ripe to take a
serious look at how knowledge of Chinese can help Chinese language processing.
The tutorial will be organized according to the structure of linguistic knowledge of Chinese, starting
from the basic building block to the use of Chinese in context. The first part deals with characters
as the basic linguistic unit of Chinese in terms of phonology, orthography, and basic concepts. An
ontological view of how the Chinese writing system organizes meaningful content as well as how
this onomasiological decision affects Chinese text processing will also be discussed. The second
part deals with words and presents basic issues involving the definition and identification of words
in Chinese, especially given the lack of conventional marks of word boundaries. The third part deals
with parts of speech and focuses on definition of a few grammatical categories specific to Chinese,
as well as distributional properties of Chinese PoS and tagging systems. The fourth part deals with
sentence and structure, focusing on how to identify grammatical relations in Chinese as well as a
few Chinese-specific constructions. The fifth part deals with how meanings are represented and
expressed, especially how different linguistic devices (from lexical choice to information structure)
are used to convey different information. Lastly, the sixth part deals with the ranges of different
varieties of Chinese in the world and the computational approaches to detect and differentiate these
varieties. For each topic, an empirical foundation of linguistic facts is clearly explicated with a robust generalization; the linguistic generalization is then accounted for in terms of its function in the knowledge representation system; and this knowledge representation role is finally exploited in terms of the aims of specific language technology tasks. For references, in addition to language resources and various relevant papers, the tutorial will make reference to Huang and Shi's (2016) reference grammar for a linguistic description of Chinese.

Chu-Ren Huang is currently a Chair Professor at the Hong Kong Polytechnic University. He is a Fellow of the Hong Kong Academy of the Humanities, a permanent member of the International Committee on Computational Linguistics, and President of the Asian Association for Lexicography. He currently serves as Chief Editor of the journal Lingua Sinica, as well as of Cambridge University Press's Studies in Natural Language Processing. He is an associate editor of both the Journal of Chinese Linguistics and Lexicography. He has served in advisory and/or organizing roles for conferences including ALR, ASIALEX, CLSW, CogALex, COLING, IsCLL, LAW, OntoLex, PACLIC, ROCLING, and SIGHAN. Chinese language resources constructed under his direction include the CKIP lexicon and ICG, the Sinica Corpus, the Sinica Treebank, Sinica BOW, Chinese WordSketch, the Tagged Chinese Gigaword Corpus, Hantology, Chinese WordNet, and the Emotion Annotated Corpus. He is the co-author of a Chinese reference grammar (Huang and Shi, 2016) and of a book on Chinese language processing (Lu, Xue, and Huang, in preparation).
Welcome Reception
Catch up with your colleagues at the Welcome Reception! It will be held immediately following the
Tutorials on Sunday, July 26 at 6:00pm in Ballroom C of CNCC.
Main Conference: Monday, July 27
Overview
Session 1 (parallel sessions)
- [TACL] Improving Distributional Similarity with Lessons Learned from Word Embeddings (Levy, Goldberg, and Dagan)
- Addressing the Rare Word Problem in Neural Machine Translation (Luong, Sutskever, Le, Vinyals, and Zaremba)
- Text to 3D Scene Generation with Rich Lexical Grounding (Chang, Monroe, Savva, Potts, and Manning)
- Semantically Smooth Knowledge Graph Embedding (Guo, Wang, Wang, Wang, and Guo)
- Low-Rank Regularization for Sparse Conjunctive Feature Spaces: An Application to Named Entity Classification (Primadhanty, Carreras, and Quattoni)
- Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks (Chen, Xu, Liu, Zeng, and Zhao)
- Encoding Source Language with Convolutional Neural Network for Machine Translation (Meng, Lu, Wang, Li, Jiang, and Liu)
- Statistical Machine Translation Features with Multitask Tensor Networks (Setiawan, Huang, Devlin, Lamar, Zbib, Schwartz, and Makhoul)
- Weakly Supervised Models of Aspect-Sentiment for Online Course Discussion Forums (Ramesh, Kumar, Foulds, and Getoor)
- MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity (Yin and Schütze)
- SensEmbed: Learning Sense Embeddings for Word and Relational Similarity (Iacobacci, Pilehvar, and Navigli)
- Revisiting Word Embedding for Contrasting Meaning (Chen, Lin, Chen, Chen, Wei, Jiang, and Zhu)
- Learning Dynamic Feature Selection for Fast Sequential Prediction (Strubell, Vilnis, Silverstein, and McCallum)
- Generative Event Schema Induction with Entity Disambiguation (Nguyen, Tannier, Ferret, and Besançon)
Parallel Session 1
Encoding Source Language with Convolutional Neural Network for Machine Translation
Fandong Meng, Zhengdong Lu, Mingxuan Wang, Hang Li, Wenbin Jiang, and Qun Liu 11:00-11:25
The recently proposed neural network joint model (NNJM; Devlin et al., 2014) augments the n-gram target language model with a heuristically chosen source context window, achieving state-of-the-art performance in SMT. In this paper, we give a more systematic treatment by summarizing the relevant source information through a convolutional architecture guided by the target information. With different guiding signals during decoding, our specifically designed convolution+gating architectures can pinpoint the parts of a source sentence that are relevant to predicting a target word, and fuse them with the context of the entire source sentence to form a unified representation. This representation, together with target language words, is fed to a deep neural network (DNN) to form a stronger NNJM. Experiments on two NIST Chinese-English translation tasks show that the proposed model can achieve significant improvements over the previous NNJM by up to +1.08 BLEU points on average.
1 [TACL] Improving Distributional Similarity with Lessons Learned from Word Embeddings
Omer Levy, Yoav Goldberg, and Ido Dagan 10:10-10:35
Recent trends suggest that neural network-inspired word embedding models outperform traditional count-based
distributional models on word similarity and analogy detection tasks. We reveal that much of the performance
gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather
than the embedding algorithms themselves. Furthermore, we show that these modifications can be transferred
to traditional distributional models, yielding similar gains. In contrast to prior reports, we observe mostly local
or insignificant performance differences between the methods, with no global advantage to any single approach
over the others.
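One concrete example of such a transferable design choice is context-distribution smoothing: raising context counts to the power alpha = 0.75 inside PPMI, mirroring word2vec's smoothed negative-sampling distribution. A minimal sketch over a toy co-occurrence table (the counts and function names here are invented for illustration; this is not the authors' code):

```python
import math

# Toy word-context co-occurrence counts (fabricated for illustration).
counts = {
    ("cat", "purr"): 4, ("cat", "meow"): 6,
    ("dog", "bark"): 8, ("dog", "purr"): 1,
}

def ppmi(counts, alpha=0.75):
    """Positive PMI with context-distribution smoothing: the context
    probability P(c) is replaced by #(c)^alpha / sum_c' #(c')^alpha,
    transferring word2vec's smoothing trick to a count-based model."""
    total = sum(counts.values())
    word_tot, ctx_tot = {}, {}
    for (w, c), n in counts.items():
        word_tot[w] = word_tot.get(w, 0) + n
        ctx_tot[c] = ctx_tot.get(c, 0) + n
    ctx_norm = sum(n ** alpha for n in ctx_tot.values())
    out = {}
    for (w, c), n in counts.items():
        p_wc = n / total
        p_w = word_tot[w] / total
        p_c = ctx_tot[c] ** alpha / ctx_norm   # smoothed context prob.
        out[(w, c)] = max(0.0, math.log(p_wc / (p_w * p_c)))
    return out

m = ppmi(counts)
```

Smoothing inflates the probability of rare contexts, which shrinks the PMI of rare-context pairs and, as the paper reports, narrows much of the apparent gap between count models and embeddings.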
Semantically Smooth Knowledge Graph Embedding
Shu Guo, Quan Wang, Bin Wang, Lihong Wang, and Li Guo 10:35-11:00
This paper considers the problem of embedding Knowledge Graphs (KGs), consisting of entities and relations, into low-dimensional vector spaces. Most of the existing methods perform this task based solely on observed facts. The only requirement is that the learned embeddings should be compatible within each individual fact. In this paper, aiming at further discovering the intrinsic geometric structure of the embedding space, we propose Semantically Smooth Embedding (SSE). The key idea of SSE is to take full advantage of additional semantic information and enforce the embedding space to be semantically smooth, i.e., entities belonging to the same semantic category will lie close to each other in the embedding space. Two manifold learning algorithms, Laplacian Eigenmaps and Locally Linear Embedding, are used to model the smoothness assumption. Both are formulated as geometrically based regularization terms to constrain the embedding task. We empirically evaluate SSE on two benchmark tasks, link prediction and triple classification, and achieve significant and consistent improvements over state-of-the-art methods. Furthermore, SSE is a general framework. The smoothness assumption can be imposed on a wide variety of embedding models, and it can also be constructed using other information besides entities' semantic categories.
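The Laplacian Eigenmaps smoothness term mentioned above is the standard graph regularizer R(E) = (1/2) * sum_ij w_ij ||e_i - e_j||^2 = tr(E^T L E), which penalizes same-category entities for drifting apart. A toy numerical sketch (the adjacency matrix and embeddings are fabricated for illustration; this is not the authors' code):

```python
import numpy as np

# Toy illustration of the Laplacian-Eigenmaps-style smoothness term:
# R(E) = tr(E^T L E) = 1/2 * sum_ij w_ij ||e_i - e_j||^2,
# where w_ij = 1 when entities i and j share a semantic category.

W = np.array([[0, 1, 0],       # entities 0 and 1 share a category
              [1, 0, 0],
              [0, 0, 0]], dtype=float)
E = np.array([[0.0, 0.0],      # entity embeddings (one row each)
              [2.0, 0.0],
              [5.0, 5.0]])

def smoothness(E, W):
    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian
    return np.trace(E.T @ L @ E)

# Pulling entity 1 toward its same-category neighbour lowers the penalty,
# while the unrelated entity 2 contributes nothing.
before = smoothness(E, W)
E2 = E.copy()
E2[1] = [1.0, 0.0]
after = smoothness(E2, W)
```

In SSE this term is added (with a trade-off weight) to the fact-compatibility loss, so gradient updates simultaneously fit the observed triples and keep each category's entities clustered.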
1 At the suggestion of the TACL co-editors-in-chief, the decision between talk and poster for TACL papers was made using random selection, with the exception of two papers that specifically requested a poster presentation.
Student Lunch
Function A+B
Join your fellow students for a students-only lunch on Monday, July 27 at 11:50 in Function A+B
at the CNCC. This is a chance to get to know others who share similar interests and goals and who
may become your lifelong colleagues.
Session 2 (parallel sessions)
- Efficient Top-Down BTG Parsing for Machine Translation Preordering (Nakagawa)
- Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering (Zhou, He, Zhao, and Hu)
- [TACL] Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary (Krishnamurthy and Mitchell)
- An Effective Neural Network Model for Graph-based Dependency Parsing (Pei, Ge, and Chang)
- Joint Information Extraction and Reasoning: A Scalable Statistical Relational Learning Approach (Wang and Cohen)
- Online Multitask Learning for Machine Translation Quality Estimation (De Souza, Negri, Ricci, and Turchi)
- Question Answering over Freebase with Multi-Column Convolutional Neural Networks (Dong, Wei, Zhou, and Xu)
- A Generalisation of Lexical Functions for Composition in Distributional Semantics (Bride, Van de Cruys, and Asher)
- Structured Training for Neural Network Transition-Based Parsing (Weiss, Alberti, Collins, and Petrov)
- A Knowledge-Intensive Model for Prepositional Phrase Attachment (Nakashole and Mitchell)
- A Context-Aware Topic Model for Statistical Machine Translation (Su, Xiong, Liu, Han, Lin, Yao, and Zhang)
- [TACL] Higher-order Lexical Semantic Models for Non-factoid Answer Reranking (Fried, Jansen, Hahn-Powell, Surdeanu, and Clark)
- Simple Learning and Compositional Application of Perceptually Grounded Word Meanings for Incremental Reference Resolution (Kennington and Schlangen)
- Transition-Based Dependency Parsing with Stack Long Short-Term Memory (Dyer, Ballesteros, Ling, Matthews, and Smith)
- A Convolution Kernel Approach to Identifying Comparisons in Text (Tkachenko and Lauw)
Parallel Session 2
Learning Continuous Word Embedding with Metadata for Question Retrieval in Community
Question Answering
Guangyou Zhou, Tingting He, Jun Zhao, and Po Hu 13:45-14:10
Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the web. This paper is concerned with the problem of question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings new challenges for question retrieval in cQA. In this paper, we propose to learn continuous word embeddings with metadata of category information within cQA pages for question retrieval. To deal with the variable size of word embedding vectors, we employ the Fisher kernel framework to aggregate them into fixed-length vectors. Experimental results on a large-scale real-world cQA dataset show that our approach can significantly outperform state-of-the-art translation models and topic-based models for question retrieval in cQA.
[TACL] Learning a Compositional Semantics for Freebase with an Open Predicate Vocabulary
Jayant Krishnamurthy and Tom M. Mitchell 13:45-14:10
We present an approach to learning a model-theoretic semantics for natural language tied to Freebase. Cru-
cially, our approach uses an open predicate vocabulary, enabling it to produce denotations for phrases such as
"Republican front-runner from Texas" whose semantics cannot be represented using the Freebase schema. Our
approach directly converts a sentence's syntactic CCG parse into a logical form containing predicates derived
from the words in the sentence, assigning each word a consistent semantics across sentences. This logical
form is evaluated against a probabilistic database containing a learned denotation for each textual predicate. A
training phase produces this probabilistic database using a corpus of entity-linked text and probabilistic ma-
trix factorization. We evaluate our approach on a compositional question answering task where it outperforms
several competitive baselines. We also compare our approach against manually annotated Freebase queries,
finding that our open predicate vocabulary enables us to answer many questions that Freebase cannot.
Joint Information Extraction and Reasoning: A Scalable Statistical Relational Learning Ap-
proach
William Yang Wang and William W. Cohen 13:45-14:10
A standard pipeline for statistical relational learning involves two steps: one first constructs the knowledge base (KB) from text, and then performs the learning and reasoning tasks using probabilistic first-order logics. However, a key issue is that information extraction (IE) errors from text affect the quality of the KB and propagate to the reasoning task. In this paper, we propose a statistical relational learning model for joint information extraction and reasoning. More specifically, we incorporate context-based entity extraction with structure learning (SL) in a scalable probabilistic logic framework. We then propose a latent context invention (LCI) approach to improve the performance. In experiments, we show that our approach outperforms state-of-the-art baselines over three real-world Wikipedia datasets from multiple domains; that joint learning and inference for IE and SL significantly improve both tasks; and that latent context invention further improves the results.
Session 3 (parallel sessions)
- [TACL] A New Corpus and Imitation Learning Framework for Context-Dependent Semantic Parsing (Vlachos and Clark)
- Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies (Almeida, Pinto, Figueira, Mendes, and Martins)
- Content Models for Survey Generation: A Factoid-Based Evaluation (Jha, Finegan-Dollak, King, Coke, and Radev)
- New Transfer Learning Techniques for Disparate Label Sets (Kim, Stratos, Sarikaya, and Jeong)
- S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking (Yang and Chang)
- It Depends: Dependency Parser Comparison Using a Web-based Evaluation Tool (Choi, Tetreault, and Stent)
- Learning to Adapt Credible Knowledge in Cross-lingual Sentiment Analysis (Chen, Li, Lei, Liu, and He)
- Training a Natural Language Generator From Unaligned Data (Dušek and Jurčíček)
- Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding (Chen, Wang, Gershman, and Rudnicky)
- [TACL] Design Challenges for Entity Linking (Ling, Singh, and Weld)
- Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling (Akbik, Chiticariu, Danilevsky, Li, Vaithyanathan, and Zhu)
- Learning Bilingual Sentiment Word Embeddings for Cross-language Sentiment Classification (Zhou, Chen, Shi, and Huang)
- Event-Driven Headline Generation (Sun, Zhang, Zhang, and Ji)
- Efficient Disfluency Detection with Transition-based Parsing (Wu, Zhang, Zhou, and Zhao)
- Entity Retrieval via Entity Factoid Hierarchy (Lu, Lam, and Liao)
Parallel Session 3
Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling
Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan, and
Huaiyu Zhu 16:20-16:45
Semantic role labeling (SRL) is crucial to natural language understanding, as it identifies the predicate-argument structure in text with semantic labels. Unfortunately, the resources required to construct SRL models are expensive to obtain and simply do not exist for most languages. In this paper, we present a two-stage method to enable the construction of SRL models for resource-poor languages by exploiting monolingual SRL and multilingual parallel data. Experimental results show that our method outperforms existing methods. We use our method to generate Proposition Banks of high to reasonable quality for 7 languages in three language families, and release these resources to the research community.
Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language
Understanding
Yun-Nung Chen, William Yang Wang, Anatole Gershman, and Alexander Rudnicky 15:55-16:20
Spoken dialogue systems (SDSs) typically require a predefined semantic ontology to train a spoken language understanding (SLU) module. In addition to the annotation cost, a key challenge in designing such an ontology is to define a coherent slot set while considering their complex relations. This paper introduces a novel matrix factorization (MF) approach to learn latent feature vectors for utterances and semantic elements without the need for corpus annotations. Specifically, our model learns the semantic slots for a domain-specific SDS in an unsupervised fashion, and carries out semantic parsing using latent MF techniques. To further consider the global semantic structure, such as inter-word and inter-slot relations, we augment the latent MF-based model with a knowledge graph propagation model based on a slot-based semantic graph and a word-based lexical graph. Our experiments show that the proposed MF approaches produce better SLU models that are able to predict semantic slots and word patterns, taking into account their relations and domain-specificity in a joint manner.
Efficient Disfluency Detection with Transition-based Parsing
Shuangzhi Wu, Dongdong Zhang, Ming Zhou, and Tiejun Zhao 16:20-16:45
Automatic speech recognition (ASR) outputs often contain various disfluencies. It is necessary to remove these disfluencies before processing downstream tasks. In this paper, an efficient disfluency detection approach based on right-to-left transition-based parsing is proposed, which can efficiently identify disfluencies and keep ASR outputs grammatical. Our method exploits a global view to capture long-range dependencies for disfluency detection by integrating a rich set of syntactic and disfluency features, with linear complexity. The experimental results show that our method outperforms state-of-the-art work and achieves an 85.1% F-score on the commonly used English Switchboard test set. We also apply our method to in-house annotated Chinese data and achieve a significantly higher F-score compared to a CRF-based baseline.
Session 4 (parallel sessions)
- A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets (Camacho-Collados, Pilehvar, and Navigli)
- Semi-Stacking for Semi-supervised Sentiment Classification (Li, Huang, Wang, and Zhou)
- … Help Sentence Compression for News Highlights Generation (Wei, Liu, Li, and Gao)
- … overspecified referring expressions: the role of discrimination (Paraboni, Galindo, and Iacovelli)
- Language Models for Image Captioning: The Quirks and What Works (Devlin, Cheng, Fang, Gupta, Deng, He, Zweig, and Mitchell)
- Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering (Chen, Xu, He, and Wang)
- Semantic Analysis and Helpfulness Prediction of Text for Online Product Reviews (Yang, Yan, Qiu, and Bao)
- Simplifying Lexical Simplification: Do We Need Simplified Corpora? (Glavaš and Štajner)
- Spectral Semi-Supervised Discourse Relation Classification (Fisher and Simmons)
- Learning language through pictures (Chrupała, Kádár, and Alishahi)
Parallel Session 4
Improving Distributed Representation of Word Sense via WordNet Gloss Composition and
Context Clustering
Tao Chen, Ruifeng Xu, Yulan He, and Xuan Wang 17:30-17:45
In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context-clustering-based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning sentence-level embeddings from WordNet glosses using a convolutional neural network. The initialized word sense embeddings are used by a context clustering based model to generate the distributed representations of word senses. Our learned representations outperform the publicly available embeddings on 2 out of 4 metrics in the word similarity task, and on 6 out of 13 subtasks in the analogical reasoning task.
Semantic Analysis and Helpfulness Prediction of Text for Online Product Reviews
Yinfei Yang, Yaowei Yan, Minghui Qiu, and Forrest Bao 17:30-17:45
Predicting the helpfulness of product reviews is a key component of many e-commerce tasks such as review ranking and recommendation. However, previous work mixed review helpfulness prediction with those outer-layer tasks and, by using non-text features, produced less transferable models. This paper approaches the problem from a new angle by hypothesizing that helpfulness is an internal property of text. Using review text alone, we isolate review helpfulness prediction from its outer-layer tasks, employ two interpretable semantic features, and use human scoring of helpfulness as the ground truth. Experimental results show that the two semantic features can accurately predict helpfulness scores and greatly improve performance compared with previously used features. A cross-category test further shows that models trained with semantic features generalize more easily to reviews of different product categories. The models we built are also highly interpretable and align well with human annotations.
Poster session P1
document coverage, specificity, topic diversity, and topic homogeneity, each of which, we show, is naturally modeled by a submodular function. Other information, provided, say, by unsupervised approaches such as LDA and its variants, can also be utilized by defining a submodular function that expresses coherence between the chosen topics and this information. We use a large-margin framework to learn convex mixtures over the set of submodular components. We empirically evaluate our method on the problem of automatically generating Wikipedia disambiguation pages, using human-generated clusterings as ground truth. We find that our framework improves upon several baselines according to a variety of standard evaluation metrics, including the Jaccard Index, F1 score, and NMI, and moreover can be scaled to extremely large problems.
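Monotone submodular objectives like the coverage and diversity terms described here are exactly the setting where greedy selection carries the classic (1 - 1/e) approximation guarantee (Nemhauser et al., 1978). A toy set-cover sketch of that greedy step (the data and names are invented for illustration; this is not the paper's mixture-learning code):

```python
# Greedy maximization of a monotone submodular set function.
# f(S) here counts distinct topics covered by the selected documents:
# adding a document never helps more later than it does now
# (diminishing returns), which is what makes greedy near-optimal.

def coverage(selected, docs):
    covered = set()
    for d in selected:
        covered |= docs[d]
    return len(covered)

def greedy(docs, budget):
    """Repeatedly add the document with the largest marginal gain."""
    selected = []
    for _ in range(budget):
        gains = {d: coverage(selected + [d], docs) - coverage(selected, docs)
                 for d in docs if d not in selected}
        best = max(gains, key=gains.get)
        if gains[best] == 0:        # nothing left to gain: stop early
            break
        selected.append(best)
    return selected

docs = {
    "d1": {"sports", "politics"},
    "d2": {"politics"},
    "d3": {"music", "film", "sports"},
}
picked = greedy(docs, budget=2)
```

In the paper's setting the single coverage function is replaced by a learned convex mixture of several submodular components, but the greedy selection step works the same way.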
1-7 Bring you to the past: Automatic Generation of Topically Relevant Event Chronicles
Tao Ge, Wenzhe Pei, Heng Ji, Sujian Li, Baobao Chang, and Zhifang Sui
An event chronicle provides people with easy and fast access to learning about the past. In this paper, we propose the first approach to automatically generating a topically relevant event chronicle for a certain period given a reference chronicle for another period. Our approach consists of two core components: a time-aware hierarchical Bayesian model for event detection, and a learning-to-rank model that selects the salient events to construct the final chronicle. Experimental results demonstrate that our approach is promising for tackling this new problem.
based approaches for morph resolution. Our approach achieved significant improvement over the state-of-the-art method (Huang et al., 2013), which used a large amount of training data.
1-9 Multi-Objective Optimization for the Joint Disambiguation of Nouns and Named Entities
Dirk Weissenborn, Leonhard Hennig, Feiyu Xu, and Hans Uszkoreit
In this paper, we present a novel approach to joint word sense disambiguation (WSD) and entity linking (EL) that
combines a set of complementary objectives in an extensible multi-objective formalism. During disambiguation
the system performs continuous optimization to find optimal probability distributions over candidate senses.
The performance of our system on nominal WSD as well as EL improves state-of-the-art results on several
corpora. These improvements demonstrate the importance of combining complementary objectives in a joint
model for robust disambiguation.
1-13 Semantic Representations for Domain Adaptation: A Case Study on the Tree Kernel-based
Method for Relation Extraction
Thien Huu Nguyen, Barbara Plank, and Ralph Grishman
We study the application of word embeddings to generate semantic representations for the domain adaptation problem of relation extraction (RE) in the tree kernel-based method. We systematically evaluate various techniques to generate the semantic representations and demonstrate that they are effective at improving the generalization performance of a tree kernel-based relation extractor across domains by up to 7%.
1-14 Omnia Mutantur, Nihil Interit: Connecting Past with Present by Finding Corresponding
Terms across Time
Yating Zhang, Adam Jatowt, Sourav Bhowmick, and Katsumi Tanaka
In the current fast-paced world, people tend to possess limited knowledge about things from the past. For example, some young users may not know that the Walkman served a similar function to today's iPod. In this paper, we approach the temporal correspondence problem in which, given an input term (e.g., iPod) and a target time (e.g., the 1980s), the task is to find the counterpart of the query that existed in the target time. We propose
an approach that transforms word contexts across time based on their neural network representations. We then
experimentally demonstrate the effectiveness of our method on the New York Times Annotated Corpus.
represent a named symbol object (entity or relation): the first represents the meaning of the entity (relation); the other is used to construct its mapping matrix dynamically. Compared with CTransR, TransD considers the diversity of not only relations but also entities. TransD has fewer parameters and no matrix-vector multiplication. In experiments, we evaluate our model on two typical tasks, triplet classification and link prediction. Evaluation results show that our model outperforms other embedding models, including TransE, TransH, and TransR/CTransR.
1-19 How Far are We from Fully Automatic High Quality Grammatical Error Correction?
Christopher Bryant and Hwee Tou Ng
In this paper, we first explore the role of inter-annotator agreement statistics in grammatical error correction and conclude that they are less informative in fields where there may be more than one correct answer. We next create a dataset of 50 student essays, each corrected by 10 different annotators for all error types, and investigate how both human and GEC system scores vary when different combinations of these annotations are used as the gold standard. Upon learning that even humans are unable to score higher than 75%.
Monday, July 27, 2015
1-25 Vector-space calculation of semantic surprisal for predicting word pronunciation duration
Asad Sayeed, Stefan Fischer, and Vera Demberg
In order to build psycholinguistic models of processing difficulty and evaluate these models against human data, we need highly accurate language models. Here we specifically consider surprisal, a word's predictability in context. Existing approaches have mostly used n-gram models or more sophisticated syntax-based parsing models, which largely do not account for effects specific to semantics. We build on the work by Mitchell et al. (2010) and show that the semantic prediction model suggested there can successfully predict spoken word durations in naturalistic conversational data. An interesting finding is that the training data for the semantic model also plays a strong role: the model trained on in-domain data, even though a better language model for our data, is not able to predict word durations, while the out-of-domain trained language model does predict word durations. We argue that this at first counter-intuitive result is due to the out-of-domain model better matching the language models of the speakers in our data.
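Surprisal itself is straightforward to compute once a language model is fixed. The bigram model below is a toy stand-in for illustration (the paper uses a vector-space semantic model), with Laplace smoothing as an assumed choice:

```python
import math
from collections import Counter

def make_surprisal(tokens):
    """Train a Laplace-smoothed bigram LM and return a function giving
    the surprisal -log2 P(w | prev) of a word in context."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def surprisal(prev, word):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        return -math.log2(p)

    return surprisal
```

Higher surprisal marks less predictable words, which the paper relates to longer pronunciation durations.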
Poster session P1.09
Modern statistical machine translation (SMT) systems usually use a linear combination of features to model the quality of each translation hypothesis. The linear combination assumes that all the features are in a linear relationship and constrains each feature to interact with the remaining features in a linear manner, which might limit the expressive power of the model and lead to an under-fitted model on the current data. In this paper, we propose a non-linear model of the quality of translation hypotheses based on neural networks, which allows more complex interactions between features. A learning framework is presented for training the non-linear models. We also discuss possible heuristics in designing the network structure which may improve the non-linear learning performance. Experimental results show that with the basic features of a hierarchical phrase-based machine translation system, our method produces translations that are significantly better than a linear model.
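The contrast between the linear and non-linear scorers can be sketched as below; the single tanh hidden layer is an illustrative assumption about the network structure, not the authors' exact design:

```python
import numpy as np

def linear_score(f, w):
    """Standard SMT hypothesis score: a linear combination of features."""
    return f @ w

def nonlinear_score(f, W1, b1, w2):
    """Neural scorer: a hidden layer lets features interact non-linearly
    before the final combination."""
    h = np.tanh(f @ W1 + b1)
    return h @ w2
```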
1-32 Unifying Bayesian Inference and Vector Space Models for Improved Decipherment
Qing Dou, Ashish Vaswani, Kevin Knight, and Chris Dyer
We introduce into Bayesian decipherment a base distribution derived from similarities of word embeddings. We use Dirichlet multinomial regression (Mimno and McCallum, 2012) to learn a mapping between ciphertext and plaintext word embeddings from non-parallel data. Experimental results show that the base distribution is highly beneficial to decipherment, improving state-of-the-art decipherment accuracy from 45.8
Poster session P1.11
1-38 The NL2KR Platform for building Natural Language Translation Systems
Nguyen Vo, Arindam Mitra, and Chitta Baral
This paper presents the NL2KR platform for building systems that can translate text to different formal languages. It is freely available, customizable, and comes with interactive GUI support that is useful in the development of a translation system. Our key contribution is a user-friendly system based on an interactive multistage learning algorithm. This effective algorithm employs Inverse-Lambda, Generalization, and a user-provided dictionary to learn new meanings of words from sentences and their representations. Using the learned meanings and the Generalization approach, it is able to translate new sentences. NL2KR is evaluated on two standard corpora, Jobs and GeoQuery, and it exhibits state-of-the-art performance on both of them.
research purposes. Our experimental results demonstrate the effectiveness of our NSW detection method and the benefit of NSW detection for NER. Our proposed methods perform better than the state-of-the-art NER system.
Poster session P1.14
1-45 Joint Case Argument Identification for Japanese Predicate Argument Structure Analysis
Hiroki Ouchi, Hiroyuki Shindo, Kevin Duh, and Yuji Matsumoto
Existing methods for Japanese predicate argument structure (PAS) analysis identify case arguments of each predicate without considering interactions between the target PAS and others in a sentence. However, the argument structures of the predicates in a sentence are semantically related to each other. This paper proposes new methods for Japanese PAS analysis that jointly identify case arguments of all predicates in a sentence by (1) modeling multiple PAS interactions with a bipartite graph and (2) approximately searching optimal PAS combinations. Performing experiments on the NAIST Text Corpus, we demonstrate that our joint analysis methods substantially outperform a strong baseline and are comparable to previous work.
1-46 Jointly optimizing word representations for lexical and sentential tasks with the C-PHRASE model
Nghia The Pham, Germán Kruszewski, Angeliki Lazaridou, and Marco Baroni
We introduce C-PHRASE, a distributional semantic model that learns word representations by optimizing
context prediction for phrases at all levels in a syntactic tree, from single words to full sentences. C-PHRASE
outperforms the state-of-the-art C-BOW model on a variety of lexical tasks. Moreover, since C-PHRASE word
vectors are induced through a compositional learning objective modeling the contexts of words combined into
phrases, when they are summed, they produce sentence representations that rival those generated by ad-hoc
compositional models.
different and important related tasks, i.e., Paraphrase Identification and Textual Entailment Recognition.
1-51 Learning Semantic Representations of Users and Products for Document Level Sentiment
Classification
Duyu Tang, Bing Qin, and Ting Liu
Neural network methods have achieved promising results for sentiment classification of text. However, these models only use the semantics of texts, while ignoring the users who express the sentiment and the products which are evaluated, both of which have great influence on interpreting the sentiment of text. In this paper, we address this issue by incorporating user- and product-level information into a neural network approach for document-level sentiment classification. Users and products are modeled using vector space models, the representations of which capture important global clues such as individual preferences of users or overall qualities of products. Such global evidence in turn facilitates the embedding learning procedure at the document level, yielding better text representations. By combining evidence at the user, product, and document level in a unified neural framework, the proposed model achieves state-of-the-art performance on Amazon and Yelp datasets.
1-53 Sparse, Contextually Informed Models for Irony Detection: Exploiting User Communities,
Entities and Sentiment
Byron C. Wallace, Do Kook Choe, and Eugene Charniak
Automatically detecting verbal irony (roughly, sarcasm) in online content is important for many practical applications (e.g., sentiment detection), but it is difficult. Previous approaches have relied predominantly on signal gleaned from word counts and grammatical cues. But such approaches fail to exploit the context in which comments are embedded. We thus propose a novel strategy for verbal irony classification that exploits contextual features, specifically by combining noun phrases and sentiment extracted from comments with the forum type (e.g., conservative or liberal) to which they were posted. We show that this approach improves verbal irony classification performance. Furthermore, because this method generates a very large feature space and we expect predictive contextual features to be strong but few, we propose a mixed regularization strategy that places a sparsity-inducing l1 penalty on the contextual feature weights on top of the l2 penalty applied to all model coefficients. This increases model sparsity and reduces the variance of model performance.
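The mixed penalty can be written down directly; the proximal (soft-thresholding) step shown for the l1 part is one standard way to optimize such an objective, included here as an illustrative assumption rather than the authors' exact solver:

```python
import numpy as np

def mixed_penalty(w, ctx_idx, lam2, lam1):
    """l2 penalty on all coefficients plus a sparsity-inducing l1 penalty
    restricted to the contextual-feature weights (indices ctx_idx)."""
    return lam2 * float(np.sum(w ** 2)) + lam1 * float(np.sum(np.abs(w[ctx_idx])))

def prox_l1_context(w, ctx_idx, step):
    """Soft-threshold only the contextual weights, zeroing out weak ones."""
    w = w.copy()
    w[ctx_idx] = np.sign(w[ctx_idx]) * np.maximum(np.abs(w[ctx_idx]) - step, 0.0)
    return w
```

The effect is exactly what the abstract describes: contextual weights below the threshold are driven to zero, while all weights still shrink under the l2 term.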
Poster session P1.16
1-56 Improving social relationships in face-to-face human-agent interactions: when the agent wants to know users' likes and dislikes
Caroline Langlet and Chloé Clavel
This paper tackles the issue of the detection of users' likes and dislikes in a human-agent interaction. We present a system handling the interaction issue by jointly processing the agent's and the user's utterances. It is designed as a rule-based and bottom-up process based on a symbolic representation of the structure of the sentence. This article also describes the annotation campaign, carried out through Amazon Mechanical Turk, for the creation of the evaluation data-set. Finally, we present all measures for rating agreement between our system and the human reference, and obtain agreement scores that correspond at least to substantial agreement.
1-57 Learning Word Representations from Scarce and Noisy Data with Embedding Subspaces
Ramón Astudillo, Silvio Amir, Wang Ling, Mário Silva, and Isabel Trancoso
We investigate a technique to adapt unsupervised word embeddings to specific applications, when only small and noisy labeled datasets are available. Current methods use pre-trained embeddings to initialize model parameters, and then use the labeled data to tailor them for the intended task. However, this approach is prone to overfitting when the training is performed with scarce and noisy data. To overcome this issue, we use the supervised data to find an embedding subspace that fits the task complexity. All the word representations are adapted through a projection into this task-specific subspace, even if they do not occur in the labeled dataset. This approach was recently used in the SemEval 2015 Twitter sentiment analysis challenge, attaining state-of-the-art results. Here we show results improving on those of the challenge, as well as additional experiments in a Twitter part-of-speech tagging task.
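A minimal sketch of the subspace idea, with PCA over the labeled words standing in for the jointly learned projection of the paper (an assumption made for illustration):

```python
import numpy as np

def fit_subspace(labeled_embeddings, k):
    """Find an orthonormal basis of a k-dimensional subspace from the
    embeddings of words that occur in the small labeled set (via SVD)."""
    X = labeled_embeddings - labeled_embeddings.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T  # (d, k) projection matrix

def adapt(all_embeddings, P):
    """Project every word vector, labeled or not, into the task subspace."""
    return all_embeddings @ P
```

The key property is that words never seen in the labeled data are still adapted, because the same projection applies to the whole vocabulary.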
scription to a large community of online workers (crowd). We also get spoken English grades from the crowd. We achieve 95
1-59 Driving ROVER with Segment-based ASR Quality Estimation
Shahab Jalalvand, Matteo Negri, Falavigna Daniele, and Marco Turchi
ROVER is a widely used method to combine the output of multiple automatic speech recognition (ASR) systems. Though effective, the basic approach and its variants suffer from potential drawbacks: (i) their results depend on the order in which the hypotheses are used to feed the combination process; (ii) when applied to combine long hypotheses, they disregard possible differences in transcription quality at the local level; (iii) they often rely on word confidence information. We address these issues by proposing a segment-based ROVER in which hypothesis ranking is obtained from a confidence-independent ASR quality estimation method. Our results on English data from the IWSLT2012 and IWSLT2013 evaluation campaigns significantly outperform standard ROVER and approximate two strong oracles.
Poster session P1.18
subcategorization features derived from a syntactic lexicon. We train it on a modified version of the French
Treebank enriched with morphological dependencies. It recognizes 81.79
1-63 End-to-end learning of semantic role labeling using recurrent neural networks
Jie Zhou and Wei Xu
Semantic role labeling (SRL) is one of the basic natural language processing (NLP) problems. To date, the most successful SRL systems were built on top of some form of parsing results (Koomen et al., 2005; Palmer et al., 2010), where pre-defined feature templates over the syntactic structure are used. Attempts at building an end-to-end SRL learning system without using parsing were less successful (Collobert et al., 2011). In this work, we propose to use a deep bi-directional recurrent network as an end-to-end system for SRL. We take only original text information as input features, without using any syntactic knowledge. The proposed algorithm was evaluated on the CoNLL-2005 shared task on semantic role labeling and achieved an F1 score of 81.07. This result outperforms the previous state-of-the-art system, which combines results from 5 different parse trees produced by two different parsers. Our analysis shows that our model is better at handling longer sentences than traditional models, and that its latent variables implicitly capture the syntactic structure of a sentence.
1-64 Feature Optimization for Constituent Parsing via Neural Networks
Zhiguo Wang, Haitao Mi, and Nianwen Xue
The performance of discriminative constituent parsing relies crucially on feature engineering, and effective
features usually have to be carefully selected through a painful manual process. In this paper, we propose a
method that automatically learns a set of optimal features. Specifically, we build a feedforward neural network model, which takes as input a few primitive units (words, POS tags, and certain contextual tokens from the local context), induces the feature representation in the hidden layer, and makes parsing predictions in the output layer. The network simultaneously learns the feature representation and the prediction model parameters using the backpropagation algorithm. By pre-training the model on a large amount of automatically parsed data, our model
achieves impressive improvements. Evaluated on the standard data sets, our final performance reaches 86.6
1-66 A Re-ranking Model for Dependency Parser with Recursive Convolutional Neural Network
Chenxi Zhu, Xipeng Qiu, Xinchi Chen, and Xuanjing Huang
In this work, we address the problem of modeling all the nodes (words or phrases) in a dependency tree with dense representations. We propose a recursive convolutional neural network (RCNN) architecture to capture syntactic and compositional-semantic representations of phrases and words in a dependency tree. Different from the original recursive neural network, we introduce convolution and pooling layers, which can model a variety of compositions via the feature maps and choose the most informative compositions via the pooling layers. Based on the RCNN, we use a discriminative model to re-rank a k-best list of candidate dependency parse trees. Experiments show that the RCNN is very effective in improving state-of-the-art dependency parsing on both English and Chinese datasets.
on the stack and queue employed in transition-based parsing, in addition to the representations of partially parsed tree structure. Our transition-based neural constituent parsing achieves performance comparable to state-of-the-art parsers, demonstrating an F1 score of 90.68
System Demonstrations
In this paper, we bridge the lexical feature gap by using distributed feature representations and their composition. We provide two algorithms for inducing cross-lingual distributed representations of words, which map vocabularies from two different languages into a common vector space. Consequently, both lexical features and non-lexical features can be used in our model for cross-lingual transfer. Furthermore, our framework is able to incorporate additional useful features such as cross-lingual word clusters. Our combined contributions achieve an average relative error reduction of 10.9% in labeled attachment score as compared with the delexicalized parser, trained on the English universal treebank and transferred to three other languages. It also significantly outperforms McDonald et al. (2013) augmented with projected cluster features on identical data.
Tuesday, July 28, 2015
Main Conference: Tuesday, July 28
Overview
Plenary Hall B
Abstract:
How we teach and learn is undergoing a revolution, due to changes in technology and connectiv-
ity. Education may be one of the best application areas for advanced NLP techniques, and NLP
researchers have much to contribute to this problem, especially in the areas of learning to write,
mastery learning, and peer learning. In this talk I consider what happens when we convert natural
language processors into natural language coaches.
Biography: Marti Hearst is a Professor at UC Berkeley in the School of Information and EECS. She
received her PhD in CS from UC Berkeley in 1994 and was a member of the research staff at Xerox
PARC from 1994-1997. Her research is in computational linguistics, search user interfaces, information visualization, and improving learning at scale. Her NLP work includes automatic acquisition of
hypernym relations ("Hearst Patterns"), TextTiling discourse segmentation, abbreviation recognition,
and multiword semantic relations. She wrote the book "Search User Interfaces" (Cambridge) in 2009,
co-founded the ACM Conference on Learning at Scale in 2014, and was named an ACM Fellow in
2013. She has received four student-initiated Excellence in Teaching Awards, including in 2014 and
2015.
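The "Hearst Patterns" mentioned in the biography are lexical templates such as "X such as Y" that signal hypernymy. A toy regex sketch, assuming single-word nouns for simplicity:

```python
import re

def hearst_such_as(text):
    """Extract (hypernym, hyponym) pairs from the classic Hearst pattern
    'X such as Y, Z' -- assuming single-word nouns for simplicity."""
    pairs = []
    for m in re.finditer(r"(\w+)\s+such\s+as\s+(\w+(?:\s*,\s*\w+)*)", text):
        hypernym = m.group(1)
        for hyponym in re.split(r"\s*,\s*", m.group(2)):
            pairs.append((hypernym, hyponym))
    return pairs
```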
Session 5
10:30
- Lexicon Stratification for Translating Out-of-Vocabulary Words (Tsvetkov and Dyer)
- Efficient Learning for Undirected Topic Models (Gu and Li)
- Distributional Neural Networks for Automatic Resolution of Crossword Puzzles (Severyn, Nicosia, Barlacchi, and Moschitti)
- Word-based Japanese typed dependency parsing with grammatical function analysis (Tanaka and Nagata)
- Improving distant supervision using inference learning (Roller, Agirre, Soroa, and Stevenson)

10:45
- Recurrent Neural Network based Rule Sequence Model for Statistical Machine Translation (Yu and Zhu)
- A Hassle-Free Unsupervised Domain Adaptation Method Using Instance Similarity Features (Yu and Jiang)
- Word Order Typology through Multilingual Word Alignment (Östling)
- KLcpos3 - a Language Similarity Measure for Delexicalized Parser Transfer (Rosa and Žabokrtský)
- A Lexicalized Tree Kernel for Open Information Extraction (Xu, Ringlstetter, Kim, Kondrak, Goebel, and Miyao)

11:00
- Discriminative Preordering Meets Kendall's Tau Maximization (Hoshino, Miyao, Sudoh, Hayashi, and Nagata)
- Dependency-based Convolutional Neural Networks for Sentence Embedding (Ma, Huang, Zhou, and Xiang)
- Measuring idiosyncratic interests in children with autism (Rouhizadeh, Prud'hommeaux, Van Santen, and Sproat)
- CCG Supertagging with a Recurrent Neural Network (Xu, Auli, and Clark)
- A Dependency-Based Neural Network for Relation Classification (Liu, Wei, Li, Ji, Zhou, and Wang)

11:15
- Evaluating Machine Translation Systems with Second Language Proficiency Tests (Matsuzaki, Fujita, Todo, and Arai)
- Non-Linear Text Regression with a Deep Convolutional Neural Network (Bitvai and Cohn)
- Frame-Semantic Role Labeling with Heterogeneous Annotations (Kshirsagar, Thomson, Schneider, Carbonell, Smith, and Dyer)
- An Efficient Dynamic Oracle for Unrestricted Non-Projective Parsing (Gómez-Rodríguez and Fernández-González)
- Embedding Methods for Fine Grained Entity Type Classification (Yogatama, Gillick, and Lazic)

11:30
- Representation Based Translation Evaluation Metrics (Chen and Guo)
- A Unified Learning Framework of Skip-Grams and Global Vectors (Suzuki and Nagata)
- Semantic Interpretation of Superlative Expressions via Structured Knowledge Bases (Zhang, Feng, Huang, Xu, Han, and Zhao)
- Synthetic Word Parsing Improves Chinese Word Segmentation (Cheng, Duh, and Matsumoto)
- Sieve-Based Entity Linking for the Biomedical Domain (D'Souza and Ng)

11:45
- Exploring the Planet of the APEs: a Comparative Study of State-of-the-art Methods for MT Automatic Post-Editing (Chatterjee, Weller, Negri, and Turchi)
- Pre-training of Hidden-Unit CRFs (Kim, Stratos, and Sarikaya)
- Grounding Semantics in Olfactory Perception (Kiela, Bulat, and Clark)
- If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages (Agić, Hovy, and Søgaard)
- Open IE as an Intermediate Structure for Semantic Tasks (Stanovsky and Dagan)
Parallel Session 5
Recurrent Neural Network based Rule Sequence Model for Statistical Machine Translation
Heng Yu and Xuan Zhu 10:45-11:00
The inability to model long-distance dependencies has been handicapping SMT for years. Specifically, the context independence assumption makes it hard to capture the dependency between translation rules. In this paper, we introduce a novel recurrent neural network based rule sequence model to incorporate arbitrarily long contextual information when estimating the probabilities of rule sequences. Moreover, our model frees the translation model from keeping huge and redundant grammars, resulting in more efficient training and decoding. Experimental results show that our method achieves a 0.9 point BLEU gain over the baseline, and a significant reduction in rule table size for both phrase-based and hierarchical phrase-based systems.
Exploring the Planet of the APEs: a Comparative Study of State-of-the-art Methods for MT
Automatic Post-Editing
Rajen Chatterjee, Marion Weller, Matteo Negri, and Marco Turchi 11:45-12:00
Downstream processing of machine translation (MT) output promises to be a solution to improve translation quality, especially when the MT system's internal decoding process is not accessible. Both rule-based and statistical automatic post-editing (APE) methods have been proposed over the years, but with contrasting results.
A missing aspect in previous evaluations is the assessment of different methods (i) under comparable conditions, and (ii) on different language pairs featuring variable levels of MT quality. Focusing on statistical APE methods, which are more portable across languages, we propose the first systematic analysis of two approaches. To understand their potential, we compare them under the same conditions over six language pairs having English as source. Our results show consistent improvements on all language pairs, a relation between the extent of the gain and MT output quality, slight but statistically significant performance differences between the two methods, and their possible complementarity.
If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages
Željko Agić, Dirk Hovy, and Anders Søgaard 11:45-12:00
We present a simple method for learning part-of-speech taggers for languages like Akawaio, Aukan, or Cakchiquel, i.e., languages for which nothing but a translation of parts of the Bible exists. By aggregating over the tags from a few annotated languages and spreading them via word-alignment on the verses, we learn POS taggers for 100 languages, using the languages to bootstrap each other. We evaluate our cross-lingual models on the 25 languages where test sets exist, as well as on another 10 for which we have tag dictionaries. Our approach performs much better 20-30
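The tag-spreading step can be sketched as a majority vote over alignment links; pooling links from all source languages into one list is an illustrative simplification of the paper's aggregation:

```python
from collections import Counter

def vote_tags(n_target_tokens, links):
    """Given alignment links (target_index, source_tag) pooled over all
    annotated source verses, assign each target token its majority tag;
    unaligned tokens get None."""
    votes = [Counter() for _ in range(n_target_tokens)]
    for i, tag in links:
        votes[i][tag] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]
```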
Session 6

13:30
- Machine Comprehension with Discourse Relations (Narasimhan and Barzilay)
- Model-based Word Embeddings from Decompositions of Count Matrices (Stratos, Collins, and Hsu)
- Scalable Semantic Parsing with Partial Ontologies (Choi, Kwiatkowski, and Zettlemoyer)
- Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory (Wang, Liu, Sun, Wang, and Wang)
- A convex and feature-rich discriminative approach to dependency grammar induction (Grave and Elhadad)

13:55
- Implicit Role Linking on Chinese Discourse: Exploiting Explicit Roles and Frame-to-Frame Relations (Li, Wu, Wang, and Chai)
- Entity Hierarchy Embedding (Hu, Huang, Deng, Gao, and Xing)
- Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base (Yih, Chang, He, and Gao)
- Topic Modeling based Sentiment Analysis on Social Media for Stock Market Prediction (Nguyen and Shirai)
- Parse Imputation for Dependency Annotations (Mielens, Sun, and Baldridge)

14:20
- Discourse-sensitive Automatic Identification of Generic Expressions (Friedrich and Pinkal)
- Orthogonality of Syntax and Semantics within Distributional Spaces (Mitchell and Steedman)
- Building a Semantic Parser Overnight (Wang, Berant, and Liang)
- Learning Tag Embeddings and Tag-specific Composition Functions in Recursive Neural Network (Qian, Tian, Huang, Liu, Zhu, and Zhu)
- Probing the Linguistic Strengths and Limitations of Unsupervised Grammar Induction (Bisk and Hockenmaier)
Parallel Session 6
Implicit Role Linking on Chinese Discourse: Exploiting Explicit Roles and Frame-to-Frame
Relations
Ru Li, Juan Wu, Zhiqiang Wang, and Qinghua Chai 13:55-14:20
There is a growing interest in researching null instantiations, i.e., implicit semantic arguments. Many of these implicit arguments can be linked to referents in context, and their discovery is of great benefit to semantic processing. We address the issue of automatically identifying and resolving implicit arguments in Chinese discourse. For their resolution, we present an approach that combines information about overtly labeled arguments and frame-to-frame relations defined by FrameNet. Experimental results on our created corpus demonstrate the effectiveness of our approach.
Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge
Base
Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao 13:55-14:20
We propose a novel semantic parsing framework for question answering using a knowledge base. We define a query graph that resembles subgraphs of the knowledge base and can be directly mapped to a logical form. Semantic parsing is reduced to query graph generation, formulated as a staged search problem. Unlike traditional approaches, our method leverages the knowledge base at an early stage to prune the search space and thus simplifies the semantic matching problem. By applying an advanced entity linking system and a deep convolutional neural network model that matches questions and predicate sequences, our system outperforms previous methods substantially, and achieves an F1 measure of 52.5
Topic Modeling based Sentiment Analysis on Social Media for Stock Market Prediction
Thien Hai Nguyen and Kiyoaki Shirai 13:55-14:20
The goal of this research is to build a model to predict stock price movement using sentiment on social media. A new feature which captures topics and their sentiments simultaneously is introduced in the prediction model. In addition, a new topic model, TSLDA, is proposed to obtain this feature. Our method outperformed a model using only historical prices by about 6.07
Learning Tag Embeddings and Tag-specific Composition Functions in Recursive Neural Network
Qiao Qian, Bo Tian, Minlie Huang, Yang Liu, Xuan Zhu, and Xiaoyan Zhu 14:20-14:45
The recursive neural network is one of the most successful deep learning models for natural language processing due to the compositional nature of text. The model recursively composes the vector of a parent phrase from those of child words or phrases, with a key component named the composition function. Although a variety of composition functions have been proposed, syntactic information has not been fully encoded in the composition process. We propose two models: Tag-Guided RNN (TG-RNN), which chooses a composition function according to the part-of-speech tag of a phrase, and Tag-Embedded RNN/RNTN (TE-RNN/RNTN), which learns tag embeddings and then combines tag and word embeddings together. On fine-grained sentiment classification, experimental results show the proposed models obtain remarkable improvements: TG-RNN/TE-RNN obtain remarkable improvement over baselines, TE-RNTN obtains the second best result among all the top performing models, and all the proposed models have far fewer parameters and lower complexity than their counterparts.
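The TG-RNN idea reduces to selecting the composition matrix by the parent's POS tag; the dimensions and tanh nonlinearity below are illustrative assumptions, not the authors' exact setup:

```python
import numpy as np

def tg_compose(left, right, tag, W_by_tag, W_shared):
    """Tag-Guided composition: the parent vector is computed from the
    concatenated children with a matrix chosen by the parent's POS tag,
    falling back to a shared matrix for rare or unseen tags."""
    W = W_by_tag.get(tag, W_shared)
    return np.tanh(W @ np.concatenate([left, right]))
```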
15:15
- Entity-Centric Coreference Resolution with Model Stacking (Clark and Manning)
- Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress (Nguyen, Boyd-Graber, Resnik, and Miler)
- Efficient Inference and Structured Learning for Semantic Role Labeling (Täckström, Ganchev, and Das)
- Sparse Overcomplete Word Vector Representations (Faruqui, Tsvetkov, Yogatama, Dyer, and Smith)
- Parsing as Reduction (Fernández-González and Martins)

15:40
- Learning Anaphoricity and Antecedent Ranking Features for Coreference Resolution (Wiseman, Rush, Shieber, and Weston)
- KB-LDA: Jointly Learning a Knowledge Base of Hierarchy, Relations, and Facts (Movshovitz-Attias and Cohen)
- Compositional Semantic Parsing on Semi-Structured Tables (Pasupat and Liang)
- Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints (Liu, Jiang, Wei, Ling, and Hu)
- Optimal Shift-Reduce Constituent Parsing with Structured Perceptron (Thang, Noji, and Miyao)

16:05
- Transferring Coreference Resolvers with Posterior Regularization (Martins)
- A Computationally Efficient Algorithm for Learning Topical Collocation Models (Zhao, Du, Börschinger, Pate, Ciaramita, Steedman, and Johnson)
- Graph parsing with s-graph grammars (Groschwitz, Koller, and Teichmann)
- Adding Semantics to Data-Driven Paraphrasing (Pavlick, Bos, Nissim, Beller, Van Durme, and Callison-Burch)
- A Data-Driven, Factorization Parser for CCG Dependency Structures (Du, Sun, and Wan)
Session 7
Parallel Session 7
Poster session P2
2-2 The Users Who Say "Ni": Audience Identification in Chinese-language Restaurant Reviews
Rob Voigt and Dan Jurafsky
We give an algorithm for disambiguating generic versus referential uses of second-person pronouns in restaurant reviews in Chinese. Reviews in this domain use the "you" pronoun either generically or to refer to shopkeepers, readers, or for self-reference in reported conversation. We first show that linguistic features of the local context drawn from prior literature help in disambiguation. We then show that document-level features (n-grams and document-level embeddings), not previously used in the referentiality literature, actually give the largest gain in performance, and suggest this is because pronouns in this domain exhibit one-sense-per-discourse. Our work highlights an important case of discourse effects on pronoun use, and may suggest practical implications for audience extraction and other sentiment tasks in online reviews.
2-3 Chinese Zero Pronoun Resolution: A Joint Unsupervised Discourse-Aware Model Rivaling
State-of-the-Art Resolvers
Chen Chen and Vincent Ng
We propose an unsupervised probabilistic model for zero pronoun resolution. To our knowledge, this is the first such model that (1) is trained on zero pronouns in an unsupervised manner; (2) jointly identifies and resolves anaphoric zero pronouns; and (3) exploits discourse information provided by a salience model. Experiments demonstrate that our unsupervised model significantly outperforms its state-of-the-art unsupervised counterpart when resolving Chinese zero pronouns in the OntoNotes corpus.
2-5 Retrieval of Research-level Mathematical Information Needs: A Test Collection and Technical Terminology Experiment
Yiannos Stathopoulos and Simone Teufel
In this paper, we present a test collection for mathematical information retrieval composed of real-life, research-
level mathematical information needs. Topics and relevance judgements have been procured from the on-line
collaboration website MathOverflow by delegating domain-specific decisions to experts on-line. With our
test collection, we construct a baseline using Lucene's vector-space model implementation and conduct an experiment to investigate how prior extraction of technical terms from mathematical text can affect retrieval efficiency. We show that by boosting the importance of technical terms, statistically significant improvements in retrieval performance can be obtained over the baseline.
2-8 Semantic Clustering and Convolutional Neural Network for Short Text Categorization
Peng Wang, Jiaming Xu, Bo Xu, Chenglin Liu, Heng Zhang, Fangyuan Wang, and Hongwei Hao
Short texts usually suffer from data sparsity and ambiguity in their representations owing to their lack of
context. In this paper, we propose a novel method to model short texts based on word embedding clustering and
convolutional neural network. Particularly, we first discover semantic cliques in embedding spaces by fast
clustering. Then, multi-scale semantic units are detected under the supervision of semantic cliques, which
introduce useful external knowledge for short texts. These meaningful semantic units are combined and fed
into the convolutional layer, followed by a max-pooling operation. Experimental results on two open benchmarks
validate the effectiveness of the proposed method.
2-10 Event Detection and Domain Adaptation with Convolutional Neural Networks
Thien Huu Nguyen and Ralph Grishman
We study the event detection problem using convolutional neural networks (CNNs), which overcome the two
fundamental limitations of traditional feature-based approaches to this task: complicated feature engineering
Tuesday, July 28, 2015
for rich feature sets and error propagation from the preceding stages which generate these features. The exper-
imental results show that the CNNs outperform the best reported feature-based systems in the general setting
as well as the domain adaptation setting without resorting to extensive external resources.
2-11 Seed-Based Event Trigger Labeling: How far can event descriptions get us?
Ofer Bronstein, Ido Dagan, Qi Li, Heng Ji, and Anette Frank
The task of event trigger labeling is typically addressed in the standard supervised setting: triggers for each
target event type are annotated as training data, based on annotation guidelines. We propose an alternative
approach, which takes the example trigger terms mentioned in the guidelines as seeds, and then applies an
event-independent similarity-based classifier for trigger labeling. This way we can skip manual annotation for
new event types, while requiring only minimal annotated training data for a few example events at system setup.
Our method is evaluated on the ACE-2005 dataset, achieving 5.7
2-15 Robust Multi-Relational Clustering via ℓ1-Norm Symmetric Nonnegative Matrix
Factorization
Kai Liu and Hua Wang
In this paper, we propose an ℓ1-norm Symmetric Nonnegative Matrix Tri-Factorization (ℓ1 S-NMTF) framework
to cluster multi-type relational data by utilizing their interrelatedness. By introducing ℓ1-norm distances
in our new objective function, the proposed approach is robust against noise and outliers, which are inherent in
multi-relational data. We also derive the solution algorithm and rigorously analyze its correctness and
convergence. The promising experimental results of the algorithm applied to text clustering on the IMDB dataset validate
the proposed approach.
Poster session P2.05
The entire process is domain-independent and demands no prior annotation samples or rules specific to an
annotation.
2-17 FrameNet+: Fast Paraphrastic Tripling of FrameNet
Ellie Pavlick, Travis Wolfe, Pushpendre Rastogi, Chris Callison-Burch, Mark Dredze, and Benjamin
Van Durme
We increase the lexical coverage of FrameNet through automatic paraphrasing. We use crowdsourcing to
manually filter out bad paraphrases in order to ensure a high-precision resource. Our expanded FrameNet
contains an additional 22K lexical units, a 3-fold increase over the current FrameNet, and achieves 40
2-18 IWNLP: Inverse Wiktionary for Natural Language Processing
Matthias Liebeck and Stefan Conrad
Many natural language processing pipelines today are based on training data created by a few experts. This
paper examines how the proliferation of the Internet and its collaborative application
possibilities can be practically used for NLP. For that purpose, we examine how the German version of Wiktionary
can be used for a lemmatization task. We introduce IWNLP, an open-source parser for Wiktionary that
reimplements several MediaWiki markup language templates for conjugated verbs and declined adjectives. The
lemmatization task is evaluated on three German corpora, on which we compare our results with existing
lemmatization software. With Wiktionary as a resource, we obtain a high accuracy for the lemmatization of
nouns and can even improve on the results of existing software.
2-23 deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Tar-
gets
Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, Michael Auli, Chris Quirk, Margaret
Mitchell, Jianfeng Gao, and Bill Dolan
We introduce Discriminative BLEU (deltaBLEU), a novel metric for intrinsic evaluation of generated text in
tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on
a scale of [−1, +1] to weight multi-reference BLEU. In tasks involving generation of conversational responses,
deltaBLEU correlates reasonably with human judgments and outperforms sentence-level and IBM BLEU in
terms of both Spearman's rho and Kendall's tau.
2-24 Tibetan Unknown Word Identification from News Corpora for Supporting Lexicon-based
Tibetan Word Segmentation
Minghua Nuo, Huidan Liu, Congjun Long, and Jian Wu
In Tibetan, words are written consecutively without delimiters, so finding unknown word boundaries is difficult.
This paper presents a hybrid approach for Tibetan unknown word identification for offline corpus processing.
Firstly, Tibetan named entities are preprocessed based on natural annotation. Secondly, other Tibetan unknown
words are extracted from word segmentation fragments using MTC, the combination of a statistical metric
and a set of context sensitive rules. In addition, the preliminary experimental results on Tibetan News Corpus
are reported. The lexicon-based Tibetan word segmentation system SegT with the proposed unknown word
extension mechanism is indeed helpful in improving segmentation performance, increasing the F-score of
Tibetan word segmentation by 4.15%.
2-27 Early and Late Combinations of Criteria for Reranking Distributional Thesauri
Olivier Ferret
In this article, we first propose to exploit a new criterion for improving distributional thesauri. Following a
bootstrapping perspective, we select relations between the terms of similar nominal compounds for building
in an unsupervised way the training set of a classifier performing the reranking of a thesaurus. Then, we
evaluate several ways to combine thesauri reranked according to different criteria and show that exploiting the
complementary information brought by these criteria leads to significant improvements.
Poster session P2.08
It has been extensively observed that languages minimise the distance between two related words. Dependency
length minimisation effects are explained as a means to reduce memory load and for effective communication.
In this paper, we ask whether they hold in typically short spans, such as noun phrases, which could be thought
of as being less subject to efficiency pressure. We demonstrate that minimisation does occur in short spans, but
also that it is a complex effect: it is not only the length of the dependency that is at stake, but also the effect of
the surrounding dependencies.
Field EGTRF to model non-linear dependencies between words. Specifically, we parse sentences into depen-
dency trees and represent them as a graph, and assume the topic assignment of a word is influenced by its
adjacent words and distance-2 words. Word similarity information learned from a large corpus is incorporated to
enhance word topic assignment. Parameters are estimated efficiently by variational inference and experimental
results on two datasets show EGTRF achieves lower perplexity and higher log predictive probability.
2-36 Learning Hidden Markov Models with Distributed State Representations for Domain
Adaptation
Min Xiao and Yuhong Guo
Recently, a variety of representation learning approaches have been developed in the literature to induce latent
generalizable features across two domains. In this paper, we extend the standard hidden Markov models (HMMs)
to learn distributed state representations to improve cross-domain prediction performance. We reformulate the
HMMs by mapping each discrete hidden state to a distributed representation vector and employ an expectation-
maximization algorithm to jointly learn distributed state representations and model parameters. We empirically
investigate the proposed model on cross-domain part-of-speech tagging and noun-phrase chunking tasks. The
experimental results demonstrate the effectiveness of the distributed HMMs on facilitating domain adaptation.
Poster session P2.09
We propose a novel method for translation selection in statistical machine translation, in which a convolutional
neural network is employed to judge the similarity between a phrase pair in two languages. The specifically
designed convolutional architecture encodes not only the semantic similarity of the translation pair, but also
the context containing the phrase in the source language. Therefore, our approach is able to capture context-
dependent semantic similarities of translation pairs. We adopt a curriculum learning strategy to train the model:
we classify the training examples into easy, medium, and difficult categories, and gradually build the ability
of representing phrases and sentence-level contexts by using training examples from easy to difficult. Ex-
perimental results show that our approach significantly outperforms the baseline system by up to 1.4 BLEU
points.
2-39 Learning Word Reorderings for Hierarchical Phrase-based Statistical Machine Transla-
tion
Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, and Hai Zhao
Statistical models for reordering source words have been used to enhance hierarchical phrase-based statistical
machine translation systems. Existing word reordering models learn the reordering for any two source words
in a sentence or only for two consecutive words. This paper proposes a series of separate sub-models to learn
reorderings for word pairs with different distances. Our experiments demonstrate that reordering sub-models
for word pairs with distances less than a specific threshold are useful to improve translation quality. Com-
pared with previous work, our method may more effectively and efficiently exploit helpful word reordering
information.
2-40 UNRAVEL: A Decipherment Toolkit
Malte Nuhn, Julian Schamper, and Hermann Ney
In this paper we present the UNRAVEL toolkit. It implements many of the recently published works on
decipherment, including decipherment of deterministic ciphers such as the ZODIAC-408 cipher and part two
of the BEALE ciphers, as well as decipherment of probabilistic ciphers and unsupervised training for machine
translation. It also includes data and example configuration files so that the previously published experiments
are easy to reproduce.
2-41 Multi-Pass Decoding With Complex Feature Guidance for Statistical Machine Translation
Benjamin Marie and Aurélien Max
In Statistical Machine Translation, some complex features are still difficult to integrate during decoding and
usually used through the reranking of the k-best hypotheses produced by the decoder. We propose a translation
table partitioning method that exploits the result of this reranking to iteratively guide the decoder in order to
produce a new k-best list more relevant to some complex features. We report experiments on two translation
domains and two translation directions, which yield improvements of up to 1.4 BLEU over the reranking
baseline using the same set of complex features. From a practical viewpoint, our approach allows SMT system
developers to easily integrate complex features into decoding rather than being limited to their use in one-time
k-best list reranking.
2-42 What's in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Trans-
lation
Marlies Van der Wees, Arianna Bisazza, Wouter Weerkamp, and Christof Monz
Domain adaptation is an active field of research in statistical machine translation (SMT), but so far most work has
ignored the distinction between the topic and genre of documents. In this paper we quantify and disentangle
the impact of genre and topic differences on translation quality by introducing a new data set that has controlled
topic and genre distributions. In addition, we perform a detailed analysis showing that differences across topics
only explain to a limited degree translation performance differences across genres, and that genre-specific errors
are more attributable to model coverage than to suboptimal scoring of translation candidates.
2-45 BrailleSUM: A News Summarization System for the Blind and Visually Impaired People
Xiaojun Wan and Yue Hu
In this article, we discuss the challenges of document summarization for blind and visually impaired people
and propose a new system called BrailleSUM to produce better summaries for them. Our system incorporates
the braille length of each sentence in news articles into the ILP-based summarization method. Evaluation
results on a DUC dataset show that BrailleSUM produces shorter braille summaries than existing methods
while not sacrificing the content quality of the summaries.
2-46 Automatic Identification of Age-Appropriate Ratings of Song Lyrics
Anggi Maulidyani and Ruli Manurung
This paper presents a novel task, namely the automatic identification of age-appropriate ratings of a musical
track, or album, based on its lyrics. Details are provided regarding the construction of a dataset of lyrics from
12,242 tracks across 1,798 albums along with age-appropriate ratings obtained from various web resources,
together with results from various text classification experiments. The best accuracy achieved was 71.02%.
Poster session P2.11
We present and evaluate a method for automatically detecting sentence fragments in English texts written
by non-native speakers. Our method combines syntactic parse tree patterns and part-of-speech information
produced by a tagger to detect this phenomenon. When evaluated on a corpus of authentic learner texts,
our best model achieved a precision of 0.84 and a recall of 0.62, a statistically significant improvement over
baselines using non-parse features, as well as a popular grammar checker.
2-54 Twitter User Geolocation Using a Unified Text and Network Prediction Model
Afshin Rahimi, Trevor Cohn, and Timothy Baldwin
We propose a label propagation approach to geolocation prediction based on Modified Adsorption, with two
enhancements: (1) the removal of celebrity nodes to increase location homophily and boost tractability; and
(2) the incorporation of text-based geolocation priors for test users. Experiments over three Twitter benchmark
datasets achieve state-of-the-art results, and demonstrate the effectiveness of the enhancements.
2-55 Automatic Keyword Extraction on Twitter
Luis Marujo, Wang Ling, Isabel Trancoso, Chris Dyer, Alan W Black, Anatole Gershman, David
Martins De Matos, João Neto, and Jaime Carbonell
In this paper, we build a corpus of tweets from Twitter annotated with keywords using crowdsourcing methods.
We identify key differences between this domain and other domains previously studied, such as news, which
prevent existing approaches for automatic keyword extraction from generalizing well to Twitter datasets.
These differences include the small amount of content in each tweet, the frequent use of lexical variants, and
the high variance in the number of keywords present in each tweet. We propose methods for addressing these
issues, which leads to solid improvements on this dataset for this task.
2-58 Lexical Comparison Between Wikipedia and Twitter Corpora by Using Word Embeddings
Luchen Tan, Haotian Zhang, Charles Clarke, and Mark Smucker
Compared with carefully edited prose, the language of social media is informal in the extreme. The application
of NLP techniques in this context may require a better understanding of word usage within social media. In this
paper, we compute a word embedding for a corpus of tweets, comparing it to a word embedding for Wikipedia.
After learning a transformation of one vector space to the other, and adjusting similarity values according to
term frequency, we identify words whose usage differs greatly between the two corpora. For any given word,
the set of words closest to it in a particular embedding provides a characterization of that word's usage within
the corresponding corpus.
Poster session P2.13
We study the problem of predicting tense in Chinese conversations. The unique challenges include: (1) Chinese
verbs do not have explicit lexical or grammatical forms to indicate tense; (2) tense information is often implicitly
hidden outside of the target sentence. To tackle these challenges, we first propose a set of novel sentence-level
local features using rich linguistic resources and then propose a new hypothesis of "one tense per scene" to
incorporate scene-level global evidence to enhance the performance. Experimental results demonstrate the
power of this hybrid approach, which can serve as a new and promising benchmark.
2-66 A Long Short-Term Memory Model for Answer Sentence Selection in Question Answering
Di Wang and Eric Nyberg
In this paper, we present an approach that addresses the answer sentence selection problem for question
answering. The proposed method uses a stacked bidirectional Long Short-Term Memory (BLSTM) network to
sequentially read words from question and answer sentences, and then outputs their relevance scores. Unlike
prior work, this approach does not require any syntactic parsing or external knowledge resources such as Word-
Net which may not be available in some domains or languages. The full system is based on a combination of the
stacked BLSTM relevance model and keywords matching. The results of our experiments on a public bench-
mark dataset from TREC show that our system outperforms previous work which requires syntactic features
and external knowledge resources.
2-67 Answer Sequence Learning with Neural Networks for Answer Selection in Community
Question Answering
Xiaoqiang Zhou, Baotian Hu, Qingcai Chen, Buzhou Tang, and Xiaolong Wang
In this paper, the answer selection problem in community question answering (CQA) is regarded as an answer
sequence labeling task, and a novel approach based on a recurrent architecture is proposed for this problem.
Our approach first applies convolutional neural networks (CNNs) to learn the joint representation of each
question-answer pair, and then uses the joint representations as input to a long short-term memory (LSTM)
network to learn the answer sequence of a question, labeling the matching quality of each answer. Experiments
conducted on the SemEval 2015 CQA dataset show the effectiveness of our approach.
2-69 How Well Do Distributional Models Capture Different Types of Semantic Knowledge?
Dana Rubinstein, Effi Levi, Roy Schwartz, and Ari Rappoport
In recent years, distributional models (DMs) have shown great success in representing lexical semantics. In this
work we show that the extent to which DMs represent semantic knowledge is highly dependent on the type of
knowledge. We pose the task of predicting properties of concrete nouns in a supervised setting, and compare
between learning taxonomic properties (e.g., animacy) and attributive properties (e.g., size, color). We employ
four state-of-the-art DMs as sources of feature representation for this task, and show that they all yield poor
results when tested on attributive properties, achieving no more than an average F-score of 0.37 in the binary
property prediction task, compared to 0.73 on taxonomic properties. Our results suggest that the distributional
hypothesis may not be equally applicable to all types of semantic information.
Poster session P2.15
Several compositional distributional semantic methods use tensors to model multi-way interactions between
vectors. Unfortunately, the size of the tensors can make their use impractical in large-scale implementations. In
this paper, we investigate whether we can match the performance of full tensors with low-rank approximations
that use a fraction of the original number of parameters. We investigate the effect of low-rank tensors on the
transitive verb construction where the verb is a third-order tensor. The results show that, while the low-rank
tensors require about two orders of magnitude fewer parameters per verb, they achieve performance comparable
to, and occasionally surpassing, the unconstrained-rank tensors on sentence similarity and verb disambiguation
tasks.
2-71 Constrained Semantic Forests for Improved Discriminative Semantic Parsing
Wei Lu
In this paper, we present a model for improved discriminative semantic parsing. The model addresses an
important limitation associated with our previous state-of-the-art discriminative semantic parsing model, the
relaxed hybrid tree model, by introducing our constrained semantic forests. We show that our model is able to
yield new state-of-the-art results on standard datasets even with simpler features. Our system is available for
download from http://statnlp.org/research/sp/.
2-75 Emotion Detection in Code-switching Texts via Bilingual and Sentimental Information
Zhongqing Wang, Sophia Lee, Shoushan Li, and Guodong Zhou
Code-switching is commonly used in the free-form text environment, such as social media, and it is especially
favored in emotion expressions. Emotions in code-switching texts differ from monolingual texts in that they
can be expressed in either monolingual or bilingual forms. In this paper, we first utilize two kinds of knowledge,
i.e., bilingual and sentimental information, to bridge the gap between different languages. Moreover, we use a
term-document bipartite graph to incorporate both bilingual and sentimental information, and propose a label
propagation based approach to learn and predict in the bipartite graph. Empirical studies demonstrate the
effectiveness of our proposed approach in detecting emotion in code-switching texts.
2-77 Linguistic Template Extraction for Recognizing Reader-Emotion and Emotional Reso-
nance Writing Assistance
Yung-Chun Chang, Cen-Chieh Chen, Yu-lun Hsieh, Chien Chin Chen, and Wen-Lian Hsu
In this paper, we propose a flexible principle-based approach (PBA) for reader-emotion classification and writing
assistance. PBA is a highly automated process that learns emotion templates from raw texts to characterize an
emotion and is comprehensible for humans. These templates are adopted to predict reader-emotion, and may
further assist in emotional resonance writing. Experimental results demonstrate that PBA can effectively detect
reader-emotions by exploiting the syntactic structures and semantic associations in the context, thus outper-
forming well-known statistical text classification methods and the state-of-the-art reader-emotion classification
method. Moreover, writers are able to create more emotional resonance in articles under the assistance of the
generated emotion templates. These templates have been proven to be highly interpretable, which is an attribute
that is difficult to accomplish in traditional statistical methods.
Poster session P2.17
Dialog state tracking is a key component of many modern dialog systems, most of which are designed with a
single, well-defined domain in mind. This paper shows that dialog data drawn from different dialog domains
can be used to train a general belief tracking model which can operate across all of these domains, exhibiting
superior performance to each of the domain-specific models. We propose a training procedure which uses
out-of-domain data to initialise belief tracking models for entirely new domains. This procedure leads to im-
provements in belief tracking performance regardless of the amount of in-domain data available for training the
model.
2-81 Dialogue Management based on Sentence Clustering
Wendong Ge and Bo Xu
Dialogue Management (DM) is a key issue in Spoken Dialogue Systems (SDS). Most existing studies on DM
use Dialogue Acts (DAs) to represent the semantic information of a sentence, which sometimes fails to capture
nuanced meaning. In this paper, we model DM based on sentence clusters, which have more powerful semantic
representation ability than DAs. Firstly, sentences are clustered not only based on the internal information
such as words and sentence structures, but also based on the external information such as context in dialogue
via Recurrent Neural Networks. Additionally, the DM problem is modeled as a Partially Observable Markov
Decision Process (POMDP) with sentence clusters. Finally, experimental results illustrate that the proposed
DM scheme is superior to the existing one.
2-84 A Simultaneous Recognition Framework for the Spoken Language Understanding Module
of Intelligent Personal Assistant Software on Smart Phones
Changsu Lee, Youngjoong Ko, and Jungyun Seo
Intelligent personal assistant software such as Apple's Siri and Samsung's S-Voice has drawn much attention
these days. This paper introduces a novel Spoken Language Understanding (SLU) module to predict users'
intentions for determining system actions of the intelligent personal assistant software. An SLU module usually
consists of several connected recognition tasks in a pipeline framework, whereas the proposed SLU module
simultaneously performs four recognition tasks in a single framework using Conditional Random Fields
(CRFs). The four tasks include named entity, speech-act, target, and operation recognition. In the experiments, the
new simultaneous recognition method achieves the higher performance of 4
2-85 A Deeper Exploration of the Standard PB-SMT Approach to Text Simplification and its
Evaluation
Sanja Štajner, Hannah Bechara, and Horacio Saggion
In the last few years, a growing number of studies have addressed the Text Simplification (TS) task as
a monolingual machine translation (MT) problem which translates from "original" to "simple" language. Motivated
by those results, we investigate the influence of quality vs quantity of the training data on the effectiveness of
such a MT approach to text simplification. We conduct 40 experiments on the aligned sentences from English
Wikipedia and Simple English Wikipedia, controlling for: (1) the similarity between the original and simplified
sentences in the training and development datasets, and (2) the sizes of those datasets. The results suggest that
in the standard PB-SMT approach to text simplification the quality of the datasets has a greater impact on the
system performance. Additionally, we point out several important differences between cross-lingual MT and
monolingual MT as used in text simplification, and show that BLEU is not a good measure of system performance
in the text simplification task.
2-87 A Methodology for Evaluating Timeline Generation Algorithms based on Deep Semantic
Units
Sandro Bauer and Simone Teufel
Timeline generation is a summarisation task which transforms a narrative, roughly chronological input text into
a set of timestamped summary sentences, each expressing an atomic historical event. We present a methodology
for evaluating systems which create such timelines, based on a novel corpus consisting of 36 human-created
timelines. Our evaluation relies on deep semantic units which we call historical content units. An advantage of
our approach is that it does not require human annotation of new system summaries.
2-88 Unsupervised extractive summarization via coverage maximization with syntactic and se-
mantic concepts
Natalie Schluter and Anders Søgaard
Coverage maximization with bigram concepts is a state-of-the-art approach to unsupervised extractive summa-
rization. It has been argued that such concepts are adequate and, in contrast to more linguistic concepts such
as named entities or syntactic dependencies, more robust, since they do not rely on automatic processing. In
this paper, we show that while this seems to be the case for a commonly used newswire dataset, syntactic and
semantic concepts lead to significant improvements in performance in other domains.
2-90 Semantic Structure Analysis of Noun Phrases using Abstract Meaning Representation
Yuichiro Sawai, Hiroyuki Shindo, and Yuji Matsumoto
We propose a method for semantic structure analysis of noun phrases using Abstract Meaning Representation
(AMR). AMR is a graph representation of the meaning of a sentence, in which noun phrases (NPs) are manually
annotated with internal structure and semantic relations. We extract NPs from the AMR corpus and construct a
data set of NP semantic structures. We also propose a transition-based algorithm which jointly identifies both
the nodes in a semantic structure tree and semantic relations between them. Compared to the baseline, our
method improves the performance of NP semantic structure analysis by 2.7 points, while further incorporating
an external dictionary boosts the performance by 7.1 points.
2-91 Boosting Transition-based AMR Parsing with Refined Actions and Auxiliary Analyzers
Chuan Wang, Nianwen Xue, and Sameer Pradhan
We report improved AMR parsing results by adding a new action to a transition-based AMR parser to infer
abstract concepts and by incorporating richer features produced by auxiliary analyzers such as a semantic role
labeler and a coreference resolver. We report final AMR parsing results that show an improvement of 7%.
Social Event
Plenary Hall B
The ACL 2015 Social Event will be held immediately following the Tuesday Poster Session and
dinner in the China National Convention Center (CNCC). Here you will enjoy desserts, coffee, tea, and
wine. Bring your boots and hats and enjoy dancing. Enjoy networking with colleagues and have a
relaxing evening!
We hope to make your conference experience not only enlightening but also entertaining and enjoy-
able!
5 Main Conference: Wednesday, July 29
Overview
Wednesday, July 29, 2015
Plenary Hall B
Abstract:
Real-world data are unstructured but interconnected. The majority of such data is in the form of
natural language text. One of the grand challenges is to turn such massive data into actionable knowl-
edge. In this talk, we present our vision on how to turn massive unstructured, text-rich, but intercon-
nected data into knowledge. We propose a data-to-network-to-knowledge (i.e., D2N2K) paradigm,
which is to first turn data into relatively structured heterogeneous information networks, and then
mine such text-rich and structure-rich heterogeneous networks to generate useful knowledge. We
show why such a paradigm represents a promising direction and present some recent progress on the
development of effective methods for construction and mining of structured heterogeneous informa-
tion networks from text data.
Biography: Jiawei Han is Abel Bliss Professor in the Department of Computer Science, University
of Illinois at Urbana-Champaign. He has been researching data mining, information network
analysis, database systems, and data warehousing, with over 600 journal and conference publica-
tions. He has chaired or served on many program committees of international conferences, including
PC co-chair for KDD, SDM, and ICDM conferences, and Americas Coordinator for VLDB confer-
ences. He also served as the founding Editor-In-Chief of ACM Transactions on Knowledge Dis-
covery from Data and is serving as the Director of Information Network Academic Research Center
supported by U.S. Army Research Lab, and Director of KnowEnG, an NIH funded Center of Excel-
lence in Big Data Computing. He is a Fellow of ACM and Fellow of IEEE, and received 2004 ACM
SIGKDD Innovations Award, 2005 IEEE Computer Society Technical Achievement Award, 2009
IEEE Computer Society Wallace McDowell Award, and 2011 Daniel C. Drucker Eminent Faculty
Award at UIUC. His co-authored book "Data Mining: Concepts and Techniques" has been adopted
as a textbook popularly worldwide.
Parallel Session 8
A Frame of Mind: Using Statistical Models for Detection of Framing and Agenda Setting Campaigns
Oren Tsur, Dan Calacci, and David Lazer 10:55–11:20
Framing is a sophisticated form of discourse in which the speaker tries to induce a cognitive bias through
consistent linkage between a topic and a specific context (frame). We build on political science and communi-
cation theory and use probabilistic topic models combined with time-series regression analysis (autoregressive
distributed-lag models) to gain insight into the language dynamics of the political process. Processing four
years of public statements issued by members of the U.S. Congress, our results provide a glimpse into the
complex dynamic processes of framing, attention shifts, and agenda setting, commonly known as "spin". We
further provide new evidence for the divergence in party discipline in U.S. politics.
ACL Business Meeting
14:35
…ing for cross-lingual NLP
Søgaard, Agić, Alonso, Plank, Bohnet, and Johannsen
Linear-Time Chinese Word Segmentation via Embedding Matching
Ma and Hinrichs
Unsupervised Method for Uncovering Morphological Chains
Narasimhan, Barzilay, and Jaakkola
…the user occupational class through Twitter content
Preotiuc-Pietro, Lampos, and Aletras
…and Part-of-Speech with Pitman-Yor Hidden Semi-Markov Models
Uchiumi, Tsukahara, and Mochihashi

15:00
Multi-Task Learning for Multiple Language Translation
Dong, Wu, He, Yu, and Wang
Gated Recursive Neural Network for Chinese Word Segmentation
Chen, Qiu, Zhu, and Huang
[TACL] Modeling Word Forms Using Latent Underlying Morphs and Phonology
Cotterell, Peng, and Eisner
Tracking unbounded Topic Streams
Wurzer, Lavrenko, and Osborne
Coupled Sequence Labeling on Heterogeneous Annotations: POS Tagging as a Case Study
Li, Chao, Zhang, and Chen
Parallel Session 9
[TACL] Modeling Word Forms Using Latent Underlying Morphs and Phonology
Ryan Cotterell, Nanyun Peng, and Jason Eisner 15:00–15:25
The observed pronunciations or spellings of words are often explained as arising from the "underlying forms"
of their morphemes. These forms are latent strings that linguists try to reconstruct by hand. We propose to
reconstruct them automatically at scale, enabling generalization to new words. Given some surface word types
of a concatenative language along with the abstract morpheme sequences that they express, we show how to
recover consistent underlying forms for these morphemes, together with the (stochastic) phonology that maps
each concatenation of underlying forms to a surface form. Our technique involves loopy belief propagation in a
natural directed graphical model whose variables are unknown strings and whose conditional distributions are
encoded as finite-state machines with trainable weights. We define training and evaluation paradigms for the
task of surface word prediction, and report results on subsets of 6 languages.
CoNLL
6
Conference Program
Co-located Conference: CoNLL
One Million Sense-Tagged Instances for Word Sense Disambiguation and Induction
Kaveh Taghipour and Hwee Tou Ng
Feature Selection for Short Text Classification using Wavelet Packet Transform
Anuj Mahajan, Sharmistha Jat and Shourya Roy
Thursday–Friday, July 30–31, 2015
The UniTN Discourse Parser in CoNLL 2015 Shared Task: Token-level Sequence
Labeling with Argument-specific Models
Evgeny Stepanov, Giuseppe Riccardi and Ali Orkan Bayer
Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The
Impact of Word Representations on Sequence Labelling Tasks
Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider and
Timothy Baldwin
Opinion Holder and Target Extraction based on the Induction of Verbal Categories
Michael Wiegand and Josef Ruppenhofer
Does the success of deep neural network language processing mean (finally!) the
end of theoretical linguistics?
Paul Smolensky, Johns Hopkins University
AIDA2: A Hybrid Approach for Token and Sentence Level Dialect Identification in
Arabic
Mohamed Al-Badrashiny, Heba Elfardy and Mona Diab
16:00-17:30 Session 8.a: Joint Poster Presentation (long, short and shared task papers)
Long Papers
A Joint Framework for Coreference Resolution and Mention Head Detection
Haoruo Peng, Kai-Wei Chang and Dan Roth
Opinion Holder and Target Extraction based on the Induction of Verbal Categories
Michael Wiegand and Josef Ruppenhofer
Short Papers
Deep Neural Language Models for Machine Translation
Thang Luong, Michael Kayser and Christopher D. Manning
One Million Sense-Tagged Instances for Word Sense Disambiguation and Induction
Kaveh Taghipour and Hwee Tou Ng
Feature Selection for Short Text Classification using Wavelet Packet Transform
Anuj Mahajan, Sharmistha Jat and Shourya Roy
The DCU Discourse Parser for Connective, Argument Identification and Explicit
Sense Classification
Longyue Wang, Chris Hokamp, Tsuyoshi Okita, Xiaojun Zhang and Qun Liu
Workshops
7
Thursday–Friday
401 Eighth SIGHAN Workshop on Chinese Language Processing p.160
Thursday
301A Arabic Natural Language Processing p.163
301B Grammar Engineering Across Frameworks (GEAF 2015) p.165
302A Eighth Workshop on Building and Using Comparable Corpora p.166
302B Semantics-Driven Statistical Machine Translation: Theory and Practice p.168
303A Novel Computational Approaches to Keyphrase Extraction p.169
303B SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities p.170
305 BioNLP 2015 p.172
Friday
301A The Fifth Named Entities Workshop p.174
301B Third Workshop on Continuous Vector Space Models and their Compositionality p.176
302A Fourth Workshop on Hybrid Approaches to Translation p.177
302B Fourth Workshop on Linked Data in Linguistics: Resources and Applications p.179
303A Workshop on Noisy User-generated Text p.180
303B The 2nd Workshop on Natural Language Processing Techniques for Educational Applications p.183
305 The First Workshop on Computing News Storylines p.185
Two-day Workshops
Thursday–Friday, July 30–31, 2015
One-day Workshops
Thursday, July 30, 2015
Organizers: Emily M. Bender, Lori Levin, Stefan Müller, Yannick Parmentier, and Aarne Ranta
Venue: 301B
Organizers: Deyi Xiong, Kevin Duh, Christian Hardmeier, and Roberto Navigli
Venue: 302B
Organizers: Sujatha Das Gollapalli, Cornelia Caragea, C. Lee Giles, and Xiaoli Li
Venue: 303A
Organizers: Kevin Bretonnel Cohen, Dina Demner-Fushman, Sophia Ananiadou, and Jun'ichi Tsujii
Venue: 305
Friday, July 31, 2015
16:50–17:00 Closing
Organizers: Alexandre Allauzen, Edward Grefenstette, Karl Moritz Hermann, Hugo Larochelle, and
Scott Wen-tau Yih
Venue: 301B
Organizers: Bogdan Babych, Kurt Eberle, Marta R. Costa-jussà, Rafael E. Banchs, Patrik Lambert,
and Reinhard Rapp
Venue: 302A
18:00–18:15 Towards a shared task for shallow semantics-based translation (in an industrial
setting)
Kurt Eberle
18:15–18:20 Conclusions
Organizers: Christian Chiarcos, John Philip McCrae, Petya Osenova, Philipp Cimiano, and
Nancy Ide
Venue: 302B
15:00–15:15 Multimedia Lab @ ACL WNUT NER Shared Task: Named Entity Recognition
for Twitter Microposts using Distributed Word Representations
Frederic Godin, Baptist Vandersmissen, Wesley De Neve, and Rik Van de Walle
15:15–15:30 NCSU_SAS_SAM: Deep Encoding and Reconstruction for Normalization of
Noisy Text
Samuel Leeman-Munk, James Lester, and James Cox
15:30–16:00 Coffee Break
16:00–17:30 Invited Talks
16:00–16:45 Automated Grammatical Error Correction for Language Learners: Where are
we, and where do we go from there?
Joel Tetreault
16:45–17:30 Are Minority Dialects "Noisy Text"?: Implications of Social and Linguistic
Diversity for Social Media NLP
Brendan O'Connor
Organizers: Hsin-Hsi Chen, Yuen-Hsien Tseng, Yuji Matsumoto, and Lung Hsiang Wong
Venue: 303B
Organizers: Tommaso Caselli, Marieke van Erp, Anne-Lyse Minard, Mark Finlayson, Ben Miller,
Jordi Atserias, Alexandra Balahur, and Piek Vossen
Venue: 305
Anti-harassment policy
8
ACL has always prided itself on being a venue that allows for the open exchange of ideas and the
freedom of thought and expression. In keeping with these beliefs, to codify them, and to ensure that
ACL becomes immediately aware of any deviation from these principles, ACL has instituted an
anti-harassment policy in coordination with NAACL.
Harassment and hostile behavior are unwelcome at any ACL conference, including speech or
behavior that intimidates, creates discomfort, or interferes with a person's participation or
opportunity for participation in the conference. We aim for ACL conferences to be environments
where harassment in any form does not happen, including but not limited to harassment based on
race, gender, religion, age, color, national origin, ancestry, disability, sexual orientation, or gender
identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking,
harassing photography or recording, inappropriate physical contact, and unwelcome sexual
attention.
If you are a victim of harassment or hostile behavior at an ACL conference, or otherwise observe
such behavior toward someone else, please contact any of the following people:
Please be assured that if you approach us, your concerns will be kept in strict confidence, and we
will consult with you on the actions taken by the Board.
Index
Author Index
Local Guide
9
ACL-IJCNLP 2015, the joint conference of the Association for Computational Linguistics and the
Asian Federation of Natural Language Processing, will be held in Beijing, China this year. Here are
some things that I think might be useful or enjoyable for visiting computational linguists, natural
language processing people, and the like.
Currency The official name of the currency of China is Renminbi (RMB). It is
denominated in Yuan (元), colloquially called Kuai (块). Foreign currency can be exchanged for
RMB at airports, banks, and hotels. Major credit cards are honored at most hotels. Banks are
usually open from 9:00 to 17:00 on working days.
Smoking Smoking is not allowed in the conference venues, nor in any public indoor
establishments such as restaurants and bars.
Weather Summer clothes such as shorts and dresses are enough for this time of year in Beijing.
The sunlight may be very strong in the afternoon, so do bring some sunscreen or lotion
if you are going outdoors. You may also pack a raincoat or umbrella for any
sudden rain during travel.
Beijing Average Climate by Month
            Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
High (°C)   1.1   3.9   11.1  19.4  26.1  30.0  30.6  29.4  25.6  18.9  10.0  2.8
Low (°C)    -9.4  -7.2  -1.1  7.2   12.8  17.8  21.1  20.0  13.9  7.2   -0.6  -7.2
Precip (in) 0.1   0.2   0.4   1     1.1   2.8   6.9   7.2   1.9   0.7   0.2   0.1
High (°F)   34    39    52    67    79    86    87    85    78    66    50    37
Low (°F)    15    19    30    45    55    64    70    68    57    45    31    19
Transportation
1. Airport to CNCC
Beijing Subway (Airport Express Line) You can take the Airport Express Line, which
runs from Terminal 3/Terminal 2, to Sanyuanqiao (三元桥) station, then take subway Line
10 to Beitucheng (北土城) station, and transfer to subway Line 8 to the Olympic Sports
Center (奥体中心) station.
Airport Taxi Legitimate taxis form a long queue outside the Arrival Hall, but the queue
moves quickly, so you won't wait long. At the head of the line a dispatcher will give you
your taxi's number, which is useful in case of complaints. The charge will be at least
100 CNY, but pay according to the meter, plus an expressway toll of 15 CNY.
After 23:00, you will pay more. You can show the following information to the taxi driver:
Please take me to China National Convention Center (CNCC)
Address: No.7 Tianchen East Road, Olympic District, Beijing
Airport Shuttle Bus The airport shuttle runs every 30 minutes from 5:30 to
20:00 and covers different routes, including the Asian Games Village (Anhui Bridge, which
is close to CNCC). It costs 25 yuan (about $4).
2. Airport to Hotels
CNCC Grand Hotel Take the Airport Express, then transfer to Subway Line 8, get
off at Olympic Park, and walk west about 300 meters. You can also get to the CNCC Grand
Hotel by taxi; you can show the following information to the taxi driver:
Please take me to China National Convention Center Grand Hotel
Address: No.8 Beichen West Road, Chaoyang District, Beijing
Please take me to InterContinental Beijing Beichen Hotel
Address: No.8 Beichen West Road, Chaoyang District, Beijing
Best Western OL Stadium Hotel Take Airport Bus Line 4 (Beijing Capital
International Airport - Gongzhufen) and get off at Madian Bridge station. Then transfer
to bus 315 and get off at Beishatan Bridge station. You can also get to the Best Western OL
Stadium Hotel by taxi; you can show the following information to the taxi driver:
Please take me to Best Western OL Stadium Hotel
Address: No.1 Datun Road, Beishatan Chaoyang District, Beijing
3. Hotels to CNCC
CNCC Grand Hotel The hotel is about a 10-minute walk from the conference venue.
InterContinental Beijing Beichen Hotel The hotel is about a 10-minute walk from the
conference venue.
Best Western OL Stadium Hotel The hotel is about a 20-minute walk from the
conference venue. The local organizer will also offer a shuttle bus service between the
Best Western OL Stadium Hotel and the CNCC from July 27 to July 29.
4. City Transport
Subway In the build-up to the 2008 Olympics, Beijing's subway was
extensively developed from 2 lines to 14 lines. Please see the subway map for details.
City Public Buses They run daily from 5:30 till 23:00. Taking buses in Beijing is
cheap, but less comfortable than a taxi or the subway. The flat rate for a tram or
ordinary public bus is 2 yuan. Buses equipped with air-conditioning or running on special
lines charge according to the distance. Having your destination written in Chinese characters
will help. When squeezing onto a crowded bus, keep an eye on your wallet. Minibuses,
running from 7:00 to 19:00, charge a flat rate of 2 yuan and guarantee a seat. They are
faster and more comfortable.
Taxis Though Beijing does suffer from congestion, its taxi drivers will find the fastest
way to your destination. Bring the name of your destination in Chinese characters if
your spoken Chinese is not good. A pedi-cab is a good choice for sightseeing, especially
for visiting the narrow Hutongs. You will find pedi-cabs on the street. You should agree
on a price with the driver before starting the journey. Legally registered pedi-cabs can
be identified by a certificate attached to the cab and a card hanging around the
driver's neck.
Bicycle China used to be called "the sea of bicycles", and in Beijing today the bike is
still a convenient vehicle for most people. Renting a bike may be a better way for you to
see the city at your own pace. Bikes can be hired from many hotels for 20–30 yuan per day;
a deposit will be required. You can also rent bikes at some bicycle shops that repair
bikes and inflate tires; their rates are lower because the bikes are not as new. When
needed, you can park your bike in a bike park, which is easily identified by the large
number of bikes on the roadside. The charge is about 1 yuan.
Culture and Tour
Over China's 5,000-year history, Beijing has been the capital of many dynasties:
the Jin, Yuan, Ming, Qing, and others. It has many historical attractions. As the capital of China,
Beijing is also a modern city with a population of more than 20 million. There are abundant places
to visit in Beijing.
Xin Ao Shopping Center
Route: Walk 200 meters straight east from the front door of the CNCC, and then go downstairs.

Restaurant   Address  Characteristic      Per capita consumption  Telephone
NEW YORKER   H1-61    American Cafeteria  80-300 RMB              010-84378485
Food court   H1-62    Chinese Fast Food   20-50 RMB               010-84374628
Platinum Sponsors
Gold Sponsors
Silver Sponsors