Masayuki Asahara


2024

Long Unit Word Tokenization and Bunsetsu Segmentation of Historical Japanese
Hiroaki Ozaki | Kanako Komiya | Masayuki Asahara | Toshinobu Ogiso
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)

In Japanese, the natural minimal phrase of a sentence is the “bunsetsu”, which serves native speakers as a more natural boundary within a sentence than words do; grammatical analysis in Japanese linguistics therefore commonly operates on bunsetsu units. In contrast, because Japanese does not have delimiters between words, there are two major categories of word definition, namely Short Unit Words (SUWs) and Long Unit Words (LUWs). Though a SUW dictionary is available, an LUW dictionary is not. Hence, this study focuses on providing a deep learning-based (or LLM-based) bunsetsu and Long Unit Word analyzer for the Heian period (AD 794-1185) and evaluating its performance. We model the parser as a transformer-based joint sequential labeling model, which combines the bunsetsu BI tag, LUW BI tag, and LUW Part-of-Speech (POS) tag for each SUW token. We train our models on corpora of each period, including contemporary and historical Japanese. The results range from 0.976 to 0.996 in F1 value for both bunsetsu and LUW reconstruction, indicating that our models achieve performance comparable to that of models for a contemporary Japanese corpus. The statistical analysis and diachronic case study suggest that the estimation of bunsetsu could be influenced by the grammaticalization of morphemes.

Collection of Japanese Route Information Reference Expressions Using Maps as Stimuli
Yoshiko Kawabata | Mai Omura | Hikari Konishi | Masayuki Asahara | Johane Takeuchi
Proceedings of the 4th Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP 2024)

We constructed a database of Japanese expressions based on route information. Using 20 maps as stimuli, we requested descriptions of routes between two points on each map from 40 individuals per route, collecting 1600 route information reference expressions. We determined whether the expressions were based solely on relative reference expressions by using landmarks on the maps. In cases in which only relative reference expressions were used, we labeled the presence or absence of information regarding the starting point, waypoints, and destination. Additionally, we collected clarity ratings for each expression using a survey.

Prior Knowledge-Guided Adversarial Training
Lis Pereira | Fei Cheng | Wan Jou She | Masayuki Asahara | Ichiro Kobayashi
Proceedings of the 9th Workshop on Representation Learning for NLP (RepL4NLP-2024)

We introduce a simple yet effective Prior Knowledge-Guided ADVersarial Training (PKG-ADV) algorithm to improve adversarial training for natural language understanding. Our method simply utilizes task-specific label distribution to guide the training process. By prioritizing the use of prior knowledge of labels, we aim to generate more informative adversarial perturbations. We apply our model to several challenging temporal reasoning tasks. Our method enables a more reliable and controllable data training process than relying on randomized adversarial perturbation. Albeit simple, our method achieved significant improvements in these tasks. To facilitate further research, we will release the code and models.

2023

UD_Japanese-CEJC: Dependency Relation Annotation on Corpus of Everyday Japanese Conversation
Mai Omura | Hiroshi Matsuda | Masayuki Asahara | Aya Wakasa
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

In this study, we have developed Universal Dependencies (UD) resources for spoken Japanese in the Corpus of Everyday Japanese Conversation (CEJC). The CEJC is a large corpus of spoken language that encompasses various everyday conversations in Japanese, and includes word delimitation and part-of-speech annotation. We have newly annotated Long Unit Word delimitation and Bunsetsu (Japanese phrase)-based dependencies, including Bunsetsu boundaries, for the CEJC. The UD Japanese resources were constructed in accordance with hand-maintained conversion rules from the CEJC with two types of word delimitation, part-of-speech tags, and Bunsetsu-based syntactic dependency relations. Furthermore, we examined various issues pertaining to the construction of UD in the CEJC by comparing it with the written Japanese corpus and evaluating UD parsing accuracy.

Word Familiarity Rate Estimation for Japanese Functional Words Using a Bayesian Linear Mixed Model
Bocheng Chen | Masayuki Asahara
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

Spatial Information Annotation Based on the Double Cross Model
Yoshiko Kawabata | Mai Omura | Masayuki Asahara | Johane Takeuchi
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

All-Words Word Sense Disambiguation for Historical Japanese
Soma Asada | Kanako Komiya | Masayuki Asahara
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

2022

Word Sense Disambiguation of Corpus of Historical Japanese Using Japanese BERT Trained with Contemporary Texts
Kanako Komiya | Nagi Oki | Masayuki Asahara
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation

CHJ-WLSP: Annotation of ‘Word List by Semantic Principles’ Labels for the Corpus of Historical Japanese
Masayuki Asahara | Nao Ikegami | Tai Suzuki | Taro Ichimura | Asuko Kondo | Sachi Kato | Makoto Yamazaki
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

This article presents a word-sense annotation for the Corpus of Historical Japanese: a mashed-up Japanese lexicon based on the ‘Word List by Semantic Principles’ (WLSP). The WLSP is a large-scale Japanese thesaurus that includes 98,241 entries with syntactic and hierarchical semantic categories. The historical WLSP is also compiled for the words in ancient Japanese. We utilized a morpheme-word sense alignment table to extract all possible word sense candidates for each word appearing in the target corpus. Then, we manually disambiguated the word senses for 647,751 words in the texts from the 10th century to 1910.

Reading Time and Vocabulary Rating in the Japanese Language: Large-Scale Japanese Reading Time Data Collection Using Crowdsourcing
Masayuki Asahara
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This study examines how differences in human vocabulary affect reading time. Specifically, we assumed vocabulary to be the random effect of research participants when applying a generalized linear mixed model to the ratings of participants in the word familiarity survey. Thereafter, we asked the participants to take part in a self-paced reading task to collect their reading times. By including vocabulary as a fixed effect when applying a generalized linear mixed model to reading time, we clarified how vocabulary differences tend to affect reading time.

2021

Lower Perplexity is Not Always Human-Like
Tatsuki Kuribayashi | Yohei Oseki | Takumi Ito | Ryo Yoshida | Masayuki Asahara | Kentaro Inui
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In computational psycholinguistics, various language models have been evaluated against human reading behavior (e.g., eye movement) to build human-like computational models. However, most previous efforts have focused almost exclusively on English, despite the recent trend towards linguistic universals within the general community. In order to fill the gap, this paper investigates whether the established results in computational psycholinguistics can be generalized across languages. Specifically, we re-examine an established generalization (the lower perplexity a language model has, the more human-like the language model is) in Japanese, a language with typologically different structures from English. Our experiments demonstrate that this established generalization exhibits a surprising lack of universality; namely, lower perplexity is not always human-like. Moreover, this discrepancy between English and Japanese is further explored from the perspective of (non-)uniform information density. Overall, our results suggest that a cross-lingual evaluation will be necessary to construct human-like computational models.

Dependency Enhanced Contextual Representations for Japanese Temporal Relation Classification
Chenjing Geng | Fei Cheng | Masayuki Asahara | Lis Kanashiro Pereira | Ichiro Kobayashi
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

ALICE++: Adversarial Training for Robust and Effective Temporal Reasoning
Lis Pereira | Fei Cheng | Masayuki Asahara | Ichiro Kobayashi
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

The Annotation of Antonym Information in the ‘Word List by Semantic Principles’
Sachi Kato | Masayuki Asahara | Nanami Moriyama | Makoto Yamazaki | Asami Ogiwara
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

Word Delimitation Issues in UD Japanese
Mai Omura | Aya Wakasa | Masayuki Asahara
Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021)

2020

Adversarial Training for Commonsense Inference
Lis Pereira | Xiaodong Liu | Fei Cheng | Masayuki Asahara | Ichiro Kobayashi
Proceedings of the 5th Workshop on Representation Learning for NLP

We apply small perturbations to word embeddings and minimize the resultant adversarial risk to regularize the model. We exploit a novel combination of two different approaches to estimate these perturbations: 1) using the true label and 2) using the model prediction. Without relying on any human-crafted features, knowledge bases, or additional datasets other than the target datasets, our model boosts the fine-tuning performance of RoBERTa, achieving competitive results on multiple reading comprehension datasets that require commonsense inference.

Generation and Evaluation of Concept Embeddings Via Fine-Tuning Using Automatically Tagged Corpus
Kanako Komiya | Daiki Yaginuma | Masayuki Asahara | Hiroyuki Shinnou
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

Composing Word Vectors for Japanese Compound Words Using Bilingual Word Embeddings
Teruo Hirabayashi | Kanako Komiya | Masayuki Asahara | Hiroyuki Shinnou
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

Automatic Creation of Correspondence Table of Meaning Tags from Two Dictionaries in One Language Using Bilingual Word Embedding
Teruo Hirabayashi | Kanako Komiya | Masayuki Asahara | Hiroyuki Shinnou
Proceedings of the 13th Workshop on Building and Using Comparable Corpora

In this paper, we show how to use bilingual word embeddings (BWE) to automatically create a correspondence table of meaning tags from two dictionaries in one language and examine the effectiveness of the method. One problem is that the meaning tags do not always correspond one-to-one because the granularities of the word senses and the concepts differ. Therefore, we regarded the concept tag that corresponds most to a word sense as the correct concept tag for that word sense. We used two BWE methods, a linear transformation matrix and VecMap. We evaluated the most frequent sense (MFS) method and the corpus concatenation method for comparison. The accuracies of the proposed methods were higher than the accuracy of the random baseline but lower than those of the MFS and corpus concatenation methods. However, because our method utilized the embedding vectors of the word senses, the relations of the sense tags corresponding to concept tags could be examined by mapping the sense embeddings to the vector space of the concept tags. Also, our methods could be performed when we have only concept or word sense embeddings, whereas the MFS method requires a parallel corpus and the corpus concatenation method needs two tagged corpora.

Design of BCCWJ-EEG: Balanced Corpus with Human Electroencephalography
Yohei Oseki | Masayuki Asahara
Proceedings of the Twelfth Language Resources and Evaluation Conference

The past decade has witnessed the happy marriage between natural language processing (NLP) and the cognitive science of language. Moreover, given the historical relationship between biological and artificial neural networks, the advent of deep learning has re-sparked strong interests in the fusion of NLP and the neuroscience of language. Importantly, this inter-fertilization between NLP, on one hand, and the cognitive (neuro)science of language, on the other, has been driven by the language resources annotated with human language processing data. However, there remain several limitations with those language resources on annotations, genres, languages, etc. In this paper, we describe the design of a novel language resource called BCCWJ-EEG, the Balanced Corpus of Contemporary Written Japanese (BCCWJ) experimentally annotated with human electroencephalography (EEG). Specifically, after extensively reviewing the language resources currently available in the literature with special focus on eye-tracking and EEG, we summarize the details concerning (i) participants, (ii) stimuli, (iii) procedure, (iv) data preprocessing, (v) corpus evaluation, (vi) resource release, and (vii) compilation schedule. In addition, potential applications of BCCWJ-EEG to neuroscience and NLP will also be discussed.

KOTONOHA: A Corpus Concordance System for Skewer-Searching NINJAL Corpora
Teruaki Oka | Yuichi Ishimoto | Yutaka Yagi | Takenori Nakamura | Masayuki Asahara | Kikuo Maekawa | Toshinobu Ogiso | Hanae Koiso | Kumiko Sakoda | Nobuko Kibe
Proceedings of the Twelfth Language Resources and Evaluation Conference

The National Institute for Japanese Language and Linguistics, Japan (NINJAL, Japan), has developed several types of corpora. For each corpus NINJAL provided an online search environment, ‘Chunagon’, which is a morphological-information-annotation-based concordance system made publicly available in 2011. NINJAL has now provided a skewer-search system ‘Kotonoha’ based on the ‘Chunagon’ systems. This system enables querying of multiple corpora by certain categories, such as register type and period.

Dynamically Updating Event Representations for Temporal Relation Classification with Multi-category Learning
Fei Cheng | Masayuki Asahara | Ichiro Kobayashi | Sadao Kurohashi
Findings of the Association for Computational Linguistics: EMNLP 2020

Temporal relation classification is the pair-wise task of identifying the relation of a temporal link (TLINK) between two mentions, i.e., event, time, and document creation time (DCT). This pair-wise formulation leads to two crucial limitations: 1) two TLINKs involving a common mention do not share information; 2) existing models with independent classifiers for each TLINK category (E2E, E2T, and E2D) cannot use the whole data. This paper presents an event-centric model that manages dynamic event representations across multiple TLINKs. Our model deals with three TLINK categories with multi-task learning to leverage the full size of data. The experimental results show that our proposal outperforms state-of-the-art models and two strong transfer learning baselines on both the English and Japanese data.

2019

Word Familiarity Rate Estimation Using a Bayesian Linear Mixed Model
Masayuki Asahara
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP

This paper presents research on word familiarity rate estimation using the ‘Word List by Semantic Principles’. We collected rating information on 96,557 words in the ‘Word List by Semantic Principles’ via Yahoo! crowdsourcing. We asked 3,392 subject participants to use their introspection to rate the familiarity of words based on the five perspectives of ‘KNOW’, ‘WRITE’, ‘READ’, ‘SPEAK’, and ‘LISTEN’, and each word was rated by at least 16 subject participants. We used Bayesian linear mixed models to estimate the word familiarity rates. We also explored the ratings with the semantic labels used in the ‘Word List by Semantic Principles’.

2018

Between Reading Time and Clause Boundaries in Japanese - Wrap-up Effect in a Head-Final Language
Masayuki Asahara
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

Annotation of ‘Word List by Semantic Principles’ Labels for the Balanced Corpus of Contemporary Written Japanese
Sachi Kato | Masayuki Asahara | Makoto Yamazaki
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

All-words Word Sense Disambiguation Using Concept Embeddings
Rui Suzuki | Kanako Komiya | Masayuki Asahara | Minoru Sasaki | Hiroyuki Shinnou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Universal Dependencies Version 2 for Japanese
Masayuki Asahara | Hiroshi Kanayama | Takaaki Tanaka | Yusuke Miyao | Sumire Uematsu | Shinsuke Mori | Yuji Matsumoto | Mai Omura | Yugo Murawaki
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Predicting Japanese Word Order in Double Object Constructions
Masayuki Asahara | Satoshi Nambu | Shin-Ichiro Sano
Proceedings of the Eight Workshop on Cognitive Aspects of Computational Language Learning and Processing

This paper presents a statistical model to predict Japanese word order in double object constructions. We employed a Bayesian linear mixed model with manually annotated predicate-argument structure data. The findings from the refined corpus analysis confirmed the effects of the information status of an NP as ‘given-new ordering’ in addition to the effects of ‘long-before-short’ as a tendency of the general Japanese word order.

Coordinate Structures in Universal Dependencies for Head-final Languages
Hiroshi Kanayama | Na-Rae Han | Masayuki Asahara | Jena D. Hwang | Yusuke Miyao | Jinho D. Choi | Yuji Matsumoto
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

This paper discusses the representation of coordinate structures in the Universal Dependencies framework for two head-final languages, Japanese and Korean. UD applies a strict principle that makes the head of coordination the left-most conjunct. However, the guideline may produce syntactic trees which are difficult to accept in head-final languages. This paper describes the status in the current Japanese and Korean corpora and proposes alternative designs suitable for these languages.

UD-Japanese BCCWJ: Universal Dependencies Annotation for the Balanced Corpus of Contemporary Written Japanese
Mai Omura | Masayuki Asahara
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

In this paper, we describe a corpus UD Japanese-BCCWJ that was created by converting the Balanced Corpus of Contemporary Written Japanese (BCCWJ), a Japanese language corpus, to adhere to the UD annotation schema. The BCCWJ already assigns dependency information at the level of the bunsetsu (a Japanese syntactic unit comparable to the phrase). We developed a program to convert the BCCWJ to UD based on this dependency structure, and this corpus is the result of completely automatic conversion using the program. UD Japanese-BCCWJ is the largest-scale UD Japanese corpus and the second-largest of all UD corpora, including 1,980 documents, 57,109 sentences, and 1,273k words across six distinct domains.

2017

Between Reading Time and Information Structure
Masayuki Asahara
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

Between Reading Time and Syntactic/Semantic Categories
Masayuki Asahara | Sachi Kato
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This article presents a contrastive analysis between reading time and syntactic/semantic categories in Japanese. We overlaid the reading time annotation of BCCWJ-EyeTrack and a syntactic/semantic category information annotation on the ‘Balanced Corpus of Contemporary Written Japanese’. Statistical analysis based on a mixed linear model showed that verbal phrases tend to have shorter reading times than adjectives, adverbial phrases, or nominal phrases. The results suggest that the preceding phrases associated with the presenting phrases promote the reading process to shorten the gazing time.

2016

BCCWJ-DepPara: A Syntactic Annotation Treebank on the ‘Balanced Corpus of Contemporary Written Japanese’
Masayuki Asahara | Yuji Matsumoto
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

Paratactic syntactic structures are difficult to represent in syntactic dependency tree structures. As such, we propose an annotation schema for syntactic dependency annotation of Japanese, in which coordinate structures are split from and overlaid on bunsetsu-based (base phrase unit) dependency. The schema represents nested coordinate structures, non-constituent conjuncts, and forward sharing as the set of regions. The annotation was performed on the core data of ‘Balanced Corpus of Contemporary Written Japanese’, which comprised about one million words and 1980 samples from six registers, such as newspapers, books, magazines, and web texts.

Universal Dependencies for Japanese
Takaaki Tanaka | Yusuke Miyao | Masayuki Asahara | Sumire Uematsu | Hiroshi Kanayama | Shinsuke Mori | Yuji Matsumoto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present an attempt to port the international syntactic annotation scheme, Universal Dependencies, to the Japanese language in this paper. Since the Japanese syntactic structure is usually annotated on the basis of unique chunk-based dependencies, we first introduce word-based dependencies by using a word unit called the Short Unit Word, which usually corresponds to an entry in the lexicon UniDic. Porting is done by mapping the part-of-speech tagset in UniDic to the universal part-of-speech tagset, and converting a constituent-based treebank to a typed dependency tree. The conversion is not straightforward, and we discuss the problems that arose in the conversion and the current solutions. A treebank consisting of 10,000 sentences was built by converting the existing resources and is currently released to the public.

Reading-Time Annotations for “Balanced Corpus of Contemporary Written Japanese”
Masayuki Asahara | Hajime Ono | Edson T. Miyamoto
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The Dundee Eyetracking Corpus contains eyetracking data collected while native speakers of English and French read newspaper editorial articles. Similar resources for other languages are still rare, especially for languages in which words are not overtly delimited with spaces. This is a report on a project to build an eyetracking corpus for Japanese. Measurements were collected while 24 native speakers of Japanese read excerpts from the Balanced Corpus of Contemporary Written Japanese. Texts were presented with or without segmentation (i.e. with or without space at the boundaries between bunsetsu segmentations) and with two types of methodologies (eyetracking and self-paced reading presentation). Readers’ background information, including vocabulary-size estimation and Japanese reading-span score, was also collected. As an example of the possible uses for the corpus, we also report analyses investigating the phenomena of anti-locality.

‘BonTen’ – Corpus Concordance System for ‘NINJAL Web Japanese Corpus’
Masayuki Asahara | Kazuya Kawahara | Yuya Takei | Hideto Masuoka | Yasuko Ohba | Yuki Torii | Toru Morii | Yuki Tanaka | Kikuo Maekawa | Sachi Kato | Hikari Konishi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising ten billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents the corpus concordance system named ‘BonTen’, which enables the ten-billion-word corpus to be queried by string, a sequence of morphological information, or a subtree of the syntactic dependency structure.

Demonstration of ChaKi.NET – beyond the corpus search system
Masayuki Asahara | Yuji Matsumoto | Toshio Morita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

ChaKi.NET is a corpus management system for dependency structure annotated corpora. After more than 10 years of continuous development, the system is now usable not only for corpus search, but also for visualization, annotation, labelling, and formatting for statistical analysis. This paper describes the various functions included in the current ChaKi.NET system.

2014

BCCWJ-TimeBank: Temporal and Event Information Annotation on Japanese Text
Masayuki Asahara | Sachi Kato | Hikari Konishi | Mizuho Imada | Kikuo Maekawa
International Journal of Computational Linguistics & Chinese Language Processing, Volume 19, Number 3, September 2014

2013

BCCWJ-TimeBank: Temporal and Event Information Annotation on Japanese Text
Masayuki Asahara | Sachi Yasuda | Hikari Konishi | Mizuho Imada | Kikuo Maekawa
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)

2012

Head-driven Transition-based Parsing with Top-down Prediction
Katsuhiko Hayashi | Taro Watanabe | Masayuki Asahara | Yuji Matsumoto
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Identifying Temporal Relations by Sentence and Document Optimizations
Katsumasa Yoshikawa | Masayuki Asahara | Ryu Iida
Proceedings of COLING 2012: Posters

2011

Different Input Systems for Different Devices
Asad Habib | Masakazu Iwatate | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Workshop on Advances in Text Input Methods (WTIM 2011)

Jointly Extracting Japanese Predicate-Argument Relation with Markov Logic
Katsumasa Yoshikawa | Masayuki Asahara | Yuji Matsumoto
Proceedings of 5th International Joint Conference on Natural Language Processing

Third-order Variational Reranking on Packed-Shared Dependency Forests
Katsuhiko Hayashi | Taro Watanabe | Masayuki Asahara | Yuji Matsumoto
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

A Structured Model for Joint Learning of Argument Roles and Predicate Senses
Yotaro Watanabe | Masayuki Asahara | Yuji Matsumoto
Proceedings of the ACL 2010 Conference Short Papers

2009

Multilingual Syntactic-Semantic Dependency Parsing with Three-Stage Approximate Max-Margin Linear Models
Yotaro Watanabe | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task

Jointly Identifying Temporal Relations with Markov Logic
Katsumasa Yoshikawa | Sebastian Riedel | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

A Pipeline Approach for Syntactic and Semantic Dependency Parsing
Yotaro Watanabe | Masakazu Iwatate | Masayuki Asahara | Yuji Matsumoto
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

Constructing a Temporal Relation Tagged Corpus of Chinese Based on Dependency Structure Analysis
Yuchang Cheng | Masayuki Asahara | Yuji Matsumoto
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 2, June 2008

Japanese-Spanish Thesaurus Construction Using English as a Pivot
Jessica Ramírez | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Use of Event Types for Temporal Relation Identification in Chinese Text
Yuchang Cheng | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

Analyzing Chinese Synthetic Words with Tree-based Information and a Survey on Chinese Morphologically Derived Words
Jia Lu | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

Japanese Dependency Parsing Using a Tournament Model
Masakazu Iwatate | Masayuki Asahara | Yuji Matsumoto
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

NAIST.Japan: Temporal Relation Identification Using Dependency Parsed Tree
Yuchang Cheng | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

A Graph-Based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
Yotaro Watanabe | Masayuki Asahara | Yuji Matsumoto
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

An Annotated Corpus Management Tool: ChaKi
Yuji Matsumoto | Masayuki Asahara | Kiyota Hashimoto | Yukio Tono | Akira Ohtani | Toshio Morita
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Large-scale annotated corpora are very important not only in linguistic research but also in practical natural language processing tasks, since a number of practical tools such as Part-of-speech (POS) taggers and syntactic parsers are now corpus-based or machine learning-based systems which require some amount of accurately annotated corpora. This article presents an annotated corpus management tool that provides various functions, including flexible search, statistic calculation, and error correction for linguistically annotated corpora. The target of annotation covers POS tags, base phrase chunks, and syntactic dependency structures. This tool aims at helping the consistent construction of lexicons and annotated corpora to be used by researchers in both the linguistics and language processing communities.

Multi-lingual Dependency Parsing at NAIST
Yuchang Cheng | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

The Construction of a Dictionary for a Two-layer Chinese Morphological Analyzer
Chooi-Ling Goh | Jia Lü | Yuchang Cheng | Masayuki Asahara | Yuji Matsumoto
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

2005

Automatic Extraction of Fixed Multiword Expressions
Campbell Hore | Masayuki Asahara | Yūji Matsumoto
Second International Joint Conference on Natural Language Processing: Full Papers

Building a Japanese-Chinese Dictionary Using Kanji/Hanzi Conversion
Chooi-Ling Goh | Masayuki Asahara | Yuji Matsumoto
Second International Joint Conference on Natural Language Processing: Full Papers

Chinese Deterministic Dependency Analyzer: Examining Effects of Global Features and Root Node Finder
Yuchang Cheng | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing

Combination of Machine Learning Methods for Optimum Chinese Word Segmentation
Masayuki Asahara | Kenta Fukuoka | Ai Azuma | Chooi-Ling Goh | Yotaro Watanabe | Yuji Matsumoto | Takashi Tsuzuki
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing

Chinese Word Segmentation by Classification of Characters
Chooi-Ling Goh | Masayuki Asahara | Yuji Matsumoto
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 3, September 2005: Special Issue on Selected Papers from ROCLING XVI

2004

Chinese Word Segmentation by Classification of Characters
Chooi-Ling Goh | Masayuki Asahara | Yuji Matsumoto
Proceedings of the Third SIGHAN Workshop on Chinese Language Processing

Japanese Unknown Word Identification by Character-based Chunking
Masayuki Asahara | Yuji Matsumoto
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Pruning False Unknown Words to Improve Chinese Word Segmentation
Chooi-Ling Goh | Masayuki Asahara | Yuji Matsumoto
Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation

2003

Japanese Named Entity Extraction with Redundant Morphological Analysis
Masayuki Asahara | Yuji Matsumoto
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

Chinese Unknown Word Identification Using Character-based Tagging and Chunking
Chooi Ling Goh | Masayuki Asahara | Yuji Matsumoto
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

Combining Segmenter and Chunker for Chinese Word Segmentation
Masayuki Asahara | Chooi Ling Goh | Xiaojie Wang | Yuji Matsumoto
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

2002

Use of XML and Relational Databases for Consistent Development and Maintenance of Lexicons and Annotated Corpora
Masayuki Asahara | Ryuichi Yoneda | Akiko Yamashita | Yasuharu Den | Yuji Matsumoto
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Extended Models and Tools for High-performance Part-of-speech Tagger
Masayuki Asahara | Yuji Matsumoto
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics
