Computer Science ›› 2017, Vol. 44 ›› Issue (Z6): 80-83. doi: 10.11896/j.issn.1002-137X.2017.6A.016

• Intelligent Computing •

N-ary Chinese Open Entity-relation Extraction

LI Ying, HAO Xiao-yan and WANG Yong

  1. College of Computer Science and Technology, Taiyuan University of Technology, Jinzhong 030600
  • Online: 2017-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the project "Research on Chinese Discourse Anaphora Resolution Strategies Based on Frame Semantic Annotation" (2012011011-2).

Abstract: Traditional information extraction (IE) targets specific domains, satisfying precise, narrow, pre-specified requests over small homogeneous corpora. Shifting to a new domain requires the user to name the target relations and to hand-write new extraction rules or hand-label new training examples. Open information extraction (OIE) overcomes the limitations of traditional IE techniques, which train an individual extractor for every relation type. Most existing OIE systems target English; work on Chinese is comparatively scarce and mainly extracts triples, with no method for extracting n-ary tuples from Chinese text. This paper therefore proposes N-COIE, a dependency-parsing-based method for N-ary Chinese open entity-relation extraction. First, the text collection is preprocessed with natural language processing tools and parsed for dependency relations. Next, each verb is treated as a candidate relation word, and every base noun phrase connected to that verb by a valid dependency path that satisfies the conditions is treated as an entity word; a relation word that links two or more entity words forms, together with them, a candidate n-ary entity-relation group. Finally, a trained logistic regression classifier filters the candidate groups. Results on a Baidu Baike (Baidu Encyclopedia) dataset show that the method reaches an accuracy of up to 81% when extracting a large number of n-ary entity-relation tuples.

Key words: Chinese open information extraction, Dependency parsing, Entity-relation extraction, Machine learning, OIE, Word2vec
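
The candidate-extraction step summarized in the abstract (verbs as candidate relation words, nouns reachable over a valid dependency path as entity words, and a minimum of two entities per group) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Token` class, the `extract_candidates` function, the choice of LTP-style labels (`SBV`, `VOB`, `IOB`, `FOB`, `ADV`, `CMP`, `POB`) as the "valid" paths, the use of single noun tokens in place of base noun phrases, and the hand-written parse of the example sentence. The logistic regression filtering step is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    word: str
    pos: str      # part-of-speech tag, e.g. "v" (verb), "n" (noun), "p" (preposition)
    head: int     # idx of the syntactic head, 0 for the root
    deprel: str   # dependency label linking this token to its head


# Assumed path rules (hypothetical, for illustration only):
# a noun is an entity of a verb if it is a direct argument of the verb ...
DIRECT_LABELS = {"SBV", "VOB", "IOB", "FOB"}
# ... or the object (POB) of a preposition attached to the verb.
PREP_ATTACH_LABELS = {"ADV", "CMP"}


def is_noun(tok: Token) -> bool:
    # Simplification: a single noun token stands in for a base noun phrase.
    return tok.pos.startswith("n")


def extract_candidates(sent: List[Token]) -> List[Tuple[str, List[str]]]:
    """Return (relation word, entity words) groups with at least two entities."""
    children: Dict[int, List[Token]] = {}
    for tok in sent:
        children.setdefault(tok.head, []).append(tok)

    groups = []
    for verb in sent:
        if verb.pos != "v":
            continue
        entities = []
        for child in children.get(verb.idx, []):
            if child.deprel in DIRECT_LABELS and is_noun(child):
                entities.append(child)
            elif child.pos == "p" and child.deprel in PREP_ATTACH_LABELS:
                for grandchild in children.get(child.idx, []):
                    if grandchild.deprel == "POB" and is_noun(grandchild):
                        entities.append(grandchild)
        # Keep only relation words that link two or more entity words.
        if len(entities) >= 2:
            entities.sort(key=lambda t: t.idx)
            groups.append((verb.word, [e.word for e in entities]))
    return groups


# Hand-written parse of "李明 在 北京 创办 公司"
# ("Li Ming founded a company in Beijing"); not real parser output.
sentence = [
    Token(1, "李明", "nh", 4, "SBV"),
    Token(2, "在", "p", 4, "ADV"),
    Token(3, "北京", "ns", 2, "POB"),
    Token(4, "创办", "v", 0, "HED"),
    Token(5, "公司", "n", 4, "VOB"),
]

print(extract_candidates(sentence))
```

In this toy sentence the verb 创办 links three nouns, so the sketch produces one ternary group rather than a triple limited to subject and object, which is the distinction between n-ary and triple extraction that the abstract draws.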

