Computer Science > Computation and Language
[Submitted on 27 Feb 2018 (v1), last revised 8 Sep 2018 (this version, v2)]
Title:A Hybrid Word-Character Approach to Abstractive Summarization
View PDFAbstract:Automatic abstractive text summarization is an important and challenging research topic of natural language processing. Among many widely used languages, the Chinese language has a special property that a Chinese character contains rich information comparable to a word. Existing Chinese text summarization methods, either adopt totally character-based or word-based representations, fail to fully exploit the information carried by both representations. To accurately capture the essence of articles, we propose a hybrid word-character approach (HWC) which preserves the advantages of both word-based and character-based representations. We evaluate the advantage of the proposed HWC approach by applying it to two existing methods, and discover that it generates state-of-the-art performance with a margin of 24 ROUGE points on a widely used dataset LCSTS. In addition, we find an issue contained in the LCSTS dataset and offer a script to remove overlapping pairs (a summary and a short text) to create a clean dataset for the community. The proposed HWC approach also generates the best performance on the new, clean LCSTS dataset.
Submission history
From: Chieh-Teng Chang [view email][v1] Tue, 27 Feb 2018 15:31:11 UTC (92 KB)
[v2] Sat, 8 Sep 2018 04:59:55 UTC (144 KB)
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.