How much does a word weigh? Weighting word embeddings for word sense induction

Arefyev, Nikolay; Ermolaev, Pavel; Panchenko, Alexander

arXiv:1805.09209 (cs)

[Submitted on 23 May 2018 (v1), last revised 25 Oct 2018 (this version, v2)]

Abstract:The paper describes our participation in the first shared task on word sense induction and disambiguation for the Russian language, RUSSE'2018 (Panchenko et al., 2018). For each of several dozen ambiguous words, the participants were asked to group text fragments containing the word according to its senses, which were not provided beforehand, hence the "induction" part of the task. For instance, given the word "bank" and a set of text fragments (also known as "contexts") in which it occurs, e.g. "bank is a financial institution that accepts deposits" and "river bank is a slope beside a body of water", a participant was asked to cluster these contexts into a number of clusters that is not known in advance and that corresponds, in this case, to the "company" and the "area" senses of the word "bank". The organizers proposed three evaluation datasets of varying complexity and text genre, based respectively on Wikipedia texts, Web pages, and a dictionary of the Russian language. We present two experiments, a positive and a negative one, based respectively on clustering contexts represented as weighted averages of word embeddings and on machine translation using two state-of-the-art production neural machine translation systems. Our team achieved the second best result on two datasets and the third best result on the remaining dataset among 18 participating teams. We substantially outperformed competitive state-of-the-art baselines from previous years that are based on sense embeddings.
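The positive experiment in the abstract, clustering contexts represented as weighted averages of word embeddings, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy 3-dimensional vectors in `EMBEDDINGS`, the hand-picked `WEIGHTS` that down-weight function words, the English example contexts, and the greedy cosine-threshold clustering, which is used here only because it does not need the number of senses in advance.

```python
import math

# Toy 3-dimensional embeddings (hypothetical values, illustration only).
EMBEDDINGS = {
    "financial": [1.0, 0.0, 0.0],
    "deposits": [0.9, 0.1, 0.0],
    "institution": [0.8, 0.2, 0.0],
    "river": [0.0, 1.0, 0.0],
    "water": [0.0, 0.9, 0.1],
    "slope": [0.1, 0.8, 0.1],
    "is": [0.5, 0.5, 0.5],
    "a": [0.5, 0.5, 0.5],
}

# Hypothetical per-word weights: function words contribute little.
WEIGHTS = {"is": 0.1, "a": 0.1}
DEFAULT_WEIGHT = 1.0


def context_vector(tokens):
    """Weighted average of the embeddings of the known words in a context."""
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in EMBEDDINGS:  # skips the target word and unknown words
            continue
        w = WEIGHTS.get(tok, DEFAULT_WEIGHT)
        for i, x in enumerate(EMBEDDINGS[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.7):
    """Greedy clustering: a vector joins the first cluster whose seed is
    similar enough, otherwise it opens a new cluster. The number of
    clusters is therefore not fixed beforehand."""
    seeds, labels = [], []
    for v in vectors:
        for idx, seed in enumerate(seeds):
            if cosine(v, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels


contexts = [
    "bank is a financial institution that accepts deposits".split(),
    "river bank is a slope beside a body of water".split(),
    "the bank holds deposits".split(),
    "water ran down the slope of the bank".split(),
]
labels = cluster([context_vector(c) for c in contexts])
print(labels)
```

On this toy data the two "company" contexts receive one label and the two "area" contexts another. With real embeddings, the choice of weights determines how strongly frequent but uninformative words pull every context vector toward the same point, which is the kind of effect a weighting scheme is meant to control.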

Comments:	In the Proceedings of the 24th International Conference on Computational Linguistics and Intellectual Technologies (Dialogue'2018), Moscow, Russia. RGGU
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1805.09209 [cs.CL]
	(or arXiv:1805.09209v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1805.09209

Submission history

From: Alexander Panchenko
[v1] Wed, 23 May 2018 14:58:13 UTC (863 KB)
[v2] Thu, 25 Oct 2018 15:23:44 UTC (691 KB)
