2022
Hate Speech Dynamics Against African descent, Roma and LGBTQI Communities in Portugal
Paula Carvalho | Bernardo Cunha | Raquel Santos | Fernando Batista | Ricardo Ribeiro
Proceedings of the Thirteenth Language Resources and Evaluation Conference
This paper introduces FIGHT, a dataset containing 63,450 tweets posted by online users in Portugal before and after the official declaration of Covid-19 as a pandemic. This resource aims to contribute to the analysis of online hate speech targeting the most representative minorities in Portugal, namely the African descent and Roma communities, as well as the LGBTQI community, the most commonly reported target of hate speech on social media in the European context. We present the methods for collecting the data and provide statistics on the distribution of the tweets included in FIGHT, considering both the temporal and spatial dimensions. We also analyze the availability over time of tweets targeting the above-mentioned communities, distinguishing public, private, and deleted tweets. We believe this study will contribute to a better understanding of the dynamics of online hate speech in Portugal, particularly in adverse contexts such as a pandemic outbreak, allowing the development of more informed and accurate hate speech resources for Portuguese.
2020
Mapping the Dialog Act Annotations of the LEGO Corpus into ISO 24617-2 Communicative Functions
Eugénio Ribeiro | Ricardo Ribeiro | David Martins de Matos
Proceedings of the Twelfth Language Resources and Evaluation Conference
ISO 24617-2, the ISO standard for dialog act annotation, sets the ground for more comparable research in the area. However, the amount of data annotated according to it is still small, which impairs the development of approaches for automatic recognition. In this paper, we describe a mapping of the original dialog act labels of the LEGO corpus, which have been neglected, into the communicative functions of the standard. Although this does not lead to a complete annotation according to the standard, the 347 dialogs provide a relevant amount of data that can be used in the development of automatic communicative function recognition approaches, which may lead to a wider adoption of the standard. Using the 17 English dialogs of the DialogBank as a gold standard, our preliminary experiments have shown that including the mapped dialogs during the training phase leads to improved performance when recognizing communicative functions in the Task dimension.
2019
L2F/INESC-ID at SemEval-2019 Task 2: Unsupervised Lexical Semantic Frame Induction using Contextualized Word Representations
Eugénio Ribeiro | Vânia Mendonça | Ricardo Ribeiro | David Martins de Matos | Alberto Sardinha | Ana Lúcia Santos | Luísa Coheur
Proceedings of the 13th International Workshop on Semantic Evaluation
Building large datasets annotated with semantic information, such as FrameNet, is an expensive process. Consequently, such resources are unavailable for many languages and specific domains. This problem can be alleviated by using unsupervised approaches to induce the frames evoked by a collection of documents. That is the objective of the second task of SemEval 2019, which comprises three subtasks: clustering of verbs that evoke the same frame and clustering of arguments into both frame-specific slots and semantic roles. We approach all the subtasks by applying a graph clustering algorithm on contextualized embedding representations of the verbs and arguments. Using such representations is appropriate in the context of this task, since they provide cues for word-sense disambiguation. Thus, they can be used to identify different frames evoked by the same words. Using this approach we were able to outperform all of the baselines reported for the task on the test set in terms of Purity F1, as well as in terms of BCubed F1 in most cases.
2016
SPA: Web-based Platform for easy Access to Speech Processing Modules
Fernando Batista | Pedro Curto | Isabel Trancoso | Alberto Abad | Jaime Ferreira | Eugénio Ribeiro | Helena Moniz | David Martins de Matos | Ricardo Ribeiro
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper presents SPA, a web-based Speech Analytics platform that integrates several speech processing modules and makes it possible to use them through the web. It was developed with the aim of facilitating the usage of the modules, without the need to know about software dependencies and specific configurations. Apart from being accessed through a web browser, the platform also provides a REST API for easy integration with other applications. The platform is flexible, scalable, provides authentication for access restrictions, and was developed taking into consideration the time and effort of providing new services. The platform is still being improved, but it already integrates a considerable number of audio and text processing modules, including automatic transcription, speech disfluency classification, emotion detection, dialog act recognition, age and gender classification, non-nativeness detection, hyper-articulation detection, and two external modules for feature extraction and DTMF detection. This paper describes the SPA architecture, presents the already integrated modules, and provides a detailed description of the ones most recently integrated.
2015
Extending a Single-Document Summarizer to Multi-Document: a Hierarchical Approach
Luís Marujo | Ricardo Ribeiro | David Martins de Matos | João Neto | Anatole Gershman | Jaime Carbonell
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics
2014
Revising the annotation of a Broadcast News corpus: a linguistic approach
Vera Cabarrão | Helena Moniz | Fernando Batista | Ricardo Ribeiro | Nuno Mamede | Hugo Meinedo | Isabel Trancoso | Ana Isabel Mata | David Martins de Matos
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news, focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process consisted mainly of annotating and revising structural metadata events, such as disfluencies and punctuation marks. The resulting revised data is now being extensively used, and was of extreme importance for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system and all the subsequent modules. The resulting data has also been recently used in disfluency studies across domains.
OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries
Anabela Barreiro | Fernando Batista | Ricardo Ribeiro | Helena Moniz | Isabel Trancoso
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents three sets of OpenLogos resources, namely the English-German, the English-French, and the English-Italian bilingual dictionaries. In addition to the usual information on part-of-speech, gender, and number for nouns, offered by most dictionaries currently available, OpenLogos bilingual dictionaries have some distinctive features that make them unique: they contain cross-language morphological information (inflectional and derivational), semantico-syntactic knowledge, indication of the head word in multiword units, information about whether a source word corresponds to a homograph, and information about verb auxiliaries, alternate words (i.e., predicate or process nouns), causatives, reflexivity, and verb aspect, among others. The focal point of the paper is the semantico-syntactic knowledge that is important for disambiguation and translation precision. The resources are publicly available at the METANET platform for free use by the research community.
2008
Mixed-Source Multi-Document Speech-to-Text Summarization
Ricardo Ribeiro | David Martins de Matos
Coling 2008: Proceedings of the workshop Multi-source Multilingual Information Extraction and Summarization
2004
Rethinking Reusable Resources
David M. de Matos | Ricardo Ribeiro | Nuno J. Mamede
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
2002
Morphosyntactic Disambiguation for TTS Systems
Ricardo Ribeiro | Luís Oliveira | Isabel Trancoso
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
2000
Some Language Resources and Tools for Computational Processing of Portuguese at INESC
Luzia Wittmann | Ricardo Daniel Ribeiro | Tânia Pêgo | Fernando Batista
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)