I2C-Huelva at SemEval-2024 Task 8: Boosting AI-Generated Text Detection with Multimodal Models and Optimized Ensembles

Alberto Rodero Peña; Jacinto Mata Vázquez; Victoria Pachón Álvarez

doi:10.18653/v1/2024.semeval-1.121

Abstract

With the rise of AI-based text generators, the need for effective detection mechanisms has become paramount. This paper presents new techniques for building adaptable models and optimizing training aspects for identifying synthetically produced texts across multiple generators and domains. The study, divided into binary and multilabel classification tasks, avoids overfitting through strategic training data limitation. A key innovation is the incorporation of multimodal models that blend numerical text features with conventional NLP approaches. The work also delves into optimizing ensemble model combinations via various voting methods, focusing on accuracy as the official metric. The optimized ensemble strategy demonstrates significant efficacy in both subtasks, highlighting the potential of multimodal and ensemble methods in enhancing the robustness of detection systems against emerging text generators.
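The ensemble search mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the exhaustive subset search, the tie-breaking rule, and the toy probabilities are all assumptions made for illustration. It compares hard (majority) and soft (mean-probability) voting over every model subset and keeps the combination with the highest accuracy.

```python
from collections import Counter
from itertools import combinations


def hard_vote(label_lists):
    """Majority vote over per-model labels; ties go to the smallest label (assumed rule)."""
    voted = []
    for labels in zip(*label_lists):
        counts = Counter(labels)
        top = max(counts.values())
        voted.append(min(lab for lab, c in counts.items() if c == top))
    return voted


def soft_vote(prob_lists):
    """Average class probabilities across models, then take the argmax per sample."""
    voted = []
    for per_model in zip(*prob_lists):
        n_classes = len(per_model[0])
        mean = [sum(p[k] for p in per_model) / len(per_model) for k in range(n_classes)]
        voted.append(max(range(n_classes), key=lambda k: mean[k]))
    return voted


def accuracy(pred, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def best_ensemble(model_probs, gold, min_size=2):
    """Score every model subset under both voting rules; return (accuracy, subset, rule)."""
    best = (-1.0, None, None)
    names = sorted(model_probs)
    for size in range(min_size, len(names) + 1):
        for subset in combinations(names, size):
            probs = [model_probs[m] for m in subset]
            # Per-model hard labels are the argmax of each probability row.
            labels = [[max(range(len(p)), key=lambda k: p[k]) for p in mp] for mp in probs]
            for rule, pred in (("hard", hard_vote(labels)), ("soft", soft_vote(probs))):
                acc = accuracy(pred, gold)
                if acc > best[0]:  # strict comparison keeps the first best combination
                    best = (acc, subset, rule)
    return best


# Hypothetical per-model class probabilities on a four-sample development set.
toy_gold = [0, 1, 1, 0]
toy_probs = {
    "a": [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]],
    "b": [[0.6, 0.4], [0.6, 0.4], [0.1, 0.9], [0.8, 0.2]],
    "c": [[0.4, 0.6], [0.3, 0.7], [0.3, 0.7], [0.45, 0.55]],
}
result = best_ensemble(toy_probs, toy_gold)
print(result)
```

In practice the selection would need to run on held-out development data rather than on the test set, since choosing a combination by its score on the same data it is evaluated on inflates the reported accuracy.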

Anthology ID: 2024.semeval-1.121
Volume: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue: SemEval
SIG: SIGLEX
Publisher: Association for Computational Linguistics
Pages: 845–852
URL: https://aclanthology.org/2024.semeval-1.121
DOI: 10.18653/v1/2024.semeval-1.121
Cite (ACL): Alberto Rodero Peña, Jacinto Mata Vazquez, and Victoria Pachón Álvarez. 2024. I2C-Huelva at SemEval-2024 Task 8: Boosting AI-Generated Text Detection with Multimodal Models and Optimized Ensembles. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 845–852, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): I2C-Huelva at SemEval-2024 Task 8: Boosting AI-Generated Text Detection with Multimodal Models and Optimized Ensembles (Rodero Peña et al., SemEval 2024)
PDF: https://aclanthology.org/2024.semeval-1.121.pdf
Supplementary material: 2024.semeval-1.121.SupplementaryMaterial.txt
