Papers by Efstathios Stamatatos
We provide you with a training corpus that consists of suspicious documents. Each suspicious document is about a specific topic and may consist of plagiarized passages obtained from web pages on that topic found in the ClueWeb09 corpus.
We provide you with a training corpus that consists of pairs of documents, one of which may contain passages of text reused from the other. The reused text is subject to various kinds of (automatic) obfuscation to hide the fact that it has been reused.
We provide you with a training data set that consists of documents written in both English and Spanish. With regard to age, we will consider posts of three classes: 10s (13-17), 20s (23-27), and 30s (33-47). Moreover, documents from authors who pretend to be minors will be included (e.g., documents composed of chat lines of sexual predators will also be considered).
We provide a collection of (up to 50) short documents (paragraphs extracted from larger documents); the task is to identify authorship links and groups of documents by the same author. All documents are single-authored, in the same language, and belong to the same genre. However, the topic or text length of documents may vary. The number of distinct authors whose documents are included in the collection is not given.
The Kluwer international series on information retrieval, 2019
PAN is a networking initiative for digital text forensics, where researchers and practitioners study technologies for text analysis with regard to originality, authorship, and trustworthiness. The practical importance of such technologies is obvious for law enforcement, cyber-security, and marketing, yet the general public needs to be aware of their capabilities as well to make informed decisions about them. This is particularly true since almost all of these technologies are still in their infancy, and active research is required to push them forward. Hence PAN focuses on the evaluation of selected tasks from digital text forensics in order to develop large-scale, standardized benchmarks and to assess the state of the art. In this chapter we present the evolution of three shared tasks: plagiarism detection, author identification, and author profiling.
Language Resources and Evaluation, Jan 29, 2011
Pattern Recognition Letters, Jul 1, 2020
Lecture Notes in Computer Science, 2022
Lecture Notes in Computer Science, 2020
Lecture Notes in Computer Science, 2021
Communications in Computer and Information Science, 2007
Proceedings of the 28th international conference on Software engineering, 2006
Lecture Notes in Computer Science, 2006
Lecture Notes in Computer Science, 2013