Author identification using Latent Dirichlet Allocation

H Calvo, Á Hernández-Castañeda… - … Linguistics and Intelligent …, 2017 - Springer
International Conference on Computational Linguistics and Intelligent Text …, 2017 - Springer
Abstract
We address the author identification task of PAN 2015 with a Latent Dirichlet Allocation (LDA) model. This method takes into account the vocabulary and the context of words at the same time and, through a statistical process, estimates to what extent the relations between words hold in each document. Processing a set of documents with LDA returns a set of topic distributions; each distribution can be seen as a feature vector and as a fingerprint of its document within the collection. We then applied a Naïve Bayes classifier to the resulting patterns, with varying performance. We obtained state-of-the-art performance for English, surpassing the best FS score reported at PAN 2015, while results for the other languages were mixed.
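The pipeline described in the abstract (LDA topic distributions used as per-document feature vectors, followed by a Naïve Bayes classifier) can be illustrated with a minimal sketch. It is not the authors' implementation. The collapsed Gibbs sampler, the Gaussian variant of Naïve Bayes, the hyperparameters (`alpha`, `beta`, `n_topics`, `n_iter`), the function names, and the toy documents and author labels are all assumptions made for illustration. For simplicity, the topic model is fitted on all documents at once, including the one to be classified.

```python
import math
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]   # word counts per topic
    topic_total = [0] * n_topics                            # tokens assigned to each topic
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed, normalised topic proportions: the per-document "fingerprint".
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Fit per-class log prior, feature means and (floored) feature variances."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(cols, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label with the highest log posterior for feature vector x."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical toy corpus: four labelled documents and one unlabelled document.
docs = [
    "the ship left the harbour and the captain watched the sea".split(),
    "the captain steered the ship across the stormy sea".split(),
    "rain fell on the garden and the flowers opened".split(),
    "the garden flowers need rain and sun".split(),
    "the sea was calm and the ship sailed on".split(),
]
labels = ["author_a", "author_a", "author_b", "author_b"]

theta = lda_gibbs(docs, n_topics=2)           # one topic distribution per document
model = fit_gaussian_nb(theta[:4], labels)    # train on the labelled documents
predicted = predict_gaussian_nb(model, theta[4])
print(predicted)
```

Each row of `theta` sums to one and plays the role of the document's feature vector. With a corpus this small the prediction is not meaningful; the sketch only shows how the two stages connect.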