A Dataset for the Validation of Truth Inference Algorithms Suitable for Online Deployment

Wang, Fei; Liu, Haoyu; Bi, Haoyang; Shen, Xiangzhuang; Zhu, Renyu; Wu, Runze; Lin, Minmin; Lv, Tangjie; Fan, Changjie; Liu, Qi; Huang, Zhenya; Chen, Enhong

arXiv:2403.08826 (cs)

[Submitted on 10 Mar 2024]

Abstract: For the purpose of efficient and cost-effective large-scale data labeling, crowdsourcing is increasingly being utilized. To guarantee the quality of data labeling, multiple annotations need to be collected for each data sample, and truth inference algorithms have been developed to accurately infer the true labels. Despite previous studies having released public datasets to evaluate the efficacy of truth inference algorithms, these have typically focused on a single type of crowdsourcing task and neglected the temporal information associated with workers' annotation activities. These limitations significantly restrict the practical applicability of these algorithms, particularly in the context of long-term and online truth inference. In this paper, we introduce a substantial crowdsourcing annotation dataset collected from a real-world crowdsourcing platform. This dataset comprises approximately two thousand workers, one million tasks, and six million annotations. The data was gathered over a period of approximately six months from various types of tasks, and the timestamps of each annotation were preserved. We analyze the characteristics of the dataset from multiple perspectives and evaluate the effectiveness of several representative truth inference algorithms on this dataset. We anticipate that this dataset will stimulate future research on tracking workers' abilities over time in relation to different types of tasks, as well as enhancing online truth inference.
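
As an illustration of what truth inference over timestamped annotations involves, the sketch below implements majority voting, a common baseline, and is not necessarily one of the algorithms evaluated in the paper. The record layout `(worker_id, task_id, label, timestamp)` and the function name `majority_vote` are assumptions made for this example; they do not reflect the dataset's actual schema.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Infer one label per task by taking the most frequent worker label.

    `annotations` is an iterable of (worker_id, task_id, label, timestamp)
    tuples. This layout is hypothetical, not the released dataset's schema.
    """
    votes = defaultdict(Counter)
    # Replay annotations in time order, as an online system would receive them.
    for _worker_id, task_id, label, _timestamp in sorted(
        annotations, key=lambda record: record[3]
    ):
        votes[task_id][label] += 1
    # On a tie, most_common returns the label that was seen first.
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


# Toy data invented for this example.
annotations = [
    ("w1", "t1", "cat", 1),
    ("w2", "t1", "dog", 2),
    ("w3", "t1", "cat", 3),
    ("w1", "t2", "dog", 4),
    ("w2", "t2", "dog", 5),
]
inferred = majority_vote(annotations)
```

Majority voting treats every worker as equally reliable. Tracking how each worker's ability changes over time and across task types, the research direction the abstract points to, would replace the uniform vote with per-worker weights that are updated as new annotations arrive.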

Subjects:	Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Cite as:	arXiv:2403.08826 [cs.HC]
	(or arXiv:2403.08826v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2403.08826

Submission history

From: Fei Wang
[v1] Sun, 10 Mar 2024 16:00:41 UTC (1,524 KB)
