Towards the Identifiability in Noisy Label Learning: A Multinomial Mixture Approach

Nguyen, Cuong; Do, Thanh-Toan; Carneiro, Gustavo

Computer Science > Machine Learning

arXiv:2301.01405 (cs)

[Submitted on 4 Jan 2023 (v1), last revised 16 Apr 2023 (this version, v2)]

Abstract: Learning from noisy labels (LNL) plays a crucial role in deep learning. The most promising LNL methods rely on identifying clean-label samples in a dataset with noisy annotations. This identification is challenging because the conventional LNL problem, which assumes a single noisy label per instance, is non-identifiable, i.e., clean labels cannot be estimated theoretically without additional heuristics. In this paper, we formally investigate this identifiability issue using multinomial mixture models to determine the constraints that make the problem identifiable. Specifically, we find that the LNL problem becomes identifiable if there are at least $2C - 1$ noisy labels per instance, where $C$ is the number of classes. To meet this requirement without relying on $2C - 2$ additional manual annotations per instance, we propose a method that automatically generates additional noisy labels by estimating the noisy label distribution based on nearest neighbours. These additional noisy labels enable us to apply the Expectation-Maximisation algorithm to estimate the posterior probabilities of clean labels, which are then used to train the model of interest. We empirically demonstrate that our proposed method is capable of estimating clean labels without any heuristics on several label noise benchmarks, including synthetic, web-controlled, and real-world label noise. Furthermore, our method performs competitively with many state-of-the-art methods.
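
As a rough illustration of the Expectation-Maximisation step mentioned in the abstract, the following is a minimal sketch of textbook EM for a mixture of multinomials, written with NumPy. It is not the authors' implementation: the function name `em_multinomial_mixture`, the toy noise matrix `true_probs`, and the synthetic data are all assumptions made for this example, and the nearest-neighbour label generation step is not shown. Each row of `counts` tallies the $2C - 1$ noisy labels of one instance.

```python
import numpy as np


def em_multinomial_mixture(counts, num_components, num_iters=200, seed=0, eps=1e-12):
    """Generic EM for a mixture of multinomials (illustrative sketch only).

    counts: (M, C) array; row m counts how often each of the C classes
            appears among the noisy labels of instance m.
    Returns mixture weights (K,), component probabilities (K, C) and
    posterior responsibilities (M, K).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    num_classes = counts.shape[1]

    # Uniform mixture weights, random component probabilities.
    weights = np.full(num_components, 1.0 / num_components)
    probs = rng.dirichlet(np.ones(num_classes), size=num_components)

    for _ in range(num_iters):
        # E-step: posterior of each component given the counts.
        # The multinomial coefficient is identical for every component,
        # so it cancels in the normalisation and is omitted here.
        log_joint = np.log(weights + eps) + counts @ np.log(probs + eps).T
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights and component probabilities.
        weights = resp.mean(axis=0)
        probs = resp.T @ counts + eps
        probs /= probs.sum(axis=1, keepdims=True)

    return weights, probs, resp


# Toy data (hypothetical): C = 3 classes, N = 2C - 1 = 5 noisy labels per instance.
rng = np.random.default_rng(1)
C = 3
N = 2 * C - 1
true_probs = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
clean = rng.integers(0, C, size=300)
counts = np.stack([rng.multinomial(N, true_probs[c]) for c in clean])

weights, probs, resp = em_multinomial_mixture(counts, num_components=C)
```

Here `resp[m]` plays the role of the estimated posterior over clean labels for instance `m`. Mixture components are recovered only up to a permutation, so matching components to class indices needs extra care that this sketch does not attempt.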

Comments:	Clarify further the motivation, finding results and the method proposed
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2301.01405 [cs.LG]
	(or arXiv:2301.01405v2 [cs.LG] for this version)
 	https://doi.org/10.48550/arXiv.2301.01405

Submission history

From: Cuong Nguyen
[v1] Wed, 4 Jan 2023 01:54:33 UTC (214 KB)
[v2] Sun, 16 Apr 2023 07:48:11 UTC (219 KB)
