A Boosting Algorithm for Positive-Unlabeled Learning

Zhao, Yawen; Zhang, Mingzhe; Zhang, Chenhao; Chen, Weitong; Ye, Nan; Xu, Miao

arXiv:2205.09485 (cs)

[Submitted on 19 May 2022 (v1), last revised 7 Dec 2022 (this version, v4)]

Abstract: Positive-unlabeled (PU) learning deals with binary classification problems in which only positive (P) and unlabeled (U) data are available. Many recent PU methods are based on neural networks, but little has been done to develop boosting algorithms for PU learning, despite the strong performance of boosting algorithms on many fully supervised classification problems. In this paper, we propose a novel boosting algorithm, AdaPU, for PU learning. Like AdaBoost, AdaPU aims to optimize an empirical exponential loss, but the loss is based on the PU data rather than on positive-negative (PN) data. As in AdaBoost, we learn a weighted combination of weak classifiers by learning one weak classifier and its weight at a time. However, AdaPU requires a very different algorithm for learning the weak classifiers and determining their weights. This is because AdaPU learns each weak classifier and its weight from a weighted PN dataset in which some data weights are negative: the dataset is derived from the original PU data, and the data weights are determined by the current weighted classifier combination. Our experiments showed that AdaPU outperforms neural networks on several benchmark PU datasets, including a large-scale, challenging cyber security dataset.
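The abstract does not spell out the PU exponential loss or how the negative data weights arise, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the unbiased PU risk estimator that is common in the PU literature, with a known class prior `prior` = P(y = +1), and the exponential loss exp(-y F(x)) for the current combined score F(x). Under those assumptions, each positive example appears twice in the derived PN dataset: once labeled +1 with a positive weight and once labeled -1 with a negative weight. The function names and the row layout are hypothetical.

```python
import math


def pu_exponential_loss(scores_p, scores_u, prior):
    """Exponential loss estimated from PU data (assumed unbiased-estimator form).

    scores_p / scores_u: current combined scores F(x) on positive / unlabeled data.
    prior: assumed known class prior P(y = +1).
    """
    n_p, n_u = len(scores_p), len(scores_u)
    pos_as_pos = sum(math.exp(-s) for s in scores_p) / n_p  # P data, label +1
    unl_as_neg = sum(math.exp(s) for s in scores_u) / n_u   # U data, label -1
    pos_as_neg = sum(math.exp(s) for s in scores_p) / n_p   # P data, label -1
    # U mixes positives and negatives, so the positive share is subtracted out.
    return prior * pos_as_pos + unl_as_neg - prior * pos_as_neg


def derived_weighted_pn(scores_p, scores_u, prior):
    """Per-example (source, index, label, weight) rows; the weights sum to the loss."""
    n_p, n_u = len(scores_p), len(scores_u)
    rows = []
    for i, s in enumerate(scores_p):
        rows.append(("P", i, +1, prior / n_p * math.exp(-s)))
        # The correction term gives a negative weight on a "negative" copy.
        rows.append(("P", i, -1, -prior / n_p * math.exp(s)))
    for j, s in enumerate(scores_u):
        rows.append(("U", j, -1, 1.0 / n_u * math.exp(s)))
    return rows


# Illustrative scores only; these are not data from the paper.
rows = derived_weighted_pn([0.8, -0.2], [0.5, -1.0, -0.3], prior=0.4)
for source, index, label, weight in rows:
    print(source, index, label, round(weight, 4))
```

Because some weights are negative, the weighted error of a weak classifier on such a dataset is no longer confined to the range that standard AdaBoost relies on, which is consistent with the abstract's remark that a different procedure is needed for learning the weak classifiers and their weights.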

Comments:	17 pages, 24 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2205.09485 [cs.LG]
	(or arXiv:2205.09485v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2205.09485

Submission history

From: Yawen Zhao
[v1] Thu, 19 May 2022 11:50:22 UTC (1,739 KB)
[v2] Tue, 5 Jul 2022 07:16:41 UTC (4,458 KB)
[v3] Sat, 20 Aug 2022 04:35:38 UTC (9,042 KB)
[v4] Wed, 7 Dec 2022 05:26:44 UTC (8,491 KB)
