GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning

Xu, Minghao; Geng, Yunteng; Zhang, Yihang; Yang, Ling; Tang, Jian; Zhang, Wentao

arXiv:2405.16206 (cs)

[Submitted on 25 May 2024 (v1), last revised 1 Oct 2024 (this version, v3)]

Abstract: Glycans are basic biomolecules that perform essential functions within living organisms. The rapid growth of functional glycan data provides a good opportunity for machine learning solutions to glycan understanding. However, there is still no standard machine learning benchmark for glycan property and function prediction. In this work, we fill this gap by building a comprehensive benchmark for Glycan Machine Learning (GlycanML). The GlycanML benchmark consists of diverse types of tasks, including glycan taxonomy prediction, glycan immunogenicity prediction, glycosylation type prediction, and protein-glycan interaction prediction. In GlycanML, glycans can be represented both as sequences and as graphs, which enables us to extensively evaluate sequence-based models and graph neural networks (GNNs) on the benchmark tasks. Furthermore, by performing eight glycan taxonomy prediction tasks concurrently, we introduce the GlycanML-MTL testbed for multi-task learning (MTL) algorithms. We also evaluate how taxonomy prediction can boost the other three function prediction tasks through MTL. Experimental results show the superiority of modeling glycans with multi-relational GNNs, and show that suitable MTL methods can further boost model performance. We provide all datasets and source code at this https URL and maintain a leaderboard at this https URL.

Comments:	Research project paper. All code and data are released.
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2405.16206 [cs.LG]
	(or arXiv:2405.16206v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.16206

Submission history

From: Minghao Xu
[v1] Sat, 25 May 2024 12:35:31 UTC (874 KB)
[v2] Thu, 26 Sep 2024 07:32:09 UTC (876 KB)
[v3] Tue, 1 Oct 2024 05:14:15 UTC (876 KB)