Applying SoftTriple Loss for Supervised Language Model Fine Tuning

DOI: http://dx.doi.org/10.15439/2022F185

Citation: Proceedings of the 17th Conference on Computer Science and Intelligence Systems, M. Ganzha, L. Maciaszek, M. Paprzycki, D. Ślęzak (eds). ACSIS, Vol. 30, pages 141-147

Abstract. We introduce TripleEntropy, a new loss function based on cross-entropy and SoftTriple loss, to improve classification performance when fine-tuning general-knowledge pre-trained language models. This loss function improves on the strong RoBERTa baseline fine-tuned with cross-entropy loss by 0.02 to 2.29 percentage points. Thorough tests on popular datasets show a steady gain from our loss function. The fewer samples in the training dataset, the higher the gain: about 0.71 percentage points for small datasets, 0.86 percentage points for medium-sized datasets, 0.20 percentage points for large datasets, and 0.04 percentage points for extra-large datasets.
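To make the idea of combining the two losses concrete, here is a minimal pure-Python sketch for a single example. It assumes that TripleEntropy is a weighted sum (1 - beta) * CE + beta * SoftTriple. The weight `beta`, the function names, and the defaults `la=20.0`, `gamma=0.1`, `margin=0.01` are illustrative assumptions, not the paper's confirmed settings. SoftTriple is written in its usual form: each class has K learnable centers, the class similarity is a softmax-weighted mix of the cosine similarities to those centers, and a scaled, margin-adjusted cross-entropy is applied over classes. The SoftTriple regularizer that merges redundant centers is omitted here.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize so that dot products become cosine similarities
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Numerically stable -log softmax(logits)[label]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def soft_triple(embedding: Sequence[float],
                centers: Sequence[Sequence[Sequence[float]]],
                label: int, la: float = 20.0, gamma: float = 0.1,
                margin: float = 0.01) -> float:
    """SoftTriple loss for one example; centers[c][k] is the k-th center of class c."""
    x = _normalize(embedding)
    class_sims = []
    for class_centers in centers:
        dots = [_dot(x, _normalize(w)) for w in class_centers]
        # Softmax over the K centers of this class, with temperature gamma
        m = max(dots)
        weights = [math.exp((d - m) / gamma) for d in dots]
        total = sum(weights)
        class_sims.append(sum(w / total * d for w, d in zip(weights, dots)))
    # Subtract the margin only for the true class, then scale by la
    scaled = [la * (s - margin if c == label else s)
              for c, s in enumerate(class_sims)]
    return cross_entropy(scaled, label)


def triple_entropy(logits: Sequence[float], embedding: Sequence[float],
                   centers: Sequence[Sequence[Sequence[float]]],
                   label: int, beta: float = 0.5) -> float:
    # Hypothetical weighting: (1 - beta) * cross-entropy + beta * SoftTriple
    return ((1.0 - beta) * cross_entropy(logits, label)
            + beta * soft_triple(embedding, centers, label))
```

In a real fine-tuning setup, `logits` would come from the classification head and `embedding` from the encoder's pooled representation. Both terms would be computed on tensors so that the centers are learned by backpropagation. The plain lists here only show the arithmetic.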


