Dynamic Traceback Learning for Medical Report Generation

Ye, Shuchang; Meng, Mingyuan; Li, Mingjian; Feng, Dagan; Naseem, Usman; Kim, Jinman

arXiv:2401.13267 (cs)

[Submitted on 24 Jan 2024 (v1), last revised 7 Sep 2024 (this version, v3)]

Abstract: Automated medical report generation has the potential to significantly reduce the workload associated with the time-consuming process of medical reporting. Recent generative representation learning methods have shown promise in integrating vision and language modalities for medical report generation. However, when trained end-to-end and applied directly to medical image-to-text generation, they face two significant challenges: i) difficulty in accurately capturing subtle yet crucial pathological details, and ii) reliance on both visual and textual inputs during inference, leading to performance degradation in zero-shot inference when only images are available. To address these challenges, this study proposes a novel multi-modal dynamic traceback learning framework (DTrace). Specifically, we introduce a traceback mechanism to supervise the semantic validity of generated content and a dynamic learning strategy to adapt to various proportions of image and text input, enabling text generation without strong reliance on the input from both modalities during inference. The learning of cross-modal knowledge is enhanced by supervising the model to recover masked semantic information from a complementary counterpart. Extensive experiments conducted on two benchmark datasets, IU-Xray and MIMIC-CXR, demonstrate that the proposed DTrace framework outperforms state-of-the-art methods for medical report generation.
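The abstract does not state how the dynamic learning strategy chooses the proportions of image and text input, so the following is only a minimal, hypothetical Python sketch of one way to vary complementary masking ratios from step to step and record which positions a model would have to recover. The function names (`sample_ratios`, `mask_sequence`, `dynamic_masking_step`), the placeholder values, and the rule that the image mask ratio is the complement of the text mask ratio are all assumptions for illustration, not DTrace's published implementation; the traceback mechanism is not modeled here.

```python
import random
from typing import Any, List, Sequence, Tuple

TEXT_MASK = "[MASK]"  # hypothetical placeholder for masked report tokens
IMAGE_MASK = None     # hypothetical placeholder for masked image patches


def sample_ratios(rng: random.Random) -> Tuple[float, float]:
    """Assumed schedule: draw the text mask ratio uniformly from [0, 1) and
    set the image mask ratio to its complement, so one modality is hidden
    more heavily whenever the other is hidden less."""
    text_ratio = rng.random()
    return text_ratio, 1.0 - text_ratio


def mask_sequence(items: Sequence[Any], ratio: float, rng: random.Random,
                  mask_value: Any) -> Tuple[List[Any], List[int]]:
    """Replace round(ratio * len(items)) randomly chosen positions with
    mask_value. Returns the masked copy and the sorted masked indices,
    which would serve as recovery targets for a reconstruction loss."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    n_mask = round(ratio * len(items))
    targets = sorted(rng.sample(range(len(items)), n_mask))
    masked = list(items)
    for i in targets:
        masked[i] = mask_value
    return masked, targets


def dynamic_masking_step(report_tokens: Sequence[str],
                         image_patches: Sequence[Any],
                         rng: random.Random):
    """Inputs for one hypothetical training step: both modalities are masked
    with complementary ratios, so each must be recovered with help from the
    other (the 'complementary counterpart' described in the abstract)."""
    text_ratio, image_ratio = sample_ratios(rng)
    masked_text, text_targets = mask_sequence(report_tokens, text_ratio, rng, TEXT_MASK)
    masked_image, image_targets = mask_sequence(image_patches, image_ratio, rng, IMAGE_MASK)
    return masked_text, text_targets, masked_image, image_targets


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["no", "acute", "cardiopulmonary", "abnormality"]
    patches = list(range(16))  # stand-in patch ids, not real image features
    print(dynamic_masking_step(tokens, patches, rng))
```

Calling `mask_sequence(tokens, 1.0, rng, TEXT_MASK)` masks every report token, which mimics the image-only (zero-shot) inference setting the abstract describes; training across many sampled ratios is one plausible way a model could be exposed to that setting in advance.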

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2401.13267 [cs.CV]
	(or arXiv:2401.13267v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2401.13267

Submission history

From: Shuchang Ye
[v1] Wed, 24 Jan 2024 07:13:06 UTC (2,867 KB)
[v2] Wed, 6 Mar 2024 10:55:44 UTC (2,867 KB)
[v3] Sat, 7 Sep 2024 07:55:43 UTC (3,341 KB)
