BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining

Zhang, Zachariah; Liu, Jingshu; Razavian, Narges

Computer Science > Information Retrieval

arXiv:2006.03685 (cs)

[Submitted on 26 May 2020]

Abstract: Clinical interactions are initially recorded and documented in free-text medical notes. ICD coding is the task of classifying and coding all diagnoses, symptoms and procedures associated with a patient's visit. The process is often manual, extremely time-consuming and expensive for hospitals. In this paper, we propose a machine learning model, BERT-XML, for large-scale automated ICD coding from EHR notes, utilizing recently developed unsupervised pretraining that has achieved state-of-the-art performance on a variety of NLP tasks. We train a BERT model from scratch on EHR notes, learning a vocabulary better suited for EHR tasks and thus outperforming off-the-shelf models. We adapt the BERT architecture for ICD coding with multi-label attention. While other works focus on small public medical datasets, we have produced the first large-scale ICD-10 classification model, using millions of EHR notes to predict thousands of unique ICD codes.
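The multi-label attention idea in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, assuming a generic label-wise attention formulation: each ICD code gets its own attention query over the note's contextual token embeddings (e.g. BERT outputs), producing a code-specific note representation that is scored with an independent sigmoid. It is not the authors' implementation; the names `multi_label_attention`, `label_queries`, `label_weights` and `label_biases` are illustrative assumptions. Pure Python is used so the arithmetic stays visible.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_attention(
    token_states: Matrix,   # T x d token embeddings for one note (hypothetical input)
    label_queries: Matrix,  # L x d, one attention query per ICD code (hypothetical)
    label_weights: Matrix,  # L x d, one output weight vector per code (hypothetical)
    label_biases: Vector,   # L, one bias per code (hypothetical)
) -> Vector:
    """Return one probability per label (ICD code) for a single note."""
    dim = len(token_states[0])
    probs = []
    for u, w, b in zip(label_queries, label_weights, label_biases):
        # Attention distribution over tokens, specific to this label.
        alpha = softmax([dot(h, u) for h in token_states])
        # Label-specific note representation: attention-weighted sum of tokens.
        context = [
            sum(a * h[k] for a, h in zip(alpha, token_states)) for k in range(dim)
        ]
        # Independent sigmoid per label, since one note can carry many codes.
        probs.append(sigmoid(dot(w, context) + b))
    return probs


# Toy example: 3 tokens, embedding size 2, 2 labels.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.0, 0.0], [0.0, 0.0]]  # zero queries give uniform attention
weights = [[3.0, 0.0], [0.0, 0.0]]
biases = [0.0, 0.0]
print(multi_label_attention(tokens, queries, weights, biases))
```

Using a per-label sigmoid rather than a softmax across labels reflects that ICD coding is multi-label: a note may be assigned several codes at once, so each code's probability is scored independently.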

Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.03685 [cs.IR]
	(or arXiv:2006.03685v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2006.03685

Submission history

From: Zachariah Zhang
[v1] Tue, 26 May 2020 21:12:43 UTC (4,161 KB)
