Estimating Redundancy in Clinical Text

Searle, Thomas; Ibrahim, Zina; Teo, James; Dobson, Richard JB

doi:10.1016/j.jbi.2021.103938


arXiv:2105.11832 (cs)

[Submitted on 25 May 2021 (v1), last revised 26 Oct 2021 (this version, v2)]



Abstract: The current mode of use of Electronic Health Records (EHRs) elicits text redundancy. Clinicians often populate new documents by duplicating existing notes and then updating them accordingly. Data duplication can lead to propagation of errors, inconsistencies, and misreporting of care. Quantifying information redundancy can therefore play an essential role in evaluating innovations that operate on clinical narratives.
This work is a quantitative examination of information redundancy in EHR notes. We present and evaluate two strategies to measure redundancy: an information-theoretic approach and a lexicosyntactic and semantic model. We evaluate the measures by training large Transformer-based language models using clinical text from a large openly available US-based ICU dataset and a large multi-site UK-based Trust. Comparing the information-theoretic content of the trained models with open-domain language models shows that language models trained on clinical text are ~1.5x to ~3x less efficient than those trained on open-domain corpora. Manual evaluation shows a high correlation with lexicosyntactic and semantic redundancy, with averages of ~43% to ~65%.
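
To make the information-theoretic notion of redundancy concrete, the sketch below computes the classical Shannon redundancy of a token sequence, R = 1 - H / log2(|V|), from its empirical unigram distribution. This is an illustrative assumption and not the authors' method: the abstract describes estimates based on trained Transformer language models, which are not reproduced here. The function names `unigram_entropy_bits` and `shannon_redundancy` are hypothetical.

```python
import math
from collections import Counter


def unigram_entropy_bits(tokens):
    """Empirical Shannon entropy (bits per token) of the unigram distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def shannon_redundancy(tokens):
    """Redundancy R = 1 - H / H_max, where H_max = log2(vocabulary size).

    Illustrative assumption: a unigram model stands in for a trained
    language model. With fewer than two distinct tokens H_max is zero and
    R is undefined, so 0.0 is returned by convention in this sketch.
    """
    vocab_size = len(set(tokens))
    if vocab_size < 2:
        return 0.0
    return 1.0 - unigram_entropy_bits(tokens) / math.log2(vocab_size)


if __name__ == "__main__":
    # A note that repeats the same tokens is more redundant than a
    # uniformly distributed one under this simple measure.
    uniform = "a b a b".split()
    skewed = "a a a b".split()
    print(shannon_redundancy(uniform), shannon_redundancy(skewed))
```

Under this simple measure, a sequence whose tokens are uniformly distributed has zero redundancy, and redundancy grows as the distribution becomes more skewed toward repeated tokens.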

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2105.11832 [cs.CL]
	(or arXiv:2105.11832v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2105.11832
Journal reference:	JBI v124 (2021)
Related DOI:	https://doi.org/10.1016/j.jbi.2021.103938

Submission history

From: Thomas Searle
[v1] Tue, 25 May 2021 11:01:45 UTC (834 KB)
[v2] Tue, 26 Oct 2021 10:15:49 UTC (503 KB)
