Automatic Radiology Report Generation Based On Multi-View Image Fusion and Medical Concept Enrichment
1 Introduction
Medical images are widely used in clinical decision-making. For example, chest
x-ray images are used for diagnosing pneumonia and pleural effusion. The inter-
pretation of medical images requires extensive expertise and is prone to human
errors. Considering the demand for accurately interpreting large volumes of medical images within a short time, an automatic medical imaging report generation model can help alleviate the labor intensity involved in the task. In
this work, we aim to propose a novel medical imaging report generation model
focusing on radiology. To be more specific, the inputs of the proposed frame-
work are chest x-ray images under different views (frontal and lateral) based
on which radiology reports are generated accordingly. Radiology reports contain
information summarized by radiologists and are important for further diagnosis
and follow-up recommendations.
The problem setting is similar to image captioning, where the objective is
to generate descriptions for natural images. Most existing studies apply similar
structures including an encoder based on convolutional neural networks (CNN),
and a decoder based on recurrent neural networks (RNN) [11] which captures the
temporal information and is widely used in natural language processing (NLP).
Attention models have been applied in captioning to connect the visual contents
and semantics selectively [13]. More recently, studies on radiology report genera-
tion have shown promising results. To handle paragraph-level generation, a hier-
archical LSTM decoder has been applied to generate medical imaging reports [6]
incorporating visual and tag attentions. Xue et al. build an iterative decoder
with visual attentions to enforce the coherence between sentences [14]. Li et al.
propose a retrieval model based on extracted disease graphs for medical report
generation [7]. Medical report generation is different from image captioning in
that: (1) data in medical and clinical domains is often limited in scale, and thus it is difficult to obtain robust models for reasoning; (2) medical reports are paragraphs rather than single sentences as in image captioning, and conventional RNN decoders such as long short-term memory (LSTM) suffer from gradient vanishing over such long sequences; and (3) generating medical reports requires higher precision when used in practice, especially for medical content such as disease diagnoses.
We choose the widely used Indiana University Chest X-ray radiology report
dataset (IU-RR) [1] for this task. In most cases, radiology reports contain de-
scriptive findings in the form of paragraphs, and conclusive impressions in one or
a few sentences. To address the challenges mentioned above, we aim to improve
both the encoder and decoder in the following aspects:
First, we construct a multi-task scheme consisting of chest x-ray image classi-
fication and report generation. This strategy has been shown to be successful because the encoder is forced to learn radiology-related features for decoding [6].
Since the data scale of IU-RR is small, encoder pretraining is important in order
to obtain robust performance. Different from previous studies using ImageNet, which is collected for general-purpose object recognition, we pretrain with large-scale chest x-ray images from the same domain, namely CheXpert [5], to better capture domain-specific image features for decoding. Second, most previous
studies using chest x-ray images for disease classification and report generation
consider the frontal and lateral images from the same patient as two independent
cases [6,12]. We argue that lateral images contain complementary information to frontal images in the process of interpreting medical images. Such multi-view features should be synthesized selectively rather than contributing equally (via concatenation, mean, or sum) to the final results. Moreover, treating the views independently is likely to generate inconsistent results for the same patient based on images from different views.
Fig. 1. Overall framework of the proposed encoder and decoder with attentions. E, D, and D′ denote the encoder, sentence decoder, and word decoder, respectively.
2 Methodology
The encoder classifies the chest x-ray observations from both the frontal (f) and lateral (l) views and is trained with the multi-label classification loss in Equation 1, whose last term encourages consistent predictions between the two views of the same patient:

$$\mathcal{L}_I = -\sum_{v\in\{f,l\}}\sum_{i,j}\Big[y^{v}_{i,j}\log\hat{y}^{v}_{i,j} + \big(1-y^{v}_{i,j}\big)\log\big(1-\hat{y}^{v}_{i,j}\big)\Big] + \lambda\sum_i\big\|\hat{y}^{f}_{i}-\hat{y}^{l}_{i}\big\|^{2} \qquad (1)$$
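For illustration, a minimal PyTorch-style sketch of this loss is given below. The function name, tensor shapes, use of sigmoid probabilities, and summed reduction are assumptions of this sketch, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def multi_view_classification_loss(logits_f, logits_l, labels, lam=1.0):
    """Sketch of Equation 1: multi-label cross-entropy over both views plus a
    consistency penalty between frontal and lateral predictions.

    logits_f, logits_l: (batch, num_labels) raw scores for the frontal/lateral views
    labels:             (batch, num_labels) multi-hot ground-truth observations (float)
    lam:                weight of the cross-view consistency term (lambda in Eq. 1)
    """
    probs_f = torch.sigmoid(logits_f)
    probs_l = torch.sigmoid(logits_l)

    # Multi-label cross-entropy for each view, summed as in Equation 1.
    bce = F.binary_cross_entropy(probs_f, labels, reduction="sum") + \
          F.binary_cross_entropy(probs_l, labels, reduction="sum")

    # Encourage consistent predictions for the same patient across the two views.
    consistency = ((probs_f - probs_l) ** 2).sum()

    return bce + lam * consistency
```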
The decoder then generates radiology reports from the encoded features. It contains two layers: a sentence LSTM decoder that outputs sentence hidden states, and a word LSTM decoder that decodes the sentence hidden states into natural language. In this way, reports are generated sentence by sentence.
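To make the two-level decoding concrete, a simplified skeleton is sketched below. The module names, greedy argmax decoding, and fixed sentence/word limits are assumptions of this illustration; the multi-view fusion, attentions, and medical concepts described next are omitted here.

```python
import torch
import torch.nn as nn

class HierarchicalDecoder(nn.Module):
    """Illustrative sketch of a sentence-level LSTM driving a word-level LSTM."""

    def __init__(self, feat_dim, hidden_dim, vocab_size, max_sents=8, max_words=20):
        super().__init__()
        self.sent_lstm = nn.LSTMCell(feat_dim, hidden_dim)    # sentence decoder D
        self.word_lstm = nn.LSTMCell(hidden_dim, hidden_dim)  # word decoder D'
        self.word_out = nn.Linear(hidden_dim, vocab_size)
        self.max_sents, self.max_words = max_sents, max_words

    def forward(self, visual_context):
        # visual_context: (batch, feat_dim) fused multi-view image feature
        batch = visual_context.size(0)
        h_s = visual_context.new_zeros(batch, self.sent_lstm.hidden_size)
        c_s = visual_context.new_zeros(batch, self.sent_lstm.hidden_size)
        report = []
        for _ in range(self.max_sents):                  # sentence-level steps
            h_s, c_s = self.sent_lstm(visual_context, (h_s, c_s))
            h_w, c_w = h_s, torch.zeros_like(h_s)        # seed the word decoder with the sentence state
            sentence = []
            for _ in range(self.max_words):              # word-level steps
                h_w, c_w = self.word_lstm(h_s, (h_w, c_w))
                sentence.append(self.word_out(h_w).argmax(dim=-1))
            report.append(torch.stack(sentence, dim=1))  # (batch, max_words) token ids per sentence
        return report
```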
Sentence Decoder with Attentions: The sentence decoder is fed with visual
features extracted from the encoder, and generates sentence hidden states. Since
we have both frontal and lateral features, the selection of fusion schemes is
important. As shown in Figure 2, we propose and compare three fusion schemes: an intuitive solution that directly concatenates the features from both views; early fusion, where the concatenated features are selectively attended by the previous hidden state; and late fusion, which fuses the hidden states produced by the two decoders after visual-sentence attention. To generate the sentence hidden state $h_{t_s}$ at time step $t_s \in (1, N_s)$, we compute the visual attention weights $\alpha_i$ with Equation 2, where $v_m$ is the $m$-th local feature and $W_a$, $W_v$, and $W_s$ are weight matrices. Leveraging all $k$ local regions, the attended local feature is calculated as $v_{att} = \sum_{m=1}^{k}\alpha_{i,m} v_m$; it is concatenated with the previous hidden state and fed into the sentence LSTM to compute the current hidden state $h_{t_s}$.
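A sketch of this visual-sentence attention is given below. Since Equation 2 itself is not reproduced in this excerpt, the additive score $W_a\tanh(W_v v_m + W_s h_{t_s-1})$ used here is assumed by analogy with Equation 3, and the class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class VisualSentenceAttention(nn.Module):
    """Attends over k local visual features given the previous sentence hidden state.

    The additive score a_m = W_a tanh(W_v v_m + W_s h_prev) is an assumed form of
    Equation 2, mirroring the concept attention in Equation 3.
    """

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.W_v = nn.Linear(feat_dim, attn_dim, bias=False)
        self.W_s = nn.Linear(hidden_dim, attn_dim, bias=False)
        self.W_a = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, local_feats, h_prev):
        # local_feats: (batch, k, feat_dim); h_prev: (batch, hidden_dim)
        scores = self.W_a(torch.tanh(self.W_v(local_feats) +
                                     self.W_s(h_prev).unsqueeze(1)))  # (batch, k, 1)
        alpha = torch.softmax(scores, dim=1)                          # attention weights alpha
        v_att = (alpha * local_feats).sum(dim=1)                      # attended local feature v_att
        return v_att, alpha.squeeze(-1)
```

Under early fusion, one such module would attend over the concatenated frontal and lateral features; under late fusion, the frontal and lateral features would each be attended by their own sentence decoder and the resulting hidden states fused afterwards.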
Word Decoder with Attentions: Incorporated with the obtained medical concepts, the sentence hidden states are used as inputs to the word LSTM decoder. For each word decoding step, the previous word hidden state $\hat{h}_{t_w}$ for time step $t_w \in (1, N_w^{t_s})$ is used to generate the word distribution over the vocabulary and output the word with the highest score. The embedding $w_{t_w}$ of the predicted word $\hat{w}_{t_w}$ is then fused with the medical concepts to generate the next word hidden state. Given the medical concept embeddings $c \in \mathbb{R}^{n \times d_c}$ for the $n$ medical concepts of the $i$-th sample, and the predicted concept distribution $\hat{y}^c_i$, the attention weights over all medical concepts at time step $t_w$ are defined in Equation 3, where $W_{ac}$, $W_c$, and $W_w$ are weight matrices to be learned.
$$a^{c}_{i} = W_{ac}\tanh\big(\hat{y}^{c}_{i} W_c\, c + W_w \hat{h}_{t_w-1}\big), \qquad \alpha^{c}_{i} = \mathrm{softmax}(a^{c}_{i}) \qquad (3)$$
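One possible reading of Equation 3 is sketched below; here each projected concept embedding is weighted by its predicted score before the attention scoring, which is an interpretive assumption rather than the paper's exact formulation, and the returned attended context is likewise an assumption about how the weights are used downstream.

```python
import torch
import torch.nn as nn

class MedicalConceptAttention(nn.Module):
    """Attends over medical concept embeddings given the previous word hidden state.

    Weighting W_c(c) by y_hat is one reading of the term \hat{y}_i^c W_c c in
    Equation 3; the exact composition is an assumption of this sketch.
    """

    def __init__(self, concept_dim, hidden_dim, attn_dim):
        super().__init__()
        self.W_c = nn.Linear(concept_dim, attn_dim, bias=False)
        self.W_w = nn.Linear(hidden_dim, attn_dim, bias=False)
        self.W_ac = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, concept_emb, y_hat, h_prev):
        # concept_emb: (batch, n, concept_dim); y_hat: (batch, n); h_prev: (batch, hidden_dim)
        weighted = self.W_c(concept_emb) * y_hat.unsqueeze(-1)          # scale concepts by predicted scores
        scores = self.W_ac(torch.tanh(weighted +
                                      self.W_w(h_prev).unsqueeze(1)))   # (batch, n, 1)
        alpha = torch.softmax(scores, dim=1)                            # concept attention weights
        context = (alpha * concept_emb).sum(dim=1)                      # attended concept context
        return context, alpha.squeeze(-1)
```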
The word-level loss $\mathcal{L}_W$ is computed between the predicted word distribution $\hat{y}^{w}_{t_w}$ and the ground truth $y^{w}_{t_w}$ using Equation 4.

$$\mathcal{L}_W = -\sum_{t_s=1}^{N_s}\sum_{t_w=1}^{N^{t_s}_{w}} y^{w}_{t_w}\log\hat{y}^{w}_{t_w} \qquad (4)$$
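A minimal sketch of this word-level loss follows; the padded tensor layout and the dedicated padding index are assumptions of the illustration.

```python
import torch
import torch.nn.functional as F

def word_level_loss(word_logits, target_ids, pad_id=0):
    """Summed negative log-likelihood over all sentences and words (Equation 4).

    word_logits: (batch, n_sents, n_words, vocab_size) word decoder outputs
    target_ids:  (batch, n_sents, n_words) ground-truth token indices, padded with pad_id
    """
    vocab_size = word_logits.size(-1)
    log_probs = F.log_softmax(word_logits, dim=-1)
    # Padding positions are ignored so they do not contribute to the loss.
    return F.nll_loss(log_probs.reshape(-1, vocab_size),
                      target_ids.reshape(-1),
                      ignore_index=pad_id,
                      reduction="sum")
```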
3 Experiment
Data Collection: CheXpert [5] contains 224,316 multi-view chest x-ray images from 65,240 patients, labeled for 14 common radiographic observations. The observation labels (positive, negative, or uncertain) are extracted from the associated radiology reports using NLP tools. We inherit the uncertain predictions and visualize them to draw additional expert attention in practical use. An alternative dataset is ChestX-ray14 [12]; we chose CheXpert because its labeler is reported to be more reliable than that of ChestX-ray14 [5].
Since neither of the aforementioned datasets released radiology reports, we use IU-RR [1] for evaluating radiology report generation. For preprocessing, we first removed samples without multi-view images, and concatenated the “findings” and “impression” sections, because in some reports all content appears in only one of the two sections with the other left blank. We then filtered out reports with fewer than three sentences. In the end, we obtained 3,074 samples with multi-view images, of which 20% (615 samples/1,330 images) are used for testing and the remaining 80% (2,459 samples/4,918 images) are used for training and validation. For encoder fine-tuning, we extract the same 14 labels as [5] on IU-RR. For report parsing, we converted the texts to tokens, and added “⟨start⟩” and “⟨end⟩” to the beginning and end of each sentence, respectively. Low-frequency words (fewer than 3 occurrences) were dropped, and textual errors, caused by text being falsely recognized as confidential information during the original de-identification of IU-RR, were replaced with “⟨unk⟩”.
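The preprocessing above can be sketched as follows; the sentence-splitting heuristic, the dictionary keys, and the plain-ASCII special-token spellings are illustrative assumptions, and the replacement of de-identification artifacts with the unknown token is handled separately.

```python
import re
from collections import Counter

MIN_SENTS, MIN_FREQ = 3, 3

def preprocess_reports(reports):
    """reports: list of dicts with 'findings' and 'impression' text fields."""
    tokenized = []
    for r in reports:
        # Concatenate the findings and impression sections (either may be blank).
        text = " ".join(filter(None, [r.get("findings"), r.get("impression")]))
        # Naive sentence split; the exact tokenizer is not specified in the excerpt.
        sents = [s.strip().lower() for s in re.split(r"\.\s+", text) if s.strip()]
        if len(sents) < MIN_SENTS:          # drop reports with fewer than 3 sentences
            continue
        tokenized.append([["<start>"] + s.split() + ["<end>"] for s in sents])

    # Drop low-frequency words (fewer than 3 occurrences in the corpus).
    counts = Counter(w for rep in tokenized for sent in rep for w in sent)
    keep = {w for w, c in counts.items() if c >= MIN_FREQ}
    return [[[w for w in sent if w in keep] for sent in rep] for rep in tokenized]
```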
Radiology Report Generation: The evaluation metrics we use are BLEU [9],
METEOR [2], and ROUGE [8] scores, all of which are widely used in image cap-
tioning and machine translation tasks. We compared the proposed model with
several state-of-the-art baselines: (1) a visual-attention-based image captioning model (Vis-Att) [13]; (2) radiology report generation models, including a hierarchical decoder with co-attention (Co-Att) [6], a multimodal generative model with visual attention (MM-Att) [14], and knowledge-driven retrieval-based report generation (KERP) [7]; and (3) the proposed multi-view encoder with hierarchical decoder (MvH) model, the base model with visual attentions and early fusion (MvH+AttE), MvH with the late fusion scheme (MvH+AttL), and the combination of late fusion with medical concepts (MvH+AttL+MC). MvH+AttL+MC* is an oracle run based on ground-truth medical concepts and is considered the upper bound of the improvement from applying medical concepts. As shown in
Table 2, our proposed models generally outperform the state-of-the-art base-
lines. Compared with MvH, multi-view feature fusions by attentions (AttE and
AttL) yield better results. Applying medical concepts significantly improves the performance, especially on METEOR, because recall rises when more semantic information is provided directly to the word decoder, and METEOR weights recall more heavily than precision. However, the improvement is limited by prediction errors on the medical concepts, indicating that a better encoder would benefit the whole model by a large margin, as shown by MvH+AttL+MC*.
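For reference, a small example of computing BLEU with NLTK is shown below; the example sentences, tokenization, and smoothing choice are illustrative, and METEOR and ROUGE have analogous toolkit implementations.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the heart size is within normal limits .".split()
hypothesis = "heart size is normal .".split()

# Illustrative per-sentence BLEU-1..4; papers typically report corpus-level scores.
smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple([1.0 / n] * n)
    score = sentence_bleu([reference], hypothesis, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```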
Discussion: As Figure 3 shows, AttL (and the other baseline models) have difficulty generating abnormality descriptions and locations because, unlike our proposed model, no explicit abnormality information is involved in word-level decoding.
Not all the predicted medical concepts would necessarily appear in the generated
reports. On the other hand, the prediction errors from the encoder propagate,
such as predicting “right” instead of “right lung”, and affect the generated re-
ports, suggesting that a more accurate encoder is beneficial. Moreover, since there are no constraints on the sentence decoder during training, our model is likely to generate similar sentence hidden states. In this case, a stacked attention mechanism would help force the decoder to focus on different image
sub-regions. In addition, we observe that it is difficult for our model to generate
unseen sentences, and sometimes there are syntax errors. Such errors are due to the limited corpus scale of IU-RR, and we expect that exploring unpaired textual data to pretrain the decoder would address such limitations [3].
Fig. 3. An example report generated by the proposed model. The medical concepts marked red are false (positive/negative) predictions. The underlined sentences are abnormality descriptions. Uncertain predictions are visualized using Grad-CAM [10].
4 Conclusions
References
1. Demner-Fushman, D., Kohli, M.D., Rosenman, M.B., Shooshan, S.E., Rodriguez,
L., Antani, S.K., Thoma, G.R., McDonald, C.J.: Preparing a collection of radiology
examinations for distribution and retrieval. JAMIA 23(2), 304–310 (2016)
2. Denkowski, M., Lavie, A.: Meteor universal: Language specific translation evalua-
tion for any target language. In: Proceedings of the ninth workshop on statistical
machine translation. pp. 376–380 (2014)
3. Feng, Y., Ma, L., Liu, W., Luo, J.: Unsupervised image captioning. In: Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4125–
4134 (2019)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on Computer Vision and Pattern Recognition.
pp. 770–778 (2016)
5. Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H.,
Haghgoo, B., Ball, R., Shpanskaya, K., et al.: Chexpert: A large chest radiograph
dataset with uncertainty labels and expert comparison. arXiv:1901.07031 (2019)
6. Jing, B., Xie, P., Xing, E.P.: On the automatic generation of medical imaging
reports. In: Proceedings of the 56th Annual Meeting of the Association for Com-
putational Linguistics, ACL 2018, Melbourne, Australia. pp. 2577–2586 (2018)
7. Li, C.Y., Liang, X., Hu, Z., Xing, E.P.: Knowledge-driven encode, retrieve, para-
phrase for medical image report generation. arXiv:1903.10122 (2019)
8. Lin, C.Y.: Rouge: A package for automatic evaluation of summaries. In: Proceed-
ings of the ACL-04 Workshop. pp. 74–81. Association for Computational Linguis-
tics, Barcelona, Spain (July 2004)
9. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic
evaluation of machine translation. In: Proceedings of the 40th annual meeting on
association for computational linguistics. pp. 311–318 (2002)
10. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-
cam: Visual explanations from deep networks via gradient-based localization. In:
Proceedings of the IEEE International Conference on Computer Vision. pp. 618–
626 (2017)
11. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: A neural image
caption generator. arXiv:1411.4555 (2015)
12. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: Chestx-ray8:
Hospital-scale chest x-ray database and benchmarks on weakly-supervised classi-
fication and localization of common thorax diseases. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition. pp. 2097–2106 (2017)
13. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R.,
Bengio, Y.: Show, attend and tell: Neural image caption generation with visual
attention. arXiv:1502.03044 (2015)
14. Xue, Y., Xu, T., Long, L.R., Xue, Z., Antani, S.K., Thoma, G.R., Huang, X.: Mul-
timodal recurrent model with attention for automated radiology report generation.
In: Medical Image Computing and Computer Assisted Intervention 2018, Granada,
Spain, Proceedings, Part I. pp. 457–466 (2018)