Medical Paper
Abstract
Chest diseases are especially deadly because of their effects on the lungs. The lungs are vital organs in the human body, so any damage to them can be harmful. The main method used to diagnose conditions such as pneumonia and emphysema is the chest X-ray. Traditionally, radiologists thoroughly examine the X-ray image and document the patient's condition in a report. Applying the image captioning task in the medical field helps to reduce human error. Medical image captioning is the process of producing natural language descriptions for medical images. It can help doctors identify illnesses and comprehend medical situations. The target is to find abnormalities in a chest X-ray image. In this study, a medical image captioning model that builds on the advantages of EfficientNet and GRU is proposed, using an encoder-decoder architecture. This work applies a gamma correction technique to enhance the quality of the medical images, and Bidirectional Encoder Representations from Transformers (BERT) is used to extract textual features from the reports. The model is evaluated on a publicly available medical image dataset, the IU X-Ray dataset, which is associated with various chest and lung related diseases and contains 3955 radiology reports from 3955 patients along with 7470 X-ray images. The proposed model performs well on the BLEU and ROUGE-L performance metrics.
Introduction
In our routine life, evolving Computer Vision (CV) and Natural Language Processing (NLP) techniques play a crucial role in solving various real-time problems. One research area that is receiving a lot of attention is image captioning [25]. Humans can accurately describe the information in an image. Image captioning aims to recognise the objects, activities, and their relationships in an image in order to provide a syntactically and semantically correct visual description [26]. A family of convolutional neural networks is used to extract the visual features of an image, and a language generation model such as a Long Short-Term Memory network or a transformer-based model is used to generate text for the extracted visual features.
Medical image captioning
Chest diseases are especially deadly because of their effects on the lungs. The lungs are vital organs in the human body, so any damage to them can be harmful. The main method used to diagnose conditions such as pneumonia and emphysema is the chest X-ray. Traditionally, radiologists thoroughly examine the X-ray image and document the patient's condition in a report. To correctly read a chest X-ray, a radiologist needs the following abilities: a thorough understanding of the basic anatomy of the thorax as well as the physiology of various chest diseases; the capacity to analyze the radiograph by recognizing different patterns; the capacity to analyze and evaluate the evolution of chest X-rays over time and recognize any changes that might occur; knowledge of clinical presentation and history; and an understanding of the correlation with diagnostic outcomes. If a report is written by a physician with little experience, this complicated effort may lead to errors, and it can also be time-consuming and tiresome for a more experienced practitioner. Applying image captioning in the medical field to diagnose chest X-rays can reduce human error and save radiologists' time.
A state-of-the-art use of computer vision and natural language processing in the healthcare industry is medical image captioning. It entails the analysis and interpretation of medical images, including X-rays, MRIs, CT scans, and histopathological images, using sophisticated algorithms to produce captions that are both descriptive and clinically relevant [21]. Medical imaging is the process of creating visual representations of the body's interior for clinical analysis, as well as representations of how specific organs or tissues function; such images are frequently used in clinics and hospitals to diagnose illnesses and fractures. Medical professionals with specialized training read and interpret the medical images, and then communicate their findings about each body area examined through written medical reports. The objective of medical image captioning is to help medical professionals comprehend and interpret complex medical imagery more easily, which ultimately improves treatment planning and diagnostic accuracy. Medical image captioning supports diagnosis of the medical condition by providing a textual description of the medical image [32].
Conventional approaches to medical image analysis frequently depend on trained radiologists and clinicians to manually interpret the images. However, the need for more precise and effective diagnosis, along with the growing amount of medical imaging data, has prompted research into AI-driven solutions such as medical image captioning [22].
To generate diagnostic reports for a chest X-ray, a family of Convolutional Neural Networks is used to retrieve the visual features by recognizing patterns, structures, and abnormalities within the images, and a Recurrent Neural Network or transformer-based architecture is used to generate the description of the image. This enables the model to produce captions that are understandable by humans and provide clinically relevant details about the conditions or anomalies that are observed.
Medical image captioning has the potential to improve workflow for radiologists,
promote better professional communication among healthcare providers, and make diagnostic
information more easily accessible [29]. With its comprehensive explanations of various medical
conditions and their visual representations, it can also be a useful tool in medical education.
A key challenge for medical image captioning is the diversity and complexity of medical images, because the task requires an accurate, contextually relevant description that is true to the image. The capabilities of medical image captioning are being improved and expanded by ongoing research and AI advancements, adding to its increasing importance in the field of healthcare technology. In this research work, EfficientNetV2B0 is used as a feature extractor for detecting abnormalities in chest X-ray images and GRU is used to generate a description for each image.
Literature Survey
Image Captioning
The authors of [1] utilize an attention strategy for captioning images. They implement stimulus-driven attention to extract the color, position, and dimensions of objects in an image and concept-driven attention for classical question answering. For extracting feature vectors from an image they use VGG19, and an LSTM [30] is utilized for generating a description of the image. For this research work they used the Flickr30k and MSCOCO datasets. This method achieved a BLEU score of 0.6. The authors of [2] introduced a captioning model that extracts higher-level features in an image; the Flickr8k, Flickr30k, MSCOCO, PASCAL, and SBU datasets are utilized for this research work. They combine lower-level image features (color, shape of an object) with higher-level image features (e.g., a human in an image). VGG16 is utilized to extract feature vectors from an image and an LSTM generates a caption using the extracted VGG16 feature vector. This method achieved a BLEU score of 0.764. The authors of [3] proposed a Human-Centric Captioning Model (HCCM) that describes the relationship between humans and objects in an image. This model gives additional weight to human body parts. For this work they created a Human-Centric COCO (HC-COCO) dataset of 16,125 images and 78,462 captions, more than 70% of which describe human behavior. Faster R-CNN is used to extract human-activity-related features in an image and an LSTM is used for caption generation; the model gives more importance to humans in an image. An attention-based encoder-decoder architecture is proposed in [4]. In order to extract features from an image, a CNN (Xception) is integrated with YOLOv4 to extract object features, and a GRU is employed as the language generation model serving as the decoder. This study uses the Bahdanau soft attention mechanism; the deterministic attention mechanism is responsible for the model's overall smoothness and differentiability, and the machine translation quality was improved by utilizing the attention mechanism. They also proposed an importance factor, which prioritizes large objects in the foreground of a picture over small ones in the background. The authors of [5] introduced a Hybrid Attention Network that integrates both bottom-up attention and top-down attention to enhance the generated caption. An adaptive fusion technique is used for combining the attention mechanisms. Bottom-up attention detects the semantic objects in an image and reassigns the weights, which helps to avoid object hallucination, while top-down attention improves the quality of the generated caption. For extracting visual information from an image Faster R-CNN is utilized and an LSTM is used as the language generation model. This model gives more importance to human features in an image.
Medical image captioning
The authors of [9] present a complete Transformer-based report generation model called TrMRG (Transformer Medical Report Generator). A Vision Transformer is used to detect abnormalities found in chest X-ray images, and causal language modelling is utilized for generating the reports. For this work, the publicly available Indiana University Chest X-Ray dataset (IU X-Ray) is utilized and a BLEU score of 0.532 is achieved. The main limitation is that for normal cases the reports produced by TrMRG resemble the reference report, but for diseased cases the model frequently omits some crucial medical terminology; the primary cause is the IU dataset's unique medical vocabulary and very limited corpus size. To address the data-bias issue, the authors of [10] suggest a contrastive triplet network (CTN) based on the Transformer architecture for automated chest X-ray reporting. By comparing visual and semantic data between normal and abnormal cases using a triplet network, CTN effectively accentuates abnormalities. Enhancing the triplet comparison involves contrasting the visual and semantic embeddings between triplets in two distinct stages; in order to convert the report to a fixed-size vector in an embedded space for semantic comparison, a textual encoder based on the BERT architecture is pre-trained. The study in [11] presents a novel architecture for an X-ray report generator that uses a Multi-Head Attention (MHA) mechanism and enhances images during pre-processing. In the pre-processing stage, gamma correction is performed to improve the quality of the X-ray images and address noise introduced during the acquisition procedure. ChexNet is used to detect the defects in an image and BERT embeddings are used to preprocess the text; finally, the report is generated using a Long Short-Term Memory network. The authors of [12] present a complete deep neural network for CXR report generation that produces clinically useful radiological reports using contextual word representations. VGG16 is used to extract the visual features of an image. The BERT model is fed the text corpus taken from the dataset's ground-truth reports in order to produce word embeddings, and the extracted visual features and corresponding word embeddings are passed to an LSTM for report generation. Finally, sentiment analysis using DistilBERT is used to determine whether the sentences in the generated reports are positive or negative. The first encoder-to-decoder model for CXR report generation was used by Wang et al. [15]. The authors utilised multi-label abnormality classification (ChestX-ray14 abnormality labels) and multi-task learning of CXR report generation (IU X-ray). ResNet-50 and an LSTM with an attention mechanism are utilized for diagnosing a chest X-ray image. When compared to a general-domain image captioning method, evaluated using multiple NLG metrics, the suggested method fared better. The authors of [13] propose a semantic fusion network with a lesion-area detection model that detects the visual and pathological information in a medical image to overcome the drawbacks of existing research. The model learns to fuse pathological information from ResNet-50 into an LSTM for generating diagnostic reports; it gives accurate pathological information and higher metric scores. The research work in [16] utilizes a hybrid retrieval and reinforcement learning strategy aligned with the report's template. The sentence decoder generated a topic for reinforcement learning and training using a Template Database or Generation Module, resulting in a BLEU-4 value of 0.15. The work in [17] suggested using Clinical Finding Scores to evaluate report quality, taking into account medically abnormal and negative phrases.
EFFICIENTNETV1
EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth, width, and resolution using a compound coefficient [19]. Neural architecture search is used to design a baseline network, which is then scaled up to obtain a family of models, called EfficientNets, that achieve much better accuracy and efficiency than previous ConvNets. Resolution scaling refers to increasing the input image size; larger input images contain more fine-grained details. By increasing the depth, EfficientNet can capture more complex patterns and features in the data. Width scaling involves adjusting the number of channels (or neurons) in each layer of the network; wider networks have more channels, which allows them to capture more information at each layer. A compound scaling method is proposed to combine all three kinds of scaling. It is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns in the bigger image:
depth: d = α^φ, width: w = β^φ, resolution: r = γ^φ (1)
subject to α · β² · γ² ≈ 2 and α ≥ 1, β ≥ 1, γ ≥ 1, where φ is the compound coefficient and α, β, γ are constants determined by a small grid search.
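To make the compound scaling rule in Eq. (1) concrete, the following minimal Python sketch (not part of the original work) computes the depth, width, and resolution multipliers for a few values of the compound coefficient φ, using the α, β, γ constants reported for the EfficientNet baseline [19]:

```python
# Illustrative only: compound scaling of EfficientNet as in Eq. (1).
alpha, beta, gamma = 1.2, 1.1, 1.15   # grid-search constants from the EfficientNet paper

def compound_scale(phi: float):
    depth_mult = alpha ** phi          # more layers -> larger receptive field
    width_mult = beta ** phi           # more channels per layer
    resolution_mult = gamma ** phi     # larger input image
    return depth_mult, width_mult, resolution_mult

if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```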
EFFICIENTNETV2
EfficientNetV2 [18] improves on the original EfficientNet family by combining training-aware neural architecture search and scaling with progressive learning, and by replacing the early MBConv blocks with Fused-MBConv blocks, which yields smaller models and faster training. In this work, EfficientNetV2B0, the smallest member of the family, is used as the visual encoder.
PROPOSED METHODOLOGY
Let X ∈ ℝ^(C×H×W) represent the input image and Y the output feature map after passing through the convolutional layers, where C is the number of channels and H and W represent the image height and width.
I is the feature vector extracted from an input image X by EfficientNetV2B0, where I = {I_1, I_2, ..., I_k} and k is the number of elements in I. I_F is the feature vector extracted from the frontal view X_F of a chest X-ray, I_l is the feature vector extracted from the lateral view X_l, and Eff represents EfficientNetV2B0.
I_F = Eff(X_F) (2)
I_l = Eff(X_l) (3)
The output features from both views of the input image are combined:
Y = concat(I_F, I_l) (4)
The two feature vectors I_F and I_l, taken from the global pooling layer of the backbone, are concatenated.
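As a minimal sketch of this two-view encoder, the snippet below uses the Keras implementation of EfficientNetV2B0 with global average pooling to obtain a 1280-dimensional vector per view and concatenates them into the 2560-dimensional representation Y. The 224x224 input size and the use of ImageNet weights are assumptions for illustration, not details confirmed by the paper.

```python
import tensorflow as tf

IMG_SIZE = 224  # assumed input resolution

# Shared backbone: include_top=False with pooling="avg" yields a 1280-d vector per image.
backbone = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", pooling="avg"
)

frontal_in = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="frontal_view")
lateral_in = tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name="lateral_view")

i_f = backbone(frontal_in)                              # I_F: 1280-d frontal features
i_l = backbone(lateral_in)                              # I_l: 1280-d lateral features
y = tf.keras.layers.Concatenate()([i_f, i_l])           # Y: 2560-d joint representation

encoder = tf.keras.Model([frontal_in, lateral_in], y, name="xray_encoder")
encoder.summary()
```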
Text Preprocessing
Text preprocessing is a crucial step in medical image captioning. Indiana Chest X ray
dataset, one report is associated with more than one image and it has 8 feature which includes
image_id, indication, problem, MeSH, comparison, findings, impression, image view. Among 8
feature, finding feature is utilized for this work remaining features are dropped. Finding feature
contains useful information about X-ray for both normal and abnormal case. Preprocessing starts
with removing null values in finding features as it contains 512 null fields. Second step is to
remove additional spaces, punctuation and number. After that decontraction, that replaces word
from “won’t” to “will not” and removing stop words. Finally, tokenization that helps to
breakdown sentences into sub words and allocate a unique integer id as shown in Fig 5.
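The short Python sketch below illustrates these cleaning steps on a couple of made-up findings sentences. The contraction map, stop-word list, and example texts are assumptions, and word-level integer ids are shown here for simplicity, whereas the BERT embedding described next uses sub-word (WordPiece) tokenization.

```python
import re
import tensorflow as tf

CONTRACTIONS = {"won't": "will not", "can't": "can not"}   # assumed small map
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and"}  # assumed small list

def clean_finding(text: str) -> str:
    text = text.lower()
    for short, full in CONTRACTIONS.items():       # de-contraction
        text = text.replace(short, full)
    text = re.sub(r"[^a-z\s]", " ", text)           # drop punctuation and numbers
    text = re.sub(r"\s+", " ", text).strip()        # drop extra spaces
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

findings = [                                         # stand-in findings, nulls already dropped
    "The lungs are clear. No pleural effusion.",
    "Heart size is normal, no acute abnormality.",
]
cleaned = [clean_finding(t) for t in findings]

# Tokenization: map each word to a unique integer id.
vectorizer = tf.keras.layers.TextVectorization(standardize=None, output_mode="int")
vectorizer.adapt(cleaned)
print(vectorizer(cleaned))
```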
BERT is utilized for word embedding. BERT generates contextualized embeddings for each token; contextualized embeddings capture the meaning of a word based on its surrounding context in the text. For each token, BERT outputs a vector representation in a high-dimensional space. The GRU is a type of recurrent neural network designed to address the vanishing gradient problem of traditional RNNs, and it captures long-term dependencies in sequential data more effectively than traditional RNNs. It has two gates: the update gate controls how much past information should be retained, and the reset gate determines how much past information to forget.
The GRU is used as the main structure of the report generation model. The gates of the GRU control the flow of information, and the calculation procedure is as follows, where Y is the input visual feature vector, o_t is the output word at time step t, e_t is the word embedding of the input token at time step t, h_t is the hidden state at time step t, and ⊙ denotes element-wise multiplication:
z_t = sigmoid(W_z · [h_(t−1), e_t] + b_z) (5)
r_t = sigmoid(W_r · [h_(t−1), e_t] + b_r) (6)
h'_t = tanh(W · [r_t ⊙ h_(t−1), e_t] + b) (7)
h_t = (1 − z_t) ⊙ h_(t−1) + z_t ⊙ h'_t (8)
o_t = softmax(V · h_t) (9)
W_z, W_r, W are the weight matrices and b_z, b_r, b are the bias vectors, which are the learnable parameters of the GRU, and V projects the hidden state onto the vocabulary. A cross-entropy loss function is used to maximize the probability of generating the correct report.
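A minimal NumPy sketch of the update in Eqs. (5)-(9) is given below. The hidden and embedding sizes follow the implementation details (512 and 768); the vocabulary size, random weights, and the way the word embedding enters the recurrence are illustrative assumptions rather than the exact implementation used in this work.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

hidden, embed, vocab = 512, 768, 1000                   # vocab size is assumed
rng = np.random.default_rng(0)
W_z = rng.normal(size=(hidden, hidden + embed)); b_z = np.zeros(hidden)
W_r = rng.normal(size=(hidden, hidden + embed)); b_r = np.zeros(hidden)
W   = rng.normal(size=(hidden, hidden + embed)); b   = np.zeros(hidden)
V   = rng.normal(size=(vocab, hidden))

def gru_step(h_prev, e_t):
    x = np.concatenate([h_prev, e_t])
    z_t = sigmoid(W_z @ x + b_z)                                        # update gate, Eq. (5)
    r_t = sigmoid(W_r @ x + b_r)                                        # reset gate, Eq. (6)
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, e_t]) + b)      # candidate state, Eq. (7)
    h_t = (1 - z_t) * h_prev + z_t * h_tilde                            # new hidden state, Eq. (8)
    o_t = softmax(V @ h_t)                                              # word distribution, Eq. (9)
    return h_t, o_t

h, e = np.zeros(hidden), rng.normal(size=embed)
h, o = gru_step(h, e)                                                   # one decoding step
```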
Implementation Details
The captioning model is mainly implemented with PyTorch and TensorFlow and optimized via Adam [20]. EfficientNetV2B0 [18] is used to detect the visual features in the chest X-ray images. A 1280-dimensional feature vector is obtained from the frontal view of the chest X-ray and a 1280-dimensional feature vector is obtained from the lateral view. By concatenating the visual features from the two views, we obtain a 2560-dimensional feature vector I from the pooling layer before the classification layer. EfficientNetV2B0 is used for the proposed work because it gives better accuracy and faster training speed. For generating the findings for a chest X-ray image, a Gated Recurrent Unit, a type of recurrent neural network, is utilized. The dimensions of the hidden state and the word embedding in the GRU are 512 and 768, respectively. The dimension of the input in the Transformer is set to 512, and the number of heads is set to 8. We set the dropout probability to 0.1. The Adam optimizer is used to train the captioning model with a learning rate of 5e-4 under a cross-entropy loss. The batch size is 10, training lasts for 30 epochs, and the scheduled sampling probability is increased by 0.05 every 5 epochs.
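The optimizer, loss, batch size, and epoch settings above can be expressed in Keras as in the hedged sketch below. The tiny dense model and random tensors are placeholders so that the snippet runs on its own; in the actual work the model is the EfficientNetV2B0 + GRU captioner and the inputs come from the IU X-Ray pipeline.

```python
import tensorflow as tf

# Placeholder model standing in for the encoder-decoder captioner.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2560,)),                     # concatenated visual features
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(1000, activation="softmax"), # assumed vocabulary size
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),  # Adam, lr 5e-4
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),    # cross-entropy loss
)

x = tf.random.normal((100, 2560))                            # stand-in visual features
y = tf.random.uniform((100,), maxval=1000, dtype=tf.int32)   # stand-in next-word ids
model.fit(x, y, batch_size=10, epochs=30, verbose=0)         # batch 10, 30 epochs
```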
Experiments and analysis
Dataset Used
Medical image captioning datasets usually consist of medical images and corresponding reports. For this work, the IU Chest X-ray dataset is used [27]. It contains 7470 X-ray images and 3955 associated radiology reports from 3955 patients. The dataset has frontal and lateral views of the chest X-ray images, and more than one image can be associated with one report. Each report contains several features. The problem section indicates the abnormality found in the chest X-ray image. The indication section shows the symptoms of a disease reported by the patient. The comparison section provides information about a patient's previous medical treatment. The findings section describes the problems found in the image, as shown in Fig 6. The MeSH section is written by the radiology expert. The findings feature is utilized for this work; it contains 512 null fields. After eliminating the 512 null fields, 80% of the data is utilized for training and the remaining data is utilized for testing and validation.
Fig 6: Chest X-ray images and the findings feature in the IU chest X-ray dataset
Table 1: Dataset Description
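As a small illustration of the split described above, the sketch below drops rows with null findings and performs an 80% / 10% / 10% split. The tiny DataFrame stands in for the IU report table, and the even validation/test division is an assumption, since the paper only states that 80% is used for training and the remainder for testing and validation.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in report table (real data: 3955 reports, 512 of which have null findings).
reports = pd.DataFrame({
    "image_id": [f"CXR{i}" for i in range(12)],
    "findings": ["lungs are clear"] * 5 + [None, None] + ["no acute abnormality"] * 5,
})
reports = reports.dropna(subset=["findings"])            # remove reports with null findings

train_df, rest_df = train_test_split(reports, train_size=0.8, random_state=42)
val_df, test_df = train_test_split(rest_df, test_size=0.5, random_state=42)
print(len(train_df), len(val_df), len(test_df))
```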
Fig 7 shows reports generated by the proposed method on the IU chest X-ray dataset. In the figure, the generated report is the finding predicted by the proposed EfficientNetV2B0+GRU model and the actual report is the finding in the dataset.
Evaluation Metrics
BLEU: Bilingual Evaluation Understudy is one of the most popular and inexpensive methods used to measure the similarity between a generated sentence and the expected sentence [7]. It uses n-gram precision to compare the candidate (generated) sentence to one or more reference sentences and counts the number of matches, with n ranging from 1 to 4. Its score ranges from 0 to 1, and a high BLEU score indicates a better-quality caption. The advantage of BLEU is that it is easy to compute; its main disadvantage is that it measures the similarity between the generated and expected sentence rather than fluency, and each n-gram is weighted equally.
Let p_n be the modified n-gram precision score over the whole text corpus, A the candidate (generated) sentence, and B the reference sentence:
p_n = Σ_{C ∈ candidates} Σ_{gram_n ∈ C} Count_clip(gram_n) / Σ_{C' ∈ candidates} Σ_{gram_n ∈ C'} Count(gram_n) (10)
To compute the brevity penalty BP, let a be the candidate translation length, b the reference corpus length, and W_n positive weights summing to one:
BP = 1, if a > b
BP = e^(1 − b/a), if a ≤ b (11)
BLEU = BP · exp( Σ_{n=1}^{N} W_n log p_n ) (12)
log BLEU = min(1 − b/a, 0) + Σ_{n=1}^{N} W_n log p_n (13)
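The following minimal example computes BLEU@1 through BLEU@4 for a single generated finding with NLTK. The reference and candidate sentences are made up, and a smoothing function is used because short sentences often have zero higher-order n-gram matches.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the lungs are clear no pleural effusion".split()]   # made-up reference
candidate = "lungs are clear without pleural effusion".split()    # made-up candidate

smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple([1.0 / n] * n)          # uniform W_n over 1..n-grams
    score = sentence_bleu(reference, candidate, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU@{n}: {score:.3f}")
```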
CIDEr: Consensus-based Image Description Evaluation. It measures the similarity between a generated caption and the reference captions of an image [8]. CIDEr measures the generated caption's quality based on how closely it matches the typical descriptions provided by human annotators. It is highly correlated with human consensus scores and gives more weight to important n-grams. CIDEr_n is computed using the average cosine similarity between the candidate sentence and the reference sentences, which accounts for both precision and recall:
CIDEr_n(A, B) = (1/M) Σ_{i=1}^{M} [ g^n(A) · g^n(B_i) ] / ( ‖g^n(A)‖ ‖g^n(B_i)‖ ) (14)
where A is the candidate caption, B = {B_1, ..., B_M} is the set of reference captions, g^n(A) and g^n(B_i) are vectors formed by the Term Frequency-Inverse Document Frequency (TF-IDF) weights of all n-grams in A and B_i, ‖g^n(A)‖ and ‖g^n(B_i)‖ are the magnitudes of these vectors, and M is the number of reference captions.
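A simplified sketch of the per-n computation in Eq. (14) is shown below: TF-IDF n-gram vectors are built for the candidate and the references, and their cosine similarities are averaged. The full CIDEr metric additionally computes the TF-IDF weights over the whole reference corpus, clips counts, and averages over n = 1..4; the sentences here are made up.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def cider_n(candidate: str, references: list, n: int = 1) -> float:
    # TF-IDF n-gram vectors fitted on this small corpus (a simplification).
    vec = TfidfVectorizer(ngram_range=(n, n)).fit(references + [candidate])
    g_refs = vec.transform(references).toarray()
    g_cand = vec.transform([candidate]).toarray()[0]
    sims = []
    for g_b in g_refs:                                   # cosine similarity per reference
        denom = np.linalg.norm(g_cand) * np.linalg.norm(g_b)
        sims.append(g_cand @ g_b / denom if denom > 0 else 0.0)
    return float(np.mean(sims))                          # average over M references

print(cider_n("no acute cardiopulmonary abnormality",
              ["no acute cardiopulmonary findings", "heart size is normal"], n=1))
```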
ROUGE: Recall-Oriented Understudy for Gisting Evaluation [31]. ROUGE-N is a recall-oriented metric that counts the overlapping n-grams between the generated sentence and the reference sentences:
ROUGE-N = [ Σ_{B ∈ reference sentences} Σ_{gram_n ∈ B} Count_match(gram_n) ] / [ Σ_{B ∈ reference sentences} Σ_{gram_n ∈ B} Count(gram_n) ] (15)
where Count_match(gram_n) is the maximum number of n-grams co-occurring in the candidate sentence and a reference sentence, and Count(gram_n) is the number of n-grams in the reference sentence.
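Eq. (15) can be implemented directly as below. Note that this computes ROUGE-N n-gram recall, whereas the ROUGE-L score reported in the results is based on the longest common subsequence; the example sentences are made up.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate: str, references: list, n: int = 1) -> float:
    cand_counts = Counter(ngrams(candidate.split(), n))
    match, total = 0, 0
    for ref in references:
        ref_counts = Counter(ngrams(ref.split(), n))
        match += sum(min(c, cand_counts[g]) for g, c in ref_counts.items())  # Count_match
        total += sum(ref_counts.values())                                    # Count
    return match / total if total else 0.0

print(rouge_n("the lungs are clear", ["lungs are clear bilaterally"], n=1))
```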
Table 2 presents a performance analysis of the proposed model against existing works in terms of BLEU@N (where N = 1, 2, 3, 4), CIDEr, and ROUGE-L scores. The IU chest X-ray dataset is used, and findings are generated for the chest X-rays in the dataset. From this analysis, the proposed method achieves better scores than the existing research works, with about 0.458 for BLEU@1, 0.310 for BLEU@2, 0.221 for BLEU@3, 0.159 for BLEU@4, 0.348 for CIDEr, and 0.368 for ROUGE-L.
[Figure: bar chart of the proposed model's BLEU@1, BLEU@2, BLEU@3, BLEU@4, CIDEr, and ROUGE-L scores]
Conclusion
Thus, including image enhancement methods and integrating EfficientNetV2 with GRU provides a strong solution for medical image captioning that improves interpretability and accuracy. Combining the sequential learning skills of the GRU with the powerful feature extraction capabilities of EfficientNetV2 makes it possible for the model to extract detailed information from medical images and produce coherent textual descriptions. The model shows improved performance in producing meaningful and informative captions by combining EfficientNetV2 for feature extraction with GRU for sequential learning. More precise captioning is made possible by combining these architectures with gamma-corrected images. In future work, a transformer-based model will be used in order to generate better captions for the images.
References
[1] Songtao Ding, Shiru Qu, Yuling Xi, Shaohua Wan, Stimulus-driven and concept-driven analysis for image caption generation, Neurocomputing, https://doi.org/10.1016/j.neucom.2019.04.095, vol. 398, 2020, pp. 520-530.
[2] Songtao Ding, Shiru Qu, Yuling Xi, Arun Kumar Sangaiah, Shaohua Wan, Image caption generation with high-level image features, Elsevier: Pattern Recognition Letters, https://doi.org/10.1016/j.patrec.2019.03.021, vol. 123, 2019.
[3] Zuopeng Yang, Pengbo Wang, Tianshu Chu, Jie Yang, Human-Centric Image Captioning, Elsevier: Pattern Recognition, https://doi.org/10.1016/j.patcog.2022.108545, vol. 126, 2022.
[4] Muhammad Abdelhadie Al‐Malla, Assef Jafar and Nada Ghneim, Image captioning model using
attention and object features to mimic human image understanding, Springer: Journal of Big Data,
https://doi.org/10.1186/s40537-022-00571-w, 2022.
[5] Wenhui Jiang, Qin Li, Kun Zhan, Yumung Fang, Fei Shen, Hybrid attention network for image
captioning, Elsevier: Displays, https://doi.org/10.1016/j.displa.2022.102238, Vol 73, July 2022.
[6] Mohsan, M. M., Akram, M. U., Rasool, G., Alghamdi, N. S., Baqai, M. A. A., & Abbas, M. (2022).
Vision Transformer and Language Model Based Radiology Report Generation. IEEE Access, 11,
1814-1824.
[7] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, 2002, pp. 311–318.
[8] R. Vedantam, C. L. Zitnick, and D. Parikh, "CIDEr: Consensus-based image description evaluation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2015, pp. 4566–4575.
[9] Mohsan, M. M., Akram, M. U., Rasool, G., Alghamdi, N. S., Baqai, M. A. A., & Abbas, M. (2022).
Vision transformer and language model based radiology report generation. IEEE Access, 11, 1814-
1824.
[10] Yang, Y., Yu, J., Jiang, H., Han, W., Zhang, J., & Jiang, W. (2022). A contrastive triplet network
for automatic chest X-ray reporting. Neurocomputing, 502, 71-83.
[11] Tsaniya, H., Fatichah, C., & Suciati, N. (2024). Automatic radiology report generator using
transformer with contrast-based image enhancement. IEEE Access.
[12] Kaur, N., & Mittal, A. (2022). RadioBERT: A deep learning-based system for medical report
generation from chest X-ray images using contextual embeddings. Journal of Biomedical
Informatics, 135, 104220.
[13] Zeng, X., Wen, L., Xu, Y., & Ji, C. (2020). Generating diagnostic report for medical image by
high-middle-level visual information incorporation on double deep learning models. Computer
methods and programs in biomedicine, 197, 105700.
[14] Tharsanee, R. M., Soundariya, R. S., Kumar, A. S., Karthiga, M., & Sountharrajan, S. (2021).
Deep convolutional neural network–based image classification for COVID-19 diagnosis. In Data
Science for COVID-19 (pp. 117-145). Academic Press.
[15] Wang X, Peng Y, Lu L, Lu Z, Summers RM. TieNet: Text-image embedding network for
common thorax disease classification and reporting in chest X-Rays. In: 2018 IEEE/CVF conference
on computer vision and pattern recognition. IEEE; 2018, p. 9049–58.
http://dx.doi.org/10.1109/cvpr.2018.00943.
[16] Y. Li, X. Liang, Z. Hu, and E. P. Xing, ‘‘Hybrid retrieval-generation reinforced agent for medical
image report generation,’’ in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 1530–1540.
[17] G. Liu, T.-M. H. Hsu, M. McDermott, W. Boag, W.-H. Weng, P. Szolovits, and M. Ghassemi,
‘‘Clinically accurate chest X-ray report generation,’’ 2019, arXiv:1904.02633. [Online]. Available:
https://arxiv.org/abs/1904.02633.
[18] Tan, M., & Le, Q. (2021, July). Efficientnetv2: Smaller models and faster training.
In International conference on machine learning (pp. 10096-10106). PMLR.
[19] Tan, M., & Le, Q. (2019, May). Efficientnet: Rethinking model scaling for convolutional neural
networks. In International conference on machine learning (pp. 6105-6114). PMLR.
[20] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
[21] Ayesha, H., Iqbal, S., Tariq, M., Abrar, M., Sanaullah, M., Abbas, I., ... & Hussain, S. (2021).
Automatic medical image interpretation: State of the art and future directions. Pattern
Recognition, 114, 107856.
[22] Yang, S., Wu, X., Ge, S., Zhou, S. K., & Xiao, L. (2022). Knowledge matters: Chest radiology
report generation with general and specific knowledge. Medical image analysis, 80, 102510.
[23] Li, Y., Liang, X., Hu, Z., & Xing, E. P. (2018). Hybrid retrieval-generation reinforced agent for
medical image report generation. Advances in neural information processing systems, 31.
[24] Zhang, Y., Wang, X., Xu, Z., Yu, Q., Yuille, A., & Xu, D. (2020, April). When radiology report
generation meets knowledge graph. In Proceedings of the AAAI conference on artificial
intelligence (Vol. 34, No. 07, pp. 12910-12917).
[25] Jiang, W., Li, Q., Zhan, K., Fang, Y., & Shen, F. (2022). Hybrid attention network for image
captioning. Displays, 73, 102238.
[26] Sasibhooshan, R., Kumaraswamy, S., & Sasidharan, S. (2023). Image caption generation using
visual attention prediction and contextual spatial relation extraction. Journal of Big Data, 10(1), 18.
[27] Demner-Fushman, D., Kohli, M. D., Rosenman, M. B., Shooshan, S. E., Rodriguez, L., Antani,
S., ... & McDonald, C. J. (2016). Preparing a collection of radiology examinations for distribution and
retrieval. Journal of the American Medical Informatics Association, 23(2), 304-310.
[28] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., &
Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical
machine translation. arXiv preprint arXiv:1406.1078.
[29] Nicolson, A., Dowling, J., & Koopman, B. (2023). Improving chest X-ray report generation by
leveraging warm starting. Artificial intelligence in medicine, 144, 102633.
[30] J. Donahue et al., "Long-Term Recurrent Convolutional Networks for Visual Recognition and
Description," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp.
677-691, 1 April 2017, doi: 10.1109/TPAMI.2016.2599174.
[31] C. Lin, ‘‘Rouge: A package for automatic evaluation of summaries,’’ in Proc. Text
Summarization Branches Out, 2004, pp. 74–81.
[32] Wang, E. K., Zhang, X., Wang, F., Wu, T. Y., & Chen, C. M. (2019). Multilayer dense
attention model for image caption. IEEE Access, 7, 66358-66368.