
Attribution-NonCommercial-NoDerivs 2.0 South Korea

Users are free to:

- copy, distribute, transmit, display, perform, and broadcast this work,

under the following conditions:

Attribution. You must attribute the work to its original author.

NonCommercial. You may not use this work for commercial purposes.

NoDerivs. You may not alter, transform, or build upon this work.

- For any reuse or distribution, you must make clear the license terms that apply to this work.
- Any of these conditions may be waived with separate permission from the copyright holder.

Users' rights under copyright law are not affected by the above.

This is a human-readable summary of the license agreement (Legal Code).

Disclaimer
Master’s Thesis

Chest CT Image Segmentation


Using Deep Learning

Youngjun Lee

Department of Medical Imaging Engineering

Graduate School

Korea University

August 2023
Chest CT Image Segmentation
Using Deep Learning
by
Youngjun Lee

_____________________________________

under the supervision of Professor Han-Jeong Hwang

A thesis submitted in partial fulfillment of


the requirements for the degree of
Master of Science

Department of Medical Imaging Engineering

Graduate School
Korea University

August 2023


The thesis of Youngjun Lee has been approved
by the thesis committee in partial fulfillment of
the requirements for the degree of
Master of Science.

June 2023

__________________________
Committee Chair: Han-Jeong Hwang
__________________________
Committee Member: Chang-Hyun Oh
__________________________
Committee Member: Cheol E. Han


Chest CT Image Segmentation

Using Deep Learning


by Youngjun Lee

Department of Medical Imaging Engineering

under the supervision of Professor Han-Jeong Hwang

ABSTRACT

Automated deep learning segmentation of chest computed
tomography (CT) images can support treatment planning and diagnosis.
While segmentation studies of the lung have become increasingly
popular due to COVID-19, segmentation of other organs such as the
liver, sternum, and trachea-bronchus has not yet been extensively
researched. As a result, achieving high prediction accuracy for these
organs remains a challenge. This study explores the potential of deep
learning techniques for automatic segmentation of such organs using a
U-Net framework. Specifically, I implemented deep learning models that
segment chest CT images into four distinct organs: lung, liver,
sternum, and trachea-bronchus. The four models were trained on 698
chest CT images from seven patients. The training data comprised 80%
of the images, while the remaining 20% were used for testing. Each
model was trained for up to 500 epochs with early stopping, batch
sizes of 8 to 32, and learning rates of 1e-3 to 1e-5. The


hyperparameters were adjusted for each model to achieve optimal
performance.
To evaluate the performance of the model, the Dice similarity
coefficient (DSC) was utilized, which measures the similarity between
the predicted findings and the ground truth masks. In the test set, the
model achieved Dice coefficients of 0.9482 ± 0.1903 for lung, 0.8432 ±
0.4129 for liver, 0.8827 ± 0.3104 for sternum, and 0.7331 ± 0.2027 for
trachea-bronchus. In conclusion, the deep-learning models can be
utilized as diagnostic assistance tools for predicting organ locations and
may also serve as a secondary reference if the technology continues to
advance.

Keywords: Deep Learning, Multi-Organ Image Segmentation, X-ray
Computed Tomography (CT).


Chest CT Image Segmentation Using Deep Learning
(딥러닝을 이용한 흉부 컴퓨터 단층 촬영 영상 분할)

Youngjun Lee

Department of Medical Imaging Engineering

Advisor: Professor Han-Jeong Hwang

국문 초록 (Korean Abstract)

Deep learning-based automatic multi-organ image segmentation for the
diagnosis and treatment planning of various human organs has recently been
studied actively. In particular, image segmentation of the lung has been
investigated extensively because of COVID-19, whereas the segmentation of
other organs such as the liver, sternum, and trachea-bronchus has not yet
been widely researched. As a result, achieving high prediction accuracy
for these organs remains a difficult problem. This thesis presents an
automatic deep learning image segmentation method for the lung, liver,
sternum, and trachea-bronchus based on the U-Net framework. The deep
learning models were trained to segment chest CT images into four organs,
and a total of 698 chest CT images from seven patients were used to train
each organ model. Of these, 80% (583 images) were used as training data
and 20% (115 images) as test data. Each model was trained for up to 500
epochs with early stopping, with batch sizes from 8 to 32 and learning
rates from 1e-3 to 1e-5, and some hyperparameters were adjusted to improve
prediction accuracy. To evaluate model performance, the Dice similarity
score, which estimates the similarity between the predicted results and
the ground truth masks, was applied. On the test data, the models achieved
the following performance: 0.9482 ± 0.1903 (lung), 0.8432 ± 0.4129
(liver), 0.8827 ± 0.3104 (sternum), and 0.7331 ± 0.2027
(trachea-bronchus). The developed multi-organ image segmentation deep
learning models can be used as diagnostic aids and showed potential
applicability to various human organs beyond the chest.

Keywords: Deep Learning, Multi-Organ Image Segmentation, X-ray Computed Tomography


ACKNOWLEDGMENTS

You-Jin Jeong and several other students supported this work by performing
the manual segmentation of the multiple organs: lung, liver, sternum, and
trachea-bronchus. In addition, the results of this paper may serve as a
secondary reference for the related project at Korea University Anam
Hospital.


TABLE OF CONTENTS

ABSTRACT ................................................................................................. 3

국문 초록..................................................................................................... 5

ACKNOWLEDGMENTS............................................................................. 7

TABLE OF CONTENTS .............................................................................. 8

LIST OF TABLES........................................................................................ 9

LIST OF FIGURES .................................................................................... 10

CHAPTER 1. INTRODUCTION ................................................................ 11

CHAPTER 2. METHODS ..................................................... 14


2.1. Data Acquisition .................................................................................. 14
2.2. Data Pre-Processing ............................................................................. 14
2.2.1. 3D-Slicer ................................................................................. 15
2.3. Deep Learning U-Net .......................................................................... 17
2.3.1. Training the model ................................................................... 20
2.4. Evaluation Metrics ............................................................................... 23

CHAPTER 3. RESULTS ............................................................................ 25


3.1. Lung ................................................................................................... 25
3.2. Liver .................................................................................................... 28
3.3. Sternum ............................................................................................... 31
3.4. Trachea-Bronchus ................................................................................ 34

CHAPTER 4. DISCUSSION ...................................................................... 37

CHAPTER 5. CONCLUSION .................................................................... 40

REFERENCES ........................................................................................... 41


LIST OF TABLES

Table 1 .................................................................................................. 27
Table 2 .................................................................................................. 30
Table 3 .................................................................................................. 33
Table 4 .................................................................................................. 36


LIST OF FIGURES

Figure 1 ........................................................................................15
Figure 2 ........................................................................................16
Figure 3 ........................................................................................19
Figure 4 ........................................................................................22
Figure 5 ........................................................................................26
Figure 6 ........................................................................................29
Figure 7 ........................................................................................32
Figure 8 ........................................................................................35

CHAPTER 1. INTRODUCTION

Motivated by advancements in deep learning in computer vision, researchers are
exploring its potential in clinical imaging, particularly in medical environments
where tasks such as image enhancement, registration, synthesis, and segmentation
have been investigated (1-3). Multi-organ segmentation is a fundamental task that
can significantly improve clinical diagnosis by speeding up the segmentation process,
enhancing contour consistency, and facilitating interpretation (4).

Computed Tomography (CT) images provide valuable information for diagnosis
but often lack high-resolution detail in soft-tissue contrast (5, 6). Manual
segmentation of soft tissues in CT images, such as the lung lower boundary and
esophagus, can be challenging and prone to observer variability. Therefore, there is
a need to develop automated contouring systems that provide consistent and accurate
organ delineations. Previous studies have explored advanced architectures such as
HighRes3DNet for multi-organ segmentation in specific regions (7). However, in
this study, we focus on the basic U-Net architecture, which has shown strength and
applicability in the clinical field (8).

The objective of this research is to determine and compare the optimal iterations for
training deep learning models in four organ segmentation tasks: lung, liver, sternum,
and trachea-bronchus. By optimizing parameters such as iterations, batch size, and
learning rate, we aim to prevent underfitting and overfitting, thus improving the
performance and generalization capabilities of the models (9, 10). We also discuss
the challenges associated with segmenting other body organs and explore various
single-organ segmentation approaches and their performance evaluations.

Deep learning-based techniques have shown remarkable advancements in
medical image segmentation, especially in multi-organ segmentation, outperforming
traditional methods that rely on handcrafted features (11-13). Leveraging the ability
of deep learning models to extract meaningful features from clinical images, we
strive to develop customized approaches for accurate and consistent multi-organ
segmentation. The integration of deep learning techniques into medical image
analysis has the potential to revolutionize clinical practices and advance the field of
radiology (14, 15).

In conclusion, this study aims to contribute to the field of medical image
segmentation by exploring the potential of deep learning, specifically using the U-
Net architecture, in multi-organ segmentation tasks. By improving the accuracy and
efficiency of organ delineation, our research endeavors to enhance clinical diagnosis
and enable further advancements in medical imaging analysis.

A. Related work

In recent years, 3D fully convolutional networks (FCNs) have shown
promising results in segmenting multiple organs in volumetric medical
images. Roth et al. (16) present a multi-class 3D FCN approach trained on
manually labeled CT scans of seven abdominal structures. They propose a
two-stage, coarse-to-fine strategy in which the first-stage FCN roughly
delineates the organs with a binary mask, reducing the number of voxels
the second-stage FCN must classify. The authors use training and
validation sets of 281 and 50 clinical CT images, respectively, improving
the average Dice score per organ from 68.5% to 82.2%.

For another approach, Okada et al. (17) present a framework for the
automated segmentation of multiple organs in upper abdominal computed
tomography (CT) data. The framework was evaluated on 134 CT datasets from 86
patients obtained under six imaging conditions at two hospitals. The average
Dice coefficients for the liver, spleen, and kidneys were more than 92%,
indicating high segmentation accuracy for these organs. The pancreas and
gallbladder achieved Dice coefficients of approximately 73% and 67%
respectively, demonstrating reasonable segmentation performance. The
experimental results highlight the effectiveness of the proposed prediction-based
priors and their ability to adapt to various imaging conditions without the need
for supervised intensity information.

B. Our contribution

Compared to these related works, our study brings novelty by focusing
specifically on the U-Net architecture for integrated multi-organ
segmentation tasks, investigating the optimal training iterations, and
addressing the challenges of segmenting various organs beyond the lung.

CHAPTER 2. METHODS

2.1. Data Acquisition

The dataset used in this study was obtained from the Medical Imaging Data
Resource Center (MIDRC) – RSNA International COVID-19 Open Radiology
Database (RICORD). The data used for this paper comprise 698 images from 7
of the 230 patients whose chest computed tomography (CT) scans were
retrospectively collected, saved in DICOM format. The relevant chest CT
scanning parameters were as follows: tube voltage, 140 kV; tube current,
385 mA; scan options, helical mode; slice thickness, 3 mm; spacing between
slices, 3 mm; collimation, 80×0.625 mm; matrix, 512×512; scan range, from
the lung apex to the middle of the liver; convolution kernel, standard;
window width, 400 HU with a window center of 40 HU.

2.2. Data Pre-Processing

For each slice, a series of pre-processing steps was applied. The first
step was to segment the lung, liver, trachea-bronchus, and sternum regions
in the CT data using the 3D Slicer software; these organs differ from one
another in their CT values, as shown in Fig. 1. The second step was to
convert the DICOM data into normalized NumPy arrays, applying DICOM
metadata such as RescaleSlope and RescaleIntercept. All data received
directly from the PACS were used for training and testing without any
equalization techniques, to preserve the raw clinical state of the data
and to avoid artificially enhancing local anatomical resolution (Fig. 2).

2.2.1. 3D Slicer

3D Slicer is a free, open-source software application for visualizing
and analyzing medical images and for research on clinical image-guided
therapy. It facilitates interactive organ segmentation and registration.
This study used 3D Slicer to manually label the lung, liver,
trachea-bronchus, and sternum, generating 583 ground-truth masks labeled
"lung", "liver", "trachea-bronchus", and "sternum", respectively.

Fig. 1. Segmentation task for generating ground-truth masks from chest CT
images using 3D Slicer, with the scan range set from the clavicle to the
middle of the liver region, and 3D visualization of the multiple body
organs (lung, liver, sternum, and trachea-bronchus). The first plane of
each image presents a 3D rendering of the organ; the second to fourth
planes show the axial, coronal, and sagittal views, in that order.

Fig. 2. Verification, before training, that each original chest CT image matches its produced ground-truth
mask. For each organ image, the first column shows the original image and the second shows the original
image overlaid with the ground-truth mask colored red. This check successfully matched the two images.

2.3. Deep Learning U-Net

The U-Net architecture, introduced by Ronneberger et al. in 2015, is a widely
used convolutional neural network (CNN) model specifically designed for image
segmentation tasks. It derives its name from its characteristic U-shaped structure,
which comprises an encoder path and a corresponding decoder path.
The U-Net architecture consists of three main sections: encoding, connection,
and decoding. In the encoding section, the input image undergoes a series of
convolutional layers followed by the Rectified Linear Unit (ReLU) activation
function as shown in Eq. 1, which introduces non-linearity and captures complex
relationships between image features (18). The ReLU activation function
is simple and widely used: it returns the input value if it is positive
and zero otherwise. ReLU is computationally efficient and does not suffer
from vanishing gradients for positive inputs. Vanishing gradients occur
when, during backpropagation in a deep neural network, the gradients
shrink toward the input layers, ultimately preventing the model from
reaching an optimum. Batch normalization is applied after each
convolutional layer to enhance training stability and convergence speed.

f(x) = \max(0, x) = \frac{x + |x|}{2} =
\begin{cases} x, & x > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (1)

where x is the input to a neuron. To capture hierarchical information and
downsample the feature maps, max pooling with a stride of 2 is used instead of
traditional pooling operations. This downsampling process reduces the spatial
dimensions while preserving crucial features. In the connection section, the paths
from the encoding and decoding sections are merged, allowing the model to
effectively combine low-level and high-level features and make accurate
predictions by leveraging both local and global information.
The decoding section restores the feature maps to the original image size
using convolutional transpose layers (deconvolution layers). These layers
increase the spatial dimensions while learning to fill in missing details.
Concatenation is performed between the upsampled feature maps and the
corresponding feature maps from the encoding section to facilitate information
flow.
At the final stage of the decoding section, a sigmoid activation function
is applied to the output feature map (19), producing pixel-wise
probabilities ranging from 0 to 1, as described in Eq. 2. The sigmoid
function is a smooth S-shaped curve that takes any input value and
squashes it between 0 and 1. It is commonly used in binary classification
problems to produce probabilities indicating the likelihood of a certain
class. However, it can have issues with extremely positive or negative
input values, which may cause gradients to vanish during training.

\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (2)

These probabilities represent the likelihood of each pixel belonging to the
segmented organ. During training, the U-Net model utilizes the binary cross-
entropy loss function and is optimized using the Adam optimizer.
The U-Net architecture's flexibility, hierarchical feature extraction, and
incorporation of skip connections make it well-suited for accurate and
detailed organ segmentation in medical imaging. Its ability to capture both local
and global information contributes to its effectiveness in challenging
segmentation tasks.
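
To make the structure concrete, the following is a minimal Keras sketch of
the U-Net described above. Only two encoder/decoder levels are shown, and
the layer widths are illustrative assumptions rather than the exact
configuration trained in this thesis:

import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions, each followed by batch normalization and ReLU (Eq. 1).
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    return x

def build_unet(input_shape=(512, 512, 1)):
    inputs = layers.Input(input_shape)
    # Encoding path: convolutions plus 2x2 max pooling (stride 2).
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)
    # Bottleneck.
    b = conv_block(p2, 128)
    # Decoding path: transposed convolutions with skip connections.
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.Concatenate()([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.Concatenate()([u1, c1]), 32)
    # Pixel-wise probabilities in [0, 1] via the sigmoid of Eq. 2.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)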

Fig. 3. The deep learning U-Net structure used for the 2D input images, which consist of the four organs'
true images and ground-truth masks. Based on the 2D input images, the structure progresses through three
phases (encoding, skip connection, and decoding) with the key activation functions, ReLU and the sigmoid layer.

2.3.1. Training the model

All four models in this study were developed to segment thoracic CT
images of four body organs – lung, liver, trachea-bronchus, and
sternum – and were trained on an NVIDIA Tesla P4 GPU provided by
Google Colab, using TensorFlow and Keras. Keras, a powerful neural
network API that runs on top of TensorFlow, lets researchers run
computational experiments easily and quickly without a multitude of
code. Depending on the configuration, training for up to 500 epochs
took between 30 minutes and three hours on average, using the Adam
optimizer and binary cross-entropy loss with learning rates of 1e-3 to
1e-5 and batch sizes of 8 to 32, subject to early stopping (patience
of 10). In detail, Adam
(short for Adaptive Moment Estimation) is a popular optimization algorithm
commonly used in deep learning, given by Eq. 3. It combines the benefits of both
the AdaGrad and RMSprop optimizers. The Adam optimizer adjusts the
learning rate for each parameter based on the average of past gradients and
their squared gradients (20). This adaptive learning rate helps the optimizer
converge faster and more effectively navigate different parts of the parameter
space. It also includes momentum, which helps smooth the optimization
process and prevents getting stuck in local optima. The binary cross-entropy
loss function is commonly used in binary classification tasks (21), where the
goal is to classify inputs into one of two classes (e.g., 0 or 1, positive or
negative), as given by Eq. 4. It measures the dissimilarity between the predicted
probabilities and the true binary labels. The loss function calculates the
average of the individual losses for each prediction, penalizing larger
differences between predicted and true labels more heavily. It encourages the
model to adjust its parameters to minimize the overall loss and improve the

accuracy of binary classification. Also, the main purpose of EarlyStopping is
to stop the training process if the model's performance on the validation set
does not improve or starts deteriorating. It helps prevent the model from
continuously optimizing its parameters on the training data to the point where
it becomes too specialized and fails to generalize well to unseen data.

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t \qquad (3)

where \theta_t denotes the model parameters at step t, \eta the learning rate,
and \hat{m}_t and \hat{v}_t the bias-corrected estimates of the first and
second moments of the gradients.

L = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \cdot \log \hat{y}_i + (1 - y_i) \cdot \log(1 - \hat{y}_i) \right) \qquad (4)

For the coarse stage and the fine stage, respectively, model inference
on the 115 test images took an average of three and seven seconds. The
training dataset comprises the original thoracic CT images and the
manually segmented masks for the four groups – lung, liver,
trachea-bronchus, and sternum. Each organ dataset contains 583 pairs of
binary ground-truth masks and unmasked thoracic images, and it was
split into training and validation data: 80% of the total (466 images)
was used for training, and the remaining 20% (117 images) for
validation. The images were reshaped to 512 × 512 and trained in
batches to support faster training without requiring additional
computational resources, as shown in Fig. 4.

Fig. 4. Training and testing framework. Each multi-organ deep learning model – lung, liver,
trachea-bronchus, sternum – was trained on the U-Net architecture with various numbers of iterations. The
models were trained with hyperparameters of batch size (8 to 32), optimizer (Adam), early stopping
(patience of 10), and loss function (binary cross-entropy). All ground-truth masks in the training sets
were manually segmented by college students instructed by a specialized medical researcher. The results
produced by each model were compared using the evaluation metrics.
2.4. Evaluation Metrics

This paper evaluated model performance using the Dice similarity
coefficient (DSC), which measures the similarity between the manually
created ground truth and the result generated by the model. The DSC
evaluates the intersection between the ground truth and the result (22).
It ranges from 0 to 1, where a Dice coefficient of 1 indicates complete
overlap. It can be computed using Eq. 5:

DSC = \frac{2TP}{2TP + FP + FN} = \frac{2|X \cap Y|}{|X| + |Y|} \qquad (5)

where TP is the number of true positives, FP the false positives, and FN
the false negatives. False positives are negatives that the model
incorrectly classifies as positive, whereas true positives are positives
that the model correctly detects. Additionally, in the context of chest
CT image segmentation, the area under the receiver operating
characteristic curve (ROC-AUC) is a metric used to assess the performance
of a deep learning model (23), computed using Eq. 6:

TPR = \text{Sensitivity} = \text{Recall} = \frac{TP}{TP + FN}

\text{Specificity} = TNR = \frac{TN}{TN + FP} \qquad (6)

FPR = 1 - TNR = \frac{FP}{FP + TN}

where TPR and FPR are the true positive rate and false positive rate at
each threshold of the ROC curve. Specificity is the percentage of
subjects without the target condition who test negative, and sensitivity
is the percentage of subjects with the target condition who test
positive (24). The ROC-AUC quantifies the model's ability to accurately differentiate
between the target region of interest (ROI) and the background regions. A higher
ROC-AUC value indicates better segmentation performance, reflecting the
model's effectiveness in accurately classifying pixels or voxels as belonging to
the ROI or the background. This metric provides valuable insights into the
segmentation accuracy of the deep learning model in the context of chest CT
image analysis.
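
A small sketch of these two metrics on flattened mask arrays (NumPy and
scikit-learn assumed; the function names are illustrative):

import numpy as np
from sklearn.metrics import roc_auc_score

def dice_coefficient(y_true, y_pred, threshold=0.5, eps=1e-7):
    """Eq. 5: DSC = 2|X ∩ Y| / (|X| + |Y|) on binarized masks."""
    y_true = (np.asarray(y_true) > 0.5).astype(np.float32).ravel()
    y_pred = (np.asarray(y_pred) > threshold).astype(np.float32).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

def pixelwise_auc(y_true, y_prob):
    # ROC-AUC (Eq. 6) is computed on the raw sigmoid probabilities,
    # treating every pixel as an independent binary decision.
    return roc_auc_score((np.asarray(y_true) > 0.5).astype(int).ravel(),
                         np.asarray(y_prob).ravel())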

CHAPTER 3. RESULTS

3.1. Lung

The results of training the lung region, starting from an initial
setting of 500 iterations with early stopping, are presented in Figure 5.
Figure 5 illustrates sample predictions generated by the deep learning
model. In section A of Figure 5, the leftmost images are the original
chest CT axial images, the middle images are the manually segmented lung
masks serving as the ground truth, and the rightmost images are the lung
masks predicted by the model. The outcomes in section A of Figure 5 show
a high degree of similarity between the predicted lung masks and the
ground truth masks, providing evidence of the model's excellent
predictive ability. Section B displays the Dice similarity coefficient
(DSC) graph and the loss graph of the model, and indicates the optimal
iteration point (152 epochs) that avoids both overfitting and
underfitting. Specifically, the blue line is the training Dice
coefficient, the green line the validation Dice coefficient, the orange
line the training loss, and the red line the validation loss.
Additionally, with the early-stopping patience set to 10, the model
trained for 152 iterations and improved after several hyperparameters
were adjusted – a batch size of 16, the Adam optimizer, and a learning
rate of 1e-4 – achieving an AUROC of 0.9990, as shown in Figure 5.

Fig. 5. Results after adjusting several hyperparameters from the initial settings: a batch size of 16,
the Adam optimizer, and a learning rate of 1e-4 with early stopping. A. Visual results from the lung
model, which stopped at 152 epochs. B-C. Dice score and ROC-AUC results after adjusting the model's
parameters.

Table 1. Performance of the trained lung model on the training and test sets, starting from the initial
500-epoch setting. Each numerical result is based on the Dice similarity coefficient (DSC) and the area
under the receiver operating characteristic curve (ROC-AUC); DSC results are given as mean ± standard
deviation. The improved lung model, trained for 152 epochs, used early stopping, a batch size of 16, and
a learning rate of 1e-4.

Organ                   Set      DSC (mean ± std)    ROC-AUC
Lung with 152 epochs    Train    0.9862 ± 0.1209     0.9998
                        Test     0.9482 ± 0.1903     0.9990

3.2. Liver

The results of training the liver region, starting from an initial
setting of 500 iterations with early stopping, are presented in Figure 6.
Figure 6 illustrates sample predictions generated by the deep learning
model. In section A of Figure 6, the leftmost images are the original
chest CT axial images, the middle images are the manually segmented liver
masks serving as the ground truth, and the rightmost images are the liver
masks predicted by the model. The outcomes in section A of Figure 6 show
a high degree of similarity between the predicted liver masks and the
ground truth masks, providing evidence of the model's excellent
predictive ability. Section B displays the Dice similarity coefficient
(DSC) graph and the loss graph of the model, and indicates the optimal
iteration point (50 epochs) that avoids both overfitting and
underfitting. Specifically, the blue line is the training Dice
coefficient, the green line the validation Dice coefficient, the orange
line the training loss, and the red line the validation loss.
Additionally, with the early-stopping patience set to 10, the model
trained for 50 iterations and improved after several hyperparameters
were adjusted – a batch size of 8, the Adam optimizer, and a learning
rate of 1e-3 – achieving an AUROC of 0.9885, as shown in section C of
Figure 6.

Fig. 6. Results after adjusting several hyperparameters from the initial settings: a batch size of 8,
the Adam optimizer, and a learning rate of 1e-3 with early stopping. A. Visual results from the liver
model, which stopped at 50 epochs. B-C. Dice score and ROC-AUC results after adjusting the model's
parameters.

Table 2. Performance of the trained liver model on the training and test sets, starting from the initial
500-epoch setting. Each numerical result is based on the Dice similarity coefficient (DSC) and the area
under the receiver operating characteristic curve (ROC-AUC); DSC results are given as mean ± standard
deviation. The improved liver model, trained for 50 epochs, used early stopping, a batch size of 8, and
a learning rate of 1e-3.

Organ                   Set      DSC (mean ± std)    ROC-AUC
Liver with 50 epochs    Train    0.9158 ± 0.3197     0.9987
                        Test     0.8432 ± 0.4129     0.9885

3.3. Sternum

The results of training the sternum region, starting from an initial
setting of 500 iterations with early stopping, are presented in Figure 7.
Figure 7 illustrates sample predictions generated by the deep learning
model. In section A of Figure 7, the leftmost images are the original
chest CT axial images, the middle images are the manually segmented
sternum masks serving as the ground truth, and the rightmost images are
the sternum masks predicted by the model. The outcomes in section A of
Figure 7 show a high degree of similarity between the predicted sternum
masks and the ground truth masks, providing evidence of the model's
excellent predictive ability. Section B displays the Dice similarity
coefficient (DSC) graph and the loss graph of the model, and indicates
the optimal iteration point (97 epochs) that avoids both overfitting and
underfitting. Specifically, the blue line is the training Dice
coefficient, the green line the validation Dice coefficient, the orange
line the training loss, and the red line the validation loss.
Additionally, with the early-stopping patience set to 10, the model
trained for 97 iterations and improved after several hyperparameters
were adjusted – a batch size of 16, the Adam optimizer, and a learning
rate of 1e-3 – achieving an AUROC of 0.9937, as shown in section C of
Figure 7.

Fig. 7. Results after adjusting several hyperparameters from the initial settings: a batch size of 16,
the Adam optimizer, and a learning rate of 1e-3 with early stopping. A. Visual results from the sternum
model, which stopped at 97 epochs. B-C. Dice score and ROC-AUC results after adjusting the model's
parameters.

Table 3. Performance of the trained sternum model on the training and test sets, starting from the
initial 500-epoch setting. Each numerical result is based on the Dice similarity coefficient (DSC) and
the area under the receiver operating characteristic curve (ROC-AUC); DSC results are given as mean ±
standard deviation. The improved sternum model, trained for 97 epochs, used early stopping, a batch size
of 16, and a learning rate of 1e-3.

Organ                    Set      DSC (mean ± std)    ROC-AUC
Sternum with 97 epochs   Train    0.9439 ± 0.2492     0.9959
                         Test     0.8827 ± 0.3104     0.9937

3.4. Trachea-Bronchus

The results of training the trachea-bronchus region, starting from an
initial setting of 500 iterations with early stopping, are presented in
Figure 8. Figure 8 illustrates sample predictions generated by the deep
learning model. In section A of Figure 8, the leftmost images are the
original chest CT axial images, the middle images are the manually
segmented trachea-bronchus masks serving as the ground truth, and the
rightmost images are the trachea-bronchus masks predicted by the model.
The outcomes in section A of Figure 8 show a high degree of similarity
between the predicted trachea-bronchus masks and the ground truth masks,
providing evidence of the model's excellent predictive ability. Section B
displays the Dice similarity coefficient (DSC) graph and the loss graph
of the model, and indicates the optimal iteration point (128 epochs) that
avoids both overfitting and underfitting. Specifically, the blue line is
the training Dice coefficient, the green line the validation Dice
coefficient, the orange line the training loss, and the red line the
validation loss. Additionally, with the early-stopping patience set to
10, the model trained for 128 iterations and improved after several
hyperparameters were adjusted – a batch size of 32, the Adam optimizer,
and a learning rate of 1e-4 – achieving an AUROC of 0.9653, as shown in
section C of Figure 8.

Fig. 8. Results of the trachea-bronchus model after adjusting several hyperparameters from the initial
settings: a batch size of 32, the Adam optimizer, and a learning rate of 1e-4 with early stopping
(patience of 10). A. Visual results from the trachea-bronchus model, which stopped at 128 epochs. B-C.
Dice score and ROC-AUC results after adjusting the model's parameters.
Table 4. Performance of the trained trachea-bronchus model on the training and test sets, starting from
the initial 500-epoch setting. Each numerical result is based on the Dice similarity coefficient (DSC)
and the area under the receiver operating characteristic curve (ROC-AUC); DSC results are given as mean
± standard deviation. The improved trachea-bronchus model, trained for 128 epochs, used early stopping,
a batch size of 32, and a learning rate of 1e-4.

Organ                              Set      DSC (mean ± std)    ROC-AUC
Trachea-bronchus with 128 epochs   Train    0.9501 ± 0.1552     0.9933
                                   Test     0.7331 ± 0.2027     0.9653

CHAPTER 4. DISCUSSION

This study developed four deep learning models to segment multiple
organs – the lung, liver, sternum, and trachea-bronchus – using a dataset
of 698 chest CT images from seven patients in a public dataset, with an
initial setting of 500 epochs. The targeted organs have different
Hounsfield unit (HU) ranges, as generally applied in computed tomography
(CT) scanning. In addition, computer vision research for clinical routine
using deep learning and machine learning has progressed rapidly, offering
advancements for the hospital environment, where clinicians often need to
differentiate normal organs for accurate diagnoses. In this respect, we
conducted a fundamental study on multi-organ chest CT segmentation using
the U-Net structure (8). Following this recent trend, a number of
researchers have segmented and analyzed various organs using deep
learning models with various neural network architectures, such as FCN
and ResNet (25). The architecture used in this study is versatile, making
it applicable to other organ segmentation applications. The multi-organ
segmentation results were tested and compared across several iteration
counts to avoid overfitting and underfitting, which depended on
variations in image resolution for each organ. To assess model
performance, this study employed evaluation metrics such as the Dice
similarity coefficient (DSC), Dice loss, and the receiver operating
characteristic (ROC) curve. Our motivation in this study was to identify
the optimal parameters, such as batch size, optimizer, and epochs, for
robust performance of the deep learning models applied to the four body
organs. Discovering suitable hyperparameters can be time-consuming and
challenging.
For all findings (lung, liver, sternum, and trachea-bronchus), starting
from the initial 500 epochs with early stopping, we conducted experiments
to achieve high performance without overfitting. To smooth the
fluctuations in the graphs shown in Figures 5-8, we adjusted several
hyperparameters: batch size (lung, 16; liver, 8; sternum, 16;
trachea-bronchus, 32), the Adam optimizer, and learning rate (lung and
trachea-bronchus, 1e-4; liver and sternum, 1e-3), with early stopping. As
a result, the trained models consistently improved and demonstrated a
good fit, as both the training and testing datasets performed well and
the models stabilized at a specific point (26). On the test set, the
models achieved DSCs of 0.9482 ± 0.1903, 0.8432 ± 0.4129, 0.8827 ±
0.3104, and 0.7331 ± 0.2027 and ROC-AUCs of 0.9990, 0.9885, 0.9937, and
0.9653, respectively, in the order of lung, liver, sternum, and
trachea-bronchus, indicating that the deep learning models can predict
organ areas in unseen chest CT images with a small margin of error and
good prediction accuracy.
However, this research has a few limitations, including overfitting,
class imbalance, computational complexity, inter- and intra-observer
error, and low image quality. Overfitting stems from data scarcity, which
commonly produces a model that performs strongly during training but
fails to generalize to new datasets. Manual segmentation can be
time-consuming, laborious, and occasionally error-prone, and labeled
training data are not available in sufficient quantity. For this reason,
adjusting various hyperparameters may be necessary to prevent overfitting
(27), rather than relying solely on extensive segmentation. Class
imbalance is another problem in multi-organ segmentation (28). In
labeling multiple organs in the chest CT datasets, the sternum and
trachea-bronchus regions are much smaller than the liver and lung.
Training a neural network on class-imbalanced data can lead to an
unstable segmentation model biased toward the classes of large body
organs. Therefore, choosing an optimal loss function is crucial in these
cases (29). In a DL-based model, complexity is determined by the network
framework, input image size, batch size, and other learnable
hyperparameters (30). To speed up segmentation and prevent GPU memory
problems, one can reduce the number of parameters or layers in the
network and focus on artificially augmenting the training data rather
than modifying the network architecture. Regarding human error, the
ground truth masks were obtained manually by physicians for the training
process. Differences in manual contouring performed by different
individuals, or by the same individual under varying conditions, can
introduce bias into the DL-based results. This bias can arise from
variations in physicians' segmentation styles, a systematic error, as
well as from uncertain segmentations, which are random errors (31).
However, such challenges exist in any supervised learning-based approach.
Additionally, inferior image quality caused by noise artifacts, patient
breathing, and inhomogeneity can hinder accurate multi-organ segmentation
(32). To handle these issues, several useful techniques can be applied,
such as synthetic samples and deep supervision.
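
As one illustration of the loss-function point above, a soft Dice loss is
a common choice for such imbalanced masks. The following is a minimal
sketch of its standard formulation, not the loss actually used in this
thesis, which trained with binary cross-entropy:

import tensorflow as tf

def soft_dice_loss(y_true, y_pred, eps=1e-7):
    # Differentiable Dice loss on (batch, H, W, channels) tensors: less
    # sensitive than cross-entropy to the foreground/background imbalance
    # of small organs such as the sternum and trachea-bronchus.
    y_true = tf.cast(y_true, tf.float32)
    intersection = tf.reduce_sum(y_true * y_pred, axis=[1, 2, 3])
    denom = tf.reduce_sum(y_true, axis=[1, 2, 3]) + tf.reduce_sum(y_pred, axis=[1, 2, 3])
    return 1.0 - tf.reduce_mean((2.0 * intersection + eps) / (denom + eps))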

CHAPTER 5. CONCLUSION

This study focuses on automating the segmentation of multiple organs, namely
the lung, liver, sternum, and trachea-bronchus, in chest CT images using the
U-Net deep learning algorithm. We optimized the hyperparameters of the deep
learning models to achieve the best results. The results indicate that the
trained deep learning models are capable of accurately segmenting the
specified multiple organs in chest CT
images. They serve as essential resources for further research and detection studies,
such as analyzing lung and liver diseases, as well as investigating tracheal disorders.
The high precision achieved by the four deep learning models reinforces their
potential value in advancing medical image analysis and related fields.

REFERENCES

1. Long J, Shelhamer E, Darrell T, editors. Fully convolutional networks for semantic
segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition; 2015.
2. Mostajabi M, Yadollahpour P, Shakhnarovich G, editors. Feedforward semantic
segmentation with zoom-out features. Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition; 2015.
3. Noh H, Hong S, Han B, editors. Learning deconvolution network for semantic
segmentation. Proceedings of the IEEE International Conference on Computer Vision; 2015.
4. Sharp G, Fritscher KD, Pekar V, Peroni M, Shusharina N, Veeraraghavan H, et al.
Vision 20/20: perspectives on automated image segmentation for radiotherapy. Medical
Physics. 2014;41(5):050902.
5. Akira M, Yokoyama K, Yamamoto S, Higashihara T, Morinaga K, Kita N, et al.
Early asbestosis: evaluation with high-resolution CT. Radiology. 1991;178(2):409-16.
6. Alfidi RJ, MacIntyre WJ, Haaga JR. The effects of biological motion on CT
resolution. American Journal of Roentgenology. 1976;127(1):11-5.
7. Gibson E, Giganti F, Hu Y, Bonmati E, Bandula S, Gurusamy K, et al., editors.
Towards image-guided pancreas and biliary endoscopy: automatic multi-organ
segmentation on abdominal CT with dense dilated networks. Medical Image Computing and
Computer Assisted Intervention− MICCAI 2017: 20th International Conference, Quebec
City, QC, Canada, September 11-13, 2017, Proceedings, Part I 20; 2017: Springer.
8. Ronneberger O, Fischer P, Brox T, editors. U-net: Convolutional networks for
biomedical image segmentation. Medical Image Computing and Computer-Assisted
Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-
9, 2015, Proceedings, Part III 18; 2015: Springer.
9. Shrestha A, Mahmood A. Review of deep learning algorithms and architectures.
IEEE Access. 2019;7:53040-65.
10. Young SR, Rose DC, Karnowski TP, Lim S-H, Patton RM, editors. Optimizing deep
learning hyper-parameters through an evolutionary algorithm. Proceedings of the Workshop
on Machine Learning in High-Performance Computing Environments; 2015.
11. Ibragimov B, Xing L. Segmentation of organs‐at‐risks in head and neck CT images
using convolutional neural networks. Medical Physics. 2017;44(2):547-57.
12. Pekar V, McNutt TR, Kaus MR. Automated model-based organ delineation for
radiotherapy planning in prostatic region. Int J Radiat Oncol Biol Phys. 2004;60(3):973-80.
13. Ren X, Xiang L, Nie D, Shao Y, Zhang H, Shen D, et al. Interleaved 3D‐CNN s for
joint segmentation of small‐volume structures in head and neck CT images. Medical Physics.
2018;45(5):2063-75.
14. Chen X, Wang X, Zhang K, Fung K-M, Thai TC, Moore K, et al. Recent advances
and clinical applications of deep learning in medical image analysis. Medical Image
Analysis. 2022:102444.
15. Wang X, Zhao Y, Pourpanah F. Recent advances in deep learning. International
Journal of Machine Learning and Cybernetics. 2020;11:747-50.
16. Roth HR, Oda H, Hayashi Y, Oda M, Shimizu N, Fujiwara M, et al. Hierarchical
3D fully convolutional networks for multi-organ segmentation. arXiv preprint
arXiv:1704.06382. 2017.
17. Okada T, Linguraru MG, Hori M, Summers RM, Tomiyama N, Sato Y. Abdominal
multi-organ segmentation from CT images using conditional shape–location and
unsupervised intensity priors. Medical Image Analysis. 2015;26(1):1-18.
18. Schmidt-Hieber J. Nonparametric regression using deep neural networks with ReLU
activation function. 2020.
19. Menon A, Mehrotra K, Mohan CK, Ranka S. Characterization of a class of sigmoid
functions with applications to neural networks. Neural Networks. 1996;9(5):819-35.
20. Zhang Z, editor. Improved Adam optimizer for deep neural networks. 2018
IEEE/ACM 26th International Symposium on Quality of Service (IWQoS); 2018: IEEE.
21. Ruby U, Yendapalli V. Binary cross entropy with deep learning technique for image
classification. Int J Adv Trends Comput Sci Eng. 2020;9(10).
22. Shamir RR, Duchin Y, Kim J, Sapiro G, Harel N. Continuous dice coefficient: a
method for evaluating probabilistic segmentations. arXiv preprint arXiv:1906.11031. 2019.
23. Narkhede S. Understanding auc-roc curve. Towards Data Science. 2018;26(1):220-
7.
24. Parikh R, Mathai A, Parikh S, Sekhar GC, Thomas R. Understanding and using
sensitivity, specificity and predictive values. Indian Journal of Ophthalmology.
2008;56(1):45.
25. Wu Z, Shen C, Van Den Hengel A. Wider or deeper: Revisiting the resnet model
for visual recognition. Pattern Recognition. 2019;90:119-33.
26. Brownlee J. How to Diagnose Overfitting and Underfitting of LSTM Models.
Machine Learning Mastery. 2017.
27. Ying X, editor. An overview of overfitting and its solutions. Journal of Physics:
Conference Series; 2019: IOP Publishing.
28. Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance.
Journal of Big Data. 2019;6(1):1-54.
29. Anderson RP, Gonzalez Jr I. Species-specific tuning increases robustness to
sampling bias in models of species distributions: an implementation with Maxent. Ecological
Modelling. 2011;222(15):2796-811.
30. Thompson NC, Greenewald K, Lee K, Manso GF. The computational limits of deep
learning. arXiv preprint arXiv:2007.05558. 2020.
31. Bernard O, Lalande A, Zotti C, Cervenansky F, Yang X, Heng P-A, et al. Deep
learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis:
is the problem solved? IEEE Transactions on Medical Imaging. 2018;37(11):2514-25.
32. Sabottke CF, Spieler BM. The effect of image resolution on deep learning in
radiography. Radiology: Artificial Intelligence. 2020;2(1):e190015.
