

model itself also requires substantial computational cost to train and may suffer from
hallucinations that alter the semantic content of the image, thereby introducing label noise.
An effective method for improving the robustness of AI models trained on limited data is
data augmentation (DA) [21,22]. This method randomly applies variations in position
(rotation, flipping), color, brightness, etc., to the original input images, increasing the
diversity of the dataset without changing the semantics of the data (the location, number,
and type of cells). Models trained on these augmented datasets are more robust to variations
in image quality. In [23], DA was reported to be more effective than normalization of the
input images for improving the robustness of AI models to staining variations.
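As a rough illustration of such semantics-preserving augmentation (not necessarily the pipeline used in this study; the operations and parameter ranges below are assumptions), a standard image-transform library can be composed as follows:

# Illustrative augmentation pipeline; parameter values are assumed, not the study's settings.
import torchvision.transforms as T
augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),        # positional variation: flipping
    T.RandomVerticalFlip(p=0.5),
    T.RandomRotation(degrees=90),         # positional variation: rotation
    T.ColorJitter(brightness=0.2, contrast=0.2,
                  saturation=0.2, hue=0.05),  # color/brightness/staining variation
    T.ToTensor(),
])
# At training time, each input tile is passed through `augment` on the fly,
# leaving the number and type of annotated cells unchanged.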
To develop practical AI-powered TCR calculation software, it is essential to perform
cell detection and classification accurately and efficiently. Chen et al. [24] developed a
method for identifying the cell-cycle phase of cells from fluorescence images. Their
approach uses k-means clustering and rule-based heuristics to classify cells into different
phases. While this method may be effective for limited datasets, its generalization to
datasets from other institutions or acquired with different imaging devices is likely to be
constrained. Ghani et al. [25] proposed a method for accelerating cell detection using a
convolutional neural network embedded on an FPGA device. This approach offers
significant speedups over software-based implementations, and the authors demonstrated
its effectiveness in detecting different types of damaged kidney cells with an accuracy of
nearly 100%. However, the limited size of their test dataset raises concerns about the
generalizability of these results, and further validation on a larger and more diverse
dataset is recommended.
Addressing these challenges, we leverage a previously developed deep-learning
model [26] for TCR counting and use it in this study to predict the TCR for a cohort (see
Section 2.1) of 41 non-small cell lung cancer patients from four different medical institutions
(sites). In other reported experiments on TCR counting [27], the “gold standard” ground
truth is established by first delineating a tumor area and then counting all cells within
that area as tumor cells. In contrast, our approach is more fine-grained and classifies each
cell independently as tumor or non-tumor.
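With per-cell labels, the TCR is simply the fraction of classified cells that are tumor cells. A minimal sketch (the label names are hypothetical, and indistinguishable cells are assumed to be excluded from the denominator):

# Per-cell TCR computation; "tumor"/"non_tumor" label names are illustrative.
def tumor_cell_ratio(cell_labels):
    tumor = sum(1 for lbl in cell_labels if lbl == "tumor")
    non_tumor = sum(1 for lbl in cell_labels if lbl == "non_tumor")
    total = tumor + non_tumor
    return tumor / total if total else 0.0

print(tumor_cell_ratio(["tumor", "tumor", "non_tumor", "tumor"]))  # 0.75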
In Section 2.2, we first establish a “gold standard” (GS) set of exhaustive cell-level
annotations by three pathologists on regions of interest of the cohort. We also ask
13 pathologists to visually estimate the TCR on these regions (see Section 2.3). In Section 2.4,
we detail the model architecture, training partitions, and data augmentation used to create
our AI model. Finally, to evaluate the real-world robustness of the AI model, we devise a
leave-one-hospital-out cross-validation scheme in which the model is tested on images from
sites unseen during training. In Section 3, we report our findings, comparing the TCR
predictions of the AI model to the gold standard and to the pathologists’ visual estimates.
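The leave-one-hospital-out scheme can be summarized by the sketch below; the ROI record and the training/evaluation callables are simplified stand-ins, not the study’s actual implementation:

# Leave-one-hospital-out cross-validation sketch (simplified stand-in code).
from collections import namedtuple
ROI = namedtuple("ROI", ["site", "image", "labels"])
def leave_one_hospital_out(all_rois, train_model, evaluate):
    sites = sorted({r.site for r in all_rois})               # e.g. HUH, KMU, KCC, SCC
    results = {}
    for held_out in sites:
        train = [r for r in all_rois if r.site != held_out]  # three sites for training
        test = [r for r in all_rois if r.site == held_out]   # the unseen hospital for testing
        results[held_out] = evaluate(train_model(train), test)
    return results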

2. Materials and Methods


2.1. Cohort
The cohort in this study consists of 41 patients at various stages of non-small cell
lung cancer (NSCLC) who were considered for genetic panel testing. The specimens were
collected between 2018 and 2022 from 4 sites: Hokkaido University Hospital (HUH),
Kansai Medical University Hospital (KMU), Kanagawa Cancer Center (KCC), and Saitama
Cancer Center (SCC). A total of 11 specimens were extracted by trans-bronchial biopsy
(TBB), 9 by trans-bronchial needle aspiration (TBNA), 9 by core-needle biopsy (CNB), and
12 by surgical resection. These specimens were prepared into blocks using the formalin-fixed
paraffin-embedded (FFPE) method, sectioned, stained with hematoxylin and eosin (H&E),
and scanned in bright-field at 40× magnification (0.25 microns/pixel) with a whole-slide
scanning device (Philips UFS with Philips IntelliSite Pathology Solution, Philips, Amsterdam,
The Netherlands) to generate whole slide images (WSIs).

Immunostained images from the same blocks were used during case selection for
histological type differentiation but were not subsequently used for TCR estimation by
either the pathologists or the AI. Specimens with a high number of crushed necrotic cells
were excluded. Specimens in the cohort exhibited adenocarcinoma (27) or squamous cell
carcinoma (14). Care was taken that both cancer subtypes and extraction methods were
distributed across sites to avoid site bias.

2.2. Gold Standard Labels


In a first step, we define, for each WSI, a region of interest (ROI) where the labeling
will take place (Figure 1). To eliminate bias in selecting the ROIs from the WSIs, we
employed the following procedure (a code sketch of the tiling and sampling step follows
the list):
1. A pathologist manually traces a target area on a WSI slide (typically, the tumor area).
2. A program divides the area into square tiles of 400 microns (1760 × 1760 pixels at
40× magnification) and randomly selects a tile.
3. The pathologist examines the selected tile. If it does not contain tumor cells or has
an image quality issue (e.g., blurriness, artifacts), it is excluded, and another tile is
randomly selected by the program. If the selected tile has none of the aforementioned
issues, it is finalized as the representative ROI for the WSI.
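The tiling and random selection above can be sketched as follows; the bounding-box handling and the is_acceptable check are illustrative simplifications, not the actual tool:

# Simplified ROI sampling over a traced target area (illustrative only).
import random
TILE = 1760  # pixels at 40x magnification, per the tile size above
def sample_roi(area_bbox, is_acceptable, rng=random.Random(0)):
    # area_bbox = (x0, y0, x1, y1) bounding the traced target area.
    # is_acceptable(tile) stands in for the pathologist's review of tumor
    # presence and image quality (step 3).
    x0, y0, x1, y1 = area_bbox
    tiles = [(x, y, x + TILE, y + TILE)
             for x in range(x0, x1 - TILE + 1, TILE)
             for y in range(y0, y1 - TILE + 1, TILE)]
    rng.shuffle(tiles)
    for tile in tiles:            # keep drawing tiles until one is accepted
        if is_acceptable(tile):
            return tile           # finalized as the representative ROI
    return None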

Figure 1. Region of interest (ROI) selection on a WSI. In (A), a tissue area is selected on the WSI. In
(B), the tissue is tiled and one tile is selected as the ROI. In (C), the cells of the ROI are exhaustively labeled.

In this manner, a representative ROI was selected from each WSI of the 41 cases,
resulting in a total of 41 ROIs.
Three pathologists were then instructed to independently identify the location of every
cell in the ROI and to label it as a tumor cell (tCell), a non-tumor cell (nCell), or an
indistinguishable cell (iCell). iCells are cells within an ROI that cannot be definitively
classified as tCells, as nCells, or as cells that should not be labeled. Such cells may also be
excluded from tumor cell content calculations by pathologists when DNA cannot be
extracted due to crushing, necrosis, degeneration, or keratinization. In addition, cells that
are not used for tumor cell content calculations by pathologists, such as red blood cells
and cells whose nuclei cannot be recognized due to necrosis or degeneration, are excluded
from the labeling process.
For each cell, the labels given by the three pathologists were combined into a single
final label using a majority rule. Cells were first matched using a distance-based matching
algorithm, resulting in a 3-cell match (all three pathologists annotated that cell), a 2-cell
match, or a 1-cell match. The final label was then established following the rules shown in
Table 1.
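As a simplified illustration of this merging step (the matching tolerance and the fallback for cells without a majority are assumptions; the study’s final-label rules are those of Table 1):

# Simplified distance-based matching and majority voting across annotators.
from collections import Counter
import math
def match(cells_a, cells_b, tol=10.0):
    # Greedy nearest-neighbour pairing of (x, y) cell centers within `tol` pixels;
    # the 10-pixel tolerance is an assumption.
    pairs, used = [], set()
    for i, (xa, ya) in enumerate(cells_a):
        best, best_d = None, tol
        for j, (xb, yb) in enumerate(cells_b):
            d = math.hypot(xa - xb, ya - yb)
            if j not in used and d <= best_d:
                best, best_d = j, d
        if best is not None:
            pairs.append((i, best))
            used.add(best)
    return pairs
def majority_label(labels):
    # labels: one to three labels ('tCell', 'nCell', 'iCell') matched to the same cell.
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else "iCell"   # assumed fallback when there is no majority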
