Evaluating Cellularity Estimation Methods Comparin Part2
model itself also incurs a high computational cost to train and may suffer
from hallucinations that change the semantic content of the image, thereby introducing
label noise.
An effective way to improve the robustness of AI models trained on limited data is
data augmentation (DA) [21,22]. DA randomly perturbs the positional information
(rotation, flipping), color, and brightness of the original input images to
increase the diversity of the dataset without changing the semantics of the data (the location,
number, and type of cells). AI models trained on such augmented datasets become more
robust to variations in image quality. In [23], DA was reported to
be more effective than normalization of the input images at improving the robustness of AI
models to staining variations.
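The label-preserving augmentations described above can be sketched as follows. This is a minimal NumPy illustration of the idea, not the actual pipeline used in [26]; the transform set and parameter ranges are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Randomly rotate/flip and jitter brightness. The semantics of the
    tile (number and type of cells; locations up to the same geometric
    transform) are preserved."""
    # Random rotation by a multiple of 90 degrees.
    image = np.rot90(image, k=rng.integers(0, 4))
    # Random horizontal / vertical flips.
    if rng.random() < 0.5:
        image = image[:, ::-1]
    if rng.random() < 0.5:
        image = image[::-1, :]
    # Multiplicative brightness jitter, clipped back to the valid range.
    factor = rng.uniform(0.8, 1.2)
    return np.clip(image * factor, 0.0, 1.0)

tile = rng.random((256, 256, 3))   # dummy RGB tile in [0, 1]
augmented = augment(tile)
print(augmented.shape)  # (256, 256, 3)
```

Geometric augmentations applied to the image must also be applied to the cell coordinates when point annotations are used; color and brightness jitter leave the labels untouched.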
To develop practical AI-powered TCR calculation software, it is essential to perform
cell detection and classification accurately and efficiently. Chen et al. [24] developed a
method for identifying the cell cycle phase of cells from fluorescence images. Their
approach uses k-means clustering and rule-based heuristics to classify cells into different
phases; while this method may be effective for limited datasets, its generalization
performance on datasets from different institutions or acquired using different imaging
devices is likely to be constrained. Ghani et al. [25] proposed a method for accelerating
cell detection using a convolutional neural network embedded on an FPGA device. This
approach offers significant speedups compared to software-based implementations. The
authors demonstrated the effectiveness of their method in detecting different types of
damaged kidney cells with an accuracy of nearly 100%. However, the limited size of their
test dataset raises concerns about the generalizability of their results. Further validation
with a larger and more diverse dataset is recommended.
Addressing these challenges, we leverage a previously developed deep-learning
model [26] for TCR counting and use it in this study to predict the TCR for a cohort (see
Section 2.1) of 41 non-small cell lung cancer patients from four different medical institu-
tions (sites). In other reported experiments on TCR counting [27], the “gold standard”
ground truth is established by first defining a tumor area and then counting all cells within
that area as tumor cells. In contrast, our approach is more fine-grained and classifies each
cell independently as tumor or non-tumor.
In Section 2.2, we first establish a “gold standard” (GS) set of exhaustive cell-level annotations
by three pathologists in regions of interest of the cohort. We also ask 13 pathologists
to visually estimate the TCR on these regions (see Section 2.3). In Section 2.4, we detail
the model architecture, training partition, and data augmentation we use to create our AI
model. Finally, to evaluate the real-world robustness of the AI model, we devised a leave-
one-hospital-out cross-validation scheme, where we test it on images from sites unseen
during training. In Section 3, we report our findings, comparing the TCR predictions by
the AI model to the gold standard and to the pathologists’ visual estimates.
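The leave-one-hospital-out scheme can be sketched as follows. Site names and case identifiers are hypothetical placeholders; the point is that each fold tests on a site entirely unseen during training.

```python
# Leave-one-hospital-out cross-validation: in each fold, one site is
# held out in full for testing and the model is trained on the rest.
sites = {
    "site_A": ["case_01", "case_02"],
    "site_B": ["case_03", "case_04"],
    "site_C": ["case_05"],
    "site_D": ["case_06", "case_07"],
}

def leave_one_site_out(sites):
    for held_out in sites:
        train = [c for s, cases in sites.items() if s != held_out
                 for c in cases]
        test = list(sites[held_out])
        yield held_out, train, test

folds = list(leave_one_site_out(sites))
print(len(folds))  # 4 folds, one per site
```

With four sites this yields four train/test splits, and the reported performance is the aggregate over the held-out sites.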
Immunostained images from the same blocks were used during case selection to
differentiate histological types but were not subsequently used for TCR estimation by
either the pathologists or the AI model. Specimens with a high number of crushed necrotic cells were
excluded. Specimens in the cohort exhibited adenocarcinoma (27) or squamous cell
carcinoma (14). Care was taken that both cancer subtypes and sample extraction methods
were split among sites to avoid site biases.
Figure 1. Region of interest (ROI) selection on a WSI. In (A), a tissue region is selected on the WSI. In
(B), the tissue is tiled and one tile is selected as the ROI. In (C), the cells of the ROI are exhaustively labeled.
In this manner, a representative ROI was selected from each WSI of the 41 cases,
resulting in a total of 41 ROIs.
Three pathologists were then instructed to independently and exhaustively identify the
location of every cell in the ROI and to label each as a tumor cell (tCell), non-tumor
cell (nCell), or indistinguishable cell (iCell). iCells are cells within an ROI that cannot
be definitively assigned to any other category (tCell, nCell, or cell to be excluded from labeling).
Pathologists may also exclude such cells from tumor cell content calculations when DNA
nucleic acids cannot be extracted due to crushing, necrosis, degeneration, or keratinization.
In addition, cells that are not used for tumor cell content calculations by pathologists,
such as red blood cells and cells whose nuclei cannot be recognized due to necrosis or
degeneration, are excluded from the labeling process.
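Under these exclusions, the TCR reduces to a ratio over the labeled cells. This is a small sketch; the assumption that iCells are removed from both numerator and denominator follows the exclusion rules above, but the exact convention is the paper's.

```python
def tumor_cell_ratio(n_tumor, n_nontumor):
    """TCR as the fraction of counted cells that are tumor cells.
    Assumption: iCells and unlabeled cells (red blood cells, cells with
    unrecognizable nuclei) are excluded from numerator and denominator."""
    total = n_tumor + n_nontumor
    return n_tumor / total if total else 0.0

print(tumor_cell_ratio(120, 80))  # 0.6
```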
For each cell, the labels given by the three pathologists were combined into a single
final label using a majority rule. Cells were first matched using a distance-based matching
algorithm, resulting in a 3-cell match (all three pathologists annotated that cell), a 2-cell
match, or a 1-cell match. The final label was then established following the rules shown in
Table 1.
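A majority-rule fusion of this kind might look as follows. Table 1 is not reproduced here, so the tie-breaking rule (falling back to iCell when no strict majority exists) is an assumption for illustration only.

```python
from collections import Counter

def fuse_labels(labels):
    """Combine the labels of the pathologists matched to one cell
    (1-3 labels from {"tCell", "nCell", "iCell"}) into a final label.
    Assumption: a strict majority wins; otherwise fall back to "iCell".
    The paper's actual rules are those of Table 1."""
    (label, votes), *rest = Counter(labels).most_common()
    if not rest or votes > rest[0][1]:
        return label
    return "iCell"

print(fuse_labels(["tCell", "tCell", "nCell"]))  # tCell
print(fuse_labels(["tCell", "nCell"]))           # iCell (tie)
print(fuse_labels(["nCell"]))                    # nCell
```

A 1-cell match always keeps its single label under this rule; only 2-cell matches with disagreeing labels hit the fallback.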