Comment

https://doi.org/10.1038/s41592-023-01885-0

Towards foundation models of biological image segmentation

Jun Ma & Bo Wang

In the ever-evolving landscape of biological imaging technology, it is crucial to develop foundation models capable of adapting to various imaging modalities and tackling complex segmentation tasks.

Biological images have long played a crucial role in unraveling the mechanisms underlying biological systems, enabling a deeper understanding of the intricate processes within cells and tissues. A central task in leveraging biological images is the extraction of quantitative features that can be used to characterize and compare different biological systems or conditions. These features may include the size, shape and texture of cellular structures, as well as the spatial relationships between them. To obtain reliable quantitative measurements, accurate and effective segmentation methods are indispensable.

Fundamental segmentation tasks in biological image analysis

Biological image segmentation is the process of partitioning an image into meaningful regions, which can then be analyzed individually or in relation to one another. There are three fundamental segmentation tasks that cater to a broad spectrum of applications (Fig. 1). Semantic segmentation, one of the most common tasks, aims to classify each pixel in an image according to its corresponding object category, without differentiating between individual instances of the same category. For example, in cryo-electron tomography (cryo-ET) images, semantic segmentation can be used to label different organelles, such as ribosomes and membranes [1]. Instance segmentation assigns unique identifiers to individual objects even if they belong to the same category. An example application is the identification and separation of individual cells in multiplexed immunofluorescence images, allowing each cell to be analyzed separately [2,3]. Panoptic segmentation combines elements of both semantic and instance segmentation, aiming to identify the semantic category of each pixel and to assign a unique identifier to each object instance of the same class. For instance, in volume electron microscopy images, panoptic segmentation delineates mitochondria and nuclei while also distinguishing between individual mitochondria [4].

Over the past 30 years, the field of image segmentation has evolved through three main stages of methodological development. In the early years, the field was dominated by rule-based methods, such as the watershed algorithm, which relied on predefined rules and heuristics derived from human knowledge. Advances in statistical machine learning then made it increasingly popular for image segmentation [5]: methods such as support vector machines and random forests used handcrafted features to enhance performance, but still faced overfitting issues. The emergence of deep learning further revolutionized the field, particularly with the advent of convolutional neural networks [6], which can learn hierarchical feature representations directly from raw image data without requiring manual feature engineering.

The U-Net architecture [7], a type of fully convolutional neural network, has been extensively adopted in biological image segmentation. For example, the 2D U-Net is commonly used for human cell [2,8] and bacterial cell [9] segmentation in 2D light microscopy images. The 3D U-Net excels at segmenting neurons [10] and organelles [4] in volume electron microscopy images and at nuclear pore complex segmentation in cryo-ET images [1]. Furthermore, it is noteworthy that U-Net and its variants [11] continue to dominate biomedical image segmentation competitions, including tasks such as lesion and organ segmentation in computed tomography or magnetic resonance images [12], as well as nuclei segmentation in images stained with hematoxylin and eosin [13].

Despite the success of U-Net and its variants, these methods are tailored to specific segmentation tasks and thus generalize poorly to other tasks. Additionally, as the model parameters are trained from scratch, they cannot be transferred across tasks. This common practice is a significant and inefficient expenditure of resources. The limited transferability of U-Net highlights the need for more versatile and generalizable methods in biological image segmentation. In our view, the most promising path toward revolutionizing current segmentation paradigms lies in the methodological advances of foundation models, which hold immense transformative potential.

Foundation models are transforming the segmentation paradigm

Foundation models are large pretrained models that are trained on massive amounts of data; they offer greater flexibility and adaptability, enabling more accurate segmentation in diverse scenarios and surpassing the performance of specialized models. They learn the underlying data patterns and structures, which can be adapted to a wide range of tasks via transfer learning. They have emerged as a transformative technology with many successful examples in various fields, such as the Generative Pre-trained Transformer (GPT) [14] in natural language processing and Stable Diffusion [15] in vision-language modeling. In image segmentation, there have been notable successes as well. For example, OneFormer [16], a transformer-based multi-task universal image segmentation framework, outperformed specially trained networks across semantic, instance and panoptic segmentation tasks. X-Decoder [17], a transformer-based decoder, also achieved better performance than specialist models on multiple segmentation tasks.

When and how foundation models can transform biological image segmentation

There are three main components for constructing foundation models: large-scale datasets, abundant computational resources and transformer-based architectures. For example, the widely used ADE20K [18] and MS COCO (Microsoft Common Objects in Context) stuff [19] datasets consist of 25,000 images with 150 semantic categories and

nature methods Volume 20 | July 2023 | 953–955 | 953



[Figure 1 shows microscopy images feeding a foundation model built from N stacked transformer blocks (embedded image patches → normalization → multi-head attention → normalization → multilayer perceptron), where each attention layer performs Query–Key matrix multiplication, scaling and softmax before weighting the Values; the model outputs semantic (ribosomes, membranes), instance (cells) and panoptic (nuclei, mitochondria) segmentations.]

Fig. 1 | Three fundamental biological image segmentation tasks. Microscopes and microscopy images are vital for studying cellular and molecular structures in biology. Foundation models, constructed using consecutive transformer blocks with attention layers, can handle a wide range of image segmentation tasks in a unified manner. Shown on the right are examples of semantic, instance and panoptic segmentation tasks.

164,000 images with 172 classes, respectively. Training large vision models on the ADE20K or COCO datasets typically requires thousands to hundreds of thousands of GPU hours [16,17]. Moreover, all existing foundation models are built on transformer-based architectures [20], which offer better flexibility and superior modeling capability compared to convolutional neural networks.

We may ask: how far are we from making foundation models a reality in biological image segmentation? We address this question by examining quantitative information about the roadblocks. First, the size of aggregated annotated microscopy image datasets [2,3,9,21] is nearly 15,000 images. Although this is still far from the size of the COCO dataset, it approaches that of the ADE20K dataset. Importantly, many unlabeled microscopy images [22] can be used for building foundation models via self-supervised learning [23]. The cost of training foundation models could be prohibitively expensive for many academic research groups, but large-scale computational resources provided by open research organizations such as OpenBioML (https://openbioml.org/) offer great opportunities to mitigate this constraint. In addition, well-designed transformers can outperform customized networks in natural image segmentation [16,17]. Considering these developments, we believe that now is an opportune moment to explore the potential of foundation models in biological image segmentation. By leveraging transformer-based algorithms and publicly available resources, researchers can build versatile, efficient and generalizable segmentation algorithms for a wide range of biological imaging modalities.

There are three promising directions in which foundation models can transform biological image segmentation. First, transformer-based frameworks could enable unified and generalized segmentation of different structures across various biological images, unlocking unprecedented capabilities for the analysis of complex biological data. Second, multi-modality models that fuse complementary information between microscopy images and sequencing data hold great potential for generating more comprehensive representations of cellular environments, enabling more precise identification of cell types, states and interactions in spatial transcriptomics. Lastly, reinforcement learning from biologist feedback (RLBF) can pave the way for new biological discoveries. Motivated by the unprecedented success of ChatGPT [24], foundation models could become increasingly effective at generating insights and detecting subtle yet significant biological phenomena by iteratively incorporating expert knowledge.
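Returning to the three fundamental tasks of Fig. 1, the way their outputs differ can be made concrete with a minimal, library-free Python sketch; the 4×4 "image" and its labels are hypothetical, chosen purely for illustration:

```python
# A hypothetical 4x4 image containing two touching "cells" (labels are
# illustrative only, not drawn from any real dataset or model).

# Semantic segmentation: one class id per pixel (0 = background, 1 = cell);
# the two touching cells share the same label and are not told apart.
semantic = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# Instance segmentation: a unique id per object, even within one class.
instance = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [0, 0, 0, 0],
]

# Panoptic segmentation: a (class id, instance id) pair per pixel,
# combining the semantic and instance views.
panoptic = [[(semantic[r][c], instance[r][c]) for c in range(4)]
            for r in range(4)]

# First row reads: background, cell #1, cell #2, background.
assert panoptic[0] == [(0, 0), (1, 1), (1, 2), (0, 0)]
```

The key contrast is visible in the first row: semantic labels cannot separate the two adjacent cells, instance ids can, and the panoptic pairing carries both pieces of information at once.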
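As a reminder of what the transformer-based architectures discussed above actually compute, the scaled dot-product attention at the core of each transformer block in Fig. 1 (Query/Key/Value, MatMul, Scale, Softmax) can be sketched in a few lines of library-free Python; the toy 2×2 matrices are arbitrary illustrative values, not taken from any real model:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain nested-list matrix multiplication."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])                                                 # key/query dimension
    scores = matmul(q, transpose(k))                              # MatMul (Q K^T)
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]  # Scale
    weights = [softmax(row) for row in scaled]                    # Softmax
    return matmul(weights, v)                                     # weighted sum of Values

# Toy example: 2 tokens (image patches) with 2-dimensional embeddings.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
# Each output row is a convex (softmax-weighted) combination of the rows of V.
```

In a real vision transformer this runs over hundreds of embedded image patches, with multiple heads in parallel and learned projection matrices producing Q, K and V, but the arithmetic per head is exactly this.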


When applying pretrained foundation models to new tasks or to unseen images from new imaging technologies, transfer learning is a common technique: one can fine-tune the foundation model on customized datasets to extend its abilities. In particular, parameter-efficient fine-tuning methods [25] have made fine-tuning large models robust and computationally feasible. Another promising method is in-context learning [24], which allows foundation models to acquire knowledge or adapt their behavior on the basis of the context of the input data at inference time. Together, we believe that foundation models have transformative potential in biological image segmentation and the capacity to revolutionize our understanding of complex biological systems. We anticipate that in the next five years foundation models will be a dominant force in the field, as researchers will have ample time to improve these models and develop targeted applications that address the unique challenges of biological image segmentation. It is vital for the research community to continue investing in the development and optimization of these models and to encourage interdisciplinary collaboration between biologists, computer scientists and other experts to successfully integrate foundation models into biological research.

Jun Ma 1,2,3 & Bo Wang 1,2,3,4,5

1 Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada. 2 Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada. 3 Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada. 4 AI Hub, University Health Network, Toronto, Ontario, Canada. 5 Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.

e-mail: [email protected]

Published online: 11 July 2023

References
1. de Teresa-Trueba, I. et al. Nat. Methods 20, 284–294 (2023).
2. Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Nat. Methods 18, 100–106 (2021).
3. Greenwald, N. F. et al. Nat. Biotechnol. 40, 555–565 (2022).
4. Heinrich, L. et al. Nature 599, 141–146 (2021).
5. Berg, S. et al. Nat. Methods 16, 1226–1232 (2019).
6. Krizhevsky, A., Sutskever, I. & Hinton, G. E. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).
7. Ronneberger, O., Fischer, P. & Brox, T. In Intl Conf. Medical Image Computing and Computer-Assisted Intervention (eds Navab, N. et al.) 234–241 (Springer, 2015).
8. Pachitariu, M. & Stringer, C. Nat. Methods 19, 1634–1641 (2022).
9. Cutler, K. J. et al. Nat. Methods 19, 1438–1448 (2022).
10. Sheridan, A. et al. Nat. Methods 20, 295–303 (2023).
11. Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. Nat. Methods 18, 203–211 (2021).
12. Antonelli, M. et al. Nat. Commun. 13, 4128 (2022).
13. Graham, S. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2303.06274 (2023).
14. Brown, T. B. et al. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
15. Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 10674–10685 (IEEE, 2022).
16. Jain, J. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2211.06220 (2023).
17. Zou, X. et al. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (in the press).
18. Zhou, B. et al. Int. J. Comput. Vis. 127, 302–321 (2019).
19. Caesar, H., Uijlings, J. & Ferrari, V. In Proc. IEEE Conf. Computer Vision and Pattern Recognition 1209–1218 (IEEE, 2018).
20. Vaswani, A. et al. Adv. Neural Inf. Process. Syst. (2017); https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
21. Edlund, C. et al. Nat. Methods 18, 1038–1045 (2021).
22. Lin, J.-R. et al. Cell 186, 363–381.e19 (2023).
23. He, K. et al. In Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition 15979–15988 (IEEE, 2022).
24. Ouyang, L. et al. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).
25. Hu, E. J. et al. In Intl Conf. Learning Representations (2022); https://openreview.net/forum?id=nZeVKeeFYf9

Acknowledgements
We thank R. Xie and K. Mckeen for insightful discussions. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN-2020-06189 and DGECR-2020-00294), Canadian Institute for Advanced Research (CIFAR) AI Catalyst Grants, and CIFAR AI Chair programs.

Author contributions
J.M. wrote the manuscript and B.W. edited the original draft and provided funding support. All authors wrote, edited and gave final approval to the manuscript.

Competing interests
The authors declare no competing interests.
