Image Compositing for Segmentation of Surgical Tools without Manual Annotations

Garcia-Peraza-Herrera, Luis C.; Fidon, Lucas; D'Ettorre, Claudia; Stoyanov, Danail; Vercauteren, Tom; Ourselin, Sebastien

doi:10.1109/tmi.2021.3057884

Computer Science > Computer Vision and Pattern Recognition

arXiv:2102.09528 (cs)

[Submitted on 18 Feb 2021]

Title: Image Compositing for Segmentation of Surgical Tools without Manual Annotations

Authors: Luis C. Garcia-Peraza-Herrera, Lucas Fidon, Claudia D'Ettorre, Danail Stoyanov, Tom Vercauteren, Sebastien Ourselin

Abstract: Producing manual, pixel-accurate image segmentation labels is tedious and time-consuming. This is often a rate-limiting factor when large amounts of labeled images are required, such as for training deep convolutional networks for instrument-background segmentation in surgical scenes. No large datasets comparable to industry standards in the computer vision community are available for this task. To circumvent this problem, we propose to automate the creation of a realistic training dataset by exploiting techniques stemming from special effects and harnessing them to target training performance rather than visual appeal. Foreground data is captured by placing sample surgical instruments over a chroma key (a.k.a. green screen) in a controlled environment, thereby making extraction of the relevant image segment straightforward. Multiple lighting conditions and viewpoints can be captured and introduced in the simulation by moving the instruments and camera and modulating the light source. Background data is captured by collecting videos that do not contain instruments. In the absence of pre-existing instrument-free background videos, only minimal labeling effort is required: selecting frames that do not contain surgical instruments from videos of surgical interventions freely available online. We compare different methods to blend instruments over tissue and propose a novel data augmentation approach that takes advantage of the plurality of options. We show that by training a vanilla U-Net on semi-synthetic data only and applying simple post-processing, we are able to match the results of the same network trained on a publicly available manually labeled real dataset.
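The abstract describes the compositing pipeline only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. A simple green-dominance rule stands in for chroma-key extraction; two hypothetical blending options ("hard" paste and a box-blurred "feathered" alpha) stand in for the blending methods the paper actually compares; and drawing one blend at random per sample mimics, in spirit only, the augmentation that exploits several blending options. The thresholds (`min_green`, `margin`), function names, and blend set are all illustrative. Plain Python lists keep it dependency-free; real images would use an array library.

```python
import random
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in [0, 1]
Image = List[List[Pixel]]
Mask = List[List[float]]            # 1.0 = instrument, 0.0 = background


def chroma_key_mask(fg: Image, min_green: float = 0.5, margin: float = 0.2) -> Mask:
    """Hypothetical keyer: a pixel is green screen when its green channel
    is bright and clearly dominates both red and blue."""
    mask = []
    for row in fg:
        mask_row = []
        for r, g, b in row:
            is_green = g >= min_green and g - max(r, b) >= margin
            mask_row.append(0.0 if is_green else 1.0)
        mask.append(mask_row)
    return mask


def box_blur(mask: Mask, radius: int = 1) -> Mask:
    """Soften mask edges by averaging over a (2r+1)x(2r+1) window clipped at borders."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            vals = [mask[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out_row.append(sum(vals) / len(vals))
        out.append(out_row)
    return out


def alpha_composite(fg: Image, bg: Image, alpha: Mask) -> Image:
    """Per-pixel linear blend: out = alpha * fg + (1 - alpha) * bg."""
    return [[tuple(a * f + (1 - a) * b for f, b in zip(fp, bp))
             for fp, bp, a in zip(frow, brow, arow)]
            for frow, brow, arow in zip(fg, bg, alpha)]


# Hypothetical blending options; the paper evaluates its own set of methods.
BLENDERS: Dict[str, Callable[[Mask], Mask]] = {
    "hard": lambda m: m,
    "feathered": box_blur,
}


def make_training_pair(fg: Image, bg: Image, rng: random.Random) -> Tuple[Image, Mask]:
    """Composite one green-screen instrument image over one tissue frame.

    The blending option is drawn at random per sample (augmentation),
    while the segmentation label is always the hard chroma-key mask."""
    label = chroma_key_mask(fg)
    blend = BLENDERS[rng.choice(sorted(BLENDERS))]
    image = alpha_composite(fg, bg, blend(label))
    return image, label


if __name__ == "__main__":
    GREEN: Pixel = (0.1, 0.9, 0.1)   # green-screen pixel
    STEEL: Pixel = (0.6, 0.6, 0.6)   # instrument pixel
    TISSUE: Pixel = (0.8, 0.3, 0.3)  # background tissue pixel
    fg = [[GREEN, STEEL, GREEN]]
    bg = [[TISSUE] * 3]
    image, label = make_training_pair(fg, bg, random.Random(0))
    print(label)  # [[0.0, 1.0, 0.0]]
```

Keeping the label tied to the hard key while only the appearance of the composite varies means every blending option yields the same ground truth, which is what lets the blend choice act as a label-preserving augmentation in this sketch.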

Comments:	Accepted by IEEE TMI
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2102.09528 [cs.CV]
	(or arXiv:2102.09528v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2102.09528
Related DOI:	https://doi.org/10.1109/tmi.2021.3057884

Submission history

From: Luis Carlos Garcia-Peraza-Herrera
[v1] Thu, 18 Feb 2021 18:14:43 UTC (45,491 KB)