REVEAL: Robust Endoscopic Vision-language foundation model for surgical video understanding

Uploaded by Muhammad Saqib

Summary of Research
Developing specialist computer vision models is tedious and requires extensive labelled data and
training. This hinders the progress of AI in medical interventions and slows the development and
deployment of applications, because of the enormous amount of labelled data required. In this research
we investigate Foundation Models (FMs): large deep neural networks trained on massive and diverse
datasets. Our research focuses on assessing the effectiveness and potential drawbacks of using FMs for
surgical data science tools, with the aim of developing more accessible and practical AI solutions for
clinicians and patients.

Objectives
The main aim of the project is to train and adapt Foundation Models (FMs) to create AI solutions for
cardiac and colorectal surgery. Specific objectives include:

1. Establishing a comprehensive evaluation framework to support the widespread adoption of FMs.

2. Holistically assessing the surgical skills of cardiac and colorectal surgeons in the UK.

3. Advancing the adaptation of FMs for Electronic Medical Records (FEMRs).

Research Context

Research in the field of computer vision and artificial intelligence (AI) has shown remarkable promise in
revolutionizing medical interventions, particularly in surgery. However, the development of specialist
computer vision models tailored for surgical applications presents significant challenges [1][2][3][4][5].
Traditionally, these models require extensive labelled datasets and time-consuming training processes,
which not only impede progress but also hinder the practical application of AI in medical settings.
The specific challenges of working with medical data also mean that many different types of AI model
must be developed. Consequently, there is a need for more efficient and accessible solutions to
accelerate the deployment of AI in surgery.

This research seeks to address these challenges by exploring Foundation Models (FMs) [6][7] as a more
efficient alternative. Unlike traditional approaches [8][9], FMs are pre-trained on large amounts of
largely unlabelled data, typically with self-supervised objectives, and are then adapted to specific
tasks with minimal annotated data and fine-tuning. This approach reduces the resource-intensive nature
of model development and promises to speed up the application of AI in surgical data science tools.
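The adaptation step described above is often realised as a lightweight classifier on top of a frozen pre-trained encoder. The sketch below is a minimal illustration, not the project's method: it assumes an FM has already mapped each surgical video frame to a fixed-length embedding, and the three-dimensional vectors and instrument labels in the example are hypothetical toy values. A nearest-class-centroid classifier then needs only a few labelled examples per class and no gradient-based training.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def class_centroids(support: List[Tuple[Sequence[float], str]]) -> Dict[str, List[float]]:
    """Average the normalised embeddings of the few labelled examples of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for embedding, label in support:
        e = l2_normalise(embedding)
        if label not in sums:
            sums[label] = [0.0] * len(e)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], e)]
        counts[label] += 1
    return {label: l2_normalise([s / counts[label] for s in total])
            for label, total in sums.items()}


def predict(embedding: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the class whose centroid has the highest cosine similarity to the embedding."""
    e = l2_normalise(embedding)
    return max(centroids, key=lambda label: sum(a * b for a, b in zip(e, centroids[label])))


# Hypothetical frozen-encoder embeddings: two labelled frames per instrument class.
support_set = [
    ([0.9, 0.1, 0.0], "grasper"),
    ([0.8, 0.2, 0.1], "grasper"),
    ([0.1, 0.9, 0.0], "scissors"),
    ([0.0, 0.8, 0.2], "scissors"),
]
centroids = class_centroids(support_set)
print(predict([0.7, 0.2, 0.0], centroids))  # grasper
```

When more labels become available, a trained linear head or partial unfreezing of the encoder are common alternatives to the centroid rule.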

To assess the effectiveness and potential drawbacks of using FMs in surgical contexts, we draw
inspiration from recent advances in the evaluation of foundation models, particularly holistic
evaluation frameworks [10]. Such frameworks demonstrate the importance of comprehensive,
multi-dimensional evaluation in revealing the real-world strengths and limitations of these models.
Building on these insights, we aim to establish a robust evaluation framework tailored specifically to
assessing the performance of FMs in surgical settings.

Summary of recent approaches used in developing Foundation Models (FMs)

Traditional AI approaches, such as expert systems [8], decision trees [9] and task-specific natural
language processing (NLP) models, require extensive training and labelled data (or, in the case of
expert systems, manually encoded knowledge), which hampers the progress of AI in medical
interventions. Foundation models (FMs) are large deep learning neural networks [11] trained on massive
datasets. These models can perform a wide range of tasks with a high degree of accuracy based on input
prompts, including natural language processing tasks [12], question answering and image
classification [13][14]. Bidirectional Encoder Representations from Transformers (BERT) [6] was one of
the first foundation models. The Generative Pre-trained Transformer (GPT) family of models [7] was
introduced by OpenAI in 2018. Stable Diffusion [15] is a text-to-image model that can generate
realistic-looking, high-definition images. Similarly, generalist medical AI (GMAI) [16] models are
capable of carrying out a diverse set of tasks using very little or no task-specific labelled data.

Bibliography

1. Elyan, E., Vuttipittayamongkol, P., Johnston, P., Martin, K., McPherson, K., Moreno-Garcia, C.F.,
Jayne, C. and Sarker, M.M.K. (2022). Computer vision and machine learning for medical image analysis:
recent advances, challenges, and way forward. Artificial Intelligence Surgery, 2, 24-25.
https://doi.org/10.20517/ais.2021.15
2. Bodenstedt, S., Wagner, M., Müller-Stich, B.P., Weitz, J. and Speidel, S. (2020). Artificial
intelligence-assisted surgery: potential and challenges. Visceral Medicine, 36(6), 450-455.
https://doi.org/10.1159/000511351
3. Liang, X., Yang, X., Yin, S. et al. (2021). Artificial intelligence in plastic surgery: applications and
challenges. Aesthetic Plastic Surgery, 45, 784-790. https://doi.org/10.1007/s00266-019-01592-2
4. Kumar, S., Singhal, P. and Krovi, V.N. (2015). Computer-vision-based decision support in surgical
robotics. IEEE Design & Test, 32(5), 89-97. https://doi.org/10.1109/MDAT.2015.2465135
5. Chadebecq, F., Vasconcelos, F., Mazomenos, E. and Stoyanov, D. (2020). Computer vision in the
surgical operating room. Visceral Medicine, 36(6), 456-462. https://doi.org/10.1159/000511934
6. Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2018). BERT: pre-training of deep bidirectional
transformers for language understanding. arXiv:1810.04805v2 [cs.CL].
7. Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt,
J., Altman, S., Anadkat, S. et al. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.
8. Buchanan, B.G. and Smith, R.G. (1988). Fundamentals of expert systems. Annual Review of Computer
Science, 3. https://annualreviews.org/doi/abs/10.1146/annurev.cs.03.060188.000323 (accessed 30 April
2024).
9. Navada, A. et al. (2011). Overview of use of decision tree algorithms in machine learning. In 2011
IEEE Control and System Graduate Research Colloquium. IEEE.
10. Lee, T., Yasunaga, M., Meng, C., Mai, Y., Park, J.S., Gupta, A., Zhang, Y., Narayanan, D., Teufel,
H., Bellagente, M. et al. (2023). Holistic evaluation of text-to-image models. Advances in Neural
Information Processing Systems, 36.
11. Schmidhuber, J. (2015). Deep learning in neural networks: an overview. Neural Networks, 61, 85-117.
12. Chowdhary, K.R. (2020). Natural language processing. In Fundamentals of Artificial Intelligence,
603-649.
13. Esteva, A., Chou, K., Yeung, S. et al. (2021). Deep learning-enabled medical computer vision. npj
Digital Medicine, 4, 5. https://doi.org/10.1038/s41746-020-00376-2
14. Deng, J. et al. (2009). ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference
on Computer Vision and Pattern Recognition, 248-255. IEEE.
15. AWS (2024). Stable Diffusion XL on Amazon Bedrock. https://aws.amazon.com/bedrock/stable-diffusion/
(accessed 24 April 2024).
16. Moor, M., Banerjee, O., Abad, Z.S.H. et al. (2023). Foundation models for generalist medical artificial
intelligence. Nature, 616, 259-265. https://doi.org/10.1038/s41586-023-05881-4

Research Methodology
Data Collection and Preprocessing:

Gather a diverse dataset of surgical videos encompassing procedures in cardiac and colorectal surgery.
Ensure the dataset includes a wide range of variations, complexities, and surgical scenarios. Annotate a
subset of the dataset with relevant labels, such as surgical instruments, anatomical structures, and
procedural steps. This annotated subset will be used for fine-tuning the Foundation Models (FMs).
Preprocess the dataset to enhance its quality, remove noise, and standardize formats to ensure
compatibility with the FM training process.
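The data-preparation steps above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions: the `SurgicalVideo` record, its fields and the 25% annotation fraction are hypothetical, frames are assumed to be addressed by index, and uniform temporal sampling plus stratified selection are one common choice rather than a prescribed protocol. Stratifying by procedure type keeps both cardiac and colorectal cases in the annotated subset.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SurgicalVideo:
    video_id: str    # hypothetical identifier
    procedure: str   # hypothetical field, e.g. "cardiac" or "colorectal"
    num_frames: int


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Return frame indices at the centres of num_samples equal-length segments."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def stratified_annotation_subset(videos: List[SurgicalVideo], fraction: float,
                                 seed: int = 0) -> List[SurgicalVideo]:
    """Pick about `fraction` of the videos of each procedure type for manual annotation."""
    by_procedure: Dict[str, List[SurgicalVideo]] = defaultdict(list)
    for video in videos:
        by_procedure[video.procedure].append(video)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    subset: List[SurgicalVideo] = []
    for procedure in sorted(by_procedure):
        group = by_procedure[procedure]
        # At least one video per procedure type, never more than the group holds.
        k = min(len(group), max(1, round(len(group) * fraction)))
        subset.extend(rng.sample(group, k))
    return subset


videos = [SurgicalVideo(f"cardiac_{i}", "cardiac", 9000) for i in range(8)] + \
         [SurgicalVideo(f"colorectal_{i}", "colorectal", 12000) for i in range(4)]
print(sample_frame_indices(100, 4))  # [12, 37, 62, 87]
to_annotate = stratified_annotation_subset(videos, fraction=0.25)
print(len(to_annotate))  # 3
```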

Training Foundation Models (FMs):

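Use state-of-the-art pre-trained Foundation Models, such as BERT and GPT, as the basis for further
training. Because these language models process text only, they will need to be paired with
pre-trained visual encoders in a vision-language architecture so that surgical video frames and their
textual annotations can be modelled jointly. Fine-tune the pre-trained FMs on the annotated subset of
surgical videos using transfer learning techniques, adapting them to comprehend and analyze surgical
video data effectively.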
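The transfer-learning step can be illustrated with its simplest form, a linear probe: the pre-trained encoder is kept frozen and only a small task-specific head is trained on its outputs. The sketch below is a dependency-free illustration under assumptions: the two-dimensional feature vectors stand in for embeddings from a hypothetical frozen vision-language encoder, the binary label (for example, whether an instrument is visible) is hypothetical, and the head is plain logistic regression trained by full-batch gradient descent. In a PyTorch workflow the same idea is usually expressed by setting `requires_grad = False` on the backbone parameters and passing only the head's parameters to the optimiser.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_head(features: List[Sequence[float]], labels: List[int],
                      lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a logistic-regression head on frozen features by full-batch gradient descent.

    Only the head's weights and bias are updated; whatever encoder produced
    `features` is left untouched, which is what makes this a frozen-backbone probe.
    """
    n, dim = len(features), len(features[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of binary cross-entropy with respect to the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the mean gradient over the batch.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_label(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 if the head's predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


# Hypothetical frozen-encoder features for six annotated frames (1 = instrument visible).
train_x = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0], [0.1, 0.9], [0.2, 1.0], [0.0, 0.8]]
train_y = [1, 1, 1, 0, 0, 0]
w, b = train_linear_head(train_x, train_y)
print(predict_label(w, b, [0.95, 0.05]))  # 1
```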

Evaluation Framework Development:

Develop a comprehensive evaluation framework inspired by recent advancements in language model
evaluation. Define metrics and criteria for assessing the performance of FMs in surgical video
understanding tasks, considering factors such as accuracy, efficiency and clinical relevance.
Collaborate with clinical experts to validate the evaluation framework and ensure it aligns with
real-world surgical requirements and expectations.
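As a starting point for the quantitative part of such a framework, the sketch below reports several metrics side by side (accuracy, per-class and macro-averaged F1, and mean inference latency as a simple efficiency measure) instead of collapsing them into a single score. It is a minimal illustration under assumptions: the model is any callable that maps an input to a label, and the surgical-phase labels and the lookup-table "model" in the example are hypothetical. Clinical relevance cannot be measured by this code; in the plan above it is to be established with clinical experts.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class EvaluationReport:
    accuracy: float
    macro_f1: float
    per_class_f1: Dict[str, float]
    mean_latency_ms: float


def f1_per_class(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """One-vs-rest F1 for every label that appears in the ground truth or the predictions."""
    scores: Dict[str, float] = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def evaluate(model: Callable[[str], str], inputs: Sequence[str],
             y_true: Sequence[str]) -> EvaluationReport:
    """Run the model on every input, timing each call, and summarise the results."""
    preds: List[str] = []
    latencies_ms: List[float] = []
    for x in inputs:
        start = time.perf_counter()
        preds.append(model(x))
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    per_class = f1_per_class(y_true, preds)
    return EvaluationReport(
        accuracy=sum(t == p for t, p in zip(y_true, preds)) / len(y_true),
        macro_f1=sum(per_class.values()) / len(per_class),
        per_class_f1=per_class,
        mean_latency_ms=sum(latencies_ms) / len(latencies_ms),
    )


# Hypothetical phase-recognition "model" (a lookup table) and ground truth for four clips.
fake_predictions = {"clip1": "incision", "clip2": "dissection", "clip3": "closure", "clip4": "closure"}
truth = ["incision", "dissection", "dissection", "closure"]
report = evaluate(lambda clip: fake_predictions[clip], ["clip1", "clip2", "clip3", "clip4"], truth)
print(report.accuracy, round(report.macro_f1, 3))  # 0.75 0.778
```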

Holistic Assessment of Surgical Skills:

Engage with cardiac and colorectal surgeons in the UK to holistically assess their surgical skills using the
developed FM-based tools. Implement the FM-powered surgical skill assessment within platforms like
IVA HEART to enable personalized feedback and insights for surgeons. Collect feedback from surgeons
regarding the usability, effectiveness, and practicality of the FM-based assessment tools.
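The plan does not name a usability instrument. Purely as an assumed example, the widely used System Usability Scale (SUS) could be administered to participating surgeons. It consists of ten statements rated from 1 to 5: odd-numbered (positively worded) items contribute the rating minus 1, even-numbered (negatively worded) items contribute 5 minus the rating, and the total is multiplied by 2.5 to give a score from 0 to 100. The sketch below implements that standard scoring rule; the response data are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one completed SUS questionnaire (ten ratings, each 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


# Hypothetical questionnaires from three surgeons.
questionnaires: List[List[int]] = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
]
scores = [sus_score(q) for q in questionnaires]
print(scores, mean(scores))  # [100.0, 50.0, 75.0] 75.0
```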

Advancing Adaptation of FMs for Electronic Medical Records (FEMRs):

Explore the integration of FMs into Electronic Medical Record (EMR) systems to enhance clinical
documentation, decision support, and data analysis. Collaborate with healthcare IT professionals and
EMR vendors to integrate FM-based functionalities into existing EMR platforms and workflows.

Iterative Refinement and Validation:

Iteratively refine the trained FMs, evaluation framework, and tools based on feedback and insights
gathered from clinicians, researchers, and end-users. Publish findings, methodologies, and insights in
peer-reviewed journals and present them at relevant conferences and symposia to contribute to the
broader scientific community.

Tentative Timetable
• Work Package 1
o Months 1-6:
▪ Course work completed.
▪ Research proposal prepared in close collaboration with the supervisor.
▪ Proposal approved by the supervisor and the advisory team.
▪ Proposal workshop conducted.
▪ 1st 6-month report submitted to the advisory team.
o Months 6-12:
▪ Data collection for Paper 1 (literature survey and problem identification) completed.
▪ Annual evaluation.
▪ First paper prepared for submission.
▪ 2nd 6-month report submitted to the advisory team.
• Work Package 2
o Months 12-24:
▪ Paper 1 published.
▪ Evaluation framework proposed based on the literature review.
▪ Journal paper on the proposed framework prepared.
▪ 3rd and 4th 6-month reports submitted to the advisory team.
• Work Package 3
o Months 24-32:
▪ Research work completed.
▪ Journal paper published.
▪ Research paper based on the user study prepared.
▪ 5th 6-month report submitted to the advisory team.
• Work Package 4
o Months 32-36:
▪ Month 30: thesis writing completed.
▪ Thesis submitted to the supervisor.
▪ Revised thesis submitted to the two reviewers.
▪ Final presentation.

Relevant Experience
• Master's-level IT professional with 14+ years of experience in designing large-scale web
applications.
• Developing and overseeing healthcare solutions for the past 8 years, with hands-on experience of a
number of related technologies.
• Over 8 years of project management experience on large-scale web applications and services.
• Designed and architected software systems and web services for Fortune 500 companies such as
Ford Motors, BMW, Hyundai and Toyota.

I have been working as a Technical Project Manager at OneTouch EMR since 2015. From 2012 to 2015 I
worked as a Technical Manager and Architect. From July 2009 to December 2012 I served as a
Lecturer/Software Engineer at the NUST School of Electrical Engineering and Computer Science (SEECS);
NUST is one of the top universities in Pakistan.

Encl: CV