DOI: 10.1002/jso.26496
REVIEW ARTICLE
Surgical data science and artificial intelligence for surgical education

Thomas M. Ward MD1 | Pietro Mascagni MD2,3,4 | Amin Madani MD, PhD5 | Nicolas Padoy PhD2,4 | Silvana Perretta MD4,6 | Daniel A. Hashimoto MD, MS1

1 Department of Surgery, Surgical AI & Innovation Laboratory, Massachusetts General Hospital, Boston, Massachusetts
2 ICube, University of Strasbourg, CNRS, France
3 Fondazione Policlinico A. Gemelli IRCCS, Rome, Italy
4 IHU Strasbourg, Strasbourg, France
5 Department of Surgery, University Health Network, Toronto, Canada
6 IRCAD, Strasbourg, France

Correspondence
Daniel A. Hashimoto, MD MS, Surgical AI & Innovation Laboratory, Department of Surgery, Massachusetts General Hospital, 15 Parkman St, WAC460, Boston, MA 02114, USA.
Email: [email protected]

Abstract
Surgical data science (SDS) aims to improve the quality of interventional healthcare and its value through the capture, organization, analysis, and modeling of procedural data. As data capture has increased and artificial intelligence (AI) has advanced, SDS can help to unlock augmented and automated coaching, feedback, assessment, and decision support in surgery. We review major concepts in SDS and AI as applied to surgical education and surgical oncology.

KEYWORDS
artificial intelligence, computer vision systems, data science, deep learning, surgical education
1 | INTRODUCTION

Surgical data science (SDS) aims to improve the quality of interventional healthcare and its value through the capture, organization, analysis, and modeling of procedural data. Although this certainly includes surgery, the tent of SDS also includes other procedural fields such as interventional radiology, pulmonology, and gastroenterology.1 Building on the evolution of innovative technologies in these fields and the foundation of data analysis established by the quality improvement and health services fields, SDS incorporates a range of inputs from traditional registry and claims data, device data, and patient/surgeon-specific data to yield insights into procedural care (Figure 1).

Over time, clinicians have evolved from relying only on intuition and personal experience to having data-driven intuition and collective experience as the result of technology, innovation, and evidence-based medicine. The challenge now is striking the correct balance between population-level data and clinician experience to deliver individualized care (Figure 2), as there is no area of medicine that is currently "data only." As clinicians, we are expected to consider the best available evidence or data for an individual patient and then call upon our experience to ultimately make a decision in the best interest of the specific patient.

FIGURE 2 Modern clinicians have varying levels of intuition or experience and data depending on the clinical situation and must balance the two to deliver care to patients.

Artificial intelligence (AI) has also recently grown in popularity and interest within surgery. Although SDS and AI in surgery are not one and the same, SDS can utilize techniques from AI to facilitate improvements in the delivery of surgical care, whether through direct patient contact via diagnostic and therapeutic intervention or through clinicians who can benefit from data-enabled insights into their own performance. Other published studies have conducted reviews of the diagnostic and therapeutic potential of SDS and AI2-5; we focus our review on applications to surgical education, including decision support and coaching, feedback, and performance assessment. We briefly review concepts in AI in surgery, their current applications in research (emphasizing possible impacts for surgical education), and the anticipated future directions of the field.

1.1 | AI in surgery
The field of AI studies how to make computers function intelligently to understand, process, and act in the world. The media portrays a yet-to-be-achieved level of AI, Artificial General Intelligence, where robots and computers function at human-like levels.6 Realistically, current AI functions in a narrow manner, where computers can successfully perform a few select tasks. Today, narrow AI is ubiquitous: it recommends movies to consumers, drives autonomous vehicles, and even processes letters for postal services.

Although the term AI was not formally coined until 1956 at the Dartmouth Summer Research Project, influences on the field date back decades (if not centuries).7 Despite lofty initial promises, AI as a field experienced a "boom and bust" trajectory over the next four decades, including two "AI Winters" during which excitement and funding for the field dropped significantly.8 The most recent "AI Winter" has experienced a thaw over the past twenty-five years, driven by gradual advances in three combined factors: data availability, computational power, and novel techniques.

Data forms the foundation upon which computers learn. Now, more than ever, we generate "big data." Before 2003, humanity created 5 exabytes (5 × 10^18 bytes) of data, an amount now generated every two days.9 Advances in deep learning, which started in the 1990s, created algorithms with multiple processing layers that could use this tremendous amount of data to discover complex and hidden patterns.10 However, we could not use these algorithms efficiently until recently with the use of modern high-powered computer chips called Graphical Processing Units (GPUs) that can process the mountains of data through the multiple layers of deep learning.11

Machine learning (ML) is a field of study within AI that tries to teach computers and machines how to learn. Learning is a fundamental process where an agent (be it human or computer) uses information from observations to improve its performance on future tasks. Machine learning is often categorized into four different categories described by the feedback with which the machine tries to learn: supervised, semi-supervised, unsupervised, and reinforcement.12 In supervised learning, the machine algorithm receives inputs in the form of labeled (annotated) data. It learns the ability to take future unlabeled data and correctly determine the labels. For example, a machine could be fed x-rays labeled with pneumonia and could subsequently attempt to identify pneumonia in unlabeled x-rays.

In unsupervised learning, machines recognize patterns within their input data. As an example, if given a wide sample of unlabeled chest radiographs, the machine will group the films into many categories. Upon review, these categories may include films that have "no pathology" or "pneumonia," but they may even uncover previously unknown clusters (e.g., subtle signs of volume overload or early malignancies). Semi-supervised learning lies in between these two extremes, training on partially labeled data.

Lastly, reinforcement learning occurs via trial and error while attempting to achieve a specific objective. The AI either receives a positive reward when it succeeds or punishment when it fails. In reinforcement learning, the machine is not told exactly what it did wrong, but rather just that it did something wrong. It then refines its actions over repeated iterations, keeping those that led to success and avoiding those that led to failure. Although reinforcement learning seems to be an indirect and therefore inefficient way to train an AI, it can produce incredible results. Through reinforcement learning, a computer algorithm with no prior knowledge of the game of Go was able to easily beat the world champion.13
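To make the distinction between these paradigms concrete, the short sketch below contrasts supervised classification with unsupervised clustering on a toy tabular dataset using scikit-learn. The synthetic data, feature values, and "pneumonia" label are illustrative assumptions only and do not correspond to any study cited above.

```python
# Minimal sketch contrasting supervised and unsupervised learning.
# The synthetic "radiograph feature" data below is purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy feature vectors (e.g., summary statistics extracted from chest x-rays)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # hypothetical "pneumonia" label

# Supervised learning: fit on labeled films, then predict labels for new films
clf = LogisticRegression().fit(X[:150], y[:150])
print("predicted labels:", clf.predict(X[150:155]))

# Unsupervised learning: no labels given; the algorithm groups similar films
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster assignments:", clusters[:10])
```

The supervised model can only reproduce the label it was taught, whereas the clustering step may surface groupings that no annotator anticipated, which is exactly the trade-off described above.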
Classical ML methods use structures such as trees to represent data and require experts in the field to "hand-craft" parts of the process (e.g., select features most likely to represent the phenomenon of interest) to work with each domain's data. Neural networks take a different approach, where inferences are made from the data itself without requiring manual feature selection. Neural networks have proven to be highly adaptable and generalizable, automatically selecting the features that are most likely to yield results for a given phenomenon of interest; thus, they can work for a multitude of problems.7

Similar to a human brain's internal structure, a neural network is composed of tens to hundreds, and even thousands, of computer-represented "neurons" that are either on or off. A deep-learning network is a stack of three or more neuronal layers, with each layer specializing at a certain task, creating more specific output through each progression of the layers.10 To provide a simplified example, consider a deep learning network with multiple layers for recognizing geometric shapes. The first layer may decide if an object has straight versus rounded edges. Having noted straight edges, the next layer determines the number of edges (in this case four). The ensuing layer detects that the four edges are the same length, and the final layer detects that each edge is at a right angle to the others. Taken all together, the network finally outputs the object's classification as a square (Figure 3).
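The layered structure described above can be expressed in a few lines of a deep learning framework. The sketch below stacks three small layers in PyTorch for the hypothetical shape-classification task; the input features, layer sizes, and class names are illustrative assumptions rather than a published architecture.

```python
# Minimal sketch of a three-layer ("deep") network, assuming a toy task in
# which each shape is summarized by 16 hand-picked geometric features.
import torch
from torch import nn

shape_classes = ["square", "rectangle", "triangle", "circle"]  # hypothetical labels

model = nn.Sequential(
    nn.Linear(16, 32),  # first layer: low-level edge/curvature evidence
    nn.ReLU(),
    nn.Linear(32, 16),  # second layer: combinations such as "four equal edges"
    nn.ReLU(),
    nn.Linear(16, len(shape_classes)),  # final layer: one score per shape class
)

features = torch.randn(1, 16)          # one example with 16 toy features
scores = model(features)               # forward pass through all layers
predicted = shape_classes[scores.argmax(dim=1).item()]
print(predicted)
```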
The beauty of deep learning lies in the summation of each layer, which performs a small simple function, into a complex overall output. As it is composed of small functions, different layers can be combined to produce different outputs, and even small modifications of just the last few layers allow for easy reapplication of a model that performs one task (such as diagnosing chest radiographs) to a model that can accurately perform another (such as diagnosing retinal images). This technique is known as transfer learning, and the prior example in fact has been published in the literature.14,15 The use of deep learning has particularly revolutionized the fields of image recognition and language processing.
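As a concrete illustration of transfer learning, the sketch below reuses a network pretrained on everyday photographs and swaps only its final layer for a new two-class task. The choice of backbone, the two-class problem, and the training details are assumptions made for illustration, not the method of the studies cited above.

```python
# Minimal transfer learning sketch: keep a pretrained backbone, retrain only
# the final classification layer for a new (hypothetical) two-class problem.
import torch
from torch import nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in backbone.parameters():      # freeze previously learned layers
    param.requires_grad = False

backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # new task-specific head

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 RGB images
images = torch.randn(4, 3, 224, 224)
labels = torch.tensor([0, 1, 0, 1])
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```

Because only the small final layer is retrained, far less labeled data is needed than training the whole network from scratch, which is why the technique transfers so readily between imaging domains.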
Although classical machine learning, neural networks, and deep learning are not the only approaches to AI available to researchers and developers, they are some of the most popular in use in surgery.4 Furthermore, each of these techniques can be utilized within applicative fields such as natural language processing and computer vision, as well as strengthen the impact of other fields related to real-world decision making. By processing real-world data into a form in which computers can reason about, or better "understand," the impact of actions in reality, advances in these fields are expected to significantly impact surgery and many other application domains.

Natural language processing (NLP), the comprehension of written human language by computers, has also undergone a revolution due to neural networks. Before neural networks, NLP was largely limited to N-gram models, which estimate the probability of an ensuing word from the words that precede it.12 Language, however, is highly composable and does not lend itself well to prediction from the preceding words alone, as N-gram models do. The composable nature of neural networks nicely mirrors that of language, which allows for extremely accurate NLP.16 Now that computers can reliably model and understand language, researchers have begun to deploy NLP in multiple medical applications, particularly those related to the electronic medical record.
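To illustrate what an N-gram model actually computes, the sketch below builds bigram (two-word) probabilities from a tiny corpus and uses them to estimate the next word. The example sentences are invented for illustration and are not drawn from any clinical dataset.

```python
# Minimal bigram (2-gram) language model: P(next word | previous word)
# estimated by counting word pairs in a tiny, made-up corpus.
from collections import Counter, defaultdict

corpus = [
    "the gallbladder was dissected",
    "the cystic duct was clipped",
    "the cystic artery was clipped",
]

counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def next_word_probability(prev, nxt):
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(next_word_probability("was", "clipped"))  # 2 of 3 continuations
print(counts["cystic"].most_common(1))          # most likely word after "cystic"
```

A model of this kind sees only the immediately preceding word, which is why it struggles with the long-range, compositional structure of real language that neural networks capture far better.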
Computer vision (CV) is the process of training a computer to see and understand images. After relative stagnation for decades, Krizhevsky et al.17 revolutionized the field in 2012, obtaining human-like object classification accuracy. Their algorithm succeeded through the utilization of a particular neural network structure called convolutional neural networks (CNNs). CNNs work in a similar fashion to a human's visual cortex. Instead of needing to process every pixel of information, they allow for classification of an image's parts into the key components necessary for recognition, such as shape, texture, and color. As the network only needs to train on these small components, it learns more efficiently and quickly while attempting to minimize the dangers of overfitting. Overfitting occurs when a statistical model conforms too closely to a selection of data points and, therefore, performs poorly when applied to other data sets. As CNNs learn to visually recognize objects through the key components alone, they can recognize a car, for example, even if it is a different brand or color. The success of CNNs has led to numerous deployments in visual aspects of medicine, from image-based diagnostics to real-time surgical video analysis. It has also formed the basis for much of the recent success and growing interest in AI in surgery.
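The sketch below shows the convolutional building blocks such a network stacks: small filters that detect local patterns, pooling that summarizes them, and a final classification layer. The two-class surgical-frame task and the layer sizes are illustrative assumptions only.

```python
# Minimal convolutional neural network (CNN) sketch: convolution layers learn
# small local patterns (edges, textures), pooling condenses them, and a final
# linear layer maps the condensed features to class scores.
import torch
from torch import nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3-channel image -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1),                     # one summary value per feature map
    nn.Flatten(),
    nn.Linear(32, 2),                            # e.g., "instrument present" vs "absent"
)

frame = torch.randn(1, 3, 224, 224)  # one dummy video frame
print(cnn(frame).shape)              # torch.Size([1, 2])
```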
FIGURE 3 Simplified schematic of a neural network that identifies shapes. Each layer derives a piece of information that can ultimately result in the identification of a shape such as a square. More complex versions of this basic architecture have achieved tasks such as identification of surgical instruments, anatomy, and operative steps. Courtesy of SAIIL. Reproduced with permission.
1.2 | AI for surgical performance augmentation and education

Data over the last three decades have shown alarmingly high rates of preventable adverse events amongst hospitalized surgical patients.
Within these reports, which include a wide range of demographics, geographic locations, and surgical subspecialties, root-cause analyses tend to trace most of the errors back to events that occurred at the time of surgery.18-21 Specifically, adverse events tend to occur as an error in judgment or decision-making that led to behaviors or actions that contributed to the outcome.22 Most surgeons would agree that the skills that contribute the greatest to the development of an elite surgeon are their cognitive skills as opposed to psychomotor skills. This is also supported by the body of literature in surgical education emphasizing the importance of intraoperative judgment and decision-making as a dominant determinant of surgical performance and, ultimately, surgical outcome.22 For instance, errors in human visual pattern recognition can lead to misinterpretation of surgical anatomy that leads to injury of a critical structure, such as the bile duct during laparoscopic cholecystectomy.23

It is, therefore, unsurprising that with the advances in the digitalization of the surgical field and the ability to collect large sets of data from the operating room (OR) (e.g., images, videos), innovators and researchers have turned to machine learning as a potential tool to augment a surgical team's performance. The field of computer vision has made significant strides over the last decade, and new machine learning methodologies such as deep learning have provided the means to develop algorithms that can perform advanced human-level perceptual functions, such as object recognition and tracking within a video, and scene recognition. Given that most errors stem from failures of advanced cognitive skills, there is tremendous potential in developing algorithms that are able to analyze surgical data and augment our mental model to improve surgical decision-making.

Although computer vision has shown promise in various nonsurgical fields of medicine (e.g., cancer diagnosis from mammograms), there are several challenges to consider before applying computer vision to surgery. Firstly, anatomical structures in the field are almost never well-demarcated and are often hidden under layers of fatty and fibrous tissues, making it difficult to train a model for intraoperative guidance and navigation. To compound this problem, unlike images from diagnostic radiology or fundoscopy, most surgical videos have a tremendous amount of variability in terms of background noise, image and video quality, and other objects in the field. Secondly, there is significant variation amongst experts with respect to their advanced cognitive behaviors. For instance, most experts will not agree on the exact location where to dissect, the exact location of an anatomical plane, or the best possible instrument to use in any given instance. Therefore, establishing a gold-standard reference ("ground truth") on which to train an algorithm and evaluate its performance is a major obstacle.
To overcome some of these challenges, Madani et al.24 proposed the visual concordance test (VCT) as a novel methodology to establish expert consensus within a surgical field. In this process, surgeons make annotations on frames extracted from a surgical video, while watching the video itself for reference. These annotations can be obtained from a panel of experienced surgeons and compiled to create a "heat map" that demonstrates the level of agreement amongst these surgeons (Figure 4). Despite the fact that most experts will not annotate the exact same set of pixels, the area of convergence of these annotations is considered to be an agreement amongst the expert panel, and these pixels are subsequently used to train the AI model (e.g., the area the panel agrees is the best location to dissect in that particular scene).

FIGURE 4 Screenshot of the Think Like a Surgeon platform demonstrating annotation of the recurrent laryngeal nerve by multiple experienced surgeons to generate a "heat map" of areas where the nerve is perceived to be. Courtesy of Amin Madani. Reproduced with permission.
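A pixel-wise agreement map of this kind can be computed by averaging the binary annotation masks contributed by each expert; the sketch below does this with NumPy and thresholds the result to obtain a consensus region. The mask shapes and the 50% threshold are illustrative assumptions, not the published VCT protocol.

```python
# Minimal sketch: combine several experts' binary annotation masks into a
# pixel-wise agreement "heat map" and derive a consensus region from it.
import numpy as np

height, width, n_experts = 480, 640, 5
rng = np.random.default_rng(42)

# Stand-in annotations: each expert marks a slightly different rectangle
masks = np.zeros((n_experts, height, width), dtype=float)
for i in range(n_experts):
    r, c = 200 + rng.integers(-20, 20), 300 + rng.integers(-20, 20)
    masks[i, r:r + 80, c:c + 120] = 1.0

heat_map = masks.mean(axis=0)          # fraction of experts marking each pixel
consensus = heat_map >= 0.5            # pixels marked by at least half the panel

print("max agreement:", heat_map.max())
print("consensus pixels:", int(consensus.sum()))
```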
One application that leverages the lessons learned from the mapping of experts' mental models and surgical decision-making using VCT is the development of GoNoGoNet and CholeNet.25 These models were developed to automatically detect and outline safe areas of dissection ("Go zone"), dangerous areas of dissection ("No-Go zone"), and other anatomical structures during laparoscopic cholecystectomy. In this study, a data set of 290 laparoscopic cholecystectomy videos drawn from 136 institutions in 37 countries was used to train these models with over 90% pixel accuracy and good spatial overlap compared to ground truths (Figure 5A,B). Real-time overlay of Go and No-Go zones could provide feedback and guidance to surgeons who wish to learn new operations, seek to improve their performance, or find themselves in particularly difficult operations.
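Outlining Go and No-Go zones is a semantic segmentation problem: every pixel of a frame is assigned a class. The sketch below runs an off-the-shelf torchvision segmentation network and takes a per-pixel argmax to produce such a class map; the three-class zone definition and the untrained weights are illustrative assumptions and not the GoNoGoNet model itself.

```python
# Minimal semantic segmentation sketch: assign one of three illustrative
# classes (background, "Go" zone, "No-Go" zone) to every pixel of a frame.
# The network here is untrained and stands in for a purpose-built model.
import torch
from torchvision.models.segmentation import fcn_resnet50

NUM_CLASSES = 3  # 0 = background, 1 = Go zone, 2 = No-Go zone (assumed labels)
model = fcn_resnet50(weights=None, num_classes=NUM_CLASSES).eval()

frame = torch.randn(1, 3, 256, 256)          # one dummy laparoscopic frame
with torch.no_grad():
    logits = model(frame)["out"]             # shape: [1, NUM_CLASSES, 256, 256]
zone_map = logits.argmax(dim=1)              # per-pixel predicted class

print(zone_map.shape)                        # torch.Size([1, 256, 256])
print("No-Go pixels:", int((zone_map == 2).sum()))
```

In a real-time overlay, a class map of this kind would be resized to the video resolution and blended over the live image so the surgeon sees the predicted zones in context.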
In light of the encouraging results from GoNoGoNet, the applications for surgical oncology can be potentially transformational. Performing an adequate oncologic resection while minimizing perioperative morbidity is the cornerstone of most cancer operations. For example, deviation from the ideal dissection plane can either lead to an oncologically inadequate resection or an increased risk of complications due to injury to surrounding structures. Several groups are currently working on developing models to provide real-time guidance on the ideal dissection plane during cancer operations to improve early perioperative outcomes and long-term oncologic outcomes.

Surgical decision-making is not always a cognitive behavior that relates to a specific location in the surgical field (e.g., where to dissect). Often it occurs at a higher level in relation to the tactical approach of the operation. For this reason, AI-based automated scene recognition and assessment could also be leveraged to assist in critical decision points of procedures, especially when significant operator variability exists. For instance, during laparoscopic cholecystectomy, it is not only important to keep the dissection in a safe plane that minimizes the risk of a major bile duct injury, but it is also important not to divide any cystic structures until a critical view of safety (CVS) has been achieved. It is also important to consider various bailout procedures, such as a subtotal cholecystectomy, if a hostile environment is encountered and a CVS cannot be safely achieved.26,27 Given that the determination of a CVS has been found to be highly operator dependent,28,29 a model that provides decision support to surgeons in real time as to the optimal strategy could be highly advantageous. Mascagni et al. recently published their results on DeepCVS, a two-stage model to segment (i.e., delineate an object along its boundaries) hepatocystic anatomy and predict whether or not each of the three elements of the CVS has been achieved (Figure 6).30 The model had a mean average precision greater than 70%, suggesting that such a technology could potentially augment intraoperative judgment in difficult situations.

FIGURE 5 (A) Image demonstrating a "Go" zone or safe area of dissection identified by GoNoGoNet and (B) a "No-Go" zone or unsafe area of dissection. Courtesy of Amin Madani. Reproduced with permission.

1.3 | Automated phase and instrument recognition

Today, instead of using the rich information from intraoperative events, most of a surgery is distilled into a one-page operative report. These operative reports fail to detail almost a third of intraoperative complications.31 They also fail to capture how the operation proceeded, as in, how well the surgeon performed the operation. We know intraoperative performance matters: across a group of bariatric surgeons, those in the top quartile of surgical skill had lower rates of reoperation, readmission, ED visits, surgical complications, and medical complications.32 This has been replicated in colorectal surgery for transanal total mesorectal excision, with Curtis et al. demonstrating that surgeons in the upper quartile of technical skill had better outcomes as measured by integrity of the mesorectal dissection plane and morbidity.33 Although surgical coaching programs are growing in popularity and could provide the means through which surgical performance could be improved (especially in practicing surgeons), video review remains a tedious exercise.34-36 SDS could assist in the analysis of surgical skills through automated methods to segment, annotate, and assess operative video.
FIGURE 6 Example of DeepCVS semantic segmentation of the gallbladder (yellow), the cystic duct (cyan), the cystic artery (blue), the dissected hepatocystic triangle (orange), the exposed cystic plate (red), and surgical instrument (light green) to improve the performance and interpretability of the automatic assessment of critical view of safety criteria (C1: two tubular structures connected to the gallbladder; C2: a well dissected hepatocystic triangle; C3: the lower part of the gallbladder is dissected to expose the cystic plate). Courtesy of CAMMA. Reproduced with permission.
As anecdotally known by surgical educators and formalized in the Zwisch model for teaching and assessment in the OR, understanding surgical instrument usage and the sequence of steps needed to successfully complete a surgical procedure is an early stage of surgical training.37 Subsequently, awareness and anticipation of procedures' workflows are gained before trainees ultimately participate actively in surgical interventions.38

Inspired by this intuition in procedural skill development, the SDS community has first focused on developing computer vision systems for automatic phase recognition and instrument detection, fundamental elements of surgical workflows, with the aim to then provide context-aware assistance in the OR.39,40 Starting from fairly standardized procedures such as cholecystectomy and hysterectomy, early algorithms made use of hand-crafted signals such as surgical instrument usage and time dependencies to model and visualize surgical workflow using classical machine learning techniques such as dynamic time warping and hidden Markov models.41,42 More recently, breakthroughs in deep neural networks have boosted computer vision performance and revived the field of surgical workflow analysis.43

In 2016, Twinanda et al. trained a convolutional neural network on 80 publicly released laparoscopic cholecystectomy videos (Cholec80) to detect surgical instruments and recognize seven phases of the procedure in a multitasking manner.44 The resulting model, EndoNet, demonstrated 92% accuracy in the classification of surgical phases when used for post hoc analysis, and 86% accuracy when used for real-time inference.44 Deep learning models have since been trained to accurately recognize phases across a range of procedures, including bariatric,45 colorectal,46 ophthalmic,47 and intraluminal interventions.48
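At its simplest, frame-level phase recognition is image classification over the phases of a procedure; the sketch below attaches a seven-class head to a standard image backbone. The phase list and the untrained backbone are illustrative assumptions; published models such as EndoNet add instrument-detection outputs and temporal modeling on top of this basic idea.

```python
# Minimal frame-level phase recognition sketch: a standard image backbone with
# a seven-class head, one class per (assumed) cholecystectomy phase.
import torch
from torch import nn
from torchvision import models

PHASES = [
    "preparation", "calot_triangle_dissection", "clipping_and_cutting",
    "gallbladder_dissection", "gallbladder_packaging",
    "cleaning_and_coagulation", "gallbladder_retraction",
]

backbone = models.resnet18(weights=None)               # untrained, for illustration
backbone.fc = nn.Linear(backbone.fc.in_features, len(PHASES))

frame = torch.randn(1, 3, 224, 224)                    # one dummy video frame
with torch.no_grad():
    phase_idx = backbone(frame).argmax(dim=1).item()
print("predicted phase:", PHASES[phase_idx])
```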
These models were trained using supervised learning with annotations of video that contain the phase or the surgical instrument seen in a given video frame. The time-consuming and tedious process of manually labeling images has so far limited the informativeness and size of annotated surgical datasets, hampering the development of AI models capable of performing more demanding surgical tasks and limiting the generalization of such models across centers and procedures. The SDS community has been working to overcome this limitation by devising models that learn using less (i.e., weakly supervised) or no (i.e., self-supervised) manual annotations. For instance, Yu et al. used a small data set of annotated videos to train a "teacher" model to automatically label a larger set of videos for training a lighter "student" model capable of real-time inference, and state-of-the-art models for surgical instrument detection, localization, and tracking have been trained using images annotated only with binary instrument presence information (i.e., whether a given tool is present or not in an image).49-51

Automated workflow analysis could serve surgical education in a multitude of ways, most of which have yet to be explored. Potential pedagogic uses of phase recognition and instrument detection models can be categorized based on the timing of the analysis: either post hoc for feedback or real-time for guidance and decision support.

Post hoc analysis of surgical workflows has been proposed to facilitate video-based assessment (VBA). This valuable approach for the evaluation of performance and quality improvement is currently limited by the time-consuming process of collecting, manually reviewing, and editing long surgical videos.36 Automated workflow inference could be used to make VBA more efficient, standardized, and scalable. Phase and instrument usage information could be used to synchronize videos of the same surgical procedure, allowing for smart indexing and efficient querying of large databases of surgical videos. As recently shown by a Japanese group analyzing gastrectomies,52 surgical instrument usage patterns could be plotted to efficiently screen for cases and scenes likely to show unexpected events so as to prioritize their VBA. The Surgical AI and Innovation Laboratory at Massachusetts General Hospital has described the concept of the "surgical fingerprint," wherein phase recognition
algorithms can assess the video of interest's workflow against that of a pre-existing database to determine whether the video is following an expected operative course. Time points with deviations from an expected operative course could signal areas where complications or unexpected events might occur.45,48
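Workflow information of this kind is easy to derive once a model emits a phase label per frame: consecutive identical labels can be collapsed into a timeline of phases with start times and durations, which is what enables indexing, synchronization, and comparison against an expected course. The sketch below does this for a made-up sequence of frame predictions; the labels and frame rate are illustrative assumptions.

```python
# Minimal sketch: collapse per-frame phase predictions into a timeline of
# (phase, start_seconds, duration_seconds) segments for indexing and review.
from itertools import groupby

FPS = 1  # assume one prediction per second for illustration

frame_predictions = (
    ["preparation"] * 40
    + ["calot_triangle_dissection"] * 300
    + ["clipping_and_cutting"] * 90
    + ["gallbladder_dissection"] * 250
)

timeline = []
start = 0
for phase, frames in groupby(frame_predictions):
    n = len(list(frames))
    timeline.append((phase, start / FPS, n / FPS))
    start += n

for phase, start_s, duration_s in timeline:
    print(f"{phase:28s} starts at {start_s:6.0f} s, lasts {duration_s:5.0f} s")
```

Durations and phase order extracted this way are also the raw material for the workflow-based performance metrics discussed below.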
To demonstrate the potential value of such workflow information in rapidly analyzing operative video, a computer vision platform called EndoDigest used the predictions of phase recognition and instrument detection models to accurately detect the time of the cystic duct division in laparoscopic cholecystectomy videos and automatically provided short videos effectively documenting the critical view of safety in 91% of cases (Figure 7).53 These and other similarly "digested" surgical videos could be used for auditing performance and/or efficiently rehearsing demanding procedural steps.

Finally, workflow elements such as the sequence and duration of phases and instrument usage patterns could be used to compute metrics reflective of surgical technical skills. This concept is already being applied to automate assessment during flexible endoscopy simulation, where the time taken to complete a simulated task is utilized as a metric of performance.54 Pending correlation with clinical outcomes and other validity evidence, the same concepts of extracting workflow data could be applied to assess surgeons' performance in the OR. It is not hard to imagine a future in which such quantitative and deterministic metrics of technical skills could contribute to credentialing and privileging surgeons.

Real-time analysis of surgical workflows could greatly facilitate monitoring of intraoperative events to provide context-aware, case-specific, and timely feedback. AI models for phase recognition and instrument detection could represent the "brain" of surgical control towers, rooms from which proctors and OR managers could oversee OR activities and intervene to prevent surgical errors and inefficiencies.55 Phase recognition and instrument detection models could continuously analyze operative videos in real time and alert the surgical control tower when a critical step is about to be performed or when unexpected, risky deviations from normal workflows are detected. For example, a surgeon may begin to operate within a "no go" zone that raises the risk of an inadvertent injury. Proctors could respond by scrubbing into the case or providing assistance directly from the surgical control tower through telementoring and telestration. Alternatively, workflow analysis models could be used to provide direct feedback to surgeons with context-aware notifications. Such notifications could remind surgeons to implement procedure-specific best practices at the right time during interventions, potentially contributing to the overall safety of surgery. For example, notifications could remind surgeons to administer fluorescent dye, check their margins, or ensure identification of critical structures before dividing tissues. Finally, such models could compute in real time the same workflow-based performance metrics discussed above so as to automatically provide formative feedback and coaching during procedures.
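As a toy illustration of such context-aware alerting, the sketch below combines a predicted phase, a predicted zone map summary, and an instrument-presence flag into a simple rule that raises a notification. The rule, thresholds, and label names are invented for illustration and are not a validated decision-support policy.

```python
# Minimal sketch of a context-aware alerting rule built on top of model outputs.
# All labels, thresholds, and the rule itself are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class FrameAnalysis:
    phase: str                 # output of a phase recognition model
    no_go_fraction: float      # fraction of pixels predicted as "No-Go" zone
    instrument_active: bool    # output of an instrument detection model

def alerts_for(frame: FrameAnalysis) -> list:
    alerts = []
    if frame.instrument_active and frame.no_go_fraction > 0.10:
        alerts.append("Dissection near predicted No-Go zone")
    if frame.phase == "clipping_and_cutting":
        alerts.append("Confirm critical view of safety before dividing structures")
    return alerts

print(alerts_for(FrameAnalysis("calot_triangle_dissection", 0.15, True)))
print(alerts_for(FrameAnalysis("clipping_and_cutting", 0.02, True)))
```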
2 | LIMITATIONS AND FUTURE DIRECTIONS

Although the advances noted above are exciting and carry the promise of transforming the delivery of surgical care, it is worth noting that there are several important considerations to take into account to avoid major pitfalls when applying computer vision to surgical procedures. AI is just like any other surgical innovation, and unless there is a very clear unmet need or gap in clinical practice that this technology can address, its value will be minimal and widespread adoption unlikely. DeepCVS and GoNoGoNet were specifically designed and developed to address specific cognitive behaviors which data suggest are major root causes of bile duct injuries.23,26,27 Future models need to be similarly grounded in such data. If the aim is to develop AI models that are able to provide real-time data to surgeons based on what expert surgeons would do, it is critical to first understand how experts think and the mental processes that lead to their elite performance. Qualitative data and cognitive task analyses are powerful methods to delineate expert mental models, and by gaining a better understanding of these cognitive behaviors, AI algorithms can be trained to replicate these behaviors and ultimately be deployed for real-time decision support.56,57

Surgical decision-making has traditionally been presented as a linear process, where one operative step necessarily follows another and few branches in decision-making exist. For example, dissection of Calot's triangle is followed by isolation of the cystic duct or artery. More recent approaches to investigating surgical decision-making have focused on attempting to map decision-making in a manner that is more reflective of the process utilized by surgeons during the course of an operation. That is, decision-making is less likely to be linear and more likely to be composed of steps arranged in an interconnected fashion with multiple decision points that are affected by a combination of patient, surgeon, and environmental factors.58

An understanding of surgical decision-making plays a major role in the development of AI applications, particularly as it relates to surgical education. Decision maps that contain the possible steps of an operation can serve as the ground truth by which training data for AI is annotated. This enables applications such as automated identification of operative phases and feedback on case progression. Furthermore, an understanding of how experienced surgeons make decisions based on visuospatial data can provide clues to training algorithms to detect critical structures, operative planes, and other important visual information. Thus, surgeons can engage with researchers to better understand decision-making across a range of procedures and case complexity.

Furthermore, while it is true that models like DeepCVS or GoNoGoNet produce consistent output for any given input, they rely on human annotation to learn, so their value is strictly dependent on the quality and reliability of annotations. To improve the quality of annotation, these should be performed by multiple trained reviewers using validated protocols.59 This is likely to reduce human subjectivity and bias, especially compared to operators self-assessing during potentially stressful procedures. To further reduce operator dependencies, deep learning models should be trained on annotations from many surgeons from multiple institutions so as to learn to approximate the most commonly held interpretation for any given data point.
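One practical way to monitor annotation reliability across multiple trained reviewers is to quantify chance-corrected agreement on their labels; the sketch below computes Cohen's kappa for two reviewers' binary frame-level labels with scikit-learn. The labels themselves are invented for illustration.

```python
# Minimal sketch: chance-corrected agreement between two annotators' binary
# frame-level labels (e.g., "criterion achieved" vs "not achieved").
from sklearn.metrics import cohen_kappa_score

reviewer_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
reviewer_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```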
The notion of bias in the annotation of data also raises concerns about bias in the video data itself. Data is the life force for machine learning. With limited data, algorithms need "hints" in the form of labels to make decisions. The "cognitive boost" of labels has meant that the majority of ML/AI advances in medicine rely on supervised learning. This label reliance, though, compounds a fear of ML in general: bias. Biased or incorrect training data makes ML models (both supervised and unsupervised) output bad decisions. We currently have a limited amount of training data, so our models already have a high level of bias. In supervised learning, we compound this by having the computers learn biased human labels to categorize the already biased data. Recognizing the potential for bias is a key element in appropriately performing and interpreting ML studies, especially in medicine. Thus, while a data set may contain thousands of videos of a particular case, videos sourced from a handful of institutions or surgeons may be biased in the patient population, technique, equipment, or other factors and thus limit the generalizability of those models. Increased participation from additional centers, surgeons, and patients will be critical to ensure the success of advances in SDS.
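One way to expose this kind of dataset bias during development is to evaluate models with institution-held-out splits rather than random splits, so that every test fold contains only centers the model has never seen. The sketch below does this with scikit-learn's GroupKFold on made-up video-level data; the features, labels, and institution assignments are illustrative only.

```python
# Minimal sketch: institution-held-out cross-validation to probe how well a
# model generalizes to centers absent from its training data. Data are toy values.
import numpy as np
from sklearn.model_selection import GroupKFold

n_videos = 12
X = np.random.rand(n_videos, 4)                    # toy per-video features
y = np.random.randint(0, 2, size=n_videos)         # toy labels
institutions = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])  # source center per video

for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=institutions):
    held_out = sorted(set(institutions[test_idx]))
    print(f"train on {len(train_idx)} videos, test on institution(s) {held_out}")
```

A large gap between random-split and institution-held-out performance is a warning sign that a model has learned center-specific cues rather than generalizable surgical content.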
Finally, a cultural shift will be needed in surgery and is already partially underway. A culture of quality improvement has led to an increased appetite for systematically collecting, storing, and analyzing surgical data, and this culture must now shift to include intraoperative video data as well. Surgeons must be willing to collect video data on cases they perform and should approach patients prospectively to obtain their consent to utilize such data in surgical registries. Although big data provides the means through which we are able to infer phenomena from populations, it is critical to remember that the data are collected from individuals. Patients are the beneficiaries of surgical care, and it is important to maintain the patient at the center of care, whether in an operation, in an outpatient visit, or in the laboratory where SDS tools are developed. As with more traditional clinical trials conducted in oncology, early discussion with patients on the value of their surgical data is important so that patients can consider whether to contribute their operative video and other data to SDS efforts. Regulatory considerations must be taken into account and vary across countries (e.g., the Health Insurance Portability and Accountability Act in the United States or the General Data Protection Regulation in Europe).

3 | CONCLUSION

SDS and AI carry the potential to transform surgical training and the delivery of surgical care. Within the field of surgical education, computer vision has perhaps held the majority of the surgical attention given its easily understandable applications such as automated indexing of cases and identification of anatomy and other spatial characteristics in surgical video. Additional advances in the field will require the participation of a diverse array of surgeons, patients, and researchers to ensure that the applications of such technology are clinically meaningful.
ACKNOWLEDGMENTS
Thomas M. Ward and Daniel A. Hashimoto (DAH) receive research support from Olympus Corporation for work outside of this manuscript. DAH is a consultant for Johnson & Johnson Institute and Verily Life Sciences and has received research support from the Intuitive Foundation for work outside of this manuscript. Nicolas Padoy is a consultant for Caresyntax and has received research support from Intuitive Surgical for work outside of this manuscript.

ORCID
Amin Madani https://orcid.org/0000-0003-0901-9851
Silvana Perretta https://orcid.org/0000-0002-5354-535X
Daniel A. Hashimoto http://orcid.org/0000-0003-4725-3104

REFERENCES
1. Maier-Hein L, Vedula SS, Speidel S, et al. Surgical data science for next-generation interventions. Nat Biomed Eng. 2017;1:691-696.
2. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. 2020;368:m689.
3. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44-56.
4. Hashimoto DA, Rosman G, Rus D, Meireles OR. Artificial intelligence in surgery: promises and perils. Ann Surg. 2018;268:70-76.
5. Navarrete-Welton AJ, Hashimoto DA. Current applications of artificial intelligence for intraoperative decision support in surgery. Front Med. 2020;14:369-381.
6. Goertzel B, Pennachin C. Artificial General Intelligence. Berlin, Heidelberg: Springer Science & Business Media; 2007.
7. Hashimoto DA. Artificial Intelligence in Surgery: An AI Primer for Surgical Practice. McGraw-Hill Education/Medical; 2020.
8. Crevier D. AI: The Tumultuous History of the Search for Artificial Intelligence. New York: BasicBooks; 1993.
9. Sagiroglu S, Sinanc D. Big data: a review. International Conference on Collaboration Technologies and Systems. 2013;2013:42-47.
10. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436-444.
11. Mittal S, Vaishay S. A survey of techniques for optimizing deep learning on GPUs. Int J High Perform Syst Archit. 2019;99:99.
12. Russell SJ, Norvig P, Davis E. Artificial Intelligence: A Modern Approach. 3rd ed. Upper Saddle River: Prentice Hall; 2010.
13. Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of Go without human knowledge. Nature. 2017;550:354-359.
14. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2010;22:1345-1359.
15. Kermany DS, Goldbaum M, Cai W, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172:1122-31.e9.
16. Otter DW, Medina JR, Kalita JK. A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Networks Learn Syst. 2020:1-21.
17. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in Neural Information Processing Systems. Vol 25. New York: Curran Associates, Inc.; 2012:1097-1105.
18. Brennan TA, Leape LL, Laird NM, et al. Incidence of adverse events and negligence in hospitalized patients. N Engl J Med. 1991;324:370-376. https://doi.org/10.1056/nejm199102073240604
19. Baker GR. The Canadian Adverse Events Study: the incidence of adverse events among hospital patients in Canada. Can Med Assoc J. 2004;170:1678-1686. https://doi.org/10.1503/cmaj.1040498
20. Forster AJ, Asmis TR, Clark HD, Al Saied G, Code CC, Caughey SC, et al. Ottawa Hospital Patient Safety Study: incidence and timing of adverse events in patients admitted to a Canadian teaching hospital. CMAJ. 2004;170:1235-1240.
21. Gawande AA, Thomas EJ, Zinner MJ, Brennan TA. The incidence and nature of surgical adverse events in Colorado and Utah in 1992. Surgery. 1999;126:66-75.
22. Madani A, Vassiliou MC, Watanabe Y, et al. What are the principles that guide behaviors in the operating room? Creating a framework to define and measure performance. Ann Surg. 2017;265:255-267.
23. Way LW, Stewart L, Gantert W, et al. Causes and prevention of laparoscopic bile duct injuries: analysis of 252 cases from a human factors and cognitive psychology perspective. Ann Surg. 2003;237:460-469.
24. Madani A, Grover K, Watanabe Y. Measuring and teaching intraoperative decision-making using the visual concordance test: deliberate practice of advanced cognitive skills. JAMA Surg. 2019;155:78. https://doi.org/10.1001/jamasurg.2019.4415
25. Madani A, Namazi B, Altieri MS, et al. Artificial intelligence for intraoperative guidance: using semantic segmentation to identify surgical anatomy during laparoscopic cholecystectomy [published online ahead of print November 13, 2020]. Ann Surg. https://doi.org/10.1097/SLA.0000000000004594
26. Brunt LM, Deziel DJ, Telem DA, et al. Safe cholecystectomy multi-society practice guideline and state of the art consensus conference on prevention of bile duct injury during cholecystectomy. Ann Surg. 2020;272:3-23.
27. Madani A, Watanabe Y, Feldman LS, et al. Expert intraoperative judgment and decision-making: defining the cognitive competencies for safe laparoscopic cholecystectomy. J Am Coll Surg. 2015;221:931-40.e8.
28. Nijssen MAJ, Schreinemakers JMJ, Meyer Z, Van der Schelling GP, Crolla RMPH, Rijken AM. Complications after laparoscopic cholecystectomy: a video evaluation study of whether the critical view of safety was reached. World J Surg. 2015;39:1798-1803.
29. Stefanidis D, Chintalapudi N, Anderson-Montoya B, Oommen B, Tobben D, Pimentel M. How often do surgeons obtain the critical view of safety during laparoscopic cholecystectomy? Surg Endosc. 2017;31:142-146.
30. Mascagni P, Vardazaryan A, Alapatt D, et al. Artificial intelligence for surgical safety: automatic assessment of the critical view of safety in laparoscopic cholecystectomy using deep learning [published online ahead of print November 16, 2020]. Ann Surg. https://doi.org/10.1097/SLA.0000000000004351
31. Wauben LSGL, Van Grevenstein WMU, Goossens RHM, Meulen FHV, Lange JF. Operative notes do not reflect reality in laparoscopic cholecystectomy. BJS. 2011;98:1431-1436.
32. Birkmeyer JD, Finks JF, O'Reilly A, et al. Surgical skill and complication rates after bariatric surgery. N Engl J Med. 2013;369:1434-1442.
33. Curtis NJ, Foster JD, Miskovic D, et al. Association of surgical skill assessment with clinical outcomes in cancer surgery. JAMA Surg. 2020;155:590-598.
34. Greenberg CC, Byrnes ME, Engler TA, Quamme SP, Thumma JR, Dimick JB. Association of a statewide surgical coaching program with clinical outcomes and surgeon perceptions [published online ahead of print February 10, 2021]. Ann Surg. https://doi.org/10.1097/SLA.0000000000004800
35. Vande Walle KA, Quamme SRP, Beasley HL, et al. Development and assessment of the Wisconsin Surgical Coaching Rubric. JAMA Surg. 2020;155:486-492.
36. Pugh CM, Hashimoto DA, Korndorffer JR Jr. The what? How? And who? Of video-based assessment. Am J Surg. 2021;221:13-18.
37. DaRosa DA, Zwischenberger JB, Meyerson SL, et al. A theory-based model for teaching and assessing residents in the operating room. J Surg Educ. 2013;70:24-30.
38. Graafland M, Schraagen JMC, Boermeester MA, Bemelman WA, Schijven MP. Training situational awareness to reduce surgical errors in the operating room. Br J Surg. 2015;102:16-23.
39. Padoy N. Machine and deep learning for workflow recognition during surgery. Minim Invasive Ther Allied Technol. 2019;28:82-90.
40. Vercauteren T, Unberath M, Padoy N, Navab N. CAI4CAI: the rise of contextual artificial intelligence in computer assisted interventions. Proc IEEE Inst Electr Electron Eng. 2020;108:198-214.
41. Meeuwsen FC, Van Luyn F, Blikkendaal MD, Jansen FW, Van den Dobbelsteen JJ. Surgical phase modelling in minimal invasive surgery. Surg Endosc. 2019;33:1426-1432.
42. Padoy N, Blum T, Ahmadi S-A, Feussner H, Berger M-O, Navab N. Statistical modeling and recognition of surgical workflow. Med Image Anal. 2012;16:632-641.
43. Garrow CR, Kowalewski K-F, Li L, et al. Machine learning for surgical phase recognition: a systematic review. Ann Surg. 2020;273:684-693. https://doi.org/10.1097/SLA.0000000000004425
44. Twinanda AP, Shehata S, Mutter D, Marescaux J, de Mathelin M, Padoy N. EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans Med Imaging. 2017;36:86-97.
45. Hashimoto DA, Rosman G, Witkowski ER, et al. Computer vision analysis of intraoperative video: automated recognition of operative steps in laparoscopic sleeve gastrectomy. Ann Surg. 2019;270:414-421.
46. Kitaguchi D, Takeshita N, Matsuzaki H, et al. Automated laparoscopic colorectal surgery workflow recognition using artificial intelligence: experimental research. Int J Surg. 2020;79:88-94.
47. Lalys F, Bouget D, Riffaud L, Jannin P. Automatic knowledge-based recognition of low-level tasks in ophthalmological procedures. Int J Comput Assist Radiol Surg. 2013;8:39-49.
48. Ward TM, Hashimoto DA, Ban Y, et al. Automated operative phase identification in peroral endoscopic myotomy [published online ahead of print July 27, 2020]. Surg Endosc. https://doi.org/10.1007/s00464-020-07833-9
49. Nwoye CI, Mutter D, Marescaux J, Padoy N. Weakly supervised convolutional LSTM approach for tool tracking in laparoscopic videos. Int J Comput Assist Radiol Surg. 2019;14:1059-1067.
50. Yu T, Mutter D, Marescaux J, Padoy N. Learning from a tiny dataset of manual annotations: a teacher/student approach for surgical phase recognition. 2018. http://arxiv.org/abs/1812.00033. Accessed February 25, 2021.
51. Vardazaryan A, Mutter D, Marescaux J, Padoy N. Weakly-supervised learning for tool localization in laparoscopic videos. Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Cham: Springer; 2018:169-179.
52. Yamazaki Y, Kanaji S, Matsuda T, et al. Automated surgical instrument detection from laparoscopic gastrectomy video images using an open source convolutional neural network platform. J Am Coll Surg. 2020;230:725-32.e1.
53. Mascagni P, Alapatt D, Urade T, et al. A computer vision platform to automatically locate critical events in surgical videos: documenting safety in laparoscopic cholecystectomy. Ann Surg. 2021. https://doi.org/10.1097/SLA.0000000000004736
54. Bencteux V, Saibro G, Shlomovitz E, et al. Automatic task recognition in a flexible endoscopy benchtop trainer with semi-supervised learning. Int J Comput Assist Radiol Surg. 2020;15:1585-1595.
55. Mascagni P, Padoy N. OR black box and surgical control tower: recording and streaming data and analytics to improve surgical care [published online ahead of print March 9, 2021]. J Visc Surg. https://doi.org/10.1016/j.jviscsurg.2021.01.004
56. Madani A, Watanabe Y, Vassiliou M, et al. Defining competencies for safe thyroidectomy: an international Delphi consensus. Surgery. 2016;159(86–94):96-101.
57. Madani A, Grover K, Kuo JH, et al. Defining the competencies for laparoscopic transabdominal adrenalectomy: an investigation of intraoperative behaviors and decisions of experts. Surgery. 2020;167:241-249.
58. Hashimoto DA, Axelsson CG, Jones CB, et al. Surgical procedural map scoring for decision-making in laparoscopic cholecystectomy. Am J Surg. 2019;217:356-361.
59. Mascagni P, Fiorillo C, Urade T, et al. Formalizing video documentation of the critical view of safety in laparoscopic cholecystectomy: a step towards artificial intelligence assistance to improve surgical safety. Surg Endosc. 2020;34:2709-2714.

How to cite this article: Ward TM, Mascagni P, Madani A, Padoy N, Perretta S, Hashimoto DA. Surgical data science and artificial intelligence for surgical education. J Surg Oncol. 2021;124:221–230. https://doi.org/10.1002/jso.26496