Chapter 4.2.4

INTRODUCTION

Surgical workflow analysis is an important component for standardizing the timeline of a procedure. It is useful for quantifying surgical skills196 and training progression197,198, and can also provide contextual support for further computer analysis, both offline for auditing and online for surgeon assistance and automation.199-201 In the context of laparoscopy, where the main input is video, current approaches to automated workflow analysis focus on frame-level multi-label classification. The majority of state-of-the-art models can be decomposed into two components: a feature extractor and a feature classifier. The feature extractor is normally a Convolutional Neural Network backbone that converts images or batches of images (clips) into feature vectors. Most of the features extracted at this stage are spatial features or fine-grained temporal features, depending on the type of input. Since long-term information in surgical video sequences aids the classification process, the subsequent feature classifier predicts phases based on a temporally ordered sequence of extracted features. Following developments in natural language processing and computer vision, the architecture behind this feature classifier has evolved from Long Short-Term Memory networks (EndoNet, SVRCNet)202,203 and Temporal Convolutional Networks (TeCNO)204 to Transformers (OperA, TransSV)205,206. Although these techniques have improved over the years, the underlying problem formulation remains unchanged: phase labels are assigned to individual units of frames or clips.
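The two-stage decomposition described above can be illustrated with a minimal numpy sketch. This is not any of the cited models: random projection matrices stand in for a trained CNN backbone and temporal classifier, and a simple mean over a context window stands in for an LSTM/TCN/Transformer. The shapes (T frames, D-dimensional features, 7 phases as in Cholec80) are the only part meant to reflect the real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(frames, proj):
    """Stand-in for a CNN backbone: maps each frame to a feature vector."""
    flat = frames.reshape(frames.shape[0], -1)   # (T, H*W)
    return np.tanh(flat @ proj)                  # (T, D)

def classify_sequence(features, weights, window=5):
    """Stand-in for a temporal feature classifier (LSTM/TCN/Transformer):
    each frame's phase prediction depends on a temporal context window."""
    T, _ = features.shape
    logits = np.empty((T, weights.shape[1]))
    for t in range(T):
        ctx = features[max(0, t - window + 1): t + 1].mean(axis=0)
        logits[t] = ctx @ weights
    return logits.argmax(axis=1)                 # per-frame phase labels

T, H, W, D, n_phases = 100, 8, 8, 16, 7          # 7 phases as in Cholec80
frames = rng.random((T, H, W))                   # toy grayscale video
proj = rng.standard_normal((H * W, D))           # "backbone" weights
weights = rng.standard_normal((D, n_phases))     # "classifier" weights
phases = classify_sequence(extract_features(frames, proj), weights)
print(phases.shape)                              # one label per frame
```

The key point is the interface: the classifier never sees pixels, only the temporally ordered feature sequence, which is what allows long-range temporal models to be swapped in without retraining the backbone.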
These conventional models now achieve excellent performance on the popular Cholec80 benchmark, as measured by frame-based evaluation metrics (accuracy, precision, recall, F1-score).203 However, small but frequent errors still occur throughout the classification of long videos, producing a high number of erroneous phase transitions that make it very challenging to pinpoint exactly where one phase ends and another begins. To address this problem, we propose a novel methodology for surgical workflow segmentation. Rather than classifying individual frames in sequential order, we attempt to locate the phase transitions directly. We employ reinforcement learning to solve this problem, since it has shown good capability in similar retrieval tasks.207,208 Our contributions can be summarized as follows:

– We propose a novel formulation for surgical workflow segmentation based on phase transition retrieval. This strictly enforces that surgical phases are continuous temporal intervals and makes the segmentation immune to frame-level noise.

– We propose the Transition Retrieval Network (TRN), which actively searches for phase transitions using multi-agent reinforcement learning. We describe a range of TRN configurations that provide different trade-offs between accuracy and the amount of video processed.

– We validate our method both on the public Cholec80 benchmark and on an in-house dataset of laparoscopic sacrocolpopexy, where we demonstrate a single-phase detection application.
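The motivation for transition retrieval can be made concrete with a short, self-contained example (pure Python, not part of the proposed method): when per-frame labels are collapsed into segments, a single misclassified frame can multiply the number of apparent phase transitions, even though frame-level accuracy drops only marginally.

```python
def phase_segments(labels):
    """Collapse a per-frame label sequence into (phase, start, end) segments."""
    segs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segs.append((labels[start], start, t - 1))
            start = t
    return segs

clean = [0] * 10 + [1] * 10        # ground truth: one transition at frame 10
noisy = clean.copy()
noisy[5] = 1                       # a single frame-level error (95% accuracy)

print(len(phase_segments(clean)) - 1)  # 1 transition
print(len(phase_segments(noisy)) - 1)  # 3 transitions
```

One wrongly labeled frame splits a phase into three segments, so transition counts and boundary locations degrade much faster than frame-based metrics suggest. Retrieving transitions directly, as proposed here, guarantees by construction that each phase is a single continuous interval.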
RkJQdWJsaXNoZXIy MTk4NDMw