Ann-Sophie Page

SCP workflow analysis

RESULTS & DISCUSSION

We first present an ablation of different configurations of our TRN model on Cholec80 in Table 1. It covers two search window sizes (21 and 41 clips) and two initializations (FI and RMI). Larger windows generally yield better F1-scores, and RMI outperforms FI in overall F1-score for both window sizes. In other words, the heavier configurations, which require more computation, achieve higher accuracy. The choice of a particular TRN configuration therefore depends on a trade-off between computational efficiency and frame-level accuracy.

Configuration  Phase 1  Phase 2  Phase 3  Phase 4  Phase 5  Phase 6  Phase 7  Overall F1-score
TRN21 FI       0.854    0.917    0.513    0.903    0.687    0.549    0.83     0.782
TRN41 FI       0.828    0.943    0.636    0.922    0.558    0.694    0.85     0.808
TRN21 RMI      0.852    0.942    0.619    0.939    0.727    0.747    0.837    0.830
TRN41 RMI      0.828    0.940    0.678    0.945    0.753    0.738    0.861    0.846

Table 1. TRN ablation on the Cholec80 dataset (F1-scores). Per-phase values are computed before Gaussian composition, whereas the overall F1-score refers to the complete TRN method.

Table 2 compares TRN with state-of-the-art frame-based methods on both Cholec80 and Sacrocolpopexy. The baselines, TeCNO [204] and Trans-SVNet [206], were implemented and trained by us. For consistency, instead of plain ResNet-50 features, the baselines use the same feature-averaging process as TRN. Also for consistency, we disabled the causal convolutions in the TCN (a flag provided in the authors' code), which allows both Trans-SVNet and the TCN to be trained in offline mode.
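To illustrate what disabling causal convolution changes, here is a minimal single-channel sketch of a dilated 1D convolution with either causal (left-only) or centered zero padding. It is a generic illustration, not the TeCNO or Trans-SVNet code; the function name `dilated_conv1d` is hypothetical, and the sketch assumes an odd kernel size so that centered padding is symmetric. With causal padding the output at frame t depends only on frames up to t, as an online setting requires; with centered padding it also sees future frames, which is only possible offline.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int,
                   causal: bool) -> List[float]:
    """Single-channel dilated 1D convolution with zero padding (hypothetical helper).

    Computes a cross-correlation (as deep-learning frameworks do) and keeps the
    output length equal to the input length. Assumes an odd kernel size so that
    the centered (non-causal) padding is symmetric.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    span = (len(kernel) - 1) * dilation   # frames covered by the kernel, minus one
    # Causal: pad only on the left, so output[t] sees frames <= t (online).
    # Non-causal: pad half on each side, so output[t] also sees future frames (offline).
    left = span if causal else span // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - left + i * dilation
            if 0 <= idx < len(x):         # positions outside the sequence count as zeros
                acc += w * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    signal = [0.0] * 10
    signal[5] = 1.0                       # a single "event" at frame 5
    ones = [1.0, 1.0, 1.0]
    causal = dilated_conv1d(signal, ones, dilation=1, causal=True)
    centered = dilated_conv1d(signal, ones, dilation=1, causal=False)
    print([t for t, v in enumerate(causal) if v])    # frames 5, 6, 7
    print([t for t, v in enumerate(centered) if v])  # frames 4, 5, 6
```

For an impulse at frame 5 and kernel [1, 1, 1], the causal output is nonzero at frames 5 to 7, whereas the centered output is nonzero at frames 4 to 6: frame 4 already reacts to a future frame, which is why the non-causal variant can only be trained and run offline.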
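Table 1 reports frame-level F1-scores per phase. As a reminder of how such a per-phase score is obtained, the following is a minimal one-vs-rest sketch, assuming phases are encoded as integers 0 to num_phases - 1 and that the score is computed over the frames of a single video with F1 = 2TP / (2TP + FP + FN). The function name `per_phase_f1` is hypothetical, returning 0.0 for a phase absent from both sequences is just one possible convention, and the sketch does not reproduce the Gaussian composition or the overall score of the complete TRN method.

```python
from typing import Dict, List


def per_phase_f1(y_true: List[int], y_pred: List[int],
                 num_phases: int) -> Dict[int, float]:
    """Frame-level F1-score of each phase, computed one-vs-rest (hypothetical helper).

    Phases are assumed to be encoded as integers 0..num_phases-1. A phase that
    appears in neither sequence gets 0.0 here (one possible convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores: Dict[int, float] = {}
    for p in range(num_phases):
        tp = sum(1 for t, q in zip(y_true, y_pred) if t == p and q == p)
        fp = sum(1 for t, q in zip(y_true, y_pred) if t != p and q == p)
        fn = sum(1 for t, q in zip(y_true, y_pred) if t == p and q != p)
        denom = 2 * tp + fp + fn          # F1 = 2TP / (2TP + FP + FN)
        scores[p] = 2 * tp / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 2]
    pred = [0, 0, 1, 1, 1, 1]
    print(per_phase_f1(truth, pred, num_phases=3))
```

In this toy example, phase 0 scores 0.8, phase 1 about 0.667 (two of its four predicted frames are false positives), and phase 2 scores 0.0 because its only frame is never predicted.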
Dataset         Method       Accuracy (%)  Precision (%)  Recall (%)  F1-score  Event ratio  Ward event ratio  Coverage rate (%)  Computation cost (s)
Cholec80        ResNet-50    79.7±7.5      73.5±8.4       78.5±8.9    0.756     0.120        0.375             full               96.6
                TeCNO        88.3±6.5      78.6±9.9       76.7±12.5   0.774     0.381        0.691             full               99.6
                Trans-SVNet  89.1±5.7      81.7±6.5       79.1±12.6   0.800     0.316        0.566             full               99.6
                TRN21 FI     85.3±9.6      78.1±11.1      78.9±13.5   0.782     1            0.934             57.6               60.6
                TRN41 FI     87.8±8.1      80.3±9.1       81.7±12.4   0.808     1            0.956             59.1               64.9
                TRN41 RMI    90.1±5.7      84.5±5.9       85.1±8.2    0.846     1            0.985             full               105.5
Sacrocolpopexy  ResNet-50    92.5±3.8      94.9±2.8       84.5±8.4    0.892     0.029        0.016             full               493.7
                TeCNO        98.1±1.7      97.7±1.9       97.5±3.0    0.976     0.136        0.438             full               493.8
                Trans-SVNet  97.8±2.2      96.5±4.5       98.0±3.5    0.971     0.536        0.813             full               493.9
                TRN21 FI     89.8±6.2      88.6±11.7      85.3±11.1   0.860     0.971        0.875             14.6               78.1
                TRN81 FI     90.7±6.1      88.6±11.5      88.5±11.1   0.875     0.941        0.860             18.3               104.0

Table 2. Summary of evaluation metrics for ResNet-50, our implementations of TeCNO and Trans-SVNet, and selected TRN configurations from the ablation, on Cholec80 and Sacrocolpopexy. Computation cost is the average time in seconds to process a single video.

For Cholec80, our full-coverage model (TRN41 RMI) surpasses the best baseline (Trans-SVNet) in all frame-based metrics, and it achieves substantially better event-based metrics (event ratio 1 vs. 0.316; Ward event ratio 0.985 vs. 0.566). We attribute this to TRN's robustness to noisy frame-level predictions, which is visualized for a sample test video in Fig. 3a; visualizations for the remaining test videos are provided in the supplementary material. Still for Cholec80, our partial-coverage models (TRN21 FI and TRN41 FI) do not consistently match the state-of-the-art baselines in frame-based metrics: both have lower accuracy, although TRN41 FI slightly exceeds Trans-SVNet in recall and F1-score (0.808 vs. 0.800). Their advantage is that they perform the segmentation while processing less than 60% of the video (coverage rates of 57.6% and 59.1%). This illustrates the trade-off between coverage and accuracy.
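The accuracy/efficiency trade-off discussed above can be made explicit by treating each Cholec80 method in Table 2 as a point (F1-score, computation cost) and keeping only the Pareto-optimal ones, i.e. those for which no other method is both at least as accurate and at least as fast. The sketch below copies the F1-scores and per-video costs from Table 2; the names `Config`, `pareto_front`, and `best_within_budget` are hypothetical, and the sketch deliberately ignores coverage and the event-based metrics.

```python
from typing import List, NamedTuple, Optional


class Config(NamedTuple):
    name: str
    f1: float      # overall F1-score on Cholec80 (Table 2)
    cost_s: float  # average computation cost per video in seconds (Table 2)


# Cholec80 rows of Table 2.
CHOLEC80 = [
    Config("ResNet-50", 0.756, 96.6),
    Config("TeCNO", 0.774, 99.6),
    Config("Trans-SVNet", 0.800, 99.6),
    Config("TRN21 FI", 0.782, 60.6),
    Config("TRN41 FI", 0.808, 64.9),
    Config("TRN41 RMI", 0.846, 105.5),
]


def dominates(a: Config, b: Config) -> bool:
    """True if a is at least as accurate and as fast as b, and strictly better in one."""
    no_worse = a.f1 >= b.f1 and a.cost_s <= b.cost_s
    strictly_better = a.f1 > b.f1 or a.cost_s < b.cost_s
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Configurations not dominated by any other, sorted by increasing cost."""
    front = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(front, key=lambda c: c.cost_s)


def best_within_budget(configs: List[Config], budget_s: float) -> Optional[Config]:
    """Most accurate configuration whose per-video cost fits the budget, if any."""
    affordable = [c for c in configs if c.cost_s <= budget_s]
    return max(affordable, key=lambda c: c.f1) if affordable else None


if __name__ == "__main__":
    print([c.name for c in pareto_front(CHOLEC80)])
    print(best_within_budget(CHOLEC80, budget_s=100.0))
```

With these numbers, the Pareto front consists of TRN21 FI, TRN41 FI, and TRN41 RMI, since every baseline is dominated by a TRN configuration (ResNet-50 and TeCNO by TRN21 FI, Trans-SVNet by TRN41 FI). Under a hypothetical budget of 100 s per video, `best_within_budget` returns TRN41 FI.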
