processing of phosphene vision has been sparsely studied. Yet, based on the performance measures and the responses from the exit survey, we can conclude that the representation of background textures and surface gradients at higher resolutions of prosthetic vision is informative for mobility – at least in the current environment. Combined, the aforementioned conclusions advocate for a balanced compromise between informativity and interpretability. The optimal amount of scene simplification should be a careful choice that depends on the characteristics of the implant and the environmental context.

2.4.3. Feasibility of deep learning-based surface-boundary detection for scene simplification

Besides the theoretically attainable benefit of strict scene simplification, which was tested by removal of gradients and background textures from the environment, we evaluated its practical feasibility through intelligent image processing. Comparing SharpNet with CED, no significant improvement in performance was found in any of the study conditions. Looking at the results in the complex environment, the relative decrease in performance with SharpNet at the higher phosphene resolution (42 × 42 phosphenes) is in line with the aforementioned analysis of the CED trials and suggests that removal of gradients and background textures may not be beneficial at higher resolutions. The absence of improvement in the SharpNet trials with a lower phosphene resolution (26 × 26 phosphenes), however, is unexpected, as for low resolutions a strict scene simplification method is theorized to prevent overcrowding of the phosphene representation (Vergnieux et al., 2017). Even more unexpected is the consistent performance drop in the plain environment, since in this environment the behavior of the SharpNet model is expected to be similar to that of CED – all visual gradients, apart from shadows, match object surface boundaries.

Rather than reflecting inherent disadvantages of surface boundary detection, these findings are likely explained by the poor performance of the current implementation. Here we summarize a few potential issues. Firstly, post-hoc inspection of the captured image stream revealed poor prediction of the surface boundaries by the SharpNet model. Note that the output of CED on images acquired in the plain environment (Figure 4B) is effectively equivalent to the ideal output of SharpNet in the complex environment (compare to Figure 3G). The network was trained on a naturalistic indoor image dataset, and the underperformance may reflect poor generalizability to the current environment. Furthermore, based on incidental reports from the exit survey and post-hoc visual inspection of the videos, head movements seemed to negatively influence the prediction performance, indicating that the network might be sensitive to motion blur. This occasionally caused obstacles to remain undetected. Secondly, a potential problem with the current SharpNet implementation is that it is based on individual frame processing, resulting in large frame-to-frame differences. This lack of dynamical consistency may have caused reduced interpretability of the phosphene representations; a sketch of one conceivable remedy, temporal smoothing of successive predictions, is given below.
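Purely as an illustration of this second point, the following minimal sketch shows how frame-wise boundary maps could be smoothed over time with an exponential moving average before phosphene rendering. The function and parameter names are hypothetical and do not reflect the pipeline used in this study; the Canny-based detector merely stands in for any frame-wise model such as SharpNet.

```python
import cv2
import numpy as np

def predict_boundaries(frame):
    # Stand-in frame-wise detector: Canny edges rescaled to [0, 1].
    # In the SharpNet condition this would be the network's
    # surface-boundary output instead.
    edges = cv2.Canny(frame, 100, 200)
    return edges.astype(np.float32) / 255.0

def smooth_boundary_stream(frames, alpha=0.3, threshold=0.4):
    # Exponentially smooth successive boundary maps to reduce
    # frame-to-frame flicker, then binarize for phosphene rendering.
    # frames: iterable of 8-bit grayscale camera frames.
    # alpha:  weight of the newest prediction (higher = less smoothing).
    ema = None
    for frame in frames:
        pred = predict_boundaries(frame)
        # Running average over time; the first frame initializes it.
        ema = pred if ema is None else alpha * pred + (1.0 - alpha) * ema
        yield (ema > threshold).astype(np.uint8)
```

Note that such smoothing trades flicker for latency: the boundaries of fast-moving obstacles would lag behind their true positions, so the smoothing weight would need to be tuned against the head-movement dynamics observed in the experiment.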
Thirdly, participants might have experienced difficulties adjusting between the two image processing strategies. Although participants were trained during the practice session in equal amounts for both methods, our experiment, by design, contained fewer SharpNet trials than CED trials. This relative underrepresentation may have reduced familiarity with the SharpNet condition. Lastly, the current image processing pipeline is to some extent based on arbitrary choices. Although the effects of specific parameter settings and processing choices were

RkJQdWJsaXNoZXIy MTk4NDMw