In experiment 3 we tested three different reconstruction tasks. The results indicate that for different tasks the model converges to a different optimal encoding strategy, suggesting that the encodings are optimized in a task-specific manner. The higher FSIM and lower MSE in the perceptual reconstruction task, compared to the intensity-based reconstruction task, indicate that information about higher-level perceptual features is favoured over pixel-intensity information. Similarly, when the model was trained with a BCE loss to reconstruct the processed target labels from the ADE20k dataset, only semantic boundary information was preserved. The phosphene encodings found by the model in this condition are comparable to those obtained with traditional pre-processing approaches (see Figure 4.6e) and yield a reconstruction quality (Table 4.3) similar to that of edge detection (Canny, 1986) or holistic contour detection (Xie & Tu, 2017). Note that a variation on these approaches was investigated by Sanchez-Garcia et al. (2020), who demonstrated that pre-processing with semantic segmentation may successfully improve object recognition performance in simulated prosthetic vision, compared to pre-processing with conventional edge detection techniques. In contrast to these traditional strategies, the proposed end-to-end architecture merely requires supervision of the output reconstructions; the labels do not directly control the phosphene representations themselves. The proposed end-to-end method takes advantage of existing deep learning approaches (such as supervision with large, precisely labeled datasets) to achieve comparable results. In addition, it provides a generalized approach that opens the possibility for task-specific and tailored optimization.
The VGG-feature loss and BCE loss that were implemented in this paper were chosen not only because of their well-established application in optimization problems (Asgari Taghanaki et al., 2021; Zhang et al., 2018), but also because they represent basic functions that are normally performed in the brain. The feature representations found in deep neural networks exhibit a processing hierarchy similar to that of the visual cortex (Güçlü & van Gerven, 2015; Yamins et al., 2014), and boundary detection is one of these processing steps, needed for the segregation of objects from the background (Roelfsema et al., 2002). Although many details about the downstream processing of direct stimulation in V1 are yet to be discovered, we know that conscious awareness of a stimulated percept requires coordinated activity across a whole network of brain areas (Bosking et al., 2017a). By acting as a digital twin, a well-chosen reconstruction task may mimic the downstream visual processing hierarchy, enabling direct optimization of visual prosthetics to the biological system. Still, fully optimizing the interaction between prosthetic stimulation and the downstream visual processing requires a deep understanding of the biological networks involved. The proposed end-to-end approach is designed in a modular way, and future research can extend the concept with virtually any reconstruction model and task.
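To illustrate how such task-specific supervision can be set up, the sketch below shows one possible implementation of the three reconstruction losses discussed above (intensity-based MSE, VGG-feature loss, and BCE on boundary labels). This is a minimal sketch under stated assumptions, not the code used in this chapter; the chosen VGG layer depth, the use of a logits-based BCE, and all names are illustrative.

```python
# Minimal sketch (not the implementation used in this chapter) of the three
# task-specific reconstruction losses discussed above. The VGG layer depth,
# the logits-based BCE, and all names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class VGGFeatureLoss(nn.Module):
    """Perceptual loss: MSE between VGG16 feature maps of two images."""

    def __init__(self, n_layers: int = 16):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features
        self.features = nn.Sequential(*list(vgg.children())[:n_layers]).eval()
        for p in self.features.parameters():
            p.requires_grad = False  # the loss network stays fixed

    def forward(self, reconstruction, target):
        # VGG expects 3-channel input; repeat grayscale images if needed
        # (ImageNet normalization omitted here for brevity).
        if reconstruction.shape[1] == 1:
            reconstruction = reconstruction.repeat(1, 3, 1, 1)
            target = target.repeat(1, 3, 1, 1)
        return F.mse_loss(self.features(reconstruction), self.features(target))


vgg_loss = VGGFeatureLoss()


def reconstruction_loss(task, reconstruction, target):
    """Select the loss that defines the reconstruction task."""
    if task == "intensity":    # pixel-intensity reconstruction
        return F.mse_loss(reconstruction, target)
    if task == "perceptual":   # higher-level perceptual features
        return vgg_loss(reconstruction, target)
    if task == "boundary":     # semantic boundary labels (e.g. derived from ADE20k)
        return F.binary_cross_entropy_with_logits(reconstruction, target)
    raise ValueError(f"unknown task: {task}")
```

In an end-to-end training loop of this kind, the selected loss would be applied only to the decoder's output reconstruction, so that gradients reach the phosphene encoder through the differentiable simulator without the labels constraining the phosphene patterns directly; swapping the loss (and decoder) while leaving the rest of the pipeline untouched is what makes the approach modular.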
4.4.4. Tailored optimization to realistic phosphene mappings
The precise characteristics of the artificial percept that can be generated with visual prosthetics will depend on many factors, including the electrode placement and the visual cortex of the patient. By including a more realistic simulation module with customized phosphene mapping, we explored the potential of our end-to-end method to optimize
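A customized phosphene mapping could, for example, be represented as per-phosphene coordinates and sizes that parameterize the simulator. The sketch below illustrates one such parameterization; it is a hypothetical example, not the simulation module used in this chapter, and the map format and Gaussian rendering are assumptions.

```python
# Hypothetical sketch of a phosphene simulator parameterized by a custom
# phosphene map (per-phosphene x/y location and size), e.g. derived from
# electrode positions; not the simulation module used in this chapter.
import torch
import torch.nn as nn


class PhospheneSimulator(nn.Module):
    def __init__(self, phosphene_map: torch.Tensor, resolution: int = 256):
        """phosphene_map: (N, 3) tensor with columns (x, y, sigma) in pixels."""
        super().__init__()
        self.register_buffer("map", phosphene_map)
        ys, xs = torch.meshgrid(
            torch.arange(resolution), torch.arange(resolution), indexing="ij"
        )
        self.register_buffer("xs", xs.float())
        self.register_buffer("ys", ys.float())

    def forward(self, stimulation: torch.Tensor) -> torch.Tensor:
        """stimulation: (B, N) activation per electrode -> (B, 1, H, W) image."""
        x, y, sigma = self.map[:, 0], self.map[:, 1], self.map[:, 2]
        # One Gaussian blob per phosphene, weighted by its stimulation amplitude.
        d2 = (self.xs[None] - x[:, None, None]) ** 2 + (self.ys[None] - y[:, None, None]) ** 2
        blobs = torch.exp(-d2 / (2 * sigma[:, None, None] ** 2))  # (N, H, W)
        image = torch.einsum("bn,nhw->bhw", stimulation, blobs)
        return image.clamp(0, 1).unsqueeze(1)
```

Because such a simulator remains differentiable with respect to the stimulation, a map-specific instance can, in principle, be dropped into the same end-to-end pipeline, letting the encoder adapt its encodings to the particular phosphene layout of an individual implant.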
