prosthetic vision remain active topics of scientific investigation. In previous work, various preprocessing techniques have been tested for a variety of applications using simulated prosthetic vision (SPV) (Dagnelie et al., 2007; Parikh et al., 2013; Srivastava et al., 2009; Vergnieux et al., 2014). Such preprocessing techniques range from basic edge filtering for improving wayfinding performance (Vergnieux et al., 2017) to more sophisticated algorithms, such as segmentation models for object recognition (Sanchez-Garcia et al., 2020) or facial landmark detection algorithms for emotion recognition (Bollen et al., 2019a; Bollen et al., 2019b). The latter two examples underline the potential benefits of embracing recent breakthroughs in computer vision and deep learning (DL) for the optimization of prosthetic vision. Despite the rapid advancements in these fields, research currently faces the difficult challenge of finding a general preprocessing strategy that can be automatically tailored to the specific tasks and requirements of the user. We illustrate this with two issues: first, prosthetic engineers need to speculate or make assumptions about which visual features are crucial for the task and how these features can be transformed into a suitable stimulation protocol; second, owing to practical, medical or biophysical limitations of the neural interface, one might want to tailor the stimulation parameters to additional constraints. Recent work on task-based feature learning for prosthetic vision suggests that DL models can be employed to overcome such issues (White et al., 2019).

In this paper we present a novel approach that explicitly exploits the potential of DL models for automated optimization to specific tasks and constraints. We propose a deep neural network (DNN) auto-encoder architecture that includes a highly adjustable simulation module of cortical prosthetic vision.
Instead of optimizing image preprocessing as an isolated operation, our approach is designed to optimize the entire process of phosphene generation in an end-to-end fashion (Donti et al., 2017). As a proof of principle, we demonstrate with computational simulations that, by treating the entire pipeline as an end-to-end optimization problem, we can automatically find a stimulation protocol that optimally preserves the information encoded in the phosphene representation, arriving at results that are comparable to traditional approaches. Furthermore, we show that such an approach enables tailored optimization to specific additional constraints, such as sparse electrode activation or arbitrary phosphene mappings.

4.2. Methods

In this section, we provide an overview of the components of the proposed end-to-end DL architecture. Next, we describe four simulation experiments that were conducted to explore the performance of our model under various sparsity constraints, naturalistic visual contexts and realistic phosphene mappings.

4.2.1. Model description

The end-to-end model consists of three main components: an encoder, a phosphene simulator and a decoder (Figure 4.2). Given an input image, the encoder is designed to find a suitable stimulation protocol, which it yields as an output map. The value of each element in this stimulation protocol represents the stimulation intensity of one electrode in the stimulation grid. The encoder follows a fully-convolutional DNN architecture. In all layers apart from the output layer, leaky rectified linear units are used as the activation
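The encoder-simulator pipeline described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: the trained fully-convolutional encoder is replaced by a toy average-pooling stand-in that maps an image to one intensity per electrode, and the simulator renders each electrode activation as a Gaussian phosphene on a regular grid. The grid size (8x8), image size (64x64) and phosphene width are arbitrary assumptions for the sketch.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Leaky rectified linear unit, as used in the encoder's hidden layers.
    return np.where(x > 0, x, alpha * x)

def encoder(image, grid_size=8):
    # Toy stand-in for the fully-convolutional encoder: average-pool the
    # image down to one stimulation intensity per electrode in the grid.
    h, w = image.shape
    bh, bw = h // grid_size, w // grid_size
    pooled = image[:bh * grid_size, :bw * grid_size]
    stim = pooled.reshape(grid_size, bh, grid_size, bw).mean(axis=(1, 3))
    return leaky_relu(stim)

def phosphene_simulator(stim, out_size=64, sigma=2.0):
    # Render each electrode's activation as a Gaussian phosphene placed on
    # a regular map (a stand-in for an adjustable phosphene mapping).
    g = stim.shape[0]
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    centers = (np.arange(g) + 0.5) * out_size / g
    percept = np.zeros((out_size, out_size))
    for i in range(g):
        for j in range(g):
            percept += stim[i, j] * np.exp(
                -((ys - centers[i]) ** 2 + (xs - centers[j]) ** 2)
                / (2 * sigma ** 2)
            )
    return percept

rng = np.random.default_rng(0)
image = rng.random((64, 64))
stim = encoder(image)              # stimulation protocol: one value per electrode
percept = phosphene_simulator(stim)
print(stim.shape, percept.shape)   # (8, 8) (64, 64)
```

In the actual end-to-end setting, a decoder network would reconstruct the input from `percept`, and the reconstruction loss would be backpropagated through the (differentiable) simulator into the encoder; sparsity constraints on `stim` can then be added directly to the loss.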
