where L_I is the pixel-wise reconstruction loss and the parameter κ can be adjusted to set the relative weight of the reconstruction loss and the sparsity loss. We evaluated 13 values of κ, again using the random-restarts approach with five different weight initializations for each value. We performed a regression analysis on the overall percentage of active electrodes and on the reconstruction performance (MSE) to evaluate the effectiveness of the sparsity loss and the accompanying decrease in performance, respectively. The results for three of the κ values are displayed in Figure 4.4. As can be observed, increasing the weight of the sparsity loss through κ resulted in an overall lower percentage of active phosphenes. For larger values of κ, the reconstruction quality dropped. In contrast to the results of experiment 1, the addition of a sparsity loss led to phosphene patterns that more naturally encode the presence of pixels by the presence of phosphenes (instead of vice versa).

Figure 4.4: Results of experiment 2. The model was trained on a combination of mean squared error loss and sparsity loss. Thirteen different values of the sparsity weight κ were tested. (a) Visualization of the results for three of the 13 values of κ. Each row displays the performance metrics for the best-performing model out of five random restarts, and one input image from the validation dataset (left), with the corresponding simulated phosphene representation (middle) and reconstruction (right). (b) Regression plot displaying the sparsity of electrode activation and the reconstruction error in relation to the sparsity weight κ. The red circles indicate the best-performing model for the corresponding sparsity condition, as visualized in panel (a).

4.3.4. Experiment 3

In the third experiment, the model was trained on a more complex and naturalistic image dataset. To this end, we made use of the ADE20k semantic segmentation dataset (Zhou et al., 2017; 2019).
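
Looking back at the loss weighting of experiment 2, the trade-off controlled by κ and the selection of the best model among random restarts can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the convex combination (1 − κ)·L_I + κ·L_S, the mean-absolute-activation form of the sparsity term, and all function names are assumptions made for illustration.

```python
from statistics import fmean


def reconstruction_loss(image, reconstruction):
    """Pixel-wise mean squared error between flattened image and reconstruction."""
    return fmean((a - b) ** 2 for a, b in zip(image, reconstruction))


def sparsity_loss(stimulation):
    """Mean absolute electrode activation (assumed form of the sparsity term)."""
    return fmean(abs(s) for s in stimulation)


def total_loss(image, reconstruction, stimulation, kappa):
    """Weighted sum of both terms; the convex weighting by kappa is an assumption."""
    return ((1.0 - kappa) * reconstruction_loss(image, reconstruction)
            + kappa * sparsity_loss(stimulation))


def best_of_restarts(validation_losses):
    """Index of the restart with the lowest validation loss."""
    return min(range(len(validation_losses)), key=validation_losses.__getitem__)


if __name__ == "__main__":
    image = [0.0, 1.0, 1.0, 0.0]
    reconstruction = [0.0, 1.0, 0.0, 0.0]
    stimulation = [1.0, 1.0, 0.0, 0.0]
    # Larger kappa shifts the objective from reconstruction towards sparsity.
    for kappa in (0.0, 0.5, 1.0):
        print(kappa, total_loss(image, reconstruction, stimulation, kappa))
```

Under this assumed weighting, κ = 0 recovers the pure reconstruction objective of experiment 1, while values close to 1 penalize almost exclusively the number of active electrodes, which matches the reported drop in reconstruction quality for larger κ.
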
Compared with the synthetic character dataset that we used in the preceding experiments, one of the key challenges of such naturalistic stimuli is that the images contain abundant information, rather than merely foreground objects on a plain dark background. Not all of these features may be relevant, and therefore the task at hand (implemented by a loss function) should control which information needs to be preserved in the phosphene representations. Note that the proposed end-to-end framework allows for optimization for virtually any type of task that can be formalized as a loss function. With the current experiments, however, we merely aimed to demonstrate the basic principle by exploring the encoding strategies for three different types of image reconstruction tasks. We compared the pixel-based MSE reconstruction task that was used in the first experiment with two other types of