5 70 5. Towards biologically plausible phosphene simulation Figure 5.6: Schematic illustration of the end-to-end machine-learning pipeline adapted from (de Ruyter van Steveninck et al., 2022a). A convolutional neural network encoder is trained to convert input images or video frames into a suitable electrical stimulation protocol. In the training procedure, the simulator generates a simulation of the expected prosthetic percept, which is evaluated by a second convolutional neural network that decodes a reconstruction of the input image. The quality of the encoding is iteratively optimized by updating the network parameters using back-propagation. Different loss terms can be used to constrain the phosphene encoding, such as the reconstruction error between the reconstruction and the input, a regularization loss between the phosphenes and the input, or a supervised loss term between the reconstructions and some ground-truth labeled data (not depicted here). Note that the internal parameters of the simulator (e.g., the estimated tissue activation) can also be used as loss terms. the model has successfully learned to represent the original input frames in phosphene vision over time, and the decoder is able to approximately reconstruct the original input. Constrained stimulation and naturalistic scenes In a second experiment, we trained the end-to-end model with a more challenging dataset containing complex images of naturalistic scenes (the ADE20K dataset; Zhou et al., 2019). In this experiment, we implemented the original pipeline described in (de Ruyter van Steveninck et al., 2022a) (experiment 4), with the same phosphene coverage as the previously described experiment. The images were normalized and converted to grayscale, and we applied a circular mask such that the corners (outside the field covered by phosphenes) were ignored in the reconstruction task. The experiment consisted of three training runs, in which we tested different conditions: a free optimization condition, a constrained optimization condition, and a supervised boundary reconstruction condition. In the free optimization condition, the model was trained using an equally weighted combination of a MSE reconstruction loss between input and reconstruction, and a MSE regularization loss between the phosphenes and input images. After six epochs the model found an optimal encoding strategy that can accurately represent the scene and allows the decoder to accurately reconstruct pixel intensities while qualitatively maintaining the image structure (see Figure 5.8). Importantly, the encoder encoded brighter areas of the input picture with large stimulation amplitudes (over 2000µA). The encoding strategy found in such an unconstrained optimization scheme is not feasible for real-life applications. In practice, the electrical stimulation protocol will need to satisfy safety bounds and it will need to comply with technical requirements and limitations of the

RkJQdWJsaXNoZXIy MTk4NDMw