function and batch normalization is used to stabilize training (Table 4.1). At this time, cortical visual prostheses do not allow for systematic control over phosphene brightness (Najarpour Foroushani et al., 2018; Troyk et al., 2003). Therefore, in this paper we assume binary instead of graded electrode activation. The Heaviside step function is used as the activation function in the output layer to obtain quantized (binary) stimulation values. Because the derivative of the step function is zero almost everywhere, a straight-through estimator (Yin et al., 2019) is used to maintain gradient flow during backpropagation.

Figure 4.2: Schematic representation of the end-to-end model and its three components: (a) The phosphene encoder finds a stimulation protocol, given an input image. (b) The personalized phosphene simulator maps the stimulation vector onto a simulated phosphene vision (SPV) representation. (c) The phosphene decoder receives an SPV image as input and generates a reconstruction of the original image. During training, the reconstruction dissimilarity loss between the reconstructed and original image is backpropagated to the encoder and decoder models. Additional loss components, such as a sparsity loss on the stimulation protocol, can be implemented to train the network for specific constraints.

In the second component of our model, a phosphene simulator converts the stimulation protocol created by the encoder into an SPV representation. This component has no trainable parameters and uses predefined specifications to realistically mimic the perceptual effects of electrical stimulation in visual prosthetics. Phosphene simulation occurs in three steps. First, each element in the 32 × 32 stimulation protocol is mapped onto prespecified pixels of a 256 × 256 image, yielding the simulated visual field.
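As a rough illustration, the simulation steps in this section (placement on a jittered grid, a fixed random gain per phosphene, and Gaussian blurring) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function name `simulate_phosphenes`, the fixed seed that keeps the layout and gains prespecified, the choice to light a single pixel per phosphene before blurring, and the truncation of the kernel at four standard deviations are all illustrative choices that are not taken from the text.

```python
import numpy as np


def simulate_phosphenes(stim, spacing=8, sigma=1.5, jitter=0.25,
                        gain_range=(0.5, 1.5), seed=0):
    """Map a square binary stimulation grid to a simulated phosphene image.

    Hypothetical helper for illustration only. The fixed seed makes the
    jittered layout and the gains identical on every call (an assumption
    standing in for the "prespecified" values in the text).
    """
    rng = np.random.default_rng(seed)
    n = stim.shape[0]        # electrodes per side, e.g. 32
    size = n * spacing       # pixels per side, e.g. 32 * 8 = 256

    # Step 1: regular grid centres, displaced by up to +/- jitter * spacing
    # in both the vertical and horizontal direction.
    rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    centre = (np.stack([rows, cols]) + 0.5) * spacing
    offset = rng.uniform(-jitter, jitter, size=centre.shape) * spacing
    pos = np.clip(np.rint(centre + offset).astype(int), 0, size - 1)

    # Step 2: one fixed random gain per phosphene.
    gain = rng.uniform(*gain_range, size=stim.shape)

    # Assumption: each active phosphene lights a single pixel before blurring.
    image = np.zeros((size, size))
    image[pos[0], pos[1]] = stim * gain

    # Step 3: separable Gaussian blur, kernel truncated at 4 * sigma
    # (assumed) and normalised to sum to one; borders are zero-padded.
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()

    def blur_1d(v):
        return np.convolve(v, kernel, mode="same")

    image = np.apply_along_axis(blur_1d, 0, image)
    image = np.apply_along_axis(blur_1d, 1, image)
    return image


# Usage: a random binary 32 x 32 stimulation protocol.
protocol = (np.random.default_rng(1).random((32, 32)) > 0.5).astype(float)
spv = simulate_phosphenes(protocol)
print(spv.shape)  # (256, 256)
```

With a spacing of 8 pixels, the 32 × 32 grid spans exactly 32 × 8 = 256 pixels per side, which matches the 256 × 256 simulated visual field.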
Phosphenes are mapped onto a rectangular grid whose positions are displaced by a random offset between -0.25 and 0.25 times the phosphene spacing in both the horizontal and vertical directions. Second, the phosphene intensities are multiplied by a prespecified random gain value between 0.5 and 1.5 to mimic natural variation in brightness. Third, the resulting image is convolved with a Gaussian kernel to simulate the characteristic perceptual effects of electrical point stimulation. In the current experiments, we use a phosphene spacing of 8 pixels and a sigma value of 1.5 pixels for the Gaussian kernel.

The third component of our model is the decoder. The decoder is an image-to-image