     Type          In     Out    Size  Stride  Pad  Normalization  Activation
1    Conv.         1      8      3     1       1    BN             LReLU
2    Conv. + Pool  8      16     3/2   1       1    BN             LReLU
3    Conv. + Pool  16     32     3/2   1       1    BN             LReLU
4    Res.          32/32  32/32  3/3   1/1     1/1  BN/BN          LReLU/LReLU
5    Res.          32/32  32/32  3/3   1/1     1/1  BN/BN          LReLU/LReLU
6    Res.          32/32  32/32  3/3   1/1     1/1  BN/BN          LReLU/LReLU
7    Res.          32/32  32/32  3/3   1/1     1/1  BN/BN          LReLU/LReLU
8    Conv.         32     16     3     1       1    BN             LReLU
9    Conv.         16     1      3     1       1    -              Step

Table 4.1: Architecture of the encoder component. Conv: convolutional layer; Res: residual block; Pool: max-pooling layer; BN: batch normalization; LReLU: leaky rectified linear unit.

     Type          In     Out    Size  Stride  Pad  Normalization  Activation
1    Conv.         1      16     3     1       1    BN             LReLU
2    Conv.         16     32     3     1       1    BN             LReLU
3    Conv.         32     64     3     2       1    BN             LReLU
4    Res.          64/64  64/64  3/3   1/1     1/1  BN/BN          LReLU/LReLU
5    Conv. + Pool  32/32  32/32  3/3   1/1     1/1  BN             LReLU
6    Conv. + Pool  32/32  32/32  3/3   1/1     1/1  BN             LReLU
7    Conv. + Pool  32/32  32/32  3/3   1/1     1/1  BN             LReLU
8    Conv.         64     32     3     1       1    BN             LReLU
9    Conv.         32     1      3     1       1    -              Sigmoid

Table 4.2: Architecture of the decoder component. Conv: convolutional layer; Res: residual block; Pool: max-pooling layer; BN: batch normalization; LReLU: leaky rectified linear unit.

The conversion model is based on a residual network architecture (He et al., 2016), which is known for its useful training properties (Ebrahimi & Abadi, 2021; Huang et al., 2020). Furthermore, residual networks demonstrate computational similarities with the recurrent networks found in the biological visual system (Kubilius et al., 2019; Liao & Poggio, 2016; Schrimpf et al., 2020). Batch normalization and activation with leaky rectified linear units are implemented in all layers of the model, except for the output layer, which uses sigmoid activation and no batch normalization (Table 4.2).

The decoder component is designed to ‘interpret’ the SPV representation by converting it into a reconstruction of the original input. Our end-to-end model thus implements an auto-encoder architecture, in which the SPV representation can be seen as a latent encoding of the original input (or some transformation thereof) (Bengio et al., 2013). In this view, the efficient encoding of the rather complex visual environment into a relatively rudimentary phosphene representation can be considered a dimensionality reduction problem, in which we aim to maximally preserve the information that is present in the latent SPV representation.
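To make the tabulated encoder concrete, the sketch below expresses Table 4.1 as a PyTorch module. Note that PyTorch itself, the 2×2 max-pooling window behind the ‘3/2’ entries, the internal ordering of the residual blocks, and the hard threshold standing in for the step activation are assumptions made for illustration; none of these details are prescribed by the table.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Residual block: two 3x3 convolutions with BN and LReLU plus an
    identity skip. The placement of the skip addition is an assumption
    (He et al., 2016-style); Table 4.1 does not specify it."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.LeakyReLU()

    def forward(self, x):
        return self.act(x + self.body(x))


def conv_block(cin, cout, pool=False):
    """3x3 convolution + BN + LReLU; the '3/2' rows add 2x2 max-pooling."""
    layers = [
        nn.Conv2d(cin, cout, 3, stride=1, padding=1),
        nn.BatchNorm2d(cout),
        nn.LeakyReLU(),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class Encoder(nn.Module):
    """Encoder following Table 4.1 (layers 1-9)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 8),                          # 1: Conv.
            conv_block(8, 16, pool=True),              # 2: Conv. + Pool
            conv_block(16, 32, pool=True),             # 3: Conv. + Pool
            ResBlock(32), ResBlock(32),                # 4-5: Res.
            ResBlock(32), ResBlock(32),                # 6-7: Res.
            conv_block(32, 16),                        # 8: Conv.
            nn.Conv2d(16, 1, 3, stride=1, padding=1),  # 9: Conv., no BN
        )

    def forward(self, x):
        # Table 4.1 specifies a step activation on the output; a hard
        # threshold is used here, leaving aside how gradients would be
        # propagated through it during end-to-end training.
        return (self.features(x) > 0).float()
```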
4.3. Experiments and Results

Model performance was explored via four computational experiments using different datasets. In each experiment, a different combination of loss functions and constraints was used.
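As a schematic illustration, one such optimization step might chain the encoder, a differentiable phosphene simulator, and the decoder, and back-propagate a reconstruction loss. In the sketch below, the `simulator` argument and the plain mean-squared-error loss are placeholders rather than the actual per-experiment losses and constraints.

```python
import torch.nn.functional as F


def train_step(encoder, simulator, decoder, optimizer, image):
    """One end-to-end optimization step of the auto-encoder pipeline.

    `simulator` is assumed to be a differentiable phosphene simulator;
    the MSE reconstruction loss stands in for the per-experiment loss
    functions and constraints, which are not reproduced here."""
    stimulation = encoder(image)    # stimulation pattern from the encoder
    spv = simulator(stimulation)    # simulated phosphene image (latent code)
    reconstruction = decoder(spv)   # decoder 'interprets' the SPV image
    loss = F.mse_loss(reconstruction, image)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```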