template of what could theoretically be achieved with flawless object detection, segmentation or depth prediction models. This enables testing the hypothetical benefits of different image processing strategies without the need to implement deep neural networks. For this reason, training with synthetic data from virtual environments has become a frequently adopted practice in AI research (e.g., see Tremblay et al., 2018). In the field of neuroprosthetics, it can be used to guide the further development of scene processing software.

8.2. Automated optimization with virtual implant users

In silico virtual implant users

In this dissertation, a substantial focus is put on the use of in silico frameworks that can automate the cycle of development, evaluation and optimization of the scene processing and phosphene encoding (see Chapter 3, Chapter 4, Chapter 5). In recent years, this concept has gained increasing interest in the research field (Bruce & Beyeler, 2022; Granley et al., 2022a; 2023; Küçükoğlu et al., 2022; White et al., 2022). The proof-of-principle results discussed in this dissertation and the aforementioned literature demonstrate the viability of end-to-end optimization of prosthetic vision. Nevertheless, there are many practical and theoretical factors that require further consideration.

Choosing the right constraints and objective functions

Evidently, simulated patients have no knowledge of practically useful phosphene encodings without being given the right training constraints. While deep learning-based models are inherently well suited for optimizing information content, they are in principle naive with respect to the safety aspects of electrical stimulation and the intuitive interpretability by human observers. The inclusion of explicit constraints can guide the encoding towards, for instance, sparse stimulation patterns (see Chapter 4).
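As a rough illustration of such an explicit constraint, the sketch below adds a sparsity penalty (the mean absolute stimulation amplitude) to an arbitrary task loss. It is not the implementation used in Chapter 4: the function names, the penalty form and the weight `sparsity_weight` are hypothetical choices made only for this example.

```python
from typing import Sequence


def sparsity_penalty(stimulation: Sequence[float]) -> float:
    """Mean absolute stimulation amplitude (an L1-style penalty, assumed here)."""
    if not stimulation:
        return 0.0
    return sum(abs(s) for s in stimulation) / len(stimulation)


def constrained_loss(
    task_loss: float,
    stimulation: Sequence[float],
    sparsity_weight: float = 0.1,  # hypothetical trade-off weight
) -> float:
    """Task loss plus a weighted sparsity penalty on the stimulation pattern."""
    return task_loss + sparsity_weight * sparsity_penalty(stimulation)


# Two hypothetical stimulation patterns that achieve the same task loss.
dense = [0.8, 0.7, 0.9, 0.6]
sparse = [0.0, 0.9, 0.0, 0.0]

# The sparse pattern receives the lower total loss, so an optimizer that
# minimizes this objective is nudged towards activating fewer electrodes.
print(constrained_loss(0.5, dense) > constrained_loss(0.5, sparse))
```

In this form the weight sets the trade-off: a larger `sparsity_weight` favors fewer active electrodes at the cost of the task objective, while a weight of zero recovers the unconstrained loss.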
Furthermore, to ensure task relevance, the feature selection can be controlled through supervised learning with labeled targets (e.g., see Chapter 4; Granley et al., 2022a), or through more complex learning objectives such as reinforcement learning objectives (see Chapter 3; Küçükoğlu et al., 2022; White et al., 2022). Moreover, to promote human interpretability, it can be useful to explicitly maximize the similarity between the simulated phosphene representation and the visual target (see Chapter 4; Granley et al., 2022a). This prevents the deep learning encoder from producing abstract and uninterpretable phosphene encodings that bear no intuitive resemblance to the visual scene. These examples illustrate the necessity of choosing the right combination of learning objectives and constraints for obtaining intuitive, task-relevant and efficient phosphene encodings.

Constraining the search space

In the end-to-end architecture in Chapter 4, a naive encoder is trained from scratch, which requires the optimization of thousands of randomly initialized parameters. This has both advantages and disadvantages. The virtue of deep neural networks is that the multitude of parameters allows for complex and intelligent behavior. However, large models require large datasets and a well-designed optimization pipeline. In many cases
