Discussion

With rapid experimental advancements, high-throughput tools such as transcriptomics have become more affordable and hence accessible to many researchers. As a result, generating new data is no longer the primary bottleneck in research; rather, it is the analysis and interpretation of these vast datasets. While the availability of such tools can drive new discoveries, it is worth questioning whether they are always the most appropriate approach. Given the ease of generating large datasets, which often yield publishable findings, one might wonder whether such methods are sometimes chosen out of convenience. In light of this, we should ask: do small advances in sequencing technology, an increased sample size, or a slightly different in vitro approach really justify the generation of yet another big dataset? What happens to datasets that are already available: are we utilizing them to their full potential? And to what extent do choices in data analysis and research direction shape the outcomes of our studies?

In this thesis, we generated multiple transcriptome datasets, though our approaches and subsequent decisions varied. For example, in chapter 2, we used a reductionist approach, focusing on a differentially expressed gene (DEG) list to identify a single extreme gene (ARX) with known functions. This finding directed further research, leading to the identification of interneuron involvement in 4H. However, this reductionist approach is inherently sensitive to bias, as it relies heavily on pre-existing knowledge (Abedi et al., 2019). While we validated differences in ARX expression across other in vitro differentiation products, many other significant genes remained unexplored.

An alternative, less biased approach is gene set enrichment analysis (GSEA). In this thesis, we demonstrate that GSEA can yield numerous significant gene sets. However, determining which of these are truly relevant and warrant further investigation remains a challenge. Often, existing knowledge again guides researchers to prioritize gene sets that align with current hypotheses. Yet, as shown in the section on sample selection, including samples from other diseases revealed that gene sets that aligned well with one specific disease were often significant in other diseases as well.

To conclude, transcriptome studies have become more accessible and have led to new discoveries. In my opinion, ensuring that data is Findable, Accessible, Interoperable, and Reusable (FAIR) can maximize the potential of the single-cell dataset generated in this thesis, rather than generating new ones. Of course, with data reuse it is important to address concerns about shared data being analyzed by researchers who were not involved in its collection and who may therefore not understand all the study parameters (Longo & Drazen, 2016).
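To make the contrast between the two analysis strategies discussed above concrete, the sketch below shows a minimal Python workflow: first the reductionist route of filtering a DEG table for extreme genes, then a pre-ranked GSEA run. It assumes a DESeq2-style results table with "gene", "log2FoldChange", and "padj" columns; the file name, thresholds, and gene-set library are illustrative assumptions, not the exact pipeline used in this thesis.

```python
# Minimal sketch, assuming a DESeq2-style DEG table; "deg_results.csv" and the
# thresholds below are hypothetical choices for illustration only.
import pandas as pd
import gseapy

deg = pd.read_csv("deg_results.csv")  # columns: gene, log2FoldChange, padj

# Reductionist route: keep significant DEGs and inspect the most extreme hits,
# analogous to how a single standout gene such as ARX can surface from a list.
significant = deg[(deg["padj"] < 0.05) & (deg["log2FoldChange"].abs() > 1)]
top_hits = significant.loc[
    significant["log2FoldChange"].abs().sort_values(ascending=False).index
]
print(top_hits.head(10))

# GSEA route: rank all genes by fold change and test gene sets, rather than
# single genes, for enrichment (gene-set library name is an assumption).
ranking = deg[["gene", "log2FoldChange"]].dropna().sort_values(
    "log2FoldChange", ascending=False
)
res = gseapy.prerank(
    rnk=ranking,
    gene_sets="GO_Biological_Process_2021",
    outdir=None,
    seed=42,
)
print(res.res2d.head())  # often many significant sets: interpretation
                         # still depends on prior knowledge
```

Even in this toy form, the second route typically returns many significant gene sets, which is precisely the interpretation challenge noted above.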

RkJQdWJsaXNoZXIy MTk4NDMw