Maaike Swets

9 Summary and general discussion

It is difficult to predict how developments in data science will change (medical) research practice in the coming years and decades. There is a growing belief that developments in data science and big data will transform medicine into precision medicine44. While it may take some time for this vision to become reality, advances in data science already offer many opportunities to improve and accelerate medical research. First, linking different data sets may reduce missing data, which creates new possibilities for research with observational data. Examples include linking data from EHRs in hospitals to claims data, genomic data, data on socioeconomic status, pathogen data or data from GP practices29. Moreover, machine learning approaches are often used for prediction, as it is increasingly difficult for humans to handle the amount of data available29. A 2018 study used deep learning, a form of machine learning, to build a prediction model from the EHR data of over 200,000 patients. This model outperformed traditional prediction models for a range of clinical outcomes, including mortality and length of stay45. However, the quality of EHR data may prove to be a limiting factor. As mentioned earlier, this applies not only to machine learning but to all studies using automatically collected data sets29. Another application of machine learning to big data sets is (unsupervised) clustering. Examples include K-means clustering and probabilistic approaches such as latent class analysis29,46. These techniques can generate hypotheses by identifying correlations in observational data, which can later be tested within traditional causal inference frameworks29. Finally, there are many ongoing developments at the intersection of causal inference and machine learning47.
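Purely as an illustration of the hypothesis-generating clustering mentioned above, the sketch below implements plain K-means (Lloyd's algorithm) in Python. It is not a method used in this thesis. The function name `kmeans`, the two standardised features, the synthetic patient values and the choice of k = 2 are all hypothetical assumptions. The algorithm repeatedly assigns each patient to the nearest cluster centre and then moves each centre to the mean of its assigned patients, stopping when the assignments no longer change.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, n_iter: int = 100, seed: int = 0):
    """Plain K-means (Lloyd's algorithm); returns (centroids, labels).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Initialise the centroids with k distinct, randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical, synthetic data: two standardised features per patient.
    patients = [(-1.0, -1.2), (-0.8, -1.0), (-1.1, -0.9), (-0.9, -1.1),
                (1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2)]
    centroids, labels = kmeans(patients, k=2)
    print("cluster labels:", labels)
    print("cluster centres:", centroids)
```

Any clusters found this way describe associations in the data. They are hypotheses to be examined with causal study designs, not causal findings. In practice, features would need to be standardised and the number of clusters chosen with care. Latent class analysis takes a different approach and fits a probabilistic model rather than minimising distances.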
Integrating causal reasoning and machine learning to handle large, high-dimensional data sets may help in understanding complex systems47. When it comes to causal inference, however, large data sets do not solve the problems faced with small data sets. If selection bias is present in a large data set and no attempt is made to correct for it, the analysis will still yield a biased effect size estimate29. A larger sample only makes the biased estimate more precise. Understanding the conditions for causal inference therefore remains essential when using large data sets. While RCTs cannot be replaced, these new techniques can provide information and generate hypotheses that may ultimately advance medical practice. In addition, findings from observational studies can help researchers formulate more informed hypotheses for RCTs.

Part III: Facilitating causal inference

Apart from systematic reviews, randomised controlled trials are considered to provide the highest level of evidence27. While in some settings alternative study designs might
