Scan acquisition and reconstruction protocols should be harmonised across manufacturers and medical centres. Also, volume-of-interest segmentation should be performed in a standardised manner, preferably (semi-)automatically using an algorithm, to reduce inter- and intraobserver variability [382]. In addition, a lack of standardisation in the definition and extraction of radiomic features introduces variation. The Image Biomarker Standardisation Initiative (IBSI) has made an effort to harmonise this by providing common nomenclature, mathematical definitions, benchmarks for image processing and feature extraction, and reporting guidelines [383, 384]. Similarly, repeatability and reproducibility studies have been performed to identify features that show minimal variation across time points, scan conditions and feature definitions [385, 386].

Besides overcoming technical variation, another challenge of radiomics lies in the large number of features (generally over 100 per lesion) relative to the number of subjects in a study (typically several tens in PET studies to hundreds in CT studies). In contrast to traditional biomarker research, which is hypothesis-driven, radiomic research is explorative in nature. In explorative or data-driven research, a biological rationale linking a feature to specific disease characteristics is lacking [387]. Therefore, many features are investigated under the assumption that some of them are associated with the underlying biology. Simultaneously, because of variation in scan protocols, it is challenging to find sufficiently large homogeneous datasets. When the number of data points (patients or scans) is small compared to the number of features, overfitting occurs, which degrades the generalisation performance of the radiomic model [388]. An overfitted model is tailored to the training, or input, dataset, reflecting its noise and random fluctuations, and consequently cannot be applied to other datasets, i.e., it is not generalisable. Before modelling, the number of features should therefore be reduced using feature selection (supervised by outcome) or dimensionality reduction (unsupervised) [389].

In the modelling step, an AI algorithm may be used to fit a function to the input data, comparing its output with the desired output (e.g., tumour phenotype) while minimising a cost function [390]. Several (integrated) algorithms for both feature selection/dimensionality reduction and modelling are available, but there is no consensus on which to use for radiomic analysis. The choice of algorithm has been shown to affect the prediction performance of the radiomic model and depends on the nature of the data [361]. Many radiomic studies employ multiple AI algorithms, which carries a risk of multiple testing and thus increases the false-discovery rate. Multiple-modelling strategies can be justified when they are comprehensively documented to ensure reproducibility and extensively (and externally) validated [391]. In addition to external validation of the radiomic model, another strategy that contributes to clinical translation is comparing the performance of a radiomic model with that of current approaches, e.g., blood biomarkers or visual interpretation.
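To make the feature-reduction and modelling steps described above concrete, the following is a minimal sketch of a radiomics-style classification pipeline. It is not the workflow of any particular study: the synthetic feature matrix (mimicking the "more features than patients" regime), the univariate F-test selector, the PCA alternative and the logistic-regression classifier are all illustrative assumptions, implemented here with scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomic dataset: 80 "patients" with 120
# features each, mimicking the features >> subjects regime that makes
# radiomic models prone to overfitting.
X, y = make_classification(n_samples=80, n_features=120, n_informative=8,
                           n_redundant=20, random_state=0)

# Supervised feature selection: keep the k features most associated
# with the outcome (univariate F-test). Placing the selector inside
# the pipeline means selection is refitted within every
# cross-validation training fold.
selection_model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Unsupervised dimensionality reduction: project the features onto
# their first principal components without using the outcome labels.
pca_model = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

for name, model in [("feature selection", selection_model),
                    ("dimensionality reduction", pca_model)]:
    # Cross-validated AUC as an internal estimate of generalisation;
    # external validation on an independent dataset is still required.
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.2f} +/- {scores.std():.2f}")
```

Fitting the selector inside the pipeline matters: if features were selected on the full dataset before cross-validation, outcome information would leak into the held-out folds and the apparent performance would be inflated, one of the mechanisms behind overfitted radiomic models.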
Also, false discoveries can be minimised by, among other things, validating the results on sham data, i.e., by randomly shuffling outcome labels or using radiomic features from healthy tissue, by test-retest studies, and by studying the biological rationale, or semantics, of the radiomic features in the model [392, 393].
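As a hedged illustration of the sham-data check, the sketch below applies a permutation test with scikit-learn's permutation_test_score: the outcome labels are shuffled many times and the model is refitted and rescored on each shuffle, yielding an empirical p-value for the score on the true labels. The pipeline and synthetic data repeat the assumptions of the previous sketch and stand in for a real radiomic dataset.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import permutation_test_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic features-versus-subjects setting as in the previous
# sketch (hypothetical data, for illustration only).
X, y = make_classification(n_samples=80, n_features=120, n_informative=8,
                           n_redundant=20, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Sham-label check: the outcome labels are shuffled n_permutations
# times and the full pipeline is refitted and rescored on each
# shuffle. A model that merely fits noise scores about as well on the
# sham labels as on the true ones.
score, perm_scores, p_value = permutation_test_score(
    model, X, y, cv=5, scoring="roc_auc",
    n_permutations=500, random_state=0)

print(f"AUC on true labels:      {score:.2f}")
print(f"Mean AUC on sham labels: {perm_scores.mean():.2f}")
print(f"Permutation p-value:     {p_value:.3f}")
```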