Liza Kok

Chapter 6

NES and neurons) as well as disease status (control or 4H) on gene and protein expression as well as transcripts was evaluated. A random effect was also included in the LMEM to account for inter-individual variability, because repeated measures (across the different cell types) were available for most donor lines. Where the three repeated measures for a specific donor line were not all present, the LMEM handled the missingness by using all available data through restricted maximum likelihood (REML) estimation, under the assumption that the data were missing at random (MAR). Similarly, an LMEM was fitted for the Pol III subunits on the original response scale. It included a fixed effect (disease status only) and a random effect that accounted for twelve (or slightly fewer) repeated measures, given by the number of technical replicates. Again, any missing data were handled through REML estimation under the MAR assumption.

All LMEM results are summarized in a Type-II ANOVA table, which tests each fixed main effect (e.g., cell type) after adjusting for the other fixed main effect (e.g., disease status). All reported findings are on the original data scale (i.e., log-scaled results were back-transformed via exponentiation). The fit of all LMEMs (and the associated output) was validated via residual analysis: the Quantile-Quantile (Q-Q) and residuals-versus-fitted-values plots were inspected, and the S-W test was performed to check for residual normality. The exact formulation of these models is detailed in the Supplementary Material.

Influential outliers

The data exploration stage identified the potential presence of influential outliers. Such data points (of a specific disease status and cell type) deviate markedly from the general pattern observed in the data and have a disproportionate effect on the output of the LMEMs.
More specifically, influential outliers can distort the parameter estimates, alter the model fit, and affect the interpretation (resulting in misleading conclusions). Cook’s Distance was used to formally quantify the influence of a data point by assessing the change in fitted values when that data point was removed entirely from the analysis. This diagnostic measure combines the leverage of a data point (i.e., how far its independent variable values are from their mean) and its residual (i.e., how far the observed value is from the fitted value) into a single measure of influence. For Cook’s Distance in the context of LMEMs, the reader is referred to Nieuwenhuis et al. (2012). After the removal of these influential outliers, the analyses described above were re-run for all expressions/transcripts/subunits, serving as a sensitivity analysis to assess the reliability and validity of the fitted models and of the resulting inference.
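
To illustrate how Cook’s Distance combines leverage and residual into one number, a minimal sketch is given below. It is an assumption-laden simplification: it uses an ordinary simple linear regression rather than an LMEM, so it is not the procedure of Nieuwenhuis et al. (2012) nor the analysis performed in this chapter. The function name `cooks_distances` and all data values are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def cooks_distances(x, y):
    """Cook's distance per point for the simple regression y ~ a + b*x.

    Illustrative OLS analogue only; NOT the LMEM version used in the text.
    """
    n = len(x)
    p = 2  # fitted coefficients: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Residual: how far each observed value is from its fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance

    # Leverage: how far each x value is from the mean of x.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Cook's distance combines both quantities into one influence measure.
    return [
        (e ** 2 / (p * s2)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


# Hypothetical values, not data from this study.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 15.0]  # last point is deliberately aberrant

distances = cooks_distances(x, y)
most_influential = max(range(len(x)), key=lambda i: distances[i])
```

In this toy example the last point has both a large residual and a high leverage, so it receives the largest distance. A sensitivity analysis in the spirit of the one described above would refit the model without such a point and compare the two sets of estimates.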
