Maaike Swets

S/F94 as a proxy for COVID-19 severity

for some function f taking values in [0,1] that may in principle depend on all covariates. Let [symbol missing in source] denote the predicted probability of day 28 mortality for individual i, derived from the model in Equation 32. Then the rate of predicted day 28 mortality in the sample can be estimated by [expression missing in source]. The odds ratio associated with the treatment effect f is estimated by [expression missing in source]. We take this as our estimate of effect size.

Quantifying uncertainty

The required sample size and 95% confidence interval are noticeably larger when the outcome measure is 1- or 2-level sustained improvement on the WHO scale than for the other outcome measures, in particular S/F94. There are several reasons for this. The first is that the effect size is much smaller relative to the variance of the outcome measure when 1- or 2-level sustained improvement on the WHO scale is the outcome measure than when S/F94 is the outcome measure. The second reason is that S/F94 is a continuous variable, whereas 1- and 2-level sustained improvement on the WHO scale are Bernoulli variables whose means are the proportions of people who had a 1- or 2-level sustained improvement on the WHO scale, respectively.

The sample size calculation with S/F94 as outcome measure relies on a two-sample t-test of the hypothesis that the means of two normal distributions with the same variance are equal. In the case of 1- or 2-level sustained improvement on the WHO scale, the procedure is similar, except that the variances of the two distributions are not equal. The mean of a large number n of independent Bernoulli variables with mean μ is approximately normally distributed with mean μ and asymptotic variance μ(1 − μ)/n. Therefore, two sets of Bernoulli distributed variables with different means also have different variances.
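The contrast between the two calculations can be illustrated with a minimal sketch. It uses the standard normal approximation rather than the exact t-test described above, and the function names, the significance level of 0.05, and the power of 0.8 are assumptions chosen for illustration, not values taken from this study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_continuous(delta, sigma, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided test of equal means.

    Assumes both groups share the same standard deviation `sigma`
    (normal approximation to the two-sample t-test).
    `alpha` and `power` are illustrative defaults, not study values.
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    return ceil(2 * (sigma * z_total / delta) ** 2)


def n_per_group_bernoulli(mu1, mu2, alpha=0.05, power=0.8):
    """Per-group sample size for comparing two proportions.

    Each group has its own variance mu * (1 - mu), so the two
    variances differ whenever the two means differ.
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    var_sum = mu1 * (1 - mu1) + mu2 * (1 - mu2)
    return ceil(var_sum * (z_total / (mu2 - mu1)) ** 2)


# Hypothetical inputs, chosen only to exercise the two functions.
print(n_per_group_continuous(delta=0.5, sigma=1.0))
print(n_per_group_bernoulli(mu1=0.2, mu2=0.3))
```

In the Bernoulli case the variance term is fixed by the two means themselves, so it cannot be specified separately from the effect size as `sigma` can in the continuous case.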
If we use μ1 to denote the mean for the control group and μ2 to denote the mean for the treatment group, then while μ1 < μ2 ≤ 0.5, the variance of the sample mean of the outcome measure increases as the effect size μ2 − μ1 increases. Thus, while these conditions hold and all else is equal, an outcome measure that is Bernoulli distributed will require larger sample sizes than an outcome measure that has a continuous distribution.

A synthetic dataset was generated for use in the online sample size calculation tool. This dataset was created using imputation by chained equations in multiple stages. First, empty records for one thousand new patients at day zero were added to the real data. Next, variables that can be reasonably expected to be constant over a 28
