Laura Spinnewijn

Chapter 5

in gynecology training (LS, FS). The case vignette ranking exercise was sent out by email or printed on paper. Participants were assigned to either the ‘complexity’ ranking group or the ‘job satisfaction’ ranking group before they were invited to participate. Group assignment was not blinded.

Participants had to rank cases into three preference categories. The ‘job satisfaction’ group ranked the three most rewarding (H), four neutral (N) and three least rewarding (L) cases. The ‘complexity’ group ranked the three most complex (H), four neutral (N) and three least complex (L) case vignettes. We only asked for a partial ranking of all ten vignettes into three categories. Full ordinal ranking becomes more complicated as the number of alternatives increases, and results may become biased [21]. Reducing the task from ordering ten options to sorting them into three ordinal classes lowers this risk of ranking bias.

Research questions

The concrete research questions for our statistical analyses were: Do the frequencies and distributions of SDM, EMO and TECH vignette counts differ within the lower (L) and higher (H) ranked categories of ‘job satisfaction’ and ‘complexity’, respectively?

Statistical analyses

Due to the nature of our data, it was impossible to use standard statistical tests, such as the Wilcoxon signed-rank test, as our data did not meet the assumptions for these tests (e.g., measures are not continuous). Therefore, an expert team of statisticians (JE, KS) developed an algorithm for analyses based on randomization tests [23]. The algorithm made it possible to determine whether the ranking results could be explained by chance. We tested the null hypothesis that cases were randomly assigned to Category L, each with equal probability within each participant and with a multivariate hypergeometric distribution of the variables within Category L [24].

First, we determined how often each case vignette type (SDM, EMO or TECH) was assigned to the lowest Category L per participant. Then, we compared the observed sum scores (S) of the case vignettes with chance level, reporting both the sum scores and the statistic’s expected value (E) in each test. To do so, we computed the probability distribution of the sum scores from the hypergeometric distributions using a convolution algorithm [25]. Two-sided p-values were calculated from S using the doubling formula, based on these probability distributions [26]. Next, we repeated the previous steps and compared all observed vignette counts pairwise. Subsequently, all analyses mentioned above were repeated for the highest Category, H. Because we conducted six significance tests in each block, the Bonferroni correction was also applied, meaning all p-values were multiplied by six to correct for false-positive results. A p-value < 0.05 was considered significant. Analyses were executed separately for the ‘job satisfaction’ and ‘complexity’ ranking groups. In reporting, we use subscript L and H letters to
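The single-type part of the randomization test described above can be illustrated with a short Python sketch. It is not the authors' algorithm, only a minimal reconstruction under stated assumptions: the number of vignettes per type (here 4 of 10), the number of participants (20) and the observed sum score (35) are hypothetical values chosen for illustration, and all function names are our own. The pairwise comparisons, which require the multivariate hypergeometric distribution, are not covered.

```python
from math import comb


def hypergeom_pmf(total, n_type, n_drawn):
    """P(k vignettes of one type among n_drawn placed at random out of total)."""
    denom = comb(total, n_drawn)
    return [comb(n_type, k) * comb(total - n_type, n_drawn - k) / denom
            for k in range(n_drawn + 1)]


def convolve(p, q):
    """Distribution of the sum of two independent counts with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def sum_score_pmf(single_pmf, n_participants):
    """Null distribution of the sum score S over independent participants."""
    dist = [1.0]  # point mass at 0 before any participant is added
    for _ in range(n_participants):
        dist = convolve(dist, single_pmf)
    return dist


def doubled_p_value(dist, observed):
    """Two-sided p-value: twice the smaller tail probability, capped at 1."""
    lower = sum(dist[:observed + 1])  # P(S <= observed)
    upper = sum(dist[observed:])      # P(S >= observed)
    return min(1.0, 2.0 * min(lower, upper))


def bonferroni(p, n_tests=6):
    """Multiply the p-value by the number of tests, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical inputs: 10 vignettes, 4 of the type of interest,
# 3 slots in Category L, 20 participants, observed sum score S = 35.
single = hypergeom_pmf(total=10, n_type=4, n_drawn=3)
null_dist = sum_score_pmf(single, n_participants=20)
expected = sum(s * p for s, p in enumerate(null_dist))  # E under the null
p_adjusted = bonferroni(doubled_p_value(null_dist, observed=35))
print(f"E = {expected:.1f}, Bonferroni-adjusted p = {p_adjusted:.4f}")
```

Because each participant fills Category L independently under the null hypothesis, the distribution of S is the repeated convolution of the per-participant hypergeometric distribution, which is why no simulation is needed in this sketch.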
