Robin van Rijthoven

First, during “Information,” the child had to answer orally presented questions that tested general knowledge about events, objects, places, and people. Secondly, during “Similarities,” the child had to name the similarity between two concepts. Thirdly, during “Productive vocabulary,” the experimenter pronounced a word and the child had to define the given word. Fourthly, during “Comprehension,” the experimenter asked questions about social situations or common concepts. Kaufman (1975) already showed that these four measures together form a factor named “verbal comprehension,” which was also the case in the current sample (Van Rijthoven et al., 2018).

Spelling error classification

The PI word dictation consisted of 135 words. However, for most children, testing was terminated earlier. We therefore selected the first four blocks (60 words). All possible types of errors within these 60 words were listed and labeled (e.g., phoneme addition, end d/t, ei/ij). These errors were divided into three categories (phonological, morphological, and orthographic errors), following Tops and colleagues (2014), Vanderswalmen and colleagues (2010), and Worthy and Viise (1996); see Appendix A. Some types of errors could not be classified exclusively as phonological, morphological, or orthographic. Words containing these types of errors were removed from the dataset (21.66% for version A and 18.33% for version B; see Appendix B). After removal of these words, the total number of possible errors within each category was calculated from the error descriptions. This was done separately for each version of the PI word dictation (versions A and B) to correct for any differences between the two versions. Next, the dictation tasks of all participants were screened for the number and type of errors made by each child.
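The removal and counting steps described above can be illustrated with a minimal sketch. It is not the procedure or software used in the study. The function name, the data layout, the `"ambiguous"` label, and the toy items are all hypothetical and serve only to show the logic: words with an error type that cannot be assigned exclusively to one category are dropped, and the possible errors in the remaining words are tallied per category.

```python
from collections import Counter

CATEGORIES = ("phonological", "morphological", "orthographic")


def possible_errors_per_category(words):
    """Return (possible errors per category, percentage of words removed).

    `words` is a list of dicts with a hypothetical layout:
    {"word": str, "error_types": [category label, ...]}.
    The label "ambiguous" marks an error type that cannot be assigned
    exclusively to one of the three categories (an assumption of this sketch).
    """
    # Drop every word that contains at least one ambiguous error type.
    kept = [w for w in words if "ambiguous" not in w["error_types"]]
    removed_share = 100 * (len(words) - len(kept)) / len(words)

    # Tally the possible errors in the remaining words, per category.
    totals = Counter()
    for w in kept:
        totals.update(w["error_types"])
    return {c: totals[c] for c in CATEGORIES}, removed_share


# Toy data, invented for illustration only (not items from the PI word dictation).
toy_words = [
    {"word": "item_1", "error_types": ["phonological", "orthographic"]},
    {"word": "item_2", "error_types": ["morphological"]},
    {"word": "item_3", "error_types": ["phonological", "ambiguous"]},
    {"word": "item_4", "error_types": ["orthographic", "orthographic"]},
]

totals, removed_share = possible_errors_per_category(toy_words)
print(totals, removed_share)
```

Running the tally once per dictation version gives version-specific totals, which is what makes error counts comparable across versions A and B.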
Following Tops and colleagues (2014), the error classification was based on the end product and not on the strategy used by the child. For each child with dyslexia, two dictations were screened (pre- and posttest). For typically developing children, a single dictation was screened (all version A). The interrater reliability of the two MSc students who did the screening was good: Cohen’s kappa was .84 (version A) and .88 (version B). For each dictation task, all errors were entered in a dataset in which each error was assigned to its error type. A single word could contain multiple errors, following the descriptions in Appendix A. In case of early termination of the task (after eight errors in a block of 15 words), all possible errors in the words that were not written were entered in the dataset, based on the assumption that the upcoming words would be too difficult for the child. This follows the procedures of other tests such as the WISC-III-NL (Kort et al., 2005a) or the PPVT (Dunn & Dunn, 1997). In the end, the total number of phonological, morphological,