Maaike Swets Viral infections research in a data driven era Infectious disease surveillance and real-world causal inference
Viral infections research in a data driven era: infectious disease surveillance and real-world causal inference
© 2024, Maaike C Swets, The Netherlands
ISBN: 978-94-6506-865-7
Layout: Maaike Swets
Printing: Ridderprint
Viral infections research in a data driven era
Infectious disease surveillance and real-world causal inference

Doctoral thesis to obtain the degree of Doctor at Leiden University, on the authority of the Rector Magnificus, Prof. dr. ir. H. Bijl, in accordance with the decision of the Doctorate Board, to be defended on Friday 21 March 2025 at 13:00 by Maaike Caroline Swets, born in The Hague in 1994

Supervisor: Prof. dr. L.G. Visser
Co-supervisors: Prof. dr. J.K. Baillie (The University of Edinburgh), Dr. G.H. Groeneveld
Doctoral committee: Prof. dr. S.C. Cannegieter (LUMC), Prof. dr. J.J. Goeman (LUMC), Prof. dr. M.D. de Jong (Amsterdam UMC and RIVM), Dr. A.B. Docherty (The University of Edinburgh), Dr. S. van den Hof (RIVM and LUMC), Dr. J. Tibboel (LUMC)
Table of contents

Chapter 1: Introduction and outline of the thesis

Part I: Infectious disease surveillance
Chapter 2: Use of proxy indicators for automated surveillance of severe acute respiratory infection, the Netherlands, 2017–2023: a proof-of-concept study
Chapter 3: Using laboratory test results for surveillance during a new outbreak of acute hepatitis in 3 week to 5 year old children in the United Kingdom, the Netherlands, Ireland and Curaçao: observational cohort study

Part II: Causal inference and real-world data
Chapter 4: SARS-CoV-2 co-infection with influenza viruses, respiratory syncytial virus, or adenoviruses
Chapter 5: A comparison of the effectiveness of different doses of tocilizumab and sarilumab in the treatment of severe COVID-19: a natural experiment due to drug shortages
Chapter 6: Influenza season and outcome after elective cardiac surgery: an observational cohort study

Part III: Facilitating causal inference
Chapter 7: Evaluation of pragmatic oxygenation measurement as a proxy for Covid-19 severity
Chapter 8: Clinical sub-phenotypes of Staphylococcus aureus bacteraemia
Chapter 9: Summary and general discussion

Dutch summary (Nederlandse samenvatting)
Acknowledgements
List of publications
Curriculum Vitae
Chapter 1 Introduction and outline of the thesis
Large, opportunistic datasets and “big data”

There is an exponential increase in the amount of data worldwide [1]. Advancements in technology have made the collection and storage of enormous amounts of data easier than ever before, and the amount of data collected and stored is expected to increase even further in the coming years [2]. Data is collected through every aspect of our daily activities: mobile (wearable) devices, such as smartphones or smartwatches, provide a constant data stream, monitoring our geographical location or measuring vital signs like heart rate [3–5]. Whenever we use Google to look up information, the query is stored and can be analysed using Google search trends [6]. It is not just tech companies that have expanded their data collection; the way data is collected in health care has also changed drastically in the past decades [4,7]. Paper health records have mostly been replaced by electronic health records (EHR) [4,8], and tremendous amounts of data are generated and stored on a daily basis in hospitals across the globe, although organising and utilising this data is not easy [3].

This routinely collected or generated data is sometimes used for clinical research, but manually extracting data from EHR is labour intensive. Automated extraction of structured data from EHR can create opportunities to rapidly build large datasets with relatively little effort. Large, observational datasets containing data on many variables for many patients are often referred to as “big data” [9]. An important aspect of big data is that the dataset is large relative to what is considered typical in the field [7]. Other important features of datasets that are considered big data are volume (amount of data), velocity (speed of collection) and variety (types of data, like structured and unstructured data), also known as the ‘three V terms’. With these large datasets, data collection is not always done with a specific research question in mind.

This leads to an obvious disadvantage: when the dataset is not designed for a specific research question, it will often be suboptimal for answering that question. Epidemiological knowledge is frequently needed to avoid drawing invalid conclusions, for example due to differences between groups leading to confounding. Specific care is needed to ensure that the study population of interest is represented and included in the database. Patients with mild disease, for example, whose disease is managed by their general practitioner may not have an EHR in the hospital system. Moreover, data should be of sufficient quality. For example, there are several diagnostic approaches for diseases like heart failure, and for most research questions it will be important to ensure that the same diagnostic criteria are used in the entire study population [10]. In addition, routinely collected datasets often lack quality control. Large datasets can contain incomplete or inaccurate information that can be difficult to identify, and there are not always standard procedures to check the quality of the data.

However, there are also many opportunities and advantages to using these datasets for health care research, if the aforementioned obstacles are overcome. First of all, real-world data reflects the heterogeneity of the population and clinical practice, which makes findings more easily generalisable. Second, even though randomised controlled trials are ideal to answer causal questions, they are not always ethical or feasible (for example due to the amount of time, money or patients needed). However, large, observational datasets can, under some conditions, be used to answer causal questions [9]. Third, since many of these datasets contain routinely collected data, typically little effort is involved for researchers to use the data. New developments make mining of data from EHR possible, and several commercial companies use anonymised health care data, for example from health insurance claims. Fourth, large datasets allow the identification of patterns or subgroups that cannot be identified in smaller datasets. Finally, by reducing the manual data collection burden, researchers can expedite the pace of research discoveries and potentially improve patient care.

The idea of using this data to improve care is not new [11,12]. International Classification of Diseases (ICD) codes are a common way to study EHR data [4]. Like many other areas of medicine, the field of infectious diseases (ID) saw a large increase in the number of publications using big data [7,13]. A common use of these new, large datasets is infectious disease surveillance [7] (discussed below), but they have also been used in other aspects of infectious disease research, such as improvement of care [4].

Viral respiratory disease

Outbreaks of seasonal and pandemic viruses have shaped recent human history. From the “Spanish Flu” in 1918 to the recent Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) pandemic that started in 2019, these respiratory viruses have had a massive global impact. With the increased possibility for travel for many people, and the close proximity of animals to humans, the likelihood of the emergence of a new variant with pandemic potential increases [14].
Because intensive care units (ICUs) and hospitals in many developed countries operate close to capacity, outbreaks of seasonal or pandemic viruses can rapidly lead to serious capacity problems [14–17]. Shortages of beds [15], ventilators [18] and medication [19] have been reported during previous (seasonal or pandemic) outbreaks of viral respiratory disease [14,16].

An epidemic is defined as an outbreak that spreads over a large geographical area, whereas a pandemic is defined as an epidemic that spreads globally [20]. The 1918–1919 “Spanish Flu” (influenza) is the deadliest pandemic in recorded history, during which an estimated 50–100 million people died, over 2.5% of the world population at the time [14,21]. Since 1918, there have been several influenza virus pandemics (1957, 1968 and 2009) [22,23]. Both the 1957 and 1968 influenza pandemics resulted in an estimated one to two million deaths [24]. Before the recent coronavirus pandemic caused by SARS-CoV-2, the SARS-1 pandemic in 2003 resulted in over 8000 infections and almost 800 deaths in 27 countries [25,26]. No human cases of SARS-1 have been detected since early 2004 [26]. In 2012 a new virus was isolated after a fatal pneumonia in Saudi Arabia: Middle East Respiratory Syndrome Coronavirus (MERS-CoV) [26]. By the end of 2021, over 2500 confirmed cases and over 900 deaths had been reported [27]. The most recent pandemic, caused by SARS-CoV-2, led to an estimated 287 million infections and 5.4 million deaths by the end of 2021 [28]. However, those were the numbers reported to the World Health Organization (WHO), and the estimated global excess mortality is significantly higher, with almost 15 million deaths in 2020 and 2021 [28].

Apart from the irregular but recurrent influenza pandemics, influenza is also known as a cause of seasonal epidemics, possibly occurring since the Middle Ages [24,29]. Even though pandemic viruses are typically better known and get more public attention than seasonal viruses, the cumulative morbidity and mortality is higher for seasonal outbreaks [21]. In the European Union, each year an estimated 4 to 50 million people contract seasonal influenza, with over 26,000 respiratory deaths on average every winter [30]. Globally, it is estimated that between 290,000 and 645,000 people die each year from respiratory disease caused by seasonal influenza [21]. Excess mortality is mainly in older adults and those with underlying health conditions [24]. In (sub)tropical areas, there is year-round occurrence of influenza, with a similar impact of disease as in temperate regions with peaks in the colder months [24].

The close proximity of humans, domestic animals and wild animals increases the probability of the emergence of a new virus [24,31]. The growing human population, global movements of people (travel, immigration) [14] and the shift to more crowded living in urban areas [32] make it easier for new viruses to spread and become pandemic [32]. The SARS-CoV-2 pandemic demonstrated that many countries were not adequately prepared to deal with a new pandemic [15]. The pandemic highlighted weaknesses in pandemic preparedness for many countries [33], but also many successes, such as the development of effective vaccines within a year of the start of the pandemic [34], and the speed at which large numbers of patients were included in observational studies and clinical trials to learn more about this new disease [35,36].

In the period between two pandemics, it is essential to learn from the last and prepare for the next pathogen or seasonal outbreak. In this time of technological developments and an exponential increase in the amount of data available, utilising these new data opportunities can increase the efficiency of research and accelerate discoveries. Real-time or near real-time automated data collection and analysis can help with many of the challenges we continuously face in viral respiratory disease research, in particular improving surveillance and clinical care. Ultimately, more efficient research and faster results make it possible to make informed decisions for both public health and individual patient care.

The aim of this thesis is to learn from seasonal and pandemic outbreaks of viral respiratory diseases, with a focus on large datasets containing routinely collected data. More specifically, the aim is to use existing datasets efficiently to learn about different aspects of infectious diseases. While the primary focus of this thesis is respiratory viruses, some chapters focus on other infectious diseases. However, most of the methods used in this thesis will be applicable to a broad spectrum of infectious diseases, and are not limited to respiratory viruses. The first part of this thesis focuses on population level surveillance and answers questions regarding the incidence of viral outbreaks. The second part of this thesis focuses on clinical applications, and aims to answer questions concerning risk factors and treatment of seasonal and pandemic viruses on the patient level. In the third part of this thesis, the aim is to improve the efficiency of causal inference (and therefore, the efficiency of clinical trials) in infectious disease research, by improving the assessment of the outcome of interest, and by removing noise from heterogeneous diseases.

Infectious disease surveillance

While the most famous example of early infectious disease surveillance is perhaps John Snow’s London Cholera study in 1854 [37], there are even earlier examples, like John Graunt’s plague surveillance study from 1662 [38] and James Moore’s smallpox study from 1817 [8]. Over the years, infectious disease surveillance systems have become more elaborate and sophisticated, and private companies, such as Google, have tested new methods for disease surveillance [8,39]. Infectious disease surveillance, defined as the monitoring of the health of a population, often using epidemiological tools, is a critical aspect of public health [5]. There are three goals of infectious disease surveillance [5]:
1. Describing the current status and burden of disease
2. Monitoring of trends
3. Identifying outbreaks and novel pathogens: surveillance is essential to recognise and mitigate new pandemics [14]

Traditional infectious disease surveillance systems have several downsides.
Such systems are labour intensive and therefore expensive, and there is typically a time lag due to this intensive data collection process [7]. These time lags in data collection can lead to a higher number of infections, as a rapid response is needed with outbreaks [5]. The emergence of large electronic datasets has made disease surveillance significantly easier [40]. Examples include using medical claims data or data collected from EHR for disease tracking [8] and outbreak detection [41]. Infectious disease surveillance systems ideally have (near) real-time collection and analysis of data, have granular geographical data and are representative of the population [7,8]. Well-functioning surveillance systems can be used for outbreak detection, monitoring or prediction, and for tracking changes in the characteristics of patients affected [8,14], and can also be used to evaluate the effect of interventions, such as vaccination [8]. Moreover, the output from these surveillance systems should be communicated to health care providers and the community in general [8]. Possible limitations include privacy concerns and high costs when working with commercial partners [8].

A potentially under-used source of data for infectious disease surveillance is routinely collected data from hospitals. Every day, large amounts of clinical, microbiological and laboratory data are stored in hospital systems. Frequently, it is difficult to (rapidly) extract this data in an automated manner, which is essential for disease surveillance. In the first part of this thesis, routinely collected hospital data is used to answer questions about the incidence of disease and population level surveillance for viral diseases.

An example is the use of routinely collected data for the surveillance of Severe Acute Respiratory Infections (SARI). After the 2009 Influenza A pandemic, the WHO recommended that all countries develop national surveillance systems [42]. In Chapter 2, three different outcome measures that can be used for SARI surveillance are compared: ICD-10 diagnostic code registration, Reverse Transcription Polymerase Chain Reaction (RT-PCR) tests and registration of contact and droplet precautions. A data-mining tool was used to collect data from EHRs and assess which outcome measure would be most useful in a future, prospective setting.

Shortly after the COVID-19 pandemic, an outbreak of severe hepatitis in young children was first identified in Glasgow, Scotland. Since then, cases were reported in 35 countries, and the causative agent is thought to be adenovirus in combination with adeno-associated virus 2 (AAV-2) [43]. However, it was unclear whether there was also a group of children with milder disease, i.e. whether we were only seeing ‘the tip of the iceberg’.
In order to determine the presence of a larger group of children who may have a milder form of hepatitis, aspartate transaminase (AST) and alanine transaminase (ALT) data from 30 hospitals across the Netherlands, the UK, Ireland and Curaçao were collected to compare the proportion of increased AST and/or ALT values over time. Moreover, the method used to collect and share this data efficiently without the risk of disclosing any identifiable information is discussed. This is described in Chapter 3.

Causal inference and real-world data

Finding causal relationships is a major part of scientific research [44]. Defining and understanding the meaning of causality has been a topic of research for centuries [45]. One of the conditions for causal inference, as explained by Hernán and Robins [46], is exchangeability. This means that in a perfectly executed randomised controlled trial, the treated and untreated groups are exchangeable based on clinical characteristics. If the treated and untreated groups were accidentally switched, so that the group that was supposed to receive no treatment got treatment, and vice versa, the observed proportion of outcomes would be the same [46]. In an observational study, however, we have no influence on who is treated and who is not treated, and the groups are unlikely to be exchangeable [44]. A common approach is to correct for known confounders that cause this lack of exchangeability between treated and untreated groups in observational data [44]. Correcting for a confounding variable produces conditional exchangeability within levels of the confounding variable, and can help with calculating a causal effect estimate, but only if we assume that there is no residual or unmeasured confounding [44].

The second part of this thesis focuses on answering specific clinical questions using opportunistic, large datasets, using various epidemiological approaches to infer causal relations. In Chapter 4 the International Severe Acute Respiratory and emerging Infections Consortium Coronavirus Clinical Characterisation Consortium (ISARIC4C) Clinical Characterisation Protocol UK (CCP-UK) dataset was used to study the relationship between the need for invasive mechanical ventilation, mortality and co-infections in SARS-CoV-2. The ISARIC4C dataset consists of data from over 300,000 hospitalised COVID-19 patients from over 250 hospitals across the UK. A SARS-CoV-2 mono-infection was compared with a SARS-CoV-2 co-infection with influenza viruses, adenovirus or respiratory syncytial virus (RSV). Inverse probability weighting was used to correct for the increased likelihood of testing in patients who were severely ill or admitted to the intensive care unit.

Chapter 5 is a natural experiment, in which the dosing of IL-6 inhibitors for treatment of hospitalised COVID-19 patients was determined by the time of hospitalisation. Due to drug shortages, different doses of tocilizumab and sarilumab were recommended in different time periods in 2021 in the Netherlands. Using real-world claims data, we compared the effectiveness of different doses of these IL-6 inhibitors.
For the final chapter of the second part of this thesis, Chapter 6, data from the National Intensive Care Evaluation (NICE) registry was used to study the relationship between cardiac surgery and influenza-like-illness (ILI) season. The duration of invasive mechanical ventilation (IMV) was utilised as a proxy for viral respiratory disease, and was compared between patients who underwent elective cardiac surgery in the ILI season and patients who underwent surgery in a season with low incidence of viral respiratory disease.

Facilitating causal inference

The third part of this thesis explores different methods and approaches that can increase the efficiency of causal inference. Apart from meta-analyses, randomised controlled trials (RCTs) provide the most reliable evidence for the effectiveness of interventions, and should ideally use patient relevant outcome measures, such as mortality [47]. A downside of many patient relevant outcomes is that follow-up time is long and a large sample size is needed, both leading to higher costs. Especially in therapeutic studies in COVID-19 (or other emerging viruses), in which answers are needed rapidly, it is vital to assess therapeutic options efficiently and accurately in early-stage clinical studies. A possible way to improve the efficiency of clinical trials, by decreasing sample size and follow-up time, is to use intermediate or surrogate outcomes. For example, HbA1c could be used as a surrogate for micro- and macrovascular complications in type 2 diabetes mellitus [48]. Continuous (numerical) endpoints also decrease the required sample size, further improving the efficiency of a trial [49]. A commonly used intermediate endpoint is the WHO ordinal scale, but this has several downsides. Developing a surrogate endpoint with a closer relation to the definitive endpoint can improve causal inference, by providing a better assessment of the definitive endpoint. In Chapter 7, an intermediate endpoint for clinical trials in COVID-19 was developed and evaluated. This intermediate endpoint is a measure of pulmonary oxygenation function (S/F94), which we compare to other commonly used outcome measures in clinical trials for COVID-19.

Another method to facilitate more efficient causal inference is by removing noise generated by including patients with divergent disease processes. There are many diseases that present as clinically heterogeneous syndromes, and identifying clinically relevant subgroups can be useful to provide better care for patients: subgroup-specific treatments, accurately predicting risks or estimating prognosis are a few of the possible benefits [50,51]. Latent Class Analysis (LCA) is a frequently used unsupervised modelling approach to identify unobserved (“latent”) homogeneous groups (clusters) of people within a larger, heterogeneous population [51]. For example, within the population of Acute Respiratory Distress Syndrome (ARDS) patients, different subgroups identified using LCA responded differently to therapy [52,53]. This means that future clinical trials can be better aimed at patient groups that are most likely to benefit. Clusters are formed based on indicator variables, such as demographics and comorbidities. Within clusters, the distribution of these indicator variables is similar, but different from that in other clusters. In the final chapter of the third part of this thesis, Chapter 8, LCA was utilised to identify different subgroups in Staphylococcus aureus bacteraemia (SAB) patients.

A summary of the studies in parts I, II and III, a reflection on the different approaches and a contemplation of future uses of large datasets are given in the final chapter of this thesis: Chapter 9.
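To make the idea of latent class analysis described above more concrete, the following is a minimal sketch of a latent class model with binary indicator variables, fitted with the expectation-maximisation (EM) algorithm. It is an illustration under strong assumptions and not the model used in Chapter 8: the function name `fit_lca`, the six binary indicators and the toy data are all hypothetical, and applied work would normally rely on dedicated software, multiple random starts and formal criteria for choosing the number of classes.

```python
import random


def fit_lca(data, n_classes=2, n_iter=100, seed=0):
    """Fit a latent class model for binary indicators with EM.

    Returns class weights, per-class item probabilities and the
    posterior class membership probabilities for each row.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting values for P(indicator = 1 | class).
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each row.
        resp = []
        for row in data:
            lik = []
            for k in range(n_classes):
                p = weights[k]
                for j, x in enumerate(row):
                    p *= probs[k][j] if x else 1.0 - probs[k][j]
                lik.append(p)
            total = sum(lik)
            resp.append([v / total for v in lik])
        # M-step: update class weights and item probabilities.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            for j in range(n_items):
                num = sum(r[k] * row[j] for r, row in zip(resp, data))
                # Keep probabilities strictly inside (0, 1) for numerical stability.
                probs[k][j] = min(max(num / nk, 1e-6), 1.0 - 1e-6)
    return weights, probs, resp


# Toy data (invented): two groups with opposite indicator patterns.
group_a = [(1, 1, 1, 0, 0, 0)] * 20
group_b = [(0, 0, 0, 1, 1, 1)] * 20
data = group_a + group_b

weights, probs, resp = fit_lca(data)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
print("class sizes:", labels.count(0), labels.count(1))
```

The class labels themselves are arbitrary (class 0 and class 1 may swap between runs with different seeds); what matters is which rows end up in the same class.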
References

1. Cremin, C. J., Dash, S. & Huang, X. Big data: Historic advances and emerging trends in biomedical research. Current Research in Biotechnology 4, 138–151 (2022).
2. Gandomi, A. & Haider, M. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management 35, 137–144 (2015).
3. Batko, K. & Ślęzak, A. The use of Big Data Analytics in healthcare. Journal of Big Data 9, 3 (2022).
4. Kasson, P. M. Infectious Disease Research in the Era of Big Data. Annual Review of Biomedical Data Science 3, 43–59 (2020).
5. Murray, J. & Cohen, A. L. Infectious Disease Surveillance. in 222–229 (Elsevier, 2017). doi:10.1016/B978-0-12-803678-5.00517-8.
6. Wang, B. et al. COVID-19 Related Early Google Search Behavior and Health Communication in the United States: Panel Data Analysis on Health Measures. International Journal of Environmental Research and Public Health 20, 3007 (2023).
7. Bansal, S., Chowell, G., Simonsen, L., Vespignani, A. & Viboud, C. Big Data for Infectious Disease Surveillance and Modeling. Journal of Infectious Diseases 214, S375–S379 (2016).
8. Simonsen, L., Gog, J. R., Olson, D. & Viboud, C. Infectious Disease Surveillance in the Big Data Era: Towards Faster and Locally Relevant Systems. Journal of Infectious Diseases 214, S380–S385 (2016).
9. Hernán, M. A. & Robins, J. M. Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. American Journal of Epidemiology 183, 758–764 (2016).
10. Bots, S. H., Groenwold, R. H. H. & Dekkers, O. M. Using electronic health record data for clinical research: A quick guide. European Journal of Endocrinology 186, E1–E6 (2022).
11. Kohane, I. S. Using electronic health records to drive discovery in disease genomics. Nature Reviews Genetics 12, 417–428 (2011).
12. Breyer, B. N. et al. Use of Google Insights for Search to track seasonal and geographic kidney stone incidence in the United States. Urology 78, 267–271 (2011).
13. Zhang, Q. Data science approaches to infectious disease surveillance. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 380, 20210115 (2022).
14. Kain, T. & Fowler, R. Preparing intensive care for the next pandemic influenza. Critical Care 23, 337 (2019).
15. Cavallo, J. J., Donoho, D. A. & Forman, H. P. Hospital Capacity and Operations in the Coronavirus Disease 2019 (COVID-19) Pandemic: Planning for the Nth Patient. JAMA Health Forum 1, e200345 (2020).
16. Lane, C. J. et al. ICU Resource Limitations During Peak Seasonal Influenza: Results of a 2018 National Feasibility Study. Critical Care Explorations 4, e0606 (2022).
17. Harris, G. H. et al. US Hospital Capacity Managers’ Experiences and Concerns Regarding Preparedness for Seasonal Influenza and Influenza-like Illness. JAMA Network Open 4, e212382 (2021).
18. Ranney, M. L., Griffeth, V. & Jha, A. K. Critical Supply Shortages: The Need for Ventilators and Personal Protective Equipment during the Covid-19 Pandemic. New England Journal of Medicine 382, e41 (2020).
19. Stukas, S. et al. Reduced fixed dose tocilizumab 400 mg IV compared to weight-based dosing in critically ill patients with COVID-19: A before-after cohort study. The Lancet Regional Health - Americas 100228 (2022) doi:10.1016/j.lana.2022.100228.
20. Grennan, D. What Is a Pandemic? JAMA 321, 910 (2019).
21. Paules, C. I. & Fauci, A. S. Influenza Vaccines: Good, but We Can Do Better. The Journal of Infectious Diseases 219, S1–S4 (2019).
22. Wong, K. H. & Lal, S. K. Alternative antiviral approaches to combat influenza A virus. Virus Genes 59, 25–35 (2023).
23. Iskander, J., Strikas, R. A., Gensheimer, K. F., Cox, N. J. & Redd, S. C. Pandemic influenza planning, United States, 1978-2008. Emerging Infectious Diseases 19, 879–885 (2013).
24. Hampson, A. W. & Mackenzie, J. S. The influenza viruses. Medical Journal of Australia 185, S39–S43 (2006).
25. LeDuc, J. W. & Barry, M. A. SARS, the first pandemic of the 21st Century. Emerging Infectious Diseases 10, e26–e26 (2004).
26. De Wit, E., Van Doremalen, N., Falzarano, D. & Munster, V. J. SARS and MERS: recent insights into emerging coronaviruses. Nature Reviews Microbiology 14, 523–534 (2016).
27. Peiris, M. & Perlman, S. Unresolved questions in the zoonotic transmission of MERS. Current Opinion in Virology 52, 258–264 (2022).
28. Msemburi, W. et al. The WHO estimates of excess mortality associated with the COVID-19 pandemic. Nature 613, 130–137 (2023).
29. Taubenberger, J. K. & Morens, D. M. Influenza: The once and future pandemic. Public health reports 125, 15–26 (2010).
30. Paget, J., Marquet, R., Meijer, A. & Velden, K. van der. Influenza activity in Europe during eight seasons (1999–2007): an evaluation of the indicators used to measure activity and an assessment of the timing, length and course of peak activity (spread) across Europe. BMC Infectious Diseases 7, 141 (2007).
31. Wolfe, N. D., Dunavan, C. P. & Diamond, J. Origins of major human infectious diseases. Nature 447, 279–283 (2007).
32. Hassell, J. M., Begon, M., Ward, M. J. & Fèvre, E. M. Urbanization and Disease Emergence: Dynamics at the Wildlife–Livestock–Human Interface. Trends in Ecology & Evolution 32, 55–67 (2017).
33. Clark, H. et al. Transforming or tinkering: the world remains unprepared for the next pandemic threat. Lancet (London, England) 399, 1995–1999 (2022).
34. Burgos, R. M. et al. The race to a COVID-19 vaccine: opportunities and challenges in development and distribution. Drugs in Context 10, (2021).
35. Docherty, A. B. et al. Features of 20133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study. BMJ (Clinical research ed.) 369, m1985 (2020).
36. The RECOVERY Collaborative Group. Dexamethasone in Hospitalized Patients with Covid-19: Preliminary Report. New England Journal of Medicine NEJMoa2021436 (2020) doi:10.1056/NEJMoa2021436.
37. Tulchinsky, T. H. John Snow, Cholera, the Broad Street Pump; Waterborne Diseases Then and Now. in 77–99 (Elsevier, 2018). doi:10.1016/B978-0-12-804571-8.00017-2.
38. Morabia, A. Epidemiology’s 350th Anniversary: 1662-2012. Epidemiology (Cambridge, Mass.) 24, 179–183 (2013).
39. Ginsberg, J. et al. Detecting influenza epidemics using search engine query data. Nature 457, 1012–1014 (2009).
40. Gilbert, G. L., Degeling, C. & Johnson, J. Communicable Disease Surveillance Ethics in the Age of Big Data and New Technology. Asian Bioethics Review 11, 173–187 (2019).
41. Imanishi, M. et al. Typhoid fever acquired in the United States, 1999–2010: epidemiology, microbiology, and use of a space-time scan statistic for outbreak detection. Epidemiology and Infection 143, 2343–2354 (2015).
42. Buda, S., Tolksdorf, K., Schuler, E., Kuhlen, R. & Haas, W. Establishing an ICD-10 code based SARI-surveillance in Germany: description of the system and first results from five recent influenza seasons. BMC Public Health 17, 612 (2017).
43. Ho, A. et al. Adeno-Associated Virus 2 Infection in Children with Non-A-E Hepatitis. http://medrxiv.org/lookup/doi/10.1101/2022.07.19.22277425 (2022) doi:10.1101/2022.07.19.22277425.
44. Martin, W. Making valid causal inferences from observational data. Preventive Veterinary Medicine 113, 281–297 (2014).
45. Raita, Y., Camargo, C. A., Liang, L. & Hasegawa, K. Big data, data science, and causal inference: A primer for clinicians. Frontiers in Medicine 8, 678047 (2021).
46. Hernán, M. A. & Robins, J. M. Causal Inference: What If. (Boca Raton: Chapman & Hall/CRC, 2020).
47. Ciani, O., Manyara, A. M., Chan, A.-W., Taylor, R. S. & on behalf of the SPIRIT-SURROGATE/CONSORT-SURROGATE project group. Surrogate endpoints in trials: a call for better reporting. Trials 23, 991 (2022).
48. Woerle, H. J. et al. Impact of fasting and postprandial glycemia on overall glycemic control in type 2 diabetes. Diabetes Research and Clinical Practice 77, 280–285 (2007).
49. Dodd, L. E. et al. Endpoints for randomized controlled clinical trials for COVID-19 treatments. Clinical Trials (London, England) 17, 472–482 (2020).
50. Aflaki, K., Vigod, S. & Ray, J. G. Part I: A friendly introduction to latent class analysis. Journal of Clinical Epidemiology 147, 168–170 (2022).
51. Mori, M., Krumholz, H. M. & Allore, H. G. Using Latent Class Analysis to Identify Hidden Clinical Phenotypes. JAMA 324, 700 (2020).
52. Calfee, C. S. et al. Subphenotypes in acute respiratory distress syndrome: latent class analysis of data from two randomised controlled trials. The Lancet Respiratory Medicine 2, 611–620 (2014).
53. Calfee, C. S. et al. Acute respiratory distress syndrome subphenotypes and differential response to simvastatin: secondary analysis of a randomised controlled trial. The Lancet Respiratory Medicine 6, 691–698 (2018).
Part I Infectious disease surveillance
Chapter 2 Use of proxy indicators for automated surveillance of severe acute respiratory infection, the Netherlands, 2017 to 2023: a proof-of-concept study Maaike C Swets, Annabel Niessen, Emilie P Buddingh, Ann CTM Vossen, Karin Ellen Veldkamp, Irene K Veldhuijzen, Mark GJ de Boer, Geert H Groeneveld Eurosurveillance. 2024 Jul 4;29(27).
Abstract

Background: Effective pandemic preparedness requires robust severe acute respiratory infection (SARI) surveillance. However, identifying SARI patients based on symptoms is time-consuming. Using the number of reverse transcription (RT)-PCR tests or contact and droplet precaution labels as a proxy for SARI could accurately reflect the epidemiology of patients presenting with SARI.

Aim: We aimed to compare the number of RT-PCR tests, contact and droplet precaution labels and SARI-related International Classification of Diseases (ICD)-10 codes and to evaluate their use as surveillance indicators.

Methods: Patients from all age groups hospitalised at Leiden University Medical Center from 1 January 2017 up to and including 30 April 2023 were eligible for inclusion. We used a clinical data collection tool to extract data from electronic medical records. For each surveillance indicator, we plotted the absolute count for each week and the incidence proportion per week, and we estimated the correlation between the three surveillance indicators.

Results: We included 117,404 hospital admissions. The three surveillance indicators generally followed a similar pattern before and during the COVID-19 pandemic. The correlation was highest between contact and droplet precaution labels and ICD-10 diagnostic codes (Pearson correlation coefficient: 0.84). There was a strong increase in the number of RT-PCR tests after the start of the COVID-19 pandemic.

Discussion: All three surveillance indicators have advantages and disadvantages. ICD-10 diagnostic codes are suitable but are subject to reporting delays. Contact and droplet precaution labels are a feasible option for automated SARI surveillance, since these reflect trends in SARI incidence and may be available in real time.
Introduction

Severe acute respiratory infection (SARI) surveillance is essential for disease control and prevention, enabling assessment of the effectiveness of community-based preventive measures, detection of unusual events, identification of risk factors and evaluation of pandemic preparedness and capacity management [1,2]. Ideally, a SARI surveillance system should be (near) real-time, combine syndromic surveillance with pathogen testing and be automated where possible to decrease the administrative burden. European-level SARI surveillance is available, with weekly reports published online by the European Centre for Disease Prevention and Control (ECDC) [3]. However, the number of contributing countries is small, and there is often a delay in reporting because of the intensive nature of data collection. At present, there is no robust sentinel or universal SARI surveillance system in the Netherlands.

The rapid developments in the field of data science and the increase in easily accessible healthcare data bring new opportunities for infectious disease surveillance [4]. While manual reporting of cases was once the sole method for infectious disease surveillance, a variety of data sources is used at present [4]. Selected International Classification of Diseases (ICD)-10 codes are used in multiple European countries for SARI surveillance, with or without virological test results [5-8]. Although ICD-10 codes are standardised, they are subject to reporting delays, and the selection of codes used may lead to over- or underestimation of the true number of SARI cases [1]. A narrow selection of codes could underestimate the true number of SARI cases, while a broad selection could overestimate it.

Early in the COVID-19 pandemic in 2020, the World Health Organization (WHO) issued guidelines for the protection of healthcare workers, e.g. in hospitals [9].
These guidelines recommended contact and droplet precautions when caring for suspected COVID-19 patients. The guidelines were implemented rapidly and, in most hospitals in the Netherlands, these patients have a contact and droplet precaution label in their electronic medical record (EMR). During the pandemic, information on the number of patients in contact and droplet isolation was used to determine the impact of COVID-19 on hospital capacity [10]. Both before and following the pandemic, contact and droplet isolation precautions have been used for patients with a suspected viral respiratory infection. The number of these patients is likely to reflect the number of patients who are hospitalised with a respiratory tract infection and could serve as a proxy in SARI surveillance.

In addition, patients who are hospitalised with a suspected viral respiratory infection are typically tested using a reverse transcription (RT)-PCR test. The number of RT-PCR tests for viral respiratory pathogens, irrespective of the test result, could also reflect the number of hospitalised patients with a respiratory tract infection and be suitable for SARI surveillance.

In this proof-of-concept study, we hypothesise that both the number of RT-PCR tests and the number of contact and droplet precaution labels are indicative of SARI and could be pragmatic indicators for monitoring of trends and capacity management in
SARI surveillance. Using data between 2017 and 2023 from one hospital in the Netherlands, we compare RT-PCR tests, contact and droplet precaution labels and ICD-10 codes to assess SARI counts and incidence proportions and to evaluate their suitability as surveillance tools.

Methods

Study design and population
We conducted a retrospective observational study at the Leiden University Medical Center (LUMC, Leiden, the Netherlands), a tertiary university hospital in one of the larger metropolitan areas of the Netherlands. Almost 21,000 patients are admitted to the LUMC every year. Patients of all ages hospitalised for at least 24 h between 1 January 2017 and 30 April 2023 were included. Patients who were hospitalised for less than 24 h at the LUMC but were transferred to another hospital were also included. A patient could be included multiple times if more than one hospitalisation occurred within the study period, with the exception of readmissions within 10 days of the previous hospitalisation. For all included hospitalisations, we collected data on the presence of the three surveillance indicators detailed below.

The study period was divided into three timeframes. The first period (pre-COVID-19) consists of data from week 1 2017 to week 8 2020. The second period starts in week 9 2020, when the first COVID-19 case was reported in the Netherlands, and includes data up to week 53 2020. As the registration policy for contact and droplet precaution labels was changed at the end of 2020 (see below), we included a third period, in which these changes were fully implemented. The third period starts in week 1 2021 and ends at the end of our study period (week 18 2023).

Surveillance indicators

ICD-10 diagnostic codes
ICD-10 diagnostic codes [8] indicative of conditions seen in SARI patients were selected. These codes included J00–J22 (upper and lower respiratory tract infections) and U07.1 and U07.2 (COVID-19 infections).
For children, J40 (bronchitis), J45.9 (asthma, unspecified) and J98.8 (other respiratory disorders) are frequently used for SARI in our hospital and were therefore also included. To be included, ICD-10 codes had to be registered by the treating physician in the EMR between hospital admission and 7 days after hospital discharge. To avoid the inclusion of patients with chronic disease, mainly asthma, we only included an ICD-10 code if the same code had not been registered in the previous year.
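As an illustration only, the selection rules above can be written as a short Python sketch. It is not the study code (the analyses were done in R), and several details are assumptions made for this example: the function names, the `is_child` flag (the age cut-off for children is not stated here) and the reading of "the previous year" as the 365 days before the registration date.

```python
from datetime import date, timedelta

# Codes quoted from the Methods; all names below are hypothetical.
COVID_CODES = ("U07.1", "U07.2")
CHILD_ONLY_CODES = ("J40", "J45.9", "J98.8")


def is_sari_code(code: str, is_child: bool) -> bool:
    """Return True if an ICD-10 code belongs to the SARI selection."""
    code = code.strip().upper()
    if code in COVID_CODES:
        return True
    # J00-J22: compare the two digits that follow the leading 'J'.
    if len(code) >= 3 and code[0] == "J" and code[1:3].isdigit():
        if 0 <= int(code[1:3]) <= 22:
            return True
    # Extra codes are only accepted for children (exact match assumed).
    return is_child and code in CHILD_ONLY_CODES


def registration_counts(code_date, admission, discharge, earlier_same_code_dates):
    """Apply the timing window and the one-year look-back for one code.

    Assumption: 'previous year' means the 365 days before code_date.
    """
    in_window = admission <= code_date <= discharge + timedelta(days=7)
    seen_last_year = any(
        timedelta(0) < (code_date - earlier) <= timedelta(days=365)
        for earlier in earlier_same_code_dates
    )
    return in_window and not seen_last_year
```

For example, `is_sari_code("J45.9", is_child=False)` is rejected while the same code with `is_child=True` is accepted, which mirrors the paediatric extension described above.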
RT-PCR testing
Patients were tested for respiratory viruses at the discretion of the treating physician. RT-PCR tests were conducted on upper respiratory tract samples, using nasal, nasopharyngeal or throat swabs. Virology results were recorded in the Global Laboratory Information Management System (GLIMS), which is linked to the EMR. The total number of RT-PCR tests performed for one of the following respiratory viruses was collected: human adenovirus, bocavirus, human coronaviruses (SARS-CoV-2, MERS, 229E, HKU1, NL63, OC43), human metapneumovirus, influenza viruses A and B, parainfluenza virus (PIV) 1–4, human rhinovirus and respiratory syncytial virus (RSV). Even though patients were frequently screened for multiple viruses using the multiplex RT-PCR method, we counted this as a single RT-PCR test per patient in our analysis. If a patient had more than one (multiplex) RT-PCR test done during hospitalisation, we selected the first test. If a patient tested positive for multiple pathogens within a single RT-PCR test, we included all of the identified pathogens in the virological test results. Only RT-PCR tests that were done from 48 h before admission to 48 h after admission were eligible for inclusion, to minimise the probability of including hospital-acquired infections. Tests before hospital admission were included to account for patients who were tested, e.g. at the emergency department or in the outpatient clinic, but initially sent home before being readmitted within the next 2 days because of clinical deterioration. RT-PCR test results were reported as positive or negative for each tested virus.

Contact and droplet precautions
According to the standard procedure in our hospital, contact and droplet precautions are applied for all patients suspected or confirmed to have a respiratory viral infection from one of the viruses mentioned above.
However, for rhinoviruses, these precautions are only recommended for immunocompromised patients and neonates. The installation of contact and droplet precautions is recorded in the EMR. The process for recording contact and droplet precautions in our hospital’s system was revised on 1 December 2020. Prior to this date, only the infection control and hospital hygiene department staff could add these precautions to the EMR, and precautions were added to the EMR only after a positive RT-PCR test result. Starting from 1 December 2020, a broader range of healthcare personnel, including nurses and physicians from all departments, could add contact and droplet precautions to the EMR, and precautions were taken and added to the EMR for both suspected and confirmed infections. Only contact and droplet precautions registered within 48 h of hospital admission were counted, in order to minimise the probability of including hospital-acquired infections.
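The two time windows described above (RT-PCR tests from 48 h before to 48 h after admission, keeping the first test; precaution labels within 48 h of admission) can be sketched as follows. This is a hypothetical Python illustration, not the study code. The function names are invented, and the reading of "within 48 h of hospital admission" as the 48 h following admission is an assumption, since the text does not say whether labels registered shortly before admission were counted.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)


def first_eligible_test(test_times, admission):
    """Return the earliest RT-PCR test taken between 48 h before and
    48 h after admission, or None if no test falls in that window."""
    eligible = [
        t for t in test_times
        if admission - WINDOW <= t <= admission + WINDOW
    ]
    return min(eligible) if eligible else None


def precaution_label_counts(label_time, admission):
    """Return True if a precaution label is registered within 48 h of admission.

    Assumption: the window runs from admission to 48 h after admission.
    """
    return admission <= label_time <= admission + WINDOW
```

A multiplex panel is treated as one test here, in line with the text: the function only looks at test timestamps, not at the number of viruses on the panel.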
Data collection
CTcue (IQVIA) is a clinical data collection tool that can be used to identify patients and extract data from their EMRs. We collected (structured) data for the following variables: age, date of hospital admission and date of hospital discharge, ICU admission during hospitalisation and information on our three surveillance indicators (ICD-10, RT-PCR and contact and droplet precautions), as described above. The clinical data collection tool was previously validated using Dutch EMR data and showed high accuracy [11,12]. To validate the accuracy of our data collection, we selected 2 random weeks for each surveillance indicator and checked the results against regular quality control data in our hospital. This was done to ensure that the data collection tool did not miss any admissions or relevant variables.

Statistical analyses
For each surveillance indicator, we plotted the absolute count per week during the study period and visually compared trends. Next, we plotted the incidence as a proportion of the total number of hospitalised patients for a specific week (incidence proportion). For the number of PCR tests, for example, we plotted the number of unique patients who were tested for at least one respiratory virus using an RT-PCR test, as a proportion of all newly hospitalised patients, for each week. A subanalysis including only patients who were admitted to the intensive care unit (ICU) at any point during their hospital admission was performed. In a second subanalysis, the results for RT-PCR tests and contact and droplet precautions were split by age group. We estimated the Pearson correlation coefficient between the different surveillance indicators over several time periods. Finally, we plotted the number of positive RT-PCR tests for each week in the study period. R software (version 4.3.1, R Foundation) was used to analyse the data and create the graphs.
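The weekly absolute count and incidence proportion described above can be illustrated with a small Python sketch. It is not the R code used for the analyses. Grouping by ISO week and the input format (one `(admission_date, has_indicator)` pair per included hospitalisation) are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def weekly_incidence_proportion(admissions):
    """Compute, per week, the indicator count and its share of admissions.

    admissions: iterable of (admission_date, has_indicator) pairs,
    one per included hospitalisation (hypothetical input format).
    Returns {(iso_year, iso_week): (count, proportion)}.
    """
    totals = defaultdict(int)   # all new admissions in the week
    flagged = defaultdict(int)  # admissions with the indicator present
    for admitted_on, has_indicator in admissions:
        iso = admitted_on.isocalendar()
        key = (iso[0], iso[1])  # ISO year and ISO week number
        totals[key] += 1
        if has_indicator:
            flagged[key] += 1
    return {
        key: (flagged[key], flagged[key] / totals[key])
        for key in sorted(totals)
    }
```

Under the ISO convention, 2020 has a week 53, which is consistent with the period boundaries used in this chapter; whether the original analysis used ISO weeks is not stated.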
Results

A total of 417,119 hospitalisations were registered at Leiden University Medical Center between 1 January 2017 and 30 April 2023. Of these, 299,715 re-admissions and admissions with a duration of less than 24 h were excluded. A total of 117,404 admissions were included in our analysis. The flowchart of inclusion and exclusion can be found in Supplementary Figure 1. Information on data validation can be found in the Supplement. In our study period, 11,959 RT-PCR tests for respiratory viruses, 4,683 contact and droplet precautions and 3,908 ICD-10 diagnostic codes of interest were registered. The overlap between the presence of the different surveillance indicators in the three time periods can be seen in Supplementary Figures 2, 3 and 4. There were no missing data for any of the collected variables in our analysis.
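The exclusion step reported above can be sketched in Python as follows. This is an illustration under stated assumptions, not the extraction logic used in the study: the `Stay` record and `include_stays` function are hypothetical, the 10-day readmission gap is measured from the previous discharge to the new admission, and any earlier stay of the same patient (included or not) counts as the previous hospitalisation. The constants at the end only reproduce the arithmetic of the reported totals.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Stay(NamedTuple):
    patient_id: str
    admitted: datetime
    discharged: datetime
    transferred_out: bool = False  # transferred to another hospital


def include_stays(stays):
    """Keep stays of at least 24 h (or shorter stays that ended in a transfer),
    dropping readmissions within 10 days of the previous discharge."""
    kept = []
    last_discharge = {}  # patient_id -> discharge time of the previous stay
    for s in sorted(stays, key=lambda s: (s.patient_id, s.admitted)):
        previous = last_discharge.get(s.patient_id)
        last_discharge[s.patient_id] = s.discharged
        long_enough = s.discharged - s.admitted >= timedelta(hours=24)
        if not (long_enough or s.transferred_out):
            continue
        if previous is not None and s.admitted - previous <= timedelta(days=10):
            continue
        kept.append(s)
    return kept


# Reported totals: registered minus excluded gives the included admissions.
REGISTERED = 417_119
EXCLUDED = 299_715
INCLUDED = REGISTERED - EXCLUDED  # 117,404, as reported in the text
```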
Absolute counts
On average, the absolute count in the pre-COVID-19 years (2017–2019) was lower than the absolute count during and at the end of the pandemic (see Figure 1 and Supplementary Table 1, which provides the mean count per week for the three surveillance indicators). Prior to the COVID-19 pandemic, all three surveillance indicators had relatively similar absolute counts, with the number of RT-PCR tests being slightly higher than the other two surveillance indicators, especially during the traditional influenza-like illness (ILI) season. In 2018, for example, there were on average 4.5 contact and droplet precaution labels, 10.2 ICD-10 codes and 15.1 RT-PCR tests each week (see Supplementary Table 1). From spring 2020 onwards, the number of RT-PCR tests was consistently higher than the number of the other two surveillance indicators; in 2021, for example, there were on average 34.5 contact and droplet precaution labels, 18.2 ICD-10 codes and 69.6 RT-PCR tests each week (see Supplementary Table 1). The number of contact and droplet precaution labels remained low throughout most of 2020, but increased steeply at the end of 2020 and remained higher than ICD-10 registrations in 2021, 2022 and 2023 (up to and including week 18).

[Figure 1: one panel per year, 2017 to 2023; x-axis: week number; y-axis: absolute count; series: contact and droplet precautions, ICD-10, RT-PCR.]
Figure 1. Absolute count per week for the three surveillance indicators over time, Leiden University Medical Center, Leiden, the Netherlands, 1 January 2017–30 April 2023 (n = 117,404 hospital admissions).
During this study period, 11,959 PCR tests were done, and 4,683 contact and droplet isolation labels and 3,908 ICD-10 codes were registered. The total number of registrations for the three surveillance indicators combined is 20,550 for 117,404 hospital admissions.
When showing the incidence proportion, a similar pattern can be seen (Figure 2). The RT-PCR test incidence proportion noticeably diverges from the other two surveillance indicators after the first COVID-19 cases were reported in the Netherlands in week 9 2020 [13].

Virological test results
Figure 3 shows the positive virological test results per week. After the onset of the COVID-19 pandemic in 2020, the overall number of RT-PCR tests performed increased, accompanied by a lower proportion of positive test results. The number of RT-PCR tests that had an unknown test result, e.g. because the test was lost or the analysis was stopped, was stable over time, with roughly one unknown test result each week (data not shown). We therefore only show the positive test results over time, as a proportion of the total number of tests done.

[Figure 2: one panel per year, 2017 to 2023; x-axis: week number; y-axis: incidence proportion; series: contact and droplet precautions, ICD-10, RT-PCR.]
Figure 2. Weekly incidence proportion for the three surveillance indicators over time, Leiden University Medical Center, Leiden, the Netherlands, 1 January 2017–30 April 2023 (n = 117,404 hospital admissions).
During this study period, 11,959 PCR tests were done, and 4,683 contact and droplet isolation labels and 3,908 ICD-10 codes were registered. The total number of registrations for the three surveillance indicators combined is 20,550 for 117,404 hospital admissions. Note that the y-axis ends at 0.3 instead of 1.0 to enhance visibility of differences between the different surveillance indicators.
Correlation
The correlation between the different surveillance indicators can be seen in Table 1. The total number of RT-PCR results was used for the PCR surveillance indicator (as opposed to the number of positive tests). Overall, the correlation between the different surveillance indicators was highest in the third time period, especially between contact and droplet precautions and ICD-10 registration.

Table 1. Pearson correlation coefficient for the correlation between combinations of surveillance indicators in the three different time periods, Leiden University Medical Center, Leiden, the Netherlands, 1 January 2017–30 April 2023
Surveillance indicators | Week 1/2017–week 8/2020 | Week 9/2020–week 53/2020 | Week 1/2021–week 18/2023
RT-PCR and contact and droplet precautions | 0.82 | 0.38 | 0.73
Contact and droplet precautions and ICD-10 | 0.52 | 0.66 | 0.84
RT-PCR and ICD-10 | 0.57 | 0.56 | 0.64

[Figure 3: RT-PCR tests and positive RT-PCR results; one panel per year, 2017 to 2023; x-axis: week number; y-axis: absolute count; virus categories: influenza A, influenza B, other, RSV, SARS-CoV-2.]
Figure 3. RT-PCR test results for respiratory viruses over time, Leiden University Medical Center, Leiden, the Netherlands, 1 January 2017–30 April 2023 (n = 117,404 hospital admissions).
RSV: respiratory syncytial virus; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2. Positive RT-PCR test results over time (coloured bars) and total number of RT-PCR tests done (yellow line) are shown. The total numbers of positive tests shown are as follows: influenza A: 274; influenza B: 114; other: 1,009; RSV: 317; SARS-CoV-2: 1,034. The ‘other’ group includes the following viruses: human adenovirus, bocavirus, human coronaviruses (229E, HKU1, NL63 and OC43), human metapneumovirus, parainfluenza virus 1–4 and human rhinovirus.
The number of positive tests can be higher than the total number of tests when a patient tests positive for multiple viruses.
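As a hedged illustration of how the per-period coefficients in Table 1 could be obtained, the sketch below assigns each week to one of the three study periods and computes a Pearson correlation coefficient between two weekly series. It is written in Python rather than the R used for the study, the function names are hypothetical, and it assumes that the correlation was estimated on weekly values of each indicator (the text does not state whether counts or incidence proportions were used).

```python
from math import sqrt


def study_period(iso_year, iso_week):
    """Map a (year, week) pair to study period 1, 2 or 3, or None if outside
    week 1 2017 to week 18 2023."""
    week = (iso_year, iso_week)
    if week < (2017, 1) or week > (2023, 18):
        return None
    if week <= (2020, 8):
        return 1  # pre-COVID-19
    if week <= (2020, 53):
        return 2  # first COVID-19 case up to the end of 2020
    return 3      # after the change in precaution label registration


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```

In use, the weekly values of two indicators would first be filtered to the weeks for which `study_period` returns the period of interest, and `pearson` would then be applied to the two filtered series.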