Statistical and AI methods to detect school-level outbreaks from routinely collected absence data

Research during the SARS-CoV-2 pandemic demonstrated the utility of novel data sources for understanding pathogen transmission in and between schools. In particular, school-level absence data acted as a complementary SARS-CoV-2 surveillance system, and has been used to parameterise and calibrate transmission-dynamic models of school interventions. Because school absence data is already routinely collected in the UK, it has under-utilised potential for the early detection and control of future outbreaks.

Several statistical approaches can be used to predict infectious disease cases based on time-series data from small geographical areas. Non-linear random effects models are one useful approach, as they allow spatially correlated random effects to be introduced. These correlated random effects need not be based on spatial distance alone; in the context of schools, correlations could also depend upon the connectedness of schools within the school-household network. The use of machine learning (ML) and artificial intelligence (AI) methodology in epidemiological surveillance and modelling is in its infancy and has not yet been applied to understanding school-level transmission control. Such approaches have the potential to outperform traditional statistical approaches because of their ability to capture complex non-linear relationships from high-dimensional inputs.

Early outbreak detection could transform the way we control transmission in schools. During the SARS-CoV-2 pandemic, schools were given rigid guidance at either the regional or national level – advice to schools was independent of a school’s specific epidemiological context. Approaches that identify which schools are likely to experience a large wave of infections if control measures are not implemented could allow for targeted interventions that minimise school-level transmission and disruption simultaneously. For outbreaks of other pathogens, including measles and scarlet fever, schools are currently advised to contact their local UK Health Security Agency (UKHSA) health protection team if they have ‘a higher than previously experienced and/or rapidly increasing number of absences due to the same infection’. Statistical/AI approaches may be able to detect such rapid increases earlier and more consistently than current practice.
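
As a purely illustrative sketch of what detecting a "rapidly increasing number of absences" could look like in its simplest form, the Python snippet below flags time points where a school's absence count sits well above its own recent history. This is not the project's method: the function name, the four-point window, the threshold of two standard deviations, and the example counts are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def flag_unusual_absences(counts, window=4, z_threshold=2.0):
    """Return indices of time points whose absence count is unusually high.

    A point is flagged when it exceeds the mean of the preceding `window`
    points by more than `z_threshold` sample standard deviations.
    All parameter defaults are illustrative assumptions.
    """
    flagged = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu = mean(history)
        sigma = stdev(history)
        # Use a floor of 1 absence so a flat history (sigma == 0)
        # does not make every small fluctuation look like an outbreak.
        threshold = mu + z_threshold * max(sigma, 1.0)
        if counts[t] > threshold:
            flagged.append(t)
    return flagged


# Hypothetical weekly absence counts for a single school (invented data).
weekly_absences = [5, 6, 4, 5, 6, 5, 14, 22]
print(flag_unusual_absences(weekly_absences))  # prints [6, 7]
```

A rule like this treats each school in isolation and ignores seasonality, school size and links between schools, which is where the statistical and ML/AI approaches described above might be expected to add value.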

This PhD project will aim to:

  1. develop methods capable of identifying, from routinely collected school data, which schools are in the early stages of an outbreak, using the SARS-CoV-2 pandemic as a case study;
  2. understand whether ML/AI approaches outperform ‘traditional’ statistical approaches in this task;
  3. use transmission modelling to demonstrate the efficacy of this approach in response to future infection outbreaks.

Supervisors:
Lead supervisor: Dr Trystan Leng, Lancaster Medical School, Lancaster University
Co-supervisor: Dr Sam Moore, Lancaster Medical School, Lancaster University

This PhD opportunity is being offered as part of the LSTM and Lancaster University Doctoral Training Partnership. Find out more about the studentships (https://www.lstmed.ac.uk/mrc_dtp_case) and how to apply (https://www.lstmed.ac.uk/study/research-degrees/lstm-mrc-doctoral-training-partnership/mrc-dtp-guidance-notes).

For enquiries please contact Dr Trystan Leng: t.leng@lancaster.ac.uk

Type: PhD position
Institution: Lancaster University
City: Lancaster
Country: United Kingdom
Closing date: December 7th, 2025
Posted on: November 12th, 2025 12:50
Last updated: November 12th, 2025 12:50