4. Next generation matrix: `covid.tasks.next_generation_matrix` computes the posterior next generation matrix for the epidemic, from which measures of Local Authority District level and National-level reproduction number can be derived. This posterior is saved in `<output_dir>/ngm.pkl`.

5. National Rt: `covid.tasks.overall_rt` evaluates the dominant eigenvalue of each next generation matrix sample using power iteration and the Rayleigh quotient method. The dominant eigenvalue of the inter-LAD next generation matrix gives the national reproduction number estimate.

6. Prediction: `covid.tasks.predict` calculates the Bayesian predictive distribution of the epidemic given the observed data and joint posterior distribution. This is used in two ways:

- in-sample predictions are made for the latest 7 and 14 day time intervals in the observed data time window. These are saved as `xarray` data structures in `<output_dir>/insample7.pkl` and `<output_dir>/insample14.pkl`.

- medium-term predictions are made by simulating forward 56 days from the day after the last day of the observed data time window. These are saved as an `xarray` data structure in `<output_dir>/medium_term.pkl`.

7. Summary output:

- LAD-level reproduction number: `covid.tasks.summarize.rt` takes the column sums of the next generation matrix as the LAD-level reproduction number. This is saved in `<output_dir>/rt_summary.csv`.

- Incidence summary: `covid.tasks.summarize.infec_incidence` calculates mean and quantile information for the medium term prediction, `<output_dir>/infec_incidence_summary.csv`.

- Prevalence summary: `covid.tasks.summarize.prevalence` calculates the predicted prevalence of COVID-19 infection (model E+I compartments) at LAD level, `<output_dir>/prevalence_summary.csv`.

- Population attributable risk fraction for infection: `covid.tasks.within_between` calculates the population attributable fraction of within-LAD versus between-LAD infection risk, `<output_dir>/within_between_summary.csv`.

- Case exceedance: `covid.tasks.case_exceedance` calculates the probability that observed cases in the last 7 and 14 days of the observed timeseries exceed the predictive distribution. This highlights regions that are behaving atypically given the model, `<output_dir>/exceedance_summary.csv`.

8. In-sample predictive plots: `covid.tasks.insample_predictive_timeseries` plots graphs of the in-sample predictive distribution for the last 7 and 14 days within the observed data time window, `<output_dir>/insample_plots7` and `<output_dir>/insample_plots14`.

9. Geopackage summary: `covid.tasks.summary_geopackage` assembles summary information into a `geopackage` GIS file, `<output_dir>/prediction.pkg`.

10. Long format summary: `covid.tasks.summary_longformat` assembles reproduction number, observed data, in-sample, and medium-term predictive incidence and prevalence (per 100000 people) into a long-format XLSX file.
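Three of the summaries described in the steps above are simple enough to sketch in a few lines: the LAD-level reproduction number (column sums of the next generation matrix), the national reproduction number (dominant eigenvalue by power iteration with a Rayleigh quotient), and a case exceedance probability. The following is a minimal illustration only, not the code in `covid.tasks`. The function names are hypothetical, plain Python lists stand in for the package's own data structures, and the matrix orientation (entry `[i][j]` is the expected number of infections in LAD `i` caused by one infection in LAD `j`) is an assumption inferred from the column-sum description. The exceedance function assumes the probability is estimated as the fraction of predictive draws that fall below the observed count.

```python
import math


def lad_rt(ngm):
    """LAD-level reproduction numbers, taken as the column sums of the NGM."""
    n = len(ngm)
    return [sum(ngm[i][j] for i in range(n)) for j in range(n)]


def national_rt(ngm, tol=1e-10, max_iter=1000):
    """Dominant eigenvalue of the NGM via power iteration and a Rayleigh quotient."""
    n = len(ngm)
    v = [1.0 / math.sqrt(n)] * n  # unit-length starting vector
    eigenvalue = 0.0
    for _ in range(max_iter):
        # w = K v
        w = [sum(ngm[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Rayleigh quotient v'Kv / v'v; v has unit length, so the denominator is 1
        new_eigenvalue = sum(v[i] * w[i] for i in range(n))
        # Renormalise so the vector neither overflows nor vanishes
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue
        eigenvalue = new_eigenvalue
    return eigenvalue


def exceedance_probability(observed, predictive_samples):
    """Fraction of predictive draws that the observed count exceeds (assumed estimator)."""
    below = sum(1 for s in predictive_samples if s < observed)
    return below / len(predictive_samples)


# Toy two-LAD matrix and toy predictive draws, made up for illustration
ngm = [[1.2, 0.3], [0.2, 0.8]]
print(lad_rt(ngm))
print(national_rt(ngm))
print(exceedance_probability(12, [8, 10, 11, 13, 15]))
```

In a posterior analysis these functions would be applied to each next generation matrix sample in turn, giving a distribution of reproduction numbers rather than a single value.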

This repository contains code that produces Monte Carlo samples of the Bayesian posterior distribution given the model and case timeseries data from [coronavirus.data.gov.uk](https://coronavirus.data.gov.uk), implementing an ETL step, the model itself, and associated inference and prediction steps.

Users requiring an end-to-end pipeline implementation should refer to the [covid-pipeline](https://github.com/chrism0dwk/covid-pipeline) repository.

## COVID-19 Lancaster University data statement

...

...

UTLA: Upper Tier Local Authority

LAD: Local Authority District

### Files

* `covid` Python package

* `example_config.yaml` example configuration file containing data paths and MCMC settings

* `data` a directory containing example data (see below)

* `pyproject.toml` a PEP518-compliant file describing the `poetry` build system and dependencies.

## Example data files

* `data/c2019modagepop.csv` a file containing local authority population data in the UK, taken from the ONS prediction for December 2019. Local authorities [City of Westminster, City of London] and [Cornwall, Isles of Scilly] have been aggregated to meet commute data processing requirements.

* `data/mergedflows.csv` inter-local-authority mobility matrix taken from UK Census 2011 commuting data and aggregated up from Middle Super Output Area level (respecting aggregated LADs as above).
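The aggregation described for the mobility matrix can be pictured as summing origin-destination counts after mapping each Middle Super Output Area (MSOA) to its LAD, with merged LADs folded into a single unit. The sketch below is an illustration under stated assumptions, not the procedure used to build `data/mergedflows.csv`: the lookup tables, the area codes, and the function name are all hypothetical placeholders, and the real file's column layout is not assumed.

```python
from collections import defaultdict

# Hypothetical lookups for illustration only; these are placeholder codes, not ONS codes.
MSOA_TO_LAD = {
    "msoa_a": "lad_1",
    "msoa_b": "lad_1",
    "msoa_c": "lad_2",
    "msoa_d": "lad_3",
}
# Hypothetical merge rule: fold lad_3 into lad_2, as is done for paired authorities.
LAD_MERGES = {"lad_3": "lad_2"}


def aggregate_flows(msoa_flows):
    """Sum origin-destination counts from MSOA pairs up to (merged) LAD pairs."""

    def to_lad(msoa):
        lad = MSOA_TO_LAD[msoa]
        return LAD_MERGES.get(lad, lad)

    lad_flows = defaultdict(int)
    for (origin, destination), count in msoa_flows.items():
        lad_flows[(to_lad(origin), to_lad(destination))] += count
    return dict(lad_flows)


# Toy commuting counts keyed by (origin MSOA, destination MSOA)
flows = {
    ("msoa_a", "msoa_c"): 5,
    ("msoa_b", "msoa_d"): 7,
    ("msoa_a", "msoa_b"): 2,
}
print(aggregate_flows(flows))
```

Applying the merge inside the lookup keeps the flow matrix aligned with the aggregated population rows, so both files index the same set of spatial units.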