Commit f317808d authored by Chris Jewell

Merge remote-tracking branch 'origin/master' into testing

# Conflicts:
#	pyods/__init__.py
#	pyods/model/Model.py
#	pyods/model/Rate.py
#	pyods/model/State.py
#	pyods/model/Transition.py
#	pyods/simulator/Simulator.py
#	pyods/simulator/__init__.py
#	tests/test_basic.py
parents 6fe07fab e3fded50
image: chrism0dwk/gem-ci:latest
# Pip's cache doesn't store the python packages
# https://pip.pypa.io/en/stable/reference/pip_install/#caching
#
# If you want to also cache the installed packages, you have to install
# them in a virtualenv and cache it as well.
cache:
  paths:
    - .cache/pip
test:
  script:
    - python -V
    - pip install coverage
    - pip install graphviz
    - coverage run -m unittest
    - coverage report -m -i
    - coverage html -i -d coverage
  artifacts:
    paths:
      - coverage/
  except:
    - pylint
pylint:
  script:
    - pip install pylint
    - pip install anybadge
    - pylint --output-format=parseable gem > pylint_report.txt || true
    - score=$(sed -n 's/^Your code has been rated at \([-0-9.]*\)\/.*/\1/p' pylint_report.txt)
    - anybadge --value=$score --file=pylint_badge.svg pylint
  artifacts:
    paths:
      - pylint_report.txt
      - pylint_badge.svg
pages:
  script:
    - curl -X POST -d "branches=master" -d "token=$RTD_TOKEN" https://readthedocs.org/api/v2/webhook/gem/98961/
  only:
    - master
<component name="CopyrightManager">
<settings default="MIT" />
</component>
\ No newline at end of file
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
# Required
version: 2
# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: doc/source/conf.py
# Optionally build your docs in additional formats such as PDF and ePub
formats: all
# Optionally set the version of Python and requirements required to build your docs
python:
  version: 3.7
  install:
    - requirements: doc/requirements.txt
\ No newline at end of file
FROM krallin/centos-tini:latest
RUN yum update -y
ENV PATH /opt/conda/bin:$PATH
RUN yum install -y wget bzip2 make
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/anaconda.sh && \
/bin/bash ~/anaconda.sh -b -p /opt/conda && \
rm ~/anaconda.sh && \
ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
echo "conda activate base" >> ~/.bashrc
RUN conda install matplotlib
RUN conda install tensorflow
RUN conda install -c conda-forge tensorflow-probability
RUN conda install coverage
RUN pip install lark-parser
ENTRYPOINT [ "/usr/bin/tini", "--" ]
CMD [ "/bin/bash" ]
# GEM: a domain-specific language for epidemic modelling
GEM is a toolkit for epidemic analysis. It is a probabilistic
programming language that allows users to define an epidemic
process in a clear, repeatable language, and embed that process
into a higher-order probabilistic model. It provides for both simulation
and inference processes.
GEM documentation: [https://gem.readthedocs.io](https://gem.readthedocs.io)
## Quickstart
To install GEM:
```bash
$ pip install git+http://fhm-chicas-code.lancs.ac.uk/GEM/gem.git
```
Example usage:
```python
from gem import GEM
from gem.plotting import plot_timeseries, traceplot
prog = """
beta ~ Gamma(2, 10)
gamma ~ Gamma(1, 10)
Epidemic MyEpidemic() {
S = State(init=999)
I = State(init=1)
R = State(init=0)
[S -> I] = beta * I / 1000
[I -> R] = gamma
}
epi ~ MyEpidemic()
"""
model = GEM(prog)
# Simulate
sim = model.sample(1, condition_vars={'beta': [0.4], 'gamma': [0.14]})
plot_timeseries(model.random_variables['epi'], sim['epi'])
# Inference
posterior, accept = model.fit(observed={'epi': sim['epi']}, n_samples=5000,
init=[0.001, 0.001], burnin=2000)
traceplot(posterior)
```
## Status
GEM is currently very much in alpha testing, with the interface likely to change without warning. So far, the software is able to perform inference on epidemic models with known transition times. It can also perform inference on a range of non-epidemic probabilistic models.
Watch this space in the coming year for developments in GEM, specifically around partially-observed epidemic processes and hidden-Markov models.
## Get involved
If you like the idea of GEM, get involved! Clone the Git repo, and/or contact the project leader, Chris Jewell <c.jewell@lancaster.ac.uk>.
## Acknowledgements
I'd very much like to thank the [University of Lancaster](http://www.lancaster.ac.uk) for providing me with a base from which to develop GEM, [CHICAS](http://chicas.lancaster-university.uk/), my group at Lancaster, and most of all [The Wellcome Trust](http://www.wellcome.ac.uk) for providing [funding](https://wellcome.ac.uk/funding/people-and-projects/grants-awarded/gem-translational-software-outbreak-analysis) to take GEM to the next level.
\ No newline at end of file
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
\ No newline at end of file
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
:end
popd
sphinx>2
sphinx_rtd_theme>=0.3.1
numpy
tensorflow>=1.14
tensorflow-probability>=0.7.0
lark-parser
graphviz
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
import sphinx_rtd_theme
sys.path.insert(0, os.path.abspath('../..'))
# -- Project information -----------------------------------------------------
project = 'GEM'
copyright = '2019, Chris Jewell'
author = 'Chris Jewell'
# The full version, including alpha/beta/rc tags
release = '0.1alpha'
# -- General configuration ---------------------------------------------------
master_doc = 'index'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
'sphinx.ext.inheritance_diagram',
'sphinx_rtd_theme',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
pygments_style = 'sphinx'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = []
nitpicky = True
.. _gemlang_doc:
gemlang: The GEM modelling language
===================================
The GEM modelling language, **gemlang**, is defined using
`extended Backus-Naur Form <https://en.wikipedia.org/wiki/Extended_BackusNaur_form>`_ (eBNF) notation and parsed
using the Python parser generator package `Lark <https://github.com/lark-parser/lark>`_. The basic architecture of the
GEM parsing framework closely follows the source-to-source translation ideas outlined in [Par2010]_ and shown
in :ref:`Figure 1 <parsechain>`.
.. _parsechain:
.. figure:: parsechain.svg
:scale: 60%
:alt: GEM parse chain
Figure 1: The GEM parse chain showing the main stages of the source-to-source translation pipeline.
Lexical and Syntactic Analysis
------------------------------
The GEM program syntax is defined using an eBNF-like grammar, which can be read by the
`Lark <https://github.com/lark-parser/lark>`_ parsing engine. eBNF is a notation that describes a language's syntax in
terms of *production rules*, which may be arranged in a hierarchy.
Given a gemlang program, lexical analysis and the first stage of syntactic analysis are performed by
`Lark <https://github.com/lark-parser/lark>`_. On invocation, Lark reads the gemlang eBNF grammar definition
in `gem/gemlang/gem_grammar.cfgr`, and lexes and parses an input gemlang program to a *parse tree* as described in
the Lark documentation -- one tree node per production rule in the grammar.
Parsing is done using Lark's implementation of the `Earley <https://en.wikipedia.org/wiki/Earley_parser>`_ algorithm,
providing a robust and powerful method of parsing against the gemlang grammar.
In the second stage of syntactic analysis, the parse tree is then *transformed* into an
Abstract Syntax Tree (AST) representing the GEM program, with objects of (base) type
:class:`ASTNode <gem.gemlang.ast.ast_base.ASTNode>` representing nodes
in the tree. The reason for doing this is that the parse tree is entirely homogeneous, using Lark's built in `Tree`
class to represent nodes. For purposes of semantic analysis, we find it easier to work with specialisations of
:class:`ASTNode <gem.gemlang.ast.ast_base.ASTNode>` (:class:`Number <gem.gemlang.ast.ast_expression.Number>`,
:class:`MulExpr <gem.gemlang.ast.ast_expression.MulExpr>`, :class:`Call <gem.gemlang.ast.ast_expression.Call>`, etc.) so that
we can use Python's type system to identify operations and objects
represented within the AST. This has the important advantage of decoupling parse tree generation from subsequent
semantic analysis, leading to a more modularised software architecture.
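The idea of converting a homogeneous parse tree into typed nodes can be sketched as follows. This is a minimal illustration only: the `(rule, children)` tuples stand in for Lark's `Tree` objects, and the `ASTNode`, `Number`, and `MulExpr` classes here are simplified, hypothetical versions that merely share names with the real classes in :mod:`gem.gemlang.ast`.

```python
from dataclasses import dataclass


@dataclass
class ASTNode:
    """Illustrative base class; not the real gem.gemlang.ast.ast_base.ASTNode."""


@dataclass
class Number(ASTNode):
    value: float


@dataclass
class MulExpr(ASTNode):
    lhs: ASTNode
    rhs: ASTNode


def to_ast(node):
    """Convert a homogeneous (rule, children) parse-tree node into a typed AST node."""
    rule, children = node
    if rule == "number":
        return Number(float(children[0]))
    if rule == "mul_expr":
        return MulExpr(to_ast(children[0]), to_ast(children[1]))
    raise NotImplementedError(f"No AST class for production rule '{rule}'")


# Hypothetical parse tree for the expression `2 * 3.5`
parse_tree = ("mul_expr", [("number", ["2"]), ("number", ["3.5"])])
ast = to_ast(parse_tree)

# Later stages can now dispatch on Python types rather than on rule-name strings.
is_product = isinstance(ast, MulExpr)
```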
Where in the source code does this happen?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Lark is invoked via the :class:`GEM <gem.interface.GEM>` class, which in turn calls the
:func:`gemparse <gem.gemlang.parse_gemlang.gemparse>` function.
:func:`gemparse <gem.gemlang.parse_gemlang.gemparse>` returns an AST, a branching tree composed of objects of type
:class:`ASTNode <gem.gemlang.ast.ast_base.ASTNode>`. :class:`ASTNode <gem.gemlang.ast.ast_base.ASTNode>` is specialised
into subclasses representing language features and concepts within gemlang, as
specified in the type hierarchy in :mod:`gem.gemlang.ast`.
Importantly, the transformation of the Lark parse tree to AST is implemented by
:class:`GEMParser <gem.gemlang.parse_gemlang.GEMParser>`.
A unit test (`tests/unit/test_parse_completeness`) scans the GEM grammar for production rules, and makes sure a
complementary method exists in GEMParser. In addition, :class:`GEMParser <gem.gemlang.parse_gemlang.GEMParser>` will
throw an exception if a production rule is invoked in the grammar for which no method is implemented.
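A completeness check of this kind might look like the sketch below. It is not the actual test in the GEM repository: the toy grammar, the `ToyParser` class, and the helper names are all hypothetical, and the regular expression assumes the Lark convention that rule names are lower-case while terminals are upper-case (rule modifiers such as `?` or `!` are not handled).

```python
import re

# Toy grammar in a Lark-like style, for illustration only.
GRAMMAR = """
start: statement+
statement: assign_expr | stochastic_expr
assign_expr: NAME "=" expr
stochastic_expr: NAME "~" expr
expr: NUMBER
NAME: /[a-z]+/
"""


class ToyParser:
    """Hypothetical transformer with one method per production rule."""

    def start(self, children): ...
    def statement(self, children): ...
    def assign_expr(self, children): ...
    def expr(self, children): ...
    # Note: no method for `stochastic_expr`.


def production_rules(grammar):
    """Return the lower-case rule names defined in the grammar text."""
    return re.findall(r"^([a-z_][a-z0-9_]*)\s*:", grammar, flags=re.MULTILINE)


def missing_methods(grammar, parser_cls):
    """List production rules that have no callable of the same name on parser_cls."""
    return [rule for rule in production_rules(grammar)
            if not callable(getattr(parser_cls, rule, None))]


missing = missing_methods(GRAMMAR, ToyParser)
```

A unit test would then simply assert that the returned list is empty.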
Abstract syntax tree
^^^^^^^^^^^^^^^^^^^^
The `AST <https://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ is a tree representation of a GEM program. Nodes within
the tree are objects of subclasses of :class:`ASTNode <gem.gemlang.ast.ast_base.ASTNode>` representing statements, atoms, and expressions within gemlang. The
tree may be traversed depth-first or breadth-first using specialisations of the
:class:`ASTWalker <gem.gemlang.ast_walker.ASTWalker>`
class (see developer API documentation for the :mod:`gem.gemlang.ast` module).
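For readers unfamiliar with the two traversal orders, the following sketch shows both on a small tree. The `Node` class and the two generator functions are hypothetical and do not reflect the actual :class:`ASTWalker <gem.gemlang.ast_walker.ASTWalker>` API.

```python
from collections import deque


class Node:
    """Hypothetical AST node holding a label and a list of child nodes."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)


def walk_depth_first(node):
    """Yield nodes in pre-order: a node first, then each subtree in turn."""
    yield node
    for child in node.children:
        yield from walk_depth_first(child)


def walk_breadth_first(root):
    """Yield nodes level by level using a FIFO queue."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


# Toy tree for the expression `beta * I + 1`
tree = Node("+", Node("*", Node("beta"), Node("I")), Node("1"))
dfs_order = [n.label for n in walk_depth_first(tree)]
bfs_order = [n.label for n in walk_breadth_first(tree)]
```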
Semantic Analysis
-----------------
The topic of general semantic analysis is broad, and the reader is encouraged to read at least [Par2010]_. This
documentation will describe the semantic analysis steps currently performed in GEM. As the GEM project develops, we
expect to add more steps in semantic analysis so the following description should be regarded as non-exhaustive!
To illustrate our description of semantic analysis, it is useful to consider a GEM code fragment:
.. code-block::
:linenos:
:name: gemprog
:caption: Example GEM program implementing an SI model
mu = 0.0
beta ~ Normal(mu, 1.0)
Epidemic SIModel() {
S = State(init=999)
I = State(init=1)
[S -> I] = beta * I
}
epi ~ SIModel()
Semantic analysis currently consists of 3 stages which are executed sequentially:
1. Symbol Declaration (:class:`ParseDeclarations <gem.gemlang.symbol_resolve.ParseDeclarations>`)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this stage, a symbol table is built containing builtin GEM symbols as well as symbols declared in the GEM program.
Since gemlang is implicitly typed, symbols representing variables are declared when they are first assigned.
In :ref:`Example 1 <gemprog>`, symbols `mu`, `beta`, `S`, `I`, and `epi` are pushed into the symbol table on
their first assignment (lines 1, 2, 4, 5, 8 respectively).
In GEM, symbols are represented by the :class:`Symbol <gem.gemlang.symbol.Symbol>` class hierarchy. Symbol tables are
represented by the :class:`Scope <gem.gemlang.symbol.Scope>` class hierarchy, which is essentially a wrapper around a
Python `dict` object storing *name: symbol* pairs. Symbols can also have scopes themselves, representing declarations
such as that for `SIModel` in the :ref:`example <gemprog>` code. The symbol hierarchy for
:ref:`Example 1 <gemprog>` is shown in :ref:`Figure 2 <symtab>`.
Notes
`````
1. During symbol declaration, each symbol (:class:`IdRef <gem.gemlang.ast.ast_expression.IdRef>`) is annotated with a reference to the AST node representing the declaration.
2. Since gemlang is declarative, symbols may not be declared (or even assigned to) more than once. An exception is
raised if duplicate declarations are encountered.
3. Scopes are pushed onto a stack, starting with the global scope. When a new (child) :class:`ScopedSymbol <gem.gemlang.symbol.ScopedSymbol>`
is encountered, it is pushed onto the stack. When leaving the :class:`ScopedSymbol <gem.gemlang.symbol.ScopedSymbol>`,
it is popped off the stack, returning to the parent scope. Developers are referred to [Par2010]_ for further reading.
.. _symtab:
.. figure:: symtab.svg
:scale: 10%
:alt: Example symbol table
Figure 2: The symbol table for the :ref:`example <gemprog>` GEM program.
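The scope stack described in the notes above can be sketched as follows. This is a simplified illustration, assuming a dict-backed table; the `Scope` class, the `DuplicateSymbolError` exception, and the string placeholders used for symbols are hypothetical and are not the classes in :mod:`gem.gemlang.symbol`.

```python
class DuplicateSymbolError(Exception):
    """Hypothetical exception type for a repeated declaration."""


class Scope:
    """Illustrative symbol table: a dict of name -> symbol plus a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}

    def declare(self, name, symbol):
        if name in self.symbols:
            raise DuplicateSymbolError(
                f"'{name}' is already declared in scope '{self.name}'")
        self.symbols[name] = symbol


scope_stack = [Scope("global")]


def declare(name, symbol):
    """Declare a symbol in whichever scope is currently on top of the stack."""
    scope_stack[-1].declare(name, symbol)


# Declarations in the order they appear in the example SI model program.
declare("mu", "variable")
declare("beta", "variable")

epidemic = Scope("SIModel", parent=scope_stack[-1])
declare("SIModel", epidemic)   # the scoped symbol lives in the global scope
scope_stack.append(epidemic)   # entering the Epidemic block
declare("S", "state")
declare("I", "state")
scope_stack.pop()              # leaving the block returns to the parent scope

declare("epi", "variable")
```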
2. Symbol Resolution
^^^^^^^^^^^^^^^^^^^^
At the symbol resolution stage, the AST is traversed and each encountered symbol
is looked up in the current scope's symbol table. If the symbol
exists, the symbol node (:class:`IdRef <gem.gemlang.ast.ast_expression.IdRef>`) is annotated with a reference to the
symbol in the symbol table. If the symbol is not found, a syntax error exception is raised notifying the user that
an undefined variable exists in the code.
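A lookup of this kind can be sketched as below. The `Scope` class is hypothetical, and the sketch assumes that a name missing from the current scope is searched for in enclosing scopes before an error is raised (as would be needed for `beta` inside `SIModel` in the example); the real implementation may differ.

```python
class Scope:
    """Illustrative symbol table with a link to its enclosing scope."""

    def __init__(self, parent=None, **symbols):
        self.parent = parent
        self.symbols = dict(symbols)

    def resolve(self, name):
        """Search this scope, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise SyntaxError(f"Undefined variable '{name}'")


# String placeholders stand in for real symbol objects.
global_scope = Scope(mu="mu-symbol", beta="beta-symbol")
epidemic_scope = Scope(parent=global_scope, S="S-symbol", I="I-symbol")

local_hit = epidemic_scope.resolve("I")         # found in the current scope
enclosing_hit = epidemic_scope.resolve("beta")  # found in the enclosing scope
```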
3. Data Injection
^^^^^^^^^^^^^^^^^
The GEM language allows the user to inject static (i.e. constant) data into a model at compile time. This is done
much like the concept of placeholders in Tensorflow. For example, in the linear model defined in :ref:`Example 2 <data-injection>`,
covariate data is defined as a vector placeholder, with data passed in when the GEM program is compiled.
.. code-block::
:linenos:
:caption: Example of data injection
:name: data-injection
prog = """
X = Vector()
alpha ~ Normal(0, 1000)
beta ~ Normal(0, 1000)
sigma ~ Gamma(2, 0.1)
y ~ Normal(alpha + beta * X, sigma)
"""
model = GEM(prog, const_data={'X': x_numpy})
Data injection is performed by the :class:`DataInjector <gem.gemlang.inject_data.InjectData>` class. The algorithm
traverses the AST performing four actions:
1. Assignments that declare placeholder variables are replaced in the AST with a
:class:`NullNode <gem.gemlang.ast.ast_base.NullNode>` so that no output code is generated.
This is done because the data already exists in the user's host language environment.
2. Declarations of random variables with symbol names matching data object names in the user's
`const_data` dictionary are re-written as static data objects.
3. Data structures in the user's `const_data` dictionary are converted into the maths layer's required
format (see :func:`convert_to_maths_layer <gem.gemlang.tf_output.convert_to_maths_layer>`).
4. References to each constant data object are written into the AST in locations corresponding
to the global scope.
The resulting AST contains :class:`AssignExpr <gem.gemlang.ast.ast_statement.AssignExpr>` nodes for each
piece of data injected *in the global scope*.
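A much-simplified sketch of actions 1, 2, and 4 is given below; action 3 (conversion to the maths layer's format) is omitted. Everything here is hypothetical: the program is a flat list of `(kind, name, expr)` tuples rather than a real AST, `NULL_NODE` stands in for :class:`NullNode <gem.gemlang.ast.ast_base.NullNode>`, and placing the data references at the start of the program is an assumption of the sketch.

```python
NULL_NODE = object()  # stand-in for NullNode: a node that emits no output code


def inject_data(statements, const_data):
    """Toy data injection over a flat list of (kind, name, expr) statements."""
    rewritten = []
    for kind, name, expr in statements:
        if kind == "assign" and expr in ("Vector()", "Matrix()"):
            # Action 1: placeholder declarations generate no output code.
            rewritten.append(NULL_NODE)
        elif kind == "stochastic" and name in const_data:
            # Action 2: a random variable with supplied data becomes static data.
            rewritten.append(("data", name, expr))
        else:
            rewritten.append((kind, name, expr))
    # Action 4: reference each constant data object in the global scope.
    references = [("assign", name, f"const_data[{name!r}]") for name in const_data]
    return references + rewritten


program = [
    ("assign", "X", "Vector()"),
    ("stochastic", "alpha", "Normal(0, 1000)"),
    ("stochastic", "y", "Normal(alpha + beta * X, sigma)"),
]
result = inject_data(program, {"X": [0.1, 0.2, 0.3]})
```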
Code Generation
---------------
Code generation is performed by walking the AST and building a hierarchical data structure of objects of type
:class:`Outputter <gem.gemlang.tf_output.Outputter>` defined in the :mod:`gem.gemlang.tf_output` module. The
:mod:`gem.gemlang.tf_output` module has one class (derived from :class:`Outputter <gem.gemlang.tf_output.Outputter>`)
per GEM language feature, which then generates the target Python/Tensorflow code.
Tree walking is performed by the :class:`CodeGenerator <gem.gemlang.model_generator.CodeGenerator>` class, which
has a method for each node in the AST which assembles the appropriate collection of :class:`Outputter <gem.gemlang.tf_output.Outputter>`
objects. In this sense, **source-to-source translation is intended to be performed by the
:class:`CodeGenerator <gem.gemlang.model_generator.CodeGenerator>` class** and not the :class:`Outputter <gem.gemlang.tf_output.Outputter>`
objects.
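The division of labour between the tree walker and the outputter objects can be sketched as follows. This is an illustration under stated assumptions, not GEM's implementation: the AST is a nest of `(kind, *args)` tuples, and the `ToyCodeGenerator` class, the `*Out` classes, and their `render()` method are all hypothetical names.

```python
class NumberOut:
    """Hypothetical outputter for a numeric literal."""

    def __init__(self, value):
        self.value = value

    def render(self):
        return repr(self.value)


class NameOut:
    """Hypothetical outputter for a variable reference."""

    def __init__(self, name):
        self.name = name

    def render(self):
        return self.name


class MulOut:
    """Hypothetical outputter for a product of two sub-expressions."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

    def render(self):
        return f"({self.lhs.render()} * {self.rhs.render()})"


class AssignOut:
    """Hypothetical outputter for an assignment statement."""

    def __init__(self, target, value):
        self.target, self.value = target, value

    def render(self):
        return f"{self.target} = {self.value.render()}"


class ToyCodeGenerator:
    """Walks a toy AST, assembling outputters via one visit method per node kind."""

    def visit(self, node):
        kind, *args = node
        method = getattr(self, f"visit_{kind}", None)
        if method is None:
            raise NotImplementedError(f"No visit method for node kind '{kind}'")
        return method(*args)

    def visit_num(self, value):
        return NumberOut(value)

    def visit_id(self, name):
        return NameOut(name)

    def visit_mul(self, lhs, rhs):
        return MulOut(self.visit(lhs), self.visit(rhs))

    def visit_assign(self, target, value):
        return AssignOut(target, self.visit(value))


# Toy AST for the statement `rate = beta * 2.0`
toy_ast = ("assign", "rate", ("mul", ("id", "beta"), ("num", 2.0)))
outputter = ToyCodeGenerator().visit(toy_ast)
generated_code = outputter.render()
```

In this sketch the walker decides *which* outputters to build and how they nest, while each outputter only knows how to print itself.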
Note that code generation is not yet a perfect beast -- significant streamlining, possibly with more levels of abstraction,
is expected in future.
.. [Par2010] Parr, T. Language Implementation Patterns. The Pragmatic Programmers LLC. 2010.
\ No newline at end of file
GEM Developer Documentation
===========================
GEM is a domain-specific modelling language for epidemics, providing a bridge between applied epidemic modelling and
computational and statistical methodology. The package is written in Python, and is based conceptually around 4 layers:
1. The GEM model description language, *gemlang*;
2. Maths functions encoding the probabilistic GEM model;
3. A back-end extensible algorithms layer for simulation and inference;
4. A scalable and hardware-independent maths layer, currently `Tensorflow <https://www.tensorflow.org>`_
GEM provides both a user interface and a developer interface, allowing use by epidemics scientists analysing outbreaks
as well as methods researchers (statisticians, computer scientists, etc) to add new functionality to the package.
Essentially, GEM is a source-to-source translator, taking a **gemlang** input specification and translating it to a
Python representation. All translation and launching of a given model is conducted within a parent language runtime
interface, initially Python. Future plans are in place to develop a thin R wrapper around the Python interface.
This guide is intended to allow developers to understand the GEM architecture enough to be able to extend, maintain, and
develop the package.
.. toctree::
:maxdepth: 1
:caption: GEM software components
gemlang
maths
algorithms
interface
The GEM interface
=================
The GEM model description language is intended to be a standalone, clean, and concise way of describing a model. GEM
programs are intended to be compiled within a host data analysis environment (DAE). DAEs are many and varied, but GEM
specifically targets Python and R. A user's workflow is based around the DAE, so for example in Python it might be
along the lines of :ref:`Example 1 <dae-example>`.
.. code-block:: Python
:linenos:
:name: dae-example
:caption: Example of how GEM is used within Python as a data analysis environment.
from gem import GEM
import numpy as np
k_matrix = np.load("contact_matrix.npy") # Contact matrix loaded from disk
gem_prog = """
K = Matrix()
beta ~ Gamma(1, 1)
gamma ~ Gamma(1, 1)
I0 = Multinomial(1000, 1) # Draw initial state vector for I
Epidemic SIR() {
S = State(init=1-I0)
I = State(init=I0)
R = State(init=Zeros_like(I0))
[S -> I] = beta * K @ I
[I -> R] = gamma
}
epi ~ SIR()
"""
model = GEM(gem_prog, data={'K': k_matrix})
**Question**: Why is GEM a standalone language that is compiled within a different language?
**Answer**: We choose this pattern for a number of reasons:
1. A separate DSML enables a very clean, succinct language orientated towards model description. Thus the language
doesn't have to deal with the complexities of I/O, explicit iteration, memory management, etc. As such, GEM is
deliberately not intended to be Turing-complete -- GEM needs to provide a complete description of an epidemic model; it
does *not* need to be able to make your morning coffee!
2. Embedded probabilistic programming languages, such as `PyMC3 <https://docs.pymc3.io>`_, suffer from a lack of clarity
as host-language (in this case Python) constructs obscure the salient features of the probability model. In GEM, we wish
to avoid this!
3. The most compelling reason for a separate DSML is that it can then be used in a variety of DAEs without requiring
a large amount of re-coding for each new host language. GEM is developed in Python, which provides a natural fit for
using Python as a DAE. A thin wrapper around the Python interface functions provides a fast and simple way to
interact with GEM from other languages, notably R. This pattern is successfully used by several other probabilistic
programming languages, notably `STAN <https://mc-stan.org>`_ and `OpenBUGS <http://openbugs.net>`_.
GEM compiler interface
======================
The GEM compiler interface is responsible for reading a GEM program string and returning an object of type
:class:`GEM <gem.interface.GEM>`.
:class:`GEM <gem.interface.GEM>` is responsible for parsing, semantics checking, and code generation as
described in the :ref:`gemlang <gemlang_doc>` documentation. Whilst source-to-source translation of gemlang to
Python/Tensorflow represents the main task of the interface, the generated code also needs to be executed according to
the following process:
1. The generated Python/Tensorflow code defines a function `model_impl()` which contains a representation of the model
using the (embedded) `Edward2 <https://www.tensorflow.org/probability>`_ probabilistic programming language. The
function returns a list of all defined variables within the GEM model in its global scope.
2. The generated code is run dynamically (using Python `exec()`) inside the :class:`GEM() <gem.interface.GEM>` instance,
which monkey-patches it with the `model_impl()` method.
The resulting model object of type :class:`GEM() <gem.interface.GEM>` may then be used to print out the generated
Python code via the `GEM.pyprog` attribute, or even access the Edward2 model implementation directly via the `GEM.model_impl()`
method.
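The run-then-patch pattern described above can be sketched with the standard library alone. This is a hedged illustration: `ToyModel` is a hypothetical stand-in for :class:`GEM <gem.interface.GEM>`, the generated source is hand-written rather than produced by the code generator, and whether the real `model_impl()` takes `self` is an assumption.

```python
import types


class ToyModel:
    """Hypothetical stand-in for the GEM class: runs generated source and
    attaches the resulting function to the instance."""

    def __init__(self, pyprog):
        self.pyprog = pyprog  # keep the generated source for inspection
        namespace = {}
        # exec() should only ever be used on trusted, internally generated code.
        exec(pyprog, namespace)
        # Bind the generated function to this instance as a method.
        self.model_impl = types.MethodType(namespace["model_impl"], self)


# Hand-written stand-in for generated code.
generated_source = '''
def model_impl(self):
    mu = 0.0
    beta = mu + 1.0
    return [mu, beta]
'''

model = ToyModel(generated_source)
variables = model.model_impl()
```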
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="233.44009mm"
height="60.374226mm"
viewBox="0 0 233.4401 60.374226"
version="1.1"
id="svg1478"
inkscape:version="0.92.3 (2405546, 2018-03-11)"
sodipodi:docname="symtab.svg">
<defs
id="defs1472">
<marker
inkscape:stockid="Arrow2Lend"
orient="auto"
refY="0"
refX="0"