Commit a03dcde5 authored by Chris Jewell

More doc!

parent f4f7140f
.. _gemlang_doc:

gemlang: The GEM modelling language
===================================

Code Generation
---------------
Code generation is performed by walking the AST and building a hierarchical data structure of objects of type
:class:`Outputter <gem.gemlang.tf_output.Outputter>`, defined in the :mod:`gem.gemlang.tf_output` module. The
:mod:`gem.gemlang.tf_output` module has one class (derived from :class:`Outputter <gem.gemlang.tf_output.Outputter>`)
per GEM language feature, which generates the target Python/TensorFlow code.

Tree walking is performed by the :class:`CodeGenerator <gem.gemlang.model_generator.CodeGenerator>` class, which
has a method for each type of AST node that assembles the appropriate collection of
:class:`Outputter <gem.gemlang.tf_output.Outputter>` objects. In this sense, **source-to-source translation is intended
to be performed by the** :class:`CodeGenerator <gem.gemlang.model_generator.CodeGenerator>` **class** and not by the
:class:`Outputter <gem.gemlang.tf_output.Outputter>` classes.

Note that code generation is not yet a perfect beast: significant streamlining, possibly with more levels of
abstraction, is expected in future.
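
As an illustration of the division of labour described above, the following is a minimal, self-contained sketch of
an AST walker that builds a tree of outputter objects and then asks the root of that tree to generate code. All names
here (``Node``, ``AssignOutputter``, ``StochasticOutputter``, ``BlockOutputter`` and the ``visit_*`` methods) are
hypothetical and do not reflect the actual classes in :mod:`gem.gemlang.tf_output` or
:mod:`gem.gemlang.model_generator`; the emitted strings are only Edward2-style placeholders.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import List


   # Hypothetical minimal AST node: a node type, an optional value and child nodes.
   @dataclass
   class Node:
       kind: str
       value: str = ""
       children: List["Node"] = field(default_factory=list)


   class Outputter:
       """Base class: each subclass emits code for one (hypothetical) language feature."""

       def generate(self) -> str:
           raise NotImplementedError


   class AssignOutputter(Outputter):
       def __init__(self, name: str, expr: str):
           self.name, self.expr = name, expr

       def generate(self) -> str:
           return f"{self.name} = {self.expr}"


   class StochasticOutputter(Outputter):
       # Emits an Edward2-style random variable call as a plain string.
       def __init__(self, name: str, dist: str, args: str):
           self.name, self.dist, self.args = name, dist, args

       def generate(self) -> str:
           return f"{self.name} = ed.{self.dist}({self.args}, name='{self.name}')"


   class BlockOutputter(Outputter):
       # Holds child Outputters, which is what makes the structure hierarchical.
       def __init__(self, children: List[Outputter]):
           self.children = children

       def generate(self) -> str:
           return "\n".join(child.generate() for child in self.children)


   class CodeGenerator:
       """Walks the AST; one visit_<kind> method per node type chooses the Outputters."""

       def visit(self, node: Node) -> Outputter:
           return getattr(self, f"visit_{node.kind}")(node)

       def visit_block(self, node: Node) -> Outputter:
           return BlockOutputter([self.visit(child) for child in node.children])

       def visit_assign(self, node: Node) -> Outputter:
           lhs, rhs = node.children
           return AssignOutputter(lhs.value, rhs.value)

       def visit_stochastic(self, node: Node) -> Outputter:
           lhs, call = node.children
           args = ", ".join(arg.value for arg in call.children)
           return StochasticOutputter(lhs.value, call.value, args)


   # Usage: a hand-built AST roughly corresponding to two GEM statements.
   tree = Node("block", children=[
       Node("assign", children=[Node("id", "K"), Node("expr", "data['K']")]),
       Node("stochastic", children=[
           Node("id", "beta"),
           Node("call", "Gamma", [Node("num", "1"), Node("num", "1")]),
       ]),
   ])
   print(CodeGenerator().visit(tree).generate())
   # K = data['K']
   # beta = ed.Gamma(1, 1, name='beta')

In this sketch the walker decides *what* to generate for each node, while the outputters only decide *how* to print
it, which keeps the translation logic in one place.
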

The GEM interface
=================
The GEM model description language is intended to be a standalone, clean, and concise way of describing a model. GEM
programs are intended to be compiled within a host data analysis environment (DAE). DAEs are many and varied, but GEM
specifically targets Python and R. A user's workflow is based around the DAE; in Python, for example, it might look
like :ref:`Example 1 <dae-example>`.

.. code-block:: Python
   :name: dae-example
   :caption: Example of how GEM is used within Python as a data analysis environment.

   from gem import GEM
   import numpy as np

   k_matrix = np.load("contact_matrix.npy")  # Contact matrix loaded from disk

   gem_prog = """
   K = Matrix()
   beta ~ Gamma(1, 1)
   gamma ~ Gamma(1, 1)
   I0 = Multinomial(1000, 1)  # Draw initial state vector for I

   Epidemic SIR() {
       S = State(init=1-I0)
       I = State(init=I0)
       R = State(init=Zeros_like(I0))

       [S -> I] = beta * K @ I
       [I -> R] = gamma
   }

   epi ~ SIR()
   """

   model = GEM(gem_prog, data={'K': k_matrix})

**Question**: Why is GEM a standalone language that is compiled within a different language?

**Answer**: We choose this pattern for a number of reasons:

1. A separate domain-specific modelling language (DSML) enables a very clean, succinct language orientated towards
   model description. The language therefore does not have to deal with the complexities of I/O, explicit iteration,
   memory management, etc. GEM is deliberately not intended to be Turing-complete: it needs to provide a complete
   description of an epidemic model; it does *not* need to be able to make your morning coffee!

2. Embedded probabilistic programming languages, such as `PyMC3 <>`_, suffer from a lack of clarity because
   host-language (in this case Python) constructs obscure the salient features of the probability model. In GEM, we
   wish to avoid this!

3. The most compelling reason for a separate DSML is that it can then be used in a variety of DAEs without requiring
   a large amount of re-coding for each new host language. GEM is developed in Python, which provides a natural fit
   for using Python as a DAE. A thin wrapper around the Python interface functions provides a fast and simple way to
   interact with GEM from other languages, notably R. This pattern is used successfully by several other
   probabilistic programming languages, notably `STAN <>`_ and `OpenBUGS <>`_.

GEM compiler interface
----------------------
The GEM compiler interface is responsible for reading a GEM program string and returning an object of type
:class:`GEM <gem.interface.GEM>`.
:class:`GEM <gem.interface.GEM>` is responsible for parsing, semantics checking, and code generation as
described in the :ref:`gemlang <gemlang_doc>` documentation. Whilst source-to-source translation of gemlang to
Python/TensorFlow represents the main task of the interface, the generated code also needs to be executed according to
the following process:

1. The generated Python/TensorFlow code defines a function ``model_impl()`` which contains a representation of the
   model using the (embedded) `Edward2 <>`_ probabilistic programming language. The function returns a list of all
   variables defined in the global scope of the GEM model.
2. The generated code is run dynamically (using Python's ``exec()``) inside the :class:`GEM() <gem.interface.GEM>`
   instance, which is then monkey-patched so that ``model_impl()`` is available as a method.

The resulting model object of type :class:`GEM() <gem.interface.GEM>` may then be used to print the generated Python
code via the ``GEM.pyprog`` attribute, or even to access the Edward2 model implementation directly via the
``GEM.model_impl()`` method.
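
The execute-and-patch step can be sketched in plain Python. The class ``GEMSketch`` and the hard-coded
``GENERATED_CODE`` string below are hypothetical stand-ins: the real generated code builds an Edward2 model, and the
real :class:`GEM <gem.interface.GEM>` class also parses, checks and generates code before this step. The sketch only
shows how ``exec()`` can define ``model_impl()`` in a fresh namespace and how that function can then be attached to
the instance.

.. code-block:: python

   # Hypothetical stand-in for generated code; the real output defines an Edward2 model.
   GENERATED_CODE = '''
   def model_impl():
       beta = 0.5
       gamma = 0.1
       return [beta, gamma]
   '''


   class GEMSketch:
       """Hypothetical cut-down analogue of gem.interface.GEM, not its real implementation."""

       def __init__(self, pyprog: str):
           self.pyprog = pyprog  # keep the generated source so it can be printed later
           namespace = {}
           exec(self.pyprog, namespace)  # defines model_impl() inside `namespace`
           # Monkey-patch: expose the generated function as an attribute of this instance.
           self.model_impl = namespace["model_impl"]


   model = GEMSketch(GENERATED_CODE)
   print(model.pyprog)        # inspect the generated source
   print(model.model_impl())  # [0.5, 0.1]

Because ``model_impl`` is stored on the instance rather than bound with ``types.MethodType``, it is called without
``self``, matching a generated function that takes no arguments.
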

gem.gemlang.symbol module
-------------------------

.. inheritance-diagram:: gem.gemlang.symbol
   :parts: 1

.. automodule:: gem.gemlang.symbol

gem.gemlang.symbol\_resolve module
----------------------------------

.. inheritance-diagram:: gem.gemlang.symbol_resolve
   :parts: 1

.. automodule:: gem.gemlang.symbol_resolve

gem.gemlang.inject\_data module
-------------------------------

.. inheritance-diagram:: gem.gemlang.inject_data
   :parts: 1

.. automodule:: gem.gemlang.inject_data

gem.gemlang.tf\_output module
-----------------------------

.. inheritance-diagram:: gem.gemlang.tf_output
   :parts: 1

.. automodule:: gem.gemlang.tf_output

In ``class InjectData(ASTWalker)``:

.. code-block:: python

   rhs = assign.children[1]
   if lhs.value in self.__data.keys():
       raise SyntaxError(
           f"Redefinition of symbol '{lhs.value}' as constant data")