Commit 7a4c1d73 authored by Poppy Miller

Added some more documentation

parent f69b9e0d
......@@ -14,8 +14,15 @@
#' @importFrom R6 R6Class
#' @export
#'
#' @return Object of \code{\link{HaldDP}} with methods for creating a HaldDP model, running the model, and accessing and plotting the results.
#' @format Object of \code{\link{R6Class}} with methods for creating a HaldDP model, running the model, and accessing and plotting the results.
#' @return Object of \code{\link{HaldDP}} with methods for creating a HaldDP model,
#' running the model, and accessing and plotting the results.
#' @format Object of \code{\link{R6Class}} with methods for creating a HaldDP model,
#' running the model, and accessing and plotting the results.
#' @section Description:
#' This function fits a non-parametric Poisson source attribution model for human cases of
#' disease. It supports multiple types, sources, times and locations. The number of
#' human cases for each type, time and location follows a Poisson likelihood.
#' \deqn{y_{itl}\sim\textsf{Poisson}(\lambda_{itl})}
#' @section Methods:
#' \describe{
#' \item{\code{new(data, k, priors, a_q, inits = NULL)}}{Constructor takes
......@@ -24,27 +31,82 @@
#' \code{Human}, columns containing the number of positive source samples
#' (one column per source), a column with the time id's named \code{Time},
#' a column with the type id's named \code{Type}, and a column with the source
#' location id's \code{Location})
#' location id's \code{Location}). The data for the human cases and source
#' counts must be integers. The data for the time, location and type columns
#' must be factors. The source counts are currently only allowed to vary over time,
#' hence they must be repeated for each location within each time.
#'
#' \code{k} prevalence dataframe (with columns named \code{Value, Time,
#' Location and Source})
#' Location and Source}). Prevalences must be between 0 and 1 as they are the
#' proportion of samples that were positive for any type for a given source and time.
#'
#' \code{a_q} the Dirichlet Process concentration parameter
#' \code{priors} list with elements named \code{a_r}, \code{a_alpha}, \code{a_theta} and \code{b_theta},
#' corresponding to the prior parameters for the \code{r}, \code{alpha}, and base
#' distribution for the DP parameters respectively.
## The \code{r} parameters have
## a Dirichlet prior for each type/ time combination, whilst the \code{alpha}
## parameters have a Dirichlet prior for each time/ location combination. Therefore,
## the prior values \code{a_alpha} can be either a single positive number (to be
## used for all \code{alphas}) or a data frame with a single positive number for
## each \code{alpha} parameter (in a column called \code{Value}) and columns named
## \code{Time}, \code{Location} and \code{Source} containing the time, location and
## source names for each \code{alpha} prior parameter. Similarly, the prior values \code{a_r}
## can either be a single positive number or a data frame with columns named \code{Value},
## \code{Time}, \code{Type} and \code{Source}.
## The base distribution is Gamma distributed with shape and rate parameters given by \code{a_theta}
## and \code{b_theta} respectively.
#'
#' \code{inits} (optional) initial values for the mcmc algorithm. This is a list
#' that may contain any of the following items:
#' \tabular{lllll}{
#' \emph{Parameter} \tab \emph{Prior Distribution} \tab \emph{Prior Parameters}\cr
#' \code{a_r} \tab Dirichlet(concentration) \tab A single positive number or a data \cr
#' \tab \tab frame with columns giving the prior values \cr
#' \tab \tab (named \code{Value}), times (named \code{Time}) \cr
#' \tab \tab and source ids (named \code{Source}). If a\cr
#' \tab \tab single number is supplied, it will be used for\cr
#' \tab \tab all times, sources and locations. \cr
#'
#' \code{a_alpha} \tab Dirichlet(concentration) \tab A single positive number or a dataframe \cr
#' \tab \tab with columns giving the prior values (named \cr
#' \tab \tab \code{value}), times (named the name of the \cr
#' \tab \tab time column in the data), locations (named the \cr
#' \tab \tab name of the location column in the data) and \cr
#' \tab \tab the source id (named \code{source_id}). \cr
#'
#' \code{alpha} (a data frame with
#' columns named \code{Value} containing the initial values, \code{Time}, \code{Location}, \code{Source}),
#' Type effects base \tab DPM(Gamma(shape, rate), \tab Numerical vector of length 2 for the shape and \cr
#' distribution (\code{theta}) \tab alpha)\tab rate of the Gamma base distribution.\cr
#' }
#'
#' \code{q} (a data frame with columns named \code{Value} containing the initial values and \code{Type}), and
#' \code{a_q} the Dirichlet Process concentration parameter.
#'
#' \code{r} (a data frame with columns:
#' a column with the initial r values named \code{Value}
#' (note these must sum to 1 for each source-time combination),
#' a column with the source id's named \code{Source},
#' a column with the time id's named \code{Time},
#' a column with the type id's named \code{Type}.)}
#' \code{inits} (optional) initial values for the mcmc algorithm. This is a list
#' that may contain any of the following items: \code{alpha} (a data frame with
#' columns named \code{Value} containing the initial values, \code{Time},
#' \code{Location}, \code{Source}), \code{q} (a data frame with columns named
#' \code{Value} containing the initial values and \code{Type}), and \code{r} (a data
#' frame with a column with the initial r values named \code{Value} (note these must
#' sum to 1 for each source-time combination), a column with the source id's
#' named \code{Source}, a column with the time id's named \code{Time}, a column
#' with the type id's named \code{Type}.)
#' An optional list giving the starting values for the parameters.
#' \tabular{lll}{
#' \emph{Parameter} \tab \emph{Description} \cr
#' \code{r}
#' \tab A data frame with columns giving the initial values (named \code{Value}),\cr
#' \tab times (named \code{Time}) and source and type id's (named \code{Source} and \code{Type}). \cr
#' \tab DEFAULT: the default initial values are the maximum likelihood point \cr
#' \tab estimates of \code{r} from the source matrix (i.e. \eqn{r_{ij} = x_{ij} / \sum_{i=1}^{n} x_{ij}}).\cr
#' Source effects (\code{alpha})
#' \tab A data frame with columns named \code{Value} (containing the initial values), \cr
#' \tab \code{Source} (containing the source names) and columns giving the time and \cr
#' \tab location for each parameter (named \code{Time} and \code{Location}). DEFAULT: The default initial values\cr
#' \tab for the source effects are drawn from the prior distribution (Dirichlet). \cr
#' Type effects (\code{q})
#' \tab A data frame with columns giving the initial values (named \code{Value})\cr
#' \tab and the type ids (named \code{Type}). DEFAULT: initialise all type effects to be in \cr
#' \tab a single group with a theta value calculated as \cr
#' \tab \eqn{\theta = \sum_{i,t,l} Human_{itl} / \sum_{l=1}^{L} \sum_{t=1}^{T} \sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_{jtl} r_{ijt} k_{jt}}, \cr
#' \tab i.e. \eqn{\theta = \sum Human_{itl} / \sum (\lambda_{ijtl} / \theta)}}
#' }
#'
#' \item{\code{fit_params(n_iter = 1000, burn_in = 0, thin = 1,
#' n_r = ceiling(private$nTypes * 0.2), params_fix = NULL)}}{when called, sets the mcmc
......@@ -124,7 +186,7 @@
#' are \code{"percentiles"} and \code{"spin"}).
#' See \code{extract} for details on the subsetting. \code{lambda_j_prop} returns the
#' proportion of cases attributed to each source \code{j} and is calculated by dividing
#' each iteration of \code{lambda_{jtl}} values by their sum within each time \code{t} and location \code{l}.}.}
#' each iteration of \code{lambda_{jtl}} values by their sum within each time \code{t} and location \code{l}.}
#'
#' \item{\code{plot_heatmap(iters, cols = c("blue","white"), hclust_method = "complete")}}{
#' Creates a dendrogram and heatmap for the type effect groupings (\code{s} parameter in the model).
......@@ -143,7 +205,7 @@
#'
#' @references Chen, M.-H. and Shao, Q.-M. (1998). Monte Carlo estimation of Bayesian credible and HPD intervals, \emph{Journal of Computational and Graphical Statistics}, 7.
#' @references Liu Y, Gelman A, Zheng T (2015). "Simulation-efficient shortest probability intervals." Statistics and Computing.
#' @author Chris Jewell and Poppy Miller \email{p.miller@lancaster.ac.uk}
#' @author Chris Jewell and Poppy Miller \email{p.miller at lancaster.ac.uk}
#'
#' @examples
#' data(campy)
......
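The input layout described in the updated `new()` documentation can be illustrated with a small base-R sketch. Everything here is an assumption made for illustration: the toy counts, the source column names `Source1` and `Source2`, the prior values, and the `a_q` value are invented, and the `priors` element names follow the `a_r`, `a_alpha`, `a_theta`, `b_theta` wording in the docs. The last step computes the documented default starting values for `r` (each source column divided by its column total); it is not taken from the package's code.

```r
# Toy data: 3 types, 1 time, 2 locations (all values made up).
# Human cases and source counts are integers; Time, Type and Location are factors.
# Source counts vary only over time, so they are repeated for each location.
dat <- data.frame(
  Human    = c(4L, 1L, 7L, 3L, 0L, 2L),
  Source1  = rep(c(10L, 0L, 5L), times = 2),
  Source2  = rep(c(2L, 8L, 5L), times = 2),
  Time     = factor(rep("1", 6)),
  Type     = factor(rep(c("t1", "t2", "t3"), times = 2)),
  Location = factor(rep(c("A", "B"), each = 3))
)

# Prevalence data frame: one value per source, time and location, each in [0, 1].
k <- data.frame(
  Value    = c(0.3, 0.6, 0.3, 0.6),
  Time     = factor(rep("1", 4)),
  Location = factor(rep(c("A", "B"), each = 2)),
  Source   = factor(rep(c("Source1", "Source2"), times = 2))
)

# Hypothetical prior values; only the element names come from the documentation.
priors <- list(a_r = 0.1, a_alpha = 1, a_theta = 0.01, b_theta = 0.00001)

# Check the constraints the documentation states.
stopifnot(
  is.integer(dat$Human), is.integer(dat$Source1), is.integer(dat$Source2),
  is.factor(dat$Time), is.factor(dat$Type), is.factor(dat$Location),
  all(k$Value >= 0 & k$Value <= 1)
)

# Source counts must be identical across locations within a time.
by_loc <- split(dat[, c("Source1", "Source2")], dat$Location)
stopifnot(all(sapply(by_loc, function(d) {
  all(as.matrix(d) == as.matrix(by_loc[[1]]))
})))

# Documented default initial values for r: r_ij = x_ij / sum_i x_ij,
# computed here from one location's copy of the source matrix.
x <- as.matrix(dat[dat$Location == "A", c("Source1", "Source2")])
r_init <- sweep(x, 2, colSums(x), "/")

# With the package loaded, the constructor call would then look like this
# (argument order from the documented signature; a_q value is hypothetical):
# model <- HaldDP$new(data = dat, k = k, priors = priors, a_q = 0.1)
```

Each column of `r_init` sums to 1, which matches the documented requirement that the initial `r` values sum to 1 for each source-time combination.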