Code


Latest release: gjam 2.1.3 on 4-23-17

Generalized Joint Attribute Modeling (GJAM) in R

Ecological attributes include species abundances, traits, and individual condition (e.g., growth or infection status), to name a few. These data are multivariate but rarely of a single type: they can combine presence-absence, ordinal, continuous, discrete, composition, and zero-inflated observations. GJAM provides inference on sensitivity to input variables, correlations between responses, model selection, prediction of responses, inverse prediction of predictors, and community classification by response to predictors.

GJAM was motivated by species distribution and abundance data, but can provide an attractive alternative to traditional methods wherever observations are multivariate and combine multiple scales and mixtures of continuous and discrete data.

Importantly, analysis is done on the observation scale. That is, coefficients and covariances are interpreted on the same scale as the data. This contrasts with standard generalized linear models (GLMs), where coefficients and covariances are difficult to interpret and cannot be compared across responses that are modeled on different scales and with nonlinear link functions.
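The difficulty with link scales can be illustrated with a short base-R sketch. It does not use gjam; the simulated data and all object names are assumptions chosen for illustration. Two responses share the same true slope of 0.8, but one is modeled with a log link and the other with a logit link, so the fitted coefficients are in different units.

```r
# Illustrative only: two GLMs whose slopes are numerically similar
# but live on different link scales (log vs. logit).
set.seed(2)
n <- 500
x <- rnorm(n)

# Count response with a log link, binary response with a logit link
y_count <- rpois(n, exp(0.5 + 0.8 * x))
y_bin   <- rbinom(n, 1, plogis(-0.2 + 0.8 * x))

b_count <- unname(coef(glm(y_count ~ x, family = poisson))["x"])
b_bin   <- unname(coef(glm(y_bin ~ x, family = binomial))["x"])

# b_count is a change in log expected count per unit x;
# b_bin is a change in log odds per unit x.
cat("log-link slope:  ", round(b_count, 2), "\n")
cat("logit-link slope:", round(b_bin, 2), "\n")
cat("multiplicative effect on expected count:", round(exp(b_count), 2), "\n")
```

Both estimates land near 0.8, yet one describes a multiplicative change in an expected count and the other a change in log odds, so comparing them directly says little about relative sensitivity on the scale of the observations.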

GJAM accommodates the large numbers of zeros common in multivariate data without the mixture distributions used in zero-inflated GLMs. Instead, GJAM relies on censoring.

GJAM exploits censoring to combine multiple data types in a single model, including mixtures of continuous and discrete data.  For example, the microbial community (composition data) might be tracked together with host condition (continuous, categorical, binary, ordinal, …).
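The idea of censoring can be sketched in a few lines of base R. This is a conceptual illustration under assumed values (the means, correlation, and variable names are hypothetical), not the sampler used in gjam: a correlated latent continuous variable is observed only partially, and the way it is censored determines the data type.

```r
# Conceptual sketch: one latent bivariate normal, two observed data types.
set.seed(1)
n   <- 1000
rho <- 0.6                                 # assumed latent correlation
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
mu    <- c(-0.5, 0.3)                      # assumed latent means

# Draw latent values w with covariance Sigma
w <- matrix(rnorm(n * 2), n, 2) %*% chol(Sigma)
w <- sweep(w, 2, mu, "+")

# Continuous abundance: negative latent values are censored at zero
y_ca <- pmax(w[, 1], 0)

# Presence-absence: only the sign of the latent value is observed
y_pa <- as.integer(w[, 2] > 0)

cat("fraction of zeros in the continuous response:", mean(y_ca == 0), "\n")
cat("fraction present in the binary response:", mean(y_pa), "\n")
```

Because the latent mean of the first response is below zero, most of its observations are exact zeros, with no separate zero-inflation component needed. Both responses still share a single latent covariance, which is what allows different data types to sit in one joint model.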

Citation for model:

Clark, J.S., D. Nemergut, B. Seyednasrollah, P. Turner, and S. Zhang. 2017. Generalized joint attribute modeling for biodiversity analysis: Median-zero, multivariate, multifarious data. Ecological Monographs, 87, 34–56. Files: Clark2017EcolMonogr, clarksupplement.

Presents the motivation and model; summarizes computation in gjam.  The Supplement file provides additional detail on algorithms.

Dimension reduction: 

Taylor-Rodríguez, D., K. Kaufeld, E. M. Schliep, J. S. Clark, and A. E. Gelfand. 2016. Joint species distribution modeling: dimension reduction using Dirichlet processes. Bayesian Analysis, in press. File: bayesanaly2016.

Many applications require large numbers of response variables.  Microbiome studies bring the additional complication of composition data.  And most observed values can still be zero.  This paper describes the Dirichlet process prior implemented in gjam that finds a low-dimensional representation for the covariance between responses.

Trait analysis:  

Clark, J.S. 2016. Why species tell us more about traits than traits tell us about species: Predictive models. Ecology, 97, 1979–1993. Files: ecology2016, ecology2016_AppendixS1.

The joint distribution of ecological attributes (‘traits’) can be modeled together with species, separately, or predicted from the joint distribution of species.  This paper describes the model and computation implemented in gjam.

Vignette with R code and applications: gjam vignette

Installation in R or RStudio:

> install.packages('gjam')
> library('gjam')

Documentation:

> help('gjam')
> browseVignettes('gjam')

Below are cluster plots of the correlation matrix for a presence-absence model (a), a continuous abundance model (b), and the response to environmental variables (d). The cluster analysis in (c) is based on distances in (d). These plots are obtained by specifying GRIDPLOTS = T in the plotPars list passed to gjamPlot.

fig7a

fig7b
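A minimal sketch of how plots of this kind might be produced, using simulated data rather than the data behind the figures. The simulation sizes, the short chain length, and the object names (sim, ml, out, pl) are assumptions for illustration; real analyses would typically need longer chains. The calls follow the pattern shown in the package vignette, and the code is skipped if gjam is not installed.

```r
# Sketch: fit gjam to simulated continuous-abundance data and request grid plots.
have_gjam <- requireNamespace("gjam", quietly = TRUE)

if (have_gjam) {
  library(gjam)
  set.seed(1)

  # Simulate n observations of S responses with Q predictors, all type 'CA'
  sim <- gjamSimData(n = 500, S = 10, Q = 4, typeNames = "CA")

  # Short chain, for illustration only
  ml  <- list(ng = 1000, burnin = 200, typeNames = sim$typeNames)
  out <- gjam(sim$formula, xdata = sim$xdata, ydata = sim$ydata, modelList = ml)

  # GRIDPLOTS = TRUE asks gjamPlot for the cluster/grid plots
  pl <- list(trueValues = sim$trueValues, GRIDPLOTS = TRUE)
  gjamPlot(output = out, plotPars = pl)
} else {
  message("gjam is not installed; run install.packages('gjam') first")
}
```

Passing trueValues is only possible here because the data are simulated; with observed data, pl would contain just the plotting options.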

Main contributors:

Jim Clark wrote the GJAM model, the R and C++ code, and the GJAM package.

Alan Gelfand and Daniel Taylor-Rodríguez wrote the Dirichlet process model and algorithms for dimension reduction.

Daniel Taylor-Rodríguez implemented the Dirichlet process in R and C++ in GJAM.

Bene Bachelot, Chase Nuñez, and Brad Tomasek provided extensive testing and feedback through all stages of development.

Many others: Students in the course Bayesian Inference for Environmental Models (BIO/ENV 665) at Duke University and members of the Multivariate Modeling working group of the SAMSI Ecology program contributed many ideas, recommendations, and feedback.