Fits a vector autoregressive spatio-temporal (VAST) model using a minimal feature-set and a widely used interface.
Usage
tinyVAST(
formula,
data,
time_term = NULL,
space_term = NULL,
spacetime_term = NULL,
family = gaussian(),
delta_options = list(formula = ~1),
spatial_varying = NULL,
weights = NULL,
spatial_domain = NULL,
development = list(),
control = tinyVASTcontrol(),
space_columns = c("x", "y"),
time_column = "time",
times = NULL,
variable_column = "var",
variables = NULL,
distribution_column = "dist"
)
Arguments
- formula
Formula with response on left-hand-side and predictors on right-hand-side, parsed by mgcv and hence allowing s(.) for splines or offset(.) for an offset.
- data
Data-frame of predictor, response, and offset variables. Also includes variables that specify space, time, variables, and the distribution for samples, as identified by arguments variable_column, time_column, space_columns, and distribution_column.
- time_term
Specification for time-series structural equation model structure for constructing a time-variable interaction that defines a time-varying intercept for each variable (i.e., applies uniformly across space). time_term=NULL disables the time-variable interaction; see make_dsem_ram() for notation.
- space_term
Specification for structural equation model structure for constructing a space-variable interaction. space_term=NULL disables the space-variable interaction; see make_sem_ram() for notation.
- spacetime_term
Specification for time-series structural equation model structure including lagged or simultaneous effects for constructing a time-variable interaction, which is then combined in a separable process with the spatial correlation to form a space-time-variable interaction (i.e., the interaction occurs locally at each site). spacetime_term=NULL disables the space-time-variable interaction; see make_dsem_ram() or make_eof_ram() for notation.
- family
A function returning a class family, including gaussian(), lognormal(), tweedie(), binomial(), Gamma(), poisson(), nbinom1(), or nbinom2(). Alternatively, can be a named list of these functions, with names that match levels of data$distribution_column to allow different families by row of data. Delta model families are also possible; see Families for delta-model options. For binomial family options, see 'binomial families' under Model building notes in the Details section below.
- delta_options
A named list with slots for formula, space_term, and spacetime_term. These specify options for the second linear predictor of a delta model, and are only used (or estimable) when a delta family is used for some samples.
- spatial_varying
A formula specifying spatially varying coefficients (SVC). Note that formulas in R add an intercept by default, so spatial_varying = ~ X is implicitly read as spatial_varying = ~ 1 + X, and tinyVAST then estimates an SVC for an intercept in addition to covariate X. Therefore, if you only want an SVC for a single covariate, use spatial_varying = ~ 0 + X to suppress the default behavior of formulas in R.
- weights
A numeric vector representing optional likelihood weights for the data likelihood. Weights do not have to sum to one and are not internally modified. The weights argument needs to be a vector and not the name of a variable in the data frame.
- spatial_domain
Object that represents spatial relationships, either using fmesher::fm_mesh_2d() to apply the SPDE method, igraph::make_empty_graph() for independent time-series, igraph::make_graph() to apply a simultaneous autoregressive (SAR) process to a user-supplied graph, sfnetwork_mesh() for stream networks, or class sfc_GEOMETRY (e.g., constructed using sf::st_make_grid()) to apply a SAR to an areal model with adjacency based on the geometry of the object, or NULL to specify a single site. If using igraph, then the graph must have vertex names V(graph)$name that match levels of data[,'space_columns'].
- development
Specify options that are under active development. Please do not use these features without coordinating with the package authors.
- control
Output from tinyVASTcontrol(), used to define user settings.
- space_columns
A string or character vector that indicates the column(s) of data indicating the location of each sample. When spatial_domain is an igraph object, space_columns is a string with levels matching the names of vertices of that object. When spatial_domain is an fmesher or sfnetwork object, space_columns is a character vector indicating columns of data with coordinates for each sample.
- time_column
A character string indicating the column of data listing the time-interval for each sample, from the set of times in argument times.
- times
An integer vector listing the set of times in order. If times=NULL, then it is filled in as the vector of integers from the minimum to maximum value of data$time. Alternatively, it could run from the minimum value of data$time through future years, such that the model can forecast those future years.
- variable_column
A character string indicating the column of data listing the variable for each sample, from the set of variables in argument variables.
- variables
A character vector listing the set of variables. If variables=NULL, then it is filled in as the unique values from the column of data named by variable_column.
- distribution_column
A character string indicating the column of data listing the distribution for each sample, from the set of names in argument family. If variables=NULL, then it is filled in as the unique values from data$variables.
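The intercept behaviour described for spatial_varying comes from base R formula handling, so it can be checked with model.matrix() without fitting a model. A minimal sketch, where the data frame dat and covariate X are hypothetical:

```r
# Hypothetical covariate; any numeric column behaves the same way
dat <- data.frame(X = c(0.5, 1.2, -0.3, 2.0))

# ~ X implicitly expands to ~ 1 + X, so the design matrix has two columns
mm_default <- model.matrix(~ X, data = dat)
colnames(mm_default)

# ~ 0 + X suppresses the intercept, leaving a single column for X
mm_no_int <- model.matrix(~ 0 + X, data = dat)
colnames(mm_no_int)
```

Each column of the design matrix corresponds to one term that would receive a spatially varying coefficient, which is why the second form is the one to use when only the covariate should vary in space.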
Value
An object (list) of class tinyVAST. Elements include:
- data
Data-frame supplied during model fitting
- spatial_domain
the spatial domain supplied during fitting
- formula
the formula specified during model fitting
- obj
The TMB object from MakeADFun
- opt
The output from nlminb
- rep
The report from obj$report()
- sdrep
The output from sdreport
- tmb_inputs
The list of inputs passed to MakeADFun
- call
A record of the function call
- run_time
Total time to run model
- internal
Objects useful for package function, i.e., all arguments passed during the call
- deviance_explained
Output from deviance_explained
Details
tinyVAST includes several basic inputs that specify the model structure:
- formula specifies covariates and splines in a Generalized Additive Model;
- time_term specifies interactions among variables and over time that are constant across space, constructing the time-variable interaction;
- space_term specifies interactions among variables and over time that occur based on the variable values at each location, constructing the space-variable interaction;
- spacetime_term specifies interactions among variables and over time, constructing the space-time-variable interaction.
These inputs require defining the domain of the model. This includes:
- spatial_domain specifies the spatial domain, which determines spatial correlations;
- times specifies the temporal domain, i.e., the sequence of time-steps;
- variables specifies the set of variables, i.e., the variables that will be modeled.
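The defaults described for times and variables can be sketched in base R. This mirrors the behaviour stated in the argument descriptions, not the package's internal code, and the data frame dat and its column names are hypothetical:

```r
# Hypothetical sample data with gaps in the observed years
dat <- data.frame(time = c(2003L, 2001L, 2005L, 2001L),
                  var  = c("a", "b", "a", "b"))

# Default described for times=NULL: every integer from the minimum to the maximum
times_default <- min(dat$time):max(dat$time)

# Extending past the last observed time, e.g. to forecast two further steps
times_forecast <- min(dat$time):(max(dat$time) + 2L)

# Default described for variables=NULL: unique values of the variable column
variables_default <- unique(dat$var)
```

Note that times_default includes 2002 and 2004 even though no samples were taken then, so the temporal domain is a complete sequence rather than only the observed time-steps.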
The defaults spacetime_term=NULL and space_term=NULL turn off all multivariate
and temporal indexing, such that spatial_domain is then ignored, and the model collapses
to a generalized additive model using gam. To specify a univariate spatial model,
the user must specify spatial_domain and either space_term="" or spacetime_term="", where the latter
two are then parsed to include a single exogenous variance for the single variable.
| Model type | How to specify |
| --- | --- |
| Generalized additive model | specify spatial_domain=NULL, space_term="", and spacetime_term="", and then use formula to specify splines and covariates |
| Dynamic structural equation model (including vector autoregressive, dynamic factor analysis, ARIMA, and structural equation models) | specify spatial_domain=NULL and use spacetime_term to specify interactions among variables and over time |
| Univariate spatio-temporal model, or multiple independent spatio-temporal variables | specify spatial_domain and spacetime_term="", where the latter is then parsed to include a single exogenous variance for the single variable |
| Multivariate spatial model including interactions | specify spatial_domain and use space_term to specify spatial interactions |
| Vector autoregressive spatio-temporal model (i.e., lag-1 interactions among variables) | specify spatial_domain and use spacetime_term to specify interactions among variables and over time, where spatio-temporal variables are constructed via the separable interaction of spacetime_term and spatial_domain |
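The "separable interaction" mentioned in the last row can be illustrated in base R: a separable space-time correlation is the Kronecker product of a temporal and a spatial correlation matrix. The sketch below uses an AR1 temporal correlation and an exponential spatial correlation with hypothetical sizes and parameter values; it illustrates the concept only and is not tinyVAST's internal construction.

```r
# Hypothetical dimensions and parameters
n_t <- 3      # number of time-steps
n_s <- 4      # number of sites on a line
rho <- 0.5    # AR1 correlation between adjacent time-steps

# Temporal (AR1) and spatial (exponential decay) correlation matrices
R_tt <- rho^abs(outer(1:n_t, 1:n_t, FUN = "-"))
R_ss <- exp(-0.4 * abs(outer(1:n_s, 1:n_s, FUN = "-")))

# Separable space-time correlation, with sites nested within time-steps
R_full <- kronecker(R_tt, R_ss)

# Row/column index of the (time, site) pair in R_full
idx <- function(t, s) (t - 1) * n_s + s

# Correlation between (t=1, s=2) and (t=3, s=4) factorizes into time x space
R_full[idx(1, 2), idx(3, 4)]
R_tt[1, 3] * R_ss[2, 4]
```

The last two expressions give the same value, which is the defining property of separability: the correlation between any two space-time points is the product of their temporal and spatial correlations.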
Model building notes
- binomial families: A binomial family can be specified in only one way: the response is the observed proportion (proportion = successes / trials), and the 'weights' argument is used to specify the Binomial size (trials, N) parameter (proportion ~ ..., weights = N).
- factor models: If a factor model is desired, the factor(s) must be named and included in the variables. The factor is then modeled for space_term, time_term, and spacetime_term, and its variance must be fixed a priori for any term where it is not being used.
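Base R's glm() accepts the same proportion-plus-weights convention for binomial data, so it can serve as a stand-in to show how the response and weights are laid out. This is a sketch using glm(), not tinyVAST, and the counts are made up for illustration:

```r
# Hypothetical counts of successes and trials for four samples
successes <- c(3, 7, 5, 9)
trials    <- c(10, 10, 20, 12)

# Response is the observed proportion; trials are kept as a separate column
dat <- data.frame(proportion = successes / trials, N = trials)

# Intercept-only binomial fit: proportion as response, trials as weights
fit <- glm(proportion ~ 1, family = binomial(), weights = N, data = dat)

# Back-transform the logit-scale intercept to a probability
p_hat <- plogis(unname(coef(fit)))
p_hat
```

For an intercept-only model the estimate equals the pooled proportion sum(successes) / sum(trials). When calling tinyVAST, the trials would be supplied as a vector (for example dat$N) rather than as a column name, following the description of the weights argument.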
See also
Details section of make_dsem_ram() for a summary of the math involved with constructing the DSEM, and doi:10.1111/2041-210X.14289 for more background on math and inference
doi:10.48550/arXiv.2401.10193 for more details on how GAM, SEM, and DSEM components are combined from a statistical and software-user perspective
summary.tinyVAST() to visualize parameter estimates related to SEM and DSEM model components
Examples
# Simulate a separable two-dimensional AR1 spatial process
n_x = n_y = 25
n_w = 10
R_xx = exp(-0.4 * abs(outer(1:n_x, 1:n_x, FUN="-")) )
R_yy = exp(-0.4 * abs(outer(1:n_y, 1:n_y, FUN="-")) )
z = mvtnorm::rmvnorm(1, sigma=kronecker(R_xx,R_yy) )
# Simulate nuisance parameter z from oscillatory (day-night) process
w = sample(1:n_w, replace=TRUE, size=length(z))
Data = data.frame( expand.grid(x=1:n_x, y=1:n_y), w=w, z=as.vector(z) + cos(w/n_w*2*pi))
Data$n = Data$z + rnorm(nrow(Data), sd=1)
# Add columns for multivariate and/or temporal dimensions
Data$var = "n"
# make SPDE mesh for spatial term
mesh = fmesher::fm_mesh_2d( Data[,c('x','y')], n=100 )
# fit model with cyclic confounder as GAM term
out = tinyVAST( data = Data,
formula = n ~ s(w),
spatial_domain = mesh,
space_term = "n <-> n, sd_n" )
# Run crossvalidation (too slow for CRAN)
# \donttest{
CV = cv::cv( out, k = 4 )
#> R RNG seed set to 635383
#> Error in eval(call, parent.frame()): object 'mesh' not found
CV
#> Error: object 'CV' not found
# }
