Integration for target variable — integrate_output

Calculates an estimator for a derived quantity by summing across multiple predictions. This can be used to approximate an integral when estimating area-expanded abundance, abundance-weighting a covariate to calculate distribution shifts, and/or weighting one model variable by another.

Usage

integrate_output(
  object,
  newdata,
  area,
  type = rep(1, nrow(newdata)),
  weighting_index,
  covariate,
  getsd = TRUE,
  bias.correct = TRUE,
  apply.epsilon = FALSE,
  intern = FALSE
)

Arguments

object

Output from tinyVAST().

newdata

New data-frame of independent variables used to predict the response, where a total value is calculated by combining across these individual predictions. If these locations are randomly drawn from a specified spatial domain, then integrate_output applies Monte Carlo integration to approximate the total over that area. If locations are drawn systematically from a domain, then integrate_output applies a midpoint approximation to the integral.

area

Vector of values used for area-weighted expansion of the estimated density surface, with one value for each row of newdata (i.e., length nrow(newdata)).

type

Integer-vector indicating what type of expansion to apply to each row of newdata, with length of nrow(newdata).

type=1: Area-weighting: weight predictor by argument area
type=2: Abundance-weighted covariate: weight covariate by proportion of total in each row of newdata
type=3: Abundance-weighted variable: weight predictor by proportion of total in a prior row of newdata. This option is used to weight a prediction for one category based on predicted proportional density of another category, e.g., to calculate abundance-weighted condition in a bivariate model.
type=4: Abundance-expanded variable: weight predictor by density in a prior row of newdata. This option is used to weight a prediction for one category based on predicted density of another category, e.g., to calculate abundance-expanded consumption in a bivariate model.
type=0: Exclude from weighting: give a weight of zero to a given row of newdata. Including a row with type=0 is useful, e.g., when the predicted abundance at that location is needed only as a weighting term for another row, and the predicted density should not otherwise contribute to the total value.

weighting_index

Integer-vector indicating, for each row of newdata, the prior row whose area-expanded prediction is used as the weight for the given row. Indices start at 0 (see Details). Only used when type=3 or type=4.

covariate

Numeric-vector providing a covariate that is used in expansion, e.g., positional coordinates when calculating the abundance-weighted centroid with respect to that coordinate. Only used when type=2.

getsd

Logical indicating whether to get the standard error, where getsd=FALSE is faster during initial exploration.

bias.correct

Logical indicating whether bias correction should be applied using standard methods in TMB::sdreport().

apply.epsilon

Logical indicating whether to apply epsilon bias correction using a manual calculation rather than the conventional method in TMB::sdreport(). See Details for more information.

intern

Do Laplace approximation on C++ side? Passed to TMB::MakeADFun().

Value

A vector containing the plug-in estimate, standard error, the epsilon bias-corrected estimate if available, and the standard error for the bias-corrected estimator. Depending upon settings, one or more of these will be NA values, and the function can be repeatedly called to get multiple estimators and/or statistics.

Details

Analysts will often want to calculate some value by combining the predicted response at multiple locations, and potentially from multiple variables in a multivariate analysis. This arises in a univariate model, e.g., when calculating the integral under a predicted density function. That integral is approximated using a midpoint or Monte Carlo approximation by calculating the linear predictors at each location in newdata, applying the inverse-link transformation, and calling this predicted response mu_g. Total abundance is then approximated by multiplying mu_g by the area associated with each midpoint or Monte Carlo approximation point (supplied by argument area), and summing across these area-expanded values.
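As a minimal sketch of this area expansion in base R, the following uses made-up linear predictors and assumes a log link. The names eta_g, mu_g, a_g and nu_g are illustrative and are not objects returned by the package.

```r
# Hypothetical linear predictors (log scale) at four integration points
eta_g <- c(0, log(2), log(3), log(4))

# Inverse-link transformation, assuming a log link for this sketch
mu_g <- exp(eta_g)

# Area associated with each integration point (e.g., grid-cell area)
a_g <- c(10, 10, 5, 5)

# Area-expanded values, then summed to approximate total abundance
nu_g <- mu_g * a_g
total <- sum(nu_g)
```

With a systematic grid, a_g would be the area of each cell; with random draws, one common choice is the domain area divided by the number of draws.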

In more complicated cases, an analyst can then use covariate to calculate the weighted average of a covariate for each midpoint location. For example, if the covariate is positional coordinates or depth/elevation, then type=2 measures shifts in the average habitat utilization with respect to that covariate. Alternatively, an analyst fitting a multivariate model might weight one variable based on another using weighting_index, e.g., to calculate abundance-weighted average condition, or predator-expanded stomach contents.
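The following base-R sketch shows one way the argument vectors might be assembled for an abundance-weighted condition calculation (type=3) in a bivariate model. The grid, the column name var, and its labels are hypothetical. Giving the condition rows zero area is a choice made for this illustration so that they do not contribute to the first-pass total.

```r
# Hypothetical prediction grid with three cells
grid <- data.frame(X = c(0, 1, 2), Y = c(0, 0, 0), cell_area = c(4, 4, 2))
n <- nrow(grid)

# Stack the grid twice: density rows first, then condition rows.
# The column `var` and its labels are assumptions for this sketch.
newdata <- rbind(
  cbind(grid, var = "density"),
  cbind(grid, var = "condition")
)

# Density rows are type 0 (predicted, but not summed directly);
# condition rows are type 3 (weighted by proportional density).
type <- c(rep(0, n), rep(3, n))

# Area applies to the density rows; condition rows get zero area here.
area <- c(grid$cell_area, rep(0, n))

# 0-based index of the density row that weights each condition row.
# Entries for the type-0 rows are placeholders and are not used.
weighting_index <- c(rep(0, n), seq_len(n) - 1)

# With a fitted model `fit` (not created here), the call would then be:
# integrate_output(fit, newdata = newdata, area = area, type = type,
#                  weighting_index = weighting_index)
```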

In practice, spatial integration in a multivariate model requires two passes through the rows of newdata when calculating a total value. In the following, we write equations using C++ indexing conventions such that indexing starts with 0, to match the way that integrate_output expects indices to be supplied. Given inverse-link-transformed predictor $ \mu_g $, function argument type as $ type_g $, function argument area as $ a_g $, function argument covariate as $ x_g $, and function argument weighting_index as $ h_g $, the first pass calculates:

$$ \nu_g = \mu_g a_g $$

where the total value from this first pass is calculated as:

$$ \nu^* = \sum_{g=0}^{G-1} \nu_g $$

The second pass then applies a further weighting, which depends upon $ type_g $, and potentially upon $ x_g $ and $ h_g $.

If $type_g = 0$ then $\phi_g = 0$

If $type_g = 1$ then $\phi_g = \nu_g$

If $type_g = 2$ then $\phi_g = x_g \frac{\nu_g}{\nu^*} $

If $type_g = 3$ then $\phi_g = \frac{\nu_{h_g}}{\nu^*} \mu_g $

If $type_g = 4$ then $\phi_g = \nu_{h_g} \mu_g $

Finally, the total value from this second pass is calculated as:

$$ \phi^* = \sum_{g=0}^{G-1} \phi_g $$

and $\phi^*$ is returned by integrate_output, along with a standard error and, if requested, the epsilon bias-corrected estimate, which corrects for skewness and retransformation bias.
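The two passes can be sketched in base R as follows. This is an illustrative re-implementation of the equations above, not the package's internal code. The function name two_pass_total and the toy inputs are hypothetical, and the sketch returns only the plug-in value without a standard error or bias correction.

```r
# Illustrative version of the two-pass calculation (plug-in value only)
two_pass_total <- function(mu, area, type,
                           covariate = rep(0, length(mu)),
                           weighting_index = rep(0, length(mu))) {
  # First pass: area-expanded values and their total
  nu <- mu * area
  nu_star <- sum(nu)

  # weighting_index is 0-based, so shift by one for R's 1-based indexing
  nu_h <- nu[weighting_index + 1]

  # Second pass: weighting depends on type; type-0 rows stay at zero
  phi <- numeric(length(mu))
  phi[type == 1] <- nu[type == 1]
  phi[type == 2] <- (covariate * nu / nu_star)[type == 2]
  phi[type == 3] <- (nu_h / nu_star * mu)[type == 3]
  phi[type == 4] <- (nu_h * mu)[type == 4]
  sum(phi)
}

# Univariate toy example: four locations
mu <- c(1, 2, 3, 4)
a <- c(10, 10, 5, 5)
abundance <- two_pass_total(mu, a, type = rep(1, 4))
centroid <- two_pass_total(mu, a, type = rep(2, 4),
                           covariate = c(100, 200, 300, 400))

# Bivariate toy example: rows 0-1 are density, rows 2-3 are condition
mu2 <- c(2, 6, 0.9, 1.1)
a2 <- c(5, 5, 0, 0)
h2 <- c(0, 0, 0, 1)
weighted_condition <- two_pass_total(mu2, a2, type = c(0, 0, 3, 3),
                                     weighting_index = h2)
expanded <- two_pass_total(mu2, a2, type = c(0, 0, 4, 4),
                           weighting_index = h2)
```

In the bivariate example, the density rows have type 0, so they enter the result only through the weights applied to the condition rows.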

Standard bias-correction using bias.correct=TRUE can be slow, and in some cases it might be faster to use apply.epsilon=TRUE and intern=TRUE. However, that option is somewhat experimental, and a user might want to confirm that the two options give identical results. Note also that bias.correct=TRUE will still calculate the standard error, whereas apply.epsilon=TRUE with intern=TRUE will not.