Calculates an estimator for a derived quantity by summing across multiple predictions. This can be used to approximate an integral when estimating area-expanded abundance, abundance-weighting a covariate to calculate distribution shifts, and/or weighting one model variable by another.
Arguments
- object
Output from tinyVAST().
- newdata
New data-frame of independent variables used to predict the response, where a total value is calculated by combining across these individual predictions. If these locations are randomly drawn from a specified spatial domain, then integrate_output applies Monte Carlo integration to approximate the total over that area. If locations are drawn systematically from a domain, then integrate_output applies a midpoint approximation to the integral.
- area
vector of values used for area-weighted expansion of the estimated density surface for each row of newdata, with length of nrow(newdata).
- type
Integer-vector indicating what type of expansion to apply to each row of newdata, with length of nrow(newdata).
type=1
Area-weighting: weight predictor by argument area.
type=2
Abundance-weighted covariate: weight covariate by proportion of total in each row of newdata.
type=3
Abundance-weighted variable: weight predictor by proportion of total in a prior row of newdata. This option is used to weight a prediction for one category based on predicted density of another category, e.g., to calculate abundance-weighted condition in a bivariate model.
type=0
Exclude from weighting: give weight of zero for a given row of newdata. Including a row of newdata with type=0 is useful, e.g., when the abundance at that location is needed only as a weighting term for the eventual index, and the predicted density should not otherwise contribute to the total value.
- weighting_index
integer-vector used to indicate a previous row that is used to calculate a weighted average that is then applied to the given row of newdata. Only used when type=3.
- covariate
numeric-vector used to provide a covariate that is used in expansion, e.g., to provide positional coordinates when calculating the abundance-weighted centroid with respect to that coordinate. Only used when type=2.
- getsd
logical indicating whether to get the standard error, where getsd=FALSE is faster during initial exploration.
- bias.correct
logical indicating if bias correction should be applied using standard methods in TMB::sdreport().
- apply.epsilon
Apply epsilon bias correction using a manual calculation rather than using the conventional method in TMB::sdreport? See details for more information.
- intern
Do Laplace approximation on C++ side? Passed to
TMB::MakeADFun()
.
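As a rough illustration of how the argument vectors described above line up row by row with newdata, the following sketch assembles them for a total-abundance calculation and for an abundance-weighted centroid. The grid, the column names x and y, and the object name fit are assumptions for illustration; no model is fitted here, so the calls to integrate_output are shown only as comments.

```r
# Hypothetical systematic grid over a 3 x 2 domain with 1 x 1 cells.
# Column names `x` and `y` are assumptions; use the names in the fitted data.
newdata <- expand.grid(x = c(0.5, 1.5, 2.5), y = c(0.5, 1.5))
n_g <- nrow(newdata)

# Each midpoint represents one grid cell with area 1
area <- rep(1, n_g)

# (a) Total abundance: area-weight every row
type_total <- rep(1L, n_g)

# (b) Abundance-weighted centroid along x: weight the covariate in every row
type_centroid <- rep(2L, n_g)
covariate <- newdata$x

# weighting_index is only used when type = 3; zeros act as placeholders
weighting_index <- rep(0L, n_g)

# With a fitted tinyVAST object `fit` (not created here), the calls might be:
# total    <- integrate_output(fit, newdata = newdata, area = area,
#                              type = type_total, covariate = covariate,
#                              weighting_index = weighting_index,
#                              getsd = FALSE)
# centroid <- integrate_output(fit, newdata = newdata, area = area,
#                              type = type_centroid, covariate = covariate,
#                              weighting_index = weighting_index,
#                              getsd = FALSE)
```

Every vector has length nrow(newdata), so each element describes how the corresponding row enters the total.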
Details
Analysts will often want to calculate some value by combining the predicted response at multiple
locations, and potentially from multiple variables in a multivariate analysis.
This arises in a univariate model, e.g., when calculating the integral under a predicted
density function, which is approximated using a midpoint or Monte Carlo approximation
by calculating the linear predictors at each location in newdata,
applying the inverse-link transformation,
and calling this predicted response mu_g. Total abundance is then approximated
by multiplying mu_g by the area associated with each midpoint or Monte Carlo
approximation point (supplied by argument area),
and summing across these area-expanded values.
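The area-expansion step can be sketched in base R without fitting a model. The example below replaces the predicted response with a known surface on the unit square (an assumption chosen so the exact integral is 1) and compares the midpoint and Monte Carlo approximations; in both cases the area for each point is the domain area divided by the number of points.

```r
# Known "density" surface on the unit square; its exact integral is 1
mu <- function(x, y) x + y

# Midpoint approximation: evaluate at the centre of each cell of a 10 x 10 grid
mid <- (seq_len(10) - 0.5) / 10
grid <- expand.grid(x = mid, y = mid)
area_grid <- rep(1 / nrow(grid), nrow(grid))   # domain area / number of cells
total_midpoint <- sum(mu(grid$x, grid$y) * area_grid)

# Monte Carlo approximation: evaluate at uniformly random locations
set.seed(1)
n_mc <- 10000
pts <- data.frame(x = runif(n_mc), y = runif(n_mc))
area_mc <- rep(1 / n_mc, n_mc)                 # domain area / number of draws
total_mc <- sum(mu(pts$x, pts$y) * area_mc)

c(midpoint = total_midpoint, monte_carlo = total_mc)
```

Both totals should be close to 1; the Monte Carlo value varies with the random seed and the number of draws.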
In more complicated cases, an analyst can then use covariate to calculate the
abundance-weighted average of a covariate across these points. For example, if the covariate is
positional coordinates or depth/elevation, then type=2
measures shifts in the average habitat utilization with respect to that covariate.
Alternatively, an analyst fitting a multivariate model might weight one variable
based on another using weighting_index, e.g.,
to calculate abundance-weighted average condition, or
predator-expanded stomach contents.
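The abundance-weighted average of a covariate can be sketched with a few made-up numbers. The densities, areas, and northing values below are hypothetical, and weighted_centroid is a helper defined only for this illustration; it is not part of the package.

```r
# Hypothetical predicted densities at four locations along a north-south axis
northing <- c(10, 20, 30, 40)   # covariate (e.g., km north)
area     <- c(2, 2, 2, 2)       # area represented by each location
mu_early <- c(4, 3, 2, 1)       # predicted density, earlier period
mu_late  <- c(1, 2, 3, 4)       # predicted density, later period

# Abundance-weighted mean of the covariate: sum( x * nu / sum(nu) )
weighted_centroid <- function(mu, area, covariate) {
  nu <- mu * area
  sum(covariate * nu / sum(nu))
}

centroid_early <- weighted_centroid(mu_early, area, northing)
centroid_late  <- weighted_centroid(mu_late, area, northing)

# A positive difference indicates a northward shift in distribution
centroid_late - centroid_early
```

Comparing the weighted average between periods is one way to express a distribution shift with respect to the chosen covariate.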
In practice, spatial integration in a multivariate model requires two passes through the rows of
newdata when calculating a total value. In the following, we
write equations using C++ indexing conventions such that indexing starts with 0,
to match the way that integrate_output expects indices to be supplied.
Given inverse-link-transformed predictor \( \mu_g \),
function argument type as \( type_g \),
function argument area as \( a_g \),
function argument covariate as \( x_g \), and
function argument weighting_index as \( h_g \),
the first pass calculates:
$$ \nu_g = \mu_g a_g $$
where the total value from this first pass is calculated as:
$$ \nu^* = \sum_{g=0}^{G-1} \nu_g $$
The second pass then applies a further weighting, which depends upon \( type_g \), and potentially upon \( x_g \) and \( h_g \).
If \(type_g = 0\) then \(\phi_g = 0\)
If \(type_g = 1\) then \(\phi_g = \nu_g\)
If \(type_g = 2\) then \(\phi_g = x_g \frac{\nu_g}{\nu^*} \)
If \(type_g = 3\) then \(\phi_g = \frac{\nu_{h_g}}{\nu^*} \mu_g \)
Finally, the total value from this second pass is calculated as:
$$ \phi^* = \sum_{g=0}^{G-1} \phi_g $$
and \(\phi^*\) is returned by integrate_output,
along with a standard error and potentially using
the epsilon bias-correction estimator to correct for skewness and retransformation
bias.
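The two passes can be written out in base R as a minimal sketch that follows the equations above term by term. The function two_pass_total and all input values are hypothetical and are not package code. Because \(\nu^*\) as written sums over every row, the example assumes an area of zero for the condition rows so that only the density rows contribute to \(\nu^*\); that choice belongs to this sketch and is not documented behavior.

```r
# Two-pass calculation following the equations above (a sketch, not package code).
# `h` uses 0-based indexing and is shifted by one for R's 1-based vectors.
two_pass_total <- function(mu, area, type,
                           x = rep(0, length(mu)),
                           h = rep(0, length(mu))) {
  # First pass: area-expanded predictions and their total
  nu <- mu * area
  nu_star <- sum(nu)

  # Second pass: weighting that depends on type
  phi <- numeric(length(mu))      # rows with type = 0 stay at zero
  is1 <- type == 1
  is2 <- type == 2
  is3 <- type == 3
  phi[is1] <- nu[is1]
  phi[is2] <- x[is2] * nu[is2] / nu_star
  phi[is3] <- nu[h[is3] + 1] / nu_star * mu[is3]
  sum(phi)
}

# Hypothetical bivariate example: rows 0-2 are density, rows 3-5 are condition
mu   <- c(10, 30, 60, 0.9, 1.0, 1.2)
area <- c(1, 1, 1, 0, 0, 0)   # assumption: condition rows do not enter nu_star
type <- c(0, 0, 0, 3, 3, 3)
h    <- c(0, 0, 0, 0, 1, 2)   # each condition row points at its density row

# Abundance-weighted condition
weighted_condition <- two_pass_total(mu, area, type, h = h)

# The same function with type = 1 or type = 2 on the density rows alone
total_abundance <- two_pass_total(mu[1:3], area[1:3], type = c(1, 1, 1))
centroid <- two_pass_total(mu[1:3], area[1:3], type = c(2, 2, 2), x = c(1, 2, 3))
```

In this example the density rows use type=0, so they supply weights through h without adding to the total themselves.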
Standard bias-correction using bias.correct=TRUE can be slow, and in
some cases it might be faster to use apply.epsilon=TRUE and intern=TRUE.
However, that option is somewhat experimental, and a user might want to confirm
that the two options give identical results. Note also that using bias.correct=TRUE
will still calculate the standard error, whereas using
apply.epsilon=TRUE and intern=TRUE will not.
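The retransformation bias that these bias-correction options are meant to address can be illustrated independently of the package. The simulation below is not the epsilon estimator and does not call TMB::sdreport(); it only shows, under an assumed normal distribution on the log scale with made-up mean and standard deviation, that back-transforming a point estimate understates the expected value on the natural scale.

```r
set.seed(42)

# Hypothetical quantity that is normally distributed on the log scale
log_mean <- 2     # assumed mean on the log scale
log_sd   <- 0.5   # assumed standard deviation on the log scale
draws <- rnorm(1e5, mean = log_mean, sd = log_sd)

plug_in   <- exp(log_mean)                  # back-transform the point estimate
expected  <- mean(exp(draws))               # simulated mean on the natural scale
lognormal <- exp(log_mean + log_sd^2 / 2)   # closed-form lognormal mean

c(plug_in = plug_in, expected = expected, lognormal = lognormal)
```

The plug-in value is smaller than the other two, and the gap grows with the variance on the log scale.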