| Title: | Variational Multivariate Spatial Small Area Estimation |
|---|---|
| Description: | Variational Autoencoded Multivariate Spatial Fay-Herriot models are designed to efficiently estimate population parameters in small area estimation. This package implements the variational generalized multivariate spatial Fay-Herriot model (VGMSFH) using 'NumPyro' and 'PyTorch' backends, as demonstrated by Wang, Parker, and Holan (2025) <doi:10.48550/arXiv.2503.14710>. The 'vmsae' package provides utility functions to load weights of the pretrained variational autoencoders (VAEs) as well as tools to train custom VAEs tailored to users specific applications. |
| Authors: | Zhenhua Wang [aut, cre], Paul A. Parker [aut, res], Scott H. Holan [aut, res] |
| Maintainer: | Zhenhua Wang <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-04-06 07:52:44 UTC |
| Source: | https://github.com/zhenhua-wang/vmsae |
This method extracts posterior mean estimates of model coefficients from a VGMSFH object. It can return either fixed effect coefficients or spatial random effects.
## S4 method for signature 'VGMSFH' coef(object, var_idx = 1, type = "fixed")## S4 method for signature 'VGMSFH' coef(object, var_idx = 1, type = "fixed")
object |
An object of class |
var_idx |
Integer. The index of the variable of interest (for multivariate models). Default is |
type |
Character. The type of coefficient to extract. Options are:
|
A numeric vector of posterior means for the selected coefficient type.
library(vmsae) example_model <- readRDS(system.file("example", "example_model.Rds", package = "vmsae")) coef(example_model) # Get fixed effect coefficients coef(example_model, type = "spatial") # Get spatial random effectslibrary(vmsae) example_model <- readRDS(system.file("example", "example_model.Rds", package = "vmsae")) coef(example_model) # Get fixed effect coefficients coef(example_model, type = "spatial") # Get spatial random effects
This method computes 95\
## S4 method for signature 'VGMSFH' confint(object, var_idx = 1, field = "yhat_samples")## S4 method for signature 'VGMSFH' confint(object, var_idx = 1, field = "yhat_samples")
object |
An object of class |
var_idx |
Integer. The index of the variable of interest (for multivariate models). Default is |
field |
Character. The name of the slot to summarize (e.g., |
The function extracts posterior samples for the specified variable and then computes quantiles to form 95\
A data frame with two columns:
lower: the 2.5\
upper: the 97.5\
library(vmsae) example_model <- readRDS(system.file("example", "example_model.Rds", package = "vmsae")) confint(example_model) # Get credible intervals for predicted values confint(example_model, field = "beta_samples") # For fixed effectslibrary(vmsae) example_model <- readRDS(system.file("example", "example_model.Rds", package = "vmsae")) confint(example_model) # Get credible intervals for predicted values confint(example_model, field = "beta_samples") # For fixed effects
An S4 class to represent a neural network decoder, used for emulating spatial priors. The class includes parameters for input and output weight matrices and biases, as well as region identifiers.
GEOIDA character vector of region or area identifiers.
W_inAn array representing the input weight matrix of the decoder.
B_inAn array representing the input bias vector of the decoder.
W_outAn array representing the output weight matrix of the decoder.
B_outAn array representing the output bias vector of the decoder.
This function downloads a pretrained VAE model archive from Zenodo, extracts its contents into a specified directory, and removes the downloaded ZIP file after extraction.
download_pretrained_vae(model_name, save_dir, verbose = TRUE)download_pretrained_vae(model_name, save_dir, verbose = TRUE)
model_name |
Character. The name of the model file (without extension) to download.
This should correspond to a |
save_dir |
Character. The local directory where the model should be saved and extracted. |
verbose |
Logical; if |
No return value, called for side effects
## Not run: library(vmsae) # this function is time consuming for the first run install_environment() load_environment() download_pretrained_vae("mo_county", tempdir()) ## End(Not run)## Not run: library(vmsae) # this function is time consuming for the first run install_environment() load_environment() download_pretrained_vae("mo_county", tempdir()) ## End(Not run)
This function creates the vmsae python environment and installs required packages.
install_environment(envname = "vmsae", use_gpu = FALSE)install_environment(envname = "vmsae", use_gpu = FALSE)
envname |
Character. The name of the Python environment to create or update.
Default is |
use_gpu |
Boolean. An indicator for whether to install packages with GPU support.
Default is |
No return value, called for side effects
## Not run: library(vmsae) # this function is time consuming for the first run install_environment() # Install into default "vmsae" environment # this step is time consuming for the first run install_environment("custom") # Install into a custom-named environment ## End(Not run)## Not run: library(vmsae) # this function is time consuming for the first run install_environment() # Install into default "vmsae" environment # this step is time consuming for the first run install_environment("custom") # Install into a custom-named environment ## End(Not run)
This function activates a specified Python virtual environment and sources Python modules used by the vmsae package, including models and python scripts.
load_environment(envname = "vmsae", is_conda = FALSE)load_environment(envname = "vmsae", is_conda = FALSE)
envname |
Character. The name of the Python environment to create or update.
Default is |
is_conda |
Boolean. The indicator for whether the loaded environment is a conda environment.
Default is |
The function loads four Python scripts located in the package's py/ directory:
vgmcar.py
vae.py
train_vae.py
car_dataset.py
The environment must be created beforehand (e.g., using install_environment()),
and must include all Python dependencies required by these modules.
No return value, called for side effects
## Not run: library(vmsae) # this function is time consuming for the first run install_environment() load_environment() # Load default "vmsae" environment # this function is time consuming for the first run install_environment("custom") load_environment("custom") # Load custom virtual environment load_environment("custom", is_conda = TRUE) # Load custom conda environment ## End(Not run)## Not run: library(vmsae) # this function is time consuming for the first run install_environment() load_environment() # Load default "vmsae" environment # this function is time consuming for the first run install_environment("custom") load_environment("custom") # Load custom virtual environment load_environment("custom", is_conda = TRUE) # Load custom conda environment ## End(Not run)
Load a pretrained Variational Autoencoder (VAE) decoder from disk. This function reads the saved PyTorch model weights and corresponding GEOID list, and constructs a Decoder S4 object with the loaded parameters.
load_vae(model_name, save_dir = NULL)load_vae(model_name, save_dir = NULL)
model_name |
Character. The name of the trained VAE model (without |
save_dir |
Character. The directory where the trained VAE model is saved. Defaults to the current directory if |
This function assumes the model was trained and saved using train_vae(), and that the decoder weights are stored in a file compatible with torch::load() (via reticulate). It extracts the decoder input/output weights and biases, along with region GEOIDs, and returns them as an S4 object of class Decoder.
An object of class Decoder, containing the decoder weights and region identifiers.
## Not run: library(vmsae) # this function is time consuming for the first run install_environment() load_environment() decoder <- load_vae(model_name = "mo_county") ## End(Not run)## Not run: library(vmsae) # this function is time consuming for the first run install_environment() load_environment() decoder <- load_vae(model_name = "mo_county") ## End(Not run)
This method plots spatial summaries of results from a VGMSFH object, including model estimates and comparisons with direct estimates.
## S4 method for signature 'VGMSFH' plot(x, shp = NULL, var_idx = 1, type = "compare", verbose = TRUE)## S4 method for signature 'VGMSFH' plot(x, shp = NULL, var_idx = 1, type = "compare", verbose = TRUE)
x |
An object of class |
shp |
An |
var_idx |
Integer. The index of the variable of interest (for multivariate models). |
type |
Character. The type of plot to generate. Options are:
|
verbose |
Logical; if |
The function provides spatial visualization of model results. It supports both univariate and multivariate response settings. When type = "compare", it generates side-by-side choropleth maps for the direct and model-based estimates. When type = "estimate", it plots the posterior mean and standard deviation of the VGMSFH model output.
If no shapefile is provided, a default geometry is loaded from the pretrained repository.
A ggplot object. The plot is rendered to the active device.
library(vmsae) library(sf) example_model <- readRDS(system.file("example", "example_model.Rds", package = "vmsae")) example_shp <- read_sf(system.file("example", "mo_county.shp", package = "vmsae")) plot(example_model, shp = example_shp, type = "compare") plot(example_model, shp = example_shp, type = "estimate", var_idx = 2)library(vmsae) library(sf) example_model <- readRDS(system.file("example", "example_model.Rds", package = "vmsae")) example_shp <- read_sf(system.file("example", "mo_county.shp", package = "vmsae")) plot(example_model, shp = example_shp, type = "compare") plot(example_model, shp = example_shp, type = "estimate", var_idx = 2)
This method provides a summary of posterior samples from a VGMSFH object, including posterior means and credible intervals for a specified parameter field.
## S4 method for signature 'VGMSFH' summary(object, var_idx = 1, field = "beta_samples")## S4 method for signature 'VGMSFH' summary(object, var_idx = 1, field = "beta_samples")
object |
An object of class |
var_idx |
Integer. The index of the variable of interest (for multivariate models). Default is |
field |
Character. The name of the slot in the |
This function extracts the posterior samples for the specified variable index, and combines it with confint() to compute credible intervals. The result is a compact summary table of central tendency and uncertainty.
A data frame with columns:
mean: Posterior mean,
lower: Lower bound of the credible interval,
upper: Upper bound of the credible interval.
library(vmsae) example_model <- readRDS(system.file("example", "example_model.Rds", package = "vmsae")) summary(example_model) # Summary of beta_samples for variable 1 summary(example_model, var_idx = 2, field = "yhat_samples")library(vmsae) example_model <- readRDS(system.file("example", "example_model.Rds", package = "vmsae")) summary(example_model) # Summary of beta_samples for variable 1 summary(example_model, var_idx = 2, field = "yhat_samples")
Trains a Variational Autoencoder (VAE) to learn the spatial structure implied by the Conditional Autoregressive (CAR) prior. The trained VAE parameters are saved and can later be used as a generator within Hamiltonian Monte Carlo (HMC) sampling.
train_vae( W, GEOID, model_name, save_dir, n_samples = 10000, batch_size = 256, epoch = 10000, lr_init = 0.001, lr_min = 1e-07, verbose = TRUE, use_gpu = TRUE )train_vae( W, GEOID, model_name, save_dir, n_samples = 10000, batch_size = 256, epoch = 10000, lr_init = 0.001, lr_min = 1e-07, verbose = TRUE, use_gpu = TRUE )
W |
Matrix. A proximity or adjacency matrix representing spatial relationships. |
GEOID |
Character vector. Identifiers for spatial units (e.g., region or area codes). |
model_name |
Character. The name of the trained VAE model. |
save_dir |
Character. Directory to save the trained VAE model and associated metadata. Defaults to the current working directory. |
n_samples |
Integer. Number of samples to draw from the prior for training. Default is |
batch_size |
Integer. Batch size for VAE training. Default is |
epoch |
Integer. Number of training epochs. Default is |
lr_init |
Numeric. Initial learning rate. Default is |
lr_min |
Numeric. Minimum learning rate at the final epoch. Default is |
verbose |
Logical; if |
use_gpu |
Boolean. Use GPU if available. Default is |
The function requires a configured Python environment via the reticulate interface,
with VAE training implemented in Python. It uses py$train_vae() defined in the
sourced Python modules (see load_environment).
A named list containing:
loss |
Total training loss |
RCL |
Reconstruction error |
KLD |
Kullback–Leibler divergence |
## Not run: library(vmsae) library(sf) # this function is time consuming for the first run install_environment() load_environment() acs_data <- read_sf(system.file("example", "mo_county.shp", package = "vmsae")) W <- readRDS(system.file("example", "W.Rds", package = "vmsae")) loss <- train_vae(W = W, GEOID = acs_data$GEOID, model_name = "test", save_dir = tempdir(), n_samples = 1000, # set to larger values in practice, e.g. 10000. batch_size = 256, epoch = 1000) # set to larger values in practice, e.g. 10000. ## End(Not run)## Not run: library(vmsae) library(sf) # this function is time consuming for the first run install_environment() load_environment() acs_data <- read_sf(system.file("example", "mo_county.shp", package = "vmsae")) W <- readRDS(system.file("example", "W.Rds", package = "vmsae")) loss <- train_vae(W = W, GEOID = acs_data$GEOID, model_name = "test", save_dir = tempdir(), n_samples = 1000, # set to larger values in practice, e.g. 10000. batch_size = 256, epoch = 1000) # set to larger values in practice, e.g. 10000. ## End(Not run)
This function runs the Variational Generalized Multivariate Spatil Fay-Herriot model (VGMSFH) using NumPyro as the inference backend. It loads pretrained VAE decoder weights, prepares the data, and performs posterior sampling.
vgmsfh_numpyro( y, y_sigma, X, W, GEOID, model_name, save_dir = NULL, num_warmup = 1000, num_samples = 1000, verbose = TRUE, use_gpu = FALSE )vgmsfh_numpyro( y, y_sigma, X, W, GEOID, model_name, save_dir = NULL, num_warmup = 1000, num_samples = 1000, verbose = TRUE, use_gpu = FALSE )
y |
Matrix. Response variables (direct estimates). |
y_sigma |
Matrix. Reported standard deviations of the responses. |
X |
Matrix. Covariate matrix. |
W |
Matrix. Proximity or adjacency matrix defining spatial structure. |
GEOID |
Character vector. FIPS codes or other region identifiers used to match with the pretrained VAE model. |
model_name |
Character. The name of the pretrained VAE model. |
save_dir |
Character. The directory where the VAE model is stored. If |
num_warmup |
Integer. Number of warmup (burn-in) iterations. Default is 1000. |
num_samples |
Integer. Number of posterior samples to draw. Default is 1000. |
verbose |
Logical; if |
use_gpu |
Boolean. Use GPU if available. GPU support is recommended only for high-dimensional datasets (e.g., those with more than 1,000 areas). Default is |
This function uses a pretrained VAE decoder to parameterize the CAR prior and enables scalable inference through NumPyro. It is suitable for both univariate and multivariate response modeling in spatial domains.
An object of class VGMSFH, which contains:
direct_estimate: the observed response data,
yhat_samples: posterior samples of the latent population process,
phi_samples: posterior samples of spatial random effects (CAR),
beta_samples: posterior samples of fixed effect coefficients,
other_samples: a list containing all sampled parameters, including mu, delta, and other intermediate quantities.
Wang, Z., Parker, P. A., & Holan, S. H. (2025). Variational Autoencoded Multivariate Spatial Fay-Herriot Models. arXiv:2503.14710. https://arxiv.org/abs/2503.14710
## Not run: library(sf) library(vmsae) # this function is time consuming for the first run install_environment() load_environment() acs_data <- read_sf(system.file("example", "mo_county.shp", package = "vmsae")) y <- readRDS(system.file("example", "y.Rds", package = "vmsae")) y_sigma <- readRDS(system.file("example", "y_sigma.Rds", package = "vmsae")) X <- readRDS(system.file("example", "X.Rds", package = "vmsae")) W <- readRDS(system.file("example", "W.Rds", package = "vmsae")) num_samples <- 1000 # set to larger values in practice, e.g. 10000. model <- vgmsfh_numpyro(y, y_sigma, X, W, GEOID = acs_data$GEOID, model_name = "mo_county", save_dir = NULL, num_samples = num_samples, num_warmup = num_samples) y_hat_np <- model@yhat_samples y_hat_mean_np <- apply(y_hat_np, c(2, 3), mean) y_hat_lower_np <- apply(y_hat_np, c(2, 3), quantile, 0.025) y_hat_upper_np <- apply(y_hat_np, c(2, 3), quantile, 0.975) plot(model, shp = acs_data, type = "compare", var_idx = 2) ## End(Not run)## Not run: library(sf) library(vmsae) # this function is time consuming for the first run install_environment() load_environment() acs_data <- read_sf(system.file("example", "mo_county.shp", package = "vmsae")) y <- readRDS(system.file("example", "y.Rds", package = "vmsae")) y_sigma <- readRDS(system.file("example", "y_sigma.Rds", package = "vmsae")) X <- readRDS(system.file("example", "X.Rds", package = "vmsae")) W <- readRDS(system.file("example", "W.Rds", package = "vmsae")) num_samples <- 1000 # set to larger values in practice, e.g. 10000. model <- vgmsfh_numpyro(y, y_sigma, X, W, GEOID = acs_data$GEOID, model_name = "mo_county", save_dir = NULL, num_samples = num_samples, num_warmup = num_samples) y_hat_np <- model@yhat_samples y_hat_mean_np <- apply(y_hat_np, c(2, 3), mean) y_hat_lower_np <- apply(y_hat_np, c(2, 3), quantile, 0.025) y_hat_upper_np <- apply(y_hat_np, c(2, 3), quantile, 0.975) plot(model, shp = acs_data, type = "compare", var_idx = 2) ## End(Not run)
An S4 class to store results from the Variational Gaussian Markov Small Area Estimation with Fay-Herriot model (VGMSFH). This class holds the posterior samples for various model components as well as the original direct estimates.
model_nameCharacter. The name of the trained VAE model.
direct_estimateArray. Direct estimates of parameters.
yhat_samplesArray. Posterior samples of the estimated parameters.
phi_samplesArray. Posterior samples of the estimated spatial random effects.
beta_samplesArray. Posterior samples of the fixed effect coefficients.
other_samplesList. List of posterior samples of other parameters in the VGMSFH model.