| Title: | Reduced Model Space Bayesian Model Averaging |
|---|---|
| Description: | Implements Bayesian model averaging for settings with many candidate regressors relative to the available sample size, including cases where the number of regressors exceeds the number of observations. By restricting attention to models with at most M regressors, the package supports reduced model space inference, thereby preserving degrees of freedom for estimation. It provides posterior summaries, Extreme Bounds Analysis, model selection procedures, joint inclusion measures, and graphical tools for exploring model probabilities, model size distributions, and coefficient distributions. The methodological approach follows Doppelhofer and Weeks (2009) <doi:10.1002/jae.1046>. |
| Authors: | Krzysztof Beck [aut, cre] (ORCID: <https://orcid.org/0000-0003-3679-2962>) |
| Maintainer: | Krzysztof Beck <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-05-13 09:06:28 UTC |
| Source: | https://github.com/cran/rmsBMA |
This function creates a ranking of best models according to one of the possible criterion (PMP under binomial model prior, PMP under binomial-beta model prior, R^2 under binomial model prior, R^2 under binomial-beta model prior). The function gives two types of tables in three different formats: inclusion table (where 1 indicates presence of the regressor in the model and 0 indicates that the variable is excluded from the model) and estimation results table (it displays the best models and estimation output for those models: point estimates, standard errors, significance level, and R^2).
best_models(bma_list, criterion = 1, best = 5, round = 3, estimate = TRUE)best_models(bma_list, criterion = 1, best = 5, round = 3, estimate = TRUE)
bma_list |
bma object (the result of the bma function) |
criterion |
The criterion that will be used for a basis of the model ranking: |
best |
The number of the best models to be considered |
round |
Parameter indicating the decimal place to which number in the tables should be rounded (default round = 3) |
estimate |
A parameter with values TRUE or FALSE indicating which table should be displayed when
TRUE - table with estimation to the results |
A list with best_models objects:
matrix with inclusion of the regressors in the best models
matrix with estimation output in the best models with regular standard errors
knitr_kable table with inclusion of the regressors in the best models (the best for the display on the console - up to 11 models)
knitr_kable table with estimation output in the best models with regular standard errors (the best for the display on the console - up to 6 models)
gTree table with inclusion of the regressors in the best models (displayed as a plot). Use grid::grid.draw() to display.
gTree table with estimation output in the best models with regular standard errors (displayed as a plot). Use grid::grid.draw() to display.
data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) models <- best_models(bma_list, best = 3) models[[4]]data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) models <- best_models(bma_list, best = 3) models[[4]]
This function calculates bma and related objects for the modelSpace object obtained using model_space function.
bma( modelSpace, EMS = NULL, dilution = 0, dil.Par = 0.5, Narrative = 0, p = NULL, Nar_vec = NULL, round = 6 )bma( modelSpace, EMS = NULL, dilution = 0, dil.Par = 0.5, Narrative = 0, p = NULL, Nar_vec = NULL, round = 6 )
modelSpace |
Model space object (the result of the model_space function) |
EMS |
Expected model size for model binomial and binomial-beta model prior. |
dilution |
Binary parameter: 0 - NO application of a dilution prior; 1 - application of a dilution prior (George 2010). |
dil.Par |
Parameter associated with dilution prior - the exponent of the determinant (George 2010). Used only if parameter dilution=1. |
Narrative |
Binary parameter: 0 - NO application of a Narrative dilution prior; 1 - application of a Narrative dilution prior. |
p |
Parameter or vector that indicates by how much we cut probability of a model with substitutes. |
Nar_vec |
Vector with information on narrative dilution prior where: 0 - the variable has no substitutes; numbers different than 0 denote consecutive groups of variables considered to be substitutes. |
round |
Parameter indicating to which place the function should round up the results in final tables. |
A list with Posterior objects:
pmp_uniform_table - table with results with PMP under binomial model prior
pmp_random_table - table with results with PMP under binomial-beta model prior
eba_object - table with results of Extreme Bounds Analysis
pms_table - table with prior and posterior model sizes
x_names - vector with names of the regressors - to be used by the functions
K - total number of regressors
MS - size of the mode space
EMS - expected model size for binomial and binomial-beta model prior specified by the user (default EMS=K/2)
dilution - parameter indicating use of dilution
for_jointness - table for jointness function
for_best_models - table for best_models function
for_model_pmp - table for model_pmp function
for_model_sizes - table for model_sizes function
alphas - vector with the values of the constant
betas_nonzero - matrix with coefficients on regressors
data("Trade_data", package = "rmsBMA") data <- Trade_data modelSpace <- model_space(data, M = 6) bma_list <- bma(modelSpace) bma_list[[1]]data("Trade_data", package = "rmsBMA") data <- Trade_data modelSpace <- model_space(data, M = 6) bma_list <- bma(modelSpace) bma_list[[1]]
This function draws graphs of the distribution (in the form of histogram or kernel density) of the coefficients for all the considered regressors over the part of the model space that includes this regressors (half of the model space).
bma_list |
bma object (the result of the bma function) |
weight |
Parameter indicating whether the coefficients should be weighted by posterior model probabilities:
|
BW |
Parameter indicating what method should be chosen to find bin widths for the histograms:
|
binW |
A vector with bin widths to be used to construct histograms for the regressors. The vector must be of the size equal to total number of regressors. The vector with bin widths is used only if parameter BW="vec". |
BN |
Parameter taking the values (default: BN = 0): |
num |
A vector with the numbers of bins used to be used to construct histograms for the regressors. The vector must be of the size equal to total number of regressors. The vector with bin widths is used only if parameter BN=1. |
kernel |
A parameter taking the values (default: kernel = 0): |
A list with the graphs of the distribution of coefficients for all the considered regressors.
data("Trade_data", package = "rmsBMA") data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) coefs <- coef_hist(bma_list, kernel = 1) coefs[[1]] coefs[[2]]data("Trade_data", package = "rmsBMA") data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) coefs <- coef_hist(bma_list, kernel = 1) coefs[[1]] coefs[[2]]
The function extracts coefficients or standard errors from the results of g_regression and fast_ols function
coef_to_full(model_coefs, model_row)coef_to_full(model_coefs, model_row)
model_coefs |
a vector with estimated coefficients or standard errors |
model_row |
inclusion vector - row of a model space matrix |
A vector of coefficients coefficients or standard errors from the results of g_regression and fast_ols function
If the data is in the panel form the function assumes it has the following structure
section_1 year_1 y x1 x2 x3 ....
section_2 year_1 y x1 x2 x3 ....
section_3 year_1 y x1 x2 x3 ....
........
section_n year_1 y x1 x2 x3 ....
section_1 year_2 y x1 x2 x3 ....
section_2 year_2 y x1 x2 x3 ....
section_3 year_2 y x1 x2 x3 ....
........
section_n year_2 y x1 x2 x3 ....
........
section_n year_(T-1) y x1 x2 x3 ....
section_1 year_T y x1 x2 x3 ....
section_2 year_T y x1 x2 x3 ....
section_3 year_T y x1 x2 x3 ....
........
section_n year_T y x1 x2 x3 ....
data_prep( data, FE = FALSE, Time = 0, Section = 0, Time_FE = 0, Section_FE = 0, STD = 0 )data_prep( data, FE = FALSE, Time = 0, Section = 0, Time_FE = 0, Section_FE = 0, STD = 0 )
data |
A data file. |
FE |
Binary variable: TRUE - include fixed effect, FALSE - do not include fixed effects. |
Time |
The number of time periods - works only if FE=1. |
Section |
The number of cross-sections - works only if EF=1. |
Time_FE |
Binary variable: 1 - include time fixed effect, 0 - do not include time fixed effects. Works only if EF=1. |
Section_FE |
Binary variable: 1 - include cross-section fixed effect, 0 - do not include cross-section fixed effects. Works only if EF=1. |
STD |
Binary variable: 1 - standardize the data set, 0 - do not standardize the data set. By standardization we mean subtraction of a mean and division by standard deviation of each variable. |
Formatted data set.
y <- matrix(1:20,nrow=20,ncol=1) x1 <- matrix(21:40,nrow=20,ncol=1) x2 <- matrix(41:60,nrow=20,ncol=1) data <- cbind(y,x1,x2) new_data <- data_prep(data,FE=1,Time=5,Section=4,Time_FE=1,Section_FE=1,STD=0) y <- rnorm(20, mean = 0, sd = 1) x1 <- rnorm(20, mean = 0, sd = 1) x2 <- rnorm(20, mean = 0, sd = 1) data <- cbind(y,x1,x2) new_data <- data_prep(data,FE=1,Time=5,Section=4,Time_FE=1,Section_FE=1,STD=1)y <- matrix(1:20,nrow=20,ncol=1) x1 <- matrix(21:40,nrow=20,ncol=1) x2 <- matrix(41:60,nrow=20,ncol=1) data <- cbind(y,x1,x2) new_data <- data_prep(data,FE=1,Time=5,Section=4,Time_FE=1,Section_FE=1,STD=0) y <- rnorm(20, mean = 0, sd = 1) x1 <- rnorm(20, mean = 0, sd = 1) x2 <- rnorm(20, mean = 0, sd = 1) data <- cbind(y,x1,x2) new_data <- data_prep(data,FE=1,Time=5,Section=4,Time_FE=1,Section_FE=1,STD=1)
Prepares a dataset for econometric analysis by applying fixed-effects demeaning (within transformation) and/or standardization to numeric variables. The behavior of the function depends on whether panel identifiers are supplied and whether fixed effects are explicitly requested.
data_preparation( data, id = NULL, time = NULL, fixed_effects = FALSE, effect = c("twoway", "section", "time"), standardize = FALSE )data_preparation( data, id = NULL, time = NULL, fixed_effects = FALSE, effect = c("twoway", "section", "time"), standardize = FALSE )
data |
A data.frame containing the data. |
id |
An optional character string specifying the cross-sectional
(section) identifier. Must be supplied together with |
time |
An optional character string specifying the time identifier.
Must be supplied together with |
fixed_effects |
Logical. If |
effect |
A character string indicating the fixed-effects structure
when |
standardize |
Logical. If |
If both id and time are provided and fixed_effects = TRUE,
the function applies section, time, or two-way fixed-effects demeaning and may
optionally standardize the transformed variables. If fixed_effects = FALSE,
fixed-effects demeaning is skipped even when identifiers are present, and only
standardization (if requested) is applied.
If either id or time is missing, fixed-effects demeaning is not
available and the function requires standardize = TRUE.
For two-way fixed effects, the transformation is:
Standardization consists of subtracting the mean and dividing by the standard deviation of each variable and is applied after fixed-effects demeaning (if any).
The function operates in three modes:
Fixed effects only: fixed_effects = TRUE,
standardize = FALSE.
Fixed effects + standardization: fixed_effects = TRUE,
standardize = TRUE.
Standardization only: fixed_effects = FALSE,
standardize = TRUE.
When id and time are not provided, only the standardization-only
mode is available.
Missing values are ignored when computing means and standard deviations. After fixed-effects demeaning, an intercept term is redundant in subsequent linear regressions.
A data.frame containing only numeric variables used in estimation.
Panel identifiers (id, time) are removed from the output.
Transformed variables preserve their original column names.
df <- migration_panel # Standardization only (panel identifiers present but FE skipped) X <- data_preparation( df, id = "Pair_ID", time = "Year_0", fixed_effects = FALSE, standardize = TRUE ) # Two-way fixed effects with standardization X <- data_preparation( df, id = "Pair_ID", time = "Year_0", fixed_effects = TRUE, effect = "twoway", standardize = TRUE ) # Section fixed effects only X <- data_preparation( df, id = "Pair_ID", time = "Year_0", fixed_effects = TRUE, effect = "section" ) # Standardization only (no panel identifiers) X <- data_preparation(df, standardize = TRUE)df <- migration_panel # Standardization only (panel identifiers present but FE skipped) X <- data_preparation( df, id = "Pair_ID", time = "Year_0", fixed_effects = FALSE, standardize = TRUE ) # Two-way fixed effects with standardization X <- data_preparation( df, id = "Pair_ID", time = "Year_0", fixed_effects = TRUE, effect = "twoway", standardize = TRUE ) # Section fixed effects only X <- data_preparation( df, id = "Pair_ID", time = "Year_0", fixed_effects = TRUE, effect = "section" ) # Standardization only (no panel identifiers) X <- data_preparation(df, standardize = TRUE)
Computes Extreme Bounds Analysis (EBA) summaries for the intercept and each
regressor across a model space. For each coefficient, the function reports:
the minimum coefficient ("Low"), maximum coefficient ("High"), the mean
coefficient ("Mean_coef"), and corresponding "extreme bounds" defined as
and ,
where is the standard error associated
with the coefficient estimate in the model attaining the minimum/maximum.
eba(betas, VAR, Reg_ID, var_tol = 0)eba(betas, VAR, Reg_ID, var_tol = 0)
betas |
Numeric matrix of dimension |
VAR |
Numeric matrix of dimension |
Reg_ID |
Numeric or integer matrix of dimension |
var_tol |
Nonnegative numeric scalar used as a tolerance when checking
variance positivity. Entries with |
The intercept (constant) is assumed to be included in all models. Each
regressor is summarized only over models in which it is included, as indicated
by the model-inclusion matrix Reg_ID.
A numeric matrix of dimension (K+1) x 5 with columns:
evaluated at the model
where is minimal.
Minimum coefficient value across relevant models.
Mean coefficient across relevant models (intercept: all models; regressor: included models only).
Maximum coefficient value across relevant models.
evaluated at the model
where is maximal.
Rows correspond to the intercept (row 1) and regressors (rows 2..K+1).
If a regressor is never included (no 1s in its column of Reg_ID), its row
will contain NA.
OLS calculation with additional objects
fast_ols(y, x)fast_ols(y, x)
y |
A vector with the dependent variable. |
x |
A matrix with with regressors as columns. |
A list with OLS objects: Coefficients, Standard errors, Marginal likelihood, R^2, Degrees of freedom, Determinant of the regressors' matrix.
x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) e<-rnorm(10, mean = 0, sd = 0.5) y<-2+x1+2*x2+e x<-cbind(x1,x2) fast_ols(y,x)x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) e<-rnorm(10, mean = 0, sd = 0.5) y<-2+x1+2*x2+e x<-cbind(x1,x2) fast_ols(y,x)
OLS calculation with additional objects for model with just a constant.
fast_ols_const(y)fast_ols_const(y)
y |
A vector with the dependent variable. |
A list with OLS objects: Coefficients, Standard errors, Marginal likelihood, R^2, Degrees of freedom, Determinant of the regressors matrix.
x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) e<-rnorm(10, mean = 0, sd = 0.5) y<-2+x1+2*x2+e x<-cbind(x1,x2) fast_ols_const(y)x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) e<-rnorm(10, mean = 0, sd = 0.5) y<-2+x1+2*x2+e x<-cbind(x1,x2) fast_ols_const(y)
OLS calculation with heteroscedasticity consistent covariance matrix (MacKinnon & White 1985).
fast_ols_HC(y, x)fast_ols_HC(y, x)
y |
A vector with the dependent variable. |
x |
A matrix with with regressors as columns. |
A list with OLS objects: Coefficients, Standard errors, Marginal likelihood, R^2, Degrees of freedom, Determinant of the regressors' matrix.
x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) e<-rnorm(10, mean = 0, sd = 0.5) y<-2+x1+2*x2+e x<-cbind(x1,x2) fast_ols_HC(y,x)x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) e<-rnorm(10, mean = 0, sd = 0.5) y<-2+x1+2*x2+e x<-cbind(x1,x2) fast_ols_HC(y,x)
The function implements Bayesian regression with g prior (Zellner, 1986)
g_regression(data, g = "UIP")g_regression(data, g = "UIP")
data |
A matrix with data. The first column is interpreted as with the dependent variable, while the remaining columns are interpreted as regressors. |
g |
Value for g in the g prior. Either a number above zero specified by the user or: |
A list with g_regression objects:
Expected values of coefficients
Posterior standard errors
Natural logarithm of marginal likelihood
R^2 form ols model
Degrees of freedom
Determinant of the regressors' matrix
x1 <- rnorm(100, mean = 0, sd = 1) x2 <- rnorm(100, mean = 0, sd = 2) e <- rnorm(100, mean = 0, sd = 5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression(data, g = "UIP") g_result[[1]] g_result[[2]] x1 <- rnorm(50, mean = 0, sd = 1) x2 <- rnorm(50, mean = 0, sd = 2) e <- rnorm(50, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression(data, g = "benchmark") g_result[[1]] g_result[[2]]x1 <- rnorm(100, mean = 0, sd = 1) x2 <- rnorm(100, mean = 0, sd = 2) e <- rnorm(100, mean = 0, sd = 5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression(data, g = "UIP") g_result[[1]] g_result[[2]] x1 <- rnorm(50, mean = 0, sd = 1) x2 <- rnorm(50, mean = 0, sd = 2) e <- rnorm(50, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression(data, g = "benchmark") g_result[[1]] g_result[[2]]
The function implements Bayesian regression with g prior (Zellner, 1986)
g_regression_fast(data, g = 0.5)g_regression_fast(data, g = 0.5)
data |
A matrix with data. The first column is interpreted as with the dependent variable, while the remaining columns are interpreted as regressors. |
g |
Value for g in the g prior. Default value: g = 0.5. |
A list with g_regression objects:
Expected values of coefficients
Posterior standard errors
Natural logarithm of marginal likelihood
R^2 form ols model
Degrees of freedom
Determinant of the regressors' matrix
x1 <- rnorm(100, mean = 0, sd = 1) x2 <- rnorm(100, mean = 0, sd = 2) e <- rnorm(100, mean = 0, sd = 5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression_fast(data, g = 0.99) g_result[[1]] g_result[[2]] x1 <- rnorm(50, mean = 0, sd = 1) x2 <- rnorm(50, mean = 0, sd = 2) e <- rnorm(50, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression_fast(data, g = 1.1) g_result[[1]] g_result[[2]]x1 <- rnorm(100, mean = 0, sd = 1) x2 <- rnorm(100, mean = 0, sd = 2) e <- rnorm(100, mean = 0, sd = 5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression_fast(data, g = 0.99) g_result[[1]] g_result[[2]] x1 <- rnorm(50, mean = 0, sd = 1) x2 <- rnorm(50, mean = 0, sd = 2) e <- rnorm(50, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression_fast(data, g = 1.1) g_result[[1]] g_result[[2]]
The function implements Bayesian regression with g prior (Zellner, 1986) for a model with just a constant
g_regression_fast_const(y, g = 0.5)g_regression_fast_const(y, g = 0.5)
y |
A vector with data - only the dependent variable. |
g |
Value for g in the g prior. Default value: g = 0.5. |
A list with g_regression objects:
Expected values of coefficients
Posterior standard errors
Natural logarithm of marginal likelihood
R^2 form ols model
Degrees of freedom
Determinant of the regressors' matrix
x1 <- rnorm(100, mean = 0, sd = 1) x2 <- rnorm(100, mean = 0, sd = 2) e <- rnorm(100, mean = 0, sd = 5) y <- 2 + x1 + 2*x2 + e g_result <- g_regression_fast_const(y, g = 0.99) g_result[[1]] g_result[[2]] x1 <- rnorm(50, mean = 0, sd = 1) x2 <- rnorm(50, mean = 0, sd = 2) e <- rnorm(50, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e g_result <- g_regression_fast_const(y, g = 1.1) g_result[[1]] g_result[[2]]x1 <- rnorm(100, mean = 0, sd = 1) x2 <- rnorm(100, mean = 0, sd = 2) e <- rnorm(100, mean = 0, sd = 5) y <- 2 + x1 + 2*x2 + e g_result <- g_regression_fast_const(y, g = 0.99) g_result[[1]] g_result[[2]] x1 <- rnorm(50, mean = 0, sd = 1) x2 <- rnorm(50, mean = 0, sd = 2) e <- rnorm(50, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e g_result <- g_regression_fast_const(y, g = 1.1) g_result[[1]] g_result[[2]]
The function implements Bayesian regression with g prior (Zellner, 1986)
g_regression_fast_HC(data, g = 0.5)g_regression_fast_HC(data, g = 0.5)
data |
A matrix with data. The first column is interpreted as with the dependent variable, while the remaining columns are interpreted as regressors. |
g |
Value for g in the g prior. Default value: g = 0.5. |
A list with g_regression objects:
Expected values of coefficients
Posterior standard errors
Natural logarithm of marginal likelihood
R^2 form ols model
Degrees of freedom
Determinant of the regressors' matrix
x1 <- rnorm(100, mean = 0, sd = 1) x2 <- rnorm(100, mean = 0, sd = 2) e <- rnorm(100, mean = 0, sd = 5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression_fast_HC(data, g = 0.99) g_result[[1]] g_result[[2]] x1 <- rnorm(50, mean = 0, sd = 1) x2 <- rnorm(50, mean = 0, sd = 2) e <- rnorm(50, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression_fast_HC(data, g = 1.1) g_result[[1]] g_result[[2]]x1 <- rnorm(100, mean = 0, sd = 1) x2 <- rnorm(100, mean = 0, sd = 2) e <- rnorm(100, mean = 0, sd = 5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression_fast_HC(data, g = 0.99) g_result[[1]] g_result[[2]] x1 <- rnorm(50, mean = 0, sd = 1) x2 <- rnorm(50, mean = 0, sd = 2) e <- rnorm(50, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2) g_result <- g_regression_fast_HC(data, g = 1.1) g_result[[1]] g_result[[2]]
Computes a group-based dilution factor for each model (row) in a model-inclusion
matrix. Regressors are assigned to "relatedness groups" via Nar_vec. For each
model and each group, the dilution exponent increases by 1 for every additional
regressor from that group beyond the first. The model's dilution factor is the
product of group-specific penalties.
group_dilution(Reg_ID, Nar_vec, p)group_dilution(Reg_ID, Nar_vec, p)
Reg_ID |
An |
Nar_vec |
An integer vector of length |
p |
Either:
If
|
Formally, for model i and group h >= 1, let
and
If p is a scalar, the dilution factor is
.
If p is group-specific with values p_h, then
A numeric vector of length MS containing the dilution factor for each model.
# Example model space: MS models, K regressors Reg_ID <- rbind( c(0,0,0,0,0), # null c(1,1,0,0,0), # two from group 1 -> penalty p_1^(1) c(1,1,1,0,0), # three from group 1 -> penalty p_1^(2) c(1,1,0,1,1) # two from group 1 and two from group 2 -> p_1^1 * p_2^1 ) Nar_vec <- c(1,1,1,2,2) # Scalar p (same for all groups) group_dilution(Reg_ID, Nar_vec, p = 0.7) # Group-specific p (vector in group order: group 1 then group 2) group_dilution(Reg_ID, Nar_vec, p = c(0.7, 0.5)) # Group-specific p with explicit mapping via names group_dilution(Reg_ID, Nar_vec, p = c("1"=0.7, "2"=0.5))# Example model space: MS models, K regressors Reg_ID <- rbind( c(0,0,0,0,0), # null c(1,1,0,0,0), # two from group 1 -> penalty p_1^(1) c(1,1,1,0,0), # three from group 1 -> penalty p_1^(2) c(1,1,0,1,1) # two from group 1 and two from group 2 -> p_1^1 * p_2^1 ) Nar_vec <- c(1,1,1,2,2) # Scalar p (same for all groups) group_dilution(Reg_ID, Nar_vec, p = 0.7) # Group-specific p (vector in group order: group 1 then group 2) group_dilution(Reg_ID, Nar_vec, p = c(0.7, 0.5)) # Group-specific p with explicit mapping via names group_dilution(Reg_ID, Nar_vec, p = c("1"=0.7, "2"=0.5))
This function calculates four types of the jointness measures based on the posterior model probabilities calculated using binomial and binomial-beta model prior. The four measures are:
HCGHM - for Hofmarcher et al. (2018) measure;
LS - for Ley & Steel (2007) measure;
DW - for Doppelhofer & Weeks (2009) measure;
PPI - for posterior probability of including both variables.
The measures under binomial model prior will appear in a table above the diagonal, and the measure calculated under binomial-beta model prior below the diagonal.
REFERENCES
Doppelhofer G, Weeks M (2009) Jointness of growth determinants. Journal of Applied Econometrics., 24(2), 209-244. doi: 10.1002/jae.1046
Hofmarcher P, Crespo Cuaresma J, Grün B, Humer S, Moser M (2018) Bivariate jointness measures in Bayesian Model Averaging: Solving the conundrum. Journal of Macroeconomics, 57, 150-165. doi: 10.1016/j.jmacro.2018.05.005
Ley E, Steel M (2007) Jointness in Bayesian variable selection with applications to growth regression. Journal of Macroeconomics, 29(3), 476-493. doi: 10.1016/j.jmacro.2006.12.002
jointness(bma_list, measure = "HCGHM", rho = 0.5, round = 3)jointness(bma_list, measure = "HCGHM", rho = 0.5, round = 3)
bma_list |
bma object (the result of the bma function) |
measure |
Parameter for choosing the measure of jointness: |
rho |
The parameter "rho" ( |
round |
Parameter indicating the decimal place to which the jointness measures should be rounded (default round = 3). |
A table with jointness measures for all the pairs of regressors used in the analysis. The results obtained with the binomial model prior are above the diagonal, while the ones obtained with the binomial-beta prior are below.
data("Trade_data", package = "rmsBMA") data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) jointness_table <- jointness(bma_list)data("Trade_data", package = "rmsBMA") data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) jointness_table <- jointness(bma_list)
The dataset contains bilateral migration and its economic determinants for 23 European Union countries over the period 1995–2020. Each observation represents a country pair in a given 5 year period, resulting in 353 unique country pairs and four periods (the first period is utilized for lagged data only).
The countries included are: Austria, Belgium, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Netherlands, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden and the United Kingdom
The dataset was used in: Afonso, Aves, Beck (2025), Drivers of migration flows in the European Union: Earnings or unemployment?, International Labour Review, 164(2), 1-23. doi: 10.16995/ilr.18845
data(migration_panel)data(migration_panel)
A data frame with 1012 rows and 14 variables:
Column indication observation period.
Column indication country pair.
The absolute value of the net migration flows scaled by the sum of the population of a given pair of countries.
The absolute value of the net migration flows scaled by the sum of the population of a given pair of countries lagged by one period (5 years).
The absolute value of the difference in unemployment rates, averaged over the 5-year period.
The absolute value of the difference in net earnings expressed in PPP, averaged over the 5-year period.
The absolute value of the difference in mean income tax, averaged over the 5-year period.
The absolute value of the difference in mean social benefits per person, averaged over the 5-year period.
A natural logarithm of the distance between the capital of a given pair of countries based on the shortest route.
The absolute value of the difference in mean annual temperature, averaged over the 5-year period.
The absolute value of the difference in the human capital index (Barro and Lee 2013), averaged over the 5-year period.
The absolute value of the difference in government spending share of GDP, averaged over the 5-year period.
The absolute value of the difference in the Gini coefficient between a pair of countries, averaged over the 5-year period.
The absolute value of the difference in the fertility rate between a pair of countries, averaged over the 5-year period.
The absolute difference in the value of control of corruption measure from the Worldwide Governance Indicator, averaged over the 5-year period.
The absolute value of the difference in the number of intentional homicides per 1 000 inhabitants, averaged over the 5-year period.
Harvard Dataverse. doi:10.7910/DVN/GTOFJB
The function creates a matrix with ones indicating inclusion and zeros indicating inclusion of a regressor in a model
model_matrix(K, M)model_matrix(K, M)
K |
total number of regressors |
M |
maximum number of regressor in a model |
A matrix with ones indicating inclusion and zeros indicating inclusion of a regressor in a model
This function draws four graphs of prior and posterior model probabilities for the best individual models:
a) The results with binomial model prior (based on PMP - posterior model probability)
b) The results with binomial-beta model prior (based on PMP - posterior model probability)
Models on the graph are ordered according to their posterior model probability.
bma_list |
bma_list object (the result of the bma function) |
top |
The number of the best model to be placed on the graphs |
A list with three graphs with prior and posterior model probabilities for individual models:
The results with binomial model prior (based on PMP - posterior model probability)
The results with binomial-beta model prior (based on PMP - posterior model probability)
On graph combining the aforementioned graphs
data("Trade_data", package = "rmsBMA") data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) model_pmps <- model_pmp(bma_list, top = 100) model_pmps[[1]]data("Trade_data", package = "rmsBMA") data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) model_pmps <- model_pmp(bma_list, top = 100) model_pmps[[1]]
This function draws four graphs of prior and posterior model probabilities:
a) The results with binomial model prior (based on PMP - posterior model probability)
b) The results with binomial-beta model prior (based on PMP - posterior model probability)
bma_list |
bma_list object (the result of the bma function) |
A list with three graphs with prior and posterior model probabilities for model sizes:
The results with binomial model prior (based on PMP - posterior model probability)
The results with binomial-beta model prior (based on PMP - posterior model probability)
One graph combining all the aforementioned graphs
data("Trade_data", package = "rmsBMA") data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) sizes <- model_sizes(bma_list) sizes[[1]]data("Trade_data", package = "rmsBMA") data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) sizes <- model_sizes(bma_list) sizes[[1]]
This function calculates all possible models with M regressors that can be constructed out of K regressors.
model_space(data, M = NULL, g = "UIP", HC = FALSE)model_space(data, M = NULL, g = "UIP", HC = FALSE)
data |
Data set to work with. The first column is the data for the dependent variable, and the other columns is the data for the regressors. |
M |
Maximum number of regressor in the estimated models (default is K - total number of regressors). |
g |
Value for g in the g prior. Either a number above zero specified by the user or: |
HC |
Logical indicator (default = FALSE) specifying whether a heteroscedasticity-consistent covariance matrix should be used for the estimation of standard errors (MacKinnon & White 1985). |
A list with model_space objects:
x_names - vector with names of the regressors
ols_results - table with the model space - contains ols objects for all the estimated models
MS - size of the mode space
M - maximum number of regressors in a model
K- total number of regressors
x1 <- rnorm(20, mean = 0, sd = 1) x2 <- rnorm(20, mean = 0, sd = 2) x3 <- rnorm(20, mean = 0, sd = 3) x4 <- rnorm(20, mean = 0, sd = 1) x5 <- rnorm(20, mean = 0, sd = 2) x6 <- rnorm(20, mean = 0, sd = 4) e <- rnorm(20, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2,x3,x4,x5,x6) modelSpace <- model_space(data, M = 3)x1 <- rnorm(20, mean = 0, sd = 1) x2 <- rnorm(20, mean = 0, sd = 2) x3 <- rnorm(20, mean = 0, sd = 3) x4 <- rnorm(20, mean = 0, sd = 1) x5 <- rnorm(20, mean = 0, sd = 2) x6 <- rnorm(20, mean = 0, sd = 4) e <- rnorm(20, mean = 0, sd = 0.5) y <- 2 + x1 + 2*x2 + e data <- cbind(y,x1,x2,x3,x4,x5,x6) modelSpace <- model_space(data, M = 3)
A pre-computed model space object obtained using the
model_space function on the Trade_data_small
dataset with a maximum of 7 regressors and the Unit Information Prior.
A list of length 5 with the following components:
Character vector containing the names of the regressors.
Matrix containing the full model space. Each row corresponds to a model specification and includes:
binary inclusion indicators for regressors,
estimated coefficients (including intercept),
standard errors,
log-marginal likelihood,
,
degrees of freedom,
dilution prior term.
Integer. Total number of models in the model space.
Integer. Maximum number of regressors allowed in each model.
Integer. Total number of available regressors.
The object contains all possible linear regression models constructed from the available regressors (up to 7 included simultaneously), together with their estimated coefficients, standard errors, log-marginal likelihood values, R-squared statistics, degrees of freedom, and dilution prior components.
The g-prior specification corresponds to the Unit Information Prior
(UIP), i.e. , where denotes the sample size.
Generated using:
modelSpace <- model_space(Trade_data_small, M = 7, g = "UIP")
OLS calculation with additional objects
ols(y, x, const, Norm = NULL)ols(y, x, const, Norm = NULL)
y |
A vector with the dependent variable. |
x |
A matrix with with regressors as columns or 0 for model without any regressors. |
const |
Binary variable: 1 - include a constant in the estimation, 0 - do not include a constant in the estimation. |
Norm |
A parameter used to correct likelihood function when it gets to close to zero in the case of high number of observations. |
A list with OLS objects: Coefficients, Standard errors, Marginal likelihood, R^2, Degrees of freedom, Determinant of the regressors' matrix, log(Marginal likelihood).
x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) y<-2+x1+2*x2 x<-cbind(x1,x2) const<-1 ols(y,x,const) x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) e<-rnorm(10, mean = 0, sd = 0.5) y<-2+x1+2*x2+e x<-cbind(x1,x2) const<-1 ols(y,x,const)x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) y<-2+x1+2*x2 x<-cbind(x1,x2) const<-1 ols(y,x,const) x1<-rnorm(10, mean = 0, sd = 1) x2<-rnorm(10, mean = 0, sd = 2) e<-rnorm(10, mean = 0, sd = 0.5) y<-2+x1+2*x2+e x<-cbind(x1,x2) const<-1 ols(y,x,const)
Computes posterior probabilities of a positive coefficient sign, P(+), for the intercept and each regressor by averaging model-specific probabilities across the model space, weighted by posterior model probabilities.
plus_pmp_from_pmp(pmp_uniform, pmp_random, betas, VAR, DF, Reg_ID)plus_pmp_from_pmp(pmp_uniform, pmp_random, betas, VAR, DF, Reg_ID)
pmp_uniform |
Numeric vector of length |
pmp_random |
Numeric vector of length |
betas |
Numeric matrix of dimension |
VAR |
Numeric matrix of dimension |
DF |
Numeric vector of length |
Reg_ID |
Numeric or integer matrix of dimension |
For a given model and coefficient , the contribution is
where is the CDF of the Student-
distribution with degrees of freedom.
The intercept is included in all models, while each regressor contributes
only in those models in which it is included, as indicated by the model
inclusion matrix Reg_ID.
The posterior probability of a positive sign for coefficient is
computed as
where denotes the set of models that include regressor
. For the intercept, contains all models.
This definition follows the sign-probability interpretation in Doppelhofer and Weeks (2009).
A list with two elements:
A (K+1) x 1 numeric matrix containing
posterior probabilities of a positive coefficient sign under the
uniform model prior. The first row corresponds to the intercept.
A (K+1) x 1 numeric matrix containing
posterior probabilities of a positive coefficient sign under the
random model prior. The first row corresponds to the intercept.
Doppelhofer, G. and Weeks, M. (2009). Jointness of growth determinants. Journal of Applied Econometrics, 24(2), 209–244.
This function draws graphs of the posterior densities of all the coefficients of interest.
bma_list |
bma object (the result of the bma function) |
prior |
Parameter indicating which model prior should be used for calculations:
|
A list with the graphs of the posterior densities of coefficients for all the considered regressors.
data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) distPlots <- posterior_dens(bma_list, prior = "binomial") distPlots[[1]] distPlots[[2]]data <- Trade_data[,1:10] modelSpace <- model_space(data, M = 9, g = "UIP") bma_list <- bma(modelSpace) distPlots <- posterior_dens(bma_list, prior = "binomial") distPlots[[1]] distPlots[[2]]
Computes the posterior probability that a regression coefficient has the same sign as its posterior mean, following the sign-certainty measure used in Sala-i-Martin, Doppelhofer and Miller (2004) and Doppelhofer and Weeks (2009).
posterior_sign_certainty(pmp_uniform, pmp_random, betas, VAR, DF, Reg_ID)posterior_sign_certainty(pmp_uniform, pmp_random, betas, VAR, DF, Reg_ID)
pmp_uniform |
Numeric vector of length |
pmp_random |
Numeric vector of length |
betas |
Numeric matrix of dimension |
VAR |
Numeric matrix of dimension |
DF |
Numeric vector of length |
Reg_ID |
Numeric or integer matrix of dimension |
For each coefficient , define
where is the CDF of the Student-
distribution with degrees of freedom.
The posterior sign certainty probability is then defined as
The intercept is included in all models. For slope coefficients that are
excluded from a given model, the contribution to is set to
, reflecting a symmetric distribution centered at zero.
A list with four elements:
A (K+1) x 1 numeric matrix containing posterior
sign certainty probabilities under the uniform model prior.
A (K+1) x 1 numeric matrix containing posterior
sign certainty probabilities under the random model prior.
A (K+1) x 1 numeric matrix of posterior means
under the uniform model prior.
A (K+1) x 1 numeric matrix of posterior means
under the random model prior.
Sala-i-Martin, X., Doppelhofer, G., and Miller, R. I. (2004). Determinants of long-term growth: A Bayesian averaging of classical estimates. American Economic Review, 94(4), 813–835.
Doppelhofer, G. and Weeks, M. (2009). Jointness of growth determinants. Journal of Applied Econometrics, 24(2), 209–244.
The function creates a matrix with regressors that should be included in a specific model based on inclusion vector
subset_design(x, model_row)subset_design(x, model_row)
x |
matrix with regressors |
model_row |
inclusion vector - row of a model space matrix |
A matrix with regressors to be used in a specific model
The dataset contains bilateral trade and its economic determinants for 26 European Union countries over the period 1995–2015. Each observation represents a country pair, resulting in 325 unique trading pairs.
The countries included are: Austria, Belgium, Bulgaria, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, the Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, and the United Kingdom.
The dataset was used in: Beck (2020), What drives international trade? Robust analysis for the European Union, Journal of International Studies, 13(3), 68–84.
data(Trade_data)data(Trade_data)
A data frame with 325 rows and 18 variables:
Natural logarithm of bilateral trade between two countries.
Border dummy (1 if countries share a common border; 0 otherwise).
Natural logarithm of the shortest distance between capital cities.
Common language dummy (1 if countries share at least one official language; 0 otherwise).
Natural logarithm of the product of real GDPs of the two countries.
Absolute difference in real GDP per capita.
Absolute difference in government expenditure shares of GDP.
Absolute difference in human capital indicators.
Absolute difference in capital per worker.
Absolute difference in the standard deviation of inflation rates.
Absolute difference in arable land.
Absolute difference in arable land per worker.
Absolute difference in total land area.
Absolute difference in land area per capita.
Absolute difference in electricity consumption per capita.
Absolute difference in foreign direct investment flows.
Krugman specialization index (value added, 35 sectors).
Absolute difference in the Bayesian Corruption Index.
Harvard Dataverse. doi:10.7910/DVN/JMMOEA
The dataset contains bilateral trade and its economic determinants for 26 European Union countries over the period 1995–2015. Each observation represents a country pair, resulting in 325 unique trading pairs.
The countries included are: Austria, Belgium, Bulgaria, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, the Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, and the United Kingdom.
The dataset was used in: Beck (2020), What drives international trade? Robust analysis for the European Union, Journal of International Studies, 13(3), 68–84.
data(Trade_data_small)data(Trade_data_small)
A data frame with 325 rows and 11 variables:
Natural logarithm of bilateral trade between two countries.
Border dummy (1 if countries share a common border; 0 otherwise).
Natural logarithm of the shortest distance between capital cities.
Common language dummy (1 if countries share at least one official language; 0 otherwise).
Natural logarithm of the product of real GDPs of the two countries.
Absolute difference in human capital indicators.
Absolute difference in the standard deviation of inflation rates.
Absolute difference in arable land.
Absolute difference in arable land per worker.
Absolute difference in total land area.
Absolute difference in land area per capita.
Harvard Dataverse. doi:10.7910/DVN/JMMOEA