Title: | Functions for Stochastic Search Variable Selection (SSVS) |
---|---|
Description: | Functions for performing stochastic search variable selection (SSVS) for binary and continuous outcomes and visualizing the results. SSVS is a Bayesian variable selection method used to estimate the probability that individual predictors should be included in a regression model. Using MCMC estimation, the method samples thousands of regression models in order to characterize the model uncertainty regarding both the predictor set and the regression parameters. For details see Bainter, McCauley, Wager, and Losin (2020) Improving practices for selecting a subset of important predictors in psychology: An application to predicting pain, Advances in Methods and Practices in Psychological Science 3(1), 66-80 <DOI:10.1177/2515245919885617>. |
Authors: | Sierra Bainter [cre, aut], Thomas McCauley [aut], Mahmoud Fahmy [aut], Dean Attali [aut] |
Maintainer: | Sierra Bainter <[email protected]> |
License: | GPL-3 |
Version: | 2.0.0 |
Built: | 2024-11-08 05:30:00 UTC |
Source: | https://github.com/sabainter/ssvs |
Example dataset for the ssvs function: a data frame with 74 records and 76 variables.
dat
An object of class data.frame with 74 rows and 76 columns.
Run an interactive analysis tool (Shiny app) that lets you perform SSVS in a browser
launch()
Plot results of an SSVS model
## S3 method for class 'ssvs'
plot(x, threshold = 0.5, legend = TRUE, title = NULL, color = TRUE, ...)
x |
An SSVS result object obtained from |
threshold |
A marginal inclusion probability (MIP) threshold to show on the plot; must be between 0 and 1.
If |
legend |
If |
title |
The title of the plot. Set to |
color |
If |
... |
Ignored |
Creates a plot of the inclusion probabilities by variable
outcome <- "qsec"
predictors <- c("cyl", "disp", "hp", "drat", "wt", "vs", "am", "gear", "carb", "mpg")
results <- ssvs(x = predictors, y = outcome, data = mtcars, progress = FALSE)
plot(results)
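As a rough illustration of what the threshold line on the plot separates, the sketch below draws a base-graphics dot chart of MIPs with a dashed reference line. The MIP values are made up for illustration, the `>=` comparison is an assumption, and this is not the package's own plotting code.

```r
# Hypothetical MIPs for illustration only; not results from any fitted model.
mip <- c(wt = 0.92, hp = 0.61, drat = 0.34, gear = 0.08)
threshold <- 0.5

# Assumption: a predictor counts as "above" when its MIP is >= threshold.
above <- mip >= threshold

sorted_mip <- sort(mip)
dotchart(sorted_mip, xlim = c(0, 1), xlab = "Marginal inclusion probability",
         color = ifelse(sorted_mip >= threshold, "black", "grey60"))
abline(v = threshold, lty = 2)  # dashed reference line at the threshold

names(mip)[above]  # predictors at or above the threshold
```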
For continuous outcomes, a basic Gibbs sampler is used. For binary outcomes, BoomSpikeSlab::logit.spike() is used.
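To make the idea of a Gibbs sampler for variable selection concrete, here is a minimal sketch of a generic spike-and-slab sampler for a continuous outcome. It is not the package's implementation: the point-mass spike, the update order, and the function name `spike_slab_gibbs` are assumptions chosen for illustration, and the argument names simply mirror the ssvs() signature for readability.

```r
# Generic point-mass spike-and-slab Gibbs sampler (illustrative assumption,
# not the code used inside ssvs()).
spike_slab_gibbs <- function(X, y, inprob = 0.5, runs = 2000, burn = 500,
                             a1 = 0.01, b1 = 0.01, prec.beta = 0.1) {
  X <- scale(X)        # standardize predictors
  y <- y - mean(y)     # center the outcome so no intercept is needed
  n <- nrow(X)
  p <- ncol(X)
  xtx <- colSums(X^2)
  beta <- rep(0, p)
  tau <- 1             # residual precision (1 / residual variance)
  keep <- matrix(0, runs - burn, p, dimnames = list(NULL, colnames(X)))

  for (it in seq_len(runs)) {
    for (j in seq_len(p)) {
      # Residual after removing the contribution of all other predictors
      r <- y - X[, -j, drop = FALSE] %*% beta[-j]
      v <- tau * xtx[j] + prec.beta        # conditional posterior precision
      m <- tau * sum(X[, j] * r) / v       # conditional posterior mean
      # Log posterior odds of including predictor j, with beta_j integrated out
      log_odds <- qlogis(inprob) + 0.5 * log(prec.beta / v) + 0.5 * v * m^2
      include <- runif(1) < plogis(log_odds)
      beta[j] <- if (include) rnorm(1, m, sqrt(1 / v)) else 0
    }
    # Gamma(a1, b1) prior on the precision gives a Gamma full conditional
    resid <- y - X %*% beta
    tau <- rgamma(1, shape = a1 + n / 2, rate = b1 + sum(resid^2) / 2)
    if (it > burn) keep[it - burn, ] <- beta   # store post-burn-in draws only
  }
  keep
}

set.seed(1)
draws <- spike_slab_gibbs(mtcars[, c("wt", "hp", "drat")], mtcars$qsec)
mip <- colMeans(draws != 0)  # share of retained draws that include each predictor
mip
```

Each retained row of `draws` corresponds to one sampled model: a zero coefficient means the predictor was excluded in that iteration, so the column-wise share of non-zero draws estimates the marginal inclusion probability.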
ssvs(
  data,
  y,
  x,
  continuous = TRUE,
  inprob = 0.5,
  runs = 20000,
  burn = 5000,
  a1 = 0.01,
  b1 = 0.01,
  prec.beta = 0.1,
  progress = TRUE
)
data |
The data frame from which predictor and response values are extracted |
y |
The response variable |
x |
The set of predictor variables |
continuous |
If |
inprob |
Prior inclusion probability value, which applies to all predictors. The prior inclusion probability reflects the prior belief that each predictor should be included in the model. A prior inclusion probability of .5 reflects the belief that each predictor has an equal probability of being included or excluded. Note that a value of .5 also implies a prior belief that the true model contains half of the candidate predictors. The prior inclusion probability will influence the magnitude of the marginal inclusion probabilities (MIPs), but the relative pattern of MIPs is expected to remain fairly consistent, see Bainter et al. (2020) for more information. |
runs |
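Total number of iterations (including burn-in). Results are based on the runs - burn iterations that remain after burn-in is discarded; with the defaults (runs = 20000, burn = 5000), that is 15000 iterations. |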
burn |
Number of burn-in iterations. Burn-in iterations are discarded warmup iterations used to achieve MCMC convergence. You may increase the number of burn-in iterations if you are having convergence issues. |
a1 |
Prior parameter for the Gamma(a, b) distribution on the residual precision (1/residual variance).
Only used when |
b1 |
Prior parameter for the Gamma(a, b) distribution on the residual precision (1/residual variance).
Only used when |
prec.beta |
Prior precision (1/variance) for beta coefficients.
Only used when |
progress |
If |
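The note on inprob above says that a value of .5 implies a prior belief that half of the candidate predictors belong in the model. The short sketch below works through that arithmetic, assuming the prior treats predictors as independent with a common inclusion probability (an assumption; the source states only that one value applies to all predictors).

```r
n_predictors <- 10   # number of candidate predictors (matches the mtcars examples)

# Expected prior model size is the number of predictors times inprob
expected_size <- function(inprob) n_predictors * inprob
expected_size(0.5)
expected_size(0.2)

# Under the independence assumption, prior model size is binomial
size_probs <- dbinom(0:n_predictors, size = n_predictors, prob = 0.5)
names(size_probs) <- 0:n_predictors
round(size_probs, 3)
```

Lowering inprob shifts the prior toward smaller models, which is one reason the magnitude of the MIPs can change even when their relative ordering stays similar.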
An SSVS object that can be used in summary() or plot().
# Example 1: continuous response variable
outcome <- "qsec"
predictors <- c("cyl", "disp", "hp", "drat", "wt", "vs", "am", "gear", "carb", "mpg")
results <- ssvs(data = mtcars, x = predictors, y = outcome, progress = FALSE)

# Example 2: binary response variable
library(AER)
data(Affairs)
Affairs$hadaffair[Affairs$affairs > 0] <- 1
Affairs$hadaffair[Affairs$affairs == 0] <- 0
outcome <- "hadaffair"
predictors <- c("gender", "age", "yearsmarried", "children", "religiousness", "education", "occupation", "rating")
results <- ssvs(data = Affairs, x = predictors, y = outcome, continuous = FALSE, progress = FALSE)
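The two-step recode in Example 2 can also be written as a single vectorized comparison. The sketch below shows the idea on a small made-up data frame (the `toy` object and its values are hypothetical) so that it runs without the AER package.

```r
# Toy data standing in for a count variable; values are made up for illustration.
toy <- data.frame(affairs = c(0, 3, 0, 7, 1))

# TRUE/FALSE from the comparison becomes 1/0; NA counts would stay NA.
toy$hadaffair <- as.integer(toy$affairs > 0)
toy$hadaffair
```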
Summarize results from SSVS, including marginal inclusion probabilities (MIPs), Bayesian model-averaged parameter estimates, and highest posterior density credible intervals. The credible interval level is set by interval (0.89 by default). Estimates and credible intervals are based on standardized X variables.
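Because the estimates are reported for standardized predictors, it can help to see what standardization does to the inputs. This is a minimal sketch using base R's scale(); whether the package standardizes in exactly this way is an assumption, since the source does not say.

```r
predictors <- c("cyl", "disp", "hp", "drat", "wt")

# Center each column at 0, then divide by its standard deviation
X_std <- scale(mtcars[, predictors])

round(colMeans(X_std), 10)   # column means after standardizing
apply(X_std, 2, sd)          # column standard deviations after standardizing

# A coefficient on standardized wt describes a change of this many raw units
sd(mtcars$wt)
```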
## S3 method for class 'ssvs'
summary(object, interval = 0.89, threshold = 0, ordered = FALSE, ...)
object |
An SSVS result object obtained from |
interval |
The desired probability for the credible interval, specified as a decimal |
threshold |
Minimum MIP threshold where a predictor will be shown in the output, specified as a decimal |
ordered |
If |
... |
Ignored |
A dataframe with results
outcome <- "qsec"
predictors <- c("cyl", "disp", "hp", "drat", "wt", "vs", "am", "gear", "carb", "mpg")
results <- ssvs(data = mtcars, x = predictors, y = outcome, progress = FALSE)
summary(results, interval = 0.9, ordered = TRUE)
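For readers who want to see how quantities like those in the summary can be derived from posterior draws, the sketch below computes an MIP, a model-averaged estimate, and a highest posterior density interval for one coefficient. The draws are simulated for illustration, `hpd_interval` is a hypothetical helper, and these are common textbook definitions rather than a description of the package's internals.

```r
# Shortest interval containing at least `prob` of the draws (hypothetical helper)
hpd_interval <- function(draws, prob = 0.89) {
  sorted <- sort(draws)
  n <- length(sorted)
  width <- ceiling(prob * n)               # number of draws the interval must cover
  starts <- seq_len(n - width + 1)         # every candidate left endpoint
  lengths <- sorted[starts + width - 1] - sorted[starts]
  best <- which.min(lengths)               # narrowest candidate window
  c(lower = sorted[best], upper = sorted[best + width - 1])
}

set.seed(42)
# Simulated draws for one coefficient: excluded (exactly 0) in 400 of 1000 iterations
beta_draws <- c(rep(0, 400), rnorm(600, mean = 0.5, sd = 0.1))

mip <- mean(beta_draws != 0)       # marginal inclusion probability
bma_estimate <- mean(beta_draws)   # averages over models, zeros included
interval <- hpd_interval(beta_draws, prob = 0.89)

mip
bma_estimate
interval
```

Averaging over the zeros shrinks the estimate toward 0 in proportion to how often the predictor was excluded, which is why a model-averaged estimate is typically smaller in magnitude than the estimate from a model that always includes the predictor.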