Title: | The SHAPBoost Feature Selection Algorithm |
Version: | 1.0.0 |
Description: | The implementation of SHAPBoost, a boosting-based feature selection technique that ranks features iteratively based on Shapley values. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | xgboost, SHAPforxgboost, methods, caret, Matrix |
Suggests: | flare, survival |
URL: | https://github.com/O-T-O-Z/SHAPBoost-R |
BugReports: | https://github.com/O-T-O-Z/SHAPBoost-R/issues |
NeedsCompilation: | no |
Packaged: | 2025-09-22 09:04:24 UTC; o.t.ozyilmaz |
Author: | Ömer Tarik Özyilmaz
|
Maintainer: | Ömer Tarik Özyilmaz <o.t.ozyilmaz@umcg.nl> |
Repository: | CRAN |
Date/Publication: | 2025-09-29 16:40:02 UTC |
SHAPBoostEstimator Class
Description
This class implements the SHAPBoost algorithm for feature selection. It is designed to be extended by specific implementations such as SHAPBoostRegressor and SHAPBoostSurvival. Any new method should implement the abstract methods defined in this class.
Fields
evaluator
The model that is used to evaluate each additional feature.
metric
A character string representing the evaluation metric.
xgb_params
A list of parameters for the XGBoost model.
number_of_folds
The number of folds for cross-validation.
epsilon
A small value to determine convergence.
max_number_of_features
The maximum number of features to select.
siso_ranking_size
The number of features to consider in the SISO ranking.
siso_order
The order of combinations to consider in SISO.
reset
A logical indicating whether to reset the weights.
num_resets
The number of resets allowed.
fold_random_state
The random state for reproducibility in cross-validation.
verbose
The verbosity level of the output.
stratification
A logical indicating whether to use stratified sampling. Only applicable for c-index metric.
collinearity_check
A logical indicating whether to check for collinearity.
correlation_threshold
The threshold for correlation to consider features as collinear.
Examples
if (requireNamespace("flare", quietly = TRUE)) {
data("eyedata", package = "flare")
shapboost <- SHAPBoostRegressor$new(
max_number_of_features = 1,
evaluator = "lr",
metric = "mae",
siso_ranking_size = 10,
verbose = 0
)
X <- as.data.frame(x)
y <- as.data.frame(y)
subset <- shapboost$fit(X, y)
}
SHAPBoostRegressor is a reference class for regression feature selection through gradient boosting.
Description
This class extends the SHAPBoostEstimator class and implements methods for initializing, updating weights, scoring, and fitting estimators.
Fields
evaluator
The model that is used to evaluate each additional feature. Choice between "lr" and "xgb".
metric
The metric used for evaluation, such as "mae", "mse", or "r2".
xgb_params
A list of parameters for the XGBoost model.
number_of_folds
The number of folds for cross-validation.
epsilon
A small value to prevent division by zero.
max_number_of_features
The maximum number of features to consider.
siso_ranking_size
The size of the SISO ranking.
siso_order
The order of the SISO ranking.
reset
A boolean indicating whether to reset the model.
xgb_importance
The importance type for XGBoost.
num_resets
The number of resets for the model.
fold_random_state
The random state for folds.
verbose
The verbosity level for logging.
stratification
A boolean indicating whether to use stratification. Only applicable for c-index metric.
use_shap
A boolean indicating whether to use SHAP values.
collinearity_check
A boolean indicating whether to check for collinearity.
correlation_threshold
The threshold for correlation to consider features as collinear.
Examples
if (requireNamespace("flare", quietly = TRUE)) {
data("eyedata", package = "flare")
shapboost <- SHAPBoostRegressor$new(
max_number_of_features = 1,
evaluator = "lr",
metric = "mae",
siso_ranking_size = 10,
verbose = 0
)
X <- as.data.frame(x)
y <- as.data.frame(y)
subset <- shapboost$fit(X, y)
}
SHAPBoostSurvival is a reference class for survival analysis feature selection through gradient boosting.
Description
This class extends the SHAPBoostEstimator class and implements methods for initializing, updating weights, scoring, and fitting estimators.
Fields
evaluator
The model that is used to evaluate each additional feature. Choice between "coxph" and "xgb".
metric
The metric used for evaluation, such as "mae", "mse", or "r2".
xgb_params
A list of parameters for the XGBoost model.
number_of_folds
The number of folds for cross-validation.
epsilon
A small value to prevent division by zero.
max_number_of_features
The maximum number of features to consider.
siso_ranking_size
The size of the SISO ranking.
siso_order
The order of the SISO ranking.
reset
A boolean indicating whether to reset the model.
xgb_importance
The importance type for XGBoost.
num_resets
The number of resets for the model.
fold_random_state
The random state for folds.
verbose
The verbosity level for logging.
stratification
A boolean indicating whether to use stratification. Only applicable for c-index metric.
use_shap
A boolean indicating whether to use SHAP values.
collinearity_check
A boolean indicating whether to check for collinearity.
correlation_threshold
The threshold for correlation to consider features as collinear.
Examples
if (requireNamespace("survival", quietly = TRUE)) {
shapboost <- SHAPBoostSurvival$new(
max_number_of_features = 1,
evaluator = "coxph",
metric = "c-index",
verbose = 0,
xgb_params = list(
objective = "survival:cox",
eval_metric = "cox-nloglik"
)
)
X <- as.data.frame(survival::gbsg[, -c(1, 10, 11)])
y <- as.data.frame(survival::gbsg[, c(10, 11)])
subset <- shapboost$fit(X, y)
}