Title: The SHAPBoost Feature Selection Algorithm
Version: 1.0.0
Description: The implementation of SHAPBoost, a boosting-based feature selection technique that ranks features iteratively based on Shapley values.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: xgboost, SHAPforxgboost, methods, caret, Matrix
Suggests: flare, survival
URL: https://github.com/O-T-O-Z/SHAPBoost-R
BugReports: https://github.com/O-T-O-Z/SHAPBoost-R/issues
NeedsCompilation: no
Packaged: 2025-09-22 09:04:24 UTC; o.t.ozyilmaz
Author: Ömer Tarik Özyilmaz [aut, cre, cph], Tamas Szili-Török [aut, cph]
Maintainer: Ömer Tarik Özyilmaz <o.t.ozyilmaz@umcg.nl>
Repository: CRAN
Date/Publication: 2025-09-29 16:40:02 UTC

SHAPBoostEstimator Class

Description

This class implements the SHAPBoost algorithm for feature selection. It is designed to be extended by specific implementations such as SHAPBoostRegressor and SHAPBoostSurvival. Any new subclass should implement the abstract methods defined in this class.
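
The extension pattern described above can be sketched with R reference classes from the methods package. This is a minimal illustration only: the class names BaseSelector and CorrelationSelector and the methods score() and best_feature() are hypothetical and are not taken from SHAPBoost.

```r
library(methods)

# Hypothetical base class: names are illustrative, not part of SHAPBoost.
BaseSelector <- setRefClass(
  "BaseSelector",
  fields = list(max_number_of_features = "numeric"),
  methods = list(
    # "Abstract" method: the base class only signals that it must be overridden.
    score = function(X, y) {
      stop("score() must be implemented by a subclass")
    },
    # Shared logic that relies on the subclass-provided score().
    best_feature = function(X, y) {
      scores <- .self$score(X, y)
      names(scores)[which.max(scores)]
    }
  )
)

# Hypothetical subclass that supplies the missing method.
CorrelationSelector <- setRefClass(
  "CorrelationSelector",
  contains = "BaseSelector",
  methods = list(
    score = function(X, y) {
      # Absolute Pearson correlation of each column with the target.
      abs(drop(cor(X, y)))
    }
  )
)

X <- data.frame(a = c(1, 2, 3, 4), b = c(4, 1, 3, 2))
y <- c(2, 4, 6, 8)
selector <- CorrelationSelector$new(max_number_of_features = 1)
selected <- selector$best_feature(X, y)
```

Calling score() on the base class raises an error, while the subclass inherits best_feature() unchanged and only supplies the scoring step.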

Fields

evaluator

The model that is used to evaluate each additional feature.

metric

A character string representing the evaluation metric.

xgb_params

A list of parameters for the XGBoost model.

number_of_folds

The number of folds for cross-validation.

epsilon

A small value to determine convergence.

max_number_of_features

The maximum number of features to select.

siso_ranking_size

The number of features to consider in the SISO ranking.

siso_order

The order of combinations to consider in SISO.

reset

A logical indicating whether to reset the weights.

num_resets

The number of resets allowed.

fold_random_state

The random state for reproducibility in cross-validation.

verbose

The verbosity level of the output.

stratification

A logical indicating whether to use stratified sampling. Only applicable to the c-index metric.

collinearity_check

A logical indicating whether to check for collinearity.

correlation_threshold

The threshold for correlation to consider features as collinear.
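
As a rough illustration of what a collinearity check with a correlation threshold can look like, the base R sketch below keeps a feature only if its absolute Pearson correlation with every previously kept feature stays at or below the threshold. The helper drop_collinear() is hypothetical, and the package's own check may use a different rule.

```r
# Keep a feature only if it is not too strongly correlated with any
# feature that was kept before it. Hypothetical helper, not SHAPBoost API.
drop_collinear <- function(X, correlation_threshold = 0.9) {
  corr <- abs(cor(X))
  keep <- character(0)
  for (feature in colnames(X)) {
    if (length(keep) == 0 ||
        all(corr[feature, keep] <= correlation_threshold)) {
      keep <- c(keep, feature)
    }
  }
  keep
}

a <- 1:10
X <- data.frame(
  a = a,
  b = 2 * a + rep(c(0.1, -0.1), 5),       # almost a linear copy of a
  c = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)     # only weakly related to a
)
kept <- drop_collinear(X, correlation_threshold = 0.9)
```

In this toy data, b is dropped because it is nearly a rescaled copy of a, while c is retained.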

Examples

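if (requireNamespace("flare", quietly = TRUE)) {
  # eyedata provides the objects x (predictors) and y (response)
  data("eyedata", package = "flare")
  # Select a single feature, evaluating candidates with "lr" and MAE
  shapboost <- SHAPBoostRegressor$new(
    max_number_of_features = 1,
    evaluator = "lr",
    metric = "mae",
    siso_ranking_size = 10,
    verbose = 0
  )
  # Convert both inputs to data frames before fitting
  X <- as.data.frame(x)
  y <- as.data.frame(y)
  subset <- shapboost$fit(X, y)
}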
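
The number_of_folds and fold_random_state fields listed above describe a seeded cross-validation split. A minimal base R sketch of such a split follows. The helper make_folds() is hypothetical, and the package may build its folds differently (for example through caret, which it imports).

```r
# Assign each of n observations to one of k folds, reproducibly.
# Hypothetical helper; note that set.seed() changes the global RNG state.
make_folds <- function(n, number_of_folds = 5, fold_random_state = 42) {
  set.seed(fold_random_state)
  sample(rep(seq_len(number_of_folds), length.out = n))
}

folds <- make_folds(n = 23, number_of_folds = 5, fold_random_state = 42)

# Row indices for the first train/validation split.
validation_idx <- which(folds == 1)
training_idx <- which(folds != 1)
```

Reusing the same fold_random_state reproduces the same assignment, which makes scores comparable across runs.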


SHAPBoostRegressor is a reference class for regression feature selection through gradient boosting.

Description

This class extends the SHAPBoostEstimator class and implements methods for initializing, updating weights, scoring, and fitting estimators.
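
SHAPBoost is described as ranking features based on Shapley values. One common way to turn per-observation SHAP values into a global ranking is the mean absolute SHAP value per feature. The sketch below shows that aggregation on a made-up matrix. Whether the package uses exactly this aggregation is an assumption, and the numbers are purely illustrative.

```r
# Toy SHAP matrix: rows are observations, columns are features.
# The values are made up for illustration only.
shap_values <- matrix(
  c( 0.5, -0.1, 0.0,
    -0.4,  0.2, 0.1,
     0.6, -0.3, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("age", "size", "grade"))
)

# Global importance of a feature: mean absolute SHAP value over observations.
importance <- colMeans(abs(shap_values))

# Feature names ordered from most to least important.
ranking <- names(sort(importance, decreasing = TRUE))
```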

Fields

evaluator

The model that is used to evaluate each additional feature. Choice between "lr" and "xgb".

metric

The metric used for evaluation, such as "mae", "mse", or "r2".

xgb_params

A list of parameters for the XGBoost model.

number_of_folds

The number of folds for cross-validation.

epsilon

A small value to prevent division by zero.

max_number_of_features

The maximum number of features to consider.

siso_ranking_size

The size of the SISO ranking.

siso_order

The order of the SISO ranking.

reset

A boolean indicating whether to reset the model.

xgb_importance

The importance type for XGBoost.

num_resets

The number of resets for the model.

fold_random_state

The random state for folds.

verbose

The verbosity level for logging.

stratification

A boolean indicating whether to use stratification. Only applicable to the c-index metric.

use_shap

A boolean indicating whether to use SHAP values.

collinearity_check

A boolean indicating whether to check for collinearity.

correlation_threshold

The threshold for correlation to consider features as collinear.
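
For reference, the three regression metrics named in the metric field have the following standard definitions. The helper functions below are a base R sketch written for this illustration and are not the package's internal implementations.

```r
# Mean absolute error.
mae <- function(y_true, y_pred) mean(abs(y_true - y_pred))

# Mean squared error.
mse <- function(y_true, y_pred) mean((y_true - y_pred)^2)

# Coefficient of determination: 1 - SS_residual / SS_total.
r2 <- function(y_true, y_pred) {
  1 - sum((y_true - y_pred)^2) / sum((y_true - mean(y_true))^2)
}

y_true <- c(3, 5, 7, 9)
y_pred <- c(2, 5, 8, 10)

scores <- c(
  mae = mae(y_true, y_pred),
  mse = mse(y_true, y_pred),
  r2  = r2(y_true, y_pred)
)
```

Lower values are better for "mae" and "mse", whereas higher values are better for "r2".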

Examples

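if (requireNamespace("flare", quietly = TRUE)) {
  # eyedata provides the objects x (predictors) and y (response)
  data("eyedata", package = "flare")
  # Select a single feature, evaluating candidates with "lr" and MAE
  shapboost <- SHAPBoostRegressor$new(
    max_number_of_features = 1,
    evaluator = "lr",
    metric = "mae",
    siso_ranking_size = 10,
    verbose = 0
  )
  # Convert both inputs to data frames before fitting
  X <- as.data.frame(x)
  y <- as.data.frame(y)
  subset <- shapboost$fit(X, y)
}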
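
The xgb_params field takes a plain list of XGBoost parameters. A possible list for a regression task is sketched below. The parameter names are standard XGBoost options, but the specific values are illustrative choices rather than package defaults, and the commented call is an assumed usage that mirrors how xgb_params is passed in the SHAPBoostSurvival example.

```r
# Illustrative XGBoost settings for a regression task.
xgb_params <- list(
  objective = "reg:squarederror",  # squared-error regression objective
  eval_metric = "mae",             # metric XGBoost reports during training
  eta = 0.1,                       # learning rate
  max_depth = 3                    # maximum tree depth
)

# Assumed usage (not run here):
# shapboost <- SHAPBoostRegressor$new(
#   evaluator = "xgb",
#   metric = "mae",
#   xgb_params = xgb_params
# )
```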


SHAPBoostSurvival is a reference class for survival analysis feature selection through gradient boosting.

Description

This class extends the SHAPBoostEstimator class and implements methods for initializing, updating weights, scoring, and fitting estimators.

Fields

evaluator

The model that is used to evaluate each additional feature. Choice between "coxph" and "xgb".

metric

The metric used for evaluation, such as "c-index".

xgb_params

A list of parameters for the XGBoost model.

number_of_folds

The number of folds for cross-validation.

epsilon

A small value to prevent division by zero.

max_number_of_features

The maximum number of features to consider.

siso_ranking_size

The size of the SISO ranking.

siso_order

The order of the SISO ranking.

reset

A boolean indicating whether to reset the model.

xgb_importance

The importance type for XGBoost.

num_resets

The number of resets for the model.

fold_random_state

The random state for folds.

verbose

The verbosity level for logging.

stratification

A boolean indicating whether to use stratification. Only applicable to the c-index metric.

use_shap

A boolean indicating whether to use SHAP values.

collinearity_check

A boolean indicating whether to check for collinearity.

correlation_threshold

The threshold for correlation to consider features as collinear.
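
The c-index mentioned above measures how well predicted risks order the observed survival times. The base R sketch below implements a simple version of Harrell's concordance index for right-censored data. The function c_index() is hypothetical, ignores tied times, and is meant only to show the idea, not how the package computes its score.

```r
# Harrell's concordance index for right-censored data (base R, O(n^2)).
# time:   observed follow-up times
# status: 1 = event observed, 0 = censored
# risk:   predicted risk score (higher = expected to fail sooner)
c_index <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    # A pair is comparable only if the earlier time is an observed event.
    if (status[i] != 1) next
    for (j in seq_len(n)) {
      if (time[i] < time[j]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5  # ties in risk count as half
        }
      }
    }
  }
  concordant / comparable
}

time <- c(5, 8, 12, 20)
status <- c(1, 0, 1, 1)
perfect <- c_index(time, status, risk = c(4, 3, 2, 1))
reversed <- c_index(time, status, risk = c(1, 2, 3, 4))
```

A value of 1 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0 means every pair is ordered the wrong way round.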

Examples

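if (requireNamespace("survival", quietly = TRUE)) {
  # Select a single feature, evaluating candidates with a Cox model and the c-index
  shapboost <- SHAPBoostSurvival$new(
    max_number_of_features = 1,
    evaluator = "coxph",
    metric = "c-index",
    verbose = 0,
    xgb_params = list(
      objective = "survival:cox",
      eval_metric = "cox-nloglik"
    )
  )

  # Predictors: all gbsg columns except pid (1), rfstime (10), and status (11)
  X <- as.data.frame(survival::gbsg[, -c(1, 10, 11)])
  # Outcome: follow-up time (rfstime) and event indicator (status)
  y <- as.data.frame(survival::gbsg[, c(10, 11)])
  subset <- shapboost$fit(X, y)
}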