# TemporalForest

[![CRAN status](https://www.r-pkg.org/badges/version/TemporalForest)](https://CRAN.R-project.org/package=TemporalForest)

`TemporalForest` is an R package for reproducible feature selection in high-dimensional longitudinal data.

It combines three components: time-aware network reduction via a consensus topological overlap matrix (TOM) from WGCNA, mixed-effects model trees that respect within-subject correlation, and stability selection. Together they deliver a small, interpretable, and stable set of predictors.

## Why TemporalForest?

* **Time-aware modules:** Builds a consensus topological overlap network across time points to keep temporally persistent correlations.
* **Mixed-effects trees:** LMER-trees (linear mixed-effects model trees) account for within-subject dependence and reduce the split-selection bias common in standard trees.
* **Stability selection:** Uses bootstrapping and selection probabilities to control false discoveries and improve reproducibility.
* **Practical speed-ups:** Optionally pass a precomputed `dissimilarity_matrix` to skip the slow network construction stage.
* **Designed for omics and sensor data:** Suited to settings where p ≫ n, measurements are repeated, and predictors are correlated.
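For intuition about the network stage, and as one possible way to build your own `dissimilarity_matrix`, the sketch below computes an unsigned TOM at each time point in base R and takes the element-wise minimum across time points as the consensus. It is a simplified sketch, not the package's Stage 1: the helper names `tom_from_expr()` and `consensus_dissimilarity()` are hypothetical, the soft-threshold power of 6 is an assumed value, and WGCNA's own consensus procedure can additionally calibrate the per-time-point TOMs before combining them.

```r
# Sketch only: hypothetical helpers, not part of the TemporalForest API.
# Unsigned TOM (WGCNA-style formula) from one n x p matrix.
tom_from_expr <- function(x, power = 6) {      # power = 6 is an assumed value
  a <- abs(stats::cor(x))^power                 # soft-thresholded adjacency
  diag(a) <- 0                                  # drop self-links from l_ij and k_i
  k <- rowSums(a)                               # connectivity of each feature
  l <- a %*% a                                  # shared-neighbour term l_ij
  tom <- (l + a) / (outer(k, k, pmin) + 1 - a)
  diag(tom) <- 1
  tom
}

# Consensus over time points = element-wise minimum of the per-time TOMs,
# so two features count as close only if they are close at every time point.
consensus_dissimilarity <- function(X, power = 6) {
  toms <- lapply(X, tom_from_expr, power = power)
  1 - Reduce(pmin, toms)                        # pmin keeps dim and dimnames
}

# Toy usage: two time points, 30 subjects, 5 features
set.seed(2)
X_toy <- replicate(2, matrix(rnorm(30 * 5), 30, 5,
                             dimnames = list(NULL, paste0("V", 1:5))),
                   simplify = FALSE)
D <- consensus_dissimilarity(X_toy)
# D could then be supplied as temporal_forest(..., dissimilarity_matrix = D)
```

Computing such a matrix once and reusing it across runs is what makes the `dissimilarity_matrix` shortcut useful.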

## Installation

You can install the released version of `TemporalForest` from [CRAN](https://CRAN.R-project.org) with:

```r
install.packages("TemporalForest")
```

And the development version from [GitHub](https://github.com/) with:

```r
# install.packages("remotes")
remotes::install_github("SisiShao/TemporalForest")
```

## 30-second Quick Start

A tiny example that skips network construction by supplying a lightweight dissimilarity matrix:

```r
library(TemporalForest)

set.seed(11)
n_subjects <- 60; n_timepoints <- 2; p <- 20

# Build X: list of length T, each an n × p matrix with identical column names
X <- replicate(n_timepoints, matrix(rnorm(n_subjects * p), n_subjects, p), simplify = FALSE)
colnames(X[[1]]) <- colnames(X[[2]]) <- paste0("V", 1:p)

# Long view + metadata. do.call(rbind, X) stacks the time points, so rows are
# time-major (all subjects at t = 1, then all at t = 2); id and time follow that order.
X_long <- do.call(rbind, X)
id   <- rep(seq_len(n_subjects), times = n_timepoints)
time <- rep(seq_len(n_timepoints), each = n_subjects)

# Outcome with three strong signals plus a subject-level random intercept
u_subj <- rnorm(n_subjects, 0, 0.7)
eps    <- rnorm(length(id), 0, 0.08)
Y <- 4*X_long[, "V1"] + 3.5*X_long[, "V2"] + 3.2*X_long[, "V3"] +
     u_subj[id] + eps

# Simple dissimilarity to bypass Stage 1 (fast demo)
A <- 1 - abs(stats::cor(X_long)); diag(A) <- 0
dimnames(A) <- list(colnames(X[[1]]), colnames(X[[1]]))

fit <- temporal_forest(
  X = X, Y = Y, id = id, time = time,
  dissimilarity_matrix = A,     # skip WGCNA/TOM
  n_features_to_select = 3,     # expect V1, V2, V3
  n_boot_screen = 6, n_boot_select = 18,
  keep_fraction_screen = 1,
  min_module_size = 2,
  alpha_screen = 0.5, alpha_select = 0.6
)

print(fit$top_features)
# Should contain "V1", "V2" and "V3", in some order
```
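Real data usually arrive as one long table rather than a list of matrices. Assuming, as in the quick start, that `id`, `time` and `Y` must line up row by row with `do.call(rbind, X)`, and that every subject is observed at every time point, a hypothetical helper such as `long_to_temporal_inputs()` (not part of the package) could reshape a long `data.frame` like this:

```r
# Hypothetical helper: long data.frame -> X list plus aligned Y, id, time.
# Assumes a balanced design (each subject observed once at every time point).
long_to_temporal_inputs <- function(df, features, id_col = "id",
                                    time_col = "time", y_col = "y") {
  subjects <- sort(unique(df[[id_col]]))
  times    <- sort(unique(df[[time_col]]))
  blocks <- lapply(times, function(t) {
    d <- df[df[[time_col]] == t, , drop = FALSE]
    d <- d[match(subjects, d[[id_col]]), , drop = FALSE]  # same subject order at every t
    if (anyNA(d[[id_col]])) stop("a subject is missing at time ", t)
    m <- as.matrix(d[, features, drop = FALSE])
    rownames(m) <- NULL
    list(X = m, y = d[[y_col]])
  })
  list(
    X    = lapply(blocks, `[[`, "X"),
    Y    = unlist(lapply(blocks, `[[`, "y")),
    id   = rep(subjects, times = length(times)),   # time-major, like do.call(rbind, X)
    time = rep(times, each = length(subjects))
  )
}

# Toy long table (3 subjects x 2 time points), rows deliberately shuffled
toy <- data.frame(
  id   = rep(1:3, times = 2),
  time = rep(1:2, each = 3),
  y    = c(1.2, 0.4, 2.2, 1.5, 0.1, 2.9),
  V1   = c(0.3, -1.1, 0.8, 0.5, -0.9, 1.0),
  V2   = c(1.4, 0.2, -0.6, 1.1, 0.3, -0.2)
)
toy <- toy[nrow(toy):1, ]
inp <- long_to_temporal_inputs(toy, features = c("V1", "V2"))
```

Passing `inp$X`, `inp$Y`, `inp$id` and `inp$time` to `temporal_forest()` would then mirror the quick-start call.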

For a more detailed example and a full pipeline run, please see the package vignette.
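The stability-selection idea (bootstrapping plus selection probabilities) can be illustrated with a deliberately simplified toy in base R: resample whole subjects with replacement so that all visits of a subject stay together, rank features on each resample, and record how often each feature lands in the top `k`. This is not the package's algorithm: a marginal-correlation ranker stands in for the mixed-effects trees, and the helper name `stability_frequencies()`, `n_boot = 50` and the 0.6 cut-off are illustrative assumptions.

```r
# Toy bootstrap stability selection (hypothetical helper, not the package API).
# A marginal |correlation| ranker stands in for the mixed-effects trees.
stability_frequencies <- function(X_long, Y, id, k = 3, n_boot = 100) {
  rows_by_subject <- split(seq_along(id), id)       # row indices of each subject
  hits <- setNames(numeric(ncol(X_long)), colnames(X_long))
  for (b in seq_len(n_boot)) {
    # cluster bootstrap: draw subjects, keep all of their visits together
    drawn <- sample(length(rows_by_subject), replace = TRUE)
    rows  <- unlist(rows_by_subject[drawn], use.names = FALSE)
    score <- abs(stats::cor(X_long[rows, , drop = FALSE], Y[rows]))[, 1]
    top   <- names(sort(score, decreasing = TRUE))[seq_len(k)]
    hits[top] <- hits[top] + 1
  }
  hits / n_boot                                     # estimated selection probability
}

# Toy data: 40 subjects, 2 time points, 10 features, signal in V1 and V2
set.seed(1)
n <- 40; n_t <- 2; p <- 10
X_long <- matrix(rnorm(n * n_t * p), n * n_t, p,
                 dimnames = list(NULL, paste0("V", 1:p)))
id <- rep(seq_len(n), times = n_t)                  # time-major rows, as from do.call(rbind, X)
Y  <- 2 * X_long[, "V1"] + 1.5 * X_long[, "V2"] +
      rnorm(n)[id] + rnorm(n * n_t, sd = 0.5)       # random intercept + noise
freq   <- stability_frequencies(X_long, Y, id, k = 2, n_boot = 50)
stable <- names(freq)[freq >= 0.6]                  # 0.6 is an illustrative cut-off
```

Features that are selected in most resamples are the stable ones; increasing `n_boot` reduces the Monte Carlo noise in the estimated frequencies.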

## Documentation

A long-form guide and reproducible examples can be found in the vignette:
`vignette("TemporalForest-Introduction", package = "TemporalForest")`

## Contributing

Issues and pull requests are welcome! Please report bugs or request features at the [official GitHub repository](https://github.com/SisiShao/TemporalForest/issues).

## Citation

If you use `TemporalForest` in your work, please cite the manuscript:

> Shao, S., Moore, J.H., Ramirez, C.M. (2025). Network-Guided Temporal Forests for Feature Selection in High-Dimensional Longitudinal Data. *Manuscript submitted for publication*.

You can also get the citation from within R:

```r
citation("TemporalForest")
```