Overview

MultiLevelOptimalBayes (MLOB) is designed for estimating two-level latent variable models, particularly in small sample settings. This is especially useful in psychology, education, and other fields with hierarchical or nested data structures. We present the R package MultiLevelOptimalBayes (MLOB) for estimating between-group effects in multilevel latent variable models. MLOB employs a regularised Bayesian estimator devised by Dashuk, Hecht, Luedtke, Robitzsch, and Zitzmann (2024), which was subsequently enhanced for additional covariates by the same authors in 2025. This estimator chooses prior parameters to minimise the mean squared error (MSE) of the between-group effect by effectively balancing bias and variance. The regularised Bayesian estimator provides MSE-optimal estimations due to the mean-variance tradeoff, especially in scenarios of small sample sizes and poor intraclass correlation (ICC). The MLOB software supports imbalanced group sizes through integrated data-balancing methods and offers comprehensive inference, including point estimates, standard errors, p-values, and confidence intervals for both primary regressors and covariates. To gain comprehensive understanding, we initially examine the theoretical underpinnings of the regularised Bayesian estimator (Dashuk et al. 2024, 2025), followed by a discussion of its implementation in MLOB, namely the core function mlob(). We illustrate the application of mlob() using real datasets. Consequently, we provide researchers in psychology, education, and related disciplines a robust, user-friendly instrument for dependable multilevel latent variable estimation, particularly in contexts characterised by small sample sizes and low ICCs.

The core function mlob() estimates the between-group coefficient (beta_b) using a regularized Bayesian approach, and applies a data balancing procedure if the groups are unbalanced.

Key Features

Suitable for small-sample, multilevel data
Performs regularized Bayesian estimation
Automatically balances data if necessary
Optional jackknife resampling for standard errors
Provides comparison with unregularized ML estimates

Function Usage

Below is the signature for the mlob() function. This shows the arguments an theit default values you can pass, but note that this chunk is not meant to be executed.

mlob(
  formula,
  data,
  group,
  balancing.limit = 0.2,
  conf.level = 0.95,
  jackknife = FALSE,
  punish.coeff = 2,
  ...
)

Arguments:

formula: A formula (e.g., Y ~ X + C) where Y is the outcome, X is the context variable of interest, and C represents covariates.
data: A data.frame containing all variables in the formula and the grouping variable.
group: A string naming the grouping variable.
balancing.limit: Proportion (0-1) of the dataset that can be removed to balance group sizes. Default is 0.2.
conf.level: Confidence level for confidence intervals. Default is 0.95 (95% CI).
jackknife: Logical. If TRUE, standard errors and CIs are computed via jackknife resampling. Default is FALSE.
punish.coeff: Multiplier for penalizing removal of entire groups during balancing. Higher values discourage full-group deletions.

Balancing Procedure: The mlob() function also verifies whether the data is balanced, that is each group consist of exactly the same number of individuals. If the data is unbalanced, the balancing procedure comes into effect and identifies the optimal number of individuals and groups to delete based on the punishment coefficient. If the amount of data to be deleted is more than the threshold (balancing.limit), the regularized Bayesian estimator is not calculated and mlob() produces an error. This forces the user to increase the balancing limit manually and warns that the results should be interpreted with caution. # Examples

Example 1: Iris Dataset

result_iris <- mlob(
  Sepal.Length ~ Sepal.Width + Petal.Length,
  data = iris,
  group = "Species",
  conf.level = 0.99,
  jackknife = FALSE
)

summary(result_iris)
#> Call:
#>  mlob(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris, group = Species, conf.level = 0.99, jackknife = FALSE) 
#> 
#> Summary of Coefficients:
#>                     Estimate Std. Error Lower CI (99%) Upper CI (99%)   Z value
#> beta_b             0.8308711  1.4655556     -2.9441499       4.605892 0.5669325
#> gamma_Petal.Length 0.4679522  0.2582579     -0.1972762       1.133181 1.8119567
#>                      Pr(>|z|) Significance
#> beta_b             0.57076004             
#> gamma_Petal.Length 0.06999289            .
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>                     Estimate   Std. Error Lower CI (99%) Upper CI (99%)
#> beta_b             0.6027440 5.424780e+15  -1.397331e+16   1.397331e+16
#> gamma_Petal.Length 0.4679522 2.582579e-01  -1.972762e-01   1.133181e+00
#>                         Z value   Pr(>|z|) Significance
#> beta_b             1.111094e-16 1.00000000             
#> gamma_Petal.Length 1.811957e+00 0.06999289            .
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:
#>   The standard error from unoptimized ML estimation is about 3.701518e+17% larger than the standard error obtained through our optimization procedure,
#>   meaning that the optimized estimates are more accurate.
#>   Concerning the estimates themselves, the unoptimized ML estimates may
#>   differ greatly from the optimized estimates and should not be reported.
#>   As the optimized estimates are always at least as accurate as the
#>   unoptimized ML estimates,
#>   please use them and their corresponding standard errors (first table of
#>   output) for interpretation and reporting.
#>   For more information, see Dashuk et al. (2025).

Example 2: Slightly Unbalanced ChickWeight Dataset

result_chick <- mlob(
  weight ~ Time,
  data = ChickWeight,
  group = "Diet",
  punish.coeff = 1.5,
  jackknife = FALSE
)

print(result_chick)
#> Call:
#>  mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE) 
#> 
#> Coefficients
#>     beta_b
#>  -392.9573
#> 
#> Standard_Error
#>    beta_b
#>  335.9924
#> 
#> Confidence_Interval (95%)
#>           Lower    Upper
#> beta_b -1051.49 265.5757
#> 
#> Z value
#>     beta_b
#>  -1.169542
#> 
#> p value
#>     beta_b
#>  0.2421852
summary(result_chick)
#> Call:
#>  mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE) 
#> 
#> Summary of Coefficients:
#>         Estimate Std. Error Lower CI (95%) Upper CI (95%)   Z value  Pr(>|z|)
#> beta_b -392.9573   335.9924       -1051.49       265.5757 -1.169542 0.2421852
#>        Significance
#> beta_b             
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>         Estimate Std. Error Lower CI (95%) Upper CI (95%)    Z value  Pr(>|z|)
#> beta_b -57.92977   169.8339       -390.798       274.9385 -0.3410967 0.7330308
#>        Significance
#> beta_b             
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:

** Interperation of the results for Chickenweight Dataset **

Estimated beta_b and additional covariates effects, denoted as gamma_Diet3 and gamma_Diet4, are included in the end result table. We did not incorporate any additional covariates into the mlob() formula. However, due to the fact that Diet is a four-level factor, mlob() considers Diet 1 to be the primary regressor (X), excludes Diet 2 from the design matrix to prevent multicollinearity, and reports the remaining levels as gamma_Diet3 and gamma_Diet4, using Diet 2 as the reference. The results of the estimation indicate a substantial distinction between the regularised Bayesian estimator (first table) and the ML estimator (second table). Beta_b’s regularised Bayesian estimator is 5.108, with a statistically significant p-value of 0.0139. This implies that the average weight of chicks on Diet 1 is approximately 5g more than that of chicks on Diet 2 at each time point. In contrast, the ML estimate of the between-group effect beta_b is significantly larger (13.993) but not statistically significant (p value of 0.3910), demonstrating how standard ML can overstate the group-level effect when data is scarce. The positive estimate of gamma_Diet3 is 41.69 and is highly significant for both estimators (p = 0.0048 in MultiLevelOptimalBayes (MLOB)). This suggests that chicks on Diet 3 have a significant increase in weight in comparison to the baseline group. Gamma_Diet4 has an estimate of 29.64, but its p-value of 0.0504 is marginally significant at the 5% level. This implies that Diet 4 has a positive impact on weight; however, the magnitude of this effect is less significant than that of Diet 3.

Example 3: Highly Unbalanced mtcars Dataset

result_mtcars <- mlob(
  mpg ~ hp + wt + am + hp:wt + hp:am,
  data = mtcars,
  group = "cyl",
  balancing.limit = 0.35
)

summary(result_mtcars)
#> Call:
#>  mlob(mpg ~ hp + wt + am + hp:wt + hp:am, data = mtcars, group = cyl, balancing.limit = 0.35, conf.level = 0.95) 
#> 
#> Summary of Coefficients:
#>                 Estimate  Std. Error Lower CI (95%) Upper CI (95%)    Z value
#> beta_b      -10.14396628  9.35801453   -28.48533774      8.1974052 -1.0839870
#> gamma_wt     -7.35030630 11.52439853   -29.93771236     15.2370998 -0.6378039
#> gamma_am      2.08029569  9.58981281   -16.71539204     20.8759834  0.2169277
#> gamma_hp:wt   0.02030139  0.05611299    -0.08967805      0.1302808  0.3617948
#> gamma_hp:am  -0.01091929  0.07222809    -0.15248375      0.1306452 -0.1511779
#>              Pr(>|z|) Significance
#> beta_b      0.2783706             
#> gamma_wt    0.5236013             
#> gamma_am    0.8282647             
#> gamma_hp:wt 0.7175054             
#> gamma_hp:am 0.8798354             
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>                Estimate   Std. Error Lower CI (95%) Upper CI (95%)
#> beta_b      -0.10122425 4.530593e+12  -8.879800e+12   8.879800e+12
#> gamma_wt    -7.35030630 1.152440e+01  -2.993771e+01   1.523710e+01
#> gamma_am     2.08029569 9.589813e+00  -1.671539e+01   2.087598e+01
#> gamma_hp:wt  0.02030139 5.611299e-02  -8.967805e-02   1.302808e-01
#> gamma_hp:am -0.01091929 7.222809e-02  -1.524838e-01   1.306452e-01
#>                   Z value  Pr(>|z|) Significance
#> beta_b      -2.234238e-14 1.0000000             
#> gamma_wt    -6.378039e-01 0.5236013             
#> gamma_am     2.169277e-01 0.8282647             
#> gamma_hp:wt  3.617948e-01 0.7175054             
#> gamma_hp:am -1.511779e-01 0.8798354             
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:
#>   The standard error from unoptimized ML estimation is about 4.841404e+13% larger than the standard error obtained through our optimization procedure,
#>   meaning that the optimized estimates are more accurate.
#>   Concerning the estimates themselves, the unoptimized ML estimates may
#>   differ greatly from the optimized estimates and should not be reported.
#>   As the optimized estimates are always at least as accurate as the
#>   unoptimized ML estimates,
#>   please use them and their corresponding standard errors (first table of
#>   output) for interpretation and reporting.
#>   For more information, see Dashuk et al. (2025).

Output

The output is an object of class mlob_result, which contains:

Estimated coefficients (beta_b and gamma values)
Standard errors and confidence intervals
Unregularized ML estimates for comparison
Information about the data balancing procedure (if used)

Available S3 Methods

The mlob_result object supports a comprehensive set of S3 methods that follow standard R conventions, making it easy to work with results in familiar ways. Here are all available methods:

Display Methods

# Get a basic result for demonstration
result <- mlob(weight ~ Time, data = ChickWeight, group = 'Diet', jackknife = FALSE)

# Print method - displays coefficients, standard errors, confidence intervals, Z-values, and p-values
print(result)
#> Call:
#>  mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE) 
#> 
#> Coefficients
#>     beta_b
#>  -11.96512
#> 
#> Standard_Error
#>    beta_b
#>  234.4558
#> 
#> Confidence_Interval (95%)
#>            Lower    Upper
#> beta_b -471.4901 447.5598
#> 
#> Z value
#>      beta_b
#>  -0.0510336
#> 
#> p value
#>     beta_b
#>  0.9592987

# Summary method - comprehensive summary with significance stars and comparison to unoptimized ML
summary(result)
#> Call:
#>  mlob(weight ~ Time, data = ChickWeight, group = Diet, conf.level = 0.95, jackknife = FALSE) 
#> 
#> Summary of Coefficients:
#>         Estimate Std. Error Lower CI (95%) Upper CI (95%)    Z value  Pr(>|z|)
#> beta_b -11.96512   234.4558      -471.4901       447.5598 -0.0510336 0.9592987
#>        Significance
#> beta_b             
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>         Estimate Std. Error Lower CI (95%) Upper CI (95%)     Z value  Pr(>|z|)
#> beta_b -3.806336   161.6417      -320.6182       313.0055 -0.02354798 0.9812132
#>        Significance
#> beta_b             
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:

Statistical Methods

# Extract coefficients as a data frame
coef(result)
#>      beta_b
#> 1 -11.96512

# Extract standard errors
se(result)
#>   beta_b 
#> 234.4558

# Extract variance-covariance matrix (diagonal only)
vcov(result)
#>   beta_b 
#> 54969.53

# Extract confidence intervals
confint(result)
#>             2.5%    97.5%
#> beta_b -471.4901 447.5598

# Extract confidence intervals for specific parameters
confint(result, "beta_b")
#>             2.5%    97.5%
#> beta_b -471.4901 447.5598

# Extract confidence intervals with different confidence level
confint(result, level = 0.99)
#>             0.5%   99.5%
#> beta_b -615.8833 591.953

Utility Methods

# Convert results to a data frame format
as.data.frame(result)
#>         Estimate Std. Error Lower CI (95%) Upper CI (95%)    Z value  Pr(>|z|)
#> beta_b -11.96512   234.4558      -471.4901       447.5598 -0.0510336 0.9592987

# Get dimensions (number of parameters)
dim(result)
#> [1] 1 1

# Get number of parameters
length(result)
#> [1] 1

# Get parameter names
names(result)
#> [1] "beta_b"

Update Method

# Update model with new parameters (e.g., different confidence level)
updated_result <- update(result, conf.level = 0.99)
summary(updated_result)
#> Call:
#>  mlob(weight ~ Time, data = data, group = Diet, conf.level = 0.99, jackknife = FALSE) 
#> 
#> Summary of Coefficients:
#>         Estimate Std. Error Lower CI (99%) Upper CI (99%)    Z value  Pr(>|z|)
#> beta_b -11.96512   234.4558      -615.8833        591.953 -0.0510336 0.9592987
#>        Significance
#> beta_b             
#> 
#> 
#> For comparison, summary of coefficients from unoptimized analysis (ML):
#>         Estimate Std. Error Lower CI (99%) Upper CI (99%)     Z value  Pr(>|z|)
#> beta_b -3.806336   161.6417      -420.1677       412.5551 -0.02354798 0.9812132
#>        Significance
#> beta_b             
#> 
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Note:

Discovering Available Methods

You can discover all available methods for mlob_result objects using:

methods(class = "mlob_result")
#>  [1] as.data.frame coef          confint       dim           length       
#>  [6] names         print         se            summary       update       
#> [11] vcov         
#> see '?methods' for accessing help and source code

Practical Example: Working with Results

Here’s a practical example showing how to use multiple methods together:

# Run analysis
result <- mlob(weight ~ Time, data = ChickWeight, group = 'Diet', jackknife = FALSE)

# Get basic information
cat("Number of parameters:", length(result), "\n")
#> Number of parameters: 1
cat("Parameter names:", paste(names(result), collapse = ", "), "\n")
#> Parameter names: beta_b

# Extract key statistics
coefficients <- coef(result)
standard_errors <- se(result)
confidence_intervals <- confint(result, level = 0.99)

# Create a custom summary table
custom_summary <- data.frame(
  Parameter = names(result),
  Estimate = as.numeric(coefficients),
  SE = as.numeric(standard_errors),
  CI_Lower = confidence_intervals[, 1],
  CI_Upper = confidence_intervals[, 2]
)

print(custom_summary)
#>   Parameter Estimate       SE  CI_Lower CI_Upper
#> 1    beta_b -290.836 476.4304 -1518.039 936.3675

All these methods follow standard R conventions, making your mlob_result objects compatible with existing R workflows and familiar to users of other statistical packages.

Limitations

While MultiLevelOptimalBayes provides a robust solution for regularized estimation in two-level models, users should be aware of the following limitations:

Local Grid Search:
The optimization is restricted to a local 5*sigma region around the ML estimate. Although this region captures the true MSE minimum with high probability (>= 0.999 for J = 5), it does not guarantee identification of the global minimum.
Assumption of Equal Group Sizes:
The estimator assumes equal group sizes to simplify the model. While averaging group sizes is a proposed solution, the method does not yet handle unbalanced group sizes natively, but find the optimal reduction to the balanced size.
Jackknife Computational Cost:
Jackknife resampling improves interval coverage in small samples. Note that it may be computationally demanding for larger samples.
Limited Model Scope:
Currently, MLOB only handles two-level models with continuous outcomes. Extensions to support generalized linear mixed models, three- and more-level structures, or multivariate outcomes are not yet available.

References

Dashuk, V., Hecht, M., Luedtke, O., Robitzsch, A., & Zitzmann, S. (2024).
An Optimally Regularized Estimator of Multilevel Latent Variable Models, with Improved MSE Performance.
https://doi.org/10.13140/RG.2.2.18148.39048

Luedtke, O., Marsh, H. W., Robitzsch, A., et al. (2008).
The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies.
https://doi.org/10.1037/a0012869

Authors

Valerii Dashuk
Binayak Timilsina
Martin Hecht
Steffen Zitzmann

Contact:
vadashuk@gmail.com
binayak.timilsina001@gmail.com
martin.hecht@hsu-hh.de
steffen.zitzmann@medicalschool-hamburg.de

MultiLevelOptimalBayes-Intro