SuperLearner

Choosing Learners

SuperLearner is a type 2 ensemble method, meaning it combines many methods of different types into one predictive model. SuperLearner uses cross-validation to find the best weighted combination of the candidate algorithms according to a specified measure of predictive performance (the default in the SuperLearner package is non-negative least squares based on the Lawson-Hanson algorithm (Mullen and van Stokkum 2023), but other measures such as the AUC can also be used). To run SuperLearner, the user specifies a library containing all of the algorithms to be considered for the final model, as well as the number of cross-validation folds.

See previous chapter for other types of ensemble learning methods.
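
To make this concrete, below is a minimal sketch of a SuperLearner call on simulated data. The variable names, the small three-learner library, and the choice of 10 folds are purely illustrative, and SL.glmnet additionally requires the glmnet package to be installed.

library(SuperLearner)

# Simulated data (illustrative only)
set.seed(123)
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(0.5 * X$x1 - 0.25 * X$x2))

sl_fit <- SuperLearner(
  Y = Y, X = X,
  family = binomial(),
  SL.library = c("SL.mean", "SL.glm", "SL.glmnet"),  # the candidate library
  method = "method.NNLS",   # default: non-negative least squares weights
  cvControl = list(V = 10)  # number of cross-validation folds
)
sl_fit  # prints each learner's cross-validated risk and its weight in the ensemble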

SuperLearner will perform as well as possible given the library of algorithms considered. Phillips et al. (2023) provide concrete guidelines for choosing the number of cross-validation folds and for selecting the algorithms to include. Overall, we want to make sure the set of algorithms provided is:

  • Diverse: Having a rich library of algorithms allows the SuperLearner to adapt to a range of underlying data structures. Diverse libraries include:

    • Parametric learners such as generalized linear models (GLMs)
    • Highly data-adaptive learners
    • Multiple variants of the same learner with different tuning-parameter specifications (see the sketch after this list)
  • Computationally feasible: Many machine learning algorithms take a long time to run. Including several computationally intensive algorithms in the library can make the SuperLearner as a whole prohibitively slow to run.
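
For instance, multiple variants of a single learner can be generated with SuperLearner::create.Learner(); the base learner and the tuning values below are purely illustrative (SL.ranger requires the ranger package).

library(SuperLearner)

# Create three random forest variants that differ only in mtry.
# create.Learner() registers the new wrappers in the calling environment
# and returns their names, which can then be added to the SL.library.
ranger_variants <- create.Learner(
  "SL.ranger",
  tune = list(mtry = c(1, 2, 3))
)
ranger_variants$names  # e.g. "SL.ranger_1" "SL.ranger_2" "SL.ranger_3"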

Note

Some of the more specific guidelines depend on our effective sample size. For binary outcomes, this can be calculated as:

\[ n_{eff} = \min\big(n,\ 5 \, n \, \min(\bar{p},\, 1-\bar{p})\big) \]

where \(\bar{p}\) is the prevalence of the outcome.

For continuous outcomes, the effective sample size is the same as the sample size (\(n_{eff} = n\)).
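
For example, with a hypothetical sample of \(n = 1000\) and an outcome prevalence of 10%, the effective sample size works out as follows (numbers purely illustrative):

# Effective sample size for a binary outcome (illustrative numbers)
n     <- 1000
p_bar <- 0.10  # prevalence of the outcome
n_eff <- min(n, 5 * n * min(p_bar, 1 - p_bar))
n_eff
#> [1] 500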

We also want to consider the characteristics of our particular sample.

  • If there are continuous covariates: We should include learners that do not force relationships to be linear/monotonic. For example, we could include regression splines, support vector machines, and tree-based methods like regression trees.

  • If we have high-dimensional data (a large number of covariates, e.g. more than \(n_{eff}/20\)): We should include some learners that fall under the class of screeners. These are learners that incorporate a dimension-reduction step, such as LASSO and random forests; screeners can also be paired with prediction algorithms in the library, as sketched after this list.

  • If the sample size is very large (i.e. \(n_{eff}>500\) ): We should include as many learners as is computationally feasible.

  • If the sample size is small (i.e. \(n_{eff} \leq 500\) ): We should include fewer learners (e.g. up to \(n_{eff}/5\) ), and include less flexible learners.
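
To illustrate the pairing of screeners with prediction algorithms, the SL.library argument can be given as a list in which each multi-element entry names a learner followed by the screener that pre-selects its covariates; the particular combinations below are only an example.

# A library mixing plain learners with learner-screener pairs.
# In each c(learner, screener) entry, the learner is fit only on the
# covariates retained by that screener.
my_library <- list(
  "SL.mean",                        # benchmark: overall mean
  "SL.glm",                         # GLM on all covariates
  c("SL.glm", "screen.corP"),       # GLM on covariates passing the p-value screen
  c("SL.ranger", "screen.glmnet")   # random forest on LASSO-selected covariates
)
# my_library can then be passed to the SL.library argument of SuperLearner()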

Some examples of learners that could be included are listed below, grouped by type of learner (Polley 2021):

Parametric learners:
  • SL.mean: simple mean

  • SL.glm: generalized linear models

  • SL.lm: ordinary least squares

  • SL.speedglm: fast version of glm

  • SL.speedlm: fast version of lm

  • SL.gam: generalized additive models

  • SL.step: choose model based on AIC (backwards or forwards or both)

Highly data-adaptive learners:
  • SL.glmnet: penalized regression using elastic net (ridge regression and Lasso)

  • Kernel-based methods

    • SL.kernelKnn: k-nearest neighbours

    • SL.ksvm: kernel-based support vector machine

  • SL.xgboost: extreme gradient boosting

  • SL.gbm: gradient-boosted machines

  • SL.nnet: neural networks

Learners allowing non-linear/non-monotonic relationships:
  • SL.earth: multivariate adaptive regression splines

  • Tree-based methods

    • SL.randomForest: random forests

    • tmle.SL.dbarts2: Bayesian additive regression trees

    • SL.cforest: random forests using conditional inference trees

    • SL.ranger: fast implementation of random forest suited for high dimensional data

  • SL.svm: support vector machines

Screeners:
  • screen.corP: retain covariates whose correlation with the outcome has a p-value < 0.1

  • screen.corRank: retain the j covariates most highly correlated with the outcome

  • screen.glmnet: Lasso

  • screen.randomForest: random forests

  • screen.SIS: retain covariates based on distance correlation

The SuperLearner package also provides a helper function, listWrappers(), that lists all of the available prediction and screening wrappers.

SuperLearner::listWrappers()
#> All prediction algorithm wrappers in SuperLearner:
#>  [1] "SL.bartMachine"      "SL.bayesglm"         "SL.biglasso"        
#>  [4] "SL.caret"            "SL.caret.rpart"      "SL.cforest"         
#>  [7] "SL.earth"            "SL.extraTrees"       "SL.gam"             
#> [10] "SL.gbm"              "SL.glm"              "SL.glm.interaction" 
#> [13] "SL.glmnet"           "SL.ipredbagg"        "SL.kernelKnn"       
#> [16] "SL.knn"              "SL.ksvm"             "SL.lda"             
#> [19] "SL.leekasso"         "SL.lm"               "SL.loess"           
#> [22] "SL.logreg"           "SL.mean"             "SL.nnet"            
#> [25] "SL.nnls"             "SL.polymars"         "SL.qda"             
#> [28] "SL.randomForest"     "SL.ranger"           "SL.ridge"           
#> [31] "SL.rpart"            "SL.rpartPrune"       "SL.speedglm"        
#> [34] "SL.speedlm"          "SL.step"             "SL.step.forward"    
#> [37] "SL.step.interaction" "SL.stepAIC"          "SL.svm"             
#> [40] "SL.template"         "SL.xgboost"
#> 
#> All screening algorithm wrappers in SuperLearner:
#> [1] "All"
#> [1] "screen.corP"           "screen.corRank"        "screen.glmnet"        
#> [4] "screen.randomForest"   "screen.SIS"            "screen.template"      
#> [7] "screen.ttest"          "write.screen.template"

SuperLearner in TMLE

  • The default SuperLearner library for estimating the outcome model includes (Gruber, van der Laan, and Kennedy 2020)

    • SL.glm: generalized linear models (GLMs)
    • SL.glmnet: least absolute shrinkage and selection operator (LASSO)
    • tmle.SL.dbarts2: modeling and prediction using Bayesian Additive Regression Trees (BART)
  • The default library for estimating the propensity scores includes

    • SL.glm: generalized linear models (GLMs)
    • tmle.SL.dbarts.k.5: SL wrapper for modeling and prediction using BART
    • SL.gam: generalized additive models (GAMs)
  • It is certainly possible to use a different set of learners

    • More learners can be added by passing the desired learner names to the Q.SL.library argument (for the outcome model) and the g.SL.library argument (for the propensity score model), as sketched below
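
Below is a sketch of such a call on simulated data; the variable names and the alternative libraries are purely illustrative (SL.glmnet and SL.ranger additionally require the glmnet and ranger packages).

library(tmle)
library(SuperLearner)

# Simulated data (illustrative only)
set.seed(7)
n <- 500
W <- data.frame(w1 = rnorm(n), w2 = rbinom(n, 1, 0.4))       # covariates
A <- rbinom(n, 1, plogis(0.3 * W$w1))                        # binary exposure
Y <- rbinom(n, 1, plogis(-1 + A + 0.5 * W$w1 - 0.4 * W$w2))  # binary outcome

fit <- tmle(
  Y = Y, A = A, W = W,
  family = "binomial",
  Q.SL.library = c("SL.glm", "SL.glmnet", "SL.ranger"),  # outcome model learners
  g.SL.library = c("SL.glm", "SL.glmnet")                # propensity score learners
)
summary(fit)  # estimated effects (e.g. ATE) with confidence intervals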

References

Gruber, S., M. van der Laan, and C. Kennedy. 2020. Package 'tmle'. https://cran.r-project.org/web/packages/tmle/tmle.pdf.
Mullen, Katharine M., and Ivo H. M. van Stokkum. 2023. Package 'nnls'. https://cran.r-project.org/web/packages/nnls/nnls.pdf.
Phillips, Rachael V., Mark J. van der Laan, Hana Lee, and Susan Gruber. 2023. "Practical Considerations for Specifying a Super Learner." International Journal of Epidemiology 52: 1276–85. https://doi.org/10.1093/ije/dyad023.
Polley, Eric. 2021. Package 'SuperLearner'. https://cran.r-project.org/web/packages/SuperLearner/SuperLearner.pdf.