Cross-validation of linear SEM, ML or DNN training models

The function performs an R-repeated K-fold cross-validation of SEMrun(), SEMml() or SEMdnn() models.

crossValidation(
  models,
  outcome = NULL,
  K = 5,
  R = 1,
  metric = NULL,
  ncores = 2,
  verbose = FALSE,
  ...
)

Arguments

models: A named list of fitted model objects from the SEMrun(), SEMml() or SEMdnn() functions, fitted with the default group=NULL (for SEMrun()) or outcome=NULL (for SEMml() or SEMdnn()).
outcome: A character vector (as.factor) of labels for a categorical output (target). If NULL (default), the categorical output (target) will not be considered.
K: A numerical value indicating the number of folds to create (default = 5).
R: A numerical value indicating the number of repetitions of the K-fold cross-validation (default = 1).
metric: A character value indicating the metric to display in the boxplots: "amse", "r2" or "srmr" for continuous outcomes, and "f1", "accuracy" or "mcc" for a categorical outcome (default = NULL).
ncores: Number of CPU cores (default = 2).
verbose: If TRUE, display the boxplots and print the summarized results to the console (default = FALSE).
...: Currently ignored.
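
The roles of K and R can be illustrated with a minimal base R sketch of repeated K-fold splitting. The helper make_repeated_folds() and its seed argument are hypothetical names introduced here for illustration only; this is not the package's internal code, and the sample size of 100 is arbitrary.

```r
# Hypothetical helper (not part of the package): assign n observations
# to K folds, and repeat the random assignment R times.
make_repeated_folds <- function(n, K = 5, R = 1, seed = 1) {
  set.seed(seed)
  lapply(seq_len(R), function(r) {
    # Balanced fold labels 1..K, shuffled for this repetition
    fold_id <- sample(rep(seq_len(K), length.out = n))
    # Row indices of the held-out (test) set for each fold
    split(seq_len(n), fold_id)
  })
}

folds <- make_repeated_folds(n = 100, K = 5, R = 10)
length(folds)        # one element per repetition
lengths(folds[[1]])  # held-out set sizes in the first repetition
```

In each repetition, every observation is held out exactly once, so a model is refitted K times per repetition and K * R times overall.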

Value

A list of 2 objects: (1) "stats", a list of performance evaluation metrics. If outcome=NULL, the mean and (0.025; 0.975)-quantiles of amse, r2 and srmr across folds and repetitions are reported; if an outcome is supplied, the mean and (0.025; 0.975)-quantiles of f1, accuracy and mcc, computed from the confusion matrix and averaged across all repetitions, are reported; and (2) "PE", a data.frame of repeated cross-validation results.
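
For readers unfamiliar with the categorical metrics, the following base R sketch computes f1, accuracy and mcc from the four cells of a binary confusion matrix using their standard textbook definitions. The helper class_metrics() and the toy labels are hypothetical; how the package itself builds and averages the confusion matrix is not shown here.

```r
# Hypothetical helper: standard binary metrics from confusion-matrix counts.
class_metrics <- function(truth, pred, positive) {
  # Counts are converted to double to avoid integer overflow in the products
  tp <- as.numeric(sum(truth == positive & pred == positive))
  tn <- as.numeric(sum(truth != positive & pred != positive))
  fp <- as.numeric(sum(truth != positive & pred == positive))
  fn <- as.numeric(sum(truth == positive & pred != positive))
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  c(
    f1 = 2 * precision * recall / (precision + recall),
    accuracy = (tp + tn) / (tp + tn + fp + fn),
    mcc = (tp * tn - fp * fn) /
      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  )
}

# Toy labels (made up): 3 true positives, 1 false negative,
# 1 false positive, 1 true negative
truth <- factor(c("case", "case", "case", "case", "control", "control"))
pred  <- factor(c("case", "case", "case", "control", "control", "case"))
m <- class_metrics(truth, pred, positive = "case")
m  # f1 = 0.75, accuracy = 4/6, mcc = 0.25
```

Unlike accuracy, mcc uses all four cells of the confusion matrix, which makes it more informative when the classes are unbalanced.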

Details

Easy-to-use model comparison and selection of SEM, ML or DNN models, in which several models are defined and compared in an R-repeated K-fold cross-validation procedure. The winning model is selected by reporting the mean predictive performance across all runs, as outlined in de Rooij & Weeda (2020).
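
The summary step can be sketched in base R as follows. The scores matrix holds made-up numbers, and summarise_cv() is a hypothetical helper; counting one win per repetition is an assumption inferred from the Wins column of the example output (where the wins sum to R), not a statement of the package's internal logic.

```r
# Toy per-repetition scores (made-up numbers):
# rows = repetitions, columns = models; lower is better (e.g. an error metric)
scores <- cbind(
  tree = c(0.81, 0.83, 0.80, 0.82),
  rf   = c(0.79, 0.78, 0.78, 0.79),
  xgb  = c(0.77, 0.76, 0.81, 0.75)
)

# Hypothetical helper: wins, mean and 2.5%/97.5% quantiles per model
summarise_cv <- function(scores, lower_is_better = TRUE) {
  pick <- if (lower_is_better) which.min else which.max
  # Index of the best model in each repetition
  best <- apply(scores, 1, pick)
  wins <- tabulate(best, nbins = ncol(scores))
  # Empirical quantiles of each model's scores across repetitions
  q <- apply(scores, 2, quantile, probs = c(0.025, 0.975))
  data.frame(
    Wins = wins,
    `2.5%` = q[1, ],
    mean = colMeans(scores),
    `97.5%` = q[2, ],
    check.names = FALSE,
    row.names = colnames(scores)
  )
}

res <- summarise_cv(scores)
res
```

For metrics where larger values are better, such as r2 or f1, the same sketch applies with lower_is_better = FALSE.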

References

de Rooij M, Weeda W. Cross-Validation: A Method Every Psychologist Should Know. Advances in Methods and Practices in Psychological Science. 2020;3(2):248-263. doi:10.1177/2515245919898466

Author

Mario Grassi mario.grassi@unipv.it

Examples


# \donttest{
# Load Amyotrophic Lateral Sclerosis (ALS) data
ig <- alsData$graph
data <- alsData$exprs
data <- transformData(data)$data
#> Conducting the nonparanormal transformation via shrunkun ECDF...done.
group <- alsData$group

# ... with continuous outcomes 

res1 <- SEMml(ig, data, algo="tree")
#> Running SEM model via ML...
#>  done.
#> 
#> TREE solver ended normally after 23 iterations
#> 
#>  logL:-52.850417  srmr:0.196585
res2 <- SEMml(ig, data, algo="rf")
#> Running SEM model via ML...
#>  done.
#> 
#> RF solver ended normally after 23 iterations
#> 
#>  logL:-41.009436  srmr:0.0905
res3 <- SEMml(ig, data, algo="xgb")
#> Running SEM model via ML...
#>  done.
#> 
#> XGB solver ended normally after 23 iterations
#> 
#>  logL:29.12011  srmr:0.006995
res4 <- SEMml(ig, data, algo="nn")
#> Running SEM model via ML...
#>  done.
#> 
#> NN solver ended normally after 23 iterations
#> 
#>  logL:-50.521439  srmr:0.187327

models <- list(res1,res2,res3,res4)
names(models) <- c("tree","rf","xgb","nn")

res.cv1 <- crossValidation(models, outcome=NULL, K=5, R=10)
#> Running Cross-validation...
#> r-repeat = 1 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 2 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 3 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 4 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 5 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 6 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 7 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 8 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 9 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 10 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
print(res.cv1$stats)
#> $amse
#>      Wins  2.5%  mean 97.5%
#> tree    0 0.814 0.814 0.814
#> rf      0 0.789 0.789 0.789
#> xgb    10 0.767 0.767 0.767
#> nn      0 0.802 0.802 0.802
#> 
#> $r2
#>      Wins  2.5%  mean 97.5%
#> tree    0 0.186 0.186 0.186
#> rf      0 0.211 0.211 0.211
#> xgb    10 0.233 0.233 0.233
#> nn      0 0.198 0.198 0.198
#> 
#> $srmr
#>      Wins  2.5%  mean 97.5%
#> tree    0 0.269 0.269 0.269
#> rf      0 0.216 0.216 0.216
#> xgb    10 0.192 0.192 0.192
#> nn      0 0.251 0.251 0.251
#> 

# ... with a categorical (as.factor) outcome

outcome <- factor(ifelse(group == 0, "control", "case"))
res.cv2 <- crossValidation(models, outcome=outcome, K=5, R=10)
#> Running Cross-validation...
#> r-repeat = 1 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 2 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 3 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 4 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 5 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 6 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 7 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 8 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 9 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
#> r-repeat = 10 
#>  5-fold tree done.
#>  5-fold rf done.
#>  5-fold xgb done.
#>  5-fold nn done.
print(res.cv2$stats)
#> $f1
#>      Wins  2.5%  mean 97.5%
#> tree    0 0.836 0.836 0.836
#> rf      0 0.833 0.834 0.834
#> xgb    10 0.877 0.888 0.890
#> nn      0 0.805 0.807 0.826
#> 
#> $accuracy
#>      Wins  2.5%  mean 97.5%
#> tree    0 0.814 0.818 0.819
#> rf      0 0.806 0.806 0.806
#> xgb    10 0.867 0.879 0.881
#> nn      0 0.769 0.773 0.798
#> 
#> $mcc
#>      Wins  2.5%  mean 97.5%
#> tree    0 0.386 0.393 0.436
#> rf      0 0.495 0.511 0.513
#> xgb    10 0.525 0.566 0.572
#> nn      0 0.490 0.491 0.491
#> 
# }