Usage
crossValidation(
models,
outcome = NULL,
K = 5,
R = 1,
metric = NULL,
ncores = 2,
verbose = FALSE,
...
)
Arguments
- models
A named list of model fitting objects from the
SEMrun(), SEMml() or SEMdnn() function, with default group=NULL (for SEMrun()) or outcome=NULL (for SEMml() or SEMdnn()).
- outcome
A character vector (as.factor) of labels for a categorical output (target). If NULL (default), the categorical output (target) will not be considered.
- K
A numerical value indicating the number of folds to create (default = 5).
- R
A numerical value indicating the number of repetitions of the K-fold cross-validation (default = 1).
- metric
A character value indicating the metric to display in the boxplots: "amse", "r2", or "srmr" for continuous outcomes, and "f1", "accuracy" or "mcc" for a categorical outcome (default = NULL).
- ncores
Number of CPU cores (default = 2).
- verbose
If TRUE, boxplots and summarized results are output to the console (default = FALSE).
- ...
Currently ignored.
Value
A list of 2 objects: (1) "stats", a list with performance evaluation metrics.
If outcome=NULL, the mean and (0.025; 0.975)-quantiles of amse, r2, and srmr
across folds and repetitions are reported; if an outcome is supplied, the mean and
(0.025; 0.975)-quantiles of f1, accuracy and mcc from the confusion matrix averaged across
all repetitions are reported; and (2) "PE", a data.frame of repeated cross-validation
results.
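As a rough illustration of the categorical metrics named above, the following base R sketch computes f1, accuracy and mcc from the cells of a binary confusion matrix, then summarizes a set of per-repetition values with the mean and the 2.5% and 97.5% quantiles. The helper binary_metrics() and all numbers are hypothetical and made up for illustration; this is not the package's internal code, and the package may handle details (such as multi-class outcomes or averaging) differently.

```r
# Hypothetical helper (not part of the package): standard binary
# classification metrics computed from confusion-matrix counts.
binary_metrics <- function(truth, pred, positive) {
  tp <- as.numeric(sum(truth == positive & pred == positive))
  tn <- as.numeric(sum(truth != positive & pred != positive))
  fp <- as.numeric(sum(truth != positive & pred == positive))
  fn <- as.numeric(sum(truth == positive & pred != positive))
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  c(f1       = 2 * tp / (2 * tp + fp + fn),
    accuracy = (tp + tn) / (tp + tn + fp + fn),
    # MCC is set to 0 by convention when a margin of the table is empty
    mcc      = if (denom == 0) 0 else (tp * tn - fp * fn) / denom)
}

# Toy labels, invented for this example
truth <- factor(c("case", "case", "case", "control", "control", "case"))
pred  <- factor(c("case", "case", "control", "control", "case", "case"),
                levels = levels(truth))
m <- binary_metrics(truth, pred, positive = "case")

# Toy per-repetition values of one metric, summarized as mean and
# (0.025; 0.975)-quantiles
runs <- c(0.71, 0.75, 0.78, 0.74, 0.80)
summary_stats <- c(quantile(runs, 0.025), mean = mean(runs),
                   quantile(runs, 0.975))
```

With the toy labels above there are 3 true positives, 1 true negative, 1 false positive and 1 false negative, so f1 is 0.75 and mcc is 0.25.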
Details
Easy-to-use model comparison and selection of SEM, ML or DNN models, in which several models are defined and compared in an R-repeated K-fold cross-validation procedure. The winning model is selected by reporting the mean predictive performance across all runs, as outlined in de Rooij & Weeda (2020).
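The general idea of an R-repeated K-fold comparison can be sketched in base R as follows. This is a minimal, self-contained illustration on simulated data with two hypothetical lm() candidates ("signal" and "noise"); it does not reproduce the internals of crossValidation(), and counting a "win" as the lowest error within a repetition is an assumption made for this sketch.

```r
set.seed(1)
n <- 100
d <- data.frame(x = rnorm(n), z = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)   # y depends on x only; z is pure noise

# Two hypothetical candidate models to compare
formulas <- list(signal = y ~ x, noise = y ~ z)

K <- 5
R <- 10
mse <- matrix(NA_real_, nrow = R, ncol = length(formulas),
              dimnames = list(NULL, names(formulas)))

for (r in seq_len(R)) {
  # Draw a new random, balanced fold assignment for each repetition
  fold_id <- sample(rep(seq_len(K), length.out = n))
  for (m in names(formulas)) {
    fold_mse <- sapply(seq_len(K), function(k) {
      test <- fold_id == k
      fit  <- lm(formulas[[m]], data = d[!test, ])   # fit on K-1 folds
      pred <- predict(fit, newdata = d[test, ])      # predict held-out fold
      mean((d$y[test] - pred)^2)
    })
    mse[r, m] <- mean(fold_mse)   # average over the K folds
  }
}

# Mean out-of-sample error per model across the R repetitions
cv_mean <- colMeans(mse)

# Assumed definition of a "win": lowest error within a repetition
wins <- table(factor(names(formulas)[apply(mse, 1, which.min)],
                     levels = names(formulas)))
```

Every model is evaluated on the same fold assignment within a repetition, so differences in error reflect the models rather than the random split.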
References
de Rooij M, Weeda W. Cross-Validation: A Method Every Psychologist Should Know. Advances in Methods and Practices in Psychological Science. 2020;3(2):248-263. doi:10.1177/2515245919898466
Author
Mario Grassi mario.grassi@unipv.it
Examples
# \donttest{
# Load Amyotrophic Lateral Sclerosis (ALS)
ig <- alsData$graph
data <- alsData$exprs
data <- transformData(data)$data
#> Conducting the nonparanormal transformation via shrunkun ECDF...done.
group <- alsData$group
# ... with continuous outcomes
res1 <- SEMml(ig, data, algo="tree")
#> Running SEM model via ML...
#> done.
#>
#> TREE solver ended normally after 23 iterations
#>
#> logL:-52.850417 srmr:0.196585
res2 <- SEMml(ig, data, algo="rf")
#> Running SEM model via ML...
#> done.
#>
#> RF solver ended normally after 23 iterations
#>
#> logL:-41.009436 srmr:0.0905
res3 <- SEMml(ig, data, algo="xgb")
#> Running SEM model via ML...
#> done.
#>
#> XGB solver ended normally after 23 iterations
#>
#> logL:29.12011 srmr:0.006995
res4 <- SEMml(ig, data, algo="sem")
#> Running SEM model via ML...
#> done.
#>
#> SEM solver ended normally after 23 iterations
#>
#> logL:-56.348113 srmr:0.288874
models <- list(res1,res2,res3,res4)
names(models) <- c("tree","rf","xgb","sem")
res.cv1 <- crossValidation(models, outcome=NULL, K=5, R=10)
#> Running Cross-validation...
#> r-repeat = 1
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 2
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 3
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 4
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 5
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 6
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 7
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 8
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 9
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 10
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
print(res.cv1$stats)
#> $amse
#> Wins 2.5% mean 97.5%
#> tree 0 0.884 0.884 0.884
#> rf 0 0.866 0.866 0.866
#> xgb 0 0.949 0.949 0.949
#> sem 10 0.803 0.803 0.803
#>
#> $r2
#> Wins 2.5% mean 97.5%
#> tree 0 0.116 0.116 0.116
#> rf 0 0.134 0.134 0.134
#> xgb 0 0.051 0.051 0.051
#> sem 10 0.197 0.197 0.197
#>
#> $srmr
#> Wins 2.5% mean 97.5%
#> tree 0 0.253 0.253 0.253
#> rf 0 0.192 0.192 0.192
#> xgb 10 0.167 0.167 0.167
#> sem 0 0.321 0.321 0.321
#>
# ... with a categorical (as.factor) outcome
outcome <- factor(ifelse(group == 0, "control", "case"))
res.cv2 <- crossValidation(models, outcome=outcome, K=5, R=10)
#> Running Cross-validation...
#> r-repeat = 1
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 2
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 3
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 4
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 5
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 6
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 7
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 8
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 9
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
#> r-repeat = 10
#> 5-fold tree done.
#> 5-fold rf done.
#> 5-fold xgb done.
#> 5-fold sem done.
print(res.cv2$stats)
#> $f1
#> Wins 2.5% mean 97.5%
#> tree 0 0.863 0.863 0.863
#> rf 0 0.824 0.824 0.824
#> xgb 10 0.882 0.882 0.882
#> sem 0 0.759 0.759 0.759
#>
#> $accuracy
#> Wins 2.5% mean 97.5%
#> tree 0 0.850 0.850 0.850
#> rf 0 0.794 0.794 0.794
#> xgb 10 0.869 0.869 0.869
#> sem 0 0.713 0.713 0.713
#>
#> $mcc
#> Wins 2.5% mean 97.5%
#> tree 0 0.488 0.488 0.488
#> rf 0 0.497 0.497 0.497
#> xgb 10 0.588 0.588 0.588
#> sem 0 0.407 0.407 0.407
#>
# }