The function builds four types of Deep Neural Network (DNN) models based on the topological structure of the input graph, using the 'torch' package. The torch package is native to R, so it is computationally efficient and simple to install: there is no need to install Python or any other API, and DNNs can be trained on CPU, GPU and macOS GPUs.
In order to install torch please follow these steps:
install.packages("torch")
install_torch(reinstall = TRUE)
To set up a GPU, or if you have problems installing the torch package, check out the installation help from the torch developers.
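Before training, you can check which devices torch can use. This is a minimal sketch based on functions from the torch R package (not part of SEMdnn()):
library(torch)
torch_is_installed()          # TRUE if the LibTorch backend is installed
cuda_is_available()           # TRUE if an NVIDIA GPU is usable (device = "cuda")
backends_mps_is_available()   # TRUE if an Apple Silicon GPU is usable (device = "mps")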
Usage
SEMdnn(
graph,
data,
outcome = NULL,
algo = "layerwise",
hidden = c(10L, 10L, 10L),
link = "selu",
bias = TRUE,
dropout = 0,
loss = "mse",
validation = 0,
lambda = 0,
alpha = 0.5,
optimizer = "adam",
lr = 0.01,
batchsize = NULL,
burnin = 30,
thr = NULL,
nboot = 0,
epochs = 100,
patience = 100,
device = "cpu",
verbose = FALSE,
...
)
Arguments
- graph
An igraph object.
- data
A matrix with rows corresponding to subjects, and columns to graph nodes (variables).
- outcome
A character vector (as.factor) of labels for a categorical output (target). If NULL (default), the categorical output (target) will not be considered.
- algo
A character value, indicating the DNN algorithm: "nodewise", "layerwise" (default), "structured", or "neuralgraph" (see details).
- hidden
A numerical vector giving the hidden units in each layer; the number of layers corresponds to the length of this vector. As a default, hidden = c(10L, 10L, 10L).
- link
A character value describing the activation function to use, which can be a single value or a vector with one activation function per layer. As a default, link = "selu".
- bias
A logical value or vector indicating whether to use biases in the layers; either a single value or a vector of length (number of hidden layers + 1), with the last entry for the final layer. As a default, bias = TRUE.
- dropout
A numerical value for the dropout rate, which is the probability that a node will be excluded from training. As a default, dropout = 0.
- loss
A character value specifying the loss function for which the network should be optimized. For the regression problems used in SEMdnn(), the user can specify: "mse" (mean squared error), "mae" (mean absolute error), or "nnl" (negative log-likelihood). As a default, loss = "mse".
- validation
A numerical value indicating the proportion of the data set that should be used as a validation set (randomly selected, default = 0).
- lambda
A numerical value indicating the strength of the regularization, λ(L1 + L2), for the lambda penalty (default = 0).
- alpha
A numerical value that adds L1/L2 regularization into the training. The alpha parameter for each layer is set to (1 - α)L1 + αL2. It must fall between 0 and 1 (default = 0.5). See the illustrative penalty sketch after this argument list.
- optimizer
A character value, indicating the optimizer to use for training the network. The user can specify: "adam" (ADAM algorithm), "adagrad" (adaptive gradient algorithm), "rmsprop" (root mean squared propagation), "rprop" (resilient backpropagation), or "sgd" (stochastic gradient descent). As a default, optimizer = "adam".
- lr
A numerical value, indicating the learning rate given to the optimizer (default = 0.01).
- batchsize
Number of samples used to calculate one learning (gradient) step (default = 1/10 of the training data).
- burnin
Training is aborted if the training loss is not below the baseline loss after burnin epochs (default = 30).
- thr
A numeric value [0-1] indicating the threshold to apply to the Olden's connection weights to color the graph. If thr = NULL (default), the threshold is set to thr = 0.5*max(abs(connection weights)).
- nboot
Number of bootstrap samples used to compute cheap (lower, upper) CIs for all input variable weights. As a default, nboot = 0.
- epochs
A numerical value indicating the number of epochs for which training is conducted (default = 100).
- patience
A numeric value; training will terminate if the loss increases over a predetermined number of consecutive epochs (the validation loss is used when available). With the default patience = 100, no early stopping is applied.
- device
A character value describing the CPU/GPU device ("cpu", "cuda", "mps") on which the neural network should be trained. As a default, device = "cpu".
- verbose
If TRUE, the training loss values of the DNN model are displayed as output, comparing the training, validation and baseline losses at the last epoch (default = FALSE).
- ...
Currently ignored.
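To make the interplay of lambda and alpha concrete, here is a minimal sketch in plain R of the per-layer elastic-net style penalty described above; the helper penalty() is purely illustrative and not part of SEMdnn():
# illustrative helper: penalty added for a layer's weight vector w
penalty <- function(w, lambda = 0, alpha = 0.5) {
  L1 <- sum(abs(w))   # lasso term
  L2 <- sum(w^2)      # ridge term
  lambda * ((1 - alpha) * L1 + alpha * L2)
}
penalty(c(-0.3, 0.8, 0.1), lambda = 0.01, alpha = 0.5)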
Value
An S3 object of class "DNN" is returned. It is a list of 5 objects:
"fit", a list of DNN model objects, including: the estimated covariance matrix (Sigma), the estimated model errors (Psi), the fitting indices (fitIdx), and the parameterEstimates, i.e., the data.frame of Olden's connection weights.
"gest", the data.frame of estimated connection weights (parameterEstimates) of outcome levels, if outcome != NULL.
"model", a list of all MLP network models fitted by torch.
"graph", the induced DAG of the input graph mapped on data variables. The DAG is colored based on the Olden's connection weights (W), if abs(W) > thr and W < 0, the edge is inhibited and it is highlighted in blue; otherwise, if abs(W) > thr and W > 0, the edge is activated and it is highlighted in red. If the outcome vector is given, nodes with absolute connection weights summed over the outcome levels, i.e. sum(abs(W[outcome levels])) > thr, will be highlighted in pink.
"data", input data subset mapping graph nodes.
Details
Four types of Deep Neural Networks (DNNs) can be trained with SEMdnn(), selected via the algo argument.
If algo = "nodewise", a set of DNN models is fitted equation-by-equation (r=1,...,R), where R is the number of response (outcome) variables (i.e., nodes in the input graph with non-zero incoming connectivity) and the predictor (input) variables are the nodes with a direct edge to each outcome node, as proposed by various authors in causal discovery methods (see Zheng et al., 2020). Note that model learning can be time-consuming for large graphs and a large number R of outcomes.
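For illustration, the nodewise decomposition can be sketched with igraph on a toy DAG (this is not the internal SEMdnn() code, just the idea of responses and their parent predictors):
library(igraph)
ig <- graph_from_literal(x1 -+ y1, x2 -+ y1, y1 -+ y2)  # toy DAG
resp <- V(ig)$name[degree(ig, mode = "in") > 0]         # nodes with incoming edges
pred <- lapply(resp, function(v) neighbors(ig, v, mode = "in")$name)
names(pred) <- resp
pred  # one DNN per response, fitted on its parent set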
If algo = "layerwise" (default), a set of DNN models is defined based on the topological layer structure (j=1,...,L), from sink to source nodes of the input graph. In each iteration, the response (output) variables, y, are the nodes in layer j=1,...,(L-1), and the predictor (input) variables, x, are the nodes belonging to the successive (j+1),...,L layers that are linked with a direct edge to the response variables (see Grassi & Tarantino, 2025).
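The layer construction can be sketched by iteratively peeling sink nodes from the graph (an illustration of the idea, not the exact internal routine):
library(igraph)
g <- graph_from_literal(x1 -+ y1, x2 -+ y1, y1 -+ y2)   # toy DAG
layers <- list(); j <- 1
while (vcount(g) > 0) {
  sinks <- V(g)$name[degree(g, mode = "out") == 0]      # current sink nodes
  layers[[j]] <- sinks
  g <- delete_vertices(g, sinks); j <- j + 1
}
layers  # layer 1 = sink nodes, ..., layer L = source nodes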
If algo = "structured", a Structured Neural Network (StrNN) is defined with input and output units equal to D, the number of nodes. The algorithm uses the prior knowledge of the input graph to build the neural network architecture via a per-layer masking of the neural weights (i.e., W1 * M1, W2 * M2, ..., WL * ML), with the constraint that (W1 * M1) x (W2 * M2) x ... x (WL * ML) = A, where A is the adjacency matrix of the input graph (see Chen et al., 2023).
If algo = "neuralgraph", a Neural Graphical Model (NGM) is generated. As for StrNN, the input and output units are equal to D, the number of nodes. The prior knowledge of the input graph is used to compute the product of the absolute values of the neural weights (i.e., W = |W1| x |W2| x ... x |WL|), under the constraint that log(W * Ac) = 0, where Ac is the complement of the adjacency matrix A of the input graph, which essentially replaces 0 by 1 and vice versa (see Shrivastava & Chajewska, 2023).
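Both algorithms are driven by the adjacency matrix of the input graph. The following minimal sketch shows the quantities involved (A, its complement Ac, and a trivial two-layer mask factorization used only for illustration):
library(igraph)
ig <- graph_from_literal(x1 -+ y1, x2 -+ y1, y1 -+ y2)
A  <- as.matrix(as_adjacency_matrix(ig))  # adjacency matrix of the input DAG
Ac <- 1 * (A == 0)                        # complement of A, used by the NGM constraint
M1 <- A; M2 <- diag(ncol(A))              # trivial masks whose product has the support of A
all((1 * ((M1 %*% M2) != 0)) == A)        # TRUE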
Each DNN model (R for "nodewise", L<R for "layerwise", and 1 for "structured" and "neuralgraph") is a Multilayer Perceptron (MLP) network, in which every neuron in a hidden layer is connected to every neuron in the layer above and in the layer below. Each neuron's value is determined by computing a weighted sum of its inputs from the preceding layer and then applying an activation function; the resulting value is used as input for the neurons in the next layer, until the output layer is reached.
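A minimal torch sketch of one such MLP, with three hidden layers of 10 units and SELU activations (matching the defaults); d_in and d_out are placeholder sizes, not values used by SEMdnn():
library(torch)
d_in <- 5; d_out <- 2                 # placeholder input/output dimensions
mlp <- nn_sequential(
  nn_linear(d_in, 10), nn_selu(),
  nn_linear(10, 10),   nn_selu(),
  nn_linear(10, 10),   nn_selu(),
  nn_linear(10, d_out)                # linear output layer for regression
)
y_hat <- mlp(torch_randn(100, d_in))  # forward pass: 100 x d_out predictions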
If nboot != 0, the function will implement the cheap bootstrapping proposed by Lam (2022) to generate uncertainties (i.e., bootstrap 90% CIs) for the DNN parameters. Bootstrapping can be enabled by setting a small number (1 to 10) of bootstrap samples. Note, however, that the computation can be time-consuming for massive DNNs, even with cheap bootstrapping!
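For a single parameter, the cheap-bootstrap interval can be sketched as follows (our reading of Lam, 2022; theta_hat and theta_boot are made-up numbers, not SEMdnn() output):
theta_hat  <- 0.42                            # estimate on the full training data
theta_boot <- c(0.40, 0.47, 0.38)             # estimates from B = 3 bootstrap samples
B <- length(theta_boot)
S <- sqrt(mean((theta_boot - theta_hat)^2))
theta_hat + c(-1, 1) * qt(0.95, df = B) * S   # (lower, upper) 90% CI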
References
Zheng, X., Dan, C., Aragam, B., Ravikumar, P., Xing, E. (2020). Learning sparse nonparametric DAGs. International Conference on Artificial Intelligence and Statistics, PMLR, 3414-3425. https://doi.org/10.48550/arXiv.1909.13189
Grassi, M., Tarantino, B. (2025). SEMdag: Fast learning of Directed Acyclic Graphs via node or layer ordering. PLoS ONE 20(1): e0317283. https://doi.org/10.1371/journal.pone.0317283
Chen A., Shi, R.I., Gao, X., Baptista, R., Krishnan, R.G. (2023). Structured neural networks for density estimation and causal inference. Advances in Neural Information Processing Systems, 36, 66438-66450. https://doi.org/10.48550/arXiv.2311.02221
Shrivastava, H., Chajewska, U. (2023). Neural graphical models. In European Conference on Symbolic and Quantitative Approaches with Uncertainty (pp. 284-307). Cham: Springer Nature Switzerland. https://doi.org/10.48550/arXiv.2210.00453
Lam, H. (2022). Cheap Bootstrap for Input Uncertainty Quantification. Winter Simulation Conference (WSC), Singapore, 2022, pp. 2318-2329. https://doi.org/10.1109/WSC57314.2022.10015362
Author
Mario Grassi mario.grassi@unipv.it
Examples
# \donttest{
if (torch::torch_is_installed()){
# load ALS data
ig<- alsData$graph
data<- alsData$exprs
data<- transformData(data)$data
group<- alsData$group
#...with train-test (0.5-0.5) samples
set.seed(123)
train<- sample(1:nrow(data), 0.5*nrow(data))
#ncores<- parallel::detectCores(logical = FALSE)
start<- Sys.time()
dnn0<- SEMdnn(ig, data[train, ], algo = "layerwise",
hidden = c(10,10,10), link = "selu", bias = TRUE,
epochs = 32, patience = 10, verbose = TRUE)
end<- Sys.time()
print(end-start)
#str(dnn0, max.level=2)
dnn0$fit$fitIdx
parameterEstimates(dnn0$fit)
gplot(dnn0$graph)
table(E(dnn0$graph)$color)
#...with source nodes -> graph layer structure -> sink nodes
#Topological layer (TL) ordering
K<- c(12, 5, 3, 2, 1, 8)
K<- rev(K[-c(1,length(K))]);K
ig1<- mapGraph(ig, type="source"); gplot(ig1)
start<- Sys.time()
dnn1<- SEMdnn(ig1, data[train, ], algo = "layerwise",
hidden = 5*K, link = "selu", bias = TRUE,
epochs = 32, patience = 10, verbose = TRUE)
end<- Sys.time()
print(end-start)
#Visualization of the neural network structure
nplot(dnn1, hidden = 5*K, bias = FALSE)
#str(dnn1, max.level=2)
dnn1$fit$fitIdx
mean(dnn1$fit$Psi)
parameterEstimates(dnn1$fit)
gplot(dnn1$graph)
table(E(dnn1$graph)$color)
#...with a categorical outcome
outcome<- factor(ifelse(group == 0, "control", "case")); table(outcome)
start<- Sys.time()
dnn2<- SEMdnn(ig, data[train, ], outcome[train], algo = "layerwise",
hidden = c(10,10,10), link = "selu", bias = TRUE,
epochs = 32, patience = 10, verbose = TRUE)
end<- Sys.time()
print(end-start)
#str(dnn2, max.level=2)
dnn2$fit$fitIdx
parameterEstimates(dnn2$fit)
gplot(dnn2$graph)
table(E(dnn2$graph)$color)
table(V(dnn2$graph)$color)
}
#> Conducting the nonparanormal transformation via shrunkun ECDF...done.
#> Running SEM model via DNN...
#>
#> layer 1 : z10452 z84134 z836 z4747 z4741 z4744 z79139 z5530 z5532 z5533 ...
#> train val base
#> 0.3801113 Inf 0.9875000
#>
#> layer 2 : z842 z1432 z5600 z5603 z6300
#> train val base
#> 0.503531 Inf 0.987500
#>
#> layer 3 : z54205 z5606 z5608
#> train val base
#> 0.5394681 Inf 0.9875000
#>
#> layer 4 : z596 z4217
#> train val base
#> 0.8956626 Inf 0.9875001
#>
#> layer 5 : z1616
#> train val base
#> 0.8532046 Inf 0.9875001
#> done.
#>
#> DNN solver ended normally after 160 iterations
#>
#> logL:-40.884373 srmr:0.187098
#> Time difference of 17.30914 secs
#> Running SEM model via DNN...
#>
#> layer 1 : z10452 z84134 z836 z4747 z4741 z4744 z79139 z5530 z5532 z5533 ...
#> train val base
#> 0.4475542 Inf 0.9875000
#> done.
#>
#> DNN solver ended normally after 32 iterations
#>
#> logL:-20.963655 srmr:0.158763
#> Time difference of 16.0658 secs
#> Running SEM model via DNN...
#>
#> layer 1 : zcase zcontrol
#> train val base
#> 0.003194495 Inf 0.987500012
#>
#> layer 2 : z10452 z84134 z836 z4747 z4741 z4744 z79139 z5530 z5532 z5533 ...
#> train val base
#> 0.3788583 Inf 0.9875000
#>
#> layer 3 : z842 z1432 z5600 z5603 z6300
#> train val base
#> 0.4717255 Inf 0.9875000
#>
#> layer 4 : z54205 z5606 z5608
#> train val base
#> 0.4962046 Inf 0.9875000
#>
#> layer 5 : z596 z4217
#> train val base
#> 0.9186465 Inf 0.9875001
#>
#> layer 6 : z1616
#> train val base
#> 0.8630623 Inf 0.9875001
#> done.
#>
#> DNN solver ended normally after 192 iterations
#>
#> logL:-39.110385 srmr:0.188061
#> Time difference of 26.72527 secs
#>
#> pink white
#> 5 26
# }