Gradient Weight method for neural network variable importance
Source: R/SEMdnn.R, getGradientWeight.Rd

The function computes the gradient matrix, i.e., the average marginal effect of the input variables w.r.t. the neural network model, as discussed by Scholbeck et al. (2024).
Arguments
- object
A neural network object from the SEMdnn() function.
- thr
A numeric value in [0, 1] indicating the threshold applied to the gradient weights to color the graph. If thr = NULL (default), the threshold is set to thr = 0.5*max(abs(gradient weights)).
- verbose
A logical value. If FALSE (default), the processed graph will not be plotted to screen.
- ...
Currently ignored.
Value
A list of three objects: (i) est: a data.frame reporting the connections together with their gradient weights; (ii) gest: if the outcome vector is given, a data.frame of gradient weights for the outcome levels; and (iii) dag: the input DAG with colored edges/nodes. If abs(grad) > thr and grad < 0, the edge is inhibited and is highlighted in blue; if abs(grad) > thr and grad > 0, the edge is activated and is highlighted in red. If the outcome vector is given, nodes whose absolute connection weights summed over the outcome levels exceed the threshold, i.e. sum(abs(grad[outcome levels])) > thr, are highlighted in pink.
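As a reading aid, the following minimal sketch mimics the coloring rule and the default threshold on a toy data.frame of gradient weights; the column names (lhs, rhs, grad) are illustrative and may differ from those of the actual est output.

# Toy gradient weights (hypothetical values, for illustration only)
est <- data.frame(
  lhs  = c("x1", "x2", "x3"),
  rhs  = c("y",  "y",  "y"),
  grad = c(0.62, -0.48, 0.10)
)
thr <- 0.5 * max(abs(est$grad))  # default threshold when thr = NULL
color <- ifelse(abs(est$grad) <= thr, "gray50",
                ifelse(est$grad > 0, "red2", "royalblue3"))
cbind(est, color)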
Details
The gradient weight method approximates the derivative (the gradient) of each output variable (y) with respect to each input variable (x), evaluated at each observation (i = 1,...,n) of the training data. The contribution of each input is evaluated in terms of magnitude, taking into account not only the connection weights and activation functions, but also the observed values of the input variables. Once the gradients have been computed for each variable and observation, a summary gradient is obtained by averaging over the observation units. Finally, the average weights are collected in a matrix W (p x p), and the element-wise product with the binary (1,0) adjacency matrix A (p x p) of the input DAG, W*A, maps the weights onto the DAG edges. Note that the operations required to approximate partial derivatives are time-consuming compared to other methods, such as Olden's (connection weight) method. The computational time increases with the size of the neural network and the size of the data.
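To make the procedure concrete, the sketch below reproduces the same idea on a toy prediction function (the function f, the simulated data, and the 3-node DAG are assumptions for illustration, not the SEMdnn internals): per-observation finite-difference gradients are averaged over the sample and then masked by the binary adjacency matrix of the DAG.

# Simulated inputs and a stand-in prediction function (assumed, not from SEMdnn)
set.seed(1)
X <- matrix(rnorm(100 * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
f <- function(X) tanh(X[, "x1"]) + 0.5 * X[, "x2"]

eps <- 1e-4
grad <- sapply(colnames(X), function(j) {
  Xp <- X; Xp[, j] <- Xp[, j] + eps     # perturb one input at a time
  (f(Xp) - f(X)) / eps                  # finite-difference gradient per observation
})
w <- colMeans(grad)                      # average marginal effect over observations

# Element-wise product with the binary adjacency matrix of a toy DAG
A <- matrix(c(0, 0, 1,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE,
            dimnames = list(c("x1", "x2", "y"), c("x1", "x2", "y")))
W <- matrix(0, 3, 3, dimnames = dimnames(A))
W["x1", "y"] <- w["x1"]; W["x2", "y"] <- w["x2"]
W * A                                    # weights mapped onto existing edges only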
References
Scholbeck, C.A., Casalicchio, G., Molnar, C., et al. Marginal effects for non-linear prediction functions. Data Min Knowl Disc 38, 2997-3042 (2024). https://doi.org/10.1007/s10618-023-00993-x
Author
Mario Grassi mario.grassi@unipv.it
Examples
# \donttest{
if (torch::torch_is_installed()){
# Load Sachs data (pkc)
ig<- sachs$graph
data<- sachs$pkc
data<- log(data)
#...with train-test (0.5-0.5) samples
set.seed(123)
train<- sample(1:nrow(data), 0.5*nrow(data))
#ncores<- parallel::detectCores(logical = FALSE)
dnn0<- SEMdnn(ig, data[train, ], outcome = NULL, algo= "neuralgraph",
hidden = c(10,10,10), link = "selu", bias = TRUE,
epochs = 32, patience = 10, verbose = TRUE)
gw<- getGradientWeight(dnn0, thr = 0.3, verbose = FALSE)
gplot(gw$dag, l="circo")
table(E(gw$dag)$color)
}
#> DAG conversion : TRUE
#> Running SEM model via DNN...
#> Loss at epoch 10: 0.237081, l1: 0.19671
#> Loss at epoch 20: 0.139642, l1: 0.13101
#> Loss at epoch 30: 0.088302, l1: 0.09489
#> done.
#>
#> DNN solver ended normally after 32 iterations
#>
#> logL:-19.646363 srmr:0.06331
#>
#> gray50 red2 royalblue3
#> 14 3 1
# }