From 55376cd849e7c633afcf762d5c3db20fd896ef5f Mon Sep 17 00:00:00 2001 From: nkoenen Date: Sun, 15 Oct 2023 11:39:33 +0200 Subject: [PATCH 01/20] Add the method Integrated Gradients - Implementation of the method Integrated Gradients using the R6 class `IntegratedGradient` - Add documentation for the new class - Add tests for the new class --- NAMESPACE | 1 + R/GradienBased.R | 212 +++++++++++++ R/InterpretingMethod.R | 12 +- man-roxygen/examples-IntegratedGradient.R | 107 +++++++ man/ConnectionWeights.Rd | 1 + man/DeepLift.Rd | 1 + man/Gradient.Rd | 1 + man/GradientBased.Rd | 1 + man/IntegratedGradient.Rd | 317 +++++++++++++++++++ man/InterpretingMethod.Rd | 1 + man/LRP.Rd | 1 + man/SmoothGrad.Rd | 1 + tests/testthat/test_IntegratedGradient.R | 361 ++++++++++++++++++++++ 13 files changed, 1013 insertions(+), 4 deletions(-) create mode 100644 man-roxygen/examples-IntegratedGradient.R create mode 100644 man/IntegratedGradient.Rd create mode 100644 tests/testthat/test_IntegratedGradient.R diff --git a/NAMESPACE b/NAMESPACE index 1b38641..db2892c 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -9,6 +9,7 @@ export(ConnectionWeights) export(Converter) export(DeepLift) export(Gradient) +export(IntegratedGradient) export(LRP) export(SmoothGrad) export(get_result) diff --git a/R/GradienBased.R b/R/GradienBased.R index f4554c0..6f30c59 100644 --- a/R/GradienBased.R +++ b/R/GradienBased.R @@ -10,6 +10,7 @@ #' gradients w.r.t. to the input for given data. Implemented are: #' #' - *Vanilla Gradients* and *Gradient\eqn{\times}Input* ([`Gradient`]) +#' - *Integrated Gradients* ([`IntegratedGradient`]) #' - *SmoothGrad* and *SmoothGrad\eqn{\times}Input* ([`SmoothGrad`]) #' #' @template param-converter @@ -228,6 +229,217 @@ Gradient <- R6Class( ) ) +############################################################################### +# IntegratedGradients +############################################################################### + +#' @title Integrated Gradients +#' +#' @description +#' The `IntegratedGradient` class implements the method Integrated Gradients +#' (Sundararajan et al., 2017), which incorporates a reference value \eqn{x'} +#' (also known as baseline value) analogous to the [`DeepLift`] method. +#' Integrated Gradients helps to uncover the relative importance of input +#' features in the predictions \eqn{y = f(x)} made by a model compared to the +#' prediction of the reference value \eqn{y' = f(x')}. This is achieved through +#' the following formula: +#' \deqn{ +#' (x - x') \times \int_{\alpha=0}^{1} \frac{\partial f(x' + \alpha (x - x'))}{\partial x} d\alpha +#' } +#' In simpler terms, it calculates how much each feature contributes to a +#' model's output by tracing a path from a baseline input \eqn{x'} to the actual +#' input \eqn{x} and measuring the average gradients along that path. +#' +#' Similar to the other gradient-based methods, by default the integrated +#' gradient is multiplied by the input to get an approximate decomposition +#' of \eqn{y - y'}. However, with the parameter `times_input` only the gradient +#' describing the output sensitivity can be returned. +#' +#' @template param-converter +#' @template param-data +#' @template param-channels_first +#' @template param-dtype +#' @template param-output_idx +#' @template param-ignore_last_act +#' @template param-verbose +#' @template examples-IntegratedGradient +#' +#' @references +#' M. Sundararajan et al. (2017) *Axiomatic attribution for deep networks.* ICML +#' 2017, PMLR 70, pp. 3319-3328. 
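+#'
+#' @details
+#' Internally, the integral is approximated with the trapezoidal rule: the
+#' gradients are evaluated at `n` equidistant points on the straight path
+#' from \eqn{x'} to \eqn{x} and averaged along this path. The following
+#' standalone sketch illustrates the idea in plain torch; `ig_sketch` and
+#' `f` are hypothetical names, `f` is assumed to be a differentiable
+#' function returning a scalar `torch_tensor`, and this is a simplified
+#' illustration rather than the exact internal implementation:
+#' ```
+#' # Assumes library(torch); x and x_ref are 1-d input tensors
+#' ig_sketch <- function(f, x, x_ref, n = 50) {
+#'   # Gradients of f at n points on the path x' + alpha * (x - x')
+#'   grads <- lapply(seq(0, 1, length.out = n), function(alpha) {
+#'     z <- x_ref + alpha * (x - x_ref)
+#'     z$requires_grad <- TRUE
+#'     autograd_grad(f(z), z)[[1]]
+#'   })
+#'   grads <- torch_stack(grads)  # shape: (n, dim_in)
+#'   # Trapezoidal rule: average adjacent gradients along the path
+#'   avg_grad <- torch_mean((grads[1:(n - 1)] + grads[2:n]) / 2, dim = 1)
+#'   # Approximate decomposition of f(x) - f(x')
+#'   (x - x_ref) * avg_grad
+#' }
+#' ```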
+#' +#' @family methods +#' @export +IntegratedGradient <- R6Class( + classname = "IntegratedGradient", + inherit = GradientBased, + public = list( + + #' @field n (`integer(1)`)\cr + #' Number of steps for the approximation of the integration path along + #' \eqn{\alpha} (default: \eqn{50}).\cr + #' @field x_ref (`list`)\cr + #' The reference input for the IntegratedGradient method. This value is + #' stored as a list of `torch_tensor`s of shape *(1, dim_in)* for each + #' input layer.\cr + #' + n = NULL, + x_ref = NULL, + + #' @description + #' Create a new instance of the `IntegratedGradient` R6 class. When + #' initialized, the method *Integrated Gradient* is applied to the given + #' data and baseline value and the results are stored in the field `result`. + #' + #' @param times_input (`logical(1`)\cr + #' Multiplies the integrated gradients with the difference of the input + #' features and the baseline values. By default, the original definition of + #' IntegratedGradient is applied. However, by setting `times_input = FALSE` + #' only an approximation of the integral is calculated, which describes the + #' sensitivity of the features to the output.\cr + #' @param n (`integer(1)`)\cr + #' Number of steps for the approximation of the integration path along + #' \eqn{\alpha} (default: \eqn{50}).\cr + #' @param x_ref ([`array`], [`data.frame`], [`torch_tensor`] or `list`)\cr + #' The reference input for the IntegratedGradient method. This value + #' must have the same format as the input data of the passed model to the + #' converter object. This means either + #' - an `array`, `data.frame`, `torch_tensor` or array-like format of + #' size *(1, dim_in)*, if e.g., the model has only one input layer, or + #' - a `list` with the corresponding input data (according to the upper point) + #' for each of the input layers. + #' - It is also possible to use the default value `NULL` to take only + #' zeros as reference input.\cr + #' + initialize = function(converter, data, + x_ref = NULL, + n = 50, + times_input = TRUE, + channels_first = TRUE, + output_idx = NULL, + ignore_last_act = TRUE, + verbose = interactive(), + dtype = "float") { + super$initialize(converter, data, channels_first, output_idx, + ignore_last_act, times_input, verbose, dtype) + + cli_check(checkInt(n, lower = 1), "n") + self$n <- n + + if (is.null(x_ref)) { + x_ref <- lapply(lapply(self$data, dim), + function(x) array(0, dim = c(1, x[-1]))) + } + self$x_ref <- private$test_data(x_ref, name = "x_ref") + + # Check if x_ref is only a single instance + num_instances <- unlist(lapply(self$x_ref, function(x) dim(x)[1])) + if (any(num_instances != 1)) { + stopf("For the method {.code IntegratedGradient}, you have to pass ", + "only a single instance for the argument {.arg x_ref}. ", + "You passed (for at least one input layer) {max(num_instances)}", + " data instances!") + } + + self$result <- private$run() + self$converter$model$reset() + } + ), + private = list( + run = function() { + # Combine input and baseline for each input layer + input <- lapply(seq_along(self$data), + function(i) list(data = self$data[[i]], + x_ref = self$x_ref[[i]])) + + # Define helper function for getting `self$n` interpolated inputs, i.e. 
+ # the result has a shape of (batch_size * n, dim_in) + tmp_fun <- function(input) { + # Repeat the input + res <- torch_repeat_interleave( + input$data, + repeats = torch_tensor(self$n, dtype = torch_long()), + dim = 1) + + # Define scale + scale <- torch_tensor(rep(seq(0, 1, length.out = self$n), length.out = res$shape[1])) + scale <- scale$reshape(c(-1, rep(1, res$dim() - 1))) + + # Create interpolations between x and x_ref + input$x_ref + scale * (res - input$x_ref) + } + + # Create interpolated inputs along the integration path for each input + # layer and calculate the gradients of them + input <- lapply(input, tmp_fun) + gradients <- private$calculate_gradients(input, "IntegratedGradient") + + # Define the core IntegreatedGradients calculation + # `grad` has a shape of (batch_size * n, dim_in) + tmp_IG <- function(grad) { + # Output node is not connected to the input layer + if (is.null(grad)) { + res <- NULL + } else { # otherwise ... + # Chunk the gradients for each of the batch_size samples + # Results in a list with batch_size entries containing torch_tensors + # of shape (n, dim_in) + grad <- grad$chunk(dim(self$data[[1]])[1]) + + # Define trapezoidal rule for approximation the integral + trapez_rule <- function(x, n) { + torch_mean((x[1:(n-1), ] + x[2:n]) / 2, dim = 1) + } + + # Calculate the result of IntegratedGradients for current gradients + res <- torch_stack(lapply(grad, trapez_rule, n = self$n)) + } + + res + } + + # Apply IntegratedGradients to all outputs + integrated_grads <- lapply( + gradients, + function(grad_output) lapply(grad_output, tmp_IG) + ) + + # Multiply the integrated gradients with the corresponding difference + # from baseline input (only if times_input is TRUE) + if (self$times_input) { + input_minus_ref <- lapply(seq_along(self$data), function(i) self$data[[i]] - self$x_ref[[i]]) + integrated_grads <- calc_times_input(integrated_grads, input_minus_ref) + } + + integrated_grads + }, + + print_method_specific = function() { + i <- cli_ul() + if (self$times_input) { + cli_li(paste0("{.field times_input}: TRUE (", + symbol$arrow_right, + " decomposition of y - y')")) + } else { + cli_li(paste0("{.field times_input}: FALSE (", + symbol$arrow_right, + " output sensitivity)")) + } + cli_li(paste0("{.field n}: ", self$n)) + all_zeros <- all(unlist(lapply(self$x_ref, + function(x) all(as_array(x) == 0)))) + if (all_zeros) { + s <- "zeros" + } else { + values <- unlist(lapply(self$x_ref, as_array)) + s <- paste0("mean: ", mean(values), " (q1: ", quantile(values, 0.25), + ", q3: ", quantile(values, 0.75), ")") + } + cli_li(paste0("{.field x_ref}: ", s)) + cli_end(id = i) + } + ) +) ############################################################################### # SmoothGrad diff --git a/R/InterpretingMethod.R b/R/InterpretingMethod.R index ea6303c..7763e59 100644 --- a/R/InterpretingMethod.R +++ b/R/InterpretingMethod.R @@ -12,6 +12,7 @@ #' - Gradient-based methods: #' - *Vanilla gradients* including *Gradient\eqn{\times}Input* ([`Gradient`]) #' - Smoothed gradients including *SmoothGrad\eqn{\times}Input* ([`SmoothGrad`]) +#' - *Integrated gradients* ([`IntegratedGradient`]) #' - *Connection Weights* (global and local) ([`ConnectionWeights`]) #' #' @template param-converter @@ -271,7 +272,7 @@ InterpretingMethod <- R6Class( value_name <- "Contribution" include_data <- TRUE } else if (inherits(self, "GradientBased")) { - value_name <- "Gradient" + value_name <- if(self$times_input) "Relevance" else "Gradient" include_data <- TRUE } @@ -745,14 +746,17 @@ 
InterpretingMethod <- R6Class( if (is.data.frame(input_data)) { input_data <- as.matrix(input_data) } - as.array(input_data) + input_data <- as.array(input_data) + assertNumeric(input_data) + input_data }, error = function(e) { stopf("Failed to convert the argument {.arg ", name, - "[[", i, "]]} to an array ", + "[[", i, "]]} to a numeric array ", "using the function {.fn base::as.array}. The class of your ", "argument {.arg ", name, "[[", i, "]]}: '", - paste(class(input_data), collapse = "', '"), "'") + paste(class(input_data), collapse = "', '"), "'", + " (of type: '", paste(typeof(input_data), collapse = "', '"), "')") }) ordered_dim <- self$converter$input_dim[[i]] diff --git a/man-roxygen/examples-IntegratedGradient.R b/man-roxygen/examples-IntegratedGradient.R new file mode 100644 index 0000000..7612646 --- /dev/null +++ b/man-roxygen/examples-IntegratedGradient.R @@ -0,0 +1,107 @@ +#' @examplesIf torch::torch_is_installed() +#' #----------------------- Example 1: Torch ---------------------------------- +#' library(torch) +#' +#' # Create nn_sequential model and data +#' model <- nn_sequential( +#' nn_linear(5, 12), +#' nn_relu(), +#' nn_linear(12, 2), +#' nn_softmax(dim = 2) +#' ) +#' data <- torch_randn(25, 5) +#' ref <- torch_randn(1, 5) +#' +#' # Create Converter +#' converter <- Converter$new(model, input_dim = c(5)) +#' +#' # Apply method IntegratedGradient +#' int_grad <- IntegratedGradient$new(converter, data, x_ref = ref) +#' +#' # Print the result as a torch tensor for first two data points +#' get_result(int_grad, "torch.tensor")[1:2] +#' +#' # Plot the result for both classes +#' plot(int_grad, output_idx = 1:2) +#' +#' # Plot the boxplot of all datapoints and for both classes +#' boxplot(int_grad, output_idx = 1:2) +#' +#' # ------------------------- Example 2: Neuralnet --------------------------- +#' if (require("neuralnet")) { +#' library(neuralnet) +#' data(iris) +#' +#' # Train a neural network +#' nn <- neuralnet((Species == "setosa") ~ Petal.Length + Petal.Width, +#' iris, +#' linear.output = FALSE, +#' hidden = c(3, 2), act.fct = "tanh", rep = 1 +#' ) +#' +#' # Convert the model +#' converter <- Converter$new(nn) +#' +#' # Apply IntegratedGradient with a reference input of the feature means +#' x_ref <- matrix(colMeans(iris[, c(3, 4)]), nrow = 1) +#' int_grad <- IntegratedGradient$new(converter, iris[, c(3, 4)], x_ref = x_ref) +#' +#' # Get the result as a dataframe and show first 5 rows +#' get_result(int_grad, type = "data.frame")[1:5, ] +#' +#' # Plot the result for the first datapoint in the data +#' plot(int_grad, data_idx = 1) +#' +#' # Plot the result as boxplots +#' boxplot(int_grad) +#' } +#' +#' @examplesIf keras::is_keras_available() & torch::torch_is_installed() +#' # ------------------------- Example 3: Keras ------------------------------- +#' if (require("keras")) { +#' library(keras) +#' +#' # Make sure keras is installed properly +#' is_keras_available() +#' +#' data <- array(rnorm(10 * 32 * 32 * 3), dim = c(10, 32, 32, 3)) +#' +#' model <- keras_model_sequential() +#' model %>% +#' layer_conv_2d( +#' input_shape = c(32, 32, 3), kernel_size = 8, filters = 8, +#' activation = "softplus", padding = "valid") %>% +#' layer_conv_2d( +#' kernel_size = 8, filters = 4, activation = "tanh", +#' padding = "same") %>% +#' layer_conv_2d( +#' kernel_size = 4, filters = 2, activation = "relu", +#' padding = "valid") %>% +#' layer_flatten() %>% +#' layer_dense(units = 64, activation = "relu") %>% +#' layer_dense(units = 2, activation = "softmax") +#' +#' # 
Convert the model +#' converter <- Converter$new(model) +#' +#' # Apply the IntegratedGradient method with a zero baseline and n = 20 +#' # iteration steps +#' int_grad <- IntegratedGradient$new(converter, data, +#' channels_first = FALSE, +#' n = 20 +#' ) +#' +#' # Plot the result for the first image and both classes +#' plot(int_grad, output_idx = 1:2) +#' +#' # Plot the result as boxplots for first class +#' boxplot(int_grad, output_idx = 1) +#' } +#' @examplesIf torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1 +#' #------------------------- Plotly plots ------------------------------------ +#' if (require("plotly")) { +#' # You can also create an interactive plot with plotly. +#' # This is a suggested package, so make sure that it is installed +#' library(plotly) +#' boxplot(int_grad, as_plotly = TRUE) +#' } diff --git a/man/ConnectionWeights.Rd b/man/ConnectionWeights.Rd index 2dd068c..3e0f09f 100644 --- a/man/ConnectionWeights.Rd +++ b/man/ConnectionWeights.Rd @@ -134,6 +134,7 @@ simulated data.} Ecological Modelling 178, p. 389–397 Other methods: \code{\link{DeepLift}}, \code{\link{Gradient}}, +\code{\link{IntegratedGradient}}, \code{\link{LRP}}, \code{\link{SmoothGrad}} } diff --git a/man/DeepLift.Rd b/man/DeepLift.Rd index 56754d1..96d0fff 100644 --- a/man/DeepLift.Rd +++ b/man/DeepLift.Rd @@ -140,6 +140,7 @@ propagating activation differences.} ICML 2017, p. 4844-4866 Other methods: \code{\link{ConnectionWeights}}, \code{\link{Gradient}}, +\code{\link{IntegratedGradient}}, \code{\link{LRP}}, \code{\link{SmoothGrad}} } diff --git a/man/Gradient.Rd b/man/Gradient.Rd index 95fb873..37c7b1a 100644 --- a/man/Gradient.Rd +++ b/man/Gradient.Rd @@ -134,6 +134,7 @@ if (require("plotly")) { Other methods: \code{\link{ConnectionWeights}}, \code{\link{DeepLift}}, +\code{\link{IntegratedGradient}}, \code{\link{LRP}}, \code{\link{SmoothGrad}} } diff --git a/man/GradientBased.Rd b/man/GradientBased.Rd index 78b808e..00cde36 100644 --- a/man/GradientBased.Rd +++ b/man/GradientBased.Rd @@ -10,6 +10,7 @@ gradient-based methods and provides a private function to calculate the gradients w.r.t. to the input for given data. Implemented are: \itemize{ \item \emph{Vanilla Gradients} and \emph{Gradient\eqn{\times}Input} (\code{\link{Gradient}}) +\item \emph{Integrated Gradients} (\code{\link{IntegratedGradient}}) \item \emph{SmoothGrad} and \emph{SmoothGrad\eqn{\times}Input} (\code{\link{SmoothGrad}}) } } diff --git a/man/IntegratedGradient.Rd b/man/IntegratedGradient.Rd new file mode 100644 index 0000000..55fab5d --- /dev/null +++ b/man/IntegratedGradient.Rd @@ -0,0 +1,317 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/GradienBased.R +\name{IntegratedGradient} +\alias{IntegratedGradient} +\title{Integrated Gradients} +\description{ +The \code{IntegratedGradient} class implements the method Integrated Gradients +(Sundararajan et al., 2017), which incorporates a reference value \eqn{x'} +(also known as baseline value) analogous to the \code{\link{DeepLift}} method. +Integrated Gradients helps to uncover the relative importance of input +features in the predictions \eqn{y = f(x)} made by a model compared to the +prediction of the reference value \eqn{y' = f(x')}. 
This is achieved through +the following formula: +\deqn{ +(x - x') \times \int_{\alpha=0}^{1} \frac{\partial f(x' + \alpha (x - x'))}{\partial x} d\alpha +} +In simpler terms, it calculates how much each feature contributes to a +model's output by tracing a path from a baseline input \eqn{x'} to the actual +input \eqn{x} and measuring the average gradients along that path. + +Similar to the other gradient-based methods, by default the integrated +gradient is multiplied by the input to get an approximate decomposition +of \eqn{y - y'}. However, with the parameter \code{times_input} only the gradient +describing the output sensitivity can be returned. +} +\examples{ +\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} +#----------------------- Example 1: Torch ---------------------------------- +library(torch) + +# Create nn_sequential model and data +model <- nn_sequential( + nn_linear(5, 12), + nn_relu(), + nn_linear(12, 2), + nn_softmax(dim = 2) +) +data <- torch_randn(25, 5) +ref <- torch_randn(1, 5) + +# Create Converter +converter <- Converter$new(model, input_dim = c(5)) + +# Apply method IntegratedGradient +int_grad <- IntegratedGradient$new(converter, data, x_ref = ref) + +# Print the result as a torch tensor for first two data points +get_result(int_grad, "torch.tensor")[1:2] + +# Plot the result for both classes +plot(int_grad, output_idx = 1:2) + +# Plot the boxplot of all datapoints and for both classes +boxplot(int_grad, output_idx = 1:2) + +# ------------------------- Example 2: Neuralnet --------------------------- +if (require("neuralnet")) { + library(neuralnet) + data(iris) + + # Train a neural network + nn <- neuralnet((Species == "setosa") ~ Petal.Length + Petal.Width, + iris, + linear.output = FALSE, + hidden = c(3, 2), act.fct = "tanh", rep = 1 + ) + + # Convert the model + converter <- Converter$new(nn) + + # Apply IntegratedGradient with a reference input of the feature means + x_ref <- matrix(colMeans(iris[, c(3, 4)]), nrow = 1) + int_grad <- IntegratedGradient$new(converter, iris[, c(3, 4)], x_ref = x_ref) + + # Get the result as a dataframe and show first 5 rows + get_result(int_grad, type = "data.frame")[1:5, ] + + # Plot the result for the first datapoint in the data + plot(int_grad, data_idx = 1) + + # Plot the result as boxplots + boxplot(int_grad) +} +\dontshow{\}) # examplesIf} +\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} +# ------------------------- Example 3: Keras ------------------------------- +if (require("keras")) { + library(keras) + + # Make sure keras is installed properly + is_keras_available() + + data <- array(rnorm(10 * 32 * 32 * 3), dim = c(10, 32, 32, 3)) + + model <- keras_model_sequential() + model \%>\% + layer_conv_2d( + input_shape = c(32, 32, 3), kernel_size = 8, filters = 8, + activation = "softplus", padding = "valid") \%>\% + layer_conv_2d( + kernel_size = 8, filters = 4, activation = "tanh", + padding = "same") \%>\% + layer_conv_2d( + kernel_size = 4, filters = 2, activation = "relu", + padding = "valid") \%>\% + layer_flatten() \%>\% + layer_dense(units = 64, activation = "relu") \%>\% + layer_dense(units = 2, activation = "softmax") + + # Convert the model + converter <- Converter$new(model) + + # Apply the IntegratedGradient method with a zero baseline and n = 20 + # iteration steps + int_grad <- IntegratedGradient$new(converter, data, + channels_first = FALSE, + n = 
20 + ) + + # Plot the result for the first image and both classes + plot(int_grad, output_idx = 1:2) + + # Plot the result as boxplots for first class + boxplot(int_grad, output_idx = 1) +} +\dontshow{\}) # examplesIf} +\dontshow{if (torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} +#------------------------- Plotly plots ------------------------------------ +if (require("plotly")) { + # You can also create an interactive plot with plotly. + # This is a suggested package, so make sure that it is installed + library(plotly) + boxplot(int_grad, as_plotly = TRUE) +} +\dontshow{\}) # examplesIf} +} +\references{ +M. Sundararajan et al. (2017) \emph{Axiomatic attribution for deep networks.} ICML +2017, PMLR 70, pp. 3319-3328. +} +\seealso{ +Other methods: +\code{\link{ConnectionWeights}}, +\code{\link{DeepLift}}, +\code{\link{Gradient}}, +\code{\link{LRP}}, +\code{\link{SmoothGrad}} +} +\concept{methods} +\section{Super classes}{ +\code{\link[innsight:InterpretingMethod]{innsight::InterpretingMethod}} -> \code{\link[innsight:GradientBased]{innsight::GradientBased}} -> \code{IntegratedGradient} +} +\section{Public fields}{ +\if{html}{\out{
<div class="r6-fields">}}
\describe{
\item{\code{n}}{(\code{integer(1)})\cr
Number of steps for the approximation of the integration path along
\eqn{\alpha} (default: \eqn{50}).\cr}

\item{\code{x_ref}}{(\code{list})\cr
The reference input for the IntegratedGradient method. This value is
stored as a list of \code{torch_tensor}s of shape \emph{(1, dim_in)} for each
input layer.\cr}
}
\if{html}{\out{</div>
}} +} +\section{Methods}{ +\subsection{Public methods}{ +\itemize{ +\item \href{#method-IntegratedGradient-new}{\code{IntegratedGradient$new()}} +\item \href{#method-IntegratedGradient-clone}{\code{IntegratedGradient$clone()}} +} +} +\if{html}{\out{ +
<details open><summary>Inherited methods</summary>
</details>
}}
\if{html}{\out{<hr>
}}
\if{html}{\out{<a id="method-IntegratedGradient-new"></a>}}
\if{latex}{\out{\hypertarget{method-IntegratedGradient-new}{}}}
\subsection{Method \code{new()}}{
Create a new instance of the \code{IntegratedGradient} R6 class. When
initialized, the method \emph{Integrated Gradient} is applied to the given
data and baseline value and the results are stored in the field \code{result}.
\subsection{Usage}{
\if{html}{\out{<div class="r">
}}\preformatted{IntegratedGradient$new(
  converter,
  data,
  x_ref = NULL,
  n = 50,
  times_input = TRUE,
  channels_first = TRUE,
  output_idx = NULL,
  ignore_last_act = TRUE,
  verbose = interactive(),
  dtype = "float"
)}\if{html}{\out{</div>
}}
}

\subsection{Arguments}{
\if{html}{\out{<div class="arguments">
}} +\describe{ +\item{\code{converter}}{(\code{\link{Converter}})\cr +An instance of the \code{Converter} class that includes the +torch-converted model and some other model-specific attributes. See +\code{\link{Converter}} for details.\cr} + +\item{\code{data}}{(\code{\link{array}}, \code{\link{data.frame}}, \code{\link{torch_tensor}} or \code{list})\cr +The data to which the method is to be applied. These must +have the same format as the input data of the passed model to the +converter object. This means either +\itemize{ +\item an \code{array}, \code{data.frame}, \code{torch_tensor} or array-like format of +size \emph{(batch_size, dim_in)}, if e.g., the model has only one input layer, or +\item a \code{list} with the corresponding input data (according to the +upper point) for each of the input layers.\cr +}} + +\item{\code{x_ref}}{(\code{\link{array}}, \code{\link{data.frame}}, \code{\link{torch_tensor}} or \code{list})\cr +The reference input for the IntegratedGradient method. This value +must have the same format as the input data of the passed model to the +converter object. This means either +\itemize{ +\item an \code{array}, \code{data.frame}, \code{torch_tensor} or array-like format of +size \emph{(1, dim_in)}, if e.g., the model has only one input layer, or +\item a \code{list} with the corresponding input data (according to the upper point) +for each of the input layers. +\item It is also possible to use the default value \code{NULL} to take only +zeros as reference input.\cr +}} + +\item{\code{n}}{(\code{integer(1)})\cr +Number of steps for the approximation of the integration path along +\eqn{\alpha} (default: \eqn{50}).\cr} + +\item{\code{times_input}}{(\verb{logical(1})\cr +Multiplies the integrated gradients with the difference of the input +features and the baseline values. By default, the original definition of +IntegratedGradient is applied. However, by setting \code{times_input = FALSE} +only an approximation of the integral is calculated, which describes the +sensitivity of the features to the output.\cr} + +\item{\code{channels_first}}{(\code{logical(1)})\cr +The channel position of the given data (argument +\code{data}). If \code{TRUE}, the channel axis is placed at the second position +between the batch size and the rest of the input axes, e.g., +\code{c(10,3,32,32)} for a batch of ten images with three channels and a +height and width of 32 pixels. Otherwise (\code{FALSE}), the channel axis +is at the last position, i.e., \code{c(10,32,32,3)}. If the data +has no channel axis, use the default value \code{TRUE}.\cr} + +\item{\code{output_idx}}{(\code{integer}, \code{list} or \code{NULL})\cr +These indices specify the output nodes for which +the method is to be applied. In order to allow models with multiple +output layers, there are the following possibilities to select +the indices of the output nodes in the individual output layers: +\itemize{ +\item An \code{integer} vector of indices: If the model has only one output +layer, the values correspond to the indices of the output nodes, e.g., +\code{c(1,3,4)} for the first, third and fourth output node. If there are +multiple output layers, the indices of the output nodes from the first +output layer are considered. +\item A \code{list} of \code{integer} vectors of indices: If the method is to be +applied to output nodes from different layers, a list can be passed +that specifies the desired indices of the output nodes for each +output layer. 
Unwanted output layers have the entry \code{NULL} instead of +a vector of indices, e.g., \code{list(NULL, c(1,3))} for the first and +third output node in the second output layer. +\item \code{NULL} (default): The method is applied to all output nodes in +the first output layer but is limited to the first ten as the +calculations become more computationally expensive for more output +nodes.\cr +}} + +\item{\code{ignore_last_act}}{(\code{logical(1)})\cr +Set this logical value to include the last +activation functions for each output layer, or not (default: \code{TRUE}). +In practice, the last activation (especially for softmax activation) is +often omitted.\cr} + +\item{\code{verbose}}{(\code{logical(1)})\cr +This logical argument determines whether a progress bar is +displayed for the calculation of the method or not. The default value is +the output of the primitive R function \code{\link[=interactive]{interactive()}}.\cr} + +\item{\code{dtype}}{(\code{character(1)})\cr +The data type for the calculations. Use +either \code{'float'} for \link{torch_float} or \code{'double'} for +\link{torch_double}.\cr} +} +\if{html}{\out{
</div>}}
}
}
\if{html}{\out{<hr>
}}
\if{html}{\out{<a id="method-IntegratedGradient-clone"></a>}}
\if{latex}{\out{\hypertarget{method-IntegratedGradient-clone}{}}}
\subsection{Method \code{clone()}}{
The objects of this class are cloneable with this method.
\subsection{Usage}{
\if{html}{\out{<div class="r">
}}\preformatted{IntegratedGradient$clone(deep = FALSE)}\if{html}{\out{</div>
}}
}

\subsection{Arguments}{
\if{html}{\out{<div class="arguments">
}}
\describe{
\item{\code{deep}}{Whether to make a deep clone.}
}
\if{html}{\out{</div>
}} +} +} +} diff --git a/man/InterpretingMethod.Rd b/man/InterpretingMethod.Rd index a896a00..1050c64 100644 --- a/man/InterpretingMethod.Rd +++ b/man/InterpretingMethod.Rd @@ -13,6 +13,7 @@ This is a super class for all interpreting methods in the \itemize{ \item \emph{Vanilla gradients} including \emph{Gradient\eqn{\times}Input} (\code{\link{Gradient}}) \item Smoothed gradients including \emph{SmoothGrad\eqn{\times}Input} (\code{\link{SmoothGrad}}) +\item \emph{Integrated gradients} (\code{\link{IntegratedGradient}}) } \item \emph{Connection Weights} (global and local) (\code{\link{ConnectionWeights}}) } diff --git a/man/LRP.Rd b/man/LRP.Rd index 769838a..63f3e5d 100644 --- a/man/LRP.Rd +++ b/man/LRP.Rd @@ -154,6 +154,7 @@ Other methods: \code{\link{ConnectionWeights}}, \code{\link{DeepLift}}, \code{\link{Gradient}}, +\code{\link{IntegratedGradient}}, \code{\link{SmoothGrad}} } \concept{methods} diff --git a/man/SmoothGrad.Rd b/man/SmoothGrad.Rd index 018b014..c396d83 100644 --- a/man/SmoothGrad.Rd +++ b/man/SmoothGrad.Rd @@ -134,6 +134,7 @@ Other methods: \code{\link{ConnectionWeights}}, \code{\link{DeepLift}}, \code{\link{Gradient}}, +\code{\link{IntegratedGradient}}, \code{\link{LRP}} } \concept{methods} diff --git a/tests/testthat/test_IntegratedGradient.R b/tests/testthat/test_IntegratedGradient.R new file mode 100644 index 0000000..a958a53 --- /dev/null +++ b/tests/testthat/test_IntegratedGradient.R @@ -0,0 +1,361 @@ + +test_that("IntegratedGradient: General errors", { + library(keras) + library(torch) + + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- keras_model_sequential() + model %>% + layer_dense(units = 16, activation = "relu", input_shape = c(4)) %>% + layer_dense(units = 8, activation = "relu") %>% + layer_dense(units = 3, activation = "softmax") + + converter <- Converter$new(model) + + expect_error(IntegratedGradient$new(model, data)) + expect_error(IntegratedGradient$new(converter, model)) + expect_error(IntegratedGradient$new(converter, data, channels_first = NULL)) + expect_error(IntegratedGradient$new(converter, data, times_input = "asdf")) + expect_error(IntegratedGradient$new(converter, data, x_ref = "asdf")) + expect_error(IntegratedGradient$new(converter, data, n = "asdf")) + expect_error(IntegratedGradient$new(converter, data, dtype = NULL)) + expect_error(IntegratedGradient$new(converter, data, ignore_last_act = c(1))) +}) + +test_that("IntegratedGradient: Plot and Boxplot", { + library(neuralnet) + library(torch) + + data(iris) + data <- iris[sample.int(150, size = 10), -5] + nn <- neuralnet(Species ~ ., + iris, + linear.output = FALSE, + hidden = c(10, 8), act.fct = "tanh", rep = 1, threshold = 0.5 + ) + # create an converter for this model + converter <- Converter$new(nn) + + ig <- IntegratedGradient$new(converter, data, + dtype = "double", + ignore_last_act = FALSE + ) + + # ggplot2 + + # Non-existing data points + expect_error(plot(ig, data_idx = c(1,11))) + expect_error(boxplot(ig, data_idx = 1:11)) + # Non-existing class + expect_error(plot(ig, output_idx = c(5))) + expect_error(boxplot(ig, output_idx = c(5))) + + p <- plot(ig) + boxp <- boxplot(ig) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + p <- plot(ig, data_idx = 1:3) + boxp <- boxplot(ig, data_idx = 1:4) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + p <- plot(ig, data_idx = 1:3, output_idx = 1:3) + boxp <- boxplot(ig, data_idx = 1:5, output_idx = 1:3) + expect_s4_class(p, "innsight_ggplot2") + 
expect_s4_class(boxp, "innsight_ggplot2") + + # plotly + library(plotly) + + p <- plot(ig, as_plotly = TRUE) + boxp <- boxplot(ig, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") + p <- plot(ig, data_idx = 1:3, as_plotly = TRUE) + boxp <- boxplot(ig, data_idx = 1:4, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") + p <- plot(ig, data_idx = 1:3, output_idx = 1:3, as_plotly = TRUE) + boxp <- boxplot(ig, data_idx = 1:5, output_idx = 1:3, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") +}) + +test_that("IntegratedGradient: Dense-Net (Neuralnet)", { + library(neuralnet) + library(torch) + + data(iris) + + data <- iris[sample.int(150, size = 10), -5] + nn <- neuralnet(Species ~ ., + iris, + linear.output = FALSE, + hidden = c(10, 8), act.fct = "tanh", rep = 1, threshold = 0.5 + ) + # create an converter for this model + converter <- Converter$new(nn) + x_ref <- matrix(rnorm(4), nrow = 1) + + # ignore last activation + ig <- IntegratedGradient$new(converter, data, + x_ref = x_ref, + ignore_last_act = FALSE) + + int_grad <- ig$get_result(type = "torch.tensor") + expect_equal(dim(int_grad), c(10, 4, 3)) + + # include last activation + ig <- IntegratedGradient$new(converter, data, + x_ref = x_ref, + ignore_last_act = TRUE) + int_grad_no_last_act <- ig$get_result(type = "torch.tensor") + expect_equal(dim(int_grad_no_last_act), c(10, 4, 3)) + + # not times input + ig <- IntegratedGradient$new(converter, data, + x_ref = x_ref, + times_input = FALSE, + ignore_last_act = TRUE) + int_grad_no_times_input <- ig$get_result(type = "torch.tensor") + expect_equal(dim(int_grad_no_times_input), c(10, 4, 3)) +}) + +test_that("IntegratedGradient: Dense-Net (keras)", { + library(keras) + library(torch) + + data <- matrix(rnorm(4 * 10), nrow = 10) + + model <- keras_model_sequential() + model %>% + layer_dense(units = 16, activation = "relu", input_shape = c(4)) %>% + layer_dense(units = 8, activation = "tanh") %>% + layer_dense(units = 3, activation = "softmax") + + converter <- Converter$new(model) + x_ref <- matrix(rnorm(4), nrow = 1) + + # ignore last activation + ig <- IntegratedGradient$new(converter, data, + x_ref = x_ref, + ignore_last_act = FALSE) + + int_grad <- ig$get_result(type = "torch.tensor") + expect_equal(dim(int_grad), c(10, 4, 3)) + + # not times input + ig <- IntegratedGradient$new(converter, data, + x_ref = x_ref, + times_input = FALSE, + ignore_last_act = TRUE) + int_grad_no_times_input <- ig$get_result(type = "torch.tensor") + expect_equal(dim(int_grad_no_times_input), c(10, 4, 3)) +}) + +test_that("IntegratedGradient: Conv1D-Net", { + library(keras) + library(torch) + + data <- array(rnorm(4 * 64 * 3), dim = c(4, 64, 3)) + + model <- keras_model_sequential() + model %>% + layer_conv_1d( + input_shape = c(64, 3), kernel_size = 16, filters = 8, + activation = "softplus" + ) %>% + layer_conv_1d(kernel_size = 16, filters = 4, activation = "tanh") %>% + layer_conv_1d(kernel_size = 16, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 64, activation = "relu") %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 1, activation = "sigmoid") + + # test non-fitted model + converter <- Converter$new(model) + x_ref <- array(rnorm(64 * 3), dim = c(1, 64, 3)) + + # ignore last activation + ig <- IntegratedGradient$new(converter, data, + x_ref = x_ref, + channels_first = FALSE, + ignore_last_act = 
FALSE) + + int_grad <- ig$get_result(type = "torch.tensor") + expect_equal(dim(int_grad), c(4, 64, 3, 1)) + + # not times input + ig <- IntegratedGradient$new(converter, data, + x_ref = x_ref, + times_input = FALSE, + channels_first = FALSE, + ignore_last_act = TRUE) + int_grad_no_times_input <- ig$get_result(type = "torch.tensor") + expect_equal(dim(int_grad_no_times_input), c(4, 64, 3, 1)) +}) + +test_that("IntegratedGradient: Conv2D-Net", { + library(keras) + library(torch) + + data <- array(rnorm(4 * 32 * 32 * 3), dim = c(4, 32, 32, 3)) + + model <- keras_model_sequential() + model %>% + layer_conv_2d( + input_shape = c(32, 32, 3), kernel_size = 8, filters = 8, + activation = "softplus", padding = "same" + ) %>% + layer_conv_2d( + kernel_size = 8, filters = 4, activation = "tanh", + padding = "same" + ) %>% + layer_conv_2d( + kernel_size = 4, filters = 2, activation = "relu", + padding = "same" + ) %>% + layer_flatten() %>% + layer_dense(units = 64, activation = "relu") %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 2, activation = "softmax") + + # test non-fitted model + converter <- Converter$new(model) + x_ref <- array(rnorm(32 * 32 * 3), dim = c(1, 32, 32, 3)) + + # ignore last activation + ig <- IntegratedGradient$new(converter, data, + x_ref = x_ref, + channels_first = FALSE, + ignore_last_act = FALSE) + + int_grad <- ig$get_result(type = "torch.tensor") + expect_equal(dim(int_grad), c(4, 32, 32, 3, 2)) + + # not times input + ig <- IntegratedGradient$new(converter, data, + x_ref = x_ref, + times_input = FALSE, + channels_first = FALSE, + ignore_last_act = TRUE) + int_grad_no_times_input <- ig$get_result(type = "torch.tensor") + expect_equal(dim(int_grad_no_times_input), c(4, 32, 32, 3, 2)) +}) + + + +test_that("IntegratedGradient: Keras model with two inputs + two outputs (concat)", { + library(keras) + + main_input <- layer_input(shape = c(10,10,2), name = 'main_input') + lstm_out <- main_input %>% + layer_conv_2d(2, c(2,2), activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 4) + auxiliary_input <- layer_input(shape = c(5), name = 'aux_input') + auxiliary_output <- layer_concatenate(c(lstm_out, auxiliary_input)) %>% + layer_dense(units = 2, activation = 'relu', name = 'aux_output') + main_output <- layer_concatenate(c(lstm_out, auxiliary_input)) %>% + layer_dense(units = 5, activation = 'relu') %>% + layer_dense(units = 3, activation = 'tanh', name = 'main_output') + model <- keras_model( + inputs = c(auxiliary_input, main_input), + outputs = c(auxiliary_output, main_output) + ) + + converter <- Converter$new(model) + + # Check IntegratedGradient with ignoring last activation + data <- lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(10, x))) + x_ref <- lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(1, x))) + + int_grad <- IntegratedGradient$new(converter, data, x_ref = x_ref, + channels_first = FALSE, output_idx = list(c(2), c(1,3))) + result <- int_grad$get_result() + expect_equal(length(result), 2) + expect_equal(length(result[[1]]), 2) + expect_equal(dim(result[[1]][[1]]), c(10,5,1)) + expect_equal(dim(result[[1]][[2]]), c(10,10,10,2,1)) + expect_equal(length(result[[2]]), 2) + expect_equal(dim(result[[2]][[1]]), c(10,5,2)) + expect_equal(dim(result[[2]][[2]]), c(10,10,10,2,2)) + + # Check IntegratedGradient without times_input and ignoring last activation + data <- lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(10, x))) + x_ref <- 
lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(1, x))) + int_grad <- IntegratedGradient$new(converter, data, x_ref = x_ref, + channels_first = FALSE, + times_input = FALSE, + output_idx = list(c(1), c(1,2))) + result <- int_grad$get_result() + expect_equal(length(result), 2) + expect_equal(length(result[[1]]), 2) + expect_equal(dim(result[[1]][[1]]), c(10,5,1)) + expect_equal(dim(result[[1]][[2]]), c(10,10,10,2,1)) + expect_equal(length(result[[2]]), 2) + expect_equal(dim(result[[2]][[1]]), c(10,5,2)) + expect_equal(dim(result[[2]][[2]]), c(10,10,10,2,2)) +}) + + +test_that("IntegratedGradient: Keras model with three inputs + one output (add)", { + library(keras) + + input_1 <- layer_input(shape = c(12,15,3)) + part_1 <- input_1 %>% + layer_conv_2d(3, c(4,4), activation = "relu", use_bias = FALSE) %>% + layer_conv_2d(2, c(3,3), activation = "relu", use_bias = FALSE) %>% + layer_flatten() %>% + layer_dense(20, activation = "relu", use_bias = FALSE) + input_2 <- layer_input(shape = c(10)) + part_2 <- input_2 %>% + layer_dense(50, activation = "tanh", use_bias = FALSE) + input_3 <- layer_input(shape = c(20)) + part_3 <- input_3 %>% + layer_dense(40, activation = "relu", use_bias = FALSE) + + output <- layer_concatenate(c(part_1, part_3, part_2)) %>% + layer_dense(100, activation = "relu", use_bias = FALSE) %>% + layer_dense(1, activation = "linear", use_bias = FALSE) + + model <- keras_model( + inputs = c(input_1, input_3, input_2), + outputs = output + ) + + converter <- Converter$new(model) + + # Check IntegratedGradient with ignoring last activation + data <- lapply(list(c(12,15,3), c(20), c(10)), + function(x) torch_randn(c(10,x))) + x_ref <- lapply(list(c(12,15,3), c(20), c(10)), + function(x) torch_randn(c(1,x))) + + int_grad <- IntegratedGradient$new(converter, data, x_ref = x_ref, + channels_first = FALSE) + result <- int_grad$get_result() + expect_equal(length(result), 3) + expect_equal(dim(result[[1]]), c(10,12,15,3,1)) + expect_equal(dim(result[[2]]), c(10,20,1)) + expect_equal(dim(result[[3]]), c(10,10,1)) + + # Check IntegratedGradient without times_input and ignoring last activation + data <- lapply(list(c(12,15,3), c(20), c(10)), + function(x) torch_randn(c(10,x))) + x_ref <- lapply(list(c(12,15,3), c(20), c(10)), + function(x) torch_randn(c(1,x))) + + int_grad <- IntegratedGradient$new(converter, data, x_ref = x_ref, + channels_first = FALSE, + times_input = FALSE) + result <- int_grad$get_result() + expect_equal(length(result), 3) + expect_equal(dim(result[[1]]), c(10,12,15,3,1)) + expect_equal(dim(result[[2]]), c(10,20,1)) + expect_equal(dim(result[[3]]), c(10,10,1)) +}) + From 63431aedd81c2dc2bce81f392e4d88aa94ad7df9 Mon Sep 17 00:00:00 2001 From: nkoenen Date: Wed, 8 Nov 2023 12:13:37 +0100 Subject: [PATCH 02/20] Add the method DeepSHAP * Implementing the method DeepSHAP as the R6 class `DeepSHAp` * Add documentation for this new class * Add tests for this new class --- NAMESPACE | 2 + R/DeepLift.R | 176 +++++- R/InterpretingLayer.R | 29 +- R/InterpretingMethod.R | 27 +- R/Layer_conv1d.R | 6 +- R/Layer_conv2d.R | 6 +- R/Layer_dense.R | 6 +- R/Layer_normalization.R | 4 +- R/Layer_pooling.R | 16 +- ...xamples_DeepLift.R => examples-DeepLift.R} | 0 man-roxygen/examples-DeepSHAP.R | 109 ++++ man/ConnectionWeights.Rd | 1 + man/DeepLift.Rd | 1 + man/DeepSHAP.Rd | 323 ++++++++++ man/Gradient.Rd | 1 + man/IntegratedGradient.Rd | 1 + man/LRP.Rd | 1 + man/SmoothGrad.Rd | 1 + tests/testthat/test_DeepSHAP.R | 595 ++++++++++++++++++ 19 files changed, 
1263 insertions(+), 42 deletions(-) rename man-roxygen/{examples_DeepLift.R => examples-DeepLift.R} (100%) create mode 100644 man-roxygen/examples-DeepSHAP.R create mode 100644 man/DeepSHAP.Rd create mode 100644 tests/testthat/test_DeepSHAP.R diff --git a/NAMESPACE b/NAMESPACE index db2892c..594ed7b 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -2,12 +2,14 @@ S3method(boxplot,ConnectionWeights) S3method(boxplot,DeepLift) +S3method(boxplot,DeepSHAP) S3method(boxplot,GradientBased) S3method(boxplot,LRP) S3method(get_result,InterpretingMethod) export(ConnectionWeights) export(Converter) export(DeepLift) +export(DeepSHAP) export(Gradient) export(IntegratedGradient) export(LRP) diff --git a/R/DeepLift.R b/R/DeepLift.R index d1f77ff..44feacf 100644 --- a/R/DeepLift.R +++ b/R/DeepLift.R @@ -1,3 +1,7 @@ +############################################################################### +# DeepLift +############################################################################### + #' @title Deep learning important features (DeepLift) #' #' @description @@ -26,7 +30,7 @@ #' @template param-dtype #' @template param-verbose #' @template field-x_ref -#' @template examples_DeepLift +#' @template examples-DeepLift #' #' @references #' A. Shrikumar et al. (2017) \emph{Learning important features through @@ -84,6 +88,15 @@ DeepLift <- R6Class( } self$x_ref <- private$test_data(x_ref, name = "x_ref") + # Check if x_ref is only a single instance + num_instances <- unlist(lapply(self$x_ref, function(x) dim(x)[1])) + if (any(num_instances != 1)) { + stopf("For the method {.code DeepLift}, you have to pass ", + "only a single instance for the argument {.arg x_ref}. ", + "You passed (for at least one input layer) {max(num_instances)}", + " data instances!") + } + self$converter$model$forward(self$data, channels_first = self$channels_first, save_input = TRUE, @@ -123,6 +136,159 @@ DeepLift <- R6Class( ) +############################################################################### +# DeepSHAP +############################################################################### + +#' Deep Shapley additive explanations (DeepSHAP) +#' +#' @description +#' The *DeepSHAP* method extends the [DeepLift] technique by not only +#' considering a single reference value but by calculating the average +#' from several, ideally representative reference values at each layer. The +#' obtained feature-wise results are approximate Shapley values for the +#' chosen output, where the conditional expectation is computed using these +#' different reference values, i.e., the *DeepSHAP* method decompose the +#' difference from the prediction and the mean prediction \eqn{f(x) - E[f(x)]} +#' in feature-wise effects. The reference values can be passed by the argument +#' `data_ref`. +#' +#' @template param-converter +#' @template param-data +#' @template param-output_idx +#' @template param-channels_first +#' @template param-ignore_last_act +#' @template param-dtype +#' @template param-verbose +#' @template examples-DeepSHAP +#' +#' @references +#' S. Lundberg & S. Lee (2017) \emph{A unified approach to interpreting model +#' predictions.} NIPS 2017, p. 4768–4777 +#' +#' @family methods +#' @export +DeepSHAP <- R6Class( + classname = "DeepSHAP", + inherit = InterpretingMethod, + public = list( + + #' @field rule_name (`character(1)`)\cr + #' Name of the applied rule to calculate the contributions. 
+ #' Either `'rescale'` or `'reveal_cancel'`.\cr + #' @field data_ref (`list`)\cr + #' The passed reference dataset for estimating the conditional expectation + #' as a `list` of `torch_tensors` in the selected + #' data format (field `dtype`) matching the corresponding shapes of the + #' individual input layers. Besides, the channel axis is moved to the + #' second position after the batch size because internally only the + #' format *channels first* is used.\cr + rule_name = NULL, + data_ref = NULL, + + #' @description + #' Create a new instance of the `DeepSHAP` R6 class. When initialized, + #' the method *DeepSHAP* is applied to the given data and the results are + #' stored in the field `result`. + #' + #' @param rule_name (`character(1)`)\cr + #' Name of the applied rule to calculate the + #' contributions. Use either `'rescale'` or `'reveal_cancel'`. \cr + #' @param winner_takes_all (`logical(1)`)\cr + #' This logical argument is only relevant for MaxPooling + #' layers and is otherwise ignored. With this layer type, it is possible that + #' the position of the maximum values in the pooling kernel of the normal input + #' \eqn{x} and the reference input \eqn{x'} may not match, which leads to a + #' violation of the summation-to-delta property. To overcome this problem, + #' another variant is implemented, which treats a MaxPooling layer as an + #' AveragePooling layer in the backward pass only, leading to an uniform + #' distribution of the upper-layer contribution to the lower layer.\cr + #' @param data_ref ([`array`], [`data.frame`], [`torch_tensor`] or `list`)\cr + #' The reference data which is used to estimate the conditional expectation. + #' These must have the same format as the input data of the passed model to + #' the converter object. This means either + #' \itemize{ + #' \item an `array`, `data.frame`, `torch_tensor` or array-like format of + #' size *(batch_size, dim_in)*, if e.g., the model has only one input layer, or + #' \item a `list` with the corresponding input data (according to the + #' upper point) for each of the input layers. + #' \item or `NULL` (default) to use only a zero baseline for the estimation.\cr + #' } + #' @param limit_ref (`integer(1)`)\cr + #' This argument limits the number of instances taken from the reference + #' dataset `data_ref` so that only random `limit_ref` elements and not + #' the entire dataset are used to estimate the conditional expectation. 
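+ #' By default, at most 100 randomly drawn instances are used.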
+ #' A too-large number can significantly increase the computation time.\cr + initialize = function(converter, data, + channels_first = TRUE, + output_idx = NULL, + ignore_last_act = TRUE, + rule_name = "rescale", + data_ref = NULL, + limit_ref = 100, + winner_takes_all = TRUE, + verbose = interactive(), + dtype = "float") { + super$initialize(converter, data, channels_first, output_idx, + ignore_last_act, winner_takes_all, verbose, dtype) + + cli_check(checkChoice(rule_name, c("rescale")), "rule_name") + self$rule_name <- rule_name + cli_check(checkInt(limit_ref), "limit_ref") + + # For default values of data_ref, DeepLift with zero baseline is applied + if (is.null(data_ref)) { + data_ref <- lapply(lapply(self$data, dim), + function(x) array(0, dim = c(1, x[-1]))) + } + self$data_ref <- private$test_data(data_ref, name = "data_ref") + + # For computational reasons, the number of instances in the reference + # dataset 'data_ref' is limited by the value `limit_ref` + num_samples <- dim(self$data_ref[[1]])[1] + ids <- sample.int(num_samples, min(num_samples, limit_ref)) + self$data_ref <- lapply(self$data_ref, function(x) x[ids, drop = FALSE]) + + # Forward for normal input + self$converter$model$forward(self$data, + channels_first = self$channels_first, + save_input = TRUE, + save_preactivation = TRUE, + save_output = TRUE + ) + # Forward for reference dataset + self$converter$model$update_ref(self$data_ref, + channels_first = self$channels_first, + save_input = TRUE, + save_preactivation = TRUE, + save_output = TRUE + ) + + self$result <- private$run("DeepSHAP") + } + ), + + private = list( + print_method_specific = function() { + i <- cli_ul() + cli_li(paste0("{.field rule_name}: '", self$rule_name, "'")) + cli_li(paste0("{.field winner_takes_all}: ", self$winner_takes_all)) + all_zeros <- all(unlist(lapply(self$x_ref, + function(x) all(as_array(x) == 0)))) + if (all_zeros) { + s <- "zeros" + } else { + values <- unlist(lapply(self$x_ref, as_array)) + s <- paste0("mean: ", mean(values), " (q1: ", quantile(values, 0.25), + ", q3: ", quantile(values, 0.75), ")") + } + cli_li(paste0("{.field x_ref}: ", s)) + cli_end(id = i) + } + ) +) + + #' #' @importFrom graphics boxplot #' @exportS3Method @@ -130,3 +296,11 @@ DeepLift <- R6Class( boxplot.DeepLift <- function(x, ...) { x$boxplot(...) } + +#' +#' @importFrom graphics boxplot +#' @exportS3Method +#' +boxplot.DeepSHAP <- function(x, ...) { + x$boxplot(...) +} diff --git a/R/InterpretingLayer.R b/R/InterpretingLayer.R index 577cf31..9928e46 100644 --- a/R/InterpretingLayer.R +++ b/R/InterpretingLayer.R @@ -91,7 +91,8 @@ InterpretingLayer <- nn_module( rel_input }, - get_input_multiplier = function(mult_output, rule_name = "rescale", ...) { + get_input_multiplier = function(mult_output, rule_name = "rescale", + use_grad_near_zero = FALSE, ...) 
{ # --------------------- Non-linear part--------------------------- mult_pos <- mult_output @@ -106,17 +107,23 @@ InterpretingLayer <- nn_module( (self$preactivation - self$preactivation_ref)$unsqueeze(-1) # Near zero needs special treatment - mask <- torch_le(abs(delta_preact), eps) * 1.0 - x <- mask * - (self$preactivation + self$preactivation_ref)$unsqueeze(-1) / 2 - x$requires_grad <- TRUE + if (use_grad_near_zero) { + mask <- torch_le(abs(delta_preact), eps) * 1.0 + x <- mask * + (self$preactivation + self$preactivation_ref)$unsqueeze(-1) / 2 + x$requires_grad <- TRUE + + y <- sum(self$activation_f(x)) + grad <- autograd_grad(y, x)[[1]] + + nonlin_mult <- + (1 - mask) * (delta_output / (delta_preact + delta_preact$eq(0.0) * eps)) + + mask * grad + } else { + nonlin_mult <- + delta_output / (delta_preact + delta_preact$eq(0.0) * eps) + } - y <- sum(self$activation_f(x)) - grad <- autograd_grad(y, x)[[1]] - - nonlin_mult <- - (1 - mask) * (delta_output / (delta_preact + delta_preact$eq(0.0) * eps)) + - mask * grad mult_pos <- mult_output * nonlin_mult mult_neg <- mult_output * nonlin_mult diff --git a/R/InterpretingMethod.R b/R/InterpretingMethod.R index 7763e59..26dce70 100644 --- a/R/InterpretingMethod.R +++ b/R/InterpretingMethod.R @@ -268,7 +268,7 @@ InterpretingMethod <- R6Class( } else if (inherits(self, "LRP")) { value_name <- "Relevance" include_data <- TRUE - } else if (inherits(self, "DeepLift")) { + } else if (inherits(self, c("DeepLift", "DeepSHAP"))) { value_name <- "Contribution" include_data <- TRUE } else if (inherits(self, "GradientBased")) { @@ -390,7 +390,7 @@ InterpretingMethod <- R6Class( value_name <- "Relative Importance" } else if (inherits(self, "LRP")) { value_name <- "Relevance" - } else if (inherits(self, "DeepLift")) { + } else if (inherits(self, c("DeepLift", "DeepSHAP"))) { value_name <- "Contribution" } else if (inherits(self, "GradientBased")) { value_name <- "Gradient" @@ -531,7 +531,7 @@ InterpretingMethod <- R6Class( private = list( # ----------------------- backward Function ------------------------------- - run = function(method_name) { # only 'LRP' or 'DeepLift' + run = function(method_name) { # Declare vector for relevances for each output node rel_list <- vector(mode = "list", @@ -582,7 +582,7 @@ InterpretingMethod <- R6Class( } # For DeepLift, we only need ones - if (method_name == "DeepLift") { + if (method_name %in% c("DeepLift", "DeepSHAP")) { rel <- torch_diag_embed(torch_ones_like(out)) # Overwrite rule name if (self$ignore_last_act) { @@ -656,9 +656,11 @@ InterpretingMethod <- R6Class( rel <- layer$get_input_relevances(rel, rule_name = lrp_rule$rule_name, rule_param = lrp_rule$rule_param, winner_takes_all = self$winner_takes_all) - } else if (method_name == "DeepLift") { + } else if (method_name %in% c("DeepLift", "DeepSHAP")) { + use_grad_near_zero <- ifelse(method_name == "DeepLift", TRUE, FALSE) rel <- layer$get_input_multiplier(rel, rule_name = rule_name, - winner_takes_all = self$winner_takes_all) + winner_takes_all = self$winner_takes_all, + use_grad_near_zero = use_grad_near_zero) } else if (method_name == "Connection-Weights") { rel <- layer$get_gradient(rel, weight = layer$W, use_avgpool = !self$winner_takes_all) @@ -713,18 +715,21 @@ InterpretingMethod <- R6Class( } } - # For the DeepLift method, we only get the multiplier. Hence, we have - # to multiply this by the differences of inputs - if (method_name == "DeepLift") { + # For the DeepLift and DeepSHAP method, we only get the multiplier. 
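+ # For DeepSHAP, the reference value is the average over the whole
+ # reference dataset.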
+ # Hence, we have to multiply this by the differences of inputs + if (method_name %in% c("DeepLift", "DeepSHAP")) { fun <- function(result, out_idx, in_idx, x, x_ref) { res <- result[[out_idx]][[in_idx]] if (is.null(res)) { res <- NULL } else { - res <- res * (x[[in_idx]] - x_ref[[in_idx]])$unsqueeze(-1) + res <- res * (x[[in_idx]] - + torch_mean(x_ref[[in_idx]], dim = 1, keepdim = TRUE) + )$unsqueeze(-1) } } - result <- apply_results(result, fun, x = self$data, x_ref = self$x_ref) + x_ref <- if (method_name == "DeepLift") self$x_ref else self$data_ref + result <- apply_results(result, fun, x = self$data, x_ref = x_ref) } result diff --git a/R/Layer_conv1d.R b/R/Layer_conv1d.R index ce6ab0b..4e40cf0 100644 --- a/R/Layer_conv1d.R +++ b/R/Layer_conv1d.R @@ -83,7 +83,7 @@ conv1d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref + self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) } # Apply padding if (any(self$padding != 0)) { @@ -97,12 +97,12 @@ conv1d_layer <- nn_module( dilation = self$dilation ) if (save_preactivation) { - self$preactivation_ref <- preactivation_ref + self$preactivation_ref <- preactivation_ref$mean(dim = 1, keepdim = TRUE) } output_ref <- self$activation_f(preactivation_ref) if (save_output) { - self$output_ref <- output_ref + self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) } output_ref diff --git a/R/Layer_conv2d.R b/R/Layer_conv2d.R index 724c336..9904f65 100644 --- a/R/Layer_conv2d.R +++ b/R/Layer_conv2d.R @@ -89,7 +89,7 @@ conv2d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref + self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) } # Apply padding if (any(self$padding != 0)) { @@ -103,12 +103,12 @@ conv2d_layer <- nn_module( dilation = self$dilation ) if (save_preactivation) { - self$preactivation_ref <- preactivation_ref + self$preactivation_ref <- preactivation_ref$mean(dim = 1, keepdim = TRUE) } # Apply non-linearity output_ref <- self$activation_f(preactivation_ref) if (save_output) { - self$output_ref <- output_ref + self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) } output_ref diff --git a/R/Layer_dense.R b/R/Layer_dense.R index 8e14b06..6b75bf1 100644 --- a/R/Layer_dense.R +++ b/R/Layer_dense.R @@ -69,15 +69,15 @@ dense_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref + self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) } preactivation_ref <- nnf_linear(x_ref, self$W, self$b) if (save_preactivation) { - self$preactivation_ref <- preactivation_ref + self$preactivation_ref <- preactivation_ref$mean(dim = 1, keepdim = TRUE) } output_ref <- self$activation_f(preactivation_ref) if (save_output) { - self$output_ref <- output_ref + self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) } output_ref diff --git a/R/Layer_normalization.R b/R/Layer_normalization.R index 5e3472d..970f5c8 100644 --- a/R/Layer_normalization.R +++ b/R/Layer_normalization.R @@ -38,14 +38,14 @@ batchnorm_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_output = TRUE, ...) 
{ if (save_input) { - self$input_ref <- x_ref + self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) } out <- nnf_batch_norm(x_ref, self$run_mean, self$run_var, self$gamma, self$beta, eps = self$eps) if (save_output) { - self$output_ref <- out + self$output_ref <- out$mean(dim = 1, keepdim = TRUE) } out diff --git a/R/Layer_pooling.R b/R/Layer_pooling.R index 82af000..bbcf07d 100644 --- a/R/Layer_pooling.R +++ b/R/Layer_pooling.R @@ -169,11 +169,11 @@ avg_pool1d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref + self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) } output_ref <- nnf_avg_pool1d(x_ref, self$kernel_size, self$strides) if (save_output) { - self$output_ref <- output_ref + self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) } output_ref @@ -242,11 +242,11 @@ avg_pool2d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref + self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) } output_ref <- nnf_avg_pool2d(x_ref, self$kernel_size, self$strides) if (save_output) { - self$output_ref <- output_ref + self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) } output_ref @@ -319,11 +319,11 @@ max_pool1d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref + self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) } output_ref <- nnf_max_pool1d(x_ref, self$kernel_size, self$strides) if (save_output) { - self$output_ref <- output_ref + self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) } output_ref @@ -479,11 +479,11 @@ max_pool2d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) 
{ if (save_input) { - self$input_ref <- x_ref + self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) } output_ref <- nnf_max_pool2d(x_ref, self$kernel_size, self$strides) if (save_output) { - self$output_ref <- output_ref + self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) } output_ref diff --git a/man-roxygen/examples_DeepLift.R b/man-roxygen/examples-DeepLift.R similarity index 100% rename from man-roxygen/examples_DeepLift.R rename to man-roxygen/examples-DeepLift.R diff --git a/man-roxygen/examples-DeepSHAP.R b/man-roxygen/examples-DeepSHAP.R new file mode 100644 index 0000000..6e06e30 --- /dev/null +++ b/man-roxygen/examples-DeepSHAP.R @@ -0,0 +1,109 @@ +#' @examplesIf torch::torch_is_installed() +#' #----------------------- Example 1: Torch ---------------------------------- +#' library(torch) +#' +#' # Create nn_sequential model and data +#' model <- nn_sequential( +#' nn_linear(5, 12), +#' nn_relu(), +#' nn_linear(12, 2), +#' nn_softmax(dim = 2) +#' ) +#' data <- torch_randn(25, 5) +#' +#' # Create a reference dataset for the estimation of the conditional +#' # expectation +#' ref <- torch_randn(5, 5) +#' +#' # Create Converter +#' converter <- Converter$new(model, input_dim = c(5)) +#' +#' # Apply method DeepSHAP +#' deepshap <- DeepSHAP$new(converter, data, data_ref = ref) +#' +#' # Print the result as a torch tensor for first two data points +#' get_result(deepshap, "torch.tensor")[1:2] +#' +#' # Plot the result for both classes +#' plot(deepshap, output_idx = 1:2) +#' +#' # Plot the boxplot of all datapoints and for both classes +#' boxplot(deepshap, output_idx = 1:2) +#' +#' # ------------------------- Example 2: Neuralnet --------------------------- +#' if (require("neuralnet")) { +#' library(neuralnet) +#' data(iris) +#' +#' # Train a neural network +#' nn <- neuralnet((Species == "setosa") ~ Petal.Length + Petal.Width, +#' iris, +#' linear.output = FALSE, +#' hidden = c(3, 2), act.fct = "tanh", rep = 1 +#' ) +#' +#' # Convert the model +#' converter <- Converter$new(nn) +#' +#' # Apply DeepSHAP with rescale-rule and a 100 (default of `limit_ref`) +#' # instances as the reference dataset +#' deepshap <- DeepSHAP$new(converter, iris[, c(3, 4)], +#' data_ref = iris[, c(3, 4)]) +#' +#' # Get the result as a dataframe and show first 5 rows +#' get_result(deepshap, type = "data.frame")[1:5, ] +#' +#' # Plot the result for the first datapoint in the data +#' plot(deepshap, data_idx = 1) +#' +#' # Plot the result as boxplots +#' boxplot(deepshap) +#' } +#' +#' @examplesIf keras::is_keras_available() & torch::torch_is_installed() +#' # ------------------------- Example 3: Keras ------------------------------- +#' if (require("keras")) { +#' library(keras) +#' +#' # Make sure keras is installed properly +#' is_keras_available() +#' +#' data <- array(rnorm(10 * 32 * 32 * 3), dim = c(10, 32, 32, 3)) +#' +#' model <- keras_model_sequential() +#' model %>% +#' layer_conv_2d( +#' input_shape = c(32, 32, 3), kernel_size = 8, filters = 8, +#' activation = "softplus", padding = "valid") %>% +#' layer_conv_2d( +#' kernel_size = 8, filters = 4, activation = "tanh", +#' padding = "same") %>% +#' layer_conv_2d( +#' kernel_size = 4, filters = 2, activation = "relu", +#' padding = "valid") %>% +#' layer_flatten() %>% +#' layer_dense(units = 64, activation = "relu") %>% +#' layer_dense(units = 16, activation = "relu") %>% +#' layer_dense(units = 2, activation = "softmax") +#' +#' # Convert the model +#' converter <- Converter$new(model) +#' +#' # Apply the DeepSHAP method with zero 
baseline (which is equivalent to
+#' # DeepLift with zero baseline)
+#' deepshap <- DeepSHAP$new(converter, data, channels_first = FALSE)
+#'
+#' # Plot the result for the first image and both classes
+#' plot(deepshap, output_idx = 1:2)
+#'
+#' # Plot the result as boxplots for first class
+#' boxplot(deepshap, output_idx = 1)
+#' }
+#' @examplesIf torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1
+#' #------------------------- Plotly plots ------------------------------------
+#' if (require("plotly")) {
+#'   # You can also create an interactive plot with plotly.
+#'   # This is a suggested package, so make sure that it is installed
+#'   library(plotly)
+#'   boxplot(deepshap, as_plotly = TRUE)
+#' }
diff --git a/man/ConnectionWeights.Rd b/man/ConnectionWeights.Rd
index 3e0f09f..44a0014 100644
--- a/man/ConnectionWeights.Rd
+++ b/man/ConnectionWeights.Rd
@@ -133,6 +133,7 @@ simulated data.} Ecological Modelling 178, p. 389–397
 \seealso{
 Other methods:
 \code{\link{DeepLift}},
+\code{\link{DeepSHAP}},
 \code{\link{Gradient}},
 \code{\link{IntegratedGradient}},
 \code{\link{LRP}},
diff --git a/man/DeepLift.Rd b/man/DeepLift.Rd
index 96d0fff..d27bf3f 100644
--- a/man/DeepLift.Rd
+++ b/man/DeepLift.Rd
@@ -139,6 +139,7 @@ propagating activation differences.} ICML 2017, p. 4844-4866
 \seealso{
 Other methods:
 \code{\link{ConnectionWeights}},
+\code{\link{DeepSHAP}},
 \code{\link{Gradient}},
 \code{\link{IntegratedGradient}},
 \code{\link{LRP}},
diff --git a/man/DeepSHAP.Rd b/man/DeepSHAP.Rd
new file mode 100644
index 0000000..99f36fa
--- /dev/null
+++ b/man/DeepSHAP.Rd
@@ -0,0 +1,323 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/DeepLift.R
+\name{DeepSHAP}
+\alias{DeepSHAP}
+\title{Deep Shapley additive explanations (DeepSHAP)}
+\description{
+The \emph{DeepSHAP} method extends the \link{DeepLift} technique by not only
+considering a single reference value but by calculating the average
+from several, ideally representative reference values at each layer. The
+obtained feature-wise results are approximate Shapley values for the
+chosen output, where the conditional expectation is computed using these
+different reference values, i.e., the \emph{DeepSHAP} method decomposes the
+difference between the prediction and the mean prediction
+\eqn{f(x) - E[f(x)]} into feature-wise effects. The reference values can be
+passed via the argument \code{data_ref}.
+}
+\examples{
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+#----------------------- Example 1: Torch ----------------------------------
+library(torch)
+
+# Create nn_sequential model and data
+model <- nn_sequential(
+  nn_linear(5, 12),
+  nn_relu(),
+  nn_linear(12, 2),
+  nn_softmax(dim = 2)
+)
+data <- torch_randn(25, 5)
+
+# Create a reference dataset for the estimation of the conditional
+# expectation
+ref <- torch_randn(5, 5)
+
+# Create Converter
+converter <- Converter$new(model, input_dim = c(5))
+
+# Apply method DeepSHAP
+deepshap <- DeepSHAP$new(converter, data, data_ref = ref)
+
+# Print the result as a torch tensor for first two data points
+get_result(deepshap, "torch.tensor")[1:2]
+
+# Plot the result for both classes
+plot(deepshap, output_idx = 1:2)
+
+# Plot the boxplot of all datapoints and for both classes
+boxplot(deepshap, output_idx = 1:2)
+
+# ------------------------- Example 2: Neuralnet ---------------------------
+if (require("neuralnet")) {
+  library(neuralnet)
+  data(iris)
+
+  # Train a neural network
+  nn <- neuralnet((Species == "setosa") ~ Petal.Length + Petal.Width,
+    iris,
+    linear.output = FALSE,
+    hidden = c(3, 2), act.fct = "tanh", rep = 1
+  )
+
+  # Convert the model
+  converter <- Converter$new(nn)
+
+  # Apply DeepSHAP with rescale-rule and a 100 (default of `limit_ref`)
+  # instances as the reference dataset
+  deepshap <- DeepSHAP$new(converter, iris[, c(3, 4)],
+    data_ref = iris[, c(3, 4)])
+
+  # Get the result as a dataframe and show first 5 rows
+  get_result(deepshap, type = "data.frame")[1:5, ]
+
+  # Plot the result for the first datapoint in the data
+  plot(deepshap, data_idx = 1)
+
+  # Plot the result as boxplots
+  boxplot(deepshap)
+}
+\dontshow{\}) # examplesIf}
+\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+# ------------------------- Example 3: Keras -------------------------------
+if (require("keras")) {
+  library(keras)
+
+  # Make sure keras is installed properly
+  is_keras_available()
+
+  data <- array(rnorm(10 * 32 * 32 * 3), dim = c(10, 32, 32, 3))
+
+  model <- keras_model_sequential()
+  model \%>\%
+    layer_conv_2d(
+      input_shape = c(32, 32, 3), kernel_size = 8, filters = 8,
+      activation = "softplus", padding = "valid") \%>\%
+    layer_conv_2d(
+      kernel_size = 8, filters = 4, activation = "tanh",
+      padding = "same") \%>\%
+    layer_conv_2d(
+      kernel_size = 4, filters = 2, activation = "relu",
+      padding = "valid") \%>\%
+    layer_flatten() \%>\%
+    layer_dense(units = 64, activation = "relu") \%>\%
+    layer_dense(units = 16, activation = "relu") \%>\%
+    layer_dense(units = 2, activation = "softmax")
+
+  # Convert the model
+  converter <- Converter$new(model)
+
+  # Apply the DeepSHAP method with zero baseline (which is equivalent to
+  # DeepLift with zero baseline)
+  deepshap <- DeepSHAP$new(converter, data, channels_first = FALSE)
+
+  # Plot the result for the first image and both classes
+  plot(deepshap, output_idx = 1:2)
+
+  # Plot the result as boxplots for first class
+  boxplot(deepshap, output_idx = 1)
+}
+\dontshow{\}) # examplesIf}
+\dontshow{if (torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+#------------------------- Plotly plots ------------------------------------
+if (require("plotly")) {
+  # You can also create an interactive plot with plotly.
+ # This is a suggested package, so make sure that it is installed + library(plotly) + boxplot(deepshap, as_plotly = TRUE) +} +\dontshow{\}) # examplesIf} +} +\references{ +S. Lundberg & S. Lee (2017) \emph{A unified approach to interpreting model +predictions.} NIPS 2017, p. 4768–4777 +} +\seealso{ +Other methods: +\code{\link{ConnectionWeights}}, +\code{\link{DeepLift}}, +\code{\link{Gradient}}, +\code{\link{IntegratedGradient}}, +\code{\link{LRP}}, +\code{\link{SmoothGrad}} +} +\concept{methods} +\section{Super class}{ +\code{\link[innsight:InterpretingMethod]{innsight::InterpretingMethod}} -> \code{DeepSHAP} +} +\section{Public fields}{ +\if{html}{\out{
}}
+\describe{
+\item{\code{rule_name}}{(\code{character(1)})\cr
+Name of the applied rule to calculate the contributions.
+Either \code{'rescale'} or \code{'reveal_cancel'}.\cr}
+
+\item{\code{data_ref}}{(\code{list})\cr
+The passed reference dataset for estimating the conditional expectation,
+stored as a \code{list} of \code{torch_tensor}s in the selected
+data format (field \code{dtype}) matching the corresponding shapes of the
+individual input layers. In addition, the channel axis is moved to the
+second position after the batch size because internally only the
+format \emph{channels first} is used.\cr}
+}
+\if{html}{\out{
}} +} +\section{Methods}{ +\subsection{Public methods}{ +\itemize{ +\item \href{#method-DeepSHAP-new}{\code{DeepSHAP$new()}} +\item \href{#method-DeepSHAP-clone}{\code{DeepSHAP$clone()}} +} +} +\if{html}{\out{ +
Inherited methods + +
+}} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-DeepSHAP-new}{}}} +\subsection{Method \code{new()}}{ +Create a new instance of the \code{DeepSHAP} R6 class. When initialized, +the method \emph{DeepSHAP} is applied to the given data and the results are +stored in the field \code{result}. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{DeepSHAP$new( + converter, + data, + channels_first = TRUE, + output_idx = NULL, + ignore_last_act = TRUE, + rule_name = "rescale", + data_ref = NULL, + limit_ref = 100, + winner_takes_all = TRUE, + verbose = interactive(), + dtype = "float" +)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{converter}}{(\code{\link{Converter}})\cr +An instance of the \code{Converter} class that includes the +torch-converted model and some other model-specific attributes. See +\code{\link{Converter}} for details.\cr} + +\item{\code{data}}{(\code{\link{array}}, \code{\link{data.frame}}, \code{\link{torch_tensor}} or \code{list})\cr +The data to which the method is to be applied. These must +have the same format as the input data of the passed model to the +converter object. This means either +\itemize{ +\item an \code{array}, \code{data.frame}, \code{torch_tensor} or array-like format of +size \emph{(batch_size, dim_in)}, if e.g., the model has only one input layer, or +\item a \code{list} with the corresponding input data (according to the +upper point) for each of the input layers.\cr +}} + +\item{\code{channels_first}}{(\code{logical(1)})\cr +The channel position of the given data (argument +\code{data}). If \code{TRUE}, the channel axis is placed at the second position +between the batch size and the rest of the input axes, e.g., +\code{c(10,3,32,32)} for a batch of ten images with three channels and a +height and width of 32 pixels. Otherwise (\code{FALSE}), the channel axis +is at the last position, i.e., \code{c(10,32,32,3)}. If the data +has no channel axis, use the default value \code{TRUE}.\cr} + +\item{\code{output_idx}}{(\code{integer}, \code{list} or \code{NULL})\cr +These indices specify the output nodes for which +the method is to be applied. In order to allow models with multiple +output layers, there are the following possibilities to select +the indices of the output nodes in the individual output layers: +\itemize{ +\item An \code{integer} vector of indices: If the model has only one output +layer, the values correspond to the indices of the output nodes, e.g., +\code{c(1,3,4)} for the first, third and fourth output node. If there are +multiple output layers, the indices of the output nodes from the first +output layer are considered. +\item A \code{list} of \code{integer} vectors of indices: If the method is to be +applied to output nodes from different layers, a list can be passed +that specifies the desired indices of the output nodes for each +output layer. Unwanted output layers have the entry \code{NULL} instead of +a vector of indices, e.g., \code{list(NULL, c(1,3))} for the first and +third output node in the second output layer. +\item \code{NULL} (default): The method is applied to all output nodes in +the first output layer but is limited to the first ten as the +calculations become more computationally expensive for more output +nodes.\cr +}} + +\item{\code{ignore_last_act}}{(\code{logical(1)})\cr +Set this logical value to include the last +activation functions for each output layer, or not (default: \code{TRUE}). +In practice, the last activation (especially for softmax activation) is +often omitted.\cr} + +\item{\code{rule_name}}{(\code{character(1)})\cr +Name of the applied rule to calculate the +contributions. Use either \code{'rescale'} or \code{'reveal_cancel'}. \cr} + +\item{\code{data_ref}}{(\code{\link{array}}, \code{\link{data.frame}}, \code{\link{torch_tensor}} or \code{list})\cr +The reference data which is used to estimate the conditional expectation. +These must have the same format as the input data of the passed model to +the converter object. 
This means either
+\itemize{
+\item an \code{array}, \code{data.frame}, \code{torch_tensor} or array-like format of
+size \emph{(batch_size, dim_in)}, if e.g., the model has only one input layer, or
+\item a \code{list} with the corresponding input data (according to the
+upper point) for each of the input layers, or
+\item \code{NULL} (default) to use only a zero baseline for the estimation.\cr
+}}
+
+\item{\code{limit_ref}}{(\code{integer(1)})\cr
+This argument limits the number of instances taken from the reference
+dataset \code{data_ref} so that only \code{limit_ref} randomly selected
+elements and not the entire dataset are used to estimate the conditional
+expectation. Too large a number can significantly increase the
+computation time.\cr}
+
+\item{\code{winner_takes_all}}{(\code{logical(1)})\cr
+This logical argument is only relevant for MaxPooling
+layers and is otherwise ignored. With this layer type, it is possible that
+the position of the maximum values in the pooling kernel of the normal input
+\eqn{x} and the reference input \eqn{x'} may not match, which leads to a
+violation of the summation-to-delta property. To overcome this problem,
+another variant is implemented, which treats a MaxPooling layer as an
+AveragePooling layer in the backward pass only, leading to a uniform
+distribution of the upper-layer contribution to the lower layer.\cr}
+
+\item{\code{verbose}}{(\code{logical(1)})\cr
+This logical argument determines whether a progress bar is
+displayed for the calculation of the method or not. The default value is
+the output of the primitive R function \code{\link[=interactive]{interactive()}}.\cr}
+
+\item{\code{dtype}}{(\code{character(1)})\cr
+The data type for the calculations. Use
+either \code{'float'} for \link{torch_float} or \code{'double'} for
+\link{torch_double}.\cr}
+}
+\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-DeepSHAP-clone}{}}} +\subsection{Method \code{clone()}}{ +The objects of this class are cloneable with this method. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{DeepSHAP$clone(deep = FALSE)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{deep}}{Whether to make a deep clone.} +} +\if{html}{\out{
}} +} +} +} diff --git a/man/Gradient.Rd b/man/Gradient.Rd index 37c7b1a..d0ac79d 100644 --- a/man/Gradient.Rd +++ b/man/Gradient.Rd @@ -134,6 +134,7 @@ if (require("plotly")) { Other methods: \code{\link{ConnectionWeights}}, \code{\link{DeepLift}}, +\code{\link{DeepSHAP}}, \code{\link{IntegratedGradient}}, \code{\link{LRP}}, \code{\link{SmoothGrad}} diff --git a/man/IntegratedGradient.Rd b/man/IntegratedGradient.Rd index 55fab5d..49147b8 100644 --- a/man/IntegratedGradient.Rd +++ b/man/IntegratedGradient.Rd @@ -142,6 +142,7 @@ M. Sundararajan et al. (2017) \emph{Axiomatic attribution for deep networks.} IC Other methods: \code{\link{ConnectionWeights}}, \code{\link{DeepLift}}, +\code{\link{DeepSHAP}}, \code{\link{Gradient}}, \code{\link{LRP}}, \code{\link{SmoothGrad}} diff --git a/man/LRP.Rd b/man/LRP.Rd index 63f3e5d..5ad1e99 100644 --- a/man/LRP.Rd +++ b/man/LRP.Rd @@ -153,6 +153,7 @@ p. 1-46 Other methods: \code{\link{ConnectionWeights}}, \code{\link{DeepLift}}, +\code{\link{DeepSHAP}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, \code{\link{SmoothGrad}} diff --git a/man/SmoothGrad.Rd b/man/SmoothGrad.Rd index c396d83..337c7b1 100644 --- a/man/SmoothGrad.Rd +++ b/man/SmoothGrad.Rd @@ -133,6 +133,7 @@ CoRR, abs/1706.03825 Other methods: \code{\link{ConnectionWeights}}, \code{\link{DeepLift}}, +\code{\link{DeepSHAP}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, \code{\link{LRP}} diff --git a/tests/testthat/test_DeepSHAP.R b/tests/testthat/test_DeepSHAP.R new file mode 100644 index 0000000..473e28e --- /dev/null +++ b/tests/testthat/test_DeepSHAP.R @@ -0,0 +1,595 @@ + +test_that("DeepSHAP: General errors", { + library(keras) + library(torch) + + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- keras_model_sequential() + model %>% + layer_dense(units = 16, activation = "relu", input_shape = c(4)) %>% + layer_dense(units = 8, activation = "relu") %>% + layer_dense(units = 3, activation = "softmax") + + converter <- Converter$new(model) + + expect_error(DeepSHAP$new(model, data)) + expect_error(DeepSHAP$new(converter, model)) + expect_error(DeepSHAP$new(converter, data, channels_first = NULL)) + expect_error(DeepSHAP$new(converter, data, rule_name = "asdf")) + expect_error(DeepSHAP$new(converter, data, rule_param = "asdf")) + expect_error(DeepSHAP$new(converter, data, dtype = NULL)) + expect_error(DeepSHAP$new(converter, data, ignore_last_act = c(1))) + expect_error(DeepSHAP$new(converter, data, data_ref = c(1,2,3))) +}) + +test_that("DeepSHAP: Plot and Boxplot", { + library(neuralnet) + library(torch) + + data(iris) + data <- iris[sample.int(150, size = 10), -5] + nn <- neuralnet(Species ~ ., + iris, + linear.output = FALSE, + hidden = c(10, 8), act.fct = "tanh", rep = 1, threshold = 0.5 + ) + # create an converter for this model + converter <- Converter$new(nn) + + # Rescale Rule + d <- DeepSHAP$new(converter, data, + dtype = "double", + ignore_last_act = FALSE + ) + + # ggplot2 + + # Non-existing data points + expect_error(plot(d, data_idx = c(1,11))) + expect_error(boxplot(d, data_idx = 1:11)) + # Non-existing class + expect_error(plot(d, output_idx = c(5))) + expect_error(boxplot(d, output_idx = c(5))) + + p <- plot(d) + boxp <- boxplot(d) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + p <- plot(d, data_idx = 1:3) + boxp <- boxplot(d, data_idx = 1:4) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + p <- plot(d, data_idx = 1:3, output_idx = 1:3) + boxp <- boxplot(d, data_idx = 
1:5, output_idx = 1:3) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # plotly + library(plotly) + + p <- plot(d, as_plotly = TRUE) + boxp <- boxplot(d, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") + p <- plot(d, data_idx = 1:3, as_plotly = TRUE) + boxp <- boxplot(d, data_idx = 1:4, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") + p <- plot(d, data_idx = 1:3, output_idx = 1:3, as_plotly = TRUE) + boxp <- boxplot(d, data_idx = 1:5, output_idx = 1:3, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") +}) + +test_that("DeepSHAP: Dense-Net (Neuralnet)", { + library(neuralnet) + library(torch) + + data(iris) + + data <- iris[sample.int(150, size = 10), -5] + nn <- neuralnet(Species ~ ., + iris, + linear.output = FALSE, + hidden = c(10, 8), act.fct = "tanh", rep = 1, threshold = 0.5 + ) + # create an converter for this model + converter <- Converter$new(nn) + + # Rescale Rule (no reference dataset) + d <- DeepSHAP$new(converter, data, + ignore_last_act = FALSE + ) + res <- converter$model(torch_tensor(t(data))$t(), TRUE, + TRUE, TRUE, TRUE) + res <- converter$model$update_ref(torch_zeros(c(1,4)), + TRUE, TRUE, TRUE, TRUE) + + last_layer <- rev(converter$model$modules_list)[[1]] + contrib_true <- last_layer$output - last_layer$output_ref + contrib_no_last_act_true <- + last_layer$preactivation - last_layer$preactivation_ref + + first_layer <- converter$model$modules_list[[1]] + input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + + + deepshap_rescale <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale), c(10, 4, 3)) + expect_lt( + as.array(mean(abs(deepshap_rescale$sum(dim = 2) - contrib_true)^2)), 1e-8 + ) + + d <- + DeepSHAP$new(converter, data, + ignore_last_act = TRUE + ) + deepshap_rescale_no_last_act <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale_no_last_act), c(10, 4, 3)) + expect_lt( + as.array(mean(abs(deepshap_rescale_no_last_act$sum(dim = 2) - + contrib_no_last_act_true)^2)), 1e-8 + ) + + # Rescale Rule (with reference dataset) + data_ref <- matrix(rnorm(4 * 20), nrow = 20) + d <- DeepSHAP$new(converter, data, + data_ref = data_ref, + ignore_last_act = FALSE + ) + res <- converter$model(torch_tensor(t(data))$t(), TRUE, + TRUE, TRUE, TRUE) + res <- converter$model$update_ref(torch_tensor(data_ref), + TRUE, TRUE, TRUE, TRUE) + + last_layer <- rev(converter$model$modules_list)[[1]] + contrib_true <- last_layer$output - last_layer$output_ref + contrib_no_last_act_true <- + last_layer$preactivation - last_layer$preactivation_ref + + first_layer <- converter$model$modules_list[[1]] + input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + + + deepshap_rescale <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale), c(10, 4, 3)) + expect_lt( + as.array(mean(abs(deepshap_rescale$sum(dim = 2) - contrib_true)^2)), 1e-8 + ) + + d <- + DeepSHAP$new(converter, data, + data_ref = data_ref, + ignore_last_act = TRUE + ) + deepshap_rescale_no_last_act <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale_no_last_act), c(10, 4, 3)) + expect_lt( + as.array(mean(abs(deepshap_rescale_no_last_act$sum(dim = 2) - + contrib_no_last_act_true)^2)), 1e-8 + ) +}) + +test_that("DeepSHAP: Dense-Net (keras)", { + library(keras) + library(torch) + + data <- matrix(rnorm(4 * 10), nrow 
= 10) + + model <- keras_model_sequential() + model %>% + layer_dense(units = 16, activation = "relu", input_shape = c(4)) %>% + layer_dense(units = 8, activation = "tanh") %>% + layer_dense(units = 3, activation = "softmax") + + converter <- Converter$new(model) + data_ref <- matrix(rnorm(4 * 20), nrow = 20) + + # Rescale Rule (no reference dataset) + d <- DeepSHAP$new(converter, data, + ignore_last_act = FALSE + ) + + converter$model(torch_tensor(data), TRUE, TRUE, TRUE, TRUE) + converter$model$update_ref(torch_zeros(c(1,4)), TRUE, TRUE, TRUE, TRUE) + + last_layer <- rev(converter$model$modules_list)[[1]] + contrib_true <- last_layer$output - last_layer$output_ref + contrib_no_last_act_true <- + last_layer$preactivation - last_layer$preactivation_ref + + first_layer <- converter$model$modules_list[[1]] + input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + + deepshap_rescale <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale), c(10, 4, 3)) + expect_lt( + as.array(mean(abs(deepshap_rescale$sum(dim = 2) - contrib_true)^2)), 1e-8 + ) + + d <- + DeepSHAP$new(converter, data, + ignore_last_act = TRUE + ) + deepshap_rescale_no_last_act <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale_no_last_act), c(10, 4, 3)) + expect_lt( + as.array(mean(abs(deepshap_rescale_no_last_act$sum(dim = 2) - + contrib_no_last_act_true)^2)), 1e-8 + ) + + # Rescale Rule (with reference dataset) + d <- DeepSHAP$new(converter, data, + data_ref = data_ref, + ignore_last_act = FALSE + ) + + converter$model(torch_tensor(data), TRUE, TRUE, TRUE, TRUE) + converter$model$update_ref(torch_tensor(data_ref), TRUE, TRUE, TRUE, TRUE) + + last_layer <- rev(converter$model$modules_list)[[1]] + contrib_true <- last_layer$output - last_layer$output_ref + contrib_no_last_act_true <- + last_layer$preactivation - last_layer$preactivation_ref + + first_layer <- converter$model$modules_list[[1]] + input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + + deepshap_rescale <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale), c(10, 4, 3)) + expect_lt( + as.array(mean(abs(deepshap_rescale$sum(dim = 2) - contrib_true)^2)), 1e-8 + ) + + d <- + DeepSHAP$new(converter, data, + data_ref = data_ref, + ignore_last_act = TRUE + ) + deepshap_rescale_no_last_act <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale_no_last_act), c(10, 4, 3)) + expect_lt( + as.array(mean(abs(deepshap_rescale_no_last_act$sum(dim = 2) - + contrib_no_last_act_true)^2)), 1e-8 + ) +}) + +test_that("DeepSHAP: Conv1D-Net", { + library(keras) + library(torch) + + data <- array(rnorm(4 * 32 * 3), dim = c(4, 32, 3)) + + model <- keras_model_sequential() + model %>% + layer_conv_1d( + input_shape = c(32, 3), kernel_size = 16, filters = 8, + activation = "softplus" + ) %>% + layer_conv_1d(kernel_size = 8, filters = 4, activation = "tanh") %>% + layer_conv_1d(kernel_size = 8, filters = 4, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 1, activation = "sigmoid") + + # test non-fitted model + converter <- Converter$new(model) + + # Rescale rule (no reference dataset) + d <- DeepSHAP$new(converter, data, + channels_first = FALSE, + ignore_last_act = FALSE + ) + + converter$model(torch_tensor(data), FALSE, TRUE, TRUE, TRUE) + converter$model$update_ref(torch_zeros(c(1, 32, 3)), FALSE, TRUE, TRUE, TRUE) + + last_layer <- rev(converter$model$modules_list)[[1]] + contrib_true <- 
last_layer$output - last_layer$output_ref + contrib_no_last_act_true <- + last_layer$preactivation - last_layer$preactivation_ref + + first_layer <- converter$model$modules_list[[1]] + input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + + + deepshap_rescale <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale), c(4, 32, 3, 1)) + expect_lt(as.array(mean(abs(deepshap_rescale$sum(dim = c(2, 3)) - + contrib_true)^2)), 1e-8) + + d <- DeepSHAP$new(converter, data, + ignore_last_act = TRUE, + channels_first = FALSE + ) + deepshap_rescale_no_last_act <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale_no_last_act), c(4, 32, 3, 1)) + expect_lt(as.array(mean(abs(deepshap_rescale_no_last_act$sum(dim = c(2, 3)) - + contrib_no_last_act_true)^2)), 1e-8) + + # Rescale rule (with reference dataset) + data_ref <- array(rnorm(32 * 3 * 5), dim = c(5, 32, 3)) + + d <- DeepSHAP$new(converter, data, + data_ref = data_ref, + channels_first = FALSE, + ignore_last_act = FALSE + ) + + converter$model(torch_tensor(data), FALSE, TRUE, TRUE, TRUE) + converter$model$update_ref(torch_tensor(data_ref), FALSE, TRUE, TRUE, TRUE) + + last_layer <- rev(converter$model$modules_list)[[1]] + contrib_true <- last_layer$output - last_layer$output_ref + contrib_no_last_act_true <- + last_layer$preactivation - last_layer$preactivation_ref + + first_layer <- converter$model$modules_list[[1]] + input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + + + deepshap_rescale <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale), c(4, 32, 3, 1)) + expect_lt(as.array(mean(abs(deepshap_rescale$sum(dim = c(2, 3)) - + contrib_true)^2)), 1e-8) + + d <- DeepSHAP$new(converter, data, + data_ref = data_ref, + ignore_last_act = TRUE, + channels_first = FALSE + ) + deepshap_rescale_no_last_act <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale_no_last_act), c(4, 32, 3, 1)) + expect_lt(as.array(mean(abs(deepshap_rescale_no_last_act$sum(dim = c(2, 3)) - + contrib_no_last_act_true)^2)), 1e-8) +}) + +test_that("DeepSHAP: Conv2D-Net", { + library(keras) + library(torch) + + data <- array(rnorm(4 * 16 * 16 * 3), dim = c(4, 16, 16, 3)) + + model <- keras_model_sequential() + model %>% + layer_conv_2d( + input_shape = c(16, 16, 3), kernel_size = 8, filters = 8, + activation = "softplus", padding = "same" + ) %>% + layer_conv_2d( + kernel_size = 4, filters = 4, activation = "relu") %>% + layer_conv_2d( + kernel_size = 2, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 64, activation = "relu") %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 2, activation = "softmax") + + # test non-fitted model + converter <- Converter$new(model) + + # Rescale rule (no reference data) + d <- DeepSHAP$new(converter, data, + channels_first = FALSE, + ignore_last_act = FALSE + ) + + converter$model(torch_tensor(data), FALSE, TRUE, TRUE, TRUE) + converter$model$update_ref(torch_zeros(c(1, 16, 16, 3)), + FALSE, TRUE, TRUE, TRUE) + + last_layer <- rev(converter$model$modules_list)[[1]] + contrib_true <- last_layer$output - last_layer$output_ref + contrib_no_last_act_true <- + last_layer$preactivation - last_layer$preactivation_ref + + first_layer <- converter$model$modules_list[[1]] + input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + + + deepshap_rescale <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale), c(4, 16, 16, 3, 2)) + 
expect_lt(as.array(mean(abs(deepshap_rescale$sum(dim = c(2, 3, 4)) - + contrib_true)^2)), 1e-8) + + d <- DeepSHAP$new(converter, data, + ignore_last_act = TRUE, + channels_first = FALSE + ) + deepshap_rescale_no_last_act <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale_no_last_act), c(4, 16, 16, 3, 2)) + expect_lt( + as.array(mean(abs(deepshap_rescale_no_last_act$sum(dim = c(2, 3, 4)) - + contrib_no_last_act_true)^2)), 1e-8 + ) + + # Rescale rule (with reference data) + data_ref <- array(rnorm(10 * 16 * 16 * 3), dim = c(10, 16, 16, 3)) + d <- DeepSHAP$new(converter, data, + data_ref = data_ref, + channels_first = FALSE, + ignore_last_act = FALSE + ) + + converter$model(torch_tensor(data), FALSE, TRUE, TRUE, TRUE) + converter$model$update_ref(torch_tensor(data_ref), FALSE, TRUE, TRUE, TRUE) + + last_layer <- rev(converter$model$modules_list)[[1]] + contrib_true <- last_layer$output - last_layer$output_ref + contrib_no_last_act_true <- + last_layer$preactivation - last_layer$preactivation_ref + + first_layer <- converter$model$modules_list[[1]] + input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + + deepshap_rescale <- d$get_result(type = "torch.tensor") + as.array(mean(abs(deepshap_rescale$sum(dim = c(2, 3, 4)) - + contrib_true)^2)) + + expect_equal(dim(deepshap_rescale), c(4, 16, 16, 3, 2)) + expect_lt(as.array(mean(abs(deepshap_rescale$sum(dim = c(2, 3, 4)) - + contrib_true)^2)), 1e-8) + + d <- DeepSHAP$new(converter, data, + data_ref = data_ref, + ignore_last_act = TRUE, + channels_first = FALSE + ) + deepshap_rescale_no_last_act <- d$get_result(type = "torch.tensor") + + expect_equal(dim(deepshap_rescale_no_last_act), c(4, 16, 16, 3, 2)) + expect_lt( + as.array(mean(abs(deepshap_rescale_no_last_act$sum(dim = c(2, 3, 4)) - + contrib_no_last_act_true)^2)), 1e-8 + ) +}) + + + +test_that("DeepSHAP: Keras model with two inputs + two outputs (concat)", { + library(keras) + + main_input <- layer_input(shape = c(10,10,2), name = 'main_input') + lstm_out <- main_input %>% + layer_conv_2d(2, c(2,2), activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 4) + auxiliary_input <- layer_input(shape = c(5), name = 'aux_input') + auxiliary_output <- layer_concatenate(c(lstm_out, auxiliary_input)) %>% + layer_dense(units = 2, activation = 'relu', name = 'aux_output') + main_output <- layer_concatenate(c(lstm_out, auxiliary_input)) %>% + layer_dense(units = 5, activation = 'relu') %>% + layer_dense(units = 3, activation = 'tanh', name = 'main_output') + model <- keras_model( + inputs = c(auxiliary_input, main_input), + outputs = c(auxiliary_output, main_output) + ) + + converter <- Converter$new(model) + + # Check DeepSHAP with rescale rule and ignoring last activation + data <- lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(10, x))) + data_ref <- lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(5, x))) + + deepshap <- DeepSHAP$new(converter, data, data_ref = data_ref, + channels_first = FALSE, output_idx = list(c(2), c(1,3))) + result <- deepshap$get_result() + expect_equal(length(result), 2) + expect_equal(length(result[[1]]), 2) + expect_equal(dim(result[[1]][[1]]), c(10,5,1)) + expect_equal(dim(result[[1]][[2]]), c(10,10,10,2,1)) + expect_equal(length(result[[2]]), 2) + expect_equal(dim(result[[2]][[1]]), c(10,5,2)) + expect_equal(dim(result[[2]][[2]]), c(10,10,10,2,2)) + + # Check correctness of DeepSHAP rescale rule without ignoring the last + # activation + data <- 
lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(10, x))) + data_ref <- lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(5, x))) + deepshap <- DeepSHAP$new(converter, data, data_ref = data_ref, ignore_last_act = FALSE, + channels_first = FALSE, output_idx = list(c(2), c(1,3))) + + y <- converter$model(data, channels_first = FALSE) + y_ref <- converter$model$update_ref(data_ref, channels_first = FALSE) + contrib_true <- list(as.array(y[[1]][, 2] - mean(y_ref[[1]][, 2])), + as.array(y[[2]][, 1] - mean(y_ref[[2]][, 1])), + as.array(y[[2]][, 3] - mean(y_ref[[2]][, 3]))) + + result <- deepshap$get_result("torch_tensor") + contrib_1 <- as.array(result$Output_1$Input_1$sum(c(2,3)) + + result$Output_1$Input_2$sum(c(2,3,4,5))) + contrib_2 <- as.array(result$Output_2$Input_1[,,1]$sum(c(2)) + + result$Output_2$Input_2[,,,,1]$sum(c(2,3,4))) + contrib_3 <- as.array(result$Output_2$Input_1[,,2]$sum(c(2)) + + result$Output_2$Input_2[,,,,2]$sum(c(2,3,4))) + + expect_lt(mean((contrib_true[[1]] - contrib_1)^2), 1e-8) + expect_lt(mean((contrib_true[[2]] - contrib_2)^2), 1e-8) + expect_lt(mean((contrib_true[[3]] - contrib_3)^2), 1e-8) +}) + + + +test_that("DeepSHAP: Keras model with three inputs + one output (add)", { + library(keras) + + input_1 <- layer_input(shape = c(12,15,3)) + part_1 <- input_1 %>% + layer_conv_2d(3, c(4,4), activation = "relu", use_bias = FALSE) %>% + layer_conv_2d(2, c(3,3), activation = "relu", use_bias = FALSE) %>% + layer_flatten() %>% + layer_dense(20, activation = "relu", use_bias = FALSE) + input_2 <- layer_input(shape = c(10)) + part_2 <- input_2 %>% + layer_dense(50, activation = "tanh", use_bias = FALSE) + input_3 <- layer_input(shape = c(20)) + part_3 <- input_3 %>% + layer_dense(40, activation = "relu", use_bias = FALSE) + + output <- layer_concatenate(c(part_1, part_3, part_2)) %>% + layer_dense(100, activation = "relu", use_bias = FALSE) %>% + layer_dense(1, activation = "linear", use_bias = FALSE) + + model <- keras_model( + inputs = c(input_1, input_3, input_2), + outputs = output + ) + + + converter <- Converter$new(model) + + # Check DeepSHAP with rescale rule and ignoring last activation + data <- lapply(list(c(12,15,3), c(20), c(10)), + function(x) torch_randn(c(10,x))) + data_ref <- lapply(list(c(12,15,3), c(20), c(10)), + function(x) torch_randn(c(5,x))) + + deepshap <- DeepSHAP$new(converter, data, data_ref = data_ref, + channels_first = FALSE) + result <- deepshap$get_result() + expect_equal(length(result), 3) + expect_equal(dim(result[[1]]), c(10,12,15,3,1)) + expect_equal(dim(result[[2]]), c(10,20,1)) + expect_equal(dim(result[[3]]), c(10,10,1)) + + # Check correctness of DeepSHAP rescale rule without ignoring the last + # activation + data <- lapply(list(c(12,15,3), c(20), c(10)), + function(x) torch_randn(c(10,x))) + data_ref <- lapply(list(c(12,15,3), c(20), c(10)), + function(x) torch_randn(c(5,x))) + deepshap <- DeepSHAP$new(converter, data, data_ref = data_ref, + ignore_last_act = FALSE, + channels_first = FALSE) + + y <- converter$model(data, channels_first = FALSE) + y_ref <- converter$model$update_ref(data_ref, channels_first = FALSE) + contrib_true <- as.array(y[[1]] - mean(y_ref[[1]])) + + result <- deepshap$get_result("torch_tensor") + contrib <- as.array( + result$Input_1$sum(c(2,3,4,5)) + + result$Input_2$sum(c(2,3)) + + result$Input_3$sum(c(2,3))) + + expect_lt(mean((contrib_true - contrib)^2), 1e-8) +}) + From 57bf2ade7b3fdea0e5a5de70173169d5cf41bf4f Mon Sep 17 00:00:00 2001 From: 
nkoenen Date: Fri, 24 Nov 2023 14:43:45 +0100 Subject: [PATCH 03/20] Add model-agnostic methods * SHAP: Wraps the R package `fastshap` * LIME: Wraps the R package `lime` * Add test and documentation for both methods --- DESCRIPTION | 5 + NAMESPACE | 7 + R/AgnosticMethods.R | 230 ++++++++ R/AgnosticWrapper.R | 364 +++++++++++++ R/GradienBased.R | 12 +- R/InterpretingMethod.R | 10 + R/innsight.R | 1 + man-roxygen/examples-LIME.R | 81 +++ man-roxygen/examples-SHAP.R | 80 +++ man-roxygen/param-data-agnostic.R | 7 + man-roxygen/param-input_dim-agnostic.R | 4 + man-roxygen/param-input_names-agnostic.R | 13 + man-roxygen/param-model-agnostic.R | 10 + man-roxygen/param-output_names-agnostic.R | 8 + man-roxygen/param-output_type-agnostic.R | 3 + man-roxygen/param-pred_fun-agnostic.R | 9 + man-roxygen/param-x-agnostic.R | 8 + man/AgnosticWrapper.Rd | 193 +++++++ man/ConnectionWeights.Rd | 2 + man/DeepLift.Rd | 2 + man/DeepSHAP.Rd | 2 + man/Gradient.Rd | 2 + man/IntegratedGradient.Rd | 2 + man/LIME.Rd | 292 +++++++++++ man/LRP.Rd | 2 + man/SHAP.Rd | 283 ++++++++++ man/SmoothGrad.Rd | 4 +- tests/testthat/test_LIME.R | 613 ++++++++++++++++++++++ tests/testthat/test_SHAP.R | 612 +++++++++++++++++++++ 29 files changed, 2854 insertions(+), 7 deletions(-) create mode 100644 R/AgnosticMethods.R create mode 100644 R/AgnosticWrapper.R create mode 100644 man-roxygen/examples-LIME.R create mode 100644 man-roxygen/examples-SHAP.R create mode 100644 man-roxygen/param-data-agnostic.R create mode 100644 man-roxygen/param-input_dim-agnostic.R create mode 100644 man-roxygen/param-input_names-agnostic.R create mode 100644 man-roxygen/param-model-agnostic.R create mode 100644 man-roxygen/param-output_names-agnostic.R create mode 100644 man-roxygen/param-output_type-agnostic.R create mode 100644 man-roxygen/param-pred_fun-agnostic.R create mode 100644 man-roxygen/param-x-agnostic.R create mode 100644 man/AgnosticWrapper.Rd create mode 100644 man/LIME.Rd create mode 100644 man/SHAP.Rd create mode 100644 tests/testthat/test_LIME.R create mode 100644 tests/testthat/test_SHAP.R diff --git a/DESCRIPTION b/DESCRIPTION index e8c4854..a8c5a25 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -32,17 +32,20 @@ Imports: torch Suggests: covr, + fastshap, GGally, grid, gridExtra, gtable, keras, knitr, + lime, luz, neuralnet, palmerpenguins, plotly, rmarkdown, + ranger, spelling, tensorflow, testthat (>= 3.0.0) @@ -54,6 +57,8 @@ Language: en-US Roxygen: list(markdown = TRUE, r6 = TRUE) RoxygenNote: 7.2.3 Collate: + 'AgnosticMethods.R' + 'AgnosticWrapper.R' 'ConnectionWeights.R' 'Convert_keras.R' 'Convert_neuralnet.R' diff --git a/NAMESPACE b/NAMESPACE index 594ed7b..67c9fed 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,18 +1,24 @@ # Generated by roxygen2: do not edit by hand +S3method(boxplot,AgnosticWrapper) S3method(boxplot,ConnectionWeights) S3method(boxplot,DeepLift) S3method(boxplot,DeepSHAP) S3method(boxplot,GradientBased) S3method(boxplot,LRP) S3method(get_result,InterpretingMethod) +S3method(lime::model_type,innsight_agnostic_wrapper) +S3method(lime::predict_model,innsight_agnostic_wrapper) +export(AgnosticWrapper) export(ConnectionWeights) export(Converter) export(DeepLift) export(DeepSHAP) export(Gradient) export(IntegratedGradient) +export(LIME) export(LRP) +export(SHAP) export(SmoothGrad) export(get_result) exportMethods("+") @@ -54,5 +60,6 @@ importFrom(methods,show) importFrom(stats,aggregate) importFrom(stats,ave) importFrom(stats,median) +importFrom(stats,predict) importFrom(stats,quantile) importFrom(utils,packageVersion) 
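For orientation, a minimal usage sketch of the two new classes follows. It is
not part of the patch itself; it mirrors the package's neuralnet examples, so
the model `nn` and the iris features are assumptions taken from those
examples, and full coverage lives in tests/testthat/test_LIME.R and
tests/testthat/test_SHAP.R:

library(innsight)
library(neuralnet)
data(iris)

# Fit a small model; any other predictive model works as well, as long as a
# prediction function `pred_fun(newdata, ...)` returning a
# (batch_size, outputs) array is passed or pre-implemented (torch, keras,
# neuralnet or Converter objects)
nn <- neuralnet((Species == "setosa") ~ Petal.Length + Petal.Width,
                iris, linear.output = FALSE, hidden = 3)

# LIME: feature weights of a local linear surrogate
# (wraps lime::lime() and lime::explain() internally)
lime_res <- LIME$new(nn, data = iris[, 3:4], x = iris[1:5, 3:4])
get_result(lime_res, type = "data.frame")

# SHAP: approximate Shapley values (wraps fastshap::explain() internally)
shap_res <- SHAP$new(nn, data = iris[, 3:4], x = iris[1:5, 3:4])
boxplot(shap_res)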
diff --git a/R/AgnosticMethods.R b/R/AgnosticMethods.R
new file mode 100644
index 0000000..e16a3f5
--- /dev/null
+++ b/R/AgnosticMethods.R
@@ -0,0 +1,230 @@
+
+
+################################################################################
+#  Lime (from the package lime)
+################################################################################
+
+#' Local interpretable model-agnostic explanations (LIME)
+#'
+#' @description
+#' The R6 class `LIME` calculates the feature weights of a linear surrogate of
+#' the prediction model for an instance to be explained, known as
+#' *local interpretable model-agnostic explanations (LIME)*. It is a
+#' model-agnostic method that can be applied to any predictive model.
+#' This means, in particular, that
+#' `LIME` can be applied not only to objects of the [`Converter`] class but
+#' also to any other model. The only requirement is the argument `pred_fun`,
+#' which generates predictions with the model for given data. However, this
+#' function is pre-implemented for models created with
+#' \code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}},
+#' \code{\link[neuralnet]{neuralnet}} or [`Converter`]. Internally, the
+#' suggested package `lime` is utilized and applied to a `data.frame`.
+#'
+#' **Note:** Even signal and image data are initially transformed into a
+#' `data.frame` using `as.data.frame()` and then [`lime::lime`] and
+#' [`lime::explain`] are
+#' applied. In other words, a custom `pred_fun` may need to convert the
+#' `data.frame` back into an `array` as necessary.
+#'
+#' @template param-output_idx
+#' @template param-channels_first
+#' @template param-model-agnostic
+#' @template param-data-agnostic
+#' @template param-x-agnostic
+#' @template param-output_type-agnostic
+#' @template param-pred_fun-agnostic
+#' @template param-input_dim-agnostic
+#' @template param-input_names-agnostic
+#' @template param-output_names-agnostic
+#' @template examples-LIME
+#'
+#' @family methods
+#' @export
+LIME <- R6Class(
+  classname = "LIME",
+  inherit = AgnosticWrapper,
+  public = list(
+
+    #' @description
+    #' Create a new instance of the `LIME` R6 class. When initialized,
+    #' the method *LIME* is applied to the given data and the results are
+    #' stored in the field `result`.
+    #'
+    #' @param ... other arguments forwarded to [`lime::explain`].
+    initialize = function(model, data, x,
+                          output_type = NULL,
+                          pred_fun = NULL,
+                          output_idx = NULL,
+                          channels_first = TRUE,
+                          input_dim = NULL,
+                          input_names = NULL,
+                          output_names = NULL, ...) {
+
+      # Check if data or x is a torch_tensor
+      if (inherits(data, "torch_tensor")) data <- as.array(data)
+      if (inherits(x, "torch_tensor")) x <- as.array(x)
+
+      super$initialize(model, data, x, output_type, pred_fun, output_idx,
+                       channels_first, input_dim, input_names,
+                       output_names)
+
+      # Get the pre-processed x
+      x <- self$x
+
+      # We use the lime package for the explanation
+      if (!requireNamespace("lime", quietly = TRUE)) {
+        stopf("Package {.pkg lime} must be installed to use this function!")
+      }
+
+      # Create the explainer of the lime package
+      explainer <- lime::lime(data.frame(data), self$converter)
+
+      # Apply lime
+      if (self$converter$output_type == "classification") {
+        res <- lime::explain(data.frame(x), explainer,
+          labels = self$converter$output_names[[1]][[1]][self$output_idx[[1]]],
+          n_features = prod(self$converter$input_dim[[1]]),
+          input_dim = self$converter$input_dim[[1]], ...)
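+        # The refold below assumes lime::explain() returns one feature weight
+        # per feature, label and case, with the feature index varying fastest:
+        # the flat `feature_weight` column is reshaped into
+        # (input dims, labels, cases) and the case axis is then moved to the
+        # front (batch first).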
+ res_dim <- c(dim(x)[-1], length(self$output_idx[[1]]), nrow(x)) + result <- torch_tensor(array(res$feature_weight, dim = res_dim)) + result <- result$movedim(-1, 1) + } else { + apply_lime <- function(idx) { + tmp_res <- lime::explain(data.frame(x), explainer, + n_features = prod(self$converter$input_dim[[1]]), + input_dim = self$converter$input_dim[[1]], + idx = idx, ...) + res_dim <- c(dim(x)[-1], nrow(x)) + tmp_res <- torch_tensor(array(tmp_res$feature_weight, dim = res_dim)) + tmp_res <- tmp_res$movedim(-1, 1) + } + res <- lapply(self$output_idx[[1]], apply_lime) + result <- torch_stack(res, dim = -1) + } + + self$result <- list(list(result)) + } + ) +) + +# Add functions predict_model and model_type for the objects of class +# innsight_agnostic_wrapper + +#' @exportS3Method lime::predict_model +predict_model.innsight_agnostic_wrapper <- function(x, newdata, type, idx, ...) { + pred <- x$pred_fun(newdata = newdata, ...) + if (type == "raw") { + as.data.frame(pred[, idx, drop = FALSE]) + } else { + if (!inherits(pred, c("data.frame", "matrix", "array"))) { + pred <- as.array(pred) + } + colnames(pred) <- x$output_names[[1]][[1]] + as.data.frame(pred, check.names = FALSE) + } +} + +#' @exportS3Method lime::model_type +model_type.innsight_agnostic_wrapper <- function(x, ...) { + x$output_type +} + +################################################################################ +# SHAP (from the package fastshap) +################################################################################ + +#' Shapley values +#' +#' @description +#' The R6 class `SHAP` calculates the famous Shapley values based on game +#' theory for an instance to be explained. It is a model-agnostic method +#' that can be applied to any predictive model. This means, in particular, that +#' `SHAP` can be applied not only to objects of the [`Converter`] class but +#' also to any other model. The only requirement is the argument `pred_fun`, +#' which generates predictions with the model for given data. However, this +#' function is pre-implemented for models created with +#' \code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}}, +#' \code{\link[neuralnet]{neuralnet}} or [`Converter`]. Internally, the +#' suggested package `fastshap` is utilized and applied to `data.frame`. +#' +#' **Note:** Even signal and image data are initially transformed into a +#' `data.frame` using `as.data.frame()` and then [`fastshap::explain`] is +#' applied. In other words, a custom `pred_fun` may need to convert the +#' `data.frame` back into an `array` as necessary. +#' +#' @template param-output_idx +#' @template param-channels_first +#' @template param-model-agnostic +#' @template param-data-agnostic +#' @template param-x-agnostic +#' @template param-pred_fun-agnostic +#' @template param-input_dim-agnostic +#' @template param-input_names-agnostic +#' @template param-output_names-agnostic +#' @template examples-SHAP +#' +#' @family methods +#' @export +SHAP <- R6Class( + classname = "SHAP", + inherit = AgnosticWrapper, + public = list( + + #' @description + #' Create a new instance of the `SHAP` R6 class. When initialized, + #' the method *SHAP* is applied to the given data and the results are + #' stored in the field `result`. + #' + #' @param ... other arguments forwarded to [`fastshap::explain`]. + initialize = function(model, data, x, + pred_fun = NULL, + output_idx = NULL, + channels_first = TRUE, + input_dim = NULL, + input_names = NULL, + output_names = NULL, ...) 
{ + + # output_type is not necessary for fastshap + output_type <- "regression" + + # Check if data or x is a torch_tensor + if (inherits(data, "torch_tensor")) data <- as.array(data) + if (inherits(x, "torch_tensor")) x <- as.array(x) + + super$initialize(model, data, x, output_type, pred_fun, output_idx, + channels_first, input_dim, input_names, + output_names) + + # We use the fastshap package for the explanation + if (!requireNamespace("fastshap", quietly = TRUE)) { + stopf("Package {.pkg fastshap} must be installed to use this function!") + } + + # Function for calculating Shapley values for a specific output + apply_shap <- function(idx, input_dim) { + pred_wrapper <- function(object, newdata, ...) { + self$converter$pred_fun(newdata = newdata, input_dim = input_dim, ...)[, idx] + } + + res <- fastshap::explain( + self$converter, + X = as.data.frame(data), + newdata = as.data.frame(self$x), + pred_wrapper = pred_wrapper, ...) + dim(res) <- dim(self$x) + res + } + + # Calculate Shapley values for all outputs + result <- lapply(self$output_idx[[1]], apply_shap, + input_dim = self$converter$input_dim[[1]]) + + # Reshape the result to (batch_size, input_dim, output_idx) + result <- torch_stack(result, dim = -1) + + # Save result + self$result <- list(list(result)) + } + ) +) + diff --git a/R/AgnosticWrapper.R b/R/AgnosticWrapper.R new file mode 100644 index 0000000..f95e5c5 --- /dev/null +++ b/R/AgnosticWrapper.R @@ -0,0 +1,364 @@ +############################################################################### +# Super class: AgnosticWrapper +############################################################################### + +#' +#' @title Super class for model-agnostic interpretability methods +#' @description This is a super class for all implemented model-agnostic +#' interpretability methods and inherits from the [`InterpretingMethod`] +#' class. Instead of just an object of the [`Converter`] class, any model +#' can now be passed. In contrast to the other model-specific methods in this +#' package, only the prediction function of the model is required, and not +#' the internal details of the model. The following model-agnostic methods +#' are available (all are wrapped by other packages): +#' +#' - *Shapley values* ([`SHAP`]) based on [`fastshap::explain`] +#' - *Local interpretable model-agnostic explanations* ([`LIME`]) based on +#' [`lime::lime`] +#' +#' @template param-output_idx +#' @template param-channels_first +#' @template param-model-agnostic +#' @template param-data-agnostic +#' @template param-x-agnostic +#' @template param-output_type-agnostic +#' @template param-pred_fun-agnostic +#' @template param-input_dim-agnostic +#' @template param-input_names-agnostic +#' @template param-output_names-agnostic +#' +#' @export +AgnosticWrapper <- R6Class( + classname = "AgnosticWrapper", + inherit = InterpretingMethod, + public = list( + #' @field x The individual instances to be explained by the method + #' (unprocessed!). + x = NULL, + + #' @description + #' Create a new instance of the `AgnosticWrapper` R6 class. 
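+    #' Note that this super class only validates and stores the passed
+    #' arguments; the actual attribution method is applied by the inheriting
+    #' classes (e.g., [`LIME`] or [`SHAP`]), which fill the field `result`.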
+ #' + initialize = function(model, data, x, + output_type = NULL, + pred_fun = NULL, + output_idx = NULL, + channels_first = TRUE, + input_dim = NULL, + input_names = NULL, + output_names = NULL) { + + # If x is missing, we interpret all instances in the given data + if (missing(x)) x <- data + + # Check for required arguments + if (missing(model)) stopf("Argument {.arg model} is missing, with no default!") + if (missing(data)) stopf("Argument {.arg data} is missing, with no default!") + + # Set the default input shape given by the data + if (is.null(input_dim)) { + input_dim <- dim(data)[-1] + # The input shape is always in the channels first format + if (!channels_first) { + input_dim <- c(input_dim[length(input_dim)], input_dim[-length(input_dim)]) + } + } + + # Get names from data (if possible) + if (is.null(input_names)) input_names <- names(data) + + # Create converter object for agnostic IML methods + conv_model <- get_converter(model, data, input_dim, input_names, + output_names, output_type, pred_fun, + channels_first) + + self$converter <- conv_model + self$channels_first <- channels_first + self$ignore_last_act <- FALSE + self$dtype <- "float" + + # Check output indices + self$output_idx <- check_output_idx(output_idx, conv_model$output_dim) + + # Save the original data to be explained + self$x <- x + + # Save the data (the field "data" means the data to be explained, i.e., + # in this case "x") + if (is.data.frame(x)) x <- as.matrix(x) + self$data <- list(torch_tensor(x)) + } + ) +) + +#' +#' @importFrom graphics boxplot +#' @exportS3Method +#' +boxplot.AgnosticWrapper <- function(x, ...) { + x$boxplot(...) +} + +################################################################################ +# Converter for model-agnostic methods +################################################################################ + +get_converter <- function(model, data, input_dim = NULL, input_names = NULL, + output_names = NULL, output_type = NULL, + pred_fun = NULL, channels_first = NULL) { + # We currently only support models with one input and one output layer + if (is.list(data) && !is.data.frame(data)) { + if (length(data) > 1) { + stopf("The package supports only models with a single input layer for ", + "the model-agnostic approaches!") + } + } + + # Look for pre-implemented methods + if (inherits(model, "nn")) { + conv_model <- get_nn_model(model) + } else if (is_keras_model(model)) { + conv_model <- get_keras_model(model) + } else if (inherits(model, "nn_module") && is_nn_module(model)) { + conv_model <- get_torch_model(model) + } else if (inherits(model, "Converter")) { + conv_model <- list( + model = model$model, + input_dim = model$input_dim, + input_names = model$input_names, + output_names = model$output_names, + output_type = get_output_type(model), + pred_fun = get_pred_fun(model, channels_first) + ) + } else { + conv_model <- list( + model = model, + input_dim = input_dim, + input_names = input_names, + output_names = output_names, + output_type = output_type, + pred_fun = pred_fun + ) + } + + # Overwrite defaults if arguments aren't NULL + if (!is.null(output_type)) conv_model$output_type <- output_type + if (!is.null(pred_fun)) conv_model$pred_fun <- pred_fun + + # Do some checks + cli_check(checkChoice(conv_model$output_type, c("regression", "classification")), + "output_type") + cli_check(checkFunction(conv_model$pred_fun, args = "newdata"), "pred_fun") + + # Check input_dim + if (!is.null(conv_model$input_dim)) { + if (any(unlist(conv_model$input_dim) != 
+      stopf("There is a mismatch in the calculated input shape ",
+            shape_to_char(unlist(conv_model$input_dim)), " and the given ",
+            "input shape {.arg input_dim} ",
+            shape_to_char(unlist(input_dim)),
+            " (or calculated by the given data {.arg data})! Remember ",
+            "that {.arg input_dim} has to be in the channels first format. ",
+            "If your data is not provided in the channels first format, set ",
+            "the argument {.arg channels_first} to {.code FALSE}.")
+    }
+  } else {
+    conv_model$input_dim <- list(unlist(input_dim))
+  }
+
+  # Check output_dim
+  out <- tryCatch(
+    conv_model$pred_fun(as.data.frame(data),
+                        input_dim = conv_model$input_dim[[1]]),
+    error = function(e) {
+      e$message <- c(
+        paste0(
+          "An error occurred when evaluating the {.arg data} using ",
+          "{.arg pred_fun}! Remember that the data is converted into a ",
+          "{.code data.frame} before it is fed into the model, i.e., the ",
+          "{.arg pred_fun} may have to reverse this process. Also note that ",
+          "you must specify with the {.arg channels_first} argument if your ",
+          "data does not have the channels directly after the batch dimension."),
+        "",
+        "x" = "Original message:", col_grey(e$message)
+      )
+      stopf(e$message, use_paste = FALSE)
+    })
+
+  # We currently only support models with one input and one output layer
+  if (is.list(out) && !is.data.frame(out)) {
+    if (length(out) > 1) {
+      stopf("The package supports only models with a single output layer for ",
+            "the model-agnostic approaches!")
+    }
+  }
+
+  calc_output_dim <- list(dim(out)[-1])
+  if (!is.null(conv_model$output_dim)) {
+    if (any(unlist(conv_model$output_dim) != unlist(calc_output_dim))) {
+      stopf("There is a mismatch in the calculated output shape ",
+            shape_to_char(unlist(conv_model$output_dim)), " and the given ",
+            "output shape ", shape_to_char(unlist(calc_output_dim)),
+            " extracted from the {.arg model}!")
+    }
+  } else {
+    conv_model$output_dim <- list(unlist(calc_output_dim))
+  }
+
+  # Check input names
+  if (is.null(input_names)) {
+    if (is.null(conv_model$input_names)) {
+      conv_model$input_names <-
+        set_name_format(get_input_names(conv_model$input_dim))
+    }
+  } else {
+    input_names <- set_name_format(input_names)
+    input_names_length <- lapply(input_names,
+                                 function(x) unlist(lapply(x, length)))
+    if (!all_equal(input_names_length, conv_model$input_dim)) {
+      given <- shape_to_char(input_names_length)
+      calc <- shape_to_char(conv_model$input_dim)
+      stopf(c(
+        paste0("Mismatch between the calculated shape of input names and ",
+               "given input names:"),
+        "*" = paste0("Calculated: '", calc, "'"),
+        "*" = paste0("Given: '", given, "'")),
+        use_paste = FALSE)
+    }
+    conv_model$input_names <- input_names
+  }
+
+  # Check output names
+  if (is.null(output_names)) {
+    if (is.null(conv_model$output_names)) {
+      conv_model$output_names <-
+        set_name_format(get_output_names(conv_model$output_dim))
+    }
+  } else {
+    output_names <- set_name_format(output_names)
+    output_names_length <- lapply(output_names,
+                                  function(x) unlist(lapply(x, length)))
+
+    if (!all_equal(output_names_length, conv_model$output_dim)) {
+      given <- shape_to_char(output_names_length)
+      calc <- shape_to_char(conv_model$output_dim)
+      stopf(c(
+        paste0("Mismatch between the calculated shape of output names and ",
+               "given output names:"),
+        "*" = paste0("Calculated: '", calc, "'"),
+        "*" = paste0("Given: '", given, "'")),
+        use_paste = FALSE)
+    }
+    conv_model$output_names <- output_names
+  }
+
+  class(conv_model) <- c("innsight_agnostic_wrapper", class(conv_model))
+  conv_model
+}
+
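+# Example (illustrative sketch; `my_model` and `my_data` are placeholders):
+# for a custom model, `get_converter()` essentially only needs a prediction
+# function and the output type in addition to the data, e.g.,
+#
+#   conv <- get_converter(
+#     model = my_model,
+#     data = my_data,
+#     input_dim = c(4),
+#     output_type = "regression",
+#     pred_fun = function(newdata, ...) predict(my_model, newdata = newdata),
+#     channels_first = TRUE
+#   )
+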
+################################################################################
+# Utility functions
+################################################################################
+
+# Package neuralnet
+get_nn_model <- function(model) {
+  list(
+    model = model,
+    input_dim = list(ncol(model$covariate)),
+    input_names = list(list(
+      factor(colnames(model$covariate),
+             levels = unique(colnames(model$covariate))))),
+    output_dim = list(ncol(model$response)),
+    output_names = list(list(
+      factor(colnames(model$response),
+             levels = unique(colnames(model$response))))),
+    output_type =
+      ifelse(model$linear.output, "regression", "classification"),
+    pred_fun = function(newdata, ...) predict(model, newdata = newdata, ...)
+  )
+}
+
+# Keras
+get_keras_model <- function(model) {
+
+  # Get model input shape
+  input_dim <- list(unlist(model$input_shape))
+
+  # Get data formats
+  fun <- function(x) if ("data_format" %in% names(x)) x$data_format
+  data_formats <- unlist(lapply(model$layers, fun))
+
+  # Convert the shapes to the channels first format
+  if (!is.null(data_formats) && all(data_formats == "channels_last")) {
+    fun <- function(x) c(x[length(x)], x[-length(x)])
+    input_dim <- lapply(input_dim, fun)
+  }
+
+  # Get output shape
+  output_dim <- list(unlist(model$output_shape))
+
+  # Get output type
+  fun <- function(x) model$get_layer(x)$get_config()$activation
+  out_types <- unlist(lapply(model$output_names, fun))
+  if (all(out_types %in% c("softmax", "sigmoid"))) {
+    out_type <- "classification"
+  } else if (all(out_types == "linear")) {
+    out_type <- "regression"
+  } else {
+    stopf("You cannot mix regression and classification output layers for ",
+          "the model-agnostic methods!")
+  }
+
+  # Build Converter as a list
+  list(
+    model = model,
+    input_dim = input_dim,
+    output_dim = output_dim,
+    output_type = out_type,
+    pred_fun = function(newdata, ...) {
+      # Reshape the data.frame back into the original input shape
+      newdata <- array(unlist(c(newdata)),
+                       dim = c(nrow(newdata), unlist(model$input_shape)))
+      predict(model, x = newdata, ...)
+    }
+  )
+}
+
+# Torch model
+get_torch_model <- function(model) {
+  list(
+    model = model,
+    # Use the last module to decide between classification and regression
+    output_type =
+      ifelse(inherits(rev(model$modules)[[1]], c("nn_sigmoid", "nn_softmax")),
+             "classification", "regression"),
+    pred_fun = function(newdata, input_dim, ...) {
+      # Reshape the data.frame back into the original input shape
+      newdata <- array(unlist(c(newdata)), dim = c(nrow(newdata), input_dim))
+      as.array(model$forward(torch_tensor(newdata)))
+    }
+  )
+}
+
+# Get output_type from Converter object
+get_output_type <- function(converter) {
+  out_nodes <- converter$model$output_nodes
+  acts <- unlist(lapply(out_nodes,
+                        function(i) converter$model$modules_list[[i]]$activation_name))
+  if (all(acts %in% c("softmax", "sigmoid", "tanh", "logistic"))) {
+    output_type <- "classification"
+  } else {
+    output_type <- "regression"
+  }
+
+  output_type
+}
+
+# Get pred_fun from Converter object
+get_pred_fun <- function(converter, channels_first) {
+  input_dim <- unlist(converter$input_dim)
+  if (!channels_first) {
+    input_dim <- c(input_dim[-1], input_dim[1])
+  }
+  function(newdata, ...) {
+    # Reshape the data.frame back into the original input shape
+    newdata <- array(unlist(c(newdata)), dim = c(nrow(newdata), input_dim))
+    res <- converter$model(torch_tensor(newdata),
+                           channels_first = channels_first)[[1]]
+    as.array(res)
+  }
+}
diff --git a/R/GradienBased.R b/R/GradienBased.R
index 6f30c59..a848ce2 100644
--- a/R/GradienBased.R
+++ b/R/GradienBased.R
@@ -362,7 +362,7 @@ IntegratedGradient <- R6Class(
               dim = 1)
 
       # Define scale
-      scale <- torch_tensor(rep(seq(0, 1, length.out = self$n), length.out = res$shape[1]))
+      scale <- torch_tensor(rep(seq(1/self$n, 1, length.out = self$n), length.out = res$shape[1]))
       scale <- scale$reshape(c(-1, rep(1, res$dim() - 1)))
 
       # Create interpolations between x and x_ref
@@ -386,13 +386,10 @@ IntegratedGradient <- R6Class(
       # of shape (n, dim_in)
       grad <- grad$chunk(dim(self$data[[1]])[1])
 
-      # Define trapezoidal rule for approximation the integral
-      trapez_rule <- function(x, n) {
-        torch_mean((x[1:(n-1), ] + x[2:n]) / 2, dim = 1)
-      }
-
       # Calculate the result of IntegratedGradients for current gradients
-      res <- torch_stack(lapply(grad, trapez_rule, n = self$n))
+      # (the mean over the interpolation points is a right-endpoint Riemann
+      # sum approximating the path integral; see the shifted `scale` above)
+      res <- torch_stack(lapply(grad, torch_mean, dim = 1))
     }
 
     res
diff --git a/R/InterpretingMethod.R b/R/InterpretingMethod.R
index 26dce70..f391b8b 100644
--- a/R/InterpretingMethod.R
+++ b/R/InterpretingMethod.R
@@ -274,6 +274,12 @@ InterpretingMethod <- R6Class(
     } else if (inherits(self, "GradientBased")) {
       value_name <- if(self$times_input) "Relevance" else "Gradient"
       include_data <- TRUE
+    } else if (inherits(self, "LIME")) {
+      value_name <- "Weight"
+      include_data <- TRUE
+    } else if (inherits(self, "SHAP")) {
+      value_name <- "Shapley Value"
+      include_data <- TRUE
     }
 
     # Check correctness of arguments
@@ -394,6 +400,10 @@ InterpretingMethod <- R6Class(
       value_name <- "Contribution"
     } else if (inherits(self, "GradientBased")) {
       value_name <- "Gradient"
+    } else if (inherits(self, "LIME")) {
+      value_name <- "Weight"
+    } else if (inherits(self, "SHAP")) {
+      value_name <- "Shapley Value"
     }
 
     #
diff --git a/R/innsight.R b/R/innsight.R
index 8984977..d79d55d 100644
--- a/R/innsight.R
+++ b/R/innsight.R
@@ -45,4 +45,5 @@
 #' @importFrom cli cli_h1 cli_h2 cli_text cli_ul cli_li cli_end col_grey
 #' @importFrom cli cli_dl symbol cli_ol cli_div cli_bullets col_cyan
 #' @importFrom cli cli_progress_bar cli_progress_update cli_progress_done
+#' @importFrom stats predict
 NULL
diff --git a/man-roxygen/examples-LIME.R b/man-roxygen/examples-LIME.R
new file mode 100644
index 0000000..bece866
--- /dev/null
+++ b/man-roxygen/examples-LIME.R
@@ -0,0 +1,84 @@
+#' @examplesIf torch::torch_is_installed()
+#' #----------------------- Example 1: Torch -----------------------------------
+#' library(torch)
+#'
+#' # Create nn_sequential model and data
+#' model <- nn_sequential(
+#'   nn_linear(5, 12),
+#'   nn_relu(),
+#'   nn_linear(12, 2),
+#'   nn_softmax(dim = 2)
+#' )
+#' data <- torch_randn(25, 5)
+#'
+#' # Calculate LIME for the first 10 instances and set the
+#' # feature and outcome names
+#' lime <- LIME$new(model, data, data[1:10, ],
+#'                  input_names = c("Car", "Cat", "Dog", "Plane", "Horse"),
+#'                  output_names = c("Buy it!", "Don't buy it!"))
+#'
+#' # Get the result as an array for the first two instances
+#' get_result(lime)[1:2,, ]
+#'
+#' # Plot the result for both classes
+#' plot(lime, output_idx = c(1, 2))
+#'
+#' # Show the boxplot over all 10 instances
+#' boxplot(lime, output_idx = c(1, 2))
+#'
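+#' # Get the result as a data.frame, e.g., for further processing
+#' head(get_result(lime, "data.frame"))
+#'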
+#' # We can also forward some arguments to lime::explain, e.g. n_permutations
+#' # to get more accurate values
+#' lime <- LIME$new(model, data, data[1:10, ],
+#'                  input_names = c("Car", "Cat", "Dog", "Plane", "Horse"),
+#'                  output_names = c("Buy it!", "Don't buy it!"),
+#'                  n_permutations = 500)
+#'
+#' # Plot the boxplots again
+#' boxplot(lime, output_idx = c(1, 2))
+#'
+#' #----------------------- Example 2: Converter object --------------------------
+#' # We can do the same with a Converter object (all feature and outcome names
+#' # will be extracted by the LIME method!)
+#' conv <- Converter$new(model,
+#'                       input_dim = c(5),
+#'                       input_names = c("Car", "Cat", "Dog", "Plane", "Horse"),
+#'                       output_names = c("Buy it!", "Don't buy it!"))
+#'
+#' # Calculate LIME for the first 10 instances
+#' lime <- LIME$new(conv, data, data[1:10], n_permutations = 400)
+#'
+#' # Plot the result for both classes
+#' plot(lime, output_idx = c(1, 2))
+#'
+#' #----------------------- Example 3: Other model -------------------------------
+#' if (require("neuralnet") & require("ranger")) {
+#'   library(neuralnet)
+#'   library(ranger)
+#'   data(iris)
+#'
+#'   # Fit a random forest using the ranger package
+#'   model <- ranger(Species ~ ., data = iris, probability = TRUE)
+#'
+#'   # There is no pre-implemented predict function for ranger models, i.e.,
+#'   # we have to define it ourselves.
+#'   pred_fun <- function(newdata, ...) {
+#'     predict(model, newdata, ...)$predictions
+#'   }
+#'
+#'   # Calculate LIME for the instances of index 1 and 111 and add
+#'   # the outcome labels (for LIME, the output_type is required!)
+#'   lime <- LIME$new(model, iris[, -5], iris[c(1, 111), -5],
+#'                    pred_fun = pred_fun,
+#'                    output_type = "classification",
+#'                    output_names = levels(iris$Species),
+#'                    n_permutations = 500)
+#'
+#'   # Plot the result for the first two classes and all selected instances
+#'   plot(lime, data_idx = 1:2, output_idx = 1:2)
+#'
+#'   # Get the result as a torch_tensor
+#'   get_result(lime, "torch_tensor")
+#' }
diff --git a/man-roxygen/examples-SHAP.R b/man-roxygen/examples-SHAP.R
new file mode 100644
index 0000000..15612cb
--- /dev/null
+++ b/man-roxygen/examples-SHAP.R
@@ -0,0 +1,83 @@
+#' @examplesIf torch::torch_is_installed()
+#' #----------------------- Example 1: Torch -----------------------------------
+#' library(torch)
+#'
+#' # Create nn_sequential model and data
+#' model <- nn_sequential(
+#'   nn_linear(5, 12),
+#'   nn_relu(),
+#'   nn_linear(12, 2),
+#'   nn_softmax(dim = 2)
+#' )
+#' data <- torch_randn(25, 5)
+#'
+#' # Calculate Shapley values for the first 10 instances and set the
+#' # feature and outcome names
+#' shap <- SHAP$new(model, data, data[1:10, ],
+#'                  input_names = c("Car", "Cat", "Dog", "Plane", "Horse"),
+#'                  output_names = c("Buy it!", "Don't buy it!"))
+#'
+#' # Get the result as an array for the first two instances
+#' get_result(shap)[1:2,, ]
+#'
+#' # Plot the result for both classes
+#' plot(shap, output_idx = c(1, 2))
+#'
+#' # Show the boxplot over all 10 instances
+#' boxplot(shap, output_idx = c(1, 2))
+#'
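+#' # Get the result as a data.frame, e.g., for further processing
+#' head(get_result(shap, "data.frame"))
+#'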
+#' # We can also forward some arguments to fastshap::explain, e.g. nsim to
+#' # get more accurate values
+#' shap <- SHAP$new(model, data, data[1:10, ],
+#'                  input_names = c("Car", "Cat", "Dog", "Plane", "Horse"),
+#'                  output_names = c("Buy it!", "Don't buy it!"),
+#'                  nsim = 10)
+#'
+#' # Plot the boxplots again
+#' boxplot(shap, output_idx = c(1, 2))
+#'
+#' #----------------------- Example 2: Converter object --------------------------
+#' # We can do the same with a Converter object (all feature and outcome names
+#' # will be extracted by the SHAP method!)
+#' conv <- Converter$new(model,
+#'                       input_dim = c(5),
+#'                       input_names = c("Car", "Cat", "Dog", "Plane", "Horse"),
+#'                       output_names = c("Buy it!", "Don't buy it!"))
+#'
+#' # Calculate Shapley values for the first 10 instances
+#' shap <- SHAP$new(conv, data, data[1:10])
+#'
+#' # Plot the result for both classes
+#' plot(shap, output_idx = c(1, 2))
+#'
+#' #----------------------- Example 3: Other model -------------------------------
+#' if (require("neuralnet") & require("ranger")) {
+#'   library(neuralnet)
+#'   library(ranger)
+#'   data(iris)
+#'
+#'   # Fit a random forest using the ranger package
+#'   model <- ranger(Species ~ ., data = iris, probability = TRUE)
+#'
+#'   # There is no pre-implemented predict function for ranger models, i.e.,
+#'   # we have to define it ourselves.
+#'   pred_fun <- function(newdata, ...) {
+#'     predict(model, newdata, ...)$predictions
+#'   }
+#'
+#'   # Calculate Shapley values for the instances of index 1 and 111 and add
+#'   # the outcome labels
+#'   shap <- SHAP$new(model, iris[, -5], iris[c(1, 111), -5],
+#'                    pred_fun = pred_fun,
+#'                    output_names = levels(iris$Species),
+#'                    nsim = 10)
+#'
+#'   # Plot the result for the first two classes and all selected instances
+#'   plot(shap, data_idx = 1:2, output_idx = 1:2)
+#'
+#'   # Get the result as a torch_tensor
+#'   get_result(shap, "torch_tensor")
+#' }
diff --git a/man-roxygen/param-data-agnostic.R b/man-roxygen/param-data-agnostic.R
new file mode 100644
index 0000000..3c534f8
--- /dev/null
+++ b/man-roxygen/param-data-agnostic.R
@@ -0,0 +1,7 @@
+#' @param data ([`array`], [`data.frame`] or [`torch_tensor`])\cr
+#' The dataset to which the method is to be applied. It must
+#' have the same format as the input data of the passed model and
+#' be a [`matrix`], an [`array`], a [`data.frame`] or a
+#' [`torch_tensor`].\cr
+#' **Note:** For the model-agnostic methods, only models with a single
+#' input and output layer are allowed!\cr
diff --git a/man-roxygen/param-input_dim-agnostic.R b/man-roxygen/param-input_dim-agnostic.R
new file mode 100644
index 0000000..8613bb0
--- /dev/null
+++ b/man-roxygen/param-input_dim-agnostic.R
@@ -0,0 +1,4 @@
+#' @param input_dim (`integer`)\cr
+#' The model input dimension excluding the batch
+#' dimension. It can be specified as a vector of integers, but has to be in
+#' the format "channels first".\cr
diff --git a/man-roxygen/param-input_names-agnostic.R b/man-roxygen/param-input_names-agnostic.R
new file mode 100644
index 0000000..86be62e
--- /dev/null
+++ b/man-roxygen/param-input_names-agnostic.R
@@ -0,0 +1,13 @@
+#' @param input_names (`character`, `factor` or `list`)\cr
+#' The input names of the model excluding the batch dimension. For a model
+#' with a single input layer and input axis (e.g., for tabular data), the
+#' input names can be specified as a character vector or factor, e.g.,
+#' for a dense layer with 3 input features use `c("X1", "X2", "X3")`. If
+#' the model input consists of multiple axes (e.g., for signal and
+#' image data), use a list of character vectors or factors for each axis
+#' in the format "channels first", e.g., use
+#' `list(c("C1", "C2"), c("L1","L2","L3","L4","L5"))` for a 1D
+#' convolutional input layer with signal length 5 and 2 channels.\cr
+#' *Note:* This argument is optional and otherwise the names are
+#' generated automatically. But if this argument is set, all found
+#' input names in the passed model will be disregarded.\cr
diff --git a/man-roxygen/param-model-agnostic.R b/man-roxygen/param-model-agnostic.R
new file mode 100644
index 0000000..2cd9bc4
--- /dev/null
+++ b/man-roxygen/param-model-agnostic.R
@@ -0,0 +1,10 @@
+#' @param model \cr
+#' A fitted model for a classification or regression task that
+#' is intended to be interpreted. A [`Converter`] object can also be
+#' passed. In order for the package to know how to make predictions
+#' with the given model, a prediction function must also be passed with
+#' the argument `pred_fun`. However, for models created by
+#' \code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}},
+#' \code{\link[neuralnet]{neuralnet}} or [`Converter`],
+#' these functions have already been pre-implemented and do not need to be
+#' specified.\cr
diff --git a/man-roxygen/param-output_names-agnostic.R b/man-roxygen/param-output_names-agnostic.R
new file mode 100644
index 0000000..b9ee483
--- /dev/null
+++ b/man-roxygen/param-output_names-agnostic.R
@@ -0,0 +1,8 @@
+#' @param output_names (`character` or `factor`)\cr
+#' A character vector with the names for the output dimensions
+#' excluding the batch dimension, e.g., for a model with 3 output nodes use
+#' `c("Y1", "Y2", "Y3")`. Instead of a character
+#' vector you can also use a factor to set an order for the plots.\cr
+#' *Note:* This argument is optional and otherwise the names are
+#' generated automatically. But if this argument is set, all found
+#' output names in the passed model will be disregarded.\cr
diff --git a/man-roxygen/param-output_type-agnostic.R b/man-roxygen/param-output_type-agnostic.R
new file mode 100644
index 0000000..a0a34db
--- /dev/null
+++ b/man-roxygen/param-output_type-agnostic.R
@@ -0,0 +1,3 @@
+#' @param output_type (`character(1)`)\cr
+#' Type of the model output, i.e., either
+#' `"classification"` or `"regression"`.\cr
diff --git a/man-roxygen/param-pred_fun-agnostic.R b/man-roxygen/param-pred_fun-agnostic.R
new file mode 100644
index 0000000..81f9411
--- /dev/null
+++ b/man-roxygen/param-pred_fun-agnostic.R
@@ -0,0 +1,9 @@
+#' @param pred_fun (`function`)\cr
+#' Prediction function for the model. This argument is only
+#' needed if `model` is not a model created by
+#' \code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}},
+#' \code{\link[neuralnet]{neuralnet}} or [`Converter`]. The first argument of
+#' `pred_fun` has to be `newdata`, e.g.,
+#' ```
+#' function(newdata, ...) model(newdata)
+#' ```
diff --git a/man-roxygen/param-x-agnostic.R b/man-roxygen/param-x-agnostic.R
new file mode 100644
index 0000000..1d3a6e8
--- /dev/null
+++ b/man-roxygen/param-x-agnostic.R
@@ -0,0 +1,8 @@
+#' @param x (`array`, `data.frame` or `torch_tensor`)\cr
+#' The individual instances to be explained by the method.
+#' They must have the same format as the input data of the passed model
+#' and be a [`matrix`], an [`array`], a [`data.frame`] or a
+#' [`torch_tensor`]. If no value is specified, all instances in the
+#' dataset `data` will be explained.\cr
+#' **Note:** For the model-agnostic methods, only models with a single
+#' input and output layer are allowed!\cr
diff --git a/man/AgnosticWrapper.Rd b/man/AgnosticWrapper.Rd
new file mode 100644
index 0000000..45ea1be
--- /dev/null
+++ b/man/AgnosticWrapper.Rd
@@ -0,0 +1,193 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/AgnosticWrapper.R
+\name{AgnosticWrapper}
+\alias{AgnosticWrapper}
+\title{Super class for model-agnostic interpretability methods}
+\description{
+This is a super class for all implemented model-agnostic
+interpretability methods and inherits from the \code{\link{InterpretingMethod}}
+class. Instead of just an object of the \code{\link{Converter}} class, any model
+can now be passed. In contrast to the other model-specific methods in this
+package, only the prediction function of the model is required, and not
+the internal details of the model. The following model-agnostic methods
+are available (all are wrapped by other packages):
+\itemize{
+\item \emph{Shapley values} (\code{\link{SHAP}}) based on \code{\link[fastshap:explain]{fastshap::explain}}
+\item \emph{Local interpretable model-agnostic explanations} (\code{\link{LIME}}) based on
+\code{\link[lime:lime]{lime::lime}}
+}
+}
+\section{Super class}{
+\code{\link[innsight:InterpretingMethod]{innsight::InterpretingMethod}} -> \code{AgnosticWrapper}
+}
+\section{Public fields}{
+\if{html}{\out{
}} +\describe{ +\item{\code{x}}{The individual instances to be explained by the method +(unprocessed!).} +} +\if{html}{\out{
}} +} +\section{Methods}{ +\subsection{Public methods}{ +\itemize{ +\item \href{#method-AgnosticWrapper-new}{\code{AgnosticWrapper$new()}} +\item \href{#method-AgnosticWrapper-clone}{\code{AgnosticWrapper$clone()}} +} +} +\if{html}{\out{ +
Inherited methods + +
+}} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-AgnosticWrapper-new}{}}} +\subsection{Method \code{new()}}{ +Create a new instance of the \code{AgnosticWrapper} R6 class. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{AgnosticWrapper$new( + model, + data, + x, + output_type = NULL, + pred_fun = NULL, + output_idx = NULL, + channels_first = TRUE, + input_dim = NULL, + input_names = NULL, + output_names = NULL +)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{model}}{\cr +A fitted model for a classification or regression task that +is intended to be interpreted. A \code{\link{Converter}} object can also be +passed. In order for the package to know how to make predictions +with the given model, a prediction function must also be passed with +the argument \code{pred_fun}. However, for models created by +\code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}}, +\code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}, +these have already been pre-implemented and do not need to be +specified.\cr} + +\item{\code{data}}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr +The dataset to which the method is to be applied. These must +have the same format as the input data of the passed model and has to +be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}.\cr +\strong{Note:} For the model-agnostic methods, only models with a single +input and output layer is allowed!\cr} + +\item{\code{x}}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr +The individual instances to be explained by the method. +These must have the same format as the input data of the passed model +and has to be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}. If no value is specified, all instances in the +dataset \code{data} will be explained.\cr +\strong{Note:} For the model-agnostic methods, only models with a single +input and output layer is allowed!\cr} + +\item{\code{output_type}}{(\code{character(1)})\cr +Type of the model output, i.e., either +\code{"classification"} or \code{"regression"}.\cr} + +\item{\code{pred_fun}}{(\code{function})\cr +Prediction function for the model. This argument is only +needed if \code{model} is not a model created by +\code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}}, +\code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}. The first argument of +\code{pred_fun} has to be \code{newdata}, e.g., + +\if{html}{\out{
}}\preformatted{function(newdata, ...) model(newdata) +}\if{html}{\out{
}}} + +\item{\code{output_idx}}{(\code{integer}, \code{list} or \code{NULL})\cr +These indices specify the output nodes for which +the method is to be applied. In order to allow models with multiple +output layers, there are the following possibilities to select +the indices of the output nodes in the individual output layers: +\itemize{ +\item An \code{integer} vector of indices: If the model has only one output +layer, the values correspond to the indices of the output nodes, e.g., +\code{c(1,3,4)} for the first, third and fourth output node. If there are +multiple output layers, the indices of the output nodes from the first +output layer are considered. +\item A \code{list} of \code{integer} vectors of indices: If the method is to be +applied to output nodes from different layers, a list can be passed +that specifies the desired indices of the output nodes for each +output layer. Unwanted output layers have the entry \code{NULL} instead of +a vector of indices, e.g., \code{list(NULL, c(1,3))} for the first and +third output node in the second output layer. +\item \code{NULL} (default): The method is applied to all output nodes in +the first output layer but is limited to the first ten as the +calculations become more computationally expensive for more output +nodes.\cr +}} + +\item{\code{channels_first}}{(\code{logical(1)})\cr +The channel position of the given data (argument +\code{data}). If \code{TRUE}, the channel axis is placed at the second position +between the batch size and the rest of the input axes, e.g., +\code{c(10,3,32,32)} for a batch of ten images with three channels and a +height and width of 32 pixels. Otherwise (\code{FALSE}), the channel axis +is at the last position, i.e., \code{c(10,32,32,3)}. If the data +has no channel axis, use the default value \code{TRUE}.\cr} + +\item{\code{input_dim}}{(\code{integer})\cr +The model input dimension excluding the batch +dimension. It can be specified as vector of integers, but has to be in +the format "channels first".\cr} + +\item{\code{input_names}}{(\code{character}, \code{factor} or \code{list})\cr +The input names of the model excluding the batch dimension. For a model +with a single input layer and input axis (e.g., for tabular data), the +input names can be specified as a character vector or factor, e.g., +for a dense layer with 3 input features use \code{c("X1", "X2", "X3")}. If +the model input consists of multiple axes (e.g., for signal and +image data), use a list of character vectors or factors for each axis +in the format "channels first", e.g., use +\code{list(c("C1", "C2"), c("L1","L2","L3","L4","L5"))} for a 1D +convolutional input layer with signal length 4 and 2 channels.\cr +\emph{Note:} This argument is optional and otherwise the names are +generated automatically. But if this argument is set, all found +input names in the passed model will be disregarded.\cr} + +\item{\code{output_names}}{(\code{character}, \code{factor} )\cr +A character vector with the names for the output dimensions +excluding the batch dimension, e.g., for a model with 3 output nodes use +\code{c("Y1", "Y2", "Y3")}. Instead of a character +vector you can also use a factor to set an order for the plots.\cr +\emph{Note:} This argument is optional and otherwise the names are +generated automatically. But if this argument is set, all found +output names in the passed model will be disregarded.\cr} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-AgnosticWrapper-clone}{}}} +\subsection{Method \code{clone()}}{ +The objects of this class are cloneable with this method. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{AgnosticWrapper$clone(deep = FALSE)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{deep}}{Whether to make a deep clone.} +} +\if{html}{\out{
}} +} +} +} diff --git a/man/ConnectionWeights.Rd b/man/ConnectionWeights.Rd index 44a0014..b50a659 100644 --- a/man/ConnectionWeights.Rd +++ b/man/ConnectionWeights.Rd @@ -136,7 +136,9 @@ Other methods: \code{\link{DeepSHAP}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, +\code{\link{LIME}}, \code{\link{LRP}}, +\code{\link{SHAP}}, \code{\link{SmoothGrad}} } \concept{methods} diff --git a/man/DeepLift.Rd b/man/DeepLift.Rd index d27bf3f..028f86a 100644 --- a/man/DeepLift.Rd +++ b/man/DeepLift.Rd @@ -142,7 +142,9 @@ Other methods: \code{\link{DeepSHAP}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, +\code{\link{LIME}}, \code{\link{LRP}}, +\code{\link{SHAP}}, \code{\link{SmoothGrad}} } \concept{methods} diff --git a/man/DeepSHAP.Rd b/man/DeepSHAP.Rd index 99f36fa..6256919 100644 --- a/man/DeepSHAP.Rd +++ b/man/DeepSHAP.Rd @@ -137,7 +137,9 @@ Other methods: \code{\link{DeepLift}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, +\code{\link{LIME}}, \code{\link{LRP}}, +\code{\link{SHAP}}, \code{\link{SmoothGrad}} } \concept{methods} diff --git a/man/Gradient.Rd b/man/Gradient.Rd index d0ac79d..e868209 100644 --- a/man/Gradient.Rd +++ b/man/Gradient.Rd @@ -136,7 +136,9 @@ Other methods: \code{\link{DeepLift}}, \code{\link{DeepSHAP}}, \code{\link{IntegratedGradient}}, +\code{\link{LIME}}, \code{\link{LRP}}, +\code{\link{SHAP}}, \code{\link{SmoothGrad}} } \concept{methods} diff --git a/man/IntegratedGradient.Rd b/man/IntegratedGradient.Rd index 49147b8..463b3a9 100644 --- a/man/IntegratedGradient.Rd +++ b/man/IntegratedGradient.Rd @@ -144,7 +144,9 @@ Other methods: \code{\link{DeepLift}}, \code{\link{DeepSHAP}}, \code{\link{Gradient}}, +\code{\link{LIME}}, \code{\link{LRP}}, +\code{\link{SHAP}}, \code{\link{SmoothGrad}} } \concept{methods} diff --git a/man/LIME.Rd b/man/LIME.Rd new file mode 100644 index 0000000..e312e4f --- /dev/null +++ b/man/LIME.Rd @@ -0,0 +1,292 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AgnosticMethods.R +\name{LIME} +\alias{LIME} +\title{Local interpretable model-agnostic explanations (LIME)} +\description{ +The R6 class \code{LIME} calculates the feature weights of a linear surrogate of +the prediction model for a instance to be explained, namely the +\emph{local interpretable model-agnostic explanations (LIME)}. It is a +model-agnostic method that can be applied to any predictive model. +This means, in particular, that +\code{LIME} can be applied not only to objects of the \code{\link{Converter}} class but +also to any other model. The only requirement is the argument \code{pred_fun}, +which generates predictions with the model for given data. However, this +function is pre-implemented for models created with +\code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}}, +\code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}. Internally, the +suggested package \code{lime} is utilized and applied to \code{data.frame}. + +\strong{Note:} Even signal and image data are initially transformed into a +\code{data.frame} using \code{as.data.frame()} and then \code{\link[lime:lime]{lime::lime}} and +\code{\link[lime:explain]{lime::explain}} are +applied. In other words, a custom \code{pred_fun} may need to convert the +\code{data.frame} back into an \code{array} as necessary. 
+} +\examples{ +\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} +#----------------------- Example 1: Torch ----------------------------------- +library(torch) + +# Create nn_sequential model and data +model <- nn_sequential( + nn_linear(5, 12), + nn_relu(), + nn_linear(12, 2), + nn_softmax(dim = 2) + ) +data <- torch_randn(25, 5) + +# Calculate LIME for the first 10 instances and set the +# feature and outcome names +lime <- LIME$new(model, data, data[1:10, ], + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!")) + +# Get the result as an array for the first two instances +get_result(lime)[1:2,, ] + +# Plot the result for both classes +plot(lime, output_idx = c(1, 2)) + +# Show the boxplot over all 10 instances +boxplot(lime, output_idx = c(1, 2)) + +# We can also forward some arguments to lime::explain, e.g. n_permutatuins +# to get more accurate values +lime <- LIME$new(model, data, data[1:10, ], + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!"), + n_perturbations = 500) + +# Plot the boxplots again +boxplot(lime, output_idx = c(1, 2)) + +#----------------------- Example 2: Converter object -------------------------- +# We can do the same with an Converter object (all feature and outcome names +# will be extracted by the LIME method!) +conv <- Converter$new(model, + input_dim = c(5), + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!")) + +# Calculate LIME for the first 10 instances +lime <- LIME$new(conv, data, data[1:10], n_perturbations = 400) + +# Plot the result for both classes +plot(lime, output_idx = c(1, 2)) + +#----------------------- Example 3: Other model ------------------------------- +if (require("neuralnet") & require("ranger")) { + library(neuralnet) + library(ranger) + data(iris) + + # Fit a random forest unsing the ranger package + model <- ranger(Species ~ ., data = iris, probability = TRUE) + + # There is no pre-implemented predict function for ranger models, i.e., + # we have to define it ourselves. + pred_fun <- function(newdata, ...) { + predict(model, newdata, ...)$predictions + } + + # Calculate LIME for the instances of index 1 and 111 and add + # the outcome labels (for LIME, the output_type is required!) + lime <- LIME$new(model, iris[, -5], iris[c(1, 111), -5], + pred_fun = pred_fun, + output_type = "classification", + output_names = levels(iris$Species), + n_perturbations = 500) + + # Plot the result for the first two classes and all selected instances + plot(lime, data_idx = 1:2, output_idx = 1:2) + + # Get the result as a torch_tensor + get_result(lime, "torch_tensor") +} +\dontshow{\}) # examplesIf} +} +\seealso{ +Other methods: +\code{\link{ConnectionWeights}}, +\code{\link{DeepLift}}, +\code{\link{DeepSHAP}}, +\code{\link{Gradient}}, +\code{\link{IntegratedGradient}}, +\code{\link{LRP}}, +\code{\link{SHAP}}, +\code{\link{SmoothGrad}} +} +\concept{methods} +\section{Super classes}{ +\code{\link[innsight:InterpretingMethod]{innsight::InterpretingMethod}} -> \code{\link[innsight:AgnosticWrapper]{innsight::AgnosticWrapper}} -> \code{LIME} +} +\section{Methods}{ +\subsection{Public methods}{ +\itemize{ +\item \href{#method-LIME-new}{\code{LIME$new()}} +\item \href{#method-LIME-clone}{\code{LIME$clone()}} +} +} +\if{html}{\out{ +
Inherited methods + +
+}} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-LIME-new}{}}} +\subsection{Method \code{new()}}{ +Create a new instance of the \code{LIME} R6 class. When initialized, +the method \emph{LIME} is applied to the given data and the results are +stored in the field \code{result}. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{LIME$new( + model, + data, + x, + output_type = NULL, + pred_fun = NULL, + output_idx = NULL, + channels_first = TRUE, + input_dim = NULL, + input_names = NULL, + output_names = NULL, + ... +)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{model}}{\cr +A fitted model for a classification or regression task that +is intended to be interpreted. A \code{\link{Converter}} object can also be +passed. In order for the package to know how to make predictions +with the given model, a prediction function must also be passed with +the argument \code{pred_fun}. However, for models created by +\code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}}, +\code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}, +these have already been pre-implemented and do not need to be +specified.\cr} + +\item{\code{data}}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr +The dataset to which the method is to be applied. These must +have the same format as the input data of the passed model and has to +be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}.\cr +\strong{Note:} For the model-agnostic methods, only models with a single +input and output layer is allowed!\cr} + +\item{\code{x}}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr +The individual instances to be explained by the method. +These must have the same format as the input data of the passed model +and has to be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}. If no value is specified, all instances in the +dataset \code{data} will be explained.\cr +\strong{Note:} For the model-agnostic methods, only models with a single +input and output layer is allowed!\cr} + +\item{\code{output_type}}{(\code{character(1)})\cr +Type of the model output, i.e., either +\code{"classification"} or \code{"regression"}.\cr} + +\item{\code{pred_fun}}{(\code{function})\cr +Prediction function for the model. This argument is only +needed if \code{model} is not a model created by +\code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}}, +\code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}. The first argument of +\code{pred_fun} has to be \code{newdata}, e.g., + +\if{html}{\out{
}}\preformatted{function(newdata, ...) model(newdata) +}\if{html}{\out{
}}} + +\item{\code{output_idx}}{(\code{integer}, \code{list} or \code{NULL})\cr +These indices specify the output nodes for which +the method is to be applied. In order to allow models with multiple +output layers, there are the following possibilities to select +the indices of the output nodes in the individual output layers: +\itemize{ +\item An \code{integer} vector of indices: If the model has only one output +layer, the values correspond to the indices of the output nodes, e.g., +\code{c(1,3,4)} for the first, third and fourth output node. If there are +multiple output layers, the indices of the output nodes from the first +output layer are considered. +\item A \code{list} of \code{integer} vectors of indices: If the method is to be +applied to output nodes from different layers, a list can be passed +that specifies the desired indices of the output nodes for each +output layer. Unwanted output layers have the entry \code{NULL} instead of +a vector of indices, e.g., \code{list(NULL, c(1,3))} for the first and +third output node in the second output layer. +\item \code{NULL} (default): The method is applied to all output nodes in +the first output layer but is limited to the first ten as the +calculations become more computationally expensive for more output +nodes.\cr +}} + +\item{\code{channels_first}}{(\code{logical(1)})\cr +The channel position of the given data (argument +\code{data}). If \code{TRUE}, the channel axis is placed at the second position +between the batch size and the rest of the input axes, e.g., +\code{c(10,3,32,32)} for a batch of ten images with three channels and a +height and width of 32 pixels. Otherwise (\code{FALSE}), the channel axis +is at the last position, i.e., \code{c(10,32,32,3)}. If the data +has no channel axis, use the default value \code{TRUE}.\cr} + +\item{\code{input_dim}}{(\code{integer})\cr +The model input dimension excluding the batch +dimension. It can be specified as vector of integers, but has to be in +the format "channels first".\cr} + +\item{\code{input_names}}{(\code{character}, \code{factor} or \code{list})\cr +The input names of the model excluding the batch dimension. For a model +with a single input layer and input axis (e.g., for tabular data), the +input names can be specified as a character vector or factor, e.g., +for a dense layer with 3 input features use \code{c("X1", "X2", "X3")}. If +the model input consists of multiple axes (e.g., for signal and +image data), use a list of character vectors or factors for each axis +in the format "channels first", e.g., use +\code{list(c("C1", "C2"), c("L1","L2","L3","L4","L5"))} for a 1D +convolutional input layer with signal length 4 and 2 channels.\cr +\emph{Note:} This argument is optional and otherwise the names are +generated automatically. But if this argument is set, all found +input names in the passed model will be disregarded.\cr} + +\item{\code{output_names}}{(\code{character}, \code{factor} )\cr +A character vector with the names for the output dimensions +excluding the batch dimension, e.g., for a model with 3 output nodes use +\code{c("Y1", "Y2", "Y3")}. Instead of a character +vector you can also use a factor to set an order for the plots.\cr +\emph{Note:} This argument is optional and otherwise the names are +generated automatically. But if this argument is set, all found +output names in the passed model will be disregarded.\cr} + +\item{\code{...}}{other arguments forwarded to \code{\link[lime:explain]{lime::explain}}.} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-LIME-clone}{}}} +\subsection{Method \code{clone()}}{ +The objects of this class are cloneable with this method. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{LIME$clone(deep = FALSE)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{deep}}{Whether to make a deep clone.} +} +\if{html}{\out{
}} +} +} +} diff --git a/man/LRP.Rd b/man/LRP.Rd index 5ad1e99..e150812 100644 --- a/man/LRP.Rd +++ b/man/LRP.Rd @@ -156,6 +156,8 @@ Other methods: \code{\link{DeepSHAP}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, +\code{\link{LIME}}, +\code{\link{SHAP}}, \code{\link{SmoothGrad}} } \concept{methods} diff --git a/man/SHAP.Rd b/man/SHAP.Rd new file mode 100644 index 0000000..7393c76 --- /dev/null +++ b/man/SHAP.Rd @@ -0,0 +1,283 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/AgnosticMethods.R +\name{SHAP} +\alias{SHAP} +\title{Shapley values} +\description{ +The R6 class \code{SHAP} calculates the famous Shapley values based on game +theory for an instance to be explained. It is a model-agnostic method +that can be applied to any predictive model. This means, in particular, that +\code{SHAP} can be applied not only to objects of the \code{\link{Converter}} class but +also to any other model. The only requirement is the argument \code{pred_fun}, +which generates predictions with the model for given data. However, this +function is pre-implemented for models created with +\code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}}, +\code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}. Internally, the +suggested package \code{fastshap} is utilized and applied to \code{data.frame}. + +\strong{Note:} Even signal and image data are initially transformed into a +\code{data.frame} using \code{as.data.frame()} and then \code{\link[fastshap:explain]{fastshap::explain}} is +applied. In other words, a custom \code{pred_fun} may need to convert the +\code{data.frame} back into an \code{array} as necessary. +} +\examples{ +\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} +#----------------------- Example 1: Torch ----------------------------------- +library(torch) + +# Create nn_sequential model and data +model <- nn_sequential( + nn_linear(5, 12), + nn_relu(), + nn_linear(12, 2), + nn_softmax(dim = 2) + ) +data <- torch_randn(25, 5) + +# Calculate Shapley values for the first 10 instances and set the +# feature and outcome names +shap <- SHAP$new(model, data, data[1:10, ], + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!")) + +# Get the result as an array for the first two instances +get_result(shap)[1:2,, ] + +# Plot the result for both classes +plot(shap, output_idx = c(1, 2)) + +# Show the boxplot over all 10 instances +boxplot(shap, output_idx = c(1, 2)) + +# We can also forward some arguments to fastshap::explain, e.g. nsim to +# get more accurate values +shap <- SHAP$new(model, data, data[1:10, ], + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!"), + nsim = 10) + +# Plot the boxplots again +boxplot(shap, output_idx = c(1, 2)) + +#----------------------- Example 2: Converter object -------------------------- +# We can do the same with an Converter object (all feature and outcome names +# will be extracted by the SHAP method!) 
+conv <- Converter$new(model, + input_dim = c(5), + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!")) + +# Calculate Shapley values for the first 10 instances +shap <- SHAP$new(conv, data, data[1:10]) + +# Plot the result for both classes +plot(shap, output_idx = c(1, 2)) + +#----------------------- Example 3: Other model ------------------------------- +if (require("neuralnet") & require("ranger")) { + library(neuralnet) + library(ranger) + data(iris) + + # Fit a random forest unsing the ranger package + model <- ranger(Species ~ ., data = iris, probability = TRUE) + + # There is no pre-implemented predict function for ranger models, i.e., + # we have to define it ourselves. + pred_fun <- function(newdata, ...) { + predict(model, newdata, ...)$predictions + } + + # Calculate Shapley values for the instances of index 1 and 111 and add + # the outcome labels + shap <- SHAP$new(model, iris[, -5], iris[c(1, 111), -5], + pred_fun = pred_fun, + output_names = levels(iris$Species), + nsim = 10) + + # Plot the result for the first two classes and all selected instances + plot(shap, data_idx = 1:2, output_idx = 1:2) + + # Get the result as a torch_tensor + get_result(shap, "torch_tensor") +} +\dontshow{\}) # examplesIf} +} +\seealso{ +Other methods: +\code{\link{ConnectionWeights}}, +\code{\link{DeepLift}}, +\code{\link{DeepSHAP}}, +\code{\link{Gradient}}, +\code{\link{IntegratedGradient}}, +\code{\link{LIME}}, +\code{\link{LRP}}, +\code{\link{SmoothGrad}} +} +\concept{methods} +\section{Super classes}{ +\code{\link[innsight:InterpretingMethod]{innsight::InterpretingMethod}} -> \code{\link[innsight:AgnosticWrapper]{innsight::AgnosticWrapper}} -> \code{SHAP} +} +\section{Methods}{ +\subsection{Public methods}{ +\itemize{ +\item \href{#method-SHAP-new}{\code{SHAP$new()}} +\item \href{#method-SHAP-clone}{\code{SHAP$clone()}} +} +} +\if{html}{\out{ +
Inherited methods + +
+}} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-SHAP-new}{}}} +\subsection{Method \code{new()}}{ +Create a new instance of the \code{SHAP} R6 class. When initialized, +the method \emph{SHAP} is applied to the given data and the results are +stored in the field \code{result}. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{SHAP$new( + model, + data, + x, + pred_fun = NULL, + output_idx = NULL, + channels_first = TRUE, + input_dim = NULL, + input_names = NULL, + output_names = NULL, + ... +)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{model}}{\cr +A fitted model for a classification or regression task that +is intended to be interpreted. A \code{\link{Converter}} object can also be +passed. In order for the package to know how to make predictions +with the given model, a prediction function must also be passed with +the argument \code{pred_fun}. However, for models created by +\code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}}, +\code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}, +these have already been pre-implemented and do not need to be +specified.\cr} + +\item{\code{data}}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr +The dataset to which the method is to be applied. These must +have the same format as the input data of the passed model and has to +be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}.\cr +\strong{Note:} For the model-agnostic methods, only models with a single +input and output layer is allowed!\cr} + +\item{\code{x}}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr +The individual instances to be explained by the method. +These must have the same format as the input data of the passed model +and has to be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}. If no value is specified, all instances in the +dataset \code{data} will be explained.\cr +\strong{Note:} For the model-agnostic methods, only models with a single +input and output layer is allowed!\cr} + +\item{\code{pred_fun}}{(\code{function})\cr +Prediction function for the model. This argument is only +needed if \code{model} is not a model created by +\code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}}, +\code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}. The first argument of +\code{pred_fun} has to be \code{newdata}, e.g., + +\if{html}{\out{
}}\preformatted{function(newdata, ...) model(newdata) +}\if{html}{\out{
}}} + +\item{\code{output_idx}}{(\code{integer}, \code{list} or \code{NULL})\cr +These indices specify the output nodes for which +the method is to be applied. In order to allow models with multiple +output layers, there are the following possibilities to select +the indices of the output nodes in the individual output layers: +\itemize{ +\item An \code{integer} vector of indices: If the model has only one output +layer, the values correspond to the indices of the output nodes, e.g., +\code{c(1,3,4)} for the first, third and fourth output node. If there are +multiple output layers, the indices of the output nodes from the first +output layer are considered. +\item A \code{list} of \code{integer} vectors of indices: If the method is to be +applied to output nodes from different layers, a list can be passed +that specifies the desired indices of the output nodes for each +output layer. Unwanted output layers have the entry \code{NULL} instead of +a vector of indices, e.g., \code{list(NULL, c(1,3))} for the first and +third output node in the second output layer. +\item \code{NULL} (default): The method is applied to all output nodes in +the first output layer but is limited to the first ten as the +calculations become more computationally expensive for more output +nodes.\cr +}} + +\item{\code{channels_first}}{(\code{logical(1)})\cr +The channel position of the given data (argument +\code{data}). If \code{TRUE}, the channel axis is placed at the second position +between the batch size and the rest of the input axes, e.g., +\code{c(10,3,32,32)} for a batch of ten images with three channels and a +height and width of 32 pixels. Otherwise (\code{FALSE}), the channel axis +is at the last position, i.e., \code{c(10,32,32,3)}. If the data +has no channel axis, use the default value \code{TRUE}.\cr} + +\item{\code{input_dim}}{(\code{integer})\cr +The model input dimension excluding the batch +dimension. It can be specified as vector of integers, but has to be in +the format "channels first".\cr} + +\item{\code{input_names}}{(\code{character}, \code{factor} or \code{list})\cr +The input names of the model excluding the batch dimension. For a model +with a single input layer and input axis (e.g., for tabular data), the +input names can be specified as a character vector or factor, e.g., +for a dense layer with 3 input features use \code{c("X1", "X2", "X3")}. If +the model input consists of multiple axes (e.g., for signal and +image data), use a list of character vectors or factors for each axis +in the format "channels first", e.g., use +\code{list(c("C1", "C2"), c("L1","L2","L3","L4","L5"))} for a 1D +convolutional input layer with signal length 4 and 2 channels.\cr +\emph{Note:} This argument is optional and otherwise the names are +generated automatically. But if this argument is set, all found +input names in the passed model will be disregarded.\cr} + +\item{\code{output_names}}{(\code{character}, \code{factor} )\cr +A character vector with the names for the output dimensions +excluding the batch dimension, e.g., for a model with 3 output nodes use +\code{c("Y1", "Y2", "Y3")}. Instead of a character +vector you can also use a factor to set an order for the plots.\cr +\emph{Note:} This argument is optional and otherwise the names are +generated automatically. But if this argument is set, all found +output names in the passed model will be disregarded.\cr} + +\item{\code{...}}{other arguments forwarded to \code{\link[fastshap:explain]{fastshap::explain}}.} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-SHAP-clone}{}}} +\subsection{Method \code{clone()}}{ +The objects of this class are cloneable with this method. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{SHAP$clone(deep = FALSE)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{deep}}{Whether to make a deep clone.} +} +\if{html}{\out{
}} +} +} +} diff --git a/man/SmoothGrad.Rd b/man/SmoothGrad.Rd index 337c7b1..21d351c 100644 --- a/man/SmoothGrad.Rd +++ b/man/SmoothGrad.Rd @@ -136,7 +136,9 @@ Other methods: \code{\link{DeepSHAP}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, -\code{\link{LRP}} +\code{\link{LIME}}, +\code{\link{LRP}}, +\code{\link{SHAP}} } \concept{methods} \section{Super classes}{ diff --git a/tests/testthat/test_LIME.R b/tests/testthat/test_LIME.R new file mode 100644 index 0000000..9514758 --- /dev/null +++ b/tests/testthat/test_LIME.R @@ -0,0 +1,613 @@ + +test_that("LIME: General errors", { + library(neuralnet) + + # Fit model + model <- neuralnet(Species ~ Petal.Length + Petal.Width, iris, + linear.output = FALSE) + data <- iris[, c(3,4)] + + expect_error(LIME$new()) # missing converter + expect_error(LIME$new(model)) # missing data + expect_error(LIME$new(NULL, data, data[1:2, ])) # no output_type + expect_error(LIME$new(NULL, data, data[1:2, ], output_type = "regression")) # no pred_fun + expect_error(LIME$new(NULL, data, data[1:2, ], + output_type = "regression", + perd_fun = function(newdata, ...) newdata)) + + LIME$new(model, data, data[1:2, ]) # successful run + expect_error(LIME$new(model, data, data[1:2, ], output_type = "ds")) # wrong output_type + expect_error(LIME$new(model, data, data[1:2, ], pred_fun = identity)) # wrong pred_fun + expect_error(LIME$new(model, data, data[1:2, ], output_idx = c(1,4))) # wrong output_idx + LIME$new(model, data, data[1:2, ], output_idx = c(2)) + expect_error(LIME$new(model, data, data[1:2, ], input_dim = c(1))) # wrong input_dim + expect_error(LIME$new(model, data, data[1:2, ], input_names = c("a", "b", "d"))) # wrong input_names + LIME$new(model, data, data[1:2, ], input_names = factor(c("a", "b"))) + expect_error(LIME$new(model, data, data[1:2, ], output_names = c("a", "d"))) # wrong output_names + LIME$new(model, data, data[1:2, ], output_names = factor(c("a", "d", "c"))) + + # Forwarding arguments to lime::explain + lime <- LIME$new(model, data, data[1:10, ], n_permutations = 100, gower_power = 3) + + # get_result() + res <- get_result(lime) + expect_array(res) + res <- get_result(lime, "data.frame") + expect_data_frame(res) + res <- get_result(lime, "torch_tensor") + expect_class(res, "torch_tensor") + + # Plots + + # Non-existing data points + expect_error(plot(lime, data_idx = c(1,11))) + expect_error(boxplot(lime, data_idx = 1:11)) + # Non-existing class + expect_error(plot(lime, output_idx = c(5))) + expect_error(boxplot(lime, output_idx = c(5))) + + p <- plot(lime) + boxp <- boxplot(lime) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + p <- plot(lime, data_idx = 1:3) + boxp <- boxplot(lime, data_idx = 1:4) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + p <- plot(lime, data_idx = 1:3, output_idx = 1:3) + boxp <- boxplot(lime, data_idx = 1:3, output_idx = 1:3) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + boxp <- boxplot(lime, ref_data_idx = c(4)) + + # plotly + library(plotly) + + p <- plot(lime, as_plotly = TRUE) + boxp <- boxplot(lime, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") + p <- plot(lime, data_idx = 1:3, as_plotly = TRUE) + boxp <- boxplot(lime, data_idx = 1:4, as_plotly = TRUE, individual_max = 2, + individual_data_idx = c(1,2,5,6)) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") + p <- plot(lime, data_idx = 1:3, 
output_idx = 1:3, as_plotly = TRUE) + boxp <- boxplot(lime, data_idx = 1:5, output_idx = 1:3, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") + + +}) + +test_that("LIME: Dense-Net (Neuralnet)", { + library(neuralnet) + library(torch) + + data(iris) + data <- iris[sample.int(150, size = 10), -5] + nn <- neuralnet(Species ~ ., + iris, + linear.output = FALSE, + hidden = c(10, 8), act.fct = "tanh", rep = 1, threshold = 0.5 + ) + + # Normal model + lime <- LIME$new(nn, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 3)) + p <- plot(lime, output_idx = c(2,3)) + boxp <- boxplot(lime, output_idx = c(2,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(nn) + lime <- LIME$new(conv, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 3)) + p <- plot(lime, output_idx = c(2,3)) + boxp <- boxplot(lime, output_idx = c(2,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") +}) + + +test_that("LIME: Dense-Net (keras)", { + library(keras) + library(torch) + + # Classification ------------------------------------------------------------- + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- keras_model_sequential() + model %>% + layer_dense(units = 16, activation = "relu", input_shape = c(4)) %>% + layer_dense(units = 8, activation = "tanh") %>% + layer_dense(units = 3, activation = "softmax") + + # Normal model + lime <- LIME$new(model, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 3)) + p <- plot(lime, output_idx = c(1,3)) + boxp <- boxplot(lime, output_idx = c(1,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + lime <- LIME$new(conv, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 3)) + p <- plot(lime, output_idx = c(1,3)) + boxp <- boxplot(lime, output_idx = c(1,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- keras_model_sequential() + model %>% + layer_dense(units = 16, activation = "relu", input_shape = c(4)) %>% + layer_dense(units = 8, activation = "tanh") %>% + layer_dense(units = 2, activation = "linear") + + # Normal model + lime <- LIME$new(model, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + lime <- LIME$new(conv, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------- + res_array <- lime$get_result() + expect_true(is.array(res_array)) + res_dataframe <- lime$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- lime$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) + expect_error(lime$get_result(type = "adsf")) +}) + + +test_that("LIME: Conv1D-Net (keras)", { + library(keras) + library(torch) + + # Classification 
------------------------------------------------------------- + data <- array(rnorm(4 * 64 * 3), dim = c(4, 64, 3)) + model <- keras_model_sequential() + model %>% + layer_conv_1d( + input_shape = c(64, 3), kernel_size = 16, filters = 8, + activation = "softplus" + ) %>% + layer_conv_1d(kernel_size = 16, filters = 4, activation = "tanh") %>% + layer_conv_1d(kernel_size = 16, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 64, activation = "relu") %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 1, activation = "sigmoid") + + # Normal model + lime <- LIME$new(model, data, data[1:2,, ], channels_first = FALSE) + expect_equal(dim(lime$get_result()), c(2, 64, 3, 1)) + p <- plot(lime) + boxp <- boxplot(lime) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + lime <- LIME$new(conv, data, data[1:2,, ], channels_first = FALSE) + expect_equal(dim(lime$get_result()), c(2, 64, 3, 1)) + p <- plot(lime) + boxp <- boxplot(lime) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- array(rnorm(4 * 64 * 3), dim = c(4, 64, 3)) + model <- keras_model_sequential() + model %>% + layer_conv_1d( + input_shape = c(64, 3), kernel_size = 16, filters = 8, + activation = "softplus" + ) %>% + layer_conv_1d(kernel_size = 16, filters = 4, activation = "tanh") %>% + layer_conv_1d(kernel_size = 16, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 64, activation = "relu") %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 2, activation = "linear") + + # Normal model + lime <- LIME$new(model, data, data[1:2,, ], channels_first = FALSE) + expect_equal(dim(lime$get_result()), c(2, 64, 3, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + lime <- LIME$new(conv, data, data[1:2,, ], channels_first = FALSE) + expect_equal(dim(lime$get_result()), c(2, 64, 3, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------ + res_array <- lime$get_result() + expect_true(is.array(res_array)) + res_dataframe <- lime$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- lime$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + +test_that("LIME: Conv2D-Net (keras)", { + library(keras) + library(torch) + + # Classification ------------------------------------------------------------- + data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 10, 10, 3)) + model <- keras_model_sequential() + model %>% + layer_conv_2d( + input_shape = c(10, 10, 3), kernel_size = 4, filters = 8, + activation = "softplus" + ) %>% + layer_conv_2d(kernel_size = 4, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 1, activation = "sigmoid") + + # Normal model + lime <- LIME$new(model, data, data[1:2,,, ], channels_first = FALSE) + expect_equal(dim(lime$get_result()), c(2, 10, 10, 3, 1)) + p <- plot(lime) + boxp <- boxplot(lime) + 
expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + lime <- LIME$new(conv, data, data[1:2,,, ], channels_first = FALSE) + expect_equal(dim(lime$get_result()), c(2, 10, 10, 3, 1)) + p <- plot(lime) + boxp <- boxplot(lime) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 10, 10, 3)) + model <- keras_model_sequential() + model %>% + layer_conv_2d( + input_shape = c(10, 10, 3), kernel_size = 4, filters = 8, + activation = "softplus" + ) %>% + layer_conv_2d(kernel_size = 4, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 2, activation = "linear") + + # Normal model + lime <- LIME$new(model, data, data[1:2,,, ], channels_first = FALSE) + expect_equal(dim(lime$get_result()), c(2, 10, 10, 3, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + lime <- LIME$new(conv, data, data[1:2,,, ], channels_first = FALSE) + expect_equal(dim(lime$get_result()), c(2, 10, 10, 3, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------ + res_array <- lime$get_result() + expect_true(is.array(res_array)) + res_dataframe <- lime$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- lime$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + + +test_that("LIME: Dense-Net (torch)", { + library(torch) + + # Classification ------------------------------------------------------------- + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- nn_sequential( + nn_linear(4, 16), + nn_relu(), + nn_linear(16, 3), + nn_softmax(dim = -1) + ) + + # Normal model + lime <- LIME$new(model, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 3)) + p <- plot(lime, output_idx = c(1,3)) + boxp <- boxplot(lime, output_idx = c(1,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(4)) + lime <- LIME$new(conv, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 3)) + p <- plot(lime, output_idx = c(1,3)) + boxp <- boxplot(lime, output_idx = c(1,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- nn_sequential( + nn_linear(4, 16), + nn_relu(), + nn_linear(16, 2) + ) + + # Normal model + lime <- LIME$new(model, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(4)) + lime <- LIME$new(conv, data, data[1:2, ]) + expect_equal(dim(lime$get_result()), c(2, 4, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, 
"innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------- + res_array <- lime$get_result() + expect_true(is.array(res_array)) + res_dataframe <- lime$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- lime$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + + +test_that("LIME: Conv1D-Net (torch)", { + library(torch) + + # Classification ------------------------------------------------------------- + data <- array(rnorm(4 * 64 * 3), dim = c(4, 3, 64)) + model <- nn_sequential( + nn_conv1d(3, 8, 16), + nn_softplus(), + nn_conv1d(8, 4, 16), + nn_tanh(), + nn_conv1d(4, 2, 16), + nn_relu(), + nn_flatten(), + nn_linear(38, 16), + nn_relu(), + nn_linear(16, 1), + nn_sigmoid() + ) + + # Normal model + lime <- LIME$new(model, data, data[1:2,, ], channels_first = TRUE) + expect_equal(dim(lime$get_result()), c(2, 3, 64, 1)) + p <- plot(lime) + boxp <- boxplot(lime) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(3, 64)) + lime <- LIME$new(conv, data, data[1:2,, ], channels_first = TRUE) + expect_equal(dim(lime$get_result()), c(2, 3, 64, 1)) + p <- plot(lime) + boxp <- boxplot(lime) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- array(rnorm(4 * 64 * 3), dim = c(4, 3, 64)) + model <- nn_sequential( + nn_conv1d(3, 8, 16), + nn_softplus(), + nn_conv1d(8, 4, 16), + nn_tanh(), + nn_conv1d(4, 2, 16), + nn_relu(), + nn_flatten(), + nn_linear(38, 16), + nn_relu(), + nn_linear(16, 2) + ) + + # Normal model + lime <- LIME$new(model, data, data[1:2,, ], channels_first = TRUE) + expect_equal(dim(lime$get_result()), c(2, 3, 64, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(3, 64)) + lime <- LIME$new(conv, data, data[1:2,, ], channels_first = TRUE) + expect_equal(dim(lime$get_result()), c(2, 3, 64, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------ + res_array <- lime$get_result() + expect_true(is.array(res_array)) + res_dataframe <- lime$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- lime$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + +test_that("LIME: Conv2D-Net (torch)", { + library(keras) + library(torch) + + # Classification ------------------------------------------------------------- + data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 3, 10, 10)) + model <- nn_sequential( + nn_conv2d(3, 8, c(4, 4)), + nn_softplus(), + nn_conv2d(8, 2, c(4, 4)), + nn_relu(), + nn_flatten(), + nn_linear(32, 16), + nn_relu(), + nn_linear(16, 1), + nn_sigmoid() + ) + + # Normal model + lime <- LIME$new(model, data, data[1:2,,, ], channels_first = TRUE) + expect_equal(dim(lime$get_result()), c(2, 3, 10, 10, 1)) + p <- plot(lime) + boxp <- boxplot(lime) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- 
Converter$new(model, input_dim = c(3, 10, 10)) + lime <- LIME$new(conv, data, data[1:2,,, ], channels_first = TRUE) + expect_equal(dim(lime$get_result()), c(2, 3, 10, 10, 1)) + p <- plot(lime) + boxp <- boxplot(lime) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 3, 10, 10)) + model <- nn_sequential( + nn_conv2d(3, 8, c(4, 4)), + nn_softplus(), + nn_conv2d(8, 2, c(4, 4)), + nn_relu(), + nn_flatten(), + nn_linear(32, 16), + nn_relu(), + nn_linear(16, 2) + ) + + # Normal model + lime <- LIME$new(model, data, data[1:2,,, ], channels_first = TRUE) + expect_equal(dim(lime$get_result()), c(2, 3, 10, 10, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(3, 10, 10)) + lime <- LIME$new(conv, data, data[1:2,,, ], channels_first = TRUE) + expect_equal(dim(lime$get_result()), c(2, 3, 10, 10, 2)) + p <- plot(lime, output_idx = c(2)) + boxp <- boxplot(lime, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------ + res_array <- lime$get_result() + expect_true(is.array(res_array)) + res_dataframe <- lime$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- lime$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + + +test_that("LIME: Keras multiple input or output layers", { + library(keras) + + # Multiple input layers + main_input <- layer_input(shape = c(10,10,2), name = 'main_input') + lstm_out <- main_input %>% + layer_conv_2d(2, c(2,2), activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 4) + auxiliary_input <- layer_input(shape = c(5), name = 'aux_input') + main_output <- layer_concatenate(c(lstm_out, auxiliary_input)) %>% + layer_dense(units = 5, activation = 'tanh') %>% + layer_dense(units = 3, activation = 'softmax', name = 'main_output') + model <- keras_model( + inputs = c(auxiliary_input, main_input), + outputs = c(main_output) + ) + data <- lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(10, x))) + + expect_error(LIME$new(model, data)) + + # Multiple output layers + main_input <- layer_input(shape = c(10,10,2), name = 'main_input') + lstm_out <- main_input %>% + layer_conv_2d(2, c(2,2), activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 4) + auxiliary_output <- lstm_out %>% + layer_dense(units = 2, activation = 'softmax', name = 'aux_output') + main_output <- lstm_out %>% + layer_dense(units = 5, activation = 'tanh') %>% + layer_dense(units = 3, activation = 'softmax', name = 'main_output') + model <- keras_model( + inputs = c(main_input), + outputs = c(auxiliary_output, main_output) + ) + data <- lapply(list(c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(10, x))) + + + expect_error(LIME$new(model, data)) +}) + +test_that("Custom model", { + + # Ranger model and iris dataset + library(ranger) + + model <- ranger(Species ~ ., data = iris, probability = TRUE) + + pred_fun <- function(newdata, ...) 
{ + predict(model, newdata, ...)$predictions + } + + lime <- LIME$new(model, iris[, -5], iris[c(1,70, 111), -5], + output_type = "classification", + pred_fun = pred_fun, + output_names = levels(iris$Species)) + + res <- get_result(lime) + expect_equal(dim(res), c(3, 4, 3)) + + p <- plot(lime, output_idx = c(1, 3)) + boxp <- boxplot(lime, output_idx = c(1, 3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") +}) diff --git a/tests/testthat/test_SHAP.R b/tests/testthat/test_SHAP.R new file mode 100644 index 0000000..b65b839 --- /dev/null +++ b/tests/testthat/test_SHAP.R @@ -0,0 +1,612 @@ + +test_that("SHAP: General errors", { + library(neuralnet) + + # Fit model + model <- neuralnet(Species ~ Petal.Length + Petal.Width, iris, + linear.output = FALSE) + data <- iris[, c(3,4)] + + expect_error(SHAP$new()) # missing converter + expect_error(SHAP$new(model)) # missing data + expect_error(SHAP$new(NULL, data, data[1:2, ])) # no output_type + expect_error(SHAP$new(NULL, data, data[1:2, ], output_type = "regression")) # no pred_fun + expect_error(SHAP$new(NULL, data, data[1:2, ], + output_type = "regression", + perd_fun = function(newdata, ...) newdata)) + + SHAP$new(model, data, data[1:2, ]) # successful run + expect_error(SHAP$new(model, data, data[1:2, ], output_type = "ds")) # wrong output_type + expect_error(SHAP$new(model, data, data[1:2, ], pred_fun = identity)) # wrong pred_fun + expect_error(SHAP$new(model, data, data[1:2, ], output_idx = c(1,4))) # wrong output_idx + SHAP$new(model, data, data[1:2, ], output_idx = c(2)) + expect_error(SHAP$new(model, data, data[1:2, ], input_dim = c(1))) # wrong input_dim + expect_error(SHAP$new(model, data, data[1:2, ], input_names = c("a", "b", "d"))) # wrong input_names + SHAP$new(model, data, data[1:2, ], input_names = factor(c("a", "b"))) + expect_error(SHAP$new(model, data, data[1:2, ], output_names = c("a", "d"))) # wrong output_names + SHAP$new(model, data, data[1:2, ], output_names = factor(c("a", "d", "c"))) + + # Forwarding arguments to fastshap::explain + shap <- SHAP$new(model, data, data[1:10, ], nsim = 4) + + # get_result() + res <- get_result(shap) + expect_array(res) + res <- get_result(shap, "data.frame") + expect_data_frame(res) + res <- get_result(shap, "torch_tensor") + expect_class(res, "torch_tensor") + + # Plots + + # Non-existing data points + expect_error(plot(shap, data_idx = c(1,11))) + expect_error(boxplot(shap, data_idx = 1:11)) + # Non-existing class + expect_error(plot(shap, output_idx = c(5))) + expect_error(boxplot(shap, output_idx = c(5))) + + p <- plot(shap) + boxp <- boxplot(shap) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + p <- plot(shap, data_idx = 1:3) + boxp <- boxplot(shap, data_idx = 1:4) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + p <- plot(shap, data_idx = 1:3, output_idx = 1:3) + boxp <- boxplot(shap, data_idx = 1:3, output_idx = 1:3) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + boxp <- boxplot(shap, ref_data_idx = c(4)) + + # plotly + library(plotly) + + p <- plot(shap, as_plotly = TRUE) + boxp <- boxplot(shap, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") + p <- plot(shap, data_idx = 1:3, as_plotly = TRUE) + boxp <- boxplot(shap, data_idx = 1:4, as_plotly = TRUE, individual_max = 2, + individual_data_idx = c(1,2,5,6)) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, 
"innsight_plotly") + p <- plot(shap, data_idx = 1:3, output_idx = 1:3, as_plotly = TRUE) + boxp <- boxplot(shap, data_idx = 1:5, output_idx = 1:3, as_plotly = TRUE) + expect_s4_class(p, "innsight_plotly") + expect_s4_class(boxp, "innsight_plotly") + + +}) + +test_that("SHAP: Dense-Net (Neuralnet)", { + library(neuralnet) + library(torch) + + data(iris) + data <- iris[sample.int(150, size = 10), -5] + nn <- neuralnet(Species ~ ., + iris, + linear.output = FALSE, + hidden = c(10, 8), act.fct = "tanh", rep = 1, threshold = 0.5 + ) + + # Normal model + shap <- SHAP$new(nn, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 3)) + p <- plot(shap, output_idx = c(2,3)) + boxp <- boxplot(shap, output_idx = c(2,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(nn) + shap <- SHAP$new(conv, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 3)) + p <- plot(shap, output_idx = c(2,3)) + boxp <- boxplot(shap, output_idx = c(2,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") +}) + + +test_that("SHAP: Dense-Net (keras)", { + library(keras) + library(torch) + + # Classification ------------------------------------------------------------- + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- keras_model_sequential() + model %>% + layer_dense(units = 16, activation = "relu", input_shape = c(4)) %>% + layer_dense(units = 8, activation = "tanh") %>% + layer_dense(units = 3, activation = "softmax") + + # Normal model + shap <- SHAP$new(model, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 3)) + p <- plot(shap, output_idx = c(1,3)) + boxp <- boxplot(shap, output_idx = c(1,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + shap <- SHAP$new(conv, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 3)) + p <- plot(shap, output_idx = c(1,3)) + boxp <- boxplot(shap, output_idx = c(1,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- keras_model_sequential() + model %>% + layer_dense(units = 16, activation = "relu", input_shape = c(4)) %>% + layer_dense(units = 8, activation = "tanh") %>% + layer_dense(units = 2, activation = "linear") + + # Normal model + shap <- SHAP$new(model, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + shap <- SHAP$new(conv, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------- + res_array <- shap$get_result() + expect_true(is.array(res_array)) + res_dataframe <- shap$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- shap$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) + expect_error(shap$get_result(type = "adsf")) +}) + + +test_that("SHAP: Conv1D-Net (keras)", { + library(keras) + 
library(torch) + + # Classification ------------------------------------------------------------- + data <- array(rnorm(4 * 64 * 3), dim = c(4, 64, 3)) + model <- keras_model_sequential() + model %>% + layer_conv_1d( + input_shape = c(64, 3), kernel_size = 16, filters = 8, + activation = "softplus" + ) %>% + layer_conv_1d(kernel_size = 16, filters = 4, activation = "tanh") %>% + layer_conv_1d(kernel_size = 16, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 64, activation = "relu") %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 1, activation = "sigmoid") + + # Normal model + shap <- SHAP$new(model, data, data[1:2,, ], channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 64, 3, 1)) + p <- plot(shap) + boxp <- boxplot(shap) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + shap <- SHAP$new(conv, data, data[1:2,, ], channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 64, 3, 1)) + p <- plot(shap) + boxp <- boxplot(shap) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- array(rnorm(4 * 64 * 3), dim = c(4, 64, 3)) + model <- keras_model_sequential() + model %>% + layer_conv_1d( + input_shape = c(64, 3), kernel_size = 16, filters = 8, + activation = "softplus" + ) %>% + layer_conv_1d(kernel_size = 16, filters = 4, activation = "tanh") %>% + layer_conv_1d(kernel_size = 16, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 64, activation = "relu") %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 2, activation = "linear") + + # Normal model + shap <- SHAP$new(model, data, data[1:2,, ], channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 64, 3, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + shap <- SHAP$new(conv, data, data[1:2,, ], channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 64, 3, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------ + res_array <- shap$get_result() + expect_true(is.array(res_array)) + res_dataframe <- shap$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- shap$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + +test_that("SHAP: Conv2D-Net (keras)", { + library(keras) + library(torch) + + # Classification ------------------------------------------------------------- + data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 10, 10, 3)) + model <- keras_model_sequential() + model %>% + layer_conv_2d( + input_shape = c(10, 10, 3), kernel_size = 4, filters = 8, + activation = "softplus" + ) %>% + layer_conv_2d(kernel_size = 4, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 1, activation = "sigmoid") + + # Normal model + shap <- SHAP$new(model, data, data[1:2,,, ], channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 10, 10, 3, 1)) + p <- plot(shap) + 
boxp <- boxplot(shap) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + shap <- SHAP$new(conv, data, data[1:2,,, ], channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 10, 10, 3, 1)) + p <- plot(shap) + boxp <- boxplot(shap) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 10, 10, 3)) + model <- keras_model_sequential() + model %>% + layer_conv_2d( + input_shape = c(10, 10, 3), kernel_size = 4, filters = 8, + activation = "softplus" + ) %>% + layer_conv_2d(kernel_size = 4, filters = 2, activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 16, activation = "relu") %>% + layer_dense(units = 2, activation = "linear") + + # Normal model + shap <- SHAP$new(model, data, data[1:2,,, ], channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 10, 10, 3, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model) + shap <- SHAP$new(conv, data, data[1:2,,, ], channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 10, 10, 3, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------ + res_array <- shap$get_result() + expect_true(is.array(res_array)) + res_dataframe <- shap$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- shap$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + + +test_that("SHAP: Dense-Net (torch)", { + library(torch) + + # Classification ------------------------------------------------------------- + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- nn_sequential( + nn_linear(4, 16), + nn_relu(), + nn_linear(16, 3), + nn_softmax(dim = -1) + ) + + # Normal model + shap <- SHAP$new(model, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 3)) + p <- plot(shap, output_idx = c(1,3)) + boxp <- boxplot(shap, output_idx = c(1,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(4)) + shap <- SHAP$new(conv, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 3)) + p <- plot(shap, output_idx = c(1,3)) + boxp <- boxplot(shap, output_idx = c(1,3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- matrix(rnorm(4 * 10), nrow = 10) + model <- nn_sequential( + nn_linear(4, 16), + nn_relu(), + nn_linear(16, 2) + ) + + # Normal model + shap <- SHAP$new(model, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(4)) + shap <- SHAP$new(conv, data, data[1:2, ]) + expect_equal(dim(shap$get_result()), c(2, 4, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = 
c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------- + res_array <- shap$get_result() + expect_true(is.array(res_array)) + res_dataframe <- shap$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- shap$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + + +test_that("SHAP: Conv1D-Net (torch)", { + library(torch) + + # Classification ------------------------------------------------------------- + data <- array(rnorm(4 * 64 * 3), dim = c(4, 3, 64)) + model <- nn_sequential( + nn_conv1d(3, 8, 16), + nn_softplus(), + nn_conv1d(8, 4, 16), + nn_tanh(), + nn_conv1d(4, 2, 16), + nn_relu(), + nn_flatten(), + nn_linear(38, 16), + nn_relu(), + nn_linear(16, 1), + nn_sigmoid() + ) + + # Normal model + shap <- SHAP$new(model, data, data[1:2,, ], channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 64, 1)) + p <- plot(shap) + boxp <- boxplot(shap) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(3, 64)) + shap <- SHAP$new(conv, data, data[1:2,, ], channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 64, 1)) + p <- plot(shap) + boxp <- boxplot(shap) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- array(rnorm(4 * 64 * 3), dim = c(4, 3, 64)) + model <- nn_sequential( + nn_conv1d(3, 8, 16), + nn_softplus(), + nn_conv1d(8, 4, 16), + nn_tanh(), + nn_conv1d(4, 2, 16), + nn_relu(), + nn_flatten(), + nn_linear(38, 16), + nn_relu(), + nn_linear(16, 2) + ) + + # Normal model + shap <- SHAP$new(model, data, data[1:2,, ], channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 64, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(3, 64)) + shap <- SHAP$new(conv, data, data[1:2,, ], channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 64, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------ + res_array <- shap$get_result() + expect_true(is.array(res_array)) + res_dataframe <- shap$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- shap$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + +test_that("SHAP: Conv2D-Net (torch)", { + library(keras) + library(torch) + + # Classification ------------------------------------------------------------- + data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 3, 10, 10)) + model <- nn_sequential( + nn_conv2d(3, 8, c(4, 4)), + nn_softplus(), + nn_conv2d(8, 2, c(4, 4)), + nn_relu(), + nn_flatten(), + nn_linear(32, 16), + nn_relu(), + nn_linear(16, 1), + nn_sigmoid() + ) + + # Normal model + shap <- SHAP$new(model, data, data[1:2,,, ], channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 10, 10, 1)) + p <- plot(shap) + boxp <- boxplot(shap) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter 
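+  # (Unlike the keras models above, a torch nn_sequential does not store its
+  # expected input shape, so input_dim must be passed to the converter
+  # explicitly.)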
+ conv <- Converter$new(model, input_dim = c(3, 10, 10)) + shap <- SHAP$new(conv, data, data[1:2,,, ], channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 10, 10, 1)) + p <- plot(shap) + boxp <- boxplot(shap) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Regression ----------------------------------------------------------------- + data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 3, 10, 10)) + model <- nn_sequential( + nn_conv2d(3, 8, c(4, 4)), + nn_softplus(), + nn_conv2d(8, 2, c(4, 4)), + nn_relu(), + nn_flatten(), + nn_linear(32, 16), + nn_relu(), + nn_linear(16, 2) + ) + + # Normal model + shap <- SHAP$new(model, data, data[1:2,,, ], channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 10, 10, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Converter + conv <- Converter$new(model, input_dim = c(3, 10, 10)) + shap <- SHAP$new(conv, data, data[1:2,,, ], channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 10, 10, 2)) + p <- plot(shap, output_idx = c(2)) + boxp <- boxplot(shap, output_idx = c(2)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") + + # Test get_result ------------------------------------------------------------ + res_array <- shap$get_result() + expect_true(is.array(res_array)) + res_dataframe <- shap$get_result(type = "data.frame") + expect_true(is.data.frame(res_dataframe)) + res_torch <- shap$get_result(type = "torch.tensor") + expect_true(inherits(res_torch, "torch_tensor")) +}) + + +test_that("SHAP: Keras multiple input or output layers", { + library(keras) + + # Multiple input layers + main_input <- layer_input(shape = c(10,10,2), name = 'main_input') + lstm_out <- main_input %>% + layer_conv_2d(2, c(2,2), activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 4) + auxiliary_input <- layer_input(shape = c(5), name = 'aux_input') + main_output <- layer_concatenate(c(lstm_out, auxiliary_input)) %>% + layer_dense(units = 5, activation = 'tanh') %>% + layer_dense(units = 3, activation = 'softmax', name = 'main_output') + model <- keras_model( + inputs = c(auxiliary_input, main_input), + outputs = c(main_output) + ) + data <- lapply(list(c(5), c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(10, x))) + + expect_error(SHAP$new(model, data)) + + # Multiple output layers + main_input <- layer_input(shape = c(10,10,2), name = 'main_input') + lstm_out <- main_input %>% + layer_conv_2d(2, c(2,2), activation = "relu") %>% + layer_flatten() %>% + layer_dense(units = 4) + auxiliary_output <- lstm_out %>% + layer_dense(units = 2, activation = 'softmax', name = 'aux_output') + main_output <- lstm_out %>% + layer_dense(units = 5, activation = 'tanh') %>% + layer_dense(units = 3, activation = 'softmax', name = 'main_output') + model <- keras_model( + inputs = c(main_input), + outputs = c(auxiliary_output, main_output) + ) + data <- lapply(list(c(10,10,2)), + function(x) array(rnorm(10 * prod(x)), dim = c(10, x))) + + + expect_error(SHAP$new(model, data)) +}) + +test_that("Custom model", { + + # Ranger model and iris dataset + library(ranger) + + model <- ranger(Species ~ ., data = iris, probability = TRUE) + + pred_fun <- function(newdata, ...) 
{ + predict(model, newdata, ...)$predictions + } + + shap <- SHAP$new(model, iris[, -5], iris[c(1,70, 111), -5], + pred_fun = pred_fun, + output_names = levels(iris$Species)) + + res <- get_result(shap) + expect_equal(dim(res), c(3, 4, 3)) + + p <- plot(shap, output_idx = c(1, 3)) + boxp <- boxplot(shap, output_idx = c(1, 3)) + expect_s4_class(p, "innsight_ggplot2") + expect_s4_class(boxp, "innsight_ggplot2") +}) From 87ff0dfc61f2dae0638c7702ccbda01fd0961ebe Mon Sep 17 00:00:00 2001 From: nkoenen Date: Mon, 27 Nov 2023 08:33:11 +0100 Subject: [PATCH 04/20] Speed up DeepSHAP and fix wrong number of samples in ExpectedGradients --- NAMESPACE | 1 + R/DeepLift.R | 42 ++++++-- R/GradienBased.R | 207 ++++++++++++++++++++++++++++++++++++++++ R/InterpretingMethod.R | 19 ++-- R/Layer_conv1d.R | 6 +- R/Layer_conv2d.R | 6 +- R/Layer_dense.R | 6 +- R/Layer_normalization.R | 4 +- R/Layer_pooling.R | 16 ++-- 9 files changed, 270 insertions(+), 37 deletions(-) diff --git a/NAMESPACE b/NAMESPACE index 67c9fed..254cd91 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -14,6 +14,7 @@ export(ConnectionWeights) export(Converter) export(DeepLift) export(DeepSHAP) +export(ExpectedGradient) export(Gradient) export(IntegratedGradient) export(LIME) diff --git a/R/DeepLift.R b/R/DeepLift.R index 44feacf..5506390 100644 --- a/R/DeepLift.R +++ b/R/DeepLift.R @@ -93,8 +93,8 @@ DeepLift <- R6Class( if (any(num_instances != 1)) { stopf("For the method {.code DeepLift}, you have to pass ", "only a single instance for the argument {.arg x_ref}. ", - "You passed (for at least one input layer) {max(num_instances)}", - " data instances!") + "You passed (for at least one input layer) '", + max(num_instances), "' data instances!") } self$converter$model$forward(self$data, @@ -232,7 +232,7 @@ DeepSHAP <- R6Class( super$initialize(converter, data, channels_first, output_idx, ignore_last_act, winner_takes_all, verbose, dtype) - cli_check(checkChoice(rule_name, c("rescale")), "rule_name") + cli_check(checkChoice(rule_name, c("rescale", "reveal_cancel")), "rule_name") self$rule_name <- rule_name cli_check(checkInt(limit_ref), "limit_ref") @@ -249,22 +249,50 @@ DeepSHAP <- R6Class( ids <- sample.int(num_samples, min(num_samples, limit_ref)) self$data_ref <- lapply(self$data_ref, function(x) x[ids, drop = FALSE]) + # Repeat values, s.t. `data` and `data_ref` have the same number of + # instances + num_samples <- dim(self$data[[1]])[1] + num_samples_ref <- dim(self$data_ref[[1]])[1] + data <- lapply(self$data, torch_repeat_interleave, + repeats = as.integer(num_samples_ref), + dim = 1) # now of shape (batch_size * num_samples_ref, input_dim) + repeat_input <- function(x) { + torch_cat(lapply(seq_len(num_samples), function(i) x)) + } + data_ref <- lapply(self$data_ref, repeat_input) # now of shape (batch_size * num_samples_ref, input_dim) + # Forward for normal input - self$converter$model$forward(self$data, + self$converter$model$forward(data, channels_first = self$channels_first, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE ) - # Forward for reference dataset - self$converter$model$update_ref(self$data_ref, + + self$converter$model$update_ref(data_ref, channels_first = self$channels_first, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE ) - self$result <- private$run("DeepSHAP") + result <- private$run("DeepSHAP") + + # For the DeepSHAP method, we only get the multiplier. 
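+      # (As in DeepLift, a multiplier generalizes a partial derivative:
+      # summed over all features, multiplier * (x - x_ref) reproduces the
+      # difference between the prediction and the reference prediction.)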
+      # Hence, we have to multiply this by the differences of inputs and
+      # average over the reference samples
+      fun <- function(result, out_idx, in_idx, x, x_ref, n) {
+        res <- result[[out_idx]][[in_idx]]
+        if (is.null(res)) {
+          res <- NULL
+        } else {
+          res <- res * (x[[in_idx]] - x_ref[[in_idx]])$unsqueeze(-1)
+          res <- torch_stack(res$chunk(n), dim = 1)$mean(2)
+        }
+      }
+      result <- apply_results(result, fun, x = data, x_ref = data_ref,
+                              n = num_samples)
+
+      self$result <- result
     }
   ),
diff --git a/R/GradienBased.R b/R/GradienBased.R
index a848ce2..9b4764a 100644
--- a/R/GradienBased.R
+++ b/R/GradienBased.R
@@ -587,6 +587,213 @@ SmoothGrad <- R6Class(
   )
 )
 
+###############################################################################
+#                             ExpectedGradients
+###############################################################################
+
+#' @title Expected Gradients
+#'
+#' @description
+#' The `ExpectedGradient` class implements the method Expected Gradients
+#' (Erion et al., 2021), a stochastic extension of Integrated Gradients
+#' ([`IntegratedGradient`]) that replaces the single reference value \eqn{x'}
+#' (also known as baseline value) with a whole reference dataset, analogous to
+#' the [`DeepSHAP`] method. Expected Gradients helps to uncover the relative
+#' importance of input features in the predictions \eqn{y = f(x)} made by a
+#' model compared to the expected prediction over the reference distribution
+#' \eqn{D}. This is achieved through the following formula:
+#' \deqn{
+#' E_{x' \sim D, \alpha \sim U(0,1)} \left[ (x - x') \times \frac{\partial f(x' + \alpha (x - x'))}{\partial x} \right]
+#' }
+#' In simpler terms, it calculates how much each feature contributes to a
+#' model's output by averaging Integrated Gradients over baselines \eqn{x'}
+#' drawn from the reference dataset and interpolation factors \eqn{\alpha}
+#' drawn uniformly from \eqn{[0, 1]}.
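+#'
+#' A rough sketch of the resulting Monte Carlo estimator (with a hypothetical
+#' gradient function `grad_f` and reference matrix `X_ref`; not the package
+#' API):
+#' \preformatted{
+#' # one summand of the estimator for a single input vector x
+#' x_ref <- X_ref[sample(nrow(X_ref), 1), ]  # draw a baseline from D
+#' alpha <- runif(1)                         # draw an interpolation factor
+#' (x - x_ref) * grad_f(x_ref + alpha * (x - x_ref))
+#' # averaging this summand over many such draws yields the attribution
+#' }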
+#'
+#' Similar to the other gradient-based methods, by default the sampled
+#' gradients are multiplied by the difference of the input features and the
+#' sampled baseline values to get an approximate decomposition of
+#' \eqn{y - E(y')}. However, with the parameter `times_input` only the
+#' gradient describing the output sensitivity can be returned.
+#'
+#' @template param-converter
+#' @template param-data
+#' @template param-channels_first
+#' @template param-dtype
+#' @template param-output_idx
+#' @template param-ignore_last_act
+#' @template param-verbose
+#' @template examples-IntegratedGradient
+#'
+#' @references
+#' M. Sundararajan et al. (2017) *Axiomatic attribution for deep networks.* ICML
+#' 2017, PMLR 70, pp. 3319-3328.
+#'
+#' G. Erion et al. (2021) *Improving performance of deep learning models with
+#' axiomatic attribution priors and expected gradients.* Nature Machine
+#' Intelligence 3, pp. 620-631.
+#'
+#' @family methods
+#' @export
+ExpectedGradient <- R6Class(
+  classname = "ExpectedGradient",
+  inherit = GradientBased,
+  public = list(
+
+    #' @field n (`integer(1)`)\cr
+    #' Number of samples drawn for approximating the expectation, i.e., pairs
+    #' of a baseline from the reference dataset and an interpolation factor
+    #' \eqn{\alpha} (default: \eqn{50}).\cr
+    #' @field data_ref (`list`)\cr
+    #' The reference dataset for the ExpectedGradient method. This value is
+    #' stored as a list of `torch_tensor`s of shape *(num_refs, dim_in)* for
+    #' each input layer.\cr
+    #'
+    n = NULL,
+    data_ref = NULL,
+
+    #' @description
+    #' Create a new instance of the `ExpectedGradient` R6 class. When
+    #' initialized, the method *Expected Gradients* is applied to the given
+    #' data and reference dataset and the results are stored in the field
+    #' `result`.
+    #'
+    #' @param times_input (`logical(1)`)\cr
+    #' Multiplies the gradients with the difference of the input features and
+    #' the sampled baseline values. By default, the original definition of
+    #' Expected Gradients is applied.\cr
+    #' @param n (`integer(1)`)\cr
+    #' Number of samples drawn for approximating the expectation, i.e., pairs
+    #' of a baseline from the reference dataset and an interpolation factor
+    #' \eqn{\alpha} (default: \eqn{50}).\cr
+    #' @param data_ref ([`array`], [`data.frame`], [`torch_tensor`] or `list`)\cr
+    #' The reference dataset for the ExpectedGradient method. This value
+    #' must have the same format as the input data of the passed model to the
+    #' converter object. This means either
+    #' - an `array`, `data.frame`, `torch_tensor` or array-like format of
+    #' size *(num_refs, dim_in)*, if e.g., the model has only one input layer, or
+    #' - a `list` with the corresponding input data (according to the upper point)
+    #' for each of the input layers.
+    #' - It is also possible to use the default value `NULL` to take only
+    #' zeros as the reference dataset.\cr
+    #'
+    initialize = function(converter, data,
+                          data_ref = NULL,
+                          n = 50,
+                          times_input = TRUE,
+                          channels_first = TRUE,
+                          output_idx = NULL,
+                          ignore_last_act = TRUE,
+                          verbose = interactive(),
+                          dtype = "float") {
+      super$initialize(converter, data, channels_first, output_idx,
+                       ignore_last_act, times_input, verbose, dtype)
+
+      cli_check(checkInt(n, lower = 1), "n")
+      self$n <- n
+
+      if (is.null(data_ref)) {
+        data_ref <- lapply(lapply(self$data, dim),
+                           function(x) array(0, dim = c(1, x[-1])))
+      }
+      self$data_ref <- private$test_data(data_ref, name = "data_ref")
+
+      self$result <- private$run()
+      self$converter$model$reset()
+    }
+  ),
+  private = list(
+    run = function() {
+      # Combine input and baseline for each input layer
+      input <- lapply(seq_along(self$data),
+                      function(i) list(data = self$data[[i]],
+                                       data_ref = self$data_ref[[i]]))
+
+      # Define helper function for getting `self$n` interpolated inputs, i.e.
+      # the result has a shape of (batch_size * n, dim_in)
+      tmp_fun <- function(input, idx) {
+        # Repeat the input (batch_size * n)
+        res <- torch_repeat_interleave(
+          input$data,
+          repeats = torch_tensor(self$n, dtype = torch_long()),
+          dim = 1)
+
+        # Get the random baselines
+        res_ref <- input$data_ref[idx]
+
+        # Define scale
+        scale <- torch_rand(res$shape[1])$reshape(c(-1, rep(1, res$dim() - 1)))
+
+        # Create interpolations between x and x_ref
+        list(
+          inputs = res_ref + scale * (res - res_ref),
+          data = res,
+          data_ref = res_ref
+        )
+      }
+
+      # Get random samples from the baseline distribution
+      idx <- sample.int(input[[1]]$data_ref$shape[1],
+                        size = self$n * input[[1]]$data$shape[1],
+                        replace = TRUE)
+
+      # Create interpolated inputs along the integration path for each input
+      # layer and calculate the gradients of them
+      input <- lapply(input, tmp_fun, idx = idx)
+      gradients <- private$calculate_gradients(
+        lapply(input, function(x) x$inputs), "ExpectedGradient")
+
+      # Define the core ExpectedGradients calculation
+      # `grad` has a shape of (batch_size * n, dim_in)
+      tmp_ExpGrad <- function(i, grads, inputs) {
+        grad <- grads[[i]]
+        inp <- (inputs[[i]]$data - inputs[[i]]$data_ref)$unsqueeze(-1)
+        # Output node is not connected to the input layer
+        if (is.null(grad)) {
+          res <- NULL
+        } else { # otherwise ...
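+          # Each of the batch_size * n rows pairs one interpolated input with
+          # the baseline it was sampled from, so multiplying the gradient
+          # row-wise by (data - data_ref) gives one Monte Carlo summand of
+          # the expectation per row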
+          grad <- grad * inp
+          # Chunk the gradients for each of the batch_size samples
+          # Results in a list with batch_size entries containing torch_tensors
+          # of shape (n, dim_in)
+          grad <- grad$chunk(dim(self$data[[1]])[1])
+
+          # Calculate the result of ExpectedGradients for current gradients
+          res <- torch_stack(lapply(grad, torch_mean, dim = 1))
+        }
+
+        res
+      }
+
+      # Apply ExpectedGradients to all outputs
+      expected_grads <- lapply(
+        gradients,
+        function(grad_output) lapply(seq_along(grad_output), tmp_ExpGrad,
+                                     grads = grad_output, inputs = input)
+      )
+
+      expected_grads
+    },
+
+    print_method_specific = function() {
+      i <- cli_ul()
+      if (self$times_input) {
+        cli_li(paste0("{.field times_input}: TRUE (",
+                      symbol$arrow_right,
+                      " decomposition of y - y')"))
+      } else {
+        cli_li(paste0("{.field times_input}: FALSE (",
+                      symbol$arrow_right,
+                      " output sensitivity)"))
+      }
+      cli_li(paste0("{.field n}: ", self$n))
+      all_zeros <- all(unlist(lapply(self$data_ref,
+                                     function(x) all(as_array(x) == 0))))
+      if (all_zeros) {
+        s <- "zeros"
+      } else {
+        values <- unlist(lapply(self$data_ref, as_array))
+        s <- paste0("mean: ", mean(values), " (q1: ", quantile(values, 0.25),
+                    ", q3: ", quantile(values, 0.75), ")")
+      }
+      cli_li(paste0("{.field data_ref}: ", s))
+      cli_end(id = i)
+    }
+  )
+)
+
+
 ###############################################################################
 # Utils
diff --git a/R/InterpretingMethod.R b/R/InterpretingMethod.R
index f391b8b..09f6388 100644
--- a/R/InterpretingMethod.R
+++ b/R/InterpretingMethod.R
@@ -541,7 +541,7 @@ InterpretingMethod <- R6Class(
   private = list(
 
     # ----------------------- backward Function -------------------------------
-    run = function(method_name) {
+    run = function(method_name, reset = TRUE) {
 
       # Declare vector for relevances for each output node
       rel_list <- vector(mode = "list",
@@ -667,16 +667,16 @@ InterpretingMethod <- R6Class(
                                  rule_param = lrp_rule$rule_param,
                                  winner_takes_all = self$winner_takes_all)
         } else if (method_name %in% c("DeepLift", "DeepSHAP")) {
-          use_grad_near_zero <- ifelse(method_name == "DeepLift", TRUE, FALSE)
           rel <- layer$get_input_multiplier(rel, rule_name = rule_name,
                                winner_takes_all = self$winner_takes_all,
-                               use_grad_near_zero = use_grad_near_zero)
+                               use_grad_near_zero = TRUE)
         } else if (method_name == "Connection-Weights") {
           rel <- layer$get_gradient(rel, weight = layer$W,
                                     use_avgpool = !self$winner_takes_all)
         }
       }
-      layer$reset()
+
+      if (reset) layer$reset()
 
       # Transform it back to a list
       if (!is.list(rel)) {
@@ -725,21 +725,18 @@ InterpretingMethod <- R6Class(
         }
       }
 
-      # For the DeepLift and DeepSHAP method, we only get the multiplier.
+      # For the DeepLift method, we only get the multiplier.
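+      # (DeepSHAP applies the multiplier itself in its initialize method,
+      # where it additionally averages over the reference samples.)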
# Hence, we have to multiply this by the differences of inputs - if (method_name %in% c("DeepLift", "DeepSHAP")) { + if (method_name %in% c("DeepLift")) { fun <- function(result, out_idx, in_idx, x, x_ref) { res <- result[[out_idx]][[in_idx]] if (is.null(res)) { res <- NULL } else { - res <- res * (x[[in_idx]] - - torch_mean(x_ref[[in_idx]], dim = 1, keepdim = TRUE) - )$unsqueeze(-1) + res <- res * (x[[in_idx]] - x_ref[[in_idx]])$unsqueeze(-1) } } - x_ref <- if (method_name == "DeepLift") self$x_ref else self$data_ref - result <- apply_results(result, fun, x = self$data, x_ref = x_ref) + result <- apply_results(result, fun, x = self$data, x_ref = self$x_ref) } result diff --git a/R/Layer_conv1d.R b/R/Layer_conv1d.R index 4e40cf0..ce6ab0b 100644 --- a/R/Layer_conv1d.R +++ b/R/Layer_conv1d.R @@ -83,7 +83,7 @@ conv1d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) + self$input_ref <- x_ref } # Apply padding if (any(self$padding != 0)) { @@ -97,12 +97,12 @@ conv1d_layer <- nn_module( dilation = self$dilation ) if (save_preactivation) { - self$preactivation_ref <- preactivation_ref$mean(dim = 1, keepdim = TRUE) + self$preactivation_ref <- preactivation_ref } output_ref <- self$activation_f(preactivation_ref) if (save_output) { - self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) + self$output_ref <- output_ref } output_ref diff --git a/R/Layer_conv2d.R b/R/Layer_conv2d.R index 9904f65..724c336 100644 --- a/R/Layer_conv2d.R +++ b/R/Layer_conv2d.R @@ -89,7 +89,7 @@ conv2d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) + self$input_ref <- x_ref } # Apply padding if (any(self$padding != 0)) { @@ -103,12 +103,12 @@ conv2d_layer <- nn_module( dilation = self$dilation ) if (save_preactivation) { - self$preactivation_ref <- preactivation_ref$mean(dim = 1, keepdim = TRUE) + self$preactivation_ref <- preactivation_ref } # Apply non-linearity output_ref <- self$activation_f(preactivation_ref) if (save_output) { - self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) + self$output_ref <- output_ref } output_ref diff --git a/R/Layer_dense.R b/R/Layer_dense.R index 6b75bf1..8e14b06 100644 --- a/R/Layer_dense.R +++ b/R/Layer_dense.R @@ -69,15 +69,15 @@ dense_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) + self$input_ref <- x_ref } preactivation_ref <- nnf_linear(x_ref, self$W, self$b) if (save_preactivation) { - self$preactivation_ref <- preactivation_ref$mean(dim = 1, keepdim = TRUE) + self$preactivation_ref <- preactivation_ref } output_ref <- self$activation_f(preactivation_ref) if (save_output) { - self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) + self$output_ref <- output_ref } output_ref diff --git a/R/Layer_normalization.R b/R/Layer_normalization.R index 970f5c8..5e3472d 100644 --- a/R/Layer_normalization.R +++ b/R/Layer_normalization.R @@ -38,14 +38,14 @@ batchnorm_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_output = TRUE, ...) 
{ if (save_input) { - self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) + self$input_ref <- x_ref } out <- nnf_batch_norm(x_ref, self$run_mean, self$run_var, self$gamma, self$beta, eps = self$eps) if (save_output) { - self$output_ref <- out$mean(dim = 1, keepdim = TRUE) + self$output_ref <- out } out diff --git a/R/Layer_pooling.R b/R/Layer_pooling.R index bbcf07d..82af000 100644 --- a/R/Layer_pooling.R +++ b/R/Layer_pooling.R @@ -169,11 +169,11 @@ avg_pool1d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) + self$input_ref <- x_ref } output_ref <- nnf_avg_pool1d(x_ref, self$kernel_size, self$strides) if (save_output) { - self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) + self$output_ref <- output_ref } output_ref @@ -242,11 +242,11 @@ avg_pool2d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) + self$input_ref <- x_ref } output_ref <- nnf_avg_pool2d(x_ref, self$kernel_size, self$strides) if (save_output) { - self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) + self$output_ref <- output_ref } output_ref @@ -319,11 +319,11 @@ max_pool1d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) { if (save_input) { - self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) + self$input_ref <- x_ref } output_ref <- nnf_max_pool1d(x_ref, self$kernel_size, self$strides) if (save_output) { - self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) + self$output_ref <- output_ref } output_ref @@ -479,11 +479,11 @@ max_pool2d_layer <- nn_module( update_ref = function(x_ref, save_input = TRUE, save_preactivation = TRUE, save_output = TRUE, ...) 
{ if (save_input) { - self$input_ref <- x_ref$mean(dim = 1, keepdim = TRUE) + self$input_ref <- x_ref } output_ref <- nnf_max_pool2d(x_ref, self$kernel_size, self$strides) if (save_output) { - self$output_ref <- output_ref$mean(dim = 1, keepdim = TRUE) + self$output_ref <- output_ref } output_ref From 9838e51dc504fa1b98d76374b35b2cc6b3db397b Mon Sep 17 00:00:00 2001 From: nkoenen Date: Mon, 27 Nov 2023 21:35:25 +0100 Subject: [PATCH 05/20] Add argument `output_label` and update R documentation * Add argument `output_label`: In addition to the output index with `output_idx`, the new argument `output_label` for the output label can now also be specified in order to calculate only certain output nodes * Small fix for the method `ExpectedGradient` * Update R documentation --- R/AgnosticMethods.R | 8 +- R/AgnosticWrapper.R | 9 +- R/ConnectionWeights.R | 9 +- R/Converter.R | 5 +- R/DeepLift.R | 16 +- R/GradienBased.R | 117 ++++---- R/InterpretingMethod.R | 108 ++++++-- R/LRP.R | 4 +- man-roxygen/field-output_label.R | 7 + man-roxygen/param-output_label.R | 26 ++ man/AgnosticWrapper.Rd | 28 ++ man/ConnectionWeights.Rd | 29 ++ man/DeepLift.Rd | 29 ++ man/DeepSHAP.Rd | 31 ++- man/ExpectedGradient.Rd | 339 ++++++++++++++++++++++++ man/Gradient.Rd | 29 ++ man/GradientBased.Rd | 28 ++ man/IntegratedGradient.Rd | 29 ++ man/InterpretingMethod.Rd | 36 +++ man/LIME.Rd | 29 ++ man/LRP.Rd | 29 ++ man/SHAP.Rd | 29 ++ man/SmoothGrad.Rd | 29 ++ tests/testthat/test_ConnectionWeights.R | 10 - tests/testthat/test_DeepSHAP.R | 26 +- tests/testthat/test_Gradients.R | 22 +- tests/testthat/test_LIME.R | 17 ++ 27 files changed, 952 insertions(+), 126 deletions(-) create mode 100644 man-roxygen/field-output_label.R create mode 100644 man-roxygen/param-output_label.R create mode 100644 man/ExpectedGradient.Rd diff --git a/R/AgnosticMethods.R b/R/AgnosticMethods.R index e16a3f5..013fd65 100644 --- a/R/AgnosticMethods.R +++ b/R/AgnosticMethods.R @@ -27,6 +27,7 @@ #' `data.frame` back into an `array` as necessary. #' #' @template param-output_idx +#' @template param-output_label #' @template param-channels_first #' @template param-model-agnostic #' @template param-data-agnostic @@ -55,6 +56,7 @@ LIME <- R6Class( output_type = NULL, pred_fun = NULL, output_idx = NULL, + output_label = NULL, channels_first = TRUE, input_dim = NULL, input_names = NULL, @@ -65,7 +67,7 @@ LIME <- R6Class( if (inherits(x, "torch_tensor")) x <- as.array(x) super$initialize(model, data, x, output_type, pred_fun, output_idx, - channels_first, input_dim, input_names, + output_label, channels_first, input_dim, input_names, output_names) # Get the pre-processed x @@ -153,6 +155,7 @@ model_type.innsight_agnostic_wrapper <- function(x, ...) { #' `data.frame` back into an `array` as necessary. 
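+#'
+#' A minimal usage sketch for a non-supported model (mirroring the tests in
+#' this patch; `fit` is a hypothetical classification model):
+#' \preformatted{
+#' lime <- LIME$new(fit, iris[, -5], iris[1:2, -5],
+#'                  output_type = "classification",
+#'                  pred_fun = function(newdata, ...) predict(fit, newdata))
+#' }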
#' #' @template param-output_idx +#' @template param-output_label #' @template param-channels_first #' @template param-model-agnostic #' @template param-data-agnostic @@ -179,6 +182,7 @@ SHAP <- R6Class( initialize = function(model, data, x, pred_fun = NULL, output_idx = NULL, + output_label = NULL, channels_first = TRUE, input_dim = NULL, input_names = NULL, @@ -192,7 +196,7 @@ SHAP <- R6Class( if (inherits(x, "torch_tensor")) x <- as.array(x) super$initialize(model, data, x, output_type, pred_fun, output_idx, - channels_first, input_dim, input_names, + output_label, channels_first, input_dim, input_names, output_names) # We use the fastshap package for the explanation diff --git a/R/AgnosticWrapper.R b/R/AgnosticWrapper.R index f95e5c5..6264258 100644 --- a/R/AgnosticWrapper.R +++ b/R/AgnosticWrapper.R @@ -20,6 +20,7 @@ #' @template param-channels_first #' @template param-model-agnostic #' @template param-data-agnostic +#' @template param-output_label #' @template param-x-agnostic #' @template param-output_type-agnostic #' @template param-pred_fun-agnostic @@ -43,6 +44,7 @@ AgnosticWrapper <- R6Class( output_type = NULL, pred_fun = NULL, output_idx = NULL, + output_label = NULL, channels_first = TRUE, input_dim = NULL, input_names = NULL, @@ -77,8 +79,11 @@ AgnosticWrapper <- R6Class( self$ignore_last_act <- FALSE self$dtype <- "float" - # Check output indices - self$output_idx <- check_output_idx(output_idx, conv_model$output_dim) + # Check output indices and labels + outputs <- check_output_idx(output_idx, self$converter$output_dim, + output_label, self$converter$output_names) + self$output_idx <- outputs[[1]] + self$output_label <- outputs[[2]] # Save the original data to be explained self$x <- x diff --git a/R/ConnectionWeights.R b/R/ConnectionWeights.R index 9c0131d..9a37ccf 100644 --- a/R/ConnectionWeights.R +++ b/R/ConnectionWeights.R @@ -24,6 +24,7 @@ #' @template param-channels_first #' @template param-dtype #' @template param-output_idx +#' @template param-output_label #' @template param-verbose #' #' @references @@ -57,6 +58,7 @@ ConnectionWeights <- R6Class( initialize = function(converter, data = NULL, output_idx = NULL, + output_label = NULL, channels_first = TRUE, times_input = FALSE, verbose = interactive(), @@ -77,8 +79,11 @@ ConnectionWeights <- R6Class( self$dtype <- dtype self$converter$model$set_dtype(dtype) - # Check output indices - self$output_idx <- check_output_idx(output_idx, converter$output_dim) + # Check output indices and labels + outputs <- check_output_idx(output_idx, converter$output_dim, + output_label, converter$output_names) + self$output_idx <- outputs[[1]] + self$output_label <- outputs[[2]] if (times_input & is.null(data)) { stopf( diff --git a/R/Converter.R b/R/Converter.R index fdf26f1..81c17b0 100644 --- a/R/Converter.R +++ b/R/Converter.R @@ -379,7 +379,8 @@ Converter <- R6Class("Converter", # Check input names if (is.null(model_as_list$input_names)) { - model_as_list$input_names <- get_input_names(model_as_list$input_dim) + model_as_list$input_names <- + set_name_format(get_input_names(model_as_list$input_dim)) } else { input_names <- model_as_list$input_names input_names_lenght <- lapply(input_names, @@ -399,7 +400,7 @@ Converter <- R6Class("Converter", # Check output names if (is.null(model_as_list$output_names)) { model_as_list$output_names <- - get_output_names(model_as_list$output_dim) + set_name_format(get_output_names(model_as_list$output_dim)) } else { output_names <- model_as_list$output_names output_names_length <- 
lapply(output_names, diff --git a/R/DeepLift.R b/R/DeepLift.R index 5506390..c63929d 100644 --- a/R/DeepLift.R +++ b/R/DeepLift.R @@ -24,6 +24,7 @@ #' @template param-converter #' @template param-data #' @template param-output_idx +#' @template param-output_label #' @template param-channels_first #' @template param-ignore_last_act #' @template param-x_ref @@ -69,13 +70,14 @@ DeepLift <- R6Class( initialize = function(converter, data, channels_first = TRUE, output_idx = NULL, + output_label = NULL, ignore_last_act = TRUE, rule_name = "rescale", x_ref = NULL, winner_takes_all = TRUE, verbose = interactive(), dtype = "float") { - super$initialize(converter, data, channels_first, output_idx, + super$initialize(converter, data, channels_first, output_idx, output_label, ignore_last_act, winner_takes_all, verbose, dtype) cli_check(checkChoice(rule_name, c("rescale", "reveal_cancel")), @@ -143,7 +145,7 @@ DeepLift <- R6Class( #' Deep Shapley additive explanations (DeepSHAP) #' #' @description -#' The *DeepSHAP* method extends the [DeepLift] technique by not only +#' The *DeepSHAP* method extends the [`DeepLift`] technique by not only #' considering a single reference value but by calculating the average #' from several, ideally representative reference values at each layer. The #' obtained feature-wise results are approximate Shapley values for the @@ -156,6 +158,7 @@ DeepLift <- R6Class( #' @template param-converter #' @template param-data #' @template param-output_idx +#' @template param-output_label #' @template param-channels_first #' @template param-ignore_last_act #' @template param-dtype @@ -222,6 +225,7 @@ DeepSHAP <- R6Class( initialize = function(converter, data, channels_first = TRUE, output_idx = NULL, + output_label = NULL, ignore_last_act = TRUE, rule_name = "rescale", data_ref = NULL, @@ -229,7 +233,7 @@ DeepSHAP <- R6Class( winner_takes_all = TRUE, verbose = interactive(), dtype = "float") { - super$initialize(converter, data, channels_first, output_idx, + super$initialize(converter, data, channels_first, output_idx, output_label, ignore_last_act, winner_takes_all, verbose, dtype) cli_check(checkChoice(rule_name, c("rescale", "reveal_cancel")), "rule_name") @@ -301,16 +305,16 @@ DeepSHAP <- R6Class( i <- cli_ul() cli_li(paste0("{.field rule_name}: '", self$rule_name, "'")) cli_li(paste0("{.field winner_takes_all}: ", self$winner_takes_all)) - all_zeros <- all(unlist(lapply(self$x_ref, + all_zeros <- all(unlist(lapply(self$data_ref, function(x) all(as_array(x) == 0)))) if (all_zeros) { s <- "zeros" } else { - values <- unlist(lapply(self$x_ref, as_array)) + values <- unlist(lapply(self$data_ref, as_array)) s <- paste0("mean: ", mean(values), " (q1: ", quantile(values, 0.25), ", q3: ", quantile(values, 0.75), ")") } - cli_li(paste0("{.field x_ref}: ", s)) + cli_li(paste0("{.field data_ref}: ", s)) cli_end(id = i) } ) diff --git a/R/GradienBased.R b/R/GradienBased.R index 9b4764a..72bc747 100644 --- a/R/GradienBased.R +++ b/R/GradienBased.R @@ -19,6 +19,7 @@ #' @template param-dtype #' @template param-ignore_last_act #' @template param-output_idx +#' @template param-output_label #' @template param-verbose #' GradientBased <- R6Class( @@ -43,12 +44,13 @@ GradientBased <- R6Class( initialize = function(converter, data, channels_first = TRUE, output_idx = NULL, + output_label = NULL, ignore_last_act = TRUE, times_input = TRUE, verbose = interactive(), dtype = "float" ) { - super$initialize(converter, data, channels_first, output_idx, + super$initialize(converter, data, channels_first, 
output_idx, output_label,
                        ignore_last_act, TRUE, verbose, dtype)
 
     cli_check(checkLogical(times_input), "times_input")
@@ -169,6 +171,7 @@ boxplot.GradientBased <- function(x, ...) {
 #' @template param-channels_first
 #' @template param-dtype
 #' @template param-output_idx
+#' @template param-output_label
 #' @template param-ignore_last_act
 #' @template param-verbose
 #'
@@ -191,11 +194,12 @@ Gradient <- R6Class(
   initialize = function(converter, data,
                         channels_first = TRUE,
                         output_idx = NULL,
+                        output_label = NULL,
                         ignore_last_act = TRUE,
                         times_input = FALSE,
                         verbose = interactive(),
                         dtype = "float") {
-    super$initialize(converter, data, channels_first, output_idx,
+    super$initialize(converter, data, channels_first, output_idx, output_label,
                      ignore_last_act, times_input, verbose, dtype)
 
     self$result <- private$run()
@@ -260,6 +264,7 @@ Gradient <- R6Class(
 #' @template param-channels_first
 #' @template param-dtype
 #' @template param-output_idx
+#' @template param-output_label
 #' @template param-ignore_last_act
 #' @template param-verbose
 #' @template examples-IntegratedGradient
@@ -317,10 +322,11 @@ IntegratedGradient <- R6Class(
                           times_input = TRUE,
                           channels_first = TRUE,
                           output_idx = NULL,
+                          output_label = NULL,
                           ignore_last_act = TRUE,
                           verbose = interactive(),
                           dtype = "float") {
-      super$initialize(converter, data, channels_first, output_idx,
+      super$initialize(converter, data, channels_first, output_idx, output_label,
                        ignore_last_act, times_input, verbose, dtype)
 
       cli_check(checkInt(n, lower = 1), "n")
@@ -463,6 +469,7 @@ IntegratedGradient <- R6Class(
 #' @template param-channels_first
 #' @template param-dtype
 #' @template param-output_idx
+#' @template param-output_label
 #' @template param-ignore_last_act
 #' @template param-verbose
 #'
@@ -503,13 +510,14 @@ SmoothGrad <- R6Class(
   initialize = function(converter, data,
                         channels_first = TRUE,
                         output_idx = NULL,
+                        output_label = NULL,
                         ignore_last_act = TRUE,
                         times_input = FALSE,
                         n = 50,
                         noise_level = 0.1,
                         verbose = interactive(),
                         dtype = "float") {
-    super$initialize(converter, data, channels_first, output_idx,
+    super$initialize(converter, data, channels_first, output_idx, output_label,
                      ignore_last_act, times_input, verbose, dtype)
 
     cli_check(checkInt(n, lower = 1), "n")
@@ -594,37 +602,34 @@ SmoothGrad <- R6Class(
 #' @title Expected Gradients
 #'
 #' @description
-#' The `ExpectedGradient` class implements the method ExpectedGradients
-#' (Sundararajan et al., 2017), which incorporates a reference value \eqn{x'}
-#' (also known as baseline value) analogous to the [`DeepLift`] method.
-#' Integrated Gradients helps to uncover the relative importance of input
-#' features in the predictions \eqn{y = f(x)} made by a model compared to the
-#' prediction of the reference value \eqn{y' = f(x')}. This is achieved through
-#' the following formula:
+#' The *Expected Gradients* method (Erion et al., 2021), also known as
+#' *GradSHAP*, is a local feature attribution technique that extends the
+#' [`IntegratedGradient`] method and provides approximate Shapley values. In
+#' contrast to IntegratedGradient, it considers not only a single reference
+#' value \eqn{x'} but the whole distribution of reference values
+#' \eqn{x' \sim X'} and averages the IntegratedGradient values over this
+#' distribution.
Mathematically, the method can be described as follows:
#' \deqn{
-#' (x - x') \times \int_{\alpha=0}^{1} \frac{\partial f(x' + \alpha (x - x'))}{\partial x} d\alpha
+#' E_{x'\sim X', \alpha \sim U(0,1)}[(x - x') \times \frac{\partial f(x' + \alpha (x - x'))}{\partial x}]
 #' }
-#' In simpler terms, it calculates how much each feature contributes to a
-#' model's output by tracing a path from a baseline input \eqn{x'} to the actual
-#' input \eqn{x} and measuring the average gradients along that path.
-#'
-#' Similar to the other gradient-based methods, by default the integrated
-#' gradient is multiplied by the input to get an approximate decomposition
-#' of \eqn{y - y'}. However, with the parameter `times_input` only the gradient
-#' describing the output sensitivity can be returned.
+#' The distribution of the reference values is specified with the argument
+#' `data_ref`, of which `n` samples are taken at random for each instance
+#' during the estimation.
 #'
 #' @template param-converter
 #' @template param-data
 #' @template param-channels_first
 #' @template param-dtype
 #' @template param-output_idx
+#' @template param-output_label
 #' @template param-ignore_last_act
 #' @template param-verbose
 #' @template examples-IntegratedGradient
 #'
 #' @references
-#' M. Sundararajan et al. (2017) *Axiomatic attribution for deep networks.* ICML
-#' 2017, PMLR 70, pp. 3319-3328.
+#' G. Erion et al. (2021) *Improving performance of deep learning models with*
+#' *axiomatic attribution priors and expected gradients.* Nature Machine
+#' Intelligence 3, pp. 620-631.
 #'
 #' @family methods
 #' @export
@@ -634,36 +639,32 @@ ExpectedGradient <- R6Class(
   public = list(
 
     #' @field n (`integer(1)`)\cr
-    #' Number of steps for the approximation of the integration path along
+    #' Number of samples from the distribution of reference values and number
+    #' of samples for the approximation of the integration path along
     #' \eqn{\alpha} (default: \eqn{50}).\cr
-    #' @field x_ref (`list`)\cr
-    #' The reference input for the IntegratedGradient method. This value is
-    #' stored as a list of `torch_tensor`s of shape *(1, dim_in)* for each
+    #' @field data_ref (`list`)\cr
+    #' The reference input for the ExpectedGradient method. This value is
+    #' stored as a list of `torch_tensor`s of shape *( , dim_in)* for each
     #' input layer.\cr
     #'
     n = NULL,
     data_ref = NULL,
 
     #' @description
-    #' Create a new instance of the `IntegratedGradient` R6 class. When
-    #' initialized, the method *Integrated Gradient* is applied to the given
-    #' data and baseline value and the results are stored in the field `result`.
+    #' Create a new instance of the `ExpectedGradient` R6 class. When
+    #' initialized, the method *Expected Gradient* is applied to the given
+    #' data and baseline values and the results are stored in the field `result`.
     #'
-    #' @param times_input (`logical(1`)\cr
-    #' Multiplies the integrated gradients with the difference of the input
-    #' features and the baseline values. By default, the original definition of
-    #' IntegratedGradient is applied.
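To make the new description concrete, the following is a minimal, self-contained sketch of the Monte-Carlo estimator that the formula above describes. Here `f`, `x` and `data_ref` are placeholders rather than package API: `f` is any differentiable torch function, `x` a single input of shape (1, dim_in), `data_ref` a tensor of reference samples of shape (num_refs, dim_in), and a single output node is assumed:

library(torch)

expected_grad_sketch <- function(f, x, data_ref, n = 50) {
  # Draw n random baselines x' from the reference distribution
  idx <- sample.int(data_ref$shape[1], size = n, replace = TRUE)
  x_ref <- data_ref[idx, ]
  # Draw n random path positions alpha ~ U(0, 1)
  alpha <- torch_rand(n)$reshape(c(n, 1))
  # Interpolate x' + alpha * (x - x') and differentiate f there
  inputs <- x_ref + alpha * (x - x_ref)
  inputs$requires_grad_(TRUE)
  grads <- autograd_grad(f(inputs)$sum(), inputs)[[1]]
  # Monte-Carlo average of (x - x') * df/dx
  ((x - x_ref) * grads)$mean(dim = 1)
}

Drawing `alpha` once and reusing it for every input layer, as the `run()` hunks further down do via the new `alpha` argument of `tmp_fun`, keeps each Monte-Carlo sample at one consistent point of the path even for models with several input layers.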
However, by setting `times_input = FALSE` - #' only an approximation of the integral is calculated, which describes the - #' sensitivity of the features to the output.\cr #' @param n (`integer(1)`)\cr - #' Number of steps for the approximation of the integration path along + #' Number of samples from the distribution of reference values and number + #' of samples for the approximation of the integration path along #' \eqn{\alpha} (default: \eqn{50}).\cr - #' @param x_ref ([`array`], [`data.frame`], [`torch_tensor`] or `list`)\cr - #' The reference input for the IntegratedGradient method. This value + #' @param data_ref ([`array`], [`data.frame`], [`torch_tensor`] or `list`)\cr + #' The reference inputs for the ExpectedGradient method. This value #' must have the same format as the input data of the passed model to the #' converter object. This means either #' - an `array`, `data.frame`, `torch_tensor` or array-like format of - #' size *(1, dim_in)*, if e.g., the model has only one input layer, or + #' size *( , dim_in)*, if e.g., the model has only one input layer, or #' - a `list` with the corresponding input data (according to the upper point) #' for each of the input layers. #' - It is also possible to use the default value `NULL` to take only @@ -672,14 +673,14 @@ ExpectedGradient <- R6Class( initialize = function(converter, data, data_ref = NULL, n = 50, - times_input = TRUE, channels_first = TRUE, output_idx = NULL, + output_label = NULL, ignore_last_act = TRUE, verbose = interactive(), dtype = "float") { - super$initialize(converter, data, channels_first, output_idx, - ignore_last_act, times_input, verbose, dtype) + super$initialize(converter, data, channels_first, output_idx, output_label, + ignore_last_act, TRUE, verbose, dtype) cli_check(checkInt(n, lower = 1), "n") self$n <- n @@ -703,7 +704,7 @@ ExpectedGradient <- R6Class( # Define helper function for getting `self$n` interpolated inputs, i.e. 
# the result has a shape of (batch_size * n, dim_in) - tmp_fun <- function(input, idx) { + tmp_fun <- function(input, idx, scale, alpha) { # Repeat the input (batch_size * n) res <- torch_repeat_interleave( input$data, @@ -713,30 +714,33 @@ ExpectedGradient <- R6Class( # Get the random baselines res_ref <- input$data_ref[idx] - # Define scale - scale <- torch_rand(res$shape[1])$reshape(c(-1, rep(1, res$dim() - 1))) + # Reshape alpha + alpha <- alpha$reshape(c(-1, rep(1, res$dim() - 1))) # Create interpolations between x and x_ref list( - inputs = res_ref + scale * (res - res_ref), + inputs = res_ref + alpha * (res - res_ref), data = res, data_ref = res_ref ) } # Get random samples from the baseline distribution + num_samples <- self$n * input[[1]]$data$shape[1] idx <- sample.int(input[[1]]$data_ref$shape[1], - size = self$n * input[[1]]$data$shape[1], + size = num_samples, replace = TRUE) + # Get random alpha values + alpha <- torch_rand(num_samples) # Create interpolated inputs along the integration path for each input # layer and calculate the gradients of them - input <- lapply(input, tmp_fun, idx = idx) + input <- lapply(input, tmp_fun, idx = idx, alpha = alpha) gradients <- private$calculate_gradients( lapply(input, function(x) x$inputs), "ExpectedGradient") # Define the core ExpectedGradients calculation - # `grad` has a shape of (batch_size * n * num_baselines, dim_in) + # `grad` has a shape of (batch_size * n, dim_in) tmp_ExpGrad <- function(i, grads, inputs) { grad <- grads[[i]] inp <- (inputs[[i]]$data - inputs[[i]]$data_ref)$unsqueeze(-1) @@ -769,26 +773,17 @@ ExpectedGradient <- R6Class( print_method_specific = function() { i <- cli_ul() - if (self$times_input) { - cli_li(paste0("{.field times_input}: TRUE (", - symbol$arrow_right, - " decomposition of y - y')")) - } else { - cli_li(paste0("{.field times_input}: FALSE (", - symbol$arrow_right, - " output sensitivity)")) - } cli_li(paste0("{.field n}: ", self$n)) - all_zeros <- all(unlist(lapply(self$x_ref, + all_zeros <- all(unlist(lapply(self$data_ref, function(x) all(as_array(x) == 0)))) if (all_zeros) { s <- "zeros" } else { - values <- unlist(lapply(self$x_ref, as_array)) + values <- unlist(lapply(self$data_ref, as_array)) s <- paste0("mean: ", mean(values), " (q1: ", quantile(values, 0.25), ", q3: ", quantile(values, 0.75), ")") } - cli_li(paste0("{.field x_ref}: ", s)) + cli_li(paste0("{.field data_ref}: ", s)) cli_end(id = i) } ) diff --git a/R/InterpretingMethod.R b/R/InterpretingMethod.R index 09f6388..2e7ca7b 100644 --- a/R/InterpretingMethod.R +++ b/R/InterpretingMethod.R @@ -21,6 +21,7 @@ #' @template param-ignore_last_act #' @template param-dtype #' @template param-aggr_channels +#' @template param-output_label #' @template param-as_plotly #' @template param-verbose #' @template param-ref_data_idx @@ -35,6 +36,7 @@ #' @template field-ignore_last_act #' @template field-result #' @template field-output_idx +#' @template field-output_label #' @template field-verbose #' @template field-winner_takes_all #' @@ -49,6 +51,7 @@ InterpretingMethod <- R6Class( ignore_last_act = NULL, result = NULL, output_idx = NULL, + output_label = NULL, verbose = NULL, #' @description @@ -79,6 +82,7 @@ InterpretingMethod <- R6Class( initialize = function(converter, data, channels_first = TRUE, output_idx = NULL, + output_label = NULL, ignore_last_act = TRUE, winner_takes_all = TRUE, verbose = interactive(), @@ -102,8 +106,11 @@ InterpretingMethod <- R6Class( self$dtype <- dtype self$converter$model$set_dtype(dtype) - # Check output indices 
- self$output_idx <- check_output_idx(output_idx, converter$output_dim) + # Check output indices and labels + outputs <- check_output_idx(output_idx, converter$output_dim, + output_label, converter$output_names) + self$output_idx <- outputs[[1]] + self$output_label <- outputs[[2]] self$data <- private$test_data(data) }, @@ -898,16 +905,14 @@ print_output_idx <- function(output_idx, out_names) { # Utils ############################################################################### -check_output_idx <- function(output_idx, output_dim) { - # for the default value, choose from the first output the first ten - # (maybe less) output nodes - if (is.null(output_idx)) { - output_idx <- list(1:min(10, output_dim[[1]])) - } else if (testIntegerish(output_idx, - lower = 1, - upper = output_dim[[1]])) { - # or only a number (assumes the first output) +check_output_idx <- function(output_idx, output_dim, output_label, output_names) { + # Check the output indices -------------------------- + # Check if output_idx is a single vector + if (testIntegerish(output_idx, + lower = 1, + upper = output_dim[[1]])) { output_idx <- list(output_idx) + # Check if it's a list of vectors } else if (testList(output_idx, max.len = length(output_dim))) { # the argument output_idx is a list of output_nodes for each output n <- 1 @@ -916,25 +921,86 @@ check_output_idx <- function(output_idx, output_dim) { cli_check(checkInt(limit), "limit") if (!testIntegerish(output, lower = 1, upper = limit, null.ok = TRUE)) { stopf("Assertion on {.arg output_idx[[", n, "]]} failed: Value(s) ", - paste(output, collapse = ","), " not <= ", limit, ".") + paste(output, collapse = ","), " not <= ", limit, ".") } n <- n + 1 } - } else { + } else if (!is.null(output_idx)) { stopf("The argument {.arg output_idx} has to be either a vector with maximum ", - "value of '", output_dim[[1]], "' or a list of length '", - length(output_dim), "' with maximal values of '", - paste(unlist(output_dim), collapse = ","), "'.") + "value of '", output_dim[[1]], "' or a list of length '", + length(output_dim), "' with maximal values of '", + paste(unlist(output_dim), collapse = ","), "'.") + } + + # Check the output labels ------------------------- + # Check if output_label is a single vector + if (testCharacter(output_label, min.len = 1, max.len = output_dim[[1]]) || + testFactor(output_label, min.len = 1, max.len = output_dim[[1]])) { + # Check if labels are a subset of output_names + cli_check(checkSubset(as.factor(output_label), unlist(output_names[[1]])), + "output_label") + output_label <- list(output_label) + # Check if it's a list of vectors + } else if (testList(output_label, max.len = length(output_names))) { + # the argument output_label is a list of names for each output + n <- 1 + for (output in output_label) { + # Check if labels are a subset of output_names + cli_check(checkSubset(as.factor(unlist(output)), unlist(output_names[[n]])), + "output_label") + n <- n + 1 + } + } else if (!is.null(output_label)) { + stopf("The argument {.arg output_label} has to be either a vector of ", + "characters/factors or a list of vectors of characters/factors!") + } + + if (is.null(output_idx) && is.null(output_label)) { + output_idx <- list(1:min(10, output_dim[[1]])) + output_label <- list(output_names[[1]][[1]][output_idx[[1]]]) + } else if (is.null(output_idx) && !is.null(output_label)) { + output_idx <- list() + for (i in seq_along(output_label)) { + output_idx[[i]] <- match(output_label[[i]], output_names[[i]][[1]]) + if (length(output_idx[[i]]) == 0) 
output_idx[[i]] <- NULL + } + } else if (is.null(output_label) && !is.null(output_idx)) { + output_label <- list() + for (i in seq_along(output_idx)) { + output_label[[i]] <- output_names[[i]][[1]][output_idx[[i]]] + } } # Fill up with NULLs - if (length(output_idx) < length(output_dim)) { - output_idx <- - append(output_idx, - rep(list(NULL), length(output_dim) - length(output_idx))) + num_layers <- length(output_dim) + if (length(output_idx) < num_layers) { + output_idx <- append(output_idx, + rep(list(NULL), num_layers - length(output_idx))) + } + if (length(output_label) < num_layers) { + output_label <- append(output_label, + rep(list(NULL), num_layers - length(output_label))) } - output_idx + # Check if both are consistent + for (i in seq_along(output_dim)) { + if (testTRUE(length(output_idx[[i]]) != length(output_label[[i]]))) { + stopf("Both the {.arg output_idx} and {.arg output_label} arguments ", + "were passed (i.e., not {.code NULL}). However, they do not ", + "match and point to different output nodes.") + } + + # Get labels from output_idx + labels <- output_names[[i]][[1]][output_idx[[i]]] + if (length(labels) == 0) labels <- NULL + if (!testSetEqual(labels, as.factor(output_label[[i]]))) { + stopf("Both the {.arg output_idx} and {.arg output_label} arguments ", + "were passed (i.e., not {.code NULL}). However, they do not ", + "match and point to different output nodes.") + } + } + + list(output_idx, output_label) } diff --git a/R/LRP.R b/R/LRP.R index 5eb5be8..aacdd7e 100644 --- a/R/LRP.R +++ b/R/LRP.R @@ -22,6 +22,7 @@ #' @template param-x_ref #' @template param-dtype #' @template param-output_idx +#' @template param-output_label #' @template param-verbose #' @template param-winner_takes_all #' @@ -94,13 +95,14 @@ LRP <- R6Class( initialize = function(converter, data, channels_first = TRUE, output_idx = NULL, + output_label = NULL, ignore_last_act = TRUE, rule_name = "simple", rule_param = NULL, winner_takes_all = TRUE, verbose = interactive(), dtype = "float") { - super$initialize(converter, data, channels_first, output_idx, + super$initialize(converter, data, channels_first, output_idx, output_label, ignore_last_act, winner_takes_all, verbose, dtype) layer_names_with_rule <- c( diff --git a/man-roxygen/field-output_label.R b/man-roxygen/field-output_label.R new file mode 100644 index 0000000..46c888f --- /dev/null +++ b/man-roxygen/field-output_label.R @@ -0,0 +1,7 @@ +#' @field output_label (`list`)\cr +#' This list of `factors` specifies the output nodes to which +#' the method is to be applied. In the order of the output layers, the list +#' contains the respective output nodes labels and unwanted output layers +#' have the entry `NULL` instead of a vector of labels, +#' e.g., `list(NULL, c("a", "c"))` for the first and third output node in the +#' second output layer.\cr diff --git a/man-roxygen/param-output_label.R b/man-roxygen/param-output_label.R new file mode 100644 index 0000000..a685a03 --- /dev/null +++ b/man-roxygen/param-output_label.R @@ -0,0 +1,26 @@ +#' @param output_label (`character`, `factor`, `list` or `NULL`)\cr +#' These values specify the output nodes for which +#' the method is to be applied. Only values that were previously passed with +#' the argument `output_names` in the `converter` can be used. 
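As a toy illustration of how `check_output_idx()` above resolves labels to indices when only `output_label` is passed (the class names below are made up; `match()` does the actual work, exactly as in the branch above):

output_names <- list(list(factor(c("setosa", "versicolor", "virginica"))))
output_label <- list(c("setosa", "virginica"))
output_idx <- lapply(seq_along(output_label), function(i) {
  match(output_label[[i]], output_names[[i]][[1]])
})
str(output_idx)  # List of 1; $ : int [1:2] 1 3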
In order to
+#' allow models with multiple
+#' output layers, there are the following possibilities to select
+#' the names of the output nodes in the individual output layers:
+#' \itemize{
+#'   \item A `character` vector or `factor` of labels: If the model has only one output
+#'   layer, the values correspond to the labels of the output nodes named in the
+#'   passed `Converter` object, e.g.,
+#'   `c("a", "c", "d")` for the first, third and fourth output node if the
+#'   output names are `c("a", "b", "c", "d")`. If there are
+#'   multiple output layers, the names of the output nodes from the first
+#'   output layer are considered.
+#'   \item A `list` of `character`/`factor` vectors of labels: If the method is to be
+#'   applied to output nodes from different layers, a list can be passed
+#'   that specifies the desired labels of the output nodes for each
+#'   output layer. Unwanted output layers have the entry `NULL` instead of
+#'   a vector of labels, e.g., `list(NULL, c("a", "c"))` for the first and
+#'   third output node in the second output layer.
+#'   \item `NULL` (default): The method is applied to all output nodes in
+#'   the first output layer but is limited to the first ten as the
+#'   calculations become more computationally expensive for more output
+#'   nodes.\cr
+#' }
diff --git a/man/AgnosticWrapper.Rd b/man/AgnosticWrapper.Rd
index 45ea1be..ad35dd0 100644
--- a/man/AgnosticWrapper.Rd
+++ b/man/AgnosticWrapper.Rd
@@ -58,6 +58,7 @@ Create a new instance of the \code{AgnosticWrapper} R6 class.
   output_type = NULL,
   pred_fun = NULL,
   output_idx = NULL,
+  output_label = NULL,
   channels_first = TRUE,
   input_dim = NULL,
   input_names = NULL,
@@ -133,6 +134,33 @@ calculations become more computationally expensive for more output
 nodes.\cr
 }}
 
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{channels_first}}{(\code{logical(1)})\cr
 The channel position of the given data (argument \code{data}).
If \code{TRUE}, the channel axis is placed at the second position diff --git a/man/ConnectionWeights.Rd b/man/ConnectionWeights.Rd index b50a659..01b4219 100644 --- a/man/ConnectionWeights.Rd +++ b/man/ConnectionWeights.Rd @@ -134,6 +134,7 @@ simulated data.} Ecological Modelling 178, p. 389–397 Other methods: \code{\link{DeepLift}}, \code{\link{DeepSHAP}}, +\code{\link{ExpectedGradient}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, \code{\link{LIME}}, @@ -186,6 +187,7 @@ are stored in the field \code{result}. converter, data = NULL, output_idx = NULL, + output_label = NULL, channels_first = TRUE, times_input = FALSE, verbose = interactive(), @@ -239,6 +241,33 @@ calculations become more computationally expensive for more output nodes.\cr }} +\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr +These values specify the output nodes for which +the method is to be applied. Only values that were previously passed with +the argument \code{output_names} in the \code{converter} can be used. In order to +allow models with multiple +output layers, there are the following possibilities to select +the names of the output nodes in the individual output layers: +\itemize{ +\item A \code{character} vector or \code{factor} of labels: If the model has only one output +layer, the values correspond to the labels of the output nodes named in the +passed \code{Converter} object, e.g., +\code{c("a", "c", "d")} for the first, third and fourth output node if the +output names are \code{c("a", "b", "c", "d")}. If there are +multiple output layers, the names of the output nodes from the first +output layer are considered. +\item A \code{list} of \code{charactor}/\code{factor} vectors of labels: If the method is to be +applied to output nodes from different layers, a list can be passed +that specifies the desired labels of the output nodes for each +output layer. Unwanted output layers have the entry \code{NULL} instead of +a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and +third output node in the second output layer. +\item \code{NULL} (default): The method is applied to all output nodes in +the first output layer but is limited to the first ten as the +calculations become more computationally expensive for more output +nodes.\cr +}} + \item{\code{channels_first}}{(\code{logical(1)})\cr The channel position of the given data (argument \code{data}). If \code{TRUE}, the channel axis is placed at the second position diff --git a/man/DeepLift.Rd b/man/DeepLift.Rd index 028f86a..2a4300f 100644 --- a/man/DeepLift.Rd +++ b/man/DeepLift.Rd @@ -140,6 +140,7 @@ propagating activation differences.} ICML 2017, p. 4844-4866 Other methods: \code{\link{ConnectionWeights}}, \code{\link{DeepSHAP}}, +\code{\link{ExpectedGradient}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, \code{\link{LIME}}, @@ -194,6 +195,7 @@ the field \code{result}. data, channels_first = TRUE, output_idx = NULL, + output_label = NULL, ignore_last_act = TRUE, rule_name = "rescale", x_ref = NULL, @@ -254,6 +256,33 @@ calculations become more computationally expensive for more output nodes.\cr }} +\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr +These values specify the output nodes for which +the method is to be applied. Only values that were previously passed with +the argument \code{output_names} in the \code{converter} can be used. 
In order to +allow models with multiple +output layers, there are the following possibilities to select +the names of the output nodes in the individual output layers: +\itemize{ +\item A \code{character} vector or \code{factor} of labels: If the model has only one output +layer, the values correspond to the labels of the output nodes named in the +passed \code{Converter} object, e.g., +\code{c("a", "c", "d")} for the first, third and fourth output node if the +output names are \code{c("a", "b", "c", "d")}. If there are +multiple output layers, the names of the output nodes from the first +output layer are considered. +\item A \code{list} of \code{charactor}/\code{factor} vectors of labels: If the method is to be +applied to output nodes from different layers, a list can be passed +that specifies the desired labels of the output nodes for each +output layer. Unwanted output layers have the entry \code{NULL} instead of +a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and +third output node in the second output layer. +\item \code{NULL} (default): The method is applied to all output nodes in +the first output layer but is limited to the first ten as the +calculations become more computationally expensive for more output +nodes.\cr +}} + \item{\code{ignore_last_act}}{(\code{logical(1)})\cr Set this logical value to include the last activation functions for each output layer, or not (default: \code{TRUE}). diff --git a/man/DeepSHAP.Rd b/man/DeepSHAP.Rd index 6256919..75218ba 100644 --- a/man/DeepSHAP.Rd +++ b/man/DeepSHAP.Rd @@ -4,7 +4,7 @@ \alias{DeepSHAP} \title{Deep Shapley additive explanations (DeepSHAP)} \description{ -The \emph{DeepSHAP} method extends the \link{DeepLift} technique by not only +The \emph{DeepSHAP} method extends the \code{\link{DeepLift}} technique by not only considering a single reference value but by calculating the average from several, ideally representative reference values at each layer. The obtained feature-wise results are approximate Shapley values for the @@ -135,6 +135,7 @@ predictions.} NIPS 2017, p. 4768–4777 Other methods: \code{\link{ConnectionWeights}}, \code{\link{DeepLift}}, +\code{\link{ExpectedGradient}}, \code{\link{Gradient}}, \code{\link{IntegratedGradient}}, \code{\link{LIME}}, @@ -193,6 +194,7 @@ stored in the field \code{result}. data, channels_first = TRUE, output_idx = NULL, + output_label = NULL, ignore_last_act = TRUE, rule_name = "rescale", data_ref = NULL, @@ -254,6 +256,33 @@ calculations become more computationally expensive for more output nodes.\cr }} +\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr +These values specify the output nodes for which +the method is to be applied. Only values that were previously passed with +the argument \code{output_names} in the \code{converter} can be used. In order to +allow models with multiple +output layers, there are the following possibilities to select +the names of the output nodes in the individual output layers: +\itemize{ +\item A \code{character} vector or \code{factor} of labels: If the model has only one output +layer, the values correspond to the labels of the output nodes named in the +passed \code{Converter} object, e.g., +\code{c("a", "c", "d")} for the first, third and fourth output node if the +output names are \code{c("a", "b", "c", "d")}. If there are +multiple output layers, the names of the output nodes from the first +output layer are considered. 
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{ignore_last_act}}{(\code{logical(1)})\cr
 Set this logical value to include the last
 activation functions for each output layer, or not (default: \code{TRUE}).
diff --git a/man/ExpectedGradient.Rd b/man/ExpectedGradient.Rd
new file mode 100644
index 0000000..0c20d34
--- /dev/null
+++ b/man/ExpectedGradient.Rd
@@ -0,0 +1,339 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/GradienBased.R
+\name{ExpectedGradient}
+\alias{ExpectedGradient}
+\title{Expected Gradients}
+\description{
+The \emph{Expected Gradients} method (Erion et al., 2021), also known as
+\emph{GradSHAP}, is a local feature attribution technique that extends the
+\code{\link{IntegratedGradient}} method and provides approximate Shapley values. In
+contrast to IntegratedGradient, it considers not only a single reference
+value \eqn{x'} but the whole distribution of reference values
+\eqn{x' \sim X'} and averages the IntegratedGradient values over this
+distribution. Mathematically, the method can be described as follows:
+\deqn{
+E_{x'\sim X', \alpha \sim U(0,1)}[(x - x') \times \frac{\partial f(x' + \alpha (x - x'))}{\partial x}]
+}
+The distribution of the reference values is specified with the argument
+\code{data_ref}, of which \code{n} samples are taken at random for each instance
+during the estimation.
+} +\examples{ +\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} +#----------------------- Example 1: Torch ---------------------------------- +library(torch) + +# Create nn_sequential model and data +model <- nn_sequential( + nn_linear(5, 12), + nn_relu(), + nn_linear(12, 2), + nn_softmax(dim = 2) +) +data <- torch_randn(25, 5) +ref <- torch_randn(1, 5) + +# Create Converter +converter <- Converter$new(model, input_dim = c(5)) + +# Apply method IntegratedGradient +int_grad <- IntegratedGradient$new(converter, data, x_ref = ref) + +# Print the result as a torch tensor for first two data points +get_result(int_grad, "torch.tensor")[1:2] + +# Plot the result for both classes +plot(int_grad, output_idx = 1:2) + +# Plot the boxplot of all datapoints and for both classes +boxplot(int_grad, output_idx = 1:2) + +# ------------------------- Example 2: Neuralnet --------------------------- +if (require("neuralnet")) { + library(neuralnet) + data(iris) + + # Train a neural network + nn <- neuralnet((Species == "setosa") ~ Petal.Length + Petal.Width, + iris, + linear.output = FALSE, + hidden = c(3, 2), act.fct = "tanh", rep = 1 + ) + + # Convert the model + converter <- Converter$new(nn) + + # Apply IntegratedGradient with a reference input of the feature means + x_ref <- matrix(colMeans(iris[, c(3, 4)]), nrow = 1) + int_grad <- IntegratedGradient$new(converter, iris[, c(3, 4)], x_ref = x_ref) + + # Get the result as a dataframe and show first 5 rows + get_result(int_grad, type = "data.frame")[1:5, ] + + # Plot the result for the first datapoint in the data + plot(int_grad, data_idx = 1) + + # Plot the result as boxplots + boxplot(int_grad) +} +\dontshow{\}) # examplesIf} +\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} +# ------------------------- Example 3: Keras ------------------------------- +if (require("keras")) { + library(keras) + + # Make sure keras is installed properly + is_keras_available() + + data <- array(rnorm(10 * 32 * 32 * 3), dim = c(10, 32, 32, 3)) + + model <- keras_model_sequential() + model \%>\% + layer_conv_2d( + input_shape = c(32, 32, 3), kernel_size = 8, filters = 8, + activation = "softplus", padding = "valid") \%>\% + layer_conv_2d( + kernel_size = 8, filters = 4, activation = "tanh", + padding = "same") \%>\% + layer_conv_2d( + kernel_size = 4, filters = 2, activation = "relu", + padding = "valid") \%>\% + layer_flatten() \%>\% + layer_dense(units = 64, activation = "relu") \%>\% + layer_dense(units = 2, activation = "softmax") + + # Convert the model + converter <- Converter$new(model) + + # Apply the IntegratedGradient method with a zero baseline and n = 20 + # iteration steps + int_grad <- IntegratedGradient$new(converter, data, + channels_first = FALSE, + n = 20 + ) + + # Plot the result for the first image and both classes + plot(int_grad, output_idx = 1:2) + + # Plot the result as boxplots for first class + boxplot(int_grad, output_idx = 1) +} +\dontshow{\}) # examplesIf} +\dontshow{if (torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} +#------------------------- Plotly plots ------------------------------------ +if (require("plotly")) { + # You can also create an interactive plot with plotly. 
+  # This is a suggested package, so make sure that it is installed
+  library(plotly)
+  boxplot(int_grad, as_plotly = TRUE)
+}
+\dontshow{\}) # examplesIf}
+}
+\references{
+G. Erion et al. (2021) \emph{Improving performance of deep learning models with}
+\emph{axiomatic attribution priors and expected gradients.} Nature Machine
+Intelligence 3, pp. 620-631.
+}
+\seealso{
+Other methods:
+\code{\link{ConnectionWeights}},
+\code{\link{DeepLift}},
+\code{\link{DeepSHAP}},
+\code{\link{Gradient}},
+\code{\link{IntegratedGradient}},
+\code{\link{LIME}},
+\code{\link{LRP}},
+\code{\link{SHAP}},
+\code{\link{SmoothGrad}}
+}
+\concept{methods}
+\section{Super classes}{
+\code{\link[innsight:InterpretingMethod]{innsight::InterpretingMethod}} -> \code{\link[innsight:GradientBased]{innsight::GradientBased}} -> \code{ExpectedGradient}
+}
+\section{Public fields}{
+\if{html}{\out{
}} +\describe{ +\item{\code{n}}{(\code{integer(1)})\cr +Number of samples from the distribution of reference values and number +of samples for the approximation of the integration path along +\eqn{\alpha} (default: \eqn{50}).\cr} + +\item{\code{data_ref}}{(\code{list})\cr +The reference input for the ExpectedGradient method. This value is +stored as a list of \code{torch_tensor}s of shape \emph{( , dim_in)} for each +input layer.\cr} +} +\if{html}{\out{
}} +} +\section{Methods}{ +\subsection{Public methods}{ +\itemize{ +\item \href{#method-ExpectedGradient-new}{\code{ExpectedGradient$new()}} +\item \href{#method-ExpectedGradient-clone}{\code{ExpectedGradient$clone()}} +} +} +\if{html}{\out{ +
Inherited methods + +
+}} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-ExpectedGradient-new}{}}} +\subsection{Method \code{new()}}{ +Create a new instance of the \code{ExpectedGradient} R6 class. When +initialized, the method \emph{Expected Gradient} is applied to the given +data and baseline values and the results are stored in the field \code{result}. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{ExpectedGradient$new( + converter, + data, + data_ref = NULL, + n = 50, + channels_first = TRUE, + output_idx = NULL, + output_label = NULL, + ignore_last_act = TRUE, + verbose = interactive(), + dtype = "float" +)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{converter}}{(\code{\link{Converter}})\cr +An instance of the \code{Converter} class that includes the +torch-converted model and some other model-specific attributes. See +\code{\link{Converter}} for details.\cr} + +\item{\code{data}}{(\code{\link{array}}, \code{\link{data.frame}}, \code{\link{torch_tensor}} or \code{list})\cr +The data to which the method is to be applied. These must +have the same format as the input data of the passed model to the +converter object. This means either +\itemize{ +\item an \code{array}, \code{data.frame}, \code{torch_tensor} or array-like format of +size \emph{(batch_size, dim_in)}, if e.g., the model has only one input layer, or +\item a \code{list} with the corresponding input data (according to the +upper point) for each of the input layers.\cr +}} + +\item{\code{data_ref}}{(\code{\link{array}}, \code{\link{data.frame}}, \code{\link{torch_tensor}} or \code{list})\cr +The reference inputs for the ExpectedGradient method. This value +must have the same format as the input data of the passed model to the +converter object. This means either +\itemize{ +\item an \code{array}, \code{data.frame}, \code{torch_tensor} or array-like format of +size \emph{( , dim_in)}, if e.g., the model has only one input layer, or +\item a \code{list} with the corresponding input data (according to the upper point) +for each of the input layers. +\item It is also possible to use the default value \code{NULL} to take only +zeros as reference input.\cr +}} + +\item{\code{n}}{(\code{integer(1)})\cr +Number of samples from the distribution of reference values and number +of samples for the approximation of the integration path along +\eqn{\alpha} (default: \eqn{50}).\cr} + +\item{\code{channels_first}}{(\code{logical(1)})\cr +The channel position of the given data (argument +\code{data}). If \code{TRUE}, the channel axis is placed at the second position +between the batch size and the rest of the input axes, e.g., +\code{c(10,3,32,32)} for a batch of ten images with three channels and a +height and width of 32 pixels. Otherwise (\code{FALSE}), the channel axis +is at the last position, i.e., \code{c(10,32,32,3)}. If the data +has no channel axis, use the default value \code{TRUE}.\cr} + +\item{\code{output_idx}}{(\code{integer}, \code{list} or \code{NULL})\cr +These indices specify the output nodes for which +the method is to be applied. In order to allow models with multiple +output layers, there are the following possibilities to select +the indices of the output nodes in the individual output layers: +\itemize{ +\item An \code{integer} vector of indices: If the model has only one output +layer, the values correspond to the indices of the output nodes, e.g., +\code{c(1,3,4)} for the first, third and fourth output node. If there are +multiple output layers, the indices of the output nodes from the first +output layer are considered. +\item A \code{list} of \code{integer} vectors of indices: If the method is to be +applied to output nodes from different layers, a list can be passed +that specifies the desired indices of the output nodes for each +output layer. Unwanted output layers have the entry \code{NULL} instead of +a vector of indices, e.g., \code{list(NULL, c(1,3))} for the first and +third output node in the second output layer. 
+\item \code{NULL} (default): The method is applied to all output nodes in +the first output layer but is limited to the first ten as the +calculations become more computationally expensive for more output +nodes.\cr +}} + +\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr +These values specify the output nodes for which +the method is to be applied. Only values that were previously passed with +the argument \code{output_names} in the \code{converter} can be used. In order to +allow models with multiple +output layers, there are the following possibilities to select +the names of the output nodes in the individual output layers: +\itemize{ +\item A \code{character} vector or \code{factor} of labels: If the model has only one output +layer, the values correspond to the labels of the output nodes named in the +passed \code{Converter} object, e.g., +\code{c("a", "c", "d")} for the first, third and fourth output node if the +output names are \code{c("a", "b", "c", "d")}. If there are +multiple output layers, the names of the output nodes from the first +output layer are considered. +\item A \code{list} of \code{charactor}/\code{factor} vectors of labels: If the method is to be +applied to output nodes from different layers, a list can be passed +that specifies the desired labels of the output nodes for each +output layer. Unwanted output layers have the entry \code{NULL} instead of +a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and +third output node in the second output layer. +\item \code{NULL} (default): The method is applied to all output nodes in +the first output layer but is limited to the first ten as the +calculations become more computationally expensive for more output +nodes.\cr +}} + +\item{\code{ignore_last_act}}{(\code{logical(1)})\cr +Set this logical value to include the last +activation functions for each output layer, or not (default: \code{TRUE}). +In practice, the last activation (especially for softmax activation) is +often omitted.\cr} + +\item{\code{verbose}}{(\code{logical(1)})\cr +This logical argument determines whether a progress bar is +displayed for the calculation of the method or not. The default value is +the output of the primitive R function \code{\link[=interactive]{interactive()}}.\cr} + +\item{\code{dtype}}{(\code{character(1)})\cr +The data type for the calculations. Use +either \code{'float'} for \link{torch_float} or \code{'double'} for +\link{torch_double}.\cr} +} +\if{html}{\out{
}} +} +} +\if{html}{\out{
}} +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-ExpectedGradient-clone}{}}} +\subsection{Method \code{clone()}}{ +The objects of this class are cloneable with this method. +\subsection{Usage}{ +\if{html}{\out{
}}\preformatted{ExpectedGradient$clone(deep = FALSE)}\if{html}{\out{
}} +} + +\subsection{Arguments}{ +\if{html}{\out{
}} +\describe{ +\item{\code{deep}}{Whether to make a deep clone.} +} +\if{html}{\out{
}}
+}
+}
+}
diff --git a/man/Gradient.Rd b/man/Gradient.Rd
index e868209..430fad7 100644
--- a/man/Gradient.Rd
+++ b/man/Gradient.Rd
@@ -135,6 +135,7 @@ Other methods:
 \code{\link{ConnectionWeights}},
 \code{\link{DeepLift}},
 \code{\link{DeepSHAP}},
+\code{\link{ExpectedGradient}},
 \code{\link{IntegratedGradient}},
 \code{\link{LIME}},
 \code{\link{LRP}},
@@ -175,6 +176,7 @@ given data and the results are stored in the field \code{result}.
   data,
   channels_first = TRUE,
   output_idx = NULL,
+  output_label = NULL,
   ignore_last_act = TRUE,
   times_input = FALSE,
   verbose = interactive(),
@@ -233,6 +235,33 @@ calculations become more computationally expensive for more output
 nodes.\cr
 }}
 
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{ignore_last_act}}{(\code{logical(1)})\cr
 Set this logical value to include the last activation functions for each
 output layer, or not (default: \code{TRUE}).
diff --git a/man/GradientBased.Rd b/man/GradientBased.Rd
index 00cde36..4aa4ed3 100644
--- a/man/GradientBased.Rd
+++ b/man/GradientBased.Rd
@@ -56,6 +56,7 @@ the field \code{result}.
   data,
   channels_first = TRUE,
   output_idx = NULL,
+  output_label = NULL,
   ignore_last_act = TRUE,
   times_input = TRUE,
   verbose = interactive(),
@@ -114,6 +115,33 @@ calculations become more computationally expensive for more output
 nodes.\cr
 }}
 
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{ignore_last_act}}{(\code{logical(1)})\cr
 Set this logical value to include the last activation functions for each
 output layer, or not (default: \code{TRUE}).
diff --git a/man/IntegratedGradient.Rd b/man/IntegratedGradient.Rd
index 463b3a9..e9c7498 100644
--- a/man/IntegratedGradient.Rd
+++ b/man/IntegratedGradient.Rd
@@ -143,6 +143,7 @@ Other methods:
 \code{\link{ConnectionWeights}},
 \code{\link{DeepLift}},
 \code{\link{DeepSHAP}},
+\code{\link{ExpectedGradient}},
 \code{\link{Gradient}},
 \code{\link{LIME}},
 \code{\link{LRP}},
@@ -200,6 +201,7 @@ data and baseline value and the results are stored in the field \code{result}.
   times_input = TRUE,
   channels_first = TRUE,
   output_idx = NULL,
+  output_label = NULL,
   ignore_last_act = TRUE,
   verbose = interactive(),
   dtype = "float"
@@ -281,6 +283,33 @@ calculations become more computationally expensive for more output
 nodes.\cr
 }}
 
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{ignore_last_act}}{(\code{logical(1)})\cr
 Set this logical value to include the last activation functions for each
 output layer, or not (default: \code{TRUE}).
diff --git a/man/InterpretingMethod.Rd b/man/InterpretingMethod.Rd
index 1050c64..15d4d36 100644
--- a/man/InterpretingMethod.Rd
+++ b/man/InterpretingMethod.Rd
@@ -82,6 +82,14 @@ have the entry \code{NULL} instead of a vector of indices, e.g.,
 \code{list(NULL, c(1,3))} for the first and third output node in the
 second output layer.\cr}
 
+\item{\code{output_label}}{(\code{list})\cr
+This list of \code{factors} specifies the output nodes to which
+the method is to be applied. In the order of the output layers, the list
+contains the respective output node labels, and unwanted output layers
+have the entry \code{NULL} instead of a vector of labels,
+e.g., \code{list(NULL, c("a", "c"))} for the first and third output node in the
+second output layer.\cr}
+
 \item{\code{verbose}}{(\code{logical(1)})\cr
 This logical value determines whether a progress bar is displayed for
 the calculation of the method or not. The default value is
@@ -122,6 +130,7 @@ Create a new instance of this super class.
   data,
   channels_first = TRUE,
   output_idx = NULL,
+  output_label = NULL,
   ignore_last_act = TRUE,
   winner_takes_all = TRUE,
   verbose = interactive(),
@@ -180,6 +189,33 @@ calculations become more computationally expensive for more output
 nodes.\cr
 }}
 
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{ignore_last_act}}{(\code{logical(1)})\cr
 Set this logical value to include the last activation functions for each
 output layer, or not (default: \code{TRUE}).
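The `output_label` argument documented above behaves identically across all methods and mirrors `output_idx`, only with node names instead of positions. As a rough usage sketch (hypothetical model, data and names; the node names are whatever was passed as `output_names` to the `Converter`):

conv <- Converter$new(model, input_dim = c(4),
                      output_names = c("setosa", "versicolor", "virginica"))

# Select the first and third output node by label ...
grad <- Gradient$new(conv, data, output_label = c("setosa", "virginica"))

# ... which addresses the same nodes as the index-based call
grad <- Gradient$new(conv, data, output_idx = c(1, 3))

# With multiple output layers, pass a list and mark unwanted layers with
# NULL, e.g., nodes "a" and "c" of a second output layer:
# grad <- Gradient$new(conv, data, output_label = list(NULL, c("a", "c")))

Passing both arguments at once is only accepted when they point to the same nodes; mismatches raise an error, as the tests further below exercise.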
diff --git a/man/LIME.Rd b/man/LIME.Rd
index e312e4f..d9eb586 100644
--- a/man/LIME.Rd
+++ b/man/LIME.Rd
@@ -112,6 +112,7 @@ Other methods:
 \code{\link{ConnectionWeights}},
 \code{\link{DeepLift}},
 \code{\link{DeepSHAP}},
+\code{\link{ExpectedGradient}},
 \code{\link{Gradient}},
 \code{\link{IntegratedGradient}},
 \code{\link{LRP}},
@@ -154,6 +155,7 @@ stored in the field \code{result}.
   output_type = NULL,
   pred_fun = NULL,
   output_idx = NULL,
+  output_label = NULL,
   channels_first = TRUE,
   input_dim = NULL,
   input_names = NULL,
@@ -230,6 +232,33 @@ calculations become more computationally expensive for more output
 nodes.\cr
 }}
 
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{channels_first}}{(\code{logical(1)})\cr
 The channel position of the given data (argument \code{data}).
 If \code{TRUE}, the channel axis is placed at the second position
diff --git a/man/LRP.Rd b/man/LRP.Rd
index e150812..6175ccd 100644
--- a/man/LRP.Rd
+++ b/man/LRP.Rd
@@ -154,6 +154,7 @@ Other methods:
 \code{\link{ConnectionWeights}},
 \code{\link{DeepLift}},
 \code{\link{DeepSHAP}},
+\code{\link{ExpectedGradient}},
 \code{\link{Gradient}},
 \code{\link{IntegratedGradient}},
 \code{\link{LIME}},
@@ -219,6 +220,7 @@ the field \code{result}.
   data,
   channels_first = TRUE,
   output_idx = NULL,
+  output_label = NULL,
   ignore_last_act = TRUE,
   rule_name = "simple",
   rule_param = NULL,
@@ -279,6 +281,33 @@ calculations become more computationally expensive for more output
 nodes.\cr
 }}
 
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{ignore_last_act}}{(\code{logical(1)})\cr
 Set this logical value to include the last activation functions for each
 output layer, or not (default: \code{TRUE}).
diff --git a/man/SHAP.Rd b/man/SHAP.Rd
index 7393c76..aaf7538 100644
--- a/man/SHAP.Rd
+++ b/man/SHAP.Rd
@@ -108,6 +108,7 @@ Other methods:
 \code{\link{ConnectionWeights}},
 \code{\link{DeepLift}},
 \code{\link{DeepSHAP}},
+\code{\link{ExpectedGradient}},
 \code{\link{Gradient}},
 \code{\link{IntegratedGradient}},
 \code{\link{LIME}},
@@ -149,6 +150,7 @@ stored in the field \code{result}.
   x,
   pred_fun = NULL,
   output_idx = NULL,
+  output_label = NULL,
   channels_first = TRUE,
   input_dim = NULL,
   input_names = NULL,
@@ -221,6 +223,33 @@ calculations become more computationally expensive for more output
 nodes.\cr
 }}
 
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{channels_first}}{(\code{logical(1)})\cr
 The channel position of the given data (argument \code{data}).
 If \code{TRUE}, the channel axis is placed at the second position
diff --git a/man/SmoothGrad.Rd b/man/SmoothGrad.Rd
index 21d351c..77f7a3c 100644
--- a/man/SmoothGrad.Rd
+++ b/man/SmoothGrad.Rd
@@ -134,6 +134,7 @@ Other methods:
 \code{\link{ConnectionWeights}},
 \code{\link{DeepLift}},
 \code{\link{DeepSHAP}},
+\code{\link{ExpectedGradient}},
 \code{\link{Gradient}},
 \code{\link{IntegratedGradient}},
 \code{\link{LIME}},
@@ -186,6 +187,7 @@ the given data and the results are stored in the field \code{result}.
   data,
   channels_first = TRUE,
   output_idx = NULL,
+  output_label = NULL,
   ignore_last_act = TRUE,
   times_input = FALSE,
   n = 50,
@@ -246,6 +248,33 @@ calculations become more computationally expensive for more output
 nodes.\cr
 }}
 
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
 \item{\code{ignore_last_act}}{(\code{logical(1)})\cr
 Set this logical value to include the last activation functions for each
 output layer, or not (default: \code{TRUE}).
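Under the hood, label-based selection only needs to map the given labels onto positions among the node names stored in the converter; patch 07 further below introduces a helper, `transform_label_to_idx()`, that does this per output layer via `match()`. A simplified, self-contained sketch of that mapping (illustrative values, not the package's exact code):

# Names the converter knows for the (single) output layer
out_names <- c("setosa", "versicolor", "virginica")

# Labels passed by the user via `output_label`
output_label <- c("setosa", "virginica")

# Position matching yields the equivalent `output_idx`
output_idx <- match(output_label, out_names)
output_idx
#> [1] 1 3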
diff --git a/tests/testthat/test_ConnectionWeights.R b/tests/testthat/test_ConnectionWeights.R index 839e549..dc77749 100644 --- a/tests/testthat/test_ConnectionWeights.R +++ b/tests/testthat/test_ConnectionWeights.R @@ -144,8 +144,6 @@ test_that("ConnectionWeights (global): Conv1D-Net", { # get_result method result <- cw_first$get_result() expect_equal(dim(result), c(1,3,64,4)) - expect_equal(dimnames(result), - c(list(NULL), converter$input_names[[1]], converter$output_names[[1]])) result <- cw_first$get_result("torch.tensor") expect_equal(dim(result), c(1,3,64,4)) result <- cw_first$get_result("data.frame") @@ -229,8 +227,6 @@ test_that("ConnectionWeights (global): Conv2D-Net", { # get_result method result <- cw_first$get_result() expect_equal(dim(result), c(1,3,32,32,5)) - expect_equal(dimnames(result), - c(list(NULL), converter$input_names[[1]], converter$output_names[[1]])) result <- cw_first$get_result("torch.tensor") expect_equal(dim(result), c(1,3,32,32,5)) result <- cw_first$get_result("data.frame") @@ -395,8 +391,6 @@ test_that("ConnectionWeights (local): Dense-Net", { # get_result method result <- cw_first$get_result() expect_equal(dim(result), c(10,4,3)) - expect_equal(dimnames(result), - c(list(NULL), converter$input_names[[1]], converter$output_names[[1]])) result <- cw_first$get_result("torch.tensor") expect_equal(dim(result), c(10,4,3)) result <- cw_first$get_result("data.frame") @@ -482,8 +476,6 @@ test_that("ConnectionWeights (local): Conv1D-Net", { # get_result method result <- cw_first$get_result() expect_equal(dim(result), c(10,3,64,4)) - expect_equal(dimnames(result), - c(list(NULL), converter$input_names[[1]], converter$output_names[[1]])) result <- cw_first$get_result("torch.tensor") expect_equal(dim(result), c(10,3,64,4)) result <- cw_first$get_result("data.frame") @@ -575,8 +567,6 @@ test_that("ConnectionWeights (local): Conv2D-Net", { # get_result method result <- cw_first$get_result() expect_equal(dim(result), c(10,3,32,32,5)) - expect_equal(dimnames(result), - c(list(NULL), converter$input_names[[1]], converter$output_names[[1]])) result <- cw_first$get_result("torch.tensor") expect_equal(dim(result), c(10,3,32,32,5)) result <- cw_first$get_result("data.frame") diff --git a/tests/testthat/test_DeepSHAP.R b/tests/testthat/test_DeepSHAP.R index 473e28e..a9b0c12 100644 --- a/tests/testthat/test_DeepSHAP.R +++ b/tests/testthat/test_DeepSHAP.R @@ -145,12 +145,12 @@ test_that("DeepSHAP: Dense-Net (Neuralnet)", { TRUE, TRUE, TRUE, TRUE) last_layer <- rev(converter$model$modules_list)[[1]] - contrib_true <- last_layer$output - last_layer$output_ref + contrib_true <- last_layer$output - last_layer$output_ref$mean(dim = 1, keepdim = TRUE) contrib_no_last_act_true <- - last_layer$preactivation - last_layer$preactivation_ref + last_layer$preactivation - last_layer$preactivation_ref$mean(dim = 1, keepdim = TRUE) first_layer <- converter$model$modules_list[[1]] - input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + input_diff <- (first_layer$input - first_layer$input_ref$mean(dim = 1, keepdim = TRUE))$unsqueeze(-1) deepshap_rescale <- d$get_result(type = "torch.tensor") @@ -234,12 +234,12 @@ test_that("DeepSHAP: Dense-Net (keras)", { converter$model$update_ref(torch_tensor(data_ref), TRUE, TRUE, TRUE, TRUE) last_layer <- rev(converter$model$modules_list)[[1]] - contrib_true <- last_layer$output - last_layer$output_ref + contrib_true <- last_layer$output - last_layer$output_ref$mean(dim = 1, keepdim = TRUE) contrib_no_last_act_true <- - 
last_layer$preactivation - last_layer$preactivation_ref + last_layer$preactivation - last_layer$preactivation_ref$mean(dim = 1, keepdim = TRUE) first_layer <- converter$model$modules_list[[1]] - input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + input_diff <- (first_layer$input - first_layer$input_ref$mean(dim = 1, keepdim = TRUE))$unsqueeze(-1) deepshap_rescale <- d$get_result(type = "torch.tensor") @@ -330,12 +330,12 @@ test_that("DeepSHAP: Conv1D-Net", { converter$model$update_ref(torch_tensor(data_ref), FALSE, TRUE, TRUE, TRUE) last_layer <- rev(converter$model$modules_list)[[1]] - contrib_true <- last_layer$output - last_layer$output_ref + contrib_true <- last_layer$output - last_layer$output_ref$mean(dim = 1, keepdim = TRUE) contrib_no_last_act_true <- - last_layer$preactivation - last_layer$preactivation_ref + last_layer$preactivation - last_layer$preactivation_ref$mean(dim = 1, keepdim = TRUE) first_layer <- converter$model$modules_list[[1]] - input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + input_diff <- (first_layer$input - first_layer$input_ref$mean(dim = 1, keepdim = TRUE))$unsqueeze(-1) deepshap_rescale <- d$get_result(type = "torch.tensor") @@ -429,16 +429,14 @@ test_that("DeepSHAP: Conv2D-Net", { converter$model$update_ref(torch_tensor(data_ref), FALSE, TRUE, TRUE, TRUE) last_layer <- rev(converter$model$modules_list)[[1]] - contrib_true <- last_layer$output - last_layer$output_ref + contrib_true <- last_layer$output - last_layer$output_ref$mean(dim = 1, keepdim = TRUE) contrib_no_last_act_true <- - last_layer$preactivation - last_layer$preactivation_ref + last_layer$preactivation - last_layer$preactivation_ref$mean(dim = 1, keepdim = TRUE) first_layer <- converter$model$modules_list[[1]] - input_diff <- (first_layer$input - first_layer$input_ref)$unsqueeze(-1) + input_diff <- (first_layer$input - first_layer$input_ref$mean(dim = 1, keepdim = TRUE))$unsqueeze(-1) deepshap_rescale <- d$get_result(type = "torch.tensor") - as.array(mean(abs(deepshap_rescale$sum(dim = c(2, 3, 4)) - - contrib_true)^2)) expect_equal(dim(deepshap_rescale), c(4, 16, 16, 3, 2)) expect_lt(as.array(mean(abs(deepshap_rescale$sum(dim = c(2, 3, 4)) - diff --git a/tests/testthat/test_Gradients.R b/tests/testthat/test_Gradients.R index a3acb81..04683ef 100644 --- a/tests/testthat/test_Gradients.R +++ b/tests/testthat/test_Gradients.R @@ -13,10 +13,24 @@ test_that("Gradient: Plot and Boxplot", { # create an converter for this model converter <- Converter$new(nn) - # Rescale Rule - grad <- Gradient$new(converter, data, - dtype = "double", - ) + expect_error(Gradient$new(converter, data, output_idx = c(1, 10))) + Gradient$new(converter, data, output_idx = c(1, 2)) + expect_error(Gradient$new(converter, data, output_idx = list(c(1, 10)))) + Gradient$new(converter, data, output_idx = list(c(1, 3))) + expect_error(Gradient$new(converter, data, output_idx = list(NULL, c(1, 2)))) + expect_error(Gradient$new(converter, data, output_label = c(1, 2))) + expect_error(Gradient$new(converter, data, output_label = c("A", "b"))) + Gradient$new(converter, data, output_label = c("setosa", "virginica")) + Gradient$new(converter, data, output_label = as.factor(c("setosa", "virginica"))) + Gradient$new(converter, data, output_label = list(c("setosa", "virginica"))) + expect_error(Gradient$new(converter, data, + output_label = c("setosa", "virginica"), + output_idx = c(1, 2))) + Gradient$new(converter, data, output_label = c("setosa", "virginica"), + output_idx = c(1, 3)) + + # 
Create method object
+  grad <- Gradient$new(converter, data, dtype = "double")
 
 # ggplot2
diff --git a/tests/testthat/test_LIME.R b/tests/testthat/test_LIME.R
index 9514758..89bddc0 100644
--- a/tests/testthat/test_LIME.R
+++ b/tests/testthat/test_LIME.R
@@ -26,6 +26,23 @@ test_that("LIME: General errors", {
   expect_error(LIME$new(model, data, data[1:2, ], output_names = c("a", "d"))) # wrong output_names
   LIME$new(model, data, data[1:2, ], output_names = factor(c("a", "d", "c")))
 
+  expect_error(LIME$new(model, data, data[1:2, ], output_idx = c(1, 10)))
+  LIME$new(model, data, data[1:2, ], output_idx = c(1, 2))
+  expect_error(LIME$new(model, data, data[1:2, ], output_idx = list(c(1, 10))))
+  LIME$new(model, data, data[1:2, ], output_idx = list(c(1, 3)))
+  expect_error(LIME$new(model, data, data[1:2, ], output_idx = list(NULL, c(1, 2))))
+  expect_error(LIME$new(model, data, data[1:2, ], output_label = c(1, 2)))
+  expect_error(LIME$new(model, data, data[1:2, ], output_label = c("A", "b")))
+  LIME$new(model, data, data[1:2, ], output_label = c("setosa", "virginica"))
+  LIME$new(model, data, data[1:2, ], output_label = as.factor(c("setosa", "virginica")))
+  LIME$new(model, data, data[1:2, ], output_label = list(c("setosa", "virginica")))
+  expect_error(LIME$new(model, data, data[1:2, ],
+                        output_label = c("setosa", "virginica"),
+                        output_idx = c(1, 2)))
+  LIME$new(model, data, data[1:2, ],
+           output_label = c("setosa", "virginica"),
+           output_idx = c(1, 3))
+
   # Forwarding arguments to lime::explain
   lime <- LIME$new(model, data, data[1:10, ], n_permutations = 100,
                    gower_power = 3)

From 352402552b87ca710b618e936d35672802b071a6 Mon Sep 17 00:00:00 2001
From: nkoenen
Date: Mon, 27 Nov 2023 22:33:30 +0100
Subject: [PATCH 06/20] Add syntactic sugar for object construction

---
 DESCRIPTION           |   1 +
 NAMESPACE             |  11 +++
 R/innsight_sugar.R    | 182 ++++++++++++++++++++++++++++++++++++++++++
 man/innsight_sugar.Rd | 121 ++++++++++++++++++++++++++++
 4 files changed, 315 insertions(+)
 create mode 100644 R/innsight_sugar.R
 create mode 100644 man/innsight_sugar.Rd

diff --git a/DESCRIPTION b/DESCRIPTION
index a8c5a25..bdccf76 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -80,5 +80,6 @@ Collate:
     'utils.R'
     'utils_ggplot.R'
     'utils_plotly.R'
+    'innsight_sugar.R'
     'innsight_ggplot2.R'
     'innsight_plotly.R'
diff --git a/NAMESPACE b/NAMESPACE
index 254cd91..4633ade 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -21,7 +21,18 @@ export(LIME)
 export(LRP)
 export(SHAP)
 export(SmoothGrad)
+export(convert)
 export(get_result)
+export(run_cw)
+export(run_deeplift)
+export(run_deepshap)
+export(run_expgrad)
+export(run_grad)
+export(run_intgrad)
+export(run_lime)
+export(run_lrp)
+export(run_shap)
+export(run_smoothgrad)
 exportMethods("+")
 exportMethods("[")
 exportMethods("[<-")
diff --git a/R/innsight_sugar.R b/R/innsight_sugar.R
new file mode 100644
index 0000000..dcfcb37
--- /dev/null
+++ b/R/innsight_sugar.R
@@ -0,0 +1,182 @@
+#' @title Syntactic sugar for object construction
+#'
+#' @name innsight_sugar
+#' @description
+#'
+#' Since all methods and the preceding conversion step in the `innsight`
+#' package were implemented using R6 classes and these always require a call
+#' to `classname$new()` for initialization, the following functions are
+#' defined to shorten the construction of the corresponding R6 objects:
+#'
+#' * `convert()` for [`Converter`]
+#' * `run_grad()` for [`Gradient`]
+#' * `run_smoothgrad()` for [`SmoothGrad`]
+#' * `run_intgrad()` for [`IntegratedGradient`]
+#' * `run_expgrad()` for [`ExpectedGradient`]
+#' * `run_lrp()` for [`LRP`]
+#' * `run_deeplift()` for [`DeepLift`]
+#' * `run_deepshap()` for [`DeepSHAP`]
+#' * `run_cw()` for [`ConnectionWeights`]
+#' * `run_lime()` for [`LIME`]
+#' * `run_shap()` for [`SHAP`]
+#'
+#' @template param-converter
+#' @template param-x-agnostic
+#'
+#' @param model ([`nn_sequential`], \code{\link[keras]{keras_model}},
+#' \code{\link[neuralnet]{neuralnet}} or `list`)\cr
+#' A trained neural network for classification or regression
+#' tasks to be interpreted. Only models from the following types or
+#' packages are allowed: \code{\link[torch]{nn_sequential}},
+#' \code{\link[keras]{keras_model}},
+#' \code{\link[keras]{keras_model_sequential}},
+#' \code{\link[neuralnet]{neuralnet}} or a named list (see details).\cr
+#' **Note:** For the model-agnostic methods, an arbitrary fitted model for a
+#' classification or regression task can be passed. A [`Converter`] object can
+#' also be passed. In order for the package to know how to make predictions
+#' with the given model, a prediction function must also be passed with
+#' the argument `pred_fun`. However, for models created by
+#' \code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}},
+#' \code{\link[neuralnet]{neuralnet}} or [`Converter`],
+#' these have already been pre-implemented and do not need to be
+#' specified.\cr
+#' @param data ([`array`], [`data.frame`], [`torch_tensor`] or `list`)\cr
+#' The data to which the method is to be applied. These must
+#' have the same format as the input data of the passed model to the
+#' converter object. This means either
+#' \itemize{
+#' \item an `array`, `data.frame`, `torch_tensor` or array-like format of
+#' size *(batch_size, dim_in)*, if e.g., the model has only one input layer, or
+#' \item a `list` with the corresponding input data (according to the
+#' upper point) for each of the input layers.
+#' }
+#' **Note:** For the model-agnostic methods, only models with a single
+#' input and output layer are allowed!\cr
+#' @param ... Other arguments passed to the individual constructor functions
+#' of the methods' R6 classes.
+#'
+#' @return [R6::R6Class] object of the respective type.
+#'
+NULL
+
+
+#' @rdname innsight_sugar
+#'
+#' @usage
+#' # Create a new `Converter` object of the given `model`
+#' convert(model, ...)
+#'
+#' @export
+convert = function(model, ...) {
+  Converter$new(model, ...)
+}
+
+#' @rdname innsight_sugar
+#'
+#' @usage
+#' # Apply the `Gradient` method to the passed `data` to be explained
+#' run_grad(converter, data, ...)
+#'
+#' @export
+run_grad = function(converter, data, ...) {
+  Gradient$new(converter, data, ...)
+}
+
+#' @rdname innsight_sugar
+#'
+#' @usage
+#' # Apply the `SmoothGrad` method to the passed `data` to be explained
+#' run_smoothgrad(converter, data, ...)
+#'
+#' @export
+run_smoothgrad = function(converter, data, ...) {
+  SmoothGrad$new(converter, data, ...)
+}
+
+#' @rdname innsight_sugar
+#'
+#' @usage
+#' # Apply the `IntegratedGradient` method to the passed `data` to be explained
+#' run_intgrad(converter, data, ...)
+#'
+#' @export
+run_intgrad = function(converter, data, ...) {
+  IntegratedGradient$new(converter, data, ...)
+}
+
+#' @rdname innsight_sugar
+#'
+#' @usage
+#' # Apply the `ExpectedGradient` method to the passed `data` to be explained
+#' run_expgrad(converter, data, ...)
+#'
+#' @export
+run_expgrad = function(converter, data, ...) {
+  ExpectedGradient$new(converter, data, ...)
+} + +#' @rdname innsight_sugar +#' +#' @usage +#' # Apply the `LRP` method to the passed `data` to be explained +#' run_lrp(converter, data, ...) +#' +#' @export +run_lrp = function(converter, data, ...) { + LRP$new(converter, data, ...) +} + +#' @rdname innsight_sugar +#' +#' @usage +#' # Apply the `DeepLift` method to the passed `data` to be explained +#' run_deeplift(converter, data, ...) +#' +#' @export +run_deeplift = function(converter, data, ...) { + DeepLift$new(converter, data, ...) +} + +#' @rdname innsight_sugar +#' +#' @usage +#' # Apply the `DeepSHAP` method to the passed `data` to be explained +#' run_deepshap(converter, data, ...) +#' +#' @export +run_deepshap = function(converter, data, ...) { + DeepSHAP$new(converter, data, ...) +} + +#' @rdname innsight_sugar +#' +#' @usage +#' # Apply the `ConnectionWeights` method (argument `data` is not always required) +#' run_cw(converter, ...) +#' +#' @export +run_cw = function(converter, ...) { + ConnectionWeights$new(converter, ...) +} + +#' @rdname innsight_sugar +#' +#' @usage +#' # Apply the `LIME` method to explain `x` by using the dataset `data` +#' run_lime(model, data, x, ...) +#' +#' @export +run_lime = function(model, data, x, ...) { + LIME$new(model, data, x, ...) +} + +#' @rdname innsight_sugar +#' +#' @usage +#' # Apply the `SHAP` method to explain `x` by using the dataset `data` +#' run_shap(model, data, x, ...) +#' +#' @export +run_shap = function(model, data, x, ...) { + SHAP$new(model, data, x, ...) +} diff --git a/man/innsight_sugar.Rd b/man/innsight_sugar.Rd new file mode 100644 index 0000000..b286add --- /dev/null +++ b/man/innsight_sugar.Rd @@ -0,0 +1,121 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/innsight_sugar.R +\name{innsight_sugar} +\alias{innsight_sugar} +\alias{convert} +\alias{run_grad} +\alias{run_smoothgrad} +\alias{run_intgrad} +\alias{run_expgrad} +\alias{run_lrp} +\alias{run_deeplift} +\alias{run_deepshap} +\alias{run_cw} +\alias{run_lime} +\alias{run_shap} +\title{Syntactic sugar for object construction} +\usage{ +# Create a new `Converter` object of the given `model` +convert(model, ...) + +# Apply the `Gradient` method to the passed `data` to be explained +run_grad(converter, data, ...) + +# Apply the `SmoothGrad` method to the passed `data` to be explained +run_smoothgrad(converter, data, ...) + +# Apply the `IntegratedGradient` method to the passed `data` to be explained +run_intgrad(converter, data, ...) + +# Apply the `ExpectedGradient` method to the passed `data` to be explained +run_expgrad(converter, data, ...) + +# Apply the `LRP` method to the passed `data` to be explained +run_lrp(converter, data, ...) + +# Apply the `DeepLift` method to the passed `data` to be explained +run_deeplift(converter, data, ...) + +# Apply the `DeepSHAP` method to the passed `data` to be explained +run_deepshap(converter, data, ...) + +# Apply the `ConnectionWeights` method (argument `data` is not always required) +run_cw(converter, ...) + +# Apply the `LIME` method to explain `x` by using the dataset `data` +run_lime(model, data, x, ...) + +# Apply the `SHAP` method to explain `x` by using the dataset `data` +run_shap(model, data, x, ...) +} +\arguments{ +\item{model}{(\code{\link{nn_sequential}}, \code{\link[keras]{keras_model}}, +\code{\link[neuralnet]{neuralnet}} or \code{list})\cr +A trained neural network for classification or regression +tasks to be interpreted. 
Only models from the following types or
+packages are allowed: \code{\link[torch]{nn_sequential}},
+\code{\link[keras]{keras_model}},
+\code{\link[keras]{keras_model_sequential}},
+\code{\link[neuralnet]{neuralnet}} or a named list (see details).\cr
+\strong{Note:} For the model-agnostic methods, an arbitrary fitted model for a
+classification or regression task can be passed. A \code{\link{Converter}} object can
+also be passed. In order for the package to know how to make predictions
+with the given model, a prediction function must also be passed with
+the argument \code{pred_fun}. However, for models created by
+\code{\link[torch]{nn_sequential}}, \code{\link[keras]{keras_model}},
+\code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}},
+these have already been pre-implemented and do not need to be
+specified.\cr}
+
+\item{...}{Other arguments passed to the individual constructor functions
+of the methods' R6 classes.}
+
+\item{converter}{(\code{\link{Converter}})\cr
+An instance of the \code{Converter} class that includes the
+torch-converted model and some other model-specific attributes. See
+\code{\link{Converter}} for details.\cr}
+
+\item{data}{(\code{\link{array}}, \code{\link{data.frame}}, \code{\link{torch_tensor}} or \code{list})\cr
+The data to which the method is to be applied. These must
+have the same format as the input data of the passed model to the
+converter object. This means either
+\itemize{
+\item an \code{array}, \code{data.frame}, \code{torch_tensor} or array-like format of
+size \emph{(batch_size, dim_in)}, if e.g., the model has only one input layer, or
+\item a \code{list} with the corresponding input data (according to the
+upper point) for each of the input layers.
+}
+\strong{Note:} For the model-agnostic methods, only models with a single
+input and output layer are allowed!\cr}
+
+\item{x}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr
+The individual instances to be explained by the method.
+These must have the same format as the input data of the passed model
+and have to be either a \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a
+\code{\link{torch_tensor}}. If no value is specified, all instances in the
+dataset \code{data} will be explained.\cr
+\strong{Note:} For the model-agnostic methods, only models with a single
+input and output layer are allowed!\cr}
+}
+\value{
+\link[R6:R6Class]{R6::R6Class} object of the respective type.
+}
+\description{
+Since all methods and the preceding conversion step in the \code{innsight}
+package were implemented using R6 classes and these always require a call
+to \code{classname$new()} for initialization, the following functions are
+defined to shorten the construction of the corresponding R6 objects:
+\itemize{
+\item \code{convert()} for \code{\link{Converter}}
+\item \code{run_grad()} for \code{\link{Gradient}}
+\item \code{run_smoothgrad()} for \code{\link{SmoothGrad}}
+\item \code{run_intgrad()} for \code{\link{IntegratedGradient}}
+\item \code{run_expgrad()} for \code{\link{ExpectedGradient}}
+\item \code{run_lrp()} for \code{\link{LRP}}
+\item \code{run_deeplift()} for \code{\link{DeepLift}}
+\item \code{run_deepshap()} for \code{\link{DeepSHAP}}
+\item \code{run_cw()} for \code{\link{ConnectionWeights}}
+\item \code{run_lime()} for \code{\link{LIME}}
+\item \code{run_shap()} for \code{\link{SHAP}}
+}
+}

From 91b47b1c81a4e94a8535e94dabfb109e13c41e03 Mon Sep 17 00:00:00 2001
From: nkoenen
Date: Thu, 14 Dec 2023 13:18:31 +0100
Subject: [PATCH 07/20] Save the prediction and the decomposition goal of each
 interpreting method and show them in the plots generated by plot()

---
 R/AgnosticWrapper.R       |   4 +
 R/GradienBased.R          |  86 ++++++++++++++++
 R/InterpretingMethod.R    | 211 +++++++++++++++++++++++++++++++++-----
 R/utils_ggplot.R          |  68 ++++++++++--
 man/InterpretingMethod.Rd |  91 +++++++++++++++-
 5 files changed, 426 insertions(+), 34 deletions(-)

diff --git a/R/AgnosticWrapper.R b/R/AgnosticWrapper.R
index 6264258..56d59b3 100644
--- a/R/AgnosticWrapper.R
+++ b/R/AgnosticWrapper.R
@@ -88,6 +88,10 @@ AgnosticWrapper <- R6Class(
     # Save the original data to be explained
     self$x <- x
 
+    # Calculate predictions
+    self$preds <- list(self$converter$pred_fun(as.data.frame(x),
+                                               input_dim = self$converter$input_dim[[1]]))
+
     # Save the data (the field "data" means the data to be explained, i.e.,
     # in this case "x")
     if (is.data.frame(x)) x <- as.matrix(x)
diff --git a/R/GradienBased.R b/R/GradienBased.R
index 72bc747..ac52d13 100644
--- a/R/GradienBased.R
+++ b/R/GradienBased.R
@@ -70,10 +70,23 @@ GradientBased <- R6Class(
                                  save_output = FALSE,
                                  save_last_layer = TRUE)
 
+      # Save predictions
+      if (method_name %in% c("Gradient")) {
+        pred <- get_outputs(out, self$output_idx)
+        self$preds <- pred
+      }
+
       if (self$ignore_last_act) {
         out <- lapply(self$converter$model$output_nodes,
                       function(x) self$converter$model$modules_list[[x]]$preactivation)
+
+      }
+
+      # Save decomposition goal
+      if (method_name %in% c("Gradient")) {
+        pred <- get_outputs(out, self$output_idx)
+        self$decomp_goal <- pred
       }
 
       # Add up the output over the batch dimension
@@ -347,6 +360,34 @@ IntegratedGradient <- R6Class(
               " data instances!")
       }
 
+      # Calculate predictions
+      out <- self$converter$model(self$data,
+                                  channels_first = self$channels_first,
+                                  save_input = FALSE,
+                                  save_preactivation = FALSE,
+                                  save_output = FALSE,
+                                  save_last_layer = TRUE)
+      out_ref <- self$converter$model$update_ref(self$x_ref,
+                                                 channels_first = self$channels_first,
+                                                 save_input = FALSE,
+                                                 save_preactivation = FALSE,
+                                                 save_output = FALSE,
+                                                 save_last_layer = TRUE)
+
+      # Save prediction
+      self$preds <- get_outputs(out, self$output_idx)
+
+      if (self$ignore_last_act) {
+        out <- lapply(self$converter$model$output_nodes,
+                      function(x) self$converter$model$modules_list[[x]]$preactivation)
+        out_ref <- lapply(self$converter$model$output_nodes,
+                          function(x) self$converter$model$modules_list[[x]]$preactivation_ref)
+      }
+
+      # Save decomposition goal
+      preds <- 
lapply(seq_along(out), function(i) out[[i]] - out_ref[[i]]) + self$decomp_goal <- get_outputs(preds, self$output_idx) + self$result <- private$run() self$converter$model$reset() } @@ -525,6 +566,22 @@ SmoothGrad <- R6Class( self$n <- n self$noise_level <- noise_level + # Calculate predictions + out <- self$converter$model(self$data, + channels_first = self$channels_first, + save_input = FALSE, + save_preactivation = FALSE, + save_output = FALSE, + save_last_layer = TRUE) + + self$preds <- get_outputs(out, self$output_idx) + + if (self$ignore_last_act) { + out <- lapply(self$converter$model$output_nodes, + function(x) self$converter$model$modules_list[[x]]$preactivation) + } + self$decomp_goal <- get_outputs(out, self$output_idx) + self$result <- private$run() self$converter$model$reset() } @@ -691,6 +748,30 @@ ExpectedGradient <- R6Class( } self$data_ref <- private$test_data(data_ref, name = "data_ref") + # Calculate predictions + out <- self$converter$model(self$data, + channels_first = self$channels_first, + save_input = FALSE, + save_preactivation = FALSE, + save_output = FALSE, + save_last_layer = TRUE) + out_ref <- self$converter$model$update_ref(self$data_ref, + channels_first = self$channels_first, + save_input = FALSE, + save_preactivation = FALSE, + save_output = FALSE, + save_last_layer = TRUE) + + self$preds <- get_outputs(out, self$output_idx) + if (self$ignore_last_act) { + out <- lapply(self$converter$model$output_nodes, + function(x) self$converter$model$modules_list[[x]]$preactivation) + out_ref <- lapply(self$converter$model$output_nodes, + function(x) self$converter$model$modules_list[[x]]$preactivation_ref) + } + preds <- lapply(seq_along(out), function(i) out[[i]] - out_ref[[i]]$mean(dim = 1, keepdim = TRUE)) + self$decomp_goal <- get_outputs(preds, self$output_idx) + self$result <- private$run() self$converter$model$reset() } @@ -808,3 +889,8 @@ calc_times_input <- function(gradients, input) { gradients } + +get_outputs <- function(out, out_idx) { + lapply(seq_along(out), + function(i) out[[i]][, out_idx[[i]], drop = FALSE]$data()) +} diff --git a/R/InterpretingMethod.R b/R/InterpretingMethod.R index 2e7ca7b..ea73ce6 100644 --- a/R/InterpretingMethod.R +++ b/R/InterpretingMethod.R @@ -8,12 +8,17 @@ #' `innsight` package. Implemented are the following methods: #' #' - *Deep Learning Important Features* ([`DeepLift`]) +#' - *Deep Shapley additive explanations* ([`DeepSHAP`]) #' - *Layer-wise Relevance Propagation* ([`LRP`]) #' - Gradient-based methods: #' - *Vanilla gradients* including *Gradient\eqn{\times}Input* ([`Gradient`]) #' - Smoothed gradients including *SmoothGrad\eqn{\times}Input* ([`SmoothGrad`]) #' - *Integrated gradients* ([`IntegratedGradient`]) +#' - *Expected gradients* ([`ExpectedGradient`]) #' - *Connection Weights* (global and local) ([`ConnectionWeights`]) +#' - Also some model-agnostic approaches: +#' - *Local interpretable model-agnostic explanations* ([`LIME`]) +#' - *Shapley values* ([`SHAP`]) #' #' @template param-converter #' @template param-data @@ -40,6 +45,19 @@ #' @template field-verbose #' @template field-winner_takes_all #' +#' @field preds (`list`)\cr +#' In this field, all calculated predictions are stored as a list of +#' `torch_tensor`s. Each output layer has its own list entry and contains +#' the respective predicted values.\cr +#' @field decomp_goal (`list`)\cr +#' In this field, the method-specific decomposition objectives are stored as +#' a list of `torch_tensor`s for each output layer. 
For example, +#' GradientxInput and LRP attempt to decompose the prediction into +#' feature-wise additive effects. DeepLift and IntegratedGradient decompose +#' the difference between \eqn{f(x)} and \eqn{f(x')}. On the other hand, +#' DeepSHAP and ExpectedGradient aim to decompose \eqn{f(x)} minus the +#' averaged prediction across the reference values.\cr +#' InterpretingMethod <- R6Class( classname = "InterpretingMethod", public = list( @@ -53,6 +71,8 @@ InterpretingMethod <- R6Class( output_idx = NULL, output_label = NULL, verbose = NULL, + preds = NULL, + decomp_goal = NULL, #' @description #' Create a new instance of this super class. @@ -163,11 +183,23 @@ InterpretingMethod <- R6Class( result <- apply_results(result, FUN) } + # Prepare predictions + prepare_preds <- function(x) { + if (is.null(unlist(x))) { + NULL + } else { + lapply(seq_along(x), function(i) as.array(x[[i]])) + } + } + preds <- prepare_preds(self$preds) + decomp_goals <- prepare_preds(self$decomp_goal) + # Convert the torch_tensor result into a data.frame result <- create_dataframe_from_result( seq_len(dim(self$data[[1]])[1]), result, self$converter$input_names, self$converter$output_names, - self$output_idx) + self$output_idx, preds, decomp_goals) + # Remove unnecessary columns if (all(result$input_dimension <= 2)) { result$feature_2 <- NULL @@ -223,6 +255,9 @@ InterpretingMethod <- R6Class( #' `'gridExtra'` and `'gtable'` must be installed in your R session. #' 3. If the global *Connection Weights* method was applied, the #' unnecessary argument `data_idx` will be ignored. + #' 4. The predictions, the sum of relevances, and, if available, the + #' decomposition target are displayed by default in a box within the plot. + #' Currently, these are not generated for `plotly` plots. #' #' @param data_idx (`integer`)\cr #' An integer vector containing the numbers of the data @@ -242,6 +277,13 @@ InterpretingMethod <- R6Class( #' same fill scale across multiple input layers or whether each is #' scaled individually. This argument is only used if more than one input #' layer results are plotted.\cr + #' @param show_preds (`logical`)\cr + #' This logical value indicates whether the plots display the prediction, + #' the sum of calculated relevances, and, if available, the targeted + #' decomposition value. 
For example, in the case of GradientxInput, the
+    #' goal is to obtain a decomposition of the predicted value, while for
+    #' DeepLift and IntegratedGradient, the goal is the difference between
+    #' the prediction and the reference value, i.e., \eqn{f(x) - f(x')}.\cr
     #'
     #' @return
     #' Returns either an [`innsight_ggplot2`] (`as_plotly = FALSE`) or an
@@ -250,10 +292,13 @@ InterpretingMethod <- R6Class(
     #'
     plot = function(data_idx = 1,
                     output_idx = NULL,
+                    output_label = NULL,
                     aggr_channels = "sum",
                     as_plotly = FALSE,
-                    same_scale = FALSE) {
+                    same_scale = FALSE,
+                    show_preds = TRUE) {
 
+      # Get method-specific arguments -----------------------------------------
       if (inherits(self, "ConnectionWeights")) {
         if (!self$times_input) {
           if (!identical(data_idx, 1)) {
@@ -289,20 +334,34 @@ InterpretingMethod <- R6Class(
         include_data <- TRUE
       }
 
-      # Check correctness of arguments
+      # Check correctness of arguments ----------------------------------------
       cli_check(
        checkIntegerish(data_idx, lower = 1, upper = dim(self$data[[1]])[1]),
        "data_idx")
-      output_idx <- check_output_idx_for_plot(output_idx, self$output_idx)
      cli_check(checkLogical(as_plotly), "as_plotly")
      cli_check(checkLogical(same_scale), "same_scale")

      # Set aggregation function for channels
      aggr_channels <- get_aggr_function(aggr_channels)

-      # Get only relevant model outputs
+      # Check for output_label and output_idx
+      if (!is.null(output_label) & is.null(output_idx)) {
+        output_idx <- transform_label_to_idx(output_label,
+                                             self$converter$output_names)
+      } else if (!is.null(output_label) & !is.null(output_idx)) {
+        warningf("You passed non-{.code NULL} values for the arguments ",
+                 "{.arg output_label} and {.arg output_idx}. Both are used ",
+                 "to specify the output nodes to be plotted. In the ",
+                 "following, only the values of {.arg output_idx} are used!")
+      }
+      output_idx <- check_output_idx_for_plot(output_idx, self$output_idx)
+
+      # Get only the relevant results -----------------------------------------
+      # Get only relevant model output layer
       null_idx <- unlist(lapply(output_idx, is.null))
       result <- self$result[!null_idx]
+      preds <- if (length(self$preds) != 0) self$preds[!null_idx] else NULL
+      decomp_goals <- if (length(self$decomp_goal) != 0) self$decomp_goal[!null_idx] else NULL
 
       # Get the relevant output and class node indices
       # This is done by matching the given output indices ('output_idx') with
@@ -312,8 +371,28 @@ InterpretingMethod <- R6Class(
         seq_along(output_idx),
         function(i) match(output_idx[[i]], self$output_idx[[i]]))[!null_idx]
 
-      result <- apply_results(result, aggregate_channels, idx_matches, data_idx,
-                              self$channels_first, aggr_channels)
+      # Get only the relevant results, predictions and decomposition goals
+      result <- apply_results(result, aggregate_channels, idx_matches,
+                              data_idx, self$channels_first, aggr_channels)
+      if (show_preds) {
+        fun <- function(i, x) {
+          as.array(x[[i]][data_idx, idx_matches[[i]], drop = FALSE])
+        }
+        if (!is.null(unlist(preds))) {
+          preds <- lapply(seq_along(preds), fun, x = preds)
+        } else {
+          preds <- NULL
+        }
+        if (!is.null(unlist(decomp_goals))) {
+          decomp_goals <- lapply(seq_along(decomp_goals), fun, x = decomp_goals)
+        } else {
+          decomp_goals <- NULL
+        }
+      } else {
+        preds <- NULL
+        decomp_goals <- NULL
+      }
+
       # Get and modify input names
       input_names <- lapply(self$converter$input_names, function(in_name) {
@@ -323,8 +402,10 @@ InterpretingMethod <- R6Class(
         in_name
       })
 
+      # Create the data.frame with all information necessary for the plot
       result_df <- create_dataframe_from_result(
-        data_idx, 
result, input_names, self$converter$output_names, output_idx) + data_idx, result, input_names, self$converter$output_names, output_idx, + preds, decomp_goals) # Get plot if (as_plotly) { @@ -332,7 +413,7 @@ InterpretingMethod <- R6Class( same_scale) } else { p <- create_ggplot(result_df, value_name, include_data, FALSE, NULL, - same_scale) + same_scale, show_preds) } p @@ -382,6 +463,7 @@ InterpretingMethod <- R6Class( #' [`innsight_plotly`] (`as_plotly = TRUE`) object with the plotted #' summarized results. boxplot = function(output_idx = NULL, + output_label = NULL, data_idx = "all", ref_data_idx = NULL, aggr_channels = "sum", @@ -390,6 +472,7 @@ InterpretingMethod <- R6Class( individual_data_idx = NULL, individual_max = 20) { + # Get method-specific arguments ----------------------------------------- if (inherits(self, "ConnectionWeights")) { if (!self$times_input) { stopf( @@ -406,19 +489,14 @@ InterpretingMethod <- R6Class( } else if (inherits(self, c("DeepLift", "DeepSHAP"))) { value_name <- "Contribution" } else if (inherits(self, "GradientBased")) { - value_name <- "Gradient" + value_name <- if(self$times_input) "Relevance" else "Gradient" } else if (inherits(self, "LIME")) { value_name <- "Weight" } else if (inherits(self, "SHAP")) { value_name <- "Shapley Value" } - # - # Do checks - # - - # output_idx - output_idx <- check_output_idx_for_plot(output_idx, self$output_idx) + # Check correctness of arguments ---------------------------------------- # data_idx num_data <- dim(self$data[[1]])[1] if (identical(data_idx, "all")) { @@ -446,6 +524,18 @@ InterpretingMethod <- R6Class( cli_check(checkInt(individual_max, lower = 1), "individual_max") individual_max <- min(individual_max, num_data) + # Check for output_label and output_idx + if (!is.null(output_label) & is.null(output_idx)) { + output_idx <- transform_label_to_idx(output_label, + self$converter$output_names) + } else if (!is.null(output_label) & !is.null(output_idx)) { + warningf("You passed non-{.code NULL} values for the arguments ", + "{.arg output_label} and {.arg output_idx}. Both are used ", + "to specify the output nodes to be plotted. 
In the ", + "following, only the values of {.arg output_idx} are used!") + } + output_idx <- check_output_idx_for_plot(output_idx, self$output_idx) + # Set the individual instances for the plot if (!as_plotly) { individual_idx <- ref_data_idx @@ -455,7 +545,7 @@ InterpretingMethod <- R6Class( individual_idx <- individual_idx[!is.na(individual_idx)] } - + # Get only the relevant results ----------------------------------------- # Get only relevant model outputs null_idx <- unlist(lapply(output_idx, is.null)) result <- self$result[!null_idx] @@ -468,7 +558,7 @@ InterpretingMethod <- R6Class( seq_along(output_idx), function(i) match(output_idx[[i]], self$output_idx[[i]]))[!null_idx] - # apply preprocess function + # Apply preprocess function --------------------------------------------- preprocess <- function(result, out_idx, in_idx, idx_matches) { res <- result[[out_idx]][[in_idx]] if (is.null(res)) { @@ -490,7 +580,8 @@ InterpretingMethod <- R6Class( }) idx <- sort(unique(c(individual_idx, data_idx))) - # Create boxplot data + + # Create boxplot data and plots ----------------------------------------- result <- apply_results(result, aggregate_channels, idx_matches, idx, self$channels_first, aggr_channels) @@ -553,6 +644,12 @@ InterpretingMethod <- R6Class( # Declare vector for relevances for each output node rel_list <- vector(mode = "list", length = length(self$converter$model$output_nodes)) + # Declare vector for the predictions for each output node + pred_list <- vector(mode = "list", + length = length(self$converter$model$output_nodes)) + # Declare vector for the decomposition goal for each output node + decomp_list <- vector(mode = "list", + length = length(self$converter$model$output_nodes)) if (self$verbose) { #messagef("Backward pass '", method_name, "':") @@ -581,6 +678,7 @@ InterpretingMethod <- R6Class( # relevances for this output if (is.null(self$output_idx[[idx]])) { rel <- NULL + pred <- NULL } else { # Otherwise ... # get the corresponding output depending on the argument @@ -598,6 +696,23 @@ InterpretingMethod <- R6Class( } } + # Save the prediction + pred <- layer$output + decomp_goal <- out + if (method_name %in% c("DeepLift", "DeepSHAP")) { + if (self$ignore_last_act) { + pred_ref <- layer$preactivation_ref + } else { + pred_ref <- layer$output_ref + } + num_samples <- dim(self$data[[1]])[1] + decomp_goal <- torch_stack((out - pred_ref)$chunk(num_samples), + dim = 1)$mean(2) + pred <- torch_stack(pred$chunk(num_samples), dim = 1)$mean(2) + } + pred_list[[idx]] <- pred[, self$output_idx[[idx]], drop = FALSE] + decomp_list[[idx]] <- decomp_goal[, self$output_idx[[idx]], drop = FALSE] + # For DeepLift, we only need ones if (method_name %in% c("DeepLift", "DeepSHAP")) { rel <- torch_diag_embed(torch_ones_like(out)) @@ -706,6 +821,10 @@ InterpretingMethod <- R6Class( } } + # Save output predictions + self$preds <- pred_list + self$decomp_goal <- decomp_list + if (self$verbose) cli_progress_done() # If necessary, move channels last @@ -992,8 +1111,10 @@ check_output_idx <- function(output_idx, output_dim, output_label, output_names) # Get labels from output_idx labels <- output_names[[i]][[1]][output_idx[[i]]] + labels_ref <- as.factor(output_label[[i]]) if (length(labels) == 0) labels <- NULL - if (!testSetEqual(labels, as.factor(output_label[[i]]))) { + if (length(labels_ref) == 0) labels_ref <- NULL + if (!testSetEqual(labels, labels_ref)) { stopf("Both the {.arg output_idx} and {.arg output_label} arguments ", "were passed (i.e., not {.code NULL}). 
However, they do not ", "match and point to different output nodes.") @@ -1070,7 +1191,8 @@ tensor_list_to_named_array <- function(torch_result, input_names, output_names, create_dataframe_from_result <- function(data_idx, result, input_names, - output_names, output_idx) { + output_names, output_idx, + preds = NULL, decomp_goal = NULL) { if (length(data_idx) == 0) { result_df <- NULL @@ -1082,8 +1204,9 @@ create_dataframe_from_result <- function(data_idx, result, input_names, output_names <- output_names[nonnull_idx] fun <- function(result, out_idx, in_idx, input_names, output_names, - output_idx, nonnull_idx) { + output_idx, nonnull_idx, preds) { res <- result[[out_idx]][[in_idx]] + result_df <- create_grid(data_idx, input_names[[in_idx]], output_names[[out_idx]][[1]][output_idx[[out_idx]]]) @@ -1091,6 +1214,27 @@ create_dataframe_from_result <- function(data_idx, result, input_names, result_df$value <- NaN } else { result_df$value <- as.vector(as.array(res)) + res_sum <- apply(res, c(1, length(dim(res))), sum) + num_reps <- nrow(result_df) %/% (length(data_idx) * length(output_idx[[out_idx]])) + res_sum <- do.call("rbind", lapply(seq_len(num_reps), function(x) res_sum)) + + if (length(preds) == 0) { + pred <- NA + } else { + pred <- preds[[out_idx]] + pred <- do.call("rbind", lapply(seq_len(num_reps), function(x) pred)) + } + + if (length(decomp_goal) == 0) { + dec_goal <- NA + } else { + dec_goal <- decomp_goal[[out_idx]] + dec_goal <- do.call("rbind", lapply(seq_len(num_reps), function(x) dec_goal)) + } + + result_df$pred <- as.vector(pred) + result_df$decomp_sum <- as.vector(res_sum) + result_df$decomp_goal <- as.vector(dec_goal) } result_df$model_input <- paste0("Input_", in_idx) result_df$model_output <- factor( @@ -1103,10 +1247,10 @@ create_dataframe_from_result <- function(data_idx, result, input_names, } result <- apply_results(result, fun, input_names, output_names, - output_idx, nonnull_idx) + output_idx, nonnull_idx, preds) result_df <- do.call("rbind", lapply(result, function(x) do.call("rbind", x))) - result_df <- result_df[, c(1, 8, 9, 3, 4, 2, 5, 7, 6)] + result_df <- result_df[, c(1, 11, 12, 3, 4, 2, 5, 7, 8, 9, 10, 6)] } result_df @@ -1172,6 +1316,25 @@ check_output_idx_for_plot <- function(output_idx, true_output_idx) { output_idx } +transform_label_to_idx <- function(output_label, output_names) { + if (!is.list(output_label)) { + output_label <- list(output_label) + } + + fun <- function(i) { + if (!is.null(output_label[[i]])) { + labels <- as.factor(output_label[[i]]) + out_names <- output_names[[i]][[1]] + cli_check(checkSubset(labels, out_names), "output_label") + match(labels, out_names) + } else { + NULL + } + } + + lapply(seq_along(output_label), fun) +} + move_channels_last <- function(names) { for (idx in seq_along(names)) { if (length(names[[idx]]) == 2) { # 1d input diff --git a/R/utils_ggplot.R b/R/utils_ggplot.R index 71ce0c0..8794f8f 100644 --- a/R/utils_ggplot.R +++ b/R/utils_ggplot.R @@ -8,7 +8,7 @@ #----- Main plot function ----------------------------------------------------- create_ggplot <- function(result_df, value_name = "Relevance", include_data = TRUE, boxplot = FALSE, data_idx = NULL, - same_scale = TRUE) { + same_scale = TRUE, show_preds = TRUE) { num_inputs <- length(unique(result_df$model_input)) num_outputs <- length(unique(result_df$model_output)) @@ -21,13 +21,15 @@ create_ggplot <- function(result_df, value_name = "Relevance", p <- plot_image(result_df, value_name, facet_rows = facet_rows, facet_cols = "output_node", - boxplot = boxplot) 
+ boxplot = boxplot, + show_preds = show_preds) } else { p <- plot_bar(result_df, value_name, facet_rows = facet_rows, facet_cols = "output_node", boxplot = boxplot, - data_idx = data_idx) + data_idx = data_idx, + show_preds = show_preds) } p <- new("innsight_ggplot2", @@ -39,7 +41,7 @@ create_ggplot <- function(result_df, value_name = "Relevance", } else { # This is for models with multiple input and/or output layers p <- plot_extended(result_df, value_name, include_data, boxplot, data_idx, - same_scale) + same_scale, show_preds) } p @@ -48,7 +50,8 @@ create_ggplot <- function(result_df, value_name = "Relevance", #----- Plot function for 1D and 2D -------------------------------------------- plot_bar <- function(result_df, value_name = "value", facet_rows = NULL, facet_cols = NULL, calc_fill = TRUE, xticks = TRUE, - yticks = TRUE, boxplot = FALSE, data_idx = NULL) { + yticks = TRUE, boxplot = FALSE, data_idx = NULL, + show_preds = TRUE) { if (boxplot) { facet_rows <- NULL @@ -119,6 +122,28 @@ plot_bar <- function(result_df, value_name = "value", facet_rows = NULL, x_scale + scale_y_continuous(labels = get_format) + # Add box with local information + if (!boxplot & show_preds) { + df <- unique(result_df[c("data", "output_node","pred", "decomp_sum", + "decomp_goal")]) + labels <- c("Pred.: ", "Sum:") #\u2004 + values <- list(df$pred, df$decomp_sum) + if (any(!is.na(df$decomp_goal))) { + labels <- c(labels, "Goal:") + values <- append(values, list(df$decomp_goal)) + } + labels <- format(labels) + df$label <- lapply(seq_len(nrow(df)), function(i) { + res <- format(signif(round(unlist(lapply(values, function(x) x[i])), + digits = 8), digits = 4), justify = "right") + paste0(labels, res, collapse = "\n") + }) + + p <- p + geom_label(aes(label = .data$label), x = -Inf, y = Inf, + data = df, vjust = "inward", hjust = "inward", alpha = 0.5, + color = "black", fill = "darkgray", size = ggplot2::.pt) + } + # Add reference datapoint if (boxplot && !is.null(data_idx)) { p <- p + ref_line @@ -141,7 +166,8 @@ plot_bar <- function(result_df, value_name = "value", facet_rows = NULL, #----- Plot function for images ----------------------------------------------- plot_image <- function(result_df, value_name = "value", facet_rows = NULL, facet_cols = NULL, calc_fill = TRUE, xticks = TRUE, - yticks = TRUE, legend_labels = NULL, boxplot = FALSE) { + yticks = TRUE, legend_labels = NULL, boxplot = FALSE, + show_preds = TRUE) { if (boxplot) { facet_rows <- NULL @@ -234,6 +260,28 @@ plot_image <- function(result_df, value_name = "value", facet_rows = NULL, scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) + # Add box with local information + if (!boxplot & show_preds) { + df <- unique(result_df[c("data", "output_node","pred", "decomp_sum", + "decomp_goal")]) + labels <- c("Pred.: ", "Sum:") #\u2004 + values <- list(df$pred, df$decomp_sum) + if (any(!is.na(df$decomp_goal))) { + labels <- c(labels, "Goal:") + values <- append(values, list(df$decomp_goal)) + } + labels <- format(labels) + df$label <- lapply(seq_len(nrow(df)), function(i) { + res <- format(signif(round(unlist(lapply(values, function(x) x[i])), + digits = 8), digits = 4), justify = "right") + paste0(labels, res, collapse = "\n") + }) + + p <- p + geom_label(aes(label = .data$label), x = -Inf, y = Inf, + data = df, vjust = "inward", hjust = "inward", alpha = 0.5, + color = "black", fill = "darkgray", size = ggplot2::.pt) + } + # Remove ticks and labels if (!xticks) { p <- p + xlab(NULL) + @@ -251,7 +299,7 @@ plot_image <- 
function(result_df, value_name = "value", facet_rows = NULL, #----- Plot function for multimodal data -------------------------------------- plot_extended <- function(result_df, value_name, include_data, boxplot, - data_idx = NULL, same_scale) { + data_idx = NULL, same_scale, show_preds) { # Load required packages for (pkg in c("grid", "gtable", "gridExtra")) { if (!requireNamespace(pkg, quietly = FALSE)) { @@ -335,7 +383,8 @@ plot_extended <- function(result_df, value_name, include_data, boxplot, xticks = labels$xticks, yticks = labels$yticks, legend_labels = legend_labels, - boxplot = boxplot) + boxplot = boxplot, + show_preds = show_preds) } else { p <- plot_bar(data, value_name, facet_rows = facets$facet_rows, @@ -344,7 +393,8 @@ plot_extended <- function(result_df, value_name, include_data, boxplot, xticks = labels$xticks, yticks = labels$yticks, boxplot = boxplot, - data_idx = data_idx) + data_idx = data_idx, + show_preds = show_preds) } grobs[j, k, i] <- list(p) diff --git a/man/InterpretingMethod.Rd b/man/InterpretingMethod.Rd index 15d4d36..10db192 100644 --- a/man/InterpretingMethod.Rd +++ b/man/InterpretingMethod.Rd @@ -8,14 +8,21 @@ This is a super class for all interpreting methods in the \code{innsight} package. Implemented are the following methods: \itemize{ \item \emph{Deep Learning Important Features} (\code{\link{DeepLift}}) +\item \emph{Deep Shapley additive explanations} (\code{\link{DeepSHAP}}) \item \emph{Layer-wise Relevance Propagation} (\code{\link{LRP}}) \item Gradient-based methods: \itemize{ \item \emph{Vanilla gradients} including \emph{Gradient\eqn{\times}Input} (\code{\link{Gradient}}) \item Smoothed gradients including \emph{SmoothGrad\eqn{\times}Input} (\code{\link{SmoothGrad}}) \item \emph{Integrated gradients} (\code{\link{IntegratedGradient}}) +\item \emph{Expected gradients} (\code{\link{ExpectedGradient}}) } \item \emph{Connection Weights} (global and local) (\code{\link{ConnectionWeights}}) +\item Also some model-agnostic approaches: +\itemize{ +\item \emph{Local interpretable model-agnostic explanations} (\code{\link{LIME}}) +\item \emph{Shapley values} (\code{\link{SHAP}}) +} } } \section{Public fields}{ @@ -105,6 +112,20 @@ the problem of too many zero relevances. With the default value \code{TRUE}, the whole upper-layer relevance is passed to the maximum value in each pooling window. Otherwise, if \code{FALSE}, the relevance is distributed equally among all nodes in a pooling window.\cr} + +\item{\code{preds}}{(\code{list})\cr +In this field, all calculated predictions are stored as a list of +\code{torch_tensor}s. Each output layer has its own list entry and contains +the respective predicted values.\cr} + +\item{\code{decomp_goal}}{(\code{list})\cr +In this field, the method-specific decomposition objectives are stored as +a list of \code{torch_tensor}s for each output layer. For example, +GradientxInput and LRP attempt to decompose the prediction into +feature-wise additive effects. DeepLift and IntegratedGradient decompose +the difference between \eqn{f(x)} and \eqn{f(x')}. On the other hand, +DeepSHAP and ExpectedGradient aim to decompose \eqn{f(x)} minus the +averaged prediction across the reference values.\cr} } \if{html}{\out{}} } @@ -302,14 +323,19 @@ a bit more complex, therefore the suggested packages \code{'grid'}, \code{'gridExtra'} and \code{'gtable'} must be installed in your R session. \item If the global \emph{Connection Weights} method was applied, the unnecessary argument \code{data_idx} will be ignored. 
+\item The predictions, the sum of relevances, and, if available, the +decomposition target are displayed by default in a box within the plot. +Currently, these are not generated for \code{plotly} plots. } \subsection{Usage}{ \if{html}{\out{
}}\preformatted{InterpretingMethod$plot( data_idx = 1, output_idx = NULL, + output_label = NULL, aggr_channels = "sum", as_plotly = FALSE, - same_scale = FALSE + same_scale = FALSE, + show_preds = TRUE )}\if{html}{\out{
}}
}

@@ -331,6 +357,33 @@
initialization \code{new()} (see argument \code{output_idx} in method \code{new()} of
this R6 class for details). By default (\code{NULL}), the smallest index
of all calculated output nodes and output layers is used.\cr}
+
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
\item{\code{aggr_channels}}{(\code{character(1)} or \code{\link{function}})\cr
Pass one of \code{'norm'}, \code{'sum'}, \code{'mean'} or a
custom function to aggregate the channels, e.g., the maximum
@@ -351,6 +404,14 @@
A logical value that specifies whether the individual plots have the same
fill scale across multiple input layers or whether each is scaled
individually. This argument is only used if more than one input
layer results are plotted.\cr}
+
+\item{\code{show_preds}}{(\code{logical})\cr
+This logical value indicates whether the plots display the prediction,
+the sum of calculated relevances, and, if available, the targeted
+decomposition value. For example, in the case of GradientxInput, the
+goal is to obtain a decomposition of the predicted value, while for
+DeepLift and IntegratedGradient, the goal is the difference between
+the prediction and the reference value, i.e., \eqn{f(x) - f(x')}.\cr}
}
\if{html}{\out{</div>}}
}
@@ -391,6 +452,7 @@ a bit more complex, therefore the suggested packages \code{'grid'},
\subsection{Usage}{
\if{html}{\out{<div class="r">
}}\preformatted{InterpretingMethod$boxplot(
  output_idx = NULL,
+  output_label = NULL,
  data_idx = "all",
  ref_data_idx = NULL,
  aggr_channels = "sum",
@@ -413,6 +475,33 @@
the initialization \code{new()} (see argument \code{output_idx} in method \code{new()}
of this R6 class for details). By default (\code{NULL}), the smallest index
of all calculated output nodes and output layers is used.\cr}
+
+\item{\code{output_label}}{(\code{character}, \code{factor}, \code{list} or \code{NULL})\cr
+These values specify the output nodes for which
+the method is to be applied. Only values that were previously passed with
+the argument \code{output_names} in the \code{converter} can be used. In order to
+allow models with multiple
+output layers, there are the following possibilities to select
+the names of the output nodes in the individual output layers:
+\itemize{
+\item A \code{character} vector or \code{factor} of labels: If the model has only one output
+layer, the values correspond to the labels of the output nodes named in the
+passed \code{Converter} object, e.g.,
+\code{c("a", "c", "d")} for the first, third and fourth output node if the
+output names are \code{c("a", "b", "c", "d")}. If there are
+multiple output layers, the names of the output nodes from the first
+output layer are considered.
+\item A \code{list} of \code{character}/\code{factor} vectors of labels: If the method is to be
+applied to output nodes from different layers, a list can be passed
+that specifies the desired labels of the output nodes for each
+output layer. Unwanted output layers have the entry \code{NULL} instead of
+a vector of labels, e.g., \code{list(NULL, c("a", "c"))} for the first and
+third output node in the second output layer.
+\item \code{NULL} (default): The method is applied to all output nodes in
+the first output layer but is limited to the first ten as the
+calculations become more computationally expensive for more output
+nodes.\cr
+}}
+
\item{\code{data_idx}}{(\code{integer})\cr
By default, all available data points are used
to calculate the boxplot information.
However, this parameter can be From 60ad2baac9d168a235ffed28f77c7c0394451fc2 Mon Sep 17 00:00:00 2001 From: nkoenen Date: Mon, 18 Dec 2023 14:38:47 +0100 Subject: [PATCH 08/20] Update documentation and rename function to --- DESCRIPTION | 7 +- NAMESPACE | 9 +- NEWS.md | 57 ++ R/AgnosticMethods.R | 48 +- R/AgnosticWrapper.R | 38 +- R/ConnectionWeights.R | 9 +- R/Converter.R | 12 + R/DeepLift.R | 25 +- R/GradienBased.R | 27 +- R/InterpretingMethod.R | 59 +- R/LRP.R | 11 +- R/innsight.R | 6 + R/innsight_sugar.R | 36 +- README.Rmd | 18 +- README.md | 18 +- man-roxygen/details-Converter.R | 5 + man-roxygen/examples-ConnectionWeights.R | 17 +- man-roxygen/examples-Converter.R | 10 +- man-roxygen/examples-DeepLift.R | 20 +- man-roxygen/examples-DeepSHAP.R | 18 +- man-roxygen/examples-Gradient.R | 20 +- man-roxygen/examples-IntegratedGradient.R | 18 +- man-roxygen/examples-LIME.R | 27 +- man-roxygen/examples-LRP.R | 18 +- man-roxygen/examples-SHAP.R | 22 +- man-roxygen/examples-SmoothGrad.R | 16 +- man-roxygen/param-data-agnostic.R | 11 +- man-roxygen/param-data_ref-agnostic.R | 7 + man-roxygen/param-model-agnostic.R | 2 +- man-roxygen/param-x-agnostic.R | 8 - man/AgnosticWrapper.Rd | 26 +- man/ConnectionWeights.Rd | 22 +- man/Converter.Rd | 27 +- man/DeepLift.Rd | 25 +- man/DeepSHAP.Rd | 25 +- man/ExpectedGradient.Rd | 25 +- man/Gradient.Rd | 28 +- man/GradientBased.Rd | 3 +- man/IntegratedGradient.Rd | 23 +- man/InterpretingMethod.Rd | 12 +- man/LIME.Rd | 54 +- man/LRP.Rd | 23 +- man/SHAP.Rd | 49 +- man/SmoothGrad.Rd | 21 +- man/get_result.Rd | 3 +- man/innsight-package.Rd | 8 + man/innsight_ggplot2-indexing.Rd | 8 +- man/innsight_plotly-indexing.Rd | 5 +- man/innsight_sugar.Rd | 19 +- man/plot_global.Rd | 22 + tests/testthat/test_ConnectionWeights.R | 20 +- tests/testthat/test_Gradients.R | 24 +- tests/testthat/test_LIME.R | 126 ++--- tests/testthat/test_SHAP.R | 202 +++---- tests/testthat/test_innsight_ggplot2.R | 6 +- tests/testthat/test_innsight_plotly.R | 6 +- vignettes/Example_1_iris.Rmd | 19 +- vignettes/Example_2_penguin.Rmd | 15 +- vignettes/detailed_overview.Rmd | 639 ++++++++++++++++++---- vignettes/innsight.Rmd | 150 +++-- 60 files changed, 1522 insertions(+), 712 deletions(-) create mode 100644 man-roxygen/param-data_ref-agnostic.R delete mode 100644 man-roxygen/param-x-agnostic.R create mode 100644 man/plot_global.Rd diff --git a/DESCRIPTION b/DESCRIPTION index bdccf76..af41995 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -7,9 +7,10 @@ Authors@R: c( comment = c(ORCID = "0000-0002-4623-8271")), person("Raphael", "Baudeu", , "raphael.baudeu@gmail.com", role = "ctb") ) -Description: Interpretability methods to analyze the behavior and - individual predictions of modern neural networks. Implemented methods - are: 'Connection Weights' described by Olden et al. (2004) +Description: Interpretation methods for analyzing the behavior and individual + predictions of modern neural networks in a three-step procedure: Converting + the model, running the interpretation method, and visualizing the results. + Implemented methods are e.g., 'Connection Weights' described by Olden et al. (2004) , layer-wise relevance propagation ('LRP') described by Bach et al. 
(2015) <doi:10.1371/journal.pone.0130140>, deep learning important features
diff --git a/NAMESPACE b/NAMESPACE
index 4633ade..163e3bf 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -1,14 +1,10 @@
# Generated by roxygen2: do not edit by hand

-S3method(boxplot,AgnosticWrapper)
-S3method(boxplot,ConnectionWeights)
-S3method(boxplot,DeepLift)
-S3method(boxplot,DeepSHAP)
-S3method(boxplot,GradientBased)
-S3method(boxplot,LRP)
+S3method(boxplot,InterpretingMethod)
S3method(get_result,InterpretingMethod)
S3method(lime::model_type,innsight_agnostic_wrapper)
S3method(lime::predict_model,innsight_agnostic_wrapper)
+S3method(plot_global,InterpretingMethod)
export(AgnosticWrapper)
export(ConnectionWeights)
export(Converter)
@@ -23,6 +19,7 @@ export(SHAP)
export(SmoothGrad)
export(convert)
export(get_result)
+export(plot_global)
export(run_cw)
export(run_deeplift)
export(run_deepshap)
diff --git a/NEWS.md b/NEWS.md
index c95049b..4c7be29 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,60 @@
+# innsight 0.3.0
+
+This is a minor release, but it contains a range of substantial new features
+and visual changes, along with some bug fixes. For users, nothing changes
+that is not opt-in or announced by a warning. The one exception is the
+graphics created with `plot()`: they now contain a small box with information
+on the prediction, the sum of the relevances, and the decomposition goal of
+the method.
+
+### Breaking changes
+
+* By default, the `plot()` method now creates a small box within the plot
+with the prediction for the instance and the corresponding class. This info
+box also contains the sum of the relevances and, if available, the
+decomposition target of the method. Displaying the box can be toggled with
+the `show_preds` argument.
+
+* The `boxplot()` function for the interpretation methods has been renamed
+to `plot_global()` because the old name was misleading (especially for image
+data). The old method `boxplot()` can still be used, but it throws a warning
+for image data and internally calls the method `plot_global()`.
+
+### New features
+
+* We have now implemented the following additional feature attribution methods:
+  * Integrated Gradients (`IntegratedGradient`)
+  * Expected Gradients (`ExpectedGradient`)
+  * DeepSHAP (`DeepSHAP`)
+  * and the model-agnostic approaches LIME (`LIME`) and Shapley values
+  (`SHAP`). Both can be applied to arbitrary models (by providing the
+  prediction function `pred_fun`) and wrap the suggested packages `lime` and
+  `fastshap`. However, they can only be applied to models with a single input
+  and output layer.
+
+* We have added helper functions for the initialization of the R6 classes,
+so that no prior knowledge of R6 syntax is required to use our package. We
+implemented the following methods:
+  * `convert(...)` for `Converter$new(...)`
+  * `run_grad(...)` for `Gradient$new(...)`
+  * `run_smoothgrad(...)` for `SmoothGrad$new(...)`
+  * `run_intgrad(...)` for `IntegratedGradient$new(...)`
+  * `run_expgrad(...)` for `ExpectedGradient$new(...)`
+  * `run_deeplift(...)` for `DeepLift$new(...)`
+  * `run_deepshap(...)` for `DeepSHAP$new(...)`
+  * `run_lrp(...)` for `LRP$new(...)`
+  * `run_cw(...)` for `ConnectionWeights$new(...)`
+  * `run_lime(...)` for `LIME$new(...)`
+  * `run_shap(...)` for `SHAP$new(...)`
+
+* In addition to the output index with `output_idx`, the new argument
+`output_label` for the output label can now also be specified in order to
+calculate or visualize only certain output nodes (see the sketch below).
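+
+For example, the new helper functions condense the three-step workflow into a
+few calls. The following is only a minimal sketch: `model` and `data` are
+placeholders for your own model and data, and the output names are made up:
+
+```r
+conv <- convert(model, input_dim = c(5),
+                output_names = c("class A", "class B"))
+grad <- run_grad(conv, data, output_label = "class A")
+plot(grad)         # individual results, incl. the new info box
+plot_global(grad)  # global summary (formerly `boxplot()`)
+```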
+ +### Documentation and vignettes + +* Update documentation and vignettes according to the new features and changes + # innsight 0.2.1 * Add `if(require("pkgname"))` for suggested packages in examples diff --git a/R/AgnosticMethods.R b/R/AgnosticMethods.R index 013fd65..55d8889 100644 --- a/R/AgnosticMethods.R +++ b/R/AgnosticMethods.R @@ -20,6 +20,9 @@ #' \code{\link[neuralnet]{neuralnet}} or [`Converter`]. Internally, the #' suggested package `lime` is utilized and applied to `data.frame`. #' +#' The R6 class can also be initialized using the [`run_lime`] function +#' as a helper function so that no prior knowledge of R6 classes is required. +#' #' **Note:** Even signal and image data are initially transformed into a #' `data.frame` using `as.data.frame()` and then [`lime::lime`] and #' [`lime::explain`] are @@ -30,8 +33,8 @@ #' @template param-output_label #' @template param-channels_first #' @template param-model-agnostic +#' @template param-data_ref-agnostic #' @template param-data-agnostic -#' @template param-x-agnostic #' @template param-output_type-agnostic #' @template param-pred_fun-agnostic #' @template param-input_dim-agnostic @@ -52,7 +55,7 @@ LIME <- R6Class( #' stored in the field `result`. #' #' @param ... other arguments forwarded to [`lime::explain`]. - initialize = function(model, data, x, + initialize = function(model, data, data_ref, output_type = NULL, pred_fun = NULL, output_idx = NULL, @@ -62,16 +65,20 @@ LIME <- R6Class( input_names = NULL, output_names = NULL, ...) { - # Check if data or x is a torch_tensor - if (inherits(data, "torch_tensor")) data <- as.array(data) - if (inherits(x, "torch_tensor")) x <- as.array(x) + # Check if data or data_ref is a torch_tensor + if (!missing(data)) { + if (inherits(data, "torch_tensor")) data <- as.array(data) + } + if (!missing(data_ref)) { + if (inherits(data_ref, "torch_tensor")) data_ref <- as.array(data_ref) + } - super$initialize(model, data, x, output_type, pred_fun, output_idx, + super$initialize(model, data, data_ref, output_type, pred_fun, output_idx, output_label, channels_first, input_dim, input_names, output_names) # Get the pre-processed x - x <- self$x + x <- self$data_orig # We use the lime package for the explanation if (!requireNamespace("lime", quietly = TRUE)) { @@ -79,7 +86,7 @@ LIME <- R6Class( } # Create the explainer of the lime package - explainer <- lime::lime(data.frame(data), self$converter) + explainer <- lime::lime(data.frame(data_ref), self$converter) # Apply lime if (self$converter$output_type == "classification") { @@ -149,6 +156,9 @@ model_type.innsight_agnostic_wrapper <- function(x, ...) { #' \code{\link[neuralnet]{neuralnet}} or [`Converter`]. Internally, the #' suggested package `fastshap` is utilized and applied to `data.frame`. #' +#' The R6 class can also be initialized using the [`run_shap`] function +#' as a helper function so that no prior knowledge of R6 classes is required. +#' #' **Note:** Even signal and image data are initially transformed into a #' `data.frame` using `as.data.frame()` and then [`fastshap::explain`] is #' applied. In other words, a custom `pred_fun` may need to convert the @@ -159,7 +169,7 @@ model_type.innsight_agnostic_wrapper <- function(x, ...) 
{ #' @template param-channels_first #' @template param-model-agnostic #' @template param-data-agnostic -#' @template param-x-agnostic +#' @template param-data_ref-agnostic #' @template param-pred_fun-agnostic #' @template param-input_dim-agnostic #' @template param-input_names-agnostic @@ -179,7 +189,7 @@ SHAP <- R6Class( #' stored in the field `result`. #' #' @param ... other arguments forwarded to [`fastshap::explain`]. - initialize = function(model, data, x, + initialize = function(model, data, data_ref, pred_fun = NULL, output_idx = NULL, output_label = NULL, @@ -191,11 +201,15 @@ SHAP <- R6Class( # output_type is not necessary for fastshap output_type <- "regression" - # Check if data or x is a torch_tensor - if (inherits(data, "torch_tensor")) data <- as.array(data) - if (inherits(x, "torch_tensor")) x <- as.array(x) + # Check if data or data_ref is a torch_tensor + if (!missing(data)) { + if (inherits(data, "torch_tensor")) data <- as.array(data) + } + if (!missing(data_ref)) { + if (inherits(data_ref, "torch_tensor")) data_ref <- as.array(data_ref) + } - super$initialize(model, data, x, output_type, pred_fun, output_idx, + super$initialize(model, data, data_ref, output_type, pred_fun, output_idx, output_label, channels_first, input_dim, input_names, output_names) @@ -212,10 +226,10 @@ SHAP <- R6Class( res <- fastshap::explain( self$converter, - X = as.data.frame(data), - newdata = as.data.frame(self$x), + X = as.data.frame(data_ref), + newdata = as.data.frame(self$data_orig), pred_wrapper = pred_wrapper, ...) - dim(res) <- dim(self$x) + dim(res) <- dim(self$data_orig) res } diff --git a/R/AgnosticWrapper.R b/R/AgnosticWrapper.R index 56d59b3..4c24ea0 100644 --- a/R/AgnosticWrapper.R +++ b/R/AgnosticWrapper.R @@ -19,9 +19,9 @@ #' @template param-output_idx #' @template param-channels_first #' @template param-model-agnostic -#' @template param-data-agnostic +#' @template param-data_ref-agnostic #' @template param-output_label -#' @template param-x-agnostic +#' @template param-data-agnostic #' @template param-output_type-agnostic #' @template param-pred_fun-agnostic #' @template param-input_dim-agnostic @@ -33,14 +33,14 @@ AgnosticWrapper <- R6Class( classname = "AgnosticWrapper", inherit = InterpretingMethod, public = list( - #' @field x The individual instances to be explained by the method + #' @field data_orig The individual instances to be explained by the method #' (unprocessed!). - x = NULL, + data_orig = NULL, #' @description #' Create a new instance of the `AgnosticWrapper` R6 class. 
#' - initialize = function(model, data, x, + initialize = function(model, data, data_ref, output_type = NULL, pred_fun = NULL, output_idx = NULL, @@ -50,16 +50,16 @@ AgnosticWrapper <- R6Class( input_names = NULL, output_names = NULL) { - # If x is missing, we interpret all instances in the given data - if (missing(x)) x <- data - # Check for required arguments if (missing(model)) stopf("Argument {.arg model} is missing, with no default!") if (missing(data)) stopf("Argument {.arg data} is missing, with no default!") + # If data_ref is missing, we interpret all instances in the given data + if (missing(data_ref)) data_ref <- data + # Set the default input shape given by the data if (is.null(input_dim)) { - input_dim <- dim(data)[-1] + input_dim <- dim(data_ref)[-1] # The input shape is always in the channels first format if (!channels_first) { input_dim <- c(input_dim[length(input_dim)], input_dim[-length(input_dim)]) @@ -67,7 +67,7 @@ AgnosticWrapper <- R6Class( } # Get names from data (if possible) - if (is.null(input_names)) input_names <- names(data) + if (is.null(input_names)) input_names <- names(data_ref) # Create converter object for agnostic IML methods conv_model <- get_converter(model, data, input_dim, input_names, @@ -86,27 +86,19 @@ AgnosticWrapper <- R6Class( self$output_label <- outputs[[2]] # Save the original data to be explained - self$x <- x + self$data_orig <- data # Calculate predictions - self$preds <- list(self$converter$pred_fun(as.data.frame(x), + self$preds <- list(self$converter$pred_fun(as.data.frame(data), input_dim = self$converter$input_dim[[1]])) - # Save the data (the field "data" means the data to be explained, i.e., - # in this case "x") - if (is.data.frame(x)) x <- as.matrix(x) - self$data <- list(torch_tensor(x)) + # Save the data + if (is.data.frame(data)) data <- as.matrix(data) + self$data <- list(torch_tensor(data)) } ) ) -#' -#' @importFrom graphics boxplot -#' @exportS3Method -#' -boxplot.AgnosticWrapper <- function(x, ...) { - x$boxplot(...) -} ################################################################################ # Converter for model-agnostic methods diff --git a/R/ConnectionWeights.R b/R/ConnectionWeights.R index 9a37ccf..9dd66c5 100644 --- a/R/ConnectionWeights.R +++ b/R/ConnectionWeights.R @@ -18,6 +18,9 @@ #' the input data. You can use this variant by setting the `times_input` #' argument to `TRUE` and providing input data. #' +#' The R6 class can also be initialized using the [`run_cw`] function +#' as a helper function so that no prior knowledge of R6 classes is required. +#' #' @template examples-ConnectionWeights #' @template param-converter #' @template param-data-optional @@ -132,9 +135,3 @@ ConnectionWeights <- R6Class( ) ) - -#' @importFrom graphics boxplot -#' @exportS3Method -boxplot.ConnectionWeights <- function(x, ...) { - x$boxplot(...) -} diff --git a/R/Converter.R b/R/Converter.R index 81c17b0..a690a31 100644 --- a/R/Converter.R +++ b/R/Converter.R @@ -16,6 +16,9 @@ #' \code{vignette("detailed_overview", package = "innsight")} or the #' [website](https://bips-hb.github.io/innsight/articles/detailed_overview.html#model-as-named-list)). #' +#' The R6 class can also be initialized using the [`convert`] function +#' as a helper function so that no prior knowledge of R6 classes is required. +#' #' @field model ([`ConvertedModel`])\cr #' The converted neural network based on the torch module [ConvertedModel].\cr #' @field input_dim (`list`)\cr @@ -47,10 +50,19 @@ #' * S. Bach et al. 
(2015) \emph{On pixel-wise explanations for non-linear
#' classifier decisions by layer-wise relevance propagation.} PLoS ONE 10,
#' p. 1-46
+#' * M. T. Ribeiro et al. (2016) \emph{"Why should I trust you?": Explaining
+#' the predictions of any classifier.} KDD 2016, p. 1135-1144
#' * A. Shrikumar et al. (2017) \emph{Learning important features through
#' propagating activation differences.} ICML 2017, p. 4844-4866
#' * D. Smilkov et al. (2017) \emph{SmoothGrad: removing noise by adding noise.}
#' CoRR, abs/1706.03825
+#' * M. Sundararajan et al. (2017) \emph{Axiomatic attribution for deep networks.}
+#' ICML 2017, p. 3319-3328
+#' * S. Lundberg et al. (2017) \emph{A unified approach to interpreting model
+#' predictions.} NIPS 2017, p. 4768-4777
+#' * G. Erion et al. (2021) \emph{Improving performance of deep learning models
+#' with axiomatic attribution priors and expected gradients.} Nature Machine
+#' Intelligence 3, p. 620-631
#'
#' @export
Converter <- R6Class("Converter",
diff --git a/R/DeepLift.R b/R/DeepLift.R
index c63929d..005d70e 100644
--- a/R/DeepLift.R
+++ b/R/DeepLift.R
@@ -21,6 +21,9 @@
#' activation functions: the *Rescale* rule (`'rescale'`) and
#' *RevealCancel* rule (`'reveal_cancel'`).
#'
+#' The R6 class can also be initialized using the [`run_deeplift`] function
+#' as a helper function so that no prior knowledge of R6 classes is required.
+#'
#' @template param-converter
#' @template param-data
#' @template param-output_idx
@@ -151,10 +154,13 @@ DeepLift <- R6Class(
#' obtained feature-wise results are approximate Shapley values for the
#' chosen output, where the conditional expectation is computed using these
#' different reference values, i.e., the *DeepSHAP* method decomposes the
-#' difference from the prediction and the mean prediction \eqn{f(x) - E[f(x)]}
+#' difference between the prediction and the mean prediction \eqn{f(x) - E[f(\tilde{x})]}
#' in feature-wise effects. The reference values can be passed by the argument
#' `data_ref`.
#'
+#' The R6 class can also be initialized using the [`run_deepshap`] function
+#' as a helper function so that no prior knowledge of R6 classes is required.
+#'
#' @template param-converter
#' @template param-data
#' @template param-output_idx
@@ -319,20 +325,3 @@ DeepSHAP <- R6Class(
    }
  )
)
-
-
-#'
-#' @importFrom graphics boxplot
-#' @exportS3Method
-#'
-boxplot.DeepLift <- function(x, ...) {
-  x$boxplot(...)
-}
-
-#'
-#' @importFrom graphics boxplot
-#' @exportS3Method
-#'
-boxplot.DeepSHAP <- function(x, ...) {
-  x$boxplot(...)
-}
diff --git a/R/GradienBased.R b/R/GradienBased.R
index ac52d13..31a84cb 100644
--- a/R/GradienBased.R
+++ b/R/GradienBased.R
@@ -12,6 +12,7 @@
#' - *Vanilla Gradients* and *Gradient\eqn{\times}Input* ([`Gradient`])
#' - *Integrated Gradients* ([`IntegratedGradient`])
#' - *SmoothGrad* and *SmoothGrad\eqn{\times}Input* ([`SmoothGrad`])
+#' - *Expected Gradients* ([`ExpectedGradient`])
#'
#' @template param-converter
#' @template param-data
@@ -153,15 +154,6 @@ GradientBased <- R6Class(
)


-#'
-#' @importFrom graphics boxplot
-#' @exportS3Method
-#'
-boxplot.GradientBased <- function(x, ...)
{ #' If the argument `times_input` is `TRUE`, the gradients are multiplied by #' the respective input value (*Gradient\eqn{\times}Input*), i.e., #' \deqn{x_i * d f(x)_j / d x_i.} +#' While the vanilla gradients emphasize prediction-sensitive features, +#' Gradient\eqn{\times}Input is a decomposition of the output into feature-wise +#' effects based on the first-order Taylor decomposition. +#' +#' The R6 class can also be initialized using the [`run_grad`] function as a +#' helper function so that no prior knowledge of R6 classes is required. #' #' @template examples-Gradient #' @template param-converter @@ -272,6 +270,9 @@ Gradient <- R6Class( #' of \eqn{y - y'}. However, with the parameter `times_input` only the gradient #' describing the output sensitivity can be returned. #' +#' The R6 class can also be initialized using the [`run_intgrad`] function +#' as a helper function so that no prior knowledge of R6 classes is required. +#' #' @template param-converter #' @template param-data #' @template param-channels_first @@ -504,6 +505,9 @@ IntegratedGradient <- R6Class( #' `times_input` to multiply the gradients by the inputs before taking the #' average (*SmoothGrad\eqn{\times}Input*). #' +#' The R6 class can also be initialized using the [`run_smoothgrad`] function +#' as a helper function so that no prior knowledge of R6 classes is required. +#' #' @template examples-SmoothGrad #' @template param-converter #' @template param-data @@ -671,7 +675,10 @@ SmoothGrad <- R6Class( #' } #' The distribution of the reference values is specified with the argument #' `data_ref`, of which `n` samples are taken at random for each instance -#' during the exstimation. +#' during the estimation. +#' +#' The R6 class can also be initialized using the [`run_expgrad`] function +#' as a helper function so that no prior knowledge of R6 classes is required. #' #' @template param-converter #' @template param-data diff --git a/R/InterpretingMethod.R b/R/InterpretingMethod.R index ea73ce6..9f1ecea 100644 --- a/R/InterpretingMethod.R +++ b/R/InterpretingMethod.R @@ -421,7 +421,7 @@ InterpretingMethod <- R6Class( #' @description #' This method visualizes the results of the selected method summarized as - #' boxplots and enables a visual in-depth investigation of the global + #' boxplots/median image and enables a visual in-depth investigation of the global #' behavior with the help of the S4 classes [`innsight_ggplot2`] and #' [`innsight_plotly`].\cr #' You can use the argument `output_idx` to select the individual output @@ -462,15 +462,15 @@ InterpretingMethod <- R6Class( #' Returns either an [`innsight_ggplot2`] (`as_plotly = FALSE`) or an #' [`innsight_plotly`] (`as_plotly = TRUE`) object with the plotted #' summarized results. - boxplot = function(output_idx = NULL, - output_label = NULL, - data_idx = "all", - ref_data_idx = NULL, - aggr_channels = "sum", - preprocess_FUN = abs, - as_plotly = FALSE, - individual_data_idx = NULL, - individual_max = 20) { + plot_global = function(output_idx = NULL, + output_label = NULL, + data_idx = "all", + ref_data_idx = NULL, + aggr_channels = "sum", + preprocess_FUN = abs, + as_plotly = FALSE, + individual_data_idx = NULL, + individual_max = 20) { # Get method-specific arguments ----------------------------------------- if (inherits(self, "ConnectionWeights")) { @@ -940,7 +940,8 @@ InterpretingMethod <- R6Class( #' [`InterpretingMethod`] for details. 
#'
#' @param x An object of the class [`InterpretingMethod`] including the
-#' subclasses [`Gradient`], [`SmoothGrad`], [`LRP`], [`DeepLift`] and
+#' subclasses [`Gradient`], [`SmoothGrad`], [`LRP`], [`DeepLift`],
+#' [`DeepSHAP`], [`IntegratedGradient`], [`ExpectedGradient`] and
#' [`ConnectionWeights`].
#' @param ... Other arguments specified in the R6 method
#' `InterpretingMethod$get_result()`. See [`InterpretingMethod`] for details.
@@ -953,6 +954,42 @@ get_result.InterpretingMethod <- function(x, ...) {
  x$get_result(...)
}

+
+#'
+#' @importFrom graphics boxplot
+#' @exportS3Method
+#'
+boxplot.InterpretingMethod <- function(x, ...) {
+  dims <- unlist(lapply(x$converter$input_dim, length))
+  if (any(dims > 2)) {
+    warningf("The {.fn boxplot} function is only intended for tabular or signal ",
+             "data. The method {.fn plot_global} is called instead.")
+  }
+  x$plot_global(...)
+}
+
+
+#' Plot the global behavior of an interpretation method
+#'
+#' This is a generic S3 method for the R6 method
+#' `InterpretingMethod$plot_global()`. See the respective method described in
+#' [`InterpretingMethod`] for details.
+#'
+#' @param x An object of the class [`InterpretingMethod`] including the
+#' subclasses [`Gradient`], [`SmoothGrad`], [`LRP`], [`DeepLift`],
+#' [`DeepSHAP`], [`IntegratedGradient`], [`ExpectedGradient`] and
+#' [`ConnectionWeights`].
+#' @param ... Other arguments specified in the R6 method
+#' `InterpretingMethod$plot_global()`. See [`InterpretingMethod`] for details.
+#'
+#' @export
+plot_global <- function(x, ...) UseMethod("plot_global", x)
+
+#' @exportS3Method
+plot_global.InterpretingMethod <- function(x, ...) {
+  x$plot_global(...)
+}
+
###############################################################################
# print utility functions
###############################################################################
diff --git a/R/LRP.R b/R/LRP.R
index aacdd7e..338f446 100644
--- a/R/LRP.R
+++ b/R/LRP.R
@@ -14,6 +14,9 @@
#' rule ("simple"), \eqn{\varepsilon}-rule ("epsilon") and
#' \eqn{\alpha}-\eqn{\beta}-rule ("alpha_beta").
#'
+#' The R6 class can also be initialized using the [`run_lrp`] function
+#' as a helper function so that no prior knowledge of R6 classes is required.
+#'
#' @template examples-LRP
#' @template param-converter
#' @template param-data
@@ -195,11 +198,3 @@ LRP <- R6Class(
    }
  )
)
-
-#'
-#' @importFrom graphics boxplot
-#' @exportS3Method
-#'
-boxplot.LRP <- function(x, ...) {
-  x$boxplot(...)
-} diff --git a/R/innsight.R b/R/innsight.R index d79d55d..eb345ba 100644 --- a/R/innsight.R +++ b/R/innsight.R @@ -19,10 +19,16 @@ #' * *Deep learning important features ([DeepLift])* #' * Including propagation rules for non-linearities: *Rescale* rule and #' *RevealCancel* rule +#' * [DeepSHAP] #' * Gradient-based methods: #' * *Vanilla [Gradient]*, including *Gradient\eqn{\times}Input* #' * Smoothed gradients *([SmoothGrad])*, including *SmoothGrad\eqn{\times}Input* +#' * *Integrated gradients* ([IntegratedGradient]) +#' * *Expected gradients* ([ExpectedGradient]) #' * *[ConnectionWeights]* +#' * Model-agnostic methods: +#' * *Local interpretable model-agnostic explanation ([LIME])* +#' * *Shapley values* ([SHAP]) #' #' The package `innsight` aims to be as flexible as possible and independent #' of a specific deep learning package in which the passed network has been diff --git a/R/innsight_sugar.R b/R/innsight_sugar.R index dcfcb37..783349f 100644 --- a/R/innsight_sugar.R +++ b/R/innsight_sugar.R @@ -21,7 +21,7 @@ #' * `run_shap` for [`SHAP`] #' #' @template param-converter -#' @template param-x-agnostic +#' @template param-data_ref-agnostic #' #' @param model ([`nn_sequential`], \code{\link[keras]{keras_model}}, #' \code{\link[neuralnet]{neuralnet}} or `list`)\cr @@ -67,7 +67,7 @@ NULL #' convert(model, ...) #' #' @export -convert = function(model, ...) { +convert <- function(model, ...) { Converter$new(model, ...) } @@ -78,7 +78,7 @@ convert = function(model, ...) { #' run_grad(converter, data, ...) #' #' @export -run_grad = function(converter, data, ...) { +run_grad <- function(converter, data, ...) { Gradient$new(converter, data, ...) } @@ -89,7 +89,7 @@ run_grad = function(converter, data, ...) { #' run_smoothgrad(converter, data, ...) #' #' @export -run_smoothgrad = function(converter, data, ...) { +run_smoothgrad <- function(converter, data, ...) { SmoothGrad$new(converter, data, ...) } @@ -100,7 +100,7 @@ run_smoothgrad = function(converter, data, ...) { #' run_intgrad(converter, data, ...) #' #' @export -run_intgrad = function(converter, data, ...) { +run_intgrad <- function(converter, data, ...) { IntegratedGradient$new(converter, data, ...) } @@ -111,7 +111,7 @@ run_intgrad = function(converter, data, ...) { #' run_expgrad(converter, data, ...) #' #' @export -run_expgrad = function(converter, data, ...) { +run_expgrad <- function(converter, data, ...) { ExpectedGradient$new(converter, data, ...) } @@ -122,7 +122,7 @@ run_expgrad = function(converter, data, ...) { #' run_lrp(converter, data, ...) #' #' @export -run_lrp = function(converter, data, ...) { +run_lrp <- function(converter, data, ...) { LRP$new(converter, data, ...) } @@ -133,7 +133,7 @@ run_lrp = function(converter, data, ...) { #' run_deeplift(converter, data, ...) #' #' @export -run_deeplift = function(converter, data, ...) { +run_deeplift <- function(converter, data, ...) { DeepLift$new(converter, data, ...) } @@ -144,7 +144,7 @@ run_deeplift = function(converter, data, ...) { #' run_deepshap(converter, data, ...) #' #' @export -run_deepshap = function(converter, data, ...) { +run_deepshap <- function(converter, data, ...) { DeepSHAP$new(converter, data, ...) } @@ -155,28 +155,28 @@ run_deepshap = function(converter, data, ...) { #' run_cw(converter, ...) #' #' @export -run_cw = function(converter, ...) { +run_cw <- function(converter, ...) { ConnectionWeights$new(converter, ...) 
}

#' @rdname innsight_sugar
#'
#' @usage
-#' # Apply the `LIME` method to explain `x` by using the dataset `data`
-#' run_lime(model, data, x, ...)
+#' # Apply the `LIME` method to explain `data` by using the dataset `data_ref`
+#' run_lime(model, data, data_ref, ...)
#'
#' @export
-run_lime = function(model, data, x, ...) {
-  LIME$new(model, data, x, ...)
+run_lime <- function(model, data, data_ref, ...) {
+  LIME$new(model, data, data_ref, ...)
}

#' @rdname innsight_sugar
#'
#' @usage
-#' # Apply the `SHAP` method to explain `x` by using the dataset `data`
-#' run_shap(model, data, x, ...)
+#' # Apply the `SHAP` method to explain `data` by using the dataset `data_ref`
+#' run_shap(model, data, data_ref, ...)
#'
#' @export
-run_shap = function(model, data, x, ...) {
-  SHAP$new(model, data, x, ...)
+run_shap <- function(model, data, data_ref, ...) {
+  SHAP$new(model, data, data_ref, ...)
}
diff --git a/README.Rmd b/README.Rmd
index 7d2e952..1004cb7 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -51,10 +51,13 @@ interpretability (feature attribution) methods based on neural networks in R, e.
  * Including propagation rules: $\varepsilon$-rule and $\alpha$-$\beta$-rule
* Deep Learning Important Features ([DeepLift](https://arxiv.org/abs/1704.02685))
  * Including propagation rules for non-linearities: Rescale rule and RevealCancel rule
+  * [DeepSHAP](https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html)
* Gradient-based methods:
  * Vanilla Gradient, including [Gradient x Input](https://www.jmlr.org/papers/v11/baehrens10a.html)
  * Smoothed gradients ([SmoothGrad](https://arxiv.org/abs/1706.03825)), including SmoothGrad x Input
+  * [Integrated gradients](https://arxiv.org/abs/1703.01365)
+  * [Expected gradients](https://doi.org/10.1038/s42256-021-00343-w)
* Connection Weights

Example results for these methods on ImageNet with pretrained network VGG19 (see
@@ -122,23 +125,24 @@ model = ...
# -------------- Step 1: Convert your model ----------------
# For keras and neuralnet
-converter <- Converter$new(model)
+converter <- convert(model)
# For a torch model the argument 'input_dim' is required
-converter <- Converter$new(model, input_dim = model_input_dim)
+converter <- convert(model, input_dim = model_input_dim)

# -------------- Step 2: Apply method ----------------------
# Apply global method
-result <- Method$new(converter) # no data argument is needed
+result <- run_method(converter) # no data argument is needed
# Apply local methods
-result <- Method$new(converter, data)
+result <- run_method(converter, data)

# -------------- Step 3: Get and plot results --------------
# Get the results as an array
res <- get_result(result)
# Plot individual results
plot(result)
-# Plot a boxplot of all given data points in argument 'data'
-boxplot(result)
+# Plot an aggregated plot of all data points given in the argument 'data'
+plot_global(result)
+boxplot(result) # alias of `plot_global` for tabular and signal data
# Interactive plots can also be created for both methods
plot(result, as_plotly = TRUE)
```
@@ -170,7 +174,7 @@ or create a feature request if you are missing something for your analyses or
have great ideas for extending this package. Currently, we are working on the
following:

- [ ] GPU support
-- [ ] More methods, e.g. Grad-CAM, integrated gradients, etc.
+- [ ] More methods, e.g. Grad-CAM, etc.
- [ ] More examples and documentation (contact me if you have a non-trivial
  application for me)
diff --git a/README.md b/README.md
index 2682ed2..864fa01 100644
--- a/README.md
+++ b/README.md
@@ -48,11 +48,14 @@ attribution) methods based on neural networks in R, e.g.,
    ([DeepLift](https://arxiv.org/abs/1704.02685))
    - Including propagation rules for non-linearities: Rescale rule and
      RevealCancel rule
+  - [DeepSHAP](https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html)
- Gradient-based methods:
    - Vanilla Gradient, including
      [Gradient x Input](https://www.jmlr.org/papers/v11/baehrens10a.html)
    - Smoothed gradients ([SmoothGrad](https://arxiv.org/abs/1706.03825)),
      including SmoothGrad x Input
+  - [Integrated gradients](https://arxiv.org/abs/1703.01365)
+  - [Expected gradients](https://doi.org/10.1038/s42256-021-00343-w)
- Connection Weights

Example results for these methods on ImageNet with pretrained network
@@ -122,23 +125,24 @@ model = ...
# -------------- Step 1: Convert your model ----------------
# For keras and neuralnet
-converter <- Converter$new(model)
+converter <- convert(model)
# For a torch model the argument 'input_dim' is required
-converter <- Converter$new(model, input_dim = model_input_dim)
+converter <- convert(model, input_dim = model_input_dim)

# -------------- Step 2: Apply method ----------------------
# Apply global method
-result <- Method$new(converter) # no data argument is needed
+result <- run_method(converter) # no data argument is needed
# Apply local methods
-result <- Method$new(converter, data)
+result <- run_method(converter, data)

# -------------- Step 3: Get and plot results --------------
# Get the results as an array
res <- get_result(result)
# Plot individual results
plot(result)
-# Plot a boxplot of all given data points in argument 'data'
-boxplot(result)
+# Plot an aggregated plot of all data points given in the argument 'data'
+plot_global(result)
+boxplot(result) # alias of `plot_global` for tabular and signal data
# Interactive plots can also be created for both methods
plot(result, as_plotly = TRUE)
```
@@ -172,7 +176,7 @@ missing something for your analyses or have great ideas for extending this
package. Currently, we are working on the following:

- [ ] GPU support
-- [ ] More methods, e.g. Grad-CAM, integrated gradients, etc.
+- [ ] More methods, e.g. Grad-CAM, etc.
- [ ] More examples and documentation (contact me if you have a non-trivial
  application for me)
diff --git a/man-roxygen/details-Converter.R b/man-roxygen/details-Converter.R
index 0409ec1..3f34df8 100644
--- a/man-roxygen/details-Converter.R
+++ b/man-roxygen/details-Converter.R
@@ -21,9 +21,14 @@
#' following methods:
#' * *Layerwise Relevance Propagation* ([LRP]), Bach et al. (2015)
#' * *Deep Learning Important Features* ([DeepLift]), Shrikumar et al. (2017)
+#' * *[DeepSHAP]*, Lundberg et al. (2017)
#' * *[SmoothGrad]* including *SmoothGrad\eqn{\times}Input*, Smilkov et al. (2017)
#' * *Vanilla [Gradient]* including *Gradient\eqn{\times}Input*
+#' * *Integrated gradients* ([IntegratedGradient]), Sundararajan et al. (2017)
+#' * *Expected gradients* ([ExpectedGradient]), Erion et al. (2021)
#' * *[ConnectionWeights]*, Olden et al. (2004)
+#' * *Local interpretable model-agnostic explanation ([LIME])*, Ribeiro et al. (2016)
+#' * *Shapley values ([SHAP])*, Lundberg et al.
(2017)
#'
#'
#' ## Implemented libraries
diff --git a/man-roxygen/examples-ConnectionWeights.R b/man-roxygen/examples-ConnectionWeights.R
index 7b7bcdb..9229589 100644
--- a/man-roxygen/examples-ConnectionWeights.R
+++ b/man-roxygen/examples-ConnectionWeights.R
@@ -16,9 +16,18 @@
#'   input_names = list(c("Car", "Cat", "Dog", "Plane", "Horse"))
#' )
#'
+#' # You can also use the helper function `convert()` for the initialization
+#' converter <- convert(model,
+#'   input_dim = c(5),
+#'   input_names = list(c("Car", "Cat", "Dog", "Plane", "Horse"))
+#' )
+#'
#' # Apply method Connection Weights
#' cw <- ConnectionWeights$new(converter)
#'
+#' # Again, you can use the helper function `run_cw()` for initializing
+#' cw <- run_cw(converter)
+#'
#' # Print the head of the result as a data.frame
#' head(get_result(cw, "data.frame"), 5)
#'
@@ -38,10 +47,10 @@
#' )
#'
#' # Convert the trained model
-#' converter <- Converter$new(nn)
+#' converter <- convert(nn)
#'
#' # Apply the Connection Weights method
-#' cw <- ConnectionWeights$new(converter)
+#' cw <- run_cw(converter)
#'
#' # Get the result as a torch tensor
#' get_result(cw, type = "torch.tensor")
@@ -76,10 +85,10 @@
#'   layer_dense(units = 2, activation = "softmax")
#'
#' # Convert the model
-#' converter <- Converter$new(model)
+#' converter <- convert(model)
#'
#' # Apply the Connection Weights method
-#' cw <- ConnectionWeights$new(converter)
+#' cw <- run_cw(converter)
#'
#' # Get the head of the result as a data.frame
#' head(get_result(cw, type = "data.frame"), 5)
diff --git a/man-roxygen/examples-Converter.R b/man-roxygen/examples-Converter.R
index 274d031..db7e37c 100644
--- a/man-roxygen/examples-Converter.R
+++ b/man-roxygen/examples-Converter.R
@@ -13,6 +13,10 @@
#' # Convert the model (for torch models, 'input_dim' is required!)
#' converter <- Converter$new(model, input_dim = c(5))
#'
+#' # You can also use the helper function `convert()` for initializing a
+#' # Converter object
+#' converter <- convert(model, input_dim = c(5))
+#'
#' # Get the converted model stored in the field 'model'
#' converted_model <- converter$model
#'
@@ -33,7 +37,7 @@
#' )
#'
#' # Convert the model
-#' converter <- Converter$new(nn)
+#' converter <- convert(nn)
#'
#' # Print all the layers
#' converter$model$modules_list
@@ -63,7 +67,7 @@
#'   layer_dense(units = 1, activation = "sigmoid")
#'
#' # Convert this model and save model as list
-#' converter <- Converter$new(model, save_model_as_list = TRUE)
+#' converter <- convert(model, save_model_as_list = TRUE)
#'
#' # Print the converted model as a named list
#' str(converter$model_as_list, max.level = 1)
@@ -105,7 +109,7 @@
#' )
#'
#' # Convert the model
-#' converter <- Converter$new(model)
+#' converter <- convert(model)
#'
#' # Get the model as a torch::nn_module
#' torch_model <- converter$model
diff --git a/man-roxygen/examples-DeepLift.R b/man-roxygen/examples-DeepLift.R
index 8f3b5df..9c310d5 100644
--- a/man-roxygen/examples-DeepLift.R
+++ b/man-roxygen/examples-DeepLift.R
@@ -12,12 +12,16 @@
#' data <- torch_randn(25, 5)
#' ref <- torch_randn(1, 5)
#'
-#' # Create Converter
-#' converter <- Converter$new(model, input_dim = c(5))
+#' # Create Converter using the helper function `convert`
+#' converter <- convert(model, input_dim = c(5))
#'
#' # Apply method DeepLift
#' deeplift <- DeepLift$new(converter, data, x_ref = ref)
#'
+#' # You can also use the helper function `run_deeplift` for initializing
+#' # an R6 DeepLift object
+#' deeplift <- run_deeplift(converter, data, x_ref = ref)
+#'
#' # Print the result as a torch tensor for first two data points
#' get_result(deeplift, "torch.tensor")[1:2]
#'
@@ -40,12 +44,12 @@
#' )
#'
#' # Convert the model
-#' converter <- Converter$new(nn)
+#' converter <- convert(nn)
#'
#' # Apply DeepLift with rescale-rule and a reference input of the feature
#' # means
#' x_ref <- matrix(colMeans(iris[, c(3, 4)]), nrow = 1)
-#' deeplift_rescale <- DeepLift$new(converter, iris[, c(3, 4)], x_ref = x_ref)
+#' deeplift_rescale <- run_deeplift(converter, iris[, c(3, 4)], x_ref = x_ref)
#'
#' # Get the result as a dataframe and show first 5 rows
#' get_result(deeplift_rescale, type = "data.frame")[1:5, ]
@@ -84,10 +88,10 @@
#'   layer_dense(units = 2, activation = "softmax")
#'
#' # Convert the model
-#' converter <- Converter$new(model)
+#' converter <- convert(model)
#'
#' # Apply the DeepLift method with reveal-cancel rule
-#' deeplift_revcancel <- DeepLift$new(converter, data,
+#' deeplift_revcancel <- run_deeplift(converter, data,
#'   channels_first = FALSE,
#'   rule_name = "reveal_cancel"
#' )
#'
#' # Plot the result for the first image and both classes
#' plot(deeplift_revcancel, output_idx = 1:2)
#'
-#' # Plot the result as boxplots for first class
-#' boxplot(deeplift_revcancel, output_idx = 1)
+#' # Plot the pixel-wise median relevance image
+#' plot_global(deeplift_revcancel, output_idx = 1)
#' }
#' @examplesIf torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1
#' #------------------------- Plotly plots ------------------------------------
diff --git a/man-roxygen/examples-DeepSHAP.R b/man-roxygen/examples-DeepSHAP.R
index 6e06e30..bc8e1f3 100644
--- a/man-roxygen/examples-DeepSHAP.R
+++ b/man-roxygen/examples-DeepSHAP.R
@@ -16,11 +16,15 @@
#' ref <- torch_randn(5, 5)
#'
#' # Create Converter
-#' converter
<- Converter$new(model, input_dim = c(5))
+#' converter <- convert(model, input_dim = c(5))
#'
#' # Apply method DeepSHAP
#' deepshap <- DeepSHAP$new(converter, data, data_ref = ref)
#'
+#' # You can also use the helper function `run_deepshap` for initializing
+#' # an R6 DeepSHAP object
+#' deepshap <- run_deepshap(converter, data, data_ref = ref)
+#'
#' # Print the result as a torch tensor for first two data points
#' get_result(deepshap, "torch.tensor")[1:2]
#'
@@ -43,11 +47,11 @@
#' )
#'
#' # Convert the model
-#' converter <- Converter$new(nn)
+#' converter <- convert(nn)
#'
#' # Apply DeepSHAP with rescale-rule and a 100 (default of `limit_ref`)
#' # instances as the reference dataset
-#' deepshap <- DeepSHAP$new(converter, iris[, c(3, 4)],
+#' deepshap <- run_deepshap(converter, iris[, c(3, 4)],
#'   data_ref = iris[, c(3, 4)])
#'
#' # Get the result as a dataframe and show first 5 rows
@@ -87,17 +91,17 @@
#'   layer_dense(units = 2, activation = "softmax")
#'
#' # Convert the model
-#' converter <- Converter$new(model)
+#' converter <- convert(model)
#'
#' # Apply the DeepSHAP method with zero baseline (which is equivalent to
#' # DeepLift with zero baseline)
-#' deepshap <- DeepSHAP$new(converter, data, channels_first = FALSE)
+#' deepshap <- run_deepshap(converter, data, channels_first = FALSE)
#'
#' # Plot the result for the first image and both classes
#' plot(deepshap, output_idx = 1:2)
#'
-#' # Plot the result as boxplots for first class
-#' boxplot(deepshap, output_idx = 1)
+#' # Plot the pixel-wise median of the results
+#' plot_global(deepshap, output_idx = 1)
#' }
#' @examplesIf torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1
#' #------------------------- Plotly plots ------------------------------------
diff --git a/man-roxygen/examples-Gradient.R b/man-roxygen/examples-Gradient.R
index 1a26146..b504a87 100644
--- a/man-roxygen/examples-Gradient.R
+++ b/man-roxygen/examples-Gradient.R
@@ -12,7 +12,7 @@
#' data <- torch_randn(25, 5)
#'
#' # Create Converter with input and output names
-#' converter <- Converter$new(model,
+#' converter <- convert(model,
#'   input_dim = c(5),
#'   input_names = list(c("Car", "Cat", "Dog", "Plane", "Horse")),
#'   output_names = list(c("Buy it!", "Don't buy it!"))
@@ -21,6 +21,10 @@
#' # Calculate the Gradients
#' grad <- Gradient$new(converter, data)
#'
+#' # You can also use the helper function `run_grad` for initializing
+#' # an R6 Gradient object
+#' grad <- run_grad(converter, data)
+#'
#' # Print the result as a data.frame for first 5 rows
#' get_result(grad, "data.frame")[1:5,]
#'
@@ -44,18 +48,18 @@
#' )
#'
#' # Convert the trained model
-#' converter <- Converter$new(nn)
+#' converter <- convert(nn)
#'
#' # Calculate the gradients
-#' gradient <- Gradient$new(converter, iris[, -5])
+#' gradient <- run_grad(converter, iris[, -5])
#'
#' # Plot the result for the first and 60th data point and all classes
#' plot(gradient, data_idx = c(1, 60), output_idx = 1:3)
#'
#' # Calculate Gradients x Input and do not ignore the last activation
-#' gradient <- Gradient$new(converter, iris[, -5],
-#'   ignore_last_act = FALSE,
-#'   times_input = TRUE)
+#' gradient <- run_grad(converter, iris[, -5],
+#'   ignore_last_act = FALSE,
+#'   times_input = TRUE)
#'
#' # Plot the result again
#' plot(gradient, data_idx = c(1, 60), output_idx = 1:3)
@@ -88,10 +92,10 @@
#'   layer_dense(units = 3, activation = "softmax")
#'
#' # Convert the model
-#' converter <- Converter$new(model)
+#' converter <- convert(model)
#'
#' # Apply the Gradient method
-#' gradient <- Gradient$new(converter, data, channels_first = FALSE) +#' gradient <- run_grad(converter, data, channels_first = FALSE) #' #' # Plot the result for the first datapoint and all classes #' plot(gradient, output_idx = 1:3) diff --git a/man-roxygen/examples-IntegratedGradient.R b/man-roxygen/examples-IntegratedGradient.R index 7612646..5b5c6b2 100644 --- a/man-roxygen/examples-IntegratedGradient.R +++ b/man-roxygen/examples-IntegratedGradient.R @@ -13,11 +13,15 @@ #' ref <- torch_randn(1, 5) #' #' # Create Converter -#' converter <- Converter$new(model, input_dim = c(5)) +#' converter <- convert(model, input_dim = c(5)) #' #' # Apply method IntegratedGradient #' int_grad <- IntegratedGradient$new(converter, data, x_ref = ref) #' +#' # You can also use the helper function `run_intgrad` for initializing +#' # an R6 IntegratedGradient object +#' int_grad <- run_intgrad(converter, data, x_ref = ref) +#' #' # Print the result as a torch tensor for first two data points #' get_result(int_grad, "torch.tensor")[1:2] #' @@ -40,11 +44,11 @@ #' ) #' #' # Convert the model -#' converter <- Converter$new(nn) +#' converter <- convert(nn) #' #' # Apply IntegratedGradient with a reference input of the feature means #' x_ref <- matrix(colMeans(iris[, c(3, 4)]), nrow = 1) -#' int_grad <- IntegratedGradient$new(converter, iris[, c(3, 4)], x_ref = x_ref) +#' int_grad <- run_intgrad(converter, iris[, c(3, 4)], x_ref = x_ref) #' #' # Get the result as a dataframe and show first 5 rows #' get_result(int_grad, type = "data.frame")[1:5, ] @@ -82,11 +86,11 @@ #' layer_dense(units = 2, activation = "softmax") #' #' # Convert the model -#' converter <- Converter$new(model) +#' converter <- convert(model) #' #' # Apply the IntegratedGradient method with a zero baseline and n = 20 #' # iteration steps -#' int_grad <- IntegratedGradient$new(converter, data, +#' int_grad <- run_intgrad(converter, data, #' channels_first = FALSE, #' n = 20 #' ) @@ -94,8 +98,8 @@ #' # Plot the result for the first image and both classes #' plot(int_grad, output_idx = 1:2) #' -#' # Plot the result as boxplots for first class -#' boxplot(int_grad, output_idx = 1) +#' # Plot the pixel-wise median of the results +#' plot_global(int_grad, output_idx = 1) #' } #' @examplesIf torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1 #' #------------------------- Plotly plots ------------------------------------ diff --git a/man-roxygen/examples-LIME.R b/man-roxygen/examples-LIME.R index bece866..db34357 100644 --- a/man-roxygen/examples-LIME.R +++ b/man-roxygen/examples-LIME.R @@ -13,7 +13,13 @@ #' #' # Calculate LIME for the first 10 instances and set the #' # feature and outcome names -#' lime <- LIME$new(model, data, data[1:10, ], +#' lime <- LIME$new(model, data[1:10, ], data_ref = data, +#' input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), +#' output_names = c("Buy it!", "Don't buy it!")) +#' +#' # You can also use the helper function `run_lime` for initializing +#' # an R6 LIME object +#' lime <- run_lime(model, data[1:10, ], data_ref = data, #' input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), #' output_names = c("Buy it!", "Don't buy it!")) #' @@ -28,10 +34,10 @@ #' #' # We can also forward some arguments to lime::explain, e.g. 
n_permutations
#' # to get more accurate values
-#' lime <- LIME$new(model, data, data[1:10, ],
+#' lime <- run_lime(model, data[1:10, ], data_ref = data,
#'   input_names = c("Car", "Cat", "Dog", "Plane", "Horse"),
#'   output_names = c("Buy it!", "Don't buy it!"),
-#'   n_perturbations = 500)
+#'   n_permutations = 200)
#'
#' # Plot the boxplots again
#' boxplot(lime, output_idx = c(1, 2))
#'
#' #----------------------- Example 2: Converter object --------------------------
#' # We can do the same with a Converter object (all feature and outcome names
#' # will be extracted by the LIME method!)
-#' conv <- Converter$new(model,
-#'   input_dim = c(5),
-#'   input_names = c("Car", "Cat", "Dog", "Plane", "Horse"),
-#'   output_names = c("Buy it!", "Don't buy it!"))
+#' conv <- convert(model,
+#'   input_dim = c(5),
+#'   input_names = c("Car", "Cat", "Dog", "Plane", "Horse"),
+#'   output_names = c("Buy it!", "Don't buy it!"))
#'
#' # Calculate LIME for the first 10 instances
-#' lime <- LIME$new(conv, data, data[1:10], n_perturbations = 400)
+#' lime <- run_lime(conv, data[1:10], data_ref = data, n_permutations = 300)
#'
#' # Plot the result for both classes
#' plot(lime, output_idx = c(1, 2))
@@ -67,11 +73,12 @@
#'
#' # Calculate LIME for the instances of index 1 and 111 and add
#' # the outcome labels (for LIME, the output_type is required!)
-#' lime <- LIME$new(model, iris[, -5], iris[c(1, 111), -5],
+#' lime <- run_lime(model, iris[c(1, 111), -5],
+#'   data_ref = iris[, -5],
#'   pred_fun = pred_fun,
#'   output_type = "classification",
#'   output_names = levels(iris$Species),
-#'   n_perturbations = 500)
+#'   n_permutations = 300)
#'
#' # Plot the result for the first two classes and all selected instances
#' plot(lime, data_idx = 1:2, output_idx = 1:2)
diff --git a/man-roxygen/examples-LRP.R b/man-roxygen/examples-LRP.R
index 76a6256..0fc0ae6 100644
--- a/man-roxygen/examples-LRP.R
+++ b/man-roxygen/examples-LRP.R
@@ -12,11 +12,15 @@
#' data <- torch_randn(25, 5)
#'
#' # Create Converter
-#' converter <- Converter$new(model, input_dim = c(5))
+#' converter <- convert(model, input_dim = c(5))
#'
#' # Apply method LRP with simple rule (default)
#' lrp <- LRP$new(converter, data)
#'
+#' # You can also use the helper function `run_lrp` for initializing
+#' # an R6 LRP object
+#' lrp <- run_lrp(converter, data)
+#'
#' # Print the result as an array for data point one and two
#' get_result(lrp)[1:2,,]
#'
@@ -37,10 +41,10 @@
#' )
#'
#' # Create a converter for this model
-#' converter <- Converter$new(nn)
+#' converter <- convert(nn)
#'
#' # Create new instance of 'LRP'
-#' lrp <- LRP$new(converter, iris[, -5], rule_name = "simple")
+#' lrp <- run_lrp(converter, iris[, -5], rule_name = "simple")
#'
#' # Get the result as an array for data point one and two
#' get_result(lrp)[1:2,,]
#'
#' # Get the result as a torch tensor for data point one and two
#' get_result(lrp, type = "torch.tensor")[1:2]
#'
#' # Use the alpha-beta rule with alpha = 2
-#' lrp <- LRP$new(converter, iris[, -5],
+#' lrp <- run_lrp(converter, iris[, -5],
#'   rule_name = "alpha_beta",
#'   rule_param = 2
#' )
#'
#' # Include the last activation into the calculation
-#' lrp <- LRP$new(converter, iris[, -5],
+#' lrp <- run_lrp(converter, iris[, -5],
#'   rule_name = "alpha_beta",
#'   rule_param = 2,
#'   ignore_last_act = FALSE
#' )
@@ -92,11 +96,11 @@
#'   layer_dense(units = 3, activation = "softmax")
#'
#' # Convert the model
-#' converter <- Converter$new(model)
+#' converter <- convert(model)
#'
#' # Apply the LRP method with the epsilon rule for the dense layers and
#' # the alpha-beta
rule for the convolutional layers -#' lrp_comp <- LRP$new(converter, data, +#' lrp_comp <- run_lrp(converter, data, #' channels_first = FALSE, #' rule_name = list(Dense_Layer = "epsilon", Conv1D_Layer = "alpha_beta"), #' rule_param = list(Dense_Layer = 0.1, Conv1D_Layer = 1) diff --git a/man-roxygen/examples-SHAP.R b/man-roxygen/examples-SHAP.R index 15612cb..ce8236d 100644 --- a/man-roxygen/examples-SHAP.R +++ b/man-roxygen/examples-SHAP.R @@ -13,7 +13,13 @@ #' #' # Calculate Shapley values for the first 10 instances and set the #' # feature and outcome names -#' shap <- SHAP$new(model, data, data[1:10, ], +#' shap <- SHAP$new(model, data[1:10, ], data_ref = data, +#' input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), +#' output_names = c("Buy it!", "Don't buy it!")) +#' +#' # You can also use the helper function `run_shap` for initializing +#' # an R6 SHAP object +#' shap <- run_shap(model, data[1:10, ], data_ref = data, #' input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), #' output_names = c("Buy it!", "Don't buy it!")) #' @@ -28,7 +34,7 @@ #' #' # We can also forward some arguments to fastshap::explain, e.g. nsim to #' # get more accurate values -#' shap <- SHAP$new(model, data, data[1:10, ], +#' shap <- run_shap(model, data[1:10, ], data_ref = data, #' input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), #' output_names = c("Buy it!", "Don't buy it!"), #' nsim = 10) @@ -39,13 +45,13 @@ #' #----------------------- Example 2: Converter object -------------------------- #' # We can do the same with an Converter object (all feature and outcome names #' # will be extracted by the SHAP method!) -#' conv <- Converter$new(model, -#' input_dim = c(5), -#' input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), -#' output_names = c("Buy it!", "Don't buy it!")) +#' conv <- convert(model, +#' input_dim = c(5), +#' input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), +#' output_names = c("Buy it!", "Don't buy it!")) #' #' # Calculate Shapley values for the first 10 instances -#' shap <- SHAP$new(conv, data, data[1:10]) +#' shap <- run_shap(conv, data[1:10], data_ref = data) #' #' # Plot the result for both classes #' plot(shap, output_idx = c(1, 2)) @@ -67,7 +73,7 @@ #' #' # Calculate Shapley values for the instances of index 1 and 111 and add #' # the outcome labels -#' shap <- SHAP$new(model, iris[, -5], iris[c(1, 111), -5], +#' shap <- run_shap(model, iris[c(1, 111), -5], data_ref = iris[, -5], #' pred_fun = pred_fun, #' output_names = levels(iris$Species), #' nsim = 10) diff --git a/man-roxygen/examples-SmoothGrad.R b/man-roxygen/examples-SmoothGrad.R index 7884dc3..efd94bf 100644 --- a/man-roxygen/examples-SmoothGrad.R +++ b/man-roxygen/examples-SmoothGrad.R @@ -12,11 +12,15 @@ #' data <- torch_randn(25, 5) #' #' # Create Converter -#' converter <- Converter$new(model, input_dim = c(5)) +#' converter <- convert(model, input_dim = c(5)) #' #' # Calculate the smoothed Gradients #' smoothgrad <- SmoothGrad$new(converter, data) #' +#' # You can also use the helper function `run_smoothgrad` for initializing +#' # an R6 SmoothGrad object +#' smoothgrad <- run_smoothgrad(converter, data) +#' #' # Print the result as a data.frame for first 5 rows #' head(get_result(smoothgrad, "data.frame"), 5) #' @@ -40,16 +44,16 @@ #' ) #' #' # Convert the trained model -#' converter <- Converter$new(nn) +#' converter <- convert(nn) #' #' # Calculate the smoothed gradients -#' smoothgrad <- SmoothGrad$new(converter, iris[, -5], times_input = FALSE) +#' smoothgrad <- run_smoothgrad(converter, iris[, 
-5], times_input = FALSE) #' #' # Plot the result for the first and 60th data point and all classes #' plot(smoothgrad, data_idx = c(1, 60), output_idx = 1:3) #' #' # Calculate SmoothGrad x Input and do not ignore the last activation -#' smoothgrad <- SmoothGrad$new(converter, iris[, -5], ignore_last_act = FALSE) +#' smoothgrad <- run_smoothgrad(converter, iris[, -5], ignore_last_act = FALSE) #' #' # Plot the result again #' plot(smoothgrad, data_idx = c(1, 60), output_idx = 1:3) @@ -82,10 +86,10 @@ #' layer_dense(units = 3, activation = "softmax") #' #' # Convert the model -#' converter <- Converter$new(model) +#' converter <- convert(model) #' #' # Apply the SmoothGrad method -#' smoothgrad <- SmoothGrad$new(converter, data, channels_first = FALSE) +#' smoothgrad <- run_smoothgrad(converter, data, channels_first = FALSE) #' #' # Plot the result for the first datapoint and all classes #' plot(smoothgrad, output_idx = 1:3) diff --git a/man-roxygen/param-data-agnostic.R b/man-roxygen/param-data-agnostic.R index 3c534f8..e050e2e 100644 --- a/man-roxygen/param-data-agnostic.R +++ b/man-roxygen/param-data-agnostic.R @@ -1,7 +1,8 @@ -#' @param data ([`array`], [`data.frame`] or [`torch_tensor`])\cr -#' The dataset to which the method is to be applied. These must -#' have the same format as the input data of the passed model and has to -#' be either [`matrix`], an [`array`], a [`data.frame`] or a -#' [`torch_tensor`].\cr +#' @param data (`array`, `data.frame` or `torch_tensor`)\cr +#' The individual instances to be explained by the method. +#' These must have the same format as the input data of the passed model +#' and has to be either [`matrix`], an [`array`], a [`data.frame`] or a +#' [`torch_tensor`]. If no value is specified, all instances in the +#' dataset `data` will be explained.\cr #' **Note:** For the model-agnostic methods, only models with a single #' input and output layer is allowed!\cr diff --git a/man-roxygen/param-data_ref-agnostic.R b/man-roxygen/param-data_ref-agnostic.R new file mode 100644 index 0000000..d8917cb --- /dev/null +++ b/man-roxygen/param-data_ref-agnostic.R @@ -0,0 +1,7 @@ +#' @param data_ref ([`array`], [`data.frame`] or [`torch_tensor`])\cr +#' The dataset to which the method is to be applied. These must +#' have the same format as the input data of the passed model and has to +#' be either [`matrix`], an [`array`], a [`data.frame`] or a +#' [`torch_tensor`].\cr +#' **Note:** For the model-agnostic methods, only models with a single +#' input and output layer is allowed!\cr diff --git a/man-roxygen/param-model-agnostic.R b/man-roxygen/param-model-agnostic.R index 2cd9bc4..e100179 100644 --- a/man-roxygen/param-model-agnostic.R +++ b/man-roxygen/param-model-agnostic.R @@ -1,4 +1,4 @@ -#' @param model \cr +#' @param model (any prediction model)\cr #' A fitted model for a classification or regression task that #' is intended to be interpreted. A [`Converter`] object can also be #' passed. In order for the package to know how to make predictions diff --git a/man-roxygen/param-x-agnostic.R b/man-roxygen/param-x-agnostic.R deleted file mode 100644 index 1d3a6e8..0000000 --- a/man-roxygen/param-x-agnostic.R +++ /dev/null @@ -1,8 +0,0 @@ -#' @param x (`array`, `data.frame` or `torch_tensor`)\cr -#' The individual instances to be explained by the method. -#' These must have the same format as the input data of the passed model -#' and has to be either [`matrix`], an [`array`], a [`data.frame`] or a -#' [`torch_tensor`]. 
If no value is specified, all instances in the -#' dataset `data` will be explained.\cr -#' **Note:** For the model-agnostic methods, only models with a single -#' input and output layer is allowed!\cr diff --git a/man/AgnosticWrapper.Rd b/man/AgnosticWrapper.Rd index ad35dd0..c549961 100644 --- a/man/AgnosticWrapper.Rd +++ b/man/AgnosticWrapper.Rd @@ -23,7 +23,7 @@ are available (all are wrapped by other packages): \section{Public fields}{ \if{html}{\out{
}} \describe{ -\item{\code{x}}{The individual instances to be explained by the method +\item{\code{data_orig}}{The individual instances to be explained by the method (unprocessed!).} } \if{html}{\out{
}} @@ -38,9 +38,9 @@ are available (all are wrapped by other packages): \if{html}{\out{
Inherited methods
@@ -54,7 +54,7 @@ Create a new instance of the \code{AgnosticWrapper} R6 class. \if{html}{\out{
}}\preformatted{AgnosticWrapper$new( model, data, - x, + data_ref, output_type = NULL, pred_fun = NULL, output_idx = NULL, @@ -69,7 +69,7 @@ Create a new instance of the \code{AgnosticWrapper} R6 class. \subsection{Arguments}{ \if{html}{\out{
}} \describe{ -\item{\code{model}}{\cr +\item{\code{model}}{(any prediction model)\cr A fitted model for a classification or regression task that is intended to be interpreted. A \code{\link{Converter}} object can also be passed. In order for the package to know how to make predictions @@ -80,15 +80,7 @@ the argument \code{pred_fun}. However, for models created by these have already been pre-implemented and do not need to be specified.\cr} -\item{\code{data}}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr -The dataset to which the method is to be applied. These must -have the same format as the input data of the passed model and has to -be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a -\code{\link{torch_tensor}}.\cr -\strong{Note:} For the model-agnostic methods, only models with a single -input and output layer is allowed!\cr} - -\item{\code{x}}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr +\item{\code{data}}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr The individual instances to be explained by the method. These must have the same format as the input data of the passed model and have to be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a \code{\link{torch_tensor}}. If no value is specified, all instances in the @@ -97,6 +89,14 @@ dataset \code{data_ref} will be explained.\cr \strong{Note:} For the model-agnostic methods, only models with a single input and output layer are allowed!\cr} +\item{\code{data_ref}}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr +The dataset to which the method is to be applied. These must +have the same format as the input data of the passed model and have to +be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}.\cr +\strong{Note:} For the model-agnostic methods, only models with a single +input and output layer are allowed!\cr} + \item{\code{output_type}}{(\code{character(1)})\cr Type of the model output, i.e., either \code{"classification"} or \code{"regression"}.\cr} diff --git a/man/ConnectionWeights.Rd b/man/ConnectionWeights.Rd index 01b4219..beedebd 100644 --- a/man/ConnectionWeights.Rd +++ b/man/ConnectionWeights.Rd @@ -20,6 +20,9 @@ method \emph{Gradient\eqn{\times}Input} (see \code{\link{Gradient}}). Hence, the simply the point-wise product of the global \emph{Connection weights} method and the input data. You can use this variant by setting the \code{times_input} argument to \code{TRUE} and providing input data. + +The R6 class can also be initialized using the \code{\link{run_cw}} function +as a helper function so that no prior knowledge of R6 classes is required. 
} \examples{ \dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -40,9 +43,18 @@ converter <- Converter$new(model, input_names = list(c("Car", "Cat", "Dog", "Plane", "Horse")) ) +# You can also use the helper function for the initialization part +converter <- convert(model, + input_dim = c(5), + input_names = list(c("Car", "Cat", "Dog", "Plane", "Horse")) +) + # Apply method Connection Weights cw <- ConnectionWeights$new(converter) +# Again, you can use a helper function `run_cw()` for initializing +cw <- run_cw(converter) + # Print the head of the result as a data.frame head(get_result(cw, "data.frame"), 5) @@ -62,10 +74,10 @@ if (require("neuralnet")) { ) # Convert the trained model - converter <- Converter$new(nn) + converter <- convert(nn) # Apply the Connection Weights method - cw <- ConnectionWeights$new(converter) + cw <- run_cw(converter) # Get the result as a torch tensor get_result(cw, type = "torch.tensor") @@ -101,10 +113,10 @@ if (require("keras")) { layer_dense(units = 2, activation = "softmax") # Convert the model - converter <- Converter$new(model) + converter <- convert(model) # Apply the Connection Weights method - cw <- ConnectionWeights$new(converter) + cw <- run_cw(converter) # Get the head of the result as a data.frame head(get_result(cw, type = "data.frame"), 5) @@ -168,9 +180,9 @@ variant of the method or the local one was applied. If the value is \if{html}{\out{
Inherited methods
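A note on the two Connection Weights variants documented above: the local variant is nothing more than the global pattern scaled point-wise by the respective input instance. A minimal sketch with the run_cw helper from the examples (the data and times_input arguments are taken from the class description; treat the exact call as an assumption):

cw_global <- run_cw(converter)
cw_local <- run_cw(converter, data = iris[, c(3, 4)], times_input = TRUE)
# get_result(cw_local) should equal the pattern in get_result(cw_global)
# multiplied feature-wise by each input instance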
diff --git a/man/Converter.Rd b/man/Converter.Rd index fe46b3f..13865e0 100644 --- a/man/Converter.Rd +++ b/man/Converter.Rd @@ -20,6 +20,9 @@ You can use models from the following libraries: Furthermore, a model can be passed as a list (see \code{vignette("detailed_overview", package = "innsight")} or the \href{https://bips-hb.github.io/innsight/articles/detailed_overview.html#model-as-named-list}{website}). + +The R6 class can also be initialized using the \code{\link{convert}} function +as a helper function so that no prior knowledge of R6 classes is required. } \details{ In order to better understand and analyze the prediction of a neural @@ -43,9 +46,14 @@ following methods: \itemize{ \item \emph{Layerwise Relevance Propagation} (\link{LRP}), Bach et al. (2015) \item \emph{Deep Learning Important Features} (\link{DeepLift}), Shrikumar et al. (2017) +\item \emph{\link{DeepSHAP}}, Lundberg et al. (2017) \item \emph{\link{SmoothGrad}} including \emph{SmoothGrad\eqn{\times}Input}, Smilkov et al. (2017) \item \emph{Vanilla \link{Gradient}} including \emph{Gradient\eqn{\times}Input} +\item \emph{Integrated gradients} (\link{IntegratedGradient}), Sundararajan et al. (2017) +\item \emph{Expected gradients} (\link{ExpectedGradient}), Erion et al. (2021) \item \emph{\link{ConnectionWeights}}, Olden et al. (2004) +\item \emph{Local interpretable model-agnostic explanation (\link{LIME})}, Ribeiro et al. (2016) +\item \emph{Shapley values (\link{SHAP})}, Lundberg et al. (2017) } } @@ -77,6 +85,10 @@ data <- torch_randn(25, 5) # Convert the model (for torch models 'input_dim' is required!) converter <- Converter$new(model, input_dim = c(5)) +# You can also use the helper function `convert()` for initializing a +# Converter object +converter <- convert(model, input_dim = c(5)) + # Get the converted model stored in the field 'model' converted_model <- converter$model @@ -97,7 +109,7 @@ if (require("neuralnet")) { ) # Convert the model - converter <- Converter$new(nn) + converter <- convert(nn) # Print all the layers converter$model$modules_list @@ -127,7 +139,7 @@ if (require("keras")) { layer_dense(units = 1, activation = "sigmoid") # Convert this model and save model as list - converter <- Converter$new(model, save_model_as_list = TRUE) + converter <- convert(model, save_model_as_list = TRUE) # Print the converted model as a named list str(converter$model_as_list, max.level = 1) @@ -169,7 +181,7 @@ model$layers$Layer_2 <- ) # Convert the model -converter <- Converter$new(model) +converter <- convert(model) # Get the model as a torch::nn_module torch_model <- converter$model @@ -187,10 +199,19 @@ simulated data.} Ecological Modelling 178, p. 389–397 \item S. Bach et al. (2015) \emph{On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation.} PLoS ONE 10, p. 1-46 +\item M. T. Ribeiro et al. (2016) \emph{"Why should I trust you?": Explaining +the predictions of any classifier.} KDD 2016, p. 1135-1144 \item A. Shrikumar et al. (2017) \emph{Learning important features through propagating activation differences.} ICML 2017, p. 4844-4866 \item D. Smilkov et al. (2017) \emph{SmoothGrad: removing noise by adding noise.} CoRR, abs/1706.03825 +\item M. Sundararajan et al. (2017) \emph{Axiomatic attribution for deep networks.} +ICML 2017, p. 3319-3328 +\item S. Lundberg et al. (2017) \emph{A unified approach to interpreting model +predictions.} NIPS 2017, p. 4768-4777 +\item G. Erion et al. 
(2021) \emph{Improving performance of deep learning models +with axiomatic attribution priors and expected gradients.} Nature Machine +Intelligence 3, p. 620-631 } } \section{Public fields}{ diff --git a/man/DeepLift.Rd b/man/DeepLift.Rd index 2a4300f..5fd5f5e 100644 --- a/man/DeepLift.Rd +++ b/man/DeepLift.Rd @@ -19,6 +19,9 @@ get real contributions of the input features to the difference-from-reference prediction. There are two ways to handle activation functions: the \emph{Rescale} rule (\code{'rescale'}) and \emph{RevealCancel} rule (\code{'reveal_cancel'}). + +The R6 class can also be initialized using the \code{\link{run_deeplift}} function +as a helper function so that no prior knowledge of R6 classes is required. } \examples{ \dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -35,12 +38,16 @@ model <- nn_sequential( data <- torch_randn(25, 5) ref <- torch_randn(1, 5) -# Create Converter -converter <- Converter$new(model, input_dim = c(5)) +# Create Converter using the helper function `convert` +converter <- convert(model, input_dim = c(5)) # Apply method DeepLift deeplift <- DeepLift$new(converter, data, x_ref = ref) +# You can also use the helper function `run_deeplift` for initializing +# an R6 DeepLift object +deeplift <- run_deeplift(converter, data, x_ref = ref) + # Print the result as a torch tensor for first two data points get_result(deeplift, "torch.tensor")[1:2] @@ -63,12 +70,12 @@ if (require("neuralnet")) { ) # Convert the model - converter <- Converter$new(nn) + converter <- convert(nn) # Apply DeepLift with rescale-rule and a reference input of the feature # means x_ref <- matrix(colMeans(iris[, c(3, 4)]), nrow = 1) - deeplift_rescale <- DeepLift$new(converter, iris[, c(3, 4)], x_ref = x_ref) + deeplift_rescale <- run_deeplift(converter, iris[, c(3, 4)], x_ref = x_ref) # Get the result as a dataframe and show first 5 rows get_result(deeplift_rescale, type = "data.frame")[1:5, ] @@ -107,10 +114,10 @@ if (require("keras")) { layer_dense(units = 2, activation = "softmax") # Convert the model - converter <- Converter$new(model) + converter <- convert(model) # Apply the DeepLift method with reveal-cancel rule - deeplift_revcancel <- DeepLift$new(converter, data, + deeplift_revcancel <- run_deeplift(converter, data, channels_first = FALSE, rule_name = "reveal_cancel" ) @@ -118,8 +125,8 @@ # Plot the result for the first image and both classes plot(deeplift_revcancel, output_idx = 1:2) - # Plot the result as boxplots for first class - boxplot(deeplift_revcancel, output_idx = 1) + # Plot the pixel-wise median relevance image + plot_global(deeplift_revcancel, output_idx = 1) } \dontshow{\}) # examplesIf} \dontshow{if (torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -175,9 +182,9 @@ Either \code{'rescale'} or \code{'reveal_cancel'}.\cr} \if{html}{\out{
Inherited methods
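Since DeepLift decomposes the difference-from-reference prediction, the examples above admit a quick numerical check: with ignore_last_act = FALSE, the feature-wise contributions should add up (approximately) to f(x) - f(x_ref). A minimal sketch reusing model, data, ref and converter from the first example (the result shape instances x features x outputs is an assumption):

deeplift <- run_deeplift(converter, data, x_ref = ref, ignore_last_act = FALSE)
res <- get_result(deeplift, "torch.tensor")
res$sum(dim = 2)          # contributions summed per instance and output
model(data) - model(ref)  # should agree up to numerical error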
diff --git a/man/DeepSHAP.Rd b/man/DeepSHAP.Rd index 75218ba..a49ac95 100644 --- a/man/DeepSHAP.Rd +++ b/man/DeepSHAP.Rd @@ -10,9 +10,12 @@ from several, ideally representative reference values at each layer. The obtained feature-wise results are approximate Shapley values for the chosen output, where the conditional expectation is computed using these different reference values, i.e., the \emph{DeepSHAP} method decomposes the -difference from the prediction and the mean prediction \eqn{f(x) - E[f(x)]} +difference between the prediction and the mean prediction \eqn{f(x) - E[f(\tilde{x})]} into feature-wise effects. The reference values can be passed by the argument \code{data_ref}. + +The R6 class can also be initialized using the \code{\link{run_deepshap}} function +as a helper function so that no prior knowledge of R6 classes is required. } \examples{ \dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -33,11 +36,15 @@ data <- torch_randn(25, 5) ref <- torch_randn(5, 5) # Create Converter -converter <- Converter$new(model, input_dim = c(5)) +converter <- convert(model, input_dim = c(5)) # Apply method DeepSHAP deepshap <- DeepSHAP$new(converter, data, data_ref = ref) +# You can also use the helper function `run_deepshap` for initializing +# an R6 DeepSHAP object +deepshap <- run_deepshap(converter, data, data_ref = ref) + # Print the result as a torch tensor for first two data points get_result(deepshap, "torch.tensor")[1:2] @@ -60,11 +67,11 @@ if (require("neuralnet")) { ) # Convert the model - converter <- Converter$new(nn) + converter <- convert(nn) # Apply DeepSHAP with rescale-rule and 100 (default of `limit_ref`) # instances as the reference dataset - deepshap <- DeepSHAP$new(converter, iris[, c(3, 4)], + deepshap <- run_deepshap(converter, iris[, c(3, 4)], data_ref = iris[, c(3, 4)]) # Get the result as a dataframe and show first 5 rows @@ -104,17 +111,17 @@ if (require("keras")) { layer_dense(units = 2, activation = "softmax") # Convert the model - converter <- Converter$new(model) + converter <- convert(model) # Apply the DeepSHAP method with zero baseline (which is equivalent to # DeepLift with zero baseline) - deepshap <- DeepSHAP$new(converter, data, channels_first = FALSE) + deepshap <- run_deepshap(converter, data, channels_first = FALSE) # Plot the result for the first image and both classes plot(deepshap, output_idx = 1:2) - # Plot the result as boxplots for first class - boxplot(deepshap, output_idx = 1) + # Plot the pixel-wise median of the results + plot_global(deepshap, output_idx = 1) } \dontshow{\}) # examplesIf} \dontshow{if (torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -174,9 +181,9 @@ format \emph{channels first} is used.\cr} \if{html}{\out{
Inherited methods
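The same kind of check applies to DeepSHAP, only against the averaged prediction over the reference dataset: summing the feature-wise results should approximate f(x) - mean(f(data_ref)). A sketch reusing model, data, ref and converter from the first example above (the result shape is again an assumption):

deepshap <- run_deepshap(converter, data, data_ref = ref, ignore_last_act = FALSE)
get_result(deepshap, "torch.tensor")$sum(dim = 2)
model(data) - model(ref)$mean(dim = 1, keepdim = TRUE)  # should roughly agree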
diff --git a/man/ExpectedGradient.Rd b/man/ExpectedGradient.Rd index 0c20d34..4e30bab 100644 --- a/man/ExpectedGradient.Rd +++ b/man/ExpectedGradient.Rd @@ -16,7 +16,10 @@ E_{x'\sim X', \alpha \sim U(0,1)}[(x - x') \times \frac{\partial f(x' + \alpha ( } The distribution of the reference values is specified with the argument \code{data_ref}, of which \code{n} samples are taken at random for each instance -during the exstimation. +during the estimation. + +The R6 class can also be initialized using the \code{\link{run_expgrad}} function +as a helper function so that no prior knowledge of R6 classes is required. } \examples{ \dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -34,11 +37,15 @@ data <- torch_randn(25, 5) ref <- torch_randn(1, 5) # Create Converter -converter <- Converter$new(model, input_dim = c(5)) +converter <- convert(model, input_dim = c(5)) # Apply method IntegratedGradient int_grad <- IntegratedGradient$new(converter, data, x_ref = ref) +# You can also use the helper function `run_intgrad` for initializing +# an R6 IntegratedGradient object +int_grad <- run_intgrad(converter, data, x_ref = ref) + # Print the result as a torch tensor for first two data points get_result(int_grad, "torch.tensor")[1:2] @@ -61,11 +68,11 @@ if (require("neuralnet")) { ) # Convert the model - converter <- Converter$new(nn) + converter <- convert(nn) # Apply IntegratedGradient with a reference input of the feature means x_ref <- matrix(colMeans(iris[, c(3, 4)]), nrow = 1) - int_grad <- IntegratedGradient$new(converter, iris[, c(3, 4)], x_ref = x_ref) + int_grad <- run_intgrad(converter, iris[, c(3, 4)], x_ref = x_ref) # Get the result as a dataframe and show first 5 rows get_result(int_grad, type = "data.frame")[1:5, ] @@ -103,11 +110,11 @@ if (require("keras")) { layer_dense(units = 2, activation = "softmax") # Convert the model - converter <- Converter$new(model) + converter <- convert(model) # Apply the IntegratedGradient method with a zero baseline and n = 20 # iteration steps - int_grad <- IntegratedGradient$new(converter, data, + int_grad <- run_intgrad(converter, data, channels_first = FALSE, n = 20 ) @@ -115,8 +122,8 @@ if (require("keras")) { # Plot the result for the first image and both classes plot(int_grad, output_idx = 1:2) - # Plot the result as boxplots for first class - boxplot(int_grad, output_idx = 1) + # Plot the pixel-wise median of the results + plot_global(int_grad, output_idx = 1) } \dontshow{\}) # examplesIf} \dontshow{if (torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -175,9 +182,9 @@ input layer.\cr} \if{html}{\out{
Inherited methods
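The expectation in the Expected Gradients formula is estimated by Monte Carlo sampling over reference instances and interpolation factors. A self-contained sketch of this estimator in plain torch (function and variable names are ours, not the package API; f is assumed to map a batch of shape (n, p) to one output per instance):

library(torch)
expected_gradients <- function(f, x, data_ref, n = 50) {
  total <- torch_zeros_like(x)
  for (k in seq_len(n)) {
    idx <- sample(data_ref$shape[1], 1)               # draw x' from X'
    x_ref <- data_ref[idx, ]$unsqueeze(1)
    alpha <- torch_rand(1)                            # draw alpha from U(0, 1)
    z <- (x_ref + alpha * (x - x_ref))$detach()$requires_grad_(TRUE)
    grad <- autograd_grad(f(z)$sum(), list(z))[[1]]   # gradient at the interpolate
    total <- total + (x - x_ref) * grad
  }
  total / n                                           # Monte Carlo average
}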
diff --git a/man/Gradient.Rd b/man/Gradient.Rd index 430fad7..4372606 100644 --- a/man/Gradient.Rd +++ b/man/Gradient.Rd @@ -11,6 +11,12 @@ variable \eqn{i} and output class \eqn{j} If the argument \code{times_input} is \code{TRUE}, the gradients are multiplied by the respective input value (\emph{Gradient\eqn{\times}Input}), i.e., \deqn{x_i * d f(x)_j / d x_i.} +While the vanilla gradients emphasize prediction-sensitive features, +Gradient\eqn{\times}Input is a decomposition of the output into feature-wise +effects based on the first-order Taylor decomposition. + +The R6 class can also be initialized using the \code{\link{run_grad}} function as a +helper function so that no prior knowledge of R6 classes is required. } \examples{ \dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -27,7 +33,7 @@ model <- nn_sequential( data <- torch_randn(25, 5) # Create Converter with input and output names -converter <- Converter$new(model, +converter <- convert(model, input_dim = c(5), input_names = list(c("Car", "Cat", "Dog", "Plane", "Horse")), output_names = list(c("Buy it!", "Don't buy it!")) @@ -36,6 +42,10 @@ converter <- Converter$new(model, # Calculate the Gradients grad <- Gradient$new(converter, data) +# You can also use the helper function `run_grad` for initializing +# an R6 Gradient object +grad <- run_grad(converter, data) + # Print the result as a data.frame for first 5 rows get_result(grad, "data.frame")[1:5,] @@ -59,18 +69,18 @@ if (require("neuralnet")) { ) # Convert the trained model - converter <- Converter$new(nn) + converter <- convert(nn) # Calculate the gradients - gradient <- Gradient$new(converter, iris[, -5]) + gradient <- run_grad(converter, iris[, -5]) # Plot the result for the first and 60th data point and all classes plot(gradient, data_idx = c(1, 60), output_idx = 1:3) # Calculate Gradients x Input and do not ignore the last activation - gradient <- Gradient$new(converter, iris[, -5], - ignore_last_act = FALSE, - times_input = TRUE) + gradient <- run_grad(converter, iris[, -5], + ignore_last_act = FALSE, + times_input = TRUE) # Plot the result again plot(gradient, data_idx = c(1, 60), output_idx = 1:3) @@ -103,10 +113,10 @@ if (require("keras")) { layer_dense(units = 3, activation = "softmax") # Convert the model - converter <- Converter$new(model) + converter <- convert(model) # Apply the Gradient method - gradient <- Gradient$new(converter, data, channels_first = FALSE) + gradient <- run_grad(converter, data, channels_first = FALSE) # Plot the result for the first datapoint and all classes plot(gradient, output_idx = 1:3) @@ -156,9 +166,9 @@ Other methods: \if{html}{\out{
Inherited methods
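The first-order Taylor reading of Gradient x Input added above is easy to reproduce directly with torch autograd. A minimal sketch for one instance and the first output class (the model here is a stand-in; run_grad(converter, data, times_input = TRUE) should yield the corresponding result):

library(torch)
f <- nn_sequential(nn_linear(5, 8), nn_relu(), nn_linear(8, 3))  # stand-in model
x <- torch_randn(1, 5, requires_grad = TRUE)
g <- autograd_grad(f(x)[1, 1], list(x))[[1]]  # d f(x)_1 / d x
x * g                                         # x_i * d f(x)_j / d x_i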
diff --git a/man/GradientBased.Rd b/man/GradientBased.Rd index 4aa4ed3..9ba4494 100644 --- a/man/GradientBased.Rd +++ b/man/GradientBased.Rd @@ -12,6 +12,7 @@ gradients w.r.t. to the input for given data. Implemented are: \item \emph{Vanilla Gradients} and \emph{Gradient\eqn{\times}Input} (\code{\link{Gradient}}) \item \emph{Integrated Gradients} (\code{\link{IntegratedGradient}}) \item \emph{SmoothGrad} and \emph{SmoothGrad\eqn{\times}Input} (\code{\link{SmoothGrad}}) +\item \emph{ExpectedGradients} (\code{\link{ExpectedGradient}}) } } \section{Super class}{ @@ -36,9 +37,9 @@ were multiplied by the provided input data or not.\cr} \if{html}{\out{
Inherited methods
diff --git a/man/IntegratedGradient.Rd b/man/IntegratedGradient.Rd index e9c7498..f6d1f19 100644 --- a/man/IntegratedGradient.Rd +++ b/man/IntegratedGradient.Rd @@ -22,6 +22,9 @@ Similar to the other gradient-based methods, by default the integrated gradient is multiplied by the input to get an approximate decomposition of \eqn{y - y'}. However, with the parameter \code{times_input} only the gradient describing the output sensitivity can be returned. + +The R6 class can also be initialized using the \code{\link{run_intgrad}} function +as a helper function so that no prior knowledge of R6 classes is required. } \examples{ \dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -39,11 +42,15 @@ data <- torch_randn(25, 5) ref <- torch_randn(1, 5) # Create Converter -converter <- Converter$new(model, input_dim = c(5)) +converter <- convert(model, input_dim = c(5)) # Apply method IntegratedGradient int_grad <- IntegratedGradient$new(converter, data, x_ref = ref) +# You can also use the helper function `run_intgrad` for initializing +# an R6 IntegratedGradient object +int_grad <- run_intgrad(converter, data, x_ref = ref) + # Print the result as a torch tensor for first two data points get_result(int_grad, "torch.tensor")[1:2] @@ -66,11 +73,11 @@ if (require("neuralnet")) { ) # Convert the model - converter <- Converter$new(nn) + converter <- convert(nn) # Apply IntegratedGradient with a reference input of the feature means x_ref <- matrix(colMeans(iris[, c(3, 4)]), nrow = 1) - int_grad <- IntegratedGradient$new(converter, iris[, c(3, 4)], x_ref = x_ref) + int_grad <- run_intgrad(converter, iris[, c(3, 4)], x_ref = x_ref) # Get the result as a dataframe and show first 5 rows get_result(int_grad, type = "data.frame")[1:5, ] @@ -108,11 +115,11 @@ if (require("keras")) { layer_dense(units = 2, activation = "softmax") # Convert the model - converter <- Converter$new(model) + converter <- convert(model) # Apply the IntegratedGradient method with a zero baseline and n = 20 # iteration steps - int_grad <- IntegratedGradient$new(converter, data, + int_grad <- run_intgrad(converter, data, channels_first = FALSE, n = 20 ) @@ -120,8 +127,8 @@ if (require("keras")) { # Plot the result for the first image and both classes plot(int_grad, output_idx = 1:2) - # Plot the result as boxplots for first class - boxplot(int_grad, output_idx = 1) + # Plot the pixel-wise median of the results + plot_global(int_grad, output_idx = 1) } \dontshow{\}) # examplesIf} \dontshow{if (torch::torch_is_installed() & Sys.getenv("RENDER_PLOTLY", unset = 0) == 1) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -178,9 +185,9 @@ input layer.\cr} \if{html}{\out{
Inherited methods
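In practice, the path integral in the Integrated Gradients formula is replaced by an average of gradients at n interpolated inputs between the baseline x' and x, which is what the `n` argument in the examples above controls. A sketch of this approximation in plain torch (our own names and a simple Riemann sum; the package's exact quadrature may differ):

library(torch)
integrated_gradients <- function(f, x, x_ref, n = 50) {
  avg_grad <- torch_zeros_like(x)
  for (alpha in seq(0, 1, length.out = n)) {          # interpolation path
    z <- (x_ref + alpha * (x - x_ref))$detach()$requires_grad_(TRUE)
    avg_grad <- avg_grad + autograd_grad(f(z)$sum(), list(z))[[1]]
  }
  (x - x_ref) * avg_grad / n                          # times_input = TRUE variant
}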
diff --git a/man/InterpretingMethod.Rd b/man/InterpretingMethod.Rd index 10db192..068698a 100644 --- a/man/InterpretingMethod.Rd +++ b/man/InterpretingMethod.Rd @@ -135,7 +135,7 @@ averaged prediction across the reference values.\cr} \item \href{#method-InterpretingMethod-new}{\code{InterpretingMethod$new()}} \item \href{#method-InterpretingMethod-get_result}{\code{InterpretingMethod$get_result()}} \item \href{#method-InterpretingMethod-plot}{\code{InterpretingMethod$plot()}} -\item \href{#method-InterpretingMethod-boxplot}{\code{InterpretingMethod$boxplot()}} +\item \href{#method-InterpretingMethod-plot_global}{\code{InterpretingMethod$plot_global()}} \item \href{#method-InterpretingMethod-print}{\code{InterpretingMethod$print()}} \item \href{#method-InterpretingMethod-clone}{\code{InterpretingMethod$clone()}} } @@ -422,11 +422,11 @@ individual results. } } \if{html}{\out{
}} -\if{html}{\out{}} -\if{latex}{\out{\hypertarget{method-InterpretingMethod-boxplot}{}}} -\subsection{Method \code{boxplot()}}{ +\if{html}{\out{}} +\if{latex}{\out{\hypertarget{method-InterpretingMethod-plot_global}{}}} +\subsection{Method \code{plot_global()}}{ This method visualizes the results of the selected method summarized as -boxplots and enables a visual in-depth investigation of the global +boxplots/median image and enables a visual in-depth investigation of the global behavior with the help of the S4 classes \code{\link{innsight_ggplot2}} and \code{\link{innsight_plotly}}.\cr You can use the argument \code{output_idx} to select the individual output @@ -450,7 +450,7 @@ a bit more complex, therefore the suggested packages \code{'grid'}, \code{'gridExtra'} and \code{'gtable'} must be installed in your R session. } \subsection{Usage}{ -\if{html}{\out{
}}\preformatted{InterpretingMethod$boxplot( +\if{html}{\out{
}}\preformatted{InterpretingMethod$plot_global( output_idx = NULL, output_label = NULL, data_idx = "all", diff --git a/man/LIME.Rd b/man/LIME.Rd index d9eb586..f331d71 100644 --- a/man/LIME.Rd +++ b/man/LIME.Rd @@ -17,6 +17,9 @@ function is pre-implemented for models created with \code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}. Internally, the suggested package \code{lime} is utilized and applied to \code{data.frame}. +The R6 class can also be initialized using the \code{\link{run_lime}} function +as a helper function so that no prior knowledge of R6 classes is required. + \strong{Note:} Even signal and image data are initially transformed into a \code{data.frame} using \code{as.data.frame()} and then \code{\link[lime:lime]{lime::lime}} and \code{\link[lime:explain]{lime::explain}} are @@ -39,7 +42,13 @@ data <- torch_randn(25, 5) # Calculate LIME for the first 10 instances and set the # feature and outcome names -lime <- LIME$new(model, data, data[1:10, ], +lime <- LIME$new(model, data[1:10, ], data_ref = data, + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!")) + +# You can also use the helper function `run_lime` for initializing +# an R6 LIME object +lime <- run_lime(model, data[1:10, ], data_ref = data, input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), output_names = c("Buy it!", "Don't buy it!")) @@ -54,10 +63,10 @@ boxplot(lime, output_idx = c(1, 2)) # We can also forward some arguments to lime::explain, e.g. n_perturbations # to get more accurate values -lime <- LIME$new(model, data, data[1:10, ], +lime <- run_lime(model, data[1:10, ], data_ref = data, input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), output_names = c("Buy it!", "Don't buy it!"), - n_perturbations = 500) + n_perturbations = 200) # Plot the boxplots again boxplot(lime, output_idx = c(1, 2)) @@ -65,13 +74,13 @@ boxplot(lime, output_idx = c(1, 2)) #----------------------- Example 2: Converter object -------------------------- # We can do the same with a Converter object (all feature and outcome names # will be extracted by the LIME method!) -conv <- Converter$new(model, - input_dim = c(5), - input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), - output_names = c("Buy it!", "Don't buy it!")) +conv <- convert(model, + input_dim = c(5), + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!")) # Calculate LIME for the first 10 instances -lime <- LIME$new(conv, data, data[1:10], n_perturbations = 400) +lime <- run_lime(conv, data[1:10], data_ref = data, n_perturbations = 300) # Plot the result for both classes plot(lime, output_idx = c(1, 2)) @@ -93,11 +102,12 @@ if (require("neuralnet") & require("ranger")) { # Calculate LIME for the instances of index 1 and 111 and add # the outcome labels (for LIME, the output_type is required!) - lime <- LIME$new(model, iris[, -5], iris[c(1, 111), -5], + lime <- run_lime(model, iris[c(1, 111), -5], + data_ref = iris[, -5], pred_fun = pred_fun, output_type = "classification", output_names = levels(iris$Species), - n_perturbations = 500) + n_perturbations = 300) # Plot the result for the first two classes and all selected instances plot(lime, data_idx = 1:2, output_idx = 1:2) @@ -133,9 +143,9 @@ Other methods: \if{html}{\out{
Inherited methods
@@ -151,7 +161,7 @@ stored in the field \code{result}. \if{html}{\out{
}}\preformatted{LIME$new( model, data, - x, + data_ref, output_type = NULL, pred_fun = NULL, output_idx = NULL, @@ -167,7 +177,7 @@ stored in the field \code{result}. \subsection{Arguments}{ \if{html}{\out{
}} \describe{ -\item{\code{model}}{\cr +\item{\code{model}}{(any prediction model)\cr A fitted model for a classification or regression task that is intended to be interpreted. A \code{\link{Converter}} object can also be passed. In order for the package to know how to make predictions @@ -178,15 +188,7 @@ the argument \code{pred_fun}. However, for models created by these have already been pre-implemented and do not need to be specified.\cr} -\item{\code{data}}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr -The dataset to which the method is to be applied. These must -have the same format as the input data of the passed model and has to -be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a -\code{\link{torch_tensor}}.\cr -\strong{Note:} For the model-agnostic methods, only models with a single -input and output layer is allowed!\cr} - -\item{\code{x}}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr +\item{\code{data}}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr The individual instances to be explained by the method. These must have the same format as the input data of the passed model and have to be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a \code{\link{torch_tensor}}. If no value is specified, all instances in the @@ -195,6 +197,14 @@ dataset \code{data_ref} will be explained.\cr \strong{Note:} For the model-agnostic methods, only models with a single input and output layer are allowed!\cr} +\item{\code{data_ref}}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr +The dataset to which the method is to be applied. These must +have the same format as the input data of the passed model and have to +be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}.\cr +\strong{Note:} For the model-agnostic methods, only models with a single +input and output layer are allowed!\cr} + \item{\code{output_type}}{(\code{character(1)})\cr Type of the model output, i.e., either \code{"classification"} or \code{"regression"}.\cr} diff --git a/man/LRP.Rd b/man/LRP.Rd index 6175ccd..7000d2b 100644 --- a/man/LRP.Rd +++ b/man/LRP.Rd @@ -16,6 +16,9 @@ is generally an approximation. There exist several propagation rules to determine the relevance scores. Implemented in this package are: the simple rule ("simple"), the \eqn{\varepsilon}-rule ("epsilon") and the \eqn{\alpha}-\eqn{\beta}-rule ("alpha_beta"). + +The R6 class can also be initialized using the \code{\link{run_lrp}} function +as a helper function so that no prior knowledge of R6 classes is required. 
} \examples{ \dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -32,11 +35,15 @@ model <- nn_sequential( data <- torch_randn(25, 5) # Create Converter -converter <- Converter$new(model, input_dim = c(5)) +converter <- convert(model, input_dim = c(5)) # Apply method LRP with simple rule (default) lrp <- LRP$new(converter, data) +# You can also use the helper function `run_lrp` for initializing +# an R6 LRP object +lrp <- run_lrp(converter, data) + # Print the result as an array for data point one and two get_result(lrp)[1:2,,] @@ -57,10 +64,10 @@ if (require("neuralnet")) { ) # Create a converter for this model - converter <- Converter$new(nn) + converter <- convert(nn) # Create new instance of 'LRP' - lrp <- LRP$new(converter, iris[, -5], rule_name = "simple") + lrp <- run_lrp(converter, iris[, -5], rule_name = "simple") # Get the result as an array for data point one and two get_result(lrp)[1:2,,] @@ -69,13 +76,13 @@ if (require("neuralnet")) { get_result(lrp, type = "torch.tensor")[1:2] # Use the alpha-beta rule with alpha = 2 - lrp <- LRP$new(converter, iris[, -5], + lrp <- run_lrp(converter, iris[, -5], rule_name = "alpha_beta", rule_param = 2 ) # Include the last activation into the calculation - lrp <- LRP$new(converter, iris[, -5], + lrp <- run_lrp(converter, iris[, -5], rule_name = "alpha_beta", rule_param = 2, ignore_last_act = FALSE ) @@ -112,11 +119,11 @@ if (require("keras")) { layer_dense(units = 3, activation = "softmax") # Convert the model - converter <- Converter$new(model) + converter <- convert(model) # Apply the LRP method with the epsilon rule for the dense layers and # the alpha-beta rule for the convolutional layers - lrp_comp <- LRP$new(converter, data, + lrp_comp <- run_lrp(converter, data, channels_first = FALSE, rule_name = list(Dense_Layer = "epsilon", Conv1D_Layer = "alpha_beta"), rule_param = list(Dense_Layer = 0.1, Conv1D_Layer = 1) @@ -200,9 +207,9 @@ rule parameter to each layer type.\cr} \if{html}{\out{
Inherited methods
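For intuition on the propagation rules named above: for a dense layer, the simple rule redistributes the relevance R_j of each output neuron to the inputs in proportion to their contributions a_i * w_ij to the pre-activation z_j, and the epsilon rule merely stabilizes the denominator. A base-R sketch for a single layer (names are ours, not the package internals):

lrp_dense <- function(a, W, b, R_upper, eps = 0) {
  z <- a %*% W + b                      # pre-activations z_j (a: 1 x p, W: p x q)
  s <- R_upper / (z + eps * sign(z))    # eps = 0: simple rule; eps > 0: epsilon rule
  a * (s %*% t(W))                      # R_i = a_i * sum_j w_ij * R_j / z_j
}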
diff --git a/man/SHAP.Rd b/man/SHAP.Rd index aaf7538..67dbb3d 100644 --- a/man/SHAP.Rd +++ b/man/SHAP.Rd @@ -15,6 +15,9 @@ function is pre-implemented for models created with \code{\link[neuralnet]{neuralnet}} or \code{\link{Converter}}. Internally, the suggested package \code{fastshap} is utilized and applied to \code{data.frame}. +The R6 class can also be initialized using the \code{\link{run_shap}} function +as a helper function so that no prior knowledge of R6 classes is required. + \strong{Note:} Even signal and image data are initially transformed into a \code{data.frame} using \code{as.data.frame()} and then \code{\link[fastshap:explain]{fastshap::explain}} is applied. In other words, a custom \code{pred_fun} may need to convert the @@ -36,7 +39,13 @@ data <- torch_randn(25, 5) # Calculate Shapley values for the first 10 instances and set the # feature and outcome names -shap <- SHAP$new(model, data, data[1:10, ], +shap <- SHAP$new(model, data[1:10, ], data_ref = data, + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!")) + +# You can also use the helper function `run_shap` for initializing +# an R6 SHAP object +shap <- run_shap(model, data[1:10, ], data_ref = data, input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), output_names = c("Buy it!", "Don't buy it!")) @@ -51,7 +60,7 @@ boxplot(shap, output_idx = c(1, 2)) # We can also forward some arguments to fastshap::explain, e.g. nsim to # get more accurate values -shap <- SHAP$new(model, data, data[1:10, ], +shap <- run_shap(model, data[1:10, ], data_ref = data, input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), output_names = c("Buy it!", "Don't buy it!"), nsim = 10) @@ -62,13 +71,13 @@ boxplot(shap, output_idx = c(1, 2)) #----------------------- Example 2: Converter object -------------------------- # We can do the same with a Converter object (all feature and outcome names # will be extracted by the SHAP method!) -conv <- Converter$new(model, - input_dim = c(5), - input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), - output_names = c("Buy it!", "Don't buy it!")) +conv <- convert(model, + input_dim = c(5), + input_names = c("Car", "Cat", "Dog", "Plane", "Horse"), + output_names = c("Buy it!", "Don't buy it!")) # Calculate Shapley values for the first 10 instances -shap <- SHAP$new(conv, data, data[1:10]) +shap <- run_shap(conv, data[1:10], data_ref = data) # Plot the result for both classes plot(shap, output_idx = c(1, 2)) @@ -90,7 +99,7 @@ if (require("neuralnet") & require("ranger")) { # Calculate Shapley values for the instances of index 1 and 111 and add # the outcome labels - shap <- SHAP$new(model, iris[, -5], iris[c(1, 111), -5], + shap <- run_shap(model, iris[c(1, 111), -5], data_ref = iris[, -5], pred_fun = pred_fun, output_names = levels(iris$Species), nsim = 10) @@ -129,9 +138,9 @@ Other methods: \if{html}{\out{
Inherited methods
@@ -147,7 +156,7 @@ stored in the field \code{result}. \if{html}{\out{
}}\preformatted{SHAP$new( model, data, - x, + data_ref, pred_fun = NULL, output_idx = NULL, output_label = NULL, @@ -162,7 +171,7 @@ stored in the field \code{result}. \subsection{Arguments}{ \if{html}{\out{
}} \describe{ -\item{\code{model}}{\cr +\item{\code{model}}{(any prediction model)\cr A fitted model for a classification or regression task that is intended to be interpreted. A \code{\link{Converter}} object can also be passed. In order for the package to know how to make predictions @@ -173,15 +182,7 @@ the argument \code{pred_fun}. However, for models created by these have already been pre-implemented and do not need to be specified.\cr} -\item{\code{data}}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr -The dataset to which the method is to be applied. These must -have the same format as the input data of the passed model and has to -be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a -\code{\link{torch_tensor}}.\cr -\strong{Note:} For the model-agnostic methods, only models with a single -input and output layer is allowed!\cr} - -\item{\code{x}}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr +\item{\code{data}}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr The individual instances to be explained by the method. These must have the same format as the input data of the passed model and have to be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a @@ -190,6 +191,14 @@ dataset \code{data_ref} will be explained.\cr \strong{Note:} For the model-agnostic methods, only models with a single input and output layer are allowed!\cr} +\item{\code{data_ref}}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr +The dataset to which the method is to be applied. These must +have the same format as the input data of the passed model and have to +be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}.\cr +\strong{Note:} For the model-agnostic methods, only models with a single +input and output layer are allowed!\cr} + \item{\code{pred_fun}}{(\code{function})\cr Prediction function for the model. This argument is only needed if \code{model} is not a model created by diff --git a/man/SmoothGrad.Rd b/man/SmoothGrad.Rd index 77f7a3c..fd7ad97 100644 --- a/man/SmoothGrad.Rd +++ b/man/SmoothGrad.Rd @@ -12,6 +12,9 @@ gradients for \code{n} perturbations of each data point, i.e., with Analogous to the \emph{Gradient\eqn{\times}Input} method, you can also use the argument \code{times_input} to multiply the gradients by the inputs before taking the average (\emph{SmoothGrad\eqn{\times}Input}). + +The R6 class can also be initialized using the \code{\link{run_smoothgrad}} function +as a helper function so that no prior knowledge of R6 classes is required. 
} \examples{ \dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -28,11 +31,15 @@ model <- nn_sequential( data <- torch_randn(25, 5) # Create Converter -converter <- Converter$new(model, input_dim = c(5)) +converter <- convert(model, input_dim = c(5)) # Calculate the smoothed Gradients smoothgrad <- SmoothGrad$new(converter, data) +# You can also use the helper function `run_smoothgrad` for initializing +# an R6 SmoothGrad object +smoothgrad <- run_smoothgrad(converter, data) + # Print the result as a data.frame for first 5 rows head(get_result(smoothgrad, "data.frame"), 5) @@ -56,16 +63,16 @@ if (require("neuralnet")) { ) # Convert the trained model - converter <- Converter$new(nn) + converter <- convert(nn) # Calculate the smoothed gradients - smoothgrad <- SmoothGrad$new(converter, iris[, -5], times_input = FALSE) + smoothgrad <- run_smoothgrad(converter, iris[, -5], times_input = FALSE) # Plot the result for the first and 60th data point and all classes plot(smoothgrad, data_idx = c(1, 60), output_idx = 1:3) # Calculate SmoothGrad x Input and do not ignore the last activation - smoothgrad <- SmoothGrad$new(converter, iris[, -5], ignore_last_act = FALSE) + smoothgrad <- run_smoothgrad(converter, iris[, -5], ignore_last_act = FALSE) # Plot the result again plot(smoothgrad, data_idx = c(1, 60), output_idx = 1:3) @@ -98,10 +105,10 @@ if (require("keras")) { layer_dense(units = 3, activation = "softmax") # Convert the model - converter <- Converter$new(model) + converter <- convert(model) # Apply the SmoothGrad method - smoothgrad <- SmoothGrad$new(converter, data, channels_first = FALSE) + smoothgrad <- run_smoothgrad(converter, data, channels_first = FALSE) # Plot the result for the first datapoint and all classes plot(smoothgrad, output_idx = 1:3) @@ -167,9 +174,9 @@ perturbation, i.e., \eqn{\sigma = (max(x) - min(x)) *} \code{noise_level}.\cr} \if{html}{\out{
Inherited methods
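The SmoothGrad estimator documented above is a plain average of gradients at n noisy copies of the input, with sigma = (max(x) - min(x)) * noise_level. A sketch in plain torch (our own names; run_smoothgrad exposes the same knobs via its n and noise_level arguments):

library(torch)
smooth_grad <- function(f, x, n = 50, noise_level = 0.1) {
  sigma <- (x$max() - x$min()) * noise_level   # noise scale tied to the input range
  g <- torch_zeros_like(x)
  for (k in seq_len(n)) {
    z <- (x + sigma * torch_randn_like(x))$requires_grad_(TRUE)
    g <- g + autograd_grad(f(z)$sum(), list(z))[[1]]
  }
  g / n                                        # averaged (smoothed) gradient
}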
diff --git a/man/get_result.Rd b/man/get_result.Rd index 887ecea..86df3fe 100644 --- a/man/get_result.Rd +++ b/man/get_result.Rd @@ -8,7 +8,8 @@ get_result(x, ...) } \arguments{ \item{x}{An object of the class \code{\link{InterpretingMethod}} including the -subclasses \code{\link{Gradient}}, \code{\link{SmoothGrad}}, \code{\link{LRP}}, \code{\link{DeepLift}} and +subclasses \code{\link{Gradient}}, \code{\link{SmoothGrad}}, \code{\link{LRP}}, \code{\link{DeepLift}}, +\code{\link{DeepSHAP}}, \code{\link{IntegratedGradient}}, \code{\link{ExpectedGradient}} and \code{\link{ConnectionWeights}}.} \item{...}{Other arguments specified in the R6 method diff --git a/man/innsight-package.Rd b/man/innsight-package.Rd index bca3f26..663076b 100644 --- a/man/innsight-package.Rd +++ b/man/innsight-package.Rd @@ -30,12 +30,20 @@ This package implements several model-specific interpretability \item Including propagation rules for non-linearities: \emph{Rescale} rule and \emph{RevealCancel} rule } +\item \link{DeepSHAP} \item Gradient-based methods: \itemize{ \item \emph{Vanilla \link{Gradient}}, including \emph{Gradient\eqn{\times}Input} \item Smoothed gradients \emph{(\link{SmoothGrad})}, including \emph{SmoothGrad\eqn{\times}Input} +\item \emph{Integrated gradients} (\link{IntegratedGradient}) +\item \emph{Expected gradients} (\link{ExpectedGradient}) } \item \emph{\link{ConnectionWeights}} +\item Model-agnostic methods: +\itemize{ +\item \emph{Local interpretable model-agnostic explanation (\link{LIME})} +\item \emph{Shapley values} (\link{SHAP}) +} } The package \code{innsight} aims to be as flexible as possible and independent diff --git a/man/innsight_ggplot2-indexing.Rd b/man/innsight_ggplot2-indexing.Rd index ac00c3a..a850016 100644 --- a/man/innsight_ggplot2-indexing.Rd +++ b/man/innsight_ggplot2-indexing.Rd @@ -1,21 +1,23 @@ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/innsight_ggplot2.R -\name{[,innsight_ggplot2-method} +\name{[,innsight_ggplot2,ANY,ANY,ANY-method} +\alias{[,innsight_ggplot2,ANY,ANY,ANY-method} \alias{[,innsight_ggplot2-method} \alias{[.innsight_ggplot2} \alias{[[,innsight_ggplot2-method} \alias{[[.innsight_ggplot2} +\alias{[<-,innsight_ggplot2,ANY,ANY,ANY-method} \alias{[<-,innsight_ggplot2-method} \alias{[<-.innsight_ggplot2} \alias{[[<-,innsight_ggplot2-method} \alias{[[<-.innsight_ggplot2} \title{Indexing plots of \code{innsight_ggplot2}} \usage{ -\S4method{[}{innsight_ggplot2}(x, i, j, ..., restyle = TRUE, drop = TRUE) +\S4method{[}{innsight_ggplot2,ANY,ANY,ANY}(x, i, j, ..., restyle = TRUE, drop = TRUE) \S4method{[[}{innsight_ggplot2}(x, i, j, ..., restyle = TRUE) -\S4method{[}{innsight_ggplot2}(x, i, j, ...) <- value +\S4method{[}{innsight_ggplot2,ANY,ANY,ANY}(x, i, j, ...) <- value \S4method{[[}{innsight_ggplot2}(x, i, j, ...) 
<- value } diff --git a/man/innsight_plotly-indexing.Rd b/man/innsight_plotly-indexing.Rd index 1d84ae6..2eed740 100644 --- a/man/innsight_plotly-indexing.Rd +++ b/man/innsight_plotly-indexing.Rd @@ -1,13 +1,14 @@ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/innsight_plotly.R -\name{[,innsight_plotly-method} +\name{[,innsight_plotly,ANY,ANY,ANY-method} +\alias{[,innsight_plotly,ANY,ANY,ANY-method} \alias{[,innsight_plotly-method} \alias{[.innsight_plotly} \alias{[[,innsight_plotly-method} \alias{[[.innsight_plotly} \title{Indexing plots of \code{innsight_plotly}} \usage{ -\S4method{[}{innsight_plotly}(x, i, j, ..., drop = TRUE) +\S4method{[}{innsight_plotly,ANY,ANY,ANY}(x, i, j, ..., drop = TRUE) \S4method{[[}{innsight_plotly}(x, i, j, ..., drop) } diff --git a/man/innsight_sugar.Rd b/man/innsight_sugar.Rd index b286add..942adce 100644 --- a/man/innsight_sugar.Rd +++ b/man/innsight_sugar.Rd @@ -42,11 +42,11 @@ run_deepshap(converter, data, ...) # Apply the `ConnectionWeights` method (argument `data` is not always required) run_cw(converter, ...) -# Apply the `LIME` method to explain `x` by using the dataset `data` -run_lime(model, data, x, ...) +# Apply the `LIME` method to explain `data` by using the dataset `data_ref` +run_lime(model, data, data_ref, ...) -# Apply the `SHAP` method to explain `x` by using the dataset `data` -run_shap(model, data, x, ...) +# Apply the `SHAP` method to explain `data` by using the dataset `data_ref` +run_shap(model, data, data_ref, ...) } \arguments{ \item{model}{(\code{\link{nn_sequential}}, \code{\link[keras]{keras_model}}, @@ -88,12 +88,11 @@ upper point) for each of the input layers. \strong{Note:} For the model-agnostic methods, only models with a single input and output layer are allowed!\cr} -\item{x}{(\code{array}, \code{data.frame} or \code{torch_tensor})\cr -The individual instances to be explained by the method. -These must have the same format as the input data of the passed model -and has to be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a -\code{\link{torch_tensor}}. If no value is specified, all instances in the -dataset \code{data} will be explained.\cr +\item{data_ref}{(\code{\link{array}}, \code{\link{data.frame}} or \code{\link{torch_tensor}})\cr +The dataset to which the method is to be applied. These must +have the same format as the input data of the passed model and have to +be either \code{\link{matrix}}, an \code{\link{array}}, a \code{\link{data.frame}} or a +\code{\link{torch_tensor}}.\cr \strong{Note:} For the model-agnostic methods, only models with a single input and output layer are allowed!\cr} } diff --git a/man/plot_global.Rd b/man/plot_global.Rd new file mode 100644 index 0000000..c0a1835 --- /dev/null +++ b/man/plot_global.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/InterpretingMethod.R +\name{plot_global} +\alias{plot_global} +\title{Plot the global behavior of an interpretation method} +\usage{ +plot_global(x, ...) +} +\arguments{ +\item{x}{An object of the class \code{\link{InterpretingMethod}} including the +subclasses \code{\link{Gradient}}, \code{\link{SmoothGrad}}, \code{\link{LRP}}, \code{\link{DeepLift}}, +\code{\link{DeepSHAP}}, \code{\link{IntegratedGradient}}, \code{\link{ExpectedGradient}} and +\code{\link{ConnectionWeights}}.} + +\item{...}{Other arguments specified in the R6 method +\code{InterpretingMethod$plot_global()}. 
See \code{\link{InterpretingMethod}} for details.} +} +\description{ +This is a generic S3 method for the R6 method +\code{InterpretingMethod$plot_global()}. See the respective method described in +\code{\link{InterpretingMethod}} for details. +} diff --git a/tests/testthat/test_ConnectionWeights.R b/tests/testthat/test_ConnectionWeights.R index dc77749..463ff6b 100644 --- a/tests/testthat/test_ConnectionWeights.R +++ b/tests/testthat/test_ConnectionWeights.R @@ -242,7 +242,7 @@ test_that("ConnectionWeights (global): Conv2D-Net", { expect_s4_class(p, "innsight_plotly") p <- plot(cw_first, as_plotly = TRUE, output_idx = c(1,2)) expect_s4_class(p, "innsight_plotly") - expect_error(boxplot(cw_first)) + expect_error(plot_global(cw_first)) expect_message(plot(cw_first, data_idx = c(1,3))) # Test plot function with channels last @@ -254,7 +254,7 @@ test_that("ConnectionWeights (global): Conv2D-Net", { expect_s4_class(p, "innsight_plotly") p <- plot(cw_last, as_plotly = TRUE, output_idx = c(1,2)) expect_s4_class(p, "innsight_plotly") - expect_error(boxplot(cw_last)) + expect_error(plot_global(cw_last)) expect_message(plot(cw_last, data_idx = c(1,3))) }) @@ -325,7 +325,7 @@ test_that("ConnectionWeights (global): Keras model with two inputs + two outputs expect_s4_class(p, "innsight_plotly") p <- plot(cw_first, as_plotly = TRUE, output_idx = list(c(2), c(1))) expect_s4_class(p, "innsight_plotly") - expect_error(boxplot(cw_first)) + expect_error(plot_global(cw_first)) expect_message(plot(cw_first, data_idx = c(1,3))) # Test plot function with channels last @@ -337,7 +337,7 @@ test_that("ConnectionWeights (global): Keras model with two inputs + two outputs expect_s4_class(p, "innsight_plotly") p <- plot(cw_last, as_plotly = TRUE, output_idx = list(c(2), c(1))) expect_s4_class(p, "innsight_plotly") - expect_error(boxplot(cw_last)) + expect_error(plot_global(cw_last)) expect_message(plot(cw_last, data_idx = c(1,3))) }) @@ -593,10 +593,10 @@ test_that("ConnectionWeights (local): Conv2D-Net", { p <- plot(cw_last, as_plotly = TRUE, output_idx = c(1,2), data_idx = c(1,4)) expect_s4_class(p, "innsight_plotly") - # Test boxplot - box <- boxplot(cw_first) + # Test plot_global + box <- plot_global(cw_first) expect_s4_class(box, "innsight_ggplot2") - box <- boxplot(cw_first, as_plotly = TRUE) + box <- plot_global(cw_first, as_plotly = TRUE) expect_s4_class(box, "innsight_plotly") }) @@ -685,9 +685,9 @@ test_that("ConnectionWeights (global): Keras model with two inputs + two outputs p <- plot(cw_last, as_plotly = TRUE, output_idx = list(c(2), c(1))) expect_s4_class(p, "innsight_plotly") - # Test boxplot - box <- boxplot(cw_first) + # Test plot_global + box <- plot_global(cw_first) expect_s4_class(box, "innsight_ggplot2") - box <- boxplot(cw_first, as_plotly = TRUE) + box <- plot_global(cw_first, as_plotly = TRUE) expect_s4_class(box, "innsight_plotly") }) diff --git a/tests/testthat/test_Gradients.R b/tests/testthat/test_Gradients.R index 04683ef..6e55f80 100644 --- a/tests/testthat/test_Gradients.R +++ b/tests/testthat/test_Gradients.R @@ -422,34 +422,34 @@ test_that("Gradient: Conv2D-Net", { p <- plot(grad, aggr_channels = function(x) -abs(sum(x))) expect_s4_class(p, "innsight_ggplot2") - p <- boxplot(grad) + p <- plot_global(grad) expect_s4_class(p, "innsight_ggplot2") - p <- boxplot(grad, preprocess_FUN = identity) + p <- plot_global(grad, preprocess_FUN = identity) expect_s4_class(p, "innsight_ggplot2") - p <- boxplot(grad, preprocess_FUN = function(x) -abs(x)) + p <- plot_global(grad, preprocess_FUN = 
function(x) -abs(x)) expect_s4_class(p, "innsight_ggplot2") - p <- boxplot(grad_first) + p <- plot_global(grad_first) expect_s4_class(p, "innsight_ggplot2") - p <- boxplot(grad, output_idx = c(1)) + p <- plot_global(grad, output_idx = c(1)) expect_s4_class(p, "innsight_ggplot2") - p <- boxplot(grad, data_idx = 1:3) + p <- plot_global(grad, data_idx = 1:3) expect_s4_class(p, "innsight_ggplot2") - p <- boxplot(grad, individual_max = 2, individual_data_idx = 1:2 ) + p <- plot_global(grad, individual_max = 2, individual_data_idx = 1:2 ) expect_s4_class(p, "innsight_ggplot2") skip_if_not_installed("plotly") p <- plot(grad, as_plotly = TRUE) expect_s4_class(p, "innsight_plotly") - p <- boxplot(grad, as_plotly = TRUE) + p <- plot_global(grad, as_plotly = TRUE) expect_s4_class(p, "innsight_plotly") - p <- boxplot(grad, output_idx = c(1), as_plotly = TRUE) + p <- plot_global(grad, output_idx = c(1), as_plotly = TRUE) expect_s4_class(p, "innsight_plotly") - p <- boxplot(grad, data_idx = 1:3, as_plotly = TRUE) + p <- plot_global(grad, data_idx = 1:3, as_plotly = TRUE) expect_s4_class(p, "innsight_plotly") - p <- boxplot(grad, individual_max = 2, individual_data_idx = 1:2, + p <- plot_global(grad, individual_max = 2, individual_data_idx = 1:2, as_plotly = TRUE) expect_s4_class(p, "innsight_plotly") - p <- boxplot(grad, as_plotly = TRUE, ref_data_idx = c(3)) + p <- plot_global(grad, as_plotly = TRUE, ref_data_idx = c(3)) expect_s4_class(p, "innsight_plotly") }) diff --git a/tests/testthat/test_LIME.R b/tests/testthat/test_LIME.R index 89bddc0..5d9f859 100644 --- a/tests/testthat/test_LIME.R +++ b/tests/testthat/test_LIME.R @@ -9,42 +9,42 @@ test_that("LIME: General errors", { expect_error(LIME$new()) # missing converter expect_error(LIME$new(model)) # missing data - expect_error(LIME$new(NULL, data, data[1:2, ])) # no output_type - expect_error(LIME$new(NULL, data, data[1:2, ], output_type = "regression")) # no pred_fun - expect_error(LIME$new(NULL, data, data[1:2, ], + expect_error(LIME$new(NULL, data[1:2, ], data )) # no output_type + expect_error(LIME$new(NULL, data[1:2, ], data, output_type = "regression")) # no pred_fun + expect_error(LIME$new(NULL, data[1:2, ], data, output_type = "regression", perd_fun = function(newdata, ...) 
newdata)) - LIME$new(model, data, data[1:2, ]) # successful run - expect_error(LIME$new(model, data, data[1:2, ], output_type = "ds")) # wrong output_type - expect_error(LIME$new(model, data, data[1:2, ], pred_fun = identity)) # wrong pred_fun - expect_error(LIME$new(model, data, data[1:2, ], output_idx = c(1,4))) # wrong output_idx - LIME$new(model, data, data[1:2, ], output_idx = c(2)) - expect_error(LIME$new(model, data, data[1:2, ], input_dim = c(1))) # wrong input_dim - expect_error(LIME$new(model, data, data[1:2, ], input_names = c("a", "b", "d"))) # wrong input_names - LIME$new(model, data, data[1:2, ], input_names = factor(c("a", "b"))) - expect_error(LIME$new(model, data, data[1:2, ], output_names = c("a", "d"))) # wrong output_names - LIME$new(model, data, data[1:2, ], output_names = factor(c("a", "d", "c"))) - - expect_error(LIME$new(model, data, data[1:2, ], output_idx = c(1, 10))) - LIME$new(model, data, data[1:2, ], output_idx = c(1, 2)) - expect_error(LIME$new(model, data, data[1:2, ], output_idx = list(c(1, 10)))) - LIME$new(model, data, data[1:2, ], output_idx = list(c(1, 3))) - expect_error(LIME$new(model, data, data[1:2, ], output_idx = list(NULL, c(1, 2)))) - expect_error(LIME$new(model, data, data[1:2, ], output_label = c(1, 2))) - expect_error(LIME$new(model, data, data[1:2, ], output_label = c("A", "b"))) - LIME$new(model, data, data[1:2, ], output_label = c("setosa", "virginica")) - LIME$new(model, data, data[1:2, ], output_label = as.factor(c("setosa", "virginica"))) - LIME$new(model, data, data[1:2, ], output_label = list(c("setosa", "virginica"))) - expect_error(LIME$new(model, data, data[1:2, ], + LIME$new(model, data[1:2, ], data) # successful run + expect_error(LIME$new(model, data[1:2, ], data, output_type = "ds")) # wrong output_type + expect_error(LIME$new(model, data[1:2, ], data, pred_fun = identity)) # wrong pred_fun + expect_error(LIME$new(model, data[1:2, ], data, output_idx = c(1,4))) # wrong output_idx + LIME$new(model, data[1:2, ], data, output_idx = c(2)) + expect_error(LIME$new(model, data[1:2, ], data, input_dim = c(1))) # wrong input_dim + expect_error(LIME$new(model, data[1:2, ], data, input_names = c("a", "b", "d"))) # wrong input_names + LIME$new(model, data[1:2, ], data, input_names = factor(c("a", "b"))) + expect_error(LIME$new(model, data[1:2, ], data, output_names = c("a", "d"))) # wrong output_names + LIME$new(model, data[1:2, ], data, output_names = factor(c("a", "d", "c"))) + + expect_error(LIME$new(model, data[1:2, ], data, output_idx = c(1, 10))) + LIME$new(model, data[1:2, ], data, output_idx = c(1, 2)) + expect_error(LIME$new(model, data[1:2, ], data, output_idx = list(c(1, 10)))) + LIME$new(model, data[1:2, ], data, output_idx = list(c(1, 3))) + expect_error(LIME$new(model, data[1:2, ], data, output_idx = list(NULL, c(1, 2)))) + expect_error(LIME$new(model, data[1:2, ], data, output_label = c(1, 2))) + expect_error(LIME$new(model, data[1:2, ], data, output_label = c("A", "b"))) + LIME$new(model, data[1:2, ], data, output_label = c("setosa", "virginica")) + LIME$new(model, data[1:2, ], data, output_label = as.factor(c("setosa", "virginica"))) + LIME$new(model, data[1:2, ], data, output_label = list(c("setosa", "virginica"))) + expect_error(LIME$new(model, data[1:2, ], data, output_label = c("setosa", "virginica"), output_idx = c(1, 2))) - LIME$new(model, data, data[1:2, ], + LIME$new(model, data[1:2, ], data, output_label = c("setosa", "virginica"), output_idx = c(1, 3)) # Forwarding arguments to lime::explain - lime <- 
LIME$new(model, data, data[1:10, ], n_permutations = 100, gower_power = 3) + lime <- LIME$new(model, data[1:10, ], data, n_permutations = 100, gower_power = 3) # get_result() res <- get_result(lime) @@ -110,7 +110,7 @@ test_that("LIME: Dense-Net (Neuralnet)", { ) # Normal model - lime <- LIME$new(nn, data, data[1:2, ]) + lime <- LIME$new(nn, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 3)) p <- plot(lime, output_idx = c(2,3)) boxp <- boxplot(lime, output_idx = c(2,3)) @@ -119,7 +119,7 @@ test_that("LIME: Dense-Net (Neuralnet)", { # Converter conv <- Converter$new(nn) - lime <- LIME$new(conv, data, data[1:2, ]) + lime <- LIME$new(conv, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 3)) p <- plot(lime, output_idx = c(2,3)) boxp <- boxplot(lime, output_idx = c(2,3)) @@ -141,7 +141,7 @@ test_that("LIME: Dense-Net (keras)", { layer_dense(units = 3, activation = "softmax") # Normal model - lime <- LIME$new(model, data, data[1:2, ]) + lime <- LIME$new(model, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 3)) p <- plot(lime, output_idx = c(1,3)) boxp <- boxplot(lime, output_idx = c(1,3)) @@ -150,7 +150,7 @@ test_that("LIME: Dense-Net (keras)", { # Converter conv <- Converter$new(model) - lime <- LIME$new(conv, data, data[1:2, ]) + lime <- LIME$new(conv, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 3)) p <- plot(lime, output_idx = c(1,3)) boxp <- boxplot(lime, output_idx = c(1,3)) @@ -166,7 +166,7 @@ test_that("LIME: Dense-Net (keras)", { layer_dense(units = 2, activation = "linear") # Normal model - lime <- LIME$new(model, data, data[1:2, ]) + lime <- LIME$new(model, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 2)) p <- plot(lime, output_idx = c(2)) boxp <- boxplot(lime, output_idx = c(2)) @@ -175,7 +175,7 @@ test_that("LIME: Dense-Net (keras)", { # Converter conv <- Converter$new(model) - lime <- LIME$new(conv, data, data[1:2, ]) + lime <- LIME$new(conv, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 2)) p <- plot(lime, output_idx = c(2)) boxp <- boxplot(lime, output_idx = c(2)) @@ -213,7 +213,7 @@ test_that("LIME: Conv1D-Net (keras)", { layer_dense(units = 1, activation = "sigmoid") # Normal model - lime <- LIME$new(model, data, data[1:2,, ], channels_first = FALSE) + lime <- LIME$new(model, data[1:2,, ], data, channels_first = FALSE) expect_equal(dim(lime$get_result()), c(2, 64, 3, 1)) p <- plot(lime) boxp <- boxplot(lime) @@ -222,7 +222,7 @@ test_that("LIME: Conv1D-Net (keras)", { # Converter conv <- Converter$new(model) - lime <- LIME$new(conv, data, data[1:2,, ], channels_first = FALSE) + lime <- LIME$new(conv, data[1:2,, ], data, channels_first = FALSE) expect_equal(dim(lime$get_result()), c(2, 64, 3, 1)) p <- plot(lime) boxp <- boxplot(lime) @@ -245,7 +245,7 @@ test_that("LIME: Conv1D-Net (keras)", { layer_dense(units = 2, activation = "linear") # Normal model - lime <- LIME$new(model, data, data[1:2,, ], channels_first = FALSE) + lime <- LIME$new(model, data[1:2,, ], data, channels_first = FALSE) expect_equal(dim(lime$get_result()), c(2, 64, 3, 2)) p <- plot(lime, output_idx = c(2)) boxp <- boxplot(lime, output_idx = c(2)) @@ -254,7 +254,7 @@ test_that("LIME: Conv1D-Net (keras)", { # Converter conv <- Converter$new(model) - lime <- LIME$new(conv, data, data[1:2,, ], channels_first = FALSE) + lime <- LIME$new(conv, data[1:2,, ], data, channels_first = FALSE) expect_equal(dim(lime$get_result()), c(2, 64, 3, 2)) p <- plot(lime, output_idx = c(2)) boxp <- boxplot(lime, output_idx = 
c(2)) @@ -288,19 +288,19 @@ test_that("LIME: Conv2D-Net (keras)", { layer_dense(units = 1, activation = "sigmoid") # Normal model - lime <- LIME$new(model, data, data[1:2,,, ], channels_first = FALSE) + lime <- LIME$new(model, data[1:2,,, ], data, channels_first = FALSE) expect_equal(dim(lime$get_result()), c(2, 10, 10, 3, 1)) p <- plot(lime) - boxp <- boxplot(lime) + boxp <- plot_global(lime) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter conv <- Converter$new(model) - lime <- LIME$new(conv, data, data[1:2,,, ], channels_first = FALSE) + lime <- LIME$new(conv, data[1:2,,, ], data, channels_first = FALSE) expect_equal(dim(lime$get_result()), c(2, 10, 10, 3, 1)) p <- plot(lime) - boxp <- boxplot(lime) + boxp <- plot_global(lime) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") @@ -318,19 +318,19 @@ test_that("LIME: Conv2D-Net (keras)", { layer_dense(units = 2, activation = "linear") # Normal model - lime <- LIME$new(model, data, data[1:2,,, ], channels_first = FALSE) + lime <- LIME$new(model, data[1:2,,, ], data, channels_first = FALSE) expect_equal(dim(lime$get_result()), c(2, 10, 10, 3, 2)) p <- plot(lime, output_idx = c(2)) - boxp <- boxplot(lime, output_idx = c(2)) + boxp <- plot_global(lime, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter conv <- Converter$new(model) - lime <- LIME$new(conv, data, data[1:2,,, ], channels_first = FALSE) + lime <- LIME$new(conv, data[1:2,,, ], data, channels_first = FALSE) expect_equal(dim(lime$get_result()), c(2, 10, 10, 3, 2)) p <- plot(lime, output_idx = c(2)) - boxp <- boxplot(lime, output_idx = c(2)) + boxp <- plot_global(lime, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") @@ -357,7 +357,7 @@ test_that("LIME: Dense-Net (torch)", { ) # Normal model - lime <- LIME$new(model, data, data[1:2, ]) + lime <- LIME$new(model, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 3)) p <- plot(lime, output_idx = c(1,3)) boxp <- boxplot(lime, output_idx = c(1,3)) @@ -366,7 +366,7 @@ test_that("LIME: Dense-Net (torch)", { # Converter conv <- Converter$new(model, input_dim = c(4)) - lime <- LIME$new(conv, data, data[1:2, ]) + lime <- LIME$new(conv, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 3)) p <- plot(lime, output_idx = c(1,3)) boxp <- boxplot(lime, output_idx = c(1,3)) @@ -382,7 +382,7 @@ test_that("LIME: Dense-Net (torch)", { ) # Normal model - lime <- LIME$new(model, data, data[1:2, ]) + lime <- LIME$new(model, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 2)) p <- plot(lime, output_idx = c(2)) boxp <- boxplot(lime, output_idx = c(2)) @@ -391,7 +391,7 @@ test_that("LIME: Dense-Net (torch)", { # Converter conv <- Converter$new(model, input_dim = c(4)) - lime <- LIME$new(conv, data, data[1:2, ]) + lime <- LIME$new(conv, data[1:2, ], data) expect_equal(dim(lime$get_result()), c(2, 4, 2)) p <- plot(lime, output_idx = c(2)) boxp <- boxplot(lime, output_idx = c(2)) @@ -428,7 +428,7 @@ test_that("LIME: Conv1D-Net (torch)", { ) # Normal model - lime <- LIME$new(model, data, data[1:2,, ], channels_first = TRUE) + lime <- LIME$new(model, data[1:2,, ], data, channels_first = TRUE) expect_equal(dim(lime$get_result()), c(2, 3, 64, 1)) p <- plot(lime) boxp <- boxplot(lime) @@ -437,7 +437,7 @@ test_that("LIME: Conv1D-Net (torch)", { # Converter conv <- Converter$new(model, input_dim = c(3, 64)) - lime <- LIME$new(conv, data, 
data[1:2,, ], channels_first = TRUE) + lime <- LIME$new(conv, data[1:2,, ], data, channels_first = TRUE) expect_equal(dim(lime$get_result()), c(2, 3, 64, 1)) p <- plot(lime) boxp <- boxplot(lime) @@ -460,7 +460,7 @@ test_that("LIME: Conv1D-Net (torch)", { ) # Normal model - lime <- LIME$new(model, data, data[1:2,, ], channels_first = TRUE) + lime <- LIME$new(model, data[1:2,, ], data, channels_first = TRUE) expect_equal(dim(lime$get_result()), c(2, 3, 64, 2)) p <- plot(lime, output_idx = c(2)) boxp <- boxplot(lime, output_idx = c(2)) @@ -469,7 +469,7 @@ test_that("LIME: Conv1D-Net (torch)", { # Converter conv <- Converter$new(model, input_dim = c(3, 64)) - lime <- LIME$new(conv, data, data[1:2,, ], channels_first = TRUE) + lime <- LIME$new(conv, data[1:2,, ], data, channels_first = TRUE) expect_equal(dim(lime$get_result()), c(2, 3, 64, 2)) p <- plot(lime, output_idx = c(2)) boxp <- boxplot(lime, output_idx = c(2)) @@ -504,19 +504,19 @@ test_that("LIME: Conv2D-Net (torch)", { ) # Normal model - lime <- LIME$new(model, data, data[1:2,,, ], channels_first = TRUE) + lime <- LIME$new(model, data[1:2,,, ], data, channels_first = TRUE) expect_equal(dim(lime$get_result()), c(2, 3, 10, 10, 1)) p <- plot(lime) - boxp <- boxplot(lime) + boxp <- plot_global(lime) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter conv <- Converter$new(model, input_dim = c(3, 10, 10)) - lime <- LIME$new(conv, data, data[1:2,,, ], channels_first = TRUE) + lime <- LIME$new(conv, data[1:2,,, ], data, channels_first = TRUE) expect_equal(dim(lime$get_result()), c(2, 3, 10, 10, 1)) p <- plot(lime) - boxp <- boxplot(lime) + boxp <- plot_global(lime) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") @@ -534,19 +534,19 @@ test_that("LIME: Conv2D-Net (torch)", { ) # Normal model - lime <- LIME$new(model, data, data[1:2,,, ], channels_first = TRUE) + lime <- LIME$new(model, data[1:2,,, ], data, channels_first = TRUE) expect_equal(dim(lime$get_result()), c(2, 3, 10, 10, 2)) p <- plot(lime, output_idx = c(2)) - boxp <- boxplot(lime, output_idx = c(2)) + boxp <- plot_global(lime, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter conv <- Converter$new(model, input_dim = c(3, 10, 10)) - lime <- LIME$new(conv, data, data[1:2,,, ], channels_first = TRUE) + lime <- LIME$new(conv, data[1:2,,, ], data, channels_first = TRUE) expect_equal(dim(lime$get_result()), c(2, 3, 10, 10, 2)) p <- plot(lime, output_idx = c(2)) - boxp <- boxplot(lime, output_idx = c(2)) + boxp <- plot_global(lime, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") @@ -580,7 +580,7 @@ test_that("LIME: Keras multiple input or output layers", { data <- lapply(list(c(5), c(10,10,2)), function(x) array(rnorm(10 * prod(x)), dim = c(10, x))) - expect_error(LIME$new(model, data)) + expect_error(LIME$new(model, data, data_ref = NULL)) # Multiple output layers main_input <- layer_input(shape = c(10,10,2), name = 'main_input') @@ -615,7 +615,7 @@ test_that("Custom model", { predict(model, newdata, ...)$predictions } - lime <- LIME$new(model, iris[, -5], iris[c(1,70, 111), -5], + lime <- LIME$new(model, iris[c(1,70, 111), -5], iris[, -5], output_type = "classification", pred_fun = pred_fun, output_names = levels(iris$Species)) diff --git a/tests/testthat/test_SHAP.R b/tests/testthat/test_SHAP.R index b65b839..2dfe112 100644 --- a/tests/testthat/test_SHAP.R +++ b/tests/testthat/test_SHAP.R @@ -9,25 
+9,25 @@ test_that("SHAP: General errors", { expect_error(SHAP$new()) # missing converter expect_error(SHAP$new(model)) # missing data - expect_error(SHAP$new(NULL, data, data[1:2, ])) # no output_type - expect_error(SHAP$new(NULL, data, data[1:2, ], output_type = "regression")) # no pred_fun - expect_error(SHAP$new(NULL, data, data[1:2, ], + expect_error(SHAP$new(NULL, data[1:2, ], data)) # no output_type + expect_error(SHAP$new(NULL, data[1:2, ], data, output_type = "regression")) # no pred_fun + expect_error(SHAP$new(NULL, data[1:2, ], data, output_type = "regression", perd_fun = function(newdata, ...) newdata)) - SHAP$new(model, data, data[1:2, ]) # successful run - expect_error(SHAP$new(model, data, data[1:2, ], output_type = "ds")) # wrong output_type - expect_error(SHAP$new(model, data, data[1:2, ], pred_fun = identity)) # wrong pred_fun - expect_error(SHAP$new(model, data, data[1:2, ], output_idx = c(1,4))) # wrong output_idx - SHAP$new(model, data, data[1:2, ], output_idx = c(2)) - expect_error(SHAP$new(model, data, data[1:2, ], input_dim = c(1))) # wrong input_dim - expect_error(SHAP$new(model, data, data[1:2, ], input_names = c("a", "b", "d"))) # wrong input_names - SHAP$new(model, data, data[1:2, ], input_names = factor(c("a", "b"))) - expect_error(SHAP$new(model, data, data[1:2, ], output_names = c("a", "d"))) # wrong output_names - SHAP$new(model, data, data[1:2, ], output_names = factor(c("a", "d", "c"))) + SHAP$new(model, data[1:2, ], data) # successful run + expect_error(SHAP$new(model, data[1:2, ], data, output_type = "ds")) # wrong output_type + expect_error(SHAP$new(model, data[1:2, ], data, pred_fun = identity)) # wrong pred_fun + expect_error(SHAP$new(model, data[1:2, ], data, output_idx = c(1,4))) # wrong output_idx + SHAP$new(model, data[1:2, ], data, output_idx = c(2)) + expect_error(SHAP$new(model, data[1:2, ], data, input_dim = c(1))) # wrong input_dim + expect_error(SHAP$new(model, data[1:2, ], data, input_names = c("a", "b", "d"))) # wrong input_names + SHAP$new(model, data[1:2, ], data, input_names = factor(c("a", "b"))) + expect_error(SHAP$new(model, data[1:2, ], data, output_names = c("a", "d"))) # wrong output_names + SHAP$new(model, data[1:2, ], data, output_names = factor(c("a", "d", "c"))) # Forwarding arguments to fastshap::explain - shap <- SHAP$new(model, data, data[1:10, ], nsim = 4) + shap <- SHAP$new(model, data[1:10, ], data, nsim = 4) # get_result() res <- get_result(shap) @@ -93,7 +93,7 @@ test_that("SHAP: Dense-Net (Neuralnet)", { ) # Normal model - shap <- SHAP$new(nn, data, data[1:2, ]) + shap <- SHAP$new(nn, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 3)) p <- plot(shap, output_idx = c(2,3)) boxp <- boxplot(shap, output_idx = c(2,3)) @@ -102,7 +102,7 @@ test_that("SHAP: Dense-Net (Neuralnet)", { # Converter conv <- Converter$new(nn) - shap <- SHAP$new(conv, data, data[1:2, ]) + shap <- SHAP$new(conv, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 3)) p <- plot(shap, output_idx = c(2,3)) boxp <- boxplot(shap, output_idx = c(2,3)) @@ -124,7 +124,7 @@ test_that("SHAP: Dense-Net (keras)", { layer_dense(units = 3, activation = "softmax") # Normal model - shap <- SHAP$new(model, data, data[1:2, ]) + shap <- SHAP$new(model, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 3)) p <- plot(shap, output_idx = c(1,3)) boxp <- boxplot(shap, output_idx = c(1,3)) @@ -133,7 +133,7 @@ test_that("SHAP: Dense-Net (keras)", { # Converter conv <- Converter$new(model) - shap <- SHAP$new(conv, data, data[1:2, ]) + 
shap <- SHAP$new(conv, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 3)) p <- plot(shap, output_idx = c(1,3)) boxp <- boxplot(shap, output_idx = c(1,3)) @@ -149,7 +149,7 @@ test_that("SHAP: Dense-Net (keras)", { layer_dense(units = 2, activation = "linear") # Normal model - shap <- SHAP$new(model, data, data[1:2, ]) + shap <- SHAP$new(model, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 2)) p <- plot(shap, output_idx = c(2)) boxp <- boxplot(shap, output_idx = c(2)) @@ -158,7 +158,7 @@ test_that("SHAP: Dense-Net (keras)", { # Converter conv <- Converter$new(model) - shap <- SHAP$new(conv, data, data[1:2, ]) + shap <- SHAP$new(conv, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 2)) p <- plot(shap, output_idx = c(2)) boxp <- boxplot(shap, output_idx = c(2)) @@ -181,23 +181,19 @@ test_that("SHAP: Conv1D-Net (keras)", { library(torch) # Classification ------------------------------------------------------------- - data <- array(rnorm(4 * 64 * 3), dim = c(4, 64, 3)) + data <- array(rnorm(4 * 14 * 3), dim = c(4, 14, 3)) model <- keras_model_sequential() model %>% layer_conv_1d( - input_shape = c(64, 3), kernel_size = 16, filters = 8, + input_shape = c(14, 3), kernel_size = 8, filters = 2, activation = "softplus" ) %>% - layer_conv_1d(kernel_size = 16, filters = 4, activation = "tanh") %>% - layer_conv_1d(kernel_size = 16, filters = 2, activation = "relu") %>% layer_flatten() %>% - layer_dense(units = 64, activation = "relu") %>% - layer_dense(units = 16, activation = "relu") %>% layer_dense(units = 1, activation = "sigmoid") # Normal model - shap <- SHAP$new(model, data, data[1:2,, ], channels_first = FALSE) - expect_equal(dim(shap$get_result()), c(2, 64, 3, 1)) + shap <- SHAP$new(model, data[1:2,, ], data, channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 14, 3, 1)) p <- plot(shap) boxp <- boxplot(shap) expect_s4_class(p, "innsight_ggplot2") @@ -205,31 +201,27 @@ test_that("SHAP: Conv1D-Net (keras)", { # Converter conv <- Converter$new(model) - shap <- SHAP$new(conv, data, data[1:2,, ], channels_first = FALSE) - expect_equal(dim(shap$get_result()), c(2, 64, 3, 1)) + shap <- SHAP$new(conv, data[1:2,, ], data, channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 14, 3, 1)) p <- plot(shap) boxp <- boxplot(shap) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Regression ----------------------------------------------------------------- - data <- array(rnorm(4 * 64 * 3), dim = c(4, 64, 3)) + data <- array(rnorm(4 * 14 * 3), dim = c(4, 14, 3)) model <- keras_model_sequential() model %>% layer_conv_1d( - input_shape = c(64, 3), kernel_size = 16, filters = 8, + input_shape = c(14, 3), kernel_size = 8, filters = 4, activation = "softplus" ) %>% - layer_conv_1d(kernel_size = 16, filters = 4, activation = "tanh") %>% - layer_conv_1d(kernel_size = 16, filters = 2, activation = "relu") %>% layer_flatten() %>% - layer_dense(units = 64, activation = "relu") %>% - layer_dense(units = 16, activation = "relu") %>% layer_dense(units = 2, activation = "linear") # Normal model - shap <- SHAP$new(model, data, data[1:2,, ], channels_first = FALSE) - expect_equal(dim(shap$get_result()), c(2, 64, 3, 2)) + shap <- SHAP$new(model, data[1:2,, ], data, channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 14, 3, 2)) p <- plot(shap, output_idx = c(2)) boxp <- boxplot(shap, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") @@ -237,8 +229,8 @@ test_that("SHAP: Conv1D-Net (keras)", { # 
Converter conv <- Converter$new(model) - shap <- SHAP$new(conv, data, data[1:2,, ], channels_first = FALSE) - expect_equal(dim(shap$get_result()), c(2, 64, 3, 2)) + shap <- SHAP$new(conv, data[1:2,, ], data, channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 14, 3, 2)) p <- plot(shap, output_idx = c(2)) boxp <- boxplot(shap, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") @@ -258,62 +250,58 @@ test_that("SHAP: Conv2D-Net (keras)", { library(torch) # Classification ------------------------------------------------------------- - data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 10, 10, 3)) + data <- array(rnorm(4 * 4 * 4 * 3), dim = c(4, 4, 4, 3)) model <- keras_model_sequential() model %>% layer_conv_2d( - input_shape = c(10, 10, 3), kernel_size = 4, filters = 8, + input_shape = c(4, 4, 3), kernel_size = 2, filters = 4, activation = "softplus" ) %>% - layer_conv_2d(kernel_size = 4, filters = 2, activation = "relu") %>% layer_flatten() %>% - layer_dense(units = 16, activation = "relu") %>% layer_dense(units = 1, activation = "sigmoid") # Normal model - shap <- SHAP$new(model, data, data[1:2,,, ], channels_first = FALSE) - expect_equal(dim(shap$get_result()), c(2, 10, 10, 3, 1)) + shap <- SHAP$new(model, data[1:2,,, ], data, channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 4, 4, 3, 1)) p <- plot(shap) - boxp <- boxplot(shap) + boxp <- plot_global(shap) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter conv <- Converter$new(model) - shap <- SHAP$new(conv, data, data[1:2,,, ], channels_first = FALSE) - expect_equal(dim(shap$get_result()), c(2, 10, 10, 3, 1)) + shap <- SHAP$new(conv, data[1:2,,, ], data, channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 4, 4, 3, 1)) p <- plot(shap) - boxp <- boxplot(shap) + boxp <- plot_global(shap) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Regression ----------------------------------------------------------------- - data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 10, 10, 3)) + data <- array(rnorm(4 * 4 * 4 * 3), dim = c(4, 4, 4, 3)) model <- keras_model_sequential() model %>% layer_conv_2d( - input_shape = c(10, 10, 3), kernel_size = 4, filters = 8, + input_shape = c(4, 4, 3), kernel_size = 2, filters = 4, activation = "softplus" ) %>% - layer_conv_2d(kernel_size = 4, filters = 2, activation = "relu") %>% layer_flatten() %>% - layer_dense(units = 16, activation = "relu") %>% layer_dense(units = 2, activation = "linear") # Normal model - shap <- SHAP$new(model, data, data[1:2,,, ], channels_first = FALSE) - expect_equal(dim(shap$get_result()), c(2, 10, 10, 3, 2)) + shap <- SHAP$new(model, data[1:2,,, ], data, channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 4, 4, 3, 2)) p <- plot(shap, output_idx = c(2)) - boxp <- boxplot(shap, output_idx = c(2)) + boxp <- plot_global(shap, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter conv <- Converter$new(model) - shap <- SHAP$new(conv, data, data[1:2,,, ], channels_first = FALSE) - expect_equal(dim(shap$get_result()), c(2, 10, 10, 3, 2)) + shap <- SHAP$new(conv, data[1:2,,, ], data, channels_first = FALSE) + expect_equal(dim(shap$get_result()), c(2, 4, 4, 3, 2)) p <- plot(shap, output_idx = c(2)) - boxp <- boxplot(shap, output_idx = c(2)) + boxp <- plot_global(shap, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") @@ -340,7 +328,7 @@ 
test_that("SHAP: Dense-Net (torch)", { ) # Normal model - shap <- SHAP$new(model, data, data[1:2, ]) + shap <- SHAP$new(model, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 3)) p <- plot(shap, output_idx = c(1,3)) boxp <- boxplot(shap, output_idx = c(1,3)) @@ -349,7 +337,7 @@ test_that("SHAP: Dense-Net (torch)", { # Converter conv <- Converter$new(model, input_dim = c(4)) - shap <- SHAP$new(conv, data, data[1:2, ]) + shap <- SHAP$new(conv, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 3)) p <- plot(shap, output_idx = c(1,3)) boxp <- boxplot(shap, output_idx = c(1,3)) @@ -365,7 +353,7 @@ test_that("SHAP: Dense-Net (torch)", { ) # Normal model - shap <- SHAP$new(model, data, data[1:2, ]) + shap <- SHAP$new(model, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 2)) p <- plot(shap, output_idx = c(2)) boxp <- boxplot(shap, output_idx = c(2)) @@ -374,7 +362,7 @@ test_that("SHAP: Dense-Net (torch)", { # Converter conv <- Converter$new(model, input_dim = c(4)) - shap <- SHAP$new(conv, data, data[1:2, ]) + shap <- SHAP$new(conv, data[1:2, ], data) expect_equal(dim(shap$get_result()), c(2, 4, 2)) p <- plot(shap, output_idx = c(2)) boxp <- boxplot(shap, output_idx = c(2)) @@ -395,65 +383,57 @@ test_that("SHAP: Conv1D-Net (torch)", { library(torch) # Classification ------------------------------------------------------------- - data <- array(rnorm(4 * 64 * 3), dim = c(4, 3, 64)) + data <- array(rnorm(4 * 14 * 3), dim = c(4, 3, 14)) model <- nn_sequential( - nn_conv1d(3, 8, 16), - nn_softplus(), - nn_conv1d(8, 4, 16), - nn_tanh(), - nn_conv1d(4, 2, 16), + nn_conv1d(3, 8, 8), nn_relu(), nn_flatten(), - nn_linear(38, 16), + nn_linear(56, 16), nn_relu(), nn_linear(16, 1), nn_sigmoid() ) # Normal model - shap <- SHAP$new(model, data, data[1:2,, ], channels_first = TRUE) - expect_equal(dim(shap$get_result()), c(2, 3, 64, 1)) + shap <- SHAP$new(model, data[1:2,, ], data, channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 14, 1)) p <- plot(shap) boxp <- boxplot(shap) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter - conv <- Converter$new(model, input_dim = c(3, 64)) - shap <- SHAP$new(conv, data, data[1:2,, ], channels_first = TRUE) - expect_equal(dim(shap$get_result()), c(2, 3, 64, 1)) + conv <- Converter$new(model, input_dim = c(3, 14)) + shap <- SHAP$new(conv, data[1:2,, ], data, channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 14, 1)) p <- plot(shap) boxp <- boxplot(shap) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Regression ----------------------------------------------------------------- - data <- array(rnorm(4 * 64 * 3), dim = c(4, 3, 64)) + data <- array(rnorm(4 * 14 * 3), dim = c(4, 3, 14)) model <- nn_sequential( - nn_conv1d(3, 8, 16), - nn_softplus(), - nn_conv1d(8, 4, 16), - nn_tanh(), - nn_conv1d(4, 2, 16), + nn_conv1d(3, 8, 8), nn_relu(), nn_flatten(), - nn_linear(38, 16), + nn_linear(56, 16), nn_relu(), nn_linear(16, 2) ) # Normal model - shap <- SHAP$new(model, data, data[1:2,, ], channels_first = TRUE) - expect_equal(dim(shap$get_result()), c(2, 3, 64, 2)) + shap <- SHAP$new(model, data[1:2,, ], data, channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 14, 2)) p <- plot(shap, output_idx = c(2)) boxp <- boxplot(shap, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter - conv <- Converter$new(model, input_dim = c(3, 64)) - shap <- 
SHAP$new(conv, data, data[1:2,, ], channels_first = TRUE) - expect_equal(dim(shap$get_result()), c(2, 3, 64, 2)) + conv <- Converter$new(model, input_dim = c(3, 14)) + shap <- SHAP$new(conv, data[1:2,, ], data, channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 14, 2)) p <- plot(shap, output_idx = c(2)) boxp <- boxplot(shap, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") @@ -473,63 +453,59 @@ test_that("SHAP: Conv2D-Net (torch)", { library(torch) # Classification ------------------------------------------------------------- - data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 3, 10, 10)) + data <- array(rnorm(4 * 4 * 4 * 3), dim = c(4, 3, 4, 4)) model <- nn_sequential( - nn_conv2d(3, 8, c(4, 4)), - nn_softplus(), - nn_conv2d(8, 2, c(4, 4)), + nn_conv2d(3, 8, c(2, 2)), nn_relu(), nn_flatten(), - nn_linear(32, 16), + nn_linear(72, 16), nn_relu(), nn_linear(16, 1), nn_sigmoid() ) # Normal model - shap <- SHAP$new(model, data, data[1:2,,, ], channels_first = TRUE) - expect_equal(dim(shap$get_result()), c(2, 3, 10, 10, 1)) + shap <- SHAP$new(model, data[1:2,,, ], data, channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 4, 4, 1)) p <- plot(shap) - boxp <- boxplot(shap) + boxp <- plot_global(shap) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter - conv <- Converter$new(model, input_dim = c(3, 10, 10)) - shap <- SHAP$new(conv, data, data[1:2,,, ], channels_first = TRUE) - expect_equal(dim(shap$get_result()), c(2, 3, 10, 10, 1)) + conv <- Converter$new(model, input_dim = c(3, 4, 4)) + shap <- SHAP$new(conv, data[1:2,,, ], data, channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 4, 4, 1)) p <- plot(shap) - boxp <- boxplot(shap) + boxp <- plot_global(shap) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Regression ----------------------------------------------------------------- - data <- array(rnorm(4 * 10 * 10 * 3), dim = c(4, 3, 10, 10)) + data <- array(rnorm(4 * 4 * 4 * 3), dim = c(4, 3, 4, 4)) model <- nn_sequential( - nn_conv2d(3, 8, c(4, 4)), - nn_softplus(), - nn_conv2d(8, 2, c(4, 4)), + nn_conv2d(3, 8, c(2, 2)), nn_relu(), nn_flatten(), - nn_linear(32, 16), + nn_linear(72, 16), nn_relu(), nn_linear(16, 2) ) # Normal model - shap <- SHAP$new(model, data, data[1:2,,, ], channels_first = TRUE) - expect_equal(dim(shap$get_result()), c(2, 3, 10, 10, 2)) + shap <- SHAP$new(model, data[1:2,,, ], data, channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 4, 4, 2)) p <- plot(shap, output_idx = c(2)) - boxp <- boxplot(shap, output_idx = c(2)) + boxp <- plot_global(shap, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") # Converter - conv <- Converter$new(model, input_dim = c(3, 10, 10)) - shap <- SHAP$new(conv, data, data[1:2,,, ], channels_first = TRUE) - expect_equal(dim(shap$get_result()), c(2, 3, 10, 10, 2)) + conv <- Converter$new(model, input_dim = c(3, 4, 4)) + shap <- SHAP$new(conv, data[1:2,,, ], data, channels_first = TRUE) + expect_equal(dim(shap$get_result()), c(2, 3, 4, 4, 2)) p <- plot(shap, output_idx = c(2)) - boxp <- boxplot(shap, output_idx = c(2)) + boxp <- plot_global(shap, output_idx = c(2)) expect_s4_class(p, "innsight_ggplot2") expect_s4_class(boxp, "innsight_ggplot2") @@ -598,7 +574,7 @@ test_that("Custom model", { predict(model, newdata, ...)$predictions } - shap <- SHAP$new(model, iris[, -5], iris[c(1,70, 111), -5], + shap <- SHAP$new(model, iris[c(1,70, 111), -5], 
iris[, -5], pred_fun = pred_fun, output_names = levels(iris$Species)) diff --git a/tests/testthat/test_innsight_ggplot2.R b/tests/testthat/test_innsight_ggplot2.R index bf3159c..6c5debc 100644 --- a/tests/testthat/test_innsight_ggplot2.R +++ b/tests/testthat/test_innsight_ggplot2.R @@ -387,7 +387,7 @@ test_that("innsight_ggplot2: Tabular data (multiple columns)", { #----- Image data ----------------------------------------------------------- test_that("innsight_ggplot2: Signal data (one column)", { - p <- boxplot(res_2d) + p <- plot_global(res_2d) # Check class expect_s4_class(p, "innsight_ggplot2") @@ -415,7 +415,7 @@ test_that("innsight_ggplot2: Signal data (one column)", { test_that("innsight_ggplot2: Tabular data (multiple columns)", { - p <- boxplot(res_2d, output_idx = c(1,2,3)) + p <- plot_global(res_2d, output_idx = c(1,2,3)) # Check class expect_s4_class(p, "innsight_ggplot2") @@ -445,7 +445,7 @@ test_that("innsight_ggplot2: Tabular data (multiple columns)", { test_that("innsight_ggplot2: Mixed data", { skip_if_not_installed("keras") - p <- boxplot(res_mixed, output_idx = list(c(1,2,3), c(1))) + p <- plot_global(res_mixed, output_idx = list(c(1,2,3), c(1))) # Check class expect_s4_class(p, "innsight_ggplot2") diff --git a/tests/testthat/test_innsight_plotly.R b/tests/testthat/test_innsight_plotly.R index 1b221d4..5fbfa8f 100644 --- a/tests/testthat/test_innsight_plotly.R +++ b/tests/testthat/test_innsight_plotly.R @@ -301,7 +301,7 @@ test_that("innsight_plotly: Tabular data (multiple columns)", { #----- Image data ----------------------------------------------------------- test_that("innsight_plotly: Signal data (one column)", { - p <- boxplot(res_2d, as_plotly = TRUE) + p <- plot_global(res_2d, as_plotly = TRUE) # Check class expect_s4_class(p, "innsight_plotly") @@ -321,7 +321,7 @@ test_that("innsight_plotly: Signal data (one column)", { test_that("innsight_plotly: Tabular data (multiple columns)", { - p <- boxplot(res_2d, output_idx = c(1,2,3), as_plotly = TRUE) + p <- plot_global(res_2d, output_idx = c(1,2,3), as_plotly = TRUE) # Check class expect_s4_class(p, "innsight_plotly") @@ -343,7 +343,7 @@ test_that("innsight_plotly: Tabular data (multiple columns)", { test_that("innsight_plotly: Mixed data", { skip_if_not_installed("keras") - p <- boxplot(res_mixed, output_idx = list(c(1,2,3), c(1)), as_plotly = TRUE) + p <- plot_global(res_mixed, output_idx = list(c(1,2,3), c(1)), as_plotly = TRUE) # Check class expect_s4_class(p, "innsight_plotly") diff --git a/vignettes/Example_1_iris.Rmd b/vignettes/Example_1_iris.Rmd index 2b18b83..9cebb38 100644 --- a/vignettes/Example_1_iris.Rmd +++ b/vignettes/Example_1_iris.Rmd @@ -101,7 +101,7 @@ part for our trained model is done by: ```{r example_1_conv_1} # Create the converter object -converter <- Converter$new(model, input_dim = c(4)) +converter <- convert(model, input_dim = c(4)) ``` Since there are no labels for the inputs and outputs stored in the **torch** @@ -113,7 +113,7 @@ are thrown if any discrepancies arise. ```{r example_1_conv_2, eval = torch::torch_is_installed()} # Create `Converter` object (with custom labels) -converter <- Converter$new(model, +converter <- convert(model, input_dim = c(4), input_names = c("Sepal (length)", "Sepal (width)", "Petal (length)", "Petal (width)"), output_names = c("Setosa", "Versicolor", "Virginica") @@ -162,13 +162,13 @@ methods provided in the package:
  • **Gradient without last activation** ```{r} -grad_no_softmax <- Gradient$new(converter, x, ignore_last_act = TRUE) +grad_no_softmax <- run_grad(converter, x, ignore_last_act = TRUE) ```
  • **Gradient with last activation** ```{r, message = FALSE, results = 'hide'} -grad_softmax <- Gradient$new(converter, x, ignore_last_act = FALSE) +grad_softmax <- run_grad(converter, x, ignore_last_act = FALSE) ```
  • @@ -178,7 +178,7 @@ We can also set the method-specific arguments, such as the LRP rule and its associated parameter: ```{r, message = FALSE, results = 'hide'} -lrp_eps <- LRP$new(converter, x, rule_name = "epsilon", rule_param = 0.01) +lrp_eps <- run_lrp(converter, x, rule_name = "epsilon", rule_param = 0.01) ```
  • @@ -186,7 +186,7 @@ lrp_eps <- LRP$new(converter, x, rule_name = "epsilon", rule_param = 0.01) ```{r, message = FALSE, results = 'hide'} x_ref <- x$mean(1, keepdim = TRUE) # ref value needs the shape (1,4) -deeplift_mean <- DeepLift$new(converter, x, x_ref = x_ref) +deeplift_mean <- run_deeplift(converter, x, x_ref = x_ref) ```
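Whichever of the methods listed above was applied, the computed attributions are accessed in exactly the same way afterwards. A minimal sketch, assuming the `converter`, `x` and the method objects created in the bullets above (the dimension comment reflects the iris model with four inputs and three classes):

```r
# Every method object shares one result interface: an array of shape
# (instances, input features, output nodes)
res <- get_result(grad_no_softmax)
dim(res)      # number of instances in x, then 4 inputs x 3 classes
res[1, , ]    # attributions of the first instance

# Summarized over all instances via the new global plot generic
plot_global(grad_no_softmax)
```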
  • @@ -232,8 +232,8 @@ For each of these classes and thus, of course, also for each method, there are two plot functions: * `plot()` shows only individual data points and -* `boxplot()` visualizes summaries of multiple data points using summary -statistics. +* `plot_global()` visualizes summaries of multiple data points using summary +statistics (you can also use the alias `boxplot()` for tabular and signal data). #### Plot individual results @@ -275,7 +275,8 @@ plotly::config(print(p, shareY = TRUE)) #### Plot summarized results -The S3 function `boxplot()` is implemented for each of the available local methods. +The S3 function `plot_global()` (or `boxplot()` for tabular and signal data) +is implemented for each of the available local methods. You can select your desired data points (default is `'all'`) and output nodes/classes with the `data_idx` and `output_idx` arguments, respectively. To switch between a **ggplot2** and **plotly** based plot, you can use the diff --git a/vignettes/Example_2_penguin.Rmd b/vignettes/Example_2_penguin.Rmd index d803914..22aa286 100644 --- a/vignettes/Example_2_penguin.Rmd +++ b/vignettes/Example_2_penguin.Rmd @@ -257,7 +257,7 @@ input_names <- output_names <- c("Adelie", "Chinstrap", "Gentoo") # Create the `Converter` object -converter_1 <- Converter$new(model, +converter_1 <- convert(model, input_dim = 10, input_names = input_names, output_names = output_names @@ -271,7 +271,7 @@ explained in the next step and we create a converter for this variant here: ```{r} # Create a second `Converter` object for combined categorical features -converter_2 <- Converter$new(model, +converter_2 <- convert(model, input_dim = 10, output_names = output_names ) @@ -298,7 +298,7 @@ to the default result, i.e., you can use the method of your choice as usual: # Data to be analyzed (in this case, we use the whole dataset) data <- test_ds$x # Apply method 'LRP' with rule alpha-beta -lrp_ab_1 <- LRP$new(converter_1, data, rule_name = "alpha_beta", rule_param = 2) +lrp_ab_1 <- run_lrp(converter_1, data, rule_name = "alpha_beta", rule_param = 2) # the result for 333 instances, 10 inputs and all 3 outputs dim(get_result(lrp_ab_1)) @@ -315,7 +315,7 @@ of the categorical features have to be combined so that the field ```{r, results='hide', message=FALSE} # Apply method as in the other case -lrp_ab_2 <- LRP$new(converter_2, data, rule_name = "alpha_beta", rule_param = 2) +lrp_ab_2 <- run_lrp(converter_2, data, rule_name = "alpha_beta", rule_param = 2) # Adjust input dimension and input names in the method converter object lrp_ab_2$converter$input_dim[[1]] <- 7 @@ -368,7 +368,8 @@ For each of these classes and thus, of course, also for each method, there are two plot functions: * `plot()` shows only individual data points and -* `boxplot()` visualizes summaries of multiple data points using summary statistics. +* `plot_global()` visualizes summaries of multiple data points using summary +statistics (you can also use the alias `boxplot()` for tabular and signal data). ### Plot individual results @@ -421,7 +422,9 @@ resulted. ### Plot summarized results -The function `boxplot()` is implemented for each of the available local methods. +The function `plot_global()` is implemented for each of the available local methods. +Since we are dealing with tabular data in this case, the method `boxplot()` can +alternatively be used instead of `plot_global()`. 
You can select your desired data points (default is `'all'`) and output
nodes/classes with the `data_idx` and `output_idx` arguments, respectively.
To switch between a **ggplot2** and **plotly** based plot, you can use the
diff --git a/vignettes/detailed_overview.Rmd b/vignettes/detailed_overview.Rmd
index e9d5439..dfad94f 100644
--- a/vignettes/detailed_overview.Rmd
+++ b/vignettes/detailed_overview.Rmd
@@ -97,12 +97,28 @@ converter <- Converter$new(model,
 )
 ```

+As you can see, the `Converter` class is implemented using an `R6::R6Class()`
+class. However, this assumes that users have prior knowledge of these classes,
+potentially making the application a bit unfamiliar. For this reason, we
+have implemented a shortcut function that initializes an object of the
+`Converter` class in a more familiar R syntax:
+
+```{r, eval = FALSE}
+converter <- convert(model,
+  input_dim = NULL,
+  input_names = NULL,
+  output_names = NULL,
+  dtype = "float",
+  save_model_as_list = FALSE
+)
+```
+
 ## Argument `model`

 This is the passed trained model. Currently, it can be a sequential
-[**torch**](https://torch.mlverse.org/) model (`nn_sequential`), a
-[**tensorflow/keras**](https://tensorflow.rstudio.com/) model (`keras_model` or
-`keras_model_sequential`), a **neuralnet** model or a model as a list. All
+[**torch**](https://torch.mlverse.org/) model (`nn_sequential()`), a
+[**tensorflow/keras**](https://tensorflow.rstudio.com/) model (`keras_model()` or
+`keras_model_sequential()`), a **neuralnet** model or a model as a list. All
 these options are explained in detail in the following subsections.

### Package torch

@@ -111,24 +127,24 @@ Currently, only models created by [`torch::nn_sequential`](https://torch.mlverse

 However, the most popular standard layers and activation functions are
 available:

-- Linear layers: `nn_linear`
+- Linear layers: `nn_linear()`

-- Convolutional layers: `nn_conv1d`, `nn_conv2d` (but only with
+- Convolutional layers: `nn_conv1d()`, `nn_conv2d()` (but only with
   `padding_mode = "zeros"` and numerical padding)

-- Max-pooling layers: `nn_max_pool1d`, `nn_max_pool2d` (both only with
+- Max-pooling layers: `nn_max_pool1d()`, `nn_max_pool2d()` (both only with
   default arguments for `padding = 0`, `dilation = 1`,
   `return_indices = FALSE` and `ceil_mode = FALSE`)

-- Average-pooling layers: `nn_avg_pool1d`, `nn_avg_pool2d` (both only
+- Average-pooling layers: `nn_avg_pool1d()`, `nn_avg_pool2d()` (both only
   with default arguments for `padding = 0`, `return_indices = FALSE`
   and `ceil_mode = FALSE`)

-- Batch-normalization layers: `nn_batch_norm1d`, `nn_batch_norm2d`
+- Batch-normalization layers: `nn_batch_norm1d()`, `nn_batch_norm2d()`

-- Flatten layer: `nn_flatten`
+- Flatten layer: `nn_flatten()`

-- Skipped layers: `nn_dropout`
+- Skipped layers: `nn_dropout()`

 - Activation functions: `nn_relu`, `nn_leaky_relu`, `nn_softplus`,
   `nn_sigmoid`, `nn_softmax`, `nn_tanh` (open an issue if you need any more)

@@ -159,7 +175,7 @@ torch_model <- nn_sequential(
 )

 # For torch models the optional argument `input_dim` becomes a necessary one
-converter <- Converter$new(torch_model, input_dim = c(3, 10, 10))
+converter <- convert(torch_model, input_dim = c(3, 10, 10))
 ```

@@ -171,34 +187,34 @@ or [`keras_model`](https://tensorflow.rstudio.com/reference/keras/keras_model) are
 accepted.
Within these functions, the following layers are allowed to be used: -- Input layer: `layer_input` +- Input layer: `layer_input()` -- Linear layers: `layer_dense` +- Linear layers: `layer_dense()` -- Convolutional layers: `layer_conv_1d`, `layer_conv_2d` +- Convolutional layers: `layer_conv_1d()`, `layer_conv_2d()` -- Pooling layers: `layer_max_pooling_1d`, `layer_max_pooling_2d`, - `layer_average_pooling_1d`, `layer_average_pooling_2d`, - `layer_global_average_pooling_1d`, `layer_global_average_pooling_2d`, - `layer_global_max_pooling_1d`, `layer_global_max_pooling_2d` +- Pooling layers: `layer_max_pooling_1d()`, `layer_max_pooling_2d()`, + `layer_average_pooling_1d()`, `layer_average_pooling_2d()`, + `layer_global_average_pooling_1d()`, `layer_global_average_pooling_2d()`, + `layer_global_max_pooling_1d()`, `layer_global_max_pooling_2d()` -- Batch-normalization layer: `layer_batch_normalization` +- Batch-normalization layer: `layer_batch_normalization()` -- Flatten layer: `layer_flatten` +- Flatten layer: `layer_flatten()` -- Merging layers: `layer_add`, `layer_concatenate` (but it is assumed +- Merging layers: `layer_add()`, `layer_concatenate()` (but it is assumed that the concatenation axis points to the channel axis) -- Padding layers: `layer_zero_padding_1d`, `layer_zero_padding_2d` +- Padding layers: `layer_zero_padding_1d()`, `layer_zero_padding_2d()` -- Skipped layers: `layer_dropout` +- Skipped layers: `layer_dropout()` - Activation functions: The following activation functions are allowed as character argument (`activation`) in a linear and convolutional layer: `"relu"`, `"softplus"`, `"sigmoid"`, `"softmax"`, `"tanh"`, `"linear"`. But you can also specify the activation function as a - standalone layer: `layer_activation_relu`, - `layer_activation_softmax`. But keep in mind that an activation + standalone layer: `layer_activation_relu()`, + `layer_activation_softmax()`. But keep in mind that an activation layer may only follow a dense, convolutional or pooling layer. If you miss an activation function, feel free to open an issue on GitHub. @@ -219,7 +235,7 @@ keras_model_seq <- keras_model_seq %>% layer_dropout(0.2) %>% layer_dense(4, activation = "softmax") -converter <- Converter$new(keras_model_seq) +converter <- convert(keras_model_seq) ``` **Example 2: `keras_model`** @@ -244,7 +260,7 @@ output <- layer_concatenate(list(conv_part, input_tab)) %>% keras_model_concat <- keras_model(inputs = list(input_image, input_tab), outputs = output) -converter <- Converter$new(keras_model_concat) +converter <- convert(keras_model_concat) ``` @@ -273,7 +289,7 @@ neuralnet_model <- neuralnet(Species ~ Petal.Length + Petal.Width, iris, ) # Convert model -converter <- Converter$new(neuralnet_model) +converter <- convert(neuralnet_model) # Show input names converter$input_names # Show output names @@ -286,9 +302,9 @@ converter$output_names Besides models from the packages **keras**, **torch** and **neuralnet** it is also possible to pass a self-defined model in the form of a named list to -the `Converter` class. This enables the interpretation of networks from -other libraries with all available methods provided by the **innsight** -package. +the `Converter` class/`convert()` function. This enables the interpretation +of networks from other libraries with all available methods provided by +the **innsight** package. If you want to create a custom model, your list (e.g., `model <- list()`) needs at least the keys `model$input_dim` and `model$layers`. 
However,
@@ -1049,7 +1065,7 @@ is loaded!):

 ```{r}
 # Convert the model and save the model as a list
-converter <- Converter$new(keras_model_concat, save_model_as_list = TRUE)
+converter <- convert(keras_model_concat, save_model_as_list = TRUE)

 # Get the field `input_dim`
 converter$input_dim
@@ -1106,7 +1122,7 @@ str(model_as_list$layers[[4]])
 # let's change the activation function to "relu"
 model_as_list$layers[[4]]$activation_name <- "relu"

 # create a Converter object with the modified model
-converter_modified <- Converter$new(model_as_list)
+converter_modified <- convert(model_as_list)

 # now, we get different results for the same input because of the relu activation
 converter_modified$model(x, channels_first = FALSE)
@@ -1145,12 +1161,19 @@ method looks like this:
 method <- Method$new(converter, data,
   channels_first = TRUE,
   output_idx = NULL,
+  output_label = NULL,
   ignore_last_act = TRUE,
   verbose = interactive(),
   dtype = "float"
 )
 ```

+In this case as well, all methods are implemented as R6 classes. However,
+here we have also implemented helper functions for initialization, allowing
+the application of a method through a simple function call instead of `$new()`.
+These functions all start with the prefix `run_` and end with the corresponding
+acronym for the method (e.g., `run_grad()`).
+
 ## Arguments

 ### Argument `converter`
@@ -1222,6 +1245,35 @@ the output nodes in the individual output layers:
   calculations become more computationally expensive for more output
   nodes.

+### Argument `output_label`
+
+These values specify the output nodes for which the method is to be applied and
+can be used as an alternative to the argument `output_idx`.
+Only values that were previously passed with the argument `output_names` in
+the `converter` can be used. In order to allow models with multiple output
+layers, there are the following possibilities to select the names of
+the output nodes in the individual output layers:
+
+- A `character` vector or `factor` of labels: If the model has only one output
+  layer, the values correspond to the labels of the output nodes named in the
+  passed `Converter` object, e.g.,
+  `c("a", "c", "d")` for the first, third and fourth output node if the
+  output names are `c("a", "b", "c", "d")`. If there are
+  multiple output layers, the names of the output nodes from the first
+  output layer are considered.
+
+- A `list` of `character`/`factor` vectors of labels: If the method is to be
+  applied to output nodes from different layers, a list can be passed
+  that specifies the desired labels of the output nodes for each
+  output layer. Unwanted output layers have the entry `NULL` instead of
+  a vector of labels, e.g., `list(NULL, c("a", "c"))` for the first and
+  third output node in the second output layer.
+
+- `NULL` (default): The method is applied to all output nodes in
+  the first output layer but is limited to the first ten as the
+  calculations become more computationally expensive for more output
+  nodes.
+
 ### Argument `ignore_last_act`

 Set this logical value to include the last activation function for each
@@ -1282,7 +1334,7 @@ model <- list(
   )
 )

-converter <- Converter$new(model)
+converter <- convert(model)
 ```

@@ -1293,7 +1345,8 @@ One of the first and most intuitive methods for interpreting neural networks
 is the *Gradients* method introduced by [Simonyan et al. (2013)](https://arxiv.org/abs/1312.6034),
 also known as *Vanilla Gradients* or *Saliency maps*. This method computes
 the gradients of the selected output with respect to the input
-variables.
Therefore the resulting relevance values indicate those +variables. Therefore the resulting relevance values indicate +prediction-sensitive variables, i.e., those variables that can be locally perturbed the least to change the outcome the most. Mathematically, this method can be described by the following formula for the input variable $x_i$ with $x \in \mathbb{R}^n$, the @@ -1306,17 +1359,27 @@ $$ As described in the introduction of this section, the corresponding **innsight**-method `Gradient` inherits from the super class `InterpretingMethod`, meaning that we need -to change the term `Method` to `Gradient`. The only model-specific +to change the term `Method` to `Gradient`. Alternatively, an object of the +class `Gradient` can also be created using the mentioned helper function +`run_grad()`, which does not require prior knowledge of R6 objects. +The only model-specific argument is `times_input`, which can be used to switch between the two methods *Gradient* (default `FALSE`) and *Gradient$\times$Input* (`TRUE`). For more information on the method *Gradient$\times$Input* see [this subsection](#gradienttimesinput-and-smoothgradtimesinput). ```{r, eval = FALSE} +# R6 class syntax grad <- Gradient$new(converter, data, times_input = FALSE, ... # other arguments inherited from 'InterpretingMethod' ) + +# Using the helper function +grad <- run_grad(converter, data, + times_input = FALSE, + ... # other arguments inherited from 'InterpretingMethod' +) ```
    @@ -1385,9 +1448,9 @@ result: data <- matrix(c(0.45), 1, 1) # Apply method (but don't ignore last activation) -grad <- Gradient$new(converter, data, ignore_last_act = FALSE) +grad <- run_grad(converter, data, ignore_last_act = FALSE) # get result -grad$get_result() +get_result(grad) ```
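To make explicit what is being computed here, the gradient can be reproduced by hand with plain **torch** autograd. This is only a hedged sketch: the one-neuron model `f` and its weights are made-up stand-ins, not the vignette's converter model:

```r
library(torch)

# Hypothetical one-neuron model: f(x) = tanh(w * x + b)
w <- 0.8
b <- -0.1
f <- function(x) torch_tanh(w * x + b)

# Gradient of the output w.r.t. the input at x = 0.45 via autograd
x <- torch_tensor(0.45, requires_grad = TRUE)
f(x)$backward()
as.numeric(x$grad)

# Closed form for comparison: f'(x) = (1 - tanh(w * x + b)^2) * w
(1 - tanh(w * 0.45 + b)^2) * w
```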
    @@ -1418,8 +1481,9 @@ $$ As described in the introduction of this section, the **innsight** method `SmoothGrad` inherits from the super class `InterpretingMethod`, meaning -that we need to change the term `Method` to `SmoothGrad`. In addition, -there are the following three model-specific arguments: +that we need to change the term `Method` to `SmoothGrad` or use the helper +function `run_smoothgrad()` for initializing an object of class `SmoothGrad`. +In addition, there are the following three model-specific arguments: - `n` (default: `50`): This integer value specifies how many perturbations will be used to calculate the mean gradient, i.e., the @@ -1440,12 +1504,21 @@ there are the following three model-specific arguments: subsection](#gradienttimesinput-and-smoothgradtimesinput). ```{r, eval = FALSE} +# R6 class syntax smoothgrad <- SmoothGrad$new(converter, data, n = 50, noise_level = 0.1, times_input = FALSE, ... # other arguments inherited from 'InterpretingMethod' -) +) + +# Using the helper function +smoothgrad <- run_smoothgrad(converter, data, + n = 50, + noise_level = 0.1, + times_input = FALSE, + ... # other arguments inherited from 'InterpretingMethod' +) ```
    @@ -1515,13 +1588,13 @@ With **innsight**, this method is applied as follows: data <- matrix(c(0.6), 1, 1) # Apply method -smoothgrad <- SmoothGrad$new(converter, data, +smoothgrad <- run_smoothgrad(converter, data, noise_level = 0.2, n = 50, ignore_last_act = FALSE # include the tanh activation ) # get result -smoothgrad$get_result() +get_result(smoothgrad) ```
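The averaging in the SmoothGrad formula can likewise be written out by hand. Again a hedged sketch with an assumed one-neuron model and a fixed noise scale (a stand-in for the scale that innsight otherwise derives from `noise_level` and the data):

```r
library(torch)

f <- function(x) torch_tanh(0.8 * x - 0.1)  # hypothetical model

n     <- 50   # number of perturbations
sigma <- 0.2  # assumed noise standard deviation
x0    <- 0.6

# Average the gradient over n noisy copies of the input
grads <- vapply(seq_len(n), function(i) {
  x <- torch_tensor(x0 + rnorm(1, sd = sigma), requires_grad = TRUE)
  f(x)$backward()
  as.numeric(x$grad)
}, numeric(1))

mean(grads)  # SmoothGrad estimate of the smoothed gradient at x0
```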
    @@ -1601,7 +1674,8 @@ $$ Both methods are variants of the respective gradient methods `Gradient` and `SmoothGrad` and also have the corresponding model-specific -arguments. These variants can be chosen with the argument `times_input`: +arguments and helper functions for the initialization. These variants can +be chosen with the argument `times_input`: ```{r, eval = FALSE} # the "x Input" variant of method "Gradient" @@ -1610,11 +1684,23 @@ grad_x_input <- Gradient$new(converter, data, ... # other arguments of method "Gradient" ) +# the same using the corresponding helper function +grad_x_input <- run_grad(converter, data, + times_input = TRUE, + ... # other arguments of method "Gradient" +) + # the "x Input" variant of method "SmoothGrad" smoothgrad_x_input <- SmoothGrad$new(converter, data, times_input = TRUE, ... # other arguments of method "SmoothGrad" -) +) + +# the same using the corresponding helper function +smoothgrad_x_input <- run_smoothgrad(converter, data, + times_input = TRUE, + ... # other arguments of method "SmoothGrad" +) ```
    @@ -1662,12 +1748,12 @@ With **innsight**, this method is applied as follows: data <- matrix(c(0.49), 1, 1) # Apply method -grad_x_input <- Gradient$new(converter, data, +grad_x_input <- run_grad(converter, data, times_input = TRUE, ignore_last_act = FALSE # include the tanh activation ) # get result -grad_x_input$get_result() +get_result(grad_x_input) ``` **SmoothGrad$\times$Input:** @@ -1724,12 +1810,12 @@ With **innsight**, this method is applied as follows: data <- matrix(c(0.49), 1, 1) # Apply method -smoothgrad_x_input <- SmoothGrad$new(converter, data, +smoothgrad_x_input <- run_smoothgrad(converter, data, times_input = TRUE, ignore_last_act = FALSE # include the tanh activation ) # get result -smoothgrad_x_input$get_result() +get_result(smoothgrad_x_input) ```
    @@ -1855,12 +1941,21 @@ there are the following method-specific arguments for this method: nodes in a pooling window. ```{r, eval = FALSE} +# R6 class syntax lrp <- LRP$new(converter, data, rule_name = "simple", rule_param = NULL, winner_takes_all = TRUE, ... # other arguments inherited from 'InterpretingMethod' ) + +# Using the helper function for initialization +lrp <- run_lrp(converter, data, + rule_name = "simple", + rule_param = NULL, + winner_takes_all = TRUE, + ... # other arguments inherited from 'InterpretingMethod' +) ```
@@ -1888,10 +1983,10 @@ data <- matrix(
 )
 
 # Apply LRP with simple rule
-lrp <- LRP$new(converter, data,
+lrp <- run_lrp(converter, data,
   ignore_last_act = FALSE
 )
-lrp$get_result()
+get_result(lrp)
 
 # get approximation error
 matrix(lrp$get_result()) - as_array(converter$model(torch_tensor(data))[[1]])
@@ -1905,19 +2000,19 @@ black.
 
 
 fun_1 <- function(x) {
-  LRP$new(converter, matrix(x, ncol = 1), ignore_last_act = FALSE)$get_result()
+  run_lrp(converter, matrix(x, ncol = 1), ignore_last_act = FALSE)$get_result()
 }
 fun_2 <- function(x) {
-  LRP$new(converter, matrix(x, ncol = 1), ignore_last_act = FALSE, rule_name = "epsilon", rule_param = 0.1)$get_result()
+  run_lrp(converter, matrix(x, ncol = 1), ignore_last_act = FALSE, rule_name = "epsilon", rule_param = 0.1)$get_result()
 }
 fun_3 <- function(x) {
-  LRP$new(converter, matrix(x, ncol = 1), ignore_last_act = FALSE, rule_name = "alpha_beta", rule_param = 0.5)$get_result()
+  run_lrp(converter, matrix(x, ncol = 1), ignore_last_act = FALSE, rule_name = "alpha_beta", rule_param = 0.5)$get_result()
 }
 fun_4 <- function(x) {
-  LRP$new(converter, matrix(x, ncol = 1), ignore_last_act = FALSE, rule_name = "alpha_beta", rule_param = 1)$get_result()
+  run_lrp(converter, matrix(x, ncol = 1), ignore_last_act = FALSE, rule_name = "alpha_beta", rule_param = 1)$get_result()
 }
 
 ggplot() +
@@ -1957,8 +2052,10 @@ after adding the negative or positive contribution
 revealing dependencies missed by other approaches.
 
 Analogous to the previous methods, the **innsight** method `DeepLift` inherits
-from the `InterpretingMetod` super class and thus all arguments. In addition,
-there are the following method-specific arguments for this method:
+from the `InterpretingMethod` super class and thus all arguments. Alternatively,
+an object of the class `DeepLift` can also be created using the helper function
+`run_deeplift()`, which does not require prior knowledge of R6 objects. In
+addition, there are the following method-specific arguments for this method:
 
 * `x_ref` (default: `NULL`): This argument describes the reference input
 $\tilde{x}$ for the DeepLift method. This value must have the same format as
@@ -1985,12 +2082,21 @@ distribution of the upper-layer contribution to the lower
layer.


```{r, eval = FALSE}
+# R6 class syntax
 deeplift <- DeepLift$new(converter, data,
   x_ref = NULL,
   rule_name = "rescale",
   winner_takes_all = TRUE,
   ... # other arguments inherited from 'InterpretingMethod'
 )
+
+# Using the helper function for initialization
+deeplift <- run_deeplift(converter, data,
+  x_ref = NULL,
+  rule_name = "rescale",
+  winner_takes_all = TRUE,
+  ... # other arguments inherited from 'InterpretingMethod'
+)
```
    @@ -2008,7 +2114,7 @@ x <- matrix(c(0.55)) x_ref <- matrix(c(0.1)) # Apply method DeepLift with rescale rule -deeplift <- DeepLift$new(converter, x, x_ref = x_ref, ignore_last_act = FALSE) +deeplift <- run_deeplift(converter, x, x_ref = x_ref, ignore_last_act = FALSE) # Get result get_result(deeplift) @@ -2023,11 +2129,11 @@ set.seed(42) model <- neuralnet(Species ~ ., iris, hidden = 5, linear.output = FALSE) # Step 1: Create 'Converter' -conv <- Converter$new(model) +conv <- convert(model) # Step 2: Apply DeepLift (reveal-cancel rule) x_ref <- matrix(colMeans(iris[, -5]), nrow = 1) # use colmeans as reference value -deeplift <- DeepLift$new(conv, iris[, -5], +deeplift <- run_deeplift(conv, iris[, -5], x_ref = x_ref, ignore_last_act = FALSE, rule_name = "reveal_cancel" ) @@ -2044,6 +2150,281 @@ mean((delta_y - summed_decomposition)^2)
+
+### Integrated Gradients
+
+In the *Integrated Gradients* method introduced by
+[Sundararajan et al. (2017)](https://arxiv.org/abs/1703.01365), the gradients
+are integrated along a path from a reference value $\tilde{x}$ to the actual
+value $x$. Similar to DeepLift, this integration results in a decomposition of
+$f(x) - f(\tilde{x})$. In this sense, the method uncovers the feature-wise
+relative effect of the input features on the difference between the prediction
+$f(x)$ and the reference prediction $f(\tilde{x})$. This is achieved through
+the following formula:
+$$
+\text{IntGrad}(x)_i^c = (x_i - \tilde{x}_i) \int_{\alpha = 0}^1 \frac{\partial f(\tilde{x} + \alpha (x - \tilde{x}))}{\partial x_i} d\alpha
+$$
+In simpler terms, it calculates how much each feature contributes to a model's
+output by tracing a path from a baseline input $\tilde{x}$ to the actual input
+$x$ and measuring the average gradients along that path.
+
+Similar to the other gradient-based methods, by default the integrated
+gradient is multiplied by the difference between the input and the baseline
+value to get an approximate decomposition of $f(x) - f(\tilde{x})$. However,
+with the parameter `times_input` only the gradient describing the output
+sensitivity can be returned.
+
+Analogous to the previous methods, the **innsight** method `IntegratedGradient`
+inherits from the `InterpretingMethod` super class and thus all arguments.
+Alternatively, an object of the class `IntegratedGradient` can also be created
+using the helper function `run_intgrad()`, which does not require prior
+knowledge of R6 objects. In addition, there are the following method-specific
+arguments for this method:
+
+* `x_ref` (default: `NULL`): This argument describes the reference input
+$\tilde{x}$ for the Integrated Gradients method. This value must have the same
+format as the input data of the passed model to the converter class, i.e.,
+
+  * an `array`, `data.frame`, `torch_tensor` or array-like format of size
+  $\left(1, \text{input_dim}\right)$ or
+  * a `list` with the corresponding input data (according to the upper point)
+  for each of the input layers.
+  * It is also possible to use the default value `NULL` to take only zeros
+  as reference input.
+
+* `n` (default: `50`): Number of steps for the approximation of the
+integration path along $\alpha$.
+
+* `times_input` (default: `TRUE`): Multiplies the integrated gradients with
+the difference of the input features and the baseline values. By default, the
+original definition of Integrated Gradients is applied. However, by setting
+`times_input = FALSE` only an approximation of the integral is calculated,
+which describes the sensitivity of the output to the input features.
+
+
+```{r, eval = FALSE}
+# R6 class syntax
+intgrad <- IntegratedGradient$new(converter, data,
+  x_ref = NULL,
+  n = 50,
+  times_input = TRUE,
+  ... # other arguments inherited from 'InterpretingMethod'
+)
+
+# Using the helper function for initialization
+intgrad <- run_intgrad(converter, data,
+  x_ref = NULL,
+  n = 50,
+  times_input = TRUE,
+  ... # other arguments inherited from 'InterpretingMethod'
+)
+```
+
+
    + Examples +In this example, let's consider the point $x = 0.55$ and the reference point +$\tilde{x} = 0.1$. With the help of the model defined previously, the +respective outputs are $y = f(x) = 0.4699$ and $\tilde{y} = f(\tilde{x}) = 0.0997$. +The Integrated Gradient method now generates an approximate variable-wise +decomposition of the so-called difference-from-reference value +$\Delta y = y - \tilde{y} = 0.3702772$. +Since there is only one input feature in this case, the entire value should be +assigned to it: +```{r} +# Create data +x <- matrix(c(0.55)) +x_ref <- matrix(c(0.1)) + +# Apply method IntegratedGradient +intgrad <- run_intgrad(converter, x, x_ref = x_ref, ignore_last_act = FALSE) + +# Get result +get_result(intgrad) +``` + +
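+Since `ignore_last_act = FALSE` was set, the summed relevances should
+approximately reproduce the difference-from-reference value. A quick sanity
+check could look like the following sketch, re-using the converted model from
+above (the numbers may differ slightly due to the approximation of the
+integral):
+
+```{r, eval = FALSE}
+# Compare the summed relevances with f(x) - f(x_ref)
+sum(get_result(intgrad))
+as_array(converter$model(torch_tensor(x))[[1]]) -
+  as_array(converter$model(torch_tensor(x_ref))[[1]])
+```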
+
+
+### Expected Gradients
+
+The *Expected Gradients* method ([Erion et al., 2021](https://doi.org/10.1038/s42256-021-00343-w)),
+also known as *GradSHAP*,
+is a local feature attribution technique which extends the Integrated
+Gradients method and provides approximate Shapley values. In contrast to
+Integrated Gradients, it considers not only a single reference value
+$\tilde{x}$ but the whole distribution of reference values
+$\tilde{x} \sim \tilde{X}$ and averages the Integrated Gradients values over
+this distribution. Mathematically, the method can be described as follows:
+$$
+\text{ExpGrad}(x)_i^c = \mathbb{E}_{\tilde{x}\sim \tilde{X}, \alpha \sim U(0,1)} \left[(x_i - \tilde{x}_i) \times \frac{\partial f(\tilde{x} + \alpha (x - \tilde{x}))}{\partial x_i} \right]
+$$
+These feature-wise values approximate a decomposition of the prediction minus
+the average prediction in the reference dataset, i.e.,
+$f(x) - \mathbb{E}_{\tilde{x}}\left[f(\tilde{x}) \right]$. This means it
+solves the issue of choosing the right reference value.
+
+Analogous to the previous methods, the **innsight** method `ExpectedGradient`
+inherits from the `InterpretingMethod` super class and thus all arguments.
+Alternatively, an object of the class `ExpectedGradient` can also be created
+using the helper function `run_expgrad()`, which does not require prior
+knowledge of R6 objects. In addition, there are the following method-specific
+arguments for this method:
+
+* `data_ref` (default: `NULL`): This argument describes the reference inputs
+$\tilde{x}$ for the Expected Gradients method. This value must have the same
+format as the input data of the passed model to the converter class, i.e.,
+
+  * an `array`, `data.frame`, `torch_tensor` or array-like format of size
+  $\left(n_{\text{ref}}, \text{input_dim}\right)$ with an arbitrary number of
+  reference values $n_{\text{ref}}$ or
+  * a `list` with the corresponding input data (according to the upper point)
+  for each of the input layers.
+  * It is also possible to use the default value `NULL` to take only zeros
+  as reference input.
+
+* `n` (default: `50`): Number of samples from the distribution of reference
+values $\tilde{x} \sim \tilde{X}$ and number of samples for the approximation
+of the integration path along $\alpha$.
+
+
+```{r, eval = FALSE}
+# R6 class syntax
+expgrad <- ExpectedGradient$new(converter, data,
+  data_ref = NULL,
+  n = 50,
+  ... # other arguments inherited from 'InterpretingMethod'
+)
+
+# Using the helper function for initialization
+expgrad <- run_expgrad(converter, data,
+  data_ref = NULL,
+  n = 50,
+  ... # other arguments inherited from 'InterpretingMethod'
+)
+```
+
+
+
 Examples
+In the following example, we demonstrate how the Expected Gradients method
+is applied to the Iris dataset, closely approximating the difference
+between the prediction and the mean prediction
+(using a very high sample size of $n = 10\,000$):
+```{r}
+library(neuralnet)
+set.seed(42)
+
+# Create model with package 'neuralnet'
+model <- neuralnet(Species ~ ., iris, linear.output = FALSE)
+
+# Step 1: Create 'Converter'
+conv <- convert(model)
+
+# Step 2: Apply Expected Gradients
+expgrad <- run_expgrad(conv, iris[c(1, 60), -5],
+  data_ref = iris[, -5], ignore_last_act = FALSE,
+  n = 10000
+)
+
+# Verify exact decomposition
+y <- predict(model, iris[, -5])
+delta_y <- y[c(1, 60), ] - rbind(colMeans(y), colMeans(y))
+summed_decomposition <- apply(get_result(expgrad), c(1, 3), FUN = sum) # dim 2 is the input feature dim
+
+# Show the error between both
+delta_y - summed_decomposition
+```
+
+
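+The expectation above is estimated by plain Monte-Carlo sampling over the
+baselines and the integration variable $\alpha$. The following sketch only
+illustrates this estimate for a single instance `x`; `grad_f()` is a
+hypothetical stand-in for the gradient of the model output w.r.t. its input
+and is not part of the package:
+
+```{r, eval = FALSE}
+# Illustrative Monte-Carlo estimate behind Expected Gradients
+n <- 50
+samples <- lapply(seq_len(n), function(k) {
+  x_ref <- data_ref[sample(nrow(data_ref), 1), ]  # draw a baseline
+  alpha <- runif(1)                               # draw alpha ~ U(0, 1)
+  (x - x_ref) * grad_f(x_ref + alpha * (x - x_ref))
+})
+Reduce(`+`, samples) / n
+```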
+
+### DeepSHAP
+
+The *DeepSHAP* method [(Lundberg & Lee, 2017)](https://dl.acm.org/doi/10.5555/3295222.3295230)
+extends the DeepLift technique by not only considering a
+single reference value but by calculating the average from several, ideally
+representative reference values at each layer. The obtained feature-wise
+results are approximate Shapley values for the chosen output, where the
+conditional expectation is computed using these different reference values,
+i.e., the DeepSHAP method decomposes the difference between the prediction and
+the mean prediction $f(x) - \mathbb{E}_{\tilde{x}}\left[f(\tilde{x}) \right]$
+into feature-wise effects. This means that the DeepSHAP method has the same
+underlying goal as the Expected Gradients method and, hence, also solves the
+issue of choosing the right reference value for the DeepLift method.
+
+Analogous to the previous methods, the **innsight** method `DeepSHAP`
+inherits from the `InterpretingMethod` super class and thus all arguments.
+Alternatively, an object of the class `DeepSHAP` can also be created
+using the helper function `run_deepshap()`, which does not require prior
+knowledge of R6 objects. In addition, there are the following method-specific
+arguments for this method:
+
+* `data_ref` (default: `NULL`): The reference data which is used to estimate
+the conditional expectation. These must have the same format as the input data
+of the passed model to the converter object. This means either
+
+  * an `array`, `data.frame`, `torch_tensor` or array-like format of size
+  $\left(n_{\text{ref}}, \text{input_dim}\right)$ with an arbitrary number of
+  reference values $n_{\text{ref}}$ or
+  * a `list` with the corresponding input data (according to the upper point)
+  for each of the input layers.
+  * It is also possible to use the default value `NULL` to take only zeros
+  as reference input.
+
+* `limit_ref` (default: `100`): This argument limits the number of instances
+taken from the reference dataset `data_ref` so that only `limit_ref` randomly
+selected instances instead of the entire dataset are used to estimate the
+conditional expectation. Too large a number can significantly increase the
+computation time.
+
+* (other model-specific arguments already explained in the DeepLift method, e.g.,
+`rule_name` or `winner_takes_all`).
+
+
+```{r, eval = FALSE}
+# R6 class syntax
+deepshap <- DeepSHAP$new(converter, data,
+  data_ref = NULL,
+  limit_ref = 100,
+  ... # other arguments inherited from 'DeepLift'
+)
+
+# Using the helper function for initialization
+deepshap <- run_deepshap(converter, data,
+  data_ref = NULL,
+  limit_ref = 100,
+  ... # other arguments inherited from 'DeepLift'
+)
+```
+
+
+
 Examples
+In the following example, we demonstrate how the DeepSHAP method
+is applied to the Iris dataset, closely approximating the difference
+between the prediction and the mean prediction
+(using the entire Iris dataset as reference data):
+```{r}
+library(neuralnet)
+set.seed(42)
+
+# Create model with package 'neuralnet'
+model <- neuralnet(Species ~ ., iris, linear.output = FALSE)
+
+# Step 1: Create 'Converter'
+conv <- convert(model)
+
+# Step 2: Apply DeepSHAP
+deepshap <- run_deepshap(conv, iris[c(1, 60), -5],
+  data_ref = iris[, -5], ignore_last_act = FALSE,
+  limit_ref = nrow(iris)
+)
+
+# Verify exact decomposition
+y <- predict(model, iris[, -5])
+delta_y <- y[c(1, 60), ] - rbind(colMeans(y), colMeans(y))
+summed_decomposition <- apply(get_result(deepshap), c(1, 3), FUN = sum) # dim 2 is the input feature dim
+
+# Show the error between both
+delta_y - summed_decomposition
+```
+
+
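+As described above, DeepSHAP averages the DeepLift contributions over the
+reference values within each layer. The following sketch therefore only
+conveys the intuition and is not equivalent to the actual implementation,
+which performs this averaging layer by layer:
+
+```{r, eval = FALSE}
+# Intuition only: average DeepLift attributions over single reference values
+res <- lapply(seq_len(nrow(iris)), function(i) {
+  get_result(run_deeplift(conv, iris[c(1, 60), -5],
+    x_ref = iris[i, -5], ignore_last_act = FALSE
+  ))
+})
+Reduce(`+`, res) / length(res)
+```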
    + + ### Connection Weights One of the earliest methods specifically for neural networks was the *Connection @@ -2062,9 +2443,12 @@ to `TRUE` and providing input data. The **innsight** method `ConnectionWeights` also inherits from the super class `InterpretingMethod`, meaning that you need to change the -term `Method` to `ConnectionWeights`. The only model-specific argument is -`times_input`, which can be used to switch between the -global (`FALSE`) and the local (`TRUE`) *Connection Weights* method. +term `Method` to `ConnectionWeights`. Alternatively, +an object of the class `ConnectionWeights` can also be created using the +helper function `run_cw()`, which does not require prior knowledge of R6 objects. +The only model-specific argument is `times_input`, which can be used to +switch between the global (`FALSE`) and the local (`TRUE`) +*Connection Weights* method. ```{r, eval = FALSE} # The global variant (argument 'data' is no longer required) @@ -2078,6 +2462,12 @@ cw_local <- ConnectionWeights$new(converter, data, times_input = TRUE, ... # other arguments inherited from 'InterpretingMethod' ) + +# Using the helper function +cw_local <- run_cw(converter, data, + times_input = TRUE, + ... # other arguments inherited from 'InterpretingMethod' +) ```
    @@ -2090,7 +2480,7 @@ $$ With the **innsight** package, we get the same value: ```{r} # Apply global Connection Weights method -cw_global <- ConnectionWeights$new(converter, times_input = FALSE) +cw_global <- run_cw(converter, times_input = FALSE) # Show the result get_result(cw_global) @@ -2102,7 +2492,7 @@ relevances: data <- array(c(0.1, 0.4, 0.6), dim = c(3, 1)) # Apply local Connection Weights method -cw_local <- ConnectionWeights$new(converter, data, times_input = TRUE) +cw_local <- run_cw(converter, data, times_input = TRUE) # Show the result get_result(cw_local) @@ -2120,7 +2510,8 @@ signals or 2D images) suitable plot and boxplot functions based on complexity of higher dimensional inputs, these plots and boxplots can also be displayed as an interactive [plotly](https://plotly.com/r/) plots by using the argument `as_plotly`. These three class methods have also been implemented -as S3 methods (`get_result()`, `plot()` and `boxplot()`) for easier handling. +as S3 methods (`get_result()`, `plot()` and `plot_global()`/`boxplot()`) for +easier handling.
    Create results to be visualized @@ -2159,17 +2550,17 @@ img_model <- nn_sequential( ) # Create converter -tab_conv <- Converter$new(tab_model, +tab_conv <- convert(tab_model, input_dim = c(4), input_names = tab_names, output_names = out_names ) -img_conv <- Converter$new(img_model, input_dim = c(3, 32, 32)) +img_conv <- convert(img_model, input_dim = c(3, 32, 32)) # Apply Gradient x Input -tab_grad <- Gradient$new(tab_conv, tab_data, times_input = TRUE) -img_grad <- Gradient$new(img_conv, img_data, times_input = TRUE) +tab_grad <- run_grad(tab_conv, tab_data, times_input = TRUE) +img_grad <- run_grad(img_conv, img_data, times_input = TRUE) ```
    @@ -2250,7 +2641,7 @@ List of k **Example with a tabular model** ```{r, eval = torch::torch_is_installed() & keras::is_keras_available()} # Apply method 'Gradient x Input' for classes 1 ('setosa') and 3 ('virginica') -tab_grad <- Gradient$new(tab_conv, tab_data, +tab_grad <- run_grad(tab_conv, tab_data, output_idx = c(1, 3), times_input = TRUE ) @@ -2268,7 +2659,7 @@ result_array[c(1, 10), , ] **Example with an image model** ```{r, eval = torch::torch_is_installed() & keras::is_keras_available()} # Apply method 'Gradient' for outputs 1 and 2 -img_grad <- Gradient$new(img_conv, img_data, output_idx = c(1, 2)) +img_grad <- run_grad(img_conv, img_data, output_idx = c(1, 2)) # Get result result_array <- img_grad$get_result() # You can also use the S3 function 'get_result' @@ -2309,7 +2700,7 @@ model <- keras_model( outputs = output ) -conv <- Converter$new(model) +conv <- convert(model) data <- lapply( list(c(10, 10, 2), c(11)), function(x) array(rnorm(5 * prod(x)), dim = c(5, x)) @@ -2319,7 +2710,7 @@ data <- lapply( ```{r, eval = torch::torch_is_installed() & keras::is_keras_available()} # Apply method 'Gradient' for outputs 1 and 2 -grad <- Gradient$new(conv, data, output_idx = c(1, 2), channels_first = FALSE) +grad <- run_grad(conv, data, output_idx = c(1, 2), channels_first = FALSE) # Get result result_array <- grad$get_result() # You can also use the S3 function 'get_result' @@ -2361,7 +2752,7 @@ model <- keras_model( outputs = c(first_output, second_output) ) -conv <- Converter$new(model) +conv <- convert(model) data <- lapply( list(c(10, 10, 2), c(11)), function(x) array(rnorm(5 * prod(x)), dim = c(5, x)) @@ -2372,7 +2763,7 @@ data <- lapply( ```{r, eval = torch::torch_is_installed() & keras::is_keras_available()} # Apply method 'Gradient' for outputs 1 and 2 in the first and # for outputs 1 and 3 in the second output layer -grad <- Gradient$new(conv, data, +grad <- run_grad(conv, data, output_idx = list(c(1, 2), c(1, 3)), channels_first = FALSE ) @@ -2429,6 +2820,17 @@ output layer specified with the argument `output_idx`. respective data point, input layer, output layer, output node/class and input feature. +* `'pred'`: The prediction of the respective input instance (`'data'`) for +the output node `'output_node'` in the output layer `'model_output'`. + +* `'decomp_sum'`: The sum of all relevance values (`'value'`) for an input +instance (`'data'`) and output node `'output_node'` in the output layer +`'model_output'`. + +* `'decomp_goal'`: The corresponding decomposition goal of the applied feature +attribution method (if available, otherwise `NA`s) for an input instance +(`'data'`) and output node `'output_node'` in the output layer `'model_output'`. + * `'input_dimension'`: This column contains one of the values * `1`: the value corresponds to an tabular input. * `2`: the value corresponds to a signal input. @@ -2489,8 +2891,10 @@ This method visualizes the result of the selected method and enables a in-depth visual investigation with the help of the S4 classes `innsight_ggplot2` and `innsight_plotly`. You can use the argument `data_idx` to select the data points in the given data for the plot. In addition, the individual plot's output -nodes or classes can be selected with the argument `output_idx`. -But this has to be a subset of the [argument `output_idx`](#argument-output_idx) +nodes or classes can be selected with the argument `output_idx` (for indices) or +`output_label` (for the class labels). 
+But this has to be a subset of the [argument `output_idx`](#argument-output_idx) +or [`output_label`](#argument-output_label) passed to the respective method previously because the results were only calculated for these outputs. The different results for the selected data points and outputs are visualized using the **ggplot2**-based S4 class `innsight_ggplot2`. @@ -2520,6 +2924,15 @@ passed to the respective method previously. By default (`NULL`), the smallest index of all calculated output nodes and output layers is used. +
• +`output_label`: The labels of the output nodes for which the results are to
+be plotted. This can be either a vector of labels or a list of vectors of
+labels (character or factor) but must be a subset of the labels for which the
+results were calculated, i.e., a subset of the [argument `output_label`](#argument-output_label)
+passed to the respective method previously. By default (`NULL`), the smallest
+index of all calculated output nodes and output layers is used.
+
  • +
  • `aggr_channels`: Pass one of `'norm'`, `'sum'`, `'mean'` or a custom function to aggregate the channels. By default (`'sum'`), the sum of all @@ -2554,6 +2967,18 @@ available, input layer are scaled and highlighted separately with the colors red positive, blue for negative, and white for the absence of relevance so that the different intensities of the relevances can be distinguished by their colors. +
  • + +
• +`show_preds`: This logical value indicates whether the plots display the
+prediction, the sum of calculated relevances, and, if available, the targeted
+decomposition value in a small infobox. For example, in the case of
+Gradient$\times$Input, the goal is to obtain a feature-wise decomposition of
+the predicted value, while for DeepLift and IntegratedGradient, the goal is
+the difference between the prediction and the reference prediction, i.e.,
+$f(x) - f(\tilde{x})$. However, the infoboxes are only shown for ggplot2-based
+plots.
+
  • @@ -2562,27 +2987,31 @@ different intensities of the relevances can be distinguished by their colors. method$plot( data_idx = 1, output_idx = NULL, + output_label = NULL, aggr_channels = "sum", as_plotly = FALSE, - same_scale = FALSE + same_scale = FALSE, + show_preds = TRUE ) # or the S3 method plot(method, data_idx = 1, output_idx = NULL, + output_label = NULL, aggr_channels = "sum", as_plotly = FALSE, - same_scale = FALSE + same_scale = FALSE, + show_preds = TRUE ) ``` **Examples and usage:** ```{r, fig.width = 8, fig.height=5, eval = torch::torch_is_installed() & keras::is_keras_available()} -# Create plot for output classes '1' (setosa) and '3' (virginica) and +# Create plot for output classes 'setosa' and 'virginica' and # data points '1' and '70' -p <- plot(tab_grad, output_idx = c(1, 3), data_idx = c(1, 70)) +p <- plot(tab_grad, output_label = c("setosa", "virginica"), data_idx = c(1, 70)) # Although it's not a ggplot2 object ... class(p) @@ -2656,14 +3085,15 @@ tab_grad$result[[1]][[1]] <- abs(result[[1]][[1]]) plot(tab_grad, output_idx = c(1, 3), data_idx = c(1, 70)) ``` -## Plot summarized results `boxplot()` +## Plot summarized results `plot_global()` This method visualizes summarized results of the selected method over all data points (or a subset) and enables an in-depth visual investigation with the help of the S4 classes `innsight_ggplot2` and `innsight_plotly` similar to the previous [`plot` function](#plot-single-results-plot). You can use the argument -`output_idx` to select the individual output nodes for the plot. But this -has to be a subset of the [argument `output_idx`](#argument-output_idx) +`output_idx`/`output_label` to select the individual output nodes for the plot. But this +has to be a subset of the [argument `output_idx`](#argument-output_idx) or +[`output_label`](#argument-output_label) passed to the respective method previously because the results were only calculated for these outputs. For tabular and 1D signal data, boxplots are created in which a reference value can be selected from the data using the @@ -2695,6 +3125,15 @@ subset of them by passing the indices. For example, with `c(1:10, 25, 26)` only first 10 data points and the 25th and 26th are used to calculate the boxplots. +
• +`output_label`: The labels of the output nodes for which the results are to
+be plotted. This can be either a vector of labels or a list of vectors of
+labels (character or factor) but must be a subset of the labels for which the
+results were calculated, i.e., a subset of the [argument `output_label`](#argument-output_label)
+passed to the respective method previously. By default (`NULL`), the smallest
+index of all calculated output nodes and output layers is used.
+
  • +
  • `ref_data_idx`: This integer number determines the index for the reference data point. In addition to the boxplots, it is displayed in red color and is @@ -2763,7 +3202,7 @@ increase the runtime. ```{r, eval = FALSE} # Class method -method$boxplot( +method$plot_global( output_idx = NULL, data_idx = "all", ref_data_idx = NULL, @@ -2775,7 +3214,7 @@ method$boxplot( ) # or the S3 method -boxplot(method, +plot_global(method, output_idx = NULL, data_idx = "all", ref_data_idx = NULL, @@ -2785,6 +3224,9 @@ boxplot(method, individual_data_idx = NULL, individual_max = 20 ) + +# or the alias for tabular or signal data +boxplot(...) ``` **Examples and usage:** @@ -2827,9 +3269,10 @@ p <- boxplot(tab_grad, plotly::config(print(p)) ``` ```{r, fig.width=8, fig.height=4, eval = torch::torch_is_installed() & keras::is_keras_available()} -# We can do the same for models with image data. In addition, you can define +# We can do the same for models with image data (but have to use the method +# `plot_global`, since no boxplots are created). In addition, you can define # the aggregation function for the channels -p <- boxplot(img_grad, output_idx = c(1, 2), aggr_channels = "norm") +p <- plot_global(img_grad, output_idx = c(1, 2), aggr_channels = "norm") # Although it's not a ggplot2 object ... class(p) @@ -2842,7 +3285,7 @@ p + ``` ```{r, fig.width = 8, fig.height=4, echo = TRUE, eval = FALSE} # You can do the same with the plotly-based plots -p <- boxplot(img_grad, +p <- plot_global(img_grad, output_idx = c(1, 2), aggr_channels = "norm", as_plotly = TRUE ) @@ -2854,7 +3297,7 @@ p ``` ```{r, fig.width = 8, fig.height=4, echo = FALSE, message=FALSE, eval=Sys.getenv("RENDER_PLOTLY", unset = 0) == 1 & torch::torch_is_installed() & keras::is_keras_available()} # You can do the same with the plotly-based plots -p <- boxplot(img_grad, +p <- plot_global(img_grad, output_idx = c(1, 2), aggr_channels = "norm", as_plotly = TRUE ) @@ -2863,7 +3306,7 @@ plotly::config(print(p)) ## Advanced plotting -In the preceding two sections, the basic `plot()` and `boxplot()` functions have +In the preceding two sections, the basic `plot()` and `plot_global()`/`boxplot()` functions have already been explained. As mentioned there, these functions create either an object of the S4 class `innsight_ggplot2` (if `as_plotly = FALSE`) or one of the S4 class `innsight_plotly` (if `as_plotly = TRUE`). 
These are intended @@ -2885,10 +3328,10 @@ model <- keras_model_sequential() %>% layer_dense(20, activation = "relu") %>% layer_dense(3, activation = "softmax") -converter <- Converter$new(model) +converter <- convert(model) data <- array(rnorm(5 * 50), dim = c(50, 5)) -res_simple <- Gradient$new(converter, data) +res_simple <- run_grad(converter, data) # Create model with images as inputs and two output layers input_image <- layer_input(shape = c(10, 10, 3)) @@ -2912,10 +3355,10 @@ keras_model_concat <- keras_model( outputs = c(output_1, output_2) ) -converter <- Converter$new(keras_model_concat) +converter <- convert(keras_model_concat) data <- array(rnorm(10 * 10 * 3 * 5), dim = c(5, 10, 10, 3)) -res_one_input <- Gradient$new(converter, data, +res_one_input <- run_grad(converter, data, channels_first = FALSE, output_idx = list(1:3, 1:3) ) @@ -2945,10 +3388,10 @@ keras_model_concat <- keras_model( outputs = list(output_1, output_2) ) -converter <- Converter$new(keras_model_concat) +converter <- convert(keras_model_concat) data <- lapply(list(c(10, 10, 3), c(10)), function(x) torch_randn(c(5, x))) -res_two_inputs <- Gradient$new(converter, data, +res_two_inputs <- run_grad(converter, data, times_input = TRUE, channels_first = FALSE, output_idx = list(1:3, 1:3) diff --git a/vignettes/innsight.Rmd b/vignettes/innsight.Rmd index f3c07ac..6b03019 100644 --- a/vignettes/innsight.Rmd +++ b/vignettes/innsight.Rmd @@ -128,7 +128,7 @@ variant of PyTorch called LibTorch (see Fig. 2). * **Unified framework:** It does not matter which model and method you choose, it is always the same three steps that lead to a visual illustration of the results (see the [next section](#how-to-use) for details): -
    model $\xrightarrow{\text{Step 1}}$ `Converter` $\xrightarrow{\text{Step 2}}$ method $\xrightarrow{\text{Step 3}}$ `print()` or `boxplot()`

    +
    model $\xrightarrow{\text{Step 1}}$ `Converter` $\xrightarrow{\text{Step 2}}$ method $\xrightarrow{\text{Step 3}}$ `plot()` or `plot_global()`/`boxplot()`

* **Visualization tools:** Our package **innsight** offers several
visualization methods for individual or summarized results regardless of
@@ -154,15 +157,18 @@ are:
 
 model <- ... # this step is left to the user
 
 # Step 1: Convert the model
-converter <- Converter$new(model)
+converter <- convert(model)
+converter <- Converter$new(model) # the same but without helper function
 
 # Step 2: Apply selected method to your data
-result <- Method$new(converter, data)
+result <- run_method(converter, data)
+result <- Method$new(converter, data) # the same but without helper function
 
 # Step 3: Show and plot the results
 get_result(result) # get the result as an `array`, `data.frame` or `torch_tensor`
 plot(result) # for individual results (local)
-boxplot(result) # for summarized results (global)
+plot_global(result) # for summarized results (global)
+boxplot(result) # alias for `plot_global` for tabular and signal data
 ```
 
 
@@ -174,17 +177,20 @@ or defined. For this reason, there are several ways in this
package to pass a
neural network to the `Converter` object, but the call is always the same:
 
 ```{r, eval = FALSE}
-# The 'Converter' object (R6 class)
+# Using the helper function `convert`
+converter <- convert(model, ...)
+# It simply passes all arguments to the initialization function of
+# the corresponding R6 class, i.e., it is equivalent to
 converter <- Converter$new(model, ...)
 ```
 
 Except for a **neuralnet** model, no names of inputs or outputs are stored
in the given model. If no further arguments are set for the `Converter`
-instance, default labels are generated for the input (e.g. `'X1'`, `'X2'`, ...)
-and output names (`'Y1'`, `'Y2'`, ... ). In the converter,
-however, there is the possibility with the optional arguments `input_names`
-and `output_names` to pass the names, which will then be used in all results
-and plots created by this object.
+instance or `convert()` function, default labels are generated for the input
+(e.g. `'X1'`, `'X2'`, ...) and output names (`'Y1'`, `'Y2'`, ... ). The
+converter, however, offers the optional arguments `input_names` and
+`output_names` for passing custom names, which will then be used in all
+results and plots created by this object.
 
 #### Usage with torch models
 
@@ -216,10 +222,10 @@ model <- nn_sequential(
   nn_softmax(2)
 )
 # Convert the model
-conv_dense <- Converter$new(model, input_dim = c(3))
+conv_dense <- convert(model, input_dim = c(3))
 # Convert model with input and output names
 conv_dense_with_names <-
-  Converter$new(model,
+  convert(model,
     input_dim = c(3),
     input_names = list(c("Price", "Weight", "Height")),
     output_names = list(c("Buy it!", "Don't buy it!"))
@@ -254,7 +260,7 @@ model <- model %>%
   layer_dense(5, activation = "softmax")
 
 # Convert the model
-conv_cnn <- Converter$new(model)
+conv_cnn <- convert(model)
 ```
  • @@ -279,7 +285,7 @@ model <- neuralnet(Species ~ Petal.Length + Petal.Width, iris, ) # Convert model -conv_dense <- Converter$new(model) +conv_dense <- convert(model) ```
    @@ -310,7 +316,7 @@ model <- list( ) ) -converter <- Converter$new(model) +converter <- convert(model) ``` @@ -339,6 +345,19 @@ method <- Method$new(converter, data, # required arguments ... # other args and method-specific args ) ``` + +However, you can also use the helper functions (e.g., `run_grad()`, +`run_deeplift()`, etc.) for initializing a new object: + +```{r, eval = FALSE} +method <- run_method(converter, data, # required arguments + channels_first = TRUE, # optional settings + output_idx = NULL, # . + ignore_last_act = TRUE, # . + ... # other args and method-specific args +) +``` + The most important arguments are explained below. For a complete and detailed explanation, however, we refer to the R documentation (see `?InterpretingMethod`) or the vignette ["In-depth explanation"](https://bips-hb.github.io/innsight/articles/detailed_overview.html#step-2-apply-selected-method) @@ -367,6 +386,11 @@ first, third and fourth output node. If your model has more than one output layer, you can pass the respective output nodes in a list which is described in detail in the R documentation (see `?InterpretingMethod`) or in the [in-depth vignette](https://bips-hb.github.io/innsight/articles/detailed_overview.html#argument-output_idx) +* `output_label`: These values specify the output nodes for which the method +is to be applied and can be used as an alternative to the argument `output_idx`. +Only values that were previously passed with the argument `output_names` in +the `converter` can be used. + * `ignore_last_act`: Set this logical value to include the last activation functions for each output layer, or not (default: `TRUE`) @@ -395,9 +419,12 @@ $$ # Apply method 'Gradient' for the dense network grad_dense <- Gradient$new(conv_dense, iris[-c(1, 2, 5)]) +# You can also use the helper function `run_grad` +grad_dense <- run_grad(conv_dense, iris[-c(1, 2, 5)]) + # Apply method 'Gradient x Input' for CNN x <- torch_randn(c(10, 3, 10, 10)) -grad_cnn <- Gradient$new(conv_cnn, x, times_input = TRUE) +grad_cnn <- run_grad(conv_cnn, x, times_input = TRUE) ``` @@ -415,15 +442,59 @@ with $\varepsilon_1, \ldots \varepsilon_n \sim \mathcal{N}(0, \sigma^2)$. Examples ```{r, results='hide', message=FALSE, eval = keras::is_keras_available() & torch::torch_is_installed()} # Apply method 'SmoothGrad' for the dense network -smooth_dense <- SmoothGrad$new(conv_dense, iris[-c(1, 2, 5)]) +smooth_dense <- run_smoothgrad(conv_dense, iris[-c(1, 2, 5)]) # Apply method 'SmoothGrad x Input' for CNN x <- torch_randn(c(10, 3, 10, 10)) -smooth_cnn <- SmoothGrad$new(conv_cnn, x, times_input = TRUE) +smooth_cnn <- run_smoothgrad(conv_cnn, x, times_input = TRUE) ``` +
• **`IntegratedGradient`** : Calculation of the integrated gradients
+([Sundararajan et al. (2017)](https://arxiv.org/abs/1703.01365)) with respect
+to a reference input $\tilde{x}$:
+$$
+\text{IntGrad}(x)_i^c = (x_i - \tilde{x}_i) \int_{\alpha = 0}^1 \frac{\partial f(\tilde{x} + \alpha (x - \tilde{x}))}{\partial x_i} d\alpha.
+$$
+
    +Examples +```{r, results='hide', message=FALSE, eval = keras::is_keras_available() & torch::torch_is_installed()} +# Apply method 'IntegratedGradient' for the dense network +intgrad_dense <- run_intgrad(conv_dense, iris[-c(1, 2, 5)]) + +# Apply method 'IntegratedGradient' for CNN with the average baseline +x <- torch_randn(c(10, 3, 10, 10)) +x_ref <- x$mean(1, keepdim = TRUE) +intgrad_cnn <- run_intgrad(conv_cnn, x, x_ref = x_ref) +``` +
    +
  • + +
• **`ExpectedGradient`** : Calculation of the expected gradients
+([Erion et al., 2021](https://doi.org/10.1038/s42256-021-00343-w)) with respect
+to a whole reference dataset $\tilde{X}$ of baseline values $\tilde{x} \sim \tilde{X}$:
+$$
+\text{ExpGrad}(x)_i^c = \mathbb{E}_{\tilde{x}\sim \tilde{X}, \alpha \sim U(0,1)} \left[(x_i - \tilde{x}_i) \times \frac{\partial f(\tilde{x} + \alpha (x - \tilde{x}))}{\partial x_i} \right]
+$$
+
    +Examples +```{r, results='hide', message=FALSE, eval = keras::is_keras_available() & torch::torch_is_installed()} +# Apply method 'ExpectedGradient' for the dense network +expgrad_dense <- run_expgrad(conv_dense, iris[-c(1, 2, 5)], + data_ref = iris[-c(1, 2, 5)]) + +# Apply method 'ExpectedGradient' for CNN +x <- torch_randn(c(10, 3, 10, 10)) +data_ref <- torch_randn(c(20, 3, 10, 10)) +expgrad_cnn <- run_expgrad(conv_cnn, x, data_ref = data_ref) +``` +
    +
  • + +
  • **`LRP`** : Back-propagating the model output to the model input neurons to obtain relevance scores for the model prediction which is known as [*Layer-wise Relevance Propagation*](https://doi.org/10.1371/journal.pone.0130140): @@ -436,11 +507,11 @@ with $R_i$ relevance score for input neuron $i$. Examples ```{r, results='hide', message=FALSE, eval = keras::is_keras_available() & torch::torch_is_installed()} # Apply method 'LRP' for the dense network -lrp_dense <- LRP$new(conv_dense, iris[-c(1, 2, 5)]) +lrp_dense <- run_lrp(conv_dense, iris[-c(1, 2, 5)]) # Apply method 'LRP' for CNN with alpha-beta-rule x <- torch_randn(c(10, 10, 10, 3)) -lrp_cnn <- LRP$new(conv_cnn, x, +lrp_cnn <- run_lrp(conv_cnn, x, rule_name = "alpha_beta", rule_param = 1, channels_first = FALSE ) @@ -464,11 +535,11 @@ difference-from-reference model output $\Delta y_c$. # Define reference value x_ref <- array(colMeans(iris[-c(1, 2, 5)]), dim = c(1, 2)) # Apply method 'DeepLift' for the dense network -deeplift_dense <- DeepLift$new(conv_dense, iris[-c(1, 2, 5)], x_ref = x_ref) +deeplift_dense <- run_deeplift(conv_dense, iris[-c(1, 2, 5)], x_ref = x_ref) # Apply method 'DeepLift' for CNN (default is a zero baseline) x <- torch_randn(c(10, 3, 10, 10)) -deeplift_cnn <- DeepLift$new(conv_cnn, x) +deeplift_cnn <- run_deeplift(conv_cnn, x) ```
  • @@ -481,17 +552,23 @@ them up (see [*Connection Weights*](https://doi.org/10.1016/j.ecolmodel.2004.03. Examples ```{r, results='hide', message=FALSE, eval = keras::is_keras_available() & torch::torch_is_installed()} # Apply global method 'ConnectionWeights' for a dense network -connectweights_dense <- ConnectionWeights$new(conv_dense) +connectweights_dense <- run_cw(conv_dense) # Apply local method 'ConnectionWeights' for a CNN # Note: This variant requires input data x <- torch_randn(c(10, 3, 10, 10)) -connectweights_cnn <- ConnectionWeights$new(conv_cnn, x, times_input = TRUE) +connectweights_cnn <- run_cw(conv_cnn, x, times_input = TRUE) ``` - - + +
  • Additionally, the method **`DeepSHAP`** and the model-agnostic methods +**`LIME`** and **`SHAP`** are implemented (by the functions `run_deepshap()`, +`run_lime()` and `run_shap()`). For details, we refer to our [vignette "In-depth +explanation"](https://bips-hb.github.io/innsight/articles/detailed_overview.html). +
  • + + > **`r knitr::asis_output("\U1F4DD")` Notes** > @@ -607,26 +684,31 @@ for the plot. In addition, if the results have channels, the `aggr_channels` argument can be used to determine how the channels are aggregated. All arguments are explained in detail in the R documentation (see `?InterpretingMethod`) or [here for `plot()`](https://bips-hb.github.io/innsight/articles/detailed_overview.html#plot-single-results-plot) and -[here for `boxplot()`](https://bips-hb.github.io/innsight/articles/detailed_overview.html#plot-summarized-results-boxplot). +[here for `plot_global()`](https://bips-hb.github.io/innsight/articles/detailed_overview.html#plot-summarized-results-plot_global). ```{r, eval = FALSE} # Create a plot for single data points plot(method, data_idx = 1, # the data point to be plotted output_idx = NULL, # the indices of the output nodes/classes to be plotted + output_label = NULL, # the class labels to be plotted aggr_channels = "sum", as_plotly = FALSE, # create an interactive plot ... # other arguments ) # Create a plot with summarized results -boxplot(method, +plot_global(method, output_idx = NULL, # the indices of the output nodes/classes to be plotted + output_label = NULL, # the class labels to be plotted ref_data_idx = NULL, # the index of an reference data point to be plotted aggr_channels = "sum", as_plotly = FALSE, # create an interactive plot ... # other arguments ) + +# Alias for `plot_global` for tabular and signal data +boxplot(...) ``` > **`r knitr::asis_output("\U1F4DD")` Note** @@ -663,12 +745,12 @@ plotly::config(print(p))
    - `boxplot()` function (**ggplot2**) + `plot_global()` function (**ggplot2**) ```{r, eval = keras::is_keras_available() & torch::torch_is_installed(), fig.height=6, fig.width=9} # Create boxplot for the first two output classes -boxplot(smooth_dense, output_idx = 1:2) +plot_global(smooth_dense, output_idx = 1:2) # Use no preprocess function (default: abs) and plot a reference data point -boxplot(smooth_dense, +plot_global(smooth_dense, output_idx = 1:3, preprocess_FUN = identity, ref_data_idx = c(55) ) @@ -676,17 +758,17 @@ boxplot(smooth_dense,
    - `boxplot()` function (**plotly**) + `plot_global()` function (**plotly**) ```{r, fig.height=6, fig.width=9, eval = FALSE} # You can do the same with the plotly-based plots -boxplot(smooth_dense, +plot_global(smooth_dense, output_idx = 1:3, preprocess_FUN = identity, ref_data_idx = c(55), as_plotly = TRUE ) ``` ```{r, fig.width = 8, fig.height=4, echo = FALSE, message=FALSE, eval=Sys.getenv("RENDER_PLOTLY", unset = 0) == 1 & torch::torch_is_installed()} # You can do the same with the plotly-based plots -p <- boxplot(smooth_dense, +p <- plot_global(smooth_dense, output_idx = 1:3, preprocess_FUN = identity, ref_data_idx = c(55), as_plotly = TRUE ) From 1153d46c07285edf94c7470c43175f0b23b3dbab Mon Sep 17 00:00:00 2001 From: nkoenen Date: Mon, 18 Dec 2023 14:41:00 +0100 Subject: [PATCH 09/20] Increment version number to 0.3.0 --- DESCRIPTION | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index af41995..46c26fa 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,7 +1,7 @@ Type: Package Package: innsight Title: Get the Insights of Your Neural Network -Version: 0.2.1 +Version: 0.3.0 Authors@R: c( person("Niklas", "Koenen", , "niklas.koenen@gmail.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-4623-8271")), From 317d0d108a5a5fb80bc3002ab2d54f9d415c8985 Mon Sep 17 00:00:00 2001 From: nkoenen Date: Mon, 18 Dec 2023 16:03:41 +0100 Subject: [PATCH 10/20] Update documentation --- man/innsight_ggplot2-indexing.Rd | 8 +++----- man/innsight_plotly-indexing.Rd | 5 ++--- 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/man/innsight_ggplot2-indexing.Rd b/man/innsight_ggplot2-indexing.Rd index a850016..ac00c3a 100644 --- a/man/innsight_ggplot2-indexing.Rd +++ b/man/innsight_ggplot2-indexing.Rd @@ -1,23 +1,21 @@ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/innsight_ggplot2.R -\name{[,innsight_ggplot2,ANY,ANY,ANY-method} -\alias{[,innsight_ggplot2,ANY,ANY,ANY-method} +\name{[,innsight_ggplot2-method} \alias{[,innsight_ggplot2-method} \alias{[.innsight_ggplot2} \alias{[[,innsight_ggplot2-method} \alias{[[.innsight_ggplot2} -\alias{[<-,innsight_ggplot2,ANY,ANY,ANY-method} \alias{[<-,innsight_ggplot2-method} \alias{[<-.innsight_ggplot2} \alias{[[<-,innsight_ggplot2-method} \alias{[[<-.innsight_ggplot2} \title{Indexing plots of \code{innsight_ggplot2}} \usage{ -\S4method{[}{innsight_ggplot2,ANY,ANY,ANY}(x, i, j, ..., restyle = TRUE, drop = TRUE) +\S4method{[}{innsight_ggplot2}(x, i, j, ..., restyle = TRUE, drop = TRUE) \S4method{[[}{innsight_ggplot2}(x, i, j, ..., restyle = TRUE) -\S4method{[}{innsight_ggplot2,ANY,ANY,ANY}(x, i, j, ...) <- value +\S4method{[}{innsight_ggplot2}(x, i, j, ...) <- value \S4method{[[}{innsight_ggplot2}(x, i, j, ...) 
<- value } diff --git a/man/innsight_plotly-indexing.Rd b/man/innsight_plotly-indexing.Rd index 2eed740..1d84ae6 100644 --- a/man/innsight_plotly-indexing.Rd +++ b/man/innsight_plotly-indexing.Rd @@ -1,14 +1,13 @@ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/innsight_plotly.R -\name{[,innsight_plotly,ANY,ANY,ANY-method} -\alias{[,innsight_plotly,ANY,ANY,ANY-method} +\name{[,innsight_plotly-method} \alias{[,innsight_plotly-method} \alias{[.innsight_plotly} \alias{[[,innsight_plotly-method} \alias{[[.innsight_plotly} \title{Indexing plots of \code{innsight_plotly}} \usage{ -\S4method{[}{innsight_plotly,ANY,ANY,ANY}(x, i, j, ..., drop = TRUE) +\S4method{[}{innsight_plotly}(x, i, j, ..., drop = TRUE) \S4method{[[}{innsight_plotly}(x, i, j, ..., drop) } From 8f45b7ec1861e0d3b1c78ee6df73b4039e576d64 Mon Sep 17 00:00:00 2001 From: nkoenen Date: Mon, 18 Dec 2023 16:04:04 +0100 Subject: [PATCH 11/20] Update README --- README.Rmd | 2 +- README.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/README.Rmd b/README.Rmd index 1004cb7..0c9985b 100644 --- a/README.Rmd +++ b/README.Rmd @@ -85,7 +85,7 @@ from GitHub with the following commands (successful installation of [`devtools`](https://www.r-project.org/nosvn/pandoc/devtools.html) is required) ```{r, eval = FALSE} -# Stable version (v0.1.1) +# Stable version install.packages("innsight") # Development version diff --git a/README.md b/README.md index 864fa01..c33a002 100644 --- a/README.md +++ b/README.md @@ -84,7 +84,7 @@ of [`devtools`](https://www.r-project.org/nosvn/pandoc/devtools.html) is required) ``` r -# Stable version (v0.1.1) +# Stable version install.packages("innsight") # Development version From fdbbd1baa2ea8773e10b311c0dd4316d3782c605 Mon Sep 17 00:00:00 2001 From: nkoenen Date: Mon, 18 Dec 2023 21:56:08 +0100 Subject: [PATCH 12/20] Set seed for tests --- tests/testthat.R | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tests/testthat.R b/tests/testthat.R index d4e64e7..88a5efc 100644 --- a/tests/testthat.R +++ b/tests/testthat.R @@ -1,5 +1,10 @@ library(testthat) library(innsight) -if (Sys.getenv("TORCH_TEST", unset = 0) == 1) +if (Sys.getenv("TORCH_TEST", unset = 0) == 1) { + set.seed(42) + torch::torch_manual_seed(42) + tensorflow::set_random_seed(43) + test_check("innsight") +} From 17fe4cc7ff486f5c68371a4d29366d6da74abb55 Mon Sep 17 00:00:00 2001 From: nkoenen Date: Mon, 18 Dec 2023 21:57:03 +0100 Subject: [PATCH 13/20] Add missing topics for pkgdown --- _pkgdown.yml | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/_pkgdown.yml b/_pkgdown.yml index 4ce3fe1..4d35f8e 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -17,15 +17,22 @@ reference: contents: - Gradient - SmoothGrad + - IntegratedGradient + - ExpectedGradient - LRP - DeepLift + - DeepSHAP - ConnectionWeights + - LIME + - SHAP + - innsight_sugar - title: Visualization and getting results desc: > Functions and S4 classes for getting and visualizing the results contents: - get_result + - plot_global - innsight_ggplot2 - innsight_plotly @@ -56,5 +63,6 @@ reference: Super classes intended only for the user's information contents: - innsight-package + - AgnosticWrapper - GradientBased - InterpretingMethod From 1deaab5c1c82b2d2f82ae152eee3529ac321decf Mon Sep 17 00:00:00 2001 From: nkoenen Date: Mon, 18 Dec 2023 22:14:27 +0100 Subject: [PATCH 14/20] Update CRAN comments --- cran-comments.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/cran-comments.md b/cran-comments.md 
index 120b3d5..45817c5 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,4 +1,4 @@ -## --- `innsight` 0.2.1 ------------------------------------------------------ +## --- `innsight` 0.3.0 ------------------------------------------------------ ### Test environments with LibTorch * GitHub Actions (ubuntu-22.04): 3.5, 3.6, 4.0, 4.1, release, devel @@ -7,7 +7,7 @@ #### R CMD check results -No warnings or errors occurred. +TODO! **Note:** We can't run examples, tests or vignettes on CRAN, as this requires a successful installation of LibTorch. Every implemented method From 7cf280975e614bbccc376149c940edf080885d40 Mon Sep 17 00:00:00 2001 From: nkoenen Date: Tue, 19 Dec 2023 11:45:56 +0100 Subject: [PATCH 15/20] Set default c++ compiler --- .github/workflows/R-CMD-check.yaml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml index a5b4c0e..7cdb088 100644 --- a/.github/workflows/R-CMD-check.yaml +++ b/.github/workflows/R-CMD-check.yaml @@ -39,6 +39,8 @@ jobs: TORCH_TEST: 1 TORCH_INSTALL: 1 TORCH_COMMIT_SHA: "none" + CC: gcc-14 + CXX: g++-14 steps: - uses: actions/checkout@v3 @@ -77,4 +79,5 @@ jobs: - uses: r-lib/actions/check-r-package@v2 with: + error-on: '"error"' upload-snapshots: true From 3e4639ccf14ef587a738d464fd8e205130d126fa Mon Sep 17 00:00:00 2001 From: nkoenen Date: Wed, 20 Dec 2023 14:29:37 +0100 Subject: [PATCH 16/20] Update GitHub actions workflows --- .github/workflows/R-CMD-check.yaml | 10 ++-- .github/workflows/pr-commands.yaml | 79 ------------------------------ 2 files changed, 4 insertions(+), 85 deletions(-) delete mode 100644 .github/workflows/pr-commands.yaml diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml index 7cdb088..a4873f7 100644 --- a/.github/workflows/R-CMD-check.yaml +++ b/.github/workflows/R-CMD-check.yaml @@ -16,7 +16,7 @@ jobs: R-CMD-check: runs-on: ${{ matrix.config.os }} - name: ${{ matrix.config.os }} (tf-${{ matrix.config.tf }}, R-${{ matrix.config.r }}) + name: ${{ matrix.config.os }} (R-${{ matrix.config.r }}) strategy: fail-fast: false @@ -30,8 +30,6 @@ jobs: - {os: 'ubuntu-latest', r: 'release'} - {os: 'ubuntu-latest', r: 'oldrel-1'} - {os: 'ubuntu-latest', r: 'oldrel-2'} - - {os: 'ubuntu-latest', r: 'oldrel-3'} - - {os: 'ubuntu-latest', r: 'oldrel-4'} env: R_REMOTES_NO_ERRORS_FROM_WARNINGS: true @@ -39,8 +37,7 @@ jobs: TORCH_TEST: 1 TORCH_INSTALL: 1 TORCH_COMMIT_SHA: "none" - CC: gcc-14 - CXX: g++-14 + PYTORCH_ENABLE_MPS_FALLBACK: 1 steps: - uses: actions/checkout@v3 @@ -64,7 +61,7 @@ jobs: shell: Rscript {0} - name: Install Tensorflow + Keras deps - run: keras::install_keras(tensorflow = '${{ matrix.config.tf }}-cpu') + run: keras::install_keras(tensorflow = 'default-cpu') shell: Rscript {0} - name: Check if torch is installed @@ -80,4 +77,5 @@ jobs: - uses: r-lib/actions/check-r-package@v2 with: error-on: '"error"' + args: 'c("--no-multiarch", "--no-manual", "--as-cran")' upload-snapshots: true diff --git a/.github/workflows/pr-commands.yaml b/.github/workflows/pr-commands.yaml deleted file mode 100644 index 97271eb..0000000 --- a/.github/workflows/pr-commands.yaml +++ /dev/null @@ -1,79 +0,0 @@ -# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples -# Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help -on: - issue_comment: - types: [created] - -name: Commands - -jobs: - document: - if: ${{ github.event.issue.pull_request && (github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/document') }} - name: document - runs-on: ubuntu-latest - env: - GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} - steps: - - uses: actions/checkout@v2 - - - uses: r-lib/actions/pr-fetch@v2 - with: - repo-token: ${{ secrets.GITHUB_TOKEN }} - - - uses: r-lib/actions/setup-r@v2 - with: - use-public-rspm: true - - - uses: r-lib/actions/setup-r-dependencies@v2 - with: - extra-packages: any::roxygen2 - needs: pr-document - - - name: Document - run: roxygen2::roxygenise() - shell: Rscript {0} - - - name: commit - run: | - git config --local user.name "$GITHUB_ACTOR" - git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com" - git add man/\* NAMESPACE - git commit -m 'Document' - - - uses: r-lib/actions/pr-push@v2 - with: - repo-token: ${{ secrets.GITHUB_TOKEN }} - - style: - if: ${{ github.event.issue.pull_request && (github.event.comment.author_association == 'MEMBER' || github.event.comment.author_association == 'OWNER') && startsWith(github.event.comment.body, '/style') }} - name: style - runs-on: ubuntu-latest - env: - GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} - steps: - - uses: actions/checkout@v2 - - - uses: r-lib/actions/pr-fetch@v2 - with: - repo-token: ${{ secrets.GITHUB_TOKEN }} - - - uses: r-lib/actions/setup-r@v2 - - - name: Install dependencies - run: install.packages("styler") - shell: Rscript {0} - - - name: Style - run: styler::style_pkg() - shell: Rscript {0} - - - name: commit - run: | - git config --local user.name "$GITHUB_ACTOR" - git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com" - git add \*.R - git commit -m 'Style' - - - uses: r-lib/actions/pr-push@v2 - with: - repo-token: ${{ secrets.GITHUB_TOKEN }} From 814da93c998c8d6d18ebdfccbdd29826d25aada0 Mon Sep 17 00:00:00 2001 From: nkoenen Date: Wed, 20 Dec 2023 15:32:25 +0100 Subject: [PATCH 17/20] Add PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 --- .github/workflows/R-CMD-check.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml index a4873f7..8b27161 100644 --- a/.github/workflows/R-CMD-check.yaml +++ b/.github/workflows/R-CMD-check.yaml @@ -38,6 +38,7 @@ jobs: TORCH_INSTALL: 1 TORCH_COMMIT_SHA: "none" PYTORCH_ENABLE_MPS_FALLBACK: 1 + PYTORCH_MPS_HIGH_WATERMARK_RATIO: 0.0 steps: - uses: actions/checkout@v3 From 72f45422f82fe1a98fd5e63d4c87f938d9d8f665 Mon Sep 17 00:00:00 2001 From: nkoenen Date: Thu, 21 Dec 2023 12:48:31 +0100 Subject: [PATCH 18/20] Move the check whether Tensorflow is available from the roxygen2 tag to the example body --- man-roxygen/examples-ConnectionWeights.R | 4 ++-- man-roxygen/examples-Converter.R | 4 ++-- man-roxygen/examples-DeepLift.R | 4 ++-- man-roxygen/examples-DeepSHAP.R | 4 ++-- man-roxygen/examples-Gradient.R | 4 ++-- man-roxygen/examples-IntegratedGradient.R | 4 ++-- man-roxygen/examples-LRP.R | 4 ++-- man-roxygen/examples-SmoothGrad.R | 4 ++-- man/ConnectionWeights.Rd | 4 ++-- man/Converter.Rd | 4 ++-- man/DeepLift.Rd | 4 ++-- man/DeepSHAP.Rd | 4 ++-- man/ExpectedGradient.Rd | 4 ++-- man/Gradient.Rd | 4 ++-- man/IntegratedGradient.Rd | 4 ++-- man/LRP.Rd | 4 ++-- man/SmoothGrad.Rd | 4 ++-- man/innsight_ggplot2-indexing.Rd | 8 +++++--- man/innsight_plotly-indexing.Rd | 
5 +++-- 19 files changed, 42 insertions(+), 39 deletions(-) diff --git a/man-roxygen/examples-ConnectionWeights.R b/man-roxygen/examples-ConnectionWeights.R index 9229589..ea90305 100644 --- a/man-roxygen/examples-ConnectionWeights.R +++ b/man-roxygen/examples-ConnectionWeights.R @@ -58,9 +58,9 @@ #' # Plot the result #' plot(cw) #' } -#' @examplesIf keras::is_keras_available() & torch::torch_is_installed() +#' @examplesIf torch::torch_is_installed() #' # ------------------------- Example 3: Keras ------------------------------- -#' if (require("keras")) { +#' if (require("keras") & keras::is_keras_available()) { #' library(keras) #' #' # Make sure keras is installed properly diff --git a/man-roxygen/examples-Converter.R b/man-roxygen/examples-Converter.R index db7e37c..1994b7b 100644 --- a/man-roxygen/examples-Converter.R +++ b/man-roxygen/examples-Converter.R @@ -43,9 +43,9 @@ #' converter$model$modules_list #' } #' -#' @examplesIf keras::is_keras_available() & torch::torch_is_installed() +#' @examplesIf torch::torch_is_installed() #' #----------------------- Example 3: Keras ---------------------------------- -#' if (require("keras")) { +#' if (require("keras") & keras::is_keras_available()) { #' library(keras) #' #' # Make sure keras is installed properly diff --git a/man-roxygen/examples-DeepLift.R b/man-roxygen/examples-DeepLift.R index 9c310d5..acae153 100644 --- a/man-roxygen/examples-DeepLift.R +++ b/man-roxygen/examples-DeepLift.R @@ -61,9 +61,9 @@ #' boxplot(deeplift_rescale) #' } #' -#' @examplesIf keras::is_keras_available() & torch::torch_is_installed() +#' @examplesIf torch::torch_is_installed() #' # ------------------------- Example 3: Keras ------------------------------- -#' if (require("keras")) { +#' if (require("keras") & keras::is_keras_available()) { #' library(keras) #' #' # Make sure keras is installed properly diff --git a/man-roxygen/examples-DeepSHAP.R b/man-roxygen/examples-DeepSHAP.R index bc8e1f3..f6764d0 100644 --- a/man-roxygen/examples-DeepSHAP.R +++ b/man-roxygen/examples-DeepSHAP.R @@ -64,9 +64,9 @@ #' boxplot(deepshap) #' } #' -#' @examplesIf keras::is_keras_available() & torch::torch_is_installed() +#' @examplesIf torch::torch_is_installed() #' # ------------------------- Example 3: Keras ------------------------------- -#' if (require("keras")) { +#' if (require("keras") & keras::is_keras_available()) { #' library(keras) #' #' # Make sure keras is installed properly diff --git a/man-roxygen/examples-Gradient.R b/man-roxygen/examples-Gradient.R index b504a87..ab71b6a 100644 --- a/man-roxygen/examples-Gradient.R +++ b/man-roxygen/examples-Gradient.R @@ -65,9 +65,9 @@ #' plot(gradient, data_idx = c(1, 60), output_idx = 1:3) #' } #' -#' @examplesIf keras::is_keras_available() & torch::torch_is_installed() +#' @examplesIf torch::torch_is_installed() #' # ------------------------- Example 3: Keras ------------------------------- -#' if (require("keras")) { +#' if (require("keras") & keras::is_keras_available()) { #' library(keras) #' #' # Make sure keras is installed properly diff --git a/man-roxygen/examples-IntegratedGradient.R b/man-roxygen/examples-IntegratedGradient.R index 5b5c6b2..fb77540 100644 --- a/man-roxygen/examples-IntegratedGradient.R +++ b/man-roxygen/examples-IntegratedGradient.R @@ -60,9 +60,9 @@ #' boxplot(int_grad) #' } #' -#' @examplesIf keras::is_keras_available() & torch::torch_is_installed() +#' @examplesIf torch::torch_is_installed() #' # ------------------------- Example 3: Keras ------------------------------- -#' if 
diff --git a/man-roxygen/examples-LRP.R b/man-roxygen/examples-LRP.R
index 0fc0ae6..9017723 100644
--- a/man-roxygen/examples-LRP.R
+++ b/man-roxygen/examples-LRP.R
@@ -69,9 +69,9 @@
 #'   plot(lrp, output_idx = 1:3)
 #' }
 #'
-#' @examplesIf keras::is_keras_available() & torch::torch_is_installed()
+#' @examplesIf torch::torch_is_installed()
 #' # ------------------------- Example 3: Keras -------------------------------
-#' if (require("keras")) {
+#' if (require("keras") & keras::is_keras_available()) {
 #'   library(keras)
 #'
 #'   # Make sure keras is installed properly

diff --git a/man-roxygen/examples-SmoothGrad.R b/man-roxygen/examples-SmoothGrad.R
index efd94bf..867e8bd 100644
--- a/man-roxygen/examples-SmoothGrad.R
+++ b/man-roxygen/examples-SmoothGrad.R
@@ -59,9 +59,9 @@
 #'   plot(smoothgrad, data_idx = c(1, 60), output_idx = 1:3)
 #' }
 #'
-#' @examplesIf keras::is_keras_available() & torch::torch_is_installed()
+#' @examplesIf torch::torch_is_installed()
 #' # ------------------------- Example 3: Keras -------------------------------
-#' if (require("keras")) {
+#' if (require("keras") & keras::is_keras_available()) {
 #'   library(keras)
 #'
 #'   # Make sure keras is installed properly

diff --git a/man/ConnectionWeights.Rd b/man/ConnectionWeights.Rd
index beedebd..b6c1b44 100644
--- a/man/ConnectionWeights.Rd
+++ b/man/ConnectionWeights.Rd
@@ -86,9 +86,9 @@ if (require("neuralnet")) {
   plot(cw)
 }
 \dontshow{\}) # examplesIf}
-\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 # ------------------------- Example 3: Keras -------------------------------
-if (require("keras")) {
+if (require("keras") & keras::is_keras_available()) {
   library(keras)

   # Make sure keras is installed properly

diff --git a/man/Converter.Rd b/man/Converter.Rd
index 13865e0..5f977a3 100644
--- a/man/Converter.Rd
+++ b/man/Converter.Rd
@@ -115,9 +115,9 @@ if (require("neuralnet")) {
   converter$model$modules_list
 }
 \dontshow{\}) # examplesIf}
-\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 #----------------------- Example 3: Keras ----------------------------------
-if (require("keras")) {
+if (require("keras") & keras::is_keras_available()) {
   library(keras)

   # Make sure keras is installed properly

diff --git a/man/DeepLift.Rd b/man/DeepLift.Rd
index 5fd5f5e..dc17012 100644
--- a/man/DeepLift.Rd
+++ b/man/DeepLift.Rd
@@ -87,9 +87,9 @@ if (require("neuralnet")) {
   boxplot(deeplift_rescale)
 }
 \dontshow{\}) # examplesIf}
-\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 # ------------------------- Example 3: Keras -------------------------------
-if (require("keras")) {
+if (require("keras") & keras::is_keras_available()) {
   library(keras)

   # Make sure keras is installed properly
diff --git a/man/DeepSHAP.Rd b/man/DeepSHAP.Rd
index a49ac95..3be1342 100644
--- a/man/DeepSHAP.Rd
+++ b/man/DeepSHAP.Rd
@@ -84,9 +84,9 @@ if (require("neuralnet")) {
   boxplot(deepshap)
 }
 \dontshow{\}) # examplesIf}
-\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 # ------------------------- Example 3: Keras -------------------------------
-if (require("keras")) {
+if (require("keras") & keras::is_keras_available()) {
   library(keras)

   # Make sure keras is installed properly

diff --git a/man/ExpectedGradient.Rd b/man/ExpectedGradient.Rd
index 4e30bab..b8c044b 100644
--- a/man/ExpectedGradient.Rd
+++ b/man/ExpectedGradient.Rd
@@ -84,9 +84,9 @@ if (require("neuralnet")) {
   boxplot(int_grad)
 }
 \dontshow{\}) # examplesIf}
-\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 # ------------------------- Example 3: Keras -------------------------------
-if (require("keras")) {
+if (require("keras") & keras::is_keras_available()) {
   library(keras)

   # Make sure keras is installed properly

diff --git a/man/Gradient.Rd b/man/Gradient.Rd
index 4372606..2609dda 100644
--- a/man/Gradient.Rd
+++ b/man/Gradient.Rd
@@ -86,9 +86,9 @@ if (require("neuralnet")) {
   plot(gradient, data_idx = c(1, 60), output_idx = 1:3)
 }
 \dontshow{\}) # examplesIf}
-\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 # ------------------------- Example 3: Keras -------------------------------
-if (require("keras")) {
+if (require("keras") & keras::is_keras_available()) {
   library(keras)

   # Make sure keras is installed properly

diff --git a/man/IntegratedGradient.Rd b/man/IntegratedGradient.Rd
index f6d1f19..bd0712e 100644
--- a/man/IntegratedGradient.Rd
+++ b/man/IntegratedGradient.Rd
@@ -89,9 +89,9 @@ if (require("neuralnet")) {
   boxplot(int_grad)
 }
 \dontshow{\}) # examplesIf}
-\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 # ------------------------- Example 3: Keras -------------------------------
-if (require("keras")) {
+if (require("keras") & keras::is_keras_available()) {
   library(keras)

   # Make sure keras is installed properly

diff --git a/man/LRP.Rd b/man/LRP.Rd
index 7000d2b..78ff91d 100644
--- a/man/LRP.Rd
+++ b/man/LRP.Rd
@@ -92,9 +92,9 @@ if (require("neuralnet")) {
   plot(lrp, output_idx = 1:3)
 }
 \dontshow{\}) # examplesIf}
-\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 # ------------------------- Example 3: Keras -------------------------------
-if (require("keras")) {
+if (require("keras") & keras::is_keras_available()) {
   library(keras)

   # Make sure keras is installed properly
diff --git a/man/SmoothGrad.Rd b/man/SmoothGrad.Rd
index fd7ad97..782fb64 100644
--- a/man/SmoothGrad.Rd
+++ b/man/SmoothGrad.Rd
@@ -78,9 +78,9 @@ if (require("neuralnet")) {
   plot(smoothgrad, data_idx = c(1, 60), output_idx = 1:3)
 }
 \dontshow{\}) # examplesIf}
-\dontshow{if (keras::is_keras_available() & torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
+\dontshow{if (torch::torch_is_installed()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
 # ------------------------- Example 3: Keras -------------------------------
-if (require("keras")) {
+if (require("keras") & keras::is_keras_available()) {
   library(keras)

   # Make sure keras is installed properly

diff --git a/man/innsight_ggplot2-indexing.Rd b/man/innsight_ggplot2-indexing.Rd
index ac00c3a..a850016 100644
--- a/man/innsight_ggplot2-indexing.Rd
+++ b/man/innsight_ggplot2-indexing.Rd
@@ -1,21 +1,23 @@
 % Generated by roxygen2: do not edit by hand
 % Please edit documentation in R/innsight_ggplot2.R
-\name{[,innsight_ggplot2-method}
+\name{[,innsight_ggplot2,ANY,ANY,ANY-method}
+\alias{[,innsight_ggplot2,ANY,ANY,ANY-method}
 \alias{[,innsight_ggplot2-method}
 \alias{[.innsight_ggplot2}
 \alias{[[,innsight_ggplot2-method}
 \alias{[[.innsight_ggplot2}
+\alias{[<-,innsight_ggplot2,ANY,ANY,ANY-method}
 \alias{[<-,innsight_ggplot2-method}
 \alias{[<-.innsight_ggplot2}
 \alias{[[<-,innsight_ggplot2-method}
 \alias{[[<-.innsight_ggplot2}
 \title{Indexing plots of \code{innsight_ggplot2}}
 \usage{
-\S4method{[}{innsight_ggplot2}(x, i, j, ..., restyle = TRUE, drop = TRUE)
+\S4method{[}{innsight_ggplot2,ANY,ANY,ANY}(x, i, j, ..., restyle = TRUE, drop = TRUE)

 \S4method{[[}{innsight_ggplot2}(x, i, j, ..., restyle = TRUE)

-\S4method{[}{innsight_ggplot2}(x, i, j, ...) <- value
+\S4method{[}{innsight_ggplot2,ANY,ANY,ANY}(x, i, j, ...) <- value

 \S4method{[[}{innsight_ggplot2}(x, i, j, ...) <- value
 }

diff --git a/man/innsight_plotly-indexing.Rd b/man/innsight_plotly-indexing.Rd
index 1d84ae6..2eed740 100644
--- a/man/innsight_plotly-indexing.Rd
+++ b/man/innsight_plotly-indexing.Rd
@@ -1,13 +1,14 @@
 % Generated by roxygen2: do not edit by hand
 % Please edit documentation in R/innsight_plotly.R
-\name{[,innsight_plotly-method}
+\name{[,innsight_plotly,ANY,ANY,ANY-method}
+\alias{[,innsight_plotly,ANY,ANY,ANY-method}
 \alias{[,innsight_plotly-method}
 \alias{[.innsight_plotly}
 \alias{[[,innsight_plotly-method}
 \alias{[[.innsight_plotly}
 \title{Indexing plots of \code{innsight_plotly}}
 \usage{
-\S4method{[}{innsight_plotly}(x, i, j, ..., drop = TRUE)
+\S4method{[}{innsight_plotly,ANY,ANY,ANY}(x, i, j, ..., drop = TRUE)

 \S4method{[[}{innsight_plotly}(x, i, j, ..., drop)
 }
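The regenerated indexing pages record the full S4 dispatch signature (`innsight_ggplot2,ANY,ANY,ANY`), i.e. `ANY` for the `i`, `j`, and `drop` slots, which appears to be how newer documentation toolchains name methods of multi-argument generics such as `[`. A self-contained sketch of a method carrying such a signature (the `DemoPlot` class is made up for illustration and merely stands in for `innsight_ggplot2`):

```r
library(methods)

# Hypothetical demo class standing in for innsight_ggplot2
setClass("DemoPlot", representation(grobs = "list"))

# A "[" method dispatching on (DemoPlot, ANY, ANY, ANY); the documented
# alias for it becomes "[,DemoPlot,ANY,ANY,ANY-method"
setMethod(
  "[", signature(x = "DemoPlot", i = "ANY", j = "ANY", drop = "ANY"),
  function(x, i, j, ..., drop = TRUE) {
    x@grobs <- x@grobs[i]  # keep only the selected grobs
    x
  }
)

p <- new("DemoPlot", grobs = list(a = 1, b = 2))
p[1]  # dispatches to the method above
```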
From 255fda5b7deea963cdeb1729d52236f5b505e2b1 Mon Sep 17 00:00:00 2001
From: nkoenen
Date: Thu, 21 Dec 2023 12:49:27 +0100
Subject: [PATCH 19/20] Fix typo in description

---
 DESCRIPTION | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 46c26fa..db63c0e 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -10,7 +10,7 @@ Authors@R: c(
 Description: Interpretation methods for analyzing the behavior and individual
     predictions of modern neural networks in a three-step procedure: Converting
     the model, running the interpretation method, and visualizing the results.
-    Implemented methods are e.g., 'Connection Weights' described by Olden et al. (2004)
+    Implemented methods are, e.g., 'Connection Weights' described by Olden et al. (2004)
     , layer-wise relevance propagation ('LRP') described by Bach et al.
     (2015) , deep learning important features

From 2abfdd59e2c3b75d4fcd361651b61e51f72af672 Mon Sep 17 00:00:00 2001
From: nkoenen
Date: Thu, 21 Dec 2023 12:50:55 +0100
Subject: [PATCH 20/20] Update cran comments

---
 cran-comments.md | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/cran-comments.md b/cran-comments.md
index 45817c5..980e9ff 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -1,37 +1,54 @@
 ## --- `innsight` 0.3.0 ------------------------------------------------------

 ### Test environments with LibTorch
-* GitHub Actions (ubuntu-22.04): 3.5, 3.6, 4.0, 4.1, release, devel
+* GitHub Actions (ubuntu-22.04): 4.1, 4.2, release, devel
 * GitHub Actions (windows): release
 * Github Actions (macOS): release

 #### R CMD check results
-TODO!
+There were no errors or warnings, only one note, which is not related to our
+package but is caused by `keras` (in `keras`, these files are cleaned up after
+the tests' execution; see [here](https://github.com/rstudio/keras/blob/eb5d21b9e37e918c2662eb6ec5bcc46a00054db6/tests/testthat/setup.R)):
+
+```
+* checking for detritus in the temp directory ... NOTE
+Found the following files/directories:
+  ‘__autograph_generated_filejdw6cqsg.py’ ‘__pycache__’
+```

 **Note:** We can't run examples, tests or vignettes on CRAN, as this
-requires a successful installation of LibTorch. Every implemented method
+requires a successful installation of LibTorch/Lantern. Every implemented method
 relies on an instance of `Converter` that converts a passed model to a
 torch model, so any possibility of examples or (non-trivial) tests requires
-LibTorch. In this regard, we have followed the recommendations of the authors
-of torch (see torch
+LibTorch/Lantern. In this regard, we have followed the recommendations
+of the authors of torch (see torch
 [issue #651](https://github.com/mlverse/torch/issues/651#issuecomment-896783144))
 and disabled their execution on CRAN.

 ### Test environments without LibTorch
 - winbuilder Windows Server 2022, R-devel, 64 bit
 - winbuilder Windows Server 2022, R-release, 64 bit
-- R-hub Ubuntu Linux 20.04.1 LTS, R-release, GCC
+- winbuilder Windows Server 2022, R-oldrel, 64 bit
+- R-hub Ubuntu Linux 20.04.1 LTS, R-release
 - R-hub Fedora Linux, R-devel, clang, gfortran
 - macOS builder, R-release

 #### R CMD check results
-There were no errors or warnings only some notes which are not related to
-the package:
+There were no errors or warnings, only some notes under R-Hub unrelated to the
+package (see issues [#548](https://github.com/r-hub/rhub/issues/548),
+[#560](https://github.com/r-hub/rhub/issues/560),
+[#503](https://github.com/r-hub/rhub/issues/503)):

 ```
 * checking HTML version of manual ... NOTE
-Skipping checking HTML validation: no command 'tidy' found
-Skipping checking math rendering: package 'V8' unavailable
+  Skipping checking HTML validation: no command 'tidy' found
+  Skipping checking math rendering: package 'V8' unavailable
+* checking for non-standard things in the check directory ... NOTE
+  Found the following files/directories:
+    ''NULL''
+* checking for detritus in the temp directory ... NOTE
+  Found the following files/directories:
+    'lastMiKTeXException'
 ```
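For completeness, the skipping mechanism referenced in the note above usually lives in a package's testthat files. The following is a generic sketch of that pattern, assuming `testthat` as the test framework and using only the `torch` helper already cited in the documents above; it is an illustration, not a verbatim excerpt from `innsight`:

```r
# Sketch: gate a (non-trivial) test on an available torch backend and skip
# it on CRAN, where LibTorch/Lantern cannot be installed.
library(testthat)

test_that("a small torch computation runs", {
  skip_on_cran()                             # no LibTorch on CRAN machines
  skip_if_not(torch::torch_is_installed(),   # backend must be present locally
              "torch backend is not installed")
  expect_silent(torch::torch_randn(2, 3))    # trivial stand-in computation
})
```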