This repository contains the code and material to reproduce the results of the manuscript "Conditional Feature Importance with Generative Modeling using Adversarial Random Forests".
The core method introduced in our paper, cARFi, is implemented in the R script cARFi.R. This script contains the main functions, along with a description of all parameters and how to use them.
The repository is structured as follows:
- eval_bike_sharing/ contains the code to evaluate the cARFi method on the Bike-Sharing dataset under various conditioning sets of variables (Fig. 4)
- eval_conditioning_set/ contains the code to compare cARFi under various conditioning sets against marginal methods (PFI, SAGE) and conditional methods (CS, CPI with Gaussian and sequential knockoffs, LOCO) on our modified DAG of König et al. (2020) (Fig. 3)
- eval_mixed_data/ contains the mixed data simulation based on the DAG of Blesch et al. (2023) (Fig. 2 and Appendix S7 + S8)
- eval_proof_of_concept/ contains the proof of concept simulation (Fig. 1 and Appendix S1-S6)
- figures/ contains the figures generated by the simulations and analyses above
You can reproduce the results by running the file run_simulation.R in the corresponding simulation directory. Each run_simulation.R file automatically saves its results in the figures/ directory.
For example:
Rscript eval_bike_sharing/run_simulation.R
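To run all four simulations one after another, the same call can be scripted from R. The following is a minimal sketch, not part of the repository: it assumes the working directory is the repository root, and the helper run_script() is a hypothetical name introduced here for illustration. Scripts that are not found are skipped rather than treated as errors.

```r
# Simulation directories as listed in this README.
sim_dirs <- c("eval_bike_sharing", "eval_conditioning_set",
              "eval_mixed_data", "eval_proof_of_concept")
scripts <- file.path(sim_dirs, "run_simulation.R")

# Hypothetical helper: run one script with Rscript and return its exit status.
# Returns NA if the script is not found (e.g. wrong working directory).
run_script <- function(script) {
  if (!file.exists(script)) {
    message("Skipping (not found): ", script)
    return(NA_integer_)
  }
  # system2() returns the exit status of the command (0 means success).
  system2("Rscript", shQuote(script))
}

# Run the simulations sequentially and collect the exit statuses.
status <- vapply(scripts, run_script, integer(1))
print(status)
```

The simulations may take a long time, so running a single directory first, as in the example above, is a reasonable way to check the setup.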
This project was developed and tested using R version 4.4.1. It requires the following R packages:
- arf (version ≥ 0.2.2), installed by running pak::pkg_install("bips-hb/arf")
- seqknockoff, installed by running pak::pkg_install("kormama1/seqknockoff")
- cs, installed by running pak::pkg_install("christophM/paper_conditional_subgroups")
- cpi (version ≥ 0.1.5), installed by running pak::pkg_install("bips-hb/cpi")
- ggplot2
- batchtools
- data.table
- here
- envalysis
- ggpubr
- ggsci
- fastDummies
- Metrics
- mlr3verse
- ranger
- microbenchmark
- pak
- dplyr
Note: The script setup.R ensures that all necessary R packages are installed and is called before any analysis is run. It also sets up the environment automatically by creating the required folders, setting the ggplot2 theme, and managing CPU usage during simulations.
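If you would like to check the requirements by hand, independently of setup.R, a sketch like the following lists what is still missing. It is not taken from setup.R: the helper names is_satisfied() and required_refs() are hypothetical, and the package names, GitHub references, and minimum versions are copied from the list above.

```r
# Packages listed above without an explicit install command.
plain_pkgs <- c("ggplot2", "batchtools", "data.table", "here", "envalysis",
                "ggpubr", "ggsci", "fastDummies", "Metrics", "mlr3verse",
                "ranger", "microbenchmark", "pak", "dplyr")

# Packages installed from GitHub: package name -> pak reference.
github_pkgs <- c(arf = "bips-hb/arf",
                 seqknockoff = "kormama1/seqknockoff",
                 cs = "christophM/paper_conditional_subgroups",
                 cpi = "bips-hb/cpi")

# Minimum versions stated in this README.
min_versions <- c(arf = "0.2.2", cpi = "0.1.5")

# Hypothetical helper: TRUE if `pkg` is installed and, when a minimum
# version is given, at least that version.
is_satisfied <- function(pkg, min_version = NULL) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  if (is.null(min_version)) return(TRUE)
  utils::packageVersion(pkg) >= min_version
}

# Hypothetical helper: pak references of all packages that are missing
# or older than the stated minimum version.
required_refs <- function() {
  refs <- c(stats::setNames(plain_pkgs, plain_pkgs), github_pkgs)
  ok <- vapply(names(refs), function(p) {
    mv <- if (p %in% names(min_versions)) min_versions[[p]] else NULL
    is_satisfied(p, mv)
  }, logical(1))
  unname(refs[!ok])
}

to_install <- required_refs()
print(to_install)
# To install them (requires pak):
# if (length(to_install) > 0) pak::pkg_install(to_install)
```

The install call is left commented out so that the check itself has no side effects.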