CausalFlow is a Python library for causal analysis from time-series data. It comprises:
- F-PCMCI - Filtered-PCMCI
- CAnDOIT - CAusal Discovery with Observational and Interventional data from Time-series
- RandomGraph
- Other causal discovery methods all within the same framework
- Tutorials [Coming soon..]
F-PCMCI is an extension of the state-of-the-art causal discovery method PCMCI, augmented with a feature-selection method based on Transfer Entropy. Starting from a predefined set of variables, the algorithm identifies the relevant subset of features and a hypothetical causal model between them. Causal discovery is then executed using the selected features and the hypothetical causal model. This refined set of variables and the list of potential causal links between them make the causal discovery faster and more accurate.
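To make the filtering idea concrete, here is a minimal, standard-library-only sketch of a Transfer Entropy filter on discrete data. It is not the estimator used by F-PCMCI: the function name `transfer_entropy`, the history length of 1, the plug-in (counting) estimator, and the 0.1-bit threshold are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of TE(source -> target), history length 1.

    Both inputs are equal-length sequences of discrete symbols.
    """
    triples = Counter()   # (target[t+1], target[t], source[t])
    pairs_ts = Counter()  # (target[t], source[t])
    pairs_tt = Counter()  # (target[t+1], target[t])
    singles = Counter()   # target[t]
    n = len(target) - 1
    for t in range(n):
        triples[(target[t + 1], target[t], source[t])] += 1
        pairs_ts[(target[t], source[t])] += 1
        pairs_tt[(target[t + 1], target[t])] += 1
        singles[target[t]] += 1

    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_next_given_both = count / pairs_ts[(y_now, x_now)]
        p_next_given_own = pairs_tt[(y_next, y_now)] / singles[y_now]
        te += p_joint * math.log2(p_next_given_both / p_next_given_own)
    return te


rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]                              # y copies x with a one-step delay
z = [rng.randint(0, 1) for _ in range(2000)]  # unrelated to y

te_x_to_y = transfer_entropy(x, y)
te_z_to_y = transfer_entropy(z, y)

# Keep only candidate drivers whose TE towards y exceeds an (arbitrary) threshold
scores = {"x": te_x_to_y, "z": te_z_to_y}
selected = [name for name, te in scores.items() if te > 0.1]
print(scores, selected)
```

The variable that actually drives `y` carries close to one bit of information about its next value, while the unrelated one carries almost none, so a filter of this kind can drop it before the more expensive causal discovery step runs.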
The following example demonstrates the main functionality of F-PCMCI and compares the causal models obtained by the PCMCI and F-PCMCI causal discovery algorithms on the same data. The dataset consists of a 7-variable system defined as follows:
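Written out as equations, the system reads as follows. This is read off from the generating code that follows, where $\eta_i(t)$ denotes the uniform noise in $[0, 1)$ drawn by `np.random.random`; the symbol names are ours.

```latex
\begin{aligned}
X_0(t) &= 2\,X_1(t-1) + 3\,X_3(t-1) + \eta_0(t) \\
X_1(t) &= \eta_1(t) \\
X_2(t) &= 1.1\,X_1(t-1)^2 + \eta_2(t) \\
X_3(t) &= X_3(t-1)\,X_2(t-1) + \eta_3(t) \\
X_4(t) &= X_4(t-1) + X_5(t-1)\,X_0(t-1) + \eta_4(t) \\
X_5(t) &= \eta_5(t) \\
X_6(t) &= \eta_6(t)
\end{aligned}
```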
import numpy as np

min_lag = 1
max_lag = 1
np.random.seed(1)
nsample = 1500
nfeature = 7

# Every column starts as uniform noise in [0, 1)
d = np.random.random(size = (nsample, nfeature))

# Add the lagged dependencies on top of the noise
for t in range(max_lag, nsample):
    d[t, 0] += 2 * d[t-1, 1] + 3 * d[t-1, 3]
    d[t, 2] += 1.1 * d[t-1, 1]**2
    d[t, 3] += d[t-1, 3] * d[t-1, 2]
    d[t, 4] += d[t-1, 4] + d[t-1, 5] * d[t-1, 0]
Causal Model by PCMCI | Causal Model by F-PCMCI |
---|---|
Execution time ~ 8min 40sec | Execution time ~ 3min 00sec |
F-PCMCI removes the variable
CAnDOIT extends LPCMCI, allowing the incorporation of interventional data into the causal discovery process alongside observational data. Like its predecessor, CAnDOIT can handle both lagged and contemporaneous dependencies, as well as latent variables.
In the following example, taken from one of the tigramite tutorials, we demonstrate CAnDOIT's ability to incorporate and leverage interventional data to improve the accuracy of causal analysis. The example involves a system of equations with four variables:
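The system can be reconstructed from the generating code further down. Here $\eta_i(t)$ is uniform noise in $[0, 1)$ (the symbol is our notation), and $X_1$ is the variable that is later dropped from the observed data, so it acts as a latent confounder:

```latex
\begin{aligned}
X_0(t) &= 0.9\,X_0(t-1) + 0.6\,X_1(t) + \eta_0(t) \\
X_1(t) &= \eta_1(t) \\
X_2(t) &= 0.9\,X_2(t-1) + 0.4\,X_1(t-1) + \eta_2(t) \\
X_3(t) &= 0.9\,X_3(t-1) - 0.5\,X_2(t-2) + \eta_3(t)
\end{aligned}
```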
Note that
import numpy as np
from tigramite.independence_tests.parcorr import ParCorr
# CausalFlow import paths are assumed from the package layout; adjust if they differ
from causalflow.preprocessing.data import Data
from causalflow.causal_discovery.baseline.LPCMCI import LPCMCI

tau_max = 2
pc_alpha = 0.05
np.random.seed(19)
nsample_obs = 500
nfeature = 4

d = np.random.random(size = (nsample_obs, nfeature))
for t in range(tau_max, nsample_obs):
    d[t, 0] += 0.9 * d[t-1, 0] + 0.6 * d[t, 1]
    d[t, 2] += 0.9 * d[t-1, 2] + 0.4 * d[t-1, 1]
    d[t, 3] += 0.9 * d[t-1, 3] - 0.5 * d[t-2, 2]

# Remove the unobserved component time series
data_obs = d[:, [0, 2, 3]]

var_names = ['X_0', 'X_2', 'X_3']
d_obs = Data(data_obs, vars = var_names)
d_obs.plot_timeseries()

lpcmci = LPCMCI(d_obs,
                min_lag = 0,
                max_lag = tau_max,
                val_condtest = ParCorr(significance='analytic'),
                alpha = pc_alpha)

# Run LPCMCI
lpcmci_cm = lpcmci.run()
lpcmci_cm.ts_dag(node_size = 4, min_width = 1.5, max_width = 1.5,
                 x_disp=0.5, y_disp=0.2, font_size=10)
As you can see from LPCMCI's result, the method correctly identifies the bidirected link (indicating the presence of a latent confounder) between
Now, let's introduce interventional data and examine its benefits. In this case, we perform a hard intervention on the variable
# Continues from the previous snippet (d, d_obs, tau_max, nfeature, var_names, pc_alpha)
# CausalFlow import path is assumed from the package layout; adjust if it differs
from causalflow.causal_discovery.CAnDOIT import CAnDOIT

nsample_int = 150
int_data = dict()

# Intervention on X_2.
d_int = np.random.random(size = (nsample_int, nfeature))
# Start the interventional series from the last observational samples
d_int[0:tau_max, :] = d[len(d)-tau_max:,:]
# Hard intervention: X_2 is clamped to a constant value
d_int[:, 2] = 3 * np.ones(shape = (nsample_int,))
for t in range(tau_max, nsample_int):
    d_int[t, 0] += 0.9 * d_int[t-1, 0] + 0.6 * d_int[t, 1]
    d_int[t, 3] += 0.9 * d_int[t-1, 3] - 0.5 * d_int[t-2, 2]

data_int = d_int[:, [0, 2, 3]]
df_int = Data(data_int, vars = var_names)
int_data['X_2'] = df_int

candoit = CAnDOIT(d_obs,
                  int_data,
                  alpha = pc_alpha,
                  min_lag = 0,
                  max_lag = tau_max,
                  val_condtest = ParCorr(significance='analytic'))

candoit_cm = candoit.run()
candoit_cm.ts_dag(node_size = 4, min_width = 1.5, max_width = 1.5,
                  x_disp=0.5, y_disp=0.2, font_size=10)
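To see why interventional samples carry information that observations alone do not, the following standard-library sketch re-simulates the same equations with and without clamping the third variable. It does not use CausalFlow or reproduce CAnDOIT's algorithm; `simulate` and `column_mean` are hypothetical helpers, and longer series than in the example are used so that the sample means are stable.

```python
import random


def simulate(nsamples, clamp_x2=None, seed=0):
    """Simulate the 4-variable system; optionally clamp column 2 (hard intervention)."""
    rng = random.Random(seed)
    x = [[0.0] * 4 for _ in range(nsamples)]
    if clamp_x2 is not None:
        for row in x:
            row[2] = clamp_x2
    for t in range(2, nsamples):
        x[t][1] = rng.random()  # latent driver, pure noise
        x[t][0] = rng.random() + 0.9 * x[t - 1][0] + 0.6 * x[t][1]
        if clamp_x2 is None:
            x[t][2] = rng.random() + 0.9 * x[t - 1][2] + 0.4 * x[t - 1][1]
        x[t][3] = rng.random() + 0.9 * x[t - 1][3] - 0.5 * x[t - 2][2]
    return x


def column_mean(x, col, burn_in=500):
    """Mean of one column after discarding the initial transient."""
    values = [row[col] for row in x[burn_in:]]
    return sum(values) / len(values)


obs = simulate(5000)
intv = simulate(5000, clamp_x2=3.0, seed=1)

# How much does each variable move when column 2 is forced to 3?
shift_x0 = column_mean(intv, 0) - column_mean(obs, 0)
shift_x3 = column_mean(intv, 3) - column_mean(obs, 3)
print(f"shift in column 0: {shift_x0:+.2f}")
print(f"shift in column 3: {shift_x3:+.2f}")
```

Forcing the third variable leaves the first one essentially unchanged, because the two are related only through the hidden driver, while the fourth variable shifts markedly, because it is a genuine effect of the intervened variable. This asymmetry is the kind of evidence that interventional data adds on top of observational dependencies.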
CAnDOIT, like LPCMCI, correctly detects the bidirected link
In this section, we discuss an application of CAnDOIT in a robotic scenario. We designed an experiment to learn the causal model in a hypothetical robot arm application equipped with a camera. For this application, we utilised Causal World, which models a TriFinger robot, a floor, and a stage.
In our case, we use only one finger of the robot, with the finger's end effector equipped with a camera. The scenario consists of a cube placed at the centre of the floor, surrounded by a white stage.
The colour's brightness (
Note that
This model is used to generate observational data, which is then used by LPCMCI and CAnDOIT to reconstruct the causal model. For the interventional domain instead, we substitute the equation modelling
Also in this experiment, we can see the benefit of using intervention data alongside the observations. LPCMCI is unable to orient the contemporaneous (spurious) link between
RandomGraph is a random-model generator capable of creating random systems of equations with various properties: linear, nonlinear, lagged and/or contemporaneous dependencies, and hidden confounders. This tool offers several adjustable parameters, listed as follows:
- time-series length;
- number of observable variables;
- number of observable parents per variable (link density);
- number of hidden confounders;
- number of confounded variables per hidden confounder;
- noise configuration, e.g. Gaussian noise $\mathcal{N}(\mu, \sigma^2)$;
- minimum $\tau_{min}$ and maximum $\tau_{max}$ time delay to consider in the equations;
- coefficient range of the equations' terms;
- functional forms applied to the equations' terms: $[-, \sin, \cos, \text{abs}, \text{pow}, \text{exp}]$, where $-$ stands for none;
- operators used to link the various equations' terms: $[+, -, *, /]$.
RandomGraph outputs a graph, the associated system of equations, and observational data. Additionally, it provides the option to generate interventional data.
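As a rough idea of what such a generator does, here is a stripped-down, standard-library sketch that draws a random linear lagged system and simulates data from it. It is not the RandomGraph implementation: the function names, the dictionary layout, and the rule that lag-0 links only run from lower to higher index (to keep contemporaneous links acyclic) are assumptions made for this illustration, and hidden confounders and nonlinear terms are left out.

```python
import random


def random_lagged_system(nvars, link_density, coeff_range, min_lag, max_lag, rng):
    """Draw a random linear system as {target: {(source, lag): coefficient}}."""
    system = {}
    for target in range(nvars):
        # Lag-0 links only go from lower to higher index (no contemporaneous cycles)
        candidates = [
            (source, lag)
            for source in range(nvars)
            for lag in range(min_lag, max_lag + 1)
            if lag > 0 or source < target
        ]
        parents = rng.sample(candidates, min(link_density, len(candidates)))
        system[target] = {
            parent: rng.choice([-1, 1]) * rng.uniform(*coeff_range)
            for parent in parents
        }
    return system


def simulate(system, nsamples, rng):
    """Generate observational data with standard Gaussian noise on every variable."""
    nvars = len(system)
    data = [[0.0] * nvars for _ in range(nsamples)]
    for t in range(nsamples):
        for target in range(nvars):  # ascending order, so lag-0 parents are ready
            value = rng.gauss(0, 1)
            for (source, lag), coeff in system[target].items():
                if t - lag >= 0:
                    value += coeff * data[t - lag][source]
            data[t][target] = value
    return data


rng = random.Random(42)
system = random_lagged_system(nvars=5, link_density=2, coeff_range=(0.1, 0.3),
                              min_lag=0, max_lag=3, rng=rng)
data = simulate(system, nsamples=1000, rng=rng)

# Print the generated equations in a readable form
for target, parents in system.items():
    terms = " ".join(f"{c:+.2f}*X_{s}(t-{lag})" for (s, lag), c in sorted(parents.items()))
    print(f"X_{target}(t) = {terms} + noise")
```

Small coefficients and a low link density are used here so that the simulated series stay bounded; nothing in this sketch guarantees stability for arbitrary settings.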
import random
# CausalFlow import path is assumed from the package layout; adjust if it differs
from causalflow.random_system.RandomGraph import NoiseType, RandomGraph

# Noise configurations shared by the examples below
noise_uniform = (NoiseType.Uniform, -0.5, 0.5)
noise_gaussian = (NoiseType.Gaussian, 0, 1)
noise_weibull = (NoiseType.Weibull, 2, 1)

# Linear system: no functional forms, only sums and differences of terms
RG = RandomGraph(nvars = 5,
                 nsamples = 1000,
                 link_density = 3,
                 coeff_range = (0.1, 0.5),
                 max_exp = 2,
                 min_lag = 0,
                 max_lag = 3,
                 noise_config = random.choice([noise_uniform, noise_gaussian, noise_weibull]),
                 functions = [''],
                 operators = ['+', '-'],
                 n_hidden_confounders = 2)
RG.gen_equations()
RG.ts_dag(withHidden = True)

# Nonlinear system: all functional forms and operators enabled
RG = RandomGraph(nvars = 5,
                 nsamples = 1000,
                 link_density = 3,
                 coeff_range = (0.1, 0.5),
                 max_exp = 2,
                 min_lag = 0,
                 max_lag = 3,
                 noise_config = random.choice([noise_uniform, noise_gaussian, noise_weibull]),
                 functions = ['','sin', 'cos', 'exp', 'abs', 'pow'],
                 operators = ['+', '-', '*', '/'],
                 n_hidden_confounders = 2)
RG.gen_equations()
RG.ts_dag(withHidden = True)

# Nonlinear system with observational and interventional data
RS = RandomGraph(nvars = 5,
                 nsamples = 1500,
                 link_density = 3,
                 coeff_range = (0.1, 0.5),
                 max_exp = 2,
                 min_lag = 0,
                 max_lag = 3,
                 noise_config = noise_gaussian,
                 functions = ['','sin', 'cos', 'exp', 'abs', 'pow'],
                 operators = ['+', '-', '*', '/'],
                 n_hidden_confounders = 2)
RS.gen_equations()

# Observational data, with (d_obs_wH) and without (d_obs) the hidden confounders
d_obs_wH, d_obs = RS.gen_obs_ts()
d_obs.plot_timeseries()

# Interventional data: 250 samples with X_4 set to a random value in [5, 10]
d_int = RS.intervene('X_4', 250, random.uniform(5, 10), d_obs.d)
d_int['X_4'].plot_timeseries()
Although the main contribution of this repository is to present the CAnDOIT and F-PCMCI algorithms, other causal discovery methods have been included for benchmarking purposes. Consequently, CausalFlow offers a collection of causal discovery methods, beyond F-PCMCI and CAnDOIT, that output time-series graphs (graphs that specify the lag for each link). These methods are listed as follows:
- DYNOTEARS - from the causalnex package;
- PCMCI - from the tigramite package;
- PCMCI+ - from the tigramite package;
- LPCMCI - from the tigramite package;
- J-PCMCI+ - from the tigramite package;
- TCDF - from the causal_discovery_for_time_series package;
- tsFCI - from the causal_discovery_for_time_series package;
- VarLiNGAM - from the lingam package.
Some algorithms are imported from other languages such as R and Java and are then wrapped in Python. Having the majority of causal discovery methods integrated into a single framework, which handles various types of inputs and outputs causal models, can facilitate the use of these algorithms.
Algorithm | Observations | Feature Selection | Interventions |
---|---|---|---|
DYNOTEARS | ✅ | ❌ | ❌ |
PCMCI | ✅ | ❌ | ❌ |
PCMCI+ | ✅ | ❌ | ❌ |
LPCMCI | ✅ | ❌ | ❌ |
J-PCMCI+ | ✅ | ❌ | ❌ |
TCDF | ✅ | ❌ | ❌ |
tsFCI | ✅ | ❌ | ❌ |
VarLiNGAM | ✅ | ❌ | ❌ |
F-PCMCI | ✅ | ✅ | ❌ |
CAnDOIT | ✅ | ❌ | ✅ |
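Because all of these methods output time-series graphs, their results can be compared link by link. The sketch below shows one way such a benchmark could be scored, independently of CausalFlow: a graph is represented as a set of `(source, target, lag)` tuples, and precision, recall, and F1 are computed against a ground truth. The representation and the `link_scores` helper are assumptions for illustration; the ground truth is the 7-variable system from the F-PCMCI example, and the "estimated" graph is made up.

```python
def link_scores(true_links, found_links):
    """Precision, recall and F1 over lagged links given as (source, target, lag)."""
    true_links, found_links = set(true_links), set(found_links)
    tp = len(true_links & found_links)
    precision = tp / len(found_links) if found_links else 0.0
    recall = tp / len(true_links) if true_links else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Ground-truth links of the 7-variable system (all at lag 1)
ground_truth = {
    ("X_1", "X_0", 1), ("X_3", "X_0", 1),
    ("X_1", "X_2", 1),
    ("X_3", "X_3", 1), ("X_2", "X_3", 1),
    ("X_4", "X_4", 1), ("X_5", "X_4", 1), ("X_0", "X_4", 1),
}

# Hypothetical output of some method: everything correct plus one spurious link
estimated = ground_truth | {("X_6", "X_5", 1)}

precision, recall, f1 = link_scores(ground_truth, estimated)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Scoring on the triple rather than on the pair of variables penalises a method that finds the right dependency at the wrong lag, which matters when the graphs are meant to specify the lag of each link.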
Please consider citing the following papers depending on which method you use:
- F-PCMCI:
L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2023).
Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios,
Proceedings of the Conference on Causal Learning and Reasoning (CLeaR).
@inproceedings{castri2023enhancing, title={Enhancing Causal Discovery from Robot Sensor Data in Dynamic Scenarios}, author={Castri, Luca and Mghames, Sariah and Hanheide, Marc and Bellotto, Nicola}, booktitle={Conference on Causal Learning and Reasoning}, pages={243--258}, year={2023}, organization={PMLR} }
- CAnDOIT:
L. Castri, S. Mghames, M. Hanheide and N. Bellotto (2024).
CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series,
Advanced Intelligent Systems.
@article{https://doi.org/10.1002/aisy.202400181, author = {Castri, Luca and Mghames, Sariah and Hanheide, Marc and Bellotto, Nicola}, title = {CAnDOIT: Causal Discovery with Observational and Interventional Data from Time Series}, journal = {Advanced Intelligent Systems}, volume = {n/a}, number = {n/a}, pages = {2400181}, keywords = {causal robotics, observations and interventions-based causal discoveries, time series}, doi = {https://doi.org/10.1002/aisy.202400181}, url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/aisy.202400181}, eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/aisy.202400181}, }
- pandas>=1.5.2
- numba>=0.58.1
- scipy>=1.3.3
- networkx>=2.8.6
- ruptures>=1.1.7
- scikit_learn>=1.1.3
- torch>=1.11.0
- gpytorch>=1.4
- dcor>=0.5.3
- h5py>=3.7.0
- jpype1>=1.5.0
- mpmath>=1.3.0
- causalnex
- lingam
- pyopencl>=2024.1
- matplotlib>=3.7.0
- numpy
- pgmpy>=0.1.19
- tigramite>=5.1.0.3
- rectangle-packer
- grandalf
Before installing CausalFlow, you need to install Java and the IDTxl package, which is used for the feature-selection process, following the IDTxl installation guide. Once complete, you can install the current release of CausalFlow with:
pip install py-causalflow
For a complete installation (Java, IDTxl, and CausalFlow), follow the procedure below.
Verify that you have not already installed Java:
java -version
If the latter returns Command 'java' not found, ..., install Java with the following commands; otherwise, skip to the IDTxl installation.
# Java
sudo apt-get update
sudo apt install default-jdk
Then, add JAVA_HOME to the environment:
sudo nano /etc/environment
JAVA_HOME="/lib/jvm/java-11-openjdk-amd64/bin/java" # Paste the JAVA_HOME assignment at the bottom of the file
source /etc/environment
# IDTxl
git clone -b v1.4 https://github.com/pwollstadt/IDTxl.git
cd IDTxl
pip install -e .
pip install py-causalflow
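After these steps, a quick check from Python can confirm that the pieces are visible to the interpreter. The snippet below is only a convenience sketch using the standard library; the importable module names `idtxl` and `causalflow` are assumptions about how the two packages are exposed, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import os
import shutil


def check_environment():
    """Collect a few facts about the Java / IDTxl / CausalFlow setup."""
    return {
        # Full path of the java executable, or None if it is not on PATH
        "java_on_path": shutil.which("java"),
        # Value of JAVA_HOME as seen by this process, or None if unset
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        # Whether the (assumed) module names can be located without importing them
        "idtxl_installed": importlib.util.find_spec("idtxl") is not None,
        "causalflow_installed": importlib.util.find_spec("causalflow") is not None,
    }


report = check_environment()
for name, value in report.items():
    print(f"{name}: {value}")
```

If JAVA_HOME shows up as unset here even though it was added to /etc/environment, the current shell session may simply not have reloaded the file yet.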
Version | Changes |
---|---|
4.0.4 | IDTxl v1.4 |
4.0.3 | numba version fix; DAG dag() fix; CAnDOIT fix: min_lag must be equal to 0 |
4.0.2 | PyPI fixes; rectangle-packer and grandalf added to requirements; numba version fix; causal_discovery/baseline/pkgs fix |
4.0.1 | PyPI |
4.0.0 | package published |