Causal Discovery Toolbox Documentation
======================================

Package for causal inference in graphs and in the pairwise setting for Python >= 3.5. Tools for graph structure recovery and dependencies are included. The package is based on NumPy, scikit-learn, PyTorch and R.

.. image:: https://travis-ci.org/FenTechSolutions/CausalDiscoveryToolbox.svg?branch=master
    :target: https://travis-ci.org/FenTechSolutions/CausalDiscoveryToolbox
.. image:: https://travis-ci.org/FenTechSolutions/CausalDiscoveryToolbox.svg?branch=dev
    :target: https://travis-ci.org/FenTechSolutions/CausalDiscoveryToolbox
.. image:: https://codecov.io/gh/FenTechSolutions/CausalDiscoveryToolbox/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/FenTechSolutions/CausalDiscoveryToolbox
.. image:: https://img.shields.io/aur/license/pac.svg?maxAge=259200
.. image:: https://img.shields.io/badge/version-0.6.0-yellow.svg?maxAge=259200

It implements many algorithms for graph structure recovery (including algorithms from the ``bnlearn`` and ``pcalg`` packages), mainly based on observational data.

Install it using pip (see the Installation section below for more details):

.. code-block:: bash

    pip install cdt

Open-source project
===================

The package is open-source and released under the MIT license. The source code is available at: https://github.com/FenTechSolutions/CausalDiscoveryToolbox

When using this package, please cite: Kalainathan, D., & Goudet, O. (2019). Causal Discovery Toolbox: Uncover causal relationships in Python. arXiv:1903.02278.

Docker images
=============

Docker images are available with all the dependencies installed and all functionalities enabled:

.. |36cpu| image:: https://img.shields.io/badge/docker-0.6.0-0db7ed.svg?maxAge=259200
    :target: https://hub.docker.com/r/divkal/cdt-py3.6/
.. |37cpu| image:: https://img.shields.io/badge/docker-0.6.0-0db7ed.svg?maxAge=259200
    :target: https://hub.docker.com/r/divkal/cdt-py3.7/
.. |36gpu| image:: https://img.shields.io/badge/nvidia--docker-0.6.0-76b900.svg?maxAge=259200
    :target: https://hub.docker.com/r/divkal/nv-cdt-py3.6/
.. |36cpudev| image:: https://img.shields.io/badge/docker-latest-0db7ed.svg?maxAge=259200
    :target: https://hub.docker.com/r/divkal/cdt-dev/
.. |37cpudev| image:: https://img.shields.io/badge/docker-unavailable-lightgrey.svg?maxAge=259200
.. |36gpudev| image:: https://img.shields.io/badge/nvidia--docker-latest-76b900.svg?maxAge=259200
    :target: https://hub.docker.com/r/divkal/nv-cdt-dev/

+-------------------+---------+------------+
| Branch            | master  | dev        |
+===================+=========+============+
| Python 3.6 - CPU  | |36cpu| | |36cpudev| |
+-------------------+---------+------------+
| Python 3.7 - CPU  | |37cpu| | |37cpudev| |
+-------------------+---------+------------+
| Python 3.6 - GPU  | |36gpu| | |36gpudev| |
+-------------------+---------+------------+

Installation
============

The package requires Python >= 3.5, as well as some libraries listed in the requirements file (``requirements.txt`` in the repository).
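
Before installing, the minimum interpreter version stated above can be verified from Python itself. The snippet below is a minimal sketch and not part of the toolbox; the helper name ``meets_minimum_python`` and the constant ``MINIMUM_PYTHON`` are hypothetical and only serve the illustration.

.. code-block:: python

    import sys

    # Minimum version stated in this documentation (Python >= 3.5).
    MINIMUM_PYTHON = (3, 5)


    def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
        """Return True when the interpreter version is at least `minimum`.

        `version_info` defaults to the running interpreter; passing a tuple
        such as (3, 4) makes the helper easy to test. (Hypothetical helper.)
        """
        if version_info is None:
            version_info = sys.version_info
        # Compare only (major, minor) so that micro releases are ignored.
        return tuple(version_info[:2]) >= tuple(minimum)


    if __name__ == "__main__":
        if meets_minimum_python():
            print("Python version is recent enough for cdt.")
        else:
            print("Python >= 3.5 is required; found %d.%d" % sys.version_info[:2])

Running the snippet as a script prints which of the two cases applies to the current interpreter.
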
Some additional functionalities need extra libraries before they become available. Here is a quick install guide for the package, from the minimal install up to the full installation.

.. note::
    A (mini/ana)conda environment helps with installing all those packages and is therefore recommended for non-expert users.

PyTorch
-------

As some of the key algorithms in the ``cdt`` package use PyTorch, it must be installed. Check out the PyTorch website to install the version suited to your hardware configuration: https://pytorch.org

Install the CausalDiscoveryToolbox package
------------------------------------------

The package is available on PyPI:

.. code-block:: bash

    pip install cdt

You can also install it from source:

.. code-block:: bash

    $ git clone https://github.com/FenTechSolutions/CausalDiscoveryToolbox.git  # Download the package
    $ cd CausalDiscoveryToolbox
    $ pip install -r requirements.txt  # Install the requirements
    $ python setup.py install develop --user

**The package is then up and running! You can run most of the algorithms in the CausalDiscoveryToolbox; you might get warnings that some additional features are not available.**

From now on, you can import the library using:

.. code-block:: python

    import cdt

Additional: R and R libraries
-----------------------------

To access additional algorithms from R packages such as bnlearn, kpcalg and pcalg while using the ``cdt`` framework, R must be installed.

Check out how to install all R dependencies in the before-install section of the `travis.yml <https://github.com/FenTechSolutions/CausalDiscoveryToolbox/blob/master/.travis.yml>`_ file for Debian-based distributions. The r-requirements file lists all the R packages used by the toolbox.

Here is an example installation script for the R packages on Ubuntu 20.04:

.. code-block:: sh

    apt-get -qq update
    DEBIAN_FRONTEND=noninteractive apt-get install -y tzdata
    apt-get -qq install dialog apt-utils -y
    apt-get install apt-transport-https -y
    apt-get install -qq software-properties-common -y
    apt-get -qq update
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
    add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' -y
    apt-get -qq update

    apt-get -qq install r-base -y
    apt-get -qq install libssl-dev -y
    apt-get -qq install libgmp3-dev -y
    apt-get -qq install git -y
    apt-get -qq install build-essential -y
    apt-get -qq install libv8-dev -y
    apt-get -qq install libcurl4-openssl-dev -y
    apt-get -qq install libgsl-dev -y

    Rscript -e 'install.packages(c("V8"),repos="http://cran.us.r-project.org", quiet=TRUE, verbose=FALSE)'
    Rscript -e 'install.packages(c("sfsmisc"),repos="http://cran.us.r-project.org", quiet=TRUE, verbose=FALSE)'
    Rscript -e 'install.packages(c("clue"),repos="http://cran.us.r-project.org", quiet=TRUE, verbose=FALSE)'
    Rscript -e 'install.packages("https://cran.r-project.org/src/contrib/Archive/randomForest/randomForest_4.6-14.tar.gz", repos=NULL, type="source")'
    Rscript -e 'install.packages(c("lattice"),repos="http://cran.us.r-project.org", quiet=TRUE, verbose=FALSE)'
    Rscript -e 'install.packages(c("devtools"),repos="http://cran.us.r-project.org", quiet=TRUE, verbose=FALSE)'
    Rscript -e 'install.packages(c("MASS"),repos="http://cran.us.r-project.org", quiet=TRUE, verbose=FALSE)'
    Rscript -e 'install.packages("BiocManager")'
    Rscript -e 'BiocManager::install(c("igraph"))'
    Rscript -e 'install.packages("https://cran.r-project.org/src/contrib/Archive/fastICA/fastICA_1.2-2.tar.gz", repos=NULL, type="source")'
    Rscript -e 'BiocManager::install(c("SID", "bnlearn", "pcalg", "kpcalg", "glmnet", "mboost"))'
    Rscript -e 'install.packages("https://cran.r-project.org/src/contrib/Archive/CAM/CAM_1.0.tar.gz", repos=NULL, type="source")'
    Rscript -e 'install.packages("https://cran.r-project.org/src/contrib/sparsebnUtils_0.0.8.tar.gz", repos=NULL, type="source")'
    Rscript -e 'BiocManager::install(c("ccdrAlgorithm", "discretecdAlgorithm"))'

    apt-get -qq install libxml2-dev -y
    Rscript -e 'install.packages("devtools")'
    Rscript -e 'library(devtools); install_github("cran/CAM"); install_github("cran/momentchi2"); install_github("Diviyan-Kalainathan/RCIT", quiet=TRUE, verbose=FALSE)'
    Rscript -e 'install.packages("https://cran.r-project.org/src/contrib/Archive/sparsebn/sparsebn_0.1.2.tar.gz", repos=NULL, type="source")'

Overview
========

The following figure shows how the package and its algorithms are structured::

    cdt package
    |
    |- independence
    |  |- graph (Inferring the skeleton from data)
    |  |  |- Lasso variants (Randomized Lasso[1], Glasso[2], HSICLasso[3])
    |  |  |- FSGNN (CGNN[12] variant for feature selection)
    |  |  |- Skeleton recovery using feature selection algorithms (RFECV[5], LinearSVR[6], RRelief[7], ARD[8,9], DecisionTree)
    |  |
    |  |- stats (pairwise methods for dependency)
    |     |- Correlation (Pearson, Spearman, KendallTau)
    |     |- Kernel based (NormalizedHSIC[10])
    |     |- Mutual information based (MIRegression, Adjusted Mutual Information[11], Normalized mutual information[11])
    |
    |- data
    |  |- CausalPairGenerator (Generate causal pairs)
    |  |- AcyclicGraphGenerator (Generate FCM-based graphs)
    |  |- load_dataset (load standard benchmark datasets)
    |
    |- causality
    |  |- graph (methods for graph inference)
    |  |  |- CGNN[12]
    |  |  |- PC[13]
    |  |  |- GES[13]
    |  |  |- GIES[13]
    |  |  |- LiNGAM[13]
    |  |  |- CAM[13]
    |  |  |- GS[23]
    |  |  |- IAMB[24]
    |  |  |- MMPC[25]
    |  |  |- SAM[26]
    |  |  |- CCDr[27]
    |  |
    |  |- pairwise (methods for pairwise inference)
    |     |- ANM[14] (Additive Noise Model)
    |     |- IGCI[15] (Information Geometric Causal Inference)
    |     |- RCC[16] (Randomized Causation Coefficient)
    |     |- NCC[17] (Neural Causation Coefficient)
    |     |- GNN[12] (Generative Neural Network -- Part of CGNN)
    |     |- Bivariate fit (Baseline method of regression)
    |     |- Jarfo[20]
    |     |- CDS[20]
    |     |- RECI[28]
    |
    |- metrics (Implements the metrics for graph scoring)
    |  |- Precision Recall
    |  |- SHD
    |  |- SID [29]
    |
    |- utils
       |- Settings -> SETTINGS class (hardware settings)
       |- loss -> MMD loss [21, 22] & various other loss functions
       |- io -> for importing data formats
       |- graph -> graph utilities

References
==========

- [1] Wang, S., Nan, B., Rosset, S., & Zhu, J. SAM: Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning.
- [27] Aragam, B., & Zhou, Q. (2015). Concave penalized estimation of sparse Gaussian Bayesian networks. Journal of Machine Learning Research, 16, 2273-2328.
- [28] Bloebaum, P., Janzing, D., Washio, T., Shimizu, S., & Schoelkopf, B. (2018, March). Cause-Effect Inference by Comparing Regression Errors. In International Conference on Artificial Intelligence and Statistics (pp. 900-909).
- [29] Peters, J., & Bühlmann, P. Structural Intervention Distance (SID) for Evaluating Causal Graphs. https://arxiv.org/abs/1306.1043
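
The SHD entry listed under ``metrics`` in the overview refers to the Structural Hamming Distance, which counts the edge differences between a true graph and a predicted graph. The following is a minimal pure-Python sketch of that idea on 0/1 adjacency matrices. It is not the toolbox's implementation: the function name ``structural_hamming_distance``, the helper ``_is_reversed`` and the ``count_reversal_twice`` parameter are assumptions made for illustration.

.. code-block:: python

    def _is_reversed(target, prediction, i, j):
        """True when exactly one edge links i and j in both graphs, in
        opposite directions. (Hypothetical helper for this sketch.)"""
        t_ij, t_ji = bool(target[i][j]), bool(target[j][i])
        p_ij, p_ji = bool(prediction[i][j]), bool(prediction[j][i])
        return (t_ij != t_ji) and (p_ij != p_ji) and (t_ij != p_ij)


    def structural_hamming_distance(target, prediction, count_reversal_twice=True):
        """Count edge differences between two directed graphs.

        Both graphs are square 0/1 adjacency matrices (lists of lists) where
        entry [i][j] == 1 means there is an edge i -> j.
        """
        n = len(target)
        if any(len(row) != n for row in target):
            raise ValueError("target must be a square matrix")
        if len(prediction) != n or any(len(row) != n for row in prediction):
            raise ValueError("prediction must have the same shape as target")

        # Every mismatching matrix entry is one missing or one extra edge.
        mismatches = sum(
            1
            for i in range(n)
            for j in range(n)
            if bool(target[i][j]) != bool(prediction[i][j])
        )
        if count_reversal_twice:
            return mismatches

        # A reversed edge produced two mismatches above (one missing, one
        # extra); subtract one per reversed pair so it is counted once.
        reversals = sum(
            1
            for i in range(n)
            for j in range(i + 1, n)
            if _is_reversed(target, prediction, i, j)
        )
        return mismatches - reversals


    if __name__ == "__main__":
        # Target: chain 0 -> 1 -> 2.
        chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
        # Prediction: keeps 0 -> 1, reverses 1 -> 2, adds 0 -> 2.
        guess = [[0, 1, 1], [0, 0, 0], [0, 1, 0]]
        print(structural_hamming_distance(chain, guess))
        print(structural_hamming_distance(chain, guess, count_reversal_twice=False))

For the chain ``0 -> 1 -> 2`` and a prediction that reverses the second edge and adds ``0 -> 2``, the sketch returns 3 when a reversal counts as two errors and 2 when it counts as one.
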