Causal Discovery Toolbox Documentation¶
Package for causal inference in graphs and in the pairwise setting for Python >= 3.5. Tools for graph structure recovery and dependencies are included. The package is based on NumPy, Scikit-learn, PyTorch and R.
It implements many algorithms for graph structure recovery (including algorithms from the bnlearn and pcalg packages), mainly based on observational data.
Install it using pip (see the installation section below for details):
pip install cdt
Open-source project¶
The package is open source, under the MIT license; the source code is available at https://github.com/FenTechSolutions/CausalDiscoveryToolbox
When using this package, please cite: Kalainathan, D., & Goudet, O. (2019). Causal Discovery Toolbox: Uncover causal relationships in Python. arXiv:1903.02278.
Docker images¶
Docker images are available for both the master and dev branches, with all dependencies installed and all functionalities enabled, in the following configurations:
- Python 3.6, CPU
- Python 3.7, CPU
- Python 3.6, GPU
Installation¶
The package requires Python >= 3.5, as well as the libraries listed in the requirements file. Some extra functions and options only become available once additional libraries are installed. Here is a quick install guide of the package, starting off with the minimal install up to the full installation.
Note
A (mini/ana)conda environment helps install all those packages, and is therefore recommended for non-expert users.
PyTorch¶
As some of the key algorithms in the cdt package rely on PyTorch, it is a required dependency. Check out the PyTorch website to install the version suited to your hardware configuration: https://pytorch.org
Install the CausalDiscoveryToolbox package¶
The package is available on PyPi:
pip install cdt
Or you can also install it from source.
$ git clone https://github.com/FenTechSolutions/CausalDiscoveryToolbox.git  # Download the package
$ cd CausalDiscoveryToolbox
$ pip install -r requirements.txt  # Install the requirements
$ python setup.py install develop --user
The package is then up and running! You can run most of the algorithms in the Causal Discovery Toolbox; you might get warnings telling you that some additional features are not available.
From now on, you can import the library using:
import cdt
Additional: R and R libraries¶
To access additional algorithms from various R packages such as bnlearn, kpcalg, pcalg, … through the cdt framework, R must be installed.
Check out how to install all R dependencies in the before_install section of the .travis.yml file (https://github.com/FenTechSolutions/CausalDiscoveryToolbox/blob/master/.travis.yml) for Debian-based distributions. The r_requirements file lists all the R packages used by the toolbox.
Overview¶
The following figure shows how the package and its algorithms are structured:
cdt package
- independence
  - graph (inferring the skeleton from data)
    - Lasso variants (Randomized Lasso [1], Glasso [2], HSIC Lasso [3])
    - FSGNN (CGNN [12] variant for feature selection)
    - Skeleton recovery using feature selection algorithms (RFECV [5], LinearSVR [6], RRelief [7], ARD [8, 9], DecisionTree)
  - stats (pairwise methods for dependency)
    - Correlation (Pearson, Spearman, Kendall tau)
    - Kernel-based (Normalized HSIC [10])
    - Mutual-information-based (MIRegression, Adjusted Mutual Information [11], Normalized Mutual Information [11])
- data
  - CausalPairGenerator (generate causal pairs)
  - AcyclicGraphGenerator (generate FCM-based graphs)
  - load_dataset (load standard benchmark datasets)
- causality
  - graph (methods for graph inference)
    - CGNN [12]
    - PC [13]
    - GES [13]
    - GIES [13]
    - LiNGAM [13]
    - CAM [13]
    - GS [23]
    - IAMB [24]
    - MMPC [25]
    - SAM [26]
    - CCDr [27]
  - pairwise (methods for pairwise inference)
    - ANM [14] (Additive Noise Model)
    - IGCI [15] (Information Geometric Causal Inference)
    - RCC [16] (Randomized Causation Coefficient)
    - NCC [17] (Neural Causation Coefficient)
    - GNN [12] (Generative Neural Network, part of CGNN)
    - Bivariate fit (baseline regression method)
    - Jarfo [20]
    - CDS [20]
    - RECI [28]
- metrics (implements the metrics for graph scoring)
  - Precision/Recall
  - SHD (Structural Hamming Distance)
  - SID [29] (Structural Intervention Distance)
- utils
  - Settings -> SETTINGS class (hardware settings)
  - loss -> MMD loss [21, 22] and various other loss functions
  - io -> utilities for importing data formats
  - graph -> graph utilities
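To give a concrete feel for the pairwise methods, here is a minimal, self-contained sketch of the regression-error idea behind RECI [28], written in plain NumPy rather than with the toolbox's implementation: regress each variable on the other, and prefer the causal direction with the smaller residual error.

```python
import numpy as np

def reci_direction(x, y, deg=3):
    """Toy regression-error comparison in the spirit of RECI [28]:
    fit a polynomial in both directions and prefer the direction
    with the lower mean squared residual. Not the toolbox's code."""
    # Rescale both variables to [0, 1] so the errors are comparable.
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())
    err_xy = np.mean((np.polyval(np.polyfit(x, y, deg), x) - y) ** 2)
    err_yx = np.mean((np.polyval(np.polyfit(y, x, deg), y) - x) ** 2)
    return 1 if err_xy < err_yx else -1  # 1: x -> y, -1: y -> x

rng = np.random.default_rng(0)
cause = rng.uniform(-1, 1, 500)
effect = cause ** 3 + 0.01 * rng.normal(size=500)  # x causes y
print(reci_direction(cause, effect))  # 1: the x -> y fit is tighter
```

The forward regression fits the cubic mechanism almost exactly, while the backward regression has to approximate a cube root, so the forward error wins; the real pairwise classes in cdt expose this kind of decision through a common predict interface.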
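Similarly, the graph-scoring idea behind the metrics module can be illustrated with a toy Structural Hamming Distance over binary adjacency matrices. This is a simplified stand-in for the toolbox's SHD metric, and it counts a reversed edge as a single error:

```python
import numpy as np

def shd(target, pred):
    """Simplified Structural Hamming Distance between two directed
    graphs given as binary adjacency matrices: the number of edge
    additions, deletions or reversals needed to turn pred into target."""
    diff = np.abs(target - pred)
    # A reversed edge between i and j shows up as two differing
    # entries, (i, j) and (j, i); count it as a single mistake.
    diff = diff + diff.T
    diff[diff > 1] = 1
    return int(diff.sum() // 2)

# True graph: 0 -> 1 -> 2; prediction reverses 1 -> 2 and adds 0 -> 2.
target = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]])
pred = np.array([[0, 1, 1],
                 [0, 0, 0],
                 [0, 1, 0]])
print(shd(target, pred))  # 2: one reversal + one spurious edge
```

Lower is better, with 0 meaning the predicted graph matches the target exactly.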
References¶
[1] Wang, S., Nan, B., Rosset, S., & Zhu, J. (2011). Random lasso. The annals of applied statistics, 5(1), 468.
[2] Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
[3] Yamada, M., Jitkrittum, W., Sigal, L., Xing, E. P., & Sugiyama, M. (2014). High-dimensional feature selection by feature-wise kernelized lasso. Neural Computation, 26(1), 185-207.
[4] Feizi, S., Marbach, D., Médard, M., & Kellis, M. (2013). Network deconvolution as a general method to distinguish direct dependencies in networks. Nature Biotechnology, 31(8), 726-733.
[5] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1), 389-422.
[6] Vapnik, V., Golowich, S. E., & Smola, A. J. (1997). Support vector method for function approximation, regression estimation and signal processing. In Advances in Neural Information Processing Systems (pp. 281-287).
[7] Kira, K., & Rendell, L. A. (1992, July). The feature selection problem: Traditional methods and a new algorithm. In AAAI (Vol. 2, pp. 129-134).
[8] MacKay, D. J. (1992). Bayesian interpolation. Neural Computation, 4, 415–447.
[9] Neal, R. M. (1996). Bayesian learning for neural networks. No. 118 in Lecture Notes in Statistics. New York: Springer.
[10] Gretton, A., Bousquet, O., Smola, A., & Schölkopf, B. (2005, October). Measuring statistical dependence with Hilbert-Schmidt norms. In ALT (Vol. 16, pp. 63-78).
[11] Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.
[12] Goudet, O., Kalainathan, D., Caillou, P., Lopez-Paz, D., Guyon, I., Sebag, M., … & Tubaro, P. (2017). Learning functional causal models with generative neural networks. arXiv preprint arXiv:1709.05321.
[13] Spirtes, P., Glymour, C., Scheines, R. (2000). Causation, Prediction, and Search. MIT press.
[14] Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., & Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems (pp. 689-696).
[15] Janzing, D., Mooij, J., Zhang, K., Lemeire, J., Zscheischler, J., Daniušis, P., … & Schölkopf, B. (2012). Information-geometric approach to inferring causal directions. Artificial Intelligence, 182, 1-31.
[16] Lopez-Paz, D., Muandet, K., Schölkopf, B., & Tolstikhin, I. (2015, June). Towards a learning theory of cause-effect inference. In International Conference on Machine Learning (pp. 1452-1461).
[17] Lopez-Paz, D., Nishihara, R., Chintala, S., Schölkopf, B., & Bottou, L. (2017, July). Discovering causal signals in images. In Proceedings of CVPR.
[18] Stegle, O., Janzing, D., Zhang, K., Mooij, J. M., & Schölkopf, B. (2010). Probabilistic latent variable models for distinguishing between cause and effect. In Advances in Neural Information Processing Systems (pp. 1687-1695).
[19] Zhang, K., & Hyvärinen, A. (2009, June). On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (pp. 647-655). AUAI Press.
[20] Fonollosa, J. A. (2016). Conditional distribution variability measures for causality detection. arXiv preprint arXiv:1601.06680.
[21] Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13(Mar), 723-773.
[22] Li, Y., Swersky, K., & Zemel, R. (2015). Generative moment matching networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15) (pp. 1718-1727).
[23] Margaritis, D. (2003). Learning Bayesian Network Model Structure from Data. Ph.D. thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. Available as Technical Report CMU-CS-03-153.
[24] Tsamardinos, I., Aliferis, C. F., & Statnikov, A. (2003). Algorithms for large scale Markov blanket discovery. In Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference (pp. 376-381). AAAI Press.
[25] Tsamardinos, I., Aliferis, C. F., & Statnikov, A. (2003). Time and sample efficient discovery of Markov blankets and direct causal relations. In KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 673-678). ACM. Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31-78.
[26] Kalainathan, D., Goudet, O., Guyon, I., Lopez-Paz, D., & Sebag, M. (2018). SAM: Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning.
[27] Aragam, B., & Zhou, Q. (2015). Concave penalized estimation of sparse Gaussian Bayesian networks. Journal of Machine Learning Research, 16, 2273-2328.
[28] Bloebaum, P., Janzing, D., Washio, T., Shimizu, S., & Schölkopf, B. (2018, March). Cause-effect inference by comparing regression errors. In International Conference on Artificial Intelligence and Statistics (pp. 900-909).
[29] Peters, J., & Bühlmann, P. (2013). Structural Intervention Distance (SID) for evaluating causal graphs. arXiv preprint arXiv:1306.1043. https://arxiv.org/abs/1306.1043