cdt.causality
cdt.causality.pairwise

class cdt.causality.pairwise.model.PairwiseModel
Base class for all pairwise causal inference models.
Works with undirected/directed graphs and with DataFrames in the CEPC format.

orient_graph(df_data, graph, printout=None, **kwargs)
Orient an undirected graph using the pairwise method defined by the subclass.
The pairwise method is run on every undirected edge.
 Parameters
df_data (pandas.DataFrame) – Data
graph (networkx.Graph) – Graph to orient
printout (str) – (optional) Path to file where to save temporary results
 Returns
a directed graph, which might contain cycles
 Return type
networkx.DiGraph
Warning
Requirement: the node names in the graph must match the variable names in df_data.
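The per-edge orientation can be sketched as follows. This is a minimal illustration, not cdt's implementation: `orient_edges`, `toy_score` and the plain dict/list data structures are hypothetical stand-ins for the pandas.DataFrame and networkx objects the real method uses. It assumes a score that is positive for a->b and negative for b->a, and it drops edges whose score is exactly zero, which is also an assumption of this sketch.

```python
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]


def orient_edges(
    data: Dict[str, Sequence[float]],
    edges: List[Tuple[str, str]],
    score_fn: Callable[[Pair], float],
) -> List[Tuple[str, str, float]]:
    """Orient each undirected edge (a, b) using the sign of score_fn."""
    directed = []
    for a, b in edges:
        weight = score_fn((data[a], data[b]))
        if weight > 0:        # positive score: keep the direction a -> b
            directed.append((a, b, weight))
        elif weight < 0:      # negative score: flip the edge to b -> a
            directed.append((b, a, abs(weight)))
        # weight == 0: undecided, the edge is dropped in this sketch
    return directed


def toy_score(pair: Pair) -> float:
    """Hypothetical score with no causal meaning: difference of variances."""
    first, second = pair
    return statistics.pvariance(second) - statistics.pvariance(first)


data = {
    "A": [0.0, 1.0, 2.0, 3.0],
    "B": [0.0, 2.0, 4.0, 6.0],
    "C": [1.0, 1.0, 1.0, 1.0],
}
directed = orient_edges(data, [("A", "B"), ("B", "C")], toy_score)
```

Because every edge is oriented independently of the others, nothing prevents the result from containing cycles, which is why the return value is described as a directed graph that might contain cycles.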

predict(x, *args, **kwargs)
Generic predict method that chooses the most suitable sub-function to run.
Depending on the types of x and *args, the first matching case below is executed, in priority order:
If args[0] is a networkx.(Di)Graph, then self.orient_graph is executed.
If args[0] exists, then self.predict_proba is executed.
If x is a pandas.DataFrame, then self.predict_dataset is executed.
If x is a pandas.Series, then self.predict_proba is executed.
 Parameters
x (numpy.array or pandas.DataFrame or pandas.Series) – First variable or dataset.
args (numpy.array or networkx.Graph) – graph or second variable.
 Returns
predictions output
 Return type
pandas.DataFrame or networkx.DiGraph

predict_dataset(x, **kwargs)
Generic dataset prediction function.
Runs the score independently on all pairs.
 Parameters
x (pandas.DataFrame) – a CEPC format Dataframe.
kwargs (dict) – additional arguments for the algorithms
 Returns
a Dataframe with the predictions.
 Return type
pandas.DataFrame

predict_proba(dataset, idx=0, **kwargs)
Prediction method for pairwise causal inference.
predict_proba is meant to be overridden in all subclasses.
 Parameters
dataset (tuple) – Couple of np.ndarray variables to classify
idx (int) – (optional) index number for printing purposes
 Returns
Causation score (value: 1 if a->b and -1 if b->a)
 Return type
float

ANM

class cdt.causality.pairwise.ANM
ANM algorithm.
Description: The additive noise model is one of the most popular approaches for pairwise causality. It relies on how well the data fits an additive noise model in one direction and on the rejection of the model in the other direction.
Data Type: Continuous
Assumptions: Assuming that \(x\rightarrow y\), the data is supposed to follow an additive noise model, i.e. \(y=f(x)+E\), with \(E\) a noise variable and \(f\) a deterministic function. The causal inference relies on the independence between \(x\) and \(E\). It has been proven that, if the data is generated by an additive noise model, the model can generically only be fitted in the true causal direction.
Note
Ref: Hoyer, Patrik O and Janzing, Dominik and Mooij, Joris M and Peters, Jonas and Schölkopf, Bernhard, “Nonlinear causal discovery with additive noise models”, NIPS 2009 https://papers.nips.cc/paper/3548-nonlinear-causal-discovery-with-additive-noise-models.pdf
Example
>>> from cdt.causality.pairwise import ANM
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.data import load_dataset
>>> data, labels = load_dataset('tuebingen')
>>> obj = ANM()
>>>
>>> # This example uses the predict() method
>>> output = obj.predict(data)
>>>
>>> # This example uses the orient_graph() method. The dataset used
>>> # can be loaded using the cdt.data module
>>> data, graph = load_dataset('sachs')
>>> output = obj.orient_graph(data, nx.DiGraph(graph))
>>>
>>> # To view the directed graph run the following command
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
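The additive noise idea can be illustrated with a small standalone sketch. It is not the code behind the ANM class: it swaps in a polynomial regression (numpy.polyfit) and a biased HSIC estimate with a fixed-width Gaussian kernel as assumptions, and the names `hsic` and `anm_score` are hypothetical. The score compares how dependent the regression residuals are on the putative cause in each direction.

```python
import numpy as np


def _standardize(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / (v.std() + 1e-12)


def _rbf_gram(v, sigma=1.0):
    """Gaussian kernel Gram matrix of a 1-D sample."""
    diff = v[:, None] - v[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))


def hsic(a, b):
    """Biased HSIC estimate; close to 0 when a and b look independent."""
    a, b = _standardize(a), _standardize(b)
    n = len(a)
    centering = np.eye(n) - np.ones((n, n)) / n
    gram_product = _rbf_gram(a) @ centering @ _rbf_gram(b) @ centering
    return float(np.trace(gram_product)) / n ** 2


def _residuals(cause, effect, degree):
    """Residuals of a polynomial regression of effect on cause."""
    coefs = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coefs, cause)


def anm_score(x, y, degree=3):
    """Positive when the residuals look more independent in the x -> y fit."""
    x, y = _standardize(x), _standardize(y)
    dep_forward = hsic(x, _residuals(x, y, degree))
    dep_backward = hsic(y, _residuals(y, x, degree))
    return dep_backward - dep_forward


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = x ** 3 + 0.1 * rng.uniform(-1.0, 1.0, size=200)
score_xy = anm_score(x, y)
score_yx = anm_score(y, x)
```

By construction the score is antisymmetric, so swapping the two variables flips its sign. A positive value favours x->y in this sketch.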
Bivariate Fit

class cdt.causality.pairwise.BivariateFit(ffactor=2, maxdev=3, minc=12)
Bivariate Fit model.
Description: The bivariate fit model is based on a best-fit criterion relying on a Gaussian Process regressor. It is used as a weak baseline.
Data Type: Continuous
Assumptions: This model is often used to show that correlation \(\neq\) causation. It performs very poorly, as it assumes that the best predictive model is the causal model.
Example
>>> from cdt.causality.pairwise import BivariateFit
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.data import load_dataset
>>> data, labels = load_dataset('tuebingen')
>>> obj = BivariateFit()
>>>
>>> # This example uses the predict() method
>>> output = obj.predict(data)
>>>
>>> # This example uses the orient_graph() method. The dataset used
>>> # can be loaded using the cdt.data module
>>> data, graph = load_dataset("sachs")
>>> output = obj.orient_graph(data, nx.Graph(graph))
>>>
>>> # To view the directed graph run the following command
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
CDS

class cdt.causality.pairwise.CDS(ffactor=2, maxdev=3, minc=12)
Conditional Distribution Similarity Statistic.
Description: The Conditional Distribution Similarity Statistic measures the standard deviation of the rescaled values of y (resp. x) after binning in the x (resp. y) direction. The lower the standard deviation, the more likely the pair is x->y (resp. y->x). It is a single feature of the Jarfo model.
Data Type: Continuous and Discrete
Assumptions: This approach is a statistical feature of the joint distribution of the data, measuring the variance of the marginals after conditioning on bins.
Note
Ref: Fonollosa, José AR, “Conditional distribution variability measures for causality detection”, 2016.
Example
>>> from cdt.causality.pairwise import CDS
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.data import load_dataset
>>> data, labels = load_dataset('tuebingen')
>>> obj = CDS()
>>>
>>> # This example uses the predict() method
>>> output = obj.predict(data)
>>>
>>> # This example uses the orient_graph() method. The dataset used
>>> # can be loaded using the cdt.data module
>>> data, graph = load_dataset("sachs")
>>> output = obj.orient_graph(data, nx.Graph(graph))
>>>
>>> # To view the directed graph run the following command
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
GNN

class cdt.causality.pairwise.GNN(nh=20, lr=0.01, nruns=6, njobs=None, gpus=None, verbose=None, batch_size=-1, train_epochs=1000, test_epochs=1000, dataloader_workers=0)
Shallow Generative Neural Networks.
Description: Pairwise variant of the CGNN approach. It models the causal directions x->y and y->x with a 1-hidden-layer neural network and an MMD loss. The causal direction is taken to be the best fit of the two candidate directions.
Data Type: Continuous
Assumptions: The class of generative models is not restricted with a hard constraint, but with the hyperparameter nh. This algorithm greatly benefits from bootstrapped runs (nruns >= 12 recommended) and is computationally very heavy. GPUs are recommended.
 Parameters
nh (int) – number of hidden units in the neural network
lr (float) – learning rate of the optimizer
nruns (int) – number of runs to execute per batch (before testing for significance with a t-test)
njobs (int) – number of runs to execute in parallel (defaults to cdt.SETTINGS.NJOBS)
gpus (bool) – Number of available gpus (defaults to cdt.SETTINGS.GPU)
idx (int) – (optional) index of the pair, for printing purposes
verbose (bool) – verbosity (defaults to cdt.SETTINGS.verbose)
batch_size (int) – batch size, defaults to full-batch
train_epochs (int) – Number of epochs used for training
test_epochs (int) – Number of epochs used for evaluation
dataloader_workers (int) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
Note
Ref : Learning Functional Causal Models with Generative Neural Networks Olivier Goudet & Diviyan Kalainathan & Al. (https://arxiv.org/abs/1709.05321)
Example
>>> from cdt.causality.pairwise import GNN
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.data import load_dataset
>>> data, labels = load_dataset('tuebingen')
>>> obj = GNN()
>>>
>>> # This example uses the predict() method
>>> output = obj.predict(data)
>>>
>>> # This example uses the orient_graph() method. The dataset used
>>> # can be loaded using the cdt.data module
>>> data, graph = load_dataset("sachs")
>>> output = obj.orient_graph(data, nx.Graph(graph))
>>>
>>> # To view the directed graph run the following command
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

orient_graph(df_data, graph, printout=None, **kwargs)
Orient an undirected graph using the pairwise method defined by the subclass.
The pairwise method is run on every undirected edge.
 Parameters
df_data (pandas.DataFrame or MetaDataset) – Data (check cdt.utils.io.MetaDataset)
graph (networkx.Graph) – Graph to orient
printout (str) – (optional) Path to file where to save temporary results
 Returns
a directed graph, which might contain cycles
 Return type
networkx.DiGraph
Note
This function is an override of the base class, in order to be able to use the torch.utils.data.Dataset classes
Warning
Requirement: the node names in the graph must match the variable names in df_data.
IGCI

class cdt.causality.pairwise.IGCI
IGCI model.
Description: Information Geometric Causal Inference is a pairwise causal discovery model that considers the case of minimal noise, \(Y=f(X)\) with \(f\) invertible, and leverages asymmetries to predict causal directions.
Data Type: Continuous
Assumptions: Only the case of invertible functions is considered, as the prediction would otherwise be trivial when the noise is minimal.
Note
P. Daniušis, D. Janzing, J. Mooij, J. Zscheischler, B. Steudel, K. Zhang, B. Schölkopf: Inferring deterministic causal relations. Proceedings of the 26th Annual Conference on Uncertainty in Artificial Intelligence (UAI2010). http://event.cwi.nl/uai2010/papers/UAI2010_0121.pdf
Example
>>> from cdt.causality.pairwise import IGCI
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.data import load_dataset
>>> data, labels = load_dataset('tuebingen')
>>> obj = IGCI()
>>>
>>> # This example uses the predict() method
>>> output = obj.predict(data)
>>>
>>> # This example uses the orient_graph() method. The dataset used
>>> # can be loaded using the cdt.data module
>>> data, graph = load_dataset("sachs")
>>> output = obj.orient_graph(data, nx.Graph(graph))
>>>
>>> # To view the directed graph run the following command
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

predict_proba(dataset, ref_measure='gaussian', estimator='entropy', **kwargs)
Evaluate a pair using the IGCI model.
 Parameters
dataset (tuple) – Couple of np.ndarray variables to classify
ref_measure (str) – Scaling method (gaussian (default), integral or None)
estimator (str) – method used to evaluate the pairs (entropy (default) or integral)
 Returns
value of the IGCI model: > 0 if a->b, < 0 otherwise
 Return type
float
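To make the sign convention concrete, here is a minimal sketch of a slope-based IGCI estimate. It is only an illustration under stated assumptions: both variables are min-max scaled to [0, 1] (a uniform reference measure) rather than using the class's default Gaussian scaling and entropy estimator, and the names `igci_slope_score` and `_mean_log_slope` are hypothetical.

```python
import math


def _minmax(values):
    """Rescale a non-constant sample to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def _mean_log_slope(cause, effect):
    """Average log |d effect / d cause| over consecutive points sorted by cause."""
    pairs = sorted(zip(cause, effect))
    total, count = 0.0, 0
    for (c0, e0), (c1, e1) in zip(pairs, pairs[1:]):
        dc, de = c1 - c0, e1 - e0
        if dc != 0 and de != 0:  # skip ties, whose log-slope is undefined
            total += math.log(abs(de / dc))
            count += 1
    return total / count if count else 0.0


def igci_slope_score(x, y):
    """Positive when the slope criterion favours x -> y, negative for y -> x."""
    x, y = _minmax(x), _minmax(y)
    return _mean_log_slope(y, x) - _mean_log_slope(x, y)


n = 200
x = [i / n for i in range(1, n + 1)]  # evenly spread cause
y = [v ** 3 for v in x]               # noiseless invertible mechanism
score = igci_slope_score(x, y)
```

With an evenly spread cause and the invertible map y = x^3, the score is positive, matching the "> 0 if a->b" convention. Swapping the arguments flips the sign.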

Jarfo

class cdt.causality.pairwise.Jarfo
Jarfo model, 2nd of the Cause Effect Pairs challenge, 1st of the Fast Causation Challenge.
Description: The Jarfo model is an ensemble method for causal discovery: it builds lots of causally relevant features (such as ANM) with a gradient boosting classifier on top.
Data Type: Continuous, Categorical, Mixed
Assumptions: This method needs a substantial amount of labelled causal pairs to train itself. Its final performance depends on the training set used.
Note
Ref : Fonollosa, José AR, “Conditional distribution variability measures for causality detection”, 2016.
Example
>>> from cdt.causality.pairwise import Jarfo
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.data import load_dataset
>>> from sklearn.model_selection import train_test_split
>>> data, labels = load_dataset('tuebingen')
>>> X_tr, X_te, y_tr, y_te = train_test_split(data, labels, train_size=.5)
>>>
>>> obj = Jarfo()
>>> obj.fit(X_tr, y_tr)
>>> # This example uses the predict() method
>>> output = obj.predict(X_te)
>>>
>>> # This example uses the orient_graph() method. The dataset used
>>> # can be loaded using the cdt.data module
>>> data, graph = load_dataset("sachs")
>>> output = obj.orient_graph(data, nx.Graph(graph))
>>>
>>> # To view the directed graph run the following command
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

orient_graph(df_data, graph, printout=None, **kwargs)
Orient an undirected graph using Jarfo. This function is modified for optimization.
 Parameters
df_data (pandas.DataFrame) – Data
graph (networkx.Graph) – Graph to orient
nruns (int) – number of times to rerun for each pair (bootstrap)
printout (str) – (optional) Path to file where to save temporary results
 Returns
a directed graph, which might contain cycles
 Return type
networkx.DiGraph

predict_dataset(df)
Runs Jarfo independently on all pairs.
 Parameters
df (pandas.DataFrame) – a CEPC format DataFrame.
kwargs (dict) – additional arguments for the algorithms
 Returns
a Dataframe with the predictions.
 Return type
pandas.DataFrame

predict_proba(dataset, idx=0, **kwargs)
Use Jarfo to predict the causal direction of a pair of variables.
 Parameters
dataset (tuple) – Couple of np.ndarray variables to classify
idx (int) – (optional) index number for printing purposes
 Returns
Causation score (value: 1 if a->b and -1 if b->a)
 Return type
float

NCC

class cdt.causality.pairwise.NCC
Neural Causation Coefficient.
Description: The Neural Causation Coefficient (NCC) is an approach relying only on neural networks to build causally relevant embeddings of distributions during training, and classifying the pairs using the last layers of the neural network.
Data Type: Continuous, Categorical, Mixed
Assumptions: This method needs a substantial amount of labelled causal pairs to train itself. Its final performance depends on the training set used.
Example
>>> from cdt.causality.pairwise import NCC
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.data import load_dataset
>>> from sklearn.model_selection import train_test_split
>>> data, labels = load_dataset('tuebingen')
>>> X_tr, X_te, y_tr, y_te = train_test_split(data, labels, train_size=.5)
>>>
>>> obj = NCC()
>>> obj.fit(X_tr, y_tr)
>>> # This example uses the predict() method
>>> output = obj.predict(X_te)
>>>
>>> # This example uses the orient_graph() method. The dataset used
>>> # can be loaded using the cdt.data module
>>> data, graph = load_dataset("sachs")
>>> output = obj.orient_graph(data, nx.Graph(graph))
>>>
>>> # To view the directed graph run the following command
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

fit(x_tr, y_tr, epochs=50, batch_size=32, learning_rate=0.01, verbose=None, device=None)
Fit the NCC model.
 Parameters
x_tr (pd.DataFrame) – CEPC format dataframe containing the pairs
y_tr (pd.DataFrame or np.ndarray) – labels associated to the pairs
epochs (int) – number of train epochs
learning_rate (float) – learning rate of Adam
verbose (bool) – verbosity (defaults to cdt.SETTINGS.verbose)
device (str) – cuda or cpu device (defaults to cdt.SETTINGS.default_device)

predict_dataset(df, device=None, verbose=None)
 Parameters
x_tr (pd.DataFrame) – CEPC format dataframe containing the pairs
epochs (int) – number of train epochs
learning_rate (float) – learning rate of Adam
verbose (bool) – verbosity (defaults to cdt.SETTINGS.verbose)
device (str) – cuda or cpu device (defaults to cdt.SETTINGS.default_device)
 Returns
dataframe containing the predicted causation coefficients
 Return type
pandas.DataFrame

predict_proba(dataset, device=None, idx=0)
Infer causal directions using the trained NCC pairwise model.
 Parameters
dataset (tuple) – Couple of np.ndarray variables to classify
device (str) – Device to run the algorithm on (defaults to cdt.SETTINGS.default_device)
 Returns
Causation score (value: 1 if a->b and -1 if b->a)
 Return type
float

RCC

class cdt.causality.pairwise.RCC(rand_coeff=333, nb_estimators=500, nb_min_leaves=20, max_depth=None, s=10, njobs=None, verbose=None)
Randomized Causation Coefficient model. 2nd approach in the Fast Causation challenge.
Description: The Randomized Causation Coefficient (RCC) relies on the projection of the empirical distributions into an RKHS using random cosine embeddings, then classifies the pairs using a random forest based on those features.
Data Type: Continuous, Categorical, Mixed
Assumptions: This method needs a substantial amount of labelled causal pairs to train itself. Its final performance depends on the training set used.
 Parameters
rand_coeff (int) – number of randomized coefficients
nb_estimators (int) – number of estimators
nb_min_leaves (int) – number of min samples leaves of the estimator
max_depth (int or None) – (optional) max depth of the model
s (float) – scaling
njobs (int) – number of jobs to be run in parallel (defaults to cdt.SETTINGS.NJOBS)
verbose (bool) – verbosity (defaults to cdt.SETTINGS.verbose)
Note
Ref: Lopez-Paz, David and Muandet, Krikamol and Schölkopf, Bernhard and Tolstikhin, Ilya O, “Towards a Learning Theory of Cause-Effect Inference”, ICML 2015.
Example
>>> from cdt.causality.pairwise import RCC
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.data import load_dataset
>>> from sklearn.model_selection import train_test_split
>>> data, labels = load_dataset('tuebingen')
>>> X_tr, X_te, y_tr, y_te = train_test_split(data, labels, train_size=.5)
>>>
>>> obj = RCC()
>>> obj.fit(X_tr, y_tr)
>>> # This example uses the predict() method
>>> output = obj.predict(X_te)
>>>
>>> # This example uses the orient_graph() method. The dataset used
>>> # can be loaded using the cdt.data module
>>> data, graph = load_dataset('sachs')
>>> output = obj.orient_graph(data, nx.DiGraph(graph))
>>>
>>> # To view the directed graph run the following command
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

featurize_row(x, y)
Projects the causal pair to the RKHS using the sampled kernel approximation.
 Parameters
x (np.ndarray) – Variable 1
y (np.ndarray) – Variable 2
 Returns
projected empirical distributions into a single fixed-size vector.
 Return type
np.ndarray
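The projection step can be pictured with a small sketch of random cosine features. This is an assumption-laden illustration rather than the class's code: `make_projection`, `mean_embedding` and `featurize_pair` are hypothetical names, the weights are drawn from a Gaussian scaled by `scale`, and the final vector simply concatenates the embeddings of x, of y and of the joint sample.

```python
import numpy as np


def make_projection(n_features, input_dim, scale, rng):
    """Sample random weights and phase offsets for cosine features."""
    weights = rng.normal(0.0, scale, size=(n_features, input_dim))
    offsets = rng.uniform(0.0, 2.0 * np.pi, size=(n_features, 1))
    return weights, offsets


def mean_embedding(samples, weights, offsets):
    """Average cosine features over the points of a (input_dim, n_points) sample."""
    return np.cos(weights @ samples + offsets).mean(axis=1)


def featurize_pair(x, y, proj_marginal, proj_joint):
    """Map a pair of 1-D samples of any length to one fixed-size vector."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    fx = mean_embedding(x[None, :], *proj_marginal)
    fy = mean_embedding(y[None, :], *proj_marginal)
    fxy = mean_embedding(np.vstack((x, y)), *proj_joint)
    return np.concatenate((fx, fy, fxy))


rng = np.random.default_rng(0)
proj_marginal = make_projection(5, 1, 10.0, rng)
proj_joint = make_projection(5, 2, 10.0, rng)
short_pair = featurize_pair(rng.normal(size=50), rng.normal(size=50),
                            proj_marginal, proj_joint)
long_pair = featurize_pair(rng.normal(size=300), rng.normal(size=300),
                           proj_marginal, proj_joint)
```

Averaging over the points is what makes the output size independent of the sample size, so pairs with 50 or 300 observations both yield vectors a classifier can consume.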
RECI

class cdt.causality.pairwise.RECI(degree=3)
RECI model.
Description: Regression Error based Causal Inference (RECI) relies on a best-fit MSE with a monomial regressor and [0,1] rescaling to infer the causal direction.
Data Type: Continuous (depends on the regressor used)
Assumptions: No independence tests are used, but the assumptions on the model depend on the regressor used for RECI.
 Parameters
degree (int) – Degree of the polynomial regression.
Note
Bloebaum, P., Janzing, D., Washio, T., Shimizu, S., & Schoelkopf, B. (2018, March). Cause-Effect Inference by Comparing Regression Errors. In International Conference on Artificial Intelligence and Statistics (pp. 900-909).
Example
>>> from cdt.causality.pairwise import RECI
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.data import load_dataset
>>> data, labels = load_dataset('tuebingen')
>>> obj = RECI()
>>>
>>> # This example uses the predict() method
>>> output = obj.predict(data)
>>>
>>> # This example uses the orient_graph() method. The dataset used
>>> # can be loaded using the cdt.data module
>>> data, graph = load_dataset("sachs")
>>> output = obj.orient_graph(data, nx.Graph(graph))
>>>
>>> # To view the directed graph run the following command
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
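The regression-error comparison behind RECI can be sketched in a few lines. This is a hedged illustration, not the class's code: it assumes a plain polynomial fit with numpy.polyfit, min-max rescaling to [0, 1], and a score that is positive when the x->y regression has the smaller error. The name `reci_score` is hypothetical.

```python
import numpy as np


def _unit_scale(v):
    """Rescale a non-constant sample to the [0, 1] interval."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())


def _fit_mse(cause, effect, degree):
    """Mean squared error of a polynomial regression of effect on cause."""
    coefs = np.polyfit(cause, effect, degree)
    return float(np.mean((effect - np.polyval(coefs, cause)) ** 2))


def reci_score(x, y, degree=3):
    """Positive when regressing y on x leaves less error than x on y."""
    x, y = _unit_scale(x), _unit_scale(y)
    return _fit_mse(y, x, degree) - _fit_mse(x, y, degree)


x = np.linspace(0.0, 1.0, 200)
y = x ** 3  # a cubic is fitted exactly in the x -> y direction
score = reci_score(x, y)
```

Here the forward fit is essentially exact while the cube-root relation in the reverse direction cannot be matched by a cubic polynomial, so the score is positive.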
cdt.causality.graph
Find causal relationships and output a directed graph.

class cdt.causality.graph.model.GraphModel
Base class for all graph causal inference models.
Works with undirected/directed graphs and raw data. All models for causal discovery from observational data are based on this class. Its main feature is the predict function, which executes a function according to the given arguments.

create_graph_from_data(data, **kwargs)
Infer a directed graph out of data.
Note
Not implemented: will be implemented by the model classes.

orient_directed_graph(data, dag, **kwargs)
Re-orient a directed graph.
Note
Not implemented: will be implemented by the model classes.

orient_undirected_graph(data, umg, **kwargs)
Orient an undirected graph.
Note
Not implemented: will be implemented by the model classes.

predict(df_data, graph=None, **kwargs)
Orient a graph using the method defined by the arguments.
Depending on the type of graph, this function executes a different function:
If graph is a networkx.DiGraph, then self.orient_directed_graph is executed.
If graph is a networkx.Graph, then self.orient_undirected_graph is executed.
If graph is None, then self.create_graph_from_data is executed.
 Parameters
df_data (pandas.DataFrame) – DataFrame containing the observational data.
graph (networkx.DiGraph or networkx.Graph or None) – Prior knowledge on the causal graph.
Warning
Requirement: the node names in the graph must correspond to the variable names in df_data.
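The dispatch performed by predict can be sketched without any dependency. The `Graph` and `DiGraph` classes below are stand-ins that only mirror the fact that networkx.DiGraph subclasses networkx.Graph, and `DispatchSketch` is a hypothetical class whose methods return their own names instead of running an algorithm.

```python
class Graph:
    """Stand-in for networkx.Graph."""


class DiGraph(Graph):
    """Stand-in for networkx.DiGraph, which subclasses networkx.Graph."""


class DispatchSketch:
    def create_graph_from_data(self, data):
        return "create_graph_from_data"

    def orient_undirected_graph(self, data, graph):
        return "orient_undirected_graph"

    def orient_directed_graph(self, data, graph):
        return "orient_directed_graph"

    def predict(self, data, graph=None):
        if graph is None:
            return self.create_graph_from_data(data)
        # The DiGraph test must come first: a DiGraph is also a Graph.
        if isinstance(graph, DiGraph):
            return self.orient_directed_graph(data, graph)
        if isinstance(graph, Graph):
            return self.orient_undirected_graph(data, graph)
        raise TypeError("graph must be a Graph, a DiGraph or None")


model = DispatchSketch()
calls = [model.predict([]), model.predict([], Graph()), model.predict([], DiGraph())]
```

The ordering of the two isinstance checks is the design point: testing for the undirected type first would send every directed graph to the undirected branch.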

bnlearn-based models

class cdt.causality.graph.bnlearn.BNlearnAlgorithm(score='NULL', alpha=0.05, beta='NULL', optim=False, verbose=None)
BNlearn algorithm. All the models imported from bnlearn are built on this base class and share the same attributes and interface.
 Parameters
score (str) – the label of the conditional independence test to be used in the algorithm. If none is specified, the default test statistic is the mutual information for categorical variables, the Jonckheere-Terpstra test for ordered factors and the linear correlation for continuous variables. See below for available tests.
alpha (float) – a numeric value, the target nominal type I error rate.
beta (int) – a positive integer, the number of permutations considered for each permutation test. It will be ignored with a warning if the conditional independence test specified by the score argument is not a permutation test.
optim (bool) – See the bnlearn package for details.
verbose (bool) – Sets the verbosity. Defaults to SETTINGS.verbose
 Available tests:
 discrete case (categorical variables)
 – mutual information: an information-theoretic distance measure.
It is proportional to the log-likelihood ratio (they differ by a 2n factor) and is related to the deviance of the tested models. The asymptotic χ2 test (mi and mi-adf, with adjusted degrees of freedom), the Monte Carlo permutation test (mc-mi), the sequential Monte Carlo permutation test (smc-mi), and the semiparametric test (sp-mi) are implemented.
 – shrinkage estimator for the mutual information (mi-sh):
an improved asymptotic χ2 test based on the James-Stein estimator for the mutual information.
 – Pearson’s X2: the classical Pearson’s X2 test for contingency tables.
The asymptotic χ2 test (x2 and x2-adf, with adjusted degrees of freedom), the Monte Carlo permutation test (mc-x2), the sequential Monte Carlo permutation test (smc-x2) and the semiparametric test (sp-x2) are implemented.
 discrete case (ordered factors)
 – Jonckheere-Terpstra: a trend test for ordinal variables.
The asymptotic normal test (jt), the Monte Carlo permutation test (mc-jt) and the sequential Monte Carlo permutation test (smc-jt) are implemented.
 continuous case (normal variables)
 – linear correlation: Pearson’s linear correlation.
The exact Student’s t test (cor), the Monte Carlo permutation test (mc-cor) and the sequential Monte Carlo permutation test (smc-cor) are implemented.
 – Fisher’s Z: a transformation of the linear correlation with asymptotic normal distribution.
Used by commercial software (such as TETRAD II) for the PC algorithm (an R implementation is present in the pcalg package on CRAN). The asymptotic normal test (zf), the Monte Carlo permutation test (mc-zf) and the sequential Monte Carlo permutation test (smc-zf) are implemented.
 – mutual information: an information-theoretic distance measure.
Again it is proportional to the log-likelihood ratio (they differ by a 2n factor). The asymptotic χ2 test (mi-g), the Monte Carlo permutation test (mc-mi-g) and the sequential Monte Carlo permutation test (smc-mi-g) are implemented.
 – shrinkage estimator for the mutual information (mi-g-sh):
an improved asymptotic χ2 test based on the James-Stein estimator for the mutual information.
 hybrid case (mixed discrete and normal variables)
 – mutual information: an information-theoretic distance measure.
Again it is proportional to the log-likelihood ratio (they differ by a 2n factor). Only the asymptotic χ2 test (mi-cg) is implemented.

create_graph_from_data(data)
Run the algorithm on data.
 Parameters
data (pandas.DataFrame) – DataFrame containing the data
 Returns
Solution given by the algorithm.
 Return type
networkx.DiGraph

orient_directed_graph(data, graph)
Run the algorithm on a directed graph.
 Parameters
data (pandas.DataFrame) – DataFrame containing the data
graph (networkx.DiGraph) – Skeleton of the graph to orient
 Returns
Solution on the given skeleton.
 Return type
networkx.DiGraph
Warning
The algorithm is run on the skeleton of the given graph.
GS

class cdt.causality.graph.bnlearn.GS
Grow-Shrink algorithm.
Description: The Grow-Shrink algorithm is a constraint-based algorithm to recover Bayesian networks. It consists of two phases: a growing phase in which nodes are added to the Markov blanket based on conditional independence, and a shrinking phase in which the most irrelevant nodes are removed.
Required R packages: bnlearn
Data Type: Depends on the test used. See the list of available tests in the BNlearnAlgorithm section.
Assumptions: GS outputs a CPDAG, with additional assumptions depending on the conditional test used.
Note
Margaritis D (2003). Learning Bayesian Network Model Structure from Data. Ph.D. thesis, School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA. Available as Technical Report CMU-CS-03-153
Example
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.causality.graph import GS
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = GS()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the below commands:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
IAMB

class cdt.causality.graph.bnlearn.IAMB
IAMB algorithm.
Description: IAMB is a Bayesian constraint-based algorithm that recovers Markov blankets through a forward selection and a modified backward selection process.
Required R packages: bnlearn
Data Type: Depends on the test used. See the list of available tests in the BNlearnAlgorithm section.
Assumptions: IAMB outputs Markov blankets of nodes, with additional assumptions depending on the conditional test used.
Note
Tsamardinos I, Aliferis CF, Statnikov A (2003). “Algorithms for Large Scale Markov Blanket Discovery”. In “Proceedings of the Sixteenth International Florida Artificial Intelligence Research Society Conference”, pp. 376-381. AAAI Press.
Example
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.causality.graph import IAMB
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = IAMB()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the below commands:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
Fast_IAMB

class cdt.causality.graph.bnlearn.Fast_IAMB
Fast IAMB algorithm.
Description: Similar to IAMB, Fast-IAMB adds speculation to improve computational performance without affecting the accuracy of Markov blanket recovery.
Required R packages: bnlearn
Data Type: Depends on the test used. See the list of available tests in the BNlearnAlgorithm section.
Assumptions: Fast-IAMB outputs Markov blankets of nodes, with additional assumptions depending on the conditional test used.
Note
Yaramakala S, Margaritis D (2005). “Speculative Markov Blanket Discovery for Optimal Feature Selection”. In “ICDM ’05: Proceedings of the Fifth IEEE International Conference on Data Mining”, pp. 809-812. IEEE Computer Society.
Example
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.causality.graph import Fast_IAMB
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = Fast_IAMB()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the below commands:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
Inter_IAMB

class cdt.causality.graph.bnlearn.Inter_IAMB
Interleaved IAMB algorithm.
Description: Similar to IAMB, Interleaved-IAMB uses a progressive forward selection that minimizes false positives.
Required R packages: bnlearn
Data Type: Depends on the test used. See the list of available tests in the BNlearnAlgorithm section.
Assumptions: Inter-IAMB outputs Markov blankets of nodes, with additional assumptions depending on the conditional test used.
Note
Yaramakala S, Margaritis D (2005). “Speculative Markov Blanket Discovery for Optimal Feature Selection”. In “ICDM ’05: Proceedings of the Fifth IEEE International Conference on Data Mining”, pp. 809-812. IEEE Computer Society.
Example
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.causality.graph import Inter_IAMB
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = Inter_IAMB()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the below commands:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
MMPC

class cdt.causality.graph.bnlearn.MMPC
Max-Min Parents-Children algorithm.
Description: The Max-Min Parents-Children (MMPC) is a 2-phase algorithm with a forward pass and a backward pass. The forward phase recursively adds the variables that have the highest association with the target, conditionally on the already selected variables. The backward pass tests the d-separability of variables conditionally on the set and subsets of the selected variables.
Required R packages: bnlearn
Data Type: Depends on the test used. See the list of available tests in the BNlearnAlgorithm section.
Assumptions: MMPC outputs Markov blankets of nodes, with additional assumptions depending on the conditional test used.
Note
Tsamardinos I, Aliferis CF, Statnikov A (2003). “Time and Sample Efficient Discovery of Markov Blankets and Direct Causal Relations”. In “KDD ’03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining”, pp. 673-678. ACM.
Tsamardinos I, Brown LE, Aliferis CF (2006). “The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm”. Machine Learning, 65(1), 31-78.
Example
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> from cdt.causality.graph import MMPC
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = MMPC()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the below commands:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
CAM¶

class
cdt.causality.graph.
CAM
(score='nonlinear', cutoff=0.001, variablesel=True, selmethod='gamboost', pruning=False, prunmethod='gam', njobs=None, verbose=None)[source]¶ CAM algorithm [R model].
Description: Causal Additive Models (CAM), a causal discovery algorithm that relies on fitting Gaussian processes to the data, assuming additive noise and additive contributions of the variables.
Required R packages: CAM
Data Type: Continuous
Assumptions: The data follows a generalized additive noise model: each variable \(X_i\) in the graph \(\mathcal{G}\) is generated following the model \(X_i = \sum_{X_j \in \mathcal{G}} f(X_j) + \epsilon_i\), where the \(\epsilon_i\) are mutually independent noise variables accounting for unobserved variables.
 Parameters
score (str) – Score used to fit the Gaussian processes.
cutoff (float) – Threshold value for variable selection.
variablesel (bool) – Perform a variable selection step.
selmethod (str) – Method used for variable selection.
pruning (bool) – Perform an initial pruning step.
prunmethod (str) – Method used for pruning.
njobs (int) – Number of jobs to run in parallel.
verbose (bool) – Sets the verbosity of the output.
 Available scores:
nonlinear: ‘SEMGAM’
linear: ‘SEMLIN’
 Available variable selection methods:
gamboost: ‘selGamBoost’
gam: ‘selGam’
lasso: ‘selLasso’
linear: ‘selLm’
linearboost: ‘selLmBoost’
 Default Parameters:
FILE: ‘/tmp/cdt_CAM/data.csv’
SCORE: ‘SEMGAM’
VARSEL: ‘TRUE’
SELMETHOD: ‘selGamBoost’
PRUNING: ‘TRUE’
PRUNMETHOD: ‘selGam’
NJOBS: str(SETTINGS.NJOBS)
CUTOFF: str(0.001)
VERBOSE: ‘FALSE’
OUTPUT: ‘/tmp/cdt_CAM/result.csv’
Note
Ref: Bühlmann, P., Peters, J., & Ernest, J. (2014). CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics, 42(6), 2526–2556.
Warning
This implementation of CAM does not support starting with a graph. The adaptation will be made at a later date.
Example
>>> import networkx as nx
>>> from cdt.causality.graph import CAM
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = CAM()
>>> output = obj.predict(data)
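The additive noise assumption stated above can be made concrete with a small simulation. The following is a minimal sketch, not part of cdt: the three-variable graph, the functions applied to each parent and the noise scale are all hypothetical choices made for illustration, and only the Python standard library is used.

```python
import math
import random


def sample_additive_model(n_samples, seed=0):
    """Draw samples from a toy additive noise model (illustrative only).

    Hypothetical graph: X0 -> X1, X0 -> X2, X1 -> X2.
    Each variable is a sum of functions of its parents plus independent noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        x0 = rng.gauss(0.0, 1.0)                              # root: noise only
        x1 = math.sin(x0) + rng.gauss(0.0, 0.1)               # one parent
        x2 = x0 ** 2 + math.tanh(x1) + rng.gauss(0.0, 0.1)    # two additive parents
        rows.append((x0, x1, x2))
    return rows


rows = sample_additive_model(500)
```

Data generated this way satisfies the model form that CAM assumes; converting `rows` to a `pandas.DataFrame` would give an input in the format expected by `predict`.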
CCDr¶

class
cdt.causality.graph.
CCDr
(verbose=None)[source]¶ CCDr algorithm [R model].
Description: CCDr (Concave penalized Coordinate Descent with reparametrization) structure learning algorithm as described in Aragam and Zhou (2015). This is a fast, score-based method for learning Bayesian networks that uses sparse regularization and block-cyclic coordinate descent.
Required R packages: sparsebn
Data Type: Continuous
Assumptions: This model does not restrict or prune the search space in any way, does not assume faithfulness, does not require a known variable ordering, works on observational data (i.e. without experimental interventions), works effectively in high dimensions, and is capable of handling graphs with several thousand variables. The output of this model is a DAG.
Imported from the ‘sparsebn’ package.
Warning
This implementation of CCDr does not support starting with a graph.
Note
Ref: Aragam, B., & Zhou, Q. (2015). Concave penalized estimation of sparse Gaussian Bayesian networks. Journal of Machine Learning Research, 16, 2273–2328.
Example
>>> import networkx as nx
>>> from cdt.causality.graph import CCDr
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = CCDr()
>>> output = obj.predict(data)
CGNN¶

class
cdt.causality.graph.
CGNN
(nh=20, nruns=16, njobs=None, gpus=None, batch_size=-1, lr=0.01, train_epochs=1000, test_epochs=1000, verbose=None, dataloader_workers=0)[source]¶ Causal Generative Neural Networks
Description: Causal Generative Neural Networks. Score-based method that evaluates candidate graphs by generating data following the topological order of the graph using neural networks, and using the Maximum Mean Discrepancy (MMD) for evaluation.
Data Type: Continuous
Assumptions: The class of generative models is not restricted with a hard constraint, but with the hyperparameter nh. This algorithm greatly benefits from bootstrapped runs (nruns >= 12 recommended) and is computationally very heavy. GPUs are recommended.
 Parameters
nh (int) – Number of hidden units in each generative neural network.
nruns (int) – Number of times to run CGNN to have a stable evaluation.
njobs (int) – Number of jobs to run in parallel. Defaults to cdt.SETTINGS.NJOBS.
gpus (bool) – Number of available GPUs (initialized with cdt.SETTINGS.GPU).
batch_size (int) – Batch size; defaults to full-batch.
lr (float) – Learning rate for the generative neural networks.
train_epochs (int) – Number of epochs used to train the network.
test_epochs (int) – Number of epochs during which the results are harvested. The network still trains at this stage.
verbose (bool) – Sets the verbosity of the execution. Defaults to cdt.SETTINGS.verbose.
dataloader_workers (int) – Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
Note
Ref: Olivier Goudet, Diviyan Kalainathan et al. Learning Functional Causal Models with Generative Neural Networks (https://arxiv.org/abs/1709.05321)
Note
The input data can be of type torch.utils.data.Dataset, or it defaults to cdt.utils.io.MetaDataset. This class is overridable to write custom data loading functions, useful for very large datasets.
Example
>>> import matplotlib.pyplot as plt
>>> import networkx as nx
>>> from cdt.causality.graph import CGNN
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = CGNN()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the commands below:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

create_graph_from_data
(data)[source]¶ Use CGNN to create a graph from scratch. All possible structures are tested, which leads to super-exponential complexity. For large graphs, it is preferable to start from a graph skeleton.
 Parameters
data (pandas.DataFrame or torch.utils.data.Dataset) – Observational data on which causal discovery has to be performed.
 Returns
Solution given by CGNN.
 Return type
networkx.DiGraph

orient_directed_graph
(data, dag, alg='HC')[source]¶ Modify and improve a directed acyclic graph solution using CGNN.
 Parameters
data (pandas.DataFrame or torch.utils.data.Dataset) – Observational data on which causal discovery has to be performed.
dag (nx.DiGraph) – Graph that provides the initial solution, on which the CGNN algorithm will be applied.
alg (str) – Exploration heuristic to use, only “HC” is supported for now.
 Returns
Solution given by CGNN.
 Return type
networkx.DiGraph

orient_undirected_graph
(data, umg, alg='HC')[source]¶ Orient the undirected graph using GNN and apply CGNN to improve the graph.
 Parameters
data (pandas.DataFrame) – Observational data on which causal discovery has to be performed.
umg (nx.Graph) – Graph that provides the skeleton, on which GNN and then CGNN will be applied.
alg (str) – Exploration heuristic to use, only “HC” is supported for now.
 Returns
Solution given by CGNN.
 Return type
networkx.DiGraph
Note
GNN (cdt.causality.pairwise.GNN) is first used to orient the undirected graph and output a DAG before applying CGNN.
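CGNN scores a candidate graph by comparing generated data with observed data using MMD. As a rough illustration of that criterion, the sketch below computes the biased (V-statistic) estimate of the squared MMD with a single Gaussian kernel. The function names and the single fixed bandwidth are assumptions made for this example; they do not describe how cdt computes its loss internally.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(u, v):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in u for b in v)
        return total / (len(u) * len(v))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


observed = [(0.0,), (1.0,), (2.0,)]
generated = [(5.0,), (6.0,), (7.0,)]
score = mmd2(observed, generated)  # larger values mean the samples differ more
```

The estimate is zero when both samples are identical and grows as the two empirical distributions move apart, which is what makes it usable as a score to minimize.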
GES¶

class
cdt.causality.graph.
GES
(score='obs', verbose=None)[source]¶ GES algorithm [R model].
Description: Greedy Equivalence Search algorithm. A score-based Bayesian algorithm that heuristically searches for the graph that minimizes a likelihood score on the data.
Required R packages: pcalg
Data Type: Continuous (score='obs') or Categorical (score='int')
Assumptions: The output is a Partially Directed Acyclic Graph (PDAG) (a Markov equivalence class). The available scores assume linearity of mechanisms and Gaussianity of the data.
 Parameters
score (str) – Sets the score used by GES.
verbose (bool) – Defaults to cdt.SETTINGS.verbose.
 Available scores:
int: GaussL0penIntScore
obs: GaussL0penObsScore
Note
Ref: D.M. Chickering (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research 3, 507–554.
A. Hauser and P. Bühlmann (2012). Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research 13, 2409–2464.
P. Nandy, A. Hauser and M. Maathuis (2015). Understanding consistency in hybrid causal structure learning. arXiv preprint 1507.02608
P. Spirtes, C.N. Glymour, and R. Scheines (2000). Causation, Prediction, and Search, MIT Press, Cambridge (MA)
Example
>>> import matplotlib.pyplot as plt
>>> import networkx as nx
>>> from cdt.causality.graph import GES
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = GES()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the commands below:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

create_graph_from_data
(data)[source]¶ Run the GES algorithm.
 Parameters
data (pandas.DataFrame) – DataFrame containing the data
 Returns
Solution given by the GES algorithm.
 Return type
networkx.DiGraph
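GES returns a Markov equivalence class rather than a single DAG. To illustrate what that means, the sketch below checks whether two DAGs are Markov equivalent using the classical criterion of same skeleton and same v-structures. The helper names and the edge-list representation are choices made for this example and are not part of cdt.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a list of directed edges (parent, child)."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Colliders p -> c <- q where p and q are not adjacent."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    skel = skeleton(edges)
    found = set()
    for child, pars in parents.items():
        for p, q in combinations(sorted(pars), 2):
            if frozenset((p, q)) not in skel:
                found.add((p, child, q))
    return found


def markov_equivalent(edges_a, edges_b):
    """True if both DAGs share the same skeleton and the same v-structures."""
    return (skeleton(edges_a) == skeleton(edges_b)
            and v_structures(edges_a) == v_structures(edges_b))


chain = [("a", "b"), ("b", "c")]       # a -> b -> c
reversed_chain = [("c", "b"), ("b", "a")]  # c -> b -> a
collider = [("a", "b"), ("c", "b")]    # a -> b <- c
```

Here `chain` and `reversed_chain` belong to the same class and cannot be told apart from observational data alone, whereas `collider` forms its own class. Edges whose direction differs within a class are the ones left undirected in a PDAG.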
GIES¶

class
cdt.causality.graph.
GIES
(score='obs', verbose=False)[source]¶ GIES algorithm [R model].
Description: Greedy Interventional Equivalence Search algorithm. A score-based Bayesian algorithm that heuristically searches for the graph that minimizes a likelihood score on the data. The main difference with GES is that it accepts interventional data for its inference.
Required R packages: pcalg
Data Type: Continuous (score='obs') or Categorical (score='int')
Assumptions: The output is a Partially Directed Acyclic Graph (PDAG) (a Markov equivalence class). The available scores assume linearity of mechanisms and Gaussianity of the data.
 Parameters
score (str) – Sets the score used by GIES.
verbose (bool) – Defaults to cdt.SETTINGS.verbose.
 Available scores:
int: GaussL0penIntScore
obs: GaussL0penObsScore
Note
Ref: D.M. Chickering (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research 3, 507–554.
A. Hauser and P. Bühlmann (2012). Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research 13, 2409–2464.
P. Nandy, A. Hauser and M. Maathuis (2015). Understanding consistency in hybrid causal structure learning. arXiv preprint 1507.02608
P. Spirtes, C.N. Glymour, and R. Scheines (2000). Causation, Prediction, and Search, MIT Press, Cambridge (MA)
Example
>>> import matplotlib.pyplot as plt
>>> import networkx as nx
>>> from cdt.causality.graph import GIES
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = GIES()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the commands below:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

create_graph_from_data
(data)[source]¶ Run the GIES algorithm.
 Parameters
data (pandas.DataFrame) – DataFrame containing the data
 Returns
Solution given by the GIES algorithm.
 Return type
networkx.DiGraph
LiNGAM¶

class
cdt.causality.graph.
LiNGAM
(verbose=False)[source]¶ LiNGAM algorithm [R model].
Description: Linear Non-Gaussian Acyclic Model. LiNGAM handles linear structural equation models, where each variable is modeled as \(X_j = \sum_k \alpha_k P_a^{k}(X_j) + E_j, j \in [1,d]\), with \(P_a^{k}(X_j)\) the \(k\)-th parent of \(X_j\) and \(\alpha_k\) a real scalar.
Required R packages: pcalg
Data Type: Continuous
Assumptions: The underlying causal model is assumed to be composed of linear mechanisms with non-Gaussian noise. Under those assumptions, it is shown that the causal structure is fully identifiable (even inside the Markov equivalence class).
 Parameters
verbose (bool) – Sets the verbosity of the algorithm. Defaults to cdt.SETTINGS.verbose
Note
Ref: S. Shimizu, P.O. Hoyer, A. Hyvärinen, A. Kerminen (2006). A Linear Non-Gaussian Acyclic Model for Causal Discovery. Journal of Machine Learning Research 7, 2003–2030.
Warning
This implementation of LiNGAM does not support starting with a graph.
Example
>>> import networkx as nx
>>> from cdt.causality.graph import LiNGAM
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = LiNGAM()
>>> output = obj.predict(data)
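The structural equation in the description above can be illustrated by sampling from a linear model with non-Gaussian noise. This is a minimal sketch under stated assumptions: the function name, the dictionary-based encoding of the coefficients and the choice of uniform noise are hypothetical and are not taken from cdt or pcalg.

```python
import random


def sample_linear_sem(order, weights, n_samples, seed=0):
    """Sample from a linear SEM with uniform (hence non-Gaussian) noise.

    order   -- variable names in a topological order of the DAG
    weights -- weights[child][parent] is the coefficient alpha of that parent
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        row = {}
        for node in order:
            noise = rng.uniform(-1.0, 1.0)
            parent_terms = sum(coef * row[parent]
                               for parent, coef in weights.get(node, {}).items())
            row[node] = parent_terms + noise
        samples.append(row)
    return samples


# Hypothetical two-variable model: x -> y with coefficient 2.0
samples = sample_linear_sem(["x", "y"], {"y": {"x": 2.0}}, n_samples=300)
```

Because the noise is not Gaussian, the direction `x -> y` is in principle identifiable from such data, which is the property LiNGAM exploits.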
PC¶

class
cdt.causality.graph.
PC
(CItest='gaussian', method_indep='corr', alpha=0.01, njobs=None, verbose=None)[source]¶ PC algorithm [R model].
Description: PC (Peter-Clark) is one of the most famous constraint-based approaches for causal discovery. It relies on conditional independence tests on variables and sets of variables, and has proved to be very efficient.
Required R packages: pcalg, kpcalg, RCIT (variant, see notes)
Data Type: Continuous and discrete
Assumptions: The complexity of this approach grows rapidly with the number of variables, even for quick tests. Consider graphs with fewer than 200 variables. The model assumptions made by this approach mainly depend on the type of test used. Kernel-based tests are also available. The prediction of PC is a CPDAG (identifiability up to the Markov equivalence class).
 Parameters
CItest (str) – Test for conditional independence.
alpha (float) – Significance level (number in (0, 1)) for the individual conditional independence tests.
njobs (int) – Number of processor cores to use for parallel computation. Only available for method = “stable.fast” (set as default).
verbose (bool) – If True, detailed output is provided.
 Variables
arguments (dict) – contains all current parameters used in the PC algorithm execution.
dir_CI_test (dict) – contains all available conditional independence tests.
dir_method_indep (dict) – contains all available heuristics for CI testing.
 Available CI tests:
binary: ‘data=X, ic.method="dcc"’
discrete: ‘data=X, ic.method="dcc"’
hsic_gamma: ‘data=X, ic.method="hsic.gamma"’
hsic_perm: ‘data=X, ic.method="hsic.perm"’
hsic_clust: ‘data=X, ic.method="hsic.clust"’
gaussian: ‘C = cor(X), n = nrow(X)’
rcit: ‘data=X, ic.method="RCIT::RCIT"’
rcot: ‘data=X, ic.method="RCIT::RCoT"’
 Default Parameters:
FILE: ‘/tmp/cdt_pc/data.csv’
SKELETON: ‘FALSE’
EDGES: ‘/tmp/cdt_pc/fixededges.csv’
GAPS: ‘/tmp/cdt_pc/fixedgaps.csv’
CITEST: “pcalg::gaussCItest”
METHOD_INDEP: “C = cor(X), n = nrow(X)”
SELMAT: ‘NULL’
DIRECTED: ‘TRUE’
SETOPTIONS: ‘NULL’
ALPHA: ‘0.01’
VERBOSE: ‘FALSE’
OUTPUT: ‘/tmp/cdt_pc/result.csv’
Note
Ref: D. Colombo and M.H. Maathuis (2014). Order-independent constraint-based causal structure learning. Journal of Machine Learning Research 15, 3741–3782.
M. Kalisch, M. Maechler, D. Colombo, M.H. Maathuis and P. Buehlmann (2012). Causal Inference Using Graphical Models with the R Package pcalg. Journal of Statistical Software 47(11) 1–26, http://www.jstatsoft.org/v47/i11/
M. Kalisch and P. Buehlmann (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. JMLR 8, 613–636.
J. Ramsey, J. Zhang and P. Spirtes (2006). Adjacency-faithfulness and conservative causal inference. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence. AUAI Press, Arlington, VA.
P. Spirtes, C. Glymour and R. Scheines (2000). Causation, Prediction, and Search, 2nd edition. The MIT Press
Strobl, E. V., Zhang, K., & Visweswaran, S. (2017). Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery. arXiv preprint arXiv:1702.03877.
Imported from the Pcalg package.
The RCIT package has been adapted to fit the CDT package, please use the variant available at https://github.com/DiviyanKalainathan/RCIT
Example
>>> import matplotlib.pyplot as plt
>>> import networkx as nx
>>> from cdt.causality.graph import PC
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = PC()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the commands below:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

create_graph_from_data
(data, **kwargs)[source]¶ Run the PC algorithm.
 Parameters
data (pandas.DataFrame) – DataFrame containing the data
 Returns
Solution given by PC on the given data.
 Return type
networkx.DiGraph

orient_directed_graph
(data, graph, *args, **kwargs)[source]¶ Run PC on a directed graph (only the skeleton of the graph is taken into account).
 Parameters
data (pandas.DataFrame) – DataFrame containing the data
graph (networkx.DiGraph) – Skeleton of the graph to orient
 Returns
Solution given by PC on the given skeleton.
 Return type
networkx.DiGraph
Warning
The algorithm is run on the skeleton of the given graph.
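PC builds its graph from conditional independence tests such as the default Gaussian test. The sketch below shows the usual idea behind such a test: a (partial) correlation is Fisher z-transformed and compared with a standard normal distribution. It is an illustration only, written with the standard library and limited to conditioning sets of size zero or one; the function names are hypothetical and this is not the pcalg implementation.

```python
import math


def corr(x, y):
    """Pearson correlation between two equally long lists of floats."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(x, y, z=None):
    """Two-sided p-value for zero (partial) correlation of x and y given z.

    Assumes jointly Gaussian data and that z, if given, is not perfectly
    correlated with x or y.
    """
    n = len(x)
    r = corr(x, y)
    n_cond = 0
    if z is not None:
        rxz, ryz = corr(x, z), corr(y, z)
        # First-order partial correlation of x and y given z
        r = (r - rxz * ryz) / math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))
        n_cond = 1
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z_stat = math.sqrt(n - n_cond - 3) * abs(0.5 * math.log((1.0 + r) / (1.0 - r)))
    return math.erfc(z_stat / math.sqrt(2.0))
```

An edge between two variables is kept when the p-value stays below the significance level `alpha` for every conditioning set tried, and removed as soon as one conditioning set yields a p-value above it.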
SAM¶

class
cdt.causality.graph.
SAM
(lr=0.01, dlr=0.001, mixed_data=False, lambda1=10, lambda2=0.001, nh=20, dnh=200, train_epochs=3000, test_epochs=1000, batch_size=-1, losstype='fgan', dagloss=True, dagstart=0.5, dagpenalization=0, dagpenalization_increase=0.01, functional_complexity='l2_norm', hlayers=2, dhlayers=2, sampling_type='sigmoidproba', linear=False, nruns=8, njobs=None, gpus=None, verbose=None)[source]¶ SAM Algorithm.
Description: Structural Agnostic Model is a causal discovery algorithm for DAG recovery leveraging both distributional asymmetries and conditional independencies. The first version of SAM, without the DAG constraint, is available as SAMv1.
Data Type: Continuous, (Mixed - Experimental)
Assumptions: The class of generative models is not restricted with a hard constraint, but with soft constraints parametrized with the lambda1 and lambda2 parameters, with Gumbel softmax sampling. This algorithm greatly benefits from bootstrapped runs (nruns >= 8 recommended). GPUs are recommended but not compulsory. The output is a DAG, but may need thresholding as the output is averaged over multiple runs.
 Parameters
lr (float) – Learning rate of the generators
dlr (float) – Learning rate of the discriminator
mixed_data (bool) – Experimental – Enable for mixed-type datasets
lambda1 (float) – L0 penalization coefficient on the causal filters
lambda2 (float) – L2 penalization coefficient on the weights of the neural network
nh (int) – Number of hidden units in the generators’ hidden layers (regularized with lambda2)
dnh (int) – Number of hidden units in the discriminator’s hidden layers
train_epochs (int) – Number of training epochs
test_epochs (int) – Number of test epochs (saving and averaging the causal filters)
batch_size (int) – Size of the batches to be fed to the SAM model. Defaults to full-batch.
losstype (str) – Type of the loss to be used (either ‘fgan’ (default), ‘gan’ or ‘mse’)
dagloss (bool) – Activate the No-TEARS DAG constraint
dagstart (float) – Controls when the DAG constraint is to be introduced in the training (float ranging from 0 to 1, 0 denotes the start of the training and 1 the end)
dagpenalization (float) – Initial value of the DAG constraint
dagpenalization_increase (float) – Incremental increase, at each epoch, of the coefficient of the constraint
functional_complexity (str) – Type of functional complexity penalization (choose between ‘l2_norm’ and ‘n_hidden_units’)
hlayers (int) – Defines the number of hidden layers in the generators
dhlayers (int) – Defines the number of hidden layers in the discriminator
sampling_type (str) – Type of sampling used in the structural gates of the model (choose between ‘sigmoid’, ‘sigmoid_proba’ and ‘gumble_proba’)
linear (bool) – If true, all generators are set to be linear generators
nruns (int) – Number of runs to be made for causal estimation. Recommended: >= 32 for optimal performance.
njobs (int) – Number of jobs to be run in parallel. Recommended: 1 if no GPU is available, 2 * number of GPUs otherwise.
gpus (int) – Number of available GPUs for the algorithm
verbose (bool) – verbose mode
Note
Ref: Kalainathan, Diviyan & Goudet, Olivier & Guyon, Isabelle & Lopez-Paz, David & Sebag, Michèle. (2018). Structural Agnostic Modeling: Adversarial Learning of Causal Graphs.
Example
>>> import matplotlib.pyplot as plt
>>> import networkx as nx
>>> from cdt.causality.graph import SAM
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = SAM()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the commands below:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()
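The dagloss parameter refers to a No-TEARS style acyclicity constraint. In the No-TEARS formulation, a weighted adjacency matrix \(A\) with \(d\) nodes is acyclic exactly when \(h(A) = \mathrm{tr}(e^{A \circ A}) - d = 0\). The sketch below evaluates this quantity with a truncated power series in pure Python. It illustrates the constraint only; how SAM evaluates and schedules its penalty internally is not shown here, and the function name and truncation length are assumptions.

```python
def acyclicity_penalty(adj, n_terms=20):
    """Approximate h(A) = tr(exp(A * A)) - d, with * the elementwise product.

    adj is a square list of lists; adj[i][j] != 0 means an edge i -> j.
    """
    d = len(adj)
    squared = [[value * value for value in row] for row in adj]
    # power holds squared**k, starting from the identity matrix (k = 0)
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace_exp = float(d)  # trace of the k = 0 term
    factorial = 1.0
    for k in range(1, n_terms + 1):
        power = [[sum(power[i][m] * squared[m][j] for m in range(d))
                  for j in range(d)] for i in range(d)]
        factorial *= k
        trace_exp += sum(power[i][i] for i in range(d)) / factorial
    return trace_exp - d


dag = [[0.0, 1.0, 1.0],
       [0.0, 0.0, 1.0],
       [0.0, 0.0, 0.0]]
cycle = [[0.0, 1.0],
         [1.0, 0.0]]
```

The penalty is zero for `dag` and strictly positive for `cycle`, so adding it to a training loss with an increasing coefficient pushes the learned structure towards acyclicity.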

predict
(data, graph=None, return_list_results=False)[source]¶ Execute SAM on a dataset given a skeleton or not.
 Parameters
data (pandas.DataFrame) – Observational data for estimation of causal relationships by SAM
graph (numpy.ndarray) – A priori knowledge (skeleton) about the causal relationships, as an adjacency matrix. Links may be directed or undirected.
 Returns
Graph estimated by SAM, where A[i,j] is the term of the i-th variable for the j-th generator.
 Return type
networkx.DiGraph
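Since the returned graph is averaged over several runs, its edges carry scores rather than hard decisions, and a threshold may be needed to obtain a final structure. The helper below is a hypothetical sketch, not a cdt function: it filters an iterable of `(source, target, weight)` triples and keeps the edges whose weight exceeds a chosen threshold. The threshold value of 0.5 is an arbitrary example.

```python
def threshold_edges(weighted_edges, threshold=0.5):
    """Keep the (source, target) pairs whose weight is above the threshold.

    weighted_edges is an iterable of (source, target, weight) triples;
    edges without a weight (None) are dropped.
    """
    return [(source, target)
            for source, target, weight in weighted_edges
            if weight is not None and weight > threshold]


# Hypothetical averaged scores for three candidate edges
scores = [("a", "b", 0.9), ("b", "c", 0.2), ("c", "a", 0.5)]
kept = threshold_edges(scores, threshold=0.5)
```

With a `networkx.DiGraph` named `output`, such triples can be obtained from `output.edges(data="weight")`, assuming the averaged scores are stored under the `weight` edge attribute.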
SAMv1¶

class
cdt.causality.graph.
SAMv1
(lr=0.1, dlr=0.1, l1=0.1, nh=50, dnh=200, train_epochs=1000, test_epochs=1000, batch_size=-1, nruns=6, njobs=None, gpus=None, verbose=None)[source]¶ SAM Algorithm. Implementation of the first version of the SAM algorithm, available at https://arxiv.org/abs/1803.04929v1.
Description: Structural Agnostic Model is a fully differentiable causal discovery algorithm leveraging both distributional asymmetries and conditional independencies.
Data Type: Continuous
Assumptions: The class of generative models is not restricted with a hard constraint, but with the hyperparameter nh. This algorithm greatly benefits from bootstrapped runs (nruns >= 8 recommended). GPUs are recommended but not compulsory. The output is not a DAG.
 Parameters
lr (float) – Learning rate of the generators
dlr (float) – Learning rate of the discriminator
l1 (float) – L1 penalization on the causal filters
nh (int) – Number of hidden units in the generators’ hidden layers
dnh (int) – Number of hidden units in the discriminator’s hidden layers
train_epochs (int) – Number of training epochs
test_epochs (int) – Number of test epochs (saving and averaging the causal filters)
batch_size (int) – Size of the batches to be fed to the SAM model.
nruns (int) – Number of runs to be made for causal estimation. Recommended: >=12 for optimal performance.
njobs (int) – Number of jobs to be run in parallel. Recommended: 1 if no GPU is available, 2 * number of GPUs otherwise.
gpus (int) – Number of available GPUs for the algorithm.
verbose (bool) – verbose mode
Note
Ref: Kalainathan, Diviyan & Goudet, Olivier & Guyon, Isabelle & Lopez-Paz, David & Sebag, Michèle. (2018). SAM: Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning.
Example
>>> import matplotlib.pyplot as plt
>>> import networkx as nx
>>> from cdt.causality.graph import SAMv1
>>> from cdt.data import load_dataset
>>> data, graph = load_dataset("sachs")
>>> obj = SAMv1()
>>> # The predict() method works without a graph, or with a
>>> # directed or undirected graph provided as an input
>>> output = obj.predict(data)  # No graph provided as an argument
>>>
>>> output = obj.predict(data, nx.Graph(graph))  # With an undirected graph
>>>
>>> output = obj.predict(data, graph)  # With a directed graph
>>>
>>> # To view the graph created, run the commands below:
>>> nx.draw_networkx(output, font_size=8)
>>> plt.show()

predict
(data, graph=None, return_list_results=False)[source]¶ Execute SAM on a dataset given a skeleton or not.
 Parameters
data (pandas.DataFrame) – Observational data for estimation of causal relationships by SAM
graph (numpy.ndarray) – A priori knowledge (skeleton) about the causal relationships, as an adjacency matrix. Links may be directed or undirected.
 Returns
Graph estimated by SAM, where A[i,j] is the term of the i-th variable for the j-th generator.
 Return type
networkx.DiGraph