Get started
This section focuses on explaining the general functionalities of the package, and how its components interact with each other. Then two tutorials are provided:
For an installation guide, please check here
Package Description
General package architecture
The Causal Discovery Toolbox is a package for causal discovery in the observational setting. Therefore support for data with interventions is not available at the moment, but is considered for later versions.
The package is structured in 5 modules:
Causality:
cdt.causality
implements algorithms for causal discovery, either in the pairwise setting or the graph setting.Independence:
cdt.independence
includes methods to recover the dependence graph of the data.Data:
cdt.data
provides the user with tools to generate data, and load benchmark data.Utils:
cdt.utils
provides tools to the users for model construction, graph utilities and settings.Metrics:
cdt.metrics
includes scoring metrics for graphs, taking as inputnetworkx.DiGraph
All methods for computation adopt a ‘scikit-learn’ like interface, where .predict()
manages to launch the algorithm on the given data to the toolbox, .fit()
allows
to train learning algorithms Most of the algorithms are classes, and their
parameters can be customized in the .__init__()
function of the class.
Note
The .predict()
function is often implemented in the base class
(cdt.causality.graph.GraphModel
for causal graph algorithms).
.predict()
is often a wrapper calling sub-functions depending on the
arguments fed to the functions. The sub-functions, such as
.orient_directed_graph()
for cdt.causality.graph
models (which is
called when a directed graph is fed as a second argument ), are
implemented and documented in the various algorithms.
Hardware and algorithm settings
The toolbox has a SETTINGS class that defines the hardware settings. Those
settings are unique and their default parameters are defined in
cdt/utils/Settings
.
These parameters are accessible and overridable via accessing the class:
>>> import cdt
>>> cdt.SETTINGS
Moreover, the hardware parameters are detected and defined automatically
(including number of GPUs, CPUs, available optional packages) at the import
of the package using the cdt.utils.Settings.autoset_settings
method, ran at
startup.
These settings are overriddable in two ways:
By changing
cdt.SETTINGS
attributes, thus changing the SETTINGS for the whole python session.By changing the parameters of the functions/classes used. When their default value in the class definition is
None
, thecdt.SETTINGS
value is taken, by using thecdt.SETTINGS.get_default
function. This allows for quick and temporary parameter change.
The graph class
The whole package revolves around using the graph classes of the networkx
package.
Most of the methods have the option of predicting a directed graph
(networkx.DiGraph) or an undirected graph (networkx.Graph).
The networkx
library might not be intuitive to use at first, but it comes
with many useful tools for graphs. Here is a list of handy function, to
understand the output of the toolbox’s outputs:
>>> import networkx as nx
>>> g = nx.DiGraph() # initialize a directed graph
>>> l = list(g.nodes()) # list of nodes in the graph
>>> a = nx.adj_matrix(g).todense() # Output the adjacency matrix of the graph
>>> e = list(g.edges()) # list of edges in the graph
Please refer to networkx ‘s documentation for more detailed information: https://networkx.github.io/documentation/stable/