Developer Documentation
This project is an open-source community project, hosted on GitHub at the following address: https://github.com/FenTechSolutions/CausalDiscoveryToolbox
We abide by the Python Software Foundation's principles of openness, respect, and consideration of others: https://www.python.org/psf/codeofconduct/
Bug reporting
You may encounter a bug while using this package. To fix it and improve every user's experience, please submit a bug report on the GitHub issue tracker: https://github.com/FenTechSolutions/CausalDiscoveryToolbox/issues
When reporting a bug, please mention:
Your cdt package version or Docker image tag.
Your Python version.
Your PyTorch version.
Your hardware configuration, including whether GPUs are available.
The full traceback of the error, if one is raised.
A small code snippet to reproduce the bug, if the description alone is not explicit enough (see the example below).
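For example, a minimal reproducing snippet could look like the following; the dataset and algorithm used here are purely illustrative, so replace them with the ones triggering your bug:

import cdt
from cdt.data import load_dataset
from cdt.causality.graph import GES

print(cdt.__version__)  # the package version mentioned in the report
data, graph = load_dataset('sachs')  # any small dataset reproducing the issue
model = GES()
output = model.predict(data)  # the call raising the reported error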
Contributing
The recommended way to contribute to the Causal Discovery Toolbox is to submit a pull request on the dev branch of https://github.com/FenTechSolutions/CausalDiscoveryToolbox
To submit a pull request, the following are required:
An up-to-date fork of the repository and a Python 3 installation.
Clone your forked version of the code locally and install it in developer mode, in a separate Python environment (e.g. an Anaconda environment):
$ conda create --name cdt_dev python=3.6 numpy scipy scikit-learn
$ source activate cdt_dev
$ git clone git@github.com:YourLogin/CausalDiscoveryToolbox.git
$ cd CausalDiscoveryToolbox
$ git checkout dev
$ python setup.py develop
where python refers to your Python 3 installation.
Make your changes to the source code of the package.
Test your changes using pytest:
$ cd CausalDiscoveryToolbox
$ pip install pytest
$ pytest
If the tests pass, commit and push your changes:
$ git add .
$ git commit -m "[DEV] Your commit message"
$ git push -u origin dev
The commits must begin with a tag defining the main purpose of the commit. Examples of tags are:
[DEV] for development
[TRAVIS] for changes to the continuous integration
[DOC] for documentation
[TEST] for testing and coverage
[FIX] for bugfixes
[REL] and [MREL] are reserved names for releases and major releases; they trigger package version updates on the continuous integration.
[DEPLOY] is a reserved tag for the continuous integration to upload its changes.
Please check that your pull request complies with all the rules of the checklist:
Respected the design patterns of the package: used the networkx.DiGraph class and the cdt.Settings module, inherited from the model classes (a minimal sketch follows this list), and verified that the new functionalities import correctly.
Added documentation to your added functionalities (see the following section).
Added corresponding tests for the added functions/classes in /tests/scripts.
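As an illustration of this design, a new graph recovery algorithm could follow the sketch below. The base-class path and method names reflect our reading of the current package layout; double-check them against the source before relying on them:

import networkx as nx
from cdt.causality.graph.model import GraphModel  # assumed base-class path

class MyAlgorithm(GraphModel):
    """Minimal skeleton of a new graph recovery algorithm."""

    def create_graph_from_data(self, data):
        # Recover a causal graph from raw data
        return nx.DiGraph()

    def orient_undirected_graph(self, data, graph):
        # Orient the edges of an undirected skeleton
        return nx.DiGraph(graph)

    def orient_directed_graph(self, data, graph):
        # Reorient an already directed graph
        return nx.DiGraph(graph)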
Finally, submit your pull request using the GitHub website.
Dependencies
The package should remain as independent of other packages as possible, since it already depends on many libraries. Therefore, any contribution requiring a new dependency will be closely scrutinized.
Two types of dependencies are possible for now:
Python dependencies, defined in requirements.txt and setup.py
R dependencies, defined in r_requirements.txt
Warning
For R dependencies, the Docker base images have to be rebuilt, so the core maintainers of the package must be notified for the Docker images to be updated.
Documentation
The documentation of the package is automatically generated using Sphinx, by parsing the docstrings of functions and classes, as defined in /docs/index.md and the /docs/*.rst files. To add a new function to the documentation, add the corresponding entry in the relevant .rst file. The documentation is automatically built and updated online by the Continuous Integration tool at each push on the master branch.
When writing your docstrings, please use the Google Style format: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
Your docstrings must include:
A presentation of the functionality
A detailed description of the arguments and returns
A scientific source in a .. note:: directive, if applicable
A short example (a sketch of a compliant docstring follows this list)
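For instance, a docstring respecting these rules could look like the following; the function itself is hypothetical:

def score_edge(data, source, target):
    """Compute a dependence score between two variables.

    Args:
        data (pandas.DataFrame): Dataset containing the variables.
        source (str): Name of the candidate cause column.
        target (str): Name of the candidate effect column.

    Returns:
        float: The computed dependence score.

    .. note::
        Ref: the paper introducing the score, if applicable.

    Example:
        >>> score_edge(data, 'A', 'B')
        0.42
    """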
Testing
The package is thoroughly tested using pytest, with codecov for code coverage. Tests are run by a Continuous Integration tool for each push on master/dev and for pull requests, providing users with immediate feedback.
The test scripts are included in the GitHub repository at /tests/scripts, and some sample data to apply the functions on can be found in /tests/datasets.
To write new test functions, either add a new Python file or extend an existing one, and add a function whose name begins with test_. This allows pytest to detect the new test function automatically (see the sketch below). New test functions must provide optimal code coverage of the tested functionalities, and must check both imports and the coherence of results.
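A new test function could look like the following sketch; the algorithm under test and the dataset file name are illustrative, so use an existing file from /tests/datasets:

import networkx as nx
import pandas as pd
from cdt.causality.graph import GES  # also checks that the import works

def test_ges_output_coherence():
    # Hypothetical sample file; pick a real one from /tests/datasets
    data = pd.read_csv('tests/datasets/example_data.csv')
    output = GES().predict(data)
    # Result coherence: a directed graph with one node per input variable
    assert isinstance(output, nx.DiGraph)
    assert set(output.nodes()) == set(data.columns)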
Continuous Integration
Continuous integration (Travis CI) is enabled on this project; it allows for:
Testing new code with pytest and uploading the code coverage results to https://codecov.io/gh/FenTechSolutions/CausalDiscoveryToolbox
Bumping a new version of the package and pushing it to GitHub
Building new Docker images and pushing them to https://hub.docker.com/u/fentech
Pushing the new package version to PyPI
Compiling the new documentation and uploading its website
All the tasks described above are defined in the .travis.yml file.
R integration
One of this project's main features is wrapping R libraries. To do this as efficiently as possible, the R tasks are executed in a separate process from the main Python process, thus freeing the computation from the GIL.
A /tmp/ folder is used as a buffer, and everything is executed with the subprocess library. Check out cdt.utils.R for more detailed information.
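The actual implementation lives in cdt.utils.R; the following is only a schematic sketch of the pattern (a temporary folder used as a buffer, with R launched through subprocess), not the package's exact code:

import os
import subprocess
import tempfile

def run_r_task(r_script, input_csv):
    # Use a temporary folder as a buffer between Python and R
    workdir = tempfile.mkdtemp()
    output_csv = os.path.join(workdir, 'result.csv')
    # Run R in a separate process, outside of the Python GIL
    subprocess.check_call(['Rscript', r_script, input_csv, output_csv])
    # The caller reads the results back from the buffer file
    return output_csv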
Parallelization
Many algorithms are computationally heavy but parallelizable, as they include bootstrapped functions, i.e. multiple runs of the same computation. Therefore, using multiprocessing alleviates the required computation time. For CPU jobs, we use the joblib library for its efficiency and ease of use. However, for GPU jobs, the multiprocessing interface was recoded in order to account for the available resources and a memory leak issue between joblib and PyTorch.
Check out cdt.utils.parallel for more detailed information.
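For CPU jobs, bootstrapped runs can be dispatched along the following lines (a schematic sketch using joblib with a dummy statistic, not the package's exact code):

import numpy as np
from joblib import Parallel, delayed

def bootstrap_run(data, seed):
    # One bootstrapped run of the same computation (here, a dummy mean)
    rng = np.random.RandomState(seed)
    sample = data[rng.randint(0, len(data), size=len(data))]
    return sample.mean()

data = np.arange(1000, dtype=float)
# Dispatch the independent runs over 4 CPU processes
results = Parallel(n_jobs=4)(delayed(bootstrap_run)(data, s) for s in range(100))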