Advanced Tutorial ================= This second tutorial targets more experienced users. We will focus on: 1. Launching `cdt` Docker containers 2. Tweaking the ``cdt.SETTINGS`` to adapt the package to the hardware configuration 3. Generate a artificial dataset from scratch 4. Perform causal discovery on GPU 5. Evaluate the results 1. Launch the Docker containers ------------------------------- Docker images are really useful to have a portable environment with minimal impact on performance. In our case, it becomes really handy as all the R libraries are quite time-consuming to install and have lots of incompatibilities depending on the user environment. Check https://docs.docker.com/install/ to install Docker and have a quick tutorial on its usage. `cdt` Docker containers are available at https://hub.docker.com/u/divkal . Check :ref:`here ` to select the image adapted to your configuration. In this tutorial we will consider having GPUs available, but the methods are really similar if you don't have GPUs (selecting the CPU docker image instead of the GPU one). .. code-block:: bash $ docker pull divkal/nv-cdt-py3.6:XX # XX corresponds to the latest version $ nvidia-docker run -it --init --ipc=host --rm -u=$(id -u):$(id -g) divkal/nv-cdt-py3.6:XX /bin/bash ============= == PyTorch == ============= NVIDIA Release 18.09 (build 687447) Container image Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. Copyright (c) 2016- Facebook, Inc (Adam Paszke) Copyright (c) 2014- Facebook, Inc (Soumith Chintala) Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert) Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu) Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu) Copyright (c) 2011-2013 NYU (Clement Farabet) Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston) Copyright (c) 2006 Idiap Research Institute (Samy Bengio) Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz) All rights reserved. Various files include modifications (c) NVIDIA CORPORATION. All rights reserved. NVIDIA modifications are covered by the license terms that apply to the underlying project or file. Failed to detect NVIDIA driver version. I have no name!@5308f95cd331:/workspace$ I have no name!@5308f95cd331:/workspace$ ipython Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) Type 'copyright', 'credits' or 'license' for more information IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: The docker image is built upon the Nvidia NGC docker image for PyTorch. Details of the options of the docker command: - ``nvidia-docker`` is a variant of ``docker`` developed by NVIDIA for GPU passthrough. It is available at : https://github.com/NVIDIA/nvidia-docker - ``-it`` is an option to launch the container in interactive mode - ``--init`` is to passthrough the signals such as SIGINT or SIGKILL in the container. - ``--rm`` is an option to save space by deleting the container at the end of the execution. - ``-u`` is an option to launch the container as a specific user. Otherwise it will be executed as ``root``. This is quite useful for accessing files created in the container from the outside environment. 2. Adapt the `cdt` package configuration ---------------------------------------- In this section, we will tweak the ``cdt.SETTINGS`` to fit our usage. We will first check the current configuration, then increase the number of jobs as the graph generated in the next section will be quite small. More details on the package settings are :ref:`provided here `. .. code-block:: python In [1]: import cdt Detecting 1 CUDA device(s). In [2]: cdt.SETTINGS.GPU # Is set to the number of devices Out[2]: 1 In [3]: cdt.SETTINGS.NJOBS # Set to the num of devices Out[3]: 1 In [4]: cdt.SETTINGS.NJOBS = 3 # 3 jobs per GPU In [5]: cdt.SETTINGS.verbose = False 3. Artifical graph generation ----------------------------- Generating artificial graph with the `cdt` package is quite straightforward when using the ``cdt.data.AcyclicGraphGenerator`` class. :ref:`Check here ` to have more details on how to customize the graph generator. .. code-block:: python In [6]: generator = cdt.data.AcyclicGraphGenerator('gp_add', noise_coeff=.2, nodes=20, parents_max=3) In [7]: data, graph = generator.generate() In [7]: data.head() Out[7]: V0 V1 V2 V3 ... V16 V17 V18 V19 0 -0.948506 0.366023 -0.659409 -1.012921 ... -0.086537 0.504257 1.163381 -0.815508 1 -1.175473 1.612285 1.087017 -1.505346 ... -0.119292 -1.251204 0.303203 -0.730214 2 -0.899956 0.757223 -0.394799 -1.345747 ... -0.620322 -0.919279 -1.948743 0.027883 3 -1.143217 1.419192 0.608848 -1.144207 ... 1.992465 -1.277411 -0.109563 -0.907268 4 -0.653106 -0.582684 -0.947306 -0.701014 ... -0.217655 1.429272 -1.156742 1.305437 [5 rows x 20 columns] And the data and graph are generated. 4. Run SAM on GPUs ------------------ Running multiple bootstrapped runs of SAM proved itself to yield much better results than a single run. The parameter ``nruns`` allows to control the total number of runs. As soon as the setting ``cdt.SETTINGS.GPU > 0``, the execution of GPU compatible algorithms will be automatically performed on those devices, making the prediction step similar to a traditional algorithm: .. code-block:: python In [8]: sam = cdt.causality.graph.SAM(nruns=12) In [9]: prediction = sam.predict(data) .. seealso:: Kalainathan, Diviyan & Goudet, Olivier & Guyon, Isabelle & Lopez-Paz, David & Sebag, Michèle. (2018). SAM: Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning. 5. Scoring the results ---------------------- In a similar fashion to the other tutorial, we can quickly score the results using the methods in ``cdt.metrics``: .. code-block:: python In [10]: from cdt.metrics import (precision_recall, SHD) In [11]: [metric(graph, prediction) for metric in (precision_recall, SHD)] Out[11]: [(0.53, [(0.06, 1.0), (1.0, 0.0)]), 24.0] This concludes our second tutorial on the `cdt` package.