sempler.generators

This module contains functions to generate random graphs, which can then be used together with sempler.ANM or sempler.LGANM to produce random SCMs.

sempler.generators.dag_avg_deg(p, k, w_min=1, w_max=1, return_ordering=False, random_state=None, debug=False)

Generate an Erdős-Rényi graph with p nodes and average degree k, and orient its edges according to a random ordering of the nodes. The edge weights are sampled from a uniform distribution on [w_min, w_max].

Parameters:
  • p (int) – The number of nodes in the graph.

  • k (float) – The desired average degree.

  • w_min (float, optional) – The lower bound on the sampled weights. Defaults to 1.

  • w_max (float, optional) – The upper bound on the sampled weights. Defaults to 1.

  • return_ordering (bool, optional) – Whether to also return the topological ordering used to orient the edges. Defaults to False.

  • random_state (int, optional) – Sets the random state for reproducibility. Defaults to None.

  • debug (bool, optional) – Whether to print debug traces. Defaults to False.

Returns:

  • W (numpy.ndarray) – The connectivity (weights) matrix of the generated DAG.

  • ordering (numpy.ndarray, optional) – If return_ordering = True, a topological ordering of the graph.
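
The procedure described above can be sketched in plain Python. This is an illustration only, not sempler's implementation: the function name sketch_dag_avg_deg, the seed argument and the edge probability k / (p - 1) are assumptions, and the output will not match dag_avg_deg for the same seed. As in the examples of this section, a nonzero W[i][j] is taken to mean an edge i -> j.

```python
import random

def sketch_dag_avg_deg(p, k, w_min=1.0, w_max=1.0, seed=None):
    """Hypothetical re-implementation for illustration; not sempler's code."""
    rng = random.Random(seed)
    # Random ordering of the nodes: ordering[0] comes first
    ordering = list(range(p))
    rng.shuffle(ordering)
    # Assumed edge probability: each node has p - 1 potential neighbours,
    # so k / (p - 1) gives an expected (undirected) degree of k
    prob = min(1.0, k / (p - 1)) if p > 1 else 0.0
    W = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1, p):
            if rng.random() < prob:
                i, j = ordering[a], ordering[b]  # i precedes j in the ordering
                W[i][j] = rng.uniform(w_min, w_max)  # weight of the edge i -> j
    return W, ordering

W, ordering = sketch_dag_avg_deg(5, 2, seed=0)
position = {node: idx for idx, node in enumerate(ordering)}
# Every edge points forward in the ordering, so the graph has no cycles
assert all(position[i] < position[j]
           for i in range(5) for j in range(5) if W[i][j] != 0)
```

Because edges are only ever added from an earlier node to a later one, acyclicity holds by construction and no separate cycle check is needed.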

Example

>>> from sempler.generators import dag_avg_deg
>>> dag_avg_deg(5, 2, random_state = 42)
array([[0., 0., 1., 1., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 1.],
       [0., 0., 0., 0., 0.]])

Optionally, the ordering used to orient the edges can be returned:

>>> dag_avg_deg(5, 2, return_ordering = True, random_state = 42)
(array([[0., 0., 1., 1., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 1.],
       [0., 0., 0., 0., 0.]]), array([0, 3, 1, 4, 2]))
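
As a sanity check, the output above can be verified against its ordering without running sempler. The sketch below copies the matrix and ordering from the example and assumes that a nonzero W[i][j] denotes an edge i -> j and that the average degree counts both incoming and outgoing edges.

```python
# Output of dag_avg_deg(5, 2, return_ordering=True, random_state=42), copied from above
W = [[0, 0, 1, 1, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 0, 0]]
ordering = [0, 3, 1, 4, 2]

position = {node: idx for idx, node in enumerate(ordering)}
edges = [(i, j) for i, row in enumerate(W) for j, w in enumerate(row) if w != 0]

# Each edge i -> j should go from an earlier to a later node in the ordering
is_consistent = all(position[i] < position[j] for i, j in edges)
# Every edge contributes to the degree of both of its endpoints
avg_degree = 2 * len(edges) / len(W)
print(is_consistent, avg_degree)  # → True 2.0
```

Here the realized average degree happens to equal k = 2 exactly; since the graph is random, other seeds will generally give values that only approximate k.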

sempler.generators.dag_full(p, w_min=1, w_max=1, return_ordering=False, random_state=None)

Create a fully connected DAG, sampling the weights from a uniform distribution.

Parameters:
  • p (int) – The number of nodes in the graph.

  • w_min (float, optional) – The lower bound on the sampled weights. Defaults to 1.

  • w_max (float, optional) – The upper bound on the sampled weights. Defaults to 1.

  • return_ordering (bool, optional) – Whether to also return the topological ordering used to orient the edges. Defaults to False.

  • random_state (int, optional) – Sets the random state for reproducibility. Defaults to None.

Returns:

  • W (numpy.ndarray) – The connectivity (weights) matrix of the generated DAG.

  • ordering (numpy.ndarray, optional) – If return_ordering = True, a topological ordering of the graph.

Example

>>> from sempler.generators import dag_full
>>> dag_full(4, random_state = 42)
array([[0., 0., 1., 1.],
       [1., 0., 1., 1.],
       [0., 0., 0., 0.],
       [0., 0., 1., 0.]])

Optionally, the ordering used to orient the edges can be returned:

>>> dag_full(4, return_ordering = True, random_state = 42)
(array([[0., 0., 1., 1.],
       [1., 0., 1., 1.],
       [0., 0., 0., 0.],
       [0., 0., 1., 0.]]), array([1, 0, 3, 2]))
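
In a fully connected DAG every node has an edge to all nodes that come after it in the ordering, so with unit weights the matrix is determined by the ordering alone. The sketch below rebuilds the matrix from the ordering returned in the example above; the helper name full_dag_from_ordering is hypothetical and not part of sempler.

```python
def full_dag_from_ordering(ordering):
    """Unit-weight fully connected DAG: each node points to all later nodes."""
    p = len(ordering)
    W = [[0.0] * p for _ in range(p)]
    for a, i in enumerate(ordering):
        for j in ordering[a + 1:]:
            W[i][j] = 1.0  # edge i -> j
    return W

W = full_dag_from_ordering([1, 0, 3, 2])
# Matrix printed by dag_full(4, random_state=42) in the example above
expected = [[0, 0, 1, 1],
            [1, 0, 1, 1],
            [0, 0, 0, 0],
            [0, 0, 1, 0]]
print(W == expected)  # → True
```

The resulting graph has p(p-1)/2 edges, i.e. 6 edges for p = 4.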

sempler.generators.intervention_targets(p, K, size, replace=True, random_state=None)

Sample a set of intervention targets.

Parameters:
  • p (int) – The number of variables, i.e. targets will be sampled from [0,p-1].

  • K (int) – The total number of interventions.

  • size (int or tuple) – The size of each intervention, i.e. the number of targets per intervention. If a two-element tuple, the number of targets of each intervention is sampled uniformly at random from [size[0], size[1]].

  • replace (bool, default=True) – Whether the intervention targets should be sampled with replacement, i.e. whether repeated targets are allowed across interventions.

  • random_state (int or None) – To set the random state for reproducibility.

Returns:

interventions – The sampled intervention targets.

Return type:

list of list of int

Raises:

ValueError – If the size of each intervention (i.e. the number of targets) is larger than the number of variables, or if the tuple passed as size does not have length 2.

Examples

Generating a set of single-variable interventions:

>>> from sempler.generators import intervention_targets
>>> intervention_targets(10, 5, 1, random_state=42)
[[0], [7], [6], [4], [4]]

Without replacement:

>>> intervention_targets(10, 5, 1, replace=False, random_state=42)
[[0], [7], [6], [4], [3]]

Generating a set of interventions with random number of targets:

>>> intervention_targets(10, 5, (1,3), random_state=42)
[[8], [2, 6, 0], [8, 7], [6, 7], [8, 1]]

Without replacement:

>>> intervention_targets(10, 5, (1,2), replace=False, random_state=42)
[[8], [6, 0], [1, 4], [7], [9]]

An exception is raised if size > p:

>>> intervention_targets(4, 5, 5)
Traceback (most recent call last):
  ...
ValueError: The (max.) intervention size cannot be larger than the number of variables.

Or if size is a tuple whose length is not two:

>>> intervention_targets(4, 5, (0,1,2))
Traceback (most recent call last):
  ...
ValueError: The intervention size must be a positive integer or two-element tuple.

When sampling targets without replacement, the maximum intervention size and the number of interventions must satisfy max_size × K <= p; otherwise an exception is raised:

>>> intervention_targets(10, 5, (0,3), replace=False)
Traceback (most recent call last):
  ...
ValueError: Cannot sample targets without replacement for the given intervention size and number of interventions.
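
The validation rules and the two sampling modes described above can be sketched in plain Python. This is a hypothetical stand-in, not sempler's implementation: the name sketch_intervention_targets and the seed argument are assumptions, as is the choice to keep targets distinct within a single intervention when replace=True. Its output will not match intervention_targets for the same seed.

```python
import random

def sketch_intervention_targets(p, K, size, replace=True, seed=None):
    """Hypothetical stand-in for illustration; not sempler's implementation."""
    rng = random.Random(seed)
    # Normalize size to a (min_size, max_size) pair
    if isinstance(size, int):
        min_size, max_size = size, size
    elif isinstance(size, tuple) and len(size) == 2:
        min_size, max_size = size
    else:
        raise ValueError("The intervention size must be a positive integer or two-element tuple.")
    if max_size > p:
        raise ValueError("The (max.) intervention size cannot be larger than the number of variables.")
    if not replace and max_size * K > p:
        raise ValueError("Cannot sample targets without replacement for the given "
                         "intervention size and number of interventions.")
    # Number of targets of each intervention, uniform on [min_size, max_size]
    sizes = [rng.randint(min_size, max_size) for _ in range(K)]
    if replace:
        # Assumption: targets are distinct within an intervention,
        # but may repeat across interventions
        return [rng.sample(range(p), s) for s in sizes]
    # Without replacement: hand out disjoint slices of one shuffled list of variables
    pool = rng.sample(range(p), p)
    targets, start = [], 0
    for s in sizes:
        targets.append(pool[start:start + s])
        start += s
    return targets

targets = sketch_intervention_targets(10, 5, (1, 2), replace=False, seed=0)
flat = [t for intervention in targets for t in intervention]
assert len(flat) == len(set(flat))  # no variable is targeted twice
```

The check max_size × K <= p is conservative: it guarantees that enough variables are available even if every intervention draws the maximum number of targets.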