sempler.NormalDistribution

The NormalDistribution class represents a normal distribution through its mean and covariance. The class defines methods for:

conditioning: conditional()
marginalization: marginal()
sampling: sample()
least-squares regression in the “population setting”: regress() and mse()

class sempler.NormalDistribution(mean, covariance, check_valid='ignore')

Symbolic representation of a normal distribution.

Parameters:

mean (array_like) – The marginal means of the variables.
covariance (array_like) – The covariance matrix of the distribution.
check_valid ({'ignore', 'warn', 'raise'}, optional) – Behaviour when the provided covariance matrix is not positive definite. Note that semi-definiteness is enough for sampling, but not for the conditioning and regression operations.

Raises:

ValueError – If check_valid='raise' and the provided matrix is not positive definite; or if the sizes of the mean vector and covariance matrix do not match.

Examples

>>> import sempler
>>> import numpy as np

Defining a univariate standard normal distribution:

>>> distribution = sempler.NormalDistribution(0, 1)

Defining an isotropic normal with zero mean vector:

>>> covariance = np.array([[1,0],[0,1]])
>>> means = np.array([0,0])
>>> distribution = sempler.NormalDistribution(means, covariance)

An error is raised if the covariance matrix is not positive definite and we set check_valid='raise':

>>> sempler.NormalDistribution([0,0], [[1,0],[1,1]], check_valid='raise')
Traceback (most recent call last):
  ...
ValueError: Covariance matrix is not positive definite.

>>> sempler.NormalDistribution([0,0], [[1,1],[1,1]], check_valid='raise')
Traceback (most recent call last):
  ...
ValueError: Covariance matrix is not positive definite.

Or if the size of the mean vector and covariance matrix do not match:

>>> sempler.NormalDistribution([0], [[1,0],[0,1]])
Traceback (most recent call last):
  ...
ValueError: Mismatch in the size of mean vector and covariance matrix.
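
The difference between positive definite and positive semidefinite matrices in the examples above can be illustrated with a small eigenvalue check. This is only a sketch using NumPy; the helper name classify_covariance and the tolerance tol are hypothetical and not part of sempler, and the library's own validation may work differently.

```python
import numpy as np

def classify_covariance(covariance, tol=1e-10):
    """Hypothetical helper: classify a matrix by its symmetry and eigenvalues."""
    cov = np.atleast_2d(np.asarray(covariance, dtype=float))
    # eigvalsh assumes a symmetric matrix, so reject asymmetric input first
    if not np.allclose(cov, cov.T):
        return "not symmetric"
    smallest = np.linalg.eigvalsh(cov).min()
    if smallest > tol:
        return "positive definite"
    if smallest >= -tol:
        return "positive semidefinite"
    return "indefinite"

print(classify_covariance([[1, 0], [0, 1]]))  # identity matrix
print(classify_covariance([[1, 1], [1, 1]]))  # singular: one zero eigenvalue
print(classify_covariance([[1, 0], [1, 1]]))  # asymmetric input
```

Under this check, the matrix [[1,1],[1,1]] is semidefinite but not definite, which is consistent with it being rejected when check_valid='raise'.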

mean

The marginal means of the variables.

Type: numpy.ndarray

covariance

The covariance matrix of the distribution.

Type: numpy.ndarray

p

The number of variables.

Type: int

conditional(Y, X, x)

Return the conditional distribution of some variables given some others’ values.

Parameters:

Y (array_like, list of ints or np.array) – The indices of the conditioned variables. Note that the indices in the new distribution depend on the order given in Y, e.g. Y=[0,1] and Y=[1,0] yield different (permuted) distributions.
X (array_like, list of ints or np.array) – The indices of the variables to condition on.
x (array_like) – The values of the conditioning variables.

Raises:

ValueError – If the size of X and x do not match, or if X and Y are not disjoint.

Returns:

distribution – the conditional distribution.

Return type:

sempler.NormalDistribution

Examples

Conditioning a single variable:

>>> conditional = distribution.conditional(0, 1, .1)

Conditioning several variables:

>>> conditional = distribution.conditional([0,1], [2,3], [.2, .3])

An exception is raised if the size of X and x do not match:

>>> distribution.conditional(0, [1], [.1, .2])
Traceback (most recent call last):
...
ValueError: Mismatch in the size of X and x.

or if Y and X are not disjoint:

>>> distribution.conditional([0], [0,1], [0, .1])
Traceback (most recent call last):
...
ValueError: X and Y are not disjoint.
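
For reference, conditioning a normal distribution follows the standard formulas: the conditional mean is mu_Y + S_YX S_XX^{-1} (x - mu_X) and the conditional covariance is S_YY - S_YX S_XX^{-1} S_XY. The sketch below computes them directly with NumPy; conditional_params is a hypothetical helper, not sempler's implementation, and it assumes that the covariance of the conditioning variables is invertible.

```python
import numpy as np

def conditional_params(mean, covariance, Y, X, x):
    """Hypothetical helper: mean and covariance of the variables Y given X = x."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(covariance, dtype=float)
    Y, X = np.atleast_1d(Y), np.atleast_1d(X)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cov_yx = cov[np.ix_(Y, X)]
    cov_xx = cov[np.ix_(X, X)]
    # gain = cov_yx @ inv(cov_xx), computed with a linear solve
    gain = np.linalg.solve(cov_xx, cov_yx.T).T
    cond_mean = mean[Y] + gain @ (x - mean[X])
    cond_cov = cov[np.ix_(Y, Y)] - gain @ cov_yx.T
    return cond_mean, cond_cov

# Condition variable 0 on variable 1 taking the value 1.0
cond_mean, cond_cov = conditional_params(
    [0.0, 0.0], [[1.0, 0.5], [0.5, 2.0]], Y=0, X=1, x=1.0
)
print(cond_mean, cond_cov)
```

The solve requires the covariance of X to be invertible, which is why positive semidefiniteness alone is not enough for conditioning.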

equal(dist, rtol=1e-05, atol=1e-08)

Check if this distribution is equal to another sempler.NormalDistribution, up to a tolerance.

Parameters:

dist (sempler.NormalDistribution) – The distribution to compare with.
rtol (float, optional) – The relative tolerance in the elements of the mean and covariance. Default is 1e-5.
atol (float, optional) – The absolute tolerance in the elements of the mean and covariance. Default is 1e-8.

Returns:

equal – Whether the two distributions have the same mean and covariance, up to the tolerances rtol and atol.

Return type:

bool

Raises:

TypeError – If dist is not a sempler.NormalDistribution.

Example

>>> import sempler
>>> dist1 = sempler.NormalDistribution(0, 1)
>>> sempler.NormalDistribution(0.5, 1).equal(dist1, atol=0.5)
True
>>> sempler.NormalDistribution(0.5, 1).equal(dist1)
False

An exception is raised if we attempt to compare with anything other than a sempler.NormalDistribution:

>>> sempler.NormalDistribution(0,1).equal(1)
Traceback (most recent call last):
...
TypeError: Unexpected type for "dist".
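
The defaults for rtol and atol coincide with those of numpy.allclose, so the comparison can be pictured as an element-wise closeness test on the means and covariances. That equal() is implemented this way is an assumption; roughly_equal below is a hypothetical stand-in that works on raw parameters rather than on sempler objects.

```python
import numpy as np

def roughly_equal(mean_a, cov_a, mean_b, cov_b, rtol=1e-5, atol=1e-8):
    """Hypothetical helper: compare two (mean, covariance) pairs element-wise."""
    same_mean = np.allclose(mean_a, mean_b, rtol=rtol, atol=atol)
    same_cov = np.allclose(cov_a, cov_b, rtol=rtol, atol=atol)
    return bool(same_mean and same_cov)

# Means 0.5 and 0 differ by 0.5, which is within atol=0.5 but not the default
print(roughly_equal(0.5, 1, 0, 1, atol=0.5))
print(roughly_equal(0.5, 1, 0, 1))
```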

marginal(X)

Return the marginal distribution of some variables.

Parameters:

X (array_like) – The indices of the variables. Note that the indices in the new distribution depend on the order given in X, e.g. marginal([0,1]) and marginal([1,0]) yield different (permuted) distributions.

Returns:

marginal – The marginal distribution.

Return type:

sempler.NormalDistribution

Examples

>>> marginal = distribution.marginal(0)

>>> marginal = distribution.marginal([0,1])
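
Marginalizing a normal distribution amounts to selecting the corresponding entries of the mean and the corresponding rows and columns of the covariance, which is why the order of the indices matters. The following NumPy sketch shows the permutation effect; marginal_params is a hypothetical helper, not the library's code.

```python
import numpy as np

def marginal_params(mean, covariance, X):
    """Hypothetical helper: mean and covariance of the variables indexed by X."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(covariance, dtype=float)
    X = np.atleast_1d(X)
    # Keep entries in the order given by X
    return mean[X], cov[np.ix_(X, X)]

mean = [0.0, 1.0, 2.0]
cov = [[1.0, 0.2, 0.3],
       [0.2, 2.0, 0.4],
       [0.3, 0.4, 3.0]]
mean_01, cov_01 = marginal_params(mean, cov, [0, 1])
mean_10, cov_10 = marginal_params(mean, cov, [1, 0])
print(mean_01, mean_10)  # same variables, permuted order
```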

mse(y, Xs)

Compute the population (i.e. expected) mean squared error resulting from regressing a variable on a subset of others.

The regression coefficients/intercept are the MLE computed in regress().

Parameters:

y (int) – The index of the response/predicted variable.
Xs (int, list of ints or np.array) – The indices of the predictor/explanatory variables.

Returns:

mse – the expected mean squared error.

Return type:

float

Example

>>> distribution.mse(0, [1,2])
0.24666666666666665

Regressing a variable on itself yields a zero error:

>>> distribution.mse(0,0)
0.0
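
In the population setting, the expected squared error of the best linear predictor has the closed form S_yy - S_yX S_XX^{-1} S_Xy. The sketch below evaluates this expression with NumPy under the assumption that the predictors' covariance is invertible; population_mse is a hypothetical helper and may not match how mse() is computed internally.

```python
import numpy as np

def population_mse(covariance, y, Xs):
    """Hypothetical helper: expected squared error of regressing y on Xs."""
    cov = np.asarray(covariance, dtype=float)
    Xs = np.atleast_1d(Xs)
    cov_xx = cov[np.ix_(Xs, Xs)]
    cov_xy = cov[Xs, y]
    # Population regression coefficients: inv(cov_xx) @ cov_xy
    coefs = np.linalg.solve(cov_xx, cov_xy)
    # Residual variance of y after removing its linear dependence on Xs
    return float(cov[y, y] - cov_xy @ coefs)

cov = [[1.0, 0.5], [0.5, 2.0]]
print(population_mse(cov, 0, 1))  # regress variable 0 on variable 1
print(population_mse(cov, 0, 0))  # regress variable 0 on itself
```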

regress(y, Xs)

Compute the population MLE of the regression coefficients and intercept from regressing a variable on a subset of others.

Parameters:

y (int) – The index of the response/predicted variable.
Xs (array_like) – The indices of the predictor/explanatory variables.

Returns:

coefs (numpy.ndarray) – A p-sized array containing the estimated coefficients in the indices in Xs and 0s elsewhere.
intercept (float) – The estimated intercept.

Example

>>> distribution.regress(0,[1,2])
(array([ 0.        , -0.33333333,  0.33333333,  0.        ,  0.        ]), 0.6666666666666666)

Regressing a variable on itself:

>>> distribution.regress(0,0)
(array([1., 0., 0., 0., 0.]), 0.0)
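
The population coefficients can be written as S_XX^{-1} S_Xy, with intercept mu_y minus the coefficients applied to mu_X. A minimal NumPy sketch of that computation follows, returning a p-sized coefficient array as described above; population_regression is a hypothetical name and the code is an illustration, not sempler's implementation.

```python
import numpy as np

def population_regression(mean, covariance, y, Xs):
    """Hypothetical helper: population coefficients and intercept of y on Xs."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(covariance, dtype=float)
    Xs = np.atleast_1d(Xs)
    # Coefficients are placed at the indices in Xs, zeros elsewhere
    coefs = np.zeros(len(mean))
    coefs[Xs] = np.linalg.solve(cov[np.ix_(Xs, Xs)], cov[Xs, y])
    intercept = float(mean[y] - coefs @ mean)
    return coefs, intercept

mean = [1.0, 2.0]
cov = [[1.0, 0.5], [0.5, 2.0]]
coefs, intercept = population_regression(mean, cov, 0, [1])
print(coefs, intercept)
```

Regressing a variable on itself with this helper gives a coefficient of 1 and an intercept of 0, in line with the example above.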

sample(n, random_state=None)

Generate a sample from the distribution.

Parameters:

n (int) – The size of the sample (i.e. number of observations).
random_state (int, optional) – To set the random state for reproducibility.

Returns:

An array containing the sample, where each column corresponds to a variable.

Return type:

numpy.ndarray

Example

>>> distribution.sample(5, random_state = 42)
array([[0.754143  , 1.67465613, 2.45882157, 2.4718102 , 3.8108411 ],
       [2.04520905, 1.21383225, 3.29393672, 4.46230358, 5.79975029],
       [0.99209988, 2.50969225, 4.41981256, 5.3086489 , 5.73704626],
       [0.76152114, 2.94130318, 4.45257529, 5.02206465, 6.25225282],
       [0.71256027, 1.22445591, 1.33646622, 2.19050731, 0.07261   ]])
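
The layout of the returned array (one row per observation, one column per variable) and the effect of fixing the seed can be reproduced with NumPy's own generator. This sketch uses numpy.random.default_rng, which is an assumption about the random number source; draw_sample is a hypothetical helper and will not reproduce the exact numbers shown above.

```python
import numpy as np

def draw_sample(mean, covariance, n, random_state=None):
    """Hypothetical helper: draw n observations from a normal distribution."""
    rng = np.random.default_rng(random_state)
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(covariance, dtype=float))
    # Result has shape (n, p): rows are observations, columns are variables
    return rng.multivariate_normal(mean, cov, size=n)

sample = draw_sample([0.0, 1.0], [[1.0, 0.5], [0.5, 2.0]], n=5, random_state=42)
repeat = draw_sample([0.0, 1.0], [[1.0, 0.5], [0.5, 2.0]], n=5, random_state=42)
print(sample.shape)
print(np.array_equal(sample, repeat))  # same seed, same sample
```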