sempler.NormalDistribution
The NormalDistribution class represents a normal distribution through its mean and covariance. The class defines methods for
- conditioning: conditional()
- marginalization: marginal()
- sampling: sample()
- least-squares regression in the “population setting”: regress() and mse()
- class sempler.NormalDistribution(mean, covariance, check_valid='ignore')
Symbolic representation of a normal distribution.
- Parameters:
mean (array_like) – The marginal means of the variables.
covariance (array_like) – The covariance matrix of the distribution.
check_valid ({'ignore', 'warn', 'raise'}, optional) – Behaviour when the provided covariance matrix is not positive definite. Note that semi-definiteness is enough for sampling, but not for the conditioning and regression operations.
- Raises:
ValueError – If check_valid='raise' and the provided matrix is not positive definite; or if the sizes of the mean vector and covariance matrix do not match.
Examples
>>> import sempler
>>> import numpy as np
Defining a univariate standard normal distribution:
>>> distribution = sempler.NormalDistribution(0, 1)
Defining an isotropic normal with zero mean vector:
>>> covariance = np.array([[1,0],[0,1]])
>>> means = np.array([0,0])
>>> distribution = sempler.NormalDistribution(means, covariance)
An error is raised if the covariance matrix is not positive definite and we set check_valid='raise':
>>> sempler.NormalDistribution([0,0], [[1,0],[1,1]], check_valid='raise')
Traceback (most recent call last):
    ...
ValueError: Covariance matrix is not positive definite.
>>> sempler.NormalDistribution([0,0], [[1,1],[1,1]], check_valid='raise')
Traceback (most recent call last):
    ...
ValueError: Covariance matrix is not positive definite.
Or if the size of the mean vector and covariance matrix do not match:
>>> sempler.NormalDistribution([0], [[1,0],[0,1]])
Traceback (most recent call last):
    ...
ValueError: Mismatch in the size of mean vector and covariance matrix.
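The distinction drawn above between positive definite and positive semidefinite covariance matrices can be illustrated with a short NumPy sketch. The helper names is_positive_definite and is_positive_semidefinite, and the tolerance tol, are hypothetical and not part of sempler; this page does not state how sempler performs its own check.

```python
import numpy as np


def is_positive_definite(matrix):
    """Hypothetical helper: True if `matrix` is symmetric positive definite."""
    matrix = np.atleast_2d(np.asarray(matrix, dtype=float))
    if not np.allclose(matrix, matrix.T):
        return False  # a covariance matrix must be symmetric
    try:
        # The Cholesky factorization exists only for positive definite matrices
        np.linalg.cholesky(matrix)
        return True
    except np.linalg.LinAlgError:
        return False


def is_positive_semidefinite(matrix, tol=1e-10):
    """Hypothetical helper: True if all eigenvalues are >= 0 (up to `tol`)."""
    matrix = np.atleast_2d(np.asarray(matrix, dtype=float))
    if not np.allclose(matrix, matrix.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(matrix) >= -tol))


# The singular matrix from the example above is semidefinite but not definite
singular = [[1, 1], [1, 1]]
print(is_positive_definite(singular), is_positive_semidefinite(singular))
```

A matrix such as [[1,1],[1,1]] passes the semidefinite check but fails the definite one, which matches the note that semi-definiteness suffices for sampling but not for conditioning or regression.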
- mean
The marginal means of the variables.
- Type:
numpy.ndarray
- covariance
The covariance matrix of the distribution.
- Type:
numpy.ndarray
- p
The number of variables.
- Type:
int
- conditional(Y, X, x)
Return the conditional distribution of some variables given some others’ values.
- Parameters:
Y (array_like, list of ints or np.array) – The indices of the conditioned variables. Note that the indices in the new distribution depend on the order given in Y, e.g. Y=[0,1] and Y=[1,0] yield different (permuted) distributions.
X (array_like, list of ints or np.array) – The indices of the variables to condition on.
x (array_like) – The values of the conditioning variables.
- Raises:
ValueError – If the size of X and x do not match, or if X and Y are not disjoint.
- Returns:
distribution – the conditional distribution.
- Return type:
sempler.NormalDistribution
Examples
Conditioning a single variable:
>>> conditional = distribution.conditional(0, 1, .1)
Conditioning several variables:
>>> conditional = distribution.conditional([0,1], [2,3], [.2, .3])
An exception is raised if the size of X and x do not match:
>>> distribution.conditional(0, [1], [.1, .2])
Traceback (most recent call last):
    ...
ValueError: Mismatch in the size of X and x.
or if Y and X are not disjoint:
>>> distribution.conditional([0], [0,1], [0, .1])
Traceback (most recent call last):
    ...
ValueError: X and Y are not disjoint.
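For reference, conditioning a normal distribution has a standard closed form: the conditional mean is μ_Y + Σ_YX Σ_XX⁻¹ (x − μ_X) and the conditional covariance is Σ_YY − Σ_YX Σ_XX⁻¹ Σ_XY. The sketch below computes these with NumPy. The function name gaussian_conditional is hypothetical, and the code illustrates the textbook formula rather than sempler's actual implementation.

```python
import numpy as np


def gaussian_conditional(mean, covariance, Y, X, x):
    """Hypothetical helper: mean and covariance of Y given X = x."""
    mean = np.asarray(mean, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    Y, X = np.atleast_1d(Y), np.atleast_1d(X)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    if len(X) != len(x):
        raise ValueError("Mismatch in the size of X and x.")
    if set(Y.tolist()) & set(X.tolist()):
        raise ValueError("X and Y are not disjoint.")
    cov_yy = covariance[np.ix_(Y, Y)]
    cov_yx = covariance[np.ix_(Y, X)]
    cov_xx = covariance[np.ix_(X, X)]
    # gain = cov_yx @ inv(cov_xx), computed with a linear solve (cov_xx is symmetric)
    gain = np.linalg.solve(cov_xx, cov_yx.T).T
    cond_mean = mean[Y] + gain @ (x - mean[X])
    cond_cov = cov_yy - gain @ cov_yx.T
    return cond_mean, cond_cov


# Two unit-variance variables with covariance 0.5; condition variable 0 on variable 1 = 1.0
cond_mean, cond_cov = gaussian_conditional([0, 0], [[1, 0.5], [0.5, 1]], 0, 1, 1.0)
print(cond_mean, cond_cov)
```

The linear solve requires Σ_XX to be invertible, which is why positive definiteness, and not only semi-definiteness, matters for conditioning.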
- equal(dist, rtol=1e-05, atol=1e-08)
Check if this distribution is equal to another sempler.NormalDistribution, up to a tolerance.
- Parameters:
dist (sempler.NormalDistribution) – The distribution to compare with.
rtol (float, optional) – The relative tolerance in the elements of the mean and covariance. Default is 1e-5.
atol (float, optional) – The absolute tolerance in the elements of the mean and covariance. Default is 1e-8.
- Returns:
equal – Whether the two distributions have the same mean and covariance, up to the tolerances rtol and atol.
- Return type:
bool
- Raises:
TypeError – If dist is not a sempler.NormalDistribution.
Example
>>> import sempler
>>> dist1 = sempler.NormalDistribution(0, 1)
>>> sempler.NormalDistribution(0.5, 1).equal(dist1, atol=0.5)
True
>>> sempler.NormalDistribution(0.5, 1).equal(dist1)
False
An exception is raised if we attempt to compare with anything other than a sempler.NormalDistribution:
>>> sempler.NormalDistribution(0,1).equal(1)
Traceback (most recent call last):
    ...
TypeError: Unexpected type for "dist".
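The rtol and atol defaults coincide with those of numpy.allclose, so one plausible reading of the comparison is an element-wise test |a − b| ≤ atol + rtol·|b| on both the mean and the covariance. The sketch below works under that assumption. The function name distributions_close is hypothetical, and whether sempler uses numpy.allclose internally is not confirmed by this page.

```python
import numpy as np


def distributions_close(mean_a, cov_a, mean_b, cov_b, rtol=1e-5, atol=1e-8):
    """Hypothetical helper: compare two (mean, covariance) pairs up to tolerance."""
    mean_a, mean_b = np.atleast_1d(mean_a), np.atleast_1d(mean_b)
    cov_a, cov_b = np.atleast_2d(cov_a), np.atleast_2d(cov_b)
    # Distributions over a different number of variables are never equal
    if mean_a.shape != mean_b.shape or cov_a.shape != cov_b.shape:
        return False
    # Assumption: element-wise test |a - b| <= atol + rtol * |b|, as in np.allclose
    return bool(
        np.allclose(mean_a, mean_b, rtol=rtol, atol=atol)
        and np.allclose(cov_a, cov_b, rtol=rtol, atol=atol)
    )


# Means 0.5 and 0 differ by exactly the absolute tolerance, so they count as close
print(distributions_close(0.5, 1, 0, 1, atol=0.5))
print(distributions_close(0.5, 1, 0, 1))
```

Under this reading, the first call agrees with the True result in the example above and the second with the False result.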
- marginal(X)
Return the marginal distribution of some variables.
- Parameters:
X (array_like) – The indices of the variables. Note that the indices in the new distribution are dependent on the order given in X, e.g. marginal([0,1]) and marginal([1,0]) yield different (permuted) distributions.
- Returns:
marginal – The marginal distribution.
- Return type:
sempler.NormalDistribution
Examples
>>> marginal = distribution.marginal(0)
>>> marginal = distribution.marginal([0,1])
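Marginalizing a normal distribution amounts to selecting the corresponding entries of the mean and the corresponding rows and columns of the covariance. The NumPy sketch below shows this, including how the order of the indices permutes the result. The function name gaussian_marginal and the example numbers are hypothetical and only illustrate the idea.

```python
import numpy as np


def gaussian_marginal(mean, covariance, X):
    """Hypothetical helper: mean and covariance of the variables indexed by X."""
    mean = np.asarray(mean, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    X = np.atleast_1d(X)
    # np.ix_ selects the sub-matrix with rows and columns in the order given by X
    return mean[X], covariance[np.ix_(X, X)]


mean = [0.0, 1.0, 2.0]
covariance = [[1.0, 0.2, 0.1],
              [0.2, 2.0, 0.3],
              [0.1, 0.3, 3.0]]

# Same variables, different order: the results are permutations of each other
print(gaussian_marginal(mean, covariance, [0, 1]))
print(gaussian_marginal(mean, covariance, [1, 0]))
```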
- mse(y, Xs)
Compute the population (i.e. expected) mean squared error resulting from regressing a variable on a subset of others.
The regression coefficients/intercept are the MLE computed in regress().
- Parameters:
y (int) – The index of the response/predicted variable.
Xs (int, list of ints or np.array) – The indices of the predictor/explanatory variables.
- Returns:
mse – the expected mean squared error.
- Return type:
float
Example
>>> distribution.mse(0, [1,2])
0.24666666666666665
Regressing a variable on itself yields a zero error:
>>> distribution.mse(0,0)
0.0
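When the regression includes an intercept, the population mean squared error of the least-squares fit depends only on the covariance: it is Σ_yy − Σ_yX Σ_XX⁻¹ Σ_Xy. The sketch below evaluates this expression with NumPy. The function name population_mse and the example covariance are hypothetical, and the code illustrates the standard formula rather than sempler's actual implementation.

```python
import numpy as np


def population_mse(covariance, y, Xs):
    """Hypothetical helper: expected squared error of regressing y on Xs."""
    covariance = np.asarray(covariance, dtype=float)
    Xs = np.atleast_1d(Xs)
    cov_xx = covariance[np.ix_(Xs, Xs)]  # covariance among the predictors
    cov_xy = covariance[Xs, y]           # covariance between predictors and response
    coefs = np.linalg.solve(cov_xx, cov_xy)
    # Residual variance: var(y) minus the variance explained by the predictors
    return float(covariance[y, y] - cov_xy @ coefs)


covariance = [[1.0, 0.5],
              [0.5, 1.0]]
print(population_mse(covariance, 0, [1]))  # part of var(y) remains unexplained
print(population_mse(covariance, 0, 0))    # regressing a variable on itself
```

Regressing a variable on itself leaves no residual variance, in line with the zero error shown above.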
- regress(y, Xs)
Compute the population MLE of the regression coefficients and intercept from regressing a variable on a subset of others.
- Parameters:
y (int) – The index of the response/predicted variable.
Xs (array_like) – The indices of the predictor/explanatory variables.
- Returns:
coefs (numpy.ndarray) – A p-sized array containing the estimated coefficients in the indices in Xs and 0s elsewhere.
intercept (float) – The estimated intercept.
Example
>>> distribution.regress(0,[1,2])
(array([ 0.        , -0.33333333,  0.33333333,  0.        ,  0.        ]), 0.6666666666666666)
Regressing a variable on itself:
>>> distribution.regress(0,0)
(array([1., 0., 0., 0., 0.]), 0.0)
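In the population setting, the least-squares coefficients are β = Σ_XX⁻¹ Σ_Xy and the intercept is μ_y − βᵀ μ_X. The sketch below computes both with NumPy and, following the return format described above, places the coefficients in a p-sized array with zeros elsewhere. The function name population_regression and the example numbers are hypothetical, and this is not sempler's actual implementation.

```python
import numpy as np


def population_regression(mean, covariance, y, Xs):
    """Hypothetical helper: population coefficients and intercept of y on Xs."""
    mean = np.asarray(mean, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    Xs = np.atleast_1d(Xs)
    # beta solves cov_xx @ beta = cov_xy
    beta = np.linalg.solve(covariance[np.ix_(Xs, Xs)], covariance[Xs, y])
    coefs = np.zeros(len(mean))  # p-sized array, zero outside Xs
    coefs[Xs] = beta
    intercept = float(mean[y] - beta @ mean[Xs])
    return coefs, intercept


mean = [1.0, 2.0]
covariance = [[1.0, 0.5],
              [0.5, 2.0]]
print(population_regression(mean, covariance, 0, [1]))
print(population_regression(mean, covariance, 0, 0))  # regressing a variable on itself
```

Regressing a variable on itself gives a coefficient of one and a zero intercept, matching the pattern in the example above.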
- sample(n, random_state=None)
Generate a sample from the distribution.
- Parameters:
n (int) – The size of the sample (i.e. number of observations).
random_state (int, optional) – Seed for the random state, for reproducibility.
- Returns:
An array containing the sample, where each column corresponds to a variable.
- Return type:
numpy.ndarray
Example
>>> distribution.sample(5, random_state = 42)
array([[0.754143  , 1.67465613, 2.45882157, 2.4718102 , 3.8108411 ],
       [2.04520905, 1.21383225, 3.29393672, 4.46230358, 5.79975029],
       [0.99209988, 2.50969225, 4.41981256, 5.3086489 , 5.73704626],
       [0.76152114, 2.94130318, 4.45257529, 5.02206465, 6.25225282],
       [0.71256027, 1.22445591, 1.33646622, 2.19050731, 0.07261   ]])
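To show the output layout (one row per observation, one column per variable) and the effect of fixing the seed, the sketch below draws from a multivariate normal using NumPy's Generator API. The function name sample_normal is hypothetical. Because sempler's internal random number generation is not described on this page, the values will generally not match the output shown above.

```python
import numpy as np


def sample_normal(mean, covariance, n, random_state=None):
    """Hypothetical helper: draw n observations, one column per variable."""
    mean = np.atleast_1d(np.asarray(mean, dtype=float))
    covariance = np.atleast_2d(np.asarray(covariance, dtype=float))
    # A fixed seed makes the draw reproducible; None draws fresh entropy
    rng = np.random.default_rng(random_state)
    return rng.multivariate_normal(mean, covariance, size=n)


first = sample_normal([0, 0], [[1, 0], [0, 1]], 5, random_state=42)
second = sample_normal([0, 0], [[1, 0], [0, 1]], 5, random_state=42)
print(first.shape)                    # (observations, variables)
print(np.array_equal(first, second))  # same seed, same sample
```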