|
In probability theory and statistics, a multivariate normal distribution, also sometimes called a multivariate Gaussian distribution, is a specific probability distribution that generalizes the one-dimensional normal distribution (also called a Gaussian distribution) to higher dimensions. It is also closely related to the matrix normal distribution.
General case
A random vector $X = [X_1, \dots, X_N]^T$ follows a multivariate normal distribution if it satisfies the following equivalent conditions:
- every linear combination $Y = a_1 X_1 + \cdots + a_N X_N$ is normally distributed;
- there is a random vector $Z = [Z_1, \dots, Z_M]^T$, whose components are independent standard normal random variables, a vector $\mu = [\mu_1, \dots, \mu_N]^T$ and an $N \times M$ matrix $A$ such that $X = AZ + \mu$;
- there is a vector $\mu$ and a symmetric, positive semi-definite matrix $\Sigma$ such that the characteristic function of $X$ is $\varphi_X(u) = \exp\left(i\mu^T u - \tfrac{1}{2} u^T \Sigma u\right)$.
If $\Sigma$ is non-singular, then the distribution may be described by the following PDF:
$$f_X(x_1, \dots, x_N) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
where $|\Sigma|$ is the determinant of $\Sigma$. Note how the equation above reduces to that of the univariate normal distribution if $\Sigma$ is a scalar (i.e., a $1 \times 1$ matrix holding the variance $\sigma^2$).
The vector $\mu$ in these conditions is the expected value of $X$ and the matrix $\Sigma = AA^T$ is the covariance matrix of the components $X_i$.
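As a rough illustration of the density formula, the following sketch evaluates the PDF with NumPy and compares the $1 \times 1$ case against the familiar univariate formula. The function names `mvn_pdf` and `univariate_pdf` are illustrative only and are not taken from the article or from any particular library.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, assuming sigma is symmetric positive-definite."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    n = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu), via a linear solve
    # rather than an explicit matrix inverse.
    quad = diff @ np.linalg.solve(sigma, diff)
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        # A non-positive determinant rules out positive-definiteness.
        raise ValueError("sigma must be positive-definite")
    log_pdf = -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
    return float(np.exp(log_pdf))

def univariate_pdf(x, mean, var):
    """Ordinary one-dimensional normal density, for comparison."""
    return float(np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var))

# With a 1 x 1 covariance matrix the two formulas agree.
print(mvn_pdf([0.3], [1.0], [[2.0]]), univariate_pdf(0.3, 1.0, 2.0))
```

Working with the log-determinant and a linear solve is a common numerical choice; it is one reasonable way to evaluate the formula, not the only one.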
It is important to realize that the covariance matrix must be allowed to be singular (in which case the distribution is not described by the above formula, which requires $\Sigma^{-1}$ to be defined). That case arises frequently in statistics; for example, in the distribution of the vector of residuals in ordinary linear regression problems. Note also that the $X_i$ are in general not independent; they can be seen as the result of applying the linear transformation $A$ to a collection of independent Gaussian variables $Z$.
That the distribution of a random vector $X$ is a multivariate normal distribution can be written in the following notation:
$$X \sim N(\mu, \Sigma),$$
or, to make it explicitly known that $X$ is $N$-dimensional,
$$X \sim N_N(\mu, \Sigma).$$
Cumulative distribution function
The cumulative distribution function (cdf) $F(x)$ is defined as the probability that all values in a random vector $X$ are less than or equal to the corresponding values in vector $x$. Though there is no closed form for $F(x)$, there are a number of algorithms that estimate it numerically. For example, see MVNDST under [1] (includes FORTRAN code) or [2] (includes MATLAB code).
A counterexample
The fact that two random variables $X$ and $Y$ are each normally distributed does not imply that the pair $(X, Y)$ has a joint normal distribution. A simple example is one in which $X$ is normal with mean 0, $Y = X$ if $|X| > 1$ and $Y = -X$ if $|X| < 1$. By symmetry $Y$ is again normal, but $X + Y$ equals 0 with positive probability, so this linear combination is not normally distributed. Also see normally distributed and uncorrelated does not imply independent.
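The counterexample can be checked by simulation. The sketch below assumes $X$ is standard normal (one convenient choice of a mean-zero normal) and uses only the Python standard library; the helper name `sample_pair` is illustrative.

```python
import random

def sample_pair(rng):
    """Draw (X, Y) with X standard normal, Y = X if |X| > 1 and Y = -X otherwise."""
    x = rng.gauss(0.0, 1.0)
    y = x if abs(x) > 1.0 else -x
    return x, y

rng = random.Random(0)
pairs = [sample_pair(rng) for _ in range(100_000)]

# Y on its own behaves like a standard normal variable.
ys = [y for _, y in pairs]
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)

# X + Y is exactly zero whenever |X| < 1, which happens with
# probability of roughly 0.68, so X + Y cannot be normal.
sums = [x + y for x, y in pairs]
frac_zero = sum(1 for s in sums if s == 0.0) / len(sums)

print(mean_y, var_y, frac_zero)
```

A jointly normal pair would make every linear combination normal (or constant), so the large point mass of $X + Y$ at zero is enough to rule out joint normality.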
Normally distributed and independent
If $X$ and $Y$ are normally distributed and independent, then they are "jointly normally distributed", i.e., the pair $(X, Y)$ has a bivariate normal distribution. There are of course also many bivariate normal distributions in which the components are correlated.
Bivariate case
In the 2-dimensional nonsingular case, the probability density function (with mean (0,0)) is
$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2\rho x y}{\sigma_x\sigma_y}\right)\right)$$
where $\sigma_x$ and $\sigma_y$ are the standard deviations of $X$ and $Y$ and $\rho$ is the correlation between $X$ and $Y$. In this case,
$$\Sigma = \begin{bmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{bmatrix}.$$
Linear transformation
If $Y = BX$ is a linear transformation of $X \sim N(\mu, \Sigma)$ where $B$ is an $M \times N$ matrix then $Y$ has a multivariate normal distribution with expected value $B\mu$ and variance $B\Sigma B^T$ (i.e., $Y \sim N(B\mu, B\Sigma B^T)$).
Corollary: any subset of the $X_i$ has a marginal distribution that is also multivariate normal. To see this consider the following example: to extract the subset $(X_1, X_2, X_4)^T$, use
$$B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}$$
which extracts the desired elements directly.
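The corollary can be made concrete with a small numerical sketch. The 4-dimensional mean and covariance below are arbitrary values chosen for illustration; the selection matrix `B` picks out the first, second and fourth components, and the transformed parameters coincide with the corresponding sub-blocks of the original ones.

```python
import numpy as np

# Arbitrary 4-dimensional example (values are assumptions for illustration).
mu = np.array([1.0, -2.0, 0.5, 3.0])
L = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.2, -0.3, 1.0, 0.0],
              [0.1, 0.4, -0.2, 1.0]])
sigma = L @ L.T  # symmetric positive-definite by construction

# Selection matrix extracting (X_1, X_2, X_4).
B = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

mean_y = B @ mu          # expected value of Y = B X
cov_y = B @ sigma @ B.T  # covariance of Y = B X

# The same quantities obtained by plain indexing into mu and sigma.
idx = [0, 1, 3]
print(np.allclose(mean_y, mu[idx]))
print(np.allclose(cov_y, sigma[np.ix_(idx, idx)]))
```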
Geometric interpretation
The equidensity contours of a non-singular multivariate normal distribution are ellipsoids (i.e. linear transformations of hyperspheres) centered at the mean [1]. The directions of the principal axes of the ellipsoids are given by the eigenvectors of the covariance matrix $\Sigma$. The squared relative lengths of the principal axes are given by the corresponding eigenvalues.
If $\Sigma = U\Lambda U^T = U\Lambda^{1/2}(U\Lambda^{1/2})^T$ is an eigendecomposition where the columns of $U$ are unit eigenvectors and $\Lambda$ is a diagonal matrix of the eigenvalues, then we have
$$X \sim N(\mu, \Sigma) \iff X \sim \mu + U\Lambda^{1/2} N(0, I) \iff X \sim \mu + U N(0, \Lambda).$$
Moreover, $U$ can be chosen to be a rotation matrix, as inverting an axis does not have any effect on $N(0, \Lambda)$, but inverting a column changes the sign of $U$'s determinant. The distribution $N(\mu, \Sigma)$ is in effect $N(0, I)$ scaled by $\Lambda^{1/2}$, rotated by $U$ and translated by $\mu$.
Conversely, any choice of $\mu$, full rank matrix $U$, and positive diagonal entries $\Lambda_i$ yields a non-singular multivariate normal distribution. If any $\Lambda_i$ is zero and $U$ is square, the resulting covariance matrix $U\Lambda U^T$ is singular. Geometrically this means that every contour ellipsoid is infinitely thin and has zero volume in $n$-dimensional space, as at least one of the principal axes has length of zero.
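The scale, rotate, translate reading can be sketched numerically. The example below uses an arbitrary $2 \times 2$ covariance matrix (an assumption for illustration), takes its eigendecomposition with `numpy.linalg.eigh`, flips one eigenvector if needed so that $U$ is a proper rotation, and then builds samples as $\mu + U\Lambda^{1/2}Z$.

```python
import numpy as np

# Arbitrary example parameters (assumed for illustration).
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
mu = np.array([1.0, -1.0])

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
# as the columns of U.
eigvals, U = np.linalg.eigh(sigma)

# Flipping one column changes the sign of det(U) without changing U Lambda U^T,
# so U can always be made a rotation (determinant +1).
if np.linalg.det(U) < 0:
    U[:, 0] = -U[:, 0]

scale = U @ np.diag(np.sqrt(eigvals))  # U Lambda^{1/2}
reconstructed = scale @ scale.T        # should equal sigma

# Scale and rotate standard normal draws, then translate by mu.
rng = np.random.default_rng(0)
z = rng.standard_normal((2, 50_000))
x = mu[:, None] + scale @ z
sample_cov = np.cov(x)

print(np.allclose(reconstructed, sigma))
print(sample_cov)
```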
Correlations and independence
In general, random variables may be uncorrelated but highly dependent. But if a random vector has a multivariate normal distribution then any two or more of its components that are uncorrelated are independent. This implies that any two or more of its components that are pairwise independent are independent.
But it is not true that two random variables that are (separately, marginally) normally distributed and uncorrelated are independent. Two random variables that are normally distributed may fail to be jointly normally distributed, i.e., the vector whose components they are may fail to have a multivariate normal distribution. For an example of two normally distributed random variables that are uncorrelated but not independent, see normally distributed and uncorrelated does not imply independent.
Higher moments
The $k$th-order moments of $X$ are defined by
$$\mu_{1,\dots,N}(X) \stackrel{\mathrm{def}}{=} \mu_{r_1,\dots,r_N}(X) \stackrel{\mathrm{def}}{=} E\left[\prod_{j=1}^{N} X_j^{r_j}\right]$$
where $r_1 + r_2 + \cdots + r_N = k$.
The central $k$-order moments are given as follows:
(a) If $k$ is odd, $\mu_{1,\dots,N}(X - \mu) = 0$.
(b) If $k$ is even with $k = 2\lambda$, then
$$\mu_{1,\dots,2\lambda}(X - \mu) = \sum \left(\sigma_{ij}\sigma_{k\ell}\cdots\sigma_{XZ}\right)$$
where the sum is taken over all allocations of the set $\{1, \dots, 2\lambda\}$ into $\lambda$ (unordered) pairs, giving $(2\lambda - 1)! / (2^{\lambda-1}(\lambda - 1)!)$ terms in the sum, each being the product of $\lambda$ covariances. The covariances are determined by replacing the terms of the list $[1, \dots, 2\lambda]$ by the corresponding terms of the list consisting of $r_1$ ones, then $r_2$ twos, etc., after each of the possible allocations of the former list into pairs.
In particular, the 4-order moments (for zero-mean $X$) are
$$E\left[X_i^4\right] = 3(\sigma_{ii})^2$$
$$E\left[X_i^3 X_j\right] = 3\sigma_{ii}\sigma_{ij}$$
$$E\left[X_i^2 X_j^2\right] = \sigma_{ii}\sigma_{jj} + 2(\sigma_{ij})^2$$
$$E\left[X_i^2 X_j X_k\right] = \sigma_{ii}\sigma_{jk} + 2\sigma_{ij}\sigma_{ik}$$
$$E\left[X_i X_j X_k X_n\right] = \sigma_{ij}\sigma_{kn} + \sigma_{ik}\sigma_{jn} + \sigma_{in}\sigma_{jk}.$$
For fourth-order moments (four variables) there are three terms. For sixth-order moments there are 3 × 5 = 15 terms, and for eighth-order moments there are 3 × 5 × 7 = 105 terms.
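The pairing rule lends itself to a short enumeration. The following sketch, with the illustrative helper names `pairings` and `central_moment`, lists every split of the index positions into unordered pairs and sums the products of covariances. It assumes `sigma` is given as a nested list and that indices are 0-based.

```python
def pairings(items):
    """Yield every way to split the list `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def central_moment(indices, sigma):
    """E[prod_k (X_{i_k} - mu_{i_k})] for jointly normal X with covariance sigma.

    `indices` may repeat an index, e.g. [0, 0, 1, 1] stands for X_0^2 X_1^2.
    """
    if len(indices) % 2 == 1:
        return 0.0  # odd central moments vanish
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for a, b in pairing:
            term *= sigma[a][b]
        total += term
    return total

# Number of terms for orders 4, 6 and 8.
print([sum(1 for _ in pairings(list(range(k)))) for k in (4, 6, 8)])

sigma = [[2.0, 0.5],
         [0.5, 1.0]]
print(central_moment([0, 0, 0, 0], sigma))  # compare with 3 * sigma_00^2
print(central_moment([0, 0, 1, 1], sigma))  # compare with sigma_00*sigma_11 + 2*sigma_01^2
```

The enumeration grows quickly with the order, so this is meant for checking small cases rather than for efficient computation.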
The sixth-order moment case can be expanded as
$$\begin{aligned}
E[X_1 X_2 X_3 X_4 X_5 X_6] ={} & E[X_1X_2]E[X_3X_4]E[X_5X_6] + E[X_1X_2]E[X_3X_5]E[X_4X_6] + E[X_1X_2]E[X_3X_6]E[X_4X_5] \\
& + E[X_1X_3]E[X_2X_4]E[X_5X_6] + E[X_1X_3]E[X_2X_5]E[X_4X_6] + E[X_1X_3]E[X_2X_6]E[X_4X_5] \\
& + E[X_1X_4]E[X_2X_3]E[X_5X_6] + E[X_1X_4]E[X_2X_5]E[X_3X_6] + E[X_1X_4]E[X_2X_6]E[X_3X_5] \\
& + E[X_1X_5]E[X_2X_3]E[X_4X_6] + E[X_1X_5]E[X_2X_4]E[X_3X_6] + E[X_1X_5]E[X_2X_6]E[X_3X_4] \\
& + E[X_1X_6]E[X_2X_3]E[X_4X_5] + E[X_1X_6]E[X_2X_4]E[X_3X_5] + E[X_1X_6]E[X_2X_5]E[X_3X_4].
\end{aligned}$$
Conditional distributions
If $\mu$ and $\Sigma$ are partitioned as follows
$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \quad \text{with sizes} \quad \begin{bmatrix} q \times 1 \\ (N-q) \times 1 \end{bmatrix}$$
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \quad \text{with sizes} \quad \begin{bmatrix} q \times q & q \times (N-q) \\ (N-q) \times q & (N-q) \times (N-q) \end{bmatrix}$$
then the distribution of $x_1$ conditional on $x_2 = a$ is multivariate normal, $(X_1 \mid X_2 = a) \sim N(\bar{\mu}, \overline{\Sigma})$, where
$$\bar{\mu} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$$
and covariance matrix
$$\overline{\Sigma} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$
This matrix is the Schur complement of $\Sigma_{22}$ in $\Sigma$.
Note that knowing the value of $x_2$ to be $a$ alters the variance; perhaps more surprisingly, the mean is shifted by $\Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$; compare this with the situation of not knowing the value of $a$, in which case $x_1$ would have distribution $N_q(\mu_1, \Sigma_{11})$. The matrix $\Sigma_{12}\Sigma_{22}^{-1}$ is known as the matrix of regression coefficients.
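The conditioning formulas translate fairly directly into NumPy. The sketch below uses a hypothetical helper `condition_mvn` and assumes that $x_1$ consists of the first $q$ components and that $\Sigma_{22}$ is non-singular. The usage example is a bivariate case with unit variances and correlation 0.8.

```python
import numpy as np

def condition_mvn(mu, sigma, q, a):
    """Mean and covariance of X1 | X2 = a, where X1 is the first q components."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    a = np.asarray(a, dtype=float)
    mu1, mu2 = mu[:q], mu[q:]
    s11, s12 = sigma[:q, :q], sigma[:q, q:]
    s21, s22 = sigma[q:, :q], sigma[q:, q:]
    # Regression coefficients Sigma_12 Sigma_22^{-1}; by symmetry of sigma this
    # equals the transpose of Sigma_22^{-1} Sigma_21, obtained with a solve.
    coef = np.linalg.solve(s22, s21).T
    cond_mean = mu1 + coef @ (a - mu2)
    cond_cov = s11 - coef @ s21  # Schur complement of Sigma_22 in Sigma
    return cond_mean, cond_cov

mean, cov = condition_mvn([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], 1, [2.0])
print(mean, cov)
```

In this bivariate example the conditional mean is $\rho a$ and the conditional variance is $1 - \rho^2$, which matches the general formulas with $q = 1$.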
Fisher information matrix
The Fisher information matrix (FIM) for a normal distribution takes a special formulation. The $(m,n)$ element of the FIM for $X \sim N(\mu(\theta), \Sigma(\theta))$ is
$$\mathcal{I}_{m,n} = \frac{\partial \mu^T}{\partial \theta_m} \Sigma^{-1} \frac{\partial \mu}{\partial \theta_n} + \frac{1}{2} \operatorname{tr}\left(\Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_m} \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_n}\right)$$
where $\partial\mu/\partial\theta_m = [\partial\mu_1/\partial\theta_m, \dots, \partial\mu_N/\partial\theta_m]^T$, $\partial\Sigma/\partial\theta_m$ is the matrix of element-wise partial derivatives of $\Sigma$, and $\operatorname{tr}$ is the trace function.
Kullback-Leibler divergence
The Kullback-Leibler divergence from $N_0 = N(\mu_0, \Sigma_0)$ to $N_1 = N(\mu_1, \Sigma_1)$, both $N$-dimensional and non-singular, is:
$$D_{\mathrm{KL}}(N_0 \,\|\, N_1) = \frac{1}{2}\left(\ln\frac{\det\Sigma_1}{\det\Sigma_0} + \operatorname{tr}\left(\Sigma_1^{-1}\Sigma_0\right) + (\mu_1 - \mu_0)^T \Sigma_1^{-1} (\mu_1 - \mu_0) - N\right).$$
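A minimal sketch of the Kullback-Leibler divergence between two non-singular multivariate normals, using the standard closed form; the function name `kl_mvn` is illustrative. The one-dimensional usage example can be checked by hand against the univariate expression $\ln(\sigma_1/\sigma_0) + (\sigma_0^2 + (\mu_0 - \mu_1)^2)/(2\sigma_1^2) - 1/2$.

```python
import numpy as np

def kl_mvn(mu0, sigma0, mu1, sigma1):
    """KL divergence from N(mu0, sigma0) to N(mu1, sigma1), both non-singular."""
    mu0 = np.atleast_1d(np.asarray(mu0, dtype=float))
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    sigma0 = np.atleast_2d(np.asarray(sigma0, dtype=float))
    sigma1 = np.atleast_2d(np.asarray(sigma1, dtype=float))
    n = mu0.shape[0]
    diff = mu1 - mu0
    # tr(sigma1^{-1} sigma0) and the quadratic form, both via linear solves.
    trace_term = np.trace(np.linalg.solve(sigma1, sigma0))
    quad_term = diff @ np.linalg.solve(sigma1, diff)
    _, logdet0 = np.linalg.slogdet(sigma0)
    _, logdet1 = np.linalg.slogdet(sigma1)
    return float(0.5 * (logdet1 - logdet0 + trace_term + quad_term - n))

# One-dimensional check: N(0, 1) against N(1, 4).
print(kl_mvn([0.0], [[1.0]], [1.0], [[4.0]]), np.log(2.0) - 0.25)
```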
Estimation of parameters
The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle and elegant. See estimation of covariance matrices.
In short, the probability density function (pdf) of an $N$-dimensional multivariate normal is
$$f(x) = (2\pi)^{-N/2} \det(\Sigma)^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
and the ML estimator of the covariance matrix is
$$\widehat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \overline{X})(X_i - \overline{X})^T$$
which is simply the sample covariance matrix for sample size $n$. This is a biased estimator whose expectation is
$$E[\widehat{\Sigma}] = \frac{n-1}{n}\Sigma.$$
An unbiased sample covariance is
$$\widehat{\Sigma} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \overline{X})(X_i - \overline{X})^T.$$
Entropy
The differential entropy of the multivariate normal distribution is [2]
$$h(f) = -\int f(x) \ln f(x)\, dx = \frac{1}{2}\left(N + N\ln(2\pi) + \ln|\Sigma|\right) = \frac{1}{2}\ln\left((2\pi e)^N |\Sigma|\right)$$
where $|\Sigma|$ is the determinant of the covariance matrix $\Sigma$.
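The two covariance estimators and the entropy formula can be sketched as follows. The helper names `covariance_estimates` and `mvn_entropy` are illustrative, samples are assumed to be stored one observation per row, and the entropy is in nats.

```python
import numpy as np

def covariance_estimates(samples):
    """Return (ML estimate, unbiased estimate) of the covariance matrix.

    `samples` has shape (n, N): n observations of an N-dimensional vector.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[0]
    centered = samples - samples.mean(axis=0)
    scatter = centered.T @ centered  # sum of outer products of deviations
    return scatter / n, scatter / (n - 1)

def mvn_entropy(sigma):
    """Differential entropy (in nats) of N(mu, sigma); it does not depend on mu."""
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    n = sigma.shape[0]
    _, logdet = np.linalg.slogdet(sigma)
    return float(0.5 * (n * (1.0 + np.log(2.0 * np.pi)) + logdet))

rng = np.random.default_rng(1)
data = rng.standard_normal((200, 3))
ml_cov, unbiased_cov = covariance_estimates(data)

# The two estimates differ exactly by the factor (n - 1) / n.
print(np.allclose(ml_cov, unbiased_cov * 199 / 200))
print(mvn_entropy(np.eye(2)))
```

NumPy's `np.cov(data, rowvar=False)` computes the unbiased version by default, so it can serve as a cross-check for the second estimate.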
Multivariate normality tests
Multivariate normality tests check a given set of data for similarity to the multivariate normal distribution. The null hypothesis is that the data are drawn from a multivariate normal distribution; a sufficiently small p-value is therefore evidence of non-normal data. Multivariate normality tests include the Cox-Small test [3] and Smith and Jain's adaptation [4] of the Friedman-Rafsky test.
Drawing values from the distribution
A widely used method for drawing a random vector $X$ from the $N$-dimensional multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ (required to be symmetric and positive-definite) works as follows:
- Compute the Cholesky decomposition (matrix square root) of $\Sigma$, that is, find the unique lower triangular matrix $A$ with positive diagonal entries such that $AA^T = \Sigma$.
- Let $Z = (z_1, \dots, z_N)^T$ be a vector whose components are $N$ independent standard normal variates (which can be generated, for example, by using the Box-Muller transform).
- Let $X$ be $\mu + AZ$.
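The three steps above can be sketched with NumPy as follows. The function name `sample_mvn` is illustrative, and the standard normal variates come from NumPy's own generator rather than an explicit Box-Muller transform; either source of independent standard normals would serve.

```python
import numpy as np

def sample_mvn(mu, sigma, size, rng):
    """Draw `size` vectors from N(mu, sigma); sigma must be symmetric positive-definite."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Step 1: lower triangular A with A @ A.T == sigma.
    A = np.linalg.cholesky(sigma)
    # Step 2: independent standard normal variates, one row per draw.
    z = rng.standard_normal((size, mu.shape[0]))
    # Step 3: each row is mu + A z (written for row vectors as z @ A.T).
    return mu + z @ A.T

rng = np.random.default_rng(42)
mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
draws = sample_mvn(mu, sigma, 100_000, rng)

print(draws.mean(axis=0))
print(np.cov(draws, rowvar=False))
```

`np.linalg.cholesky` raises an error when the matrix is not positive-definite, so a singular $\Sigma$ would need a different factorization, such as the eigendecomposition.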
References
- [1] Nikolaus Hansen. The CMA Evolution Strategy: A Tutorial.
- [2] Gokhale, DV; NA Ahmed, BC Res, NJ Piscataway (May 1989). "Entropy Expressions and Their Estimators for Multivariate Distributions". Information Theory, IEEE Transactions on 35 (3): 688–692.
- [3] Cox, D. R.; N. J. H. Small (August 1978). "Testing multivariate normality". Biometrika 65 (2): 263–272.
- [4] Smith, Stephen P.; Anil K. Jain (September 1988). "A test to determine the multivariate normality of a dataset". IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (5): 757–761. doi:10.1109/34.6789.
|