Gaussian processes

Gaussian processes play a fundamental role in the calibration tasks of ACBICI. This section briefly summarizes their definition and properties and explains how they are employed in the calibration.

Definition

According to Wikipedia, “A GP is a stochastic process, that is, a collection of random variables indexed by a scalar, often interpreted as time, such that every finite collection of these variables has a multivariate normal distribution. Equivalently, every finite linear combination of these variables is normally distributed.”

A Gaussian process on \(\mathbb{R}^n\) is completely defined by a mean function \(m:\mathbb{R}^n\to\mathbb{R}\) and a covariance function \(c:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}\). We write \(f(x)\sim \mathcal{GP}(m(\cdot),c(\cdot,\cdot))\).

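As per its definition, the fundamental property of a GP is that, when restricted to a finite number \(N\) of points, the corresponding function values have a joint multivariate normal distribution. That is, if we define \(\mathbf{x}=\{x_1,x_2,\ldots,x_N\}\) and \(f(\mathbf{x})=(f(x_1),f(x_2),\ldots,f(x_N))\), then

\[f(\mathbf{x}) \sim \mathcal{N}(m(\mathbf{x}), c(\mathbf{x},\mathbf{x}))\]

where \(m(\mathbf{x})\) is the vector with components \(m(x_i)\) and \(c(\mathbf{x},\mathbf{x})\) is the matrix with components \(c(x_i,x_j)\).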
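As a small worked instance of this property, take \(N=2\). Writing the mean vector and the covariance matrix component by component, the joint distribution of the two function values reads

\[\begin{pmatrix} f(x_1) \\ f(x_2) \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} m(x_1) \\ m(x_2) \end{pmatrix}, \begin{pmatrix} c(x_1,x_1) & c(x_1,x_2) \\ c(x_2,x_1) & c(x_2,x_2) \end{pmatrix} \right)\]

The diagonal entries are the variances of \(f(x_1)\) and \(f(x_2)\), and the off-diagonal entries, equal by symmetry, measure how strongly the two values are correlated.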

Covariance and kernel

When considering a finite number of random variables as before, the covariance matrix of the multivariate distribution is

\[\Sigma = c(\mathbf{x},\mathbf{x})\]

This means that, for every pair \(x_i,x_j\) of points in \(\mathbf{x}\), the \((i,j)\) component of the covariance matrix is calculated by evaluating the covariance function \(c\) at these two points. This function cannot be arbitrary: it must be symmetric and positive semidefinite, so that every matrix \(\Sigma\) built in this way is a valid covariance matrix. Most often the covariance function is defined in terms of an isotropic kernel \(k:\mathbb{R}^+\to\mathbb{R}\) such that \(c(\mathbf{x},\mathbf{y}) = k(\|\mathbf{x}-\mathbf{y}\|)\).
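The construction of the covariance matrix from an isotropic kernel can be sketched in a few lines of NumPy. This is only an illustration, not ACBICI's implementation: the function name `covariance_matrix` and the placeholder kernel \(k(r)=e^{-r}\) are assumptions made for the example.

```python
import numpy as np

def covariance_matrix(X, k):
    """Return Sigma with Sigma[i, j] = k(||x_i - x_j||) for the rows x_i of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]   # pairwise differences, shape (N, N, n)
    r = np.linalg.norm(diff, axis=-1)      # pairwise Euclidean distances, shape (N, N)
    return k(r)

# Placeholder isotropic kernel (assumption): exponential, unit variance and lengthscale
def k(r):
    return np.exp(-r)

# Three hypothetical points in R^2
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Sigma = covariance_matrix(X, k)

# A valid covariance matrix is symmetric with non-negative eigenvalues
eigenvalues = np.linalg.eigvalsh(Sigma)
```

Checking that `Sigma` equals its transpose and that `eigenvalues` are non-negative is a quick way to confirm that a candidate kernel yields a usable covariance matrix on a given set of points.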

Note that, given a vector \(\mathbf{x}\in\mathbb{R}^n\), its Euclidean norm \(\|\mathbf{x}\| = (\sum_{i=1}^n x_i^2)^{1/2}\) is only well-defined when all the components of the vector are either dimensionless or have the same physical dimensions. If this is not the case, a symmetric positive definite metric \(\mathbf{M}\) must be defined such that

\[\| \mathbf{x} \|^2 = \mathbf{x}^T\; \mathbf{M} \mathbf{x} .\]
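To make the role of the metric concrete, the following sketch evaluates \(\|\mathbf{x}\|\) for a vector that mixes a length and a time. The helper `metric_norm`, the numerical values, and the choice of a diagonal \(\mathbf{M}\) built from characteristic scales are all assumptions for illustration; other metrics are possible.

```python
import numpy as np

def metric_norm(x, M):
    """Return sqrt(x^T M x) for a symmetric positive definite metric M."""
    x = np.asarray(x, dtype=float)
    M = np.asarray(M, dtype=float)
    return float(np.sqrt(x @ M @ x))

# Hypothetical vector: a length of 2 m and a time of 30 s
x = np.array([2.0, 30.0])

# Assumed characteristic scales: 1 m and 10 s. Dividing each squared
# component by its squared scale makes every term dimensionless.
scales = np.array([1.0, 10.0])
M = np.diag(1.0 / scales**2)

norm_x = metric_norm(x, M)   # sqrt((2/1)^2 + (30/10)^2)
```

With \(\mathbf{M}\) equal to the identity, `metric_norm` reduces to the ordinary Euclidean norm.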

In ACBICI, it is assumed that all variables are dimensionless, which simplifies the kernel definitions but also entails certain limitations, including:

Loss of physical interpretability: By treating all variables as dimensionless, the connection to their original physical units is lost. This can obscure meaningful relationships that depend on units, making it harder to interpret results in a physical context.
Reduced flexibility: Forcing all variables into a common, dimensionless framework may be inappropriate for heterogeneous data combining quantities with fundamentally different units (e.g., length, time, energy). This can limit the applicability of the model to complex, multi-physical problems.
Scaling sensitivity: Without explicitly accounting for the physical dimensions, improper or inconsistent scaling of variables can bias distance-based methods such as kernels. The natural geometry of the data, which could be captured by a positive definite metric matrix \(\mathbf{M}\), is neglected.

Some common isotropic kernels available in ACBICI are:

The squared exponential kernel or RBF kernel, which depends on a lengthscale parameter \(\beta\) and a signal variance \(\lambda\)

\[k_{sqexp}(r;\lambda,\beta) = \lambda \exp\left[-\frac{r^2}{2\beta^2}\right]\]

The Matérn 3/2 kernel, which depends on a lengthscale parameter \(\beta\) and a signal variance \(\lambda\)

\[k_{m32}(r;\lambda,\beta) = \lambda \left(1+\frac{\sqrt3\,r}{\beta}\right) \exp\left[-\frac{\sqrt3\,r}{\beta}\right]\]

The Matérn 5/2 kernel, which depends on a lengthscale parameter \(\beta\) and a signal variance \(\lambda\)

\[k_{m52}(r;\lambda,\beta) = \lambda \left(1+\frac{\sqrt5\,r}{\beta} + \frac{5\,r^2}{3\,\beta^2}\right) \exp\left[-\frac{\sqrt5\,r}{\beta}\right]\]

The exponential kernel, which depends on a lengthscale parameter \(\beta\) and a signal variance \(\lambda\)

\[k_{exp}(r;\lambda,\beta) = \lambda\exp\left[-r/\beta\right]\]

The rational quadratic kernel with \(\alpha=1\), which depends on a lengthscale parameter \(\beta\) and a signal variance \(\lambda\)

\[k_{ratquad}(r;\lambda,\beta) = \lambda\left(1+\frac{r^2}{2\alpha\beta^2}\right)^{-\alpha}\]
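The kernel formulas above translate directly into code. The sketch below is a plain NumPy transcription for illustration; the function names are hypothetical and are not taken from the ACBICI API. Here `lam` stands for the signal variance \(\lambda\) and `beta` for the lengthscale \(\beta\).

```python
import numpy as np

def k_sqexp(r, lam, beta):
    """Squared exponential (RBF) kernel."""
    return lam * np.exp(-r**2 / (2.0 * beta**2))

def k_m32(r, lam, beta):
    """Matern 3/2 kernel."""
    s = np.sqrt(3.0) * r / beta
    return lam * (1.0 + s) * np.exp(-s)

def k_m52(r, lam, beta):
    """Matern 5/2 kernel; note that s**2 / 3 equals 5 r^2 / (3 beta^2)."""
    s = np.sqrt(5.0) * r / beta
    return lam * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def k_exp(r, lam, beta):
    """Exponential kernel."""
    return lam * np.exp(-r / beta)

def k_ratquad(r, lam, beta, alpha=1.0):
    """Rational quadratic kernel, with alpha = 1 by default as in the text."""
    return lam * (1.0 + r**2 / (2.0 * alpha * beta**2)) ** (-alpha)

KERNELS = {"sqexp": k_sqexp, "m32": k_m32, "m52": k_m52,
           "exp": k_exp, "ratquad": k_ratquad}

# Evaluate every kernel at a few distances, with lam = 2.0 and beta = 0.5
r = np.array([0.0, 0.5, 1.0, 2.0])
values = {name: k(r, 2.0, 0.5) for name, k in KERNELS.items()}
```

All five kernels return \(\lambda\) at \(r=0\) and decay monotonically as \(r\) grows; they differ in how fast they decay and in how smooth the resulting Gaussian process samples are.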

For the multivariate GP, ACBICI uses a similarity kernel that applies the Matérn 3/2 kernel when the two points belong to the same task and returns 0 otherwise, as explained in Multi-Output Calibration.
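Read literally, this rule yields a covariance matrix that is block diagonal when the points are ordered by task. The following sketch illustrates that reading for one-dimensional inputs; the function `similarity_covariance` and the integer task labels are assumptions for the example, and the actual construction is the one described in Multi-Output Calibration.

```python
import numpy as np

def k_m32(r, lam, beta):
    """Matern 3/2 kernel."""
    s = np.sqrt(3.0) * r / beta
    return lam * (1.0 + s) * np.exp(-s)

def similarity_covariance(x, tasks, lam, beta):
    """Matern 3/2 covariance between points of the same task, 0 otherwise."""
    x = np.asarray(x, dtype=float)
    tasks = np.asarray(tasks)
    r = np.abs(x[:, None] - x[None, :])            # pairwise distances (1D inputs)
    same_task = tasks[:, None] == tasks[None, :]   # True where task labels agree
    return np.where(same_task, k_m32(r, lam, beta), 0.0)

# Two hypothetical tasks observed at the same two input locations
x = np.array([0.0, 0.5, 0.0, 0.5])
tasks = np.array([0, 0, 1, 1])
Sigma = similarity_covariance(x, tasks, lam=1.0, beta=1.0)
```

In this example the upper-left and lower-right 2x2 blocks of `Sigma` coincide, and the off-diagonal blocks vanish, so the two tasks are modelled as uncorrelated.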

All the kernels employ one or more hyperparameters whose values shape the covariance. Note that ACBICI uses two lengthscale parameters: \(\beta_t\) for the calibration parameters and \(\beta_x\) for the input variables.

The use of Gaussian processes in calibration

Gaussian processes have many applications in statistics and data science. In particular, they are powerful nonparametric regression models.

In ACBICI, they are employed for two reasons:

When the model that has to be calibrated is too expensive to evaluate, it is replaced by a meta-model that, although it needs to be calibrated itself, is much cheaper to evaluate. In ACBICI, this surrogate model is a Gaussian process.
Often, the discrepancy error of the model needs to be inferred. Since there is no a priori information about the form of this discrepancy, ACBICI uses a Gaussian process to represent it.
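The surrogate idea can be illustrated with standard Gaussian process regression. The sketch below is a textbook zero-mean GP with a squared exponential kernel, not the ACBICI implementation: the function `gp_predict`, the stand-in `expensive_model`, and all numerical values (including the small `noise` term added for numerical stability) are assumptions.

```python
import numpy as np

def k_sqexp(r, lam, beta):
    """Squared exponential kernel."""
    return lam * np.exp(-r**2 / (2.0 * beta**2))

def gp_predict(x_train, y_train, x_test, lam=1.0, beta=0.2, noise=1e-8):
    """Posterior mean and variance of a zero-mean GP at x_test (1D inputs)."""
    K = k_sqexp(np.abs(x_train[:, None] - x_train[None, :]), lam, beta)
    K += noise * np.eye(len(x_train))          # jitter on the diagonal
    Ks = k_sqexp(np.abs(x_test[:, None] - x_train[None, :]), lam, beta)
    L = np.linalg.cholesky(K)                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha                          # k_*^T K^{-1} y
    v = np.linalg.solve(L, Ks.T)
    var = lam - np.sum(v**2, axis=0)           # k(0) - k_*^T K^{-1} k_*
    return mean, var

def expensive_model(x):
    """Hypothetical stand-in for a costly simulation."""
    return np.sin(2.0 * np.pi * x)

# A handful of expensive runs train the surrogate...
x_train = np.linspace(0.0, 1.0, 8)
y_train = expensive_model(x_train)

# ...which can then be evaluated cheaply at new inputs
x_test = np.array([0.1, 0.37, 0.8])
mean, var = gp_predict(x_train, y_train, x_test)
```

The posterior variance `var` quantifies how much the surrogate can be trusted at each new input: it is close to zero at the training points and grows away from them. The same regression machinery applies when the GP represents a discrepancy function instead of the model itself.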

Whenever a Gaussian process is used in ACBICI, its hyperparameters need to be inferred.
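One common ingredient of hyperparameter inference is the log marginal likelihood of the data, which scores a candidate pair \((\lambda,\beta)\). The sketch below computes it for a zero-mean GP with a squared exponential kernel. It is a generic illustration under assumed names and values, and it does not describe the specific inference procedure used in ACBICI.

```python
import numpy as np

def log_marginal_likelihood(x, y, lam, beta, noise=1e-6):
    """log p(y | x, lam, beta) for a zero-mean GP with a squared exponential kernel."""
    n = len(x)
    r = np.abs(x[:, None] - x[None, :])
    K = lam * np.exp(-r**2 / (2.0 * beta**2)) + noise * np.eye(n)
    L = np.linalg.cholesky(K)                        # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = -0.5 * (y @ alpha)                    # -1/2 y^T K^{-1} y
    complexity = -np.sum(np.log(np.diag(L)))         # -1/2 log det K
    return float(data_fit + complexity - 0.5 * n * np.log(2.0 * np.pi))

# Hypothetical data set
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2.0 * np.pi * x)

# Score a few candidate lengthscales at fixed signal variance
lml = {beta: log_marginal_likelihood(x, y, lam=1.0, beta=beta)
       for beta in (0.05, 0.1, 0.2)}
```

In a Bayesian setting, this quantity would typically be combined with priors on the hyperparameters and explored by sampling or optimization.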