Gaussian processes
******************
Gaussian processes (GPs) play a fundamental role in the calibration tasks of ACBICI. This section briefly summarizes their definition and properties and explains how they are employed in calibration.


Definition
==========
According to `Wikipedia <https://en.wikipedia.org/wiki/Gaussian_process>`_, "A GP is a stochastic process, that is, a collection of random variables indexed by a scalar, often interpreted as time, such that every finite collection of these variables has a multivariate normal distribution. Equivalently, every finite linear combination of these variables is normally distributed."

A Gaussian process on $\mathbb{R}^n$ is completely defined by a mean function $m:\mathbb{R}^n\to\mathbb{R}$ and a covariance function $c:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$. We write $f(x)\sim \mathcal{GP}(m(\cdot),c(\cdot,\cdot))$.

As per its definition, the fundamental property of a GP is that, when restricted to a finite number $N$ of variables, their joint distribution is multivariate normal; that is, if we define $\mathbf{x}=\{x_1,x_2,\ldots,x_N\}$, then

.. math::

  f(\mathbf{x}) \sim \mathcal{N}(m(\mathbf{x}), c(\mathbf{x},\mathbf{x}'))
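
The finite-dimensional property can be illustrated by drawing one realization of a GP at a handful of points. The following is a minimal sketch, not ACBICI code: it assumes scalar inputs, a zero mean function, and a squared exponential covariance, and all function names are hypothetical. It uses only the Python standard library to stay self-contained; in practice a linear algebra library would replace the hand-written Cholesky factorization.

```python
import math
import random


def mean(x):
    """Zero mean function (an assumption made for this sketch)."""
    return 0.0


def cov(x, y, variance=1.0, lengthscale=0.3):
    """Squared exponential covariance between two scalar inputs."""
    return variance * math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp(points, rng, jitter=1e-10):
    """Draw one realization of f at the given points."""
    n = len(points)
    # Covariance matrix: entry (i, j) is the covariance function at (x_i, x_j).
    # A small jitter on the diagonal keeps the factorization numerically stable.
    sigma = [[cov(points[i], points[j]) + (jitter if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    low = cholesky(sigma)
    # If z is standard normal, mean + L z has covariance L L^T = sigma.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mean(points[i]) + sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(n)]


rng = random.Random(0)
points = [0.0, 0.25, 0.5, 0.75, 1.0]
draw = sample_gp(points, rng)
print(draw)
```

Each call to ``sample_gp`` returns a different vector of $N$ values whose joint distribution is the multivariate normal given above.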


Covariance and kernel
=====================

When considering a finite number of random variables as before, the covariance matrix of the multivariate distribution is

.. math::

  \Sigma = c(\mathbf{x},\mathbf{x}')

This means that, for every pair $x_i,x_j$ of points in $\mathbf{x}$, the $(i,j)$ component of the covariance matrix is calculated by evaluating the *covariance function* $c$ at these two points, that is, $\Sigma_{ij} = c(x_i,x_j)$. This function cannot be arbitrary: it must be symmetric and positive semidefinite so that $\Sigma$ is a valid covariance matrix. Most often the covariance function is defined in terms of an isotropic *kernel* $k:\mathbb{R}^+\to\mathbb{R}$ such that $c(\mathbf{x},\mathbf{y}) = k(\|\mathbf{x}-\mathbf{y}\|)$.

Note that, given a vector $\mathbf{x}\in\mathbb{R}^n$, its Euclidean norm $\|\mathbf{x}\| = (\sum_{i=1}^n x_i^2)^{1/2}$ is only well defined when all the components of the vector are either dimensionless or have the same physical dimensions. If this is not the case, a symmetric positive definite metric $\mathbf{M}$ must be defined such that

.. math::

  \| \mathbf{x} \|^2 = \mathbf{x}^T\; \mathbf{M} \mathbf{x} .
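
As an illustration of the metric-weighted norm, the sketch below evaluates $\mathbf{x}^T\mathbf{M}\mathbf{x}$ for a two-component vector. The function name, the example values, and the choice of a diagonal metric built from characteristic scales are assumptions made here, not part of ACBICI.

```python
import math


def metric_norm(x, metric):
    """Return sqrt(x^T M x) for a symmetric positive definite matrix M."""
    n = len(x)
    quad = sum(x[i] * metric[i][j] * x[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)


# Hypothetical vector mixing a length (3 m) and a time (4 s).
x = [3.0, 4.0]

# Diagonal metric 1/scale^2, with assumed scales of 3 m and 2 s.
# Each term of x^T M x is then dimensionless.
metric = [[1.0 / 3.0 ** 2, 0.0],
          [0.0, 1.0 / 2.0 ** 2]]

print(metric_norm(x, metric))
```

With the identity matrix as metric, ``metric_norm`` reduces to the Euclidean norm, which is the case ACBICI assumes.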

In ACBICI, it is assumed that all variables are dimensionless, which simplifies the kernel definitions but also entails certain limitations, including:

- **Loss of physical interpretability:** By treating all variables as dimensionless, the connection to their original physical units is lost. This can obscure meaningful relationships that depend on units, making it harder to interpret results in a physical context.

- **Reduced flexibility:** Forcing all variables into a common, dimensionless framework may be inappropriate for heterogeneous data combining quantities with fundamentally different units (e.g., length, time, energy). This can limit the applicability of the model to complex, multi-physical problems.

- **Scaling sensitivity:** Without explicitly accounting for the physical dimensions, improper or inconsistent scaling of variables can bias distance-based methods such as kernels. The natural geometry of the data, which could be captured by a positive definite metric matrix :math:`\mathbf{M}`, is neglected.

Some common isotropic kernels available in ACBICI are:

- The squared exponential kernel or RBF kernel, which depends on a lengthscale parameter $\beta$ and a signal variance $\lambda$

.. math::

  k_{sqexp}(r;\lambda,\beta) = \lambda \exp\left[-\frac{r^2}{2\beta^2}\right]

- The Matérn 3/2 kernel, which depends on a lengthscale parameter $\beta$ and a signal variance $\lambda$

.. math::

  k_{m32}(r;\lambda,\beta) = \lambda \left(1+\frac{\sqrt3\,r}{\beta}\right) \exp\left[-\frac{\sqrt3\,r}{\beta}\right]


- The Matérn 5/2 kernel, which depends on a lengthscale parameter $\beta$ and a signal variance $\lambda$

.. math::

  k_{m52}(r;\lambda,\beta) = \lambda \left(1+\frac{\sqrt5\,r}{\beta} + \frac{5\,r^2}{3\,\beta^2}\right) \exp\left[-\frac{\sqrt5\,r}{\beta}\right]

- The exponential kernel, which depends on a lengthscale parameter $\beta$ and a signal variance $\lambda$

.. math::

  k_{exp}(r;\lambda,\beta) = \lambda\exp\left[-r/\beta\right]

- The rational quadratic kernel with $\alpha=1$, which depends on a lengthscale parameter $\beta$ and a signal variance $\lambda$

.. math::

  k_{ratquad}(r;\lambda,\beta) = \lambda\left(1+\frac{r^2}{2\alpha\beta^2}\right)^{-\alpha}
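
The kernel formulas above translate directly into code. The functions below are a plain transcription for illustration only; their names and signatures are hypothetical and do not correspond to the ACBICI API. Here ``r`` is the distance between two points, ``lam`` the signal variance $\lambda$, and ``beta`` the lengthscale $\beta$.

```python
import math


def k_sqexp(r, lam, beta):
    """Squared exponential (RBF) kernel."""
    return lam * math.exp(-r ** 2 / (2.0 * beta ** 2))


def k_m32(r, lam, beta):
    """Matern 3/2 kernel."""
    s = math.sqrt(3.0) * r / beta
    return lam * (1.0 + s) * math.exp(-s)


def k_m52(r, lam, beta):
    """Matern 5/2 kernel; s**2 / 3 equals 5 r^2 / (3 beta^2)."""
    s = math.sqrt(5.0) * r / beta
    return lam * (1.0 + s + s ** 2 / 3.0) * math.exp(-s)


def k_exp(r, lam, beta):
    """Exponential kernel."""
    return lam * math.exp(-r / beta)


def k_ratquad(r, lam, beta, alpha=1.0):
    """Rational quadratic kernel; alpha defaults to 1 as stated above."""
    return lam * (1.0 + r ** 2 / (2.0 * alpha * beta ** 2)) ** (-alpha)


# Every kernel equals the signal variance at zero distance and decays with r.
for kernel in (k_sqexp, k_m32, k_m52, k_exp, k_ratquad):
    print(kernel.__name__, kernel(0.0, 2.0, 0.5), kernel(1.0, 2.0, 0.5))
```

All five kernels share the same two hyperparameters, so they can be swapped without changing the rest of a covariance computation.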

For the multivariate GP, ACBICI uses a similarity kernel that equals the Matérn 3/2 kernel when the two points belong to the same task and is zero otherwise, as explained in :ref:`multioutput_calibration`.
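
A minimal sketch of such a similarity kernel is given below. It assumes that each point carries an integer task label; the function names and argument layout are hypothetical and may differ from the actual ACBICI implementation.

```python
import math


def k_m32(r, lam, beta):
    """Matern 3/2 kernel as a function of the distance r."""
    s = math.sqrt(3.0) * r / beta
    return lam * (1.0 + s) * math.exp(-s)


def k_similarity(x, task_x, y, task_y, lam, beta):
    """Matern 3/2 covariance within a task, zero covariance across tasks."""
    if task_x != task_y:
        return 0.0
    r = math.dist(x, y)  # Euclidean distance, inputs assumed dimensionless
    return k_m32(r, lam, beta)


same_task = k_similarity([0.0, 0.0], 0, [0.3, 0.4], 0, lam=1.0, beta=0.5)
other_task = k_similarity([0.0, 0.0], 0, [0.3, 0.4], 1, lam=1.0, beta=0.5)
print(same_task, other_task)
```

With this choice, outputs belonging to different tasks are uncorrelated under the prior, so the covariance matrix is block diagonal when the points are ordered by task.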

All the kernels employ one or more *hyperparameters* whose values shape the covariance.
Note that ACBICI uses two lengthscale hyperparameters: :math:`\beta_t` for the parameters and :math:`\beta_x` for the input variables.


The use of Gaussian processes in calibration
============================================

Gaussian processes have many applications in statistics and data science. In particular, they are powerful non-parametric regression models.

In ACBICI, they are employed for two reasons:

- When the model to be calibrated is too expensive to evaluate, it is replaced by a meta-model that, although it also needs to be calibrated, is much cheaper to evaluate. In ACBICI, this surrogate model is a Gaussian process.

- Often, the discrepancy error of the model needs to be inferred. Since there is no a priori information about the form of this discrepancy, ACBICI uses a Gaussian process to represent it.
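
The surrogate idea in the first item can be sketched with standard GP regression: a few runs of the expensive model are stored, and the posterior mean of a GP conditioned on them serves as a cheap replacement. The code below is an illustration under stated assumptions (scalar input, zero prior mean, squared exponential kernel with fixed hyperparameters, a hypothetical ``expensive_model``); it is not the ACBICI implementation and uses only the standard library.

```python
import math


def kernel(x, y, lam=1.0, beta=0.2):
    """Squared exponential covariance between two scalar inputs."""
    return lam * math.exp(-((x - y) ** 2) / (2.0 * beta ** 2))


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_spd(a, b):
    """Solve a w = b for symmetric positive definite a via Cholesky."""
    n = len(a)
    low = cholesky(a)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    w = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T w = y
        w[i] = (y[i] - sum(low[k][i] * w[k] for k in range(i + 1, n))) / low[i][i]
    return w


def expensive_model(x):
    """Stand-in for a costly simulation (hypothetical)."""
    return math.sin(2.0 * math.pi * x)


# A small design of expensive runs.
train_x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
train_y = [expensive_model(x) for x in train_x]

# Covariance matrix of the training points, with a small nugget for stability.
noise = 1e-8
gram = [[kernel(a, b) + (noise if i == j else 0.0)
         for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
weights = solve_spd(gram, train_y)


def surrogate(x):
    """Posterior mean of the zero-mean GP conditioned on the training runs."""
    return sum(w * kernel(x, xi) for w, xi in zip(weights, train_x))


print(surrogate(0.3), expensive_model(0.3))
```

After the one-off solve, each surrogate evaluation costs only $N$ kernel evaluations. The same construction applies to the discrepancy term, with residuals between data and model taking the place of ``train_y``.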


Whenever a Gaussian process is used in ACBICI, its hyperparameters need to be inferred.
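
As a sketch of the quantity on which such inference typically relies, suppose observations $\mathbf{y}$ at the points $\mathbf{x}$ are modeled directly by the GP. The symbol $\mathbf{y}$ and this direct-observation setting are assumptions introduced here for illustration. The multivariate normal distribution of the Definition section then gives the log-likelihood of the hyperparameters

.. math::

  \log p(\mathbf{y}\mid\lambda,\beta) = -\frac{1}{2}\,(\mathbf{y}-m(\mathbf{x}))^T\, \Sigma^{-1}\, (\mathbf{y}-m(\mathbf{x})) - \frac{1}{2}\log\det\Sigma - \frac{N}{2}\log 2\pi ,

where $\Sigma$ depends on $\lambda$ and $\beta$ through the kernel. How ACBICI combines this term with priors and with the model parameters is not covered in this section.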