
Creating Priors using Normal Noise

When modeling processes in a Bayesian framework, you start with a prior distribution $\pi(x)$ and you update it when new information, $b$, arrives. This blogpost focuses on the creation of priors. These topics were taught to me in a PhD course at DTU in December 2019. Many thanks to Daniela Calvetti and Erkki Somersalo for their lectures.

It all starts with a function we're trying to create, call it $X: [0, 1] \to \mathbb{R}$. We will create a discrete approximation of this function by splitting the unit interval into a grid of $n$ points $t_j$ given by $t_j = j/n$. Let's denote the points we want to find as $X_j = X(t_j)$.

Programming-wise, let's start by importing some packages for numerical computations and for plotting. In particular, we'll use sparse matrices from scipy and we'll use matplotlib:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
```

As always, here's the link to the whole code of this blogpost.

Priors with bounded slope

We could encode, in this prior, some assumptions about its shape and form. As a first example, consider the assumption that the first derivative (i.e. the slope) of $X$ is bounded. We can frame this bounded slope condition using a first order approximation: if $h = 1/n$ and $\gamma$ is some positive real number, then by bounded slope we mean

$$\left|\frac{X_j - X_{j-1}}{h}\right| < \gamma$$

or, in other words, we're expecting $n|X_j - X_{j-1}| < \gamma$. The way we can model this assumption is by considering a random vector $W$ distributed normally with mean $0$ and variance $1$ and setting up the following system of equations (which comes up after solving for $X_j$ in the equation above):

$$n\begin{bmatrix}1&&&\\-1&1&&\\&\ddots&\ddots&\\&&-1&1\end{bmatrix}\begin{bmatrix}X_1\\\vdots\\X_n\end{bmatrix} = \begin{bmatrix}\gamma W_1 + X_0\\\vdots\\\gamma W_n\end{bmatrix},$$

we will call this coefficient matrix $L_1$. We will usually assume that $X_0 = 0$.
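One way to read this system (this unpacking is mine): each row just restates the discretized slope, with the bound $\gamma$ replaced by the scaled Gaussian noise $\gamma W_j$. For $j \geq 2$, the $j$-th row says

$$n(X_j - X_{j-1}) = \gamma W_j, \quad\text{i.e.}\quad X_j = X_{j-1} + \frac{\gamma}{n}W_j,$$

so sampling this prior amounts to running a Gaussian random walk with step size $\gamma/n$: the slope between consecutive points is $\gamma W_j$, which stays on the order of $\gamma$ with high probability.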

We have a couple of free variables that we can tweak and tune: the number of points in the grid, $n$, and the bound $\gamma$. We can code this, then, with a couple of functions in Python. Let's start with a function that creates the $L_1$ matrix specified above. We will use the diags function from scipy.sparse:

```python
def create_L1(n):
    # Creating the diagonals
    diagonals = [
        [-1]*n,  # n-element list.
        [1]*n
    ]

    # Specifying the offsets
    offsets = [-1, 0]

    # Creating the matrix
    L1 = n * sparse.diags(diagonals, offsets=offsets, shape=(n, n))
    return L1
```
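As a quick sanity check (my own, not from the original post), the matrix for a small $n$ should be lower bidiagonal, scaled by $n$:

```python
# For n = 4, L1 should be 4 times a lower-bidiagonal matrix:
# 1 on the main diagonal, -1 right below it.
print(create_L1(4).toarray())
```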

With this function, we can create a prior $X$ by sampling $W$ from $N(0, 1)$ and solving the linear system:

```python
def bounded_slope(n, gamma):
    L1 = create_L1(n)

    # Sampling
    W = gamma * np.random.normal(size=(n, 1))

    # Solving the system
    X_interior = spsolve(L1, W)

    # Putting boundary conditions
    X = np.array([0, *X_interior])
    return X
```

Here's an image of the result of sampling some of these priors and plotting them on the unit interval:
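A minimal sketch of how such a figure could be generated (the parameter values here are my own choices):

```python
# Sampling and plotting a few bounded-slope priors (illustrative parameters)
n, gamma = 200, 1.0
t = np.linspace(0, 1, n + 1)  # grid including the boundary point t_0 = 0
for _ in range(5):
    plt.plot(t, bounded_slope(n, gamma))
plt.xlabel("$t$")
plt.ylabel("$X(t)$")
plt.show()
```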

Priors with bounded curvature

A very similar procedure can be applied to the condition of bounded curvature. What we mean in this case is that the second derivative is bounded by a given constant, say, again, $\gamma$. After discretization, the condition of bounded second derivative can be written as

$$\left|\frac{X_{j-1} - 2X_{j} + X_{j+1}}{h^2}\right| < \gamma,$$

which, after solving for $X_j$, gives rise to the following system of equations:

$$n^2\begin{bmatrix}-2&1&&\\1&-2&1&\\&\ddots&\ddots&\ddots\\&&1&-2\end{bmatrix}\begin{bmatrix}X_1\\\vdots\\X_{n-1}\end{bmatrix} = \begin{bmatrix}\gamma W_1 - X_0\\\vdots\\\gamma W_{n-1} - X_n\end{bmatrix}.$$

An easy modification of create_L1 and bounded_slope can be used here in order to create L2 (assuming that the boundary points are 0) and to solve the system using this new matrix:

```python
def create_L2(n):
    # Creating the diagonals
    diagonals = [
        [1]*n,
        [-2]*n,
        [1]*n
    ]

    # Specifying the offsets
    offsets = [-1, 0, 1]

    # Creating the matrix
    L2 = n ** 2 * sparse.diags(diagonals, offsets=offsets, shape=(n, n))
    return L2


def bounded_curvature(n, gamma):
    L2 = create_L2(n)

    # Sampling
    W = gamma * np.random.normal(size=(n, 1))

    # Solving the system
    X_interior = spsolve(L2, W)

    # Putting boundary conditions
    X = np.array([0, *X_interior, 0])
    return X
```

Notice that we're refreshing the notation and using $n$ instead of $n-1$. Here are some plots of the result of solving the linear system with $L_2$ for some random draws:
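The same plotting sketch as before works here, swapping in bounded_curvature and accounting for both boundary points:

```python
# Sampling a few bounded-curvature priors (same illustrative parameters)
n, gamma = 200, 1.0
t = np.linspace(0, 1, n + 2)  # n interior points plus the two boundary points
for _ in range(5):
    plt.plot(t, bounded_curvature(n, gamma))
plt.show()
```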

Higher dimensions

We can extend this line of reasoning to higher dimensions: instead of making assumptions about the first and second order derivatives of $X:[0,1]\to\mathbb{R}$, consider making assumptions about the Laplacian $\Delta X = \partial_{x_1}^2X + \partial_{x_2}^2X$ for $X:[0,1]^2\to\mathbb{R}$. In order to do so, we will need to construct a finite-difference approximation of the Laplacian, call it $L_3$. After that, it is as easy as considering the solutions of $L_3X = W$ for $X$ and $W$ stacked vectors of size $n^2$.

Our life is made easier by the fact that discrete Laplacians can be constructed from the matrices we have already built above, as it turns out that discrete Laplacians are just Kronecker sums of the finite-difference operators for the second derivative in one dimension, that is, $L_3 = I \otimes L_2 + L_2 \otimes I$. We can compute these Kronecker products using sparse.kron:

```python
def create_L3(n):
    # Getting L2
    L2 = create_L2(n)

    # Computing the Kronecker products (using identities of size n,
    # so that L3 is n^2 x n^2 and matches the stacked vectors below)
    D1 = sparse.kron(sparse.eye(n), L2)
    D2 = sparse.kron(L2, sparse.eye(n))

    # L3 as the sum of the products
    L3 = D1 + D2
    return L3
```
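A quick shape check (the numbers here are my own): $L_3$ should act on stacked vectors of size $n^2$.

```python
# For n = 4, L3 should be a 16 x 16 sparse matrix.
L3 = create_L3(4)
print(L3.shape)  # expected: (16, 16)
```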

We can easily solve the system and un-stack the vector to get a matrix by reshaping:

```python
def bounded_Laplacian(n, gamma):
    L3 = create_L3(n)

    # Sampling
    W = gamma * np.random.normal(size=(n ** 2, 1))
    X = spsolve(L3, W)

    # Ignoring boundary conditions
    return X.reshape((n, n))
```

Here are some examples of the results of querying this function:
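A sketch of how one of these 2D samples could be visualized (the use of imshow and the parameter values are my own choices):

```python
# Plotting one 2D sample as a heatmap over the unit square
X = bounded_Laplacian(100, 1.0)
plt.imshow(X, extent=(0, 1, 0, 1))
plt.colorbar()
plt.show()
```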

Whittle-Matérn priors

In order to introduce a measure of correlation, we can consider the finite-difference approximation of a different operator, given by the first derivative (or respectively the second) plus $\lambda^{-2}$ times the identity. The priors that result from solving this finite-difference approximation are called Whittle-Matérn priors, and $\lambda$ is called the correlation length.

In our code, it is as simple as constructing a new matrix $L = L_i + \lambda^{-2}I$ for each $i$ in $\{1, 2, 3\}$, depending on what we're trying to build. To illustrate, I show a couple of results of sampling the solution to the problem using the 2D example (i.e. $L = L_3 + \lambda^{-2}I$) and different values of $\lambda$.
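In code, the 2D case could look like this (a minimal sketch; the function name and signature are my own):

```python
def whittle_matern_2d(n, gamma, lam):
    # L = L3 + lambda^{-2} * I, converted to CSC for spsolve
    L = (create_L3(n) + lam ** (-2) * sparse.eye(n ** 2)).tocsc()
    W = gamma * np.random.normal(size=n ** 2)
    return spsolve(L, W).reshape((n, n))
```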

We can even introduce a smoothness order $\beta$ by solving the system $L^\beta X = \gamma W$. By applying the operator more than once, we would be making assumptions about higher order derivatives. Here are a couple of examples, varying $\beta$ and $\lambda$.
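For an integer $\beta$, one way to solve $L^\beta X = \gamma W$ without forming the matrix power explicitly is to apply the solver $\beta$ times (again a sketch, with names of my own):

```python
def whittle_matern_2d_smooth(n, gamma, lam, beta):
    # Solving L^beta X = gamma * W by repeatedly inverting L:
    # X = L^{-1}(L^{-1}(... (gamma * W)))
    L = (create_L3(n) + lam ** (-2) * sparse.eye(n ** 2)).tocsc()
    X = gamma * np.random.normal(size=n ** 2)
    for _ in range(beta):
        X = spsolve(L, X)
    return X.reshape((n, n))
```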

Conclusion and future directions

This post discusses how to create priors for Bayesian processes. The method consists of assuming conditions on the function itself and approximating it on a grid. We discussed two examples in 1D: bounded first and second derivatives. These assumptions and this approximation give rise to a system of linear equations of the form $L_iX = \gamma W$, where $\gamma$ encodes the bound, $L_i$ is the finite-difference approximation of the differential operator and $W$ is sampled from a normal distribution with mean 0 and variance 1. We also discussed how this idea can be expanded to two dimensions by bounding the Laplacian of a function $X:[0,1]^2\to\mathbb{R}$.

We can expand the number of free parameters we can tweak by considering Whittle-Matérn priors, in which a correlation length parameter $\lambda$ is introduced and the system of equations changes to $(L_i + \lambda^{-2}I)X = \gamma W$. Another parameter is introduced by considering the system $(L_i + \lambda^{-2}I)^\beta X = \gamma W$, that is, by considering higher order derivatives.

Besides their use in scientific computing, I think these priors could be used as a source of noise for Procedural Content Generation, just like Perlin noise is used to create artificial landscapes. I plan to explore this direction in future posts.

(All the code of this post, including image generation, can be found in this gist)