When modeling processes in a Bayesian framework, you start with a prior distribution and you update it when new information arrives. This blogpost focuses on the creation of priors. These topics were taught to me in a PhD course at DTU in December 2019. Many thanks to Daniela Calvetti and Erkki Somersalo for their lectures.
It all starts with a function we're trying to create, call it $f\colon [0, 1] \to \mathbb{R}$. We will create a discrete approximation of this function by splitting the unit interval into a grid of points given by $t_j = jh$ for $j = 0, 1, \dots, n$, where $h = 1/n$. Let's denote the points we want to find as $x_j = f(t_j)$.
Programming-wise, let's start by importing some packages for numerical computations and for plotting. In particular, we'll use sparse matrices from scipy and we'll use matplotlib:
```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
```
We could encode, in this prior, some assumptions about its shape and form. As a first example, consider the assumption that the first derivative (i.e. the slope) of $f$ is bounded. We can frame this bounded-slope condition using a first-order approximation: if $h = 1/n$ and $\gamma$ is some positive real number, then by bounded slope we mean

$$\left|f'(t_j)\right| \approx \left|\frac{x_j - x_{j-1}}{h}\right| \leq \gamma,$$
or, in other words, we're expecting $|x_j - x_{j-1}| \leq \gamma h$. The way we can model this assumption is by considering a random vector $W$ distributed normally with mean $0$ and variance $\gamma^2$ and setting up the following system of equations (which comes up after solving for $x_j$ in the equation above):

$$n\begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} W_1 \\ W_2 \\ \vdots \\ W_n \end{pmatrix};$$
we will call this coefficient matrix $L_1$, so that the system reads $L_1 X = W$. We will usually assume that $x_0 = 0$.
We have a couple of free variables that we can tweak and tune: the number of points in the grid, $n$, and our boundedness assumption, $\gamma$. We can code this, then, with a couple of functions in Python. Let's start with a function that creates the matrix specified above. We will use the diags method in scipy:
```python
def create_L1(n):
    # Creating the diagonals
    diagonals = [
        [-1] * (n - 1),  # subdiagonal
        [1] * n,         # main diagonal
    ]
    # Specifying the offsets
    offsets = [-1, 0]
    # Creating the matrix (the factor n comes from dividing by h = 1/n);
    # CSC format plays nicely with spsolve
    L1 = n * sparse.diags(diagonals, offsets=offsets, shape=(n, n), format="csc")
    return L1
```
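As a quick sanity check (a sketch; `create_L1` is repeated here so the snippet runs standalone), for $n = 4$ we get $4$ times a bidiagonal matrix:

```python
import numpy as np
from scipy import sparse

def create_L1(n):
    # Same construction as above, repeated so this snippet runs on its own
    diagonals = [[-1] * (n - 1), [1] * n]
    return n * sparse.diags(diagonals, offsets=[-1, 0], shape=(n, n), format="csc")

# Each row of L1 computes n * (x_j - x_{j-1}), i.e. a backward-difference
# approximation of the first derivative with h = 1/4
print(create_L1(4).toarray())
```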
With this function, we can create a prior sample by drawing $W$ and solving the linear system $L_1 X = W$:
```python
def bounded_slope(n, gamma):
    L1 = create_L1(n)
    # Sampling W ~ N(0, gamma^2 I)
    W = gamma * np.random.normal(size=n)
    # Solving the system L1 X = W
    X_interior = spsolve(L1, W)
    # Adding the boundary condition x_0 = 0
    X = np.array([0, *X_interior])
    return X
```
Here's an image of the result of sampling some of them and plotting them in the unit interval:
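A figure like that can be produced with a few lines of matplotlib (a sketch; the styling of the original figure may differ, and the helpers are repeated so the snippet runs standalone):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def create_L1(n):
    diagonals = [[-1] * (n - 1), [1] * n]
    return n * sparse.diags(diagonals, offsets=[-1, 0], shape=(n, n), format="csc")

def bounded_slope(n, gamma):
    W = gamma * np.random.normal(size=n)
    return np.array([0, *spsolve(create_L1(n), W)])

n, gamma = 100, 1.0
t = np.linspace(0, 1, n + 1)  # the grid t_0, ..., t_n
for _ in range(5):
    plt.plot(t, bounded_slope(n, gamma))
plt.xlabel("$t$")
plt.show()
```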
A very similar procedure can be applied to the condition of bounded curvature. What we mean in this case is that the second derivative would be bounded by a given constant, say, again, $\gamma$. The condition of bounded second derivative can be written after discretization as

$$\left|\frac{x_{j-1} - 2x_j + x_{j+1}}{h^2}\right| \leq \gamma,$$

which, after modeling the right-hand side with the random vector $W$ as before, gives rise to the following system of equations:

$$n^2\begin{pmatrix} -2 & 1 & & \\ 1 & -2 & 1 & \\ & \ddots & \ddots & \ddots \\ & & 1 & -2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} W_1 \\ W_2 \\ \vdots \\ W_n \end{pmatrix}.$$
An easy modification of create_L1 and bounded_slope can be used here in order to create L2 (assuming that the boundary points are 0) and to solve the system using this new matrix:
```python
def create_L2(n):
    # Creating the diagonals
    diagonals = [
        [1] * (n - 1),  # subdiagonal
        [-2] * n,       # main diagonal
        [1] * (n - 1),  # superdiagonal
    ]
    # Specifying the offsets
    offsets = [-1, 0, 1]
    # Creating the matrix (the factor n^2 comes from dividing by h^2)
    L2 = n ** 2 * sparse.diags(diagonals, offsets=offsets, shape=(n, n), format="csc")
    return L2


def bounded_curvature(n, gamma):
    L2 = create_L2(n)
    # Sampling W ~ N(0, gamma^2 I)
    W = gamma * np.random.normal(size=n)
    # Solving the system L2 X = W
    X_interior = spsolve(L2, W)
    # Adding the boundary conditions: both endpoints are 0
    X = np.array([0, *X_interior, 0])
    return X
```
Notice that we're refreshing the notation, writing $X$ and $W$ again for this new system. Here are some plots of the result of solving the linear system with $L_2$ with some random draws:
We can extend this line of reasoning to higher dimensions: instead of making assumptions about the first- and second-order derivatives of $f$, consider making assumptions about the Laplacian $\Delta f$ for $f\colon [0, 1]^2 \to \mathbb{R}$. In order to do so, we will need to construct a finite-difference approximation of the Laplacian, call it $L_3$. After that, it is as easy as considering the solutions of $L_3 X = W$ for $W$ and $X$ stacked vectors of size $n^2$.
Our life is made easier by the fact that discrete Laplacians can be constructed from the matrices we have already built above, as it turns out that discrete Laplacians are just Kronecker sums of the finite-difference operators for the second derivative in one dimension, that is, $L_3 = L_2 \oplus L_2 = I \otimes L_2 + L_2 \otimes I$. We can compute these Kronecker products using sparse.kron:
```python
def create_L3(n):
    # Getting the 1D second-derivative operator
    L2 = create_L2(n)
    # Computing the Kronecker products
    D1 = sparse.kron(sparse.eye(n), L2)
    D2 = sparse.kron(L2, sparse.eye(n))
    # L3 (the discrete Laplacian) as the sum of the products,
    # converted to CSC format for spsolve
    L3 = (D1 + D2).tocsc()
    return L3
```
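As a quick check of the Kronecker-sum construction (a sketch, repeating the helpers so it runs standalone): for $n = 2$ we recover the familiar five-point Laplacian stencil on a $2 \times 2$ grid, scaled by $n^2$:

```python
import numpy as np
from scipy import sparse

def create_L2(n):
    diagonals = [[1] * (n - 1), [-2] * n, [1] * (n - 1)]
    return n ** 2 * sparse.diags(diagonals, offsets=[-1, 0, 1], shape=(n, n))

def create_L3(n):
    L2 = create_L2(n)
    return sparse.kron(sparse.eye(n), L2) + sparse.kron(L2, sparse.eye(n))

# The -4 on the diagonal and the 1's coupling horizontal and vertical
# neighbors are the classic 5-point stencil, here scaled by n^2 = 4
expected = 4 * np.array([
    [-4,  1,  1,  0],
    [ 1, -4,  0,  1],
    [ 1,  0, -4,  1],
    [ 0,  1,  1, -4],
])
print(np.allclose(create_L3(2).toarray(), expected))  # True
```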
We can easily solve the system and un-stack the vector to get a matrix by reshaping:
```python
def bounded_Laplacian(n, gamma):
    L3 = create_L3(n)
    # Sampling W ~ N(0, gamma^2 I)
    W = gamma * np.random.normal(size=n ** 2)
    # Solving the system L3 X = W (ignoring boundary conditions)
    X = spsolve(L3, W)
    # Un-stacking the solution into an n-by-n grid
    return X.reshape((n, n))
```
Here are some examples of multiple results of querying this function:
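Such a figure could come from something along these lines (a sketch; the layout and colormap are my choices, and the helpers are repeated so the snippet runs standalone):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def create_L2(n):
    diagonals = [[1] * (n - 1), [-2] * n, [1] * (n - 1)]
    return n ** 2 * sparse.diags(diagonals, offsets=[-1, 0, 1], shape=(n, n))

def create_L3(n):
    L2 = create_L2(n)
    return (sparse.kron(sparse.eye(n), L2) + sparse.kron(L2, sparse.eye(n))).tocsc()

def bounded_Laplacian(n, gamma):
    W = gamma * np.random.normal(size=n ** 2)
    return spsolve(create_L3(n), W).reshape((n, n))

# Three independent draws from the prior, shown side by side
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax in axes:
    ax.imshow(bounded_Laplacian(50, 1.0), extent=(0, 1, 0, 1))
    ax.set_axis_off()
plt.show()
```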
In order to introduce a measure of correlation, we can consider the finite-difference approximation of a different operator, given by the first derivative (or, respectively, the second) plus $\lambda$ times the identity. The priors that result from solving this finite-difference approximation are called Whittle-Matérn priors, and $\lambda$ is called the correlation length.
In our code, it is as simple as constructing a new matrix $L + \lambda I$ for each $L$ in $\{L_1, L_2, L_3\}$, depending on what we're trying to build. In order to illustrate, I show a couple of results of sampling the solution to the problem $(L_3 + \lambda I)X = W$ using the 2D example (i.e. $L = L_3$) and different lambdas.
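In code, the 2D Whittle-Matérn sample is one line away from `bounded_Laplacian` (a sketch; `lam` stands for $\lambda$, the function name is mine, and the helpers are repeated so it runs standalone):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def create_L2(n):
    diagonals = [[1] * (n - 1), [-2] * n, [1] * (n - 1)]
    return n ** 2 * sparse.diags(diagonals, offsets=[-1, 0, 1], shape=(n, n))

def create_L3(n):
    L2 = create_L2(n)
    return sparse.kron(sparse.eye(n), L2) + sparse.kron(L2, sparse.eye(n))

def whittle_matern_2d(n, gamma, lam):
    # The operator is now L3 plus lam times the identity
    A = (create_L3(n) + lam * sparse.eye(n ** 2)).tocsc()
    W = gamma * np.random.normal(size=n ** 2)
    return spsolve(A, W).reshape((n, n))
```

With `lam = 0` this reduces to `bounded_Laplacian` from before.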
We can even introduce a smoothness order $\beta$ by solving the system $(L + \lambda I)^\beta X = W$. By applying the operator more than once, we would be making assumptions about higher-order derivatives. These are a couple of examples, varying $\lambda$ and $\beta$.
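Applying the operator $\beta$ times can be sketched by solving with the same matrix repeatedly (the function name is mine, and the helpers are repeated so it runs standalone):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def create_L2(n):
    diagonals = [[1] * (n - 1), [-2] * n, [1] * (n - 1)]
    return n ** 2 * sparse.diags(diagonals, offsets=[-1, 0, 1], shape=(n, n))

def create_L3(n):
    L2 = create_L2(n)
    return sparse.kron(sparse.eye(n), L2) + sparse.kron(L2, sparse.eye(n))

def whittle_matern_2d_smooth(n, gamma, lam, beta):
    A = (create_L3(n) + lam * sparse.eye(n ** 2)).tocsc()
    X = gamma * np.random.normal(size=n ** 2)
    # Solving (L3 + lam I)^beta X = W amounts to solving with A beta times
    for _ in range(beta):
        X = spsolve(A, X)
    return X.reshape((n, n))
```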
This post discusses how to create priors for Bayesian processes. The method consists of assuming conditions on the function itself and approximating it on a grid. We discussed two examples in 1D: bounded first and second derivatives. These assumptions and this approximation give rise to a system of linear equations of the form $LX = W$, where $L$ is the finite-difference approximation of the differential operator and $W$ is sampled from a normal distribution with mean $0$ and variance $\gamma^2$, with $\gamma$ encoding the bound. We also discussed how this idea can be expanded to two dimensions by bounding the Laplacian of a function $f\colon [0, 1]^2 \to \mathbb{R}$.
We can expand the number of free parameters we can tweak by considering Whittle-Matérn priors, in which a correlation-length parameter $\lambda$ is introduced and the system of equations changes to $(L + \lambda I)X = W$. Another parameter, $\beta$, is introduced by considering the system $(L + \lambda I)^\beta X = W$, that is, by considering higher-order derivatives.
Besides their use in scientific computing, I think these priors could be used as a source of noise for Procedural Content Generation, just like Perlin Noise is used to create artificial landscapes. I plan to explore this direction in future posts.
(All the code of this post, including image generation, can be found in this gist)