# Fast Exact Gaussian Process in PyMC

## 3-June-2022

### Setup PyMC dev environment
* https://github.com/pymc-devs/pymc
* `conda-envs/environment-test-py39.yml` has all of the dependencies that are needed.
* Forked PyMC into `kunalghosh/pymc`
* Created a branch `fastGP`
* `conda env create -f environment-test-py39.yml`
* `conda activate pymc-test-py39`

#### Am I set up at this point?
* Check if tests pass
  > Quick googling brought up [this contributing guide](https://github.com/pymc-devs/pymc/blob/main/CONTRIBUTING.md) which links to how to run tests.
* Ran `pytest -v pymc/tests/test_model.py` without running `pip install -r requirements-dev.txt`, just to check whether I need those requirements after creating the conda test environment.
* Loads of compiler errors :laughing:
* Ran `pip install -r requirements-dev.txt` but got this error, among many other lines of output:
  > SystemError: Cannot compile 'Python.h'. Perhaps you need to
  > install python-dev|python-devel.
  > [end of output]
* Seems like I need to install the Python development headers.
* Installed `conda install -c conda-forge python-devtools` and ran the tests again, but they failed again and a prominent error was a missing `stdio.h` :thinking_face:
* Removed `python-devtools`
* `export SDKROOT=$(xcrun --sdk macosx --show-sdk-path)`
* Ran the tests again, `pytest -v pymc/tests/test_model.py`, and all 59 tests passed. :star-struck:
* Installed pre-commit with `pre-commit install` so that all the hooks defined in `.pre-commit-config.yaml` run before each commit.

Next: [Contributing 101](https://docs.pymc.io/en/latest/contributing/index.html); open a draft pull request.

This seems like an even better [first contributor guide](https://pymc-data-umbrella.xyz/en/latest/sprint/docstring_tutorial.html).

### First Contribution
* Following the [first contributor guide](https://pymc-data-umbrella.xyz/en/latest/sprint/docstring_tutorial.html), which asks you to commit a docstring fix as a first contribution.
* I found a bug in the formatting of the [pymc/gp/TP](https://docs.pymc.io/en/latest/_modules/pymc/gp/gp.html#TP) docstring: there were a few extra escaped characters, `\\nu` instead of `\nu`.
* While building the documentation locally:
  > ❯ make html (pymc-test-py39)
  > sphinx-build docs/source docs/_build -b html
  > Running Sphinx v4.5.0
  > Extension error:
  > Could not import extension jupyter_sphinx (exception: No module named 'jupyter_sphinx')
  > make: *** [html] Error 2
* :exclamation: `jupyter_sphinx` must be added to `requirements-dev.txt` __but this is added automatically__ :thinking_face:

## 15-June-2022
Tried running the following [code](https://github.com/martiningram/mcmc_runtime_comparison), linked in Junpeng Lao's answer in this discourse [thread](https://discourse.pymc.io/t/pymc3-with-gpu-support-on-google-colab/1649/6), to test a simple model, but it failed.

I found the following conjugate gradients implementations, which should help with my own implementation (a plain NumPy reference sketch of CG follows this list):
* Link to the [FastGP paper](https://proceedings.neurips.cc/paper/2018/file/27e8e17134dd7083b050476733207ea1-Paper.pdf)
* A Theano conjugate gradients implementation, [theano-hf](https://github.com/boulanni/theano-hf/blob/master/hf.py). It doesn't implement everything in Theano (in particular, the loop is still in pure Python), but it might be a good starting point.
* Another implementation: [CG in PyTorch](https://github.com/sbarratt/torch_cg)
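For my own reference, here is a minimal textbook linear conjugate gradients solver in plain NumPy. The function name and arguments are my own and this is the single right-hand-side version, not the batched mBCG from the paper; it only fixes the basic iteration in my head before writing the Aesara version.

```python=
# Textbook CG for solving A x = b with A symmetric positive definite.
# `matvec` is any callable that computes A @ v; only matrix-vector products
# are needed, which is the whole point of the mBCG approach.
import numpy as np

def conjugate_gradients(matvec, b, x0=None, tol=1e-10, max_iter=None):
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# quick check against a direct solve
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
assert np.allclose(conjugate_gradients(lambda v: A @ v, b), np.linalg.solve(A, b))
```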
For my FastGP code I will need the following:

1. I will create a separate package, similar to [aesara-devs/aemcmc](https://github.com/aesara-devs/aemcmc); in my case it will be **kunalghosh/aembcg** and will contain mBCG (a modified _batched_ version of linear conjugate gradients) implemented in Aesara. This will allow me to implement and test the function separately. Once it is implemented and tested, I could move it to `pymc.gp.util`; keeping `mBCG` separate also allows it to be used more generally. :exclamation: Maybe I can keep things simple and implement mBCG in `pymc.gp.util` directly; I am not sure how generally useful it would be.
2. Inside the PyMC code, I will subclass [MarginalGP](https://github.com/pymc-devs/pymc/blob/562be3781c9d37d3300c4efd4cf6598e5739c32d/pymc/gp/gp.py#L358) and override the `_build_conditional()` and `_build_marginal_likelihood()` methods using the `mbcg()` function from the **aembcg** package. Separating these two pieces out allows the batched conjugate gradients to be used on its own, and the two parts can also be tested independently. (A reference sketch of the exact log marginal likelihood these methods have to approximate is at the end of this entry.)

### Test my PyMC setup
Ran a simple GP model to fit a tiny dataset.

```python=
# generate data
import numpy as np
from matplotlib import pyplot as plt

import pymc as pm


def get_data():
    # noisy samples from a straight line
    x = np.linspace(0, 10, 20)
    m, c = 1.2, 2
    noise_std = 3
    noise = np.random.rand(*x.shape) * noise_std
    y = m * x + c
    y_noise = y + noise
    print("Noise free , Noisy")
    for _ in zip(y, y_noise):
        print(_)
    return x, y_noise


def add_dims(a):
    # PyMC GPs expect X with shape (n, input_dim)
    return np.expand_dims(a, axis=1)


def split_train_test(a):
    num_train = 15
    a_train, a_test = a[:num_train], a[num_train:]
    return add_dims(a_train), add_dims(a_test)


np.random.seed(100)
x, y = get_data()
X_train, X_test = split_train_test(x)
Y_train, Y_test = split_train_test(y)
print(X_train.shape)
for _ in zip(X_test, Y_test):
    print(_)

plt.scatter(x, y)
plt.scatter(X_train, Y_train, color="r")
plt.savefig("data.png")

with pm.Model() as model:
    n_trend = pm.HalfCauchy("η_trend", beta=0.5)
    ls = pm.Gamma("ℓ_trend", alpha=4.0, beta=0.2)
    cov_func = n_trend * pm.gp.cov.ExpQuad(input_dim=1, ls=ls, active_dims=[0])
    mean = pm.gp.mean.Constant()
    gp = pm.gp.Marginal(mean_func=mean, cov_func=cov_func)
    # Since the normal noise model and the GP are conjugate, we use `Marginal`
    # with the `.marginal_likelihood` method
    y_ = gp.marginal_likelihood("y", X=X_train, y=Y_train, noise=1e-6)
    mp = pm.find_MAP(include_transformed=True)
    print(mp)

print(sorted([name + ":" + str(mp[name]) for name in mp.keys() if not name.endswith("_")]))

print("Predicting with gp ...")
mu, var = gp.predict(X_test, point=mp, diag=True, jitter=1e-6)
mu_all, var_all = gp.predict(add_dims(x), point=mp, diag=True, jitter=1e-6)

plt.scatter(X_train, Y_train, label="Training")
plt.scatter(X_test, Y_test, label="Test")
# flatten the predictions so the shapes line up for plotting
plt.errorbar(x, np.ravel(mu_all), yerr=np.ravel(var_all), label="Fit")
plt.legend()
plt.savefig("prediction.png")
print(mu)
print(var)
```

The fit looks pretty good.

![](https://i.imgur.com/75VNJC8.png)
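For reference, the exact quantity that an mBCG-based `_build_marginal_likelihood()` (point 2 above) ultimately has to reproduce is the GP log marginal likelihood. Below is a minimal NumPy sketch of the exact, Cholesky-based computation for a zero-mean GP; the function is my own illustration, not PyMC API. The FastGP paper replaces the cubic-cost Cholesky with CG solves for the quadratic term and a stochastic estimate of the log-determinant term.

```python=
# Exact GP log marginal likelihood via Cholesky: the baseline that the
# mBCG-based version approximates.
import numpy as np

def exact_log_marginal_likelihood(K, y, sigma2):
    """log p(y) for y ~ N(0, K + sigma2 * I), with K an (n, n) kernel matrix."""
    n = y.shape[0]
    Ky = K + sigma2 * np.eye(n)
    L = np.linalg.cholesky(Ky)                           # the O(n^3) step mBCG avoids
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + sigma2*I)^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))            # log|K + sigma2*I|
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi)
```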
## 21-June-2022

### Setup Aesara
* Set up Aesara on Google Colab ([link](https://colab.research.google.com/drive/1JlcLflFktvdn53t-n5AcBMX8SKVLQsl9?usp=sharing)); thanks to Ricardo's help, Aesara works fine. _But I just couldn't get Aesara to use the GPU._ :confused:
* Tried a simple test of multiplying two numbers, but it still took 26 s with `%%timeit` on Google Colab ([link](https://colab.research.google.com/drive/1SKBGG3x4i7VwOBWpWzYebfPYSFy-00QE?usp=sharing)) using the GPU backend.
* Thanks to Brandon and Ricardo, I now have Aesara (JAX + GPU) running on Colab ([link](https://colab.research.google.com/drive/1SKBGG3x4i7VwOBWpWzYebfPYSFy-00QE?usp=sharing)) :sunglasses:
* It seems that, when using JAX directly (`jax.numpy as jnp`), it is faster to pre-allocate the data on the GPU. However, passing the same GPU-allocated data to Aesara is almost twice as slow. Aesara matches the speed of native JAX when using the NumPy matrix directly (not pre-allocated on the GPU). **tl;dr: when using Aesara with the JAX (GPU) backend, passing NumPy arrays into the compiled functions seems to give the best speedups.**
* Did a similar test with JAX + TPU on [Colab](https://colab.research.google.com/drive/1WywfSUKtYeLNCIWkdHXb6xaP8VwVpE7s?usp=sharing): JAX uses the TPUs, but Aesara + JAX does not.

### Links
1. PyMC 4 installation [guide](https://github.com/pymc-devs/pymc/wiki/Installation-Guide-(Linux)#pymc-v4-installation)
2. FastGP [paper](https://proceedings.neurips.cc/paper/2018/file/27e8e17134dd7083b050476733207ea1-Paper.pdf)
3. [HackMD emoji list](https://github.com/ikatyang/emoji-cheat-sheet)

## 7-July-2022
* Found the code in GPyTorch which implements the `mBCG` algorithm. [Link to code](https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/utils/minres.py)

## 12-July-2022
* [Pivoted Cholesky](https://github.com/cornellius-gp/gpytorch/blob/78796f7e3d36530fbb4a395cf9b303c680367c0d/gpytorch/functions/_pivoted_cholesky.py#L1) in GPyTorch (a minimal NumPy sketch of the algorithm is included after the 23-July notes below)
* Original pivoted Cholesky [paper](https://www.sciencedirect.com/science/article/abs/pii/S0168927411001814)
* GPyTorch uses `torch.gather`, and this [thread](https://discuss.pytorch.org/t/whats-the-equivalence-of-theanos-inc-subtensor/1080/11) on the PyTorch discourse says that we can do NumPy-like indexing in Theano (and consequently in Aesara). Very useful [link](https://stackoverflow.com/questions/50999977/what-does-the-gather-function-do-in-pytorch-in-layman-terms) which explains what `gather` does.

```python=
import numpy as np
import torch

a = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# array([[1, 2, 3],
#        [4, 5, 6],
#        [7, 8, 9]])

# dim=0: the index tensor holds *row* indices, applied per column,
# i.e. out[i][j] = a[index[i][j]][j]; the first column picks rows 0 and 2 -> [1, 7]
torch.gather(torch.tensor(a), 0, torch.tensor([[0, 0], [2, 0]]))
# tensor([[1, 2],
#         [7, 2]])

# dim=1: the index tensor holds *column* indices, applied per row,
# i.e. out[i][j] = a[i][index[i][j]]
torch.gather(torch.tensor(a), 1, torch.tensor([[0, 0], [2, 0]]))
# tensor([[1, 1],
#         [6, 4]])
```

The same thing can be achieved in NumPy with fancy indexing:

```python=
a = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
idx = np.array([[0, 0], [2, 0]])

# equivalent of torch.gather(a, 0, idx): out[i, j] = a[idx[i, j], j]
a[idx, np.arange(idx.shape[1])]
# array([[1, 2],
#        [7, 2]])

# equivalent of torch.gather(a, 1, idx): out[i, j] = a[i, idx[i, j]]
a[np.arange(idx.shape[0])[:, None], idx]
# array([[1, 1],
#        [6, 4]])
```

## 23-July-2022
Found the `mBCG` implementation [here](https://github.com/cornellius-gp/gpytorch/blob/55312b22080542eeca93b062750c2c689cd6a2a5/gpytorch/functions/_inv_quad_logdet.py#L13).

Now I need to implement this, which should finish a major portion of the project. I will then have everything in NumPy, which can easily be sped up with JAX. At that point I will implement the subclass of the Marginal GP, with `_log_marginal()` and `_log_conditional()`, as discussed before, which would allow us to check the speed-up within a GP model.
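As a reference for the pivoted Cholesky mentioned on 12-July (which the paper uses as a preconditioner for mBCG), here is a minimal NumPy sketch of the rank-k algorithm from the Harbrecht et al. paper. The function name and the tolerance argument are my own; this is the reference algorithm, not a port of the GPyTorch implementation.

```python=
import numpy as np

def pivoted_cholesky(A, rank, tol=1e-12):
    """Return L with shape (rank, n) such that A ≈ L.T @ L, for PSD A."""
    n = A.shape[0]
    d = np.diag(A).astype(float).copy()   # residual diagonal
    perm = np.arange(n)                   # pivot permutation
    L = np.zeros((rank, n))
    for m in range(rank):
        # pick the pivot with the largest residual diagonal among the remaining columns
        i = m + np.argmax(d[perm[m:]])
        perm[[m, i]] = perm[[i, m]]
        pivot = perm[m]
        if d[pivot] <= tol:               # remaining error is negligible, stop early
            return L[:m]
        L[m, pivot] = np.sqrt(d[pivot])
        rest = perm[m + 1:]
        # Schur-complement update of the remaining columns
        row = (A[pivot, rest] - L[:m, pivot] @ L[:m, rest]) / L[m, pivot]
        L[m, rest] = row
        d[rest] -= row ** 2
    return L

# quick sanity check: a full-rank factorisation reproduces the matrix
A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = pivoted_cholesky(A, rank=2)
assert np.allclose(L.T @ L, A)
```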
## 10-Aug-2022
I finally sent the pull requests for the `pivoted Cholesky` and the `linear conjugate gradients` to `gh:pymc-devs/pymc-experimental`. I still need to address the comments from Bill, but it looks nice. I really like Bill's way of working and his feedback.

We had a meeting today, and Bill (after consulting another colleague from PyMC Labs) suggested that we could consider wrapping the `log-determinant` and the `solve` computations in **Aesara Ops**. Aesara Ops are basically the logical entities that Aesara sees; they can be implemented in whatever language Aesara supports as a backend, which in our case means `C`, `NumPy`, `Numba` and `JAX`. We implement the forward and backward pass of the Ops, and Aesara handles the rest. The backward pass of an Aesara Op is implemented in `L_op`, and there is a way for one Op to depend on another. So I could have one Op which implements `InvQuadLogDet`, like GPyTorch does, and then have the `Solve` and `LogDet` computations depend on it. **This is something new that I have learned and I really like it a lot!** (A minimal sketch of the Op structure is at the end of this entry.)

Pull requests so far:
* [PR:Pivoted Cholesky](https://github.com/pymc-devs/pymc-experimental/pull/63)
* [PR:LinearCG](https://github.com/pymc-devs/pymc-experimental/pull/62)
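To make the Op idea concrete for myself, here is a minimal sketch of a custom Aesara Op with a NumPy forward pass (`perform`) and an explicit gradient, using a toy log-determinant as the example. Aesara already ships determinant and solve Ops, and the class and variable names here are mine; a real mBCG-backed Op would replace the body of `perform` with the CG machinery and would likely implement `L_op` directly, as noted above.

```python=
# Toy custom Aesara Op: log|A| with an explicit gradient. This only illustrates
# the forward/backward structure of an Op, not the mBCG computation itself.
import aesara
import aesara.tensor as at
import numpy as np
from aesara.graph.basic import Apply
from aesara.graph.op import Op
from aesara.tensor.nlinalg import matrix_inverse


class LogDet(Op):
    def make_node(self, A):
        A = at.as_tensor_variable(A)
        out = at.scalar(dtype=A.dtype)
        return Apply(self, [A], [out])

    def perform(self, node, inputs, output_storage):
        # forward pass: plain NumPy (backends like Numba/JAX dispatch elsewhere)
        (A,) = inputs
        _, logdet = np.linalg.slogdet(A)
        output_storage[0][0] = np.asarray(logdet, dtype=node.outputs[0].dtype)

    def grad(self, inputs, output_grads):
        # backward pass: d log|A| / dA = A^{-T}; `L_op` is the more general hook,
        # `grad` is enough for this sketch
        (A,) = inputs
        (g,) = output_grads
        return [g * matrix_inverse(A).T]


# usage: build a graph with the Op and compile it
A = at.matrix("A")
logdet_fn = aesara.function([A], LogDet()(A))
print(logdet_fn(np.array([[4.0, 1.0], [1.0, 3.0]])))  # log(11) ≈ 2.398
```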