# Fast Exact Gaussian Process in PyMC

## 3-June-2022

### Setup PyMC dev environment
* https://github.com/pymc-devs/pymc
* `conda-envs/environment-test-py39.yml` has all of the dependencies that are needed.
* Forked PyMC into `kunalghosh/pymc`
* Created a branch `fastGP`
* `conda env create -f environment-test-py39.yml`
* `conda activate pymc-test-py39`

#### Am I set up at this point?
* Check if tests pass
  > Quick googling brought up [this contributing guide](https://github.com/pymc-devs/pymc/blob/main/CONTRIBUTING.md) which links to how to run tests.
* Ran `pytest -v pymc/tests/test_model.py` without running `pip install -r requirements-dev.txt`, just to check whether I need those requirements after creating the conda test environment.
* Loads of compiler errors :laughing:
* Ran `pip install -r requirements-dev.txt` but got this error, among many other lines of output:
  > SystemError: Cannot compile 'Python.h'. Perhaps you need to
  > install python-dev|python-devel.
  > [end of output]
* Seems like I need to install the Python development headers.
* Installed `conda install -c conda-forge python-devtools` and ran the tests again, but they failed again and a prominent error was a missing `stdio.h` :thinking_face:
* Removed `python-devtools`
* `export SDKROOT=$(xcrun --sdk macosx --show-sdk-path)`
* Ran the tests again, `pytest -v pymc/tests/test_model.py`, and all 59 tests passed. :star-struck:
* Installed pre-commit with `pre-commit install` so that all the hooks defined in `.pre-commit-config.yaml` run before each commit.

Next: [Contributing 101](https://docs.pymc.io/en/latest/contributing/index.html); open a draft pull request.

This seems like an even better [first contributor guide](https://pymc-data-umbrella.xyz/en/latest/sprint/docstring_tutorial.html).

### First Contribution
* Following the [first contributor guide](https://pymc-data-umbrella.xyz/en/latest/sprint/docstring_tutorial.html), which asks you to commit a docstring fix as a first contribution.
* I found a bug in the formatting of the [pymc/gp/TP](https://docs.pymc.io/en/latest/_modules/pymc/gp/gp.html#TP) docstring: there were a few extra escaped characters, `\\nu` instead of `\nu`.
* While building the documentation locally:
  > ❯ make html (pymc-test-py39)
  > sphinx-build docs/source docs/_build -b html
  > Running Sphinx v4.5.0
  > Extension error:
  > Could not import extension jupyter_sphinx (exception: No module named 'jupyter_sphinx')
  > make: *** [html] Error 2
* :exclamation: `jupyter_sphinx` must be added to `requirements-dev.txt` __but this is added automatically__ :thinking_face:

## 15-June-2022
Tried running the following [code](https://github.com/martiningram/mcmc_runtime_comparison), linked in Junpeng Lao's answer in this discourse [thread](https://discourse.pymc.io/t/pymc3-with-gpu-support-on-google-colab/1649/6), to test a simple model, but it failed.

I found the following conjugate gradients implementations, which should help with my own implementation (a plain NumPy reference sketch of CG follows this list):
* Link to the [FastGP paper](https://proceedings.neurips.cc/paper/2018/file/27e8e17134dd7083b050476733207ea1-Paper.pdf)
* A Theano conjugate gradients implementation, [theano-hf](https://github.com/boulanni/theano-hf/blob/master/hf.py). It doesn't implement everything in Theano (in particular, the loop is still in pure Python), but it might be a good starting point.
* Another implementation: [CG in PyTorch](https://github.com/sbarratt/torch_cg)
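For my own reference, here is a minimal textbook linear conjugate gradients solver in plain NumPy. The function name and arguments are my own and this is the single right-hand-side version, not the batched mBCG from the paper; it only fixes the basic iteration in my head before writing the Aesara version.

```python=
# Textbook CG for solving A x = b with A symmetric positive definite.
# `matvec` is any callable that computes A @ v; only matrix-vector products
# are needed, which is the whole point of the mBCG approach.
import numpy as np

def conjugate_gradients(matvec, b, x0=None, tol=1e-10, max_iter=None):
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - matvec(x)          # residual
    p = r.copy()               # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# quick check against a direct solve
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
assert np.allclose(conjugate_gradients(lambda v: A @ v, b), np.linalg.solve(A, b))
```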
For my FastGP code I will need the following:

1. I will create a separate package, similar to [aesara-devs/aemcmc](https://github.com/aesara-devs/aemcmc); in my case it will be **kunalghosh/aembcg** and will contain mBCG (a modified _batched_ version of linear conjugate gradients) implemented in Aesara. This will allow me to implement and test the function separately. Once it is implemented and tested, I could move it to `pymc.gp.util`; keeping `mBCG` separate also allows it to be used more generally. :exclamation: Maybe I can keep things simple and implement mBCG in `pymc.gp.util` directly; I am not sure how generally useful it would be.
2. Inside the PyMC code, I will subclass [MarginalGP](https://github.com/pymc-devs/pymc/blob/562be3781c9d37d3300c4efd4cf6598e5739c32d/pymc/gp/gp.py#L358) and override the `_build_conditional()` and `_build_marginal_likelihood()` methods using the `mbcg()` function from the **aembcg** package. Separating these two pieces out allows the batched conjugate gradients to be used on its own, and the two parts can also be tested independently. (A reference sketch of the exact log marginal likelihood these methods have to approximate is at the end of this entry.)

### Test my PyMC setup
Ran a simple GP model to fit a tiny dataset.

```python=
# generate data
import numpy as np
from matplotlib import pyplot as plt

import pymc as pm


def get_data():
    # noisy samples from a straight line
    x = np.linspace(0, 10, 20)
    m, c = 1.2, 2
    noise_std = 3
    noise = np.random.rand(*x.shape) * noise_std
    y = m * x + c
    y_noise = y + noise
    print("Noise free , Noisy")
    for _ in zip(y, y_noise):
        print(_)
    return x, y_noise


def add_dims(a):
    # PyMC GPs expect X with shape (n, input_dim)
    return np.expand_dims(a, axis=1)


def split_train_test(a):
    num_train = 15
    a_train, a_test = a[:num_train], a[num_train:]
    return add_dims(a_train), add_dims(a_test)


np.random.seed(100)
x, y = get_data()
X_train, X_test = split_train_test(x)
Y_train, Y_test = split_train_test(y)
print(X_train.shape)
for _ in zip(X_test, Y_test):
    print(_)

plt.scatter(x, y)
plt.scatter(X_train, Y_train, color="r")
plt.savefig("data.png")

with pm.Model() as model:
    n_trend = pm.HalfCauchy("η_trend", beta=0.5)
    ls = pm.Gamma("ℓ_trend", alpha=4.0, beta=0.2)
    cov_func = n_trend * pm.gp.cov.ExpQuad(input_dim=1, ls=ls, active_dims=[0])
    mean = pm.gp.mean.Constant()
    gp = pm.gp.Marginal(mean_func=mean, cov_func=cov_func)
    # Since the normal noise model and the GP are conjugate, we use `Marginal`
    # with the `.marginal_likelihood` method
    y_ = gp.marginal_likelihood("y", X=X_train, y=Y_train, noise=1e-6)
    mp = pm.find_MAP(include_transformed=True)
    print(mp)

print(sorted([name + ":" + str(mp[name]) for name in mp.keys() if not name.endswith("_")]))

print("Predicting with gp ...")
mu, var = gp.predict(X_test, point=mp, diag=True, jitter=1e-6)
mu_all, var_all = gp.predict(add_dims(x), point=mp, diag=True, jitter=1e-6)

plt.scatter(X_train, Y_train, label="Training")
plt.scatter(X_test, Y_test, label="Test")
# flatten the predictions so the shapes line up for plotting
plt.errorbar(x, np.ravel(mu_all), yerr=np.ravel(var_all), label="Fit")
plt.legend()
plt.savefig("prediction.png")
print(mu)
print(var)
```

The fit looks pretty good.

![](https://i.imgur.com/75VNJC8.png)
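For reference, the exact quantity that an mBCG-based `_build_marginal_likelihood()` (point 2 above) ultimately has to reproduce is the GP log marginal likelihood. Below is a minimal NumPy sketch of the exact, Cholesky-based computation for a zero-mean GP; the function is my own illustration, not PyMC API. The FastGP paper replaces the cubic-cost Cholesky with CG solves for the quadratic term and a stochastic estimate of the log-determinant term.

```python=
# Exact GP log marginal likelihood via Cholesky: the baseline that the
# mBCG-based version approximates.
import numpy as np

def exact_log_marginal_likelihood(K, y, sigma2):
    """log p(y) for y ~ N(0, K + sigma2 * I), with K an (n, n) kernel matrix."""
    n = y.shape[0]
    Ky = K + sigma2 * np.eye(n)
    L = np.linalg.cholesky(Ky)                           # the O(n^3) step mBCG avoids
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + sigma2*I)^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))            # log|K + sigma2*I|
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi)
```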
## 21-June-2022

### Setup Aesara
* Set up Aesara on Google Colab ([link](https://colab.research.google.com/drive/1JlcLflFktvdn53t-n5AcBMX8SKVLQsl9?usp=sharing)); thanks to Ricardo's help, Aesara works fine. _But I just couldn't get Aesara to use the GPU._ :confused:
* Tried a simple test of multiplying two numbers, but it still took 26 s with `%%timeit` on Google Colab ([link](https://colab.research.google.com/drive/1SKBGG3x4i7VwOBWpWzYebfPYSFy-00QE?usp=sharing)) using the GPU backend.
* Thanks to Brandon and Ricardo, I now have Aesara (JAX + GPU) running on Colab ([link](https://colab.research.google.com/drive/1SKBGG3x4i7VwOBWpWzYebfPYSFy-00QE?usp=sharing)) :sunglasses:
* It seems that, when using JAX directly (`jax.numpy as jnp`), it is faster to pre-allocate the data on the GPU. However, passing the same GPU-allocated data to Aesara is almost twice as slow. Aesara matches the speed of native JAX when using the NumPy matrix directly (not pre-allocated on the GPU). **tl;dr: when using Aesara with the JAX (GPU) backend, passing NumPy arrays into the compiled functions seems to give the best speedups.**
* Did a similar test with JAX + TPU on [Colab](https://colab.research.google.com/drive/1WywfSUKtYeLNCIWkdHXb6xaP8VwVpE7s?usp=sharing): JAX uses the TPUs, but Aesara + JAX does not.

### Links
1. PyMC 4 installation [guide](https://github.com/pymc-devs/pymc/wiki/Installation-Guide-(Linux)#pymc-v4-installation)
2. FastGP [paper](https://proceedings.neurips.cc/paper/2018/file/27e8e17134dd7083b050476733207ea1-Paper.pdf)
3. [HackMD emoji list](https://github.com/ikatyang/emoji-cheat-sheet)

## 7-July-2022
* Found the code in GPyTorch which implements the `mBCG` algorithm. [Link to code](https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/utils/minres.py)

## 12-July-2022
* [Pivoted Cholesky](https://github.com/cornellius-gp/gpytorch/blob/78796f7e3d36530fbb4a395cf9b303c680367c0d/gpytorch/functions/_pivoted_cholesky.py#L1) in GPyTorch (a minimal NumPy sketch of the algorithm is included after the 23-July notes below)
* Original pivoted Cholesky [paper](https://www.sciencedirect.com/science/article/abs/pii/S0168927411001814)
* GPyTorch uses `torch.gather`, and this [thread](https://discuss.pytorch.org/t/whats-the-equivalence-of-theanos-inc-subtensor/1080/11) on the PyTorch discourse says that we can do NumPy-like indexing in Theano (and consequently in Aesara). Very useful [link](https://stackoverflow.com/questions/50999977/what-does-the-gather-function-do-in-pytorch-in-layman-terms) which explains what `gather` does.

```python=
import numpy as np
import torch

a = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# array([[1, 2, 3],
#        [4, 5, 6],
#        [7, 8, 9]])

# dim=0: the index tensor holds *row* indices, applied per column,
# i.e. out[i][j] = a[index[i][j]][j]; the first column picks rows 0 and 2 -> [1, 7]
torch.gather(torch.tensor(a), 0, torch.tensor([[0, 0], [2, 0]]))
# tensor([[1, 2],
#         [7, 2]])

# dim=1: the index tensor holds *column* indices, applied per row,
# i.e. out[i][j] = a[i][index[i][j]]
torch.gather(torch.tensor(a), 1, torch.tensor([[0, 0], [2, 0]]))
# tensor([[1, 1],
#         [6, 4]])
```

The same thing can be achieved in NumPy with fancy indexing:

```python=
a = np.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
idx = np.array([[0, 0], [2, 0]])

# equivalent of torch.gather(a, 0, idx): out[i, j] = a[idx[i, j], j]
a[idx, np.arange(idx.shape[1])]
# array([[1, 2],
#        [7, 2]])

# equivalent of torch.gather(a, 1, idx): out[i, j] = a[i, idx[i, j]]
a[np.arange(idx.shape[0])[:, None], idx]
# array([[1, 1],
#        [6, 4]])
```

## 23-July-2022
Found the `mBCG` implementation [here](https://github.com/cornellius-gp/gpytorch/blob/55312b22080542eeca93b062750c2c689cd6a2a5/gpytorch/functions/_inv_quad_logdet.py#L13).

Now I need to implement this, which should finish a major portion of the project. I will then have everything in NumPy, which can easily be sped up with JAX. At that point I will implement the subclass of the Marginal GP, with `_log_marginal()` and `_log_conditional()`, as discussed before, which would allow us to check the speed-up within a GP model.
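As a reference for the pivoted Cholesky mentioned on 12-July (which the paper uses as a preconditioner for mBCG), here is a minimal NumPy sketch of the rank-k algorithm from the Harbrecht et al. paper. The function name and the tolerance argument are my own; this is the reference algorithm, not a port of the GPyTorch implementation.

```python=
import numpy as np

def pivoted_cholesky(A, rank, tol=1e-12):
    """Return L with shape (rank, n) such that A ≈ L.T @ L, for PSD A."""
    n = A.shape[0]
    d = np.diag(A).astype(float).copy()   # residual diagonal
    perm = np.arange(n)                   # pivot permutation
    L = np.zeros((rank, n))
    for m in range(rank):
        # pick the pivot with the largest residual diagonal among the remaining columns
        i = m + np.argmax(d[perm[m:]])
        perm[[m, i]] = perm[[i, m]]
        pivot = perm[m]
        if d[pivot] <= tol:               # remaining error is negligible, stop early
            return L[:m]
        L[m, pivot] = np.sqrt(d[pivot])
        rest = perm[m + 1:]
        # Schur-complement update of the remaining columns
        row = (A[pivot, rest] - L[:m, pivot] @ L[:m, rest]) / L[m, pivot]
        L[m, rest] = row
        d[rest] -= row ** 2
    return L

# quick sanity check: a full-rank factorisation reproduces the matrix
A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = pivoted_cholesky(A, rank=2)
assert np.allclose(L.T @ L, A)
```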
## 10-Aug-2022
I finally sent the pull requests for the `pivoted Cholesky` and the `linear conjugate gradients` to `gh:pymc-devs/pymc-experimental`. I still need to address the comments from Bill, but it looks nice. I really like Bill's way of working and his feedback.

We had a meeting today, and Bill (after consulting another colleague from PyMC Labs) suggested that we could consider wrapping the `log-determinant` and the `solve` computations in **Aesara Ops**. Aesara Ops are basically the logical entities that Aesara sees; they can be implemented in whatever language Aesara supports as a backend, which in our case means `C`, `NumPy`, `Numba` and `JAX`. We implement the forward and backward pass of the Ops, and Aesara handles the rest. The backward pass of an Aesara Op is implemented in `L_op`, and there is a way for one Op to depend on another. So I could have one Op which implements `InvQuadLogDet`, like GPyTorch does, and then have the `Solve` and `LogDet` computations depend on it. **This is something new that I have learned and I really like it a lot!** (A minimal sketch of the Op structure is at the end of this entry.)

Pull requests so far:
* [PR:Pivoted Cholesky](https://github.com/pymc-devs/pymc-experimental/pull/63)
* [PR:LinearCG](https://github.com/pymc-devs/pymc-experimental/pull/62)
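To make the Op idea concrete for myself, here is a minimal sketch of a custom Aesara Op with a NumPy forward pass (`perform`) and an explicit gradient, using a toy log-determinant as the example. Aesara already ships determinant and solve Ops, and the class and variable names here are mine; a real mBCG-backed Op would replace the body of `perform` with the CG machinery and would likely implement `L_op` directly, as noted above.

```python=
# Toy custom Aesara Op: log|A| with an explicit gradient. This only illustrates
# the forward/backward structure of an Op, not the mBCG computation itself.
import aesara
import aesara.tensor as at
import numpy as np
from aesara.graph.basic import Apply
from aesara.graph.op import Op
from aesara.tensor.nlinalg import matrix_inverse


class LogDet(Op):
    def make_node(self, A):
        A = at.as_tensor_variable(A)
        out = at.scalar(dtype=A.dtype)
        return Apply(self, [A], [out])

    def perform(self, node, inputs, output_storage):
        # forward pass: plain NumPy (backends like Numba/JAX dispatch elsewhere)
        (A,) = inputs
        _, logdet = np.linalg.slogdet(A)
        output_storage[0][0] = np.asarray(logdet, dtype=node.outputs[0].dtype)

    def grad(self, inputs, output_grads):
        # backward pass: d log|A| / dA = A^{-T}; `L_op` is the more general hook,
        # `grad` is enough for this sketch
        (A,) = inputs
        (g,) = output_grads
        return [g * matrix_inverse(A).T]


# usage: build a graph with the Op and compile it
A = at.matrix("A")
logdet_fn = aesara.function([A], LogDet()(A))
print(logdet_fn(np.array([[4.0, 1.0], [1.0, 3.0]])))  # log(11) ≈ 2.398
```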