NeurIPS 2022: Paper review #4
G-Research were headline sponsors at NeurIPS 2022 in New Orleans.
ML is a fast-evolving discipline; attending conferences like NeurIPS and keeping up to date with the latest developments is key to the success of our quantitative researchers and machine learning engineers.
Our NeurIPS 2022 paper review series gives you the opportunity to hear about the research and papers that our quants and ML engineers found most interesting from the conference.
Here, Hugh S, Senior Quantitative Researcher at G-Research, discusses two papers from NeurIPS:
- Posterior and Computational Uncertainty in Gaussian Processes
- Structural Kernel Search via Bayesian Optimization and Symbolical Optimal Transport
Posterior and Computational Uncertainty in Gaussian Processes
Jonathan Wenger, Geoff Pleiss, Marvin Pförtner, Philipp Hennig, John P. Cunningham
Exact inference in Gaussian process models generally scales cubically in the number of data points, and many approximations have been proposed to reduce this complexity to linear or quadratic.
The ideas in this paper come from probabilistic numerics (PN), where numerical computation is considered a problem of Bayesian inference. This seems a strange idea; why not just do the computation?
Consider solving a linear system. The conjugate gradients algorithm is a traditional approach and it gets the right answer in a sequence of steps, but what if the algorithm is stopped early? PN addresses this question by treating the solution as an unknown quantity and building a sequence of observations of this quantity.
With each observation, the solution becomes more determined, until eventually there is no uncertainty at all and the exact solution is recovered.
The resulting algorithm looks similar to the conjugate gradients algorithm, but it keeps track of the uncertainty in the answer at each step. When this idea is applied to Gaussian process inference, something very interesting happens: the traditional approaches to scaling Gaussian process inference (partial conjugate gradients, inducing points, pivoted Cholesky, etc.) all reappear in modified form, adjusted to reflect the ‘computational uncertainty’.
These modifications are demonstrably better than their traditional counterparts as they never underestimate the predictive variance.
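The mechanics can be sketched in a few lines. This is a toy version of the idea, not the paper's full algorithm: run a few steps of conjugate gradients on K a = y, reuse the search directions S to build a low-rank approximation C = S (SᵀKS)⁻¹ Sᵀ of K⁻¹, and use C in place of K⁻¹ in the predictive equations. The kernel, data, and all names below are illustrative.

```python
import numpy as np

# Toy setup: an RBF kernel, a small noisy regression problem, one test point.
def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
Xs = np.array([[0.5]])                      # test input

K = rbf(X, X) + 0.1 * np.eye(30)            # kernel matrix plus noise
ks = rbf(X, Xs)                             # cross-covariances, shape (30, 1)
kss = rbf(Xs, Xs)                           # prior variance at the test input

def cg_predict(n_iter):
    """GP prediction after n_iter conjugate-gradient steps on K a = y,
    reusing the CG search directions to account for the computation
    that has not been done yet."""
    a = np.zeros(30)
    r = y - K @ a
    d = r.copy()
    S = []                                  # CG search directions ("actions")
    for _ in range(n_iter):
        if r @ r < 1e-14:
            break                           # already converged
        S.append(d.copy())
        Kd = K @ d
        step = (r @ r) / (d @ Kd)
        a = a + step * d
        r_new = r - step * Kd
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    S = np.stack(S, axis=1)
    # Low-rank approximation C = S (S^T K S)^-1 S^T of K^-1, built only
    # from the directions actually explored; C never exceeds K^-1.
    C = S @ np.linalg.solve(S.T @ K @ S, S.T)
    mean = ks[:, 0] @ (C @ y)
    var = kss[0, 0] - ks[:, 0] @ (C @ ks[:, 0])
    return mean, var

m3, v3 = cg_predict(3)
m8, v8 = cg_predict(8)
v_exact = kss[0, 0] - ks[:, 0] @ np.linalg.solve(K, ks[:, 0])
```

Because C is dominated by K⁻¹ in the Loewner order, the approximate predictive variance can only shrink towards the exact one as iterations are added; it never dips below it, which is the sense in which the variance is never underestimated.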
There has been an enormous literature on Gaussian process approximations since the early 2000s, so it is remarkable how much more this paper has to say on the topic.
Structural Kernel Search via Bayesian Optimization and Symbolical Optimal Transport
Matthias Bitzer, Mona Meister, Christoph Zimmer
Gaussian processes provide a simple yet powerful approach to modelling functions. In simple settings, the only modelling decision is which kernel to use and there are a number of options that give very different properties.
Practitioners often go one step further and use the marginal likelihood of the data to select a good kernel using gradient descent on a family of kernels. Sometimes this isn’t enough, however, and methods have been proposed to select over a broader class of kernels.
A key piece of work in this area is the Automatic Statistician, in which kernels are built up as a tree of compositions, and each composite kernel maps onto a natural-language description.
A key difficulty of this approach is searching efficiently over the tree of kernels. The original Automatic Statistician used greedy search. This paper instead uses Bayesian optimisation to search over kernels, using optimal transport between trees to define a metric on the kernel space.
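The greedy baseline is easy to sketch: start from the best-scoring base kernel, then repeatedly try extending the current expression with `current + B` or `current * B` for each base kernel B, keeping an extension only if it improves the marginal likelihood. This is a toy version of the Automatic Statistician's greedy search, not the paper's Bayesian-optimisation method; hyperparameters are fixed and all names are illustrative.

```python
import numpy as np

# Base kernels on 1-D inputs, with hyperparameters fixed for brevity.
def rbf(A, B):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)

def per(A, B):
    return np.exp(-2.0 * np.sin(np.pi * (A[:, None] - B[None, :]) / 3.0) ** 2)

def lin(A, B):
    return 1.0 + A[:, None] * B[None, :]

BASE = {"RBF": rbf, "PER": per, "LIN": lin}

def k_add(f, g):
    return lambda A, B: f(A, B) + g(A, B)

def k_mul(f, g):
    return lambda A, B: f(A, B) * g(A, B)

def log_evidence(kfun, X, y, noise=0.1):
    # Standard GP log marginal likelihood with Gaussian noise.
    n = len(y)
    L = np.linalg.cholesky(kfun(X, X) + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def greedy_search(X, y, depth=2):
    # Start from the best base kernel.
    name, kf = max(BASE.items(), key=lambda kv: log_evidence(kv[1], X, y))
    for _ in range(depth - 1):
        # Expand the current expression with every  + B  and  * B  option.
        cands = {}
        for bname, bf in BASE.items():
            cands[f"({name} + {bname})"] = k_add(kf, bf)
            cands[f"({name} * {bname})"] = k_mul(kf, bf)
        chosen = max(cands, key=lambda n_: log_evidence(cands[n_], X, y))
        if log_evidence(cands[chosen], X, y) <= log_evidence(kf, X, y):
            break                           # no extension improves the evidence
        name, kf = chosen, cands[chosen]
    return name, kf

# Data with both a periodic component and a linear trend.
rng = np.random.default_rng(2)
X = np.linspace(0.0, 9.0, 50)
y = 0.3 * X + np.sin(2.0 * np.pi * X / 3.0) + 0.05 * rng.standard_normal(50)

found_name, found_k = greedy_search(X, y, depth=2)
```

Greedy search evaluates only a handful of expressions per level, which is exactly why it can get stuck; the metric over kernel trees described above lets Bayesian optimisation explore the same space more globally.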
The resulting algorithm is very ‘double’ in nature: there is a kernel in the original data space and another kernel over the kernels. The promise of this sort of approach is a truly automatic algorithm that can find underlying structure from data in a human-interpretable way.