Colin Doumont, Victor Picheny, Viacheslav Borovitskiy, Henry Moss
This paper considers kernel choice in the setting of Bayesian optimisation, where the underlying parameter space is a product of finite sets.
The authors point out that several popular kernels are based on Hamming distance – k(x, y) is a function of the count of coordinates along which x and y differ. Standard kernels however place weights on the different coordinate axes, thus introducing many hyperparameters to the resulting algorithms.
The authors propose a canonical choice of k, effectively fixing the relative weights of the different coordinate axes, that they evaluate empirically to be comparable to state-of-the-art in this field.
Given the parameter space S, that is a product of finite sets, the authors form the Hamming graph H with vertex set S, by connecting vertices that differ in exactly one coordinate. Their choice of kernel is the heat kernel on H, thus depending only on a single hyperparameter.
The authors observe that H is a product of cliques, and hence the heat kernel can be analytically computed. This proposal is empirically evaluated on a selection of problems, including a database of MAX-SAT problems and a sample neural architecture search problem.
The authors’ proposal is interesting, making use of classical ideas from graph theory and geometry.
Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimisation