The best of ICML 2022 – Paper reviews (part 9)
This article is one of a series of paper reviews from our researchers and machine learning engineers.
Last month, G-Research were Diamond sponsors at ICML, hosted this year in Baltimore, US.
As well as having a stand and team in attendance, a number of our quantitative researchers and machine learning engineers attended as part of their ongoing learning and development.
We asked our quants and machine learning practitioners to write about some of the papers and research that they found most interesting.
Here, James Bayliss, Quantitative Researcher at G-Research, discusses three papers.
Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups
David M. Knigge, David W Romero, Erik J Bekkers
Group convolutional neural networks (CNNs) provide a way to improve performance over standard architectures when the problem in question has underlying symmetry. This paper presents a modification to group CNNs which is shown to reduce training time and outperform existing baselines on some standard datasets.
The method applies when the group in question is a Lie group that can be written as a semidirect product. The paper focuses mostly on the case of combined translations, dilations and rotations acting on image data. The key idea is to factorise the group convolution kernels across the subgroups in the semidirect product. Doing so acts as a regulariser, and also allows the convolution itself to be computed one subgroup at a time, reducing training time.
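To see why a factorised kernel lets the computation be separated, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the authors' implementation: the group is a direct product of two cyclic groups (a toy stand-in for the semidirect product, where the spatial kernel would additionally be transformed by the subgroup element), and the names `full_conv`, `separable_conv`, `k_h` and `k_x` are hypothetical.

```python
def full_conv(f, kernel):
    """Group correlation on Z_nh x Z_nx with an unfactorised kernel."""
    n_h, n_x = len(f), len(f[0])
    return [
        [
            sum(
                f[h][x] * kernel[(h - hp) % n_h][(x - xp) % n_x]
                for h in range(n_h)
                for x in range(n_x)
            )
            for xp in range(n_x)
        ]
        for hp in range(n_h)
    ]


def separable_conv(f, k_h, k_x):
    """Same operation, assuming kernel[a][b] == k_h[a] * k_x[b]."""
    n_h, n_x = len(f), len(f[0])
    # Step 1: correlate along the spatial axis only.
    spatial = [
        [sum(f[h][x] * k_x[(x - xp) % n_x] for x in range(n_x)) for xp in range(n_x)]
        for h in range(n_h)
    ]
    # Step 2: correlate along the subgroup axis only.
    return [
        [sum(spatial[h][xp] * k_h[(h - hp) % n_h] for h in range(n_h)) for xp in range(n_x)]
        for hp in range(n_h)
    ]


# Toy feature map: 4 subgroup elements (e.g. rotations) x 5 positions.
f = [[(3 * h + x) % 7 for x in range(5)] for h in range(4)]
k_h = [1, -2, 0, 3]
k_x = [2, 0, -1, 1, 0]
kernel = [[a * b for b in k_x] for a in k_h]  # outer product of the two factors

# Both routes give the same output when the kernel factorises.
assert full_conv(f, kernel) == separable_conv(f, k_h, k_x)
```

In this toy setting the unfactorised version needs `n_h * n_x` multiply-adds per output value and as many kernel parameters, while the separable version needs `n_h + n_x` of each.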
Interestingly, experimental evidence in the paper suggests that training of (non-separable) group CNNs also results in kernels which can be closely approximated by such a factorisation, further motivating this method.
A few notable tricks are used to make the convolution equations tractable to evaluate or approximate without losing the desired equivariance. Kernels are described using sinusoidal representation networks on the Lie algebra, and then mapped to the group via the exponential map. Randomised sampling is also used to avoid discretisation artefacts.
Overall, this paper takes a nice high-level idea all the way down to empirical wins in training time and accuracy, and provides an interesting perspective on group CNNs.
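The recipe above can be sketched for the simplest case, planar rotations SO(2), where a Lie algebra element is just an angle. This is a rough illustration only: the function names, the hand-picked weights, the frequency scale `omega0` and the "grid with a random offset" sampling scheme are all assumptions made for the example, not details taken from the paper.

```python
import math
import random


def exp_so2(theta):
    """Exponential map from the Lie algebra of SO(2) (an angle) to a rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def siren_kernel(theta, w1, b1, w2, omega0=10.0):
    """Tiny one-hidden-layer sinusoidal network giving a kernel value at algebra coordinate theta."""
    hidden = [math.sin(omega0 * (w * theta + b)) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden))


def sample_algebra(n, rng):
    """Evenly spaced angles covering one full turn, shifted by a random offset on each call."""
    step = 2 * math.pi / n
    offset = rng.uniform(0.0, step)
    return [-math.pi + offset + i * step for i in range(n)]


rng = random.Random(0)
# Hand-picked weights purely for illustration; in practice they would be learned.
w1, b1, w2 = [0.1, -0.05, 0.02], [0.0, 0.5, 1.0], [0.7, -0.3, 0.2]

thetas = sample_algebra(8, rng)                                 # points in the Lie algebra
group_elements = [exp_so2(t) for t in thetas]                   # the corresponding rotations
kernel_values = [siren_kernel(t, w1, b1, w2) for t in thetas]   # kernel evaluated at those points
```

Because the kernel is a continuous function of the algebra coordinate, it can be evaluated at a different set of sample points on every call, rather than being tied to one fixed grid of rotations.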
Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
Alexandre Rame, Corentin Dancette, Matthieu Cord
Out-of-distribution generalisation addresses the problem that arises when test data is drawn from a different distribution to the training data, a common issue when deep learning models are deployed in practice.
The paper cites examples where deep learning models failed to detect COVID-19 from lung scans reliably in practice, due to reliance on simple correlations that were only artefacts of the training dataset construction.
Constraining a model to behave similarly (in some suitable sense) across the different domains present in the training set is a general strategy for improving out-of-distribution generalisation. The paper claims that its version of this idea is the first to systematically outperform baselines on domain generalisation tasks.
The paper builds on recent work that encourages gradients from different training domains to agree. The key addition is a regularisation term that encourages agreement not only of the gradients, but also of their variances. This aligns the local geometry of the loss landscapes over different domains during training, which has been previously suggested as a method of achieving domain generalisation.
While the idea has a simple description, the paper connects it to deeper statistical concepts, and presents some impressive results.
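As a rough illustration of what matching gradient variances across domains can look like, here is a minimal sketch for a linear model with squared error. The function names and toy data are hypothetical. The paper's actual choices (which parameters the gradients are taken over, how the variance is estimated, how the penalty is weighted) may well differ.

```python
def per_sample_grads(w, xs, ys):
    """Gradient of the squared error (w.x - y)^2 with respect to w, one per sample."""
    grads = []
    for x, y in zip(xs, ys):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        grads.append([2.0 * residual * xi for xi in x])
    return grads


def gradient_variance(grads):
    """Per-coordinate variance of the per-sample gradients within one domain."""
    n, dim = len(grads), len(grads[0])
    means = [sum(g[j] for g in grads) / n for j in range(dim)]
    return [sum((g[j] - means[j]) ** 2 for g in grads) / n for j in range(dim)]


def variance_matching_penalty(w, domains):
    """Mean squared distance between each domain's gradient variance and the cross-domain average."""
    variances = [gradient_variance(per_sample_grads(w, xs, ys)) for xs, ys in domains]
    dim = len(variances[0])
    avg = [sum(v[j] for v in variances) / len(variances) for j in range(dim)]
    return sum(
        sum((v[j] - avg[j]) ** 2 for j in range(dim)) for v in variances
    ) / len(variances)


w = [0.5, -1.0]
# Two toy training domains, each given as (inputs, targets).
domain_a = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 0.0, 1.0])
domain_b = ([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]], [1.0, 0.0, 1.0])

penalty = variance_matching_penalty(w, [domain_a, domain_b])
# A training objective would then be: average loss + lam * penalty,
# where lam is a regularisation strength chosen by the user.
```

The penalty is zero when every domain has the same gradient variance, and grows as the domains disagree.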
Approximately Equivariant Networks for Imperfectly Symmetric Dynamics
Rui Wang, Robin Walters, Rose Yu
This paper explores models which are able to use the inductive bias of symmetry without having it enforced exactly, for applications to problems which have imperfect symmetry.
One example given is predicting the future of dynamical systems whose equations are symmetric but depend on unobserved data that breaks the symmetry.
The notion of approximate equivariance is introduced to formalise what it means for a system to have imperfect symmetry. Synthetic data is then used to compare the equivariance properties of fitted models across varying levels of problem equivariance, with the proposed method providing the best match.
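One way to write such a definition down is sketched below. The notation is assumed for illustration rather than quoted from the paper: a function f from X to Y, a group G acting on inputs and outputs through representations ρ_X and ρ_Y, and a tolerance ε.

```latex
\[
  \bigl\lVert f\bigl(\rho_X(g)\,x\bigr) - \rho_Y(g)\,f(x) \bigr\rVert \le \epsilon
  \quad \text{for all } g \in G,\ x \in X .
\]
```

Setting ε = 0 recovers exact equivariance, and larger values of ε allow the symmetry to be broken by a bounded amount.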
Symmetry-breaking is introduced into the model by replacing the otherwise exactly equivariant convolutions with weighted sums of convolutions, where the weights vary across the group. This is an interesting idea, similar in spirit to introducing non-linearity into an otherwise linear model by allowing the coefficients to also depend on the inputs.
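The effect of letting the weights vary across the group can be seen in a small sketch on the cyclic group of shifts. This is a toy construction under assumptions of my own (a 1D cyclic signal, hypothetical function names, hand-picked integers), not the architecture from the paper.

```python
def relaxed_group_conv(f, filters, weights):
    """out[g] = sum_h f[h] * sum_l weights[l][h] * filters[l][(h - g) % n] on the cyclic group Z_n."""
    n = len(f)
    return [
        sum(
            f[h] * sum(w[h] * psi[(h - g) % n] for w, psi in zip(weights, filters))
            for h in range(n)
        )
        for g in range(n)
    ]


def shift(signal, s):
    """Action of the group element s on a signal: translate cyclically by s."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


def equivariance_error(f, filters, weights, s):
    """Largest gap between 'shift then convolve' and 'convolve then shift'."""
    shifted_then_conv = relaxed_group_conv(shift(f, s), filters, weights)
    conv_then_shifted = shift(relaxed_group_conv(f, filters, weights), s)
    return max(abs(a - b) for a, b in zip(shifted_then_conv, conv_then_shifted))


f = [1, 0, 2, 0, 0, 3]
filters = [[1, 2, 0, 0, 0, 0], [0, 0, 1, -1, 0, 0]]

constant_weights = [[2] * 6, [1] * 6]            # same weights everywhere: an ordinary group convolution
varying_weights = [[1, 2, 3, 4, 5, 6], [1] * 6]  # weights depend on the group element

assert equivariance_error(f, filters, constant_weights, s=2) == 0
assert equivariance_error(f, filters, varying_weights, s=2) > 0
```

With constant weights the layer collapses to a single equivariant convolution, so the amount by which the weights vary acts as a dial on how much symmetry-breaking the model can express.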
These models are demonstrated to outperform baselines on both a synthetic dataset which has approximate symmetry by design, and a real dataset that is believed to have these approximate symmetry properties.
I would be interested to see an exploration of more problems which have imperfect symmetry to properly understand the application scope of this type of method.