The best of ICML 2022 – Paper reviews (part 7)
This article is one of a series of paper reviews from our researchers and machine learning engineers.
Last month, G-Research were Diamond sponsors at ICML, hosted this year in Baltimore, US.
As well as having a stand and team in attendance, a number of our quantitative researchers and machine learning engineers attended as part of their ongoing learning and development.
We asked our quants and machine learning practitioners to write about some of the papers and research that they found most interesting.
Here, Stephen Chestnut, Senior Quantitative Researcher at G-Research, discusses two papers.
Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson
Real-world data are subject to all types of dataset shift, and data are often non-stationary over a long enough time period. That means spurious correlations are a common problem when a model that has been trained on historical data goes into production.
This paper from the Spurious Correlations, Invariance and Stability workshop at ICML 2022 investigates the learned representations of image classification models trained on data with spurious features. The authors find that while the spurious features destroy the test accuracy, good test accuracy can be recovered by simply masking the spurious features at test time. This leads the authors to conclude that even though the classifier is trained on data with very strong spurious features, it still learns representations that capture the desired behaviour.
They go on to propose a two-stage fine-tuning scheme. In the first stage, all of the available labelled data are used to fine-tune a pre-trained model. In the second stage, a smaller subset of the data, balanced with respect to the spurious correlation, is used to fine-tune only the final layer.
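The second stage can be illustrated with a minimal sketch. This is not the authors' code: the helper names `balanced_indices` and `retrain_last_layer` are hypothetical, the labels are assumed to be binary, and a two-column toy array stands in for the frozen penultimate-layer features that the first stage would produce.

```python
import numpy as np

def balanced_indices(labels, spurious, rng):
    """Subsample so every (label, spurious attribute) group has the same size."""
    groups = {}
    for i, key in enumerate(zip(labels.tolist(), spurious.tolist())):
        groups.setdefault(key, []).append(i)
    smallest = min(len(idx) for idx in groups.values())
    chosen = [rng.choice(idx, size=smallest, replace=False) for idx in groups.values()]
    return np.sort(np.concatenate(chosen))

def retrain_last_layer(features, labels, steps=500, lr=0.5, l2=1e-3):
    """Fit a fresh logistic-regression head on frozen features by gradient descent."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = p - labels                      # gradient of the log-loss w.r.t. the logits
        w -= lr * (features.T @ grad / n + l2 * w)
        b -= lr * grad.mean()
    return w, b

# Toy stand-in for frozen features (an assumption for illustration only):
# column 0 carries the core signal, column 1 a spurious attribute that
# agrees with the label 95% of the time.
rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 2, size=n)
agree = rng.random(n) < 0.95
spurious = np.where(agree, labels, 1 - labels)
core_feat = (2 * labels - 1) + rng.normal(size=n)
spur_feat = (2 * spurious - 1) + 0.1 * rng.normal(size=n)
features = np.column_stack([core_feat, spur_feat])

# Head fitted on all data versus head fitted on the group-balanced subset.
w_all, _ = retrain_last_layer(features, labels)
idx = balanced_indices(labels, spurious, rng)
w_bal, _ = retrain_last_layer(features[idx], labels[idx])
```

In this toy setup the head fitted on the balanced subset should put much less weight on the spurious column than the head fitted on all of the data, because the spurious attribute is no longer predictive of the label within the balanced subset.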
Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
Aaron Mishkin, Arda Sahiner, Mert Pilanci
At ICML 2020, Pilanci and Ergen showed that a multilayer perceptron with one hidden layer of ReLU units can be reformulated as a constrained linear model with a group-L1 penalty. That opened up the possibility of optimising the MLP weights using methods from convex optimisation instead of stochastic gradient descent. This paper shows that not only is it possible to use convex methods, it can be much faster than SGD with the right method.
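For reference, the reformulation can be written roughly as follows. The notation is an assumption chosen for illustration and is recalled from the 2020 paper rather than taken from this review: X is the n-by-d data matrix, y the targets, β the regularisation strength, and each D_i is a diagonal 0/1 matrix recording which rows of X are active for one ReLU activation pattern, with P distinct patterns in total.

```latex
\min_{\{v_i, w_i\}_{i=1}^{P}} \;
\frac{1}{2} \Big\| \sum_{i=1}^{P} D_i X (v_i - w_i) - y \Big\|_2^2
+ \beta \sum_{i=1}^{P} \big( \|v_i\|_2 + \|w_i\|_2 \big)
\quad \text{s.t.} \quad
(2 D_i - I_n) X v_i \ge 0, \;\;
(2 D_i - I_n) X w_i \ge 0 \;\; \text{for all } i
```

With the patterns D_i fixed, the model is linear in the blocks v_i and w_i, the penalty is a group-L1 norm over those blocks, and the inequality constraints keep each block consistent with its activation pattern.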
The authors develop a variant of the well-known FISTA algorithm for solving the convex problem. They show that it can be much faster than SGD, with the added advantages that it is deterministic and provides optimality certificates. One catch is that the reformulated convex problem may be very large, with exponentially many variables, so the authors apply a heuristic to subsample it. This makes the convex problem tractable, but the optimality guarantees then apply only to the subsampled problem.
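To give a flavour of the approach, here is a minimal sketch of plain FISTA applied to a group-L1 penalised least-squares problem built from randomly sampled activation patterns. It is a simplification and not the authors' solver: it drops the constraints of the full reformulation, uses textbook FISTA rather than their variant, and samples patterns from Gaussian gate vectors as an assumed stand-in for their subsampling heuristic. All function names are hypothetical.

```python
import numpy as np

def sample_patterns(X, num_patterns, rng):
    """Sample activation patterns 1[X g >= 0] from random gate vectors g."""
    G = rng.normal(size=(X.shape[1], num_patterns))
    D = (X @ G >= 0).astype(X.dtype)          # shape (n, P), one pattern per column
    return np.unique(D, axis=1)               # drop duplicate patterns

def group_soft_threshold(U, tau):
    """Proximal operator of tau * sum_i ||U[:, i]||_2 (column-wise shrinkage)."""
    norms = np.linalg.norm(U, axis=0, keepdims=True)
    return U * np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))

def predict(X, D, U):
    """Model output sum_i D_i X u_i, where column i of U is the block u_i."""
    return np.sum(D * (X @ U), axis=1)

def objective(X, y, D, U, lam):
    r = predict(X, D, U) - y
    return 0.5 * r @ r + lam * np.linalg.norm(U, axis=0).sum()

def fista_group_lasso(X, y, D, lam, steps=300):
    d, P = X.shape[1], D.shape[1]
    # Step size 1/L, where L is the squared spectral norm of [D_1 X, ..., D_P X].
    A = np.concatenate([D[:, [i]] * X for i in range(P)], axis=1)
    L = np.linalg.norm(A, 2) ** 2
    U = np.zeros((d, P))
    V, t = U.copy(), 1.0
    for _ in range(steps):
        r = predict(X, D, V) - y
        grad = X.T @ (D * r[:, None])         # column i is X^T D_i r
        U_next = group_soft_threshold(V - grad / L, lam / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        V = U_next + ((t - 1.0) / t_next) * (U_next - U)   # momentum step
        U, t = U_next, t_next
    return U

# Toy regression data generated by a single ReLU unit (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.maximum(X @ rng.normal(size=5), 0.0) + 0.1 * rng.normal(size=200)
D = sample_patterns(X, num_patterns=50, rng=rng)
U = fista_group_lasso(X, y, D, lam=1.0)
```

Every step here is deterministic once the patterns are sampled, and the group-L1 proximal step sets whole blocks to zero, which corresponds to pruning hidden units. Because only a sample of the patterns is used, any guarantee concerns the sampled problem rather than the full one.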