G-Research were headline sponsors at NeurIPS 2022, in New Orleans.
ML is a fast-evolving discipline; attending conferences like NeurIPS and keeping up-to-date with the latest developments is key to the success of our quantitative researchers and machine learning engineers.
Our NeurIPS 2022 paper review series gives you the opportunity to hear about the research and papers that our quants and ML engineers found most interesting from the conference.
Here, Simon L, Senior Quantitative Researcher at G-Research, discusses two papers from NeurIPS:
- Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift
- Deep Ensembles Work, But Are They Necessary?
Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift
Christina Baek, Yiding Jiang, Aditi Raghunathan, Zico Kolter
This paper presents a method for predicting the out-of-distribution (OOD) accuracy of a neural network from its in-distribution (ID) performance, without the need for labelled OOD data.
The method is based on the observation that the ID and OOD agreement rates between the predictions of pairs of neural network classifiers are strongly linearly related (after a probit transform), much as ID and OOD accuracy were shown to be in a more general setting in recent work (Accuracy on the Line, Miller et al., 2021). Specifically, the slope and intercept of these two regressions are very close.
Another interesting observation is that, for neural networks, this linear relationship in agreement tends to hold whenever accuracy on the line holds. Hence an agreement regression (which doesn’t require any labels), combined with a calculation of in-distribution accuracy, can be used to estimate out-of-distribution accuracy.
The authors even demonstrate that the agreement regression could be performed on checkpoints of a single model fit. These are fascinating empirical findings and hopefully further understanding of the specificity of these observations to neural networks will follow in the future.
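The estimation idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`probit`, `agreement`, `fit_agreement_line`, `estimate_ood_accuracy`) are hypothetical, the regression is a plain least-squares fit over all model pairs, and the paper's own estimators may differ in detail.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def probit(p):
    """Inverse standard normal CDF, clipped away from 0 and 1 to stay finite."""
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
    return np.vectorize(_STD_NORMAL.inv_cdf)(p)


def inverse_probit(z):
    """Standard normal CDF, mapping probit-scale values back to rates."""
    return np.vectorize(_STD_NORMAL.cdf)(np.asarray(z, dtype=float))


def agreement(preds_a, preds_b):
    """Fraction of examples on which two models predict the same label."""
    return float(np.mean(np.asarray(preds_a) == np.asarray(preds_b)))


def fit_agreement_line(id_preds, ood_preds):
    """Fit probit(OOD agreement) = slope * probit(ID agreement) + intercept.

    id_preds / ood_preds: one array of predicted labels per model.
    No ground-truth labels are used here. At least three models are
    assumed, so that the fit sees several model pairs.
    """
    id_agr, ood_agr = [], []
    n_models = len(id_preds)
    for i in range(n_models):
        for j in range(i + 1, n_models):
            id_agr.append(agreement(id_preds[i], id_preds[j]))
            ood_agr.append(agreement(ood_preds[i], ood_preds[j]))
    slope, intercept = np.polyfit(probit(id_agr), probit(ood_agr), deg=1)
    return float(slope), float(intercept)


def estimate_ood_accuracy(id_accuracy, slope, intercept):
    """Reuse the agreement line as a stand-in for the accuracy line."""
    return float(inverse_probit(slope * probit(id_accuracy) + intercept))
```

In use, one would collect predicted labels from several models (or checkpoints) on unlabelled ID and OOD inputs, call `fit_agreement_line`, and then pass a model's labelled ID accuracy to `estimate_ood_accuracy`. The estimate is only as good as the assumption that the agreement and accuracy lines coincide.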
Deep Ensembles Work, But Are They Necessary?
Taiga Abe, E. Kelly Buchanan, Geoff Pleiss, Richard Zemel, John P. Cunningham
There were a number of interesting papers at NeurIPS making progress in the understanding of different properties of deep ensembles.
This paper, for instance, questions the claim that predictions from deep ensembles have robustness and uncertainty quantification benefits over and above those of single neural networks.
Taking robustness first, the authors show empirically that ensembles sit on the same ID-OOD accuracy line as their constituent models. Hence the diversity of ensemble predictions does not yield additional protection against distribution shift relative to a single model (perhaps larger, or with a different architecture) with the same predictive performance.
Further, they show empirically that some common metrics of ensemble diversity take similar values on both ID and OOD datasets. By decomposing common ensemble uncertainty metrics into average single model uncertainty, plus a diversity term, they conclude that deep ensemble uncertainty is driven by the uncertainty of a single model.
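To make the decomposition concrete, here is a minimal sketch using predictive entropy as the uncertainty metric. The choice of entropy and the function names are assumptions for illustration; the paper's exact metrics may differ. It relies on a standard identity: the entropy of the averaged prediction equals the average member entropy plus the average KL divergence from each member to the ensemble mean (a generalised Jensen-Shannon divergence).

```python
import numpy as np


def entropy(probs, axis=-1):
    """Shannon entropy (in nats) along the class axis."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs), axis=axis)


def decompose_ensemble_uncertainty(member_probs):
    """Split ensemble entropy into average member entropy plus diversity.

    member_probs: array of shape (n_members, n_examples, n_classes)
    holding each member's predicted class probabilities.
    Returns three arrays of shape (n_examples,).
    """
    member_probs = np.clip(np.asarray(member_probs, dtype=float), 1e-12, 1.0)
    ensemble_probs = member_probs.mean(axis=0)

    # Uncertainty of the ensemble's averaged prediction.
    total = entropy(ensemble_probs)
    # Average uncertainty of the individual members.
    average_single = entropy(member_probs).mean(axis=0)
    # Diversity: mean KL divergence from each member to the ensemble mean.
    kl = np.sum(
        member_probs * (np.log(member_probs) - np.log(ensemble_probs)),
        axis=-1,
    )
    diversity = kl.mean(axis=0)
    return total, average_single, diversity
```

Comparing `average_single` and `diversity` across ID and OOD inputs is one way to see how much of the ensemble's uncertainty comes from the individual members rather than from their disagreement.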
Deep ensembles have lots of interesting and useful properties, and I found this work helpful in characterising some of them more carefully.