NeurIPS 2022: Paper review #3
G-Research were headline sponsors at NeurIPS 2022, in New Orleans.
ML is a fast-evolving discipline; attending conferences like NeurIPS and keeping up to date with the latest developments is key to the success of our quantitative researchers and machine learning engineers.
Our NeurIPS 2022 paper review series gives you the opportunity to hear about the research and papers that our quants and ML engineers found most interesting from the conference.
Here, Tom M, Machine Learning Engineer at G-Research, discusses two papers from NeurIPS:
- On the Symmetries of Deep Learning Models and their Internal Representations
- IBUG: Instance-Based Uncertainty Estimation for Gradient-Boosted Regression Trees
On the Symmetries of Deep Learning Models and their Internal Representations
Charles Godfrey, Davis Brown, Tegan Emerson, Henry Kvinge
Families of neural networks contain symmetries arising from their architectures – symmetries whose action preserves the output of the network as a function.
Here, the authors investigate a fundamental set of such symmetries by introducing intertwiner groups: symmetries of activation functions. They suggest a method for systematically finding any activation’s intertwiner group, and they analyse the groups of popular activations.
The authors seek to connect these symmetries that arise from network architecture with the symmetries found in trained networks’ internal representations of data.
Informally, the elements of an intertwiner group are pairs of linear maps that “commute” with the activation function. Elements with this property can readily be shown to represent weight-space symmetries in a network. Their analysis of ReLU’s intertwiner group in particular helps explain previous studies that found that the neurons of some networks are more interpretable than random linear combinations of neurons.
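To make the idea of commuting with the activation concrete, here is a minimal NumPy sketch. It assumes, in line with the "permutation-like" description of ReLU's group, that the relevant maps are products of a permutation matrix and a positive diagonal scaling; the toy two-layer network and all variable names are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

n_in, n_hidden, n_out, n_samples = 4, 5, 3, 8
W1 = rng.normal(size=(n_hidden, n_in))   # first-layer weights
W2 = rng.normal(size=(n_out, n_hidden))  # second-layer weights
X = rng.normal(size=(n_in, n_samples))   # a small batch of inputs
H = W1 @ X                               # pre-activations

# Assumed form of a ReLU symmetry: permutation times positive diagonal.
P = np.eye(n_hidden)[rng.permutation(n_hidden)]
D = np.diag(rng.uniform(0.5, 2.0, size=n_hidden))
A = P @ D

# 1) A commutes with ReLU: relu(A h) == A relu(h).
commutes = np.allclose(relu(A @ H), A @ relu(H))

# 2) This gives a weight-space symmetry: transform the first layer by A
#    and undo it in the second layer; the network function is unchanged.
W1_new = A @ W1
W2_new = W2 @ np.linalg.inv(A)
same_output = np.allclose(W2 @ relu(W1 @ X), W2_new @ relu(W1_new @ X))

# 3) A generic invertible matrix does not commute with ReLU.
M = rng.normal(size=(n_hidden, n_hidden))
generic_commutes = np.allclose(relu(M @ H), M @ relu(H))

print(commutes, same_output, generic_commutes)
```

The third check is the reason individual neurons are special here: the symmetries only shuffle and rescale neurons, whereas an arbitrary change of basis in the hidden layer does not pass through the activation.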
To measure similarity between instances of the same model trained with different initialisations, they make use of a prior technique of model stitching, whereby the front and back of two separately trained models are “stitched together” with an extra layer.
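The stitching set-up can be illustrated with a toy NumPy sketch. Two simplifications are assumed here and are not from the paper: model B is constructed as a permuted and rescaled copy of model A rather than being trained separately, and the stitching layer is fitted by least squares on hidden activations rather than trained on the task loss.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

n_in, n_hidden, n_out, n_samples = 4, 6, 2, 500

# Model A: front half W1_a, back half W2_a.
W1_a = rng.normal(size=(n_hidden, n_in))
W2_a = rng.normal(size=(n_out, n_hidden))

# Model B: a stand-in for a separately trained model. Here it is model A
# with hidden units permuted and rescaled, so it computes the same function.
A = np.eye(n_hidden)[rng.permutation(n_hidden)] @ np.diag(
    rng.uniform(0.5, 2.0, n_hidden)
)
W1_b = A @ W1_a
W2_b = W2_a @ np.linalg.inv(A)

X = rng.normal(size=(n_in, n_samples))
H_a = relu(W1_a @ X)  # hidden representation from the front of A
H_b = relu(W1_b @ X)  # hidden representation from the front of B

# Fit a linear stitching layer S with S @ H_a ~= H_b.
# lstsq solves H_a.T @ S.T ~= H_b.T in the least-squares sense.
S = np.linalg.lstsq(H_a.T, H_b.T, rcond=None)[0].T

target = W2_b @ H_b              # output of model B on its own
stitched_out = W2_b @ (S @ H_a)  # front of A -> stitch -> back of B
naive_out = W2_b @ H_a           # front of A fed straight into back of B

stitched_error = np.abs(stitched_out - target).max()
naive_error = np.abs(naive_out - target).max()
print(f"stitched: {stitched_error:.2e}, no stitching layer: {naive_error:.2e}")
```

In this contrived case the stitching layer can recover the hidden-unit relabelling exactly; with genuinely independent training runs, how well stitching works is the empirical question the paper studies.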
To examine how much of the similarity is explained by intertwiner symmetries, they adapt stitching training to restrict it to the intertwiner group of ReLU. To restrict the stitching layer to the permutation-like maps in this group, they leverage the relaxation of permutations to doubly stochastic matrices. They infer from their experiments that intertwiner symmetries account for most, but not all, of the success of stitching.
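A doubly stochastic matrix has non-negative entries with every row and column summing to one, and permutation matrices are the extreme points of that set (the Birkhoff-von Neumann theorem). The sketch below shows one common way to map unconstrained parameters to such a matrix, Sinkhorn normalisation. This is an assumption for illustration: the review above does not say which parameterisation the authors use.

```python
import numpy as np

def sinkhorn(logits, n_iters=200):
    """Map an unconstrained square matrix to an approximately
    doubly stochastic one by alternating row/column normalisation."""
    # Exponentiate to get strictly positive entries; subtracting the
    # maximum only improves numerical stability.
    M = np.exp(logits - logits.max())
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)  # make rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # make columns sum to 1
    return M

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4))  # stand-in for learnable parameters
soft_perm = sinkhorn(logits)

print(soft_perm.sum(axis=0))  # column sums
print(soft_perm.sum(axis=1))  # row sums
```

Because every step is differentiable, a matrix built this way can sit inside a stitching layer and be optimised by gradient descent, which a hard permutation cannot.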
They define and compute dissimilarity measures between trained ReLU networks – roughly “differences that cannot be explained by the intertwiner group”. In their comparisons, these measures fare well against existing counterparts that use only orthogonal matrices rather than intertwiner maps.
IBUG: Instance-Based Uncertainty Estimation for Gradient-Boosted Regression Trees
Jonathan Brophy, Daniel Lowd
This paper presents a straightforward method of using existing gradient-boosted decision trees for probabilistic regression: making them produce not just a prediction value for regression problems but also some measure of prediction uncertainty.
Having a good uncertainty measure can be valuable in areas such as forecasting and explainable AI. Here, the authors aim to show that IBUG is more flexible, simpler and more performant than existing methods such as NGBoost, PGBM and CBU.
The uncertainty estimate is calculated by identifying neighbours in the training set with high “affinity” to the prediction sample (roughly, those that share a leaf with the prediction sample in the largest number of trees), then using these to model the output distribution and estimate a variance.
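The affinity computation can be sketched in a few lines of NumPy. This is a simplified illustration rather than the authors' implementation: the leaf assignments below are hypothetical hand-written values (in practice they would come from a fitted model, for example via the `apply` method of a scikit-learn gradient-boosting regressor), the function name is made up, and the choice of the plain population variance is an assumption.

```python
import numpy as np

def affinity_neighbours_variance(train_leaves, train_targets, test_leaves, k):
    """train_leaves: (n_train, n_trees) leaf index of each training
    sample in each tree; test_leaves: (n_trees,) for one test sample."""
    # Affinity = number of trees in which a training sample lands in
    # the same leaf as the test sample.
    affinity = (train_leaves == test_leaves).sum(axis=1)
    # The k highest-affinity training samples (ties broken by index).
    neighbours = np.argsort(-affinity, kind="stable")[:k]
    # Spread of the neighbours' targets as an uncertainty estimate.
    variance = train_targets[neighbours].var()
    return affinity, neighbours, variance

# Hypothetical leaf assignments: 5 training samples, 3 trees.
train_leaves = np.array([
    [0, 1, 2],
    [0, 1, 0],
    [1, 1, 2],
    [1, 0, 0],
    [0, 1, 2],
])
train_targets = np.array([1.0, 2.0, 3.0, 10.0, 1.5])
test_leaves = np.array([0, 1, 2])

affinity, neighbours, variance = affinity_neighbours_variance(
    train_leaves, train_targets, test_leaves, k=3
)
print(affinity)    # [3 2 2 0 3]
print(neighbours)  # [0 4 1]
print(variance)    # variance of targets 1.0, 1.5 and 2.0
```

Note that the outlying target of 10.0 belongs to a sample with zero affinity, so it has no influence on the estimate for this test point.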
The flexibility of the method comes from it being agnostic to GBDT type (LightGBM, XGBoost, CatBoost), and from its ability to model an output distribution using any parametric or non-parametric density estimator.
They address the prediction efficiency of this non-parametric method by suggesting that the k highest-affinity training samples be approximated by sampling trees at random. They also provide a method for efficiently tuning the hyperparameter k, which specifies the number of neighbours used.
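The tree-sampling approximation can be sketched as follows. The synthetic leaf assignments, the planted set of "true" neighbours, and details such as sampling a quarter of the trees without replacement are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def top_k_by_affinity(train_leaves, test_leaves, k, tree_ids):
    """Rank training samples by shared-leaf count over the given trees."""
    shared = train_leaves[:, tree_ids] == test_leaves[tree_ids]
    affinity = shared.sum(axis=1)
    return np.argsort(-affinity, kind="stable")[:k]

rng = np.random.default_rng(0)
n_train, n_trees, n_leaves, k = 2000, 200, 8, 20

# Hypothetical leaf assignments; a real model would supply these.
train_leaves = rng.integers(0, n_leaves, size=(n_train, n_trees))
test_leaves = rng.integers(0, n_leaves, size=n_trees)

# Plant k genuinely similar training samples: they share the test
# sample's leaf in roughly 80% of trees.
mask = rng.random((k, n_trees)) < 0.8
train_leaves[:k] = np.where(mask, test_leaves, train_leaves[:k])

all_trees = np.arange(n_trees)
sampled_trees = rng.choice(n_trees, size=n_trees // 4, replace=False)

exact = top_k_by_affinity(train_leaves, test_leaves, k, all_trees)
approx = top_k_by_affinity(train_leaves, test_leaves, k, sampled_trees)

overlap = len(set(exact.tolist()) & set(approx.tolist())) / k
print(f"fraction of exact neighbours recovered: {overlap:.2f}")
```

With clearly separated neighbours, as constructed here, a subset of trees is enough to find them at a fraction of the cost; how well this holds on real data depends on how sharply affinities separate.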
They compare IBUG’s probabilistic performance to existing methods, measured by the Continuous Ranked Probability Score (CRPS). IBUG outperforms existing methods, and an ensemble model of IBUG + CBU substantially outperforms all of them.
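For readers unfamiliar with the metric, the Continuous Ranked Probability Score rewards forecasts that are both accurate and sharp, and lower values are better. The sketch below uses the standard closed form for a Gaussian forecast; the Gaussian assumption and the example numbers are ours, chosen for illustration, since IBUG can model other output distributions.

```python
import math

def crps_gaussian(mu, sigma, y):
    """CRPS of a Gaussian forecast N(mu, sigma^2) for observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

y = 1.0  # observed value

good = crps_gaussian(mu=1.0, sigma=0.5, y=y)           # accurate and sharp
vague = crps_gaussian(mu=1.0, sigma=3.0, y=y)          # accurate but too uncertain
overconfident = crps_gaussian(mu=0.0, sigma=0.1, y=y)  # wrong and too certain

print(f"good={good:.3f}, vague={vague:.3f}, overconfident={overconfident:.3f}")
```

The ordering illustrates why a point-accuracy metric is not enough here: the first two forecasts have the same mean, and only the quality of the variance estimate separates them.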