This article is one of a series of paper reviews from our researchers and machine learning engineers.
Last month, G-Research were Diamond sponsors at ICML, hosted this year in Baltimore, US.
As well as having a stand and team in attendance, a number of our quantitative researchers and machine learning engineers attended as part of their ongoing learning and development.
We asked our quants and machine learning practitioners to write about some of the papers and research that they found most interesting.
Here, Oliver L, Quantitative Researcher at G-Research, discusses three papers.
Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation
Giung Nam, Hyungi Lee, Byeongho Heo, Juho Lee
This paper proposes a novel approach to distilling a teacher ensemble into a Batch Ensemble for applications that require fast inference.
The goal is to reduce the student ensemble to a simple network via weight averaging. To make this meaningful, the paper uses a clever weight initialisation of the rank-one factors, which map the shared parameters in the Batch Ensemble to each student network.
In addition, training samples are perturbed so that the student ensemble, whose members would otherwise all see identical inputs, is better exposed to the diversity of the teacher ensemble.
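For intuition, here is a minimal PyTorch sketch (an illustration, not the authors' code) of a BatchEnsemble-style linear layer: each student member m applies the shared weight modulated by a rank-one factor r_m s_m^T, and averaging the effective per-member weights collapses the student ensemble into a single network. The all-ones initialisation below is a simple stand-in for the paper's more careful scheme.

```python
import torch
import torch.nn as nn


class BatchEnsembleLinear(nn.Module):
    """Minimal BatchEnsemble-style linear layer (bias omitted for brevity).

    Member m uses the effective weight W_shared * (r_m s_m^T),
    applied as r_m * (W_shared @ (s_m * x)).
    """

    def __init__(self, in_features: int, out_features: int, n_members: int):
        super().__init__()
        self.shared = nn.Linear(in_features, out_features, bias=False)
        # Rank-one factors per member; all-ones initialisation keeps every
        # member identical to the shared network at the start (a stand-in
        # for the paper's initialisation of these factors).
        self.r = nn.Parameter(torch.ones(n_members, out_features))
        self.s = nn.Parameter(torch.ones(n_members, in_features))

    def forward(self, x: torch.Tensor, member: int) -> torch.Tensor:
        return self.shared(x * self.s[member]) * self.r[member]

    def averaged_weight(self) -> torch.Tensor:
        # Weight averaging: collapse the student ensemble into one network
        # by averaging the effective per-member weight matrices.
        rank_one = self.r.unsqueeze(2) * self.s.unsqueeze(1)  # (M, out, in)
        return (self.shared.weight.unsqueeze(0) * rank_one).mean(dim=0)
```

At inference time the averaged weight can be loaded into a plain linear layer, so only a single forward pass is needed.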
Hierarchical Shrinkage: Improving the Accuracy and Interpretability of Tree-Based Methods
Abhineet Agarwal, Yan Shuo Tan, Omer Ronen, Chandan Singh, Bin Yu
This paper proposes an innovative, post-hoc regularisation algorithm for tree-based models. The key insight is to view trees as linear models in which the nodes correspond to a learned basis.
Applying a type of Ridge regression motivates their hierarchical shrinkage (HS) algorithm. It is remarkably similar to the existing leaf-based shrinkage (LBS), which uses only the leaf nodes as the basis. However, the paper shows that HS generalises better than existing methods, including LBS, on a wide variety of datasets. This is mostly driven by a drastically reduced model variance, which in turn also gives more meaningful SHAP values.
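To make the shrinkage rule concrete, the sketch below (assuming a fitted scikit-learn DecisionTreeRegressor; the default value of lam is purely illustrative) walks the tree top-down and damps each parent-to-child increment by 1 / (1 + lam / N(parent)), then predicts by looking up the shrunk value of the leaf each sample falls into. It is a sketch of the HS idea, not the authors' reference implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def hs_node_values(tree: DecisionTreeRegressor, lam: float = 10.0) -> np.ndarray:
    """Hierarchically-shrunk prediction for every node of a fitted regression tree.

    Each node's value becomes the root mean plus the parent-to-child increments
    along its path, with each increment damped by 1 / (1 + lam / n_samples(parent)).
    """
    t = tree.tree_
    raw = t.value[:, 0, 0]        # per-node sample means (single-output regression)
    shrunk = np.empty_like(raw)
    shrunk[0] = raw[0]            # the root prediction is left untouched

    # Walk the tree top-down, carrying each parent's already-shrunk value.
    stack = [0]
    while stack:
        parent = stack.pop()
        for child in (t.children_left[parent], t.children_right[parent]):
            if child != -1:       # -1 marks "no child" (parent is a leaf)
                damp = 1.0 + lam / t.n_node_samples[parent]
                shrunk[child] = shrunk[parent] + (raw[child] - raw[parent]) / damp
                stack.append(child)
    return shrunk


def hs_predict(tree: DecisionTreeRegressor, X, lam: float = 10.0) -> np.ndarray:
    """Predict with hierarchical shrinkage by looking up shrunk leaf values."""
    return hs_node_values(tree, lam)[tree.apply(X)]
```

Because the shrinkage is purely post-hoc, the tree structure and splits are untouched; only the node predictions change.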
Multirate Training of Neural Networks
Tiffany J Vlaar, Benedict Leimkuhler
Most differential equation solvers use adaptive step sizes to capture phenomena on different scales. Mapping step sizes to learning rates, the authors show that this idea can also be applied in deep learning to speed up transfer learning or regularise neural networks.
To speed up transfer learning, the idea is to compute the gradients of the first layers only every k steps, which shortens most of the backward passes. In their experiments this leads to speed-ups of around 50% without sacrificing accuracy.
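A simplified version of this schedule is easy to write down in PyTorch. The sketch below (illustrative model, dummy data and value of k, not the authors' implementation) switches off gradients for the early layers on all but every k-th step, so the backward pass on the remaining steps stops at the head.

```python
import torch
import torch.nn as nn

# Illustrative transfer-learning split: a "pretrained" backbone plus a fresh head.
# Shapes, k and the dummy data are placeholders, not taken from the paper.
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256), nn.ReLU())
head = nn.Linear(256, 10)
model = nn.Sequential(backbone, head)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(100)]
k = 5  # the early layers receive gradients only on every k-th step

for step, (x, y) in enumerate(loader):
    update_backbone = (step % k == 0)
    for p in backbone.parameters():
        # With requires_grad=False, autograd does not build the graph through the
        # backbone, so backward() stops at the head on the other k-1 steps.
        p.requires_grad_(update_backbone)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Here the "slow" parameter group is simply the backbone and the "fast" group is the head; the cheaper truncated backward passes on most steps are where the speed-up comes from.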