This article is one of a series of paper reviews from our researchers and machine learning engineers.
Last month, G-Research were Diamond sponsors at ICML, hosted this year in Baltimore, US.
As well as having a stand and a team on site, a number of our quantitative researchers and machine learning engineers attended as part of their ongoing learning and development.
We asked our quants and machine learning practitioners to write about some of the papers and research that they found most interesting.
Here, Jaak Simm, Quantitative Researcher at G-Research, discusses three papers.
Planning with Diffusion for Flexible Behavior Synthesis
Michael Janner, Yilun Du, Joshua B. Tenenbaum, Sergey Levine
The paper proposes an innovative approach to reinforcement learning (RL). The authors observe that standard model-based RL suffers when the planner overfits to the learned model, in effect mounting an adversarial attack on it. Their approach is therefore to use diffusion probabilistic models to generate RL trajectories, analogously to how in-painting generates the missing parts of an image.
Their key novelty is the use of denoising diffusion probabilistic models (DDPMs) to sample trajectories that are both consistent with the dynamics and reward-maximising (a sketch of this guided sampling loop follows the list). This results in two main advantages:
- Coherence across long horizons (in contrast to autoregressive approaches that model only single-step dynamics)
- The ability to compose several tasks on the fly, which allows the model to be re-purposed for new tasks at inference (test) time
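To make this concrete, below is a minimal sketch of reward-guided DDPM sampling over whole trajectories. The `denoiser` and `reward_fn` networks, the noise schedule, and all parameter names are illustrative assumptions, not the paper's implementation:

```python
import torch

def guided_sample(denoiser, reward_fn, horizon, transition_dim,
                  n_steps=100, guide_scale=0.1):
    # Simplified linear noise schedule (illustrative only).
    betas = torch.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Start from pure noise over the whole trajectory: each row is one
    # (state, action) transition, so the plan is denoised jointly rather
    # than rolled out one step at a time.
    x = torch.randn(horizon, transition_dim)
    for t in reversed(range(n_steps)):
        with torch.no_grad():
            eps = denoiser(x, t)  # predicted noise in the current trajectory
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])

        # Guidance: shift the denoised mean along the gradient of predicted
        # return, steering the whole plan toward high-reward trajectories.
        mean_g = mean.detach().requires_grad_(True)
        grad = torch.autograd.grad(reward_fn(mean_g).sum(), mean_g)[0]
        mean = (mean + guide_scale * grad).detach()

        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # a full trajectory; in closed loop, execute its first action
```

Composing tasks at test time then amounts to swapping or combining the guidance functions, and conditioning on start or goal states works like in-painting: those entries of the trajectory are simply held fixed at every denoising step.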
Prioritized training on points that are learnable, worth learning, and not yet learnt
Sören Mindermann, Muhammed Razzak, Winnie Xu, Andreas Kirsch, Mrinank Sharma, Adrien Morisot, Aidan N. Gomez, Sebastian Farquhar, Jan Brauner, Yarin Gal
The paper proposes a novel idea for speeding up deep learning. The authors observe that some of the training data is:
- Already predicted accurately
- Too hard to learn
- Mislabelled or otherwise an outlier
Avoiding training on such points should therefore speed up training significantly. As a solution, the authors propose a selection criterion, the reducible holdout loss (RHO-LOSS), which builds effective minibatches containing only points free of the issues above (sketched below).
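As a rough illustration of that selection step, consider the following sketch, assuming a classification task; `model` is the model being trained and `ir_model` is a small auxiliary model trained on holdout data, whose loss approximates the irreducible loss (names are hypothetical):

```python
import torch
import torch.nn.functional as F

def rho_loss_select(model, ir_model, xb, yb, k):
    # Score a large candidate batch with two forward passes (no backward needed).
    with torch.no_grad():
        train_loss = F.cross_entropy(model(xb), yb, reduction="none")
        holdout_loss = F.cross_entropy(ir_model(xb), yb, reduction="none")

    # Reducible loss: high for points that are not yet learnt (high training
    # loss) but learnable and worth learning (low holdout loss). Mislabelled
    # or too-hard points have high holdout loss, so they are filtered out.
    reducible = train_loss - holdout_loss
    idx = torch.topk(reducible, k).indices
    return xb[idx], yb[idx]  # the minibatch actually used for the gradient step
```

In practice a large candidate batch is scored with cheap forward passes and only the top-k points are used for the gradient step, so the selection overhead is small relative to the backward passes it saves.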
In addition to speeding up learning, they observe that they can reach higher final accuracies, especially when the dataset contains label noise.
GACT: Activation Compressed Training for Generic Network Architectures
Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung
Increasing the batch size in deep learning can increase training speed. However, a larger batch also requires more GPU memory, as more samples and their activations have to fit into GPU RAM.
To tackle this issue, the paper proposes GACT, a black-box compression algorithm for the activations stored for the backward pass. The key idea is to use a linear approximation of the gradient to estimate how sensitive training is to compressing each activation, and to set compression levels accordingly.
The authors show in experiments that this reduces the required GPU memory by 8x, allowing them to use batch sizes 8 times larger.
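To illustrate the general idea of activation-compressed training (this is a hand-written sketch, not GACT's actual algorithm, and all names are hypothetical), here is a linear layer that quantises the activation it saves for the backward pass to int8:

```python
import torch

class CompressedLinear(torch.autograd.Function):
    """Linear layer whose saved activation is quantised to int8,
    cutting saved-activation memory roughly 4x versus fp32."""

    @staticmethod
    def forward(ctx, x, weight):
        # Per-tensor affine quantisation of the input activation to 8 bits.
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        x_q = (x / scale).round().clamp(-127, 127).to(torch.int8)
        ctx.save_for_backward(x_q, weight)
        ctx.scale = scale
        return x @ weight.t()  # forward pass uses the exact activation

    @staticmethod
    def backward(ctx, grad_out):
        x_q, weight = ctx.saved_tensors
        x_hat = x_q.to(grad_out.dtype) * ctx.scale  # dequantise the saved activation
        grad_x = grad_out @ weight
        grad_w = grad_out.t() @ x_hat  # weight gradient uses the approximation
        return grad_x, grad_w
```

Usage would be `y = CompressedLinear.apply(x, weight)` for a 2-D input `x`. The forward computation is exact; only the weight gradient sees the quantisation error, and GACT's linearised-gradient analysis is what lets it choose such compression levels adaptively across arbitrary architectures.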