1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Kevin Wang, Ishaan Javali, Michał Bortkiewicz, Tomasz Trzciński, Benjamin Eysenbach
This paper investigates the impact of network depth on self-supervised reinforcement learning (RL), where most prior work has relied on shallow architectures (typically two to five layers).
The authors integrate residual connections, layer normalisation, and Swish activations to stabilise training of very deep networks, scaling depth up to 1024 layers. They find that, across multiple tasks, deeper models achieve substantial performance improvements over standard shallow networks, ranging from modest gains to more than 50× increases in goal-reaching success. Interestingly, depth scaling also produces qualitatively distinct behaviours, such as the stickman agent learning to jump over a maze wall.
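To make the architecture concrete, below is a minimal sketch in Flax of how such a block might be assembled; it assumes a JAX-based setup and is not the authors' released code. It combines the three stabilising ingredients named above (a skip connection around LayerNorm, Dense, and Swish sub-layers), and the names ResidualBlock and DeepMLP as well as the width and num_blocks parameters are illustrative.

```python
# Minimal sketch (not the authors' code): pre-normalisation residual blocks
# with Swish activations, stacked to reach large depths.
import jax
import jax.numpy as jnp
import flax.linen as nn


class ResidualBlock(nn.Module):
    """LayerNorm -> Dense -> Swish sub-layers wrapped by a skip connection."""
    width: int  # hidden width of the block (illustrative choice)

    @nn.compact
    def __call__(self, x):
        residual = x
        for _ in range(2):  # number of sub-layers per block is illustrative
            x = nn.LayerNorm()(x)
            x = nn.Dense(self.width)(x)
            x = nn.swish(x)
        return x + residual


class DeepMLP(nn.Module):
    """Stack many residual blocks; the block count is the depth being scaled."""
    width: int
    num_blocks: int

    @nn.compact
    def __call__(self, x):
        x = nn.Dense(self.width)(x)  # project the input to the block width
        for _ in range(self.num_blocks):
            x = ResidualBlock(self.width)(x)
        return x


# Example: 256 residual blocks, i.e. roughly 512 Dense layers.
model = DeepMLP(width=256, num_blocks=256)
params = model.init(jax.random.PRNGKey(0), jnp.zeros((1, 64)))
```

Because every block has an identity skip path, gradients can propagate even when hundreds of blocks are stacked, which is the standard rationale for residual, pre-normalised designs at extreme depth.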
I found the authors' approach to tackling training instability particularly interesting. The paper is a good example of how rigorous empirical experimentation can yield strong and even unexpected results.