NeurIPS Paper Reviews 2023 #4

Abide by the law and follow the flow: conservation laws for gradient flows

Sibylle Marcotte, Remi Gribonval, Gabriel Peyré

Think of neural network training as a dynamical system obeying the laws of classical mechanics. The loss function L is like a potential energy surface, and the NN weights W follow trajectories of steepest descent according to “laws of motion” defined by the differential equation dW/dt = -k * dL/dW, for a positive constant k. The authors show that the NN weights obey conservation laws, much like conservation of energy in classical mechanics.
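As a short aside, here is the standard chain-rule argument for when a quantity is conserved along such a trajectory. It is written in the notation of the paragraph above, with h(W) standing for a generic candidate conserved quantity (a symbol introduced here for illustration); the paper's own formulation may be more general.

```latex
\frac{d}{dt}\, h\bigl(W(t)\bigr)
  = \nabla h(W)^{\top} \frac{dW}{dt}
  = -k \, \nabla h(W)^{\top} \nabla L(W)
```

So h stays constant along the flow whenever its gradient is orthogonal to the gradient of the loss at every point of the trajectory.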

For example, for a 1-dimensional, 2-layer ReLU network with two weights u and v, there is one conserved quantity h = u^2 - v^2. This makes the initial choice of weights important: h stays constant throughout training, so the final state is constrained to the value of h fixed at initialisation. This builds on previous work (Zhao 2022), which argues that these conservation laws induce an inductive bias towards “flat” minima of the loss function, which reduces overfitting and makes training more robust.
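The 1-d example can be checked numerically. The sketch below is not the authors' code: it assumes the network is f(x) = u * relu(v * x), a mean-squared-error loss, and synthetic data y = 2x, and it approximates the gradient flow with small-step gradient descent. All function names are hypothetical. With a finite step size, h is only conserved up to a drift of order (step size)^2 per step.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def loss_grads(u, v, x, y):
    """Gradients of the mean squared error of f(x) = u * relu(v * x)."""
    pre = v * x
    resid = u * relu(pre) - y
    grad_u = np.mean(2.0 * resid * relu(pre))
    grad_v = np.mean(2.0 * resid * u * x * (pre > 0))
    return grad_u, grad_v

def simulate(u0, v0, x, y, lr=1e-4, steps=20000):
    """Small-step gradient descent as a discrete stand-in for gradient flow."""
    u, v = u0, v0
    h_values = [u**2 - v**2]
    for _ in range(steps):
        grad_u, grad_v = loss_grads(u, v, x, y)
        u, v = u - lr * grad_u, v - lr * grad_v
        h_values.append(u**2 - v**2)  # candidate conserved quantity
    return u, v, np.array(h_values)

# Assumed toy data: positive inputs, target slope 2.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=64)
y = 2.0 * x

u0, v0 = 1.5, 0.5
u, v, h = simulate(u0, v0, x, y)
drift = np.max(np.abs(h - h[0]))
print(f"h at start: {h[0]:.6f}, largest drift during training: {drift:.2e}")
print(f"learned slope u*v: {u * v:.4f}")
```

The product u * v moves towards the target slope while u^2 - v^2 barely changes, which is the sense in which the initialisation constrains the final weights.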

The paper gives a more involved procedure for computing the conserved quantities of larger NNs, and the slides have some nice pictures illustrating the 1-d example. I like it because it is a neat way to understand NN training using ideas from physics. It also suggests that bigger NNs with more parameters might work well.

The Tunnel Effect: Building Data Representations in Deep Neural Networks

Wojciech Masarczyk, Mateusz Ostaszewski, Ehsan Imani, Razvan Pascanu, Piotr Miłoś, Tomasz Trzcinski

In a deep 18-layer neural network for image classification, the layers fall into two groups with distinct roles. The first 8 layers act as a feature “extractor” and are responsible for most of the predictive power of the network. The following 10 layers act as a “tunnel”, whose purpose is to compress the intermediate activation vector into a low-dimensional embedding.

According to the authors, the “extractor” attains >99% of the final prediction accuracy, and the numerical rank of the weight matrices in the “tunnel” collapses to log(d), where d is the number of output classes. The authors perform a number of experiments, such as combining the “extractor” trained on one task with the “tunnel” trained on a different task. They show that the “extractor” is task-specific but the “tunnel” is the same for both tasks.
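For readers unfamiliar with numerical rank, here is a minimal sketch of one common definition: the number of singular values above a tolerance relative to the largest one. The function name, the tolerance of 1e-6, and the synthetic matrices are assumptions made for illustration; the paper may use a different threshold or estimator.

```python
import numpy as np

def numerical_rank(matrix, rel_tol=1e-6):
    """Count singular values larger than rel_tol times the largest one."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    if singular_values[0] == 0.0:
        return 0
    return int(np.sum(singular_values > rel_tol * singular_values[0]))

rng = np.random.default_rng(0)

# A dense Gaussian matrix: every direction carries signal.
full = rng.standard_normal((50, 40))

# A matrix that is rank 3 up to tiny noise, mimicking a "collapsed" layer.
low = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
low = low + 1e-10 * rng.standard_normal((50, 40))

rank_full = numerical_rank(full)
rank_low = numerical_rank(low)
print("numerical rank of dense matrix:", rank_full)
print("numerical rank of near-rank-3 matrix:", rank_low)
```

Tracking a metric like this layer by layer is one way to see where the representation starts to collapse.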

I like it because it is a nice, practical way to understand NN training dynamics, and it seems to arrive at a meaningful interpretation. I would be curious to see if this holds for other architectures and datasets. I like the use of intermediate metrics (like numerical rank of intermediate layers) to probe what’s happening during training.
