
NeurIPS paper reviews 2025 #5

30 January 2026
  • News
  • Quantitative research

In this paper review series, our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2025.

Here, discover the perspectives of Cédric, one of our Quantitative Researchers.

Learning (Approximately) Equivariant Networks via Constrained Optimisation

Andrei Manolache, Luiz F.O. Chamon, Mathias Niepert

A recurring question in machine learning is whether symmetries present in a task or dataset should be explicitly hard-coded into a model.

Even when the underlying data is perfectly equivariant, a sufficiently flexible model is not guaranteed to learn an equivariant solution. Conversely, enforcing equivariance through architectural constraints can complicate optimisation, and in many practical settings equivariance holds only approximately in the dataset, due to measurement noise or other imperfections.

This paper proposes a principled framework for controlling the degree of equivariance enforced in a model. The key idea is to embed an equivariant architecture within a broader multi-parameter family of models, where equivariance is recovered when these “equivariance-breaking” parameters vanish.

By examining the dual of the constrained training problem, the authors derive a minimax formulation that jointly optimises the model and the equivariance-breaking parameters (via gradient descent) and the dual variables (via gradient ascent). This reformulation provides a mechanism through which the optimisation process itself determines the appropriate degree of equivariance.
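To make the mechanism concrete, here is a toy primal-dual sketch of my own (not the authors' code): a hypothetical scalar `eps` stands in for the equivariance-breaking parameters, the constraint `eps**2 <= 0` encodes "the model must be equivariant", and a Lagrange multiplier updated by gradient ascent gradually enforces it.

```python
# Toy gradient descent/ascent on a Lagrangian -- my illustration, not
# the authors' code. `theta` plays the role of ordinary model
# parameters, `eps` a hypothetical equivariance-breaking parameter.
# Lagrangian: L(theta, eps, lam) = task_loss(theta, eps) + lam * eps**2.

def task_loss(theta, eps):
    # Hypothetical loss that mildly prefers eps = 1, mimicking a model
    # that would break the symmetry if left unconstrained.
    return (theta - 2.0) ** 2 + 0.1 * (eps - 1.0) ** 2

theta, eps, lam = 0.0, 1.0, 0.0
lr_primal, lr_dual = 0.05, 0.5

for _ in range(2000):
    # Gradient descent on the primal variables; the gradients below are
    # hand-derived from the Lagrangian above.
    g_theta = 2.0 * (theta - 2.0)
    g_eps = 0.2 * (eps - 1.0) + 2.0 * lam * eps
    theta -= lr_primal * g_theta
    eps -= lr_primal * g_eps
    # Gradient ascent on the dual variable, projected onto lam >= 0.
    lam = max(0.0, lam + lr_dual * eps ** 2)

# As lam grows, eps is driven towards zero: the optimiser itself
# "decides" to enforce the constraint, with no hand-tuned penalty weight.
```

The point of the sketch is the division of labour: no fixed penalty coefficient is chosen in advance; the multiplier rises for as long as the constraint is violated, which is what removes the delicate weight-tuning problem discussed below.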

When the dataset is fully equivariant, the authors show that the equivariance-breaking parameters naturally converge to zero during training, leading the model to recover an equivariant solution without manual enforcement. In situations where the data only approximately respects a symmetry, the framework is extended by relaxing the constraints through slack variables that bound the equivariance-breaking parameters. In this partially equivariant regime, the parameters do not vanish, but the authors derive theoretical guarantees relating their magnitude to the model’s deviation from perfect equivariance.

Finally, the paper presents empirical results demonstrating that this gradual and data-driven incorporation of equivariance does not degrade performance on downstream tasks. Moreover, the training procedure remains stable even when the input symmetry is intentionally degraded, supporting the claim that the method adapts appropriately to imperfect real-world symmetries.

What I particularly enjoyed about this paper is its introduction of a technique I had not encountered before: framing the incorporation of equivariance as a minimax optimisation derived from the dual of a constrained loss. Enforcing equivariance by adding auxiliary loss terms is challenging, as it inevitably introduces a delicate weight-tuning problem.

This paper’s approach bypasses that issue entirely by integrating the constraint in a principled, optimisation-driven manner. I find this both elegant and practically appealing, and I’m eager to investigate how these ideas might inform my own research.


Read NeurIPS 2024 paper reviews from a number of our quantitative researchers and machine learning practitioners.

Read now

On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity

Quentin Bertrand, Anne Gagneux, Mathurin Massias, Rémi Emonet

Modern generative models, such as those trained via conditional flow matching, can generate high-quality samples. Nevertheless, in low-data regimes they may learn vector fields that partially memorise the training set rather than capturing its underlying distribution.

Several hypotheses have been proposed to explain why these models generalise rather than memorise. One proposed explanation is that the stochasticity inherent in the conditional flow-matching target acts as a regulariser.

The authors challenge this explanation in high-dimensional settings. They analyse the closed-form vector field that exactly transports a Gaussian prior to the empirical (discrete) data distribution, providing a deterministic memorising baseline against which the stochastic training target can be compared.
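For intuition, under a linear interpolation path this deterministic memorising field has a simple softmax form. The sketch below is my own construction from standard flow-matching formulas (not the paper's code): for a path x_t = (1 - t)·x0 + t·x1 with x0 ~ N(0, I), we have p_t(x | x1) = N(x; t·x1, (1 - t)² I), and the marginal velocity is a posterior-weighted average of the conditional velocities (x1 - x) / (1 - t).

```python
import numpy as np

def closed_form_velocity(x, t, data):
    """Exact velocity transporting N(0, I) onto the empirical
    distribution of `data` (shape: n_points x dim)."""
    # Log-density of x under each p_t(x | x1), up to a shared constant.
    sq = np.sum((x - t * data) ** 2, axis=1)
    logw = -sq / (2.0 * (1.0 - t) ** 2)
    w = np.exp(logw - logw.max())     # numerically stable softmax
    w /= w.sum()
    # Posterior-weighted average of conditional velocities.
    return w @ ((data - x) / (1.0 - t))

rng = np.random.default_rng(0)
data = rng.normal(size=(5, 2))   # a tiny "training set"
x = rng.normal(size=2)           # prior sample at t = 0

# Euler integration: trajectories collapse onto training points,
# i.e. the exact field memorises by construction.
dt, t = 0.01, 0.0
while t < 1.0 - dt:
    x = x + dt * closed_form_velocity(x, t, data)
    t += dt
```

Integrating this field from a prior sample lands on (a neighbourhood of) one of the training points, which is exactly why it serves as a clean "pure overfit" reference.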

Across multiple experiments, the authors find that the conditional flow-matching target exhibits substantial stochasticity only at very high noise levels, corresponding to a small portion of the training trajectory. At lower noise levels, the target becomes almost deterministic. Moreover, they observe that models generalise better when the learned vector field deviates more strongly from the closed-form solution. This suggests that exact recovery of the deterministic field may promote memorisation.

In a hybrid sampling experiment that follows the closed-form vector field up to a threshold t ∈ [0, 1] before switching to the learned field, the authors find that even for small t, the resulting samples largely resemble the training data. This indicates that the trajectory’s outcome is effectively determined early in the flow, i.e., at high noise levels.

Finally, the authors experiment with training using the closed-form vector field as a deterministic target. Because computing this field scales with dataset size, they employ a Monte-Carlo approximation whose variance remains lower than that of the conditional flow-matching target. Empirically, they obtain samples superior to those produced via standard conditional flow matching. This supports their claim that reducing target stochasticity does not impede, and may even improve, generative performance.
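Since the exact target averages over every training point, subsampling is the natural way to make it tractable at scale. Below is a hedged sketch of such a minibatch Monte-Carlo estimator; `mc_velocity` is my own name and my own construction of the idea, and the paper's exact estimator may differ.

```python
import numpy as np

def mc_velocity(x, t, data, k, rng):
    """Subsampled estimate of the closed-form target velocity at (x, t),
    averaging over a random subset of k training points instead of all."""
    idx = rng.choice(len(data), size=k, replace=False)
    batch = data[idx]
    # Softmax weights from p_t(x | x1) = N(x; t * x1, (1 - t)^2 I).
    sq = np.sum((x - t * batch) ** 2, axis=1)
    logw = -sq / (2.0 * (1.0 - t) ** 2)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Weighted average of conditional velocities (x1 - x) / (1 - t).
    return w @ ((batch - x) / (1.0 - t))

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 8))                  # toy "training set"
x = rng.normal(size=8)
v_hat = mc_velocity(x, 0.5, data, k=64, rng=rng)   # cost O(k), not O(n)
```

With k equal to the dataset size this recovers the exact field; smaller k trades variance for per-step cost, which is the regime the authors' variance comparison concerns.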

I appreciated this paper for its effort to frame overfitting in explicitly measurable terms, made possible by the introduction of a “pure overfit” baseline that provides a clean reference point for evaluating the authors’ contributions. The experimental suite is particularly original, addressing several questions that I have often considered but never had the opportunity to explore rigorously.

Quantitative research & machine learning

Want to learn more about life as a researcher at G-Research?

Learn more

Read more paper reviews

NeurIPS 2025: Paper review #1

Discover the perspectives of Nick, one of our Quantitative Researchers, on the following papers:

  • Counterfactual Identifiability via Dynamic Optimal Transport
  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  • Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster
Read now
NeurIPS 2025: Paper review #2

Discover the perspectives of Tomi, one of our Quantitative Research Managers, on the following papers:

  • Kronos: A Foundation Model for the Language of Financial Markets
  • LOBERT: Generative AI Foundation Model for Limit Order Book Messages
  • Auto-Compressing Networks
Read now
NeurIPS 2025: Paper review #3

Discover the perspectives of Dustin, one of our Scientific Directors, on the following papers:

  • Statistical Inference for Gradient Boosting Regression
  • Dynamic Low-Rank Training with Spectral Regularisation: Achieving Robustness in Compressed Representations
Read now
NeurIPS 2025: Paper review #4

Discover the perspectives of Nick, one of our Software Engineers, on the following papers:

  • Pass@K Policy Optimisation: Solving Harder Reinforcement Learning Problems
  • Antidistillation Sampling
  • Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
Read now
NeurIPS 2025: Paper review #6

Discover the perspectives of Ognjen, one of our Quantitative Researchers, on the following papers:

  • Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimisation
  • Bubbleformer: Forecasting Boiling with Transformers
Read now
NeurIPS 2025: Paper review #7

Discover the perspectives of Radomir, one of our Machine Learning Engineers, on the following papers:

  • Learning Task-Agnostic Representations through Multi-Teacher Distillation
  • Contrastive Representations for Temporal Reasoning
Read now
NeurIPS 2025: Paper review #8

Discover the perspectives of Benjamin, one of our Quantitative Researchers, on the following papers:

  • Dynamical Decoupling of Generalisation and Overfitting in Large Two-Layer Networks
  • Backward Conformal Prediction
  • Predicting the Performance of Black-box Language Models with Follow-up Queries
Read now
NeurIPS 2025: Paper review #9

Discover the perspectives of Casey, one of our Machine Learning Engineers, on the following papers:

  • Distributed Orthonormal Updates for Large-Scale Training
  • Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenisations
Read now
NeurIPS 2025: Paper review #10

Discover the perspectives of Hugh, one of our Quantitative Research Managers, on the following papers:

  • Understanding and Mitigating Numerical Sources of Non-determinism in LLM Inference
  • Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Read now
NeurIPS 2025: Paper review #11

Discover the perspectives of Timothy, one of our Machine Learning Engineers, on the following papers:

  • ZeroS: Zero-Sum Linear Attention for Efficient Transformers
  • In Search of Adam’s Secret Sauce
Coming soon
NeurIPS 2025: Paper review #12

Discover the perspectives of David, one of our Quantitative Researchers, on the following papers:

  • 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
  • Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Read now
NeurIPS 2025: Paper review #13

Discover the perspectives of Szymon, one of our Senior Quantitative Researchers, on the following papers:

  • Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
  • Parallelizing MCMC Across the Sequence Length
Read now
NeurIPS 2025: Paper review #14

Discover the perspectives of Simon, one of our Senior Quantitative Researchers, on the following papers:

  • Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
  • ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Read now
