
NeurIPS paper reviews 2025 #8

30 January 2026
  • News
  • Quantitative research

In this paper review series our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2025.

Here, discover the perspectives of Benjamin, one of our Senior Quantitative Researchers.

Dynamical Decoupling of Generalisation and Overfitting in Large Two-Layer Networks

Andrea Montanari, Pierfrancesco Urbani

It’s often observed of deep, high-capacity models that if one trains for long enough, or in a certain way, the model ends up perfectly interpolating the training data, i.e. reaching zero training error, while the resulting fitted models vary widely in their ability to generalise to unseen test data.

This implies that the manner in which a model is trained is very important, and that architecture and training data alone are insufficient for determining out-of-sample performance.

Here the authors focus on a wide two-layer neural network (a simple setting, but one which still exhibits feature learning during early training and has the capacity to reach zero training error) and describe the dynamics of the whole training process in a unified framework that explains empirical observations made during training. Non-monotone test error is a consequence of this framework, i.e. they identify timescales during training within which the network is underfitting or overfitting.

The framework also explains, in a clear way, how the point where feature learning stops (test error stops decreasing) and the point where overfitting starts (test error starts increasing) depend on network size and on the scale of the parameter initialisation in the final layer (of interest given that the result disagrees with common practice), viewed through the prism of the growth of the model's complexity throughout training.

This provides a single framework explaining both training and overfitting in a non-asymptotic way, yielding a more comprehensive understanding of the training dynamics of the two-layer networks they consider. In turn, since many of the same empirical observations are made for more complicated models, we can hope that the intuition it provides extends to those settings.
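
To make the phenomenon concrete, below is a minimal numpy sketch, not the paper's setup: the teacher model, noise level, width, learning rate and the second-layer initialisation scale `a_scale` are all arbitrary illustrative choices. It trains a wide two-layer ReLU network on noisy labels; with enough steps such a network typically interpolates the noise, so the printed test error tends to fall and then climb back up, and varying `a_scale` shifts where that turn happens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy teacher: noisy scalar targets from a linear rule (illustrative only).
d, n_train, n_test, width = 20, 100, 1000, 512
w_true = rng.normal(size=d) / np.sqrt(d)
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_train)  # noisy training labels
y_te = X_te @ w_true                                    # clean test targets

# Wide two-layer ReLU network; `a_scale` is the second-layer initialisation
# scale whose role the paper analyses (the value here is arbitrary).
a_scale = 0.1
W = rng.normal(size=(width, d)) / np.sqrt(d)            # first layer
a = a_scale * rng.normal(size=width) / np.sqrt(width)   # second layer

lr = 0.05
for step in range(20001):
    H = np.maximum(X_tr @ W.T, 0.0)   # hidden activations
    err = H @ a - y_tr
    # Gradients of the mean squared error with respect to both layers.
    grad_a = H.T @ err / n_train
    grad_W = ((err[:, None] * (H > 0)) * a).T @ X_tr / n_train
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 2000 == 0:
        test_mse = np.mean((np.maximum(X_te @ W.T, 0.0) @ a - y_te) ** 2)
        print(f"step {step:6d}  train {np.mean(err**2):.4f}  test {test_mse:.4f}")
```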

It will be exciting to see whether future research can apply these techniques to more complex models, promising both confirmation of existing understanding and potentially novel insights.

NeurIPS 2024 paper reviews

Read paper reviews from NeurIPS 2024 from a number of our quantitative researchers and machine learning practitioners.

Read now

Backward Conformal Prediction

Etienne Gauthier, Francis Bach, Michael I. Jordan

Conformal prediction is a relatively modern tool for uncertainty quantification with black-box models: the practitioner specifies an error level alpha and obtains a prediction set such that the probability of this set containing the true value is at least 1 – alpha.
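
For concreteness, the standard split-conformal recipe behind this guarantee fits in a few lines. The sketch below assumes a generic fitted regressor with a `.predict` method and exchangeable calibration data; none of the names come from the paper.

```python
import numpy as np

def conformal_interval(model, X_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction interval with coverage >= 1 - alpha.

    `model` is a stand-in for any fitted regressor with a .predict method.
    """
    # Nonconformity scores on a held-out calibration set.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile level, capped at 1.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, level, method="higher")
    y_hat = model.predict(np.atleast_2d(x_new))[0]
    return y_hat - q, y_hat + q
```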

This paper incorporates recent developments in hypothesis testing with e-values into conformal prediction, resulting in a procedure that offers a potentially significant expansion in the flexibility of the overall approach. In particular, traditional conformal prediction offers no way to impose requirements on the size of the prediction set, which can therefore be too large to be of practical use.

Here, one is instead able to choose the largest prediction set one is willing to accept, and to obtain both the set itself and an estimate of its miscoverage probability (an estimate for which concentration results are provided). Furthermore, one can compute a curve characterising the relationship between prediction set size and miscoverage probability before selecting among this sequence of sets, all without loss of validity.
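
A rough way to build intuition for this "backward" direction in a classification setting: cap the prediction set at a chosen size, then report a plug-in estimate of the resulting miscoverage from the calibration scores. The sketch below is only that simplification; it is not the authors' e-value construction, and their actual validity guarantees are more subtle.

```python
import numpy as np

def backward_set(cal_scores, test_scores, max_size):
    """Return at most `max_size` labels plus an estimated miscoverage.

    `cal_scores`: nonconformity scores s(x_i, y_i) on calibration pairs.
    `test_scores`: scores s(x_new, y) for every candidate label y.
    A plug-in simplification of the backward idea, not the paper's
    e-value procedure.
    """
    order = np.argsort(test_scores)
    chosen = order[:max_size]        # the max_size most conforming labels
    tau = test_scores[chosen].max()  # implied score threshold
    # Estimated miscoverage: calibration fraction exceeding the threshold,
    # with the usual finite-sample correction.
    n = len(cal_scores)
    miscoverage_hat = (1 + np.sum(cal_scores > tau)) / (n + 1)
    return chosen, miscoverage_hat
```

Sweeping `max_size` over a range then traces out the size/miscoverage curve mentioned above.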

The flexibility that this new method enables in conformal prediction has very compelling practical applications.

Quantitative research & machine learning

Want to learn more about life as a researcher at G-Research?

Learn more

Predicting the Performance of Black-box Language Models with Follow-up Queries

Dylan Sam, Marc Finzi, J. Zico Kolter

Quantifying the uncertainty in the outputs of large language models (LLMs) is important if we want to rely on them in practice without them returning confidently wrong answers.

Given some labelled training data (i.e. queries for which the model's response can be objectively evaluated as correct or incorrect), this paper proposes a straightforward method that does not require inspecting the internal weights of a model, and that is applicable even when the only way to interact with the model is to query it through a black-box API.

This paper is in essence an investigation into the dependence between the correctness of an initial answer and the distribution over subsequent answers within the same prompt, and how in turn the latter can be used to predict the former.

This is explored through a method that asks the model a set of ‘follow-up queries’ and generates a small set of features from the answers to these questions. The set of queries is partly human-written but mostly generated by a large language model. These features are then used, together with the labelled training set, to train a binary classifier. Though the requirement for labelled training data is a drawback, their experiments provide compelling evidence that the approach improves performance.
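
A minimal sketch of that pipeline, under stated assumptions: `ask` is a hypothetical wrapper around a black-box API, the three follow-up questions are hand-written stand-ins for the paper's (mostly LLM-generated) set, and the crude reply-to-number mapping is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(ask, question, answer):
    """Build a small feature vector from follow-up queries about an answer.

    `ask` is a hypothetical callable wrapping a black-box API:
    ask(prompt) -> text reply.
    """
    follow_ups = [  # stand-ins for the paper's (mostly LLM-generated) queries
        "Are you sure the answer is correct? Answer yes or no.",
        "Would a domain expert agree with this answer? Answer yes or no.",
        "Rate your confidence in the answer with a single digit from 0 to 9.",
    ]
    feats = []
    for fq in follow_ups:
        reply = ask(f"Q: {question}\nA: {answer}\n{fq}")
        if "confidence" in fq.lower():
            digits = [ch for ch in reply if ch.isdigit()]
            feats.append(float(digits[0]) / 9.0 if digits else 0.5)
        else:
            feats.append(1.0 if "yes" in reply.lower() else 0.0)
    return np.array(feats)

def fit_correctness_probe(ask, labelled):
    """Train a binary correctness classifier on (question, answer, correct)."""
    X = np.stack([features(ask, q, a) for q, a, _ in labelled])
    y = np.array([c for _, _, c in labelled])  # 1 = correct, 0 = incorrect
    return LogisticRegression().fit(X, y)
```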

The treatment of model outputs as abstract numerical embeddings, with no necessary notion of semantic interpretability, provided an interesting perspective: it looks at how the model behaves purely as a high-dimensional system.


Read more paper reviews

NeurIPS 2025: Paper review #1

Discover the perspectives of Nick, one of our Quantitative Researchers, on the following papers:

  • Counterfactual Identifiability via Dynamic Optimal Transport
  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  • Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster
Read now
NeurIPS 2025: Paper review #2

Discover the perspectives of Tomi, one of our Quantitative Research Managers, on the following papers:

  • Kronos: A Foundation Model for the Language of Financial Markets
  • LOBERT: Generative AI Foundation Model for Limit Order Book Messages
  • Auto-Compressing Networks
Read now
NeurIPS 2025: Paper review #3

Discover the perspectives of Dustin, one of our Scientific Directors, on the following papers:

  • Statistical Inference for Gradient Boosting Regression
  • Dynamic Low-Rank Training with Spectral Regularisation: Achieving Robustness in Compressed Representations
Read now
NeurIPS 2025: Paper review #4

Discover the perspectives of Nick, one of our Software Engineers, on the following papers:

  • Pass@K Policy Optimisation: Solving Harder Reinforcement Learning Problems
  • Antidistillation Sampling
  • Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
Read now
NeurIPS 2025: Paper review #5

Discover the perspectives of Cédric, one of our Quantitative Researchers, on the following papers:

  • Learning (Approximately) Equivariant Networks via Constrained Optimisation
  • On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Read now
NeurIPS 2025: Paper review #6

Discover the perspectives of Ognjen, one of our Quantitative Researchers, on the following papers:

  • Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimisation
  • Bubbleformer: Forecasting Boiling with Transformers
Read now
NeurIPS 2025: Paper review #7

Discover the perspectives of Radomir, one of our Machine Learning Engineers, on the following papers:

  • Learning Task-Agnostic Representations through Multi-Teacher Distillation
  • Contrastive Representations for Temporal Reasoning
Read now
NeurIPS 2025: Paper review #9

Discover the perspectives of Casey, one of our Machine Learning Engineers, on the following papers:

  • Distributed Orthonormal Updates for Large-Scale Training
  • Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenisations
Read now
NeurIPS 2025: Paper review #10

Discover the perspectives of Hugh, one of our Quantitative Research Managers, on the following papers:

  • Understanding and Mitigating Numerical Sources of Non-determinism in LLM Inference
  • Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Read now
NeurIPS 2025: Paper review #11

Discover the perspectives of Timothy, one of our Machine Learning Engineers, on the following papers:

  • ZeroS: Zero-Sum Linear Attention for Efficient Transformers
  • In Search of Adam’s Secret Sauce
Coming soon
NeurIPS 2025: Paper review #12

Discover the perspectives of David, one of our Quantitative Researchers, on the following papers:

  • 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
  • Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Read now
NeurIPS 2025: Paper review #13

Discover the perspectives of Szymon, one of our Senior Quantitative Researchers, on the following papers:

  • Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
  • Parallelizing MCMC Across the Sequence Length
Read now
NeurIPS 2025: Paper review #14

Discover the perspectives of Simon, one of our Senior Quantitative Researchers, on the following papers:

  • Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
  • ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Read now
