
NeurIPS paper reviews 2025 #1

30 January 2026

In this paper review series, our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2025.

Here, discover the perspectives of Nick, one of our Quantitative Researchers.

Counterfactual Identifiability via Dynamic Optimal Transport

Fabio De Sousa Ribeiro, Ainkaran Santhirasekaram, Ben Glocker

This paper addresses a major open problem: whether high-dimensional, multivariate outcomes (e.g. images) admit identifiable counterfactuals from observational data alone. In standard causal modelling, identifiability of counterfactuals is key to making valid causal claims – yet prior work on high-dimensional outcome variables has largely lacked theoretical guarantees.

The authors propose a novel framework combining Dynamic Optimal Transport (OT) with continuous-time flows (flow matching) to recover a unique, monotone, rank-preserving transport map from factual to counterfactual distributions under standard assumptions. This ensures that, given observational data, one can consistently derive counterfactual outcomes in a way that respects the joint multivariate structure rather than assuming coordinate-wise independence or an arbitrary ordering.
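To make the flow-matching ingredient concrete, here is a minimal sketch of the standard conditional flow-matching objective between two distributions. The toy data, network and straight-line interpolation are illustrative assumptions, not the authors' implementation, which couples this objective with dynamic OT to obtain its guarantees:

```python
import torch
import torch.nn as nn

# Minimal conditional flow-matching sketch (illustrative; not the authors' code).
# An MLP regresses the velocity field v(x_t, t) along straight-line paths
# between samples of a source ("factual") and a target ("counterfactual") law.

class VelocityNet(nn.Module):
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

dim = 2
model = VelocityNet(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    x0 = torch.randn(256, dim)          # toy source (factual) samples
    x1 = torch.randn(256, dim) + 3.0    # toy target (counterfactual) samples
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1          # point on the straight-line path
    v_target = x1 - x0                  # its (constant) velocity

    loss = ((model(xt, t) - v_target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Integrating the learned field dx/dt = v(x, t) from t = 0 to t = 1 (e.g. with an off-the-shelf ODE solver) then transports source samples onto the target distribution; the paper's contribution is characterising when this transport map is the unique, monotone, rank-preserving one.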

The authors provide theoretical analysis characterising the required conditions for identifiability and demonstrate empirically that their method yields sound counterfactuals in both a toy setting and a real-world chest X-ray dataset. This work significantly advances the foundations for counterfactual inference in high-dimensional domains, potentially enabling more reliable causal analysis and reinterpretation in areas like fairness, image editing, or treatment-effect estimation.

I have previously worked on counterfactual estimation in the medical imaging domain, so this work closes some gaps in one of my own papers. It is exciting to see an approach that is both theoretically sound and empirically performant.


Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang

This paper critically revisits the claim that Reinforcement Learning with Verifiable Rewards (RLVR) enables large language models to develop fundamentally new reasoning abilities beyond what is already encoded in the base model. The authors systematically evaluate RL-trained models across math, coding, and visual reasoning tasks using a “pass@k” metric with large k – which reflects whether a model could solve a problem given many sampling attempts. They find that although RL-trained models outperform base models at low k (e.g. pass@1), when k increases, base models often catch up or even surpass RL models, indicating that successful reasoning paths were already present in the base distribution.
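For concreteness, pass@k is typically computed with the unbiased estimator introduced alongside HumanEval (Chen et al., 2021): draw n ≥ k samples per problem, count the c that are judged correct, and estimate the probability that at least one of k draws succeeds. A minimal sketch (the sample counts below are made up for illustration):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples drawn per problem, c: samples judged correct, k <= n.
    Estimates the probability that at least one of k samples is correct.
    """
    if n - c < k:   # fewer than k failures: any k draws must include a success
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: a problem the model solves 3 times in 100 samples.
print(pass_at_k(100, 3, 1))    # 0.03   (pass@1: RL-trained models tend to win here)
print(pass_at_k(100, 3, 64))   # ~0.96  (pass@64: base models can catch up)
```

At k = 1 the estimate is just the per-sample success rate, which is where RL-trained models shine; at large k the combinatorial term rewards any residual probability mass on correct traces, which is where base models close the gap.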

In many cases, RLVR simply skews the sampling distribution toward reasoning traces that are more likely to be rewarded, improving sampling efficiency but reducing diversity. Manual inspection confirms that the reasoning paths produced post-RL are not novel. The authors further show that, unlike RLVR, distillation from a stronger reasoning model can genuinely introduce new reasoning patterns by adding new supervised training data. The study challenges the assumption that RLVR yields emergent reasoning and instead suggests that current RL fine-tuning mostly optimises sampling, not reasoning capacity itself.

This paper challenges common assumptions about RL for LLMs, which means the field might have to rethink existing paradigms for building truly intelligent models.


Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster

Shriram M S, Xinyue Hao, Shihao Hou, Yang Lu, Laura Sevilla-Lara, Anurag Arnab, Shreyank N Gowda

This paper introduces Progressive Data Dropout (PDD), a remarkably simple training strategy that aims to reduce computational cost while preserving – and sometimes enhancing – model performance.

Instead of repeatedly training on the full dataset over all epochs, PDD gradually decreases the amount of data presented to the model as training progresses. Early training phases rely on broad exposure to the data distribution, while later phases increasingly focus on a curated subset, often those examples that remain most informative for improving the model.

The key appeal of PDD is its minimalism: it requires no architectural changes, special losses or complex scheduling. The method can be integrated into standard training pipelines with only minor modifications. Conceptually, PDD is motivated by the observation that not all data points remain equally valuable throughout training. As models begin to solidify their representation of core patterns, repeatedly revisiting easy or redundant examples can provide diminishing returns. By strategically reducing data volume over time, PDD allocates compute to the examples that continue to challenge the model, leading to faster convergence and more efficient use of training resources.
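As a rough illustration of the idea (a sketch under assumptions: the geometric keep_frac schedule and the loss-based selection rule are placeholders, not the paper's exact recipe), a training loop in this spirit might look like:

```python
import torch
from torch.utils.data import DataLoader, Subset

# A rough sketch of the progressive-dropout idea (not the paper's exact
# schedule or selection rule): each epoch trains on a shrinking subset,
# here keeping the examples with the highest current loss.

def train_with_pdd(model, dataset, epochs=10, keep_frac=0.8, batch_size=256):
    opt = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss(reduction="none")
    active = list(range(len(dataset)))   # indices still in the training pool

    for epoch in range(epochs):
        # One training pass over the currently active subset.
        for xb, yb in DataLoader(Subset(dataset, active),
                                 batch_size=batch_size, shuffle=True):
            loss_fn(model(xb), yb).mean().backward()
            opt.step()
            opt.zero_grad()

        # Score the active examples and keep only the hardest fraction.
        with torch.no_grad():
            scores = torch.cat([loss_fn(model(xb), yb)
                                for xb, yb in DataLoader(Subset(dataset, active),
                                                         batch_size=batch_size)])
        n_keep = max(1, int(keep_frac * len(active)))
        hardest = scores.argsort(descending=True)[:n_keep]
        active = [active[i] for i in hardest.tolist()]
    return model
```

Because each epoch touches fewer examples, total compute falls roughly geometrically with keep_frac, while the retained examples are precisely those the model still finds hardest.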

Across a variety of architectures and tasks, experiments show that this simple adjustment can substantially shorten training time while maintaining competitive accuracy. The broader implication is that dataset management – not just model design – can be a powerful lever for improving deep learning efficiency at scale.

A lot of my training runs take quite some time to finish, partially because of the amount of data that we are training on. Cutting training time would allow me to test more ideas and iterate faster.


Read more paper reviews

NeurIPS 2025: Paper review #2

Discover the perspectives of Tomi, one of our Quantitative Research Managers, on the following papers:

  • Kronos: A Foundation Model for the Language of Financial Markets
  • LOBERT: Generative AI Foundation Model for Limit Order Book Messages
  • Auto-Compressing Networks
Read now
NeurIPS 2025: Paper review #3

Discover the perspectives of Dustin, one of our Scientific Directors, on the following papers:

  • Statistical Inference for Gradient Boosting Regression
  • Dynamic Low-Rank Training with Spectral Regularisation: Achieving Robustness in Compressed Representations
Read now
NeurIPS 2025: Paper review #4

Discover the perspectives of Nick, one of our Software Engineers, on the following papers:

  • Pass@K Policy Optimisation: Solving Harder Reinforcement Learning Problems
  • Antidistillation Sampling
  • Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
Read now
NeurIPS 2025: Paper review #5

Discover the perspectives of Cédric, one of our Quantitative Researchers, on the following papers:

  • Learning (Approximately) Equivariant Networks via Constrained Optimisation
  • On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Read now
NeurIPS 2025: Paper review #6

Discover the perspectives of Ognjen, one of our Quantitative Researchers, on the following papers:

  • Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimisation
  • Bubbleformer: Forecasting Boiling with Transformers
Read now
NeurIPS 2025: Paper review #7

Discover the perspectives of Radomir, one of our Machine Learning Engineers, on the following papers:

  • Learning Task-Agnostic Representations through Multi-Teacher Distillation
  • Contrastive Representations for Temporal Reasoning
Read now
NeurIPS 2025: Paper review #8

Discover the perspectives of Benjamin, one of our Quantitative Researchers, on the following papers:

  • Dynamical Decoupling of Generalisation and Overfitting in Large Two-Layer Networks
  • Backward Conformal Prediction
  • Predicting the Performance of Black-box Language Models with Follow-up Queries
Read now
NeurIPS 2025: Paper review #9

Discover the perspectives of Casey, one of our Machine Learning Engineers, on the following papers:

  • Distributed Orthonormal Updates for Large-Scale Training
  • Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenisations
Read now
NeurIPS 2025: Paper review #10

Discover the perspectives of Hugh, one of our Quantitative Research Managers, on the following papers:

  • Understanding and Mitigating Numerical Sources of Non-determinism in LLM Inference
  • Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Read now
NeurIPS 2025: Paper review #11

Discover the perspectives of Timothy, one of our Machine Learning Engineers, on the following papers:

  • ZeroS: Zero-Sum Linear Attention for Efficient Transformers
  • In Search of Adam’s Secret Sauce
Coming soon
NeurIPS 2025: Paper review #12

Discover the perspectives of David, one of our Quantitative Researchers, on the following papers:

  • 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
  • Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Read now
NeurIPS 2025: Paper review #13

Discover the perspectives of Szymon, one of our Senior Quantitative Researchers, on the following papers:

  • Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
  • Parallelizing MCMC Across the Sequence Length
Read now
NeurIPS 2025: Paper review #14

Discover the perspectives of Simon, one of our Senior Quantitative Researchers, on the following papers:

  • Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
  • ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Read now
