
NeurIPS paper reviews 2025 #13

30 January 2026
  • News
  • Quantitative research

In this paper review series, our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2025.

Here, discover the perspectives of Szymon, one of our Senior Quantitative Researchers.

Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava

This paper introduces Rollout Roulette, a principled approach to inference-time scaling for large language models that reframes generation as a problem of probabilistic inference rather than deterministic search.

The method applies in settings where a process reward model, which assigns scores to partial continuation sequences, is available. In this setting, commonly used decoding methods are inspired by beam search or Monte Carlo Tree Search and use the given reward model in a relatively greedy way. As a result, those methods can prematurely discard promising continuations, especially when the reward model makes early errors or its scores are noisy.

The authors propose instead to model the generation process as a state-space system in which the language model defines a transition distribution over tokens, while a process reward model provides noisy, imperfect observations of solution quality along partial trajectories.

It then becomes possible to perform test-time scaling by applying classical particle-based Monte Carlo filtering methods to sample from the distribution over output-token trajectories, conditioned on the input tokens and the observed partial rewards. In this setting, each particle represents a candidate continuation trajectory, weighted according to the reward model and stochastically resampled.

Because candidate continuations are discarded only when the weights across the set of trajectories become sufficiently imbalanced, rather than greedily at each step, this probabilistic treatment allows the method to hedge against reward-model uncertainty, preserve alternative reasoning paths and delay hard decisions until sufficient evidence has accumulated.
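
To make this concrete, below is a minimal sketch of such a particle filter for decoding. The helpers extend (sample the next reasoning step from the language model) and prm_score (score a partial trajectory with the process reward model) are hypothetical stand-ins, and the resampling trigger is the standard effective-sample-size criterion; this illustrates the general technique rather than the authors' exact implementation.

```python
import numpy as np

def particle_filter_decode(prompt, extend, prm_score,
                           n_particles=16, n_steps=10, ess_frac=0.5):
    """Minimal particle-filtering sketch for inference-time scaling.

    `extend(traj)` is assumed to sample one more reasoning step from the
    language model; `prm_score(traj)` is assumed to return a process-reward
    score in (0, 1] for a partial trajectory. Both are hypothetical helpers.
    """
    particles = [prompt] * n_particles
    log_w = np.zeros(n_particles)  # log importance weights

    for _ in range(n_steps):
        # Propagate: each particle samples its next partial continuation
        # from the language model (the transition distribution).
        particles = [extend(p) for p in particles]

        # Weight: treat the process reward as a noisy observation of
        # partial-solution quality and fold it into the weights.
        scores = np.clip([prm_score(p) for p in particles], 1e-9, None)
        log_w += np.log(scores)

        # Resample only when the weights become sufficiently imbalanced,
        # as measured by the effective sample size (ESS).
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        ess = 1.0 / np.sum(w ** 2)
        if ess < ess_frac * n_particles:
            idx = np.random.choice(n_particles, size=n_particles, p=w)
            particles = [particles[i] for i in idx]
            log_w = np.zeros(n_particles)  # weights reset after resampling

    return particles[int(np.argmax(log_w))]
```

Note that, unlike beam search, nothing is pruned outright: a low-reward particle merely receives a low weight, and it survives until the effective sample size actually collapses.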

Empirically, the authors demonstrate that this particle filtering framework leads to substantially more efficient inference-time scaling on challenging reasoning tasks, achieving strong performance improvements and, in some settings, allowing smaller models to rival or surpass much larger models under fixed rollout budgets.

I found this paper really satisfying because of how naturally it leverages an elegant classical method in the context of LLMs to deliver strong performance improvements.

NeurIPS 2024 paper reviews

Read paper reviews from NeurIPS 2024 from a number of our quantitative researchers and machine learning practitioners.

Read now

Parallelizing MCMC Across the Sequence Length

David M. Zoltowski, Skyler Wu, Xavier Gonzalez, Leo Kozachkov, Scott W. Linderman

This paper presents a novel approach to accelerating Markov Chain Monte Carlo (MCMC) by parallelising across the chain length, challenging the conventional view that MCMC is inherently sequential.

Importantly, the proposed approach can leverage modern hardware to perform Bayesian computation at a much larger scale.

This stands in contrast to previous approaches to parallelising MCMC, which were mostly limited to simulating multiple sequential chains side by side and were thus unlikely to utilise modern accelerators to their full potential.

The key idea is to reinterpret the entire sequence of MCMC samples as the solution to a fixed-point problem, enabling Newton's method to be applied in a parallel-in-time manner to generate a long sequence of samples simultaneously.

Conceptually, this approach is closely inspired by prior work on parallelising nonlinear recurrent systems, particularly algorithms such as DEER.

By applying similar ideas to MCMC, the authors demonstrate how classical samplers such as Gibbs, MALA and HMC can be reformulated so that hundreds of thousands of sequential steps are replaced by a small number of parallelisable iterations, yielding substantial wall-clock speedups.
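
To see what such a reformulation can look like, here is a toy sketch for an unadjusted Langevin sampler (the Metropolis correction of full MALA is omitted for simplicity, and this illustrates the general idea rather than the authors' exact construction). Once the Gaussian innovations are drawn up front, each step becomes a deterministic map, and the whole chain is characterised as the root of a residual that can be evaluated for every step in parallel:

```python
import numpy as np

def langevin_step(x, noise, grad_log_p, step=0.1):
    """One unadjusted Langevin step as a deterministic map: with the
    Gaussian innovation `noise` drawn in advance, the usual stochastic
    update contains no remaining randomness."""
    return x + step * grad_log_p(x) + np.sqrt(2.0 * step) * noise

def chain_residual(xs, x0, noises, grad_log_p, step=0.1):
    """Fixed-point residual r[t] = xs[t] - f(xs[t-1]) for all t at once.
    The sequential chain is exactly the trajectory at which r vanishes,
    so sampling becomes a root-finding problem over the whole sequence."""
    prev = np.concatenate([x0[None], xs[:-1]])  # shifted trajectory
    stepped = np.array([langevin_step(p, n, grad_log_p, step)
                        for p, n in zip(prev, noises)])
    return xs - stepped
```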

The authors further propose quasi-Newton approximations to make the approach practical in high dimensions, and empirically show that the resulting samplers achieve sample quality comparable to standard MCMC while dramatically reducing runtime.
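
Building on the langevin_step map from the sketch above, a DEER-style sweep might then look as follows: linearise the recurrence around the current trajectory guess and solve the resulting linear recurrence, with a crude diagonal estimate standing in for the full Jacobian. Again, this is an assumed illustration rather than the paper's implementation.

```python
import numpy as np

def parallel_in_time_chain(x0, noises, grad_log_p, n_iters=20, step=0.1):
    """Sketch of a quasi-Newton, parallel-in-time solve of the chain's
    fixed-point equations; reuses `langevin_step` from the sketch above."""
    T, _ = noises.shape
    xs = np.tile(x0, (T, 1))  # initial guess: a constant trajectory

    for _ in range(n_iters):
        prev = np.concatenate([x0[None], xs[:-1]])
        # Evaluate the transition map at the current guess, for every
        # step at once (this is where accelerators pay off).
        fx = np.array([langevin_step(p, n, grad_log_p, step)
                       for p, n in zip(prev, noises)])
        # Crude diagonal Jacobian estimate via a finite difference along
        # the all-ones direction: exact when the map acts coordinate-wise,
        # and a stand-in for the autodiff Jacobians used in practice.
        eps = 1e-5
        fx_eps = np.array([langevin_step(p + eps, n, grad_log_p, step)
                           for p, n in zip(prev, noises)])
        J = (fx_eps - fx) / eps

        # Solve the linearised recurrence
        #   y[t] = fx[t] + J[t] * (y[t-1] - prev[t]),
        # written sequentially here for clarity; this linear solve is the
        # part an associative parallel scan replaces on hardware.
        ys = np.empty_like(xs)
        carry = x0
        for t in range(T):
            ys[t] = fx[t] + J[t] * (carry - prev[t])
            carry = ys[t]
        xs = ys

    return xs
```

For a standard Gaussian target the Langevin map is linear and coordinate-wise, so the linearisation is exact and a single sweep already reproduces the sequential chain; for general targets the sweeps must be repeated, which is where a small number of parallelisable iterations comes to replace a long run of sequential steps.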

Overall, this work introduces a very innovative approach to MCMC computations, with potentially very significant implications for scalable probabilistic inference and stochastic simulations.

Quantitative research & machine learning

Want to learn more about life as a researcher at G-Research?

Learn more

Read more paper reviews

NeurIPS 2025: Paper review #1

Discover the perspectives of Nick, one of our Quantitative Researchers, on the following papers:

  • Counterfactual Identifiability via Dynamic Optimal Transport
  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  • Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster
Read now
NeurIPS 2025: Paper review #2

Discover the perspectives of Tomi, one of our Quantitative Research Managers, on the following papers:

  • Kronos: A Foundation Model for the Language of Financial Markets
  • LOBERT: Generative AI Foundation Model for Limit Order Book Messages
  • Auto-Compressing Networks
Read now
NeurIPS 2025: Paper review #3

Discover the perspectives of Dustin, one of our Scientific Directors, on the following papers:

  • Statistical Inference for Gradient Boosting Regression
  • Dynamic Low-Rank Training with Spectral Regularisation: Achieving Robustness in Compressed Representations
Read now
NeurIPS 2025: Paper review #4

Discover the perspectives of Nick, one of our Software Engineers, on the following papers:

  • Pass@K Policy Optimisation: Solving Harder Reinforcement Learning Problems
  • Antidistillation Sampling
  • Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
Read now
NeurIPS 2025: Paper review #5

Discover the perspectives of Cédric, one of our Quantitative Researchers, on the following papers:

  • Learning (Approximately) Equivariant Networks via Constrained Optimisation
  • On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Read now
NeurIPS 2025: Paper review #6

Discover the perspectives of Ognjen, one of our Quantitative Researchers, on the following papers:

  • Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimisation
  • Bubbleformer: Forecasting Boiling with Transformers
Read now
NeurIPS 2025: Paper review #7

Discover the perspectives of Radomir, one of our Machine Learning Engineers, on the following papers:

  • Learning Task-Agnostic Representations through Multi-Teacher Distillation
  • Contrastive Representations for Temporal Reasoning
Read now
NeurIPS 2025: Paper review #8

Discover the perspectives of Benjamin, one of our Quantitative Researchers, on the following papers:

  • Dynamical Decoupling of Generalisation and Overfitting in Large Two-Layer Networks
  • Backward Conformal Prediction
  • Predicting the Performance of Black-box Language Models with Follow-up Queries
Read now
NeurIPS 2025: Paper review #9

Discover the perspectives of Casey, one of our Machine Learning Engineers, on the following papers:

  • Distributed Orthonormal Updates for Large-Scale Training
  • Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenisations
Read now
NeurIPS 2025: Paper review #10

Discover the perspectives of Hugh, one of our Quantitative Research Managers, on the following papers:

  • Understanding and Mitigating Numerical Sources of Non-determinism in LLM Inference
  • Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Read now
NeurIPS 2025: Paper review #11

Discover the perspectives of Timothy, one of our Machine Learning Engineers, on the following papers:

  • ZeroS: Zero-Sum Linear Attention for Efficient Transformers
  • In Search of Adam’s Secret Sauce
Coming soon
NeurIPS 2025: Paper review #12

Discover the perspectives of David, one of our Quantitative Researchers, on the following papers:

  • 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
  • Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Read now
NeurIPS 2025: Paper review #14

Discover the perspectives of Simon, one of our Senior Quantitative Researchers, on the following papers:

  • Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
  • ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Read now
