NeurIPS paper reviews 2025 #4

30 January 2026

In this paper review series, our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2025.

Here, discover the perspectives of Nick, one of our Software Engineers.

Pass@K Policy Optimisation: Solving Harder Reinforcement Learning Problems

Christian Walder, Deep Karkhanis

With the rise of increasingly capable generative models, evaluating and improving them via reinforcement learning has become central to progress. Pass@K Policy Optimisation (PKPO) offers an elegant reframing of how sample efficiency and exploration should be handled.

While most RL pipelines optimise for pass@1, treating each sample independently, the authors highlight how this short-sighted focus undervalues the collective utility of a batch – an issue that becomes especially limiting on harder tasks where single-shot success is rare.

PKPO tackles this by directly optimising pass@k through a family of low-variance unbiased estimators for both binary and continuous rewards. These transformations jointly reshape batches of rewards in a way that is computationally stable and compatible with standard RL training loops. Importantly, unlike prior work that only considers the k = n case, the method generalises to any k ≤ n, enabling finer control over the exploration-exploitation landscape.
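For reference, the quantity these transformations target is the standard unbiased pass@k estimator from n samples with c successes, popularised by Chen et al. for HumanEval. Here is a minimal sketch in Python; the paper's contribution goes further, turning estimators like this into per-sample reward transformations, which is not reproduced here:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k given n samples with c successes:
    1 - C(n - c, k) / C(n, k), computed in a numerically stable product form."""
    if n - c < k:
        return 1.0  # fewer than k failures: any k draws must contain a success
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

print(pass_at_k(8, 1, 4))  # 0.5 = 1 - C(7, 4) / C(8, 4)
```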

The results show clear practical benefits: higher k values unlock the ability to solve more challenging problems, while annealing k during training preserves strong pass@1 performance alongside substantial pass@k gains. By prioritising joint utility over isolated successes, PKPO offers a route to improving exploration and unblocking stalled learning. I’m excited to see how this approach shapes future RL fine-tuning pipelines across large model training.
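The annealing schedule itself can be very simple, shrinking k towards 1 as training progresses. A hypothetical linear schedule (the paper does not prescribe this exact form):

```python
def anneal_k(step: int, total_steps: int, n: int) -> int:
    """Hypothetical linear anneal from k = n (maximal credit for joint
    exploration) down to k = 1 (plain pass@1) over training."""
    frac = min(step / max(total_steps, 1), 1.0)
    return max(1, round(n * (1.0 - frac)))
```

Early in training this rewards batches for covering hard problems collectively; by the end, the objective has smoothly tightened back to single-sample quality.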


Antidistillation Sampling

Yash Savani, Asher Trockman, Zhili Feng, Yixuan Even Xu, Avi Schwarzschild, Alexander Robey, Marc Finzi, J. Zico Kolter

Antidistillation sampling tackles a problem that has been steadily growing in visibility: frontier models often reveal far more than intended through their long-form reasoning traces. These traces, while useful for interpretability and debugging, can also act as ready-made supervision signals for distillation. The paper presents a clean and practical defence – modify the next-token distribution just enough to degrade the usefulness of those traces, without degrading the model’s actual performance.

The idea is straightforward and well-motivated. Rather than restricting access or weakening the model, antidistillation sampling selectively perturbs probabilities so that intermediate reasoning becomes noisy or misleading, while final outputs remain high quality. The paper frames this not as obfuscation for its own sake, but as a principled shift in the sampling regime: preserve utility for legitimate users, but avoid producing the kind of clean step-by-step explanations that make distillation trivial.
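As a rough sketch of that sampling regime (illustrative only: in the paper the adjustment direction is derived from a proxy student model, whereas the penalty scores here are an assumed input standing in for that machinery):

```python
import torch

def antidistill_step(teacher_logits: torch.Tensor,
                     penalty: torch.Tensor,
                     lam: float = 0.5) -> int:
    """One decoding step of a schematic antidistillation sampler.

    teacher_logits: [vocab] next-token logits from the teacher.
    penalty: [vocab] hypothetical scores for how useful emitting each
        token would be to a would-be distiller.
    lam: 0.0 recovers ordinary sampling; larger values trade a little
        output quality for noisier, less distillable traces.
    """
    probs = torch.softmax(teacher_logits - lam * penalty, dim=-1)
    return int(torch.multinomial(probs, num_samples=1).item())
```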

The experiments show clear separation between these two goals. Task performance remains effectively unchanged, yet the value of the emitted traces for distillation drops sharply. It’s a great demonstration that deployment-time defences do not always require heavy architectural changes.


Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs

Xander Davies, Eric Winsor, Alexandra Souly, Tomek Korbak, Robert Kirk, Christian Schroeder de Witt, Yarin Gal

This paper delivers a clear and somewhat uncomfortable message for anyone relying on fine-tuning safeguards: pointwise detection is not enough.

Existing defences typically focus on identifying harmful individual training or inference samples, assuming that misuse will surface through obviously toxic or high-risk prompts. The authors show that this assumption breaks down. By exploiting benign semantic and syntactic variation in a model’s own outputs, they construct “pointwise-undetectable” attacks where every single sample looks harmless in isolation.

The attack class is elegant in its simplicity. Adversaries first query the model to gather a library of innocuous responses, then repurpose subtle variations in these outputs to encode dangerous knowledge. Because both training and inference traces remain low-perplexity and benign, standard filters have nothing suspicious to latch onto.
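A toy version makes the mechanism concrete. The scheme below is a hypothetical illustration, not the paper's construction: each secret symbol is carried by the choice among innocuous paraphrases (standing in for the model's own benign outputs), so every individual sample passes a pointwise filter while the collection decodes to the payload.

```python
# Hypothetical encoding: the *choice* among benign paraphrases, not their
# content, carries one secret symbol per training example.
PARAPHRASES = [
    "Thanks for asking! Here is a summary.",   # encodes "A"
    "Sure, here is a brief summary.",          # encodes "B"
    "Of course. A short summary follows.",     # encodes "C"
    "Happy to help. Summary below.",           # encodes "D"
]
CODEBOOK = dict(zip("ABCD", PARAPHRASES))
INVERSE = {text: symbol for symbol, text in CODEBOOK.items()}

def encode(answer_key: str) -> list[str]:
    """Turn a secret answer key into individually benign samples."""
    return [CODEBOOK[a] for a in answer_key]

def decode(samples: list[str]) -> str:
    """Recover the answer key from the aggregate pattern of choices."""
    return "".join(INVERSE[s] for s in samples)

assert decode(encode("CABD")) == "CABD"
```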

The experiments against the OpenAI fine-tuning API make this concrete: the approach reliably elicits harmful multiple-choice answers while slipping past enhanced monitoring systems designed to catch known attack patterns.

The work’s contribution is less about proposing yet another exploit and more about formalising the underlying limitation. If the defence only evaluates samples one at a time, attackers can hide in the gaps. The authors argue that meaningful robustness will require multi-point or distributional methods – a direction that now feels unavoidable for future fine-tuning API design.
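A distributional method gains visibility into exactly this kind of aggregate structure. As a hypothetical illustration (not a defence proposed in the paper), even a crude whole-upload statistic separates the toy encoding above from natural data:

```python
from collections import Counter
import math

def template_entropy(samples: list[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution over exact
    response strings, one crude dataset-level statistic a distributional
    defence might monitor."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())
```

The encoded dataset above collapses onto four templates (at most 2 bits of entropy regardless of size), far below the diversity of a natural fine-tuning corpus, even though every row looks clean in isolation.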


Read more paper reviews

NeurIPS 2025: Paper review #1

Discover the perspectives of Nick, one of our Quantitative Researchers, on the following papers:

  • Counterfactual Identifiability via Dynamic Optimal Transport
  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  • Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster
NeurIPS 2025: Paper review #2

Discover the perspectives of Tomi, one of our Quantitative Research Managers, on the following papers:

  • Kronos: A Foundation Model for the Language of Financial Markets
  • LOBERT: Generative AI Foundation Model for Limit Order Book Messages
  • Auto-Compressing Networks
NeurIPS 2025: Paper review #3

Discover the perspectives of Dustin, one of our Scientific Directors, on the following papers:

  • Statistical Inference for Gradient Boosting Regression
  • Dynamic Low-Rank Training with Spectral Regularisation: Achieving Robustness in Compressed Representations
NeurIPS 2025: Paper review #5

Discover the perspectives of Cédric, one of our Quantitative Researchers, on the following papers:

  • Learning (Approximately) Equivariant Networks via Constrained Optimisation
  • On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
NeurIPS 2025: Paper review #6

Discover the perspectives of Ognjen, one of our Quantitative Researchers, on the following papers:

  • Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimisation
  • Bubbleformer: Forecasting Boiling with Transformers
NeurIPS 2025: Paper review #7

Discover the perspectives of Radomir, one of our Machine Learning Engineers, on the following papers:

  • Learning Task-Agnostic Representations through Multi-Teacher Distillation
  • Contrastive Representations for Temporal Reasoning
NeurIPS 2025: Paper review #8

Discover the perspectives of Benjamin, one of our Quantitative Researchers, on the following papers:

  • Dynamical Decoupling of Generalisation and Overfitting in Large Two-Layer Networks
  • Backward Conformal Prediction
  • Predicting the Performance of Black-box Language Models with Follow-up Queries
NeurIPS 2025: Paper review #9

Discover the perspectives of Casey, one of our Machine Learning Engineers, on the following papers:

  • Distributed Orthonormal Updates for Large-Scale Training
  • Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenisations
NeurIPS 2025: Paper review #10

Discover the perspectives of Hugh, one of our Quantitative Research Managers, on the following papers:

  • Understanding and Mitigating Numerical Sources of Non-determinism in LLM Inference
  • Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
NeurIPS 2025: Paper review #11

Discover the perspectives of Timothy, one of our Machine Learning Engineers, on the following papers:

  • ZeroS: Zero-Sum Linear Attention for Efficient Transformers
  • In Search of Adam’s Secret Sauce
NeurIPS 2025: Paper review #12

Discover the perspectives of David, one of our Quantitative Researchers, on the following papers:

  • 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
  • Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
NeurIPS 2025: Paper review #13

Discover the perspectives of Szymon, one of our Senior Quantitative Researchers, on the following papers:

  • Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
  • Parallelizing MCMC Across the Sequence Length
NeurIPS 2025: Paper review #14

Discover the perspectives of Simon, one of our Senior Quantitative Researchers, on the following papers:

  • Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
  • ConfTuner: Training Large Language Models to Express Their Confidence Verbally
