
NeurIPS paper reviews 2025 #7

30 January 2026
  • News
  • Quantitative research

In this paper review series our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2025.

Here, discover the perspectives of Radomir, one of our Machine Learning Engineers.

Learning Task-Agnostic Representations through Multi-Teacher Distillation

Philippe Formont, Maxime Darrin, Banafsheh Karimian, Jackie CK Cheung, Eric Granger, Ismail Ben Ayed, Mohammadhadi Shateri, Pablo Piantanida

The paper tackles the problem of distilling multiple large embedding models into a single, task-agnostic student representation. Existing multi-teacher knowledge distillation methods typically rely on task labels or task-specific heads and are tuned to a single downstream objective, which makes them hard to reuse on unseen tasks.

The authors instead define a “task-enabling” view of distillation, comparing the Bayes-optimal classifiers induced by teacher and student embeddings for arbitrary downstream tasks. They prove that the probability that the student’s Bayes classifier disagrees with the ensemble of teachers can be controlled via the conditional entropy of the teachers’ embeddings given the student embedding, leading to an information-theoretic loss that maximises mutual information between student and teachers.
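To make the “task-enabling” view concrete, here is a rough sketch in our own notation (the paper’s exact bound and constants will differ): write S for the student embedding and T for the stacked teacher embeddings.

```latex
% Sketch of the task-enabling argument (our notation, not the paper's exact statement).
% P_e is the probability that the Bayes-optimal classifier built on the student embedding S
% disagrees with the one built on the teacher embeddings T, for an arbitrary downstream task.
P_e \;\lesssim\; g\!\big(H(T \mid S)\big) \quad \text{for some increasing function } g,
\qquad
I(S; T) \;=\; H(T) - H(T \mid S).
```

Since H(T) does not depend on the student, minimising the conditional entropy H(T | S) is equivalent to maximising the mutual information I(S; T), which is the quantity their loss targets.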

Concretely, they implement this loss with a differentiable Gaussian mixture estimator of conditional entropy and train student embedders on unlabelled data from text, vision and molecular domains. After distillation, the student is frozen and simple feed-forward heads are trained for a variety of classification, regression, clustering and similarity benchmarks.
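As a rough illustration of what such a batch-wise estimator could look like, here is a minimal PyTorch sketch built on a Parzen-window-style Gaussian mixture over the batch; the bandwidths sigma_s and sigma_t and the exact weighting are our assumptions, not the paper’s estimator.

```python
# Minimal sketch of a differentiable Gaussian-mixture estimator of H(teacher | student).
# Assumptions (ours): isotropic Gaussians, batch-wise mixture, bandwidths sigma_s / sigma_t.
import torch
import torch.nn.functional as F

def conditional_entropy_loss(student_emb, teacher_emb, sigma_s=0.5, sigma_t=0.5):
    """student_emb: (N, d_s) student embeddings for a batch of unlabelled inputs.
    teacher_emb: (N, d_t) concatenated (frozen) teacher embeddings for the same inputs."""
    n = student_emb.shape[0]
    dist_s = torch.cdist(student_emb, student_emb) ** 2   # (N, N) squared distances, student space
    dist_t = torch.cdist(teacher_emb, teacher_emb) ** 2   # (N, N) squared distances, teacher space

    # Mixture weights: batch neighbours that are close in student space contribute
    # more to the conditional density p(t | s); the trivial self-match is excluded.
    eye = torch.eye(n, dtype=torch.bool, device=student_emb.device)
    log_w = F.log_softmax((-dist_s / (2 * sigma_s ** 2)).masked_fill(eye, float("-inf")), dim=1)

    # Unnormalised log Gaussian kernels in teacher space (constants drop out of the gradient).
    log_gauss = -dist_t / (2 * sigma_t ** 2)

    # log p(t_i | s_i) ~= logsumexp_j [ log w_ij + log N(t_i; t_j, sigma_t^2 I) ]
    log_p = torch.logsumexp(log_w + log_gauss, dim=1)

    # Minimising H(T | S) ~= -E[log p(t | s)] maximises I(S; T) up to a constant.
    return -log_p.mean()
```

During distillation only the student encoder producing student_emb receives gradients; the teachers stay frozen, matching the label-free setup described above.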

Across modalities, the distilled students achieve competitive and often improved performance compared to the best individual teachers and prior distillation baselines, while remaining compact and task-agnostic.

This paper is appealing because it shows how to compress the knowledge of many strong foundation models into a single, general-purpose representation without needing task labels.

With so much current effort going into building ever-stronger foundation models, this paper shows how several different models can be fused into a single smaller one. This can benefit both academia and industry, since it saves hardware while improving the final model.


Contrastive Representations for Temporal Reasoning

Alicja Ziarko, Michał Bortkiewicz, Michał Zawalski, Benjamin Eysenbach, Piotr Miłoś

The paper studies whether temporal reasoning for combinatorial problems can be embedded into learned representations rather than delegated entirely to explicit search.

They start from temporal contrastive learning methods used in reinforcement learning, where an encoder is trained with an InfoNCE objective to bring states close to future goals and push them away from negatives drawn from other trajectories.
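For reference, this standard temporal InfoNCE setup can be sketched as follows; the dot-product similarity, temperature and batch construction are assumptions on our part.

```python
# Sketch of standard temporal contrastive learning: each state is pulled towards a future
# goal from its own trajectory, with the other rows of the batch acting as negatives
# drawn from different trajectories.
import torch
import torch.nn.functional as F

def temporal_infonce(state_emb, future_emb, temperature=0.1):
    """state_emb[i] and future_emb[i] come from the same trajectory; other rows are negatives."""
    logits = state_emb @ future_emb.T / temperature                  # (N, N) similarities
    targets = torch.arange(state_emb.shape[0], device=state_emb.device)
    return F.cross_entropy(logits, targets)                          # match each state to its own future
```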

In domains such as Sokoban, Rubik’s Cube and other combinatorial puzzles, they show that this standard setup can overfit to static contextual features (for example, wall layouts), causing trajectories to collapse into disconnected clusters that do not reflect underlying temporal structure. To address this, they propose Contrastive Representations for Temporal Reasoning (CRTR), which uses in-trajectory negative samples: the model must distinguish temporally distant states within the same episode even when observations look visually similar.
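A minimal sketch of the in-trajectory negative sampling idea is given below; the horizon, number of negatives and sampling details are our simplifications rather than the paper’s exact scheme.

```python
# Sketch of CRTR-style in-trajectory negatives: anchor, positive and negatives all come
# from the SAME episode, so static context (e.g. wall layout) cannot separate them and
# the encoder is pushed to represent temporal distance instead.
import torch
import torch.nn.functional as F

def in_trajectory_contrastive_loss(encoder, trajectory, horizon=5, num_negatives=16, temperature=0.1):
    """trajectory: (T, obs_dim) observations from one episode; encoder maps (B, obs_dim) -> (B, d)."""
    T = trajectory.shape[0]
    anchor_idx = torch.randint(0, T - horizon, (1,)).item()
    pos_idx = anchor_idx + horizon

    # Negatives: states in the same episode that are temporally far from the anchor.
    far = torch.cat([torch.arange(0, max(anchor_idx - horizon, 0)),
                     torch.arange(min(pos_idx + horizon, T), T)])
    neg_idx = far[torch.randperm(len(far))[:num_negatives]]

    z_anchor = encoder(trajectory[anchor_idx:anchor_idx + 1])        # (1, d)
    z_pos = encoder(trajectory[pos_idx:pos_idx + 1])                 # (1, d)
    z_neg = encoder(trajectory[neg_idx])                             # (K, d)

    logits = torch.cat([z_anchor @ z_pos.T, z_anchor @ z_neg.T], dim=1) / temperature
    target = torch.zeros(1, dtype=torch.long, device=logits.device)  # the positive sits at index 0
    return F.cross_entropy(logits, target)
```

Because anchor, positive and negatives all share the same static context, the encoder can no longer rely on features such as wall layouts to tell them apart.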

They provide theoretical arguments that this negative sampling scheme removes spurious features and encourages embeddings that encode meaningful dynamics. Empirically, CRTR significantly improves planning efficiency over standard contrastive learning across five combinatorial benchmarks (Sokoban, Rubik’s Cube, N-Puzzle, Lights Out and Digit Jumper), approaching or matching supervised baselines.

For Rubik’s Cube, the learned representation generalises to arbitrary initial states and supports a latent-space planner that solves cubes using fewer node expansions than a best-first search baseline, albeit with somewhat longer solution paths.
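One way to picture such a latent-space planner is as greedy best-first search that uses the learned embedding distance to the goal as its heuristic. The sketch below is our illustration, not the paper’s algorithm; `successors` is an assumed environment function and states are assumed to be hashable.

```python
# Greedy best-first search in the learned latent space: expand the state whose embedding
# is closest to the goal embedding. Illustrative only; not the paper's exact planner.
import heapq
import torch

def latent_best_first(start, goal, encoder, successors, max_expansions=100_000):
    """successors(state) -> iterable of (action, next_state); encoder(state) -> 1-D tensor."""
    z_goal = encoder(goal)

    def h(state):
        return torch.norm(encoder(state) - z_goal).item()  # heuristic: latent distance to goal

    frontier = [(h(start), 0, start, [])]                   # (priority, tie-breaker, state, actions)
    visited, counter = {start}, 0
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan                                     # sequence of actions reaching the goal
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                counter += 1
                heapq.heappush(frontier, (h(nxt), counter, nxt, plan + [action]))
    return None                                             # expansion budget exhausted
```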

This paper stands out because it focuses on learning representations that directly support long-horizon decision-making, rather than treating planning as a separate algorithm layered on top.

The failure mode they identify, contrastive objectives latching onto spurious context instead of genuine temporal structure, feels highly relevant for financial time series, where models can easily overfit to regime or calendar artefacts. The CRTR idea of designing negatives to “factor out” static context and emphasise temporal distance is conceptually simple but powerful, and could inspire alternative contrastive objectives for market data or other sequential signals we work with.

More broadly, the demonstration that good representations can materially reduce search effort in hard combinatorial problems is interesting for any setting where we need fast approximate planning over a huge, discrete state space.


Read more paper reviews

NeurIPS 2025: Paper review #1

Discover the perspectives of Nick, one of our Quantitative Researchers, on the following papers:

  • Counterfactual Identifiability via Dynamic Optimal Transport
  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model
  • Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster
Read now
NeurIPS 2025: Paper review #2

Discover the perspectives of Tomi, one of our Quantitative Research Managers, on the following papers:

  • Kronos: A Foundation Model for the Language of Financial Markets
  • LOBERT: Generative AI Foundation Model for Limit Order Book Messages
  • Auto-Compressing Networks
Read now
NeurIPS 2025: Paper review #3

Discover the perspectives of Dustin, one of our Scientific Directors, on the following papers:

  • Statistical Inference for Gradient Boosting Regression
  • Dynamic Low-Rank Training with Spectral Regularisation: Achieving Robustness in Compressed Representations
Read now
NeurIPS 2025: Paper review #4

Discover the perspectives of Nick, one of our Software Engineers, on the following papers:

  • Pass@K Policy Optimisation: Solving Harder Reinforcement Learning Problems
  • Antidistillation Sampling
  • Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
Read now
NeurIPS 2025: Paper review #5

Discover the perspectives of Cédric, one of our Quantitative Researchers, on the following papers:

  • Learning (Approximately) Equivariant Networks via Constrained Optimisation
  • On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Read now
NeurIPS 2025: Paper review #6

Discover the perspectives of Ognjen, one of our Quantitative Researchers, on the following papers:

  • Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimisation
  • Bubbleformer: Forecasting Boiling with Transformers
Read now
NeurIPS 2025: Paper review #8

Discover the perspectives of Benjamin, one of our Quantitative Researchers, on the following papers:

  • Dynamical Decoupling of Generalisation and Overfitting in Large Two-Layer Networks
  • Backward Conformal Prediction
  • Predicting the Performance of Black-box Language Models with Follow-up Queries
Read now
NeurIPS 2025: Paper review #9

Discover the perspectives of Casey, one of our Machine Learning Engineers, on the following papers:

  • Distributed Orthonormal Updates for Large-Scale Training
  • Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenisations
Read now
NeurIPS 2025: Paper review #10

Discover the perspectives of Hugh, one of our Quantitative Research Managers, on the following papers:

  • Understanding and Mitigating Numerical Sources of Non-determinism in LLM Inference
  • Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Read now
NeurIPS 2025: Paper review #11

Discover the perspectives of Timothy, one of our Machine Learning Engineers, on the following papers:

  • ZeroS: Zero-Sum Linear Attention for Efficient Transformers
  • In Search of Adam’s Secret Sauce
Coming soon
NeurIPS 2025: Paper review #12

Discover the perspectives of David, one of our Quantitative Researchers, on the following papers:

  • 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
  • Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Read now
NeurIPS 2025: Paper review #13

Discover the perspectives of Szymon, one of our Senior Quantitative Researchers, on the following papers:

  • Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
  • Parallelizing MCMC Across the Sequence Length
Read now
NeurIPS 2025: Paper review #14

Discover the perspectives of Simon, one of our Senior Quantitative Researchers, on the following papers:

  • Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
  • ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Read now
