
NeurIPS paper reviews 2025 #2

30 January 2026
  • News
  • Quantitative research

In this paper review series our team of researchers and machine learning practitioners discuss the papers they found most interesting at NeurIPS 2025.

Here, discover the perspectives of Tomi, one of our Quantitative Research Managers.

Kronos: A Foundation Model for the Language of Financial Markets

Yu Shi, Zongliang Fu, Shuo Chen, Bohan Zhao, Wei Xu, Changshui Zhang, Jian Li

There has been a growing wave of work on foundation models for time series, with NeurIPS 2025 even hosting a workshop on the topic. The hope is that, as with language models, one can train a large autoregressive model on a huge corpus of time-series data to obtain a foundation model that can be used zero-shot or lightly fine-tuned for a wide variety of tasks across many domains, outperforming models trained purely on specific datasets or targets.

As a step towards this, Kronos is a foundation model for financial K-line time series (open, close, high, low, volume, etc.) trained across many asset classes, exchanges, products, frequencies and years.

The approach uses an autoencoder to encode each K-line element into a binary code (split into coarse and fine parts) and a Transformer to predict the next binary code (coarse first, then fine), which is then decoded back into a K-line element. As usual in this type of work, the model achieves state-of-the-art results on several public benchmarks, including those relating to price forecasting and data generation.
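
To make the hierarchical tokenisation concrete, here is a minimal sketch of the kind of pipeline described above: a toy quantiser maps each K-line bar to a coarse and a fine code, and a small causal Transformer predicts the next code pair. All module names, codebook sizes and dimensions are hypothetical; this is an illustration of the idea, not the authors' implementation.

```python
# Toy sketch of the coarse/fine quantise-then-predict idea (not the authors' code).
import torch
import torch.nn as nn

N_COARSE, N_FINE, D = 256, 256, 64  # hypothetical codebook sizes and model width

class KLineTokenizer(nn.Module):
    """Maps one K-line bar (OHLCV) to a (coarse, fine) code pair and back."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(5, D)                    # OHLCV -> latent
        self.coarse_book = nn.Embedding(N_COARSE, D)  # coarse codebook
        self.fine_book = nn.Embedding(N_FINE, D)      # fine codebook
        self.dec = nn.Linear(2 * D, 5)                # codes -> reconstructed bar

    def encode(self, bar):                            # bar: (batch, 5)
        z = self.enc(bar)
        coarse = torch.cdist(z, self.coarse_book.weight).argmin(-1)
        residual = z - self.coarse_book(coarse)       # fine code quantises the residual
        fine = torch.cdist(residual, self.fine_book.weight).argmin(-1)
        return coarse, fine

    def decode(self, coarse, fine):
        z = torch.cat([self.coarse_book(coarse), self.fine_book(fine)], dim=-1)
        return self.dec(z)

class NextCodePredictor(nn.Module):
    """Causal Transformer over past codes, predicting the next coarse and fine code."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_COARSE + N_FINE, D)  # shared vocab for simplicity
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        self.coarse_head = nn.Linear(D, N_COARSE)
        self.fine_head = nn.Linear(D, N_FINE)

    def forward(self, code_seq):                      # code_seq: (batch, time) int64
        mask = nn.Transformer.generate_square_subsequent_mask(code_seq.size(1))
        h = self.trunk(self.embed(code_seq), mask=mask)
        return self.coarse_head(h[:, -1]), self.fine_head(h[:, -1])
```

Note that the paper predicts the two codes sequentially (coarse first, then fine conditioned on the coarse code); the sketch above predicts both from the same hidden state purely for brevity.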

While a generic foundation model is unlikely to be competitive with the task-specific or input-specific models we train internally at G-Research, it would be unwise to ignore the techniques others are finding useful on financial data.

NeurIPS 2024 paper reviews

Read paper reviews from NeurIPS 2024 from a number of our quantitative researchers and machine learning practitioners.

Read now

LOBERT: Generative AI Foundation Model for Limit Order Book Messages

Eljas Linna, Kestutis Baltakys, Alexandros Iosifidis, Juho Kanniainen

Presented only as a poster, since the work was less than two weeks old at the time of the conference, this paper introduces another foundation model for time series, this time operating on limit-order-book messages rather than K-lines. These messages carry information such as side, price, size and event type (order add, cancel, trade, etc.).

While the authors are not the first to train a generative model for LOB messages, they propose a more efficient hybrid continuous-discrete tokenisation scheme and train the model using both next-token prediction and BERT-style masking. They then fine-tune it with task-specific heads and achieve improvements over other public approaches, for example in mid-price forecasting.
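
As a rough illustration of what a hybrid continuous-discrete tokenisation might look like (the field names, vocabulary sizes and dimensions below are my own assumptions, not taken from the paper), categorical message fields can be given learned embeddings while continuous fields are projected directly, with the pieces combined into a single token per message:

```python
# Hypothetical sketch of a hybrid continuous-discrete LOB message embedding
# (not the authors' code): discrete fields are embedded, continuous fields are
# projected, and everything is summed into one token vector per message.
import torch
import torch.nn as nn

D = 64  # hypothetical model width

class LOBMessageEmbedder(nn.Module):
    def __init__(self, n_event_types=4, n_sides=2):
        super().__init__()
        self.event_emb = nn.Embedding(n_event_types, D)  # add / cancel / trade / ...
        self.side_emb = nn.Embedding(n_sides, D)         # bid / ask
        self.cont_proj = nn.Linear(2, D)                 # (normalised price, size)

    def forward(self, event_type, side, price, size):
        # event_type, side: (batch, time) int64; price, size: (batch, time) float
        cont = torch.stack([price, size], dim=-1)
        return self.event_emb(event_type) + self.side_emb(side) + self.cont_proj(cont)
```

The resulting token sequence can then be trained with a causal next-token objective, a BERT-style masked objective, or, as the authors do, both.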

This paper connected nicely with the broader theme of generative world-modelling that surfaced across many NeurIPS 2025 talks, including the workshop on Embodied World Models for Decision Making. The most striking demonstration was Tesla’s real-time world-model that could play forward a traffic scene and respond to human control. Going back to this paper, it was fun to think about the similar task of modelling the slightly simpler world of the limit order book for use in generative simulation.

Quantitative research & machine learning

Want to learn more about life as a researcher at G-Research?

Learn more

Auto-Compressing Networks

Vaggelis Dorovatas, Georgios Paraskevopoulos, Alexandros Potamianos

With the dominance of deep learning, a great deal of work focuses on understanding and managing depth in neural networks. Andrew Saxe’s keynote on Demystifying Depth: Principles of Learning in Deep Neural Networks provided a theoretical angle at NeurIPS 2025.

Auto-Compressing Networks gives a more practical perspective on improving the training of very deep networks and mitigating issues observed in ResNets. It can be viewed alongside other architectural responses such as DenseNets.

The authors argue that although ResNets solved the vanishing-gradient problem, their residual shortcuts can leave layers under-used, bypassed, overfit or redundant. Instead of short connections between adjacent layers, Auto-Compressing Networks add a direct long connection from each layer to the final output. With supporting theory, the authors show that this structure encourages information to be represented by earlier layers when possible, effectively “compressing” it, which may aid generalisation and help avoid forgetting during continual training.
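
Based purely on the description above (so a sketch of the wiring rather than the authors' exact formulation), the contrast with a standard residual stack looks roughly like this:

```python
# Sketch contrasting a standard residual stack with the "auto-compressing"
# wiring described above: no short skips between adjacent layers, but a long
# connection from every layer straight to the network output.
import torch
import torch.nn as nn

def make_block(dim):
    return nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

class ResidualStack(nn.Module):
    """Standard ResNet-style stack: x <- x + f(x) at every layer."""
    def __init__(self, dim, depth):
        super().__init__()
        self.blocks = nn.ModuleList([make_block(dim) for _ in range(depth)])

    def forward(self, x):
        for block in self.blocks:
            x = x + block(x)         # short skip between adjacent layers
        return x

class AutoCompressingStack(nn.Module):
    """Feed-forward chain whose output is the sum of every layer's activation."""
    def __init__(self, dim, depth):
        super().__init__()
        self.blocks = nn.ModuleList([make_block(dim) for _ in range(depth)])

    def forward(self, x):
        h, contributions = x, []
        for block in self.blocks:
            h = block(h)             # plain chain, no skip to the next layer
            contributions.append(h)  # long connection from this layer to the output
        return torch.stack(contributions).sum(dim=0)
```

Because every layer's contribution is added directly to the output, later layers can shrink their contributions towards zero once earlier layers already carry the necessary information, which is the “compression” effect the authors describe.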

As an aside, I should mention the poster The Curse of Depth in Large Language Models, which proposes mitigating training issues in deep ResNets that use LayerNorm via a simple layer-dependent scaling applied after each LayerNorm.
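
A layer-dependent scaling after LayerNorm is easy to write down; the specific inverse-square-root factor below is my recollection of that poster rather than something stated here, so treat it as an assumption:

```python
# Sketch of LayerNorm followed by a fixed, depth-dependent scaling
# (the 1/sqrt(layer_index) factor is assumed, not taken from the text above).
import torch.nn as nn

class ScaledLayerNorm(nn.Module):
    def __init__(self, dim, layer_index):     # layer_index starts at 1
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.scale = layer_index ** -0.5       # deeper layers get damped outputs

    def forward(self, x):
        return self.scale * self.norm(x)
```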

Deep learning is used a lot at G-Research, so we naturally pay attention to papers like this. In fact, I’m fairly sure that by the time I’m back from the conference, this architecture (and many others) will already have been tested on our data, either immediately after release or even before.


Read more paper reviews

NeurIPS 2025: Paper review #1

Discover the perspectives of Nick, one of our Quantitative Researchers, on the following papers:

  • Counterfactual Identifiability via Dynamic Optimal Transport
  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  • Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster
Read now
NeurIPS 2025: Paper review #3

Discover the perspectives of Dustin, one of our Scientific Directors, on the following papers:

  • Statistical Inference for Gradient Boosting Regression
  • Dynamic Low-Rank Training with Spectral Regularisation: Achieving Robustness in Compressed Representations
Read now
NeurIPS 2025: Paper review #4

Discover the perspectives of Nick, one of our Software Engineers, on the following papers:

  • Pass@K Policy Optimisation: Solving Harder Reinforcement Learning Problems
  • Antidistillation Sampling
  • Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs
Read now
NeurIPS 2025: Paper review #5

Discover the perspectives of Cédric, one of our Quantitative Researchers, on the following papers:

  • Learning (Approximately) Equivariant Networks via Constrained Optimisation
  • On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Read now
NeurIPS 2025: Paper review #6

Discover the perspectives of Ognjen, one of our Quantitative Researchers, on the following papers:

  • Omnipresent Yet Overlooked: Heat Kernels in Combinatorial Bayesian Optimisation
  • Bubbleformer: Forecasting Boiling with Transformers
Read now
NeurIPS 2025: Paper review #7

Discover the perspectives of Radomir, one of our Machine Learning Engineers, on the following papers:

  • Learning Task-Agnostic Representations through Multi-Teacher Distillation
  • Contrastive Representations for Temporal Reasoning
Read now
NeurIPS 2025: Paper review #8

Discover the perspectives of Benjamin, one of our Quantitative Researchers, on the following papers:

  • Dynamical Decoupling of Generalisation and Overfitting in Large Two-Layer Networks
  • Backward Conformal Prediction
  • Predicting the Performance of Black-box Language Models with Follow-up Queries
Read now
NeurIPS 2025: Paper review #9

Discover the perspectives of Casey, one of our Machine Learning Engineers, on the following papers:

  • Distributed Orthonormal Updates for Large-Scale Training
  • Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenisations
Read now
NeurIPS 2025: Paper review #10

Discover the perspectives of Hugh, one of our Quantitative Research Managers, on the following papers:

  • Understanding and Mitigating Numerical Sources of Non-determinism in LLM Inference
  • Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
Read now
NeurIPS 2025: Paper review #11

Discover the perspectives of Timothy, one of our Machine Learning Engineers, on the following papers:

  • ZeroS: Zero-Sum Linear Attention for Efficient Transformers
  • In Search of Adam’s Secret Sauce
Coming soon
NeurIPS 2025: Paper review #12

Discover the perspectives of David, one of our Quantitative Researchers, on the following papers:

  • 1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
  • Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Read now
NeurIPS 2025: Paper review #13

Discover the perspectives of Szymon, one of our Senior Quantitative Researchers, on the following papers:

  • Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
  • Parallelizing MCMC Across the Sequence Length
Read now
NeurIPS 2025: Paper review #14

Discover the perspectives of Simon, one of our Senior Quantitative Researchers, on the following papers:

  • Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
  • ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Read now
