ICML 2024: Paper Review #7

24 September 2024

Quantitative Research

Machine Learning (ML) is a fast-evolving discipline, so attending conferences and hearing about the very latest research are key to the ongoing development and success of our quantitative researchers and ML engineers.

In this paper review series, our ICML 2024 attendees reveal the research and papers they found most interesting.

Here, discover the perspectives of Quantitative Researcher, Ingmar, as he discusses his most compelling findings from the conference.

Offline Actor-Critic Reinforcement Learning Scales to Large Models

Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller

Large-scale models for policy learning in control and robotics have shown impressive multi-task and generalisation capabilities in recent years, but so far policy learning in the generalist large-model regime has mostly relied on Behaviour Cloning (BC), which requires near-optimal demonstrations during training. This work demonstrates the benefits of large-scale models for offline reinforcement learning (RL).

The key contribution is an offline actor-critic algorithm that makes it possible to smoothly trade off RL and BC loss terms. This is combined with a scalable, transformer-based multi-modal architecture that represents both the policy and the value function. The experiments include a scaling analysis, pre-training comparisons against strong BC baselines such as Gato (Reed et al., 2022) and RoboCat (Bousmalis et al., 2023), and an analysis of fine-tuning with the critic. ^[1] ^[2]

^{[1] A Generalist Agent}

^{[2] RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation}
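
To make the idea of trading off RL and BC loss terms concrete, here is a minimal sketch for a single state with a discrete action space. It assumes a simple convex combination controlled by a hypothetical weight `alpha`: the BC term is the negative log-likelihood of the action recorded in the dataset, and the RL term rewards probability mass placed on actions that a critic scores highly. This illustrates the general idea only; it is not the paper's actual objective, which uses a learned critic and a transformer policy, and all function names here are our own.

```python
from __future__ import annotations

import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blended_actor_loss(
    logits: Sequence[float],
    data_action: int,
    q_values: Sequence[float],
    alpha: float,
) -> float:
    """Hypothetical blend of a BC loss and a critic-driven policy loss.

    alpha = 0 gives pure behaviour cloning; alpha = 1 gives a pure
    policy-improvement term against the critic's Q-values.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    probs = softmax(logits)
    # BC term: negative log-likelihood of the action recorded in the dataset.
    bc_loss = -math.log(probs[data_action])
    # RL term: negative expected Q-value under the current policy, so
    # minimising it shifts probability towards high-value actions.
    rl_loss = -sum(p * q for p, q in zip(probs, q_values))
    return (1.0 - alpha) * bc_loss + alpha * rl_loss


# Two actions: the dataset chose action 0, but the critic prefers action 1.
logits = [0.0, 0.0]
q_values = [1.0, 3.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(blended_actor_loss(logits, 0, q_values, alpha), 4))
```

Setting `alpha = 0` recovers pure behaviour cloning, while `alpha = 1` ignores the demonstrations and follows the critic alone; intermediate values interpolate between the two.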

Information-Directed Pessimism for Offline Reinforcement Learning

Alec Koppel, Sujay Bhatt, Jiacheng Guo, Joe Eappen, Mengdi Wang, Sumitra Ganesh

In the offline reinforcement learning setting, this paper introduces a new type of penalty that restricts the mismatch between the offline data distribution and the distribution induced by the online policy. Because the penalty can be interpreted as Stein information, the authors refer to their approach as information-directed pessimism.
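
The general shape of a pessimistic offline objective can be sketched as an estimated return minus a penalty on the mismatch between the policy-induced distribution and the data distribution. The sketch below uses a KL divergence over a small discrete state space purely as a stand-in: the paper's penalty is derived from Stein information, which is not reproduced here, and `penalty_weight` and the function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; infinite if p has mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # states the policy never visits contribute nothing
        if qi == 0.0:
            return math.inf  # the policy visits a state the data never covers
        total += pi * math.log(pi / qi)
    return total


def pessimistic_objective(
    estimated_return: float,
    policy_dist: Sequence[float],
    data_dist: Sequence[float],
    penalty_weight: float,
) -> float:
    """Estimated return, penalised by how far the policy strays from the data.

    The KL penalty is a stand-in for the paper's Stein-information-based term.
    """
    if penalty_weight == 0.0:
        return estimated_return  # avoid 0 * inf when the divergence is infinite
    return estimated_return - penalty_weight * kl_divergence(policy_dist, data_dist)


data_dist = [0.5, 0.5, 0.0]
print(pessimistic_objective(10.0, [0.5, 0.5, 0.0], data_dist, 1.0))  # matches the data
print(pessimistic_objective(10.0, [0.9, 0.1, 0.0], data_dist, 1.0))  # shifted policy
print(pessimistic_objective(10.0, [0.4, 0.4, 0.2], data_dist, 1.0))  # unseen state
```

Larger values of `penalty_weight` make the objective more conservative; under this particular divergence, a policy that puts mass on states absent from the data receives an objective of negative infinity.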

Importantly, this form of pessimism allows the next-state distribution to be represented as a mixture of distributions, so state transition functions can be explicitly multi-modal. Among other results, the authors demonstrate improved performance of their method on a toy portfolio optimisation problem (Neuneier, 1997). ^[3]

^{[3] Enhancing Q-Learning for Optimal Asset Allocation}
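
To illustrate what an explicitly multi-modal transition model looks like, the sketch below represents the next-state distribution as a weighted mixture of Gaussians in one dimension. The two-regime `transition_components` model, in which the state jumps up or down by one unit, is a hypothetical toy rather than the paper's portfolio environment or parameterisation; it simply shows that a mixture can place high density on two separated outcomes and little in between, which a single Gaussian cannot.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass


@dataclass
class Component:
    weight: float  # mixing weight; weights across components should sum to 1
    mean: float
    std: float


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, components: list[Component]) -> float:
    """Density of the Gaussian mixture at x."""
    return sum(c.weight * normal_pdf(x, c.mean, c.std) for c in components)


def sample_next_state(components: list[Component], rng: random.Random) -> float:
    """Pick a component according to its weight, then sample from that Gaussian."""
    chosen = rng.choices(components, weights=[c.weight for c in components], k=1)[0]
    return rng.gauss(chosen.mean, chosen.std)


def transition_components(state: float) -> list[Component]:
    """Hypothetical two-regime transition: the state moves up or down by one unit."""
    return [Component(0.5, state + 1.0, 0.2), Component(0.5, state - 1.0, 0.2)]


rng = random.Random(0)
components = transition_components(state=0.0)
samples = [sample_next_state(components, rng) for _ in range(1000)]
print(mixture_pdf(1.0, components), mixture_pdf(0.0, components))
print(sum(1 for s in samples if s > 0), "of", len(samples), "samples moved up")
```

The density is high near both +1 and -1 and close to zero at the midpoint, which is exactly the kind of bimodal transition that a single-Gaussian model would smear into one broad peak.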
