ICML 2024: Paper Review #7

24 September 2024
  • Quantitative Research

Machine Learning (ML) is a fast-evolving discipline, so attending conferences and hearing about the very latest research is key to the ongoing development and success of our quantitative researchers and ML engineers.

In this paper review series, our ICML 2024 attendees reveal the research and papers they found most interesting.

Here, discover the perspectives of Ingmar, one of our quantitative researchers, as he discusses his most compelling findings from the conference.

Offline Actor-Critic Reinforcement Learning Scales to Large Models

Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller

Large-scale models for policy learning in control and robotics have shown impressive multi-task and generalisation capabilities in recent years, but policy learning in the generalist large-model regime has so far relied mostly on Behaviour Cloning (BC), which requires near-optimal demonstrations during training. This work demonstrates the benefits of large-scale models for offline RL.

The key contribution is an offline actor-critic algorithm that allows the RL and BC loss terms to be traded off smoothly. This is combined with a scalable transformer-based multi-modal architecture representing both the policy and the value function. The experiments include a scaling analysis, comparisons to strong BC baselines such as Gato (Reed et al., 2022) [1] and RoboCat (Bousmalis et al., 2023) [2] for pre-training, and an analysis of fine-tuning with the critic.

[1] A Generalist Agent

[2] RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
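The RL/BC trade-off can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name, the single interpolation weight `alpha`, and the simple advantage-weighted RL term are all assumptions made for illustration.

```python
def offline_ac_loss(logp_actions, advantages, alpha):
    """Interpolate between an advantage-weighted RL term (alpha = 1)
    and a pure behaviour-cloning term (alpha = 0) in a single loss.

    logp_actions: log-probabilities the policy assigns to dataset actions.
    advantages:   critic-estimated advantages for those actions.
    alpha:        trade-off weight in [0, 1].
    """
    n = len(logp_actions)
    # RL term: log-probs weighted by critic advantages (policy-gradient style).
    rl = -sum(a * lp for a, lp in zip(advantages, logp_actions)) / n
    # BC term: plain negative log-likelihood of the demonstrated actions.
    bc = -sum(logp_actions) / n
    return alpha * rl + (1.0 - alpha) * bc
```

With `alpha = 0` the advantages are ignored and the loss reduces to behaviour cloning; increasing `alpha` lets the critic reweight which dataset actions the policy imitates.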


Information-Directed Pessimism for Offline Reinforcement Learning

Alec Koppel, Sujay Bhatt, Jiacheng Guo, Joe Eappen, Mengdi Wang, Sumitra Ganesh

In the offline reinforcement learning setting, this paper introduces a new type of penalty to restrict the mismatch between the offline data distribution and the online policy-induced distribution. Because of its interpretation as Stein information, the authors refer to this as information-directed pessimism.

Importantly, this allows the next-state distribution to be represented as a mixture of distributions, permitting explicitly multi-modal state transition functions. Among other results, the authors demonstrate improved performance of their method on a toy portfolio optimisation problem (Neuneier, 1997). [3]

[3] Enhancing Q-Learning for Optimal Asset Allocation
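A heavily simplified sketch of the idea: a one-step Bellman backup over an explicitly multi-modal (mixture) next-state distribution, with a pessimism penalty subtracted. The function name and signature are assumptions, and the Stein-information-based penalty the paper derives is replaced here by a plain number supplied by the caller.

```python
def pessimistic_backup(reward, gamma, next_state_mixture, value_fn, lam, penalty):
    """One-step pessimistic Bellman backup.

    next_state_mixture: list of (weight, state) pairs with weights
        summing to one -- an explicitly multi-modal next-state model.
    value_fn: estimated state-value function V(s).
    penalty:  stand-in for the data-mismatch term (Stein-information-based
        in the paper; here simply taken as given).
    lam:      pessimism coefficient scaling the penalty.
    """
    # Expectation of V over the mixture of next-state modes.
    expected_value = sum(w * value_fn(s) for w, s in next_state_mixture)
    # Subtracting the penalty makes the backup pessimistic where the
    # offline data poorly covers the policy-induced distribution.
    return reward + gamma * expected_value - lam * penalty
```

The mixture representation is the point of interest: each `(weight, state)` mode is valued separately, so a bimodal transition (e.g. a market that jumps up or down) is not collapsed to its mean.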


Read more of our quantitative researchers' thoughts

ICML 2024: Paper Review #1

Discover the perspectives of Yousuf, one of our machine learning engineers, on the following papers:

  • Arrows of Time for Large Language Models
  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Read now
ICML 2024: Paper Review #2

Discover the perspectives of Danny, one of our machine learning engineers, on the following papers:

  • Compute Better Spent: Replacing Dense Layers with Structured Matrices
  • Emergent Equivariance in Deep Ensembles
Read now
ICML 2024: Paper Review #3

Discover the perspectives of Jonathan, one of our software engineers, on the following papers:

  • A Universal Class of Sharpness-Aware Minimization Algorithms
  • Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Read now
ICML 2024: Paper Review #4

Discover the perspectives of Evgeni, one of our senior quantitative researchers, on the following papers:

  • Trained Random Forests Completely Reveal your Dataset
  • Test-of-time Award: DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
Read now
ICML 2024: Paper Review #5

Discover the perspectives of Michael, one of our Scientific Directors, on the following papers:

  • Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
  • Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
Read now
ICML 2024: Paper Review #6

Discover the perspectives of Fabian, one of our senior quantitative researchers, on the following papers:

  • I/O Complexity of Attention, or How Optimal is Flash Attention?
  • Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff
Read now
ICML 2024: Paper Review #8

Discover the perspectives of Oliver, one of our quantitative researchers, on the following papers:

  • Better & Faster Large Language Models via Multi-token Prediction
  • Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Read now
