ICML 2024: Paper Review #5

24 September 2024
  • Quantitative Research

Machine Learning (ML) is a fast-evolving discipline, which means attending conferences and hearing about the very latest research are key to the ongoing development and success of our quantitative researchers and ML engineers.

In this paper review series, our ICML 2024 attendees reveal the research and papers they found most interesting.

Here, discover the perspectives of Michael, our Scientific Director, as he discusses his most compelling findings from the conference.

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taiga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

Scaling up models has been less straightforward in Reinforcement Learning than in supervised settings, where it has been overwhelmingly successful. Previous work has pointed out several examples where increasing model capacity or the number of training iterations eventually starts to decrease performance.

This paper investigates replacing the usual MSE loss on the scalar value function with a cross-entropy loss, after quantizing the value range into a number of bins. A previous approach distributed the probability mass over the two closest bins such that the expectation matches the original value (“Two-Hot” encoding). The authors propose smearing the target with a Gaussian and initially attempt to adapt its width to the bin size (essentially a form of “Six-Hot” encoding), but discover that the absolute variance of the Gaussian, rather than the number of bins in its effective support, is the relevant hyperparameter.
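
To make the target constructions concrete, here is a minimal Python sketch (not the authors’ implementation) of Two-Hot targets, Gaussian-smeared histogram targets and the cross-entropy loss over the bins; the value range, number of bins and Gaussian width below are illustrative assumptions.

# Minimal sketch of classification targets for a scalar value
# (illustrative hyperparameters; not the paper's code). Requires numpy and scipy.
import numpy as np
from scipy.special import log_softmax
from scipy.stats import norm

V_MIN, V_MAX, NUM_BINS = -10.0, 10.0, 51
edges = np.linspace(V_MIN, V_MAX, NUM_BINS + 1)   # bin boundaries
centers = 0.5 * (edges[:-1] + edges[1:])          # bin centres

def two_hot(value: float) -> np.ndarray:
    """Spread mass over the two nearest bin centres so the expectation equals `value`."""
    value = float(np.clip(value, centers[0], centers[-1]))
    upper = int(np.searchsorted(centers, value))
    lower = max(upper - 1, 0)
    probs = np.zeros(NUM_BINS)
    if upper == lower:
        probs[lower] = 1.0
    else:
        gap = centers[upper] - centers[lower]
        probs[lower] = (centers[upper] - value) / gap
        probs[upper] = (value - centers[lower]) / gap
    return probs

def gauss_hist(value: float, sigma: float = 0.75) -> np.ndarray:
    """Smear `value` with a Gaussian and integrate its density over each bin."""
    cdf = norm.cdf(edges, loc=value, scale=sigma)
    probs = np.diff(cdf)
    return probs / probs.sum()   # renormalise mass falling outside the value range

def cross_entropy(target: np.ndarray, logits: np.ndarray) -> float:
    """Cross-entropy between a constructed target and the network's bin logits."""
    return float(-(target * log_softmax(logits)).sum())

# Example: the same scalar value under both encodings, scored against dummy logits.
logits = np.zeros(NUM_BINS)
print(cross_entropy(two_hot(2.3), logits), cross_entropy(gauss_hist(2.3), logits))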

The results demonstrate improved performance over alternative distributional approaches as well as standard regression, and achieve monotonic scaling in cases where regression did not. An ablation demonstrates that the cross-entropy loss itself is critical, rather than merely lifting the representation from a scalar to a distribution.


Physics of Language Models: Part 3.1, Knowledge Storage and Extraction

Zeyuan Allen-Zhu, Yuanzhi Li

This paper is part of a series on how large language models (LLMs) acquire various capabilities, using controlled experiments with purely synthetic data.

It studies the conditions under which LLMs learn knowledge in a way that, for example, allows them to answer questions, rather than just repeat sentences verbatim.

One key finding is that a fact must appear in the training data in multiple variations (for example, sentence permutations or translations). Understanding of a fact about an entity X is demonstrated not only through the model’s ability to answer questions about X, but also through probing, which shows that such facts are directly associated with X only when they appear with variations; otherwise the association is with the whole sentence.

The paper also shows that instruction fine-tuning with questions cannot recover knowledge that hasn’t been presented with variations during pretraining. Conversely, it shows that adding questions during pretraining can improve performance. The authors therefore recommend this approach, along with augmenting the pretraining data with sentences rewritten by an auxiliary model.
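
As a toy illustration of that recipe, the following Python sketch (not the authors’ dataset or code; the entity, templates and helper names are invented) generates several rewritten variations of a synthetic fact and mixes in a question-answer pair for pretraining.

# Toy sketch of the augmentation recipe: each synthetic fact appears in several
# rewritten variations, plus a QA pair mixed into the pretraining corpus.
# All names and templates here are invented for illustration.
fact = {"name": "Anya Ridley", "birth_city": "Bristol"}

statement_templates = [
    "{name} was born in {birth_city}.",
    "The birth city of {name} is {birth_city}.",
    "{birth_city} is the city where {name} was born.",
]

qa_template = "Question: Where was {name} born? Answer: {birth_city}"

def pretraining_examples(fact: dict) -> list[str]:
    """Return the fact in multiple variations plus a QA pair for the pretraining mix."""
    examples = [template.format(**fact) for template in statement_templates]
    examples.append(qa_template.format(**fact))
    return examples

print("\n".join(pretraining_examples(fact)))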

The controlled “physics-like” approach is a fascinating way to study the behaviour of LLMs.


Read more of our quantitative researchers’ thoughts

ICML 2024: Paper Review #1

Discover the perspectives of Yousuf, one of our machine learning engineers, on the following papers:

  • Arrows of Time for Large Language Models
  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Read now
ICML 2024: Paper Review #2

Discover the perspectives of Danny, one of our machine learning engineers, on the following papers:

  • Compute Better Spent: Replacing Dense Layers with Structured Matrices
  • Emergent Equivariance in Deep Ensembles
Read now
ICML 2024: Paper Review #3

Discover the perspectives of Jonathan, one of our software engineers, on the following papers:

  • A Universal Class of Sharpness-Aware Minimization Algorithms
  • Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Read now
ICML 2024: Paper Review #4

Discover the perspectives of Evgeni, one of our senior quantitative researchers, on the following papers:

  • Trained Random Forests Completely Reveal your Dataset
  • Test-of-time Award: DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
Read now
ICML 2024: Paper Review #6

Discover the perspectives of Fabian, one of our senior quantitative researchers, on the following papers:

  • I/O Complexity of Attention, or How Optimal is Flash Attention?
  • Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff
Read now
ICML 2024: Paper Review #7

Discover the perspectives of Ingmar, one of our quantitative researchers, on the following papers:

  • Offline Actor-Critic Reinforcement Learning Scales to Large Models
  • Information-Directed Pessimism for Offline Reinforcement Learning
Read now
ICML 2024: Paper Review #8

Discover the perspectives of Oliver, one of our quantitative researchers, on the following papers:

  • Better & Faster Large Language Models via Multi-token Prediction
  • Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Read now
