
ICML 2024: Paper Review #3

24 September 2024
  • Quantitative Research

Machine Learning (ML) is a fast-evolving discipline, so attending conferences and hearing about the very latest research are key to the ongoing development and success of our quantitative researchers and ML engineers.

In this paper review series, our ICML 2024 attendees reveal the research and papers they found most interesting.

Here, discover the perspectives of Jonathan, one of our software engineers, as he discusses his most compelling findings from the conference.

A Universal Class of Sharpness-Aware Minimization Algorithms

Behrooz Tahmasebi, Ashkan Soleymani, Dara Bahri, Stefanie Jegelka, Patrick Jaillet

Sharpness-Aware Minimization (SAM) (Foret et al., 2021) is an optimisation procedure that aims to improve the generalisation of trained models by biasing training towards flatter minima in the loss landscape. [1]

Specifically, SAM works by minimising the maximum loss L(x) within a local neighbourhood of the model parameters. This has been shown to be equivalent to minimising the largest eigenvalue of the Hessian matrix H(x) on the zero-loss manifold. A variant of this uses the average loss instead of the maximum – this corresponds to minimising the trace of H(x) (and thus the average eigenvalue). These functions of the Hessian can be considered measures of sharpness.
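As a rough illustration of the mechanics (not the authors' code; the function and variable names below are my own), the practical SAM update approximates the inner maximisation with a single gradient-ascent step of radius rho before taking the usual descent step:

```python
import numpy as np

def sam_step(w, loss_grad, lr=0.1, rho=0.05):
    """Simplified SAM update: ascend to an approximate worst-case point in an
    l2-ball of radius rho around w, then descend using the gradient there."""
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order worst-case perturbation
    return w - lr * loss_grad(w + eps)           # descent step with the "sharp" gradient

# Toy quadratic loss with one sharp and one flat direction: L(w) = 0.5 * w^T A w
A = np.diag([10.0, 0.1])
w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, lambda x: A @ x)
```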

However, these particular sharpness measures have shortcomings. For instance, a loss of L(a, b) = a² − b² has tr(H) = 0 everywhere, so the average-based sharpness objective becomes meaningless. The authors also provide examples where L(x) is scale-invariant but the sharpness measures are not.
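Working the saddle example through (my own arithmetic, using the definitions above): the Hessian of L(a, b) = a² − b² is constant, and its trace vanishes even though the curvature does not.

```latex
H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, \qquad
\operatorname{tr}(H) = 2 + (-2) = 0, \qquad
\lambda_{\max}(H) = 2, \qquad
\|H\|_F = \sqrt{2^2 + (-2)^2} = 2\sqrt{2}.
```

So an average-eigenvalue (trace) measure reports zero sharpness at the saddle, while maximum-eigenvalue or Frobenius-norm measures do not.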

Using the average-based objective as a starting point, the authors define a generalised class of sharpness measures S(x) that are functions of the Hessian. They then prove that this class of sharpness measures is universal for functions of Hessian eigenvalues, as well as for arbitrary functions of the Hessian.

They then provide an objective function that only relies on zeroth-order information about the training loss, with an explicit bias towards minimising S(x), as well as the full generalised SAM algorithm. They provide the form of the original maximum and average-based measures under this parameterisation, as well as new measures which use the Frobenius norm and determinant of the Hessian to address the aforementioned problems with saddle points and scale-invariance.
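To see how the different measures behave numerically, here is a small sketch (my own, not the authors' parameterisation) that evaluates a few Hessian-based sharpness measures on the saddle from the previous example:

```python
import numpy as np

def sharpness_measures(H):
    """Evaluate a few Hessian-based sharpness measures. Which of these the
    paper's generalised class uses, and how it normalises them, follows the
    authors' parameterisation -- this is purely illustrative."""
    eigs = np.linalg.eigvalsh(H)
    return {
        "max eigenvalue (worst-case SAM)": eigs.max(),
        "average eigenvalue (average-based SAM)": eigs.mean(),
        "Frobenius norm": np.linalg.norm(H, "fro"),
        "|determinant|": abs(np.linalg.det(H)),
    }

H_saddle = np.array([[2.0, 0.0], [0.0, -2.0]])  # Hessian of a^2 - b^2
print(sharpness_measures(H_saddle))
# The average eigenvalue is 0 at the saddle, while the Frobenius norm (~2.83)
# and |determinant| (4.0) still register the curvature.
```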

The authors then demonstrate that these SAM variants are competitive with the original ones on various vision datasets, and can outperform them in certain scenarios, e.g. when training data is limited, and in the presence of label noise. It would be interesting to see how these variants compare across different datasets and model architectures.

[1] Foret et al., Sharpness-Aware Minimization for Efficiently Improving Generalization, ICLR 2021.


Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

Atli Kosson, Bettina Messmer, Martin Jaggi

Weight decay and L2 regularisation are widely used techniques in deep learning, but the mechanisms by which they regularise a network are not well understood.

Weight decay is often thought of as an explicit regularisation method. However, this has been shown not to hold in networks with normalisation layers: normalisation can make weight vectors scale-invariant, so the magnitude of the weight vectors does not affect the output of the network.
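A quick way to see this scale-invariance (a minimal sketch, assuming an affine-free batch-norm layer after a linear layer):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))   # a batch of inputs
W = rng.normal(size=(16, 8))     # weights of a linear layer

def linear_batchnorm(X, W, eps=1e-5):
    """Linear layer followed by (affine-free) batch normalisation."""
    Z = X @ W
    return (Z - Z.mean(axis=0)) / np.sqrt(Z.var(axis=0) + eps)

out = linear_batchnorm(X, W)
out_scaled = linear_batchnorm(X, 10.0 * W)   # rescale the weights
print(np.max(np.abs(out - out_scaled)))      # tiny: the output is essentially unchanged
```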

In this paper, the authors break down the update dynamics of various optimisers into magnitude and directional components. They show that the critical effect of weight decay is to control the speed of angular updates to the weight vectors. Gradient updates and weight decay have opposite effects on the norm of the weights, thus bringing the weight vectors into an equilibrium state with fixed expected norm and angular update size.
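The mechanism is easy to reproduce in a toy setting (my own simulation, not from the paper): because the gradient of a scale-invariant weight vector is orthogonal to the weights, plain gradient steps grow the norm while weight decay shrinks it, and both the norm and the per-step rotation drift towards stable values.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=256)
lr, wd = 0.1, 1e-2
norms, angles = [], []

for step in range(2000):
    # Stand-in for the gradient of a scale-invariant weight vector:
    # a random direction projected onto the orthogonal complement of w.
    g = rng.normal(size=w.shape)
    g -= (g @ w) / (w @ w) * w
    w_new = w - lr * (g + wd * w)            # SGD step with weight decay
    cos = (w @ w_new) / (np.linalg.norm(w) * np.linalg.norm(w_new))
    angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    norms.append(np.linalg.norm(w_new))
    w = w_new

# The weight norm and the per-step rotation both settle around fixed values --
# the "rotational equilibrium" the authors describe.
print(f"final norm {norms[-1]:.1f}, mean rotation over last 100 steps {np.mean(angles[-100:]):.4f} rad")
```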

They derive expressions for the update sizes and equilibrium norm for AdamW and Adam+L2, showing that in Adam+L2 this equilibrium depends on the gradient norm, whereas in AdamW it doesn't. AdamW is known empirically to outperform Adam+L2, and the authors suggest that the balanced rotation between layers and neurons under AdamW is the main reason for this. They illustrate the imbalanced rotation in Adam+L2 by plotting the per-layer rotation speed over time, which is visibly far more dispersed than under AdamW.
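For reference, the difference between the two updates in a stripped-down sketch (element-wise NumPy, no schedules): in Adam+L2 the decay term passes through the second-moment scaling, so its effective size depends on the gradient magnitude, whereas in AdamW it does not.

```python
import numpy as np

def adam_like_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-8, wd=1e-2, decoupled=True):
    """One step of Adam with either decoupled weight decay (AdamW) or
    L2 regularisation folded into the gradient (Adam+L2). Simplified sketch."""
    if not decoupled:
        g = g + wd * w                    # Adam+L2: decay is rescaled by 1/sqrt(v_hat) below
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    step = m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        step = step + wd * w              # AdamW: decay bypasses the gradient-dependent scaling
    return w - lr * step, m, v
```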

The authors propose rotational variants (RVs) of AdamW and SGDM, which explicitly control the size of the angular updates while keeping the weight norms constant. They demonstrate performance parity with the baseline optimisers, and show that the RVs eliminate the need for learning rate warmup (which is often required to counteract fast rotation at the start of training), as well as making the dynamics more robust to normalisation or the lack thereof.
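A heavily simplified sketch of the idea behind the rotational variants (my own reading of the method; the real RVs control the average rotation to match the equilibrium rate, typically per neuron or per layer, rather than fixing every single step): discard the radial component of the optimiser's update, rotate the weights by a controlled angle in the remaining direction, and restore the original norm.

```python
import numpy as np

def rotational_step(w, update, target_angle=0.01):
    """Hypothetical per-step sketch: rotate w by `target_angle` radians in the
    direction suggested by `update`, keeping ||w|| exactly constant."""
    radial = (update @ w) / (w @ w) * w
    tangent = update - radial                        # drop the norm-changing component
    tangent_dir = tangent / (np.linalg.norm(tangent) + 1e-12)
    norm_w = np.linalg.norm(w)
    w_new = w + np.tan(target_angle) * norm_w * tangent_dir
    return w_new / np.linalg.norm(w_new) * norm_w    # restore the original norm
```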


