
NeurIPS Paper Reviews 2023 #1

19 January 2024
  • Quantitative Research

Our team of quantitative researchers has shared the most interesting research presented during workshops and seminars at NeurIPS 2023.

Discover the perspectives of machine learning engineer Danny as he discusses his most compelling findings from the conference.

The G-Research booth at NeurIPS 2022

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Alicia Curth, Alan Jeffares, Mihaela van der Schaar

The double descent hypothesis is a relatively recent idea that attempts to reconcile the “bigger is better” practice of modern machine learning with the bias-variance trade-off. It states that in the overparameterized regime, the traditional U-shaped curve of test error against model complexity breaks down, and generalization performance can keep improving as the model parameter count increases. This regime is referred to as the interpolation region, where the number of model parameters is greater than or equal to the training set size.

In this paper, the authors revisit the results from the original Belkin et al. paper from 2019, which observed double descent for Random Fourier Feature regression, decision tree ensembles and gradient boosted trees. They claim that in each of these cases, model complexity is increased along multiple axes (for example, splits per tree and number of trees for a tree ensemble), and the double descent appears as an artefact of switching between these axes while increasing model complexity, rather than as a result of crossing the interpolation threshold (number of model parameters equal to the training set size). When test error is plotted against increasing model complexity along any single axis, the traditional U-shaped bias-variance curve is recovered.
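As an illustration of sweeping a single complexity axis past the interpolation threshold, here is a minimal Random Fourier Feature regression sketch using a minimum-norm least-squares fit. All sizes, the target function and hyperparameters are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task (sizes are illustrative, not from the paper).
n_train, n_test = 40, 200
x_tr = rng.uniform(-1, 1, n_train)
x_te = rng.uniform(-1, 1, n_test)
f = lambda x: np.sin(4 * x)
y_tr = f(x_tr) + 0.1 * rng.standard_normal(n_train)
y_te = f(x_te)

def rff(x, omega, b):
    """Random Fourier Features: cos(omega * x + b)."""
    return np.cos(np.outer(x, omega) + b)

test_errors = {}
# Feature counts span the interpolation threshold (n_feat == n_train == 40).
for n_feat in [5, 20, 40, 80, 400]:
    omega = rng.normal(0, 4, n_feat)
    b = rng.uniform(0, 2 * np.pi, n_feat)
    Phi_tr, Phi_te = rff(x_tr, omega, b), rff(x_te, omega, b)
    # pinv gives the least-squares fit when underparameterized and the
    # minimum-norm interpolating solution when overparameterized.
    w = np.linalg.pinv(Phi_tr) @ y_tr
    test_errors[n_feat] = float(np.mean((Phi_te @ w - y_te) ** 2))

print(test_errors)
```

Plotting `test_errors` against the feature count is the kind of single-axis sweep the authors argue should be distinguished from sweeps that mix several complexity axes.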

They go on to interpret each of these models as a “smoother” from the classical statistical literature, which allows them to derive an effective number of parameters for each model. They then reproduce and re-plot the results from the original paper against the effective parameter count and recover the U-shaped curve in all cases. The obvious omission is an investigation of the “deep double descent” case, which is suggested as the next direction for this work.
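For a linear smoother, predictions take the form ŷ = Sy, and the effective number of parameters is the trace of S. A minimal ridge-regression example of this idea (dimensions and penalties are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear smoother predicts y_hat = S @ y; its effective number of
# parameters is trace(S). For ridge regression with penalty lam,
#   S = X (X^T X + lam * I)^{-1} X^T.
n, p = 50, 30
X = rng.standard_normal((n, p))

def effective_params(X, lam):
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    return float(np.trace(S))

# As lam -> 0 the effective parameter count approaches p; increasing
# lam shrinks it, even though the raw parameter count stays fixed at p.
for lam in [1e-8, 1.0, 100.0]:
    print(lam, effective_params(X, lam))
```

This gap between raw and effective parameter counts is what lets the authors re-plot the original double descent curves on a single, well-behaved complexity axis.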


Normalization Layers Are All That Sharpness-Aware Minimization Needs

Maximilian Müller, Tiffany Vlaar, David Rolnick, Matthias Hein

Sharpness-Aware Minimization (SAM) is a technique that attempts to improve generalization performance by seeking flatter minima: it minimizes the worst-case sharpness of the training loss L(w) within a neighbourhood of the current weights w.

In this paper, the authors introduce SAM-ON (SAM-OnlyNorm), which applies SAM to the normalization parameters of a network only. They find that SAM-ON achieves better generalization performance than the original SAM method (applied to all parameters) on ResNet architectures with BatchNorm and Vision Transformers with LayerNorm.
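A minimal sketch of the SAM-ON idea, computing the SAM ascent perturbation only for normalization parameters. The parameter names, shapes and the name-matching rule below are hypothetical illustrations, not from the paper:

```python
import numpy as np

# Hypothetical parameter dictionary for a small network; only the
# normalization-layer parameters (names containing "norm") receive the
# SAM perturbation under SAM-ON.
params = {
    "conv1.weight": np.ones((3, 3)),
    "norm1.scale": np.ones(3),
    "norm1.bias": np.zeros(3),
    "fc.weight": np.ones((3, 2)),
}

def sam_on_perturbation(params, grads, rho=0.05):
    """SAM ascent step restricted to normalization parameters."""
    norm_keys = [k for k in params if "norm" in k]
    # Gradient norm is taken over the selected subset only.
    total = np.sqrt(sum(float(np.sum(grads[k] ** 2)) for k in norm_keys))
    eps = {k: np.zeros_like(v) for k, v in params.items()}
    for k in norm_keys:
        eps[k] = rho * grads[k] / (total + 1e-12)
    return eps

grads = {k: np.ones_like(v) for k, v in params.items()}
eps = sam_on_perturbation(params, grads)
print({k: float(np.abs(v).sum()) for k, v in eps.items()})
```

All other parameters are left unperturbed and would be updated with the ordinary gradient, which is what makes SAM-ON cheaper per step than full SAM.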

They investigate this further by measuring loss sharpness for both SAM and SAM-ON and find that SAM-ON actually finds regions with sharper minima, despite exhibiting better generalization performance. This supports claims from previous work that the generalization performance of SAM is not solely due to it finding flatter minima.



Read more of our quantitative researchers' thoughts

NeurIPS Paper Reviews 2023 #2

Discover the perspectives of Paul, one of our quantitative researchers, on the following papers:

  • Sharpness-Aware Minimization Leads to Low-Rank Features
  • When Do Neural Nets Outperform Boosted Trees on Tabular Data?
NeurIPS Paper Reviews 2023 #3

Discover the perspectives of Szymon, one of our quantitative researchers, on the following papers:

  • Convolutional State Space Models for Long-Range Spatiotemporal Modeling
  • How to Scale Your EMA
NeurIPS Paper Reviews 2023 #4

Discover the perspectives of Dustin, our scientific director, on the following papers:

  • Abide by the law and follow the flow: conservation laws for gradient flows
  • The Tunnel Effect: Building Data Representations in Deep Neural Networks
NeurIPS Paper Reviews 2023 #5

Discover the perspectives of Laurynas, one of our machine learning engineers, on the following papers:

  • Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
  • QLoRA: Efficient Finetuning of Quantized LLMs
NeurIPS Paper Reviews 2023 #6

Discover the perspectives of Rui, one of our quantitative analysts, on the following papers:

  • Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting
  • Conformal Prediction for Time Series with Modern Hopfield Networks
