Understanding and Mitigating Numerical Sources of Non-determinism in LLM Inference

Jiayi Yuan, Hao Li, Xinheng Ding, Wenya Xie, Yu-Jhe Li, Wentian Zhao, Kun Wan, Jing Shi, Xia Hu, Zirui Liu
This paper sheds light on a commonly observed problem: non-determinism in transformer inference. While randomness is usually attributed to random sampling, the paper demonstrates that non-determinism in floating-point arithmetic is a significant factor as well.
The paper shows that the non-determinism is mostly attributable to the non-associativity of floating-point addition: (a + b) + c is not necessarily numerically equal to a + (b + c), despite the two being mathematically equivalent. GPU execution does not, in general, guarantee the order of summation, so it is difficult to maintain deterministic inference while keeping GPU execution performant.
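To make the non-associativity concrete, here is a minimal, self-contained Python illustration (mine, not from the paper): the same values summed in different orders produce different floating-point results.

```python
import random

# Non-associativity: (a + b) + c != a + (b + c) in floating point.
a, b, c = 0.1, 1e16, -1e16
print((a + b) + c)  # 0.0  -- 0.1 is absorbed when added to 1e16
print(a + (b + c))  # 0.1  -- the large terms cancel exactly first

# The same effect at scale: summation order changes the result,
# analogous to a GPU reduction whose order varies between runs.
vals = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
total_forward = sum(vals)
random.shuffle(vals)  # a different reduction order
total_shuffled = sum(vals)
print(total_forward == total_shuffled)      # typically False
print(abs(total_forward - total_shuffled))  # small but nonzero
```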
Beyond the LLM world, this is problematic because flash-attention-based models are non-deterministic, which means results are not reproducible, and running paired comparison tests between model variants with the same seed becomes more difficult. The paper proposes a LayerCast method that casts each layer's weights to higher precision at compute time, sketched below. They show that this greatly mitigates the issue in practice, though it does not eliminate it entirely.
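A rough sketch of the idea, assuming a PyTorch-style layer (the class and its names are illustrative, not the paper's implementation): weights stay in 16-bit storage, but are upcast to FP32 just before the matrix multiply.

```python
import torch

class LayerCastLinear(torch.nn.Module):
    """Illustrative linear layer: 16-bit storage, FP32 compute."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Keep the memory footprint of bfloat16 weights...
        self.weight = torch.nn.Parameter(
            torch.randn(out_features, in_features, dtype=torch.bfloat16)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ...but upcast to FP32 at compute time. The extra mantissa
        # bits shrink the rounding differences caused by varying
        # summation orders, so outputs diverge far less run to run.
        return x.float() @ self.weight.float().t()
```

The trade-off is a cast per layer at compute time in exchange for FP32-like numerical stability without paying for FP32-sized weights in memory.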