Bounded Determinism: A Framework for Analyzing the Operational, Functional, and Architectural Limits of LLM Inference

Serhii Melnyk

Authors

Serhii Melnyk, Senior Lead Software Engineer, NC, USA

Keywords:

Nondeterminism, Large Language Models (LLMs), Batch Invariance, Reproducibility

Abstract

A recent paper from Thinking Machines Lab (TML), "Defeating Nondeterminism in LLM Inference," offers a new perspective on why nondeterministic outputs are so prevalent in Large Language Models (LLMs) configured for deterministic behavior.[1] This nondeterminism undermines reliability, complicates testing, and hinders scientific reproducibility, with studies reporting accuracy variations of up to 15% across identical runs.[2] This paper's primary contribution is to analyze the TML findings through a novel three-part framework that categorizes the boundaries of any determinism solution as: (1) an operational boundary (reproducibility is local to a specific hardware/software stack); (2) a functional boundary (it applies only to greedy decoding, not generative sampling); and (3) an architectural boundary (it does not solve nondeterminism in distributed, multi-GPU systems). The analysis argues that the TML work provides a critical engineering trade-off for reproducibility rather than a complete solution to nondeterminism. By situating the TML work within the proposed framework, the paper clarifies what is practically achievable and what is fundamentally impossible in the pursuit of deterministic AI.
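The kind of nondeterminism at issue can be illustrated with a minimal sketch. It assumes the mechanism discussed in the cited work [1]: floating-point addition is not associative, so a change in reduction order (for example, one triggered by a different batch size) can perturb a logit by one unit in the last place and flip a greedy (argmax) decision. The values and the helper names `sum_left_to_right`, `sum_right_to_left`, and `greedy_token` are hypothetical and chosen only for illustration; they are not taken from the TML implementation.

```python
from functools import reduce


def sum_left_to_right(values):
    """Accumulate as ((v0 + v1) + v2) + ..."""
    return reduce(lambda acc, v: acc + v, values)


def sum_right_to_left(values):
    """Accumulate as v0 + (v1 + (v2 + ...))."""
    return reduce(lambda acc, v: v + acc, reversed(values))


def greedy_token(logits):
    """Greedy decoding: index of the largest logit (first index wins ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical partial sums that a kernel might combine in either order.
partials = [0.1, 0.2, 0.3]
competitor_logit = 0.6  # logit of a competing token, held fixed

logit_a = sum_left_to_right(partials)   # (0.1 + 0.2) + 0.3
logit_b = sum_right_to_left(partials)   # 0.1 + (0.2 + 0.3)

token_a = greedy_token([competitor_logit, logit_a])
token_b = greedy_token([competitor_logit, logit_b])

print(logit_a, logit_b)   # 0.6000000000000001 0.6
print(token_a, token_b)   # 1 0
```

The same inputs yield different greedy tokens solely because the additions were grouped differently. This is a toy case, not a measurement of any real model: in practice the perturbation arises inside GPU kernels, and whether it changes the output depends on how close the competing logits are.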

References

[1] H. He and Thinking Machines Lab, "Defeating Nondeterminism in LLM Inference," Thinking Machines Lab: Connectionism, Sep. 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

[2] B. Atil, "Non-Determinism of 'Deterministic' LLM Settings," arXiv:2408.04667 [cs.CL], Aug. 2024 (updated Apr. 2025). [Online]. Available: https://arxiv.org/abs/2408.04667

[3] A. Sedova, G. Sivaraman, M. Coletti, W. Elwasif, M. Smith, and O. Hernandez, "Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications," 2024. [Online]. Available: https://www.researchgate.net/publication/383037277_Impacts_of_floating-point_non-associativity_on_reproducibility_for_HPC_and_deep_learning_applications

[4] J. Yuan, H. Li, X. Ding, et al., "Understanding and Mitigating Numerical Sources of Nondeterminism in LLM Inference," arXiv:2506.09501 [cs.CL], Jun. 2025 (updated Oct. 2025). [Online]. Available: https://arxiv.org/abs/2506.09501

[5] S. Shanmugavelu, M. Taillefumier, C. Culver, et al., "Impacts of Floating-Point Non-Associativity on Reproducibility for HPC and Deep Learning Applications," in Proceedings of SC24 Workshops (SCW24), 2024. doi: 10.1109/SCW63240.2024.00028.

[6] "Towards Deterministic Inference in SGLang and Reproducible RL Training," LMSYS Blog, Sep. 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://lmsys.org/blog/2025-09-22-sglang-deterministic/

[7] Kubiya, "What is Deterministic AI: Concepts, Benefits, and Its Role in Building Reliable AI Agents (2025 Guide)," 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://www.kubiya.ai/blog/what-is-deterministic-ai

[8] NVIDIA, "cuBLAS Library," in CUDA Toolkit Documentation, version 13.0, 2025. Accessed: Nov. 11, 2025. [Online]. Available: https://docs.nvidia.com/cuda/cublas/index.html

[9] S. Troshin, "Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs," arXiv:2510.01218 [cs.CL], Sep. 2025. [Online]. Available: https://arxiv.org/abs/2510.01218

[10] B. Siklósi, G. R. Mudalige, and I. Z. Reguly, "Enabling Bitwise Reproducibility for the Unstructured Computational Motif," Applied Sciences, vol. 14, no. 2, p. 639, 2024. doi: 10.3390/app14020639.

[11] TensorFlow Team, "Reproducible Training," 2023. Accessed: Nov. 11, 2025. [Online]. Available: https://www.tensorflow.org/guide/random_numbers#determinism

[12] PyTorch Core Team, "Reproducibility," 2024. Accessed: Nov. 11, 2025. [Online]. Available: https://pytorch.org/docs/stable/notes/randomness.html
