
A deep dive into why we can treat dependent variables as independent when using Lagrange multipliers, and how the multiplier absorbs the constraint's effects.
Nov 17, 2025

A comprehensive mathematical derivation of the weight absorption mechanism in DeepSeek's Multi-head Latent Attention (MLA), explaining how MLA shrinks the KV cache by 57× relative to standard multi-head attention while maintaining performance.
Oct 11, 2025