Writings
Collected thoughts and quiet observations
✦
Essays & notes
- Mixed Precision and ZeRO: Training Large Models Without Running Out of Memory
  A note on mixed precision, high-precision parameter copies, and ZeRO.
  11 Feb 2025
- Weight-Tying: Gentle Read–Write Symmetry in Language Models
  How a single embedding table serves as both reader and writer in language models, with forward and backward passes and the split of gradients.
  9 Feb 2025
✦