- Bayesian Decision Theory for Your Projects and, Honestly, Life
- Unfolding Attention: How Log-Space, Semirings, and Separability Reveal a Path to Linear-Time Transformers
- Rethinking Attention Through Semirings: Toward Linear-Time Log-Domain Transformers
- Formulations of Neural Net Weight Initializations
- Derivations of Gradients w.r.t. Neural Networks