Monthly Archives: December 2025

Self-Attention in Transformers

Problem: Explain how the transformer architecture works at a mathematical level (e.g., as outlined in the Attention Is All You Need paper). Solution: \[\Delta\mathbf x_i=V\,\text{softmax}\left(\frac{K^T\mathbf q_i}{\sqrt{n_{qk}}}\right)\] where \(K=(\mathbf k_1,\dots,\mathbf k_N)\in\mathbf R^{n_{qk}\times N}\) and \(V=(\mathbf v_1,\dots,\mathbf v_N)\in\mathbf R^{n_e\times N}\) are the key and value matrices. …
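
A minimal JAX sketch of the update above, assuming toy dimensions; the names `n_qk`, `n_e`, and `N` follow the notation in the excerpt, and the random inputs are placeholders rather than anything from the post:

```python
import jax.numpy as jnp
from jax import random, nn

def attention_update(q_i, K, V):
    """Compute Delta x_i = V softmax(K^T q_i / sqrt(n_qk)).

    q_i : (n_qk,)   query vector
    K   : (n_qk, N) key matrix, columns k_1..k_N
    V   : (n_e, N)  value matrix, columns v_1..v_N
    """
    n_qk = q_i.shape[0]
    scores = K.T @ q_i / jnp.sqrt(n_qk)   # (N,) scaled dot products
    weights = nn.softmax(scores)          # (N,) attention weights, sum to 1
    return V @ weights                    # (n_e,) weighted sum of values

# Hypothetical toy sizes for illustration only.
key = random.PRNGKey(0)
k1, k2, k3 = random.split(key, 3)
n_qk, n_e, N = 4, 8, 5
q_i = random.normal(k1, (n_qk,))
K = random.normal(k2, (n_qk, N))
V = random.normal(k3, (n_e, N))
print(attention_update(q_i, K, V).shape)  # (8,)
```

Note the dimension check: \(K^T\mathbf q_i\in\mathbf R^N\), the softmax keeps that shape, and multiplying by \(V\) returns a vector in \(\mathbf R^{n_e}\), matching \(\Delta\mathbf x_i\).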


JAX Fundamentals (Part 1)

Problem: What is JAX? Solution: JAX = Autograd + XLA, where Autograd refers to automatic differentiation, and XLA refers to Accelerated Linear Algebra, a compiler developed by Google that optimizes code to run fast on GPUs/TPUs. At a high level, …
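
A minimal sketch of what "Autograd + XLA" means in practice, using only standard `jax` calls (`jax.grad` for differentiation, `jax.jit` for XLA compilation); the function `f` here is an arbitrary example, not one from the post:

```python
import jax
import jax.numpy as jnp

def f(x):
    # An arbitrary scalar-valued function of a vector input.
    return jnp.sum(jnp.sin(x) ** 2)

# Autograd: build the gradient of f automatically.
df = jax.grad(f)

# XLA: compile the gradient function for fast execution on CPU/GPU/TPU.
df_fast = jax.jit(df)

x = jnp.arange(3.0)
print(df(x))       # analytically, 2*sin(x)*cos(x) = sin(2x)
print(df_fast(x))  # same values, now XLA-compiled
```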
