Monthly Archives: December 2025

Self-Attention in Transformers

Problem: Explain how the transformer architecture works at a mathematical level (e.g., as outlined in the Attention Is All You Need paper). Solution: \[\Delta\mathbf x_i=V\,\text{softmax}\left(\frac{K^T\mathbf q_i}{\sqrt{n_{qk}}}\right)\] where \(K=(\mathbf k_1,\dots,\mathbf k_N)\in\mathbf R^{n_{qk}\times N}\) and \(V=(\mathbf v_1,\dots,\mathbf v_N)\in\mathbf R^{n_e\times N}\) are the key and value matrices. …
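
A minimal JAX sketch of the update above, assuming toy dimensions; the names `n_qk`, `n_e`, and `N` follow the notation in the excerpt, and the random inputs are placeholders rather than anything from the post:

```python
import jax.numpy as jnp
from jax import random, nn

def attention_update(q_i, K, V):
    """Compute Delta x_i = V softmax(K^T q_i / sqrt(n_qk)).

    q_i : (n_qk,)   query vector
    K   : (n_qk, N) key matrix, columns k_1..k_N
    V   : (n_e, N)  value matrix, columns v_1..v_N
    """
    n_qk = q_i.shape[0]
    scores = K.T @ q_i / jnp.sqrt(n_qk)   # (N,) scaled dot products
    weights = nn.softmax(scores)          # (N,) attention weights, sum to 1
    return V @ weights                    # (n_e,) weighted sum of values

# Hypothetical toy sizes for illustration only.
key = random.PRNGKey(0)
k1, k2, k3 = random.split(key, 3)
n_qk, n_e, N = 4, 8, 5
q_i = random.normal(k1, (n_qk,))
K = random.normal(k2, (n_qk, N))
V = random.normal(k3, (n_e, N))
print(attention_update(q_i, K, V).shape)  # (8,)
```

Note the dimension check: \(K^T\mathbf q_i\in\mathbf R^N\), the softmax keeps that shape, and multiplying by \(V\) returns a vector in \(\mathbf R^{n_e}\), matching \(\Delta\mathbf x_i\).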


JAX Fundamentals (Part 1)

Problem: What is JAX? Solution: JAX = Autograd + XLA, where Autograd refers to automatic differentiation, and XLA refers to Accelerated Linear Algebra, a compiler developed by Google that optimizes code to run fast on GPUs/TPUs. At a high level, …
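
A minimal sketch of what "Autograd + XLA" means in practice, using only standard `jax` calls (`jax.grad` for differentiation, `jax.jit` for XLA compilation); the function `f` here is an arbitrary example, not one from the post:

```python
import jax
import jax.numpy as jnp

def f(x):
    # An arbitrary scalar-valued function of a vector input.
    return jnp.sum(jnp.sin(x) ** 2)

# Autograd: build the gradient of f automatically.
df = jax.grad(f)

# XLA: compile the gradient function for fast execution on CPU/GPU/TPU.
df_fast = jax.jit(df)

x = jnp.arange(3.0)
print(df(x))       # analytically, 2*sin(x)*cos(x) = sin(2x)
print(df_fast(x))  # same values, now XLA-compiled
```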
