The purpose of this post is to provide some examples of perturbation theory calculations in quantum mechanics by analyzing a specific perturbation called the Stark effect. For simplicity, we take as our quantum system a hydrogen atom with the usual non-relativistic two-body Hamiltonian:
and corresponding non-relativistic energy spectrum \(E_n=-\frac{1}{2}\left(\frac{\alpha}{n}\right)^2\mu c^2\). We now apply a constant external electric field \(\textbf E_{\text{ext}}=E_{\text{ext}}\hat{\textbf k}\) along the \(z\)-axis which is not too strong \(0<E_{\text{ext}}\ll e/(4\pi\varepsilon_0r^2_{\text{Bohr}})\) (i.e. much weaker than the proton’s own field at the Bohr radius) and ask how this perturbs the original energy spectrum \(E_n\) (one can of course also calculate how it will perturb the corresponding \(H\)-eigenstates \(|n,\ell,m_{\ell}\rangle\), but for now we won’t look too deeply into that).
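As a quick numerical sanity check of this spectrum (a minimal sketch; the values of \(\alpha\), the electron rest energy, and the proton-electron mass ratio are assumed approximate CODATA inputs):

```python
# Numeric check of E_n = -(1/2) (alpha/n)^2 mu c^2 for hydrogen.
# Assumed (approximate) inputs:
alpha = 1 / 137.035999    # fine-structure constant
me_c2 = 510998.95         # electron rest energy in eV
mp_over_me = 1836.152673  # proton-to-electron mass ratio

# reduced-mass rest energy: mu c^2 = m_e c^2 * m_p/(m_e + m_p)
mu_c2 = me_c2 * mp_over_me / (1 + mp_over_me)

def E_n(n):
    """Non-relativistic hydrogen energy level in eV."""
    return -0.5 * (alpha / n) ** 2 * mu_c2

print(E_n(1))  # ≈ -13.598 eV
print(E_n(2))  # ≈ -3.400 eV
```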
Before delving into the math, it is always helpful to leverage some classical intuition to try and predict what effect the perturbation will have on the various unperturbed \(H\)-eigenstates \(|n,\ell,m_{\ell}\rangle\) as well as their energies \(E_n\). Classically, we expect the proton \(p^+\) in the nucleus to be tugged in the direction of \(\textbf E_{\text{ext}}\) along the positive \(z\)-axis while the electron \(e^-\) is tugged by an equal and opposite force along the negative \(z\)-axis. Thus, for \(H\)-eigenstates where the electron \(e^-\) position space wavefunction \(\langle\textbf x|n,\ell,m_{\ell}\rangle\) was more concentrated “above” the proton \(p^+\), the average electric dipole moment \(\textbf p\) would find itself anti-aligned relative to the electric field \(\textbf E_{\text{ext}}\), leading to a positive energy contribution \(-\textbf p\cdot\textbf E_{\text{ext}}>0\) so that the energy \(E_n\) of that \(H\)-eigenstate is expected to increase (one might object that the Coulomb potential energy \(V\) would decrease; but stretching the atom this way should increase its electric dipole moment \(\textbf p\) and by extension its average Coulomb potential energy \(\langle V\rangle\), hence by the virial theorem also leading on average to an increased energy \(\langle H\rangle =\langle V\rangle/2\)). By a similar logic, \(H\)-eigenstates where the \(e^-\) was predominantly “below” the proton \(p^+\) should experience a decrease in their energy \(E_n\). It remains to be seen if quantum mechanics bears out these classical considerations!
The Stark perturbation \(\Delta H_{\text{Stark}}\) to the original Hamiltonian \(H\) due to the application of the external electric field \(\textbf E_{\text{ext}}\) arises physically from the coupling between the electric dipole moment \(\textbf p\) of the proton-electron pair defining the hydrogen atom and the external electric field \(\textbf E_{\text{ext}}\), i.e. \(\Delta H_{\text{Stark}}=-\textbf p\cdot\textbf E_{\text{ext}}=-(-e\textbf X)\cdot\textbf E_{\text{ext}}=eE_{\text{ext}}X_3\).
Case #1: Nondegenerate Perturbation Theory
The easiest Stark perturbation to compute pertains to the ground state \(|1,0,0\rangle\) of the hydrogen atom. The reason it’s the easiest is simply that there is only one unique ground state in hydrogen (caveat: we are working with non-relativistic quantum mechanics, so we’ll pretend we don’t know about spin!). So for the \(n=1\) ground state there is no degeneracy to worry about (which is decidedly not true for \(n>1\)!). The ground state \(|1,0,0\rangle\) of course initially possesses an unperturbed energy \(E_1\approx -13.6\text{ eV}\). Using the standard result of nondegenerate perturbation theory, the first-order correction \(\Delta E_1^{(1)}\) to the unperturbed ground state energy \(E_1\) is given by the expectation value of the perturbation Hamiltonian \(\Delta H_{\text{Stark}}\) in the unperturbed ground state \(|1,0,0\rangle\):
Recall that rotational symmetry \([H,\textbf L]=\textbf 0\) automatically implies centrosymmetry \([H,\Pi]=0\), so the eigenstates have definite parity \(\Pi|n,\ell,m_{\ell}\rangle=(-1)^{\ell}|n,\ell,m_{\ell}\rangle\). It follows that the ground state \(|1,0,0\rangle\) has even parity \(\Pi|1,0,0\rangle=|1,0,0\rangle\), so we expect \(\langle 1,0,0|X_3|1,0,0\rangle=\langle 1,0,0|\Pi^{\dagger}X_3\Pi|1,0,0\rangle=-\langle 1,0,0|X_3|1,0,0\rangle\) (where we’ve used the fact that \(\textbf X\) is a vector operator), thus to first order \(\Delta E_1^{(1)}=0\) and the ground state energy \(E_1\) is unaffected (notice the above argument would have been equally valid if the ground state were to have odd parity! Thus, the key is not which parity it had, but that it had a definite parity at all). Instead, the Stark perturbation only affects the ground state energy \(E_1\) at second order \(\Delta E_1^{(2)}\neq 0\), which is why this is an example of the so-called quadratic Stark effect.
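The vanishing of this first-order expectation value can also be verified symbolically; here is a minimal sympy sketch in units where \(r_{\text{Bohr}}=1\), using the familiar ground-state wavefunction \(\psi_{100}=e^{-r}/\sqrt{\pi}\):

```python
from sympy import symbols, exp, sqrt, cos, sin, pi, integrate, oo

r, theta, phi = symbols('r theta phi', positive=True)

# unperturbed ground state in units where r_Bohr = 1
psi100 = exp(-r) / sqrt(pi)

# <1,0,0| X_3 |1,0,0> = ∫ |psi|^2 (r cosθ) r^2 sinθ dr dθ dφ
elem = integrate(psi100**2 * r*cos(theta) * r**2 * sin(theta),
                 (r, 0, oo), (theta, 0, pi), (phi, 0, 2*pi))
print(elem)  # 0: the first-order Stark shift of the ground state vanishes
```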
To compute the second-order correction \(\Delta E_1^{(2)}\), we need to calculate:
Since \(X_3\) is already the \(T^1_0\) spherical component of the rank-\(1\) tensor operator \(\textbf X\) (since it’s a vector operator!), the Wigner-Eckart theorem asserts that the matrix element \(\langle n,\ell,m_{\ell}|X_3|1,0,0\rangle\) is proportional to the Clebsch-Gordan coefficient \(\langle n,\ell,m_{\ell}|\otimes\langle 1,0,0|1,0\rangle\). In particular, one has the usual angular momentum addition selection rules \(0=m_{\ell}+0\) and \(|\ell-0|\leq 1\leq\ell+0\), so the only non-vanishing matrix elements are those with \(\ell=1,m_{\ell}=0\) (put another way, the only states that mix with the ground state \(|1,0,0\rangle\) are those of the form \(|n,1,0\rangle\)). The second-order energy correction \(\Delta E_1^{(2)}\) to the ground state \(|1,0,0\rangle\) simplifies to a single infinite series:
where, if you want the details of that sum, see here. The point is \(\Delta E_1^{(2)}<0\). Classically, the external electric field \(\textbf E_{\text{ext}}\) polarizes the hydrogen atom with electric dipole moment \(\textbf p=\alpha\textbf E_{\text{ext}}\) with \(\alpha>0\) the atomic polarizability, so it makes sense that the perturbation \(V_{\text{Stark}}=-\textbf p\cdot\textbf E_{\text{ext}}=-\alpha E_{\text{ext}}^2<0\) is both negative and comes in at quadratic order \(V_{\text{Stark}}=O_{E_{\text{ext}}\to 0}(E_{\text{ext}}^2)\).
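The selection rules used above can be spot-checked with sympy’s Clebsch-Gordan functions (a sketch; note that sympy’s `clebsch_gordan(j1, j2, j3, m1, m2, m3)` computes \(\langle j_1,m_1;j_2,m_2|j_3,m_3\rangle\)):

```python
from sympy.physics.wigner import clebsch_gordan

# <n,l,m| X_3 |1,0,0> ∝ <1,0; 0,0 | l,m>: scan the low-lying (l, m)
for l in range(3):
    for m in range(-l, l + 1):
        print(l, m, clebsch_gordan(1, 0, l, 0, 0, m))
# only (l, m) = (1, 0) gives a nonzero coefficient
```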
Case #2: Degenerate Perturbation Theory
Now we compute the Stark perturbation for the \(n=2\) states. Recall that in the hydrogen atom, the \(n\)-th energy level \(E_n\) is associated with an \(\sum_{\ell=0}^{n-1}(2\ell+1)=n^2\)-dimensional degenerate \(H\)-eigensubspace. For \(n=1\), \(1^2=1\) so we have a single non-degenerate ground state \(|1,0,0\rangle\) as before. However, for \(n=2\) we have \(2^2=4\) degenerate states \(|2,0,0\rangle,|2,1,-1\rangle,|2,1,0\rangle\) and \(|2,1,1\rangle\) (in chemistry lingo, the \(2s\)-wave and the three \(2p\)-waves). As usual, we look for eigenstates of the perturbation Hamiltonian \(\Delta H_{\text{Stark}}\biggr|_{\ker(H-E_2 1)}\) restricted to the degenerate \(E_2\)-eigensubspace \(\ker(H-E_2 1)=\text{span}_{\textbf C}\{|2,0,0\rangle,|2,1,-1\rangle,|2,1,0\rangle,|2,1,1\rangle\}\) of the unperturbed Hamiltonian \(H\), since such eigenstates are the correct zeroth-order states of the perturbed Hamiltonian \(H+\Delta H_{\text{Stark}}\) and their eigenvalues are the first-order energy corrections! The matrix elements of \(\Delta H_{\text{Stark}}\biggr|_{\ker(H-E_2 1)}\) in the obvious basis \(|2,0,0\rangle,|2,1,-1\rangle,|2,1,0\rangle,|2,1,1\rangle\) may be determined through parity considerations for the diagonal elements and Wigner-Eckart selection rule considerations for the off-diagonal elements:
Thus, there are only \(3\) distinct eigenvalues \(-3er_{\text{Bohr}}E_{\text{ext}},0,3er_{\text{Bohr}}E_{\text{ext}}\) representing the degeneracy-lifting energy corrections to the degenerate energy \(E_2\) (actually though, having an eigenvalue of \(0\) means no energy correction is made!). Thus, the Stark perturbation lifts the degeneracy among the states \(|2,0,0\rangle\) and \(|2,1,0\rangle\) by mixing them into \(2\) “\(sp\) hybrid waves” with equiprobable “\(50\%\) \(s\)-character” and “\(50\%\) \(p\)-character”.
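Both the key matrix element \(\langle 2,0,0|X_3|2,1,0\rangle=-3r_{\text{Bohr}}\) and the resulting eigenvalues can be computed directly; here is a sketch in units where \(e=r_{\text{Bohr}}=E_{\text{ext}}=1\), using sympy’s built-in hydrogen radial functions:

```python
from sympy import symbols, integrate, oo, pi, sin, cos, sqrt, Matrix
from sympy.physics.hydrogen import R_nl

r, theta = symbols('r theta', positive=True)

# <2,0,0| X_3 |2,1,0> factorizes into a radial and an angular integral
radial = integrate(R_nl(2, 0, r) * R_nl(2, 1, r) * r**3, (r, 0, oo))
# angular part: ∫ Y_0^0* cosθ Y_1^0 dΩ
angular = integrate((1/sqrt(4*pi)) * cos(theta) * sqrt(3/(4*pi))*cos(theta)
                    * sin(theta) * 2*pi, (theta, 0, pi))
elem = radial * angular
print(elem)  # -3, i.e. <2,0,0|X_3|2,1,0> = -3 r_Bohr

# restricted Stark matrix in the basis |200>, |21-1>, |210>, |211>
H = Matrix([[0, 0, elem, 0],
            [0, 0, 0, 0],
            [elem, 0, 0, 0],
            [0, 0, 0, 0]])
print(H.eigenvals())  # {-3: 1, 0: 2, 3: 1} in units of e*r_Bohr*E_ext
```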
Note that it doesn’t really make sense to say specifically which of the unperturbed states \(|2,0,0\rangle,|2,1,0\rangle\) “became” which of the two new hybrid states. It only makes sense to say that they mixed. The other two \(p\)-waves \(|2,1,-1\rangle,|2,1,1\rangle\) remain degenerate/”unhybridized” \(p\)-waves.
Although the energy corrections to \(E_2\) scale linearly with \(E_{\text{ext}}\) (thus, this is called the linear Stark effect), the new eigenstates are completely independent of \(E_{\text{ext}}\)! Even the slightest stray electric field in the lab evidently causes the hydrogen atom to prefer the probabilistic superpositions. Diagrammatically, the Stark perturbation as a function of the external electric field strength for the states considered so far looks like:
I mentioned this already, but I just wanted to re-emphasize that “degenerate perturbation theory” is a bit of a misnomer because, within the degenerate subspace, the perturbation has been diagonalized exactly rather than expanded perturbatively! Those are the exact linear Stark effect energies to first order in \(E_{\text{ext}}\) (higher-order corrections from mixing with other \(n\)-levels do still exist). It is interesting to try and think about why these results make sense classically.
From an experimental physics perspective, the Stark effect enables one to “sniff out” degeneracies in the Hamiltonian \(H\) of an initially “highly symmetric” quantum system simply through the application of an external electric field \(\textbf E_{\text{ext}}\) (which breaks that symmetry and therefore the degeneracy). If the agent causing the external electric field \(\textbf E_{\text{ext}}\) is say another atom, then roughly speaking the Stark effect (specifically the linear Stark effect part where the lowest-energy lifted state \((|2,0,0\rangle+|2,1,0\rangle)/\sqrt{2}\) is maximally delocalized) also explains why conductors work.
Although the spherical harmonics are just functions of \(\theta,\phi\) on the sphere \(S^2\), one can introduce an \(r\)-dependence simply by multiplying each of the \(\ell=1\) spherical harmonics by \(r\). The resulting functions are then best written in Cartesian coordinates \((x,y,z)\):
But this suggests something quite interesting. It means if we have any vector \(\textbf x\in\textbf R^3\) (i.e. a rank \(\ell=1\) tensor) with contravariant components \(\textbf x=x\hat{\textbf i}+y\hat{\textbf j}+z\hat{\textbf k}\) in a Cartesian (aka orthonormal) basis \(\hat{\textbf i},\hat{\textbf j},\hat{\textbf k}\), then by rotating into the so-called spherical basis \(\hat{\textbf e}_{-1}=\frac{\hat{\textbf i}-i\hat{\textbf j}}{\sqrt{2}},\hat{\textbf e}_{0}=\hat{\textbf k},\hat{\textbf e}_{1}=-\frac{\hat{\textbf i}+i\hat{\textbf j}}{\sqrt{2}}\), the contravariant components \(\textbf x=x^{\mu}\hat{\textbf e}_{\mu}\) of the same vector \(\textbf x\) in this spherical basis are now proportional to the spherical harmonics! But this gives us a handle on how such components must transform under rotations because we know how spherical harmonics transform under rotations (and the \(r\)-prefactor is constant with respect to rotations)! To wit, recall that \(Y_{\ell}^m(\hat{\textbf n})=\langle\hat{\textbf n}|\ell, m\rangle\) are the (angular part of the) position space wavefunctions of orbital angular momentum eigenstates, and for each fixed total angular momentum \(j\in\textbf N/2\), the \(2j+1\)-dimensional subspace \(\mathcal H_{j}:=\text{span}_{m=-j,…,j}|j,m\rangle\) is \(e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}\)-invariant for all rotation vectors \(\Delta \boldsymbol{\phi}\) (i.e. for every rotation in \(SO(3)\)). More precisely, these states transform according to the components of the Wigner D-matrix \(D^j(\Delta\boldsymbol{\phi})\in U(2j+1)\):
with \(D_{m’m}^j(\Delta{\boldsymbol{\phi}})=\langle j,m’|e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}|j,m\rangle\).
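The \(\ell=1\) proportionality claimed above can be checked symbolically with sympy’s spherical harmonics (a sketch; sympy’s `Ynm` uses the Condon-Shortley phase convention):

```python
from sympy import symbols, Ynm, sqrt, pi, cos, sin, exp, I, simplify

r, theta, phi = symbols('r theta phi', positive=True)
x = r*sin(theta)*cos(phi)
y = r*sin(theta)*sin(phi)
z = r*cos(theta)

# r*Y_1^m should reproduce the spherical components of (x,y,z),
# namely z and ∓(x ± iy)/√2, up to the overall constant sqrt(3/(4π))
c = sqrt(3/(4*pi))
checks = [
    r*Ynm(1,  0, theta, phi).expand(func=True) - c*z,
    r*Ynm(1,  1, theta, phi).expand(func=True) + c*(x + I*y)/sqrt(2),
    r*Ynm(1, -1, theta, phi).expand(func=True) - c*(x - I*y)/sqrt(2),
]
assert all(simplify(ch.rewrite(exp)) == 0 for ch in checks)
print("r*Y_1^m matches the spherical components of (x, y, z)")
```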
Example: Consider the vector \(\textbf x=\hat{\textbf i}-2\hat{\textbf j}+3\hat{\textbf k}\) in some Cartesian basis. Suppose we wished to rotate \(\textbf x\) by an angle of \(\pi/5\) about the \(z\)-axis. Ordinarily, we could just achieve this by acting on the Cartesian components of \(\textbf x\) with a suitable rotation matrix:
Alternatively, in the spherical basis we instead have \(\textbf x=\frac{1+2i}{\sqrt{2}}\hat{\textbf e}_{-1}+3\hat{\textbf e}_{0}-\frac{1-2i}{\sqrt 2}\hat{\textbf e}_{1}\). We then note that the associated Wigner \(D\)-matrix has components \(D^1_{m’m}(\pi/5)=\langle 1,m’|e^{-i\pi J_3/5\hbar}|1,m\rangle=\delta_{m’m}e^{-im\pi/5}\) and so \(D^1(\pi/5)=\text{diag}(e^{i\pi/5},1,e^{-i\pi/5})\). Acting with \(D^1(\pi/5)\) on the components of the vector \(\textbf x\) in the spherical basis then yields:
Taking the earlier Cartesian result and converting it into the spherical basis shows that the two calculations match as claimed.
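Here is a numerical sketch of this example; note that whether the diagonal phases come out as \(e^{-im\pi/5}\) or their conjugates depends on one’s active/passive rotation convention, so the check below simply verifies that the two routes agree for a consistent choice:

```python
import numpy as np

alpha = np.pi / 5
x = np.array([1, -2, 3], dtype=complex)

# unitary change of basis from Cartesian (x,y,z) to spherical (m=-1,0,+1):
# T_{-1} = (x - iy)/√2,  T_0 = z,  T_{+1} = -(x + iy)/√2
U = np.array([[1/np.sqrt(2), -1j/np.sqrt(2), 0],
              [0,             0,             1],
              [-1/np.sqrt(2), -1j/np.sqrt(2), 0]])

# route 1: rotate the Cartesian components about z by alpha, then convert
c, s = np.cos(alpha), np.sin(alpha)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
route1 = U @ (R @ x)

# route 2: convert first, then apply the diagonal Wigner-type phases
# (diag(e^{-iα}, 1, e^{+iα}) for this active-rotation convention)
D = np.diag([np.exp(-1j*alpha), 1, np.exp(1j*alpha)])
route2 = D @ (U @ x)

print(U @ x)  # spherical components (1+2i)/√2, 3, -(1-2i)/√2 as in the text
assert np.allclose(route1, route2)
```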
So far we have been talking about a vector \(\textbf x\in\textbf R^3\). Of course though, all of this discussion applies just as well to vector operators on \(\mathcal H=L^2(\textbf R^3\to\textbf C)\) in quantum mechanics such as \(\textbf X,\textbf P\), etc. To reiterate, to rotate a vector in \(\textbf R^3\), one can always first perform a unitary change of basis from Cartesian to spherical, rotate the spherical components of the vector operator as if they were orbital angular momentum eigenstates (or spherical harmonics if you like) using a suitable Wigner \(D\)-matrix, and the result will be the rotated vector operator in the spherical basis.
Even more generally, one need not limit oneself to scalar operators or vector operators; any tensor operator can be converted from Cartesian components to spherical components (with the latter dubbed a “spherical tensor operator” although I find it to be misnomer as it’s not a different kind of tensor operator, just one whose components happen to expressed in the spherical basis). More precisely, if \(T^j_m\) are the spherical components of a rank \(j\in\textbf N\) tensor operator \(T\) with \(2j+1\) component operators \(T^j_m\) for \(m=-j,…,j\), then these components transform under rotations via the Wigner \(D\)-matrix:
As already suggested, rank \(j=0\) scalar operators have \(2j+1=1\) spherical component operator that transforms under rotations as \(e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}T^0_0e^{i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}=D_{00}^0(\Delta{\boldsymbol{\phi}})T_0^0=T_0^0\) which has the logically equivalent infinitesimal formulation \([\textbf J,T_0^0]=0\). Meanwhile, a rank \(j=1\) vector operator with \(2j+1=3\) spherical component operators \(T^1_{-1},T^1_{0},T^1_1\) transforms under rotations as \(e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}T^1_me^{i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}=D_{-1m}^1(\Delta{\boldsymbol{\phi}})T_{-1}^1+D_{0m}^1(\Delta{\boldsymbol{\phi}})T_0^1+D_{1m}^1(\Delta{\boldsymbol{\phi}})T_1^1\), or infinitesimally \([J_3,T^1_m]=m\hbar T^1_m\) and \([J_{\pm},T^1_m]=\hbar\sqrt{(1\mp m)(1\pm m+1)}T^1_{m\pm 1}\). This should be compared with the analogous rotation behavior statement for the Cartesian components \(e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}T_ie^{i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}=R_{ij}T_j\), or infinitesimally \([J_i,T_j]=i\hbar\varepsilon_{ijk}T_k\). More generally then, one can also look at the infinitesimal limit of the above defining equation for spherical components of a tensor operator to obtain logically equivalent commutation relations:
\[[J_3,T^{\ell}_m]=m\hbar T^{\ell}_m\]
\[[J_{\pm},T^{\ell}_m]=\hbar\sqrt{(\ell\mp m)(\ell\pm m +1)}T^{\ell}_{m\pm 1}\]
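These commutation relations can be verified numerically in any concrete representation; here is a sketch using the spin-\(1\) matrices with \(T=\textbf J\) itself as the vector operator (and \(\hbar=1\)):

```python
import numpy as np

# spin-1 angular momentum matrices (hbar = 1), basis ordered m = 1, 0, -1
Jz = np.diag([1.0, 0.0, -1.0])
Jp = np.sqrt(2) * np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
Jm = Jp.T

# spherical components of the vector operator T = J:
# T^1_{±1} = ∓(Jx ± iJy)/√2 = ∓J±/√2, and T^1_0 = Jz
T = {-1: Jm/np.sqrt(2), 0: Jz, +1: -Jp/np.sqrt(2)}

def comm(A, B):
    return A @ B - B @ A

for m in (-1, 0, 1):
    # [J_3, T^1_m] = m T^1_m
    assert np.allclose(comm(Jz, T[m]), m * T[m])
    # [J_±, T^1_m] = sqrt((1 ∓ m)(1 ± m + 1)) T^1_{m±1}
    if m < 1:
        assert np.allclose(comm(Jp, T[m]), np.sqrt((1-m)*(1+m+1)) * T[m+1])
    if m > -1:
        assert np.allclose(comm(Jm, T[m]), np.sqrt((1+m)*(1-m+1)) * T[m-1])
print("spin-1 spherical tensor commutation relations verified")
```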
Hopefully this all makes sense! Finally then, we arrive at:
Theorem (Wigner-Eckart Theorem): Let \(T:\mathcal H\to\mathcal H\) be a spherical tensor operator of rank \(j\) with \(2j+1\) components \(T^j_m\) for \(m=-j,…,j\). Then there exists a so-called reduced matrix element \(\langle j_2||T^j||j_1\rangle\) independent of \(m_1,m_2,m\) which acts as a proportionality constant between the matrix elements of \(T\) in the angular momentum eigenbasis and Clebsch-Gordan coefficients that can be easily computed or looked up in tables:
In particular, recall that the Clebsch-Gordan coefficients \(\langle j_1,m_1|\otimes\langle j_2,m_2|j,m\rangle\) have the standard selection rules \(m=m_1+m_2\) and \(|j_1-j_2|\leq j\leq j_1+j_2\). By virtue of the Wigner-Eckart theorem, these selection rules are also directly inherited by the matrix elements \(\langle j_2,m_2|T^j_m|j_1,m_1\rangle\).
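As a concrete numerical illustration (again taking \(T=\textbf J\) in the spin-\(1\) representation with \(\hbar=1\)), every non-vanishing matrix element divided by its Clebsch-Gordan coefficient yields one and the same reduced matrix element:

```python
import numpy as np
from sympy.physics.wigner import clebsch_gordan

# spin-1 matrices (hbar = 1), basis ordered m = 1, 0, -1
Jz = np.diag([1.0, 0.0, -1.0])
Jp = np.sqrt(2) * np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
T = {-1: Jp.T/np.sqrt(2), 0: Jz, +1: -Jp/np.sqrt(2)}  # spherical comps of J

idx = {1: 0, 0: 1, -1: 2}  # m quantum number -> matrix index
ratios = []
for q, Tq in T.items():
    for m1 in (-1, 0, 1):
        for m2 in (-1, 0, 1):
            elem = Tq[idx[m2], idx[m1]]                     # <1,m2|T^1_q|1,m1>
            cg = float(clebsch_gordan(1, 1, 1, m1, q, m2))  # <1,m1;1,q|1,m2>
            if cg != 0:
                ratios.append(elem / cg)

# all nonzero ratios equal the reduced matrix element sqrt(j(j+1)) = sqrt(2)
assert np.allclose(ratios, np.sqrt(2))
print("Wigner-Eckart: common reduced matrix element", np.sqrt(2))
```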
The purpose of this post is to demonstrate how Maxwell’s equations are compatible with special relativity. First though, some review of the formalism of special relativity is needed.
Review of Lorentz Scalars, Four-Vectors, Four-Covectors
Recall that in special relativity, a Lorentz scalar \(a=a(X)\) is any real number depending in some way on the four-position \(X=(ct,\textbf x)\) of events in Minkowski spacetime such that whenever the four-position \(X\) of an event undergoes a Lorentz transformation \(X\mapsto \Lambda X\) for some \(\Lambda\in O(1,3)\) in the Lorentz group (which recall consists of Lorentz boosts and proper/improper rotations) the Lorentz scalar \(a(X)\) is Lorentz invariant \(a(\Lambda X)=a(X)\). Similarly a four-vector \(V=V(X)\) transforms in an equivariant/commuting way \(V(\Lambda X)=\Lambda V(X)\) under a Lorentz transformation \(X\mapsto \Lambda X\) (here, I’m tempted to say “in a covariant way” but actually the components of \(X\) turn out to be contravariant…more on this later).
Theorem: If \(V(X),W(X)\) are any two four-vectors, then their inner product \(V(X)^T\eta W(X)\) is a Lorentz scalar. If in addition \(a(X),b(X)\) are Lorentz scalars, then the linear combination \(a(X)V(X)+b(X)W(X)\) is a four-vector.
Proof: \(V(\Lambda X)^T\eta W(\Lambda X)=(\Lambda V(X))^T\eta\Lambda W(X)=V(X)^T\Lambda^T\eta\Lambda W(X)=V(X)^T\eta W(X)\) as the defining property for membership \(\Lambda\in O(1,3)\) in the Lorentz group is that \(\Lambda^T\eta\Lambda=\eta\). For the second part, we have \(a(\Lambda X)V(\Lambda X)+b(\Lambda X)W(\Lambda X)=a(X)\Lambda V(X)+b(X)\Lambda W(X)=\Lambda(a(X)V(X)+b(X)W(X))\).
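A quick numerical sketch of both claims for a boost along the \(x\)-axis:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric, signature (+,-,-,-)

def boost_x(beta):
    """Lorentz boost along x with velocity beta = v/c."""
    g = 1 / np.sqrt(1 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = g
    L[0, 1] = L[1, 0] = -g * beta
    return L

L = boost_x(0.6)
# defining property of O(1,3): Lambda^T eta Lambda = eta
assert np.allclose(L.T @ eta @ L, eta)

# inner products of four-vectors are Lorentz invariant
rng = np.random.default_rng(0)
V, W = rng.normal(size=4), rng.normal(size=4)
assert np.isclose(V @ eta @ W, (L @ V) @ eta @ (L @ W))
print("Lambda^T eta Lambda = eta and inner-product invariance verified")
```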
Corollaries: The Minkowski distance/”ruler” \(ds^2:=dX^T\eta dX=\eta_{\mu\nu}dX^{\mu}dX^{\nu}=c^2dt^2-|d\textbf x|^2\) is the fundamental Lorentz scalar that encodes the hyperbolic geometry of Minkowski spacetime. The proper time interval \(d\tau:=ds/c\) experienced by a particle (where here \(ds\) is the Minkowski distance between two infinitesimally close events on the particle’s worldline) is therefore also a Lorentz scalar. Thus, the derivative \(V:=\frac{dX}{d\tau}\) of the four-vector \(dX\) with respect to the Lorentz scalar \(d\tau\) is also a four-vector, called the four-velocity of the particle. Similar comments apply to the four-momentum \(P:=mV\), the four-acceleration \(A:=\frac{dV}{d\tau}\), etc.
Recall that in any inner product space \(V\), there is a canonical isomorphism \(V\cong V^*\) between \(V\) and its algebraic dual space \(V^*\) given by transposition \(\textbf v\mapsto \textbf v^T\) from a column vector to a row vector if one likes. In the case of Minkowski spacetime, the analog of the “inner product” is of course \((V,W)\mapsto V^T\eta W\) (though this is not strictly a valid inner product because \(X^T\eta X\) need not be positive semi-definite as exemplified when \(X\) is a space-like separated event!) so it makes sense that a canonical isomorphism should also exist between Minkowski spacetime and its algebraic dual spacetime given by \(V\mapsto V^T\eta\). Thus for instance, a four-position \(X=(ct,\textbf x)^T\) is dual to \(X^T\eta=(ct, -\textbf x)\), a four-velocity \(V=\gamma(c,\textbf v)^T\) is dual to \(V^T\eta=\gamma(c,-\textbf v)\), etc. Such objects like \(X^T\eta\) and \(V^T\eta\) that live in dual Minkowski spacetime are called four-covectors. It can be checked that under a Lorentz transformation \(X\mapsto \Lambda X\), a four-covector \(\tilde V(X)=V(X)^T\eta\) transforms (roughly) the opposite way to how a four-vector behaves, namely \(\tilde V(\Lambda X)=\tilde V(X)\Lambda^{-1}\). In other words, this provides another way to see why the inner product \(\tilde V(X)V(X)\mapsto \tilde V(X)\Lambda^{-1}\Lambda V(X)=\tilde V(X)V(X)\) is a Lorentz scalar.
If one works in a specific inertial frame, then the covariant components \(X_{\mu}\) of a four-covector \(\tilde X\) are obtained from the contravariant components \(X^{\nu}\) of the corresponding four-vector \(X\) by lowering indices with the Minkowski metric tensor \(X_{\mu}=\eta_{\mu\nu}X^{\nu}\). As \(\eta^2=1\) is an involution, it follows that \(\eta^{\mu\rho}\eta_{\rho\nu}=\delta^{\mu}_{\nu}\) so the contravariant Minkowski metric \(\eta^{\mu\nu}\) can also be used to “raise indices” (i.e. promote from a four-covector back to a four-vector) \(X^{\mu}=\eta^{\mu\nu}X_{\nu}\). Using this index formalism, under a Lorentz transformation of contravariant components \(X^{\mu}\mapsto \Lambda^{\mu}_{\nu}X^{\nu}\), the covariant components transform as \(X_{\mu}=\eta_{\mu\sigma}X^{\sigma}\mapsto \eta_{\mu\sigma}\Lambda^{\sigma}_{\rho} X^{\rho}=\eta_{\mu\sigma}\Lambda^{\sigma}_{\rho}\eta^{\rho\nu} X_{\nu}=\Lambda^{\nu}_{\mu}X_{\nu}\).
A prominent example of a four-covector is the four-gradient \(\frac{\partial}{\partial X}:=\left(\frac{1}{c}\frac{\partial}{\partial t},\frac{\partial}{\partial\textbf x}\right)\). It has components \(\frac{\partial}{\partial X^{\mu}}\). To prove that it’s actually a four-covector and not a four-vector, let \(X’^{\mu}=\Lambda^{\mu}_{\nu}X^{\nu}\) be the Lorentz-transformed contravariant components of the four-position \(X\). From this we obtain \(\frac{\partial X’^{\mu}}{\partial X^{\nu}}=\frac{\partial}{\partial X^{\nu}}\Lambda^{\mu}_{\rho}X^{\rho}=\Lambda^{\mu}_{\rho}\frac{\partial X^{\rho}}{\partial X^{\nu}}=\Lambda^{\mu}_{\rho}\delta_{\nu}^{\rho}=\Lambda^{\mu}_{\nu}\) or equivalently \(\frac{\partial X^{\nu}}{\partial X’^{\mu}}=\Lambda^{\nu}_{\mu}\). Meanwhile, the components of the four-gradient transform (by virtue of the chain rule) into \(\frac{\partial}{\partial X^{\mu}}\mapsto\frac{\partial}{\partial X’^{\mu}}=\frac{\partial X^{\nu}}{\partial X’^{\mu}}\frac{\partial}{\partial X^{\nu}}=\Lambda^{\nu}_{\mu}\frac{\partial}{\partial X^{\nu}}\). Thus, either from the fact that the \(\mu\) is a subscript on \(\Lambda\) or that the \(\nu\) is a superscript on \(\Lambda\), clearly this implies that the four-gradient \(\frac{\partial}{\partial X}\) is a four-covector and hence it has covariant components that can be denoted by the subscript \(\partial_{\mu}:=\frac{\partial}{\partial X^{\mu}}\).
Tensors
In general, on an arbitrary manifold (such as Minkowski spacetime!) if one transforms between two coordinate charts \((x^1,x^2,…)\mapsto (x’^1,x’^2,…)\) (i.e. a Lorentz transformation between two inertial frames!), then physicists define a tensor \(T\) of type \((u,v)\) by enforcing that its components transform according to:
Please ingrain this equation into your head. In the case where that manifold is Minkowski spacetime, the partial derivatives become (via the earlier derivation):
which we call a Lorentz tensor. Thus, Lorentz scalars are type \((0,0)\) Lorentz tensors, four-vectors are type \((1,0)\) Lorentz tensors while four-covectors are type \((0,1)\) Lorentz tensors. The important point about Lorentz tensors in special relativity is that any equation you write down using tensors (where you’ve respected the rules about pairing upper/contravariant and lower/covariant indices, etc.) is necessarily a Lorentz covariant equation in the sense that it will look the same in all inertial frames. This phenomenon of Lorentz covariance was one of Einstein’s original postulates of special relativity!
Four-Current
The four-current is defined by:
\[J:=\begin{pmatrix}\rho c \\ \textbf J\end{pmatrix}\]
It turns out that the four-current \(J\) is a four-vector as its name suggests, thus it has contravariant components \(J^{\mu}\) for spacetime index \(\mu=0,1,2,3\). Unlike other four-vectors such as the four-position \(X\), the four-velocity \(V\), etc. which were all properties of a single particle, the four-current is more of a “four-vector field” for a given configuration/dynamics of electric charges because it depends on the scalar field \(\rho=\rho(X)\) and the vector field \(\textbf J=\textbf J(X)\). Thus, the four-current is a type \((1,0)\) tensor field.
But why is \(J\) a four-vector (field)? The idea is that \(J\) contains both the information \(\rho\) about where the charges are and how they’re moving \(\textbf J\), but of course by local charge conservation these two quantities are related by the continuity equation \(\frac{\partial\rho}{\partial t}+\frac{\partial}{\partial\textbf x}\cdot\textbf J=0\) which is elegantly expressed using the four-gradient as \(\partial_{\mu}J^{\mu}=0\). So the aesthetic of this should give some suggestion that, because we already showed that the four-gradient \(\partial_{\mu}\) is a four-covector, \(J\) should be a four-vector. To prove this rigorously, we enforce Lorentz covariance of local charge conservation in all inertial frames, so \(\partial’_{\mu}J’^{\mu}=0\). Then, \(\partial_{\mu}=\Lambda_{\mu}^{\nu}\partial’_{\nu}\), so \(\Lambda_{\mu}^{\nu}\partial’_{\nu}J^{\mu}=\partial’_{\nu}\Lambda_{\mu}^{\nu}J^{\mu}=0\). This implies that \(J’^{\mu}=\Lambda^{\mu}_{\nu}J^{\nu}+C\) where \(C\) is an integration constant independent of \(\mu\). But if the primed and unprimed inertial frames were to coincide so that \(\Lambda^{\mu}_{\nu}=\delta^{\mu}_{\nu}\) and \(J’^{\mu}=J^{\mu}\), then we must have \(C=0\). This completes the proof that \(J\) is a four-vector. As an example, if a wire of uniform charge density \(\rho\) and current \(\textbf J=\textbf 0\) is at rest in one inertial frame, then a second inertial observer Lorentz-boosted relative to the wire with velocity \(\textbf v\) would therefore see an increased charge density \(\rho’=\gamma\rho\) with the factor of \(\gamma\) attributable to length contraction and current density \(\textbf J’=-\rho’\textbf v\).
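The wire example can be checked by explicitly boosting the four-current (a sketch in units where \(c=1\) internally would also do; here \(c\) is kept explicit with an arbitrary rest-frame \(\rho\)):

```python
import numpy as np

c = 1.0                   # units with c = 1
rho = 2.5                 # rest-frame charge density (arbitrary test value)
J_rest = np.array([rho*c, 0.0, 0.0, 0.0])  # four-current (rho c, J) at rest

beta = 0.8                # boost velocity v/c along z
g = 1 / np.sqrt(1 - beta**2)
# boost along z: mixes the 0 and 3 components
L = np.eye(4)
L[0, 0] = L[3, 3] = g
L[0, 3] = L[3, 0] = -g * beta

J_boosted = L @ J_rest
rho_p, Jz_p = J_boosted[0] / c, J_boosted[3]
assert np.isclose(rho_p, g * rho)           # rho' = gamma rho (length contraction)
assert np.isclose(Jz_p, -rho_p * beta * c)  # J' = -rho' v
print(rho_p, Jz_p)
```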
Electric & Magnetic Fields
One of the key insights that arises when one applies special relativity to electromagnetism is that magnetic fields are a relativistic effect. I suppose this is true even in materials where the \(\textbf B\)-field is due to spin angular momenta, as the Dirac equation teaches us that spin too is an intrinsically relativistic phenomenon! A simple example is given by imagining that one has an infinitely-long neutral \(\rho=0\) wire of cross-sectional area \(A\) conducting a current \(I>0\) along say the positive \(z\)-axis in the wire’s rest frame. Now, classically, we know that this is supposed to establish a circulating magnetostatic field \(\textbf B(r,\phi)=\frac{\mu_0 I}{2\pi r}\hat{\boldsymbol{\phi}}\). Thus, if a point charge \(q>0\) were moving with velocity \(\textbf v=v\hat{\textbf k}\) along the \(z\)-axis at some moment in time, although it would feel no electric force \(\textbf F_{\textbf E}=\textbf 0\) in the wire’s rest frame because of the wire’s electroneutrality \(\rho=0\), it should experience a magnetic force \(\textbf F_{\textbf B}=q\textbf v\times\textbf B\) attracting it towards the wire. However, it turns out that we don’t need to assume this! All we need to assume is that we know how electric forces work, and then inject a bit of special relativity. To see this, Lorentz boost into the instantaneous inertial rest frame of the point charge \(q\): in this frame, it clearly wouldn’t experience any magnetic force because now \(\textbf v’=\textbf 0\). However, at the same time the wire is no longer neutral, instead acquiring a negative charge density \(\rho’=-\gamma\beta I/(Ac)<0\) (as an aside, the current also increases to \(I’=\gamma I\)).
In this rest frame of \(q\) then, an electric force \(\textbf F’_{\textbf E’}\) therefore attracts it (being a positive charge \(q>0\)) towards the negatively-charged wire, given by the standard formula from Gauss’s law for electric fields: \(\textbf F’_{\textbf E’}=\frac{q\rho’ A}{2\pi\varepsilon_0 r}\hat{\textbf r}=-qv\frac{\gamma I}{2\pi\varepsilon_0 c^2r}\hat{\textbf r}\). But recall \(c^2=1/\mu_0\varepsilon_0\), so \(\textbf F’_{\textbf E’}=-qv\frac{\gamma \mu_0 I}{2\pi r}\hat{\textbf r}\). Lorentz transforming back into the neutral rest frame of the wire, we conclude that there must be a magnetic force \(\textbf F_{\textbf B}=\textbf F’_{\textbf E’}/\gamma=-qv\frac{\mu_0 I}{2\pi r}\hat{\textbf r}\), which is just what we expect from Ampere’s law (this is obtained by noting that the Lorentz boosts are performed along the \(z\)-axis so the \(xy\) spatial components of the four-force \(F=\gamma(\textbf F\cdot\textbf v/c, \textbf F)^T\) are unaffected; in particular, \(\textbf F’_{\textbf E’}\) lives in the \(xy\)-plane so \(\gamma\textbf F_{\textbf B}=\gamma’\textbf F’_{\textbf E’}\), but \(\gamma’=1\) in the rest frame of \(q\), so \(\textbf F_{\textbf B}=\textbf F’_{\textbf E’}/\gamma\) as claimed).
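The algebra in this argument is easy to get wrong, so here is a symbolic sketch verifying that the boosted Coulomb force reproduces the Ampere’s-law force (with \(\mu_0=1/\varepsilon_0c^2\) substituted):

```python
from sympy import symbols, pi, simplify

q, I, r, v, c, A, eps0, gamma = symbols(
    'q I r v c A varepsilon_0 gamma', positive=True)

beta = v / c
rho_prime = -gamma * beta * I / (A * c)         # wire charge density in q's frame
F_E_prime = q * rho_prime * A / (2*pi*eps0*r)   # radial force via Gauss's law
F_B = F_E_prime / gamma                         # transformed back to wire frame

# Ampere's-law prediction -q v mu_0 I/(2 pi r), written with mu_0 = 1/(eps0 c^2)
F_Ampere = -q * v * I / (2*pi*eps0*c**2*r)
assert simplify(F_B - F_Ampere) == 0
print(F_B)
```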
The Electromagnetic Four-Potential
Recall that of the \(4\) Maxwell equations for \(\textbf E\) and \(\textbf B\), Gauss’s law for electric fields and the Ampere-Maxwell law involve the source fields \(\rho,\textbf J\) whereas Gauss’s law for magnetic fields and Faraday’s law of EM induction relate purely to the \(\textbf E\) and \(\textbf B\)-fields, independent of \(\rho\) and \(\textbf J\). Thus, it would be nice if one could in some sense just “get those two equations out of the way” to focus on the remaining two that actually deal with the sources. The trick to doing this is to work with the underlying electric scalar potential \(\phi(X)\) and magnetic vector potential \(\textbf A(X)\):
which immediately guarantees that \(\frac{\partial}{\partial\textbf x}\cdot\textbf B=0\) and \(\frac{\partial}{\partial\textbf x}\times\textbf E=-\frac{\partial\textbf B}{\partial t}\) are satisfied. The two remaining Maxwell’s equations become:
Of course, there exists a gauge redundancy/freedom (also called “gauge symmetry” although it’s not strictly speaking a “symmetry”) such that multiple non-unique choices of the potentials \((\phi,\textbf A)\) can give rise to the same fields \(\textbf E,\textbf B\). For one, it is clear that \(\textbf A’=\textbf A-\frac{\partial\Gamma}{\partial\textbf x}\) for any scalar field \(\Gamma(X)\) still yields the same \(\textbf B=\frac{\partial}{\partial\textbf x}\times\textbf A’\). If the electric field is to also be preserved \(\textbf E=-\frac{\partial\phi’}{\partial\textbf x}-\frac{\partial\textbf A’}{\partial t}=-\frac{\partial\phi}{\partial\textbf x}-\frac{\partial\textbf A}{\partial t}\), then one can deduce that the electric scalar potential must transform correspondingly as \(\phi’=\phi+\frac{\partial\Gamma}{\partial t}\). Taken together, \((\phi,\textbf A)\mapsto \left(\phi+\frac{\partial\Gamma}{\partial t},\textbf A-\frac{\partial\Gamma}{\partial\textbf x}\right)\) is called a gauge transformation of the potentials \(\phi,\textbf A\) with respect to the gauge field \(\Gamma\). Since, at least classically, the choice of the gauge \(\Gamma\) has no physical significance, one can simply fix it to simplify the resultant \(2\) Maxwell equations; for instance, the Coulomb gauge \(\Gamma_{\text{Coulomb}}\) is chosen to kill the divergence term \(\frac{\partial}{\partial\textbf x}\cdot\textbf A=0\), whereas the Lorenz gauge \(\Gamma_{\text{Lorenz}}\) is chosen to kill \(\frac{1}{c^2}\frac{\partial\phi}{\partial t}+\frac{\partial}{\partial\textbf x}\cdot\textbf A=0\), etc.
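The gauge invariance of \(\textbf E\) and \(\textbf B\) under \((\phi,\textbf A)\mapsto(\phi+\partial\Gamma/\partial t,\textbf A-\partial\Gamma/\partial\textbf x)\) can be verified symbolically for an arbitrary \(\Gamma\):

```python
from sympy import symbols, Function, Matrix, simplify

t, x, y, z = symbols('t x y z')
coords = (x, y, z)

Gamma = Function('Gamma')(t, x, y, z)  # arbitrary gauge function
phi = Function('phi')(t, x, y, z)      # scalar potential
A = Matrix([Function(f'A{i}')(t, x, y, z) for i in range(3)])  # vector potential

def grad(f):
    return Matrix([f.diff(s) for s in coords])

def curl(F):
    return Matrix([F[2].diff(y) - F[1].diff(z),
                   F[0].diff(z) - F[2].diff(x),
                   F[1].diff(x) - F[0].diff(y)])

E, B = -grad(phi) - A.diff(t), curl(A)

# gauge transform with respect to Gamma
phi2, A2 = phi + Gamma.diff(t), A - grad(Gamma)
E2, B2 = -grad(phi2) - A2.diff(t), curl(A2)

zero = Matrix([0, 0, 0])
assert (E2 - E).applyfunc(simplify) == zero
assert (B2 - B).applyfunc(simplify) == zero
print("E and B are unchanged by the gauge transformation")
```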
Define the electromagnetic four-potential \(A\) by:
Why is this a four-vector? The following argument is due to J. Murray on Physics Stack Exchange: denoting the components of \(A\) in an inertial frame by \(A^{\mu}\) (even though we haven’t proved it’s a four-vector yet), since we know about the gauge freedom inherent in the four-potential \(A’^{\mu}=A^{\mu}+\partial^{\mu}\Gamma\) where \(\partial^{\mu}=\eta^{\mu\nu}\partial_{\nu}\) are the contravariant components of the four-gradient, clearly we should check whether the purported four-vectorial nature of \(A\) would change simply because of which gauge \(\Gamma\) one fixes. Fortunately, this is not the case; \(A^{\mu}\) transforms contravariantly if and only if \(A’^{\mu}\) transforms contravariantly with respect to Lorentz transformations (this is a simple check!). It remains to exhibit a specific example of a gauge \(\Gamma\) within which \(A^{\mu}\) transforms contravariantly. Fix \(\Gamma:=\Gamma_{\text{Lorenz}}\) so that \(\partial_{\mu}A^{\mu}=0\). Then Gauss’s law for electric fields and the Ampere-Maxwell law reduce to \(\Box^2 A=\mu_0 J\) where the d’Alembert operator is \(\Box^2:=\partial_{\mu}\partial^{\mu}\). But this is manifestly a Lorentz scalar operator, and because we’ve already shown that the four-current \(J\) is a four-vector, it follows that the electromagnetic four-potential \(A\) is also a four-vector.
The Electromagnetic Tensor
Now that we understand how the potentials \(\phi,\textbf A\) behave under Lorentz transformations, and we know how these are related to \(\textbf E,\textbf B\), it is straightforward to deduce how the fields too must transform. However, the relevant “nice object” is no longer a four-vector since \(\textbf E\) and \(\textbf B\) together have six components! Instead, rather than a four-vector (i.e. a type \((1,0)\) Lorentz tensor), it turns out to be convenient to use a \(4\times 4\) matrix \(F\) (i.e. a type \((0,2)\) Lorentz tensor when written with both indices down) that will be called the electromagnetic tensor (also sometimes called the Faraday tensor, hence the letter \(F\)). Specifically, since we know \(\textbf E,\textbf B\) are gauge-invariant, one would like to ensure that the electromagnetic tensor \(F\) is also gauge-invariant in order to have any hope that it can properly capture \(\textbf E,\textbf B\). Pondering all these considerations together, I think it’s not too much of a stretch to say that you could eventually stumble upon the following definition (called the Plücker matrix of the four-gradient \(\partial/\partial X\) with the electromagnetic four-potential \(A\), each thought of as a “point” that together define a line in real projective space \(\textbf RP^3\)):
where \(\partial_{\mu}\) are the usual covariant components of the four-gradient four-covector \(\partial/\partial X\) and \(A_{\mu}=\eta_{\mu\nu}A^{\nu}\) are the covariant components of the electromagnetic four-potential four-covector. As promised, one can quickly check that \(F\) is gauge-invariant, and moreover antisymmetric \(F\in\mathfrak{so}(4)\) (as a matrix). Doing a few computations, one can show that the electric and magnetic fields \(\textbf E,\textbf B\) have the interpretation of the Plücker coordinates of the line:
with \(\textbf B_{\times}=-\textbf B\cdot\boldsymbol{\mathcal J}\) the cross-product matrix of the magnetic field \(\textbf B\). One can also raise both indices to obtain the fully contravariant components \(F^{\mu\nu}=\eta^{\mu\rho}\eta^{\nu\sigma}F_{\rho\sigma}\), and construct the dual electromagnetic tensor \(\tilde F^{\mu\nu}:=\frac{1}{2}\epsilon^{\mu\nu\rho\sigma}F_{\rho\sigma}\) (which amounts to exchanging \(\textbf E/c\to\textbf B\) and \(\textbf B\to-\textbf E/c\)), so that:
The punchline then is that we know how \(\textbf E\) and \(\textbf B\) transform under Lorentz transformations because \(F,\tilde F\) are Lorentz tensors! This follows because they’re constructed out of other objects like \(\partial/\partial X\), \(A\), \(\eta\), etc. that were already known to be Lorentz tensors. The transformation is the usual one for the fully contravariant components, i.e. for a type \((2,0)\) tensor (strictly, tensor field):
or in index-free notation, \(F'=\Lambda F\Lambda^T\). Amazing!
Although for four-vectors such as \(X\) and \(V\) it is true that a Lorentz boost in some direction \(\hat{\boldsymbol{\beta}}\) looks just like a Galilean boost in the same direction \(\hat{\boldsymbol{\beta}}\) provided one only looks at the spatial orthogonal complement \(\text{span}^{\perp}_{\textbf R}\hat{\boldsymbol{\beta}}\), for the electromagnetic field it turns out to be the other way around: the components along \(\hat{\boldsymbol{\beta}}\) are untouched while the transverse components mix (after all, the electromagnetic tensor \(F\) is not a four-vector!). This is most transparent in the formulas below for a Lorentz boost \(\beta\in[-1,1]\) along the positive \(x\)-axis into a second inertial frame with axes aligned to the first:
\[E’_x=E_x\]
\[E’_y=\gamma(E_y-\beta cB_z)\]
\[E’_z=\gamma(E_z+\beta cB_y)\]
\[cB’_x=cB_x\]
\[cB’_y=\gamma(cB_y+\beta E_z)\]
\[cB’_z=\gamma(cB_z-\beta E_y)\]
Or, in the non-relativistic limit \(|\beta|\ll 1\), \(\textbf E'\approx \textbf E+\textbf v\times\textbf B\) and \(\textbf B'\approx\textbf B\) are the “Galilean boosts” of the electromagnetic field.
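These component formulas can be spot-checked numerically against the tensor transformation law \(F'=\Lambda F\Lambda^T\). The sketch below (in units where \(c=1\), so \(c\textbf B=\textbf B\), with arbitrary illustrative field values) builds the contravariant Faraday matrix, boosts it along \(x\), and reads off \(\textbf E',\textbf B'\):

```python
import numpy as np

# Arbitrary illustrative field values, in units where c = 1
E = np.array([1.0, 2.0, -0.5])
B = np.array([0.3, -1.0, 0.7])

# Contravariant Faraday matrix F^{mu nu} in signature (+,-,-,-)
F = np.array([[0.0,  -E[0], -E[1], -E[2]],
              [E[0],  0.0,  -B[2],  B[1]],
              [E[1],  B[2],  0.0,  -B[0]],
              [E[2], -B[1],  B[0],  0.0]])

beta = 0.6
g = 1/np.sqrt(1 - beta**2)
L = np.array([[g, -g*beta, 0, 0],
              [-g*beta, g, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1.0]])   # boost along the +x-axis

Fp = L @ F @ L.T                 # the type-(2,0) transformation law

Ep = np.array([Fp[1, 0], Fp[2, 0], Fp[3, 0]])   # E'_i = F'^{i0}
Bp = np.array([Fp[3, 2], Fp[1, 3], Fp[2, 1]])   # B'_x = F'^{32}, etc.

# Agrees with the boost formulas listed above (with c = 1)
assert np.allclose(Ep, [E[0], g*(E[1] - beta*B[2]), g*(E[2] + beta*B[1])])
assert np.allclose(Bp, [B[0], g*(B[1] + beta*E[2]), g*(B[2] - beta*E[1])])
```

Changing `beta` or the field values leaves the agreement intact, as it must.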
The purpose of this post is to outline a derivation of the classical Maxwell distribution \(\rho_{\textbf V}(\textbf v|m,T)\), i.e. the probability density function for the continuous velocity random vector \(\textbf V\) in a monatomic ideal gas given the atomic mass \(m\) and temperature \(T\).
It turns out that the specific Gaussian form of the \(\textbf v\)-dependence in the Maxwell distribution, namely \(\rho_{\textbf V}(\textbf v|m,T)\propto e^{-|\textbf v|^2/2\sigma^2(m,T)}\), does not actually require any physics whatsoever, emerging instead purely from the \(SO(3)\) symmetry of the setup (valid whether or not there’s any interaction among the gas particles). Put another way, any continuous random vector like the momentum \(\textbf P\), the acceleration \(\textbf A\), the angular momentum \(\textbf L\), etc. will also be normally distributed across the gas particles! For sake of concreteness however, let us work with the velocity random vector \(\textbf V\).
To prove the magical claim above, first note that \(\rho_{\textbf V}(\textbf v)=\rho_{\textbf V}(v_x,v_y,v_z)\) is clearly the joint probability distribution formed from the probability distributions \(\rho_{V_x}(v_x),\rho_{V_y}(v_y),\rho_{V_z}(v_z)\) of the particle velocities along the \(x,y\) and \(z\) directions. Mathematically, the probabilistic chain rule asserts (for instance) that \(\rho_{\textbf V}(v_x,v_y,v_z)=\rho_{V_x}(v_x)\rho_{V_y|V_x}(v_y|v_x)\rho_{V_z|V_x,V_y}(v_z|v_x,v_y)\). However, the continuous random variables \(V_x,V_y,V_z\) are assumed independent and, by \(SO(3)\) symmetry, identically distributed, so the conditional probability distributions all simplify to the same marginal distribution denoted \(\tilde\rho:=\rho_{V_x}=\rho_{V_y}=\rho_{V_z}\). Thus, the mathematical challenge is to solve the following functional equation simultaneously for \(\rho_{\textbf V}\) and \(\tilde\rho\):
As usual, the trick to solving functional equations is to convert them to differential equations. Taking logs to convert the product into a sum and then differentiating with respect to \(v_x,v_y\) and then \(v_z\) yields:
where we are viewing \(\rho_{\textbf V}=\rho_{\textbf V}(\textbf v)=\rho_{\textbf V}(|\textbf v|)=\rho_{\textbf V}(v)\) by the same \(SO(3)\) isotropy argument (and indeed the answer was already spoiled earlier!). By the usual separation of variables argument, one can set everything equal to a constant suggestively denoted \(-1/\sigma^2\) (it has to be negative or the resulting solution would not be normalizable as required for interpretation as a probability distribution). This yields a separable first-order ODE which is straightforwardly integrated (and normalized) to yield a zero-mean normal distribution:
If one is specifically interested in the distribution \(\rho_V(v)\) of the continuous speed random variable \(V:=|\textbf V|\), then clearly \(\rho_V(v)=4\pi v^2\rho_{\textbf V}(v)\) is instead a chi distribution rather than a normal distribution.
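Since the result is at this point just a statement about i.i.d. Gaussian components, it can be spot-checked by Monte Carlo: sampling velocity components and computing the speed reproduces the known chi-distribution moments \(\langle V\rangle=2\sigma\sqrt{2/\pi}\) and \(\langle V^2\rangle=3\sigma^2\) (the width \(\sigma\) below is an arbitrary illustrative value; physically it will turn out to be \(\sqrt{kT/m}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.3        # arbitrary illustrative width parameter
n = 200_000

# i.i.d. zero-mean Gaussian components => the speed |V| is chi-distributed (3 dof)
v = rng.normal(0.0, sigma, size=(n, 3))
speed = np.linalg.norm(v, axis=1)

# Known moments of the chi (Maxwell speed) distribution
assert abs(speed.mean() - 2*sigma*np.sqrt(2/np.pi)) < 0.02
assert abs((speed**2).mean() - 3*sigma**2) < 0.06
```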
In order to determine the parameter \(\sigma\) however, it is (finally!) necessary to look at the physics. For \(N=1\) gas particle with Hamiltonian \(H\), its semi-classical canonical partition function \(Z\) is:
For an ideal gas, the classical Hamiltonian is just \(H=|\textbf p|^2/2m\), so if the container has volume \(V=\int_{\text{container}}d^3\textbf x\) then:
At this point the quick way to obtain \(\sigma\) would be to recognize that the semi-classical canonical partition function is the normalization factor in the Boltzmann distribution of the canonical ensemble, so the integrand must be proportional to the probability density function for the momentum random vector \(\textbf P\), from which one can directly see (using \(p^2/2m=mv^2/2\)) that \(1/2\sigma^2=\beta m/2\) so \(\sigma=1/\sqrt{\beta m}=\sqrt{kT/m}\). Alternatively, if that was too slick, one can also separate out the \(3\) Gaussian integrals in the Poisson fashion to get:
\[Z=\frac{V}{\lambda^3}\]
with \(\lambda=\sqrt{\frac{2\pi\hbar^2}{mkT}}\) the thermal de Broglie wavelength of the gas particles. Since the gas is ideal, hence non-interacting, the semi-classical canonical partition function for \(N\) ideal gas particles (denoted \(Z\) again by abuse of notation) is:
\[Z=\frac{V^N}{N!\lambda^{3N}}\]
where the \(N!\) is to avert the Gibbs paradox by accounting for identical particles (although see the article on the Gibbs paradox by E.T. Jaynes). Armed with \(Z\), everything else is at most a few derivatives away. The most immediate is the Helmholtz free energy \(F=-kT\ln Z\approx -NkT\left(\ln V-\ln N-3\ln\lambda+1\right)\) from which one deduces that the pressure \(p=-\frac{\partial F}{\partial V}=\frac{NkT}{V}\) follows the ideal gas law. The expected total energy is \(\langle E\rangle =-\partial\ln Z/\partial\beta=\frac{3}{2}NkT\) with \(C_V=\partial\langle E\rangle/\partial T=\frac{3}{2}Nk\) the heat capacity and by the fluctuation-dissipation theorem \(\sigma_E^2=kT^2C_V=3Nk^2T^2/2\) so that the relative fluctuations \(\sigma_E/\langle E\rangle=\sqrt{2/3N}\) become vanishingly small in the thermodynamic limit. The entropy \(S=k(\beta\langle E\rangle+\ln Z)\) is then given by the Sackur-Tetrode equation:
Notably, the extensivity of the entropy \(S(\xi N,\xi V, T)=\xi S(N, V, T)\) is only possible here thanks to the earlier inclusion of the \(N!\) in the denominator of the semi-classical canonical partition function \(Z\).
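This extensivity is easy to confirm numerically using the standard form of the Sackur-Tetrode equation \(S=Nk\left[\ln\frac{V}{N\lambda^3}+\frac{5}{2}\right]\) (the parameter values below are argon-like and purely illustrative):

```python
import math

hbar, k = 1.054571817e-34, 1.380649e-23   # SI values
m, T = 6.6e-26, 300.0                     # argon-like mass (kg), room temperature

def S(N, V):
    """Sackur-Tetrode entropy S = N k [ ln(V/(N lambda^3)) + 5/2 ]."""
    lam = math.sqrt(2*math.pi*hbar**2/(m*k*T))   # thermal de Broglie wavelength
    return N*k*(math.log(V/(N*lam**3)) + 2.5)

N, V, xi = 1e23, 1e-3, 7.0
# Extensivity: S(xi*N, xi*V, T) = xi * S(N, V, T), thanks to the N! in Z
assert abs(S(xi*N, xi*V) - xi*S(N, V)) < 1e-9
```

Dropping the \(\ln N\) term (i.e. the \(N!\)) from the formula breaks this scaling, which is precisely the Gibbs paradox.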
Anyways, back to computing \(\sigma\). We can first convert the Maxwell distribution \(\rho_{V}(v)\) for the continuous speed random variable \(V\) into a probability density function \(\rho_K(E)\) for the continuous kinetic energy random variable \(K\) by enforcing \(\rho_K(E)dE=\rho_{V}(v)dv\). Thus, one obtains a gamma distribution:
Equating the average energy per ideal gas particle \(\langle E\rangle/N=\frac{3}{2}kT\) calculated earlier with the expected kinetic energy \(\text E[K]=\int_0^{\infty}E\frac{2}{m^{3/2}\sigma^3}\sqrt{\frac{E}{\pi}}e^{-E/m\sigma^2}dE=\frac{2m\sigma^2}{\sqrt{\pi}}\Gamma(5/2)=\frac{3m\sigma^2}{2}\) yields an equation for \(\sigma\) that can be solved to yield \(\sigma=\sqrt{kT/m}\) as claimed earlier.
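The expectation value above can also be confirmed by direct numerical quadrature of the gamma density \(\rho_K(E)=\frac{2}{m^{3/2}\sigma^3}\sqrt{\frac{E}{\pi}}e^{-E/m\sigma^2}\) with \(\sigma^2=kT/m\) (argon-like illustrative parameters), checking both normalization and \(\langle K\rangle=\frac{3}{2}kT\):

```python
import numpy as np

m, k, T = 6.6e-26, 1.380649e-23, 300.0   # argon-like mass (kg), room temperature
sigma2 = k*T/m                           # sigma^2 = kT/m, so m*sigma^2 = kT

E = np.linspace(1e-27, 60*k*T, 400_000)
rho_K = (2/np.sqrt(np.pi)) * np.sqrt(E) / (m*sigma2)**1.5 * np.exp(-E/(m*sigma2))

def trap(y, x):
    """Simple trapezoid-rule quadrature."""
    return float(np.sum(0.5*(y[1:] + y[:-1])*np.diff(x)))

assert abs(trap(rho_K, E) - 1) < 1e-4                 # normalization
assert abs(trap(E*rho_K, E)/(1.5*k*T) - 1) < 1e-4     # <K> = (3/2) kT
```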
The purpose of this post is to lay out the basic theory of ensembles in statistical mechanics in addition to some examples. At the end, the goal will be to convincingly demonstrate ensemble equivalence.
Any system (e.g. a gas, a ferromagnet, a Bose-Einstein condensate, the universe, etc.) in equilibrium is described by some state space \(\mathcal H\) whose elements are called states of the system (the term microstates is also commonly used). For classical systems, \(\mathcal H\cong\textbf R^{6N}\) is typically just the product of \(N\) copies of phase space \((\textbf x,\textbf p)\in\textbf R^6\) for a system of \(N\) particles, though it could also be something more discrete such as \(\mathcal H\cong\{-1,1\}^N\) encoding the \(2^N\) configurations of an Ising ferromagnet of \(N\) spins. For quantum systems, \(\mathcal H\) is (as the notation for it suggests) the Hilbert space of that system. When encountering a new system in nature, the hardest obstacle can just be to figure out what the right choice of state space \(\mathcal H\) for such a system is in the first place.
Armed with a state space \(\mathcal H\), an ensemble \(\mathcal E\) in the broadest sense is simply any subset of state space \(\mathcal E\subset\mathcal H\). For instance, if the state space is \(\textbf R^3\), then the plane \(x+y+z=1\) or the sphere \(x^2+y^2+z^2=1\) would both be ensembles of the state space \(\textbf R^3\). In practice, ensembles of physical systems tend to be cut out from state space \(\mathcal H\) by constraining various quantities (intensive or extensive) to be conserved (this is reasonable because the system is in equilibrium). Sometimes it is said that an ensemble is a collection of microstates compatible with a given macrostate (and such (micro)states are said to be accessible).
Isolated/Equiprobable/Microcanonical Ensemble
The microcanonical ensemble \(\mathcal E_E\) of a given system enforces unwavering conservation of energy \(E\) (though there may also be other conserved quantities like particle number \(N\) and volume \(V\)). That is, only states with the prescribed energy \(E\) are accessible. Physically, this is realized by having the system be utterly isolated from its environment (in practice, the universe is the only truly isolated system, but it will turn out later that the universe is also the most important system to which one needs to apply the microcanonical ensemble).
An important axiom of statistical mechanics is that all accessible microstates in the system’s microcanonical ensemble \(\mathcal E_E\) are equiprobable. This is sometimes called the principle of equal a priori probabilities or the fundamental assumption of statistical mechanics. Regardless, it plays the same kind of fundamental role to statistical mechanics that e.g. the fundamental theorem of calculus plays to calculus (except that the latter is a theorem that can be proven whereas the former is an axiom that is assumed). One can rationalize it in various heuristic ways (e.g. appealing to the principle of detailed balance from kinetic theory) though ultimately, as with anything in theoretical physics, the proper justification comes from experiments.
Although for a classical system like a liquid or gas, it would seem that the number/density/degeneracy/multiplicity of states \(g(E):=|\mathcal E_E|\) in the microcanonical ensemble is uncountably infinite, this will be swept under the rug for now. Then one postulates that the probability \(p_{\mu}\) of observing the system in an accessible state \(\mu\in\mathcal E_E\) is just:
\[p_{\mu}=\frac{1}{g(E)}\]
(the choice of notation “\(\mu\)” comes about because e.g. a micrometer is written \(\mu\text{m}\) so the “\(\mu\)” should be read like “micro” which is reminiscent of the phrase “microstate”).
Given any isolated system, one can always mentally partition the system into a bunch of subsystems, say \(\mathcal N\) pieces. Clearly, this doesn’t affect the fact that the system is still isolated and so only has access to states in its microcanonical \(E\)-ensemble \(\mathcal E_E\). However, the \(\mathcal N\) individual subsystems are no longer isolated since they can interact with each other to shuffle energy back and forth, so long as the system’s total energy \(E_1+E_2+…+E_{\mathcal N}\) remains conserved at \(E\).
The question one would like to answer then is: what is the most likely \(\mathcal N\)-tuple of energies \((E_1,E_2,…,E_{\mathcal N})\in\textbf R^{\mathcal N}\) (notice this is asking about a macrostate)? That is, one would like to maximize the joint probability distribution (a separable function of \(E_1,E_2,…,E_{\mathcal N}\)):
subject to the constraint \(E_1+E_2+…+E_{\mathcal N}=E\), where \(g(E)=(g_1*g_2*…*g_{\mathcal N})(E)\). Implementing this with a Lagrange multiplier \(\lambda\), one can check that:
is the same for all subsystems \(1\leq i\leq\mathcal N\). This also motivates a log-scale measurement \(S_i(E_i)\) of the number of ways to distribute energy \(E_i\) among the microscopic constituents of the \(i\)-th system (cf. the decibel scale in acoustics or the magnitude scale in astronomy) called the (microcanonical/Boltzmann) entropy:
is the same for all systems in the most likely \(E\)-partition, where \(\beta_i(E_i)=\frac{1}{k_BT_i(E_i)}\). Here, \(T_i=T_i(E_i)\) is the familiar concept of temperature, and arises in the microcanonical ensemble as an emergent parameter determined by the energy \(E_i\). Although people sometimes say that temperature is only defined in equilibrium, this clearly need not be the case (e.g. nonequilibrium temperature evolution in time is described by the heat equation \(\dot{T}=\alpha T^{\prime\prime}\)); what is special about temperature is rather that, for a bunch of bodies in equilibrium (\(t\to\infty\)), it is very likely to be the same across all of them.
It’s worth re-examining the derivation above in terms of entropy \(S\) rather than the degeneracy \(g=e^{S/k_B}\). First, notice that rather than maximizing the probability \(\prod_{i=1}^{\mathcal N}g_i(E_i)/g(E)\) one can focus on just maximizing the numerator \(\prod_{i=1}^{\mathcal N}g_i(E_i)\) (representing the total number of states where system #\(1\) has energy \(E_1\), system #\(2\) has energy \(E_2\), etc.) since the denominator \(g(E)\) is just a constant. Second, since logs are monotonically increasing, this is equivalent to maximizing \(\ln\prod_{i=1}^{\mathcal N}g_i(E_i)=\sum_{i=1}^{\mathcal N}\ln g_i(E_i)\). Multiplying by Boltzmann’s constant \(k_B\), it is thus clear that the most likely \(E\)-partition (which need not be an equipartition!) is the one that maximizes the total entropy \(\sum_{i=1}^{\mathcal N}S_i(E_i)\) of the isolated system.
To check that one really has a maximum (and not a minimum), one can take the “Lagrangian”:
which is diagonal and, provided that the heat capacities \(C_i^{-1}:=\frac{\partial T_i}{\partial E_i}\) of the subsystems are all positive as required for equilibrium stability, is clearly negative-definite everywhere (and will thus also be so when restricted to the tangent space of the constraint (at the equal-\(T\) \(E\)-partition) which in this case is just the constraint itself \(E_1+E_2+…+E_{\mathcal N}=E\) because it’s already a plane).
To summarize this section: the most likely \(E\)-partition:
\[E=E_1+E_2+…+E_{\mathcal N}\]
maximizes total entropy \(S_1(E_1)+S_2(E_2)+…+S_{\mathcal N}(E_{\mathcal N})\) by thermalizing \(T_1(E_1)=T_2(E_2)=…=T_{\mathcal N}(E_{\mathcal N})\). These considerations follow respectively from simple probability and calculus, and represent one way to understand the \(2\)nd law of equilibrium (a.k.a. thermodynamics).
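This entropy-maximization picture is easy to see numerically. Assuming hypothetical ideal-gas-like subsystem entropies \(S_i(E_i)=\frac{3}{2}N_ik\ln E_i\) (a concrete illustrative choice, additive constants dropped), a brute-force search over the \(E\)-partition of \(\mathcal N=2\) subsystems finds the maximizer exactly where the temperatures \(T_i=E_i/(\frac{3}{2}N_ik)\) agree:

```python
import numpy as np

k = 1.0                          # units where k_B = 1
N1, N2, Etot = 100, 300, 4000.0  # hypothetical subsystem sizes and total energy

def S(Ei, Ni):
    """Ideal-gas-like entropy S_i = (3/2) N_i k ln(E_i), constants dropped."""
    return 1.5*Ni*k*np.log(Ei)

E1 = np.linspace(1.0, Etot - 1.0, 2_000_000)
E1_star = E1[np.argmax(S(E1, N1) + S(Etot - E1, N2))]

# The maximizer equalizes the temperatures T_i = E_i / ((3/2) N_i k)...
T1, T2 = E1_star/(1.5*N1*k), (Etot - E1_star)/(1.5*N2*k)
assert abs(T1 - T2)/T1 < 1e-3
# ...which here means E1* = Etot * N1/(N1+N2), NOT an equipartition Etot/2
assert abs(E1_star - Etot*N1/(N1 + N2)) < 0.01
```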
Interpretation of “Probability”
The meaning of the word “probability” \(p_{\mu}\) above and also throughout this post deserves a brief comment. Basically, there are \(2\) Fourier-dual ways to think about it:
1. (Frequency Interpretation): Fix a time \(t\) and consider \(\mathcal N\gg 1\) copies of the system. Then the expected number \(\langle \mathcal N_{\mu}\rangle\leq\mathcal N\) of these \(\mathcal N\) systems in state \(\mu\) will be \(\langle \mathcal N_{\mu}\rangle=p_{\mu}\mathcal N\).
2. (Time Interpretation): Fix \(\mathcal N=1\) copy of the system and consider as time \(t\) evolves how the state \(\mu(t)\) of the system fluctuates. Then over a long enough time interval \(\Delta t\), the expected amount of time \(\langle\Delta t_{\mu}\rangle\) the system spent in state \(\mu\) will be \(\langle\Delta t_{\mu}\rangle=p_{\mu}\Delta t\).
But why should these \(2\) interpretations both be valid? Can one show that they are logically equivalent to each other? The answer is no. Rather, this is another axiom which often goes by the name of the ergodic hypothesis. In practice, most systems are ergodic.
Thermal/Boltzmann/Canonical Ensemble
Consider the \(\mathcal N=2\) case where the joint probability distribution of energies simplifies to:
where \((g_1*g_2)(E)=\sum_{E_1}g_1(E_1)g_2(E-E_1)\). Now make the assumption that system #\(1\) is some large system of interest in equilibrium with system #\(2\) which is an infinitely larger heat bath with infinite heat capacity \(C_2=\infty\) representing the rest of the universe/environment/surroundings which is not of interest. Then:
So \(T_2\neq T_2(E_2)\). This means the earlier definition of \(\beta_2\):
\[\beta_2=\frac{\partial\ln g_2}{\partial E_2}\]
now becomes a differential equation with exponentially growing degeneracy \(g_2(E_2)=g_2(0)e^{\beta E_2}\) in light of \(\beta:=\beta_2\neq\beta_2(E_2)\). Alternatively, the extensive quantities \(S_2,E_2\) are on equal footing \(TS_2(E_2)=E_2+TS_2(0)\).
At this point, a natural step would be to Taylor expand \(g_2(E-E_1)\approx g_2(E)-E_1g'_2(E)\) (more properly, one should Taylor expand \(\ln g_2\), which exponentiates to \(g_2(E-E_1)\approx g_2(E)e^{-\beta E_1}\)). Taking the two systems to be in equilibrium, they sit at the same coldness \(\beta=\frac{g_1'(E_1)}{g_1(E_1)}=\frac{g_2'(E_2)}{g_2(E_2)}\approx\frac{g_2'(E)}{g_2(E)}\).
Substituting into the probability distribution yields the exponentially suppressed Boltzmann distribution at temperature \(T\):
where the canonical partition function \(Z(\beta):=\sum_{E_1}g_1(E_1)e^{-\beta E_1}\equiv\int_{-\infty}^{\infty}dE_1g_1(E_1)e^{-\beta E_1}\) is just the Laplace transform of the density of states \(g_1(E_1)\). It follows that the canonical partition function \(Z(\beta)\) contains exactly the same information as the spectrum \(g_1(E_1)\) of a system which can be recovered from knowledge of \(Z(\beta)\) by the inverse Laplace transform:
(notation simplified here to forget about the heat bath). Due to the Laplace transform relationship between \(g(E)\) and \(Z(\beta)\), many standard properties of the transform can be immediately imported. For instance, for the density of states \(g(E)=(g_1*g_2*…*g_{\mathcal N})(E)\) of \(\mathcal N\) independent systems, the convolution theorem implies that the corresponding non-interacting partition function is just the \(\mathcal N\)-fold product \(Z(\beta)=Z_1(\beta)Z_2(\beta)…Z_{\mathcal N}(\beta)\).
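The convolution theorem here is easy to verify on toy discrete spectra (hypothetical energies and degeneracies chosen purely for illustration):

```python
import math
from itertools import product
from collections import Counter

beta = 0.7
# Hypothetical discrete spectra {energy: degeneracy} of two independent systems
g1 = {0: 1, 1: 2, 2: 1}
g2 = {0: 1, 3: 3}

def Z(g, beta):
    """Canonical partition function: (discrete) Laplace transform of the degeneracies."""
    return sum(d*math.exp(-beta*E) for E, d in g.items())

# Degeneracies of the composite system: a discrete convolution (g1 * g2)(E)
g12 = Counter()
for (E1, d1), (E2, d2) in product(g1.items(), g2.items()):
    g12[E1 + E2] += d1*d2

# Convolution theorem: the transform of a convolution is the product of transforms
assert abs(Z(g12, beta) - Z(g1, beta)*Z(g2, beta)) < 1e-12
```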
Given the relative nature of the energy \(E\), the proper, gauge-invariant way to understand the Boltzmann distribution comes from looking at probability ratios:
which depends only on the gauge-invariant energy difference \(\Delta E:=E-E_0>0\) through the Boltzmann factor \(e^{-\beta\Delta E}\). Heuristically, the first factor \(g(E)/g(E_0)=e^{\Delta S/k_B}\) is entropic while the second factor \(e^{-\beta\Delta E}\) is energetic. That is, if one thinks of \(E_0\) as a ground state energy, then states with energy \(E\) exceeding \(E_0\) by more than the thermal energy scale \(\beta^{-1}=k_BT\) are exponentially unlikely to be observed…that is, unless the entropic factor \(g(E)/g(E_0)\) can come to the rescue by growing even faster to offset this exponential suppression. In particular, this will be easier at higher temperature \(T\), where more interesting, high-energy physics can be observed.
In stark contrast to the microcanonical ensemble \(\mathcal E_E\), the energy \(E\) in the canonical ensemble \(\mathcal E_T\) is no longer conserved, but rather fluctuates around the ensemble average \(\langle E\rangle=\int_{-\infty}^{\infty}dE\rho(E)E\) where \(\rho(E)=g(E)e^{-\beta E}/Z\). In fact, one can view the energy \(E:\mathcal E_T\to\textbf R\) as a continuous random variable with Boltzmannian probability density function \(\rho(E)\). Then, as per the shift theorem for Laplace transforms, the moment generating function \(\langle e^{\tilde{\beta}E}\rangle\) of \(E\) is just:
Finally, one can check properly (using the chain rule), or just recall on dimensional-analysis grounds, the following version of the fluctuation-dissipation theorem:
This is just a purely mathematical concept from probability theory.
In physics, probability distributions \(p_{\mu}\) arise due to an observer’s classical ignorance about the state of a system (even in quantum mechanics, von Neumann entropy \(S=-\text{Tr}\rho\ln\rho\) is probing the classical uncertainty associated with a density operator \(\rho\) of a mixed state, not the Heisenbergian uncertainty intrinsic to pure quantum states).
Entropy is thus subjective because it depends on how much one knows about a system (e.g. does one know \(N,V,E\) exactly, or merely averages \(\langle N\rangle,\langle V\rangle,\langle E\rangle\), etc.) which informs one’s “personal” probability distribution over the system’s accessible state space \(\mathcal E\). For instance, God knows the exact state \(\mu^*\) of the universe, so that the delta distribution \(p_{\mu}=\delta_{\mu\mu^*}\) spiking at \(\mu=\mu^*\) has entropy \(S=0\). From God’s perspective therefore, the entropy of the universe is always \(S=0\). The average person however isn’t as omniscient as God, and so from their reference frame the universe has \(S>0\).
In casual conversation, when one speaks of e.g. the entropy \(S\) of a glass of water, or the change in entropy \(\Delta S\) of that glass of water, everything can be framed more clearly if, instead of using the word “entropy”, one replaces it by the phrase “classical ignorance”. This naturally leads one to rephrase one’s language; one could instead say “my entropy towards that glass of water is \(S\)” or “my entropy towards the heat bath has increased by \(\Delta S\)”. Moreover, if one’s information about the system changes, then prior probabilities need to be updated to posterior probabilities as well, changing one’s entropy towards a system.
In the microcanonical ensemble, the Boltzmann entropy is \(S=k\ln\Omega\). However, this is no longer appropriate for the canonical ensemble because the energy \(E\) is no longer fixed and therefore \(\Omega=\Omega(E)\) is not well-defined.
Instead, we can figure out what \(S\) is in the canonical ensemble (actually, in any thermodynamic ensemble) by recycling the trick of viewing the system + heat bath \(\mathcal S\cup\mathcal E\) as living in the microcanonical ensemble so that we would be able to apply the principle of equal a priori probabilities. In fact, one additional trick is also required. This trick is to consider \(N\) identical copies \(\prod_N\mathcal S\cup\mathcal E\) of \(\mathcal S\cup\mathcal E\). If one likes, one can think of \(N\) identical non-interacting universes each containing a copy of \(\mathcal S\cup\mathcal E\). If one writes \(0\leq N_n\leq N\) to be the number of these \(N\) universes whose system \(\mathcal S\) is in microstate \(n\) so that \(\sum_n N_n=N\), then by the law of large numbers one expects that the \(N_n= p_n N\) are fixed by the microstate probability distribution \(p_n\) (not necessarily the canonical ensemble \(p_n=e^{-\beta E_n}/Z\)) in any of the \(N\) identical universes.
As a toy example for building intuition, suppose only two microstates \(|\uparrow\rangle\) and \(|\downarrow\rangle\) are accessible with probabilities \(p_{|\uparrow\rangle}=30\%\) and \(p_{|\downarrow\rangle}=70\%\) so that in \(N=100\) universes we expect \(p_{|\uparrow\rangle}N=30\) of them to be in the \(|\uparrow\rangle\) microstate and \(p_{|\downarrow\rangle}N=70\) to be in the \(|\downarrow\rangle\) microstate. Then a microstate of the entire collection of \(N=100\) universes would consist of specifying the microstate (either \(|\uparrow\rangle\) or \(|\downarrow\rangle\)) for each of the \(N=100\) universes so as to satisfy the constraints that \(30\) are \(|\uparrow\rangle\) and \(70\) are \(|\downarrow\rangle\). This is precisely analogous to having \(30\) \(|\uparrow\rangle\) “letters” and \(70\) \(|\downarrow\rangle\) “letters” and asking for the number of distinct \(100\)-letter words that could be formed from their permutations. Clearly, the answer is given by the multinomial coefficient \(\Omega=\frac{100!}{30!70!}\). Generalizing this, we therefore have:
With the corresponding Boltzmann entropy in the microcanonical ensemble (after applying Stirling’s approximation in the form \(\ln N!\approx N\ln N-N\)):
\[S\approx -Nk\sum_{n}p_n\ln p_n\]
However, for non-interacting systems (universes!) the entropy is additive (alternatively, just set \(N=1\) universe) so one obtains the formula for Gibbs entropy as promised long ago:
\[S=-k\sum_{n}p_n\ln p_n\]
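The toy two-microstate example above makes this concrete: already for \(N=100\) universes, the exact \(k\ln\Omega=k\ln\binom{100}{30}\) agrees with the Stirling-approximated Gibbs form \(-Nk\sum_np_n\ln p_n\) to within a few percent (units with \(k=1\)):

```python
import math

N = 100
p_up, p_down = 0.30, 0.70
k = 1.0  # units where k_B = 1

# Exact multiplicity of the N-universe collection from the toy example
Omega = math.comb(N, int(p_up*N))
S_exact = k*math.log(Omega)

# Stirling/Gibbs approximation: S ≈ -N k sum_n p_n ln p_n
S_gibbs = -N*k*(p_up*math.log(p_up) + p_down*math.log(p_down))

assert abs(S_exact - S_gibbs)/S_gibbs < 0.05   # agreement improves as N grows
```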
Back in the canonical ensemble \(p_n=e^{-\beta E_n}/Z\), this leads to a nice formula for the entropy of the canonical ensemble:
where the first equation can be roughly interpreted as a Boltzmann-like entropy \(k\ln Z\) with an additional energetic contribution \(k\beta\langle E\rangle=\langle E\rangle/T\). In fact, there is a more direct way to see how the Boltzmann entropy emerges by looking at how the partition function \(Z\) in the above formula for \(S\) behaves in the thermodynamic limit. Specifically, recall that \(Z=\sum_n e^{-\beta E_n}=\sum_{E_n}\Omega_{\mathcal S}(E_n)e^{-\beta E_n}\) which, being a sum of exponentials, will be dominated by the maximum value \(Z\approx \Omega_{\mathcal S}(E^*)e^{-\beta E^*}\) occurring at some energy \(E^*\) satisfying \(\frac{\partial \Omega_{\mathcal S}(E^*)e^{-\beta E^*}}{\partial E^*}=0\) or equivalently \(\frac{\partial\ln\Omega_{\mathcal S}(E^*)}{\partial E^*}=\beta\). Then clearly \(\langle E\rangle=-\frac{\partial\ln Z}{\partial\beta}=-\frac{\partial\ln \Omega_{\mathcal S}(E^*)}{\partial E^*}\frac{\partial E^*}{\partial\beta}+E^*+\beta\frac{\partial E^*}{\partial\beta}=E^*\) as expected in the thermodynamic limit and:
Heat Capacity Contribution From Schottky Anomaly in Canonical Ensemble
Consider a system of \(N\) fixed, non-interacting electrons \(e^-\) placed in an external \(\textbf B_{\text{ext}}\)-field. Each electron experiences a Zeeman splitting of magnitude \(\Delta E=E_{\downarrow}-E_{\uparrow}=\hbar\omega_0\) with \(\omega_0=\gamma B_{\text{ext}}\) the Larmor frequency of spin angular momentum precession about \(\textbf B_{\text{ext}}\). If we focus for a moment on just \(N=1\) of these fixed electrons \(e^-\), then \(Z=e^{-\beta E_{\uparrow}}+e^{-\beta E_{\downarrow}}=e^{\beta\hbar\omega_0/2}+e^{-\beta\hbar\omega_0/2}=2\cosh\frac{\beta\hbar\omega_0}{2}\). But then, recalling that the \(N\) electrons are non-interacting, it follows that the total partition function is:
In practice, this Schottky anomaly due to the spin angular momentum contribution to \(C_V\) is dwarfed by contributions from phonons and from conduction \(e^-\) in metals. Furthermore, it doesn’t account for coupling between spins since we assumed the \(e^-\) were non-interacting.
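A small numerical sketch of the Schottky anomaly: differentiating \(\ln Z\) twice by finite differences (i.e. using \(C=k\beta^2\,\partial^2\ln Z/\partial\beta^2\), the fluctuation-dissipation route) recovers the analytic heat capacity \(C=Nk\left(\frac{\beta\hbar\omega_0}{2}\right)^2\text{sech}^2\frac{\beta\hbar\omega_0}{2}\) and locates its peak near \(kT\approx 0.42\,\hbar\omega_0\) (units with \(k=\hbar\omega_0=1\)):

```python
import math

k, hw = 1.0, 1.0   # units where k_B = 1 and the Zeeman gap ħω0 = 1
N = 1

def lnZ(beta):
    """ln Z for N non-interacting two-level electrons: Z = (2 cosh(βħω0/2))^N."""
    return N*math.log(2*math.cosh(beta*hw/2))

def C(beta, h=1e-4):
    """Heat capacity C = k β² d²(ln Z)/dβ² via central finite differences."""
    d2 = (lnZ(beta + h) - 2*lnZ(beta) + lnZ(beta - h)) / h**2
    return k*beta**2*d2

# Matches the analytic result C = N k (βħω0/2)² sech²(βħω0/2)
beta = 2.0
analytic = N*k*(beta*hw/2)**2 / math.cosh(beta*hw/2)**2
assert abs(C(beta) - analytic) < 1e-5

# The Schottky peak sits near kT ≈ 0.417 ħω0
beta_peak = max((0.01*i for i in range(1, 2000)), key=C)
assert abs(1/beta_peak - 0.417*hw/k) < 0.01
```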
As an example of computing entropy in the canonical ensemble, for the earlier Schottky anomaly this is:
As another remark, if we take \(M_{\text{ind}}:=N_{|\uparrow\rangle}-N_{|\downarrow\rangle}=N\tanh\frac{\beta\hbar\omega_0}{2}\) to be the net induced magnetization along \(\textbf B_{\text{ext}}\), then one can also establish a version of Curie’s law \(\chi:=\frac{\partial M}{\partial B_{\text{ext}}}=\beta\mu N\text{sech}^2\frac{\beta\hbar\omega_0}{2}\approx\beta\mu N\) in the high-temperature limit \(T\to\infty\) (where here \(\mu=\hbar\gamma/2\)).
Helmholtz Free Energy
The phrase “free energy” is used ubiquitously throughout thermodynamics. From a practical engineering perspective, free energies can always be interpreted as the maximum available work \(W_{\text{available}}^*\) under specific circumstances. In mechanics, this would simply be called “potential energy”, and to reflect this analogy in thermodynamics the free energy functions are also known as thermodynamic potentials.
Because the canonical ensemble is the \(NVT\)-ensemble, in particular at constant volume \(V\) and constant temperature \(T\) the Helmholtz free energy \(F:=\langle E\rangle -TS=-kT\ln Z\), which has the most direct connection to the partition function \(Z\) (no derivatives, just the log), is minimized. Roughly speaking, any spontaneous change of the system at fixed \(T,V\) must increase the total entropy of system + heat bath; the bath’s entropy changes by \(-\Delta\langle E\rangle/T\), so the total change is \(\Delta S-\Delta\langle E\rangle/T=-\Delta F/T\geq 0\), i.e. \(\Delta F\leq 0\) (see: https://physics.stackexchange.com/questions/341345/why-does-the-gibbs-free-energy-need-to-be-minimized-for-an-equilibrium for more discussion).
Chemical Potential & Grand Canonical Ensemble
Conceptually, having already made the transition from the microcanonical ensemble to the canonical ensemble, it’s not difficult to make the jump from the canonical ensemble to the grand canonical ensemble. Here, the system \(\mathcal S\) interacts with a heat bath \(\mathcal E\) at constant temperature \(T\) and chemical potential \(\mu:=-T\frac{\partial S}{\partial N}\). As usual, assume the heat bath has a lot more energy and particles than the system \(N_{\mathcal E},E_{\mathcal E}\gg N_{\mathcal S},E_{\mathcal S}\) and that the composite system \(\mathcal S\cup\mathcal E\) is in the microcanonical ensemble so by the principle of equal a priori probabilities:
where \(|\mu_{\mathcal S}\rangle\) denotes an arbitrary microstate of just the system \(\mathcal S\) and \(\Omega_{\mathcal S\cup\mathcal E}=\sum_{|\tilde{\mu}_{\mathcal S}\rangle}\Omega_{\mathcal E}(N-N_{|\tilde{\mu}_{\mathcal S}\rangle},E-E_{|\tilde{\mu}_{\mathcal S}\rangle})\) which ensures \(\sum_{|\mu_{\mathcal S}\rangle}p_{|\mu_{\mathcal S}\rangle}=1\). As usual, the \(3\) steps are then:
Rewrite the heat bath’s multiplicity \(\Omega_{\mathcal E}\) in terms of its Boltzmann entropy \(\Omega_{\mathcal E}=e^{S_{\mathcal E}/k}\)
Taylor expand the heat bath’s Boltzmann entropy \(S_{\mathcal E}(N_{\mathcal E},E_{\mathcal E})\) about the total \((N,E)\) of the composite system + heat bath \(\mathcal S\cup\mathcal E\).
Plug in the definitions of \(T\) and \(\mu\) into the first partial derivative terms that appear in such a multivariable Taylor expansion.
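Carrying out these \(3\) steps explicitly (using \(\partial S_{\mathcal E}/\partial E=1/T\) and \(\partial S_{\mathcal E}/\partial N=-\mu/T\) from the definitions above):

\[\Omega_{\mathcal E}(N-N_{|\mu_{\mathcal S}\rangle},E-E_{|\mu_{\mathcal S}\rangle})=e^{S_{\mathcal E}(N-N_{|\mu_{\mathcal S}\rangle},E-E_{|\mu_{\mathcal S}\rangle})/k}\approx e^{S_{\mathcal E}(N,E)/k}e^{-\beta(E_{|\mu_{\mathcal S}\rangle}-\mu N_{|\mu_{\mathcal S}\rangle})}\]

with the \(|\mu_{\mathcal S}\rangle\)-independent prefactor \(e^{S_{\mathcal E}(N,E)/k}\) absorbed into the normalization.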
Doing this, you end up with the grand canonical ensemble microstate probability distribution:
Now it’s the grand canonical partition function \(\mathcal Z=\sum_{|\tilde{\mu}_{\mathcal S}\rangle}e^{-\beta(E_{|\tilde{\mu}_{\mathcal S}\rangle}-\mu N_{|\tilde{\mu}_{\mathcal S}\rangle})}\) which provides the gateway to computing all the macroscopic observables of a grand canonical thermodynamic system. One can check the following formulas (all referring to the system \(\mathcal S\)):
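One of these standard formulas, \(\langle N\rangle=\frac{1}{\beta}\frac{\partial\ln\mathcal Z}{\partial\mu}\), can be sanity-checked symbolically on a toy model (the \(M\)-site two-level model below is an illustrative assumption, not from the post):

```python
import sympy as sp

beta, mu, eps = sp.symbols('beta mu epsilon', positive=True)

# Toy grand canonical system (an illustrative assumption): M independent
# sites, each either empty (E = 0, N = 0) or occupied (E = epsilon, N = 1)
M = 3
Z_site = 1 + sp.exp(-beta * (eps - mu))
Z_grand = Z_site ** M

# <N> = (1/beta) d(ln Z)/d(mu), compared against the direct ensemble average
N_avg = sp.diff(sp.log(Z_grand), mu) / beta
N_direct = M * sp.exp(-beta * (eps - mu)) / Z_site
assert sp.simplify(N_avg - N_direct) == 0
```

Because the sites are independent, \(\mathcal Z\) factorizes into single-site partition functions, and the derivative formula reproduces the expected Fermi-like occupation per site.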
In the grand canonical ensemble, we of course have the grand canonical potential \(\Phi:=F-\mu N\), the Legendre transform of the Helmholtz free energy \(F\) from \(N\) to \(\mu\) (since in the grand canonical ensemble it is \(\mu\), no longer \(N\), that is held fixed). Note that \(\Phi=-kT\ln \mathcal Z\). Finally, using the extensivity argument \(\Phi(\mu, \xi V,T)=\xi\Phi(\mu, V, T)\), it follows that \(\Phi\propto V\), and in fact the constant of proportionality is the (negative of the) pressure \(p=p(\mu, T)\), so \(\Phi=-pV\). This follows from the differential \(d\Phi=-SdT-pdV-Nd\mu\).
Recall that if one confines a free quantum particle onto a circle \(S^1\) of radius \(R\), then the de Broglie wavelength of the \(n\)-th \(H=P^2/2m\)-eigenstate must be quantized in the obvious manner \(\lambda_n=2\pi R/n\), leading to the angular wavenumber \(k_n=n/R\), the momenta \(p_n=n\hbar/R\) and the energies \(E_n=n^2\hbar^2/2I\) with \(I=mR^2\) the classical moment of inertia. Thus, more broadly, because the spatial domain \(S^1\) of the quantum particle was compact, it follows that the conjugate momenta \(p_n\) were discrete. However, it should not be a surprise to hear that the converse holds too by virtue of Pontryagin duality. That is to say, if one instead begins by discretizing space \(S^1\mapsto\textbf Z/N\textbf Z\) into a lattice of \(N\) equally spaced atomic sites such as in a crystal, then one should now expect the momenta \(p\) to lie in some compact interval (which will turn out to be called the Brillouin zone). Thus, roughly speaking, compactness and discreteness are Pontryagin/Fourier duals of each other. In some sense, this is just another manifestation of the Heisenberg uncertainty principle.
One might think to just stick with the usual free Hamiltonian \(H=P^2/2m\), however this doesn’t work because \(P\) is undefined on \(\mathcal H=L^2(\textbf Z/N\textbf Z\to\textbf C)\). Instead, we employ a so-called tight-binding Hamiltonian \(H_{\text{tight-binding}}\) defined by:
where for \(n=1,2,…,N\), \(|n\rangle\) denotes a position \(X\)-eigenstate of the quantum particle localized on the \(n\)-th atomic lattice point, with probability \(E_{\text{hop}}^2/E_0^2\) of locally tunneling from \(|n\rangle\) to a neighbouring atom \(|n\pm 1\rangle\). Some remarks:
These \(N\) position \(X\)-eigenstates span the finite \(N\)-dimensional state space \(\mathcal H=\text{span}_{1\leq n\leq N}|n\rangle\).
The matrix elements of the tight-binding Hamiltonian \(H_{\text{tight-binding}}\) in the \(X\)-eigenbasis \(\{|n\rangle:n=1,…,N\}\) are given by \(\langle m|H_{\text{tight-binding}}|n\rangle=E_0\delta_{nm}-E_{\text{hop}}\delta_{n,m+1}-E_{\text{hop}}\delta_{n,m-1}\) or one can combine \(\delta_{n,m+1}+\delta_{n,m-1}=\delta_{n,m\pm 1}\) into a single Kronecker delta. Such an \(N\times N\) Hermitian matrix \([H_{\text{tight-binding}}]_{(|n\rangle:n=1,…,N)}^{(|n\rangle:n=1,…,N)}\in i\frak u\)\((N)\) has \(E_0\) on the diagonal and \(-E_{\text{hop}}\) on the subdiagonal and superdiagonal, in addition to a \(-E_{\text{hop}}\) at the bottom-left and top-right corners due to the periodicity \(|N+1\rangle\cong |1\rangle\) of \(\textbf Z/N\textbf Z\). This reinforces the local tunneling interpretation earlier.
\(E_0\) is the “on-site energy” associated with being localized on a particular atomic orbital, while \(E_{\text{hop}}\) is like a “delocalization energy” for reasons that will become clear.
An illustrative case is that of \(N=5\) atoms, with a single electron \(e^-\) tunneling around this regular pentagon lattice such as one might find in a cyclopentane ring:
In this case, the \(5\times 5\) matrix of the tight-binding Hamiltonian looks like:
To find the allowed energies \(E\) of the quantum particle (in this case the electron \(e^-\)) tunneling around on this pentagonal lattice, one could play the standard game of finite-dimensional linear algebra which is to solve \(\det([H_{\text{tight-binding}}]_{(|1\rangle, |2\rangle, |3\rangle, |4\rangle, |5\rangle)}^{(|1\rangle, |2\rangle, |3\rangle, |4\rangle, |5\rangle)}-E1)=0\) for the roots \(E\) of the characteristic polynomial of \([H_{\text{tight-binding}}]_{(|1\rangle, |2\rangle, |3\rangle, |4\rangle, |5\rangle)}^{(|1\rangle, |2\rangle, |3\rangle, |4\rangle, |5\rangle)}\), and then go back and find the corresponding \(H_{\text{tight-binding}}\)-eigenstates. Indeed, this is a few lines of Python code using the SymPy library:
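The snippet is not reproduced here, but a minimal SymPy sketch along those lines (symbol names are our own) might look like:

```python
import sympy as sp

E0, Eh = sp.symbols('E_0 E_hop', positive=True)

# 5x5 tight-binding Hamiltonian on the pentagonal ring: E_0 on the diagonal,
# -E_hop on the sub/superdiagonal and in the two periodic corners
H = sp.Matrix(5, 5, lambda n, m: E0 if n == m
              else (-Eh if (n - m) % 5 in (1, 4) else 0))

# The uniform LCAO (1,1,1,1,1)^T is an eigenvector with E = E_0 - 2 E_hop
ones = sp.ones(5, 1)
assert H * ones == (E0 - 2 * Eh) * ones

# The characteristic polynomial factors into one linear factor and a
# squared quadratic: a singlet plus two degenerate doublets
lam = sp.symbols('lam')
print(sp.factor(H.charpoly(lam).as_expr()))
```

The squared quadratic factor is what produces the two doubly-degenerate levels involving the golden ratio.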
which says that there are \(3\) energies, a singlet ground state at \(E=E_0-2E_{\text{hop}}\), a degenerate doublet at \(E=E_0-E_{\text{hop}}/\varphi\) and a highest-energy degenerate doublet at \(E=E_0+\varphi E_{\text{hop}}\) where \(\varphi=(1+\sqrt{5})/2\) is the golden ratio:
However, to handle the more general case, it is helpful to exploit the translational symmetry of the lattice group \(\textbf Z/N\textbf Z\cong C_N\). The way this manifests is that the tight-binding Hamiltonian matrix \([H_{\text{tight-binding}}]_{(|1\rangle, |2\rangle, |3\rangle, |4\rangle, |5\rangle)}^{(|1\rangle, |2\rangle, |3\rangle, |4\rangle, |5\rangle)}\) is a particular kind of Toeplitz matrix known as a circulant matrix, i.e. the rows and columns look like successive cyclic shifts of the same “circulant vector” (its first column) \(\begin{pmatrix} E_0\\-E_{\text{hop}}\\0\\0\\-E_{\text{hop}}\end{pmatrix}\).
There is a fundamental universality theorem that everyone ought to know about circulant matrices (which is really not all that surprising), namely that they are all diagonalized by the discrete Fourier transform (DFT). This is because any \(N\times N\) circulant matrix \(C\) defined by a circulant vector \(\textbf c\in\textbf C^N\) (the first column of \(C\)) acting on a vector \(\textbf x\in \textbf C^N\) produces the vector \(C\textbf x\in\textbf C^N\), which is nothing more than the discrete circular convolution \(C\textbf x=\textbf c*\textbf x=\textbf x*\textbf c\) of the circulant vector \(\textbf c\) with the input vector \(\textbf x\). So by the convolution theorem, \(F_NC\textbf x=F_N\textbf c\odot F_N\textbf x\), where \(F_N\in U(N)\) is the \(N\times N\) unitary DFT matrix of \(N\)-th roots of unity with components \((F_N)_{nm}=\omega_{N}^{-nm}/\sqrt{N}\) for \(0\leq n,m\leq N-1\), where \(\omega_N:=e^{2\pi i/N}\) is a primitive \(N\)-th root of unity. In particular, one can write the convolution theorem in the more suggestive form \(F_NCF_N^{-1}\textbf x=F_N\textbf c\odot \textbf x\), from which one can check that \(F_N\) (independent of the circulant matrix \(C\)!) diagonalizes \(C\), with the resultant diagonal matrix of eigenvalues \(F_NCF_N^{-1}=\text{diag}(F_N\textbf c)\) dependent on \(C\). In other words, the \(N\) columns of \(F_N\) are the eigenvectors of any \(N\times N\) circulant matrix, and are called the \(N\)-th Fourier modes.
Thus, all this discussion goes against the usual finite-dimensional linear algebra routine since we’ve written down the eigenvectors first before even finding their eigenvalues, but of course the symmetry here warrants it. Note that there is nothing inherently quantum-mechanical about any of this; for instance, if one has \(N\) classical masses \(m\) connected by identical springs \(k\) all sliding around \(S^1\), then the equation of motion for the state of such a system is also expressible via circulant matrices because each mass interacts only with its nearest neighbors.
Back to the physics then, we can immediately write down the normalized \(H_{\text{tight-binding}}\)-eigenstates for a \(5\)-atom pentagonal lattice. For instance, one of them is simply a maximally delocalized LCAO of all \(5\) \(X\)-eigenstates \(\frac{1}{\sqrt{5}}(|1\rangle+|2\rangle+|3\rangle+|4\rangle+|5\rangle)\). The others are the remaining columns of the \(5\times 5\) DFT matrix \(F_5\):
import sympy as sp

sp.init_printing(use_unicode=True)

# Build the 5x5 unitary DFT matrix with entries exp(-2*pi*i*n*m/5)/sqrt(5)
DFT_matrix = sp.zeros(5, 5)
for n in range(5):
    for m in range(5):
        DFT_matrix[n, m] = sp.nsimplify(sp.expand_complex(sp.exp(-2 * sp.I * sp.pi * n * m / 5) / sp.sqrt(5)))
DFT_matrix
Although keep in mind that all of these complex numbers have modulus \(1/\sqrt{5}\), so in some sense all of these eigenstates are fully delocalized around the ring.
The more interesting quantities one would like to compute are the energies \(E\). These were already calculated above using SymPy but of course if one already knows the eigenvectors then the energies are trivial to obtain. For instance, acting \([H_{\text{tight-binding}}]_{(|1\rangle, |2\rangle, |3\rangle, |4\rangle, |5\rangle)}^{(|1\rangle, |2\rangle, |3\rangle, |4\rangle, |5\rangle)}\) on \((1,1,1,1,1)^T\) (doesn’t even have to be normalized!) gives \(E=E_0-2E_{\text{hop}}\) so that \(\frac{1}{\sqrt{5}}(|1\rangle+|2\rangle+|3\rangle+|4\rangle+|5\rangle)\) is the ground state of the system. Proceeding along the columns of the DFT matrix \(F_5\), the next energy lies at \(E=E_0-e^{-2\pi i/5}E_{\text{hop}}-e^{-8\pi i/5}E_{\text{hop}}=E_0-E_{\text{hop}}/\varphi\), the next two both have maximum energy \(E=E_0+\varphi E_{\text{hop}}\), and finally the last one also has energy \(E=E_0-E_{\text{hop}}/\varphi\).
So this is all fine for \(N=5\), and has hopefully provided intuition for what’s going on. Generalizing, one finds that the \(n\)-th Fourier mode eigenvector \(\boldsymbol{\psi}_n=(1, e^{2\pi i n/N},…,e^{2\pi i n(N-1)/N})^T\) for \(n=0,1,…,N-1\) of the tight-binding Hamiltonian matrix \([H_{\text{tight-binding}}]_{(|n\rangle:n=1,…,N)}^{(|n\rangle:n=1,…,N)}\) has energy \(E_n=E_0-(e^{2\pi i n/N}+e^{2\pi i n(N-1)/N})E_{\text{hop}}\). But \(e^{2\pi i n(N-1)/N}=e^{-2\pi i n/N}\) so overall we get a real energy as required by Hermiticity of \(H_{\text{tight-binding}}\):
The twofold degeneracy \(E_n=E_{N-n}\) of the energies (leaving only \(\lfloor N/2\rfloor +1\) distinct levels) is thus manifest via the even nature of the cosine function.
Finally, define the angular wavenumber \(k_n\) of the \(n\)-th Fourier mode \(\boldsymbol{\psi}_n\) by requiring \(k_n a:=\frac{2\pi n}{N}\) where \(a\) is the separation between neighbouring atoms. Then of course one can also write \(E_n=E_0-2E_{\text{hop}}\cos(k_n a)\). If we think of \(n\) as going from \(n\approx -N/2,…,N/2\), then the wavenumber \(k_n\) lies in the compact interval \(k_n\in[-\pi/a,\pi/a]\), called the Brillouin zone of the system. The dispersion relation for the \(N\)-atom ring is then given by:
This mimics the usual free-space dispersion relation \(E(k)=\frac{\hbar^2k^2}{2m}\) near \(k=0\) where the cosine is approximately parabolic. Indeed, performing a Taylor expansion, then for a slow-moving electron \(k\to 0\) it looks like a free particle with effective mass \(m_{\text{eff}}=\hbar^2/2a^2E_{\text{hop}}\) which has nothing to do with the actual mass \(m\) of the electron, being inherited from properties \(E_{\text{hop}}\) and \(a\) of the lattice.
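As a numerical sanity check of the dispersion relation (a sketch; the values of \(N\), \(E_0\), \(E_{\text{hop}}\) below are arbitrary):

```python
import numpy as np

# Compare the eigenvalues of the N-site ring's circulant tight-binding
# matrix against the dispersion E_n = E_0 - 2 E_hop cos(2 pi n / N)
N, E0, Ehop = 12, 1.0, 0.25

# E_0 on the diagonal, -E_hop on the two cyclic off-diagonals (with corners)
H = E0 * np.eye(N) - Ehop * (np.roll(np.eye(N), 1, axis=0)
                             + np.roll(np.eye(N), -1, axis=0))

n = np.arange(N)
dispersion = E0 - 2 * Ehop * np.cos(2 * np.pi * n / N)

assert np.allclose(np.sort(np.linalg.eigvalsh(H)), np.sort(dispersion))
```

The `np.roll` trick builds the periodic sub/superdiagonals directly, so the corner hopping terms from \(|N\rangle\leftrightarrow|1\rangle\) come for free.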
Consider a classical particle subject to Newton’s second law \(\dot{\textbf p}=\textbf F\). If one takes the dot product of both sides with the particle’s velocity \(\dot{\textbf x}\) so that \(\dot{\textbf p}\cdot\dot{\textbf x}=\textbf F\cdot\dot{\textbf x}\), the left hand side is an exact total derivative \(\dot T=\dot{\textbf p}\cdot\dot{\textbf x}\) of the kinetic energy \(T=\frac{\textbf p\cdot\dot{\textbf x}}{2}\) while the right hand side is by definition the total power \(P\) developed, thus obtaining the familiar \(\dot T=P\). Of course we all know how useful this is, especially when \(\textbf F=-\frac{\partial V}{\partial\textbf x}\) is conservative so that \(P=-\dot V\) and integration yields a conserved energy \(E=T+V\).
But what if, instead of dotting the equation of motion \(\dot{\textbf p}=\textbf F\) with the velocity \(\dot{\textbf x}\) as we did above, we dot it with the position \(\textbf x\) itself? What fruitful insights might arise? This leads to:
But we notice that the right-hand side of the equation looks like the work done by the force \(\textbf F\) if the particle were moved from the origin \(\textbf 0\) to its actual position \(\textbf x\) (strictly speaking, this interpretation is only exactly correct if \(\textbf F\) is a constant force). This suggests that the left-hand side of the equation should be amenable to some kind of energy interpretation. One can notice the “integration by parts” combination \(\dot{\textbf p}\cdot\textbf x=\frac{d}{dt}(\textbf x\cdot\textbf p)-\textbf p\cdot\dot{\textbf x}\) and moreover the last term is just \(2T=\textbf p\cdot\dot{\textbf x}\) twice the kinetic energy, confirming the energy interpretation mentioned above. The remaining term inside the time derivative is a bit more mysterious, consisting of the dot product of the position \(\textbf x\) with its conjugate momentum \(\textbf p\) (feels like a combination you would see in Hamiltonian mechanics, but nothing immediately comes to mind). Let’s call it the virial \(\tilde L:=\textbf x\cdot\textbf p\) because it resembles angular momentum \(\textbf L=\textbf x\times\textbf p\) except with the cross product \(\times\) replaced by a dot product \(\cdot\).
The key insight of the virial theorem is that, in many physical situations of interest in the real world (e.g. bound orbits in a Coulomb potential), the average rate of change of the virial \(\tilde L\) on sufficiently long timescales \(\Delta t\to\infty\) is zero. To see this, recall that the average rate of change over a time interval \(\Delta t\) is just \(\frac{\Delta\tilde L|_{\Delta t}}{\Delta t}\). Under what conditions does this quotient vanish? Of course, it could just happen to sporadically vanish at random specific values of \(\Delta t\) but that’s not a particularly reliable phenomenon. Instead, the more robust way it would (approximately) vanish is if the denominator \(\Delta t\to\infty\) is large. But there is a catch here, which is that the numerator \(\Delta\tilde L|_{\Delta t}\) could also blow up to \(\infty\) at the same time. So really, the requirement we’re after is that the numerator \(\Delta\tilde L|_{\Delta t}\) does not grow faster than the denominator \(\Delta t\) as \(\Delta t\to\infty\). In other words, we require sublinear numerator growth \(\Delta\tilde L|_{\Delta t}=o_{\Delta t\to\infty}(\Delta t)\). Of course, it could be that the numerator actually decays as \(\Delta t\to\infty\), or remains bounded \(\Delta\tilde L|_{\Delta t}=\Theta_{\Delta t\to\infty}(1)\).
Assuming therefore that the physical situation of interest satisfies the hypothesis \(\lim_{\Delta t\to\infty}\frac{\Delta\tilde L|_{\Delta t}}{\Delta t}=0\) of the virial theorem, it follows that, on suitably long time scales \(\Delta t\):
In the specific case where the force \(\textbf F\) is not only conservative but specifically arises from a central power-law potential energy \(V(\textbf x)=k|\textbf x|^n\), then the corresponding central force is \(\textbf F=-\frac{\partial V}{\partial\textbf x}=-nk|\textbf x|^{n-2}\textbf x\) and so happily \(\textbf F\cdot\textbf x=-nV(\textbf x)\) yielding the useful corollary of the virial theorem \(2\langle T\rangle_{\Delta t}=n\langle V\rangle_{\Delta t}\) for central power-law forces. For the isotropic harmonic oscillator \(n=2\), this gives the familiar equipartition theorem \(\langle T\rangle_{\Delta t}=\langle V\rangle_{\Delta t}\) while for the Coulomb potential \(n=-1\) this gives \(2\langle T\rangle_{\Delta t}=-\langle V\rangle_{\Delta t}\). Finally, it is not difficult to generalize the virial theorem to multi-particle systems, where the generalization is essentially the obvious one.
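For the Coulomb case, the prediction \(2\langle T\rangle_{\Delta t}=-\langle V\rangle_{\Delta t}\) is easy to check numerically; here is a minimal leapfrog sketch in units \(k=m=1\) (the initial conditions and tolerance are arbitrary choices giving a bound orbit):

```python
import math

# Time-average T and V over several periods of a bound Kepler orbit
# (V = -1/r, so F = -x/|x|^3); the virial theorem predicts 2<T> = -<V>.
def virial_averages(x, y, vx, vy, dt=1e-4, steps=200_000):
    T_sum = V_sum = 0.0
    for _ in range(steps):
        r = math.hypot(x, y)
        vx += 0.5 * dt * (-x / r**3)      # half-kick
        vy += 0.5 * dt * (-y / r**3)
        x += dt * vx                      # drift
        y += dt * vy
        r = math.hypot(x, y)
        vx += 0.5 * dt * (-x / r**3)      # half-kick
        vy += 0.5 * dt * (-y / r**3)
        T_sum += 0.5 * (vx**2 + vy**2)
        V_sum += -1.0 / r
    return T_sum / steps, V_sum / steps

T_avg, V_avg = virial_averages(1.0, 0.0, 0.0, 0.8)  # an eccentric bound orbit
assert abs(2 * T_avg + V_avg) < 0.05 * abs(V_avg)
```

The residual comes from averaging over a non-integer number of orbital periods, exactly the sublinear-growth caveat discussed above.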
The purpose of this post is to contrast the classical treatment of Rutherford scattering with its quantum mechanical treatment. At the end, the surprise will be that for certain important quantities such as the differential cross-section, the classical and quantum scattering calculations agree with each other.
Classical Rutherford Scattering
Classically, Rutherford scattering refers to the scattering of classical particles by a repulsive Coulomb potential \(V(r)=\frac{k}{r}\) with \(k>0\) for a repulsive interaction. In the limit when the particle is far away, the eccentricity (Laplace-Runge-Lenz) vector \(\textbf e=\frac{\textbf p\times\textbf L}{mk}-\hat{\textbf x}\) approximately forms a right triangle as follows:
It follows from Pythagoras that \(e^2=1^2+\frac{p^2_{\infty}L^2}{m^2k^2}\) with \(L=\rho p_{\infty}\) proportional to the impact parameter \(\rho\). Thus, \(e^2=1+\frac{p^4_{\infty}\rho^2}{m^2k^2}\). We can then relate \(p_{\infty}^2=2mE\) with the mechanical energy \(E\) so that \(e^2=1+\left(\frac{2E\rho}{k}\right)^2\). Finally, the geometry of hyperbolas dictates that the scattering angle \(\theta\) is related to the eccentricity \(e\) by \(\cos\left(\frac{\pi}{2}-\frac{\theta}{2}\right)=\sin\left(\frac{\theta}{2}\right)=\frac{1}{e}\). Thus, one finds classically:
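Spelling out the last algebraic step, \(\csc^2(\theta/2)=e^2\) combined with \(e^2=1+\left(\frac{2E\rho}{k}\right)^2\) gives:

\[\cot^2\left(\frac{\theta}{2}\right)=e^2-1=\left(\frac{2E\rho}{k}\right)^2\implies\rho(\theta)=\frac{k}{2E}\cot\left(\frac{\theta}{2}\right)\]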
More realistically, in scattering experiments one does not throw just one projectile at the scattering center, but rather fires a beam of projectiles. In light of this, recall that the classical differential cross-section \(d\sigma\) is defined to conserve particle number: \(\textbf J_{\text{incident}}\cdot d\boldsymbol{\sigma}=\textbf J_{\text{scattered}}\cdot d\textbf S\). If one imagines an infinitely thin and narrow cylindrical “washer” of \(N\) incident particles, then the incident particle density is \(n=\frac{N}{2\pi\rho d\rho dz}\). Supposing moreover the incident particles move at velocity \(v\), then \(J_{\text{incident}}=nv=\frac{Nv}{2\pi\rho d\rho dz}\). On the other hand by energy conservation \(\dot E=0\) the scattered particles eventually are moving at the same velocity \(v\). The scattering current is therefore \(J_{\text{scattered}}=\frac{Nv}{2\pi r\sin\theta rd\theta dz}\). Finally, using \(dS=r^2 d\Omega\) and isolating for the differential cross-section gives:
However, it turns out that the integrand has an antiderivative \(-2\csc^2\left(\frac{\theta}{2}\right)\) which blows up to \(\infty\) at \(\theta=0\) so that the total cross section \(\sigma=\infty\)! It turns out this has to do with the fact that, although the Coulomb potential \(V(r)=\frac{k}{r}\) decays to \(V(r)\to 0\) as \(r\to\infty\), it does so “too slowly” so that it’s ultimately a long-range force and so all incident particles feel its repulsive scattering influence “to some non-zero extent” so that \(\sigma=\infty\).
Quantum Rutherford Scattering
Although the result for the differential cross-section \(d\sigma\) for Rutherford scattering was obtained through classical mechanics alone, it turns out to agree with quantum mechanical scattering calculations. To demonstrate this, it is helpful to first take a step back and analyze a generic central potential \(V(r)\) which (both classically and quantum mechanically) conserves total angular momentum. Recall in this case, the Schrodinger equation is a separable linear PDE in spherical coordinates with solutions of the form \(\psi(r,\theta,\phi)=\sum_{\ell=0}^{\infty}\sum_{m=-\ell}^{\ell}R_{\ell}(r)Y_{\ell}^{m}(\theta,\phi)\). However, because we scatter incident particles along the \(z\)-axis, in our case we expect not just asymptotically but everywhere that \(\partial\psi/\partial\phi=0\Rightarrow L_3\psi=0\), i.e. the angular momentum is confined to the \(xy\)-plane, and so one can set \(m=0\) to reduce the spherical harmonics \(Y_{\ell}^0\propto P_{\ell}(\cos\theta)\) to Legendre polynomials. Redefining the radial functions to absorb any constants, the spherical harmonic expansion of \(\psi(r,\theta,\phi)\) earlier therefore reduces to a partial wave expansion of \(\psi(r,\theta)\):
specifically, the original linear Schrodinger PDE can be thought of as separating into a linear ODE \(\textbf L^2P_{\ell}(\cos\theta)=\ell(\ell+1)\hbar^2P_{\ell}(\cos\theta)\) that defines the Legendre polynomials and a linear ODE for the radial amplitude \(R_{\ell}(r)\) which depends on the exact nature of the central potential \(V(r)\):
First, suppose \(V(r)=0\) everywhere so that there isn’t really any scattering going on (i.e. the Schrodinger equation reduces to a Helmholtz equation). Then as is well-known the radial amplitude \(R_{\ell}(r)\) is given by linear combinations of spherical Bessel functions \(j_{\ell}(kr),y_{\ell}(kr)\) where \(k:=\sqrt{2mE}/\hbar\) and we have the Rodrigues-type formulas for the spherical Bessel functions (sometimes called Rayleigh’s formulas):
where \(\text{cosc}(x):=\cos(x)/x\) makes the spherical Bessel function \(y_{\ell}(x)\) singular at the origin \(x=0\) (the minus sign in front of \(y_{\ell}(x)\) is also merely conventional). It is common to treat the spherical Bessel functions as real and imaginary parts of complex-valued spherical Hankel functions \(h_{\ell}^{(1)}(x):=j_{\ell}(x)+iy_{\ell}(x)\) and \(h_{\ell}^{(2)}(x):=\overline{h_{\ell}^{(1)}(x)}=j_{\ell}(x)-iy_{\ell}(x)\). An especially important case are the \(\ell=0\) \(s\)-waves \(j_0(x)=\text{sinc}(x)\) and \(y_0(x)=-\text{cosc}(x)\), in which case \(h_0^{(1)}(x)=-i\frac{e^{ix}}{x}\) and \(h_0^{(2)}(x)=i\frac{e^{-ix}}{x}\).
For example, the incident plane wave \(\psi(r,\theta)=e^{ikz}=e^{ikr\cos\theta}\) by itself is clearly a solution of the free Schrodinger equation. By linearity, it must therefore be possible to write such a plane wave \(e^{ikz}\) in terms of free partial waves of the form \((\alpha_{\ell}j_{\ell}(kr)+\beta_{\ell}y_{\ell}(kr))P_{\ell}(\cos\theta)\) for \(\ell\in\textbf N\). Indeed, because \(e^{ikz}\) is non-singular anywhere, we only require the spherical Bessel functions \(j_{\ell}(kr)\) of the first kind so \(\beta_{\ell}=0\). The expansion turns out to be (use orthogonality of Legendre polynomials):
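The expansion \(e^{ikz}=\sum_{\ell=0}^{\infty}(2\ell+1)i^{\ell}j_{\ell}(kr)P_{\ell}(\cos\theta)\) can be verified numerically at a sample point; a pure-Python sketch (the series and recurrence implementations of \(j_{\ell}\) and \(P_{\ell}\) are our own choices):

```python
import cmath
import math

def dfact(n):
    """Double factorial n!!, with the convention (-1)!! = 1."""
    return 1 if n <= 0 else n * dfact(n - 2)

def sph_jl(l, x, terms=40):
    """Spherical Bessel j_l(x) summed from its rapidly converging power series."""
    return sum((-1) ** s * x ** (2 * s + l)
               / (2 ** s * math.factorial(s) * dfact(2 * l + 2 * s + 1))
               for s in range(terms))

def legendre(l, u):
    """Legendre polynomial P_l(u) via the Bonnet recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, u
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * u * p - n * p_prev) / (n + 1)
    return p

# Check e^{ikz} = sum_l (2l+1) i^l j_l(kr) P_l(cos theta) at a sample point
kr, u = 2.0, math.cos(0.7)
partial_sum = sum((2 * l + 1) * 1j ** l * sph_jl(l, kr) * legendre(l, u)
                  for l in range(30))
assert abs(partial_sum - cmath.exp(1j * kr * u)) < 1e-10
```

Because \(j_{\ell}(x)\sim x^{\ell}/(2\ell+1)!!\) near the origin, the partial-wave sum converges extremely fast at moderate \(kr\), so \(30\) terms is already overkill here.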
Close to the origin \(x\to 0\), the spherical Bessel functions behave like (for \(j_{\ell}(x)\) this is just saying that its Taylor series has its first non-zero term at order \(x^{\ell}\) so that \(d^{\ell}j_{\ell}/dx^{\ell}(x=0)=\ell!/(2\ell+1)!!\); for \(y_{\ell}(x)\) it diverges at \(x=0\) hence being non-differentiable there):
where for example the double factorial \(!!\) is defined (not as the double application of the usual factorial, but rather) by \(5!!=5\times 3\times 1=15\). Asymptotically \(x\to\infty\), the spherical Bessel functions decay like:
\[j_{\ell}(x)\to\frac{\sin(x-\ell\pi/2)}{x}\]
\[y_{\ell}(x)\to-\frac{\cos(x-\ell\pi/2)}{x}\]
If you stare at the graphs above, these properties all seem pretty reasonable. In particular, when the asymptotic \(r\to\infty\) behavior is applied to the plane wave \(\psi(r,\theta)=e^{ikz}=e^{ikr\cos\theta}\), this becomes:
Finally, as in one-dimensional scattering, in the steady state we want the asymptotic wavefunction \(\psi_{\infty}(r,\theta)\) to be not just an incident plane wave \(e^{ikz}=e^{ikr\cos\theta}\) of energy \(E=\hbar^2k^2/2m\) but also to comprise a scattered spherical wave (of the same energy \(E\)) which we therefore write as \(f(\theta)e^{ikr}/r\) for a complex-valued scattering amplitude \(f(\theta)\in\textbf C\) (dependent on the nature of the potential \(V(r)\) one is scattering off of and the incident momentum \(k\) of the beam) modulating the spherical Hankel \(s\)-wave \(e^{ikr}/r\):
This ansatz certainly solves the Schrodinger equation asymptotically \(r\to\infty\) because asymptotically we expect \(V(r)\to 0\) which reduces to the earlier analysis where we had pretended \(V(r)=0\) everywhere. If we could compute the scattering amplitude \(f(\theta)\), then we immediately obtain the differential cross-section because \(J_{\text{incident}}=\frac{\hbar k}{m}\), \(J_{\text{scattered}}=\frac{\hbar k}{mr^2}|f(\theta)|^2\) so we have the fundamental relation \(d\sigma=|f(\theta)|^2d\Omega\) and as a corollary the cross-section itself is \(\sigma=2\pi\int_{-1}^{1}|f(\theta)|^2d\cos\theta\) (however, it turns out this corollary can be strengthened to the optical theorem \(\sigma=\frac{4\pi}{k}\Im f(0)\) where all the information about the potential \(V(r)\) is contained in the forward scattering amplitude \(f(0)\)).
Now, by expanding the scattering amplitude itself in a simplified partial wave basis \(f(\theta)=\sum_{\ell=0}^{\infty}f_{\ell}P_{\ell}(\cos\theta)\), this leads to the partial wave expansion of the asymptotic wavefunction:
From this, we conclude that there are countably infinitely many \(U(1)\) eigenvalues of the scattering \(S\)-operator, one for each \(\ell\in\textbf N\) given by \(e^{2i\delta_{\ell}(k)}=1+\frac{2ikf_{\ell}}{2\ell+1}\) (remember that the \(f_{\ell}\in\textbf C\)). In particular, the optical theorem then follows from orthogonality of the Legendre polynomials:
One can then isolate \(f_{\ell}=(e^{2i\delta_{\ell}(k)}-1)(2\ell+1)/2ik\) and substitute into the above, and compare with \(f(0)=\sum_{\ell=0}^{\infty}f_{\ell}\) to check the optical theorem’s validity. Furthermore, writing \(\sigma=\sum_{\ell=0}^{\infty}\sigma_{\ell}\), each partial cross-section \(\sigma_{\ell}=\frac{4\pi(2\ell+1)\sin^2\delta_{\ell}(k)}{k^2}\) obeys \(\sigma_{\ell}\leq\frac{4\pi(2\ell+1)}{k^2}\), which is called a unitarity bound (since it arises from the unitarity of the \(S\)-operator). This unitarity bound can be understood semiclassically (not entirely classically because it does use quantum mechanics) by imagining a beam of classical particles all incident on the scattering center with momentum \(k\). Then, look at some particle with impact parameter \(\rho\) and hence angular momentum \(L=\hbar k\rho\). This should be quantized as \(L=\ell\hbar\), so suppose that \(\rho\in[\ell/k,(\ell+1)/k]\) corresponds to a given quantization. Then in the ideal case when the particle is \(100\%\) scattered, the idealized area \(\sigma_{\ell}^*\) of the annulus from which it scatters is:
which is the unitarity bound \(\sigma_{\ell}\leq 4\sigma_{\ell}^*\) up to the factor of \(4\).
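Explicitly, the annulus-area arithmetic is one line:

\[\sigma_{\ell}^*=\pi\left(\frac{\ell+1}{k}\right)^2-\pi\left(\frac{\ell}{k}\right)^2=\frac{\pi(2\ell+1)}{k^2}=\frac{1}{4}\cdot\frac{4\pi(2\ell+1)}{k^2}\]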
Scattering Off Hard Spheres
Although we’d ultimately like to understand Rutherford scattering quantum mechanically, it helps to first look at a few simpler examples. The first is the infinitely hard sphere of radius \(a\), defined by \(V(r)=\infty\) for \(r<a\) and \(V(r)=0\) for \(r>a\). Outside the hard sphere we have the usual solution \(\psi(r,\theta)=\sum_{\ell=0}^{\infty}(\alpha_{\ell}j_{\ell}(kr)+\beta_{\ell}y_{\ell}(kr))P_{\ell}(\cos\theta)\), whereas inside the hard sphere the wavefunction \(\psi(r<a)=0\) vanishes. Because we are scattering off this sphere, the wavefunction \(\psi(r,\theta)\) needs to asymptotically look like \(\psi_{\infty}(r,\theta)=\frac{1}{2ik}\sum_{\ell=0}^{\infty}(2\ell+1)\left[e^{2i\delta_{\ell}(k)}\frac{e^{ikr}}{r}+(-1)^{\ell+1}\frac{e^{-ikr}}{r}\right]P_{\ell}(\cos\theta)\). Using the asymptotic \(r\to\infty\) expansions of the spherical Bessel functions given earlier, one can check that \(e^{2i\delta_{\ell}(k)}=\frac{\alpha_{\ell}-i\beta_{\ell}}{\alpha_{\ell}+i\beta_{\ell}}\), which implies that the ratio of the probability amplitudes \(\beta_{\ell}/\alpha_{\ell}\) is simply (note that only this ratio is physically significant, not their absolute values, because scattering states are not normalized anyways):
Finally, the Dirichlet boundary condition \(\psi(a)=0\) implies that \(\alpha_{\ell}j_{\ell}(ka)+\beta_{\ell}y_{\ell}(ka)=0\) for all \(\ell\in\textbf N\) so that the scattering phase shifts \(\delta_{\ell}(k)\in\textbf R\) are given simply by:
For instance, \(\delta_0(k)=-ka\) is exactly a linear phase shift for this specific case of scattering off a hard sphere. The higher-\(\ell\) phase shifts \(\delta_{\ell}(k)\) wrap like so:
We are now in a position to compute the cross-section \(\sigma=\frac{4\pi}{k^2}\sum_{\ell=0}^{\infty}(2\ell+1)\sin^2\delta_{\ell}(k)\). First, consider just the low-momentum \(ka\ll 1\) limit whereby, applying the near-origin behavior of the spherical Bessel functions, we obtain the corresponding low-momentum behavior of the phase shifts \(\delta_{\ell}(k)\approx -\frac{(ka)^{2\ell+1}}{(2\ell+1)!!(2\ell-1)!!}\) (this qualitatively agrees with the graphs above). The infinite series for the cross-section \(\sigma_{ka\ll 1}\) in this low-momentum limit \(ka\ll 1\) is therefore:
But each \(\delta_{\ell}(k)\) is a monomial in \(ka\), and we’re working in the low-momentum limit \(ka\ll 1\), so each \(\delta_{\ell}(k)\) is almost zero (again, see the graph too), allowing us to apply a small-angle approximation \(\sin^2\left(-\frac{(ka)^{2\ell+1}}{(2\ell+1)!!(2\ell-1)!!}\right)\approx\frac{(ka)^{4\ell+2}}{(2\ell+1)!!^2(2\ell-1)!!^2}\):
Finally, we see that for \(\ell\geq 1\gg ka\) the monomials \((ka)^{4\ell+2}\) go to zero especially rapidly as \(\ell\to\infty\). The only phase shift that goes to zero but not as rapidly as the others is the \(\ell=0\) \(s\)-wave phase shift \(\delta_0(k)=-ka\) (again, all this is apparent from the graph). Thus, keeping only the dominant \(\ell=0\) term in the series gives finally:
which is \(4\times\) greater than the classical geometric cross-section \(\sigma_{\text{classical}}=\pi a^2\) for scattering billiard balls off the hard sphere (i.e. it is now the surface area of the hard sphere rather than its cross-sectional area, so still using the name “cross-section” for \(\sigma\) is strictly a misnomer).
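This low-momentum limit is easy to check numerically from the hard-sphere phase shifts \(\tan\delta_{\ell}(k)=j_{\ell}(ka)/y_{\ell}(ka)\); a pure-Python sketch (the radius and momentum values below are arbitrary):

```python
import math

def dfact(n):
    """Double factorial n!!, with the convention (-1)!! = 1."""
    return 1 if n <= 0 else n * dfact(n - 2)

def sph_jl(l, x, terms=30):
    """Spherical Bessel j_l(x) from its power series (fine at the small x used here)."""
    return sum((-1) ** s * x ** (2 * s + l)
               / (2 ** s * math.factorial(s) * dfact(2 * l + 2 * s + 1))
               for s in range(terms))

def sph_yl(l, x):
    """Spherical Bessel y_l(x) by upward recurrence (stable, since y_l grows with l)."""
    y_prev = -math.cos(x) / x
    if l == 0:
        return y_prev
    y = -math.cos(x) / x ** 2 - math.sin(x) / x
    for n in range(1, l):
        y_prev, y = y, (2 * n + 1) / x * y - y_prev
    return y

a, k = 1.0, 0.05                  # hard-sphere radius and (low) incident momentum
ka = k * a
# tan(delta_l) = j_l(ka)/y_l(ka)  =>  sin^2(delta_l) = j_l^2/(j_l^2 + y_l^2)
sigma = (4 * math.pi / k ** 2) * sum(
    (2 * l + 1) * sph_jl(l, ka) ** 2 / (sph_jl(l, ka) ** 2 + sph_yl(l, ka) ** 2)
    for l in range(10))
assert abs(sigma - 4 * math.pi * a ** 2) < 0.01 * (4 * math.pi * a ** 2)
```

As expected, at \(ka=0.05\) the \(\ell=0\) term completely dominates the sum and \(\sigma\) lands within a fraction of a percent of \(4\pi a^2\).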
Although above we reasoned all the approximations made in the calculation of the low-energy scattering cross-section \(\sigma_{ka\ll 1}\) purely mathematically (e.g. monomials, etc.), there is a nice semiclassical “physics way” to understand it too. Just as earlier with the semiclassical argument for the unitarity bound, imagine scattering classical particles with impact parameter \(\rho:=\ell/k\) in which case the regime \(\ell\gg ka\) (such as \(\ell\geq 1\gg ka\) above) is merely the regime \(\rho\gg a\) where the classical particle flies in far away from the scattering center \(r=0\), so it makes sense that it’s not really going to be scattered much/feel the influence of \(V(r)\), hence the phase shift \(\delta_{\ell}(k)\approx 0\) is negligible for \(\ell\geq 1\). Instead, only when you throw the particles head-on at the scattering center, which corresponds to the \(\ell=0\) \(s\)-wave do they accumulate a “somewhat noticeable” phase shift \(\delta_0(k)=-ka\) when \(ka\ll 1\).
Finally, just briefly, let’s also look at the high-momentum \(ka\gg 1\) regime. Here, applying far-field asymptotics of the spherical Bessel functions, all the phase shifts become lines displaced vertically in units of \(90^{\circ}=\pi/2\) from each other \(\delta_{\ell}(k)\approx -ka+\ell\pi/2\) (graph agrees with this). In this high-momentum \(ka\gg 1\) limit, the cross-section \(\sigma_{ka\gg 1}\) is:
which tends to \(\sigma\to 2\pi a^2\) as one lets \(ka\to\infty\) (hemispherical surface area?).
Universal Dynamics of Low-Energy Scattering
Okay, now let’s go back to low-energy (equivalently low-momentum) scattering. We already saw for the hard sphere that for each quantized angular momentum \(\ell\in\textbf N\), the phase shift \(\delta_{\ell}(k)\) scaling law at low-momentum \(ka\ll 1\) was:
\[\delta_{\ell}(k)\propto -(ka)^{2\ell+1}\]
It turns out, if we replace the hard sphere radius \(a\mapsto a_s\) by a more general characteristic scattering length \(a_s\) unique to each potential \(V(r)\) (e.g. \(a_s=a\) for the hard sphere potential), then this power law scaling is a universal phenomenon; in any scattering central potential \(V(r)\), when one throws quantum particles at it slowly \(ka_s\ll 1\) with some angular momentum \(\ell\) misaligned from the scattering center, the phase shift \(\delta_{\ell}(k)\propto -(ka_s)^{2\ell+1}\) decays as an odd power of \(ka_s\). And as before, really the only noticeable phase shift is associated with the head-on \(\ell=0\) \(s\)-wave with \(\delta_0(k)=-ka_s+O_{ka_s\ll 1}(ka_s)^3\). By repeating verbatim the approximation used in calculating \(\sigma_{ka\ll 1}\), one can check that we also have a characteristic low-energy cross-section \(\sigma_{ka_s\ll 1}=4\pi a_s^2\) the surface area of a fictitious sphere \(S^2_{a_s}\) of radius equal to the potential’s scattering length \(a_s\).
Why Bound/Resonant States Cause the Scattering Length \(a_s\) to Diverge
If an attractive central potential \(V(r)\) with \(dV/dr>0\) (so not the hard sphere potential, which was repulsive with \(dV/dr<0\)) happens to be sufficiently attractive that it can trap a quantum particle in a bound or resonant state, then the mere existence (or “almost-existence”) of such a bound/resonant state will strongly affect scattering states (it’s like kicking a soccer ball into a valley: if the valley is deep enough, it will interact significantly with the ball, slowing it down and attempting to trap it). More precisely, one of the most pronounced such effects is that the potential’s scattering length diverges \(a_s\to\pm\infty\) whenever there is a threshold bound or resonant state. (Intuitively, when you throw the particle slowly and head-on at \(V(r)\), it scatters as if off a hard sphere of radius \(|a_s|\), where \(a_s>0\) means it’s repelled, \(a_s<0\) means it’s attracted, and \(a_s=0\) means no scattering.)
The simplest model system in which to see this divergence phenomenon is the finite spherical potential well of radius \(a\) and depth \(V_0>0\). We already know that when \(V_0\) is deep enough or \(a\) is big enough, bound states should start to appear. First, we know that outside \(r>a\) we have the same solution as for the hard sphere, \(\psi(r,\theta)=\sum_{\ell=0}^{\infty}(\alpha_{\ell}j_{\ell}(kr)+\beta_{\ell}y_{\ell}(kr))P_{\ell}(\cos\theta)\). Since we’re still matching this to the same “asymptotic boundary condition” \(\psi_{\infty}(r,\theta)=\frac{1}{2ik}\sum_{\ell=0}^{\infty}(2\ell+1)\left[e^{2i\delta_{\ell}(k)}\frac{e^{ikr}}{r}+(-1)^{\ell+1}\frac{e^{-ikr}}{r}\right]P_{\ell}(\cos\theta)\), this still leads to the same condition on the phase shifts:

\[\tan\delta_{\ell}(k)=-\frac{\beta_{\ell}}{\alpha_{\ell}}\]
The difference now though is that inside the finite spherical potential well the wavefunction is \(\psi(r,\theta)=\sum_{\ell=0}^{\infty}\gamma_{\ell}j_{\ell}(k'r)P_{\ell}(\cos\theta)\) with \(k'^2=k^2+2mV_0/\hbar^2\), with no second-kind spherical Bessel functions \(y_{\ell}(k'r)\) due to the requirement of regularity at the origin \(r=0\). Matching the logarithmic derivative of \(\psi(r,\theta)\) at \(r=a\) to eliminate the unwanted \(\gamma_{\ell}\) unknowns, calculating the derivatives of the spherical Bessel functions by directly applying the product rule to their Rayleigh definitions, and doing some algebra gets:
So far everything has been exact. Now come the approximations. The first is the usual low-energy scattering limit \(ka\ll 1\); the second is that \(k\ll \sqrt{2mV_0}/\hbar\), or equivalently that \(k'\approx\sqrt{2mV_0}/\hbar\) is approximately independent of \(k\) in what follows. Artificially multiplying numerator and denominator by \(a\) allows one to write:
with \(x:=ka\) the parameter we’re taking to be \(x\ll 1\). After a calculation, one can check that the right-hand side, Taylor-expanded about \(x=0\) to first order, yields:
\[\tan\delta_0(k)\approx(\text{tanc}(k'a)-1)x\]
Finally, assuming that even for this \(\ell=0\) \(s\)-wave the phase shift \(\delta_0(k)\) is sufficiently small that \(\tan\delta_0(k)\approx\delta_0(k)\):
\[\delta_0(k)\approx(\text{tanc}(k'a)-1)ka\]
allows us to directly read off the scattering length as:
\[a_s=a(1-\text{tanc}(k'a))\]
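This formula is easy to probe numerically; a minimal sketch (well radius \(a=1\) assumed purely for illustration) showing both the generic \(O(a)\) values of \(a_s\) and the divergence with sign flip across the first threshold \(k'a=\pi/2\):

```python
import math

def tanc(x):
    return math.tan(x) / x

def a_s(kpa, a=1.0):
    """Scattering length of the finite spherical well: a_s = a(1 - tanc(k'a))."""
    return a * (1.0 - tanc(kpa))

# Generic depths give a scattering length of order a...
print(a_s(1.0))
# ...but as k'a crosses pi/2 (the first threshold bound state), a_s diverges,
# flipping from large negative (almost-bound) to large positive (newly bound):
print(a_s(math.pi / 2 - 0.01), a_s(math.pi / 2 + 0.01))
```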
By virtue of the tangent function, the scattering length \(a_s\) of the finite spherical potential well diverges at \(k'a=\pi(n+1/2)\), corresponding (in our approximation scheme) to potential well depths of the form \(V_0=\frac{(n+1/2)^2\pi^2\hbar^2}{2ma^2}\). Note that we’re implicitly assuming it’s a potential well \(V_0>0\); if it had been a finitely hard sphere \(V_0<0\) instead, then we would have an imaginary momentum \(k'=\sqrt{2mV_0}/\hbar=i\sqrt{-2mV_0}/\hbar\) and, using \(\tan i\theta=i\tanh\theta\), we would conclude that the scattering length \(a_s\) is instead given by:

\[a_s=a\left(1-\frac{\tanh(\Im(k')a)}{\Im(k')a}\right)\]
in which there is no longer any trace of the divergences \(a_s=\pm\infty\) seen earlier with the attractive well \(V_0>0\) (rather, as the graph shows, when the sphere gets infinitely hard \(\Im(k')a\to\infty\), the scattering length \(a_s\to a\) converges back to our earlier result of the hard sphere radius \(a\)). This hints that the divergences have something to do with the potential being attractive \(V_0>0\), and from the quantized form of the depths \(V_0\) it’s natural to conjecture that they have something to do with bound/resonant states coming into existence in said attractive potential. This connection can be clarified by looking for imaginary momenta \(k=-i\kappa\) on the lower imaginary axis \(\kappa\in(0,\infty)\) such that the \(S\)-operator eigenvalue \(e^{2i\delta_{0}(k)}=0\) associated to the \(\ell=0\) \(s\)-wave vanishes. But we already found earlier while matching asymptotic boundary conditions that we have exactly:

\[e^{2i\delta_0(k)}=\frac{1+i\tan\delta_0(k)}{1-i\tan\delta_0(k)}\]
which is simply the trivially obvious identity \(e^{2i\theta}=(1+i\tan\theta)/(1-i\tan\theta)\). This vanishes whenever the numerator vanishes, i.e. whenever \(\tan\delta_0(k)=i\). Now, working with the exact formula for \(\tan\delta_0(k)\) from earlier (i.e. not touching any of the approximate formulas!) one finds that this implies:
\[\tan k'a=-\frac{k'}{\kappa}\]
where any value of \(\kappa\in (0,\infty)\) solving this equation (keeping in mind \(k'\) is defined in terms of \(\kappa\) via \(k'^2=-\kappa^2+2mV_0/\hbar^2\)) implies a bound state with energy \(E=-\hbar^2\kappa^2/2m<0\). Exactly at \(k=\kappa=E=0\) however, so that \(k'=\sqrt{2mV_0}/\hbar\), one obtains a bound state at threshold whenever \(k'a=\pi (n+1/2)\), which reproduces the potential well depths \(V_0=\frac{(n+1/2)^2\pi^2\hbar^2}{2ma^2}\) at which the scattering length \(a_s\) diverges. Moreover, for a general zero of \(e^{2i\delta_0(k)}\) at some \(k=-i\kappa\), if one Laurent-expands \(e^{2i\delta_0(k)}=\frac{1+i\tan\delta_0(k)}{1-i\tan\delta_0(k)}\) about \(k=-i\kappa\), then because \(e^{2i\delta_0(k)}\in U(1)\) it must look (up to an overall constant phase) like a Möbius transformation \(e^{2i\delta_0(k)}\approx\frac{k+i\kappa}{k-i\kappa}\) for \(k\) near \(-i\kappa\), in other words \(\tan\delta_0(k)\approx -k/\kappa\). Approximating \(\tan\delta_0(k)\approx\delta_0(k)\) then yields \(a_s=1/\kappa\), which indeed diverges as \(\kappa\to 0\).
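This bound-state condition is easy to confirm numerically; a sketch (in assumed units \(\hbar=2m=a=1\), so \(k'^2=V_0-\kappa^2\) and \(E=-\kappa^2\), with an illustrative depth \(V_0=2.6\) just above the first threshold \((\pi/2)^2\approx 2.47\)):

```python
import math

# Units hbar = 2m = a = 1: k'^2 = V0 - kappa^2 and the bound state energy is
# E = -kappa^2. The depth V0 = 2.6 sits just above the threshold (pi/2)^2 ~ 2.47.
V0 = 2.6

def F(kappa):
    kp = math.sqrt(V0 - kappa**2)
    return math.tan(kp) + kp / kappa   # root  <=>  tan(k'a) = -k'/kappa

# F is continuous on this interval and changes sign, so bisect:
lo, hi = 1e-9, 0.3
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if F(lo) * F(mid) <= 0 else (mid, hi)
kappa = 0.5 * (lo + hi)
print(kappa, -kappa**2)  # a single shallow bound state just below E = 0
```

As the well is made shallower toward the threshold depth, the root \(\kappa\to 0\) and the bound state is pushed up to \(E=0\), exactly where \(a_s=1/\kappa\) blows up.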
Breit-Wigner Distribution
Recall that while “scattering states” with lower imaginary momenta \(k=-i\kappa\) that suppress the scattering \(e^{2i\delta_0(k)}=0\) are actually bound states with energy \(E=-\hbar^2\kappa^2/2m\) in disguise, we also saw that “scattering states” with complex momenta in quadrants \(2\) or \(4\) of the complex \(k\)-plane \(\Re k\Im k<0\) that suppress the scattering \(e^{2i\delta_0(k)}=0\) are actually resonant states in disguise, with complex energy \(E = \frac{\hbar^2 (\Re^2 k - \Im^2 k)}{2m} + i \frac{\hbar^2}{m} \Re k\Im k\), lifetime \(\tau=-m/\hbar\Re k\Im k>0\) and energy width \(\Gamma/2=\hbar/\tau\).
This allows us to write the resonant state’s complex energy in the form \(E=E_{\text{CoM}}-i\Gamma/2\) where the center of mass energy is \(E_{\text{CoM}}:=\Re E=\frac{\hbar^2 (\Re^2 k - \Im^2 k)}{2m}\). Viewing the complex momentum \(k=\sqrt{2mE}/\hbar\) as a function of the complex energy \(E\), we can thus also view the \(U(1)\) phase \(e^{2i\delta_{\ell}(k)}=e^{2i\delta_{\ell}(E)}\) as a function of \(E\) rather than \(k\). Specifically, if \(e^{2i\delta_0(E=E_{\text{CoM}}-i\Gamma/2)}=0\), then in a neighbourhood of this complex energy \(E=E_{\text{CoM}}-i\Gamma/2\), the map \(E\mapsto e^{2i\delta_0(E)}\) looks locally like a Möbius transformation:

\[e^{2i\delta_0(E)}\approx\frac{E-E_{\text{CoM}}-i\Gamma/2}{E-E_{\text{CoM}}+i\Gamma/2}\]
which directly yields the scattering cross-section \(\sigma_{\ell}(E)\) as a function of the energy \(E\) for partial waves of angular momentum \(\ell\):

\[\sigma_{\ell}(E)=\frac{2\pi\hbar^2}{mE}(2\ell+1)\frac{\Gamma^2}{4(E-E_{\text{CoM}})^2+\Gamma^2}\]
This is (roughly speaking) known in particle physics as the Breit-Wigner distribution for the cross-section. If we artificially restrict to \(E\in\textbf R\), then one can plot \(\sigma_{\ell}(E)\) and it looks pretty much like \(1/E\) modulated by a Lorentzian “bell curve”, sort of like the discovery of the Higgs boson:
The Breit-Wigner distribution attains its maximum \(d\sigma_{\ell}/dE=0\) at \(E=\frac{2}{3}E_{\text{CoM}}+\frac{1}{3}\sqrt{E_{\text{CoM}}^2-\frac{3}{4}\Gamma^2}\), though it’s basically always the case that the center of mass energy \(E_{\text{CoM}}\gg\Gamma\) is much greater than the energy width \(\Gamma\) of the particle’s resonant state, so we can just focus on maximizing the Lorentzian piece \(\frac{\Gamma^2}{4(E-E_{\text{CoM}})^2+\Gamma^2}\), which is trivially maximized to \(1\) at its center \(E=E_{\text{CoM}}\) where the denominator is minimized, in agreement with the quadratic-formula result above as \(\Gamma/E_{\text{CoM}}\to 0\). The corresponding maximal cross-section (typically measured in units of \(1\text{ barn}=10^{-28}\text{ m}^2\) in experiment) is therefore \(\sigma_{\ell}^*\approx\frac{2\pi\hbar^2}{mE_{\text{CoM}}}(2\ell+1)=2\pi\bar{\lambda}^2(2\ell+1)\), where \(\bar{\lambda}:=\hbar/mc\) is the reduced Compton wavelength of the particle by virtue of \(E_{\text{CoM}}=mc^2\); this saturates the unitarity bound. Finally, although the Lorentzian distribution is one of those pathological probability distributions with undefined mean (and by extension undefined variance and all higher-order moments), it still has a perfectly well-defined \(\text{FWHM}=\Gamma\), explaining why \(\Gamma\) has been called the energy width all along.
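The maximizer formula is easy to verify numerically; a quick sketch (the values \(E_{\text{CoM}}=10\), \(\Gamma=0.5\) are illustrative, and overall constants are dropped):

```python
import math

# Breit-Wigner shape with constants dropped:
# sigma(E) ∝ (1/E) * Gamma^2 / (4(E - E0)^2 + Gamma^2)
E0, Gamma = 10.0, 0.5   # illustrative values with E0 = E_CoM >> Gamma

def sigma(E):
    return (1.0 / E) * Gamma**2 / (4 * (E - E0)**2 + Gamma**2)

# Closed-form maximizer quoted above...
E_star = (2 / 3) * E0 + (1 / 3) * math.sqrt(E0**2 - 0.75 * Gamma**2)
# ...versus a brute-force maximizer over a fine grid around the peak:
E_num = max((E0 - 2 + 1e-4 * i for i in range(40000)), key=sigma)
print(E_star, E_num)  # nearly identical, and slightly below E0 = E_CoM
```

The \(1/E\) modulation drags the peak slightly below \(E_{\text{CoM}}\), by an amount that vanishes as \(\Gamma/E_{\text{CoM}}\to 0\).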
A particle accelerator’s job is basically to vary the incident energy \(E\) and measure the corresponding cross-section \(\sigma(E)\). It is a fundamental fact of nature that most particles are very unstable with short lifetimes \(\tau\); the stable exceptions are protons \(p^+\), electrons \(e^-\), photons \(\gamma\), neutrinos \(\nu\), etc. By colliding these stable particles, unstable ones can briefly emerge, but only as resonant states which quickly decay with short lifetime \(\tau\).
Lippmann-Schwinger Equation, Born Series Approximation
Suppose we write the Hamiltonian for a scattering process perturbatively as \(H=\tilde H+\Delta\tilde H\), where \(\tilde H\) is some well-understood Hamiltonian and \(\Delta\tilde H\) is a small perturbation to it. Most often, we’ll take \(\tilde H=T\) to just be the kinetic energy operator and \(\Delta\tilde H:=V\) to be the potential energy from which one is scattering (this is the same view taken in the nearly free electron model in condensed matter physics, where in that context \(V\) would be the periodic potential of an underlying lattice \(\Lambda\)). We then have the following logically equivalent formulation of the eigenequation for the Hamiltonian \(H\):

\[|E\rangle=\widetilde{|E\rangle}+(E1-\tilde H)^{-1}\Delta\tilde H|E\rangle\]
where \(\widetilde{|E\rangle}\in\ker(\tilde H-E1)\) is any eigenstate \(\tilde H\widetilde{|E\rangle}=E\widetilde{|E\rangle}\) of the unperturbed Hamiltonian \(\tilde H\) with energy \(E\). This is the abstract version of the Lippmann-Schwinger equation. It is more useful in concrete form; for instance, expressed in the position representation it becomes an integral reformulation of the Schrödinger differential equation (one can show this via a simple Fourier transform, followed by an inverse Fourier transform evaluated with a clever \(i\varepsilon\)-prescription, contour integration, and Jordan’s lemma):

\[\langle\textbf x|E\rangle=e^{i\textbf k\cdot\textbf x}-\frac{m}{2\pi\hbar^2}\iiint_{\textbf x'\in\textbf R^3}d^3\textbf x'\frac{e^{ik|\textbf x-\textbf x'|}}{|\textbf x-\textbf x'|}V(\textbf x')\langle\textbf x'|E\rangle\]
where we’ve assumed \(\tilde H=T,\Delta\tilde H=V\) and therefore taken the plane wave \(\widetilde{|E\rangle}=|\textbf k\rangle\) to be the \(T\)-eigenstate. Similar to the multipole expansion of the electrostatic potential, where one assumes the charge density \(\rho(\textbf x')\) is more or less localized to a compact region of space, we assume here that the potential \(V(\textbf x')\) is also roughly localized (or at least decays quickly enough), so that the integration need not be over all \(\textbf x'\in\textbf R^3\) but rather is confined to a compact subspace as well. Then in the asymptotic limit \(|\textbf x|\gg |\textbf x'|\), the exponent in \(e^{ik|\textbf x-\textbf x'|}\) goes like:

\[k|\textbf x-\textbf x'|=k|\textbf x|-k\hat{\textbf x}\cdot\textbf x'+O_{|\textbf x|\gg|\textbf x'|}\left(\frac{|\textbf x'|^2}{|\textbf x|}\right)\]
while the multipole expansion of the other factor in the integrand, \(\frac{1}{|\textbf x-\textbf x'|}=\frac{1}{|\textbf x|}\sum_{\ell=0}^{\infty}\left(\frac{|\textbf x'|}{|\textbf x|}\right)^{\ell}P_{\ell}(\cos\theta)=\frac{1}{|\textbf x|}+O_{|\textbf x|\gg |\textbf x'|}\left(\frac{|\textbf x'|}{|\textbf x|}\right)\), can be kept to just monopole order. These asymptotic approximations allow one to obtain the asymptotic Lippmann-Schwinger equation:

\[\langle\textbf x|E\rangle\approx e^{i\textbf k\cdot\textbf x}-\frac{m}{2\pi\hbar^2}\frac{e^{ik|\textbf x|}}{|\textbf x|}\iiint_{\textbf x'\in\textbf R^3}d^3\textbf x'e^{-ik\hat{\textbf x}\cdot\textbf x'}V(\textbf x')\langle\textbf x'|E\rangle\]
Taking \(\textbf k=k\hat{\textbf k}\) along the \(z\)-axis and writing \(r:=|\textbf x|\), we have therefore proven from the asymptotic Lippmann-Schwinger equation the asymptotic form \(\langle\textbf x|E\rangle\approx e^{ikz}+f(\theta,\phi)\frac{e^{ikr}}{r}\) of the scattered waves, in agreement with the intuition earlier. In particular, the scattering amplitude \(f(\theta,\phi)\) can be read off as:

\[f(\theta,\phi)=-\frac{m}{2\pi\hbar^2}\iiint_{\textbf x'\in\textbf R^3}d^3\textbf x'e^{-i\textbf k'\cdot\textbf x'}V(\textbf x')\langle\textbf x'|E\rangle\]
where \(\textbf k'=\textbf k'(\theta,\phi)=k\hat{\textbf x}\) is the momentum of the scattered wave in a given direction. The \(\phi\)-dependence in the scattering amplitude \(f(\theta,\phi)\) allows for the possibility of an anisotropic (e.g. dipolar) potential \(V(\textbf x')\) (notice we haven’t assumed anything about \(V\) yet other than its asymptotic decay behavior); in particular, the dependence on \(\theta,\phi\) is entirely contained in the \(S^2\) direction vector \(\hat{\textbf x}=\sin\theta\cos\phi\,\hat{\textbf i}+\sin\theta\sin\phi\,\hat{\textbf j}+\cos\theta\,\hat{\textbf k}\) in the exponent.
To finally cut to the crux of the difficulty (namely that the unknown state \(|E\rangle\) still appears on both sides of the Lippmann-Schwinger equation), we employ the Born series approximation. If we go back to the abstract form of the Lippmann-Schwinger equation, we can formally “isolate” for \(|E\rangle\) as follows:

\[|E\rangle=\left(1-(E1-\tilde H)^{-1}\Delta\tilde H\right)^{-1}\widetilde{|E\rangle}\]
But then recognize that the so-called Møller scattering operator \((1-(E1-\tilde H)^{-1}\Delta\tilde H)^{-1}\) as written can be expanded as a geometric series (assuming the perturbation \(\Delta\tilde H\) is suitably “small” so that it converges):

\[|E\rangle=\sum_{n=0}^{\infty}\left((E1-\tilde H)^{-1}\Delta\tilde H\right)^n\widetilde{|E\rangle}\]
(there even exists a “Feynman diagram” style way to write this Born series which I won’t get into here). The truncation of this geometric series at the \(n\)-th term then constitutes the \(n\)-th Born approximation. For \(n=0\), this is just:
\[|E\rangle\approx \widetilde{|E\rangle}\]
This is pretty crude as it completely ignores the potential \(V\). It is more common to use the \(n=1\) Born approximation to obtain the refined estimate:

\[|E\rangle\approx\widetilde{|E\rangle}+(E1-\tilde H)^{-1}\Delta\tilde H\widetilde{|E\rangle}\]
This looks almost the same as the original exact Lippmann-Schwinger equation, except that whereas earlier we had the pesky unknown state \(|E\rangle\) we were trying to solve for on the right-hand side, we’ve now replaced it \(|E\rangle\mapsto\widetilde{|E\rangle}\) by its known zeroth-order Born approximation from above to obviate that difficulty. In particular, for \(\tilde H=T,\Delta\tilde H=V\) and \(\widetilde{|E\rangle}=|\textbf k\rangle\) as before, the \(n=1\) Born approximation asserts:

\[|E\rangle\approx|\textbf k\rangle+(E1-T)^{-1}V|\textbf k\rangle\]
In particular, the Green’s operator \((E1-T)^{-1}\) was already computed above in the position representation, so we can just steal it and write the \(n=1\) Born approximation here as:

\[\langle\textbf x|E\rangle\approx e^{i\textbf k\cdot\textbf x}-\frac{m}{2\pi\hbar^2}\frac{e^{ikr}}{r}\iiint_{\textbf x'\in\textbf R^3}d^3\textbf x'e^{-i\Delta\textbf k\cdot\textbf x'}V(\textbf x')\]
where the change in momentum between the scattered and incident waves is \(\Delta\textbf k:=\textbf k'-\textbf k\). This in turn leads to a corresponding approximation of the scattering amplitude \(f(\theta,\phi)\) as being proportional to the Fourier transform of the potential energy \(V(\textbf x)\):

\[f(\theta,\phi)\approx-\frac{m}{2\pi\hbar^2}\hat V(\Delta\textbf k),\qquad \hat V(\Delta\textbf k):=\iiint_{\textbf x'\in\textbf R^3}d^3\textbf x'e^{-i\Delta\textbf k\cdot\textbf x'}V(\textbf x')\]
where \(\Delta\textbf k=\Delta\textbf k(\theta,\phi)\) through \(\textbf k'=\textbf k'(\theta,\phi)\). The differential cross-section associated to the scattering is thus:

\[\frac{d\sigma}{d\Omega}=|f(\theta,\phi)|^2\approx\left(\frac{m}{2\pi\hbar^2}\right)^2|\hat V(\Delta\textbf k)|^2\]
Intuitively, the way to interpret this formula is that variations in the potential \(V(\textbf x)\) occurring on length scales of order \(\sim \Delta x\) show up in the power spectrum \(|\hat V(\Delta\textbf k)|^2\) of the potential only at large momentum transfer \(|\Delta\textbf k|\sim 1/\Delta x\), à la Heisenberg. But because the differential cross-section \(d\sigma\propto |\hat V(\Delta\textbf k)|^2\) is the experimentally measurable quantity (or strictly speaking the cross-section \(\sigma=\int d\sigma\) itself is what’s measured), in order to detect \(d\sigma\neq 0\) we need \(\hat V(\Delta\textbf k)\neq 0\) (the larger the better). So scattering particles harder lets you peek at finer details.
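To make this concrete, here is a toy \(1\)D illustration (a Gaussian bump standing in for \(V\), with purely illustrative widths): a potential varying on scale \(\Delta x\) has power spectrum \(|\hat V(k)|^2\propto e^{-k^2\Delta x^2}\), so probing only small momentum transfers misses sharp features.

```python
import math

# 1D toy model: a Gaussian bump of width dx has |Vhat(k)|^2 ∝ exp(-k^2 dx^2),
# so features on scale dx only put power at momentum transfers |k| up to ~1/dx.
def power_fraction(dx, k_max, N=20000):
    """Fraction of the power-spectrum integral captured by |k| < k_max."""
    h = 10.0 / (dx * N)          # integrate out to k = 10/dx, ~10 spectral widths
    total = below = 0.0
    for i in range(N):
        k = i * h
        w = math.exp(-k**2 * dx**2)
        total += w
        if k < k_max:
            below += w
    return below / total

# A sharp bump (dx = 0.1) hides most of its power beyond k = 1, so slow probes miss it...
print(power_fraction(0.1, 1.0))
# ...while a smooth bump (dx = 10) is fully visible already at k < 1:
print(power_fraction(10.0, 1.0))
```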
Yukawa Potential For Strong Nuclear Force
The strong nuclear force can be well-modelled asymptotically via the central Yukawa potential:
\[V_{\text{Yukawa}}(r)=-\frac{g^2}{r}e^{-r/R}\]
where \(g\) is a coupling constant and \(R\) is called the range of the Yukawa force:
It turns out that the range \(R\propto 1/m\) is inversely proportional to the mass \(m\) of the particle “mediating” the potential. For electromagnetism, these are photons \(\gamma\) which are massless \(m=0\) so in that case the Yukawa potential has infinite range \(R=\infty\) and reduces to the Coulomb potential (hence why we called it a long-range force earlier) with \(g^2=-q_1q_2/4\pi\varepsilon_0\). However, for massive particles like protons \(p^+\) and neutrons \(n^0\) in the nuclei of atoms (for which the strong interaction is most relevant), the Yukawa potential is the correct (asymptotic) potential to use. As an aside, this strong nuclear force of course explains why the nucleus doesn’t fall apart. Two protons \(p^+\) would ordinarily repel on electromagnetic grounds but are attracted on strong nuclear grounds, with an equilibrium separation \(r_0\) occurring when \(\frac{\alpha\hbar c}{g^2}e^{r_0/R}=1+\frac{r_0}{R}\).
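That last equilibrium condition is transcendental but easy to solve numerically; a sketch in the dimensionless variables \(x:=r_0/R\) and \(\beta:=\alpha\hbar c/g^2\) (the value \(\beta=0.1\) is purely illustrative):

```python
import math

# Dimensionless equilibrium condition: beta * e^x = 1 + x, with x := r0/R and
# beta := alpha*hbar*c/g^2 (beta = 0.1 is illustrative; beta < 1 is needed for
# an equilibrium to exist at all, since g(0) = beta - 1 must be negative).
beta = 0.1

def g(x):
    return beta * math.exp(x) - 1.0 - x

# Bisection: g(0) < 0 while g(x) -> +infinity, so a root exists in between.
lo, hi = 0.0, 20.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if g(lo) * g(mid) <= 0 else (mid, hi)
x0 = 0.5 * (lo + hi)
print(x0)  # the nucleons sit a few ranges R apart for this beta
```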
Now one can ask: what does scattering look like in a Yukawa potential? Let’s first figure out the scattering amplitude \(f(\theta,\phi)\). The Fourier transform of the Yukawa potential is (by working in spherical coordinates \(\iiint_{\textbf x'\in\textbf R^3}d^3\textbf x'=\int_{0}^{2\pi}d\phi'\int_{-1}^1d\cos\theta'\int_0^{\infty}dr' r'^2\) and recognizing that one can take the momentum transfer \(\Delta\textbf k\) to lie along the \(z\)-axis without loss of generality so that \(\Delta\textbf k\cdot\textbf x'=|\Delta\textbf k|r'\cos\theta'\)) essentially Lorentzian:

\[\hat V_{\text{Yukawa}}(\Delta\textbf k)=-\frac{4\pi g^2}{|\Delta\textbf k|^2+1/R^2}\]
where the dependence on the scattering angle \(\textbf k\cdot\textbf k'=k^2\cos\theta\) can be made explicit by writing \(|\Delta\textbf k|^2=|\textbf k'-\textbf k|^2=2k^2(1-\cos\theta)=4k^2\sin^2(\theta/2)\), as is evident geometrically in this \(\textbf k,\textbf k'\)-rhombus:
(note that the optical theorem would seem to predict \(\sigma=0\) because the scattering amplitude \(f(\theta)\in\textbf R\) is real-valued; this is not a contradiction, but rather a limitation of the first-order Born approximation which is implicit in all the computations we’ve been doing).
In particular, if one now lets the range \(R\to\infty\) of the Yukawa force become long-range and sets \(g^2=\alpha\hbar c\), then one would seem to obtain the corresponding scattering amplitude, differential cross-section, and total cross-section for Rutherford scattering:

\[f(\theta)\approx\frac{\alpha\hbar c}{4E\sin^2(\theta/2)},\qquad\frac{d\sigma}{d\Omega}\approx\frac{(\alpha\hbar c)^2}{16E^2\sin^4(\theta/2)},\qquad\sigma=\infty\]
Although we don’t have a classical analog of the scattering amplitude \(f(\theta)\), we did previously see a classical computation of the differential cross-section \(d\sigma\), which one can check agrees with this first-order Born approximation from quantum scattering! (By trivial extension, both the classical and quantum calculations also agree that the total Coulomb cross-section blows up \(\sigma=\infty\).) However, although it ultimately turns out that this is the correct answer for the differential cross-section \(d\sigma\), the method used here of simply starting from the Yukawa potential and extending the range \(R\to\infty\) is invalid.
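As a sanity check, the Fourier transform computed above (the standard Lorentzian \(\hat V(\Delta\textbf k)=-4\pi g^2/(|\Delta\textbf k|^2+1/R^2)\)) can be reproduced by doing the radial integral numerically; a sketch with illustrative parameter values:

```python
import math

# Radial Fourier transform of a central potential:
#   Vhat(q) = (4*pi/q) * ∫ r sin(qr) V(r) dr,
# which for the Yukawa potential should give -4*pi*g^2/(q^2 + 1/R^2).
g2, R, q = 1.0, 1.0, 0.7   # illustrative values of g^2, R and q = |Delta k|

def integrand(r):
    # r * sin(qr) * V(r): the 1/r of the Yukawa potential cancels the r factor
    return -g2 * math.sin(q * r) * math.exp(-r / R)

# Trapezoidal quadrature out to 60 ranges R (tail is exponentially negligible):
N, r_max = 200000, 60.0 * R
h = r_max / N
total = 0.5 * (integrand(0.0) + integrand(r_max))
for i in range(1, N):
    total += integrand(i * h)
numeric = (4 * math.pi / q) * h * total

analytic = -4 * math.pi * g2 / (q**2 + 1.0 / R**2)
print(numeric, analytic)  # agree to many digits
```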
The purpose of this post is to acquire a deeper appreciation of Möbius transformations. Typically, one simply encounters these as maps \(\mathcal M:\textbf C\cup\{\infty\}\to\textbf C\cup\{\infty\}\) on the Riemann sphere \(\textbf C\cup\{\infty\}\) of the form \(\mathcal M(z):=\frac{az+b}{cz+d}\) for \(ad-bc\neq 0\), without enough emphasis being placed on just how incredibly general and versatile this form is. The following problems (followed by their solutions) are meant to expose the full scope of possibilities that come with Möbius transformations.
Problem #\(1\): Solve the equation \(\frac{5}{8}=\frac{11-3x}{4+7x}\) for \(x\in\textbf R\).
Solution #\(1\): Exploit the invariance under Mobius transformations of a suitable cross-ratio:
In general, the Möbius group is isomorphic to \(\text{PGL}_2(\textbf C)\). This is incredibly powerful because it provides an explicit bridge between complex analysis and linear algebra. In this language, the question of which Möbius transformations are involutions \(\mathcal M^2=1\) boils down (projectively) to which \(2\times 2\) complex invertible matrices \(M=\begin{pmatrix}a&b\\c&d\end{pmatrix}\) satisfy \(M^2=1\). By the Cayley-Hamilton theorem, any matrix with eigenvalues \(\lambda=\pm 1\) (equivalently, \(\text{tr}M=0\) and \(\det M=-1\)) will satisfy such an equation…
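This dictionary is easy to play with in code; a minimal sketch (composition of Möbius maps versus matrix multiplication, plus a trace-zero involution):

```python
# Mobius maps as 2x2 complex matrices: composing maps = multiplying matrices,
# and (by Cayley-Hamilton) trace-zero matrices give involutions.

def mobius(M):
    (a, b), (c, d) = M
    return lambda z: (a * z + b) / (c * z + d)

def matmul(M, N):
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

M1 = ((1, 2j), (3, 4))    # a generic invertible matrix / Mobius map
M2 = ((0, 1), (1, 0))     # z -> 1/z: trace zero, hence an involution

z = 0.3 + 0.7j
# Evaluating the composed map agrees with evaluating the matrix product:
composed = mobius(M1)(mobius(M2)(z))
product = mobius(matmul(M1, M2))(z)
print(abs(composed - product))          # ~ 0

# And the involution really squares to the identity map:
print(mobius(M2)(mobius(M2)(z)) - z)    # ~ 0
```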
What’s the connection with unitary quantum logic gates on qubits? Can one give examples with \(1\)D scattering (which is similar to the Fresnel equations)? Even for certain classes of intuitive \(2\times 2\) matrices like rotations, what do the corresponding Möbius maps say? And is there any connection between evaluating a Möbius map on some \(z\) and multiplying the corresponding matrix by a \(\textbf C^2\) vector?
Problem #\(4\): Decompose the following rational function into partial fractions:
\[\frac{x^2+15}{(x+3)^2(x^2+3)}\]
Solution #\(4\): Remember that partial fractions may be viewed as just Möbius transformations with \(a=0\) (or powers thereof). In this case, that means:

\[\frac{x^2+15}{(x+3)^2(x^2+3)}=\frac{R_1}{x+3}+\frac{R_2}{(x+3)^2}+\frac{R_3}{x-\sqrt{3}i}+\frac{R_4}{x+\sqrt{3}i}\]
where the residues \(R_1,R_2,R_3,R_4\) may be calculated efficiently using the residue theorem (one may optionally prefer one’s residues to be real, in which case the last \(2\) terms should be combined as \(\frac{\tilde R_3x+\tilde R_4}{x^2+3}\); however, as will be clear below, it is actually very easy to get \(\tilde R_3,\tilde R_4\in\textbf R\) by simply summing \(\frac{R_3}{x-\sqrt{3}i}+\frac{R_4}{x+\sqrt{3}i}\) once the complex residues \(R_3,R_4\in\textbf C\) have been obtained; partial fraction composition is much easier than decomposition!). To begin, it is helpful to complexify \(x\mapsto z\) everywhere:
Now, suppose one wished to extract the residue at one of the simple poles, say \(R_1\). Then pick any contour in \(\textbf C\) that encloses only the corresponding pole, in this case at \(z=-3\) (strictly speaking \(z=-3\) is a double pole of the relevant function, but that’s a technicality as far as computing \(R_1\) is concerned), and compute the corresponding contour integral on both sides, normalized by \(2\pi i\):

\[R_1=\frac{1}{2\pi i}\oint_{|z+3|=1}\frac{z^2+15}{(z+3)^2(z^2+3)}dz=\frac{d}{dz}\left[\frac{z^2+15}{z^2+3}\right]_{z=-3}=\frac{1}{2}\]
By changing the contour to enclose the simple poles at \(z=\pm\sqrt{3}i\), one can similarly compute the residues \(R_3=1/(\sqrt{3}i-3)\) and \(R_4=-1/(3+\sqrt{3}i)\) (here just the vanilla Cauchy integral formula will do). This immediately leads to the residues \(\tilde R_3=-1/2\) and \(\tilde R_4=1/2\).
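These residues are easy to double-check numerically; a small sketch verifying that \(R_3,R_4\) recombine over the real quadratic \(x^2+3\) with \(\tilde R_3=-1/2,\tilde R_4=1/2\):

```python
import cmath

s3i = cmath.sqrt(3) * 1j   # the simple poles sit at z = ±sqrt(3)i

# Residues quoted above:
R3 = 1 / (s3i - 3)
R4 = -1 / (3 + s3i)

def combined(x):
    """Recombine the two complex simple-pole terms over the real quadratic x^2+3."""
    return R3 / (x - s3i) + R4 / (x + s3i)

# This should equal (R~3 x + R~4)/(x^2 + 3) with R~3 = -1/2 and R~4 = 1/2:
for x in (0.0, 1.0, -2.0):
    assert abs(combined(x) - (-0.5 * x + 0.5) / (x**2 + 3)) < 1e-12
print("residues recombine to (-x/2 + 1/2)/(x^2 + 3)")
```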
Finally, to compute the “residue” \(R_2\) at the \(z=-3\) double pole, one simply converts it into a simple pole by multiplying the whole equation through by a factor of \((z+3)\):

\[R_2=\frac{1}{2\pi i}\oint_{|z+3|=1}\frac{z^2+15}{(z+3)(z^2+3)}dz=\left[\frac{z^2+15}{z^2+3}\right]_{z=-3}=2\]
The purpose of this post is to prove a simple but somewhat unintuitive result in nonrelativistic \(1\)D scattering of quantum particles. Specifically, consider a potential energy landscape \(V(x)\) that decays asymptotically to \(\lim_{|x|\to\infty}V(x)=0\) and which may be asymmetric, so that \(V(x)\) does not necessarily coincide with \(V(-x)\). As a result of such an asymmetry, one might expect that \(V(x)\) would scatter incident quantum particles differently depending on whether they are incident on \(V(x)\) from the left \(x\to-\infty\) or from the right \(x\to\infty\). Nonetheless, it is an unintuitive but important result that actually the transmission amplitudes \(t_{\rightarrow}=t_{\leftarrow}\) are exactly the same, while the reflection amplitudes \(r_{\rightarrow}=-r_{\leftarrow}^*t_{\rightarrow}/t^*_{\leftarrow}\) agree up to a \(U(1)\) phase. In particular, one has the corollary that the left-incident and right-incident transmission and reflection probabilities are identical \(|t_{\rightarrow}|^2=|t_{\leftarrow}|^2\), \(|r_{\rightarrow}|^2=|r_{\leftarrow}|^2\), regardless of the precise nature of \(V(x)\), so long as it vanishes for sufficiently large \(|x|\).
To prove this, define the Wronskian state \(|W\rangle_{|\psi_1\rangle,|\psi_2\rangle}\in\mathcal H^{\otimes 2}\) of two states \(|\psi_1\rangle,|\psi_2\rangle\in\mathcal H\) by the antisymmetric Slater-like determinant \(|W\rangle_{|\psi_1\rangle,|\psi_2\rangle}:=|\psi_1\rangle\otimes P_{\mathcal H}|\psi_2\rangle-P_{\mathcal H}|\psi_1\rangle\otimes |\psi_2\rangle\); for instance one has the probability current state \(|J\rangle=|W\rangle_{K|\psi\rangle,|\psi\rangle}/2m\). One can explicitly calculate (thanks to \(H-V=T\propto P^2\)) that if \(|E_1\rangle,|E_2\rangle\) are any two degenerate \(H\)-eigenstates with the same energy \(E>0\) (e.g. a beam of particles of energy \(E\) thrown in from the left vs. a beam of the same energy \(E\) scattering in from the right), then \(|W\rangle_{|E_1\rangle,|E_2\rangle}\in\ker P_{\mathcal H^{\otimes 2}}\) has zero momentum, using the “product rule” \(P_{\mathcal H^{\otimes 2}}=P_{\mathcal H}\otimes 1_{\mathcal H}+1_{\mathcal H}\otimes P_{\mathcal H}\).
Thus, the Wronskian state \(|W\rangle_{|E_1\rangle,|E_2\rangle}\) is translationally invariant, meaning it is a constant across \(x\in\textbf R\). In particular, at \(x\to-\infty\) it is \(|W\rangle_{|k\rangle,|-k\rangle}=-2ikt_{\leftarrow}\), while at \(x\to\infty\) it is \(-2ikt_{\rightarrow}\), so we get \(t_{\rightarrow}=t_{\leftarrow}\). A similar argument replacing \(|k\rangle\mapsto K|k\rangle\) (but not touching the other one) leads to the condition \(\begin{pmatrix} t_{\leftarrow} \\ r_{\leftarrow}\end{pmatrix}^{\dagger}\begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}t_{\rightarrow}\\r_{\rightarrow}\end{pmatrix}=0\) as claimed.
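All of these claims can be checked numerically on a concrete asymmetric example; a transfer-matrix sketch (in assumed units \(\hbar^2/2m=1\), with the double-delta potential \(V(x)=g_1\delta(x)+g_2\delta(x-L)\) and illustrative parameter values):

```python
import cmath

# Transfer-matrix check on an ASYMMETRIC 1D potential: two delta functions of
# different strengths, V(x) = g1*delta(x) + g2*delta(x - L), in units
# hbar^2/2m = 1 (so psi' jumps by g*psi across a delta of strength g).
k, g1, g2, L = 0.9, 1.0, 2.5, 1.3   # illustrative parameter values

def T_delta(g, x0):
    """Transfer matrix across g*delta(x-x0), acting on (A, B) in A e^{ikx} + B e^{-ikx}."""
    a = g / (2j * k)
    return ((1 + a, a * cmath.exp(-2j * k * x0)),
            (-a * cmath.exp(2j * k * x0), 1 - a))

def matmul(M, N):
    (a, b), (c, d) = M
    (e, f), (g_, h) = N
    return ((a * e + b * g_, a * f + b * h), (c * e + d * g_, c * f + d * h))

(M11, M12), (M21, M22) = matmul(T_delta(g2, L), T_delta(g1, 0.0))

# Solving the left-incidence and right-incidence scattering problems in terms
# of the same transfer-matrix entries gives:
t_right, r_right = 1 / M22, -M21 / M22   # t_->, r_->
t_left, r_left = 1 / M22, M12 / M22      # t_<-, r_<-  (t_<- = 1/M22 too!)

print(abs(t_right - t_left))                       # transmissions identical
print(abs(r_right) - abs(r_left))                  # reflection probabilities identical
print(abs(abs(t_right)**2 + abs(r_right)**2 - 1))  # unitarity for a real potential
```

One can also confirm the phase relation \(r_{\rightarrow}=-r_{\leftarrow}^*t_{\rightarrow}/t_{\leftarrow}^*\) directly from these amplitudes.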