The Aharonov-Bohm Phase & Dirac Quantization

The purpose of this post is to describe the Aharonov-Bohm phase and its relevance to Dirac’s classic \(1931\) argument for the quantization of magnetic charge (if magnetic monopoles were to exist) using the principles of quantum mechanics. Finally, a few more proofs of Dirac quantization, discovered subsequently, will be given.

Aharonov-Bohm Effect

Here is ChatGPT’s depiction of the effect:

A simple idealized example that gets to the essence of the Aharonov-Bohm effect is as follows: consider a cylindrical magnetic flux tube \(\Phi_{\textbf B}\) (perhaps from some solenoid), and consider a charge \(q\) confined to move on a circle \(S^1_R\) of radius \(R\) outside the magnetic flux tube, where \(\textbf B=\textbf 0\) (aside: from Maxwell’s equations \(\partial_{\textbf x}\cdot\textbf E=\partial_{\textbf x}\times\textbf E=\dot{\textbf E}=0\) outside the solenoid, so by the Helmholtz decomposition theorem I believe that \(\textbf E=\textbf 0\) outside the solenoid too, but I might be wrong about this). Despite this, the magnetic vector potential \(\textbf A\) cannot be zero, due to Stokes’s theorem \(\Phi_{\textbf B}=\oint_{\textbf x\in S^1_R}\textbf A(\textbf x)\cdot d\textbf x\), which in a suitable gauge yields the azimuthally-circulating vector field \(\textbf A=\frac{\Phi_{\textbf B}}{2\pi R}\hat{\boldsymbol{\phi}}_{\phi}\) on the circle. The Hamiltonian for this charge is (remember again it’s confined to the circle \(S^1_R\)):

\[H=\frac{(P_{\phi}-q\Phi_{\textbf B}/2\pi R)^2}{2m}\]

Since the particle is constrained to move on a circle, \(L_3=RP_{\phi}\), so:

\[H=\frac{(L_3-q\Phi_{\textbf B}/2\pi)^2}{2I}\]

with \(I:=mR^2\) the moment of inertia. The energies are therefore quantized by \(m_{\ell}\in\textbf Z\):

\[E_{m_{\ell}}=\frac{\hbar^2}{2I}\left(m_{\ell}-\frac{\Phi_{\textbf B}}{\Phi_{\textbf B,0}}\right)^2\]

with the quantum of magnetic flux \(\Phi_{\textbf B,0}=h/q\) (the oft-quoted \(h/e\) when the charge is an electron). The corresponding usual \(L_3\)-eigenstate wavefunctions that we’ve known since we analyzed the “particle on a ring” back in the good ol’ days are all maximally delocalized and given by:

\[\langle\phi|E_{m_{\ell}}\rangle=\frac{1}{\sqrt{2\pi R}}e^{im_{\ell}\phi}\]

Anyways, focusing back on the energies \(E_{m_{\ell}}\), it is clear that the spectrum of the Hamiltonian \(H\) depends only on the residue class \(\Phi_{\textbf B}\text{ mod }\Phi_{\textbf B,0}\) for any \(\Phi_{\textbf B}\in\textbf R\). Graphically, if one plots the energy \(E_{m_{\ell}}\) as a function of \(\Phi_{\textbf B}/\Phi_{\textbf B,0}\) for various values of \(m_{\ell}\in\textbf Z\), one obtains a sequence of parabolas (only \(-4\leq m_{\ell}\leq 4\) are shown, but it’s important to realize that other parabola segments should be leaking into here too, forming a lattice-like structure which would graphically explain the above observation about indistinguishability):
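This flux-periodicity of the spectrum is also easy to check numerically. A minimal sketch in plain Python (the flux ratio \(f=\Phi_{\textbf B}/\Phi_{\textbf B,0}\) and the angular-momentum cutoff below are arbitrary choices), working in units of \(\hbar^2/2I\):

```python
# Check that the particle-on-a-ring spectrum {E_m} is invariant under
# shifting the flux by one flux quantum: Phi -> Phi + Phi_0.
# Energies in units of hbar^2/(2I): E_m = (m - f)^2 with f = Phi/Phi_0.

def spectrum(f, m_max=50):
    """Sorted ring spectrum for flux ratio f = Phi/Phi_0."""
    return sorted((m - f) ** 2 for m in range(-m_max, m_max + 1))

f = 0.37                # arbitrary non-integer flux ratio
s1 = spectrum(f)
s2 = spectrum(f + 1.0)  # one extra flux quantum

# Away from the truncation edge the two spectra agree level-by-level,
# since shifting f by 1 just relabels m -> m + 1.
assert all(abs(a - b) < 1e-9 for a, b in zip(s1[:80], s2[:80]))
```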

However, notice that the zero vector potential \(\textbf 0=\textbf A-\frac{\partial\Gamma}{\partial\textbf x}\) can (despite our earlier emphasis that \(\textbf A\neq\textbf 0\)!) be obtained from the earlier vector potential \(\textbf A(\rho,\phi)=\frac{\Phi_{\textbf B}}{2\pi \rho}\hat{\boldsymbol{\phi}}_{\phi}\) via the multivalued gauge transformation \(\Gamma(\phi)=\frac{\Phi_{\textbf B}\phi}{2\pi}\); this is possible because \(\textbf A\) is irrotational, \(\frac{\partial}{\partial\textbf x}\times\textbf A=\textbf 0\), despite appearing azimuthally circulating. Our earlier wavefunctions then transform accordingly as \(\frac{1}{\sqrt{2\pi R}}e^{im_{\ell}\phi}\mapsto \frac{1}{\sqrt{2\pi R}}e^{im_{\ell}\phi-iq\Gamma(\phi)/\hbar}\). Although it’s okay for the gauge \(\Gamma\) to be multivalued, it’s not okay on physical grounds for wavefunctions to be multivalued (i.e. they have to actually be functions!), so the new wavefunctions are single-valued iff \(\Phi_{\textbf B}/\Phi_{\textbf B,0}\in\textbf Z\). Thus, the gauge transformation is allowed iff the energy spectrum \(E_{m_{\ell}}\) is unaffected, meaning the quantum particle is completely oblivious to the magnetic flux tube’s existence. The restriction that the particle’s wavefunction \(\langle\phi|E_{m_{\ell}}\rangle\) be single-valued, \(\langle\phi+2\pi|E_{m_{\ell}}\rangle=\langle\phi|E_{m_{\ell}}\rangle\), as it winds around the flux tube \(\Phi_{\textbf B}\) is an example of how the topology of a space can have important quantum mechanical ramifications.

Double-Slit Experiment Using Classical Physics

Here we first recall how the far-field (also called the Fraunhofer) period-averaged interference pattern for the classic double-slit experiment is calculated classically using the usual wave theory of light. As usual, a monochromatic, coherent electromagnetic plane wave (polarized along the \(\hat{\textbf k}\) direction) with wavenumber \(k\) is incident normally on two infinitely thin slits separated by distance \(\Delta x\). From each slit emerges (via the magic wand of diffraction, or more precisely via the Huygens-Fresnel principle) monochromatic, coherent electromagnetic cylindrical waves of the same wavenumber \(k\) which interfere with each other everywhere in three-dimensional space \(\textbf R^3\). However, as is standard, we place a screen normally at a far distance \(\Delta y\gg k\Delta x^2\) away, thereby sampling a two-dimensional cross-section \(\cong\textbf R^2\subseteq\textbf R^3\) of the full three-dimensional space where all the interference is really happening (rant: in everyday usage the word interference generally brings with it a negative connotation, however we know that at various points \(\textbf x\in\textbf R^3\) in space waves can “interfere” not just in a destructive way befitting of the negative connotation but also in a constructive way which sort of lends itself to a cognitive dissonance. A more neutral synonym for interference would be superposition, and maybe a better name for interferometer would be superpositionometer but since it’s standard to speak about interference I’ll just stick to that annoying terminology).

To compute the time-averaged far-field interference pattern (here defined as the irradiance received by the screen) \(\bar I^{\infty}(x|k,\Delta x,\Delta y)\) at some position \(\textbf x=(x,\Delta y)\) on the screen \(y=\Delta y\), we first note that the two slits are located at \(\textbf x_{\pm}=(\pm\Delta x/2,0)\). If we arbitrarily focus for a moment on the far-field electric field \(\textbf E_+^{\infty}(\textbf x,t)\) associated with the cylindrical EM-wave emanating from just the slit at position \(\textbf x_+\), it is intuitively given by \(\textbf E_+^{\infty}(\textbf x,t)\sim E_0\cos k(|\textbf x-\textbf x_+|-ct)\hat{\textbf k}\). Because we’re dealing with cylindrical waves, the coefficient \(E_0\) is not actually a constant but rather exhibits the far-field decay \(E_0=E_0(|\textbf x-\textbf x_+|)\sim \frac{1}{\sqrt{|\textbf x-\textbf x_+|}}\) characteristic of Bessel function cylindrical waves. Nonetheless, we will ignore this amplitude attenuation in the electric field strength \(|\textbf E_+^{\infty}(\textbf x,t)|\) as it would distract from the important features (e.g. locations of bright and dark fringes) in the far-field interference pattern \(\bar I^{\infty}(x|k,\Delta x,\Delta y)\) itself which don’t depend on it.

Similarly, for the slit at \(\textbf x_-\), the far-field electric field “spilling out” of it is \(\textbf E_-^{\infty}(\textbf x,t)\sim E_0\cos k(|\textbf x-\textbf x_-|-ct)\hat{\textbf k}\) (importantly no phase shift term needs to be introduced in the argument of the \(\cos\) function because the plane wave was taken to be incident normally/simultaneously on both of the slits). The net far-field electric field therefore arises by interference \(\textbf E^{\infty}(\textbf x,t)=\textbf E_+^{\infty}(\textbf x,t)+\textbf E_-^{\infty}(\textbf x,t)\) and evaluating it at a point \(\textbf x=(x,\Delta y)\) on the screen yields (with the help of a trig identity):

\[\textbf E^{\infty}(\textbf x,t)=2E_0\cos \frac{k}{2}\left((|\textbf x-\textbf x_+|+|\textbf x-\textbf x_-|)-2ct\right)\cos\frac{k}{2}(|\textbf x-\textbf x_+|-|\textbf x-\textbf x_-|)\hat{\textbf k}\]

The far-field (but not yet period-averaged) irradiance \(I^{\infty}(\textbf x,t)\) received is given by the magnitude of the asymptotic Poynting field \(\textbf S^{\infty}\) (strictly projected onto the flat screen, though in the small-angle approximation we can assume this is roughly satisfied) \(I^{\infty}(\textbf x,t)\approx |\textbf S^{\infty}(\textbf x,t)|=\frac{|\textbf E^{\infty}(\textbf x,t)|^2}{\mu_0 c}=\frac{4E_0^2\cos^2\frac{k}{2}\left((|\textbf x-\textbf x_+|+|\textbf x-\textbf x_-|)-2ct\right)\cos^2\frac{k}{2}(|\textbf x-\textbf x_+|-|\textbf x-\textbf x_-|)}{\mu_0 c}\)

Now then, period-averaging \(\bar I^{\infty}(\textbf x)=\frac{ck}{2\pi}\int_0^{2\pi/ck} I^{\infty}(\textbf x,t)dt\) turns the time-dependent \(\cos^2\mapsto\frac{1}{2}\) so the final result is:

\[\bar I^{\infty}(\textbf x)=\frac{2E_0^2}{\mu_0 c}\cos^2\frac{k\Delta(\textbf x|\textbf x_+,\textbf x_-)}{2}\]

where the path-length difference is \(\Delta(\textbf x|\textbf x_+,\textbf x_-):=|\textbf x-\textbf x_+|-|\textbf x-\textbf x_-|\). The part where the far-field approximation is used has nothing to do with the physics of interference or diffraction, and is just a matter of being able to simplify the path-length difference to \(|\Delta(\textbf x|\textbf x_+,\textbf x_-)|\approx \Delta x\sin\theta\) (the overall sign is immaterial under the even \(\cos^2\)). This follows either from the standard geometric argument, or algebraically by approximating and binomial-expanding the metric \(|\textbf x-\textbf x_{\pm}|=\sqrt{|\textbf x|^2-2\textbf x\cdot\textbf x_{\pm}+|\textbf x_{\pm}|^2}\approx \sqrt{|\textbf x|^2\mp|\textbf x|\Delta x\sin\theta}=|\textbf x|\sqrt{1\mp\frac{\Delta x\sin\theta}{|\textbf x|}}\approx |\textbf x|\mp\frac{\Delta x\sin\theta}{2}\).
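As a sanity check on this simplification, one can compare the exact path-length difference against \(\Delta x\sin\theta\) numerically (a quick sketch; the lengths and test point below are arbitrary choices):

```python
import math

# Verify the far-field approximation |Delta| ≈ Δx·sinθ for the
# path-length difference between the two slits (arbitrary length units).
dx = 1e-3   # slit separation Δx
Dy = 5.0    # screen distance Δy >> Δx
x = 0.2     # observation point on the screen

r_plus = math.hypot(x - dx / 2, Dy)   # |x - x_+|
r_minus = math.hypot(x + dx / 2, Dy)  # |x - x_-|
delta_exact = r_plus - r_minus        # Δ = |x - x_+| - |x - x_-|
theta = math.atan2(x, Dy)             # angle from the normal

# Compare magnitudes (the sign convention drops out under cos^2).
rel_err = abs(abs(delta_exact) - dx * math.sin(theta)) / abs(delta_exact)
assert rel_err < 1e-4
```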

Double Slit Experiment Using Quantum Mechanics

It may seem like the double-slit experiment falls within the realm of classical physics because, as we’ve just shown above, some simple-minded classical reasoning gives the correct answer for the asymptotic, period-averaged irradiance distribution \(\bar I^{\infty}(\textbf x)\) on the screen, in agreement with experiments. Yet in fact the double-slit experiment was really a smoking gun for quantum mechanics. Here we’ll redo the calculation, this time using quantum mechanics, to see that it agrees with the classical answer (though it will be apparent that the two calculations are really isomorphic to each other!). The fundamental premise now is that instead of electric fields \(\textbf E_{\pm}(\textbf x,t)\) interfering as was the case classically, it is the position-space wavefunctions \(\psi_{\pm}(\textbf x,t)\) (i.e. the probability amplitudes) that interfere. (A common but false way of saying it is that the “photons are interfering with each other”; actually one can do the experiment sending in one photon at a time, and over time the interference pattern builds up as a statistical distribution. This means that in laser light, for instance, with trillions of photons, it is not the photons that are interfering with each other, but rather the wavefunction of each individual photon that is in some sense interfering with itself.) Mathematically, an incident plane wave \(\psi_{\text{incident}}=e^{iky}\) (propagating toward the screen along \(y\)) diffracts into two cylindrical wavefunctions which interfere to produce the total scattered wavefunction \(\psi_{\text{scattered}}(\textbf x)=\psi_+(\textbf x)+\psi_-(\textbf x)=e^{ik|\textbf x-\textbf x_+|}+e^{ik|\textbf x-\textbf x_-|}\) (notice that all along we have simply ignored the time dependence, as is customary for stationary states!). The probability density is therefore \(|\psi_{\text{scattered}}(\textbf x)|^2=|\psi_+(\textbf x)|^2+|\psi_-(\textbf x)|^2+2\Re(\psi_+(\textbf x)\psi_-^*(\textbf x))=2(1+\Re e^{ik\Delta(\textbf x|\textbf x_+,\textbf x_-)})=2(1+\cos k\Delta(\textbf x|\textbf x_+,\textbf x_-))\).
The form of this result clearly agrees with the classical result, where the key was the interference term \(\Re(\psi_+(x)\psi_-^*(x))\) which wouldn’t be present if one were to just throw classical particles at the double slits.
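One can verify numerically that the quantum probability density reproduces the classical \(\cos^2\) fringe profile (a sketch with assumed illustrative values for \(k\), the slit separation, and the screen distance):

```python
import cmath
import math

# Check |psi_+ + psi_-|^2 = 2(1 + cos kΔ) = 4 cos^2(kΔ/2) at sample points.
k = 2 * math.pi / 500e-9   # e.g. 500 nm light (assumed value)
dx, Dy = 10e-6, 1.0        # slit separation and screen distance (assumed)

for x in [0.0, 1e-3, 3.7e-3, -2e-3]:
    r_plus = math.hypot(x - dx / 2, Dy)
    r_minus = math.hypot(x + dx / 2, Dy)
    psi = cmath.exp(1j * k * r_plus) + cmath.exp(1j * k * r_minus)
    delta = r_plus - r_minus
    prob = abs(psi) ** 2
    # quantum density vs. the two equivalent classical forms
    assert abs(prob - 2 * (1 + math.cos(k * delta))) < 1e-6
    assert abs(prob - 4 * math.cos(k * delta / 2) ** 2) < 1e-6
```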

There is also another nice way to think about it using Heisenberg’s uncertainty principle. We can imagine that at the moment when the photon is passing through one of the slits, it has an equal probability to pass through either of the slits and so is initially prepared in the symmetric “Schrodinger’s cat state” \(|\psi\rangle=\frac{|\textbf x_+\rangle+|\textbf x_-\rangle}{\sqrt 2}\). The fact that we initially know the particle’s position pretty well along the \(x\)-axis means by Heisenberg’s uncertainty principle that there would be a large uncertainty in its momentum \(P_1\) along the \(x\)-direction as it exits the slit; this wide distribution of possible momenta, together with the interference, can be seen to heuristically rationalize why an interference pattern so spread out is observed! More precisely, the wavefunction in momentum space is \(\langle\textbf p|\psi\rangle=\frac{\langle \textbf p|\textbf x_+\rangle+\langle \textbf p|\textbf x_-\rangle}{\sqrt 2}=\frac{e^{-i\textbf p\cdot\textbf x_+/\hbar}+e^{-i\textbf p\cdot\textbf x_-/\hbar}}{\sqrt{2h^3}}\) from which it is clear that computing the probability distribution \(|\langle\textbf p|\psi\rangle|^2\) will produce the same shape of the interference pattern (since \(\textbf x_{\pm}\) are both “along the \(x\)-axis”, it makes sense to also only consider momenta \(\textbf p\) lying along the \(x\)-axis too). Note in particular that any kind of detector which could tell you whether the photon passed through a given slit will also disturb the interference pattern by virtue of collapsing the wavefunction (note the screen can, in quantum mechanical terms, be thought of as a measuring device for the position observable \(X\)).

Examples: The above calculation therefore highlights how to solve for the far-field interference pattern on a flat screen in a diffraction setup from a general grating geometry; classically one computes the Fourier transform of the transparency \(T(\textbf x)\) of the grating whereas quantum mechanically one computes the momentum space wavefunction in the \(x\)-direction \(\psi(\textbf p_1)\) at the grating (which itself is the Fourier transform of the position space wavefunction \(\psi(\textbf x)\) at the grating). For instance, for a single slit of finite width \(\Delta x>0\) (note that we previously assumed the double slits were infinitely thin, here if we did the same then we wouldn’t get an interference pattern), if the photon is equally likely to pass anywhere within the slit region then \(\psi(x)=\frac{1}{\sqrt{\Delta x}}\) for \(x\in[-\Delta x/2,\Delta x/2]\) and vanishes in the \(V=\infty\) potential region (this is called a box or top-hat filter). The momentum space wavefunction is therefore \(\psi(p_1)\propto \int_{-\Delta x/2}^{\Delta x/2}e^{-ixp_1/\hbar}dx\propto\text{sinc}\frac{\Delta xp_1}{2\hbar}=\text{sinc}\frac{k_1\Delta x}{2}\). Thus, because \(k_1=k\sin\theta\), the irradiance distribution (which we see is really a statistical distribution of photon position \(X\)) is \(I^{\infty}(\theta|k,\Delta x)\propto |\psi(p_1=\hbar k_1)|^2\propto \text{sinc}^2\left(\frac{k\Delta x}{2}\sin\theta\right)\). 
If on the other hand one has a double slit where the centers of each slit are separated by \(\Delta x\) but the slits themselves are also not infinitely thin but rather have some width \(a\), then such a slit is simply the spatial convolution of the single slit of width \(a\) with the infinitely thin double slits separated by \(\Delta x\), so by the convolution theorem the far-field interference pattern of such a grating is the product of their individual far-field interference patterns \(\bar I^{\infty}(\theta)\propto\cos^2\left(\frac{k\Delta x}{2}\sin\theta\right)\text{sinc}^2\left(\frac{ka}{2}\sin\theta\right)\). As a corollary, if \(\Delta x=a\) then \(\bar I^{\infty}(\theta)\propto\cos^2\left(\frac{k\Delta x}{2}\sin\theta\right)\text{sinc}^2\left(\frac{ka}{2}\sin\theta\right)\propto\text{sinc}^2(ka\sin\theta)\) becomes a single slit of width \(2a\).
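The \(\Delta x=a\) corollary boils down to the double-angle identity \(\cos u\sin u=\frac{1}{2}\sin 2u\); a quick numerical spot-check (the sample points are arbitrary):

```python
import math

def sinc(u):
    """Unnormalized sinc: sin(u)/u, with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(u) / u

# When the slit spacing equals the slit width (Δx = a), with u = (ka/2)sinθ:
# cos^2(u) * sinc^2(u) = sinc^2(2u), i.e. a single slit of width 2a.
for u in [0.0, 0.3, 1.0, 2.5, -1.7]:
    lhs = math.cos(u) ** 2 * sinc(u) ** 2
    rhs = sinc(2 * u) ** 2
    assert abs(lhs - rhs) < 1e-12
```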

Double Slit Experiment with a Twist

Now consider placing a flux tube \(\Phi_{\textbf B}\) behind the double slits; how does this affect the far-field interference pattern? Since it’s just another flux tube, we already saw that the associated magnetic vector potential \(\textbf A\) circulates around it yet remains irrotational, and is thus conservative with (multivalued) potential \(\Gamma(\textbf x)=\int_{\textbf x_0}^{\textbf x}\textbf A(\textbf x’)\cdot d\textbf x’\) satisfying \(\frac{\partial\Gamma}{\partial\textbf x}=\textbf A(\textbf x)\), where \(\textbf x_0\in\textbf R^3\) is an arbitrary reference point for the line integral. The story then starts the same: send in an incident plane wave \(\psi_{\text{incident}}=e^{iky}\) which diffracts through the slits into scattered cylindrical waves that interfere, \(\psi_{\text{scattered}}(\textbf x)=\psi_+(\textbf x)+\psi_-(\textbf x)=e^{ik|\textbf x-\textbf x_+|}+e^{ik|\textbf x-\textbf x_-|}\). But now, because we’re doing a gauge transformation \(\textbf A\mapsto\textbf 0\) via the gauge \(\Gamma\) above, each wavefunction must transform by a respective \(U(1)\) phase \(\psi_{\pm}(\textbf x)\mapsto e^{-iq\int_{\textbf x_{\pm}}^{\textbf x}\textbf A(\textbf x’)\cdot d\textbf x’/\hbar}\psi_{\pm}(\textbf x)\). Although the phase acquired by each wavefunction depends on what \(\textbf A\) is, and so isn’t gauge-invariant, the phase difference \(\frac{q}{\hbar}\oint\textbf A(\textbf x)\cdot d\textbf x=\frac{q\Phi_{\textbf B}}{\hbar}\) (which is what counts in the interference term \(\Re(\psi_+\psi_-^*)\)) is gauge-invariant, and notably has the effect of translating the entire interference pattern either “left” or “right” by an amount proportional to the magnetic flux \(\Phi_{\textbf B}\) applied in the flux tube (this is experimentally verified!). Thus, the Aharonov-Bohm phase associated to any magnetic flux \(\Phi_{\textbf B}\) is \(e^{2\pi i\Phi_{\textbf B}/\Phi_{\textbf B,0}}\in U(1)\).
Clearly, it’s \(1\) if and only if \(\Phi_{\textbf B}/\Phi_{\textbf B,0}\in\textbf Z\), which again means that only when the magnetic flux is divisible into an integer number of quanta does the interference pattern look like the usual untranslated double-slit interference pattern.
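A small sketch of the fringe shift (the fringe profile is reduced to its interference term, and the sample phases are arbitrary):

```python
import math

# The interference term picks up the Aharonov-Bohm phase 2π Φ/Φ_0:
# I(Δ) ∝ 1 + cos(kΔ + 2π Φ/Φ_0). An integer number of flux quanta
# leaves the fringe pattern exactly where it was.

def intensity(k_delta, flux_ratio):
    """Fringe profile vs. path-difference phase kΔ, for Φ/Φ_0 = flux_ratio."""
    return 1 + math.cos(k_delta + 2 * math.pi * flux_ratio)

for kd in [0.0, 0.5, 1.3, 2.9]:
    # 3 flux quanta: pattern unchanged
    assert abs(intensity(kd, 3.0) - intensity(kd, 0.0)) < 1e-12

# half a flux quantum: the central bright fringe becomes dark
assert abs(intensity(0.0, 0.5)) < 1e-12
```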

Magnetic Monopoles

Recall that “electric monopoles” exist; they are basically point charges with electric charge \(q\in\textbf R\), or more precisely \(q\in e\textbf Z\) with \(e\approx 1.602\times 10^{-19}\text{ C}\) the elementary quantum of charge (ignoring quarks, etc.). The electrostatic field of such an electric monopole is the usual Coulomb field:

\[\textbf E=\frac{q}{4\pi\varepsilon_0r^2}\hat{\textbf r}\]

This result follows from the integral form of Gauss’s law for \(\textbf E\)-fields \(\Phi_{\textbf E,\partial V}=\frac{Q_V}{\varepsilon_0}\) with the suitable Gaussian surface \(S^2_r\). Meanwhile, we know that normally Gauss’s law for \(\textbf B\)-field simply says \(\Phi_{\textbf B,\partial V}=0\) for any volume \(V\). But if magnetic monopoles were to exist, this law would be (I don’t know how to rigorously justify this but it seems intuitively reasonable) \(\Phi_{\textbf B,\partial V}=G_V\) where \(G_V\) is now the magnetic charge enclosed in \(V\) (and has the same units as the magnetic flux, e.g. webers \(\text{Wb}\)). In particular, for a magnetic monopole of magnetic charge \(g\) (perhaps this symbol is chosen because it kind of looks dual to the usual symbol \(q\) for electric charge), it follows that the magnetic field is also Coulombic and given by:

\[\textbf B=\frac{g}{4\pi r^2}\hat{\textbf r}\]

However, it turns out that not just any magnetic monopole charge \(g\) is allowed by quantum mechanics. Dirac’s original argument was to consider an electric charge \(q\) that completes, say, a “counter-clockwise” equatorial circular loop \(S^1\) around a magnetic monopole \(g\) with the radial magnetic field \(\textbf B\) above. This causes the wavefunction (this is where the quantum mechanics comes in!) of \(q\) to acquire an Aharonov-Bohm phase \(e^{-iq\oint_{\textbf x\in S^1}\textbf A(\textbf x)\cdot d\textbf x/\hbar}\), where \(\textbf A\) is some we-don’t-know-if-it-even-exists vector potential for the radial magnetic field \(\textbf B\) of the monopole. By Stokes’ theorem, we have:

\[\oint_{\textbf x\in S^1}\textbf A(\textbf x)\cdot d\textbf x=\Phi_{\textbf B,S^2_N/2}=g/2\]

where \(S^2_N/2\) is the “northern” hemisphere oriented with outward normal. On the other hand, taking the southern hemisphere instead gives \(-\Phi_{\textbf B,S^2_S/2}=-g/2\) (although these two separate line integral computations should not naively be equated, because only the Aharonov-Bohm phase is observable/meaningful). This means:

\[e^{-iqg/2\hbar}=e^{iqg/2\hbar}\]

This forces the product of their respective charges to be a multiple of Planck’s constant, \(qg\in h\textbf Z\); equivalently, \(g\in\Phi_{\textbf B,0}\textbf Z\) comes in multiples of the flux quantum. This is the Dirac quantization condition. For a pair of dyons \((q_1,g_1),(q_2,g_2)\) (thought of as “electromagnetic charges”), the Dirac-Zwanziger quantization condition is:

\[\det\begin{pmatrix}q_1 & g_1 \\ q_2 & g_2 \end{pmatrix}\in h\textbf Z\]

obtained by orbiting each dyon in the background of the other dyon. Above, the explicit construction of a magnetic vector potential \(\textbf A\) for the monopole magnetic field \(\textbf B\) was avoided. Can it even be done? The answer is yes, thanks to the fact that \(\textbf R^3-\{\textbf 0\}\) deformation retracts onto \(S^2\), which (unlike \(\textbf R^3\)) admits topologically non-trivial \(U(1)\) bundles. For instance, the following Dirac string potentials will do the trick:

\[\textbf A_{\text{Dirac string}}^{\pm}(r,\theta,\phi)=\pm\frac{g}{4\pi r}\frac{1\mp\cos\theta}{\sin\theta}\hat{\boldsymbol{\phi}}_{\phi}\]

they are singular gauge connections along the half-lines \(\theta=\pi\) and \(\theta=0\) respectively (their “Dirac strings”), but each is a perfectly good vector potential away from its string.
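One can spot-check numerically, via central finite differences at an arbitrary test point away from the string, that the curl of \(\textbf A^+_{\text{Dirac string}}\) really is the monopole field \(g\hat{\textbf r}/4\pi r^2\):

```python
import math

g = 1.0  # monopole charge (arbitrary units)

def A_plus(x, y, z):
    """Dirac-string potential A^+ in Cartesian components.

    A^+ = (g/4πr)(1 - cosθ)/sinθ φ̂, i.e. (g/4π)(1 - z/r)/ρ² · (-y, x, 0),
    singular along the negative z-axis (θ = π).
    """
    r = math.sqrt(x * x + y * y + z * z)
    rho2 = x * x + y * y
    f = g / (4 * math.pi) * (1 - z / r) / rho2
    return (-f * y, f * x, 0.0)

def curl(F, x, y, z, h=1e-5):
    """Central-difference curl of a vector field F at (x, y, z)."""
    def d(i, j):  # ∂F_i/∂x_j
        p, m = [x, y, z], [x, y, z]
        p[j] += h
        m[j] -= h
        return (F(*p)[i] - F(*m)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

x, y, z = 1.0, 0.5, 0.7  # test point away from the string
r = math.sqrt(x * x + y * y + z * z)
B_exact = tuple(g / (4 * math.pi * r ** 3) * c for c in (x, y, z))  # g r̂/4πr²
B_num = curl(A_plus, x, y, z)
assert all(abs(a - b) < 1e-8 for a, b in zip(B_num, B_exact))
```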

Angular Momentum Quantization Implies Dirac Quantization

If the above argument for the Dirac quantization condition was not sufficiently convincing, here is a better argument in my opinion. Fix the magnetic monopole \(g\) at the origin, and ask about the classical dynamics of a charge \(q\) experiencing the Lorentz force \(\dot{\textbf{p}}=q\dot{\textbf x}\times\textbf B(\textbf x)\) of the monopole’s magnetic field \(\textbf B\). Since \(\textbf B\) is central, one might think that, by analogy with central forces, the angular momentum \(\textbf L=\textbf x\times\textbf p\) would be an interesting quantity to look at (even though the Lorentz force here is clearly not central!). Specifically, its time derivative can be computed as:

\[\dot{\textbf L}=\frac{d}{dt}\left(\frac{qg}{4\pi}\hat{\textbf r}\right)\]

So indeed \(\textbf L\) is not conserved, but the modified angular momentum \(\textbf L’:=\textbf L-\frac{qg}{4\pi}\hat{\textbf r}\) is (think of the extra term as the angular momentum stored in the electromagnetic field itself). The component of this modified angular momentum along the particle’s position vector is also conserved, \(\textbf L’\cdot\hat{\textbf r}=-qg/4\pi\), suggesting motion constrained to a cone. But the angular momentum along any direction must be quantized (this is where the quantum mechanics sneaks in) as \(m\hbar\). The slight catch here is that \(m\) is not just restricted to integers (which is normally the case for orbital angular momentum), but can also be a half-integer. Requiring \(qg/4\pi\in\frac{\hbar}{2}\textbf Z\) then reproduces the Dirac quantization condition \(qg\in h\textbf Z\).
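The conservation of \(\textbf L’\) (but not \(\textbf L\)) can also be checked by integrating the Lorentz force numerically; here is a sketch using a hand-rolled RK4 step in units where \(m=q=1\) and \(g=4\pi\) (so \(qg/4\pi=1\); the initial conditions are arbitrary):

```python
import math

# Integrate m dv/dt = q v × B with B = g r̂/(4π r²), and check that
# L' = m x×v − (qg/4π) r̂ is conserved along the trajectory.
m, q, g = 1.0, 1.0, 4 * math.pi
mu = q * g / (4 * math.pi)  # = 1 in these units

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def accel(x, v):
    r = math.sqrt(sum(c * c for c in x))
    B = tuple(g / (4 * math.pi * r ** 3) * c for c in x)  # g r̂ / 4πr²
    return tuple(q / m * c for c in cross(v, B))

def rk4_step(x, v, dt):
    def f(x, v):
        return (v, accel(x, v))
    k1 = f(x, v)
    k2 = f(tuple(a + dt/2*b for a, b in zip(x, k1[0])), tuple(a + dt/2*b for a, b in zip(v, k1[1])))
    k3 = f(tuple(a + dt/2*b for a, b in zip(x, k2[0])), tuple(a + dt/2*b for a, b in zip(v, k2[1])))
    k4 = f(tuple(a + dt*b for a, b in zip(x, k3[0])), tuple(a + dt*b for a, b in zip(v, k3[1])))
    x = tuple(a + dt/6*(b + 2*c + 2*d + e) for a, b, c, d, e in zip(x, k1[0], k2[0], k3[0], k4[0]))
    v = tuple(a + dt/6*(b + 2*c + 2*d + e) for a, b, c, d, e in zip(v, k1[1], k2[1], k3[1], k4[1]))
    return x, v

def L_prime(x, v):
    r = math.sqrt(sum(c * c for c in x))
    L = tuple(m * c for c in cross(x, v))        # ordinary angular momentum
    return tuple(Lc - mu * xc / r for Lc, xc in zip(L, x))

x, v = (1.0, 0.0, 0.0), (0.0, 1.0, 0.3)
Lp0 = L_prime(x, v)
for _ in range(2000):
    x, v = rk4_step(x, v, 1e-3)
assert all(abs(a - b) < 1e-6 for a, b in zip(L_prime(x, v), Lp0))
```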


Landau Levels

Problem: Write down the Lagrangian \(L\) of a nonrelativistic classical point charge \(q\) of mass \(m\) moving in an electromagnetic field. Justify it.

Solution: \(L=T-V\) as usual, where the nonrelativistic kinetic energy is \(T=m|\dot{\mathbf x}|^2/2\) (in the relativistic case the kinetic term of the Lagrangian would instead be \(-mc^2/\gamma_{|\dot{\mathbf x}|}\)) and the velocity-dependent electromagnetic potential energy \(V\) must be written in terms of the electromagnetic gauge potentials:

\[V=q(\phi(\mathbf x,t)-\dot{\mathbf x}\cdot\mathbf A(\mathbf x,t))\]

The ultimate justification for this choice of \(L\) is that the corresponding Euler-Lagrange equations reproduce the nonrelativistic Lorentz force law:

\[m\ddot{\mathbf x}=q(\mathbf E(\mathbf x,t)+\dot{\mathbf x}\times\mathbf B(\mathbf x,t))\]

where \(\textbf E=-\frac{\partial\phi}{\partial\textbf x}-\frac{\partial\textbf A}{\partial t}\) and \(\textbf B=\frac{\partial}{\partial\textbf x}\times\textbf A\).

Alternatively, the form of \(V\) can be motivated by the fact that the interaction portion of the action \(S_V=-q\int d\tau\, v^{\mu}A_{\mu}=-\int dt\, V\) is manifestly a Lorentz scalar, so in particular by time dilation \(dt/d\tau=\gamma\) one recovers the above form of \(V\).

Problem: Compute the corresponding Hamiltonian \(H\) of the nonrelativistic charge \(q\) of mass \(m\) in the electromagnetic field. Hence, derive the minimal coupling rule.

Solution: One has the conjugate momentum \(\mathbf p=\frac{\partial L}{\partial\dot{\mathbf x}}=m\dot{\mathbf x}+q\mathbf A(\mathbf x,t)\), which in particular is not the usual \(m\dot{\mathbf x}\) momentum due to the additional magnetic contribution \(q\mathbf A\) from the velocity-dependent magnetic potential. The Hamiltonian can thus be computed to obey the minimal coupling rule \(\mathbf p\mapsto\mathbf p-q\mathbf A\):

\[H:=\dot{\textbf x}\cdot\textbf p-L=\frac{|\textbf p-q\textbf A(\textbf x,t)|^2}{2m}+q\phi(\textbf x,t)\]

Problem: Comment on how the Lagrangian \(L\) and Hamiltonian \(H\) behave under electromagnetic gauge transformations \(A^{\mu}\mapsto A^{\mu}+\partial^{\mu}\Gamma\).

Solution: \(L\) is not gauge invariant, rather accumulating a total time derivative \(L\mapsto L-q\dot{\Gamma}\). On the other hand, \(H\) changes by a partial time derivative \(H\mapsto H+q\frac{\partial\Gamma}{\partial t}\). The equations of motion (whether in the form of Euler-Lagrange or Hamilton) are of course gauge invariant since \(\mathbf E,\mathbf B\) are gauge invariant.

Problem: A simple case of the above formalism consists of a non-relativistic point charge \(q\) of mass \(m\) in a uniform magnetic field, thus \(\mathbf E=\mathbf 0\) (hence \(\phi=0\)) and \(\mathbf B=B\hat{\mathbf z}\). What are some suitable choices of the gauge field \(\mathbf A\)?

Solution: Physically, one is looking for a “flow field” \(\mathbf A(\mathbf x)\) whose vorticity will generate a \(\textbf B=\frac{\partial}{\partial\textbf x}\times\textbf A\) that points uniformly along \(z\). The \(2\) obvious ways to set this up are either in a “river” manner like \(\mathbf A=-By\hat{\mathbf x}\) or \(\mathbf A=Bx\hat{\mathbf y}\) (both examples of the Landau gauge) or in a “whirlpool” manner like \(\mathbf A=\mathbf B\times\mathbf x/2\) (called the symmetric gauge, cf. familiar fluid mechanics result \(\frac{\partial}{\partial\mathbf x}\times(\boldsymbol{\omega}\times\mathbf x)=2\boldsymbol{\omega}\)).
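Both choices can be spot-checked with central finite differences (the field strength \(B\) and test point below are arbitrary):

```python
import math

# Finite-difference check that the Landau and symmetric gauges both
# give ∇×A = B ẑ.
B = 2.5  # arbitrary field strength

def A_landau(x, y, z):
    return (-B * y, 0.0, 0.0)          # "river" gauge

def A_symmetric(x, y, z):
    # A = B×x/2 with B = B ẑ  ⇒  (−By/2, Bx/2, 0), the "whirlpool" gauge
    return (-B * y / 2, B * x / 2, 0.0)

def curl(F, x, y, z, h=1e-6):
    def d(i, j):  # ∂F_i/∂x_j by central differences
        p, m = [x, y, z], [x, y, z]
        p[j] += h
        m[j] -= h
        return (F(*p)[i] - F(*m)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

for A in (A_landau, A_symmetric):
    cx, cy, cz = curl(A, 0.3, -1.2, 0.8)
    assert abs(cx) < 1e-8 and abs(cy) < 1e-8 and abs(cz - B) < 1e-8
```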

Problem: Now assume the point charge is a quantum particle and begin by working in the Landau gauge \(\mathbf A_L=-By\hat{\mathbf x}\). Diagonalize the corresponding Hamiltonian \(H=|\mathbf P-q\mathbf A_L|^2/2m\).

Solution: Expanding and completing the square:

\[H=\frac{P_y^2}{2m}+\frac{m\omega_B^2}{2}\left(Y+\frac{p_x}{m\omega_B}\right)^2+\frac{p_z^2}{2m}\]

where \(p_x,p_z\in\mathbf R\) are the momentum eigenvalues in the \(x\) and \(z\) directions since \([H,P_x]=[H,P_z]=0\) and \(\omega_B=qB/m\) is the classical cyclotron frequency. Meanwhile, in the \(y\)-direction, the Hamiltonian is identical to that of a (displaced) \(1\)D quantum harmonic oscillator with natural frequency \(\omega_B\); this defines the corresponding characteristic length scale \(\ell_B=\sqrt{\hbar/m|\omega_B|}=\sqrt{\hbar/|q|B}\) (called the magnetic length) and characteristic momentum scale \(p_B=\sqrt{\hbar m|\omega_B|}=\sqrt{\hbar |q|B}\) of the oscillator. The energies (called Landau levels) are:

\[E_{n,p_z}=\hbar\omega_B(n+1/2)+\frac{p_z^2}{2m}\]

and (not gauge-invariant) position-space wavefunctions:

\[\psi_{n,p_x,p_z}(x,y,z)\sim e^{i(p_xx+p_zz)/\hbar}H_n\left(\frac{y+p_x/m\omega_B}{\ell_B}\right)e^{-(y+p_x/m\omega_B)^2/2\ell_B^2}\]

Problem: Show that the \(n^{\text{th}}\) Landau level has degeneracy \(\dim\ker(H-E_{n,p_z=0}1)=\Phi_{\mathbf B}/\Phi_{\mathbf B,0}\) with \(\Phi_{\mathbf B}\) the magnetic flux through the plane perpendicular to \(\mathbf B\) and \(\Phi_{\mathbf B,0}=h/|q|\) the quantum of magnetic flux.

Solution: The degeneracy arises from the fact that \(E_{n,p_z=0}\) doesn’t depend on the \(p_x\) eigenvalue even though \(\psi_{n,p_x,p_z=0}\) does. If the system were infinitely large, then \(p_x\in\mathbf R\) would cause each Landau level to be uncountably degenerate; therefore, assume the system lives in a rectangle of dimensions \(L_x\times L_y\) (as \(p_z=0\) the system is effectively confined to the \(xy\)-plane). Then \(p_x\in\hbar\frac{2\pi}{L_x}\mathbf Z\) is quantized, which leads to quantization of the oscillator’s equilibrium \(y_0=-p_x/m\omega_B\in\frac{h}{m\omega_BL_x}\mathbf Z\) with adjacent equilibria separated by \(\Delta y_0=\frac{h}{m|\omega_B|L_x}\). Since this must lie in an interval of width \(L_y\), one obtains the degeneracy:

\[\dim\ker(H-E_{n,p_z=0}1)=\frac{L_y}{\Delta y_0}=\frac{\Phi_{\mathbf B}}{\Phi_{\mathbf B,0}}\]

since \(\Phi_{\mathbf B}=BL_xL_y\); note the result is independent of mass \(m\), independent of the sign of the charge \(q\), and also independent of \(n\in\mathbf N\), indicating all Landau levels are equally degenerate. Furthermore, if one plugs some numbers in (e.g. \(B=1\text{ T}\) for an electron or proton), one finds that each Landau level has a mind-boggling degeneracy per unit area \(B/\Phi_{\mathbf B,0}\sim 10^{14}\frac{\text{states}}{\text{Landau level}\times{\text m^2}}\).
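Plugging in the numbers (using the exact SI values of \(h\) and \(e\)):

```python
# Degeneracy per unit area of a Landau level: B/Φ_0 = |q|B/h states per m².
h = 6.62607015e-34    # Planck constant, J·s (exact SI value)
e = 1.602176634e-19   # elementary charge, C (exact SI value)
flux_quantum = h / e  # ≈ 4.14e-15 Wb (also the minimal Dirac monopole charge)
B = 1.0               # tesla, as in the text
n_per_area = B / flux_quantum
assert 1e14 < n_per_area < 1e15  # the "mind-boggling" ~10^14 states per m²
```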

Problem: Derive the structure of Landau levels again but this time using the symmetric gauge \(\mathbf A_S=\mathbf B\times\mathbf x/2\).

Solution: A property of the earlier Landau gauge \(\mathbf A_L\) that was not mentioned (because it wasn’t necessary) is that it is also a Coulomb gauge \(\frac{\partial}{\partial\mathbf x}\cdot\mathbf A_L=0\). In general, whenever \(\frac{\partial}{\partial\mathbf x}\cdot\mathbf A=0\) for some gauge field \(\mathbf A\), there is a useful simplification that can be made on the cross-terms:

\[|\mathbf P-q\mathbf A|^2=|\mathbf P|^2+q^2|\mathbf A|^2-q(\mathbf P\cdot\mathbf A+\mathbf A\cdot\mathbf P)=|\mathbf P|^2+q^2|\mathbf A|^2-2q\mathbf A\cdot\mathbf P\]

namely the dot-product commutator \(\mathbf P\cdot\mathbf A-\mathbf A\cdot\mathbf P=-i\hbar\left(\frac{\partial}{\partial\mathbf X}\cdot\mathbf A\right)\) vanishes iff one works in a Coulomb gauge. It so happens that the symmetric gauge \(\mathbf A_S\) is also a Coulomb gauge \(\frac{\partial}{\partial\mathbf x}\cdot\mathbf A_S=0\), so this time exploit this lemma:

\[H=\frac{|\mathbf P-q\mathbf A_S|^2}{2m}=\frac{|\mathbf P|^2}{2m}+\frac{1}{2}m\left(\frac{\omega_B}{2}\right)^2\rho^2-\frac{\omega_B}{2}L_z\]

Since \([H,P_z]=[H,L_z]=0\) (the latter is the whole point of using the symmetric gauge!), one can analyze this \(2\)D isotropic harmonic oscillator in polar coordinates \(\psi(\rho,\phi,z)=R(\rho)e^{im_{\ell}\phi}e^{ip_zz/\hbar}\) such that the radial Schrodinger equation is:

\[\left[ -\frac{\hbar^2}{2m} \left( \frac{d^2}{d \rho^2} + \frac{1}{\rho}\frac{d}{d \rho} - \frac{m_\ell^2}{\rho^2} \right) + \frac{1}{8} m \omega_B^2 \rho^2 - \frac{\hbar \omega_B}{2} m_\ell + \frac{p_z^2}{2m}\right] R=ER\]

which has solutions parameterized by a radial quantum number \(n_{\rho}\in\mathbf N\) and \(m_{\ell}\in\mathbf Z\):

\[R_{n_{\rho},m_{\ell}}(\rho)\sim\rho^{|m_\ell|} e^{-\rho^2 / 4\ell_B^2} L_{n_{\rho}}^{|m_\ell|}\left( \frac{\rho^2}{2\ell_B^2} \right)\]

and energies:

\[E_{n_{\rho},m_{\ell},p_z}=\hbar\omega_B\left(n_{\rho}+\frac{|m_{\ell}|-m_{\ell}}{2}+1/2\right)+\frac{p_z^2}{2m}\]

or, borrowing the \(\text{ReLU}\) function from machine learning, \(|m_{\ell}|-m_{\ell}=2\text{ReLU}(-m_{\ell})\), so defining \(n:=n_{\rho}+\text{ReLU}(-m_{\ell})\) ensures this coincides with the Landau level energies obtained earlier in the Landau gauge (how would one show that the Landau levels still have the same degeneracy?).
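This bookkeeping is quickly verified by brute force; a small sketch (at \(p_z=0\), energies in units of \(\hbar\omega_B\)):

```python
# Brute-force sketch (at p_z = 0, energies in units of hbar*omega_B): the
# symmetric-gauge spectrum n_rho + (|m_l| - m_l)/2 + 1/2 sits exactly on the
# Landau ladder n + 1/2 with n := n_rho + ReLU(-m_l).
def relu(x):
    return max(x, 0)

for n_rho in range(6):
    for m_l in range(-6, 7):
        E = n_rho + (abs(m_l) - m_l) / 2 + 0.5   # symmetric-gauge energy
        n = n_rho + relu(-m_l)                    # Landau level index
        assert E == n + 0.5
print("symmetric-gauge energies reproduce the Landau levels")
```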

Problem: How do the above results compare with the classical solution that the charge spirals along \(\mathbf B\) at the cyclotron frequency \(\omega_B\)?

Solution: Using the Landau gauge \(\mathbf A_L=-By\hat{\mathbf x}\) amounts to perfectly knowing the \(x\)-component \(p_x\) of the charge’s conjugate momentum at the expense of completely smearing out its position (\(e^{ip_xx/\hbar}\)) in that direction. By contrast, with the symmetric gauge \(\mathbf A_S=\mathbf B\times\mathbf x/2\), the wavefunctions are isotropic, more closely matching the classical intuition…but none of these wavefunctions are gauge invariant (indeed, the exact \(U(1)\) gauge transformation they undergo is described below) so don’t read into them too much.

Quantum Gauge Theory

Suppose we insist that the Schrodinger equation be gauge covariant under the gauge transformations of the potentials \(\phi\mapsto\phi’=\phi+\frac{\partial\Gamma}{\partial t}\) and \(\textbf A\mapsto\textbf A’=\textbf A-\frac{\partial\Gamma}{\partial\textbf x}\). The novelty is that the state \(|\psi\rangle\mapsto |\psi’\rangle=U_{\Gamma}|\psi\rangle\) will have to transform too under the gauge transformations (which we postulate occurs via some linear operator \(U_{\Gamma}\)) in order to have any hope of maintaining gauge covariance. Thus, the Hamiltonian becomes \(H\mapsto H’=H+q\frac{\partial\Gamma}{\partial t}\) and we demand by gauge covariance that \(i\hbar\dot{|\psi’\rangle}=H’|\psi’\rangle\) in addition to \(i\hbar\dot{|\psi\rangle}=H|\psi\rangle\). With these two equations in mind, one can show that \(U_{\Gamma}\) satisfies the differential equation:

\[i\hbar\frac{\partial U_{\Gamma}}{\partial t}=[H,U_{\Gamma}]+q\frac{\partial\Gamma}{\partial t}U_{\Gamma}\]

If we assume for a moment that \([H,U_{\Gamma}]=0\), then the resultant ODE is easy to integrate and gives:

\[U_{\Gamma}(\textbf X,t)=e^{-iq\Gamma(\textbf X,t)/\hbar}\]

which is indeed a local unitary phase, as it was suggestively written all along. This suggests a new interpretation of the gauge function \(\Gamma\), namely as a generator of translations in charge space \(q\in\textbf R\), adding more or less charge to the system; the earlier assumption that \([H,U_{\Gamma}]=0\) then plausibly amounts to conservation of electric charge.

There is another way to see why the above is the correct gauge transformation of states, even if its derivation was a little handwavy. Specifically, if one defines the four-covector operator \(\mathcal D:=\frac{\partial}{\partial X}+\frac{iq}{\hbar}A\) with corresponding covariant components \(\mathcal D_{\mu}=\partial_{\mu}+\frac{iqA_{\mu}}{\hbar}\) for spacetime index \(\mu=0,1,2,3\), then the Schrodinger equation looks like that for a free particle (descending from kets to wavefunctions \(|\psi\rangle\mapsto\psi(\textbf x,t)\)):

\[i\hbar c\mathcal D_0\psi=-\frac{\hbar^2}{2m}\mathcal D^2\psi\]

where \(\mathcal D^2:=\mathcal D_1^2+\mathcal D_2^2+\mathcal D_3^2\). Now then, check (using the product rule and \([A_{\mu},U_{\Gamma}]=0\)) that each \(\mathcal D_{\mu}\psi\mapsto U_{\Gamma}\mathcal D_{\mu}\psi\) under the gauge transformation \(\Gamma\); the Schrodinger equation is thus gauge covariant since \(\mathcal D’_{\mu}U_{\Gamma}=U_{\Gamma}\mathcal D_{\mu}\) as operators (though the Schrodinger equation is not Lorentz covariant! That’s a topic for another time).
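The claimed covariance of \(\mathcal D_{\mu}\psi\) can be checked symbolically. A one-dimensional sketch, writing the spatial covariant derivative in terms of the vector component of \(\textbf A\) as \(\partial_x-iqA/\hbar\) (equivalent to the covariant-component form above) and using this post’s sign conventions \(\textbf A\mapsto\textbf A-\partial\Gamma/\partial\textbf x\), \(U_{\Gamma}=e^{-iq\Gamma/\hbar}\):

```python
import sympy as sp

# Symbolic sketch in 1D: under A -> A - dGamma/dx and psi -> U*psi with
# U = exp(-i q Gamma/hbar) (this post's sign conventions), the covariant
# derivative transforms covariantly, D(A')(U psi) = U D(A) psi, where
# D(A) = d/dx - i q A/hbar acts on the vector component of A.
x, t = sp.symbols('x t', real=True)
q, hbar = sp.symbols('q hbar', positive=True)
Gamma = sp.Function('Gamma')(x, t)
A = sp.Function('A')(x, t)
psi = sp.Function('psi')(x, t)

U = sp.exp(-sp.I * q * Gamma / hbar)     # gauge phase acting on states
A_new = A - sp.diff(Gamma, x)            # gauge-transformed vector potential

D = lambda a, f: sp.diff(f, x) - sp.I * q * a / hbar * f

diff_expr = sp.simplify(D(A_new, U * psi) - U * D(A, psi))
print(diff_expr)  # 0
```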

Posted in Blog | Leave a comment

Bloch’s Theorem

The purpose of this post is to provide some intuition for Bloch’s theorem, a result which might be more descriptively called the “periodic potential lemma” or even the “fundamental theorem of condensed matter physics”.

Problem #\(1\): Solve the \(1\)st order linear ODE:

\[\dot v(t)+\Gamma(t)v(t)=0\]

first for arbitrary parametric driving \(\Gamma(t)\), and then for the specific periodic driving \(\Gamma(t)=\Gamma_0\cos^2(\omega t)\).

Solution #\(1\): Solving by an integrating factor (or separation of variables) gives \(v(t)=v(0)e^{-\int_0^t\Gamma(t')dt'}\). For the periodic driving, \(\int_0^t\Gamma_0\cos^2(\omega t')dt'=\frac{\Gamma_0 t}{2}+\frac{\Gamma_0\sin(2\omega t)}{4\omega}\), so \(v(t)=v(0)e^{-\Gamma_0 t/2}e^{-\Gamma_0\sin(2\omega t)/4\omega}\).

This calculation thus illustrates an example of Floquet’s theorem; the response of a first-order linear system to periodic parametric driving looks like the “free” evolution of the system with respect to the average driving (i.e. exponential growth/decay) modulated by a periodic envelope with the same period as the driving.
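For the specific driving \(\Gamma(t)=\Gamma_0\cos^2(\omega t)\), separation of variables gives \(v(t)=v(0)e^{-\Gamma_0(t/2+\sin(2\omega t)/4\omega)}\), exactly the averaged decay times a periodic envelope; here is a quick numerical check (a sketch with a hand-rolled RK4 integrator and arbitrary parameter values):

```python
import math

# Numeric sketch: for Gamma(t) = Gamma0*cos^2(w t), separation of variables
# gives v(t) = v(0)*exp(-Gamma0*(t/2 + sin(2 w t)/(4 w))): decay at the
# *average* rate Gamma0/2, modulated by a periodic envelope.
Gamma0, w, v0 = 1.3, 2.0, 1.0

def rhs(t, v):
    return -Gamma0 * math.cos(w * t)**2 * v

def rk4(t_end, steps=20000):
    t, v, h = 0.0, v0, t_end / steps
    for _ in range(steps):
        k1 = rhs(t, v)
        k2 = rhs(t + h/2, v + h/2 * k1)
        k3 = rhs(t + h/2, v + h/2 * k2)
        k4 = rhs(t + h, v + h * k3)
        v += h / 6 * (k1 + 2*k2 + 2*k3 + k4)
        t += h
    return v

t_end = 5.0
exact = v0 * math.exp(-Gamma0 * (t_end/2 + math.sin(2*w*t_end)/(4*w)))
assert abs(rk4(t_end) - exact) < 1e-8
print("RK4 matches the Floquet closed form")
```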

Problem #\(2\): What are the \(2\) key differences between having a free particle \(V=0\) and having a particle in a \(\Lambda\)-periodic potential \(V(\textbf x-\Delta\textbf x)=V(\textbf x)\) for all \(\Delta\textbf x\in\Lambda\)?

Solution #\(2\): First, the \(H\)-eigenstates are no longer pure plane waves but plane waves modulated by a \(\Lambda\)-periodic function, with only the crystal quasimomentum (defined modulo the reciprocal lattice \(\Lambda^*\)) being a good quantum number. Second, the continuous free-particle dispersion \(E=\hbar^2|\textbf k|^2/2m\) reorganizes into energy bands separated by band gaps.

The \(\Lambda\)-periodic function \(u_{\textbf k}(\textbf x)\) is in general something complicated and so in practice not that useful. However, one can equivalently formulate Bloch’s theorem without mentioning it at all:

Problem #\(3\): Let \(\Lambda\) be a Bravais lattice, let \(H_{\Lambda}\) be a Hamiltonian with \(\Lambda\)-periodic potential, and let \(\psi(\textbf x)\) be an \(H_{\Lambda}\)-eigenstate. Then the following \(2\) propositions are logically equivalent:

  1. There exists a crystal quasimomentum \(\textbf k\in\textbf R^3/\Lambda^*\) and a \(\Lambda\)-periodic function \(u_{\textbf k}(\textbf x)\) such that \(\psi(\textbf x)=u_{\textbf k}(\textbf x)e^{i\textbf k\cdot\textbf x}\).
  2. There exists a crystal quasimomentum \(\textbf k\in\textbf R^3/\Lambda^*\) such that for all \(\Delta\textbf x\in\Lambda\):

\[\psi(\textbf x-\Delta\textbf x)=e^{-i\textbf k\cdot\Delta\textbf x}\psi(\textbf x)\]

Solution #\(3\): Trivial, but still good to fully think it through. In practical problem-solving, the way one actually uses Bloch’s theorem is frequently via formulation #\(2\).

Proof: The Bravais lattice \(\Lambda\) is a subgroup of the translation group \(\textbf R^3\), so we can invoke the usual unitary representation \(\Delta\textbf x\mapsto e^{-i\Delta\textbf x\cdot\textbf P/\hbar}\) where \(\textbf P\) is the generator of the Lie group of translations \(\textbf R^3\). Due to the induced \(\Lambda\)-periodicity of the crystal Hamiltonian \(H\), it follows that \(e^{i\Delta\textbf x\cdot\textbf P/\hbar}He^{-i\Delta\textbf x\cdot\textbf P/\hbar}=H\), or equivalently \([H,e^{-i\Delta\textbf x\cdot\textbf P/\hbar}]=0\). However, because \(\Lambda\) itself is not a Lie group, its associated discrete translational symmetry by only the lattice vectors \(\Delta\textbf x\in\Lambda\) is non-Noetherian, and therefore it would be false to claim that the momentum \(\textbf P\) of states is conserved: \([H,\textbf P]\neq 0\) in general. Nevertheless, the fact that the crystal Hamiltonian commutes with the entire \(\Lambda\)-representation \([H,e^{-i\Delta\textbf x\cdot\textbf P/\hbar}]=0\) is all we need to prove Bloch’s theorem. Because \(\Lambda\) has the structure of an additive, hence abelian group, its complex irreducible subrepresentations act on one-dimensional eigenspaces by Schur’s lemma (obvious if you think transitively!), each of which is spanned by some plane wave \(|\textbf k\rangle\) and therefore indexed by a crystal momentum \(\textbf k\in\textbf R^3\). By Schur’s lemma again, each of these plane waves is also an \(H\)-eigenstate \(|\textbf k\rangle=|E_{\textbf k}\rangle\).
In other words, the state \(|-\textbf k\rangle\otimes |E_{\textbf k}\rangle\in\mathcal H^{\otimes 2}\) obeys \(e^{-i\Delta\textbf x\cdot\textbf P_{\mathcal H^{\otimes 2}}/\hbar}\left(|-\textbf k\rangle\otimes|E_{\textbf k}\rangle\right)=e^{-i\Delta\textbf x\cdot\textbf P_{\mathcal H}/\hbar}|-\textbf k\rangle\otimes e^{-i\Delta\textbf x\cdot\textbf P_{\mathcal H}/\hbar}|E_{\textbf k}\rangle=e^{i\textbf k\cdot\Delta\textbf x}e^{-i\textbf k\cdot\Delta\textbf x}|-\textbf k\rangle\otimes|E_{\textbf k}\rangle=|-\textbf k\rangle\otimes|E_{\textbf k}\rangle\). But eigenfunctions of translation with eigenvalue \(1\) are synonymous with periodic functions. This is precisely the periodic amplitude modulation \(u_{\textbf k}(\textbf x)=\left(\langle\textbf x|\otimes\langle\textbf x|\right)\left(|-\textbf k\rangle\otimes|E_{\textbf k}\rangle\right)=e^{-i\textbf k\cdot\textbf x}\langle\textbf x|E_{\textbf k}\rangle\).

Above in the proof of Bloch’s theorem, it was emphasized that the momentum \(\textbf P\) of states is not conserved. However, it turns out that the crystal momentum \(\hbar\textbf k\) is conserved modulo the reciprocal lattice \(\Lambda^*\). This follows from the form of the Bloch \(H\)-eigenstates in the crystal and the fact that one of the ways to define a reciprocal lattice vector \(\Delta\textbf k\in\Lambda^*\) is by the requirement that \(e^{i\Delta\textbf k\cdot\textbf x}=1\) for all real space lattice vectors \(\textbf x\in\Lambda\). The above discussion with the crystal momentum \(\textbf k\in\textbf R^3\) therefore took place in the extended (Brillouin) zone scheme, whereas the restriction of the crystal momentum \(\textbf k\in\Gamma_{\Lambda^*}\) to the first Brillouin zone is called the reduced (Brillouin) zone scheme. In the reduced scheme however, the Bloch \(H\)-eigenstates can no longer just be labelled like \(|E_{\textbf k}\rangle\), one needs an additional energy band index \(n\in\textbf Z^+\) to describe which energy band one is in; thus \(|E\rangle=|E_{\textbf k,n}\rangle\).

Born-von Karman Boundary Conditions

Although the Bloch form of the \(H\)-eigenstates \(\langle\textbf x|E_{\textbf k}\rangle=\langle \textbf x|\textbf k\rangle u_{\textbf k}(\textbf x)\) contains a \(\Lambda\)-periodic part \(u_{\textbf k}(\textbf x)\), the \(H\)-eigenstate \(|E_{\textbf k}\rangle\) itself does not immediately have any kind of periodicity associated to it unless its crystal momentum \(\textbf k\in\textbf R^3\) happens to land exactly on a reciprocal lattice point \(\textbf k\in\Lambda^*\), in which case the plane wave \(e^{i\textbf k\cdot\textbf x}\) would also be \(\Lambda\)-periodic and thereby absorbed into the \(u_{\textbf k}(\textbf x)\) factor, ensuring that \(\langle\textbf x|E_{\textbf k}\rangle\) would be \(\Lambda\)-periodic. But \(\Lambda^*\) is a set of measure zero in \(\textbf R^3\), so this isn’t a reliable source of periodicity. Instead, we can simply enforce periodic boundary conditions (also called Born-von Karman boundary conditions) on the \(H\)-eigenstate. Specifically, if the crystal is a rectangular prism with \(N_j\) atoms along the \(j\)-th primitive lattice vector \(\textbf a_j\) direction, then we enforce it to be \(N_j\)-periodic along \(\textbf a_j\), i.e. \(\langle \textbf x+N_j\textbf a_j|E_{\textbf k}\rangle=\langle\textbf x|E_{\textbf k}\rangle\) for each \(j=1,2,3\) (no Einstein summation over \(j\) implied!). This is clearly compatible with Bloch’s theorem provided the crystal momentum \(\textbf k\) satisfies \(e^{iN_j\textbf k\cdot\textbf a_j}=1\) for each \(j\in\{1,2,3\}\). Note that this doesn’t necessarily imply that \(\textbf k\in\Lambda^*\), since the condition only involves the particular lattice vectors \(N_j\textbf a_j\in\Lambda\), not every Bravais lattice point in \(\Lambda\). More precisely, we can still express the crystal momentum \(\textbf k\) in the reciprocal basis \(\textbf a^1,\textbf a^2,\textbf a^3\), just that the coefficients of such a linear combination need not be integers and hence \(\textbf k\) need not be in \(\Lambda^*\).
Indeed, one can check using \(\textbf a^j\cdot\textbf a_k=2\pi\delta^j_k\) that the Born-von Karman boundary conditions are logically equivalent to \(\textbf k=\sum_{j=1}^3\frac{n_j}{N_j}\textbf a^j\) for any choice of three integers \(n_j\in\textbf Z\). Thus, the Born-von Karman boundary conditions quantize the wavevector \(\textbf k=\textbf k_{n_1,n_2,n_3}\) to lie in a lattice \(\text{span}_{\textbf Z}(\textbf a^1/N_1,\textbf a^2/N_2,\textbf a^3/N_3)\) which contains the reciprocal lattice \(\Lambda^*\subseteq\text{span}_{\textbf Z}(\textbf a^1/N_1,\textbf a^2/N_2,\textbf a^3/N_3)\) as a sublattice.

Armed with this discretization of reciprocal space given by the Born-von Karman boundary conditions, it makes sense to ask: how many \(H\)-eigenstates are associated by Bloch’s theorem to crystal momenta \(\textbf k\in\Gamma_{\Lambda^*}\) lying within the first Brillouin zone? In other words, what is the cardinality of the intersection \(\Gamma_{\Lambda^*}\cap\text{span}_{\textbf Z}(\textbf a^1/N_1,\textbf a^2/N_2,\textbf a^3/N_3)\)? The answer is independent of what the particular Bravais lattice \(\Lambda\) is (and hence what \(\Lambda^*\) is) because, regardless of its exact nature, the volume \(V^*\) of the Brillouin zone in reciprocal space is always \(V^*=|\textbf a^1\cdot(\textbf a^2\times\textbf a^3)|\) (this is because the Brillouin zone is just the Wigner-Seitz cell of \(\Lambda^*\), which itself is just a particularly symmetric choice of primitive unit cell, and all primitive unit cells of any Bravais lattice have the same volume). Therefore, within this volume \(V^*\) one can fit approximately \(N_1N_2N_3\) little parallelepipeds with sides \(\textbf a^1/N_1,\textbf a^2/N_2,\textbf a^3/N_3\). But each parallelepiped corresponds to a Bloch \(H\)-eigenstate, so this argument proves that \(\#\left(\Gamma_{\Lambda^*}\cap\text{span}_{\textbf Z}(\textbf a^1/N_1,\textbf a^2/N_2,\textbf a^3/N_3)\right)=N:=N_1N_2N_3\) is the total number of atoms in the entire crystal (again, when subject to Born-von Karman boundary conditions).

Wannier Wavefunctions

Another important property of the Bloch \(H\)-eigenstates is that they are very much delocalized in \(\textbf x\)-space, since the probability density \(|\langle\textbf x|E_{\textbf k}\rangle|^2=|u_{\textbf k}(\textbf x)|^2\) is at least \(\Lambda\)-periodic, hence delocalized. In line with the Heisenberg uncertainty principle, these Bloch \(H\)-eigenstates are also very much localized in \(\textbf k\)-space. It therefore makes sense that, if we instead view the Bloch \(H\)-eigenstate \(\langle\textbf x|E_{\textbf k}\rangle\) not so much as a function of \(\textbf x\) but as a function of the crystal momentum \(\textbf k\), then we can imagine running a unitary Fourier transform from \(\textbf k\mapsto\textbf x\) which by Heisenberg again should produce a more localized wavefunction in \(\textbf x\)-space. Indeed, these wavefunctions are called Wannier wavefunctions \(w(\textbf x-\textbf x_0)\), defined as follows (the normalization factor \(N\) comes from the argument in the preceding section using the Born-von Karman boundary conditions):

\[w(\textbf x-\textbf x_0):=\frac{1}{\sqrt N}\sum_{\textbf k\in\Gamma_{\Lambda^*}}e^{-i\textbf k\cdot\textbf x_0}\langle\textbf x|E_{\textbf k}\rangle\]

where the way the argument is written emphasizes that the Wannier wavefunctions are associated to each atomic site \(\textbf x_0\in\Lambda\) in the Bravais lattice (check this using Bloch’s theorem). The Wannier wavefunctions are no longer wavefunctions of \(H\)-eigenstates since they can’t be written in Bloch form. They also suffer a gauge redundancy, which opens up the whole topic of maximally localized Wannier functions (MLWF); see this paper of Marzari, Vanderbilt, et al. By the unitarity of the Fourier transform as encoded by Plancherel’s theorem, the Wannier wavefunctions are orthonormal with respect to distinct atomic sites \(\iiint_{\textbf x\in\textbf R^3}w^*(\textbf x-\textbf x_1)w(\textbf x-\textbf x_2)d^3\textbf x=\delta_{\textbf x_1,\textbf x_2}\). The inverse unitary Fourier transform gives the delocalized Bloch \(H\)-eigenstate as a superposition of the localized Wannier wavefunctions:

\[\langle\textbf x|E_{\textbf k}\rangle=\frac{1}{\sqrt N}\sum_{\textbf x_0\in\Lambda}e^{i\textbf k\cdot\textbf x_0}w(\textbf x-\textbf x_0)\]
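The orthonormality of the Wannier wavefunctions across sites can be checked numerically in a toy model; here is a sketch on a \(1\)D ring of \(N\) sites with one site per unit cell (the random phases \(\theta_k\) model the gauge redundancy mentioned above):

```python
import cmath, math, random

# Toy-model sketch: on a 1D ring of N sites with one site per unit cell, take
# orthonormal Bloch states psi_k(j) = e^{i theta_k} e^{i k j}/sqrt(N) (the
# random phases theta_k model the gauge redundancy), build the Wannier
# functions w_{j0}(j) = (1/sqrt(N)) sum_k e^{-i k j0} psi_k(j), and verify
# they are orthonormal between sites, as Plancherel's theorem guarantees.
random.seed(0)
N = 8
ks = [2 * math.pi * n / N for n in range(N)]
theta = [2 * math.pi * random.random() for _ in range(N)]

def psi(k_idx, j):
    return cmath.exp(1j * (theta[k_idx] + ks[k_idx] * j)) / math.sqrt(N)

def wannier(j0, j):
    return sum(cmath.exp(-1j * k * j0) * psi(i, j)
               for i, k in enumerate(ks)) / math.sqrt(N)

for j1 in range(N):
    for j2 in range(N):
        overlap = sum(wannier(j1, j).conjugate() * wannier(j2, j)
                      for j in range(N))
        assert abs(overlap - (1.0 if j1 == j2 else 0.0)) < 1e-9
print("Wannier wavefunctions are orthonormal across sites")
```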

As a general rule of thumb, it seems that Bloch wavefunctions are a more suitable basis for conductors, etc. where the electrons \(e^-\) are known to be nearly free and delocalized in energy bands whereas Wannier wavefunctions are a more suitable basis for insulators and materials described by tight-binding Hamiltonians where the electrons \(e^-\) are known to be more localized.


Driven, Damped Harmonic Oscillators

The purpose of this post is to serve as a reference on standard properties of the driven, damped harmonic oscillator:

\[\ddot{x}+\Delta\omega\dot{x}+\omega_0^2x=f(t)\]

where the damping coefficient \(\Delta\omega>0\) describes the resonant bandwidth of the system’s frequency response (reciprocal to the oscillator lifetime \(\tau=1/\Delta\omega\)), and \(\omega_0>0\) is the natural (and approximately resonant) angular frequency of the harmonic oscillator. The ratio of these \(2\) frequency parameters yields a dimensionless quality factor of the damped harmonic oscillator:

\[Q:=\frac{\omega_0}{\Delta\omega}\]

where a higher \(Q\)-factor, as its name suggests, indicates a less damped oscillator, hence one that “rings” for longer before it dies out.

The general solution of the inhomogeneous/driven ODE will look like an exponentially decaying transient complementary function that solves the homogeneous/undriven ODE (dependent on initial conditions), superimposed with a steady state particular integral that oscillates at the driving frequency of \(f(t)\) (independent of initial conditions).

Henceforth focusing on the steady state, if \(f(t)=f_0e^{i\omega t}\) (and one can gauge-fix \(f_0\in (0,\infty)\)), then the ansatz \(x(t)=x_0e^{i\omega t}\) yields the frequency response:

\[x_0=\frac{f_0}{\omega_0^2-\omega^2+i\omega\Delta\omega}\]

The point is to quickly extract amplitude and phase information about the response:

\[|x_0|=\frac{f_0}{\sqrt{(\omega^2-\omega^2_0)^2+\omega^2\Delta\omega^2}}\]

\[\tan\angle x_0=\frac{\omega\Delta\omega}{\omega^2-\omega_0^2}\]

Although the response amplitude \(|x_0|\) is strictly maximized when driven at \(\omega=\sqrt{\omega_0^2-\Delta\omega^2/2}\), in the weakly damped limit \(\Delta\omega\ll\omega_0\) this is \(\omega=\omega_0+O(\Delta\omega^2/\omega_0)\). Around this frequency, the phase offset \(\angle x_0\) between \(f(t)\) and \(x(t)\) rapidly (i.e. over a “Fermi surface” of thickness \(\sim\Delta\omega\)) “flips” from being in-phase \(\angle x_0\approx 0\) to out-of-phase \(\angle x_0\approx\pi\).
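A numerical scan confirms the amplitude peak sits slightly below \(\omega_0\), at \(\sqrt{\omega_0^2-\Delta\omega^2/2}\) (a quick sketch with arbitrary parameter values):

```python
import math

# Numeric sketch (arbitrary parameter values): the response amplitude |x0(w)|
# peaks at w = sqrt(w0^2 - dw^2/2), slightly *below* the natural frequency w0.
w0, dw, f0 = 1.0, 0.3, 1.0

def amp(w):
    return f0 / math.sqrt((w*w - w0*w0)**2 + (w*dw)**2)

ws = [i * 1e-4 for i in range(1, 20001)]   # scan w over (0, 2]
w_peak = max(ws, key=amp)
assert abs(w_peak - math.sqrt(w0**2 - dw**2 / 2)) < 1e-3
print(f"peak at w = {w_peak:.4f} < w0 = {w0}")
```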

In addition to \(x_0=x_0(\omega)\), another useful spectrum to look at is the steady state period-averaged driving power \(\bar P=\bar P(\omega)\) (note this is not the total power, which would also include the powers developed by the spring and damping forces, this is just the input driving power). First, the steady state driving power not time-averaged yet (for mass \(m\)) is:

\[P(t)=\Re mf(t)\Re \dot x(t)=mf_0\cos\omega t\Re\left(i\omega x_0e^{i\omega t}\right)\]

\[=\frac{m\omega f_0^2\cos \omega t}{(\omega^2-\omega_0^2)^2+\omega^2\Delta\omega^2}\left[(\omega^2-\omega_0^2)\sin\omega t+\omega\Delta\omega\cos\omega t\right]\]

so time-averaging over a period:

\[\bar P=\frac{m\omega^2 f_0^2\Delta\omega}{2((\omega^2-\omega_0^2)^2+\omega^2\Delta\omega^2)}\]

(alternatively, one could first compute the complex power \((mf)^{\dagger}\dot x/2\) and take \(\Re\) to obtain the real power injected into the system by driving, leading to the same result \(\bar P\) as above).

Working around resonance \(\omega\approx\omega_0\), the difference of squares approximation \(\omega^2-\omega_0^2\approx 2\omega_0(\omega-\omega_0)\) (and similar approximations) lead to:

\[|x_0|\approx\frac{f_0}{2\omega_0\sqrt{(\omega-\omega_0)^2+\Delta\omega^2/4}}\]

\[\tan\angle x_0=\frac{\Delta\omega}{2(\omega-\omega_0)}\]

\[\bar P=\frac{mf_0^2}{2\Delta\omega}\frac{1}{1+(2(\omega-\omega_0)/\Delta\omega)^2}\]

where the form of \(\bar P\) is immediately recognized as a Lorentzian with FWHM bandwidth \(\Delta\omega\).

A physically insightful way to remember this is as follows: in the steady state, the input driving power is matched by the dissipative power due to damping, so looking at the equation of motion, one can equate on dimensional grounds \(\dot x\Delta\omega\sim f_0\). But the driving power at resonance is then \(mf_0\dot x\sim mf_0^2/\Delta\omega\). Then, just remember there’s a factor of \(1/2\) from period-averaging a \(\cos^2\).

Problem: Write down the general (i.e. initial-condition-dependent) transient complementary function solving the homogeneous ODE (i.e. an undriven oscillator).

Solution: In the underdamped case \(Q>1/2\) (i.e. there will be oscillations if released from rest):

\[x(t)=e^{-\Delta\omega t/2}\left(A\cos\left(\sqrt{\omega_0^2-\frac{\Delta\omega^2}{4}}t\right)+B\sin\left(\sqrt{\omega_0^2-\frac{\Delta\omega^2}{4}}t\right)\right)\]

In the overdamped case \(Q<1/2\) (i.e. no oscillations if released from rest):

\[x(t)=e^{-\Delta\omega t/2}\left(Ae^{-\sqrt{\Delta\omega^2/4-\omega_0^2}t}+Be^{\sqrt{\Delta\omega^2/4-\omega_0^2}t}\right)\]

And in the critically damped case \(Q=1/2\):

\[x(t)=e^{-\Delta\omega t/2}(A+Bt)\]

All these are quickly checked by noting the identity:

\[\ddot x+\Delta\omega\dot x+\omega_0^2x=e^{-\Delta\omega t/2}\frac{d^2}{dt^2}\left(e^{\Delta\omega t/2}x\right)+\left(\omega_0^2-\frac{\Delta\omega^2}{4}\right)x\]
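The identity itself is quickly verified symbolically (a sketch using sympy):

```python
import sympy as sp

# Symbolic sketch of the identity used above to verify the transient solutions.
t = sp.symbols('t', real=True)
dw, w0 = sp.symbols('Delta_omega omega_0', positive=True)
x = sp.Function('x')(t)

lhs = x.diff(t, 2) + dw * x.diff(t) + w0**2 * x
rhs = sp.exp(-dw*t/2) * sp.diff(sp.exp(dw*t/2) * x, t, 2) + (w0**2 - dw**2/4) * x
assert sp.simplify(lhs - rhs) == 0
print("identity verified")
```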

Misconception: An underdamped, undriven harmonic oscillator will have oscillations but a critically damped or overdamped (undriven) harmonic oscillator will not have oscillations.

Clarification: It is true that an underdamped oscillator (regardless of initial conditions) always oscillates before it dies out to \(x\to 0\). In the case of critical damping or overdamping, if the oscillator starts from rest \(\dot{x}(0)=0\), then indeed it will not oscillate about \(x=0\) in the sense that there will be no overshoot, just a monotonic approach towards \(x=0\), with critical damping being the fastest. However, if one does give the system an initial push in the direction of \(x=0\), then the oscillator can (though not necessarily! One requires \(\dot{x}(0)<-x(0)\Delta\omega/2\) for critical damping) overshoot the origin \(x=0\), though only once (if one pushes it in the direction away from \(x=0\), then indeed there will be no oscillation since that’s equivalent to the system just starting from a new amplitude and decaying monotonically back to \(x\to 0\)).

Misconception: The quality factor \(Q\) is only defined for an undriven oscillator.

Clarification: From the definition \(Q=\omega_0/\Delta\omega\), it should be clear that the quality factor is an intrinsic property of any oscillator, and whether or not it happens to be driven is immaterial to \(Q\) being well-defined. Nevertheless, the interpretation of \(Q\) is different depending on whether the oscillator is being driven or not:

  1. If the oscillator is undriven, then of course no matter whether it is underdamped, critically damped, or overdamped, one has \(\lim_{t\to\infty}x(t)=0\) simply because damping \(\Delta\omega>0\) exists. So in this case, the interpretation of \(Q\) is that it is the number of radians of oscillations before the energy of the oscillator dissipates to \(1/e\) of its initial energy. Equivalently, \(Q/2\pi\) is the number of oscillations before the oscillator “stops oscillating” and so this occurs after a time:

\[t=\frac{Q}{2\pi}T=\frac{1}{\Delta\omega}\]

cf. the exponentially decaying envelope \(x(t)\sim e^{-\Delta\omega t/2}\) in all cases of underdamped/critically damped/overdamped behavior. This interpretation is in fact best suited for underdamped oscillators since only they basically oscillate (though see the misconception above).

  2. If the oscillator is driven, then \(Q\) is interpreted as the sharpness of the resonance peak of the response function \(x_0(\omega)\) as a function of the driving frequency \(\omega\).


Debye Model of Insulators

Problem #\(1\): Write down a formula for the total energy \(E\) stored in the transverse and longitudinal acoustic phonons of an insulator at temperature \(T\).

Solution #\(1\): This should feel completely natural:

\[E=\int_0^{k_D}dk 3\frac{V}{(2\pi)^3}4\pi k^2\times\hbar\omega_k\times\frac{1}{e^{\beta\hbar\omega_k}-1}\]

where the Debye wavenumber \(k_D\sim(N/V)^{1/3}\) is determined precisely by requiring \(\int_0^{k_D}dk\, 3\frac{V}{(2\pi)^3}4\pi k^2=3N\) normal modes in an insulating solid of \(N\) atoms (it is the UV cutoff on the support of the density of states).

Problem #\(2\): What dispersion relation \(\omega_k\) is assumed in the Debye model?

Solution #\(2\): A linear (nondispersive) relation \(\omega_k=ck\), where \(c\) should be read as a suitable average of the transverse and longitudinal sound velocities. In practice, for sufficiently high-momenta phonons, the acoustic band dispersion is nonlinear with the group velocity vanishing at the boundary of the underlying crystal lattice’s Brillouin zone, so in order to ensure one is working in the linear, low-\(k\) regime, just keep \(T\) low (so expect the Debye model to work better at low \(T\), even though it also reproduces the Dulong-Petit law at high \(T\)).

Problem #\(3\): Introduce the Debye frequency \(\omega_D\) and hence Debye temperature \(T_D\), and show that, given the assumption of Solution #\(2\), the energy \(E\) stored in the transverse/longitudinal acoustic phonons of a solid can be written in terms of \(T_D\) as:

\[E=\frac{9Nk_BT^4}{T_D^3}\int_0^{T_D/T}dx\frac{x^3}{e^x-1}\]

Solution #\(3\): In line with Solution #\(2\), define \(\omega_D:=ck_D\) and \(k_BT_D:=\hbar\omega_D\). The rest is bookwork (using the exact \(k_D=(6\pi^2N/V)^{1/3}\)).

Problem #\(4\): Now one is in a position to get the heat capacity (at constant \(V\) to be precise)! Just \(T\)-differentiate \(E\) in Problem #\(3\)!

Solution #\(4\):

\[C_V=\frac{4E}{T}-\frac{9Nk_BT_D}{T(e^{T_D/T}-1)}\]

Problem #\(5\): Check that \(C_V\to 3Nk_B\) for \(T\gg T_D\) (by the way this now makes precise the informal statement earlier about being at “high-\(T\)”) and \(C_V\sim\frac{12\pi^4}{5}Nk_B\left(\frac{T}{T_D}\right)^3\) for \(T\ll T_D\) (this is what “low-\(T\)” means!), in agreement with the \(3^{\text{rd}}\) law of thermodynamics.

Solution #\(5\): For \(T\gg T_D\), the integrand \(x^3/(e^x-1)\approx x^2\), so the integral is \(\approx(T_D/T)^3/3\), giving \(E\approx 3Nk_BT\) and hence \(C_V\to 3Nk_B\). For \(T\ll T_D\), the integral saturates to \(\int_0^{\infty}dx\,x^3/(e^x-1)=\pi^4/15\), giving \(E\approx\frac{3\pi^4}{5}\frac{Nk_BT^4}{T_D^3}\) and hence the claimed \(C_V\).
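Both limits can also be checked numerically; a sketch (in units \(N=k_B=T_D=1\), with Simpson’s rule for the Debye integral and central differences for \(C_V\)):

```python
import math

# Numeric sketch of both limits (units N = kB = T_D = 1): evaluate the Debye
# integral by Simpson's rule, get C_V from E(T) by central differences, and
# compare against Dulong-Petit (T >> T_D) and the Debye T^3 law (T << T_D).
def debye_integral(y, n=4000):
    f = lambda x: x**3 / math.expm1(x) if x > 0 else 0.0
    h = y / n
    s = f(0.0) + f(y) \
        + 4 * sum(f((2*i - 1) * h) for i in range(1, n//2 + 1)) \
        + 2 * sum(f(2*i * h) for i in range(1, n//2))
    return s * h / 3

def E(T):
    return 9 * T**4 * debye_integral(1.0 / T)

def C_V(T, dT=1e-4):
    return (E(T + dT) - E(T - dT)) / (2 * dT)

assert abs(C_V(50.0) - 3.0) < 1e-3                    # Dulong-Petit: C_V -> 3
T = 0.02
assert abs(C_V(T) / ((12 * math.pi**4 / 5) * T**3) - 1) < 1e-2   # Debye T^3 law
print("both limits check out")
```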

Problem #\(6\): Why make such a fuss that the Debye model only applies to solids which are insulators?

Solution #\(6\): Well, to be more precise, the Debye model only considers the energy tied up in transverse/longitudinal non-dispersive acoustic phonons (with wavevectors confined in an appropriate Brillouin zone \(k\leq k_D\)), and their corresponding heat capacity contribution. In insulators, this is typically the dominant contribution at low \(T\ll T_D\). However, for conductors, the conduction electrons make an even greater linear contribution to the heat capacity (which can be estimated in an ideal Fermi gas approximation), which at low \(T\) will dominate the phonon contribution (this difference in the scaling \(T^3\) vs. \(T\) arises fundamentally because phonons are massless bosons whereas electrons are massive fermions.).

Problem #\(7\): Give a heuristic derivation of the Debye’s \(T^3\) law.

Solution #\(7\): As with any thermal distribution at temperature \(T\), only phonon states of energy \(\hbar\omega\leq k_BT\Leftrightarrow k\leq k_BT/\hbar c\) will be populated, the total number of such states being proportional to the volume of the corresponding ball in \(\mathbf k\)-space, so \(\propto T^3\). The energy of each such low-energy state is \(\hbar\omega/(e^{\hbar\omega/k_BT}-1)\approx k_BT\) so the total energy \(E\sim T^4\), hence \(C\sim T^3\).

(if one looks back at the integral manipulations earlier, this is basically just a more physically-appealing rephrasing of those calculations).

cf. the similar argument for the Sommerfeld \(T\) law for conduction electrons.


Planckian Statistics

Problem: What is the position space wavefunction \(\psi(\textbf x)\) of a photon in a box of volume \(V=L^3\) whose opposite faces are subject to periodic boundary conditions? What about reflecting boundary conditions?

Solution: With periodic boundary conditions, the wavefunctions are travelling plane waves:

\[\psi(\textbf x)\sim e^{i\textbf k\cdot\textbf x}\]

where \(\textbf k\in\frac{2\pi}{L}\textbf Z^3\).

With reflecting boundary conditions (e.g. ideal conductor surfaces), wavefunctions are standing waves (suitable superpositions of travelling plane waves):

\[\psi(\textbf x)\sim\sin(k_xx)\sin(k_yy)\sin(k_zz)\]

where \(k_x,k_y,k_z\in \frac{\pi}{L}\textbf Z^+\) (the \(\textbf Z^+\) here rather than a \(\textbf Z\) restricts to the first octant in \(\textbf k\)-space because e.g. \(\sin(-k_xx)=-\sin(k_xx)\) describes a physically equivalent state).

(if one wants to normalize these wavefunctions, then in the former case it’s \(1/\sqrt{V}\), in the latter case it’s \(\sqrt{8/V}\)).

Problem: Write down the Planck distribution \(\mathcal E_{\omega}\).

Solution: Keeping in mind \(\omega=ck\), and converting the density of modes from \(k\) to \(\omega\) via \(dk/d\omega=1/c\), the spectral energy density per unit volume is:

\[\mathcal E_{\omega}=2\times\frac{1}{(2\pi)^3}\times 4\pi k^2\times\frac{1}{c}\times\frac{\hbar\omega}{e^{\beta\hbar\omega}-1}=\frac{\hbar\omega^3}{\pi^2c^3\left(e^{\beta\hbar\omega}-1\right)}\]

Problem: Assuming an isotropic photon gas, find an expression for the irradiance density \(I(\omega)\) incident normally on a face of the cavity.

Solution: Letting the unit normal to the face be \(\hat{\textbf z}\), this is given by the hemispherical expectation of the Poynting vector projection along \(\hat{\textbf z}\):

\[I(\omega)d\omega=\langle\textbf S_{\omega}(\hat{\textbf n})\cdot\hat{\textbf z}\rangle_{\hat{\textbf n}\in S^2/2}\]

where the Poynting vector for electromagnetic waves (or photons) of frequency \(\omega\) travelling in direction \(\hat{\textbf n}\) is:

\[\textbf S_{\omega}(\hat{\textbf n})=\mathcal E_{\omega}d\omega c\hat{\textbf n}\]

So, using an isotropic distribution \(p(\hat{\textbf n})=1/4\pi\):

\[I(\omega)=\mathcal E_{\omega}c\int_{S^2/2} d^2\hat{\textbf n}p(\hat{\textbf n})\hat{\textbf n}\cdot\hat{\textbf z}\]

\[=\mathcal E_{\omega}c\int_0^1 d\cos\theta\int_0^{2\pi}d\phi \frac{1}{4\pi}\cos\theta\]

\[=\frac{\mathcal E_{\omega} c}{4}\]

Problem: Derive the Stefan-Boltzmann law.

Solution: Integrate the irradiance density over all \(\omega\in(0,\infty)\):

\[\frac{P}{A}=\int_0^{\infty}d\omega I(\omega)=\sigma T^4\]

where \(\sigma:=\frac{\pi^2k_B^4}{60c^2\hbar^3}\).
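As a sanity check, plugging CODATA values into this expression reproduces the familiar Stefan-Boltzmann constant (a quick sketch):

```python
import math

# Sanity-check sketch: sigma = pi^2 kB^4/(60 c^2 hbar^3) with CODATA values
# should reproduce the familiar Stefan-Boltzmann constant ~5.670e-8 W/m^2/K^4.
kB = 1.380649e-23           # J/K (exact in SI)
h = 6.62607015e-34          # J s (exact in SI)
c = 2.99792458e8            # m/s (exact in SI)
hbar = h / (2 * math.pi)
sigma = math.pi**2 * kB**4 / (60 * c**2 * hbar**3)
print(f"{sigma:.4e}")       # 5.6704e-08
assert abs(sigma - 5.670374419e-8) < 1e-14
```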

Problem: So the Stefan-Boltzmann law says that as an object gets hotter (avoid using the term blackbody), it emits more energy in proportion to \(T^4\); but how much of this effect is due to the energy of the individual photons actually increasing and how much is due to the fact that it’s emitting more photons?

Solution: \(4=1+3\), where the \(1\) is the energy increase of each photon, while the \(3\) is the increase in the number of photons emitted (thus, it is clear what the dominant effect is).

Mystery (Why Does This Derivation Fail?)

The canonical partition function for the photon gas is actually (remembering also the \(1/N!\) Gibbs factor):

\[Z=\sum_{N=0}^{\infty}\frac{1}{N!}\sum_{(\textbf n_1,\textbf n_2,…,\textbf n_N)\in\textbf Z^{3N}}\sum_{(|\psi_1\rangle,|\psi_2\rangle,…,|\psi_N\rangle)\in\{|0\rangle,|1\rangle\}^N}e^{-\beta E_{(\textbf n_1,\textbf n_2,…,\textbf n_N)}}\]

where the total energy of the photon gas in that particular microstate is degenerate with respect to their polarizations, instead depending only on the momenta \(E_{(\textbf n_1,\textbf n_2,…,\textbf n_N)}=\frac{2\pi\hbar c}{L}\sum_{i=1}^N|\textbf n_i|\). As a result, we can move the exponential outside the sum over polarization states \(\sum_{(|\psi_1\rangle,|\psi_2\rangle,…,|\psi_N\rangle)\in\{|0\rangle,|1\rangle\}^N}1=2^N\). We can also convert the exponential of the series of energies into a product \(e^{-\beta E_{(\textbf n_1,\textbf n_2,…,\textbf n_N)}}=\prod_{i=1}^Ne^{-2\pi\beta\hbar c|\textbf n_i|/L}\).

The series of products \(\sum_{(\textbf n_1,\textbf n_2,…,\textbf n_N)\in\textbf Z^{3N}}\prod_{i=1}^Ne^{-2\pi\beta\hbar c|\textbf n_i|/L}\) is tricky to calculate but can be accurately estimated with the aid of a useful approximation.

Digression on the Euler-Maclaurin Summation Formula

The Euler-Maclaurin summation formula asserts that, for any smooth function \(f\in C^{\infty}(\textbf R\to\textbf C)\), if we sum the values of \(f\) evaluated on a bunch of consecutive integers \(\sum_{n=a}^bf(n)\) with \(a,b\in\textbf Z\cup\{-\infty,\infty\}\), then this is often well-approximated by the integral \(\int_a^bf(n)dn\) of the function \(f\) on the interval \([a,b]\), treating \(n\) as though it were a continuous variable \(n\in\textbf R\) rather than an integer \(n\in\textbf Z\) (of course this is a two-way street, meaning that many integrals can also be approximated by suitable series). The canonical example of the Euler-Maclaurin summation formula concerns the harmonic series \(f(n)=1/n\), with the following diagram given to rationalize why the Euler-Maclaurin approximation should be a good approximation:

However, the full Euler-Maclaurin formula also acknowledges that the approximation is imperfect by explicitly giving a formula for the error/discrepancy between the series and its integral approximation:

\[\sum_{n=a}^bf(n)=\int_a^bf(n)dn+\frac{f(a)+f(b)}{2}+\sum_{k=1}^{\infty}\frac{B_{2k}}{(2k)!}\left(f^{[2k-1]}(b)-f^{[2k-1]}(a)\right)\]

where the asymptotic series on the right features the non-zero even Bernoulli numbers \(B_2=1/6,B_4=-1/30\), etc. This interesting sequence of rational numbers was historically first discovered in the context of Faulhaber’s formula \(1^p+2^p+…+n^p=\frac{1}{p+1}\left(B_0{{p+1}\choose{0}}n^{p+1}+B_1{{p+1}\choose{1}}n^{p}+…+B_p{{p+1}\choose{p}}n\right)\) for sums of powers, but arguably the real utility of the Bernoulli numbers lies in their appearance in the Euler-Maclaurin summation formula (indeed Faulhaber’s formula is just the special case of the Euler-Maclaurin summation formula with \(f(n)=n^p\)). The quickest way to compute these Bernoulli numbers is to use a Pascal-like triangle construction (details can be found in this paper of Kawasaki and Ohno). The idea is to first write down a couple terms of the harmonic sequence \(1,1/2,1/3,…\) in the first row. Then, whereas in Pascal’s triangle one would compute the number “below” two numbers by simply adding them, here instead one first has to subtract the two (left number minus right number, or if you forget then bigger number minus smaller number to always obtain a positive difference), and then multiply the result by the denominator of whichever term in the original harmonic sequence \(1,1/2,1/3,…\) lies on the same downward-sloping “diagonal column” so to speak. A picture is worth \(10^3\) words:

The Bernoulli numbers \(B_0=1,B_1=1/2,B_2=1/6\), etc. are then simply read off as the elements on the first (left-most) downward-sloping diagonal column. A word on when Euler-Maclaurin summation is a good approximation: the error terms involve the derivatives of \(f\) at the endpoints, so the approximation is controlled precisely when \(f\) varies slowly on the scale of the unit spacing between consecutive integers (and provided the asymptotic series is truncated before its terms start growing). This is exactly the situation in the statistical mechanical application at hand: there \(f\) is an exponentially decaying Boltzmann factor whose decay rate \(2\pi\beta\hbar c/L\ll 1\) is tiny for a macroscopic box \(L\gg\beta\hbar c\), each derivative brings down another power of this small rate, and \(f\) together with all of its derivatives vanishes at \(\infty\).
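For concreteness, the triangle just described can be implemented in a few lines (it is essentially the Akiyama-Tanigawa recurrence), using exact rational arithmetic: each new entry is (left minus right) from the row above, multiplied by \(j+1\), the denominator of the harmonic term \(1/(j+1)\) heading the \(j\)-th diagonal.

```python
from fractions import Fraction

def bernoulli(n_max):
    # first row: the harmonic sequence 1, 1/2, 1/3, ...
    row = [Fraction(1, j + 1) for j in range(n_max + 1)]
    B = [row[0]]                                           # B_0 = 1
    for _ in range(n_max):
        # subtract left - right, then multiply by the diagonal's denominator j + 1
        row = [(j + 1) * (row[j] - row[j + 1]) for j in range(len(row) - 1)]
        B.append(row[0])                                   # read off the left-most diagonal
    return B

print([str(x) for x in bernoulli(8)])
# ['1', '1/2', '1/6', '0', '-1/30', '0', '1/42', '0', '-1/30']
```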

Back to the Physics

All this is to say that we expect we can approximate the series of products earlier \(\sum_{(\textbf n_1,\textbf n_2,…,\textbf n_N)\in\textbf Z^{3N}}\prod_{i=1}^Ne^{-2\pi\beta\hbar c|\textbf n_i|/L}\) over the discrete lattice \(\textbf Z^{3N}\) by an integral over the smooth manifold \(\textbf R^{3N}\) instead, obtaining a \(3N\)-dimensional analog of the Euler-Maclaurin summation formula:

\[\sum_{(\textbf n_1,\textbf n_2,…,\textbf n_N)\in\textbf Z^{3N}}\prod_{i=1}^Ne^{-2\pi\beta\hbar c|\textbf n_i|/L}\approx \int_{(\textbf n_1,\textbf n_2,…,\textbf n_N)\in\textbf R^{3N}}\prod_{i=1}^Ne^{-2\pi\beta\hbar c|\textbf n_i|/L}d^3\textbf n_i\]

Notice that the \(N\) integrals for each photon completely decouple from each other (as expected because the photons are non-interacting):

\[\int_{(\textbf n_1,\textbf n_2,…,\textbf n_N)\in\textbf R^{3N}}\prod_{i=1}^Ne^{-2\pi\beta\hbar c|\textbf n_i|/L}d^3\textbf n_i=\prod_{i=1}^N\iiint_{\textbf n_i\in\textbf R^3}e^{-2\pi\beta\hbar c|\textbf n_i|/L}d^3\textbf n_i\]

Moreover, each of these integrals are identical (because each photon is an identical boson) so the product becomes an \(N\)-fold exponentiation:

\[\prod_{i=1}^N\iiint_{\textbf n_i\in\textbf R^3}e^{-2\pi\beta\hbar c|\textbf n_i|/L}d^3\textbf n_i=\left(\iiint_{\textbf n\in\textbf R^3}e^{-2\pi\beta\hbar c|\textbf n|/L}d^3\textbf n\right)^N\]

Expressing the volume element \(d^3\textbf n=4\pi|\textbf n|^2d|\textbf n|\) in spherical coordinates, the integral comes out as:

\[\iiint_{\textbf n\in\textbf R^3}e^{-2\pi\beta\hbar c|\textbf n|/L}d^3\textbf n=4\pi\int_0^{\infty}|\textbf n|^2e^{-2\pi\beta\hbar c|\textbf n|/L}d|\textbf n|=4\pi\left(\frac{L}{2\pi\beta\hbar c}\right)^3\Gamma(3)=\frac{V}{\pi^2\beta^3\hbar^3c^3}\]
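As a numerical sanity check on trading the lattice sum for this integral, here is a dimensionless sketch with an illustrative value of \(\alpha:=2\pi\beta\hbar c/L\) (the approximation improves as \(\alpha\to 0\)):

```python
import math

# Compare sum over n in Z^3 of e^{-alpha |n|} against the closed form
# 4*pi*Gamma(3)/alpha^3 = 8*pi/alpha^3 obtained from the R^3 integral.
alpha = 0.5   # illustrative; physically alpha = 2*pi*beta*hbar*c/L << 1
N = 40        # truncation radius: e^{-alpha*N} ~ 2e-9, so the tail is negligible
lattice_sum = sum(
    math.exp(-alpha * math.sqrt(i*i + j*j + k*k))
    for i in range(-N, N + 1)
    for j in range(-N, N + 1)
    for k in range(-N, N + 1)
)
integral = 8 * math.pi / alpha**3   # = 4*pi*Gamma(3)/alpha^3 with Gamma(3) = 2
print(lattice_sum, integral)        # agree at roughly the 0.1% level here
```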

So assembling everything together, the canonical partition function \(Z\) for the photon gas becomes:

\[Z=\sum_{N=0}^{\infty}\frac{(2V/\pi^2\beta^3\hbar^3c^3)^N}{N!}=e^{2V/\pi^2\beta^3\hbar^3c^3}\]

It turns out the correct answer is actually \(Z=e^{\pi^2 V/45\beta^3\hbar^3c^3}\), i.e. the exponent above is too small by precisely a factor of \(\zeta(4)=\pi^4/90\). The culprit is the \(1/N!\) Gibbs factor: dividing the distinguishable-particle sum by \(N!\) is only the Maxwell-Boltzmann approximation to boson counting, and it miscounts the microstates in which several photons occupy the same mode. Treating the photons properly as bosons (i.e. summing over the occupation number of each mode rather than over ordered lists of particles) restores the missing \(\zeta(4)\).
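A quick numerical confirmation that the boson counting supplies exactly the missing \(\zeta(4)\): each mode of energy \(\varepsilon\) contributes \(-\ln(1-e^{-\beta\varepsilon})=\sum_{k\geq 1}e^{-k\beta\varepsilon}/k\) to \(\ln Z\), and replacing \(\beta\to k\beta\) in the single-particle mode sum rescales it by \(1/k^3\), so \(\ln Z\) picks up \(\sum_{k\geq 1}1/k^4=\zeta(4)\) relative to the Maxwell-Boltzmann answer:

```python
import math

# Maxwell-Boltzmann: ln Z_MB = (2/pi^2) * V/(beta*hbar*c)^3   (the computation above)
# Bose-Einstein:     ln Z_BE = zeta(4) * ln Z_MB, which should equal (pi^2/45) * V/(beta*hbar*c)^3
zeta4 = sum(1/k**4 for k in range(1, 100001))   # partial sum; tail ~ 1/(3*10^15)
coeff_MB = 2 / math.pi**2                        # coefficient of V/(beta*hbar*c)^3 in ln Z_MB
coeff_BE = coeff_MB * zeta4                      # what boson counting actually gives
print(coeff_BE, math.pi**2 / 45)                 # these coincide
```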

Posted in Blog | Leave a comment

Van der Waals Equation of State

The purpose of this post is to explore the rich physics encoded in the Van der Waals equation of state:

\[\left(p+\frac{aN^2}{V^2}\right)(V-Nb)=NkT\]

Essentially, the Van der Waals equation of state dispenses with two assumptions implicit in the ideal gas law:

  1. Gas particles are not point particles, but have some finite volume which reduces the total available volume \(V\).
  2. More importantly, gas particles interact with each other through some potential energy \(V(|\textbf x_i-\textbf x_j|)\) (not to be confused with the volume \(V\)). They’re not just billiard balls flying around oblivious to each other’s existence.

First, it will be instructive to derive the Van der Waals equation from a microscopic, first principles statistical mechanics calculation within the regime of certain approximations. Then, some of the interesting physics that it predicts will be investigated further.

Origins of the Van der Waals Equation

The starting point is the virial expansion (which has nothing to do with the virial theorem). It is just a perturbation of the ideal gas law to accommodate non-ideal effects (e.g. the two points mentioned above). The idea is that the ideal gas law is increasingly accurate in the low-density limit \(N/V\to 0\) so one can write down a power series equation of state in the particle density \(N/V\):

\[\frac{p}{kT}=\frac{N}{V}+B_2(T)\left(\frac{N}{V}\right)^2+B_3(T)\left(\frac{N}{V}\right)^3…=\sum_{j=1}^{\infty}B_j(T)\left(\frac{N}{V}\right)^j\]

where the expansion coefficients \(B_j(T)\) are called virial coefficients with \(B_1(T)=1\) (note that because \(N/V\) is not dimensionless, each of these virial coefficients has different dimensions). The goal is to calculate the virial coefficients \(B_j(T)\) from first principles (i.e. from specifying the precise nature of the interatomic potential energy \(V(r)\); it is taken to be central for now).

Recall that classically for two permanent electric dipoles \(\textbf p_1,\textbf p_2\), their potential energy is \(V(\textbf r)=\frac{\textbf p_1\cdot\textbf p_2-3(\textbf p_1\cdot\hat{\textbf r})(\textbf p_2\cdot\hat{\textbf r})}{4\pi\varepsilon_0 r^3}\) where \(\hat{\textbf r}\) can either be the unit vector from \(\textbf p_1\) to \(\textbf p_2\) or vice versa (clearly doesn’t matter). In our case of neutral monatomic gas particles, there are no permanent electric dipoles \(\textbf p_1=\textbf p_2=\textbf 0\), but it turns out there are still temporary/instantaneous electric dipoles (ultimately due to quantum fluctuations, but here one is treating everything classically). Therefore, even neutral atoms like a container of helium \(\text{He}(g)\) gas still experience slight interatomic forces, dubbed London dispersion forces. One can argue the strength of such London dispersion forces as follows: if an electric dipole \(\textbf p_1^{\text{temp}}\) happens to arise temporarily due to random fluctuations in an atom, then it will also come with a temporary, anisotropic electric field \(\textbf E^{\text{temp}}_1(\textbf r)=\frac{3(\textbf p_1^{\text{temp}}\cdot\hat{\textbf r})\hat{\textbf r}-\textbf p_1^{\text{temp}}}{4\pi\varepsilon_0r^3}\). Another neutral atom located some relative position \(\textbf x\) away will feel the influence of \(\textbf E^{\text{temp}}_1(\textbf x)\) and it will now acquire an induced electric dipole \(\textbf p_2^{\text{ind}}=\alpha\textbf E^{\text{temp}}_1(\textbf x)\) with \(\alpha>0\) its atomic polarizability. One can then apply the same formula for the potential energy of any two electric dipoles, \(V_{\text{London}}(\textbf x)=\frac{\textbf p_1^{\text{temp}}\cdot\textbf p_2^{\text{ind}}-3(\textbf p_1^{\text{temp}}\cdot\hat{\textbf x})(\textbf p_2^{\text{ind}}\cdot\hat{\textbf x})}{4\pi\varepsilon_0 |\textbf x|^3}\). 
Thus, London dispersion forces are conservative forces with corresponding attractive London dispersion potential energy \(V_{\text{London}}\propto-\frac{1}{r^6}\) (the weakest among all Van der Waals forces!).

So London dispersion forces tend to weakly attract gas particles together. On the other hand, they can’t get too close together by virtue of the Pauli exclusion principle (this is again a bit handwavy, but can all be justified more rigorously using quantum mechanics). Since this is a classical model, the exact form of the repulsive “Pauli potential” \(V_{\text{Pauli}}\) isn’t so important. One common model is \(V_{\text{Pauli}}(r)\propto\frac{1}{r^{12}}\) (because \(12=6\times 2\)). With this, one obtains the Lennard-Jones potential \(V_{\text{Lennard-Jones}}(r)=V_{\text{London}}(r)+V_{\text{Pauli}}(r)=V_{*}\left[\left(\frac{r_*}{r}\right)^{12}-2\left(\frac{r_*}{r}\right)^6\right]\), where \(r_*\) is the global minimum of the potential and \(V_*=-V(r_*)>0\) is the depth of the potential well. However, another possibility for the Pauli potential is simply the infinite potential barrier \(V_{\text{Pauli}}=\infty[r<r_0]\) for some hard sphere center-to-center separation \(r_0>0\) (so that each hard sphere actually has radius \(r_0/2\) called its Van der Waals radius). For computational convenience later, it turns out this latter form will be simpler to work with.

Canonical Partition Function

Now the Hamiltonian for \(N\) gas particles is \(H=\sum_{i=1}^N\frac{p_i^2}{2m}+\sum_{1\leq i<j\leq N}V(|\textbf x_i-\textbf x_j|)\) where \(V_{ij}=V\) is the same potential energy for all pairs of gas particles. Since one would like to think of this gas as trapped in a box, it can’t exchange particles, but it’s okay if it exchanges energy with the environment heat bath so let’s consider it at thermodynamic equilibrium with microstate probabilities provided by the Boltzmann distribution in the canonical ensemble. The canonical partition function is (this time one has to do the full \(6N\)-dimensional phase space integral because the gas particles are interacting):

\[Z=\frac{1}{N!h^{3N}}\int_{\textbf R^{6N}}\prod_{i=1}^Nd^3x_id^3p_ie^{-\beta H}=\frac{1}{N!\lambda^{3N}}\int_{\textbf R^{3N}}\prod_{i=1}^Nd^3x_ie^{-\beta\sum_{1\leq j<k\leq N}V(|\textbf x_j-\textbf x_k|)}\]

Because of the interactions, the integral over positions doesn’t factorize in any obvious way; this is an example of what makes interactions difficult to study in physics more generally. One way to make progress is to define the Mayer \(f\)-function by the formula:

\[f(r):=e^{-\beta V(r)}-1\]

For instance, for the Lennard-Jones potential \(V(r)=V_{\text{Lennard-Jones}}(r)\), the Mayer \(f\)-function looks something like:

For notational brevity, defining the shorthand \(f_{jk}:=f(|\textbf x_j-\textbf x_k|)\) (which may be viewed as components of a symmetric \(N\times N\) Mayer \(f\)-matrix with \(-1\) on the diagonals), one can rewrite the canonical partition function \(Z\) for the gas as:

\[Z=\frac{1}{N!\lambda^{3N}}\int_{\textbf R^{3N}}\prod_{i=1}^Nd^3x_i\prod_{1\leq j<k\leq N}(1+f_{jk})\]

To second order, the product expands as \(\prod_{1\leq j<k\leq N}(1+f_{jk})=1+\sum_{1\leq j<k\leq N}f_{jk}+\sum_{\{j,k\}\neq\{\ell,m\}}f_{jk}f_{\ell m}+…\), where the second-order sum runs over unordered pairs of distinct pairs. The zeroth and first-order terms evaluate approximately to (using \(N(N-1)/2\approx N^2/2\)):

\[Z\approx Z_{\text{ideal}}\left(1+\frac{2\pi N^2}{V}\int_0^{\infty}r^2f(r)dr+…\right)\]

where \(Z_{\text{ideal}}=\frac{V^N}{N!\lambda^{3N}}\). Thus, one sees explicitly how the interactions \(f(r)\) constitute a perturbation to the canonical partition function of the ideal gas. In the canonical ensemble, the Helmholtz free energy is then approximately (ignoring terms quadratic in the Mayer \(f\)-function and Taylor expanding the logarithm):

\[F=-kT\ln Z\approx -kT\ln Z_{\text{ideal}}-kT\ln\left(1+\frac{2\pi N^2}{V}\int_0^{\infty}r^2f(r)dr\right)\approx F_{\text{ideal}}-\frac{2\pi N^2kT}{V}\int_0^{\infty}r^2f(r)dr\]

The pressure is then:

\[p=-\frac{\partial F}{\partial V}=\frac{NkT}{V}-\frac{2\pi N^2kT}{V^2}\int_0^{\infty}r^2f(r)dr\]

From this, one identifies the second virial coefficient \(B_2(T)=-2\pi \int_0^{\infty}r^2f(r)dr\). To evaluate the integral, one can use the hard sphere repulsive Pauli potential so that \(f(r)=-1\) for \(r<r_0\) and \(f(r)=e^{\beta V_0(r_0/r)^6}-1\) for \(r\geq r_0\). In the high-temperature limit \(\beta V_0\ll 1\), one can Taylor expand the exponential to make the integral easier; the virial coefficient comes out as \(B_2(T)=b-a/kT\) where \(a:=2\pi r_0^3V_0/3\) and \(b:=2\pi r_0^3/3\). The virial expansion to this order is thus:

\[\frac{pV}{NkT}=1+\left(b-\frac{a}{kT}\right)\frac{N}{V}\]

If you isolate for \(kT\):

\[kT=\frac{V}{N}\left(p+\frac{aN^2}{V^2}\right)\left(1+\frac{Nb}{V}\right)^{-1}\]

and then Taylor expand the last factor \((1+Nb/V)^{-1}\approx 1-Nb/V\) (valid because the particle density \(N/V\) is small), you finally do obtain the Van der Waals equation of state:

\[\left(p+\frac{aN^2}{V^2}\right)(V-Nb)=NkT\]

To interpret it, recall that \(a\propto V_0>0\) is proportional to the attractive strength \(V_0\) of the interaction, so increasing the attractive strength ceteris paribus between gas particles means a lower pressure \(p\) which of course makes sense. Moreover, this correction is proportional to the particle density squared \((N/V)^2\) because the number of attractive forces within the container is given by the number of pairs \(N(N-1)/2\propto N^2\) of gas particles (as the derivation using the canonical partition function made very explicit!). On the other hand, \(b=a/V_0\) is a purely geometric excluded volume that arose from the infinitely hard Pauli repulsion. It is four times larger than the volume of each individual hard sphere \(b=4\times\frac{4}{3}\pi\left(\frac{r_0}{2}\right)^3\) but also half the volume \(b=\frac{1}{2}\frac{4}{3}\pi r_0^3\) that it physically excludes the centers of other hard spheres from entering:

So where does the factor of \(2\) come from? Well, if \(2b\) denotes that physically excluded volume, then if one adds in gas particles one by one, the first can roam in the container’s volume \(V\), the second in \(V-2b\), the third in \(V-4b\), etc. The total \(N\)-dimensional configuration space (NOT phase space!) hypervolume \(\mathcal V\) for \(N\) gas particles is:

\[\mathcal V=\frac{1}{N!}\prod_{n=0}^{N-1}(V-2nb)\]

In the limit \(Nb\ll V\), one can check that this hypervolume \(\mathcal V\) reduces to \(\mathcal V=\frac{1}{N!}\left(V-Nb\right)^N\), suggesting that it is indeed \(b\) and not \(2b\) that is the correct correction to the container volume \(V\) in the Van der Waals equation (in other words, \(b\) should be thought of as the excluded configuration space volume contributed by each gas particle).
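A quick numerical illustration of this claim, with illustrative numbers in arbitrary units (here \(Nb/V=0.02\)); comparing logarithms avoids overflow and sidesteps the common \(1/N!\) factor:

```python
import math

# Check that prod_{n=0}^{N-1} (V - 2nb) is well-approximated by (V - Nb)^N when Nb << V.
V, b, N = 1000.0, 0.5, 40
log_exact = sum(math.log(V - 2*n*b) for n in range(N))
log_approx = N * math.log(V - N*b)
print(log_exact, log_approx)   # differ by ~0.02 out of ~275
```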

To summarize, recall the following assumptions that went into the derivation of the Van der Waals equation of state:

  1. Low density gases \(N/V\ll 1/r_0^3\) (much less dense than the liquid phase!).
  2. High-temperature \(V_0\ll kT\).
  3. Hard sphere Pauli repulsion \(V(r)=\infty[r<r_0]\).

So while the Van der Waals equation of state is a significant improvement over the ideal gas law, it too has its limitations.
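Before moving on, the second virial coefficient derived above can be checked numerically: \(B_2(T)=-2\pi\int_0^{\infty}r^2f(r)dr\) should approach \(b-a/kT\) at high temperatures. A sketch with illustrative values \(r_0=1\) and \(\beta V_0=0.1\):

```python
import math

# f(r) = -1 for r < r0;  f(r) = exp(beta*V0*(r0/r)^6) - 1 for r >= r0.
r0, betaV0 = 1.0, 0.1   # high-temperature regime beta*V0 << 1
# hard-core piece: -2*pi * integral_0^{r0} (-1) r^2 dr = +2*pi*r0^3/3
hard_core = 2 * math.pi * r0**3 / 3
# attractive tail via Simpson's rule on [r0, R]; the integrand decays like r^{-4}
R, n = 60.0, 60000
h = (R - r0) / n
s = 0.0
for i in range(n + 1):
    r = r0 + i * h
    w = 1 if i in (0, n) else (4 if i % 2 else 2)
    s += w * r * r * (math.exp(betaV0 * (r0 / r)**6) - 1)
B2_numeric = hard_core - 2 * math.pi * (h / 3) * s
b = 2 * math.pi * r0**3 / 3
a_over_kT = betaV0 * b              # a/kT = beta*V0 * 2*pi*r0^3/3 = beta*V0 * b
print(B2_numeric, b - a_over_kT)    # close in the high-temperature regime
```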

Universality in Van der Waals Fluids

Recall that taking an ideal gas and turning on Van der Waals interactions described by a long-range attractive Lennard-Jones potential and a short-range repulsive Pauli interaction modifies the equation of state to:

\[p=\frac{NkT}{V-Nb}-a\left(\frac{N}{V}\right)^2\]

One can proceed to plot some isotherms of this van der Waals fluid in the usual \(pV\) state space of the fluid (wiggles are exaggerated):

At sufficiently high temperatures \(T>T_c\), the pressure \(p\) decreases monotonically with volume \(V\) along a (roughly) hyperbolic path \(p\propto V^{-1}\) so the Van der Waals fluid is supercritical in this regime (i.e. behaves like a liquid as \(V\to Nb\) and \(p\to\infty\) but otherwise behaves like an ideal gas as \(V\to\infty,p\to 0\)). By contrast, for low temperatures \(T<T_c\), the isotherm is no longer monotonic, experiencing both a local min and a local max. These occur at volumes \(V\) satisfying the cubic equation:

\[\frac{\partial p}{\partial V}=-\frac{NkT}{(V-Nb)^2}+\frac{2aN^2}{V^3}=0\]

It can be checked that the discriminant of the cubic vanishes (corresponding to additionally imposing \(\partial^2p/\partial V^2=0\)) at the critical temperature \(kT_c=8a/27b\), with \(T<T_c\) corresponding to \(3\) real roots and \(T>T_c\) only \(1\) real root (a local max hiding at the unphysical \(V<Nb\) so not shown). The corresponding critical volume \(V_c=3Nb\) and critical pressure \(p_c=a/27b^2\) are easily found and annotated on the figure. In particular, note that the critical compression factor \(Z_c:=p_cV_c/NkT_c\) (not to be confused with the isothermal compressibility \(\kappa_T=-(\partial\ln V/\partial p)_T\) which blows up \(\kappa_T=\infty\) anyways at the critical point) is predicted by the Van der Waals equation to be a universal constant in all Van der Waals fluids:

\[Z_c=\frac{3}{8}=0.375\]

In practice, real fluids have \(Z_c\approx 0.2\) to \(0.3\), reflecting the fact that the Van der Waals equation is not so accurate in the liquid regime (because it includes only \(2\) fitting parameters \(a,b\) whereas the critical point is defined by \(3\) variables \(p_c,V_c,T_c\), it will always be possible to isolate some kind of universal ratio like this in any equation of state with only \(2\) fit parameters, but not necessarily so for more complicated equations of state). In any case, it definitely outperforms the naive ideal gas prediction \(Z=1\), which is not even anchored to a critical point since there is no notion of criticality in the ideal gas model.

Using the critical variables \(p_c,V_c,T_c\), one can eliminate the fluid-specific constants \(a,b\) from the Van der Waals equation by rewriting it in terms of the reduced pressure \(\hat p:=p/p_c\), the reduced volume \(\hat V:=V/V_c\) and the reduced temperature \(\hat T:=T/T_c\):

\[\hat p=\frac{8\hat T}{3\hat V-1}-\frac{3}{\hat V^2}\]
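As a quick check that \((\hat p,\hat V,\hat T)=(1,1,1)\) really is the critical point of this reduced equation, one can evaluate the pressure and its first two volume derivatives (computed by hand below) there:

```python
# Reduced Van der Waals equation p = 8T/(3V - 1) - 3/V^2 and its V-derivatives.
def p_hat(V, T):
    return 8*T/(3*V - 1) - 3/V**2

def dp_dV(V, T):
    return -24*T/(3*V - 1)**2 + 6/V**3

def d2p_dV2(V, T):
    return 144*T/(3*V - 1)**3 - 18/V**4

# at the reduced critical point the pressure is 1 and both derivatives vanish
print(p_hat(1, 1), dp_dV(1, 1), d2p_dV2(1, 1))   # 1.0 0.0 0.0
```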

This universal form of the Van der Waals equation is an instance of the law of corresponding states. Specifically, it is universal in the sense that, for all fluids, certain aspects of the physics near their respective critical points look identical. For instance, along the critical isotherm \(T=T_c\) one has the Taylor expansion \(p-p_c\sim(V-V_c)^3\) as \(V\to V_c\). Two very different fluids will in general have their own very different critical \(p_c\) and \(V_c\), but the Van der Waals equation is making the remarkable claim that if one were just handed log-log plots of \(\ln(p-p_c)\) vs. \(\ln(V-V_c)\) for \(p\) close to \(p_c\) and \(V\) close to \(V_c\) in the case of each fluid, then both would have the same slope of \(3\) and differ only by a translation.

Another Van der Waals prediction is the divergence of the isothermal compressibility \(\kappa_T\sim (T-T_c)^{-1}\) as \(T\to T_c^+\) (this of course agrees with the earlier assertion that exactly at the critical point, \(\kappa_{T=T_c}=\infty\) blows up). Although the Van der Waals equation was historically the first place that this whole theme of “universality at criticality” arose (and universality is indeed what is observed experimentally), quantitatively speaking it only produces a rough approximation to the various critical exponents (heuristically this is because fluctuations become more and more important as one approaches the critical point, which is not something that the mean-field-like Van der Waals treatment takes into account). For instance, it turns out that actually \(p-p_c\sim(V-V_c)^{\approx 4.8}\) instead, and that \(\kappa_T\sim (T-T_c)^{\approx -1.24}\).

Liquid-To-Gas Phase Transition

Naively, the Van der Waals equation suggests that along a given subcritical isotherm \(\hat T<1\) there are certain equilibria where the Van der Waals fluid has negative isothermal compressibility \(\kappa_{\hat T}<0\). Such equilibria are obviously unstable: if the fluid starts at any point with \(\partial p/\partial V>0\), then any small perturbation of its volume \(V\) would, according to the isotherm, have it equilibrate to a pressure \(p\) that pushes the volume \(V\) even further in the original direction of change, enforcing a positive feedback loop. So only the negative-feedback states with \(\partial p/\partial V<0\) can be observed in practice. For this reason, rather than strictly plotting a \(T<T_c\) Van der Waals isotherm, it would be desirable to replace the unstable section of the isotherm with the actually-experimentally-observed equilibrium states over that range of volumes \(V\).

Although at all temperatures \(T\) each volume \(V\) is associated with some unique equilibrium pressure \(p\), for \(T<T_c\) where monotonicity is broken there is no longer a bijection between pressures \(p\) and volumes \(V\). In particular, for all pressures \(p\) in the blue range highlighted, the Van der Waals isotherm suggests there are \(3\) equilibrium volumes \(V\) that the system would tend towards if it were not already sitting at one of those volumes (assuming here the volume \(V\) is changeable by e.g. a piston).

(In practice, gravity acts as an anisotropic external density filter, pulling the denser of two coexisting phases to the bottom of the container.)

The middle phase is unstable due to its positive gradient \(\partial p/\partial V>0\), while the outer two are stable, and by inspecting not only their corresponding volumes \(V\) but also their isothermal compressibilities \(\kappa_T:=-(\partial\ln V/\partial p)_T\), or equivalently their isothermal bulk moduli \(B_T:=1/\kappa_T\), it is clear that the left point corresponds to a liquid while the right point corresponds to a gas. However, recall that the Van der Waals equation is essentially a second-order virial expansion in the particle density \(N/V\), so although it predicts the existence of a stable liquid phase, any quantitative predictions it makes about such a liquid phase (for which \(N/V\) is not typically small) need to be taken with a grain of salt.

This raises the question: if the system sits at a pressure \(p\) for which both the liquid and gas phases seem to be stable, what does one actually observe experimentally? The answer is that, in general, there will be a mixture of both the liquid and gas phases. Implicitly here, one is considering the steady state \(t\to\infty\) limit with everything in thermodynamic equilibrium. Mechanical equilibrium is necessarily satisfied because by assumption the pressure \(p\) is fixed on a horizontal line, and thermal equilibrium is also ensured because both phases sit on the same \(T\)-isotherm. The only additional constraint yet to be imposed is diffusive equilibrium, i.e. that the number of particles in each phase has reached a steady state. This means both the liquid and gas phases, if they were to coexist, would also need to lie at the same chemical potential \(\mu\) (since \(p,T,\mu\) are all intensive, there is no constraint on the absolute numbers \(N_{\ell},N_g\) of liquid and gas particles, only on their relative proportion \(N_{\ell}/N_g\)). The Maxwell construction shows (proof omitted) that this only happens at a pressure \(p\) for which one has equal areas:

(thus, if the initial pressure \(p\) were not at this specific pressure, then the chemical potentials \(\mu_{\ell}\neq\mu_g\) would initially be unbalanced and particles would begin to diffuse from one phase to the other until the chemical potentials were balanced and the pressure would rise or fall accordingly). Considering the set of temperatures \(T\leq T_C\), for each one it is possible to perform a Maxwell construction to identify exactly which two points on that \(T\)-isotherm lead to equilibrium between the liquid and gas phases, and plot the corresponding \((p_{\ell},V_{\ell}),(p_g,V_g)\); the set of all such pairs in the \(pV\)-plane defines the coexistence curve. This is not to be confused with the spinodal curve which is just the curve passing through the extrema of the Van der Waals isotherms. Sandwiched between these are metastable states.
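The Maxwell construction can also be carried out numerically. The sketch below works in the reduced form of the equation at an illustrative reduced temperature \(\hat T=0.9\), and bisects on the equal-area condition \(\int_{V_\ell}^{V_g}(\hat p(V)-p^*)\,dV=0\):

```python
import math

T = 0.9   # reduced temperature, subcritical

def p_hat(V):
    return 8*T/(3*V - 1) - 3/V**2

def dp_dV(V):
    return -24*T/(3*V - 1)**2 + 6/V**3

def bisect(f, lo, hi, iters=100):
    # assumes f changes sign on [lo, hi]
    for _ in range(iters):
        mid = 0.5*(lo + hi)
        if f(lo)*f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5*(lo + hi)

# spinodal points: the local min (V < 1) and local max (V > 1) of the isotherm
V_min = bisect(dp_dV, 0.4, 1.0)
V_max = bisect(dp_dV, 1.0, 10.0)

def area_mismatch(p):
    # outer roots of p_hat(V) = p; the antiderivative of p_hat is (8T/3) ln(3V-1) + 3/V
    Vl = bisect(lambda V: p_hat(V) - p, 1/3 + 1e-9, V_min)
    Vg = bisect(lambda V: p_hat(V) - p, V_max, 200.0)
    F = lambda V: (8*T/3)*math.log(3*V - 1) + 3/V
    return (F(Vg) - F(Vl)) - p*(Vg - Vl), Vl, Vg

# coexistence pressure lies strictly between the two spinodal pressures
p_star = bisect(lambda p: area_mismatch(p)[0],
                p_hat(V_min) + 1e-6, p_hat(V_max) - 1e-6)
_, Vl, Vg = area_mismatch(p_star)
print(p_star, Vl, Vg)   # for T = 0.9 this lands near 0.65 with Vl ~ 0.6, Vg ~ 2.3
```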


The Scattering Operator \(S\)

The purpose of this post is to demonstrate how the scattering operator \(S:\mathcal H_{\text{incident}}^{\infty}\to\mathcal H_{\text{scattered}}^{\infty}\), also called the \(S\)-operator for short, despite being defined to enact the scattering \(|\psi’_{\infty}\rangle=S|\psi_{\infty}\rangle\) of asymptotic incident waves \(|\psi_{\infty}\rangle\) off a potential \(V\) into asymptotic scattered waves \(|\psi’_{\infty}\rangle\), also encodes information about non-scattering states (e.g. bound states and resonant states) of the very same Hamiltonian \(H=T+V\) from which one is scattering quantum particles off of.

To illustrate this, recall that in one-dimension we have \(4\) distinct asymptotic waves \(|\pm k\rangle_{\pm’\infty}\), defined by the position space wavefunctions \(\langle x|\pm k\rangle_{\pm’\infty}:=e^{\pm ikx}[x\to\pm’\infty]\). We demand that the \(2\) incident waves (from the left and right) individually scatter as \(|k\rangle_{-\infty}\mapsto S|k\rangle_{-\infty}=r_{\rightarrow}|-k\rangle_{-\infty}+t_{\rightarrow}|k\rangle_{\infty}\) and \(|-k\rangle_{\infty}\mapsto S|-k\rangle_{\infty}=t_{\leftarrow}|-k\rangle_{-\infty}+r_{\leftarrow}|k\rangle_{\infty}\) as the picture below shows:

Therefore, with respect to the basis \(|k\rangle_{-\infty},|-k\rangle_{\infty}\) for the space of incident/ingoing waves \(\mathcal H_{\text{incident}}^{\infty}\) and the basis \(|-k\rangle_{-\infty},|k\rangle_{\infty}\) for the space of scattered/outgoing waves \(\mathcal H_{\text{scattered}}^{\infty}\), the matrix representation of \(S\), called the \(S\)-matrix, looks like:

\[[S]_{|k\rangle_{-\infty},|-k\rangle_{\infty}}^{|-k\rangle_{-\infty},|k\rangle_{\infty}}=\begin{pmatrix}r_{\rightarrow}&t_{\leftarrow}\\t_{\rightarrow}&r_{\leftarrow}\end{pmatrix}\]

Recall that we have some unintuitive results on 1D scattering, namely \(t_{\leftarrow}=t_{\rightarrow}:=t\) (not to be confused with time!) and \(r_{\rightarrow}t^*+r_{\leftarrow}^*t=0\). This just says that the columns (equivalently the rows) of the \(S\)-matrix are complex-orthogonal to each other. The orthogonality condition also gives that \(|r_{\leftarrow}|=|r_{\rightarrow}|\), hence by matching asymptotic probability currents \(J_{-\infty}=J_{\infty}\) one obtains \(|r_{\rightarrow}|^2+|t|^2=|r_{\leftarrow}|^2+|t|^2=1\). This discussion proves that \([S]_{|k\rangle_{-\infty},|-k\rangle_{\infty}}^{|-k\rangle_{-\infty},|k\rangle_{\infty}}\in U(2)\) has spectrum \(\Lambda_S\subseteq U(1)\) with complex-orthogonal eigenvectors. One can write such eigenvalues in terms of two real-valued phase shifts \(\delta_{\pm}=\delta_{\pm}(k)\) dependent on the incident momentum \(k\) of the beam of particles, i.e. \(e^{2i\delta_{\pm}}=\frac{r_{\leftarrow}+r_{\rightarrow}}{2}\pm\sqrt{\left(\frac{r_{\leftarrow}-r_{\rightarrow}}{2}\right)^2+t^2}\) with respective eigenvectors that I won’t bother writing down yet.

At this point, to simplify life we will work with an even potential \(V(-x)=V(x)\) so that \(r_{\leftarrow}=r_{\rightarrow}:=r\) coincide. Then the \(S\)-matrix becomes a \(2\times 2\) circulant matrix so we can immediately write down its normalized eigenvectors (this is why I didn’t bother writing them down yet above!) as the two Fourier modes \((1/\sqrt{2},\pm 1/\sqrt{2})^T\). The corresponding eigenvalues of the \(S\)-matrix are straightforwardly \(e^{2i\delta_{\pm}}=r\pm t\), either from the earlier formula or by acting on those Fourier mode eigenvectors (recall that for \(t\in\textbf C\) we have the usual two branches for the square root \(\sqrt{t^2}=\pm t\)). Put another way, the \(S\)-operator is diagonalized in this basis:

\[[S]_{|k\rangle_{-\infty}+|-k\rangle_{\infty}, |k\rangle_{-\infty}-|-k\rangle_{\infty}}^{|-k\rangle_{-\infty}+|k\rangle_{\infty}, |-k\rangle_{-\infty}-|k\rangle_{\infty}}=e^{2i\text{diag}(\delta_+,\delta_-)}\]

Intuitively, if one views \(r,t\in\textbf C\cong\textbf R^2\) as vectors in the complex plane, then for a symmetric potential they must be \(\textbf R^2\)-orthogonal to each other, consistent with their \(\textbf C^2\)-orthogonality \(rt^*+(rt^*)^*=2\Re(rt^*)=0\Rightarrow |r\pm t|^2=|r|^2\pm 2\Re(rt^*)+|t|^2=|r|^2+|t|^2=1\) so that \(r\pm t=e^{2i\delta_{\pm}}\in U(1)\) as claimed all along:

In hindsight, it’s actually not too surprising why the eigenstates of the \(S\)-operator were given by the Fourier modes \(|k\rangle_{-\infty}\pm|-k\rangle_{\infty}\) when \(V(-x)=V(x)\). Physically, these look like simultaneously throwing in two identical beams of particles from both the left and right, just that one setup has even parity while the other setup has odd parity. After all, because \([H,\Pi]=0\) conserves parity, the interaction of the particles with \(H\) (in other words the scattering!) has to preserve parity asymptotically in the past and future. And those particular linear combinations are essentially the only way to construct \(\Pi\)-eigenstates out of the incident waves \(|k\rangle_{-\infty},|-k\rangle_{\infty}\) which don’t have definite parity.
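These claims about the \(S\)-matrix for a symmetric potential are easy to check numerically. The sketch below uses an arbitrary illustrative parametrization of \(r\) and \(t\) (not derived from any particular \(V(x)\)): \(t=e^{i\phi}\cos\theta\), \(r=ie^{i\phi}\sin\theta\) automatically satisfies \(|r|^2+|t|^2=1\) and \(\Re(rt^*)=0\).

```python
import cmath, math

theta, phi = 0.7, 1.3          # arbitrary illustrative parameters
t = cmath.exp(1j*phi) * math.cos(theta)
r = 1j * cmath.exp(1j*phi) * math.sin(theta)
S = [[r, t], [t, r]]           # circulant S-matrix for a symmetric potential

unitarity = abs(abs(r)**2 + abs(t)**2 - 1)     # should vanish
orthogonality = abs((r * t.conjugate()).real)  # should vanish
# act with S on the even Fourier mode (1, 1)/sqrt(2); it should be an eigenvector
# with eigenvalue r + t (and the odd mode would give r - t)
v = (1/math.sqrt(2), 1/math.sqrt(2))
Sv = (S[0][0]*v[0] + S[0][1]*v[1], S[1][0]*v[0] + S[1][1]*v[1])
residual = abs(Sv[0] - (r + t)*v[0]) + abs(Sv[1] - (r + t)*v[1])
print(unitarity, orthogonality, residual, abs(r + t), abs(r - t))
```

The last two printed numbers are the moduli of the eigenvalues \(r\pm t\), which indeed come out on the unit circle.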

Example: So far we’ve gotten away with proving several general results on one-dimensional scattering, assuming only a symmetric potential \(V(-x)=V(x)\). However, to actually compute \(S\)-matrices (regardless of which basis of scattering states you want to work in) we ultimately have to say what \(V(x)\) we’re talking about! To that effect, consider our favorite finite potential well of width \(L>0\) and depth \(V_0>0\). The goal is to compute the phase shifts \(\delta_{\pm}(k)\) that completely characterize all scattering phenomena off this potential. If we define \(|\psi_{\text{well}}^{\pm}\rangle:=\alpha_{\pm}(|k’\rangle_{[-L/2,L/2]}\pm|-k’\rangle_{[-L/2,L/2]})\) to be the compactly supported even/odd state in the finite potential well region, with increased momentum \(k’=\sqrt{2m(E+V_0)}/\hbar>k=\sqrt{2mE}/\hbar\), then the even/odd-parity total states \(|\psi_{\pm}\rangle\) are given by the superpositions \(|\psi_{\pm}\rangle=|k\rangle_{-\infty}\pm|-k\rangle_{\infty}+e^{2i\delta_{\pm}(k)}(|-k\rangle_{-\infty}\pm|k\rangle_{\infty})+|\psi_{\text{well}}^{\pm}\rangle\). Now because \(\langle x|\psi_{\pm}\rangle\) and \(\langle x|K|\psi_{\pm}\rangle\) should both be continuous at \(x=L/2\) (and by parity also at \(x=-L/2\)), it follows that \(\langle x|K\ln|\psi_{\pm}\rangle\) also needs to be continuous at \(x=\pm L/2\). Approaching \(x=L/2\) from the left means:

\[\langle L/2|K\ln|\psi_{\pm}\rangle=k’\frac{\langle L/2|k’\rangle\mp\langle L/2|-k’\rangle}{\langle L/2|k’\rangle\pm\langle L/2|-k’\rangle}\]

Whereas approaching \(x=L/2\) from the right means:

\[\langle L/2|K\ln|\psi_{\pm}\rangle=k\frac{\mp\langle L/2|-k\rangle\pm e^{2i\delta_{\pm}(k)}\langle L/2|k\rangle}{\pm\langle L/2|-k\rangle\pm e^{2i\delta_{\pm}(k)}\langle L/2|k\rangle}\]

Treating this as a Möbius transformation, one can solve for the \(S\)-operator eigenvalues:

\[e^{2i\delta_{\pm}(k)}=\frac{(k+k’\frac{\langle L/2|k’\rangle\mp\langle L/2|-k’\rangle}{\langle L/2|k’\rangle\pm\langle L/2|-k’\rangle})(k-k\frac{\mp\langle L/2|-k\rangle\pm\langle L/2|k\rangle}{\pm\langle L/2|-k\rangle\pm\langle L/2|k\rangle})}{(k-k’\frac{\langle L/2|k’\rangle\mp\langle L/2|-k’\rangle}{\langle L/2|k’\rangle\pm\langle L/2|-k’\rangle})(k+k\frac{\mp\langle L/2|-k\rangle\pm \langle L/2|k\rangle}{\pm\langle L/2|-k\rangle\pm \langle L/2|k\rangle})}\]

These simplify respectively to:

\[e^{2i\delta_{+}(k)}=-e^{-ikL}\frac{k’\tan(k’L/2)-ik}{k’\tan(k’L/2)+ik}\]

\[e^{2i\delta_{-}(k)}=e^{-ikL}\frac{k\tan(k’L/2)-ik’}{k\tan(k’L/2)+ik’}\]

Hence we can read off the phase shifts:

\[\delta_{+}(k)=-\frac{kL}{2}+\arctan\left(\frac{k'}{k}\tan\frac{k'L}{2}\right)\]

\[\delta_-(k)=-\frac{kL}{2}-\arctan\left(\frac{k’}{k}\cot\frac{k’L}{2}\right)\]

where \(k'=k'(k)\) through the energy \(E=\frac{\hbar^2k^2}{2m}=\frac{\hbar^2k'^2}{2m}-V_0\). The phase shift difference is obtained either by dividing the original exponentials or by applying the arctan addition identity:

\[\delta_+(k)-\delta_-(k)=\arctan\left(\frac{2kk'}{(k^2-k'^2)\sin k'L}\right)\pmod\pi\]
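Since the branch bookkeeping in these arctans is easy to fumble, here is a quick numerical sanity check (a sketch in natural units \(\hbar=m=1\) with assumed well parameters): it evaluates the two \(S\)-operator eigenvalues directly from their exponential forms above, builds \(T=|t|^2=|(e^{2i\delta_+}-e^{2i\delta_-})/2|^2\), and compares against the textbook finite-well transmission coefficient.

```python
import numpy as np

hbar = m = 1.0          # natural units (assumption)
L, V0 = 1.0, 10.0       # assumed well width and depth

def kp_of(k):           # momentum inside the well
    return np.sqrt(k**2 + 2*m*V0/hbar**2)

def S_even(k):          # e^{2i delta_+(k)}
    kp = kp_of(k)
    t = kp*np.tan(kp*L/2)
    return -np.exp(-1j*k*L)*(t - 1j*k)/(t + 1j*k)

def S_odd(k):           # e^{2i delta_-(k)}
    kp = kp_of(k)
    t = k*np.tan(kp*L/2)
    return np.exp(-1j*k*L)*(t - 1j*kp)/(t + 1j*kp)

k = 2.3
T_parity = abs((S_even(k) - S_odd(k))/2)**2       # |t|^2 = sin^2(delta_+ - delta_-)
kp = kp_of(k)
T_exact = 1/(1 + ((kp**2 - k**2)**2/(4*k**2*kp**2))*np.sin(kp*L)**2)
print(T_parity, T_exact)   # the two agree
```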

Finally, although the eigenvalues \(e^{2i\delta_{\pm}(k)}\) of the \(S\)-operator lie in \(U(1)\) when \(k\in\textbf R\), if we instead allow \(k\in\textbf C\) to roam the entire complex momentum plane, then the exponential \(e^{2i\delta_{\pm}(k)}\) is no longer restricted to lie in \(U(1)\). In particular, focus for a moment on just letting \(k\in i\textbf R\subseteq\textbf C\) roam the imaginary axis. We know from experience that the \(E<0\) bound state wavefunctions of the finite potential well (of which one even-parity bound ground state always exists) look like the pure harmonics of the infinite potential well but with an exponential leakage \(\propto e^{\pm \kappa x}[x\to\mp \infty]\) into the classically forbidden regions. These real exponentials \(e^{\pm\kappa x}\) are clearly very similar-looking to the complex exponentials \(e^{\pm ikx}\) that the scattering state wavefunctions involve. This is therefore how I would motivate this trick of looking at \(k=i\kappa’\) for \(\kappa’\in\textbf R\). We also know that the bound states of the finite potential well have to have definite parity (unlike the scattering states) so it seems plausible that maybe these bound states can emerge from our scattering states \(|\psi_{\pm}\rangle\) constructed earlier that do happen to have definite parity \(\pm\) as well. Therefore, with the substitution \(k=i\kappa’\), the total even/odd-parity scattering states \(|\psi_{\pm}\rangle\) transform asymptotically into looking like:

\[\langle x|\psi_{\pm}\rangle= (e^{-\kappa’ x}+e^{2i\delta_{\pm}(i\kappa’)}e^{\kappa’ x})[x\to-\infty]\pm(e^{\kappa’ x}+e^{2i\delta_{\pm}(i\kappa’)}e^{-\kappa’ x})[x\to\infty]\]

But clearly these candidate bound states \(|\psi_{\pm}\rangle\) (whether even or odd) can only hope to be normalizable (as required for bound states, unlike scattering states!) if \(\kappa'<0\). Therefore, define \(\kappa:=-\kappa'>0\):

\[\langle x|\psi_{\pm}\rangle= (e^{\kappa x}+e^{2i\delta_{\pm}(-i\kappa)}e^{-\kappa x})[x\to-\infty]\pm(e^{-\kappa x}+e^{2i\delta_{\pm}(-i\kappa)}e^{\kappa x})[x\to\infty]\]

In order to kill the non-normalizable exponentials, we must therefore find values of \(k=-i\kappa\) on the lower imaginary axis such that the \(S\)-operator eigenvalues vanish \(e^{2i\delta_{\pm}(k)}=0\). More precisely, if we want to realize our dream of \(|\psi_+\rangle\) being an even parity bound state, then we need:

\[e^{2i\delta_+(-i\kappa)}=-e^{-\kappa L}\frac{k’\tan(k’L/2)-\kappa}{k’\tan(k’L/2)+\kappa}=0\]

where now \(\kappa^2+k'^2=2mV_0/\hbar^2\), constraining \((\kappa,k')\) to lie on a circle \(k'-i\kappa\in \frac{\sqrt{2mV_0}}{\hbar}U(1)\) in the complex momentum plane \(\textbf C\). Meanwhile, the condition above is satisfied iff the numerator vanishes, i.e. \(\kappa=k'\tan(k'L/2)\).

This is the usual transcendental system of simultaneous equations for \((\kappa,k’)\in (0,\infty)^2\) that need to be solved to find even-parity bound states of the finite potential well. Graphically, it is clear that there always exists at least one ground state solution with possibly more depending on the width \(L\) and depth \(V_0\) of the potential.
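The graphical solution is easy to reproduce numerically; here is a minimal stdlib-only sketch (natural units \(\hbar=m=1\), with an assumed \(L=1\), \(V_0=50\)) that brackets each positive-\(\tan\) branch and bisects for the even-parity roots:

```python
import math

hbar = m = 1.0                      # natural units (assumption)
L, V0 = 1.0, 50.0                   # assumed width and depth
k0 = math.sqrt(2*m*V0)/hbar         # radius of the constraint circle kappa^2 + k'^2 = k0^2

def f(kp):                          # even-parity condition: k' tan(k'L/2) - kappa = 0
    return kp*math.tan(kp*L/2) - math.sqrt(k0**2 - kp**2)

def bisect(a, b, n=100):
    fa = f(a)
    for _ in range(n):
        c = (a + b)/2
        if (f(c) > 0) == (fa > 0):
            a, fa = c, f(c)
        else:
            b = c
    return (a + b)/2

roots, n, eps = [], 0, 1e-9
while 2*n*math.pi/L < k0:           # scan each branch where tan(k'L/2) > 0
    a = 2*n*math.pi/L + eps
    b = min((2*n + 1)*math.pi/L, k0) - eps
    if f(a)*f(b) < 0:
        roots.append(bisect(a, b))
    n += 1

energies = [(hbar*kp)**2/(2*m) - V0 for kp in roots]
print(roots, energies)              # two even-parity bound states for these parameters
```

As expected, there is always at least one even-parity root (the ground state), with more appearing as \(V_0L^2\) grows.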

For odd-parity bound states, we require \(e^{2i\delta_-(-i\kappa)}=0\), which instead leads to \(k'=-\kappa\tan(k'L/2)\). Graphically, this is a mirroring of the even-parity setup about the \(\Im =-\Re \) diagonal.

Resonances in Particle Physics

Recall from classical physics that the \(n\)-th harmonic for \(n\in\textbf Z^+\) on say a violin string of length \(L\) or inside a pipe of length \(L\) with nodes at both ends has wavelength \(\lambda=\lambda_n\) quantized according to \(n\lambda=2L\). Loosely, one might say that such harmonics “maximize transmission” of the instrument’s pleasant sound. It turns out the same principle is true when it comes to scattering quantum particles off a finite potential well of length \(L\); if the de Broglie wavelength \(\lambda'=2\pi/k'\) inside the well satisfies \(n\lambda'=2L\) (equivalently \(k'L=n\pi\)), then the transmission probability is maximized \(T=|t|^2=|(e^{2i\delta_+(k)}-e^{2i\delta_-(k)})/2|^2=\sin^2(\delta_+(k)-\delta_-(k))=1\rightarrow \delta_+-\delta_-\equiv \pi/2\pmod \pi\) (alternatively, from the earlier geometric picture it is clear that \(|t|=|\sin(\delta_+(k)-\delta_-(k))|\)). Equivalently, if one scatters in quantum particles exactly with energy \(E=\frac{n^2\pi^2\hbar^2}{2mL^2}-V_0>0\) for some \(n\in\textbf Z^+\), then the particle is guaranteed to transmit through with no reflection \(R=|r|^2=0\). Note that these energies are precisely those of the bound states in an infinite potential well whose bottom sits at the bottom \(-V_0\) of the finite well (I’m not sure if this is a coincidence or something more universal about de Broglie quantization). This is one form of resonance in quantum mechanical scattering experiments, which one can think of as transmission resonances.
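As a quick numerical illustration of such a transmission resonance (natural units \(\hbar=m=1\), assumed well parameters), plugging the resonance condition \(k'L=n\pi\) into the exact finite-well transmission coefficient indeed returns \(T=1\):

```python
import math

hbar = m = 1.0                 # natural units (assumption)
L, V0 = 1.0, 10.0              # assumed well parameters
n = 2                          # pick a harmonic whose energy is positive
kp = n*math.pi/L               # inside momentum at resonance: k'L = n*pi
E = (hbar*kp)**2/(2*m) - V0    # resonance energy (about 9.74 here, safely positive)
k = math.sqrt(2*m*E)/hbar
T = 1/(1 + ((kp**2 - k**2)**2/(4*k**2*kp**2))*math.sin(kp*L)**2)
print(E, T)                    # T = 1 up to rounding since sin(n*pi) = 0
```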

However, there is another sense of the word “resonance” as used in particle physics which is related to the above discussion. Specifically, a resonant state of a quantum system is one which is metastable, i.e. unstable in the long run \(t\to\infty\). Resonant states are thus a kind of hybrid of a bound state and a scattering state. What kind of quantum system admits resonant states? The finite potential well, it turns out, doesn’t. Instead, when asked to think about a potential \(V(x)\) admitting a resonant state, one’s gut reaction should be to visualize a trapping environment such as:

Here, the intuition is that there are no bound (i.e. permanently stable) states because the particle can quantum tunnel through the walls of the trap (don’t call it a well anymore!). Hence, the next best thing you can get is a resonant state.

We can idealize the above picture by considering a double Dirac delta potential trap \(V(x)=V_0L(\delta(x-L/2)+\delta(x+L/2))\) of width \(L>0\) and height \(V_0>0\) (the factor of \(L\) is needed to compensate for the fact that delta functions always have the inverse dimension of their argument; this ensures \(V_0\) has units of energy). Integrating the Schrodinger equation in an \(\varepsilon\)-ball around the spike at \(x=L/2\) yields the magnitude of the first derivative’s discontinuity \(\psi'(L/2^+)-\psi'(L/2^-)=\frac{2mV_0 L}{\hbar^2}\psi(L/2)\), where the quantity \(\psi(L/2)\) is well-defined because \(\psi(x)\) itself will be continuous at \(x=L/2\) (indeed, continuity of \(\psi\) is the other condition we need to patch). Since this potential is again even \(V(-x)=V(x)\), we should scatter in particles from the left and right simultaneously and patch only at \(x=L/2\); looking at the even-parity scattering state for now (the odd case is similar):

\[\alpha(e^{ikL/2}+e^{-ikL/2})=e^{-ikL/2}+e^{2i\delta_+(k)+ikL/2}\]

\[-ike^{-ikL/2}+ike^{2i\delta_+(k)+ikL/2}-ik\alpha(e^{ikL/2}-e^{-ikL/2})=\frac{2mV_0 L}{\hbar^2}\left(e^{-ikL/2}+e^{2i\delta_+(k)+ikL/2}\right)\]

Isolating for the eigenvalue \(e^{2i\delta_+(k)}\) yields:

\[e^{2i\delta_+(k)}=-e^{-ikL}\frac{k\tan(kL/2)-\tilde k-ik}{k\tan(kL/2)-\tilde k+ik}\]

where \(\tilde k:=2mV_0 L/\hbar^2\). Unlike the finite potential well, the Dirac double delta potential trap admits no bound states because there does not exist any \(k\) on the lower imaginary axis such that \(e^{2i\delta_+(k)}=0\). To prove this, write \(k=-i\kappa\) for \(\kappa>0\) so that we need to solve the real equation \(\tanh\frac{\kappa L}{2}=-1-\frac{\tilde k}{\kappa}\). But the range of \(\tanh\) is \((-1,1)\) and for \(\kappa>0\) the right-hand side is always less than \(-1\), hence there can be no solution for \(\kappa>0\) (it turns out there are solutions for \(\kappa<0\) whenever \(\tilde kL\leq\approx 0.5569290855\) where the constant satisfies the transcendental equation \(\tanh\left(\frac{1}{2}+\frac{x}{4}\right)=1-\frac{1}{\frac{1}{x}-\frac{1}{2}}\) but we already saw these were unphysical). Of course, this non-existence of bound states is what we had expected all along. However, there do exist complex momenta \(k\in\textbf C-i\textbf R\) not on the imaginary axis for which the phase eigenvalue \(e^{2i\delta_+(k)}=0\) does vanish. But does that have any associated physical meaning? Well, just like the bound state case \(k=-i\kappa\) led to the negative energy \(E=-\hbar^2\kappa^2/2m\) characteristic of bound states, a general complex momentum \(k=\Re k+i\Im k\) leads to a complex energy \(E=\frac{\hbar^2(\Re^2 k-\Im^2 k)}{2m}+i\frac{\hbar^2}{m}\Re k\Im k\). To see what this means, consider that an \(H\)-eigenstate \(|E\rangle\) with such a complex energy \(E\) would evolve in time as:

\[|E(t)\rangle=e^{-iEt/\hbar}|E(0)\rangle=e^{-i\hbar(\Re^2 k-\Im^2 k)t/2m}e^{\hbar\Re k\Im k t/m}|E(0)\rangle\]

Thus, such an \(H\)-eigenstate \(|E(t)\rangle\) oscillates at angular frequency \(\omega=\hbar(\Re^2 k-\Im^2 k)/2m\). If we assume that \(\Re k\) and \(\Im k\) have opposite signs \(\Re k\Im k<0\) (i.e. \(k\in\textbf C\) either lives in the second or fourth quadrant), then \(|E(t)\rangle\) also decays (rather than grows) exponentially in amplitude with time constant \(\tau=-m/\hbar\Re k\Im k>0\). This turns out to correspond to a resonant state with \(\tau\) representing the typical time scale for which one can trap the particle before it tunnels away. It is common to define the energy width \(\Gamma:=-2\hbar^2\Re k\Im k/m\) of such a metastable resonant state so that \(\Gamma\tau=2\hbar\) has a Heisenberg form (the reason for this name has to do with the Breit-Wigner distribution in particle physics).

So assuming we’ve found some non-imaginary complex momentum \(k=\Re k+i\Im k\in\textbf C-i\textbf R\) for which the outgoing waves are suppressed \(e^{2i\delta_+(k)}=0\), recall this means that the even-parity scattering state (incorporating the time dependence) reduces to looking like \(\langle x|\psi_+(t)\rangle\to e^{-iEt/\hbar+ikx}=e^{i(\Re k x-\hbar(\Re^2k-\Im^2k)t/2m)}e^{-\Im k(x-\hbar\Re k t/m)}\) as \(x\to -\infty\) and \(\langle x|\psi_+(t)\rangle\to e^{-iEt/\hbar-ikx}=e^{-i(\Re k x+\hbar(\Re^2k-\Im^2k)t/2m)}e^{\Im k(x+\hbar\Re k t/m)}\) as \(x\to\infty\). But this simply describes the wavefunction of a particle that has tunneled out of the trap, moving away in both directions \(x\to\pm\infty\) from the trap at speed \(v=\hbar|\Re k|/m\). Such a wavefunction \(|\psi_+\rangle\notin L^2(\textbf R\to\textbf C,dx)\) is not normalizable as is typical of resonant states, analogous to scattering states (for instance, despite being an \(H\)-eigenstate, its complex energy eigenvalue \(E\in\textbf C\) violates the supposed Hermiticity of the Hamiltonian \(H\), so such a state \(|\psi_+\rangle\) cannot belong to the state space \(L^2\)).

So then back to the double Dirac delta trap, we expect intuitively it should have an even resonant state, which mathematically means that its \(S\)-operator eigenvalue \(e^{2i\delta_+(k)}=-e^{-ikL}\frac{k\tan(kL/2)-\tilde k-ik}{k\tan(kL/2)-\tilde k+ik}\) should vanish somewhere in quadrant \(2\) or \(4\) of the complex \(k\)-plane. Indeed, this means we expect the numerator to vanish \(k\tan(kL/2)=\tilde k+ik\). One can check that this is equivalent to the requirement \(e^{-ikL}=-1-\frac{2ik}{\tilde k}\). If the potential were infinitely trapping so that \(\tilde k\to\infty\), then we would have real momenta \(k=k_n=\frac{\pi}{L}+\frac{2\pi n}{L}\in\textbf R\). These are bound states because an infinitely strong double Dirac delta trap looks like an infinite potential well (at least in the trapping region \(x\in(-L/2,L/2)\); asymptotically all incident waves are reflected at the walls and \(e^{2i\delta_+(k)}=-e^{-ikL}\neq 0\) with the minus sign due to the phase shift of a reflection. Another point is that we previously saw bound states should have \(k\) living in the lower imaginary axis, but here \(k\in\textbf R\). This is a pathological exception to the usual rule since ordinarily for finite \(\tilde k\) we didn’t expect any bound states in the first place but an infinite trapping potential forbids particles from tunneling out, so would-be resonant states transmogrify into bound states in this artificial limit \(\tilde k\to\infty\)). Now suppose that the potential trapping strength \(\tilde k\) is still strong but finite (this is the limit we work in). Focus on the \(n=0\) state for which \(k=\pi/L\). Its position space wavefunction in the trap looks just like the fundamental harmonic on a violin string. We want to know where this initially real \(k=\pi/L\) instead ends up in \(\textbf C\).
To first order in \(1/\tilde k\), taking logs and using \(\ln(1+x)\approx x-x^2/2\), the condition becomes \(-ikL\approx -i\pi+2ik/\tilde k\Rightarrow k\approx \frac{\pi}{L}(1-2/\tilde kL)\). To second order, \(-ikL\approx -i\pi+2ik/\tilde k+2k^2/\tilde k^2\); we can take the first-order solution for \(k\) and substitute in for the \(k^2\) term, keeping only terms to order \(1/\tilde k^2\), and end up getting \(k\approx\frac{\pi}{L}(1-2/\tilde kL+4/\tilde k^2L^2)+\frac{2\pi^2}{\tilde k^2L^3}i\), neglecting real and imaginary terms of order \(O_{\tilde k\to\infty}(1/\tilde k)^3\). This shows this zero moving toward the left and up into the first quadrant of the complex momentum plane \(\textbf C\), where \(\Re k\Im k>0\). But since the numerator \(f(k):=k\tan(kL/2)-\tilde k-ik\) satisfies \(f(-\bar k)=\overline{f(k)}\), its zeros come in mirrored pairs \(\{k,-\bar k\}\); the partner zero near \(-\pi/L\) therefore sits in the second quadrant with \(\Re k\Im k<0\), and that is the resonant (ground) state momentum!
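One can chase this zero numerically: a small sketch (natural units \(\hbar=m=1\), with an assumed strong-but-finite \(\tilde kL=100\)) running Newton's method on the numerator of \(e^{2i\delta_+(k)}\), seeded at the \(\tilde k\to\infty\) momentum near \(\pi/L\):

```python
import math, cmath

hbar = m = 1.0                 # natural units (assumption)
L, ktilde = 1.0, 100.0         # assumed strong but finite trapping strength

def f(k):                      # numerator of e^{2i delta_+(k)}
    return k*cmath.tan(k*L/2) - ktilde - 1j*k

def fprime(k, h=1e-7):         # f is analytic, so a real-step central difference suffices
    return (f(k + h) - f(k - h))/(2*h)

k = complex(math.pi/L*(1 - 2/(ktilde*L)), 0.0)   # first-order seed, just off the tan pole
for _ in range(40):
    k -= f(k)/fprime(k)

print(k)   # a zero of the numerator slightly off the real axis near pi/L
```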

Posted in Blog | Leave a comment

Nearly Free Electrons

Problem #\(1\): Derive the band structure of the \(1\)D nearly free electron model.

Solution #\(1\):

In the \(1\)D extended zone scheme, the band structure (a butchering of the free particle parabolic dispersion) looks like:

or in the \(1\)D reduced zone scheme:

Problem #\(2\): Comment on the group velocity and effective mass behaviour of this band structure.

Solution #\(2\): The group velocity of an electron wavepacket provides \(1^{\text{st}}\)-derivative information about the dispersion (here \(p=\hbar k\)):

\[v:=\frac{\partial E}{\partial p}\]

One heuristic justification for the electron group velocity flattening out \(v\to 0\) at zone boundaries \(k=n\pi/a\) is that this ensures upon backfolding into the \(1^{\text{st}}\) Brillouin zone in the reduced zone scheme that the group velocity \(v=v(k)\) remains a continuous function of \(k\) at \(k=n\pi/a\).

Meanwhile, the effective mass provides \(2^{\text{nd}}\) derivative information about the dispersion:

\[m^*=\left(\frac{\partial^2 E}{\partial p^2}\right)^{-1}\]

(i.e. loosely one can think of \(m^*\) as a radius of curvature at a point on the dispersion, being the reciprocal of a second derivative). For the electron it is positive near the bottom of a band but negative near the top, whereas it is the other way around for holes.
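To make the zone-boundary behaviour concrete, here is a minimal sketch of the standard two-plane-wave (nearly free electron) model in natural units \(\hbar=m=1\), with an assumed lattice constant and Fourier component \(V_G\) of the weak periodic potential; the lower band flattens out (\(v\to 0\)) at \(k=\pi/a\), and the gap there is \(2|V_G|\):

```python
import math

hbar = m = 1.0                 # natural units (assumption)
a = 1.0                        # assumed lattice constant
G = 2*math.pi/a                # shortest reciprocal lattice vector
VG = 0.3                       # assumed Fourier component of the weak potential

def eps(k):                    # free-particle dispersion
    return (hbar*k)**2/(2*m)

def bands(k):                  # diagonalize the 2x2 central-equation block for {k, k-G}
    mean = (eps(k) + eps(k - G))/2
    half = (eps(k) - eps(k - G))/2
    split = math.sqrt(half**2 + VG**2)
    return mean - split, mean + split

kB = math.pi/a                 # zone boundary
h = 1e-5
v = (bands(kB + h)[0] - bands(kB - h)[0])/(2*h*hbar)   # group velocity dE/dp of lower band
E_minus, E_plus = bands(kB)
print(v, E_plus - E_minus)     # v vanishes at the boundary; the gap equals 2*VG = 0.6
```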

Digression on Floquet Theory

There is another, more abstract way to qualitatively see why a periodic potential \(V(x+\Delta x)=V(x)\) gives rise to Brillouin zones and energy bands.

Consider an \(N\)-dimensional linear dynamical system \(\dot{\textbf x}(t)=A(t)\textbf x(t)\) where \(A(t)\) can be time-dependent. In general, one expects this to have \(N\) linearly independent solutions \(\textbf x_1(t),\textbf x_2(t),…,\textbf x_N(t)\). A fundamental matrix solution \(X(t)\) to the linear dynamical system is any \(N\times N\) matrix comprised of a basis of \(N\) such linearly independent solutions \(X(t)=\left(\textbf x_1(t),\textbf x_2(t),…,\textbf x_N(t)\right)\); in other words, the linear independence implies that \(X(t)\in GL_N(\textbf R)\) is invertible \(\det X(t)\neq 0\) at all times \(t\in\textbf R\). Think of the fundamental matrix solution \(X(t)\) as an \(N\)-dimensional parallelotope evolving in time \(t\). It can be checked that any fundamental matrix solution satisfies the same system of ODEs \(\dot{X}(t)=A(t)X(t)\), and that any individual solution \(\textbf x(t)\) must lie in the column space \(\textbf x(t)\in\text{span}_{\textbf R}(\textbf x_1(t),…,\textbf x_N(t))\) of any fundamental matrix solution \(X(t)\) (indeed, this is why it’s called “fundamental”: it provides a basis for the entire solution space of the linear dynamical system). Thus, \(\textbf x(t)=X(t)X^{-1}(0)\textbf x(0)\). If \(X(0)=1_{N\times N}\), then \(X(t)\) is called a principal fundamental matrix solution and in that case we just have \(\textbf x(t)=X(t)\textbf x(0)\). Thus, the principal fundamental matrix solution plays a role analogous to the time evolution operator \(U(t)\) in quantum mechanics, evolving the initial state of a system through time \(t\).

Floquet theory is interested in a special case of the setup above, namely when \(A(t+T)=A(t)\) is a \(T\)-periodic matrix. In this case, it turns out that the solutions \(\textbf x(t)\) themselves will not necessarily be \(T\)-periodic (or periodic at all for that matter), but will nonetheless take on a much simpler form than the otherwise unenlightening time-ordered exponential solution \(\textbf x(t)=\mathcal Te^{\int_0^tA(t’)dt’}\textbf x(0)\) as a Dyson series. What’s that simplification? Let me spoil it first, then prove it afterwards. Recall that I just said \(\textbf x(t)\) won’t necessarily be periodic. But that should feel pretty strange since \(A(t)\) is by assumption \(T\)-periodic so surely \(\textbf x(t)\) should inherit something from this. And that’s exactly the essence of Floquet’s theorem, namely that although \(\textbf x(t)\) itself need not be periodic, its time evolution can always be factorized into a \(T\)-periodic \(N\times N\) matrix \(X_A(t)\) (with the subscript \(A\) to suggest that it inherits its \(T\)-periodicity \(X_A(t+T)=X_A(t)\) from the \(T\)-periodicity of the matrix \(A(t)\)) modulated by an exponential growth/decay or oscillatory “envelope” \(e^{\Lambda t}\) for some time-independent \(N\times N\) (possibly complex) matrix \(\Lambda\in\textbf C^{N\times N}\). In other words, one has the ansatz:

\[\textbf x(t)=X_A(t)e^{\Lambda t}\textbf x(0)\]

To prove Floquet’s theorem, we first ask: although \(\textbf x(t+T)\neq\textbf x(t)\) need not be periodic, how are \(\textbf x(t+T)\) and \(\textbf x(t)\) related to each other? Clearly by a simple \(T\)-time translation \(\textbf x(t+T)=X(t+T)X^{-1}(t)\textbf x(t)\). It makes sense then to focus on understanding these fundamental matrix solutions \(X(t)\) better.

The first key result of Floquet theory is that the time evolution of any fundamental matrix solution \(X(t)\) across a period \(T\) behaves multiplicatively \(X(t+T)=X(t)\tilde X_T\) where the \(N\times N\) monodromy matrix \(\tilde X_T=X^{-1}(t)X(t+T)\) is a conserved quantity of any \(T\)-periodic linear dynamical system:

\[\dot{\tilde X}_T=-X^{-1}(t)A(t)X(t+T)+X^{-1}(t)A(t+T)X(t+T)=0\]

It is thus common to initialize the monodromy matrix \(\tilde X_T=X^{-1}(0)X(T)\) and write \(X(t+T)=X(t)X^{-1}(0)X(T)\). After \(n\) periods elapse we have \(X(t+nT)=X(t)\tilde X_T^n\) so as a corollary for instance the hypervolume \(\det X(t)\) of the parallelotope grows stroboscopically in time \(t\) as a geometric sequence \(\det X(t+nT)=(\det \tilde X_T)^n\det X(t)\). Indeed, using Jacobi’s formula one can check that \(\frac{d}{dt}\det X(t)=\text{Tr}A(t)\det X(t)\) so \(\det X(t)=e^{\int_0^t\text{Tr}A(t')dt'}\det X(0)\) is consistent with \(X(t+T)=X(t)\tilde X_T\) provided the monodromy matrix \(\tilde X_T\) and \(A(t)\) are related by \(\det\tilde X_T=e^{\int_0^T\text{Tr} A(t)dt}\) (despite the suggestion, note that in general \(\tilde X_T\neq e^{\int_0^T A(t)dt}\)! Everything would be simpler if this were true but alas it isn’t).
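These determinant identities are easy to verify numerically. A sketch (with an assumed damped Mathieu equation \(\ddot x+c\dot x+(a+b\cos t)x=0\) as the \(T\)-periodic system): integrate \(\dot X=A(t)X\) with RK4 over one period starting from \(X(0)=1\), and check that \(\det\tilde X_T=e^{\int_0^T\text{Tr}A(t)dt}=e^{-cT}\):

```python
import math

a, b, c = 1.0, 0.5, 0.3            # assumed Mathieu-type parameters
T = 2*math.pi                      # period of A(t)

def A(t):                          # x'' + c x' + (a + b cos t) x = 0 in first-order form
    return [[0.0, 1.0], [-(a + b*math.cos(t)), -c]]

def rhs(t, X):                     # matrix ODE: X' = A(t) X
    At = A(t)
    return [[sum(At[i][r]*X[r][j] for r in range(2)) for j in range(2)] for i in range(2)]

def madd(X, Y, s):                 # X + s*Y, elementwise
    return [[X[i][j] + s*Y[i][j] for j in range(2)] for i in range(2)]

X = [[1.0, 0.0], [0.0, 1.0]]       # principal solution, so X(T) is the monodromy matrix
steps = 4000
h = T/steps
for n in range(steps):             # classic RK4 sweep over one period
    t = n*h
    k1 = rhs(t, X)
    k2 = rhs(t + h/2, madd(X, k1, h/2))
    k3 = rhs(t + h/2, madd(X, k2, h/2))
    k4 = rhs(t + h, madd(X, k3, h))
    inc = [[(k1[i][j] + 2*k2[i][j] + 2*k3[i][j] + k4[i][j])/6 for j in range(2)] for i in range(2)]
    X = madd(X, inc, h)

detX = X[0][0]*X[1][1] - X[0][1]*X[1][0]
print(detX, math.exp(-c*T))        # Jacobi/Liouville: det of the monodromy = e^{-cT}
```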

It is therefore intuitively clear that the behavior/stability of any periodic linear dynamical system is sensitive to its monodromy matrix \(\tilde X_T\), specifically whether it “blows up” or “decays” exponentially or oscillates (as would have been one’s first instinct given the periodicity \(A(t+T)=A(t)\) of \(A\))! More precisely, to understand the behavior of powers \(\tilde X_T^n\) of a matrix, it is always a good idea to diagonalize it (a subtle but important property of the monodromy matrix \(\tilde X_T\) proven on page \(52\) here is that although it depends on the choice of fundamental matrix solution \(X(t)\) used to construct it, it will be similar to any other monodromy matrix \(\tilde X’_T\) constructed from any other fundamental matrix solution \(X'(t)\); thus, the eigenvalues of the monodromy matrix \(\tilde X_T\) are properties only of the linear, periodic dynamical system \(\dot X=AX\) and therefore are rightfully called its characteristic multipliers).

Since \(\tilde X_T\in GL_N(\textbf R)\), these characteristic multipliers must be non-zero, and so all such characteristic multipliers can be written as \(e^{\lambda T}\) for some \(\lambda\in\textbf C\) called its characteristic exponent. Although the imaginary part \(\Im\lambda\) of such a characteristic exponent is not unique \(e^{(\lambda+2\pi i/T)T}=e^{\lambda T}\), the real part \(\Re\lambda\) is unique and physically meaningful, being given the name Lyapunov exponent. Specifically, it is clear that if a solution \(\textbf x(t)\) happens to initialize on \(\textbf x(0)=X(0)\tilde{\textbf x}\) where \(\tilde X_T\tilde{\textbf x}=e^{\lambda T}\tilde{\textbf x}\) is an eigenvector of the monodromy matrix \(\tilde X_T\), then one can check that such a solution \(\textbf x(t+T)=e^{\lambda T}\textbf x(t)\) evolves multiplicatively in an analogous manner to the fundamental matrix solution earlier. In particular, if one replaces this \(\textbf x(t)\mapsto \textbf x_A(t):=e^{-\lambda t}\textbf x(t)\), then clearly this new trajectory (but no longer a solution!) \(\textbf x_A(t)\) is \(T\)-periodic \(\textbf x_A(t+T)=e^{-\lambda t}e^{-\lambda T}\textbf x(t+T)=e^{-\lambda t}e^{-\lambda T}e^{\lambda T}\textbf x(t)=e^{-\lambda t}\textbf x(t)=\textbf x_A(t)\). Thus, one can isolate the form of this “eigentrajectory” as \(\textbf x(t)=e^{\lambda t}\textbf x_A(t)\). For \(\Re \lambda<0\) we have \(\lim_{t\to\infty}\textbf x(t)=\textbf 0\), for \(\Re \lambda>0\) we have \(\lim_{t\to\infty}\textbf x(t)=\boldsymbol{\infty}\) while for \(\Re \lambda=0\) the eigentrajectory \(\textbf x(t)\) is in general pseudoperiodic in that it ends up at the same distance \(|\textbf x(t+T)|=|\textbf x(t)|\) (but not necessarily the same point) away from the origin after each period \(t\mapsto t+T\). These criteria emphasize the role played by the Lyapunov exponent \(\Re\lambda\) in the analysis of stability. 
By lining up \(N\) such linearly independent eigentrajectories (we’ll assume that the monodromy matrix \(\tilde X_T\) is diagonalizable), it follows that any fundamental matrix solution \(X(t)\) admits a so-called Floquet normal form \(X(t)=X_{A}(t)e^{\Lambda t}\) where \(X_A(t+T)=X_A(t)\) inherits the \(T\)-periodicity of \(A\) and \(\Lambda\) is a time-independent \(N\times N\) matrix. This therefore proves Floquet’s theorem!

Connection to Periodic Potentials in Quantum Mechanics

Define \(\boldsymbol{\phi}(x):=(\psi(x),\psi'(x))^T\) so that the one-dimensional second-order Schrodinger equation reduces to the first-order linear system:

\[\boldsymbol{\phi}'(x)=A(x)\boldsymbol{\phi}(x)\]

where \(A(x)=\begin{pmatrix}0&1\\2m(V(x)-E)/\hbar^2 & 0\end{pmatrix}\) is \(\Delta x\)-periodic \(A(x+\Delta x)=A(x)\) by virtue of the \(\Delta x\)-periodicity of the potential \(V(x)\) itself. Now notice that clearly \(\text{Tr} A(x)=0+0=0\), so it follows from the earlier general Floquet theory results that \(\det\tilde \Phi_{\Delta x}=1\) where \(\tilde \Phi_{\Delta x}\) is the monodromy matrix of the periodic quantum system. This means that its two eigenvalues \(\varphi_{\pm}\) are given by:

\[\varphi_{\pm}=\frac{\text{Tr}\tilde \Phi_{\Delta x}}{2}\pm\sqrt{\frac{\text{Tr}^2\tilde \Phi_{\Delta x}}{4}-1}\]

Thus, when \(\text{Tr}\tilde \Phi_{\Delta x}<-2\) or \(\text{Tr}\tilde \Phi_{\Delta x}>2\), both \(\varphi_{\pm}\) are real with \(\varphi_+\varphi_-=1\), so one of them has modulus greater than \(1\), indicating a spatially unstable wavefunction \(\psi(x)\) that diverges in a non-normalizable manner as \(x\to\pm\infty\). Therefore, this is saying that there do not exist \(H\)-eigenstates with energy \(E\) such that the corresponding monodromy matrix \(\tilde \Phi_{\Delta x}=\tilde \Phi_{\Delta x}(E)\) has trace less than \(-2\) or greater than \(2\). These are the band gaps between the various energy bands! Of course then, on the other hand when \(\text{Tr}\tilde \Phi_{\Delta x}\in(-2,2)\) is in the zone of spatial stability (the eigenvalues \(\varphi_{\pm}\) then form a unit-modulus complex-conjugate pair, so the solutions stay bounded), these are associated with the energy bands! Finally, exactly when \(\text{Tr}\tilde \Phi_{\Delta x}=\pm 2\) is where one reaches the boundaries separating various Brillouin zones! Admittedly this Floquet analysis is a bit abstract and handwavy but I still think it’s a nice alternative perspective to the more concrete nearly-free electron model.
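This trace criterion can be seen directly in a Kronig-Penney-style example. A sketch (natural units \(\hbar=m=1\), with an assumed square-barrier unit cell): build the monodromy matrix of \((\psi,\psi')\) across one period from constant-potential propagators and scan \(\text{Tr}\tilde\Phi_{\Delta x}(E)\); energies with \(|\text{Tr}|\leq 2\) form the bands, the rest are gaps:

```python
import cmath

hbar = m = 1.0                     # natural units (assumption)
a_well, a_bar, V0 = 0.8, 0.2, 8.0  # assumed unit cell: well + barrier, period 1.0

def prop(E, V, d):                 # propagator of (psi, psi') across a constant potential V
    q = cmath.sqrt(2*m*(E - V))/hbar
    if abs(q) < 1e-12:
        return [[1.0, d], [0.0, 1.0]]
    return [[cmath.cos(q*d), cmath.sin(q*d)/q],
            [-q*cmath.sin(q*d), cmath.cos(q*d)]]

def matmul(A, B):
    return [[A[i][0]*B[0][j] + A[i][1]*B[1][j] for j in range(2)] for i in range(2)]

def trace(E):                      # trace of the monodromy matrix (real for real E)
    M = matmul(prop(E, V0, a_bar), prop(E, 0.0, a_well))
    return (M[0][0] + M[1][1]).real

Es = [0.05*i for i in range(1, 400)]
in_band = [abs(trace(E)) <= 2 for E in Es]
edges = [Es[i] for i in range(1, len(Es)) if in_band[i] != in_band[i-1]]
print(edges)                       # alternating band/gap edges where |Tr| crosses 2
```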

If we let \(\Phi(x)=\begin{pmatrix}\psi_1(x) & \psi_2(x) \\ \psi'_1(x) & \psi'_2(x)\end{pmatrix}\) be a fundamental matrix solution, then of course \(\det\Phi(x)\) is the usual constant Wronskian of two wavefunctions \(\psi_1(x),\psi_2(x)\), and Floquet’s theorem allows us to write \(\Phi(x)=\Phi_V(x)e^{\Lambda x}\) for a \(\Delta x\)-periodic \(2\times 2\) matrix \(\Phi_V(x+\Delta x)=\Phi_V(x)\) whose \(\Delta x\)-periodicity is inherited from the \(\Delta x\)-periodicity of the potential \(V(x)\). This leads to Bloch’s theorem (in one dimension), namely that each such energy eigenstate can be labelled by a crystal momentum \(k\in[-\pi/\Delta x,\pi/\Delta x]\) in the first Brillouin zone and is given by a Bloch state \(\psi_{k}(x)=e^{ikx}\phi_k(x)\) with \(\Delta x\)-periodic \(\phi_k(x+\Delta x)=\phi_k(x)\).

Posted in Blog | Leave a comment

Fermi’s Golden Rule

Problem: Before developing Fermi’s golden rule, it is necessary to first lay out the general framework of time-dependent perturbation theory, of which Fermi’s golden rule is a special case. To this effect, begin by considering the usual (Schrodinger picture) Hamiltonian decomposition \(H=H_0+V\), and obtain the “fundamental theorem of (\(1^{\text{st}}\)-order) time-dependent perturbation theory” for the transition amplitude between an initial \(H_0\)-eigenstate \(|i\rangle\) at \(t=0\) and a final (Schrodinger picture) \(H_0\)-eigenstate \(|f\rangle\) with distinct energy \(E_f\neq E_i\) as a function of the elapsed time \(t\):

\[\langle f|\mathcal T\exp\left(-\frac{i}{\hbar}\int_0^tdt’H(t’)\right)|i\rangle\sim h^{-1}V_{fi}(\omega_{fi})*_{\omega_{fi}}e^{i\omega_{fi}t/2}t\text{sinc}\frac{\omega_{fi}t}{2}\]

where \(\hbar\omega_{fi}:=E_f-E_i\), \(V_{fi}(\omega_{fi}):=\langle f|V(\omega_{fi})|i\rangle\) and the linear response Fourier transform convention is being used:

\[V(\omega):=\int_{-\infty}^{\infty}dte^{i\omega t}V(t)\]

Solution: First one has to establish the following exact decomposition lemma for the time evolution operator under the perturbed Hamiltonian \(H\):

\[\mathcal T\exp\left(-\frac{i}{\hbar}\int_0^tdt’H(t’)\right)=\exp\left(-\frac{iH_0t}{\hbar}\right)\mathcal T\exp\left(-\frac{i}{\hbar}\int_0^tdt’V_{/H_0}(t’)\right)\]

which makes sense conceptually as that is what the interaction picture is all about (to prove it, show that both sides obey the same \(1^{\text{st}}\)-order ODE \(i\hbar\dot U=HU\) with same initial condition \(U(0)=1\)). Now then, upon taking the matrix element of both sides in the \(H_0\)-eigenbasis, the free evolution piece \(\exp\left(-\frac{iH_0t}{\hbar}\right)\) simply contributes an irrelevant phase \(e^{-iE_ft/\hbar}\) when acting on the bra \(\langle f|\) since of course \(H_0\) cannot promote transitions between its own eigenstates; later when taking the mod-square of the transition amplitude to obtain a transition probability, this will drop out.

At this point, one employs the key step of (\(1^{\text{st}}\)-order) time-dependent perturbation theory, which is to Dyson-expand the time-ordered exponential to \(1^{\text{st}}\)-order (the word “order” here is being used in \(2\) different senses!):

\[\sim \langle f|1-\frac{i}{\hbar}\int_0^tdt’V_{/H_0}(t’)|i\rangle\]

Since \(E_f\neq E_i\) and \(H_0\) is Hermitian, it follows that \(|i\rangle,|f\rangle\) are orthogonal \(\langle f|i\rangle=0\), so the transition amplitude becomes:

\[\sim\hbar^{-1}\int_0^tdt'\langle f|V_{/H_0}(t')|i\rangle=\hbar^{-1}\int_0^tdt'e^{i\omega_{fi}t'}\langle f|V(t')|i\rangle\]

(aside: why not just Dyson-expand the original time-evolution operator \(\mathcal T\exp\left(-\frac{i}{\hbar}\int_0^tdt’H(t’)\right)\)? Answer: because it is \(V\) that is weak, not \(H=H_0+V\)! This simple observation motivates the whole point of using the interaction picture in the first place!)

At this point, the integral is recognized as a \([0,t]\)-windowed Fourier transform of \(\langle f|V(t)|i\rangle\), so by interpreting \(\int_0^tdt’=\int_{-\infty}^{\infty}dt'[0\leq t’\leq t]\) with a top-hat filter and using the convolution theorem, one arrives at the stated result (note that with this F.T. convention, there is a factor of \(2\pi\) in the convolution theorem that makes \(\hbar\mapsto h=2\pi\hbar\)).

Thus, the key conceptual corollary of all this is that the only possible \(H_0\)-eigenstate transitions \(|i\rangle\mapsto |f\rangle\) are between those coupled by the perturbation \(V(t)\) in the sense that its spectrum \(V_{fi}(\omega_{fi})\neq 0\) has some support there.

Problem: Suppose that \(V(t)=V_0\) is time-independent. Show that the transition probability between arbitrary \(H_0\)-eigenstates \(|i\rangle,|f\rangle\) is given by:

\[P_{fi}(t)=\left(\frac{2|\langle f|V_0|i\rangle|}{\hbar\omega_{fi}}\sin\frac{\omega_{fi}t}{2}\right)^2\]

Solution: This one leads to a “Fraunhofer single-slit”:
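This \(1^{\text{st}}\)-order “single-slit” formula is easy to test against the exactly solvable two-level system \(H=\begin{pmatrix}E_i & V_0\\ V_0 & E_f\end{pmatrix}\), whose exact transition probability is the Rabi formula; for weak coupling the two agree (a sketch with \(\hbar=1\) and assumed numbers):

```python
import math

hbar = 1.0
E_i, E_f = 0.0, 1.0                    # assumed H0 energies
V = 0.01                               # weak, real coupling <f|V0|i>
w = (E_f - E_i)/hbar                   # omega_fi

def P_pert(t):                         # 1st-order time-dependent perturbation theory
    return (2*V/(hbar*w))**2 * math.sin(w*t/2)**2

def P_rabi(t):                         # exact two-level (Rabi) result
    Om = math.sqrt(w**2 + 4*V**2/hbar**2)
    return (2*V/(hbar*Om))**2 * math.sin(Om*t/2)**2

t = 3.0
print(P_pert(t), P_rabi(t))            # nearly identical for weak V
```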

Problem: Instead of just a single DC component at \(\omega=0\), now let the spectrum of the perturbation \(V\) be monochromatic at some frequency \(\omega\neq 0\); due to Hermiticity requirements, such a perturbation must be of the form:

\[V(t)=V_0e^{i\omega t}+V_0^{\dagger}e^{-i\omega t}\]

leading to a Hermitian “Fraunhofer double-slit” spectrum:

\[V(\omega_{fi})=2\pi(V_0\delta(\omega_{fi}+\omega)+V_0^{\dagger}\delta(\omega_{fi}-\omega))\]

In this case, what does the transition probability become?

Solution:

which correspond to the processes of stimulated absorption and emission.

Problem: Using the mathematical identity:

\[\lim_{t\to\infty}\text{sinc}^2(\omega t)=\frac{\pi}{t}\delta(\omega)\]

establish Fermi’s golden rule:

\[P_{fi}(t)\propto t\]

Solution:
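A numerical sanity check of the mathematical identity above (a stdlib sketch with an assumed Gaussian test function): smearing \(g(\omega)\) against \(\frac{t}{\pi}\text{sinc}^2(\omega t)\) should return \(g(0)\) as \(t\) grows, which is exactly what makes \(P_{fi}(t)\propto t\) emerge.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(x)/x

def smear(g, t, W=50.0, n=200001):
    # trapezoid rule for the integral of (t/pi) sinc^2(w t) g(w) over [-W, W]
    h = 2*W/(n - 1)
    total = 0.0
    for i in range(n):
        w = -W + i*h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight*(t/math.pi)*sinc(w*t)**2*g(w)
    return total*h

g = lambda w: math.exp(-w*w)           # assumed test function with g(0) = 1
for t in (10.0, 50.0, 200.0):
    print(t, smear(g, t))              # approaches g(0) = 1 as t grows
```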

Problem: What is Fermi’s useful golden rule for stimulated absorption?

Solution: Instead of a discrete final \(H_0\)-eigenstate \(|f\rangle\), there should be a continuum (or in practice a quasi-continuum) of final states described by a density of \(H_0\)-eigenstates \(g_{f,H_0}(E_f)\). The useful form of Fermi’s golden rule for the transition rate into any of these energetically compatible \(H_0\)-eigenstates is then:

\[\Gamma_{fi}=\dot P_{fi}(t)\approx\frac{2\pi|\langle f|V_0|i\rangle|^2}{\hbar}g_{f,H_0}(E_f=E_i+\hbar\omega)\]

where a factor of \(\hbar\) has been (optionally) absorbed to have an energy rather than angular frequency argument in the density of states. Of course, a similar useful form of Fermi’s golden rule holds for stimulated emission.
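Fermi’s useful golden rule can be watched emerging in a discretized continuum: couple \(|i\rangle\) uniformly to \(N\) equally spaced final levels and evolve exactly; the survival probability then decays as \(e^{-\Gamma t}\) with \(\Gamma=2\pi|V_0|^2g/\hbar\) (a sketch with assumed parameters, valid in the intermediate time window):

```python
import numpy as np

hbar = 1.0
N, dE = 1001, 0.02                 # assumed quasi-continuum: N levels spaced dE
g = 1.0/dE                         # density of final states
V = 0.02                           # assumed uniform coupling <f|V0|i>
Gamma = 2*np.pi*V**2*g/hbar        # FGR prediction for the decay rate

E = dE*(np.arange(N) - N//2)       # final levels centered on E_i = 0
H = np.diag(np.concatenate(([0.0], E)))
H[0, 1:] = V                       # couple |i> (index 0) to every final level
H[1:, 0] = V

vals, vecs = np.linalg.eigh(H)
psi0 = np.zeros(N + 1)
psi0[0] = 1.0                      # start in |i>
c = vecs.T @ psi0

t = 2.0
psit = vecs @ (np.exp(-1j*vals*t/hbar)*c)
P_survive = abs(psit[0])**2
print(P_survive, np.exp(-Gamma*t)) # exponential decay at the golden-rule rate
```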

Problem: What are the \(3\) key assumptions of Fermi’s golden rule (FGR)?

Solution: The \(3\) key assumptions are:

Assumption #\(1\): Weak coupling \(V(t)\) (several ways to formalize this).

Assumption #\(2\): Continuum (or in practice quasicontinuum) of final scattering \(H_0\)-eigenstates to transition into, hence described by a density of states \(g_{f,H_0}(E_f)\).

Assumption #\(3\): Intermediate time window (this leads to the idea of the emergence and subsequent breakdown of FGR as a function of time \(t\)).

In most quantum mechanics textbooks, Assumption #\(1\) tends to be emphasized clearly by the fact that one is doing (time-dependent) perturbation theory. Assumption #\(2\) is present in Fermi’s useful golden rule, so in that sense is a bit optional. Finally, Assumption #\(3\) tends to be obscured, and sometimes omitted entirely, but is absolutely critical.

Problem: Elaborate on the importance of Assumption #\(3\).

Solution: The fact that there should be a lower bound on \(t\) makes sense, as enough time needs to pass for the \(\text{sinc}^2\) to actually look like the \(\delta\) in one of the key steps above. However, at the same time the notion of \(t\to\infty\) is a bit artificial, and for one thing it clearly cannot be strictly correct, since otherwise the transition probability \(P_{fi}(t)\propto t\) would eventually exceed \(1\).

More quantitatively, if one (optionally) defines a notion of Rabi frequency in the usual way \(\hbar\Omega_{fi}/2:=|\langle f|V_0|i\rangle|\), then it’s clear from the earlier formula for the transition probability that, taking the \(t\to 0\) limit rather than \(t\to\infty\), it instead takes off quadratically in time \(t\) as \(P_{fi}(t)=\Omega_{fi}^2t^2/4\) before the emergence of the linear-in-\(t\) FGR regime.

Meanwhile, in the large-\(t\) limit, if one drives too strongly then of course perturbation theory breaks down and instead one would get Rabi oscillations or something like that. So the exact regime of validity of FGR is a much richer and subtler topic than it seems at first. See the paper of Micklitz et al. and the paper of Chen et al. for more details.
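The quadratic-to-linear crossover can be seen numerically in a toy quasi-continuum model (a sketch, again in \(\hbar=1\) units with made-up parameters): for \(t\) much shorter than the inverse bandwidth every \(\text{sinc}^2\approx 1\) and \(P\propto t^2\), while at larger \(t\) the linear-in-\(t\) FGR regime emerges:

```python
import numpy as np

# Toy flat band of width W = 1 with density of states g; total transition
# probability out of |i> as a function of time t (hbar = 1).
V0, g, W = 1e-4, 500.0, 1.0
E_f = np.arange(-W / 2, W / 2, 1.0 / g)

def P_total(t):
    # np.sinc(x) = sin(pi x)/(pi x), so this is sinc^2(E_f * t / 2)
    return np.sum(t**2 * V0**2 * np.sinc(E_f * t / (2 * np.pi)) ** 2)

ratio_small = P_total(0.02) / P_total(0.01)    # ~4: quadratic (Rabi-like) regime
ratio_large = P_total(200.0) / P_total(100.0)  # ~2: linear FGR regime
print(ratio_small, ratio_large)
```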

(aside: a scattering amplitude is simply a transition amplitude between \(2\) asymptotic scattering \(H_0\)-eigenstates; thus the term “transition amplitude” is more general, referring to the probability amplitude for an \(H_0\)-measurement at time \(t\) to yield the energy \(E_f\), given that the quantum system was, at time \(t=0\), in the \(H_0\)-eigenstate of energy \(E_i\)).

Problem: A hydrogen atom in its \(s\)-wave bound ground state is ionized by light of frequency \(\omega\). Calculate the FGR transition rate to the continuum of asymptotically free scattering states (this is a simple model of the photoelectric effect).

Solution: A qualitative sketch of the calculation: first, ignore quantization of the EM field (valid for e.g. a laser) by imposing a suitable vector potential such as \(\textbf A\propto e^{i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}\) for the usual planar EM wave. The perturbation Hamiltonian \(V(t)\) is determined by expanding to first-order the usual minimally-coupled Hamiltonian \(H\) for a charge \(q=-e\) in an electromagnetic field. Since this is absorption rather than stimulated emission, compute the matrix element of the appropriate term between the hydrogen atom ground state \(|1,0,0\rangle\) and the free \(e^-\) state \(|\textbf k\rangle\) (where recall that the density of such free states is \(g_{f,H_0}(E)\propto\sqrt E\)).

Problem: (to be added, dipole approximation demonstration that rates of stimulated absorption/emission are same, and the first-order p.t. expression for that rate).

Solution: Fermi’s golden rule above was applied to a monochromatic perturbation (should this be treated as another assumption?), but for a more general spectrum even transitions between discrete bound \(H_0\)-eigenstates are possible, e.g. an atom bathing in a photon gas.

To compute the stimulated absorption/emission rates \(\Gamma_a,\Gamma_e\), it is common to employ a so-called dipole approximation (though in this context a better name would be long-wavelength approximation or even just cold approximation; note also that this is a further approximation on top of all the other approximations that have already been made) in which the typical wavelength \(\lambda\gg r_B\) of photons \(\gamma\) in the thermal bath is much longer than the Bohr radius \(r_B\) representing the typical length scale of atoms. Essentially, this means that the vector potential \(\textbf A\sim e^{i(\textbf k_{\text{ext}}\cdot\textbf x-\omega_{\text{ext}} t)}\approx e^{-i\omega_{\text{ext}}t}\) is approximately spatially uniform, since \(\textbf k_{\text{ext}}\cdot\textbf x\ll 1\) for \(\textbf x\) ranging over the atom; therefore the time-dependent perturbation to the atom takes the form \(\Delta\tilde H(t)=e\textbf E_{\text{ext}}(t)\cdot\sum_{e^-}\textbf X_{e^-}\), where \(e\sum_{e^-}\textbf X_{e^-}\) is (up to sign) the net electric dipole moment of the atom, hence the name “dipole approximation”. Chugging this into Case #\(2\) of Fermi’s golden rule, one can compute the rates of stimulated absorption and emission to be equal and given by:

\[\Gamma_{1\to 2}=\Gamma_{2\to 1}=\frac{\pi e^2}{3\varepsilon_0\hbar^2}\left|\sum_{e^-}\langle 2|\textbf X_{e^-}|1\rangle\right|^2g(\Delta E_{12})\]

Note that non-relativistic quantum mechanics predicts that atoms can only undergo stimulated absorption or stimulated emission. In particular, non-relativistic quantum mechanics predicts that there are no such things as “spontaneous absorption” or “spontaneous emission”, \(\Gamma_{1\to 2}^*=\Gamma_{2\to 1}^*=0\), where an atom undergoes a transition between \(\tilde H\)-eigenstates in the absence \(\textbf E_{\text{ext}}=\textbf B_{\text{ext}}=\textbf 0\) of external electromagnetic fields/photons (because the non-relativistic Schrödinger equation asserts that the \(t\)-evolution of \(\tilde H\)-eigenstates is to stay right where they are rather than hopping to other \(\tilde H\)-eigenstates). Indeed, it turns out there is no such thing as “spontaneous absorption”, \(\Gamma_{1\to 2}^*=0\), but there can be spontaneous emission \(\Gamma_{2\to 1}^*\neq 0\). This phenomenon only arises once one quantizes the classical, smooth electromagnetic field \((\textbf E,\textbf B)\). When this is done, it is found that spontaneous emission is possible due to interactions between the atom and zero-point fluctuations of the quantum electromagnetic field, but to delve deeper would be the subject of quantum electrodynamics.

Nevertheless, Einstein was able to compute the rate of spontaneous emission \(\Gamma_{2\to 1}^*\) without knowing anything about the quantization of the electromagnetic field. Or more precisely, Einstein showed that if one could calculate the stimulated emission rate \(\Gamma_{2\to 1}\), then one would also get the spontaneous emission rate \(\Gamma_{2\to 1}^*\) “for free” since he found that the two were proportional to each other and what that proportionality constant was.

Einstein’s Statistical Argument for Spontaneous Emission

Consider any \(2\) energy levels \(E_1<E_2\) in an atom, with degeneracies \(\Omega_1,\Omega_2\) and occupation numbers \(N_1,N_2\) of electrons respectively.

Then in general, one heuristically expects probabilistic kinetics of the form (asides: what happens if one also adds a “spontaneous absorption” term \(A_{12}N_1\) among the rate terms? And also, one should explicitly relate the \(\Gamma\)-transition rates above in Fermi’s golden rule with these Einstein \(A,B\) coefficients):

\[\dot N_1=-\dot N_2=-B_{12}(\Delta\omega_{12})N_1\mathcal E_{\text{ext}}(\Delta\omega_{12})+B_{21}(\Delta\omega_{12})N_2\mathcal E_{\text{ext}}(\Delta\omega_{12})+A_{21}(\Delta\omega_{12})N_2\]

where the spontaneous emission term \(A_{21}(\Delta\omega_{12})N_2\) is not proportional to the spectral energy density of photons \(\mathcal E_{\text{ext}}(\Delta\omega_{12})\) with the correct frequency \(\Delta\omega_{12}=(E_2-E_1)/\hbar\) precisely because it’s spontaneous (i.e. not stimulated by the radiation and therefore independent of its “concentration”). Another way to see this is that, in the absence of any external radiation \(\mathcal E_{\text{ext}}(\Delta\omega_{12})=0\) with the correct frequency \(\Delta\omega_{12}\), the only operating mechanism is spontaneous emission in which the occupation number \(N_2(t)=N_2(0)e^{-A_{21}(\Delta\omega_{12})t}\) decays exponentially to \(N_2\to 0\) with lifetime \(\tau(\Delta\omega_{12})=1/A_{21}(\Delta\omega_{12})\) inversely correlated with the gap size \(\Delta\omega_{12}\) (since a larger gap size \(\Delta\omega_{12}\) signifies a more “unstable” state).

More generally then, when there are photons with the right frequency \(\mathcal E_{\text{ext}}(\Delta\omega_{12})\neq 0\) for stimulated absorption/emission, the transient \(t\)-dependence is tricky to model, but the steady state \(t\to\infty\) where \(\dot N_1=\dot N_2=0\) is straightforward: the rates of stimulated absorption/emission and spontaneous emission must balance:

\[-B_{12}(\Delta\omega_{12})N_1\mathcal E_{\text{ext}}(\Delta\omega_{12})+B_{21}(\Delta\omega_{12})N_2\mathcal E_{\text{ext}}(\Delta\omega_{12})+A_{21}(\Delta\omega_{12})N_2=0\]

or isolating for the spectral energy density \(\mathcal E_{\text{ext}}(\Delta\omega_{12})\) of correct-frequency photons:

\[\mathcal E_{\text{ext}}(\Delta\omega_{12})=\frac{A_{21}(\Delta\omega_{12})}{B_{12}(\Delta\omega_{12})(N_1/N_2)-B_{21}(\Delta\omega_{12})}\]
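As a quick consistency check, one can forward-Euler integrate these kinetics with made-up rate constants (a sketch, not tied to any physical units) and verify that the relaxed steady state indeed satisfies this relation for \(\mathcal E_{\text{ext}}\):

```python
# Forward-Euler integration of the two-level Einstein kinetics with
# made-up rate constants; at steady state the spectral energy density
# recovered from A21 / (B12*(N1/N2) - B21) should equal the driving E_ext.
B12, B21, A21, E_ext = 1.0, 0.5, 2.0, 3.0
N1, N2 = 1.0, 0.0
dt = 1e-3
for _ in range(200_000):  # integrate to t = 200 >> relaxation time ~ 1/6.5
    dN1 = -B12 * N1 * E_ext + B21 * N2 * E_ext + A21 * N2
    N1 += dN1 * dt
    N2 -= dN1 * dt
E_recovered = A21 / (B12 * (N1 / N2) - B21)
print(E_recovered)  # -> approximately 3.0 == E_ext
```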

But being in the \(t\to\infty\) steady state is synonymous with the atom being in thermodynamic equilibrium with the photon gas at some temperature \(T\) in the canonical ensemble, so statistical mechanics asserts that both \(\frac{N_1}{N_1+N_2}=\Omega_1e^{-E_1/kT}/Z\) and \(\frac{N_2}{N_1+N_2}=\Omega_2e^{-E_2/kT}/Z\) will be Boltzmann distributed while the spectral energy density of the radiation \(\mathcal E_{\text{ext}}(\Delta\omega_{12})=\frac{\hbar\Delta\omega_{12}^3}{\pi^2 c^3}\frac{1}{e^{\hbar\Delta\omega_{12}/kT}-1}\) will be Planck distributed. Plugging these distributions in leads to:

\[\frac{\hbar\Delta\omega_{12}^3}{\pi^2 c^3}\frac{1}{e^{\hbar\Delta\omega_{12}/kT}-1}=\frac{A_{21}(\Delta\omega_{12})\Omega_2}{B_{12}(\Delta\omega_{12})\Omega_1}\frac{1}{e^{\hbar\Delta\omega_{12}/kT}-B_{21}(\Delta\omega_{12})\Omega_2/B_{12}(\Delta\omega_{12})\Omega_1}\]

One can then just “pattern-match” (it is an encouraging sign that this is even possible!) because the relation must hold at arbitrary temperature \(T>0\) to obtain:

\[\frac{B_{12}(\Delta\omega_{12})}{\Omega_2}=\frac{B_{21}(\Delta\omega_{12})}{\Omega_1}\]

\[\frac{\hbar\Delta\omega_{12}^3}{\pi^2 c^3}=\frac{A_{21}(\Delta\omega_{12})\Omega_2}{B_{12}(\Delta\omega_{12})\Omega_1}\]
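It is easy to check numerically that these two relations are exactly what is needed for the steady-state spectral density to be Planckian at every temperature (a sketch in \(\hbar=c=k_B=1\) units, with made-up degeneracies and a made-up \(B_{12}\)):

```python
import numpy as np

# With B21 = B12*Om1/Om2 and A21 = (w^3/pi^2) * B21 (hbar = c = kB = 1),
# the steady-state E = A21/(B12*(N1/N2) - B21) with a Boltzmann ratio N1/N2
# should reproduce the Planck distribution at every temperature T.
w, Om1, Om2, B12 = 2.0, 3.0, 5.0, 0.7
B21 = B12 * Om1 / Om2
A21 = (w**3 / np.pi**2) * B21
max_err = 0.0
for T in (0.5, 1.0, 4.0):
    N1_over_N2 = (Om1 / Om2) * np.exp(w / T)      # Boltzmann ratio
    E_steady = A21 / (B12 * N1_over_N2 - B21)
    planck = (w**3 / np.pi**2) / np.expm1(w / T)  # Planck spectral density
    max_err = max(max_err, abs(E_steady - planck) / planck)
print(max_err)  # tiny: the match is exact up to floating point
```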

The first relation is just reconfirming, now weighted by the degeneracies, that the stimulated rates coincide, \(\Gamma_{1\to 2}=\Gamma_{2\to 1}\). Morally, it makes sense that these stimulated rates coincide because absorption of a photon of energy \(\hbar\omega\) is the reverse of emission of a photon of the same energy. Mathematically, it reflects the fact that the time-dependent perturbation \(\Delta\tilde H(t)=\Delta\tilde H_0e^{i\omega t}+\Delta\tilde H_0^{\dagger}e^{-i\omega t}\) is Hermitian.

Plugging the first relation into the second, one obtains a direct relation between the rates of stimulated and spontaneous emission:

\[A_{21}(\Delta\omega_{12})=\frac{\hbar\Delta\omega_{12}^3}{\pi^2 c^3}B_{21}(\Delta\omega_{12})\]

At first, even after one has seen Einstein’s statistical argument for \(\Gamma_{1\to 2}=\Gamma_{2\to 1}\) and \(A_{21}(\Delta\omega_{12})\propto\Delta\omega_{12}^3B_{21}(\Delta\omega_{12})\), it is still not clear how the argument manages to “get away” with not knowing anything about the quantization of the electromagnetic field while still obtaining the correct answer. The answer turns out to be implicit: one has already quantized the electromagnetic field by taking the Planck distribution as an input to the derivation.

Einstein’s statistical argument is thermodynamic in nature, whereas transition rates are inherently kinetic, so of course it cannot by itself yield explicit expressions for the various transition rates; that “kinetics” part was the whole point of the time-dependent perturbation theory calculation with the dipole approximation. Combining Einstein’s thermodynamic argument with a dose of quantum kinetics, the lifetime \(\tau\) of an isolated atom in an excited state before it spontaneously emits a photon \(\gamma\) to decay to a lower energy level is roughly:

\[\tau\approx \frac{3m_{e^-}c^3}{4\hbar\Delta\omega_{12}^3a_0}\]
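Plugging in SI constants for the hydrogen \(n=2\to 1\) (Lyman-\(\alpha\)) transition gives a nanosecond-scale lifetime, in line with the measured \(2p\) lifetime of about \(1.6\) ns (a sketch; the crude formula above stands in \(a_0\) for the actual dipole matrix element):

```python
# Evaluate tau ~ 3 m c^3 / (4 hbar w^3 a0) for the hydrogen Lyman-alpha
# transition (E2 - E1 = 13.6 eV * (1 - 1/4) = 10.2 eV). SI constants:
hbar = 1.054571817e-34   # J s
m_e  = 9.1093837015e-31  # kg
c    = 2.99792458e8      # m / s
a0   = 5.29177210903e-11 # m (Bohr radius)
eV   = 1.602176634e-19   # J
w12 = 13.6 * (1 - 1 / 4) * eV / hbar   # transition angular frequency (rad/s)
tau = 3 * m_e * c**3 / (4 * hbar * w12**3 * a0)
print(tau)  # -> roughly 1e-9 s (nanosecond scale)
```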
