Problem: Write down a formula for the total energy \(E\) stored in the transverse and longitudinal acoustic phonons of an insulator at temperature \(T\).
where the Debye wavenumber \(k_D\sim(N/V)^{1/3}\) is determined precisely by requiring \(\int_0^{k_D}dk 3\frac{V}{(2\pi)^3}4\pi k^2=3N\) normal modes in an insulating solid of \(3N\) atoms (it is the UV cutoff support of the density of states).
Problem: What dispersion relation \(\omega_k\) is assumed in the Debye model?
Solution: A linear nondispersion \(\omega_k=ck\), where \(c\) should is a suitable average of the acoustic and longitudinal velocities. In practice, for sufficiently high-momenta phonons, the acoustic band dispersion is nonlinear with the group velocity vanishing at the boundary of the underlying crystal lattice’s Brillouin zone, so in order to ensure one is working in the linear, low-\(k\) regime, just keep \(T\) low (so expect the Debye model to work better for low-\(T\) even though it also reproduces the Dulong-Petit law at high-\(T\)).
Problem #\(3\): Introduce the Debye frequency \(\omega_D\) and hence Debye temperature \(T_D\), and show that, given the assumption of Solution #\(2\), the energy \(E\) stored in the transverse/longitudinal acoustic phonons of a solid can be written in terms of \(T_D\) as:
Solution #\(3\): In line with Solution #\(2\), define \(\omega_D:=ck_D\) and \(k_BT_D:=\hbar\omega_D\). The rest is bookwork (using the exact \(k_D=(6\pi^2N/V)^{1/3}\)).
Problem #\(4\): Now one is in a position to get the heat capacity (at constant \(V\) to be precise)! Just \(T\)-differentiate \(E\) in Problem #\(3\)!
Problem #\(5\): Check that \(C_V\to 3Nk_B\) for \(T\gg T_D\) (by the way this now makes precise the informal statement earlier about being at “high-\(T\)”) and \(C_V\sim\frac{12\pi^4}{5}Nk_B\left(\frac{T}{T_D}\right)^3\) for \(T\ll T_D\) (this is what “low-\(T\)” means!), in agreement with the \(3^{\text{rd}}\) law of thermodynamics.
Solution #\(5\):
Problem #\(6\): Why make such a fuss that the Debye model only applies to solids which are insulators?
Solution #\(6\): Well, to be more precise, the Debye model only considers the energy tied up in transverse/longitudinal non-dispersive acoustic phonons (with wavevectors confined in an appropriate Brillouin zone \(k\leq k_D\)), and their corresponding heat capacity contribution. In insulators, this is typically the dominant contribution at low \(T\ll T_D\). However, for conductors, the conduction electrons make an even greater linear contribution to the heat capacity (which can be estimated in an ideal Fermi gas approximation), which at low \(T\) will dominate the phonon contribution (this difference in the scaling \(T^3\) vs. \(T\) arises fundamentally because phonons are masslessbosons whereas electrons are massive fermions.).
Problem #\(7\): Give a heuristic derivation of the Debye’s \(T^3\) law.
Solution #\(7\): As with any thermal distribution at temperature \(T\), only phonon states of energy \(\hbar\omega\leq k_BT\Leftrightarrow k\leq k_BT/\hbar c\) will be populated, the total number of such states being proportional to the volume of the corresponding ball in \(\mathbf k\)-space, so \(\propto T^3\). The energy of each such low-energy state is \(\hbar\omega/(e^{\hbar\omega/k_BT}-1)\approx k_BT\) so the total energy \(E\sim T^4\), hence \(C\sim T^3\).
(if one looks back at the integral manipulations earlier, this is basically just a more physically-appealing rephrasing of those calculations).
cf. the similar argument for the Sommerfeld \(T\) law for conduction electrons.
Problem: What is the position space wavefunction \(\psi(\textbf x)\) of a photon in a box of volume \(V=L^3\) whose opposite faces are subject to periodic boundary conditions? What about reflecting boundary conditions?
Solution: With periodic boundary conditions, the wavefunctions are travelling plane waves:
where \(k_x,k_y,k_z\in \frac{\pi}{L}\textbf Z^+\) (the \(\textbf Z^+\) here rather than a \(\textbf Z\) restricts to the first octant in \(\textbf k\)-space because e.g. \(\sin(-k_xx)=-\sin(k_xx)\) is a physically equivalent state.
(if one wants to normalize these wavefunctions, then in the former case it’s \(1/\sqrt{V}\), in the latter case it’s \(\sqrt{8/V}\)).
Problem: Write down the Planck distribution \(\mathcal E_{\omega}\).
Problem: Assuming an isotropic photon gas, find an expression for the irradiance density \(I(\omega)\) incident normally on a face of the cavity.
Solution: Letting the unit normal to the face be \(\hat{\textbf z}\), this is given by the hemispherical expectation of the Poynting vector projection along \(\hat{\textbf z}\):
where \(\sigma:=\frac{\pi^2k_B^4}{60c^2\hbar^3}\).
Problem: So the Stefan-Boltzmann law says that as an object gets hotter (avoid using the term blackbody), it emits more energy in proportion to \(T^4\); but how much of this effect is due to the energy of the individual photons actually increasing and how much is due to the fact that it’s emitting more photons?
Solution: \(4=1+3\), where the \(1\) is the energy increase of each photon, while the \(3\) is the increase in the number of photons emitted (thus, it is clear what the dominant effect is).
Mystery (Why This Derivation Fails?)
The canonical partition function for the photon gas is actually (remembering also the \(1/N!\) Gibbs factor):
where the total energy of the photon gas in that particular microstate is degenerate with respect to their polarizations, instead depending only on the momenta \(E_{(\textbf n_1,\textbf n_2,…,\textbf n_N)}=\frac{2\pi\hbar c}{L}\sum_{i=1}^N|\textbf n_i|\). As a result, we can move the exponential outside the sum over polarization states \(\sum_{(|\psi_1\rangle,|\psi_2\rangle,…,|\psi_N\rangle)\in\{|0\rangle,|1\rangle\}^N}1=2^N\). We can also convert the exponential of the series of energies into a product \(e^{-\beta E_{(\textbf n_1,\textbf n_2,…,\textbf n_N)}}=\prod_{i=1}^Ne^{-2\pi\beta\hbar c|\textbf n_i|/L}\).
The series of products \(\sum_{(\textbf n_1,\textbf n_2,…,\textbf n_N)\in\textbf Z^{3N}}\prod_{i=1}^Ne^{-2\pi\beta\hbar c|\textbf n_i|/L}\) is tricky to calculate but can be accurately estimated with the aid of a useful approximation.
Digression on the Euler-Maclaurin Summation Formula
The Euler-Maclaurin summation formula asserts that, for any smooth function \(f\in C^{\infty}(\textbf R\to\textbf C)\), if we sum the values of \(f\) evaluated on a bunch of consecutive integers \(\sum_{n=a}^bf(n)\) with \(a,b\in\textbf Z\cup\{-\infty,\infty\}\), then this is often well-approximated by the integral \(\int_a^bf(n)dn\) of the function \(f\) on the interval \([a,b]\), treating \(n\) as though it were a continuous variable \(n\in\textbf R\) rather than an integer \(n\in\textbf Z\) (of course this is a two-way street, meaning that many integrals can also be approximated by suitable series). The canonical example of the Euler-Maclaurin summation formula concerns the harmonic series \(f(n)=1/n\), with the following diagram given to rationalize why the Euler-Maclaurin approximation should be a good approximation:
However, the full Euler-Maclaurin formula also acknowledges that the approximation is imperfect by explicitly giving a formula for the error/discrepancy between the series and its integral approximation:
where the asymptotic series on the right features the non-zero even Bernoulli numbers \(B_2=1/6,B_4=-1/30\), etc. This interesting sequence of rational numbers were historically first discovered in the context of Faulhaber’s formula \(1^p+2^p+…+n^p=\frac{1}{p+1}\left(B_0{{p+1}\choose{0}}n^{p+1}+B_1{{p+1}\choose{1}}n^{p}+…+B_p{{p+1}\choose{p}}n\right)\) for sums of powers, but arguably the real utility of the Bernoulli numbers lies in their appearance in the Euler-Maclaurin summation formula (indeed Faulhaber’s formula is just the special case of the Euler-Maclaurin summation formula with \(f(n)=n^p\)). The quickest way to compute these Bernoulli numbers is to use a Pascal-like triangle construction (details can be found in this paper of Kawasaki and Ohno). The idea is to first write down a couple terms of the harmonic sequence \(1, 1/2,1/3,…\) in the first row. Then, whereas in Pascal’s triangle one would compute the number “below” two numbers by simply adding them, here instead one first has to subtract the two (left number minus right number, or if you forget then bigger number minus smaller number to always obtain a positive difference), and then multiply the result by the denominator of whichever term in the original harmonic sequence \(1, 1/2,1/3,…\) lies on the same downward-sloping “diagonal column” so to speak. A picture is worth \(10^3\) words:
The Bernoulli numbers \(B_0=1,B_1=1/2,B_2=1/6\), etc. are then simply read off as the elements on the first (left-most) downward-sloping diagonal column (SHOULD INCLUDE SOME COMMENTS ABOUT WHEN EULER-MACLAURIN SUMMATION IS VALID AND RELATE IT TO THE STATISTICAL MECHANICAL APPLICATION FUNCTION WHICH IS EXPONENTIALLY DECAYING!).
Back to the Physics
All this is to say that we expect we can approximate the series of products earlier \(\sum_{(\textbf n_1,\textbf n_2,…,\textbf n_N)\in\textbf Z^{3N}}\prod_{i=1}^Ne^{-2\pi\beta\hbar c|\textbf n_i|/L}\) over the discrete lattice \(\textbf Z^{3N}\) by an integral over the smooth manifold \(\textbf R^{3N}\) instead, obtaining a \(3N\)-dimensional analog of the Euler-Maclaurin summation formula:
Unfortunately, I believe the correct answer is actually \(Z=e^{\pi^2 V/45\beta^3\hbar^3c^3}\). My answer is correct up to a numerical factor of \(\zeta(4)=\pi^4/90\) which I cannot for the life of me figure out where I forgot to incorporate.
The purpose of this post is to explore the rich physics encoded in the Van der Waals equation of state:
\[\left(p+\frac{aN^2}{V^2}\right)(V-Nb)=NkT\]
Essentially, the Van der Waals equation of state dispenses with two assumptions implicit in the ideal gas law:
Gas particles are not point particles, but have some finite volume which reduces the total available volume \(V\).
More importantly, gas particles interact with each other through some potential energy \(V(|\textbf x_i-\textbf x_j|)\) (not to be confused with the volume \(V\)). They’re not just billiard balls flying around oblivious to each other’s existence.
First, it will be instructive to derive the Van der Waals equation from a microscopic, first principles statistical mechanics calculation within the regime of certain approximations. Then, some of the interesting physics that it predicts will be investigated further.
Origins of the Van der Waals Equation
The starting point is the virial expansion (which has nothing to do with the virial theorem). It is just a perturbation of the ideal gas law to account for interactions (e.g. the two points mentioned above). The idea is that the ideal gas law is increasingly accurate in the low-density limit \(n:=N/V\to 0\) so one can write down a power series equation of state in \(n\):
where the virial coefficients \(B_k(T)=d^k(p/k_BT)/d(n=0)^k\) begin as \(B_0(T)=0\) and \(B_1(T)=1\) (note: this definition of the virial coefficients is unconventional in that the factor of \(1/k!\) is typically absorbed into them). The goal is to calculate the virial coefficients \(B_k(T)\) from first principles (i.e. from specifying the precise nature of the interatomic potential energy \(V(r)\); it is taken to be central for now).
Recall that classically for two permanent electric dipoles \(\textbf p_1,\textbf p_2\), their potential energy is \(V(\textbf r)=\frac{\textbf p_1\cdot\textbf p_2-3(\textbf p_1\cdot\hat{\textbf r})(\textbf p_2\cdot\hat{\textbf r})}{4\pi\varepsilon_0 r^3}\) where \(\hat{\textbf r}\) can either be the unit vector from \(\textbf p_1\) to \(\textbf p_2\) or vice versa (clearly doesn’t matter). In our case of neutral monatomic gas particles, there are no permanent electric dipoles \(\textbf p_1=\textbf p_2=\textbf 0\), but it turns out there are still temporary/instantaneous electric dipoles (ultimately due to quantum fluctuations, but here one is treating everything classically). Therefore, even neutral atoms like a container of helium \(\text{He}(g)\) gas still experience slight interatomic forces, dubbed London dispersion forces. One can argue the strength of such London dispersion forces as follows: if an electric dipole \(\textbf p_1^{\text{temp}}\) happens to arise temporarily due to random fluctuations in an atom, then it will also come with a temporary, anisotropic electric field \(\textbf E^{\text{temp}}_1(\textbf r)=\frac{3(\textbf p_1^{\text{temp}}\cdot\hat{\textbf r})\hat{\textbf r}-\textbf p_1^{\text{temp}}}{4\pi\varepsilon_0r^3}\). Another neutral atom located some relative position \(\textbf x\) away will feel the influence of \(\textbf E^{\text{temp}}_1(\textbf x)\) and it will now acquire an induced electric dipole \(\textbf p_2^{\text{ind}}=\alpha\textbf E^{\text{temp}}_1(\textbf x)\) with \(\alpha>0\) its atomic polarizability. One can then apply the same formula for the potential energy of any two electric dipoles, \(V_{\text{London}}(\textbf x)=\frac{\textbf p_1^{\text{temp}}\cdot\textbf p_2^{\text{ind}}-3(\textbf p_1^{\text{temp}}\cdot\hat{\textbf x})(\textbf p_2^{\text{ind}}\cdot\hat{\textbf x})}{4\pi\varepsilon_0 |\textbf x|^3}\). Thus, London dispersion forces are conservative forces with corresponding attractive London dispersion potential energy \(V_{\text{London}}\propto-\frac{1}{r^6}\) (the weakest among all Van der Waals forces!).
So London dispersion forces tend to weakly attract gas particles together. On the other hand, they can’t get too close together by virtue of the Pauli exclusion principle (this is again a bit handwavy, but can all be justified more rigorously using quantum mechanics). Since this is a classical model, the exact form of the repulsive “Pauli potential” \(V_{\text{Pauli}}\) isn’t so important. One common model is \(V_{\text{Pauli}}(r)\propto\frac{1}{r^{12}}\) (because \(12=6\times 2\)). With this, one obtains the Lennard-Jones potential \(V_{\text{Lennard-Jones}}(r)=V_{\text{London}}(r)+V_{\text{Pauli}}(r)=V_{*}\left[\left(\frac{r_*}{r}\right)^{12}-2\left(\frac{r_*}{r}\right)^6\right]\), where \(r_*\) is the global minimum of the potential and \(V_*=-V(r_*)>0\) is the depth of the potential well. However, another possibility for the Pauli potential is simply the infinite potential barrier \(V_{\text{Pauli}}=\infty[r<r_0]\) for some hard sphere center-to-center separation \(r_0>0\) (so that each hard sphere actually has radius \(r_0/2\) called its Van der Waals radius). For computational convenience later, it turns out this latter form will be simpler to work with.
Canonical Partition Function
Now the Hamiltonian for \(N\) gas particles is \(H=\sum_{i=1}^N\frac{p_i^2}{2m}+\sum_{1\leq i<j\leq N}V(|\textbf x_i-\textbf x_j|)\) where \(V_{ij}=V\) is the same potential energy for all pairs of gas particles. Since one would like to think of this gas as trapped in a box, it can’t exchange particles, but it’s okay if it exchanges energy with the environment heat bath so let’s consider it at thermodynamic equilibrium with microstate probabilities provided by the Boltzmann distribution in the canonical ensemble. The canonical partition function is (this time one has to do the full \(6N\)-dimensional phase space integral because the gas particles are interacting):
Thus, because of the interactions, the integral over positions doesn’t factor in any obvious way; it’s an example of what makes interactions difficult to study in general in physics.
It turns out the way this is usually treated perturbatively is to define the \(T\)-dependent Mayer \(f\)-function by the formula:
\[f(\mathbf x):=e^{-\beta V(\mathbf x)}-1\]
For instance, for the Lennard-Jones potential \(V(r)=V_{\text{Lennard-Jones}}(r)\), the Mayer \(f\)-function looks something like:
For notational brevity, defining the shorthand \(f_{jk}:=f(|\textbf x_j-\textbf x_k|)\) (which may be viewed as components of a symmetric \(N\times N\) Mayer \(f\)-matrix with \(-1\) on the diagonals), one can rewrite the canonical partition function \(Z\) for the gas as:
To second-order, the product looks like a series of subseries \(\prod_{1\leq j<k\leq N}(1+f_{jk})=1+\sum_{1\leq j<k\leq n}f_{jk}+\sum_{1\leq j<k\leq N,1\leq \ell<m\leq N}f_{jk}f_{\ell m}\). The zeroth and first-order terms evaluate approximately to (using \(N(N-1)/2\approx N^2/2\)):
where \(Z_{\text{ideal}}=\frac{V^N}{N!\lambda^{3N}}\). Thus, one sees explicitly how the interactions \(f(r)\) constitute a perturbation to the canonical partition function of the ideal gas. In the canonical ensemble, the Helmholtz free energy is approximately (ignoring terms in quadratic in the Mayer \(f\)-function and Taylor expanding the logarithm) \(F=-k_BT\ln Z\approx -k_BT\ln Z_{\text{ideal}}-k_BT\ln\left(1+\frac{2\pi N^2}{V}\int_0^{\infty}r^2f(r)dr\right)\approx F_{\text{ideal}}-\frac{2\pi N^2k_BT}{V}\int_0^{\infty}r^2f(r)dr\). The pressure is then \(p=-\partial F/\partial V=nk_BT-B_2(T)n^2k_BT\) where the second virial coefficient is given by:
\[B_2(T)=-\frac{1}{2}\int d^3\mathbf x f(\mathbf x)=-2\pi \int_0^{\infty}dr r^2f(r)\]
The unique temperature \(T_B\) at which \(B_2(T_B)=0\) is called the Boyle temperature for the interacting gas as in that case the \(\sim n^2\) virial corrections to the ideal gas law vanish, meaning it behaves quasi-ideally (assuming the higher-order \(O(n^3)\) corrections can be neglected).
To evaluate the integral, one can use the hard sphere repulsive Pauli potential so that \(f(r)=-1\) for \(r<r_0\) and \(f(r)=e^{\beta V_0(r_0/r)^6}-1\) for \(r\geq r_0\). In the high-temperature limit \(\beta V_0\ll 1\), one can Taylor expand the exponential to make the integral easier. The virial coefficient comes out as \(B_2(T)=V_{\text{ex}(1-V_0/k_BT)\) with Boyle temperature \(k_BT_B=V_0\). The virial expansion to this order is thus:
To interpret it, recall that \(a\propto V_0>0\) is proportional to the attractive strength \(V_0\) of the interaction, so increasing the attractive strength ceteris paribus between gas particles means a lower pressure \(p\) which of course makes sense. Moreover, this correction is proportional to \(n^2\) because the number of attractive forces within the container is given by the number of pairs \(N(N-1)/2\propto N^2\) of gas particles (as the derivation using the canonical partition function made very explicit!). On the other hand, \(V_{\text{ex}}=2a/V_0\) is the excluded volume that arose from the infinitely hard Pauli repulsion. It is clearly just \(2^3=8\) times the volume of each individual hard sphere.
The Van der Waals equation is often written with \(a=V_{\text{ex}}V_0/2\) and \(b=V_{\text{ex}}/2\).
So where does the factor of \(2\) come from? Well, if \(2b\) denotes that physically excluded volume, then if one adds in gas particles one by one, the first can roam in the container’s volume \(V\), the second in \(V-2b\), the third in \(V-4b\), etc. The total \(N\)-dimensional configuration space (NOT phase space!) hypervolume \(\mathcal V\) for \(N\) gas particles is:
\[\mathcal V=\frac{1}{N!}\prod_{n=1}^N(V-2nb)\]
In the limit \(V\ll 2b\), one can check that this hypervolume \(\mathcal V\) reduces to \(\mathcal V=\frac{1}{N!}\left(V-Nb\right)^N\) suggesting that it is indeed \(b\) and not \(2b\) that is the correct correction to the container volume \(V\) in the Van der Waals equation (in other words, \(b\) should be thought of as the excluded volume per particle).
To summarize, recall the following assumptions that went into the derivation of the Van der Waals equation of state:
Low density gases \(N/V\ll 1/r_0^3\) (much less dense than the liquid phase!).
High-temperature \(V_0\ll kT\).
Hard sphere Pauli repulsion \(V(r)=\infty[r<r_0]\).
So while the Van der Waals equation of state is a significant improvement over the ideal gas law, it too has its limitations.
Universality in Van der Waals Fluids
Recall that taking an ideal gas and turning on Van der Waals interactions described by a long-range attractive Lennard-Jones potential and a short-range repulsive Pauli interaction modifies the equation of state to:
One can proceed to plot some isotherms of this van der Waals fluid in the usual \(pV\) state space of the fluid (wiggles are exaggerated):
At sufficiently high temperatures \(T>T_c\), the pressure \(p\) decreases monotonically with volume \(V\) along a (roughly) hyperbolic path \(p\propto V^{-1}\) so the Van der Waals fluid is supercritical in this regime (i.e. behaves like a liquid as \(V\to Nb\) and \(p\to\infty\) but otherwise behaves like an ideal gas as \(V\to\infty,p\to 0\)). By contrast, for low temperatures \(T<T_c\), the isotherm is no longer monotonic, experiencing both a local min and a local max. These occur at volumes \(V\) satisfying the cubic equation:
It can be checked that the discriminant of the cubic vanishes (corresponding to additionally imposing \(\partial^2p/\partial V^2=0\)) at the critical temperature \(kT_c=8a/27b\), with \(T<T_c\) corresponding to \(3\) real roots and \(T>T_c\) only \(1\) real root (a local max hiding at the unphysical \(V<Nb\) so not shown). The corresponding critical volume \(V_c=3Nb\) and critical pressure \(p_c=a/27b^2\) are easily found and annotated on the figure. In particular, note that the critical compression factor \(Z_c:=p_cV_c/NkT_c\) (not to be confused with the isothermal compressibility \(\kappa_T=-(\partial\ln V/\partial p)_T\) which blows up \(\kappa_T=\infty\) anyways at the critical point) is predicted by the Van der Waals equation to be a universal constant in all Van der Waals fluids:
\[Z_c=\frac{3}{8}=0.375\]
In practice, real fluids have \(Z_c\in\approx[0.2,0.3]\), reflecting the fact that the Van der Waals equation is not so accurate in the liquid regime (because it includes \(2\) fitting parameters \(a,b\) whereas the critical point is defined by \(3\) variables \(p_c,V_c,T_c\), it will always be possible to isolate some kind of universal ratio like this in any equation of state with only \(2\) fit parameters, but not necessarily so for more complicated equations of state). In any case, it definitely outperforms the naive ideal gas prediction \(Z=1\) (i.e. this prediction is just anywhere since there is no notion of criticality in the ideal gas model).
Using the critical variables \(p_c,V_c,T_c\), one can eliminate the fluid-specific constants \(a,b\) from the Van der Waals equation by rewriting it in terms of the reduced pressure \(\hat p:=p/p_c\), the reduced volume \(\hat V:=V/V_c\) and the reduced temperature \(\hat T:=T/T_c\):
This universal form of the Van der Waals equation is an instance of the law of corresponding states. Specifically, it is universal in the sense that it’s saying, for all fluids, certain aspects of their physics near their respective critical points look identical. For instance, one has the obvious Taylor expansion \(p-p_c\sim(V-V_c)^3\) as \(V\to V_c\) at the critical point. Two very different fluids will in general have their own very different critical \(p_c\) and \(V_c\), though the Van der Waals equation is making the remarkable claim that if one were to just be handed log-log plots of \(\ln(p-p_c)\) vs. \(\ln(V-V_c)\) for \(p\) close to \(p_c\) and \(V\) close to \(V_c\) in the case of each fluid, then both will have the same slope of \(3\) and differ only by a translation.
Another Van der Waals prediction is the divergence of the isothermal incompressibility \(\kappa_T\sim (T-T_c)^{-1}\) as \(T\to T_c^+\) (this of course agrees with the earlier assertion that exactly at the critical point, \(\kappa_{T=T_c}=\infty\) blows up). Although the Van der Waals equation was historically the first time that this whole theme of “universality at criticality” arose (and which is indeed what is observed experimentally), quantitatively speaking it only produces a rough approximation to the various critical exponents (heuristically this is because fluctuations become more and more important as one approaches the critical point, which is not something that Van der Waals takes into account). For instance, it turns out that actually \(p-p_c\sim(V-V_c)^{\approx 4.2}\) instead, and that \(\kappa_T\sim (T-T_c)^{\approx -1.2}\).
Liquid-To-Gas Phase Transition
Naively, the Van der Waals equation suggests that for a given subcritical isotherm \(\hat T<1\), there are supposedly certain equilibria where the Van der Waals fluid has negative isothermal compressibility \(\kappa_{\hat T}<0\). Such equilibria are obviously unstable. Fortunately, it seems that one can salvage this situation by the fact that
But all points on the \(T<T_C\) isotherm with \(\partial p/\partial V>0\) are unstable equilibria since, if the fluid starts at any such point, then any small perturbation in its volume \(V\) would, according to the isotherm, suggest that it now wants to equilibrate to a higher pressure \(p\), which would push the volume \(V\) even further in the original direction of change, enforcing a positive feedback loop. So instead, only the negative-feedback states with \(\partial p/\partial V<0\) can be observed in practice. For this reason, rather than strictly plotting a \(T<T_C\) Van der Waals isotherm, it would be desirable to replace the unstable section of the isotherm with the actually-experimentally-observed equilibrium states over those range of volumes \(V\).
Although at all temperatures \(T\), each volume \(V\) is associated with some unique equilibrium pressure \(p\), for \(T<T_C\) where monotonicity is broken, there is no longer a bijection between pressures \(p\) and volumes \(V\); in particular, it is clear that for all pressures \(p\) in the blue range highlighted,
In the case \(T<T_C\), for all pressures \(p\) in the blue range highlighted, the Van der Waals isotherm suggests there are \(3\) equilibrium volumes \(V\) that the system would tend towards if it were not already sitting at one of those volumes (assuming here the volume \(V\) is changeable by e.g. a piston).
Gravity is an anisotropic external input, density filter.
The middle phase is unstable due to its positive gradient \(\partial p/\partial V>0\), however the outer two are stable, and by inspecting not only their corresponding volumes \(V\) but also their isothermal compressibilities \(\kappa_T:=-(\partial\ln V/\partial p)_T\) or equivalently their isothermal bulk moduli \(B_T:=1/\kappa_T\), it is clear that the left point corresponds to a liquid while the right point corresponds to a gas. However, recall that the Van der Waals equation is essentially a second-order virial expansion in the particle density \(N/V\) so although it predicts the existence of a stable liquid phase, any quantitative predictions it makes about such a liquid phase (for which \(N/V\) is not typically small) need to be taken with a grain of salt.
But this begs the question, if the system sits at a pressure \(p\) for which both the liquid and gas phases seem to be stable, what does one actually observe experimentally? The answer is that, in general, there will be a mixture of both the liquid and gas phases. Implicitly here, one is considering the steady state \(t\to\infty\) limit with everything in thermodynamic equilibrium. Mechanical equilibrium is necessarily satisfied because by assumption the pressure \(p\) is fixed on a horizontal line, and thermal equilibrium is also ensured because both phases sit on the same \(T\)-isotherm. The only additional constraint yet to be imposed is diffusive equilibrium, i.e. that the number of particles in each has reached a steady state. This means both the liquid and gas phases, if they were to coexist, would also need to lie at the same chemical potential \(\mu\) (since \(p,T\mu\) are all intensive there is no constraint on the absolute numbers \(N_{\ell},N_g\) of liquid and gas particles, only on their relative proportion \(N_{\ell}/N_g\)). The Maxwell construction shows (proof omitted) that this only happens at a pressure \(p\) for which one has equal areas:
(thus, if the initial pressure \(p\) were not at this specific pressure, then the chemical potentials \(\mu_{\ell}\neq\mu_g\) would initially be unbalanced and particles would begin to diffuse from one phase to the other until the chemical potentials were balanced and the pressure would rise or fall accordingly). Considering the set of temperatures \(T\leq T_C\), for each one it is possible to perform a Maxwell construction to identify exactly which two points on that \(T\)-isotherm lead to equilibrium between the liquid and gas phases, and plot the corresponding \((p_{\ell},V_{\ell}),(p_g,V_g)\); the set of all such pairs in the \(pV\)-plane defines the coexistence curve. This is not to be confused with the spinodal curve which is just the curve passing through the extrema of the Van der Waals isotherms. Sandwiched between these are metastable states.
The purpose of this post is to demonstrate how the scattering operator \(S:\mathcal H_{\text{incident}}^{\infty}\to\mathcal H_{\text{scattered}}^{\infty}\), also called the \(S\)-operator for short, despite being defined to enact the scattering \(|\psi’_{\infty}\rangle=S|\psi_{\infty}\rangle\) of asymptotic incident waves \(|\psi_{\infty}\rangle\) off a potential \(V\) into asymptotic scattered waves \(|\psi’_{\infty}\rangle\), also encodes information about non-scattering states (e.g. bound states and resonant states) of the very same Hamiltonian \(H=T+V\) from which one is scattering quantum particles off of.
To illustrate this, recall that in one-dimension we have \(4\) distinct asymptotic waves \(|\pm k\rangle_{\pm’\infty}\), defined by the position space wavefunctions \(\langle x|\pm k\rangle_{\pm’\infty}:=e^{\pm ikx}[x\to\pm’\infty]\). We demand that the \(2\) incident waves (from the left and right) individually scatter as \(|k\rangle_{-\infty}\mapsto S|k\rangle_{-\infty}=r_{\rightarrow}|-k\rangle_{-\infty}+t_{\rightarrow}|k\rangle_{\infty}\) and \(|-k\rangle_{\infty}\mapsto S|-k\rangle_{\infty}=t_{\leftarrow}|-k\rangle_{-\infty}+r_{\leftarrow}|k\rangle_{\infty}\) as the picture below shows:
Therefore, with respect to the basis \(|k\rangle_{-\infty},|-k\rangle_{\infty}\) for the space of incident/ingoing waves \(\mathcal H_{\text{incident}}^{\infty}\) and the basis \(|-k\rangle_{-\infty},|k\rangle_{\infty}\) for the space of scattered/outgoing waves\(\mathcal H_{\text{scattered}}^{\infty}\), the matrix representation of \(S\), called the \(S\)-matrix, looks like:
Recall that we have some unintuitive results on 1D scattering, namely \(t_{\leftarrow}=t_{\rightarrow}:=t\) (not to be confused with time!) and \(r_{\rightarrow}t^*+r_{\leftarrow}^*t=0\). This just says that the columns (equivalently the rows) of the \(S\)-matrix are complex-orthogonal to each other. The orthogonality condition also gives that \(|r_{\leftarrow}|=|r_{\rightarrow}|\), hence by matching asymptotic probability currents \(J_{-\infty}=J_{\infty}\) one obtains \(|r_{\leftarrow}|^2+|t|^2=|r_{\leftarrow}|^2+|t|^2=1\). This discussion proves that \([S]_{|k\rangle_{-\infty},|-k\rangle_{\infty}}^{|-k\rangle_{-\infty},|k\rangle_{\infty}}\in U(2)\) has spectrum \(\Lambda_S\subseteq U(1)\) with complex-orthogonal eigenvectors. One can write such eigenvalues in terms of two real-valued phase shifts \(\delta_{\pm}=\delta_{\pm}(k)\) dependent on the incident momentum \(k\) of the beam of particles, i.e. \(e^{2i\delta_{\pm}}=\frac{r_{\leftarrow}+r_{\rightarrow}}{2}\pm\sqrt{\left(\frac{r_{\leftarrow}-r_{\rightarrow}}{2}\right)^2+t^2}\) with respective eigenvectors that I won’t bother writing down yet.
At this point, to simplify life we will work with an even potential \(V(-x)=V(x)\) so that \(r_{\leftarrow}=r_{\rightarrow}:=r\) coincide. Then the \(S\)-matrix becomes a \(2\times 2\) circulantmatrix so we can immediately write down its normalized eigenvectors (this is why I didn’t bother writing them down yet above!) as the two Fourier modes \((1/\sqrt{2},\pm 1/\sqrt{2})^T\). The corresponding eigenvalues of the \(S\)-matrix are straightforwardly \(e^{2i\delta_{\pm}}=r\pm t\), either from the earlier formula or by acting on those Fourier mode eigenvectors (recall that for \(t\in\textbf C\) we have the usual two branches for the square root \(\sqrt{t^2}=\pm t\)). Put another way, the \(S\)-operator is diagonalized in this basis:
Intuitively, if one views \(r,t\in\textbf C\cong\textbf R^2\) as vectors in the complex plane, then for a symmetric potential they must be \(\textbf R^2\)-orthogonal to each other, consistent with their \(\textbf C^2\)-orthogonality \(rt^*+(rt^*)^*=2\Re(rt^*)=0\Rightarrow |r\pm t|^2=|r|^2\pm 2\Re(rt^*)+|t|^2=|r|^2+|t|^2=1\) so that \(r\pm t=e^{2i\delta_{\pm}}\in U(1)\) as claimed all along:
In hindsight, it’s actually not too surprising why the eigenstates of the \(S\)-operator were given by the Fourier modes \(|k\rangle_{-\infty}\pm|-k\rangle_{\infty}\) when \(V(-x)=V(x)\). Physically, these look like simultaneously throwing in two identical beams of particles from both the left and right, just that one setup has even parity while the other setup has odd parity. After all, because \([H,\Pi]=0\) conserves parity, the interaction of the particles with \(H\) (in other words the scattering!) has to preserve parity asymptotically in the past and future. And those particular linear combinations are essentially the only way to construct \(\Pi\)-eigenstates out of the incident waves \(|k\rangle_{-\infty},|-k\rangle_{\infty}\) which don’t have definite parity.
Example: So far we’ve gotten away with proving several general results on one-dimensional scattering, assuming only a symmetric potential \(V(-x)=V(x)\). However, to actually compute \(S\)-matrices (regardless of which basis of scattering states you want to work in) we ultimately have to say what \(V(x)\) we’re talking about! To that effect, consider our favorite finite potential well of width \(L>0\) and depth \(V_0>0\). The goal is to compute the phase shifts \(\delta_{\pm}(k)\) that completely characterize all scattering phenomena off this potential. If we define \(|\psi_{\text{well}}^{\pm}\rangle:=\alpha_{\pm}(|k’\rangle_{[-L/2,L/2]}\pm|-k’\rangle_{[-L/2,L/2]})\) to be the compactly supported even/odd state in the finite potential well region, with increased momentum \(k’=\sqrt{2m(E+V_0)}/\hbar>k=\sqrt{2mE}/\hbar\), then the even/odd-parity total states \(|\psi_{\pm}\rangle\) are given by the superpositions \(|\psi_{\pm}\rangle=|k\rangle_{-\infty}\pm|-k\rangle_{\infty}+e^{2i\delta_{\pm}(k)}(|-k\rangle_{-\infty}\pm|k\rangle_{\infty})+|\psi_{\text{well}}^{\pm}\rangle\). Now because \(\langle x|\psi_{\pm}\rangle\) and \(\langle x|K|\psi_{\pm}\rangle\) should both be continuous at \(x=L/2\) (and by parity also at \(x=-L/2\)), it follows that \(\langle x|K\ln|\psi_{\pm}\rangle\) also needs to be continuous at \(x=\pm L/2\). Approaching \(x=L/2\) from the left means:
where \(k’=k'(k)\) through the energy \(E=\frac{\hbar^2k^2}{2m}=\frac{\hbar^2k’^2}{2m}-V_0\). The phase shift difference is obtained either by dividing the original exponentials or applying the arctan subtraction identity:
Finally, although the eigenvalues \(e^{2i\delta_{\pm}(k)}\) of the \(S\)-operator lie in \(U(1)\) when \(k\in\textbf R\), if we instead allow \(k\in\textbf C\) to roam the entire complex momentum plane, then the exponential \(e^{2i\delta_{\pm}(k)}\) is no longer restricted to lie in \(U(1)\). In particular, focus for a moment on just letting \(k\in i\textbf R\subseteq\textbf C\) roam the imaginary axis. We know from experience that the \(E<0\) bound state wavefunctions of the finite potential well (of which one even-parity bound ground state always exists) look like the pure harmonics of the infinite potential well but with an exponential leakage \(\propto e^{\pm \kappa x}[x\to\mp \infty]\) into the classically forbidden regions. These real exponentials \(e^{\pm\kappa x}\) are clearly very similar-looking to the complex exponentials \(e^{\pm ikx}\) that the scattering state wavefunctions involve. This is therefore how I would motivate this trick of looking at \(k=i\kappa’\) for \(\kappa’\in\textbf R\). We also know that the bound states of the finite potential well have to have definite parity (unlike the scattering states) so it seems plausible that maybe these bound states can emerge from our scattering states \(|\psi_{\pm}\rangle\) constructed earlier that do happen to have definite parity \(\pm\) as well. Therefore, with the substitution \(k=i\kappa’\), the total even/odd-parity scattering states \(|\psi_{\pm}\rangle\) transform asymptotically into looking like:
But clearly these candidate bound states \(|\psi_{\pm}\rangle\) (whether even or odd) can only have a hope of being normalizable (as required for bound states but not scattering states!) iff \(\kappa'<0\). Therefore, define \(\kappa:=-\kappa’>0\):
In order to kill the non-normalizable exponentials, we must therefore find values of \(k=-i\kappa\) on the lower imaginary axis such that the \(S\)-operator eigenvalues vanish \(e^{2i\delta_{\pm}(k)}=0\). More precisely, if we want to realize our dream of \(|\psi_+\rangle\) being an even parity bound state, then we need:
where now \(\kappa^2+k’^2=2mV_0/\hbar^2\) are constrained to lie on a circle \(k’-i\kappa\in \sqrt{2mV_0}/\hbar U(1)\) in the complex momentum plane \(\textbf C\). Meanwhile the condition above is satisfied iff the numerator vanishes \(\kappa=k’\tan(k’L/2)\).
This is the usual transcendental system of simultaneous equations for \((\kappa,k’)\in (0,\infty)^2\) that need to be solved to find even-parity bound states of the finite potential well. Graphically, it is clear that there always exists at least one ground state solution with possibly more depending on the width \(L\) and depth \(V_0\) of the potential.
For odd-parity bound states, we require \(e^{2i\delta_-(-i\kappa)}=0\) which instead leads to \(k’=-\kappa\tan k’L/2\). Graphically, this is a mirroring of the setup for even-parity states about the \(\Im =-\Re \) diagonal.
Resonances in Particle Physics
Recall from classical physics that the \(n\)-th harmonic for \(n\in\textbf Z^+\) on say a violin string of length \(L\) or inside a pipe of length \(L\) with nodes at both ends has wavelength \(\lambda=\lambda_n\) quantized according to \(n\lambda=2L\). Loosely, one might say that such harmonics “maximize transmission” of the instrument’s pleasant sound. It turns out the same principle is true when it comes to scattering quantum particles off a finite potential well of length \(L\); if the de Broglie wavelength \(\lambda=2\pi/k\) satisfies \(n\lambda=2L\), then the transmission probability is maximized \(T=|t|^2=|(e^{2i\delta_+(k)}-e^{2i\delta_-(k)})/2|^2=\sin^2(\delta_+(k)-\delta_-(k))=1\rightarrow \delta_+-\delta_-\equiv \pi/2\pmod \pi\) (alternatively, from the earlier geometric picture it is clear that \(|t|=\sin|\delta_+(k)-\delta_-(k)|\)). Equivalently, if one scatters in quantum particles exactly with energy \(E=\frac{n^2\pi^2\hbar^2}{2mL^2}+V_0\) for some \(n\in\textbf Z^+\), then the particle is guaranteed to transmit through with no reflection \(R=|r|^2=0\). Note that these energies are precisely those of the bound states in an infinite potential well with the bottom of the well raised to \(V_0\) (I’m not sure if this is a coincidence or something more universal about de Broglie quantization). This is one form of resonance in quantum mechanical scattering experiments, which one can think of as transmission resonances.
However, there is another sense of the word “resonance” as used in particle physics which is related to the above discussion. Specifically, a resonant state of a quantum system is one which is metastable, i.e. unstable in the long run \(t\to\infty\). Resonant states are thus a kind of hybrid of a bound state and a scattering state. What kind of quantum system admits resonant states? The finite potential well, it turns out, doesn’t. Instead, when asked to think about a potential \(V(x)\) admitting a resonant state, one’s gut reaction should be to visualize a trapping environment such as:
Here, the intuition is that there are no bound (i.e. permanently stable) states because the particle can quantum tunnel through the walls of the trap (don’t call it a well anymore!). Hence, the next best thing you can get is a resonant state.
We can idealize the above picture by considering a double Dirac delta potential trap \(V(x)=V_0L(\delta(x-L/2)+\delta(x+L/2))\) of width \(L>0\) and height \(V_0>0\) (the factor of \(L\) is needed to compensate the fact that delta functions always have the inverse dimension of their argument. This ensures \(V_0\) has units of energy). Integrating the Schrodinger equation in an \(\varepsilon\)-ball around the spike at \(x=L/2\) yields the magnitude of the first derivative’s discontinuity \(\psi'(L/2)^+-\psi'(L/2)^-=\frac{2mV_0 L}{\hbar^2}\psi(L/2)\) where the quantity \(\psi(L/2)\) is well-defined because \(\psi(x)\) itself will be continuous at \(x=L/2\) (and indeed, is the other condition we need to patch). Since this potential is again even \(V(-x)=V(x)\), we should scatter in particles from the left and right simultaneously and patch only at \(x=L/2\); looking at the even-parity scattering state for now (odd case is similar):
where \(\tilde k:=2mV_0 L/\hbar^2\). Unlike the finite potential well, the Dirac double delta potential trap admits no bound states because there does not exist any \(k\) on the lower imaginary axis such that \(e^{2i\delta_+(k)}=0\). To prove this, write \(k=-i\kappa\) for \(\kappa>0\) so that we need to solve the real equation \(\tanh\frac{\kappa L}{2}=-1-\frac{\tilde k}{\kappa}\). But the range of \(\tanh\) is \((-1,1)\) and for \(\kappa>0\) the right-hand side is always less than \(-1\), hence there can be no solution for \(\kappa>0\) (it turns out there are solutions for \(\kappa<0\) whenever \(\tilde kL\leq\approx 0.5569290855\) where the constant satisfies the transcendental equation \(\tanh\left(\frac{1}{2}+\frac{x}{4}\right)=1-\frac{1}{\frac{1}{x}-\frac{1}{2}}\) but we already saw these were unphysical). Of course, this non-existence of bound states is what we had expected all along. However, there do exist complex momenta \(k\in\textbf C-i\textbf R\) not on the imaginary axis for which the phase eigenvalue \(e^{2i\delta_+(k)}=0\) does vanish. But does that have any associated physical meaning? Well, just like the bound state case \(k=-i\kappa\) led to the negative energy \(E=-\hbar^2\kappa^2/2m\) characteristic of bound states, a general complex momentum \(k=\Re k+i\Im k\) leads to a complex energy \(E=\frac{\hbar^2(\Re^2 k-\Im^2 k)}{2m}+i\frac{\hbar^2}{m}\Re k\Im k\). To see what this means, consider that an \(H\)-eigenstate \(|E\rangle\) with such a complex energy \(E\) would evolve in time as:
\[|E(t)\rangle=e^{-iEt/\hbar}|E(0)\rangle=e^{-i\hbar(\Re^2 k-\Im^2 k)t/2m}e^{\hbar\Re k\Im k t/m}|E(0)\rangle\]
Thus, such an \(H\)-eigenstate \(|E(t)\rangle\) oscillates at angular frequency \(\omega=\hbar(\Re^2 k-\Im^2 k)/2m\). If we assume that \(\Re k\Im k<0\) have opposite signs (i.e. \(k\in\textbf C\) either lives in the second or fourth quadrant), then\(|E(t)\rangle\) also decays (rather than grows) exponentially in amplitude with time constant \(\tau=-m/\hbar\Re k\Im k>0\). This turns out to correspond to a resonant state with \(\tau\) representing the typical time scale for which one can trap the particle before it tunnels away. It is common to define the energy width \(\Gamma:=-2\hbar^2\Re k\Im k/m\) of such a metastable resonant state so that \(\Gamma\tau=2\hbar\) has a Heisenberg form (the reason for this name has to do with the Breit-Wigner distribution in particle physics).
So assuming we’ve found some non-imaginary complex momentum \(k=\Re k+i\Im k\in\textbf C-i\textbf R\) for which the outgoing waves are suppressed \(e^{2i\delta_+(k)}=0\), recall this means that the even parity scattering state (incorporating the time dependence) reduces to looking like \(\langle x|\psi_+(t)\rangle\to e^{-iEt/\hbar+ikx}=e^{i(\Re k x-\hbar(\Re^2k-\Im^2k)t/2m)}e^{-\Im k(x-\hbar\Re k t/m)}\) as \(x\to -\infty\) and \(\langle x|\psi_+(t)\rangle\to e^{-iEt/\hbar-ikx}=e^{-i(\Re k x+\hbar(\Re^2k-\Im^2k)t/2m)}e^{\Im k(x+\hbar\Re k t/m)}\) as \(x\to\infty\). But this simply describes the wavefunction of a particle that has tunneled out of trap, moving away in both directions \(x\to\pm\infty\) from the trap at speed \(v=\hbar|\Re k|/m\). Such a wavefunction \(|\psi_+\rangle\notin L^2(\textbf R\to\textbf C,dx)\) is not normalizable as is typical of resonant states, analogous to scattering states (for instance, despite being an \(H\)-eigenstate, its complex energy eigenvalue \(E\in\textbf C\) violates the supposed Hermiticity of the Hamiltonian \(H\), so such a state \(|\psi_+\rangle\) cannot belong to the state space \(L^2\)).
So then back to the double Dirac delta trap, we expect intuitively it should have an even resonant state, which mathematically means that its \(S\)-operator eigenvalue \(e^{2i\delta_+(k)}=-e^{-ikL}\frac{k\tan(kL/2)-\tilde k-ik}{k\tan(kL/2)-\tilde k+ik}\) should vanish somewhere in quadrant \(2\) or \(4\) of the complex \(k\)-plane. Indeed, this means we expect the numerator to vanish \(k\tan(kL/2)=\tilde k+ik\). One can check that this is equivalent to the requirement \(e^{ikL}=-1-\frac{2ik}{\tilde k}\). If the potential were infinitely trapping so that \(\tilde k\to\infty\), then we would have real momenta \(k=k_n=\frac{\pi}{L}+\frac{2\pi n}{L}\in\textbf R\). These are bound states because an infinitely strong double Dirac delta trap looks like an infinite potential well (at least in the trapping region \(x\in(-L/2,L/2)\); asymptotically all incident waves are reflected at the walls and \(e^{2i\delta_+(k)}=-e^{-ikL}\neq 0\) with the minus sign due to the phase shift of a reflection. Another point is that we previously saw bound states should have \(k\) living in the lower imaginary axis, but here \(k\in\textbf R\). This is a pathological exception to the usual rule since ordinarily for finite \(\tilde k\) we didn’t expect any bound states in the first place but an infinite trapping potential forbids particles from tunneling out, so would-be resonant states transmogrify into bound states in this artificial limit \(\tilde k\to\infty\)). Now suppose that the potential trapping strength \(\tilde k\) is still strong but finite (this is the limit we work in). Focus on the \(n=0\) bound state for which \(k=\pi/L\). It’s position space wavefunction in the trap looks just like the fundamental harmonic on a violin string. We want to know where this initially real \(k=\pi/L\) instead ends up in \(\textbf C\). To first order in \(1/\tilde k\), it simply moves to (going to first-order in the log and using binomial expansion) \(ikL\approx i\pi+2ik/\tilde k\Rightarrow k\approx \frac{\pi}{L}(1+2/\tilde kL)\). However, to second order \(ikL\approx i\pi+2ik/\tilde k+4k^2/\tilde k^2\). We can take the first-order solution for \(k\) and substitute in for the \(k^2\) term, keeping only terms to order \(1/\tilde k^2\), and end up getting \(k\approx\frac{\pi}{L}(1+2/\tilde kL+4/\tilde k^2L^2)-\frac{4\pi^2}{\tilde k^2L^3}i\) neglecting real and imaginary terms of order \(O_{\tilde k\to\infty}(1/\tilde k)^3\). This shows \(k\) moving toward the right and down in the complex momentum plane \(\textbf C\), which is the resonant (ground) state momentum!
Problem: Explain what the “nearly” in “nearly free electrons” means precisely.
Solution: It means the potential \(V\) experienced by the electrons is assumed to be weak in comparison to their kinetic energy \(V\ll T\) (in addition to being weak, it is also the case that the potential \(V\) is assumed to be periodic, though that’s not really conveyed by “nearly”). Immediately, the weak hypothesis on \(V\) implies it may be treated as a perturbation on \(T\) which is convenient because the simple kinetic Hamiltonian \(H=T\) has the familiar plane wave eigenstates \(|\mathbf k\rangle\) with energy eigenvalues \(E(\mathbf k)=\hbar^2|\mathbf k|^2/2m\).
Problem: Let \(\Lambda\) be a lattice in \(\mathbf R^d\), and let \(V(\mathbf x)\) be a weak \(\Lambda\)-periodic potential generated by some motif of atoms/ions present at each lattice point in \(\Lambda\). Within the nearly free electron model, explain qualitatively how the isotropic free electron dispersion \(E(\mathbf k)=\hbar^2|\mathbf k|^2/2m\) develops a discontinuity \(\Delta E_{\partial T^d}(\mathbf k)\) (called a band gap) on the boundary \(\mathbf k\in\partial T^d\) of the Brillouin zone \(T^d\).
Solution: A \(\Lambda\)-periodic function admits a Fourier series:
where the Fourier coefficients are determined by the structure factor of the potential \(V_{V_p}(\mathbf k)=\int_{V_p}d^d\mathbf xe^{-i\mathbf k\cdot\mathbf x}V(\mathbf x)\) with respect to a primitive cell \(V_p\) of volume \(|V_p||T^d|=(2\pi)^d\). If two plane wave states \(|\mathbf k\rangle,|\mathbf k’\rangle\) lie on the same sphere \(E(\mathbf k)=E(\mathbf k’)\) in \(\mathbf k\)-space, then degenerate perturbation theory is required iff the potential \(\langle\mathbf k’|V(\mathbf x)|\mathbf k\rangle\neq 0\) couples them:
Thus, \(\langle\mathbf k’|V(\mathbf x)|\mathbf k\rangle\neq 0\Leftrightarrow\mathbf k’-\mathbf k\in\Lambda^*\) which is sometimes visualized in \(d=2,3\) via the Ewald sphere construction:
Furthermore, in practice Born-von Karman quantization of \(\mathbf k\)-space due to a finite real space volume means that the Dirac comb can be replaced by a “Kronecker comb” that leads directly to the Fourier coefficient:
Thus, although in the Brillouin zone interior \(\text{int}(T^d)\) there exist uncountably many degenerate spheres \(S^{d-1}(k)\), it is only for radii \(k>0\) intersecting the Brillouin zone boundary \(S^{d-1}(k)\cap\partial T^d\neq\emptyset\) that there is a hope of two wavevectors \(\mathbf k,\mathbf k’\in S^{d-1}(k)\cap\partial T^d\) differing by \(\mathbf k’-\mathbf k\in\Lambda^*\) simply because of how \(T^d\) is defined as the Wigner-Seitz cell of \(\Lambda^*\).
Problem: Flesh out completely all the implications of the \(d=1\) nearly free electron model.
Solution:
In the \(1\)D extended zone scheme, the band structure (a butchering of the free particle parabolic dispersion) looks like:
or in the \(1\)D reduced zone scheme:
Problem: Comment on the group velocity and effective mass behaviour of this band structure.
Solution: The group velocity of electron wavepacket provides \(1^{\text{st}}\) derivative information about the dispersion (here \(p=\hbar k\)):
\[v:=\frac{\partial E}{\partial p}\]
One heuristic justification for the electron group velocity flattening out \(v\to 0\) at zone boundaries \(k=n\pi/a\) is that this ensures upon backfolding into the \(1^{\text{st}}\) Brillouin zone in the reduced zone scheme that the group velocity \(v=v(k)\) remains a continuous function of \(k\) at \(k=n\pi/a\).
Meanwhile, the effective mass provides \(2^{\text{nd}}\) derivative information about the dispersion:
(i.e. one can think of \(m^*\) as the radius of curvature at a point on the dispersion). For the electron it is positive near the bottom of a band but negative near the top, whereas it is the other way around for holes.
Problem: Let \(\Lambda\) be an arbitrary lattice in dimension \(d\), and let \(\mathbf k_0\in (T^d)^{(d-1)}\) be a wavevector on the Brillouin zone boundary such that it couples to only one other wavevector \(\mathbf k_0-\Delta\mathbf k\in (T^d)^{(d-1)}\) for \(\Delta\mathbf k\in\Lambda^*\) and no other \((d-1)\)-skeleton wavevectors in the CW complex \((T^d)^{(d-1)}\). In that case, show that the nearly free electron dispersion relation in a neighbourhood of \(\mathbf k=\mathbf k_0\) is the sum of non-relativistic and “relativistic” contributions:
Solution: The \(2\times 2\) Hamiltonian matrix restricted to the ~degenerate subspace \(\text{span}_{\mathbf C}|\mathbf k\rangle,|\mathbf k-\Delta\mathbf k\rangle\) is:
Writing \(\mathbf k=\mathbf k_0+\delta\mathbf k\) and invoking \(\varepsilon_{\mathbf k_0-\Delta\mathbf k}=\varepsilon_{\mathbf k_0}\Leftrightarrow \mathbf k_0\cdot\Delta\mathbf k=|\Delta\mathbf k|^2/2\), one can massage \(H\) into the form:
with effective magnetic field \(\mathbf B=-\mu_B^{-1}(\text{Re}V_{\Delta\mathbf k},-\text{Im}V_{\Delta\mathbf k},(\mathbf p-\mathbf p_0)\cdot\mathbf v_0)\). Standard properties of Pauli matrices then yields the desired spectrum. In particular, the group velocity vector field and effective mass tensor field are given by:
Problem: Consider a \(d=1\) linear lattice \(\Lambda_a\) with single-atom motifs spaced periodically a distance \(a\) apart, and suppose each atom is monovalent \(z=1\). Although standard band structure theory predicts that the first energy band will thus be half-filled and the material a metal as a result, explain why in practice (especially at sufficiently low temperatures) this is actually not the case.
There is another, more abstract way to qualitatively see why a periodic potential \(V(x+\Delta x)=V(x)\) gives rise to Brillouin zones and energy bands.
Consider an \(N\)-dimensional linear dynamical system \(\dot{\textbf x}(t)=A(t)\textbf x(t)\) where \(A(t)\) can be time-dependent. In general, one expects this to have \(N\) linearly independent solutions \(\textbf x_1(t),\textbf x_2(t),…,\textbf x_N(t)\). A fundamental matrix solution \(X(t)\) to the linear dynamical system is any \(N\times N\) matrix comprised of a basis of \(N\) such linearly independent solutions \(X(t)=\left(\textbf x_1(t),\textbf x_2(t),…,\textbf x_N(t)\right)\), so in other words the linear independence implies that \(X(t)\in GL_N(\textbf R)\) is invertible \(\det X(t)\neq 0\) at all times \(t\in\textbf R\). Think of the fundamental matrix solution \(X(t)\) as an \(N\)-dimensional parallelotope evolving in time \(t\). It can be checked that any fundamental matrix solution \(\dot{X}(t)=A(t)X(t)\) satisfies the same system of ODEs, and that any individual solution \(\textbf x(t)\) must lie in the column space \(\textbf x(t)\in\text{span}_{\textbf R}(\textbf x_1(t),…,\textbf x_N(t))\) of any fundamental matrix solution \(X(t)\) (indeed, this is why it’s called “fundamental” because it provides a basis for the entire solution space of the linear dynamical system). Thus, \(\textbf x(t)=X(t)X^{-1}(0)\textbf x(0)\). If \(X(0)=1_{N\times N}\), then \(X(t)\) is called a principal fundamental matrixsolution and in that case we just have \(\textbf x(t)=X(t)\textbf x(0)\). Thus, the principal fundamental matrixsolution plays a role analogous to the time evolution operator \(U(t)\) in quantum mechanics, evolving the initial state of a system through time \(t\).
Floquet theory is interested in a special case of the setup above, namely when \(A(t+T)=A(t)\) is a \(T\)-periodic matrix. In this case, it turns out that the solutions \(\textbf x(t)\) themselves will not necessarily be \(T\)-periodic (or periodic at all for that matter), but will nonetheless take on a much simpler form than the otherwise unenlightening time-ordered exponential solution \(\textbf x(t)=\mathcal Te^{\int_0^tA(t’)dt’}\textbf x(0)\) as a Dyson series. What’s that simplification? Let me spoil it first, then prove it afterwards. Recall that I just said \(\textbf x(t)\) won’t necessarily be periodic. But that should feel pretty strange since \(A(t)\) is by assumption \(T\)-periodic so surely \(\textbf x(t)\) should inherit something from this. And that’s exactly the essence of Floquet’s theorem, namely that although \(\textbf x(t)\) itself need not be periodic, its time evolution can always be factorized into a \(T\)-periodic \(N\times N\) matrix \(X_A(t)\) (with the subscript \(A\) to suggest that it inherits its \(T\)-periodicity \(X_A(t+T)=X_A(t)\) from the \(T\)-periodicity of the matrix \(A(t)\)) modulated by an exponential growth/decay or oscillatory “envelope” \(e^{\Lambda t}\) for some time-independent \(N\times N\) (possibly complex) matrix \(\Lambda\in\textbf C^{N\times N}\). In other words, one has the ansatz:
\[\textbf x(t)=X_A(t)e^{\Lambda t}\textbf x(0)\]
To prove Floquet’s theorem, we first ask: although \(\textbf x(t+T)\neq\textbf x(t)\) need not be periodic, how are \(\textbf x(t+T)\) and \(\textbf x(t)\) related to each other? Clearly by a simple \(T\)-time translation \(\textbf x(t+T)=X(t+T)X^{-1}(t)\textbf x(t)\). It makes sense then to focus on understanding these fundamental matrix solutions \(X(t)\) better.
The first key result of Floquet theory is that the time evolution of any fundamental matrix solution \(X(t)\) across a period \(T\) behaves multiplicatively \(X(t+T)=X(t)\tilde X_T\) where the \(N\times N\) monodromy matrix \(\tilde X_T=X^{-1}(t)X(t+T)\) is a conserved quantity of any \(T\)-periodic linear dynamical system:
It is thus common to initialize the monodromy matrix \(\tilde X_T=X^{-1}(0)X(T)\) and write \(X(t+T)=X(t)X^{-1}(0)X(T)\). After \(n\) periods elapse we have \(X(t+nT)=X(t)\tilde X_T^n\) so as a corollary for instance the hypervolume \(\det X(t)\) of the parallelotope grows stroboscopically in time \(t\) as a geometric sequence \(\det X(t+nT)=(\det \tilde X_T)^n\det X(t)\). Indeed, using Jacobi’s formula one can check that \(\dot{\det}X(t)=\det A(t)\det X(t)\) so \(\det X(t)=e^{\int_0^tA(t’)dt’}\det X(0)\) is consistent with \(X(t+T)=X(t)\tilde X_T\) provided the monodromy matrix \(\tilde X_T\) and \(A(t)\) are related by \(\det\tilde X_T=e^{\int_0^T\text{Tr} A(t)dt}\) (despite the suggestion, note that in general \(\tilde X_T\neq e^{\int_0^T A(t)dt}\)! Everything would be simpler if this were true but alas it isn’t).
It is therefore intuitively clear that the behavior/stability of any periodic linear dynamical system is sensitive to its monodromy matrix \(\tilde X_T\), specifically whether it “blows up” or “decays” exponentially or oscillates (as would have been one’s first instinct given the periodicity \(A(t+T)=A(t)\) of \(A\))! More precisely, to understand the behavior of powers \(\tilde X_T^n\) of a matrix, it is always a good idea to diagonalize it (a subtle but important property of the monodromy matrix \(\tilde X_T\) proven on page \(52\) here is that although it depends on the choice of fundamental matrix solution \(X(t)\) used to construct it, it will be similar to any other monodromy matrix \(\tilde X’_T\) constructed from any other fundamental matrix solution \(X'(t)\); thus, the eigenvalues of the monodromy matrix \(\tilde X_T\) are properties only of the linear, periodic dynamical system \(\dot X=AX\) and therefore are rightfully called its characteristic multipliers).
Since \(\tilde X_T\in GL_N(\textbf R)\), these characteristic multipliers must be non-zero, and so all such characteristic multipliers can be written as \(e^{\lambda T}\) for some \(\lambda\in\textbf C\) called its characteristic exponent. Although the imaginary part \(\Im\lambda\) of such a characteristic exponent is not unique \(e^{(\lambda+2\pi i/T)T}=e^{\lambda T}\), the real part \(\Re\lambda\) is unique and physically meaningful, being given the name Lyapunov exponent. Specifically, it is clear that if a solution \(\textbf x(t)\) happens to initialize on \(\textbf x(0)=X(0)\tilde{\textbf x}\) where \(\tilde X_T\tilde{\textbf x}=e^{\lambda T}\tilde{\textbf x}\) is an eigenvector of the monodromy matrix \(\tilde X_T\), then one can check that such a solution \(\textbf x(t+T)=e^{\lambda T}\textbf x(t)\) evolves multiplicatively in an analogous manner to the fundamental matrix solution earlier. In particular, if one replaces this \(\textbf x(t)\mapsto \textbf x_A(t):=e^{-\lambda t}\textbf x(t)\), then clearly this new trajectory (but no longer a solution!) \(\textbf x_A(t)\) is \(T\)-periodic \(\textbf x_A(t+T)=e^{-\lambda t}e^{-\lambda T}\textbf x(t+T)=e^{-\lambda t}e^{-\lambda T}e^{\lambda T}\textbf x(t)=e^{-\lambda t}\textbf x(t)=\textbf x_A(t)\). Thus, one can isolate the form of this “eigentrajectory” as \(\textbf x(t)=e^{\lambda t}\textbf x_A(t)\). For \(\Re \lambda<0\) we have \(\lim_{t\to\infty}\textbf x(t)=\textbf 0\), for \(\Re \lambda>0\) we have \(\lim_{t\to\infty}\textbf x(t)=\boldsymbol{\infty}\) while for \(\Re \lambda=0\) the eigentrajectory \(\textbf x(t)\) is in general pseudoperiodic in that it ends up at the same distance \(|\textbf x(t+T)|=|\textbf x(t)|\) (but not necessarily the same point) away from the origin after each period \(t\mapsto t+T\). These criteria emphasize the role played by the Lyapunov exponent \(\Re\lambda\) in the analysis of stability. By lining up \(N\) such linearly independent eigentrajectories (we’ll assume that the monodromy matrix \(\tilde X_T\) is diagonalizable), it follows that any fundamental matrix solution \(X(t)\) admits a so-called Floquet normal form \(X(t)=X_{A}(t)e^{\Lambda t}\) where \(X_A(t+T)=X_A(t)\) inherits the \(T\)-periodicity of \(A\) and \(\Lambda\) is a time-independent \(N\times N\) matrix. This therefore proves Floquet’s theorem!
Connection to Periodic Potentials in Quantum Mechanics
Define \(\boldsymbol{\phi}(x):=(\psi(x),\psi'(x))^T\) so that the one-dimensional second-order Schrodinger equation reduces to the first-order linear system:
where \(A(x)=\begin{pmatrix}0&1\\2m(V(x)-E)/\hbar^2 & 0\end{pmatrix}\) is \(\Delta x\)-periodic \(A(x+\Delta x)=A(x)\) by virtue of the \(\Delta x\)-periodicity of the potential \(V(x)\) itself. Now notice that clearly \(\text{Tr} A(x)=0+0=0\), so it follows from the earlier general Floquet theory results that \(\det\tilde \Phi_{\Delta x}=1\) where \(\tilde \Phi_{\Delta x}\) is the monodromy matrix of the periodic quantum system. This means that its two eigenvalues \(\varphi_{\pm}\) are given by:
Thus, when \(\text{Tr}\tilde \Phi_{\Delta x}<-2\) or \(\text{Tr}\tilde \Phi_{\Delta x}>2\), both \(\phi_{\pm}>0\) are real and positive, indicating a spatially unstable wavefunction \(\psi(x)\) that diverges in a non-normalizable manner as \(x\to\pm\infty\). Therefore, this is saying that there do not exist \(H\)-eigenstates with energy \(E\) such that the corresponding monodromy matrix \(\tilde \Phi_{\Delta x}=\tilde \Phi_{\Delta x}(E)\) has trace less than \(-2\) or greater than \(2\). These are the band gaps between the various energy bands! Of course then, on the other hand when \(\text{Tr}\tilde \Phi_{\Delta x}\in(-2,2)\) is in the zone of spatial stability (aka \(L^2\)-normalizability), these are associated with the energy bands! Finally, exactly when \(\text{Tr}\tilde \Phi_{\Delta x}=\pm 2\) is where one reaches the boundaries separating various Brillouin zones! Admittedly this Floquet analysis is a bit abstract and handwavy but I still think it’s a nice alternative perspective to the more concrete nearly-free electron model.
If we let \(\Phi(x)=\begin{pmatrix}\psi_1(x) & \psi_2(x) \\ \psi’_1(x) & \psi’_2(x)\end{pmatrix}\) be a fundamental matrix solution, then of course \(\det\Phi(x)\) is the usual constant Wronskian of two wavefunctions \(\psi_1(x),\psi_2(x)\), and Floquet’s theorem allows us to write \(\Phi(x)=\Phi_V(x)e^{\Lambda x}\) for a \(\Delta x\)-periodic \(2\times 2\) matrix \(\Phi_V(x+\Delta x)=\Phi_V(x)\) whose \(\Delta x\)-periodicity is inherited from the \(\Delta x\)-periodicity of the potential \(V(x)\). This leads to Bloch’s theorem (in one dimension), namely that there exists a crystal momentum \(k\in[-\pi/\Delta x,\pi/\Delta x]\) in the first Brillouin zone is given by the Bloch state \(\psi_{k}(x)=e^{ikx}\phi_k(x)\).
Problem: Before developing FGR, it is first necessary to lay out as a prerequisite the general framework of time-dependent perturbation theory (FGR is then a special case thereof). To this effect, consider the usual (Schrodinger picture) Hamiltonian decomposition \(H=H_0+V\), and obtain the “fundamental theorem of (\(1^{\text{st}}\)-order) time-dependent perturbation theory” for the transition amplitude between an initial (Schrodinger picture) \(H_0\)-eigenstate \(|i\rangle\) at \(t=0\) and a final (Schrodinger picture) \(H_0\)-eigenstate \(|f\rangle\) with distinct energies \(E_f\neq E_i\) as a function of the elapsed time \(t\):
where \(\hbar\omega_{fi}:=E_f-E_i\), \(V_{fi}(\omega_{fi}):=\langle f|V(\omega_{fi})|i\rangle\) and the linear response Fourier transform convention is being used:
Solution: First one has to establish the following exact decomposition lemma for the time evolution operator under the perturbed Hamiltonian \(H\) (prove by showing both sides obey \(i\hbar\dot U=HU\) with \(U(0)=1\)):
Note such decompositions exist only because the time integral begins at a finite \(t=0\) rather than e.g. \(t=-\infty\) as encountered in scattering theory).
Taking the matrix element of both sides in the \(H_0\)-eigenbasis, the free evolution piece \(\exp\left(-\frac{iH_0t}{\hbar}\right)\) contributes an irrelevant phase \(e^{-iE_ft/\hbar}\) when acting on the bra \(\langle f|\) since \(H_0\) cannot promote transitions between its own eigenstates; later when taking the mod-square of the transition amplitude to obtain a transition probability, this will drop out.
At this point, one employs the key step of (\(1^{\text{st}}\)-order) time-dependent perturbation theory, which is to Dyson-expand the time-ordered exponential to \(1^{\text{st}}\)-order (the word “order” here is being used in \(2\) different senses!):
(aside: why not just Dyson-expand the original time-evolution operator \(\mathcal T\exp\left(-\frac{i}{\hbar}\int_0^tdt’H(t’)\right)\)? Answer: because it is \(V\) that is weak, not \(H=H_0+V\)! This simple observation motivates the whole point of using the interaction picture in the first place!)
At this point, the integral is recognized as a \([0,t]\)-windowed Fourier transform of \(\langle f|V(t)|i\rangle\), so by interpreting \(\int_0^tdt’=\int_{-\infty}^{\infty}dt'[0\leq t’\leq t]\) with a top-hat filter and using the convolution theorem, one arrives at the stated result (note that with this F.T. convention, there is a factor of \(2\pi\) in the convolution theorem that makes \(\hbar\mapsto h=2\pi\hbar\)).
Problem: Suppose that \(V(t)=V_0\) is time-independent. Show that the transition probability between arbitrary \(H_0\)-eigenstates \(|i\rangle,|f\rangle\) undergoes Rabi oscillations:
Solution: This one leads to a “Fraunhofer single-slit”:
Problem: Instead of just a single DC component at \(\omega=0\), now let the spectrum of the perturbation \(V\) be monochromatic at some frequency \(\omega\neq 0\); due to Hermiticity requirements, such a perturbation must be of the form:
hence for large \(t\), one may approximate \(\text{sinc}^2(\omega t)=\pi\delta(\omega t)\) (or more concisely, \(\text{sinc}^2=\pi\delta\)). Using this lemma, establish Fermi’s golden rule for the “transition rate”:
Problem: What is Fermi’s useful golden rule for stimulated absorption?
Solution: Instead of a discrete final \(H_0\)-eigenstate \(|f\rangle\), there should be a continuum (or in practice a quasi-continuum) of final states described by a density of \(H_0\)-eigenstates \(g_{f,H_0}(E_f)\). The useful form of Fermi’s golden rule for the transition rate into any of these energetically compatible \(H_0\)-eigenstates is then:
where a factor of \(\hbar\) has been (optionally) absorbed to have an energy rather than angular frequency argument in the density of states. Of course, a similar useful form of Fermi’s golden rule holds for stimulated emission.
Problem: What are the \(3\) key assumptions of Fermi’s golden rule (FGR)?
Solution: The \(3\) key assumptions are:
Assumption #\(1\): Weak coupling \(V(t)\) (several ways to formalize this).
Assumption #\(2\): Continuum (or in practice quasicontinuum) of final scattering \(H_0\)-eigenstates to transition into, hence described by a density of states \(g_{f,H_0}(E_f)\).
Assumption #\(3\): Intermediate time window (this leads to the idea of the emergence and subsequent breakdown of FGR as a function of time \(t\)).
In most quantum mechanics textbooks, Assumption #\(1\) tends to be emphasized clearly by the fact that one is doing (time-dependent) perturbation theory. Assumption #\(2\) is present in Fermi’s useful golden rule, so in that sense is a bit optional. Finally, Assumption #\(3\) tends to be obscured, and sometimes omitted entirely, but is absolutely critical.
Problem: Elaborate on the importance of Assumption #\(3\).
Solution: The fact that there should be a lower bound on \(t\) makes sense, as enough time needs to pass for the \(\text{sinc}^2\) to actually look like the \(\delta\) as one of the key steps above. However, at the same time the notion of \(t\to\infty\) is a bit artificial, and for one clearly cannot be strictly correct since otherwise the transition probability \(P_{fi}(t)\propto t\) would eventually exceed \(1\).
More quantitatively, if one (optionally) defines a notion of Rabi frequency in the usual way \(\hbar\Omega_{fi}/2:=|\langle f|V_0|i\rangle|\), then it’s clear from the earlier formula for the transition probability that, taking the \(t\to 0\) limit rather than \(t\to\infty\), it instead takes off quadratically in time \(t\) as \(P_{fi}(t)=\Omega_{fi}^2t^2/4\) before the emergence of the linear-in-\(t\) FGR regime.
Meanwhile, in the large-\(t\) limit, if one drives too strongly then of course perturbation theory breaks down and instead one would get Rabi oscillations or something like that. So the exact regime of validity of FGR is a much richer and subtler topic than it seems at first. See the paper of Micklitz et al. and the paper of Chen et al. for more details.
(aside: a scattering amplitude is simply a transition amplitude between \(2\) asymptotic scattering \(H_0\)-eigenstates; thus the term “transition amplitude” is more general, referring to the probability amplitude of an \(H_0\)-measurement at time \(t\) to yield the energy \(E_f\) given the quantum system was, at time \(t=0\), in the \(H_0\)-eigenstate \(E_i\)).
Problem: Explain under what conditions in a scanning tunneling spectroscopy experiment, the differential conductance \(dI/d\phi\propto g_{s}(E=e\phi)\).
Solution: Assuming the \(3\) conditions of Fermi’s golden rule are met, one has to integrate the joint density of states (a convolution) \(g_{t\to s}(E):=\int dE’g_{t}(E-E’)g_{s}(E’)\):
Problem: A hydrogen atom in its \(s\)-wave bound ground state is ionized by light of frequency \(\omega\). Calculate the FGR transition rate to the continuum of asymptotically free scattering states (this is a simple model of the photoelectric effect).
Solution: A qualitative sketch of the calculation: first, ignore quantization of the EM field (valid for e.g. a laser) by imposing a suitable vector potential such as \(\textbf A\propto e^{i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}\) for the usual planar EM wave. The perturbation Hamiltonian \(V(t)\) is determined by expanding to first-order the usual minimally-coupled Hamiltonian \(H\) for a charge \(q=-e\) in an electromagnetic field. Since this is absorption rather than stimulated emission, compute the matrix element of the appropriate term between the hydrogen atom ground state \(|1,0,0\rangle\) and the free \(e^-\) state \(|\textbf k\rangle\) (where recall that the density of such free states is \(g_{f,H_0}(E)\propto\sqrt E\)).
Problem: (to be added, dipole approximation demonstration that rates of stimulated absorption/emission are same, and the first-order p.t. expression for that rate).
Solution: Fermi’s golden rule applied to a monochromatic perturbation (should this be treated as another assumption?), but for a more general spectrum, even transitions between discrete bound \(H_0\)-eigenstates are possible, e.g. atom bathing in a photon gas. To compute the stimulated absorption/emission rates \(\Gamma_a,\Gamma_e\) it is common to employ a so-called dipole approximation (though in this context I feel a better name would be long-wavelength approximation or even just cold approximation; note also that this is a further approximation on top of all the other approximations that have already been made before this) in which the typical wavelength \(\lambda\gg r_B\) of photons \(\gamma\) in the thermal bath is much longer than the Bohr radius \(r_B\) representing the typical length scale of atoms. Essentially what this means is that the vector potential \(\textbf A\sim e^{i(\textbf k_{\text{ext}}\cdot\textbf x-\omega_{\text{ext}} t)}\approx e^{-i\omega_{\text{ext}}t}\) is approximately spatially uniform for \(\textbf k_{\text{ext}}\cdot\textbf x\ll 1\) for \(\textbf x\) ranging over the atom, and therefore the time-dependent perturbation to the atom just has the form \(\Delta\tilde H(t)=e\textbf E_{\text{ext}}(t)\cdot\sum_{e^-}\textbf X_{e^-}\) where \(e\sum_{e^-}\textbf X_{e^-}\) is the net electric dipole moment of the atom, hence the name “dipole approximation”. Chugging this into Case #\(2\) of Fermi’s golden rules, one can compute the rates of stimulated absorption and emission to be equal and given by:
Note that non-relativistic quantum mechanics predicts that atoms can only undergo stimulated absorption or stimulated emission. In particular, non-relativistic quantum mechanics predicts that there are no such things as “spontaneous absorption” or “spontaneous emission” \(\Gamma_{1\to 2}^*=\Gamma_{2\to 1}^*=0\) where an atom can undergo a transition between \(\tilde H\)-eigenstates in the absence \(\textbf E_{\text{ext}}=\textbf B_{\text{ext}}=\textbf 0\) of external electromagnetic fields/photons (because the non-relativistic Schrodinger equation asserts that the \(t\)-evolution of \(\tilde H\)-eigenstates is to stay right where they are rather than hopping to other \(\tilde H\)-eigenstates). Indeed, it turns out there is no such thing as “spontaneous absorption” \(\Gamma_{1\to 2}^*\), but there can be spontaneous emission \(\Gamma_{2\to 1}^*\neq 0\). This phenomenon only arises once one quantizes the classical, smooth electromagnetic field \((\textbf E,\textbf B)\). When this is done, it is found that spontaneous emission is possible due to interactions between the atom and zero-point fluctuations of the quantum electromagnetic field, but to delve deeper would be the subject of quantum electrodynamics.
Nevertheless, Einstein was able to compute the rate of spontaneous emission \(\Gamma_{2\to 1}^*\) without knowing anything about the quantization of the electromagnetic field. Or more precisely, Einstein showed that if one could calculate the stimulated emission rate \(\Gamma_{2\to 1}\), then one would also get the spontaneous emission rate \(\Gamma_{2\to 1}^*\) “for free” since he found that the two were proportional to each other and what that proportionality constant was.
Einstein’s Statistical Argument for Spontaneous Emission
Consider any \(2\) energy levels \(E_1<E_2\) in an atom of degeneracies \(\Omega_1,\Omega_2\) with number of electrons \(N_1,N_2\) respectively.
Then in general, one heuristically expects probabilistic kinetics of the form (ASIDES: what happens if one also adds a “spontaneous absorption” term \(A_{12}N_1\) among the rate terms? And also, need to explicitly related the \(\Gamma\)-transition rates above in Fermi’s golden rule with this triplet of Einstein coefficients):
Consider a \(2\)-level system \(\mathcal H=\text{span}_{\mathbf C}|0\rangle,|1\rangle\cong\mathbf C^2\) in steady state thermal equilibrium \(\dot N_0=\dot N_1=0\) with a photon gas at temperature \(T\). The populations \(N_0,N_1\) of the \(2\) levels are Boltzmann distributed so that in particular \(N_0/N_1=g_0e^{\beta\hbar\omega_0}/g_1\) for \(\hbar\omega_0=E_1-E_0>0\). Meanwhile, the energy density of the photon gas per unit angular frequency \(d\omega\) is \(\frac{1}{c}\times\frac{2}{(2\pi)^3}4\pi\left(\frac{\omega}{c}\right)^2\frac{\hbar\omega}{e^{\beta\hbar\omega}-1}=\frac{\hbar\omega^3}{\pi^2c^3}\frac{1}{e^{\beta\hbar\omega}-1}\). The \(2\)-level system and the photon gas interact via \(3\) channels: stimulated absorption (rate \(B_{01}\frac{\hbar\omega_0^3}{\pi^2c^3}\frac{1}{e^{\beta\hbar\omega_0}-1}N_0\)), stimulated emission (rate \(B_{10}\frac{\hbar\omega_0^3}{\pi^2c^3}\frac{1}{e^{\beta\hbar\omega_0}-1}N_1\)), and spontaneous emission (rate \(A_{10}N_1\)), where the \(3\) Einstein coefficients \(B_{01},B_{10},A_{10}>0\) are each \(\omega_0\)-dependent (there is no such thing as “spontaneous absorption” \(A_{01}=0\)).
Equating the rate of \(01\) processes with the rate of \(10\) processes:
Because this result must hold for arbitrary equilibrium temperatures \(\beta\), one can just “pattern-match” (crucially, this is only possible thanks to Einstein’s introduction of spontaneous emission \(A_{10}\neq 0\) into the model!):
The first equality simply asserts that the transition rates \(\Gamma_{01}=\Gamma_{10}\) are equal (where \(\Gamma_{ij}:=g_iB_{ij}\)), which follows on the grounds that stimulated absorption of a photon of energy \(-E\) is equivalent to stimulated emission of a photon of energy \(E\) because the relevant Hamiltonian \(H\sim e^{i\omega_0 t}+e^{-i\omega_0 t}\) is Hermitian.
The second equality is the genuinely novel prediction (agreeing with the answer obtained from more rigorous QED calculations; here, Einstein implicitly quantizes the EM field by invoking the Planck distribution for the photon gas in the derivation). It’s nice because it provides a bridge to get from the \(B\)’s (which can be computed using standard dipole approximation/time-dependent perturbation theory) to the \(A\) (which outside of QED would not be computable otherwise). When this is done, one finds the lifetime \(\tau_{10}:=1/A_{10}\) of the excited state \(|1\rangle\) is given by:
The purpose of this post is to provide some examples of perturbation theory calculations in quantum mechanics by analyzing a specific perturbation called the Stark effect. For simplicity, we take as our quantum system a hydrogen atom with the usual non-relativistic two-body Hamiltonian:
and corresponding non-relativistic energy spectrum \(E_n=-\frac{1}{2}\left(\frac{\alpha}{n}\right)^2\mu c^2\). We now apply a constant external electric field \(\textbf E_{\text{ext}}=E_{\text{ext}}\hat{\textbf k}\) which is not too strong \(0<E_{\text{ext}}\ll e^2/4\pi\varepsilon_0r^2_{\text{Bohr}}\) along the \(z\)-axis and ask how this perturbs the original energy spectrum \(E_n\) (one can of course also calculate how it will perturb the corresponding \(H\)-eigenstates \(|n,\ell,m_{\ell}\rangle\) but for now we won’t look too deeply into that).
Before delving into the math, it is always helpful to leverage some classical intuition to try and predict what effect the perturbation will have on the various unperturbed \(H\)-eigenstates \(|n,\ell,m_{\ell}\rangle\) as well as their energies \(E_n\). Classically, we expect the proton \(p^+\) in the nucleus to be tugged in the direction of \(\textbf E_{\text{ext}}\) along the positive \(z\)-axis while the electron \(e^-\) is tugged by an equal and opposite force along the negative \(z\)-axis. Thus, for \(H\)-eigenstates where the electron \(e^-\) position space wavefunction \(\langle\textbf x|n,l,m_{\ell}\rangle\) was more concentrated “above” the proton \(p^+\), the average electric dipole moment \(\textbf p\) would find itself anti-aligned relative to the electric field \(\textbf E_{\text{ext}}\), leading to a positive energy contribution \(-\textbf p\cdot\textbf E_{\text{ext}}>0\) so that the energy \(E_n\) of that \(H\)-eigenstate is expected to increase (one might object that the Coulomb potential energy \(V\) would decrease. By a similar logic, \(H\)-eigenstates where the \(e^-\) was predominantly “below” the proton \(p^+\) should experience a decrease in their energy \(E_n\).should this should increasing their electric dipole moment \(\textbf p\) and by extension their average Coulomb potential energy \(\langle V\rangle\), hence by the virial theorem also leading on average to an increased energy \(\langle H\rangle =\langle V\rangle/2\). It remains to be seen if quantum mechanics bears out these classical considerations!
The Stark perturbation \(\Delta H_{\text{Stark}}\) to the original Hamiltonian \(H\) due to the application of the external electric field \(\textbf E_{\text{ext}}\) arises physically from the coupling between the electric dipole moment \(\textbf p\) of the proton-electron pair defining the hydrogen atom and the external electric field \(\textbf E_{\text{ext}}\), i.e. \(\Delta H_{\text{Stark}}=-\textbf p\cdot\textbf E_{\text{ext}}=-(-e\textbf X)\cdot\textbf E_{\text{ext}}=eE_{\text{ext}}X_3\).
Case #1: Nondegenerate Perturbation Theory
The easiest Stark perturbation to compute pertains to the ground state \(|1,0,0\rangle\) of the hydrogen atom. The reason it’s the easiest is simply that there is only one unique ground state in hydrogen (caveat: we are working with non-relativistic quantum mechanics so we’ll pretend we don’t know about spin!). So for the \(n=1\) ground state there is no degeneracy to worry about (which is decisively not true for \(n>1\)!). The ground state \(|1,0,0\rangle\) of course initially possesses an unperturbed energy \(E_1\approx -13.6\text{ eV}\). Using the standard result of nondegenerate perturbation theory, the first-order correction \(\Delta E_1^{(1)}\) to the unperturbed ground state energy \(E_1\) is given by the expectation of the unperturbed ground state \(|1,0,0\rangle\) in the perturbation Hamiltonian \(\Delta H_{\text{Stark}}\):
Recall that rotational symmetry \([H,\textbf L]=\textbf 0\) automatically implies centrosymmetry \([H,\Pi]=0\) and so \(\Pi|n,\ell,m_{\ell}\rangle=(-1)^{\ell}|n,\ell,m_{\ell}\rangle\) have definite parity. It follows that the ground state \(|1,0,0\rangle\) has even parity \(\Pi|1,0,0\rangle=|1,0,0\rangle\) so we expect \(\langle 1,0,0|X_3|1,0,0\rangle=\langle 1,0,0|\Pi^{\dagger}X_3\Pi|1,0,0\rangle=-\langle 1,0,0|X_3|1,0,0\rangle\) (where we’ve used the fact that \(\textbf X\) is a vector operator), thus to first-order \(\Delta E_1^{(1)}=0\) and the ground state energy \(E_1\) is unaffected (notice the above argument would have been equally valid if the ground state were to have oddparity! Thus, the key is not which parity it had, but that it had a definite parity at all). Instead, the Stark perturbation only affects the ground state energy \(E_1\) at second-order \(\Delta E_1^{(2)}\neq 0\) which is why this is an example of the so-called quadratic Stark effect.
To compute the second-order correction \(\Delta E_1^{(2)}\), we need to calculate:
Since \(X_3\) is already the \(T^1_0\) spherical component of the rank-\(1\) tensor operator \(\textbf X\) (since it’s a vector operator!), the Wigner-Eckart theorem asserts that the matrix element \(\langle n,\ell,m_{\ell}|X_3|1,0,0\rangle\) is proportional to the Clebsch-Gordan coefficient \(\langle n,\ell,m_{\ell}|\otimes\langle 1,0,0|1,0\rangle\). In particular, one has the usual angular momentum addition selection rules \(0=m_{\ell}+0\) and \(|\ell-0|\leq 1\leq\ell+0\), so the only non-vanishing matrix elements are those with \(\ell=1,m_{\ell}=0\) (put another way, the only states that mix with the ground state \(|1,0,0\rangle\) are those of the form \(|n,1,0\rangle\)). The second-order energy correction \(\Delta E_1^{(2)}\) to the ground state \(|1,0,0\rangle\) simplifies to a single infinite series:
where, if you want the details of that sum, see here. The point is \(\Delta E_1^{(2)}<0\). Classically, the external electric field \(\textbf E_{\text{ext}}\) polarizes the hydrogen atom with electric dipole moment \(\textbf p=\alpha\textbf E_{\text{ext}}\) with \(\alpha>0\) the atomic polarizability, so it makes sense that the perturbation \(V_{\text{Stark}}=-\textbf p\cdot\textbf E_{\text{ext}}=-\alpha E_{\text{ext}}^2<0\) is both negative and comes in at quadratic order \(V_{\text{Stark}}=O_{E_{\text{ext}}\to 0}(E_{\text{ext}}^2)\).
Case #2: Degenerate Perturbation Theory
Now we compute the Stark perturbation for the \(n=2\) states. Recall that in the hydrogen atom, the \(n\)-th energy level \(E_n\) is associated with an \(\sum_{\ell=0}^{n-1}(2\ell+1)=n^2\)-dimensional degenerate \(H\)-eigensubspace. For \(n=1\), \(1^2=1\) so we have a single non-degenerate ground state \(|1,0,0\rangle\) as before. However, for \(n=2\) we have \(2^2=4\) degenerate states \(|2,0,0\rangle,|2,1,-1\rangle,|2,1,0\rangle\) and \(|2,1,1\rangle\) (in chemistry lingo, the \(2s\)-wave and the three \(2p\)-waves). As usual, we look for eigenstates of the perturbation Hamiltonian \(\Delta H_{\text{Stark}}\biggr|_{\ker(H-E_2 1)}\) restricted to the degenerate \(E_2\)-eigensubspace \(\ker(H-E_2 1)=\text{span}_{\textbf C}|2,0,0\rangle,|2,1,-1\rangle,|2,1,0\rangle,|2,1,1\rangle\) of the unperturbed Hamiltonian \(H\) since such eigenstates are automatically eigenstates of the perturbed Hamiltonian \(H+\Delta H_{\text{Stark}}\)! The matrix elements of \(\Delta H_{\text{Stark}}\biggr|_{\ker(H-E_2 1)}\) in the obvious basis \(|2,0,0\rangle,|2,1,-1\rangle,|2,1,0\rangle,|2,1,1\rangle\) may be determined through parity considerations for the diagonal elements and Wigner-Eckart selection rule considerations for the off-diagonal elements:
Thus with only (3\) distinct eigenvalues \(-3er_{\text{Bohr}}E_{\text{ext}},0,3er_{\text{Bohr}}E_{\text{ext}}\) representing the degeneracy-lifting energy corrections that need to be made to the degenerate energy \(E_2\) (actually though, having an eigenvalue of \(0\) means no energy correction is made!). Thus, the Stark perturbation lifts the degeneracy among the states \(|2,0,0\rangle\) and \(|2,1,0\rangle\) by mixing them into \(2\) “\(sp\) hybrid waves” with equiprobable “\(50\%\) \(s\)-character” and “\(50\%\) \(p\)-character”.
Note that it doesn’t really make sense to say specifically which of the unperturbed states \(|2,0,0\rangle,|2,1,0\rangle\) “became” which of the two new hybrid states. It only makes sense to say that they mixed. The other two \(p\)-waves \(|2,1,-1\rangle,|2,1,1\rangle\) remain degenerate/”unhybridized” \(p\)-waves.
Although the energy corrections to \(E_2\) scaled linearly with \(E_{\text{ext}}\) (thus, this is called the linear Stark effect), the new eigenstates are completely independent of \(E_{\text{ext}}\)! Even the slightest stray electric field in the lab evidently causes the hydrogen atom to prefer the probabilistic superposition. Diagramatically, the Stark perturbation as a function of the external electric field strength for the states considered so far looks like:
I mentioned this already, but I just wanted to re-emphasize that “degenerate perturbation theory” is a misnomer because the energies have been computed exactly rather than perturbatively! That’s it, those are the exact linear Stark effect energies! It is interesting to try and think about why these results make sense classically.
From an experimental physics perspective, the Stark effect enables one to “sniff out” degeneracies in the Hamiltonian \(H\) of an initially “highly symmetric” quantum system simply through the application of an external electric field \(\textbf E_{\text{ext}}\) (which breaks that symmetry and therefore the degeneracy). If the agent causing the external electric field \(\textbf E_{\text{ext}}\) is say another atom, then roughly speaking the Stark effect (specifically the linear Stark effect part where the lowest-energy lifted state \((|2,0,0\rangle+|2,1,0\rangle)\sqrt{2}\) is maximally delocalized) also explains why conductors work.
Although the spherical harmonics are just functions of \(\theta,\phi\) on the sphere \(S^2\), one can introduce an \(r\)-dependence simply by multiplying each of them by \(r\). The resulting functions are then best written in Cartesian coordinates \((x,y,z)\):
But this suggests something quite interesting. It means if we have any vector \(\textbf x\in\textbf R^3\) (i.e. a rank \(\ell=1\) tensor) with contravariant components \(\textbf x=x\hat{\textbf i}+y\hat{\textbf j}+z\hat{\textbf k}\) in a Cartesian (aka orthonormal) basis \(\hat{\textbf i},\hat{\textbf j},\hat{\textbf k}\), then by rotating into the so-called spherical basis \(\hat{\textbf e}_{-1}=\frac{\hat{\textbf i}-i\hat{\textbf j}}{\sqrt{2}},\hat{\textbf e}_{0}=\hat{\textbf k},\hat{\textbf e}_{1}=-\frac{\hat{\textbf i}+i\hat{\textbf j}}{\sqrt{2}}\), the contravariant components \(\textbf x=x^{\mu}\hat{\textbf e}_{\mu}\) of the same vector \(\textbf x\) in this spherical basis are now proportional to the spherical harmonics! But this gives us a handle on how such components must transform under rotations because we know how spherical harmonics transform under rotations (and the \(r\)-prefactor is constant with respect to rotations)! To wit, recall that \(Y_{\ell}^m(\hat{\textbf n})=\langle\hat{\textbf n}|\ell, m\rangle\) are the (angular part of the) position space wavefunctions of orbital angular momentum eigenstates, and for each fixed total angular momentum \(j\in\textbf N/2\), the \(2j+1\)-dimensional subspace \(\mathcal H_{j}:=\text{span}_{m=-j,…,j}|j,m\rangle\) is \(e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}\)-invariant for all \(\Delta \boldsymbol{\phi}\) in \(SO(3)\). More precisely, these states transform according to the components of the Wigner D-matrix \(D^j(\boldsymbol{\phi})\in U(2j+1)\):
with \(D_{m’m}^j(\Delta{\boldsymbol{\phi}})=\langle j,m’|e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}|j,m\rangle\).
Example: Consider the vector \(\textbf x=\hat{\textbf i}-2\hat{\textbf j}+3\hat{\textbf k}\) in some Cartesian basis. Suppose we wished to rotate \(\textbf x\) by an angle of \(\pi/5\) about the \(z\)-axis. Ordinarily, we could just achieve this by acting on the Cartesian components of \(\textbf x\) with a suitable rotation matrix:
Alternatively, in the spherical basis we instead have \(\textbf x=\frac{1+2i}{\sqrt{2}}\hat{\textbf e}_{-1}+3\hat{\textbf e}_{0}-\frac{1-2i}{\sqrt 2}\hat{\textbf e}_{1}\). We then note that the associated Wigner \(D\)-matrix has components \(D^1_{m’m}(\pi/5)=\langle 1,m’|e^{-i\pi/5 J_3}|1,m\rangle=\delta_{m’m}e^{-im\pi/5}\) and so \(D^1(\pi/5)=\text{diag}(e^{i\pi/5},1,e^{-i\pi/5})\). Acting with \(D^1(\pi/5)\) on the components of the vector \(\textbf x\) in the spherical basis then yields:
Taking the earlier Cartesian result and converting it into the spherical basis shows that the two calculations match as claimed.
So far we have been talking about a vector \(\textbf x\in\textbf R^3\). Of course though, all of this discussion applies just as well to vector operators on \(\mathcal H=L^2(\textbf R^3\to\textbf C)\) in quantum mechanics such as \(\textbf X,\textbf P\), etc. To reiterate, to rotate a vector in \(\textbf R^3\), one can always first perform a unitary change of basis from Cartesian to spherical, rotate the spherical components of the vector operator as if they were orbital angular momentum eigenstates (or spherical harmonics if you like) using a suitable Wigner \(D\)-matrix, and the result will be the rotated vector operator in the spherical basis.
Even more generally, one need not limit oneself to scalar operators or vector operators; any tensor operator can be converted from Cartesian components to spherical components (with the latter dubbed a “spherical tensor operator” although I find it to be misnomer as it’s not a different kind of tensor operator, just one whose components happen to expressed in the spherical basis). More precisely, if \(T^j_m\) are the spherical components of a rank \(j\in\textbf N\) tensor operator \(T\) with \(2j+1\) component operators \(T^j_m\) for \(m=-j,…,j\), then these components transform under rotations via the Wigner \(D\)-matrix:
As already suggested, rank \(j=0\) scalar operators have \(2j+1=1\) spherical component operator that transforms under rotations as \(e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}T^0_0e^{i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}=D_{00}^0(\Delta{\boldsymbol{\phi}})T_0^0=T_0^0\) which has the logically equivalent infinitesimal formulation \([\textbf J,T_0^0]=0\). Meanwhile, a rank \(j=1\) vector operator with \(2j+1=3\) spherical component operators \(T^1_{-1},T^1_{0},T^1_1\) transforms under rotations as \(e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}T^1_me^{i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}=D_{-1m}^1(\Delta{\boldsymbol{\phi}})T_{-1}^1+D_{0m}^1(\Delta{\boldsymbol{\phi}})T_0^1+D_{1m}^1(\Delta{\boldsymbol{\phi}})T_1^1\), or infinitesimally \(\). This should be compared with the analogous rotation behavior statement for the Cartesian components \(e^{-i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}T_ie^{i\Delta{\boldsymbol{\phi}}\cdot\textbf J/\hbar}=R_{ij}T_j\), or infinitesimally \([J_i,T_j]=i\hbar\varepsilon_{ijk}T_k\). More generally then, one can also look at the infinitesimal limit of the above defining equation for spherical components of a tensor operator to obtain logically equivalent commutation relations:
\[[J_3,T^{\ell}_m]=m\hbar T^{\ell}_m\]
\[[J_{\pm},T^{\ell}_m]=\hbar\sqrt{(\ell\mp m)(\ell\pm m +1)}T^{\ell}_{m\pm 1}\]
Hopefully this all makes sense! Finally then, we arrive at:
Theorem (Wigner-Eckart Theorem): Let \(T:\mathcal H\to\mathcal H\) be a spherical tensor operator of rank \(j\) with \(2j+1\) components \(T_{(j)}^{m}\) for \(m=-j,…,j\). Then there exists a so-called reduced matrix element \(\langle j_2||T^{(j)}||j_1\rangle\) independent of \(m_1,m_2,m\) which acts as a proportionality constant between the matrix elements of \(T\) in the angular momentum eigenbasis and Clebsch-Gordan coefficients that can be easily computed or looked up in tables:
In particular, recall that the Clebsch-Gordan coefficients \langle j_1,m_1|\otimes\langle j_2,m_2|j,m\rangle have the standard selection rules \(m=m_1+m_2\) and \(|j_1-j_2|\leq j\leq j_1+j_2\). By virtue of the Wigner-Eckart theorem, these selection rules are also directly inherited by the matrix elements \(\langle j_2,m_2|T_j^m|j_1,m_1\rangle\).
Problem: Working on the flat manifold of Minkowski spacetime \(X\cong\mathbf R^{1,3}\) equipped with the Lorentzian metric signature \((+,-,-,-)\), define what is meant by a Lorentz scalar, a Lorentz vector (also called a 4-vector), a Lorentz tensor, and a Lorentz spinor.
Solution: Recall that the proper orthochronous Lorentz Lie algebra \(\frak{so}^+\)\((1,3)\) is spanned by \(3\) rotation generators \(\mathbf J\) and \(3\) Lorentz boost generators \(\mathbf K\) with commutation relations \(\mathbf J\times\mathbf J=\mathbf K\times\mathbf K=i\hbar\mathbf J\) and \((\mathbf J\times\mathbf K+\mathbf K\times\mathbf J)/2=i\hbar\mathbf K\). Upon complexifying \(\mathbf J_1:=(\mathbf J+i\mathbf K)/2\) and its adjoint \(\mathbf J_1^{\dagger}:=\mathbf J_2=(\mathbf J-i\mathbf K)/2\), one has \(\mathbf J_{1,2}\times\mathbf J_{1,2}=i\hbar\mathbf J_{1,2}\) and \(\mathbf J_1\times\mathbf J_2=-\mathbf J_2\times\mathbf J_1\) so \(\frak{s0}^+\)\((1,3)\cong\frak{su}\)\((2)\oplus\frak{su}\)\((2)\) and irreps of \(SO^+(1,3)\) are labelled by pairs \((j_1,j_2)\in\{(0,0),(1/2,0),(0,1/2),(1/2,1/2),…\}\).
Lorentz scalars are real numbers that transform under the trivial \((0,0)\) irrep of \(SO^+(1,3)\). Examples include the speed of light \(c\) (by postulate!), particle (rest) mass \(m\), electric charge \(q\), spacetime volumes \(d^4x\) (more generally, \(\sqrt{|\det g|}d^4x\)), spacetime intervals/”rulers” \(ds^2=c^2dt^2-|d\mathbf x|^2\) (more generally \(ds^2=g_{\mu\nu}dx^{\mu}dx^{\nu}\)), proper time along a time-like worldline \(d\tau:=ds/c\), energy-momentum intervals \((E/c)^2-|\mathbf p|^2=m^2c^2\) (more generally \(g_{\mu\nu}p^{\mu}p^{\nu}=m^2c^2\)), the electromagnetic Lagrangian density field \(-B_{\mu\nu}(x)B^{\mu\nu}(x)/4\mu_0=\varepsilon_0|\mathbf E(x)|^2/2-|\mathbf B(x)|^2/2\mu_0\), the \(\theta\)-axion pseudoscalar field \(-\epsilon^{\mu\nu\rho\sigma}B_{\mu\nu}(x)B_{\rho\sigma}(x)/8=\mathbf E(x)/c\cdot\mathbf B(x)\), the Ricci scalar field \(R(x)\), etc.
Lorentz vectors (aka \(4\)-vectors) are tangent vectors in \(T(X)\cong X\) that transform under the natural \((1/2,1/2)\) irrep of \(SO^+(1,3)\). Examples include the \(4\)-position \(x:=(ct,\mathbf x)\) (note: in curved spacetimes \(X\neq\mathbf R^{1,3}\), the “twisting” of the local tangent space means one can no longer treat events \(x\in X\) as “\(4\)-positions”), the \(4\)-velocity \(v:=dx/d\tau=\gamma(c,\mathbf v)\) with time dilation \(\gamma:=dt/d\tau\), the \(4\)-momentum \(p:=mv=(E/c,\mathbf p)\) with \(E=\gamma mc^2\) and \(\mathbf p=\gamma m\mathbf v\), the \(4\)-wavevector \(k:=p/\hbar=(\omega/c,\mathbf k)\) with \(E=\hbar\omega\) and \(\mathbf p=\hbar\mathbf k\), the \(4\)-acceleration \(a:=dv/d\tau=\gamma^4(\boldsymbol{\beta}\cdot\mathbf a,\mathbf a+\boldsymbol{\beta}\times(\boldsymbol{\beta}\times\mathbf a))\) with \(\boldsymbol{\beta}:=\mathbf v/c\), the \(4\)-force \(F:=dp/d\tau=ma=\gamma(\mathbf F\cdot\boldsymbol{\beta},\mathbf F)\) where \(\dot E=\gamma^3 m\mathbf a\cdot\mathbf v=\mathbf F\cdot\mathbf v\) and \(\dot{\mathbf p}=m\gamma^3(\mathbf a+\boldsymbol{\beta}\times(\boldsymbol{\beta}\times\mathbf a))=\mathbf F\), the \(4\)-current field \(J(x):=(\rho(x) c,\mathbf J(x))\), the electromagnetic \(4\)-potential gauge field \(A(x):=(\phi(x)/c,\mathbf A(x))\), etc.
Lorentz tensors are any objects that transform under some direct sum of \((j_1,j_2)\) irreps of \(SO^+(1,3)\) for which \(j_1+j_2\in\mathbf N\); thus, Lorentz vectors are an instance of Lorentz tensors. Other examples include the electromagnetic tensor field \(B(x)\) which transforms under the (reducible) \(SO^+(1,3)\)-representation \((1,0)\oplus (0,1)\) and the electromagnetic stress tensor field \(T(x)=\begin{pmatrix}\mathcal H(x)&-\mathbf I^T(x)/c \\ -\mathbf I(x)/c & -\sigma(x)\end{pmatrix}\) where the electromagnetic Hamiltonian density \(\mathcal H(x)=\varepsilon_0|\mathbf E(x)|^2/2+|\mathbf B(x)|^2/2\mu_0\), the Poynting vector \(\mathbf I(x)=\mathbf E(x)\times\mathbf B(x)/\mu_0\), and the Maxwell stress tensor field \(\sigma(x):=\varepsilon\mathbf E^{\otimes 2}(x)+\mathbf B^{\otimes 2}(x)/\mu_0-\mathcal H(x)1\).
Lorentz spinors are any objects that transform under some \((j_1,j_2)\) irrep of \(SO^+(1,3)\) for which \(j_1+j_2\in\mathbf N+1/2\). For instance, objects that transform under the \((1/2,0)\) irrep are called left-handed Weyl spinors while objects that transform under \((0,1/2)\) are called right-handed Weyl spinors. These will be of no concern for electromagnetism which is focused on spin-\(1\) photons, but do matter when one has to couple fermionic matter fields to gravity.
Finally, as a terminology note, Lorentz scalars are synonymous with Lorentz invariant quantities, while all other objects transforming under some non-trivial irrep of \(SO^+(1,3)\) are collectively grouped under the umbrella of Lorentz covariant quantities (this is not to be confused with an obsolete use of the word “covariant” in “covariant vectors” to distinguish them from “contravariant vectors”; in modern terminology, one should simply refer to them as covectors and tangent vectors respectively).
Problem:
Solution:
\[f^{\mu}=qv^{\nu}F^{\mu}_{\nu}\]
with just as \(v^{\mu}a_{\mu}=0\) are orthogonal, so \(p^{\mu}f_{\mu}=0\) are also orthogonal.
A prominent example of a four-covector is the four-gradient \(\frac{\partial}{\partial X}:=\left(\frac{1}{c}\frac{\partial}{\partial t},\frac{\partial}{\partial\textbf x}\right)\). It has components \(\frac{\partial}{\partial X^{\mu}}\). To prove that it’s actually a four-covector and not a four-vector, let \(X’^{\mu}=\Lambda^{\mu}_{\nu}X^{\nu}\) be the Lorentz-transformed contravariant components of the four-position \(X\). From this we obtain \(\frac{\partial X’^{\mu}}{\partial X^{\nu}}=\frac{\partial}{\partial X^{\nu}}\Lambda^{\mu}_{\rho}X^{\rho}=\Lambda^{\mu}_{\rho}\frac{\partial X^{\rho}}{\partial X^{\nu}}=\Lambda^{\mu}_{\rho}\delta_{\nu}^{\rho}=\Lambda^{\mu}_{\nu}\) or equivalently \(\frac{\partial X^{\nu}}{\partial X’^{\mu}}=\Lambda^{\nu}_{\mu}\). Meanwhile, the components of the four-gradient transform (by virtue of the chain rule) into \(\frac{\partial}{\partial X^{\mu}}\mapsto\frac{\partial}{\partial X’^{\mu}}=\frac{\partial X^{\nu}}{\partial X’^{\mu}}\frac{\partial}{\partial X^{\nu}}=\Lambda^{\nu}_{\mu}\frac{\partial}{\partial X^{\nu}}\). Thus, either from the fact that the \(\mu\) is a subscript on \(\Lambda\) or that the \(\nu\) is a superscript on \(\Lambda\), clearly this implies that the four-gradient \(\frac{\partial}{\partial X}\) is a four-covector and hence it has covariant components that can be denoted by the subscript \(\partial_{\mu}:=\frac{\partial}{\partial X^{\mu}}\).
Tensors
In general, on an arbitrary manifold (such as Minkowski spacetime!) if one transforms between two coordinate charts \((x^1,x^2,…)\mapsto (x’^1,x’^2,…)\) (i.e. a Lorentz transformation between two inertial frames!), then physicists define a tensor \(T\) of type \((u,v)\) by enforcing that its components transform according to:
Please ingrain this equation into your head. In the case where that manifold is Minkowski spacetime, the partial derivatives become (via the earlier derivation):
which we call a Lorentz tensor. Thus, Lorentz scalars are type \((0,0)\) Lorentz tensors, four-vectors are type \((1,0)\) Lorentz tensors while four-covectors are type \((0,1)\) Lorentz tensors. The important point about Lorentz tensors in special relativity is that any equation you write down using tensors (where you’ve respected the rules about pairing upper/contravariant and lower/covariant indices, etc.) is necessarily a Lorentz covariant equation in the sense that it will look the same in all inertial frames. This phenomenon of Lorentz covariance was one of Einstein’s original postulates of special relativity!
But why is \(J\) a four-vector (field)? The idea is that \(J\) contains both the information \(\rho\) about where the charges are and how they’re moving \(\textbf J\), but of course by local charge conservation these two quantities are related by the continuity equation \(\frac{\partial\rho}{\partial t}+\frac{\partial}{\partial\textbf x}\cdot\textbf J=0\) which is elegantly expressed using the four-gradient as \(\partial_{\mu}J^{\mu}=0\). So the aesthetic of this should give some suggestion that, because we already showed that the four-gradient \(\partial_{\mu}\) is a four-covector, \(J\) should be a four-vector. To prove this rigorously, we enforce Lorentz covariance of local charge conservation in all inertial frames, so \(\partial’_{\mu}J’^{\mu}=0\). Then, \(\partial_{\mu}=\Lambda_{\mu}^{\nu}\partial’_{\nu}\), so \(\Lambda_{\mu}^{\nu}\partial’_{\nu}J^{\mu}=\partial’_{\nu}\Lambda_{\mu}^{\nu}J^{\mu}=0\). This implies that \(J’^{\mu}=\Lambda^{\mu}_{\nu}J^{\nu}+C\) where \(C\) is an integration constant independent of \(\mu\). But if the primed and unprimed inertial frames were to coincide so that \(\Lambda^{\mu}_{\nu}=\delta^{\mu}_{\nu}\) and \(J’^{\mu}=J^{\mu}\), then we must have \(C=0\). This completes the proof that \(J\) is a four-vector. As an example, if a wire of uniform charge density \(\rho\) and current \(\textbf J=\textbf 0\) is at rest in one inertial frame, then a second inertial observer Lorentz-boosted relative to the wire with velocity \(\textbf v\) would therefore see an increased charge density \(\rho’=\gamma\rho\) with the factor of \(\gamma\) attributable to length contraction and current density \(\textbf J’=-\rho’\textbf v\).
Electric & Magnetic Fields
One of the key insights that arises when one applies special relativity to electromagnetism is that magnetic fields are a relativistic effect. I suppose this is true even in materials where the \(\textbf B\)-field is due to spin angular momenta, and the Dirac equation teaches us that spin too is an intrinsically relativistic phenomenon! A simple example is given by imagining that one has an infinitely-long neutral \(\rho=0\) wire of cross-sectional area \(A\) conducting a current \(I>0\) along say the positive \(z\)-axis in the wire’s rest frame. Now, classically, we know that this is supposed to establish a circulating magnetostatic field \(\textbf B(r,\phi)=\frac{\mu_0 I}{2\pi r}\hat{\boldsymbol{\phi}}_{\phi}\). Thus, if a point charge \(q>0\) were moving with velocity \(\textbf v=v\hat{\textbf k}\) along the \(z\)-axis at some moment in time, although it would feel no electric force \(\textbf F_{\textbf E}=\textbf 0\) in the wire’s rest frame because of the wire’s electroneutrality \(\rho=0\), it should experience a magnetic force \(\textbf F_{\textbf B}=q\textbf v\times\textbf B\) attracting it towards the wire. However, it turns out that we don’t need to assume this! All we need to assume is we know how electric forces work, and then inject a bit of special relativity. To see this, Lorentz boost into the instantaneous inertial rest frame of the point charge \(q\) so that in this frame, it clearly wouldn’t experience any magnetic force because now \(\textbf v’=\textbf 0\). However, at the same time the wire is no longer neutral, instead acquiring a negative charge density \(\rho’=-\gamma\beta I/Ac<0\) (as an aside, the current also increases to \(I’=\gamma I\)). In this rest frame of \(q\) then, an electric force \(\textbf F’_{\textbf E’}\) therefore attracts it (being a positive charge \(q>0\)) towards the negatively-charged wire given by the standard formula from Gauss’s law for electric fields: \(\textbf F’_{\textbf E’}=\frac{q\rho’ A}{2\pi\varepsilon_0 r}\hat{\textbf r}=-qv\frac{\gamma I}{2\pi\varepsilon_0 c^2r}\hat{\textbf r}\). But recall \(c^2=1/\mu_0\varepsilon_0\) so \(\textbf F’_{\textbf E’}=-qv\frac{\gamma \mu_0 I}{2\pi r}\hat{\textbf r}\). Lorentz transforming back into the neutral rest frame of the wire, we conclude that there must be a magnetic force \(\textbf F_{\textbf B}=\textbf F’_{\textbf E’}/\gamma=-qv\frac{\mu_0 I}{2\pi r}\hat{\textbf r}\) which is just what we expect from Ampere’s law (this is obtained by noting that the Lorentz boosts are performed along the \(z\)-axis so the \(xy\) spatial components of the four-force\(F=\gamma(\textbf F\cdot\textbf v/c, \textbf F)^T\) are unaffected, and in particular as \(\textbf F’_{\textbf E’}\) lives in the \(xy\)-plane so \(\gamma\textbf F_{\textbf B}=\gamma’\textbf F’_{\textbf E’}\) but \(\gamma’=1\) in the rest frame of \(q\) so \(\textbf F_{\textbf B}=\textbf F’_{\textbf E’}/\gamma\) as claimed).
The Electromagnetic Four-Potential
Recall that out of \(4\) of Maxwell’s equations for \(\textbf E\) and \(\textbf B\), Gauss’s law for electric fields and the Ampere-Maxwell law involve the source fields \(\rho,\textbf J\) whereas Gauss’s law for magnetic fields and Faraday’s law of EM induction relate purely to the \(\textbf E\) and \(\textbf B\)-fields, independent of \(\rho\) and \(\textbf J\). Thus, it would be nice if one could in some sense just “get those two equations out of the way” to focus on the remaining two that actually deal with the sources. The trick to doing this is to work with the underlying electric scalar potential \(\phi(X)\) and magnetic vector potential \(\textbf A(X)\):
which immediately guarantees that \(\frac{\partial}{\partial\textbf x}\cdot\textbf B=0\) and \(\frac{\partial}{\partial\textbf x}\times\textbf E=-\frac{\partial\textbf B}{\partial t}\) are satisfied. The two remaining Maxwell’s equations become:
Of course, there exists a gauge redundancy/freedom (also called “gauge symmetry” although it’s not strictly speaking a “symmetry”) such that multiple non-unique choices of the potentials \((\phi,\textbf A)\) can give rise to the same fields \(\textbf E,\textbf B\). For one, it is clear that \(\textbf A’=\textbf A-\frac{\partial\Gamma}{\partial\textbf x}\) for any scalar field \(\Gamma(X)\) still yields the same \(\textbf B=\frac{\partial}{\partial\textbf x}\times\textbf A’\). If the electric field is to also be preserved \(\textbf E=-\frac{\partial\phi’}{\partial\textbf x}-\frac{\partial\textbf A’}{\partial t}=-\frac{\partial\phi}{\partial\textbf x}-\frac{\partial\textbf A}{\partial t}\), then one can deduce that the electric scalar potential must transform correspondingly as \(\phi’=\phi+\frac{\partial\Gamma}{\partial t}\). Taken together, \((\phi,\textbf A)\mapsto (\phi+\dot{\Gamma},\textbf A-\Gamma’)\) is called a gauge transformation of the potentials \(\phi,\textbf A\) with respect to the gauge field \(\Gamma\). Since, at least classically, the choice of the gauge \(\Gamma\) has no physical significance, one can simply fix it to simplify the resultant \(2\) Maxwell equations; for instance, the Coulomb gauge \(\Gamma_{\text{Coulomb}}\) is chosen to kill the divergence term \(\frac{\partial}{\partial\textbf x}\cdot\textbf A=0\), whereas the Lorenz gauge \(\Gamma_{\text{Lorenz}}\) is chosen to kill \(\frac{1}{c}\frac{\partial\phi}{\partial t}+\frac{\partial}{\partial\textbf x}\cdot\textbf A=0\), etc.
Define the electromagneticfour-potential \(A\) by:
Why is this a four-vector? The following argument is due to J. Murray on Physics Stack Exchange: denoting the components of \(A\) in an inertial frame by \(A^{\mu}\) (even though we haven’t proved it’s a four-vector yet), since we know about the gauge freedom inherent in the four-potential \(A’^{\mu}=A^{\mu}+\partial^{\mu}\Gamma\) where \(\partial^{\mu}=\eta^{\mu\nu}\partial_{\nu}\) are the contravariant components of the four-gradient four-vector, clearly we should check whether the purported four-vectorial nature of \(A\) would change simply because of which gauge \(\Gamma\) one fixes. Fortunately, this is not the case; \(A^{\mu}\) transforms contravariantly if and only if \(A’^{\mu}\) transforms contravariantly with respect to Lorentz transformations (this is a simple check!). It remains to exhibit a specific example of a gauge \(\Gamma\) within which \(A^{\mu}\) transforms contravariantly. Fix \(\Gamma:=\Gamma_{\text{Lorenz}}\) so that \(\partial_{\mu}A^{\mu}=0\). Then Gauss’s law for electric fields and the Ampere-Maxwell law reduce to \(☐^2 A=\mu_0 J\) where the d’Alembert operator is \(☐^2:=\partial_{\mu}\partial^{\mu}\). But this is manifestly a Lorentz scalar operator, and because we’ve already shown that the four-current \(J\) is a four-vector, it follows that the electromagnetic four-potential \(A\) is also a four-vector.
The Electromagnetic Tensor
Now that we understand how the potentials \(\phi,\textbf A\) behave under Lorentz transformations, and we know how these are related to \(\textbf E,\textbf B\), it is straightforward to deduce how they too must transform. However, the relevant “nice object” is no longer a four-vector since \(\textbf E\) and \(\textbf B\) has six components! Instead, rather than a four-vector (i.e. a type \((1,0)\) Lorentz tensor), it turns out to be convenient to use a \(4\times 4\) matrix \(F\) (i.e. a type \((1,1)\) Lorentz tensor) that will be called the electromagnetic tensor (also sometimes called the Faraday tensor hence the letter \(F\)). Specifically, since we know \(\textbf E,\textbf B\) are gauge-invariant, one would like to ensure that the electromagnetic tensor \(F\) is also gauge-invariant in order to have any hope that it can properly capture \(\textbf E,\textbf B\). Pondering all these considerations together, I think it’s not too much of a stretch to say that you could eventually stumble upon the following definition (called the Plucker matrix ofthe four-gradient \(\partial/\partial X\) with the electromagnetic four-potential \(A\), each thought of as a “point” that together define a line in real projective space \(\textbf RP^3\)):
where \(\partial_{\mu}\) are the usual covariant components of the four-gradient four-covector \(\partial/\partial X\) and \(A_{\mu}=\eta_{\mu\nu}A^{\nu}\) are the covariant components of the electromagnetic four-potential four-covector. As promised, one can quickly check that \(F\) is gauge-invariant, and moreover antisymmetric \(F\in\frak{so}\)\((4)\). Doing a few computations, one can show that the electric and magnetic fields \(\textbf E,\textbf B\) have the interpretation of the Plucker coordinates of the line:
with \(\textbf B_{\times}=-\textbf B\cdot\boldsymbol{\mathcal J}\) the cross-product matrix of the magnetic field \(\textbf B\). One can also construct the dual electromagnetic tensor \(\tilde F\) to be fully contravariant \(F^{\mu\nu}=\eta^{\nu\sigma}\eta^{\mu\rho}F_{\rho\sigma}\) so that:
The punchline then is that we know how \(\textbf E\) and \(\textbf B\) transform under Lorentz transformations because \(F,\tilde F\) are type \((1,1)\) Lorentz tensors! This follows because they’re constructed out of other objects like \(\partial/\partial X\), \(A\), \(\eta\), etc. that were already known to be Lorentz tensors. The transformation is the usual one for a type \((1,1)\) tensor (strictly, tensor field):
or in index-free notation, \(F’=\Lambda F\Lambda^T\). Amazing!
Although for four-vectors such as \(X\) and \(V\) it is true that a Lorentz boost in some direction \(\hat{\boldsymbol{\beta}}\) looks just like a Galilean boost in the same direction \(\hat{\boldsymbol{\beta}}\) provided one only looks at the spatial orthogonal complement \(\text{span}^{\perp}_{\textbf R}\hat{\boldsymbol{\beta}}\), for the electromagnetic field it turns out to be the other way around (as the electromagnetic tensor \(F\) is no longer a four-vector!). This is most transparent in the formulas below for a Lorentz boost \(\beta\in[-1,1]\) along the positive \(x\)-axis into a second inertial frame with axes aligned to the first:
\[E’_x=E_x\]
\[E’_y=\gamma(E_y-\beta cB_z)\]
\[E’_z=\gamma(E_z+\beta cB_y)\]
\[cB’_x=cB_x\]
\[cB’_y=\gamma(cB_y+\beta E_z)\]
\[cB’_z=\gamma(cB_z-\beta E_y)\]
Or, in the non-relativistic limit \(|\beta|\ll 1\), \(\textbf E’\approx \textbf E+\textbf v\times\textbf B\) and \(\textbf B’=\textbf B\) are the “Galilean boosts” of the electromagnetic field.
Problem: Define an ideal gas (whether that be classical or quantum). Write down the single-particle canonical partition function \(Z_1\) for a classical, monatomic ideal gas, and hence the \(N\)-particle canonical partition function \(Z_N\) in terms of \(Z_1\) (assuming all \(N\) particles are identical).
Solution: The word ideal is synonymous with non-interacting \(V_{\text{int}}=0\). For the classical, monatomic ideal gas, \(Z_1\) is given by the dimensionless ratio of volumes:
Problem: In the canonical ensemble, the variables \(N,V,T\) are fixed. Hence, express the conjugate variables \(\mu, p, S\) of the classical, monatomic ideal gas in terms of \(N,V,T\).
Solution: From \(F=-k_BT\ln Z_N\), one has the chemical potential of the classical, monatomic ideal gas:
where the number density \(n:=N/V\) and dimensionless phase space density \(n\lambda_T^3\) feature.
Problem: All the results above are specific to a classical, monatomic ideal gas. Now, fixing the “classical” and “ideal” assumptions, one can first focus on generalizing the “monatomic” assumption to “polyatomic”. In particular, write down the single-particle canonical partition function \(Z_1\) of a rigid rotor, and compute the analogs of the \((\mu, p, S)\) formulas above in terms of \((N,V,T)\).
Solution: A rigid rotor is just a rigid body (i.e. averaging out vibrational perturbations via phenomenological moments of inertia) fixed at a point (i.e. stripping away translational degrees of freedom). There are \(2\) fundamentally different cases that need to be analyzed separately.
A linear rigid rotor has principal inertia tensor \(\mathcal I=\text{diag}(I,I,0)\). Classically, its configuration space is described by just \(2\) Euler angles \((\theta,\phi)\cong S^2\) with classical Hamiltonian \(H=\frac{1}{2I}\left(p_{\theta}^2+\frac{p_{\phi}^2}{\sin^2\theta}\right)\):
where the rotational temperature \(k_BT_I=E_I\) is associated to a quantum of rotational kinetic energy \(E_I=\hbar^2/2I\) and \(|G_p\cap SO(3)|\) is the number of proper rotational symmetries in the molecule’s point group \(G_p\).
Quantum mechanically, the symmetric rigid rotor Hamiltonian \(H=\frac{|\mathbf L|^2-L_z^2}{2I}+\frac{L_z^2}{2I_z}\) is diagonalized not in the usual \(|\ell,m_{\ell}\rangle\) space-frame angular momentum eigenbasis, but rather (because \(L_z\) is the principal body-frame component of \(\mathbf L\) which commutes with all its space-frame components) in a basis \(|\ell,m_{\ell},\tilde{m}_{\ell}\rangle\) where both projections \(-\ell\leq m_{\ell},\tilde{m}_{\ell}\leq\ell\). From the \(m_{\ell}\)-independent spectrum \(E_{|\ell,m_{\ell},\tilde m_{\ell}\rangle}=\ell(\ell+1)E_I+\tilde m_{\ell}^2(E_{I_z}-E_I)\), one has:
At this point, recall the rigid rotor is linear so \(I_z\to 0\Leftrightarrow E_{I_z}\to\infty\) and only the \(\tilde{m}_{\ell}=0\) term in the inner sum \(\sum_{\tilde{m}_{\ell}=-\ell}^{\ell}\) survives. Furthermore, working in the high-temperature/classical limit \(\beta E_I\ll 1\), one can apply the zeroth-order Euler-Maclaurin formula:
2. For a nonlinear rigid rotor, the configuration space is now fundamentally higher-dimensional, being described by \(3\) Euler angles \((\theta,\phi,\psi)\cong SO(3)\) rather than just \(2\). The classical Hamiltonian is also more complicated:
As a sanity check, in the earlier symmetric \(I:=I_x=I_y\) case with finite \(I_z<\infty\) but still operating in the high-temperature limit \(\beta E_I,\beta E_{I_z}\ll 1\), the Gaussian integrals can be evaluated analytically:
which reduces to the expected result \(Z_1=\frac{\sqrt{\pi}}{|G_p\cap SO(3)|}\sqrt{\frac{T^3}{T_I^2T_{I_z}}}\) as \(\beta E_I,\beta E_{I_z}\to 0\) since \(e^{\beta E_I^2/4E_{I_z}}\text{erfc}\left(\sqrt{\frac{\beta}{E_{I_z}}}\frac{E_I}{2}\right)\to 1\).
Problem: For both the linear and nonlinear rigid rotors, compute their contributions to \(\mu,S\) and \(p\) in terms of \((N,V,T)\).
Solution: Now, \(Z=Z_1^N\) and \(F=-Nk_BT\ln Z_1\) (note: the \(1/N!\) only applies to the translational partition function). For the linear rigid rotor:
The purpose of this post is to outline a derivation of the classical Maxwell distribution \(\rho_{\textbf V}(\textbf v|m,T)\), i.e. the probability density function for thecontinuous speed random vector \(\textbf V\) in a monatomic ideal gas given the atomic mass \(m\) and temperature \(T\).
It turns out that the specific Gaussian form of the \(\textbf v\)-dependence in the Maxwell distribution, namely\(\rho_{\textbf V}(\textbf v|m,T)\propto e^{-|\textbf v|^2/2\sigma^2(m,T)}\) does not actually require any physics whatsoever, emerging instead purely from the \(SO(3)\) symmetry of the setup (valid whether or not there’s any interaction among the gas particles). Put another way, any continuous random vector like the momentum \(\textbf P\), the acceleration \(\textbf A\), the angular momentum \(\textbf L\), etc. will also be normally distributed across the gas particles! For sake of concreteness however, let us work with the velocity random vector \(\textbf V\).
To prove the magical claim above, first note that \(\rho_{\textbf V}(\textbf v)=\rho_{\textbf V}(v_x,v_y,v_z)\) is clearly the joint probability distribution formed from the probability distributions \(\rho_{V_x}(v_x),\rho_{V_y}(v_y),\rho_{V_z}\) of the particle velocities along the \(x,y\) and \(z\) directions. Mathematically, the probabilistic chain rule asserts (for instance) that \(\rho_{\textbf V}(v_x,v_y,v_z)=\rho_{V_x}(v_x)\rho_{V_y|V_x}(v_y|v_x)\rho_{V_z|V_x,V_y}(v_z|v_x,v_y)\). However, the continuous random variables \(V_x,V_y,V_z\) are independent and identically distributed by \(SO(3)\) symmetry so the conditional probability distributions all simplify to the same marginal distribution denoted \(\tilde\rho:=\rho_{V_x}=\rho_{V_y}=\rho_{V_z}\). Thus, the mathematical challenge is to solve the following functional equation simultaneously for \(\rho_{\textbf V}\) and \(\tilde\rho\):
As usual, the trick to solving functional equations is to convert them to differential equations. Taking logs to convert the product into a sum and then differentiating with respect to \(v_x,v_y\) and then \(v_z\) yields:
where we are viewing \(\rho_{\textbf V}=\rho_{\textbf V}(\textbf v)=\rho_{\textbf V}(|\textbf v|)=\rho_{\textbf V}(v)\) by the same \(SO(3)\) isotropy argument (and indeed the answer was already spoiled earlier!). By the usual separation of variables argument, one can set everything equal to a constant suggestively denoted \(-1/\sigma^2\) (it has to be negative or the resulting solution would not be normalizable as required for interpretation as a probability distribution). This yields a separable first-order ODE which is straightforwardly integrated (and normalized) to yield a zero-mean normal distribution:
If one is specifically interested in the distribution \(\rho_V(v)\) of the continuous speed random variable \(V:=|\textbf V|\), then clearly \(\rho_V(v)=4\pi v^2\rho_{\textbf V}(v)\) is instead a chi distribution rather than a normal distribution.
In order to determine the parameter \(\sigma\) however, it is (finally!) necessary to look at the physics. For \(N=1\) gas particle with Hamiltonian \(H\), its semi-classical canonical partition function \(Z\) is:
For an ideal gas, the classical Hamiltonian is just \(H=|\textbf p|^2/2m\), so if the container has volume \(V=\int_{\textbf x\in\textbf R^3}d^3\textbf x\) then:
At this point the quick way to obtain \(\sigma\) would be to recognize that the semi-classical canonical partition function is the normalization factor in the Boltzmann distribution of the canonical ensemble, so the integrand must be proportional to the probability density function for the momentum random vector \(\textbf P\), from which one can directly see (using \(p^2/2m=mv^2/2\)) that \(1/2\sigma^2=\beta m/2\) so \(\sigma=1/\sqrt{\beta m}=\sqrt{kT/m}\). Alternatively, if that was too slick, one can also separate out the \(3\) Gaussian integrals in the Poisson fashion to get:
\[Z=\frac{V}{\lambda^3}\]
with \(\lambda=\sqrt{\frac{2\pi\hbar^2}{mkT}}\) the thermal de Broglie wavelength of the gas particles. Since the gas is ideal, hence non-interacting, the semi-classical canonical partition function for \(N\) ideal gas particles (denoted \(Z\) again by abuse of notation) is:
\[Z=\frac{V^N}{N!\lambda^{3N}}\]
where the \(N!\) is to avert the Gibbs paradox by accounting for identical particles (although see the article on the Gibbs paradox by E.T. Jaynes). Armed with \(Z\), everything else is at most a few derivatives away. The most immediate is the Helmholtz free energy \(F=-kT\ln Z\approx -NkT\left(\ln V-\ln N-3\ln\lambda+1\right)\) from which one deduces that the pressure \(p=-\frac{\partial F}{\partial V}=\frac{NkT}{V}\) follows the ideal gas law. The expected total energy is \(\langle E\rangle =-\partial\ln Z/\partial\beta=\frac{3}{2}NkT\) with \(C_V=\partial\langle E\rangle/\partial T=\frac{3}{2}Nk\) the heat capacity and by the fluctuation-dissipation theorem \(\sigma_E^2=kT^2C_V=3Nk^2T^2/2\) so that the relative fluctuations \(\sigma_E/\langle E\rangle=\sqrt{2/3N}\) become vanishingly small in the thermodynamic limit. The entropy \(S=k(\beta\langle E\rangle+\ln Z)\) is then given by the Sackur-Tetrode equation:
Notably, the extensivity of the entropy \(S(\xi N,\xi V, T)=\xi S(N, V, T)\) is only possible here thanks to the earlier inclusion of the \(N!\) in the denominator of the semi-classical canonical partition function \(Z\).
Anyways, back to computing \(\sigma\). We can first convert the Maxwell distribution \(\rho_{V}(v)\) for the continuous speed random variable \(V\) into a probability density function \(\rho_K(E)\) for the continuous kinetic energy random variable \(K\) by enforcing \(\rho_K(E)dE=\rho_{V}(v)dv\). Thus, one obtains a gamma distribution:
Equating the average energy per ideal gas particle \(\langle E\rangle/N=\frac{3}{2}kT\) calculated earlier with the expected kinetic energy \(\text E[K]=\int_0^{\infty}E\frac{2}{m^{3/2}\sigma^3}\sqrt{\frac{E}{\pi}}e^{-E/m\sigma^2}dE=\frac{2m\sigma^2}{\sqrt{\pi}}\Gamma(5/2)=\frac{3m\sigma^2}{2}\) yields an equation for \(\sigma\) that can be solved to yield \(\sigma=\sqrt{kT/m}\) as claimed earlier.