The Dirac Equation

Problem #\(1\): Define the Poincaré group.

Solution #\(1\): In words, the Poincaré group is the isometry group of Minkowski spacetime \(\textbf R^{1,3}\). Mathematically, it is the semidirect product \(\textbf R^{1,3}⋊O(1,3)\) of the normal subgroup \(\textbf R^{1,3}\) of spacetime translations with the Lorentz subgroup \(O(1,3)\) of rotations, Lorentz boosts, parity, and time reversal:

The reason the Poincaré group is a semidirect product rather than simply a direct product \(\textbf R^{1,3}\times O(1,3)\) is that spacetime translations and Lorentz transformations talk to each other via the Poincaré group’s composition rule:

\[(\Delta X_2,\Lambda_2)\cdot(\Delta X_1,\Lambda_1):=(\Delta X_2+\Lambda_2\Delta X_1,\Lambda_2\Lambda_1)\]

Problem #\(2\): With the exception of parity and time reversal, all symmetries of the Poincaré group can be implemented continuously:

where the green box of proper, orthochronous Poincaré transformations is the connected component of the identity \(1\) and the only one of the \(4\) connected components which by itself comprises a subgroup of the Poincaré group. Among the \({4}\choose{2}\)\(=6\) pairs of connected components one can join together, the \(2\) pairs:

\[\textbf R^{1,3}⋊SO(1,3)=\textbf R^{1,3}⋊SO^+(1,3)\cup\textbf R^{1,3}⋊SO^-(1,3)\]

and

\[\textbf R^{1,3}⋊O^+(1,3)=\textbf R^{1,3}⋊SO(1,3)\cup\textbf R^{1,3}⋊\not{S}O(1,3)\]

are the only subgroups of the Poincaré group \(\textbf R^{1,3}⋊O(1,3)\). Anyways, this digression is just to say that, modulo the Klein \(4\)-group quotient structure \(\{1,\Pi,\Theta,\Pi\Theta\}\), the proper, orthochronous Poincaré group \(\textbf R^{1,3}⋊SO^+(1,3)\) is a Lie group and hence can be studied via its Lie algebra. Classify fully the structure of this so-called Poincaré algebra.

Solution #\(2\): Consider spacetime translations \(\Delta X\in\textbf R^{1,3}\) first. It is clear that:

\[e^{-\Delta X\cdot\frac{\partial}{\partial X}}X=X-\Delta X\]

where \(\Delta X\cdot\frac{\partial}{\partial X}=\Delta X^{\mu}\partial_{\mu}\) is a Lorentz scalar. Although there’s nothing quantum mechanical about what is being done here, for sake of comparison one can artificially introduce \(\hbar\) and define the generator of spacetime translations \(P:=-i\hbar\frac{\partial}{\partial X}\Leftrightarrow P_{\mu}=-i\hbar\partial_{\mu}\) so that a general spacetime translation would be given by the infinite-dimensional unitary representation \(e^{-i\Delta X\cdot P/\hbar}\). Since spacetime translations are abelian, the algebra is simply characterized by \([P_{\mu},P_{\nu}]=0\).

Considering proper, orthochronous Lorentz transformations \(\Lambda\in SO^+(1,3)\) next, recall that membership in \(O(1,3)\) means that:

\[\Lambda^T\eta\Lambda=\eta\Leftrightarrow(\Lambda^T)_{\mu}^{\space\space\nu}\eta_{\nu\rho}\Lambda^{\rho}_{\space\space\sigma}=\eta_{\mu\sigma}\]

So if one substitutes \(\Lambda=1+\omega\Leftrightarrow \Lambda^{\mu}_{\space\space\nu}=\delta^{\mu}_{\space\space\nu}+\omega^{\mu}_{\space\space\nu}\) for infinitesimal \(\omega\), one finds that any such generator \(\omega\in\frak{so}^+\)\((1,3)\) must be antisymmetric:

\[\omega^T\eta=-\eta\omega\Leftrightarrow\omega_{\mu\nu}=-\omega_{\nu\mu}\]

which is unsurprising considering \(SO^+(1,3)\) is similar to \(SO(4)\). Thus, one can adapt the obvious \(\frak{so}\)\((4)\) basis of \(6\) antisymmetric generators to the Lorentz Lie algebra \(\frak{so}\)\(^+(1,3)\). There are various notations with which one can express these \(6\) generators:

More Intuitive Notation: One has \(3\) generators \(\textbf J:=(J_1,J_2,J_3)\) for rotations and \(3\) generators \(\textbf K:=(K_1,K_2,K_3)\) for Lorentz boosts such that by definition a rotation through angular displacement \(\Delta\boldsymbol{\phi}\) is given by \(e^{-i\Delta\boldsymbol{\phi}\cdot\textbf J/\hbar}\) and a Lorentz boost through rapidity \(\Delta\boldsymbol{\varphi}\) is given by \(e^{-i\Delta\boldsymbol{\varphi}\cdot\textbf K/\hbar}\). One can kind of “cheat” using one’s prior knowledge about how macroscopic Lorentz transformation matrices \(\Lambda\) look in the case of rotations or Lorentz boosts about various axes, “infinitesimalize” them, and extract the corresponding generator. For example, for a rotation about the \(z\)-axis, as \(\Delta\phi\to 0\) one expects to \(\mathcal O(\Delta\phi)\):

\[\begin{pmatrix}1&0&0&0\\0&\cos\Delta\phi&-\sin\Delta\phi&0\\0&\sin\Delta\phi&\cos\Delta\phi&0\\0&0&0&1\end{pmatrix}\approx\begin{pmatrix}1&0&0&0\\0&1&-\Delta\phi&0\\0&\Delta\phi&1&0\\0&0&0&1\end{pmatrix}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}-\frac{i\Delta\phi J_3}{\hbar}\]

from which one obtains:

\[J_3=i\hbar\begin{pmatrix}0&0&0&0\\0&0&-1&0\\0&1&0&0\\0&0&0&0\end{pmatrix}\]

and similarly for \(J_1,J_2\). Meanwhile, repeating the procedure for an infinitesimal Lorentz boost by \(\Delta\varphi\to 0\) along the \(x\)-axis:

\[\begin{pmatrix}\cosh\Delta\varphi&-\sinh\Delta\varphi&0&0\\-\sinh\Delta\varphi&\cosh\Delta\varphi&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}\approx \begin{pmatrix}1&-\Delta\varphi&0&0\\-\Delta\varphi&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}-\frac{i\Delta\varphi K_1}{\hbar}\]

So that the generator of Lorentz boosts along the \(x\)-axis is:

\[K_1=i\hbar\begin{pmatrix}0&-1&0&0\\-1&0&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix}\]

and similarly for \(K_2,K_3\). In light of these matrix representations for the Lorentz Lie algebra generators, one can directly compute their commutation relations:

\[[J_i,J_j]=i\hbar\varepsilon_{ijk}J_k\]

\[[J_i,K_j]=i\hbar\varepsilon_{ijk}K_k\]

\[[K_i,K_j]=-i\hbar\varepsilon_{ijk}J_k\]

where the last commutator expresses (via the BCH formula) the counterintuitive phenomenon of Wigner rotation associated with non-collinear \(i\neq j\) Lorentz boosts.

Less Intuitive Notation: The cross product on \(\textbf R^3\) realizes a representation of the rotation algebra \(\frak{so}\)\((3)\) given by assigning each antisymmetric tensor \(\omega\in\frak{so}\)\((3)\) to the unique vector \(\boldsymbol{\omega}\in\textbf R^3\) such that \(\omega\textbf x=\boldsymbol{\omega}\times\textbf x\) for all \(\textbf x\in\textbf R^3\). More explicitly:

\[\omega=\begin{pmatrix}0&-\omega_3&\omega_2\\\omega_3&0&-\omega_1\\-\omega_2&\omega_1&0\end{pmatrix}\]

\[\boldsymbol{\omega}=\begin{pmatrix}\omega_1\\\omega_2\\\omega_3\end{pmatrix}\]

Or in indices, \(\omega_{ij}=-\varepsilon_{ijk}\omega_k\) or, inverting by contraction with a suitable Levi-Civita symbol, one has the equivalent form \(\omega_k=-\frac{1}{2}\varepsilon_{ijk}\omega_{ij}\). Here, the vector \(\boldsymbol{\omega}\) is arguably more intuitive than the tensor \(\omega\) even though both contain exactly the same information.

In precisely this same spirit, one can take the more intuitive “vector” generators \(\textbf J,\textbf K\) for rotations and Lorentz boosts and convert them into the equivalent but less intuitive form of a \(4\times 4\) matrix, each of whose \(16\) elements is itself a \(4\times 4\) matrix (one could even think of this as a \(16\times 16\) matrix if one so desired). That is, it is typical to define:

\[(\mathcal M^{\rho\sigma})^{\mu\nu}=\eta^{\rho\mu}\eta^{\sigma\nu}-\eta^{\sigma\mu}\eta^{\rho\nu}\]

where \(0\leq \mu,\nu,\rho,\sigma\leq 3\) are all spacetime indices (this is partly the reason for introducing this less intuitive representation of the Lorentz algebra because it puts space and time on more equal footing in this compact expression). This is related to the more intuitive representation above by:

\[i\hbar\mathcal M=\begin{pmatrix}0&-K_1&-K_2&-K_3\\K_1&0&J_3&-J_2\\K_2&-J_3&0&J_1\\K_3&J_2&-J_1&0\end{pmatrix}=\begin{pmatrix}0&-\textbf K^T\\\textbf K& -J\end{pmatrix}\]

where \(J=\textbf J\times\) as in the example with \(\omega=\boldsymbol{\omega}\times\) earlier, or in indices:

\[i\hbar\mathcal M^{ij}=\varepsilon_{ijk}J_k\Leftrightarrow J_i=\frac{i\hbar}{2}\varepsilon_{ijk}\mathcal M^{jk}\]

\[K_i=i\hbar\mathcal M_{i0}\]

One can then compute the Lorentz algebra in a single sweep:

Lowering the \(\nu\) index with the metric (it’s more useful to have it like this since Lorentz transformations have a conventionally NW-SE index structure \(\Lambda^{\mu}_{\space\space\nu}\) and one is anticipating generating these by exponentiating linear combinations of these generators):

\[(\mathcal M^{\rho\sigma})^{\mu}_{\space\space\nu}=\eta^{\rho\mu}\delta^{\sigma}_{\space\space\nu}-\eta^{\sigma\mu}\delta^{\rho}_{\space\space\nu}\]

so that an arbitrary Lorentz algebra element \(\omega\in\frak{so}^+\)\((1,3)\) (not necessarily infinitesimal anymore) is some linear combination of the \(6\) generators \(\mathcal M^{\rho\sigma}\in\frak{so}^+\)\((1,3)\) with real coefficients \(\Delta\Phi_{\rho\sigma}\in\textbf R\) quantifying the extent of a rotation or Lorentz boost (they are related to the earlier more intuitive representation by \(\Delta\phi_i=\) and \(\Delta\varphi_i=\)):

\[\omega=\frac{1}{2}\Delta\Phi_{\rho\sigma}\mathcal M^{\rho\sigma}\Leftrightarrow\omega^{\mu}_{\space\space\nu}=\Delta\Phi^{\mu}_{\space\space\nu}\]

where the factor of \(1/2\) is to compensate for double-counting (since both \(\Delta\Phi_{\sigma\rho}=-\Delta\Phi_{\rho\sigma}\) and \(\mathcal M^{\sigma\rho}=-\mathcal M^{\rho\sigma}\) are antisymmetric in their indices \(\rho,\sigma\)) (explicitly write out \(\Phi\) to have \(\textbf J,\textbf K\) entries inside).

Finally, spacetime translations talk with Lorentz transformations (recall the proper, orthochronous Poincaré group is given by a semi-direct product \(\textbf R^{1,3}⋊SO^+(1,3)\)), so to fully specify the Poincaré algebra, one also has to figure out how the generators \(P:=(H/c,\textbf P)\) of \(\textbf R^{1,3}\) talk with the generators \(\textbf J,\textbf K\) of \(\frak{so}^+\)\((1,3)\), not just how they talk within their own Lie subalgebras.

It turns out one can concisely summarize all the remaining commutation relations within the Poincaré algebra in the more intuitive form:

\[[H,\textbf J]=\textbf 0\Leftrightarrow [H,J_i]=0\]

\[[H,\textbf K]=i\hbar c\textbf P\Leftrightarrow [H,K_i]=i\hbar cP_i\]

\[\textbf P\times\textbf J=i\hbar\textbf P\Leftrightarrow [P_i,J_j]=i\hbar\varepsilon_{ijk}P_k\]

\[[\textbf P,\textbf K]_{\otimes}=i\hbar\frac{H}{c}1\Leftrightarrow [P_i,K_j]=i\hbar\frac{H}{c}\delta_{ij}\]

or the less intuitive but more compact/Lorentz invariant form:

\[[\mathcal M^{\rho\sigma},P^{\mu}]=\eta^{\rho\mu}P^{\sigma}-\eta^{\sigma\mu}P^{\rho}\]

Problem #\(3\): Motivate the general definition of a Clifford algebra in mathematics, state the anticommutation relations of the specific Clifford algebra \(\text{Cl}_{1,3}(\textbf R)\), and state the chiral/Weyl representation of \(\text{Cl}_{1,3}(\textbf R)\).

Solution #\(3\): Given a vector space \(V\) over a field \(F\), a quadratic form \(Q:V\to F\) is any function with \(2\) properties:

  1. \(Q(\lambda\textbf v)=\lambda^2Q(\textbf v)\) for all vectors \(\textbf v\in V\) and scalars \(\lambda\in F\).
  2. The function \(\langle\space|\space\rangle:V\times V\to F\) defined by \(\langle\textbf v|\textbf w\rangle:=\frac{1}{2}(Q(\textbf v+\textbf w)-Q(\textbf v)-Q(\textbf w))\) is a bilinear form (called the polarization of \(Q\)).

Note that \(Q\) and \(\langle\space|\space\rangle\) contain exactly the same information, since one can also invert \(Q(\textbf v)=\langle\textbf v|\textbf v\rangle\). A general quadratic form on Euclidean space \(\textbf R^n\) is of the form \(Q(\textbf v):=\textbf v^Tg\textbf v\) (where without loss of generality one can take \(g^T=g\) to be symmetric) whose associated symmetric bilinear form is \(\langle\textbf v|\textbf w\rangle=(\textbf v^Tg\textbf w+\textbf w^Tg\textbf v)/2\). In particular, when \(g=1\) is the standard metric on Euclidean space, then \(\langle\textbf v|\textbf w\rangle=\textbf v\cdot\textbf w\) coincides with the usual dot product.

Given a quadratic space \((V,F,Q)\), one can construct from this the Clifford algebra \(\text{Cl}(V,F,Q)\) by starting with the tensor algebra \(\oplus_{k=0}^{\infty}V^{\otimes k}\) of \(V\) and quotienting it by the relation \(\textbf v^2:=\textbf v\otimes\textbf v=Q(\textbf v)1_V\) for all \(\textbf v\in V\). From this, it follows as a corollary that more generally, for any two vectors \(\textbf v,\textbf w\in V\):

\[\{\textbf v,\textbf w\}=\textbf v\textbf w+\textbf w\textbf v:=\textbf v\otimes\textbf w+\textbf w\otimes\textbf v=2\langle\textbf v|\textbf w\rangle 1_V\]

where in physics contexts the tensor product \(\textbf v\otimes\textbf w\) is often abbreviated to just \(\textbf v\textbf w\) and called the geometric product. Furthermore, since this anticommutator \(\{\textbf v,\textbf w\}\) is bilinear, it may be completely specified by finding a basis \(\textbf e_i\) of \(V\) and simply specifying the value of \(\{\textbf e_i,\textbf e_j\}\) for all pairs of basis vectors \(\textbf e_i,\textbf e_j\). Returning to the example of \(\textbf R^n\) with the standard metric, if one chooses an orthonormal basis \(\hat{\textbf e}_i\cdot\hat{\textbf e}_j=\delta_{ij}\), then the Clifford algebra is simply characterized by \(\{\hat{\textbf e}_i,\hat{\textbf e}_j\}=2\delta_{ij}1\). More explicitly, the \(\hat{\textbf e}_i\) all anticommute with each other and each squares to the identity \(\hat{\textbf e}^2_i=1\).

In \(\textbf R^2\), picking an orthonormal basis \(\hat{\textbf e}_1,\hat{\textbf e}_2\), although a general element of the Clifford algebra \(\text{Cl}_2(\textbf R)\) is formally:

\[a1+(b\hat{\textbf e}_1+c\hat{\textbf e}_2)+(d\hat{\textbf e}_1\hat{\textbf e}_1+e\hat{\textbf e}_1\hat{\textbf e}_2+f\hat{\textbf e}_2\hat{\textbf e}_1+g\hat{\textbf e}_2\hat{\textbf e}_2)+…\]

The anticommutation relations of the Clifford algebra allow one to substantially reduce the number of degrees of freedom e.g. \(\hat{\textbf e}_1\hat{\textbf e}_1=\hat{\textbf e}_2\hat{\textbf e}_2=1\) and \(\hat{\textbf e}_1\hat{\textbf e}_2=-\hat{\textbf e}_2\hat{\textbf e}_1\)). All higher-order multivectors also reduce to either a scalar, a vector, or a bivector, e.g. \(\hat{\textbf e}_1\hat{\textbf e}_2\hat{\textbf e}_1=-\hat{\textbf e}_1\hat{\textbf e}_1\hat{\textbf e}_2=-\hat{\textbf e}_2\). So in fact, a general element of the Clifford algebra \(\text{Cl}_2(\textbf R)\) is simply spanned by \(4\) generators:

\[a1+b\hat{\textbf e}_1+c\hat{\textbf e}_2+d\hat{\textbf e}_1\hat{\textbf e}_2\]

While the scalars and vectors \(1^2=\hat{\textbf e}_1^2=\hat{\textbf e}_2^2=1\) all square to the identity, the bivector \((\hat{\textbf e}_1\hat{\textbf e}_2)^2=-1\) squares to minus the identity! This is reminiscent of the identity \(i^2=-1\), where \(i=\sqrt{-1}\) is the imaginary unit of \(\textbf C\).

In \(\textbf R^3\), now with an orthonormal basis \(\hat{\textbf e}_1,\hat{\textbf e}_2,\hat{\textbf e}_3\), a general element of the Clifford algebra \(\text{Cl}_3(\textbf R)\) is parameterized by \(8\) real degrees of freedom (extrapolating, it should be clear by Pascal’s triangle that \(\dim\text{Cl}_{n}(\textbf R)=2^n\)):

\[a+b\hat{\textbf e}_1+c\hat{\textbf e}_2+d\hat{\textbf e}_3+e\hat{\textbf e}_1\hat{\textbf e}_2+f\hat{\textbf e}_2\hat{\textbf e}_3+g\hat{\textbf e}_1\hat{\textbf e}_3+h\hat{\textbf e}_1\hat{\textbf e}_2\hat{\textbf e}_3\]

note that \(\text{Cl}_3(\textbf R)\) can be explicitly represented by the \(3\) Pauli matrices \(\{\sigma_i,\sigma_j\}=2\delta_{ij}1\). In addition, the bivectors \((\hat{\textbf e}_1\hat{\textbf e}_2)^2=(\hat{\textbf e}_2\hat{\textbf e}_3)^2=(\hat{\textbf e}_1\hat{\textbf e}_3)^2=-1\) all square to minus the identity while \(\hat{\textbf e}_1\hat{\textbf e}_2\hat{\textbf e}_2\hat{\textbf e}_3\hat{\textbf e}_3\hat{\textbf e}_1=-1\), reminiscent of the defining relations \(i^2=j^2=k^2=ijk=-1\) of the quaternions \(\textbf H\) (H for “Hamilton”).

In special relativity, Minkowski spacetime \(\textbf R^{1,3}\) is a real vector space that naturally comes with a quadratic form \(Q(X):=X^T\eta X\) induced by the Minkowski metric \(g=\eta=\text{diag}(1,-1,-1,-1)\). Its associated \(16\)-dimensional Clifford algebra \(\text{Cl}_{1,3}(\textbf R)\) is therefore subject to the anticommutation relations:

\[\{\gamma^{\mu},\gamma^{\nu}\}=2\eta^{\mu\nu}1\]

for which one representation (and it turns out the only \(4\)-dimensional irreducible representation up to unitary similarity) is the chiral/Weyl representation:

\[\gamma^{\mu}=\begin{pmatrix}0&\sigma^{\mu}\\\sigma_{\mu}&0\end{pmatrix}\]

where \(\sigma^{\mu}=(1,\boldsymbol{\sigma})\) and \(\sigma_{\mu}=(1,-\boldsymbol{\sigma})\).

Problem #\(4\):

Solution #\(4\):

Problem #\(5\):

Solution #\(5\):

Problem #\(6\):

Solution #\(6\):

Problem #\(7\): Motivate (from a historical perspective) the classical Lagrangian density \(\mathcal L\) for the Dirac bispinor field \(\psi:\textbf R^{1,3}\to\textbf C^4\), and hence obtain the on-shell Dirac equation. Write down the general solution to the Dirac equation (given that it’s linear!).

Solution #\(7\): Recall that for a non-relativistic free particle, the on-shell dispersion relation \(H=\textbf P^2/2m\) is first order in the energy \(H\), whereas for a relativistic free particle it is second order \(H^2=c^2\textbf P^2+m^2c^4\). The former quantizes to the Schrodinger equation \((\partial_i\partial_i+2ik_c\partial_0)\psi=0\), while the latter quantizes to the Klein-Gordon equation \((\partial^{\mu}\partial_{\mu}+k_c^2)\phi=0\). At first glance, one might think that \(\phi(X)\) would, like the wavefunction \(\psi(X)\), also admit a probabilistic interpretation given by the Born rule, but historically there were \(2\) reasons why this was a bit suspicious.

Objection #\(1\): In non-relativistic QM one would only have needed to specify \(|\psi(t=0)\rangle\) to get the entire future time evolution of \(|\psi(t)\rangle\) because the Schrodinger equation is first-order in time \(t\). So it seems a bit weird that, just to be relativistically compatible, one would also need to specify the initial velocity \(\partial_0|\psi(t=0)\rangle\).

Objection #\(2\): On the same note of being \(2\)-nd order in time, the Klein-Gordon equation has \(2\) linearly independent solutions:

\[\phi(X)\sim\int d^3\textbf k(a_{\textbf k}e^{i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}+a_{\textbf k}^{\dagger}e^{-i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}\]

(this should be contrasted with the Schrodinger equation which would have only had a single linearly independent \(e^{i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}\) where the energy is always positive b/c it’s just kinetic energy, so get a positive semi-definite probability density). In particular, the conserved current is \(J^{\mu}=\) and the probability density \(J^0=\phi^2\) can be negative?

An obvious way to get around Objection #\(1\) is to instead Fourier transform the square root of the earlier relativistic dispersion relation \(H=\pm\sqrt{c^2\textbf P^2+m^2c^4}\)…indeed, in some sense this is what Dirac did. But notice the \(\pm\) signs!

Ironically though, in the modern understanding of QFT, despite the whole motivation being to somehow interpret \(\phi\) probabilistically, it actually does not admit such an interpretation. Similar to Bohr’s derivation of the gross structure of the hydrogenic atom, the method is not fully sound in its foundations but the answer it gives turns out to be correct.

However, the Klein-Gordon equation, being a second order differential equation, has \(2\) linearly independent solutions:

\[\phi(X)\sim\int d^3\textbf k(a_{\textbf k}e^{i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}+a_{\textbf k}^{\dagger}e^{-i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}\]

Recalling that the

\[i\hbar\frac{\partial|\psi\rangle}{\partial t}=H|\psi\rangle\]

The Dirac Lagrangian \(\mathcal L=\mathcal L(\psi,\bar{\psi})\) is:

\[\mathcal L=\bar{\psi}(i\hbar\displaystyle{\not\!\partial}-mc1)\psi\]

So by varying the action with respect to the Dirac adjoint bispinor field \(\bar{\psi}:=\psi^{\dagger}\gamma^0\), one obtains the Dirac equation:

\[(i\hbar\displaystyle{\not\!\partial}-mc1)\psi=0\]

Solution #\(7\):

Problem #\(8\):

Solution #\(8\):

Problem #\(9\):

Solution #\(9\):

Problem #\(10\):

Solution #\(10\):

Problem #\(11\): Verify that, in spite of the anticommutation relations that were imposed in canonical quantization of the Dirac free field theory, the resulting creation and annihilation operators have the expected commutation relations with the Hamiltonian \(:H:\):

\[[:H:,(b_{\textbf k}^{m_s})^{\dagger}]=\hbar\omega_{\textbf k}(b_{\textbf k}^{m_s})^{\dagger}\]

\[[:H:,b_{\textbf k}^{m_s}]=-\hbar\omega_{\textbf k}b_{\textbf k}^{m_s}\]

\[[:H:,(c_{\textbf k}^{m_s})^{\dagger}]=\hbar\omega_{\textbf k}(c_{\textbf k}^{m_s})^{\dagger}\]

\[[:H:,c_{\textbf k}^{m_s}]=-\hbar\omega_{\textbf k}c_{\textbf k}^{m_s}\]

Solution #\(11\):

These commutation relations assert that the spectrum of Dirac free FT are given by Fock states describing particles and antiparticles carrying arbitrary momentum \(\textbf k\in\textbf R^3\) and spin angular momentum \(m_s\in\{\pm 1/2\}\) that emerge from excitations of the vacuum.

Posted in Blog | Leave a comment

Weakly Coupled Quantum Field Theories

Problem #\(1\): Write down a general \(\phi\)-dependent perturbation to the Klein-Gordon Lagrangian density \(\mathcal L\) for a real scalar field \(\phi\), and explain why in practice only the first \(2\) terms of such a perturbation need to be considered.

Solution #\(1\): Assuming the potential \(V(\phi)\) is analytic in \(\phi\):

\[\mathcal L=\frac{1}{2}\partial^{\mu}\phi\partial_{\mu}\phi-\frac{1}{2}k_c^2\phi^2-\sum_{n\geq 3}\frac{\lambda_n}{n!}\phi^n\]

where the coupling constants \(\lambda_0=\lambda_1=0\), \(\lambda_2=k_c^2\), etc. are not to be confused with the Compton wavelength \(\lambda:=h/mc\). Because the action \(S=\int d^4|X\rangle\mathcal L\) has dimensions of angular momentum \([S]=[\hbar]\) and \([d^4|X\rangle]=[\lambda]^4\), it follows that \([\mathcal L]=[\hbar]/[\lambda]^4=[\hbar^{-3}(mc)^4]\) and thus \([\phi]=[\hbar^{-1/2}mc]\), so:

\[[\lambda_n\phi^n]=[\mathcal L]\Rightarrow [\lambda_n]=[\hbar^{(n-6)/2}(mc)^{4-n}]\]

So on dimensional analysis grounds, the \(n\)-th order coupling constant \(\lambda_n\) does not by itself give a meaningful assessment as to the relevance of the corresponding \(\phi^n\) perturbation, rather it is the dimensionless parameter \(\hbar^{(6-n)/2}(E/c)^{n-4}\lambda_n\) (where \([E]=[mc^2]\) is the relevant energy scale of the process/physics at work) that really matters, i.e. the \(n\)-th order \(\phi^n\) perturbation is small iff \(\hbar^{(6-n)/2}(E/c)^{n-4}\lambda_n\ll 1\).

By graphing \(\hbar^{(6-n)/2}(E/c)^{n-4}\lambda_n\) as a function of \(E\) and given the non-negotiable fact that one is typically interested in “low-\(E\) physics”, it follows that all the \(\phi^n\) couplings for \(n\geq 5\) are irrelevant at these low energy scales, and instead only the marginal \(\phi^4\) quartic coupling and relevant \(\phi^3\) cubic coupling need to be considered. This analysis is quite deep, and goes to show the power of dimensional analysis.

Problem #\(2\): What does it mean for an interacting QFT to be weakly coupled?

Solution #\(2\): It means that one can legitimately treat higher-order interaction terms as small perturbations of a free QFT, i.e. that perturbation theory is accurate (otherwise it would a strongly coupled QFT which is a whole different beast…)

Problem #\(3\): Derive the Dyson series solution to the linear dynamical system \(\dot{\textbf x}(t)=A(t)\textbf x(t)\) and show that it can be expressed as a time-ordered exponential. Show explicitly that it satisfies the ODE.

Solution #\(3\): First, “normalize” away the uninteresting initial condition \(\textbf x(0)\) by writing \(\textbf x(t)=U(t)\textbf x(0)\) so that \(\dot U(t)=A(t)U(t)\) with \(U(0)=1\). Then integrating both sides:

\[U(t)=1+\int_0^tdt’A(t’)U(t’)\]

And recursively substitute ad infinitum:

\[=1+\int_0^tdt’A(t’)\left(1+\int_0^{t’}dt^{\prime\prime}A(t^{\prime\prime})U(t^{\prime\prime})\right)=1+\int_0^tdt’A(t’)+\int_0^tdt’\int_0^{t’}dt^{\prime\prime}A(t’)A(t^{\prime\prime})+…\]

Recall that the definition of the time-ordering superoperator \(\mathcal T\) applied to a string of operators \(A_1(t_1)A_2(t_2)…A_n(t_n)\) is to reorder them so that the earliest time operators also act first (i.e. are on the right). For instance, for a string of \(2\) operators:

\[\mathcal T[A_1(t_1)A_2(t_2)]=A(t_1)A(t_2)\Theta(t_1-t_2)+A(t_2)A(t_1)\Theta(t_2-t_1)\]

or for a string of \(3\) operators, there will be \(3!=6\) terms:

\[\mathcal T[A_1(t_1)A_2(t_2)A_3(t_3)]=A_1(t_1)A_2(t_2)A_3(t_3)\Theta(t_1-t_2)\Theta(t_2-t_3)+A_1(t_1)A_3(t_3)A_2(t_2)\Theta(t_1-t_3)\Theta(t_3-t_2)+…\]

In particular, the Dyson series can be equivalently expressed as the time-ordering superoperator \(\mathcal T\) applied to the strings of operators obtained once one Taylor expands the naive exponential solution:

\[e^{\int_0^tA(t’)dt’}:=1+\int_0^tA(t’)dt’+\frac{1}{2}\int_0^tdt’\int_0^tdt^{\prime\prime}A(t’)A(t^{\prime\prime})+\frac{1}{6}\int_0^tdt’\int_0^tdt^{\prime\prime}\int_0^tdt^{\prime\prime\prime}A(t’)A(t^{\prime\prime})A(t^{\prime\prime\prime})+…\]

Applying \(\mathcal T\) (which is linear):

\[\mathcal T[e^{\int_0^tA(t’)dt’}]=1+\int_0^tA(t’)dt’+\frac{1}{2}\int_0^tdt’\int_0^tdt^{\prime\prime}\biggr(A(t’)A(t^{\prime\prime})\Theta(t’-t^{\prime\prime})+A(t^{\prime\prime})A(t’)\Theta(t^{\prime\prime}-t’)\biggr)+…\]

\[=1+\int_0^tA(t’)dt’+\frac{1}{2}\int_0^tdt’\int_0^{t’}dt^{\prime\prime}A(t’)A(t^{\prime\prime})+\frac{1}{2}\int_0^tdt’\int_{t’}^{t}dt^{\prime\prime}A(t’)A(t^{\prime\prime})+…\]

Among the \(2\) quadratic terms, the first one looks like the one in the Dyson series, but how about the second one? In fact it is equal to the first term; to see this draw a picture to facilitate interchanging the order of integration:

so it becomes clear that \(\int_0^t dt’\int_{t’}^{t}dt^{\prime\prime}=\int_0^tdt^{\prime\prime}\int_0^{t^{\prime\prime}}dt’\). Finally, if one wishes one can interchange the dummy variables \(t’\leftrightarrow t^{\prime\prime}\) to make the two double integrals look utterly identical. As one may anticipate, it turns out that all \(3!=6\) triple integrals at cubic order also coincide, precisely cancelling the \(1/3!=1/6\) prefactor in the Taylor expansion, etc. so that the Dyson series can indeed be written as a time-ordered exponential.

It is also satisfying to explicitly check that the Dyson series for \(U(t)\) satisfies the ODE:

\[\dot U(t)=\frac{d}{dt}\biggr(1+\int_0^tdt’A(t’)+\int_0^tdt’\int_0^{t’}dt^{\prime\prime}A(t’)A(t^{\prime\prime})+…\biggr)=A(t)+A(t)\int_0^t dt’A(t’)+…=A(t)U(t)\]

Or perhaps a more “slick” derivation is to appeal to the time-ordered exponential form of the Dyson series:

\[\dot U(t)=\frac{d}{dt}\mathcal T[e^{\int_0^tA(t’)dt’}]=\mathcal T\left[\frac{d}{dt}e^{\int_0^tA(t’)dt’}\right]\]

and then recognize that within \(\mathcal T\) all operators commute (e.g. for a string of \(2\) operators it is clear that \(\mathcal T[A_1(t_1)A_2(t_2)]=\mathcal T[A_2(t_2)A_1(t_1)]\)) so one can just differentiate naively:

\[=\mathcal T[A(t)e^{\int_0^tA(t’)dt’}]=A(t)\mathcal T[e^{\int_0^tA(t’)dt’}]\]

where the last step follows because \(t\) is the latest time so it can be “factored out” of \(\mathcal T\) to the left.

Problem #\(4\): Write down the Lagrangian density \(\mathcal L\) for scalar Yukawa QFT. What is the condition for this interacting QFT to be weakly coupled? By inspecting \(\mathcal L\), what charges remain conserved despite the interactions? Obtain the scalar Yukawa Hamiltonian density \(\mathcal H\).

Solution #\(4\): Given a real scalar field \(\phi\) and a complex scalar field \(\psi\):

\[\mathcal L=\frac{1}{2}\partial^{\mu}\phi\partial_{\mu}\phi+\partial^{\mu}\bar{\psi}\partial_{\mu}\psi-\frac{1}{2}k_{\phi}^2\phi^2-k_{\psi}^2\bar{\psi}\psi-g\phi\bar{\psi}\psi\]

where the relevant cubic coupling interaction \(g\phi\bar{\psi}\psi\) will be a small perturbation to the free QFT (so that this interacting QFT is weakly coupled) iff the coupling constant \(g=\lambda_3/6\) satisfies (using the results of Solution #\(1\)):

\[\sqrt{\hbar}g\ll k_{\phi},k_{\psi}\]

Just as for the free theory, since one still has the \(U(1)\) global internal symmetry \(\psi\mapsto e^{i\alpha}\psi\) in spite of the cubic coupling term, one continues to have a conserved charge \(Q=N_{\psi}-N_{\bar{\psi}}\).

Performing the relevant Legendre transforms:

\[\partial_0\phi\mapsto\pi_{\phi}=\partial_0\phi\]

\[\partial_0\psi\mapsto\pi_{\psi}=\partial_0\bar{\psi}\]

\[\partial_0\bar{\psi}\mapsto\pi_{\bar{\psi}}=\partial_0\psi\]

One obtains the Hamiltonian density \(\mathcal H=\mathcal H_{\text{free}}+\mathcal H_{\text{int}}\) for scalar Yukawa QFT, where:

\[\mathcal H_{\text{free}}=\frac{\pi^2_{\phi}}{2}+\frac{1}{2}\biggr|\frac{\partial\phi}{\partial\textbf x}\biggr|^2+\pi_{\bar{\psi}}\pi_{\psi}+\frac{\partial\bar{\psi}}{\partial\textbf x}\cdot\frac{\partial\psi}{\partial\textbf x}+\frac{1}{2}k^2_{\phi}\phi^2+k^2_{\psi}\bar{\psi}\psi\]

and:

\[\mathcal H_{\text{int}}=g\phi\bar{\psi}\psi\]

Problem #\(5\): Within scalar Yukawa QFT, explain why the scattering amplitude for the process \(\phi\to\bar{\psi}\psi\) is non-zero, and hence calculate it to order \(g\); what is an assumption implicit in this question?

Solution #\(5\): The implicit assumption is that the initial and final scattering states \(|t=-\infty\rangle,|t=\infty\rangle\in\mathcal F\) are indeed Fock states of the free QFT, i.e. \(H_{\text{free}}=\int d^3\textbf x\mathcal H_{\text{free}}\)-eigenstates, specifically the Lorentz-normalized excitations of the vacuum:

\[|t=-\infty\rangle:=\sqrt{\frac{2}{\hbar}}\frac{\omega_{\textbf k_{\phi}}}{c}a_{\textbf k_{\phi}}^{\dagger}|0\rangle\]

\[|t=\infty\rangle:=\frac{2\omega_{\textbf k_{\bar{\psi}}}\omega_{\textbf k_{\psi}}}{\hbar c^2}b_{\textbf k_{\psi}}^{\dagger}c_{\textbf k_{\bar{\psi}}}^{\dagger}|0\rangle\]

After a long time, in the interaction picture the initial state \(|t=-\infty\rangle\) evolves to \(\mathcal T\exp\left(-\frac{i}{\hbar}\int_{-\infty}^{\infty}dtH_{\text{int}}\right)|t=-\infty\rangle\) (where \(H_{\text{int}}\) is the interaction Hamiltonian in the interaction picture). The overlap of this with a particular final state \(|t=\infty\rangle\) thus yields the probability amplitude of that scattering process:

\[\langle t=\infty|\mathcal T\exp\left(-\frac{i}{\hbar}\int_{-\infty}^{\infty}dtH_{\text{int}}\right)|t=-\infty\rangle\]

where this is expected to be non-zero because \(Q=0\) is conserved in this process. If one is only interested in computing this scattering amplitude to order \(g\), then as usual one Taylor expands the time-ordered exponential to first order:

\[\approx \langle t=\infty|t=-\infty\rangle-\frac{i}{\hbar}\langle t=\infty|\int_{-\infty}^{\infty}dt H_{\text{int}}|t=-\infty\rangle\]

In this case, the initial and final states are orthogonal \(\langle t=\infty|t=-\infty\rangle=0\) which follows intuitively because a \(\phi\)-particle is not the same as a \(\bar{\psi}\psi\)-particle antiparticle pair, or mathematically the creation operators of the different particles all commute so either \(a_{\textbf k_{\phi}}^{\dagger}\) annihilates the bra \(\langle 0|\) or \(b_{\textbf k_{\psi}},c_{\textbf k_{\bar{\psi}}}\) annihilate the ket \(|0\rangle\).

The \(\mathcal O(g)\) matrix element term simplifies to:

\[-\frac{i}{\hbar c}\int d^4|X\rangle\langle t=\infty|\mathcal H_{\text{int}}|t=-\infty\rangle\sim-ig\int d^4|X\rangle\langle 0|b_{\textbf k_{\psi}}c_{\textbf k_{\bar{\psi}}}\phi\bar{\psi}\psi a_{\textbf k_{\phi}}^{\dagger}|0\rangle\]

At this point one has to pull out the Fourier normal mode plane wave expansions of \(\phi,\psi,\bar{\psi}\) (with time dependence since operators evolve in the interaction picture as if they were in the Heisenberg picture of the free QFT), annihilate as much as you can and otherwise use commutation relations to pick up delta functions where needed to do the \(13\)-dimensional integral. When the dust settles, one finds that the scattering amplitude for the process goes like \(-ig\delta^4|K_{\psi}+K_{\bar{\psi}}-K_{\phi}\rangle\) so it’s zero unless \(4\)-momentum is conserved. As a corollary, in the ZMF (rest frame) of the \(\phi\)-particle, one can check that this implies \(m_{\phi}\geq 2m_{\psi}\).

Problem #\(6\): Verify Wick’s theorem for the case of \(3\) scalar fields:

\[\mathcal T\phi_{|X_1\rangle}\phi_{|X_2\rangle}\phi_{|X_3\rangle}=:\phi_{|X_1\rangle}\phi_{|X_2\rangle}\phi_{|X_3\rangle}:+\Delta^F_{|X_2-X_3\rangle}\phi_{|X_1\rangle}+\Delta^F_{|X_3-X_1\rangle}\phi_{|X_2\rangle}+\Delta^F_{|X_1-X_2\rangle}\phi_{|X_3\rangle}\]

Solution #\(6\):

Problem #\(7\): Within weakly coupled scalar Yukawa QFT, compute the scattering amplitude for the process \(\bar{\psi}\psi\to\bar{\psi}\psi\) at order \(\mathcal O(g^2)\).

Solution #\(7\): At order \(\mathcal O(g^2)\), the scattering amplitude for this interaction from the initial state \(|t=-\infty\rangle\sim b_{\textbf k_1}^{\dagger}b_{\textbf k_2}^{\dagger}|0\rangle\) to the final state \(|t=\infty\rangle=\sim b_{\textbf k’_1}^{\dagger}b_{\textbf k’_2}^{\dagger}|0\rangle\) is given by:

\[\langle t=-\infty|S-1|t=\infty\rangle\sim i\mathcal A_{\bar{\psi}\psi\to\bar{\psi}\psi}\delta^4|K_{\psi}+K_{\bar{\psi}}-K_{\phi}\rangle\]

where to order \(\mathcal O(g^2)\), the “amplitude” \(\mathcal A_{\bar{\psi}\psi\to\bar{\psi}\psi}\) is given by the sum of the following two tree-level Feynman diagrams, namely a \(t\)-channel and a \(u\)-channel:

Problem #\(8\): Repeat Problem #\(7\) for the case of scalar Yukawa scattering of mesons \(\phi\phi\to\phi\phi\) at order \(\mathcal O(g^4)\).

Solution #\(8\): Given that the interaction Hamiltonian density for scalar Yukawa theory is cubic \(\mathcal H_{\text{int}}=g\phi\bar{\psi}\psi\), it follows that any Feynman diagram must be a cubic graph with exactly one \(\phi\)-edge, one \(\psi\)-edge, and one \(\bar{\psi}\)-edge at each vertex of the directed graph. By initially drawing the following:

and pondering for a bit, it is intuitively plausible that the simplest way to satisfy the above constraint is for the \(\phi\)-mesons to exchange virtual \(\psi/\bar{\psi}\) nucleons/antinucleons via the following one-loop (no longer tree-level) Feynman diagram:

and so because there are \(4\) vertices, it follows that the \(\phi\phi\to\phi\phi\) interaction scattering amplitude is \(\mathcal O(g^4)\). The Feynman rules assert that to \(\mathcal O(g^4)\) it is given by (notation here is a bit loose, in particular \(|\phi\phi\rangle^{\dagger}\neq\langle\phi\phi|\)):

\[\langle\phi\phi|S-1|\phi\phi\rangle\sim i\mathcal A_{\phi\phi\to\phi\phi}\delta^4(K’_1+K’_2-K_1-K_2)\]

where \(\mathcal A_{\phi\phi\to\phi\phi}\) is given by the one-loop integral over \(4\)-momenta:

Problem #\(10\): Define the Mandelstam variables \(s,t,u\in\textbf R\) by drawing suitable \(s\)-channel, \(t\)-channel and \(u\)-channel Feynman diagrams for interactions between two particles (of arbitrary type).

Solution #\(10\):

Posted in Blog | Leave a comment

Toric Code

The purpose of this post is to explain what the toric code is, and its potential use as a fault-tolerant quantum error correcting stabilizer surface code for topological quantum computing.

To begin, consider an \(N\times N\) square lattice \(\Lambda\) with a qubit placed at each edge as follows:

The vertices of the lattice \(\Lambda\) are called stars \(S\) while its faces are called plaquettes \(P\) (which can also be viewed as the vertices of the dual lattice \(\Lambda^*\), etc.). Since \(\Lambda\) is an \(N\times N\) square lattice, one might initially think that there are \(2N(N+1)\) qubits; however the catch is that opposite ends of the lattice are identified with each other, so that there are in fact only \(2N^2\) qubits. More importantly, this imposition of periodic boundary conditions endows \(\Lambda\cong S^1\times S^1\) with the topology of a torus, hence the name “toric code”.

A priori, the state space \(\mathcal H\) of \(2N^2\) qubits has dimension \(\text{dim}\mathcal H=2^{2N^2}\) (though if some/all of these qubits are identical then it may be less than this). Using the notation \(\sigma_i^x\) as a shorthand for the operator \(1_1\otimes…\otimes 1_{i-1}\otimes\sigma^x\otimes 1_{i+1}\otimes…\otimes 1_{2N^2}\) acting on \(\mathcal H\) and similarly for \(\sigma_i^y\) and \(\sigma_i^z\), one can define star and plaquette four-body interaction operators \(A_S,B_P\) associated respectively to stars \(S\) or plaquettes \(P\) in \(\Lambda\) by:

\[A_S:=\prod_{i\in S}\sigma_i^z\]

\[B_P:=\prod_{i\in P}\sigma_i^x\]

where the notation “\(i\in S\)” denotes the set of \(4\) qubits nearest to a given star \(S\), and similarly for “\(i\in P\)”; the notation \(\prod_{i\in S}\) or \(\prod_{i\in P}\) does not suffer from any operator ordering ambiguity because Pauli operators on distinct qubits (regardless of their \(x,y,z\) nature) trivially commute.

For any two stars \(S,S’\), and any two plaquettes \(P,P’\), one can check (without any motivation yet at this point) that:

\[[A_S,A_{S’}]=[B_{P},B_{P’}]=[A_S,B_P]=0\]

Specifically, if the stars or plaquettes are well-separated from each other then these are always trivially true so it suffices to just check the “edge cases” so to speak when the stars or plaquettes are close enough to share qubits. Then, for each such shared qubit, one just has to apply the anticommutation relation \(\{\sigma^{\alpha},\sigma^{\beta}\}=2\delta^{\alpha\beta}1\). Notably, the commutation relation \([A_S,B_P]=0\) relies on the fundamental fact that a star \(S\) and plaquette \(P\) which are adjacent to each other always share only \(2\) qubits, and \(2\) is even so \((-1)^2=1\).

Finally, suppose one thinks of the lattice of qubits as a spin lattice such that the spins interact via the following Hamiltonian:

\[H=-\sum_{S\in\Lambda}A_S-\sum_{P\in\Lambda^*}B_P\]

From the above discussion, it follows that for any star \(S\) or plaquette \(P\), one has:

\[[H,A_S]=[H,B_P]=0\]

There are \(N^2\) stars and \(N^2\) plaquettes, so this would suggest that one is free to specify the however, using a “Stokes theorem” type of argument (basically a more refined version of earlier arguments for showing \([A_S,A_{S’}]=[B_P,B_{P’}]=0\)), one can convince oneself inductively (i.e. playing around with small \(N=1,2,…\) etc.) that it is precisely thanks to the toric topology of \(\Lambda\) that one has the constraints:

\[\prod_{S\in\Lambda}A_S=\prod_{P\in\Lambda^*}B_P=1\]

(not to be confused with the quantities in the Hamiltonian \(H\) which involve sums \(\sum\) rather than products \(\prod\)).

To emphasize again, this is why a torus is desirable…4-fold degeneracy in the ground state manifold of \(H\).

Posted in Blog | Leave a comment

The IR Spectrum of \(\text{C}_{60}\)

The purpose of this post is to explain why, experimentally, one only observes \(4\) electric dipole transitions in the IR spectrum of buckminsterfullerene, also known as \(\text C_{60}\) or informally as the buckyball:

Buckyball Basics

The simplest conceptual way to construct a buckyball is to start with a regular icosahedron:

which has \(F=20\) equilateral triangular faces, \(V=12\) vertices, and \(E=30\) edges; notice this obeys Euler’s formula \(F+V=E+2\). Then, by simply “shaving off” each of the \(V=12\) vertices; it is clear from the picture these \(12\) vertices would transform into \(12\) pentagonal faces, each surrounded by \(5\) hexagonal faces so that no \(2\) pentagonal faces share an adjacent edge, yielding a buckyball topology (and geometrically, all \(\text C-\text C\) covalent bonds should be of equal length). This shaving process increases the number of faces to \(F’=32\) (of which \(12\) are pentagonal and \(20\) are hexagonal), the number of vertices to \(V’=60\) (i.e. just the number of carbon atoms), and the number of edges to \(E’=90\) such that Euler’s formula is maintained \(F’+V’=E’+2\).

IR Spectroscopy Selection Rules

Recall that within the Born-Oppenheimer approximation, the \(N\) nuclei of some molecule are “clamped” at positions \(\textbf X:=(\textbf X_1,…,\textbf X_N)\) and are regarded as moving in the effective potential \(V_{\text{eff}}(\textbf X)\) due to the electrons \(e^-\), so that the molecular Hamiltonian is:

\[H=T_{\text n}+V_{\text{eff}}(\textbf X)\]

If in addition one approximates \(V_{\text{eff}}(\textbf X)\) by a harmonic potential about the (stable) equilibrium configuration \(\textbf X_0\) of the nuclei:

\[V_{\text{eff}}(\textbf X)\approx\frac{1}{2}(\textbf X-\textbf X_0)^T\left(\frac{\partial^2 V_{\text{eff}}}{\partial\textbf X^2}\right)_{\textbf X_0}(\textbf X-\textbf X_0)\]

then, upon diagonalizing the Hessian \(\left(\frac{\partial^2 V_{\text{eff}}}{\partial\textbf X^2}\right)_{\textbf X_0}\) into the orthonormal eigenbasis of its normal modes, one obtains \(3N\) decoupled simple harmonic oscillators so the spectrum of \(H\) is just that of an anisotropic harmonic oscillator in \(\textbf R^{3N}\), at least within all the assumptions made so far (e.g. ignoring anharmonicity of \(V_{\text{eff}}(\textbf X)\)). Each Fock eigenstate \(|n_1,n_2,…,n_{3N}\rangle\) of \(H\) (thought of as a vibrational eigenstate because one can view \(H\) as a Hamiltonian governing vibrations/SHM of the nuclei about equilibrium) is thus a product \(|n_1,n_2,…,n_{3N}\rangle=\otimes_{i=1}^{3N}|n_i\rangle\) of suitable \(1\)D quantum harmonic oscillator wavefunctions.

IR radiation is just like any other electromagnetic radiation in that (within the dipole approximation, which is fair for long-wavelength IR radiation despite the larger size of molecules) it stimulates electric dipole transitions via the time-dependent perturbation \(\Delta H=\Delta H(t)\) to the molecular Hamiltonian \(H\):

\[\Delta H=-\boldsymbol{\pi}\cdot\textbf E_0\cos \omega_{\text{IR}} t\]

where the electric dipole moment \(\boldsymbol{\pi}=\boldsymbol{\pi}(\textbf X,\textbf x)\) of the molecule is:

\[\boldsymbol{\pi}:=e\sum_{\text{nuclei }i}Z_i\textbf X_i-e\sum_{\text{electrons }i}\textbf x_i\]

and the molecule is assumed to be neutral \(\sum_{\text{nuclei }i}Z_i=\sum_{\text{electrons }i}\). By Fermi’s golden rule, the molecular transition rate between two distinct vibrational \(H\)-eigenstates \(|n_1,…,n_{3N}\rangle\to|n’_1,…,n’_{3N}\rangle\) is proportional to the mod-square of the matrix element of the (time-independent amplitude of the) perturbation:

\[|\langle n’_1,…,n’_{3N}|\boldsymbol{\pi}\cdot\textbf E_0|n\rangle|^2\]

However, most IR spectroscopy experiments (e.g. IR laser sources in FTIR spectrometers) use unpolarized IR radiation, so this means one should really replace by an isotropic averaging factor of \(1/3\) and forget about (as far as selection rules are concerned) a factor of \(|\textbf E_0|^2\), so just focus on:

\[|\langle n’_1,…,n’_{3N}|\boldsymbol{\pi}|n\rangle|^2\]

Similar to what was done earlier for the effective potential \(V_{\text{eff}}(\textbf X)\), one can also Taylor expand the dipole moment operator \(\boldsymbol{\pi}\) within the configuration space \(\textbf X\) of the nuclei about the equilibrium configuration \(\textbf X_0\):

\[\boldsymbol{\pi}\approx\boldsymbol{\pi}(\textbf X_0)+\left(\frac{\partial\boldsymbol{\pi}}{\partial\textbf X}\right)_{\textbf X_0}(\textbf X-\textbf X_0)+…\]

Sandwiching this in the matrix element, the constant term \(\boldsymbol{\pi}(\textbf X_0)\) (which represents the possibility of a permanent electric dipole like in a water molecule) vanishes by orthogonality of the distinct vibrational \(H\)-eigenstates, so the quantity of interest is just:

\[\left|\left(\frac{\partial\boldsymbol{\pi}}{\partial\textbf X}\right)_{\textbf X_0}\langle n’_1,…,n’_{3N}|\textbf X-\textbf X_0|n_1,…,n_{3N}\rangle\right|^2\]

This immediately leads to a “gross selection rule” for two vibrational \(H\)-eigenstates to be coupled by the IR perturbation \(\Delta H\), namely that the Jacobian \(\left(\frac{\partial\boldsymbol{\pi}}{\partial\textbf X}\right)_{\textbf X_0}\) must be non-vanishing; physically, this means that only the normal modes of the nuclear vibration that experience a change in \(\boldsymbol{\pi}\) when the nuclei are slightly displaced from \(\textbf X_0\) will be IR-active.

…main point of this section is to explain that IR transitions are low-energy excitations require a change in electric dipole moment b/w initial and final states, which, factoring out the charge e, means the matrix element of the position observable b/w two states must be non-zero by Fermi’s golden rule. There are actually \(2\) selection rules for IR spectroscopy; the gross selection rule is that \(\partial\boldsymbol{\pi}/\partial\) must be zero.

Character Tables of Finite Group Representations

As a warmup, consider the symmetric group \(S_3=\{1, (12), (13), (23), (123), (132)\}\) of order \(|S_3|=3!=6\). Then \(S_3\) can be partitioned into \(3\) conjugacy classes, namely \(S_3=\{1\}\cup\{(12), (13), (23)\}\cup\{(123), (132)\}\). Each \(S_3\) conjugacy class maps onto an \(S_3\)-irrep, so there will also be three \(S_3\)-irreps. One of them is always just the trivial irrep whereby the permutations do nothing to all vectors. For symmetric groups, there is also always the sign irrep \(1,(123),(132)\mapsto 1\) and \((12),(13),(23)\mapsto -1\) such that odd permutations do nothing to all vectors but even \(A_3\)-permutations flip vectors across the origin. Finally, there is the standard \(S_3\)-irrep that one might intuitively think about as acting on the vertices of an equilateral triangle which rigidly drags the whole Cartesian plane \(\textbf R^2\) along with it. Note the dimensions of these \(S_3\)-irreps are the only ones that could have been compatible with its order \(6=1^2+1^2+2^2\).

Each \(S_3\)-irrep is associated with its own character class function which can be evaluated on an arbitrary representative of each conjugacy class. For instance, the trivial irrep is \(1\)-dimensional and its character always evaluates to \(1\) on all conjugacy classes. The sign irrep is also \(1\)-dimensional but its character evaluated on each conjugacy class is instead the sign of the permutations in that conjugacy class. Finally, noting that \(\cos(2\pi/3)=-1/2\) and that the trace of a rotation in \(\textbf R^2\) by angle \(\theta\) is \(2\cos\theta\), the following character table for \(S_3\) may be obtained:

So far this example has been fairly abstract. To bridge this “abstract nonsense” with chemistry, consider the specific example of the ammonia molecule \(\text{NH}_3\):

In this case, rather than pulling a random group (like \(S_3\)) out of thin air, here one can use the ammonia molecule to motivate studying the specific group of actions on \(\textbf R^3\) that leave it looking like nothing happened. In chemist jargon, one is interested in the point group of the ammonia molecule, the adjective “point” implying that any such action on \(\textbf R^3\) must have at least one fixed point in space (commonly chosen to be the origin) so that e.g. translations of the ammonia molecule are disregarded (cf. the distinction in special relativity between the Poincare group and its Lorentz subgroup). Note that this is indeed a group thanks to its very definition (if one action leaves ammonia looking like nothing happened, and a second action also leaves ammonia looking like nothing happened, then composing them will also leave ammonia looking like nothing happened).

So what is the point group of the ammonia molecule? For this, there isn’t really any super rigorous/systematic way to do it, one just has to stare hard at the molecule and think…

Clearly, one way to leave ammonia looking like nothing happened is to literally do nothing. There is also manifestly \(C_3\) rotational symmetry (by \(120^{\circ}\) or \(240^{\circ}\)). Finally, there are \(3\) reflection symmetries about “vertical” mirror planes. As a mathematician, it is thus easy to recognize that the point group of the ammonia molecule is just the dihedral group \(D_3\) (which happens to be isomorphic to \(S_3\)). However, as a chemist one would instead refer to this as the “\(C_{3v}\) point group”, where the \(C\) and the \(3\) subscript are meant to emphasize the \(C_3\) subgroup of rotational symmetries mentioned above, while the “\(v\)” subscript is meant to emphasize that the mirror planes are “vertical”. Strictly speaking, one should also check that there are no horizontal mirror planes, inversion centers, or improper rotation axes in order to be able to confidently assert that the point group of ammonia really is \(C_{3v}\) rather than \(C_{3v}\) merely being a subgroup of an actually larger point group.

The isomorphism \(C_{3v}\cong D_3\cong S_3\) means that by happy chance the character table for the ammonia point group \(C_{3v}\) is already known. For instance, the transpositions \((12),(23),(13)\) in \(S_3\) map onto vertical mirror plane reflections in \(C_{3v}\), while the \(3\)-cycles \((123),(132)\) are simply \(C_3\) rotations. Indeed, the \(3\)-dimensional \(C_{3v}\)-representation on the ammonia molecule itself embedded rigidly in \(\textbf R^3\) is reducible into a direct sum of the trivial irrep acting on the \(1\)-dimensional \(z\)-axis and the standard irrep acting on the \(2\)-dimensional \(xy\)-plane. It is also worth verifying that the columns of the character table satisfy orthonormality \(\sum_{\text{ccl}\text{ of }C_{3v}}|\text{ccl}|\chi_{\phi}(\text{ccl})\chi_{\phi’}(\text{ccl})=|C_{3v}|\delta_{\phi\cong\phi’}\) for any \(2\) \(C_{3v}\)-irreps \(\phi,\phi’\).

One can also consider how various scalar-valued and vector-valued polynomials on \(\textbf R^3\) transform passively under the \(C_{3v}\) point group. For instance, any function \(f=f(\rho)\) of the cylindrical coordinate \(\rho:=\sqrt{x^2+y^2}\) only, such as \(f(\rho)=\rho^2\), will transform under the trivial \(C_{3v}\)-irrep.

Making the reasonable assumption that all \(60\) carbon atoms are specifically of the \(^{12}\text C\) isotope, and are therefore identical bosons, so something about the permutation group \(S_{60}\)? (formalize the earlier comment about “looking the same” via the idea of identical quantum particles/ bosons and fermions where in that case the vectors are not in \(\textbf R^3\) but state vectors in the symmetrized tensor product Hilbert space of identical bosons)

Posted in Blog | Leave a comment

Laser Cooling & Trapping of Atoms

The purpose of this post is to explain several techniques for laser cooling and laser trapping of atoms. In order to better emphasize key conceptual points, it will take the approach of posing problems, followed immediately by their solutions.

Problem #\(1\): Recall that any locally conserved field is associated to a continuity equation \(\dot{\rho}+\partial_{\textbf x}\cdot\textbf J=0\). In particular, the local flow of the corresponding conserved quantity can always be interpreted in terms of a velocity field \(\textbf v\) (with SI units of meters/second) by enforcing that \(\textbf J=\rho\textbf v\). Interpret this in the context of fluid mechanics and electromagnetism.

Solution #\(1\): In fluid mechanics, mass is locally conserved, so \(\rho\) could represent mass density while \(\textbf J\) represents mass flux. Momentum is also locally conserved provided there are no external body forces acting on the fluid, so in that case \(\rho\) would be a vector field representing momentum density while \(\textbf J\) would be the stress tensor field representing momentum transport (this is just the Navier-Stokes equations). Similar comments can be made about energy, angular momentum, etc.

In electromagnetism, electric charge is locally conserved, so in that case \(\rho\) represents charge density while \(\textbf J\) represents an electric current. Similarly, the total energy stored in both electromagnetic fields and electric charges is locally conserved. In that case, the continuity equation in linear dielectrics takes the form:

\[\frac{\partial\mathcal E}{\partial t}+\textbf J_f\cdot\textbf E+\frac{\partial}{\partial\textbf x}\cdot\textbf S=0\]

where the electromagnetic field energy density is \(\mathcal E:=(\textbf D\cdot\textbf E+\textbf H\cdot\textbf B)/2\) and the Poynting vector \(\textbf S:=\textbf E\times\textbf H\). If an electromagnetic wave propagates with group velocity \(|\textbf v|\leq c\) in some linear dielectric, it follows that \(\textbf S=\mathcal E\textbf v\) (similar comments apply to momentum density; indeed they are unified in the Maxwell stress tensor).

Problem #\(2\): Within the framework of classical electromagnetism, what is the formula for the radiation pressure \(p_{\gamma}\) exerted at normal incidence to an absorbing surface \(\hat{\textbf n}\) with reflection coefficient \(R\)?

Solution #\(2\): Just as \(\textbf S=\mathcal E\textbf v\) (from Solution #\(1\)), so the radiation pressure is a certain projection of an isotropic current for the momentum density \(\boldsymbol{\mathcal P}\):

\[p_{\gamma}=(1+R)(\boldsymbol{\mathcal P}\cdot\hat{\textbf n})(\textbf v\cdot\hat{\textbf n})=(1+R)|\boldsymbol{\mathcal P}||\textbf v|\cos^2\theta\]

The photon dispersion relation \(\mathcal E=|\boldsymbol{\mathcal P}||\textbf v|\) in a linear dielectric then implies that the radiation pressure \(p_{\gamma}=(1+R)\mathcal E\cos^2\theta\) is given by a Malusian proportionality with the energy density stored in the EM field (aside: how is this related to the equation of state \(p=\mathcal E/3\) for a photon gas?).

Problem #\(3\): An atom moving at (nonrelativistic) speed \(v\) (in the lab frame) and meets a counter-propagating laser beam of incident intensity \(I\) and frequency \(\omega\) (both in the lab frame, where of course it is assumed that the laser source is also at rest in the lab frame). Estimate the velocity-dependent scattering force \(F_{\gamma}(v)\) exerted on the atom and its maximum possible value \(F_{\gamma}^*> F_{\gamma}(v)\).

Solution #\(3\): Heuristically, the scattering force might also be called the radiation force as it’s roughly just the radiation pressure \(p_{\gamma}\) exerted on the cross-section \(\sigma(\omega_D)\) of the atom (evaluated at the Doppler-shifted frequency \(\omega_D:=\omega\sqrt{(1+\beta)/(1-\beta)}\approx \omega+kv\)):

\[F_{\gamma}(v)\approx p_{\gamma}\sigma(\omega_D)=\frac{I}{c}\frac{\sigma_{01}}{1+(2\delta_D/\Gamma)^2}\]

where in the last expression it is assumed that the Doppler detuning \(\delta_D=\omega_D-\omega_{01}\) is small so in particular both \(I\) and \(\sigma_{01}\) are taken to be independent of \(\omega_D\). This is from the perspective of stimulated absorption. On the other hand, because all the analysis is implicitly in the steady state, one can also view this from the perspective of spontaneous emission:

\[F_{\gamma}(v)=(\hbar k)(\Gamma\rho_{11})=\frac{\Gamma}{2}\frac{s}{1+s+(2\delta_D/\Gamma)^2}\]

Note that these two formulas for \(F_{\gamma}(v)\) are not exactly equal, but approximately so at low irradiance \(s\ll 1\) (how to rationalize why?). Meanwhile, on resonance \(\delta_D=0\), in the high-irradiance \(s\to\infty\) limit, the theoretical max scattering force achievable is \(F_{\gamma}^*=\hbar k\Gamma/2\).

Problem #\(4\): Atoms flying out of an oven at temperature \(T\) are collimated to form an atomic beam which meets a counterpropagating laser \(s,\omega\). Without any frequency compensation on the laser \(\omega\), show how to determine the terminal velocity \(v_{\infty}\) of the atoms.

Solution #\(4\): The idea is that when one is far detuned from \(\omega_D\), there is hardly any scattering force and so the atoms move at roughly constant velocity. It is only in a small window \(\omega_D\in[\omega_{01}-\Gamma/2,\omega_{01}+\Gamma/2]\) that there is any significant scattering force. In this near-resonance regime, one can therefore set up an equation of motion \(M\dot v=-F_{\gamma}(v)\) whose solution is:

\[(1+s)(v_{\infty}-v_0)+\frac{4}{3\Gamma k^2}(\delta_{D,\infty}^3-\delta_{D,0}^3)=-\frac{\hbar k\Gamma s}{2M}\Delta t\]

where the initial speed can be taken as the most probable one \(v_0=\sqrt{2k_BT/M}\), and the time \(\Delta t\) can be estimated by considering the situation where \(\omega+kv_0=\omega_{01}+\Gamma/2\) and \(\omega+kv_{\infty}=\omega_{01}-\Gamma/2\) (actually not sure about this).

Problem #\(5\): If the atomic beam emerging from the oven along with the counterpropagating laser are oriented along some “\(z\)-axis”, write down the magnetic field \(\mathbf B(z)=B(z)\hat{\textbf k}\) required to make a Zeeman slower.

Solution #\(5\): Assuming one is working at relatively low magnetic field strengths, the low-field basis which diagonalizes the perturbative hyperfine Hamiltonian \(\Delta H_{\text{HFS}}\propto\textbf I\cdot\mathbf J\) is \(|f,m_f\rangle\). In this basis, the expectation of the perturbative Zeeman Hamiltonian \(\Delta H_{\text{Zeeman}}\) is approximately:

\[\langle f,m_f|\Delta H_{\text{Zeeman}}|f,m_f\rangle=g_f\mu_BB(z)m_f\hbar\]

Thus, if one is exploiting an electric dipole transition from a “ground state” \(|fm_f\rangle\) to an “excited state” \(|f’m’_f\rangle\), then one requires the difference in the Zeeman shifts experienced by these \(2\) levels to precisely compensate for the Doppler detuning \(\delta_D=\omega+kv-\omega_{01}\):

\[(g_{f’}m’_f-g_fm_f)\frac{\mu_BB(z)}{\hbar}=\delta_D\]

In practice, one would like to achieve a constant deceleration \(a>0\) so that \(v^2=v_0^2-2az\). This requires a magnetic field from e.g. a tapered solenoid of the form:

\[B(z)=B_0\sqrt{1-\frac{z}{z_0}}+B_{\text{bias}}\]

where \(z_0:=v_0^2/2a\) is the stopping distance, \(B_0:=\frac{\hbar kv_0}{\mu_B(g_{f’}m’_f-g_fm_f)}\) can be interpreted as \(B_0=B(0)-B(z_0)\), and the bias field is \(B_{\text{bias}}=\frac{\hbar(\omega-\omega_{01})}{\mu_B(g_{f’}m’_f-g_fm_f)}\); or at least, if one wishes to completely stop the atoms at \(z=z_0\) (if not, then one may need to experimentally lower \(B_{\text{bias}}\) until one obtains a desired terminal velocity; in that case one should then discontinuously turn off the magnetic field \(B(z):=0\) for \(z>z_0\) so that the E\(1\) transition would again be substantially detuned from resonance and so experience a negligible scattering force \(F_{\gamma}\) after the atoms exit the Zeeman slower).

Problem #\(6\): For the specific case of say sodium atoms \(\text{Na}\) (analogous remarks apply to any alkali atom), explain why electric dipole transitions between the stretched states \(|fm_f\rangle=|22\rangle\) in \(3s_{1/2}\) and \(|f’m’_f\rangle=|33\rangle\) in \(3p_{3/2}\) are a good choice for laser cooling via the scattering force \(F_{\gamma}\)? In this case, what should be the corresponding value of \(B_0\) used in a Zeeman slower (the transition has \(\lambda_{01}=589\text{ nm}\))? What should be the polarization of the incident laser light?

Solution #\(6\): This E\(1\) transition is not only allowed but also forms a closed cycle thanks to the usual E\(1\) selection rules, ensuring no optical pumping into dark states that would prevent a given atom from experiencing any further scattering. One can check that \(B_0\approx 0.12\text{ T}=1200\text{ G}\) is reasonable to achieve in the lab. The incident photons should be \(\sigma^+\) circularly-polarized along the quantization axis \(\hat{\textbf k}\) defined by the \(\textbf B\)-field along which the atomic beam is also propagating.

Problem #\(7\): Show, stating any assumptions made, that the velocity-dependent optical molasses force \(F_m(v)\approx -\alpha v\) along an arbitrary axis looks like linear damping with some damping coefficient \(\alpha>0\) for an appropriately-chosen laser detuning \(\delta\). Hence, show that the atom’s kinetic energy \(T(t)=T(0)e^{-t/\tau}\) (supposedly) decays exponentially with time constant \(\tau=m/2\alpha\).

Solution #\(7\): Along any of the \(3\) Cartesian axies, the component of the optical molasses force \(F_m(v)\) along that axis is a superposition of two opposite (but not necessarily equal) scattering forces:

\[F_m(v)=-F_{\gamma}(v)+F_{\gamma}(-v)\approx -2v\frac{\partial F_{\gamma}}{\partial v}(v=0)\]

where the low-velocity assumption \(kv/\Gamma\ll 1\) has been made. This shows that the damping coefficient is:

\[\alpha=2\frac{\partial F_{\gamma}}{\partial v}(v=0)=-4\hbar k^2s\frac{2\delta/\Gamma}{(1+s+(2\delta/\Gamma)^2)^2}\]

So in order for \(\alpha>0\) to actually be damping, one requires \(\delta<0\) to be red-detuned as suggested in the picture. It turns out that in order to treat the two laser beams independently (as has been implicitly done above), they both have the same low \(s\ll 1\) so that neither one saturates the E\(1\) transition too much. In this simple model of optical molasses, one should therefore really take:

\[\alpha=2\frac{\partial F_{\gamma}}{\partial v}(v=0)=-4\hbar k^2s\frac{2\delta/\Gamma}{(1+(2\delta/\Gamma)^2)^2}\]

In some sense, the fact that it looks like linear damping near \(v=0\) is not really conceptually any different from the fact that e.g. systems looks like simple harmonic oscillators near stable equilibria; it’s just taking the \(1\)-st order term in a Taylor series in \(v\).

Kinetic energy is lost through the dissipative power of the molasses force:

\[\dot T=F_m(v)v=-\alpha v^2=-\frac{2\alpha}{m}T\Rightarrow T(t)=T(0)e^{-t/\tau}\]

where \(\tau=m/2\alpha\) is usually on the order of microseconds. This suggests that there is no cooling limit, i.e. that one can quickly cool down arbitrarily close to \(T\to 0\), but this turns out to be impossible, see Problem #\(9\).

Problem #\(8\): For a random walk \(\textbf X_1,\textbf X_2,…,\textbf X_N\) of \(N\) steps, each of identical length \(L=|\textbf X_1|=|\textbf X_2|=…=|\textbf X_N|\), what is the expectation of the total displacement squared \(\langle|\textbf X_1+\textbf X_2+…+\textbf X_N|^2\rangle\)?

Solution #\(8\): All the dot products vanish so:

\[\langle|\textbf X_1+\textbf X_2+…+\textbf X_N|^2\rangle=NL^2\]

Problem #\(9\): Conceptually, what is the issue with the approach taken in Solution #\(7\)? Upon amending it, show that there is actually a Doppler cooling limit, namely a minimum temperature \(T_D\) that can be achieved using the optical molasses technique, given by:

\[k_BT_D=\frac{\hbar\Gamma}{2}\]

Solution #\(9\): The issue with Solution #\(7\) is that, while it accounts for the momentum kick given to the atom during stimulated absorption of photons and also recognizes that spontaneous emission averages to no momentum kick, it neglects fluctuations in both the number of photons absorbed and in the number of photons scattered. Using the result of Problem #\(8\), except here conceptually it’s not a random walk in real space but rather in momentum space with \(\langle p_z^2\rangle(t)=2(\hbar k)^2(2\gamma)t\), the kinetic energy \(T_z\) of an atom in the optical molasses now evolves as (see Foote textbook for the detailed arguments, some sketchy assumptions involved too, and turns out ultimately wrong, see Problem #\(10\)):

\[\dot T_z=-\frac{2\alpha}{m} T_z+\left(1+3\times\frac{1}{3}\right)\frac{\hbar^2k^2}{2m}(2\gamma)\]

So in the steady state \(\dot T_z=0\) and using equipartition \(T_z=k_BT/2\), one finds that:

\[k_BT=-\frac{\hbar\Gamma}{4}\frac{1+(2\delta/\Gamma)^2}{2\delta/\Gamma}\]

is minimized for red-detuned \(\delta<0\) at \(2\delta/\Gamma=-1\), yielding the Doppler cooling limit \(k_BT_D=\hbar\Gamma/2\) as claimed.

Problem #\(10\): Explain why, even after accounting for Poissonian fluctuations in the stimulated absorption and spontaneous emission of photons, the Doppler cooling limit for the optical molasses technique above rests on a “spherical cows in vacuum” foundation.

Solution #\(10\): Simply put, the assumption of a \(2\)-level atom has been implicitly lurking in the discussion the whole time, but it turns out that if one leverages the existence of other sublevels, one can do even better than the Doppler cooling limit above would suggest. Note also that the optical molasses laser cooling technique only provides confinement in \(\textbf k\)-space, but not so much in \(\textbf x\)-space which is also relevant; in other words, laser trapping is to \(\textbf x\) what laser cooling is to \(\textbf k\).

Problem #\(11\): In light of Problem #\(10\), draw \(3\) pictures that summarize how a magneto-optical trap (MOT) works. In particular, it should be clear that the MOT could not work with just a \(2\)-level atom. Also write down explicitly a formula for the MOT force \(F_{\text{MOT}}(z,v)\).

Solution #\(11\):

The MOT force \(F_{\text{MOT}}(z,v)\) is conceptually identical to the optical molasses force \(F_m(v)\) only now the detuning includes not only the \(v\)-dependent Doppler shift but also the \(z\)-dependent Zeeman shift (more precisely, the formula \(F_{\text{MOT}}(z,v)\) that will be given below is the component of the net MOT force \(\textbf F_{\text{MOT}}(\textbf x,\textbf v)\) along the quantization \(z\)-axis of the quadrupolar anti-Helmholtz MOT \(\textbf B\)-field near the origin \(\textbf x\approx\textbf 0\) whose purpose is to generate a nearly uniform magnetic field gradient; a similar equation holds in the \(\rho\)-direction except one should note from \(\frac{\partial}{\partial\textbf x}\cdot\textbf B=0\) that \(\frac{\partial B_{\rho}}{\partial\rho}=-\frac{1}{2}\frac{\partial B_z}{\partial z}\)).

\[F_{\text{MOT}}(z,v)=F_{\gamma}\left(\delta=\omega-kv-\left(\omega_{01}+\frac{g’_f\mu_B}{\hbar}\frac{\partial B_z}{\partial z}z\right)\right)-F_{\gamma}\left(\delta=\omega+kv-\left(\omega_{01}-\frac{g’_f\mu_B}{\hbar}\frac{\partial B_z}{\partial z}z\right)\right)\]

Making the same low Doppler shift assumption \(kv\ll\Gamma\) but now also a low Zeeman shift assumption \(\frac{g’_f\mu_B}{\hbar}\frac{\partial B_z}{\partial z}z\ll\Gamma\), one can check that this reduces to the previous linear damping force of the optical molasses together with a Hookean spring force:

\[F_{\text{MOT}}(z,v)=-\alpha v-\beta z\]

where the spring constant is \(\beta=\frac{\alpha g’_f\mu_B}{\hbar k}\frac{\partial B_z}{\partial z}\) and the damping coefficient \(\alpha\) is as before in Problem #\(7\). Usually, the atom will be overdamped (as is desirable) \(\alpha^2>4\beta m\).

Problem #\(12\): State one advantage and one disadvantage of a MOT compared with just the optical molasses technique.

Solution #\(12\): The advantage of a MOT is that its capture velocity \(v_{c,\text{MOT}}\gg v_{c,m}\) is much greater than that of optical molasses so it is a better trapper, hence its name (indeed, one can load a MOT simply by firing a room-\(T\) vapor, or to get more atoms one can first Zeeman slow them). The disadvantage is that it is not as good of a laser cooler; optical molasses is able to reach much colder temperatures, surpassing the naive Doppler cooling limit \(k_BT_D=\hbar\Gamma/2\) mentioned earlier (this is because of various sub-Doppler cooling mechanisms, notably Sisyphus cooling, that are elaborated upon in the next problem; all that matters for now is that such sub-Doppler cooling mechanisms are effectively disabled when the Zeeman shift exceeds the light shift as in a MOT).

Problem #\(13\): In a MOT, where does the trapping come from?

Solution #\(13\): It does not come from the quadrupolar \(\textbf B\)-field, but rather

Problem #\(14\): Draw a picture to explain how Sisyphus cooling allows. What is the new cooling limit?

Problem #\(15\): Show that the magnetic field strength \(|\textbf B(\textbf x)|\) in free space can never have a strict local maximum provided \(\dot{\textbf E}=\textbf 0\) (this is one version of Earnshaw’s theorem).

Solution #\(15\): One has the usual identity:

\[\frac{\partial}{\partial\textbf x}\times\left(\frac{\partial}{\partial\textbf x}\times\textbf B\right)=\frac{\partial}{\partial\textbf x}\left(\frac{\partial}{\partial\textbf x}\cdot\textbf B\right)-\left|\frac{\partial}{\partial\textbf x}\right|^2\textbf B\]

But thanks to the assumptions of the problem the term on the left vanishes, and in addition to the absence of magnetic monopoles, one concludes that \(\left|\frac{\partial}{\partial\textbf x}\right|^2\textbf B=\textbf 0\).

Now, \(|\textbf B(\textbf x)|\) satisfies the claim if and only if \(|\textbf B(\textbf x)|^2\) satisfies it. In turn, one just has to check that the Hessian \(\frac{\partial^2|\textbf B|^2}{\partial\textbf x^2}\) does not have \(3\) negative eigenvalues anywhere. A sufficient (but not necessary) condition for this would be if \(\text{Tr}\frac{\partial^2|\textbf B|^2}{\partial\textbf x^2}\geq 0\) everywhere. But the trace of the Hessian is just the Laplacian of the original scalar field:

\[\text{Tr}\frac{\partial^2|\textbf B|^2}{\partial\textbf x^2}=\left|\frac{\partial}{\partial\textbf x}\right|^2|\textbf B|^2=2\left|\frac{\partial|\textbf B|}{\partial\textbf x}\right|^2+2|\textbf B|\left|\frac{\partial}{\partial\textbf x}\right|^2\textbf B=2\left|\frac{\partial|\textbf B|}{\partial\textbf x}\right|^2\geq 0\]

which completes the proof.

Problem #\(16\): What is the high-level mechanism of magnetic trapping? What are the \(3\) most common kinds of magnetic traps? (just draw a picture of each one with some annotations, no need for long explanations)

Solution #\(16\): Just as in the Stern-Gerlach experiment, magnetic dipoles in non-uniform magnetic fields experience a force that pushes them towards weaker or stronger magnetic field depending on their orientation relative to the magnetic field. The difference is that in the Stern-Gerlach experiment the atoms are flying in at hundreds of meters/second and getting deflected, way beyond capture velocity, whereas in magnetic trapping they have already been laser cooled. The diThe \(3\) most common magnetic traps are the quadrupole trap (made from a pair of anti-Helmholtz coils as in a MOT), a time-averaged orbiting potential (TOP) trap, and an Ioffe-Pritchard trap, the last of which is the most common and simply consists of \(4\) long cylindrical bars (which form a “linear quadrupole” in direct analogy with an electrostatic quadrupole) and \(2\) Helmholtz-like coils. Key words: radial/\(\rho\)-trapping and axial/\(z\)-trapping. Due to the result of Problem #\(14\), it follows that one can only trap atoms in low-field seeking states (such atoms are called low-field seekers).

Good overview here: https://arxiv.org/pdf/1310.6054

Posted in Blog | Leave a comment

Amplitude-Splitting Interferometry

The purpose of this post is to review the theory behind several standard amplitude-splitting interferometers.

Michelson Interferometer

The simplest possible version of a Michelson interferometer is the following:

For now, consider for simplicity a monochromatic light source of frequency \(\omega=ck=nvk=n’v’k=n^{\prime\prime}v^{\prime\prime}k\). In “chronological order”, the phases accrued by \(2\) amplitude-split reflected and transmitted rays in their respective journeys from the light source to the photodiode are explicitly:

\[\phi_1=nkx_0+\frac{n’ka}{2}+\pi+\frac{n’ka}{2}+nkx_2+\pi+nkx_2+\frac{n’ka}{2}+\frac{n^{\prime\prime}kb}{\sqrt{1-n’^2/2{n^{\prime\prime}}^2}}+\frac{n’ka}{2}+nkx_3\]

\[\phi_2=nkx_0+\frac{n’ka}{2}+\frac{n^{\prime\prime}kb}{\sqrt{1-n’^2/2{n^{\prime\prime}}^2}}+\frac{n’ka}{2}+nkx_1+\pi+nkx_1+\frac{n’ka}{2}+\pi+\frac{n’ka}{2}+nkx_3\]

where it has been assumed that \(n'<n^{\prime\prime}\) and that \(n<n_{\text{mirrors}}\), though as far as their relative phase difference \(\Delta\phi\) is concerned, because \(\textbf R\) is an additive abelian group it will always just be:

\[\Delta\phi=|\phi_2-\phi_1|=2nk\Delta x\]

where the arm-length difference is \(\Delta x:=|x_2-x_1|\) (to get a \(50:50\) beam splitter there should also be some constraint on \(a,b,n,n’,n^{\prime\prime}\) coming from the Fresnel equations). If the photodiode has sampling period \(T_s>0\), then one expects that the signal it measures will be a time-averaged irradiance of the form (it is being assumed here that the beam splitter really is exactly \(50:50\)):

\[I\sim\langle|e^{-i\omega t}+e^{i\Delta\phi}e^{-i\omega t}|^2\rangle_{T_s}\sim 1+\cos\Delta\phi\sim\cos^2nk\Delta x\]

where \(\cos\Delta\phi\) is the interference term which is sensitive to the relative phase difference \(\Delta\phi\) between the two interfering waves of identical frequency \(\omega\).

On the other hand then, given two distinct frequencies \(\omega<\omega’\) separated by \(\Delta\omega:=\omega’-\omega>0\), one can repeat the story above, this time measuring an irradiance:

\[I\sim\langle|(1+e^{i\Delta\phi})e^{-i\omega t}+(1+e^{i\Delta\phi’})e^{-i\omega’ t}|^2\rangle_{T_s}\sim 1+\cos\frac{\Delta\phi+\Delta\phi’}{2}\cos\frac{\Delta\phi’-\Delta\phi}{2}\]

where the coupled interference cross-term:

\[\frac{\sin\Delta\omega T_s+\sin(\Delta\omega T_s-\Delta\phi’)+\sin(\Delta\omega T_s+\Delta\phi)+\sin(\Delta\omega T_s+\Delta\phi-\Delta\phi’)}{\Delta\omega T_s}\]

is only non-negligible in the limit \(\Delta\omega T_s\ll 1\) which is typically not satisfied, and so here is taken to vanish. The key takeaway from this is that, while any two waves will interfere, waves of distinct frequencies \(\omega’\neq\omega\) do not in general interfere in an easy-to-measure way, i.e. only interference between waves of the same frequency \(\omega\) tends to be measurable.

That being said, it is essential to stress that interference between waves of the same \(\omega\) is interference nonetheless, and the resultant interference pattern \(I(\Delta\phi,\Delta\phi’)\) is readily measured in experiments and still extremely useful. One example of this is in the case where \(\Delta\omega\) is small, such as between \(2\) atomic fine or hyperfine transitions. Then the interference pattern can be written as a function of the Michelson interferometer arm-length difference \(\Delta x\):

\[I(\Delta x)\sim 1+\cos n(k+k’)\Delta x\cos n(k’-k)\Delta x\]

This is analogous to the phenomenon of beats in the \(t\)-domain; here in a spatial domain \(\Delta x\), the low-frequency envelope \(\cos n(k’-k)\Delta x\) is modulated by high-frequency fringes \(\cos n(k+k’)\Delta x\). By finding a \(\Delta x=\Delta x_0\) where fringes disappear due to vanishing of the envelope \(\Delta k:=k’-k=\pi/2n\Delta x_0\), one can thereby obtain the frequency difference \(\Delta\omega=nv\Delta k\) assuming \(n\) is non-dispersive. The envelope is sometimes phrased in terms of the fringe visibility:

\[V(\Delta x):=\frac{\Delta I_{\text{envelope}}(\Delta x)}{\bar I(\Delta x)}=\cos n(k’-k)\Delta x\]

As its name suggests, when the fringes become invisible (i.e. fringe visibility becomes \(V(\Delta x_0)=0\)), then one again recovers the same frequency difference \(\Delta\omega\) as above.

Another point implicitly assumed above and worth stressing is that the relative phase difference \(\Delta\phi\) of the two interfering waves have to maintain temporal coherence \(\frac{\partial\Delta\phi}{dt}=0\) at the photodiode position.

More generally, assuming that waves of any two distinct frequencies, no matter how close, do not interfere, a highly polychromatic signal containing irradiance \(2\hat I(k)dk\) in the wavenumber interval \([k,k+dk]\) would be expected to exhibit a Michelson interference pattern of the form:

\[I(\Delta x)=2\int_0^{\infty}\hat I(k)(1+\cos nk\Delta x)dk\]

Or equivalently, because \(\hat I(\Delta x)\in\textbf R\) is expected to be real-valued for all \(\Delta x\), it follows that its Fourier transform \(\hat I(k)\) needs to be Hermitian \(\hat I(-k)=\hat I^{\dagger}(k)\). This means one can rewrite this as an inverse Fourier transform:

\[I(\Delta x)=I_0+\int_{-\infty}^{\infty}\hat I(k)e^{ink\Delta x}dk\]

with \(I_0=\int_{-\infty}^{\infty}\hat I(k)dk\) the background irradiance. The Fourier transform then provides the spectrum of the light source:

\[\hat I(k)=\int_{-\infty}^{\infty}(I(\Delta x)-I_0)e^{-ink\Delta x}dx\]

and is the basis of the Fourier transform infrared spectrometer (FTIR) where a Michelson interferometer is employed along with one mirror on a motorized translation stage to vary \(\Delta x\) and a photodiode to measure the corresponding total interference pattern \(I(\Delta x)\) from all the wavenumbers \(k\in(0,\infty)\).

Note in all of the above discussion it has been implicitly assumed that all the beams in the Michelson interferometer are perfectly collimated, etc. so that the photodiode observes an interference pattern \(I(\Delta x)\) which would vary with arm-length difference \(\Delta x\) but spatially across the photodiode surface would be uniform; in practice, due to misalignment (accidental or deliberate) or imperfect collimation because the light source is not a point but extended, etc. there is not only an interference pattern in \(\Delta x\)-space, but in real space \((x,y)\) across the photodiode surface too; this whole interference profile \(I(x,y,\Delta x)\) would then change as \(\Delta x\) were to evolve (this is best understood by putting one’s eye at the photodiode location and looking into the beam splitter; one would then see \(2\) virtual images of the light source behind a mirror sitting around \(\sim 2 x_2\) away, separated by \(\sim 2\Delta x\). Conceptually, one can then just forget about the whole Michelson interferometer setup and pretend as if one just had \(2\) coherent light sources interfering with each other on a distant screen. Using this perspective, it is clear that the spatial fringes one observes will be sections of hyperbolae.

Finally, it is worth mentioning that sometimes a minor variant of the Michelson interferometer (called a Twyman-Green interferometer) is used for better collimation of the incident light beam. In addition, if instead of a beam splitter one were to use a half-silvered mirror, then a compensator may also need to be added (this would compensate not only for the optical path length difference in monochromatic incident light but also for optical dispersion in the case of polychromatic incident light).

Mach-Zehnder Interferometer

The setup is the following:

  • Kinda similar to Michelson except light never retravels its path.
  • Also, instead of a single photodiode on which equal-frequency waves interfere, have two photodiodes to measure the intensities on each output port of the last beamsplitter.
  • Idea is that for a \(50:50\) beamsplitter, a unitary (probability-conserving) to describe action on photon probability amplitudes for going down each arm of the Mach-Zehnder.

Fabry-Perot Interferometer

Miscellaneous: Thin Film Interferometer, Haidinger Fringes & Newton’s Rings

Problem: Write down the conditions for maximum constructive and destructive interference in a thin-film interferometer.

Solution: For a light wave incident at angle \(\theta\):

This is the phase shift between the amplitude-split waves arising solely from their path length difference and the magnitude of the impedance mismatch between the \(2\) dielectrics. However, depending on the sign of this impedance mismatch (i.e. whether \(k>k’\) or \(k<k’\)), one of the waves will, at one of the reflections, also accrue an additional \(\pi\) phase shift. So the total phase shift is actually:

\[\Delta\phi=2k_y’\Delta y+\pi\]

and from this the condition for maximum constructive interference \(\Delta\phi=2\pi m\) or maximum destructive interference \(\Delta\phi=2\pi m+\pi\) can be written. To actually compute the resultant light wave’s electric and magnetic fields, one would have to use the Fresnel equations, separately looking at \(s\) and \(p\)-polarized light.

Problem: Show geometrically that the same result applies in a Fabry-Perot interferometer, except the only difference now is that the amplitude is split at transmission/reflection from a low-\(Z\) to high-\(Z\) dielectric, contributing an additional \(\pi\) phase shift not present in the thin-film interferometer, so in total it’s \(2\pi\equiv 0\).

Solution:

One can also view the Fabry-Perot interferometer as a black box in which case it just behaves like a diffraction grating with slit spacing \(2\Delta x\tan\theta’\), though one also has to account for the extra optical path length traversed inside the optical cavity which can be modelled by an effective glass wedge that phase shifts each beam by a constant \(2k’\Delta x/\cos\theta’\) with respect to its nearest neighbours):

Problem: Now one would like to do better, improving on the previous result. Specifically, one would not only like to know when maxima occur, but also the shape of the FPI’s transmission function \(T(\Delta\phi)\) (i.e. the fraction of the incident power transmitted as a function of the phase shift \(\Delta\phi\) between nearest neighbour reflections). To this end, show that such a transmission function is given by the Airy formula:

\[T(\Delta\phi)=\frac{1}{1+\frac{4R}{(1-R)^2}\sin^2(\Delta\phi/2)}\]

where \(R\) is the reflectivity of either mirror (assumed to be identical), and \(\Delta\phi=2k_x’\Delta x\).

Solution:

Problem: Verify using the Airy transmission function the earlier claim in Problem … that maximum constructive interference occurs when \(\Delta\phi=2\pi m\).

Solution: The Airy transmission function \(T(\Delta\phi)\) is maximized when its denominator is minimized, which occurs when the positive semi-definite part is \(0\), i.e. \(\sin\Delta\phi/2=0\Rightarrow\Delta\phi/2=m\pi\Rightarrow\Delta\phi= 2\pi m\).

Problem: From the plot, it is clear that as one polishes the mirrors more, increasing their reflectivity \(R\), the transmission resonances of the Fabry-Perot occurring (as verified in the previous part) at \(\Delta\phi=2\pi m\) are getting sharper and sharper. Quantify this phenomenon by writing down a formula for the width of the peak.

Solution: The peak clearly has a Lorentzian feel to it, and indeed this can be made explicit by Taylor expanding the \(\sin\) function, choosing the \(m=0\) resonance for simplicity (periodicity ensures they are all identical):

Problem: In analogy to the quality factor \(Q\) of a damped harmonic oscillator, define the finesse \(\mathcal F\) of the Fabry-Perot interferometer (it similarly measures the “quality” of the FPI).

Solution: Just as \(Q=\omega_0/\Delta\omega\), one can analogously define:

\[\mathcal F=\frac{2\pi}{\Delta\phi_{\text{FWHM}}}=\frac{\pi\sqrt{R}}{1-R}\]

where the \(2\pi\) can be interpreted in analogy to \(\omega_0\) as being where the first transmission resonance lies (apart from the one at \(\Delta\phi=0\)), though of course this is actually the separation between any pair of nearest neighbour transmission resonances (which one can loosely refer to as the free spectral range \(\Delta\phi_{\text{FSR}}=2\pi\)).

Problem: It is customary to speak of the free spectral range not in \(\Delta\phi\)-space, but when the FPI’s transmission comb is plotted in \(f\)-space or \(\lambda’\)-space. Using \(\Delta\phi_{\text{FSR}}=2\pi\) as the starting point, find expressions for \(\Delta f_{\text{FSR}}\) and \(\Delta\lambda_{\text{FSR}}\) (strictly speaking, a bunch of places where \(\Delta\phi\) is written should really be written as \(\Delta\Delta\phi\) and read as “the difference in delta-phis”)

Solution:

Problem: Although the finesse \(\mathcal F\) was defined in analogy to a \(Q\)-factor in \(\Delta\phi\)-space, show that it also behaves more intimately like a genuine \(Q\)-factor in \(\omega\)-space (and hence also \(f\)-space)!

Solution:

\[\frac{\Delta f_{\text{FSR}}}{\Delta f_{\text{FWHM}}}=\frac{\Delta\phi_{\text{FSR}}}{\Delta\phi_{\text{FWHM}}}=\mathcal F\]

which follows because \(\Delta\phi\) and \(f\) are related by a linear transformation:

\[\Delta\phi=\frac{4\pi\Delta x\cos\theta’}{v’}\Delta f\]

Problem: The line width \(\Delta\lambda’_{\text{FWHM}}\) is often taken to define a “resolution element” of the FPI in \(\lambda’\)-space. This is because, if the incident light is not monochromatic, but say contains \(2\) closely spaced wavelengths \(\lambda’_1,\lambda’_2\), then the FPI will be able to resolve them at the \(m^{\text{th}}\) order iff:

\[\frac{\Delta\lambda}{\lambda’}\geq \frac{1}{m\mathcal F}\]

Show this.

Solution: Starting from the identity:

\[\frac{\Delta\lambda’_{\text{FWHM}}}{\lambda’}=\frac{\Delta f_{\text{FWHM}}}{f}=\frac{\Delta f_{\text{FSR}}}{f\mathcal F}\]

Now write \(f=f_m\) for the frequency of the \(m^{\text{th}}\) order of the FPI in \(f\)-space. Clearly this is just \(f_m=m\Delta f_{\text{FSR}}\) a sequence of \(m\) hops of the free spectral range \(\Delta f_{\text{FSR}}\). This yields the desired result.

Posted in Blog | Leave a comment

Spectral Line Shape Broadening Mechanisms

Problem: What does the phrase “spectral line shape” mean?

Solution: A spectrum is typically a plot of the intensity \(I(\omega)\sim|E(\omega)|^2\) of some underlying time domain signal \(E(t)\).

Problem: What are the \(2\) most prominent kinds of spectra one encounters?

Solution: There are both emission spectra and absorption spectra (and also, it seems there are only broadening mechanisms, no such thing as narrowing mechanisms).

Problem: Distinguish between homogeneous broadening mechanisms and inhomogeneous broadening mechanisms.

Solution: Homogeneous broadening mechanisms affect all atoms indistinguishably, giving rise to a Lorentzian spectral line shape. By contrast, inhomogeneous broadening mechanisms affect atoms distinguishably; this leads to non-Lorentzian spectral broadening.

Problem: In the presence of Doppler broadening alone, what is the spectral line shape?

Solution: Along \(1\)-dimension, the Maxwell-Boltzmann distribution of velocities is normally distributed (no factor of \(4\pi v^2\)):

\[\rho(v)=\sqrt{\frac{\beta m}{2\pi}}e^{-\beta mv^2/2}\]

The non-relativistic Doppler shift gives the angular frequency random variable \(\omega\) as an affine transformation of the velocity random variable \(v\):

\[\omega=\omega_0+k_0v\]

so:

\[\rho(\omega)=\frac{dv}{d\omega}\rho(v)=\sqrt{\frac{\beta m}{2\pi k_0^2}}e^{-\beta m(\omega-\omega_0)^2/2k_0^2}\]

In particular it is also a Gaussian (hence Doppler broadening is inhomogeneous) but with expectation \(\omega_0\) and standard deviation (aka line width) \(\Delta\omega\):

\[\Delta\omega=\omega_0\sqrt{\frac{k_BT}{mc^2}}\propto\sqrt{T}\]

(slogan: variance proportional to temperature) so the quality factor \(Q:=\omega_0/\Delta\omega_D\) of depends on the ratio of rest mass energy \(mc^2\) of each particle to their thermal energy \(k_BT\).

Problem: In the presence of natural/lifetime/radiative broadening alone, what is the spectral line shape?

Solution: For a \(2\)-level system with ground state \(|0\rangle\) and excited state \(|1\rangle\) associated to a lifetime \(\tau\) of spontaneous emission, then the natural line width is:

\[\Gamma\sim\omega\sim\frac{1}{\tau}\]

and the spectral line shape is a Lorentzian \(\sim\frac{1}{(\omega-\omega_0)^2+\Gamma^2}\) (is there a clear way to understand why? From optical Bloch equations for instance…). So natural broadening is a homogeneous broadening mechanism.

Problem: In the presence of pressure/collisional broadening alone, what is the spectral line shape?

Solution: Also a Lorentzian as with natural broadening (so also homogeneous), but the line width is determined from the relaxation time \(\tau=\ell/\langle v_{\text{rel}}\rangle\) between collisions. Using the fact that the mean free path is \(\ell=1/n\sigma\), this yields the line width due to pressure broadening:

\[\Delta\omega=n\sigma\langle v\rangle\]

Problem: What does the overall spectral line shape look like when multiple broadening mechanisms are at play?

Solution:

sum of independent random variables yields convolution.

Power Broadening

This is also homogeneous.

Doppler Broadening

Need to use that formula for converting a probability distribution of one random variable into another function of it with the derivative…(this is how you can experimentally verify the Maxwell distribution btw).

Posted in Blog | Leave a comment

Biology Basics

Problem: Draw a DNA molecule. Annotate the terms nucleotides, phosphate group, deoxyribose, phosphodiester linkage, adenine, thymine, cytosine, guanine, nitrogenous bases.

Problem: Draw an RNA molecule, and make similar annotations as for DNA.

Problem: What is the typical length scale \(x_{\text{cell}}\) of a cell? How does that compare with the typical width \(x_{\text{hair}}\) of a human hair?

Solution: One has \(x_{\text{cell}}\sim 10\space\mu\text m\), whereas \(x_{\text{hair}}\sim 100\space\mu\text m\), so \(x_{\text{hair}}/x_{\text{cell}}\sim 10\) (though exact values vary!).

Problem: Draw a cartoon cross-section of a typical eukaryotic cell. Annotate the terms nucleus, mitochondria, ribosome, rough endoplasmic reticulum, smooth endoplasmic reticulum, golgi bodies, lysosomes, and (for plants) chloroplasts.

Problem: Draw a diagram depicting the process of fertilization. Annotate the terms gamete, zygote, pair of homologous chromosomes, haploid number, and diploid number on the diagram.

Solution:

Posted in Blog | Leave a comment

Weak Wave Turbulence (WWT)

  • The initial \(t=0\) Bose gas momentum space distribution \(n_{|k\rangle}(t=0)\) is sharply peaked at some \(k=k_0\), with the low-energy bosons of \(k<k_0\) in the so-called IR regime and the high-energy bosons of \(k>k_0\) in the UV regime.
  • The strongly interacting regime corresponds to bosons with \(k\xi\ll 1\) with \(\xi\propto a^{-1/2}\) the healing length, whereas the weakly interacting regime is \(k\xi\gg 1\).
  • In the UV regime, if it starts weakly interacting then it will remain like that because \(k\) just grows, pushing further into weakly interacting.
  • In the IR regime, there is a transition from strongly to weakly interacting, so more interesting to study.
  • Gevorg’s speed limit paper showed that the coherent coarsening dynamics as quantified by the speed limit \(3\hbar/m\) is, after an initial transient, independent of the (dimensionless) interaction strength \(1/(k\xi)\).
  • WWT applies to the weakly interacting \(k\xi\gg 1\) regime.
  • # the wiggles at the end are due to diffraction off the BEC, not the sinc momentum space contribution from the BEC which is approximately (need to crop them from the data)
  •         # a homogeneous top hat in real space.
  •         # Each experimental cycle lasts about 30 seconds, the first 25 seconds is just the standard steps (MOT, evaporation, Sisyphus cooling, molasses, etc.), the last 5 seconds
  •         # only like a few seconds is actually the physics.
  •         # Right now, Gevorg & Simon decided to have 15 increasing TOFs, in order to probe the momentum space distribution n_k of the BEC.
  •         # The idea is that you measure a 2D momentum space distribution of the BEC along your line of sight, then inverse Abel transform (involves a derivative)
  •         # it to get the 3D momentum space distribution. But also, the only part you can reliably inverse Abel transform is the part that is not diffracted
  •         # off the BEC and doesn’t saturate the optical density OD at 3, so can only reliably measure in sort of the “outskirt” regions of the cylinder so to speak
  •         # where the BEC is not too dense (i.e. OD < 3). Also this is why it’s hard to measure the low-k part of the momentum space distribution, b/c the BEC is so dense
  •         # there (OD > 3) that it saturates the imaging system.
  •         # Numerical differentiation is much less robust than numerical integration (Gevorg gave example of a monotonic function like exp(x)),
  •         # so it’s much harder to get the 3D momentum space distribution. In the case of the inverse Abel transform, you have to differentiate by subtracting the
  •         # of neighbouring pixels.
  •         # Need to be on the Cambridge VPN to “SSH” into the VNC viewer to see Analysis GpUI (imaging computer), Cicero, Origin, or any of the office computers
  •         # each of the lab computers has a IP address, and you can only access them from the Cambridge network. Also a remote Toptica software for
  •         # relocking the laser if it drifts/unlocks overnight (somehow not so easy to just automate this b/c need to manually play with the current and piezo voltage in the AOMs).
  •         # Ground state wavefunction of BEC in k-space is not exactly a sinc, rather a Bessel b/c it’s a cylinder.
  •         # one-body loss, evaporative loss, usually temperatures too high cf. box trap depth, in that case lose a lot of energy but not many particles
  •         # b/c only particels you lose are the high-energy ones, so you lose a lot of energy but not many particles
  •         # another worry is counting worry, the n_rad_k distributions might overestimate or underestimate the number of particles
  •         # e.g. the 350000 atoms seems too high…truncate integrals as well to avoid the noisy region at high-k.
  •         # for n_k use log-log, for k**2*n_k or k**4*n_k use lin-lin maybe? or log-lin… advantage of lin-lin is that area you visually see is proportional to the
  •         # integral of the function and for k**2 and k**4 this is nice to see.
  • #try playing around with some different definition of “speed of thermalization”, see if they give a monotonic thing or not
  • # also compare with the GPE & WWT theory of the paper (which plots k**2*n_k) see if it matches…
  • # change energy to temperature scale (nK), also get E/N for each set
Posted in Blog | Leave a comment

Classical Field Theory

Problem: Explain what is meant by a classical field.

Solution: A classical field is any function \(\phi(x)\) (denoted as \(\phi\) because “phi” sounds like “field”) on spacetime \(x:=(ct,\mathbf x)\in\mathbf R^{1,3}\) which is a \(c\)-number, that is to say \(\phi(x)\in\mathbf R\) or \(\phi(x)\in\mathbf C\) or \(\phi(x)\in\mathbf R^4\) or \(\mathbf C^4\) or something like that. Classical fields stand in sharp contrast to quantum fields which are no longer (necessarily) \(c\)-numbers but in general \(q\)-numbers (operator-valued).

Problem: Explain why the words “theory” and “action” are interchangeable, so e.g. if someone says “Klein-Gordan field theory is Lorentz invariant” or “Maxwell’s field theory of classical electromagnetism is both Lorentz invariant and gauge invariant”, it is respectively synonymous with “the Klein-Gordan action is Lorentz invariant” or “the Maxwell action is both Lorentz invariant and gauge invariant”.

Solution: Because once an action \(S\) is written down for a classical field theory, it defines what classical fields are involved in the theory, their dynamics, etc. in other words, a complete specification of all the relevant information.

Problem: Write down the action \(S\) of a classical field theory in terms of its Lagrangian density \(\mathcal L\). State the Lagrangian density \(\mathcal L_{\text{KG}}\) of Klein-Gordan field theory and the Lagrangian density \(\mathcal L_{\text{EM}}\) of Maxwell field theory.

Solution: From classical particle mechanics, \(S=\int dt L\) where \(L=\int d^3\mathbf x\mathcal L\) is the classical Lagrangian. Thus, overall \(S=c^{-1}\int d^{1+3}x\mathcal L\) where the extra factor of \(c\) is to remove the factor of \(c\) in the spacetime measure \(d^{1+3}x=cdtd^3\mathbf x\). Klein-Gordon field theory governs the dynamics of a single real scalar field \(\phi(x)\):

\[\mathcal L_{\text{KG}}(\phi,\partial_0\phi,\partial_1\phi,\partial_2\phi,\partial_3\phi)=\frac{1}{2}\partial^{\mu}\phi\partial_{\mu}\phi-\frac{1}{2}k^2_c\phi^2=\frac{1}{2c^2}\left(\frac{\partial\phi}{\partial t}\right)^2-\left(\frac{1}{2}\left|\frac{\partial\phi}{\partial\mathbf x}\right|^2+\frac{1}{2}k_c^2\phi^2\right)\]

where \(k_c=2\pi/\lambda_c\) is the Compton wavenumber associated to the Compton wavelength \(\lambda_c=h/mc\). The important point is that \(k_c\propto m\). By contrast, Maxwell field theory governs the dynamics of a \(4\)-vector field \(A^{\mu}(x)\):

\[\mathcal L_{\text{EM}}(A^0,A^1,A^2,A^3,\partial…)=-\frac{1}{4\mu_0}F_{\mu\nu}F^{\mu\nu}-J^{\mu}A_{\mu}=\frac{\varepsilon_0}{2}|\mathbf E|^2-\left(\frac{|\mathbf B|^2}{2\mu_0}+\rho\phi-\mathbf J\cdot\mathbf A\right)\]

where the electromagnetic field tensor \(F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}\).

Problem: What does it mean for a classical field theory to be Lorentz invariant (aka relativistic)? How can one often instantly tell if a classical field theory is compatible with special relativity or not?

Solution: Loosely speaking, a classical field theory is Lorentz invariant iff its action \(S\) is a Lorentz scalar:

\[\frac{\delta S}{\delta SO^+(1,3)}=0\]

Both Klein-Gordan and Maxwell are relativistic field theories. The giveaway is the presence of Greek spacetime indices like \(\mu,\nu=0,1,2,3\), etc. in their Lagrangian densities \(\mathcal L_{\text{KG}},\mathcal L_{\text{EM}}\) (one calls this phenomenon manifest Lorentz invariance). More precisely, such spacetime indices must be contracted appropriately, thus putting space and time on equal footing.

Problem: More generally, any transformation \((x,\phi_i)\mapsto (x’,\phi’_i)=(x+\delta x,\phi_i+\delta\phi_i)\) of the spacetime coordinates and fields that leaves the action \(S\) invariant is called a symmetry of the classical field theory (perhaps that’s why the action is denoted by the letter \(S\) because it’s screaming “\(S\)ymmetry!”). Explain why finding symmetries is useful.

Solution: If the symmetry is continuous, then Noether’s theorem asserts existence of a conserved \(4\)-current. Moreover, the proof is constructive:

\[\delta S=\frac{1}{c}\int\left(\delta(d^{1+3}x)\mathcal L+d^{1+3}x\delta\mathcal L\right)\]

where the Jacobian determinant is \(\det\left(\frac{\partial x’}{\partial x}\right)\approx 1+\partial_{\mu}\delta x^{\mu}\) and \(\delta\mathcal L=\frac{\partial\mathcal L}{\partial\phi_i}\delta\phi_i+\frac{\partial\mathcal L}{\partial(\partial_{\mu}\phi_i)}\partial_{\mu}\phi_i+\frac{\partial \mathcal L}{\partial x^{\mu}}\delta x^{\mu}=\partial_{\mu}(\pi^{\mu}_i\delta\phi_i)+(\partial_{\mu}\mathcal L)\delta x^{\mu}\) (where \(\pi^{\mu}_i:=\frac{\partial\mathcal L}{\partial (\partial_{\mu}\phi_i)}\)). Hence, the action variation is the spacetime integral of a spacetime divergence:

\[\delta S=\frac{1}{c}\int d^{1+3}x\partial_{\mu}(\pi^{\mu}_i\delta\phi_i+\mathcal L\delta x^{\mu})=0\]

yielding the conserved Noether current:

\[j^{\mu}=\pi^{\mu}_i\delta\phi_i+\mathcal L\delta x^{\mu}\]

Here, \(\delta\phi_i:=\phi’_i(x)-\phi_i(x)\) is the local variation of \(\phi_i\) at a fixed event \(x\in\mathbf R^{1,3}\). It can be related to the net variation \(\Delta\phi_i:=\phi’_i(x’)-\phi_i(x)\) due to both internal transformations and induced variation arising from the coordinate transformation \(\delta x\) using \(\delta\phi_i=\Delta\phi_i-\delta x^{\mu}\partial_{\mu}\phi_i\). This allows the Noether current to be rewritten:

\[j^{\mu}=\pi^{\mu}_i\Delta\phi_i-T^{\mu}_{\nu}\delta x^{\nu}\]

with canonical stress-energy tensor field \(T^{\mu}_{\nu}=\pi^{\mu}_i\partial_{\nu}\phi_i-\delta^{\mu}_{\nu}\mathcal L\) or equivalently \(T^{\mu\nu}=\pi^{\mu}_i\partial^{\nu}\phi_i-g^{\mu}_{\nu}\mathcal L\)

(in particular, notice that \(\delta\mapsto g\)).

Problem: Consider a classical field theory (massless \(\phi^4\)) involving a single real scalar field \(\phi(x)\) with action:

\[S=\frac{1}{c}\int d^{1+3}x\left(\frac{1}{2}\partial^{\mu}\phi\partial_{\mu}\phi-\frac{\lambda}{4}\phi^4\right)\]

Show that the conformal dilatation transformation \(\phi(x)\mapsto\alpha\phi(\alpha x)\) for \(\alpha\in\mathbf R\) is a continuous symmetry of this classical field theory, and hence compute the associated Noether current.

Solution: The hardest part of this problem is to understand precisely what the notation \(\phi(x)\mapsto\alpha\phi(\alpha x)\) means. The answer is that it simultaneously defines a spacetime transformation \(x\mapsto x’=x/\alpha\) and a corresponding active field transformation \(\phi(x)\mapsto \phi'(x)=\alpha\phi(\alpha x)\), so in particular \(\phi'(x’)=\alpha\phi(x)\). To check invariance of the action \(S\):

\[S’=\frac{1}{c}\int d^{1+3}x’\left(\frac{1}{2}\partial’^{\mu}\phi'(x’)\partial’_{\mu}\phi'(x’)-\frac{\lambda}{4}\phi’^4(x’)\right)\]

\[=\frac{1}{c}\int\frac{d^{1+3} x}{\alpha^4}\left(\frac{1}{2}\alpha\partial^{\mu}(\alpha\phi(x))\alpha\partial_{\mu}(\alpha\phi(x))-\frac{\lambda}{4}\alpha^4\phi^4(x)\right)\]

\[=S\]

One can then linearize about \(\alpha=1\):

\[x’\approx x+(\alpha-1)\left(-\frac{x}{\alpha^2}\right)_{\alpha=1}=2x-\alpha x\]

\[\phi'(x)\approx\phi(x)+(\alpha-1)(\phi(\alpha x)+\alpha x^{\nu}\partial_{\nu}\phi(\alpha x))_{\alpha=1}=\phi(x)+(\alpha-1)(\phi+x^{\nu}\partial_{\nu}\phi)\]

\[\phi'(x’)=\phi(x)+(\alpha-1)\phi(x)=\alpha\phi\]

Thus:

\[\delta x=(1-\alpha)x\]

\[\delta\phi=(\alpha-1)(\phi+x^{\nu}\partial_{\nu}\phi)\]

\[\Delta\phi=(\alpha-1)\phi\]

which is consistent with \(\Delta\phi=\delta\phi+\delta x^{\nu}\partial_{\nu}\phi\). The conserved Noether current is thus (using the form with the stress-energy tensor):

\[j^{\mu}=\partial^{\mu}\phi(\alpha-1)\phi-T^{\mu}_{\nu}(1-\alpha)x^{\nu}\]

Dividing out the infinitesimal \(\alpha-1\):

\[j^{\mu}=\phi\partial^{\mu}\phi+T^{\mu}_{\nu}x^{\nu}\]

(aside: it is instructive to explicitly check \(\partial_{\mu}j^{\mu}=0\) by working on-shell \(\partial^{\mu}\partial_{\mu}\phi=-\lambda\phi^3\) and also invoking spacetime translational symmetry \(\partial_{\mu}T^{\mu}_{\nu}=0\)).

Problem: Continuing with the classical massless \(\phi^4\) field theory:

\[S=\frac{1}{c}\int d^{1+3}x\left(\frac{1}{2}\partial^{\mu}\phi\partial_{\mu}\phi-\frac{\lambda}{4}\phi^4\right)\]

Show that the alternative transformations:

  1. \(x’=x\), \(\phi'(x’)=\alpha\phi(\alpha x)\)
  2. \(x’=\cos(\alpha x\), \(\phi'(x’)=\phi(\alpha x)\)

also preserve the action \(S\), hence are symmetries of the theory. Compute their associated Noether currents.

Solution:

  1. This perspective is implicitly adopted in the Cambridge Theoretical Physics course official solution for this problem:

However, since \(\delta x=0\), this approach suggests an incorrect Noether current \(j^{\mu}=\pi^{\mu}\delta\phi=\phi\partial^{\mu}\phi+x^{\nu}\partial_{\nu}\phi\partial^{\mu}\phi\). Rather, as the solution above suggests, one has to compute the corresponding change in the infinitesimal action \(d^{1+3}x’\mathcal L’-d^{1+3}x\mathcal L=d^{1+3}x(\alpha-1)\partial_{\mu}(x^{\mu}\mathcal L)\), hence giving the required subtractive correction.

2. This transformation is just a trivial renaming of the integration dummy variable from \(x\) to \(x’=\alpha x\) (indeed, it could be renamed to any reasonable \(x’=f_{\alpha}(x)\) with corresponding \(\phi'(x’)=\phi(f_{\alpha}(x))\)):

\[\frac{1}{c}\int d^4x’\mathcal L(\phi(x’),\partial’\phi(x’))=\frac{1}{c}\int d^4x\mathcal L(\phi(x),\partial\phi(x))\]

In particular, \(\phi'(x)=\phi(x)\) and thus \(\delta\phi=0\). The transformation of course has nothing to do with the conformal dilatation of the massless \(\phi^4\) field. Contrary to the previous situation, the formula now predicts an incorrect Noether current \(j^{\mu}=x^{\mu}\mathcal L\). Applying the same correction as above yields the trivial Noether current \(0\).

The lesson here is that one should always write the transformed field \(\phi'(x’)\) in terms of \(\phi(x)\), and in particular not \(\phi(\alpha x)\) or any other \(\phi(f(x))\). Then the earlier Noether current result applies.

Problem: Explain how to calculate the orbital angular momentum pseudovector \(\mathbf L\) of a classical field.

Solution: Recall that spatial rotations \(SO(3)\) form a subgroup of the Lorentz group \(SO^+(1,3)\). Thus, when traditionally working with an infinitesimal Lorentz transformation \(\Lambda^{\mu}_{\nu}=\delta^{\mu}_{\nu}+\omega^{\mu}_{\nu}\), this procedure simultaneously yields conservation laws for both spatial rotations and Lorentz boosts. One has \(\Lambda\in SO^+(1,3)\) iff the generator \(\omega_{\mu\nu}=-\omega_{\nu\mu}\) is antisymmetric (hence denoted by the symbol “\(\omega\)”). Thus, \(x’^{\mu}=\Lambda^{\mu}_{\nu}x^{\nu}\approx x^{\mu}+\omega^{\mu}_{\nu}x^{\nu}\) and \(\Delta\phi=0\) because \(\phi'(x’)=\phi(x)\) is a scalar field. One has:

\[j^{\mu}=-T^{\mu}_{\nu}\omega^{\nu}_{\rho}x^{\rho}=-\omega_{\nu\rho}x^{\rho}T^{\mu\nu}:=\frac{\omega_{\nu\rho}cL^{\mu\nu\rho}}{2}\]

where the orbital angular momentum density tensor field is \(L^{\mu\nu\rho}:=c^{-1}(x^{\nu}T^{\mu\rho}-x^{\rho}T^{\mu\nu})\). There are \(6\) combinations of \(\nu,\rho\in\{0,1,2,3\}\) for which \(\omega_{\nu\rho}\neq 0\), giving \(6\) conserved currents \(\partial_{\mu}L^{\mu\nu\rho}=0\), with the conserved charge \(L^{\mu\nu}=\int d^3\mathbf x L^{0\mu\nu}(x)\). By antisymmetry \(L^{\nu\mu}=-L^{\mu\nu}\), the spatial parts can be identified with a \(3\)-vector in the usual way \(L^i:=\frac{\varepsilon_{ijk}}{2}L^{jk}\); this is the orbital angular momentum pseudovector \(\mathbf L\) of the field.

Problem: Consider a classical field theory involving a complex scalar field \(\psi(x)\). Write down the Lagrangian density \(\mathcal L(\psi)\) for \(\psi\) such that:

\[\mathcal L(\psi)=\mathcal L_{\text{KG}}(\sqrt{2}\text{Re}\psi)+\mathcal L_{\text{KG}}(\sqrt{2}\text{Im}\psi)\]

where the Klein-Gordon Lagrangian density for a real scalar field \(\phi(x)\) is \(\mathcal L_{\text{KG}}(\phi):=\frac{1}{2}\partial^{\mu}\phi\partial_{\mu}\phi-\frac{1}{2}k_c^2\phi^2\). Suppose \(\psi\) is coupled to an electromagnetic field \(A^{\mu}\); describe how \(\mathcal L(\psi)\) needs to be modified to account for this.

Solution: It’s just a “Hermitian” version of Klein-Gordon, without the factors of \(1/2\):

\[\mathcal L(\psi)=(\partial^{\mu}\psi)^{\dagger}\partial_{\mu}\psi-k_c^2|\psi|^2\]

Typically one just writes \((\partial^{\mu}\psi)^{\dagger}=\partial^{\mu}\psi^{\dagger}\) because \((\partial^{\mu})^{\dagger}=\) is Hermitian, but it will soon be important to remember that the dagger \(\dagger\) is meant to act on the entire object.

When \(\psi\) is coupled to an electromagnetic \(4\)-potential \(A^{\mu}\), there are \(2\) changes one must make to \(\mathcal L(\psi)\); the obvious one is to also tack on \(\mathcal L_{\text{EM}}=-F^{\mu\nu}F_{\mu\nu}/4\mu_0\). The more subtle one is the minimal coupling substitution in the kinetic term \(\partial\mapsto\mathcal D:=\partial-iqA/\hbar\) with \(\mathcal D\) the gauge covariant derivative.

Although inspired by the classical minimal coupling \(\mathbf p\mapsto\mathbf p-q\mathbf A\), the reason for this has to do with desiring \(\mathcal L\) to be invariant under a simultaneous local \(U(1)\) phase transformation of the complex scalar field \(\psi(x)\mapsto\psi'(x’)=e^{i\theta(x)}\psi(x)\) in conjunction with a gauge transformation of \(A(x)\mapsto A'(x’)=A(x)+\frac{\hbar}{q}\partial\theta(x)\) (in both cases, \(x’=x\)), then:

\[\mathcal D’\psi’=e^{i\theta}\mathcal D\psi\]

Thus, the kinetic term is invariant \((\mathcal D’\psi’)^{\dagger}\mathcal D’\psi’=e^{-i\theta}e^{i\theta}(\mathcal D\psi)^{\dagger}\mathcal D\psi=(\mathcal D\psi)^{\dagger}\mathcal D\psi\). Equivalently, one can say that if the gauge field is transformed as \(A\mapsto A+\partial\Gamma\), then the scalar field must transform with a local phase \(\psi\mapsto e^{iq\Gamma/\hbar}\psi\).

Problem: Consider the ungauged Lagrangian density:

\[\mathcal L=(\partial^{\mu}\psi)^{\dagger}\partial D_{\mu}\psi-V(\psi)\]

Consider using the Mexican hat potential \(V(\psi)=-k_c^2|\psi|^2+\frac{\lambda}{2}|\psi|^4\) with \(k_c^2,\lambda>0\). Explain both of these inequalities.

Write down the general ground state \(\psi_0\), and “quadraticize” (cf. “linearize”) \(\mathcal L\) about \(\psi_0\). Identify the massless Goldstone field. By promoting global to local \(U(1)\) phase symmetry to hold for the theory, the idea of coupling to a gauge field \(A(x)\) emerges naturally (scalar electrodynamics):

\[\mathcal L=(\mathcal D^{\mu}\psi)^{\dagger}\mathcal D_{\mu}\psi-V(\psi)+\frac{1}{4\mu_0}F^{\mu\nu}F_{\mu\nu}\]

Show that the gauge field \(A(x)\) acquires a mass by eating the Goldstone field via the abelian Higgs mechanism.

Solution: The need to make \(\lambda>0\) arises from insisting that \(V(\psi)\in\mathbf R\) is bounded from below (otherwise \(|\psi|\to\infty\) would be unphysical), so in particular at least \(1\) global minimum exists (called a ground field/”state”). In addition, the negative coefficient \(-k_c^2\) on the quadratic term is the opposite of the positive coefficient \(k_c^2\) that normally appears in Klein-Gordan type Lagrangian densities; this is to mimic ordering behavior in contexts like continuous phase transitions where \(T<T_c\)).

By insisting \(\partial V/\partial |\psi|^2=0\), one obtains \(\psi_0=\frac{k_c}{\sqrt{\lambda}}e^{i\phi}\) for arbitrary phase \(\phi\in\mathbf R\). By settling into a specific angle \(\phi\), the system spontaneously breaks the continuous rotational symmetry of \(V(\psi)\). Due to the global \(U(1)\) phase symmetry of the theory, one can simply rotate away the phase \(\phi\) and henceforth w.l.o.g. work with a real vacuum expectation value \(\psi_0=\frac{k_c}{\sqrt{\lambda}}\in\mathbf R\). However, this is just an expectation value; there will be fluctuations about it. One can consider both radial fluctuations \(h(x)\) and azimuthal fluctuations \(g(x)\), leading to the ansatz \(\psi(x)=\psi_0+h(x)+ig(x)\). Plugging into \(\mathcal L\) and discarding cubic or higher-order terms:

\[\mathcal L\approx \partial^{\mu}h\partial_{\mu}h+\partial^{\mu}g\partial_{\mu}g-2k_c^2h^2-V(\psi_0)\]

So the radial fluctuation field \(h(x)\) (called the Higgs field) is massive in the sense that after quantization, its particles will have mass \(m_h=\sqrt{2}m\) (where \(m=\hbar k_c/c\)) and dispersion relation \(\omega_k=c\sqrt{k^2+2k_c^2}\). By contrast, the azimuthal fluctuation field \(g(x)\) (called the Goldstone field) is massless \(m_g=0\) because of the absence of an associated \(g^2\) term in \(\mathcal L\). It is therefore like a photon \(\omega_k=ck\) (except it is spin-\(0\) whereas photons are spin-\(1\)). These notions are all pretty intuitive if one conceptualizes mass as a kind of “stiffness” of moving in certain directions in the potential energy landscape \(V(\psi)\); clearly it requires more energy to move radially rather than azimuthally in a Mexican hat \(V(\psi)\) when sitting at the trough \(\psi_0\).

The Higgs mechanism essentially amounts to FOILing the gauge-invariant kinetic term:

\[(\mathcal D^{\mu}\psi)^{\dagger}\mathcal D_{\mu}\psi=\]

and noticing that it gives an \(\sim A^2\) mass-like term.

Problem: Outline the isomorphism between classical particle mechanics and classical field theory (indeed, leveraging one’s prior experience with classical particle mechanics is the best way to bridge towards an understanding of classical field theory).

Solution: In classical particle mechanics, the dynamical degrees of freedom are the particle trajectories \(\mathbf x_1(t),…,\mathbf x_N(t)\). By contrast, in classical field theory, the dynamical degrees of freedom are the classical fields \(\phi_1(x),…,\phi_N(x)\). The analogy is thus:

\[t\Leftrightarrow x=(ct,\mathbf x)\]

\[\mathbf x\Leftrightarrow\phi\]

In particular, whereas \(\mathbf x\) is the dynamical degree of freedom in particle mechanics, it is demoted to a mere label in classical field theory for the fields \(\phi_i=\phi_i(\mathbf x,t)\).

Problem: The \(i=1,…,N\) Euler-Lagrange equations of motion for a classical field theory governed by a \(1^{\text{st}}\)-order Lagrangian density \(\mathcal L\) are typically written:

\[\partial_{\mu}\pi^{\mu}_i=\frac{\partial\mathcal L}{\partial\phi_i}\]

Does the presence of the Greek spacetime index \(\mu=0,1,2,3\) imply that this field theory is relativistic?

Defining the Lagrangian \(L:=\int d^3\mathbf x\mathcal L\), write down the analog of the Euler-Lagrange equations for \(L\) rather than \(\mathcal L\). Similarly, given the Hamiltonian density \(\mathcal H:=\pi^0_i\dot{\phi}_i-\mathcal L\) and the Hamiltonian \(H:=\int d^3\mathbf x\mathcal H\), write down Hamilton’s equations at the level of both \(\mathcal H\) and \(H\); is there an analog of the Legendre transform that relates \(L\) and \(H\)?

Solution: The presence of the Greek spacetime index \(\mu=0,1,2,3\) does not imply anything relativistic about the corresponding field theory, it’s merely a convenient way to write the spacetime divergence that emerges naturally from the chain rule. Thus, it’s unfortunately very misleading.

The Euler-Lagrange equations for \(L\) look very similar to their classical counterpart:

\[\frac{d}{dt}\left(\frac{\delta L}{\delta\dot{\phi}(\mathbf x)}\right)=\frac{\delta L}{\delta\phi_i(\mathbf x)}\]

Hamilton’s equations for \(H\) also look very similar to their classical counterpart:

\[\dot{\phi}_i=\frac{\delta H}{\delta\pi^0_i(x)}\]

\[\dot{\pi}^0_i=-\frac{\delta H}{\delta\phi_i(x)}\]

while Hamilton’s equations for \(\mathcal H\) are a bit more cumbersome:

\[\dot{\phi}_i=\frac{\partial\mathcal H}{\partial\pi^0_i}\]

\[\dot{\pi}^0_i=-\frac{\partial\mathcal H}{\partial\phi_i}+\frac{\partial}{\partial\mathbf x}\cdot\frac{\partial\mathcal H}{\partial (\partial\phi/\partial\mathbf x)}\]


Problem: A string of length \(L\), mass per unit length \(\mu\), under uniform tension \(T\) is fixed at each end. The Lagrangian \(\mathcal L\) governing the time evolution of the transverse displacement \(\psi(x,t)\) is:

\[\mathcal L=\int_{0}^L\left(\frac{\mu}{2}\dot{\psi}^2-\frac{T}{2}\psi’^2\right)dx\]

where \(x\in[0,L]\) identifies position along the string from one endpoint. By expressing the transverse displacement as a Fourier sine series:

\[\psi(x,t)=\sqrt{\frac{2}{L}}\sum_{n=1}^{\infty}q_n(t)\sin\frac{n\pi x}{L}\]

Show that the Lagrangian \(\mathcal L\) becomes:

\[\mathcal L=\sum_{n=1}^{\infty}\left(\frac{\mu}{2}\dot{q}^2_n-\frac{T}{2}\left(\frac{n\pi}{L}\right)^2q_n^2\right)\]

Derive the equations of motion. Hence, show that the string is equivalent to an infinite set of decoupled harmonic oscillators with frequencies:

\[\omega_n=\sqrt{\frac{T}{\mu}}\frac{n\pi}{L}\]

Solution:

Problem: Show that the solution space of the Klein-Gordon equation is closed under Lorentz transformations.

Solution:

Problem: The motion of a complex scalar field \(\psi(X)\) is governed by the Lagrangian density:

\[\mathcal L=\partial_{\mu}\psi^*\partial^{\mu}\psi-m^2\psi^*\psi-\frac{\lambda}{2}(\psi^*\psi)^2\]

Write down the Euler-Lagrange field equations for this system. Verify that the Lagrangian density \(\mathcal L\) is invariant under the infinitesimal transformation:

\[\delta\psi=i\alpha\psi\]

\[\delta\psi^*=-i\alpha\psi^*\]

Derive the Noether current \(j^{\mu}\) associated with this transformation and verify explicitly that it is conserved using the field equations satisfied by \(\psi\).

Solution:

Problem: Verify that the Lagrangian density:

\[\mathcal L=\frac{1}{2}\partial_{\mu}\phi_{\alpha}\partial^{\mu}\phi_{\alpha}-\frac{1}{2}m^2\phi_{\alpha}\phi_{\alpha}\]

for a triplet of real fields \(\phi_{\alpha}(X)\), \(\alpha=1,2,3\), is invariant under the infinitesimal \(SO(3)\) rotation by \(\theta\):

\[\phi_{\alpha}\mapsto\phi_{\alpha}+\theta\varepsilon_{\alpha\beta\gamma}n_{\beta}\phi_{\gamma}\]

where \(n_{\beta}\) is a unit vector. Compute the Noether current \(j^{\mu}\) associated to this transformation. Deduce that the three quantities:

\[Q_{\alpha}=\int d^3\textbf x\varepsilon_{\alpha\beta\gamma}\dot{\phi}_{\beta}\phi_{\gamma}\]

are all conserved.

Solution:

Problem: By requiring that Lorentz transformations \(\Lambda^{\mu}_{\space\space\nu}\) should preserve the Minkowski norm of \(4\)-vectors \(\eta_{\mu\nu}X’^{\mu}X’^{\nu}=\eta_{\mu\nu}X^{\mu}X^{\nu}\), show that this implies:

\[\eta_{\mu\nu}=\eta_{\sigma\tau}\Lambda^{\sigma}_{\space\space\mu}\Lambda^{\tau}_{\space\space\nu}\]

Show that an infinitesimal transformation of the form \(\Lambda^{\mu}_{\space\space\nu}=\delta^{\mu}_{\space\space\nu}+\omega^{\mu}_{\space\space\nu}\) is specifically a Lorentz transformation iff \(\omega_{\mu\nu}=-\omega_{\nu\mu}\) is antisymmetric.

Write down the matrix for \(\omega^{\mu}_{\space\space\nu}\) corresponding to an infinitesimal rotation by angle \(\theta\) around the \(x^3\)-axis. Do the same for an infinitesimal Lorentz boost along the \(x^1\)-axis by velocity \(v\).

Solution:

Problem: Consider a general infinitesimal Lorentz transformation \(X^{\mu}\mapsto X’^{\mu}=X^{\mu}+\omega^{\mu}_{\space\space\nu}X^{\nu}\) acting at the level of the \(4\)-vector \(X\). How does this perturbation manifest at the level of a scalar field \(\phi=\phi(X)\)? What about at the level of the Lagrangian density \(\mathcal L=\mathcal L(\phi)\)? What about at the level of the action \(S=S(\mathcal L)\)? In particular, show that the perturbation \(\delta\mathcal L\) at the level of \(\mathcal L\) is a total spacetime derivative, and hence describe the Noetherian implication of this.

Solution:

Problem: Maxwell’s Lagrangian for the electromagnetic field is:

\[\mathcal L=-\frac{1}{4}F_{\mu\nu}F^{\mu\nu}\]

where \(F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}\) and \(A_{\mu}\) is the \(4\)-vector potential. Show that \(\mathcal L\) is invariant under gauge transformations \(A_{\mu}\mapsto A_{\mu}+\partial_{\mu}\Gamma\) where \(\Gamma=\Gamma(X)\) is a scalar field with arbitrary (differentiable) dependence on \(X\). Use Noether’s theorem, and the spacetime translational invariance of the action \(S\) to construct the energy-momentum tensor \(T^{\mu\nu}\) for the electromagnetic field. Show that the resulting object is neither symmetric nor gauge invariant. Consider a new tensor given by:

\[\Theta^{\mu\nu}=T^{\mu\nu}-F^{\rho\mu}\partial_{\rho}A^{\nu}\]

Show that this object also defines \(4\) conserved currents. Moreover, show that it is symmetric, gauge invariant and traceless.

Solution:

Problem: In Problem #\(3\), a classical field theory involving two complex scalar fields \(\psi(X),\psi^*(X)\) with Lagrangian density:

\[\mathcal L(\psi,\psi^*,\partial_{\mu}\psi,\partial_{\mu}\psi^*)=\partial_{\mu}\psi^*\partial^{\mu}\psi-V(\psi^*\psi)\]

was analyzed (there the potential \(V\) was taken to be analytic in \(\psi^*\psi\) and so Taylor expanded to quadratic order). In particular, the Noether current \(j^{\mu}\) was obtained explicitly for the continuous global, internal \(U(1)\) \(\mathcal L\)-symmetry \(\psi\mapsto e^{i\alpha}\psi\) for a constant \(\alpha\in\textbf R\). By promoting \(\alpha=\alpha(X)\) but still acting infinitesimally across \(X\) in spacetime, recompute the Noether current \(j^{\mu}\), and notice in particular that the Noether current \(j^{\mu}\) doesn’t care about the potential terms \(V(\psi^*\psi)\), only the kinetic terms.

Solution:

Posted in Blog | Leave a comment