Laser Cooling & Trapping of Atoms

The purpose of this post is to explain several techniques for laser cooling and laser trapping of atoms. In order to better emphasize key conceptual points, it will take the approach of posing problems, followed immediately by their solutions.

Problem #\(1\): Recall that any locally conserved field is associated to a continuity equation \(\dot{\rho}+\partial_{\textbf x}\cdot\textbf J=0\). In particular, the local flow of the corresponding conserved quantity can always be interpreted in terms of a velocity field \(\textbf v\) (with SI units of meters/second) by enforcing that \(\textbf J=\rho\textbf v\). Interpret this in the context of fluid mechanics and electromagnetism.

Solution #\(1\): In fluid mechanics, mass is locally conserved, so \(\rho\) could represent mass density while \(\textbf J\) represents mass flux. Momentum is also locally conserved provided there are no external body forces acting on the fluid, so in that case \(\rho\) would be a vector field representing momentum density while \(\textbf J\) would be the stress tensor field representing momentum transport (this is just the Navier-Stokes equations). Similar comments can be made about energy, angular momentum, etc.

In electromagnetism, electric charge is locally conserved, so in that case \(\rho\) represents charge density while \(\textbf J\) represents an electric current. Similarly, the total energy stored in both electromagnetic fields and electric charges is locally conserved. In that case, the continuity equation in linear dielectrics takes the form:

\[\frac{\partial\mathcal E}{\partial t}+\textbf J_f\cdot\textbf E+\frac{\partial}{\partial\textbf x}\cdot\textbf S=0\]

where the electromagnetic field energy density is \(\mathcal E:=(\textbf D\cdot\textbf E+\textbf H\cdot\textbf B)/2\) and the Poynting vector \(\textbf S:=\textbf E\times\textbf H\). If an electromagnetic wave propagates with group velocity \(|\textbf v|\leq c\) in some linear dielectric, it follows that \(\textbf S=\mathcal E\textbf v\) (similar comments apply to momentum density; indeed they are unified in the Maxwell stress tensor).

Problem #\(2\): Within the framework of classical electromagnetism, what is the formula for the radiation pressure \(p_{\gamma}\) exerted on a partially absorbing surface with unit normal \(\hat{\textbf n}\) and reflection coefficient \(R\)?

Solution #\(2\): Just as \(\textbf S=\mathcal E\textbf v\) (from Solution #\(1\)), so the radiation pressure is a certain projection of an isotropic current for the momentum density \(\boldsymbol{\mathcal P}\):

\[p_{\gamma}=(1+R)(\boldsymbol{\mathcal P}\cdot\hat{\textbf n})(\textbf v\cdot\hat{\textbf n})=(1+R)|\boldsymbol{\mathcal P}||\textbf v|\cos^2\theta\]

The photon dispersion relation \(\mathcal E=|\boldsymbol{\mathcal P}||\textbf v|\) in a linear dielectric then implies that the radiation pressure \(p_{\gamma}=(1+R)\mathcal E\cos^2\theta\) is given by a Malusian proportionality with the energy density stored in the EM field (aside: how is this related to the equation of state \(p=\mathcal E/3\) for a photon gas?).

Problem #\(3\): An atom moving at (nonrelativistic) speed \(v\) (in the lab frame) meets a counter-propagating laser beam of incident intensity \(I\) and frequency \(\omega\) (both in the lab frame, where of course it is assumed that the laser source is also at rest in the lab frame). Estimate the velocity-dependent scattering force \(F_{\gamma}(v)\) exerted on the atom and its maximum possible value \(F_{\gamma}^*> F_{\gamma}(v)\).

Solution #\(3\): Heuristically, the scattering force might also be called the radiation force as it’s roughly just the radiation pressure \(p_{\gamma}\) exerted on the cross-section \(\sigma(\omega_D)\) of the atom (evaluated at the Doppler-shifted frequency \(\omega_D:=\omega\sqrt{(1+\beta)/(1-\beta)}\approx \omega+kv\)):

\[F_{\gamma}(v)\approx p_{\gamma}\sigma(\omega_D)=\frac{I}{c}\frac{\sigma_{01}}{1+(2\delta_D/\Gamma)^2}\]

where in the last expression it is assumed that the Doppler detuning \(\delta_D=\omega_D-\omega_{01}\) is small so in particular both \(I\) and \(\sigma_{01}\) are taken to be independent of \(\omega_D\). This is from the perspective of stimulated absorption. On the other hand, because all the analysis is implicitly in the steady state, one can also view this from the perspective of spontaneous emission:

\[F_{\gamma}(v)=(\hbar k)(\Gamma\rho_{11})=\hbar k\frac{\Gamma}{2}\frac{s}{1+s+(2\delta_D/\Gamma)^2}\]

Note that these two formulas for \(F_{\gamma}(v)\) are not exactly equal, but approximately so at low irradiance \(s\ll 1\) (how to rationalize why?). Meanwhile, on resonance \(\delta_D=0\), in the high-irradiance \(s\to\infty\) limit, the theoretical max scattering force achievable is \(F_{\gamma}^*=\hbar k\Gamma/2\).

Problem #\(4\): Atoms flying out of an oven at temperature \(T\) are collimated to form an atomic beam which meets a counterpropagating laser \(s,\omega\). Without any frequency compensation on the laser \(\omega\), show how to determine the terminal velocity \(v_{\infty}\) of the atoms.

Solution #\(4\): The idea is that when one is far detuned from \(\omega_D\), there is hardly any scattering force and so the atoms move at roughly constant velocity. It is only in a small window \(\omega_D\in[\omega_{01}-\Gamma/2,\omega_{01}+\Gamma/2]\) that there is any significant scattering force. In this near-resonance regime, one can therefore set up an equation of motion \(M\dot v=-F_{\gamma}(v)\) whose solution is:

\[(1+s)(v_{\infty}-v_0)+\frac{4}{3\Gamma^2 k}(\delta_{D,\infty}^3-\delta_{D,0}^3)=-\frac{\hbar k\Gamma s}{2M}\Delta t\]

where the initial speed can be taken as the most probable one \(v_0=\sqrt{2k_BT/M}\), and the time \(\Delta t\) can be estimated by considering the situation where \(\omega+kv_0=\omega_{01}+\Gamma/2\) and \(\omega+kv_{\infty}=\omega_{01}-\Gamma/2\) (actually not sure about this).
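As a minimal numerical sketch of this (all numbers are illustrative assumptions: Na-like mass, wavelength, and linewidth, a guessed saturation parameter and laser detuning, and an assumed interaction time), one can simply integrate \(M\dot v=-F_{\gamma}(v)\) and watch the deceleration become negligible once the atom falls well out of the resonance window:

```python
import numpy as np
# A minimal numerical sketch of Solution #4: integrate M*dv/dt = -F_gamma(v) for a fixed-frequency,
# counter-propagating beam. All numbers (Na-like mass/wavelength/linewidth, saturation parameter,
# laser detuning, oven temperature, interaction time) are illustrative assumptions.
hbar, kB, amu = 1.054571817e-34, 1.380649e-23, 1.66054e-27
M, lam, Gamma, s = 23*amu, 589e-9, 2*np.pi*9.79e6, 2.0
k = 2*np.pi/lam
v = np.sqrt(2*kB*600.0/M)                 # start at the most probable speed out of a 600 K oven
detuning0 = -k*650.0                       # omega - omega_01 chosen so resonance sits at v = 650 m/s

def F_gamma(v):                            # scattering force from Solution #3
    deltaD = detuning0 + k*v
    return hbar*k*(Gamma/2)*s/(1 + s + (2*deltaD/Gamma)**2)

dt = 1e-7
for step in range(200_000):                # 20 ms of interaction time
    v -= F_gamma(v)/M*dt
    if step % 50_000 == 0:
        print(f"t = {step*dt*1e3:5.1f} ms, v = {v:6.1f} m/s")
# The deceleration is only appreciable while |2*deltaD/Gamma| is of order unity; once the atom
# is well below resonance the scattering force (and hence any further slowing) becomes negligible.
```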

Problem #\(5\): If the atomic beam emerging from the oven along with the counterpropagating laser are oriented along some “\(z\)-axis”, write down the magnetic field \(\mathbf B(z)=B(z)\hat{\textbf k}\) required to make a Zeeman slower.

Solution #\(5\): Assuming one is working at relatively low magnetic field strengths, the low-field basis which diagonalizes the perturbative hyperfine Hamiltonian \(\Delta H_{\text{HFS}}\propto\textbf I\cdot\mathbf J\) is \(|f,m_f\rangle\). In this basis, the expectation of the perturbative Zeeman Hamiltonian \(\Delta H_{\text{Zeeman}}\) is approximately:

\[\langle f,m_f|\Delta H_{\text{Zeeman}}|f,m_f\rangle=g_f\mu_BB(z)m_f\]

Thus, if one is exploiting an electric dipole transition from a “ground state” \(|fm_f\rangle\) to an “excited state” \(|f’m’_f\rangle\), then one requires the difference in the Zeeman shifts experienced by these \(2\) levels to precisely compensate for the Doppler detuning \(\delta_D=\omega+kv-\omega_{01}\):

\[(g_{f’}m’_f-g_fm_f)\frac{\mu_BB(z)}{\hbar}=\delta_D\]

In practice, one would like to achieve a constant deceleration \(a>0\) so that \(v^2=v_0^2-2az\). This requires a magnetic field from e.g. a tapered solenoid of the form:

\[B(z)=B_0\sqrt{1-\frac{z}{z_0}}+B_{\text{bias}}\]

where \(z_0:=v_0^2/2a\) is the stopping distance, \(B_0:=\frac{\hbar kv_0}{\mu_B(g_{f’}m’_f-g_fm_f)}\) can be interpreted as \(B_0=B(0)-B(z_0)\), and the bias field is \(B_{\text{bias}}=\frac{\hbar(\omega-\omega_{01})}{\mu_B(g_{f’}m’_f-g_fm_f)}\). At least, this is the profile if one wishes to completely stop the atoms at \(z=z_0\); if not, then one may need to experimentally lower \(B_{\text{bias}}\) until one obtains a desired terminal velocity, in which case one should then discontinuously turn off the magnetic field \(B(z):=0\) for \(z>z_0\) so that the E\(1\) transition is again substantially detuned from resonance and experiences a negligible scattering force \(F_{\gamma}\) after the atoms exit the Zeeman slower.

Problem #\(6\): For the specific case of say sodium atoms \(\text{Na}\) (analogous remarks apply to any alkali atom), explain why electric dipole transitions between the stretched states \(|fm_f\rangle=|22\rangle\) in \(3s_{1/2}\) and \(|f’m’_f\rangle=|33\rangle\) in \(3p_{3/2}\) are a good choice for laser cooling via the scattering force \(F_{\gamma}\). In this case, what should be the corresponding value of \(B_0\) used in a Zeeman slower (the transition has \(\lambda_{01}=589\text{ nm}\))? What should be the polarization of the incident laser light?

Solution #\(6\): This E\(1\) transition is not only allowed but also forms a closed cycle thanks to the usual E\(1\) selection rules, ensuring no optical pumping into dark states that would prevent a given atom from experiencing any further scattering. One can check that \(B_0\approx 0.12\text{ T}=1200\text{ G}\) is reasonable to achieve in the lab. The incident photons should be \(\sigma^+\) circularly-polarized along the quantization axis \(\hat{\textbf k}\) defined by the \(\textbf B\)-field along which the atomic beam is also propagating.
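To check that \(B_0\) really is lab-achievable, here is a quick evaluation using the formulas from Solution #\(5\); the oven temperature and the chosen fraction of the maximum deceleration are assumptions made only for illustration:

```python
import numpy as np
# A minimal numerical sketch for the Na numbers quoted above; the oven temperature and the
# choice of deceleration (a fraction of the maximum) are assumptions made for illustration.
hbar, kB, muB, amu = 1.054571817e-34, 1.380649e-23, 9.274e-24, 1.66054e-27
M, lam, Gamma = 23*amu, 589e-9, 2*np.pi*9.79e6
k = 2*np.pi/lam
T_oven = 600.0                                   # assumed oven temperature (K)
v0 = np.sqrt(2*kB*T_oven/M)                      # most probable speed out of the oven
# stretched states |2,2> -> |3,3>:  g_f'*m_f' - g_f*m_f = (2/3)*3 - (1/2)*2 = 1
B0 = hbar*k*v0/muB
a = 0.5*hbar*k*Gamma/(2*M)                       # run at half the maximum deceleration for headroom
z0 = v0**2/(2*a)
print(v0, B0, z0)   # ~660 m/s, ~0.08 T, ~0.5 m; hotter ovens push B0 towards the ~0.12 T quoted above
```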

Problem #\(7\): Show, stating any assumptions made, that the velocity-dependent optical molasses force \(F_m(v)\approx -\alpha v\) along an arbitrary axis looks like linear damping with some damping coefficient \(\alpha>0\) for an appropriately-chosen laser detuning \(\delta\). Hence, show that the atom’s kinetic energy \(T(t)=T(0)e^{-t/\tau}\) (supposedly) decays exponentially with time constant \(\tau=m/2\alpha\).

Solution #\(7\): Along any of the \(3\) Cartesian axes, the component of the optical molasses force \(F_m(v)\) along that axis is a superposition of two opposite (but not necessarily equal) scattering forces:

\[F_m(v)=-F_{\gamma}(v)+F_{\gamma}(-v)\approx -2v\frac{\partial F_{\gamma}}{\partial v}(v=0)\]

where the low-velocity assumption \(kv/\Gamma\ll 1\) has been made. This shows that the damping coefficient is:

\[\alpha=2\frac{\partial F_{\gamma}}{\partial v}(v=0)=-4\hbar k^2s\frac{2\delta/\Gamma}{(1+s+(2\delta/\Gamma)^2)^2}\]

So in order for \(\alpha>0\) to actually be damping, one requires \(\delta<0\) to be red-detuned as suggested in the picture. It turns out that, in order to treat the two laser beams independently (as has been implicitly done above), both beams should have the same low saturation parameter \(s\ll 1\) so that neither one saturates the E\(1\) transition too much. In this simple model of optical molasses, one should therefore really take:

\[\alpha=2\frac{\partial F_{\gamma}}{\partial v}(v=0)=-4\hbar k^2s\frac{2\delta/\Gamma}{(1+(2\delta/\Gamma)^2)^2}\]

In some sense, the fact that it looks like linear damping near \(v=0\) is not really conceptually any different from the fact that e.g. systems look like simple harmonic oscillators near stable equilibria; it’s just taking the first-order term in a Taylor series in \(v\).

Kinetic energy is lost through the dissipative power of the molasses force:

\[\dot T=F_m(v)v=-\alpha v^2=-\frac{2\alpha}{m}T\Rightarrow T(t)=T(0)e^{-t/\tau}\]

where \(\tau=m/2\alpha\) is usually on the order of microseconds. This suggests that there is no cooling limit, i.e. that one can quickly cool down arbitrarily close to \(T\to 0\), but this turns out to be impossible, see Problem #\(9\).
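To put rough numbers on \(\alpha\) and \(\tau\), here is a minimal sketch using the low-saturation formula above; the (Na-like) mass, wavelength, linewidth, saturation parameter, and detuning are all illustrative assumptions:

```python
import numpy as np
# Minimal sketch: evaluate the molasses damping coefficient alpha and the time constant
# tau = m/(2*alpha) from the low-saturation formula above. The (Na-like) mass, wavelength,
# linewidth, saturation parameter, and detuning are illustrative assumptions.
hbar, amu = 1.054571817e-34, 1.66054e-27
m, lam, Gamma = 23*amu, 589e-9, 2*np.pi*9.79e6
k = 2*np.pi/lam
s, delta = 0.1, -Gamma/2                                             # red detuning, low saturation
alpha = -4*hbar*k**2*s*(2*delta/Gamma)/(1 + (2*delta/Gamma)**2)**2
tau = m/(2*alpha)
print(alpha, tau)                                                    # tau ~ 16 microseconds here
```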

Problem #\(8\): For a random walk \(\textbf X_1,\textbf X_2,…,\textbf X_N\) of \(N\) steps, each of identical length \(L=|\textbf X_1|=|\textbf X_2|=…=|\textbf X_N|\), what is the expectation of the total displacement squared \(\langle|\textbf X_1+\textbf X_2+…+\textbf X_N|^2\rangle\)?

Solution #\(8\): All the cross-term dot products vanish in expectation (the steps are independent and isotropic), so:

\[\langle|\textbf X_1+\textbf X_2+…+\textbf X_N|^2\rangle=NL^2\]
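A quick Monte-Carlo check of this (the step count, step length, and number of trials are arbitrary choices):

```python
import numpy as np
# Quick Monte-Carlo check of <|X_1 + ... + X_N|^2> = N*L^2 for steps of fixed length L in
# random (isotropic, 3D) directions; N, L, and the number of trials are arbitrary choices.
rng = np.random.default_rng(0)
N, L, trials = 100, 1.0, 20000
steps = rng.normal(size=(trials, N, 3))
steps *= L/np.linalg.norm(steps, axis=-1, keepdims=True)   # normalize every step to length L
msd = np.mean(np.sum(np.sum(steps, axis=1)**2, axis=-1))   # <|total displacement|^2>
print(msd, N*L**2)                                          # agree up to statistical error
```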

Problem #\(9\): Conceptually, what is the issue with the approach taken in Solution #\(7\)? Upon amending it, show that there is actually a Doppler cooling limit, namely a minimum temperature \(T_D\) that can be achieved using the optical molasses technique, given by:

\[k_BT_D=\frac{\hbar\Gamma}{2}\]

Solution #\(9\): The issue with Solution #\(7\) is that, while it accounts for the momentum kick given to the atom during stimulated absorption of photons and also recognizes that spontaneous emission averages to no momentum kick, it neglects fluctuations in both the number of photons absorbed and in the number of photons scattered. Using the result of Problem #\(8\) (except here conceptually it’s not a random walk in real space but rather in momentum space, with \(\langle p_z^2\rangle(t)=2(\hbar k)^2(2\gamma)t\)), the kinetic energy \(T_z\) of an atom in the optical molasses now evolves as (see the Foot textbook for the detailed arguments; some sketchy assumptions are involved too, and it turns out to be ultimately wrong, see Problem #\(10\)):

\[\dot T_z=-\frac{2\alpha}{m} T_z+\left(1+3\times\frac{1}{3}\right)\frac{\hbar^2k^2}{2m}(2\gamma)\]

So in the steady state \(\dot T_z=0\) and using equipartition \(T_z=k_BT/2\), one finds that:

\[k_BT=-\frac{\hbar\Gamma}{4}\frac{1+(2\delta/\Gamma)^2}{2\delta/\Gamma}\]

is minimized for red-detuned \(\delta<0\) at \(2\delta/\Gamma=-1\), yielding the Doppler cooling limit \(k_BT_D=\hbar\Gamma/2\) as claimed.
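A quick numerical check that the minimum indeed sits at \(2\delta/\Gamma=-1\), together with the resulting \(T_D\) for an assumed, Na-like linewidth:

```python
import numpy as np
# Minimize k_B*T(delta) = -(hbar*Gamma/4)*(1+(2*delta/Gamma)^2)/(2*delta/Gamma) over red detunings.
# The linewidth below is an assumed, Na-D2-like value; any Gamma gives the same optimum 2*delta/Gamma = -1.
hbar, kB = 1.054571817e-34, 1.380649e-23
Gamma = 2*np.pi*9.79e6
x = np.linspace(-10, -0.01, 1_000_000)             # x := 2*delta/Gamma < 0
kBT = -(hbar*Gamma/4)*(1 + x**2)/x
i = np.argmin(kBT)
print(x[i], kBT[i]/kB*1e6, "microkelvin")           # x ≈ -1 and T_D = hbar*Gamma/(2*kB) ≈ 235 µK
```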

Problem #\(10\): Explain why, even after accounting for Poissonian fluctuations in the stimulated absorption and spontaneous emission of photons, the Doppler cooling limit for the optical molasses technique above rests on a “spherical cows in vacuum” foundation.

Solution #\(10\): Simply put, the assumption of a \(2\)-level atom has been implicitly lurking in the discussion the whole time, but it turns out that if one leverages the existence of other sublevels, one can do even better than the Doppler cooling limit above would suggest. Note also that the optical molasses laser cooling technique only provides confinement in \(\textbf k\)-space, but not so much in \(\textbf x\)-space which is also relevant; in other words, laser trapping is to \(\textbf x\) what laser cooling is to \(\textbf k\).

Problem #\(11\): In light of Problem #\(10\), draw \(3\) pictures that summarize how a magneto-optical trap (MOT) works. In particular, it should be clear that the MOT could not work with just a \(2\)-level atom. Also write down explicitly a formula for the MOT force \(F_{\text{MOT}}(z,v)\).

Solution #\(11\):

The MOT force \(F_{\text{MOT}}(z,v)\) is conceptually identical to the optical molasses force \(F_m(v)\) only now the detuning includes not only the \(v\)-dependent Doppler shift but also the \(z\)-dependent Zeeman shift (more precisely, the formula \(F_{\text{MOT}}(z,v)\) that will be given below is the component of the net MOT force \(\textbf F_{\text{MOT}}(\textbf x,\textbf v)\) along the quantization \(z\)-axis of the quadrupolar anti-Helmholtz MOT \(\textbf B\)-field near the origin \(\textbf x\approx\textbf 0\) whose purpose is to generate a nearly uniform magnetic field gradient; a similar equation holds in the \(\rho\)-direction except one should note from \(\frac{\partial}{\partial\textbf x}\cdot\textbf B=0\) that \(\frac{\partial B_{\rho}}{\partial\rho}=-\frac{1}{2}\frac{\partial B_z}{\partial z}\)).

\[F_{\text{MOT}}(z,v)=F_{\gamma}\left(\delta=\omega-kv-\left(\omega_{01}+\frac{g’_f\mu_B}{\hbar}\frac{\partial B_z}{\partial z}z\right)\right)-F_{\gamma}\left(\delta=\omega+kv-\left(\omega_{01}-\frac{g’_f\mu_B}{\hbar}\frac{\partial B_z}{\partial z}z\right)\right)\]

Making the same low Doppler shift assumption \(kv\ll\Gamma\) but now also a low Zeeman shift assumption \(\frac{g’_f\mu_B}{\hbar}\frac{\partial B_z}{\partial z}z\ll\Gamma\), one can check that this reduces to the previous linear damping force of the optical molasses together with a Hookean spring force:

\[F_{\text{MOT}}(z,v)=-\alpha v-\beta z\]

where the spring constant is \(\beta=\frac{\alpha g’_f\mu_B}{\hbar k}\frac{\partial B_z}{\partial z}\) and the damping coefficient \(\alpha\) is as before in Problem #\(7\). Usually, the atom’s motion will be overdamped, \(\alpha^2>4\beta m\), as is desirable.
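As a minimal sketch (the saturation parameter, detuning, excited-state \(g\)-factor, and field gradient below are all illustrative assumptions), one can integrate \(m\ddot z=-\alpha\dot z-\beta z\) and confirm the overdamped return to the trap centre:

```python
import numpy as np
# Minimal sketch: integrate m*z'' = -alpha*z' - beta*z for illustrative MOT numbers, just to
# see the (typically overdamped) return of an atom to the trap centre. The saturation parameter,
# detuning, excited-state g-factor, and field gradient below are all assumptions.
hbar, amu, muB = 1.054571817e-34, 1.66054e-27, 9.274e-24
m, lam, Gamma = 23*amu, 589e-9, 2*np.pi*9.79e6
k = 2*np.pi/lam
s, delta = 0.1, -Gamma/2
alpha = -4*hbar*k**2*s*(2*delta/Gamma)/(1 + (2*delta/Gamma)**2)**2   # damping, as in Problem #7
dBdz, gf_exc = 0.1, 2/3                       # 10 G/cm gradient (in T/m) and g_f' (assumptions)
beta = alpha*gf_exc*muB/(hbar*k)*dBdz         # spring constant from Solution #11
print("overdamped:", alpha**2 > 4*beta*m)
z, v, dt = 1e-3, 0.0, 1e-7                    # atom released 1 mm from the centre, at rest
for _ in range(200_000):                      # 20 ms of motion
    a = (-alpha*v - beta*z)/m
    v += a*dt
    z += v*dt
print(z)                                      # z has relaxed towards 0 without oscillating
```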

Problem #\(12\): State one advantage and one disadvantage of a MOT compared with just the optical molasses technique.

Solution #\(12\): The advantage of a MOT is that its capture velocity \(v_{c,\text{MOT}}\gg v_{c,m}\) is much greater than that of optical molasses so it is a better trapper, hence its name (indeed, one can load a MOT directly from a room-temperature vapor, or to get more atoms one can first Zeeman slow them). The disadvantage is that it is not as good of a laser cooler; optical molasses is able to reach much colder temperatures, surpassing the naive Doppler cooling limit \(k_BT_D=\hbar\Gamma/2\) mentioned earlier (this is because of various sub-Doppler cooling mechanisms, notably Sisyphus cooling, that are elaborated upon in the next problem; all that matters for now is that such sub-Doppler cooling mechanisms are effectively disabled when the Zeeman shift exceeds the light shift as in a MOT).

Problem #\(13\): In a MOT, where does the trapping come from?

Solution #\(13\): It does not come from the quadrupolar \(\textbf B\)-field exerting magnetic forces on the atoms (those forces are negligible here); rather, it comes from the imbalance between the scattering forces of the two counter-propagating, oppositely circularly polarized beams, an imbalance which the position-dependent Zeeman shift always biases towards the beam that pushes the atom back towards the trap centre.

Problem #\(14\): Draw a picture to explain how Sisyphus cooling allows one to cool below the Doppler limit \(T_D\). What is the new cooling limit?

Problem #\(15\): Show that the magnetic field strength \(|\textbf B(\textbf x)|\) in free space can never have a strict local maximum provided \(\dot{\textbf E}=\textbf 0\) (this is one version of Earnshaw’s theorem).

Solution #\(15\): One has the usual identity:

\[\frac{\partial}{\partial\textbf x}\times\left(\frac{\partial}{\partial\textbf x}\times\textbf B\right)=\frac{\partial}{\partial\textbf x}\left(\frac{\partial}{\partial\textbf x}\cdot\textbf B\right)-\left|\frac{\partial}{\partial\textbf x}\right|^2\textbf B\]

But thanks to the assumptions of the problem the term on the left vanishes (in free space with \(\dot{\textbf E}=\textbf 0\), Ampère’s law gives \(\frac{\partial}{\partial\textbf x}\times\textbf B=\textbf 0\)); combining this with the absence of magnetic monopoles \(\frac{\partial}{\partial\textbf x}\cdot\textbf B=0\), one concludes that \(\left|\frac{\partial}{\partial\textbf x}\right|^2\textbf B=\textbf 0\).

Now, \(|\textbf B(\textbf x)|\) satisfies the claim if and only if \(|\textbf B(\textbf x)|^2\) satisfies it. In turn, one just has to check that the Hessian \(\frac{\partial^2|\textbf B|^2}{\partial\textbf x^2}\) does not have \(3\) negative eigenvalues anywhere. A sufficient (but not necessary) condition for this would be if \(\text{Tr}\frac{\partial^2|\textbf B|^2}{\partial\textbf x^2}\geq 0\) everywhere. But the trace of the Hessian is just the Laplacian of the original scalar field:

\[\text{Tr}\frac{\partial^2|\textbf B|^2}{\partial\textbf x^2}=\left|\frac{\partial}{\partial\textbf x}\right|^2|\textbf B|^2=2\left|\frac{\partial\textbf B}{\partial\textbf x}\right|^2+2\textbf B\cdot\left|\frac{\partial}{\partial\textbf x}\right|^2\textbf B=2\left|\frac{\partial\textbf B}{\partial\textbf x}\right|^2\geq 0\]

which completes the proof.

Problem #\(16\): What is the high-level mechanism of magnetic trapping? What are the \(3\) most common kinds of magnetic traps? (just draw a picture of each one with some annotations, no need for long explanations)

Solution #\(16\): Just as in the Stern-Gerlach experiment, magnetic dipoles in non-uniform magnetic fields experience a force that pushes them towards weaker or stronger magnetic field depending on their orientation relative to the magnetic field. The difference is that in the Stern-Gerlach experiment the atoms are flying in at hundreds of meters/second and getting deflected, way beyond capture velocity, whereas in magnetic trapping they have already been laser cooled. The \(3\) most common magnetic traps are the quadrupole trap (made from a pair of anti-Helmholtz coils as in a MOT), a time-averaged orbiting potential (TOP) trap, and an Ioffe-Pritchard trap, the last of which is the most common and simply consists of \(4\) long cylindrical bars (which form a “linear quadrupole” in direct analogy with an electrostatic quadrupole) and \(2\) Helmholtz-like coils. Key words: radial/\(\rho\)-trapping and axial/\(z\)-trapping. Due to the result of Problem #\(15\), it follows that one can only trap atoms in low-field seeking states (such atoms are called low-field seekers).

Good overview here: https://arxiv.org/pdf/1310.6054


Amplitude-Splitting Interferometry

The purpose of this post is to review the theory behind several standard amplitude-splitting interferometers.

Michelson Interferometer

The simplest possible version of a Michelson interferometer is the following:

For now, consider for simplicity a monochromatic light source of frequency \(\omega=ck=nvk=n’v’k=n^{\prime\prime}v^{\prime\prime}k\). In “chronological order”, the phases accrued by \(2\) amplitude-split reflected and transmitted rays in their respective journeys from the light source to the photodiode are explicitly:

\[\phi_1=nkx_0+\frac{n’ka}{2}+\pi+\frac{n’ka}{2}+nkx_2+\pi+nkx_2+\frac{n’ka}{2}+\frac{n^{\prime\prime}kb}{\sqrt{1-n’^2/2{n^{\prime\prime}}^2}}+\frac{n’ka}{2}+nkx_3\]

\[\phi_2=nkx_0+\frac{n’ka}{2}+\frac{n^{\prime\prime}kb}{\sqrt{1-n’^2/2{n^{\prime\prime}}^2}}+\frac{n’ka}{2}+nkx_1+\pi+nkx_1+\frac{n’ka}{2}+\pi+\frac{n’ka}{2}+nkx_3\]

where it has been assumed that \(n'<n^{\prime\prime}\) and that \(n<n_{\text{mirrors}}\), though as far as their relative phase difference \(\Delta\phi\) is concerned, because \(\textbf R\) is an additive abelian group it will always just be:

\[\Delta\phi=|\phi_2-\phi_1|=2nk\Delta x\]

where the arm-length difference is \(\Delta x:=|x_2-x_1|\) (to get a \(50:50\) beam splitter there should also be some constraint on \(a,b,n,n’,n^{\prime\prime}\) coming from the Fresnel equations). If the photodiode has sampling period \(T_s>0\), then one expects that the signal it measures will be a time-averaged irradiance of the form (it is being assumed here that the beam splitter really is exactly \(50:50\)):

\[I\sim\langle|e^{-i\omega t}+e^{i\Delta\phi}e^{-i\omega t}|^2\rangle_{T_s}\sim 1+\cos\Delta\phi\sim\cos^2nk\Delta x\]

where \(\cos\Delta\phi\) is the interference term which is sensitive to the relative phase difference \(\Delta\phi\) between the two interfering waves of identical frequency \(\omega\).

On the other hand then, given two distinct frequencies \(\omega<\omega’\) separated by \(\Delta\omega:=\omega’-\omega>0\), one can repeat the story above, this time measuring an irradiance:

\[I\sim\langle|(1+e^{i\Delta\phi})e^{-i\omega t}+(1+e^{i\Delta\phi’})e^{-i\omega’ t}|^2\rangle_{T_s}\sim 1+\cos\frac{\Delta\phi+\Delta\phi’}{2}\cos\frac{\Delta\phi’-\Delta\phi}{2}\]

where the coupled interference cross-term:

\[\frac{\sin\Delta\omega T_s+\sin(\Delta\omega T_s-\Delta\phi’)+\sin(\Delta\omega T_s+\Delta\phi)+\sin(\Delta\omega T_s+\Delta\phi-\Delta\phi’)}{\Delta\omega T_s}\]

is only non-negligible in the limit \(\Delta\omega T_s\ll 1\) which is typically not satisfied, and so here is taken to vanish. The key takeaway from this is that, while any two waves will interfere, waves of distinct frequencies \(\omega’\neq\omega\) do not in general interfere in an easy-to-measure way, i.e. only interference between waves of the same frequency \(\omega\) tends to be measurable.

That being said, it is essential to stress that interference between waves of the same \(\omega\) is interference nonetheless, and the resultant interference pattern \(I(\Delta\phi,\Delta\phi’)\) is readily measured in experiments and still extremely useful. One example of this is in the case where \(\Delta\omega\) is small, such as between \(2\) atomic fine or hyperfine transitions. Then the interference pattern can be written as a function of the Michelson interferometer arm-length difference \(\Delta x\):

\[I(\Delta x)\sim 1+\cos n(k+k’)\Delta x\cos n(k’-k)\Delta x\]

This is analogous to the phenomenon of beats in the \(t\)-domain; here in a spatial domain \(\Delta x\), the low-frequency envelope \(\cos n(k’-k)\Delta x\) is modulated by high-frequency fringes \(\cos n(k+k’)\Delta x\). By finding a \(\Delta x=\Delta x_0\) where fringes disappear due to vanishing of the envelope \(\Delta k:=k’-k=\pi/2n\Delta x_0\), one can thereby obtain the frequency difference \(\Delta\omega=nv\Delta k\) assuming \(n\) is non-dispersive. The envelope is sometimes phrased in terms of the fringe visibility:

\[V(\Delta x):=\frac{\Delta I_{\text{envelope}}(\Delta x)}{\bar I(\Delta x)}=\cos n(k’-k)\Delta x\]

As its name suggests, when the fringes become invisible (i.e. fringe visibility becomes \(V(\Delta x_0)=0\)), then one again recovers the same frequency difference \(\Delta\omega\) as above.

Another point implicitly assumed above and worth stressing is that the relative phase difference \(\Delta\phi\) of the two interfering waves has to maintain temporal coherence \(\frac{\partial\Delta\phi}{\partial t}=0\) at the photodiode position.

More generally, assuming that waves of any two distinct frequencies, no matter how close, do not interfere, a highly polychromatic signal containing irradiance \(2\hat I(k)dk\) in the wavenumber interval \([k,k+dk]\) would be expected to exhibit a Michelson interference pattern of the form:

\[I(\Delta x)=2\int_0^{\infty}\hat I(k)(1+\cos nk\Delta x)dk\]

Or equivalently, because \(I(\Delta x)\in\textbf R\) is expected to be real-valued for all \(\Delta x\), it follows that its Fourier transform \(\hat I(k)\) needs to be Hermitian \(\hat I(-k)=\hat I^{\dagger}(k)\). This means one can rewrite this as an inverse Fourier transform:

\[I(\Delta x)=I_0+\int_{-\infty}^{\infty}\hat I(k)e^{ink\Delta x}dk\]

with \(I_0=\int_{-\infty}^{\infty}\hat I(k)dk\) the background irradiance. The Fourier transform then provides the spectrum of the light source:

\[\hat I(k)=\int_{-\infty}^{\infty}(I(\Delta x)-I_0)e^{-ink\Delta x}d\Delta x\]

and is the basis of the Fourier transform infrared spectrometer (FTIR) where a Michelson interferometer is employed along with one mirror on a motorized translation stage to vary \(\Delta x\) and a photodiode to measure the corresponding total interference pattern \(I(\Delta x)\) from all the wavenumbers \(k\in(0,\infty)\).
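Here is a toy numerical sketch of the FTIR idea: build \(I(\Delta x)\) from an assumed two-line spectrum and then recover that spectrum, up to an overall normalization and the finite-scan resolution, by the cosine transform above. The units, line positions/widths, scan range, and \(n=1\) are all arbitrary assumptions made only for illustration.

```python
import numpy as np
# A toy FTIR sketch: build an interferogram I(dx) from an assumed two-line spectrum, then
# recover the spectrum (up to an overall normalization and finite-scan resolution) by the
# cosine transform of I(dx) - I0. Units, line positions/widths, and the scan range are all
# arbitrary choices, and n = 1 is assumed throughout.
dx = np.linspace(0, 2e-4, 4000)                        # arm-length differences (m)
k = np.linspace(1.00e7, 1.20e7, 1500)                  # wavenumbers (1/m)
dk = k[1] - k[0]
spec = np.exp(-((k - 1.06e7)/2e4)**2) + 0.5*np.exp(-((k - 1.10e7)/2e4)**2)    # toy spectrum
I = 2*np.sum(spec[None, :]*(1 + np.cos(np.outer(dx, k))), axis=1)*dk           # interferogram
I0 = 2*np.sum(spec)*dk                                                          # background
ddx = dx[1] - dx[0]
k_grid = np.linspace(0.95e7, 1.25e7, 1200)
spec_rec = np.sum((I - I0)[None, :]*np.cos(np.outer(k_grid, dx)), axis=1)*ddx   # recovered spectrum
# spec_rec shows the same two lines as `spec`, broadened by the finite scan range in dx
```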

Note that in all of the above discussion it has been implicitly assumed that all the beams in the Michelson interferometer are perfectly collimated, etc., so that the photodiode observes an interference pattern \(I(\Delta x)\) which varies with arm-length difference \(\Delta x\) but is spatially uniform across the photodiode surface. In practice, due to misalignment (accidental or deliberate), or imperfect collimation because the light source is not a point but extended, etc., there is not only an interference pattern in \(\Delta x\)-space, but also in real space \((x,y)\) across the photodiode surface; this whole interference profile \(I(x,y,\Delta x)\) then changes as \(\Delta x\) evolves. This is best understood by putting one’s eye at the photodiode location and looking into the beam splitter: one then sees \(2\) virtual images of the light source behind a mirror sitting around \(\sim 2x_2\) away, separated by \(\sim 2\Delta x\). Conceptually, one can then forget about the whole Michelson interferometer setup and pretend as if one just had \(2\) coherent light sources interfering with each other on a distant screen. Using this perspective, it is clear that the spatial fringes one observes will be sections of hyperbolae.

Finally, it is worth mentioning that sometimes a minor variant of the Michelson interferometer (called a Twyman-Green interferometer) is used for better collimation of the incident light beam. In addition, if instead of a beam splitter one were to use a half-silvered mirror, then a compensator may also need to be added (this would compensate not only for the optical path length difference in monochromatic incident light but also for optical dispersion in the case of polychromatic incident light).

Mach-Zehnder Interferometer

The setup is the following:

  • Kinda similar to Michelson except light never retravels its path.
  • Also, instead of a single photodiode on which equal-frequency waves interfere, have two photodiodes to measure the intensities on each output port of the last beamsplitter.
  • Idea is that for a \(50:50\) beamsplitter, one can write down a unitary (probability-conserving) matrix describing its action on the photon probability amplitudes for going down each arm of the Mach-Zehnder (a minimal sketch follows below).
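A minimal sketch of that last bullet, using one common convention for the \(50:50\) beamsplitter unitary (the specific matrix and the arm phases below are assumptions): the two output-port probabilities always sum to \(1\) and oscillate as \(\sin^2(\Delta\phi/2)\) and \(\cos^2(\Delta\phi/2)\).

```python
import numpy as np
# Mach-Zehnder as two 50:50 beamsplitter unitaries with a relative phase between the arms.
# The particular symmetric beamsplitter matrix and the phases are illustrative conventions.
B = np.array([[1, 1j], [1j, 1]])/np.sqrt(2)          # symmetric 50:50 beamsplitter unitary
phi1, phi2 = 0.7, 0.2                                 # phases accrued along the two arms (arbitrary)
P = np.diag([np.exp(1j*phi1), np.exp(1j*phi2)])
psi_in = np.array([1, 0])                             # photon enters through one input port
psi_out = B @ P @ B @ psi_in
probs = np.abs(psi_out)**2
print(probs, probs.sum())                             # [sin^2(dphi/2), cos^2(dphi/2)], summing to 1
```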

Fabry-Perot Interferometer

Miscellaneous: Thin Film Interferometer, Haidinger Fringes & Newton’s Rings

Problem: Write down the conditions for maximum constructive and destructive interference in a thin-film interferometer.

Solution: For a light wave incident at angle \(\theta\) and refracting into the film at angle \(\theta’\), the extra optical path travelled inside the film contributes a phase difference of \(2k_y’\Delta y\) between the two amplitude-split waves, where \(\Delta y\) is the film thickness and \(k_y’\) is the component of the wavevector inside the film normal to its surfaces.

This is the phase shift arising solely from the path length difference (the magnitude of the impedance mismatch between the \(2\) dielectrics only affects the amplitudes via the Fresnel coefficients). However, depending on the sign of this impedance mismatch (i.e. whether \(k>k’\) or \(k<k’\)), one of the waves will, at one of the reflections, also accrue an additional \(\pi\) phase shift. So the total phase shift is actually:

\[\Delta\phi=2k_y’\Delta y+\pi\]

and from this the condition for maximum constructive interference \(\Delta\phi=2\pi m\) or maximum destructive interference \(\Delta\phi=2\pi m+\pi\) can be written. To actually compute the resultant light wave’s electric and magnetic fields, one would have to use the Fresnel equations, separately looking at \(s\) and \(p\)-polarized light.

Problem: Show geometrically that the same result applies in a Fabry-Perot interferometer, except the only difference now is that the amplitude is split at transmission/reflection from a low-\(Z\) to high-\(Z\) dielectric, contributing an additional \(\pi\) phase shift not present in the thin-film interferometer, so in total it’s \(2\pi\equiv 0\).

Solution:

One can also view the Fabry-Perot interferometer as a black box in which case it just behaves like a diffraction grating with slit spacing \(2\Delta x\tan\theta’\) (though one also has to account for the extra optical path length traversed inside the optical cavity, which can be modelled by an effective glass wedge that phase shifts each beam by a constant \(2k’\Delta x/\cos\theta’\) with respect to its nearest neighbours):

Problem: Now one would like to do better, improving on the previous result. Specifically, one would not only like to know when maxima occur, but also the shape of the FPI’s transmission function \(T(\Delta\phi)\) (i.e. the fraction of the incident power transmitted as a function of the phase shift \(\Delta\phi\) between nearest neighbour reflections). To this end, show that such a transmission function is given by the Airy formula:

\[T(\Delta\phi)=\frac{1}{1+\frac{4R}{(1-R)^2}\sin^2(\Delta\phi/2)}\]

where \(R\) is the reflectivity of either mirror (assumed to be identical), and \(\Delta\phi=2k_x’\Delta x\).

Solution:

Problem: Verify using the Airy transmission function the earlier claim in Problem … that maximum constructive interference occurs when \(\Delta\phi=2\pi m\).

Solution: The Airy transmission function \(T(\Delta\phi)\) is maximized when its denominator is minimized, which occurs when the positive semi-definite part is \(0\), i.e. \(\sin\Delta\phi/2=0\Rightarrow\Delta\phi/2=m\pi\Rightarrow\Delta\phi= 2\pi m\).

Problem: From the plot, it is clear that as one polishes the mirrors more, increasing their reflectivity \(R\), the transmission resonances of the Fabry-Perot occurring (as verified in the previous part) at \(\Delta\phi=2\pi m\) are getting sharper and sharper. Quantify this phenomenon by writing down a formula for the width of the peak.

Solution: The peak clearly has a Lorentzian feel to it, and indeed this can be made explicit by Taylor expanding the \(\sin\) function, choosing the \(m=0\) resonance for simplicity (periodicity ensures they are all identical):

\[T(\Delta\phi)\approx\frac{1}{1+\frac{R}{(1-R)^2}\Delta\phi^2}\]

which is a Lorentzian in \(\Delta\phi\) with full width at half maximum \(\Delta\phi_{\text{FWHM}}=\frac{2(1-R)}{\sqrt R}\).

Problem: In analogy to the quality factor \(Q\) of a damped harmonic oscillator, define the finesse \(\mathcal F\) of the Fabry-Perot interferometer (it similarly measures the “quality” of the FPI).

Solution: Just as \(Q=\omega_0/\Delta\omega\), one can analogously define:

\[\mathcal F=\frac{2\pi}{\Delta\phi_{\text{FWHM}}}=\frac{\pi\sqrt{R}}{1-R}\]

where the \(2\pi\) can be interpreted in analogy to \(\omega_0\) as being where the first transmission resonance lies (apart from the one at \(\Delta\phi=0\)), though of course this is actually the separation between any pair of nearest neighbour transmission resonances (which one can loosely refer to as the free spectral range \(\Delta\phi_{\text{FSR}}=2\pi\)).
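As a quick numerical sanity check of the FWHM and finesse formulas (the mirror reflectivity below is an arbitrary choice):

```python
import numpy as np
# Numerical check of the Airy peak width and finesse: for a chosen mirror reflectivity R
# (an arbitrary value here), find the FWHM of T(dphi) around dphi = 0 and compare 2*pi/FWHM
# against pi*sqrt(R)/(1-R).
R = 0.95
dphi = np.linspace(-0.5, 0.5, 400001)
T = 1/(1 + 4*R/(1 - R)**2*np.sin(dphi/2)**2)
fwhm = np.ptp(dphi[T >= 0.5])                  # width of the region where T >= 1/2
print(2*np.pi/fwhm, np.pi*np.sqrt(R)/(1 - R))  # both ~61 for R = 0.95
```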

Problem: It is customary to speak of the free spectral range not in \(\Delta\phi\)-space, but when the FPI’s transmission comb is plotted in \(f\)-space or \(\lambda’\)-space. Using \(\Delta\phi_{\text{FSR}}=2\pi\) as the starting point, find expressions for \(\Delta f_{\text{FSR}}\) and \(\Delta\lambda_{\text{FSR}}\) (strictly speaking, a bunch of places where \(\Delta\phi\) is written should really be written as \(\Delta\Delta\phi\) and read as “the difference in delta-phis”)

Solution: Since \(\Delta\phi=2k_x’\Delta x=\frac{4\pi\Delta x\cos\theta’}{v’}f\) is linear in the frequency \(f\), a hop of \(\Delta\phi_{\text{FSR}}=2\pi\) corresponds to:

\[\Delta f_{\text{FSR}}=\frac{v’}{2\Delta x\cos\theta’}\]

and, using \(\lambda’=v’/f\), to:

\[\Delta\lambda’_{\text{FSR}}=\frac{\lambda’^2}{v’}\Delta f_{\text{FSR}}=\frac{\lambda’^2}{2\Delta x\cos\theta’}\]

Problem: Although the finesse \(\mathcal F\) was defined in analogy to a \(Q\)-factor in \(\Delta\phi\)-space, show that it also behaves more intimately like a genuine \(Q\)-factor in \(\omega\)-space (and hence also \(f\)-space)!

Solution:

\[\frac{\Delta f_{\text{FSR}}}{\Delta f_{\text{FWHM}}}=\frac{\Delta\phi_{\text{FSR}}}{\Delta\phi_{\text{FWHM}}}=\mathcal F\]

which follows because \(\Delta\phi\) and \(f\) are related by a linear transformation:

\[\Delta\phi=\frac{4\pi\Delta x\cos\theta’}{v’}\Delta f\]

Problem: The line width \(\Delta\lambda’_{\text{FWHM}}\) is often taken to define a “resolution element” of the FPI in \(\lambda’\)-space. This is because, if the incident light is not monochromatic, but say contains \(2\) closely spaced wavelengths \(\lambda’_1,\lambda’_2\), then the FPI will be able to resolve them at the \(m^{\text{th}}\) order iff:

\[\frac{\Delta\lambda’}{\lambda’}\geq \frac{1}{m\mathcal F}\]

Show this.

Solution: Starting from the identity:

\[\frac{\Delta\lambda’_{\text{FWHM}}}{\lambda’}=\frac{\Delta f_{\text{FWHM}}}{f}=\frac{\Delta f_{\text{FSR}}}{f\mathcal F}\]

Now write \(f=f_m\) for the frequency of the \(m^{\text{th}}\) order of the FPI in \(f\)-space. Clearly this is just \(f_m=m\Delta f_{\text{FSR}}\), i.e. a sequence of \(m\) hops of the free spectral range \(\Delta f_{\text{FSR}}\). This yields the desired result.


Spectral Line Shape Broadening Mechanisms

Problem: What does the phrase “spectral line shape” mean?

Solution: A spectrum is typically a plot of the intensity \(I(\omega)\sim|E(\omega)|^2\) of some underlying time domain signal \(E(t)\). The spectral line shape then refers to the functional form of \(I(\omega)\) in the neighbourhood of a resonance \(\omega_0\) (e.g. its centre, width, and wings).

Problem: What are the \(2\) most prominent kinds of spectra one encounters?

Solution: There are both emission spectra and absorption spectra (and also, it seems there are only broadening mechanisms, no such thing as narrowing mechanisms).

Problem: Distinguish between homogeneous broadening mechanisms and inhomogeneous broadening mechanisms.

Solution: Homogeneous broadening mechanisms affect all atoms indistinguishably, giving rise to a Lorentzian spectral line shape. By contrast, inhomogeneous broadening mechanisms affect atoms distinguishably; this leads to non-Lorentzian spectral broadening.

Problem: In the presence of Doppler broadening alone, what is the spectral line shape?

Solution: Along \(1\)-dimension, the Maxwell-Boltzmann distribution of velocities is normally distributed (no factor of \(4\pi v^2\)):

\[\rho(v)=\sqrt{\frac{\beta m}{2\pi}}e^{-\beta mv^2/2}\]

The non-relativistic Doppler shift gives the angular frequency random variable \(\omega\) as an affine transformation of the velocity random variable \(v\):

\[\omega=\omega_0+k_0v\]

so:

\[\rho(\omega)=\frac{dv}{d\omega}\rho(v)=\sqrt{\frac{\beta m}{2\pi k_0^2}}e^{-\beta m(\omega-\omega_0)^2/2k_0^2}\]

In particular it is also a Gaussian (hence Doppler broadening is inhomogeneous) but with expectation \(\omega_0\) and standard deviation (aka line width) \(\Delta\omega\):

\[\Delta\omega=\omega_0\sqrt{\frac{k_BT}{mc^2}}\propto\sqrt{T}\]

(slogan: variance proportional to temperature) so the quality factor \(Q:=\omega_0/\Delta\omega\) depends only on the ratio of the rest mass energy \(mc^2\) of each particle to their thermal energy \(k_BT\).
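For a concrete feel for the numbers, here is a quick evaluation of \(\Delta\omega\) for an illustrative case (a Na-like \(589\text{ nm}\) line at room temperature; all of the inputs are assumptions):

```python
import numpy as np
# Quick evaluation of the Doppler width dw = omega0*sqrt(kB*T/(m*c^2)) for an illustrative case
# (a Na-like 589 nm line at room temperature); all of the inputs are assumptions.
kB, c, amu = 1.380649e-23, 2.998e8, 1.66054e-27
m, lam, T = 23*amu, 589e-9, 300.0
omega0 = 2*np.pi*c/lam
dw = omega0*np.sqrt(kB*T/(m*c**2))
print(dw/(2*np.pi*1e6), "MHz")   # ~560 MHz standard deviation, i.e. a FWHM of order 1 GHz
```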

Problem: In the presence of natural/lifetime/radiative broadening alone, what is the spectral line shape?

Solution: For a \(2\)-level system with ground state \(|0\rangle\) and excited state \(|1\rangle\) associated to a lifetime \(\tau\) of spontaneous emission, then the natural line width is:

\[\Delta\omega\sim\Gamma\sim\frac{1}{\tau}\]

and the spectral line shape is a Lorentzian \(\sim\frac{1}{(\omega-\omega_0)^2+\Gamma^2}\) (is there a clear way to understand why? From optical Bloch equations for instance…). So natural broadening is a homogeneous broadening mechanism.

Problem: In the presence of pressure/collisional broadening alone, what is the spectral line shape?

Solution: Also a Lorentzian as with natural broadening (so also homogeneous), but the line width is determined from the relaxation time \(\tau=\ell/\langle v_{\text{rel}}\rangle\) between collisions. Using the fact that the mean free path is \(\ell=1/n\sigma\), this yields the line width due to pressure broadening:

\[\Delta\omega=\frac{1}{\tau}=n\sigma\langle v_{\text{rel}}\rangle\]

Problem: What does the overall spectral line shape look like when multiple broadening mechanisms are at play?

Solution: The overall line shape is the convolution of the individual line shapes: the total frequency shift is a sum of independent random variables, and the probability density of a sum of independent random variables is the convolution of their individual densities (for example, a Gaussian Doppler profile convolved with a Lorentzian natural profile gives the Voigt profile).
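As a minimal numerical illustration of this convolution statement (the line widths here are arbitrary), one can convolve a Gaussian with a Lorentzian and check that the result stays normalized:

```python
import numpy as np
# Sketch of the statement above: numerically convolve a Gaussian (inhomogeneous, e.g. Doppler)
# line shape with a Lorentzian (homogeneous, e.g. natural) line shape; the widths are arbitrary.
w = np.linspace(-50, 50, 4001)
dw = w[1] - w[0]
gauss = np.exp(-w**2/(2*5.0**2));      gauss /= gauss.sum()*dw       # unit-area Gaussian, sigma = 5
lorentz = 1.0/(np.pi*(w**2 + 1.0**2)); lorentz /= lorentz.sum()*dw   # unit-area Lorentzian, gamma = 1
voigt = np.convolve(gauss, lorentz, mode="same")*dw                  # the combined (Voigt) line shape
print(voigt.sum()*dw)                                                 # still normalized to ~1
```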

Power Broadening

This is also homogeneous.

Doppler Broadening

Need to use the change-of-variables formula for converting the probability distribution of one random variable into that of a function of it, via the derivative/Jacobian… (this is also how one can experimentally verify the Maxwell distribution, btw).


Boltzmann’s Equation

The purpose of this post is to explain how Boltzmann’s equation in kinetic theory arises.

Problem #\(1\): Write down Liouville’s equation from classical Hamiltonian mechanics governing the incompressible phase space flow (i.e. time evolution) of the joint probability density function \(\rho(\textbf x_1,…,\textbf x_N,\textbf p_1,…,\textbf p_N,t)\).

Solution #\(1\): Remembering the intuition that \(\{\space\space,H\}\) implements an advective derivative on the joint phase space:

\[\frac{D\rho}{Dt}=0\Rightarrow\frac{\partial\rho}{\partial t}+\{\rho,H\}=0\]

The idea is that \(\rho(\textbf x_1,…,\textbf x_N,\textbf p_1,…,\textbf p_N,t)d^3\textbf x_1…d^3\textbf x_Nd^3\textbf p_1…d^3\textbf p_N\) is the (purely due to classical ignorance) probability that the system of \(N\) identical particles is, at some time \(t\in\textbf R\), living within an infinitesimal volume \(d^3\textbf x_1…d^3\textbf x_Nd^3\textbf p_1…d^3\textbf p_N\) centered around the microstate \((\textbf x_1,…,\textbf x_N,\textbf p_1,…,\textbf p_N)\in\textbf R^{6N}\) in phase space.

Problem #\(2\): What is the marginal probability density \(\rho_1(\textbf x,\textbf p,t)\) of any given particle being in state \((\textbf x,\textbf p)\in\textbf R^6\) at time \(t\in\textbf R\)? How is \(\rho_1\) related to the \(1\)-particle distribution function \(n_1(\textbf x,\textbf p,t)\)?

Solution #\(2\): One simply takes the joint distribution and integrates away the \(6(N-1)\) degrees of freedom of the other \(N-1\) particles:

\[\rho_1(\textbf x,\textbf p,t)=\int d^3\textbf x_2…d^3\textbf x_Nd^3\textbf p_2…d^3\textbf p_N\rho(\textbf x,…\textbf x_N,\textbf p,…,\textbf p_N,t)\]

Moreover, because all \(N\) particles are identical, the same marginal distribution \(\rho_1\) applies equally well to all the other \(N-1\) particles; thus \(\rho_1(\textbf x,\textbf p,t)=\rho_2(\textbf x,\textbf p,t)=…=\rho_N(\textbf x,\textbf p,t)\). The one-particle distribution function \(n_1(\textbf x,\textbf p,t)\) at \((\textbf x,\textbf p)\in\textbf R^6\) at time \(t\in\textbf R\) is therefore:

\[n_1(\textbf x,\textbf p,t)=\sum_{i=1}^N\rho_i(\textbf x,\textbf p,t)=N\rho_1(\textbf x,\textbf p,t)\]

Problem #\(3\): In terms of the \(1\)-particle distribution function \(n_1(\textbf x,\textbf p,t)\), write down formulas for the number density \(n(\textbf x,t)\) in configuration space. What about the momentum density and kinetic energy density (also in \(\textbf x\)-space)?

Solution #\(3\): If one further marginalizes \(n_1(\textbf x,\textbf p,t)\) over the momenta \(\textbf p\), one obtains the number density of particles purely in configuration space:

\[n(\textbf x,t)=\int d^3\textbf p n_1(\textbf x,\textbf p,t)\]

Similarly, the momentum density in phase space is \(\textbf pn_1(\textbf x,\textbf p,t)\) while the kinetic energy density in phase space is \(|\textbf p|^2n_1(\textbf x,\textbf p,t)/2m\), and both of these can be similarly marginalized over \(\textbf p\) to obtain corresponding density distributions in configuration space.

Problem #\(4\): What is the analog of Liouville’s equation for the \(1\)-particle distribution function \(n_1\)?

Solution #\(4\): For a generic Hamiltonian \(H\):

\[\frac{\partial n_1}{\partial t}(\textbf x,\textbf p,t)=N\int d^3\textbf x_2…d^3\textbf x_Nd^3\textbf p_2…d^3\textbf p_N\{H(\textbf x,…\textbf x_N,\textbf p,…,\textbf p_N,t),\rho(\textbf x,…\textbf x_N,\textbf p,…,\textbf p_N,t)\}\]

Problem #\(5\): To make further progress, it is necessary to actually specify some physics. The following typical form for the Hamiltonian \(H\) is chosen, with a free non-relativistic dispersion, an external potential, and pairwise interactions (here the identification \(\textbf x_1:=\textbf x\) is being made):

\[H(\textbf x,…\textbf x_N,\textbf p,…,\textbf p_N,t)=\sum_{i=1}^N\left(\frac{|\textbf p_i|^2}{2m}+V_{\text{ext}}(\textbf x_i,t)\right)+\sum_{1\leq i<j\leq N}V_{\text{int}}(\textbf x_i-\textbf x_j)\]

Show that:

\[\frac{\partial n_1}{\partial t}(\textbf x,\textbf p,t)=\{H_1(\textbf x,\textbf p,t),n_1(\textbf x,\textbf p,t)\}+\left(\frac{\partial n_1}{\partial t}\right)_{\text{collision}}(\textbf x,\textbf p,t)\]

where \(H_1(\textbf x,\textbf p,t):=|\textbf p|^2/2m+V_{\text{ext}}(\textbf x,t)\) is the \(1\)-particle Hamiltonian which features in the so-called streaming term \(\{H_1,n_1\}\), and \(\left(\partial n_1/\partial t\right)_{\text{coll}}\) is called the collision integral and is given by:

\[\left(\frac{\partial n_1}{\partial t}\right)_{\text{coll}}(\textbf x,\textbf p,t)=\int d^3\textbf x_2d^3\textbf p_2\frac{\partial V_{\text{int}}(\textbf x-\textbf x_2)}{\partial\textbf x}\cdot\frac{\partial n_2}{\partial\textbf p}\]

where the \(2\)-particle distribution function \(n_2(\textbf x,\textbf x_2,\textbf p,\textbf p_2,t)\) is defined by:

\[n_2(\textbf x,\textbf x_2,\textbf p,\textbf p_2,t):=N(N-1)\int d^3\textbf x_3…d^3\textbf x_Nd^3\textbf p_3…d^3\textbf p_N\rho(\textbf x,…,\textbf x_N,\textbf p,…,\textbf p_N,t)\]

and has an interpretation entirely analogous to the one-particle distribution function \(n_1(\textbf x,\textbf p,t)\).

Solution #\(5\):

Problem #\(6\): Explain why only the streaming term \(\{H_1,n_1\}\) (and not the collision integral \((\partial n_1/\partial t)_{\text{coll}}\)) matters for the time evolution of the real space number density \(n(\textbf x,t)\). Is this also the case for the momentum density \(\textbf pn_1\) or the kinetic energy density \(|\textbf p|^2n_1/2m\)?

Solution #\(6\): Using \(n=\int d^3\textbf p n_1\):

\[\frac{\partial n}{\partial t}=\int d^3\textbf p\{H_1,n_1\}+\int d^3\textbf p\left(\frac{\partial n_1}{\partial t}\right)_{\text{coll}}\]

But the integral over the collision integral vanishes for the same kind of reasons as in Solution #\(5\):

(note that if instead one were interested in the time evolution of the momentum space number density \(n(\textbf p,t):=\int d^3\textbf xn_1(\textbf x,\textbf p,t)\) then the collision integral does matter. Similar remarks apply to the real space momentum density \(\int d^3\textbf p\textbf pn_1\) or the real space kinetic energy density \(\int d^3\textbf p|\textbf p|^2n_1/2m\)).

Problem #\(7\): Explain how the methods of Solution #\(5\) generalize to yield the BBGKY hierarchy of \(k=1,2,…,N\sim 10^{23}\) coupled PDEs.

Solution #\(7\): Using the same methods as Solution #\(5\), one can check that:

\[\frac{\partial n_k}{\partial t}=\{H_k,n_k\}+\sum_{i=1}^{k}\int d^3\textbf x_{k+1}d^3\textbf p_{k+1}\frac{\partial V_{\text{int}}(\textbf x_i-\textbf x_{k+1})}{\partial\textbf x_i}\cdot\frac{\partial n_{k+1}}{\partial\textbf p_i}\]

with \(k\)-particle distribution function:

\[n_k(\textbf x_1,…\textbf x_k,\textbf p_1,…,\textbf p_k,t)=\frac{N!}{(N-k)!}\int d^3\textbf x_{k+1}…d^3\textbf x_Nd^3\textbf p_{k+1}…d^3\textbf p_N \rho(\textbf x_1,…,\textbf x_N,\textbf p_1,…,\textbf p_N,t)\]

and the \(k\)-particle Hamiltonian \(H_k\), which includes both \(V_{\text{ext}}\) and the interactions \(V_{\text{int}}\) among the first \(k\) particles but ignores interactions with the other \(N-k\) particles:

\[H_k=\sum_{i=1}^{k}\left(\frac{|\textbf p_i|^2}{2m}+V_{\text{ext}}(\textbf x_i)\right)+\sum_{1\leq i<j\leq k}V_{\text{int}}(\textbf x_i-\textbf x_j)\]

The Boltzmann equation arises by truncating the BBGKY hierarchy. The idea is that there is a long time scale \(\tau\) (interscattering time) and a short time scale \(\tau_c\ll\tau\) (scattering time).


Weak Wave Turbulence (WWT)

  • The initial \(t=0\) Bose gas momentum space distribution \(n_{|k\rangle}(t=0)\) is sharply peaked at some \(k=k_0\), with the low-energy bosons of \(k<k_0\) in the so-called IR regime and the high-energy bosons of \(k>k_0\) in the UV regime.
  • The strongly interacting regime corresponds to bosons with \(k\xi\ll 1\) with \(\xi\propto a^{-1/2}\) the healing length, whereas the weakly interacting regime is \(k\xi\gg 1\).
  • In the UV regime, if it starts weakly interacting then it will remain like that because \(k\) just grows, pushing further into weakly interacting.
  • In the IR regime, there is a transition from strongly to weakly interacting, so more interesting to study.
  • Gevorg’s speed limit paper showed that the coherent coarsening dynamics as quantified by the speed limit \(3\hbar/m\) is, after an initial transient, independent of the (dimensionless) interaction strength \(1/(k\xi)\).
  • WWT applies to the weakly interacting \(k\xi\gg 1\) regime.
  • The wiggles at the end are due to diffraction off the BEC, not the sinc momentum-space contribution from the BEC, which is approximately a homogeneous top hat in real space (need to crop them from the data).
  • Each experimental cycle lasts about 30 seconds; the first ~25 seconds are just the standard steps (MOT, evaporation, Sisyphus cooling, molasses, etc.), and of the last ~5 seconds only a few seconds are actually the physics.
  • Right now, Gevorg & Simon decided to have 15 increasing TOFs, in order to probe the momentum-space distribution n_k of the BEC.
  • The idea is that you measure a 2D momentum-space distribution of the BEC along your line of sight, then inverse Abel transform it (this involves a derivative) to get the 3D momentum-space distribution (a minimal numerical sketch of this transform is given after this list). But the only part you can reliably inverse Abel transform is the part that is not diffracted off the BEC and does not saturate the optical density OD at 3, so one can only reliably measure in sort of the “outskirt” regions of the cylinder, so to speak, where the BEC is not too dense (i.e. OD < 3). This is also why it’s hard to measure the low-k part of the momentum-space distribution: the BEC is so dense there (OD > 3) that it saturates the imaging system.
  • Numerical differentiation is much less robust than numerical integration (Gevorg gave the example of a monotonic function like exp(x)), so it’s much harder to get the 3D momentum-space distribution; in the case of the inverse Abel transform, you have to differentiate by subtracting neighbouring pixel values.
  • Need to be on the Cambridge VPN to “SSH” into the VNC viewer to see the Analysis GUI (imaging computer), Cicero, Origin, or any of the office computers; each of the lab computers has an IP address, and you can only access them from the Cambridge network. There is also remote Toptica software for relocking the laser if it drifts/unlocks overnight (somehow not so easy to just automate this b/c one needs to manually play with the current and piezo voltage in the AOMs).
  • The ground state wavefunction of the BEC in k-space is not exactly a sinc, but rather a Bessel function, b/c it’s a cylinder.
  • One-body loss and evaporative loss: usually the temperatures are too high cf. the box trap depth; in that case one loses a lot of energy but not many particles, b/c the only particles lost are the high-energy ones.
  • Another worry is a counting worry: the n_rad_k distributions might overestimate or underestimate the number of particles (e.g. the 350000 atoms seems too high…); truncate the integrals as well to avoid the noisy region at high k.
  • For n_k use log-log; for k**2*n_k or k**4*n_k use lin-lin maybe? Or log-lin… the advantage of lin-lin is that the area you visually see is proportional to the integral of the function, and for k**2 and k**4 this is nice to see.
  • Try playing around with some different definitions of “speed of thermalization”, see if they give a monotonic thing or not.
  • Also compare with the GPE & WWT theory of the paper (which plots k**2*n_k), see if it matches…
  • Change the energy to a temperature scale (nK), and also get E/N for each dataset.
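Since the inverse Abel transform keeps coming up in the notes above, here is a minimal numerical sketch of it, \(f(r)=-\frac{1}{\pi}\int_r^{\infty}\frac{dF/dy}{\sqrt{y^2-r^2}}dy\), applied to a synthetic Gaussian test profile. This is only an illustration of the formula (and of why the derivative step is fragile), not the lab’s actual analysis code.

```python
import numpy as np
# Minimal inverse Abel transform sketch: f(r) = -(1/pi) * int_r^inf (dF/dy)/sqrt(y^2 - r^2) dy,
# applied to the Abel transform of a unit-height 3D Gaussian. Illustration only, not the lab's
# actual analysis pipeline.
def inverse_abel(F, y):
    dy = y[1] - y[0]
    dFdy = np.gradient(F, y)                          # numerical derivative: the noise-sensitive step
    f = np.zeros_like(F)
    for i, r in enumerate(y[:-1]):
        yy, dd = y[i+1:], dFdy[i+1:]                  # skip the integrable singularity at y = r
        f[i] = -np.sum(dd/np.sqrt(yy**2 - r**2))*dy/np.pi
    return f

y = np.linspace(0, 5, 2000)
sigma = 1.0
F = np.sqrt(2*np.pi)*sigma*np.exp(-y**2/(2*sigma**2))  # column-integrated (2D) profile
f = inverse_abel(F, y)
print(f[0])                                             # recovers the central 3D value, ~1
```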

Classical Field Theory

The purpose of this post is to solve some problems touching on various aspects of classical field theory. Most are just taken from this problem sheet, with solutions here, but there are also a few extra problems meant to clarify aspects of classical field theory.

Problem #\(1\): A string of length \(L\), mass per unit length \(\mu\), under uniform tension \(T\) is fixed at each end. The Lagrangian \(\mathcal L\) governing the time evolution of the transverse displacement \(\psi(x,t)\) is:

\[\mathcal L=\int_{0}^L\left(\frac{\mu}{2}\dot{\psi}^2-\frac{T}{2}\psi’^2\right)dx\]

where \(x\in[0,L]\) identifies position along the string from one endpoint. By expressing the transverse displacement as a Fourier sine series:

\[\psi(x,t)=\sqrt{\frac{2}{L}}\sum_{n=1}^{\infty}q_n(t)\sin\frac{n\pi x}{L}\]

Show that the Lagrangian \(\mathcal L\) becomes:

\[\mathcal L=\sum_{n=1}^{\infty}\left(\frac{\mu}{2}\dot{q}^2_n-\frac{T}{2}\left(\frac{n\pi}{L}\right)^2q_n^2\right)\]

Derive the equations of motion. Hence, show that the string is equivalent to an infinite set of decoupled harmonic oscillators with frequencies:

\[\omega_n=\sqrt{\frac{T}{\mu}}\frac{n\pi}{L}\]

Solution #\(1\):
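As a quick numerical cross-check of the claimed normal-mode frequencies \(\omega_n=\sqrt{T/\mu}\,n\pi/L\) (all parameters below are arbitrary), one can discretize the string into point masses and diagonalize the resulting stiffness matrix:

```python
import numpy as np
# Quick numerical cross-check of omega_n = sqrt(T/mu)*n*pi/L (all parameters arbitrary):
# discretize the string into N point masses of mass mu*dx connected under tension T, then
# diagonalize the resulting tridiagonal stiffness matrix and compare the lowest frequencies.
mu, T, L, N = 0.01, 5.0, 1.0, 500
dx = L/(N + 1)
K = (T/dx)*(2*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1))   # from -T*psi'' with fixed ends
omega = np.sqrt(np.linalg.eigvalsh(K)/(mu*dx))                 # mu*dx*q'' = -K q
n = np.arange(1, 6)
print(omega[:5])
print(np.sqrt(T/mu)*n*np.pi/L)    # the two lists agree closely for the lowest modes
```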

Problem #\(2\): Show that the solution space of the Klein-Gordon equation is closed under Lorentz transformations.

Solution #\(2\):

Problem #\(3\): The motion of a complex scalar field \(\psi(X)\) is governed by the Lagrangian density:

\[\mathcal L=\partial_{\mu}\psi^*\partial^{\mu}\psi-m^2\psi^*\psi-\frac{\lambda}{2}(\psi^*\psi)^2\]

Write down the Euler-Lagrange field equations for this system. Verify that the Lagrangian density \(\mathcal L\) is invariant under the infinitesimal transformation:

\[\delta\psi=i\alpha\psi\]

\[\delta\psi^*=-i\alpha\psi^*\]

Derive the Noether current \(j^{\mu}\) associated with this transformation and verify explicitly that it is conserved using the field equations satisfied by \(\psi\).

Solution #\(3\):

Problem #\(4\): Verify that the Lagrangian density:

\[\mathcal L=\frac{1}{2}\partial_{\mu}\phi_{\alpha}\partial^{\mu}\phi_{\alpha}-\frac{1}{2}m^2\phi_{\alpha}\phi_{\alpha}\]

for a triplet of real fields \(\phi_{\alpha}(X)\), \(\alpha=1,2,3\), is invariant under the infinitesimal \(SO(3)\) rotation by \(\theta\):

\[\phi_{\alpha}\mapsto\phi_{\alpha}+\theta\varepsilon_{\alpha\beta\gamma}n_{\beta}\phi_{\gamma}\]

where \(n_{\beta}\) is a unit vector. Compute the Noether current \(j^{\mu}\) associated to this transformation. Deduce that the three quantities:

\[Q_{\alpha}=\int d^3\textbf x\varepsilon_{\alpha\beta\gamma}\dot{\phi}_{\beta}\phi_{\gamma}\]

are all conserved.

Solution #\(4\):

Problem #\(5\): By requiring that Lorentz transformations \(\Lambda^{\mu}_{\space\space\nu}\) should preserve the Minkowski norm of \(4\)-vectors \(\eta_{\mu\nu}X’^{\mu}X’^{\nu}=\eta_{\mu\nu}X^{\mu}X^{\nu}\), show that this implies:

\[\eta_{\mu\nu}=\eta_{\sigma\tau}\Lambda^{\sigma}_{\space\space\mu}\Lambda^{\tau}_{\space\space\nu}\]

Show that an infinitesimal transformation of the form \(\Lambda^{\mu}_{\space\space\nu}=\delta^{\mu}_{\space\space\nu}+\omega^{\mu}_{\space\space\nu}\) is specifically a Lorentz transformation iff \(\omega_{\mu\nu}=-\omega_{\nu\mu}\) is antisymmetric.

Write down the matrix for \(\omega^{\mu}_{\space\space\nu}\) corresponding to an infinitesimal rotation by angle \(\theta\) around the \(x^3\)-axis. Do the same for an infinitesimal Lorentz boost along the \(x^1\)-axis by velocity \(v\).

Solution #\(5\):

Problem #\(6\): Consider a general infinitesimal Lorentz transformation \(X^{\mu}\mapsto X’^{\mu}=X^{\mu}+\omega^{\mu}_{\space\space\nu}X^{\nu}\) acting at the level of the \(4\)-vector \(X\). How does this perturbation manifest at the level of a scalar field \(\phi=\phi(X)\)? What about at the level of the Lagrangian density \(\mathcal L=\mathcal L(\phi)\)? What about at the level of the action \(S=S(\mathcal L)\)? In particular, show that the perturbation \(\delta\mathcal L\) at the level of \(\mathcal L\) is a total spacetime derivative, and hence describe the Noetherian implication of this.

Solution #\(6\):

Problem #\(7\): Maxwell’s Lagrangian for the electromagnetic field is:

\[\mathcal L=-\frac{1}{4}F_{\mu\nu}F^{\mu\nu}\]

where \(F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}\) and \(A_{\mu}\) is the \(4\)-vector potential. Show that \(\mathcal L\) is invariant under gauge transformations \(A_{\mu}\mapsto A_{\mu}+\partial_{\mu}\Gamma\) where \(\Gamma=\Gamma(X)\) is a scalar field with arbitrary (differentiable) dependence on \(X\). Use Noether’s theorem, and the spacetime translational invariance of the action \(S\) to construct the energy-momentum tensor \(T^{\mu\nu}\) for the electromagnetic field. Show that the resulting object is neither symmetric nor gauge invariant. Consider a new tensor given by:

\[\Theta^{\mu\nu}=T^{\mu\nu}-F^{\rho\mu}\partial_{\rho}A^{\nu}\]

Show that this object also defines \(4\) conserved currents. Moreover, show that it is symmetric, gauge invariant and traceless.

Solution #\(7\):

Problem #\(8\): In Problem #\(3\), a classical field theory involving two complex scalar fields \(\psi(X),\psi^*(X)\) with Lagrangian density:

\[\mathcal L(\psi,\psi^*,\partial_{\mu}\psi,\partial_{\mu}\psi^*)=\partial_{\mu}\psi^*\partial^{\mu}\psi-V(\psi^*\psi)\]

was analyzed (there the potential \(V\) was taken to be analytic in \(\psi^*\psi\) and so Taylor expanded to quadratic order). In particular, the Noether current \(j^{\mu}\) was obtained explicitly for the continuous global, internal \(U(1)\) \(\mathcal L\)-symmetry \(\psi\mapsto e^{i\alpha}\psi\) for a constant \(\alpha\in\textbf R\). By promoting \(\alpha=\alpha(X)\) but still acting infinitesimally across \(X\) in spacetime, recompute the Noether current \(j^{\mu}\), and notice in particular that the Noether current \(j^{\mu}\) doesn’t care about the potential terms \(V(\psi^*\psi)\), only the kinetic terms.

Solution #\(8\):

Problem #9:


AC Stark Effect & Optical Dipole Traps

Consider an atomic two-level system with ground state \(|0\rangle\) and excited state \(|1\rangle\). Recall that in the interaction picture, after making the rotating wave approximation and boosting into a steady-state rotating frame, one had the resultant time-independent steady-state Hamiltonian:

\[H_{\infty}=\frac{\hbar}{2}\tilde{\boldsymbol{\Omega}}\cdot\boldsymbol{\sigma}\]

Invoking the identity of Pauli matrices \((\tilde{\boldsymbol{\Omega}}\cdot\boldsymbol{\sigma})^2=|\tilde{\boldsymbol{\Omega}}|^21\), it is clear that the eigenvalues of this Hamiltonian are thus \(E_{\pm}=\pm\frac{\hbar|\tilde{\boldsymbol{\Omega}}|}{2}=\pm\frac{\hbar\sqrt{\Omega^2+\delta^2}}{2}\); this is known as the light shift resulting from the AC Stark effect (also called the Autler-Townes effect). In particular, if \(\Omega=0\) then \(E_{\pm}=\pm\hbar\delta/2\), while in the far-detuned limit \(|\delta|\gg\Omega\) a first-order binomial expansion gives:

\[E_{\pm}=\pm\left(\frac{\hbar\delta}{2}+\frac{\hbar\Omega^2}{4\delta}\right)\]

It is not a coincidence that this light shift is identical, after the first-order binomial expansion above, to the result of first-order nondegenerate time-independent perturbation theory applied to …; it turns out in the framework of QED that these correspond to so-called dressed states of the atom-photon system.
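A minimal numerical sketch comparing the exact light shift with the binomially expanded one; the convention \(H_{\infty}=\frac{\hbar}{2}(\Omega\sigma_x+\delta\sigma_z)\) and the numbers below are illustrative assumptions rather than unique choices:

```python
import numpy as np
# Compare the exact light-shifted eigenvalues +/-(hbar/2)*sqrt(Omega^2 + delta^2) of H_inf with
# the far-detuned expansion +/-(hbar*delta/2 + hbar*Omega^2/(4*delta)). The particular matrix
# convention and the numbers are illustrative assumptions.
hbar = 1.0                                        # work in units with hbar = 1
Omega, delta = 2*np.pi*1.0, 2*np.pi*10.0          # far-detuned: |delta| >> Omega
sx = np.array([[0, 1], [1, 0]]); sz = np.array([[1, 0], [0, -1]])
H = 0.5*hbar*(Omega*sx + delta*sz)
exact = np.sort(np.linalg.eigvalsh(H))
approx = np.sort([sgn*(hbar*delta/2 + hbar*Omega**2/(4*delta)) for sgn in (+1, -1)])
print(exact)
print(approx)                                      # agree to higher order in Omega/delta
```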


Classical Optics

The purpose of this post is to explain the \(2\) key models of classical optics, namely geometrical optics (also known as ray optics) and physical optics (also known as wave optics). Although historically geometrical optics came before physical optics, and indeed this is also usually the order in which they are conventionally taught, this post will take the more unconventional approach of presenting physical optics first, and then showing how it reduces to geometrical optics in the \(\lambda\to 0\) limit.

Physical Optics

Discuss:

  • Fourier optics in the Fraunhofer regime.
  • Gaussian (pilot) beams
  • TE/TM/TEM modes in EM waveguides
  • How Fresnel diffraction is an exact solution to the paraxial Helmholtz equation and what this has to do with the eikonal approximation/Hamilton-Jacobi equation from classical dynamics.

Maxwell’s equations assert that the electric and magnetic field \(\textbf E,\textbf B\) satisfy vector wave equations in vacuum:

\[\biggr|\frac{\partial}{\partial\textbf x}\biggr|^2\textbf E-\frac{1}{c^2}\ddot{\textbf E}=\textbf 0\]

\[\biggr|\frac{\partial}{\partial\textbf x}\biggr|^2\textbf B-\frac{1}{c^2}\ddot{\textbf B}=\textbf 0\]

Any of their \(6\) components (denoted \(\psi\)) thus satisfy the scalar wave equation \(|\frac{\partial}{\partial\textbf x}|^2\psi-\frac{1}{c^2}\ddot{\psi}=0\). The spacetime Fourier transform yields the trivial dispersion relation \(\omega=ck\) from which it is evident that performing just a temporal Fourier transform (to avoid the minutiae of \(t\)-dependence) leads to the scalar Helmholtz equation for \(\psi(\textbf x)\):

\[\left(\biggr|\frac{\partial}{\partial\textbf x}\biggr|^2+k^2\right)\psi=0\]

In other words, one is looking for eigenfunctions of the Laplacian \(\biggr|\frac{\partial}{\partial\textbf x}\biggr|^2\) with eigenvalue \(-k^2\). To begin, consider one of Green’s identities, valid for arbitrary scalar fields \(\psi(\textbf x’),\tilde{\psi}(\textbf x’)\) which are \(C^2\) everywhere in the volume \(V\):

\[\iint_{\textbf x’\in\partial V}\left(\psi\frac{\partial\tilde{\psi}}{\partial\textbf x’}-\tilde{\psi}\frac{\partial\psi}{\partial\textbf x’}\right)\cdot d^2\textbf x’=\iiint_{\textbf x’\in V}\left(\psi\biggr|\frac{\partial}{\partial\textbf x’}\biggr|^2\tilde{\psi}-\tilde{\psi}\biggr|\frac{\partial}{\partial\textbf x’}\biggr|^2\psi\right)d^3\textbf x’\]

(it’s just the divergence theorem applied to the vector field \(\psi\frac{\partial\tilde{\psi}}{\partial\textbf x’}-\tilde{\psi}\frac{\partial\psi}{\partial\textbf x’}\)). It is now obvious that the volume integral will vanish if one then imposes that both \(\psi(\textbf x’)\) and \(\tilde{\psi}(\textbf x’)\) also satisfy the scalar Helmholtz equation. Given any point \(\textbf x\in\textbf R^3\), it is physically clear that the spherical wave Green’s function \(\tilde{\psi}(\textbf x’|\textbf x)=e^{ik|\textbf x-\textbf x’|}/|\textbf x-\textbf x’|\) is one possible (though certainly not a unique) solution to the scalar Helmholtz equation, provided one stays away from the singularity at \(\textbf x’=\textbf x\). This motivates the choice of volume \(V\) to be some arbitrary region but with an \(\varepsilon\)-ball cut around \(\textbf x\), in which case the volume integral can legitimately be taken to vanish over this choice of \(V\). In that case, the surface \(\partial V=S^2_{\varepsilon}\cup S\) can be partitioned into an inner surface \(S^2_{\varepsilon}\) and an outer surface \(S\):

The flux through these two surfaces \(S^2_{\varepsilon},S\) must thus be equal:

\[\iint_{\textbf x’\in S^2_{\varepsilon}}\left(\psi\frac{\partial\tilde{\psi}}{\partial\textbf x’}-\tilde{\psi}\frac{\partial\psi}{\partial\textbf x’}\right)\cdot d^2\textbf x’=\iint_{\textbf x’\in S}\left(\tilde{\psi}\frac{\partial\psi}{\partial\textbf x’}-\psi\frac{\partial\tilde{\psi}}{\partial\textbf x’}\right)\cdot d^2\textbf x’\]

The integral over \(S^2_{\varepsilon}=\{\textbf x’\in\textbf R^3:|\textbf x’-\textbf x|=\varepsilon\}\) is straightforward in the limit \(\varepsilon\to 0\):

\[\iint_{\textbf x’\in S^2_{\varepsilon}}\left(\psi\frac{\partial\tilde{\psi}}{\partial\textbf x’}-\tilde{\psi}\frac{\partial\psi}{\partial\textbf x’}\right)\cdot d^2\textbf x’=\psi(\textbf x)\lim_{\varepsilon\to 0}4\pi\varepsilon^2\frac{1-ik\varepsilon}{\varepsilon^2}e^{ik\varepsilon}+\hat{\textbf n}\cdot\frac{\partial\psi}{\partial\textbf x’}(\textbf x)\lim_{\varepsilon\to 0}4\pi\varepsilon^2\frac{e^{ik\varepsilon}}{\varepsilon}=4\pi\psi(\textbf x)\]

One thus obtains Kirchhoff’s integral formula for the scalar Helmholtz equation:

\[\psi(\textbf x)=\frac{1}{4\pi}\iint_{\textbf x’\in S:\textbf x\in V}\left(\frac{e^{ik|\textbf x-\textbf x’|}}{|\textbf x-\textbf x’|}\frac{\partial\psi}{\partial\textbf x’}-\psi(\textbf x’)\frac{\partial}{\partial\textbf x’}\frac{e^{ik|\textbf x-\textbf x’|}}{|\textbf x-\textbf x’|}\right)\cdot d^2\textbf x’\]

where as above:

\[\frac{\partial}{\partial\textbf x’}\frac{e^{ik|\textbf x-\textbf x’|}}{|\textbf x-\textbf x’|}=\frac{ik|\textbf x-\textbf x’|-1}{|\textbf x-\textbf x’|^3}e^{ik|\textbf x-\textbf x’|}(\textbf x’-\textbf x)\]

As an aside, Kirchhoff’s integral formula is very similar in spirit to another better-known integral formula, namely the Cauchy integral formula \(f(z_0)=\frac{1}{2\pi i}\oint_{z\in\gamma:z_0\in\text{int}(\gamma)}\frac{f(z)}{z-z_0}dz\) from complex analysis; the constraint of complex analyticity is analogous to constraining \(\psi\) to obey the Helmholtz equation. If one specifies both Dirichlet and Neumann boundary conditions for \(\psi(\textbf x’)\) everywhere on \(\textbf x’\in S\), then in principle this is enough to uniquely determine \(\psi(\textbf x)\) everywhere in the interior \(V\) of the enclosing surface \(S\).

Now consider the following standard diffraction setup:

Here the surface \(S\) is chosen to be a sphere of radius \(R\) centered at \(\textbf x\), except where it flattens along the aperture with some distribution of slits. As one takes \(R\to\infty\), then in analogy to Jordan’s lemma from complex analysis, one can argue that the flux through this spherical cap portion of \(S\) in Kirchhoff’s integral formula vanishes like \(\sim 1/R\to 0\) (this is admittedly still a bit handwavy; for a rigorous argument see the Sommerfeld radiation condition). Thus, the behavior of \(\psi\) on the aperture alone is sufficient to determine its value \(\psi(\textbf x)\) at an arbitrary “screen location” \(\textbf x\) beyond the aperture. Supposing a monochromatic plane wave \(\psi(\textbf x’)=\psi(x’,y’,0)e^{ikz’}\) of wavenumber \(k\) (hence solving the scalar Helmholtz equation) is normally incident on the aperture \(z’=0\) and that \(k|\textbf x-\textbf x’|\gg 1\) (easily true in most cases), this imposes the boundary condition \(-\frac{\partial\psi}{\partial z’}(x’,y’,0)=-ik\psi(x’,y’,0)\), so one can check that Kirchhoff’s integral formula simplifies to:

\[\psi(\textbf x)\approx -\frac{ik}{4\pi}\iint_{\textbf x’\in\text{aperture}}\psi(\textbf x’)\frac{e^{ik|\textbf x-\textbf x’|}}{|\textbf x-\textbf x’|}(1+\cos\angle(\textbf x-\textbf x’,\hat{\textbf k}))d^2\textbf x’\]

Or equivalently:

\[\psi(\textbf x)=\frac{1}{i\lambda}\iint_{\textbf x’\in\text{aperture}}\psi(\textbf x’)\frac{e^{ik|\textbf x-\textbf x’|}}{|\textbf x-\textbf x’|}K(\textbf x-\textbf x’)d^2\textbf x’\]

where the obliquity kernel is \(K(\textbf x-\textbf x’):=\frac{1+\cos\angle(\textbf x-\textbf x’,\hat{\textbf k})}{2}=\cos^2\frac{\angle(\textbf x-\textbf x’,\hat{\textbf k})}{2}\). This is nothing more than a mathematical expression of the Huygens-Fresnel principle.

Fresnel vs. Fraunhofer Diffraction

In general the Huygens-Fresnel integral is difficult to evaluate analytically for an arbitrary point \(\textbf x\) on a screen. Thus, one often begins by making the paraxial approximation \(K(\textbf x-\textbf x’)\approx 1\) and \(|\textbf x-\textbf x’|\approx z\) in the denominator, but not in the complex exponential (otherwise all Huygens wavelets would interfere constructively, which is silly). Here, instead, the exponential is handled with a less strict version of the paraxial approximation in the form of a binomial expansion valid for \(z^2\gg |\textbf x-\textbf x’|^2-z^2\):

\[|\textbf x-\textbf x’|=\sqrt{z^2+|\textbf x-\textbf x’|^2-z^2}=z+\frac{|\textbf x-\textbf x’|^2-z^2}{2z}-\frac{(|\textbf x-\textbf x’|^2-z^2)^2}{8z^3}+…\]

In practice the term quadratic in \(|\textbf x-\textbf x’|^2-z^2\) is negligible in the paraxial limit, so neglecting it and all higher-order terms yields the Fresnel diffraction integral:

\[\psi(\textbf x)=\frac{e^{ikz}}{i\lambda z}\iint_{\textbf x’\in\text{aperture}}\psi(\textbf x’)e^{ik(|\textbf x-\textbf x’|^2-z^2)/2z}d^2\textbf x’\]

Or equivalently, expanding \(|\textbf x-\textbf x’|^2=|\textbf x|^2+|\textbf x’|^2-2\textbf x\cdot\textbf x’\):

\[\psi(\textbf x)=\frac{e^{ikz}e^{ik(|\textbf x|^2-z^2)/2z}}{i\lambda z}\iint_{\textbf x’\in\text{aperture}}\psi(\textbf x’)e^{ik|\textbf x’|^2/2z}e^{-i\textbf k\cdot\textbf x’}d^2\textbf x’\]

where \(\textbf k=k\textbf x/z\). If in addition one also assumes that \(k|\textbf x’|^2/2z\ll 1\), then one obtains the Fourier optics case of far-field/Fraunhofer diffraction:

\[\psi(\textbf x)=\frac{e^{ikz}e^{ik(|\textbf x|^2-z^2)/2z}}{i\lambda z}\iint_{\textbf x’\in\text{aperture}}\psi(\textbf x’)e^{-i\textbf k\cdot\textbf x’}d^2\textbf x’\]

The reason that Fraunhofer diffraction is only considered to apply in the far-field is that the above condition for its validity can be rewritten as \(z\gg z_R\) where \(z_R:=\rho’^2/\lambda\) is the Rayleigh distance of an aperture of typical length scale \(\rho’\sim\sqrt{x’^2+y’^2}\) when illuminated by monochromatic light of wavelength \(\lambda\). In other words, the precise meaning of “far-field” is “farther than the Rayleigh distance \(z_R\)”. Otherwise, when \(z≲z_R\), the quadratic phase term cannot be neglected and one simply refers to this as Fresnel diffraction. Notice that \(z≲z_R\) is not saying the same thing as \(z\ll z_R\); it is often said that Fresnel diffraction is the regime of near-field diffraction, but that phrase can be misleading because it suggests that \(z\) can be arbitrarily small, yet clearly at some point, if one kept decreasing \(z\), eventually the higher-order terms in the binomial expansion would also start to matter (moreover, the paraxial approximation would also start to break down). Instead of calling it “near-field diffraction”, a more accurate name for Fresnel diffraction would be “not-far-enough diffraction” \(z≲z_R\). By contrast, Fraunhofer diffraction truly is arbitrarily far-field \(z\gg z_R\). Of course nothing stops one from also considering the case \(z\ll z_R\); there just doesn’t seem to be any special name given to this regime and in practice it’s not as relevant. Finally, sometimes one also encounters the terminology of the Fresnel number \(F(z):=z_R/z\); in this jargon, Fraunhofer diffraction occurs when \(F(z)\ll 1\) whereas Fresnel diffraction occurs when \(F(z)≳1\).
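
To get a feel for the numbers, here is a rough back-of-the-envelope sketch (the aperture size and wavelength below are purely hypothetical):

```python
lam = 500e-9                # wavelength [m]
rho = 1e-3                  # aperture length scale rho' [m]
z_R = rho**2 / lam          # Rayleigh distance z_R = rho'^2/lambda  (= 2 m here)
for z in (0.1, 1.0, 10.0, 100.0):
    print(f"z = {z:6.1f} m    Fresnel number F(z) = z_R/z = {z_R/z:.2f}")
# Fraunhofer diffraction is justified once F(z) << 1, i.e. z >> z_R = 2 m here.
```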

In practice, one typically ignores the pre-factor in front of the aperture integrals since it is the general profile of the irradiance \(|\psi(\textbf x)|^2\) that is mainly of interest. In this case, particularly for Fraunhofer diffraction, one can write \(\hat{\psi}(\textbf k)\equiv\psi(\textbf x)\) as just the \(2\)D spatial Fourier transform of the aperture.

Diffraction Through a Single Slit

Consider a single \(y\)-invariant slit of width \(\Delta x\) centered at \(x’=0\). Then the Fraunhofer interference pattern has the form:

\[\hat{\psi}(k_x)\sim\int_{-\Delta x/2}^{\Delta x/2}e^{-ik_x x}dx=\frac{1}{ik_x}2i\sin\frac{k_x\Delta x}{2}\sim\Delta x\text{sinc}\frac{k_x\Delta x}{2}\]

where \(k_x=k\sin\theta\):
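
As a small numerical sketch of this \(\text{sinc}\) pattern (the slit width and wavelength below are chosen arbitrarily):

```python
import numpy as np

lam, dx = 633e-9, 100e-6                     # wavelength and slit width [m], purely illustrative
k = 2*np.pi/lam
theta = np.linspace(-0.03, 0.03, 2001)       # observation angle [rad]
I = np.sinc(k*np.sin(theta)*dx/2/np.pi)**2   # irradiance ~ sinc^2(k_x*dx/2); np.sinc(u) = sin(pi*u)/(pi*u)
# Plotting I vs theta shows the central maximum with its first zero at sin(theta) = lambda/dx:
print(lam/dx)                                # ~ 6.3e-3 rad
```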

By contrast, the Fresnel interference pattern is given by:

\[\psi(x)\sim\int_{-\Delta x/2}^{\Delta x/2}e^{ik(x’-x)^2/2z}dx’\]

Although in general such an integral needs to be evaluated numerically, there is a simple geometric way to gain some intuition for how \(\psi(x)\in\textbf C\) behaves at fixed \(z≲z_R\sim\Delta x^2/\lambda\) via the Cornu spiral (also called the Euler spiral in contexts outside of physical optics). The idea is to shift the \(x\)-dependence from the integrand into the limits via the substitution \(\pi t’^2/2:=k(x-x’)^2/2z\iff t’=\sqrt{\frac{2}{\lambda z}}(x-x’)\). Then, ignoring chain rule factors, one has:

\[\psi(x)\sim\int_{t’_1(x)}^{t’_2(x)}e^{i\pi t’^2/2}dt’\]

where the limits are \(t’_1(x)=\sqrt{\frac{2}{\lambda z}}(x+\Delta x/2)\) and \(t’_2(x)=\sqrt{\frac{2}{\lambda z}}(x-\Delta x/2)\). Written in terms of the normalized Fresnel integral \(\text{Fr}(t):=\int_0^{t}e^{i\pi t’^2/2}dt’\):

\[\psi(x)\sim\text{Fr}(t’_2(x))-\text{Fr}(t’_1(x))\]

The object \(\text{Fr}(t)\) is a trajectory in \(\textbf C\) which is the aforementioned Cornu spiral:

where one can check the limits \(\lim_{t\to\pm\infty}\text{Fr}(t)=\pm(1+i)/2\). Noting that \(\dot{\text{Fr}}(t)=e^{i\pi t^2/2}\), it follows that the speed \(|\dot{\text{Fr}}(t)|=1\) is uniform and thus the distance/arc length traversed in time \(\Delta t\) is always just \(\Delta t\). Moreover, the curvature \(\kappa(t)=\Im(\dot{\text{Fr}}^{\dagger}\ddot{\text{Fr}})/|\dot{\text{Fr}}|^3=\pi t\) increases linearly in the arc length \(t\) (essentially what defines a spiral!). The point is that the irradiance \(|\psi(x)|^2\sim|\text{Fr}(t’_2(x))-\text{Fr}(t’_1(x))|^2\) is now visually just the length (squared) of a vector between the points \(\text{Fr}(t’_1(x))\) and \(\text{Fr}(t’_2(x))\) on the Cornu spiral. The idea is to first trek a distance \(\frac{t’_1(x)+t’_2(x)}{2}=\sqrt{\frac{2}{\lambda z}}x\) from the origin to some central \(x\)-dependent point on the spiral, and then extend around it by the \(x\)-independent amount \(t’_1-t’_2=\sqrt{\frac{2}{\lambda z}}\Delta x\sim\sqrt{F(z)}\) to get a corresponding line segment whose length will be \(|\psi(x)|\).
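
The Fresnel integrals \(\text{Fr}(t)\) are available in scipy, so the single-slit Fresnel pattern can be sketched directly (all the numbers below are hypothetical, chosen so that \(F(z)\sim 4\)):

```python
import numpy as np
from scipy.special import fresnel   # returns (S(t), C(t)) with the standard pi*t^2/2 convention

lam, dx, z = 500e-9, 1e-3, 0.5      # wavelength, slit width, screen distance [m]; here z_R ~ 2 m
x = np.linspace(-2e-3, 2e-3, 1001)  # transverse screen coordinate [m]
t1 = np.sqrt(2/(lam*z))*(x + dx/2)
t2 = np.sqrt(2/(lam*z))*(x - dx/2)
S1, C1 = fresnel(t1)
S2, C2 = fresnel(t2)
Fr1, Fr2 = C1 + 1j*S1, C2 + 1j*S2   # Fr(t) = C(t) + i*S(t)
I = np.abs(Fr2 - Fr1)**2            # irradiance up to an overall constant; plot I vs x to see the fringes
```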

Diffraction Through a Circular Aperture

Given a circular aperture of radius \(R\), its \(2\)D isotropic nature means that the Fraunhofer interference pattern is just proportional to the Hankel transform of the aperture:

\[\hat{\psi}(k_{\rho})\sim\int_{0}^R\rho’J_0(k_{\rho}\rho’)d\rho’\sim\frac{J_1(k_{\rho}R)}{k_{\rho}}\]

with \(k_{\rho}=k\sin\theta\). This is sometimes called a sombrero or \(\text{jinc}\) function, being the polar analog of the \(\text{sinc}\) function. It has its first zero at \(k_{\rho}R\approx 3.8317\) which defines the boundary of the Airy disk (cf. \(\text{sinc}(k_x\Delta x/2)\) having its first zero at \(k_x\Delta x/2=\pi\approx 3.1415\)). This is often expressed paraxially via the angular radius of the Airy disk \(\theta_{\text{Airy}}\approx 1.22\frac{\lambda}{D}\) with \(D=2R\) the diameter of the aperture (cf. \(\theta_{\text{central max}}\approx\frac{\lambda}{\Delta x}\) for the single slit).
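
A one-line check of these Airy numbers via scipy:

```python
import numpy as np
from scipy.special import jn_zeros

u1 = jn_zeros(1, 1)[0]        # first zero of J_1
print(u1, u1/np.pi)           # ~ 3.8317 and ~ 1.22, i.e. sin(theta_Airy) ~ 1.22*lambda/D
```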

For the same circular aperture setup, one can also ask what happens in the Fresnel regime \(z≲z_R\). In general, the integral is complicated:

\[\psi(\rho,z)\sim\int_0^R e^{ik\rho’^2/2z}\rho’ J_0(k_{\rho}\rho’)d\rho’\]

But if one restricts attention to the symmetric case of being on-axis \(\rho=k_{\rho}=0\), then:

\[\psi(0,z)\sim e^{ikR^2/2z}-1\Rightarrow|\psi(0,z)|^2\sim\sin^2\frac{kR^2}{4z}\]

This is essentially the topologist’s favorite pathological sine function, but of course it was already mentioned that this solution is only reliable when the argument \(kR^2/4z\sim F(z)≳1\). For this specific on-axis case, it turns out one can significantly relax the paraxial assumption, namely, although one still assumes the obliquity kernel \(K(\textbf x-\textbf x’)\approx 1\), otherwise one acknowledges that \(r^2=\rho’^2+z^2\):

\[\psi(0,z)\sim\int_{z}^{\sqrt{R^2+z^2}}e^{ikr}dr\Rightarrow |\psi(0,z)|^2\sim\sin^2\frac{k(\sqrt{R^2+z^2}-z)}{2}\]

where if one were to binomial expand \(\sqrt{R^2+z^2}\approx z+R^2/2z\) one would just recover the Fresnel solution. If one fixes a given on-axis observation distance \(z\) and instead views \(|\psi(0,z)|^2\) as a function of the aperture radius \(R\) (and not \(z\)), then clearly it alternates between bright maxima and dark minima at aperture radii \(R\equiv\rho’_n\) given by:

\[\frac{k(\sqrt{{\rho’_n}^2+z^2}-z)}{2}=\frac{n\pi}{2}\iff \rho’_n=\sqrt{n\lambda z+\left(\frac{n\lambda}{2}\right)^2}\approx\sqrt{n\lambda z}\]

Thus, in general for a fixed aperture radius \(R\), there will be \(\sim F(z)\) concentric annuli of the form \(\rho’\in[\rho’_{n-1},\rho’_n]\) that can be made to partition the aperture disk; the annulus \([\rho’_{n-1},\rho’_n]\) is called the \(n\)-th Fresnel half-period zone. Note that the area of each Fresnel half-period zone is a constant \(\pi\lambda z\) in the Fresnel regime thus providing equal but alternating contributions to \(\psi(0,z)\) that lead to the observed oscillatory behavior in \(|\psi(0,z)|^2\). The existence of this Fresnel half-period zone structure motivates the construction of Fresnel zone plates which are vaguely like polar analogs of diffraction gratings, except rather than being regularly spaced, they block alternate Fresnel half-period zones with some opaque material to reinforce constructive interference for a given \(z\) and \(\lambda\) (thus, in order to design such a zone plate, one has to already have in mind a \(z\) and a \(\lambda\) ahead of time in order to compute the radii \(\rho’_n\approx\sqrt{n\lambda z}\) to be etched out). If a given \((z,\lambda)\) zone plate has already been constructed, but one then proceeds to move \(z\mapsto z/m\) for some \(m\in\textbf Z^+\), then in each of the Fresnel half-period zones associated to \(z\), there would now be \(m\) Fresnel half-period zones associated to \(z/m\), and so each transparent region of the \((z,\lambda)\) zone plate would allow through \(m\) of the \(z/m\) Fresnel half-period zones. Thus, if \(m\in 2\textbf Z^+-1\) is odd, then one would still expect a net constructive interference on-axis at \(z/m\), whereas if \(m\in 2\textbf Z^+\) then destructive interference of pairs of adjacent Fresnel half-period zones wins out (this parity argument is easy to remember because if \(m=1\) then nothing happens and the whole point of constructing the zone plate was to amplify the constructive interference at \(z\)). Finally, it is sometimes said that a given \((z,\lambda)\) Fresnel zone plate acts like a lens of focal length \(z\); however, due to the dependence on \(\lambda\), such a lens suffers from chromatic aberration.
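
As a small numerical sketch of designing such a zone plate (the design wavelength and focal distance below are, again, just hypothetical):

```python
import numpy as np

lam, z = 633e-9, 1.0                             # design wavelength and focal distance [m]
n = np.arange(1, 11)
rho_exact = np.sqrt(n*lam*z + (n*lam/2)**2)      # exact half-period zone radii
rho_approx = np.sqrt(n*lam*z)                    # usual approximation rho_n ~ sqrt(n*lambda*z)
print(np.max(np.abs(rho_exact - rho_approx)))    # only a ~nm correction for these numbers
```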

Now consider the complementary problem of calculating \(\tilde{\psi}(0,z)\) for a circular obstruction of radius \(R\), rather than a circular aperture of radius \(R\) carved into an infinitely-extending obstruction. Clearly, the two are related by subtracting the solution \(\psi(0,z)\) of the circular aperture of radius \(R\) from the free, unobstructed plane wave \(e^{ikz}\); this obvious corollary of the linearity of the scalar Helmholtz equation is an instance of Babinet’s principle. This leads to the counterintuitive prediction of Poisson’s spot (also called Arago’s spot):

\[\tilde{\psi}(0,z)=e^{ikz}-\psi(0,z)\sim e^{ik\sqrt{R^2+z^2}}\Rightarrow |\tilde{\psi}(0,z)|^2\sim 1\]

where, taking into account the obliquity kernel \(K(\textbf x-\textbf x’)\), this holds as long as one doesn’t wander too close to the aperture. Note from the earlier case of the circular aperture that there was the complementary (and equally counterintuitive) prediction that one could get a dark on-axis spot at certain \(z\) (i.e. those for which the aperture \(R=\rho’_{2m}\) partitions into an even number \(2m\) of Fresnel half-period zones as evident from the formula \(|\psi(0,z)|^2\sim\sin^2k(\sqrt{R^2+z^2}-z)/2\)).

Talk about how, by working with a scalar \(\psi\), one has basically neglected polarization, which only comes about from the vectorial nature of the electromagnetic field; this forms the basis of so-called scalar wave theory or scalar diffraction theory. Connect all this to the Lippmann-Schwinger equation in quantum mechanical scattering theory (specifically, this is basically the first-order Born approximation solution to the LS equation).

Fraunhofer is taxicab in the sense that it treats wavefronts as planar, whereas Fresnel is \(\ell^2\) in that it actually accounts for their curvature. In particular, anywhere that Fraunhofer works, Fresnel also works.

Geometrical Optics

Consider a spherical glass of index \(n’\) and radius \(R>0\) placed in a background of index \(n\), and a paraxial light ray incident at angle \(\theta\) and distance \(\rho\) (where both \(\theta\) and \(\rho\) are measured with respect to some suitable choice of principal \(z\)-axis):

The incident angle of the yellow ray is \(\theta+\rho/R\) while its refracted angle is \(\theta’+\rho/R\) so Snell’s law asserts (paraxially) that:

\[n’\left(\theta’+\frac{\rho}{R}\right)=n\left(\theta+\frac{\rho}{R}\right)\]

This directly yields the paraxial ray transfer matrix for this spherical glass:

\[\begin{pmatrix}\rho’\\\theta’\end{pmatrix}=\begin{pmatrix}1&0\\ (n-n’)/n’R&n/n’\end{pmatrix}\begin{pmatrix}\rho\\\theta\end{pmatrix}\]

There is no need to memorize such a matrix; instead, because it is \(2\times 2\), it can always be quickly rederived by finding two linearly independent vectors on which the action of such a matrix is physically obvious. The natural choice are its eigenvectors, which correspond physically to the following two “eigenrays”:

In the limit \(R\to\infty\) of a flat interface (e.g. in a plano-convex lens), the paraxial ray transfer matrix reduces to the diagonal matrix \(\begin{pmatrix}1&0\\ 0&n/n’\end{pmatrix}\).

Welding two such spherical glasses (of the same index \(n’\) and radii \(R>0,R'<0\) in the usual Cartesian sign convention) together back-to-back and assuming the usual thin-lens approximation (otherwise one would also need to include a propagation ray transfer matrix \(\begin{pmatrix}1&\Delta z\\ 0&1\end{pmatrix}\) if the thickness \(\Delta z>0\) were non-negligible), one obtains the paraxial ray transfer matrix of a thin convex lens (indeed any thin lens):

\[\begin{pmatrix}1&0\\ (n’-n)/nR’&n’/n\end{pmatrix}\begin{pmatrix}1&0\\ (n-n’)/n’R&n/n’\end{pmatrix}=\begin{pmatrix}1&0\\-P&1\end{pmatrix}\]

where \(P:=1/f\) is the optical power of the thin convex lens and the focal length \(f>0\) is given by the lensmaker’s equation:

\[\frac{1}{f}=\frac{n’-n}{n}\left(\frac{1}{R}-\frac{1}{R’}\right)\]

As a check of this formula’s self-consistency, consider the special case of a thin plano-convex lens (where the convex side has radius \(R>0\)). According to the lensmaker’s equation, this should have focal length \(1/f=(n’-n)/nR\). On the other hand, if one were to flip this plano-convex lens around (and call the radius of the convex side \(R'<0\)), then the lensmaker’s formula says it should now have focal length \(1/f’=-(n’-n)/nR’\). But, since both plano-convex lenses are thin, if one simply puts them next to each other then one would reform the thin convex lens as before, with effective focal length:

\[\frac{1}{f_{\text{eff}}}=\frac{n’-n}{n}\left(\frac{1}{R}-\frac{1}{R’}\right)=\frac{1}{f}+\frac{1}{f’}\]

In other words, the optical powers are additive \(P_{\text{eff}}=P+P’\). Note that this holds for any \(2\) thin lenses placed next to each other to form an “effective lens”, not just for the example of \(2\) plano-convex lenses given above. More generally, because the group of shears on \(\textbf R^2\) along a given direction is isomorphic to the additive abelian group \(\textbf R\), it holds for any \(N\in\textbf Z^+\) thin lenses arranged in an arbitrary order:

\[\begin{pmatrix}1&0\\-P_N&1\end{pmatrix}…\begin{pmatrix}1&0\\-P_2&1\end{pmatrix}\begin{pmatrix}1&0\\-P_1&1\end{pmatrix}=\begin{pmatrix}1&0\\-(P_1+P_2+…+P_N)&1\end{pmatrix}\]
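
A minimal numpy sketch of this additivity (the powers below are arbitrary, in dioptres):

```python
import numpy as np

def thin_lens(P):
    """Paraxial ray transfer matrix of a thin lens of optical power P = 1/f."""
    return np.array([[1.0, 0.0], [-P, 1.0]])

powers = [2.0, -1.0, 5.0]          # hypothetical thin lenses placed in contact
M = np.eye(2)
for P in powers:
    M = thin_lens(P) @ M           # matrices compose right-to-left as the ray propagates
print(M)                           # -> [[1, 0], [-6, 1]], i.e. P_eff = 2 - 1 + 5 = 6
```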

It’s worth quickly clarifying why these are even thin lenses in the first place and why the claimed \(f\) really is a suitable notion of focal length. One can proceed axiomatically, demanding that a thin lens be any optical element which:

  1. Focuses all incident light rays parallel to the principal \(z\)-axis to a focal point \(f\) (i.e. \((\rho,0)\mapsto(\rho,-\rho/f)\)).
  2. Doesn’t affect any light rays that pass through the principal \(z\)-axis (i.e. \((0,\theta)\mapsto(0,\theta)\)).

These are two linearly independent vectors (though only the latter is an eigenvector, as shear transformations are famous for being non-diagonalizable), so these \(2\) axioms are sufficient to fix the form \(\begin{pmatrix}1&0\\-1/f&1\end{pmatrix}\) of the paraxial ray transfer matrix of a thin lens.

Often, one would like to use thin lenses to image various objects. Consider an arbitrary point in space sitting a distance \(\rho\) above the principal \(z\)-axis and a distance \(z>0\) away from a thin lens. If a light ray is emitted from this point at some angle \(\theta\), refracts through the thin lens (of focal length \(f\)), and ends up at some point \(\rho’,z’\) after the lens during its trajectory, then one has:

\[\begin{pmatrix}\rho’\\\theta’\end{pmatrix}=\begin{pmatrix}1&z’\\0&1\end{pmatrix}\begin{pmatrix}1&0\\-1/f&1\end{pmatrix}\begin{pmatrix}1&z\\0&1\end{pmatrix}\begin{pmatrix}\rho\\\theta\end{pmatrix}\]

where the composition of those \(3\) matrices evaluates to \(\begin{pmatrix}1-z’/f&z+z’-zz’/f\\-1/f&1-z/f\end{pmatrix}\). But this has an important corollary; if one were to specifically choose the distance \(z’>0\) such as to make the top-right entry vanish \(z+z’-zz’/f=0\iff f^2=(z-f)(z’-f)\iff 1/f=1/z+1/z’\), then \(\rho’=(1-z’/f)\rho=\rho/(1-z/f)\) would be independent of \(\theta\)! The condition \(1/f=1/z+1/z’\) is sometimes called the (Gaussian) thin lens equation, though a better name would simply be the imaging condition. The corresponding linear transverse magnification is \(M_{\rho}:=\rho’/\rho=-z’/z=1/(1-z/f)\). One sometimes also sees the linear longitudinal magnification \(M_{z}:=\partial z’/\partial z=-1/(1-z/f)^2=-M_{\rho}^2<0\) which is always negative.
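
A short sympy sketch verifying this imaging condition symbolically (writing \(z’\) as zp):

```python
import sympy as sp

z, zp, f = sp.symbols('z zp f', positive=True)
prop = lambda d: sp.Matrix([[1, d], [0, 1]])     # free-space propagation by a distance d
lens = sp.Matrix([[1, 0], [-1/f, 1]])            # thin lens of focal length f

M = prop(zp) * lens * prop(z)
print(sp.simplify(M[0, 1]))                      # z + zp - z*zp/f
# Impose the imaging condition 1/f = 1/z + 1/zp, i.e. zp = f*z/(z - f):
M_img = sp.simplify(M.subs(zp, f*z/(z - f)))
print(M_img[0, 1])                               # 0, so rho' is independent of theta
print(sp.simplify(M_img[0, 0] - 1/(1 - z/f)))    # 0, recovering M_rho = 1/(1 - z/f)
```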

A magnifying glass works by placing an object at \(z\approx f\) so as to form a virtual image at a distance \(z’\to -\infty\). In that case, both \(M_{\rho},M_z\to\infty\) exhibit poles at \(z=f\), so what does it mean when a company advertises a magnifying glass as offering e.g. \(\times 40\) magnification? It turns out this is actually a specification of the angular magnification \(M_{\theta}:=\theta’/\theta=(\rho/f)/(\rho/d)=d/f\) of the convex lens when viewed at a distance \(d=25\text{ cm}\) from the object (not from the lens). So the statement that \(M_{\theta}=40\) is really a statement about the focal length \(f=d/M_{\theta}=0.625\text{ cm}\) of the lens. In turn, if the glass of the magnifier has a typical index such as \(n=1.5\) and is intended to be symmetric, then the lensmaker’s equation requires one to use \(R=-R’=0.625\text{ cm}\) in air (coincidentally the same as \(f\)).

The set of paraxial rays \((\rho,\theta)\) constitute a real, \(2\)-dimensional vector space on which optical elements such as lenses act by linear transformations. For instance, a collection of parallel rays incident on the lens (represented by the horizontal line below) is first sheared vertically by the lens, and subsequently free space propagation by a distance \(f\) shears the resultant line horizontally to the point that it becomes vertical, indicating that all the parallel rays have been focused to the same point (thus, this is an instance of the general identity \(\arctan f+\arctan 1/f=\text{sgn}(f)\pi/2\) for arbitrary \(f\in\textbf R-\{0\}\)).

An important corollary of this is that, if one wishes to observe the Fraunhofer interference pattern of some aperture at any distance \(f\) of interest, not necessarily just in the far-field \(f\gg z_R\), a simple way to achieve this is to just place a thin convex lens of focal length \(f\) into the aperture. Recalling that the Fraunhofer interference pattern arises by the superposition of (essentially) parallel Huygens wavelet contributions from each point on the aperture (parallel because one is working in the far-field), and recalling that a lens focuses all incident parallel rays onto a given point in its back focal plane \(f\), this provides a geometrical optics way of seeing why one can form the Fraunhofer interference pattern at an arbitrary distance \(f\) simply by choosing a suitable convex lens.

There is also a more physical optics way of seeing the same result. Recall that, at the end of the day, a lens is just two spherical caps of radii \(R,R’\) that have been welded together. In Cartesian coordinates, the equations of such caps are \(z=-\sqrt{R^2-x’^2-y’^2}\) and \(z=\sqrt{R’^2-x’^2-y’^2}\), but in the paraxial approximation, these look like the paraboloids \(z\approx -R+(x’^2+y’^2)/2R\) and \(z\approx -R’+(x’^2+y’^2)/2R’\) (where \(R'<0\) for a convex lens, etc.). Here, despite using a “thin” lens approximation, one cannot completely ignore the thickness profile across the lens (also a paraboloid):

\[\Delta z(x’,y’)=R-R’-\frac{x’^2+y’^2}{2}\left(\frac{1}{R}-\frac{1}{R’}\right)\]

So a light ray incident at \((x’,y’)\) on the lens will, upon exiting the lens, have acquired an additional phase shift of:

\[\Delta\phi(x’,y’)=n’k\Delta z(x’,y’)-nk\Delta z(x’,y’)=(n’-n)k\Delta z(x’,y’)\]

It is important to understand that \(k\) here is the free space wavenumber, but that in a medium \(n\) it becomes \(k\mapsto nk\) because \(\omega=ck=vnk\) is fixed. This corresponds to a spatially-varying \(U(1)\) modulation of the aperture field:

\[\psi(x’,y’,0)\mapsto\psi(x’,y’,0)e^{i\Delta\phi(x’,y’)}=\psi(x’,y’,0)e^{i(n’-n)k(R-R’)}e^{-ink(x’^2+y’^2)/2f}\]

where the lensmaker’s equation has been used. But notice that, when inserted into the Fresnel diffraction integral (with \(k\mapsto nk\) and hence \(\lambda\mapsto\lambda/n\)), if one places the screen exactly at \(z=f\), then the quadratic phase terms cancel out and one is left with precisely the Fraunhofer interference pattern:

\[\psi(\textbf x)=\frac{ne^{i(n’-n)k(R-R’)}e^{inkf}e^{ink(|\textbf x|^2-f^2)/2f}}{i\lambda f}\iint_{\textbf x’\in\text{aperture}}\psi(\textbf x’)e^{-in\textbf k\cdot\textbf x’}d^2\textbf x’\]

More generally, for an arbitrary optical component with ray transfer matrix \(\begin{pmatrix}A&B\\C&D\end{pmatrix}\) in the geometrical optics picture, its corresponding operator in the physical optics picture is \(e^{}\).

Problem: Show that \(2\) closely-spaced wavenumbers \(k,k’\) separated by \(\Delta k:=|k-k’|\) can be resolved by a diffraction grating with \(N\) slits at the \(m^{\text{th}}\)-order iff:

\[\frac{\Delta k}{k}\geq \frac{1}{mN}\]

(note the spectral resolution can also be expressed in terms of wavelengths \(\Delta k/k=\Delta\lambda/\lambda\) because \(k\lambda=2\pi\)).

Solution: Although not relevant to the final result, it is useful to introduce the length \(L\) of the entire grating as well as the separation \(d\) between adjacent slits (each considered infinitely thin) so that \(L=Nd\).

Now, for all wavenumbers \(k\), the Fraunhofer interference pattern in \(k_x\)-space looks the same:

The maxima only appear to split in \(\sin\theta\)-space, where more precisely the \(m^{\text{th}}\)-order maxima of the \(2\) wavenumbers \(k,k’\) are now separated by:

\[\Delta\sin\theta=\frac{2\pi m}{kd}-\frac{2\pi m}{k’d}=\frac{m\Delta\lambda}{d}\]

And the Rayleigh criterion considers these maxima to be resolved iff \(\Delta\sin\theta\) is at least the width (again in \(\sin\theta\)-space) of any one of the maxima \(\frac{2\pi}{kL}=\frac{\lambda}{L}\). So:

\[\frac{m\Delta\lambda}{d}\geq\frac{\lambda}{L}\]

\[\frac{\Delta \lambda}{\lambda}\geq \frac{d}{mL}\]

and finally, just remember \(L=Nd\).
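
For a concrete (textbook-style) illustration, consider resolving the sodium D doublet, whose two lines sit near \(589\text{ nm}\) and are separated by roughly \(0.6\text{ nm}\):

```python
lam, dlam = 589.3e-9, 0.6e-9     # sodium D doublet: ~589.0 nm and ~589.6 nm
m = 1                            # first diffraction order
print(lam/(m*dlam))              # ~ 982 illuminated slits needed at first order
```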

  • Collimator setup
  • Lenses, sign conventions (basically, one key point is that an optical element which tends to converge light rays has positive focal length).
  • Real objects are a source of rays, real images are a sink of rays, virtual images are a source of rays (probably all of this can be made precise in the eikonal approximation).
  • Gaussian optics as paraxial geometrical optics
  • No notion of \(\lambda\) (can view geometrical optics as the \(\lambda\to 0\) limit of physical optics)
  • Ray tracing algorithms.
  • Spherical aberration
  • Chromatic aberration due to optical dispersion \(n(\lambda)\) as the only time where \(\lambda\) shows up, sort of ad hoc.

Phases of the Classical Ising Model

Problem #\(1\): When someone comes up to you on the street and just says “Ising model”, what should be the first thing you think of?

Solution #\(1\): The classical Hamiltonian:

\[H=-E_{\text{int}}\sum_{\langle i,j\rangle}\sigma_i\sigma_j-E_{\text{ext}}\sum_i\sigma_i\]

(keeping in mind though that there are many variants on this simple Ising model).

Problem #\(2\): Is the Ising model classical or quantum mechanical?

Solution #\(2\): It is purely classical. Indeed, this is a very common misconception, because many of the words that get tossed around when discussing the Ising model (e.g. “spins” on a lattice, “Hamiltonian”, “(anti)ferromagnetism”, etc.) sound like quantum mechanical concepts, and indeed they are; but the Ising model by itself is a purely classical mathematical model that a priori need not have any connection to physics (and certainly not to quantum mechanical systems; that being said, it’s still useful for intuition to speak about it as if it were a toy model of a ferromagnet).

To hit this point home, remember that the Hamiltonian \(H\) is just a function on phase space in classical mechanics, whereas it is an operator in quantum mechanics…but in the formula for \(H\) in Solution #\(1\), there are no operators on the RHS; the \(\sigma_i\in\{-1,1\}\) are just numbers which specify the classical microstate \((\sigma_1,\sigma_2,…)\) of the system, so \(H\) is much more like a classical (as opposed to quantum) Hamiltonian. And there are no superpositions of states, or non-commuting operators, or any other quantum voodoo going on. So, despite the discreteness/quantization which is built into the Ising model, it is purely classical.

Problem #\(3\): What does it mean to “solve” the Ising model? (i.e. what properties of the Ising lattice is one interested in understanding?)

Solution #\(3\): The mental picture one should have in mind is that of coupling the Ising lattice with a heat bath at some temperature \(T\), and then ask how the order parameter \(m\) of the lattice (in this case the Boltzmann-averaged mean magnetization) varies with the choice of heat bath temperature \(T\). Intuitively, one should already have a qualitative sense of the answer:

So to “solve” the Ising model just means to quantitatively get the equation of those curves \(m=m(T)\) for all possible combinations of parameters \(E_{\text{int}},E_{\text{ext}}\in\textbf R\) in the Ising Hamiltonian \(H\).

Problem #\(4\):

Solution #\(4\):

Problem #\(5\):

Solution #\(5\):

Comparing with the earlier intuitive sketch (note all the inner loop branches at low temperature are unstable):

In particular, the phase transition at \(E_{\text{ext}}=0\) is manifest by the trifurcation at the critical point \(\beta=\beta_c\).

Problem #\(6\):

Solution #\(6\):

Problem #\(7\): Show that, in the mean field approximation, the short-range Ising model at \(E_{\text{ext}}=0\) experiences a \(2\)-nd order phase transition in which the equilibrium magnetization \(m_*(T)\) for \(T<T_c\) (but \(T\) close to \(T_c\)) goes like \(m_*(T)\approx\pm\sqrt{3(T_c/T-1)}\).

Solution #\(7\): Within (stupid!) mean-field theory, the effective free energy is:

\[f(m)=-k_BT\ln 2-E_{\text{ext}}m-\frac{qE_{\text{int}}}{2}m^2+\frac{1}{2}k_BT\biggr[(1+m)\ln(1+m)+(1-m)\ln(1-m)\biggr]\]

Proof:

So anyways, Maclaurin-expanding the mean-field effective free energy \(f(m)\) per unit spin:
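
As a quick sketch of this expansion: using \((1+m)\ln(1+m)+(1-m)\ln(1-m)=m^2+\frac{m^4}{6}+O(m^6)\) and writing \(k_BT_c:=qE_{\text{int}}\),

\[f(m)\approx -k_BT\ln 2-E_{\text{ext}}m+\frac{k_B(T-T_c)}{2}m^2+\frac{k_BT}{12}m^4+O(m^6)\]

so that at \(E_{\text{ext}}=0\), the condition \(\partial f/\partial m=0\) gives \(m_*^2\approx 3(T_c/T-1)\) for \(T\) just below \(T_c\), as claimed in the problem statement.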

The spontaneous \(\textbf Z_2\) symmetry breaking (i.e. ground state not preserved!) associated to the \(T<T_c\) ordered phase at \(E_{\text{ext}}=0\) is apparent:

Problem #\(8\): In the Ehrenfest classification of phase transitions, an \(N\)-th order phase transition occurs when the \(N\)-th derivative of the free energy \(\frac{\partial^N F}{\partial m^N}\) is discontinuous at some critical value of the order parameter \(m_*\). But considering that \(F=-k_BT\ln Z\) and the partition function \(Z=\sum_{\{\sigma_i\}}e^{-\beta E_{\{\sigma_i\}}}\) is a sum of \(2^N\) analytic exponentials, how can phase transitions be possible?

Solution #\(8\): By analogy, consider the Fourier series for a certain square wave:

\[f(t)=\frac{4}{\pi}\sum_{n=1,3,5,…}^{\infty}\frac{\sin(2\pi n t/T)}{n}\]

Although each sinusoid in the Fourier series is everywhere analytic, the series converges in \(L^2\) norm to a limiting square wave which has discontinuities at \(t_m=mT/2\), hence not being analytic at those points! So the catch here is that while any finite series of analytic functions (e.g. a partial sum truncation) will have its analyticity preserved, an infinite series need not! This simple result of analysis underpins the existence of phase transitions! In practice of course, for any finite number of spins \(N<\infty\), \(2^N\) will still be finite and in fact there are strictly speaking no phase transitions in any finite system. But in practice \(N\sim 10^{23}\) is so large that it is effectively infinite, and so in this \(N=\infty\) limit the system looks to all intents and purposes like it undergoes a phase transition.
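
A quick numerical sketch of this sharpening-up: each partial sum below is perfectly analytic, but the slope at the jump grows without bound as more terms are retained.

```python
import numpy as np

T = 1.0
t = np.linspace(-0.2, 0.2, 4001)
for N in (1, 11, 101, 1001):
    n = np.arange(1, N + 1, 2)                                           # odd harmonics only
    f = (4/np.pi)*np.sum(np.sin(2*np.pi*np.outer(n, t)/T)/n[:, None], axis=0)
    print(N, np.max(np.gradient(f, t)))                                  # max slope near t = 0 keeps growing
```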

Similar to the phase transitions, spontaneous symmetry breaking is also an \(N=\infty\) phenomenon only, strictly speaking:

\[m_{SSB}=\lim_{E_{\text{ext}}\to 0}\lim_{N\to\infty}\biggr\langle\frac{1}{N}\sum_{i=1}^N\sigma_i\biggr\rangle\]

where the limits do not commute \(\lim_{E_{\text{ext}}\to 0}\lim_{N\to\infty}\neq\lim_{N\to\infty}\lim_{E_{\text{ext}}\to 0}\) because for any finite \(N<\infty\), \(\langle m\rangle_N=-\frac{1}{N}\frac{\partial F_H}{\partial E_{\text{ext}}}|_{E_{\text{ext}}=0}=0\) since \(\textbf Z_2\) symmetry enforces \(F_H(E_{\text{ext}})=F_H(-E_{\text{ext}})\) so that its derivative must be odd and therefore vanishing at the origin.

Problem #\(9\): Show that, in the mean-field short-range Ising model at \(E_{\text{ext}}=0\) the specific/intensive heat capacity \(c\) is discontinuous at \(T=T_c\).

Solution #\(9\):

Appendix: Physical Systems Described by Classical Ising Statistics

The purpose of this post is to dive into the intricacies of the classical Ising model. For this, it is useful to imagine the Bravais lattice \(\textbf Z^d\) in \(d\)-dimensions of lattice parameter \(a\), together with a large number \(N\) of neutral spin \(s=1/2\) fermions (e.g. neutrons, ignoring the fact that isolated neutrons are unstable) tightly bound to the lattice sites \(\textbf x\in\textbf Z^d\), each site accommodating at most one fermion by the Pauli exclusion principle. On top of all this, apply a uniform external magnetic field \(\textbf B_{\text{ext}}\) across the entire sample of \(N\) fermions. Physically then, ignoring any kinetic energy or hopping/tunneling between lattice sites (cf. Fermi-Hubbard model), there are two forms of potential energy that contribute to the total Hamiltonian \(H\) of this lattice of spins:

  1. Each of these \(N\) neutral fermions has a magnetic dipole moment \(\boldsymbol{\mu}_{\textbf S}=\gamma_{\textbf S}\textbf S\) arising from its spin angular momentum \(\textbf S\) (in the case of charged fermions such as electrons \(e^-\), this is just the usual \(\gamma_{\textbf S}=-g_{\textbf S}\mu_B/\hbar\) but for neutrons the origin of such a magnetic dipole moment is more subtle, ultimately arising from its quark structure). This magnetic dipole moment \(\boldsymbol{\mu}_{\textbf S}\) couples with the external magnetic field \(\textbf B_{\text{ext}}\), leading to an interaction energy of the form:

\[V_{\text{ext}}=-\sum_{i=1}^N\boldsymbol{\mu}_{\textbf S,i}\cdot\textbf B_{\text{ext}}=-\gamma_{\textbf S}\sum_{i=1}^N\textbf S_i\cdot\textbf B_{\text{ext}}\]

2. Nearby magnetic dipole moments couple with each other across space, mediating a local internal interaction of the form:

\[V_{\text{int}}=\frac{\mu_0\gamma_{\textbf S}^2}{4\pi}\sum_{1\leq i\neq j\leq N}\frac{3(\textbf S_i\cdot\Delta\hat{\textbf x}_{ij})(\textbf S_j\cdot\Delta\hat{\textbf x}_{ij})-\textbf S_i\cdot\textbf S_j}{|\Delta\textbf x_{ij}|^3}\]

Thus, the total Hamiltonian \(H\) on this state space \(\mathcal H\cong(\textbf C^2)^{\otimes N}\) (the positional degrees of freedom being frozen at the sites of the lattice \(\textbf Z^d\)) is:

\[H=V_{\text{int}}+V_{\text{ext}}\]

Right now, it is hopelessly complicated. From this point onward, a sequence of dubious approximations will be applied to transform this current Hamiltonian \(H\mapsto H_{\text{Ising}}\) to the Ising Hamiltonian \(H_{\text{Ising}}\) (in fact, as mentioned, even the apparently complicated form of the Hamiltonian \(H\) is already approximate; the reason for using neutral fermions is to avoid dealing with an additional Coulomb repulsion contribution to \(H\)).

  • Approximation #1: Recall that the direction of the applied magnetic field, say along the \(z\)-axis \(\textbf B_{\text{ext}}=B_{\text{ext}}\hat{\textbf k}\), defines the quantization axis of all the relevant angular momenta. For a sufficiently strong magnetic field \(\textbf B_{\text{ext}}\) (cf. the Paschen-Back effect in atoms), the external coupling \(V_{\text{ext}}\) should dominate the internal coupling \(V_{\text{int}}\) and so all \(N\) spin angular momenta \(\textbf S_i\) will Larmor-precess around \(\textbf B_{\text{ext}}\) with \(m_{s,i}\in\{-1/2,1/2\}\) becoming a good quantum number.
  • Approximation #2: Assume that only nearest-neighbour dipolar couplings are important (in \(\textbf Z^d\) there would be \(2d\) nearest neighbours) and that moreover, because all the spins are roughly aligned in the direction of \(\textbf B_{\text{ext}}\), the term \((\textbf S_i\cdot\Delta\hat{\textbf x}_{ij})(\textbf S_j\cdot\Delta\hat{\textbf x}_{ij})\) is not as important as the spin-spin coupling term \(\textbf S_i\cdot\textbf S_j\).

Combining these two approximations, one obtains the Ising Hamiltonian \(H_{\text{Ising}}\) acting on the Ising state space \(\mathcal H_{\text{Ising}}\cong\{-1,1\}^N\):

\[H\approx H_{\text{Ising}}=-E_{\text{int}}\sum_{\langle i,j\rangle}\sigma_i\sigma_j-E_{\text{ext}}\sum_{i=1}^N\sigma_i\]

where \(\sigma_i:=2m_{s,i}\in\{-1,1\}\), \(E_{\text{int}}:=\mu_0\hbar^2\gamma_{\textbf S}^2/16\pi a^3\) is a proxy for the interaction strength between adjacent fermions via the energy gain of being mutually spin-aligned and \(E_{\text{ext}}:=\hbar\gamma_{\textbf S}B_{\text{ext}}/2\) is a proxy for the external field strength via the energy gain of being spin-aligned with it. In the context of magnetism, a material with \(E_{\text{int}}>0\) would be thought of as a ferromagnet while \(E_{\text{int}}<0\) is called an antiferromagnet (this possibility does not arise however after the various approximations that were made). Similarly, \(E_{\text{ext}}\) can be either positive or negative (e.g. for neutrons it is actually negative \(E_{\text{ext}}<0\) because \(\gamma_{\textbf S}<0\)) but for intuition purposes one can just think \(E_{\text{ext}}>0\) so that being spin-aligned with \(\textbf B_{\text{ext}}\) is the desirable state of affairs.

From \(H_{\text{Ising}}\) to \(Z_{\text{Ising}}\)

As usual, once the Hamiltonian \(H_{\text{Ising}}\) has been found (i.e. once the physics has been specified), the rest is just math. In particular, the usual next task is to calculate its canonical partition function \(Z_{\text{Ising}}=\text{Tr}(e^{-\beta H_{\text{Ising}}})\). The calculation of \(Z_{\text{Ising}}\) can be done exactly in dimension \(d=1\) for arbitrary \(E_{\text{ext}}\) (this is what Ising did in his PhD thesis) and also for \(d=2\) provided the absence of an external magnetic field \(E_{\text{ext}}=0\) (this is due to Onsager). In higher dimensions \(d\gg 1\), as the number \(2d\) of nearest neighbours increases, the accuracy of an approximate method for evaluating \(Z_{\text{Ising}}\) known as mean field theory increases accordingly, becoming an exact solution only for the unphysical \(d=\infty\). It is simplest to first work through the mathematics of the mean field theory approach before looking at the special low-dimensional cases \(d=1,(d=2,E_{\text{ext}}=0)\) (it is worth emphasizing that the Ising model can also be trivially solved in any dimension \(d\) if interactions are simply turned off \(E_{\text{int}}=0\) but this would be utterly missing the whole point of the Ising model! edit: in hindsight, maybe not really after all, see the section below on mean field theory).

First, just from inspecting the Hamiltonian \(H_{\text{Ising}}\) it is clear that the net magnetization \(\Sigma:=\sum_{i=1}^N\sigma_i\) is conjugate to \(E_{\text{ext}}\), so in the canonical ensemble it fluctuates around the expectation:

\[\langle\Sigma\rangle=-\frac{\partial F_{\text{Ising}}}{\partial E_{\text{ext}}}\]

The ensemble-averaged spin is therefore \(\langle\sigma\rangle=\langle\Sigma\rangle/N\). The usual “proper” way to calculate \(\langle\sigma\rangle\) would be to just directly and analytically evaluate the sums in \(Z_{\text{Ising}}=e^{-\beta F_{\text{Ising}}}\), so in particular \(\langle\sigma\rangle\) shouldn’t appear anywhere until one explicitly calculates it. However, using mean field theory, it turns out one will end up with an implicit equation for \(\langle\sigma\rangle\) that can nevertheless still be solved in a self-consistent manner.

To begin, write \(\sigma_i=\langle\sigma\rangle+\delta\sigma_i\) (cf. the Reynolds decomposition used to derive the RANS equations in turbulent fluid mechanics). Then the interaction term in \(H_{\text{Ising}}\) (which is both the all-important term but also the one that makes the problem hard) can be written:

\[V_{\text{int}}=-E_{\text{int}}\sum_{\langle i,j\rangle}(\langle\sigma\rangle+\delta\sigma_i)(\langle\sigma\rangle+\delta\sigma_j)=-E_{\text{int}}\langle\sigma\rangle^2\sum_{\langle i,j\rangle}1-E_{\text{int}}\langle\sigma\rangle\sum_{\langle i,j\rangle}(\delta\sigma_i+\delta\sigma_j)-E_{\text{int}}\sum_{\langle i,j\rangle}\delta\sigma_i\delta\sigma_j\]

Although the variance \(\langle\delta\sigma_i^2\rangle=\langle\sigma_i^2\rangle-\langle\sigma\rangle^2=1-\langle\sigma\rangle^2\) of each individual spin \(\sigma_i\) from the mean background spin \(\langle\sigma\rangle\) is not in general going to be zero (unless of course the entire system is magnetized along or against \(\textbf B_{\text{ext}}\), i.e. \(\langle\sigma\rangle=\pm 1\)), the mean field approximation says that the covariance between distinct neighbouring spins \(\langle i,j\rangle\) should average to \(\sum_{\langle i,j\rangle}\delta\sigma_i\delta\sigma_j\approx 0\), so that, roughly speaking, the overall \(N\times N\) covariance matrix of the spins is not only diagonal but just proportional to the identity \((1-\langle\sigma\rangle^2)1_{N\times N}\).

Thus, reverting back to \(\delta\sigma_i=\sigma_i-\langle\sigma\rangle\) and using for the lattice \(\textbf Z^d\) the identity \(\sum_{\langle i,j\rangle}1\approx Nd\) (because each of \(N\) spins has \(2d\) nearest neighbours but a factor of \(1/2\) is needed to compensate double-counting each bond) and the identity \(\sum_{\langle i,j\rangle}(\sigma_i+\sigma_j)\approx 2d\sum_{i=1}^N\sigma_i\) (just draw a picture), the mean field Ising Hamiltonian \(H’_{\text{Ising}}\) simplifies to:

\[H’_{\text{Ising}}=NdE_{\text{int}}\langle\sigma\rangle^2-E_{\text{ext}}’\sum_{i=1}^N\sigma_i\]

where the constant \(NdE_{\text{int}}\langle\sigma\rangle^2\) doesn’t affect any of the physics (although it will be kept in the calculations below for clarity) and \(E_{\text{ext}}’=E_{\text{ext}}+2dE_{\text{int}}\langle\sigma\rangle\) is the original energy \(E_{\text{ext}}\) together now with a mean field contribution \(2dE_{\text{int}}\langle\sigma\rangle\). This has a straightforward interpretation; one is still acknowledging that only the \(2d\) nearest neighbouring spins can influence a given spin, but now, rather than each one having its own spin \(\sigma_j\), one is assuming that they all exert the same mean field \(\langle\sigma\rangle\) that permeates the entire Ising lattice \(\textbf Z^d\). Basically, the mean field approximation has removed the interactions altogether, reducing the problem to a trivial non-interacting one for which it is straightforward to calculate everything (this is just repeating the usual steps of calculating, e.g. the Schottky anomaly):

\[Z’_{\text{Ising}}=(2e^{-\beta dE_{\text{int}}\langle\sigma\rangle^2}\cosh\beta E’_{\text{ext}})^N\]

\[F’_{\text{Ising}}=-\frac{N}{\beta}(\ln\cosh\beta E’_{\text{ext}}-\beta dE_{\text{int}}\langle\sigma\rangle^2+\ln 2)\]

\[\langle\sigma\rangle=\tanh\beta E’_{\text{ext}}=\tanh\beta(E_{\text{ext}}+2dE_{\text{int}}\langle\sigma\rangle)\]

As promised earlier, this is an implicit equation for the average spin \(\langle\sigma\rangle\) that can, for a fixed dimension \(d\), be solved for various values of temperature \(\beta\) (which intuitively wants to randomize the spins) and energies \(E_{\text{int}},E_{\text{ext}}\) (both of which intuitively want to align the spins). The outcome of this competition is the following:

If one applies any kind of external magnetic field \(E_{\text{ext}}\neq 0\), then as one increases the temperature \(1/2d\beta E_{\text{int}}\to\infty\), the mean spin \(\langle\sigma\rangle\to 0\) randomizes gradually (more precisely, as \(\langle\sigma\rangle=\beta E_{\text{ext}}+2dE_{\text{int}}E_{\text{ext}}\beta^2+O_{\beta\to 0}(\beta^3)\)). The surprise though occurs in the absence of any external magnetic field \(E_{\text{ext}}=0\); here, driven solely by mean field interactions, the mean spin \(\langle\sigma\rangle\) abruptly vanishes at all temperatures \(T\geq T_c\) at or above a critical temperature \(k_BT_c=2dE_{\text{int}}\). This is a second-order ferromagnetic-to-paramagnetic phase transition (it is second order because the discontinuity occurs in the derivative of \(\langle\sigma\rangle\) which itself is already a derivative of the free energy \(F’_{\text{Ising}}\)). Meanwhile, there is also a first-order phase transition given by fixing a subcritical temperature \(T<T_c\) and varying \(E_{\text{ext}}\), as in this case it is the mean magnetization \(\langle\sigma\rangle\) itself that jumps discontinuously.
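
A minimal numerical sketch of solving the self-consistency equation by fixed-point iteration (all parameter values are hypothetical; with \(d=3\) and \(E_{\text{int}}=1\) the mean field prediction is \(k_BT_c=6\)):

```python
import numpy as np

def mean_spin(beta, E_int, E_ext, d, m0=1.0, tol=1e-12, max_iter=100_000):
    """Iterate <sigma> = tanh(beta*(E_ext + 2*d*E_int*<sigma>)) to a fixed point.

    Starting from m0 = +1 selects the 'spin-up' branch when several solutions exist."""
    m = m0
    for _ in range(max_iter):
        m_new = np.tanh(beta*(E_ext + 2*d*E_int*m))
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

for kT in (1.0, 3.0, 5.9, 6.1, 10.0):
    print(kT, mean_spin(1.0/kT, E_int=1.0, E_ext=0.0, d=3))   # nonzero below kT = 6, ~0 above
```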

Note also that, similar to the situation for the Van der Waals equation when one had \(T<T_c\), here it is apparent that at sufficiently low temperatures for arbitrary \(E_{\text{ext}}\), the mean field Ising model predicts \(3\) possible mean magnetizations \(\langle\sigma\rangle\). For \(E_{\text{ext}}=0\), the unmagnetized solution \(\langle\sigma\rangle=0\) solution turns out to be an unstable equilibrium. For \(E_{\text{ext}}>0\), the solution on the top branch with \(\langle\sigma\rangle>0\) aligned with the external magnetic field is stable while among the two solutions with \(\langle\sigma\rangle<0\), one is likewise unstable while one is metastable, and similarly for \(E_{\text{ext}}<0\).

Critical Exponents

Solving The Ising Chain (\(d=1\)) Via Transfer Matrices

Low & High-\(T\) Limits of Ising Model in \(d=2\) Dimensions

Talk about Peierls droplet, prove Kramers-Wannier duality between the low and high-\(T\) regimes.

Beyond Ferromagnetism

The point of the Ising model isn’t really to be some kind of accurate model for any real-life physical system, but just a “proof of concept” demonstration that phase transitions can arise from statistical mechanics; although the sum of finitely many analytic functions is analytic, in the thermodynamic limit a phase transition can appear. A similar vein of mathematical modelling can be used to model lattice gases,

Talk about the Metropolis-Hastings algorithm.
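
In the meantime, here is a minimal (and deliberately unoptimized) Metropolis sketch for the \(2\)D Ising model; the lattice size, temperatures and sweep count are all just illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_ising(L=32, kT=2.0, E_int=1.0, E_ext=0.0, sweeps=200):
    """Single-spin-flip Metropolis sampling of the 2D Ising model on an L x L periodic lattice."""
    beta = 1.0/kT
    s = rng.choice([-1, 1], size=(L, L))
    for _ in range(sweeps*L*L):
        i, j = rng.integers(L, size=2)
        # Energy change for flipping spin (i, j), with periodic nearest neighbours:
        nn = s[(i+1) % L, j] + s[(i-1) % L, j] + s[i, (j+1) % L] + s[i, (j-1) % L]
        dE = 2*s[i, j]*(E_int*nn + E_ext)
        if dE <= 0 or rng.random() < np.exp(-beta*dE):
            s[i, j] *= -1
    return abs(s.mean())

# Compare well below vs. well above the exact 2D critical point k_B*T_c = 2*E_int/ln(1+sqrt(2)) ~ 2.27*E_int:
print(metropolis_ising(kT=1.5), metropolis_ising(kT=3.5))
```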


Turbulence

The purpose of this post is to study the universal properties of fully developed turbulence \(\text{Re}\gg\text{Re}^*\sim 10^3\). Thanks to direct numerical simulation (DNS), there is strong evidence to suggest that the nonlinear advective term \(\left(\textbf v\cdot\frac{\partial}{\partial\textbf x}\right)\textbf v\) in the Navier-Stokes equations correctly captures turbulent flow in fluids. However, rather than trying to find analytical solutions \(\textbf v(\textbf x,t)\) that exhibit turbulence (which is clearly pretty hopeless), it makes sense to decompose \(\textbf v=\bar{\textbf v}+\delta\textbf v\) into a mean velocity field \(\bar{\textbf v}(\textbf x,t)\) with some fluctuations \(\delta\textbf v(\textbf x,t)\) superimposed on it. The precise meaning of the word “mean” in the phrase “mean velocity field” for \(\bar{\textbf v}\) is a time average over some “suitable” period \(T\), also known as Reynolds averaging:

\[\bar{\textbf v}(\textbf x,t):=\frac{1}{T}\int_{t}^{t+T}\textbf v(\textbf x,t’)dt’\]

By construction, this implies that the Reynolds time average of the fluctuations vanishes \(\overline{\delta\textbf v}=\overline{\textbf v-\bar{\textbf v}}=\bar{\textbf v}-\bar{\textbf v}=\textbf 0\).
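
A toy numerical sanity check of this decomposition on a synthetic \(1\)D signal (everything here is made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 10001)
v = 1.0 + 0.1*np.sin(0.2*np.pi*t) + 0.3*rng.standard_normal(t.size)   # slow "mean flow" + fast "turbulence"

Tw = 501                                          # averaging window (samples), playing the role of T
kernel = np.ones(Tw)/Tw
v_bar = np.convolve(v, kernel, mode='valid')      # running (Reynolds) time average
dv = v[Tw//2:-(Tw//2)] - v_bar                    # fluctuations about the running mean
print(np.mean(dv))                                # ~ 0 up to sampling noise, as the construction demands
```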

One can also check that \(\textbf v\) is incompressible if and only if both the Reynolds averaged flow \(\bar{\textbf v}\) and the fluctuations \(\delta\textbf v\) are also incompressible. One similarly works with the Reynolds averaged pressure \(p=\bar p+\delta p\) so that by design \(\overline{\delta p}=0\).

Substituting \(\textbf v=\bar{\textbf v}+\delta\textbf v\) and \(p=\bar p+\delta p\) into the Navier-Stokes equations and Reynolds averaging both sides of the equation yields the well-named Reynolds-averaged Navier-Stokes (RANS) equations:

\[\rho\left(\frac{\partial\bar{\textbf v}}{\partial t}+\left(\bar{\textbf v}\cdot\frac{\partial}{\partial\textbf x}\right)\bar{\textbf v}\right)=-\frac{\partial\bar p}{\partial\textbf x}+\eta\left|\frac{\partial}{\partial\textbf x}\right|^2\bar{\textbf v}-\rho\overline{\left(\delta\textbf v\cdot\frac{\partial}{\partial\textbf x}\right)\delta\textbf v}+\bar{\textbf f}\]

Or in a Cartesian basis:

\[\rho(\dot{\bar{v}}_i+\bar v_j\partial_j\bar v_i)=\partial_j\bar{\sigma}_{ij}+\bar{f}_i\]

where the Reynolds averaged stress tensor \(\bar{\sigma}\) now includes an additional turbulent contribution \(\bar{\sigma}_{\text{Reynolds}}=-\rho\overline{\delta\textbf v\otimes\delta\textbf v}\) known as the Reynolds stress:

\[\bar{\sigma}_{ij}=-\bar p\delta_{ij}+\eta(\partial_j\bar v_i+\partial_i\bar v_j)-\rho\overline{\delta v_i\delta v_j}\]

(this can be quickly checked using the incompressibility conditions \(\partial_j\bar v_j=\partial_j\delta v_j=0\)).

At this point, assuming that the external body forces have no fluctuations \(\delta\textbf f=\textbf f-\bar{\textbf f}=\textbf 0\), one can subtract the RANS equations from the original Navier-Stokes equations to obtain:

\[\rho\left(\frac{\partial\delta\textbf v}{\partial t}+\left(\bar{\textbf v}\cdot\frac{\partial}{\partial\textbf x}\right)\delta\textbf v+\left(\delta\textbf v\cdot\frac{\partial}{\partial\textbf x}\right)\bar{\textbf v}+\delta\left(\left(\delta\textbf v\cdot\frac{\partial}{\partial\textbf x}\right)\delta\textbf v\right)\right)=-\frac{\partial\delta p}{\partial\textbf x}+\eta\left|\frac{\partial}{\partial\textbf x}\right|^2\delta\textbf v\]

Taking the outer product of both sides with \(\delta\textbf v\) and then Reynolds averaging yields (to be added: closure problem, Boussinesq approximation as a closure model).

Dimensional Analysis
