Experimental Optics

Problem: What does it mean to say that a field \(\mathbf E(\mathbf x,t)\in\mathbf C^3\) is a plane wave with speed \(c\geq 0\) in direction \(\hat{\mathbf z}\in S^2\)? Show that a general such plane wave can be written as a Fourier synthesis over all frequencies \(\omega\in\mathbf R\):

\[\mathbf E(z,t)=\int_{-\infty}^{\infty}d\omega\left(\mathbf E_+(\omega)e^{ik_{\omega}(z-ct)}+\mathbf E_-(\omega)e^{ik_{\omega}(z+ct)}\right)\]

where \(k_{\omega}:=\omega/c\). In the special case where \(\mathbf E(\mathbf x,t)\) is the electric field of an electromagnetic wave in vacuum, what additional constraints on the Fourier components \(\mathbf E_{\pm}(\omega)\) are present? Under what further assumptions can the description of the plane wave be reduced to a Jones vector \(\mathbf E_0\in\mathbf C^2\)?

Solution: The “plane” part means that \(\mathbf E(\mathbf x,t)=\mathbf E(z,t)\) is constant on planes of constant \(z:=\mathbf x\cdot\hat{\mathbf z}\) perpendicular to \(\hat{\mathbf z}\). The “wave” part means that \(\mathbf E(z,t)=\mathbf E_+(z-ct)+\mathbf E_-(z+ct)\) satisfies the dispersionless wave equation with speed \(c=\omega/k\). Each of these travelling plane wave components can be Fourier expanded, which leads to the desired result. It is essential to emphasize that all Fourier components \(\omega\in\mathbf R\) are travelling either parallel or anti-parallel to \(\hat{\mathbf z}\) in the Fourier superposition, i.e. \(\partial\hat{\mathbf z}/\partial\omega=\mathbf 0\).

For electromagnetic plane waves, \(\frac{\partial}{\partial\mathbf x}\cdot\mathbf E=0\Rightarrow\mathbf k\cdot\mathbf E=\hat{\mathbf z}\cdot\mathbf E=0\) so both \(\mathbf E_{\pm}(z\pm ct)\) vectors are confined to lie within their own contour planes.

Finally, in order to be amenable to a Jones vector description, one has to assume the plane wave is travelling so that e.g. \(\mathbf E_-=\mathbf 0\) and monochromatic so that the surviving Fourier component is of the form \(\mathbf E_+(\omega')=\mathbf E_0\delta(\omega'-\omega)\); then \(\mathbf E_0\in\mathbf C^2\) is the Jones vector of this travelling, monochromatic electromagnetic plane wave:

\[\mathbf E(z,t)=\mathbf E_0e^{i(kz-\omega t)}\]

(there exists a more general formalism known as Mueller calculus that extends the Jones calculus to deal with more general kinds of plane waves). Sometimes, the Jones vector is normalized \(\mathbf E_0\mapsto\mathbf E_0/|\mathbf E_0|\) to live on the “Bloch sphere” (see Poincare sphere) but this discards irradiance information \(\langle I\rangle=\varepsilon_0c|\mathbf E_0|^2/2\).

Problem: State the Jones vectors for travelling, monochromatic EM plane waves with:

a) Elliptical polarization

b) Circular polarization

c) Linear polarization at an angle \(\theta\) to the \(x\)-axis

Solution:

a) This is just the most general Jones vector \((E_x,E_ye^{i\phi})\) for (in general distinct) amplitudes \(E_x,E_y\in\mathbf R\) separated by an arbitrary phase \(e^{i\phi}\in U(1)\). Thus, just to emphasize again: Jones vector\(\Leftrightarrow\)Elliptic polarization. They are synonyms! Defining \(\tan\psi:=E_y/E_x\), the semi-major axis of this ellipse will be tilted relative to the \(x\)-axis at an angle \(\theta\) given by \(\tan 2\theta=\tan 2\psi\cos\phi\).

b) This is the special case of elliptical polarization where \(2\) conditions are simultaneously true: \(E_x=E_y:=E_0/\sqrt{2}\) and \(\phi=\pm\pi/2\). In that case, the Jones vector reduces to \(\mathbf E_0=\frac{E_0}{\sqrt 2}(1,\pm i)\) where the \(\pm\) distinguishes between left vs. right circular polarization.

c) This is the special case of elliptical polarization where \(\phi\in\{0,\pi\}\). Thus, one can write \(E_x=E_0\cos\theta\) and \(E_y=E_0\sin\theta\) so that a linearly polarized Jones vector is of the form \(\mathbf E_0=E_0(\cos\theta,\sin\theta)\).
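The three cases above can be sketched numerically. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, Jones vectors are represented as plain tuples of complex numbers, and the tilt formula is rewritten with `atan2` using \(\tan 2\psi=2E_xE_y/(E_x^2-E_y^2)\) so that the \(E_x=E_y\) case stays finite.

```python
import cmath
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
C = 299_792_458.0         # speed of light in vacuum, m/s

def jones_elliptical(Ex, Ey, phi):
    """General Jones vector (Ex, Ey e^{i phi}) with real amplitudes Ex, Ey."""
    return (complex(Ex), Ey * cmath.exp(1j * phi))

def jones_circular(E0, sign=+1):
    """Circular polarization: Ex = Ey = E0/sqrt(2) and phi = sign * pi/2."""
    a = E0 / math.sqrt(2)
    return jones_elliptical(a, a, sign * math.pi / 2)

def jones_linear(E0, theta):
    """Linear polarization at angle theta to the x-axis (phi = 0)."""
    return jones_elliptical(E0 * math.cos(theta), E0 * math.sin(theta), 0.0)

def irradiance(jones):
    """Time-averaged irradiance <I> = eps0 * c * |E0|^2 / 2."""
    return EPS0 * C * (abs(jones[0]) ** 2 + abs(jones[1]) ** 2) / 2

def ellipse_tilt(jones):
    """Major-axis tilt from tan(2 theta) = tan(2 psi) cos(phi), tan(psi) = Ey/Ex."""
    Ex, Ey = abs(jones[0]), abs(jones[1])
    phi = cmath.phase(jones[1]) - cmath.phase(jones[0])
    # tan(2 psi) = 2 Ex Ey / (Ex^2 - Ey^2); atan2 avoids dividing by zero
    return 0.5 * math.atan2(2 * Ex * Ey * math.cos(phi), Ex ** 2 - Ey ** 2)

# A linearly polarized wave should report its own polarization angle as the tilt.
print(ellipse_tilt(jones_linear(1.0, 0.3)))
```

For exactly circular light the tilt is undefined (the ellipse is a circle), so the value returned by `ellipse_tilt` in that case should not be interpreted.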

Problem: What does it mean for a (possibly nonlinear) dielectric to be birefringent? What fundamentally causes birefringence? How is birefringence typically quantified?

Solution: Recall that \(\mathbf D:=\varepsilon_0\mathbf E+\mathbf P\) where the polarization \(\mathbf P=\mathbf P(\mathbf E)\) induced by \(\mathbf E\) may in general (for a generic, nonlinear dielectric which may be piezoelectric or even pyroelectric or even ferroelectric) be expanded as a Taylor series about \(\mathbf E=\mathbf 0\) of the form:

\[\mathbf P=\mathbf P_0+\varepsilon_0(\chi^{(1)}\mathbf E+\chi^{(2)}\mathbf E^{\otimes 2}+\chi^{(3)}\mathbf E^{\otimes 3}+…)\]

so that one can write \(\mathbf D=\varepsilon\mathbf E+\mathbf P_0\) with (effective) permittivity tensor \(\varepsilon:=\varepsilon_0(1+\chi^{(1)}+\chi^{(2)}\mathbf E+\chi^{(3)}\mathbf E^{\otimes 2}+…)\). The dielectric is then said to be birefringent iff its permittivity tensor \(\varepsilon\) is anisotropic (i.e. possesses at least \(2\) distinct permittivity eigenvalues). If \(2\) of the \(3\) permittivity eigenvalues are the same \(\varepsilon_1=\varepsilon_2\neq\varepsilon_3\), then the dielectric is said to be uniaxially birefringent with \(\varepsilon_1,\varepsilon_3\) referred to respectively as the ordinary and extraordinary permittivities (and the principal axis associated to \(\varepsilon_3\) being referred to as the optic axis). If instead all \(3\) permittivity eigenvalues are distinct, then the dielectric is said to be biaxially birefringent. Fundamentally, these forms of optical anisotropy are either due to structural anisotropy in the lattice of nuclei themselves (which may be intrinsic or induced by external stress for instance), or anisotropy in the electron cloud distribution induced by the light itself even if the lattice itself is e.g. primitive cubic and hence isotropic. In any case, the key corollary of birefringence is that \(\mathbf D\) need not be parallel to \(\mathbf E\). In particular, for an electromagnetic wave propagating in direction \(\mathbf k\), one has the orthogonal triad \((\mathbf D,\mathbf H,\mathbf k)\). But this means the Poynting field \(\mathbf I=\mathbf E\times\mathbf H\) need not be parallel to \(\mathbf k\).

Provided that \(\varepsilon^{\dagger}=\varepsilon\) is Hermitian (which will be the case iff there is no energy transfer between the light and the dielectric in the form of stimulated absorption or emission), one can always diagonalize it in an orthonormal eigenbasis of real permittivity eigenvalues \(\varepsilon_1,\varepsilon_2,\varepsilon_3\in\mathbf R\). It is thus convenient to consider a retardation ellipsoid in \(\mathbf D\)-space (cf. Poinsot’s ellipsoid in \(\mathbf L\)-space) defined by the effective electric energy density \(D^2/2\varepsilon_{\text{eff}}\) as if the dielectric were isotropic with scalar permittivity \(\varepsilon_{\text{eff}}\):

\[\frac{D_1^2}{2\varepsilon_1}+\frac{D_2^2}{2\varepsilon_2}+\frac{D_3^2}{2\varepsilon_3}=\frac{D^2}{2\varepsilon_{\text{eff}}}\]

Consider the specific case \(\varepsilon_1=\varepsilon_2\) of a uniaxially birefringent dielectric, and suppose \(\mathbf k=k(\sin\theta,0,\cos\theta)\) is incident at some angle \(\theta\) (w.l.o.g. \(\phi:=0\)) to its optic axis. Since \(\mathbf k\cdot\mathbf D=0\), a generic electric displacement \(\mathbf D\) can be decomposed in the orthonormal basis \((0,1,0),(-\cos\theta,0,\sin\theta)\) of that degenerate subspace and its components evolved independently. Thus, w.l.o.g. one can separately consider the ordinary case \(\mathbf D=D(0,1,0)\) and the extraordinary case \(\mathbf D=D(-\cos\theta,0,\sin\theta)\). In the ordinary case, plugging into the retardation ellipsoid, one obtains \(\varepsilon_{\text{eff}}=\varepsilon_1\), whereas in the extraordinary case one obtains \(1/\varepsilon_{\text{eff}}(\theta)=\cos^2\theta/\varepsilon_1+\sin^2\theta/\varepsilon_3\).
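As a quick numerical check of the extraordinary-wave formula, here is a small sketch. The function names are hypothetical, permittivities are taken as relative (dimensionless), and the conversion to a refractive index assumes a non-magnetic medium so that \(n=\sqrt{\varepsilon/\varepsilon_0}\). The indices used are approximate textbook values for calcite and serve only as an illustration.

```python
import math

def eps_eff_extraordinary(theta, eps_o, eps_e):
    """1/eps_eff = cos^2(theta)/eps_o + sin^2(theta)/eps_e (extraordinary wave)."""
    return 1.0 / (math.cos(theta) ** 2 / eps_o + math.sin(theta) ** 2 / eps_e)

def n_eff_extraordinary(theta, n_o, n_e):
    """Effective index seen by the extraordinary wave at angle theta to the optic axis."""
    return math.sqrt(eps_eff_extraordinary(theta, n_o ** 2, n_e ** 2))

# Approximate indices for calcite in the visible (assumed, for illustration only).
n_o, n_e = 1.658, 1.486
for deg in (0, 30, 60, 90):
    print(deg, n_eff_extraordinary(math.radians(deg), n_o, n_e))
```

At \(\theta=0\) the extraordinary wave sees the ordinary index (propagation along the optic axis shows no birefringence), while at \(\theta=\pi/2\) it sees the full extraordinary index.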

For dielectrics described by a centrosymmetric crystal structure so that the induced polarization \(\mathbf P(-\mathbf E)=-\mathbf P(\mathbf E)\) is an odd function of the applied \(\mathbf E\)-field, both the spontaneous polarization \(\mathbf P_0=\mathbf 0\) and the Pockels electric susceptibility tensor \(\chi^{(2)}=0\) vanish.

Birefringence is fundamentally a measure of (a.k.a. is caused by) symmetry breaking. This could be due to tetragonal or orthorhombic lattice in a solid, or molecular chirality. One can also consider a kind of birefringence in conductors where \(\varepsilon(\omega)=\varepsilon_0+i\sigma(\omega)/\omega\) if the conductivity tensor \(\sigma\) is anisotropic.

Problem: Write down the Jones matrices for:

a) Linear polarizer (e.g. a dichroic Polaroid film) at angle \(\theta\) to the \(x\)-axis. Check the cases \(\theta=0,\pi/2\) and prove Malus’s law.

b) A \(\phi\)-waveplate with fast axis at angle \(\theta\) to the incident polarization vector. What about the special cases of a \(\lambda/4\)-waveplate and a \(\lambda/2\)-waveplate?

c) Crossed linear polarizers.

d) A cuvette containing chiral sugar water (e.g. dextrose) of length \(\ell\), with circular birefringence \(\Delta n:=n_{\text{left}}-n_{\text{right}}\) (aka optical activity).

Solution:

a) The eigenvectors of this Jones matrix must be \((\cos\theta,\sin\theta)\) and \((-\sin\theta,\cos\theta)\), with respective eigenvalues \(1\) and \(0\). Thus, in analogy with quantum mechanical formulas like \(H=\sum_EE|E\rangle\langle E|\), one has:

\[1\begin{pmatrix}\cos\theta \\ \sin\theta\end{pmatrix}^{\otimes 2}+0\begin{pmatrix}-\sin\theta \\ \cos\theta\end{pmatrix}^{\otimes 2}=\begin{pmatrix}\cos^2\theta&\cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta\end{pmatrix}\]

In particular, this is \(\begin{pmatrix}1&0\\0&0\end{pmatrix}\) for \(\theta=0\) and \(\begin{pmatrix}0&0\\0&1\end{pmatrix}\) for \(\theta=\pi/2\) as expected.

A travelling monochromatic plane wave polarized with Jones vector \(E_0(1,0)\) along the \(x\)-axis maps to \(E_0(\cos^2\theta,\cos\theta\sin\theta)\) which is associated to the time-averaged intensity \(\langle I\rangle=\varepsilon_0 cE_0^2(\cos^4\theta+\cos^2\theta\sin^2\theta)/2=\langle I_0\rangle\cos^2\theta\) where \(\langle I_0\rangle=\varepsilon_0 cE_0^2/2\).

b) The eigenvectors \((\cos\theta,\sin\theta)\) and \((-\sin\theta,\cos\theta)\) are the same as for the linear polarizer tilted at \(\theta\) but the respective eigenvalues are now \(1\) and \(e^{i\phi}\) with \(\phi:=k\Delta n\ell\) for vacuum wavenumber \(k:=\omega/c\), (uniaxial or biaxial) birefringence \(\Delta n:=n_{\text{slow}}-n_{\text{fast}}\), and waveplate thickness \(\ell\). Thus, the Jones matrix is:

\[1\begin{pmatrix}\cos\theta \\ \sin\theta\end{pmatrix}^{\otimes 2}+e^{i\phi}\begin{pmatrix}-\sin\theta \\ \cos\theta\end{pmatrix}^{\otimes 2}=\begin{pmatrix}\cos^2\theta+e^{i\phi}\sin^2\theta &\cos\theta\sin\theta (1-e^{i\phi}) \\ \cos\theta\sin\theta (1-e^{i\phi}) & \sin^2\theta + e^{i\phi}\cos^2\theta\end{pmatrix}\]

For a \(\Delta n\ell:=\lambda/4\)-waveplate, \(\phi=\pi/2\) so \(e^{i\phi}=i\):

\[\begin{pmatrix}\cos^2\theta+i\sin^2\theta &\cos\theta\sin\theta (1-i) \\ \cos\theta\sin\theta (1-i) & \sin^2\theta + i\cos^2\theta\end{pmatrix}\]

For a \(\Delta n\ell:=\lambda/2\)-waveplate, \(\phi=\pi\) so \(e^{i\phi}=-1\):

\[\begin{pmatrix}\cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta\end{pmatrix}\]

c) Zero of course.

d) Essentially identical eigenvalues \(1,e^{i\phi}\) as the waveplate (where \(\phi=k\Delta n\ell\) still holds just with a re-interpretation of \(\Delta n\) and \(\ell\)), only the eigenvectors change to left/right circularly polarized normalized Jones vectors \((1,\pm i)/\sqrt{2}\). Because these eigenvectors are complex, the projectors are \(\mathbf v\mathbf v^{\dagger}\) (with a complex conjugate on the row vector):

\[1\cdot\frac{1}{2}\begin{pmatrix}1 \\ i\end{pmatrix}\begin{pmatrix}1 & -i\end{pmatrix}+e^{i\phi}\cdot\frac{1}{2}\begin{pmatrix}1 \\ -i\end{pmatrix}\begin{pmatrix}1 & i\end{pmatrix}=e^{i\phi/2}\begin{pmatrix}\cos(\phi/2) & -\sin(\phi/2) \\ \sin(\phi/2) & \cos(\phi/2)\end{pmatrix}\]

Thus, up to the global phase \(e^{i\phi/2}\), this is a rotation matrix: the plane of linear polarization (which is an equal amplitude superposition of left and right circular polarizations) is rotated by \(\phi/2\), or equivalently \(\partial(\phi/2)/\partial\ell=k\Delta n/2\) is the specific rotatory power of the sugar solution, which can either be positive or negative depending on whether \(\Delta n>0\) is dextrorotatory or \(\Delta n<0\) is levorotatory.

(aside: even in intrinsically achiral media, one can induce chirality by applying a uniform external magnetic field \(\mathbf B\) to a dielectric or plasma; this magnetically-induced circular birefringence is known as the Faraday effect \(\phi/2=VB\ell\) where \(V=V(\omega)\) is called the Verdet constant).
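The eigenvalue/eigenvector recipe used throughout this solution can be checked numerically. The following is a minimal sketch in plain Python (2×2 matrices as nested lists; all function names are hypothetical). It builds each Jones matrix as a sum of eigenvalue times projector \(\mathbf v\mathbf v^{\dagger}\), where the conjugate only matters for the complex circular eigenvectors.

```python
import cmath
import math

def outer(v):
    """Projector v v^dagger onto a normalized Jones vector v (note the conjugate)."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

def spectral(pairs):
    """Sum of eigenvalue * projector over (eigenvalue, eigenvector) pairs."""
    m = [[0j, 0j], [0j, 0j]]
    for lam, v in pairs:
        p = outer(v)
        for i in range(2):
            for j in range(2):
                m[i][j] += lam * p[i][j]
    return m

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Act with a 2x2 Jones matrix on a Jones vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]

def axes(theta):
    """Orthonormal real eigenvectors tilted by theta from the x-axis."""
    c, s = complex(math.cos(theta)), complex(math.sin(theta))
    return [c, s], [-s, c]

def polarizer(theta):
    u, w = axes(theta)
    return spectral([(1, u), (0, w)])

def waveplate(phi, theta):
    u, w = axes(theta)
    return spectral([(1, u), (cmath.exp(1j * phi), w)])

def rotator(phi):
    r = complex(1 / math.sqrt(2))
    return spectral([(1, [r, 1j * r]), (cmath.exp(1j * phi), [r, -1j * r])])

def power(v):
    """|E0|^2, proportional to the time-averaged irradiance."""
    return abs(v[0]) ** 2 + abs(v[1]) ** 2

x_pol = [1 + 0j, 0j]
theta = 0.4
print(power(apply(polarizer(theta), x_pol)), math.cos(theta) ** 2)  # Malus's law
print(matmul(polarizer(math.pi / 2), polarizer(0.0)))               # crossed polarizers
out = apply(rotator(0.6), x_pol)
print(math.atan((out[1] / out[0]).real))                            # rotation angle
```

The three printed checks correspond to Malus's law, the vanishing product of crossed polarizers (up to floating-point rounding), and the rotation of linear polarization by \(\phi/2\) in an optically active medium.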

Optical Fibers & APC Connectors

An optical fiber is a waveguide for light waves. The idea is to use it to transmit light over long distances with minimal loss. It consists of an inner core, made of glass or plastic, where total internal reflection can take place within the waveguide (ignoring evanescent transmitted waves) because of a surrounding cladding with lower refractive index than the core, and a jacket (blue layer in the picture).

At the ends of optical fibers, one typically also has angled physical contact (APC) connectors to minimize back-reflection of light (by using an angled design usually around \(8^{\circ}\)). These ensure alignment of optical fiber cores when connecting two optical fibers to each other.

Often, optical fibers can be polarization-maintaining (PM), meaning that when one excites a given optical fiber with light linearly polarized along one of its principal (slow or fast) axes, that polarization is preserved along the fiber. This is because apparently the core of such a fiber is typically already pre-stressed to give it some kind of birefringence \(\Delta n=n_{\text{slow}}-n_{\text{fast}}\) (general rule of thumb: any symmetry which is easily broken will be broken; for example the magnetic field is never actually \(\textbf B=\textbf 0\) due to Earth, someone’s phone, etc., and since you don’t want other things to be defining your quantization axis, you should just apply a magnetic field yourself anyways).

Coupling Laser Light into an Optical Fiber

Goal is to get laser beam to be normally incident \(\theta_x=\theta_y=0\) at the center \(x=y=0\) of an optical fiber. Although initially this sounds quite trivial, as with any waveguide, the optical fiber is extraordinarily sensitive to any small deviations in these \(4\) degrees of freedom \(x,y,\theta_x,\theta_y\) and will only work if these \(4\) conditions are almost perfectly met (hence rendering the task highly non-trivial). Thus, the naive solution of just trying to align the laser beam into the optical fiber “by hand” is hopeless since one’s hands afford merely coarse control over \(x,y,\theta_x,\theta_y\) but clearly here one requires much finer control in order to successfully couple the laser light into the optical fiber.

The way to obtain such fine control is to use mirrors; each mirror comes with fine control in both spherical coordinates \(\phi,\theta\) (and also there is leeway in exactly where the laser is incident on the mirror and the fact that it need not be exactly \(45^{\circ}\) or anything like that). Of course changing the azimuth \(\phi\) of a given mirror will simultaneously change both \(x,\theta_x\) and similarly changing the zenith angle \(\theta\) of a mirror simultaneously affects both \(y,\theta_y\), so in this sense these degrees of freedom are “coupled”. Specifically, each mirror provides for \(2\) degrees of freedom \(\phi,\theta\) which is why in total \(2\) mirrors are actually needed to properly couple the laser into the optical fiber.

One can connect the output end of the optical fiber to a fiber pen and use a translucent polymer sheet to see where the laser beam from the laser intersects the laser beam from the fiber pen at various regions in the setup. From having played around with the setup, it is more sensible to focus on aligning them at the extremes of the path, which tends to automatically ensure that they will be aligned everywhere else in the middle. Moreover, a general rule of thumb turns out to be that in order to align a section, the mirror on which one should do fine adjustments is, perhaps counterintuitively, the one further away (is there some name for this kind of algorithm?). Doing it iteratively like this will converge onto an aligned optical system; doing it the other way will diverge into a hopelessly misaligned system.

After having completed the “fine structure” alignment of the mirrors properly so that there is for sure some non-zero signal coming out the output of the optical fiber, one can then proceed to a “hyperfine” level of adjustments, putting the output of the optical fiber into a photodiode and measuring the photocurrent developed across a potentiometer \(R\) via a multimeter, or just directly using a power meter. Here again, one essentially seeks to maximize the photodiode signal by an algorithm which vaguely feels like a manual implementation of gradient descent. More precisely, it turns out to be more advisable to make some small random perturbation to the \(\phi\) (resp. \(\theta\)) of the mirror farther away (not necessarily physically, but in the sense of the optical path length) from the input of the optical fiber, then adjusting \(\phi\) (resp. \(\theta\)) of the mirror closer to the optical fiber input until the signal is locally maximized, and repeating this until one eventually converges onto not merely a local, but global maximum (2D search). Finally, also consider the focal length of the lens relative to the fiber (this is a 1D search at the end). At this point, one can feel pretty confident that the laser light is properly coupled into the optical fiber, i.e. that \(x\approx y\approx\theta_x\approx\theta_y\approx 0\). Each time one takes a fiber out and puts it back in again, one has to recouple because of how sensitive the whole alignment is.
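The perturb-the-far-mirror, re-maximize-the-near-mirror procedure can be illustrated with a toy simulation. Everything in this sketch is an assumption made for illustration: a single transverse axis, small-angle ray optics, a Gaussian coupling efficiency in the position and angle errors at the fiber tip, arbitrary units, and hypothetical names and lever arms. It is a cartoon of the manual procedure, not a model of a real fiber coupler.

```python
import math

# Hypothetical toy geometry (arbitrary units, small-angle approximation).
D_FAR, D_NEAR = 3.0, 1.0   # lever arms from the far / near mirror to the fiber tip
X0, TH0 = 0.7, -0.4        # initial position / angle error at the fiber tip

def coupling(a_far, a_near):
    """Gaussian coupling efficiency for mirror tilts a_far and a_near."""
    x = X0 + 2 * (a_far * D_FAR + a_near * D_NEAR)   # position error at the fiber
    th = TH0 + 2 * (a_far + a_near)                    # angle error at the fiber
    return math.exp(-(x ** 2 + th ** 2))

def maximize_near(a_far, a_near, step=0.1):
    """Turn the near-mirror knob until the signal is locally maximal."""
    while step > 1e-9:
        for trial in (a_near + step, a_near - step):
            if coupling(a_far, trial) > coupling(a_far, a_near):
                a_near = trial
                break
        else:
            step /= 2   # neither direction helps: use a finer step
    return a_near

def walk_beam(a_far=0.0, a_near=0.0, step=0.05):
    """Perturb the far mirror, re-maximize with the near mirror, keep improvements."""
    a_near = maximize_near(a_far, a_near)
    best = coupling(a_far, a_near)
    while step > 1e-7:
        improved = False
        for trial_far in (a_far + step, a_far - step):
            trial_near = maximize_near(trial_far, a_near)
            eta = coupling(trial_far, trial_near)
            if eta > best:
                a_far, a_near, best = trial_far, trial_near, eta
                improved = True
                break
        if not improved:
            step /= 2
    return a_far, a_near, best

print(walk_beam())
```

In this toy model a single knob can never zero both the position and the angle error, which is why optimizing only one mirror gets stuck at a partial maximum while alternating between the two mirrors climbs to full coupling.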

Optical Tables & Breadboards

Small vibrations (e.g. footsteps, motors, etc.) can perturb the delicate alignment of optical systems, hence all optical components need to be firmly bolted down to an optical table (possibly with the aid of ferromagnetic bases). The top and bottom layers of an optical table are usually manufactured from some grade of stainless steel perforated by a square lattice of \(\text{M}6\) threaded holes with lattice parameter \(\Delta x=25\text{ mm}\) (recall that \(\text{M}D\times L\) is the standard notation for a metric thread of outer diameter \(D\text{ mm}\) and length \(L\text{ mm}\) and typically one assumes the thread pitch \(\delta\text{ mm}\) is the coarsest/largest one that is standardized for that particular thread diameter \(D\) so that the helix winds \(N=L/\delta\) times around, although \(\delta\) could be finer/smaller too, see this reference). The exact engineering details of how an optical table seeks to critically damp external vibrations are interesting, involving the use of pneumatic legs and several layers of viscoelastic materials sandwiched between the steel layers in a rigid honeycomb structure.

Optical breadboards are basically just smaller, less fancy versions of an optical table, mainly used for prototyping and easier portability of a particular modular setup into some main optical table.

Acousto-Optic Modulators (AOMs)

An acousto-optic modulator (AOM), also known as an acousto-optic deflector (AOD), is at first glance similar to a diffraction grating for light in the sense that if one shines some incident plane wave from a laser through the hole in the AOM, then out comes an \(m=0\) order mode in addition to \(m=\pm 1\) and occasionally higher-order modes too (the exact distribution of intensities among these harmonics will depend very sensitively on the incident angle that one shines the laser light at into the AOM).

However, despite being superficially similar to a diffraction grating, there are some notable differences; the first is that the Fraunhofer interference pattern of a diffraction grating typically occurs via a (\(2\)-dimensional) screen with a bunch of slits on it; here a (\(3\)-dimensional!) volume Bragg grating (VBG) is used instead, which in practice means some kind of glass attached to a piezoelectric transducer that drives the glass (i.e. applies periodic stress to it) at some radio frequency \(f_{\text{ext}}\sim 100\text{ MHz}\) via an external RF driver. This induces a periodic modulation in the glass’s refractive index \(n=n(x)\) where the “period” \(\lambda_{\text{ext}}=c_{\text{glass}}/f_{\text{ext}}\) over which \(n(x+\lambda_{\text{ext}})=n(x)\) corresponds to the wavelength of the sound waves, where \(c_{\text{glass}}\) is the phase velocity of sound waves in the glass.

Provided the light is incident at the Bragg angle \(\theta_B\approx\sin\theta_B=\lambda/2\lambda_{\text{ext}}\), then one has an effective crystal with interplanar spacing \(\lambda_{\text{ext}}\) and so the Bragg condition yields the angular positions of the constructive maxima of the Brillouin scattering:

\[2\lambda_{\text{ext}}\sin\theta_m=m\lambda\]
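To get a feel for the scales involved, the Bragg condition can be evaluated numerically. The numbers below are illustrative assumptions rather than measured values from the lab: the sound speed is only a typical order of magnitude for an AOM crystal, and the optical wavelength is used as-is without correcting for the refractive index of the medium. The function names are hypothetical.

```python
import math

def acoustic_wavelength(v_sound, f_rf):
    """Grating period lambda_ext = c_glass / f_ext."""
    return v_sound / f_rf

def bragg_angle(order, wavelength, period):
    """Solve 2 * period * sin(theta_m) = m * wavelength for theta_m (radians)."""
    return math.asin(order * wavelength / (2 * period))

# Illustrative values only (assumptions, not lab measurements).
v_sound = 4.2e3       # m/s, assumed speed of sound in the crystal
f_rf = 110e6          # Hz, RF drive frequency
wavelength = 767e-9   # m, optical wavelength (index of the medium ignored)

period = acoustic_wavelength(v_sound, f_rf)
theta_1 = bragg_angle(1, wavelength, period)
print(f"period = {period * 1e6:.1f} um, theta_1 = {math.degrees(theta_1):.3f} deg")
```

With these assumed values the grating period is tens of micrometres and the first-order Bragg angle is well under a degree, which is consistent with the small-angle approximation \(\theta_B\approx\sin\theta_B\).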

In addition, whereas for ordinary light incident on a diffraction grating the wavelength and frequency don’t change after diffraction, here because the photons either absorb or emit a phonon quasiparticle (respectively \(m=\pm 1\) orders), they do also accrue a slight Doppler shift in the frequency. When an AOM is labelled as being \(110\text{ MHz}\) for instance, it does not mean that the only Doppler shifts it is able to provide are exactly \(\pm 110\text{ MHz}\) but rather the diffraction efficiency \(\eta\) is greatest at this frequency, with some FWHM bandwidth \(\delta f_{\text{ext}}\) around this. For instance, for \(2\) AOMs in the lab, the following frequency response efficiency curves were measured (for both single pass and double pass, the latter of which should roughly be the square of the former).

AOMs are commonly used in a double-pass configuration, which means that light is passed through, then passed back again exactly along the trajectory it came from. If the diffraction efficiency of the first order is \(\eta(\omega)<1\) at some frequency \(\omega=2\pi f\) ideally around the central \(\omega\) of the AOM (e.g. \(\omega=2\pi\times 110\text{ MHz}\)), then double-passing will lead to a reduced efficiency \(\eta^2(\omega)<\eta(\omega)\). Provided one picks out the right order (not always trivial to do, need to change the driving amplitude to see which order drops faster, and use geometrical ray optics arguments), then this allows accruing a Doppler shift of \(2f_{\text{ext}}\) without sacrificing too much efficiency (if one tried to get this from the \(m=2\) mode on a single pass, one would lose a lot of efficiency). AOMs are also commonly used for Q-switching in lasers (i.e. as glorified switches b/c they can switch on nanosecond time scales).
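The double-pass bookkeeping can be written down in a couple of lines. This sketch assumes the same diffraction order and the same single-pass efficiency on both passes; the function name and the example efficiency of 0.8 are hypothetical.

```python
def double_pass(eta_single, f_rf, order=+1):
    """Return (overall efficiency, total frequency shift in Hz) after two passes
    through the same diffraction order with equal efficiency on each pass."""
    return eta_single ** 2, 2 * order * f_rf

# Example with an assumed single-pass efficiency of 0.8 at a 110 MHz drive.
efficiency, shift = double_pass(0.8, 110e6)
print(efficiency, shift)
```

Choosing `order=-1` flips the sign of the total shift, corresponding to picking the other first-order beam.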

Laser (Toptica) with massive DLC Pro driver? Talk about how lasers work + lasing requirements

Notes on how Zoran’s lab works:

  • The UHV in the MOT and science cells are like \(10^{-11},10^{-13}\text{ mbar}\) respectively, measured by a current which is on the order of \(\text{nA}\) (but at such low pressures, with such few particles, one can argue that pressure fails to even be a well-defined quantity).
  • There are \(4\) AOM drivers for D1 cooling/repump and D2 cooling/repump light. Each has frequency, TTL, and amplitude control which need to be connected to analog channels like AO1, AO2, etc. which in turn are controlled in Cicero.
  • Laser goggles have certain wavelength ranges over which they block best. The ODT uses 767 nm red light, but the box trap uses 532 nm green light.
  • The Toptica laser controller is one component of a PID control loop.
  • First, saturated absorption spectroscopy (requires heat b/c K-39 needs to be in a gaseous form, b/c otherwise it is just K-39 liquid/solid sitting at the bottom of the tube; this is achieved by winding some coils around and passing large current through the coils and relying on the resultant Joule heating; for K-39 need around human body temperature? \(35-40^{\circ}\text{ C}\)) (the double-pass thing in the absorption cell) is used to get Doppler-free \(\lambda_{D1},\lambda_{D2}\) signals that are fed to photodiodes, which send this to the Toptica laser controller, which sends it to the Toptica software that’s used for laser locking.
  • Need to lock the laser b/c a piezoelectric crystal has some voltage applied to it that causes mechanical deformation, moving the distance b/w 2 mirrors, but over time it can drift due to temperature fluctuations, etc.
  • The photodiodes need to be powered (by old car battery in this case) and also a separate cable which feeds into Toptica laser controller (it is also this cable which has the extra resistor at its end…I think idea is that the photodiode converts absorption signal into a photocurrent that flows across the resistor, and gets converted into a voltage…note that it’s a BNC cable, and most BNC cables already have some internal resistance, so this resistor really is just an extra resistor which I guess is to decrease the “gain” in some sense?).
  • Kibble-Zurek mechanism?
  • Anything in the lab (e.g. PCs, soldering irons, vacuum pumps, all kettle plugs, etc.) connected to AC mains needs to be PAT tested.
  • There are \(4\) sets of coils in the experiment. In chronological order of use, they are:
  • Quadrupole field coils (both \(x\),\(y\) and \(z\)) for the MOT and magnetic trapping.
  • Guide field coils (to impose a quantization axis?) on MOT side for pumping and on the imaging side.
  • Feshbach (“Fesh”) field coils for the science cell (to exploit Feshbach resonance of hyperfine states in order to tune s-wave scattering length).
  • Compensation coils in \(x,y,z\) (the \(z\) compensation coil is also called “anti-\(g\)” coil for obvious reasons).
  • Speedy coils? For quantum quench experiments?
  • One of the coils cancels the curvature in the Feshbach coils.
  • Each of these coils obviously requires a very bulky power supply.
  • Igor’s thesis should contain more information about the coils.
  • The track (arm which moves the magnetically trapped atoms) has \(3\) states, START, MOVE, MOVE2, and ENERGIZE? There is a track control box connected to the analog channels which one can use to control how the track moves in Cicero during an experimental sequence.
  • Regarding water cooling of the experiment, the water is already pressurized, so adding a pump would only slow it down?
  • The pipes also contain flow meters which monitor the flow rate \(|\textbf v|\) of the water (not sure how?), and send this information to a logic circuit which also uses temperature control. It will suddenly stop all current flowing through the Feshbach coils if it detects that some thresholds are breached on both; thus, it behaves as a current-controlled switch, aka a transistor, and more precisely they are IGBTs (insulated-gate bipolar transistors) because it turns out only these transistors are rated for the kinds of currents being used here.
  • For all the coils, one frequently would like to switch them off suddenly. If you just do this directly, the significant inductance \(L\) of the coils will lead to a substantial back emf that would destroy the PSU. Hence the need for an alternative path for current to flow, which is why we also have a capacitor in parallel?
  • Apparently, the light inside an optical fiber can also heat the fiber enough to melt it…
  • There can be up to \(I\sim 200\text{ A}\) of current flowing through the Feshbach coils, with \(V=400\text{ V}\)…the whole circuit is low-resistance so if you touch probably not lethal but still better to be safe.
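The concern in the list above about switching the coils off suddenly can be made semi-quantitative with \(V=L\,dI/dt\) and the stored energy \(LI^2/2\). The inductance and switch-off time below are assumed placeholder values (only the \(200\text{ A}\) current comes from these notes), and the function names are hypothetical.

```python
def back_emf(inductance, current, switch_off_time):
    """Magnitude of V = L * dI/dt for a current ramped linearly to zero."""
    return inductance * current / switch_off_time

def stored_energy(inductance, current):
    """Magnetic energy (1/2) L I^2 that must go somewhere when the coil switches off."""
    return 0.5 * inductance * current ** 2

# Assumed placeholder values: L = 1 mH, switch-off in 100 us; I = 200 A as quoted above.
L_coil, I_coil, t_off = 1e-3, 200.0, 100e-6
print(back_emf(L_coil, I_coil, t_off), "V")
print(stored_energy(L_coil, I_coil), "J")
```

Even with these modest assumed values the induced voltage reaches the kilovolt scale, which is why the current needs an alternative path when the coils are switched off.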

The D1, D2 cooling and repump light must first get the required frequency shifts, then it all gets coupled simultaneously into a TA (amplifier) which should be seeded at all times, is externally controlled by a current knob \(I\) that dictates how much amplification \(A=A(I)\) it gives to the laser power. This is all then coupled into a polarization-maintaining optical fiber that goes into an optical fiber port cluster (FPC) (see the ChatGPT blurb about it) which is basically a compact setup of mirrors/lenses/polarizing beamsplitters (Chris says conceptually it’s not hard to build one yourself, just that save time with a company at the cost of double the price cf. self-building; similar remarks even apply to e.g. a laser which can be self-built and indeed many labs do that, just takes time). This then takes the incident light from the fiber and redistributes it into \(6\) beams of roughly equal power for the MOT (i.e. the “O” in “MOT”).

The MOT loading time \(\Delta t_{\text{load}}\) is the time to load the MOT from the vapor of K-39 atoms that sits at some background pressure \(p_0\) and temperature \(T_0\). Some exponential “charging curve” \(1-e^{-t/\Delta t_{\text{load}}}\)? And also, normally you gauge how well the MOT is working (and decide when you need to fire again) by measuring atom number in the BEC in the science cell. If the science cell isn’t working, what you can instead do is to measure an initial \(I_0\) from absorption spectroscopy, then do magnetic transport of the atoms to the science cell and back to the MOT, and measure \(I\); then the recapture efficiency of the MOT is \(I/I_0\).

Also in the science cell, one-body losses are very significant. Relative to the BEC, the thermal cloud around it is effectively an infinite-energy heat bath, so if any such atom collides with an atom in the BEC, it will remove it…(I guess thermalization is always happening, and at the microscopic/kinetic level what this looks like is precisely one-body losses).

One very effective practice/way to learn more about how any lab with a bunch of cables/wires works is to just trace/route wires, one at a time, to gain some sense for how different components are connected to each other.

General EQ Stuff

If you’re building a new machine/experiment, need to make the shop ppl’s life “living hell”, ask about stock available and be persistent, ask “can you get it to me by tomorrow”, etc. and don’t leave it to the point that they have to reach out to some more senior ppl etc, then stuff will never get done. Example in this case was for boards to enclose the perimeter of the optical table with, some were not right size so were looking for companies to get new ones from. Simon found a company and even more quickly found that they had a contact, so he just called them right away and got the order sorted out very efficiently.

Beer-Lambert Law & Radiative Broadening

In cold atom experiments, one very basic question one can ask is, given some atom cloud, what is the number of atoms \(N\) in the cloud? One way is to basically shine some light on the atom cloud and see how much is absorbed. This absorption effect is quantified by the Beer-Lambert law.

\[I(z)=I(0)e^{-n\sigma z}\]

where \(n=N/V\) is the number density of atoms in the cloud of volume \(V\) and \(\sigma=\sigma(\omega_{\text{ext}})\) is the optical absorption cross-section presented by each atom in the cloud to incident monochromatic light of frequency \(\omega_{\text{ext}}\).

It is instructive to derive the Beer-Lambert law from first principles. In particular, the derivation is meant to emphasize that, for the most part, one can basically just think of the Beer-Lambert law as a mathematical theorem about probabilities, with some quantum mechanical asterisks to that statement. To get a sense of this, consider first a \(2\)D version of the Beer-Lambert law, in which one has an atom cloud confined to a plane, along with an incident beam of photons of frequency \(\omega_{\text{ext}}\) travelling along the (arbitrarily defined) \(z\)-direction.

The (average) number density of atoms is \(n\) (units: \(\text{atoms}/\text m^2\)) and each atom can be thought of as a “hard circle” with diameter \(\sigma\) (units: \(\text m/\text{atom}\)). In that case, in a small strip of width \(dz\), there will be \(n\,dz\) atoms per unit length along the strip, or equivalently the average interatomic spacing is \(1/(n\,dz)\) along the strip (see the picture). The probability that a given photon “collides” with such an atom is therefore \(\sigma/(1/(n\,dz))=n\sigma\,dz\); such photons are depicted red on the diagram, while those that make it through the first layer \(dz\) are depicted green. Over many photons, this manifests as a loss \(dI<0\) in their collective intensity \(I\) across the layer \(dz\), so one may equate the fractional loss of intensity with the absorption probability:

\[\frac{dI}{I}=-n\sigma dz\]

Integrating this separable ODE yields the Beer-Lambert law:

\[I(z)=I(0)e^{-n\sigma z}\]

where \(1/n\sigma\) is the length scale of this exponential attenuation in the beam intensity. Of course, this argument generalizes readily to the \(3\)D case where now \(n\) (units: \(\text{atoms}/\text m^3\)) is the number density of atoms in \(\textbf R^3\) and \(\sigma\) (units: \(\text m^2/\text{atom}\)) is now the optical cross-section presented by each atom. As stressed earlier, there isn’t really much physics going on here, it’s just a statement about the statistics of a \(3\)D Galton board.
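To make the "it's just statistics" point concrete, here is a minimal Monte Carlo sketch: each photon passes through a stack of thin layers and is absorbed in each one with probability \(n\sigma dz\). The function name `simulate_transmission` and all parameter values are hypothetical choices for illustration, and the product \(n\sigma\) is passed in as a single number.

```python
import math
import random


def simulate_transmission(n_sigma, depth, layers=200, photons=10_000, seed=0):
    """Fraction of photons surviving a slab, treating each thin layer as an
    independent absorption trial with probability n*sigma*dz."""
    rng = random.Random(seed)
    p_absorb = n_sigma * depth / layers  # per-layer probability, must be << 1
    survivors = 0
    for _ in range(photons):
        for _ in range(layers):
            if rng.random() < p_absorb:
                break  # photon absorbed in this layer
        else:
            survivors += 1  # photon made it through every layer
    return survivors / photons


# Illustrative values: one attenuation length of material (n*sigma*z = 1).
transmission = simulate_transmission(n_sigma=1.0, depth=1.0)
beer_lambert = math.exp(-1.0)
print(transmission, beer_lambert)
```

Up to statistical noise from the finite number of photons, the simulated surviving fraction should track \(e^{-n\sigma z}\), with no physics input beyond independent absorption trials.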

At this point however, one would like to introduce some quantum mechanical modifications to this simple Beer-Lambert law. As usual, suppose the laser light \(\omega_{\text{ext}}\) is not too detuned from a particular atomic transition \(\omega_{01}\) between some ground state \(|0\rangle\) and some excited state \(|1\rangle\) in each of the atoms in the cloud (also assume for simplicity that both \(|0\rangle\) and \(|1\rangle\) are non-degenerate). In that case, it makes sense to distinguish \(n=n_0+n_1\) between the number density \(n_0\) of atoms in the ground state \(|0\rangle\) vs. the number density \(n_1\) of atoms in the excited state \(|1\rangle\) since only the atoms in the ground state \(|0\rangle\) can absorb the incident photons, after which they go into the excited state \(|1\rangle\) and so are no longer able to absorb any more photons. Thus, one might think that the correct form of the Beer-Lambert law should be:

\[\frac{dI}{I}=-n_0\sigma dz\]

But this is forgetting that atoms in the excited state \(|1\rangle\) can undergo stimulated emission too back down to the ground state \(|0\rangle\) (and in the steady state, recall from Einstein’s statistical argument that the rates of stimulated absorption and emission are equal). In contrast to absorption, this would have the effect of actually increasing the intensity \(I\) because the atom emits a photon back into the beam. Thus, the correct form of the Beer-Lambert law is actually:

\[\frac{dI}{I}=-n_0\sigma dz+n_1\sigma dz=(n_1-n_0)\sigma dz\]

where by time-reversal symmetry the optical cross-section \(\sigma\) is the same for both stimulated absorption and emission. In the steady state (i.e. when the populations have reached an equilibrium \(\dot n_0=\dot n_1=0\)), the net power absorbed from the beam must balance the power spontaneously emitted, so one must also have \((n_0-n_1)\sigma I=n_1\Gamma\hbar\omega_{\text{ext}}\) where \(\Gamma=A_{10}\) is the rate of spontaneous emission/decay from the excited state \(|1\rangle\) back down to the ground state \(|0\rangle\) (note that it really is \(\hbar\omega_{\text{ext}}\) and not \(\hbar\omega_{01}\) in the formula; whatever frequency an atom absorbs must also be what it emits by energy conservation). On the other hand, also in the steady state, the optical Bloch equations assert that:

\[\rho_{11}=\frac{n_1}{n}=\frac{1}{2}\frac{s}{1+s+(2\delta/\Gamma)^2}\]

where \(s=I/I_{\text{sat}}=2(\Omega/\Gamma)^2\) is the saturation. Combining these two expressions allows one to obtain an explicit formula for how the optical cross-section \(\sigma\) depends on the “driving frequency” \(\omega_{\text{ext}}\) of the incident photons in e.g. a laser:

\[\sigma(\omega_{\text{ext}})=\frac{1}{1+(2\delta/\Gamma)^2}\frac{\hbar\omega_{\text{ext}}\Omega^2}{\Gamma I}\]

where there is also an \(\omega_{\text{ext}}\)-dependence hiding in the detuning \(\delta=\omega_{\text{ext}}-\omega_{01}\). At first glance, this seems to suggest that the optical cross-section \(\sigma\), in addition to depending on \(\omega_{\text{ext}}\) also depends on the intensity \(I\) of the incident photons, but actually this is an illusion, because the Rabi frequency \(\Omega\) also depends on \(I\) in such a way that the two effects cancel out so as to actually make \(\sigma\) independent of \(I\). To see this, recall that the time-average of the Poynting vector over a period \(2\pi/\omega_{\text{ext}}\) is \(I=\varepsilon_0 c|\textbf E_0|^2/2\) and that the Rabi frequency is \(\hbar\Omega=e\textbf E_0\cdot \langle 1|\textbf X|0\rangle\). The unsightly presence of the matrix element can be further removed by recalling that (in the dipole approximation) one has \(\Gamma=4\alpha\omega_{01}^3|\langle 1|\textbf X|0\rangle|^2/3c^2\). Therefore, in the best case where the incident light is polarized along the dipole moments of the atoms, then \(\Omega^2=e^2|\textbf E_0|^2|\langle 1|\textbf X|0\rangle|^2/\hbar^2\). If on the other hand the incident light were unpolarized or the atoms in the cloud were randomly oriented, then isotropic averaging would contribute an additional factor of \(1/3\):

\[\langle\cos^2\theta\rangle_{S^2}=\frac{1}{4\pi}\int_0^{2\pi}d\phi\int_0^{\pi}d\theta\cos^2\theta\sin\theta=\frac{1}{3}\]

Sticking to the best case scenario (which can be thought of as an upper bound if one likes though it is experimentally the typical situation since one often tries to maximize \(\sigma\) anyways), this leads to the explicitly \(I\)-independent form of the optical cross-section:

\[\sigma(\omega_{\text{ext}})=\frac{1}{1+(2\delta/\Gamma)^2}\frac{6\pi\omega_{\text{ext}}c^2}{\omega_{01}^3}\]

so the optical cross-section takes its maximum value at \(\omega_{\text{ext}}=\sqrt{\omega_{01}^2+(\Gamma/2)^2}\) but because the line width \(\Gamma\ll\omega_{01}\) is typically much less than the transition frequency itself, this is basically just \(\omega_{\text{ext}}\approx \omega_{01}\) so the maximum cross-section \(\sigma_{01}\) occurs on resonance and is given by:

\[\sigma_{01}=\sigma(\omega_{01})=\frac{6\pi c^2}{\omega_{01}^2}=\frac{3\lambda_{01}^2}{2\pi}\]

This also allows one to approximate the spectrum of the optical cross-section \(\sigma\) as just a Lorentzian profile centered at \(\omega_{\text{ext}}\approx \omega_{01}\) with \(\Gamma\) being its FWHM:

\[\sigma(\omega_{\text{ext}})\approx\frac{\sigma_{01}}{1+(2\delta/\Gamma)^2}\]

Typical transition wavelengths (e.g. visible light) might be around \(\lambda_{01}\sim 10^{-7}\text{ m}\) which far exceeds the length scale \(\sim a_0\sim 10^{-11}\text{ m}\) of the individual atoms themselves. The corresponding optical cross-section \(\sigma_{01}\sim\lambda_{01}^2\) is thus much larger than the actual “size” of the atoms themselves, so this emphasizes another quantum mechanical discrepancy to the classically-minded picture where \(\sigma\) would have just been interpreted as the size of individual “hard sphere” atoms (and in that case it wouldn’t have any \(\omega_{\text{ext}}\)-dependence in the first place). Moreover, the fact that near resonance \(\sigma\) is much larger than the atoms themselves also helps to ensure laser cooling actually works since it gives each photon more “leeway” in that it doesn’t need to hit an atom “head-on” to be absorbed, but merely has to pass within the cross-section \(\sigma\).
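To put numbers on this comparison, the following sketch evaluates \(\sigma_{01}=3\lambda_{01}^2/2\pi\) and the Lorentzian profile. The wavelength and linewidth are assumed, illustrative values roughly appropriate to the potassium D\(2\) line (they are not taken from the text above), and the function names are hypothetical.

```python
import math

A0 = 5.29177e-11  # Bohr radius in metres


def resonant_cross_section(wavelength):
    """Resonant two-level cross-section sigma_01 = 3*lambda^2/(2*pi)."""
    return 3.0 * wavelength**2 / (2.0 * math.pi)


def cross_section(detuning, linewidth, wavelength):
    """Lorentzian approximation sigma_01 / (1 + (2*delta/Gamma)^2)."""
    return resonant_cross_section(wavelength) / (1.0 + (2.0 * detuning / linewidth) ** 2)


# Assumed illustrative values (not from the text): potassium D2 line.
wavelength = 766.7e-9            # metres
linewidth = 2 * math.pi * 6.0e6  # rad/s

sigma_01 = resonant_cross_section(wavelength)
half_max = cross_section(linewidth / 2, linewidth, wavelength)
print(f"sigma_01          = {sigma_01:.3e} m^2")
print(f"sigma_01 / a0^2   = {sigma_01 / A0**2:.2e}")
print(f"sigma(Gamma / 2)  = {half_max:.3e} m^2")
```

Detuning by \(\Gamma/2\) halves the cross-section, consistent with \(\Gamma\) being the FWHM, and the ratio \(\sigma_{01}/a_0^2\) shows by how many orders of magnitude the resonant cross-section exceeds the geometric size of the atom.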

Intensity Saturation & Broadening

At low incident intensities \(s\ll 1\), spontaneous emission dominates stimulated absorption/emission \(\Gamma\gg\Omega\) and so any atom which is excited from the ground state \(|0\rangle\) into the excited state \(|1\rangle\) will quickly decay back down to the ground state \(|0\rangle\) by spontaneous emission. However, as one ramps up the laser intensity to saturation \(s\to 1\) and even \(s>1\), although there is a cap \(\rho_{11}<1/2\) on the excited state population, nevertheless the ground state population \(\rho_{00}\to 1/2\) will have depleted so much that there won’t be that many atoms left to absorb any more incident photons, so one would expect the sample to get worse and worse at absorbing incident photons. Recalling that \(s=I/I_{\text{sat}}=2(\Omega/\Gamma)^2\) (note that in the optimal case \(I_{\text{sat}}=\hbar\omega_{01}^3\Gamma/12\pi c^2\) but importantly is an intrinsic property of the atomic transition that scales with the transition frequency as \(I_{\text{sat}}\propto\omega_{01}^6\) due to the extra factor of \(\omega_{01}^3\) in \(\Gamma\)), it is clear that when \(s\to 1\), the Rabi frequency \(\Omega\) grows to the point of being comparable with the spontaneous decay rate \(\Gamma\), so now stimulated emission starts competing with spontaneous emission. In order to see this mathematically, it is useful to look at the absorption coefficient whose reciprocal directly governs the length scale of attenuation in the Beer-Lambert law:

\[(n_0-n_1)\sigma=\frac{n\sigma_{01}}{1+s}\frac{1}{1+\left(\frac{2\delta}{\Gamma\sqrt{1+s}}\right)^2}\]

This is just another Lorentzian similar to the cross-section \(\sigma(\omega_{\text{ext}})\) itself. But there’s a crucial difference; whereas the FWHM of the Lorentzian for \(\sigma\) was fixed at \(\Gamma\), here it is \(\Gamma\sqrt{1+s}\); but this is now dependent on the laser intensity \(s\), causing the Lorentzian to broaden as \(s\) increases (this is exactly the same kind of broadening seen in \(\rho_{11}\); one difference though is that while \(\rho_{11}\to 1/2\) saturates, here the resonant absorption coefficient \(n\sigma_{01}/(1+s)\) just decreases monotonically as \(s\) is ramped up).
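The broadening can be checked numerically with a small sketch that scans the detuning until the absorption coefficient drops to half its peak. It works in units where \(\Gamma=1\) and \(n\sigma_{01}=1\); the helper names `absorption_coefficient` and `fwhm` are hypothetical.

```python
import math


def absorption_coefficient(delta, s, gamma=1.0, n_sigma01=1.0):
    """(n0 - n1)*sigma = n*sigma_01 / (1 + s + (2*delta/gamma)^2)."""
    return n_sigma01 / (1.0 + s + (2.0 * delta / gamma) ** 2)


def fwhm(s, gamma=1.0, step=1e-4):
    """Full width at half maximum, found by scanning the detuning outward."""
    peak = absorption_coefficient(0.0, s, gamma)
    delta = 0.0
    while absorption_coefficient(delta, s, gamma) > peak / 2.0:
        delta += step
    return 2.0 * delta


for s in (0.0, 1.0, 3.0, 8.0):
    # Columns: saturation, scanned FWHM, expected gamma*sqrt(1+s)
    print(s, round(fwhm(s), 3), math.sqrt(1.0 + s))
```

The scanned width should agree with \(\Gamma\sqrt{1+s}\) to within the scan step, while the peak value falls as \(1/(1+s)\).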

Finally, one can revisit the original Beer-Lambert law \(I(z)=I_0e^{-n\sigma z}\) and ask what becomes of it after all the modifications; from the expression for the absorption coefficient above, one has:

\[\frac{ds}{dz}=-\frac{n\sigma_{01}s}{1+s+(2\delta/\Gamma)^2}\]

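In terms of the line-of-sight atomic column density \(n_c(z):=\int_0^zn(z')dz'\), this ODE is separable and straightforward to integrate (on resonance \(\delta=0\) the prefactor of the logarithm is just \(1\)):

\[n_c\sigma_{01}=\left(1+(2\delta/\Gamma)^2\right)\ln\frac{I_0}{I}+\frac{I_0-I}{I_{\text{sat}}}\]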

where \(I_0:=I(z=0)\) is the incident irradiance. The quantity \(\ln I_0/I\) is often called the optical density (OD) in AMO physics, or the absorbance in chemistry. In practice, this formula cannot just be used as is, but rather requires calibrating for the polarization, detuning fluctuations, optical pumping losses, etc. by sweeping over a range of incident intensities \(I_0\) and, using some known atom number \(N=n_cA\) obtained by other methods, choosing \(I_{\text{sat}}\) so that \(n_c\sigma_{01}\) is approximately invariant for all \(I_0\) and corresponding \(I\).
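As a sanity check of the integrated formula, the sketch below propagates the saturation parameter through a cloud of known resonant optical depth \(n_c\sigma_{01}\) by integrating the ODE numerically, and then recovers that depth from the incident and transmitted intensities alone. It assumes resonance (\(\delta=0\)); the helper names and numerical values are hypothetical.

```python
import math


def propagate(s0, optical_depth, steps=2000):
    """Integrate ds/d(tau) = -s/(1+s) on resonance with RK4, where
    tau = n_c*sigma_01 runs from 0 to optical_depth."""
    def rhs(s):
        return -s / (1.0 + s)

    h = optical_depth / steps
    s = s0
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h * k2)
        k4 = rhs(s + h * k3)
        s += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return s


def resonant_optical_depth(s_in, s_out):
    """n_c*sigma_01 = ln(I0/I) + (I0 - I)/I_sat, with s = I/I_sat."""
    return math.log(s_in / s_out) + (s_in - s_out)


true_depth = 2.5  # illustrative value of n_c*sigma_01
for s_in in (0.01, 1.0, 10.0):
    s_out = propagate(s_in, true_depth)
    print(s_in, s_out, resonant_optical_depth(s_in, s_out))
```

At low saturation the logarithmic (optical density) term dominates, whereas at high saturation the linear term \((I_0-I)/I_{\text{sat}}\) carries most of the signal; in both regimes the sum should return the same column density.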


Oversaturated Absorption Imaging of Atomic Clouds

The purpose of this post is to describe the relevant theory needed to understand the paper “High signal to noise absorption imaging of alkali atoms at moderate magnetic fields” by Hans et al. In particular, a key paper which they cite that details the calibration of the absorption imaging setup is “Strong saturation absorption imaging of dense clouds of ultracold atoms” by Reinaudi et al. Another useful resource is the PhD dissertation of Hans which goes into more depth on details that are omitted in their paper.

Atomic Structure of \(^{39}\text K\)

The alkali atom isotope \(^{39}\text K\) has fixed, non-negotiable electron spin \(s=1/2\) and nuclear spin \(i=3/2\); hence it is bosonic \(s+i=2\). Within the gross \(n\)-manifold for \(n=4\), consider either the \(4s_{1/2}\) or \(4p_{1/2}\) fine \(j\)-manifolds for \(j=1/2\). In both cases, there are two hyperfine \(f\)-manifolds corresponding to total atomic angular momenta \(f=1,2\). In the strict absence \(\textbf B=\textbf 0\) of an external magnetic field, the \(f=1\) hyperfine manifold has \(3\) degenerate \(m_f\)-sublevels corresponding to projections \(m_f=-1,0,1\) of the total atomic angular momentum along some arbitrary \(z\)-axis, while the \(f=2\) hyperfine manifold has \(5\) degenerate \(m_f\)-sublevels corresponding to \(m_f=-2,-1,0,1,2\). However, upon turning \(\textbf B\neq\textbf 0\) on with \(B:=|\textbf B|\), the Breit-Rabi formula asserts that the \(2f+1\)-fold degeneracy among the Zeeman sublevels within each hyperfine \(f\)-manifold is lifted exactly according to the trajectories:

\[\Delta E_{|f=3/2\pm 1/2,m_f\rangle}(B)=-\frac{A}{4}\pm\frac{1}{2}\sqrt{4A^2+2m_fAg_j\mu_BB+(g_j\mu_BB)^2}\]

where \(g_j=2\) for \(4s_{1/2}\) and \(g_j=2/3\) for \(4p_{1/2}\), and \(A\approx h\times 230.859860\text{ MHz}\) for \(4s_{1/2}\) whereas \(A\approx h\times 27.793\text{ MHz}\) for \(4p_{1/2}\) (see the data for \(^{39}\text K\) here).
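A minimal sketch of evaluating this formula numerically for the \(4s_{1/2}\) manifold follows. Energies are expressed as frequencies (divided by \(h\)) in MHz, the nuclear \(g\)-factor is neglected as in the formula above, and the function name `breit_rabi` is hypothetical. One subtlety is handled explicitly as an assumption of this sketch: for the stretched states \(m_f=\pm 2\) the square root is a perfect square and is taken as \(2A\pm g_j\mu_BB\) so that those levels remain linear in \(B\) at all fields.

```python
import math

MU_B_OVER_H = 1.39962449  # Bohr magneton divided by h, in MHz per gauss


def breit_rabi(m_f, upper, b_gauss, a_mhz=230.859860, g_j=2.0):
    """Energy shift in MHz for i = 3/2, j = 1/2.

    upper=True selects the f = 2 branch (+ sign), upper=False the f = 1
    branch (- sign). Defaults correspond to the 4s_1/2 values quoted above.
    """
    x = g_j * MU_B_OVER_H * b_gauss  # g_j * mu_B * B / h in MHz
    if abs(m_f) == 2:
        # Stretched states: the radicand is (2A +/- x)^2, keep the linear branch.
        root = 2 * a_mhz + math.copysign(x, m_f)
    else:
        root = math.sqrt(4 * a_mhz**2 + 2 * m_f * a_mhz * x + x**2)
    sign = 1.0 if upper else -1.0
    return -a_mhz / 4 + sign * root / 2


for b in (0.0, 395.0):  # zero field and an illustrative high field, in gauss
    lower = [round(breit_rabi(m, False, b), 2) for m in (-1, 0, 1)]
    upper = [round(breit_rabi(m, True, b), 2) for m in (-2, -1, 0, 1, 2)]
    print(b, lower, upper)
```

At \(B=0\) the two manifolds are separated by \(2A\), and at low field the \(f=1\) levels shift with slope \(-m_f\mu_BB/2\), i.e. an effective \(g_f=-1/2\).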

  • Intuitively, the reason why the \(m_f\) sublevels seem to be inverted in the \(4s_{1/2}\), \(f=1\) hyperfine manifold is that \(g_f=-1/2<0\) is negative and to first-order the Zeeman perturbation is \(g_fm_f\mu_BB\) (originally \(-\boldsymbol{\mu}\cdot\textbf B\) but \(q=-e\)).
  • 2D scan of \(I_{\sigma^+}+I_{\sigma^-}\) vs. \(I_{\sigma^+}/I_{\sigma^-}\), in an ideal world the measured OD should be constant across the entire space (as measured at low-field), but
  • AOM driver right now is just being controlled by essentially varying a potentiometer \(R_2\) which controls the voltage at the midpoint of a voltage divider, which is fed into a voltage oscillator circuit that effectively maps \(V\to\omega_{\text{ext}}\) to RF-drive the AOM with. By flicking the switch, voltage divider circuit is no longer controlling it, instead it’s externally controlled by a computer in the Cicero Word Generator GUI for AMO physics experiments.
  • The natural line width of the RF transition (on the order of 400 MHz) between potassium-39 hyperfine states is practically zero compared with that of optical/visible light (hundreds of THz) transitions because \(\Gamma\propto\omega_{01}^3\).
  • Need to first lock onto the right B-field (395 G) by doing a frequency sweep. Then, once that is locked onto, need to impose the correct frequency shifts on the AOMs (have a substantial line width/leeway here, like 6 MHz or something?); this will require a second frequency sweep to find the max SNR, centered around roughly where we expect it to be located anyways (show calculation for this).
  • Panos’s thesis did pixel-by-pixel calibration.
  • https://www.tobiastiecke.nl/archive/PotassiumProperties.pdf
  • Description of the experiment:
  • The idea is that one would like to do spin-resolved polaron injection spectroscopy.
  • D\(1\) repump light is from \(4s_{1/2}\) manifold (typically use \(|1,1\rangle\) for its broad Feshbach resonance) to \(4p_{1/2}\) manifold \(|2,2\rangle\). D\(2\) imaging light is from \(4s_{1/2}\) to \(4p_{3/2}\) stretched state \(|3,3\rangle\).
  • The D\(2\) laser light is first incident on a \(\lambda/2\) waveplate which rotates the polarizations so that some go into each arm of a double-pass AOM. It is first passed into an AOM double-pass setup to get \(\pm 220\text{ MHz}\). These then are incident on a D\(2\) flip mirror which redirects this D\(2\) light into the modular optical breadboard setup we built. Specifically, the crossed polarizations are incident on a \(\lambda/2\) waveplate that rotates a certain amount of polarization into each of two double-pass AOM arms. One branch is additive by \(220\text{ MHz}\) in total (after double-pass) while the other branch is subtractive \(-220\text{ MHz}\), so when aligning it is essential to maximize the correct order \(m=\pm 1\), and to check this by turning on the TTL switch of the driver to see which order is left just before the iris. These then need to be overlapped onto an output fiber, with another \(\lambda/2\) waveplate onto a PBS which will throw away \(P/2\) but at the benefit of having a single polarization propagating through your polarization-maintaining fiber and directly into the science cell. This waveplate also allows optimizing \(I_{\sigma^+}/I_{\sigma^-}\).
  • The AOM drivers are controlled by a digital channel for using Cicero to do TTL switching and also an analog channel for using Cicero to change driving amplitude of the AOMs (Janet for the \(-220\text{ MHz}\) and Billy for \(+220\text{ MHz}\)). In Cicero, the Override option for the D\(2\) flip mirror needs to be checked, but value is off for it to be down. Also, when overriding a digital channel, it is automatic, but when overriding an analog channel, need to specifically say so.
  • If one wishes to abort a given sequence, best to tick the box, and when the sequence is finished (usually around \(30\) seconds), to quickly close it and click “restart sequence” to start up a new sequence or something (to keep coils heated).
  • There are quadrupole coils (seem like 4 pairs?) in an anti-Helmholtz configuration for the MOT, Feshbach coils for the broad \(|1,1\rangle\) Feshbach resonance field to tune \(a\) (there are some empirical correlations in Cicero between the applied voltage in the coils and the corresponding \(B\to a\) you get out of it).
  • Optical dipole trap (ODT), the light for that is the dangerous IR (power is 1 Watt, can even burn your skin).
  • “Walking the beam” (draw schematic) by turning say \(\phi_1\) and seeing which direction \(\phi_2\) needs to go to keep at the same voltage, doing same for \(\theta_1,\theta_2\)…adjusting collimation at the end.
  • Fiber pen, fiber cleaning kit (microscope, never look into it if the other end of fiber is coupled to light or will go blind).
  • To actually make the optical box trap of green light, shine light onto a spatial light modulator (SLM, which is a bunch of liquid crystals applying some phase and stuff, a Freedericksz transition?, a bit like a DMD except it rotates slower so its response time is rather poor). Box is not a perfect cylinder, it is more like the waist of a Gaussian beam (length of \(40\) microns or so is Rayleigh distance \(z_R\)), and the sides are given by steep power law potentials. A bunch of lenses of various \(f\) act like Fourier transformers, etc. so that light field at focal plane is Fraunhofer pattern of SLM grating.
  • When locking onto say the D\(2\) laser, have an absorption cell of solid \(\text K(s)\) with melting point around \(40^{\circ}\text{ C}\). Doppler-free spectroscopy allows measuring . derivative is physically measured, two PID controllers for different time scales used to

Ideal Fermi Gases

Problem: Define an ideal Fermi gas.

Solution: A non-interacting collection of identical fermions (e.g. electrons \(e^-\), neutrons \(n^0\), etc.). Mathematically, the “Fermi” part says that the state space is the antisymmetric subspace \(\mathcal H=\bigwedge^N\left(L^2(\mathbf R^3)\otimes\mathbf C^{2s+1}\right)\) while the “ideal” part says that the Hamiltonian of the ideal Fermi gas does not contain any pairwise interaction potential \(V_{\text{int}}=0\) (note that “Pauli repulsion” or “exclusion pressure” mimics but is not an interaction):

\[H=\sum_{i=1}^N\left(\frac{|\mathbf p_i|^2}{2m}+V_{\text{ext}}(\mathbf x_i)\right)\]

Problem: Henceforth, assume \(V_{\text{ext}}=0\), so that \(H=T\) is purely kinetic. When it comes to analyzing the quantum statistics of the ideal Fermi gas, explain why it is advantageous to work in the grand canonical ensemble. Hence, obtain the Fermi-Dirac distribution \(\langle N_{|\mathbf k\rangle}\rangle\) of single-fermion \(|\mathbf k\rangle\)-state occupation numbers for an ideal Fermi gas.

Solution: Since \(H=T\) is purely kinetic, the \(H\)-eigenstates are just Slater determinants of distinct single-particle plane wave spinors \(\sqrt{N!}\mathcal A_N|\mathbf k_1,\sigma_1\rangle\otimes\dots\otimes|\mathbf k_N,\sigma_N\rangle\) where \(\langle\mathbf x|\mathbf k\rangle=e^{i\mathbf k\cdot\mathbf x}/\sqrt{V}\), with energy eigenvalues:

\[H\mathcal A_N|\mathbf k_1,\sigma_1\rangle\otimes\dots\otimes|\mathbf k_N,\sigma_N\rangle=\sum_{i=1}^N\frac{\hbar^2|\mathbf k_i|^2}{2m}\mathcal A_N|\mathbf k_1,\sigma_1\rangle\otimes\dots\otimes|\mathbf k_N,\sigma_N\rangle\]

If one were working in the canonical \((N,V,T)\) ensemble, evaluating the canonical partition function would require a sum over all \(N\)-fermion states of the form above which (if one actually takes a moment to think about it) is just a combinatorial nightmare to deal with (this is also not much better than asking how many \(N\)-fermion states there are with a given energy \(E\) as in the microcanonical \((N,V,E)\) ensemble). The upshot is that working in the grand canonical ensemble is ideal because one is then free to let \(N\) vary, so that the occupation number \(N_k\in\{0,1\}\) of each single-fermion state can be summed independently and the grand partition function factorizes:

\[\mathcal Z=\prod_{|\mathbf k\rangle}\sum_{N_k=0,1}e^{-\beta(N_kE_k-\mu N_k)}=\prod_{|\mathbf k\rangle}\left(1+e^{-\beta(E_k-\mu)}\right)\]

From which the grand canonical potential is:

\[\Phi=-\frac{1}{\beta}\ln\mathcal Z=-\frac{1}{\beta}\sum_{|k\rangle\in\mathcal H_0}\ln\left(1+e^{-\beta(E_k-\mu)}\right)\]

And the average number of fermions is:

\[\langle N\rangle=-\frac{\partial\Phi}{\partial\mu}=\sum_{|k\rangle\in\mathcal H_0}\frac{1}{e^{\beta(E_k-\mu)}+1}\]

from which one immediately reads off the Fermi-Dirac distribution of the Fermi occupation numbers of each of the single-fermion states \(|k\rangle\):

\[\langle N_k\rangle=\frac{1}{e^{\beta(E_k-\mu)}+1}\]

It is remarkable that a mere sign change in the denominator from the Bose-Einstein distribution is all that is needed to enforce the Pauli exclusion principle. Unlike for the ideal Bose gas where the chemical potential \(\mu<0\) had to be negative, for the Fermi-Dirac distribution \(\mu\in\textbf R\) can be anything.
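The step from the grand potential to the occupation numbers can be checked numerically. The sketch below uses a made-up ladder of single-particle energies (purely illustrative) and compares \(-\partial\Phi/\partial\mu\), estimated by a finite difference, against the sum of Fermi-Dirac occupations; all names and values are hypothetical.

```python
import math


def grand_potential(mu, energies, beta):
    """Phi = -(1/beta) * sum_k ln(1 + exp(-beta*(E_k - mu)))."""
    return -sum(math.log1p(math.exp(-beta * (e - mu))) for e in energies) / beta


def fermi_dirac(energy, mu, beta):
    """Mean occupation 1 / (exp(beta*(E - mu)) + 1)."""
    return 1.0 / (math.exp(beta * (energy - mu)) + 1.0)


energies = [0.1 * k for k in range(50)]  # hypothetical single-particle levels
mu, beta, h = 2.0, 3.0, 1e-5

# <N> = -dPhi/dmu, estimated with a central finite difference.
n_from_phi = -(grand_potential(mu + h, energies, beta)
               - grand_potential(mu - h, energies, beta)) / (2 * h)
n_from_fd = sum(fermi_dirac(e, mu, beta) for e in energies)
print(n_from_phi, n_from_fd)
```

The two numbers should agree up to the finite-difference error, and every individual occupation stays between \(0\) and \(1\) regardless of the sign of \(\mu\).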

Just as with the ideal Bose gas, for an ideal Fermi gas one would like to approximate the series with integrals (called the Thomas-Fermi approximation) \(\sum_{|k\rangle\in\mathcal H_0}\mapsto\int_0^{\infty}g(E)dE\). Taking the ideal Fermi gas to be non-relativistic, one has the density of states:

\[g(E)=\frac{g_sm^{3/2}V}{\sqrt{2}\pi^2\hbar^3}\sqrt{E}\]

where \(g_s=2s+1\) is a spin degeneracy factor (which has to be explicitly included for fermions by virtue of the spin-statistics theorem \(s=1/2,3/2,5/2,\dots\) and the fact that the free Hamiltonian \(H=T\) commutes with \(\textbf S^2\)). In the grand canonical ensemble, one thus has for an ideal Fermi gas:

\[\Phi=\frac{g_sV}{\beta\lambda^3}\text{Li}_{5/2}(-z)\]

\[\langle N\rangle=-\frac{g_sV}{\lambda^3}\text{Li}_{3/2}(-z)\]

\[\langle E\rangle=-\frac{3g_sV}{2\beta\lambda^3}\text{Li}_{5/2}(-z)\]

from which one obtains \(pV=\frac{2}{3}E\) for an ideal Fermi gas as was the case for the ideal Bose gas (and the ideal classical gas). In the high-temperature \(T\to\infty\) limit \(z\to 0\), one finds that, similar to the ideal Bose gas, the ideal Fermi gas looks like an ideal classical gas, at least to first order in the virial expansion (at second order, the quantum correction actually increases the pressure of the ideal Fermi gas whereas it was decreasing for the ideal Bose gas):

\[pV=NkT\left(1+\frac{\lambda^3N}{4\sqrt{2}g_sV}+O\left(\left(\frac{N}{V}\right)^2\right)\right)\]
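The leading quantum correction can be verified numerically from the polylogarithm expressions, using the power series \(-\text{Li}_s(-z)=\sum_{k\geq 1}(-1)^{k+1}z^k/k^s\) valid for \(|z|<1\). This is only a sketch at one illustrative small fugacity; the helper name `fermi_function` is hypothetical.

```python
import math


def fermi_function(s, z, terms=60):
    """f_s(z) = -Li_s(-z) = sum_{k>=1} (-1)^(k+1) z^k / k^s for |z| < 1."""
    return sum((-1) ** (k + 1) * z**k / k**s for k in range(1, terms + 1))


z = 0.05                                           # small fugacity (high temperature)
degeneracy = fermi_function(1.5, z)                # equals lambda^3 N / (g_s V)
exact_ratio = fermi_function(2.5, z) / degeneracy  # pV / (N k T)
virial_ratio = 1.0 + degeneracy / (4.0 * math.sqrt(2.0))
print(exact_ratio, virial_ratio)
```

The ratio \(pV/NkT\) comes out slightly above \(1\), confirming that the Fermi statistics raise the pressure relative to the classical ideal gas, and the truncated virial expansion should reproduce it up to higher-order terms in \(\lambda^3N/g_sV\).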

In order to see more interesting, non-classical physics, it will as usual be necessary to look in the low-temperature limit \(T\to 0,z\to\infty\) (since \(z=e^{\beta\mu}\) and \(\mu>0\) at low temperature). In fact, to start, one may as well look directly at the case of absolute zero \(T=0\). In this case, the ideal Fermi gas is said to be degenerate. At a glance, this is because the Fermi-Dirac distribution for the Fermi occupation numbers reduces to a top-hat filter:

\[N_k=\frac{1}{e^{\beta(E_k-\mu)}+1}=[E_k<\mu]\]

One can define the Fermi energy by \(E_F:=\mu(T=0)\) so that states \(|k\rangle\) with \(\hbar^2k^2/2m<E_F\) lying in the Fermi sea are fully occupied (i.e. have Fermi occupation number of \(N_k=1\)) while states \(|k\rangle\) with \(\hbar^2k^2/2m>E_F\) lying beyond the Fermi surface are completely empty. This definition of the Fermi energy \(E_F\) is strictly speaking a bit misleading since in the grand canonical ensemble \(\mu\) and \(T\) are independent and fixed while \(N\) fluctuates; in practice \(N\) is fixed and \(\mu\) adjusts with \(T\) so as to keep it fixed, so that working in the grand canonical ensemble is just a mathematical convenience. Therefore, it would make more sense to express/define \(E_F\) in terms of the fixed number \(N\) of fermions in the degenerate ideal Fermi gas:

\[N=\sum_{|k\rangle\in\mathcal H_0}N_k=\int_0^{\infty}[E<E_F]g(E)dE=\int_0^{E_F}g(E)dE\Rightarrow E_F=\frac{\hbar^2}{2m}\left(\frac{6\pi^2 N}{g_sV}\right)^{2/3}\]

This is of course related to the Fermi momentum and Fermi temperature by \(E_F=\hbar^2k_F^2/2m=kT_F\). The Fermi temperature \(T_F\) for the ideal Fermi gas determines whether the ideal Fermi gas is in the high-temperature \(T>T_F\) regime or the low-temperature \(T<T_F\) regime. For example, in a copper \(\text{Cu(s)}\) wire the number density of electrons \(e^-\) is \(N/V\approx 8.5\times 10^{28}\text{ m}^{-3}\), so the corresponding Fermi temperature is actually quite hot \(T_F\approx 8.2\times 10^4\text{ K}\) by everyday standards, and so in particular room temperature \(T\approx 300\text{ K}\ll T_F\) means that the electrons \(e^-\) in metals can be thought of to a good approximation as degenerate \(T=0\) Fermi gases.
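The copper estimate is easy to reproduce. The sketch below evaluates the Fermi energy, temperature and wavenumber from the quoted electron density, taking \(g_s=2\) for electrons; the function name is hypothetical and the constants are standard SI values.

```python
import math

HBAR = 1.054571817e-34  # J s
M_E = 9.1093837015e-31  # kg
K_B = 1.380649e-23      # J / K
EV = 1.602176634e-19    # J


def fermi_energy(number_density, mass=M_E, g_s=2):
    """E_F = (hbar^2 / 2m) * (6*pi^2*n / g_s)^(2/3) for a 3D ideal Fermi gas."""
    return HBAR**2 / (2 * mass) * (6 * math.pi**2 * number_density / g_s) ** (2 / 3)


n_copper = 8.5e28  # conduction electrons per cubic metre, as quoted above
e_f = fermi_energy(n_copper)
t_f = e_f / K_B
k_f = math.sqrt(2 * M_E * e_f) / HBAR
print(f"E_F = {e_f / EV:.2f} eV, T_F = {t_f:.3g} K, k_F = {k_f:.3g} 1/m")
```

The resulting \(T_F\) is a few hundred times room temperature, which is the quantitative content of the statement that conduction electrons in a metal are deeply degenerate.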

Having computed the total number of fermions \(N=\langle N\rangle\), one can also compute the total energy \(E=\langle E\rangle\) in the grand canonical ensemble:

\[E=\sum_{|k\rangle\in\mathcal H_0}N_kE_k=\int_0^{\infty}[E<E_F]Eg(E)dE=\int_0^{E_F}Eg(E)dE=\frac{3}{5}NE_F\]

which is pretty intuitive, the factor of \(3/5\) essentially just coming from the average of \(k^2\) in a ball of radius \(k_F\), i.e. \(\frac{3}{4\pi k_F^3}\int_0^{k_F}k^24\pi k^2dk=\frac{3}{5}k_F^2\).

Finally, the “equation of state” \(pV=\frac{2}{3}E\) earlier yields the corresponding degeneracy pressure:

\[pV=\frac{2}{5}NE_F\]

For comparison, recall that below the critical temperature \(T<T_c\) the pressure \(p\sim T^{5/2}\) of a BEC approached \(p\to 0\) as \(T\to 0\); not so for an ideal Fermi gas. For both the ideal Bose and Fermi gases, \(pV=\frac{2}{3}E\) but because bosons can condense to the \(E=0\) ground state, their pressure \(p\) also drops to \(p\to 0\), however fermions cannot do this because of the Pauli exclusion principle (they are forced to fill out a Fermi sea instead), so their total energy \(E=\frac{3}{5}NE_F\) can never reach zero, and therefore their pressure \(p\) also cannot reach \(p\to 0\), leaving this residual \(T=0\) degeneracy pressure \(p=\frac{2}{5}\frac{N}{V}E_F>0\).

Finally, it is worth asking more generally just about the physics of an ideal Fermi gas not necessarily when it is degenerate at \(T=0\), but merely at some “low” temperature \(T\ll T_F\). Here, “physics” shall mean “low-temperature heat capacity” \(C_V=C_V(T)\).

In this case, the Fermi-Dirac distribution will be distorted from the degenerate \(T=0\) top-hat filter into a smoothed step whose edge at \(E\approx\mu\) is smeared over an energy width of order \(kT\).

The key observation is that only fermions close to the Fermi surface, specifically whose energy is within \(kT\) of the Fermi energy \(E_F\) can respond to any additional energy added to the ideal Fermi gas, and therefore contribute to the heat capacity \(C_V\) (since only they notice the non-degenerate temperature \(T>0\), the rest of the fermions being locked in the Fermi sea by the Pauli exclusion principle).

\[C_V=\frac{\partial E}{\partial T}=-\frac{3g_sV}{2}\frac{\partial}{\partial T}\left(\frac{1}{\beta\lambda^3}\text{Li}_{5/2}(-z)\right)\]

At this point, invoke the asymptotic behavior of the polylogarithm as the fugacity \(z\to\infty\) in the low-\(T\) limit (called the Sommerfeld expansion, essentially just a lot of binomial expansions):

\[-\text{Li}_{s}(-z)=\frac{(\ln z)^s}{\Gamma(s+1)}\left(1+\frac{\pi^2}{6}\frac{s(s-1)}{(\ln z)^2}+\dots\right)\]

where \(\ln z=\beta\mu\), so this simplifies to:

\[C_V\approx \frac{\sqrt{2}g_sm^{3/2}V}{5\pi^2\hbar^3}\frac{\partial}{\partial T}\left(\mu^{5/2}\left(1+\frac{5\pi^2}{8\beta^2\mu^2}\right)\right)\]

Problem: Explain why, in order for the number of fermions \(N\) in the gas to be fixed, in particular \(dN/dT=0\), the chemical potential \(\mu\) must become a function of temperature \(\mu=\mu(T)\).

Solution: Simply because:

\[N=\int_0^{\infty}dEg(E)\frac{1}{e^{\beta(E-\mu)}+1}\]

Since the LHS is a constant while the RHS has an explicit \(T\)-dependence through \(\beta=1/k_BT\), \(\mu(T)\) must vary implicitly so as to “offset” the explicit \(T\) variation in \(\beta\) and keep the overall integral constant.

Problem: At \(T=0\), the value of the chemical potential is by definition called the Fermi energy \(E_F:=\mu(T=0)\) of the Fermi gas, i.e. roughly speaking each additional fermion added to the gas increases the gas’s energy by \(E_F\) since that fermion would be added to the Fermi surface. State how \(E_F\) scales with the number density \(N/V\) of fermions in the gas in dimension \(d\).

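Solution: The key point to realize is that at \(T=0\) the Fermi-Dirac distribution becomes a step function \([E<E_F]\), so, with the density of states of a free non-relativistic gas scaling as \(g(E)\propto VE^{d/2-1}\) in dimension \(d\), one has the implicit equation for \(E_F\):

\[N=\int_0^{E_F}g(E)dE\propto VE_F^{d/2}\]

So in particular, the important point to remember is that \(E_F\sim (N/V)^{2/d}\).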

Problem: Now suppose, instead of working with a strictly degenerate \(T=0\) Fermi gas, one heats the gas up a little to some strictly positive temperature \(T>0\), but still much less than the gas’s Fermi temperature \(T_F:=E_F/k_B\). In this low-\(T\) regime, use the Sommerfeld expansion to show that the chemical potential \(\mu(T)\) decreases (provided \(d\geq 3\)) quadratically from its \(T=0\) value of \(\mu(T=0)=E_F\), in particular \(\partial\mu/\partial T|_{T=0}=0\) so to \(1^{\text{st}}\)-order provided \(T\ll T_F\) one can often get away with approximating the chemical potential \(\mu\approx E_F\) by its constant value at \(T=0\).

Solution:
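A sketch of the argument, assuming a free non-relativistic gas with \(g(E)\propto E^{d/2-1}\) and keeping only the leading Sommerfeld correction: the number integral is proportional to a polylogarithm of order \(d/2\), so the expansion \(-\text{Li}_{s}(-z)\approx\frac{(\ln z)^s}{\Gamma(s+1)}\left(1+\frac{\pi^2}{6}\frac{s(s-1)}{(\ln z)^2}\right)\) with \(s=d/2\) and \(\ln z=\beta\mu\) gives

\[N\propto\mu^{d/2}\left(1+\frac{\pi^2}{6}\frac{d}{2}\left(\frac{d}{2}-1\right)\frac{(k_BT)^2}{\mu^2}\right)\]

At \(T=0\) the same proportionality constant multiplies \(E_F^{d/2}\), so holding \(N\) fixed and solving to lowest order in \(T/T_F\) (replacing \(\mu\to E_F\) inside the small correction) yields

\[\mu(T)\approx E_F\left(1+\frac{\pi^2}{6}\frac{d}{2}\left(\frac{d}{2}-1\right)\frac{T^2}{T_F^2}\right)^{-2/d}\approx E_F\left(1-\frac{\pi^2}{6}\left(\frac{d}{2}-1\right)\frac{T^2}{T_F^2}\right)\]

For \(d=3\) this is \(\mu\approx E_F\left(1-\frac{\pi^2}{12}\frac{T^2}{T_F^2}\right)\). The correction is quadratic in \(T\), so \(\partial\mu/\partial T|_{T=0}=0\), and it is negative precisely when \(d>2\); in \(d=2\) the quadratic term vanishes at this order.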

A comment: recall that the Bose-Einstein distribution \(\frac{1}{e^{\beta(E-\mu)}-1}\) comes from summing a suitable geometric series in the partition function. The idea of the Sommerfeld expansion is in some sense to undo this step, recasting the distribution back into its geometric series form…except the catch here is that one is working with the Fermi-Dirac distribution, not the Bose-Einstein, and indeed in the derivation of the Fermi-Dirac distribution there was no geometric series involved (or a trivial geometric series of just \(2\) terms if one likes), yet the way the sum is being unwrapped is more in the spirit of Bose-Einstein statistics…is there any connection here or just a mere mathematical coincidence?

Finally, it is clear that one can re-express the heat capacity in terms of \(N\) and \(E_F\) (the fixed variables) as:

\[C_V=\frac{3N}{5}\frac{\partial}{\partial T}\left(\mu\frac{1+5\pi^2k^2T^2/8\mu^2}{1+\pi^2k^2T^2/8\mu^2}\right)\approx\frac{3NE_F}{5}\frac{\partial}{\partial T}\left(1+\left(\frac{5}{8}-\frac{1}{8}-\frac{1}{12}\right)\frac{\pi^2}{\beta^2E_F^2}\right)\]

leading to the linear heat capacity behavior of the low-\(T\) ideal Fermi gas:

\[C_V=\frac{\pi^2}{2}Nk\frac{T}{T_F}\]

Ignoring the \(\pi^2/2\) prefactor which came from the detailed Sommerfeld expansion of the polylogarithms, there is a simple intuitive way to understand this formula: the number of Fermi surface fermions living within \(kT\) of the Fermi energy \(E_F\) is \(g(E_F)kT\) and the energy of each fermion is of order \(kT\) so the total energy of all Fermi surface fermions is \(E\sim g(E_F)(kT)^2\). If one adds some energy \(dE\) into the ideal Fermi gas, then essentially all this energy has to go into the Fermi surface fermions so that one may legitimately equate \(dE\sim g(E_F)k^2TdT\) reproducing the linear heat capacity:

\[C_V\sim g(E_F)k^2T\sim E_F^{1/2}k^2T\sim N^{1/3}k^2T\sim Nk\frac{T}{T_F}\]

Actually, even the pre-factor \(\pi^2/2\) from the Sommerfeld expansion can almost be calculated correctly. Since \(N=\int_0^{E_F}dEg(E)\) and the integral of a power \(g(E)\sim E^{1/2}\) is just \(N=\frac{2}{3}E_Fg(E_F)\), and since this is basically a free electron gas (minus Pauli as usual), any injection of energy goes directly into Fermi surface electrons. There are \(g(E_F)k_BT\) of these states, which, assuming they’re all completely filled, also means there are that many electrons, each with the equipartition kinetic energy \(3k_BT/2\):

\[E(T)-E(0)\approx g(E_F)k_BT\times\frac{3}{2}k_BT\]

so \(C_V=\partial _TE=3g(E_F)k_B^2T=\frac{9}{2}Nk_B\frac{T}{T_F}\).
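For comparison with both the heuristic \(9/2\) and the Sommerfeld \(\pi^2/2\), the low-\(T\) heat capacity can also be computed by brute force. The sketch below works in units \(E_F=k_B=N=1\) (so \(g(E)=\frac{3}{2}\sqrt E\)), solves for \(\mu(T)\) by bisection at fixed \(N\), and differentiates the energy numerically. All function names, the integration cutoff and the grid sizes are hypothetical choices.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def occupation(energy, mu, t):
    """Fermi-Dirac occupation at temperature t (units k_B = 1)."""
    return 1.0 / (math.exp((energy - mu) / t) + 1.0)


def moment(power, mu, t):
    """Integral of (3/2) E^(1/2 + power) f(E) dE, using E = x^2 so that the
    integrand 3 x^(2 + 2*power) f(x^2) is smooth; cutoff at E = 4."""
    return simpson(lambda x: 3.0 * x ** (2 + 2 * power) * occupation(x * x, mu, t), 0.0, 2.0)


def chemical_potential(t):
    """Bisect for the mu that keeps the particle number at N = 1."""
    lo, hi = 0.0, 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if moment(0, mid, t) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def energy(t):
    """Total energy at fixed N = 1."""
    return moment(1, chemical_potential(t), t)


t, dt = 0.05, 0.005  # T / T_F and finite-difference step
mu = chemical_potential(t)
c_v = (energy(t + dt) - energy(t - dt)) / (2 * dt)
print(mu, 1 - math.pi**2 / 12 * t**2)        # numerical vs quadratic estimate of mu
print(c_v, math.pi**2 / 2 * t, 9 / 2 * t)    # numerical vs pi^2/2 vs heuristic 9/2
```

At \(T=0.05\,T_F\) the numerical heat capacity should sit close to the \(\pi^2/2\) line and noticeably above the equipartition-based estimate with prefactor \(9/2\).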

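A more visually intuitive way to understand how \(\mu\) depends on \(T\): heating smears the Fermi-Dirac step symmetrically about \(\mu\), but since \(g(E)\propto\sqrt E\) is increasing there are more states available just above \(\mu\) than just below it, so \(\mu\) must shift slightly downward to keep \(N\) fixed.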

The theory of ideal Fermi gases has diverse applications, ranging from electrons \(e^-\) in a conductor (as justified by Landau’s Fermi liquid theory) to astrophysics (e.g. white dwarf stars are supported by electron degeneracy pressure, neutron stars are supported by neutron degeneracy pressure, thanks to the fact that both electrons \(e^-\) and neutrons \(n^0\) are fermions) to Pauli paramagnetism and Landau diamagnetism in condensed matter physics.


Bose-Einstein Condensation

The purpose of this post is to prove several general identities concerning the quantum statistical mechanics of an isolated, ideal Bose gas at equilibrium.

Problem #\(1\): Specify the physics (i.e. write down the Hamiltonian \(H\) for an isolated, ideal Bose gas).

Solution #\(1\): Because the Bose gas is isolated, there is no external potential \(V_{\text{ext}}=0\) and because it is ideal, there are no internal interactions \(V_{\text{int}}=0\). This leaves only the relativistic kinetic energy, and the single-boson Hamiltonian \(H\) is given by the usual dispersion relation:

\[H=\sqrt{\textbf P^2c^2+m^2c^4}-mc^2\]

However, for the typical case of nonrelativistic \(|\textbf P|\ll mc\) massive bosons, \(H\approx|\textbf P|^2/2m\), and for the less typical but still important case of (necessarily relativistic) massless \(m=0\) bosons (e.g. photons), \(H=|\textbf P|c\).

Problem #\(2\): By carefully considering what it means to be a boson, explain why one should work in the grand canonical ensemble.

Solution #\(2\): If \(\mathcal H\) denotes a single-boson state space, then the state space of \(N\) identical bosons is the \(N\)-fold symmetric tensor product \(S^N(\mathcal H)\subseteq\mathcal H^{\otimes N}\), i.e. two states \(|\Psi\rangle,|\Psi’\rangle\in S^N(\mathcal H)\) are physically equivalent \(|\Psi’\rangle\equiv|\Psi\rangle\) iff both specify exactly the same number of bosons \(N_{|\textbf k\rangle}\in\textbf N\) in each single-boson state \(|\textbf k\rangle\in\mathcal H\).

This means that, although experimentally \(N\) is typically fixed (massless \(m=0\) bosons like photons being the exception), mathematically any sum of the form \(\sum_{|\Psi\rangle\in S^N(\mathcal H)}\) for fixed \(N\) (which would arise when computing the partition function in the microcanonical or canonical ensembles) is a non-trivial combinatorics problem to parameterize. Thus, solely motivated by ease of mathematical calculation, one should allow the total number of bosons \(N\) in the Bose gas to fluctuate in diffusive equilibrium with an external particle bath \(\mu\), hence working in the grand canonical ensemble.

Problem #\(3\): Write down an expression for the grand canonical potential \(\Phi\), stating any assumptions.

Solution #\(3\): As usual, \(\Phi=-\beta^{-1}\ln\mathcal Z\), where the grand canonical partition function is:

\[\mathcal Z=\prod_{|\textbf k\rangle\in\mathcal H}\sum_{N_{|\textbf k\rangle}=0}^{\infty}e^{-\beta(N_{|\textbf k\rangle}E_{|\textbf k\rangle}-\mu N_{|\textbf k\rangle})}\]

where, from Solution #\(1\), the energy of the single-boson plane wave state is \(E_{|\textbf k\rangle}=\sqrt{\hbar^2|\textbf k|^2c^2+m^2c^4}-mc^2\). The geometric series converges for all \(|\textbf k\rangle\in\mathcal H\) iff \(\mu<E_{|\textbf k\rangle}\) for all \(|\textbf k\rangle\in\mathcal H\); since the kinetic energy is positive semi-definite (reaching its global minimum \(E_{|\textbf 0\rangle}=0\) for the \(\textbf k=\textbf 0\) ground state), this in turn is logically equivalent to the condition of a strictly negative chemical potential \(\mu<0\). (An open question is how to reconcile this with massless bosons and Bose-Einstein condensates, both of which have \(\mu=0\). One way to approach it is Zoran’s perspective of placing the \(c\) subscript on \(N_c\) rather than on the experimentally more pertinent \(T_c\): for a fixed \(T\), one can find an \(N_c\) such that \(T=T_c\) when \(N=N_c\). Since \(T_c\propto N^{2/3}\) in a box trap and \(T_c\propto N^{1/3}\) in a harmonic trap are both monotonically increasing functions of \(N\), for \(N>N_c\) the occupation of the ground state is \(N-N_c\), so in principle one can get a BEC even at room temperature \(T\), provided one surpasses a ridiculously large \(N_c\).) Hence:

\[\Phi=\beta^{-1}\sum_{|\textbf k\rangle\in\mathcal H}\ln\left(1-e^{-\beta(E_{|\textbf k\rangle}-\mu)}\right)\]

Problem #\(4\): Using Solution #\(3\), write down series for:

  • The average pressure \(\langle p\rangle\)
  • The entropy \(S\)
  • The average boson number \(\langle N\rangle\) (and deduce the Bose-Einstein distribution)
  • The average energy \(\langle E\rangle\)

Solution #\(4\):

\[p=-\frac{\Phi}{V}=-(\beta V)^{-1}\sum_{|\textbf k\rangle\in\mathcal H}\ln\left(1-e^{-\beta(E_{|\textbf k\rangle}-\mu)}\right)\]

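\[S=-\frac{\partial\Phi}{\partial T}=k\sum_{|\textbf k\rangle\in\mathcal H}\left(\frac{\beta(E_{|\textbf k\rangle}-\mu)}{e^{\beta(E_{|\textbf k\rangle}-\mu)}-1}-\ln\left(1-e^{-\beta(E_{|\textbf k\rangle}-\mu)}\right)\right)\]

\[\langle N\rangle=-\frac{\partial\Phi}{\partial\mu}=\sum_{|\textbf k\rangle\in\mathcal H}\frac{1}{e^{\beta(E_{|\textbf k\rangle}-\mu)}-1}\]

From which the Bose-Einstein distribution of Bose occupation numbers \(N_{|\textbf k\rangle}\) is:

\[N_{|\textbf k\rangle}=\frac{1}{e^{\beta(E_{|\textbf k\rangle}-\mu)}-1}\]

Finally, with the derivative taken at fixed fugacity \(e^{\beta\mu}\):

\[\langle E\rangle=\frac{\partial(\beta\Phi)}{\partial\beta}=\sum_{|\textbf k\rangle\in\mathcal H}\frac{E_{|\textbf k\rangle}}{e^{\beta(E_{|\textbf k\rangle}-\mu)}-1}\]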
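The relation between \(\Phi\) and the Bose-Einstein occupation numbers can be checked numerically. The sketch below is only illustrative: the five-level spectrum and the values of \(\beta\) and \(\mu\) are hypothetical, in arbitrary units. It compares a finite-difference estimate of \(-\partial\Phi/\partial\mu\) with the sum of Bose-Einstein occupations.

```python
import math

def grand_potential(levels, mu, beta):
    """Phi = (1/beta) * sum_k ln(1 - exp(-beta (E_k - mu))), valid for mu < min(E_k)."""
    return sum(math.log(1.0 - math.exp(-beta * (E - mu))) for E in levels) / beta

def bose_einstein(E, mu, beta):
    """Mean occupation of a single-boson level of energy E."""
    return 1.0 / (math.exp(beta * (E - mu)) - 1.0)

levels = [0.0, 0.5, 1.0, 1.5, 2.0]   # hypothetical toy spectrum (arbitrary units)
beta, mu, h = 2.0, -0.3, 1e-6        # mu < 0 so every geometric series converges

# <N> = -dPhi/dmu, estimated with a central finite difference
N_from_phi = -(grand_potential(levels, mu + h, beta)
               - grand_potential(levels, mu - h, beta)) / (2 * h)
N_from_be = sum(bose_einstein(E, mu, beta) for E in levels)
print(N_from_phi, N_from_be)
```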

Problem #\(5\): Write down the relativistic density of states \(g(k)\). Hence compute \(g(E)\). What does it reduce to in the non-relativistic limit \(E\ll mc^2\) and in the massless \(m=0\) limit?

Solution #\(5\): Assuming periodic boundary conditions on a box of volume \(V\) (taken to be large):

\[g(k)=\frac{\sigma V}{(2\pi)^3}4\pi k^2\]

where \(\sigma\) is some additional factor accounting for degrees of freedom besides \(\textbf k\) (e.g. \(\sigma=2\) for the polarization qubit of a photon). So:

\[g(E)=g(k)\frac{\partial k}{\partial E}=\frac{\sigma V}{2\pi^2\hbar^3c^3}(E+mc^2)\sqrt{(E+mc^2)^2-m^2c^4}\]

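If \(E\ll mc^2\), this becomes approximately:

\[g(E)\approx\frac{\sigma Vm^{3/2}}{\sqrt{2}\pi^2\hbar^3}\sqrt{E}\]

whereas in the massless \(m=0\) limit it reduces exactly to:

\[g(E)=\frac{\sigma V}{2\pi^2\hbar^3c^3}E^2\]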
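A hedged numerical check of the two limits of the relativistic \(g(E)\): the function names, the choice of rubidium-87 and the sample energies are all illustrative assumptions. The square root is written as \(\sqrt{E(E+2mc^2)}\), which is algebraically identical to \(\sqrt{(E+mc^2)^2-m^2c^4}\) but avoids catastrophic cancellation when \(E\ll mc^2\).

```python
import math

HBAR = 1.054571817e-34   # J s
C = 2.99792458e8         # m / s

def g_relativistic(E, m, V=1.0, sigma=1.0):
    """Relativistic density of states, with E the kinetic energy in joules."""
    rest = m * C**2
    # E (E + 2 m c^2) equals (E + m c^2)^2 - m^2 c^4 but is numerically stable
    return (sigma * V / (2 * math.pi**2 * HBAR**3 * C**3)
            * (E + rest) * math.sqrt(E * (E + 2 * rest)))

def g_nonrelativistic(E, m, V=1.0, sigma=1.0):
    """Limit E << m c^2: sigma V m^(3/2) sqrt(E) / (sqrt(2) pi^2 hbar^3)."""
    return sigma * V * m**1.5 * math.sqrt(E) / (math.sqrt(2) * math.pi**2 * HBAR**3)

def g_massless(E, V=1.0, sigma=1.0):
    """Limit m = 0: sigma V E^2 / (2 pi^2 hbar^3 c^3)."""
    return sigma * V * E**2 / (2 * math.pi**2 * HBAR**3 * C**3)

m_rb = 86.909 * 1.66053907e-27     # kg, rubidium-87 (illustrative choice)
E_cold = 1.380649e-23 * 1e-6       # k * (1 microkelvin), far below m c^2
ratio_nr = g_relativistic(E_cold, m_rb) / g_nonrelativistic(E_cold, m_rb)
ratio_m0 = g_relativistic(1e-19, 0.0) / g_massless(1e-19)
print(ratio_nr, ratio_m0)
```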

Problem #\(6\): Hence, using Solution #\(5\), estimate the excited state population \(N^*\) and corresponding excited kinetic energy \(E^*\). Explain why the ground state is not accounted for.

\[\Phi=\frac{m^{3/2}V}{\sqrt{2}\pi^2\beta\hbar^3}\int_0^{\infty}\sqrt{E}\ln(1-ze^{-\beta E})dE=-\frac{V}{\beta\lambda^3}\text{Li}_{5/2}(z)\]

\[\langle N\rangle=\frac{m^{3/2}V}{\sqrt{2}\pi^2\hbar^3}\int_0^{\infty}\frac{\sqrt{E}}{z^{-1}e^{\beta E}-1}dE=\frac{V}{\lambda^3}\text{Li}_{3/2}(z)\]

\[\langle E\rangle=\frac{m^{3/2}V}{\sqrt{2}\pi^2\hbar^3}\int_0^{\infty}\frac{E^{3/2}}{z^{-1}e^{\beta E}-1}dE=\frac{3V}{2\beta\lambda^3}\text{Li}_{5/2}(z)\]

where \(z:=e^{\beta\mu}\in (0, 1)\) is called the fugacity of the ideal Bose gas, \(\lambda=\sqrt{\frac{2\pi\hbar^2}{mkT}}\) is the thermal de Broglie wavelength of the ideal Bose gas, and the polylogarithm is defined by the series \(\text{Li}_s(z):=\sum_{n=1}^{\infty}\frac{z^n}{n^s}\), so for instance \(\text{Li}_s(1)=\zeta(s)\) and \(\int_0^{\infty}\frac{x^{s-1}}{z^{-1}e^x-1}dx=\Gamma(s)\text{Li}_s(z)\).
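The integral identity \(\int_0^{\infty}\frac{x^{s-1}}{z^{-1}e^x-1}dx=\Gamma(s)\text{Li}_s(z)\) can be spot-checked numerically. This is a rough sketch only: the truncated series, the midpoint rule, the cutoff `x_max` and the fugacity \(z=0.5\) are all assumptions chosen for simplicity rather than accuracy.

```python
import math

def polylog(s, z, terms=200):
    """Truncated series Li_s(z) = sum_{n>=1} z^n / n^s, adequate for 0 <= z < 1."""
    return sum(z**n / n**s for n in range(1, terms + 1))

def bose_integral(s, z, x_max=60.0, n=200_000):
    """Midpoint-rule estimate of the integral of x^(s-1) / (z^-1 e^x - 1) over [0, x_max]."""
    h = x_max / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x**(s - 1) / (math.exp(x) / z - 1.0)
    return total * h

z = 0.5   # illustrative fugacity
for s in (1.5, 2.5):
    print(s, bose_integral(s, z), math.gamma(s) * polylog(s, z))
```

The two exponents \(s=3/2\) and \(s=5/2\) are the ones that appear in \(\langle N\rangle\) and \(\langle E\rangle\) above.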

Recalling that \(\Phi=-pV\) in the grand canonical ensemble, one therefore obtains (in an indirect form) the equation of state for an ideal Bose gas:

\[pV=\frac{2}{3}\langle E\rangle\]

Or, working in the thermodynamic limit henceforth:

\[pV=\frac{2}{3}E\]

At first, this looks exactly the same as the equation of state \(pV=NkT\) for just a classical (non-bosonic) ideal gas since there the kinetic energy is \(E=\frac{3}{2}NkT\). To see how in fact the equation of state for the ideal Bose gas is not the same as that of the ideal classical gas (i.e. that \(E\neq\frac{3}{2}NkT\) for the ideal Bose gas), clearly one must find how \(E=E(N,T)\) is related to \(N\) and \(T\). Conceptually this is straightforward since above one has already computed \(N=N(\mu, T)\) and \(E=E(\mu, T)\), so one just has to first invert \(N=N(\mu,T)\) in the form \(\mu=\mu(N,T)\) and then substitute this into \(E=E(\mu,T)=E(\mu(N,T),T)=E(N,T)\) which can be plugged into the equation of state to obtain \(pV=\frac{2}{3}E(N,T)\). Practically, there is no simple analytical way to do this for arbitrary temperatures \(T\) and chemical potential \(\mu\).

Instead, the next best thing one can hope for is to get a sense of the physics at the two extremes of the fugacity \(z\in(0,1)\), namely the high-temperature limit \(z\to 0\) and the low-temperature limit \(z\to 1\) (it is a priori counterintuitive that \(z\to 0\) is a high-\(T\) expansion or that \(z\to 1\) is a low-\(T\) expansion considering \(\mu<0\); this only becomes apparent a posteriori). In the high-\(T\) case \(z\to 0\), some algebra gives the second-order virial expansion for the high-temperature equation of state of an ideal Bose gas:

\[pV=NkT\left(1-\frac{\lambda^3N}{4\sqrt{2}V}+O\left(\left(\frac{\lambda^3N}{V}\right)^2\right)\right)\]
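The “some algebra” can be checked numerically. Using \(pV/NkT=\text{Li}_{5/2}(z)/\text{Li}_{3/2}(z)\) and \(N\lambda^3/V=\text{Li}_{3/2}(z)\) from the expressions above, the following sketch compares the exact ratio with the second-order virial estimate at small fugacity. The function names and sample values of \(z\) are illustrative.

```python
import math

def polylog(s, z, terms=200):
    """Truncated series Li_s(z) = sum_{n>=1} z^n / n^s for 0 <= z < 1."""
    return sum(z**n / n**s for n in range(1, terms + 1))

def compressibility_exact(z):
    """pV / (N k T) = Li_{5/2}(z) / Li_{3/2}(z) for the uncondensed ideal Bose gas."""
    return polylog(2.5, z) / polylog(1.5, z)

def compressibility_virial(z):
    """Second-order virial estimate 1 - n lambda^3 / (4 sqrt 2)."""
    degeneracy = polylog(1.5, z)   # n * lambda^3 at this fugacity
    return 1.0 - degeneracy / (4 * math.sqrt(2))

for z in (0.001, 0.01, 0.1):
    print(z, compressibility_exact(z), compressibility_virial(z))
```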

Thus, compared to an ideal gas at the same temperature \(T\), the effect of bosonic statistics is to reduce the pressure \(p\) a little bit, but otherwise, at high temperatures \(T\to\infty\), the ideal Bose gas and ideal classical gas are basically the same.

The more interesting physics lurks in the low-\(T\) limit \(z\to 1\). In this case, it is clear that the number of bosons in the ideal Bose gas approaches:

\[N\to\frac{\zeta(3/2)V}{\lambda^3_c}\]

at a critical temperature \(T_c\) given by:

\[T_c=\frac{2\pi\hbar^2}{mk}\left(\frac{N}{\zeta(3/2)V}\right)^{2/3}\]
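To get a feel for the scale of \(T_c\), here is a minimal sketch that evaluates the formula for a uniform gas. The species (rubidium-87) and the density \(10^{20}\text{ m}^{-3}\) are hypothetical inputs chosen for illustration, not values taken from the text.

```python
import math

HBAR = 1.054571817e-34   # J s
K_B = 1.380649e-23       # J / K
ZETA_3_2 = 2.6123753487  # Riemann zeta(3/2)

def critical_temperature(number_density, mass):
    """T_c = (2 pi hbar^2 / (m k)) * (n / zeta(3/2))^(2/3) for a uniform ideal Bose gas."""
    return (2 * math.pi * HBAR**2 / (mass * K_B)
            * (number_density / ZETA_3_2) ** (2.0 / 3.0))

m_rb87 = 86.909 * 1.66053907e-27   # kg, rubidium-87 (illustrative species)
n = 1e20                           # m^-3, hypothetical density (10^14 cm^-3)
T_c = critical_temperature(n, m_rb87)
print(f"T_c ~ {T_c * 1e9:.0f} nK")
```

For these inputs \(T_c\) comes out in the sub-microkelvin range, and the \(n^{2/3}\) scaling means an eightfold denser gas condenses at four times the temperature.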

However, suppose one were to cool below the critical temperature \(T<T_c\). Supposing that \(z\) remains capped at \(1\) (otherwise the polylogarithm \(\text{Li}_{3/2}(z)\) would diverge for \(z>1\) as the graph suggests), then because \(\lambda\propto T^{-1/2}\), this implies that the number of bosons \(N\) should decrease, but that is absurd because \(V\) is constant and the number of bosons \(N\) does not fluctuate enough in the thermodynamic limit \(\sigma_N/N\sim N^{-1/2}\) to explain this decrease. The resolution to this paradox is subtle and cuts to the heart of Bose-Einstein condensation. Recall from the Bose-Einstein distribution that the Bose occupation number \(N_0=\langle N_0\rangle\) of the single-boson ground state \(|0\rangle\) (whose energy is \(E_0=0\)) is:

\[N_0=\frac{1}{z^{-1}-1}\]

So more and more bosons in the ideal Bose gas will, in the low-temperature limit \(z\to 1\), condense into the ground state \(|0\rangle\) as evidenced by the blowup of \(N_0\) at \(z=1\). However, earlier the replacement \(\sum_{|k\rangle\in\mathcal H_0}\mapsto\int_0^{\infty}g(E)dE\) when evaluating the average number of bosons \(\langle N\rangle\) would have (in this \(z\to 1\) edge case) undercounted all of these bosons condensing in the ground state \(|0\rangle\) because the density of states \(g(E)\propto\sqrt{E}\) vanishes \(g(0)=0\) at the ground state energy \(E_0=0\). Instead, one can just manually add \(N_0=(z^{-1}-1)^{-1}\) into the total boson count “by hand” to obtain the revised count:

\[N=\frac{V}{\lambda^3}\text{Li}_{3/2}(z)+\frac{1}{z^{-1}-1}\]

The way to read this is that the first term \(\frac{V}{\lambda^3}\text{Li}_{3/2}(z)\) is the total number of bosons not in the ground state, which as \(z\to 1\) should become negligible in comparison to the Bose occupation number \(N_0\) of the ground state. In this limit, one has:

\[N\to\frac{1}{z^{-1}-1}\Rightarrow z\to\left(1+\frac{1}{N}\right)^{-1}\approx 1-\frac{1}{N}<1\]

So the fugacity \(z\) is naturally capped by however many bosons \(N\) one started with, regardless of how low the temperature \(T\to 0\) drops. More precisely, as \(T<T_c\) drops below the critical temperature, the fraction \(N_0/N\) of bosons in the ground state \(|0\rangle\) grows monotonically towards \(N_0/N\to 1\) as \(T/T_c\to 0\) in the manner:

\[\frac{N_0}{N}=1-\frac{N-N_0}{N}=1-\frac{\zeta(3/2)V}{N\lambda^3}=1-\left(\frac{\lambda_c}{\lambda}\right)^3=1-\left(\frac{T}{T_c}\right)^{3/2}\]
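A minimal sketch of this condensate fraction as a function, assuming the thermodynamic limit so that the fraction is taken to vanish for \(T\geq T_c\). The function name and sample temperatures are illustrative.

```python
def condensate_fraction(T, T_c):
    """N_0 / N = 1 - (T / T_c)^(3/2) below T_c, and 0 at or above it."""
    if T >= T_c:
        return 0.0
    return 1.0 - (T / T_c) ** 1.5

# Temperatures in units of T_c
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, condensate_fraction(t, 1.0))
```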

This low-temperature bosonic communism is called Bose-Einstein condensation.

As for the low-temperature \(T<T_c\) equation of state for the ideal Bose gas, one has the previous grand canonical potential but with an additional contribution from the BEC in the ground state:

\[\Phi=-\frac{V}{\beta\lambda^3}\text{Li}_{5/2}(z)+\frac{1}{\beta}\ln(1-z)\]

As \(z\to 1-\frac{1}{N}\) maxes out:

\[\Phi\to -\frac{1}{\beta}\left(\frac{\zeta(5/2)V}{\lambda^3}+\ln N\right)\]

However, this makes it clear that, unlike for the total boson number \(N\), here the ground state contribution to the grand canonical potential \(\Phi\) is actually negligible in the thermodynamic limit because \(V/\lambda^3\sim N\) is much larger than \(\ln(N)\). The low-\(T\) equation of state for the ideal Bose gas (well actually now the BEC) is therefore:

\[p=\frac{\zeta(5/2)}{\beta\lambda^3}\sim T^{5/2}\]

Clearly, this is now very different from the classical ideal gas. Notably, the pressure \(p\) is independent of the bosonic number density \(N/V\), the intuition being that the vast number of bosons condensing in the motional ground state \(|k\rangle=|0\rangle\) will be (roughly) frozen in place and therefore make negligible contribution to the pressure \(p\).

For now, there is one more interesting thing to mention about Bose-Einstein condensation; clearly it’s a phase transition between two radically different states of matter, namely from a(n ideal Bose) gas to a BEC (cf. gaseous steam \(\text H_2\text O(g)\) condensing into liquid water \(\text H_2\text O(\ell)\)). Usually phase transitions are associated with some kind of discontinuity in physical properties at the phase transition; how does this manifest in the case of Bose-Einstein condensation at the critical temperature \(T=T_c\)? It turns out that the derivative \(\frac{\partial C_V}{\partial T}\) of the isochoric heat capacity \(C_V\) with respect to the temperature \(T\) is discontinuous at the critical temperature \(T=T_c\) (although the heat capacity \(C_V=C_V(T)\) itself is continuous). This is reminiscent of (and related to) the superfluid \(\lambda\)-transition seen in bosonic \(^4\text{He}\) at \(T\approx 2.17\text{ K}\).


Rabi Oscillations & Optical Bloch Equations

Problem: Consider an isolated atom with time-independent Hamiltonian \(H_0\). Such an atom will have many bound \(H_0\)-eigenstates, but for simplicity focus on just two such bound states (think of it as a qubit) \(|0\rangle\) and \(|1\rangle\) (called the ground state and the excited state) separated by a resonant frequency \(\omega_0=(E_1-E_0)/\hbar\). If one now proceeds to shine light of frequency \(\omega\) on the atom, show that the corresponding interaction potential \(V_{/H_0}(t)\) in the interaction picture modulo \(H_0\) is given approximately by:

\[V_{/H_0}(t)\approx \frac{\hbar\Omega}{2}(e^{-i\delta t}\sigma_++h.c.)\]

where the detuning \(\delta:=\omega-\omega_0\), and state all assumptions.

Solution: The assumptions are:

  1. (Semiclassical approximation) The atom is treated “quantumly” but the light is treated classically.
  2. (\(\alpha\times\)Stark \(=\) Zeeman) Within the semiclassical approximation, the effect of the \(\textbf B\)-field is ignored compared to the \(\textbf E\)-field.
  3. (Dipole approximation) The electric field is approximately spatially independent \(\textbf E(t)=\textbf E_0\cos(\omega t)\) (by evaluating it at the atom’s position).
  4. (Two-level system) Assume no other \(H_0\)-eigenstates are relevant, i.e. \(|0\rangle\langle 0|+|1\rangle\langle 1|\approx 1\).
  5. (\([H_0,\Pi]=0\)) The ground state \(|0\rangle\) and the excited state \(|1\rangle\) are both \(\Pi\)-eigenstates with opposite parity eigenvalues \(\pm 1\).
  6. (Rotating wave approximation) The detuning is small, i.e. \(|\delta|\ll\omega+\omega_0\) (notice this is compatible with assumption #\(4\)).
  7. (Monochromatic) The light is assumed to be of a single frequency \(\omega\) with zero spectral width.

Then the interaction potential in the Schrodinger picture is:

\[V(t)=-\boldsymbol{\pi}\cdot\textbf E(t)=\hbar\Omega\cos\omega t\sigma_x\]

where the matrix elements are in the obvious Hilbert space \(\text{span}_{\textbf C}\{|0\rangle,|1\rangle\}\) and the diagonal entries vanish by the parity assumption. Here, \(\Omega\) is the Rabi frequency and is defined to capture those non-vanishing off-diagonal matrix elements (both gauge-fixed to be real):

\[\hbar\Omega:=\langle 1|-\boldsymbol{\pi}\cdot\textbf E_0|0\rangle=\langle 0|-\boldsymbol{\pi}\cdot\textbf E_0|1\rangle\]

(intuition: \(\Omega\) simultaneously contains information about how “bright” the light is and also how strongly this particular E\(1\) perturbation couples \(|0\rangle\) and \(|1\rangle\)). In the interaction picture modulo \(H_0\):

\[V_{/H_0}(t)=e^{iH_0t/\hbar}V(t)e^{-iH_0t/\hbar}\]

\[=\hbar\Omega\cos\omega t\text{diag}(e^{iE_0t/\hbar},e^{iE_1t/\hbar})\sigma_x\text{diag}(e^{-iE_0t/\hbar},e^{-iE_1t/\hbar})\]

\[=\hbar\Omega\cos\omega t\begin{pmatrix}0&e^{-i\omega_0 t}\\e^{i\omega_0 t}&0\end{pmatrix}\]

\[\approx\frac{\hbar\Omega}{2}\begin{pmatrix}0&e^{i\delta t}\\e^{-i\delta t}&0\end{pmatrix}\]

where one has written \(\cos \omega t=(e^{i\omega t}+e^{-i\omega t})/2\) to make the “lock-in detection” explicit and then low-pass filtered with RWA. In particular, seeing the factor of \(1/2\) in front is a smoking gun for RWA. This then matches the claimed result with \(\sigma_+:=|1\rangle\langle 0|=\begin{pmatrix}0&0\\1&0\end{pmatrix}\) and \(\sigma_{\pm}^{\dagger}=\sigma_{\mp}\).
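The “lock-in” decomposition can be made explicit numerically: the exact matrix element \(\Omega\cos\omega t\,e^{i\omega_0t}\) (in units of \(\hbar\)) splits exactly into the slow term kept by the RWA plus a counter-rotating term at \(\omega+\omega_0\). The sketch below checks this identity; the function names and parameter values are arbitrary illustrative choices.

```python
import cmath
import math

def exact_coupling(t, rabi, omega, omega0):
    """<1|V_I(t)|0> / hbar = Omega cos(omega t) e^{i omega0 t}, before any approximation."""
    return rabi * math.cos(omega * t) * cmath.exp(1j * omega0 * t)

def rwa_coupling(t, rabi, omega, omega0):
    """Slowly varying part kept by the RWA: (Omega / 2) e^{-i delta t}."""
    delta = omega - omega0
    return 0.5 * rabi * cmath.exp(-1j * delta * t)

def counter_rotating(t, rabi, omega, omega0):
    """Discarded part, oscillating at the sum frequency omega + omega0."""
    return 0.5 * rabi * cmath.exp(1j * (omega + omega0) * t)

# Arbitrary units: near-resonant, weak drive
rabi, omega, omega0 = 0.01, 1.02, 1.0
residuals = [abs(exact_coupling(t, rabi, omega, omega0)
                 - rwa_coupling(t, rabi, omega, omega0)
                 - counter_rotating(t, rabi, omega, omega0))
             for t in (0.0, 0.7, 3.1, 12.5)]
print(max(residuals))
```

The RWA drops `counter_rotating`, which averages to nearly zero over the much longer timescales \(1/|\delta|\) and \(1/\Omega\) on which the state evolves.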

Problem: Having found \(V_{/H_0}(t)\), show that \(|\psi_{/H_0}(t)\rangle\) undergoes Rabi oscillations at the generalized Rabi frequency \(\tilde{\Omega}:=\sqrt{\Omega^2+\delta^2}\).

Solution: Expanding the interaction picture state \(|\psi_I(t)\rangle\equiv|\psi_{/H_0}(t)\rangle=\langle 0|\psi_I(t)\rangle|0\rangle+\langle 1|\psi_I(t)\rangle|1\rangle\) in the two-level subspace as well, the Schrodinger equation \(i\hbar|\dot{\psi}_I\rangle=V_{/H_0}|\psi_I\rangle\) becomes a non-autonomous linear dynamical system with \(2\pi/\delta\)-periodic Floquet forcing:

\[\begin{pmatrix}\dot{\langle 0|\psi_I\rangle}\\\dot{\langle 1|\psi_I\rangle}\end{pmatrix}=\frac{\Omega}{2i}\begin{pmatrix}0&e^{i\delta t}\\e^{-i\delta t}&0\end{pmatrix}\begin{pmatrix}\langle 0|\psi_I\rangle\\\langle 1|\psi_I\rangle\end{pmatrix}\]

Nevertheless, it turns out to be very easy in this case to decouple the time evolutions of the projections \(\langle 0|\psi_I\rangle\) and \(\langle 1|\psi_I\rangle\) from each other (differentiate each equation once more and eliminate the other amplitude) into two undriven, damped harmonic oscillators (even though ironically the atom is being driven with light at \(\omega=\omega_0+\delta\)):

\[\ddot{\langle 0|\psi_I\rangle}-i\delta\dot{\langle 0|\psi_I\rangle}+\frac{\Omega^2}{4}\langle 0|\psi_I\rangle=0\]

\[\ddot{\langle 1|\psi_I\rangle}+i\delta\dot{\langle 1|\psi_I\rangle}+\frac{\Omega^2}{4}\langle 1|\psi_I\rangle=0\]

Assuming the initial condition \(|\psi_I(0)\rangle=|0\rangle\) that the atom starts at time \(t=0\) in the ground state \(|0\rangle\), the solutions are:

\[\langle 0|\psi_I(t)\rangle=e^{i\delta t/2}\left(\cos\frac{\tilde{\Omega}}{2}t-\frac{i\delta}{\tilde{\Omega}}\sin\frac{\tilde{\Omega}}{2}t\right)\]

and more intuitively:

\[\langle 1|\psi_I(t)\rangle=-ie^{-i\delta t/2}\frac{\Omega}{\tilde{\Omega}}\sin\frac{\tilde{\Omega}}{2}t\]

where \(\tilde{\Omega}:=\sqrt{\Omega^2+\delta^2}\) is called the generalized Rabi frequency. Being a harmonic oscillator, it makes sense that the atom’s interaction picture state \(|\psi_I(t)\rangle\) roughly speaking “oscillates” between the ground state \(|0\rangle\) and the excited state \(|1\rangle\), called Rabi oscillations, but these oscillations are not actually damped because the “damping coefficient” \(\pm i\delta\) was imaginary. Note that although all of the above discussion has focused on Rabi oscillations in the context of electric dipole transitions, there are also times when the magnetic field \(\textbf B_{\text{ext}}\) rather than the electric field \(\textbf E_{\text{ext}}\) dominates the physics (e.g. fine structure or hyperfine structure transitions) in which case there would also be Rabi oscillations in the context of magnetic dipole transitions.

Rabi oscillations are more intuitive when expressed in terms of the probabilities prescribed by the Born rule. In this case, one has (dropping the \(I\)-subscript because it no longer matters):

\[|\langle 0|\psi(t)\rangle|^2=1-|\langle 1|\psi(t)\rangle|^2\]

\[|\langle 1|\psi(t)\rangle|^2=\frac{\Omega^2}{\tilde{\Omega}^2}\sin^2\frac{\tilde{\Omega}}{2}t\]

where remember that \(\sin^2\frac{\tilde{\Omega}}{2}t=\frac{1}{2}(1-\cos\tilde{\Omega}t)\) oscillates at the generalized Rabi frequency \(\tilde{\Omega}\) and not \(\tilde{\Omega}/2\). In particular, these Rabi “probability oscillations” are most pronounced when the light is resonant with the atom, i.e. \(\delta=0\). In this case, \(\tilde{\Omega}=\Omega\) and one has:

\[|\langle 1|\psi(t)\rangle|^2=\sin^2\frac{\Omega}{2}t\]

Such \(\delta=0\) resonant Rabi oscillations also provide a way to experimentally prepare various qubit states in the lab simply by controlling the driving time \(\Omega t\) for which one applies the light. For instance, if one applies a \(\pi\)-pulse so that \(\Omega t=\pi\), then in theory one is guaranteed to excite the atom \(|0\rangle\mapsto -i|1\rangle\equiv|1\rangle\). Alternatively, if one applies an \(\Omega t=\pi/2\)-pulse, then this yields the “circularly polarized” state \(|0\rangle\mapsto (|0\rangle-i|1\rangle)/\sqrt{2}\).

More generally when the detuning \(\delta\neq 0\) is off-resonance, the maximum probability of an electric dipole transition \(|0\rangle\to|1\rangle\) from the ground state to the excited state that one can achieve is \(\Omega^2/\tilde{\Omega}^2\), although in the limit as \(\Omega\to\infty\) (e.g. cranking up the laser), this ratio does approach \(1\).
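The analytic Rabi formula can be cross-checked by integrating the RWA equations of motion directly. The following is a sketch only: the fixed-step RK4 integrator, the step count and the parameter values are illustrative assumptions. It compares the numerically integrated excited-state probability with \(\frac{\Omega^2}{\tilde{\Omega}^2}\sin^2\frac{\tilde{\Omega}}{2}t\).

```python
import cmath
import math

def rhs(t, c0, c1, rabi, delta):
    """RWA Schrodinger equation in the interaction picture for (<0|psi>, <1|psi>)."""
    pref = rabi / 2j
    return (pref * cmath.exp(1j * delta * t) * c1,
            pref * cmath.exp(-1j * delta * t) * c0)

def integrate(rabi, delta, t_end, steps=20_000):
    """Fixed-step RK4 starting from |psi(0)> = |0>; returns (c0, c1) at t_end."""
    h = t_end / steps
    t, c0, c1 = 0.0, 1.0 + 0j, 0.0 + 0j
    for _ in range(steps):
        k1 = rhs(t, c0, c1, rabi, delta)
        k2 = rhs(t + h / 2, c0 + h / 2 * k1[0], c1 + h / 2 * k1[1], rabi, delta)
        k3 = rhs(t + h / 2, c0 + h / 2 * k2[0], c1 + h / 2 * k2[1], rabi, delta)
        k4 = rhs(t + h, c0 + h * k3[0], c1 + h * k3[1], rabi, delta)
        c0 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        c1 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return c0, c1

def excited_probability(t, rabi, delta):
    """Analytic result (Omega^2 / Omega~^2) sin^2(Omega~ t / 2)."""
    gen = math.hypot(rabi, delta)   # generalized Rabi frequency
    return (rabi / gen) ** 2 * math.sin(gen * t / 2) ** 2

rabi, delta, t_end = 1.0, 0.5, 7.0   # arbitrary illustrative values
c0, c1 = integrate(rabi, delta, t_end)
print(abs(c1) ** 2, excited_probability(t_end, rabi, delta))
```

Calling `integrate(1.0, 0.0, math.pi)` gives a resonant \(\pi\)-pulse, for which the amplitudes should land on \(-i|1\rangle\).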

As with any qubit system, one can visualize the dynamics of Rabi oscillations on the Bloch sphere. Although here the ket \(|\psi_I(t)\rangle\) is tautologically a pure state, one can nevertheless work with its interaction picture density operator \(\rho_I(t)=|\psi_I(t)\rangle\langle\psi_I(t)|\). However, despite one’s first instinct being to work with \(\rho_I\) in the \(H_0\)-eigenbasis of the ground state \(|0\rangle\) and excited state \(|1\rangle\), it turns out to be more convenient to first boost unitarily into a “steady-state picture”. Specifically, if one instead works with the “steady-state basis” \(|\tilde 0\rangle:=e^{i\delta t/2}|0\rangle\) and similarly \(|\tilde 1\rangle:=e^{-i\delta t/2}|1\rangle\), along with the bras \(\langle\tilde 0|=e^{-i\delta t/2}\langle 0|\) and \(\langle\tilde 1|=e^{i\delta t/2}\langle 1|\), then starting from the earlier Schrodinger equation in the rotating wave approximation:

\[\begin{pmatrix}\dot{\langle 0|\psi_I\rangle}\\\dot{\langle 1|\psi_I\rangle}\end{pmatrix}=\frac{\Omega}{2i}\begin{pmatrix}0&e^{i\delta t}\\e^{-i\delta t}&0\end{pmatrix}\begin{pmatrix}\langle 0|\psi_I\rangle\\\langle 1|\psi_I\rangle\end{pmatrix}\]

the benefit of this change of basis is that the new “steady-state Hamiltonian” \(H_{\infty}\):

\[H_{\infty}=\frac{\hbar}{2}\begin{pmatrix}\delta&\Omega\\\Omega&-\delta\end{pmatrix}=\frac{\hbar}{2}\tilde{\boldsymbol{\Omega}}\cdot\boldsymbol{\sigma}\]

for which

\[\begin{pmatrix}\dot{\langle\tilde 0|\psi_I\rangle}\\\dot{\langle \tilde 1|\psi_I\rangle}\end{pmatrix}=\frac{1}{2i}\begin{pmatrix}\delta&\Omega\\\Omega&-\delta\end{pmatrix}\begin{pmatrix}\langle\tilde 0|\psi_I\rangle\\\langle \tilde 1|\psi_I\rangle\end{pmatrix}\]

is now time-independent at the expense of gaining back the diagonal matrix elements, where the generalized Rabi vector is given by \(\tilde{\boldsymbol{\Omega}}:=(\Omega,0,\delta)\) and has magnitude equal to the generalized Rabi frequency \(|\tilde{\boldsymbol{\Omega}}|=\sqrt{\Omega^2+\delta^2}\).

Problem: Show that this result can also be obtained by transforming to an alternative picture (rather than the standard interaction picture mod \(H_0\)):

  1. Split \(H_0=E_0|0\rangle\langle 0|+E_1|1\rangle\langle 1|\) into symmetric and antisymmetric parts:

\[H_0=\frac{E_0+E_1}{2}1+\frac{\hbar\omega_0}{2}\left(|1\rangle\langle 1|-|0\rangle\langle 0|\right)\]

And since the symmetric part is isotropic, one can safely discard it and keep only the antisymmetric part.

  2. Starting from the Schrodinger picture, transform into a picture defined by the unitary \(U(t):=e^{-i\omega t|0\rangle\langle 0|}\).

  3. Apply the rotating wave approximation.

Solution: The first part is straightforward. Then:

In this steady-state basis \(|\tilde 0\rangle,|\tilde 1\rangle\), the density matrix \([\rho_I(t)]_{|\tilde 0\rangle,|\tilde 1\rangle}^{|\tilde 0\rangle,|\tilde 1\rangle}=\frac{1}{2}(1+\tilde{\textbf b}\cdot\boldsymbol{\sigma})\) can be replaced by the conceptually simpler Bloch vector \(\tilde{\textbf b}\in\textbf R^3\) of the qubit whose components \(\tilde{\textbf b}=(\tilde b_1,\tilde b_2,b_3)\) relate back to the matrix elements of the density operator \(\rho_I\) via:

\[\tilde b_1=\tilde{\rho}_{01}+\tilde{\rho}_{10}\]

\[\tilde b_2=i(\tilde{\rho}_{01}-\tilde{\rho}_{10})\]

\[b_3=\rho_{00}-\rho_{11}\]

where the populations \(\tilde{\rho}_{00}=\rho_{00},\tilde{\rho}_{11}=\rho_{11}\) are unaffected by the boost (relative to if the matrix elements of \(\rho_I\) were expressed in the \(|0\rangle,|1\rangle\) basis) and the coherences \(\tilde{\rho}_{01}=e^{-i\delta t}\rho_{01},\tilde{\rho}_{10}=e^{i\delta t}\rho_{10}\) are affected:

\[[\rho_I(t)]_{|\tilde 0\rangle,|\tilde 1\rangle}^{|\tilde 0\rangle,|\tilde 1\rangle}=\begin{pmatrix}\rho_{00}&\tilde{\rho}_{01}\\ \tilde{\rho}_{10}&\rho_{11}\end{pmatrix}=\begin{pmatrix}\langle 0|\rho|0\rangle & \langle\tilde 0|\rho|\tilde 1\rangle \\ \langle\tilde 1|\rho|\tilde 0\rangle & \langle 1|\rho|1\rangle\end{pmatrix}=\begin{pmatrix}|\langle 0|\psi_I\rangle|^2 & \langle \tilde 0|\psi_I\rangle\langle\psi_I|\tilde 1\rangle \\ \langle\tilde 1|\psi_I\rangle\langle\psi_I|\tilde 0\rangle & |\langle 1|\psi_I\rangle|^2\end{pmatrix}\]

From Liouville’s equation \(i\hbar\dot{\rho}_I=[H_{\infty},\rho_I]\) and the standard identity of Pauli matrices \([\tilde{\boldsymbol{\Omega}}\cdot\boldsymbol{\sigma},\tilde{\textbf b}\cdot\boldsymbol{\sigma}]=2i(\tilde{\boldsymbol{\Omega}}\times\tilde{\textbf b})\cdot\boldsymbol{\sigma}\), one immediately obtains the precession of the Bloch vector \(\tilde{\textbf b}\) around the generalized Rabi vector \(\tilde{\boldsymbol{\Omega}}\) at the generalized Rabi frequency:

\[\dot{\tilde{\textbf b}}=\tilde{\boldsymbol{\Omega}}\times\tilde{\textbf b}\]

For large detunings \(\delta\), the Bloch vector \(\tilde{\textbf b}\) precesses faster (i.e. one gets faster Rabi oscillations) though at the expense of the maximum excited population \(\text{max}(\rho_{11})=\Omega^2/\tilde{\Omega}^2\) achievable (as already mentioned before). Note also that the generalized Rabi vector \(\tilde{\boldsymbol{\Omega}}\) does depend on when one starts the clock; for instance if \(\textbf E_{\text{ext}}(\textbf x,t)=\textbf E_0\sin(\textbf k_{\text{ext}}\cdot\textbf x-\omega_{\text{ext}}t)\) instead of \(\cos\), then in this case \(\tilde{\boldsymbol{\Omega}}=(0,\Omega,\delta)\) instead, etc.
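The precession equation can be verified against the explicit Rabi solution. The sketch below builds the Bloch vector from the analytic amplitudes using the component definitions above (\(\tilde b_1=\tilde\rho_{01}+\tilde\rho_{10}\), \(\tilde b_2=i(\tilde\rho_{01}-\tilde\rho_{10})\), \(b_3=\rho_{00}-\rho_{11}\)) and compares a finite-difference time derivative with \(\tilde{\boldsymbol{\Omega}}\times\tilde{\textbf b}\). Parameter values and helper names are illustrative assumptions.

```python
import cmath
import math

def amplitudes(t, rabi, delta):
    """Analytic interaction-picture amplitudes for the initial state |0>."""
    gen = math.hypot(rabi, delta)
    c0 = cmath.exp(1j * delta * t / 2) * (math.cos(gen * t / 2)
                                          - 1j * delta / gen * math.sin(gen * t / 2))
    c1 = -1j * cmath.exp(-1j * delta * t / 2) * rabi / gen * math.sin(gen * t / 2)
    return c0, c1

def bloch_vector(t, rabi, delta):
    """Bloch vector (b1~, b2~, b3) in the steady-state basis."""
    c0, c1 = amplitudes(t, rabi, delta)
    a0 = cmath.exp(-1j * delta * t / 2) * c0   # <0~|psi>
    a1 = cmath.exp(1j * delta * t / 2) * c1    # <1~|psi>
    rho01, rho10 = a0 * a1.conjugate(), a1 * a0.conjugate()
    return ((rho01 + rho10).real,
            (1j * (rho01 - rho10)).real,
            abs(a0) ** 2 - abs(a1) ** 2)

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

rabi, delta, t, h = 1.0, 0.7, 2.3, 1e-5   # arbitrary illustrative values
b_plus, b_minus = bloch_vector(t + h, rabi, delta), bloch_vector(t - h, rabi, delta)
b_dot = tuple((p - m) / (2 * h) for p, m in zip(b_plus, b_minus))
precession = cross((rabi, 0.0, delta), bloch_vector(t, rabi, delta))
print(b_dot, precession)
```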

Optical Bloch Equations

From the stimulated absorption and emission interactions of the atom with the external optical field, Rabi oscillations were seen to emerge. However, Einstein’s statistical argument showed that in addition to stimulated absorption/emission, there is also spontaneous emission. How does this affect the physics? Although a rigorous treatment requires quantizing the EM field, phenomenologically one can simply include a decay term (in the spirit of Einstein) of the form \(-\Gamma\rho_{11}\) (where \(\Gamma=A_{10}\) in the Einstein model) in the rate equation for the excited state population. The coherent part of that rate equation is the third component \(\dot b_3=\Omega\tilde b_2\) of the precession equation rewritten using \(b_3=1-2\rho_{11}\), so that \(\dot{\rho}_{11}=-\frac{\Omega}{2}\tilde{b}_2-\Gamma\rho_{11}\). At low laser intensities, \(\Gamma\sim 2\pi\times 10\text{ MHz}\) will in fact typically be greater than the Rabi frequency \(\Omega\), making spontaneous decay an important mechanism by which otherwise coherent Rabi oscillations decohere over time. Although it is immediate that \(\dot b_3=\Omega\tilde b_2-\Gamma(b_3-1)\), what is not so clear is how \(\tilde b_1,\tilde b_2\) are affected by the spontaneous decay \(\Gamma\neq 0\). By analogy with a classical damped electric dipole, it turns out one can phenomenologically obtain the optical Bloch equations:

\[\dot{\tilde b}_1=-\delta\tilde b_2-\frac{\Gamma}{2}\tilde b_1\]

\[\dot{\tilde b}_2=\delta\tilde b_1-\Omega b_3-\frac{\Gamma}{2}\tilde b_2\]

\[\dot b_3=\Omega\tilde b_2-\Gamma(b_3-1)\]

Unlike the earlier precession \(\dot{\tilde{\textbf b}}=\tilde{\boldsymbol{\Omega}}\times\tilde{\textbf b}\) in the absence \(\Gamma=0\) of spontaneous emissions (to which these equations reduce component by component when \(\Gamma=0\)), now, in the steady state limit \(t\gg 1/\Gamma\) of long driving times where \(\dot{\tilde b}_1=\dot{\tilde b}_2=\dot b_3=0\), the Bloch vector eventually settles onto:

\[\tilde{\textbf b}_{\infty}=\frac{1}{\delta^2+\Omega^2/2+\Gamma^2/4}\begin{pmatrix}\Omega\delta\\-\Omega\Gamma/2\\\delta^2+\Gamma^2/4\end{pmatrix}\]

In particular, an immediate important corollary is:

\[\rho_{11}=\frac{\Omega^2/4}{\delta^2+\Omega^2/2+\Gamma^2/4}\]

along with the strongly driven limit \(\rho_{11}\to 1/2\) of the excited state population as \(\Omega\to\infty\). Equivalently, in terms of the spontaneous decay rate \(\gamma:=\Gamma\rho_{11}\), one has:

\[\gamma=\frac{\Gamma}{2}\frac{s}{1+s+(2\delta/\Gamma)^2}\]

with \(s:=I/I_{\text{sat}}=2(\Omega/\Gamma)^2\) the normalized intensity. Thus, although one’s intuition would suggest that simply cranking up the laser intensity \(s\to\infty\) should excite all atoms into the excited state \(|1\rangle\), the saturation of the spontaneous decay rate \(\gamma\to\Gamma/2\) (equivalently \(\rho_{11}\to 1/2\)) prevents one from achieving this.
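The equivalence of the two forms, \(\gamma=\Gamma\rho_{11}\) and the saturation-parameter expression, together with the saturation at \(\Gamma/2\), can be checked in a few lines. This is a sketch in units where \(\Gamma=1\); the function names and sample parameters are illustrative.

```python
def excited_population(rabi, delta, gamma_nat):
    """Steady-state rho_11 = (Omega^2 / 4) / (delta^2 + Omega^2 / 2 + Gamma^2 / 4)."""
    return (rabi**2 / 4) / (delta**2 + rabi**2 / 2 + gamma_nat**2 / 4)

def scattering_rate(s, delta, gamma_nat):
    """gamma = (Gamma / 2) s / (1 + s + (2 delta / Gamma)^2), with s = I / I_sat."""
    return 0.5 * gamma_nat * s / (1 + s + (2 * delta / gamma_nat) ** 2)

gamma_nat = 1.0   # the decay rate Gamma sets the unit of frequency
for rabi, delta in [(0.1, 0.0), (1.0, 0.5), (10.0, 2.0), (1000.0, 0.0)]:
    s = 2 * (rabi / gamma_nat) ** 2
    print(rabi, delta,
          gamma_nat * excited_population(rabi, delta, gamma_nat),
          scattering_rate(s, delta, gamma_nat))
```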


Electromagnetism in Materials

Problem #\(1\): Consider a simplistic classical model of an atom as a positively charged nucleus \(Q>0\) surrounded by a spherical, uniformly dense electron cloud \(-Q<0\) of radius \(a\). If this atom is subjected to a DC external electric field \(\textbf E_{\text{ext}}\), show that the induced dipole \(\textbf p_{\text{ind}}\) developed by the atom is given by \(\textbf p_{\text{ind}}=\alpha\textbf E_{\text{ext}}\) where the atomic polarizability is given by \(\alpha=4\pi\varepsilon_0 a^3>0\). Neglect any higher-order multipole moments of the electron cloud (i.e. assume it maintains a spherical shape when perturbed by \(\textbf E_{\text{ext}}\)).

Solution #\(1\): By Gauss’s law for electric fields, the internal electrostatic field \(\textbf E_{\text{int}}(\textbf x)\) due to just the electron cloud must rise linearly in magnitude to match the Coulomb field at \(|\textbf x|=a\), thus \(\textbf E_{\text{int}}(|\textbf x|=a)=-Q/4\pi\varepsilon_0 a^2\hat{\textbf x}\), hence \(\textbf E_{\text{int}}(\textbf x)=-Q|\textbf x|/4\pi\varepsilon_0 a^3\hat{\textbf x}\) for \(|\textbf x|\leq a\). When the external electric field \(\textbf E_{\text{ext}}\) is applied, this will induce a displacement of the nucleus to a new equilibrium \(\textbf x=\textbf 0\mapsto\textbf x=\Delta\textbf x_{\text{ind}}\) relative to the center of the electron cloud, at which location force balance requires:

\[\textbf E_{\text{int}}(\Delta\textbf x_{\text{ind}})+\textbf E_{\text{ext}}=\textbf 0\]

Noting that the induced dipole is \(\textbf p_{\text{ind}}=Q\Delta\textbf x_{\text{ind}}\), the claim follows.
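To see the magnitudes involved, here is a small sketch that solves the force balance for the displacement and recovers \(\alpha=4\pi\varepsilon_0a^3\). The cloud radius (a Bohr radius), the charge \(Q=e\) and the applied field are hypothetical illustrative inputs.

```python
import math

EPS0 = 8.8541878128e-12      # F / m
E_CHARGE = 1.602176634e-19   # C

def internal_field(x, Q, a):
    """Signed field of the uniform electron cloud (total charge -Q) at displacement x <= a."""
    return -Q * x / (4 * math.pi * EPS0 * a**3)

def induced_displacement(E_ext, Q, a):
    """Nucleus displacement solving E_int(dx) + E_ext = 0."""
    return 4 * math.pi * EPS0 * a**3 * E_ext / Q

a = 0.529e-10    # m, Bohr radius used as an illustrative cloud radius
Q = E_CHARGE     # hydrogen-like charge (assumption)
E_ext = 1e6      # V / m, hypothetical applied field
dx = induced_displacement(E_ext, Q, a)
alpha = Q * dx / E_ext   # polarizability from p = Q dx = alpha E_ext
print(dx, alpha, 4 * math.pi * EPS0 * a**3)
```

Even for this fairly strong field the displacement is many orders of magnitude smaller than \(a\), which is consistent with neglecting distortion of the cloud.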

Problem #\(2\): Explain what it means for a dielectric to be linear and explain under what circumstances can most dielectrics be modelled as linear.

Solution #\(2\): A dielectric is said to be linear iff, when subjected to an external (not necessarily uniform or DC!) electric field \(\textbf E_{\text{ext}}\), the induced polarization \(\textbf P_{\text{ind}}\) is linear in \(\textbf E_{\text{ext}}\) in the sense that there exists an electric susceptibility \(\chi_e\) such that:

\[\textbf P_{\text{ind}}=\varepsilon_0\chi_e\textbf E_{\text{ext}}\]

By definition, \(\textbf P_{\text{ind}}:=n_{\textbf p_{\text{ind}}}\textbf p_{\text{ind}}\) is the induced dipole moment per unit volume, where \(n_{\textbf p_{\text{ind}}}\) is the number density of induced electric dipoles \(\textbf p_{\text{ind}}\). If \(\textbf E_{\text{ext}}\) is DC as in Problem #\(1\), then \(\chi_e>0\) is real and, since \(\textbf p_{\text{ind}}=\alpha\textbf E_{\text{ext}}\), given by \(\chi_e=n_{\textbf p_{\text{ind}}}\alpha/\varepsilon_0=4\pi n_{\textbf p_{\text{ind}}}a^3\). In general, the assumption of linearity is only a good approximation provided \(|\textbf E_{\text{ext}}|\) is not too strong.

Problem #\(3\): Explain what the concept of bound charge is and show that the surface bound charge density \(\sigma_b\) and the volume bound charge density \(\rho_b\) are related to the polarization \(\textbf P\) via:

\[\sigma_b=\hat{\textbf n}\cdot\textbf P\]

\[\rho_b=-\frac{\partial}{\partial\textbf x}\cdot\textbf P\]

Comment on the case where \(\textbf P\) is uniform within a material.

Solution #\(3\): Whenever, for any reason whatsoever, there is a polarization \(\textbf P\neq\textbf 0\) (often when an \(\textbf E_{\text{ext}}\) is applied such as in Problems #\(1\) and #\(2\), though see ferroelectrics), then there must necessarily be a build-up of electric charge somewhere in a material volume \(V\) (or on its surface \(\partial V\)); any such charge is called bound charge. This is because the electrostatic potential \(\phi\) induced by this \(\textbf P\) would be given by:

\[\phi(\textbf x)=\frac{1}{4\pi\varepsilon_0}\iiint_{\textbf x’\in V}\frac{\textbf P(\textbf x’)\cdot(\textbf x-\textbf x’)}{|\textbf x-\textbf x’|^3}d^3\textbf x’\]

But noting that (e.g. in spherical coordinates) \(\frac{\partial}{\partial\textbf x’}\frac{1}{|\textbf x-\textbf x’|}=\frac{\textbf x-\textbf x’}{|\textbf x-\textbf x’|^3}\), an integration by parts shows that:

\[\phi(\textbf x)=\frac{1}{4\pi\varepsilon_0}\iint_{\textbf x’\in\partial V}\frac{\sigma_b(\textbf x’)}{|\textbf x-\textbf x’|}d^2\textbf x’+\frac{1}{4\pi\varepsilon_0}\iiint_{\textbf x’\in V}\frac{\rho_b(\textbf x’)}{|\textbf x-\textbf x’|}d^3\textbf x’\]

with \(\sigma_b=\hat{\textbf n}\cdot\textbf P\) and \(\rho_b=-\frac{\partial}{\partial\textbf x}\cdot\textbf P\) as claimed (note that right now both of these are only electrostatic results because they arise by comparison to Coulomb’s law). When \(\textbf P\) is uniform, bound charge does not clump up anywhere inside the material, \(\rho_b=0\), but only on its surface, where in general \(\sigma_b\neq 0\).

Problem #\(4\): Explain what the concept of free charge is, and show that for a linear dielectric sphere of radius \(R\) with a lump of charge \(Q_f\) deposited at its center, the surface bound charge density is \(\sigma_b=\frac{\chi_e Q_f}{4\pi R^2(1+\chi_e)}\). What is the corresponding discontinuity \(\Delta\textbf E\) in the \(\textbf E\)-field across the surface of the linear dielectric sphere?

Solution #\(4\): Free charge is any charge that has clumped somewhere in a material, but that is not due to polarization \(\textbf P\), i.e. it is not bound charge. Essentially, it is a wastebasket for non-bound charge so that \(\rho=\rho_f+\rho_b\). Defining the displacement field as \(\textbf D:=\varepsilon_0\textbf E+\textbf P\) so that \(\frac{\partial}{\partial\textbf x}\cdot\textbf D=\rho_f\), it follows that for the free charge distribution \(\rho_f(\textbf x)=Q_f\delta^3(\textbf x)\), one has (everywhere in space \(\textbf x\in\textbf R^3-\{\textbf 0\}\), not just inside the dielectric sphere!):

\[\textbf D(\textbf x)=\frac{Q_f}{4\pi|\textbf x|^2}\hat{\textbf x}\]

Outside the dielectric sphere \(|\textbf x|\geq R\) where \(\textbf P=\textbf 0\), this reproduces the usual electric field \(\textbf E(\textbf x)=Q_f/4\pi\varepsilon_0|\textbf x|^2\hat{\textbf x}\). Meanwhile, for \(|\textbf x|\leq R\), the assumption of dielectric linearity allows one to write \(\textbf D=\varepsilon\textbf E\) with the permittivity \(\varepsilon=\varepsilon_0(1+\chi_e)\), yielding an internal electric field \(\textbf E(\textbf x)=Q_f/4\pi\varepsilon|\textbf x|^2\hat{\textbf x}\) screened by bound charge, and a corresponding induced polarization:

\[\textbf P(\textbf x)=\frac{\chi_e Q_f}{4\pi(1+\chi_e)|\textbf x|^2}\hat{\textbf x}\]

The claim then follows from \(\sigma_b=\hat{\textbf x}\cdot\textbf P(|\textbf x|=R)\). Alternatively, the bound charge \(\rho_b(\textbf x)=Q_b\delta^3(\textbf x)\) polarized towards the origin is clearly (using \(\rho_b=-\partial/\partial\textbf x\cdot\textbf P\) and the divergence theorem) \(Q_b=-\chi_eQ_f/(1+\chi_e)\) so that the net charge at the origin is \(Q_f+Q_b=Q_f/(1+\chi_e)\), consistent with the form of the \(\textbf E\)-field inside the linear dielectric sphere. But since the sphere as a whole should only contain the central free charge \(Q_f\), this requires it to have surface charge \(-Q_b\) from which \(\sigma_b=-Q_b/4\pi R^2\) leads to the same answer.

Although the \(\textbf D\)-field is continuous \(\Delta\textbf D=\textbf 0\) across the sphere surface (from \(\frac{\partial}{\partial\textbf x}\cdot\textbf D=\rho_f\) and a Gaussian pillbox, this is seen to be due to the lack of free surface charges \(\sigma_f=0\)), the \(\textbf E\)-field is not: its jump from just inside to just outside the surface is \(\Delta\textbf E=(\sigma_b/\varepsilon_0)\hat{\textbf x}\).
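The bookkeeping in Solution #\(4\) can be checked numerically. The following is a minimal sketch, not part of the original solution; the function name `sphere_fields` and the sample values (\(Q_f=1\text{ nC}\), \(R=5\text{ cm}\), \(\chi_e=2\)) are illustrative assumptions.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def sphere_fields(q_free, radius, chi_e):
    """Return (sigma_b, Q_b at the centre, |E| just inside, |E| just outside)."""
    eps = EPS0 * (1.0 + chi_e)  # permittivity of the linear dielectric
    sigma_b = chi_e * q_free / (4 * math.pi * radius**2 * (1 + chi_e))
    q_bound_centre = -chi_e * q_free / (1 + chi_e)
    e_inside = q_free / (4 * math.pi * eps * radius**2)
    e_outside = q_free / (4 * math.pi * EPS0 * radius**2)
    return sigma_b, q_bound_centre, e_inside, e_outside


# Hypothetical sample values, chosen only for illustration.
Q_F, R, CHI = 1e-9, 0.05, 2.0
sigma_b, q_b, e_in, e_out = sphere_fields(Q_F, R, CHI)

total_bound = q_b + sigma_b * 4 * math.pi * R**2  # should vanish
jump = e_out - e_in                               # should equal sigma_b / EPS0
print(f"total bound charge: {total_bound:.3e} C")
print(f"jump in E: {jump:.6e} V/m, sigma_b/eps0: {sigma_b / EPS0:.6e} V/m")
```

The two printed checks correspond to the statements that the sphere carries no net bound charge and that the radial jump in the \(\textbf E\)-field equals \(\sigma_b/\varepsilon_0\).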

Problem #\(5\): Derive the analog of Problem #\(3\) for magnetostatics.

Solution #\(5\): Everything is completely analogous, in particular the analog of \(\phi\) is \(\textbf A\):

\[\textbf A(\textbf x)=\frac{\mu_0}{4\pi}\iiint_{\textbf x’\in V}\frac{\textbf M(\textbf x’)\times(\textbf x-\textbf x’)}{|\textbf x-\textbf x’|^3}d^3\textbf x’\]

Applying the same identity and tricks as in Solution #\(3\):

\[\textbf A(\textbf x)=\frac{\mu_0}{4\pi}\iint_{\textbf x’\in\partial V}\frac{\textbf K_b(\textbf x’)}{|\textbf x-\textbf x’|}d^2\textbf x’+\frac{\mu_0}{4\pi}\iiint_{\textbf x’\in V}\frac{\textbf J_b(\textbf x’)}{|\textbf x-\textbf x’|}d^3\textbf x’\]

where the bound surface current is \(\textbf K_b=\textbf M\times\hat{\textbf n}\) and the bound current is \(\textbf J_b=\frac{\partial}{\partial\textbf x}\times\textbf M\) (note that right now both of these are only magnetostatic results because they arise by comparison to the Biot-Savart law).

Problem #\(6\): Unfortunately there is no simple classical analog of Problem #\(1\) that can be used to rationalize why it is the case that many materials are linear not just as dielectrics, but also as diamagnets/paramagnets (quantum mechanics is needed!). So for now, taking as an experimental fact that \(\textbf M_{\text{ind}}=\chi_m\textbf H_{\text{ext}}\) for some magnetic susceptibility \(\chi_m\in\textbf R\), explain why \(\mu=\mu_0(1+\chi_m)\) (cf. \(\varepsilon=\varepsilon_0(1+\chi_e)\)).

Solution #\(6\): Most of the time when one thinks of “current”, one is thinking of free current \(\textbf J_f\) which contributes to the magnetic field \(\textbf B\) on top of any bound current \(\textbf J_b\) due to magnetization \(\textbf M\neq\textbf 0\). Ampere’s law of magnetostatics asserts that:

\[\frac{\partial}{\partial\textbf x}\times\textbf B=\mu_0(\textbf J_f+\textbf J_b)\]

So this motivates the definition of the magnetizing field \(\textbf H:=\textbf B/\mu_0-\textbf M\) so that \(\partial/\partial\textbf x\times\textbf H=\textbf J_f\). That being said, analogous to how \(\textbf D=\varepsilon\textbf E\) in linear dielectrics, one would also like \(\textbf H=\textbf B/\mu\) in these “linear diamagnets/paramagnets”. The claim follows.
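Spelled out, and assuming the linear relation \(\textbf M=\chi_m\textbf H\) holds for the total magnetizing field inside the material, the step reads:

\[\textbf B=\mu_0(\textbf H+\textbf M)=\mu_0(1+\chi_m)\textbf H=:\mu\textbf H\]

which is the magnetic counterpart of \(\textbf D=\varepsilon_0\textbf E+\textbf P=\varepsilon_0(1+\chi_e)\textbf E\).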

Problem #\(7\): Explain what changes in the electrostatic/magnetostatic results above when the fields are taken to depend on time \(t\).

Solution #\(7\): Just as Maxwell’s displacement current contribution to Ampere’s original magnetostatic law came about by enforcing local charge conservation, here the idea will be to enforce local bound charge conservation, with local free charge conservation therefore coming as a corollary. As usual, this means stipulating the continuity equation:

\[\frac{\partial\rho_b}{\partial t}+\frac{\partial}{\partial\textbf x}\cdot\textbf J_b=0\]

The earlier magnetostatic result \(\textbf J_b=\partial/\partial\textbf x\times\textbf M\) is divergence-free, so it is clearly not compatible with this whenever \(\dot{\rho}_b\neq 0\); therefore, because \(\rho_b=-\partial/\partial\textbf x\cdot\textbf P\), it needs to be corrected to:

\[\textbf J_b=\frac{\partial}{\partial\textbf x}\times\textbf M+\frac{\partial\textbf P}{\partial t}\]
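As a quick consistency check (a sketch using only the identity that the divergence of a curl vanishes), taking the divergence of this corrected bound current recovers the continuity equation:

\[\frac{\partial}{\partial\textbf x}\cdot\textbf J_b=\underbrace{\frac{\partial}{\partial\textbf x}\cdot\left(\frac{\partial}{\partial\textbf x}\times\textbf M\right)}_{=0}+\frac{\partial}{\partial t}\left(\frac{\partial}{\partial\textbf x}\cdot\textbf P\right)=-\frac{\partial\rho_b}{\partial t}\]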

Problem #\(8\): Write down Maxwell’s equations in materials. Comment on the case \(\rho_f=0\) and \(\textbf J_f=\textbf 0\).

Solution #\(8\):

\[\frac{\partial}{\partial\textbf x}\cdot\textbf D=\rho_f\]

\[\frac{\partial}{\partial\textbf x}\cdot\textbf B=0\]

\[\frac{\partial}{\partial\textbf x}\times\textbf E=-\frac{\partial\textbf B}{\partial t}\]

\[\frac{\partial}{\partial\textbf x}\times\textbf H=\textbf J_f+\frac{\partial\textbf D}{\partial t}\]

These should always be viewed in conjunction with a pair of constitutive relations, usually just the linear approximations \(\textbf D=\varepsilon\textbf E\) and \(\textbf H=\textbf B/\mu\). In dielectrics where \(\rho_f=0\) and \(\textbf J_f=\textbf 0\), Maxwell’s equations look like the usual ones in vacuum but with \(\varepsilon_0\mapsto\varepsilon\) and \(\mu_0\mapsto\mu\) (assuming the dielectric is linear). Light waves can therefore propagate in linear dielectrics just as they would in vacuum, except at a slower speed \(v=(\mu\varepsilon)^{-1/2}<c\).

Problem #\(9\): Using the results of Solution #\(8\), derive the \(4\) electromagnetic interface conditions (hint: each of the \(4\) macroscopic Maxwell equations will provide one).

Solution #\(9\): “Use pillboxes for the divs and closed loops for the curls”:

\[\Delta\textbf D\cdot\hat{\textbf n}=\sigma_f\]

\[\Delta\textbf B\cdot\hat{\textbf n}=0\]

\[\hat{\textbf n}\times\Delta\textbf E=\textbf 0\]

\[\hat{\textbf n}\times\Delta\textbf H=\textbf K_f\]

Problem: Consider placing a thin rod aligned with an external magnetic field \(\textbf B_{\text{ext}}\). What is the internal magnetic field \(\textbf B_{\text{int}}\) and magnetizing field \(\textbf H_{\text{int}}\)?

Solution:

Problem: Repeat the above problem but for a thin slab.

Solution: Following the discussion above, the dominant boundary condition is the normal continuity of the \(\textbf B\)-field:

Problem #\(10\): Derive the law of reflection and Snell’s law.

Solution #\(10\): Applying phase-matching at the \(z=0\) interface (otherwise there could be no hope of satisfying the electromagnetic interface conditions), one concludes that the frequency is solely a property of the incident source \(\omega=\omega’=\omega^{\prime\prime}\), reflecting conservation of photon energy, and that the projections \(k\sin\theta=k’\sin\theta’=k^{\prime\prime}\sin\theta^{\prime\prime}\) of the wavevectors \(\textbf k,\textbf k’\) and \(\textbf k^{\prime\prime}\) onto the \(xy\)-plane \(z=0\) are also equal, reflecting conservation of photon momentum along that direction. The law of reflection then follows simply because the reflected ray is in the same dielectric \(n^{\prime\prime}=n\) as the incident ray, while Snell’s law \(n\sin\theta=n’\sin\theta’\) follows because the refracted ray is now in a different dielectric \(n’\neq n\) with different dispersion relation.

(aside: provided the media actually support wave propagation and that all indices of refraction are defined via the phase velocity so that \(n,n’\) could be less than \(1\), then both the law of reflection and Snell’s law hold at the interface of arbitrary media, not just two dielectrics. Furthermore, both laws also hold for water waves and sound waves, not just light waves, which emphasizes that they are not really corollaries of Maxwell’s equations, but corollaries of the wave equation, while for light waves the wave equation is a corollary of Maxwell).

Problem #\(11\): Using the result of Problem #\(10\), derive the Fresnel equations for reflection and transmission at dielectric interfaces.

Solution #\(11\): The assumption of dielectrics is needed here in order to set \(\sigma_f=0\) and \(\textbf K_f=\textbf 0\) in the electromagnetic interface conditions (i.e. would not be true for conductors!). Note also that the amplitudes \(\textbf E_0,\textbf E’_0,\textbf E^{\prime\prime}_0\) are arbitrary, so capture all cases of linear, circular, or elliptical polarization.

The incident wavevector is \(\textbf k=k\sin\theta\hat{\boldsymbol{\rho}}_{\phi}+k\cos\theta\hat{\textbf z}\) while the transmitted wavevector is \(\textbf k’=k’\sin\theta’\hat{\boldsymbol{\rho}}_{\phi}+k’\cos\theta’\hat{\textbf z}\); the incident electric field amplitude \(\textbf E_0\) lives in the orthogonal complement \(\text{span}^{\perp}(\textbf k)\) while \(\textbf E’_0\) lives in \(\text{span}^{\perp}(\textbf k’)\) (analogous remarks apply for the magnetic field amplitudes \(\textbf B_0,\textbf B’_0\) which are inseparably married to their respective electric fields but are not indicated on the diagram for simplicity). These two-dimensional subspaces admit convenient orthonormal bases given by the normal unit vector \(\hat{\textbf s}:=\hat{\textbf k}\times\hat{\textbf p}=\hat{\textbf k}’\times\hat{\textbf p}’\) and the parallel unit vectors \(\hat{\textbf p}:=\text{cot}\theta\hat{\textbf k}-\text{csc}\theta\hat{\textbf z}\) and likewise \(\hat{\textbf p}’:=\text{cot}\theta’\hat{\textbf k}’-\text{csc}\theta’\hat{\textbf z}\); here the words “normal” and “parallel” are with respect to the plane of incidence/transmission/reflection \(\text{span}(\textbf k,\hat{\textbf z})=\text{span}(\textbf k’,\hat{\textbf z})\).

By decomposing \(\textbf E_0=E_{0,s}\hat{\textbf s}+E_{0,p}\hat{\textbf p}\) where \(E_{0,s}=\textbf E_0\cdot\hat{\textbf s}\) and \(E_{0,p}=\textbf E_0\cdot\hat{\textbf p}\), and doing likewise for \(\textbf E’_0,\textbf E^{\prime\prime}_0,\textbf B_0,\textbf B’_0,\textbf B^{\prime\prime}_0\) and substituting all this into the \(4\) electromagnetic interface conditions with \(\hat{\textbf n}=\hat{\textbf z}\), one obtains the Fresnel equations:

\[t_s:=\frac{E’_{0,s}}{E_{0,s}}=\frac{2Y\cos\theta}{Y\cos\theta+Y’\cos\theta’}\]

\[r_s:=\frac{E^{\prime\prime}_{0,s}}{E_{0,s}}=\frac{Y\cos\theta-Y’\cos\theta’}{Y\cos\theta+Y’\cos\theta’}\]

\[t_p:=\frac{E’_{0,p}}{E_{0,p}}=\frac{2Y\cos\theta}{Y\cos\theta’+Y’\cos\theta}\]

\[r_p:=\frac{E^{\prime\prime}_{0,p}}{E_{0,p}}=\frac{Y\cos\theta’-Y’\cos\theta}{Y\cos\theta’+Y’\cos\theta}\]

where the admittances are \(Y:=Z^{-1}=\sqrt{\varepsilon/\mu}\) and likewise \(Y’:=Z’^{-1}=\sqrt{\varepsilon’/\mu’}\). Note that all \(4\) coefficients \(t_s,r_s,t_p,r_p\) are defined here with respect to the electric field; one could have also defined them from the \(s\) and \(p\)-polarized components of the magnetic field, leading to almost the same Fresnel equations (differing by some subtle minus signs here and there).

Problem #\(12\): Sketch graphs of \(t_s,r_s,t_p,r_p\) as a function of the incident angle \(\theta\) in the two cases \(Y>Y’\) and \(Y<Y’\), and comment.

Solution #\(12\): From air to glass \(Y’/Y\approx 1.5\) (assuming both have \(\mu=\mu’=\mu_0\)):

At normal incidence \(\theta=\theta’=0\), there is clearly no longer any distinction between \(s\) and \(p\)-polarization, reflecting the fact that (as is clear from the graph) \(t_s(\theta=0)=t_p(\theta=0)\) and \(r_s(\theta=0)=r_p(\theta=0)\) regardless of whether \(Y<Y’\) or \(Y>Y’\). In addition, again regardless of whether \(Y<Y’\) or \(Y>Y’\), there always exists a Brewster angle \(\tan\theta_B=\sqrt{\frac{1-(Y’/Y)^2}{(n/n’)^2-1}}\approx Y’/Y\) at which \(r_p(\theta=\theta_B)=0\), so that the reflected light is entirely linearly \(s\)-polarized (the approximation is valid for non-magnetic dielectrics, e.g. air and glass, for which \(Y’/Y\approx n’/n\); in that case the reflected ray is orthogonal to the refracted ray, \(\textbf k’\cdot\textbf k^{\prime\prime}=0\), i.e. \(\theta_B+\theta’_B\approx \pi/2\)). However, only when going from an optically denser to a less dense medium, \(n>n’\) (equivalently \(Y>Y’\) for non-magnetic dielectrics), does there exist a critical angle \(\theta_C=\arcsin(n’/n)\) beyond which (\(\theta>\theta_C\)) total internal reflection occurs, accompanied by an evanescent wave \(\textbf E'(\rho,\phi,z)=\textbf E’_0e^{-\sqrt{k^2\sin^2\theta-k’^2}z}e^{i(k\rho\cos(\phi-\phi_{\textbf k})\sin\theta-\omega t)}\) in the second medium.
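To put numbers on the Brewster and critical angles, here is a minimal sketch assuming non-magnetic media (so that \(Y’/Y=n’/n\)) and the illustrative indices \(n=1\) for air and \(n’=1.5\) for glass. The function names are hypothetical helpers, not taken from any library.

```python
import math


def brewster_angle(n_in, n_out):
    """Brewster angle in radians for non-magnetic media: tan(theta_B) = n_out / n_in."""
    return math.atan2(n_out, n_in)


def critical_angle(n_in, n_out):
    """Critical angle in radians, or None when n_in <= n_out (no total internal reflection)."""
    if n_in <= n_out:
        return None
    return math.asin(n_out / n_in)


def refraction_angle(n_in, n_out, theta):
    """Snell's law n_in sin(theta) = n_out sin(theta'), valid below the critical angle."""
    return math.asin(n_in * math.sin(theta) / n_out)


theta_b = brewster_angle(1.0, 1.5)                        # air to glass
theta_b_refracted = refraction_angle(1.0, 1.5, theta_b)  # refracted angle at Brewster incidence
theta_c = critical_angle(1.5, 1.0)                        # glass to air

print(f"Brewster angle (air to glass): {math.degrees(theta_b):.2f} deg")
print(f"theta_B + theta_B': {math.degrees(theta_b + theta_b_refracted):.2f} deg")
print(f"critical angle (glass to air): {math.degrees(theta_c):.2f} deg")
```

The second printed line illustrates the statement that the reflected and refracted rays are orthogonal at Brewster incidence, while `critical_angle(1.0, 1.5)` returns `None` because no critical angle exists when entering the denser medium.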

Problem #\(13\): Building off of Problem #\(12\), calculate the transmitted powers \(T_s(\theta),T_p(\theta)\) along with the reflected powers \(R_s(\theta),R_p(\theta)\) as a function of incident angle \(\theta\) for both \(s\) and \(p\)-polarization. Verify that \(T_s(\theta)+R_s(\theta)=T_p(\theta)+R_p(\theta)=1\). Show that for air-to-glass at normal incidence (so that there is no distinction between \(s\) and \(p\)-polarization), \(R(\theta=0)\approx 4\%\).

Solution #\(13\): As long as one remembers the expression for the (period-averaged) Poynting vector \(\textbf S=\frac{1}{2}\Re(\textbf E\times\textbf H^*)=\frac{Y|\textbf E_0|^2}{2}\hat{\textbf k}\), the rest is easy (the coefficients here are defined relative to the normal \(\hat{\textbf n}=\hat{\textbf z}\) of the interface, though Poynting’s theorem with \(\textbf J_f=\textbf 0\) would hold in an arbitrary direction; there is also a nice way to see why this is using notions of “beam divergence”):

\[T_s(\theta)=\frac{\textbf S’_s\cdot\hat{\textbf z}}{\textbf S_s\cdot\hat{\textbf z}}=\frac{Y’\cos\theta’}{Y\cos\theta}|t_s(\theta)|^2\]

\[R_s(\theta)=\frac{\textbf S^{\prime\prime}_s\cdot\hat{\textbf z}}{\textbf S_s\cdot\hat{\textbf z}}=|r_s(\theta)|^2\]

\[T_p(\theta)=\frac{\textbf S’_p\cdot\hat{\textbf z}}{\textbf S_p\cdot\hat{\textbf z}}=\frac{Y’\cos\theta’}{Y\cos\theta}|t_p(\theta)|^2\]

\[R_p(\theta)=\frac{\textbf S^{\prime\prime}_p\cdot\hat{\textbf z}}{\textbf S_p\cdot\hat{\textbf z}}=|r_p(\theta)|^2\]

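Substituting the Fresnel equations, \(T_s+R_s=T_p+R_p=1\) is easy to check, and for air-to-glass at normal incidence \(R(\theta=0)=\left(\frac{Y-Y’}{Y+Y’}\right)^2\approx\left(\frac{0.5}{2.5}\right)^2=4\%\). (Aside: given the \(4\) electromagnetic boundary conditions earlier, can one write down some corresponding boundary conditions on \(\hat{\textbf n}\cdot\Delta\textbf S\) and \(\hat{\textbf n}\times\Delta\textbf S\) which capture energy conservation? Especially the first one seems to just come from Poynting’s theorem.)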
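The energy bookkeeping can also be verified numerically. The sketch below assumes real angles (incidence below any critical angle) and non-magnetic media, so that the admittances may be replaced by the refractive indices up to a common factor; the function name `power_coefficients` is a hypothetical helper.

```python
import math


def power_coefficients(n_in, n_out, theta):
    """Return (T_s, R_s, T_p, R_p) for non-magnetic dielectrics, where Y is proportional to n."""
    theta_out = math.asin(n_in * math.sin(theta) / n_out)  # Snell's law
    y, y_out = n_in, n_out                                 # admittances up to a common factor
    c, c_out = math.cos(theta), math.cos(theta_out)

    # Fresnel amplitude coefficients, in the sign conventions used above.
    t_s = 2 * y * c / (y * c + y_out * c_out)
    r_s = (y * c - y_out * c_out) / (y * c + y_out * c_out)
    t_p = 2 * y * c / (y * c_out + y_out * c)
    r_p = (y * c_out - y_out * c) / (y * c_out + y_out * c)

    # Ratio of normal energy fluxes carried by the transmitted and incident beams.
    flux_ratio = (y_out * c_out) / (y * c)
    return flux_ratio * t_s**2, r_s**2, flux_ratio * t_p**2, r_p**2


for theta_deg in (0, 30, 60, 85):
    T_s, R_s, T_p, R_p = power_coefficients(1.0, 1.5, math.radians(theta_deg))
    print(f"theta={theta_deg:2d} deg  T_s+R_s={T_s + R_s:.6f}  T_p+R_p={T_p + R_p:.6f}  R_s={R_s:.4f}")
```

At \(\theta=0\) the reflected fraction comes out as \(0.04\) for both polarizations, matching the \(4\%\) air-to-glass figure.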

Problem #\(14\): Using the Lorentz oscillator model of the electron \(e^-\), show how the atomic polarizability \(\alpha\), the electric susceptibility \(\chi_e\), the permittivity \(\varepsilon\), and the wavenumber \(k\) all become complex-valued functions of the frequency \(\omega\in\textbf R\) of the incident light.

Solution #\(14\): The equation of motion for the electron is taken to be of the usual form of a damped, driven harmonic oscillator:

\[\ddot{\textbf x}(t)+\Delta\omega\dot{\textbf x}(t)+\omega_0^2\textbf x(t)=\frac{q\textbf E_0}{m}e^{i(\textbf k_{\omega}\cdot\textbf x(t)-\omega t)}\]

where the damping coefficient \(\Delta\omega\) and the resonant frequency \(\omega_0\) are both phenomenological parameters of the Lorentz model (the former \(\Delta\omega\) is the quantum line width of the relevant optical transitions that have energy commensurate with \(\hbar\omega\), equivalently the spontaneous emission rate or reciprocal lifetime, while the latter \(\omega_0^2=q^2/4\pi\varepsilon_0ma^3\) has a classical form following Solution #\(1\)). Although the electric field \(\textbf E(\textbf x,t)=\textbf E_0e^{i(\textbf k_{\omega}\cdot\textbf x-\omega t)}\) of course has an accompanying magnetic field \(\textbf B(\textbf x,t)=\textbf B_0e^{i(\textbf k_{\omega}\cdot\textbf x-\omega t)}\), the magnetic force \(q\dot{\textbf x}\times\textbf B\) can be ignored in the non-relativistic limit \(|\dot{\textbf x}|\ll c\).

Making the further assumption that at the relevant frequency \(\omega\), \(\textbf k_{\omega}\cdot\textbf x(t)\ll 1\) at all times \(t\), then it really reduces to a linearly damped, driven harmonic oscillator, for which the steady state particular integral oscillation is:

\[\textbf x=\frac{q}{m}\frac{1}{\omega_0^2-\omega^2-i\omega\Delta\omega}\textbf E\]

Optionally multiplying this solution \(\textbf x\) by the electron charge \(q\), one can write the electric dipole moment of an atom as:

\[\textbf p=\frac{q^2}{m}\frac{1}{\omega_0^2-\omega^2-i\omega\Delta\omega}\textbf E\]

But from the definition \(\textbf p=\alpha\textbf E\), one immediately recognizes the atomic polarizability has become a \(\textbf C\)-valued function of \(\omega\in\textbf R\):

\[\alpha(\omega)=\frac{q^2}{m}\frac{1}{\omega_0^2-\omega^2-i\omega\Delta\omega}\]

From \(\textbf p\), one can get \(\textbf P\) through a factor of the number density \(n\) of electrons (equivalent to the number density of electric dipole moments):

\[\textbf P=n\textbf p=n\alpha\textbf E:=\varepsilon_0\chi_e\textbf E\]

So:

\[\chi_e(\omega)=\frac{\omega_p^2}{\omega_0^2-\omega^2-i\omega\Delta\omega}\]

where the plasma frequency \(\omega_p^2=nq^2/m\varepsilon_0\) scales with the plasma density \(n\). The \(\textbf C\)-valued nature of the permittivity is then evident from \(\varepsilon(\omega)=\varepsilon_0(1+\chi_e(\omega))\), whose real and imaginary parts are:

\[\Re\varepsilon(\omega)=\varepsilon_0\left(1-\frac{\omega_p^2(\omega^2-\omega_0^2)}{(\omega^2-\omega_0^2)^2+\omega^2\Delta\omega^2}\right)\]

\[\Im\varepsilon(\omega)=\varepsilon_0\frac{\omega_p^2\omega\Delta\omega}{(\omega^2-\omega_0^2)^2+\omega^2\Delta\omega^2}\]

And the shape of both of their graphs is a good thing to get familiar with (note the roles played by the \(3\) distinct frequency scales in the function \(\omega_p,\omega_0,\Delta\omega\)):
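For readers who want to tabulate or plot these curves themselves, here is a minimal sketch in dimensionless units. The parameter values \(\omega_p=0.5\), \(\omega_0=1\), \(\Delta\omega=0.1\) are illustrative assumptions, and the function names are hypothetical.

```python
def chi_e(omega, omega_p, omega_0, linewidth):
    """Lorentz-model susceptibility in the e^{-i omega t} sign convention used above."""
    return omega_p**2 / (omega_0**2 - omega**2 - 1j * omega * linewidth)


def eps_rel_parts(omega, omega_p, omega_0, linewidth):
    """Closed-form real and imaginary parts of eps / eps_0."""
    denom = (omega**2 - omega_0**2) ** 2 + omega**2 * linewidth**2
    real = 1.0 - omega_p**2 * (omega**2 - omega_0**2) / denom
    imag = omega_p**2 * omega * linewidth / denom
    return real, imag


# Illustrative parameters in units where omega_0 = 1.
OMEGA_P, OMEGA_0, LINEWIDTH = 0.5, 1.0, 0.1

for omega in (0.5, 0.9, 1.0, 1.1, 2.0):
    re, im = eps_rel_parts(omega, OMEGA_P, OMEGA_0, LINEWIDTH)
    print(f"omega={omega:.1f}  Re eps/eps0={re:+.3f}  Im eps/eps0={im:.3f}")
```

The table shows the absorption peak in \(\Im\varepsilon\) of width \(\sim\Delta\omega\) centred on \(\omega_0\), with \(\Re\varepsilon\) swinging through \(\varepsilon_0\) at resonance (anomalous dispersion).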

As an aside, because the complexified susceptibility \(\chi_e(\omega)=\varepsilon(\omega)/\varepsilon_0-1\) is complex analytic in the upper half-plane \(\Im\omega>0\) on causality grounds and decays as \(|\omega|\to\infty\), the real and imaginary parts of \(\varepsilon(\omega)-\varepsilon_0\) are related by Kramers-Kronig relations:

\[\Re\varepsilon(\omega)-\varepsilon_0=\mathcal P\int_{-\infty}^{\infty}\frac{d\omega’}{\pi}\frac{\Im\varepsilon(\omega’)}{\omega’-\omega}\]

\[\Im\varepsilon(\omega)=\mathcal P\int_{-\infty}^{\infty}\frac{d\omega’}{\pi}\frac{\Re\varepsilon(\omega’)-\varepsilon_0}{\omega-\omega’}\]

Problem #\(15\): Dispersion relations for waves are conventionally thought of as \(\omega=\omega_{\textbf k}\), but one can of course invert this to obtain \(\textbf k=\textbf k_{\omega}\) and view that as the dispersion relation. This view will be more useful here; think of the frequency \(\omega\in\textbf R\) as a free real parameter one gets to select as the experimentalist.

For a linear dielectric in the absence of (free) charges or (free conduction) currents, what consequences does \(\varepsilon(\omega)\in\textbf C\) have for the propagation of light in such a medium?

Solution #\(15\): One always has the “naive linear” dispersion relation:

\[|\textbf k_{\omega}|=\sqrt{\mu\varepsilon}\omega\]

However, even though one typically assumes \(\mu\in\textbf R\) to be \(\omega\)-independent, the fact that \(\varepsilon=\varepsilon(\omega)\in\textbf C\) promotes the wavevector \(\textbf k_{\omega}\in\textbf C^3\) rather than the usual \(\textbf k_{\omega}\in\textbf R^3\). There are \(2\) important consequences of this, illustrated for instance using \(\textbf B(\textbf x,t)\):

Problem #\(16\): Show using the Drude model that all of the discussion for dielectrics applies equally well to conductors provided the permittivity receives an additional \(\textbf C\)-contribution from conduction electrons of the form:

\[\varepsilon(\omega)\mapsto\varepsilon_{\text{eff}}(\omega):=\varepsilon(\omega)+\frac{i\sigma(\omega)}{\omega}\]

where the optical conductivity is given by the Möbius transformation:

\[\sigma(\omega)=\frac{\sigma_{\text{DC}}}{1-i\omega\tau}\]

where \(\sigma_{\text{DC}}=\sigma(\omega=0)=nq^2\tau/m=\varepsilon_0\tau\omega_p^2\) and \(\tau>0\) is the relaxation time between electron collisions.

Solution #\(16\): Now the equation of motion is basically the same as from the Lorentz oscillator model except there is no Hookean restoring force \(\omega_0=0\) and one renames \(\Delta\omega=1/\tau\). Notably, written in terms of the velocity \(\textbf v=\dot{\textbf x}\), the ODE is now merely \(1^{\text{st}}\)-order:

\[\dot{\textbf v}+\frac{1}{\tau}\textbf v=\frac{q\textbf E}{m}\]

The steady state particular integral velocity response \(\textbf v(t)=\textbf v_0e^{-i\omega t}\) to an AC electric field \(\textbf E=\textbf E_0e^{-i\omega t}\) is now a Möbius transformation rather than a Lorentzian:

\[\textbf v=\frac{q\tau}{m}\frac{1}{1-i\omega\tau}\textbf E\]

In order to get optical conductivity \(\sigma(\omega)\), one just has to remember what its defining equation is in the first place, i.e. Ohm’s law \(\textbf J=\sigma\textbf E\). So computing \(\textbf J=\rho\textbf v=\frac{nq^2\tau}{m}\frac{1}{1-i\omega\tau}\textbf E\), the claim follows.

In Fourier space, applying Ohm’s law (valid for low mean free path of the electrons) and the continuity equation for local charge conservation, Maxwell’s equations in an ohmic conductor become:

\[i\textbf k_{\omega}\cdot\varepsilon\textbf E_0=\frac{\textbf k_{\omega}\cdot\sigma\textbf E_0}{\omega}\]

\[i\textbf k_{\omega}\cdot\textbf B_0=0\]

\[i\textbf k_{\omega}\times\textbf E_0=-(-i\omega\textbf B_0)\]

\[i\textbf k_{\omega}\times\frac{\textbf B_0}{\mu}=\sigma\textbf E_0-i\omega\varepsilon\textbf E_0\]

And in particular, the last equation (Ampere-Maxwell) gives the desired \(\varepsilon_{\text{eff}}=\varepsilon+i\sigma/\omega\) (interestingly this same quantity is required to be non-vanishing \(\varepsilon_{\text{eff}}\neq 0\) so that Gauss’s law for the \(\textbf D\)-field yields transverse EM waves). Computing the usual \(\textbf k_{\omega}\times(\textbf k_{\omega}\times\textbf E_0)\) gives the dispersion relation:

\[k_{\omega}=\sqrt{\mu\varepsilon_{\text{eff}}}\omega\]

Problem #\(17\): How do waves behave in the low-\(\omega\) and high-\(\omega\) limits inside the conductor?

Solution #\(17\): For low-\(\omega\), the conductivity term dominates so the permittivity \(\varepsilon_{\text{eff}}\approx\frac{i\sigma}{\omega}\approx\frac{i\sigma_{\text{DC}}}{\omega}\) is purely imaginary. The dispersion relation then dictates:

\[k_{\omega}=\sqrt{\mu\varepsilon_{\text{eff}}}\omega=\sqrt{\frac{\mu\sigma_{\text{DC}}\omega}{2}}(1+i)\]

where the other option \(\sqrt{i}=-(1+i)/\sqrt{2}\) is rejected because one requires on physical grounds an evanescent wave so \(\Im k_{\omega}>0\). Moreover, because \(\Re k_{\omega}=\Im k_{\omega}\) in this case, the \(\textbf E\) (equivalently, roughly \(\textbf D\)) and \(\textbf B\) (equivalently \(\textbf H\)) fields are \(45^{\circ}\) out of phase in the conductor. All of the EM fields \(\textbf E,\textbf B,\textbf D,\textbf H\) are exponentially attenuated:

\[(\text{phasor})e^{i(\textbf k_{\omega}\cdot\textbf x-\omega t)}=(\text{phasor})e^{-z/\delta(\omega)}e^{i(z/\delta(\omega)-\omega t)}\]

where the skin depth \(\delta(\omega):=\sqrt{2/\mu\sigma_{\text{DC}}\omega}=\lambda/2\pi\) is on the order of a single wavelength \(\lambda\)!
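To get a feel for the magnitudes, here is a minimal sketch evaluating \(\delta(\omega)\). The DC conductivity of copper used below (\(5.96\times 10^7\text{ S/m}\)) is a commonly quoted room-temperature value and is an assumption of this example, as is taking \(\mu=\mu_0\).

```python
import math

MU0 = 4e-7 * math.pi    # vacuum permeability in H/m (to within experimental precision)
SIGMA_COPPER = 5.96e7   # assumed DC conductivity of copper in S/m


def skin_depth(sigma_dc, omega, mu=MU0):
    """Low-frequency skin depth delta = sqrt(2 / (mu * sigma_DC * omega)) in metres."""
    return math.sqrt(2.0 / (mu * sigma_dc * omega))


for freq_hz in (50.0, 1.0e6):
    delta = skin_depth(SIGMA_COPPER, 2 * math.pi * freq_hz)
    print(f"f = {freq_hz:.0e} Hz  ->  skin depth = {delta * 1e3:.4f} mm")
```

Under these assumptions the skin depth is of order millimetres at mains frequency and tens of micrometres at \(1\text{ MHz}\), and it scales as \(\omega^{-1/2}\).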

For high-\(\omega\), \(\varepsilon\to\varepsilon_0(1-\omega_p^2/\omega^2)\) and \(\sigma\to i\sigma_{\text{DC}}/\omega\tau\) (just look at the graphs from earlier to see all of this!) so:

\[\varepsilon_{\text{eff}}\to\varepsilon_0\left(1-\frac{2\omega_p^2}{\omega^2}\right)\in\textbf R\]

The implication of this is that \(\omega_p\) (or more precisely \(\sqrt{2}\omega_p\)) provides a strict cut-off:

The ionosphere (the ionized upper layer of the atmosphere) can be modelled as such a plasma, and this underlies why FM radio waves with \(\omega>\sqrt{2}\omega_p\) transmit through it whereas AM radio waves with \(\omega<\sqrt{2}\omega_p\) are reflected.
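As an order-of-magnitude check of the radio claim, the sketch below evaluates \(\omega_p^2=nq^2/m\varepsilon_0\) for free electrons. The electron density \(n=10^{12}\text{ m}^{-3}\) is an assumed, purely illustrative value (real ionospheric densities vary with altitude and time of day), and the representative AM and FM frequencies are likewise assumptions.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in C
M_ELECTRON = 9.1093837015e-31  # electron mass in kg
EPS0 = 8.8541878128e-12      # vacuum permittivity in F/m


def plasma_frequency_hz(n_electrons):
    """Plasma frequency omega_p / 2 pi in Hz for electron number density n in m^-3."""
    omega_p = math.sqrt(n_electrons * E_CHARGE**2 / (M_ELECTRON * EPS0))
    return omega_p / (2 * math.pi)


N_ASSUMED = 1e12                      # illustrative electron density in m^-3
f_p = plasma_frequency_hz(N_ASSUMED)
f_cut = math.sqrt(2) * f_p            # the sqrt(2) * omega_p cut-off quoted above

F_AM, F_FM = 1.0e6, 100.0e6           # representative AM and FM carrier frequencies in Hz
print(f"plasma frequency: {f_p / 1e6:.2f} MHz, cut-off: {f_cut / 1e6:.2f} MHz")
print(f"AM at {F_AM / 1e6:.0f} MHz reflected: {F_AM < f_cut}")
print(f"FM at {F_FM / 1e6:.0f} MHz transmitted: {F_FM > f_cut}")
```

With this assumed density the cut-off lands in the tens-of-MHz range, between the AM and FM bands, consistent with the qualitative statement.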


Electronics

Problem: Explain how to make a Zener diode and the physics underlying its breakdown mechanism.

Solution: A Zener diode is simply made by heavily doping a \(p\)-\(n\) junction, leading to a very thin depletion region. Then, upon applying a suitably strong reverse bias, eventually electrons from the valence band of the \(\text{p}\)-type semiconductor can quantum tunnel into the conduction band of the \(\text n\)-type semiconductor, hence giving rise to Zener breakdown; the Zener voltage at which this occurs is typically \(V_{\text{Zener}}<5\text{ V}\) or so. Beyond this point, a substantial generation current flows across the \(\text p\)-\(\text n\) junction.

Problem: Explain how to make an avalanche diode and the physics underlying its breakdown mechanism.

Solution: The avalanche diode is sometimes also marketed as a Zener diode simply because in practical electronics, both serve the same function, e.g. voltage referencing, protection in flyback Zener diodes, etc.

This time, the idea is to lightly dope instead, so that one gets a wide depletion region. Minority mobile charge carriers accelerated across the depletion region then gain enough energy to ionize Si host atoms on impact, each such event freeing an electron and leaving behind a hole, so that \(1\) carrier becomes \(3\) carriers, each of which is further accelerated by the \(\textbf E\)-field and can ionize more Si host atoms. This is thus an avalanche effect, leading to avalanche breakdown (e.g. in single-photon avalanche diodes).

Problem: A smartphone tuning app is able to tune the fifth string of a guitar to \(110\text{ Hz}\) with a precision of \(0.07\text{ Hz}\). Estimate the minimum sampling frequency and sampling time needed for this task.

Solution:

\[f_s\geq 2\times(110+0.07)\text{ Hz}=220.14\text{ Hz}\]

\[T_s\geq\frac{1}{0.07\text{ Hz}}\approx 14\text{ s}\]
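The same estimate in code form, as a small sketch: the function name `tuner_requirements` is hypothetical, and the sample count is simply the product of the two bounds above rather than something stated in the original solution.

```python
import math


def tuner_requirements(f_signal_hz, resolution_hz):
    """Return (minimum sampling rate in Hz, minimum record length in s, sample count)."""
    f_s = 2.0 * (f_signal_hz + resolution_hz)  # Nyquist rate for the highest frequency of interest
    t_record = 1.0 / resolution_hz             # frequency resolution ~ 1 / (record length)
    n_samples = math.ceil(f_s * t_record)      # samples collected over that record
    return f_s, t_record, n_samples


f_s, t_record, n_samples = tuner_requirements(110.0, 0.07)
print(f"f_s >= {f_s:.2f} Hz, T >= {t_record:.1f} s, about {n_samples} samples")
```

The record length, not the sampling rate, is what sets the achievable frequency precision here.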

Digitization & Signal Processing

In what follows, it is convenient to think of \(f(t)\in\textbf R\) as a real-valued analog \(t\)-domain signal. Then its Fourier transform \(\hat f(\omega):=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(t)e^{-i\omega t}dt\) (exact convention doesn’t matter) is trivially seen to be conjugate-even in \(\omega\)-space \(\hat f(-\omega)=\hat f^*(\omega)\) and in particular its modulus \(|\hat f(\omega)|\) (which one often plots and speaks of as if it were \(\hat f(\omega)\) itself) is simply an even function in \(\omega\)-space \(|\hat f(-\omega)|=|\hat f(\omega)|\). Thus, as Newton might say, for every positive frequency component \(\omega>0\) in \(\hat f(\omega)\), there is an “equal and opposite” negative frequency component \(-\omega<0\) in \(\hat f(\omega)\). This notion that \(\hat f(\omega)\) will be symmetric around the origin \(\omega=0\) is important to keep in mind in what follows below:

(Weak Nyquist Theorem): Given a \(t\)-domain signal \(f(t)\) whose Fourier transform \(\hat f(\omega)\) is known a priori to be compactly supported (or bandlimited) in \(\omega\)-space on a symmetric interval of the form \([-\omega^*,\omega^*]\), then one can guarantee recovery of \(f(t)\) from the \(t\)-sampled signal \(f(t)Ш_{2\pi/\omega_s}(t)\) at sampling angular frequency \(\omega_s\) (where the Dirac comb is defined by \(Ш_{T}(t):=\sum_{n=-\infty}^{\infty}\delta(t-nT)\)) provided one \(t\)-samples with at least \(\omega_s>2\omega^*\).

Proof: Given the \(t\)-sampled signal \(f(t)Ш_{2\pi/\omega_s}(t)\) at sampling angular frequency \(\omega_s\), the Fourier transform of this product is (by the convolution theorem) the convolution of their Fourier transforms (where the Dirac comb is roughly speaking an eigenfunction of the Fourier transform):

\[f(t)Ш_{2\pi/\omega_s}(t)\mapsto\hat f(\omega)*Ш_{\omega_s}(\omega)\]

which is also equal to \(\hat f(\omega)*Ш_{\omega_s}(\omega)=\sum_{n=-\infty}^{\infty}\hat f(\omega-n\omega_s)\), i.e. placing translated copies of the graph of \(\hat f(\omega)\) at each discrete lattice point \(n\omega_s\) in \(\omega\)-space. No information is lost from sampling if and only if these so-called convolution images \(\hat f(\omega-n\omega_s)\) for \(n\in\textbf Z\) do not overlap with each other in \(\omega\)-space, for which it is a sufficient (though not necessary) condition that \(\omega_s> 2\omega^*\) as claimed. More precisely, this claimed losslessness of information is captured by the following explicit algorithm for recovering this information: first, multiplying \(\hat f(\omega)*Ш_{\omega_s}(\omega)\) by the top-hat filter \(\text{rect}(\omega/2\omega^*):=[\omega\in[-\omega^*,\omega^*]]\) in \(\omega\)-space filters out all the duplicate convolution images \(\hat f(\omega-n\omega_s)\) for \(n\neq 0\) arising from the \(t\)-sampling and leaves one with the desired, bandlimited Fourier transform of just the original, unsampled, signal \(f\):

\[\hat f(\omega)=[\hat f(\omega)*Ш_{\omega_s}(\omega)]\text{rect}(\omega/2\omega^*)\]

To recover the original \(t\)-domain signal \(f(t)\), the inverse Fourier transform leads (by the convolution theorem again) to:

\[f(t)=[f(t)Ш_{2\pi/\omega_s}(t)]*2\omega^*\text{sinc}\left(\omega^*t\right)\]

where \(\text{sinc}(x):=\sin(x)/x\) in this context is unnormalized \(\int_{-\infty}^{\infty}\text{sinc}(x)dx=\pi\) (see the Dirichlet integral; one can of course also work with the normalized \(\widehat{\text{sinc}}(x):=\text{sinc}(\pi x)\)) and acts as an interpolating kernel when convolved with the \(t\)-sampled signal \(f(t)Ш_{2\pi/\omega_s}(t)\) (this discussion therefore also proves the Whittaker-Shannon interpolation formula). Hence the information in \(f\) is conserved as promised.
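The interpolation formula can be tried out numerically. The sketch below is an illustration under stated assumptions rather than the proof's exact normalization: it uses a \(\text{sinc}^2\) test signal (whose spectrum is a triangle supported on \(|f|\leq 1\text{ Hz}\)), places the top-hat cut-off at half the sampling rate (any cut-off between \(\omega^*\) and \(\omega_s-\omega^*\) works), and truncates the infinite sum to finitely many samples.

```python
import math


def sinc(x):
    """Unnormalized sinc, sin(x) / x, with the removable singularity filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def signal(t, bandwidth_hz=1.0):
    """Test signal sinc^2(pi B t): bandlimited to |f| <= B because its spectrum is a triangle."""
    return sinc(math.pi * bandwidth_hz * t) ** 2


def reconstruct(t, f_s, n_terms=2000):
    """Whittaker-Shannon interpolation from samples at rate f_s, truncated to |n| <= n_terms."""
    period = 1.0 / f_s
    return sum(
        signal(n * period) * sinc(math.pi * (t - n * period) / period)
        for n in range(-n_terms, n_terms + 1)
    )


T_TEST = 0.5
print("true value        :", signal(T_TEST))
print("sampled at 4 Hz   :", reconstruct(T_TEST, 4.0))  # above the 2 Hz Nyquist rate
print("sampled at 1 Hz   :", reconstruct(T_TEST, 1.0))  # below it: aliasing
```

Sampling at \(4\text{ Hz}\) reproduces the signal between the sample points, whereas sampling at \(1\text{ Hz}\) sees only the single nonzero sample at \(t=0\) and so returns a plain sinc instead of \(\text{sinc}^2\), which is aliasing in action.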

However, given more information about \(f\) (specifically, about \(\hat f\)), one can do better. That is, suppose \(\hat f(\omega)\) is not merely supported on a symmetric interval in \(\omega\)-space of the form \([-\omega^*,\omega^*]\), but rather on a symmetric union of two intervals of the form \([-\omega^*,-\omega_*]\cup[\omega_*,\omega^*]\) with some lower angular frequency cutoff \(\omega_*<\omega^*\) known a priori in addition to the bandlimiting frequency \(\omega^*\). Then it turns out to be possible to recover \(f(t)\) from a sub-Nyquist \(t\)-sampling frequency \(\omega_s<2\omega^*\) provided one chooses \(\omega_s\) wisely:

(Strong Nyquist Theorem): Given a \(t\)-domain signal \(f(t)\) whose Fourier transform \(\hat f(\omega)\) is known a priori to be bandlimited in \(\omega\)-space on a symmetric union of two intervals of the form \([-\omega^*,-\omega_*]\cup[\omega_*,\omega^*]\), then it may be possible to \(t\)-sample at a sub-Nyquist frequency \(\omega_s>2\Delta\omega\) (where \(\Delta\omega:=\omega^*-\omega_*\) is the bandwidth of the signal \(f\)) while still being able to recover \(f(t)\) from \(f(t)Ш_{2\pi/\omega_s}(t)\).

Proof: Just draw some pictures in \(\omega\)-space. Note the catch to this: such a sampling frequency \(\omega_s\) may not exist if there is not enough room to sneak aliases into the empty region \([-\omega_*,\omega_*]\). The exact conditions under which this is possible are complicated:
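One way to make the pictures concrete is to enumerate the windows of \(\omega_s\) for which no convolution images overlap. The sketch below uses the commonly quoted bandpass-sampling condition \(2\omega^*/n\leq\omega_s\leq 2\omega_*/(n-1)\) for integers \(1\leq n\leq\lfloor\omega^*/\Delta\omega\rfloor\); this condition is not derived in the text above and is brought in here as an assumption, as are the function name and the example band \([20,25]\).

```python
import math

def bandpass_sampling_windows(w_low, w_high):
    """Allowed (min, max) ranges of omega_s for a spectrum on [w_low, w_high] and its mirror.

    Uses 2*w_high/n <= omega_s <= 2*w_low/(n-1) for n = 1 .. floor(w_high/bandwidth);
    n = 1 is the ordinary Nyquist window, which has no upper limit.
    """
    bandwidth = w_high - w_low
    n_max = math.floor(w_high / bandwidth)
    windows = []
    for n in range(1, n_max + 1):
        lower = 2 * w_high / n
        upper = math.inf if n == 1 else 2 * w_low / (n - 1)
        windows.append((lower, upper))
    return windows

windows = bandpass_sampling_windows(w_low=20.0, w_high=25.0)
for lower, upper in windows:
    print(f"{lower:.2f} <= omega_s <= {upper:.2f}")
```

For this example band the narrowest window collapses to the single value \(\omega_s=10=2\Delta\omega\), which illustrates why the sampling frequency has to be chosen carefully: values between the windows (e.g. \(\omega_s=11\)) alias even though they exceed \(2\Delta\omega\).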

In practice, for signals \(f(t)\) one encounters in the real world (e.g. temperature \(T(t)\) of a thermistor, air pressure \(p(t)\) of a microphone, displacement \(x(t)\) of a particle, etc. though in practice these are all reduced via modern electronics to voltages \(V(t)\) in a circuit), it is rare to find that \(\hat f(\omega)\) is localized precisely to some interval in \(\omega\)-space like \([-\omega^*,\omega^*]\) or \([-\omega^*,-\omega_*]\cup[\omega_*,\omega^*]\); usually the support \(\hat f^{-1}(\textbf C-\{0\})\) will be unbounded, so that the separate convolution images \(\hat f(\omega-n\omega_s)\) separated by the sampling frequency \(\omega_s\) could only be made disjoint from each other in the limit of infinite separation \(\omega_s\to\infty\) in \(\omega\)-space, but of course there is no practical \(t\)-sampler with \(\omega_s=\infty\). However all is not lost; in many applications it turns out that one can think of \(\omega_*\) and \(\omega^*\) not as strictly the smallest and largest frequency components present in the signal \(\hat f(\omega)\), but rather as the smallest and largest frequency components in \(\hat f(\omega)\) that one gives a damn about for the application of interest. For instance, human biology limits our ears to being able to detect sound frequencies in between \(\omega_*/2\pi\approx 20\text{ Hz}\) and \(\omega^*/2\pi \approx 20\text{ kHz}\). Mathematically, this means that the signal \(f(t)\) corresponding to \(\hat f(\omega)\) and the signal \(\tilde f(t)\) corresponding to \(\hat f(\omega)\text{rect}_{2\Delta\omega}(\omega)\) sound exactly the same to our ears. The Nyquist theorem therefore asserts that one needs to sample at a rate of at least \(\omega_s/2\pi\gtrsim 40\text{ kHz}\) and indeed the standard audio sampling frequency turns out to be \(\omega_s/2\pi=44.1\text{ kHz}\).

Operational Amplifiers

An operational amplifier (op-amp for short) is, as its name suggests, a kind of amplifier. The “operational” part of its name refers to the fact that op-amp circuits can essentially be used to perform various mathematical operations on functions such as addition/subtraction \(\pm\), differentiation \(\frac{d}{dt}\), integration \(\int dt\), etc. The typical schematic of an op-amp is shown below (note there is no widespread convention regarding whether one should draw the non-inverting input \(+\) above or below the inverting input \(-\)):

Since op-amps are active components (unlike say resistors, capacitors or inductors which are passive components), they require a bipolar external DC power supply \(\pm V_S\) which is not always annotated on the schematic like it is above, being implicitly understood. Typically, but not always, \(V_S=15\text{ V}\). As far as practical use of op-amps goes, it suffices to treat them as black boxes (in reality lots of transistors, etc.) subject to the following \(3\) rules (strictly true only in the steady state \(t\to\infty\)):

\[|V_{\text{out}}|\leq V_S\]

\[V_{\text{out}}=A(V_+-V_-)\]

\[I_+=I_-=0\]

where \(A\) is called the op-amp’s open-loop gain because as it stands, the op-amp drawn above is said to be in an open-loop configuration meaning that there’s nothing at all connecting the output \(V_{\text{out}}\) to either of the inputs \(V_{\pm}\) (if there were, then the op-amp would be in a closed-loop configuration). The open-loop gain \(A\) is typically quite large \(A\sim 10^5\), and also not well-controlled during op-amp manufacturing (i.e. there can be a substantially large tolerance \(\Delta A\)). For this reason, despite the fact that the rule \(V_{\text{out}}=A(V_+-V_-)\) clearly suggests an amplification of the differential input voltage \(V_+-V_-\) by the open-loop gain \(A\), actually most of the time (with the exception of certain special op-amp circuits such as comparators), the amplifying ability of the op-amp is not meant to come from its open-loop gain \(A\). Put another way, if one tried to amplify even a mere \(V_+-V_-=0.2\text{ mV}\) using just the raw op-amp in its open-loop configuration, then \(A(V_+-V_-)\sim 20\text{ V}\) would already exceed the typical supply voltage \(V_S=15\text{ V}\), contradicting the first rule and therefore causing the op-amp in this open-loop configuration to simply saturate at \(V_{\text{out}}=V_S=15\text{ V}\).

So as it stands, we have some open-loop op-amp with some large open-loop gain \(A\) plagued by a large uncertainty \(\Delta A\) and only able to amplify the smallest of voltages before saturating. Doesn’t sound too useful! The key insight to addressing all of these problems is, as one may guess, to not use open-loop op-amps, but rather to use op-amps in a closed-loop configuration. More precisely, there are \(2\) distinct kinds of closed-loop configuration, one generally more useful than the other. If one connects a wire from \(V_{\text{out}}\) to \(V_+\) (resp. \(V_-\)) (possibly with other components like resistors, capacitors, etc. along this wire), then the op-amp is said to be in a positive (resp. negative) feedback configuration. As its name suggests, a positive feedback configuration is self-reinforcing. To see this, suppose initially \(V_+=V_-\) so that \(V_{\text{out}}=0\). Then, if some small perturbation causes a sudden imbalance \(V_+>V_-\), then of course \(V_{\text{out}}=A(V_+-V_-)\) will respond by increasing too. But now comes the positive feedback loop! Since \(V_{\text{out}}\) and \(V_+\) are positively correlated in a positive feedback configuration, as \(V_{\text{out}}\) grows so does \(V_+\) which by \(V_{\text{out}}=A(V_+-V_-)\) causes \(V_{\text{out}}\) to grow even more, and so forth, leading to a runaway instability (in practice the op-amp would quickly rail at \(V_{\text{out}}=V_S\) and square-wave like oscillations of \(\pm V_S\) in \(V_{\text{out}}(t)\) may ensue). This is not very useful unless one is specifically interested in such an oscillator. By far the most useful closed-loop configuration of an op-amp is the negative feedback configuration. Repeating the above arguments shows that this tends to have a stabilizing effect by coaxing \(V_+-V_-\to 0\) to a setpoint of \(0\) in a manner not unlike a mass on a spring which attempts to restore the mass’s position \(x\) to the equilibrium position \(x_0\) via \(x-x_0\to 0\). Heuristically:

Thus, for op-amps with a closed-loop negative-feedback configuration specifically, one can just assume that \(V_+\approx V_-\) even when \(V_{\text{out}}\neq 0\). This is called the virtual short rule. Together with the two other rules \(V_{\text{out}}\leq V_S\) and \(I_+=I_-=0\), these \(3\) rules taken together are called the golden rules for op-amps (sometimes the rule \(V_{\text{out}}\leq V_S\) is omitted from the golden rules but I think it’s important enough to be included among them). Note that the two golden rules \(V_{\text{out}}\leq V_S\) and \(I_+=I_-=0\) are always true, the former because of just how op-amps work (black box!) and the latter because the input impedance \(Z_+=Z_-=\infty\) is in practice very large, so effectively infinite on both inputs, hence no current is drawn \(I_-=I_+=0\) (this is clearly independent of whether the op-amp happens to be closed-loop or not). But the virtual short golden rule \(V_+=V_-\) only applies to op-amps in closed-loop negative-feedback configurations. This is a very important caveat that one must always remember. In other words, the first thing to always check before analyzing any op-amp circuit is whether or not there is a negative feedback path \(V_{\text{out}}\to V_-\). If so, then one’s life is made easy by the \(3\) golden rules. Otherwise, if the op-amp is in an open-loop or closed-loop but positive-feedback configuration, then one must be more careful and in particular one cannot just blindly apply the golden rule \(V_+=V_-\) anymore.

Perhaps the simplest closed-loop negative-feedback op-amp circuit is that of the voltage follower; one simply starts with an open-loop op-amp and then connects a negative feedback loop \(V_{\text{out}}\to V_-\) with absolutely nothing on it:

Thanks to the earlier discussion, one can therefore legitimately apply the golden rules here to find that, at equilibrium, \(V_{\text{out}}=V_+=V_-\), hence the name “voltage follower”. It may seem like a voltage follower is a bit of a pointless op-amp circuit because it simply “buffers” the input voltages \(V_+,V_-\) across to the output \(V_{\text{out}}\); after all a wire would do exactly the same thing. The key difference though is that whereas a wire will also draw some non-zero current \(I\neq 0\), a voltage follower, being essentially an op-amp, does not draw any current \(I_+=0\) as one of the op-amp golden rules. In other words, electrons \(e^-\) are not allowed to hop between the two stages, so they are effectively isolated from each other and cannot affect each other. This point is best illustrated with the following explicit example in which the first stage is a voltage divider \((R,R)\) (same resistance \(R\) for simplicity) and the second stage is some load resistor \(R\) (again same \(R\) for simplicity). One would like to take the voltage outputted at the center of the voltage divider and apply it across the load resistor \(R\). Naively, applying an input voltage of \(V_{\text{in}}\) across the voltage divider leads to an output voltage \(V_{\text{out}}=V_{\text{in}}/2\). Suppose this is how much voltage we actually would like to apply across the load resistor \(R\). So then we connect the load resistor \(R\) to the voltage divider with a simple, naive wire:

The problem here of course is that the first stage (the voltage divider) is not isolated from the second stage (the load resistor \(R\)) because a current will flow across and into the load resistor \(R\). Because the load resistor \(R\) is in parallel with the bottom resistor \(R\) in the potential divider, the effective bottom resistance becomes \(R/2\), so rather than the voltage divider outputting the voltage \(V_{\text{in}}/2\) we had hoped for, actually it’s been reduced to \(\frac{R/2}{R+R/2}V_{\text{in}}=V_{\text{in}}/3\) after we hooked up the load resistor \(R\). Of course in this case it’s not too much hassle to deal with this (just adjust \(V_{\text{in}}\) accordingly), but more generally it can be a headache if each additional stage one adds messes with previous stages. Instead of connecting just a bare wire between the two stages, here a voltage follower comes in very handy:

Now, because \(I_+=0\), we recover our expected voltage divider output of \(V_{\text{in}}/2\). Moreover, the voltage follower then simply passes this voltage \(V_{\text{in}}/2\) across the second stage load resistor \(R\) without any fuss, as we desired. Voltage followers therefore provide a simple and effective way to buffer an output voltage from one stage as the input voltage into a second stage, ensuring that any modifications made to the second stage later (e.g. changing the resistance of the load resistor \(R\mapsto R’\neq R\)) do not affect this buffered voltage \(V_{\text{in}}/2\).
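The comparison between the bare-wire and buffered connections can be checked with a few lines of arithmetic. The following is a minimal sketch; the function names and the numerical values \(V_{\text{in}}=9\text{ V}\), \(R=1\text{ k}\Omega\) are arbitrary choices for illustration, and the follower is idealized as drawing exactly zero current.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

def divider_output(v_in, r_top, r_bottom, r_load=None):
    """Voltage at the divider midpoint; r_load=None means nothing draws current from it."""
    r_eff = r_bottom if r_load is None else parallel(r_bottom, r_load)
    return v_in * r_eff / (r_top + r_eff)

v_in, r = 9.0, 1000.0
buffered = divider_output(v_in, r, r)             # follower in between: I_+ = 0
loaded = divider_output(v_in, r, r, r_load=r)     # bare wire straight to the load
print(buffered / v_in, loaded / v_in)             # fractions of V_in: one half vs one third
```

Changing `r_load` changes `loaded` but leaves `buffered` untouched, which is the isolation property described above.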

Earlier, it was mentioned that op-amps, while possessing amplification powers, are seldom used as open-loop amplifiers with their unstable open-loop gain \(A\). So then how do they amplify voltages? One way is to use a non-inverting amplifier. The idea is to start with the open-loop op-amp, then provide negative feedback through a feedback resistor \(R_F\), followed by completing the voltage divider with a second resistor \(R\) (of course, there are many topologically homeomorphic ways to draw this, in particular drawings that put the non-inverting input \(+\) above the inverting input \(-\)):

The fact that this is a voltage divider for \(V_{\text{out}}\) together with the golden rule (because we’re supplying negative feedback!) immediately yields \(V_{\text{in}}=\frac{R}{R+R_F}V_{\text{out}}\) from which we obtain the equation for a non-inverting amplifier:

\[V_{\text{out}}=\left(1+\frac{R_F}{R}\right)V_{\text{in}}\]

Notice that if one removes the feedback resistor \(R_F=0\), then the non-inverting amplifier reduces to a voltage follower. Notice also that \(1+\frac{R_F}{R}\geq 1\) is always amplifying, never attenuating.

Alternatively, there also exists the inverting amplifier, which derives from the non-inverting amplifier by simply swapping the locations of \(V_{\text{in}}\) and ground \(\text{GND}\):

In this case, equating the currents across the two resistors along the top wire gives the equation for an inverting amplifier:

\[V_{\text{out}}=-\frac{R_F}{R}V_{\text{in}}\]

The fact that \(\text{sgn}(V_{\text{out}})=-\text{sgn}(V_{\text{in}})\) explains why this is called an inverting amplifier (i.e. if \(V_{\text{in}}(t)\) were a time-dependent rather than just DC input voltage, then \(V_{\text{out}}(t)\) would also be time-dependent and \(\pi\) out of phase with \(V_{\text{in}}(t)\)). Actually, the non-inverting and inverting amplifiers together explain why the \(-\) input is called the inverting input in the first place, and likewise why the \(+\) input is called the non-inverting input (see where \(V_{\text{in}}\) is being inputted in the case of each amplifier circuit!). Unlike the non-inverting amplifier, the inverting amplifier can clearly either amplify or attenuate depending on the application. More generally, for any op-amp circuit there will always be some suitable notion of input voltage \(V_{\text{in}}\) along with the usual output voltage \(V_{\text{out}}\) so that for any op-amp circuit one can define its gain \(G:=V_{\text{out}}/V_{\text{in}}\). Thus, a voltage follower has unity gain \(G=1\), a non-inverting amplifier has gain \(G=1+R_F/R\) while an inverting amplifier has negative gain \(G=-R_F/R\).

In any case, the important point here I want to emphasize again about these amplifier circuits is that, unlike the op-amp’s unstable intrinsic open-loop gain \(A\), these closed-loop gains \(G\) are essentially not functions of \(A\), i.e. \(\partial G/\partial A\approx 0\) precisely because \(A\sim 10^5\) is so large. The point is that \(G\) is a much more controllable gain to work with for amplification purposes than \(A\) because it only depends on the values of external components like resistances \(R_F,R\) which are generally well-known with small tolerances. To drive this point home, consider the following abstraction of the process of negative feedback:

The op-amp is in a closed-loop negative-feedback configuration with a feedback factor \(\beta\in\textbf R\) (implemented through some external components, for instance \(\beta=-R/R_F\) in the inverting amplifier) defined so that it basically does what we intuitively expect negative feedback to do, namely \(V_-\mapsto V_{-}+\beta V_{\text{out}}\). Because \(V_{\text{out}}=A(V_+-(V_{-}+\beta V_{\text{out}}))=G(V_+-V_-)\), it follows that the closed-loop gain \(G\) is related to the open-loop gain \(A\) via:

\[G=\frac{A}{1+\beta A}\]

In particular, because \(A\gg 1\) is large, \(G\approx 1/\beta\) is essentially independent of \(A\)! And even if there’s a large manufacturing uncertainty \(\Delta A\) in the open-loop gain \(A\), the corresponding uncertainty \(\Delta G=\frac{\Delta A}{(1+\beta A)^2}\) in the closed-loop gain \(G\) (assuming the uncertainty \(\Delta\beta\) in the feedback factor is negligible by comparison, which is true in practice) is utterly negligible due to the suppression by the large factor of \((1+\beta A)^2\) in the denominator. Intuitively, this is just saying that all the way out at \(A\approx 10^5\), the curve \(G(A)\) is basically flat at \(G\approx 1/\beta\) and so moving around a little bit \(\sim\Delta A\) out so far on such a flat curve won’t really affect the fact that \(G\) is still \(\approx 1/\beta\).
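To put numbers on this insensitivity, here is a small sketch evaluating \(G=A/(1+\beta A)\) over a factor-of-four spread in \(A\). The value \(\beta=0.1\) is an assumption chosen for illustration (it would correspond, for example, to a non-inverting amplifier with \(R_F=9R\), for which \(1/\beta=1+R_F/R=10\)), as is the spread of open-loop gains.

```python
def closed_loop_gain(a, beta):
    """Closed-loop gain G = A / (1 + beta * A) for open-loop gain A and feedback factor beta."""
    return a / (1 + beta * a)

beta = 0.1                                  # illustrative feedback factor, 1/beta = 10
open_loop_gains = (5e4, 1e5, 2e5)           # a factor-of-4 spread in A
gains = [closed_loop_gain(a, beta) for a in open_loop_gains]
for a, g in zip(open_loop_gains, gains):
    print(f"A = {a:.0e}  ->  G = {g:.5f}")

relative_spread = (max(gains) - min(gains)) / min(gains)
print(f"relative spread in G: {relative_spread:.1e}")
```

A fourfold change in \(A\) moves \(G\) by only a small fraction of a percent, whereas a \(1\%\) change in \(\beta\) would move \(G\) by about \(1\%\), which is why the tolerance of the external resistors is what matters.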

Finally, there is a whole pantheon of clever op-amp circuits that have been devised such as active filters, differentiators, integrators (to make it practical/avoid the DC railing problem, add a shunt resistor in parallel with the capacitor!), summing amplifiers, differential amplifiers, instrumentation amplifiers, comparators, precision rectifiers, \(IV\) converters, Schmitt triggers, etc.

One more point regarding op-amps is that, for AC voltage inputs, their (now complex) open-loop gain \(A=A(\omega)\in\textbf C\) is a function of frequency \(\omega\in\textbf R\), roughly going like \(|A(\omega)|\sim\omega^{-1}\) as \(\omega\to\infty\). On a Bode magnitude plot of \(\log|A(\omega)|\) vs. \(\log\omega\), this behavior shows up as an asymptotic \(\omega\to\infty\) linear behavior with slope \(\frac{d\log|A(\omega)|}{d\log\omega}=-1\). Op-amps are designed with this kind of low-pass open-loop behavior to avoid instabilities at large \(\omega\). In addition to this, one can also add decoupling capacitors \(C_{\text{decoupling}}\) between the external power supply leads \(\pm V_S\) and ground \(\text{GND}\) to short out high-frequency noise such as that which occurs when turning on the external DC power supply (the Heaviside step function \(V_0[t>0]\) has Fourier transform \(V_0\left(\frac{1}{i\omega\sqrt{2\pi}}+\sqrt{\frac{\pi}{2}}\delta(\omega)\right)\)). Finally, one more technicality is that in real life op-amps are also limited by a slew rate \(\dot V_{\text{out}}^*\), meaning that for all \(t\in\textbf R\) the rate of change of the output is capped: \(|\dot V_{\text{out}}(t)|<\dot V_{\text{out}}^*\).

Control Systems

A control system may be thought of abstractly as a dynamical system (labelled as just “system” in the block diagrams below) subject to an additional controller input \(\textbf u\) as follows:

\[\dot{\textbf x}=\textbf F(\textbf x,\textbf u)\]

Generalizing the situation for op-amps, there are two kinds of control systems, namely open-loop (also called feedforward) control systems and closed-loop (also called feedback) control systems:

Note that even in the so-called “open-loop control system” there is a closed loop from the output state \(\textbf x\) back into the system. This is just the fact that we’re working with a dynamical system \(\dot{\textbf x}=\textbf F(\textbf x,…)\) where the state at a later time depends on the state at a previous time (we can of course also consider the special case where \(\partial\textbf F/\partial\textbf x=0\) for the system of interest so that there wouldn’t be any such closed loop). This same closed loop also appears in the closed-loop control system, but here the novelty is that there is an additional closed loop (called the control loop!) from the output \(\textbf x\) all the way back into the controller (in the form of the error \(\textbf x-\textbf x_0\) from the desired reference setpoint \(\textbf x_0\)). It is usually this kind of control system possessing a closed control loop in a negative feedback configuration (seeking to minimize \(\textbf x-\textbf x_0\)) that tends to be the most useful. Practically speaking, this is implemented by adding a sensor to continuously monitor the system’s state \(\textbf x\) and having it feed the measured state \(\textbf x\) back into the controller.

To give an example, one of my pet peeves is traffic lights. Even when there are no cars anywhere in the transverse lane, one must typically still wait out the entire duration of the red light turning green before one can go. In the language introduced above, this is an example of an open-loop control system; the dynamical system consists of the cars approaching this intersection (with state vector \(\textbf x:=(N_1,N_2)\) the number of cars lined up in each direction at the intersection) and the controller is the traffic lights. It sucks. To improve this control system, it makes sense to add a negative feedback control loop (e.g. cameras for counting the number of cars \(N_1,N_2\)). The setpoint state would then be \((N_1,N_2)=(0,0)\).

Focusing on the more interesting case of closed-loop control systems with a negative feedback control loop, one can view the control system in \(\omega\)-space (or engineers also like \(s=i\omega\)-space) rather than the \(t\)-domain. If one assumes that the dynamical system is linear, then one has a block diagram of the form:

where now \(H_C(\omega)\) is the transfer function of the controller and \(H_S(\omega)\) the transfer function of the dynamical system. Here, because \(\hat{\textbf u}=H_C(\hat{\textbf x}-\hat{\textbf x}_0)\) and \(\hat{\textbf x}=H_S\hat{\textbf u}\), one can eliminate the controller input \(\hat{\textbf u}\) to obtain:

\[\hat{\textbf x}=(H_L-1)^{-1}H_L\hat{\textbf x}_0\]

where \(H_L=H_SH_C\) is the transfer function for the entire control loop.

Whenever \(|H_L|>1\) and \(\text{arg}(H_L)=\pi\), one gets a subcritical Hopf bifurcation (intuitively makes sense but exact details are still muddled). This is like saying that the negative feedback inadvertently becomes positive feedback because the application of the negative feedback is \(\pi\) out of phase with the system’s dynamics. To quantify how far a given (closed-loop, negative-feedback) control system is from this oscillatory instability, one introduces the two complementary notions of the gain margin \(||H_L|-1|_{\arg(H_L)=\pi}\) and phase margin \(|\arg(H_L)-\pi|_{|H_L|=1}\) which are typically straightforward to read off from Bode magnitude and phase plots of the control loop transfer function \(H_L\).

One very popular and effective type of controller is a PID (proportional, integral, derivative) controller which operates on the following formula:

\[\textbf u_{\text{PID}}=K_P\Delta\textbf x+K_I\int_0^t\Delta\textbf x(t’)dt’+K_D\Delta\dot{\textbf x}\]

where \(\Delta\textbf x:=\textbf x-\textbf x_0\) and \(K_P,K_I,K_D\) are constants that typically need to be tuned via trial-and-error.
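To see the PID formula in action, here is a minimal discrete-time sketch for a scalar state. Several things are assumptions made purely for illustration: the toy plant \(\dot x=u\), the forward-Euler time stepping, the finite-difference derivative, the particular gains, and the overall minus sign in front of the PID sum (included so that positive gains acting on \(\Delta x=x-x_0\) give negative rather than positive feedback).

```python
def simulate_pid(x_start, x_set, k_p, k_i, k_d, dt=0.01, steps=2000):
    """Euler-integrate the toy plant dx/dt = u with u = -(PID formula applied to x - x_set)."""
    x = x_start
    integral = 0.0
    prev_error = x - x_set          # makes the first derivative estimate zero
    for _ in range(steps):
        error = x - x_set                        # Delta x in the formula above
        integral += error * dt                   # running estimate of the integral term
        derivative = (error - prev_error) / dt   # finite-difference derivative term
        u = -(k_p * error + k_i * integral + k_d * derivative)
        x += u * dt                              # plant update
        prev_error = error
    return x

final = simulate_pid(x_start=5.0, x_set=1.0, k_p=2.0, k_i=0.5, k_d=0.05)
print(f"state after 20 time units: {final:.4f} (setpoint 1.0)")
```

Flipping the sign of `u` turns this into positive feedback and the state runs away from the setpoint, which is a quick way to convince oneself that the sign convention matters.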

Noise Filtering

In general, noise \(N(t)\) in the \(t\)-domain may be construed as an uncountable continuum of draws from an uncountable continuum of continuous random variables \(\mathcal N(t)\) that could in principle themselves be \(t\)-dependent (and possibly correlated across different times \(\mathcal N(t)\sim\mathcal N(t’)\)), one draw and one random variable per moment \(t\in\textbf R\) in time (this is often called a stochastic process). Of course, one can also opt to view the noise \(N(t)\mapsto\hat N(\omega)\) in \(\omega\)-space. In this case one often defines \(|\hat N(\omega)|^2\) to be the energy spectral density of the noise \(N\), and because the total energy \(E:=\int_{-\infty}^{\infty}|\hat N(\omega)|^2d\omega=\int_{-\infty}^{\infty}|N(t)|^2dt\) tends to diverge \(E=\infty\) for these kinds of noisy signals \(N\), it is more instructive to look at the total power \(P\) of the noise \(N\) instead: \(P:=\lim_{\Delta t\to\infty}\frac{1}{\Delta t}\int_{-\Delta t/2}^{\Delta t/2}|N(t)|^2dt=\lim_{\Delta t\to\infty}\frac{1}{\Delta t}\int_{-\infty}^{\infty}|N(t)\text{rect}(t/\Delta t)|^2dt=\lim_{\Delta t\to\infty}\frac{1}{\Delta t}\int_{-\infty}^{\infty}|\hat N(\omega)*\text{sinc}(\omega\Delta t)|^2d\omega\). In particular, the integrand then defines the power spectral density \(\hat P(\omega)=\lim_{\Delta t\to\infty}\frac{1}{\Delta t}|\hat N(\omega)*\text{sinc}(\omega\Delta t)|^2\). Defining the autocorrelation of the noise as \(A(t):=\int_{-\infty}^{\infty}N(t’)N(t’-t)dt’\), a quick calculation via the convolution theorem (assuming a stationary random/Wiener stochastic process which is valid for noise \(N\)) allows one to check that:

\[\hat P(\omega)=\hat A(\omega)\]

The fact that the autocorrelation and the power spectral density are Fourier transform pairs is called the Wiener-Khinchin theorem.

There are several different kinds of noise commonly encountered. The simplest is called white noise. This is defined by having a uniform power spectral density \(\hat P_{\text{white}}(\omega)=\text{constant}\). In this case, the Wiener-Khinchin theorem asserts that the autocorrelation of white noise is a Dirac delta distribution \(A_{\text{white}}(t)\sim \delta(t)\). This means the only way you get a non-zero correlation is when you put the noise exactly on top of itself; in other words, white noise is uncorrelated noise. Examples of white noise include Johnson-Nyquist noise \(\hat P_{\text{JN}}(\omega)\propto T\) in electronics (managed by cooling down the electronics) and shot noise in photodiodes.

There is also pink noise which is defined by having a power spectral density which (no pun intended) goes as the power law \(\hat P_{\text{pink}}(\omega)\sim\frac{1}{\omega}\). Roughly speaking, pink noise is universal because it’s associated with self-similarity, similar to a fractal. One simple way to see this self-similarity is that the power allocated to each “octave” (or generalization thereof) is constant, for instance:

\[\int_{\omega_1}^{\lambda\omega_1}\hat P_{\text{pink}}(\omega)d\omega=\int_{\omega_2}^{\lambda\omega_2}\hat P_{\text{pink}}(\omega)d\omega\]

for all \(\lambda>1\) and \(\omega_1,\omega_2>0\) by virtue of how logarithms work (the autocorrelation for pink noise correspondingly goes like \(A(t)\sim-\ln|t|\), up to a regularization-dependent constant, if one interprets its power spectral density symmetrically as \(\hat P_{\text{pink}}(\omega)\sim 1/|\omega|\)). Finally, there are also kinds of noise like \(\hat P_{\text{Brownian}}(\omega)\sim 1/\omega^2\).
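Explicitly, taking \(\hat P_{\text{pink}}(\omega)=C/\omega\) with some constant \(C\) (a normalization introduced here only for illustration), the power in a band spanning a fixed frequency ratio \(\lambda\) is:

\[\int_{\omega_1}^{\lambda\omega_1}\frac{C}{\omega}d\omega=C\ln\frac{\lambda\omega_1}{\omega_1}=C\ln\lambda\]

which has no memory of the starting frequency \(\omega_1\). For example, every octave (\(\lambda=2\)) carries the same power \(C\ln 2\), whether it spans \(1\text{ Hz}\) to \(2\text{ Hz}\) or \(1\text{ kHz}\) to \(2\text{ kHz}\).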

In practice, it is common to have both white noise and pink noise mixed in with a signal. In the DC limit \(\omega\to 0\), it is clear that pink noise will dominate over white noise \(\hat P_{\text{pink}}(\omega)\gg \hat P_{\text{white}}(\omega)\) but in the high-frequency limit it is white noise that will dominate \(\hat P_{\text{white}}(\omega)\gg \hat P_{\text{pink}}(\omega)\).

Lock-In Detection

A common situation one encounters is that the signal \(f(t)\) one is interested in might have some frequency component at some small \(\omega_0\in\textbf R\) so that by the above discussion it will be buried in pink noise. In that case, provided one knows a priori the frequency \(\omega_0\) of the signal that one is interested in, the technique of lock-in detection allows one to lock on to the signal at that frequency \(\omega_0\) notwithstanding even an \(\text{SNR}<1\) (i.e. being buried in the pink noise)! Roughly speaking, lock-in detection just amounts to implementing the Fourier transform \(\hat f(\omega_0)\) evaluated at the frequency \(\omega=\omega_0\) of interest. One starts by having some reference oscillator generate a relatively clean sinusoidal signal \(\sim\cos(\omega_0 t)\) at the frequency \(\omega_0\) of interest. Then, a “mixer” multiplies (or “modulates”) the noisy signal \(f(t)\) by this clean sinusoid (or sometimes a square wave) to obtain \(\sim f(t)\cos(\omega_0 t)\). Finally one needs some kind of low-pass filter to implement an integral-like step \(\sim\int f(t)\cos(\omega_0 t)dt\), which is the usual Fourier transform for \(f\) evaluated at frequency \(\omega_0\) (strictly one also has to mix in a \(\sim\sin(\omega_0 t)\) reference signal and repeat everything above). Because the Fourier transform of course not only provides amplitude information \(|\hat f(\omega_0)|\) but also phase information \(\text{arg}\hat f(\omega_0)\), lock-in detection is also sometimes called “phase-sensitive detection”.
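The mix-then-average procedure can be sketched numerically as follows. This is an illustration under stated assumptions rather than a model of any real lock-in amplifier: the noise is taken to be white Gaussian noise (not pink) for simplicity, the low-pass filter is a plain average over the whole record, and all names and numerical values (a \(5\text{ Hz}\) tone of amplitude \(1\) and phase \(0.6\), noise standard deviation \(3\)) are made up for the example.

```python
import math
import random

def lock_in(samples, omega0, dt):
    """Return (amplitude, phase) of the omega0 component via cos/sin mixing and averaging."""
    n = len(samples)
    x = sum(s * math.cos(omega0 * k * dt) for k, s in enumerate(samples)) / n
    y = sum(s * math.sin(omega0 * k * dt) for k, s in enumerate(samples)) / n
    # For s = A cos(omega0 t + phi): x = (A/2) cos(phi) and y = -(A/2) sin(phi).
    return 2 * math.hypot(x, y), math.atan2(-y, x)

rng = random.Random(0)                              # fixed seed for reproducibility
omega0, dt, n = 2 * math.pi * 5.0, 1e-3, 100_000    # 5 Hz over 100 s: 500 whole periods
amplitude, phase, sigma = 1.0, 0.6, 3.0             # noise power 9 vs signal power 0.5
signal = [amplitude * math.cos(omega0 * k * dt + phase) + rng.gauss(0.0, sigma)
          for k in range(n)]

est_amplitude, est_phase = lock_in(signal, omega0, dt)
print(f"amplitude ~ {est_amplitude:.3f}, phase ~ {est_phase:.3f}")
```

The longer the averaging time, the narrower the effective bandwidth around \(\omega_0\) and the less noise leaks into the estimate, which is how the signal is recovered despite \(\text{SNR}<1\).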

Problem: A resistor \(R\) sitting on a table, connected to nothing whatsoever will of course have \(V=0\) across it…or to be more precise, the expected voltage \(\langle V\rangle=0\) will be vanishing, but it turns out there will be thermal/equilibrium fluctuations \(\langle V^2\rangle\neq 0\) about \(\langle V\rangle=0\).

Explain heuristically what the origin of such voltage fluctuations \(\langle V^2\rangle\neq 0\) (also called Johnson noise) is and obtain a quantitative estimate of their order of magnitude by equating, in equilibrium, the power dissipated in the resistor with its thermal power from blackbody radiation in the Rayleigh-Jeans limit.

Solution: One has:

\[\frac{\langle V^2\rangle}{(2R)^2}R\sim k_BT\Delta f\]

Hence the RMS voltage is:

\[\sqrt{\langle V^2\rangle}=\sqrt{4Rk_BT\Delta f}\]

where the factor of \(4\) comes from a more detailed analysis (e.g. see Nyquist’s \(1928\) paper about this). In fact, Johnson noise is just a corollary of the fluctuation-dissipation theorem.
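To get a feel for the size of this effect, the formula can be evaluated for some representative numbers. The values below (\(R=1\text{ k}\Omega\), \(T=300\text{ K}\), \(\Delta f=1\text{ Hz}\)) are arbitrary illustrative choices, and the function name is made up for this sketch.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def johnson_noise_rms(resistance, temperature, bandwidth):
    """RMS Johnson noise voltage sqrt(4 R k_B T df) in volts."""
    return math.sqrt(4 * resistance * K_B * temperature * bandwidth)

v_rms = johnson_noise_rms(resistance=1e3, temperature=300.0, bandwidth=1.0)
print(f"{v_rms * 1e9:.2f} nV")   # roughly 4 nV in a 1 Hz bandwidth
```

Since \(\sqrt{\langle V^2\rangle}\propto\sqrt{T}\), cooling from room temperature to liquid nitrogen temperature only reduces the RMS noise by about a factor of two.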


Variational Method & \(1\)D Band Structure

Problem #\(1\):

Solution #\(1\):

Problem:

Solution:

Problem #\(3\):

Solution #\(3\):

Problem #\(4\):

Solution #\(4\):

Problem #\(5\):

Solution #\(5\): First, although this tight-binding model looks like a classical model, in fact it arises from the quantum Hamiltonian \(H=E_01-t_1\sum_n(|n+1\rangle\langle n|+|n-1\rangle\langle n|)\) together with the “discrete position representation” \(|\psi\rangle=\sum_n\psi_n|n\rangle\Leftrightarrow\psi_n:=\langle n|\psi\rangle\). The piece \(\psi_{n-1}+\psi_{n+1}\) can be coarse-grained to a kinetic energy:

\[\psi_{n-1}+\psi_{n+1}=\psi_{n-1}+\psi_{n+1}-2\psi_n+2\psi_n\approx\Delta x^2\psi^{\prime\prime}+2\psi\]

to yield the Schrodinger equation for a free particle:

\[-t_1\Delta x^2\psi^{\prime\prime}+(E_0-2t_1)\psi=E\psi\]

with effective mass \(m^*\) defined through \(\frac{\hbar^2}{2m^*}=t_1\Delta x^2\). The eigenstates are the usual scattering plane waves \(\psi(x)\sim e^{\pm ikx}\) where one has the free particle dispersion relation:

\[E=E_0-2t_1+t_1k^2\Delta x^2\]

This motivates how to proceed, namely replace \(x\mapsto n\Delta x\) and make the ansatz \(\psi_n=e^{ikn\Delta x}\). Doing so gives the cosine band dispersion:

\[E=E_0-2t_1\cos k\Delta x\]

so in this tight-binding approximation there is a single band of width \(4t_1\) centered at \(E_0\) (no notion of band gaps in this model because there’s only \(1\) band!). Notice also that Taylor expanding the \(\cos\) reproduces the earlier free particle dispersion near \(k=0\).
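The cosine dispersion can be sanity-checked by plugging the ansatz back into the tight-binding equation. The sketch below assumes the equation takes the form \(E\psi_n=E_0\psi_n-t_1(\psi_{n-1}+\psi_{n+1})\), as follows from the Hamiltonian quoted above; the parameter values and function names are illustrative only.

```python
import cmath
import math

def band_energy(k, e0=0.0, t1=1.0, dx=1.0):
    """Cosine band E = E0 - 2 t1 cos(k dx)."""
    return e0 - 2 * t1 * math.cos(k * dx)

def residual(k, n, e0=0.0, t1=1.0, dx=1.0):
    """|E psi_n - (E0 psi_n - t1 (psi_{n-1} + psi_{n+1}))| for the ansatz psi_n = exp(i k n dx)."""
    psi = lambda m: cmath.exp(1j * k * m * dx)
    lhs = band_energy(k, e0, t1, dx) * psi(n)
    rhs = e0 * psi(n) - t1 * (psi(n - 1) + psi(n + 1))
    return abs(lhs - rhs)

def free_particle_energy(k, e0=0.0, t1=1.0, dx=1.0):
    """Small-k approximation E = E0 - 2 t1 + t1 k^2 dx^2."""
    return e0 - 2 * t1 + t1 * (k * dx) ** 2

worst = max(residual(k, n) for k in (0.1, 1.0, 2.5) for n in (-3, 0, 7))
width = band_energy(math.pi) - band_energy(0.0)
print(f"largest residual: {worst:.1e}, band width: {width:.1f} (in units of t1)")
print(band_energy(0.05), free_particle_energy(0.05))   # nearly equal near k = 0
```

The residual vanishes to rounding error for any \(k\) and \(n\), the band width comes out as \(4t_1\), and the two energies agree near \(k=0\) as expected from the Taylor expansion.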

For part b), for \(n\leq -1\) or \(n\geq 2\), in both cases one finds:

\[t_1(c+c^{-1})+E-E_0=0\]

Meanwhile for \(n=0,1\), one finds:

\[\begin{pmatrix}s&E-E_0+ct_1\\E-E_0+ct_1&s\end{pmatrix}\begin{pmatrix}\alpha\\\beta\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}\]

this is a \(2\times 2\) circulant matrix so the (unnormalized) eigenvectors \((\alpha,\beta)=(1,\pm 1)\) are very easy to read off and so no determinant computations are even needed; the dispersion relations are \(E-E_0=-ct_1\pm s\). Eliminating \(E-E_0\) in the “bulk” equation immediately yields \(c=\pm t_1/s\) so that \(|c|<1\) which indicates a bound state in which the electron is localized near the origin by the high-energy \(s>t_1\) defect because \(\psi_n\to 0\) as \(n\to\pm\infty\). The energies \(E_{\pm}=E_0\pm\left(\frac{t_1^2}{s}+s\right)\) may be checked to lie outside the original band \(E\in[E_0-2t_1,E_0+2t_1]\).

Problem #\(6\):

Solution #\(6\):

Problem #\(7\):

Solution #\(7\):


Insights on Thermodynamics

Problem #\(1\): Derive the Maxwell relation for a gas:

\[\left(\frac{\partial S}{\partial V}\right)_T=\left(\frac{\partial p}{\partial T}\right)_V\]

And explain why Maxwell relations in general should be viewed as much more than just mathematical identities.

Solution #\(1\): Here it is clear that one is viewing \((T,V)\) as independent variables, and the corresponding energy which has these as its natural variables is the Helmholtz energy \(F\). This has the differential:

\[dF=-SdT-pdV\]

So by Clairaut’s theorem:

\[\frac{\partial^2F}{\partial T\partial V}=\frac{\partial^2F}{\partial V\partial T}\]

\[\left(\frac{\partial p}{\partial T}\right)_V=\left(\frac{\partial S}{\partial V}\right)_T\]

All Maxwell relations (for a gas system described by state space \((p,V)\)) can be boiled down to the Jacobian determinant:

\[\frac{\partial (p,V)}{\partial (T,S)}=1\]

which expresses the area/orientation-preserving property of the state space transformation \((p,V)\to(T,S)\). So the point is that thermodynamics is really a theory of geometry (cf. special relativity), the geometry of equilibrium states.
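A numerical illustration of the unit Jacobian, as a sketch only: it assumes a monatomic ideal gas in units where \(Nk_B=1\), for which \(S=\ln V+\frac{3}{2}\ln T\) up to an additive constant, and estimates the partial derivatives by central differences. The function names are hypothetical.

```python
import math

# Monatomic ideal gas in units with N*k_B = 1 (illustrative assumption).
def volume(T, S):
    """V(T, S) from inverting S = ln V + (3/2) ln T (additive constant dropped)."""
    return math.exp(S) * T ** -1.5

def pressure(T, S):
    """Ideal gas law p = T / V with N*k_B = 1."""
    return T / volume(T, S)

def jacobian(T, S, h=1e-5):
    """Central-difference estimate of the determinant d(p, V)/d(T, S)."""
    dp_dT = (pressure(T + h, S) - pressure(T - h, S)) / (2 * h)
    dp_dS = (pressure(T, S + h) - pressure(T, S - h)) / (2 * h)
    dV_dT = (volume(T + h, S) - volume(T - h, S)) / (2 * h)
    dV_dS = (volume(T, S + h) - volume(T, S - h)) / (2 * h)
    return dp_dT * dV_dS - dp_dS * dV_dT

# A few arbitrary states (T, S); the determinant should be 1 at each of them.
jacobian_values = [jacobian(T, S) for T, S in [(1.0, 0.0), (2.0, 1.0), (0.5, -0.3)]]
print(jacobian_values)
```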

In any Maxwell relation, one partial derivative represents a process which is experimentally accessible, while the other side represents an experimentally difficult quantity to measure (this is typically the entropy \(S\)) but which is in some way seeking to understand microscopic detail. Even to measure (changes in) entropy, one typically measures heat capacities…but the partial derivative is being taken at constant \(T\)! So always write Maxwell relations in a way that agrees with the usual convention in math of putting the dependent variable on the LHS and independent variables on the RHS, in this case putting the experimentally difficult partial derivative on the LHS and the experimentally accessible one on the RHS.

Problem #\(2\): For a gas, what does it mean to have complete thermodynamic information?

Solution #\(2\): Knowing \(N,V,T,\mu,p,S\). For instance, if one specifies a function \(F=F(N,V,T)\) for the Helmholtz energy, then this provides one with complete thermodynamic information since one can obtain \(\mu=\frac{\partial F}{\partial N}, p=-\frac{\partial F}{\partial V},S=-\frac{\partial F}{\partial T}\). Since the Helmholtz free energy is equivalent to giving the canonical partition function \(F=-k_BT\ln Z\), this is unsurprising.

Problem #\(3\): What are the assumptions required for each of the following differential equalities to be true?

\[dE=\bar dQ+\bar dW\]

\[\bar dQ=TdS\]

\[\bar dW=-pdV+\mu dN\]

\[dE=TdS-pdV+\mu dN\]

Solution #\(3\): All \(4\) equations require one to go between \(2\) (infinitesimally separated) equilibrium states. In other words, this is thermodynamics! With this in mind, the first equation is then always true regardless of the path one takes between those \(2\) equilibrium states (because \(E\) is a state function). However, the \(2^{\text{nd}}\) and \(3^{\text{rd}}\) equations are true iff the path by which one goes between those \(2\) equilibrium states is reversible (since \(\bar dW\) and \(\bar dQ\) are now path functions); if the path were irreversible then both equalities would instead become inequalities! Finally, although at first one may think the \(4^{\text{th}}\) equation is only true for a reversible path, in fact because \(dE\) is an exact \(1\)-form (whereas \(\bar dW,\bar dQ\) are inexact), it is again always true, reversible or irreversible, since at the end of the day all that matters for state functions is the initial and final states.

In the case of \(E\) for a gas, clearly its extensive natural variables are \(N,V,S\), while its intensive derived variables are \(\mu, p, T\).

This also makes it clearer what the \(1^{\text{st}}\) law of thermodynamics is really saying, i.e. the sum of the inexact \(1\)-forms \(\bar dQ+\bar dW\) is (nontrivially!) an exact \(1\)-form that one happens to call \(dE\). Put another way, it is rooted in a bedrock belief that conservation of energy should hold even in the presence of heat transfer \(\bar dQ\).

Problem #\(4\): What is the difference between a quasistatic and a reversible process?

Solution #\(4\): Reversible processes are a subset of quasistatic processes. More precisely:

\[\text{Reversible}=\text{Quasistatic}+\text{No Friction}\]

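where \(\text{No Friction}\) more generally means no dissipation of any kind (no friction, no hysteresis, no heat flow across a finite temperature difference), so that no entropy is generated in the system plus its surroundings. Note that this is not the same as “adiabatic” or “isentropic”: a reversible isothermal expansion exchanges heat and changes the entropy of the system. Reversible really means time reversible.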

Problem #\(5\): Prove Euler’s homogeneous function theorem, i.e.:

\[V(\lambda\textbf x)=\lambda^n V(\textbf x)\Leftrightarrow \textbf x\cdot\frac{\partial V}{\partial\textbf x}=nV(\textbf x)\]

Hence, by appealing to extensivity of the energy \(E=E(N,V,S)\), obtain the Gibbs-Duhem relation:

\[SdT-Vdp+Nd\mu=0\]

Solution #\(5\): For the forward direction, just differentiate both sides of \(V(\lambda\textbf x)=\lambda^nV(\textbf x)\) with respect to \(\lambda\) and set \(\lambda=1\) (recognize this as the precursor of the virial theorem!). Extensivity of the energy is equivalent to the \(n=1\) case of Euler’s homogeneous function theorem, i.e.

\[N\frac{\partial E}{\partial N}+V\frac{\partial E}{\partial V}+S\frac{\partial E}{\partial S}=E\]

\[E=TS-pV+\mu N\]

Taking the differential of both sides:

\[dE=TdS-pdV+\mu dN+SdT-Vdp+Nd\mu\]

so because \(dE=TdS-pdV+\mu dN\), the latter part must vanish, which is the Gibbs-Duhem relation.
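The Euler relation \(E=TS-pV+\mu N\) underlying this argument can be checked numerically. The sketch below assumes a monatomic ideal gas with \(k_B=1\) and the overall prefactor set to \(1\), so that \(E\propto N^{5/3}V^{-2/3}e^{2S/3N}\); the helper `partial` is a hypothetical central-difference routine.

```python
import math

def energy(S, V, N):
    """Monatomic ideal gas E(S, V, N) with k_B = 1 and prefactor 1 (assumption)."""
    return N ** (5 / 3) * V ** (-2 / 3) * math.exp(2 * S / (3 * N))

def partial(f, args, i, h=1e-6):
    """Central-difference partial derivative of f with respect to argument i."""
    up = list(args); up[i] += h
    dn = list(args); dn[i] -= h
    return (f(*up) - f(*dn)) / (2 * h)

state = (1.3, 2.0, 1.5)            # (S, V, N), an arbitrary test point
T = partial(energy, state, 0)      # T  =  dE/dS
p = -partial(energy, state, 1)     # p  = -dE/dV
mu = partial(energy, state, 2)     # mu =  dE/dN
S, V, N = state

# Euler relation: the weighted sum of derivatives should reproduce E itself.
euler_sum = T * S - p * V + mu * N
# Extensivity: doubling every extensive variable should double E.
extensivity_gap = abs(energy(2 * S, 2 * V, 2 * N) - 2 * energy(S, V, N))
print(energy(*state), euler_sum, extensivity_gap)
```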

Problem #\(6\): Using the Gibbs-Duhem relation, prove the Clausius-Clapeyron equation for a single-component system at a \(1^{\text{st}}\)-order phase boundary/coexistence curve:

\[\frac{\partial p}{\partial T}=\frac{\Delta S}{\Delta V}\]

Solution #\(6\): For each phase of the single-component system, one has a Gibbs-Duhem relation which can be written in the form:

\[S_1dT_1-V_1dp_1+N_1d\mu_1=0\]

\[S_2dT_2-V_2dp_2+N_2d\mu_2=0\]

Now, the defining property of a phase boundary is that the \(2\) phases are in equilibrium, so the intensive variables between the \(2\) phases are equal everywhere on the phase boundary \(T_1=T_2:=T,p_1=p_2:=p,\mu_1=\mu_2:=\mu\). So moving infinitesimally along this phase boundary, it is a rigorous corollary that \(dT_1=dT_2:=dT,dp_1=dp_2:=dp,d\mu_1=d\mu_2:=d\mu\). But now one has \(2\) equations and \(3\) unknowns \(dT,dp,d\mu\):

\[S_1dT-V_1dp+N_1d\mu=0\]

\[S_2dT-V_2dp+N_2d\mu=0\]

Eliminating \(d\mu\) yields the Clausius-Clapeyron equation:

\[\frac{dp}{dT}=\frac{S_2/N_2-S_1/N_1}{V_2/N_2-V_1/N_1}\]

Or in terms of molar entropies \(s:=S/n\) and molar volumes \(v:=V/n\), where \(n:=N/N_A\):

\[\frac{dp}{dT}=\frac{\Delta s}{\Delta v}\]

So in other words, phase transitions are all about parameters changing discontinuously, and in this case the discontinuous changes \(\Delta s,\Delta v\) are directly also what influence the slope of the phase boundary.

Because the phase transition occurs isothermally at temperature \(T\), one can also write \(\Delta s=q_L/T\) where \(q_L\) is the molar latent heat absorbed in going from phase \(1\) to phase \(2\). Often, for a liquid-to-gas phase transition, the molar volume of the liquid \(v_{\ell}\) is significantly less than the molar volume of the gas \(v_{\text g}\) (e.g. liquid water vs. water vapor), so it is common to approximate \(\Delta v\approx v_{\text g}\approx RT/p\) (if the gas is ideal). In that case \(q_L\) is the molar latent heat of vaporization \(\ell\to\text g\) and, assuming \(q_L\) to be approximately \(T\)-independent, the Clausius-Clapeyron equation \(\frac{dp}{dT}=\frac{q_Lp}{RT^2}\) can be integrated to yield the equation of the phase boundary itself:

\[p=p_0\exp\left[-\frac{q_L}{R}\left(\frac{1}{T}-\frac{1}{T_0}\right)\right]\]

where in this context \(p\) is called the vapor pressure. Alternatively, written in the form:

\[\ln p-\ln p_0=-\frac{q_L}{R}\left(\frac{1}{T}-\frac{1}{T_0}\right)\]

shows that \(q_L\) can be experimentally determined through linear regression of \(\ln p\) vs. \(1/T\).
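To make the regression concrete, here is a minimal sketch that fits a straight line to \(\ln p\) vs. \(1/T\) and reads off \(q_L=-R\times\text{slope}\). The data are synthetic (generated from the formula itself with a made-up \(q_L\)), not measurements, and the function names are hypothetical.

```python
import math

R = 8.314  # J/(mol K), molar gas constant

def vapor_pressure(T, p0, T0, qL):
    """Integrated Clausius-Clapeyron curve p(T) for constant molar latent heat qL."""
    return p0 * math.exp(-(qL / R) * (1.0 / T - 1.0 / T0))

def fit_latent_heat(temps, pressures):
    """Least-squares slope of ln p against 1/T; returns qL = -R * slope."""
    xs = [1.0 / T for T in temps]
    ys = [math.log(p) for p in pressures]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return -R * slope

# Synthetic (made-up) data generated from the formula itself, not measurements.
qL_true = 40.0e3                       # J/mol, an assumed illustrative value
temps = [300.0, 320.0, 340.0, 360.0, 380.0]
pressures = [vapor_pressure(T, 101325.0, 373.15, qL_true) for T in temps]
qL_fit = fit_latent_heat(temps, pressures)
print(qL_fit)
```

With noiseless synthetic data the fit simply recovers the input \(q_L\); with real vapor pressure data the scatter about the line indicates how good the constant-\(q_L\) and ideal-gas approximations are.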

Finally, note that earlier the choice was made to eliminate \(d\mu\) to isolate for \(dp/dT\); however one could just as well have eliminated \(dp\) to isolate for \(d\mu/dT\) or eliminated \(dT\) to obtain \(dp/d\mu\)…each giving rise to its own kind of Clausius-Clapeyron equation.

Problem #\(7\): Using the fact that in a closed cycle \(\oint dE=0\), write \(dE=TdS-pdV+\mu dN\) and apply Stokes’ theorem to obtain suitable Maxwell relations.

Solution #\(7\): Stokes’ theorem needs a curve and a surface with that as its boundary curve. First, consider looking at just motion in the \((S,V)\)-plane so that \(dN=0\). Then Stokes’ theorem reduces to Green’s theorem in that plane:

\[0=\oint TdS-pdV=\oint\begin{pmatrix} T\\-p\end{pmatrix}\cdot\begin{pmatrix}dS\\dV\end{pmatrix}=\iint\left(-\frac{\partial p}{\partial S}-\frac{\partial T}{\partial V}\right)dSdV\]

Hence one obtains the Maxwell relation:

\[\left(\frac{\partial p}{\partial S}\right)_{N,V}=-\left(\frac{\partial T}{\partial V}\right)_{N,S}\]

Working in the \((N,S)\) plane or \((N,V)\) plane produces \(2\) other Maxwell relations.
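As a sketch, repeating the same Green’s theorem argument with \(dV=0\) and then with \(dS=0\) (equivalently, equating the mixed second partial derivatives of \(E(S,V,N)\)) should give:

\[\left(\frac{\partial T}{\partial N}\right)_{S,V}=\left(\frac{\partial\mu}{\partial S}\right)_{N,V}\]

\[\left(\frac{\partial p}{\partial N}\right)_{S,V}=-\left(\frac{\partial\mu}{\partial V}\right)_{S,N}\]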

Problem #\(8\): Equation of state as a constitutive relation/dispersion relation/heart of the physics. Want to distinguish clearly between this and all the kinematic Maxwell relations, definitions, etc.

Equations of state will never involve entropy \(S\), the experimentally inaccessible bastard. So when one encounters any \(\partial S\) quantities, the immediate knee-jerk reaction should be to convert it to a corresponding \(\partial T\) derivative which is more readily measurable.

All the usual shorthands for special collections of partial derivatives like heat capacities, thermal expansion coefficients, compressibilities/bulk moduli, and other moduli are singled out for this special treatment because they are often found to be constant material parameters?

\[C_V:=\left(\frac{\partial E}{\partial T}\right)_V\]

\[C_{\sigma}:=\]

\[\alpha:=\frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_{\sigma}\]

\[\kappa_T=\frac{1}{V}\left(\frac{\partial V}{\partial\sigma}\right)_T\]

Solution #\(8\):

Problem: In general, the energy may be written abstractly as a linear combination of \(N\) extensive natural variables \(Q_i\) (thought of as “generalized charges/coordinates“) weighted by their \(N\) conjugate intensive derived variables \(\phi_i\) (thought of as “generalized potentials/forces“):

\[E=\sum_{i=1}^N\phi_iQ_i\]

For instance, temperature \(T\) can be thought of as an “entropy potential” as entropy \(S\) flows via heat from high \(T\) to low \(T\). Similarly, \(\sigma=-p\) is a “volume potential”, thus volume \(V\) flows from high \(\sigma\) to low \(\sigma\), aka from low \(p\) to high \(p\) (this is consistent with the usual interpretation of pressure \(p\) as a force acting on the piston walls). Similarly, particles \(N\) flow from high chemical potential \(\mu\) to low \(\mu\). In general, charge \(Q_i\) flows from high potential \(\phi_i\) to low \(\phi_i\).

How many Maxwell relations can be obtained from \(E\) alone and what is their general form?

Solution: There are \(\binom{N}{2}=\frac{N(N-1)}{2}\) Maxwell relations that can be wrung out from this equilibrium potential \(E\) alone. Because there are no fiddly minus signs in this series, it is clear that there won’t be any fiddly minus signs in the corresponding Maxwell relations. In this case, all Maxwell relations will have the form:

\[\left(\frac{\partial\phi_1}{\partial Q_2}\right)_{Q_1,Q_3,\ldots}=\left(\frac{\partial\phi_2}{\partial Q_1}\right)_{Q_2,Q_3,\ldots}\]

Problem: For a closed single-component \(3\)D gas in equilibrium, how many independent intensive variables parameterize the equilibrium manifold of the system?

Solution: The Gibbs phase rule (analogous to Euler’s graph formula \(F+V=E+2\)) asserts:

\[I+P=C+2\Rightarrow I+1=1+2\Rightarrow I=2\]

So one is always free to select any \(2\) intensive potentials such that when their values are fixed, so too automatically are the values of all other intensive potentials at equilibrium. It’s a bit like saying if a mass is acted on by forces \(\textbf F_1,\textbf F_2,\textbf F_3\), and one is told \(\textbf F_1,\textbf F_2\), then because the mass is in translational equilibrium the value of \(\textbf F_3=-\textbf F_1-\textbf F_2\) is fixed.

Problem: For a single-component \(3\)D gas with energy/Hamiltonian:

\[E=TS+\sigma V+\mu N\]

Explain whether the following partial derivatives are (in general) well-posed or not. For those that are well-posed, write down their associated Maxwell relation.

\[\left(\frac{\partial S}{\partial V}\right)_{N}\]

\[\left(\frac{\partial S}{\partial V}\right)_{T,V,\mu}\]

\[\left(\frac{\partial S}{\partial V}\right)_{T,N}\]

\[\left(\frac{\partial T}{\partial V}\right)_{\sigma, N}\]

\[\left(\frac{\partial T}{\partial \mu}\right)_{\sigma, N}\]

\[\left(\frac{\partial T}{\partial S}\right)_{\sigma, N}\]

\[\left(\frac{\partial T}{\partial E}\right)_{V, N}\]

\[\left(\frac{\partial H}{\partial T}\right)_{V, N}\]

\[\left(\frac{\partial T}{\partial G}\right)_{\sigma, \mu}\]

\[\left(\frac{\partial V}{\partial \mu}\right)_{S,E}\]

\[\left(\frac{\partial T}{\partial V}\right)_{S,\sigma}\]

Solution: It is useful to define the notion of a natural variable set to be any energy together with its set of natural variables. Starting from the fundamental natural variable set:

\[\{E,S,V,N\}\]

Legendre transforms give all \(7\) of the other natural variable sets (\(2^3=8\) in total):

\[\{F,T,V,N\}\]

\[\{H,S,\sigma,N\}\]

\[\{G,T,\sigma,N\}\]

\[\{\Phi,T,V,\mu\}\]

and \(3\) other combinations that don’t seem to have a name. Anyways, the point is that the Gibbs phase rule gives \(I=2\), but it doesn’t count the extensivity degree of freedom which is always present because that doesn’t affect equilibrium, hence explaining why all natural variable sets have \(2+1=3\) variables. Moreover, as should be clear from the fact that one is Legendre transforming, no variable appears with its conjugate in the same natural variable set.
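The counting can be made explicit with a few lines of code: pick exactly one variable from each conjugate pair. This is only an illustrative sketch; the string labels and the helper `has_conjugate_clash` are made up here, and the name lookup only covers the sets named in the text.

```python
from itertools import product

# Conjugate pairs (extensive, intensive) for a single-component gas.
CONJUGATE_PAIRS = [("S", "T"), ("V", "sigma"), ("N", "mu")]

# Names used in the text for some of the sets; the rest are left unnamed.
NAMES = {
    ("S", "V", "N"): "E",
    ("T", "V", "N"): "F",
    ("S", "sigma", "N"): "H",
    ("T", "sigma", "N"): "G",
    ("T", "V", "mu"): "Phi",
}

def natural_variable_sets():
    """Pick exactly one variable from each conjugate pair: 2**3 = 8 choices."""
    return list(product(*CONJUGATE_PAIRS))

def has_conjugate_clash(variables):
    """True if some conjugate pair appears together in the given collection."""
    chosen = set(variables)
    return any(a in chosen and b in chosen for a, b in CONJUGATE_PAIRS)

all_sets = natural_variable_sets()
for variables in all_sets:
    print(NAMES.get(variables, "(unnamed)"), variables)
```

By construction none of the \(8\) sets contains a variable together with its conjugate, which is the property the litmus test below relies on.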

This then provides a litmus test for whether a partial derivative is ill-posed or not; just see if the variables in the subscript together with either the variable in the numerator or the variable in the denominator can be made to form a natural variable set or not; more simply, can one form a set which is not cohabited by conjugate variables (is there a notion of conjugate variables for the energies like \(E,F,\)etc, themselves that makes the Maxwell relations continue to hold)?

(aside: actually this last partial derivative should be undefined? because holding both \(\sigma,\mu\) constant amounts to holding \(T\) constant since they’re intensive…?)

Also, given any \(2\) extensives \(E_1,E_2\) and any \(2\) distinct intensives \(I_1\neq I_2\), the partial derivative:

\[\left(\frac{\partial E_1}{\partial E_2}\right)_{I_1,I_2}=\frac{E_1}{E_2}\]

this follows because \(E_1/E_2\) is intensive so equal to a constant \(\lambda=\lambda(I_1,I_2)\), so \(E_1=\lambda E_2\Rightarrow \partial E_1/\partial E_2=\lambda=E_1/E_2\).

Aside: when intensive and intensive or extensive and extensive bunch together like bosons, then the Maxwell relation has a minus sign. Similarly, intensive and extensive antibunch like fermions but that gives a + sign Maxwell relation. It seems that, roughly speaking, the individual intensive/extensive variables can be treated like fermions, in the sense that starting with any Maxwell relation and exchanging \(\phi_i\Leftrightarrow Q_i\) gives a minus sign. And obviously equality is symmetric which is really a reflection of the fact that \(2\) fermions together make a boson.

Problem: For a single-component system, what are the \(3\) standard intensive equilibrium material properties?

Solution: The compressibility (either the isothermal one \(\kappa_T\) or the isentropic one \(\kappa_S\), and equivalently one can use the isothermal bulk modulus \(B_T\) or the isentropic bulk modulus \(B_S\)), the specific heat capacity (either \(c_V\) or \(c_p\) and note it could be per unit mass or per unit mole, etc.) and the thermal expansion coefficient \(\alpha\). The definitions are here.

Problem: Using the compressible Bernoulli’s equation, show that enthalpy is conserved in a Joule-Thomson expansion. Define the corresponding Joule-Thomson coefficient \(\mu_{\text{JT}}\) and show that \(\mu_{\text{JT}}=0\) for ordinary Joule expansion of an ideal gas.

Solution: The compressible Bernoulli equation looks like the usual Bernoulli equation (written per unit mass) but with the addition of the gas’s specific internal energy \(e\):

\[\frac{p}{\rho}+\frac{1}{2}v^2+\phi+e=\text{const.}\]

The terms \(h:=e+p/\rho\) are nothing more than the specific enthalpy of the gas.

In Joule-Thomson expansion, it is conventional to assume the macroscopic specific energy \(\frac{1}{2}v^2+\phi\) is constant throughout the expansion, so this implies that \(h\) is conserved.

Alternatively, one can imagine a setup in which gas at a higher (but constant) pressure \(p\) is throttled through a porous plug to a region of lower (constant) pressure \(p'<p\). Then, the gas behind does work \(pV\) on the gas that passes through, and similarly the gas that expands on the other side does work \(p'V'\) on the gas in front of it, where \(V\) and \(V'\) are the volumes occupied by that gas before and after the plug (too lazy to draw this). It’s as if there were fictitious pistons on either side of the plug…assuming the whole thing is enclosed in adiabatic walls, so \(Q=0\), then from the first law of thermodynamics \(\Delta E=pV-p'V'\), i.e. \(\Delta H=0\).

The Joule-Thomson temperature change \(\Delta T_{\text{JT}}\) is familiar in everyday life. For instance, when opening a bike tire valve, the pressure inside is initially a few atmospheres higher than the ambient atmospheric pressure, but as the gas escapes (isenthalpically) it cools down, causing the tire valve to feel cold to the touch. Physically, for a non-ideal gas, expansion increases potential energy, reducing kinetic energy, hence cooling the gas.

In general, the Joule-Thomson coefficient:

\[\mu_{\text{JT}}:=\left(\frac{\partial T}{\partial p}\right)_{H,N}=\frac{V}{C_p}\left(\alpha T-1\right)\]

quantifies this cooling across a pressure differential (for most gases at room temperature, the Joule-Thomson coefficient is positive and of order \(\mu_{\text{JT}}\sim 0.1\frac{\text K}{\text{atm}}\), hence explaining the cooling rather than heating normally observed).

For an ideal gas, \(\alpha=1/T\) so there is no associated Joule-Thomson cooling/heating across a pressure differential. The inversion point is when \(\mu_{\text{JT}}=0\).
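To get a feel for the numbers, here is a rough sketch that evaluates \(\mu_{\text{JT}}=\frac{v}{c_p}(\alpha T-1)\) per mole for a van der Waals gas, using \((\partial v/\partial T)_p=-(\partial p/\partial T)_v/(\partial p/\partial v)_T\). The van der Waals model, the approximate nitrogen parameters, and the constant \(c_p\) are all assumptions made for illustration.

```python
R = 8.314          # J/(mol K)
ATM = 101325.0     # Pa

def alpha(T, v, a, b):
    """Thermal expansion coefficient (1/v)(dv/dT)_p of a van der Waals gas.

    Uses (dv/dT)_p = -(dp/dT)_v / (dp/dv)_T with p = R T/(v - b) - a/v**2.
    """
    dp_dT = R / (v - b)
    dp_dv = -R * T / (v - b) ** 2 + 2.0 * a / v ** 3
    return -dp_dT / dp_dv / v

def mu_jt(T, v, a, b, c_p):
    """Joule-Thomson coefficient (v / c_p)(alpha T - 1) in K/Pa, per mole."""
    return v / c_p * (alpha(T, v, a, b) * T - 1.0)

T, v = 300.0, 0.0246            # K, m^3/mol (roughly 1 atm); illustrative state
# Approximate textbook values for nitrogen (assumptions for illustration).
a_n2, b_n2, cp_n2 = 0.137, 3.87e-5, 29.1   # Pa m^6/mol^2, m^3/mol, J/(mol K)

mu_ideal = mu_jt(T, v, 0.0, 0.0, cp_n2) * ATM      # K/atm, ideal gas limit a = b = 0
mu_vdw = mu_jt(T, v, a_n2, b_n2, cp_n2) * ATM      # K/atm, van der Waals gas
print(mu_ideal, mu_vdw)
```

The ideal gas limit should give zero to rounding error, while the van der Waals estimate should be positive and a few tenths of a kelvin per atmosphere at this state.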

Problem: Distinguish between heat engines, refrigerators, and heat pumps.

Solution: A heat engine \(\textbf x_E(t)\) is an abstraction of any periodic process \(\textbf x_E(t+\Delta t)=\textbf x_E(t)\) whose net outcome in each period \(\Delta t\) is to convert some amount of (useless!) heat \(Q_H>0\) into (useful!) work \(W>0\) (thus, both \(Q_H\) and \(W\) are normalized per orbit of the heat engine \(\textbf x_E(t)\) in a suitable state space). It thus makes sense to define the efficiency \(\eta\) of a heat engine by the bang-for-buck ratio:

\[\eta:=\frac{W}{Q_H}\]

Kelvin’s formulation of the \(2^{\text{nd}}\) law of thermodynamics is logically equivalent to the assertion that the efficiency \(\eta\) of any heat engine \(\textbf x_E(t)\) must obey:

\[\eta<1\]

though in theory \(\eta\) can get arbitrarily close to \(1\). Schematically, for each orbit:

Since \(E\) is a function of the engine’s state \(\textbf x_E(t)\) only, over each \(\Delta t\)-orbit, \(\oint dE=0\) so the first law of thermodynamics guarantees \(\oint\bar dQ+\oint\bar dW=0\). This guarantees that \(Q_H=W+Q_C\) over each engine period \(\Delta t\).

Both refrigerators and heat pumps are basically the same thing (note: refrigerators are not called “heat refrigerators” even though that would have been more consistent with “heat engine” and “heat pump”). They are also both abstractions of any periodic process whose net outcome in each period is to remove some amount of heat \(Q_C>0\) from a colder place and dump a (larger) amount of heat \(Q_H>0\) into a hotter place, fighting an uphill battle against the spontaneous direction that heat would otherwise flow (i.e. from hotter to colder). The difference between refrigerators and heat pumps is simply a matter of emphasis; in the case of refrigerators, the goal is to remove as much heat \(Q_C\) from the cold place as possible, whereas for heat pumps the goal is to dump as much heat \(Q_H\) into the hot place as possible. Schematically, for each orbit:

Clausius’s formulation of the \(2^{\text{nd}}\) law of thermodynamics is logically equivalent to the assertion that for any refrigerator or heat pump, \(Q_H>Q_C\). This implies that some external work \(W=Q_H-Q_C>0\) must be done each cycle to facilitate this heat transfer. This motivates the corresponding definitions of the coefficient of performance \(\text{COP}\) for a refrigerator:

\[\text{COP}:=\frac{Q_C}{W}\]

and a heat pump:

\[\text{COP}:=\frac{Q_H}{W}\]

Problem: Explain what the reversibility theorem (also misleadingly called “Carnot’s theorem”) asserts.

Solution: The notions of heat engines and refrigerators/heat pumps are very general, and a priori there is no requirement about “using a hot reservoir \(T_H\) and a cold reservoir \(T_C\)”. However, if one restricts one’s scope to just the subset of heat engines and refrigerators/heat pumps operating only between \(2\) heat reservoirs \(T_H,T_C\), then within this subset one can prove that reversible cycles are the most efficient in all cases (i.e. for heat engines they maximize \(\eta\), and for refrigerators/heat pumps they maximize \(\text{COP}\)).

Problem: Show that the specific example of a Carnot cycle is reversible (though it certainly isn’t the only reversible cycle one can take), and hence compute \(\eta\) for a Carnot heat engine and \(\text{COP}\) for both a Carnot refrigerator and a Carnot heat pump, thus obtaining upper bounds on these values within the subsets described earlier.

Solution: The key is that every step of the Carnot cycle is reversible, regardless of whether it is run as a heat engine or as a refrigerator/heat pump. The universal way to depict the Carnot cycle is on the \((T,S)\)-plane:

Any other depiction of the Carnot cycle, such as in the \((p,V)\)-plane using an ideal gas working substance is then simply a geometric deformation of this rectangle:

For a Carnot heat engine, the efficiency is:

\[\eta=1-\frac{T_C}{T_H}<1\]

For a Carnot refrigerator, the coefficient of performance is:

\[\text{COP}=\frac{T_C}{T_H-T_C}\]

And for a Carnot heat pump:

\[\text{COP}=\frac{T_H}{T_H-T_C}\]
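These three Carnot bounds are easy to tabulate. The following is a minimal sketch with hypothetical function names, with temperatures in kelvin and \(0<T_C<T_H\) assumed.

```python
def _check(T_hot, T_cold):
    """Reject unphysical reservoir temperatures (absolute temperatures, T_cold < T_hot)."""
    if not 0.0 < T_cold < T_hot:
        raise ValueError("need 0 < T_cold < T_hot (absolute temperatures)")

def carnot_efficiency(T_hot, T_cold):
    """eta = W / Q_H = 1 - T_C / T_H for a Carnot heat engine."""
    _check(T_hot, T_cold)
    return 1.0 - T_cold / T_hot

def carnot_cop_refrigerator(T_hot, T_cold):
    """COP = Q_C / W = T_C / (T_H - T_C) for a Carnot refrigerator."""
    _check(T_hot, T_cold)
    return T_cold / (T_hot - T_cold)

def carnot_cop_heat_pump(T_hot, T_cold):
    """COP = Q_H / W = T_H / (T_H - T_C) for a Carnot heat pump."""
    _check(T_hot, T_cold)
    return T_hot / (T_hot - T_cold)

# Example reservoirs (illustrative values).
eta = carnot_efficiency(600.0, 300.0)
cop_fridge = carnot_cop_refrigerator(600.0, 300.0)
cop_pump = carnot_cop_heat_pump(600.0, 300.0)
print(eta, cop_fridge, cop_pump)
```

Note that \(Q_H=W+Q_C\) forces the heat pump \(\text{COP}\) to exceed the refrigerator \(\text{COP}\) by exactly \(1\), and the heat pump \(\text{COP}\) is just \(1/\eta\) for the same pair of reservoirs.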

Problem: By a Stokes-like maneuver, any reversible cycle can be decomposed into a bunch of Carnot “vortices” (more precisely, isothermal and adiabatic segments). Hence, establish the Clausius inequality:

\[\oint\frac{\overline{d}Q}{T}\leq 0\]

with equality if and only if the cycle is reversible (e.g. a Carnot cycle).

Solution:

The fact that \(\oint\frac{\overline{d}Q}{T}=0\) in the space of reversible cycles implies that there exists a conservative field \(S\) called the entropy that only depends on the initial and final equilibrium states.
