Tricks for Learning Physics

Learning physics is hard. The purpose of this post is to collect a bunch of techniques that can help to make the process of learning (and remembering!) new concepts easier.

Trick #\(1\): Avoid always simplifying formulas as much as possible; instead, group quantities into dimensionally meaningful combinations.

Explanation: Since elementary school, one is often taught to simplify everything as much as possible. While this is often a good idea if one is doing some calculation, sometimes a final result is better left unsimplified, especially if dimensionally intuitive quantities are grouped together.

Example: The density of states in \(k\)-space of an ideal electron gas (spin-\(1/2\), in a box of volume \(V=L^3\)), simplified, is:

\[g(k)=\frac{Vk^2}{\pi^2}\]

However, it is much more preferable to remember it in the unsimplified form:

\[g(k)=2\times\frac{4\pi k^2}{(2\pi/L)^3}\]

since this separately exhibits the spin degeneracy \(2\), the area \(4\pi k^2\) of a spherical shell in \(k\)-space, and the volume \((2\pi/L)^3\) of \(k\)-space occupied by each allowed mode.

Example: The impedance \(Z\) of non-dispersive transverse displacement waves on a string of tension \(T\) and linear mass density \(\mu\) is, when simplified, \(Z=\sqrt{T\mu}\). However, it is preferable to remember it as:

\[Z=\frac{T}{\sqrt{T/\mu}}=\frac{T}{c}\]

since this reinforces that it’s just a “force/velocity” (the wave speed being \(c=\sqrt{T/\mu}\)).

Trick #\(2\): Try covering up the microscopic inner workings with a black box.

Explanation: This is basically an engineering mindset. For instance, to use an op-amp, one does not have to understand the detailed transistor networks inside.

Example: A Fabry-Perot interferometer, viewed as a black box, looks just like a diffraction grating.

Trick #\(3\): Understand systems by isomorphism.

Explanation: Mathematicians love talking about (and finding) isomorphisms between various kinds of spaces and structures, since often one structure \(X\) is easier to understand than another structure \(Y\), but in fact both are really the same, so one can leverage one’s understanding of \(X\) in order to make sense of \(Y\).

Example: Electric circuits are not as intuitive to me as a damped, driven harmonic oscillator. Yet the \(2\) systems are in fact often isomorphic, as the dictionary below makes explicit.
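
To make the isomorphism explicit, here is the standard dictionary for a series RLC circuit driven by a voltage \(V(t)\):

\[L\ddot q+R\dot q+\frac{q}{C}=V(t)\qquad\longleftrightarrow\qquad m\ddot x+b\dot x+kx=F(t)\]

with charge \(q\leftrightarrow x\) position, inductance \(L\leftrightarrow m\) mass, resistance \(R\leftrightarrow b\) damping, inverse capacitance \(1/C\leftrightarrow k\) stiffness, and EMF \(V\leftrightarrow F\) driving force; in particular, the resonant frequency maps as \(1/\sqrt{LC}\leftrightarrow\sqrt{k/m}\).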

Trick #\(4\): Dimensional analysis! And don’t be afraid to package a lot of constants into one’s own custom-defined variables.

Explanation: As a general rule of thumb, use the least number of variable packages that eliminates all numerical factors (a catchy slogan to summarize this?).

Example: The dispersion relation for the \(n^{\text{th}}\) mode of waves propagating along the \(x\)-axis of a \(2\)D waveguide with fixed boundary conditions at \(y=0,\Delta y\) can be written:

\[\omega^2=c^2\left(k_x^2+\frac{n^2\pi^2}{\Delta y^2}\right)\]

This formula can be conceptually simplified by defining the cutoff frequency \(\omega_c:=\pi c/\Delta y\) so that:

\[\omega^2=c^2k_x^2+n^2\omega_c^2\]

Example: The dispersion relation for electromagnetic waves in a conductor at high frequencies can be written:

\[\omega^2=\omega_p^2+c^2k^2\]

where all the material constants have been packaged into a single custom-defined variable, the plasma frequency \(\omega_p:=\sqrt{ne^2/\varepsilon_0m_e}\).

Trick #\(5\): Build a “circuit model” of the system.

Explanation: Many distributed or microscopically complicated systems can be modeled as a network of a few standard lumped circuit elements, whose behavior is much easier to reason about.

Example: A transmission line can be viewed as an infinite ladder of series inductors and shunt capacitors (the lumped-element model underlying the telegrapher’s equations).

Trick #\(6\): Write formulas without any numerical prefactors, focusing on dimensional analysis.

Trick #\(7\): Remember some physical values of quantities in SI units, get a feel for orders of magnitude.

Example: Knowing that Avogadro’s constant is something like \(N_A\sim 10^{23}\) and Boltzmann’s constant is \(k_B\sim 10^{-23}\) (both when expressed in SI units), it follows that their product should be \(O(1)\) (again in SI units), and in this context one obvious \(O(1)\) constant is the gas constant \(R\approx 8.3\) so this helps to remember that:

\[N_Ak_B=R\]
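
As a quick numerical sanity check of this trick (a minimal sketch, assuming scipy is installed):

```python
# Sanity check that N_A * k_B reproduces the gas constant R (all in SI units).
from scipy.constants import N_A, k, R

print(N_A)      # ~6.022e+23 mol^-1
print(k)        # ~1.381e-23 J/K
print(N_A * k)  # ~8.314 J/(mol K), an O(1) number in SI units
print(R)        # ~8.314 J/(mol K), i.e. N_A * k_B = R
```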

Trick #\(8\):


Quantifying & Interpreting Errors

The purpose of this post is to lay the foundations for what any experimental physicist should know when it comes to analyzing their experimental data in a rigorous and thoughtful manner.

Problem: How does one reduce random error? How does one reduce systematic error?

Solution: Inspecting the formula for the random error of the mean, \(\sigma_{\bar X}=\sigma/\sqrt N\), reveals that there are essentially only \(2\) ways one can reduce random error:

  1. Reduce the variable controlling the magnitude of the random error (e.g. if it’s Johnson noise, cool one’s electronics down, etc.)
  2. Make more measurements and average them (see the sketch below)
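
The second point can be illustrated with a minimal Monte Carlo sketch (assuming numpy): the random error of the mean shrinks like \(\sigma/\sqrt N\):

```python
# Averaging N measurements shrinks the random error like sigma/sqrt(N).
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0  # standard deviation of a single measurement

for N in (10, 100, 1000):
    # 5000 independent experiments, each averaging N noisy measurements
    means = rng.normal(0.0, sigma, size=(5000, N)).mean(axis=1)
    print(N, means.std(), sigma / np.sqrt(N))  # empirical vs predicted error
```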

Reducing systematic error is a completely different beast, and requires calibration, differential measurements, etc.

Problem: Show that, if \(X_1,X_2,…,X_N\) are i.i.d. random variables, hence all having the same mean \(\mu=\langle X_1\rangle=\langle X_2\rangle=…\), then the random variable:

\[\bar X:=\frac{X_1+X_2+…+X_N}{N}\]

is an unbiased estimator for \(\mu\).

Solution: The purpose of this question is just to emphasize what it means to be an unbiased estimator, specifically one should compute the expectation:

\[\langle\bar X\rangle=\frac{\langle X_1+X_2+…+X_N\rangle}{N}=\frac{\langle X_1\rangle+\langle X_2\rangle+…+\langle X_N\rangle}{N}=\frac{\mu+\mu+…+\mu}{N}=\frac{N\mu}{N}=\mu\]

so it is indeed unbiased: \(\langle\bar X\rangle=\mu\) means there is no “systematic error”.

Problem: Show that, if \(X_1,X_2,…,X_N\) are i.i.d. random variables, hence all having the same mean \(\mu=\langle X_1\rangle=\langle X_2\rangle=…\) and same variance \(\sigma^2=\langle (X_1-\mu)^2\rangle=\langle (X_2-\mu)^2\rangle=…\), then the random variables:

\[\sigma^2_p:=\frac{1}{N}\sum_{i=1}^N(X_i-\mu)^2\]

\[\sigma^2_s:=\frac{1}{N-1}\sum_{i=1}^N(X_i-\bar X)^2\]

are both unbiased estimators for \(\sigma^2\).

Solution: For \(\sigma^2_p\), linearity of expectation immediately gives \(\langle\sigma^2_p\rangle=\frac{1}{N}\sum_{i=1}^N\langle(X_i-\mu)^2\rangle=\sigma^2\). For \(\sigma^2_s\), expand the residuals around \(\mu\) via \(X_i-\bar X=(X_i-\mu)-(\bar X-\mu)\):

\[\biggl\langle\sum_{i=1}^N(X_i-\bar X)^2\biggr\rangle=\sum_{i=1}^N\langle(X_i-\mu)^2\rangle-N\langle(\bar X-\mu)^2\rangle=N\sigma^2-N\frac{\sigma^2}{N}=(N-1)\sigma^2\]

so dividing by \(N-1\) (rather than \(N\)) is exactly what is needed for \(\langle\sigma^2_s\rangle=\sigma^2\).

In practice, one never knows the actual population mean \(\mu\), so one always has to estimate \(\mu\) somehow. And there are basically \(2\) ways one could estimate \(\mu\), namely from somewhere else (e.g. some other data set), or using the data set itself!

So heuristically, Bessel’s correction of \(1/(N-1)\) arises from the fact that, by using the data set itself to (unbiasedly) estimate the mean, the \(N\) random variables become constrained through their vanishing net residual \(\sum_{i=1}^N(X_i-\bar X)=0\); this constraint represents a reduction of \(1\) degree of freedom, hence \(N\mapsto N-1\).
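
This is easy to see numerically; here is a minimal sketch (assuming numpy) comparing the naive \(1/N\) estimator (built from the sample mean \(\bar X\)) against Bessel’s corrected \(1/(N-1)\) estimator:

```python
# Only the 1/(N-1) estimator is unbiased when the mean is estimated
# from the same data; the 1/N version is biased low by a factor (N-1)/N.
import numpy as np

rng = np.random.default_rng(0)
N, trials, sigma2 = 5, 200_000, 1.0

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
xbar = x.mean(axis=1, keepdims=True)
ss = ((x - xbar) ** 2).sum(axis=1)  # sum of squared residuals per experiment

print(ss.mean() / N)        # ~ 0.8 = (N-1)/N * sigma^2 (biased)
print(ss.mean() / (N - 1))  # ~ 1.0 = sigma^2 (unbiased)
```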

Problem: Explain how the \(\chi^2\) goodness-of-fit test arises from maximizing a log-likelihood function subject to normally distributed errors between the data and the model.

Solution: For independent, normally distributed residuals, the log-likelihood is \(\ln L=\text{const}-\frac{1}{2}\sum_{i=1}^N\left(\frac{y_i-f(x_i)}{\sigma_i}\right)^2\), so maximizing \(\ln L\) is the same as minimizing \(\chi^2:=\sum_{i=1}^N(y_i-f(x_i))^2/\sigma_i^2\). The \(\chi^2\) statistic is then basically just a rearrangement of the usual formula for variance to isolate for \(N=\chi^2\). Indeed, comparing the cases \(\chi^2\ll N,\chi^2\approx N,\chi^2\gg N\) is basically the whole point of the test…the subtraction of the number of fit constraints from \(N\) is also reminiscent of the Bessel correction, and in fact the \(2\) are there for conceptually the same reason.
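
To make this concrete, here is a minimal sketch (assuming numpy); the data, error bars, and linear model are invented purely for illustration:

```python
# chi^2 = sum over data points of (residual / error bar)^2, compared against
# the number of degrees of freedom N - (number of fitted parameters).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.9, 4.2, 5.8, 8.1])  # hypothetical measurements
sigma = 0.2 * np.ones_like(y)            # hypothetical 1-sigma error bars

a, b = np.polyfit(x, y, 1)               # fit the 2-parameter model y = a*x + b
chi2 = np.sum(((y - (a * x + b)) / sigma) ** 2)
dof = len(y) - 2                         # N minus the 2 fit parameters

print(chi2, dof, chi2 / dof)             # chi^2/dof ~ 1 indicates a good fit
```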


Computer Science Notes

Just as the fundamental theorem of single-variable calculus \(\int_{x_1}^{x_2}f'(x)dx=f(x_2)-f(x_1)\) is the key insight on which the entire subject of single-variable calculus rests, there is an analogous sense in which one can consider a fundamental theorem of classical computing to be the key insight on which the entire field of classical computing rests. This is:

Fundamental Theorem of Classical Computing: The vector space \(\textbf N\) over the binary Galois field \(\textbf Z/2\textbf Z\) admits the countably infinite basis \(\textbf N=\text{span}_{\textbf Z/2\textbf Z}\{2^n:n\in\textbf N\}\), so that \(\dim_{\textbf Z/2\textbf Z}(\textbf N)=\aleph_0\). Put differently, the binary representation map \(n\in\textbf N\mapsto[n]_{(1,2,4,8,…)}\in(\textbf Z/2\textbf Z)^{\oplus\infty}\), sending each natural number to its string of binary digits, is an isomorphism.

From a physics perspective, one can loosely think of \(\textbf N\cong(\textbf Z/2\textbf Z)^{\oplus\infty}\) as the direct sum of infinitely many copies of the binary “vector space” \(\textbf Z/2\textbf Z\). By contrast, in the field of quantum computing, one instead has a vector space of the form \((\textbf C^2)^{\otimes N}\) for \(N\) qubits…and the multiplicative structure of the tensor product is much richer than the additive structure of the direct sum, hence the interest in quantum computing.

Knowing all this, it follows that any data \(X\) (e.g. an image file, an audio file, etc.) which can simply be “reduced to numbers” by some injection \(f:X\to\textbf N\) is also in principle just reduced to some bit string \((b_k)_{k=0}^\infty\), simply by taking each \(f(x)\in\textbf N\) for \(x\in X\) and writing out the binary representation of \(f(x)\). For instance, when \(X=\{a,b,c,…,x,y,z\}\) is the alphabet, one possible injection (or encoding) is the American Standard Code for Information Interchange \(\text{ASCII}:\{a,b,c,…,x,y,z\}\to\textbf N\), which maps for instance \(\text{ASCII}(a):=97\), \(\text{ASCII}(b):=98\), and so forth. Strictly speaking, ASCII uses \(7\) bits to represent \(2^7=128\) characters, of which \(26\) are the usual lowercase English alphabet letters and another \(26\) are uppercase English letters. There was even an extended ASCII which used \(8\) bits to represent \(2^8=256\) characters, but ultimately that was clearly insufficient for things like the Chinese language, and so nowadays Unicode (with the UTF-8 encoding, which stands for Unicode Transformation Format – \(8\) bit, or UTF-16, UTF-32, etc.) tends to be used in lieu of ASCII.
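
A minimal Python sketch of this chain from characters to code points to bits:

```python
# From a character to its code point to its bit string.
print(ord('a'))                 # 97, the ASCII/Unicode code point
print(format(ord('a'), '07b'))  # '1100001', the 7-bit ASCII representation

# UTF-8 encodes ASCII characters in 1 byte, other characters in more:
print('a'.encode('utf-8'))      # b'a' (1 byte)
print('中'.encode('utf-8'))     # 3 bytes for this Chinese character
```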

When storing numbers in memory (e.g. RAM, HDD, SSD), a computer can experience overflow error (most computers have \(64\)-bit architecture, so can safely store unsigned integers only up to \(2^{64}-1\)), roundoff error (this is especially relevant to floating point arithmetic), and precision errors.

An analog-to-digital converter (ADC) is just an abstraction for any function that maps an analog signal \(V(t)\), \(I(x,y)\), etc. to a digital signal \(\bar V_i,\bar I_{ij}\), etc. More precisely, one can consider an ADC to be a pair \(\text{ADC}=(f_s,\rho)\) where \(f_s\) is the sampling rate of the ADC in samples/second and \(\rho\) is the bit depth/resolution/precision at which the ADC quantizes data in bits/sample, thus the bit rate \(\dot b\) of the ADC is \(\dot b=f_s\rho\) and this is in general distinct from the baud rate \(\dot{Bd}\) of serial communication with the ADC by a factor \(\lambda:=\frac{\dot b}{\dot{Bd}}\geq 1\) which describes the number of bits per baud (see this Stack Overflow Q&A for an idea of the distinction).

For \(V(t)\) an analog signal in the time domain which is bandlimited to some bandwidth \(\Delta f\), the Nyquist-Shannon sampling theorem asserts that in order to avoid aliasing distortions when sampling \(V(t)\), one has to use \(f_s>\Delta f\). Equivalently, if \(f^*=\Delta f/2\) is the largest frequency present in \(V(t)\), then the sampling frequency needs to obey \(f_s>2f^*\). For instance, humans can hear audio up to \(f^*=20\text{ kHz}\), so audio ADCs (roughly speaking, digital microphones) sample at e.g. \(f_s=48\text{ kHz}\), comfortably above the Nyquist rate \(2f^*=40\text{ kHz}\). Cameras are just image ADCs, where now “samples” is replaced by “pixels” and so \(f_s\) might be better called “pixel frequency” (with units of pixels/meter rather than samples/second?). The use of the RGB color space is fundamentally based on the biology of the human eye and its \(3\) types of cone cells, and conventionally each R, G, B channel has \(256\) levels (or \(1\) byte) of intensity quantization simply because that was empirically found to be sufficient (so the total bit depth of an RGB image is \(\rho=3\) bytes/pixel or \(24\) bits/px).
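
Here is a minimal numpy sketch of aliasing: a \(3\text{ Hz}\) sine sampled at only \(f_s=4\text{ Hz}\) (below its Nyquist rate of \(6\text{ Hz}\)) is literally indistinguishable from a \(1\text{ Hz}\) sine:

```python
# Sampling a 3 Hz sine at f_s = 4 Hz < 2*3 Hz aliases it down to |3 - 4| = 1 Hz.
import numpy as np

fs = 4.0                # sampling rate (Hz), below the 6 Hz Nyquist rate
t = np.arange(32) / fs  # sample times
samples_3hz = np.sin(2 * np.pi * 3.0 * t)
samples_alias = np.sin(2 * np.pi * (3.0 - fs) * t)  # the -1 Hz alias

print(np.allclose(samples_3hz, samples_alias))  # True: identical samples
```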

In practice, such data would likely be further compressed (either via lossless or lossy data compression algorithms). For instance, JPEG (lossy), PNG (lossless), run-length encoding (lossless), etc. for digital/bitmap/raster images, Lempel-Ziv-Welch (LZW) (lossless) compression, Huffman encoding (lossless), byte pair encoding, for text files, and perceptual audio encoding (lossy) which exploits the psychoacoustic quirks of the human auditory system such as auditory masking and high frequency limits.
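
As the simplest possible illustration of lossless compression, here is a minimal run-length encoding sketch:

```python
# Minimal run-length encoder/decoder: long runs of repeated symbols
# (common in bitmap images) compress to (symbol, count) pairs.
from itertools import groupby

def rle_encode(s):
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs):
    return ''.join(ch * n for ch, n in pairs)

encoded = rle_encode('aaaaabbbzzz')
print(encoded)                               # [('a', 5), ('b', 3), ('z', 3)]
print(rle_decode(encoded) == 'aaaaabbbzzz')  # True, i.e. lossless
```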

Computers & Logic Circuits

One abstract paradigm for understanding how a computer works is: input\(\to\)storage + processing\(\to\)output. Input is typically taken from sensors (e.g. keyboards, mice, touchscreens, microphones, cameras), memory is handled by RAM, storage is handled by HDD/SSD, processing is done by the central processing unit (CPU) (an integrated circuit (IC)) where CPU = control unit + arithmetic logic unit (ALU) (both storage and processing use logic circuits made of many logic gates combined together), and output is a monitor, a speaker, an electric motor, an LED, etc. Processing + memory are heavily interdependent, connected by a memory bus.

This excellent YouTube demonstration shows how to implement standard logic gates (e.g. buffers, NOT gates, AND gates, OR gates, XOR gates, NAND gates, NOR gates, etc.) using standard hardware on a solderless breadboard, notably transistors. So really, when it comes to processing data, a “computer” is an abstraction over “logic circuits”, which are themselves an abstraction over “logic gates”, which are an abstraction over “bits”, which are ultimately an abstraction over transistors and physical hardware that one can actually touch and feel in the real world.
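
To illustrate this tower of abstractions in software, here is a minimal Python sketch building the standard gates out of NAND alone (NAND being functionally complete):

```python
# Every standard gate can be layered on top of a single NAND primitive,
# mirroring how logic circuits are layered on top of transistors.
def NAND(a, b): return 1 - (a & b)

def NOT(a):    return NAND(a, a)
def AND(a, b): return NOT(NAND(a, b))
def OR(a, b):  return NAND(NOT(a), NOT(b))   # De Morgan
def XOR(a, b): return AND(OR(a, b), NAND(a, b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, '->', AND(a, b), OR(a, b), XOR(a, b))
```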

Our current computer (Microsoft Surface Studio \(2\)) has \(\sigma_{\text{RAM}}=32\text{ GB}\) and \(\sigma_{\text{SSD or C-Drive}}=1\text{ TB}\) (with Microsoft OneDrive providing an additional \(\sigma_{\text{OneDrive}}=1\text{ TB}\) of storage space).

The Internet

A computer network is topologically any connected undirected graph where nodes represent computing devices (e.g. computers, phones, etc.) and edges are communication channels between pairs of devices. Common network topologies include the ring, star, mesh, bus, and tree topologies. Examples of computer networks include local area networks (LAN), wide area networks (WAN), data center networks (DCN), etc. with the Internet being a distributed packet-switched WAN. When designing the architecture of a computer network, one is interested in minimizing the distance (with respect to a suitable metric) that any piece of data \(D\) must travel to get from one computer to another.

At the level of physical hardware, data can be communicated between computers via copper category 5 (CAT5) twisted pair cables adhering to Ethernet standards. Fiber optic cables can also be used with Ethernet standards. Wi-Fi or Bluetooth communicates data via radio waves, which suffer attenuation. Regardless, for all \(3\) of these physical layers (each with its own line coding scheme, i.e. a map from abstract bits to a physical signal in the real world) we need to thank James Clerk Maxwell.

The informal notion of the “speed” of an internet connection (i.e. one of the communication channels mentioned earlier) between two computing devices \(X, Y\) is made precise by the bit rate \(\dot b_{(X,Y)}\) between the computing devices \(X\) and \(Y\). The bandwidth of that communication channel is then just the maximum bit rate \(\dot b^*_{(X,Y)}\) between \(X\) and \(Y\) (not to be confused with the signal processing notion of the bandwidth \(\Delta f\) of an analog signal). Another important factor is the latency \(\Delta t_{(X,Y)}\) of a given communication channel (i.e. just the delay). Running an Internet speed test for my computer with the measurement lab (M-lab) yields \(\dot b^{\text{downloads}}_{\text{(computer,M-lab)}}=650.7\frac{\text{Mb}}{\text s}\) and \(\dot b^{\text{uploads}}_{\text{(computer,M-lab)}}=732.0\frac{\text{Mb}}{\text s}\) and a latency of \(\Delta t_{\text{(computer,M-lab)}}=4\text{ ms}\).

Just like every house \(H\) has a physical address \(A_H\), in the WAN that we call the Internet, every computing device \(X\) has an Internet Protocol (IP) address \(\text{IP}_X\). When a computing device \(X\) transmits data packets \(D\) across a communication channel to another computing device \(Y\), \(X\) must specify the IP address \(\text{IP}_Y\) of \(Y\) in addition to providing its own IP address \(\text{IP}_X\) so that \(Y\) can reply to it. There are actually \(2\) common IP address protocols, IPv4 (a string of \(4\) bytes, leading to \(2^{32}\) possible IPv4 addresses) and IPv6 (a string of \(8\) groups of \(16\) bits, each group conventionally written as up to \(4\) hexadecimal digits, for a total of \(2^{128}\) possible IPv6 addresses). IP addresses of computing devices may also be dynamic, meaning that one’s Internet service provider changes them over time \(t\). Or, if one connects to a different Wi-Fi network, then this will usually also mean a different IP address, as each Wi-Fi provider (Internet service provider) has a range of IP addresses it is allowed to allocate. By contrast, computing devices acting as servers (e.g. Google’s computers) often have static IP addresses (e.g. \(\text{IPv4}_{\text{Google computers}}=74.125.20.113\)) to make it easier for client computing devices to communicate with them quickly.

In terms of actually interpreting what the numbers in an IP address mean, it turns out that it doesn’t have to be the case that (say in an IPv4 address) each byte corresponds to some piece of information. Rather, one can decide how one wishes to impose a hierarchy of subnetworks (subnets) on the IP address, that is, how many bits to represent a given piece of information. This is sometimes known as “octet splitting” where “octet” = “byte” in an IPv4 address.
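
Python’s standard ipaddress module can illustrate this subnet hierarchy; a minimal sketch with an arbitrary private address block:

```python
# A /24 prefix says the first 24 bits identify the (sub)network and the
# remaining 8 bits identify hosts within it.
import ipaddress

net = ipaddress.ip_network('192.168.1.0/24')
print(net.num_addresses)                            # 2**8 = 256 addresses
print(ipaddress.ip_address('192.168.1.42') in net)  # True

# Borrow one host bit to split the /24 into two /25 subnets:
print(list(net.subnets(prefixlen_diff=1)))  # 192.168.1.0/25, 192.168.1.128/25
```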

The Domain Name System (DNS) is essentially a map \(\text{DNS}:\{\text{URLs}\}\to\{\text{IP addresses}\}\) and indeed anytime one uses a browser application (e.g. Chrome) to search for a website URL (e.g. www.youtube.com), DNS servers need to first find the IP address \(\text{DNS}\)(www.youtube.com) associated with that URL.
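
One can invoke this map directly from Python’s standard library (a minimal sketch; the returned address will vary with time and location):

```python
# Resolve a hostname to an IPv4 address via the OS's DNS machinery.
import socket

print(socket.gethostbyname('www.youtube.com'))  # one of YouTube's IP addresses
```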


Resolvents and Perturbation Theory

Problem #\(1\): Given a linear operator \(H\) on some Hilbert space, define the resolvent operator associated to \(H\).

Solution #\(1\): The resolvent \(G_H(E)\) of \(H\) is the operator-valued Möbius-like transformation of a complex variable \(E\in\textbf C\) defined by the matrix inverse:

\[G_H(E):=\frac{1}{E1-H}\]

(this notation \(A/B\) is only unambiguous when \([A,B^{-1}]=0\) which it is in this case).

Problem #\(2\): What is the domain for \(E\in\textbf C\) of the resolvent \(G_H(E)\)?

Solution #\(2\): Any value of \(E\in\textbf C\) for which the matrix \(E1-H\) is invertible leads to a well-defined resolvent. But (in finite dimensions) invertibility is equivalent to a non-vanishing determinant \(\det(E1-H)\neq 0\), and \(\det(E1-H)=0\) precisely when \(E\) is an eigenvalue of \(H\). So the domain of \(G_H(E)\) is \(E\in\textbf C-\text{spec}(H)\).

Problem #\(3\): To see the conclusion of Solution #\(2\) another way, assume \(H\) is Hermitian so that it admits an orthonormal eigenbasis with real eigenvalues, \(H|n\rangle=E_n|n\rangle\). Show that the resolvent \(G_H(E)\) of \(H\) may be expressed as a linear combination of projectors onto its eigenspaces:

\[\frac{1}{E1-H}=\sum_n\frac{|n\rangle\langle n|}{E-E_n}\]

Solution #\(3\): Insert \(2\) resolutions of the identity:

\[\frac{1}{E1-H}=\sum_n|n\rangle\langle n|\frac{1}{E1-H}\sum_m|m\rangle\langle m|\]

where the matrix element is \(\langle n|\frac{1}{E1-H}|m\rangle=\frac{\delta_{nm}}{E-E_n}\). Thus, the resolvent has a simple pole whenever \(E=E_n\) for some \(H\)-eigenstate \(|n\rangle\), and its residue at that simple pole is given by the corresponding projector \(|n\rangle\langle n|\).
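
Since everything here is finite-dimensional linear algebra, the decomposition is easy to verify numerically; a minimal sketch, assuming numpy:

```python
# Verify G_H(E) = sum_n |n><n| / (E - E_n) for a random Hermitian H.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2          # a random Hermitian matrix
E = 2.0 + 1.0j                    # any E not in spec(H)

G_direct = np.linalg.inv(E * np.eye(4) - H)

evals, evecs = np.linalg.eigh(H)  # real eigenvalues, orthonormal eigenvectors
G_spectral = sum(np.outer(evecs[:, n], evecs[:, n].conj()) / (E - evals[n])
                 for n in range(4))

print(np.allclose(G_direct, G_spectral))  # True
```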

Problem #\(4\): Show that if \(H=H_0+V\), then the resolvents \(G_H,G_{H_0}\) of \(H\) and \(H_0\) are related by a Lippmann-Schwinger-like formula:

\[G_H=\frac{1}{1-G_{H_0}V}G_{H_0}\]

Solution #\(4\): Write \(E1-H=(E1-H_0)-V=(E1-H_0)(1-G_{H_0}V)\) and invert both sides:

\[G_H=\frac{1}{1-G_{H_0}V}G_{H_0}=\sum_{n=0}^{\infty}(G_{H_0}V)^nG_{H_0}\]

If the last step of expanding the geometric series (also called a Neumann series in the context of operators) feels a bit handwavy, note that one can trivially rewrite the formula:

\[G_H=G_{H_0}+G_{H_0}VG_H\]

so that recursive self-substitution would reproduce the geometric Neumann series.

As an aside, the Lippmann-Schwinger equation from scattering theory is:

\[|\psi\rangle=|\psi_0\rangle+G_{H_0}V|\psi\rangle\]

or equivalently:

\[|\psi\rangle=\frac{1}{1-G_{H_0}V}|\psi_0\rangle\]

where \(H=H_0+V\) and typically \(H_0=T\) is just the kinetic energy (in which case \(|\psi_0\rangle=|\textbf k\rangle\) is a plane wave with \(E=\hbar^2|\textbf k|^2/2m\) if the scattering is non-relativistic) and \(V\) is the “scattering potential” which is viewed as a perturbation of \(H_0\) (analogous to e.g. the nearly free electron model in condensed matter physics).

The resolvent \(G_{H_0}\) is the Green’s function for the unperturbed Schrödinger equation:

\[(E1-H_0)G_{H_0}=1\]

The same geometric Neumann series expansion of what is called the Møller scattering operator \(\Omega\) (it’s kind of like the \(S\)-operator in that it maps an asymptotic incident state \(|\psi_0\rangle\mapsto \Omega|\psi_0\rangle=|\psi\rangle\)):

\[\Omega:=\frac{1}{1-G_{H_0}V}=\sum_{n=0}^{\infty}(G_{H_0}V)^n\]

in this context is called the Born series and truncating it at the \(n^{\text{th}}\) partial sum is called the \(n^{\text{th}}\) Born approximation to the scattered state \(|\psi\rangle\), analogous to doing \(n^{\text{th}}\)-order perturbation theory.

There are however a few subtleties; in the case of the Lippmann-Schwinger equation, it is common to include a \(\pm i\varepsilon\)-prescription (equivalent to picking an indented contour in the complex \(|\textbf k|\)-plane) to distinguish the advanced/retarded Green’s functions representing ingoing or outgoing scattered waves. It’s also not really perturbation theory in the sense that the energies are just taken to be \(E=\hbar^2|\textbf k|^2/2m\); rather, one is much more interested in the eigenstates (especially asymptotically!), from which other data such as scattering amplitudes and cross sections can be extracted. So the goals of the \(2\) programs are different.

Problem #\(5\): Show how, by tracking the movement of a simple pole \(E_n\) of \(G_H(E)\) in the complex \(E\)-plane, one can recover the eigenvalue and eigenstate corrections of perturbation theory.

Solution #\(5\): Shine the spotlight on some eigenstate \(|n\rangle\) and its associated energy \(E_n\) by separating the unperturbed resolvent as:

\[G_{H_0}(E)=\frac{|n\rangle\langle n|}{E-E_n}+\sum_{m\neq n}\frac{|m\rangle\langle m|}{E-E_m}\]

and substitute it into the geometric series for \(G_H=G_H(E)\):

\[G_H=G_{H_0}+G_{H_0}VG_{H_0}+G_{H_0}VG_{H_0}VG_{H_0}+…\]

for instance:

\[G_{H_0}VG_{H_0}=\frac{|n\rangle\langle n|V|n\rangle\langle n|}{(E-E_n)^2}+\sum_{m\neq n}\frac{|n\rangle\langle n|V|m\rangle\langle m|+h.c.}{(E-E_n)(E-E_m)}+\sum_{m,\ell\neq n}\frac{|m\rangle\langle m|V|\ell\rangle\langle\ell|}{(E-E_m)(E-E_{\ell})}\]

The first term turns \(E=E_n\) from a simple pole into a double pole. It turns out this is what’s responsible for shifting the location of the pole away from \(E=E_n\), in other words, perturbing the eigenvalue. Meanwhile, the middle series contributes to the residue at \(E=E_n\) because \(m\neq n\); clearly it must be responsible for perturbing the eigenstate. Finally, because \(m,\ell\neq n\) in the last sum, it will be analytic in a neighbourhood of \(E=E_n\), in other words: crap (it contributes neither to the pole’s shift nor to its residue).

Strictly speaking, after including the \(G_{H_0}VG_{H_0}\) term from the geometric Neumann-Born-Laurent series, the pole still sits at \(E=E_n\), just that its order has increased from \(1\to 2\). But consider as an example the geometric series \(1+1/x+1/x^2+…\). For any finite partial sum truncation, the pole sits at \(x=0\). But for \(|x|>1\) this converges absolutely to \(x/(x-1)\) where now the pole has been displaced to \(x=1\). Or just take any function like \(\tan(x)\) and Taylor expand it around \(x=0\) say; although all the terms in that Taylor series are analytic, the limiting behavior must be non-analytic at \(x=\pm\pi/2\). It’s basically a more extreme version of a phase transition, since singularities are more extreme than discontinuities.

Anyways, the fact that it’s the expectation \(\langle n|V|n\rangle\) which is sitting in the numerator of the double pole term means that this is the \(1^{\text{st}}\)-order correction to the energy. Similarly, decomposing into partial fractions:

\[\frac{1}{(E-E_n)(E-E_m)}=\frac{1}{E_n-E_m}\left(\frac{1}{E-E_n}-\frac{1}{E-E_m}\right)\]

only the first term \(\sim(E-E_n)^{-1}\) contributes to the residue at \(E=E_n\), and moreover this eigenstate contribution is just \(\sum_{m\neq n}\frac{\langle m|V|n\rangle}{E_n-E_m}|m\rangle\). Continuing to higher-order terms in the expansion reproduces the higher-order formulas of perturbation theory.
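
As a numerical sanity check (a minimal sketch, assuming numpy, with an arbitrary \(4\times 4\) example), the exact eigenvalues of \(H_0+V\) indeed differ from the first-order prediction \(E_n+\langle n|V|n\rangle\) only at second order in \(V\):

```python
# Exact eigenvalues of H0 + V vs first-order perturbation theory E_n + <n|V|n>.
import numpy as np

rng = np.random.default_rng(0)
H0 = np.diag([0.0, 1.0, 2.5, 4.0])  # nondegenerate unperturbed spectrum
A = rng.normal(size=(4, 4))
V = 1e-3 * (A + A.T)                # a weak symmetric perturbation

exact = np.linalg.eigvalsh(H0 + V)
first_order = np.diag(H0) + np.diag(V)  # E_n + <n|V|n>

print(np.abs(exact - first_order).max())  # ~1e-6, i.e. O(|V|^2)
```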

Problem #\(6\): Let \(H=H(\lambda)\) be a non-degenerate Hamiltonian depending on a parameter \(\lambda\) (not necessarily infinitesimal), and let \(|n\rangle=|n(\lambda)\rangle\) be a normalized \(H\)-eigenstate with energy \(E_n=E_n(\lambda)\). By differentiating the spectral equation:

\[H|n\rangle=E_n|n\rangle\]

with respect to \(\lambda\), prove the Hellmann-Feynman theorems for the rate of change of the eigenvalue \(E_n\) and the \(H\)-eigenstate \(|n\rangle\):

\[\frac{\partial E_n}{\partial\lambda}=\biggr\langle n\biggr|\frac{\partial H}{\partial\lambda}\biggr|n\biggr\rangle\]

\[\frac{\partial|n\rangle}{\partial\lambda}=\sum_{m\neq n}\frac{\biggr\langle m\biggr|\frac{\partial H}{\partial\lambda}\biggr|n\biggr\rangle}{E_n-E_m}|m\rangle\]

(for the latter, one also has to fix the global \(U(1)\) gauge via \(\langle n|\frac{\partial|n\rangle}{\partial\lambda}\in\textbf R\)).

Solution #\(6\): The product rule gives:

\[\frac{\partial H}{\partial\lambda}|n\rangle+H\frac{\partial|n\rangle}{\partial\lambda}=\frac{\partial E_n}{\partial\lambda}|n\rangle+E_n\frac{\partial |n\rangle}{\partial\lambda}\]

This is like having \(2\) vectors \((a,b,c)=(d,e,f)\); naturally one’s instinct would be to equate components \(a=d,b=e,c=f\). In this context, what that looks like is projecting both sides onto an arbitrary \(H\)-eigenstate \(|m\rangle\) to equate the scalar components of all vectors:

\[\biggr\langle m\biggr|\frac{\partial H}{\partial\lambda}\biggr|n\biggr\rangle+E_m\langle m|\frac{\partial|n\rangle}{\partial\lambda}=\frac{\partial E_n}{\partial\lambda}\delta_{nm}+E_n\langle m|\frac{\partial|n\rangle}{\partial\lambda}\]

The Hellmann-Feynman theorems then arise by considering the \(2\) cases \(m=n\) and \(m\neq n\). There is a priori also a component of the rate of change \(\partial|n\rangle/\partial\lambda\) of the \(H\)-eigenstate \(|n\rangle\) along itself with amplitude \(\langle n|\frac{\partial|n\rangle}{\partial\lambda}\), but due to normalization \(\langle n|n\rangle=1\Rightarrow\frac{\partial\langle n|}{\partial\lambda}|n\rangle+\langle n|\frac{\partial|n\rangle}{\partial\lambda}=0\Rightarrow\Re\langle n|\frac{\partial|n\rangle}{\partial\lambda}=0\Rightarrow\langle n|\frac{\partial|n\rangle}{\partial\lambda}=0\) (the last implication using the \(U(1)\) gauge fixing \(\langle n|\frac{\partial|n\rangle}{\partial\lambda}\in\textbf R\)); this is like saying that an ant crawling on a sphere \(|\textbf x|^2=\text{const}\) must have orthogonal position and velocity \(\textbf x\cdot\dot{\textbf x}=0\).

The Hellmann-Feynman theorems are reminiscent of formulas such as:

\[i\hbar\frac{d}{dt}\langle\phi|A|\psi\rangle=\langle\phi|[A,H]|\psi\rangle+i\hbar\biggr\langle\phi\biggr|\frac{\partial H}{\partial t}\biggr|\psi\biggr\rangle\]

in which one takes \(A=H\) and \(\lambda=t\), as well as the special case \(\phi=\psi\) of Ehrenfest’s theorem.

Problem #\(7\): Hence, by applying the Hellman-Feynman theorems to a linearly perturbed Hamiltonian \(H=H_0+\lambda V\), deduce the \(O(\lambda^2)\) corrections to both the eigenvalues \(E_n\) and eigenstates \(|n\rangle\) of the unperturbed Hamiltonian \(H_0\) in the presence of a perturbation \(V\).

Solution #\(7\): Specialized to the case of this particular linearly perturbed Hamiltonian, one has trivially \(\frac{\partial H}{\partial\lambda}=V\). With this in mind, one simply takes the Hellmann-Feynman formulas and differentiates them again with respect to \(\lambda\). For the eigenvalue, this gives:

\[\frac{\partial^2 E_n}{\partial\lambda^2}=2\Re\langle n|V\frac{\partial|n\rangle}{\partial\lambda}=2\sum_{m\neq n}\frac{|\langle m|V|n\rangle|^2}{E_n-E_m}\]

so the \(O(\lambda^2)\) eigenvalue correction is \(\frac{\lambda^2}{2}\frac{\partial^2E_n}{\partial\lambda^2}=\lambda^2\sum_{m\neq n}\frac{|\langle m|V|n\rangle|^2}{E_n-E_m}\).

For fun, here is a (failed!) attempt to compute the \(O(\lambda^3)\) eigenvalue correction (what’s the mistake?):

And for the \(O(\lambda^2)\) eigenstate correction, refer to this document.

TO DO: extend/generalize all the above discussion to degenerate and time dependent perturbation theory!


Gaussians & Feynman Diagrams

Although Feynman diagrams are often first encountered in statistical/quantum field theory contexts where they are employed in perturbative calculations of partition/correlation functions based on Wick’s theorem, there is a lot of “fluff” in these cases that obscures their underlying simplicity. The purpose of this post is therefore to build up to a simpler, intuitive view of what Feynman diagrams are really about that hopefully demystifies them.

Problem #\(1\): Calculate the \(n\)-th moment:

\[\langle x^n\rangle:=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}dx x^ne^{-x^2/2\sigma^2}\]

of a univariate normally distributed random variable \(x\) with zero mean \(\langle x\rangle=0\) (the choice of zero mean is motivated by the fact that in practice one only cares about central moments of the distribution, so to avoid writing \(x-\langle x\rangle\) everywhere it is convenient to just set \(\langle x\rangle:=0\)).

Solution #\(1\): It is clear that for odd \(n=1,3,5,…\), the integrand is an odd function so not only is \(\langle x\rangle=0\) by construction, but all higher odd moments also vanish \(\langle x^3\rangle=\langle x^5\rangle=…=0\). As for even \(n=0,2,4,…\), there are several ways:

Way #\(1\): Start with the \(n=0\) normalization (obtained in the usual Poissonian manner):

\[\int_{-\infty}^{\infty}dx e^{-x^2/2\sigma^2}=\sigma\sqrt{2\pi}\]

and differentiate the equation by \(\frac{\partial}{\partial(-1/2\sigma^2)}\) to pull down arbitrarily many factors of \(x^2\). One finds for instance:

\[\langle x^2\rangle=\sigma^2\]

\[\langle x^4\rangle=3\sigma^4\]

\[\langle x^6\rangle=15\sigma^6\]

\[\langle x^8\rangle=105\sigma^8\]

and so forth, in general following the rule \(\langle x^{2m}\rangle=(2m-1)!!\sigma^{2m}\) for even \(n=2m\), where the double factorial can also be written in terms of single factorials as:

\[(2m-1)!!=\frac{(2m)!}{2^mm!}\]

Way #\(2\): Substitute for \(x^2/2\sigma^2\) to recast the integral in terms of a gamma function:

\[\langle x^{2m}\rangle=\frac{(2\sigma^2)^m}{\sqrt{\pi}}\Gamma(m+1/2)\]

where the connection between the gamma function and factorials is well-known for half-integer arguments:

\[\Gamma(m+1/2)=(m-1/2)!=\frac{(2m-1)!!}{2^m}\sqrt{\pi}\]

(this is apparent if one starts with the well-known \((1/2)!=\sqrt{\pi}/2\) and works one’s way up from there).

Way #\(3\): Compute the moment generating function \(\langle e^{\kappa x}\rangle\) of the normal distribution by completing the square:

\[\langle e^{\kappa x}\rangle=e^{\kappa^2\sigma^2/2}\]

And Maclaurin-expand the resulting exponential:

\[e^{\kappa^2\sigma^2/2}=\sum_{m=0}^{\infty}\frac{\sigma^{2m}}{2^mm!}\kappa^{2m}\]

which immediately shows that all odd moments are zero while even moments are:

\[\frac{\langle x^{2m}\rangle}{(2m)!}=\frac{\sigma^{2m}}{2^mm!}\]
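
As a quick Monte Carlo sanity check of this pattern (a minimal sketch, assuming numpy):

```python
# Monte Carlo check of <x^(2m)> = (2m-1)!! * sigma^(2m) for a zero-mean Gaussian.
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
sigma = 1.5
x = rng.normal(0.0, sigma, size=2_000_000)

for m in (1, 2, 3):
    double_factorial = factorial(2 * m) // (2**m * factorial(m))  # (2m-1)!!
    print((x**(2 * m)).mean(), double_factorial * sigma**(2 * m))
```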

Problem #\(2\): From Solution #\(1\), the presence of factorials suggests a combinatorial interpretation of the result; what is this interpretation?

Solution #\(2\): Suppose one has \(6\) people that need to be paired up for a dance; how many pairings can be formed? There are \(2\) ways to think about this.

Way #\(1\): The first person can be paired with \(5\) other people. Then, after they’ve been paired, the next person can only pair up with \(3\) more people. And after they’ve paired, the next person can only pair with the \(1\) other person that’s left. So the answer is \(5!!=5\times 3\times 1=15\) pairs.

Way #\(2\): There are \(6!\) permutations of the \(6\) people. However, they are going to form \(3\) pairs which can be permuted in \(3!\) ways. And within each of the \(3\) pairs, there are a further \(2!=2\) permutations. So in total there will be \(\frac{6!}{2^33!}=15\) pairs.

The fact that Way #\(1\) and Way #\(2\) give the same result is just a restatement of the earlier identity \((2m-1)!!=(2m)!/2^mm!\).

In this case however, the “people” are the \(2m\) factors of \(x\) in \(x^{2m}\)! Because all factors of \(x\) are indistinguishable, all \((2m-1)!!\) pairings of the \(2m\) factors of \(x\) in \(x^{2m}\) into \(m\) pairs of \(x^2\) are equivalent. The factor of \(\sigma^{2m}\) then follows on dimensional analysis grounds (it’s the only length scale for the normal distribution), and the numerical coefficient takes on this combinatorial pairing interpretation.

Problem #\(3\): Estimate the expectation \(\langle\cos(x/\sigma)\rangle\) in a univariate normal random variable \(x\) with variance \(\sigma^2\) and zero mean \(\langle x\rangle=0\).

Solution #\(3\): The integral:

\[\biggr\langle\cos\frac{x}{\sigma}\biggr\rangle=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}dx\cos\frac{x}{\sigma}e^{-x^2/2\sigma^2}\]

will mostly receive contributions from small \(|x|\lesssim\sigma\), so one can hope to get a rough estimate of it by Maclaurin-expanding \(\cos\theta=1-\theta^2/2+\theta^4/24-\theta^6/720+…\):

\[\biggr\langle\cos\frac{x}{\sigma}\biggr\rangle\approx 1-\frac{\langle x^2\rangle}{2\sigma^2}+\frac{\langle x^4\rangle}{24\sigma^4}-\frac{\langle x^6\rangle}{720\sigma^6}\]

But these are just the moments that were computed above:

\[=1-\frac{1}{2}+\frac{1}{8}-\frac{1}{48}+…=\sum_{m=0}^{\infty}\frac{(-1/2)^m}{m!}\]

Alternatively, one can evaluate the expectation analytically by writing \(\cos\theta=\Re e^{i\theta}\) and completing the square to obtain:

\[\biggr\langle\cos\frac{x}{\sigma}\biggr\rangle=\frac{1}{\sqrt{e}}\approx 0.60653\]

(or one could have just recognized the earlier Maclaurin series for \(e^{-1/2}\)). So just taking the \(4\)th partial sum \(1-\frac{1}{2}+\frac{1}{8}-\frac{1}{48}=\frac{29}{48}\approx 0.60417\) already gets within \(0.4\%\) of the true answer. More generally, monomials/powers of \(x\) span the analytic functions and expectation is linear, so by computing all the moments of a distribution, one in principle has access to the expectation of any analytic function with respect to that distribution.

Problem #\(4\): Evaluate the cumulant generating function \(\ln\langle e^{\kappa x}\rangle\) of a univariate normal random variable \(x\) with variance \(\sigma^2\) and zero mean \(\langle x\rangle=0\).

Solution #\(4\): A cinch:

\[\ln\langle e^{\kappa x}\rangle=\ln e^{\kappa^2\sigma^2/2}=\frac{\kappa^2\sigma^2}{2}\]

So it is a parabola in \(\kappa\) with curvature \(\sigma^2\) at its vertex. The point therefore is that besides the \(2\)nd cumulant \(\sigma^2\), all other cumulants of the normal distribution vanish! For instance, the \(3\)rd cumulant (which controls skewness) is \(\langle x^3\rangle=0\), the \(4\)th cumulant (which controls excess kurtosis) is \(\langle x^4\rangle-3\langle x^2\rangle^2=0\), etc.

Problem #\(4.5\): Another fun application of these ideas: define the \(n\)-th (probabilist’s) Hermite polynomial \(\text{He}_n(x)\) to be the unique degree-\(n\) monic polynomial which is orthogonal to all lower-degree Hermite polynomials with respect to the Gaussian weight function \(e^{-x^2/2}\) over the real line \(\textbf R\). Hence, calculate the first \(5\) Hermite polynomials \(\text{He}_0(x),\text{He}_1(x),\text{He}_2(x),\text{He}_3(x),\text{He}_4(x)\).

Solution #\(4.5\): From the definition given above, \(\text{He}_0(x)\) must just be a constant, and the monic requirement fixes this constant to be \(1\); thus \(\text{He}_0(x)=1\). The next Hermite polynomial must have the form \(\text{He}_1(x)=x+c_0\). To fix \(c_0\), one thus requires that (using the fact that inner products with respect to a weight function are identical to expectations of products with respect to the weight function viewed as a probability distribution):

\[\langle\text{He}_1(x/\sigma)\text{He}_0(x/\sigma)\rangle=\langle x/\sigma\rangle+c_0=0\]

so \(c_0=0\) and \(\text{He}_1(x)=x\). Next make the ansatz \(\text{He}_2(x)=x^2+c_1x+c_0\). Enforcing:

\[\langle\text{He}_2(x/\sigma)\text{He}_0(x/\sigma)\rangle=\langle (x/\sigma)^2\rangle+c_1\langle x/\sigma\rangle+c_0=0\Rightarrow c_0=-1\]

\[\langle\text{He}_2(x/\sigma)\text{He}_1(x/\sigma)\rangle=\langle (x/\sigma)^3\rangle+c_1\langle (x/\sigma)^2\rangle+c_0\langle x/\sigma\rangle=0\Rightarrow c_1=0\]

So \(\text{He}_2(x)=x^2-1\). A similar procedure gives \(\text{He}_3(x)=x^3-3x\). At this point to speed oneself up a bit, one could recognize that the Hermite polynomials alternate in parity \(\text{He}_n(-x)=(-1)^n\text{He}_n(x)\), so powers of \(x\) hop by \(2\). This motivates the more intelligent ansatz \(\text{He}_4(x)=x^4+c_2x^2+c_0\), automatically ensuring orthogonality with \(\text{He}_1(x)\) and \(\text{He}_3(x)\). Enforcing orthogonality with \(\text{He}_0(x)\) and \(\text{He}_2(x)\) gives the system of linear equations \(3+c_2+c_0=0\) and \(15+3c_2+c_0=0\) so \(c_0=3\) and \(c_2=-6\) which gives \(\text{He}_4(x)=x^4-6x^2+3\).
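
Conveniently, numpy ships the probabilist’s Hermite polynomials in its numpy.polynomial.hermite_e module, so the result can be checked directly:

```python
# Check He_4(x) = x^4 - 6x^2 + 3 against numpy's probabilist's Hermite module.
from numpy.polynomial import hermite_e

# Coefficients of He_4 in the ordinary power basis, lowest degree first:
print(hermite_e.herme2poly([0, 0, 0, 0, 1]))  # [ 3.  0. -6.  0.  1.]
```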

(mention the exponential generating function of the Hermite polynomials, and the operator representation, any connections?)

Problem #\(5\): Consider generalizing the prior discussion of a univariate normal random variable \(x\) with variance \(\sigma^2\) and zero mean \(\langle x\rangle=0\) to a \(d\)-dimensional multivariate normal random vector \(\textbf x\in\textbf R^d\) with covariance matrix \(\sigma^2\) and zero mean \(\langle\textbf x\rangle=\textbf 0\). Write down the appropriate normalized probability density function \(\rho(\textbf x)\) for \(\textbf x\).

Solution #\(5\): In analogy with the \(d=1\) univariate normal distribution, one has:

\[\rho(\textbf x)=\frac{1}{\det(\sigma)(2\pi)^{d/2}}\exp\left(-\frac{1}{2}\textbf x^T\sigma^{-2}\textbf x\right)\]

(prove by diagonalizing the covariance matrix \(\sigma^2\) of \(\textbf x\)).

Problem #\(6\): What are the moment and cumulant generating functions of a \(d\)-dimensional multivariate normal random vector \(\textbf x\in\textbf R^d\) with covariance matrix \(\sigma^2\) and zero mean \(\langle\textbf x\rangle=\textbf 0\)?

Solution #\(6\): Again in analogy with \(d=1\):

\[\langle e^{\boldsymbol{\kappa}\cdot\textbf x}\rangle=\exp\left(\frac{1}{2}\boldsymbol{\kappa}^T\sigma^2\boldsymbol{\kappa}\right)\]

\[\ln\langle e^{\boldsymbol{\kappa}\cdot\textbf x}\rangle=\frac{1}{2}\boldsymbol{\kappa}^T\sigma^2\boldsymbol{\kappa}\]

The phrase “moment generating function” is only really appropriate in \(d=1\); this is because in \(d\geq 2\), the generator \(\langle e^{\boldsymbol{\kappa}\cdot\textbf x}\rangle\) for the random vector \(\textbf x=(x_1,x_2,…,x_d)\) generates more than just moments along a given axis like \(\langle x_1^2\rangle, \langle x_2^4\rangle\) but also correlators such as \(\langle x_1x_2^3\rangle,\langle x_1x_2x_3\rangle\), etc. which obviously didn’t exist in \(d=1\). Similar to the univariate case, the even \(\textbf Z_2\) symmetry of the multivariate generator means that only correlators with even powers of \(x_i\) survive, so for instance \(\langle x_1^2x_2x_3\rangle=0\). To compute such strictly even correlators, the quickest way is typically to just compute the relevant term in the Maclaurin expansion of the generator:

\[\exp\left(\frac{1}{2}\boldsymbol{\kappa}^T\sigma^2\boldsymbol{\kappa}\right)=1+\frac{1}{2}\boldsymbol{\kappa}^T\sigma^2\boldsymbol{\kappa}+\frac{1}{8}\left(\boldsymbol{\kappa}^T\sigma^2\boldsymbol{\kappa}\right)^2+…\]
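
The pairing rule from Solution #\(2\) carries over to these correlators: for instance \(\langle x_ix_jx_kx_l\rangle=\sigma^2_{ij}\sigma^2_{kl}+\sigma^2_{ik}\sigma^2_{jl}+\sigma^2_{il}\sigma^2_{jk}\) (writing \(\sigma^2_{ij}\) for the entries of the covariance matrix), a sum over the \(3!!=3\) pairings. A Monte Carlo sketch (assuming numpy, with an arbitrary positive-definite covariance):

```python
# Monte Carlo check of <x1 x2 x3 x4> = s12*s34 + s13*s24 + s14*s23
# for a zero-mean multivariate normal with covariance matrix S.
import numpy as np

rng = np.random.default_rng(0)
S = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 2.0, 0.3, 0.2],
              [0.2, 0.3, 1.5, 0.4],
              [0.1, 0.2, 0.4, 1.2]])  # positive-definite covariance

x = rng.multivariate_normal(np.zeros(4), S, size=2_000_000)
mc = (x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3]).mean()
wick = S[0, 1] * S[2, 3] + S[0, 2] * S[1, 3] + S[0, 3] * S[1, 2]

print(mc, wick)  # agree to Monte Carlo accuracy (~0.27)
```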

Problem #\(7\): Explain why, for an arbitrary analytic random function \(f(\textbf x)\) of an arbitrary (i.e. not necessarily normal) random vector \(\textbf x\), the expectation is:

\[\langle f(\textbf x)\rangle=f\left(\frac{\partial}{\partial\boldsymbol{\kappa}}\right)\langle e^{\boldsymbol{\kappa}\cdot\textbf x}\rangle\biggr|_{\boldsymbol{\kappa}=\textbf 0}\]


Landau-Ginzburg Theory

Problem #\(1\): What is the Landau-Ginzburg free energy functional \(F[m]\) for the Ising model?

Solution #\(1\): It is defined implicitly through:

\[e^{-\beta F[m]}=\sum_{\{\sigma_i\}\to_{\text{c.g.}}m(\textbf x)}e^{-\beta E_{\{\sigma_i\}}}\]

where \(E_{\{\sigma_i\}}=-E_{\text{ext}}\sum_{i=1}^N\sigma_i-E_{\text{int}}\sum_{\langle i,j\rangle}\sigma_i\sigma_j\) is the energy of a given spin microstate \(\{\sigma_i\}\) and “\(\text{c.g.}\)” is short for coarse-graining the underlying \(d\)-dimensional lattice \(\Lambda\to\textbf R^d\).

Problem #\(2\): Describe how a saddle-point approximation can be used to evaluate the canonical partition function \(Z\) of a Landau-Ginzburg theory.

Solution #\(2\): In the canonical ensemble, the probability density functional \(p[\phi]\) of finding the system in a given configuration \(\phi=\phi(\textbf x)\) is given by the Boltzmann distribution:

\[p[\phi]=\frac{e^{-\beta F[\phi]}}{Z}\]

where, to ensure normalization \(\int\mathcal D\phi p[\phi]=1\) over the space of all local order parameter configurations \(\phi\), the canonical partition function \(Z\) is given by the path integral:

\[Z=\int\mathcal D\phi e^{-\beta F[\phi]}\]

In general, functional integrals (so-called because the integrand \(e^{-\beta F[\phi]}\) is a functional) are difficult to evaluate (partly because they are hard to even rigorously define!). However, whether one is doing integrals over \(\textbf R,\textbf C\) or function spaces, as long as one’s integrand looks like \(e^{-\text{something}}\), it’s always worth trying a saddle-point approximation, which in this case looks like:

\[Z\approx e^{-\beta F[\phi_*]}\]

where \(\phi_*\) is the order parameter configuration minimizing the free energy \(F=F[\phi]\). In other words, \(\phi_*\) is a stationary point of \(F[\phi]\), at which the functional derivative vanishes: \(\frac{\delta F}{\delta\phi}\bigr|_{\phi=\phi_*}=0\).

Landau mean field theory is the special case of this saddle-point approximation in which all spatial fluctuations of \(\phi_*\) are completely ignored, yielding a homogeneous mean-field order parameter \(\phi_*(\textbf x)=\text{const}\).

Problem #\(3\): Explain why, as with many other functionals in physics (e.g. the action \(S[\textbf x]\)), the Landau-Ginzburg free energy functional \(F[\phi]\) must take the form:

\[F[\phi]=\int d^d\textbf x \mathcal F\left(\phi(\textbf x),\frac{\partial\phi}{\partial\textbf x},…\right)\]

Combining this with the saddle-point approximation in Solution #\(2\), what can one conclude?

Solution #\(3\): The presence of the integral \(\int d^d\textbf x\) simply reflects the extensive nature of the free energy \(F\), while the dependence of the integrand \(\mathcal F\) on \(\phi\) and its derivatives only reflects locality.

In the special case that the free energy density \(\mathcal F\) depends only on the field \(\phi(\textbf x)\) and its gradient \(\frac{\partial\phi}{\partial\textbf x}\) (but no higher derivatives), and it obeys suitable boundary conditions, one then has the usual Euler-Lagrange equations:

\[\frac{\partial}{\partial\textbf x}\cdot\frac{\partial \mathcal F}{\partial (\partial\phi/\partial\textbf x)}=\frac{\partial\mathcal F}{\partial\phi}\]

or, because of the lack of explicit \(\textbf x\)-dependence in \(\mathcal F\), the equivalent Beltrami identity:

\[\frac{\partial \mathcal F}{\partial (\partial\phi/\partial\textbf x)}\cdot\frac{\partial\phi}{\partial\textbf x}-\mathcal F=-\mathcal F_0\]

for some constant \(\mathcal F_0\in\textbf R\).

Problem #\(3.5\): Suppose that the free energy density \(\mathcal F\) of a particular system (e.g. the Ising ferromagnet) is taken (on locality, analyticity and suitable symmetry grounds) to be of the form:

\[\mathcal F\left(\phi,\frac{\partial\phi}{\partial\textbf x}\right)=\frac{\alpha_2}{2}\phi^2+\frac{\alpha_4}{4}\phi^4+\frac{\gamma}{2}\biggr|\frac{\partial\phi}{\partial\textbf x}\biggr|^2\]

where the phenomenological coupling constants \(\alpha_2,\alpha_4,\gamma\) each may carry some \(T\)-dependence, though as far as the study of second-order phase transitions at critical points is concerned, only the \(T\)-dependence \(\alpha_2(T)\sim T-T_c\) of the coefficient of the quadratic \(\phi^2\) term matters, and all that needs to be assumed about the other coupling constants is their sign for all temperatures \(T\), in this case \(\alpha_4,\gamma>0\). Show that in the subcritical regime \(T<T_c\), the \(\textbf Z_2\) symmetry \(F[-\phi]=F[\phi]\) of the theory is spontaneously broken via a bifurcation into \(2\) degenerate ground states (also called vacua in analogy with QFT) representing mean-field homogeneous/ordered phases/configurations \(\phi_*(\textbf x)=\pm\phi_0\). Show that there is also a more interesting domain wall soliton:

\[\phi_*^{\text{DW}}(x)=\phi_0\tanh\left(\sqrt{-\frac{\alpha_2}{2\gamma}}x\right)\]

that emerges upon imposing Dirichlet boundary conditions \(\lim_{x\to\pm\infty}\phi_*(\textbf x)=\pm\phi_0\) implementing a smooth transition interpolating between the two ground state phases \(\pm\phi_0\).

Solution #\(3.5\): The Euler-Lagrange equations for this particular free energy density \(\mathcal F\) yield a Poisson/Helmholtz-like (but nonlinear!) PDE for the on-shell field \(\phi_*(\textbf x)\):

\[\gamma\biggr|\frac{\partial}{\partial\textbf x}\biggr|^2\phi_*=\alpha_2\phi_*+\alpha_4\phi_*^3\]

Looking for a homogeneous ansatz \(\phi_*(\textbf x)=\phi_0\) yields the \(2\) degenerate ground states \(\phi_0=\pm\sqrt{-\alpha_2/\alpha_4}\) with free energy density \(\mathcal F_0:=\mathcal F(\pm\phi_0)=-\alpha_2^2/4\alpha_4\) and corresponding Landau-Ginzburg free energy \(F_0:=F[\pm\phi_0]=L^d\mathcal F_0\) (putting the system in a box \([-L/2,L/2]^d\) to regularize the obvious IR divergence that would otherwise arise).

By contrast, reverting now to the Beltrami identity, assuming that \(\phi_*^{\text{DW}}(\textbf x)=\phi_*^{\text{DW}}(x)\) varies only along the \(x\)-direction, the PDE reduces to the nonlinear separable first-order ODE:

\[\frac{\gamma}{2}\left(\frac{d\phi_*^{\text{DW}}}{dx}\right)^2-\frac{\alpha_2}{2}(\phi_*^{\text{DW}})^2-\frac{\alpha_4}{4}(\phi_*^{\text{DW}})^4=-\mathcal F_0\]

In particular, placing the domain wall at the origin \(x=0\) so that \(\phi_*^{\text{DW}}(0)=0\), one obtains the soliton described above (for a domain wall at some other location \(x_0\in\textbf R\), just translate \(x\mapsto x-x_0\) in the \(\tanh\) function). The width of the domain wall is \(\Delta x=\sqrt{-2\gamma/\alpha_2}\), which is pretty intuitive (cf. the formula \(\omega_0=\sqrt{k/m}\) for a mass \(m\) on a spring \(k\)). Unlike the homogeneous ground states \(\pm\phi_0\), this domain wall soliton is a non-mean-field stationary point of the Landau-Ginzburg free energy functional \(F\).
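
As a numerical sanity check (a minimal sketch, assuming numpy, with arbitrary hypothetical values of the couplings), one can verify by finite differences that the \(\tanh\) profile solves the Euler-Lagrange equation:

```python
# Check that phi(x) = phi0 tanh(x/dx) satisfies gamma*phi'' = a2*phi + a4*phi^3
# for hypothetical subcritical couplings a2 < 0 < a4, gamma.
import numpy as np

a2, a4, gamma = -1.0, 2.0, 0.5
phi0 = np.sqrt(-a2 / a4)
dx = np.sqrt(-2 * gamma / a2)  # domain wall width

x = np.linspace(-5, 5, 2001)
h = x[1] - x[0]
phi = phi0 * np.tanh(x / dx)

phi_xx = (phi[2:] - 2 * phi[1:-1] + phi[:-2]) / h**2  # finite-difference phi''
residual = gamma * phi_xx - (a2 * phi[1:-1] + a4 * phi[1:-1] ** 3)

print(np.abs(residual).max())  # ~0, up to discretization error
```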

Problem #\(4\): By definition, the ground state(s) of any system are global minima of its energy. In particular, it is clear that the domain wall soliton \(\phi_*^{\text{DW}}(x)\) is not a ground state, having free energy \(F[\phi_*^{\text{DW}}]>F_0\); precisely how much free energy \(\Delta F_{\text{DW}}:=F[\phi_*^{\text{DW}}]-F_0\) does it cost to create such a domain wall from a homogeneous ground state phase?

Solution #\(4\): Differential equations can (and should!) often be thought of as a dance/tension between conflicting characters. Even without doing any of the math in Solution #\(3\), it should be clear that the domain wall transition cannot happen instantaneously or the free energy cost \(\int d^d\textbf x\frac{\gamma}{2}\left(\frac{d\phi}{dx}\right)^2\) from the “kinetic” term would be too great, but neither can it proceed too slowly otherwise the free energy cost \(\int d^d\textbf x\left(\frac{\alpha_2}{2}\phi^2+\frac{\alpha_4}{4}\phi^4\right)\) from the “potential” term would be too great; the domain wall \(\phi_*^{\text{DW}}(x)\) must therefore strike a balance between these two free energy costs while satisfying the boundary conditions \(\lim_{x\to\pm\infty}\phi_*^{\text{DW}}(x)=\pm\phi_0\) (cf. the virial theorem in classical mechanics). In other words, \(0<\Delta x<\infty\).

(easy to forget, but remember one is working in the subcritical \(T<T_c\) regime where \(\alpha_2<0,\alpha_4>0\) so the potential term \(\frac{\alpha_2}{2}\phi^2+\frac{\alpha_4}{4}\phi^4\) is not positive semi-definite; in particular its minimum does not lie at \(\phi=0\) but rather it has degenerate minima at \(\phi_0=\pm\sqrt{-\alpha_2/\alpha_4}\)! This means that anywhere \(\textbf x\in\textbf R^d\) that \(\phi_*^{\text{DW}}(\textbf x)\) strays too far away from the bottom of the potential wells at \(\pm\phi_0\) costs free energy, so for this reason the domain wall cannot take too long to climb over the hump between the two minima).

Indeed, the Beltrami identity quantifies this free energy balance and one can exploit it to quickly calculate the free energy of the domain wall soliton:

\[F[\phi_*^{\text{DW}}]=\int d^d\textbf x\left(\frac{\alpha_2}{2}(\phi_*^{\text{DW}})^2+\frac{\alpha_4}{4}(\phi_*^{\text{DW}})^4+\frac{\gamma}{2}\left(\frac{d\phi_*^{\text{DW}}}{dx}\right)^2\right)=\int d^d\textbf x\left(\gamma\left(\frac{d\phi_*^{\text{DW}}}{dx}\right)^2+\mathcal F_0\right)\]

the latter term is recognized as just the free energy \(F_0=\int d^d\textbf x\,\mathcal F_0=L^d\mathcal F_0\) of the ground state(s), so the excess free energy cost \(\Delta F_{\text{DW}}\) of creating the domain wall is (for \(L\gg\Delta x\)):

\[\Delta F_{\text{DW}}=\gamma\int d^d\textbf x\left(\frac{d\phi_*^{\text{DW}}}{dx}\right)^2=\gamma L^{d-1}\left(\frac{\phi_0}{\Delta x}\right)^2\int_{-L/2}^{L/2}dx\space\text{sech}^4\left(\frac{x}{\Delta x}\right)\]

\[\approx\frac{\gamma L^{d-1}\phi_0^2}{\Delta x}\int_{-\infty}^{\infty}d\varphi\space\text{sech}^4\varphi=\frac{4}{3\sqrt{2}}\frac{\sqrt{-\gamma\alpha_2^3}}{\alpha_4}L^{d-1}\]

but the key point is that \(\Delta F_{\text{DW}}\sim L^{d-1}\) scales with the (hyper)area of the domain wall. Well actually, another important scaling to note is that \(\Delta F_{\text{DW}}\sim(-\alpha_2)^{3/2}\) which suggests that near criticality where \(\alpha_2\to 0\), the free energy cost \(\Delta F_{\text{DW}}\to 0\) of creating a domain wall also vanishes, while its width \(\Delta x\sim(-\alpha_2)^{-1/2}\) diverges. This suggests that domain walls become important near critical points.

Problem #\(5\): Working with the same Landau-Ginzburg system as above, estimate in \(d=1\) dimension the probability \(p_N\) that thermal fluctuations will spontaneously break the \(\textbf Z_2\) symmetry of the free energy \(F\) via the creation of \(N\ll L/\Delta x\) domain walls anywhere, and comment on the implication of this for the lower critical dimension \(d_{\ell}\) of this system.

Solution #\(5\): Because the ODE arising from the Beltrami identity was nonlinear, the superposition of a bunch of domain walls at different locations is not strictly speaking a stationary point of the free energy \(F\), but nonetheless one can sweep this under the rug and assume it is still an approximate solution provided the \(N\) domain walls are well-separated. In particular, this means their total free energy \(\approx N\Delta F_{\text{DW}}+F_0\) is also approximately additive. If one imagines discretizing the \(d=1\) line \([-L/2,L/2]\) into \(L/\Delta x\) bins each of width \(\Delta x\), then there are \({L/\Delta x}\choose{N}\) choices for where to put the domain walls (ignoring the fact that in some cases, they may not be so well-separated), so the probability of having any configuration of \(N\) domain walls is approximately:

\[p_N\approx {{L/\Delta x}\choose{N}}\frac{e^{-\beta(N\Delta F_{\text{DW}}+F_0)}}{Z}\approx\frac{(L/\Delta x)^N}{N!}\frac{e^{-\beta(N\Delta F_{\text{DW}}+F_0)}}{Z}\]

where the sparseness assumption \(N\ll L/\Delta x\) has been used. Alternatively, normalizing with respect to the \(N=0\) “vacuum” probability \(p_0=e^{-\beta F_0}/Z\):

\[\frac{p_N}{p_0}=\frac{(L/\Delta x)^N}{N!}e^{-N\beta\Delta F_{\text{DW}}}\]

Importantly, in \(d=1\), domain walls are free in the sense that \(\Delta F_{\text{DW}}\sim L^{1-1}=L^0\) does not grow with system size. So all the \(L\)-dependence in the expression above for \(p_N/p_0\) is contained in the entropic factor \((L/\Delta x)^N/N!\), which shows that as one takes the infinite system limit \(L\to\infty\), the probability \(p_N/p_0\) receives no exponential suppression from the factor \(e^{-N\beta\Delta F_{\text{DW}}}\) and instead grows without bound. The probability that thermal fluctuations produce an even number \(N\in 2\textbf N\) of domain walls is:

\[\sum_{N=2,4,6,…}p_N=\left(\cosh\left(\frac{Le^{-\beta\Delta F_{\text{DW}}}}{\Delta x}\right)-1\right)p_0\]

and similarly for odd \(N\in 2\textbf N+1\):

\[\sum_{N=1,3,5,…}p_N=\sinh\left(\frac{Le^{-\beta\Delta F_{\text{DW}}}}{\Delta x}\right)p_0\]

and since both \(\cosh\varphi-1\) and \(\sinh\varphi\) tend to \(e^{\varphi}/2\) as \(\varphi\to\infty\), these two probabilities both approach the same \(e^{Le^{-\beta\Delta F_{\text{DW}}}/\Delta x}p_0/2\) as \(L\to\infty\).

More generally, any Landau-Ginzburg theory with a discrete symmetry (like the \(\textbf Z_2\) symmetry of this particular \(F\)) will possess a bunch of disconnected, degenerate ground states/vacua (like \(\phi(\textbf x)=\pm\phi_0\) in this case) that spontaneously break that discrete symmetry, and all have \(d_{\ell}=1\) because there is a very high probability that thermal fluctuations will proliferate domain walls that toggle between the degenerate ground states, preventing ordered phases from forming. Hence, for instance, the Ising model has no phase transition in \(d=1\).

Problem #\(6\):

Solution #\(6\):

Problem #\(7\):

Solution #\(7\):

Note that if one does not work in natural units, then the heat capacity is instead \(C=k_B\beta^2\frac{\partial^2\ln Z}{\partial\beta^2}\) so all specific heat capacities should come with an additional factor of \(k_B\).

Problem #\(8\):

Solution #\(8\):

Problem #\(9\):

Solution #\(9\):

(NEED TO COME BACK TO THIS QUESTION!)

Problem #\(10\): Describe the quadratic approximation to the Landau-Ginzburg free energy density \(\mathcal F\) governing a \(\textbf Z_2\)-symmetric system/theory (e.g. Ising ferromagnet) described by a single, real scalar order parameter \(\phi\) in the absence of any external coupling \(E_{\text{ext}}=0\).

Solution #\(10\): On LAS (locality, analyticity, symmetry) grounds, the exact free energy density \(\mathcal F\) that knows about all the detailed microscopic physics must take the phenomenological form:

\[\mathcal F=\frac{\alpha_2}{2}\phi^2+\frac{\alpha_4}{4}\phi^4+\frac{\gamma}{2}\biggr|\frac{\partial\phi}{\partial\textbf x}\biggr|^2+…\]

where in principle there are also coupling constants for higher even couplings such as \(\phi^6\), \(\phi^2\biggl|\frac{\partial\phi}{\partial\textbf x}\biggr|^2\), \(\phi\biggl|\frac{\partial}{\partial\textbf x}\biggr|^2\phi\), etc., though not for couplings like \(\phi^3\) (which would break the \(\textbf Z_2\) symmetry) or \(1/\phi^2\) (which would break analyticity). In addition, one also has to import from mean-field theory the assumption that \(\alpha_2\sim T-T_c\) (i.e. \(\alpha_2\to 0\) linearly as \(T\to T_c\), with critical exponent \(1\)) and that \(\gamma,\alpha_4>0\) for all \(T\) in a neighbourhood of \(T_c\).

The quadratic approximation to \(\mathcal F\) looks slightly different depending on whether one is working in the supercritical \(T>T_c\) regime (where \(\alpha_2(T)>0\)) or the subcritical \(T<T_c\) regime (where \(\alpha_2(T)<0\)).

In the supercritical \(T>T_c\) regime, the quadratic approximation does what it sounds like: one drops not only all quartic, sextic, octic, decic, etc. couplings like \(\phi^4\), but even the quadratic couplings containing higher derivatives (Laplacians, biharmonics, and so on). So at \(T>T_c\), this amounts to keeping only \(2\) couplings, a “kinetic” coupling \(\frac{\gamma}{2}\biggr|\frac{\partial\phi}{\partial\textbf x}\biggr|^2\) and a “Hookean quadratic potential” coupling \(\frac{\alpha_2}{2}\phi^2\):

\[\mathcal F\approx\frac{\alpha_2}{2}\phi^2+\frac{\gamma}{2}\biggr|\frac{\partial\phi}{\partial\textbf x}\biggr|^2\]

Notice in this case one can also replace \(\phi\mapsto\delta\phi\) everywhere:

\[=\frac{\alpha_2}{2}\delta\phi^2+\frac{\gamma}{2}\biggr|\frac{\partial\delta\phi}{\partial\textbf x}\biggr|^2\]

where the fluctuation \(\delta\phi(\textbf x):=\phi(\textbf x)-\langle\phi(\textbf x)\rangle\). This is simply because, at \(T>T_c\), \(\langle\phi(\textbf x)\rangle=0\) vanishes homogeneously (from mean-field theory).

By contrast, in the subcritical \(T<T_c\) regime, the quadratic approximation consists of \(2\) separate approximation steps:

Step #\(1\): Keep not only the zeroth and first-order quadratic couplings that were retained in the supercritical \(T>T_c\) case, but also keep the quartic \(\phi^4\) coupling:

\[\mathcal F\approx\frac{\alpha_2}{2}\phi^2+\frac{\alpha_4}{4}\phi^4+\frac{\gamma}{2}\biggr|\frac{\partial\phi}{\partial\textbf x}\biggr|^2\]

The reason for this is that for \(T<T_c\), \(\alpha_2(T)<0\) so the free energy density \(\mathcal F(\phi)=-\frac{|\alpha_2|}{2}\phi^2+…\) would be unbounded from below, yielding an unstable theory. By including the quartic coupling, one instead has \(2\) degenerate \(\textbf Z_2\) symmetry breaking homogeneous ordered phases \(\phi(\textbf x)=\pm\phi_0\) with \(\phi_0=\sqrt{-\alpha_2/\alpha_4}\).

However, beyond the presence of \(\alpha_4\) in \(\phi_0\), one would otherwise like to remove all other vestiges of it in \(\mathcal F\) in order to get a more quadratic-looking free energy density like in the \(T>T_c\) case. Thus:

Step #\(2\): Insert the “Reynolds decomposition” \(\phi=\phi_0+\delta\phi\) into \(\mathcal F\) and notice that the term linear in \(\delta\phi\) vanishes because \(\phi_0\) is on-shell:

\[\mathcal F\approx\mathcal F[\phi_0]-\alpha_2\delta\phi^2+\frac{\gamma}{2}\biggr|\frac{\partial\delta\phi}{\partial\textbf x}\biggr|^2+O(\delta\phi^3)\]

where the cubic and quartic couplings \(O(\delta\phi^3)=\alpha_4\phi_0\delta\phi^3+\frac{\alpha_4}{4}\delta\phi^4\) at the level of the fluctuations \(\delta\phi\) from the mean field \(\phi_0\) are assumed negligible in this second step of the quadratic approximation for \(T<T_c\). Finally, note that the constant term \(\mathcal F[\phi_0]\) would drop out when differentiating \(\ln Z\) to compute Boltzmannian cumulants, so can safely be omitted.

In this way, the name “quadratic approximation” is justified because both the \(T>T_c\) and \(T<T_c\) cases can be unified by writing the free energy as:

\[\mathcal F\approx\frac{1}{2}\left(\mu^2\delta\phi^2+\gamma\biggr|\frac{\partial\delta\phi}{\partial\textbf x}\biggr|^2\right)\]

where the “mass coupling” \(\mu^2\geq 0\) is defined piecewise to capture both the subcritical and supercritical regimes:

\[\mu^2(T) =\begin{cases}
\alpha_{2}(T), & T \geq T_c\\
-2\,\alpha_{2}(T), & T \leq T_c
\end{cases}\]

cf. the Klein-Gordon Lagrangian density when written in natural units:

\[\mathcal L=\frac{1}{2}\partial^{\mu}\phi\partial_{\mu}\phi-\frac{1}{2}m^2\phi^2\]

Problem #\(11\): Working within the quadratic approximation to the free energy density \(\mathcal F\) outlined in Solution #\(10\), compute the partition function \(Z\).

Solution #\(11\): This is pretty much the only case where the path integral underlying \(Z\) can be computed analytically thanks to the fact that Gaussian integrals are straightforward to do. Here is a sketch of the computation:

Step #\(1\): The free energy \(F=\int d^d\textbf x\mathcal F(\delta\phi)\) is certainly extensive so one must add up \(\int\) the chunks of free energy \(d^d\textbf x\mathcal F(\delta\phi)\) due to fluctuations \(\delta\phi\) of the order parameter \(\phi\) at each point \(\textbf x\in\textbf R^d\) in space (or \(\textbf x\in V\subset\textbf R^d\) in some suitably large volume). But conceptually, some of these fluctuations might be more short-range, rapid oscillations across \(\textbf x\), while others may be more long-range, slow envelopes. Nonetheless, it should be intuitively clear that, rather than taking the local approach of stepping through each \(\textbf x\in\textbf R^d\) and adding up the energies of all fluctuations at that point \(\textbf x\), one can take a more global approach of adding up the energies of all fluctuations across the entire space \(\textbf R^d\) that have a given wavelength \(\lambda\) (or equivalently wavenumber \(k=2\pi/\lambda\)), and stepping through all possible wavelengths, from the long (“IR”) wavelengths all the way down to the short (“UV”) wavelengths. This intuitive picture can of course be formalized by explicitly writing \(\delta\phi(\textbf x)\) as a superposition of plane wave excitations:

\[\delta\phi(\textbf x)=\int\frac{d^d\textbf k}{(2\pi)^d}\delta\phi_{\textbf k}e^{i\textbf k\cdot\textbf x}\]

whereupon one obtains what is essentially just Plancherel’s theorem:

\[F=\frac{1}{2}\int\frac{d^d\textbf k}{(2\pi)^d}|\delta\phi_{\textbf k}|^2(\mu^2+\gamma|\textbf k|^2)\]

which uses the fact that for \(\delta\phi(\textbf x)\in\textbf R\), the Fourier transform satisfies \(\delta\phi_{-\textbf k}=\delta\phi^{\dagger}_{\textbf k}\). By viewing the system as having some large but finite volume \(V\), one can replace on dimensional grounds:

\[\int\frac{d^d\textbf k}{(2\pi)^d}\Leftrightarrow\frac{1}{V}\sum_{\textbf k}\]

where \(\sum_{\textbf k}\) means over all \(\textbf k=\frac{2\pi}{L}\textbf n\) for \(\textbf n\in\textbf Z^d\) since these are the only wavevectors compatible with periodic boundary conditions on \(\delta\phi(\textbf x)\) in a box \([-L/2,L/2]^d\) of volume \(V=L^d\). Put another way, it reduces Plancherel’s theorem for the Fourier transform to Parseval’s theorem for Fourier series (since \(\delta\phi(\textbf x)\) is now \(L\)-periodic in all \(d\) dimensions).

Finally, the existence of the plane wave basis also allows one to “rigorously” define the measure \(\mathcal D\phi=\mathcal D\delta\phi\) in the path integral for \(Z=\int\mathcal D\delta\phi e^{-\beta F[\delta\phi]}\). Intuitively, the path integral \(\int\mathcal D\delta\phi\) wants to sum over all possible fluctuations \(\delta\phi\) of the field about the mean field. But the Fourier transform allows one to explicitly parameterize this abstract space! Simply integrate over all possible choices of the “Fourier knobs/coefficients” \(\delta\phi_{\textbf k}\) which can span any fluctuation \(\delta\phi\):

\[\int\mathcal D\delta\phi\sim\int\prod_{\textbf k}d\delta\phi_{\textbf k}\sim\int\prod_{\textbf k}d\Re \delta\phi_{\textbf k}d\Im \delta\phi_{\textbf k}\]

where the product \(\prod_{\textbf k}\) is over the same countably infinite lattice of \(\textbf k\)-wavevectors as the earlier sum \(\sum_{\textbf k}\) (actually, strictly speaking, one should only take the product over half of the entire \(\textbf k\)-space, for instance imposing \(k_x>0\). This is because if \(\delta\phi_{\textbf k}\in\textbf C\) is already known for some \(\textbf k\), then \(\delta\phi_{-\textbf k}\) is also automatically known by virtue of the reality criterion \(\delta\phi_{-\textbf k}=\delta\phi^{\dagger}_{\textbf k}\), cf. \(\sin(kx)=\frac{1}{2i}e^{ikx}+?e^{-ikx}\) where, knowing that \(\sin(kx)\in\textbf R\), one can immediately conclude that \(?=(\frac{1}{2i})^{\dagger}=-\frac{1}{2i}\). So \(\delta\phi_{\textbf k}\) and \(\delta\phi_{-\textbf k}\) are not independent knobs (imagine a “complex conjugation gear” between them), but the path integral \(\int\mathcal D\delta\phi\) only wants to integrate over independent degrees of freedom since double-counting the same fluctuation is obviously not desired. That being said, this only leads to a factor of \(2\) discrepancy which doesn’t affect any physical quantities, and the measure \(\mathcal D\delta\phi\) is only defined up to some “normalization” anyways).

At this point one writes \(|\delta\phi_{\textbf k}|^2=\Re^2\delta\phi_{\textbf k}+\Im^2\delta\phi_{\textbf k}\) and decouples all the Gaussian integrals to obtain the final result for \(Z\):

\[Z\sim\prod_{\textbf k}\sqrt{\frac{\pi V}{\beta(\mu^2+\gamma|\textbf k|^2)}}\]
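Taking the logarithm of this product gives the sum over modes that will be differentiated next. As a quick sanity check, here is a minimal Python sketch (all parameter values are illustrative assumptions, and truncating to the box’s \(\textbf k\)-lattice implicitly imposes a UV cutoff) that assembles \(\ln Z\) on the \(\textbf k\)-lattice of a periodic box:

```python
import numpy as np

# Illustrative, assumed parameters (k_B = 1); finite box of side Lbox, unit spacing
Lbox, d = 16, 2
mu2, gam, beta = 0.1, 1.0, 1.0

n = np.fft.fftfreq(Lbox, d=1.0/Lbox)              # integer mode numbers n_i
k = 2*np.pi*np.array(np.meshgrid(*([n]*d), indexing="ij"))/Lbox
k2 = np.sum(k**2, axis=0)                         # |k|^2 over the k-lattice

# ln Z = -1/2 sum_k ln[beta(mu^2 + gamma|k|^2)], dropping the ln(pi V) constant
lnZ = -0.5*np.sum(np.log(beta*(mu2 + gam*k2)))
print(lnZ)
```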

Problem #\(12\): Using this \(Z\), and making the specific assumption that \(\alpha_2(T)=k_B(T-T_c)\) and that \(\gamma\) is independent of \(T\), compute the specific heat capacity \(c\) of this Landau-Ginzburg system in the supercritical \(T>T_c\) regime.

Solution #\(12\): Simply use the formula:

\[c=\frac{k_B}{V}\beta^2\frac{\partial^2\ln Z}{\partial\beta^2}\]

with cumulant generating function (ignoring the temperature-independent parts):

\[\ln Z\sim-\frac{1}{2}\sum_{\textbf k}\ln\beta(\mu^2+\gamma|\textbf k|^2)\]

So, noting that \(\partial\mu^2/\partial\beta=-1/\beta^2\):

\[c=-\frac{k_B}{2}\beta^2\frac{\partial^2}{\partial\beta^2}\int\frac{d^d\textbf k}{(2\pi)^d}\ln\beta(\mu^2+\gamma|\textbf k|^2)\]

\[=-\frac{k_B}{2}\int\frac{d^d\textbf k}{(2\pi)^d}\beta^2\frac{\partial^2}{\partial\beta^2}\left(\ln\beta+\ln(\mu^2+\gamma|\textbf k|^2)\right)\]

\[=\frac{k_B}{2}\int\frac{d^d\textbf k}{(2\pi)^d}\left(1-\frac{2}{\beta(\mu^2+\gamma |\textbf k|^2)}+\frac{1}{\beta^2(\mu^2+\gamma |\textbf k|^2)^2}\right)\]

where the \(\frac{k_B}{2}\times 1\) part reflects equipartition of quadratic degrees of freedom and is simply due to the \(\beta\) temperature dependence in \(e^{-\beta F}\) (fluctuation-dissipation theorem?). By contrast, the other terms are additional contributions to the heat capacity arising from the \(T\)-dependence in \(F\) instead, specifically in \(\mu^2=k_B(T-T_c)\).

Problem #\(13\): Clearly, the specific heat capacity \(c\) as it’s currently written suffers from a \(|\textbf k|\to\infty\) UV divergence, since the constant (equipartition) part of the integrand does not decay and causes the integral to diverge to \(\infty\). What should one make of this?

Solution #\(13\): To regularize this UV divergence, one has to impose a UV cutoff \(k^*\) with the property that the Fourier transform \(\delta\phi_{\textbf k}\) is only supported for \(|\textbf k|\leq k^*\) in the \(k^*\)-ball. This UV cutoff should be chosen so that \(k^*\sim 1/\Delta x\), with \(\Delta x\) being the lattice spacing of some underlying microscopic structure which has been coarse-grained away. By implementing a UV cutoff, the specific heat capacity \(c\) is made finite:

\[c=\frac{k_B}{2(2\pi)^d}|S^{d-1}|\left(\frac{(k^*)^d}{d}-\frac{2}{\beta}\int_0^{k^*}dk\frac{k^{d-1}}{\mu^2+\gamma k^2}+\frac{1}{\beta^2}\int_0^{k^*}dk\frac{k^{d-1}}{(\mu^2+\gamma k^2)^2}\right)\]

where the (hyper)surface area of the unit \(d-1\)-sphere is \(|S^{d-1}|=2\pi^{d/2}/\Gamma(d/2)\).

(aside: to prove this, notice that the \(d\)-dimensional isotropic Gaussian integral \(\int_{\textbf x\in\textbf R^d} d^d\textbf x e^{-|\textbf x|^2}\) evaluates to \(\pi^{d/2}\) when separated in Cartesian coordinates, or equivalently \(|S^{d-1}|\int_0^{\infty}d|\textbf x||\textbf x|^{d-1}e^{-|\textbf x|^2}\) in spherical coordinates. The latter integral can then be massaged into the form of a gamma function \(\Gamma(d/2)/2\) via the substitution \(z:=|\textbf x|^2\)).
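Before tabulating anything, the two integrals are easy to evaluate numerically. Here is a minimal Python sketch (illustrative units \(k_B=\beta=\gamma=1\) and an assumed cutoff \(k^*=10\); nothing here is canonical) implementing the regularized formula for \(c\), which also exhibits the \(c\sim\mu^{d-4}\) divergence derived in the next problem for \(d=3\):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as Gamma

# Regularized specific heat c(mu) in dimension d with UV cutoff kstar
# (illustrative units k_B = beta = gamma = 1)
def c_of_mu(mu, d=3, kstar=10.0, beta=1.0, gam=1.0, kB=1.0):
    S = 2*np.pi**(d/2)/Gamma(d/2)                              # |S^{d-1}|
    I1 = quad(lambda k: k**(d-1)/(mu**2 + gam*k**2), 0, kstar)[0]
    I2 = quad(lambda k: k**(d-1)/(mu**2 + gam*k**2)**2, 0, kstar)[0]
    return kB/(2*(2*np.pi)**d)*S*(kstar**d/d - 2*I1/beta + I2/beta**2)

# As mu -> 0 in d = 3, c diverges like mu^{d-4} = 1/mu, so mu*c tends to a constant:
for mu in (0.1, 0.05, 0.025):
    print(mu, mu*c_of_mu(mu))
```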

For some small dimensions \(d=1,2,3,4,5\), the \(2\) integrals can be evaluated analytically, with results compiled in the table below:

or in hindsight many of these can be written more compactly via the (yet unmotivated) correlation length \(\xi:=\sqrt{\gamma}/\mu\).

Problem #\(14\): Using the results of Problem #\(13\), comment on how \(c\) behaves in various dimensions \(d\) as one approaches the critical point \(T\to T_c^+\) from above (since the results above were all computed within the supercritical \(T>T_c\) regime, though the analysis is the same in the subcritical \(T<T_c\) regime).

Solution #\(14\): As \(T\to T_c^+\), the quadratic coupling constant \(\mu^2\to 0\) vanishes. For \(d\geq 5\), both of the integrals above converge to a finite value determined by the choice of UV cutoff wavenumber \(k^*\), so \(c\) is also finite (more precisely \(c\sim(k^*)^{d}\) from the equipartition term). For \(d=4,3\), only the second integral diverges while the first one converges, whereas for \(d=2,1\), both integrals diverge. For \(d\leq 3\), this divergence goes like \(c\sim\mu^{d-4}\) whereas for \(d=4\) it is a logarithmic divergence \(c\sim\ln\mu\) (both of these being determined by the second integral).

In the case \(d\leq 3\), since \(\mu^2\sim T-T_c\) as \(T\to T_c^+\), this implies that \(c\sim (T-T_c)^{(d-4)/2}\) but in general the critical exponent \(\alpha\) is defined by the property that \(c\sim |T-T_c|^{-\alpha}\). Although this analysis was only for \(T>T_c\), one can check that for \(T<T_c\) one would have mirror behavior, so this analysis shows that, at least for \(d\leq 3\), the critical exponent is \(\alpha=2-d/2\). Thus, the contribution of fluctuations causes the critical exponent to differ from the mean-field prediction \(\alpha=0\).

Problem #\(15\): At each \(\textbf x\in\textbf R^d\), one can associate a continuous random variable \(\phi(\textbf x)\) which draws an order parameter configuration \(\phi\) from a thermal Boltzmann distribution \(p[\phi]=e^{-\beta F[\phi]}/Z\) and returns the evaluation of \(\phi\) at \(\textbf x\). Given two arbitrary positions \(\textbf x,\textbf x’\in\textbf R^d\), each giving rise to its own random variable \(\phi(\textbf x),\phi(\textbf x’)\), define the connected \(2\)-point correlation propagator \(\langle\delta\phi(\textbf x)\delta\phi(\textbf x’)\rangle\) between \(\textbf x,\textbf x’\).

Solution #\(15\): The connected \(2\)-point correlation propagator is simply the cross-covariance of the random variables \(\phi(\textbf x),\phi(\textbf x’)\). It is thus related to the cross-correlation \(\langle\phi(\textbf x)\phi(\textbf x’)\rangle\) between \(\textbf x,\textbf x’\) and their individual expected values \(\langle\phi(\textbf x)\rangle,\langle\phi(\textbf x’)\rangle\) by the usual parallel axis theorem (sort of…):

\[\langle\delta\phi(\textbf x)\delta\phi(\textbf x’)\rangle=\langle\phi(\textbf x)\phi(\textbf x’)\rangle-\langle\phi(\textbf x)\rangle\langle\phi(\textbf x’)\rangle\]

where, just to flesh it out explicitly, these thermal Boltzmann canonical ensemble averages look like configuration space functional integrals:

\[\langle\phi(\textbf x)\rangle=\frac{1}{Z}\int\mathcal D\phi\,\phi(\textbf x)e^{-\beta F[\phi]}\]

(remember that evaluation at \(\textbf x\) is a functional \(\phi(\textbf x)=\text{eval}_{\textbf x}[\phi]\)).

Problem #\(16\): Define the functional derivative of a functional. Evaluate the following functional derivatives:

\[\frac{\delta}{\delta f(\textbf x)}\int d^d\textbf x’\cos f(\textbf x’)\]

\[\frac{\delta}{\delta f(\textbf x)}\int d^d\textbf x’ d^d\textbf x^{\prime\prime}\frac{f(\textbf x’)f(\textbf x^{\prime\prime})}{|\textbf x’-\textbf {x}^{\prime\prime}|}\]

\[\frac{\delta}{\delta f(\textbf x)}\exp{\int d^d\textbf x’\left(\frac{1}{2}f(\textbf x’)^2+\frac{1}{2}\biggr|\frac{\partial f}{\partial\textbf x’}\biggr|^2\right)}\]

(comment: the \(1\)st and \(3\)rd functionals are local whereas the \(2\)nd is nonlocal).

Solution #\(16\): Functions \(f(\textbf x)\) are like \(\infty\)-dimensional vectors whose components are indexed not by a discrete label \(i\) but rather by a continuous label \(\textbf x\in\textbf R^d\); one may as well write \(f_{\textbf x}\) or \(\langle\textbf x|f\rangle\) rather than \(f(\textbf x)\) to stress this point. The functional derivative is then conceptually no different from a partial derivative with respect to one of these infinitely many components \(f(\textbf x)\) of \(f\) and just ends up returning a vanilla function of \(\textbf x\).

Just as for a function \(F(f_1,f_2,…)\) of some real variables \(f_1,f_2,…\in\textbf R\) one has the total differential:

\[dF=\sum_i\frac{\partial F}{\partial f_i}df_i\]

so, simply replacing the sum by an integral \(\sum_i\mapsto\int d^d\textbf x\), the functional derivative of a functional \(F[f]\) with respect to the variable \(f(\textbf x)\) is defined by requiring:

\[\delta F=\int d^d\textbf x\frac{\delta F}{\delta f(\textbf x)}\delta f(\textbf x)\]

where typically one takes \(\delta f(\textbf x)=0\) for \(\textbf x\) on the boundary of the domain of integration in \(\textbf R^d\).

Notice that functional derivatives are much easier to compute than functional integrals, reflecting the general trend from single-variable calculus that differentiation is easier than integration. In particular, for functionals which are either directly (as in the \(1\)st and \(2\)nd examples) or indirectly (as in the \(3\)rd example) related to some integral of the function, functional differentiation just boils down to partial differentiation of the integrand. Keeping in mind that \(\delta f(\textbf x)=0\) for \(\textbf x\) on the boundary of the integration domain, the key identity in this regard, generalizing the Euler-Lagrange equations, is:

\[\frac{\delta}{\delta f(\textbf x)}\int d^d\textbf x\mathcal F\left(f,\frac{\partial f}{\partial\textbf x},\frac{\partial^2 f}{\partial\textbf x^2},…\right)=\frac{\partial\mathcal F}{\partial f}-\frac{\partial}{\partial\textbf x}\cdot\frac{\partial\mathcal F}{\partial(\partial f/\partial \textbf x)}+\frac{\partial^2}{\partial\textbf x^2}\cdot\frac{\partial\mathcal F}{\partial(\partial^2 f/\partial \textbf x^2)}-\cdots\]
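As a concrete check of the \(1\)st functional derivative asked for above (which this identity gives as \(-\sin f(\textbf x)\)), here is a minimal Python sketch that discretizes \(F[f]=\int dx\cos f(x)\) on a \(1\)D grid and compares a finite-difference gradient, divided by the measure \(dx\) (cf. Problem #\(18\) below), against \(-\sin f(x)\); the grid and test function are illustrative assumptions:

```python
import numpy as np

# 1-D grid and an arbitrary smooth test function (illustrative assumptions)
N, Lbox = 200, 10.0
dx = Lbox/N
x = np.arange(N)*dx
f = np.exp(-(x - Lbox/2)**2)

F = lambda g: np.sum(np.cos(g))*dx        # discretized functional F[f]

# delta F / delta f(x_i) ~ (partial F / partial f_i) / dx  (the measure factor)
eps = 1e-6
grad = np.array([(F(f + eps*np.eye(N)[i]) - F(f - eps*np.eye(N)[i]))/(2*eps)
                 for i in range(N)])/dx
print(np.max(np.abs(grad - (-np.sin(f)))))   # small (~1e-8): matches -sin f(x)
```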

Problem #\(17\): Another functional differentiation problem for fun. For any function \(f(\textbf x)\), the “orthonormality condition”:

\[\frac{\delta f(\textbf x’)}{\delta f(\textbf x)}=\delta^d(\textbf x-\textbf x’)\]

seems intuitively clear (cf. \(\langle\textbf x’|\textbf x\rangle=\delta^3(\textbf x-\textbf x’)\) in nonrelativistic quantum mechanics), but how does one go about “rigorously” interpreting and proving it?

Solution #\(17\): The idea, mentioned before, is to view \(f(\textbf x’)=\text{eval}_{\textbf x’}[f]\) as an evaluation functional so that it really can be interpreted as a functional derivative. Then, express the evaluation functional in the integral form that one is most comfortable with by convolving with a delta:

\[f(\textbf x’)=\int d^d\textbf x f(\textbf x)\delta^d(\textbf x-\textbf x’)\]

So the result follows from the considerations in Solution #\(16\).

Problem #\(18\): In general, what are the dimensions of the functional derivative \(\frac{\delta F}{\delta f}\)? Look back at both Problems #\(16,17\) and check this.

Solution #\(18\): Although one is used to intuitively reading off dimensions from regular derivatives like \([\partial f/\partial x]=[f]/[x]\), for functional derivatives the “densitized” nature of the continuum \(d^d\textbf x\) means that actually:

\[\biggr[\frac{\delta F}{\delta f(\textbf x)}\biggr]=\frac{[F]}{[f][\textbf x]^d}\]

which in particular is not equal to the naive \([F]/[f]\) (unless \(d=0\) which is boring). This is consistent with the functional derivatives in both Solutions #\(16,17\).

Problem #\(19\): Show that by “applying an external magnetic field” \(E_{\text{ext}}(\textbf x)\) to the system, one has for the original exact Landau-Ginzburg free energy bifunctional \(F[\phi,E_{\text{ext}}]\) (i.e. not the quadratic approximation \(F=F[\delta\phi]\)) the functional derivatives:

\[\langle\phi(\textbf x)\rangle=\frac{\delta\ln Z}{\delta\beta E_{\text{ext}}(\textbf x)}\]

\[\langle\delta\phi(\textbf x)\delta\phi(\textbf x’)\rangle=\frac{\delta^2\ln Z}{\delta\beta E_{\text{ext}}(\textbf x’)\delta\beta E_{\text{ext}}(\textbf x)}\]

Solution #\(19\): The Zeeman coupling is linear, so mathematically it is a chemist’s Legendre transform of the free energy density \(\mathcal F\) from \(\phi\to E_{\text{ext}}\):

\[\mathcal F=-E_{\text{ext}}\phi+\frac{\mu^2}{2}\phi^2+\frac{\gamma}{2}\biggr|\frac{\partial\phi}{\partial\textbf x}\biggr|^2+…\]

Proving the \(1\)st identity concerning \(\langle\phi(\textbf x)\rangle\) is easy, and it can be directly substituted into the \(2\)nd identity to give:

\[\langle\delta\phi(\textbf x)\delta\phi(\textbf x’)\rangle=\frac{\delta\langle\phi(\textbf x)\rangle}{\delta\beta E_{\text{ext}}(\textbf x’)}\]

where at the end one can also turn off \(E_{\text{ext}}(\textbf x)=0\) to get the ensemble average and \(2\)-point correlator in the original \(\textbf Z_2\) theory.

Problem #\(20\): Using the results of Solution #\(19\), show that, reverting back to the quadratic approximation of \(\mathcal F\):

\[\langle\phi(\textbf x)\rangle=(E_{\text{ext}}*G)(\textbf x)=\int d^d\textbf x’E_{\text{ext}}(\textbf x’)G(\textbf x-\textbf x’)\]

\[\langle\delta\phi(\textbf x)\delta\phi(\textbf x’)\rangle=\beta^{-1}G(\textbf x-\textbf x’)\]

where \(G(\textbf x)\), being the fundamental Green’s function of the Helmholtz-like operator \(\mu^2-\gamma\biggr|\frac{\partial}{\partial\textbf x}\biggr|^2\) on \(\textbf R^d\), is almost a stationary point of the quadratic free energy \(F\):

\[G(\textbf x)=\int\frac{d^d\textbf k}{(2\pi)^d}\frac{e^{-i\textbf k\cdot\textbf x}}{\mu^2+\gamma|\textbf k|^2}=\frac{1}{\gamma}\int\frac{d^d\textbf k}{(2\pi)^d}\frac{e^{-i\textbf k\cdot\textbf x}}{|\textbf k|^2+1/\xi^2}\]

Solution #\(20\): Essentially the same as Solution #\(11\) except that one first needs to complete the square in the free energy (which uses the reality of the magnetic field \(E_{\text{ext}}(\textbf x)\in\textbf R\Leftrightarrow E_{\text{ext}}^{-\textbf k}=\left(E_{\text{ext}}^{\textbf k}\right)^{\dagger}\)):

\[F=\int\frac{d^d\textbf k}{(2\pi)^d}\left(\frac{\mu^2+\gamma |\textbf k|^2}{2}\biggr|\phi_{\textbf k}-\frac{E_{\text{ext}}^{\textbf k}}{\mu^2+\gamma |\textbf k|^2}\biggr|^2-\frac{|E_{\text{ext}}^{\textbf k}|^2}{2(\mu^2+\gamma |\textbf k|^2)}\right)\]

and then do the Gaussian path integral to get the original partition function (when \(E_{\text{ext}}=0\)) with an additional Plancherelian contribution from \(E_{\text{ext}}\neq 0\):

\[\ln Z=\frac{1}{2}\sum_{\textbf k}\ln\frac{\pi V}{\beta(\mu^2+\gamma |\textbf k|^2)}+\frac{\beta}{2}\int\frac{d^d\textbf k}{(2\pi)^d}\frac{|E_{\text{ext}}^{\textbf k}|^2}{\mu^2+\gamma |\textbf k|^2}\]

Or, substituting \(E_{\text{ext}}^{\textbf k}=\int d^d\textbf x E_{\text{ext}}(\textbf x)e^{-i\textbf k\cdot\textbf x}\) to revert from \(\textbf k\mapsto\textbf x\) (in anticipation that one would like to take functional derivatives with respect to \(E_{\text{ext}}(\textbf x)\)):

\[\ln Z=\frac{1}{2}\sum_{\textbf k}\ln\frac{\pi V}{\beta(\mu^2+\gamma |\textbf k|^2)}+\frac{\beta}{2}\int d^d\textbf x d^d\textbf x’ E_{\text{ext}}(\textbf x)E_{\text{ext}}(\textbf x’)G(\textbf x-\textbf x’)\]

From here, the functional derivatives are straightforward to take (and only the \(2\)nd term above matters), reproducing the claimed results.

(aside: it seems that \(E_{\text{ext}}(\textbf x)\) can be interpreted as some kind of source distribution which, upon convolution with the \(2\)-point correlator, yields a mean field \(\langle\phi(\textbf x)\rangle\) solving the inhomogeneous Helmholtz equation:

\[\left(\mu^2-\gamma\biggr|\frac{\partial}{\partial\textbf x}\biggr|^2\right)\langle\phi(\textbf x)\rangle=E_{\text{ext}}(\textbf x)\]

is there an intuitive interpretation of this?)

Problem #\(21\): Check explicitly that:

\[\left(\mu^2-\gamma\biggr|\frac{\partial}{\partial\textbf x}\biggr|^2\right)G(\textbf x)=\delta^d(\textbf x)\]

Solution #\(21\):
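In lieu of the continuum computation, here is a minimal lattice sketch (assumed lattice spacing \(1\), illustrative couplings): build \(G\) from its Fourier representation using the exact Fourier symbol \(2-2\cos k\) per axis of the finite-difference Laplacian, then check that applying \(\mu^2-\gamma\nabla^2\) returns a discrete delta function:

```python
import numpy as np

# Periodic 3-D lattice Green's function (lattice spacing 1, assumed parameters)
N, mu2, gam = 32, 0.5, 1.0
k = 2*np.pi*np.fft.fftfreq(N)
KX, KY, KZ = np.meshgrid(k, k, k, indexing="ij")
k2 = (2 - 2*np.cos(KX)) + (2 - 2*np.cos(KY)) + (2 - 2*np.cos(KZ))

G = np.fft.ifftn(1.0/(mu2 + gam*k2)).real

# apply (mu^2 - gamma ∇^2) with the finite-difference Laplacian
lap = sum(np.roll(G, +1, a) + np.roll(G, -1, a) - 2*G for a in range(3))
lhs = mu2*G - gam*lap
delta = np.zeros_like(G); delta[0, 0, 0] = 1.0
print(np.max(np.abs(lhs - delta)))            # ~1e-16: G is the Green's function
```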

Problem #\(22\): Show that the isotropic \(2\)-point correlator has the Ornstein-Zernike asymptotics:

\[\beta\langle\delta\phi(\textbf x)\delta\phi(\textbf x’)\rangle\sim\begin{cases}2^{-3d/2}\pi^{(d-1)/2}e^{-d/8}d^{(d-4)/2}\gamma^{-1}\frac{1}{r^{d-2}}, & r\ll\xi\\ 2^{-(d+3)/2}\pi^{(1-d)/2}\gamma^{-1}\frac{e^{-r/\xi}}{\xi^{(d-3)/2}r^{(d-1)/2}}, & r\gg\xi
\end{cases}\]

where \(r:=|\textbf x-\textbf x’|\); the identity \(\frac{d-3}{2}+\frac{d-1}{2}=d-2\) ensures dimensional consistency and is a good way to remember it (and indeed the \(d-2\) follows on dimensional analysis grounds).

Solution #\(22\): The derivation involves a clever saddle-point approximation:

Problem #\(23\): What are the critical exponents \(\nu,\eta\) in the mean-field, quadratic approximation to the full Landau-Ginzburg theory?

Solution #\(23\): The critical point is defined to be where the quadratic coupling \(\mu^2=0\) vanishes. So to get any kind of critical exponent, this always just means: write down a formula for the quantity of interest in terms of \(\mu\sim |T-T_c|^{1/2}\). For the correlation length, this is trivial:

\[\xi=\frac{\sqrt{\gamma}}{\mu}\sim\mu^{-1}\sim|T-T_c|^{-1/2}=:|T-T_c|^{-\nu}\]

so \(\nu=1/2\). In particular, as \(T\to T_c\), the correlation length diverges \(\xi\to\infty\). This also means that at the critical point, the only relevant regime in the Ornstein-Zernike \(2\)-point correlator is \(r\ll\xi\to\infty\), so:

\[\langle\delta\phi(\textbf x)\delta\phi(\textbf x’)\rangle\sim\frac{1}{r^{d-2}}:=\frac{1}{r^{d-2+\eta}}\]

So \(\eta=0\).

Problem #\(24\): Using the Ginzburg criterion, rationalize why the upper critical dimension \(d_c=4\) for the Ising model.

Solution #\(24\): Conceptually, the Ginzburg criterion is a common sense idea. It says that mean-field theory has sensible things to say about the critical point \(T\to T_c\) iff:

\[\text{fluctuations about mean field}\ll\text{mean field itself}\]

or, mathematically (integrating only within a \(\xi\)-ball due to the exponentially-decaying Ornstein-Zernike correlation for \(|\textbf x|>\xi\)):

\[\int_{|\textbf x|\leq\xi}d^d\textbf x\langle\delta\phi(\textbf x)\delta\phi(\textbf 0)\rangle\ll\int_{|\textbf x|\leq\xi}d^d\textbf x\phi_0^2\]

\[\frac{\xi^2}{\beta\gamma}\ll\xi^d\phi_0^2\]

\[\frac{\xi^{2-d}}{\phi_0^2}\ll\beta\gamma\]

\[\frac{|T-T_c|^{\nu(d-2)}}{|T-T_c|^{2\beta}}\ll\beta\gamma\]

But at the critical point \(\beta\gamma\to\beta_c\gamma_c\) which is just a finite number… so in order for the left side of the inequality to also remain bounded as \(T\to T_c\), require:

\[\nu(d-2)\geq 2\beta\]

Using the mean-field exponents \(\beta=\nu=1/2\), this amounts to constraining \(d\geq 4=d_c\). This may seem a bit sketchy given that the mean-field exponents for \(\beta,\nu\) were themselves used in the Ginzburg criterion to show that mean-field theory is consistent; rather, it just means that for \(d\geq 4\) MFT is self-consistent. On the other hand, for \(d<4\), MFT literally predicts its own demise!

Posted in Blog | Leave a comment

The Dirac Equation

Problem #\(1\): Define the Poincaré group.

Solution #\(1\): In words, the Poincaré group is the isometry group of Minkowski spacetime \(\textbf R^{1,3}\). Mathematically, it is the semidirect product \(\textbf R^{1,3}⋊O(1,3)\) of the normal subgroup \(\textbf R^{1,3}\) of spacetime translations with the Lorentz subgroup \(O(1,3)\) of rotations, Lorentz boosts, parity, and time reversal.

The reason the Poincaré group is a semidirect product rather than simply a direct product \(\textbf R^{1,3}\times O(1,3)\) is that spacetime translations and Lorentz transformations talk to each other via the Poincaré group’s composition rule:

\[(\Delta X_2,\Lambda_2)\cdot(\Delta X_1,\Lambda_1):=(\Delta X_2+\Lambda_2\Delta X_1,\Lambda_2\Lambda_1)\]
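As a quick sanity check of this composition rule, the following Python sketch verifies that it reproduces the composition of the affine actions \(X\mapsto\Delta X+\Lambda X\) (the inputs are random matrices, which suffice to test the group law itself):

```python
import numpy as np

rng = np.random.default_rng(0)
L1, L2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
dX1, dX2, X = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)

def act(dX, L, X):                 # the affine Poincare action on spacetime
    return dX + L @ X

# (dX2, L2).(dX1, L1) := (dX2 + L2 dX1, L2 L1) should act as "first 1, then 2"
dX12, L12 = dX2 + L2 @ dX1, L2 @ L1
print(np.allclose(act(dX12, L12, X), act(dX2, L2, act(dX1, L1, X))))  # True
```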

Problem #\(2\): With the exception of parity and time reversal, all symmetries of the Poincaré group can be implemented continuously:

where the green box of proper, orthochronous Poincaré transformations is the connected component of the identity \(1\) and the only one of the \(4\) connected components which by itself comprises a subgroup of the Poincaré group. Among the \(\binom{4}{2}=6\) pairs of connected components one can join together, only the \(3\) pairs containing the identity component form subgroups, e.g. the \(2\) pairs:

\[\textbf R^{1,3}⋊SO(1,3)=\textbf R^{1,3}⋊SO^+(1,3)\cup\textbf R^{1,3}⋊SO^-(1,3)\]

and

\[\textbf R^{1,3}⋊O^+(1,3)=\textbf R^{1,3}⋊SO^+(1,3)\cup\textbf R^{1,3}⋊\not{S}O^+(1,3)\]

(the third such subgroup instead adjoins the time-reversal component \(\Theta SO^+(1,3)\)). Anyways, this digression is just to say that, modulo the Klein \(4\)-group quotient structure \(\{1,\Pi,\Theta,\Pi\Theta\}\), the proper, orthochronous Poincaré group \(\textbf R^{1,3}⋊SO^+(1,3)\) is a Lie group and hence can be studied via its Lie algebra. Classify fully the structure of this so-called Poincaré algebra.

Solution #\(2\): Consider spacetime translations \(\Delta X\in\textbf R^{1,3}\) first. It is clear that:

\[e^{-\Delta X\cdot\frac{\partial}{\partial X}}X=X-\Delta X\]

where \(\Delta X\cdot\frac{\partial}{\partial X}=\Delta X^{\mu}\partial_{\mu}\) is a Lorentz scalar. Although there’s nothing quantum mechanical about what is being done here, for sake of comparison one can artificially introduce \(\hbar\) and define the generator of spacetime translations \(P:=-i\hbar\frac{\partial}{\partial X}\Leftrightarrow P_{\mu}=-i\hbar\partial_{\mu}\) so that a general spacetime translation would be given by the infinite-dimensional unitary representation \(e^{-i\Delta X\cdot P/\hbar}\). Since spacetime translations are abelian, the algebra is simply characterized by \([P_{\mu},P_{\nu}]=0\).

Considering proper, orthochronous Lorentz transformations \(\Lambda\in SO^+(1,3)\) next, recall that membership in \(O(1,3)\) means that:

\[\Lambda^T\eta\Lambda=\eta\Leftrightarrow(\Lambda^T)_{\mu}^{\space\space\nu}\eta_{\nu\rho}\Lambda^{\rho}_{\space\space\sigma}=\eta_{\mu\sigma}\]

So if one substitutes \(\Lambda=1+\omega\Leftrightarrow \Lambda^{\mu}_{\space\space\nu}=\delta^{\mu}_{\space\space\nu}+\omega^{\mu}_{\space\space\nu}\) for infinitesimal \(\omega\), one finds that any such generator \(\omega\in\frak{so}^+\)\((1,3)\) must be antisymmetric:

\[\omega^T\eta=-\eta\omega\Leftrightarrow\omega_{\mu\nu}=-\omega_{\nu\mu}\]

which is unsurprising considering \(SO^+(1,3)\) is similar to \(SO(4)\). Thus, one can adapt the obvious \(\frak{so}\)\((4)\) basis of \(6\) antisymmetric generators to the Lorentz Lie algebra \(\frak{so}\)\(^+(1,3)\). There are various notations with which one can express these \(6\) generators:

More Intuitive Notation: One has \(3\) generators \(\textbf J:=(J_1,J_2,J_3)\) for rotations and \(3\) generators \(\textbf K:=(K_1,K_2,K_3)\) for Lorentz boosts such that by definition a rotation through angular displacement \(\Delta\boldsymbol{\phi}\) is given by \(e^{-i\Delta\boldsymbol{\phi}\cdot\textbf J/\hbar}\) and a Lorentz boost through rapidity \(\Delta\boldsymbol{\varphi}\) is given by \(e^{-i\Delta\boldsymbol{\varphi}\cdot\textbf K/\hbar}\). One can kind of “cheat” using one’s prior knowledge about how macroscopic Lorentz transformation matrices \(\Lambda\) look in the case of rotations or Lorentz boosts about various axes, “infinitesimalize” them, and extract the corresponding generator. For example, for a rotation about the \(z\)-axis, as \(\Delta\phi\to 0\) one expects to \(\mathcal O(\Delta\phi)\):

\[\begin{pmatrix}1&0&0&0\\0&\cos\Delta\phi&-\sin\Delta\phi&0\\0&\sin\Delta\phi&\cos\Delta\phi&0\\0&0&0&1\end{pmatrix}\approx\begin{pmatrix}1&0&0&0\\0&1&-\Delta\phi&0\\0&\Delta\phi&1&0\\0&0&0&1\end{pmatrix}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}-\frac{i\Delta\phi J_3}{\hbar}\]

from which one obtains:

\[J_3=i\hbar\begin{pmatrix}0&0&0&0\\0&0&-1&0\\0&1&0&0\\0&0&0&0\end{pmatrix}\]

and similarly for \(J_1,J_2\). Meanwhile, repeating the procedure for an infinitesimal Lorentz boost by \(\Delta\varphi\to 0\) along the \(x\)-axis:

\[\begin{pmatrix}\cosh\Delta\varphi&-\sinh\Delta\varphi&0&0\\-\sinh\Delta\varphi&\cosh\Delta\varphi&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}\approx \begin{pmatrix}1&-\Delta\varphi&0&0\\-\Delta\varphi&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}-\frac{i\Delta\varphi K_1}{\hbar}\]

So that the generator of Lorentz boosts along the \(x\)-axis is:

\[K_1=i\hbar\begin{pmatrix}0&-1&0&0\\-1&0&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix}\]

and similarly for \(K_2,K_3\). In light of these matrix representations for the Lorentz Lie algebra generators, one can directly compute their commutation relations:

\[[J_i,J_j]=i\hbar\varepsilon_{ijk}J_k\]

\[[J_i,K_j]=i\hbar\varepsilon_{ijk}K_k\]

\[[K_i,K_j]=-i\hbar\varepsilon_{ijk}J_k\]

where the last commutator expresses (via the BCH formula) the counterintuitive phenomenon of Wigner rotation associated with non-collinear \(i\neq j\) Lorentz boosts.
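These commutation relations are straightforward to verify numerically from the explicit \(4\times 4\) matrices above; here is a minimal Python sketch in natural units \(\hbar=1\):

```python
import numpy as np

eps = np.zeros((3, 3, 3))                       # Levi-Civita symbol
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[j, i, k] = 1, -1

J = np.zeros((3, 4, 4), dtype=complex)          # rotation generators
K = np.zeros((3, 4, 4), dtype=complex)          # boost generators
for a in range(3):
    for b in range(3):
        for c in range(3):
            J[a, 1 + b, 1 + c] = -1j*eps[a, b, c]   # (J_a)_{bc} = -i eps_{abc}
    K[a, 0, 1 + a] = K[a, 1 + a, 0] = -1j           # boosts mix t with x_a

comm = lambda A, B: A @ B - B @ A
for i in range(3):
    for j in range(3):
        JJ = sum(eps[i, j, k]*J[k] for k in range(3))
        KK = sum(eps[i, j, k]*K[k] for k in range(3))
        assert np.allclose(comm(J[i], J[j]), 1j*JJ)
        assert np.allclose(comm(J[i], K[j]), 1j*KK)
        assert np.allclose(comm(K[i], K[j]), -1j*JJ)
print("so(1,3) commutation relations verified")
```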

Less Intuitive Notation: The cross product on \(\textbf R^3\) realizes a representation of the rotation algebra \(\frak{so}\)\((3)\) given by assigning each antisymmetric tensor \(\omega\in\frak{so}\)\((3)\) to the unique vector \(\boldsymbol{\omega}\in\textbf R^3\) such that \(\omega\textbf x=\boldsymbol{\omega}\times\textbf x\) for all \(\textbf x\in\textbf R^3\). More explicitly:

\[\omega=\begin{pmatrix}0&-\omega_3&\omega_2\\\omega_3&0&-\omega_1\\-\omega_2&\omega_1&0\end{pmatrix}\]

\[\boldsymbol{\omega}=\begin{pmatrix}\omega_1\\\omega_2\\\omega_3\end{pmatrix}\]

Or in indices, \(\omega_{ij}=-\varepsilon_{ijk}\omega_k\) or, inverting by contraction with a suitable Levi-Civita symbol, one has the equivalent form \(\omega_k=-\frac{1}{2}\varepsilon_{ijk}\omega_{ij}\). Here, the vector \(\boldsymbol{\omega}\) is arguably more intuitive than the tensor \(\omega\) even though both contain exactly the same information.

In precisely this same spirit, one can take the more intuitive “vector” generators \(\textbf J,\textbf K\) for rotations and Lorentz boosts and convert them into the equivalent but less intuitive form of a \(4\times 4\) matrix, each of whose \(16\) elements is itself a \(4\times 4\) matrix (one could even think of this as a \(16\times 16\) matrix if one so desired). That is, it is typical to define:

\[(\mathcal M^{\rho\sigma})^{\mu\nu}=\eta^{\rho\mu}\eta^{\sigma\nu}-\eta^{\sigma\mu}\eta^{\rho\nu}\]

where \(0\leq \mu,\nu,\rho,\sigma\leq 3\) are all spacetime indices (this is partly the reason for introducing this less intuitive representation of the Lorentz algebra because it puts space and time on more equal footing in this compact expression). This is related to the more intuitive representation above by:

\[i\hbar\mathcal M=\begin{pmatrix}0&-K_1&-K_2&-K_3\\K_1&0&J_3&-J_2\\K_2&-J_3&0&J_1\\K_3&J_2&-J_1&0\end{pmatrix}=\begin{pmatrix}0&-\textbf K^T\\\textbf K& -J\end{pmatrix}\]

where \(J=\textbf J\times\) as in the example with \(\omega=\boldsymbol{\omega}\times\) earlier, or in indices:

\[i\hbar\mathcal M^{ij}=\varepsilon_{ijk}J_k\Leftrightarrow J_i=\frac{i\hbar}{2}\varepsilon_{ijk}\mathcal M^{jk}\]

\[K_i=i\hbar\mathcal M^{i0}\]

One can then compute the Lorentz algebra in a single sweep:

\[[\mathcal M^{\rho\sigma},\mathcal M^{\tau\upsilon}]=\eta^{\sigma\tau}\mathcal M^{\rho\upsilon}-\eta^{\sigma\upsilon}\mathcal M^{\rho\tau}-\eta^{\rho\tau}\mathcal M^{\sigma\upsilon}+\eta^{\rho\upsilon}\mathcal M^{\sigma\tau}\]
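Here is a minimal Python sketch verifying this commutator directly from the mixed-index matrix form \((\mathcal M^{\rho\sigma})^{\mu}_{\space\space\nu}\) written out just below:

```python
import numpy as np

eta = np.diag([1.0, -1, -1, -1])
I4 = np.eye(4)

def M(r, s):    # (M^{rs})^mu_nu = eta^{r mu} delta^s_nu - eta^{s mu} delta^r_nu
    return np.outer(eta[r], I4[s]) - np.outer(eta[s], I4[r])

for r in range(4):
    for s in range(4):
        for t in range(4):
            for u in range(4):
                lhs = M(r, s) @ M(t, u) - M(t, u) @ M(r, s)
                rhs = (eta[s, t]*M(r, u) - eta[s, u]*M(r, t)
                       - eta[r, t]*M(s, u) + eta[r, u]*M(s, t))
                assert np.allclose(lhs, rhs)
print("[M, M] commutators verified")
```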

Lowering the \(\nu\) index with the metric (it’s more useful to have it like this since Lorentz transformations have a conventionally NW-SE index structure \(\Lambda^{\mu}_{\space\space\nu}\) and one is anticipating generating these by exponentiating linear combinations of these generators):

\[(\mathcal M^{\rho\sigma})^{\mu}_{\space\space\nu}=\eta^{\rho\mu}\delta^{\sigma}_{\space\space\nu}-\eta^{\sigma\mu}\delta^{\rho}_{\space\space\nu}\]

so that an arbitrary Lorentz algebra element \(\omega\in\frak{so}^+\)\((1,3)\) (not necessarily infinitesimal anymore) is some linear combination of the \(6\) generators \(\mathcal M^{\rho\sigma}\in\frak{so}^+\)\((1,3)\) with real coefficients \(\Delta\Phi_{\rho\sigma}\in\textbf R\) quantifying the extent of a rotation or Lorentz boost (they are related to the earlier more intuitive representation by \(\Delta\phi_i=\frac{1}{2}\varepsilon_{ijk}\Delta\Phi_{jk}\) and \(\Delta\varphi_i=\Delta\Phi_{i0}\)):

\[\omega=\frac{1}{2}\Delta\Phi_{\rho\sigma}\mathcal M^{\rho\sigma}\Leftrightarrow\omega^{\mu}_{\space\space\nu}=\Delta\Phi^{\mu}_{\space\space\nu}\]

where the factor of \(1/2\) is to compensate for double-counting (since both \(\Delta\Phi_{\sigma\rho}=-\Delta\Phi_{\rho\sigma}\) and \(\mathcal M^{\sigma\rho}=-\mathcal M^{\rho\sigma}\) are antisymmetric in their indices \(\rho,\sigma\)) (explicitly write out \(\Phi\) to have \(\textbf J,\textbf K\) entries inside).

Finally, spacetime translations talk with Lorentz transformations (recall the proper, orthochronous Poincaré group is given by a semi-direct product \(\textbf R^{1,3}⋊SO^+(1,3)\)), so to fully specify the Poincaré algebra, one also has to figure out how the generators \(P:=(H/c,\textbf P)\) of \(\textbf R^{1,3}\) talk with the generators \(\textbf J,\textbf K\) of \(\frak{so}^+\)\((1,3)\), not just how they talk within their own Lie subalgebras.

It turns out one can concisely summarize all the remaining commutation relations within the Poincaré algebra in the more intuitive form:

\[[H,\textbf J]=\textbf 0\Leftrightarrow [H,J_i]=0\]

\[[H,\textbf K]=i\hbar c\textbf P\Leftrightarrow [H,K_i]=i\hbar cP_i\]

\[\textbf P\times\textbf J=i\hbar\textbf P\Leftrightarrow [P_i,J_j]=i\hbar\varepsilon_{ijk}P_k\]

\[[\textbf P,\textbf K]_{\otimes}=i\hbar\frac{H}{c}1\Leftrightarrow [P_i,K_j]=i\hbar\frac{H}{c}\delta_{ij}\]

or the less intuitive but more compact/Lorentz invariant form:

\[[\mathcal M^{\rho\sigma},P^{\mu}]=\eta^{\rho\mu}P^{\sigma}-\eta^{\sigma\mu}P^{\rho}\]

Problem #\(3\): Motivate the general definition of a Clifford algebra in mathematics, state the anticommutation relations of the specific Clifford algebra \(\text{Cl}_{1,3}(\textbf R)\), and state the chiral/Weyl representation of \(\text{Cl}_{1,3}(\textbf R)\).

Solution #\(3\): Given a vector space \(V\) over a field \(F\), a quadratic form \(Q:V\to F\) is any function with \(2\) properties:

  1. \(Q(\lambda\textbf v)=\lambda^2Q(\textbf v)\) for all vectors \(\textbf v\in V\) and scalars \(\lambda\in F\).
  2. The function \(\langle\space|\space\rangle:V\times V\to F\) defined by \(\langle\textbf v|\textbf w\rangle:=\frac{1}{2}(Q(\textbf v+\textbf w)-Q(\textbf v)-Q(\textbf w))\) is a bilinear form (called the polarization of \(Q\)).

Note that \(Q\) and \(\langle\space|\space\rangle\) contain exactly the same information, since one can also invert \(Q(\textbf v)=\langle\textbf v|\textbf v\rangle\). A general quadratic form on Euclidean space \(\textbf R^n\) is of the form \(Q(\textbf v):=\textbf v^Tg\textbf v\) (where without loss of generality one can take \(g^T=g\) to be symmetric) whose associated symmetric bilinear form is \(\langle\textbf v|\textbf w\rangle=(\textbf v^Tg\textbf w+\textbf w^Tg\textbf v)/2\). In particular, when \(g=1\) is the standard metric on Euclidean space, then \(\langle\textbf v|\textbf w\rangle=\textbf v\cdot\textbf w\) coincides with the usual dot product.

Given a quadratic space \((V,F,Q)\), one can construct from this the Clifford algebra \(\text{Cl}(V,F,Q)\) by starting with the tensor algebra \(\oplus_{k=0}^{\infty}V^{\otimes k}\) of \(V\) and quotienting it by the relation \(\textbf v^2:=\textbf v\otimes\textbf v=Q(\textbf v)1_V\) for all \(\textbf v\in V\). From this, it follows as a corollary that more generally, for any two vectors \(\textbf v,\textbf w\in V\):

\[\{\textbf v,\textbf w\}=\textbf v\textbf w+\textbf w\textbf v:=\textbf v\otimes\textbf w+\textbf w\otimes\textbf v=2\langle\textbf v|\textbf w\rangle 1_V\]

where in physics contexts the tensor product \(\textbf v\otimes\textbf w\) is often abbreviated to just \(\textbf v\textbf w\) and called the geometric product. Furthermore, since this anticommutator \(\{\textbf v,\textbf w\}\) is bilinear, it may be completely specified by finding a basis \(\textbf e_i\) of \(V\) and simply specifying the value of \(\{\textbf e_i,\textbf e_j\}\) for all pairs of basis vectors \(\textbf e_i,\textbf e_j\). Returning to the example of \(\textbf R^n\) with the standard metric, if one chooses an orthonormal basis \(\hat{\textbf e}_i\cdot\hat{\textbf e}_j=\delta_{ij}\), then the Clifford algebra is simply characterized by \(\{\hat{\textbf e}_i,\hat{\textbf e}_j\}=2\delta_{ij}1\). More explicitly, the \(\hat{\textbf e}_i\) all anticommute with each other and each squares to the identity \(\hat{\textbf e}^2_i=1\).

In \(\textbf R^2\), picking an orthonormal basis \(\hat{\textbf e}_1,\hat{\textbf e}_2\), although a general element of the Clifford algebra \(\text{Cl}_2(\textbf R)\) is formally:

\[a1+(b\hat{\textbf e}_1+c\hat{\textbf e}_2)+(d\hat{\textbf e}_1\hat{\textbf e}_1+e\hat{\textbf e}_1\hat{\textbf e}_2+f\hat{\textbf e}_2\hat{\textbf e}_1+g\hat{\textbf e}_2\hat{\textbf e}_2)+…\]

the anticommutation relations of the Clifford algebra allow one to substantially reduce the number of degrees of freedom (e.g. \(\hat{\textbf e}_1\hat{\textbf e}_1=\hat{\textbf e}_2\hat{\textbf e}_2=1\) and \(\hat{\textbf e}_1\hat{\textbf e}_2=-\hat{\textbf e}_2\hat{\textbf e}_1\)). All higher-order multivectors also reduce to either a scalar, a vector, or a bivector, e.g. \(\hat{\textbf e}_1\hat{\textbf e}_2\hat{\textbf e}_1=-\hat{\textbf e}_1\hat{\textbf e}_1\hat{\textbf e}_2=-\hat{\textbf e}_2\). So in fact, a general element of the Clifford algebra \(\text{Cl}_2(\textbf R)\) is simply spanned by \(4\) generators:

\[a1+b\hat{\textbf e}_1+c\hat{\textbf e}_2+d\hat{\textbf e}_1\hat{\textbf e}_2\]

While the scalars and vectors \(1^2=\hat{\textbf e}_1^2=\hat{\textbf e}_2^2=1\) all square to the identity, the bivector \((\hat{\textbf e}_1\hat{\textbf e}_2)^2=-1\) squares to minus the identity! This is reminiscent of the identity \(i^2=-1\), where \(i=\sqrt{-1}\) is the imaginary unit of \(\textbf C\).

In \(\textbf R^3\), now with an orthonormal basis \(\hat{\textbf e}_1,\hat{\textbf e}_2,\hat{\textbf e}_3\), a general element of the Clifford algebra \(\text{Cl}_3(\textbf R)\) is parameterized by \(8\) real degrees of freedom (extrapolating, it should be clear by Pascal’s triangle that \(\dim\text{Cl}_{n}(\textbf R)=2^n\)):

\[a+b\hat{\textbf e}_1+c\hat{\textbf e}_2+d\hat{\textbf e}_3+e\hat{\textbf e}_1\hat{\textbf e}_2+f\hat{\textbf e}_2\hat{\textbf e}_3+g\hat{\textbf e}_1\hat{\textbf e}_3+h\hat{\textbf e}_1\hat{\textbf e}_2\hat{\textbf e}_3\]

Note that \(\text{Cl}_3(\textbf R)\) can be explicitly represented by the \(3\) Pauli matrices \(\{\sigma_i,\sigma_j\}=2\delta_{ij}1\). In addition, the bivectors \((\hat{\textbf e}_1\hat{\textbf e}_2)^2=(\hat{\textbf e}_2\hat{\textbf e}_3)^2=(\hat{\textbf e}_1\hat{\textbf e}_3)^2=-1\) all square to minus the identity while \(\hat{\textbf e}_1\hat{\textbf e}_2\hat{\textbf e}_2\hat{\textbf e}_3\hat{\textbf e}_1\hat{\textbf e}_3=-1\), reminiscent of the defining relations \(i^2=j^2=k^2=ijk=-1\) of the quaternions \(\textbf H\) (H for “Hamilton”).

In special relativity, Minkowski spacetime \(\textbf R^{1,3}\) is a real vector space that naturally comes with a quadratic form \(Q(X):=X^T\eta X\) induced by the Minkowski metric \(g=\eta=\text{diag}(1,-1,-1,-1)\). Its associated \(16\)-dimensional Clifford algebra \(\text{Cl}_{1,3}(\textbf R)\) is therefore subject to the anticommutation relations:

\[\{\gamma^{\mu},\gamma^{\nu}\}=2\eta^{\mu\nu}1\]

for which one representation (and it turns out the only \(4\)-dimensional irreducible representation up to unitary similarity) is the chiral/Weyl representation:

\[\gamma^{\mu}=\begin{pmatrix}0&\sigma^{\mu}\\\sigma_{\mu}&0\end{pmatrix}\]

where \(\sigma^{\mu}=(1,\boldsymbol{\sigma})\) and \(\sigma_{\mu}=(1,-\boldsymbol{\sigma})\).
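A minimal Python sketch confirming that these \(4\) matrices do satisfy the \(\text{Cl}_{1,3}(\textbf R)\) anticommutation relations:

```python
import numpy as np

s0 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

sigma_up = [s0, sx, sy, sz]                   # sigma^mu = (1, sigma)
sigma_dn = [s0, -sx, -sy, -sz]                # sigma_mu = (1, -sigma)

Z = np.zeros((2, 2), dtype=complex)
gamma = [np.block([[Z, up], [dn, Z]]) for up, dn in zip(sigma_up, sigma_dn)]

eta = np.diag([1.0, -1, -1, -1])
for mu in range(4):
    for nu in range(4):
        anti = gamma[mu] @ gamma[nu] + gamma[nu] @ gamma[mu]
        assert np.allclose(anti, 2*eta[mu, nu]*np.eye(4))
print("Clifford algebra Cl_{1,3}(R) relations verified")
```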

Problem #\(4\):

Solution #\(4\):

Problem #\(5\):

Solution #\(5\):

Problem #\(6\):

Solution #\(6\):

Problem #\(7\): Motivate (from a historical perspective) the classical Lagrangian density \(\mathcal L\) for the Dirac bispinor field \(\psi:\textbf R^{1,3}\to\textbf C^4\), and hence obtain the on-shell Dirac equation. Write down the general solution to the Dirac equation (given that it’s linear!).

Solution #\(7\): Recall that for a non-relativistic free particle, the on-shell dispersion relation \(H=\textbf P^2/2m\) is first order in the energy \(H\), whereas for a relativistic free particle it is second order \(H^2=c^2\textbf P^2+m^2c^4\). The former quantizes to the Schrodinger equation \((\partial_i\partial_i+2ik_c\partial_0)\psi=0\), while the latter quantizes to the Klein-Gordon equation \((\partial^{\mu}\partial_{\mu}+k_c^2)\phi=0\). At first glance, one might think that \(\phi(X)\) would, like the wavefunction \(\psi(X)\), also admit a probabilistic interpretation given by the Born rule, but historically there were \(2\) reasons why this was a bit suspicious.

Objection #\(1\): In non-relativistic QM one would only have needed to specify \(|\psi(t=0)\rangle\) to get the entire future time evolution of \(|\psi(t)\rangle\) because the Schrodinger equation is first-order in time \(t\). So it seems a bit weird that, just to be relativistically compatible, one would also need to specify the initial velocity \(\partial_0|\psi(t=0)\rangle\).

Objection #\(2\): On the same note of being \(2\)-nd order in time, the Klein-Gordon equation has \(2\) linearly independent solutions:

\[\phi(X)\sim\int d^3\textbf k\left(a_{\textbf k}e^{i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}+a_{\textbf k}^{\dagger}e^{-i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}\right)\]

(this should be contrasted with the Schrodinger equation, which would have only had a single linearly independent solution \(e^{i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}\) where the energy is always positive because it’s just kinetic energy, so one gets a positive semi-definite probability density). In particular, the conserved current is \(J^{\mu}\sim i(\phi^{\dagger}\partial^{\mu}\phi-\phi\partial^{\mu}\phi^{\dagger})\), and the would-be probability density \(J^0\) is not positive semi-definite.

An obvious way to get around Objection #\(1\) is to instead Fourier transform the square root of the earlier relativistic dispersion relation \(H=\pm\sqrt{c^2\textbf P^2+m^2c^4}\)…indeed, in some sense this is what Dirac did. But notice the \(\pm\) signs!

Ironically though, in the modern understanding of QFT, despite the whole motivation being to somehow interpret \(\phi\) probabilistically, it actually does not admit such an interpretation. Similar to Bohr’s derivation of the gross structure of the hydrogenic atom, the method is not fully sound in its foundations but the answer it gives turns out to be correct.


Recalling that the Schrodinger equation:

\[i\hbar\frac{\partial|\psi\rangle}{\partial t}=H|\psi\rangle\]

is first-order in time \(t\), Dirac sought a relativistic wave equation that is likewise first-order in both time and space derivatives.

The Dirac Lagrangian \(\mathcal L=\mathcal L(\psi,\bar{\psi})\) is:

\[\mathcal L=\bar{\psi}(i\hbar\displaystyle{\not\!\partial}-mc1)\psi\]

So by varying the action with respect to the Dirac adjoint bispinor field \(\bar{\psi}:=\psi^{\dagger}\gamma^0\), one obtains the Dirac equation:

\[(i\hbar\displaystyle{\not\!\partial}-mc1)\psi=0\]


Problem #\(8\):

Solution #\(8\):

Problem #\(9\):

Solution #\(9\):

Problem #\(10\):

Solution #\(10\):

Problem #\(11\): Verify that, in spite of the anticommutation relations that were imposed in canonical quantization of the Dirac free field theory, the resulting creation and annihilation operators have the expected commutation relations with the Hamiltonian \(:H:\):

\[[:H:,(b_{\textbf k}^{m_s})^{\dagger}]=\hbar\omega_{\textbf k}(b_{\textbf k}^{m_s})^{\dagger}\]

\[[:H:,b_{\textbf k}^{m_s}]=-\hbar\omega_{\textbf k}b_{\textbf k}^{m_s}\]

\[[:H:,(c_{\textbf k}^{m_s})^{\dagger}]=\hbar\omega_{\textbf k}(c_{\textbf k}^{m_s})^{\dagger}\]

\[[:H:,c_{\textbf k}^{m_s}]=-\hbar\omega_{\textbf k}c_{\textbf k}^{m_s}\]

Solution #\(11\):

These commutation relations assert that the spectrum of the free Dirac QFT is given by Fock states describing particles and antiparticles carrying arbitrary momentum \(\textbf k\in\textbf R^3\) and spin angular momentum \(m_s\in\{\pm 1/2\}\) that emerge from excitations of the vacuum.
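Even with the detailed computation omitted, the algebra is easy to test in a toy representation. The following Python sketch (a single \(\textbf k\)-mode, units \(\hbar\omega_{\textbf k}=1\), and a Jordan-Wigner encoding of the two fermionic modes; all of these are illustrative assumptions) checks that anticommuting ladder operators nonetheless obey the stated commutators with \(:H:\):

```python
import numpy as np

sm = np.array([[0.0, 1.0], [0.0, 0.0]])   # single-mode fermionic annihilator
sz = np.diag([1.0, -1.0])
I2 = np.eye(2)

b = np.kron(sm, I2)                       # particle annihilator
c = np.kron(sz, sm)                       # antiparticle annihilator (JW string)

assert np.allclose(b @ c + c @ b, 0)                 # {b, c} = 0
assert np.allclose(b @ b.T + b.T @ b, np.eye(4))     # {b, b^dag} = 1

H = b.T @ b + c.T @ c                     # :H: for one mode, hbar*omega = 1
assert np.allclose(H @ b.T - b.T @ H, +b.T)          # [H, b^dag] = +b^dag
assert np.allclose(H @ c - c @ H, -c)                # [H, c]     = -c
print("ladder operator commutators with :H: verified")
```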

Posted in Blog | Leave a comment

Weakly Coupled Quantum Field Theories

Problem #\(1\): Write down a general \(\phi\)-dependent perturbation to the Klein-Gordon Lagrangian density \(\mathcal L\) for a real scalar field \(\phi\), and explain why in practice only the first \(2\) terms of such a perturbation need to be considered.

Solution #\(1\): Assuming the potential \(V(\phi)\) is analytic in \(\phi\):

\[\mathcal L=\frac{1}{2}\partial^{\mu}\phi\partial_{\mu}\phi-\frac{1}{2}k_c^2\phi^2-\sum_{n\geq 3}\frac{\lambda_n}{n!}\phi^n\]

where the coupling constants \(\lambda_0=\lambda_1=0\), \(\lambda_2=k_c^2\), etc. are not to be confused with the Compton wavelength \(\lambda:=h/mc\). Because the action \(S=\int d^4|X\rangle\mathcal L\) has dimensions of angular momentum \([S]=[\hbar]\) and \([d^4|X\rangle]=[\lambda]^4\), it follows that \([\mathcal L]=[\hbar]/[\lambda]^4=[\hbar^{-3}(mc)^4]\) and thus \([\phi]=[\hbar^{-1/2}mc]\), so:

\[[\lambda_n\phi^n]=[\mathcal L]\Rightarrow [\lambda_n]=[\hbar^{(n-6)/2}(mc)^{4-n}]\]

So on dimensional analysis grounds, the \(n\)-th order coupling constant \(\lambda_n\) does not by itself give a meaningful assessment as to the relevance of the corresponding \(\phi^n\) perturbation, rather it is the dimensionless parameter \(\hbar^{(6-n)/2}(E/c)^{n-4}\lambda_n\) (where \([E]=[mc^2]\) is the relevant energy scale of the process/physics at work) that really matters, i.e. the \(n\)-th order \(\phi^n\) perturbation is small iff \(\hbar^{(6-n)/2}(E/c)^{n-4}\lambda_n\ll 1\).

By graphing \(\hbar^{(6-n)/2}(E/c)^{n-4}\lambda_n\) as a function of \(E\) and given the non-negotiable fact that one is typically interested in “low-\(E\) physics”, it follows that all the \(\phi^n\) couplings for \(n\geq 5\) are irrelevant at these low energy scales, and instead only the marginal \(\phi^4\) quartic coupling and relevant \(\phi^3\) cubic coupling need to be considered. This analysis is quite deep, and goes to show the power of dimensional analysis.
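For what it’s worth, this exponent bookkeeping can be automated; here is a minimal sympy sketch (the symbols and setup are illustrative, not anything canonical) that re-derives \([\lambda_n]=[\hbar^{(n-6)/2}(mc)^{4-n}]\):

```python
import sympy as sp

n = sp.symbols("n")
phi_h, phi_m = sp.Rational(-1, 2), 1       # [phi] = hbar^{-1/2} (mc)^1
lam_h = sp.simplify(-3 - n*phi_h)          # hbar exponent of [lambda_n] = [L]/[phi]^n
lam_m = sp.simplify(4 - n*phi_m)           # mc exponent of [lambda_n]
print(lam_h, lam_m)                        # n/2 - 3 = (n-6)/2, and 4 - n
```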

Problem #\(2\): What does it mean for an interacting QFT to be weakly coupled?

Solution #\(2\): It means that one can legitimately treat higher-order interaction terms as small perturbations of a free QFT, i.e. that perturbation theory is accurate (otherwise it would be a strongly coupled QFT, which is a whole different beast…)

Problem #\(3\): Derive the Dyson series solution to the linear dynamical system \(\dot{\textbf x}(t)=A(t)\textbf x(t)\) and show that it can be expressed as a time-ordered exponential. Show explicitly that it satisfies the ODE.

Solution #\(3\): First, “normalize” away the uninteresting initial condition \(\textbf x(0)\) by writing \(\textbf x(t)=U(t)\textbf x(0)\) so that \(\dot U(t)=A(t)U(t)\) with \(U(0)=1\). Then integrating both sides:

\[U(t)=1+\int_0^tdt’A(t’)U(t’)\]

And recursively substitute ad infinitum:

\[=1+\int_0^tdt’A(t’)\left(1+\int_0^{t’}dt^{\prime\prime}A(t^{\prime\prime})U(t^{\prime\prime})\right)=1+\int_0^tdt’A(t’)+\int_0^tdt’\int_0^{t’}dt^{\prime\prime}A(t’)A(t^{\prime\prime})+…\]

Recall that the definition of the time-ordering superoperator \(\mathcal T\) applied to a string of operators \(A_1(t_1)A_2(t_2)…A_n(t_n)\) is to reorder them so that the earliest time operators also act first (i.e. are on the right). For instance, for a string of \(2\) operators:

\[\mathcal T[A_1(t_1)A_2(t_2)]=A_1(t_1)A_2(t_2)\Theta(t_1-t_2)+A_2(t_2)A_1(t_1)\Theta(t_2-t_1)\]

or for a string of \(3\) operators, there will be \(3!=6\) terms:

\[\mathcal T[A_1(t_1)A_2(t_2)A_3(t_3)]=A_1(t_1)A_2(t_2)A_3(t_3)\Theta(t_1-t_2)\Theta(t_2-t_3)+A_1(t_1)A_3(t_3)A_2(t_2)\Theta(t_1-t_3)\Theta(t_3-t_2)+…\]

In particular, the Dyson series can be equivalently expressed as the time-ordering superoperator \(\mathcal T\) applied to the strings of operators obtained once one Taylor expands the naive exponential solution:

\[e^{\int_0^tA(t’)dt’}:=1+\int_0^tA(t’)dt’+\frac{1}{2}\int_0^tdt’\int_0^tdt^{\prime\prime}A(t’)A(t^{\prime\prime})+\frac{1}{6}\int_0^tdt’\int_0^tdt^{\prime\prime}\int_0^tdt^{\prime\prime\prime}A(t’)A(t^{\prime\prime})A(t^{\prime\prime\prime})+…\]

Applying \(\mathcal T\) (which is linear):

\[\mathcal T[e^{\int_0^tA(t’)dt’}]=1+\int_0^tA(t’)dt’+\frac{1}{2}\int_0^tdt’\int_0^tdt^{\prime\prime}\biggr(A(t’)A(t^{\prime\prime})\Theta(t’-t^{\prime\prime})+A(t^{\prime\prime})A(t’)\Theta(t^{\prime\prime}-t’)\biggr)+…\]

\[=1+\int_0^tA(t’)dt’+\frac{1}{2}\int_0^tdt’\int_0^{t’}dt^{\prime\prime}A(t’)A(t^{\prime\prime})+\frac{1}{2}\int_0^tdt’\int_{t’}^{t}dt^{\prime\prime}A(t^{\prime\prime})A(t’)+…\]

Among the \(2\) quadratic terms, the first one looks like the one in the Dyson series, but how about the second one? In fact it is equal to the first term; to see this draw a picture to facilitate interchanging the order of integration:

so it becomes clear that \(\int_0^t dt’\int_{t’}^{t}dt^{\prime\prime}=\int_0^tdt^{\prime\prime}\int_0^{t^{\prime\prime}}dt’\). Finally, if one wishes one can interchange the dummy variables \(t’\leftrightarrow t^{\prime\prime}\) to make the two double integrals look utterly identical. As one may anticipate, it turns out that all \(3!=6\) triple integrals at cubic order also coincide, precisely cancelling the \(1/3!=1/6\) prefactor in the Taylor expansion, etc. so that the Dyson series can indeed be written as a time-ordered exponential.

It is also satisfying to explicitly check that the Dyson series for \(U(t)\) satisfies the ODE:

\[\dot U(t)=\frac{d}{dt}\biggr(1+\int_0^tdt’A(t’)+\int_0^tdt’\int_0^{t’}dt^{\prime\prime}A(t’)A(t^{\prime\prime})+…\biggr)=A(t)+A(t)\int_0^t dt’A(t’)+…=A(t)U(t)\]

Or perhaps a more “slick” derivation is to appeal to the time-ordered exponential form of the Dyson series:

\[\dot U(t)=\frac{d}{dt}\mathcal T[e^{\int_0^tA(t’)dt’}]=\mathcal T\left[\frac{d}{dt}e^{\int_0^tA(t’)dt’}\right]\]

and then recognize that within \(\mathcal T\) all operators commute (e.g. for a string of \(2\) operators it is clear that \(\mathcal T[A_1(t_1)A_2(t_2)]=\mathcal T[A_2(t_2)A_1(t_1)]\)) so one can just differentiate naively:

\[=\mathcal T[A(t)e^{\int_0^tA(t’)dt’}]=A(t)\mathcal T[e^{\int_0^tA(t’)dt’}]\]

where the last step follows because \(t\) is the latest time so it can be “factored out” of \(\mathcal T\) to the left.
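To make the time-ordered exponential concrete, here is a minimal numerical sketch (the matrix \(A(t)\) is an arbitrary assumed choice with \([A(t),A(t’)]\neq 0\)): approximating \(\mathcal T[e^{\int_0^t A}]\) as an ordered product of short-time propagators does solve \(\dot U=A(t)U\), whereas the naive un-time-ordered exponential visibly does not:

```python
import numpy as np
from scipy.linalg import expm

# arbitrary A(t) that does not commute with itself at different times
A = lambda t: np.cos(t)*np.array([[0.0, 1.0], [-1.0, 0.0]]) \
            + t*np.array([[1.0, 0.0], [0.0, -1.0]])

T, N = 2.0, 2000
dt = T/N
U = np.eye(2)
integral = np.zeros((2, 2))
for i in range(N):
    t_mid = (i + 0.5)*dt
    U = expm(A(t_mid)*dt) @ U     # later times multiply on the LEFT (time order)
    integral += A(t_mid)*dt

print(U)                          # ordered product: converges to the true U(T)
print(expm(integral))             # naive exponential: noticeably different
```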

Problem #\(4\): Write down the Lagrangian density \(\mathcal L\) for scalar Yukawa QFT. What is the condition for this interacting QFT to be weakly coupled? By inspecting \(\mathcal L\), what charges remain conserved despite the interactions? Obtain the scalar Yukawa Hamiltonian density \(\mathcal H\).

Solution #\(4\): Given a real scalar field \(\phi\) and a complex scalar field \(\psi\):

\[\mathcal L=\frac{1}{2}\partial^{\mu}\phi\partial_{\mu}\phi+\partial^{\mu}\bar{\psi}\partial_{\mu}\psi-\frac{1}{2}k_{\phi}^2\phi^2-k_{\psi}^2\bar{\psi}\psi-g\phi\bar{\psi}\psi\]

where the relevant cubic coupling interaction \(g\phi\bar{\psi}\psi\) will be a small perturbation to the free QFT (so that this interacting QFT is weakly coupled) iff the coupling constant \(g=\lambda_3/6\) satisfies (using the results of Solution #\(1\)):

\[\sqrt{\hbar}g\ll k_{\phi},k_{\psi}\]

Just as for the free theory, since one still has the \(U(1)\) global internal symmetry \(\psi\mapsto e^{i\alpha}\psi\) in spite of the cubic coupling term, one continues to have a conserved charge \(Q=N_{\psi}-N_{\bar{\psi}}\).

Performing the relevant Legendre transforms:

\[\partial_0\phi\mapsto\pi_{\phi}=\partial_0\phi\]

\[\partial_0\psi\mapsto\pi_{\psi}=\partial_0\bar{\psi}\]

\[\partial_0\bar{\psi}\mapsto\pi_{\bar{\psi}}=\partial_0\psi\]

One obtains the Hamiltonian density \(\mathcal H=\mathcal H_{\text{free}}+\mathcal H_{\text{int}}\) for scalar Yukawa QFT, where:

\[\mathcal H_{\text{free}}=\frac{\pi^2_{\phi}}{2}+\frac{1}{2}\biggr|\frac{\partial\phi}{\partial\textbf x}\biggr|^2+\pi_{\bar{\psi}}\pi_{\psi}+\frac{\partial\bar{\psi}}{\partial\textbf x}\cdot\frac{\partial\psi}{\partial\textbf x}+\frac{1}{2}k^2_{\phi}\phi^2+k^2_{\psi}\bar{\psi}\psi\]

and:

\[\mathcal H_{\text{int}}=g\phi\bar{\psi}\psi\]

Problem #\(5\): Within scalar Yukawa QFT, explain why the scattering amplitude for the process \(\phi\to\bar{\psi}\psi\) is non-zero, and hence calculate it to order \(g\); what is an assumption implicit in this question?

Solution #\(5\): The implicit assumption is that the initial and final scattering states \(|t=-\infty\rangle,|t=\infty\rangle\in\mathcal F\) are indeed Fock states of the free QFT, i.e. \(H_{\text{free}}=\int d^3\textbf x\mathcal H_{\text{free}}\)-eigenstates, specifically the Lorentz-normalized excitations of the vacuum:

\[|t=-\infty\rangle:=\sqrt{\frac{2}{\hbar}}\frac{\omega_{\textbf k_{\phi}}}{c}a_{\textbf k_{\phi}}^{\dagger}|0\rangle\]

\[|t=\infty\rangle:=\frac{2\omega_{\textbf k_{\bar{\psi}}}\omega_{\textbf k_{\psi}}}{\hbar c^2}b_{\textbf k_{\psi}}^{\dagger}c_{\textbf k_{\bar{\psi}}}^{\dagger}|0\rangle\]

After a long time, in the interaction picture the initial state \(|t=-\infty\rangle\) evolves to \(\mathcal T\exp\left(-\frac{i}{\hbar}\int_{-\infty}^{\infty}dtH_{\text{int}}\right)|t=-\infty\rangle\) (where \(H_{\text{int}}\) is the interaction Hamiltonian in the interaction picture). The overlap of this with a particular final state \(|t=\infty\rangle\) thus yields the probability amplitude of that scattering process:

\[\langle t=\infty|\mathcal T\exp\left(-\frac{i}{\hbar}\int_{-\infty}^{\infty}dtH_{\text{int}}\right)|t=-\infty\rangle\]

where this is expected to be non-zero because \(Q=0\) is conserved in this process. If one is only interested in computing this scattering amplitude to order \(g\), then as usual one Taylor expands the time-ordered exponential to first order:

\[\approx \langle t=\infty|t=-\infty\rangle-\frac{i}{\hbar}\langle t=\infty|\int_{-\infty}^{\infty}dt H_{\text{int}}|t=-\infty\rangle\]

In this case, the initial and final states are orthogonal \(\langle t=\infty|t=-\infty\rangle=0\) which follows intuitively because a \(\phi\)-particle is not the same as a \(\bar{\psi}\psi\)-particle antiparticle pair, or mathematically the creation operators of the different particles all commute so either \(a_{\textbf k_{\phi}}^{\dagger}\) annihilates the bra \(\langle 0|\) or \(b_{\textbf k_{\psi}},c_{\textbf k_{\bar{\psi}}}\) annihilate the ket \(|0\rangle\).

The \(\mathcal O(g)\) matrix element term simplifies to:

\[-\frac{i}{\hbar c}\int d^4|X\rangle\langle t=\infty|\mathcal H_{\text{int}}|t=-\infty\rangle\sim-ig\int d^4|X\rangle\langle 0|b_{\textbf k_{\psi}}c_{\textbf k_{\bar{\psi}}}\phi\bar{\psi}\psi a_{\textbf k_{\phi}}^{\dagger}|0\rangle\]

At this point one has to pull out the Fourier normal mode plane wave expansions of \(\phi,\psi,\bar{\psi}\) (with time dependence since operators evolve in the interaction picture as if they were in the Heisenberg picture of the free QFT), annihilate as much as one can, and otherwise use commutation relations to pick up delta functions where needed to do the \(13\)-dimensional integral. When the dust settles, one finds that the scattering amplitude for the process goes like \(-ig\delta^4|K_{\psi}+K_{\bar{\psi}}-K_{\phi}\rangle\) so it’s zero unless \(4\)-momentum is conserved. As a corollary, in the ZMF (rest frame) of the \(\phi\)-particle, one can check that this implies \(m_{\phi}\geq 2m_{\psi}\).

Problem #\(6\): Verify Wick’s theorem for the case of \(3\) scalar fields:

\[\mathcal T\phi_{|X_1\rangle}\phi_{|X_2\rangle}\phi_{|X_3\rangle}=:\phi_{|X_1\rangle}\phi_{|X_2\rangle}\phi_{|X_3\rangle}:+\Delta^F_{|X_2-X_3\rangle}\phi_{|X_1\rangle}+\Delta^F_{|X_3-X_1\rangle}\phi_{|X_2\rangle}+\Delta^F_{|X_1-X_2\rangle}\phi_{|X_3\rangle}\]

Solution #\(6\):
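
A minimal sketch of how the verification can proceed (assuming WLOG the time-ordering \(x_1^0>x_2^0>x_3^0\), and splitting each field into annihilation and creation parts \(\phi=\phi^++\phi^-\)): first apply the \(2\)-field Wick theorem \(\mathcal T\phi(X_2)\phi(X_3)=\,:\phi(X_2)\phi(X_3):+\Delta^F(X_2-X_3)\), so that

\[\mathcal T\phi(X_1)\phi(X_2)\phi(X_3)=\phi(X_1):\phi(X_2)\phi(X_3):+\Delta^F(X_2-X_3)\phi(X_1)\]

It then only remains to normal-order the product \(\phi(X_1):\phi(X_2)\phi(X_3):\) by commuting \(\phi^+(X_1)\) to the right past \(\phi^-(X_2)\) and \(\phi^-(X_3)\); each commutator is a c-number which, thanks to the assumed time-ordering, may be identified with a Feynman propagator, and these produce precisely the two remaining terms \(\Delta^F(X_3-X_1)\phi(X_2)+\Delta^F(X_1-X_2)\phi(X_3)\).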

Problem #\(7\): Within weakly coupled scalar Yukawa QFT, compute the scattering amplitude for the process \(\bar{\psi}\psi\to\bar{\psi}\psi\) at order \(\mathcal O(g^2)\).

Solution #\(7\): At order \(\mathcal O(g^2)\), the scattering amplitude for this interaction from the initial state \(|t=-\infty\rangle\sim b_{\textbf k_1}^{\dagger}c_{\textbf k_2}^{\dagger}|0\rangle\) to the final state \(|t=\infty\rangle\sim b_{\textbf k’_1}^{\dagger}c_{\textbf k’_2}^{\dagger}|0\rangle\) is given by:

\[\langle t=\infty|S-1|t=-\infty\rangle\sim i\mathcal A_{\bar{\psi}\psi\to\bar{\psi}\psi}\delta^4(K’_1+K’_2-K_1-K_2)\]

where to order \(\mathcal O(g^2)\), the “amplitude” \(\mathcal A_{\bar{\psi}\psi\to\bar{\psi}\psi}\) is given by the sum of the following two tree-level Feynman diagrams, namely a \(t\)-channel meson exchange and an \(s\)-channel annihilation into a virtual \(\phi\):

Problem #\(8\): Repeat Problem #\(7\) for the case of scalar Yukawa scattering of mesons \(\phi\phi\to\phi\phi\) at order \(\mathcal O(g^4)\).

Solution #\(8\): Given that the interaction Hamiltonian density for scalar Yukawa theory is cubic \(\mathcal H_{\text{int}}=g\phi\bar{\psi}\psi\), it follows that any Feynman diagram must be a cubic graph with exactly one \(\phi\)-edge, one \(\psi\)-edge, and one \(\bar{\psi}\)-edge at each vertex of the directed graph. By initially drawing the following:

and pondering for a bit, it is intuitively plausible that the simplest way to satisfy the above constraint is for the \(\phi\)-mesons to exchange virtual \(\psi/\bar{\psi}\) nucleons/antinucleons via the following one-loop (no longer tree-level) Feynman diagram:

and so because there are \(4\) vertices, it follows that the \(\phi\phi\to\phi\phi\) interaction scattering amplitude is \(\mathcal O(g^4)\). The Feynman rules assert that to \(\mathcal O(g^4)\) it is given by (notation here is a bit loose, in particular \(|\phi\phi\rangle^{\dagger}\neq\langle\phi\phi|\)):

\[\langle\phi\phi|S-1|\phi\phi\rangle\sim i\mathcal A_{\phi\phi\to\phi\phi}\delta^4(K’_1+K’_2-K_1-K_2)\]

where \(\mathcal A_{\phi\phi\to\phi\phi}\) is given by the one-loop integral over \(4\)-momenta:

Problem #\(10\): Define the Mandelstam variables \(s,t,u\in\textbf R\) by drawing suitable \(s\)-channel, \(t\)-channel and \(u\)-channel Feynman diagrams for interactions between two particles (of arbitrary type).

Solution #\(10\):
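
A sketch of the standard definitions these diagrams encode: in terms of the incoming \(4\)-momenta \(K_1,K_2\) and outgoing \(4\)-momenta \(K’_1,K’_2\) of the two particles, each channel is named after which combination of external \(4\)-momenta flows through the internal propagator, and the corresponding Mandelstam variable is the invariant square of that combination:

\[s:=(K_1+K_2)^2,\qquad t:=(K_1-K’_1)^2,\qquad u:=(K_1-K’_2)^2\]

As a quick consistency check, \(4\)-momentum conservation \(K_1+K_2=K’_1+K’_2\) implies \(s+t+u=K_1^2+K_2^2+(K’_1)^2+(K’_2)^2\), which for on-shell external particles is just the sum of their squared rest masses (in suitable units).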

Posted in Blog | Leave a comment

Toric Code

The purpose of this post is to explain what the toric code is, and its potential use as a fault-tolerant quantum error correcting stabilizer surface code for topological quantum computing.

To begin, consider an \(N\times N\) square lattice \(\Lambda\) with a qubit placed at each edge as follows:

The vertices of the lattice \(\Lambda\) are called stars \(S\) while its faces are called plaquettes \(P\) (which can also be viewed as the vertices of the dual lattice \(\Lambda^*\), etc.). Since \(\Lambda\) is an \(N\times N\) square lattice, one might initially think that there are \(2N(N+1)\) qubits; however the catch is that opposite ends of the lattice are identified with each other, so that there are in fact only \(2N^2\) qubits. More importantly, this imposition of periodic boundary conditions endows \(\Lambda\cong S^1\times S^1\) with the topology of a torus, hence the name “toric code”.
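
Spelling out the edge-counting: the open \(N\times N\) grid has \(N+1\) rows of \(N\) horizontal edges and \(N+1\) columns of \(N\) vertical edges, while the periodic identification leaves only \(N\) distinct rows and columns of each:

\[\underbrace{N(N+1)}_{\text{horizontal}}+\underbrace{N(N+1)}_{\text{vertical}}=2N(N+1)\qquad\longrightarrow\qquad\underbrace{N^2}_{\text{horizontal}}+\underbrace{N^2}_{\text{vertical}}=2N^2\]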

A priori, the state space \(\mathcal H\) of \(2N^2\) qubits has dimension \(\text{dim}\mathcal H=2^{2N^2}\) (strictly speaking, if the qubits were identical particles then the physical state space would be a smaller symmetrized subspace, but here they sit at distinct lattice edges and are treated as distinguishable). Using the notation \(\sigma_i^x\) as a shorthand for the operator \(1_1\otimes…\otimes 1_{i-1}\otimes\sigma^x\otimes 1_{i+1}\otimes…\otimes 1_{2N^2}\) acting on \(\mathcal H\) and similarly for \(\sigma_i^y\) and \(\sigma_i^z\), one can define star and plaquette four-body interaction operators \(A_S,B_P\) associated respectively to stars \(S\) or plaquettes \(P\) in \(\Lambda\) by:

\[A_S:=\prod_{i\in S}\sigma_i^z\]

\[B_P:=\prod_{i\in P}\sigma_i^x\]

where the notation “\(i\in S\)” denotes the set of \(4\) qubits nearest to a given star \(S\), and similarly for “\(i\in P\)”; the notation \(\prod_{i\in S}\) or \(\prod_{i\in P}\) does not suffer from any operator ordering ambiguity because Pauli operators on distinct qubits (regardless of their \(x,y,z\) nature) trivially commute.

For any two stars \(S,S’\), and any two plaquettes \(P,P’\), one can check (without any motivation yet at this point) that:

\[[A_S,A_{S’}]=[B_{P},B_{P’}]=[A_S,B_P]=0\]

Specifically, if the stars or plaquettes are well-separated from each other then these are always trivially true so it suffices to just check the “edge cases” so to speak when the stars or plaquettes are close enough to share qubits. Then, for each such shared qubit, one just has to apply the anticommutation relation \(\{\sigma^{\alpha},\sigma^{\beta}\}=2\delta^{\alpha\beta}1\). Notably, the commutation relation \([A_S,B_P]=0\) relies on the fundamental fact that a star \(S\) and plaquette \(P\) which are adjacent to each other always share only \(2\) qubits, and \(2\) is even so \((-1)^2=1\).
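
For concreteness, here is a minimal numerical sketch of these checks in Python on a small \(N=2\) torus (the edge-indexing functions \(h,v\) below are one illustrative labelling convention, not forced by anything above):

```python
import numpy as np
from functools import reduce

N = 2                      # small torus for illustration: 2*N**2 = 8 qubits
n_qubits = 2 * N**2

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])    # sigma^x
Z = np.array([[1., 0.], [0., -1.]])   # sigma^z

# qubits live on edges: h(x, y) is the horizontal edge leaving vertex (x, y),
# v(x, y) the vertical one; indices are periodic thanks to the % N
def h(x, y):
    return (x % N) + N * (y % N)

def v(x, y):
    return N**2 + (x % N) + N * (y % N)

def pauli_on(edges, P):
    """Tensor product acting as P on the given edge qubits, identity elsewhere."""
    return reduce(np.kron, [P if i in edges else I2 for i in range(n_qubits)])

def A(x, y):   # star operator: sigma^z on the 4 edges touching vertex (x, y)
    return pauli_on({h(x, y), h(x - 1, y), v(x, y), v(x, y - 1)}, Z)

def B(x, y):   # plaquette operator: sigma^x on the 4 edges bounding face (x, y)
    return pauli_on({h(x, y), h(x, y + 1), v(x, y), v(x + 1, y)}, X)

stars = [A(x, y) for x in range(N) for y in range(N)]
plaqs = [B(x, y) for x in range(N) for y in range(N)]

# brute-force check that all stars and plaquettes commute pairwise
for O1 in stars + plaqs:
    for O2 in stars + plaqs:
        assert np.allclose(O1 @ O2, O2 @ O1)
print("all commutators [A_S, A_S'], [B_P, B_P'], [A_S, B_P] vanish")
```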

Finally, suppose one thinks of the lattice of qubits as a spin lattice such that the spins interact via the following Hamiltonian:

\[H=-\sum_{S\in\Lambda}A_S-\sum_{P\in\Lambda^*}B_P\]

From the above discussion, it follows that for any star \(S\) or plaquette \(P\), one has:

\[[H,A_S]=[H,B_P]=0\]

There are \(N^2\) stars and \(N^2\) plaquettes, so this would suggest that one is free to specify the \(\pm 1\) eigenvalue of each of the \(2N^2\) commuting operators \(A_S,B_P\) independently; however, using a “Stokes theorem” type of argument (basically a more refined version of the earlier arguments for showing \([A_S,A_{S’}]=[B_P,B_{P’}]=0\)), one can convince oneself inductively (i.e. playing around with small \(N=1,2,…\) etc.) that it is precisely thanks to the toric topology of \(\Lambda\) that one has the constraints:

\[\prod_{S\in\Lambda}A_S=\prod_{P\in\Lambda^*}B_P=1\]

(not to be confused with the quantities in the Hamiltonian \(H\) which involve sums \(\sum\) rather than products \(\prod\)).

To emphasize again, this is why the toric topology is desirable: the two constraints mean that only \(2N^2-2\) of the \(2N^2\) commuting stabilizers \(A_S,B_P\) are actually independent, so each simultaneous eigenspace, and in particular the ground state manifold of \(H\) (on which every \(A_S=B_P=1\)), has dimension \(2^{2N^2}/2^{2N^2-2}=4\). This \(4\)-fold degeneracy of the ground state manifold of \(H\) is what furnishes the \(2\) logical qubits of the toric code.
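
Continuing the numerical sketch from above (same \(N=2\) setup), both global constraints and the resulting \(4\)-fold ground state degeneracy can be verified directly:

```python
# the two global constraints forced by the toric topology of the lattice
assert np.allclose(reduce(np.matmul, stars), np.eye(2**n_qubits))
assert np.allclose(reduce(np.matmul, plaqs), np.eye(2**n_qubits))

# spectrum of H = -sum_S A_S - sum_P B_P: count the ground state degeneracy
H = -sum(stars) - sum(plaqs)
evals = np.linalg.eigvalsh(H)
print("ground state degeneracy:", int(np.sum(np.isclose(evals, evals.min()))))
# prints 4 for the N = 2 torus
```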

Posted in Blog | 1 Comment

The IR Spectrum of \(\text{C}_{60}\)

The purpose of this post is to explain why, experimentally, one only observes \(4\) electric dipole transitions in the IR spectrum of buckminsterfullerene, also known as \(\text C_{60}\) or informally as the buckyball:

Buckyball Basics

The simplest conceptual way to construct a buckyball is to start with a regular icosahedron:

which has \(F=20\) equilateral triangular faces, \(V=12\) vertices, and \(E=30\) edges; notice this obeys Euler’s formula \(F+V=E+2\). Then one simply “shaves off” (truncates) each of the \(V=12\) vertices; it is clear from the picture that these \(12\) vertices transform into \(12\) pentagonal faces, each surrounded by \(5\) hexagonal faces so that no \(2\) pentagonal faces share an edge, yielding the buckyball topology (and geometrically, all \(\text C-\text C\) covalent bonds should be of equal length). This shaving process increases the number of faces to \(F’=32\) (of which \(12\) are pentagonal and \(20\) are hexagonal), the number of vertices to \(V’=60\) (i.e. just the number of carbon atoms), and the number of edges to \(E’=90\) such that Euler’s formula is maintained \(F’+V’=E’+2\).
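
The bookkeeping under this truncation can be made explicit: each of the \(V=12\) shaved vertices (each of degree \(5\)) is replaced by the \(5\) vertices and \(5\) edges of a new pentagonal face, so:

\[F’=F+V=20+12=32,\qquad V’=5V=60,\qquad E’=E+5V=30+60=90\]

and indeed \(F’+V’=32+60=92=E’+2\).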

IR Spectroscopy Selection Rules

Recall that within the Born-Oppenheimer approximation, the \(N\) nuclei of some molecule are “clamped” at positions \(\textbf X:=(\textbf X_1,…,\textbf X_N)\) and are regarded as moving in the effective potential \(V_{\text{eff}}(\textbf X)\) due to the electrons \(e^-\), so that the molecular Hamiltonian is:

\[H=T_{\text n}+V_{\text{eff}}(\textbf X)\]

If in addition one approximates \(V_{\text{eff}}(\textbf X)\) by a harmonic potential about the (stable) equilibrium configuration \(\textbf X_0\) of the nuclei:

\[V_{\text{eff}}(\textbf X)\approx\frac{1}{2}(\textbf X-\textbf X_0)^T\left(\frac{\partial^2 V_{\text{eff}}}{\partial\textbf X^2}\right)_{\textbf X_0}(\textbf X-\textbf X_0)\]

then, upon diagonalizing the Hessian \(\left(\frac{\partial^2 V_{\text{eff}}}{\partial\textbf X^2}\right)_{\textbf X_0}\) into the orthonormal eigenbasis of its normal modes, one obtains \(3N\) decoupled simple harmonic oscillators so the spectrum of \(H\) is just that of an anisotropic harmonic oscillator in \(\textbf R^{3N}\), at least within all the assumptions made so far (e.g. ignoring anharmonicity of \(V_{\text{eff}}(\textbf X)\)). Each Fock eigenstate \(|n_1,n_2,…,n_{3N}\rangle\) of \(H\) (thought of as a vibrational eigenstate because one can view \(H\) as a Hamiltonian governing vibrations/SHM of the nuclei about equilibrium) is thus a product \(|n_1,n_2,…,n_{3N}\rangle=\otimes_{i=1}^{3N}|n_i\rangle\) of suitable \(1\)D quantum harmonic oscillator wavefunctions.

IR radiation is just like any other electromagnetic radiation in that (within the dipole approximation, which is fair because the wavelength of IR radiation is still long compared to even a molecule’s size) it stimulates electric dipole transitions via the time-dependent perturbation \(\Delta H=\Delta H(t)\) to the molecular Hamiltonian \(H\):

\[\Delta H=-\boldsymbol{\pi}\cdot\textbf E_0\cos \omega_{\text{IR}} t\]

where the electric dipole moment \(\boldsymbol{\pi}=\boldsymbol{\pi}(\textbf X,\textbf x)\) of the molecule is:

\[\boldsymbol{\pi}:=e\sum_{\text{nuclei }i}Z_i\textbf X_i-e\sum_{\text{electrons }i}\textbf x_i\]

and the molecule is assumed to be neutral, \(\sum_{\text{nuclei }i}Z_i=\sum_{\text{electrons }i}1\) (i.e. the total nuclear charge equals the number of electrons). By Fermi’s golden rule, the molecular transition rate between two distinct vibrational \(H\)-eigenstates \(|n_1,…,n_{3N}\rangle\to|n’_1,…,n’_{3N}\rangle\) is proportional to the mod-square of the matrix element of the (time-independent amplitude of the) perturbation:

\[|\langle n’_1,…,n’_{3N}|\boldsymbol{\pi}\cdot\textbf E_0|n_1,…,n_{3N}\rangle|^2\]

However, most IR spectroscopy experiments (e.g. IR sources in FTIR spectrometers) use unpolarized IR radiation, so one should really replace the polarization-dependent factor \(|\boldsymbol{\pi}\cdot\textbf E_0|^2\) by its isotropic average, which contributes a factor of \(1/3\), and (as far as selection rules are concerned) forget about the overall factor of \(|\textbf E_0|^2\), so just focus on:

\[|\langle n’_1,…,n’_{3N}|\boldsymbol{\pi}|n\rangle|^2\]

Similar to what was done earlier for the effective potential \(V_{\text{eff}}(\textbf X)\), one can also Taylor expand the dipole moment operator \(\boldsymbol{\pi}\) within the configuration space \(\textbf X\) of the nuclei about the equilibrium configuration \(\textbf X_0\):

\[\boldsymbol{\pi}\approx\boldsymbol{\pi}(\textbf X_0)+\left(\frac{\partial\boldsymbol{\pi}}{\partial\textbf X}\right)_{\textbf X_0}(\textbf X-\textbf X_0)+…\]

Sandwiching this in the matrix element, the constant term \(\boldsymbol{\pi}(\textbf X_0)\) (which represents the possibility of a permanent electric dipole like in a water molecule) vanishes by orthogonality of the distinct vibrational \(H\)-eigenstates, so the quantity of interest is just:

\[\left|\left(\frac{\partial\boldsymbol{\pi}}{\partial\textbf X}\right)_{\textbf X_0}\langle n’_1,…,n’_{3N}|\textbf X-\textbf X_0|n_1,…,n_{3N}\rangle\right|^2\]

This immediately leads to a “gross selection rule” for two vibrational \(H\)-eigenstates to be coupled by the IR perturbation \(\Delta H\), namely that the Jacobian \(\left(\frac{\partial\boldsymbol{\pi}}{\partial\textbf X}\right)_{\textbf X_0}\) must be non-vanishing; physically, this means that only the normal modes of the nuclear vibration that experience a change in \(\boldsymbol{\pi}\) when the nuclei are slightly displaced from \(\textbf X_0\) will be IR-active.
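
As a toy numerical illustration of both the normal mode construction and this gross selection rule, here is a minimal sketch for a hypothetical \(1\)D “\(\text{CO}_2\)-like” triatomic with point charges \(+q,-2q,+q\) joined by two identical springs (all parameter values are illustrative, not fitted to any real molecule):

```python
import numpy as np

# hypothetical 1D "CO2-like" triatomic: point charges (+q, -2q, +q) joined by
# two identical springs; all parameter values are illustrative only
m, M, kspring, q = 1.0, 2.0, 1.0, 1.0
masses = np.array([m, M, m])
charges = np.array([q, -2 * q, q])            # neutral molecule overall

# Hessian of V_eff = (kspring/2) * ((x2 - x1)**2 + (x3 - x2)**2)
K = kspring * np.array([[ 1., -1.,  0.],
                        [-1.,  2., -1.],
                        [ 0., -1.,  1.]])

# eigenvalues of the mass-weighted Hessian are the squared normal frequencies
Minv_sqrt = np.diag(1 / np.sqrt(masses))
omega2, U = np.linalg.eigh(Minv_sqrt @ K @ Minv_sqrt)

for w2, u in zip(omega2, U.T):
    disp = Minv_sqrt @ u                      # physical displacement pattern
    dpi_dQ = charges @ disp                   # dipole derivative along this mode
    print(f"omega^2 = {w2:+.2f}   dpi/dQ = {dpi_dQ:+.2f}   "
          f"IR active: {not np.isclose(dpi_dQ, 0.0)}")
```

Running this, the zero-frequency translation mode and the symmetric stretch both have vanishing dipole derivative (IR-inactive), while the antisymmetric stretch does not (IR-active), exactly as the gross selection rule predicts.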

The main point of this section, then, is that IR transitions are low-energy vibrational excitations which require a change in the electric dipole moment between the initial and final states; factoring out the charge \(e\), Fermi’s golden rule then demands that the matrix element of the position observable between the two states be non-zero. There are actually \(2\) selection rules for IR spectroscopy: the gross selection rule is that \(\left(\partial\boldsymbol{\pi}/\partial\textbf X\right)_{\textbf X_0}\) must be non-vanishing, while the specific selection rule \(\Delta n_i=\pm 1\) (for a given normal mode \(i\)) follows from the fact that the harmonic oscillator position operator only has non-zero matrix elements between adjacent Fock states.

Character Tables of Finite Group Representations

As a warmup, consider the symmetric group \(S_3=\{1, (12), (13), (23), (123), (132)\}\) of order \(|S_3|=3!=6\). Then \(S_3\) can be partitioned into \(3\) conjugacy classes, namely \(S_3=\{1\}\cup\{(12), (13), (23)\}\cup\{(123), (132)\}\). The number of irreps of a finite group equals its number of conjugacy classes (though there is no canonical bijection between the two), so there will also be three \(S_3\)-irreps. One of them is always just the trivial irrep whereby the permutations do nothing to all vectors. For symmetric groups, there is also always the sign irrep \(1,(123),(132)\mapsto 1\) and \((12),(13),(23)\mapsto -1\), such that even \(A_3\)-permutations do nothing to all vectors while odd permutations flip vectors across the origin. Finally, there is the standard \(S_3\)-irrep that one might intuitively think of as acting on the vertices of an equilateral triangle, rigidly dragging the whole Cartesian plane \(\textbf R^2\) along with it. Note the dimensions of these \(S_3\)-irreps are the only ones that could have been compatible with its order \(6=1^2+1^2+2^2\).

Each \(S_3\)-irrep is associated with its own character, a class function which can be evaluated on an arbitrary representative of each conjugacy class. For instance, the trivial irrep is \(1\)-dimensional and its character always evaluates to \(1\) on all conjugacy classes. The sign irrep is also \(1\)-dimensional, but its character evaluated on each conjugacy class is instead the sign of the permutations in that conjugacy class. Finally, noting that \(\cos(2\pi/3)=-1/2\) and that the trace of a rotation in \(\textbf R^2\) by angle \(\theta\) is \(2\cos\theta\), the following character table for \(S_3\) may be obtained:
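
Assembling these three characters (rows indexed by the irreps, columns by the conjugacy classes) reconstructs the table referred to above:

\[\begin{array}{c|ccc}S_3&\{1\}&\{(12),(13),(23)\}&\{(123),(132)\}\\\hline\text{trivial}&1&1&1\\\text{sign}&1&-1&1\\\text{standard}&2&0&-1\end{array}\]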

So far this example has been fairly abstract. To bridge this “abstract nonsense” with chemistry, consider the specific example of the ammonia molecule \(\text{NH}_3\):

In this case, rather than pulling a random group (like \(S_3\)) out of thin air, here one can use the ammonia molecule to motivate studying the specific group of actions on \(\textbf R^3\) that leave it looking like nothing happened. In chemist jargon, one is interested in the point group of the ammonia molecule, the adjective “point” implying that any such action on \(\textbf R^3\) must have at least one fixed point in space (commonly chosen to be the origin) so that e.g. translations of the ammonia molecule are disregarded (cf. the distinction in special relativity between the Poincare group and its Lorentz subgroup). Note that this is indeed a group thanks to its very definition (if one action leaves ammonia looking like nothing happened, and a second action also leaves ammonia looking like nothing happened, then composing them will also leave ammonia looking like nothing happened).

So what is the point group of the ammonia molecule? There isn’t really any super rigorous/systematic way to find it; one just has to stare hard at the molecule and think…

Clearly, one way to leave ammonia looking like nothing happened is to literally do nothing. There is also manifestly \(C_3\) rotational symmetry (by \(120^{\circ}\) or \(240^{\circ}\)). Finally, there are \(3\) reflection symmetries about “vertical” mirror planes. As a mathematician, it is thus easy to recognize that the point group of the ammonia molecule is just the dihedral group \(D_3\) (which happens to be isomorphic to \(S_3\)). However, as a chemist one would instead refer to this as the “\(C_{3v}\) point group”, where the \(C\) and the \(3\) subscript are meant to emphasize the \(C_3\) subgroup of rotational symmetries mentioned above, while the “\(v\)” subscript is meant to emphasize that the mirror planes are “vertical”. Strictly speaking, one should also check that there are no horizontal mirror planes, inversion centers, or improper rotation axes in order to be able to confidently assert that the point group of ammonia really is \(C_{3v}\) rather than \(C_{3v}\) merely being a subgroup of an actually larger point group.

The isomorphism \(C_{3v}\cong D_3\cong S_3\) means that by happy chance the character table for the ammonia point group \(C_{3v}\) is already known. For instance, the transpositions \((12),(23),(13)\) in \(S_3\) map onto vertical mirror plane reflections in \(C_{3v}\), while the \(3\)-cycles \((123),(132)\) are simply \(C_3\) rotations. Indeed, the \(3\)-dimensional \(C_{3v}\)-representation on the ammonia molecule itself embedded rigidly in \(\textbf R^3\) is reducible into a direct sum of the trivial irrep acting on the \(1\)-dimensional \(z\)-axis and the standard irrep acting on the \(2\)-dimensional \(xy\)-plane. It is also worth verifying that the rows of the character table satisfy orthonormality \(\sum_{\text{ccl}\text{ of }C_{3v}}|\text{ccl}|\chi^*_{\phi}(\text{ccl})\chi_{\phi’}(\text{ccl})=|C_{3v}|\delta_{\phi\cong\phi’}\) for any \(2\) \(C_{3v}\)-irreps \(\phi,\phi’\) (for \(C_{3v}\) all characters happen to be real, so the complex conjugate is cosmetic).
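
A quick numerical check of this row orthogonality, using the \(S_3\cong C_{3v}\) character table above:

```python
import numpy as np

# rows of the S3 (~ C3v) character table; columns are the conjugacy classes
# {1}, {(12), (13), (23)} (mirror reflections) and {(123), (132)} (rotations)
class_sizes = np.array([1, 3, 2])
chars = {
    "trivial":  np.array([1,  1,  1]),
    "sign":     np.array([1, -1,  1]),
    "standard": np.array([2,  0, -1]),
}

for name1, chi1 in chars.items():
    for name2, chi2 in chars.items():
        # equals |G| = 6 exactly when the two irreps coincide, else 0
        inner = int(np.sum(class_sizes * chi1 * chi2))
        print(f"<{name1}, {name2}> = {inner}")
```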

One can also consider how various scalar-valued and vector-valued polynomials on \(\textbf R^3\) transform passively under the \(C_{3v}\) point group. For instance, any function \(f=f(\rho)\) of the cylindrical coordinate \(\rho:=\sqrt{x^2+y^2}\) only, such as \(f(\rho)=\rho^2\), will transform under the trivial \(C_{3v}\)-irrep.

Making the reasonable assumption that all \(60\) carbon atoms are specifically of the \(^{12}\text C\) isotope, and are therefore identical bosons, one might ask what role the permutation group \(S_{60}\) plays. (This would formalize the earlier comment about “looking the same” via the idea of identical quantum particles, i.e. bosons and fermions, where the vectors being permuted are not in \(\textbf R^3\) but are state vectors in the symmetrized tensor product Hilbert space of identical bosons.)

Posted in Blog | Leave a comment