Problem: In a multilayer perceptron (MLP), how are layers conventionally counted?
Solution: The input layer \(\textbf x\equiv\textbf a^{(0)}\) is also called “layer \(0\)”. However, if someone says that an MLP has e.g. \(7\) layers, what this means is that in fact it has \(6\) hidden layers \(\textbf a^{(1)},\textbf a^{(2)},…,\textbf a^{(6)}\), together with the output layer \(\textbf a^{(7)}\). In other words, the input layer \(\textbf a^{(0)}\) is not counted by convention.
Problem: Write down the formula for the activation \(a^{(\ell)}_n\) of the \(n^{\text{th}}\) neuron in the \(\ell^{\text{th}}\) layer of a multilayer perceptron (MLP).
Solution: Using a sigmoid activation function, the activation of the \(n^{\text{th}}\) neuron in the \(\ell^{\text{th}}\) layer is:
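As a minimal numeric sketch of this formula (the function names and the weight convention \(w^{(\ell)}_{nm}\), connecting neuron \(m\) of layer \(\ell-1\) to neuron \(n\) of layer \(\ell\), are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def activation(n, weights, biases, prev_activations):
    # a^(l)_n = sigmoid( sum_m w^(l)_{nm} a^(l-1)_m + b^(l)_n )
    z = sum(w * a for w, a in zip(weights[n], prev_activations)) + biases[n]
    return sigmoid(z)

# one neuron with zero weights and zero bias: sigmoid(0) = 0.5
print(activation(0, [[0.0, 0.0]], [0.0], [1.0, 1.0]))  # -> 0.5
```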
The purpose of this post is to document my progress for my summer internship in the Ultracold Quantum Matter Lab at Yale University working with the group of Professor Nir Navon to study bichromatic Rabi oscillations of a driven Fermi polaron.
Everything in the first half of the paper (the steady-state part) can be summarized as:
And the latter half discussing the resonant dynamics of the pre-steady state can be visualized qualitatively as:
Finally, a few points worth emphasizing:
Clearly \(\uparrow\) is unstable and wants to decay back to \(\downarrow\) with transition rate \(\Gamma\), so the zero-detuning \(\delta_0<0\) must contain information about the \(\uparrow B\) interactions.
A Rabi experiment is different from, say, a Ramsey experiment in that it really is about a continuous drive, so \(t\gg 1/\Omega\), as mentioned in the paper’s Figure \(1a\).
To emphasize again, there are implicit hyperparameters: \(a_{\uparrow B}=\infty\) is unitarily/strongly interacting while \(a_{\downarrow B}\approx 0\) is weakly/non-interacting (the paper gives exact values). So only the attractive polaron is present. At the end, the paper mentions extending into the BEC regime, where both would coexist for a sufficiently broad Feshbach resonance.
And some random thoughts I have about the paper:
Instead of the \(3D\) surface graph, maybe a heat map would be instructive too?
At large \(\Omega\), how does the AC Stark shift associated with a dressed atom-photon affect the physics (since it seems it would affect the internal energies of the polaron)?
Problem: Prove the Sokhotski-Plemelj theorem on the real line \(\textbf R\):
Solution: (in this solution, the function \(\chi(\omega)\) is assumed to be analytic on the integration interval in \(\textbf R\) to be able to apply the Cauchy integral formula):
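For reference, the identity to be proved is commonly stated as (with \(\mathcal P\) the Cauchy principal value and \(\omega_0\in\textbf R\)):

\[\lim_{\varepsilon\to 0^+}\int_{-\infty}^{\infty}\frac{\chi(\omega)d\omega}{\omega-\omega_0\mp i\varepsilon}=\mathcal P\int_{-\infty}^{\infty}\frac{\chi(\omega)d\omega}{\omega-\omega_0}\pm i\pi\chi(\omega_0)\]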
Problem: Explain why any linear response function \(\chi\) should obey in the time domain \(\chi(t)=0\) for \(t<0\). Hence, what is the implication of this for \(\chi(\omega)\) in the frequency domain?
Solution: The fundamental definition of the linear response function \(\chi\) that gives it its name is that in the time domain \(\chi=\chi(t)\), the response \(x(t)\) should be proportional to the perturbation \(f(t)\) with \(\chi\) essentially acting as the proportionality constant according to the convolution:
(or equivalently the local behavior \(x(\omega)=\chi(\omega)f(\omega)\) in Fourier space). But if the “force” \(f(t’)\) is applied at time \(t’\), on causality grounds this can only affect the response \(x(t)\) at times \(t\geq t’\). In other words, it should be possible to change the limits on the integral from \(\int_{-\infty}^{\infty}dt’\) to \(\int_{-\infty}^tdt’\) without affecting the result. This therefore requires \(\chi(t-t’)=0\) for \(t<t’\), or more simply \(\chi(t)=0\) for \(t<0\). In light of this, \(\chi\) is also called a causal/retarded Green’s function.
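A minimal discrete-time sketch of this causal convolution (the kernel and force profiles are illustrative choices): the response is identically zero before the force is switched on, precisely because \(\chi(t)=0\) for \(t<0\).

```python
import math

# Causal kernel: chi[t] = 0 for t < 0, here chi[t] = exp(-t*dt) for t >= 0 (illustrative)
dt = 0.1
chi = [math.exp(-t * dt) for t in range(100)]

# Force switched on at step 30
f = [0.0] * 30 + [1.0] * 70

# x(t) = sum_{t'} chi(t - t') f(t') dt, with chi(t - t') = 0 whenever t < t'
x = [sum(chi[t - tp] * f[tp] * dt for tp in range(t + 1)) for t in range(100)]

print(x[29])      # -> 0.0 (just before the force turns on, the response vanishes)
print(x[50] > 0)  # -> True (after the force turns on, the response is nonzero)
```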
For \(t<0\), Jordan’s lemma asserts that one should close the contour in the lower half-plane \(\Im\omega<0\). But the fact that \(\chi(t)=0\) for \(t<0\) suggests that the sum of all residues of \(\chi(\omega)\) in the lower half-plane \(\Im\omega<0\) should vanish. A sufficient condition for this is that \(\chi(\omega)\) be analytic in the lower half-plane \(\Im\omega<0\), and henceforth this will be assumed.
Problem: Qualitatively, what do the Kramers-Kronig relations assert? What about quantitatively?
Solution: Qualitatively, for a linear response function like \(\chi(\omega)\) which is analytic in the lower half-plane \(\Im\omega<0\), knowing its reactive part \(\Re\chi(\omega)\) is equivalent to knowing its absorptive/dissipative spectrum \(\Im\chi(\omega)\) which in turn is equivalent to knowing \(\chi(\omega)\) itself.
Quantitatively, the bridges \(\Leftrightarrow\) are provided by the Kramers-Kronig relations:
Note: sometimes the discussion of the Kramers-Kronig relations is phrased in terms of \(\chi(\omega)\) being analytic in the upper half-plane. This stems from a different convention for the Fourier transform, \(\chi(t)=\int_{-\infty}^{\infty}\frac{d\omega}{2\pi}e^{-i\omega t}\chi(\omega)\), rather than the convention \(\chi(t)=\int_{-\infty}^{\infty}\frac{d\omega}{2\pi}e^{i\omega t}\chi(\omega)\) used above. Consequently, the minus signs in the Kramers-Kronig relations may also appear flipped around.
Problem: If the response \(x(t)\) and the driving force \(f(t)\) are both real-valued, what are the implications of this for \(\Re\chi(\omega)\) and \(\Im\chi(\omega)\)?
Solution: Then \(\chi(t)\in\textbf R\) must also be real-valued, so \(\chi(\omega)\) is Hermitian:
\[\chi^*(\omega)=\chi(-\omega)\]
Consequently, the reactive response \(\Re\chi(-\omega)=\Re\chi(\omega)\) is even while the absorptive/dissipative response \(\Im\chi(-\omega)=-\Im\chi(\omega)\) is odd.
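A quick numeric check of this Hermitian symmetry (a sketch, using a discrete Fourier sum in place of the integral transform, with an arbitrary real-valued \(\chi(t)\)):

```python
import cmath
import random

random.seed(0)
chi_t = [random.random() for _ in range(8)]  # an arbitrary real-valued chi(t)

def chi_omega(w):
    # forward transform chi(w) = sum_t e^{-i w t} chi(t), matching the e^{+i w t} inverse used above
    return sum(cmath.exp(-1j * w * t) * c for t, c in enumerate(chi_t))

w = 0.7
lhs = chi_omega(w).conjugate()
rhs = chi_omega(-w)
print(abs(lhs - rhs) < 1e-12)  # -> True: chi*(w) = chi(-w), so Re(chi) is even and Im(chi) is odd
```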
Problem: State the thermodynamic sum rule.
Solution: The sum rule asserts that if one knows how much a system absorbs/dissipates at all frequencies \(\omega\in\textbf R\), then one can deduce the system’s DC linear response \(\chi(\omega=0)\), called its susceptibility (of course the Kramers-Kronig relations actually show that knowing \(\Im\chi(\omega)\) allows complete reconstruction of \(\chi(\omega)\) at all frequencies \(\omega\in\textbf R\), not just \(\omega=0\)).
Problem: Write an essay that summarizes the key points learned from the following papers/slides:
Solution: For a generic \(2\)-component Fermi gas whose \(2\) components may be called \(\uparrow\) and \(\downarrow\) (this could be \(2\) hyperfine states of the same atom, or \(2\) hyperfine states of different atoms) the Hamiltonian is \(H=H_0+V_{\downarrow\uparrow}\) where the kinetic energy is:
and the short-range scattering pseudopotential \(V_{\downarrow\uparrow}\) of “bare strength” \(g_{\uparrow\downarrow}\) describes momentum-conserving collisions between the \(2\) components \(\uparrow,\downarrow\) of the Fermi gas in a volume \(V\) (note that interactions within each component are neglected, i.e. they are separately ideal Fermi gases; that is, one assumes there is no \(\uparrow\uparrow\) or \(\downarrow\downarrow\) scattering. This is justified by the fact that \(2\) identical fermions with identical spin parts could only interact via an odd-\(\ell\) scattering channel, the lowest of which is the \(\ell=1\) \(p\)-wave channel whose cross-section \(\sigma_{\ell}\sim k^{2\ell}\) is suppressed at low \(k\)):
The Fermi polaron is the limit \(N_{\downarrow}/N_{\uparrow}\to 0\) of the \(2\)-component Fermi gas; in fact typically one just takes \(N_{\downarrow}=1\). In light of this population imbalance between the \(2\) components \(\uparrow,\downarrow\) of the Fermi gas, the standard terminology is to call the majority \(\uparrow\) component the bath and the minority \(\downarrow\) component the impurity. Through the short-range interaction \(V_{\downarrow\uparrow}\), the \(\downarrow\) impurity polarizes the \(\uparrow\) bath in its \(\textbf x\)-space vicinity (hence the name polaron!), and it is common to say that the \(\downarrow\) impurity is dressed by the polarized \(\uparrow\) cloud that it “carries” along with it. This composite object of the \(\downarrow\) impurity together with the \(\uparrow\) cloud is a quasiparticle called the (Fermi) polaron (in particular, it is important to emphasize that polaron is not synonymous with \(\downarrow\) impurity; the interaction \(V_{\downarrow\uparrow}\) is essential, and instead polaron is synonymous with \(\downarrow\) impurity + \(\uparrow\) polarized cloud).
This discussion has been an intuitive/qualitative picture in \(\textbf x\)-space (aka real space). In \(\textbf k\)-space (aka reciprocal space), one can get more quantitative. Here, rather than starting from a \(2\)-component \(\uparrow\downarrow\) Fermi gas, one can first visualize a single-component \(\uparrow\) ideal Fermi gas in its ground state where the Fermi sea is occupied up to the Fermi wavenumber \(k_F=(6\pi^2 n_{\uparrow})^{1/3}\):
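As a numeric aside on the \(k_F=(6\pi^2 n_{\uparrow})^{1/3}\) formula (the choice of a \(^6\text{Li}\) bath at density \(n_\uparrow=10^{18}\,\text{m}^{-3}\) is illustrative, not taken from the paper; note the \(6\pi^2\) rather than \(3\pi^2\) because the bath is a single spin component):

```python
import math

hbar = 1.054571817e-34                  # J s
k_B = 1.380649e-23                      # J/K
m_li6 = 6.0151228 * 1.66053906660e-27   # kg, mass of 6Li (illustrative choice of bath atom)
n_up = 1e18                             # m^-3, a typical ultracold-gas density (illustrative)

k_F = (6 * math.pi**2 * n_up) ** (1 / 3)   # Fermi wavenumber of the single-component bath
E_F = (hbar * k_F) ** 2 / (2 * m_li6)      # Fermi energy
T_F = E_F / k_B                            # Fermi temperature

print(f"k_F = {k_F:.3e} m^-1, T_F = {T_F * 1e6:.2f} uK")
```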
The excited states are then particle-hole excitations of this ground state Fermi sea \(|\text{FS}\rangle\). When adding a single \(\downarrow\) impurity to the bath, one would expect the impurity to scatter bath fermions from inside to outside the Fermi sea. The modified ground state \(|\text{FP}\rangle\) of the Fermi polaron system (i.e. the \(\downarrow\) impurity + \(\uparrow\) bath) would therefore be expected to be of the form (working in the zero-momentum frame classically, or quantum mechanically fixing an eigenstate of total momentum \(\textbf 0\)):
where the \(…\) indicates \(N\) particle-hole excitations for \(N\geq 2\). Ignoring the \(…\) terms, this is called the Chevy ansatz and can be used as a trial ground state with \(\alpha_0,\alpha_{\textbf k,\textbf k’}\) the fitting parameters to be tuned such as to minimize the Rayleigh-Ritz energy quotient \(E=\frac{\langle\text{FP}|H|\text{FP}\rangle}{\langle \text{FP}|\text{FP}\rangle}\) in the variational method. The ground state energy eigenvalue \(E\) obtained in this manner is an estimate of the polaron energy. Explicitly:
As another corollary, the fitting parameter \(\alpha_0\), once fitted, gives the polaron residue:
\[Z:=|\alpha_0|^2\leq 1\]
The (unobservable) bare strength \(g_{\uparrow\downarrow}=g_{\uparrow\downarrow}(k^*)\) should be taken to run with the (unobservable) UV cutoff \(k^*\to\infty\) such as to keep \(a_s\) fixed! Specifically, through the renormalization condition:
But \(\psi(\textbf x)=e^{ikz}+f(\textbf k,\textbf k’)\frac{e^{ikr}}{r}\), and it’s that divergent \(1/r\) piece in the scattered spherical wave that will cause trouble, because \(\psi(\textbf 0)\) seems to blow up due to it. But rather than letting it blow up, allow it to be some large number, call it \(k^*/\pi\) (clearly dimensionally okay). Then \(\psi(\textbf 0)=1+f(\textbf k,\textbf k’)k^*/\pi\). Substituting and isolating for \(f\) gives:
The point is that now, suddenly, a new low-energy parameter enters the game: the \(s\)-wave scattering length \(a_s\)! Notice it hasn’t appeared in any equations yet. But since we only care about low energies/momenta \(\textbf k\to\textbf 0\), we know we must have the limit \(f(\textbf k,\textbf k’)\to -a_s\) as \(\textbf k\to \textbf 0\). It’s a sort of limit/correspondence-principle-like knot at the end of a string that the theory has to approach. Making that substitution, one obtains the running of the bare coupling with the UV cutoff, which turns out to be the same as the above renormalization condition.
More precise derivation:
Write the Born series for the scattering amplitude:
By defining the transition operator \(T_{\uparrow\downarrow,s}|\textbf k\rangle:=V_{\uparrow\downarrow,s}|\psi_{\textbf k}\rangle\), which can be easily checked to obey \(T=V+VG_0T\) with \(G_0=(E_{\textbf k}1-H_0)^{-1}\) the free particle resolvent, then because \(\langle\textbf k|V_{\uparrow\downarrow,s}|\textbf k’\rangle=g\) for \(V_{\uparrow\downarrow,s}=g\delta^3(\textbf X)\), \(\textbf k’\) doesn’t even matter (i.e. \(s\)-wave scattering is isotropic!) so it can be used as a dummy index for the summation. Further, owing to this isotropy, \(f(\textbf k)=f(k)\):
In the end, once you set \(\textbf k:=\textbf 0\) so that \(f(0)=-a_{\uparrow\downarrow,s}\), you get the same thing. Here, the subtleties are that the series can be summed by letting \(V\to\infty\) so that \(\frac{1}{V}\sum_{\textbf k’}\to\int\frac{d^3\textbf k’}{(2\pi)^3}\); that if you take \(\langle\textbf x|\textbf k\rangle=e^{i\textbf k\cdot\textbf x}\) then the correct identity resolution is \(\frac{1}{V}\sum_{\textbf k}|\textbf k\rangle\langle\textbf k|\) for quantization volume \(V\); and that \(|\textbf k’\rangle\) is an eigenstate of the resolvent, \(G_0|\textbf k’\rangle=\frac{1}{E_{\textbf k}-E_{\textbf k’}}|\textbf k’\rangle\).
(there are both attractive and repulsive Fermi polarons, so this polarization effect can go either way). In the attractive case, if the attraction is strong enough, the polaron can dimerize with a bath fermion, forming a molecule; this polaron-molecule transition is interesting.
Surprisingly, the Chevy ansatz works remarkably well (i.e. it agrees with state-of-the-art diagrammatic quantum Monte Carlo calculations), and it even seems to include the dimer bound state.
———————-
There are \(2\) key assumptions about the typical regime of ultracold atomic gases, namely \(n^{-1/3},\lambda_T\gg r_{vdW}\sim 100a_0\).
In the vicinity of a broad Feshbach resonance, the scattering amplitude may be approximated by the Mobius transformation \(f_s(k)=-\frac{1}{ik+a^{-1}_s}\). However, in the vicinity of a narrow Feshbach resonance, one also needs to parameterize it with the effective range \(r_{\text{eff}}\) so that \(f_s(k)=-\frac{1}{ik+a^{-1}_s-\frac{1}{2}r_{\text{eff}}k^2}\). Although \(a_s\) and \(r_{\text{eff}}\) are determined by microscopic details of \(V_{\uparrow\downarrow}(r)\), different microscopic details in another potential \(\tilde V_{\uparrow\downarrow}(r)\) can lead to the same low-energy scattering amplitude \(f_s(k)\). The practical corollary of this observation is that one can do just that, namely substitute for \(V_{\uparrow\downarrow}(r)\) a suitable pseudopotential.
Problem: Consider a toy model of the Fermi polaron in which the \(\downarrow\) impurity interacts with only the nearest \(\uparrow\) fermion in the Fermi sea, the rest of the \(\uparrow\) Fermi sea serving to exert a pressure that effectively confines the relative distance between the \(\downarrow\) impurity and that \(\uparrow\) fermion to a radius \(R\). By equating the ground state energy of the infinite spherical potential well with the Fermi energy \(E_F\), show that:
\[R=\frac{\pi}{k_F}\sqrt{\frac{m_{\uparrow}}{\mu}}\]
Hence, show that for a positive-energy eigenstate \(E=\frac{\hbar^2k^2}{2m}\) the wavenumber \(k\) is determined through the \(s\)-wave scattering length \(a_s\) by:
\[k\cot kR=a^{-1}_s+R^*k^2\]
(where the Bethe-Peierls boundary condition is used). Show that for \(m_{\uparrow}=m_{\downarrow}\) and \(R^*=0\), this simplifies to:
By considering the scaled energy from the Fermi energy \(\frac{E-E_F}{E_F}\) which in this case amounts to \(2(k/k_F)^2-1\), plot this as a function of \(-1/k_Fa_s\).
Problem: Explain how Ramsey interferometry works.
Solution: Apply two \(\pi/2\)-pulses separated by some time \(\Delta t\); Ramsey fringes are then seen as a function of this temporal separation \(\Delta t\); it is a bit like a time-domain analog of a Mach-Zehnder interferometer.
Problem: Consider a system of \(N=2\) identical non-interacting spin \(s=1/2\) fermions in an infinite potential well of width \(L\) (nodes at \(x=0,L\)). Write down the general \(2\)-body wavefunction \(\Psi(1,2)\) for the system’s ground state, \(1^{\text{st}}\) excited state, and \(2^{\text{nd}}\) excited state by calculating Slater determinants of the single-fermion spin-orbitals.
Solution: Use the notation \(\chi_{n,m_s}\) for the spin-orbital (note the word “orbital” here isn’t really meant in the sense of e.g. orbital angular momentum \(\textbf L\), but rather in the chemist sense of “atomic orbital” though the \(2\) notions aren’t entirely disjoint):
\[\chi_{n,m_s}=|\psi_n\rangle\otimes|m_s\rangle\]
where \(\psi_n(x)=\sqrt{\frac{2}{L}}\sin\frac{n\pi x}{L}\) is the position space wavefunction of the \(n^{\text{th}}\) single-particle energy eigenstate. Use the notation \(\chi_{n,m_s}(i)\) to refer to the spin-orbital for the (arbitrarily labelled) \(i^{\text{th}}\) fermion (in this case \(i=1,2\)), for example:
Now consider the ground state. Because the fermions are non-interacting, this amounts to the requirement \(n_1=n_2=1\), thus fixing the spatial part of the allowed spin-orbitals. In particular, since the spatial parts are identical, the spin parts cannot satisfy \(m^{(1)}_s=m^{(2)}_s\), otherwise \(\Psi(1,2)=0\); this is just the Pauli exclusion principle. Arbitrarily letting \(m^{(1)}_s=-m^{(2)}_s=1/2\), it follows that the ground state manifold is one-dimensional, and spanned by the ground state:
So the spatial part of the \(2\)-body wavefunction is clearly \(1\leftrightarrow 2\) symmetric while the singlet spin part is \(1\leftrightarrow 2\) antisymmetric, ensuring total antisymmetry \(\Psi(2,1)=-\Psi(1,2)\).
Meanwhile, for the system’s \(1^{\text{st}}\) excited state:
Problem: Briefly describe how the results above would be affected if instead it were \(N=2\) identical non-interacting spin \(s=1\) bosons.
Solution: Instead of Slater determinants, one would use “Slater permanents”, or just “permanents” for short. These are basically calculated in the same way as a determinant except all minus signs become plus signs. This implicitly enforces total symmetry \(\Psi(2,1)=\Psi(1,2)\) of the wavefunction. In addition, because \(2s+1=3\) now, the degeneracies of each manifold would be enhanced.
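A small sketch of the determinant-vs-permanent computation via the Leibniz expansion (pure Python and illustrative only; practical codes use LU decomposition for determinants and Ryser's formula for permanents):

```python
from itertools import permutations
from math import prod

def sign(p):
    # parity of a permutation given as a tuple of 0-indexed values
    s, q = 1, list(p)
    for i in range(len(q)):
        while q[i] != i:
            j = q[i]
            q[i], q[j] = q[j], q[i]
            s = -s
    return s

def det(M):
    # Leibniz expansion: signed sum over all permutations
    n = len(M)
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def permanent(M):
    # the permanent: same expansion, but every sign becomes +1
    n = len(M)
    return sum(prod(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

M = [[1, 2], [3, 4]]
print(det(M), permanent(M))  # -> -2 10
```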
Problem: Define the \(N\)-symmetrizer \(\mathcal S_N\) and the \(N\)-antisymmetrizer \(\mathcal A_N\) operators on the space \(\mathcal H^{\otimes N}\) of \(N\) identical particles (where \(\mathcal H\) is a single-particle state space).
Solution: The \(N\)-symmetrizer \(\mathcal S_N:\mathcal H^{\otimes N}\to S^N\mathcal H\) is defined by:
note that \(\#S_N=N!\). Strictly speaking, each permutation \(\sigma\in S_N\) is an abstract group element with no intrinsic action on states; what appears in the sum is really its image under a faithful representation of \(S_N\) on \(\mathcal H^{\otimes N}\) (so a more pedantic notation could be \(\hat{\sigma}\) for the operator associated to \(\sigma\)). For instance, if \(N=6\) and \(\sigma=(352)(14)\) in cycle notation, then:
\[\sigma\Psi(1,2,3,4,5,6)=\Psi(4,3,5,1,2,6)\]
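This action is easy to sketch in code (representing \(\sigma\) as an explicit lookup table, an illustrative choice):

```python
# sigma = (352)(14) in cycle notation: 3->5, 5->2, 2->3, 1->4, 4->1, 6->6
sigma = {1: 4, 2: 3, 3: 5, 4: 1, 5: 2, 6: 6}

def act(sigma, args):
    # (sigma Psi)(x_1, ..., x_N) = Psi(x_{sigma(1)}, ..., x_{sigma(N)})
    return tuple(args[sigma[i] - 1] for i in range(1, len(args) + 1))

print(act(sigma, (1, 2, 3, 4, 5, 6)))  # -> (4, 3, 5, 1, 2, 6)
```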
Problem: What is the explicit connection between the Slater permanents/determinants and the symmetrizer/antisymmetrizer?
Solution: The idea is to symmetrize/antisymmetrize the Hartree product ansatz, essentially providing the bridge to Hartree-Fock. This immediately yields Slater permanents/determinants respectively (up to a normalization):
Problem: Establish the following useful properties of \(\mathcal S_N\) and \(\mathcal A_N\):
i) \[\sigma\mathcal S_N=\mathcal S_N\sigma=\mathcal S_N\] and \[\sigma\mathcal A_N=\mathcal A_N\sigma=\text{sgn}(\sigma)\mathcal A_N\] for any \(\sigma\in S_N\).
ii) \[\mathcal S^{\dagger}_N=\mathcal S_N\] and \[\mathcal A^{\dagger}_N=\mathcal A_N\] (i.e. the \(N\)-symmetrizer and \(N\)-antisymmetrizer are both Hermitian observables).
iii) \(\mathcal S^2_N=\mathcal S_N\) and \(\mathcal A^2_N=\mathcal A_N\) (i.e. \(\mathcal S_N\) and \(\mathcal A_N\) are orthogonal projectors, which makes sense because they project onto the symmetric subspace \(S^N\mathcal H\) and the antisymmetric subspace \(\bigwedge^N\mathcal H\) respectively; this also explains the factor of \(1/\#S_N\) in the definition, which ensures idempotence).
iv) \[[\mathcal S_N,H]=[\mathcal A_N,H]=0\] where \(H\) is any permutation-symmetric observable (e.g. the Hamiltonian of a system of identical particles).
Solution:
Problem: Show that for \(N=2\) identical particles, \(\mathcal S_2+\mathcal A_2=1\) resolves the identity on the space \(\mathcal H^{\otimes 2}\), but that for \(N=3\) identical particles \(\mathcal S_3+\mathcal A_3\neq 1\).
Solution:
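A brute-force numeric sketch of both claims (taking \(\mathcal H=\textbf C^2\), an illustrative choice; the permutation operators are built explicitly as matrices on \((\textbf C^2)^{\otimes N}\)):

```python
from itertools import permutations, product
from math import factorial

def sign(p):
    # parity of a permutation given as a tuple of 0-indexed values
    s, q = 1, list(p)
    for i in range(len(q)):
        while q[i] != i:
            j = q[i]
            q[i], q[j] = q[j], q[i]
            s = -s
    return s

def sym_plus_antisym(n, d=2):
    """Matrix of S_N + A_N on (C^d)^{(x)n}, summing permutation operators explicitly."""
    D = d ** n
    M = [[0.0] * D for _ in range(D)]
    flat = lambda idx: sum(i * d ** (n - 1 - k) for k, i in enumerate(idx))
    for p in permutations(range(n)):
        # P_p carries weight 1/N! in S_N and sgn(p)/N! in A_N, so (1 + sgn(p))/N! in the sum
        w = (1 + sign(p)) / factorial(n)
        if w == 0:
            continue
        for idx in product(range(d), repeat=n):
            M[flat(tuple(idx[p[k]] for k in range(n)))][flat(idx)] += w
    return M

def is_identity(M):
    return all(abs(M[r][c] - (1.0 if r == c else 0.0)) < 1e-12
               for r in range(len(M)) for c in range(len(M)))

print(is_identity(sym_plus_antisym(2)))  # -> True
print(is_identity(sym_plus_antisym(3)))  # -> False
```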
Problem: The purpose of the previous problems wasn’t so much to actually compute all those \(N\)-body wavefunctions \(\Psi(1,…,N)\), but rather to force one to compute them in order to realize how tedious and redundant the whole business is. That is, because the particles are identical, it is highly inefficient to be asking “which state is which particle in” since the notion of “which particle” is meaningless. Instead, common sense dictates that the more efficient question to ask is “how many particles are in each state?” as this doesn’t care about which particle is which. This conceptual simplification lies at the heart of the second quantization approach to the quantum mechanics of a many-body system; it is also called the occupation number representation.
The canonical example of this is the ideal Bose/Fermi gas which, although it can be described by linear combinations of \(N\)-body wavefunctions with \(N!\) terms, for \(N\sim 10^{23}\) this quickly becomes inconvenient, so in practice one sweeps the wavefunctions under the rug and just speaks about occupation numbers \(N_{\textbf k}\) of various \(\textbf k\)-states, aka the Bose-Einstein and Fermi-Dirac distributions.
A warning: second quantization is only really useful for systems of identical particles; if the \(N\) particles were all distinguishable then in principle one can still use the second quantization framework, but in that case it doesn’t offer any advantage over just plain wavefunction language (aka first quantization).
Now then, go back to the earlier problems and redo everything in second quantization/occupation number language.
Solution:
Problem: What is the name for a many-body quantum state \(|N_1,N_2,…\rangle\) written in the occupation number representation? What is the name of the space that such states live in?
Solution: Many-body quantum states in the occupation number representation are called Fock states, and live in Fock space \(\mathcal F:=\oplus_{N=0}^{\infty}\mathcal F_N\), which accommodates a variable number of particles via the various \(N\)-particle sectors \(\mathcal F_N\).
Problem: How does one navigate the Fock space \(\mathcal F\)?
Solution: Using creation and annihilation operators. To be precise, for a given basis of the single-particle space \(\mathcal H\), one associates a creation and an annihilation operator to each basis state…
Problem: Derive the commutation relations for the bosonic creation/annihilation operators and similarly derive the anticommutation relations for the fermionic creation/annihilation operators, starting from … . This shows that commutation and anticommutation relations completely encode the permutation symmetries of the bosonic and fermionic states.
Problem: What does it mean for a linear operator \(H\) to be an \(N\)-body operator?
Solution: An operator \(H\) is said to be an \(N\)-body operator iff there exists a decomposition of \(H\) in the form:
\[H=\sum_i H_i\]
where each operator \(H_i\) acts only on \(N\) particles at a time, acting as the identity operator on all other particles. For instance, the kinetic energy operator for an arbitrary system of particles (whether identical or not) is a \(1\)-body operator, as are most external potentials. By contrast, common \(2\)-body operators include interaction potentials between particles.
Problem: Given a system of \(N\) identical particles (fermions or bosons), and a \(1\)-body operator \(H\), and a basis \(\{|i\rangle\}\) of the single-particle Hilbert space, explain why the second quantization functor acting on \(H\) is given by the “dictionary”:
Solution: Because the matrix elements are preserved under this homomorphism, the functor is sort of “unitary” in a way. More precisely, matrix elements between any \(2\) Fock states are unchanged.
Problem: Write the one-body Rabi drive operator \(V_{\text{Rabi}}=\frac{\hbar}{2}\tilde{\boldsymbol{\Omega}}\cdot\boldsymbol{\sigma}\) in \(2^{\text{nd}}\) quantization.
Solution: Since the states \(\{|\textbf k,\sigma\rangle\}\) are a basis for the single-particle Hilbert space:
Problem: Write the \(2\)-body scattering contact pseudopotential operator \(V:=g\delta^3(\textbf X-\textbf X’)\) in \(2^{\text{nd}}\) quantization.
Problem: What is meant by the phrase “elementary excitations” of an ideal Fermi gas?
Solution: Basically “excitations” is a fancy word for “excited states”, in this case more precisely “many-body excited states”. One example is depicted in the diagram below. Note that the elementary excitations are the subset of all excitations involving only a single fermion creation/annihilation operator (i.e. only removing \(1\) fermion from the Fermi sea, or only adding \(1\) fermion to a state beyond the Fermi sea; thus particle-hole excitations are not elementary excitations but could be thought of as composed of \(2\) elementary excitations).
Problem: What is a rule of thumb for the difference between quasiparticles and collective excitations?
Solution: Quasiparticles (e.g. holes) are fermionic while collective excitations (e.g. phonons) are bosonic.
Problem: Given a many-body interacting system of identical fermions, explain the necessary (but not sufficient) condition of adiabaticity that the fermion interactions must fulfill in order for that system of identical fermions to deserve the name/classification of being a “Fermi liquid”.
Solution: A Hamiltonian \(H\) is said to be adiabatically connected to another Hamiltonian \(H_0\) iff there exists a smooth path in “Hamiltonian space” from \(H_0\to H\). Here “smooth” means no level crossings of the eigenstates, or \(\Leftrightarrow\) no phase transitions. Also, just as in thermodynamics when one sketches e.g. an isotherm or adiabat on a \(pV\)-diagram, which is always implicitly showing a quasistatic and in fact reversible process connecting a bunch of equilibrium states together, so here too the “path” in Hamiltonian space should be traversed sufficiently slowly as a function of time \(t\) (as quantified for instance by the adiabatic theorem). Of course in the lab, the interactions in \(H\) are already “on”, so these rather technical minutiae are at best a gedankenexperiment.
A system of identical fermions with interacting Hamiltonian \(H\) only has any hope of being a Fermi liquid if \(H\) is adiabatically connected to the corresponding non-interacting Hamiltonian \(H_0\) of an ideal Fermi gas as a reference system. In other words, from a topological/phase diagram perspective, Fermi liquids are a subset of the connected component of the ideal Fermi gas.
The “meaty corollary” that adiabaticity brings with it is that, if it is satisfied, then there must exist a bijection between the eigenstates of the ideal Fermi gas and those of the interacting Fermi liquid. The essence of theoretical physics is to map hard problems to easy problems! Moreover, these sorts of isomorphisms reveal a lot of deep connections/symmetries (e.g. in this case experiments found linear heat capacities, constant Pauli paramagnetism, etc. which were predicted qualitatively in the ideal Fermi gas model despite interactions…this isomorphism is the essence of Landau’s explanation of that remarkable observation).
The “rigorous proof” of this isomorphism goes back to the gedankenexperiment above, i.e. that it should be possible to “trace the footsteps” of each non-interacting eigenstate \(N_{\textbf k}\) of \(H_0\) through state space to end up at the unique, corresponding eigenstate of \(H\).
Henceforth, write \(\tilde N_{\textbf k}\) to denote the eigenstate of the interacting fermion system \(H\) adiabatically connected to \(N_{\textbf k}\); note that \(N_{\textbf k}\neq \frac{1}{e^{\beta (E_{\textbf k}-\mu)}+1}\) can be any arbitrary, possibly non-equilibrium occupation number distribution in \(\textbf k\)-space, so long as it’s compatible with Pauli exclusion \(N_{\textbf k}\in\{0,1\}\). This is roughly saying that “eigenvalues are more robust than eigenvectors” with respect to perturbations, or in quantum lingo, “quantum numbers are more robust than eigenstates”; although \(\textbf k\) in general will no longer be a good quantum number for basically any kind of interactions, one can sort of “pretend” that it’s a good quantum number anyway by using it as an adiabatic label for the interacting eigenstates of \(H\).
Problem: (something about the ansatz…)
Solution: For a degenerate ideal Fermi gas, the occupation numbers are given by a sharp Fermi-Dirac step \(N_{\textbf k}=\Theta(k_F-|\textbf k|)\) and the total energy is \(E=\frac{3}{5}NE_F\). Now suppose one were to shuffle the fermions around in \(\textbf k\)-space, effectively moving some fermions from the \(|\textbf k|<k_F\) Fermi sea (leaving behind holes) and promoting them to the unoccupied region \(|\textbf k|>k_F\). Then within the \(|\textbf k|<k_F\) Fermi sea, if a fermion was removed from a particular \(\textbf k\)-state, then the occupation number \(N_{\textbf k}\) of that \(\textbf k\)-state will have decreased by \(\Delta N_{\textbf k}=-1\). Similarly, if that fermion is then added to a \(\textbf k\)-state outside the Fermi sea \(|\textbf k|>k_F\), the occupation number of that \(\textbf k\)-state would increase by \(\Delta N_{\textbf k}=1\). It is thus clear that the total energy of the degenerate ideal Fermi gas would increase from its initial value of \(E=\frac{3}{5}NE_F\) by an amount:
where the sum \(\sum_{\textbf k}\) is of course over all \(\textbf k\)-states, both inside and outside the Fermi sea (and due to the monotonically increasing nature of the free particle dispersion \(\sim|\textbf k|^2\), the positive contributions from outside the Fermi sea will necessarily overwhelm the negative contributions from within the Fermi sea leading to an increase \(\Delta E>0\) as mentioned above).
So far this discussion has been for an ideal (aka non-interacting) Fermi gas. What happens if the fermions can now interact with each other (e.g. Coulombic repulsion between electrons)? Then Landau postulated that the above expression should be replaced by:
Problem: Show that for \(T\ll T_F\) and \(E-\mu\ll\mu\), the quasiparticle lifetime \(\tau\) goes like:
\[\tau=\frac{\hbar\mu}{a(E-\mu)^2+b(k_BT)^2}\]
for dimensionless constants \(a,b\in\textbf R\) of \(O(1)\). In particular, for a \(T=0\) degenerate Fermi liquid, one has \(\tau\sim 1/(E-E_F)^2\) so the closer the energy \(E\) of the quasiparticle is to the Fermi energy \(E_F\), the longer-lived it is.
Solution: This can be obtained simply from an application of Fermi’s golden rule to the (obviously dominant!) scattering process of a quasiparticle with momentum \(|\textbf k|>k_F\) colliding with a Fermi sea particle of momentum \(|\textbf k_2|<k_F\) and ending up as \(2\) quasiparticles outside the Fermi sea \(|\textbf k’_1|,|\textbf k’_2|>k_F\). This decay channel is obviously dominant because of Pauli blocking (what else could possibly happen?) and is enforced by the corresponding step functions (each either \(0,1\)) in the density of final states:
(aside: are the assumptions of Fermi’s golden rule sufficiently fulfilled in this case?). Actually, to extract the \(T\)-dependent part of \(\tau_{\textbf k}\), instead of step functions one should linearize the Fermi-Dirac distribution at temperature \(T\) about the Fermi surface. This is justified because, with a little thought (it can be proven mathematically), all of the wavevectors need to be quite close to the Fermi surface for the momentum and kinetic energy constraints to be satisfiable.
Problem: What are \(2\) examples of quantum systems to which Landau’s Fermi liquid theory applies?
Solution: Normal (i.e. not superfluid) \(^3\text{He}\) and normal (i.e. not superconducting) conduction electrons in a conductor (the latter case has more complications due to the long-range nature of the Coulomb electrostatic repulsion, whereas in \(^3\text{He}\) it’s just the short-range van der Waals/Lennard-Jones interaction).
Problem: Distinguish between supervised learning and unsupervised learning.
Solution: In supervised learning, the machine is trained on a labelled training set \(\{(\textbf x_1,y_1),…,(\textbf x_N,y_N)\}\) consisting of feature vectors \(\textbf x_i\) with their target values \(y_i\). The goal is then to fit a function \(\hat y(\textbf x)\) (historically called a hypothesis, also called a model) to this data in order to predict/estimate the target value of arbitrary feature vectors \(\textbf x\) beyond the training set.
In unsupervised learning, the machine is simply given a bunch of data \(\textbf x_1,…,\textbf x_N\) and asked to find patterns/structure within the data. The “supervision” in supervised learning thus refers to the target labels \(y_i\) provided during training.
Problem: What are the \(2\) most common types of supervised learning?
Solution: If the range of \(\hat y\) is at most countable (e.g \(\{0,1\},\textbf N\), etc.) then this type of supervised learning is called classification. If instead the range of \(\hat y\) is uncountable (e.g. \(\textbf R\)) then this type of supervised learning is called regression (cf. the distinction between discrete and continuous random variables).
Problem: What are some kinds of unsupervised learning?
Solution: Clustering, etc.
Problem: (clarifying the theory of multivariate linear regression)
Problem: On using Python to do univariate linear regression, suggesting multiple ways/libraries to do it.
Problem: Write down using mathematical notation the iterative formula for gradient descent minimization of an arbitrary function \(C(\textbf x)\), then write down the same formula using programming notation.
is a mean-square cost function for a linear regression model \(\hat y(\textbf x|\textbf w,b)=\textbf w\cdot\textbf x+b\) attempting to predict a training set \(\{(\textbf x_1,y_1),…,(\textbf x_N,y_N)\}\) (here \(\textbf w\) is a weight vector and \(b\) is called a bias), what do the update rules for gradient descent look like in mathematical notation (explicitly)?
Solution: Note that \(\textbf w_n\) and \(b_n\) should be updated simultaneously:
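These simultaneous updates can be sketched in NumPy (a minimal illustration, assuming a training matrix whose rows are the feature vectors; the gradients follow from the mean-square cost above):

```python
import numpy as np

def gradient_descent_step(w, b, X, y, alpha):
    """One simultaneous update of (w, b) for the mean-square cost
    C = (1/2N) * sum((w.x_i + b - y_i)^2)."""
    N = len(y)
    residual = X @ w + b - y          # shape (N,)
    grad_w = X.T @ residual / N       # dC/dw
    grad_b = residual.mean()          # dC/db
    return w - alpha * grad_w, b - alpha * grad_b  # updated together

# Toy training set {(1,1),(2,2),(3,3)}: the optimum is w = 1, b = 0.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
w, b = np.zeros(1), 0.0
for _ in range(5000):
    w, b = gradient_descent_step(w, b, X, y, alpha=0.1)
```

The key point is that both gradients are evaluated at the *old* \((\textbf w,b)\) before either parameter is overwritten.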
Problem: What are the \(2\) key advantages of using NumPy?
Solution: Vectorization and broadcasting.
Vectorization is what it sounds like: do all your computations with NumPy arrays (i.e. ndarray) rather than explicit Python loops, so that the work is dispatched to NumPy’s universal functions (ufuncs), which are precompiled \(C\) loops. (Crudely speaking, NumPy arrays are Cartesian tensors in math/physics.) Typically one never has to bother with Python loops.
Broadcasting allows standard operations such as scalar multiplication \(\textbf x\mapsto c\textbf x\) (and more generally elementwise operations between arrays of compatible shapes) to occur without manually replicating data.
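As a quick illustration of broadcasting (a minimal sketch):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
c = 2.0
scaled = c * x                  # scalar broadcast against a vector

A = np.arange(6).reshape(2, 3)  # shape (2, 3)
row = np.array([10, 20, 30])    # shape (3,)
shifted = A + row               # row is broadcast across both rows of A
```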
Problem: Implement gradient descent in Python for the training set \(\{(1,1),(2,2),(3,3)\}\) using a linear regression model. In particular, for a fixed initial guess, determine the optimal learning rate \(\alpha\). Also try different initial guesses? And batch gradient descent vs. mini-batch…
Solution:
$\textbf{Problem}$: Write Python code to generate two random vectors $\textbf w,\textbf x\in\mathbf R^{10^7}$ and compute their dot product $\textbf w\cdot\textbf x$:
i) Without vectorization (i.e. using a for loop)
ii) With vectorization
Show that vectorization significantly speeds up the computation of $\textbf w\cdot\textbf x$.
$\textbf{Solution}$:
import numpy as np  # import the NumPy library
from numpy import random  # import the random module from the NumPy library
from time import time  # import the time library

n = int(1e7)
random.seed(62831853)  # seed for "predictable" random vectors
w = random.rand(n)  # random vector with n entries between 0 and 1
print(w)
x = random.rand(n)  # random vector with n entries between 0 and 1
print(x)

# Without vectorization:
t_initial = time()
dot_product = 0
for i in np.arange(n):
    dot_product = dot_product + w[i] * x[i]
t_final = time()
print(f"Dot product value: {dot_product}")
print(f"Time taken: {t_final-t_initial} seconds")

# With vectorization:
t_initial = time()
dot_product = np.dot(w, x)
t_final = time()
print(f"Dot product value: {dot_product}")
print(f"Time taken: {t_final-t_initial} seconds")
$\textbf{Problem}$:
i) Generate (from a fixed seed) $1000$ random points in the $xy$-plane which are dispersed around the line $y=3x+1$ with standard deviation $\sigma_y=50$, in the range $0\leq x\leq 100$.
ii) Using the univariate linear regression model $\hat y=wx+b$ with initial guess $w=b=0$, apply gradient descent with learning rates $\alpha=10^{-6},10^{-5},10^{-4}$ for $100$ iterations each and plot the corresponding learning curves.
iii) Show, for each of the previous values of $\alpha$, the machine learning of $(w,b)$ in a suitable plane.
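The solution snippets in this notebook call helpers grad_descent and C that aren't shown; a minimal sketch consistent with parts i)-ii) might look like the following (vectorizing C over parameter histories is an assumption, made so that a whole descent trajectory can be plotted at once):

```python
import numpy as np

rng = np.random.default_rng(62831853)          # fixed seed, as in part i)
x = 100 * rng.random(1000)                     # 1000 abscissae in [0, 100]
y = 3 * x + 1 + rng.normal(0, 50, size=1000)   # dispersed about y = 3x + 1

def C(x, y, w, b):
    """Mean-square cost; vectorized over (w, b) histories for plotting."""
    w, b = np.atleast_1d(w), np.atleast_1d(b)
    residual = np.outer(w, x) + b[:, None] - y  # one row per (w, b) iterate
    return np.mean(residual**2, axis=1) / 2

def grad_descent(w, b, alpha, n_iter):
    """Run n_iter gradient descent iterations; return the (w, b) histories."""
    ws, bs = np.zeros(n_iter), np.zeros(n_iter)
    for i in range(n_iter):
        residual = w * x + b - y
        w = w - alpha * np.mean(residual * x)
        b = b - alpha * np.mean(residual)
        ws[i], bs[i] = w, b
    return ws, bs
```

With these definitions, C(x, y, weights, biases) evaluates the learning curve along an entire descent trajectory in one call.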
n = 100
w_init = 0
b_init = 0
iterations = np.arange(n)
costs = np.zeros(n)

with plt.style.context(['science']):
    plt.figure(figsize=(10, 5))
    for alpha in [1e-6, 1e-5, 1e-4]:
        w, b = grad_descent(w_init, b_init, alpha, n)
        plt.plot(iterations, C(x, y, w, b), label=r"$\alpha=$" + str(alpha))
    plt.title("Learning Curves for Gradient Descent for Several Learning Rates")
    plt.xlabel("Number of Iterations")
    plt.ylabel("Cost Function")
    plt.legend()
    plt.show()
with plt.style.context(['science']):
    plt.figure(figsize=(10, 5))
    for alpha in [1e-6, 1e-5, 1e-4]:
        weights, biases = grad_descent(w_init, b_init, alpha, n)
        plt.scatter(weights, biases, label=r"$\alpha=$" + str(alpha), s=0.5)
    plt.title(r"Machine Learning the Linear Regression Model Parameters $w,b$")
    plt.xlabel(r"$w$")
    plt.ylabel(r"$b$")
    plt.legend()
    plt.show()
Problem: Explain how feature renormalization and feature engineering can help to speed up gradient descent and obtain a model with greater predictive power.
Solution: Because the learning rate \(\alpha\) in gradient descent is, roughly speaking, a dimensionless parameter, it makes sense to nondimensionalize all feature variables in some sense. This is the idea of feature renormalization. For instance, one common method is to simply perform mean renormalization (here \(\mu\) is the mean of the feature over the training set, and \(x^*,x_*\) its max and min):
\[x\mapsto\frac{x-\mu}{x^*-x_*}\]
Another method is \(z\)-score renormalization:
\[x\mapsto\frac{x-\mu}{\sigma}\]
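Both renormalizations can be sketched per feature column in NumPy (the feature matrix here is illustrative):

```python
import numpy as np

# Rows are training examples, columns are features (illustrative values).
X = np.array([[2104.0, 5.0], [1416.0, 3.0], [1534.0, 3.0], [852.0, 2.0]])

# Mean renormalization: (x - mean) / (max - min), per feature column.
X_mean = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# z-score renormalization: (x - mean) / std, per feature column.
X_z = (X - X.mean(axis=0)) / X.std(axis=0)
```

After either transformation each feature is \(O(1)\), so a single learning rate \(\alpha\) works for all of them.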
Feature engineering is about using one’s intuition to design new features by transforming or combining original features. Typically, this means expanding the set of basis functions one works with, thus allowing one to curve fit even nonlinear functions. Thus, linear regression also works with nonlinear functions; the word “linear” in “linear regression” shouldn’t be thought of as “linear fit” but as “linear algebra”.
Problem: Now consider the other type of supervised learning, namely classification, and specifically consider the method of logistic classification (commonly called by the misnomer of logistic “regression” even though it’s about classification). Write down a table comparing linear regression with logistic classification with regards to their:
i) Model function \(\hat y(\textbf x|\textbf w,b)\) to be fit to the training set \(\{(\textbf x_1,y_1),…,(\textbf x_N,y_N)\}\).
ii) Loss functions \(L(\hat y,y)\) appearing in the cost function \(C(\textbf w,b)=\frac{1}{N}\sum_{i=1}^NL(\hat y(\textbf x_i|\textbf w,b),y_i)\).
Solution:
where, as a minor aside, the loss function for logistic classification can also be written in the explicit “Pauli blocking” or “entropic” form:
\[L(\hat y,y)=-y\ln\hat y-(1-y)\ln(1-\hat y)\]
Indeed, it is possible to more rigorously justify this choice of loss function precisely through such maximum-likelihood arguments. For simplicity, one can simply think of this choice of \(L\) for logistic classification as ensuring that the corresponding cost function \(C\) is convex so that gradient descent can be made to converge to a global minimum (which wouldn’t have been the case if one had simply stuck with the old quadratic cost function from linear regression). A remarkable fact (related to this?) is that the explicit gradient descent update formulas for each iteration look exactly the same for linear regression and logistic classification:
just with the model function \(\hat y(\textbf x|\textbf w,b)\) specific to each case.
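A minimal sketch of this shared update rule in the logistic case (the toy data and helper names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_step(w, b, X, y, alpha):
    """Same update formulas as linear regression, with y_hat = sigmoid(w.x + b)."""
    N = len(y)
    residual = sigmoid(X @ w + b) - y
    return w - alpha * X.T @ residual / N, b - alpha * residual.mean()

# Toy 1D set: class 0 for x < 0, class 1 for x > 0.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = np.zeros(1), 0.0
for _ in range(2000):
    w, b = logistic_step(w, b, X, y, alpha=0.5)
```

Only the model \(\hat y\) inside the residual differs from the linear-regression step; the update itself is identical.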
Problem: Given a supervised classification problem involving some training set to which a logistic sigmoid is fit with some optimal weights \(\textbf w\) and bias \(b\), how does the actual classification then arise?
Solution: One has to decide on some critical “activation energy”/threshold \(\hat y_c\in[0,1]\) such that the classification of a (possibly unseen) feature vector \(\textbf x\) is \(\Theta(\hat y(\textbf x|\textbf w,b)-\hat y_c)\in\{0,1\}\). Thus, the set of feature vectors \(\textbf x\) for which \(\hat y(\textbf x|\textbf w,b)=\hat y_c\) is called the decision boundary.
Problem: Using the scikit-learn library, show how to perform logistic classification given \(N\) training examples of feature \(n\)-vectors \(\textbf x\in\textbf R^n\) with \(\hat y_c=0.5\).
Solution:
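A minimal sketch using scikit-learn's LogisticRegression (the toy training set is illustrative; predict thresholds predict_proba at \(\hat y_c=0.5\) by default):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training set: N = 8 feature 2-vectors with binary targets.
X = np.array([[0.5, 0.4], [0.8, 0.3], [0.1, 0.1], [0.9, 0.5],
              [3.2, 2.9], [2.8, 3.1], [3.0, 3.0], [2.7, 2.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)   # fits w (clf.coef_) and b (clf.intercept_)
probs = clf.predict_proba(X)[:, 1]     # y_hat(x | w, b) for each training x
labels = clf.predict(X)                # thresholds at y_hat_c = 0.5
```

For a different threshold \(\hat y_c\neq 0.5\) one would threshold the predict_proba output manually.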
Problem: Explain what it means to regularize a cost function \(C(\textbf w,b)\) and what the purpose of this is.
Solution: It means adding an isotropic convex paraboloid component in the weight \(\textbf w\) (analogous to a harmonic trap in ultracold atoms), i.e.
(optionally though not typically one can also regularize the bias term \(b\) by adding \(\lambda b^2/2N\) to the cost function \(C\)). Here, \(\lambda\), like \(\alpha\), is another hyperparameter (sometimes called a regularization parameter). The purpose is to avoid overfitting (a.k.a. high variance) when one has a lot of features and a relatively small training set (naturally, other ways to address overfitting include having more training examples and feature selection).
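A sketch of how the \(\lambda|\textbf w|^2/2N\) term modifies the mean-square cost and the gradient step (helper names are illustrative):

```python
import numpy as np

def regularized_cost(w, b, X, y, lam):
    """Mean-square cost plus the isotropic paraboloid (lam/2N)|w|^2."""
    N = len(y)
    residual = X @ w + b - y
    return (residual @ residual) / (2 * N) + lam * (w @ w) / (2 * N)

def regularized_step(w, b, X, y, alpha, lam):
    """Gradient step; the penalty adds lam*w/N to dC/dw, shrinking w."""
    N = len(y)
    residual = X @ w + b - y
    grad_w = X.T @ residual / N + lam * w / N
    return w - alpha * grad_w, b - alpha * residual.mean()

# Toy comparison: without vs. with regularization on {(1,1),(2,2),(3,3)}.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
w_plain, b_plain = np.zeros(1), 0.0
w_reg, b_reg = np.zeros(1), 0.0
for _ in range(3000):
    w_plain, b_plain = regularized_step(w_plain, b_plain, X, y, 0.1, 0.0)
    w_reg, b_reg = regularized_step(w_reg, b_reg, X, y, 0.1, 10.0)
```

The \(\lambda=0\) run recovers the ordinary least-squares fit \(w=1\), while the regularized run trades some training fit for a smaller \(|w|\).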
Problem: Consider placing a fictitious open surface in an equilibrium ideal gas at temperature \(T\); although the net particle current density through such a surface would be \(\textbf J=\textbf 0\), if one only counts the particles that go through the surface from one side to the other, then show that the resulting unidirectional particle current density \(J\) is non-zero, and given by:
\[J=\frac{1}{4}n\langle v\rangle\]
where \(n=p/k_BT\) is the number density and \(\langle v\rangle=\sqrt{8k_BT/\pi m}\) the average speed.
Solution:
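The result \(J=\frac{1}{4}n\langle v\rangle\) can be sanity-checked by Monte Carlo, sampling Maxwell-Boltzmann velocities (each Cartesian component Gaussian) in units where \(m=k_BT=1\):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 2_000_000
# Maxwell-Boltzmann velocities in units m = k_B T = 1: each component ~ N(0, 1).
v = rng.normal(size=(M, 3))
speed = np.linalg.norm(v, axis=1)

mean_speed = speed.mean()                      # <v> = sqrt(8/pi) in these units
flux_per_n = np.clip(v[:, 2], 0, None).mean()  # J/n = <v_z Theta(v_z)>

ratio = flux_per_n / mean_speed                # should be 1/4
```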
Problem: By an analogous calculation, show that the unidirectional kinetic energy current density \(S\) for an ideal gas (which one might also think of as a heat flux \(S=q\)) is given by:
\[S=\frac{1}{2}nk_BT\langle v\rangle\]
And hence, show that the average kinetic energy of particles hitting a wall is enhanced by a factor of \(4/3\) compared to the bulk kinetic energy \(\frac{3}{2}k_BT\) per particle.
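This \(4/3\) enhancement can be checked by Monte Carlo, sampling Maxwell-Boltzmann velocities and flux-weighting the kinetic energy by \(v_z\Theta(v_z)\) (units \(m=k_BT=1\)):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 2_000_000
v = rng.normal(size=(M, 3))          # Maxwell-Boltzmann, units m = k_B T = 1
ke = 0.5 * (v**2).sum(axis=1)        # kinetic energy; bulk average is 3/2
vz_pos = np.clip(v[:, 2], 0, None)   # v_z Theta(v_z)

bulk_ke = ke.mean()                             # = 3/2 k_B T per particle
wall_ke = (ke * vz_pos).mean() / vz_pos.mean()  # flux-weighted KE = 2 k_B T

enhancement = wall_ke / bulk_ke                 # should be 4/3
```

Faster particles cross the surface more often, so the crossing population is biased toward higher kinetic energy.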
Problem #\(1\): Describe how the classical Hall coefficient \(\rho^{-1}\) is defined and explain why it’s “causally intuitive”.
Solution #\(1\): In the classical Hall effect, the “cause” is both an applied current density \(J\) together with an applied perpendicular magnetic field \(B\). The “effect” is an induced transverse electric field \(E\) whose magnitude and direction are such as to ensure a velocity selector steady state. So it seems reasonable to define the Hall coefficient by:
where the notation \(\rho^{-1}\) is deliberately suggestive of being the reciprocal charge density which is also what the Hall effect is. Note that here \(E\) only represents the transverse component of the electric field, i.e. \(E=-\textbf E\cdot(\textbf J\times\textbf B)/JB\), as there may also be a longitudinal component e.g. to compensate for scattering and other resistances.
The simplest way to derive this is to just set the Lorentz force density \(\textbf f\) to zero:
Since \(J,B\) are applied by the experimentalist, they are readily known, and \(E\) can be readily measured via a suitable Hall voltage \(\Delta\phi_H=-\int d\textbf x\cdot\textbf E\) in the transverse direction (voltages are always experimentally accessible). The classical Hall effect thus provides a simple way to directly measure the charge density \(\rho\) (via the Hall coefficient \(\rho^{-1}\)), and hence the number density of charge carriers \(n=\rho/{\pm e}\). Strictly speaking, this assumes a single charge-carrier species; for semiconductors it would be a bit more complicated, so perhaps one shouldn’t denote it by \(\rho^{-1}\) but just by \(R_H\).
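A numerical illustration with assumed, copper-like values (the numbers are illustrative, not from the text):

```python
# Illustrative numbers (assumed): a copper-like carrier density n ~ 8.5e28 m^-3,
# applied current density J = 1e6 A/m^2, and field B = 1 T.
e = 1.602e-19      # elementary charge, C
n = 8.5e28         # carrier number density, m^-3
J, B = 1e6, 1.0    # A/m^2 and T

R_H = 1 / (n * e)  # Hall coefficient = reciprocal charge density 1/rho
E = R_H * J * B    # induced transverse field, E = JB/(n e)

# Inverting: measuring E (via a Hall voltage) recovers the carrier density.
n_measured = J * B / (e * E)
```

The tiny transverse field (tens of microvolts per metre here) is why Hall measurements typically use thin samples and sensitive voltmeters.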
Problem #\(2\): Delve into the quantum Hall effect.
In sufficiently symmetric geometries, the method of images provides a way to solve Poisson’s equation \(|\partial_{\textbf x}|^2\phi=-\rho/\varepsilon_0\) in a domain \(V\) subject to either Dirichlet or Neumann boundary conditions (required for the uniqueness theorem to hold) by strategically placing charges in the “unphysical region” \(\textbf R^3-V\) such as to ensure the boundary conditions are met. It works because of linearity and the fact that, by placing image charges outside the physical region \(V\), one isn’t tampering with \(\rho\) in that region, so Poisson’s equation truly is solved.
In the following problems, the goal is to compute (in the suggested order):
The electrostatic potential \(\phi(\textbf x)\) everywhere (i.e. both in regions of free space and inside materials).
The electrostatic field \(\textbf E(\textbf x)=-\frac{\partial\phi}{\partial\textbf x}\)
The induced charge density \(\sigma\) on any conducting surfaces, along with the total charge \(Q\) on such surfaces.
The force \(\textbf F\) between any conductors.
The internal fields (\(\textbf D,\textbf E,\textbf P,\phi\)) and bound charge distributions \(\rho_b,\sigma_b\) for any dielectrics.
The resistance/self-capacitance/self-inductance/mutual capacitance/mutual inductance of any conductors? (although that isn’t really electrostatics anymore…)
Problem: Consider placing a point charge \(q\) at the point \((0,0,z)\) a distance \(z\) from an infinite planar conductor at \(z=0\).
Solution: Place an image point charge \(-q\) at \((0,0,-z)\).
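One can verify numerically that the charge-plus-image configuration makes \(z=0\) an equipotential \(\phi=0\) (a minimal check, in units where \(1/4\pi\varepsilon_0=1\)):

```python
import numpy as np

k = 1.0          # 1/(4 pi eps0), absorbed into units
q, d = 1.0, 2.0  # real charge q at (0, 0, d); image -q at (0, 0, -d)

def phi(x, y, z):
    r_plus = np.sqrt(x**2 + y**2 + (z - d)**2)   # distance to real charge
    r_minus = np.sqrt(x**2 + y**2 + (z + d)**2)  # distance to image charge
    return k * q / r_plus - k * q / r_minus

# The conductor surface z = 0 is an equipotential phi = 0, as required:
pts = np.random.default_rng(2).uniform(-5, 5, size=(100, 2))
on_plane = np.array([phi(x, y, 0.0) for x, y in pts])
```

On \(z=0\) the two distances are equal point by point, so the cancellation is exact, which is precisely the Dirichlet boundary condition the image construction was designed to enforce.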
Problem: Now instead of a point charge, consider a line charge with linear charge density \(\chi\).
Solution:
Problem: Instead of a line charge, place a line “cylinder” of charge of radius \(a\).
So eliminating the \(\cos\phi\), one finds that it is indeed possible to isolate solely for the ratio \(\rho_1/\rho_2\) as a function of constant parameters, confirming that it is an equipotential surface as required.
Aside: this is nothing more than Apollonius’s construction of a circle as the set of all points whose distances \(\rho,\rho'\) from \(2\) “foci” are in a fixed ratio \(\rho'/\rho\). Indeed, if the two foci are separated by a “semi-axis” \(a\) (thus their full separation is \(2a\)), then the distance \(d\) from the midpoint of the two foci to the center of the Apollonian circle and its radius \(R\) satisfy (using the extreme points on the circle):
Problem: Consider a magnetic dipole \(\boldsymbol{\mu}\) suspended above a superconducting surface so that on this surface all magnetic fields are expelled.
Problem: An electrostatic dipole \(\boldsymbol{\pi}\) a distance from a conducting plane?
Problem: Consider a conducting sphere in an asymptotically uniform background electrostatic field \(\textbf E_0\).
Problem: Replace the conducting sphere by an insulating sphere (aka a linear dielectric sphere) of permittivity \(\varepsilon\) (comment on how this relates to the Clausius-Mossotti relation).
Problem: Instead of linear dielectric sphere, consider linear diamagnetic sphere in a uniform magnetic field \(\textbf B_0\).
Problem: Consider an \(N\)-gon of conducting sheets (quadrupole, octupole, etc.)
Problem: A point charge in a conducting spherical cavity (Green’s function for that domain).
Problem: A point charge outside the sphere.
Problem: (example with infinitely many image point charges?)
These ideas extend immediately to potential flows in fluid mechanics…describe all the analogous situations and analogous results without doing all the work again. Similarly for steady-state temperature distributions, and anywhere that Laplace’s equation with suitable boundary conditions shows up.
Problem: Distinguish between the terms “intrinsic semiconductor” and “extrinsic semiconductor”.
Solution: An intrinsic semiconductor is pretty much what it sounds like, i.e. a “pure” semiconductor material like \(\text{Si}\) that is undoped with any impurity dopants. An extrinsic semiconductor is then basically the negation of an intrinsic semiconductor, i.e. one which is doped with impurity dopants, although conceptually one can think of it as being doped with charge carriers (either holes \(h^+\) in a \(p\)-type extrinsic semiconductor or electrons \(e^-\) in an \(n\)-type extrinsic semiconductor).
Problem: In the phrases \(p\)-type semiconductor and \(n\)-type semiconductor, what do the \(p\) and \(n\) represent?
Solution: In both cases, the extrinsic semiconductor (isolated from anything else) is neutral, even when doped. Rather, the \(p\) and \(n\) refer to the majority mobile/free charge carriers in the corresponding semiconductor; i.e. holes in the valence band and electrons in the conduction band respectively.
Problem: Show that the equilibrium number density \(n_{e^-}\) of mobile conduction electrons (i.e. not including the immobile core/valence electrons) thermally excited into the conduction band at temperature \(T\) is exponentially related to the gap \(E_C-\mu\) between the energy \(E_C\) at the base of the conduction band and the Fermi level \(\mu\):
\[n_{e^-}=n_Ce^{-\beta(E_C-\mu)}\]
where the so-called effective density of states:
\[n_C:=\frac{2g_v}{\lambda^{*3}_T}\]
is \(\approx\) the number density of available conduction band states at temperature \(T\) (here \(g_v\) is the valley degeneracy and \(\lambda^*_T\) is the thermal de Broglie wavelength with respect to the electron’s effective mass \(m^*\)).
Solution: To clarify some of the approximations used in that line with the \(\approx\), the upper bound on the conduction band \(E_{C,\text{max}}\to\infty\) can be safely taken to infinity because of the exponential suppression of the integrand by the Fermi-Dirac distribution for \(E\gg\mu\) (in fact, using Fermi-Dirac statistics in the first place assumes the electrons interact solely through Pauli blocking). In addition, the density of states \(g_C(E)\) is approximated by that of a free particle in the neighbourhood of the conduction band valley (with the usual \(\sqrt{E}\mapsto \sqrt{E-E_C}\) because \(g_C(E)=0\) in the \(E\in[E_V,E_C]\) band gap) and with \(m\mapsto m^*\) to reflect the local curvature of the conduction band which is inherited from the strength of the lattice’s periodic potential. Finally, to strengthen the earlier claim that \(E\gg\mu\), indeed, \(E\geq E_C\) is the range of the integral, and so a sufficient condition for \(E\gg\mu\) is \(E_C\gg\mu\) (in practice a few \(k_BT\) is sufficient). This is assumed to be the case, and constitutes the assumption of a non-degenerate semiconductor (cf. non-degenerate Fermi gas). In this case, the Fermi-Dirac distribution boils down to just its “Boltzmann tail” \(\frac{1}{e^{\beta(E-\mu)}+1}\approx e^{-\beta(E-\mu)}\):
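Plugging assumed silicon parameters into the formula \(n_C=2g_v/\lambda^{*3}_T\) (valley degeneracy \(g_v=6\) and effective mass \(m^*\approx 0.32\,m_e\); these values are illustrative, not from the text) gives \(n_C\) of order \(10^{19}\text{ cm}^{-3}\) at room temperature:

```python
import numpy as np

# Assumed Si conduction-band parameters (illustrative):
h = 6.626e-34      # Planck constant, J s
kB = 1.381e-23     # Boltzmann constant, J/K
m_e = 9.109e-31    # electron mass, kg
g_v, m_star, T = 6, 0.32 * m_e, 300.0

# Thermal de Broglie wavelength with respect to the effective mass m*:
lam_T = h / np.sqrt(2 * np.pi * m_star * kB * T)

n_C = 2 * g_v / lam_T**3   # effective density of states, m^-3
n_C_cm3 = n_C * 1e-6       # in cm^-3
```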
Problem: Repeat the above problem for holes to derive an analogous result for the equilibrium number density \(n_{h^+}\) of free conduction holes excited into the valence band at temperature \(T\):
\[n_{h^+}=n_Ve^{-\beta(\mu-E_V)}\]
where \(n_V\) is almost the same as \(n_C\) except that it’s derived from the effective mass \(m^*\) of the holes at the top of the valence band.
Solution: A few comments: if \(f(E)\) is the Fermi-Dirac distribution for electrons, then by the very definition of a hole as a vacancy/absence of an electron, the analog of the Fermi-Dirac distribution for holes (which can be considered fermionic quasiparticles) is \(1-f(E)\). In addition, a hole is considered to have more energy when it goes “downward” on a typical band diagram where the vertical axis \(E\) is really referring to the electron’s energy. This explains the counterintuitive limits on the integral:
Problem: If an intrinsic semiconductor is doped with impurity dopants to create an extrinsic semiconductor, say with a number density \(n_{d^+}\) of cationized donor dopants and \(n_{a^-}\) of anionized acceptor dopants, what constraint does charge neutrality of the semiconductor impose among the concentrations \(n_{e^-},n_{h^+},n_{d^+},n_{a^-}\)?
Solution:
\[-en_{e^-}+en_{h^+}+en_{d^+}-en_{a^-}=0\]
\[n_{h^+}+n_{d^+}=n_{e^-}+n_{a^-}\]
Conceptually, for every electron excited into the conduction band, the corresponding donor atom now becomes cationized; similarly, every hole excited into a valence band is really an acceptor atom anionizing as it accepts an electron from the valence band, so in the equation, it is conceptually meaningful to pair up \((n_{e^-},n_{d^+})\) and \((n_{h^+},n_{a^-})\). Note however this is not saying that they are equal; though they approach becoming equal the more heavily one dopes.
Problem: Show that, in an intrinsic semiconductor, the Fermi level \(\mu\) lies almost (but not exactly) at the midpoint \(\frac{E_V+E_C}{2}\) of the band gap.
Solution: An intrinsic semiconductor is undoped so \(n_{d^+}=n_{a^-}=0\). This implies from the charge neutrality argument above that \(n_{e^-}=n_{h^+}\) (i.e. every electron excited into the conduction band leaves a hole in the valence band). The rest of the argument is then just plugging in the earlier equilibrium free charge carrier concentrations and algebra:
In what follows, it will be useful to call this particular value of \(\mu\) the intrinsic Fermi level \(\mu_i\) since it is the Fermi level of an intrinsic semiconductor, prior to any extrinsic doping.
Problem:Define the intrinsic charge carrier concentration by \(n_i:=n_{e^-}=n_{h^+}\) for an intrinsic semiconductor, hence one has the so-called law of mass action \(n_{e^-}n_{h^+}=n_i^2\) (i.e. \(n_i^2\) is just a \(T\)-dependent equilibrium constant for the dissociation reaction \(0\to e^-+h^+\)). Show that the precise \(T\)-dependence of \(n_i\) is given by:
\[n_i\sim T^{3/2}e^{-E_g/2k_BT}\]
where the band gap \(E_g:=E_C-E_V\) (this result is sometimes also presented as \(n_i=n_Se^{-\beta E_g/2}\) where \(n_S:=\sqrt{n_Cn_V}\) is the geometric mean of the effective densities of states of the conduction and valence bands).
Solution:
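A numerical sketch of \(n_i=n_Se^{-\beta E_g/2}\) with assumed room-temperature silicon values (the band gap and effective densities of states are illustrative, not from the text); it reproduces the familiar \(n_i\sim 10^{10}\text{ cm}^{-3}\):

```python
import numpy as np

# Assumed room-temperature Si values (illustrative):
kT = 0.02585                  # k_B T in eV at T = 300 K
E_g = 1.12                    # band gap, eV
n_C, n_V = 2.8e19, 1.04e19    # effective densities of states, cm^-3

n_S = np.sqrt(n_C * n_V)              # geometric mean of n_C and n_V
n_i = n_S * np.exp(-E_g / (2 * kT))   # intrinsic carrier concentration, cm^-3
```

Note how small the Boltzmann factor is: only about one state in \(10^{9}\)-\(10^{10}\) of the available \(n_S\) is thermally occupied across the gap.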
Problem: Keeping temperature \(T\) fixed, consider \(n\)-type doping an intrinsic semiconductor with donor dopants, thus creating an \(n\)-type extrinsic semiconductor. The effect of this will be to raise the Fermi level from the intrinsic Fermi level \(\mu_i\) (appx. in the middle of the band gap as shown earlier) to a new value \(\mu\) much closer to the base of the conduction band \(E_C\). Show that the precise amount of this raising can be quantified by:
\[\mu-\mu_i=k_BT\ln\frac{n_d}{n_i}\]
where \(n_d\) is the (directly manipulable!) concentration of donor dopants doped into the intrinsic semiconductor, stating the \(2\) key assumptions underlying this.
(note, an analogous line of reasoning for a \(p\)-type semiconductor shows that the Fermi level is lowered by an amount:
\[\mu_i-\mu=k_BT\ln\frac{n_a}{n_i}\]
where \(n_a\) is the (also directly manipulable) concentration of acceptor dopants).
Solution: At equilibrium, one has for an intrinsic semiconductor:
\[n_i=n_Ce^{-\beta(E_C-\mu_i)}\]
and for an \(n\)-type doped extrinsic semiconductor:
\[n_{e^-}=n_Ce^{-\beta(E_C-\mu)}\]
Taking the ratio yields:
\[\mu-\mu_i=k_BT\ln\frac{n_{e^-}}{n_i}\]
At this point, the goal is to justify why \(n_{e^-}\approx n_{d}\). This proceeds in \(2\) stages.
1. First, justify that \(n_{e^-}\approx n_{d^+}\), the concentration of cationized donor dopants. This follows by setting \(n_{a^-}=0\) in the earlier charge neutrality constraint (since no acceptor dopants are added, \(n_a=0\Rightarrow n_{a^-}=0\)) and using the law of mass action to replace \(n_{h^+}=n_i^2/n_{e^-}\), obtaining a quadratic equation for \(n_{e^-}\) whose physical solution is:
At this point, one assumes that the semiconductor is fairly heavily doped, in particular \(n_d\gg n_i\) (typical values in \(\text{Si}\) are \(n_d\sim 10^{16}\text{ cm}^{-3}\) while \(n_i\sim 10^{10}\text{ cm}^{-3}\)). This allows one to approximate \(n_{e^-}\approx n_{d^+}\).
2. Then, to justify why \(n_{d^+}\approx n_d\), one has to assume that the donor dopants are shallow in the sense that the binding energy of their extra valence electron is comparable to \(k_BT\), so that it is easily excited into the conduction band. In other words, assume almost complete cationization of the donor dopants. This is just the statement that \(n_{d^+}\approx n_d\), as desired.
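With the typical Si values quoted above (\(n_d\sim 10^{16}\text{ cm}^{-3}\), \(n_i\sim 10^{10}\text{ cm}^{-3}\)), the raising of the Fermi level evaluates to about \(0.36\text{ eV}\):

```python
import numpy as np

kT = 0.02585   # k_B T in eV at T = 300 K
n_d = 1e16     # donor dopant concentration, cm^-3 (typical Si value)
n_i = 1e10     # intrinsic carrier concentration, cm^-3 (typical Si value)

shift = kT * np.log(n_d / n_i)   # mu - mu_i, in eV
```

Because of the logarithm, raising \(n_d\) by another order of magnitude raises \(\mu\) by only \(k_BT\ln 10\approx 60\text{ meV}\) more.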
Problem: In what regime is the non-degenerate semiconductor approximation valid?
Solution: For an \(n\)-type extrinsic semiconductor, a rule of thumb is that the Fermi level cannot rise to within \(2k_BT\) of the base of the conduction band \(E_C\):
\[E_C-\mu\geq 2k_BT\]
Inserting \(n_d=n_Ce^{-\beta (E_C-\mu)}\) (where the approximation \(n_{e^-}\approx n_d\) has been employed), one arrives at the rule of thumb that the donor dopant concentration cannot exceed:
\[n_d\leq e^{-2}n_C\approx 0.14 n_C\]
Similarly, for \(p\)-type doping, the acceptor dopant concentration cannot exceed:
\[n_a\leq 0.14n_V\]
(the more important takeaway here is not the exact numerical prefactors, but the fact that the Fermi level should stay a few \(k_BT\) away from \(E_C\) or \(E_V\) for the semiconductor to be considered non-degenerate; indeed these estimates came from using the Boltzmann formula that was derived from this assumption, so it should be taken with a grain of salt as one is using a theory to predict its own demise).
Problem: A \(p\)-\(n\) junction is formed by putting a \(p\)-type extrinsic semiconductor in contact with an \(n\)-type extrinsic semiconductor. Starting from a simple “top-hat” distribution of the free charge density \(\rho_f(x)\) in the depletion region \(-x_p<x<x_n\), sketch \(\rho_f(x)\), \(E(x)\) and \(\phi(x)\).
Solution: Assuming an abrupt junction and sharp cutoff for the depletion region at \(x_p, x_n\) respectively, one has:
Problem: Make the sketches more quantitative. In particular, calculate the width \(x_n+x_p\) of the depletion region and the maximum strength \(E_{\text{max}}\) of the electrostatic field at the junction \(x=0\).
Solution:
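A sketch using the standard abrupt-junction results (the doping levels and built-in potential below are assumed for illustration, not from the text):

```python
import numpy as np

# Assumed abrupt Si junction parameters (illustrative):
eps = 11.7 * 8.854e-12   # Si permittivity, F/m
e = 1.602e-19            # elementary charge, C
V_bi = 0.7               # built-in potential, V
n_a = n_d = 1e22         # doping, m^-3 (= 1e16 cm^-3)

# Standard abrupt-junction results (charge neutrality: n_a x_p = n_d x_n):
W = np.sqrt(2 * eps * V_bi * (1 / n_a + 1 / n_d) / e)  # depletion width x_p + x_n
x_n = W * n_a / (n_a + n_d)                            # n-side extent
E_max = e * n_d * x_n / eps                            # field at the junction x = 0
```

For these numbers the depletion width comes out to a few tenths of a micron; note also the consistency check \(E_\text{max}=2V_\text{bi}/W\), since \(\phi\) is the area under the triangular \(E(x)\) profile.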
Problem: Explain qualitatively how a depletion region forms at a \(p\)-\(n\) junction.
Solution: Intuitively, the electrons and holes cannot just keep diffusing indefinitely across the \(p\)-\(n\) junction: as they diffuse and recombine, they leave behind uncompensated ionized dopants, so charge builds up on either side and eventually prevents any further diffusion. Put another way, as the charge separation gets bigger and bigger, the induced \(\textbf E\)-field pointing from \(n\to p\) exerts an electric force on the electrons and holes that prevents them from crossing the junction; at equilibrium, this forms a depletion region where there are no mobile charge carriers.
Problem: Just as a harmonic oscillator can be free or driven, so a \(p\)-\(n\) junction can also be “free” (just sitting there with its built-in potential \(V_{\text{bi}}\)) or it can be “driven” as well in a sense, more precisely by applying an external voltage \(V\) across it. However, unlike say with resistors/capacitors/inductors where the polarity of this voltage doesn’t really matter, here the asymmetry of the \(p\)-type vs. \(n\)-type semiconductors on either side, and thus the corresponding asymmetry of \(V_{\text{bi}}\), means that the polarity of \(V\) matters. Sketch qualitative band diagrams to show how the \(p\)-\(n\) junction’s bands change in the case of both forward bias \(V>0\) or reverse bias \(V<0\). This underlies the principle of operation for some (though not all) kinds of diodes, sometimes called a \(p\)-\(n\) semiconductor diode.
Solution: Some words of explanation: forward biasing a \(p\)-\(n\) junction lowers the effective built-in potential from \(V_{\text{bi}}\mapsto V_{\text{bi}}-V\). This clearly increases the conductivity of both electrons and holes across the junction now that the energy barrier is reduced. By contrast, reverse biasing a \(p\)-\(n\) junction only raises the effective built-in potential \(V_{\text{bi}}\mapsto V_{\text{bi}}+V\), reducing the conductivity of electrons and holes as the depletion region gets bigger.
When the \(p\)-\(n\) junction is initially unbiased so that \(V=0\):
After forward biasing \(V>0\):
The reverse-biased case is just opposite to the forward-biased case, and not shown. Note also that this is not an instance of quantum tunnelling, because it’s not just a simple top-hat potential barrier, there is no probability current across the depletion region, and indeed also no electric current, only diffusion current as elaborated later.
Another way to put it is that forward-biasing the \(p\)-\(n\) junction encourages the majority charge carriers on each side to diffuse across the depletion region (and discourages the minority carriers, but that doesn’t matter anyways because they are minority), while reverse-bias is the opposite.
Problem: Recall that in an intrinsic semiconductor at finite \(T>0\), the very few charge carriers in the conduction band are purely thermal electrons excited from their corresponding thermal holes in the valence band. Upon \(p\)-type or \(n\)-type doping the intrinsic semiconductor, the creation of hydrogenic acceptor states just above the valence band, or donor states just below the conduction band, causes holes to become the majority charge carrier (in the valence band) and electrons the minority charge carrier (in the conduction band) in the \(p\)-type extrinsic semiconductor, and vice versa for \(n\)-type. (That isn’t to say that the thermal electrons and thermal holes \(n_{e^-,i}=n_{h^+,i}=n_i\sim 1.5\times 10^{10}\text{ cm}^{-3}\) aren’t still there; they just become negligible.)
Solution:
Problem: Calculate the reverse saturation current \(I_{\text{sat}}\) appearing in the diode law \(I(V)=I_{\text{sat}}(e^{V/V_T}-1)\) (so that \(I_{\text{sat}}=-\lim_{V\to-\infty}I(V)\)) for a \(p\)-\(n\) junction semiconductor diode with the following parameters:
\[n_a=\]
To sort out later:
At T=0 K, mu is not really well-defined (b/c g(E)=0 in the band gap, so mu could be put anywhere in there..) but for T>0 K it is well-defined…
For n-type doping, by putting extra atoms near the bottom of the conduction band, will increase the chemical potential…(all this comes from interpreting mu as a silly fit parameter that needs to be tuned to get integral g(E)f(E) = number density of conduction electrons in the system = number density of donor dopants).
For p-type doping, add strongly electronegative atoms, they rip off electrons from the valence band. leaving additional holes in the valence band.
w.r.t. intrinsic concentrations of electrons and holes at T=300 K,
something about asymmetry of the densities of states…the presence of the donor and acceptor states in the band gap influences g(E) there…
Problem: A vibrating string with displacement profile \(y(x,t)\), non-uniform mass per unit length \(\mu(x)\), and non-uniform tension \(T(x)\) experiences both an internal restoring force due to \(T(x)\) and a linear “Hooke’s law” restoring force \(-k(x)y(x)\) everywhere, so that its equation of motion is governed by:
Problem: Make the “Noetherian interpretation” above more concrete by showing that the eigenvalue \(\omega^2\) can be expressed as a Rayleigh-Ritz quotient:
making it look like the usual Rayleigh-Ritz quotient employed in the quantum mechanical variational principle (although this glosses over the subtlety about boundary terms in the integration by parts).
Problem: Conversely, show that if one considers \(\omega^2=\omega^2[\psi]\) as a functional of \(\psi(x)\), then the functional is stationary on eigenstates \(\psi\) of the Sturm-Liouville operator \(H\) and with eigenvalue \(\omega^2[\psi]\).
Solution:
Problem: Show that eigenfunctions of the Sturm-Liouville operator with distinct eigenvalues are \(\mu\)-orthogonal.
Solution: (this proof assumes the eigenvalues have already been shown to be real; the proof of that essentially mirrors this proof, except setting the two indices equal, \(1=2\)):
Problem: Solve the following inhomogeneous \(2^{\text{nd}}\)-order ODE: