In crystals, electrons roam nearly free,
But weak potentials from atoms we see,
Energy bands form, with gaps in between,
Creating the structure of metals so keen.
Near Brillouin zones, waves scatter and split,
Forming conductors, insulators, bit by bit.
Semiconductors bridge these worlds with a leap,
Where electrons’ paths, the lattice will keep.
– ChatGPT, September \(18\)th, \(2024\)
One approach to modelling the behavior of electrons \(e^-\) in a crystal is to use perturbation theory. Specifically, what idealized model Hamiltonian \(\tilde H\) are we working off of and what’s the perturbation \(\Delta\tilde H\) being applied to it? As the name “nearly free electron model” suggests, we use \(\tilde H:=\frac{\textbf P^2}{2m}\) the free Hamiltonian (aka just the kinetic energy of the electron) and view the potential energy \(\Delta\tilde H:=V(\textbf X)\) due to the crystal lattice as a perturbation to this free Hamiltonian.
Let’s work for now on a circular crystal \(S^1\), say of radius \(R\). Of course, in order to use perturbation theory, we had better first understand the model Hamiltonian \(\tilde H\) well. And indeed we do; \(\tilde H\)-eigenstates are given by \(\langle x|k\rangle=\frac{1}{\sqrt{2\pi R}}e^{ikx}\) with energies \(E(k)=\frac{\hbar^2k^2}{2m}\) where \(x\) is the arc length along \(S^1\). But since the electron \(e^-\) is moving on \(S^1\) and not \(\textbf R\), the wavenumber is quantized \(k_n=n\kappa\) in units of the curvature \(\kappa:=1/R\).
So to summarize, if we consider only the model Hamiltonian \(\tilde H\), then the spectrum is the trivial, quadratic dispersion relation for the free electron \(E(k)=\frac{\hbar^2k^2}{2m}\):

Now let’s see how adding the perturbation \(V(x)\) will “butcher” the parabola graphed above. It turns out we don’t need to assume anything about \(V(x)\) other than that it’s periodic in \(x\) around the circle, i.e. \(V(x+2\pi R)=V(x)\). In fact, we have the stronger condition \(V(x+\Delta x)=V(x)\) where \(\Delta x=2\pi R/N\) is the spacing between adjacent atoms among the \(N\) atoms arranged around the circle \(S^1\). The periodicity of the potential should thus be thought of as following from the periodicity of the crystal (which would also hold in a non-compact space like \(\textbf R\)) rather than from the coincidental periodic topology of \(S^1\) in this case (chosen to avoid having to deal with edge effects).
Anyways, the key insight is that because \(V(x)\) is \(\Delta x\)-periodic, it admits a complex Fourier series \(V(x)=\sum_{n=-\infty}^{\infty}\hat V_n e^{2\pi inx/\Delta x}\) with Fourier coefficients \(\hat V_n=\frac{1}{\Delta x}\int_{0}^{\Delta x}V(x)e^{-2\pi inx/\Delta x}dx=\frac{1}{2\pi R}\int_{0}^{2\pi R}V(x)e^{-2\pi inx/\Delta x}dx\) for \(n\in\textbf Z\). What good does this do? Well, remember that we want to understand how \(V(x)\) perturbs the free particle dispersion spectrum \(E(k)=\hbar^2k^2/2m\). Although \(V(x)\) is a time-independent perturbation, we still need to decide whether we have to use degenerate or non-degenerate perturbation theory. Staring at the graph of the parabola above, we notice that, besides the \(k=0\) ground state, all other states \(k_n=\pm\kappa,\pm 2\kappa,\pm 3\kappa,…\) are doubly degenerate with \(E(k)=E(-k)\). If we focus on the ground state \(|n\rangle=|0\rangle\) first with energy \(E(0)=0\), non-degenerate perturbation theory asserts that the first-order energy correction is:
\[\langle 0|V|0\rangle=\int_0^{2\pi R}|\langle x|0\rangle|^2V(x)dx=\frac{1}{2\pi R}\int_0^{2\pi R}V(x)dx=\hat V_0\]
which is essentially just a “DC offset” term in the Fourier series of \(V(x)\) to set a reference, and so not physically meaningful. The second-order energy correction \(\Delta E_0^{(2)}<0\) to the ground state energy \(E_0=0\) is:
\[\Delta E_0^{(2)}=-\sum_{n\neq 0}\frac{|\langle n|V|0\rangle|^2}{E(k_n)}=-\sum_{n\neq 0}\frac{|\hat V_n|^2}{E(k_{nN})}=-\frac{4m}{N^2\hbar^2\kappa^2}\sum_{n=1}^{\infty}\frac{|\hat V_n|^2}{n^2}\]
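As a quick numerical sanity check of these formulas, here is a minimal Python sketch. The natural units, ring parameters, and sample cosine potential \(V(x)=2V_1\cos(2\pi x/\Delta x)\) (for which \(\hat V_{\pm 1}=V_1\) and all other coefficients vanish) are hypothetical choices of mine, not anything fixed by the setup above:

```python
import numpy as np

# Hypothetical parameters in natural units (hbar = m = 1).
hbar = m = 1.0
R = 1.0                  # ring radius, so kappa = 1/R
N = 10                   # number of atoms on the ring
kappa = 1.0 / R
dx = 2 * np.pi * R / N   # lattice spacing Delta x

# Sample potential V(x) = 2*V1*cos(2*pi*x/dx): hat_V_{+-1} = V1, all others 0.
V1 = 0.3
V = lambda x: 2 * V1 * np.cos(2 * np.pi * x / dx)

def fourier_coeff(n, num=20000):
    """hat_V_n = (1/dx) * integral over one unit cell of V(x) e^{-2 pi i n x / dx}."""
    x = np.linspace(0.0, dx, num, endpoint=False)
    return np.mean(V(x) * np.exp(-2j * np.pi * n * x / dx))

# Second-order ground-state shift: -(4m / (N hbar kappa)^2) * sum_{n>=1} |hat_V_n|^2 / n^2.
shift = -(4 * m / (N**2 * hbar**2 * kappa**2)) * sum(
    abs(fourier_coeff(n))**2 / n**2 for n in range(1, 6)
)
# Only n = 1 contributes for this sample potential, so shift = -4 m V1^2 / (N hbar kappa)^2.
```

For this single-harmonic potential the sum collapses to one term, which makes the \(1/N^2\) suppression of the ground-state shift easy to see directly.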
Now what about the doubly degenerate states \(|\pm k_n\rangle\)? For these, there is no reason a priori why we would somehow be allowed to get away with not using degenerate perturbation theory. So for now, let us work in the \(2\)-dimensional degenerate \(\tilde H\)-eigensubspace \(\text{span}_{\textbf C}|k_n\rangle,|-k_n\rangle\) for some \(n\in\textbf Z\). The matrix elements of the potential \(V\) in the basis \(\{|k_n\rangle:n\in\textbf Z\}\) for the entire state space \(\mathcal H\cong L^2(S^1\to\textbf C,dx)\) are a straightforward generalization of the earlier calculations for the ground state:
\[\langle k_n|V|k_m\rangle=\sum_{j=-\infty}^{\infty}\hat V_j\delta_{k_n-k_m,Nk_j}\]
Thus, the matrix element is guaranteed to vanish unless there exists an integer \(j\in\textbf Z\) such that \(j=\frac{n-m}{N}\), in which case \(\langle k_n|V|k_m\rangle=\hat V_{(n-m)/N}\) (and even then it could vanish if this Fourier coefficient happens to be zero). For the case \(m=-n\) that we’re interested in here, this means \(j=\frac{2n}{N}\in\textbf Z\). Thus, with \(N=101\) atoms for instance, the only degenerate pairs that might mix will be \(|k_{101}\rangle,|k_{-101}\rangle\), \(|k_{202}\rangle,|k_{-202}\rangle\), etc., whereas if \(N=100\) atoms then we get “twice as many” pairs of degenerate states that might mix, \(n=\pm 50,\pm 100\), etc. Thus the parity \((-1)^N\) of \(N\) seems to play a role here, though only as an artifact of the quantization of \(k\) on the finite ring.
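The selection rule is easy to verify by brute force. Here is a small Python check (the cosine sample potential and all parameter values are purely illustrative choices of mine) that \(\langle k_n|V|k_m\rangle\), computed by direct integration around the ring, vanishes unless \(N\mid(n-m)\):

```python
import numpy as np

# Hypothetical ring with N atoms and a dx-periodic sample potential.
R = 1.0
N = 10
L = 2 * np.pi * R        # circumference of S^1
dx = L / N               # lattice spacing Delta x
V1 = 0.3

x = np.linspace(0.0, L, 20000, endpoint=False)
V = 2 * V1 * np.cos(2 * np.pi * x / dx)   # hat_V_{+-1} = V1, all others 0

def matrix_element(n, m):
    """<k_n|V|k_m> = (1/L) * int_0^L e^{-i k_n x} V(x) e^{i k_m x} dx with k_n = n/R."""
    return np.mean(np.exp(-1j * (n - m) * x / R) * V)

# n - m = N picks out hat_V_1 = V1; n - m = 1 is not divisible by N and vanishes.
```

In particular, for the degenerate pair \(m=-n\) with \(n=N/2=5\), the element \(\langle k_5|V|k_{-5}\rangle\) equals \(\hat V_1=V_1\): exactly the zone-boundary mixing discussed next.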
Anyways, the point is that for the vast majority of degenerate pairs \(|k_n\rangle,|-k_n\rangle\), the periodic lattice potential energy perturbation \(V\) does not actually mix them. Put another way, the \(2\times 2\) matrix \([V|_{\text{span}_{\textbf C}|k_n\rangle,|-k_n\rangle}]_{|k_n\rangle,|-k_n\rangle}^{|k_n\rangle,|-k_n\rangle}\) is a diagonal matrix (i.e. the two complex conjugate off-diagonal matrix elements vanish). This means that the eigenvectors are just \((1,0)^T,(0,1)^T\) (so again, they don’t mix!) with corresponding eigenvalues the diagonal elements which are the expectations of \(V\) in \(|\pm k_n\rangle\) (and indeed \(\langle k_n|V|k_n\rangle=\hat V_0\) for all \(n\in\textbf Z\) so actually \([V|_{\text{span}_{\textbf C}|k_n\rangle,|-k_n\rangle}]_{|k_n\rangle,|-k_n\rangle}^{|k_n\rangle,|-k_n\rangle}=\hat V_0 1_{2\times 2}\)). But this is precisely the same first-order energy correction as if one had just directly applied non-degenerate perturbation theory! This is a general rule of thumb: if a particular \(\tilde H\)-eigenstate does not mix with any other degenerate \(\tilde H\)-eigenstates in the sense that their mutual matrix element with respect to the perturbation \(\Delta\tilde H\) vanishes, then one can effectively apply non-degenerate perturbation theory to that \(\tilde H\)-eigenstate despite it initially being degenerate. In this particular case, because all the energy corrections are just the DC reference of the potential \(\hat V_0\), this means that \(V\) does not even lift their degeneracy if they don’t mix.
Yet it turns out that the minority of degenerate pairs which do mix with respect to the perturbation \(V\) are the all-important pairs. Let \(|k_n\rangle,|-k_n\rangle\) be such a pair for which \(\langle k_n|V|-k_n\rangle=\hat V_{2n/N}\). Then one can check that they mix into the states \(|k_n\rangle,|-k_n\rangle\mapsto\frac{1}{\sqrt{2}}e^{i\arg(\hat V_{2n/N})}|k_n\rangle+\frac{1}{\sqrt 2}|-k_n\rangle,-\frac{1}{\sqrt{2}}e^{i\arg(\hat V_{2n/N})}|k_n\rangle+\frac{1}{\sqrt 2}|-k_n\rangle\) with respective energies \(E_{\pm}=\frac{\hbar^2k_n^2}{2m}+\hat V_0\pm |\hat V_{2n/N}|\) separated by an energy gap \(\Delta E=2|\hat V_{2n/N}|\).
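This \(2\times 2\) degenerate perturbation theory is easy to verify numerically. A quick Python sketch (the values of \(\hat V_0\) and the off-diagonal element \(v=\hat V_{2n/N}\) below are arbitrary placeholders):

```python
import numpy as np

# Degenerate 2x2 block of V in the basis {|k_n>, |-k_n>}: [[V0, v], [conj(v), V0]].
V0 = 0.1                         # DC Fourier coefficient hat_V_0 (placeholder)
v = 0.25 * np.exp(0.7j)          # off-diagonal element hat_V_{2n/N} (placeholder)

block = np.array([[V0, v], [np.conj(v), V0]])
evals = np.linalg.eigvalsh(block)   # Hermitian eigenvalues, sorted ascending
gap = evals[1] - evals[0]
# Expect evals = V0 -+ |v|, i.e. a gap of 2|v|, independent of the phase of v.
```

Note that only the modulus \(|v|\) enters the splitting; the phase \(\arg v\) shows up only in the mixed eigenvectors.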
Okay, now let’s clear something up; earlier we wrapped the crystal around on the circle \(S^1\) so that we didn’t have to worry about edge effects. This had the consequence that the wavenumbers \(k_n=n\kappa\) were quantized. However, let’s now brush that under the rug and pretend we’re working on \(\textbf R\) again where \(k\in\textbf R\) is no longer quantized. In this setting, the degenerate pairs \(|k\rangle,|-k\rangle\) may mix if \(k=\frac{n\pi}{\Delta x}\) for some \(n\in\textbf Z\) (or more intuitively, if one can form a “standing de Broglie wave” between adjacent atoms, \(n\lambda=2\Delta x\)). These special values of \(k\) define the boundaries of the various Brillouin zones \(K_n\subseteq\textbf R\) in \(k\)-space. Each Brillouin zone maps bijectively to an energy band \(E(K_n):=\{E(k):k\in K_n\}\), and adjacent bands are separated by energy band gaps \(\Delta E\). This then is the merit of the perturbative nearly free electron model; it establishes a logical equivalence between the seemingly disparate phenomena of periodic potentials and Brillouin zone/energy band/energy gap structure.

This is called the extended zone scheme. You can still vaguely see the outline of the unperturbed \(E(k)=\hbar^2k^2/2m\) parabolic spectrum that was previously there, now butchered as promised. In particular, I am omitting the proof that, away from the Brillouin zone boundaries, the spectrum is not changed much from the original parabola, whereas at the Brillouin zone boundaries the bands flatten out, \(\frac{\partial E}{\partial k}=0\), so that the dispersion is locally quadratic in the deviation from the boundary. The nearly free electron model is related to the tight-binding model in that both predict a very similar-looking band structure in the first Brillouin zone.
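One standard way to produce this butchered spectrum numerically (not done explicitly above, but consistent with it) is to truncate the Hamiltonian in the plane-wave basis and diagonalize. For the sample potential \(V(x)=2V_1\cos(2\pi x/\Delta x)\) (my own illustrative choice, with hypothetical natural units), the only nonzero off-diagonal elements are \(\hat V_{\pm 1}=V_1\), and a gap of \(\approx 2|V_1|\) opens at the zone boundary \(k=\pi/\Delta x\):

```python
import numpy as np

# Bloch Hamiltonian in a truncated plane-wave basis e^{i(k + m G)x}, m = -M..M:
# H_{mn}(k) = (hbar^2 (k + m G)^2 / 2 mass) delta_{mn} + V1 delta_{|m-n|,1}.
hbar = mass = 1.0
G = 2 * np.pi          # reciprocal lattice vector 2*pi/dx, with dx = 1
V1 = 0.05              # weak potential, so first-order degenerate theory applies
M = 7                  # truncation: 2M+1 plane waves

def bands(k):
    m = np.arange(-M, M + 1)
    H = np.diag(hbar**2 * (k + m * G)**2 / (2 * mass))
    H = H + V1 * (np.eye(2*M + 1, k=1) + np.eye(2*M + 1, k=-1))
    return np.linalg.eigvalsh(H)   # band energies at crystal momentum k, ascending

gap = bands(G / 2)[1] - bands(G / 2)[0]   # lowest gap at the zone boundary k = G/2
```

Sweeping `bands(k)` over the first Brillouin zone \(k\in[-G/2,G/2]\) and unfolding reproduces the extended zone scheme picture: nearly parabolic away from the boundaries, split by \(\approx 2|V_1|\) at them.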
A Digression on Floquet Theory
There is another, more abstract way to qualitatively see why a periodic potential \(V(x+\Delta x)=V(x)\) gives rise to Brillouin zones and energy bands.
Consider an \(N\)-dimensional linear dynamical system \(\dot{\textbf x}(t)=A(t)\textbf x(t)\) where \(A(t)\) can be time-dependent. In general, one expects this to have \(N\) linearly independent solutions \(\textbf x_1(t),\textbf x_2(t),…,\textbf x_N(t)\). A fundamental matrix solution \(X(t)\) to the linear dynamical system is any \(N\times N\) matrix comprised of a basis of \(N\) such linearly independent solutions \(X(t)=\left(\textbf x_1(t),\textbf x_2(t),…,\textbf x_N(t)\right)\); in other words, the linear independence implies that \(X(t)\in GL_N(\textbf R)\) is invertible \(\det X(t)\neq 0\) at all times \(t\in\textbf R\). Think of the fundamental matrix solution \(X(t)\) as an \(N\)-dimensional parallelotope evolving in time \(t\). It can be checked that any fundamental matrix solution satisfies the same system of ODEs \(\dot{X}(t)=A(t)X(t)\), and that any individual solution \(\textbf x(t)\) must lie in the column space \(\textbf x(t)\in\text{span}_{\textbf R}(\textbf x_1(t),…,\textbf x_N(t))\) of any fundamental matrix solution \(X(t)\) (indeed, this is why it’s called “fundamental”: it provides a basis for the entire solution space of the linear dynamical system). Thus, \(\textbf x(t)=X(t)X^{-1}(0)\textbf x(0)\). If \(X(0)=1_{N\times N}\), then \(X(t)\) is called a principal fundamental matrix solution and in that case we just have \(\textbf x(t)=X(t)\textbf x(0)\). Thus, the principal fundamental matrix solution plays a role analogous to the time evolution operator \(U(t)\) in quantum mechanics, evolving the initial state of a system through time \(t\).
Floquet theory is interested in a special case of the setup above, namely when \(A(t+T)=A(t)\) is a \(T\)-periodic matrix. In this case, it turns out that the solutions \(\textbf x(t)\) themselves will not necessarily be \(T\)-periodic (or periodic at all for that matter), but will nonetheless take on a much simpler form than the otherwise unenlightening time-ordered exponential solution \(\textbf x(t)=\mathcal Te^{\int_0^tA(t’)dt’}\textbf x(0)\) as a Dyson series. What’s that simplification? Let me spoil it first, then prove it afterwards. Recall that I just said \(\textbf x(t)\) won’t necessarily be periodic. But that should feel pretty strange since \(A(t)\) is by assumption \(T\)-periodic so surely \(\textbf x(t)\) should inherit something from this. And that’s exactly the essence of Floquet’s theorem, namely that although \(\textbf x(t)\) itself need not be periodic, its time evolution can always be factorized into a \(T\)-periodic \(N\times N\) matrix \(X_A(t)\) (with the subscript \(A\) to suggest that it inherits its \(T\)-periodicity \(X_A(t+T)=X_A(t)\) from the \(T\)-periodicity of the matrix \(A(t)\)) modulated by an exponential growth/decay or oscillatory “envelope” \(e^{\Lambda t}\) for some time-independent \(N\times N\) (possibly complex) matrix \(\Lambda\in\textbf C^{N\times N}\). In other words, one has the ansatz:
\[\textbf x(t)=X_A(t)e^{\Lambda t}\textbf x(0)\]
To prove Floquet’s theorem, we first ask: although \(\textbf x(t+T)\neq\textbf x(t)\) need not be periodic, how are \(\textbf x(t+T)\) and \(\textbf x(t)\) related to each other? Clearly by a simple \(T\)-time translation \(\textbf x(t+T)=X(t+T)X^{-1}(t)\textbf x(t)\). It makes sense then to focus on understanding these fundamental matrix solutions \(X(t)\) better.
The first key result of Floquet theory is that the time evolution of any fundamental matrix solution \(X(t)\) across a period \(T\) behaves multiplicatively \(X(t+T)=X(t)\tilde X_T\) where the \(N\times N\) monodromy matrix \(\tilde X_T=X^{-1}(t)X(t+T)\) is a conserved quantity of any \(T\)-periodic linear dynamical system:
\[\dot{\tilde X}_T=-X^{-1}(t)A(t)X(t+T)+X^{-1}(t)A(t+T)X(t+T)=0\]
It is thus common to initialize the monodromy matrix \(\tilde X_T=X^{-1}(0)X(T)\) and write \(X(t+T)=X(t)X^{-1}(0)X(T)\). After \(n\) periods elapse we have \(X(t+nT)=X(t)\tilde X_T^n\), so as a corollary for instance the hypervolume \(\det X(t)\) of the parallelotope grows stroboscopically in time \(t\) as a geometric sequence \(\det X(t+nT)=(\det \tilde X_T)^n\det X(t)\). Indeed, using Jacobi’s formula one can check that \(\frac{d}{dt}\det X(t)=\text{Tr}A(t)\det X(t)\), so \(\det X(t)=e^{\int_0^t\text{Tr}A(t’)dt’}\det X(0)\) is consistent with \(X(t+T)=X(t)\tilde X_T\) provided the monodromy matrix \(\tilde X_T\) and \(A(t)\) are related by \(\det\tilde X_T=e^{\int_0^T\text{Tr} A(t)dt}\) (despite the suggestion, note that in general \(\tilde X_T\neq e^{\int_0^T A(t)dt}\)! Everything would be simpler if this were true but alas it isn’t).
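These identities can be checked numerically. Below is a Python sketch using a Mathieu-type \(T\)-periodic system as a stand-in (the coefficients \(a,b\) and the RK4 integrator are my own arbitrary choices): since \(\text{Tr}A(t)=0\) here, we expect \(\det\tilde X_T=1\), and since \(X(0)=1_{2\times 2}\), the multiplicative law \(X(t+T)=X(t)\tilde X_T\) gives \(X(2T)=\tilde X_T^2\):

```python
import numpy as np

# Mathieu-type 2x2 periodic system x'' + (a + b cos t) x = 0, period T = 2*pi.
a, b, T = 2.0, 0.3, 2 * np.pi

def A(t):
    return np.array([[0.0, 1.0], [-(a + b * np.cos(t)), 0.0]])

def propagate(t0, t1, steps=4000):
    """RK4-integrate the principal fundamental solution X from t0 to t1."""
    X = np.eye(2)
    h = (t1 - t0) / steps
    for i in range(steps):
        t = t0 + i * h
        k1 = A(t) @ X
        k2 = A(t + h/2) @ (X + h/2 * k1)
        k3 = A(t + h/2) @ (X + h/2 * k2)
        k4 = A(t + h) @ (X + h * k3)
        X = X + (h/6) * (k1 + 2*k2 + 2*k3 + k4)
    return X

XT = propagate(0.0, T)        # monodromy matrix: X^{-1}(0) X(T) = X(T) here
X2T = propagate(0.0, 2 * T)   # should equal XT @ XT by the multiplicative law
# Tr A = 0, so det XT should be exp(0) = 1 (up to integration error).
```

The traceless structure of \(A\) (which will reappear in the Schrödinger application below) is exactly what pins \(\det\tilde X_T\) to \(1\).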
It is therefore intuitively clear that the behavior/stability of any periodic linear dynamical system is sensitive to its monodromy matrix \(\tilde X_T\), specifically whether it “blows up” or “decays” exponentially or oscillates (as would have been one’s first instinct given the periodicity \(A(t+T)=A(t)\) of \(A\))! More precisely, to understand the behavior of powers \(\tilde X_T^n\) of a matrix, it is always a good idea to diagonalize it (a subtle but important property of the monodromy matrix \(\tilde X_T\) proven on page \(52\) here is that although it depends on the choice of fundamental matrix solution \(X(t)\) used to construct it, it will be similar to any other monodromy matrix \(\tilde X’_T\) constructed from any other fundamental matrix solution \(X'(t)\); thus, the eigenvalues of the monodromy matrix \(\tilde X_T\) are properties only of the linear, periodic dynamical system \(\dot X=AX\) and therefore are rightfully called its characteristic multipliers).
Since \(\tilde X_T\in GL_N(\textbf R)\), these characteristic multipliers must be non-zero, and so all such characteristic multipliers can be written as \(e^{\lambda T}\) for some \(\lambda\in\textbf C\) called its characteristic exponent. Although the imaginary part \(\Im\lambda\) of such a characteristic exponent is not unique \(e^{(\lambda+2\pi i/T)T}=e^{\lambda T}\), the real part \(\Re\lambda\) is unique and physically meaningful, being given the name Lyapunov exponent. Specifically, it is clear that if a solution \(\textbf x(t)\) happens to initialize on \(\textbf x(0)=X(0)\tilde{\textbf x}\) where \(\tilde X_T\tilde{\textbf x}=e^{\lambda T}\tilde{\textbf x}\) is an eigenvector of the monodromy matrix \(\tilde X_T\), then one can check that such a solution \(\textbf x(t+T)=e^{\lambda T}\textbf x(t)\) evolves multiplicatively in an analogous manner to the fundamental matrix solution earlier. In particular, if one replaces this \(\textbf x(t)\mapsto \textbf x_A(t):=e^{-\lambda t}\textbf x(t)\), then clearly this new trajectory (but no longer a solution!) \(\textbf x_A(t)\) is \(T\)-periodic \(\textbf x_A(t+T)=e^{-\lambda t}e^{-\lambda T}\textbf x(t+T)=e^{-\lambda t}e^{-\lambda T}e^{\lambda T}\textbf x(t)=e^{-\lambda t}\textbf x(t)=\textbf x_A(t)\). Thus, one can isolate the form of this “eigentrajectory” as \(\textbf x(t)=e^{\lambda t}\textbf x_A(t)\). For \(\Re \lambda<0\) we have \(\lim_{t\to\infty}\textbf x(t)=\textbf 0\), for \(\Re \lambda>0\) we have \(\lim_{t\to\infty}\textbf x(t)=\boldsymbol{\infty}\) while for \(\Re \lambda=0\) the eigentrajectory \(\textbf x(t)\) is in general pseudoperiodic in that it ends up at the same distance \(|\textbf x(t+T)|=|\textbf x(t)|\) (but not necessarily the same point) away from the origin after each period \(t\mapsto t+T\). These criteria emphasize the role played by the Lyapunov exponent \(\Re\lambda\) in the analysis of stability. 
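As a numerical illustration of characteristic multipliers and Lyapunov exponents, consider a damped Mathieu-type system (all coefficients here are arbitrary choices of mine). Since \(\text{Tr}A(t)=-\gamma\) is constant, the multipliers must satisfy \(|\mu_1\mu_2|=\det\tilde X_T=e^{-\gamma T}\), i.e. the Lyapunov exponents \(\Re\lambda_i\) sum to \(-\gamma\):

```python
import numpy as np

# Damped Mathieu-type system x'' + gamma x' + (2 + 0.3 cos t) x = 0, T = 2*pi.
gamma, T = 0.2, 2 * np.pi

def A(t):
    return np.array([[0.0, 1.0], [-(2.0 + 0.3 * np.cos(t)), -gamma]])

def monodromy(steps=4000):
    """RK4-integrate the principal fundamental solution over one period."""
    X = np.eye(2)
    h = T / steps
    for i in range(steps):
        t = i * h
        k1 = A(t) @ X
        k2 = A(t + h/2) @ (X + h/2 * k1)
        k3 = A(t + h/2) @ (X + h/2 * k2)
        k4 = A(t + h) @ (X + h * k3)
        X = X + (h/6) * (k1 + 2*k2 + 2*k3 + k4)
    return X

multipliers = np.linalg.eigvals(monodromy())        # characteristic multipliers e^{lambda T}
lyapunov_sum = np.sum(np.log(np.abs(multipliers))) / T   # sum of Lyapunov exponents Re(lambda)
```

Negative Lyapunov exponents on average is exactly the decay one expects from the damping term, independent of the periodic driving.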
By lining up \(N\) such linearly independent eigentrajectories (we’ll assume that the monodromy matrix \(\tilde X_T\) is diagonalizable), it follows that any fundamental matrix solution \(X(t)\) admits a so-called Floquet normal form \(X(t)=X_{A}(t)e^{\Lambda t}\) where \(X_A(t+T)=X_A(t)\) inherits the \(T\)-periodicity of \(A\) and \(\Lambda\) is a time-independent \(N\times N\) matrix. This therefore proves Floquet’s theorem!
Connection to Periodic Potentials in Quantum Mechanics
Define \(\boldsymbol{\phi}(x):=(\psi(x),\psi'(x))^T\) so that the one-dimensional, second-order, time-independent Schrödinger equation \(-\frac{\hbar^2}{2m}\psi''(x)+V(x)\psi(x)=E\psi(x)\) reduces to the first-order linear system:
\[\boldsymbol{\phi}'(x)=A(x)\boldsymbol{\phi}(x)\]
where \(A(x)=\begin{pmatrix}0&1\\2m(V(x)-E)/\hbar^2 & 0\end{pmatrix}\) is \(\Delta x\)-periodic \(A(x+\Delta x)=A(x)\) by virtue of the \(\Delta x\)-periodicity of the potential \(V(x)\) itself. Now notice that clearly \(\text{Tr} A(x)=0+0=0\), so it follows from the earlier general Floquet theory results that \(\det\tilde \Phi_{\Delta x}=1\) where \(\tilde \Phi_{\Delta x}\) is the monodromy matrix of the periodic quantum system. This means that its two eigenvalues \(\varphi_{\pm}\) are given by:
\[\varphi_{\pm}=\frac{\text{Tr}\tilde \Phi_{\Delta x}}{2}\pm\sqrt{\frac{\text{Tr}^2\tilde \Phi_{\Delta x}}{4}-1}\]
Thus, when \(\text{Tr}\tilde \Phi_{\Delta x}<-2\) or \(\text{Tr}\tilde \Phi_{\Delta x}>2\), both \(\varphi_{\pm}\) are real with \(\varphi_+\varphi_-=1\), so one of them has modulus greater than \(1\), indicating a spatially unstable wavefunction \(\psi(x)\) that diverges in a non-normalizable manner as \(x\to\pm\infty\). Therefore, this is saying that there do not exist \(H\)-eigenstates with energy \(E\) such that the corresponding monodromy matrix \(\tilde \Phi_{\Delta x}=\tilde \Phi_{\Delta x}(E)\) has trace less than \(-2\) or greater than \(2\). These are the band gaps between the various energy bands! On the other hand, when \(\text{Tr}\tilde \Phi_{\Delta x}\in(-2,2)\), the multipliers \(\varphi_{\pm}=e^{\pm ik\Delta x}\) lie on the unit circle, so the solutions remain spatially bounded (delta-normalizable rather than \(L^2\)-normalizable); these energies make up the energy bands! Finally, exactly when \(\text{Tr}\tilde \Phi_{\Delta x}=\pm 2\) is where one reaches the band edges, i.e. the images of the boundaries separating the various Brillouin zones. Admittedly this Floquet analysis is a bit abstract and handwavy but I still think it’s a nice alternative perspective to the more concrete nearly free electron model.
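The trace criterion can be made concrete with the Kronig–Penney Dirac-comb model, a standard exactly solvable case (not derived above; the dimensionless strength \(P=m\alpha a/\hbar^2\) below is an arbitrary choice of mine). There, half the monodromy trace has the closed form \(\frac{1}{2}\text{Tr}\tilde\Phi_a(E)=\cos(qa)+P\frac{\sin(qa)}{qa}\) with \(q=\sqrt{2mE}/\hbar\) and \(a\) the comb spacing:

```python
import numpy as np

# Kronig-Penney Dirac comb: V(x) = alpha * sum_n delta(x - n a).
# Half the monodromy trace as a function of the dimensionless qa = sqrt(2mE) a / hbar.
P = 3.0   # dimensionless barrier strength m*alpha*a/hbar^2 (arbitrary choice)

def half_trace(qa):
    return np.cos(qa) + P * np.sin(qa) / qa

def in_band(qa):
    """|Tr| < 2, i.e. |half_trace| < 1: bounded Bloch-like solutions exist."""
    return abs(half_trace(qa)) < 1.0

# qa = pi/2 gives half-trace ~ 1.91 > 1, so that energy sits in a gap;
# qa = 2.8 gives half-trace ~ -0.58, so that energy sits in a band.
```

Scanning `half_trace` over \(qa\) traces out the familiar alternation of allowed bands (\(|\text{Tr}|<2\)) and forbidden gaps (\(|\text{Tr}|>2\)), with band edges exactly at \(\text{Tr}=\pm 2\).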
If we let \(\Phi(x)=\begin{pmatrix}\psi_1(x) & \psi_2(x) \\ \psi’_1(x) & \psi’_2(x)\end{pmatrix}\) be a fundamental matrix solution, then of course \(\det\Phi(x)\) is the usual constant Wronskian of the two wavefunctions \(\psi_1(x),\psi_2(x)\), and Floquet’s theorem allows us to write \(\Phi(x)=\Phi_V(x)e^{\Lambda x}\) for a \(\Delta x\)-periodic \(2\times 2\) matrix \(\Phi_V(x+\Delta x)=\Phi_V(x)\) whose \(\Delta x\)-periodicity is inherited from the \(\Delta x\)-periodicity of the potential \(V(x)\). This leads to Bloch’s theorem (in one dimension): every bounded energy eigenstate can be written as a Bloch state \(\psi_{k}(x)=e^{ikx}\phi_k(x)\) for some crystal momentum \(k\in[-\pi/\Delta x,\pi/\Delta x]\) in the first Brillouin zone, where \(\phi_k(x+\Delta x)=\phi_k(x)\) is \(\Delta x\)-periodic.