In number theory, the fundamental theorem of arithmetic explains why prime numbers are so important: they form a “multiplicative basis” with which one can uniquely factorize any positive integer \(n\in \textbf Z^+\). In the same spirit, Schur’s lemma plays roughly the role in representation theory that the fundamental theorem of arithmetic plays in number theory, except that the irreducible representations of a group \(G\) now take the place of the prime numbers. For this reason, I like to informally think of Schur’s lemma as the “fundamental theorem of representation theory”, since a whole edifice of results (e.g. the Schur orthogonality relations for characters, the Peter-Weyl theorem for compact Lie groups, the decomposition of tensor products of representations into direct sums of irreducible subrepresentations, etc.) is built on it.
Recall that given a group \(G\), two \(G\)-representations \(\phi_1:G\to GL(V_1)\), \(\phi_2:G\to GL(V_2)\) on vector spaces \(V_1,V_2\) are considered isomorphic \(G\)-representations iff there exists a vector space isomorphism \(L:V_1\to V_2\) that conjugates one representation into the other for all symmetries \(g\in G\), i.e. \(\phi_2(g)=L\phi_1(g)L^{-1}\). Equivalently, one can view \(L\) as intertwining the two \(G\)-representations, meaning \(\phi_2(g)L=L\phi_1(g)\) for all \(g\in G\).
Schur’s lemma has \(2\) parts. The first part is roughly like saying that if \(p\leq p'\) are two prime numbers, then either \(p\) does not divide \(p'\) or \(p=p'\):
Schur’s Lemma (Part 1): Let \(G\) be a group, let \(\phi_1:G\to GL(V_1)\) and \(\phi_2:G\to GL(V_2)\) be two irreducible \(G\)-representations on respective vector spaces \(V_1,V_2\). Suppose a linear map \(L:V_1\to V_2\) intertwines \(\phi_1\) with \(\phi_2\), so that \(\phi_2(g)L=L\phi_1(g)\) for all \(g\in G\). Then either \(L=0\) or \(L\) is invertible (in which case \(\phi_1\) and \(\phi_2\) are isomorphic \(G\)-irreps). In particular, if \(\phi_1\) and \(\phi_2\) are non-isomorphic \(G\)-irreps, then the only intertwiner between them is \(L=0\).
Proof: First, the idea is to show that the kernel \(\ker(L)\) and the image \(L(V_1)\) of the intertwining map \(L\) are respectively \(\phi_1(g)\)-invariant and \(\phi_2(g)\)-invariant subspaces of \(V_1\) and \(V_2\) respectively for all \(g\in G\). This is a straightforward computation relying explicitly on the fact that \(L\) is an intertwining map. Then invoke the irreducibility of \(\phi_1\) and \(\phi_2\) to conclude that either \(\ker(L)=\{\textbf 0\}\) or \(\ker(L)=V_1\) and similarly that \(L(V_1)=\{\textbf 0\}\) or \(L(V_1)=V_2\). Finally, if \(L\) were to be non-invertible, then either it is not injective (so that \(\ker(L)\) is non-trivial and must therefore be \(\ker(L)=V_1\)) or it is not surjective (so that \(L(V_1)\) cannot be all of \(V_2\) and must therefore be \(L(V_1)=\{\textbf 0\}\)). In either case then, we see that \(L=0\) as claimed. Otherwise if \(L\) is invertible, then it just reduces to the earlier definition for isomorphic \(G\)-representations.
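For concreteness, the “straightforward computation” in the proof can be sketched as follows, using nothing beyond the intertwining relation \(\phi_2(g)L=L\phi_1(g)\) (here \(v\in V_1\) and \(w\in V_2\) are just names for arbitrary vectors, not notation from the statement above):
\[v\in\ker(L)\implies L\phi_1(g)v=\phi_2(g)Lv=\textbf 0\implies\phi_1(g)v\in\ker(L),\]
\[w=Lv\in L(V_1)\implies\phi_2(g)w=\phi_2(g)Lv=L(\phi_1(g)v)\in L(V_1).\]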
There is a second part to Schur’s lemma which roughly speaking is the version of the above but with \(V_1=V_2:=V\) and \(\phi_1=\phi_2:=\phi\) (also, I could not think of any number theoretic analog for this part unfortunately).
Schur’s Lemma (Part 2): Let \(G\) be a group, let \(\phi:G\to GL(V)\) be an irreducible \(G\)-representation on a finite-dimensional complex vector space \(V\). Suppose a linear map \(L:V\to V\) commutes with the entire \(G\)-representation \(\phi\), i.e. \(\phi(g)L = L\phi(g)\) for all \(g\in G\). Then \(L=\lambda 1\) for some eigenvalue \(\lambda\in\textbf C\) of \(L\).
Proof: For any complex number \(\lambda\in\textbf C\), one can use \([L,\phi(g)]=0\) to compute that the subspace \(\ker(L-\lambda 1)\) of \(V\) is \(\phi(g)\)-invariant for all \(g\in G\) (indeed, \((L-\lambda 1)\phi(g)v=\phi(g)(L-\lambda 1)v=\textbf 0\) whenever \(v\in\ker(L-\lambda 1)\)), so by irreducibility of \(\phi\) one must have either \(\ker(L-\lambda 1)=\{\textbf 0\}\) or \(\ker(L-\lambda 1)=V\). The first case is what happens for a generic choice of \(\lambda\in\textbf C\), while the second case occurs if one specifically chooses \(\lambda\) to be an eigenvalue of \(L\). Thankfully this can always be done when \(V\) is a finite-dimensional complex vector space, since the characteristic polynomial of \(L\) then has a root in \(\textbf C\). But then \(\ker(L-\lambda 1)=V\) means that \(L=\lambda 1\) as claimed.
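To make Part 2 tangible, here is a minimal numerical sketch in pure Python. It rests on choices that are mine rather than anything from the discussion above: the group is \(S_3\) acting on the plane by its standard \(2\)-dimensional irreducible representation (three rotations and three reflections), and the commuting operator is manufactured by group averaging, \(L=\frac{1}{|G|}\sum_{g}\phi(g)A\phi(g)^{-1}\), for an arbitrary test matrix `A`. All helper names are hypothetical.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def group_average(group, A):
    """Return (1/|G|) * sum_g g A g^{-1}, using g^{-1} = g^T for orthogonal g."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    for g in group:
        term = matmul(matmul(g, A), transpose(g))
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
    return [[x / len(group) for x in row] for row in total]

def max_commutator(group, L):
    """Largest entry of |gL - Lg| over the group; ~0 means L commutes with it."""
    worst = 0.0
    for g in group:
        gL, Lg = matmul(g, L), matmul(L, g)
        for i in range(len(L)):
            for j in range(len(L)):
                worst = max(worst, abs(gL[i][j] - Lg[i][j]))
    return worst

c, s = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
E = [[1.0, 0.0], [0.0, 1.0]]
R = [[c, -s], [s, c]]           # rotation by 120 degrees
F = [[1.0, 0.0], [0.0, -1.0]]   # reflection across the x-axis
C3 = [E, R, matmul(R, R)]                  # rotations only (cyclic subgroup)
S3 = C3 + [matmul(r, F) for r in C3]       # rotations and reflections

A = [[1.0, 2.0], [3.0, 4.0]]               # arbitrary test matrix (assumption)
L_irrep = group_average(S3, A)             # commutes with an irreducible rep
L_reducible = group_average(C3, A)         # commutes with a reducible rep (over C)

print("S3 average:", L_irrep)
print("C3 average:", L_reducible)
```

The \(S_3\) average should come out (up to rounding) as \(\frac{\text{tr}(A)}{2}\) times the identity, as Part 2 demands. The \(C_3\) average still commutes with every rotation but keeps an antisymmetric piece, which is consistent with Schur’s lemma because the rotation-only representation is reducible over \(\textbf C\).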
Finally, I wanted to briefly sketch the relevance of Schur’s lemma to quantum mechanics. Suppose a quantum mechanical system in \(\textbf R^3\) has a Hamiltonian \(H\) which is isotropic/central/rotationally invariant/rotationally symmetric (e.g. the 3D quantum harmonic oscillator, the hydrogen atom, the spherical potential well, etc.). There are several logically equivalent ways to formalize what this means; in order of decreasing intuitiveness but increasing usefulness, they are:
- The Hamiltonian \(H=T+V\) is associated to a potential energy operator \(V=V(|\textbf X|)\) which only depends on the radial position operator \(|\textbf X|\).
- For an arbitrary rotation vector \(\Delta\boldsymbol{\theta}\in\textbf R^3\) (parametrizing a rotation in \(SO(3)\)), the Hamiltonian \(H\) is invariant under conjugation by the unitary rotation operator, i.e. \(e^{-i\Delta\boldsymbol{\theta}\cdot\textbf L/\hbar}He^{i\Delta\boldsymbol{\theta}\cdot\textbf L/\hbar}=H\).
- For an arbitrary rotation vector \(\Delta\boldsymbol{\theta}\in\textbf R^3\) (parametrizing a rotation in \(SO(3)\)), the Hamiltonian \(H\) commutes with the unitary rotation operator, i.e. \([H, e^{-i\Delta\boldsymbol{\theta}\cdot\textbf L/\hbar}]=0\).
- The Hamiltonian \(H\) commutes with all \(3\) components of the generator \(\textbf L\) of infinitesimal rotations, i.e. \([H,\textbf L]=\textbf 0\).
- The \(3\) components of the angular momentum \(\textbf L\) are conserved observables of the quantum system, which is to say that \(\dot{\textbf L}=\textbf 0\) in the Heisenberg picture or equivalently that \(\frac{d}{dt}\langle\psi|\textbf L|\psi\rangle=\textbf 0\) for any quantum state \(|\psi\rangle\) in the Schrödinger picture.
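As a rough sketch of how the last four conditions chain together, write \(U:=e^{-i\Delta\boldsymbol{\theta}\cdot\textbf L/\hbar}\) (a shorthand introduced only for this remark). Since \(U\) is unitary, multiplying on the right by \(U\) gives
\[UHU^{\dagger}=H\iff UH=HU\iff [H,U]=0.\]
Expanding to first order in the rotation vector,
\[[H,U]=-\frac{i}{\hbar}\Delta\boldsymbol{\theta}\cdot[H,\textbf L]+O(|\Delta\boldsymbol{\theta}|^2),\]
so \([H,U]=0\) for all \(\Delta\boldsymbol{\theta}\) forces \([H,\textbf L]=\textbf 0\); conversely, if \(H\) commutes with \(\textbf L\) then it commutes with every power series in \(\textbf L\), in particular with \(U\). Finally, because \(\textbf L\) has no explicit time dependence, the Heisenberg equation of motion reads
\[\dot{\textbf L}=\frac{i}{\hbar}[H,\textbf L],\]
so \([H,\textbf L]=\textbf 0\) is the same statement as \(\dot{\textbf L}=\textbf 0\).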
Then if \(\mathcal H_{\ell}:=\text{span}_{\textbf C}\{|\ell, m_{\ell}\rangle:m_{\ell}=-\ell,\ldots,\ell\}\) denotes the \((2\ell+1)\)-dimensional \(\ell\)-multiplet of simultaneous \((\textbf L^2,L_3)\) eigenstates with definite total angular momentum \(\sqrt{\ell(\ell+1)}\hbar\), it is clear that \(\mathcal H_{\ell}\) is \(e^{-i\Delta\boldsymbol{\theta}\cdot\textbf L/\hbar}\)-invariant for all rotation vectors \(\Delta\boldsymbol{\theta}\in\textbf R^3\). Why? It suffices to check this for infinitesimal rotations \(1-\frac{i d\boldsymbol{\theta}\cdot\textbf L}{\hbar}\), since any rotation is just the repeated application of infinitesimal rotations. In turn, it suffices to check that \(\mathcal H_{\ell}\) is \(\textbf L\)-invariant. It is certainly \(L_3\)-invariant, and \(L_1\)- and \(L_2\)-invariance follow from the expressions \(L_1=(L_++L_-)/2\) and \(L_2=(L_+-L_-)/2i\), where clearly \(\mathcal H_{\ell}\) is \(L_{\pm}\)-invariant.
Thus, for each \(\ell\in\textbf N\), the natural \(SO(3)\) representation \(\Delta\boldsymbol{\theta}\mapsto e^{-i\Delta\boldsymbol{\theta}\cdot\textbf L/\hbar}\) on the entire state space admits a subrepresentation on each multiplet \(\mathcal H_{\ell}\) (indeed, this ensures that the matrix elements \(D^{\ell}_{m_{\ell}'m_{\ell}}(\Delta\boldsymbol{\theta}):=\langle \ell,m_{\ell}'|e^{-i\Delta\boldsymbol{\theta}\cdot\textbf L/\hbar}|\ell,m_{\ell}\rangle\) assemble into a Wigner D-matrix \(D^{\ell}(\Delta\boldsymbol{\theta})\in U(2\ell +1)\)). Moreover, each of these subrepresentations is irreducible because the ladder operators \(L_{\pm}\) allow one to hop among all \(2\ell+1\) angular momentum eigenstates \(|\ell,m_{\ell}\rangle\) for a given angular momentum quantum number \(\ell\in\textbf N\).
All this is to say that, in practice, whenever any one of the \(5\) logically equivalent conditions above holds, one can declare victory (thanks to the second part of Schur’s lemma) by instantly jumping to the conclusion that each \(\ell\)-multiplet \(\mathcal H_{\ell}\) lies within a single energy eigenspace: the Hamiltonian \(H\) commutes with the entire irreducible \(SO(3)\) representation on \(\mathcal H_{\ell}\) and so acts as \(H=E1\) on all states in \(\mathcal H_{\ell}\). This explains why, in any rotationally symmetric quantum system, the energies \(E\) are always degenerate with respect to the \(z\)-axis angular momentum quantum number \(m_{\ell}\) and can only depend on the total angular momentum quantum number \(\ell\). Morally, this is clear: isotropy by definition doesn’t prefer any direction, whereas \(m_{\ell}\) singles out the \(z\)-axis.
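The degeneracy claim can be checked numerically on a single multiplet. The sketch below is pure Python with \(\hbar=1\), and it builds the matrices of \(L_1,L_2,L_3\) on \(\mathcal H_{\ell}\) from the standard ladder-operator matrix elements \(L_+|\ell,m\rangle=\sqrt{\ell(\ell+1)-m(m+1)}\,|\ell,m+1\rangle\). The two toy Hamiltonians are my own illustrative choices, not taken from the discussion above: `H_iso` is \(\textbf L^2\) (rigid-rotor-like, rotationally invariant) and `H_aniso` adds a hypothetical \(\frac{1}{2}L_3^2\) term that breaks the symmetry. All function names are likewise hypothetical.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B, coeff=1):
    """Return A + coeff * B entrywise."""
    return [[a + coeff * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def commutator(A, B):
    return add(matmul(A, B), matmul(B, A), -1)

def max_abs(A):
    return max(abs(x) for row in A for x in row)

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def angular_momentum(ell):
    """Matrices of L1, L2, L3 on the ell-multiplet (integer ell, hbar = 1).

    Basis order: m = ell, ell - 1, ..., -ell.
    """
    dim = 2 * ell + 1
    ms = [ell - k for k in range(dim)]
    L3 = [[complex(ms[i]) if i == j else 0j for j in range(dim)]
          for i in range(dim)]
    Lp = [[0j] * dim for _ in range(dim)]
    for col in range(1, dim):
        m = ms[col]
        # L+ raises |ell, m> to |ell, m + 1>, which sits one row higher.
        Lp[col - 1][col] = complex(math.sqrt(ell * (ell + 1) - m * (m + 1)))
    # L- is the Hermitian conjugate of L+.
    Lm = [[Lp[j][i].conjugate() for j in range(dim)] for i in range(dim)]
    L1 = [[(p + q) / 2 for p, q in zip(rp, rq)] for rp, rq in zip(Lp, Lm)]
    L2 = [[(p - q) / (2j) for p, q in zip(rp, rq)] for rp, rq in zip(Lp, Lm)]
    return L1, L2, L3

ell = 2
L1, L2, L3 = angular_momentum(ell)

# Rotationally invariant toy Hamiltonian: H_iso = L1^2 + L2^2 + L3^2.
H_iso = add(add(matmul(L1, L1), matmul(L2, L2)), matmul(L3, L3))
# Symmetry-breaking toy Hamiltonian: H_aniso = H_iso + 0.5 * L3^2.
H_aniso = add(H_iso, matmul(L3, L3), 0.5)

for name, H in (("H_iso", H_iso), ("H_aniso", H_aniso)):
    worst = max(max_abs(commutator(H, Li)) for Li in (L1, L2, L3))
    diag = [round(H[i][i].real, 10) for i in range(len(H))]
    print(name, "max |[H, L_i]| =", worst, "diagonal =", diag)
```

Under these assumptions `H_iso` commutes with all three components and is a multiple of the identity on the multiplet, while `H_aniso` fails to commute with \(L_1\) and its diagonal entries depend on \(m_{\ell}\), which is the lifted degeneracy one would expect once a preferred axis is introduced.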