The purpose of this post is to lay out the basic theory of ensembles in statistical mechanics in addition to some examples. At the end, the goal will be to convincingly demonstrate ensemble equivalence.
Any system (e.g. a gas, a ferromagnet, a Bose-Einstein condensate, the universe, etc.) in equilibrium is described by some state space \(\mathcal H\) whose elements are called states of the system (one often also encounters the term microstates). For classical systems, \(\mathcal H\cong\textbf R^{6N}\) is typically just the product of \(N\) copies of phase space \((\textbf x,\textbf p)\in\textbf R^6\) for a system of \(N\) particles, though it could also be something more discrete such as \(\mathcal H\cong\{-1,1\}^N\) encoding the \(2^N\) configurations of an Ising ferromagnet of \(N\) spins. For quantum systems, \(\mathcal H\) is (as the notation for it suggests) the Hilbert space of that system. When encountering a new system in nature, the hardest obstacle can simply be figuring out what the right choice of state space \(\mathcal H\) for that system is in the first place.
Armed with a state space \(\mathcal H\), an ensemble \(\mathcal E\) in the broadest sense is simply any subset of state space \(\mathcal E\subset\mathcal H\). For instance, if the state space is \(\textbf R^3\), then the plane \(x+y+z=1\) or the sphere \(x^2+y^2+z^2=1\) would both be ensembles of the state space \(\textbf R^3\). In practice, ensembles of physical systems tend to be cut out from state space \(\mathcal H\) by constraining various quantities (intensive or extensive) to be conserved (this is reasonable because the system is in equilibrium). Sometimes it is said that an ensemble is a collection of microstates compatible with a given macrostate (and such (micro)states are said to be accessible).
Isolated/Equiprobable/Microcanonical Ensemble
The microcanonical ensemble \(\mathcal E_E\) of a given system enforces unwavering conservation of energy \(E\) (though there may also be other conserved quantities like particle number \(N\) and volume \(V\)). That is, only states with the prescribed energy \(E\) are accessible. Physically, this is realized by having the system be utterly isolated from its environment (in practice, the universe is the only truly isolated system, but it turns out later this is the most important system one will need to apply the microcanonical ensemble to).
An important axiom of statistical mechanics is that all accessible microstates in the system’s microcanonical ensemble \(\mathcal E_E\) are equiprobable. This is sometimes called the principle of equal a priori probabilities or the fundamental assumption of statistical mechanics. Regardless, it plays the same kind of fundamental role that e.g. the fundamental theorem of calculus plays in calculus (except that the latter is a theorem that can be proven whereas the former is an axiom that is assumed). One can rationalize it in various heuristic ways (e.g. appealing to the principle of detailed balance from kinetic theory) though ultimately, as with anything in theoretical physics, the proper justification comes from experiments.
Although for a classical system like a liquid or gas, it would seem that the number/density/degeneracy/multiplicity of states \(g(E):=|\mathcal E_E|\) in the microcanonical ensemble is uncountably infinite, this will be swept under a rug for now. Then one postulates that the probability \(p_{\mu}\) of observing the system in an accessible state \(\mu\in\mathcal E_E\) is just:
\[p_{\mu}=\frac{1}{g(E)}\]
(the choice of notation “\(\mu\)” comes about because e.g. a micrometer is written \(\mu\text{m}\) so the “\(\mu\)” should be read like “micro” which is reminiscent of the phrase “microstate”).
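As a sanity check, one can enumerate the microcanonical ensembles of a small toy system by brute force. The following sketch (parameters made up for illustration: \(N=4\) non-interacting Ising spins in an external field \(h=1\)) partitions the state space \(\{-1,1\}^N\) into ensembles \(\mathcal E_E\), verifies that the multiplicity \(g(E)\) is a binomial coefficient, and assigns each accessible microstate the equal a priori probability \(1/g(E)\):

```python
from itertools import product
from math import comb

N, h = 4, 1.0  # small toy Ising paramagnet: 2^N microstates (illustrative parameters)

# Energy of a microstate (s_1, ..., s_N) in an external field h: E = -h * sum(s_i)
def energy(state):
    return -h * sum(state)

# Group the 2^N microstates into microcanonical ensembles: E -> list of accessible states
ensembles = {}
for state in product((-1, 1), repeat=N):
    ensembles.setdefault(energy(state), []).append(state)

# Multiplicity g(E): with n_up up-spins, E = -h*(2*n_up - N), so g(E) = C(N, n_up)
for E, states in sorted(ensembles.items()):
    n_up = round((N - E / h) / 2)
    assert len(states) == comb(N, n_up)
    p = 1 / len(states)  # equal a priori probability of each accessible microstate
    print(f"E = {E:+.0f}: g(E) = {len(states)}, p_mu = {p:.4f}")
```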
Given any isolated system, one can always mentally partition the system into a bunch of subsystems, say \(\mathcal N\) pieces. Clearly, this doesn’t affect the fact that the system is still isolated and so only has access to states in its microcanonical \(E\)-ensemble \(\mathcal E_E\). However, the \(\mathcal N\) individual subsystems are no longer isolated since they can interact with each other to shuffle energy back and forth, so long as the system’s total energy \(E_1+E_2+…+E_{\mathcal N}\) remains conserved at \(E\).
The question one would like to answer then is: what is the most likely \(\mathcal N\)-tuple of energies \((E_1,E_2,…,E_{\mathcal N})\in\textbf R^{\mathcal N}\) (notice this is asking about a macrostate)? That is, one would like to maximize the joint probability distribution (a separable function of \(E_1,E_2,…,E_{\mathcal N}\)):
\[p(E_1,E_2,…,E_{\mathcal N})=\frac{g_1(E_1)g_2(E_2)…g_{\mathcal N}(E_{\mathcal N})}{g(E)}\]
subject to the constraint \(E_1+E_2+…+E_{\mathcal N}=E\), where \(g(E)=(g_1*g_2*…*g_{\mathcal N})(E)\). Implementing this with a Lagrange multiplier \(\lambda\), one can check that:
\[\frac{\lambda g(E)}{g_1(E_1)g_2(E_2)…g_{\mathcal N}(E_{\mathcal N})}=\frac{g’_1(E_1)}{g_1(E_1)}=\frac{g’_2(E_2)}{g_2(E_2)}=…=\frac{g’_{\mathcal N}(E_{\mathcal N})}{g_{\mathcal N}(E_{\mathcal N})}\]
Thus, the most likely \(E\)-partition among the \(\mathcal N\) subsystems is such that the “coldness”:
\[\beta_i(E_i):=\frac{g’_i(E_i)}{g_i(E_i)}=\frac{\partial\ln g_i}{\partial E_i}\]
is the same for all subsystems \(1\leq i\leq\mathcal N\). This also motivates a log-scale measurement \(S_i(E_i)\) of the number of ways to distribute energy \(E_i\) among the microscopic constituents of the \(i\)-th system (cf. the decibel scale in acoustics or the magnitude scale in astronomy) called the (microcanonical/Boltzmann) entropy:
\[S_i(E_i):=k_B\ln g_i(E_i)\]
so one can instead say the quantity:
\[\frac{1}{T_i(E_i)}:=\frac{\partial S_i}{\partial E_i}\]
is the same for all systems in the most likely \(E\)-partition, where \(\beta_i(E_i)=\frac{1}{k_BT_i(E_i)}\). Here, \(T_i=T_i(E_i)\) is the familiar concept of temperature, and arises in the microcanonical ensemble as an emergent parameter determined by the energy \(E_i\). Although people sometimes say that temperature is only defined in equilibrium, this clearly need not be the case (e.g. nonequilibrium temperature evolution in time is described by the heat equation \(\dot{T}=\alpha T^{\prime\prime}\)) though certainly what is special about temperature is that it is very likely to be the same for a bunch of bodies in equilibrium as \(t\to\infty\).
It’s worth re-examining the derivation above in terms of entropy \(S\) rather than the degeneracy \(g=e^{S/k_B}\). First, notice that rather than maximizing the probability \(\prod_{i=1}^{\mathcal N}g_i(E_i)/g(E)\) one can focus on just maximizing the numerator \(\prod_{i=1}^{\mathcal N}g_i(E_i)\) (representing the total number of states where system #\(1\) has energy \(E_1\), system #\(2\) has energy \(E_2\), etc.) since the denominator \(g(E)\) is just a constant. Second, since logs are monotonically increasing, this is equivalent to maximizing \(\ln\prod_{i=1}^{\mathcal N}g_i(E_i)=\sum_{i=1}^{\mathcal N}\ln g_i(E_i)\). Multiplying by Boltzmann’s constant \(k_B\), it is thus clear that the most likely \(E\)-partition (which need not be an equipartition!) is the one that maximizes the total entropy \(\sum_{i=1}^{\mathcal N}S_i(E_i)\) of the isolated system.
To check that one really has a maximum (and not a minimum), one can take the “Lagrangian”:
\[S(E_1,E_2,…,E_{\mathcal N},1/T):=S_1(E_1)+S_2(E_2)+…+S_{\mathcal N}(E_{\mathcal N})-\frac{1}{T}(E_1+E_2+…E_{\mathcal N}-E)\]
and compute its \(\mathcal N\times\mathcal N\) Hessian:
\[\frac{\partial^2 S}{\partial E_j\partial E_i}=-\frac{\delta_{ij}}{T_i^2C_i}\]
which is diagonal and, provided that the heat capacities \(C_i^{-1}:=\frac{\partial T_i}{\partial E_i}\) of the subsystems are all positive as required for equilibrium stability, is clearly negative-definite everywhere (and will thus also be so when restricted to the tangent space of the constraint (at the equal-\(T\) \(E\)-partition) which in this case is just the constraint itself \(E_1+E_2+…+E_{\mathcal N}=E\) because it’s already a plane).
To summarize this section: the most likely \(E\)-partition:
\[E=E_1+E_2+…+E_{\mathcal N}\]
maximizes total entropy \(S_1(E_1)+S_2(E_2)+…+S_{\mathcal N}(E_{\mathcal N})\) by thermalizing \(T_1(E_1)=T_2(E_2)=…=T_{\mathcal N}(E_{\mathcal N})\). These considerations follow respectively from simple probability and calculus, and represent one way to understand the \(2\)nd law of thermodynamics.
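The Lagrange-multiplier result above can be checked numerically. Assuming (purely for illustration) power-law densities of states \(g_i(E_i)=E_i^{\nu_i}\), for which the coldness is \(\beta_i(E_i)=\nu_i/E_i\), a brute-force search over \(E\)-partitions of two subsystems confirms that the most likely partition equalizes the coldness:

```python
from math import log

# Hypothetical power-law densities of states g_i(E_i) = E_i**nu_i (ideal-gas-like),
# for which the coldness is beta_i(E_i) = d(ln g_i)/dE_i = nu_i / E_i.
nu1, nu2, E = 3.0, 5.0, 10.0  # made-up exponents and total energy

def log_joint(E1):
    """ln[g1(E1) g2(E - E1)]: log of the (unnormalized) probability of the partition."""
    return nu1 * log(E1) + nu2 * log(E - E1)

# Brute-force search for the most likely E-partition over a fine grid
best_E1 = max((E * k / 100000 for k in range(1, 100000)), key=log_joint)

beta1 = nu1 / best_E1        # coldness of subsystem 1 at the optimum
beta2 = nu2 / (E - best_E1)  # coldness of subsystem 2 at the optimum
print(best_E1, beta1, beta2)  # beta1 = beta2: equal temperatures at the maximum
```

Note the optimum \(E_1^*=E\,\nu_1/(\nu_1+\nu_2)\) is not an equipartition unless \(\nu_1=\nu_2\).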
Interpretation of “Probability”
The meaning of the word “probability” \(p_{\mu}\) above and also throughout this post deserves a brief comment. Basically, there are \(2\) complementary ways to think about it:
1. (Frequency Interpretation): Fix a time \(t\) and consider \(\mathcal N\gg 1\) copies of the system. Then the expected number \(\langle \mathcal N_{\mu}\rangle\leq\mathcal N\) of these \(\mathcal N\) systems in state \(\mu\) will be \(\langle \mathcal N_{\mu}\rangle=p_{\mu}\mathcal N\).
2. (Time Interpretation): Fix \(\mathcal N=1\) copy of the system and consider as time \(t\) evolves how the state \(\mu(t)\) of the system fluctuates. Then over a long enough time interval \(\Delta t\), the expected amount of time \(\langle\Delta t_{\mu}\rangle\) the system spent in state \(\mu\) will be \(\langle\Delta t_{\mu}\rangle=p_{\mu}\Delta t\).
But why should these \(2\) interpretations both be valid? Can one show that they are logically equivalent to each other? The answer is no. Rather, this is another axiom which often goes by the name of the ergodic hypothesis. In practice, most systems are ergodic.
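These two interpretations can be compared directly in a simulation. Here is a minimal sketch with a two-state toy system evolving as a Markov chain (the transition rates are made up, chosen via detailed balance so the stationary distribution is \((0.3,0.7)\)); the time-average occupation of one long trajectory agrees with the ensemble-average occupation over many copies:

```python
import random

random.seed(42)

# Toy ergodic system: one "spin" hopping between microstates 0 and 1.
# Leave-rates chosen by detailed balance so the stationary distribution is (0.3, 0.7):
# 0.3 * leave[0] = 0.7 * leave[1]
leave = {0: 0.7, 1: 0.3}

def step(s):
    return 1 - s if random.random() < leave[s] else s

# Time interpretation: one copy, long trajectory; fraction of time spent in state 0
s, time_in_0, T = 0, 0, 200000
for _ in range(T):
    s = step(s)
    time_in_0 += (s == 0)

# Frequency interpretation: many copies observed at one (late) time
copies = [0] * 5000
for _ in range(200):  # burn-in so each copy forgets its initial condition
    copies = [step(c) for c in copies]
ensemble_in_0 = sum(c == 0 for c in copies)

print(time_in_0 / T, ensemble_in_0 / len(copies))  # both approach p_0 = 0.3
```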
Thermal/Boltzmann/Canonical Ensemble
Consider the \(\mathcal N=2\) case where the joint probability distribution of energies simplifies to:
\[p(E_1,E_2=E-E_1)=\frac{g_1(E_1)g_2(E-E_1)}{(g_1*g_2)(E)}\]
where \((g_1*g_2)(E)=\sum_{E_1}g_1(E_1)g_2(E-E_1)\). Now make the assumption that system #\(1\) is some large system of interest in equilibrium with system #\(2\), a much larger heat bath with effectively infinite heat capacity \(C_2=\infty\) representing the rest of the universe/environment/surroundings, which is not of interest. Then:
\[C_2=\infty\Rightarrow\frac{\partial T_2}{\partial E_2}=0\]
So \(T_2\neq T_2(E_2)\). This means the earlier definition of \(\beta_2\):
\[\beta_2=\frac{\partial\ln g_2}{\partial E_2}\]
now becomes a differential equation with exponentially growing degeneracy \(g_2(E_2)=g_2(0)e^{\beta E_2}\) in light of \(\beta:=\beta_2\neq\beta_2(E_2)\). Alternatively, the extensive quantities \(S_2,E_2\) are on equal footing \(TS_2(E_2)=E_2+TS_2(0)\).
At this point, a natural step would be to Taylor expand the slowly-varying logarithm \(\ln g_2(E-E_1)\approx\ln g_2(E)-E_1\frac{g_2'(E)}{g_2(E)}\), i.e. \(g_2(E-E_1)\approx g_2(E)e^{-E_1g_2'(E)/g_2(E)}\). Since the two systems are in equilibrium, they sit at the same coldness \(\beta=\frac{g_1'(E_1)}{g_1(E_1)}=\frac{g_2'(E_2)}{g_2(E_2)}\approx\frac{g_2'(E)}{g_2(E)}\).
Substituting into the probability distribution yields the exponentially suppressed Boltzmann distribution at temperature \(T\):
\[p(E_1)=\frac{g_1(E_1)e^{-\beta E_1}}{Z(\beta)}\]
where the canonical partition function \(Z(\beta):=\sum_{E_1}g_1(E_1)e^{-\beta E_1}\equiv\int_{-\infty}^{\infty}dE_1g_1(E_1)e^{-\beta E_1}\) is just the Laplace transform of the density of states \(g_1(E_1)\). It follows that the canonical partition function \(Z(\beta)\) contains exactly the same information as the spectrum \(g_1(E_1)\) of a system which can be recovered from knowledge of \(Z(\beta)\) by the inverse Laplace transform:
\[g_1(E_1)=\frac{1}{2\pi i}\int_{x-i\infty}^{x+i\infty}d\beta\, Z(\beta)e^{\beta E_1}\]
(notation simplified here to forget about the heat bath). Due to the Laplace transform relationship between \(g(E)\) and \(Z(\beta)\), many standard properties of the transform can be immediately imported. For instance, since the density of states \(g(E)=(g_1*g_2*…*g_{\mathcal N})(E)\) of \(\mathcal N\) independent systems is a convolution, the convolution theorem implies that the corresponding non-interacting partition function is just the \(\mathcal N\)-fold product \(Z(\beta)=Z_1(\beta)Z_2(\beta)…Z_{\mathcal N}(\beta)\).
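The convolution theorem claim is easy to verify by brute force for two small non-interacting systems with made-up spectra: summing \(e^{-\beta(E_1+E_2)}\) over the joint spectrum (where energies add) reproduces the product of the individual partition functions.

```python
from math import exp
from itertools import product

beta = 1.3  # inverse temperature (illustrative value)

# Two independent subsystems with made-up energy levels
levels1 = [0.0, 1.0]        # subsystem 1
levels2 = [0.0, 0.5, 2.0]   # subsystem 2

Z1 = sum(exp(-beta * E) for E in levels1)
Z2 = sum(exp(-beta * E) for E in levels2)

# Direct sum over the joint spectrum: energies add for non-interacting systems,
# i.e. the joint density of states is the convolution g = g1 * g2
Z_joint = sum(exp(-beta * (E1 + E2)) for E1, E2 in product(levels1, levels2))

print(Z_joint, Z1 * Z2)  # equal: the partition function factorizes
```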
Given the relative nature of the energy \(E\), the proper, gauge-invariant way to understand the Boltzmann distribution comes from looking at probability ratios:
\[\frac{p(E)}{p(E_0)}=\frac{g(E)}{g(E_0)}e^{-\beta\Delta E}\]
which depends only on the gauge-invariant energy difference \(\Delta E:=E-E_0>0\) through the Boltzmann factor \(e^{-\beta\Delta E}\). Heuristically, the first factor \(g(E)/g(E_0)=e^{\Delta S/k_B}\) is entropic while the second factor \(e^{-\beta\Delta E}\) is energetic. That is, if one thinks of \(E_0\) as a ground state energy, then states with energy \(E\) exceeding \(E_0\) by more than the thermal energy scale \(\beta^{-1}=k_BT\) are exponentially unlikely to be observed…that is, unless the entropic factor \(g(E)/g(E_0)\) can come to the rescue by growing even faster to offset this exponential suppression. In particular, this offsetting is harder at lower temperature \(T\), where the suppression \(e^{-\beta\Delta E}\) is stronger; at higher \(T\), more interesting, high-energy physics can be observed.
In stark contrast to the microcanonical ensemble \(\mathcal E_E\), the energy \(E\) in the canonical ensemble \(\mathcal E_T\) is no longer conserved, but rather fluctuates around the ensemble average \(\langle E\rangle=\int_{-\infty}^{\infty}dE\rho(E)E\) where \(\rho(E)=g(E)e^{-\beta E}/Z\). In fact, one can view the energy \(E:\mathcal E_T\to\textbf R\) as a continuous random variable with Boltzmannian probability density function \(\rho(E)\). Then, as per the shift theorem for Laplace transforms, the moment generating function \(\langle e^{\tilde{\beta}E}\rangle\) of \(E\) is just:
\[\langle e^{\tilde{\beta}E}\rangle=\int_{-\infty}^{\infty}dE\rho(E)e^{\tilde{\beta}E}=\frac{Z(\beta-\tilde{\beta})}{Z(\beta)}\]
so the \(n\)-th moment \(\langle E^n\rangle\) of \(E\) is:
\[\langle E^n\rangle=\left(\frac{\partial^n\langle e^{\tilde{\beta}E}\rangle}{\partial\tilde{\beta}^n}\right)_{\tilde{\beta}=0}=\frac{(-1)^n}{Z}\frac{\partial^n Z}{\partial\beta^n}\]
One can also look at the cumulant generating function \(\ln\langle e^{\tilde{\beta}E}\rangle\) of \(E\):
\[\ln\langle e^{\tilde{\beta}E}\rangle=\ln Z(\beta-\tilde{\beta})-\ln Z(\beta)\]
where the first \(3\) cumulants of \(E\) have straightforward interpretations:
\[\langle E\rangle=\left(\frac{\partial\ln\langle e^{\tilde{\beta}E}\rangle}{\partial\tilde{\beta}}\right)_{\tilde{\beta}=0}=-\frac{\partial\ln Z}{\partial\beta}\]
\[\langle (E-\langle E\rangle)^2\rangle=\left(\frac{\partial^2\ln\langle e^{\tilde{\beta}E}\rangle}{\partial\tilde{\beta}^2}\right)_{\tilde{\beta}=0}=\frac{\partial^2\ln Z}{\partial\beta^2}\]
\[\langle (E-\langle E\rangle)^3\rangle=\left(\frac{\partial^3\ln\langle e^{\tilde{\beta}E}\rangle}{\partial\tilde{\beta}^3}\right)_{\tilde{\beta}=0}=-\frac{\partial^3\ln Z}{\partial\beta^3}\]
Finally, one can check properly (using the chain rule), or just remember on dimensional-analysis grounds, the following version of the fluctuation-dissipation theorem:
\[\frac{C}{k_B}:=\frac{\partial\langle E\rangle}{\partial k_BT}=\frac{\langle (E-\langle E\rangle)^2\rangle}{(k_BT)^2}\Rightarrow C=k_B\beta^2\frac{\partial^2\ln Z}{\partial\beta^2}\]
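This fluctuation-dissipation relation can be checked numerically for an arbitrary made-up discrete spectrum (units \(k_B=1\)), comparing the "dissipation" side \(C=\partial\langle E\rangle/\partial T\) (via a finite difference) against the "fluctuation" side \(\sigma_E^2/k_BT^2\) (via exact sums):

```python
from math import exp

k = 1.0  # work in units where Boltzmann's constant k_B = 1
levels = [0.0, 0.7, 1.1, 2.5]  # an arbitrary made-up spectrum

def avg_E(T):
    beta = 1 / (k * T)
    Z = sum(exp(-beta * E) for E in levels)
    return sum(E * exp(-beta * E) for E in levels) / Z

def var_E(T):
    beta = 1 / (k * T)
    Z = sum(exp(-beta * E) for E in levels)
    E2 = sum(E * E * exp(-beta * E) for E in levels) / Z
    return E2 - avg_E(T) ** 2

T = 0.9
C_dissipation = (avg_E(T + 1e-6) - avg_E(T - 1e-6)) / 2e-6  # C = d<E>/dT
C_fluctuation = var_E(T) / (k * T**2)                       # C = Var(E)/(k T^2)
print(C_dissipation, C_fluctuation)  # agree to numerical-derivative accuracy
```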
One can also reproduce this via a Gaussian-fluctuations argument: expanding \(\ln\rho(E)=\ln g(E)-\beta E-\ln Z\) to second order about its maximum at \(\langle E\rangle\) gives \(\frac{\partial^2\ln\rho}{\partial E^2}=\frac{\partial}{\partial E}\frac{1}{k_BT(E)}=-\frac{1}{k_BT^2C}\), i.e. a Gaussian \(\rho(E)\) with variance \(\sigma_E^2=k_BT^2C\) as asserted above.
Entropy
Forget about physics for a second. In probability theory, given a random variable with outcomes \(\mu\), its entropy is defined by:
\[S:=-\sum_{\mu}p_{\mu}\ln p_{\mu}=-1-\sum_{\mu}\int_0^{p_{\mu}}dp\ln p\]
This is just a purely mathematical concept from probability theory.
In physics, probability distributions \(p_{\mu}\) arise due to an observer’s classical ignorance about the state of a system (even in quantum mechanics, von Neumann entropy \(S=-\text{Tr}\rho\ln\rho\) is probing the classical uncertainty associated with a density operator \(\rho\) of a mixed state, not the Heisenbergian uncertainty intrinsic to pure quantum states).
Entropy is thus subjective because it depends on how much one knows about a system (e.g. does one know \(N,V,E\) exactly, or merely averages \(\langle N\rangle,\langle V\rangle,\langle E\rangle\), etc.) which informs one’s “personal” probability distribution over the system’s accessible state space \(\mathcal E\). For instance, God knows the exact state \(\mu^*\) of the universe, so that the delta distribution \(p_{\mu}=\delta_{\mu\mu^*}\) spiking at \(\mu=\mu^*\) has entropy \(S=0\). From God’s perspective therefore, the entropy of the universe is always \(S=0\). The average person however isn’t as omniscient as God, and so from their reference frame the universe has \(S>0\).
In casual conversation, when one speaks of e.g. the entropy \(S\) of a glass of water, or the change in entropy \(\Delta S\) of that glass of water, everything can be framed more clearly if, instead of using the word “entropy”, one replaces it with the phrase “classical ignorance”. This naturally rephrases one’s language: one could instead say “my entropy towards that glass of water is \(S\)” or “my entropy towards the heat bath has increased by \(\Delta S\)”. Moreover, if one’s information about the system changes, then prior probabilities need to be updated to posterior probabilities, changing one’s entropy towards the system.
In the microcanonical ensemble, the Boltzmann entropy is \(S=k\ln\Omega\) (writing \(k:=k_B\) and \(\Omega:=g(E)\) for the multiplicity from here on). However, this is no longer appropriate for the canonical ensemble because the energy \(E\) is no longer fixed and therefore \(\Omega=\Omega(E)\) is not well-defined.
Instead, we can figure out what \(S\) is in the canonical ensemble (actually, in any thermodynamic ensemble) by recycling the trick of viewing the system + heat bath \(\mathcal S\cup\mathcal E\) as living in the microcanonical ensemble, so that we can apply the principle of equal a priori probabilities. In fact, one additional trick is also required: consider \(N\) identical copies \(\prod_N\mathcal S\cup\mathcal E\) of \(\mathcal S\cup\mathcal E\). If one likes, one can think of \(N\) identical non-interacting universes, each containing a copy of \(\mathcal S\cup\mathcal E\). Writing \(0\leq N_n\leq N\) for the number of these \(N\) universes whose system \(\mathcal S\) is in microstate \(n\) (so that \(\sum_n N_n=N\)), by the law of large numbers one expects that the \(N_n=p_nN\) are fixed by the microstate probability distribution \(p_n\) (not necessarily the canonical ensemble \(p_n=e^{-\beta E_n}/Z\)) of any one of the \(N\) identical universes.
As a toy example for building intuition, suppose only two microstates \(|\uparrow\rangle\) and \(|\downarrow\rangle\) are accessible with probabilities \(p_{|\uparrow\rangle}=30\%\) and \(p_{|\downarrow\rangle}=70\%\), so that in \(N=100\) universes we expect \(p_{|\uparrow\rangle}N=30\) of them to be in the \(|\uparrow\rangle\) microstate and \(p_{|\downarrow\rangle}N=70\) to be in the \(|\downarrow\rangle\) microstate. Then a microstate of the entire collection of \(N=100\) universes would consist of specifying the microstate (either \(|\uparrow\rangle\) or \(|\downarrow\rangle\)) for each of the \(N=100\) universes such as to satisfy the constraints that \(30\) are \(|\uparrow\rangle\) and \(70\) are \(|\downarrow\rangle\). This is precisely analogous to having \(30\) \(|\uparrow\rangle\) “letters” and \(70\) \(|\downarrow\rangle\) “letters” and asking for the number of distinct \(100\)-letter words that can be formed from their permutations. Clearly, the answer is given by the multinomial coefficient \(\Omega=\frac{100!}{30!\,70!}\). Generalizing this, we therefore have:
\[\Omega=\frac{N!}{\prod_nN_n!}=\frac{N!}{\prod_n(p_nN)!}\]
The corresponding Boltzmann entropy in the microcanonical ensemble (after applying Stirling’s approximation in the form \(\ln N!\approx N\ln N-N\)) is:
\[S\approx -Nk\sum_{n}p_n\ln p_n\]
However, for non-interacting systems (universes!) the entropy is additive (alternatively, just set \(N=1\) universe) so one obtains the formula for Gibbs entropy as promised long ago:
\[S=-k\sum_{n}p_n\ln p_n\]
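The Stirling step above can be sanity-checked numerically: for large \(N\), the exact log-multiplicity \(\ln\Omega=\ln\frac{N!}{\prod_n(p_nN)!}\) (computed exactly via the log-gamma function) approaches the Gibbs form \(-N\sum_np_n\ln p_n\). A minimal sketch with the \(30\%/70\%\) toy distribution:

```python
from math import lgamma, log

# Microstate probabilities of the system (any distribution, not necessarily canonical)
p = [0.3, 0.7]
N = 10**6  # number of identical "universes" (large, so Stirling applies)

# Exact log-multiplicity ln(Omega) = ln( N! / prod_n (p_n N)! ) via log-gamma,
# using lgamma(x + 1) = ln(x!)
ln_omega = lgamma(N + 1) - sum(lgamma(p_n * N + 1) for p_n in p)

# Gibbs form: N * (-sum_n p_n ln p_n)
gibbs = -N * sum(p_n * log(p_n) for p_n in p)

print(ln_omega / N, gibbs / N)  # per-universe entropies agree for large N
```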
Back in the canonical ensemble \(p_n=e^{-\beta E_n}/Z\), this leads to a nice formula for entropy of a canonical ensemble:
\[S=k(\beta\langle E\rangle+\ln Z)=k\frac{\partial}{\partial T}T\ln Z\]
where the first equation can be roughly interpreted as a Boltzmann-like entropy \(k\ln Z\) with an additional energetic contribution \(k\beta\langle E\rangle=\langle E\rangle/T\). In fact, there is a more direct way to see how the Boltzmann entropy emerges by looking at how the partition function \(Z\) in the above formula for \(S\) behaves in the thermodynamic limit. Specifically, recall that \(Z=\sum_n e^{-\beta E_n}=\sum_{E_n}\Omega_{\mathcal S}(E_n)e^{-\beta E_n}\) which, being a sum of exponentials, will be dominated by the maximum value \(Z\approx \Omega_{\mathcal S}(E^*)e^{-\beta E^*}\) occurring at some energy \(E^*\) satisfying \(\frac{\partial \Omega_{\mathcal S}(E^*)e^{-\beta E^*}}{\partial E^*}=0\), or equivalently \(\frac{\partial\ln\Omega_{\mathcal S}(E^*)}{\partial E^*}=\beta\). Then clearly \(\langle E\rangle=-\frac{\partial\ln Z}{\partial\beta}=-\frac{\partial\ln \Omega_{\mathcal S}(E^*)}{\partial E^*}\frac{\partial E^*}{\partial\beta}+E^*+\beta\frac{\partial E^*}{\partial\beta}=E^*\) as expected in the thermodynamic limit, and:
\[S=k(\beta\langle E\rangle+\ln Z)=k\ln\Omega_{\mathcal S}(E^*)\]
This demonstrates how, at least for the entropy \(S\), the microcanonical and canonical ensembles converge in the thermodynamic limit.
Bridging Theory & Experiment
Experimentally, entropy changes can be inferred by measuring the heat capacity \(C(T)\) and integrating:
\[\Delta S=\int dS=\int \frac{\partial S}{\partial E}\frac{\partial E}{\partial T}dT=\int_{\Delta T}\frac{C}{T}dT\]
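As a numerical sanity check of this formula (using, purely for illustration, a two-level system with unit gap and \(k_B=1\)), integrating \(C/T\) reproduces the entropy difference computed directly from the canonical formula \(S=k\,\partial(T\ln Z)/\partial T\):

```python
from math import exp, log

k, eps = 1.0, 1.0  # units k_B = 1; a made-up two-level gap epsilon

def lnZ(T):
    return log(1 + exp(-eps / (k * T)))

def S(T):
    # Canonical entropy S = k * d(T ln Z)/dT, evaluated by central difference
    h = 1e-5
    return k * ((T + h) * lnZ(T + h) - (T - h) * lnZ(T - h)) / (2 * h)

def C(T):
    # Heat capacity C = T dS/dT, again by central difference
    h = 1e-4
    return T * (S(T + h) - S(T - h)) / (2 * h)

# Entropy change from integrating C/T between T1 and T2 (trapezoid rule)
T1, T2, n = 0.5, 2.0, 2000
Ts = [T1 + (T2 - T1) * i / n for i in range(n + 1)]
integral = sum((C(Ts[i]) / Ts[i] + C(Ts[i + 1]) / Ts[i + 1]) / 2 * (Ts[i + 1] - Ts[i])
               for i in range(n))

print(integral, S(T2) - S(T1))  # the two agree
```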
Heat Capacity Contribution From Schottky Anomaly in Canonical Ensemble
Consider a system of \(N\) fixed, non-interacting electrons \(e^-\) placed in an external \(\textbf B_{\text{ext}}\)-field. Each electron experiences a Zeeman splitting of magnitude \(\Delta E=E_{\downarrow}-E_{\uparrow}=\hbar\omega_0\) with \(\omega_0=\gamma B_{\text{ext}}\) the Larmor frequency of spin angular momentum precession about \(\textbf B_{\text{ext}}\). If we focus for a moment on just \(N=1\) of these fixed electrons \(e^-\), then \(Z=e^{-\beta E_{\uparrow}}+e^{-\beta E_{\downarrow}}=e^{\beta\hbar\omega_0/2}+e^{-\beta\hbar\omega_0/2}=2\cosh\frac{\beta\hbar\omega_0}{2}\). But then, recalling that the \(N\) electrons are non-interacting, it follows that the total partition function is:
\[Z=2^N\cosh^N\frac{\beta\hbar\omega_0}{2}\]
\[\ln Z=N\left(\ln 2+\ln\cosh\frac{\beta\hbar\omega_0}{2}\right)\]
From \(\langle E\rangle=-\frac{\partial\ln Z}{\partial\beta}\), we have:
\[\langle E\rangle = -\frac{N\hbar\omega_0}{2}\tanh\frac{\beta\hbar\omega_0}{2}\]
It makes sense that this is negative because in general more electron spins will be aligned than anti-aligned with \(\textbf B_{\text{ext}}\).
\[\sigma_E^2=\frac{\partial^2\ln Z}{\partial\beta^2}=N\left(\frac{\hbar\omega_0}{2}\right)^2\text{sech}^2\frac{\beta\hbar\omega_0}{2}\]
so the fluctuation-dissipation theorem \(C_V=k\beta^2\sigma_E^2\) gives:
\[C_V=Nk\left(\frac{\beta\hbar\omega_0}{2}\right)^2\text{sech}^2\frac{\beta\hbar\omega_0}{2}\]

In practice, this Schottky anomaly due to the spin angular momentum contribution to \(C_V\) is dwarfed by contributions from phonons and from conduction \(e^-\) in metals. Furthermore, it doesn’t account for coupling between spins since we assumed the \(e^-\) were non-interacting.
As an example of computing entropy in the canonical ensemble, for the earlier Schottky anomaly this is:
\[S=Nk\left(-\frac{\beta\hbar\omega_0}{2}\tanh\frac{\beta\hbar\omega_0}{2}+\ln 2+\ln\cosh\frac{\beta\hbar\omega_0}{2}\right)\]

As another remark, if we take \(M_{\text{ind}}:=N_{|\uparrow\rangle}-N_{|\downarrow\rangle}=N\tanh\frac{\beta\hbar\omega_0}{2}\) to be the net induced magnetization along \(\textbf B_{\text{ext}}\), then one can also establish a version of Curie’s law \(\chi:=\frac{\partial M}{\partial B_{\text{ext}}}=\beta\mu N\text{sech}^2\frac{\beta\hbar\omega_0}{2}\approx\beta\mu N\) in the high-temperature limit \(T\to\infty\) (where here \(\mu=\hbar\gamma/2\)).
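The Schottky heat capacity formula above can be cross-checked against a direct numerical second derivative of \(\ln Z\) (working in units \(k_B=\hbar\omega_0=1\) with \(N=1\) for simplicity); one also sees \(C_V\) vanish in both the \(T\to 0\) and \(T\to\infty\) limits, peaking in between:

```python
from math import cosh, log

# Schottky anomaly cross-check (units: k_B = 1, hbar*omega_0 = 1, N = 1)
N, hw = 1, 1.0

def lnZ(beta):
    return N * (log(2) + log(cosh(beta * hw / 2)))

def C_analytic(beta):
    x = beta * hw / 2
    return N * x**2 / cosh(x)**2  # N k (beta hw/2)^2 sech^2(beta hw/2)

def C_numeric(beta):
    # C = k beta^2 d^2(ln Z)/d beta^2, via a central second difference
    h = 1e-3
    return beta**2 * (lnZ(beta + h) - 2 * lnZ(beta) + lnZ(beta - h)) / h**2

for beta in (0.2, 1.0, 5.0):
    print(beta, C_analytic(beta), C_numeric(beta))
# C vanishes as beta -> 0 (T -> infinity) and beta -> infinity (T -> 0),
# peaking in between: the Schottky anomaly
```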
Helmholtz Free Energy
The phrase “free energy” is used ubiquitously throughout thermodynamics. From a practical engineering perspective, free energies can always be interpreted as the maximum available work \(W_{\text{available}}^*\) under specific circumstances. In mechanics, this would simply be called “potential energy”, and to reflect this analogy in thermodynamics the free energy functions are also known as thermodynamic potentials.
Because the canonical ensemble is the \(NVT\)-ensemble, in particular at constant volume \(V\) and constant temperature \(T\) the Helmholtz free energy \(F:=\langle E\rangle -TS=-kT\ln Z\), which has the most direct connection to the partition function \(Z\) (no derivatives, just the log), is minimized in equilibrium. Roughly, this is because any change of the system at fixed \(T\) changes the heat bath’s entropy by \(\Delta S_{\text{bath}}=-\Delta\langle E\rangle/T\), so the total entropy change is \(\Delta S+\Delta S_{\text{bath}}=-\Delta F/T\); maximizing the total entropy of system + bath is thus equivalent to minimizing \(F\) (see: https://physics.stackexchange.com/questions/341345/why-does-the-gibbs-free-energy-need-to-be-minimized-for-an-equilibrium for a rough idea).
Chemical Potential & Grand Canonical Ensemble
Conceptually, having already made the transition from the microcanonical ensemble to the canonical ensemble, it’s not difficult to make the jump from the canonical ensemble to the grand canonical ensemble. Here, the system \(\mathcal S\) interacts with a heat bath \(\mathcal E\) at constant temperature \(T\) and chemical potential \(\mu:=-T\frac{\partial S}{\partial N}\). As usual, assume the heat bath has a lot more energy and particles than the system \(N_{\mathcal E},E_{\mathcal E}\gg N_{\mathcal S},E_{\mathcal S}\) and that the composite system \(\mathcal S\cup\mathcal E\) is in the microcanonical ensemble so by the principle of equal a priori probabilities:
\[p_{|\mu_{\mathcal S}\rangle}=\frac{\Omega_{\mathcal E}(N-N_{|\mu_{\mathcal S}\rangle},E-E_{|\mu_{\mathcal S}\rangle})}{\Omega_{\mathcal S\cup\mathcal E}}\]
where \(|\mu_{\mathcal S}\rangle\) denotes an arbitrary microstate of just the system \(\mathcal S\) and \(\Omega_{\mathcal S\cup\mathcal E}=\sum_{|\tilde{\mu}_{\mathcal S}\rangle}\Omega_{\mathcal E}(N-N_{|\tilde{\mu}_{\mathcal S}\rangle},E-E_{|\tilde{\mu}_{\mathcal S}\rangle})\) which ensures \(\sum_{|\mu_{\mathcal S}\rangle}p_{|\mu_{\mathcal S}\rangle}=1\). As usual, the \(3\) steps are then:
- Rewrite the heat bath’s multiplicity \(\Omega_{\mathcal E}\) in terms of its Boltzmann entropy \(\Omega_{\mathcal E}=e^{S_{\mathcal E}/k}\)
- Taylor expand the heat bath’s Boltzmann entropy \(S_{\mathcal E}(N_{\mathcal E},E_{\mathcal E})\) about the total \((N,E)\) of the composite system + heat bath \(\mathcal S\cup\mathcal E\).
- Plug in the definitions of \(T\) and \(\mu\) into the first partial derivative terms that appear in such a multivariable Taylor expansion.
Doing this, one ends up with the grand canonical ensemble microstate probability distribution:
\[p_{|\mu_{\mathcal S}\rangle}=\frac{e^{-\beta(E_{|\mu_{\mathcal S}\rangle}-\mu N_{|\mu_{\mathcal S}\rangle})}}{\mathcal Z(\mu, T)}\]
Now it’s the grand canonical partition function \(\mathcal Z=\sum_{|\tilde{\mu}_{\mathcal S}\rangle}e^{-\beta(E_{|\tilde{\mu}_{\mathcal S}\rangle}-\mu N_{|\tilde{\mu}_{\mathcal S}\rangle})}\) which provides the gateway to computing all the macroscopic observables of a grand canonical thermodynamic system. One can check the following formulas (all referring to the system \(\mathcal S\)):
\[S=k\frac{\partial}{\partial T}(T\ln\mathcal Z)\]
\[\langle E\rangle-\mu\langle N\rangle=-\frac{\partial}{\partial\beta}\ln \mathcal Z\]
\[\langle N\rangle =\frac{1}{\beta}\frac{\partial}{\partial\mu}\ln\mathcal Z\]
\[\sigma^2_N=\frac{1}{\beta}\frac{\partial\langle N\rangle}{\partial\mu}\]
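As a quick check of the \(\langle N\rangle\) formula, consider the simplest possible example: a single fermionic orbital of energy \(\epsilon\) (illustrative made-up values below), whose grand canonical partition function is \(\mathcal Z=1+e^{-\beta(\epsilon-\mu)}\) since the occupation is \(N\in\{0,1\}\). Differentiating \(\ln\mathcal Z\) numerically recovers the Fermi-Dirac occupancy:

```python
from math import exp, log

beta, eps = 2.0, 0.7  # inverse temperature and a single-orbital energy (made up)

def ln_grand_Z(mu):
    # Single fermionic orbital: occupation N in {0, 1}, so
    # grand partition function = 1 + e^{-beta (eps - mu)}
    return log(1 + exp(-beta * (eps - mu)))

def avg_N(mu):
    # <N> = (1/beta) d(ln Z)/d(mu), evaluated by central difference
    h = 1e-6
    return (ln_grand_Z(mu + h) - ln_grand_Z(mu - h)) / (2 * h) / beta

mu = 0.4
fermi_dirac = 1 / (exp(beta * (eps - mu)) + 1)  # the expected Fermi-Dirac occupancy
print(avg_N(mu), fermi_dirac)  # agree
```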
In the grand canonical ensemble, we of course have the grand canonical potential \(\Phi:=F-\mu N\), the Legendre transform of the Helmholtz free energy \(F\) from \(N\) to \(\mu\) (since in the grand canonical ensemble it is \(\mu\), no longer \(N\), that is held fixed). Note that \(\Phi=-kT\ln \mathcal Z\). Finally, using the extensivity argument \(\Phi(\mu, \xi V,T)=\xi\Phi(\mu, V, T)\), it follows that \(\Phi\propto V\); in fact the constant of proportionality is the negative of the pressure \(p=p(\mu, T)\), so \(\Phi=-pV\). This follows from the differential \(d\Phi=-Nd\mu-pdV-SdT\), which gives \(\partial\Phi/\partial V=-p\).