Given a mixed ensemble \(\{(|\psi_n\rangle, p_n)\}\) of pure quantum states \(|\psi_n\rangle\in\mathcal H\) with statistical probabilities \(p_n\in[0,1]\) satisfying \(\sum_np_n=1\), the Hermitian, positive semi-definite, unit-trace density operator \(\rho_{\{(|\psi_n\rangle, p_n)\}}:\mathcal H\to\mathcal H\) of that mixed ensemble is defined by the formula:
\[\rho_{\{(|\psi_n\rangle, p_n)\}}:=\sum_np_n|\psi_n\rangle\langle\psi_n|\]
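As a rough numerical illustration of this definition, the sketch below builds \(\rho\) for a small ensemble and checks the three advertised properties. The helper name `density_operator` and the particular ensemble (two deliberately non-orthogonal qubit states) are assumptions made only for this example.

```python
import numpy as np

def density_operator(states, probs):
    """Return sum_n p_n |psi_n><psi_n| for a list of state vectors."""
    states = [np.asarray(s, dtype=complex) for s in states]
    dim = states[0].shape[0]
    rho = np.zeros((dim, dim), dtype=complex)
    for psi, p in zip(states, probs):
        psi = psi / np.linalg.norm(psi)       # enforce <psi|psi> = 1
        rho += p * np.outer(psi, psi.conj())  # p_n |psi_n><psi_n|
    return rho

# Hypothetical ensemble: |up> with probability 0.75, |+> with probability 0.25.
up = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = density_operator([up, plus], [0.75, 0.25])

is_hermitian = np.allclose(rho, rho.conj().T)
eigenvalues = np.linalg.eigvalsh(rho)
is_positive = bool(np.all(eigenvalues >= -1e-12))  # tolerance for round-off
trace = np.trace(rho).real
```

Note that the two states here are not orthogonal, so the probabilities 0.75 and 0.25 are not the eigenvalues of this particular \(\rho\).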
In practice, the mixed ensemble \(\{(|\psi_n\rangle, p_n)\}\) with respect to which one is describing a given density operator will be obvious from context, so the density operator is commonly just written as \(\rho\). In general, the pure states \(|\psi_n\rangle\) in the mixed ensemble do not need to form a basis for \(\mathcal H\) or be orthogonal to each other; they only need to be normalized, \(\langle\psi_n|\psi_n\rangle =1\). However, for intuition it is helpful to look at the special (and indeed, not uncommon) case where the \(|\psi_n\rangle\) do in fact form an orthonormal basis for \(\mathcal H\), for instance if the \(|\psi_n\rangle=|E_n\rangle\) are the eigenstates of a Hermitian observable \(H\) so that \(H|E_n\rangle=E_n|E_n\rangle\). Recall that in such a case, any analytic function \(f(H)\) of the observable \(H\) is given by:
\[f(H):=\sum_nf(E_n)|E_n\rangle\langle E_n|\]
For instance, if \(f(E):=E\) is the identity function then we just get:
\[H=\sum_nE_n|E_n\rangle\langle E_n|\]
Or if in addition \(H:=1\) is the identity then one has the usual resolution of the identity:
\[1=\sum_n|E_n\rangle\langle E_n|\]
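The spectral formula for \(f(H)\) can be sketched numerically as follows. The helper name `apply_function` and the choice of observable (the Pauli X matrix) are assumptions for illustration only; the resolution of the identity is recovered here by taking the constant function \(f(E)=1\).

```python
import numpy as np

def apply_function(H, f):
    """Compute f(H) = sum_n f(E_n) |E_n><E_n| for a Hermitian matrix H."""
    energies, vectors = np.linalg.eigh(H)  # columns of `vectors` are the |E_n>
    # Scale column n by f(E_n), then multiply by the conjugate transpose.
    return (vectors * f(energies)) @ vectors.conj().T

# Hypothetical 2x2 Hermitian observable (Pauli X).
H = np.array([[0.0, 1.0], [1.0, 0.0]])

H_again = apply_function(H, lambda E: E)                 # f(E) = E reproduces H
identity = apply_function(H, lambda E: np.ones_like(E))  # f(E) = 1 gives the identity
H_squared = apply_function(H, lambda E: E**2)            # should agree with H @ H
```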
In particular, the density operator \(\rho\) for any mixed ensemble is Hermitian, \(\rho^{\dagger}=\rho\), so in the orthonormal special case above, replacing \(H\mapsto\rho\) and \(E_n\mapsto p_n\) gives:
\[\rho=\sum_np_n|\psi_n\rangle\langle \psi_n|\]
This shows that the density operator \(\rho\) is roughly a probability operator for the mixed ensemble of pure states in the sense that, given the special case above with the pure states \(|\psi_n\rangle\) all orthonormal, \(\rho|\psi_n\rangle=p_n|\psi_n\rangle\) so that the \(|\psi_n\rangle\) are all eigenstates of \(\rho\) with eigenvalues \(p_n\).
The average measured value of an observable \(H\) in a mixed ensemble is \(\text{Tr}(\rho H)\). For instance, if \(H=1\) is the identity, then \(\text{Tr}(\rho)=1\) as all states are eigenstates of \(1\) with eigenvalue \(1\). Or, if \(H=\rho\), then \(\text{Tr}(\rho^2)\) is called the purity of the mixed ensemble. The name is apt because for a pure state \(\rho=|\psi\rangle\langle\psi|\) the density operator is just a projection, \(\rho^2=\rho\), so that the purity is \(\text{Tr}(\rho^2)=\text{Tr}(\rho)=1\). For an impure mixed ensemble, by contrast, the purity is \(\text{Tr}(\rho^2)<1\).
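A minimal numerical sketch of these two traces, assuming a single qubit: the function names `expectation` and `purity`, the observable `sigma_z`, and the two example states are all illustrative choices rather than anything fixed by the text.

```python
import numpy as np

def expectation(rho, H):
    """Average measured value Tr(rho H)."""
    return np.trace(rho @ H).real

def purity(rho):
    """Purity Tr(rho^2)."""
    return np.trace(rho @ rho).real

up = np.array([1.0, 0.0])
pure = np.outer(up, up.conj())   # pure state |up><up|
mixed = np.diag([0.5, 0.5])      # equal mixture of |up> and |down>
sigma_z = np.diag([1.0, -1.0])   # hypothetical observable

pure_purity = purity(pure)                  # equals 1 for a pure state
mixed_purity = purity(mixed)                # strictly less than 1
mean_z_pure = expectation(pure, sigma_z)
mean_z_mixed = expectation(mixed, sigma_z)
```

For these inputs the pure state has purity 1 and \(\langle\sigma_z\rangle=1\), while the equal mixture has purity \(1/2\) and \(\langle\sigma_z\rangle=0\).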
Given a mixed ensemble \(\{(|\psi_n\rangle, p_n)\}\) of pure states, one would like to quantify how much information one has about the actual quantum state \(|\psi_m\rangle\) of the system. To this effect, define the von Neumann entropy \(S\in[0,\ln\dim\mathcal H]\) of the mixed ensemble \(\{(|\psi_n\rangle, p_n)\}\) by:
\[S:=-\text{Tr}(\rho\ln\rho)\]
Because \(S=S(\rho)\) depends only on \(\rho\) and the trace is basis-independent, in practice one calculates the von Neumann entropy of a mixed ensemble of pure states in an eigenbasis of \(\rho\) via:
\[S=-\sum_n\rho_n\ln(\rho_n)\]
where the \(\rho_n\in[0,1]\) are the eigenvalues of \(\rho\) (with the convention \(0\ln 0:=0\)), thus resembling the notion of Shannon entropy \(S_X:=-\sum_{x\in X(\Omega)}m_X(x)\log(m_X(x))\) in information theory (historically, von Neumann's entropy predates Shannon's).
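The eigenvalue formula can be sketched as below. The function name `von_neumann_entropy` and the cutoff `1e-12` used to drop numerically zero eigenvalues (implementing \(0\ln 0=0\)) are assumptions of this example.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -sum_n rho_n ln(rho_n) over the eigenvalues rho_n of rho."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]  # convention: 0 ln 0 = 0
    return float(-np.sum(eigenvalues * np.log(eigenvalues)))

pure = np.diag([1.0, 0.0])    # a pure qubit state
mixed = np.diag([0.5, 0.5])   # the maximally mixed qubit state

entropy_pure = von_neumann_entropy(pure)
entropy_mixed = von_neumann_entropy(mixed)
```

The pure state sits at the lower end \(S=0\) of the allowed range, and the maximally mixed qubit state at the upper end \(S=\ln\dim\mathcal H=\ln 2\).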
There is a fundamental result in quantum statistical mechanics which relies on maximizing the von Neumann entropy \(S\) in the spirit of the second law of thermodynamics:
Theorem: Given the total probability constraint \(\text{Tr}\rho=1\) and the average energy constraint \(\text{Tr}\rho H=E\) on the density operator \(\rho\), the unique density operator \(\rho\) that maximizes the von Neumann entropy functional \(S=S(\rho)\) is given by the Boltzmann distribution \(\rho_{\text{Boltzmann}}=\frac{e^{-\beta H}}{Z(\beta)}\), where the reciprocal temperature \(\beta=1/kT\) is determined (unsurprisingly) by the average energy \(E=\frac{\sum_nE_ne^{-\beta E_n}}{Z(\beta)}\), \(e^{-\beta H}=\sum_n e^{-\beta E_n}|E_n\rangle\langle E_n|\), and \(Z(\beta):=\text{Tr}e^{-\beta H}=\sum_n e^{-\beta E_n}\) is the partition function for the quantum system with Hamiltonian \(H\).
The proof is a straightforward exercise in Lagrange multipliers.
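The following is a numerical sanity check of the theorem rather than a proof, and it only looks at states that are diagonal in the energy eigenbasis. The three-level spectrum, the value of `beta`, and the perturbation `direction` are hypothetical choices; the direction is picked so that it changes neither the total probability nor the average energy.

```python
import numpy as np

def gibbs_probabilities(energies, beta):
    """Eigenvalues e^{-beta E_n} / Z(beta) of the Boltzmann density operator."""
    weights = np.exp(-beta * np.asarray(energies, dtype=float))
    return weights / weights.sum()  # weights.sum() is Z(beta)

def entropy(p):
    """-sum_n p_n ln(p_n), with the convention 0 ln 0 = 0."""
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

energies = np.array([0.0, 1.0, 2.0])  # hypothetical spectrum of H
beta = 1.0                            # hypothetical reciprocal temperature
p_gibbs = gibbs_probabilities(energies, beta)

# (1, -2, 1) sums to zero and has zero energy-weighted sum for this spectrum,
# so moving along it preserves both constraints.
direction = np.array([1.0, -2.0, 1.0])
p_other = p_gibbs + 0.01 * direction

same_norm = bool(np.isclose(p_other.sum(), 1.0))
same_energy = bool(np.isclose(p_other @ energies, p_gibbs @ energies))
entropy_gibbs = entropy(p_gibbs)
entropy_other = entropy(p_other)
```

With both constraints unchanged, the perturbed distribution should come out with a strictly smaller entropy than the Boltzmann one, consistent with the theorem.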
Given a quantum system with state space \(\mathcal H\) which can be viewed as \(\mathcal H=\mathcal H_A\otimes\mathcal H_B\) for some quantum subsystems \(A,B\), the density operator \(\rho:\mathcal H\to\mathcal H\) of the entire quantum system can be reduced to two reduced density operators \(\rho_A:\mathcal H_A\to\mathcal H_A\) and \(\rho_B:\mathcal H_B\to\mathcal H_B\) on the tensor factors \(\mathcal H_A\) and \(\mathcal H_B\) respectively. This reduction process involves tracing out the unwanted degrees of freedom of whichever subsystem one is not interested in:
\[\rho_A:=\text{Tr}_B\rho\]
\[\rho_B:=\text{Tr}_A\rho\]
where \(\text{Tr}_A,\text{Tr}_B\) are called partial traces (cf. partial derivatives) with respect to the quantum subsystems \(\mathcal H_A\) and \(\mathcal H_B\) respectively. In practice, \(A\) will be the quantum system of interest while \(B\) will be some kind of environment/surrounding quantum system whose state one is not interested in.
Example: Consider two qubits with \(\mathcal H=\textbf C^2\otimes\textbf C^2\). Suppose that the system as a whole is in the pure quantum state \(|\psi\rangle=\frac{|\uparrow\rangle\otimes|\uparrow\rangle+|\downarrow\rangle\otimes|\downarrow\rangle}{\sqrt{2}}\) (this happens to be one of the four so-called maximally entangled 2-qubit Bell states which together form a basis for \(\mathcal H=\textbf C^2\otimes\textbf C^2\)). Equivalently, the density operator is \(\rho=|\psi\rangle\langle\psi|=\frac{|\uparrow\rangle\otimes|\uparrow\rangle\langle\uparrow|\otimes\langle\uparrow|+|\uparrow\rangle\otimes|\uparrow\rangle\langle\downarrow|\otimes\langle\downarrow|+|\downarrow\rangle\otimes|\downarrow\rangle\langle\uparrow|\otimes\langle\uparrow|+|\downarrow\rangle\otimes|\downarrow\rangle\langle\downarrow|\otimes\langle\downarrow|}{2}\). However, suppose one is only interested in one of the two qubits (let’s call it qubit \(A\)), and so would like to calculate the reduced density operator for qubit \(A\) by tracing out qubit \(B\). This means:
\[\rho_A=\langle\uparrow_B|\rho|\uparrow_B\rangle+\langle\downarrow_B|\rho|\downarrow_B\rangle\]
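Carrying out this partial trace by hand gives \(\rho_A=\frac{|\uparrow\rangle\langle\uparrow|+|\downarrow\rangle\langle\downarrow|}{2}\), i.e. half the identity on qubit \(A\). A short numerical sketch of the same computation is below; the index ordering (qubit \(A\) first in `np.kron`) and the variable names are assumptions of this example.

```python
import numpy as np

up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])

# Bell state (|up,up> + |down,down>)/sqrt(2), with qubit A as the first factor.
psi = (np.kron(up, up) + np.kron(down, down)) / np.sqrt(2)
rho = np.outer(psi, psi.conj())  # 4x4 density matrix of the pure state

# View rho as a tensor with indices (a, b, a', b') and sum over b = b'.
rho4 = rho.reshape(2, 2, 2, 2)
rho_A = np.einsum('abcb->ac', rho4)  # trace out qubit B
rho_B = np.einsum('abad->bd', rho4)  # trace out qubit A

purity_total = np.trace(rho @ rho).real
purity_A = np.trace(rho_A @ rho_A).real
```

Although the two-qubit state is pure (purity 1), the reduced state of qubit \(A\) is maximally mixed (purity \(1/2\)), which is the signature of maximal entanglement.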
By fixing some basis \(\beta\) of \(\mathcal H\), one can of course also work with the density matrix \([\rho]_{\beta}^{\beta}\in\textbf C^{\dim\mathcal H\times\dim\mathcal H}\) of \(\rho\) with respect to the basis \(\beta\) (not to be confused with the reciprocal temperature above). Typically however, \(\beta\) will be an orthonormal eigenbasis of some observable \(H\) so that