The purpose of this post is to demonstrate how Maxwell’s equations are compatible with special relativity. First though, some review of the formalism of special relativity is needed.
Review of Lorentz Scalars, Four-Vectors, Four-Covectors
Recall that in special relativity, a Lorentz scalar \(a=a(X)\) is any real number depending in some way on the four-position \(X=(ct,\textbf x)\) of events in Minkowski spacetime such that whenever the four-position \(X\) of an event undergoes a Lorentz transformation \(X\mapsto \Lambda X\) for some \(\Lambda\in O(1,3)\) in the Lorentz group (which recall consists of Lorentz boosts and proper/improper rotations) the Lorentz scalar \(a(X)\) is Lorentz invariant \(a(\Lambda X)=a(X)\). Similarly a four-vector \(V=V(X)\) transforms in an equivariant/commuting way \(V(\Lambda X)=\Lambda V(X)\) under a Lorentz transformation \(X\mapsto \Lambda X\) (here, I’m tempted to say “in a covariant way” but actually the components of \(X\) turn out to be contravariant…more on this later).
Theorem: If \(V(X),W(X)\) are any two four-vectors, then their inner product \(V(X)^T\eta W(X)\) is a Lorentz scalar. If in addition \(a(X),b(X)\) are Lorentz scalars, then the linear combination \(a(X)V(X)+b(X)W(X)\) is a four-vector.
Proof: \(V(\Lambda X)^T\eta W(\Lambda X)=(\Lambda V(X))^T\eta\Lambda W(X)=V(X)^T\Lambda^T\eta\Lambda W(X)=V(X)^T\eta W(X)\) as the defining property for membership \(\Lambda\in O(1,3)\) in the Lorentz group is that \(\Lambda^T\eta\Lambda=\eta\). For the second part, we have \(a(\Lambda X)V(\Lambda X)+b(\Lambda X)W(\Lambda X)=a(X)\Lambda V(X)+b(X)\Lambda W(X)=\Lambda(a(X)V(X)+b(X)W(X))\).
Corollaries: The Minkowski distance/”ruler” \(ds^2:=dX^T\eta dX=\eta_{\mu\nu}dX^{\mu}dX^{\nu}=c^2dt^2-|d\textbf x|^2\) is the fundamental Lorentz scalar that encodes the hyperbolic geometry of Minkowski spacetime. The proper time interval \(d\tau:=ds/c\) experienced by a particle (where here \(ds\) is the Minkowski distance between two infinitesimally close events on the particle’s worldline) is therefore also a Lorentz scalar. Thus, the derivative \(V:=\frac{dX}{d\tau}\) of the four-vector \(dX\) with respect to the Lorentz scalar \(d\tau\) is also a four-vector, called the four-velocity of the particle. Similar comments apply to the four-momentum \(P:=mV\), the four-acceleration \(A:=\frac{dV}{d\tau}\), etc.
Recall that in any inner product space \(V\), there is a canonical isomorphism \(V\cong V^*\) between \(V\) and its algebraic dual space \(V^*\) given by transposition \(\textbf v\mapsto \textbf v^T\) from a column vector to a row vector if one likes. In the case of Minkowski spacetime, the analog of the “inner product” is of course \((V,W)\mapsto V^T\eta W\) (though this is not strictly a valid inner product because \(X^T\eta X\) need not be positive semi-definite as exemplified when \(X\) is a space-like separated event!) so it makes sense that a canonical isomorphism should also exist between Minkowski spacetime and its algebraic dual spacetime given by \(V\mapsto V^T\eta\). Thus for instance, a four-position \(X=(ct,\textbf x)^T\) is dual to \(X^T\eta=(ct, -\textbf x)\), a four-velocity \(V=\gamma(c,\textbf v)^T\) is dual to \(V^T\eta=\gamma(c,-\textbf v)\), etc. Such objects like \(X^T\eta\) and \(V^T\eta\) that live in dual Minkowski spacetime are called four-covectors. It can be checked that under a Lorentz transformation \(X\mapsto \Lambda X\), a four-covector \(\tilde V(X)=V(X)^T\eta\) transforms (roughly) the opposite way to how a four-vector behaves, namely \(\tilde V(\Lambda X)=\tilde V(X)\Lambda^{-1}\). In other words, this provides another way to see why the inner product \(\tilde V(X)V(X)\mapsto \tilde V(X)\Lambda^{-1}\Lambda V(X)=\tilde V(X)V(X)\) is a Lorentz scalar.
If one works in a specific inertial frame, then the covariant components \(X_{\mu}\) of a four-covector \(\tilde X\) are obtained from the contravariant components \(X^{\nu}\) of the corresponding four-vector \(X\) by lowering indices with the Minkowski metric tensor \(X_{\mu}=\eta_{\mu\nu}X^{\nu}\). As \(\eta^2=1\) is an involution, it follows that \(\eta^{\mu\rho}\eta_{\rho\nu}=\delta^{\mu}_{\nu}\) so the contravariant Minkowski metric \(\eta^{\mu\nu}\) can also be used to “raise indices” (i.e. promote from a four-covector back to a four-vector) \(X^{\mu}=\eta^{\mu\nu}X_{\nu}\). Using this index formalism, under a Lorentz transformation of contravariant components \(X^{\mu}\mapsto \Lambda^{\mu}_{\nu}X^{\nu}\), the covariant components transform as \(X_{\mu}=\eta_{\mu\sigma}X^{\sigma}\mapsto \eta_{\mu\sigma}\Lambda^{\sigma}_{\rho} X^{\rho}=\eta_{\mu\sigma}\Lambda^{\sigma}_{\rho}\eta^{\rho\nu} X_{\nu}=\Lambda^{\nu}_{\mu}X_{\nu}\).
A prominent example of a four-covector is the four-gradient \(\frac{\partial}{\partial X}:=\left(\frac{1}{c}\frac{\partial}{\partial t},\frac{\partial}{\partial\textbf x}\right)\). It has components \(\frac{\partial}{\partial X^{\mu}}\). To prove that it’s actually a four-covector and not a four-vector, let \(X’^{\mu}=\Lambda^{\mu}_{\nu}X^{\nu}\) be the Lorentz-transformed contravariant components of the four-position \(X\). From this we obtain \(\frac{\partial X’^{\mu}}{\partial X^{\nu}}=\frac{\partial}{\partial X^{\nu}}\Lambda^{\mu}_{\rho}X^{\rho}=\Lambda^{\mu}_{\rho}\frac{\partial X^{\rho}}{\partial X^{\nu}}=\Lambda^{\mu}_{\rho}\delta_{\nu}^{\rho}=\Lambda^{\mu}_{\nu}\) or equivalently \(\frac{\partial X^{\nu}}{\partial X’^{\mu}}=\Lambda^{\nu}_{\mu}\). Meanwhile, the components of the four-gradient transform (by virtue of the chain rule) into \(\frac{\partial}{\partial X^{\mu}}\mapsto\frac{\partial}{\partial X’^{\mu}}=\frac{\partial X^{\nu}}{\partial X’^{\mu}}\frac{\partial}{\partial X^{\nu}}=\Lambda^{\nu}_{\mu}\frac{\partial}{\partial X^{\nu}}\). Thus, either from the fact that the \(\mu\) is a subscript on \(\Lambda\) or that the \(\nu\) is a superscript on \(\Lambda\), clearly this implies that the four-gradient \(\frac{\partial}{\partial X}\) is a four-covector and hence it has covariant components that can be denoted by the subscript \(\partial_{\mu}:=\frac{\partial}{\partial X^{\mu}}\).
Tensors
In general, on an arbitrary manifold (such as Minkowski spacetime!) if one transforms between two coordinate charts \((x^1,x^2,…)\mapsto (x’^1,x’^2,…)\) (i.e. a Lorentz transformation between two inertial frames!), then physicists define a tensor \(T\) of type \((u,v)\) by enforcing that its components transform according to:
\[T’^{\mu’_1,…,\mu’_{u}}_{\nu’_1,…,\nu’_{v}}=\frac{\partial x’^{\mu’_1}}{\partial x^{\mu_1}}…\frac{\partial x’^{\mu’_u}}{\partial x^{\mu_u}}\frac{\partial x^{\nu_1}}{\partial x’^{\nu’_1}}…\frac{\partial x^{\nu_v}}{\partial x’^{\nu’_v}}T^{\mu_1,…,\mu_u}_{\nu_1,…,\nu_v}\]
Please ingrain this equation into your head. In the case where that manifold is Minkowski spacetime, the partial derivatives become (via the earlier derivation):
\[T’^{\mu’_1,…,\mu’_{u}}_{\nu’_1,…,\nu’_{v}}=\Lambda^{\mu’_1}_{\mu_1}…\Lambda^{\mu’_u}_{\mu_u}\Lambda^{\nu_1}_{\nu’_1}…\Lambda^{\nu_v}_{\nu’_v}T^{\mu_1,…,\mu_u}_{\nu_1,…,\nu_v}\]
which we call a Lorentz tensor. Thus, Lorentz scalars are type \((0,0)\) Lorentz tensors, four-vectors are type \((1,0)\) Lorentz tensors while four-covectors are type \((0,1)\) Lorentz tensors. The important point about Lorentz tensors in special relativity is that any equation you write down using tensors (where you’ve respected the rules about pairing upper/contravariant and lower/covariant indices, etc.) is necessarily a Lorentz covariant equation in the sense that it will look the same in all inertial frames. This phenomenon of Lorentz covariance was one of Einstein’s original postulates of special relativity!
Four-Current
The four-current is defined by:
\[J:=\begin{pmatrix}\rho c \\ \textbf J\end{pmatrix}\]
It turns out that the four-current \(J\) is a four-vector as its name suggests, thus it has contravariant components \(J^{\mu}\) for spacetime index \(\mu=0,1,2,3\). Unlike other four-vectors such as the four-position \(X\), the four-velocity \(V\), etc. which were all properties of a single particle, the four-current is more of a “four-vector field” for a given configuration/dynamics of electric charges because it depends on the scalar field \(\rho=\rho(X)\) and the vector field \(\textbf J=\textbf J(X)\). Thus, the four-current is a type \((1,0)\) tensor field.
But why is \(J\) a four-vector (field)? The idea is that \(J\) contains both the information \(\rho\) about where the charges are and how they’re moving \(\textbf J\), but of course by local charge conservation these two quantities are related by the continuity equation \(\frac{\partial\rho}{\partial t}+\frac{\partial}{\partial\textbf x}\cdot\textbf J=0\) which is elegantly expressed using the four-gradient as \(\partial_{\mu}J^{\mu}=0\). So the aesthetic of this should give some suggestion that, because we already showed that the four-gradient \(\partial_{\mu}\) is a four-covector, \(J\) should be a four-vector. To prove this rigorously, we enforce Lorentz covariance of local charge conservation in all inertial frames, so \(\partial’_{\mu}J’^{\mu}=0\). Then, \(\partial_{\mu}=\Lambda_{\mu}^{\nu}\partial’_{\nu}\), so \(\Lambda_{\mu}^{\nu}\partial’_{\nu}J^{\mu}=\partial’_{\nu}\Lambda_{\mu}^{\nu}J^{\mu}=0\). This implies that \(J’^{\mu}=\Lambda^{\mu}_{\nu}J^{\nu}+C\) where \(C\) is an integration constant independent of \(\mu\). But if the primed and unprimed inertial frames were to coincide so that \(\Lambda^{\mu}_{\nu}=\delta^{\mu}_{\nu}\) and \(J’^{\mu}=J^{\mu}\), then we must have \(C=0\). This completes the proof that \(J\) is a four-vector. As an example, if a wire of uniform charge density \(\rho\) and current \(\textbf J=\textbf 0\) is at rest in one inertial frame, then a second inertial observer Lorentz-boosted relative to the wire with velocity \(\textbf v\) would therefore see an increased charge density \(\rho’=\gamma\rho\) with the factor of \(\gamma\) attributable to length contraction and current density \(\textbf J’=-\rho’\textbf v\).
Electric & Magnetic Fields
One of the key insights that arises when one applies special relativity to electromagnetism is that magnetic fields are a relativistic effect. I suppose this is true even in materials where the \(\textbf B\)-field is due to spin angular momenta, and the Dirac equation teaches us that spin too is an intrinsically relativistic phenomenon! A simple example is given by imagining that one has an infinitely-long neutral \(\rho=0\) wire of cross-sectional area \(A\) conducting a current \(I>0\) along say the positive \(z\)-axis in the wire’s rest frame. Now, classically, we know that this is supposed to establish a circulating magnetostatic field \(\textbf B(r,\phi)=\frac{\mu_0 I}{2\pi r}\hat{\boldsymbol{\phi}}_{\phi}\). Thus, if a point charge \(q>0\) were moving with velocity \(\textbf v=v\hat{\textbf k}\) along the \(z\)-axis at some moment in time, although it would feel no electric force \(\textbf F_{\textbf E}=\textbf 0\) in the wire’s rest frame because of the wire’s electroneutrality \(\rho=0\), it should experience a magnetic force \(\textbf F_{\textbf B}=q\textbf v\times\textbf B\) attracting it towards the wire. However, it turns out that we don’t need to assume this! All we need to assume is we know how electric forces work, and then inject a bit of special relativity. To see this, Lorentz boost into the instantaneous inertial rest frame of the point charge \(q\) so that in this frame, it clearly wouldn’t experience any magnetic force because now \(\textbf v’=\textbf 0\). However, at the same time the wire is no longer neutral, instead acquiring a negative charge density \(\rho’=-\gamma\beta I/Ac<0\) (as an aside, the current also increases to \(I’=\gamma I\)). In this rest frame of \(q\) then, an electric force \(\textbf F’_{\textbf E’}\) therefore attracts it (being a positive charge \(q>0\)) towards the negatively-charged wire given by the standard formula from Gauss’s law for electric fields: \(\textbf F’_{\textbf E’}=\frac{q\rho’ A}{2\pi\varepsilon_0 r}\hat{\textbf r}=-qv\frac{\gamma I}{2\pi\varepsilon_0 c^2r}\hat{\textbf r}\). But recall \(c^2=1/\mu_0\varepsilon_0\) so \(\textbf F’_{\textbf E’}=-qv\frac{\gamma \mu_0 I}{2\pi r}\hat{\textbf r}\). Lorentz transforming back into the neutral rest frame of the wire, we conclude that there must be a magnetic force \(\textbf F_{\textbf B}=\textbf F’_{\textbf E’}/\gamma=-qv\frac{\mu_0 I}{2\pi r}\hat{\textbf r}\) which is just what we expect from Ampere’s law (this is obtained by noting that the Lorentz boosts are performed along the \(z\)-axis so the \(xy\) spatial components of the four-force \(F=\gamma(\textbf F\cdot\textbf v/c, \textbf F)^T\) are unaffected, and in particular as \(\textbf F’_{\textbf E’}\) lives in the \(xy\)-plane so \(\gamma\textbf F_{\textbf B}=\gamma’\textbf F’_{\textbf E’}\) but \(\gamma’=1\) in the rest frame of \(q\) so \(\textbf F_{\textbf B}=\textbf F’_{\textbf E’}/\gamma\) as claimed).
The Electromagnetic Four-Potential
Recall that out of \(4\) of Maxwell’s equations for \(\textbf E\) and \(\textbf B\), Gauss’s law for electric fields and the Ampere-Maxwell law involve the source fields \(\rho,\textbf J\) whereas Gauss’s law for magnetic fields and Faraday’s law of EM induction relate purely to the \(\textbf E\) and \(\textbf B\)-fields, independent of \(\rho\) and \(\textbf J\). Thus, it would be nice if one could in some sense just “get those two equations out of the way” to focus on the remaining two that actually deal with the sources. The trick to doing this is to work with the underlying electric scalar potential \(\phi(X)\) and magnetic vector potential \(\textbf A(X)\):
\[\textbf E=-\frac{\partial\phi}{\partial\textbf x}-\frac{\partial\textbf A}{\partial t}\]
\[\textbf B=\frac{\partial}{\partial\textbf x}\times\textbf A\]
which immediately guarantees that \(\frac{\partial}{\partial\textbf x}\cdot\textbf B=0\) and \(\frac{\partial}{\partial\textbf x}\times\textbf E=-\frac{\partial\textbf B}{\partial t}\) are satisfied. The two remaining Maxwell’s equations become:
\[\left|\frac{\partial}{\partial\textbf x}\right|^2\phi+\frac{\partial}{\partial t}\frac{\partial}{\partial\textbf x}\cdot \textbf A=-\frac{\rho}{\varepsilon_0}\]
\[\frac{\partial}{\partial\textbf x}\frac{\partial}{\partial\textbf x}\cdot\textbf A-\left|\frac{\partial}{\partial\textbf x}\right|^2\textbf A+\frac{1}{c^2}\left(\frac{\partial^2\phi}{\partial t\partial\textbf x}+\frac{\partial^2\textbf A}{\partial t^2}\right)=\mu_0\textbf J\]
Of course, there exists a gauge redundancy/freedom (also called “gauge symmetry” although it’s not strictly speaking a “symmetry”) such that multiple non-unique choices of the potentials \((\phi,\textbf A)\) can give rise to the same fields \(\textbf E,\textbf B\). For one, it is clear that \(\textbf A’=\textbf A-\frac{\partial\Gamma}{\partial\textbf x}\) for any scalar field \(\Gamma(X)\) still yields the same \(\textbf B=\frac{\partial}{\partial\textbf x}\times\textbf A’\). If the electric field is to also be preserved \(\textbf E=-\frac{\partial\phi’}{\partial\textbf x}-\frac{\partial\textbf A’}{\partial t}=-\frac{\partial\phi}{\partial\textbf x}-\frac{\partial\textbf A}{\partial t}\), then one can deduce that the electric scalar potential must transform correspondingly as \(\phi’=\phi+\frac{\partial\Gamma}{\partial t}\). Taken together, \((\phi,\textbf A)\mapsto (\phi+\dot{\Gamma},\textbf A-\Gamma’)\) is called a gauge transformation of the potentials \(\phi,\textbf A\) with respect to the gauge field \(\Gamma\). Since, at least classically, the choice of the gauge \(\Gamma\) has no physical significance, one can simply fix it to simplify the resultant \(2\) Maxwell equations; for instance, the Coulomb gauge \(\Gamma_{\text{Coulomb}}\) is chosen to kill the divergence term \(\frac{\partial}{\partial\textbf x}\cdot\textbf A=0\), whereas the Lorenz gauge \(\Gamma_{\text{Lorenz}}\) is chosen to kill \(\frac{1}{c}\frac{\partial\phi}{\partial t}+\frac{\partial}{\partial\textbf x}\cdot\textbf A=0\), etc.
Define the electromagnetic four-potential \(A\) by:
\[A:=\begin{pmatrix}\phi/c \\ \textbf A\end{pmatrix}\]
Why is this a four-vector? The following argument is due to J. Murray on Physics Stack Exchange: denoting the components of \(A\) in an inertial frame by \(A^{\mu}\) (even though we haven’t proved it’s a four-vector yet), since we know about the gauge freedom inherent in the four-potential \(A’^{\mu}=A^{\mu}+\partial^{\mu}\Gamma\) where \(\partial^{\mu}=\eta^{\mu\nu}\partial_{\nu}\) are the contravariant components of the four-gradient four-vector, clearly we should check whether the purported four-vectorial nature of \(A\) would change simply because of which gauge \(\Gamma\) one fixes. Fortunately, this is not the case; \(A^{\mu}\) transforms contravariantly if and only if \(A’^{\mu}\) transforms contravariantly with respect to Lorentz transformations (this is a simple check!). It remains to exhibit a specific example of a gauge \(\Gamma\) within which \(A^{\mu}\) transforms contravariantly. Fix \(\Gamma:=\Gamma_{\text{Lorenz}}\) so that \(\partial_{\mu}A^{\mu}=0\). Then Gauss’s law for electric fields and the Ampere-Maxwell law reduce to \(☐^2 A=\mu_0 J\) where the d’Alembert operator is \(☐^2:=\partial_{\mu}\partial^{\mu}\). But this is manifestly a Lorentz scalar operator, and because we’ve already shown that the four-current \(J\) is a four-vector, it follows that the electromagnetic four-potential \(A\) is also a four-vector.
The Electromagnetic Tensor
Now that we understand how the potentials \(\phi,\textbf A\) behave under Lorentz transformations, and we know how these are related to \(\textbf E,\textbf B\), it is straightforward to deduce how they too must transform. However, the relevant “nice object” is no longer a four-vector since \(\textbf E\) and \(\textbf B\) has six components! Instead, rather than a four-vector (i.e. a type \((1,0)\) Lorentz tensor), it turns out to be convenient to use a \(4\times 4\) matrix \(F\) (i.e. a type \((1,1)\) Lorentz tensor) that will be called the electromagnetic tensor (also sometimes called the Faraday tensor hence the letter \(F\)). Specifically, since we know \(\textbf E,\textbf B\) are gauge-invariant, one would like to ensure that the electromagnetic tensor \(F\) is also gauge-invariant in order to have any hope that it can properly capture \(\textbf E,\textbf B\). Pondering all these considerations together, I think it’s not too much of a stretch to say that you could eventually stumble upon the following definition (called the Plucker matrix of the four-gradient \(\partial/\partial X\) with the electromagnetic four-potential \(A\), each thought of as a “point” that together define a line in real projective space \(\textbf RP^3\)):
\[F:=\frac{\partial A^T}{\partial X}-\left(\frac{\partial A^T}{\partial X}\right)^T\]
Or in components:
\[F_{\mu\nu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}\]
where \(\partial_{\mu}\) are the usual covariant components of the four-gradient four-covector \(\partial/\partial X\) and \(A_{\mu}=\eta_{\mu\nu}A^{\nu}\) are the covariant components of the electromagnetic four-potential four-covector. As promised, one can quickly check that \(F\) is gauge-invariant, and moreover antisymmetric \(F\in\frak{so}\)\((4)\). Doing a few computations, one can show that the electric and magnetic fields \(\textbf E,\textbf B\) have the interpretation of the Plucker coordinates of the line:
\[F=\begin{pmatrix}0 & \textbf E^T/c \\ -\textbf E/c & \textbf B_{\times}\end{pmatrix}\]
with \(\textbf B_{\times}=-\textbf B\cdot\boldsymbol{\mathcal J}\) the cross-product matrix of the magnetic field \(\textbf B\). One can also construct the dual electromagnetic tensor \(\tilde F\) to be fully contravariant \(F^{\mu\nu}=\eta^{\nu\sigma}\eta^{\mu\rho}F_{\rho\sigma}\) so that:
\[\tilde F=\begin{pmatrix}0 & -\textbf E^T/c \\ \textbf E/c & \textbf B_{\times}\end{pmatrix}\]
The punchline then is that we know how \(\textbf E\) and \(\textbf B\) transform under Lorentz transformations because \(F,\tilde F\) are type \((1,1)\) Lorentz tensors! This follows because they’re constructed out of other objects like \(\partial/\partial X\), \(A\), \(\eta\), etc. that were already known to be Lorentz tensors. The transformation is the usual one for a type \((1,1)\) tensor (strictly, tensor field):
\[F’_{\mu\nu}=\Lambda_{\mu}^{\rho}\Lambda_{\nu}^{\sigma}F_{\rho\sigma}\]
\[F’^{\mu\nu}=\Lambda^{\mu}_{\rho}\Lambda^{\nu}_{\sigma}F^{\rho\sigma}\]
or in index-free notation, \(F’=\Lambda F\Lambda^T\). Amazing!
Although for four-vectors such as \(X\) and \(V\) it is true that a Lorentz boost in some direction \(\hat{\boldsymbol{\beta}}\) looks just like a Galilean boost in the same direction \(\hat{\boldsymbol{\beta}}\) provided one only looks at the spatial orthogonal complement \(\text{span}^{\perp}_{\textbf R}\hat{\boldsymbol{\beta}}\), for the electromagnetic field it turns out to be the other way around (as the electromagnetic tensor \(F\) is no longer a four-vector!). This is most transparent in the formulas below for a Lorentz boost \(\beta\in[-1,1]\) along the positive \(x\)-axis into a second inertial frame with axes aligned to the first:
\[E’_x=E_x\]
\[E’_y=\gamma(E_y-\beta cB_z)\]
\[E’_z=\gamma(E_z+\beta cB_y)\]
\[cB’_x=cB_x\]
\[cB’_y=\gamma(cB_y+\beta E_z)\]
\[cB’_z=\gamma(cB_z-\beta E_y)\]
Or, in the non-relativistic limit \(|\beta|\ll 1\), \(\textbf E’\approx \textbf E+\textbf v\times\textbf B\) and \(\textbf B’=\textbf B\) are the “Galilean boosts” of the electromagnetic field.