Noether’s Theorem in Classical Particle Mechanics

Consider an arbitrary worldline \(\{(t,\textbf x)\}\) of a classical system of particles in configuration spacetime. In general, this worldline need not correspond to any physical/on-shell trajectory; it can be as wildly off-shell as one likes, the only caveat being that time travel is forbidden (i.e. it must be possible to parameterize the worldline as \(\textbf x(t)\)).

Now take this worldline \(\{(t,\textbf x)\}\) and gently perturb each point on it \((t_0,\textbf x_0)\mapsto (t_0+\delta t(t_0,\textbf x_0),\textbf x_0+\delta\textbf x(t_0,\textbf x_0))\) by some infinitesimal translation to obtain a slightly shifted worldline. Given any generic off-shell function \(L(t,\textbf x,\dot{\textbf x})\) on configuration spacetime, if the function had value \(L(t_0,\textbf x_0,\dot{\textbf x}_0)\) at some point \((t_0,\textbf x_0)\in\{(t,\textbf x)\}\) on the worldline prior to the perturbation, then after the perturbation the value of the function at the corresponding displaced point \((t_0+\delta t(t_0,\textbf x_0),\textbf x_0+\delta\textbf x(t_0,\textbf x_0))\) on the perturbed worldline would be:

\[L(t_0+\delta t(t_0,\textbf x_0),\textbf x_0+\delta\textbf x(t_0,\textbf x_0),\dot{\textbf x}_0+\delta\dot{\textbf x}(t_0,\textbf x_0))\]

\[\approx L(t_0,\textbf x_0,\dot{\textbf x}_0)+\frac{\partial L}{\partial t}(t_0,\textbf x_0,\dot{\textbf x}_0)\delta t(t_0,\textbf x_0)+\frac{\partial L}{\partial\textbf x}(t_0,\textbf x_0,\dot{\textbf x}_0)\cdot\delta\textbf x(t_0,\textbf x_0)+\frac{\partial L}{\partial\dot{\textbf x}}(t_0,\textbf x_0,\dot{\textbf x}_0)\cdot\delta\dot{\textbf x}(t_0,\textbf x_0)\]

Henceforth dropping the cumbersome arguments (but keeping in mind the equation holds at any arbitrary point \((t_0,\textbf x_0)\in\{(t,\textbf x)\}\) on the unperturbed worldline), one thus has:

\[\delta L=\frac{\partial L}{\partial t}\delta t+\frac{d}{dt}\left(\textbf p\cdot\delta\textbf x\right)+\left(\frac{\partial L}{\partial\textbf x}-\dot{\textbf p}\right)\cdot\delta\textbf x\]

where \(\textbf p:=\frac{\partial L}{\partial\dot{\textbf x}}\). So far this has just been math. At this point, one has to introduce \(2\) key pieces of physics:

  1. The function \(L(t,\textbf x,\dot{\textbf x})\) is not just some random function, but a special function called the Lagrangian, defined by \(L(t,\textbf x,\dot{\textbf x}):=T(\dot{\textbf x})-V(t,\textbf x)\).
  2. Among the uncountably infinite ocean of worldlines \(\{(t,\textbf x)\}\) that one could weave through configuration spacetime, the tiny subset of these worldlines that are physical/on-shell are those which satisfy the stationary action principle. This means that for any pair of points \((t_1,\textbf x^*(t_1)),(t_2,\textbf x^*(t_2))\) on such an on-shell worldline \(\textbf x^*(t)\), the action functional

\[S[\textbf x(t)]:=\int_{t_1}^{t_2}dt L(t,\textbf x(t),\dot{\textbf x}(t))\]

is stationary on \(\textbf x^*(t)\) (i.e. \(\delta S[\textbf x^*(t)]=0\)) subject to the constraints that:

  1. The initial and final times \(t_1,t_2\) are fixed: \(\delta t(t_1,\textbf x^*(t_1))=\delta t(t_2,\textbf x^*(t_2))=0\).
  2. The initial and final configurations \(\textbf x^*(t_1),\textbf x^*(t_2)\) are also fixed: \(\delta\textbf x(t_1,\textbf x^*(t_1))=\delta\textbf x(t_2,\textbf x^*(t_2))=\textbf 0\).

This yields the on-shell Euler-Lagrange equations of motion:

\[\dot{\textbf p}=\frac{\partial L}{\partial\textbf x}\]

On the other hand, if one now relaxes the above boundary conditions (which were needed only for formulating the stationary action principle) and instead considers an arbitrary infinitesimal perturbation \((t_0,\textbf x^*(t_0))\mapsto (t_0+\delta t(t_0,\textbf x^*(t_0)),\textbf x^*(t_0)+\delta\textbf x(t_0,\textbf x^*(t_0)))\) of an on-shell trajectory \(\textbf x^*(t)\) (to emphasize again, the boundaries are now free to move!), then the on-shell action \(S^*=S[\textbf x^*(t)]\) changes only through boundary terms:

\[\delta S^*=\int_{t_1}^{t_2}dt\frac{\partial L}{\partial t}\delta t+[\textbf p\cdot\delta\textbf x]^{t_2}_{t_1}\]
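As a sanity check on this boundary-term formula, here is a quick numerical sketch (with a made-up harmonic-oscillator Lagrangian \(L=\frac{1}{2}m\dot x^2-\frac{1}{2}kx^2\) and an arbitrary spatial perturbation with \(\delta t=0\)): the first-order change in the action along an on-shell trajectory should equal \([\textbf p\cdot\delta\textbf x]^{t_2}_{t_1}\).

```python
import numpy as np

m, k = 1.0, 4.0
w = np.sqrt(k/m)
t = np.linspace(0.0, 2.0, 200001)
x = np.cos(w*t)                      # on-shell trajectory of L = m xdot^2/2 - k x^2/2
eta = 0.3 + 0.5*t**2                 # arbitrary spatial perturbation, free endpoints
eps = 1e-6

def action(x):
    v = np.gradient(x, t)
    L = 0.5*m*v**2 - 0.5*k*x**2
    return np.sum(0.5*(L[1:] + L[:-1])*np.diff(t))   # trapezoidal rule

dS = (action(x + eps*eta) - action(x))/eps           # numerical delta S*
p = m*np.gradient(x, t)
boundary = p[-1]*eta[-1] - p[0]*eta[0]               # [p . delta x] at the endpoints
```

The two numbers agree to within the discretization error, even though the perturbation is wildly non-uniform along the worldline.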

Conservation of Momentum for Free Particle

Consider a free particle \(L=\frac{1}{2}m|\dot{\textbf x}|^2\). The Euler-Lagrange equations assert that such a particle moves at constant velocity \(\dot{\textbf x}=\text{const}\). Thus, all straight lines in configuration spacetime are on-shell worldlines because they make the action stationary.

Therefore, if one starts with any such (on-shell) straight worldline \(\textbf x^*(t)\) and performs a simple space translation \(\delta\textbf x\) without any time translation \(\delta t=0\) (purple to green curve), then on the one hand the action is unchanged \(\delta S^*=0\) because the new straight worldline is still on-shell (or mathematically, \(S=\frac{m}{2}\int_{t_1}^{t_2}dt|\dot{\textbf x}(t)|^2\) but the “slope” \(|\dot{\textbf x}|\) didn’t change). On the other hand, the general variational formula dictates that it changes by \(\delta S^*=[\textbf p\cdot\delta\textbf x]^{t_2}_{t_1}\) where \(\textbf p=m\dot{\textbf x}\). One is thus forced to conclude that the quantity \(\textbf p\cdot\delta\textbf x\) (and thus \(\textbf p\) itself, because \(\delta\textbf x(t_1,\textbf x(t_1))=\delta\textbf x(t_2,\textbf x(t_2))\) is a uniform space translation) is conserved (since \(t_1\leq t_2\) are arbitrary times). If one likes, this is the integral form of the conservation of momentum. The differential form \(\dot{\textbf p}=\textbf 0\) is just the on-shell Euler-Lagrange equation.

Conservation of Energy for Time-Independent Systems

If one perturbs all the points along the curve such that the time and space perturbations are linked by the velocity \(\delta\textbf x=\dot{\textbf x}\delta t\), then this is a symmetry of the system since the on-shell action changes by a boundary term:

\[S^{*\prime}\approx\int_{t_1+\delta t}^{t_2+\delta t}dt\,L=\int_{t_1}^{t_2}dt\,L+\int_{t_2}^{t_2+\delta t}dt\,L-\int_{t_1}^{t_1+\delta t}dt\,L\approx S^*+\delta t[L]_{t_1}^{t_2}\]

Making the important assumption that \(\partial L/\partial t=0\), this means that:

\[\delta S^*=\delta t[L]^{t_2}_{t_1}=\delta t[\textbf p\cdot\dot{\textbf x}]_{t_1}^{t_2}\]

from which one obtains the conserved quantity \(H:=\textbf p\cdot\dot{\textbf x}-L\). In differential form, one can check the general on-shell identity \(\dot H=-\frac{\partial L}{\partial t}\) so when \(\frac{\partial L}{\partial t}=0\) one obtains \(H\) as the conserved energy (called the Beltrami identity in the more general setting of the calculus of variations). Strictly speaking \(H\) is not to be confused with the Hamiltonian which is a function of \(\textbf x\) and \(\textbf p\) via a \(\dot{\textbf x}\mapsto\textbf p\) Legendre transform of the Lagrangian \(L\).
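One can verify this numerically too: along an explicit on-shell trajectory of a (made-up) harmonic oscillator \(L=\frac{1}{2}m\dot x^2-\frac{1}{2}kx^2\) (which has \(\partial L/\partial t=0\)), the quantity \(H=\textbf p\cdot\dot{\textbf x}-L\) should be constant in time. A minimal sketch:

```python
import numpy as np

m, k = 1.0, 4.0
w = np.sqrt(k/m)
t = np.linspace(0.0, 5.0, 10001)
# generic on-shell trajectory and its exact velocity
x = 0.7*np.cos(w*t) + 0.2*np.sin(w*t)
v = -0.7*w*np.sin(w*t) + 0.2*w*np.cos(w*t)
L = 0.5*m*v**2 - 0.5*k*x**2
H = m*v*v - L                        # p = dL/dv = m*v, so H = p*v - L
```

The array `H` is constant to machine precision, as the Beltrami identity demands.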

Conservation of Angular Momentum

Finally, if the infinitesimal angular translation \(\delta\textbf x:=\delta\boldsymbol{\phi}\times\textbf x\) is a symmetry of the system, then the quantity:

\[\textbf p\cdot(\delta\boldsymbol{\phi}\times\textbf x)=\delta\boldsymbol{\phi}\cdot\textbf L\]

is conserved, where the orbital angular momentum \(\textbf L:=\textbf x\times\textbf p\).

General Remarks on Noether’s Theorem

More generally, anything you can do to some on-shell worldline that keeps it on-shell such that the on-shell action \(\delta S^*\) changes by at most some boundary term (equivalently the Lagrangian \(L\) changes by a total time derivative) is called a symmetry of the system, and this state of affairs can always be rearranged to yield a conservation law. This is Noether’s theorem.

Thus, to recap, Noether’s theorem arises from the fact that the action \(S\) is a time integral \(\int dt\) and that when working on-shell, the variation \(\delta S^*\) in the on-shell action is zero everywhere along the main body of the worldline \(\textbf x^*(t)\) (thanks to the stationary action principle) and so is only sensitive to the “edge effects” associated with changes in the initial and final configurations. But if the perturbation is a symmetry of the system, then one can always reframe this as saying that some quantity is conserved. Implicit in the whole discussion is that these symmetries need to be elements of some continuous Lie group otherwise it wouldn’t be possible to speak of implementing them infinitesimally.

Also, for any kind of purely spatial perturbation \(\delta\textbf x(t)\) so that \(\delta t=0\), it doesn’t even matter if \(\partial L/\partial t\neq 0\)…


Linear Elastostatics & Elastodynamics

Problem: What is the defining property of the (Cauchy) stress tensor (field) \(\sigma(\textbf x,t)\) of a material?

Solution: The idea is that if one wants to find the stress vector (also called traction) \(\boldsymbol{\sigma}\) acting on a plane with unit normal \(\hat{\textbf n}\) in the material at location \(\textbf x\) and time \(t\), then this is obtained by acting with the stress tensor:

\[\boldsymbol{\sigma}=\sigma\hat{\textbf n}\]

Or in components:

\[\sigma_i=\sigma_{ij}n_j\]

Problem: What do the diagonal stresses \(\sigma_{11},\sigma_{22},\sigma_{33}\) represent physically? What about the off-diagonal stresses \(\sigma_{12},\sigma_{23},\sigma_{13}\), etc.?

Solution: The diagonal stresses represent normal stresses. The off-diagonal stresses represent shear stresses.

Problem: Under what conditions is the stress tensor \(\sigma^T=\sigma\) symmetric? What does this imply the existence of?

Solution: The stress tensor is symmetric provided the net couple/torque on each material element vanishes \(\boldsymbol{\tau}=\textbf 0\) (intuitively, the normal stresses do not supply any torque; it is the off-diagonal shear stresses that need to match to ensure this rotational equilibrium).

The immediate corollary of this is that, as usual, \(\sigma\) will have \(3\) orthogonal eigenspaces (called principal directions spanned by \(\hat{\textbf n}\), whose orthogonal complements \(\text{span}^{\perp}(\hat{\textbf n})\) are called the principal planes) each associated to a real eigenvalue (called a principal stress). Physically, the stress vector \(\boldsymbol{\sigma}\) acts perpendicularly to such principal planes along the principal directions, so it is a pure normal stress since when diagonalized the off-diagonal shear stresses vanish.

Problem: If \(\sigma_{1,2,3}\) is a principal stress of the stress tensor \(\sigma\), explain why:

\[\sigma_{1,2,3}^3-\text{Tr}(\sigma)\sigma_{1,2,3}^2+\frac{1}{2}\left(\text{Tr}^2(\sigma)-\text{Tr}(\sigma^2)\right)\sigma_{1,2,3}-\det(\sigma)=0\]

Solution: This is a bit trivial in that it’s just the general form of the characteristic equation for any \(3\times 3\) matrix. Equally trivial is the fact that the coefficients are coordinate-free (and are sometimes called stress invariants in this context). In terms of the principal stresses \(\sigma_1,\sigma_2,\sigma_3\) at a given point in spacetime (not to be confused with the components of the stress vector), of course one has:

\[\text{Tr}(\sigma)=\sigma_1+\sigma_2+\sigma_3\]

\[\frac{1}{2}\left(\text{Tr}^2(\sigma)-\text{Tr}(\sigma^2)\right)=\sigma_1\sigma_2+\sigma_2\sigma_3+\sigma_3\sigma_1\]

\[\det(\sigma)=\sigma_1\sigma_2\sigma_3\]
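These three invariant identities are easy to confirm numerically for a random symmetric stress tensor (a sketch; the matrix entries are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
s = 0.5*(A + A.T)                             # random symmetric stress tensor
I1 = np.trace(s)
I2 = 0.5*(np.trace(s)**2 - np.trace(s @ s))
I3 = np.linalg.det(s)
sig = np.linalg.eigvalsh(s)                   # the three principal stresses
residual = sig**3 - I1*sig**2 + I2*sig - I3   # characteristic cubic at each eigenvalue
```

Each principal stress annihilates the cubic, and the invariants match their symmetric-function expressions in the eigenvalues.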

Problem: Suppose one were interested in just the normal stress in the direction \(\hat{\textbf n}\) at some point even though the stress vector \(\boldsymbol{\sigma}\) was not parallel with \(\hat{\textbf n}\) (i.e. not a principal direction!). How can one nevertheless extract this information?

Solution: Do the obvious thing \(\hat{\textbf n}\cdot\boldsymbol{\sigma}=\hat{\textbf n}^T\sigma\hat{\textbf n}=\sigma_{ij}n_in_j\).

Problem: (Mohr’s circle, (equivalent) von Mises stress, yield criteria, cool Lagrange multiplier optimization problem for max/min shear stresses in Tresca’s criterion)

Solution:

Problem: What is the definition of the strain tensor \(\varepsilon\)?

Solution: It is the symmetric part of the Jacobian of the displacement field \(\textbf X(\textbf x,t)\), i.e.

\[\varepsilon:=\frac{1}{2}\left(\frac{\partial}{\partial\textbf x}\otimes\textbf X+\left(\frac{\partial}{\partial\textbf x}\otimes\textbf X\right)^T\right)\]

Or with respect to a Cartesian basis:

\[\varepsilon_{ij}=\frac{1}{2}\left(\partial_i X_j+\partial_j X_i\right)\]

and therefore is by construction symmetric \(\varepsilon^T=\varepsilon\).

Problem: Find the components of the strain tensor \(\varepsilon\) in cylindrical coordinates.

Solution:

Problem: Describe the grown-up version of Hooke’s constitutive law relating \(\varepsilon\) to \(\sigma\).

Solution: At its essence, Hooke’s law postulates there exists a linear and elastic (i.e. in thermodynamics language, reversible, i.e. no hysteresis) relationship between them, given by a rank-\(4\) tensor field \(C\) (called the elasticity/stiffness tensor):

\[\sigma=C\varepsilon\]

Or, more transparently in Cartesian components:

\[\sigma_{ij}=C_{ijk\ell}\varepsilon_{k\ell}\]

Although a general rank-\(4\) tensor in \(3\)D has \(3^4=81\) degrees of freedom, the symmetries \(\sigma_{ij}=\sigma_{ji}\) and \(\varepsilon_{ij}=\varepsilon_{ji}\) mean that \(C_{ijk\ell}=C_{jik\ell}\) and \(C_{ijk\ell}=C_{ij\ell k}\), so the first \(2\) indices \(\{i,j\}\) should be viewed as a multiset rather than an ordered pair, and similarly for the last \(2\) indices \(\{k,\ell\}\). By inspection, there are \(\binom{3+2-1}{2}=6\) such multisets, for a total of \(6^2=36\) degrees of freedom. However, the existence of a scalar elastic strain energy density \(\frac{1}{2}C_{ijk\ell}\varepsilon_{ij}\varepsilon_{k\ell}\) means that \(C_{ijk\ell}=C_{k\ell ij}\), so \(C\), viewed as a \(6\times 6\) symmetric matrix, has \(\frac{6\times 7}{2}=21\) independent degrees of freedom.
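The multiset counting can be checked by brute force (a small sketch; indices run over \(\{0,1,2\}\)):

```python
from itertools import product

# independent components of C_ijkl under the minor symmetries C_ijkl = C_jikl = C_ijlk:
# each of the index pairs {i,j} and {k,l} is an unordered multiset
minor = {(tuple(sorted((i, j))), tuple(sorted((k, l))))
         for i, j, k, l in product(range(3), repeat=4)}
# adding the major symmetry C_ijkl = C_klij identifies the two index pairs
major = {tuple(sorted(key)) for key in minor}
```

`len(minor)` gives the \(36\) and `len(major)` the \(21\) quoted above.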

Problem: Imposing further the assumption that \(C\) is an isotropic tensor, explain why \(21\) degrees of freedom are reduced to just \(2\):

\[C_{ijk\ell}=\lambda\delta_{ij}\delta_{k\ell}+G(\delta_{ik}\delta_{j\ell}+\delta_{i\ell}\delta_{jk})\]

where the scalars \(\lambda,G\in\textbf R\) are called the Lame parameters of \(C\).

Solution: A standard mathematical theorem is that the most general form of a rank-\(4\) isotropic tensor is:

\[C_{ijk\ell}=\lambda\delta_{ij}\delta_{k\ell}+\mu\delta_{ik}\delta_{j\ell}+G\delta_{i\ell}\delta_{jk}\]

except that in this case the additional symmetries mentioned earlier (e.g. \(\sigma^T=\sigma\)) enforce \(\mu=G\).

Problem: Hence, demonstrate that for a linear-elastic isotropic material:

\[\sigma=\lambda\text{Tr}(\varepsilon)1+2G\varepsilon\]

Solution: Start with Hooke’s law for a linear-elastic material, and then inject isotropy into \(C\):
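In lieu of writing out the index algebra, here is a numerical sketch (with made-up Lamé parameters) confirming that contracting the isotropic \(C_{ijk\ell}\) with a symmetric strain reproduces \(\sigma=\lambda\text{Tr}(\varepsilon)1+2G\varepsilon\):

```python
import numpy as np

lam, G = 1.3, 0.7                          # made-up Lame parameters
d = np.eye(3)
# C_ijkl = lam d_ij d_kl + G (d_ik d_jl + d_il d_jk)
C = (lam*np.einsum('ij,kl->ijkl', d, d)
     + G*(np.einsum('ik,jl->ijkl', d, d) + np.einsum('il,jk->ijkl', d, d)))
rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
eps = 0.5*(M + M.T)                        # random symmetric strain
sigma = np.einsum('ijkl,kl->ij', C, eps)   # Hooke's law sigma_ij = C_ijkl eps_kl
```

The contraction of the \(\lambda\)-term picks out \(\text{Tr}(\varepsilon)\), while the two \(G\)-terms each contribute \(G\varepsilon\) by the symmetry of \(\varepsilon\).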

Problem: It is important to emphasize the number “\(2\)” in the statement “there are \(2\) Lame parameters \(\lambda, G\)”. Just as in the thermodynamics of a gas the state of the system is completely specified by \((p,V)\), similarly here any linear-elastic isotropic material’s behavior is completely specified by \(2\) independent moduli, such as \((\lambda, G)\). However, there are also other pairs of moduli available, such as \((E,\nu)\) (Young’s modulus with Poisson’s ratio) or \((B, G)\) (bulk modulus and shear modulus).

First, explain why the \(G\) here really deserves to be called “shear modulus”. Similarly, by considering other loading scenarios, show that the relevant “Legendre transformations” between the moduli are:

\[\lambda=\frac{E\nu}{(1+\nu)(1-2\nu)}\]

\[G=\frac{E}{2(1+\nu)}\]

\[B=\frac{E}{3(1-2\nu)}\]

Solution:
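While the derivations are omitted here, the claimed conversion formulas can at least be cross-checked against the standard identities \(B=\lambda+\frac{2}{3}G\) and \(\nu=\frac{\lambda}{2(\lambda+G)}\) (these identities are assumed, not derived in the text):

```python
E, nu = 200e9, 0.3                      # made-up Young's modulus (Pa) and Poisson ratio
lam = E*nu/((1 + nu)*(1 - 2*nu))
G = E/(2*(1 + nu))
B = E/(3*(1 - 2*nu))
# cross-check: B = lam + 2G/3 and nu = lam/(2(lam + G))
```

Any pair of moduli determines the rest, exactly as the thermodynamic analogy suggests.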

Problem: Hooke’s law for linear-elastic materials as it’s currently written:

\[\sigma=\lambda\text{Tr}(\varepsilon)1+2G\varepsilon\]

conflates cause-and-effect since \(\sigma\) is what causes \(\varepsilon\). Show how to isolate for \(\varepsilon\) in terms of \(\sigma\), and write the resultant Hooke’s law in the principal frame.

Solution:

Problem: Looking at the expression for the bulk modulus \(B\) in terms of \(E\) and \(\nu\):

\[B=\frac{E}{3(1-2\nu)}\]

Explain why this constrains \(\nu\leq 1/2\). What does it mean for a material (e.g. rubber) to have \(\nu=1/2\)?

Solution: Stability requires that \(B\geq 0\) (otherwise one would get a positive feedback loop between \(p,V\)), so this implies \(\nu\leq 1/2\). Materials with \(\nu=1/2\) are volume-preserving; stretching in \(z\) causes a contraction in \(\rho\) such that \(z\rho^2=\text{const}\). Of course this means the bulk modulus is \(B=\infty\).

Problem: How is this grown-up version of Hooke’s law related to the childish version \(F=-kx\)?

Solution: For a spring of material with Young’s modulus \(E\), wire of cross-section \(A\), and length \(L\), the spring constant is:

\[k=\frac{EA}{L}\]

this should be compared with a similar formula for capacitance of a parallel-plate capacitor:

\[C=\frac{\varepsilon A}{d}\]

or inductance of a long solenoid:

\[L=\frac{\mu N^2A}{\ell}\]

Problem: Explain why, as claimed earlier, for any (possibly anisotropic) linear elastic material, the strain energy density \(V\) is given by:

\[V=\frac{1}{2}C_{ijk\ell}\varepsilon_{ij}\varepsilon_{k\ell}=\frac{1}{2}\text{Tr}(\sigma\varepsilon)\]

And show how this result specializes to the case of an isotropic linear elastic material.

Solution: Basically just the usual argument of doing some work on a unit cell and equating that with the stored energy; the factor of \(1/2\) is the canonical “area-of-a-triangle” factor that arises due to the linearity of the stress-strain “curve”.

Including the assumption of isotropy:
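A numerical sketch of both claims (generic \(C\) built from made-up Lamé parameters, so the isotropic specialization \(V=\frac{\lambda}{2}\text{Tr}^2(\varepsilon)+G\,\text{Tr}(\varepsilon^2)\) can be checked at the same time; that specialized formula is my own restatement, not stated above):

```python
import numpy as np

lam, G = 1.3, 0.7                          # made-up Lame parameters
d = np.eye(3)
C = (lam*np.einsum('ij,kl->ijkl', d, d)
     + G*(np.einsum('ik,jl->ijkl', d, d) + np.einsum('il,jk->ijkl', d, d)))
rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
eps = 0.5*(M + M.T)                        # random symmetric strain
sigma = np.einsum('ijkl,kl->ij', C, eps)
V1 = 0.5*np.einsum('ijkl,ij,kl->', C, eps, eps)        # (1/2) C eps eps
V2 = 0.5*np.trace(sigma @ eps)                         # (1/2) Tr(sigma eps)
V3 = 0.5*lam*np.trace(eps)**2 + G*np.trace(eps @ eps)  # isotropic specialization
```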

Problem: Consider twisting a hollow cylinder of radius \(R\), length \(L\), and thickness \(t\ll R\) through a small angle \(\Delta\phi\). Show that the external couple \(\tau_{\text{ext}}\) required to maintain this is:

\[\tau_{\text{ext}}=\frac{2\pi GR^3t}{L}\Delta\phi\]

If instead it is a solid cylinder, show that the external couple required is now:

\[\tau_{\text{ext}}=\frac{\pi GR^4}{2L}\Delta\phi\]

(optional: calculate the elastic strain energy stored in the cylinders in both cases).

Solution:

Problem: Consider a uniformly pressurized hollow, capped cylinder of pressure \(p\), radius \(R\), thickness \(t\) and length \(L\). Show that the hoop stress \(\sigma_{\phi}\) and axial stress \(\sigma_z\) are related by a factor of \(2\):

\[\sigma_{\phi}=2\sigma_{z}=\frac{R}{t}p\]

(WHAT ABOUT RADIAL STRESSES \(\sigma_{\rho}\)?)

Solution:
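The claimed factor of \(2\) follows from two force balances; a quick sketch with made-up numbers (the comments indicate which cut of the cylinder is being balanced):

```python
import math

p, R, t, L = 2.0e5, 0.5, 0.005, 3.0     # made-up pressure (Pa) and geometry (m), t << R
# axial (diametral) cut: p*(2R*L) is carried by two wall strips of area t*L each
sigma_phi = p*(2*R*L)/(2*t*L)
# transverse cut: p*(pi R^2) on the cap is carried by the thin annulus 2 pi R t
sigma_z = p*(math.pi*R**2)/(2*math.pi*R*t)
```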

Problem: Demonstrate Betti’s reciprocity theorem for a linear elastic isotropic cantilever beam.

Solution:

Problem: Show that if one takes a beam with Young’s modulus \(E\), moment of area \(I\), and you start to compress it axially, once the force \(F\) applied exceeds a certain critical force (the “Euler force”), the beam will spontaneously break symmetry by buckling (this configuration is sometimes referred to as the Euler strut).

Solution:

Problem: Show that Newton’s \(2^{\text{nd}}\) law in a continuum becomes the Cauchy momentum equation:

\[\rho\frac{D\textbf v}{Dt}=\frac{\partial}{\partial\textbf x}\cdot\sigma+\textbf f_{\text{ext}}\]

Solution: Basically just an application of the Reynolds transport theorem (the “system nitpick trick”) applied to a comoving volume:

Problem: Show how, using the Cauchy momentum equation as the fundamental starting point, the equations of elastodynamics (Navier-Cauchy equations) and fluid dynamics (Navier-Stokes equations) arise from postulating similar-looking but conceptually different constitutive relations for the stress tensor \(\sigma\).

Solution:

Problem: As the Navier-Cauchy equations are linear, it makes sense to consider a plane wave ansatz \(\textbf X(\textbf x,t)=\textbf X_0e^{i(\textbf k\cdot\textbf x-\omega_{\textbf k}t)}\). Hence show that there are \(2\) kinds of waves:

\[\text{S-waves}\Leftrightarrow\textbf k\cdot\textbf X_0=0\]

\[\text{P-waves}\Leftrightarrow\textbf k\times\textbf X_0=\textbf 0\]

obtain the dispersion relation \(\omega_{\textbf k}\) for \(S\)-waves and \(P\)-waves (cf. \(s\) and \(p\) atomic orbitals or \(s\)-wave and \(p\)-wave scattering in quantum mechanics…the nomenclature is a coincidence as here \(S\) stands for “secondary” while \(P\) stands for “primary”, whereas in the quantum context \(s\) stands for “sharp” while \(p\) stands for “principal”).

Solution:
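Assuming the Navier-Cauchy equations take the standard form \(\rho\ddot{\textbf X}=(\lambda+G)\frac{\partial}{\partial\textbf x}\left(\frac{\partial}{\partial\textbf x}\cdot\textbf X\right)+G\nabla^2\textbf X\) (not written out above), the plane-wave ansatz turns them into the algebraic eigenproblem \(\rho\omega_{\textbf k}^2\textbf X_0=(\lambda+G)(\textbf k\cdot\textbf X_0)\textbf k+G|\textbf k|^2\textbf X_0\), whose longitudinal and transverse eigenvectors give the \(P\)- and \(S\)-wave speeds. A numerical sketch with made-up moduli:

```python
import numpy as np

lam, G, rho = 1.2, 0.8, 2.0              # made-up Lame parameters and density
k = np.array([1.0, 0.0, 0.0])            # wavevector along x
k2 = np.dot(k, k)

def rhs(X0):
    # (lam+G)(k . X0) k + G |k|^2 X0, from the plane-wave ansatz
    return (lam + G)*np.dot(k, X0)*k + G*k2*X0

Xp = np.array([1.0, 0.0, 0.0])           # P-wave: k x X0 = 0
Xs = np.array([0.0, 1.0, 0.0])           # S-wave: k . X0 = 0
vp = np.sqrt(np.dot(Xp, rhs(Xp))/rho)/np.sqrt(k2)   # omega/|k| for P-waves
vs = np.sqrt(np.dot(Xs, rhs(Xs))/rho)/np.sqrt(k2)   # omega/|k| for S-waves
```

The speeds come out as \(v_P=\sqrt{(\lambda+2G)/\rho}\) and \(v_S=\sqrt{G/\rho}\), so \(P\)-waves always outrun \(S\)-waves (hence “primary” vs. “secondary”).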

Problem: Consider a longitudinal \(P\)-wave propagating in the \(x\)-direction through:

a) A \(1\)D infinite (linear elastic isotropic) rod.

b) A \(2\)D infinite (linear elastic isotropic) plate.

c) A \(3\)D infinite (linear elastic isotropic) bulk solid.

In each case, what is the speed \(v_p\) of such \(P\)-waves?

Solution:

Problem: Starting from the Navier-Cauchy equations, show how to obtain the dynamic Euler-Bernoulli beam equation.

Solution:

Problem: Find the normal mode frequencies of vibration \(\omega_n\) of a cantilever beam of length \(L\).

Solution:

So the allowed values of \(kL\), although strictly given by the roots of \(\cos(kL)\cosh(kL)=-1\) (the zeroes of the graph), quickly tend towards \(k_nL\approx (n+1/2)\pi\) (which are the vertical dashed lines). By the dispersion relation, the corresponding angular frequencies therefore quickly tend toward:

\[\omega_n\approx\sqrt{\frac{EI}{\mu}}\frac{(n+1/2)^2\pi^2}{L^2}\]
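Here is a numerical sketch of that claim, using the clamped-free frequency condition \(\cos(kL)\cosh(kL)=-1\): the roots \(k_nL\) rapidly approach \((n+1/2)\pi\), with the very first root \(k_0L\approx 1.875\) being the farthest off.

```python
import numpy as np

def bisect(f, a, b, tol=1e-12):
    # simple bisection; assumes f changes sign on [a, b]
    fa = f(a)
    while b - a > tol:
        m = 0.5*(a + b)
        if fa*f(m) <= 0:
            b = m
        else:
            a, fa = m, f(m)
    return 0.5*(a + b)

f = lambda x: np.cos(x)*np.cosh(x) + 1.0   # clamped-free (cantilever) condition
roots = [bisect(f, n*np.pi, (n + 1)*np.pi) for n in range(5)]
approx = [(n + 0.5)*np.pi for n in range(5)]
```

Already by the fourth root, the approximation \((n+1/2)\pi\) is good to a few parts in \(10^5\).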

Problem: What is the sign convention for the internal couple (also called bending moment) of a beam?

Solution: Remember the mnemonic “top in tension” as this is when \(\bar{\tau}_{\text{int}}(x)>0\). For instance, for a simple cantilever beam drooping under its own weight, the top of the beam is clearly in tension everywhere, so the internal couple will be positive all the way along. Similarly, for a beam which is sagging between \(2\) supports, the top is in compression everywhere so under this sign convention the internal couple is negative everywhere along the beam.

Problem:

The advantage of this convention is that the static Euler-Bernoulli beam equation doesn’t need any pesky minus signs, assuming also the (reasonable) convention that \(y(x)>0\) is an upwards displacement:

\[y''(x)=\bar{\tau}_{\text{int}}(x)/EI\]

Geometrically, it is intuitively clear that when a beam is bent by external forces and torques, some sections will become longer (hence experiencing internal tension) while other sections will become shorter (hence experiencing internal compression). By the intermediate value theorem, there thus exists a section which becomes neither longer nor shorter, instead possessing a bend-invariant length; this is called the neutral axis of the beam (strictly speaking, the neutral axis is not a 1D line, but a 2D surface cutting through the center of the beam so perhaps a better name would have been neutral surface). The neutral axis of the beam therefore experiences neither internal tension nor internal compression, and so is defined by experiencing zero internal stress \(\sigma_{\text{int}}=0\).

The idea is then to understand how the internal stress field \(\sigma_{\text{int}}(y)\) varies as a function of the distance \(y\) above (or below) the neutral axis (where we’ve just established it’s zero). Intuitively, this should depend not only on \(y\), but also on the stiffness of the beam material (this is why it’s harder to bend stiffer beams!).

Since the internal tensions and internal compressions are essentially uniaxial stresses along the beam, one should invoke Hooke’s law:

$$\sigma_{\text{int}}(y)=E\varepsilon(y)$$

where the engineering strain \(\varepsilon(y)\) at a distance \(y\) from the neutral axis may be approximated as \(\varepsilon(y)=\kappa y\) where \(\kappa\) is the local curvature of the beam (and the key reason for introducing it is that it is independent of \(y\), being only a function of position \(x\) along the neutral axis). Thus, the key finding here is that the internal stress field varies linearly about the neutral axis:

$$\sigma_{\text{int}}(y)=\kappa Ey$$

Throughout this, the analysis has been for some fixed \(x\), and looking at the variation in \(y\). To globalize this from \(y\to x\), one could stick with \(\sigma_{\text{int}}(x,y)=\kappa(x)Ey\) but this would still be mucking up \(x\) with \(y\). So in order to get rid of \(y\) and be able to just worry about \(x\), it is necessary to find some property that characterizes each cross section of the beam. Staring at the picture of the internal stress field, it is clear that one useful metric is provided by the internal torque \(\bar{\tau}_{\text{int}}(x)\) about the neutral axis (also called the internal bending moment by engineers), which by definition is:

$$\bar{\tau}_{\text{int}}(x)=\iint_{\text{cross section at }x}y\sigma_{\text{int}}(x,y)dA$$

Introducing the second moment of area \(\bar I(x):=\iint_{\text{cross section at }x}y^2dA\) about the neutral axis, this leads to the result:

$$\bar{\tau}_{\text{int}}(x)=E\bar I(x)\kappa(x)$$

which washes out all \(y\)-dependence as desired. The product \(E\bar I\) is called the flexural rigidity of the beam. Letting \(\delta(x)\) denote the profile of the bent beam, one has the approximation \(\kappa(x)\approx d^2\delta/dx^2\), so (suppressing the \(x\)-dependence):

$$\frac{d^2\delta}{dx^2}=\frac{\bar{\tau}_{\text{int}}}{E\bar{I}}$$

Finally, although this form is generally sufficient (thanks to the “system nitpick trick”), it is also possible to express the internal torque \(\bar{\tau}_{\text{int}}\) about the neutral axis directly in terms of the (perhaps more intuitive) linear external transverse force density \(f_{\text{ext}}^{\perp}(x)\) along the beam. Specifically, because the beam is static, translational equilibrium in the vertical direction gives \(f_{\text{ext}}^{\perp}(x)=\frac{dF_{\gamma}}{dx}\) where \(F_{\gamma}(x)\) is the vertical shear force at position \(x\) and rotational equilibrium gives \(F_{\gamma}(x)=\frac{d\bar{\tau}_{\text{int}}}{dx}\). Altogether, this yields a 4th-order ODE which is traditionally called the static Euler-Bernoulli beam equation:

$$\frac{d^4\delta}{dx^4}=\frac{f_{\text{ext}}^{\perp}}{E\bar I}$$

Evidently, a more descriptive name for the static Euler-Bernoulli beam equation would be the static small deflection beam equation since the most important assumption underlying it is that the deflection \(\delta(x)\) of the beam is small everywhere.

Example: Within the Euler-Bernoulli small deflection regime, what is the shape of a homogeneous cantilever beam of density \(\rho\), Young’s modulus \(E\), and length \(L\) with an \(a\times a\) square cross section in the presence of Earth’s surface gravitational field \(g\)? One option is to insert \(f_{\text{ext}}^{\perp}=\rho ga^2\) and integrate the static Euler-Bernoulli beam equation from \(0\) to \(L\) with respect to \(x\). Because the internal shear force \(F_{\gamma}(0)=\rho ga^2L\) at the fixed end must balance the weight of the cantilever, this gives a boundary condition to fix the first integration constant. Integrating again, and now using the boundary condition \(\bar{\tau}_{\text{int}}(0)=\rho ga^2L^2/2\), one gets a second boundary condition. Then integrating twice more and using the cantilever boundary conditions \(\delta(0)=d\delta/dx(0)=0\) yields the deflected beam profile as a quartic polynomial in \(x\):

$$\delta(x)=\frac{\rho g}{2a^2E}x^2(6L^2-4Lx+x^2)$$

with maximum deflection \(\delta(L)=\frac{3\rho gL^4}{2a^2E}\), so if one wished to minimize this, then a suitable merit index for materials selection would be \(E/\rho\).
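Since \(\delta(x)\) is just a quartic, one can mechanically verify that it satisfies the static Euler-Bernoulli equation \(E\bar I\,\delta''''=f_{\text{ext}}^{\perp}\) with \(\bar I=a^4/12\), along with the clamped and free-end boundary conditions (a sketch; the numerical values are made up):

```python
import numpy as np

rho, g, E, a, L = 7800.0, 9.81, 2.0e11, 0.05, 1.0   # made-up steel-ish values (SI)
I = a**4/12                                          # second moment of a square section
f = rho*g*a**2                                       # weight per unit length
c = rho*g/(2*a**2*E)
# delta(x) = c*(6 L^2 x^2 - 4 L x^3 + x^4)
delta = np.polynomial.Polynomial([0.0, 0.0, 6*L**2*c, -4*L*c, c])
lhs = E*I*delta.deriv(4)(0.5)                        # E I delta'''' at an arbitrary point
```

The fourth derivative is constant (as it must be for a uniform load), the clamped end has zero deflection and slope, and the free end carries zero bending moment and zero shear.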

For Mode I plane stress crack loading, an external stress \(\boldsymbol{\sigma}_{\text{ext}}\) is applied which induces a uniform internal stress field \(\sigma\) in the material. A crack of “semi-major axis” \(a\) will grow/propagate iff \(a>a^*=\frac{E\mathcal G^*}{\pi\sigma^2_{\text{ext}}}\). For brittle materials, \(\mathcal G^*=2\gamma\) (I don’t like the notation, but it’s so common I’ll just use it; maybe think Gibbs free energy is also a sort of “energy released”?), but for ductile materials the critical energy release rate \(\mathcal G^*\) is generally far greater due to the formation of a plastic zone about the crack tip (where \(\sigma_{\text{int}}=\sigma^*\)) which requires external work \(W_{\text{ext}}=\frac{(\sigma^*)^2}{2E}\) per unit volume to form (since it’s plastic deformation, just like doing any other kind of deformation! Or think about the dislocations?). This blunts the crack tip curvature. The punchline is that ductile = tough! Also, fracture toughness \(\mathcal K^*:=\sqrt{E\mathcal G^*}\) (idk why the K notation). So basically, shit breaks when you drop it on the floor for instance because at the moment of collision the high stress \(\sigma_{\text{ext}}\) lowered the critical crack size \(a^*\propto\frac{1}{\sigma_{\text{ext}}^2}\) so much that the natural cracks \(a\) always present as defects in the material (dislocations are the other key kind of defect) exceeded \(a^*\), and therefore by the Griffith criterion it was energetically favorable for them to grow/propagate, leading to fracture! (although depends on brittle vs. ductile again).

A material is said to be viscoelastic iff it exhibits a duality between behaving as a Newtonian fluid (with some dynamic viscosity \(\eta\), hence the visco part of “viscoelastic”) and a linear elastic solid (with some Young’s modulus \(E\), hence the elastic part of “viscoelastic”) (cf. wave-particle duality in quantum mechanics). There are various ways to make this duality precise; the simplest is to model a viscoelastic material via the Maxwell model, namely a viscous damper (also called a dashpot) of dynamic viscosity \(\eta\) in series with a linear elastic spring of Young’s modulus \(E\). By Newton’s second law (the analog of Kirchhoff’s laws for electric circuits), this gives rise to a constitutive relation between the external stress \(\sigma_{\text{ext}}\) applied to the viscoelastic Maxwell material and its total strain \(\varepsilon\):

$$\dot{\varepsilon}=\frac{\dot{\sigma}_{\text{ext}}}{E}+\frac{\sigma_{\text{ext}}}{\eta}$$

This is not really a differential equation; it just says that if one knows a priori what the applied external stress \(\sigma_{\text{ext}}=\sigma_{\text{ext}}(t)\) is, then by time integration one can recover the corresponding strain \(\varepsilon(t)\) experienced by the sample. This viscoelastic duality is directly responsible for the loading-unloading hysteresis of materials (i.e. the damper dissipates heat, corresponding to the enclosed area of a hysteresis loop).
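As a minimal sketch of the Maxwell model's creep response (with made-up parameter values): apply a step stress \(\sigma_{\text{ext}}(t)=\sigma_0\) for \(t>0\) and integrate \(\dot\varepsilon=\dot\sigma_{\text{ext}}/E+\sigma_{\text{ext}}/\eta\) forward in time; the result is an instantaneous elastic jump \(\sigma_0/E\) followed by steady viscous creep \(\sigma_0 t/\eta\).

```python
import numpy as np

E, eta, s0 = 2.0, 5.0, 1.0            # made-up modulus, viscosity, step stress
t = np.linspace(0.0, 10.0, 10001)
dt = t[1] - t[0]
eps = np.empty_like(t)
eps[0] = s0/E                         # instantaneous elastic jump at t = 0
for n in range(len(t) - 1):
    # after the jump sigma is constant, so the sigma_dot/E term contributes nothing
    eps[n + 1] = eps[n] + dt*(s0/eta)
exact = s0/E + s0*t/eta               # elastic offset + steady viscous creep
```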


Wave Impedance

Problem: To what kinds of waves does the concept of “wave impedance” \(Z\) apply to?

Solution: (Transverse/longitudinal/dispersive/non-dispersive/plane/non-planar) travelling waves

Problem: Why does it make more sense conceptually to consider the reciprocal of the impedance \(Y:=1/Z\) (called the admittance)?

Solution: In general, the heuristic one should have is that, if one applies a given known “drive” \(F\), then one would like to compute the corresponding “response” \(v\). It makes sense to relate the former to the latter by a direct “multiplicative” linear response function:

\[v=YF\]

but then this \(Y\) is really the admittance. So put another way:

\[v=\frac{F}{Z}\]

is the best way to remember what the essence of a wave impedance really is, i.e. \(v\) wants to be like \(F\), but \(F\) must get deflated by a factor \(Z\) representing how much of the influence of \(F\) is impeded by some mechanism in the underlying medium.

Problem: Conceptually, what’s the “logic flow” of impedance \(Z\)?

Solution: Impedance always has a general definition for waves in a given context, and in addition also takes on specific forms depending on the linear constitutive relation specified for the wave.

Problem: Define the impedance \(Z\) of a mass \(m\), and motivate it by considering elastic collisions.

Solution: As mass is just the inertia of a body to external forces, and the concept of impedance is in the same spirit, it’s no surprise that:

\[Z=m\]

To see this, consider a \(1\)D elastic collision of a mass \(m\) with speed \(v\) incident head-on with a mass \(M\) at rest \(V=0\). Then:

\[mv=mv’+MV’\]

\[\frac{1}{2}mv^2=\frac{1}{2}mv'^2+\frac{1}{2}MV'^2\]

which is solved by:

\[v’=\frac{m-M}{m+M}v\]

\[V’=\frac{2m}{m+M}v\]
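As a quick check (a symbolic sketch with sympy, not part of the original post), one can verify that the quoted \(v',V'\) do satisfy both conservation equations:

```python
# Verify the elastic-collision solutions symbolically.
import sympy as sp

m, M, v = sp.symbols("m M v", positive=True)
v_out = (m - M) / (m + M) * v   # v' as quoted above
V_out = 2 * m / (m + M) * v     # V' as quoted above

momentum_residual = sp.simplify(m * v - (m * v_out + M * V_out))
energy_residual = sp.simplify(
    sp.Rational(1, 2) * m * v**2
    - (sp.Rational(1, 2) * m * v_out**2 + sp.Rational(1, 2) * M * V_out**2)
)
print(momentum_residual, energy_residual)  # 0 0
```

In particular, setting \(M=m\) (impedance matching) gives \(v'=0,V'=v\): complete transfer of motion from one mass to the other.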

Problem: Define the mechanical impedance \(Z\) of a transverse travelling wave \(\psi(x,t)\) in a non-dispersive violin string of linear mass density \(\mu\) under tension \(T\).

Solution: The transverse driving force is \(F=-T\psi’\) while the transverse velocity is \(\dot{\psi}\) so:

\[\dot{\psi}=\frac{-T\psi’}{Z}\]

Substituting a travelling wave ansatz \(\psi(x,t)=\psi(x-vt)\) leads to:

\[v=\frac{T}{Z}\Rightarrow Z=\sqrt{T\mu}\]

In practice, it is better to remember \(v=T/Z=\sqrt{T/\mu}\).
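The substitution of the travelling-wave ansatz can also be checked symbolically (a sympy sketch; \(f\) is an arbitrary right-moving profile):

```python
import sympy as sp

x, t, T, mu = sp.symbols("x t T mu", positive=True)
v = sp.sqrt(T / mu)              # known wave speed of the string
f = sp.Function("f")
psi = f(x - v * t)               # generic right-moving travelling wave

drive = -T * sp.diff(psi, x)     # transverse driving force F = -T*psi'
response = sp.diff(psi, t)       # transverse velocity psi-dot
Z = sp.simplify(drive / response)  # impedance Z = F / (transverse velocity)
print(Z)  # equals sqrt(T*mu)
```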

Problem: Define the specific acoustic impedance \(Z\) of a sound wave \(X(x,t)\) propagating in a non-dispersive medium (solid/liquid/gas) of density \(\rho\) with speed \(c\).

Solution: Similar to mechanical impedance except force is mapped to a pressure \(F\mapsto p\) to avoid dealing with the extensive nature of \(F\) depending on the area of application.

The pressure is \(p=-\rho c^2X’\) while the longitudinal particle velocity is \(\dot X\), so:

\[\dot X=\frac{-\rho c^2X’}{Z}\]

Either plug the travelling wave ansatz again \(X(x,t)=X(x-ct)\), or (equivalently) work in Fourier space:

\[-i\omega=\frac{-\rho c^2ik}{Z}\Rightarrow Z=\rho c\]

To make it look more like the violin string, one can write:

\[c=\frac{\rho c^2}{Z}=\sqrt{\frac{K}{\rho}}\]

where \(K\) is a suitable elastic modulus that depends on the type of wave and medium (e.g. the adiabatic bulk modulus for sound in a gas).

Problem: Define the electromagnetic impedance \(Z\) of a propagating EM wave in some medium (free space, dielectric, conductor, etc.)

Solution: The general definition is:

\[H=\frac{E}{Z}\]

where of course \(E:=|\textbf E|\) and \(H:=|\textbf H|\) are the magnitudes of the mutually orthogonal \(\textbf E\) and \(\textbf H\)-fields. Invoking the constitutive relation \(E=\mu vH\) in a linear dielectric gives:

\[Z=\sqrt{\frac{\mu}{\varepsilon}}\]

This also applies to a conductor, with the replacement \(\varepsilon\mapsto\varepsilon_{\text{eff}}=\varepsilon+i\sigma/\omega\).
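In vacuum, this general definition evaluates to the familiar impedance of free space \(Z_0\approx 377\ \Omega\) (a quick numeric check using scipy’s physical constants; this value isn’t quoted in the post but is standard):

```python
import numpy as np
from scipy import constants

# Impedance of free space: Z0 = sqrt(mu_0 / epsilon_0)
Z0 = np.sqrt(constants.mu_0 / constants.epsilon_0)
print(round(Z0, 2))  # 376.73 (ohms)
```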

Problem: Define the electrical impedance \(Z\) of a lumped element in an electric circuit.

Solution: This is Ohm’s law:

\[I=\frac{V}{Z}\]

(admittedly, the connection with waves is hazier here than in the other examples)

Problem: Define the characteristic impedance of a transmission line with inductance per unit length \(\hat L\) and capacitance per unit length \(\hat C\), series resistance per unit length \(\hat R\), and parallel conductance per unit length \(\hat G\).

Solution:

\[Z=\sqrt{\frac{\hat R+i\omega\hat L}{\hat G+i\omega\hat C}}\]

In the lossless case \(\hat R=\hat G=0\), this simplifies to:

\[Z=\sqrt{\frac{\hat L}{\hat C}}\]
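A small numeric sketch of these two formulas (the per-unit-length parameters below are illustrative, roughly coax-like, and `char_impedance` is a hypothetical helper, not from the post):

```python
import numpy as np

# Illustrative per-unit-length line parameters: H/m, F/m, ohm/m, S/m
L_, C_, R_, G_ = 250e-9, 100e-12, 0.1, 1e-6
omega = 2 * np.pi * 100e6  # 100 MHz drive

def char_impedance(R, L, G, C, w):
    """Characteristic impedance Z = sqrt((R + i*w*L) / (G + i*w*C))."""
    return np.sqrt((R + 1j * w * L) / (G + 1j * w * C))

Z_lossy = char_impedance(R_, L_, G_, C_, omega)
Z_lossless = char_impedance(0, L_, 0, C_, omega)
print(Z_lossless.real)  # ~50.0, i.e. sqrt(L/C) for these values
```

At high frequency the small loss terms barely perturb the lossless result, which is why “\(50\ \Omega\) cable” is quoted as a single real number.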

Problem: Now that the notion of an impedance has been defined in a bunch of scenarios, explain what, in all the cases considered, the notion of a wave impedance \(Z\) has in common.

Solution: In every case, the impedance helps with steady-state calculations, comes from a linear constitutive law/equation of state, and is intrinsic to the medium.

Problem: State the amplitude reflection and transmission coefficients \(r,t\) for a travelling wave passing from a medium of impedance \(Z\) into a medium of impedance \(Z’\).

Solution: Don’t memorize:

\[r=\frac{Z-Z’}{Z+Z’}\]

\[t=\frac{2Z}{Z+Z’}\]

One option is to memorize directly the interface matching conditions:

\[1+r=t\]

\[Z-Zr=Z’t\]

They are logically equivalent, but the latter makes it clear where they come from. Furthermore, multiplying the latter \(2\) equations together yields the power flow/energy conservation equation:

\[Z(1-r^2)=Z’t^2\]

ensuring that \(R:=r^2\) and \(T:=\frac{Z’}{Z}t^2\) obey \(R+T=1\).
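These relations are quick to verify symbolically (a sympy sketch, writing `Zp` for \(Z'\)):

```python
import sympy as sp

Z, Zp = sp.symbols("Z Zp", positive=True)  # Zp plays the role of Z'
r = (Z - Zp) / (Z + Zp)   # amplitude reflection coefficient
t = 2 * Z / (Z + Zp)      # amplitude transmission coefficient

continuity = sp.simplify(1 + r - t)              # 1 + r = t
force_balance = sp.simplify(Z - Z * r - Zp * t)  # Z - Zr = Z't
RT = sp.simplify(r**2 + (Zp / Z) * t**2)         # R + T

print(continuity, force_balance, RT)  # 0 0 1
```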

Alternatively, in direct analogy with “conservation of momentum” and “conservation of kinetic energy” from the context of elastic collisions, one can also remember the pair of equations:

\[Z=Zr+Z’t\]

\[\frac{1}{2}Z=\frac{1}{2}Zr^2+\frac{1}{2}Z’t^2\]

and similarly, the equation \(v+v’=V’\) from that context directly translates to the continuity condition \(1+r=t\).

Problem: Explain why in some contexts (notably optics) the impedance looks “swapped”

Solution: Because of the mathematical identity:

\[\frac{Z’-Z}{Z+Z’}=\frac{\frac{1}{Z}-\frac{1}{Z’}}{\frac{1}{Z}+\frac{1}{Z’}}=\frac{Y-Y’}{Y+Y’}\]

in other words, it would have been more natural to work with the admittances.

Problem: Impedance matching \(Z’=Z\) is desirable because it ensures that there are no reflections \(r=R=0\), i.e. perfect transmission \(t=T=1\). However, even if \(Z^{\prime\prime}\neq Z\), show that by inserting a length \(\lambda’/4\) of an intermediate medium whose impedance \(Z’=\sqrt{ZZ^{\prime\prime}}\) is the geometric mean, this acts to effectively impedance match anyway.

Solution: It is useful to first gain a heuristic understanding of this. Basically, the idea is that initially one has \(2\) media with mismatched impedances \(Z\neq Z^{\prime\prime}\); left like that, there would be reflections at the interface. So the idea is to insert an intermediate medium between the \(2\), where here the word “intermediate” not only means it’s literally intermediate between the \(2\) media, but also that its impedance \(Z’\) should be intermediate between the \(2\) (which will turn out to be the geometric mean). In other words, one is dealing with either a monotonically increasing or decreasing impedance “staircase”.

Then, when a wave is incident from the left, some of it will be reflected at the \(Z\neq Z’\) interface while some is transmitted. This transmitted wave will then be incident on the \(Z’\neq Z^{\prime\prime}\) interface, and some of that will be reflected back towards the source, and get transmitted again through the original \(Z\neq Z’\) interface. Now, in order to eliminate any net back-reflection, one would like for \(2\) things to be true:

  1. The \(2\) reflections should interfere destructively, i.e. be \(\pi\) out of phase.
  2. Their amplitudes should also match up, so that one does indeed achieve complete destructive interference, rather than merely partial destructive interference.

Since there is a staircase setup, either both reflections got a \(\pi\)-phase shift or neither of them did. So in any case, this is basically nothing more than an application of thin film interference; the phase shift due to bouncing back and forth at normal incidence inside the intermediate medium of length \(L\) is:

\[\Delta\phi=k’2L=\pi+2\pi n\Rightarrow L=\left(\frac{1}{4}+\frac{n}{2}\right)\lambda’\]

with \(n=0\) being a common choice. As for the amplitude matching, unfortunately it doesn’t seem as straightforward to show that \(Z’=\sqrt{ZZ^{\prime\prime}}\) does the job, since even with that choice the first two partial reflected amplitudes do not match up individually:

\[\frac{Z-Z’}{Z+Z’}\neq\frac{2Z}{Z+Z’}\cdot\frac{Z’-Z^{\prime\prime}}{Z’+Z^{\prime\prime}}\cdot\frac{2Z’}{Z’+Z}\]

(so it seems the more tedious approach of matching boundary conditions at both interfaces, which in effect resums all of the multiple internal reflections, is necessary)
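One compact way to see the resummed result numerically (not derived in this post) is the standard two-interface multiple-reflection formula \(r_{\text{tot}}=\frac{r_1+r_2e^{2ik’L}}{1+r_1r_2e^{2ik’L}}\), which is exactly what matching boundary conditions at both interfaces yields. A sketch with illustrative impedance values, using this post’s convention \(r=(Z-Z’)/(Z+Z’)\):

```python
import numpy as np

Z, Z2 = 1.0, 9.0                 # mismatched outer media (Z and Z'')
Zp = np.sqrt(Z * Z2)             # geometric-mean intermediate medium: Z' = 3
r1 = (Z - Zp) / (Z + Zp)         # reflection at the Z | Z' interface
r2 = (Zp - Z2) / (Zp + Z2)       # reflection at the Z' | Z'' interface
phase = np.exp(1j * np.pi)       # e^{2ik'L} with k'L = pi/2 (quarter wave)

r_tot = (r1 + r2 * phase) / (1 + r1 * r2 * phase)
print(abs(r_tot))  # ~0 up to floating-point error: impedance matched
```

Note that with \(Z’=\sqrt{ZZ^{\prime\prime}}\) one gets \(r_1=r_2\), so the quarter-wave phase \(e^{2ik’L}=-1\) makes the numerator \(r_1-r_2\) vanish exactly.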

Problem: Explain why megaphones are flared the way they are.

Solution: (Sketch) A megaphone’s gradually flaring horn acts as an acoustic impedance-matching transformer between the high specific acoustic impedance at its narrow throat and the low impedance of the open air at its wide mouth. Without it, the abrupt impedance mismatch between the mouth and the open air reflects most of the acoustic power back toward the speaker; the gradual taper is like the impedance staircase above but with many infinitesimal steps, suppressing net back-reflection and letting more power radiate forward.


Legendre Transforms as Derivative Symmetrizers

Suppose you know that \(p\) is the derivative of some function with respect to \(v\). A natural question is whether or not the roles of \(v\) and \(p\) can be reversed, that is, can \(v\) also be viewed as the derivative of some (possibly different) function with respect to \(p\)? In symbols, if \(p=\frac{d\mathcal L}{dv}\) for some function \(\mathcal L\), then is there some (possibly different) function \(H\) such that \(v=\frac{dH}{dp}\)? The answer turns out to be yes, and moreover is unique modulo the addition of a constant. This function \(H\) is called the Legendre transform of \(\mathcal L\) from \(v\) to \(p\). It is a straightforward exercise in integration by parts to actually find an explicit formula for the Legendre transform \(H\) in terms of \(\mathcal L\), \(v\) and \(p\) by enforcing the “derivative symmetrizer” property described above:

$$v=\frac{dH}{dp}$$

$$dH=vdp$$

$$\int dH=\int vdp$$

$$H=vp-\int pdv$$

$$H=vp-\int\frac{d\mathcal L}{dv}dv$$

$$H=vp-\mathcal L$$

where in the last equation an arbitrary additive constant \(+C\) has been suppressed to zero as is conventional. Because of the derivative symmetrizer property, it is immediate that the Legendre transform of \(H\) from \(p\) back to \(v\) will just give \(\mathcal L\) again (i.e. the Legendre transform is an involution, or equivalently its inverse is equal to itself, hence as a corollary it preserves information).
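As a concrete sanity check (a sympy sketch for the free particle; the symbol names are mine, not fixed by the discussion above):

```python
import sympy as sp

m, v, p = sp.symbols("m v p", positive=True)

Lag = sp.Rational(1, 2) * m * v**2            # free-particle Lagrangian
p_of_v = sp.diff(Lag, v)                      # p = dL/dv = m*v
v_of_p = sp.solve(sp.Eq(p, p_of_v), v)[0]     # invert to v = p/m

H = sp.simplify((v * p - Lag).subs(v, v_of_p))  # Legendre transform H = vp - L
dH_dp = sp.diff(H, p)                           # derivative-symmetrizer check
print(H, dH_dp)  # p**2/(2*m) p/m
```

The last line confirms both the familiar Hamiltonian \(H=p^2/2m\) and the symmetrized derivative \(v=dH/dp=p/m\).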

In the context of classical mechanics, \(\mathcal L\) would represent the Lagrangian of a system while \(H\) would represent its Hamiltonian. The statement \(p=\frac{d\mathcal L}{dv}\) is then often viewed as the definition of the generalized momentum coordinate \(p\) conjugate to the generalized velocity coordinate \(v\) while the symmetric equation \(v=\frac{dH}{dp}\) often falls under the guise of one of Hamilton’s equations.

By contrast, in thermodynamics it is customary to use the Legendre transform with the opposite sign convention, so that instead of \(H=vp-\mathcal L\), it would be \(H=\mathcal L-vp\). This preserves the derivative \(p=\frac{d\mathcal L}{dv}\) (because \(H\) is not in that equation) but introduces a corresponding sign change in \(v=-\frac{dH}{dp}\) (because \(dH\mapsto -dH\)). For instance, starting from the combined first and second laws of thermodynamics:

$$dU=TdS-pdV+\mu_idN_i$$

One can Legendre transform \(U=U(S,V,N_i)\) along 3 distinct “axes”, namely \(S\to T, V\to -p\) or \(N_i\to\mu_i\), leading to three corresponding thermodynamic potentials:

  1. The Helmholtz free energy \(F:=U-TS\)
  2. The enthalpy \(H:=U+pV\)
  3. The no-name thermodynamic potential \(?:=U-\mu_i N_i\)

From here, one can apply more Legendre transforms to change variables as much as one wants, noting that Legendre transforms commute. For instance, the Gibbs free energy \(G\) can be thought of as either the Legendre transform of the Helmholtz free energy \(F\) from \(V\to -p\) or as the Legendre transform of the enthalpy \(H\) from \(S\to T\):

$$G=F+pV=H-TS=U+pV-TS$$

Occasionally one also sees the grand thermodynamic potential \(\Phi:=F-\mu_iN_i\) defined as the Legendre transform of the Helmholtz free energy \(F\) from \(N_i\to\mu_i\) (this is also the Legendre transform of the earlier “no-name” thermodynamic potential \(?\) from \(S\to T\)).

Remember that fundamentally the Legendre transform is defined to be a derivative symmetrizer. For instance, because \(-p=\frac{\partial U}{\partial V}\) and the enthalpy \(H\) was the Legendre transform of \(U\) from \(V\to -p\), we get for free the symmetric derivative \(V=\frac{\partial(-H)}{\partial(-p)}=\frac{\partial H}{\partial p}\), and likewise for the others.

One final nota bene: often, it is said that the Legendre transform only exists for convex or concave functions \(\mathcal L\). This is because if the Legendre transform \(H=vp-\mathcal L\) is to be regarded as a function of \(p\), then one needs to be able to find a formula for \(v=v(p)=v\left(\frac{d\mathcal L}{dv}\right)\), but the only way one can actually have such a functional relationship is if \(\mathcal L\) does not have the same derivative \(\frac{d\mathcal L}{dv}\) at distinct values of \(v\). In practice, most functions one deals with in physics are convex/concave; for instance, a typical Lagrangian contains a kinetic energy term \(\mathcal L=\frac{1}{2}mv^2+…\), and quadratic parabolas \(v\mapsto\frac{1}{2}mv^2\) are a classic example of convex functions. When \(\mathcal L\) has inflection points with respect to \(v\), it may sometimes be possible to take a sort of piecewise Legendre transform. Alternatively, one can use the more general notion of the Legendre-Fenchel transform which works for all functions (even non-convex/non-concave functions) by simply defaulting to the \(v\in\textbf R\) which maximizes \(H=vp-\mathcal L\) if there are multiple \(v\) with the same \(p=\frac{d\mathcal L}{dv}\).


Where Do Observables Come From?

The purpose of this post is to explain where observables in non-relativistic quantum mechanics (notably the position \(\textbf X\), momentum \(\textbf P\), orbital angular momentum \(\textbf L\), spin angular momentum \(\textbf S\) and Hamiltonian \(H\) observables) arise from, and why they have the properties (e.g. commutation relations) that they do.

In one sentence, the answer is that they arise from smooth projective unitary Lie group representations on a quantum system’s state space \(\mathcal H\). Different quantum systems will live in different state spaces \(\mathcal H\) (e.g. a spinless quantum particle moving through space has \(\mathcal H\cong L^2(\textbf R^3\to\textbf C,d^3\textbf x)\), a qubit fixed in space has \(\mathcal H\cong \textbf C^2\), a spin-1/2 electron being deflected in the Stern-Gerlach experiment has \(\mathcal H\cong L^2(\textbf R^3\to\textbf C,d^3\textbf x)\otimes_{\textbf C}\textbf C^2\), etc.). Exactly what kind of state space \(\mathcal H\) the quantum system lives in determines what the measurable observables of that quantum system are (e.g. for the qubit with \(\mathcal H\cong \textbf C^2\), the only measurable observables are \(\textbf S\) and \(H\), meaning that \(\textbf X,\textbf P\) and \(\textbf L\) would all not be measurable observables. By contrast, for the spinless quantum particle moving through space with \(\mathcal H\cong L^2(\textbf R^3\to\textbf C,d^3\textbf x)\), the situation is almost the opposite, with \(\textbf X,\textbf P, \textbf L\) and \(H\) all measurable observables but not \(\textbf S\)). As an aside, this highlights why it is important to specify a priori what \(\mathcal H\) is for any given quantum system since that “sets the rules of the game” for what observables one can even sensibly talk about, let alone measure.

The key insight will be to understand and look at the derivative of a Lie group representation at the identity of the Lie group, since that’s also where the Lie algebra lives.

Definition: Let \(\phi^{\infty}:G\to GL(V)\) be a smooth representation of a Lie group \(G\) on a vector space \(V\) (notation: the symbol \(\phi\) is meant to evoke that it’s a group homomorphism while the \(\infty\) superscript emphasizes that it’s smooth, i.e. that \(\phi^\infty\) is of class \(C^\infty\)). Then the derivative of \(\phi^\infty\) at the identity \(1\in G\) exists because \(\phi^\infty\) is assumed to be smooth, and is a map of Lie algebras \(\dot{\phi}^{\infty}_1:\frak g\to\frak{gl}\)\((V)\) defined for all “velocity vectors” \(\textbf V\in\frak g\) by:

\[\dot{\phi}^{\infty}_1(\textbf V):=\left(\frac{\partial}{\partial t}\right)_{t=0}\phi^{\infty}\left(e^{\textbf Vt}\right)\]

The intuition is that the Lie group representation \(\phi^{\infty}\) is like a light source that projects the Lie group \(G\) down onto its shadow \(GL(V)\). The map \(t\mapsto e^{\textbf Vt}\) is then a trajectory (see one-parameter subgroup) of \(G\) which is depicted as a green bug walking around on \(G\) and passing through \(e^{\textbf 0}=1\) at time \(t=0\) with velocity vector \(\textbf V\). The shadow of this trajectory is drawn on \(GL(V)\). Thus, the derivative \(\dot{\phi}^{\infty}_1(\textbf V)\) is simply the velocity of the bug’s shadow (at the identity \(1\in GL(V)\)) as a function of the bug’s actual velocity \(\textbf V\) (at the identity \(1\in G\)), where the intuition is that the faster you move, the faster your shadow moves too.

There are some other names by which this “derivative of Lie group representation at the identity” \(\dot{\phi}^\infty_1\) is known. For instance, emphasizing the smooth manifold nature of the Lie groups \(G,GL(V)\), a differential geometer might call \(\dot{\phi}^\infty_1\) the pushforward of \(\phi^\infty\) at the identity, viewing it as a generalized Jacobian; however, we won’t pursue this terminology. \(\dot{\phi}^\infty_1\) is also called the induced Lie algebra representation (arising from \(G\)), and this terminology will be an essential way to think about it later, but for now it’s best to just think of it using the picture above.

Four important properties of \(\dot{\phi}^\infty_1\) are listed below with intuition, proofs, and examples:

Property #1: \(\phi^{\infty}\left(e^{\textbf V t}\right)=e^{\dot{\phi}^{\infty}_1(\textbf V)t}\) (i.e. this is just saying that \(\dot{\phi}^\infty_1(\textbf V)\) is indeed the shadow velocity at the identity).

Intuition/Proof: Don’t just look at the initial shadow velocity \(\dot{\phi}^{\infty}_1(\textbf V)\), instead look at the shadow velocity at all times \(t\in\textbf R\):

\[\dot{\phi}^{\infty}(\textbf V):=\frac{\partial}{\partial t}\phi^{\infty}\left(e^{\textbf Vt}\right)=\lim_{\Delta t\to 0}\frac{\phi^{\infty}(e^{\textbf V(t+\Delta t)})-\phi^{\infty}(e^{\textbf Vt})}{\Delta t}\]

Expand \(e^{\textbf V(t+\Delta t)}=e^{\textbf V\Delta t}e^{\textbf Vt}\) (bug’s location on \(G\) at time \(t+\Delta t\) is just the location \(e^{\textbf Vt}\) at time \(t\) translated by the displacement \(e^{\textbf V\Delta t}\), or see BCH formula). Then write \(\phi^{\infty}(e^{\textbf V\Delta t}e^{\textbf Vt})=\phi^{\infty}\left(e^{\textbf V\Delta t}\right)\phi^{\infty}\left(e^{\textbf V t}\right)\) (bug’s shadow in \(GL(V)\) at time \(t+\Delta t\) is just the bug’s shadow \(\phi^{\infty}\left(e^{\textbf Vt}\right)\) at time \(t\) translated by the displacement’s shadow \(\phi^{\infty}\left(e^{\textbf V\Delta t}\right)\), or equivalently this is because \(\phi^{\infty}\) is a group homomorphism). Finally, factoring out the shadow’s location \(\phi^{\infty}\left(e^{\textbf Vt}\right)\) from the numerator and the limit gives a first-order ODE in the time domain:

\[\frac{\partial}{\partial t}\phi^{\infty}\left(e^{\textbf Vt}\right)=\dot{\phi}^{\infty}_1(\textbf V)\phi^{\infty}\left(e^{\textbf Vt}\right)\]

which is solved (with the initial condition \(\phi^{\infty}(e^{\textbf 0})=\phi^{\infty}(1)=1\)) by \(\phi^{\infty}\left(e^{\textbf Vt}\right)=e^{t\dot{\phi}^{\infty}_1(\textbf V)}\).

Examples: The trivial representation \(\phi^\infty:G\to GL(\textbf C)\cong\textbf C-\{0\}\) of any group \(G\) defined by \(\phi^\infty(g):=1\) has derivative (at the identity) \(\dot\phi^\infty_1(\textbf V)=0\) and indeed \(e^0=1\). The defining representation \(\phi^\infty:SO(2)\to SO(2)\subseteq GL(\textbf R^2)\) of \(SO(2)\) on \(\textbf R^2\), defined by \(\phi^\infty(R):=R\) has derivative (at the identity) \(\dot\phi^\infty_1(\Omega)=\Omega\) and so of course \(\phi^\infty(e^\Omega)=e^\Omega\). The adjoint representation \(\text{Ad}:Sp(1)\to GL(\frak{sp}\)\((1))\) of the symplectic group \(Sp(1)\) of unit quaternions on its Lie algebra \(\frak{sp}\)\((1)\) of imaginary quaternions \(\text{Ad}_{\hat q}(\textbf x):=\hat q\textbf x\hat q^{-1}\) has derivative (at the identity) \(\dot{\text{Ad}}_1:=\text{ad}:\frak{sp}\)\((1)\to\frak{gl}(\frak{sp}\)\((1))\) defined by \(\text{ad}_{\textbf x}(\textbf r)=[\textbf x,\textbf r]\). And indeed this is just one version of the BCH formula: \(e^{\textbf x}\textbf re^{-\textbf x}=e^{[\textbf x,]}(\textbf r)=\textbf r+[\textbf x,\textbf r]+\frac{1}{2}[\textbf x,[\textbf x,\textbf r]]+\frac{1}{6}[\textbf x,[\textbf x,[\textbf x,\textbf r]]]+…\) (despite the temptation, note that \(e^{\text{ad}_{\textbf x}}(\textbf r)\neq e^{\text{ad}_{\textbf x}(\textbf r)}\)).

Property #2: \(\dot\phi^\infty_1(\text{Ad}_g(\textbf V))=\text{Ad}_{\phi^\infty(g)}(\dot\phi^\infty_1(\textbf V))\) for all \(g\in G,\textbf V\in\frak g\) (i.e. “tilting your head” \(\textbf V\mapsto \text{Ad}_g(\textbf V)\), then measuring the shadow velocity \(\dot\phi^\infty_1(\text{Ad}_g(\textbf V))\) is the same as measuring the “actual” shadow velocity \(\dot\phi^\infty_1(\textbf V)\), then tilting your head).

Intuition/Proof: This is just Property #1 but with \(\textbf V\mapsto \text{Ad}_g(\textbf V)\) (i.e. tilting your head into the basis of \(g\)).

\[e^{\dot{\phi}^\infty_1(\text{Ad}_g(\textbf V))t}=\phi^\infty\left(e^{\text{Ad}_g(\textbf V)t}\right)\]

The bug’s trajectory \(e^{\text{Ad}_g(\textbf V)t}\) in \(G\) is more simply \(\text{Ad}_g(e^{\textbf Vt})\) (because it’s fundamentally the same trajectory as \(e^{\textbf Vt}\) just viewed in the basis of \(g\). Another way of phrasing it is that \(\exp\) and \(\text{Ad}_g\) commute for all \(\textbf V\in\frak g\)). Then \(\phi^\infty(\text{Ad}_g(e^{\textbf Vt}))=\text{Ad}_{\phi^\infty(g)}(\phi^\infty(e^{\textbf Vt}))\) by homomorphism. Finally, applying Property #1 again re-expresses the actual shadow trajectory \(\phi^\infty(e^{\textbf Vt})\) as \(e^{\dot{\phi}^{\infty}_1(\textbf V)t}\). Thus, as it is we have two ways of expressing the shadow’s tilted trajectory. To get just the tilted initial shadow velocity, take \((\partial/\partial t)_{t=0}\) and the claim follows.

Example: In the case where \(\phi^\infty=\text{Ad}\) also happens to be the adjoint representation of \(G\), Property #2 leads to the curious identity \([g\textbf V g^{-1},\textbf W]=g[\textbf V,g^{-1}\textbf W g]g^{-1}\).

Property #3: \(\dot{\phi}^\infty_1(\alpha\textbf V)=\alpha\dot{\phi}^\infty_1(\textbf V)\) and \(\dot{\phi}^\infty_1(\textbf V+\textbf W)=\dot{\phi}^\infty_1(\textbf V)+\dot{\phi}^\infty_1(\textbf W)\) for all \(\alpha\in\textbf R, \textbf V,\textbf W\in\frak g\) (i.e. doubling the bug’s velocity doubles its shadow velocity and a vector addition of actual velocities corresponds to vector addition of shadow velocities).

Intuition/Proof: This is just the statement that \(\dot{\phi}^\infty_1\) is an \(\textbf R\)-linear transformation between the vector spaces \(\frak g\)\(\to\frak{gl}\)\((V)\). The scaling property is easy to check directly from the definition of the derivative. As for additivity, I haven’t been able to find any elementary way to do it, other than by appealing to various BCH-type formulas. The key obstacle is to get \(e^{\textbf V+\textbf W}\) to interact nicely with the homomorphism property of \(\phi^\infty\). One way is to apply the Zassenhaus formula:

$$e^{(\textbf V+\textbf W)t}=e^{\textbf Vt}e^{\textbf Wt}e^{-[\textbf V,\textbf W]t^2/2}e^{(2[\textbf W,[\textbf V,\textbf W]]+[\textbf V,[\textbf V,\textbf W]])t^3/6}…$$

where the factors beyond \(e^{\textbf Vt}e^{\textbf Wt}\) are \(1+\mathcal O_{t\to 0}(t^2)\) and hence contribute nothing to the derivative at \(t=0\). Another alternative is the Lie-Trotter product formula:

$$e^{\textbf V+\textbf W}=\lim_{n\to\infty}\left(e^{\textbf V/n}e^{\textbf W/n}\right)^n$$

in which one would freely interchange limits and derivatives, and apply the power rule.
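The Lie-Trotter formula is easy to see converge numerically (a sketch with two arbitrary non-commuting nilpotent matrices of my choosing, using scipy’s matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

# Two non-commuting nilpotent matrices (illustrative choice)
V = np.array([[0.0, 1.0], [0.0, 0.0]])
W = np.array([[0.0, 0.0], [1.0, 0.0]])

exact = expm(V + W)
naive = expm(V) @ expm(W)  # wrong, since V and W don't commute
n = 10_000
trotter = np.linalg.matrix_power(expm(V / n) @ expm(W / n), n)

print(np.max(np.abs(exact - naive)))    # O(1) error
print(np.max(np.abs(exact - trotter)))  # O(1/n) error
```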

Example: Taking again \(\phi^\infty=\text{Ad}\) as in the previous example, Property #3 simply says that the commutator looks like multiplication (or, as mathematicians prefer to say, is bilinear).

Property #4: \(\dot\phi^\infty_1(\text{ad}_{\textbf V}(\textbf W))=\text{ad}_{\dot{\phi}^\infty_1(\textbf V)}(\dot\phi^\infty_1(\textbf W))\)

Intuition/Proof: This is just Property #2 but with \(g=e^{\textbf Vt}\) and the \(\textbf V\) in Property #2 replaced by \(\textbf W\), then just take \((\partial/\partial t)_{t=0}\) and note that it commutes with \(\dot\phi^\infty_1\) because the latter is linear (Property #3). Essentially, there is a bug with velocity \(\textbf W\) at the identity \(1\in G\), and a second bug \(e^{\textbf Vt}\) that collides with the first bug at the identity at \(t=0\), and so \(\dot\phi^\infty_1(\text{ad}_{\textbf V}(\textbf W))\) should be read as “the shadow velocity of the first bug in the moving frame of the second bug when both collide at \(1\)”.

Example: Taking \(\phi^\infty=\text{Ad}\) again, one establishes the Jacobi identity \(\text{ad}_{\text{ad}_{\textbf V}(\textbf W)}(\textbf X)=\text{ad}_{\text{ad}_{\textbf V}}(\text{ad}_{\textbf W})(\textbf X)=\text{ad}_{\textbf V}(\text{ad}_{\textbf W}(\textbf X))-\text{ad}_{\textbf W}(\text{ad}_{\textbf V}(\textbf X))\), commonly written in the cyclic form \([\textbf V,[\textbf W,\textbf X]]+[\textbf W,[\textbf X,\textbf V]]+[\textbf X,[\textbf V,\textbf W]]=\textbf 0\). This corollary of Property #4 combined with the multiplicative nature of the commutator from Property #3 is actually a conceptually elegant way to prove that the tangent space \(\frak g\)\(=T_1G\) of any Lie group \(G\) at its identity is genuinely a (real) Lie algebra.

And because \(\frak g\) is a Lie algebra, Property #3 and Property #4 acquire a deeper interpretation; together, they assert that \(\dot\phi^\infty_1\) is a representation of the Lie algebra \(\frak g\) (on the same vector space \(V\)). Just as a representation of the Lie group \(G\) was a homomorphism from \(G\) to \(GL(V)\), a representation of a Lie algebra \(\frak g\) is a homomorphism from \(\frak g\) to \(\frak{gl}\)\((V)\). But for Lie algebras, the notion of “homomorphism” is different than for groups because they have different data/structures on them that require preserving (for Lie algebras, these are the real vector space structure and the Lie bracket whereas for groups it is just the composition of symmetries). Thus a Lie algebra representation is distinct from the concept of a group representation, but similar in spirit.

Thus, this explains the terminology “induced Lie algebra representation” from earlier. Abstractly, one can view differentiation of paths at the identity as a functor from the category of Lie group representations to the category of Lie algebra representations. It is natural to ask if such a functor is surjective, that is, if every Lie algebra representation arises in this way. For simply connected Lie groups \(G\), it turns out the answer is yes, but in general no (I’m not sure about the question of injectivity, but my intuition suggests it is injective between non-isomorphic Lie group representations due to Schur’s lemma).

Finally, we are in a position to explain where observables come from. The following concrete example illustrates this general procedure:

Example: Consider a spinless particle moving on the real line \(\textbf R\). It therefore has state space \(\mathcal H=L^2(\textbf R\to\textbf C,dx)\) with the usual protractor given in terms of position eigenbasis wavefunctions by \(\langle\psi_1|\psi_2\rangle:=\int_{-\infty}^\infty\overline{\langle x|\psi_1\rangle}\langle x|\psi_2\rangle dx\). One example of a smooth projective unitary representation \(\phi^\infty:G\to PU(\mathcal H)\) acting on \(\mathcal H=L^2(\textbf R\to\textbf C,dx)\) comes from the additive Lie group \(G=\textbf R\) via:

$$\langle x|\phi^\infty_{\Delta k}|\psi\rangle:=e^{-i\Delta kx}\langle x|\psi\rangle$$

for some “translation” \(\Delta k\in\textbf R=G\) in de Broglie space (also called Fourier space, reciprocal space, dual space, momentum space, wave space, etc.). To wit:

  • \(\phi^\infty\) is indeed a representation of \(\textbf R\) because \(\phi^\infty_{\Delta k_1+\Delta k_2}=\phi^\infty_{\Delta k_1}\circ\phi^\infty_{\Delta k_2}\) (so it’s not merely a projective representation but an actual representation).
  • \(\phi^\infty\) is indeed unitary because \(e^{i\Delta kx}e^{-i\Delta kx}=1\) and an invariant integrand implies an invariant integral.

Now one would like to differentiate \(\phi^\infty\) at the identity \(1\in\textbf R\) to get the induced Lie algebra representation \(\dot{\phi}^\infty_1:\frak{R}\)\(\to\frak u\)\((L^2(\textbf R\to\textbf C,dx))/i\textbf R1\). The first thing is to figure out what the Lie algebra \(\frak R\) of \(\textbf R\) is. As one might intuitively expect, \(\frak R\)\(=\textbf R\) because the tangent line to \(1\in\textbf R\) is…well all of \(\textbf R\) (one way to argue this rigorously is to algebraically embed \(\textbf R\to SL_2(\textbf R)\) by viewing real numbers \(x\in\textbf R\) as the one-parameter subgroup of horizontal shearing matrices \(\begin{pmatrix}1 & x\\0 & 1\end{pmatrix}\in SL_2(\textbf R)\) and then noting that the corresponding Lie subalgebra of strictly upper triangular matrices \(\begin{pmatrix}0 & x\\0 & 0\end{pmatrix}\subseteq\frak{sl}\)\(_2(\textbf R)\) is \(\cong\textbf R\)). Even though at the level of sets, \(\textbf R=\textbf R\) trivially, at the level of their structures it is important to emphasize that one of them \(\textbf R=G\) is first and foremost a (Lie) group whereas the other one \(\textbf R=\frak R\) is first and foremost a real vector space. The reason for emphasizing this is that it means the exponential map from the Lie algebra \(\textbf R\) to the Lie group \(\textbf R\) is not the standard real exponential that one is used to working with; instead, it is actually just the identity map \(e^x=x\). The reason for this is because the Lie group \(\textbf R\) is an additive group, not a multiplicative group (whereas matrix Lie groups are typically multiplicative, so for them the exponential map genuinely coincides with the matrix exponential). To see this rigorously, one has to look at the general definition of the exponential map in Lie theory in terms of a unique one-parameter subgroup and use the additivity (rather than multiplicativity) of the Lie group \(\textbf R\). All this is to say that:

$$\dot{\phi}^\infty_1(1)=\left(\frac{\partial}{\partial k}\right)_{k=0}e^{-ikx}=-ix$$

where the reason for pushing forward \(1\in\textbf R\) in the Lie algebra is because it is the most natural basis vector \(\textbf R=\text{span}_{\textbf R}\{1\}\) for the Lie algebra (or a “generator for the Lie group \(\textbf R\)” as physicists say, where the phrase “generate” is synonymous with “exponentiates to (allowing arbitrary linear combinations of generators)”, i.e. the assumption is that the exponential is surjective, which will be true if \(G\) has a compact and connected topology). Of course, this is currently an anti-Hermitian linear operator, so the last step is to multiply by \(i\) to recover the position observable \(X\) in the \(x\)-direction:

$$\langle x|X|\psi\rangle = x\langle x|\psi\rangle$$

A completely analogous procedure can be carried out with the SPUR (smooth projective unitary representation) \(\phi^\infty:\textbf R\to PU(L^2(\textbf R\to\textbf C,dx))\) defined by \(\langle x|\phi^\infty_{\Delta x}|\psi\rangle = \langle x-\Delta x|\psi\rangle\) (here the Lie group is also \(\textbf R\), but conceptually one thinks of it as the group of translations \(\Delta x\in\textbf R\) of real space rather than reciprocal space, or in the language of wave-particle duality, particle space rather than wave space). The induced Lie algebra representation pushed forward on the same generator \(1\) gives by the chain rule:

$$\langle x|\dot{\phi}^\infty_1(1)|\psi\rangle=-\frac{\partial}{\partial x}\langle x|\psi\rangle$$

Multiplying by \(i\) gives the momentum observable \(P_x\) in the \(x\)-direction:

$$\langle x|P_x|\psi\rangle=-i\frac{\partial}{\partial x}\langle x|\psi\rangle$$

where we have set \(\hbar = 1\) in some suitable natural unit system. Finally, there is one more measurable observable associated to this 1D spinless quantum particle with \(\mathcal H=L^2(\textbf R\to\textbf C,dx)\). This time again it happens to be the Lie group \(\textbf R\) that acts on \(L^2(\textbf R\to\textbf C,dx)\) via the SPUR \(\phi^\infty_{\Delta t}|\psi(t)\rangle:=|\psi(t+\Delta t)\rangle\), but now \(\textbf R\) is to be thought of conceptually as the group of translations \(\Delta t\in\textbf R\) through time. Also interesting to note here is that we do \(+\Delta t\) rather than \(-\Delta t\) whereas for the spatial translations we had to do \(-\Delta x\) rather than \(+\Delta x\) (this reminds me of Minkowski’s metric tensor \(g=\text{diag}(1,-1,-1,-1)\) used to define the hyperbolic geometry of Minkowski spacetime in special relativity, though I’m not sure if there’s any connection there). Running the machine again gives (chain rule):

$$\dot{\phi}^\infty_1(1)=\frac{\partial}{\partial t}$$

and multiplying by \(i\) gives the Hamiltonian observable \(H\):

$$H=i\frac{\partial}{\partial t}$$

which is of course also known as the (time-dependent) Schrödinger equation. By applying Properties #1, #2, #3 and #4 of \(\dot{\phi}^\infty_1\) proved earlier, one can then establish further facts, such as the commutation relations among these observables.
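For instance, with the position and momentum observables above in hand, one can check the canonical commutator \([X,P_x]=i\) directly in the position basis (a sympy sketch with \(\hbar=1\); the operators are encoded as functions acting on an arbitrary wavefunction):

```python
import sympy as sp

x = sp.symbols("x", real=True)
psi = sp.Function("psi")(x)  # arbitrary position-basis wavefunction

X = lambda f: x * f                   # position observable in the x-basis
P = lambda f: -sp.I * sp.diff(f, x)   # momentum observable (hbar = 1)

commutator = sp.simplify(X(P(psi)) - P(X(psi)))  # [X, P_x] acting on psi
print(commutator)  # I*psi(x), i.e. [X, P_x] = i
```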

(Some loose ends to think about: should unitary operators be viewed as the natural Hilbert space homomorphisms? And where does the role of irreducibility come in? Presumably, since representations decompose as direct sums of irreducible ones, the irreducible representations are really the only interesting/fundamental ones.)
