High-Power Rocketry

Having been a member of Cambridge University Spaceflight (CUSF) during my first year as an undergraduate student, I thought it would be fun to document the knowledge I've acquired regarding the art of high-power rocketry. A good summary of the basics of high-power rocketry can be found in this NASA document, which is meant to accompany this playlist of YouTube videos. This document on rocket anatomy from the National Association of Rocketry (NAR) is also quite thorough.

To set the stage, high-power rocketry is essentially just a more powerful version of model rocketry. In the United Kingdom, high-power rocketry is overseen by the United Kingdom Rocketry Association (UKRA), which means that if one wishes to fly a high-power rocket in the UK, one has to first obtain certification from UKRA. Exactly what kind of certification is required depends on the level and class of the rocket motor (or engine) being used (for model rockets, no such certification is necessary). The classes are simply intervals of rocket motor impulse \(J=\int_0^{t_{\text{burnout}}}F_{\text{thrust}}\,dt\), doubling each time \([J_*,2J_*]\mapsto [2J_*,4J_*]\); for high-power rocketry they are:

  • Level \(1\): \(H\)-class rocket motors \(J_H\in[160,320]\text{ N}\cdot\text s\), \(I\)-class rocket motors \(J_I\in[320,640]\text{ N}\cdot\text s\)
  • Level \(2\): \(J\)-class rocket motors \(J_J\in[640,1280]\text{ N}\cdot\text s\), \(K\)-class rocket motors \(J_K\in[1280,2560]\text{ N}\cdot\text s\), \(L\)-class rocket motors \(J_L\in[2560,5120]\text{ N}\cdot\text s\)
  • Level \(3\): \(M\)-class rocket motors \(J_M\in[5120,10240]\text{ N}\cdot\text s\), \(N\)-class rocket motors \(J_N\in[10240, 20480]\text{ N}\cdot\text s\), \(O\)-class rocket motors \(J_O\in[20480, 40960]\text{ N}\cdot\text s\)
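As a worked example of reading these classes (using the standard commercial naming convention, in which the digits before the class letter give the total impulse in \(\text N\cdot\text s\) and the digits after give the average thrust in \(\text N\)): the Cesaroni 4864L2375-WT motor appearing later in this post has

\[J=4864\text{ N}\cdot\text{s}\in[2560,5120]\text{ N}\cdot\text{s}\implies L\text{-class},\qquad t_{\text{burnout}}\approx\frac{J}{\bar F_{\text{thrust}}}=\frac{4864}{2375}\text{ s}\approx 2.0\text{ s}\]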

At CUSF, the rocket I have been working on, called Panthera, will be powered by a level \(2\) \(L\)-class rocket motor, the highest class within level \(2\). More specifically, I specialize in the avionics subsystems of the rocket rather than aerostructural subsystems. That being said, I will provide some comments on the aerostructure since it is important to have a basic idea of the rocket’s anatomy and where the avionics fit within the bigger picture.

A fundamental principle of high-power rocketry (and engineering more generally) is: “test what you fly and fly what you test”.

Aerostructures

The following are standard subassemblies that make up the aerostructure of a high-power rocket.

  • Nose cone: a bit of a misnomer, since it isn't necessarily (indeed usually isn't) a cone per se. The most common geometry is the tangent ogive nose cone (pronounced OH-JIVE), the solid of revolution of a circular arc whose radius is large compared to the base diameter \(D\) and which meets the body tube tangentially at the base. More generally, the selection of nose cone geometry depends on how simple it is to manufacture (or, if sourced commercially, what's available on the market) and, beyond that, on its drag characteristics. This is not quite as simple as just "minimizing the drag coefficient \(C_D\)". The total drag force \(\textbf F_d=-\frac{1}{2}C_D(\text{Re}, v/c)\rho A v^2\hat{\textbf v}\) acting on the rocket is the sum \(\textbf F_d=\textbf F_{d,\text{skin}}+\textbf F_{d,\text{form}}\) of skin friction drag \(\textbf F_{d,\text{skin}}\), due to the viscosity \(\eta\) of the boundary layer around the rocket, which dominates at subsonic speeds (less than Mach \(1\)), and form drag \(\textbf F_{d,\text{form}}\), due to shock waves, eddies, vortices and other turbulent sources of a pressure differential \(\Delta p\) across the rocket, which therefore dominates at transonic/supersonic/hypersonic speeds (greater than Mach \(1\)). OpenRocket simulations suggested that Panthera's speed was expected to peak around Mach \(1.74\), so when selecting the nose cone geometry it was essential to pay greater attention to minimizing the form drag \(\textbf F_{d,\text{form}}\). The fineness ratio of the nose cone is defined by \(\phi:=\frac{L}{D}\), where \(L\) is the nose cone length and \(D\) its base diameter; in general a greater fineness ratio \(\phi\) means a "sleeker" nose cone (I think sleekness ratio would have been a better name) and reduced form drag \(\frac{\partial F_{d,\text{form}}}{\partial \phi}<0\). However, for \(\phi>5\) the reduction in form drag becomes negligible as competing factors begin to dominate (for instance, the lateral surface area of a perfectly conical nose cone is \(A=\frac{\pi D^2}{4}\sqrt{1+4\phi^2}\) and \(F_{d,\text{skin}}\propto A\), while the mass will increase too). Another, less important dimensionless ratio is the bluffness ratio \(b:=d/D\), where \(d\) is the diameter of curvature at the nose cone tip; empirically, \(b\in [0.1, 0.2]\) is found to minimize form drag. In fact, one can show mathematically that for fixed \(L\) and \(D\) the optimal geometry is the so-called von Kármán nose cone (related to the Sears-Haack body), and this is what Panthera uses, with fineness ratio \(\phi=5\), often phrased as "\(5:1\)" (see the profile formula just after this list). The material is a G\(12\) fiberglass composite (composites allow simultaneous strength and light weight, tunable via the volume fraction, although fiberglass dust is a respiratory hazard) with an aluminum plating (for better heat dissipation).
  • Body tube (also called airframe): basically a long cylinder that defines the basic structure of the rocket. Panthera actually uses \(2\) body tubes, as can be seen in the engineering drawing above. The upper \(575\text{ mm}\) body tube is connected to the lower \(1200\text{ mm}\) body tube via a coupler tube (with the precise connections explained below). Like the nose cone, both body tubes are made of G12 fiberglass. The lower body tube houses the rocket motor along with the minimum-diameter slimline motor retainer, with a \(1/4''\) stainless steel (SS) eyebolt. There are also fin slots through which the fin tangs will be epoxied in place. These can be custom-made using a jig and power tools, with a drill template wrapped around the body tube to guide the drilling.
  • Fins: flat aerodynamic surfaces mounted at the base of the rocket (see Booster/Fin Can below for their purpose).
  • Coupler tubes: cylindrical tubes used to join body tube segments together. If each body tube has diameter \(D\), then the length \(L_C\) of the coupler tube should satisfy \(L_C\geq 2D\), and also \(D_{\text{outer, coupler tube}}=D_{\text{inner, body tube}}\). The requirement on the length \(L_C\) makes coupler tubes convenient avionics bays, as is the case for Panthera. Moreover, part of the job of the avionics is to trigger e-matches to ignite black powder, which is usually placed close to the coupler tube since that's roughly where the separation of the body tubes will occur. The upper body tube is secured to the coupler tube with plastic rivets, since the two need to stay together during separation, while the lower body tube is secured to the coupler tube simply by two nylon screws functioning as shear pins, to be sheared by the pressurization from the black powder after ignition. Often, a collar of body tube material is epoxied at the center of the coupler tube, for easy access to avionics bay switches and for pressurization.
  • Rocket motor casing: confusingly, the word "motor" here does not mean an electric motor; it is just a synonym for "engine" (thus, I feel that "rocket engine" is a more suitable term). More precisely, the convention in high-power rocketry circles seems to be that if the rocket acquires thrust via a liquid propellant (e.g. liquid oxygen \(\text O_2(\ell)\) or nitrous oxide \(\text N_2\text O\)), then it is called a rocket engine, whereas if it's a solid propellant (e.g. ammonium perchlorate composite propellant (APCP)), then it's a rocket motor. Panthera uses a Cesaroni 4864L2375-WT rocket motor casing.
  • Rocket motor retainer: its purpose is to "retain" the rocket motor casing during flight (one doesn't want the ejection pressurization to eject it).
  • Centering rings: not applicable to Panthera which is a minimum diameter rocket (i.e. the body tube is also the motor tube) but for rockets which are not minimum diameter, these are used to center tubes nested within the body tube, as the name suggests.
  • Bulkhead plates: these are for capping coupler tubes, with \(D_{\text{bulkhead plates}}=D_{\text{inner, coupler tube}}\). On Panthera, the coupler tube housing the avionics bay will be capped by two carbon fiber bulkhead plates joined by two M\(5\) threaded rods (meaning \(5\text{ mm}\) diameter threads) with washers and nuts to hold them together, spaced equidistant between the center and the circumference. The upper bulkhead plate will also have an M\(10\) eye bolt to attach to the shock cord.
  • Booster/Fin Can: the purpose of fins is to stabilize the rocket: they keep the center of pressure aft of the center of gravity, so that any small deviation from zero angle of attack produces a restoring aerodynamic torque.
  • Launch lug (rail buttons): these slide along the launch rod/rail, guiding the rocket during the first moments of flight until it is moving fast enough for the fins to provide aerodynamic stabilization.
  • Shock cord/parachutes: Panthera will use a Kevlar shock cord, attached via eyebolts to the bulkhead plate and to the motor retainer.
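For reference, the von Kármán profile mentioned above is the \(C=0\) member of the Haack series (a standard formula, not anything Panthera-specific): writing \(R=D/2\) for the base radius and \(x\in[0,L]\) for the distance from the tip, the local radius is

\[y(x)=\frac{R}{\sqrt{\pi}}\sqrt{\theta(x)-\frac{\sin\big(2\theta(x)\big)}{2}},\qquad \theta(x)=\arccos\left(1-\frac{2x}{L}\right)\]

One can sanity-check that \(y(L)=\frac{R}{\sqrt{\pi}}\sqrt{\pi}=R\), i.e. the profile meets the body tube at the base diameter \(D\).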

Avionics

  • Avionics is just a portmanteau of aviation electronics. Since I was chiefly responsible for the avionics subsystems of Panthera, I have the most to say about it 🙂
  • In general, the methodology was: decide what kind of generic sensors and rocketry-specific things are needed, then go shopping on Amazon, Pi Hut, Mouser, PJRC, Arduino, ELEGOO, Adafruit, WizardRockets, BlackCat Rocketry, etc. for these sensors (I should confess I'm quite terrible at shopping in general; I tend to spend way too much time sifting through and carefully weighing all the specs/options/datasheets, etc. In the future, I think a Pugh decision matrix and the \(1/e\) rule would be a much more rational way to shop).
  • A lot of avionics is just embedded systems programming with digital electronics.
  • Level shifters, logic levels, converters, etc.
  • Antennas: an antenna is any conductor whose purpose is to transmit or receive radio waves (if the antenna's purpose is to transmit, then its driven element will be connected via a feedline/transmission line, such as a coaxial cable, to a radio transmitter; if its purpose is to receive, then its driven element will be connected via feedline to a radio receiver). For Panthera, \(3\) distinct antennas were involved. The first was a passive GPS antenna whose function was to receive radio waves from satellite constellations (e.g. GPS, GLONASS, BeiDou, etc.) in orbit around Earth. The second was a Long Range (LoRa) antenna (\(f=433\text{ MHz}\)) whose function was to transmit radio waves down to the ground. Finally, the third antenna was an ultra high frequency (UHF) \(f=433\text{ MHz}\) Yagi-Uda antenna whose function was to receive the \(f=433\text{ MHz}\) radio waves transmitted by the LoRa antenna. A Yagi-Uda antenna consists of a (parasitic) reflector, a driven element, and one or more (parasitic) directors, all mounted parallel to each other on a boom. A given frequency \(f\) to be received corresponds to a wavelength \(\lambda=c/f\), and the design dimensions of such an antenna for maximum gain are then specified as functions of \(\lambda\) (see the worked numbers just after this list).
  • When designing an electric circuit, one should first focus solely on the topology of the circuit (in KiCad, this corresponds to first laying out the symbols for microcontrollers, sensors, power supplies, etc. and subsequently connecting pins together as per whatever serial communication protocol is involved). Only after the topology has been taken care of does one then proceed to optimize the geometry of the electric circuit (minimize cross talk, parasitic effects, etc.) which in KiCad means figuring out how everything will pack together in physical space on a printed circuit board (PCB).
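Regarding the \(433\text{ MHz}\) link above, the worked numbers are simple: the wavelength is

\[\lambda=\frac{c}{f}=\frac{3\times 10^8\text{ m/s}}{433\times 10^6\text{ Hz}}\approx 0.69\text{ m}\]

so a resonant half-wave dipole driven element is about \(\lambda/2\approx 0.35\text{ m}\) long and a quarter-wave monopole about \(\lambda/4\approx 0.17\text{ m}\); this is the sense in which the Yagi-Uda dimensions are specified as functions of \(\lambda\).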

Serial Communication Protocols

  • A serial communication protocol is any standardized way for controllers to communicate data serially with peripherals, in contrast to parallel communication protocols (completely analogous to how components in an electric circuit can be in series or in parallel topologies). Serial communication protocols seem to be more popular: they require fewer wires (because all data communication occurs serially over the same wires), but as a result are slower than parallel communication protocols.
  • Abstractly, data is just sequences of binary digits (bits). Physically, data is digital voltage signals.
  • Communicate = Transmit + Receive
  • Often data which is communicated serially is called “serial data” although it makes more sense to say that one is communicating data serially (as an adverb) rather than talking about communicating serial data (as an adjective).
  • There are synchronous serial communication protocols (in which transmitter and receiver share a serial clock line) and asynchronous serial communication protocols (in which there is no shared clock, so both sides must agree in advance on timing, e.g. a baud rate).

Universal Asynchronous Receiver-Transmitter (UART)

  • Universal asynchronous receiver-transmitter (UART) is a serial communication protocol (mostly superseded by universal serial bus (USB) these days) in which one has the usual VCC/GND power pins, plus RX-TX and TX-RX lines between exactly two devices. Put bluntly, UART is about the most obvious/simplest/minimalistic serial communication protocol one would naturally think of, since the very concept of communication basically just means transmitting and receiving information. More precisely, each byte is framed by a start bit, the data bits (typically \(8\)), an optional parity bit, and one or more stop bits, with both sides agreeing in advance on a baud rate since there is no shared clock. This simplicity is both its pro and its con. UART is what the u-blox MAX-M8Q GPS chips use (their \((r,\lambda,\varphi)\) accuracy is stunning, around \(5\) decimal places even when indoors).
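As a concrete illustration, here is a minimal sketch (not Panthera's actual flight code) of reading such a GPS chip over UART on the Teensy \(4.1\): it assumes the module's TX/RX are wired to the board's Serial1 pins and that the module is at its default \(9600\) baud, and simply forwards the NMEA sentences to the USB serial monitor.

```cpp
// Minimal UART sketch (Teensy 4.1 / Arduino framework): forward NMEA
// sentences from a u-blox MAX-M8Q on Serial1 to the USB serial monitor.
// Assumes the module's default 9600 baud; wiring (TX->RX1, RX->TX1) is
// the usual RX-TX crossover described above.
void setup() {
  Serial.begin(115200);   // USB serial to the host computer
  Serial1.begin(9600);    // hardware UART connected to the GPS module
}

void loop() {
  // UART is asynchronous: bytes simply arrive framed by start/stop bits
  // at the agreed baud rate, so we just drain the buffer as they come.
  while (Serial1.available() > 0) {
    Serial.write(Serial1.read());
  }
}
```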

Inter Integrated Circuit \(I^2C\)

  • Inter Integrated Circuit (\(I^2C\)) is a synchronous serial communication protocol. The idea is that there is one central device called the controller (formerly called a master), such as an Arduino Uno microcontroller, which transmits and receives data serially to and from peripherals (formerly called slaves). One has the usual VCC/GND power pins for each peripheral and for the microcontroller, but all the peripherals also connect to a common \(I^2C\) bus consisting of a serial data (SDA) line and, since the protocol is synchronous, a serial clock (SCL) line, each held high by a pullup resistor.
  • The u-blox MAX-M8Q GPS chips technically also work with \(I^2C\) in addition to UART, but the previous electronics team member working on Aquila mentioned that he could not get them to work over \(I^2C\). Incidentally, \(433\text{ MHz}\) is the European (ITU Region \(1\)) industry, science, medicine (ISM) band frequency \(f_{\text{ISM}}=433\text{ MHz}\), \(\lambda_{\text{ISM}}\approx 0.7\text{ m}\) of the radio wavepackets that the LoRa module on the rocket transmits via an antenna, and which the LoRa module on the ground station receives via an identical antenna.
  • Serial Peripheral Interface (SPI) is another synchronous serial communication protocol, with VCC/GND as usual, plus a serial clock (SCLK) line, MOSI (controller-out, peripheral-in) and MISO (controller-in, peripheral-out) data lines, and one chip-select (CS) line per peripheral.
  • By uploading the following C++ sketch onto the Teensy \(4.1\) board, the hex(adecimal) \(I^2C\) address of the BME\(280\) barometer breakout board was located at \(0x76=76_{16}=118=1110110_2=0b1110110\) while the \(I^2C\) address of the MPU\(6050\) breakout board was located at \(0x68=68_{16}=104=1101000_2=0b1101000\).
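A minimal version of such a scanner (the standard Wire-library idiom) probes every 7-bit address and reports which devices acknowledge:

```cpp
// I2C scanner (standard Wire-library idiom): probe every 7-bit address
// and report the devices that ACK. This is how addresses like the
// BME280's 0x76 and the MPU6050's 0x68 quoted above are located.
#include <Wire.h>

void setup() {
  Serial.begin(115200);
  Wire.begin();  // join the I2C bus as the controller
  for (uint8_t address = 1; address < 127; address++) {
    Wire.beginTransmission(address);
    if (Wire.endTransmission() == 0) {  // 0 = peripheral ACKed its address
      Serial.print("Device found at 0x");
      Serial.println(address, HEX);
    }
  }
}

void loop() {}  // one-shot scan; nothing to do here
```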
  • In what follows, all the discussion pertains to through-hole soldering (which is what it sounds like). Unfortunately I did not get to do any surface-mount device (SMD) soldering, due to the requirement of an expensive microscope.
  • In general, I find that a repeated stabbing technique works well: to minimize surface energy \(V_{\text{surface}}\), enough shaking will cause the solder to naturally creep into the desired regions and completely fill them (correction: I have since found that a certain "rolling technique" with the soldering iron is even more effective and gives a very nice finish; it is easy to forget the \(\phi\) degree of freedom of the soldering iron in your hand). It is true that if too thick a layer of oxidized metal forms on the surface of the soldering iron tip, then this will reduce the thermal conduction (see Fourier's heat conduction law); however, what I find empirically is that if there is initially a small layer of liquid solder adhering to the tip, then this can actually induce a sort of superheating effect (similar to how heterogeneous nucleation reduces supercooling) so that the solder's melting point \(T_{(s)\to(\ell)}\) is effectively higher than usual. There is probably some argument to be made here regarding interfacial surface energy minimization, though I'm not too sure.
  • Always have a digital multimeter nearby for checking electrical continuity.
  • For peripheral devices with pins, always solder from the side the pins poke out, and similarly use that side for any connections with other terminals. Having some wire around plus a crimper/wire stripper is very handy, as it allows arbitrary routing of terminals through space (although if compactness is especially desired, then that just speaks to the power of PCBs). Also, preplanning and drawing the schematic for a PCB really does help a lot compared with doing it live, as I found out today; there are just a lot of factors to consider, and it can easily go wrong. Not only this, but a PCB is generally more robust than free-form soldering.
  • Magnet wire is too much hassle stripping the insulation off each time; again, PCB is best.
  • For 3D printing the boards, I ended up realizing only after a 5-hour print that the hole for the M5 threaded rod was too small…lesson learned: print smaller versions of a part first (quick 10-minute prints) to test the tolerances before scaling up. A modular approach to engineering.
  • Group similar tasks together: for example, first cut all the magnet wire needed, then strip it all, then solder it all, etc. It is much more efficient this way.
  • Knowing that the avionics board was meant to go together

Transistors

  • The simplest kind of transistor to understand is the bipolar junction transistor (BJT) (the reason for this name will become apparent in what follows). Among BJTs, the most common are NPN-type BJTs. These consist of two N-type semiconductors (the heavily doped, very \(e^-\)-dense emitter \(E\) and the spacious, less \(e^-\)-dense collector \(C\)) sandwiching a P-type semiconductor in the middle (the very thin, lightly doped, hole-dense base \(B\)). Ordinarily, applying a voltage \(V_{C-E}\) across the NPN-type BJT will not induce a conventional electric current \(I_{C\to E}\), due to the presence of depletion regions at both \(PN\)-junctions in the NPN-type BJT. However, the key principle of the NPN-type BJT is that if one also applies a much smaller voltage \(V_{B-E}\) to forward-bias the base-emitter junction, then the thinness and light doping of the P-type base mean that very little electron-hole recombination can occur there: electrons injected from the emitter mostly diffuse straight across the base into the collector, so a small conventional electric current \(I_{B\to E}\) at the base accompanies a much greater collector current \(I_{C\to E}=\beta I_{B\to E}\), where the current gain \(\beta\) is typically of order \(\beta\sim 100\). In this way, an NPN-type BJT can be thought of as a current-controlled switch, or as a current amplifier. This is commonly reflected in the following circuit schematic symbol:
  • A metal-oxide-semiconductor field-effect transistor (MOSFET) achieves the same sort of thing as a BJT; the differences are the terminology and the mechanism. In a MOSFET one has the source (analogous to the emitter in a BJT), the gate (analogous to the base) and the drain (analogous to the collector), along with a substrate/body terminal. It's called "field-effect" because it relies on an \(\textbf E\)-field: rather than being current-controlled, it is voltage-controlled (i.e. the gate voltage \(V_G\) determines the pinch-off from the ohmic regime to saturation). There are also enhancement-mode and depletion-mode, N-channel and P-channel MOSFETs; a sketch of the avionics use case follows below.
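To connect this back to the avionics: a microcontroller pin can only source a few milliamps, far too little to fire an e-match directly, so in practice one drives the gate of an N-channel MOSFET (or the base of an NPN BJT) and lets the battery supply the ignition current. A minimal sketch of the idea, with the pin number and hold time being illustrative assumptions rather than Panthera's actual values:

```cpp
// Hypothetical pyro channel: the microcontroller drives the gate of an
// N-channel MOSFET used as a low-side switch, letting battery current flow
// through the e-match only while the gate is held high. Pin number and
// timing are illustrative assumptions, not Panthera's actual values.
const int PYRO_GATE_PIN = 23;        // GPIO wired to the MOSFET gate
const unsigned long FIRE_MS = 1000;  // hold time for reliable ignition

void firePyroChannel() {
  digitalWrite(PYRO_GATE_PIN, HIGH);  // V_G above threshold: channel conducts
  delay(FIRE_MS);
  digitalWrite(PYRO_GATE_PIN, LOW);   // back to cutoff
}

void setup() {
  pinMode(PYRO_GATE_PIN, OUTPUT);
  digitalWrite(PYRO_GATE_PIN, LOW);   // never boot with the channel armed
}

void loop() {
  // In real flight software this would be gated on apogee/altitude
  // detection (see the Kalman filtering section below).
}
```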

Kalman Filtering

  • Acquiring an accurate picture of Panthera's flight profile as a function of time is important for two reasons. The less important reason is that naturally one would be interested in knowing how it flew and how well simulations were able to predict it, so that future launches can be improved. The more important reason is that one would like to know when Panthera reaches a certain altitude, at which point a backup system should ignite the electric match and explode the black powder to induce main parachute deployment, in case the main flight computer (the Eggtimer Quark) malfunctions.
  • The first step with Kalman filtering is to derive a linear dynamical model (LDM) for the time evolution of the rocket’s state vector. For simplicity, one can pretend that the acceleration \(a\) of Panthera is approximately constant throughout the entire flight (this is of course not true, for instance OpenRocket simulations suggest that the acceleration is expected to peak at around \(a\approx 40 g\) during launch but then decay down to \(a\approx -g\) as the rocket nears apogee since the rocket essentially just becomes a free fall projectile). Nevertheless, one should not fuss too much about the accuracy of the theoretical model (it just has to be roughly accurate on average).
  • The next step is to actually decide on a suitable state vector for the system. Ultimately, this depends on what parameters one is interested in estimating about Panthera. In this case a suitable state vector would be \(\begin{pmatrix} x\\ v \\ a\end{pmatrix}\in\textbf R^3\). Under the constant acceleration assumption earlier, the preliminary recursive state estimate based on the LDM alone is:

\[\begin{pmatrix} \hat x^{\text{LDM}}_{t}\\ \hat v^{\text{LDM}}_{t}\\ \hat a^{\text{LDM}}_{t}\end{pmatrix}=\begin{pmatrix}1 & \Delta t & \Delta t^2/2 \\ 0 & 1 & \Delta t \\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix} \hat x_{t-\Delta t}\\ \hat v_{t-\Delta t}\\ \hat a_{t-\Delta t}\end{pmatrix}\]

  • The variance-covariance matrix of the linear dynamical model estimate is:

\[\text{varcov}\begin{pmatrix} \hat x^{\text{LDM}}_{t}\\ \hat v^{\text{LDM}}_{t}\\ \hat a^{\text{LDM}}_{t}\end{pmatrix}=\begin{pmatrix}1 & \Delta t & \Delta t^2/2 \\ 0 & 1 & \Delta t \\ 0 & 0 & 1\end{pmatrix}\text{varcov} \begin{pmatrix} \hat x_{t-\Delta t}\\ \hat v_{t-\Delta t}\\ \hat a_{t-\Delta t}\end{pmatrix} \begin{pmatrix}1 & 0 & 0 \\ \Delta t & 1 & 0\\ \Delta t^2/2 & \Delta t & 1\end{pmatrix}+\text{varcov}\begin{pmatrix} \delta\hat x^{\text{LDM}}_{t}\\ \delta\hat v^{\text{LDM}}_{t}\\ \delta\hat a^{\text{LDM}}_{t}\end{pmatrix}\]

where one can assume that the variance-covariance matrix of the LDM noise is dominated by the assumption of constant acceleration, so

\[\text{varcov}\begin{pmatrix} \delta\hat x^{\text{LDM}}_{t}\\ \delta\hat v^{\text{LDM}}_{t}\\ \delta\hat a^{\text{LDM}}_{t}\end{pmatrix}=\begin{pmatrix} 0&0&0\\0&0&0\\0&0&\sigma^2_{LDM}\end{pmatrix}\]

  • The minimum-variance estimator for the state at time \(t\) is then a weighted average of the LDM prediction and the sensor measurements:

\[\begin{pmatrix} \hat x_{t}\\ \hat v_{t}\\ \hat a_{t}\end{pmatrix}=(1_{3\times 3}-K_tH_t)\begin{pmatrix} \hat x^{\text{LDM}}_{t}\\ \hat v^{\text{LDM}}_{t}\\ \hat a^{\text{LDM}}_{t}\end{pmatrix}+K_t\begin{pmatrix} \hat x^{\text{sensors}}_{t}\\ \hat a^{\text{sensors}}_{t}\end{pmatrix}\]

where the Kalman gain is given by:

\[K_t=\text{varcov}\begin{pmatrix} \hat x^{\text{LDM}}_{t}\\ \hat v^{\text{LDM}}_{t}\\ \hat a^{\text{LDM}}_{t}\end{pmatrix}H_t^T \left(H_t\text{varcov}\begin{pmatrix} \hat x^{\text{LDM}}_{t}\\ \hat v^{\text{LDM}}_{t}\\ \hat a^{\text{LDM}}_{t}\end{pmatrix}H_t^T+\text{varcov}\begin{pmatrix} \delta\hat x^{\text{sensors}}_{t}\\ \delta\hat a^{\text{sensors}}_{t}\end{pmatrix}\right)^{-1}\]

and \(H_t:\begin{pmatrix} x\\ v\\ a\end{pmatrix}\mapsto \begin{pmatrix} x\\ a\end{pmatrix}\) is just a matrix specific to the types of sensor measurements that are being fused (in this case, only altitude from the barometer and acceleration from the accelerometer, so \(H_t=H=\begin{pmatrix}1&0&0\\0&0&1\end{pmatrix}\)). Here, because the barometer and accelerometer measurements may be assumed to be uncorrelated with each other, the variance-covariance matrix of the sensor noise is of the form:

\[\text{varcov}\begin{pmatrix} \delta\hat x^{\text{sensors}}_{t}\\ \delta\hat a^{\text{sensors}}_{t}\end{pmatrix}=\begin{pmatrix}\sigma^2_{\text{barometer}} & 0 \\ 0 &\sigma^2_{\text{accelerometer}}\end{pmatrix}\]
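For concreteness, here is a minimal sketch of how the above equations translate into flight code: plain C++ with hand-rolled \(3\times 3\) linear algebra, where the timestep and the three noise variances are placeholder assumptions that would need tuning against the actual sensors (they are not Panthera's real numbers).

```cpp
// Minimal 3-state (x, v, a) Kalman filter fusing barometer altitude and
// accelerometer readings, following the equations above. A sketch only:
// dt and the noise variances are placeholder values, not flight-tuned.
#include <array>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct KalmanFilter {
  std::array<double, 3> x{};  // state estimate (altitude, velocity, accel)
  Mat3 P{};                   // varcov of the state estimate
  double dt = 0.01;           // timestep (placeholder: 100 Hz loop assumed)
  double q_a = 25.0;          // sigma^2_LDM on the acceleration entry
  double r_baro = 1.0;        // barometer noise variance (placeholder)
  double r_accel = 0.1;       // accelerometer noise variance (placeholder)

  // Predict: x <- F x and P <- F P F^T + Q, with F the LDM matrix above.
  void predict() {
    const double F[3][3] = {{1, dt, dt * dt / 2}, {0, 1, dt}, {0, 0, 1}};
    std::array<double, 3> xn{};
    Mat3 FP{}, Pn{};
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j) {
        xn[i] += F[i][j] * x[j];
        for (int k = 0; k < 3; ++k) FP[i][j] += F[i][k] * P[k][j];
      }
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j)
        for (int k = 0; k < 3; ++k) Pn[i][j] += FP[i][k] * F[j][k];  // (FP)F^T
    Pn[2][2] += q_a;  // Q is zero except for the constant-acceleration noise
    x = xn;
    P = Pn;
  }

  // Update with z = (baro altitude, accel); H = [[1,0,0],[0,0,1]] as above.
  void update(double z_baro, double z_accel) {
    const int h[2] = {0, 2};  // H just picks out the x and a components
    const double y[2] = {z_baro - x[0], z_accel - x[2]};  // innovation
    // S = H P H^T + R is 2x2, so invert it analytically.
    const double s00 = P[0][0] + r_baro, s01 = P[0][2];
    const double s10 = P[2][0], s11 = P[2][2] + r_accel;
    const double det = s00 * s11 - s01 * s10;
    const double Si[2][2] = {{s11 / det, -s01 / det}, {-s10 / det, s00 / det}};
    double K[3][2];  // Kalman gain K = P H^T S^{-1}
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 2; ++j)
        K[i][j] = P[i][h[0]] * Si[0][j] + P[i][h[1]] * Si[1][j];
    for (int i = 0; i < 3; ++i) x[i] += K[i][0] * y[0] + K[i][1] * y[1];
    Mat3 Pn{};  // P <- (I - K H) P
    for (int i = 0; i < 3; ++i)
      for (int k = 0; k < 3; ++k) {
        const double kh = (k == 0) ? K[i][0] : (k == 2) ? K[i][1] : 0.0;
        const double ikh = (i == k ? 1.0 : 0.0) - kh;
        for (int j = 0; j < 3; ++j) Pn[i][j] += ikh * P[k][j];
      }
    P = Pn;
  }
};
```

Each iteration of the flight loop would then call predict() once per timestep and update() whenever fresh barometer and accelerometer samples arrive; firing the backup pyro channel at the target altitude then amounts to thresholding x[0].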

Recovery

There is an important distinction between a single-deployment recovery system (a single parachute ejected at apogee) and a dual-deployment recovery system (a small drogue parachute deployed at apogee, followed by the main parachute at a lower preset altitude to reduce drift); the altitude-triggered main deployment discussed in the Kalman filtering section above is the hallmark of dual deployment.


Born-Oppenheimer Approximation

The purpose of this post is to explain the Born-Oppenheimer approximation.

Ignoring relativistic fine/hyperfine structure effects, the gross structure molecular Hamiltonian \(H\) (i.e. the “theory of everything”) is:

\[H=T_{\text n}+H_{\text e}\]

where the nuclear kinetic energy is:

\[T_{\text n}=\sum_{i}\frac{\textbf P_i^2}{2M_i}\]

and, for reasons that will become apparent soon, the electronic Hamiltonian \(H_{\text e}:=V_{\text{nn}}+T_{\text e}+V_{\text{ee}}+V_{\text{ne}}\) is thought of as consisting of the following terms:

\[V_{\text{nn}}=\alpha\hbar c\sum_{i<j}\frac{Z_iZ_j}{|\textbf X_i-\textbf X_j|}\]

\[T_{\text e}=\sum_{i}\frac{\textbf p_i^2}{2m_e}\]

\[V_{\text{ee}}=\alpha\hbar c\sum_{i<j}\frac{1}{|\textbf x_i-\textbf x_j|}\]

\[V_{\text{ne}}=-\alpha\hbar c\sum_{i,j}\frac{Z_i}{|\textbf X_i-\textbf x_j|}\]

For an arbitrary molecule, the corresponding molecular Hamiltonian \(H\) is difficult to diagonalize. Fortunately, the existence of a large mass ratio \(M_i/m_e\gtrsim 10^3\) between the nuclei and electrons motivates the Born-Oppenheimer approximation. This consists of \(2\) steps:

Step #\(1\): Diagonalize the electronic Hamiltonian \(H_{\text e}\) by itself first, viewing the nuclei positions \(\{\textbf X_i\}\) as fixed parameters:

\[H_{\text e}|E_{\text e}\{\textbf X_i\}\rangle=E_{\text e}\{\textbf X_i\}|E_{\text e}\{\textbf X_i\}\rangle\]

By adiabatically varying \(\{\textbf X_i\}\), one thereby obtains an effective potential energy surface \(E_{\text e}\{\textbf X_i\}\).

Step #\(2\): Solve \((T_{\text n}+E_{\text e}\{\textbf X_i\})|E\rangle=E|E\rangle\) to get the molecular energies \(E\).

A common further simplification is to take the nuclei to be roughly clamped (\(T_{\text n}\approx 0\)) about their equilibrium positions \(\bar{\textbf X}_i\), except that the internuclear potential:

\[V_{\text{nn}}\approx \text{const}+\frac{1}{2}\sum_{i,j}\left(\frac{\partial^2 V_{\text{nn}}}{\partial X_i\partial X_j}\right)_{\text{eq}}\eta_i\eta_j\]

permits small harmonic vibrations of the nuclei about their clamped equilibria, where \(\eta_i=X_i-\bar X_i\). By diagonalizing the Hessian \(\left(\frac{\partial^2 V_{\text{nn}}}{\partial X_i\partial X_j}\right)_{\text{eq}}\) in its eigenbasis of normal modes, one can obtain a set of fictitious nuclei whose vibrations are completely decoupled from each other, i.e. a bunch of independent harmonic oscillators.
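As the simplest possible example (the standard textbook case, nothing specific to any particular molecule): for a diatomic molecule there is a single internuclear coordinate \(R\), the Hessian is just the scalar curvature \(k_{\text{eq}}\) of the potential at equilibrium, and one recovers the familiar harmonic vibrational spectrum

\[\omega=\sqrt{\frac{k_{\text{eq}}}{\mu}},\qquad\mu=\frac{M_1M_2}{M_1+M_2},\qquad E_v=\hbar\omega\left(v+\frac{1}{2}\right),\quad v\in\textbf N\]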

Semiclassically, the perturbation due to an external electromagnetic wave \(\gamma\) is \(\Delta H_{\gamma}=-e\sum_{\text{electrons } i}\textbf x_i\cdot\textbf E_0\cos(\textbf k\cdot\textbf x_i-\omega t)+e\sum_{\text{nuclei }i} Z_i\textbf X_i\cdot\textbf E_0\cos(\textbf k\cdot\textbf X_i-\omega t)\) where \(\omega=c|\textbf k|\).


The Quantum Theory of Angular Momentum

In this interview, well-known theoretical physicist Frank Wilczek commented that “still to this day I think the quantum theory of angular momentum is one of the absolute pinnacles of human achievement. Just beautiful”. The purpose of this post is to give a flavor of why Wilczek might have thought that.

As (orbital/spin/total) angular momentum should be an observable, it therefore should arise from a smooth projective unitary representation \(\phi^{\infty}\) of some group acting on suitable state spaces \(\mathcal H\cong L^2(\textbf R^3\to\textbf C,d^3\textbf x)\). Unsurprisingly, that group is \(SO(3)\) with \(\phi^{\infty}:SO(3)\to PU(\mathcal H)\) being defined by \(\langle\textbf x|\phi^{\infty}_R|\psi\rangle:=\langle R^{-1}\textbf x|\psi\rangle\) where also \(R^{-1}=R^T\) (although it’s quite intuitive why it must be \(R^{-1}\) rather than \(R\), one can also check that only \(R^{-1}\) allows the homomorphism property \(\phi^\infty_{R_2R_1}=\phi^\infty_{R_2}\circ\phi^{\infty}_{R_1}\) to be satisfied). The induced Lie algebra representation is then \(\dot{\phi}^{\infty}_1:\frak{so}\)\((3)\to\frak u\)\((\mathcal H)/i\textbf R1\) given by \(\dot{\phi}^{\infty}_1(\tilde J)=\left(\frac{\partial}{\partial\theta}\right)_{\theta=0}\phi^{\infty}_{e^{\tilde{J}\theta}}\), so that one has the angular directional derivative:

\[\langle\textbf x|\dot{\phi}^{\infty}_1(\tilde J)|\psi\rangle=-\tilde{J}\textbf x\cdot\frac{\partial\langle\textbf x|\psi\rangle}{\partial\textbf x}\]

Let \(\tilde{J}_1,\tilde J_2,\tilde J_3\in\frak{so}\)\((3)\) be the standard generators of \(\frak{so}\)\((3)\) (i.e. \((\tilde J_i)_{jk}=-\varepsilon_{ijk}\), so that \(\tilde J_i\textbf x=\hat{\textbf e}_i\times\textbf x\)), and consider substituting \(\tilde J:=\tilde J_3\) for instance. Then clearly, one has:

\[\langle\textbf x|\dot{\phi}^{\infty}_1(\tilde J_3)|\psi\rangle=-\frac{\partial\langle\textbf x|\psi\rangle}{\partial\phi}\]

with \(\phi\) the usual cylindrical coordinate about the \(z\)-axis. It is instructive to prove this by writing \(\textbf x=\rho\hat{\boldsymbol{\rho}}_{\phi}+z\hat{\textbf k}\), \(\tilde J_3\textbf x=\hat{\textbf k}\times\textbf x\) and using the standard expression of the gradient in cylindrical coordinates \(\frac{\partial\langle\textbf x|\psi\rangle}{\partial\textbf x}=\frac{\partial\langle\textbf x|\psi\rangle}{\partial\rho}\hat{\boldsymbol{\rho}}_\phi+\frac{1}{\rho}\frac{\partial\langle\textbf x|\psi\rangle}{\partial\phi}\hat{\boldsymbol{\phi}}_{\phi}+\frac{\partial\langle\textbf x|\psi\rangle}{\partial z}\hat{\textbf k}\). Finally, to recover the angular momentum observables, one just does the usual multiply by \(i\hbar\), so for instance one has the familiar \(\langle\textbf x|L_3|\psi\rangle=-i\hbar\frac{\partial\langle\textbf x|\psi\rangle}{\partial\phi}\). This should be compared with formulas such as \(\langle\textbf x|\textbf P|\psi\rangle=-i\hbar\frac{\partial\langle\textbf x|\psi\rangle}{\partial\textbf x}\) and \(H|\psi\rangle=i\hbar\frac{\partial|\psi\rangle}{\partial t}\).

The commutation relations \([\tilde J_i,\tilde J_j]=\varepsilon_{ijk}\tilde J_k\) for \(\frak{so}\)\((3)\) are just an application of the Jacobi identity for \(\textbf R^3\) with the cross product \(\times\). It suffices to show \([\tilde J_1,\tilde J_2]=\tilde J_3\) for example. We have: \([\tilde J_1,\tilde J_2]\textbf x=\tilde J_1\tilde J_2\textbf x-\tilde J_2\tilde J_1\textbf x=\hat{\textbf i}\times(\hat{\textbf j}\times\textbf x)-\hat{\textbf j}\times(\hat{\textbf i}\times\textbf x)=(\hat{\textbf i}\times\hat{\textbf j})\times\textbf x=\hat{\textbf k}\times\textbf x=\tilde J_3\textbf x\). The commutator algebra for the angular momentum observables is then directly inherited from the commutation relations for \(\frak{so}\)\((3)\), thanks to the fact that \(\dot{\phi}^\infty_1\) is a Lie algebra representation. To wit: \([J_i,J_j]=[i\hbar\dot{\phi}^\infty_1(\tilde J_i),i\hbar\dot{\phi}^\infty_1(\tilde J_j)]=(i\hbar)^2[\dot{\phi}^\infty_1(\tilde J_i),\dot{\phi}^\infty_1(\tilde J_j)]=(i\hbar)^2\dot{\phi}^\infty_1([\tilde J_i,\tilde J_j])=(i\hbar)^2\dot{\phi}^\infty_1(\varepsilon_{ijk}\tilde J_k)=i\hbar\varepsilon_{ijk}J_k\). Or, on a more abstract level, it is clear that we have the isomorphism of Lie algebras \((\textbf R^3,\times)\cong(\frak{so}\)\((3),[,])\).

Anyways, armed with the commutator algebra \([J_i,J_j]=i\hbar\varepsilon_{ijk}J_k\), the spectrum \(\Lambda_{J_3}\) can be deduced. Letting \(\textbf J^2:=J_1^2+J_2^2+J_3^2\) be the Casimir invariant, so that \([\textbf J^2,J_i]=0\), one can only know (i.e. measure with \(100\%\) certainty) the total angular momentum and the angular momentum along one specific direction \(\hat{\textbf n}\). Arbitrarily orienting the coordinate axes so that \(\hat{\textbf n}:=\hat{\textbf k}\), this means the angular momentum \(J_3\) of the quantum particle along the \(z\)-axis is known, and therefore it is unknown in the \(xy\)-plane. Now, notice that \(\textbf J^2-J_3^2=J_1^2+J_2^2\). The right-hand side is a sum of squares, so there are two factorizations possible: \(J_1^2+J_2^2=(J_1+iJ_2)(J_1-iJ_2)+i[J_1,J_2]=(J_1-iJ_2)(J_1+iJ_2)+i[J_2,J_1]\). However, we've already worked out that \([J_1,J_2]=-[J_2,J_1]=i\hbar J_3\). Defining the mutually adjoint raising and lowering ladder operators \(J_{\pm}:=J_1\pm iJ_2\), this means that \(\textbf J^2-J_3^2+\hbar J_3=J_+J_-\) and \(\textbf J^2-J_3^2-\hbar J_3=J_-J_+\). Notice that both \(J_+J_-\) and \(J_-J_+\) are Hermitian and positive semi-definite (each being of the form \(A^{\dagger}A\), analogous to the number operator for the quantum harmonic oscillator). Thus, if \(|\lambda^{(2)},\lambda_{(3)}\rangle\) is a simultaneous eigenstate of \(\textbf J^2\) and \(J_3\) with the obvious eigenvalues, then these eigenvalues must satisfy \(\lambda^{(2)}-\lambda_{(3)}^2+\hbar\lambda_{(3)}\geq 0\) and \(\lambda^{(2)}-\lambda_{(3)}^2-\hbar\lambda_{(3)}\geq 0\) (this has a nice geometric interpretation with two parabolas and the intersection in the middle region). The result is that for a given total angular momentum (squared) \(\lambda^{(2)}\), the angular momentum \(\lambda_{(3)}\) in the \(z\)-direction can only lie in an interval of width \(\sqrt{\hbar^2+4\lambda^{(2)}}-\hbar\) centered around \(\lambda_{(3)}=0\).

Now the part that I'm still figuring out how to understand deeply/motivate. Just as when one factorizes the Hamiltonian \(H_{\text{QHO}}\) of the quantum harmonic oscillator one realizes that the factors turn out to act as raising and lowering operators for \(H_{\text{QHO}}\), so here the same phenomenon occurs. I feel that it depends not just on the fact that one is dealing with a sum of squares, but also works in both cases because the commutator is really simple. Once you know that, the proof is straightforward:

\[J_3J_+|\lambda^{(2)},\lambda_{(3)}\rangle=J_3J_1|\lambda^{(2)},\lambda_{(3)}\rangle+iJ_3J_2|\lambda^{(2)},\lambda_{(3)}\rangle\]

The intuition here is that it would be really nice if the \(J_3\) could act on the state instead since we know exactly what it will do \(J_3|\lambda^{(2)},\lambda_{(3)}\rangle=\lambda_{(3)}|\lambda^{(2)},\lambda_{(3)}\rangle\). So to be able to make this happen, the commutator emerges naturally as a necessity to compute, which fortunately we already know. This yields:

\[J_3J_+|\lambda^{(2)},\lambda_{(3)}\rangle=(\lambda_{(3)}+\hbar)J_+|\lambda^{(2)},\lambda_{(3)}\rangle\]

And by a similar computation, \(J_3J_-|\lambda^{(2)},\lambda_{(3)}\rangle=(\lambda_{(3)}-\hbar)J_-|\lambda^{(2)},\lambda_{(3)}\rangle\). Finally, since \([\textbf J^2,J_{\pm}]=0\) (as each of the \(J_i\) commutes with \(\textbf J^2\)), it follows that \(\textbf J^2J_{\pm}|\lambda^{(2)},\lambda_{(3)}\rangle=J_{\pm}\textbf J^2|\lambda^{(2)},\lambda_{(3)}\rangle=\lambda^{(2)}J_{\pm}|\lambda^{(2)},\lambda_{(3)}\rangle\). The key result is therefore that \(J_{\pm}\) raise or lower the angular momentum of the system along the \(z\)-axis by \(\hbar\) without changing the total angular momentum \(\textbf J^2\). The mere existence of the ladder operators means ultimately that the end result is as depicted in the picture below: read the vertical axis as imposing the total amount of angular momentum (squared) and the horizontal axis as imposing the amount of angular momentum in the \(z\)-direction, with the valid angular momentum eigenstates forming a hexagonal lattice of sorts.

In physical space \(\textbf R^3\), the quantized angular momenta vectors allowed can be represented by this cross-section of the \(\rho\) vs. \(z\) plane. Below is for \(j=0,1/2,1,…,5\):

Focusing on the highest-weight states \(m_j=j\) (the inner parabolas in the diagram): for \(j=1/2\) the dark blue dots are inclined at the magic angle \(\theta_{1/2}=54.7^{\circ}\) from the \(z\)-axis, while for the \(j=1\) highest-weight states the angle is \(\theta_1=45^{\circ}\), as is intuitively clear from the plot. In general, it is straightforward to show that for the highest-weight state \(|j(j+1)\hbar ^2,j\hbar\rangle\) the angle decreases according to \(\cos(\theta_j)=\sqrt{\frac{j}{j+1}}\), so that \(\lim_{j\to\infty}\theta_j=0\). (Aside: one immediate conjecture is that there exists some relation between these angles \(\theta_j\) and the position representations \(\langle\textbf x|j(j+1)\hbar^2,j\hbar\rangle\) of these angular momentum eigenstates, since the magic angle satisfies \(P_2(\cos(\theta_{1/2}))=0\) and, if I remember correctly, the angular momentum eigenstates are written in terms of spherical harmonics in the position representation, which are defined via associated Legendre functions \(P_j^{m_j}(\cos(\theta))\) of the first kind…does it correspond to some kind of conical angular node?)

If one wished, it is straightforward to compute the following explicit normalization constants (using the Condon-Shortley phase convention):

\[J_+|j(j+1)\hbar^2,m_j\hbar\rangle=\sqrt{j(j+1)-m_j(m_j+1)}\hbar|j(j+1)\hbar^2,(m_j+1)\hbar\rangle\]

\[J_-|j(j+1)\hbar^2,m_j\hbar\rangle=\sqrt{j(j+1)-m_j(m_j-1)}\hbar|j(j+1)\hbar^2,(m_j-1)\hbar\rangle\]

which will be real because \(-j\leq m_j\leq j\), as \(j\in\textbf N/2\) is by definition the highest weight (or spin) of the irreducible representation, for which \(J_+|j(j+1)\hbar^2,j\hbar\rangle=0\).
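The computation behind those normalization constants is the usual one: using the factorizations \(J_{\mp}J_{\pm}=\textbf J^2-J_3^2\mp\hbar J_3\) found earlier,

\[\|J_{\pm}|j(j+1)\hbar^2,m_j\hbar\rangle\|^2=\langle j(j+1)\hbar^2,m_j\hbar|J_{\mp}J_{\pm}|j(j+1)\hbar^2,m_j\hbar\rangle=\left(j(j+1)-m_j(m_j\pm 1)\right)\hbar^2\]

and the Condon-Shortley convention is just the choice of taking the positive real square root.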

The Hamiltonian \(H_{RR}\) of a rigid rotor (rigid means no vibration, so one can safely set the potential energy \(V=0\)) consists entirely of rotational kinetic energy about its center of mass:

\[H_{RR}=\frac{1}{2}\left(\frac{J_1^2}{I_1}+\frac{J_2^2}{I_2}+\frac{J_3^2}{I_3}\right)\]

where \(I_1,I_2,I_3\) are its principal moments of inertia. Assuming \(z\)-axisymmetry \(I_1=I_2=I\), one has:

\[H_{\text{RR}}=\frac{1}{2}\left(\frac{\textbf J^2}{I}+\left(\frac{1}{I_3}-\frac{1}{I}\right)J_3^2\right)\]

At this point, the spectrum \(\Lambda_{H_{\text{RR}}}\) of kinetic energies is easy to read off, where as usual \(j\in\textbf N/2\) and \(m_j\in\{-j,…,j\}\):

\[E_{j,m_j}=\frac{\hbar^2}{2I}\left(j(j+1)+\left(\frac{I}{I_3}-1\right)m_j^2\right)\]

The ground state energy of the rigid rotor in free space is thus \(E_{0,0}=0\), corresponding to no kinetic energy. In general, because \(I_3\ll I\), the lowest energy states should have \(m_j=0\) (i.e. no angular momentum about the symmetry axis, so one can visualize how the rigid rotor ought to be tumbling end-over-end). This can only occur for integer total angular momenta \(j\in\textbf N\subseteq\textbf N/2\), where now:

\[E_{j,0}=j(j+1)\frac{\hbar^2}{2I}\]

Thus, the constraint \(j\in\textbf N\) means that if the rigid rotor is in the state \(|j(j+1)\hbar^2,0\rangle\), then the smallest energetic transition it can make is by \(\Delta j=1\) and not \(\Delta j=1/2\) (EDIT: I'm so dumb bruh, why was I even thinking this…clearly there ain't such a thing as a "half-raising" or "half-lowering" operator). The smallest energy of the corresponding photon it can absorb/emit is then (basically, recall that whenever one has a quadratic map \(j\mapsto j^2\), the first differences grow linearly, \(\Delta(j^2)\propto j\)):

\[\Delta E^j_{j-1}=E_{j,0}-E_{j-1,0}=\frac{j\hbar^2}{I}\]

with angular frequency \(\omega^j_{j-1}=\Delta E^j_{j-1}/\hbar=j\hbar/I=j\omega^1_0\). If one models a carbon monoxide \(\text{CO}\) molecule as a rigid rotor in one of these low energy states, then about the center of mass one has \(I=\mu R^2\), where \(\mu=\frac{12\times 16}{12+16}\text{ u}\) is the reduced mass and \(R=112.8\text{ pm}\) is the bond length. These combine to yield a base photon frequency of \(f^1_0=\omega^1_0/2\pi=115.8\text{ GHz}\), which is close to the experimental \(f^1_0=115.27\text{ GHz}\), a very important spectral frequency in astronomy for the study of interstellar gas. One might ask how the angular velocity \(\Omega_j\) of the molecule's rotation is related to the angular frequency \(\omega^j_{j-1}\) of the photons absorbed/emitted. Since it was already mentioned that the rigid rotor in these low energy states does not rotate at all about its symmetry axis (\(m_j=0\)), it follows that \(|\textbf J|=I\Omega_j\); but we also know that \(|\textbf J|=\sqrt{j(j+1)}\hbar=j\hbar + O_{j\to\infty}(1)\), so \(\Omega_j\approx \omega^j_{j-1}\) as \(j\to\infty\), which is to be expected (e.g. from Maxwell's equations applied to a rotating electric dipole), although if \(j\) gets too large then perhaps some \(m_j\neq 0\) states start popping up too?
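Filling in the numbers behind that estimate:

\[\mu=\frac{12\times 16}{12+16}\text{ u}\approx 1.139\times 10^{-26}\text{ kg},\qquad I=\mu R^2\approx 1.449\times 10^{-46}\text{ kg}\cdot\text{m}^2\]

\[f^1_0=\frac{\omega^1_0}{2\pi}=\frac{\hbar}{2\pi I}\approx 115.8\text{ GHz}\]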


Canonical Quantization

The purpose of this post will be to review a standard method for passing from the classical to the quantum world (or, more poetically, turning \(\hbar\) on from \(0\) to \(1\) in natural units). This procedure is called canonical quantization.

It is first worth emphasizing a conceptual point. Morally, it is quantum mechanics, and not classical mechanics, that represents ground truth. Thus, the whole notion of trying to “pass” from the classical world to the quantum world (i.e. the whole notion of “quantizing” a classical theory) is a bit fallacious; from a logical perspective, it makes much more sense to start with quantum mechanics and pass the other way into the classical world by taking the limit \(\lim_{\hbar\to 0}\). For instance, the fact that a particle’s position and momentum along some direction \(\hat{\textbf n}\) can never be simultaneously known with certainty should be thought of as something very normal, the way things just are and have always been, and that one’s apparent ability to evade this principle in classical mechanics should be thought of as a strange, unusual luxury that deviates from the quantum norm. Although as classical creatures it is the Newtonian physics of the classical world that is certainly more intuitive, as a serious physicist it is essential to think clearly and get one’s priorities straight.

First Quantization (Nonrelativistic Quantum Mechanics)

Mathematically, canonical quantization \(f\mapsto\hat f\) is a Lie algebra representation that maps functions \(f\in C^{\infty}(\textbf R^2\to\textbf R)\) on classical phase space \(\textbf R^2\) to operators \(\hat f\in \frak u\)\((\mathcal H)\) on quantum state space \(\mathcal H\) (all this discussion is easily generalizable to a higher-dimensional phase space like \(\textbf R^{6N}\)). In the former, the Lie bracket is provided by the Poisson bracket \(\{,\}\) while in the latter the Lie bracket is the commutator \([,]\), and one requires canonical quantization to preserve this structure:

\[[\hat f,\hat g]=\widehat{\{f,g\}}\]

Thanks to Taylor's theorem (at least heuristically), a countably infinite spanning set of \(C^{\infty}(\textbf R^2\to\textbf R)\) is given by monomial functions on phase space of the form \((x,p)\mapsto 1,x,p,x^2,xp,p^2,…\), but it turns out there is good reason to stop at quadratic order. Specifically, it is worth noticing (using the standard abuse of notation of writing e.g. \(x\) to mean the function \((x,p)\mapsto x\)) that the subspace \(\text{span}_{\textbf R}(1,x,p,x^2,xp,p^2)\subset C^{\infty}(\textbf R^2\to\textbf R)\) is in fact a \(6\)-dimensional Lie subalgebra of the Lie algebra \(C^{\infty}(\textbf R^2\to\textbf R)\):
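Computing all the pairwise Poisson brackets \(\{f,g\}=\frac{\partial f}{\partial x}\frac{\partial g}{\partial p}-\frac{\partial f}{\partial p}\frac{\partial g}{\partial x}\) of these six monomials (rows \(f\), columns \(g\)), one can check directly that the bracket never leaves the span:

\[\begin{array}{c|cccccc}\{f,g\} & 1 & x & p & x^2 & xp & p^2\\\hline 1 & 0 & 0 & 0 & 0 & 0 & 0\\ x & 0 & 0 & 1 & 0 & x & 2p\\ p & 0 & -1 & 0 & -2x & -p & 0\\ x^2 & 0 & 0 & 2x & 0 & 2x^2 & 4xp\\ xp & 0 & -x & p & -2x^2 & 0 & 2p^2\\ p^2 & 0 & -2p & 0 & -4xp & -2p^2 & 0\end{array}\]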

In fact, the \(3\times 3\) block in the upper-left corner of the table above (the brackets among \(1,x,p\)) defines yet an even smaller Lie subalgebra \(\text{span}_{\textbf R}(1,x,p)\subset C^{\infty}(\textbf R^2\to\textbf R)\), called the Heisenberg Lie algebra. One can then check that the canonical quantization map, defined on the basis functions by:

\[\hat 1:=-i\hbar 1\]

\[\hat x:=-iX\]

\[\hat p:=-iP\]

\[\widehat{x^2}:=\frac{X^2}{i\hbar}\]

\[\widehat{xp}:=\frac{1}{i\hbar}\frac{XP+PX}{2}\]

\[\widehat{p^2}:=\frac{P^2}{i\hbar}\]

is a legitimate Lie algebra representation when restricted to either the first-order Heisenberg subalgebra \(\text{span}_{\textbf R}(1,x,p)\) or the quadratic subalgebra \(\text{span}_{\textbf R}(1,x,p,x^2,xp,p^2)\).

As an aside, the standard physicist’s convention with regards to the meaning of a Lie algebra representation is to instead impose the rule:

\[[\hat f,\hat g]=i\hbar\widehat{\{f,g\}}\]

with the extra factor of \(i\hbar\) compared with the mathematician’s convention. This makes it so that the canonical quantization map corresponds to “just putting hats on everything” (the exception is for the function \(xp\) below where a symmetric combination is used to maintain Hermiticity):

\[\hat 1:=1\]

\[\hat x:=X\]

\[\hat p:=P\]

\[\widehat{x^2}:=X^2\]

\[\widehat{xp}:=\frac{XP+PX}{2}\]

\[\widehat{p^2}:=P^2\]

Despite the hopes of Dirac and others that such a simple prescription could extend to a Lie algebra representation over all functions \(f\in C^{\infty}(\textbf R^2\to\textbf R)\) on phase space, it was subsequently shown in work of Groenewold and Van Hove that this is impossible; i.e. there does not exist any "reasonable" quantization map that would continue to remain a Lie algebra representation beyond the subalgebra \(\text{span}_{\textbf R}(1,x,p,x^2,xp,p^2)\) of quadratic polynomials on phase space. One way to see this is that, continuing to cubic order, one could reasonably demand, in analogy to \(\widehat{xp}:=\frac{XP+PX}{2}\), that for instance \(\widehat{x^2p}:=\frac{X^2P+XPX+PX^2}{3}\) and likewise \(\widehat{xp^2}:=\frac{XP^2+PXP+P^2X}{3}\).

But if this were so, then one runs into trouble at quartic order, where classically there are two distinct routes to the same function:

\[x^2p^2=\frac{1}{9}\{x^3,p^3\}=\frac{1}{3}\{x^2p,xp^2\}\]

so consistency would demand \(\frac{1}{9i\hbar}[X^3,P^3]=\frac{1}{3i\hbar}[\widehat{x^2p},\widehat{xp^2}]\). However, normal-ordering both sides using \([X,P]=i\hbar\) (and, ironically, the identity \([f(X),P]=i\hbar f'(X)\) in light of the obvious classical analog \(\{f(x),p\}=f'(x)\)) yields:

\[\frac{1}{9i\hbar}[X^3,P^3]=X^2P^2-2i\hbar XP-\frac{2}{3}\hbar^2\]

\[\frac{1}{3i\hbar}[\widehat{x^2p},\widehat{xp^2}]=X^2P^2-2i\hbar XP-\frac{1}{3}\hbar^2\]

So these disagree (by the constant \(\hbar^2/3\)), at least if one implements canonical quantization using the symmetrized (Weyl) ordering as above; thus, this is not by itself a rigorous proof that there isn't some other clever way to quantize that preserves the Lie algebra structure consistently, but it is an intuitive argument for why such a scheme can't exist, and at its heart the reason is the non-abelian nature of operators.

Second Quantization (Quantum Field Theory)

Just as one can implement canonical quantization to pass from classical particle mechanics on phase space to quantum particle mechanics on state space, one can perform an analogous procedure to pass from classical field theory to quantum field theory. More precisely, it will be most convenient to work in the framework of Hamiltonian classical field theory (if one were to start with Lagrangian classical field theory, then in lieu of canonical quantization one would instead utilize path integral quantization but fundamentally all of these are equivalent as they must be).

Recall that Lorentz invariance is often manifest in Lagrangian classical field theories because one can just check that all indices are suitably contracted in the theory’s Lagrangian density \(\mathcal L\) to check whether or not the theory’s action \(S=\int dtd^3\textbf x\mathcal L\) is Lorentz-invariant. By contrast, the Legendre transform to an equivalent Hamiltonian formulation of the same classical field theory doesn’t break Lorentz invariance, but obscures it, since defining the conjugate momentum field \(\pi^j(X):=\frac{\partial\mathcal L}{\partial\dot{\phi}_j}\) picks out a preferred time coordinate \(ct=x^0\) and this is further emphasized in the Hamiltonian density itself:

\[\mathcal H=\pi^i\dot{\phi}_i-\mathcal L\]

In the Schrodinger picture, a quantum field is considered to be an operator-valued function \(\phi(\textbf x)\) on space \(\textbf x\in\textbf R^3\) (not spacetime because in the Schrodinger picture one takes operators to be \(t\)-independent) which satisfies the canonical commutation relations (i.e. the commutation relations of the Heisenberg algebra that canonical quantization would aim to preserve):

\[[\phi_i(\textbf x),\phi_j(\textbf x')]=[\pi^i(\textbf x),\pi^j(\textbf x')]=0\]

\[[\phi_i(\textbf x),\pi^j(\textbf x')]=i\hbar\delta_i^j\delta^3(\textbf x-\textbf x')\]

In the Schrodinger picture, all the \(t\)-dependence lies in the state \(|\psi\rangle=|\psi(t)\rangle\) which evolves by the usual Schrodinger equation \(i\hbar\dot{|\psi\rangle}=H|\psi\rangle\) where \(H=\int d^3\textbf x\mathcal H\). However, note that \(|\psi\rangle\) is more like a “wavefunctional” than a “wavefunction” because it takes in some field \(\phi(\textbf x)\) as an input and outputs a complex number. Thus, to emphasize again, Lorentz invariance was already broken from the get-go, but here it’s just getting even worse since only space \(\textbf x\) appears in the field \(\phi(\textbf x)\) whereas only time \(t\) appears in the Schrodinger equation.


Part 1A Chemistry Overview

The Shapes and Structures of Molecules

  • Depending on the nature of a particular chemical compound, there exist many experimental techniques for finding its molecular structure (David Tong would call these experimental techniques scattering, a terminology which is appropriate if one views light as a particle). Examples include nuclear magnetic resonance (NMR) spectroscopy (where one also has the freedom to select any NMR-active nucleus with \(I\geq 1/2\), such as \(^1\text H\) or \(^{13}\text C\)), microwave spectroscopy, infrared (IR) spectroscopy, UV-visible spectroscopy, x-ray/electron/neutron diffraction, and mass spectrometry (with or without electrospray ionization). Notice that the spectroscopic methods are ordered in increasing energy \(E\) along the electromagnetic spectrum. Mass spectrometry is sort of an outlier in that it is not a spectroscopic method, but rather purely spectrometric. However, the spectroscopic methods are arguably the most important ones these days because they convey the richest amount of information, and several Nobel prizes in chemistry have basically gone to people who used these or similar techniques to determine the structures of very complicated (often biologically significant) molecules.
    • In NMR spectroscopy, the idea is to place a molecule in an external magnetic field \(\textbf B_{\text{ext}}\) (this is often actually the superposition of three magnetic fields: one uniform and constant, one uniform but oscillating, and one non-uniform but constant; details can be found here). To first order, an NMR-active nucleus then has a Zeeman Hamiltonian of the form \(H_{\text{NMR-active nucleus}}=-\boldsymbol{\mu}\cdot\textbf B_{\text{ext}}\), where the nuclear magnetic dipole moment operator is \(\boldsymbol{\mu}=\gamma\textbf S\). Furthermore, define the dimensionless nuclear spin angular momentum operator \(\textbf I:=\textbf S/\hbar\), so that its spectrum is of the form \(\sqrt{I(I+1)}\) for \(I=0,1/2,1,3/2,…\). Thus, for both \(^1\text H\) and \(^{13}\text C\) nuclei, which are NMR-active with \(I=1/2\), one has \(2I+1=2\) energy eigenstates (\(m_I=\pm 1/2\), or "spin-up" and "spin-down") arising from Zeeman splitting, separated by \(\Delta E=\gamma\hbar|\textbf B_{\text{ext}}|\). However, this alone wouldn't make NMR particularly useful, as all identical nuclei would just experience the same \(\textbf B_{\text{ext}}\) and hence have identical energies and be indistinguishable. This is where two important perturbations are added to the Hamiltonian \(H\) of the NMR-active nucleus. Specifically, one has \(H=H_{\text{Zeeman}}+H_{\text{screening}}+H_{\text{J-coupling}}\), where \(H_{\text{screening}}=-\boldsymbol{\mu}\cdot\delta\textbf B_{\text{ext}}\), with \(\delta\) the chemical shift tensor field and \(\delta\textbf B_{\text{ext}}\) thought of as the induced screening magnetic field at the NMR-active nucleus due to nearby functional groups (e.g. electronegative elements, diamagnetic shielding from \(e^-\) in an aromatic ring, forced hybridizations of valence atomic orbitals on the atom associated to that NMR-active nucleus, etc.). Meanwhile, \(H_{\text{J-coupling}}=\sum_{\text{all non-identical NMR-active nuclei } i}2\pi\hbar\textbf I\cdot(J\textbf I_i)\), where the sum runs over all NMR-active nuclei (or at least the ones close to the NMR-active nucleus of interest), including different elements (and excluding identical nuclei in identical magnetic environments). Here, \(J\) is the \(J\)-coupling tensor field, and this \(H_{\text{J-coupling}}\) interaction is mediated by the Fermi contact interaction, the Pauli exclusion principle, and Hund's rule. For vicinal \(^3J_{^1\text H-^1\text H}\) coupling of \(^1\text H\) nuclei, it is always possible to draw a Newman projection with an associated dihedral angle \(\theta\) between the two \(^1\text H\) nuclei, and the Karplus equation asserts that \(^3J_{^1\text H-^1\text H}(\theta)=f_2\cos(2\theta)+f_1\cos(\theta)+f_0\) for some empirical fitting parameters \(f_0,f_1,f_2\in\textbf R\). The intuition is thus that trans/anti protons (\(\theta=180^{\circ}\)) have the greatest vicinal coupling constant, followed by cis/eclipsed protons (\(\theta=0\)), with the worst coupling for orthogonal protons (\(\theta=90^{\circ}\)). There is pretty much an infinite rabbit hole of interactions that can arise in NMR spectroscopy (e.g. not just diamagnetic but also paramagnetic screening interactions in \(H_{\text{screening}}\), or interactions of the nuclear spin angular momentum with rotation of the molecule, or, for NMR-active nuclei with \(I\geq 1\), a quadrupolar interaction). It's a deep subject.
Final comment: under isotropic motional averaging and within the secular approximation, one has \(\delta\mapsto\frac{1}{3}\text{Tr}(\delta)\) and likewise \(J\mapsto\frac{1}{3}\text{Tr}(J)\), and this is what is actually measurable and plotted on an NMR spectrum. Heavier nuclei (e.g. \(^{13}\text{C}\)) will have larger chemical shift ranges.
    • For \(^{13}\text C\) NMR spectroscopy: the natural isotopic abundance of \(^{13}\text C\) is only about \(1.1\%\), so the sensitivity is much lower than for \(^1\text H\). Also, spin angular momentum coupling between \(^{13}\text C\) nuclei and \(^1\text H\) nuclei is typically suppressed via broadband proton decoupling. Due to the low isotopic abundance, spin angular momentum coupling among \(^{13}\text C\) nuclei may only be visible as satellite peaks (unless the compound is \(^{13}\text C\)-enriched), and vice versa (i.e. in a \(^1\text H\)-NMR spectrum, the coupling of protons with \(^{13}\text C\) nuclei would likewise only appear as satellites). There is also a variant known as \(^{13}\text C\) attached proton test (APT) NMR spectroscopy, which is essentially just the regular \(^{13}\text C\) NMR spectrum except that it is also sensitive to the parity of the number of attached \(\text H\) atoms (look at which way the deuterated chloroform \(\text{CDCl}_3\) solvent peak points to determine this).

  • For \(^1\text{H}\)-NMR spectra \(n_{^1\text H}(\delta)\), the integral trace \(\int_{\delta_1}^{\delta_2}n_{^1\text H}(\delta)d\delta\) gives the number of protons with chemical shifts \(\delta\in[\delta_1,\delta_2]\). This is not the case for \(^{13}\text C\)-NMR spectra (presumably because the long \(^{13}\text C\) relaxation times and the nuclear Overhauser enhancement from proton decoupling make the peak areas unreliable?). Another quirk peculiar to \(^1\text{H}\)-NMR spectra is roofing (which, if I understand correctly, appears when the chemical shift difference between coupled protons is not large compared to their \(J\)-coupling, so the first-order multiplet analysis starts to break down).
  • Exchangeable protons \(\text O-\text H\) and \(\text N-\text H\) can be made to disappear from a \(^1\text H\)-NMR spectrum using a \(\text D_2\text O\) shake (as \(\text H\to\text D\), which resonates at a very different frequency owing to its different magnetogyric ratio).
  • In IR spectroscopy, no external field is needed: just irradiate the sample with a broad band of infrared light and see if certain frequencies are absorbed more than others by the sample. These correspond to normal mode/vibrational eigenstates of the molecule, which themselves tend to be dominated by particular functional groups. Although it's really a quantum harmonic oscillator, one can think of a normal mode as a classical harmonic oscillator with \(\omega_0=\sqrt{\frac{k}{\mu}}\), where \(\mu=\frac{m_1m_2}{m_1+m_2}\) is the reduced mass (which, as its name suggests, is generally closer to the smaller mass). An IR spectrum is then a plot of \(|\Delta\textbf p|(\nu)\), where \(\Delta\textbf p\) is the change in electric dipole moment associated with a given normal mode; thus, purely covalent/non-polar bonds do not show up on IR spectra (see the worked isotope-shift example just after this list).
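A quick worked consequence of that \(\omega_0=\sqrt{k/\mu}\) dependence (standard textbook numbers, not taken from any particular spectrum): deuteration \(\text C-\text H\to\text C-\text D\) barely changes the force constant \(k\) but nearly doubles the reduced mass, so

\[\frac{\omega_{\text{C}-\text{H}}}{\omega_{\text{C}-\text{D}}}=\sqrt{\frac{\mu_{\text{C}-\text{D}}}{\mu_{\text{C}-\text{H}}}}=\sqrt{\frac{12\cdot 2/14}{12\cdot 1/13}}\approx 1.36\]

which is why a \(\sim 3000\text{ cm}^{-1}\) \(\text C-\text H\) stretch drops to roughly \(2200\text{ cm}^{-1}\) upon deuteration.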


Electric Dipole Moment as Center of Charge

If one has a collection \(Q=\int\rho\, d^3x>0\) of positive charge \(\rho>0\) in some region of space, then the electric dipole moment \(\textbf p\) of \(Q\) may be viewed as \(\textbf p=Q\textbf X\), where \(\textbf X=\frac{1}{Q}\int\textbf x\rho\, d^3x\) is the center of charge of \(Q\), in direct analogy with the center of mass. On the other hand, for regions of negative charge \(\rho<0\), one can reflect the location of the charge across the origin and turn it into positive charge \(\rho(\textbf x)\mapsto-\rho(-\textbf x)\); then one again recovers the center-of-charge interpretation of the electric dipole moment. Clearly, this operation (notably the reflection) in general depends on where one selects the origin to be. However, if the system of charges is neutral overall (\(Q=0\)), then the electric dipole moment \(\textbf p\) turns out to be origin-independent. This neutral \(Q=0\) case is exemplified by the classic point electric dipole consisting of \(+q\) and \(-q\) charges separated by a displacement vector \(\Delta\textbf x\) pointing from \(-q\) to \(+q\); it is possible to convince oneself that it doesn't matter where one places the origin, one always ends up with \(\textbf p=q\Delta\textbf x\). Taking the dipole limit \(\lim_{q\to\infty,\,\Delta\textbf x\to\textbf 0,\,\textbf p=\text{const}}\), one has the following equations governing how the dipole interacts with an external electric field \(\textbf E_{\text{ext}}\) (assuming \(\textbf B=\textbf 0\)). The external force is:

\[\textbf F_{\text{ext}}=\left(\textbf p\cdot\frac{\partial}{\partial\textbf x}\right)\textbf E_{\text{ext}}=\frac{\partial\textbf E_{\text{ext}}}{\partial\textbf x}\textbf p\]

The external couple is:

\[\boldsymbol{\tau}_{\text{ext}}=\textbf p\times\textbf E_{\text{ext}}\]

And the external electric potential energy relative to the configuration where the electric dipole is orthogonal to the external electric field \(\textbf p\cdot\textbf E_{\text{ext}}=0\) is then just (note the physical significance of the negative sign):

\[V_{\text{ext}}=-\textbf p\cdot\textbf E_{\text{ext}}\]

which satisfies \(\textbf F_{\text{ext}}=-\frac{\partial V_{\text{ext}}}{\partial\textbf x}\) thanks to standard vector calculus identities. Note also that \(\textbf E_{\text{ext}}\) in all these expressions is to be evaluated at the location \(\textbf x\) of the point electric dipole. Finally, note that in chemistry the electric dipole moment is defined in the opposite direction \(\textbf p_{\text{chemistry}}=-\textbf p\). For instance, in a formula unit of \(\text{NaCl}\), the electric dipole moment \(\textbf p_{\text{chemistry}}\) would be considered to point from the \(\text{Na}^+\) cation to the \(\text{Cl}^-\) anion. This is because \(\textbf p_{\text{chemistry}}\) indicates the direction of greatest \(e^-\) density \(\rho_{e^-}<0\), which makes intuitive sense as electrons are the mobile charge carriers, and in the case of \(\text{NaCl}\) they are polarized towards the more electronegative \(\text{Cl}^-\). So ultimately this all goes back to Benjamin Franklin’s unwise choice of conventional current being the opposite of the actual direction of \(e^-\) flow.

Finally, a note that identical formulas hold for magnetic dipoles with \(\textbf p\mapsto\boldsymbol{\mu}\) and \(\textbf E_{\text{ext}}\mapsto\textbf B_{\text{ext}}\). Somewhat similarly to the electric dipole moment, the magnetic dipole moment is defined via \[\boldsymbol{\mu}:=\frac{1}{2}\iiint_{\textbf x\in\textbf R^3}\textbf x\times\textbf J(\textbf x)d^3x\].
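As a quick numerical check of these origin (in)dependence claims, here is a minimal sketch with made-up point charges:

```python
import numpy as np

rng = np.random.default_rng(0)

def dipole_moment(charges, positions, origin):
    """p = sum_i q_i (x_i - origin)."""
    return np.sum(charges[:, None] * (positions - origin), axis=0)

positions = rng.normal(size=(4, 3))

# Neutral system (Q = 0): p is the same for any choice of origin.
q_neutral = np.array([+1.0, -1.0, +2.0, -2.0])
p1 = dipole_moment(q_neutral, positions, origin=np.zeros(3))
p2 = dipole_moment(q_neutral, positions, origin=np.array([5.0, -3.0, 1.0]))
print(np.allclose(p1, p2))   # True

# Charged system (Q != 0): p shifts by -Q * (change of origin).
q_charged = np.array([+1.0, +1.0, +2.0, -2.0])
p1 = dipole_moment(q_charged, positions, origin=np.zeros(3))
p2 = dipole_moment(q_charged, positions, origin=np.array([5.0, -3.0, 1.0]))
print(np.allclose(p1, p2))   # False
```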


Reciprocal Space

Problem: What is a lattice \(\Lambda\)? What does it mean to say that a lattice is Bravais? Give an example of a Bravais lattice and a non-Bravais lattice.

Solution: A lattice \(\Lambda\) is any periodic set of points in \(\textbf R^d\) for some dimension \(d\) (usually \(d=2\) or \(d=3\)). More precisely, a lattice is said to be Bravais iff either of the following \(2\) (logically equivalent) statements is true:

i) There exist \(d\) linearly independent vectors \(\textbf x_1,\textbf x_2,…,\textbf x_d\in\textbf R^d\) such that \(\Lambda=\text{span}_{\textbf Z}(\textbf x_1,\textbf x_2,…,\textbf x_d)\) (aside: of course such a \(\textbf Z\)-basis certainly need not be unique).

ii) Begin at any lattice point \(\textbf x\in\Lambda\), then close one’s eyes and walk to any other lattice point \(\textbf x’\in\Lambda\) without turning one’s head. After opening one’s eyes, it looks as if one hasn’t moved at all (i.e. as if \(\textbf x’=\textbf x\) even though that need not be true)!

An example of a \(2\)D Bravais lattice is the triangular lattice, which is \(\textbf Z\)-spanned by the vectors \(\textbf x_1=a\hat{\textbf x}\) and \(\textbf x_2=\frac{a}{2}\hat{\textbf x}+\frac{\sqrt{3}a}{2}\hat{\textbf y}\) (see the sketch below). An example of a \(2\)D lattice which is non-Bravais (but still a legitimate lattice) is the hexagonal honeycomb lattice (one can check that property ii) fails between its two sublattices).
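A minimal sketch generating the triangular lattice as a \(\textbf Z\)-span (with \(a=1\)) and checking its sixfold coordination:

```python
import numpy as np

a = 1.0
x1 = np.array([a, 0.0])
x2 = np.array([a / 2, np.sqrt(3) * a / 2])

# All integer combinations n1*x1 + n2*x2 in a small window: the Z-span.
pts = np.array([n1 * x1 + n2 * x2
                for n1 in range(-3, 4) for n2 in range(-3, 4)])

# Every point of the triangular lattice has 6 nearest neighbors at distance a.
d = np.linalg.norm(pts - pts[len(pts) // 2], axis=1)  # middle point = origin
print(int(np.sum(np.isclose(d, a))))  # 6
```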

Problem: Given a Bravais lattice \(\Lambda\) and a volume \(V\subseteq\textbf R^d\), what does it mean to say that \(V\) is a unit cell of \(\Lambda\)? What does it mean if such a unit cell is primitive? What does it mean if such a unit cell is conventional? Give an example of a Bravais lattice \(\Lambda\) and unit cell thereof which is conventional but non-primitive.

Solution: \(V\) is said to be a unit cell of \(\Lambda\) iff arbitrary \(\Lambda\)-translations tessellate the whole space \(\textbf R^d\), i.e.

\[\bigcup_{\textbf x\in\Lambda} (V+\textbf x)=\textbf R^d\]

(in particular, a corollary of this definition is that \(V\) is a unit cell of \(\Lambda\) iff \(V+\Delta\textbf x\) is a unit cell of \(\Lambda\) for all \(\Delta\textbf x\in\textbf R^d\) so the exact “location” of \(V\) is irrelevant, only its shape matters).

A unit cell \(V\) is said to be primitive iff it contains a single lattice point of \(\Lambda\). Loosely speaking, this means \(\#(V\cap\Lambda)=1\). More precisely, the volume \(|V|\) of \(V\) should coincide with the (positive) volume \(|\det(\textbf x_1,\textbf x_2,…,\textbf x_d)|\) of the parallelepiped formed from an arbitrary \(\textbf Z\)-basis \(\textbf x_1,\textbf x_2,…,\textbf x_d\) for \(\Lambda\):

\[|V|=|\det(\textbf x_1,\textbf x_2,…,\textbf x_d)|\]

(aside: it is intuitively clear, and not hard to show, that although the \(\textbf Z\)-basis itself is not unique, any two \(\textbf Z\)-bases are related by an integer change-of-basis matrix whose inverse is also an integer matrix, hence of determinant \(\pm 1\), so the (positive) volume \(|\det(\textbf x_1,\textbf x_2,…,\textbf x_d)|\) is invariant with respect to all possible choices; see the sketch below).
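A numerical illustration of this invariance, using an fcc \(\textbf Z\)-basis (which will reappear below) and an arbitrary unimodular integer change-of-basis matrix:

```python
import numpy as np

# Z-basis for the fcc lattice (a = 1); columns of X are the basis vectors.
X = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]]).T

# Any other Z-basis is X @ M for an integer matrix M with det M = +/- 1.
M = np.array([[1, 2, 0],
              [0, 1, 3],
              [0, 0, 1]])   # det M = 1

X2 = X @ M
print(abs(np.linalg.det(X)))   # 0.25
print(abs(np.linalg.det(X2)))  # 0.25 -- same primitive cell volume
```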

A unit cell \(V\) for \(\Lambda\) is said to be conventional iff some crystallographer arbitrarily decided they like that particular unit cell for \(\Lambda\). Usually this is motivated by the unit cell being simpler to conceptualize than a primitive unit cell, or because it is a more faithful representation of the symmetries of \(\Lambda\).

For example, for a face-centered cubic Bravais lattice:

\[\Lambda_{\text{fcc}}=\text{span}_{\textbf Z}\left(\frac{a}{2}\hat{\textbf x}+\frac{a}{2}\hat{\textbf y},\frac{a}{2}\hat{\textbf x}+\frac{a}{2}\hat{\textbf z},\frac{a}{2}\hat{\textbf y}+\frac{a}{2}\hat{\textbf z}\right)\]

(aside: the basis vectors \(\textbf x_i\) for \(i=1,…,d\) are also called primitive lattice vectors or primitive translation vectors, the word “primitive” being a reference to the fact that one possible unit cell for any Bravais lattice \(\Lambda\) is always the parallelepiped \(V\) formed by the \(\textbf x_i\) which trivially has \(|V|=|\det\{\textbf x_i\}|\) and so is primitive).

Although the \(\textbf x\)-notation here is suggestive of the Bravais lattice \(\Lambda\) residing in real space \(\textbf R^d\), in fact all the discussion hitherto also applies to reciprocal space.

Problem: Define the reciprocal Bravais lattice \(\Lambda^*\) associated to a given real space Bravais lattice \(\Lambda\).

Solution: There are several logically equivalent formulations:

i) \[\textbf k\in\Lambda^*\Leftrightarrow\textbf k\cdot\textbf x\equiv 0\pmod{2\pi}\] for all \(\textbf x\in\Lambda\).

ii) If \((\textbf x_1,…,\textbf x_d)\) is any \(\textbf Z\)-basis for \(\Lambda\), then any set of vectors \(\textbf k_1,…,\textbf k_d\) obeying \(\textbf k_i\cdot\textbf x_j=2\pi\delta_{ij}\) will \(\textbf Z\)-span \(\Lambda^*\):

\[\Lambda^*=\text{span}_{\textbf Z}(\textbf k_1,…,\textbf k_d)\]

(for \(d=3\) only, one has the explicit formulas:

\[\textbf k_1=2\pi\frac{\textbf x_2\times\textbf x_3}{V}\]

\[\textbf k_2=2\pi\frac{\textbf x_3\times\textbf x_1}{V}\]

\[\textbf k_3=2\pi\frac{\textbf x_1\times\textbf x_2}{V}\]

where \(V:=\textbf x_1\cdot(\textbf x_2\times\textbf x_3)\) is the primitive unit cell volume in real space; see the sketch below).
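A quick sketch implementing these \(d=3\) formulas and verifying the duality \(\textbf k_i\cdot\textbf x_j=2\pi\delta_{ij}\) for the fcc \(\textbf Z\)-basis from earlier:

```python
import numpy as np

def reciprocal_basis(x1, x2, x3):
    """k_i = 2*pi * (x_j x x_k) / V, cyclic in (i, j, k)."""
    V = np.dot(x1, np.cross(x2, x3))
    k1 = 2 * np.pi * np.cross(x2, x3) / V
    k2 = 2 * np.pi * np.cross(x3, x1) / V
    k3 = 2 * np.pi * np.cross(x1, x2) / V
    return k1, k2, k3

# fcc primitive vectors (a = 1)
x1 = np.array([0.5, 0.5, 0.0])
x2 = np.array([0.5, 0.0, 0.5])
x3 = np.array([0.0, 0.5, 0.5])
ks = reciprocal_basis(x1, x2, x3)

# Verify the defining duality k_i . x_j = 2*pi*delta_ij
dots = np.array([[np.dot(k, x) for x in (x1, x2, x3)] for k in ks])
print(np.allclose(dots, 2 * np.pi * np.eye(3)))  # True
# (These k_i Z-span a bcc reciprocal lattice, as expected for fcc.)
```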

iii) If \(\Lambda\) is associated to the Dirac comb \(\sum_{\textbf x’\in\Lambda}\delta^d(\textbf x-\textbf x’)\), then the Fourier transform of this Dirac comb is (up to normalization) the Dirac comb \(\sum_{\textbf k’\in\Lambda^*}\delta^d(\textbf k-\textbf k’)\), which may be associated to the reciprocal lattice \(\Lambda^*\) itself (thus, x-ray crystallography (in the Fraunhofer limit) is simply the art of photographing reciprocal space!). More generally, any \(\Lambda\)-periodic function admits a Fourier series \(\psi(\textbf x)=\sum_{\textbf k’\in\Lambda^*}\psi_{\textbf k’}e^{i\textbf k’\cdot\textbf x}\), so its Fourier transform \(\psi(\textbf k):=\int d^d\textbf x\,e^{-i\textbf k\cdot\textbf x}\psi(\textbf x)\) is supported on the reciprocal lattice:

\[\psi(\textbf k)=(2\pi)^d\sum_{\textbf k’\in\Lambda^*}\psi_{\textbf k’}\delta^d(\textbf k-\textbf k’)\]

  • The Wigner-Seitz primitive unit cell of a Bravais lattice is the set of points closer to a given lattice point than to any other, constructed by taking the perpendicular bisector planes of the lines joining that lattice point to its neighbors; the Wigner-Seitz cell of the reciprocal lattice \(\Lambda^*\) is called the (first) Brillouin zone.

Problem: Although the Bravais lattices \(\Lambda\) considered so far have been, strictly speaking, infinite in extent, in practice all solids are finite in size, containing a finite number \(|\Lambda|<\infty\) of lattice points. Given this consideration, how many quantum \(\textbf k\)-states are available in each Brillouin zone (in the extended zone scheme; equivalently, in the reduced zone scheme this would be phrased as a question of how many \(\textbf k\)-states are available in each band)?

Solution: Exactly \(|\Lambda|\), i.e. one \(\textbf k\)-state per primitive unit cell of the crystal (or \(2|\Lambda|\) states once the twofold spin degeneracy is included): imposing periodic (Born-von Karman) boundary conditions on the finite solid quantizes \(\textbf k\) to a grid with exactly \(|\Lambda|\) allowed points inside each Brillouin zone.

  • In \(\textbf R^3\), there is a standard classification of 3D Bravais lattices into \(14\) disjoint buckets based on how symmetric the conventional unit cell is (most symmetric is the primitive cubic 3D Bravais lattice, most asymmetric is the primitive triclinic 3D Bravais lattice, all the other 3D Bravais lattices lie on a spectrum somewhere in between).
  • A crystal \(\Gamma\) is the convolution of a 3D Bravais lattice \(\Lambda\) with a motif \(M\) of atoms or molecules: \(\Gamma=\Lambda*M\).
  • A lattice plane is any 2D affine subspace of the crystal \(\Gamma\) containing lattice points, denoted by Miller indices \((hkl)\), where the reciprocal lattice vector \(h\textbf a^*+k\textbf b^*+l\textbf c^*\) is the normal vector to the lattice plane. In particular, a lattice direction \([UVW]\) lies in the plane \((hkl)\) iff the Weiss zone law \((U\textbf a+V\textbf b+W\textbf c)\cdot(h\textbf a^*+k\textbf b^*+l\textbf c^*)=hU+kV+lW=0\) is satisfied.
  • The multiplicity of a lattice plane \((hkl)\) is \(|\{hkl\}|\) and is at most \(|\{hkl\}|\leq 2^3\times 3!=48\).
  • There are \(2\) distinct solutions to achieving close packing of identical spheres in \(\textbf R^3\) (i.e. saturating the maximum packing efficiency of \(\eta=\frac{\pi}{3\sqrt{2}}\approx 74\%\)), namely the cubic close-packed crystal \(\Gamma_{\text{ccp}}\) and the hexagonal close-packed crystal \(\Gamma_{\text{hcp}}\) (this mathematical theorem is fundamentally why these two particular crystals are so important). Each of these crystals can be “deconvolved” into a Bravais lattice and motif: \(\Gamma_{\text{ccp}}=\Lambda_{\text{fcc}}*\{(0,0,0)\}\) and \(\Gamma_{\text{hcp}}=\Lambda_{\text{ph}}*\{(0,0,0),(2/3,1/3,1/2)\}\).
  • Having said that both \(\Gamma_{\text{ccp}}\) and \(\Gamma_{\text{hcp}}\) are close packings of identical spheres, they differ in the stacking sequence of their close-packed planes: ABCABC… for \(\Gamma_{\text{ccp}}\) (its \(\{111\}\) planes) vs. ABAB… for \(\Gamma_{\text{hcp}}\) (its basal planes).
  • Both \(\Gamma_{\text{ccp}}\) and \(\Gamma_{\text{hcp}}\) also contain tetrahedral and octahedral interstices/voids which is typically where atoms/molecules of a second smaller element might go.
  • There are also several standard symmetries/point groups of 3D crystals \(\Gamma\): rotational symmetry (only \(4\) possible: diads, triads, tetrads, hexads due to the crystallographic restriction theorem), glide plane symmetry (glide planes not necessarily lattice planes), screw axis symmetry, and centrosymmetry \(\Gamma(-\textbf x)=\Gamma(\textbf x)\). Some of these are specific compositions of other symmetry elements.

Materials For Devices

  • Dielectric materials are electric insulators (\(\rho_f=0\)) and so are polarized by an external electric field \(\textbf E^{\text{ext}}\), leading to an induced polarization density \(\textbf P^{\text{ind}}=\varepsilon_0\chi_e\textbf E^{\text{ext}}\) reflecting the density of induced electric dipoles. Microscopic mechanisms of dielectric polarization are electronic polarization (any dielectric), ionic polarization (ionic crystals), and orientational polarization (e.g. water).
  • Centrosymmetric crystals \(\Gamma\) with \(\rho_b(-\textbf x)=\rho_b(\textbf x)\) are non-polar \(\textbf P=\textbf 0\).
  • Among non-centrosymmetric crystals, some are polar and some are non-polar.
  • Piezoelectric materials are dielectrics where application of an external stress \(\boldsymbol{\sigma}^{\text{ext}}\) leads to an induced polarization \(\textbf P^{\text{ind}}\), with the constant of proportionality being the piezoelectric coefficient: \(\textbf P^{\text{ind}}=d\boldsymbol{\sigma}^{\text{ext}}\). This piezoelectric effect can also be run in reverse, whereby application of an external electric field \(\textbf E^{\text{ext}}\) (via an external voltage \(V^{\text{ext}}\)) leads to an induced strain \(\varepsilon^{\text{ind}}\) (not to be confused with the permittivity \(\varepsilon\)), where now \(\varepsilon^{\text{ind}}=dE^{\text{ext}}\).
  • All polar materials are pyroelectric materials and vice versa (since the spontaneous polarization of a polar material inevitably varies with temperature, e.g. via thermal expansion). This means an externally initiated temperature change \(\Delta T^{\text{ext}}\) induces a polarization \(\textbf P^{\text{ind}}\) via another proportionality constant called the pyroelectric coefficient, \(\textbf P^{\text{ind}}=p\Delta T^{\text{ext}}\) where \(p<0\).
  • Ferroelectrics are dielectrics exhibiting ferroelectric hysteresis (and hence having a spontaneous/remanent polarization \(\textbf P_0\)) below their Curie temperature \(T_C\) (thus, any ferroelectric hysteresis loop should be viewed as a cross-section at some fixed temperature \(T<T_C\)).
  • Perovskites are crystals with stoichiometry \(\text{ABX}_3\) for \(\text{A,B}\) metal cations and \(\text X\) an anion. The Goldschmidt tolerance factor measures how distorted from a cubic crystal structure the perovskite is, \(\Delta_{\text{cubic}}=\frac{R_A+R_X}{\sqrt{2}(R_B+R_X)}\).
  • Barium titanate \(\text{BaTiO}_3\) is a ferroelectric perovskite with \(\Delta_{\text{cubic}}\approx 1.07\), so the \(\text{Ba}^{2+}\) cations are slightly too large, leaving a lot of space for the \(\text{Ti}^{4+}\) cation to polarize within its octahedral interstice. As a result, at temperatures \(T<T_C=120^{\circ}\text{C}\), such as room temperature \(T=20^{\circ}\text C\), it exhibits ferroelectric hysteresis. Specifically, cooling through \(T=T_C\), it undergoes a paraelectric-to-ferroelectric first-order phase transition in its 3D Bravais lattice \(\Lambda_{\text{pc}}\mapsto\Lambda_{\text{bct}}\) (and goes into other ferroelectric phases at lower temperatures still).
  • Landau theory can be used to explain semi-quantitatively the phenomenology of phase transitions. The idea is to postulate an ansatz for the Helmholtz free energy \(F:=U-TS\) (why not the Gibbs free energy instead?) as a power series in an order parameter \(P\) (the induced polarization in this case) with temperature-dependent coefficients, and to then minimize it with respect to \(P\) at each temperature \(T\); see the sketch after this list.
  • Ferroelectrics are not necessarily monodomain (unless \(|\textbf E^{\text{ext}}|\) is sufficiently strong), but more commonly have many polarization domains separated by polarization domain walls due to an energetic competition between \(V_{\text{dipoles}}\) and \(V_{\text{stray}}\) (and it is the pinning of polarization domain walls by defects that gives rise to the irreversibility of ferroelectric hysteresis in the first place).
  • Ferroelectrics are useful for \(2\) main reasons: they have a large permittivity \(\varepsilon\) and so are used as dielectrics in capacitors, and their ferroelectric hysteresis makes them useful for ferroelectric RAM, etc.
  • Another ferroelectric “perovskite” is lead zirconate titanate (PZT) \(\text{PbZr}_x\text{Ti}_{1-x}\text O_3\) where \(x\in[0,1]\), with the important composition being around \(x\approx 0.5\) at room temperature due to the presence of a morphotropic phase boundary there (the central \(\text{Ti}^{4+}\) cation can be polarized in a total of \(|\{100\}|+|\{110\}|=6+8=14\) distinct directions).
  • The magnetization field \(\textbf M:=n\boldsymbol{\mu}\) is the density of magnetic dipoles (analogous to the polarization density \(\textbf P:=n\textbf p\) as the density of electric dipoles).
  • Magnetic susceptibility is defined by \(\textbf M^{\text{ind}}=\chi_m\textbf H^{\text{ext}}\).
  • Using \(\textbf M\), magnetic properties of materials can be classified into \(5\) buckets: diamagnetic, paramagnetic, ferromagnetic, antiferromagnetic, and ferrimagnetic, where both ferromagnetic and ferrimagnetic materials have hysteresis loops.
  • For example, magnetite (where magnetism was discovered) is a ferrimagnetic material adopting an inverse spinel crystal structure.
  • Fundamentally, the origin of magnetism in matter is due to the exchange interaction energy and the Pauli exclusion principle (so parallel spins are energetically favorable to minimize exchange interaction energy, but this is in competition with thermal energy/entropic considerations that wants to randomize magnetic moments) hence existence of a Curie temperature \(T_C\) such that for \(T>T_C\), magnetization vanishes via a ferromagnetic-to-paramagnetic phase transition.
  • Ferromagnets have easy and hard axes due to magnetocrystalline anisotropy, and these easy and hard axes also give rise to shape anisotropy (e.g. explains why bar magnets and not “fat magnets”).
  • Ferromagnets exhibit magnetostriction.
  • For analogous energy competition reasons as ferroelectrics, ferromagnets also have magnetization domains separated by domain walls, and the reason for irreversible ferromagnetic hysteresis (domain wall pinning) is identical.
  • Ferromagnets subdivide into soft and hard ferromagnets, soft ferromagnets are important for transformers (need to be able to easily switch magnetization back and forth) whereas hard ferromagnets have microstructure engineered to deliberately pin domain wall motion (e.g. neodymium magnets).
  • Ionic conductors are described in the steady state by the Fick-Ohm-Boltzmann equation (called Nernst-Einstein equation for some reason) \(\frac{\sigma_{\infty}}{D_{\infty}}=\frac{nq^2}{kT}\).
  • Two important stoichiometric defects are Schottky defects (simultaneous cation and anion vacancies) and Frenkel defects (an ion moves into an interstice, leaving behind a vacancy). The presence of such vacancies allows small ions to jump, mediating conduction. However, the jump is thermally activated (need enough thermal energy, as described by the Arrhenius equation \(D=D_0e^{-\Delta E_a/RT}\)), so ionic conduction works best when served hot!
  • Doping zirconia (zirconium dioxide) \(\text{ZrO}_2\) with yttrium \(\text Y^{3+}\) cations forces creation of \(\text{O}^{2-}\) vacancies for charge neutrality. These vacancies mean that yttria-stabilized zirconia (YSZ) is an ionic conductor (called “stabilized” because the yttrium also stabilizes the otherwise unstable high-temperature cubic phase of zirconia).
  • Bismuth oxide \(\text{Bi}_2\text O_3\) in its cubic (\(\delta\)) phase is also an ionic conductor.
  • Ionic conductors are useful as electrolytes in oxygen concentration cells for \(\lambda\)-sensors in vehicle exhaust control systems, and in hydrogen fuel cells for the hydrogen economy.
  • Expected end-to-end distance in an \(N\)-monomer polymer chain is \(\sqrt{N}\ell_K\), where \(\ell_K\) is the Kuhn length of the polymer chain.
  • Polymers have an inherent anisotropy to them, and this leads to their birefringence \(\Delta n:=n_{\text{slow}}-n_{\text{fast}}\) (and remember \(n=\sqrt{\hat{\mu}\hat{\varepsilon}}\)). The retardance (phase difference) accumulated over a thickness \(\Delta x\) is \(\Delta\phi=k\Delta n\Delta x=2\pi\Delta n\Delta x/\lambda\). Typically studied under crossed polarizers; for a white light source, the color blocked is the complement of the color observed (as given on a Michel-Levy chart). Under crossed polarizers, the irradiance of each wavelength also varies as \(\sin^2(2\theta)\) upon rotating the sample (giving extinction positions every \(90^{\circ}\)), enabling determination of the fast and slow axes. To determine exactly which is which, use a compensator.
  • Certain polymers are examples of liquid crystals, formed at intermediate temperatures and classified by a unit director field \(\textbf D\) (don’t confuse with the electric displacement field) and order parameter \(Q=\overline{P_2(\cos(\theta))}\) into nematic, smectic A/C, and chiral nematic (pitched/helical) liquid crystals. As with the alignment of polarization domains in ferroelectrics or magnetization domains in ferromagnets, the same energy competition (alignment vs. thermal) drives the phase transitions (one similarly gets domain walls, as seen in Schlieren textures, except these are now called disclinations, where one gets Schlieren brushes). The unit director field \(\textbf D\) also defines the slow and fast axes for the birefringence of liquid crystals.
  • A chiral nematic liquid crystal can be enforced using Dirichlet boundary conditions on the unit director field \(\textbf D\), and an external \(\textbf E^{\text{ext}}\)-field can be applied to such a chiral nematic liquid crystal pixel to induce a Freedericksz ON/OFF phase transition, the key buzzword behind liquid crystal displays (LCDs).
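As promised above, here is a minimal Landau-theory sketch. Note the assumptions: the \(F(P;T)=a(T-T_C)P^2+bP^4\) ansatz with made-up coefficients describes a second-order transition, whereas \(\text{BaTiO}_3\)'s cubic-to-tetragonal transition is actually first order (which would need a \(P^6\) ansatz); the point is only to show how minimizing \(F\) yields a spontaneous polarization below \(T_C\):

```python
import numpy as np

# Landau ansatz F(P; T) = a*(T - Tc)*P^2 + b*P^4 with assumed a, b > 0.
a, b, Tc = 1.0, 1.0, 120.0

def equilibrium_P(T):
    # dF/dP = 2a(T - Tc)P + 4bP^3 = 0  ->  P = 0, or P^2 = a(Tc - T)/(2b)
    # Below Tc the nonzero root is the global minimum of F.
    return np.sqrt(a * (Tc - T) / (2 * b)) if T < Tc else 0.0

for T in (20.0, 80.0, 119.0, 121.0, 200.0):
    print(f"T = {T:5.1f}: P0 = {equilibrium_P(T):.3f}")
# The order parameter grows continuously below Tc and vanishes above it.
```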

Diffraction

  • X-rays are arguably the most important experimental tool in crystallography (and other fields of science such as chemistry and biology). The reason is that their wavelength \(\lambda\sim 1\text{ Å}\) just happens to coincide with the typical length scale of the crystal structures and molecules whose structure one is interested in understanding, so that these will indeed be resolvable. Typical sources of x-rays include \(\text{Cu}\) \(K_{\alpha}\) with \(\bar{\lambda}\approx 1.542\text{ Å}\).
  • For single crystals \(\Gamma\), the rule is that the lattice plane \((hkl)\) will usually diffract x-rays incident on the plane (in accordance with the Bragg equation \(\lambda=2d_{hkl}\sin(\theta_{hkl})\)) unless the structure factor \(\psi_{hkl}\) vanishes (in which case \((hkl)\) is said to be systematically absent; different 3D Bravais lattices \(\Lambda\) have different selection rules about which lattice planes should or shouldn’t be systematically absent, as in the sketch at the end of this list).
  • For polycrystals, typically have a powder of the polycrystalline material, hence called x-ray powder diffraction. Due to randomness of grain orientations, get both front and back reflections via Debye-Scherrer cones and irradiance \(I\propto |\{hkl\}||\psi_{hkl}|^2\) as in the Born rule. Can be imaged using a Debye-Scherrer camera on photographic film or using an electronic detector.
  • In general, any kind of photographic film or “sampling” of an interference pattern should be thought of as a slice through the reciprocal lattice \(\Lambda^*\) of the original 3D Bravais lattice \(\Lambda\). Bragg’s law has a nice interpretation in \(\Lambda^*\) via the Ewald sphere construction.
  • Transmission electron microscopy and scanning electron microscopy take advantage of the even finer de Broglie wavelength of electrons to image at even higher resolutions.
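As referenced above, a minimal sketch combining the Bragg equation with the fcc selection rule (reflections allowed only when \(h,k,l\) are all even or all odd); the lattice parameter is an assumed illustrative value:

```python
import numpy as np

LAMBDA = 1.542  # Cu K-alpha wavelength (angstroms)
a = 4.05        # assumed cubic lattice parameter (angstroms)

def bragg_angle(h, k, l):
    """Bragg angle theta (degrees) for a cubic (hkl), or None if lambda > 2d."""
    d = a / np.sqrt(h**2 + k**2 + l**2)   # cubic interplanar spacing
    s = LAMBDA / (2 * d)                  # sin(theta) from Bragg's equation
    return np.degrees(np.arcsin(s)) if s <= 1 else None

def fcc_allowed(h, k, l):
    """fcc selection rule: (hkl) diffracts iff h, k, l all even or all odd."""
    return len({h % 2, k % 2, l % 2}) == 1

for hkl in [(1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 0, 0), (2, 1, 1), (2, 2, 0)]:
    theta = bragg_angle(*hkl)
    status = "allowed" if fcc_allowed(*hkl) else "absent"
    angle = f"theta = {theta:.2f} deg" if theta is not None else "unreachable"
    print(hkl, status.ljust(7), angle)
```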

Microstructure

  • To image the microstructure of a material, can use reflected light microscopy (need a chemical etchant like Nital first to etch different phases at different rates), or for greater resolution, use SEM or atomic force microscopy (AFM).
  • Gibbs free energy \(G:=H-TS\) is minimized at constant \(p\) and \(T\). Thus, there is often an enthalpic (\(H\)) and entropic (\(-TS\)) competition that determines the equilibrium phases of a material at given conditions, with the general theme being that in the hot limit \(T\to\infty\), entropic effects dominate whereas in the cold limit \(T\to 0\) enthalpic effects dominate.
  • Since \(dG=Vdp-SdT\), it follows that the slope \(\left(\frac{\partial G}{\partial T}\right)_p=-S<0\) is always negative. For two phases, the temperature \(T_{12}\) at which \(G_1=G_2\) is the phase transition temperature between those phases (although this is for equilibrium only; phase diagrams only show equilibrium phases of globally minimum \(G\), but metastable phases of locally minimum \(G\) can persist if there is sufficient activation energy barrier).
  • For any solution of two atomic species \(A,B\), the Gibbs free energy of mixing is \(\Delta G_{\text{mix}}=\Delta H_{\text{mix}}-T\Delta S_{\text{mix}}\). Assuming only nearest-neighbor interactions matter, \(\Delta H_{\text{mix}}=H_{\text{sol}}-H_{\text{mech mix}}=\lambda_{AB}x_Ax_B\) where \(x_A=n_A/n,x_B=n_B/n\) are mole fractions, \(\lambda_{AB}\sim nC(2H_{AB}-H_{AA}-H_{BB})\) is the \(AB\)-interaction parameter, and \(C\) is a coordination number. Meanwhile, ignoring thermal contributions to entropy, \(\Delta S_{\text{mix}}=S_{\text{sol}}-S_{\text{mech mix}}=-nR(x_A\ln(x_A)+x_B\ln(x_B))\) is purely configurational. The solution is ideal iff \(\Delta H_{\text{mix}}=\lambda_{AB}=0\) (meaning that \(A\) and \(B\) are probably quite similar) and regular otherwise. Thus, the regular solution model “Lagrangian” is (see the sketch at the end of this list): \[\Delta G_{\text{mix}}=\lambda_{AB}x_Ax_B+nRT(x_A\ln(x_A)+x_B\ln(x_B))\]
  • If \(\lambda_{AB}\leq 0\), then \(\Delta G_{\text{mix}}<0\) always, and \(A\) and \(B\) form a single uniform solution at all compositions and temperatures.
  • The more interesting case is \(\lambda_{AB}>0\) since then at low \(T\) the enthalpic term dominates and segregation of phases occurs whereas at high \(T\) the entropic term dominates again and get a uniform solution once more.
  • Regular solution model is only an approximation, has many assumptions built into it.
  • In practice, determine compositions by using a phase diagram and proportions using tie lines and lever rule.
  • Eutectic phase transitions are of the form \(L\to\alpha+\beta\) and generally produce a lamellar microstructure/intergrowth due to cooperative diffusive growth. Eutectics are important in solder, where a low melting point is desirable (the melting point of the eutectic alloy is lower than that of either pure metal).
  • Experimentally, phase diagrams can be mapped out by measuring cooling curves \(T(t)\) for a given composition of two atomic species. Changes in the cooling rate \(\dot T\) suggest phase transitions, and \(\dot T=0\) is a hallmark of eutectic solidification \(L\to\alpha+\beta\).
  • Rapid/non-equilibrium solidification leads to coring of the resulting solid. Such solids may also be dendritic in nature.
  • So far have just considered thermodynamics; need to consider kinetics too. For homogeneous nucleation of a solid phase \(\alpha\) in a liquid \(L\) (both of the same composition), the driving force for a supercooling of \(\Delta T<0\) is \(\Delta G_V=\Delta T\Delta S_V\) (assuming the heat capacity \(C_p\) is independent of \(T\)), and so spherical nucleation is governed by a “Lagrangian” \(\Delta G(r)=\frac{4}{3}\pi r^3\Delta G_V+4\pi r^2\gamma\), with the work of nucleation being \(\Delta G^*=16\pi\gamma^3/(3\Delta G_V^2)\) (note the essential proportionalities) and the critical radius being \(r^*=-2\gamma/\Delta G_V\) (again, the proportionalities should make sense).
  • Nucleation rate (nucleations per unit volume per unit time) varies with temperature \(T\) as: \(\dot{N}(T)\propto N_Se^{-\Delta G^*(T)/RT}e^{-E_a/RT}\), where the notation \(\Delta G^*(T)\) emphasizes that the driving force also depends on \(T\) via the supercooling \(\Delta T=T-T_m\).
  • Heterogeneous nucleation substantially smaller energetic barrier and therefore faster \(\dot N\).
  • When a solid phase \(\alpha\) nucleates inside another solid phase \(\beta\) (not a liquid), the nucleus is no longer a sphere (as it was for a liquid); instead one needs to consider the coherency of the interfaces. Incoherent interfaces have high surface energy \(V_{\gamma}\propto\gamma\), so nuclei tend to minimize their incoherent surface area and therefore (counterintuitively) grow in the direction of their incoherent interfaces.
  • \(\Gamma_{\text{Widmanstätten}}\) is a crystal structure found in certain \(\text{Fe}\)-\(\text{Ni}\) meteorites with sufficiently slow cooling rate \(\dot T\); it shows how \(\Lambda_{\text{bcc}}\) and \(\Lambda_{\text{fcc}}\) iron-rich phases can have a coherent interface.
  • Isothermal Transformation (TTT) Diagrams are for a fixed composition.
  • Displacive phase transitions (e.g. austenitic fcc steel undergoing a martensitic bct phase transition) are in contrast to reconstructive phase transitions.
  • The standard phase diagram for the \(\text{Fe}\)-\(\text{C}\) alloy system is actually only a quasi-equilibrium phase diagram. Cast irons have \(2\%<w_{\text{C}}<4\%\) whereas steels have \(0.1\%<w_{\text{C}}<1.5\%\), and the latter are dominated by a eutectoid phase transition to form pearlite = ferrite + cementite.
  • With steels however, there is a lot of metallurgical wisdom that has been gathered over the years on ways to manipulate the steel to get more properties out of it.
  • To harden a steel, the standard 3-step recipe is: anneal, quench, temper. First, anneal the steel up into the austenitic \(\gamma\) phase and wait until it equilibrates. Then quench it rapidly in water. This prevents the interstitial carbon \(\text C\) atoms from diffusing to form the lamellar eutectoid microstructure and instead results in them occupying octahedral interstices in a bct \(\text{Fe}\) matrix. This \(\Lambda_{\text{bct}}\) is thus considerably strained, impeding dislocation motion (so hard) but also brittle. To reduce brittleness while maintaining hardness, tempering is used (hold at some sub-eutectoid \(T\)) to introduce small cementite precipitates in ferrite matrix (very different from how it would look for eutectoid phase transition).
  • Al-Cu alloy system is important in aerospace engineering applications.
  • In general, strength \(\sigma_y\) increases with smaller grains \(d\) via the Hall-Petch equation \(\sigma_y=\sigma_0+k/\sqrt{d}\) (grains impede dislocation glide), hence the fuss about finer microstructure.
  • Al-Cu alloys undergo the same 3-step processing to harden them: anneal, quench, temper. In this third tempering step, the incoherent tetragonal structure of the \(\theta\) phase means that several intermediate metastable phases form first: GP zones, \(\theta”\), \(\theta’\) and finally \(\theta\), becoming more and more incoherent.
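As referenced above, a minimal sketch of the regular solution model; the interaction parameter \(\lambda_{AB}\) is an assumed illustrative value, chosen positive so that a miscibility gap opens at low \(T\):

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

def dG_mix(x, T, lam=12_000.0, n=1.0):
    """Regular solution Gibbs free energy of mixing (J) at mole fraction x
    of A; lam (J/mol) is an assumed, positive AB-interaction parameter."""
    xB = 1.0 - x
    return lam * x * xB + n * R * T * (x * np.log(x) + xB * np.log(xB))

x = np.linspace(1e-6, 1 - 1e-6, 2001)
for T in (300.0, 600.0, 900.0):
    g = dG_mix(x, T)
    # Two interior local minima signal segregation into two phases.
    n_minima = int(np.sum((g[1:-1] < g[:-2]) & (g[1:-1] < g[2:])))
    print(f"T = {T:5.0f} K: {n_minima} local minimum/minima of dG_mix(x)")
# Below the critical temperature T_c = lam/(2nR) ~ 722 K the enthalpic term
# wins (double well, miscibility gap); above it, entropy wins (single well).
```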

Mechanical Behavior of Materials

  • For uniaxial loading along some lattice direction in a crystal, one can experimentally measure a stress-strain curve or \(\sigma^{\text{ext}}\)-\(\varepsilon^{\text{ind}}\) curve. In the elastic deformation regime defined by small external stress \(\sigma^{\text{ext}}\), the strain increases linearly in accordance with Hooke’s law \(\varepsilon^{\text{ind}}=\frac{\sigma^{\text{ext}}}{E}\), where \(E\) is Young’s modulus. The atomic origin of this linearity can be attributed to the approximately quadratic nature of the Lennard-Jones potential energy near the equilibrium distance \(r_0\), and pursuing this line of reasoning fully, one even estimates \(E\sim\frac{1}{r_0}\frac{d^2V}{dr^2}(r_0)\), so that sharper potential wells and closer-packed lattice planes mean stiffer materials.
  • Beyond the yield stress \(\sigma^{\text{ext}}_y\), materials undergo irreversible plastic deformation (cf. so many of the other irreversible phenomena in this course, notably ferroelectric and ferromagnetic hysteresis). The fundamental insight is that the origin of such plastic deformation turns out to be dislocation glide on close-packed lattice planes and in close-packed lattice directions in crystals, providing a low-energy-cost way for entire lattice planes to effectively glide and thereby produce plastic deformation.
  • Poisson’s ratio is, roughly speaking, \(\nu:=-\frac{\varepsilon_{\rho}^{\text{ind}}}{\varepsilon_{z}^{\text{ext}}}\) in the limit of small external strains (really external stresses). Metals usually have \(\nu\approx 0.3\), and \(\nu=0.5\) is the condition for incompressibility (e.g. rubbers).
  • Ductility is measured by the failure strain \(\varepsilon_f\). Ductile materials have large \(\varepsilon_f\) whereas brittle materials have low \(\varepsilon_f\).
  • Analogous relation for shear stresses and their induced shear strains: \(\gamma^{\text{ind}}=\frac{\tau^{\text{ext}}}{G}\) where \(G\) is the shear modulus.
  • Note that \(E=2G(1+\nu)\) so the \(3\) are not independent of each other.
  • Total strain energy density is \(u=\frac{1}{2}E\varepsilon^2+\frac{1}{2}G\gamma^2\).
  • In many materials, Young’s modulus \(E\) (and I would imagine shear modulus \(G\)) is a tensor field due to anisotropy, for instance in fiber composites. Can use the Voigt or Reuss models to estimate the Young’s moduli parallel and perpendicular to the fibers based on volume fractions of fiber and matrix (although not super accurate).
  • Phenomenon of thermal expansion (and its linear nature over most temperature ranges) can be rationalized via the asymmetry of the LJ potential (think of as a ball rolling back and forth down the potential well). Leads to thermal stresses in bimetallic strips and any interface of two different metals joined together.
  • Euler-Bernoulli beam theory.
  • Important: Experimentally, plastically deformed materials are observed to develop parallel stripes on their surfaces when loaded axially by \(\sigma^{\text{ext}}\).

Closer examination showed that each stripe was like a little stair on a staircase (of a magnitude that will turn out to be \(|\textbf b|\)); lattice planes of that specific orientation had seemed to glide ever so slightly, and this was what caused the plastic deformation. But if one naively adopts a block-slip model for calculating the critical shear stress \(\tau^*\) needed to move one lattice plane over another, one obtains values that far exceed experimental observations. It turns out there is a loophole: a way to make it seem as if an entire lattice plane had slid across another one, but requiring far less stress (the Peierls-Nabarro stress \(\sim Ge^{-2\pi w/|\textbf b|}\)). Dislocations (the ruck-in-a-carpet analogy)! Most materials contain dislocations (and indeed, very pure materials with no dislocations can approach the block-slip \(\tau^*\)).

  • Dislocations \(:=\) 1D (line) defects (cf. vacancies as 0D defects, cracks as 2D defects?). Thus, dislocations are a subset of defects.
  • Edge dislocations have line vector \(\boldsymbol{\ell}\) along the bottom of the extra half-plane of atoms, and Burgers vector \(\textbf b\in\Lambda\) orthogonal to it (complete the Burgers circuit).
  • There are also screw dislocations, where \(\boldsymbol{\ell}\) is along the helical “screw axis” and the Burgers vector \(\textbf b\) is now parallel to \(\boldsymbol{\ell}\) (each circuit around \(\boldsymbol{\ell}\) advances by the magnitude \(|\textbf b|\) of the dislocation).
  • Most dislocations have both edge character and screw character (cf. hybrid atomic orbitals having \(s\) character and \(p\) character). However, all dislocations, regardless of their exact edge/screw character, give the same net effect of making lattice planes glide (ruck in the carpet again!) and glide on the glide plane \(\text{span}_{\textbf R}(\textbf b,\boldsymbol{\ell})\) provided there is sufficient shear stress (much less than the block-slip model, but still non-zero, as determined by projecting the external tensile stress \(\sigma^{\text{ext}}\) onto the lattice plane to obtain a resolved shear stress \(\tau^{\text{ind}}=\sigma^{\text{ext}}\cos(\phi)\cos(\lambda)\)). Note that \(\lambda\) and \(\phi\) are not in general complementary angles; rather, I believe \(90^{\circ}\leq\lambda+\phi\leq 180^{\circ}\).
  • Dislocation loops can either be vacancy loops or interstitial loops.
  • A dislocation can be thought of as having a free body diagram consisting of a fictitious “glide force” (not fictitious in the sense of non-inertial but just actually fictitious because dislocations are not actual objects) and some resistive drag force. The glide force \(\textbf f\) is a force per unit length (makes sense, dislocations are 1D defects) and is \(\textbf f=\tau^{\text{ind}}\textbf b\).
  • Shear strain energy (per unit line vector length) stored in a screw dislocation is \(V\approx \frac{1}{2}G|\textbf b|^2\). Edge dislocations have similar formula, and are more energetically costly than screw dislocations (per unit length).
  • For a given stress \(\boldsymbol{\sigma}^{\text{ext}}\), the close-packed slip system to activate first will have the largest Schmid factor \(\cos(\phi)\cos(\lambda)\) (i.e. closest to \(\cos^2(45^{\circ})=1/2\)); see the sketch at the end of this list. For crystals \(\Gamma\) admitting \(\Lambda_{\text{bcc}}\) or \(\Lambda_{\text{fcc}}\) 3D Bravais lattices, these are given conveniently by the OILS rule (why does it work?).
  • During loading, the slip direction rotates towards the tensile axis \(\lambda\to 0\).
  • From the frame of the sample, however, one can think of the tensile axis rotating towards the slip direction instead, and this is made explicit by adding multiples of the slip direction to the tensile axis until two of the components of the rotated tensile axis become equal (at which point duplex slip is initiated on the two lattice planes with equal Schmid factor), and the tensile axis starts rotating toward the sum of their slip directions. Since the number of lattice planes and the interplanar spacing are assumed conserved during plastic deformation, one has the conserved quantities \(L\cos(\lambda)=\text{constant}\) and, by complementarity, \(L\sin(\phi)=\text{constant}\) (just think intuitively about what these are referring to!).
  • Basically, to explain plastic deformation to a 5-year old, put your hands (lattice planes) together and show one “crawling” over the other (dislocation propagating like ruck in carpet).
  • For \(\Gamma_{\text{hcp}}\), there may be geometric softening in the plastic deformation regime.
  • For \(\Gamma_{\text{fcc}}\), there are \(3\) stages of plastic deformation: constant \(\sigma\) (easy glide), followed by work hardening, followed by cross slip (occurs earlier for higher stacking fault energies because partial dislocations are less separated and need to combine since they are not pure screw character). In polycrystals, the average Schmid factor is called the Taylor factor, and is \(\overline{\cos(\phi)\cos(\lambda)}=\frac{1}{3}\) but this does not yield an accurate prediction of yield stress \(\sigma_y\) due to grain boundary influence (Hall-Petch).
  • For \(\Gamma_{\text{polycrystalline}}\), duplex slip (and thus work hardening) initiates at different stresses in different grains of the polycrystal, so get continuous work hardening.
  • In a nutshell, work hardening is associated with duplex slip and is when dislocations react to become sessile, impeding other dislocations and therefore strengthening the material.
  • Dislocations are like stress dipoles, so they form dislocation arrays.
  • When dislocations meet, they either cut/intersect (i.e. make a jog of length \(J_1=|\textbf b_2|\) which may or may not be glissile) or combine. Combination occurs iff Frank’s rule \(\textbf b_1\cdot\textbf b_2<0\) is satisfied (i.e. it is energetically favorable, minimizing the total amount of “dislocation”). Intuitively, \(\textbf b=\textbf b_1+\textbf b_2\) and \(\boldsymbol{\ell}\) must lie on the intersection of the two slip planes (use the Weiss zone law or just a cross product). If the new \(\text{span}_{\textbf R}(\textbf b,\boldsymbol{\ell})\) is not a close-packed lattice plane, then one gets a sessile/forest Lomer lock. This then impedes other dislocations on the same plane, hence work hardening.
  • Dislocations are generated by Frank-Read sources (Discord symbol).
  • Edge dislocations can bypass obstacles by dislocation climb (sinking and sourcing vacancies). (Perfect) screw dislocations can bypass obstacles by cross slip (because \(\textbf b\times\boldsymbol{\ell}=\textbf 0\) for screw dislocations, can glide on any crystallographically equivalent lattice planes provided sufficient external stress).
  • Strengthen = increase \(\sigma_y\) (super duper important for engineering b/c want stuff to operate in the linear elastic regime; in some sense, all this stuff about plastic deformation we’ve been learning is just a whole lot of shit that an engineer would want to avoid by keeping everything nice and simple and elastic).
  • Dislocation density \(\rho\) is meters of dislocation line (again dislocations are 1D defects!) per meter cubed.
  • Grain boundaries obviously inhibit dislocation glide (get pile up). Smaller grains have shorter pile-up, so stronger (Hall-Petch).
  • Another way to strengthen a material is by alloying with solute/impurity atoms. This is solid solution strengthening. For substitutional solute atoms, their stress field symmetrically sucks in dislocations, but the strengthening is modest. For interstitial solute atoms, strain field can be asymmetric (e.g. \(\text{C}\) solute atoms in octahedral interstices of \(\alpha\)-\(\text{Fe}\) matrix of a low-carbon steel, asymmetry allows interaction with both edge and screw dislocations). Furthermore, interstitial diffusion much faster than substitutional diffusion. Thus, interstitial solid solution strengthening \(\gg\) substitutional solid solution strengthening.
  • For low-carbon steels specifically, the rapid interstitial diffusion of \(\text{C}\) atoms to dislocations forms Cottrell atmospheres. Low-carbon steels also show the phenomenon of Lüders bands: boundaries separating yielded regions (near the ends, where dislocations have escaped their Cottrell atmospheres) from unyielded regions (near the middle, where they haven’t yet), which propagate toward each other at a reduced yield stress.
  • At low \(T\) (e.g. room temperature), the Cottrell atmosphere effect with Lüders bands was described. At high \(T\), the carbon atoms are so mobile that they just move along with the dislocations, so there is no strengthening (get a vanilla stress-strain curve). At intermediate \(T\), get the Portevin-Le Chatelier effect (a trap-escape cycle of dislocations from Cottrell atmospheres, producing serrations in the stress-strain curve).
  • Precipitate strengthening means using a whole other phase (not just solute atoms like C in steel, but a whole fricking other phase). For small precipitates, dislocations have to cut through them, increasing strengthening by \(\Delta\sigma_y\propto\sqrt{r}\). For large precipitates, dislocations can get away by Orowan bowing around them \(\Delta\sigma_y\propto \frac{1}{r}\), leaving two dislocation loops (thus increasing dislocation debris). For a dislocation of Burgers vector \(\textbf b\) in a material of shear modulus \(G\), \(\tau_{\text{bow}}\propto\frac{G|\textbf b|}{L}\) so farther spaced precipitates require less stress to bow across.
  • When tempering martensite to nucleate \(\text{Fe}_3\text C\) precipitates in \(\alpha\), one has \(\Delta\sigma_y(t_{\text{aging}})\), first solid solution strengthening, then coherency strains, then cutting, and finally bowing (at which point the material would be considered overaged).
  • Dissociation (opposite of combining!) of (perfect) dislocation into partial dislocations in \(\Lambda_{\text{fcc}}\) is favorable due to Frank’s rule. They will normally have a repulsive interaction between each other, but also their separation is limited by the stacking fault energy of the stacking fault between them, so that higher stacking fault energy means earlier stage-III cross slip in their plastic deformation.
  • Order hardening is a functor from disordered solid solutions to ordered solid solutions below a certain temperature \(T\), where the ordered, low-\(T\) phase is much stronger (hence the name) due to formation of anti-phase boundaries and the associated energy cost of that.
  • So…I’ve been emphasizing how plastic deformation is mainly driven by dislocation glide…yes, that’s true, but a more minor way it can happen is via deformation twinning (simultaneous shearing of successive lattice planes with mirror symmetry about twin boundaries). For example, for \(\Lambda_{\text{fcc}}\), deformation twinning happens along \(\{111\}\) close-packed lattice planes by an amount \(\frac{a}{6}\langle 11\bar{2}\rangle\), from a B position to a C position sort of thing (the same displacement as in dissociation into partial dislocations).
  • There are also annealing twins.
  • Toughness goes hand in hand with ductility (both being the opposite of brittleness): a tough/ductile material absorbs substantial plastic work before fracturing.
  • Just as block slip model overestimates stress needed to plastically deform, so naive breaking-plane-of-bonds model overestimates stress needed to fracture. For former, the loophole was dislocations (1D defects). For the latter, it is cracks (2D defects).
  • Griffith criterion: an energy balance for when crack propagation is favorable, easiest to apply for brittle materials, where a crack propagates once the strain energy release rate satisfies \(G\geq G_C= 2\gamma\). For ductile materials, there is a zone of plasticity ahead of the crack tip, blunting it but therefore requiring extra work to be done (a larger strain energy release rate \(G\) is needed).
  • Plotting impact energy (a proxy for toughness) \(E_{\text{impact}}(T)\) as a function of temperature \(T\), see that for metals with \(\Lambda_{\text{bcc}}\), get a ductile-to-brittle transition temperature \(T_{d\to b}\) (think steels and the Titanic in those cold arctic waters where the steel was brittle).
  • Fiber-matrix composites may paradoxically have both the fibers and matrix individually being brittle yet the composite being tough (sum of parts is not whole!). This is due to strengthening effect from the fiber pull-out mechanism.
  • In a pressurized pipe, the hoop stress is twice the axial stress, so pipes burst longitudinally.
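As referenced above, a minimal sketch that enumerates the \(12\) fcc slip systems and picks out the one with the largest Schmid factor for a made-up tensile axis (a brute-force stand-in for the OILS rule):

```python
import numpy as np

def schmid_factor(tensile_axis, plane_normal, slip_dir):
    """m = cos(phi) * cos(lambda) for one slip system."""
    t = tensile_axis / np.linalg.norm(tensile_axis)
    n = plane_normal / np.linalg.norm(plane_normal)
    d = slip_dir / np.linalg.norm(slip_dir)
    return abs(t @ n) * abs(t @ d)

# The 12 fcc slip systems: {111} plane normals with <110> directions in them.
planes = [np.array(p) for p in [(1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]]
dirs = [np.array(d) for d in
        [(1, 1, 0), (1, -1, 0), (1, 0, 1), (1, 0, -1), (0, 1, 1), (0, 1, -1)]]
systems = [(n, d) for n in planes for d in dirs if n @ d == 0]
assert len(systems) == 12

axis = np.array([1.0, 2.0, 3.0])  # made-up tensile axis [123]
n_best, d_best = max(systems, key=lambda s: schmid_factor(axis, *s))
m = schmid_factor(axis, n_best, d_best)
print(f"first active system: plane {n_best}, direction {d_best}, m = {m:.3f}")
# Slip initiates once the resolved shear stress m * sigma_ext reaches its
# critical value; m is at most cos^2(45 deg) = 0.5.
```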

Recurrent Neural Networks

Problem: What does it mean for a collection of feature vectors \(\mathbf x_1,…,\mathbf x_{T}\) to represent a form of sequence data? Give some examples of sequence data.

Solution: It means that the feature vectors are not i.i.d.; indeed, they are in general not a discrete-time Markov chain as each \(\mathbf x_t\) for \(1\leq t\leq T\) can depend on the whole history of \(\mathbf x_{<t}\) that came before it. Examples of sequence data include speech, music, DNA sequences, natural language words, etc.

Problem: Consider the NLP problem of named entity recognition, which consists of assigning a binary label \(\hat y\in\{0,1\}\) to every word in an English sentence, where \(\hat y=1\) indicates that the word is part of someone’s name. In this case, what is a standard choice for the sequence feature vectors \(\mathbf x_t\)?

Solution: The idea is to employ a one-hot encoding of the words in the sentence with respect to some a priori dictionary of e.g. \(10000\) words in the English language or so. For instance, if “a” is the first word in such a dictionary, then the corresponding one-hot representation of the word “a” would be \(\mathbf x=(1,0,0,…)\in\mathbf R^{10000}\).

Problem: In the broadest sense, what is a recurrent neural network (RNN)? Explain how a simple RNN architecture may be used to analyze the sequence data in the application of named entity recognition described above.

Solution: In the broadest sense, an RNN may be thought of as any function:

\[(\mathbf a_t,\mathbf y_t)=\text{RNN}(\mathbf a_{t-1},\mathbf x_t|\boldsymbol{\theta})\]

that utilizes a so-called hidden state/activation vector \(\mathbf a_t\) (whose dimension is a hyperparameter of the RNN) to remember the history of \(\mathbf x_{<t}\) it has seen so far (typically initialized to \(\mathbf a_0=\mathbf 0\)). Such a function \(\text{RNN}\) may also depend parametrically on various learnable weights and biases \(\boldsymbol{\theta}\). Different RNN architectures are thus distinguished by the choice of function \(\text{RNN}\).

Intuitively, one can think of an RNN like a sponge that soaks in information, and the corresponding value of its hidden state \(\mathbf a_t\) as a measure of how soaked the sponge currently is. Then, at the next time step, the RNN will increase or decrease its water content based on how wet it currently is \(\mathbf a_{t-1}\) and what external stimulus \(\mathbf x_t\) it receives.

For the named entity recognition task, a simple RNN architecture may be used in which the memory update rule is:

\[\mathbf a_t=\tanh\left(W_{\mathbf a}\mathbf a_{t-1}+W_{\mathbf x}\mathbf x_t+\mathbf b_{\mathbf a}\right)\]

and the usual scalar binary classification:

\[\hat y_t=[\sigma(W_y\mathbf a_t+b_y)\geq 0.5]\]

Here, one might take \(\mathbf a_t\in\mathbf R^{512}\) for instance. In that case, the weight matrices and bias vectors \(W_{\mathbf a}\in\mathbf R^{512\times 512},W_{\mathbf x}\in\mathbf R^{512\times 10000},\mathbf b_{\mathbf a}\in\mathbf R^{512}\), and similarly \(W_y\in\mathbf R^{1\times 512},b_y\in\mathbf R\) are all to be learned by the RNN.
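A minimal numpy sketch of this simple RNN with randomly initialized (i.e. untrained) parameters, just to make the shapes and the recurrence concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 512, 10_000  # hidden size and one-hot vocabulary size (as above)

# Randomly initialized parameters (in practice these are learned).
Wa = rng.normal(scale=0.01, size=(H, H))
Wx = rng.normal(scale=0.01, size=(H, D))
ba = np.zeros(H)
Wy = rng.normal(scale=0.01, size=(1, H))
by = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(a_prev, x_t):
    """One step of the simple RNN: new hidden state and binary label."""
    a_t = np.tanh(Wa @ a_prev + Wx @ x_t + ba)
    y_hat = sigmoid(Wy @ a_t + by) >= 0.5
    return a_t, y_hat

# Run over a toy "sentence" of T = 5 one-hot word vectors.
a = np.zeros(H)
for t in range(5):
    x = np.zeros(D)
    x[rng.integers(D)] = 1.0  # stand-in for the t-th word's one-hot vector
    a, y_hat = rnn_step(a, x)
    print(f"t={t}: y_hat={int(y_hat[0])}")
```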

Problem: Give a taxonomy of RNN architectures, and classify the above example involving named entity recognition.

Solution: Essentially one has a \(2\times 2\) matrix of architectures according to whether the input and the output are each a single vector or a sequence: one-to-one (just a plain feedforward network), one-to-many (e.g. music generation), many-to-one (e.g. sentiment classification), and many-to-many (either with equal input/output lengths, e.g. named entity recognition, or with unequal lengths, e.g. machine translation).

The vanilla RNN architecture described above involving named entity recognition would be classified as a many-to-many RNN architecture.

Problem: Moving on from the example of named entity recognition, consider now a different task of implementing a (word-level) language model using an RNN; how can that be done?

Solution: Using a many-to-many RNN architecture that uses the same hidden state update rule as before:

\[\mathbf a_t=\tanh\left(W_{\mathbf a}\mathbf a_{t-1}+W_{\mathbf x}\mathbf x_t+\mathbf b_{\mathbf a}\right)\]

But it instead generates a probability distribution over the vocabulary for what the next word will be, conditioned on the previous words:

\[\mathbf y_t=\text{softmax}(W_{\mathbf y}\mathbf a_t+\mathbf b_{\mathbf y})\in\mathbf R^{10000}\]

Moreover, one not only initializes the hidden state \(\mathbf a_0=\mathbf 0\), but also the first input \(\mathbf x_1=\mathbf 0\), and for \(t\geq 2\) takes the previous output as the next input, \(\mathbf x_t=\mathbf y_{t-1}\). This same architecture can also be used to generate sequences of text, namely by sampling from the softmax probability distribution \(\mathbf y_t\) at each time step \(t=1,2,…,T\) and feeding the sampled word back in (see the sketch below).
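A minimal numpy sketch of this ancestral sampling loop, with a made-up toy vocabulary and untrained random weights (so the output is gibberish, but the plumbing is the point):

```python
import numpy as np

rng = np.random.default_rng(1)
H, V = 64, 50  # small hidden size and toy vocabulary size for illustration

Wa = rng.normal(scale=0.1, size=(H, H))
Wx = rng.normal(scale=0.1, size=(H, V))
ba = np.zeros(H)
Wy = rng.normal(scale=0.1, size=(V, H))
by = np.zeros(V)

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Ancestral sampling: feed each sampled word back in as the next input.
a, x = np.zeros(H), np.zeros(V)
tokens = []
for t in range(10):
    a = np.tanh(Wa @ a + Wx @ x + ba)
    y = softmax(Wy @ a + by)   # distribution over the next word
    idx = rng.choice(V, p=y)   # sample it
    tokens.append(int(idx))
    x = np.zeros(V)
    x[idx] = 1.0               # one-hot of the sampled word -> next input
print(tokens)  # gibberish until the weights are actually trained
```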

    Problem: Explain why the simple RNN architecture above suffers from both the vanishing gradient and exploding gradient problems.

Solution: This problem is in fact a general curse of deep neural networks (and RNNs tend to be “deep” because each time step \(t=1,2,…,T\) is analogous to \(1\) layer when the RNN architecture is unrolled). Specifically, when computing gradients of loss functions \(\partial L/\partial\boldsymbol{\theta}\), backpropagation (called backpropagation through time (BPTT) in the context of RNNs) requires chaining \(T\) Jacobian matrices together in a neural net \(T\) layers deep, so if these Jacobians have spectral radii which are systematically \(<1\) or \(>1\), then multiplying them together would lead respectively to a vanishing gradient \(\partial L/\partial\boldsymbol{\theta}\to\mathbf 0\) or exploding gradient \(\partial L/\partial\boldsymbol{\theta}\to\boldsymbol{\infty}\). All this is to say that the simple RNN architecture will tend to have trouble remembering sequence feature vectors \(\mathbf x_{\ll t}\) that happened a long time ago because again, for RNNs, time = layers.

    Problem: Explain how the gated recurrent unit (GRU) is a more sophisticated RNN architecture that overcomes the above problems.

    Solution: At each time step \(t=1,2,…,T\), a GRU is a map \(\mathbf a_t=\text{GRU}(\mathbf a_{t-1},\mathbf x_t)\) that computes the next hidden state via a sequence of \(4\) computations:

    1. Reset gate vector \(\mathbf r_t\):

    \[\mathbf{r}_t = \sigma(W_{\mathbf r\mathbf a} \mathbf{a}_{t-1}+W_{\mathbf r\mathbf x} \mathbf{x}_t + \mathbf{b}_{\mathbf r})\]

    2. Proposal of a candidate hidden state vector \(\tilde{\mathbf a}_t\):

    \[\tilde{\mathbf{a}}_t = \tanh(W_{\mathbf a\mathbf a}(\mathbf{r}_t \odot\mathbf{a}_{t-1}) + W_{\mathbf a\mathbf x} \mathbf{x}_t+\mathbf{b}_{\mathbf a})\]

3. Update gate vector \(\mathbf u_t\) (analogous in form to the reset gate vector, \(\mathbf r\leftrightarrow\mathbf u\)):

    \[\mathbf{u}_t = \sigma(W_{\mathbf u\mathbf a} \mathbf{a}_{t-1}+W_{\mathbf u\mathbf x} \mathbf{x}_t + \mathbf{b}_{\mathbf u})\]

    4. Final GRU output vector \(\mathbf a_t\) as convex linear combination:

\[\mathbf{a}_t = \mathbf{u}_t \odot \tilde{\mathbf{a}}_t+(\boldsymbol{1} - \mathbf{u}_t) \odot \mathbf{a}_{t-1}\]

    In particular, these \(4\) computations of the GRU should be compared with the single computation of a vanilla RNN unit discussed earlier:

    \[\mathbf a_t=\tanh\left(W_{\mathbf a}\mathbf a_{t-1}+W_{\mathbf x}\mathbf x_t+\mathbf b_{\mathbf a}\right)\]

By using the reset gate \(\mathbf r_t\) to keep only relevant information in the candidate state \(\tilde{\mathbf a}_t\), and the update gate \(\mathbf u_t\) to pass the old hidden state \(\mathbf a_{t-1}\) through nearly unchanged when appropriate, the GRU is able to retain a much longer-term memory of what has already come before it, thereby mitigating the vanishing gradient problem (see the sketch below).
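A minimal numpy sketch of one GRU step implementing the \(4\) computations above (toy sizes, untrained random weights):

```python
import numpy as np

rng = np.random.default_rng(2)
H, D = 8, 4  # toy hidden and input sizes

# One (W_a, W_x, b) triple per gate ("r", "u") and for the candidate ("a").
params = {name: (rng.normal(scale=0.5, size=(H, H)),
                 rng.normal(scale=0.5, size=(H, D)),
                 np.zeros(H))
          for name in ("r", "u", "a")}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(a_prev, x_t):
    Wra, Wrx, br = params["r"]
    Wua, Wux, bu = params["u"]
    Waa, Wax, bab = params["a"]
    r = sigmoid(Wra @ a_prev + Wrx @ x_t + br)               # 1. reset gate
    a_tilde = np.tanh(Waa @ (r * a_prev) + Wax @ x_t + bab)  # 2. candidate
    u = sigmoid(Wua @ a_prev + Wux @ x_t + bu)               # 3. update gate
    return u * a_tilde + (1.0 - u) * a_prev                  # 4. convex combo

a = np.zeros(H)
for t in range(3):
    a = gru_step(a, rng.normal(size=D))
print(a.round(3))
# When u ~ 0 the old state passes through untouched, which is exactly the
# gradient "highway" that mitigates vanishing gradients.
```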

    Problem: Show that the long short-term memory (LSTM) architecture is another RNN architecture which, like the GRU, also mitigates the vanishing gradient problem.

    Solution: Unlike a GRU which has \(4\) computations and only \(2\) gates, an LSTM is a bit more involved in that it uses \(6\) computations with \(3\) gates (historically, GRUs were invented as a simplification of LSTMs).

    1. Proposal of candidate cell state:

    \[\tilde{\mathbf{c}}_t = \tanh(W_{\mathbf c\mathbf a} \mathbf{a}_{t-1} + W_{\mathbf c\mathbf x} \mathbf{x}_t + \mathbf{b}_{\mathbf c})\]

    2. Forget gate:

      \[\mathbf{f}_t = \sigma(W_{\mathbf f\mathbf a} \mathbf{a}_{t-1} + W_{\mathbf f\mathbf x} \mathbf{x}_t + \mathbf{b}_{\mathbf f})\]

      3. Update gate:

      \[\mathbf{u}_t = \sigma(W_{\mathbf u\mathbf a} \mathbf{a}_{t-1} + W_{\mathbf u\mathbf x} \mathbf{x}_t + \mathbf{b}_{\mathbf u})\]

      4. Update cell state:

\[\mathbf{c}_t = \mathbf{u}_t \odot \tilde{\mathbf{c}}_t+\mathbf{f}_t \odot \mathbf{c}_{t-1}\]

      5. Output gate:

      \[\mathbf{o}_t = \sigma(W_{\mathbf o\mathbf a} \mathbf{a}_{t-1} + W_{\mathbf o\mathbf x} \mathbf{x}_t + \mathbf{b}_{\mathbf o})\]

      6. Update hidden state:

      \[\mathbf{a}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t)\]
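And a minimal numpy sketch of one LSTM step implementing the \(6\) computations above (again toy sizes and untrained random weights):

```python
import numpy as np

rng = np.random.default_rng(3)
H, D = 8, 4

def dense():
    """A (W_a, W_x, b) parameter triple with a toy random init."""
    return (rng.normal(scale=0.5, size=(H, H)),
            rng.normal(scale=0.5, size=(H, D)),
            np.zeros(H))

Wc, Wf, Wu, Wo = dense(), dense(), dense(), dense()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t):
    gate = lambda W: W[0] @ a_prev + W[1] @ x_t + W[2]
    c_tilde = np.tanh(gate(Wc))    # 1. candidate cell state
    f = sigmoid(gate(Wf))          # 2. forget gate
    u = sigmoid(gate(Wu))          # 3. update gate
    c = u * c_tilde + f * c_prev   # 4. additive cell-state update
    o = sigmoid(gate(Wo))          # 5. output gate
    a = o * np.tanh(c)             # 6. hidden state
    return a, c

a, c = np.zeros(H), np.zeros(H)
for t in range(3):
    a, c = lstm_step(a, c, rng.normal(size=D))
print(a.round(3))
# The additive update in step 4 lets gradients flow through c_t largely
# unattenuated when f ~ 1, which is the LSTM's cure for vanishing gradients.
```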

      Problem: Whatever happened to the output vector \(\hat{\mathbf y}_t\) in the above discussion of the GRU and LSTM recurrent neural network architectures?

      Solution: It’s always there, but the exact formula for it depends on the application of interest in a standard way. Usually, one first computes a linear combination \(W_{\mathbf y\mathbf a}\mathbf a_t+\mathbf b_{\mathbf y}\) of the current hidden state, and then applies some activation function to it that depends in the usual way on the application at hand (e.g. sigmoid for binary classification, softmax for multiclass classification).

      Problem: Explain how a bidirectional recurrent neural network (BRNN) is an augmentation to the usual RNN architecture.

      Solution: Instead of just doing a forward pass in time \(t=1,2,…,T\) through the sequence of feature vectors \(\mathbf x_1,…,\mathbf x_T\), one can also imagine doing a backward pass in time \(t=T,T-1,…,1\) through the same sequence. That is, a BRNN is just \(2\) independent RNNs with hidden states \(\mathbf a_t\) and \(\mathbf a’_t\) respectively, updating according to some specific architectures but in opposite directions through time \(t\):

      \[\mathbf a_t=\text{RNN}(\mathbf a_{t-1},\mathbf x_t)\]

      \[\mathbf a’_t=\text{RNN}(\mathbf a’_{t+1},\mathbf x_t)\]

      such that the output at each time is given by:

      \[\hat{\mathbf y}_t\sim\text{nonlinear}(W_{\mathbf y\mathbf a}\mathbf a_t+W’_{\mathbf y\mathbf a}\mathbf a’_t+b_{\mathbf y})\]

      Of course, the catch with using a BRNN is that one must have access to the entire time series \(\mathbf x_1,…,\mathbf x_T\) up to the total duration \(T\) in order to be able to use it.


      Advanced Python

      Problem: Where and why should one create an __init__.py file?

Solution: Inside a folder/directory that’s meant to be a Python package, i.e. one containing a bunch of Python modules with useful functions, etc. that other Python scripts will be importing from (not strictly necessary as of Python \(3.3+\), which introduced namespace packages, but still conventional to include). An __init__.py serves \(3\) main purposes:

1. It executes the first time any module from that package is imported, thus providing a convenient place to run setup code.
2. It can expose functions/classes from internal submodules directly at the package level, simplifying the user-facing API.
3. It defines what gets imported when using a wildcard import * (via the __all__ variable).
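A hypothetical sketch illustrating all \(3\) purposes (the package and module names are made up):

```python
# Hypothetical package layout:
#
#   mypkg/
#       __init__.py
#       geometry.py      (defines area())
#       physics.py       (defines drag_force())
#
# mypkg/__init__.py might then contain:

print("Running mypkg setup code...")   # 1. runs on first import of the package

from .geometry import area             # 2. promote submodule names to the
from .physics import drag_force        #    package level: mypkg.area(...)

__all__ = ["area", "drag_force"]       # 3. controls `from mypkg import *`
```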

      Problem: (based on this YouTube video) Write some basic Python code to demonstrate how the object-oriented programming (OOP) paradigm works. In particular, show how to create a class, how to initialize attributes of object instances of the class, how to define methods associated to object instances of the class, how child classes can inherit properties of parent classes, and how classes themselves (not just their object instances) can also have class attributes and class methods or static methods.

      Solution:

      OOP_Fundamentals
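Since the linked notebook isn’t reproduced here, the following is a self-contained toy sketch covering the same checklist (class creation, instance attributes and methods, inheritance, class attributes, class methods, and static methods); all names are made up for illustration:

```python
class Employee:
    """Toy class standing in for the linked OOP_Fundamentals examples."""

    headcount = 0  # class attribute, shared by every instance

    def __init__(self, name: str, salary: float):
        self.name = name        # instance attributes
        self.salary = salary
        Employee.headcount += 1

    def give_raise(self, amount: float) -> None:
        """Instance method: operates on one object via self."""
        self.salary += amount

    @classmethod
    def from_csv_row(cls, row: str) -> "Employee":
        """Class method: an alternative constructor."""
        name, salary = row.split(",")
        return cls(name, float(salary))

    @staticmethod
    def is_payday(day: str) -> bool:
        """Static method: needs neither self nor cls."""
        return day == "Friday"


class Manager(Employee):
    """Child class inheriting all of Employee's behavior."""

    def __init__(self, name: str, salary: float, reports: list):
        super().__init__(name, salary)   # reuse the parent initializer
        self.reports = reports


alice = Employee.from_csv_row("Alice,50000")
bob = Manager("Bob", 70000, reports=[alice])
bob.give_raise(5000)                      # inherited method
print(bob.salary, Employee.headcount)     # 75000 2
print(Employee.is_payday("Friday"))       # True
```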

      Problem: Explain the purpose of magic methods in OOP Python, and write some code to demonstrate their applications.

Solution: Basically, if you want to emulate the behavior of a lot of Python’s built-in classes, like being able to concatenate strings using + or getting the length of a list using len(), but with your own classes rather than Python’s built-in ones, then magic methods are the way to go. Another way to put it is that magic methods give you access to “operator overloading”: e.g. + can mean different things for adding two integers vs. concatenating two strings, because in all cases + is just syntactic sugar for an underlying __add__ magic method that’s defined separately for the int class and the str class.

      magic_methods
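In lieu of the linked notebook, a minimal made-up sketch:

```python
class Vector2:
    """Minimal sketch of operator overloading via magic methods."""

    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

    def __add__(self, other: "Vector2") -> "Vector2":
        # This is what the `+` syntactic sugar dispatches to.
        return Vector2(self.x + other.x, self.y + other.y)

    def __len__(self) -> int:
        # Called by the built-in len()
        return 2

    def __eq__(self, other) -> bool:
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self) -> str:
        # Called by repr() and the interactive interpreter
        return f"Vector2({self.x}, {self.y})"


u, v = Vector2(1, 2), Vector2(3, 4)
print(u + v)                    # Vector2(4, 6)
print(len(u))                   # 2
print(u + v == Vector2(4, 6))   # True
```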

For more magic methods, you can start typing the double underscore and see what the VS Code IDE suggests.

      Problem: Write Python code to demonstrate some applications of decorators, generators and context managers.

Solution: A decorator (@) is basically a wrapper function \(w\) that takes in some function \(f\) and maps it to a “wrapped” version \(w(f)\) of \(f\) with greater functionality, but without cluttering the logic of \(f\) itself. A generator (yield) is a function that suspends and resumes its execution, lazily producing a sequence of values one at a time (a bit like a discrete-time Markov chain in that it only needs its current state to produce its next value). A context manager (with) guarantees that cleanup/teardown code runs even if errors occur during execution of the enclosed block.

      Decorators_Generators_Context_Managers
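In lieu of the linked notebook, a minimal made-up sketch of all three:

```python
import time
from contextlib import contextmanager

# Decorator: wraps f with timing logic without cluttering f itself.
def timed(f):
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = f(*args, **kwargs)
        print(f"{f.__name__} took {time.perf_counter() - t0:.6f} s")
        return result
    return wrapper

@timed
def slow_sum(n: int) -> int:
    return sum(range(n))

# Generator: lazily yields values one at a time, remembering its state
# between calls (the next value depends only on the current state).
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Context manager: the cleanup code runs even if the body raises.
@contextmanager
def opened_resource(name: str):
    print(f"acquire {name}")
    try:
        yield name
    finally:
        print(f"release {name}")

slow_sum(1_000_000)
fib = fibonacci()
print([next(fib) for _ in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]
with opened_resource("sensor"):
    print("using sensor")
```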

      Chaos & Nonlinear Dynamics

The purpose of this post is to compile solutions to select exercises from Steven Strogatz’s textbook Nonlinear Dynamics and Chaos.

      Chapter #\(1\):

      Chapter #\(2\):

      Chapter #\(3\):

      Chapter #\(4\):

      Chapter #\(5\):

      Chapter #\(6\):

      Chapter #\(7\):

      Chapter #\(8\):
