Last week, Orbital Materials released v3 of their state-of-the-art (SOTA) machine-learned interatomic potential (MLIP) models, orb-v3. Based on the Pareto-front performance plot of selected models in Figure 1, it seems to be the best-performing MLIP model to date [1]. The one thing that stood out to me was the introduction of a new loss-regularization technique they call equigrad. I think this is a really clever and efficient way to learn rotational equivariance without having to use any SO(3)-equivariant layers, which are computationally intensive and architecturally complex.¹
Figure 1. Performance of orb-v3 models vs. others (fig. 1 from ref. [1])
Equigrad
The orb-v3 class of models introduces a loss-based approach to rotational symmetry, which enforces the physical constraint that the total potential energy of a system should remain invariant under a global rigid-body rotation. Rather than embedding rotational equivariance of the node states in the network architecture itself, as is done in MACE or SevenNet models, equigrad introduces a differentiable penalty on the model's energy prediction when the atomic positions and the cell matrix are infinitesimally rotated.
The authors define a global rotation as the matrix exponential of a skew-symmetric matrix:
$ \begin{equation} \mathbf{R} = e^{\mathbf{G} - \mathbf{G}^T} \label{eq:rot_matrix} \end{equation} $
where $\mathbf{G} \in \mathbb{R}^{3 \times 3}$. The rotational gradient of the predicted energy, evaluated at the identity rotation ($\mathbf{G}=\mathbf{0}$), is then given as:
$ \begin{equation} \Delta_{\text{rot}} = \left. \frac{\partial E(\mathbf{r}^T \mathbf{R}, \mathbf{hR})}{\partial \mathbf{G}} \right|_{\mathbf{G}=\mathbf{0}} \label{eq:rot_grad} \end{equation} $
where $\mathbf{r}$ is the matrix of atomic positions and $\mathbf{h}$ the cell matrix. What is nice is that this measures the sensitivity of the energy to infinitesimally small global rotations: if the energy is correctly invariant to rotation, then $\Delta_{\text{rot}}$ is zero, and any deviation from zero penalizes the model via:
$ \begin{equation} \mathcal{L}_{\text{equigrad}} = \lambda \left\| \Delta_{\text{rot}} \right\|_2^2 \end{equation} $
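To make the mechanics concrete, here is a minimal PyTorch sketch of how I imagine computing this penalty with autograd. The names (`energy_fn`, `positions`, `cell`, `lam`) are my own placeholders, not the authors' code, and atomic positions are stored as rows of an $N \times 3$ array:

```python
import torch

def equigrad_loss(energy_fn, positions, cell, lam=1.0):
    """Hedged sketch of the equigrad penalty (eqs. 1-3), not the authors' code.

    energy_fn: callable mapping (positions [N, 3], cell [3, 3]) -> scalar energy,
    with atomic positions stored as rows.
    """
    G = torch.zeros(3, 3, requires_grad=True)   # evaluate at the identity rotation
    R = torch.linalg.matrix_exp(G - G.T)        # R = exp(G - G^T), equals I at G = 0
    E = energy_fn(positions @ R, cell @ R)      # energy of the rotated system
    # dE/dG at G = 0; create_graph keeps the penalty differentiable for training
    (delta_rot,) = torch.autograd.grad(E, G, create_graph=True)
    return lam * delta_rot.pow(2).sum()         # lambda * ||Delta_rot||_2^2
```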
Computational Efficiency
I think the reason this remains efficient is that if we expand $\mathbf{R}$ in a Taylor series:
$ \begin{equation} \mathbf{R} = \mathbf{I} + (\mathbf{G} - \mathbf{G}^T) + \frac{1}{2!}(\mathbf{G} - \mathbf{G}^T)^2 + \cdots. \end{equation} $
and then evaluate the derivative at $\mathbf{G} = \mathbf{0}$, all of the higher-order terms vanish, since each contains at least two powers of $(\mathbf{G} - \mathbf{G}^T)$ and differentiating once still leaves a factor that is zero at $\mathbf{G} = \mathbf{0}$. Therefore one gets:
$ \begin{equation} \left. \frac{d\mathbf{R}}{d\mathbf{G}} \right|_{\mathbf{G}=\mathbf{0}} = \mathbf{G} - \mathbf{G}^T \end{equation} $
and $\Delta_{\text{rot}}$ reduces to a first-order variation. This linearization of $\mathbf{R}$ avoids evaluating the full matrix exponential in eq. $\eqref{eq:rot_matrix}$, which saves compute time. Furthermore, for conservative models, where forces and stress are computed via autograd, the evaluation of $\Delta_{\text{rot}}$ reuses the same gradient computations, so the loss term adds negligible overhead [1]. At least, this is what makes sense to me after reading the preprint.
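Working this out for myself: substituting $\mathbf{R} \approx \mathbf{I} + (\mathbf{G} - \mathbf{G}^T)$ into $E(\mathbf{r}\mathbf{R}, \mathbf{h}\mathbf{R})$ and applying the chain rule at $\mathbf{G} = \mathbf{0}$ gives $\Delta_{\text{rot}} = \mathbf{M} - \mathbf{M}^T$ with $\mathbf{M} = \mathbf{r}^T (\partial E/\partial \mathbf{r}) + \mathbf{h}^T (\partial E/\partial \mathbf{h})$, i.e., it is assembled from exactly the gradients a conservative model already computes for forces and stress. A sketch of that reading (placeholder names again):

```python
import torch

def delta_rot_from_gradients(energy_fn, positions, cell):
    """Sketch: Delta_rot assembled from the gradients already needed for
    forces (dE/dr) and stress (dE/dh), per my reading of the preprint."""
    positions = positions.detach().requires_grad_(True)
    cell = cell.detach().requires_grad_(True)
    E = energy_fn(positions, cell)
    dE_dr, dE_dh = torch.autograd.grad(E, (positions, cell), create_graph=True)
    M = positions.T @ dE_dr + cell.T @ dE_dh    # [3, 3]
    return M - M.T                              # antisymmetric rotational gradient
```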
Why This Works
This approach works because the force on atom $i$ is defined as the negative gradient of the scalar potential energy with respect to its position:
$ \begin{equation} \mathbf{F}_i = -\nabla_{\mathbf{r}_i} E \end{equation} $
and for a conservative force field, where $E$ is invariant under global rotation, the forces transform equivariantly under rotation:
$ \begin{equation} \mathbf{r}'_i = \mathbf{R} \mathbf{r}_i \quad \Rightarrow \quad \mathbf{F}'_i = \mathbf{R} \mathbf{F}_i \end{equation} $
The result is that enforcing rotational invariance of the energy via equigrad implicitly teaches the model rotational equivariance of the forces. The paper shows that equigrad significantly improves rotational invariance (by ~5x according to their tests; see Fig. 3 in [1]) and also improves energy RMSD (see Figure 2). The key is that this is achieved without including any SO(3)-equivariant layers in the architecture and without having to evaluate the full matrix exponential in eq. $\eqref{eq:rot_matrix}$, making it very computationally efficient.
Figure 2. Energy RMSD with and without equigrad regularization (fig. 5 from ref. [1])
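The equivariance claim above is easy to check numerically with a toy energy that is rotation-invariant by construction (it depends only on squared pair distances; this is my own toy function, not the orb-v3 model). In the row-vector convention $\mathbf{r}' = \mathbf{r}\mathbf{R}$, the equivariance condition reads $\mathbf{F}' = \mathbf{F}\mathbf{R}$:

```python
import torch

def toy_energy(positions):
    # Rotation-invariant by construction: depends only on squared pair distances.
    diff = positions.unsqueeze(0) - positions.unsqueeze(1)   # [N, N, 3]
    sq = (diff ** 2).sum(-1).triu(1)                         # unique pairs only
    return sq.sum() + 0.1 * (sq ** 2).sum()

torch.manual_seed(0)
r = torch.randn(8, 3, requires_grad=True)
F = -torch.autograd.grad(toy_energy(r), r)[0]                # F = -dE/dr

G = torch.randn(3, 3)
R = torch.linalg.matrix_exp(G - G.T)                         # random rotation, eq. (1)

r_rot = (r.detach() @ R).requires_grad_(True)
F_rot = -torch.autograd.grad(toy_energy(r_rot), r_rot)[0]

print(torch.allclose(F_rot, F @ R, atol=1e-5))               # True: forces co-rotate
```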
Summarizing things
The equigrad loss regularization introduced by B. Rhodes et al. [1] is clean and principled (I think), and seems to be a computationally efficient alternative to including equivariant layers in MLIP models (a toy training sketch follows the list). In short, it:
- Enforces energy invariance and learns force equivariance via the loss function.
- Avoids equivariant-layer machinery such as tensor-field operations and spherical harmonics.
- Adds negligible overhead, particularly for conservative models, due to linearization of $\mathbf{R}$ at $\mathbf{G} = \mathbf{0}$.
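To close the loop, here is a toy training step showing where the penalty slots in, reusing the `delta_rot_from_gradients` sketch from above. The model, data, and loss weights are all illustrative, not the orb-v3 setup:

```python
import torch

class ToyModel(torch.nn.Module):
    """Stand-in for an MLIP: a rotation-invariant toy energy, for illustration."""
    def __init__(self):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.tensor(0.5))

    def forward(self, positions, cell):
        diff = positions.unsqueeze(0) - positions.unsqueeze(1)
        # Both terms depend only on rotational invariants, so Delta_rot ~ 0 here.
        return self.scale * ((diff ** 2).sum() + 0.1 * (cell * cell).sum())

model = ToyModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
positions, cell = torch.randn(8, 3), torch.eye(3)
E_ref = torch.tensor(5.0)                                # fake reference energy

E = model(positions, cell)
delta_rot = delta_rot_from_gradients(model, positions, cell)
loss = (E - E_ref) ** 2 + 0.1 * delta_rot.pow(2).sum()   # energy loss + equigrad
loss.backward()
opt.step()
```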
Footnotes
1. Adding SO(3)-equivariant layers to a network architecture can be computationally expensive and architecturally complex compared to standard message-passing GNN layers. For example, SO(3)-equivariant networks must compute spherical harmonics for each interaction, perform tensor-product operations, contract these representations through Clebsch–Gordan decompositions, and maintain irreducible-representation (irrep) channels at multiple orders. ↩
References
[1] B. Rhodes, S. Vandenhaute, V. Šimkus, J. Gin, J. Godwin, T. Duignan, M. Neumann, Orb-v3: atomistic simulation at scale (2025).