
Sunday, June 22, 2025

DFT is getting its share of AI

I made a post on LinkedIn the other day sharing a recent preprint[1] by Microsoft AI that introduces a significant advancement in exchange-correlation (XC) functionals for DFT. This seemed like an important piece of news to me, given that the recent explosion of foundational interatomic potentials for atomistics is largely trained on massive DFT datasets.

The biggest achievement seems to be the ability of Microsoft AI's DFT XC functional, called Skala, to achieve near chemical accuracy (i.e., 1 kcal/mol) for molecular systems at reasonable computational cost. What this means is that we may get a new rung added to chemistry's Jacob's ladder of accuracy for DFT. I wanted to illustrate this, so I made a diagram of the chemistry Jacob's ladder with the new Skala XC added, shown in Figure 1. I got the idea to annotate it from Keiran Rowell's blog[2].

The idea of Jacob's ladder in chemistry was, I believe, first introduced by J. Perdew in the early 2000s and revisited in a 2013 perspective [3]. The primary idea is that, as in the biblical story where Jacob dreams of a ladder reaching from earth to heaven with God standing above it promising divine blessing and protection, we scientists dream of reaching chemical accuracy by climbing our ladder of DFT approximations. Why is this important? Chemical accuracy with DFT would enable predictions accurate enough, in reasonable computational time, to improve the discovery, design, and understanding of chemicals and materials. In particular, understanding reaction mechanisms and kinetics would become much more tractable if chemical accuracy is reached.

Figure 1. Computational chemistry Jacob's ladder for DFT with the Microsoft AI Skala XC added.

DFT XC Overview

First, let me briefly summarize some basics of density functional theory (DFT) to explain why the exchange-correlation (XC) functional is so important. The Nobel-worthy Hohenberg–Kohn theorems and the formulation of the Kohn–Sham equations established that the total ground-state (i.e., 0 K) energy of a many-body electronic system can be written as:

$ \begin{equation} E[\rho] = T_s[\rho] + V_{ext}[\rho] + J[\rho] + E_{xc}[\rho] \label{eq:dft_energy} \end{equation} $

The first three terms are the non-interacting kinetic energy, the external potential energy, and the classical Coulomb (Hartree) repulsion energy. The last term is the XC energy, which is the only term we don't know exactly. It is crucial because it encapsulates the many-body quantum mechanical effects of the electrons, whereas the others are effectively single-particle terms.

Throughout the successful history of DFT, different approximations have been used for the XC term. The first was the local density approximation (LDA), built on the idea that the XC energy density at a point depends only on the electron density at that point. For all its simplicity, LDA worked reasonably well for some systems, but its shortcomings were quickly discovered.
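To make the "local density only" idea concrete, here is a minimal numerical sketch (my own toy example, not from the paper): the LDA exchange energy is just an integral of $\rho^{4/3}$ with a known prefactor, evaluated here on a radial grid for the exact hydrogen 1s density.

```python
import numpy as np

# LDA exchange depends only on the local density rho(r):
#   E_x^LDA = -(3/4) * (3/pi)^(1/3) * \int rho(r)^(4/3) d^3r   (atomic units)
C_X = -(3.0 / 4.0) * (3.0 / np.pi) ** (1.0 / 3.0)

def lda_exchange_energy(rho, r):
    """Trapezoid-integrate the LDA exchange energy density for a
    spherically symmetric density (d^3r -> 4*pi*r^2 dr)."""
    integrand = C_X * rho ** (4.0 / 3.0) * 4.0 * np.pi * r ** 2
    return float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(r)) / 2.0)

# Exact hydrogen 1s density, rho(r) = exp(-2r) / pi
r = np.linspace(1e-6, 25.0, 20000)
rho = np.exp(-2.0 * r) / np.pi
print(lda_exchange_energy(rho, r))  # ~ -0.213 Ha
```

Even with the exact density as input, the spin-unpolarized LDA value (~ -0.213 Ha) misses the exact exchange energy of hydrogen (-0.3125 Ha), which is one concrete way the shortcomings showed up.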

Then the generalized gradient approximation (GGA) was introduced, where the XC energy depends not only on the local density but also on the gradient of the density. This introduced the idea of semi-local XC functionals (i.e., the energy now depends on how the density changes around a position, rather than just its value at that position).
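A common way to express the GGA idea is an enhancement factor multiplying the LDA exchange energy density. As a sketch, here is the well-known PBE exchange enhancement factor, a function of the dimensionless reduced gradient $s = |\nabla\rho| / (2(3\pi^2)^{1/3}\rho^{4/3})$:

```python
import numpy as np

# PBE exchange enhancement factor F_x(s): equals 1 at s = 0 (recovering
# LDA for a uniform density) and saturates at 1 + kappa for large s.
KAPPA, MU = 0.804, 0.2195149727645171  # standard PBE constants

def pbe_fx(s):
    """Enhancement of the LDA exchange energy density as a function of
    the reduced density gradient s."""
    return 1.0 + KAPPA - KAPPA / (1.0 + MU * np.asarray(s) ** 2 / KAPPA)

print(pbe_fx(0.0))  # 1.0 -> uniform-density (LDA) limit
print(pbe_fx(2.0))  # > 1: exchange enhanced where the density varies rapidly
```

The saturation at $1 + \kappa$ is what enforces the Lieb-Oxford-style bound in PBE; this is the kind of hand-built constraint that learned functionals like Skala aim to replace with data.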

After GGA, what followed was a "flavor-of-the-month" approach toward improving XC functionals. There were hybrid functionals that mixed DFT exchange with Hartree-Fock exact exchange. Then meta-GGAs were introduced, adding higher-order information about the density, such as the kinetic energy density. Add to that expensive double-hybrid functionals that fold in perturbation-theory quantum chemistry methods. All have their improvements and shortfalls. You can see the major players in the rungs of the ladder in Figure 1, which shows how the range of accuracy improves as you climb up.

These are just the formalisms; there are also ad-hoc tweaks and specializations that get added based on your domain use and expertise. The limitations of such approaches are:

  1. Handcrafted Features: Most functionals rely on fixed analytic forms and known constraints.
  2. Non-locality Deficiency: Electron correlation is inherently non-local; local/semi-local functionals cannot fully capture this.
  3. Empirical Tuning vs. Physical Justification: Empirical functionals may generalize poorly or violate constraints.
  4. Slow Progress at Higher Rungs: Despite decades of work, hybrid/double-hybrid methods still fall short of universal chemical accuracy (error < 1 kcal/mol).

Skala Neural XC

The efforts by Microsoft AI represent a significant leap in XC functional design. Unlike traditional functionals that have relied on hand-crafted mathematical forms, Skala incorporates deep learning (i.e., Neural Networks) to achieve near-chemical accuracy while maintaining computational efficiency comparable to meta-GGA functionals[1]. Skala is particularly impressive because it navigates the trade-off between accuracy and computational cost. The Microsoft AI team has designed it to bridge the gap between semi-local functionals (fast but less accurate) and hybrid/double-hybrid functionals (accurate but computationally expensive).

Architecture and Design Philosophy

As mentioned, Skala XC is a neural network architecture trained to learn the non-local electron density interactions without requiring the full computational burden of exact exchange calculations. Like meta-GGA, it starts with seven semi-local density-derived features, but the design then employs what the authors call a "coarse-fine grid structure" that captures long-range density correlations consistent with multipole-like behavior.

For the exchange, the authors start from the LDA form but incorporate a learned neural enhancement factor $f_\theta$:

$ \begin{equation} E_{xc}^\theta[\rho] = -\frac{3}{4}\left(\frac{6}{\pi}\right)^{1/3} \int \left(\rho_\uparrow(r)^{4/3} + \rho_\downarrow(r)^{4/3}\right) f_\theta[x[\rho]](r) \, dr \label{eq:skala_xc} \end{equation} $

I don't know too much about XC design, but this does seem clever: $f_\theta$ operates on a feature vector $x[\rho]$ that encodes both local and non-local density information while maintaining computational efficiency (I would like to understand this better, but it is above my head for now). This is reminiscent of delta-learning in atomistic ML, though Skala performs a direct functional approximation rather than a residual correction atop an existing functional.
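To get a feel for the structure of the equation above, here is a toy stand-in (emphatically NOT the actual Skala architecture, whose features and network are far richer): a tiny random MLP plays the role of $f_\theta$, acting on a made-up per-point feature vector of just $(\rho, |\nabla\rho|)$, and modulating the spin-resolved LDA-form exchange integrand.

```python
import numpy as np

# Toy MLP standing in for the learned enhancement factor f_theta.
rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(1, 8)), np.zeros(1)

def f_theta(x):
    """Per-grid-point enhancement factor; softplus keeps it positive,
    and the shift puts it near 1 for this untrained toy network."""
    h = np.tanh(W1 @ x + b1)
    return np.logaddexp(0.0, W2 @ h + b2)[0] + 0.3  # softplus(0)+0.3 ~ 1

def skala_like_exchange(rho_up, rho_dn, feats, r):
    """LDA-form spin-resolved exchange modulated by f_theta, trapezoid-
    integrated on a radial grid (spherical symmetry, atomic units)."""
    pref = -(3.0 / 4.0) * (6.0 / np.pi) ** (1.0 / 3.0)
    dens = rho_up ** (4.0 / 3.0) + rho_dn ** (4.0 / 3.0)
    fvals = np.array([f_theta(x) for x in feats])
    integrand = pref * dens * fvals * 4.0 * np.pi * r ** 2
    return float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(r)) / 2.0)

# Smoke test: spin-polarized hydrogen 1s density, all electrons spin-up
r = np.linspace(1e-6, 20.0, 2000)
rho_up = np.exp(-2.0 * r) / np.pi
feats = np.stack([rho_up, 2.0 * rho_up], axis=1)  # |grad rho| = 2*rho for 1s
print(skala_like_exchange(rho_up, np.zeros_like(r), feats, r))  # negative, ~LSDA scale
```

With $f_\theta \approx 1$ everywhere, this reduces to plain spin-polarized LDA exchange; training $f_\theta$ on features that see beyond the local point is where the non-locality comes in.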

What data was used?

The Microsoft AI team used a quantum chemistry dataset that, it seems, only they could have curated: something like 150,000 data points spanning thermochemistry, conformational energies, noncovalent interactions, and ionization potentials. The reference values were generated with gold-standard wavefunction methods like CCSD(T) 🤯. Their MSR-ACC/TAE dataset alone includes about 80,000 total atomization energies with errors below 1 kcal/mol, which is staggering when you consider the computational expense of generating such reference data. The datasets alone might be useful to others.

Performance Metrics

I'm not too familiar with benchmarks in this space, but on the challenging W4-17 benchmark Skala achieves a mean absolute error (MAE) of around 1.0 kcal/mol, outperforming more established functionals. On the GMTKN55 benchmark, it scores a WTMAD-2 of 3.89 kcal/mol, which puts it in competition with the best hybrid functionals while requiring significantly fewer computational resources.
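For anyone as new to these benchmarks as I am, the headline MAE number is nothing exotic, just the mean absolute deviation from high-level reference energies in kcal/mol (the numbers below are made up for illustration, not actual benchmark data):

```python
import numpy as np

def mean_absolute_error(predicted, reference):
    """MAE in the same units as the inputs (kcal/mol for W4-17 TAEs)."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(reference))))

ref  = [-57.1, -225.3, -132.8, -76.4]  # hypothetical reference atomization energies
pred = [-56.2, -226.1, -133.9, -75.9]  # hypothetical functional predictions
print(mean_absolute_error(pred, ref))  # 0.825 kcal/mol
```

WTMAD-2 on GMTKN55 is the same spirit but a weighted average of MAEs over the 55 subsets, so the two numbers are not directly comparable.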

Another thing that is super interesting: for large atomic numbers that are out of distribution for the trained Skala XC, the authors show that it maintains very good accuracy. This is usually a failure point for data-driven ML models, where out-of-distribution data is not handled particularly well [4]. The authors also mention that the self-consistent field (SCF) convergence is stable, which is crucial for practical application in DFT, since, unlike post-SCF models, Skala is trained and used self-consistently.

Redefining Jacob's Ladder?

So going back to the Jacob's ladder of DFT functionals in Figure 1: whether Skala constitutes a sixth rung or a bypass of the traditional ladder remains open to interpretation, but it clearly represents a shift. I'm not a veteran in this space, so it is hard for me to be definitive, but it seems to represent a departure from hand-built semi-local features toward learned non-locality, effectively bypassing the traditional way of doing things.

It will be interesting to see how the "Kings of DFT" view Skala XC. Based on the Microsoft AI preprint, the functional shows systematic improvement through the use of high-quality data and training procedures, and the authors claim it can encode known physics through appropriate constraints and design choices. On the other hand, the lack of theoretical transparency and interpretability that traditional functional forms offer will probably be critiqued, although existing functionals are not perfect either.

I'm just interested to see how this gets used in downstream applications in solid-state physics and materials science. Will we get DFT training sets for Materials Project structures, on the order of a million structures, all with chemical accuracy? If so, that could in turn make foundation models even better for materials discovery and classical atomistic modeling.


References

[1] G. Luise, C.-W. Huang, T. Vogels, D.P. Kooi, S. Ehlert, S. Lanius, K.J.H. Giesbertz, A. Karton, D. Gunceler, M. Stanley, W.P. Bruinsma, L. Huang, X. Wei, J.G. Torres, A. Katbashev, B. MΓ‘tΓ©, S.-O. Kaba, R. Sordillo, Y. Chen, D.B. Williams-Young, C.M. Bishop, J. Hermann, R. van den Berg, P. Gori-Giorgi, Accurate and scalable exchange-correlation with deep learning, (2025). DOI.

[2] K. Rowell, An Ersatz Ansatz, Blog (2023). https://keiran-rowell.github.io/guide/2023-04-12-compchem-methods-basics (accessed June 21, 2025).

[3] J.P. Perdew, Climbing the ladder of density functional approximations, MRS Bull. 38 (2013) 743–750. DOI.

[4] K. Li, A.N. Rubungo, X. Lei, D. Persaud, K. Choudhary, B. DeCost, A.B. Dieng, J. Hattrick-Simpers, Probing out-of-distribution generalization in machine learning for materials, Commun Mater 6 (2025) 1–10. DOI.


