Orders of Coupling Representations as a Versatile Framework for Machine Learning from Sparse Data in High-Dimensional Spaces

Manzhos, Sergei
Carrington, Tucker Jr.
Ihara, Manabu
Machine learning (ML) techniques are widely and increasingly used in diverse applications in science and technology, including computational chemistry. In computational chemistry specifically, neural networks (NNs) and kernel methods such as Gaussian process regression (GPR) are increasingly used to construct potential functions and functionals for density functional theory. While ML techniques have a number of advantages over intuition-based models, notably their generality and black-box nature, they are still challenged by high dimensionality of the feature space and by low and uneven data density, in part because of their general nature. We review recent works that use methods such as NNs and GPR as building blocks of composite methods in the framework of an expansion over orders of coupling. We introduce models that use NN- or GPR-based components as parts of HDMR (high-dimensional model representation)-based structures. HDMR is a formalization of orders-of-coupling representations, which include the many-body and N-mode representations well known in computational chemistry; it allows, in particular, building all terms from one dataset of arbitrarily distributed data. The resulting HDMR-NN and HDMR-GPR combinations, as well as NNs with HDMR-GPR-derived neuron activation functions that do not require non-linear optimization, enhance machine learning capabilities in high-dimensional spaces and/or with sparse data.
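As a rough illustration of an orders-of-coupling expansion, the sketch below fits only the lowest-order terms, f(x) ≈ f0 + Σ_i f_i(x_i), to a single set of randomly scattered points. It is not the authors' method: the component functions here are plain polynomials rather than NNs or GPR models, the terms are fitted by simple backfitting (each term fitted in turn to the residual of the others), and the function names, the polynomial degree, the number of sweeps, and the toy target function are all assumptions made for this example.

```python
import numpy as np


def fit_first_order_hdmr(X, y, degree=5, n_sweeps=20):
    """Fit f(x) ~ f0 + sum_i f_i(x_i) by backfitting (illustrative sketch only)."""
    n_samples, n_features = X.shape
    f0 = float(y.mean())                        # zeroth-order (constant) term
    coefs = [np.zeros(degree + 1) for _ in range(n_features)]
    comps = np.zeros((n_samples, n_features))   # f_i evaluated at the training points
    for _ in range(n_sweeps):
        for i in range(n_features):
            # Residual after removing f0 and all component functions except f_i.
            resid = y - f0 - comps.sum(axis=1) + comps[:, i]
            c = np.polyfit(X[:, i], resid, degree)
            # Centre f_i on the training data so the constant stays in f0.
            c[-1] -= np.polyval(c, X[:, i]).mean()
            coefs[i] = c
            comps[:, i] = np.polyval(c, X[:, i])
    return f0, coefs


def predict_first_order_hdmr(model, X):
    """Evaluate the fitted first-order expansion at new points."""
    f0, coefs = model
    return f0 + sum(np.polyval(c, X[:, i]) for i, c in enumerate(coefs))


def target(X):
    """Hypothetical additive test function (no coupling between variables)."""
    return np.sin(2.0 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2]


rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(200, 3))  # arbitrarily distributed samples
X_test = rng.uniform(-1.0, 1.0, size=(500, 3))

model = fit_first_order_hdmr(X_train, target(X_train))
errors = predict_first_order_hdmr(model, X_test) - target(X_test)
rmse = float(np.sqrt(np.mean(errors ** 2)))
print(f"test RMSE of the first-order model: {rmse:.2e}")
```

Because the toy target has no coupling between variables, a first-order model can represent it well. A function with genuine pairwise coupling would need second-order terms f_ij(x_i, x_j), which could be fitted to the remaining residual in the same manner.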