PhAI: a deep learning method, trained on millions of crystal structures, for solving the crystallographic phase problem

Editor | KX

To this day, crystallography determines structures, from simple metals to large membrane proteins, with a level of detail and precision unmatched by any other method. Its biggest challenge, however, remains the so-called phase problem: retrieving phase information from the experimentally determined amplitudes.

Researchers at the University of Copenhagen in Denmark have developed PhAI, a deep learning method for solving the crystallographic phase problem. It uses a neural network trained on millions of artificial crystal structures and their corresponding synthetic diffraction data to generate accurate electron density maps.

The study shows that this deep learning-based ab initio structure solution approach can solve the phase problem at a resolution of only 2 Å, which corresponds to just 10 to 20% of the data available at atomic resolution. Traditional ab initio methods usually require atomic-resolution data.

The study, titled "PhAI: A deep-learning approach to solve the crystallographic phase problem", was published in Science on August 1.

Paper link: https://www.science.org/doi/10.1126/science.adn2777

Crystallography is one of the core analytical techniques in the natural sciences. X-ray crystallography provides a unique view of the three-dimensional structure of crystals. To reconstruct the electron density map, the complex structure factors F of a sufficient number of diffraction reflections must be known. A conventional experiment, however, yields only the amplitudes |F|, while the phases φ are lost. This is the crystallographic phase problem.
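
The role of the phases can be illustrated with a small numerical sketch. It is not taken from the paper: the one-dimensional toy "crystal", the grid size and the atom positions are assumptions chosen for illustration. The sketch takes the Fourier coefficients of a toy density as structure factors, then compares the map rebuilt with the true phases against the map rebuilt from the amplitudes alone.

```python
import numpy as np

def toy_density(n=128, centers=(20, 47, 90), width=2.0):
    """1D periodic 'electron density': a sum of Gaussian atoms (assumed toy model)."""
    x = np.arange(n)
    rho = np.zeros(n)
    for c in centers:
        d = np.minimum(np.abs(x - c), n - np.abs(x - c))  # periodic distance
        rho += np.exp(-0.5 * (d / width) ** 2)
    return rho

rho_true = toy_density()
F = np.fft.fft(rho_true)      # complex structure factors
amplitudes = np.abs(F)        # what the experiment measures
phases = np.angle(F)          # what the experiment loses

# Map from the measured amplitudes combined with the true phases
rho_with_phases = np.fft.ifft(amplitudes * np.exp(1j * phases)).real
# Map from the amplitudes alone (all phases set to zero)
rho_zero_phases = np.fft.ifft(amplitudes).real

err_with = np.max(np.abs(rho_with_phases - rho_true))
err_zero = np.max(np.abs(rho_zero_phases - rho_true))
print(f"max error with true phases: {err_with:.2e}")
print(f"max error with zero phases: {err_zero:.2e}")
```

With the true phases the density is recovered to rounding error, whereas the zero-phase map piles density at the origin and does not resemble the original.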

Figure: Flow chart for standard crystal structure determination. (Source: Paper)

A major breakthrough came in the 1950s and 1960s, when Karle and Hauptman developed the so-called direct methods for solving the phase problem. Direct methods, however, require diffraction data at atomic resolution, although this requirement is only an empirical observation.

In recent years, traditional direct methods have been complemented by dual-space methods. The ab initio methods currently available appear to have reached their limits, and a general solution to the phase problem is still unknown.

Mathematically, any combination of structure factor amplitudes and phases can be inverse Fourier transformed. However, physical and chemical requirements, such as an atom-like electron density distribution, constrain which combinations of phases are compatible with a given set of amplitudes. Advances in deep learning make it possible to explore this relationship, perhaps in greater depth than current ab initio methods do.

Here, the University of Copenhagen researchers took a data-driven approach, using millions of artificial crystal structures and their corresponding diffraction data, with the aim of solving the phase problem in crystallography.

The study shows that this deep learning-based ab initio structure solution works at a resolution, defined as the minimum lattice plane distance, of only dmin = 2.0 Å, requiring only 10 to 20% of the data needed by direct methods.
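
The 10 to 20% figure is consistent with simple counting: the number of reflections with lattice plane distance of at least dmin grows with the volume of a sphere of radius 1/dmin in reciprocal space, that is, roughly as 1/dmin³. The sketch below counts reciprocal-lattice points directly. The cubic 10 Å cell and the two "atomic resolution" limits of 1.0 and 1.2 Å are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def count_reflections(cell_edge, d_min):
    """Count reciprocal-lattice points (h, k, l) != (0, 0, 0) with d_hkl >= d_min
    for a cubic cell, where d_hkl = a / sqrt(h^2 + k^2 + l^2)."""
    hmax = int(np.floor(cell_edge / d_min))
    idx = np.arange(-hmax, hmax + 1)
    h, k, l = np.meshgrid(idx, idx, idx, indexing="ij")
    s2 = h**2 + k**2 + l**2
    inside = (s2 > 0) & (s2 <= (cell_edge / d_min) ** 2)
    return int(np.count_nonzero(inside))

a = 10.0  # hypothetical cubic cell edge in Å
n_low = count_reflections(a, 2.0)
for d_atomic in (1.0, 1.2):
    n_atomic = count_reflections(a, d_atomic)
    print(f"d_min = 2.0 Å vs {d_atomic} Å: "
          f"{100 * n_low / n_atomic:.1f}% of reflections, "
          f"volume estimate {100 * (d_atomic / 2.0) ** 3:.1f}%")
```

The volume estimate gives (1.0/2.0)³ = 12.5% and (1.2/2.0)³ = 21.6%, and the direct count should land close to those values.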

Design and training of neural networks

The artificial neural network the researchers constructed, called PhAI, accepts the structure factor amplitudes |F| and outputs the corresponding phase values φ. The architecture of PhAI is shown in the figure below.

Figure: The PhAI neural network approach to solving the phase problem. (Source: Paper)

The number of structure factors of a crystal structure depends on the unit cell size, so a limit on the size of the input data was set according to the available computing resources. The input structure factor amplitudes are restricted to reflections whose Miller indices (h, k, l) fall within a fixed range, which corresponds to structures with a unit cell size of about 10 Å at atomic resolution. In addition, the most common centrosymmetric space group, P21/c, was chosen. Centrosymmetry restricts the possible phase values to 0 or π rad.
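
Why centrosymmetry reduces each phase to a sign can be checked numerically. The following sketch is an illustration, not code from the paper: it places a few hypothetical point atoms of unit weight in the asymmetric unit, generates their P21/c equivalents using the standard general positions (unique axis b), and evaluates the structure factors on a small grid of Miller indices.

```python
import numpy as np

def p21c_equivalents(xyz):
    """Apply the four general positions of P2_1/c (unique axis b) to fractional coordinates."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    ops = [
        np.stack([x, y, z], axis=1),
        np.stack([-x, y + 0.5, -z + 0.5], axis=1),
        np.stack([-x, -y, -z], axis=1),
        np.stack([x, -y + 0.5, z + 0.5], axis=1),
    ]
    return np.concatenate(ops) % 1.0

def structure_factors(frac_xyz, hkl):
    """F(hkl) = sum_j exp(2*pi*i*(h*x_j + k*y_j + l*z_j)) for point atoms of unit weight."""
    return np.exp(2j * np.pi * hkl @ frac_xyz.T).sum(axis=1)

rng = np.random.default_rng(0)
asym_unit = rng.random((5, 3))         # 5 hypothetical atoms
atoms = p21c_equivalents(asym_unit)    # 20 atoms in the unit cell

idx = np.arange(-3, 4)
hkl = np.array(np.meshgrid(idx, idx, idx, indexing="ij")).reshape(3, -1).T
F = structure_factors(atoms, hkl)

max_imag = np.max(np.abs(F.imag))              # vanishes up to rounding
phases = np.where(F.real < 0, np.pi, 0.0)      # each phase is 0 or pi

# The c-glide also extinguishes h0l reflections with odd l
glide_absent = (hkl[:, 1] == 0) & (hkl[:, 2] % 2 == 1)
max_absent = np.max(np.abs(F[glide_absent]))
print(f"largest |Im F| = {max_imag:.1e}, largest absent |F| = {max_absent:.1e}")
```

Because every atom at (x, y, z) has a partner at (-x, -y, -z), the imaginary parts cancel pairwise, so the network only has to decide between two phase values per reflection.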

The study trains the neural network on artificial crystal structures containing mostly organic molecules. About 49,000,000 structures were created, of which 94.29% were organic crystal structures, 5.66% were metal-organic crystal structures, and 0.05% were inorganic crystal structures.

The input to the neural network consists of amplitudes and phases, which are each processed by a convolutional input block, summed, and fed into a series of convolutional blocks (Conv3D) followed by a series of multilayer perceptron (MLP) blocks. The phases predicted by the linear classifier (phase classifier) are cycled through the network Nc times. The training data were generated by inserting metal atoms and organic molecules from the GDB-13 database into the unit cell. From the resulting structures, the true phases and structure factor amplitudes are calculated while sampling the temperature factor, resolution, and completeness.
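
How one training sample might be derived from a synthetic structure can be sketched as follows. This is a simplified illustration and not the authors' pipeline: it assumes an orthogonal cell, point atoms of unit scattering power, and made-up sampling ranges for the temperature factor, resolution and completeness. Placing GDB-13 molecules in the cell is not reproduced; random coordinates stand in for them.

```python
import numpy as np

def sample_training_example(frac_xyz, cell, rng):
    """Compute one (hkl, |F|, phase) sample from fractional coordinates.

    Assumptions for illustration only: an orthogonal cell (a, b, c) in Å,
    point atoms of unit scattering power, and hypothetical sampling ranges.
    """
    cell = np.asarray(cell, dtype=float)
    b_iso = rng.uniform(1.0, 5.0)          # isotropic temperature factor, Å^2
    d_min = rng.uniform(1.0, 2.0)          # resolution limit, Å
    completeness = rng.uniform(0.8, 1.0)   # fraction of reflections kept

    # All Miller indices that can reach d_min, excluding (0, 0, 0)
    hmax = np.floor(cell / d_min).astype(int)
    axes = [np.arange(-m, m + 1) for m in hmax]
    hkl = np.array(np.meshgrid(*axes, indexing="ij")).reshape(3, -1).T
    hkl = hkl[np.any(hkl != 0, axis=1)]

    # Resolution cut: 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2 for an orthogonal cell
    inv_d2 = np.sum((hkl / cell) ** 2, axis=1)
    inside = inv_d2 <= 1.0 / d_min**2
    hkl, inv_d2 = hkl[inside], inv_d2[inside]

    # Structure factors damped by a Debye-Waller factor exp(-B / (4 d^2))
    F = np.exp(2j * np.pi * hkl @ frac_xyz.T).sum(axis=1)
    F *= np.exp(-b_iso * inv_d2 / 4.0)

    # Randomly drop reflections to mimic incomplete data
    keep = rng.random(len(hkl)) < completeness
    return {"hkl": hkl[keep], "amplitudes": np.abs(F[keep]),
            "phases": np.angle(F[keep]), "d_min": d_min}

rng = np.random.default_rng(1)
half = rng.random((6, 3))
atoms = np.concatenate([half, -half]) % 1.0   # centrosymmetric toy structure
sample = sample_training_example(atoms, (10.0, 10.0, 10.0), rng)
print(len(sample["hkl"]), "reflections up to", round(sample["d_min"], 2), "Å")
```

The amplitudes would serve as network input and the phases as training targets. Varying the three sampled quantities exposes the network to data of different quality.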

Solve real structural problems

The trained neural network runs on a standard computer with modest computational requirements. It accepts a list of hkl indices and the corresponding structure factor amplitudes as input. No additional input is required, not even the unit cell parameters of the structure, which is fundamentally different from all other modern ab initio methods. The network predicts and outputs phase values on the fly.
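
A list of hkl indices and amplitudes has to be arranged on a regular grid before 3D convolutions can act on it. One possible arrangement is sketched below; the function name `pack_amplitudes`, the grid layout and the bound `max_index` are hypothetical and do not describe PhAI's actual input format.

```python
import numpy as np

def pack_amplitudes(hkl, amplitudes, max_index=10):
    """Place |F| values on a dense (h, k, l) grid; unlisted reflections stay 0.

    max_index is a hypothetical bound on |h|, |k|, |l|, not the value used by PhAI.
    Reflections outside the bound are silently dropped.
    """
    size = 2 * max_index + 1
    grid = np.zeros((size, size, size), dtype=np.float32)
    hkl = np.asarray(hkl, dtype=int)
    amplitudes = np.asarray(amplitudes, dtype=np.float32)
    keep = np.all(np.abs(hkl) <= max_index, axis=1)
    h, k, l = (hkl[keep] + max_index).T      # shift so that index 0 sits at the centre
    grid[h, k, l] = amplitudes[keep]
    return grid

# Three reflections; the last one lies outside the grid and is ignored
reflections = [[1, 0, 0], [0, 2, -1], [11, 0, 0]]
values = [5.0, 3.0, 9.0]
grid = pack_amplitudes(reflections, values)
print(grid.shape, float(grid[11, 10, 10]), float(grid[10, 12, 9]))
```

Note that the grid carries no unit cell dimensions, which matches the statement that the cell parameters are not needed as input.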

The researchers tested the performance of the neural network using calculated diffraction data of real crystal structures. A total of 2387 test cases were obtained. For all collected structures, multiple data resolution values were considered, ranging from 1.0 to 2.0 Å. For comparison, the charge flipping method was also used to retrieve the phase information.

Figure: Histograms of the correlation coefficient r between the phased and the true electron density maps. (Source: Paper)
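
A correlation coefficient of this kind can be computed by building two maps from the same amplitudes, one with the predicted phases and one with the true phases, and taking the Pearson correlation of their grid values. The sketch below is an assumption about how such a score could be evaluated, using a random centrosymmetric toy density. It ignores the equivalent origin choices that a full comparison would have to account for.

```python
import numpy as np

def map_correlation(amplitudes, phases_pred, phases_true):
    """Pearson r between maps built from the same |F| with predicted and true phases."""
    rho_pred = np.fft.ifftn(amplitudes * np.exp(1j * phases_pred)).real
    rho_true = np.fft.ifftn(amplitudes * np.exp(1j * phases_true)).real
    return float(np.corrcoef(rho_pred.ravel(), rho_true.ravel())[0, 1])

# Toy centrosymmetric density on an 8 x 8 x 8 grid: rho(r) = rho(-r)
rng = np.random.default_rng(0)
rho = rng.random((8, 8, 8))
rho = rho + np.roll(rho[::-1, ::-1, ::-1], 1, axis=(0, 1, 2))

F = np.fft.fftn(rho)
amplitudes = np.abs(F)
true_phases = np.where(F.real < 0, np.pi, 0.0)   # centrosymmetric: 0 or pi

r_perfect = map_correlation(amplitudes, true_phases, true_phases)
r_zero = map_correlation(amplitudes, np.zeros_like(true_phases), true_phases)
print(f"r with true phases: {r_perfect:.3f}, r with all phases zero: {r_zero:.3f}")
```

A value of r close to 1 indicates a phased map that matches the true density, while a poor phase set gives a much lower score.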

The trained neural network performs well: when the corresponding diffraction data are of good resolution, it solves all tested structures (N = 2387), and it excels at solving structures from low-resolution data. Although the network was barely trained on inorganic structures, it solves such structures perfectly.

The charge flipping method performs well on high-resolution data, but its ability to produce reasonable and correct solutions gradually decreases as the data resolution decreases. Even so, it still solves about 32% of the structures at a resolution of 1.6 Å. The number of structures solved by charge flipping could be improved by further experimentation with the input parameters, such as the flipping threshold.
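
For readers unfamiliar with charge flipping, the basic iteration as it is commonly described can be sketched in a few lines. This is a one-dimensional toy version, not the program used in the study: the threshold `delta`, the iteration count and the toy density are hypothetical, and convergence to the correct structure is not guaranteed.

```python
import numpy as np

def charge_flipping(f_obs, n, delta, n_iter=200, seed=0):
    """Basic 1D charge flipping sketch.

    f_obs : observed amplitudes for the non-negative frequencies (rfft layout)
    n     : number of real-space grid points
    delta : flipping threshold (a user-chosen parameter)
    """
    rng = np.random.default_rng(seed)
    F = f_obs * np.exp(2j * np.pi * rng.random(f_obs.shape))  # random starting phases
    F[0] = 0.0                                                # F(000) is not measured
    for _ in range(n_iter):
        rho = np.fft.irfft(F, n)                    # current density estimate
        flipped = np.where(rho < delta, -rho, rho)  # flip the sign of density below delta
        G = np.fft.rfft(flipped)
        F = f_obs * np.exp(1j * np.angle(G))        # keep new phases, restore observed |F|
        F[0] = G[0]                                 # let F(000) float freely
    return np.fft.irfft(F, n)

# Toy data: three sharp Gaussian "atoms" on a periodic grid
n = 128
x = np.arange(n)
rho_true = sum(np.exp(-0.5 * ((x - c) / 1.5) ** 2) for c in (20, 47, 90))
f_obs = np.abs(np.fft.rfft(rho_true))

rho_cf = charge_flipping(f_obs, n, delta=0.05)
amp_ok = np.allclose(np.abs(np.fft.rfft(rho_cf))[1:], f_obs[1:])
print("amplitude constraint satisfied:", amp_ok)
```

The result depends on `delta` and on the random starting phases, which is the kind of parameter tuning the article refers to. The recovered density may also be shifted or inverted relative to the original.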

In PhAI, this meta-optimization is performed during training and does not need to be done by the user. These results suggest that the common notion that atomic-resolution data are required to calculate phases ab initio in crystallography may no longer hold. PhAI requires only 10 to 20% of the data available at atomic resolution.

This result clearly shows that atomic resolution is not necessary for ab initio methods and opens up new avenues for deep learning-based structure determination.

The challenge for this deep learning approach is scaling the neural network: diffraction data from larger unit cells require much more input and output data and entail higher computational costs during training. Further research is needed to extend the method to the general case.
