Accelerated Nuclear Magnetic Resonance Spectroscopy with Deep Learning

 

Xiaobo Qu1*, Yihui Huang1, Hengfa Lu1, Tianyu Qiu1, Di Guo2, Vladislav Orekhov3, Zhong Chen1*

1Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen, China

2School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China.

3Department of Chemistry and Molecular Biology and Swedish NMR Centre, University of Gothenburg, Box 465, Gothenburg 40530, Sweden

Correspondence should be addressed to Xiaobo Qu (quxiaobo@xmu.edu.cn) or Zhong Chen (chenz@xmu.edu.cn).

 

Citations:

Xiaobo Qu*, Yihui Huang, Hengfa Lu, Tianyu Qiu, Di Guo, Vladislav Orekhov, Zhong Chen*, Accelerated nuclear magnetic resonance spectroscopy with deep learning, arXiv:1904.05168, 2019. 

Synopsis

Nuclear magnetic resonance (NMR) spectroscopy is an indispensable tool in chemistry and biology but often suffers from long experimental times. We present a proof of concept that harnesses deep learning and neural networks for high-quality, reliable, and very fast reconstruction of NMR spectra from limited experimental data. We show that the neural network can be trained using solely synthetic NMR signals, which lifts the prohibitive demand for the large volume of realistic training data usually required in deep learning approaches.

 

Method

Our method uses solely synthetic data for training, which differs significantly from many deep learning approaches that train on realistic data. The fully sampled spectrum $\mathbf{s}$ satisfies $\mathbf{s} = \mathcal{F}\mathbf{r}$, where $\mathcal{F}$ is the Fourier transform and $\mathbf{r}$ is the fully sampled FID, and the undersampled FID $\mathbf{y}$ obeys $\mathbf{y} = \mathcal{U}\mathbf{r}$, where $\mathcal{U}$ is the undersampling operator. The training data are generated as follows:

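The fully sampled FID $\mathbf{r}$ is simulated according to the classical exponential model[1-5]:

$$r_n = \sum_{j=1}^{J} A_j e^{i\phi_j} e^{-n\Delta t/\tau_j} e^{i 2\pi f_j n\Delta t}, \qquad (1-1)$$

where $r_n$ is the $n$th sample of the FID, $\Delta t$ is the sampling interval, $J$ is the number of exponentials, and $A_j$, $\phi_j$, $\tau_j$ and $f_j$ are the amplitude, phase, decay time and frequency, respectively, of the $j$th exponential.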
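As a rough illustration of the exponential model in Eq. (1-1), the sketch below builds a 1D FID as a sum of damped complex exponentials. It is a minimal sketch, not the authors' data generator: the function name `synth_fid`, the sampling interval `dt` and all parameter values are assumptions chosen for illustration only.

```python
import cmath
import math


def synth_fid(n_points, dt, peaks):
    """Sum of damped complex exponentials sampled at t = n * dt.

    peaks: list of (amplitude, phase_rad, decay_time, frequency) tuples,
    one tuple per exponential component.
    """
    fid = []
    for n in range(n_points):
        t = n * dt
        sample = 0j
        for amp, phase, tau, freq in peaks:
            # amplitude and phase, then exponential decay, then oscillation
            sample += (amp * cmath.exp(1j * phase)
                       * math.exp(-t / tau)
                       * cmath.exp(2j * math.pi * freq * t))
        fid.append(sample)
    return fid


# Illustrative parameters only (not taken from the paper).
example_peaks = [(1.0, 0.0, 0.05, 100.0), (0.5, 0.0, 0.10, -250.0)]
example_fid = synth_fid(256, 1e-3, example_peaks)
```

In a training-set generator one would presumably draw the number of peaks and their parameters at random for each synthetic FID; the ranges to draw from are not stated on this page.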

Reconstructing a spectrum from non-uniformly sampled (NUS) data is equivalent to mapping the input undersampled FID signal to the target spectrum. In DL NMR, a neural network is trained to perform this mapping, as shown in Figure 1.
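To make the input/target relationship concrete, the following sketch forms one training pair from a fully sampled FID: the network input is the zero-filled undersampled FID and the target is the Fourier transform of the full FID. This is a sketch under stated assumptions: the uniformly random mask, the choice to always keep the first sample, and the helper names `dft`, `nus_mask` and `training_pair` are all hypothetical, and the sampling schedule used in the paper may differ.

```python
import cmath
import math
import random


def dft(signal):
    """Naive discrete Fourier transform (adequate for short 1D examples)."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n)
                for k in range(n))
            for m in range(n)]


def nus_mask(n_points, rate, rng):
    """Boolean mask keeping round(rate * n_points) samples; index 0 is always kept."""
    n_keep = max(1, round(rate * n_points))
    kept = {0} | set(rng.sample(range(1, n_points), n_keep - 1))
    return [i in kept for i in range(n_points)]


def training_pair(fid, rate, rng):
    """Return (zero-filled undersampled FID, fully sampled spectrum, mask)."""
    mask = nus_mask(len(fid), rate, rng)
    undersampled = [x if keep else 0j for x, keep in zip(fid, mask)]
    return undersampled, dft(fid), mask


# One damped exponential at 0.125 cycles per sample, 64 points (illustrative).
fid = [cmath.exp((-0.05 + 2j * math.pi * 0.125) * n) for n in range(64)]
net_input, target, mask = training_pair(fid, 0.25, random.Random(0))
```

Zero-filling keeps the input the same length as the full FID, which is one simple way to present undersampled data to a fixed-size network.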

 

 

Figure 1. Flowchart of deep learning NMR spectroscopy.

Main Results:

We apply DL NMR to recover several 2D and 3D protein spectra and compare it with representative NUS NMR reconstruction methods: the low-rank (LR) approach[2] for 2D spectra and compressed sensing (CS)[6, 7] for 3D spectra. The DL, LR and CS methods all obtain very high peak intensity correlations, with R² > 0.98 (at NUS rates of 25% for 2D spectra and 10% for 3D spectra), which shows the high fidelity of the reconstructions (Fig. 2 – Fig. 6). More reconstructed spectra are detailed in our paper.
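The peak intensity correlation quoted above can be computed as a squared Pearson correlation between matched peak intensities in the reference and reconstructed spectra. The sketch below assumes the peaks have already been picked and matched; the function name `pearson_r2` and the intensity values are hypothetical, and the peak-picking procedure is not described on this page.

```python
def pearson_r2(reference, reconstructed):
    """Squared Pearson correlation between two equal-length intensity lists."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(reconstructed) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(reference, reconstructed))
    var_x = sum((x - mean_x) ** 2 for x in reference)
    var_y = sum((y - mean_y) ** 2 for y in reconstructed)
    return cov * cov / (var_x * var_y)


# Made-up peak intensities, for illustration only.
reference_peaks = [1.00, 0.62, 0.35, 0.80, 0.15]
reconstructed_peaks = [0.97, 0.60, 0.37, 0.78, 0.17]
r2 = pearson_r2(reference_peaks, reconstructed_peaks)
```

A value close to 1 means the reconstructed intensities are nearly a linear function of the reference intensities; R² alone does not detect a uniform scaling or offset of all peaks.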

1. The Reconstruction for 2D Spectra:

 

Figure 2. Reconstruction of a 2D 1H–15N HSQC spectrum of cytosolic CD79b protein from the B-cell receptor. (a)-(c) are the fully sampled spectrum and the LR and DL reconstructions from 25% NUS data, respectively; (d) and (e) are the peak intensity correlations obtained by the LR and DL methods, respectively; (f) and (g) are zoomed out 1D 15N traces, and the red, yellow and green lines represent the spectra obtained with full sampling, LR and DL methods, respectively.

 

Figure 3. Reconstruction of the 2D 1H-15N HSQC spectrum acquired from ubiquitin. (a) is the fully sampled reference spectrum, (b) and (c) are reconstructed spectra from 25% NUS data by LR and DL methods, respectively, (d) and (e) are the peak intensity correlations achieved by LR and DL methods, respectively, (f) and (g) are zoomed out 1D 15N traces, and the red, yellow and green lines represent the reference, LR and DL reconstructed spectra, respectively.

2. The Reconstruction for 3D Spectra

 

Figure 4. The projections on 1H-15N and 1H-13C planes of the 3D HNCACB. (a) is the fully sampled spectrum, (b) and (c) are reconstructed spectra from 10% NUS data by CS and DL, respectively. Note: The sub-region of the projections marked with the green dashed rectangle is shown in Figure 5.

 

Figure 5. The sub-region of the projections on 1H-15N and 1H-13C planes of the 3D HNCACB. (a) is the fully sampled spectrum, (b) and (c) are reconstructed spectra from 10% NUS data by CS and DL, respectively.

 

Figure 6. Correlation coefficients between reconstructed spectra and fully sampled 3D HNCACB shown in Fig.5. (b) and (c) are the peak intensity correlations achieved by CS and DL, respectively.

3. The Reconstruction for 2D Spectra at Lower NUS level

At the lower NUS levels (10% and 15%), DL NMR provides higher correlation values and a lower dispersion of the correlation coefficients over 100 NUS trials. The higher quality of the DL NMR reconstruction at low NUS rates is also illustrated in Figs. 7 (a) and (b). These observations imply that DL allows a more significant saving of measurement time than the LR method and is also more robust across different NUS trials, leading to more stable reconstruction.
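The mean and dispersion over repeated NUS trials can be estimated with a loop of the following shape. This is only a sketch: `nus_trial_stats` is a hypothetical helper, the masks are drawn uniformly at random, the correlation is taken over the whole magnitude spectrum rather than over picked peaks, and the `reconstruct` argument is a stand-in for a DL or LR reconstructor (the example passes a plain zero-filled Fourier transform, which is neither).

```python
import cmath
import math
import random
import statistics


def dft(signal):
    """Naive discrete Fourier transform (adequate for short 1D examples)."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n)
                for k in range(n))
            for m in range(n)]


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def nus_trial_stats(fid, rate, reconstruct, n_trials, seed=0):
    """Mean and standard deviation of R^2 over random NUS masks."""
    rng = random.Random(seed)
    n = len(fid)
    n_keep = max(1, round(rate * n))
    reference = [abs(v) for v in dft(fid)]
    scores = []
    for _ in range(n_trials):
        # New random sampling schedule for every trial; index 0 is always kept.
        kept = {0} | set(rng.sample(range(1, n), n_keep - 1))
        undersampled = [v if i in kept else 0j for i, v in enumerate(fid)]
        estimate = [abs(v) for v in reconstruct(undersampled)]
        scores.append(pearson_r2(reference, estimate))
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder reconstructor: zero-filled DFT of the undersampled FID.
fid = [cmath.exp((-0.05 + 2j * math.pi * 0.125) * k) for k in range(64)]
mean_r2, std_r2 = nus_trial_stats(fid, 0.25, dft, n_trials=10)
```

Swapping in different reconstructors for `reconstruct` while keeping the seed fixed compares them on the same set of sampling schedules.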

 

Figure 7. Reconstructed 2D 1H-15N HSQC spectra under different amounts of NUS data. (a)-(d) are the reconstructions at NUS of 10%, 15%, 20% and 25%, respectively. The spectra marked with green and yellow colors are reconstructions with DL and LR methods, respectively.

 

Figure 8. Correlation coefficients for the Fig. 7 spectra at different NUS rates. Note: The green and yellow lines indicate the Pearson correlation coefficient R² of the DL and LR methods, respectively, each compared with the fully sampled spectrum. The error bars are the standard deviations of the correlations over 100 NUS resampling trials.

 

 

Figure 9. Correlation coefficients for three other spectra at different NUS rates. Note: The green and yellow lines indicate the Pearson correlation coefficient R² of the DL and LR methods, respectively, each compared with the fully sampled spectrum. The error bars are the standard deviations of the correlations over 100 NUS resampling trials.

 

4. The Computation Time of Spectra Reconstruction

Without compromising spectral quality, DL is much faster than other state-of-the-art methods such as low rank[2] and compressed sensing[8]. The comparison in Figure 10 indicates that the computational time of spectrum reconstruction in DL NMR is orders of magnitude shorter than the time needed by the traditional algorithms.

 

Figure 10. Computational time for spectrum reconstruction with deep learning, low rank and compressed sensing. Experiments were carried out on a dual-CPU (2.2 GHz, 12 cores per CPU) computer server equipped with 128 GB RAM and one Nvidia Tesla K40M. Deep learning, low rank and compressed sensing were implemented in TensorFlow (GPU), MATLAB (CPU) and MddNMR (CPU), respectively. Both the low rank and compressed sensing algorithms were accelerated with CPU-based parallel computing in 24 threads. The indirect dimension of the tested 2D spectrum has 256 points, while its direct dimension has 116 points. The indirect dimensions of the 3D spectra are 60x60 points, and the direct dimension has 732 points.

Download:

 

Paper:

CSG website: here

Arxiv: https://arxiv.org/abs/1904.05168

Demo Code and Dataset:

1) 2D NMR data and reconstruction

A. Dataset

HSQC spectrum of GB1: The sample is 2 mM U-15N, 20%-13C GB1 in 25 mM PO4, pH 7.0 with 150 mM NaCl and 5% D2O. Data were collected using a phase-cycle selected HSQC at 298 K on a Bruker Avance 600 MHz spectrometer using a room-temperature HCN TXI probe equipped with a z-axis gradient system. The raw data of GB1 can be downloaded at https://www.ibbr.umd.edu/nmrpipe/demo.html

B. Demo code

The demo code can be downloaded here. Note: In this demo, the spectrum data were processed and saved in mat format.

 

2) 3D NMR data and reconstruction

A. Dataset

HNCO spectrum of azurin: It was acquired on an 800 MHz spectrometer from a 15N-13C-labeled Cu(I) azurin sample and was described earlier[9]. The raw data of HNCO can be downloaded at http://mddnmr.spektrino.com/download

B. Demo code

The demo code can be downloaded here. Note: In this demo, the spectrum data were processed and saved in mat format.

 

Acknowledgments:

1) Data Provider

The authors thank Marius Clore and Samuel Kotler for providing the 3D HNCACB data; Jinfa Ying for assisting with processing and for helpful discussions on the 3D HNCACB spectrum; and Luke Arbogast and Frank Delaglio for providing the 2D HSQC spectrum of GB1.

2) Fund

This work was supported in part by the National Natural Science Foundation of China (NSFC) under grants 61571380, 61871341 and U1632274, the Joint NSFC-Swedish Foundation for International Cooperation in Research and Higher Education (STINT) under grant 61811530021, the National Key R&D Program of China under grant 2017YFC0108703, the Natural Science Foundation of Fujian Province of China under grant 2018J06018, the Fundamental Research Funds for the Central Universities under grant 20720180056, the Science and Technology Program of Xiamen under grant 3502Z20183053, the China Scholarship Council under grants 201806315010 and 201808350010, and the Swedish Research Council under grant 2015–04614.

 

References:

[1] J. C. Hoch and A. Stern, NMR Data Processing. Wiley, 1996.

[2] X. Qu, M. Mayzel, J.-F. Cai, Z. Chen, and V. Orekhov, "Accelerated NMR Spectroscopy with Low-Rank Reconstruction," Angewandte Chemie, International Edition, vol. 54, no. 3, pp. 852-854, 2015.

[3] H. M. Nguyen, X. Peng, M. N. Do, and Z. Liang, "Denoising MR Spectroscopic Imaging Data with Low-Rank Approximations," IEEE Transactions on Biomedical Engineering, vol. 60, no. 1, pp. 78-89, 2013.

[4] J. Ying, H. Lu, Q. Wei, J. Cai, D. Guo, J. Wu, Z. Chen, and X. Qu, "Hankel Matrix Nuclear Norm Regularized Tensor Completion for N-dimensional Exponential Signals," IEEE Transactions on Signal Processing, vol. 65, no. 14, pp. 3702-3717, 2017.

[5] J. Ying, J. Cai, D. Guo, G. Tang, Z. Chen, and X. Qu, "Vandermonde Factorization of Hankel Matrix for Complex Exponential Signal Recovery—Application in Fast NMR Spectroscopy," IEEE Transactions on Signal Processing, vol. 66, no. 21, pp. 5520-5533, 2018.

[6] Y. Pustovalova, M. Mayzel, and V. Y. Orekhov, "XLSY: Extra-Large NMR Spectroscopy," Angewandte Chemie, International Edition, vol. 57, no. 43, pp. 14043-14045, 2018.

[7] X. Qu, X. Cao, D. Guo, and Z. Chen, "Compressed Sensing for Sparse Magnetic Resonance Spectroscopy," International Society for Magnetic Resonance in Medicine 18th Scientific Meeting (ISMRM), Stockholm, Sweden, May 1-7, pp. 3371, 2010.

[8] K. Kazimierczuk and V. Y. Orekhov, "Accelerated NMR Spectroscopy by Using Compressed Sensing," Angewandte Chemie, International Edition, vol. 50, no. 24, pp. 5556-5559, 2011.

[9] D. M. Korzhnev, B. G. Karlsson, V. Y. Orekhov, and M. Billeter, "NMR Detection of Multiple Transitions to Low-Populated States in Azurin," Protein Science, vol. 12, no. 1, pp. 56-65, 2003.