
Predicting chemical reactions using machine translation algorithms



According to a study by researchers at IBM Research (1), chemical reaction prediction can be significantly improved by treating a chemical reaction as a translation problem.

The idea of using computers to help chemists is far from new. As early as 1969, Corey and Wipke (2) showed that synthesis planning and retrosynthesis (the inverse problem, in which the product is known but a simple and cheap way to synthesize it is not) can be carried out by a computer.

With the advent of new machine learning techniques, the outcomes of chemical transformations can be predicted more accurately. In recent years, prediction methods based on reaction templates have been studied extensively. For example, Segler and Waller recently presented a neural-symbolic approach (3). They extracted reaction rules from the commercial Reaxys database, trained a neural network on molecular reaction fingerprints to prioritize these rules, and combined the network with Monte Carlo tree search (4) to overcome the scalability problems of other template-based methods.
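The tree-search step can be illustrated with the standard UCT (UCB1) selection rule used in Monte Carlo tree search. This is a minimal generic sketch, not Segler and Waller's implementation. In their approach, the network that prioritizes reaction rules would also guide which branches are expanded, and that part is omitted here. The `Node` class, its fields, and the exploration constant `c = 1.41` are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node: one state reached by applying reaction rules."""
    visits: int = 0
    total_value: float = 0.0
    children: list = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """Classic UCB1 score: mean value (exploitation) plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node) -> Node:
    """Pick the child (i.e. the applied reaction rule) with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))


def backpropagate(path: list, value: float) -> None:
    """Add one rollout result to every node visited on the way down the tree."""
    for node in path:
        node.visits += 1
        node.total_value += value
```

With equal visit counts, the child with the higher mean value wins. A child that has never been visited is chosen before any visited one, which keeps the search from ignoring untried rules.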
To overcome the limitations of template-based approaches and advance computer-aided reaction prediction, the first template-free prediction approach was introduced in 2012 (5). Researchers from IBM used a template-free method based on seq2seq models to predict organic reactions and to perform retrosynthesis. A similar approach was recently published by Nam and Kim (6), who also used template-free seq2seq models. Their variant was based on the TensorFlow (v0.10.0) translation model (7), and they kept the default values for most of its hyperparameters.
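To treat prediction as translation, each reaction has to become a pair of "sentences". The sketch below assumes reactions are stored as reaction SMILES in the `reactants>agents>products` form, with molecules separated by `.`. The function names are hypothetical, and putting the agents on the source side is an illustrative assumption. Forward prediction translates reactants into products, and retrosynthesis simply swaps the direction.

```python
def to_translation_pair(reaction_smiles: str) -> tuple[str, str]:
    """Forward prediction: (reactants [+ agents]) -> products.

    Assumption for illustration: agents (reagents, catalysts) are kept on the
    source side, separated from the reactants by '>'.
    """
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>agents>products', got {reaction_smiles!r}")
    reactants, agents, products = parts
    source = f"{reactants}>{agents}" if agents else reactants
    return source, products


def to_retrosynthesis_pair(reaction_smiles: str) -> tuple[str, str]:
    """Retrosynthesis: the same reaction translated in the opposite direction."""
    reactants, _agents, products = reaction_smiles.split(">")
    return products, reactants


# Acetic acid + ethanol -> ethyl acetate (water omitted), acid catalyst as agent.
rxn = "CC(=O)O.OCC>[H+]>CC(=O)OCC"
print(to_translation_pair(rxn))     # ('CC(=O)O.OCC>[H+]', 'CC(=O)OCC')
print(to_retrosynthesis_pair(rxn))  # ('CC(=O)OCC', 'CC(=O)O.OCC')
```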


Interface of the “Found in Translation” system (7) from researchers at IBM Research

The language of chemical formulas is the language people use to describe the chemical transformations and processes occurring in the world around them. Because it is a language invented by humans, it can be processed with algorithms similar to those used in machine translation. Building on this hypothesis, the IBM researchers represented chemical compounds as SMILES strings and proposed a new tokenization scheme that can be extended with arbitrary additional reaction information. The system was then trained on a dataset of 395,000 chemical reactions taken from a patent database, using a neural network architecture commonly used in machine translation. The article (1) reports 80% prediction accuracy without auxiliary information such as reaction templates, 6 percentage points higher than other prediction models. On a larger and noisier dataset, the system reaches 65.4% accuracy.
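Tokenization turns a SMILES string into the "words" that a translation model consumes, and accuracy is measured by comparing predicted products with recorded ones. The sketch below uses a simplified regex-based, atom-level tokenizer. It is an assumption for illustration and not the exact tokenization scheme proposed in the IBM paper. The accuracy function uses plain string equality, which is another simplifying assumption, since real evaluations usually compare canonicalized SMILES. The names `tokenize` and `top1_accuracy` are hypothetical.

```python
import re

# Simplified atom-level SMILES tokenizer (assumption: a common regex-based scheme).
# Bracket atoms like [H+] stay whole; Br and Cl are two-letter tokens.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens for a sequence-to-sequence model."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters that no pattern matches; refuse to lose any.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def top1_accuracy(predictions: list[str], targets: list[str]) -> float:
    """Fraction of reactions whose top prediction exactly matches the recorded product."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need equally long, non-empty lists")
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)
```

For example, `tokenize("CC(=O)OCC")` gives the nine tokens `C C ( = O ) O C C`, which the network then treats like the words of a sentence.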

The authors hope that the method will accelerate research such as drug development, and they expect to make the system available online in 2018 (8).


References:
1. Schwaller P, Gaudin T, Lanyi D, Bekas C, Laino T. "Found in Translation": Predicting Outcomes of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models. arXiv:1711.04810 [cs, stat] [Internet]. 2017 Nov 13 [cited 2017 Dec 14]. Available from: arxiv.org/abs/1711.04810
2. Corey EJ, Wipke WT. Computer-Assisted Design of Complex Organic Syntheses. Science. 1969;166(3902):178–92.
3. Segler MHS, Waller MP. Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. Chem Eur J. 2017 May 2;23(25):5966–71.
4. Monte Carlo tree search [Internet]. [cited 2017 Dec 14]. Available from: habrahabr.ru/post/282522
5. Kayala MA, Baldi P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J Chem Inf Model. 2012 Oct 22;52(10):2526–40.
6. Nam J, Kim J. Linking the Neural Machine Translation and the Prediction of Organic Chemistry Reactions. arXiv:1612.09529 [cs] [Internet]. 2016 Dec 29 [cited 2017 Dec 14]. Available from: arxiv.org/abs/1612.09529
7. Found in Translation: Neural Networks Predict Outcomes in Chemistry [Internet]. IBM Research Blog. 2017 [cited 2017 Dec 14]. Available from: https://www.ibm.com/blogs/research/2017/12/neural-networks-organic-chemistry/
8. IBM Research - Zurich. Found in Translation chemistry app [Internet]. 2017 [cited 2017 Dec 14]. Available from: www.zurich.ibm.com/foundintranslation

Source: https://habr.com/ru/post/371099/

