
The use of IT in the search for new drugs: an overview of areas

Drug companies spend a lot of money on research: in the ranking of the largest R&D budgets, every company in the Healthcare section is a pharmaceutical company searching for new drugs. The reason is that searching for a drug is like looking for a needle in a haystack, and information technology helps speed up this search and save money. The role of IT in drug discovery is partly covered in Habr articles in the "Bioinformatics" and "Biotechnology" hubs, including this one. However, those articles present only the biological approach. This article offers a more comprehensive overview that also takes advances in chemistry into account. If you are interested, read on!

To begin, let's answer the question: what does a medicine need in order to work? The drug molecule must first
1) pass through the body's biological systems to its biotarget. For example, a substance from a pill has to get through the digestive tract (absorption), the circulatory system (distribution) and the liver (metabolism), and must not be removed by the kidneys too quickly (elimination); and then
2) interact with the biotarget, which is usually a specific protein.
The search always starts with the second point: nobody needs a substance that makes it through the body but has no effect. Since on one side we have a small molecule of a chemical substance and on the other a large protein molecule, the problem can be approached from two different sides.

The target approach (target-based, "biological" approach) means that we first choose a biotarget whose crystal structure is known, and then investigate whether it can interact with a large set of molecules about which we may know nothing (that is, even if some information about their properties exists, it is not used). The possibility of interaction is studied by so-called docking. Briefly, the procedure is as follows. We find a structure of the protein of interest crystallized together with a known active substance (a .pdb file, for example from rcsb.org), open it in a program that can visualize protein structures, and remove the ligand (a ligand is a substance that binds to a large molecule). We then create or find a database of molecular structures (a virtual library) and check how well each molecule fits into the vacated pocket. That is, we search for its most favorable position in the pocket using gradient descent (and its refinements) or, less often, a genetic algorithm. Difficulties: 1) the structures of many proteins are hard to obtain because the proteins cannot be crystallized; 2) a small molecule can adopt many spatial arrangements (conformations), and its conformation may also change upon binding to the protein (induced fit).
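To make the pose search concrete, here is a toy sketch under heavy simplifying assumptions. The ligand is rigid and is only translated, with no rotations or conformational changes. The hypothetical scoring function simply penalizes deviations of every ligand-pocket atom distance from an ideal contact distance `r0`. Real docking programs use physics-based or empirical scoring functions and also search rotations and torsions. The names `score`, `gradient` and `dock_translation` are illustrative only.

```python
import math

# Toy model: a rigid ligand is only translated; rotations and torsions are ignored.

def score(ligand, pocket, shift, r0=2.0):
    """Hypothetical score (lower is better): squared deviation of every
    ligand-pocket atom distance from an ideal contact distance r0."""
    total = 0.0
    for atom in ligand:
        pos = [a + s for a, s in zip(atom, shift)]
        for p in pocket:
            total += (math.dist(pos, p) - r0) ** 2
    return total

def gradient(ligand, pocket, shift, r0=2.0):
    """Analytic gradient of score() with respect to the translation vector."""
    grad = [0.0, 0.0, 0.0]
    for atom in ligand:
        pos = [a + s for a, s in zip(atom, shift)]
        for p in pocket:
            diff = [x - y for x, y in zip(pos, p)]
            r = math.sqrt(sum(d * d for d in diff))
            if r == 0.0:
                continue  # derivative undefined when two atoms coincide
            coef = 2.0 * (r - r0) / r
            for k in range(3):
                grad[k] += coef * diff[k]
    return grad

def dock_translation(ligand, pocket, start, lr=0.05, steps=500, r0=2.0):
    """Plain gradient descent on the translation; returns (shift, final score)."""
    shift = list(start)
    for _ in range(steps):
        g = gradient(ligand, pocket, shift, r0)
        shift = [s - lr * gk for s, gk in zip(shift, g)]
    return shift, score(ligand, pocket, shift, r0)

if __name__ == "__main__":
    # Six pocket atoms 2 Å from the origin; one ligand atom starting off-center.
    pocket = [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0), (0.0, 2.0, 0.0),
              (0.0, -2.0, 0.0), (0.0, 0.0, 2.0), (0.0, 0.0, -2.0)]
    best_shift, best_score = dock_translation([(0.0, 0.0, 0.0)], pocket,
                                              start=(0.5, 0.3, -0.2))
    print(best_shift, best_score)
```

In the usage example, a single ligand atom that starts off-center inside a symmetric six-atom pocket drifts back to the center, where every distance equals `r0`. A genetic algorithm would replace the descent loop with a population of candidate shifts that are mutated and selected by the same score.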
There is another way: molecular design (although this phrase can mean many things, here it fits best). Instead of placing ready-made molecules into the pocket, molecular fragments are placed at the positions where the known ligand made its interactions. These can be the same fragments or others that interact in the same way (these are called bioisosteres). The fragments are then joined by inserting atoms between them so that the resulting conformation is stable. However, this method has not become widespread, apparently because the problem is hard to solve algorithmically.
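One crude geometric check from the linking step can be sketched as follows. The sketch assumes the linker is a fully extended chain of sp3 carbon atoms, with a C-C bond of 1.54 Å and a tetrahedral angle of 109.47°. Under that assumption, each bond advances the chain by about 1.26 Å along its axis, and the small sideways offset of odd-length chains is ignored. The sketch finds the smallest number of inserted atoms whose chain can span the distance between the two fragments' attachment points. It does not evaluate the stability of the resulting conformation, which a real design tool would have to do. `min_linker_atoms` is a hypothetical helper.

```python
import math

CC_BOND = 1.54        # Å, typical C-C single bond length
TETRAHEDRAL = 109.47  # degrees, ideal sp3 bond angle
# Advance along the chain axis per bond in a fully extended (all-trans) zigzag chain.
STEP = CC_BOND * math.sin(math.radians(TETRAHEDRAL / 2))  # about 1.26 Å

def min_linker_atoms(distance, max_atoms=10):
    """Hypothetical helper: smallest number of sp3 carbons to insert between two
    fragment attachment atoms `distance` Å apart, or None if more than max_atoms
    would be needed. n inserted atoms form n + 1 bonds."""
    for n_atoms in range(max_atoms + 1):
        if (n_atoms + 1) * STEP >= distance:
            return n_atoms
    return None

if __name__ == "__main__":
    for d in (2.5, 5.0, 7.5):
        print(d, "Å ->", min_linker_atoms(d), "linker atoms")
```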

The ligand approach (ligand-based, "chemical" approach) means that we have data on the activity of a number of compounds and want to find even more active ones. What does "activity" mean here? It can be quantitative or qualitative experimental results that measure a response to the compound. The response can come from living systems, in vivo (a mouse, an isolated organ or tissue, a microorganism, a cell culture, etc.), or from non-living ones, in vitro (protein molecules). In this case we may not know the structure of the protein. We may not even know the biotarget, and have only the response of the organism or cells to the compound. The idea of the approach is that, using information about the structures of active and inactive compounds, we can interpolate and (within reasonable limits) extrapolate the results to new compounds that have not yet been synthesized. Again there are several options: pharmacophore modeling and quantitative structure-activity relationship modeling.
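One of the simplest ways to interpolate activity from known compounds is a similarity-based (k-nearest-neighbour) prediction. It is shown here only as an illustration of the interpolation idea, not as one of the two methods named above. In this sketch, each molecule is represented as a hypothetical set of structural-fingerprint bits, and similarity is the Tanimoto coefficient. The predicted activity is the similarity-weighted mean of the k most similar tested compounds.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def predict_activity(query, training, k=3):
    """Similarity-weighted mean activity of the k most similar tested compounds.
    `training` is a list of (fingerprint_set, activity) pairs.
    Returns None when no neighbour shares any bit with the query."""
    neighbours = sorted(((tanimoto(query, fp), act) for fp, act in training),
                        reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None
    return sum(sim * act for sim, act in neighbours) / total

if __name__ == "__main__":
    # Hypothetical fingerprints and activity values.
    training = [({1, 2, 3, 4}, 7.0), ({1, 2, 3, 5}, 6.0), ({10, 11, 12}, 4.0)]
    print(predict_activity({1, 2, 3, 6}, training, k=2))
```

Such a model can only interpolate between known structures, which is one reason the regression-based QSAR models described below are more common.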
A pharmacophore is a set of molecular fragments with a particular spatial arrangement whose presence makes a molecule active. Instead of the molecular fragments themselves, pharmacophore "features" are more commonly used. These are abstractions that cover a fragment together with its possible bioisosteres, for example "hydrogen bond donor", "hydrogen bond acceptor", "hydrophobic group", "aromatic fragment", "π-bond", etc. For each possible combination of features, the best coordinates for the feature spheres are again sought by gradient descent. The goal is that the matching fragments of active compounds fall into the corresponding spheres and those of inactive compounds do not. Difficulties: the 3D geometry of molecules must be determined accurately for many conformations, and a significant number of already tested compounds is needed.
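The sphere-matching test at the core of pharmacophore modeling can be sketched as below. The sketch assumes that each molecule's features have already been assigned types and aligned in a common coordinate frame; that alignment is the hard conformational part mentioned above. `matches` checks that every sphere contains a feature of the right type. `pharmacophore_score` rewards a candidate pharmacophore that matches actives and rejects inactives. An optimizer would then move the sphere centers to maximize this score. All names and coordinates are hypothetical.

```python
import math

def matches(pharmacophore, features):
    """True if every sphere (type, center, radius) contains at least one
    molecular feature (type, xyz) of the same type. Coordinates are assumed
    to be pre-aligned in a common frame."""
    for ftype, center, radius in pharmacophore:
        if not any(t == ftype and math.dist(xyz, center) <= radius
                   for t, xyz in features):
            return False
    return True

def pharmacophore_score(pharmacophore, actives, inactives):
    """Fraction of actives matched minus fraction of inactives matched
    (both lists must be non-empty). An optimizer would move the sphere
    centers to maximize this value."""
    hit_act = sum(matches(pharmacophore, m) for m in actives) / len(actives)
    hit_inact = sum(matches(pharmacophore, m) for m in inactives) / len(inactives)
    return hit_act - hit_inact

if __name__ == "__main__":
    model = [("donor", (0.0, 0.0, 0.0), 1.0), ("aromatic", (4.0, 0.0, 0.0), 1.5)]
    active = [("donor", (0.2, 0.0, 0.0)), ("aromatic", (4.5, 0.3, 0.0))]
    inactive = [("donor", (0.0, 0.0, 0.0)), ("aromatic", (8.0, 0.0, 0.0))]
    print(pharmacophore_score(model, [active], [inactive]))
```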
Quantitative structure-activity relationship (QSAR) modeling: if the structure of compounds is described by numerical characteristics, finding the relationship reduces to a standard regression task (if the response, the activity, is a continuous value) or a classification task (if the response is a discrete nominal value). These numerical characteristics are called molecular descriptors and are usually calculated with ready-made software (Dragon, CDK, MOE, Accelrys Discovery Studio, etc.). Descriptors are graded from 0D to 3D according to the level of detail at which the molecular structure is represented. Examples:
- 0D: the number of carbon atoms or the molecular weight;
- 1D: the number of hydroxyl groups;
- 2D: descriptors based on graph theory, for example the Wiener index or the eigenvalues of adjacency matrices;
- 3D: quantum-chemical descriptors such as the energy of the highest occupied molecular orbital or the heat of formation.
Some sources also distinguish 4D descriptors: vectors of various potentials (electrostatic, steric) computed on a spatial grid of points. QSAR modeling that uses the latter is called 3D-QSAR.
The most common machine learning methods are multiple linear regression, projection to latent structures (PLS), neural networks (for both regression and classification) and Random Forest; other algorithms are used less often. The number of molecular descriptors is often much larger than the number of compounds studied, so variable selection methods are applied: a genetic algorithm, stepwise regression, or filtering (for example, discarding descriptors that do not correlate with the response). Difficulties: a significant number of already tested compounds is needed, and 3D and 4D descriptors depend on an accurate determination of the molecules' 3D geometry.
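A minimal QSAR sketch in plain Python follows. It computes one 2D descriptor, the Wiener index: the sum of shortest-path distances between all pairs of heavy atoms in the hydrogen-suppressed molecular graph. It then drops descriptors whose correlation with the response is weak, which is a simple filter-style variable selection, and fits a one-variable least-squares line. The activity values in the example are hypothetical numbers chosen to lie exactly on a line. A real model would use measured data, many descriptors and a method such as multiple linear regression, PLS or Random Forest.

```python
import math
from collections import deque

def wiener_index(adjacency):
    """Wiener index: sum of shortest-path (bond-count) distances over all
    pairs of heavy atoms; adjacency[i] lists the neighbours of atom i."""
    n = len(adjacency)
    total = 0
    for start in range(n):
        dist = [-1] * n
        dist[start] = 0
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for d in dist if d > 0)
    return total // 2  # every pair was counted from both ends

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def select_descriptors(descriptors, response, min_abs_r=0.3):
    """Filter-style selection: keep descriptors with |r| >= min_abs_r."""
    return [name for name, values in descriptors.items()
            if abs(pearson(values, response)) >= min_abs_r]

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hydrogen-suppressed graphs of four alkanes (atom index -> neighbours).
MOLECULES = {
    "n-butane": [[1], [0, 2], [1, 3], [2]],
    "isobutane": [[1, 2, 3], [0], [0], [0]],
    "n-pentane": [[1], [0, 2], [1, 3], [2, 4], [3]],
    "neopentane": [[1, 2, 3, 4], [0], [0], [0], [0]],
}

if __name__ == "__main__":
    wiener = [wiener_index(g) for g in MOLECULES.values()]
    n_atoms = [len(g) for g in MOLECULES.values()]
    activity = [6.0, 5.5, 11.0, 9.0]  # hypothetical values, not measured data
    kept = select_descriptors({"wiener": wiener, "n_atoms": n_atoms}, activity)
    print(wiener, kept, fit_line(wiener, activity))
```

For these four graphs, the Wiener indices are 10, 9, 20 and 16, so each branched isomer scores lower than its linear counterpart.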
Docking, pharmacophore and QSAR models are then used for virtual (in silico) screening. That is, the activity of a compound is estimated before it has actually been synthesized. This eliminates a huge number of obviously inactive compounds, while compounds with high predicted activity are synthesized and sent for biological testing. If a compound proves active and non-toxic, its pharmacokinetic properties are studied: absorption, volume of distribution, rate of biotransformation and elimination (the ADME complex). QSAR modeling can also be used to predict ADME properties, but that's another story.
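The screening step itself amounts to scoring a virtual library with a trained model and keeping only the most promising compounds for synthesis. In this sketch, `predict` is a stand-in for any docking, pharmacophore or QSAR model. The library, descriptor values and threshold are hypothetical.

```python
def virtual_screen(library, predict, threshold, top_n):
    """Score every compound in `library` (name -> descriptors) with `predict`,
    drop those below `threshold` and return the top_n (name, score) pairs,
    best first."""
    scored = [(name, predict(desc)) for name, desc in library.items()]
    hits = [item for item in scored if item[1] >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:top_n]

if __name__ == "__main__":
    # Hypothetical model and library: any trained docking, pharmacophore or
    # QSAR scorer could be plugged in as `predict`.
    model = lambda d: 0.5 * d["wiener"] + 1.0
    library = {"cmpd-A": {"wiener": 10}, "cmpd-B": {"wiener": 20},
               "cmpd-C": {"wiener": 4}}
    print(virtual_screen(library, model, threshold=5.5, top_n=5))
```

Passing the model as a plain function keeps the screening loop independent of how activity is predicted, so the same filter works for docking scores (where lower is usually better, after flipping the sign) and for QSAR predictions.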

An off-topic bonus: in the CIS countries, the chemistry of medicines (called medicinal chemistry abroad) was mistakenly translated as "medical chemistry", and that translation is still in use today.

Source: https://habr.com/ru/post/191864/

