
What questions can be answered by analyzing 1,500,000 unique case histories?

Is there an association between asthma and schizophrenia?
Diabetes and bipolar disorder: can they have something in common?
Can an analysis of a database of 1,500,000 US patients reveal such non-trivial connections?

Warning: a lot of text under the cut.

This article is based on the talk “Autism and Mendelian Diseases” given by Andrei Yurevich Rzhetsky at the First International Conference “Autism: Challenges and Solutions”. More about him and about the data analysis follows.
Andrei Yurevich Rzhetsky
Andrei Rzhetsky is a professor of medicine and human genetics at the Institute for Genomics and Systems Biology at the University of Chicago. He is also the director of the Conte Genomic Bioinformatics Center for Neuropsychiatric Diseases. A. Rzhetsky graduated from Novosibirsk State University and defended his thesis at the Institute of Cytology and Genetics in Novosibirsk. In 1991 he left for the United States as a postdoc.
Scientific interests:
1) bioinformatics and phylogenetics as applied to the analysis of genes, proteins, molecular metabolic pathways;
2) application of statistics to sequence analysis and molecular network analysis;
3) development of algorithms and programs for the analysis and comparison of metabolic pathways and sequences, phylogenetic reconstruction.
As a mathematician and theoretical biologist, Andrei Rzhetsky is a leading expert in the development of new bioinformatics approaches to the analysis of biological complexes and diseases. The scientist is a pioneer in developing strategies for bioinformatic mapping of diseases through a comprehensive analysis of genetic data.
Andrei Yurevich is so well known in the USA that Google even offers several search suggestions for his last name.


Autism
Autism is a disorder of nervous system development characterized by difficulties in social interaction and communication, as well as restricted and repetitive behavior. According to the diagnostic criteria, the symptoms should be evident by the time a child is three years old. Autism affects how information is processed in the brain by altering the way nerve cells and their synapses are organized and connected. How this happens is not entirely clear.
Approximate translation from the English Wikipedia

Mendelian diseases
Mendelian diseases and traits: diseases or traits that result from the expression of a single gene with a large effect on the phenotype. They are inherited according to Mendel's laws. Examples of Mendelian diseases: cystic fibrosis, sickle cell disease, Huntington's disease and hemophilia.
From the Internet

Abstract


Biology has accumulated a huge amount of data that can only be processed with a computer. Andrei Rzhetsky's group undertook to process data on neuropsychiatric disorders. They do not process a single kind of data in isolation, be it genetic causes, environmental factors or clinical results, but all the data together, and this gives a more complete picture of the causes of the disorders.
In 2004, A. Rzhetsky's group received a grant from the organization Autism Speaks for a two-way analysis of autism (as a biological process and as a developmental disorder), using the rich information accumulated in several related fields. The group collected information about molecular interactions in human neurons and, using its own program (the GeneWays system), examined a wide range of disorders with which autism shows non-random associations (neurological, autoimmune, metabolic, and many other groups of disorders that have a strong hereditary component).
In 2007, the group analyzed 1.5 million case histories. The essence of the work is to study where different diseases overlap at the level of specific genes. The researchers concluded that some groups of genes can predispose a person to several diseases, while others can predispose a person to one disease and protect against another. The same mutation in a gene can either correlate with another disease or protect against it, so that the two diseases do not occur together. Models fitted to the autism data also revealed a possible susceptibility to bipolar disorder. In addition, a common group of genes was found when comparing migraine with autism, as well as a connection between infections and many neuropsychiatric disorders, including autism. A. Rzhetsky's group was the first to measure these correlations.

The graph (below) shows the correlation of some common diseases. Red lines - positive correlation, blue - negative. Line thickness - the magnitude of the correlation. The size of the circle corresponds to the sample of patients (from 20 to 136 thousand).

Autism and Mendelian diseases


Dr. Rzhetsky opened his presentation with a slide showing a familiar frame from the Russian film about Sherlock Holmes. This is no accident: Holmes succeeded as a detective by paying attention to details that most ordinary observers consider insignificant. This inspires Rzhetsky, who is likewise confident that it is the small details that can help solve many biological puzzles.
He uses the following metaphor: illness is a crime, data is evidence.
Research goal: building a model that yields a result (finding the "criminal", that is, the cause of the disease).
There are two symbolic images: the Hedgehog and the Fox. The Fox knows a lot of little tricks, the Hedgehog knows only one reliable trick.
In the book “The Signal and the Noise”, Nate Silver analyzes many scientific predictions. If you look at which predictions work and which fail, the “Foxes” predict better than the “Hedgehogs”.

The problem of working with statistical data is that there are two approaches, and the divide between them is comparable to a religious one.
The Bayesian approach makes it possible to state how much we can trust the results and to express assumptions in quantitative terms.
The problem of building a reliable model is that phenotype + genome + environment must be combined to get a model with useful predictions. For example, one that can assess a child's predisposition to a certain disease.
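To make the Bayesian idea above concrete, here is a minimal sketch of a single Bayesian update for a yes/no hypothesis. The function name and all the numbers are illustrative assumptions, not values from the talk.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a binary hypothesis H after observing evidence E.

    prior               -- P(H) before seeing the evidence
    p_evidence_if_true  -- P(E | H)
    p_evidence_if_false -- P(E | not H)
    """
    numerator = prior * p_evidence_if_true
    evidence = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / evidence


# Hypothetical numbers: a 1% prior, evidence seen in 80% of true cases
# and in 10% of the remaining cases.
belief = posterior(0.01, 0.80, 0.10)
print(f"belief after the evidence: {belief:.3f}")
```

The output is a probability, which is exactly the "how much can we trust this" quantity the Bayesian approach is meant to provide.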

So, we decided to analyze a variety of diseases. Why? Because the classification of diseases is in many ways artificial. Autism itself is certainly a “container of diseases” with different causes and different genetics.

A small digression: Churchill, Martin Luther King, General Sherman, Roosevelt, Kennedy, Gandhi
What do they have in common (except that they are famous and dead)?
The answer is here
What they have in common is that they had bipolar disorder (manic-depressive psychosis). Churchill spoke about his state of apathy as the “black dog of depression”.
Affective disorders are common among successful politicians.

What is the autistic phenotype? Interestingly, Asperger already described, in the group he singled out, “the inability to form social skills” and “absorption in fine details”, and he also noted “awkward movements”. He called autistic children “little professors”. All of this we still use as criteria for autism.
A little bit of autism is simply necessary for success in science. We do not know exactly which scientists of the past had autism (Newton and Tesla are suspected), but many scientists had schizophrenia and bipolar disorder.

The book “The Invisible Plague” claims that the incidence of neurological and mental diseases has increased over the course of 260 years (a lot of direct and indirect data was processed).
The question of whether we are seeing an increase in cases of autism is very acute: some believe that there is an increase, others that there is not.
The US Centers for Disease Control provides the following statistics on autism: 1:80 for boys, 1:240 for girls.
A Korean study attempted to phenotype the entire population. The researchers “combed through” almost all children in South Korea and found that there are many more cases of autism and that the incidence is increasing. According to their data, autism occurs in 4% of boys and 1.5% of girls.

Why can there be such different points of view when we talk about statistics and analysis?
The reasons:
  1. diagnostic criteria change;
  2. there are economic reasons: for example, a diagnosis may be given because of financial interest;
  3. doctors can diagnose differently.

However, according to Andrei Rzhetsky, diseases such as autism are nevertheless increasing in frequency.

What is necessary to build a plausible model of autism? We modeled the environment and the genome as random variables. For example, whether an infection is present or not is a random variable, and a change in the genome is also a random (genetic) variable. We take P1 and P2 as two phenotypes (for example, autism and diabetes, or autism and schizophrenia), and they will definitely have “common factors”. We can then build many models in which P1 does or does not overlap with P2 in environmental factors, in the genome, or in the phenotype.
The problem is that all existing genotype-phenotype models are currently very simple and are not suitable for describing diseases as complex as autism. And there are practically no models that include the environment at all.
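As a rough illustration of two phenotypes that share a factor, the sketch below treats each genetic or environmental factor as an independent binary random variable and enumerates every combination exactly. The factor names, probabilities and the rules linking factors to phenotypes are all invented for the example; this is not the model used by the group.

```python
from itertools import product


def phenotype_probabilities(factor_probs, rule_p1, rule_p2):
    """Return (P(P1), P(P2), P(P1 and P2)) by exact enumeration.

    factor_probs -- dict: factor name -> probability that the factor is present
    rule_p1/p2   -- functions mapping a dict of 0/1 factor states to True/False
    """
    names = list(factor_probs)
    p1 = p2 = both = 0.0
    for state in product([0, 1], repeat=len(names)):
        # Probability of this particular combination of independent factors.
        weight = 1.0
        for name, value in zip(names, state):
            p = factor_probs[name]
            weight *= p if value else 1.0 - p
        factors = dict(zip(names, state))
        has_p1 = bool(rule_p1(factors))
        has_p2 = bool(rule_p2(factors))
        p1 += weight * has_p1
        p2 += weight * has_p2
        both += weight * (has_p1 and has_p2)
    return p1, p2, both


# Hypothetical factors: one shared genetic factor, one private factor per
# phenotype, and one environmental factor (an infection).
FACTORS = {"shared_gene": 0.05, "gene_1": 0.02, "gene_2": 0.03, "infection": 0.10}


def rule_p1(f):
    return f["shared_gene"] or f["gene_1"]


def rule_p2(f):
    return f["shared_gene"] or (f["gene_2"] and f["infection"])


p1, p2, both = phenotype_probabilities(FACTORS, rule_p1, rule_p2)
print(f"P(P1)={p1:.4f}  P(P2)={p2:.4f}  P(both)={both:.4f}  independent={p1 * p2:.4f}")
```

When the shared factor exists, the two phenotypes co-occur far more often than they would if they were independent. If the probability of the shared factor is set to zero, the joint probability falls back to the product of the two marginals.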

In addition, we do not know how to model: we do not know what should be included in the model.
Donald Rumsfeld (US Secretary of Defense) said: “There are things we know that we know. There are things we know that we do not know. But there are also things we do not know that we do not know.”
We also distinguish three types of factors: “known knowns” are well-studied factors that are always taken into account; “known unknowns” are insufficiently studied factors that are suspected of affecting the result; and “unknown unknowns” are factors that influence the process we are studying but whose existence we do not even suspect.
An example of a genotype-phenotype-environment relationship:
Genotype: recessive mutation in the X chromosome
Phenotype: deficiency of the protein coagulation factor VIII (Hemophilia A)
Environment: the blood of hundreds of thousands of people is used for treatment.
Result: more than 80% of hemophilia patients in the United States suffer from AIDS and hepatitis (because donors were once not screened for these diseases).

When environmental factors are obvious:
Obesity in the USA: the number of overweight people is growing too fast to be explained by the genome, because the growth has occurred within one or two generations.

How does the environment affect autism? There is not enough data yet.
In order to add “known unknowns” to the model, many parents were interviewed.
What they report are not causes of autism but factors to consider. For example: the mother lived on the edge of a corn field, the field was treated with pesticides, and this could have had an effect. Or another factor: an infectious disease, a high fever, and then regression (the child loses speech and coordination of movements). All factors must be considered when modeling; they cannot be dismissed.
Vaccination is a battlefield: does it cause autism or not? We tested the hypothesis that vaccination alone causes autism. This hypothesis was rejected (although there are many questions about this study). But the combination of factors remains unexplored: genome + vaccination, and such a theory may be valid.
Together with James A. Evans, Rzhetsky investigated which factors should be included in a genetic model of autism. They interviewed a number of scientists who work on autism. They expected to find broad agreement with islands of disagreement, but found an ocean of disagreement with small islands of unity.
Therefore, the model included as many factors as possible.

How is genetic research actually performed?
The task is simple when a single chromosome has to be compared: it is then easy to find the altered region that leads to the disease. But when there is more than one such region, and several chromosomes are involved, the task becomes much more complicated. A person has about 20,000 genes. If you simply look for changes associated with autism in every combination of genes, then the number of possible combinations is:
for 2 genes - 10^8
for 3 genes - 10^12
for 10 genes - 10^37, i.e. the population of the globe is too small to collect enough data for the analysis.
As you can see, what worked for one gene does not work for many.
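The growth in the number of combinations can be checked with a few lines of code. This sketch counts unordered sets of distinct genes with the binomial coefficient, which is one reasonable reading of "combinations"; the gene count of 20,000 is the approximate figure quoted above, and the results land in the same range as the powers of ten listed there.

```python
from math import comb

GENE_COUNT = 20_000  # approximate number of human genes, as quoted in the text


def gene_subsets(k, n=GENE_COUNT):
    """Number of unordered sets of k distinct genes chosen out of n."""
    return comb(n, k)


for k in (2, 3, 10):
    print(f"{k} genes: about {gene_subsets(k):.1e} combinations")
```

Counting ordered tuples instead (n to the power k) gives even larger numbers, so the conclusion does not depend on the exact convention: exhaustive search stops being feasible after a handful of genes.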

The solution is to map the functional relationships of genes and proteins. Where can such a map be obtained? Andrei Yurevich's laboratory analyzed tens of thousands of articles in scientific journals to identify these links.

Fortunately, the genes we are looking for should be located close to each other in functional space; this is a well-analyzed, reliable pattern. So we do not go through all the options one after another, but only those with the greatest correlation between the genome and the phenotype.
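The effect of restricting the search to functionally close genes can be sketched on a toy network. Everything here is hypothetical: the gene names are placeholders, the links are invented, and "close" is taken to mean "within a given number of links", which is only one possible definition.

```python
from collections import deque
from itertools import combinations

# Hypothetical functional links between placeholder genes.
LINKS = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G5", "G6")]


def build_adjacency(links):
    """Turn a list of undirected links into a dict: gene -> set of neighbours."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def within_distance(adjacency, start, max_steps):
    """Genes reachable from start in at most max_steps links (breadth-first search)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        if distance[gene] == max_steps:
            continue  # do not expand beyond the allowed radius
        for other in adjacency[gene]:
            if other not in distance:
                distance[other] = distance[gene] + 1
                queue.append(other)
    return set(distance) - {start}


def close_pairs(adjacency, max_steps):
    """All unordered gene pairs that lie within max_steps links of each other."""
    pairs = set()
    for gene in adjacency:
        for other in within_distance(adjacency, gene, max_steps):
            pairs.add(tuple(sorted((gene, other))))
    return pairs


adjacency = build_adjacency(LINKS)
all_pairs = list(combinations(sorted(adjacency), 2))
print(len(all_pairs), "possible pairs,", len(close_pairs(adjacency, 2)), "within two links")
```

Even on six genes the candidate list shrinks noticeably; on a network of thousands of genes the same restriction removes the vast majority of combinations.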
Why are Mendelian diseases taken for analysis? They are well studied, and it is known that certain places in the genome are responsible for them.
Color coding of Mendelian diseases in further visualization


When we ran the analysis for several diseases, it turned out that the same areas of the molecular network overlap with several diseases.
An example of a hidden connection:

Jodie Foster and Ronald Reagan - what do they have in common?
John Hinckley, trying to impress Jodie Foster, attempted to assassinate Ronald Reagan.

Phenotypes can be compared with the well-known personalities, and the genotype with the hidden connections between them. If we observe a sequence of phenotypes, is it possible to draw conclusions about genetics? Yes, under modeling assumptions this can be done.

Data:


1,500,000 unique patient records, coded with ICD-9 disease codes over the entire life of the patient. Since these data are used to determine insurance reimbursement in the United States, they are imperfect. But, given their huge volume, it would be criminal not to analyze them.
Using a threshold model to describe whether genetic variation manifests as a phenotype, one can evaluate genetic relationships with complex disease phenotypes (like autism). Red edges are the strongest links. Prediction: autism shares genetics with a host of seemingly unrelated diseases. The analysis shows a significant association of autism with infectious diseases and with many diseases of the nervous system.
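The article does not spell out which threshold model was used. As a generic illustration only, here is a sketch of the textbook liability-threshold idea: each person has a normally distributed "liability", and the disease appears when it exceeds a threshold chosen to match the disease prevalence. The function names, the split of liability into a genetic part and a residual, and the numbers are all assumptions.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def liability_threshold(prevalence):
    """Threshold on a standard normal liability scale that yields the given prevalence."""
    return STANDARD_NORMAL.inv_cdf(1.0 - prevalence)


def risk_given_genetic_load(genetic_load, heritability, prevalence):
    """P(disease | genetic part of liability), assuming liability = genetic + residual,
    where the residual is normal with variance (1 - heritability).
    heritability must be strictly below 1 so that the residual has nonzero spread."""
    threshold = liability_threshold(prevalence)
    residual = NormalDist(mu=genetic_load, sigma=(1.0 - heritability) ** 0.5)
    return 1.0 - residual.cdf(threshold)


# Hypothetical disease: 1% prevalence, 80% of liability variance is genetic.
for load in (0.0, 1.0, 2.0):
    print(f"genetic load {load:+.1f}: risk {risk_given_genetic_load(load, 0.8, 0.01):.4f}")
```

The point of the sketch is only that a continuous, partly genetic liability can produce a yes/no diagnosis, which is what makes it possible to reason about shared genetics from diagnosis records.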
Mendelian disease correlates with autism, bipolar disorder and schizophrenia


Finally, the graph below shows the correlation of some common diseases from a database of 1,500,000 patients. Red lines - positive correlation, blue - negative. Line thickness is the magnitude of correlation. The size of the circle corresponds to the sample of patients (from 20 to 136 thousand).
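For readers who want to try something similar on their own data, the sketch below computes a simple correlation between two diagnoses across patient records. It uses the phi coefficient (Pearson correlation of two yes/no variables), which is a stand-in chosen for illustration and not necessarily the measure behind the graph. The records and the codes "A", "B", "C" are made up.

```python
from math import sqrt


def phi_coefficient(records, code_a, code_b):
    """Phi coefficient between the presence of two diagnosis codes.

    records -- list of sets, each holding the diagnosis codes of one patient
    Returns 0.0 when either code is present in none or in all of the records.
    """
    n = len(records)
    n_a = sum(code_a in r for r in records)
    n_b = sum(code_b in r for r in records)
    n_ab = sum(code_a in r and code_b in r for r in records)
    denominator = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denominator == 0:
        return 0.0
    return (n * n_ab - n_a * n_b) / denominator


# Toy records with placeholder codes.
records = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, {"C"}, {"C"}, set(), set()]

print(f"A vs B: {phi_coefficient(records, 'A', 'B'):+.3f}")  # positive: a "red line"
print(f"A vs C: {phi_coefficient(records, 'A', 'C'):+.3f}")  # negative: a "blue line"
```

A positive value corresponds to diagnoses that appear together more often than chance would predict, a negative value to diagnoses that tend to exclude each other.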


During the lecture, the professor showed a table of correlations between complex diseases and Mendelian diseases from an unpublished work in which 10,000,000 (yes, 10 million) unique medical records were analyzed:


Conclusions


The overlap of genomic regions across different diseases has been demonstrated.
Every complex disease has a genetically related set of Mendelian diseases.
By analyzing and combining the data, we are getting closer to building a model of autism.
I hope you have not all fallen asleep :)

ATTENTION


If you have interesting results in the search for connections, if you work on comparing data sets, or if you are engaged in genetic research, the laboratory of Andrei Yurevich Rzhetsky is interested in broad and mutually beneficial cooperation.
Contact them! (links at the bottom of the topic)

Thanks:
I thank the ITek company where I work, and my managers Yuri Balitsky and Roman Kalashnikov, for granting me three working days off during the busy season for our technical support service.
I thank the professional community of practice "Preventive Medicine" for the first international conference on autism, at which we could hear the wonderful talk by A. Yu. Rzhetsky.
I express my sincere gratitude to the Ditina Maybutnym Foundation and personally to Inna Sergienko and Larisa Rybchenko, as well as to the head of the BF Association of Parents of Children with Autism, Yevgenia Panichevskaya. Thank you for your trust and for the opportunity to represent all of you at the 1st Moscow International Conference "Autism: Challenges and Solutions".
I express my gratitude to the director of the Coming Out Foundation, Evgenia Mishina, who provided invaluable material and moral assistance in Moscow, and to you, my wonderful Svetlana Moiseeva and Alya Yanushevich, thanks to whom I did not have to spend the night at the train station. And, of course, to everyone who organized this and volunteered: Ekaterina Men, Yana Zolotovitskaya from the Center for Autism Problems, and everyone else.

Selected publications by A. Rzhetsky:



References:


One of the books written by A. Yu. Rzhetsky in collaboration with A. A. Zharkikh in the Soviet era, “A new approach to the reconstruction of phylogenies based on the analysis of many gene families”: books.google.com.ua/books/about/%D0%9D%D0%BE%D0%B2%D1%8B%D0%B9_%D0%BF%D0%BE%D0%B4%D1%85%D0%BE%D0%B4_%D0%BA_%D1%80%D0%B5%D0%BA%D0%BE%D0%BD%D1%81.html?id=RTPGHAAACAAJ&redir_esc=y
Andrei Yurevich’s website: www.ci.uchicago.edu/research/rzhetsky
Andrei Rzhetsky in the Biomedexperts catalog www.biomedexperts.com/Profile.bme/1652205/Andrey_Rzhetsky
Articles on the results of research:
Network properties of genes harboring inherited disease mutations www.pnas.org/content/105/11/4323.full
Probing genetic overlap among complex human phenotypes www.pnas.org/content/104/28/11694.full

Source: https://habr.com/ru/post/178457/

