Data mining makes scientific discoveries

New Scientist has published an interesting article on how data mining is used to analyze large volumes of scientific information. The goal is to find valuable patterns scattered across isolated scientific articles. People are unlikely to detect these patterns on their own, without automated processing. This is not surprising, since the number of scientific documents published on the Internet in English alone has already exceeded 100 million. This is a huge amount of information noise, from which it is almost impossible, at least for the human mind, to extract anything useful.

It is clear that modern science is impossible without data mining. For example, petabytes of data from the Large Hadron Collider are processed for months or years to determine the presence or absence of effects predicted by a particular theory. But here we are talking about a more "subtle" kind of analysis: examining scientific results from different authors in search of hidden patterns and coincidences.

For example, a Californian supercomputer called KnIT works on such tasks constantly. It analyzes 50,000 scientific articles per hour. In one case, it analyzed all the information related to a protein called p53 and searched for data on the enzymes that interact with it, known as kinases.

The p53 protein is very important and is considered the "guardian of the genome": it suppresses the formation of cancerous tumors in the body. The supercomputer looked for all references in scientific articles that might indicate the existence of undiscovered kinases for p53. As a test, it analyzed research papers published up to 2003 and found 7 kinases that were actually discovered over the following 10 years. That is, the system showed that it can make real scientific discoveries. In addition, it found 2 more kinases that are still unknown to science. Initial laboratory experiments supported the supercomputer's predictions (although a group of scientists wants to repeat the experiments to be sure).
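
The test described above is a form of retrospective validation: predictions are made using only literature up to a cutoff year, then checked against discoveries reported later. The sketch below illustrates that idea only. It is not KnIT's actual method, and the function `retrospective_hits`, the `Discovery` record, and the placeholder names such as `kinase_A` are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discovery:
    name: str  # placeholder identifier, not a real kinase
    year: int  # year the interaction was first reported


def retrospective_hits(ranked_candidates, discoveries, cutoff_year, top_k):
    """Return the top-k candidates that were first reported after cutoff_year."""
    later = {d.name for d in discoveries if d.year > cutoff_year}
    return [c for c in ranked_candidates[:top_k] if c in later]


# Toy data (assumption): candidates ranked using only pre-cutoff literature.
ranked = ["kinase_A", "kinase_B", "kinase_C", "kinase_D"]
known = [
    Discovery("kinase_B", 2006),
    Discovery("kinase_D", 2011),
    Discovery("kinase_X", 1999),
]

top_k = 3
hits = retrospective_hits(ranked, known, cutoff_year=2003, top_k=top_k)
precision_at_k = len(hits) / top_k  # share of top-k predictions confirmed later
print(hits, precision_at_k)
```

A high share of later-confirmed predictions in such a test suggests that the remaining, still unconfirmed candidates are worth checking in the laboratory.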

The KnIT developers from IBM and Baylor College of Medicine recently presented a report on this topic at the Knowledge Discovery and Data Mining conference in New York. Their main thesis is that human scientists are better suited to generating new information, while computers are better suited to analyzing the huge body of data that results.

Of course, KnIT is not the only project in this area, where research is very active. For example, the authors of the Eve system from Manchester claim that it has already found a new cure for malaria. That program did not study scientific papers; instead, it emulated experiments in this area itself, trying different drug variants.

Source: https://habr.com/ru/post/240067/
