
How artificial intelligence is changing science

The latest AI algorithms are probing the evolution of galaxies, computing quantum wave functions, discovering new chemical compounds, and more. Is there any part of a scientist’s job that cannot be automated?




No human being, or even a group of them, can keep up with the waterfall of information produced by many of today’s physics and astronomy experiments. Some of them churn out terabytes of data per day, and the stream is only increasing. The Square Kilometre Array, a radio telescope slated to switch on in the mid-2020s, will generate about as much data traffic each year as the entire internet.

This deluge has many scientists turning to artificial intelligence (AI) for help. With minimal human input, AI systems such as neural networks (computer-simulated networks of neurons that mimic the function of the brain) can plow through mountains of data, flagging anomalies and spotting patterns that humans could never have noticed.

Of course, computers have been aiding scientific research for about 75 years, and the practice of manually poring over data in search of meaningful patterns originated millennia ago. But some scientists argue that the latest techniques in machine learning and AI represent a fundamentally new way of doing science. One such approach, generative modeling (GM), can help identify the most plausible theory among competing explanations for observational data, based solely on the data itself and without any preprogrammed knowledge of what physical processes might be at work in the system under study. Proponents of generative modeling see it as novel enough to be considered a potential “third way” of learning about the universe.
Traditionally, we acquire knowledge about nature through observation. Johannes Kepler, for example, pored over Tycho Brahe’s tables of planetary positions, trying to find the underlying law (he eventually worked out that planets move in elliptical orbits). Science has also advanced through simulation: an astronomer might model the movement of the Milky Way and its neighboring galaxy, Andromeda, and predict that they will collide in a few billion years. Both observation and simulation help scientists generate hypotheses that can then be tested with further observations. Generative modeling differs from both of these approaches.

“It’s basically a third approach, between observation and simulation,” says Kevin Schawinski, an astrophysicist and one of generative modeling’s most enthusiastic proponents, who until recently worked at the Swiss Federal Institute of Technology (ETH Zurich). “It’s another way to attack a problem.”

Some scientists see generative modeling and other new techniques simply as power tools for doing traditional science. But most agree that AI is having an enormous impact, and that its role in science will only grow. Brian Nord, an astrophysicist at Fermi National Accelerator Laboratory who uses artificial neural networks to study the cosmos, is among those who fear that there is nothing a human scientist does that will be impossible to automate. “It’s a bit of a chilling thought,” he said.

Discovery by generation


Schawinski began making a name for himself in data science while still a graduate student. Working on his doctoral dissertation, he faced the task of classifying thousands of galaxies based on their appearance. No ready-made software existed for the job, so he decided to crowdsource it, and so the Galaxy Zoo citizen-science project was born. Beginning in 2007, ordinary users helped astronomers by logging their best guesses as to which category a galaxy belonged in, with the majority vote usually classifying it correctly. The project was a success, but, as Schawinski notes, AI has since made it obsolete: “Today, a talented scientist with a background in machine learning and access to cloud computing could do the whole thing in half a day.”

Schawinski turned to the powerful new tool of generative modeling in 2016. In essence, generative modeling asks: how likely is it that, given condition X, you will observe outcome Y? The approach has proved incredibly potent and versatile. Suppose, for example, that you feed a generative model a set of images of human faces, each labeled with the person’s age. The program combs through these training data and begins to find connections between older faces and an increased likelihood of wrinkles. Eventually it can estimate the age of any face it is given, and predict the physical changes a face of any given age is likely to undergo.
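As a toy rendering of that conditional question (given X, how likely is Y?), the following minimal sketch fits a model of a made-up “wrinkle score” given age from synthetic data, then samples plausible outcomes for a new age. Everything here (the data, the feature, the linear form) is an invented stand-in for the far richer image-based models discussed in the article.

```python
# Hypothetical toy example: learn P(wrinkle score | age) from data alone.
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(18, 90, 1000)
wrinkles = 0.8 * age + rng.normal(0, 5, 1000)   # hidden "true" relation

# Learn the conditional mean and spread purely from the training data.
slope, intercept = np.polyfit(age, wrinkles, 1)
residual_std = np.std(wrinkles - (slope * age + intercept))

def sample_wrinkles(a, n=5):
    """Draw plausible wrinkle scores for a person of age `a`."""
    return slope * a + intercept + rng.normal(0, residual_std, n)

print(sample_wrinkles(70))   # several plausible outcomes for a 70-year-old
```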


None of these faces is real. The top row (A) and left column (B) were created by a generative adversarial network (GAN) using building blocks derived from elements of real faces. The GAN then combined the coarse features of the faces in row A (gender and overall face shape, for example) with the finer features of the faces in column B, such as hair and eye color, to create the faces filling the rest of the grid.

The best-known generative modeling systems are generative adversarial networks (GANs). After adequate exposure to training data, a GAN can repair images with missing or corrupted pixels, or make blurry photographs sharp. GANs learn to infer the missing information by means of a competition (hence “adversarial”): one part of the network, the generator, produces fake data, while a second part, the discriminator, tries to distinguish the fake data from the real. As the program runs, both halves get progressively better at their jobs. You may have seen some of the hyper-realistic “faces” produced by GANs: images of “incredibly realistic-looking people who do not actually exist,” as one headline put it.
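To make that generator-versus-discriminator contest concrete, here is a minimal, self-contained PyTorch sketch. It is an illustration only, not any of the networks mentioned in the article: the “real” data is a toy one-dimensional Gaussian, and the layer sizes and learning rates are arbitrary choices.

```python
# Minimal GAN sketch: the generator learns to mimic a toy data
# distribution while the discriminator learns to catch its fakes.
import torch
import torch.nn as nn

latent_dim = 8
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(),
                              nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0      # samples from the "true" distribution
    fake = generator(torch.randn(64, latent_dim))

    # Discriminator step: label real data 1 and generated data 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: try to make the discriminator call the fakes real.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# After training, generated samples should cluster near 2.0.
print(generator(torch.randn(5, latent_dim)).detach().squeeze())
```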

More generally, a generative model takes a set of data (typically, though not necessarily, images) and breaks it down into a set of basic, abstract building blocks; scientists call this the “latent space” of the data. The algorithm manipulates elements of the latent space to see how this changes the original data, and that, in turn, helps reveal the physical processes at work in the system.

The idea of a latent space is abstract and hard to visualize, but as a rough analogy, think of what your brain might be doing when you try to determine the sex of a person from their face. Perhaps you notice the hairstyle, the shape of the nose, and so on, along with patterns you can’t easily put into words. The computer program likewise hunts for latent features in the data: though it has no idea what a mustache is, or what gender is, if it has been trained on data sets in which some images are tagged “man” or “woman” and some carry a “mustache” tag, it will quickly deduce the connection.


Kevin Schawinski, astrophysicist and head of the AI company Modulos

In a paper published in December in the journal Astronomy & Astrophysics, Schawinski and his colleagues Dennis Turp and Ce Zhang used generative modeling to investigate the physical changes that galaxies undergo as they evolve. (The software they used treats the latent space somewhat differently from the way a generative adversarial network does, so it is not technically a GAN, though it is similar.) Their model created artificial data sets as a way of testing hypotheses about physical processes. They asked, for instance, how the “quenching” of star formation (a sharp drop in the rate at which stars form) is related to the increasing density of a galaxy’s environment.

For Schawinski, the key question is how much information about stellar and galactic processes can be squeezed out of the data alone. “Let’s erase everything we know about astrophysics,” he said. “To what degree could we rediscover that knowledge, just using the data itself?”

First, the galaxy images were reduced to their latent space; then Schawinski could tweak one element of that space in a way that corresponded to a particular change in the galaxy’s environment, such as the density of its surroundings. Then he could re-generate the galaxy and see what differences emerged. “So now I have a hypothesis-generation machine,” he explained. “I can take a whole bunch of galaxies that were originally in a low-density environment and make them look as though they were in a high-density environment.” Schawinski, Turp and Zhang found that as galaxies move from lower- to higher-density environments, they become redder in color and their stars become more centrally concentrated. This matches existing observations of galaxies, Schawinski said. The question is why.
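The loop just described (encode an image into the latent space, nudge one coordinate, decode, and compare) can be sketched schematically as follows. This assumes a plain autoencoder rather than the authors’ actual model, and both the random `galaxies` array and the choice of the latent coordinate `density_axis` are purely illustrative.

```python
# Schematic encode -> tweak -> decode loop for hypothesis generation.
import torch
import torch.nn as nn

n_pixels = 64 * 64
encoder = nn.Sequential(nn.Linear(n_pixels, 128), nn.ReLU(), nn.Linear(128, 16))
decoder = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, n_pixels))

galaxies = torch.rand(500, n_pixels)    # stand-in for real galaxy images
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
for _ in range(200):                    # train the autoencoder to reconstruct
    recon = decoder(encoder(galaxies))
    loss = nn.functional.mse_loss(recon, galaxies)
    opt.zero_grad(); loss.backward(); opt.step()

# "Hypothesis machine": push one latent coordinate from low-density
# toward high-density values and regenerate the galaxy.
z = encoder(galaxies[:1]).detach()
density_axis = 3                        # hypothetical latent direction
z_shifted = z.clone()
z_shifted[0, density_axis] += 2.0
low_density_version = decoder(z)
high_density_version = decoder(z_shifted)   # compare the two images
```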

The next step, Schawinski says, has not yet been automated: “I have to come in as a human and say, OK, what kind of physics could explain this effect?” There are two plausible explanations: perhaps galaxies become redder in denser environments because they contain more dust, or perhaps because of a decline in star formation (in other words, because their stars tend to be older). With a generative model, both ideas can be put to the test: elements in the latent space associated with dustiness and with star-formation rate are changed to see how this affects the color of the galaxies. “And the answer is clear,” Schawinski said. Redder galaxies are those “where the star formation had dropped, not the ones that have more dust. So we should favor the latter explanation.”


Top row: real galaxies in low-density regions.
Second row: their reconstructions from the latent space.
Below: the transformations applied by the network, and, at the bottom, the generated galaxies as they would appear in high-density regions.

The approach is related to traditional simulation, but with critical differences. A simulation, Schawinski said, is “essentially assumption-driven. It’s the same as saying: I think I know what the underlying physical laws are that give rise to everything I see in the system. I have a recipe for star formation, for how dark matter behaves, and so on. I put all my hypotheses in there and let the simulation run. And then I ask: does that look like reality?” Generative modeling, in his words, is “in a sense, the exact opposite of simulation. We don’t know anything; we don’t want to assume anything. We want the data itself to tell us what might be going on.”

The apparent success of generative modeling in a study like this obviously does not mean that astronomers and graduate students have been made redundant, but it appears to represent a shift in the degree to which AI can learn about astrophysical objects and processes given little more than a vast trove of data. “It’s not fully automated science, but it demonstrates that we’re capable of at least partly building the tools that make the process of science automatic,” Schawinski said.

Generative modeling is clearly capable of a great deal, but whether it truly represents a new approach to science is open to debate. For David Hogg, a cosmologist at New York University and the Flatiron Institute, the technique, however impressive, is ultimately just a very sophisticated way of pulling patterns out of data, something astronomers have been doing for centuries. In other words, it is an advanced form of observation plus analysis. Hogg’s own work, like Schawinski’s, leans heavily on AI; he has been using neural networks to classify stars by their spectra and to infer other physical attributes of stars from data-driven models. But he sees his work, as well as Schawinski’s, as good old-fashioned, time-tested science. “I don’t think it’s a third way,” he said recently. “I just think we as a community are becoming far more sophisticated about how we use the data. In particular, we are getting much better at comparing data to data. But in my view, my work is still squarely in the observational mode.”

Diligent Assistants


Whether or not AI and neural networks are conceptually new tools, it is clear that they have come to play a crucial role in contemporary astronomy and physics research. At the Heidelberg Institute for Theoretical Studies, physicist Kai Polsterer heads the astroinformatics group, a team of researchers focused on new, data-centered methods in astrophysics. Recently, they used a machine-learning algorithm to extract redshift information from galaxy data sets, a task that was once painfully laborious.
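A hedged illustration of that kind of task: the sketch below trains a standard scikit-learn regressor to predict redshift from galaxy photometry. The synthetic magnitude array stands in for real survey data, and the “true” mapping is invented purely so the example runs end to end; it is not the Heidelberg group’s actual pipeline.

```python
# Toy photometric-redshift regression on synthetic survey magnitudes.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
magnitudes = rng.uniform(15, 25, size=(5000, 5))         # e.g. u, g, r, i, z bands
redshift = (0.1 * (magnitudes[:, 1] - magnitudes[:, 3])  # invented relation
            + 0.02 * magnitudes[:, 2]
            + rng.normal(0, 0.01, 5000))

X_train, X_test, y_train, y_test = train_test_split(magnitudes, redshift,
                                                    random_state=0)
model = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```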

Polsterer sees these new AI-based systems as “hardworking assistants” that can comb through data for hours on end without getting bored or complaining about the working conditions. These systems can do all the tedious grunt work, he said, leaving you “to do the cool, interesting science on your own.”

But they are not perfect. In particular, Polsterer cautions, the algorithms can only do what they have been trained to do; the system is indifferent to what it is fed. Give it a galaxy, and it will estimate the galaxy’s redshift and age. But give it a selfie, or a picture of a rotting fish, and it will output an age for that, too (a wrong one, naturally). In the end, he said, human oversight remains essential. “It comes back to us, the researchers. We are responsible for the interpretation.”

For his part, Nord, at Fermilab, warns that it is important for neural networks to deliver not just results but also error bars, as every undergraduate is trained to do. In science, if you make a measurement and don’t report an uncertainty, no one will take the result seriously.
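One common way to get an error bar out of a neural network (offered here as a generic illustration, not as Nord’s specific method) is Monte Carlo dropout: keep dropout active at prediction time, run many stochastic forward passes, and read the spread of the outputs as an uncertainty.

```python
# Monte Carlo dropout sketch: a prediction with a rough error bar.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(64, 1))
# (The net is untrained here; in practice you would train it first.
# The point is the uncertainty mechanics, not the prediction itself.)

x = torch.randn(1, 4)                 # one input example
net.train()                           # keep dropout stochastic at inference
samples = torch.stack([net(x) for _ in range(100)])
mean, std = samples.mean().item(), samples.std().item()
print(f"prediction: {mean:.3f} +/- {std:.3f}")
```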

Like many AI researchers, Nord also worries that the results produced by neural networks are hard to interpret: a neural network delivers an answer without offering a clear picture of how it arrived at it.

Not everyone, however, sees the lack of transparency as a problem. Lenka Zdeborová, a researcher at the Institute of Theoretical Physics in France, points out that human intuition is often just as impenetrable. You look at a photograph and instantly recognize a cat, “but you don’t know how you know it,” she says. “Your brain is, in a sense, also a black box.”

It is not only astrophysicists and cosmologists who are migrating toward AI-fueled, data-driven science. Roger Melko, a quantum physicist at the Perimeter Institute for Theoretical Physics and the University of Waterloo, has used neural networks to attack some of the hardest and most important problems in his field, such as representing the wave function that describes a many-particle system. AI is needed here because of what Melko calls “the exponential curse of dimensionality”: the number of possible forms a wave function can take grows exponentially with the number of particles in the system it describes. The difficulty resembles trying to work out the best move in a game like chess or Go: you try to think one move ahead, imagining how your opponent will respond and choosing the best reply, but with each move the number of possibilities proliferates.
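The scale of that exponential curse is easy to make concrete. For N spin-1/2 particles, a full wave function requires 2^N complex amplitudes, which outruns any memory long before N approaches the size of real materials. The numbers below illustrate the general counting argument, not Melko’s specific calculations.

```python
# How fast the memory cost of a full many-body wave function explodes.
for n_spins in (10, 20, 30, 40, 50):
    amplitudes = 2 ** n_spins                 # one complex number per basis state
    gigabytes = amplitudes * 16 / 1e9         # 16 bytes per complex128 amplitude
    print(f"{n_spins:2d} spins -> {amplitudes:.3e} amplitudes, "
          f"~{gigabytes:.3e} GB to store")
```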

Of course, AI has mastered both of those games: chess was conquered decades ago, and in 2016 the AlphaGo system beat one of the world’s best Go players. Neural networks, Melko says, are similarly well suited to problems in quantum physics.

Machine mind


Whether Schawinski is right that he has found a “third way” of doing science, or whether, as Hogg says, it is merely traditional observation and data analysis “on steroids,” one thing is clear: AI is changing the character of scientific discovery, and it is certainly accelerating it. How far will the AI revolution go in science?

Grand claims about the achievements of “robo-scientists” surface from time to time. Ten years ago, a robot chemist named Adam investigated the genome of baker’s yeast and worked out which genes are responsible for producing certain amino acids. It did so by observing strains of yeast that were missing particular genes and comparing how they behaved. Wired’s headline read: “Robot Makes Scientific Discovery All by Itself.”

More recently, Lee Cronin, a chemist at the University of Glasgow, has used a robot to randomly mix chemicals to see what new compounds might emerge. By tracking the reactions in real time with a mass spectrometer, a nuclear magnetic resonance machine, and an infrared spectrometer, the system eventually learned to predict which combinations would be the most reactive. Even if it never leads to a discovery, Cronin said, the robotic system could allow chemists to speed up their research by about 90 percent.

Last year, another team of scientists in Zurich used neural networks to derive physical laws from sets of data. Their system, a sort of robotic Kepler, rediscovered the heliocentric model of the solar system from records of the positions of the Sun and Mars in the sky as seen from Earth, and also worked out the law of conservation of momentum from observations of colliding spheres. Since physical laws can often be expressed in more than one way, the researchers wonder whether the system might offer new, perhaps simpler, ways of thinking about known laws.
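In the same spirit (though as a verification, not a discovery), the short script below generates simulated one-dimensional elastic collisions and checks that the quantity m1*v1 + m2*v2 is identical before and after every collision; this is the conserved pattern such a system would have to pick out of the data. The setup and numbers are invented for illustration.

```python
# Simulated 1-D elastic collisions: total momentum is invariant.
import numpy as np

rng = np.random.default_rng(2)
m1, m2 = rng.uniform(1, 5, 1000), rng.uniform(1, 5, 1000)    # masses
v1, v2 = rng.uniform(-3, 3, 1000), rng.uniform(-3, 3, 1000)  # initial velocities

# Standard elastic-collision outcome velocities.
u1 = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2)
u2 = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2)

p_before = m1 * v1 + m2 * v2
p_after = m1 * u1 + m2 * u2
print("max |p_after - p_before|:", np.abs(p_after - p_before).max())  # ~0
```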

All of these are examples of AI kick-starting the process of scientific discovery, though in each case it is debatable just how revolutionary the new approach is. Perhaps most controversial is the question of how much information can be gleaned from data alone, a pressing question in the age of enormous and ever-growing piles of it. In their 2018 book The Book of Why, computer scientist Judea Pearl and science writer Dana Mackenzie argue that data are “profoundly dumb.” Questions about causality “can never be answered from data alone,” they write. “Anytime you see a paper or a study that analyzes the data in a model-free way, you can be certain that the output of the study will merely summarize, and perhaps transform, but not interpret the data.” Schawinski is sympathetic to Pearl’s position, but he describes the idea of working with data alone as “a bit of a straw man.” He has never claimed to deduce cause and effect from the data, he said. “I’m just saying we can do a lot more with data than we usually do.”

Another frequently heard argument is that science requires creativity, and that, at least so far, we have no idea how to program it. Simply trying every possibility, as Cronin’s robot chemist does, doesn’t seem especially creative. “I think coming up with a theory, with reasoning, requires creativity,” Polsterer said. “Every time you need creativity, you will need a human.” And where does creativity come from? Polsterer suspects it is related to boredom, something that, in his view, a machine cannot experience. “To be creative, you have to dislike being bored. And I don’t think a computer will ever feel bored.” On the other hand, words like “creative” and “inspired” have often been used to describe programs like Deep Blue and AlphaGo. And the struggle to describe what goes on inside the mind of a machine mirrors the difficulty we have in probing our own thought processes.

Schawinski recently left academia for the private sector; he now runs Modulos, a startup that employs a number of scientists from ETH Zurich and, according to its website, works “in the eye of the storm” of developments in AI and machine learning. Whatever obstacles may stand between current AI technology and full-fledged artificial minds, he and other experts believe that machines are poised to take on more and more of the work of scientists. Whether there is a limit remains to be seen.

“Will it be possible, in the foreseeable future, to build a machine that can discover physics or mathematics that the brightest humans alive cannot discover on their own, using biological hardware?” Schawinski wonders. “Will the science of the future be driven by machines operating at a level we can never reach? I don’t know. It’s a good question.”

Source: https://habr.com/ru/post/445806/

