About the author. Richard Sutton is a professor of computer science at the University of Alberta. He is considered one of the founders of modern computational reinforcement learning.

The biggest lesson to be drawn from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The reason, of course, is Moore's law, or more precisely, the exponentially falling cost of computation.
Most AI research has been conducted as if the computation available to the agent were constant. In that case, leveraging human knowledge is practically the only way to improve performance. But a typical research project is short, and over a slightly longer span of a few years, the computation available to the agent inevitably grows enormously.
Seeking improvement in the short term, researchers try to apply human knowledge of the domain, but in the long run the only thing that matters is leveraging computation. These two need not contradict each other, but in practice they do. Time spent on one is time not spent on the other. There are psychological commitments to investing in one approach or the other. And building in domain knowledge tends to complicate a system in ways that make it less suited to taking advantage of general computational methods. There have been many examples of researchers learning this bitter lesson too late, and it is useful to review some of the most prominent.
In computer chess, the system that defeated world champion Kasparov in 1997 was based on massive, deep search. At the time, most computer chess researchers, who had pursued methods built on human understanding of the special structure of chess, looked at this with dismay. When the simpler, search-based approach with special-purpose hardware and software proved decisively more effective, they refused to concede defeat. They said the "brute force" method might have won this once, but it was not a general strategy, and in any case it was not how people play chess. These researchers wanted methods based on human understanding of the game to win, and were disappointed when they did not.
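To make concrete what "deep search" means here, below is a minimal sketch of depth-limited negamax search with alpha-beta pruning, the family of brute-force techniques whose strength scales directly with available computation. The GameState interface (legal_moves, apply, is_terminal, evaluate) is a hypothetical stand-in for illustration, not the API of any actual chess engine.

```python
# Minimal sketch of depth-limited negamax search with alpha-beta pruning.
# The GameState interface (legal_moves, apply, is_terminal, evaluate) is a
# hypothetical stand-in, not any particular chess engine's API.
import math

def negamax(state, depth, alpha=-math.inf, beta=math.inf):
    """Return the value of `state` from the side-to-move's point of view."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()          # static evaluation at the search horizon
    best = -math.inf
    for move in state.legal_moves():
        child = state.apply(move)        # successor position after the move
        value = -negamax(child, depth - 1, -beta, -alpha)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:                # prune: the opponent will avoid this line
            break
    return best
```

More computation simply buys a larger depth, and with it stronger play, with no extra chess knowledge required beyond the evaluation at the horizon.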
A similar pattern appeared in research on the game of Go, only delayed by a further 20 years. Enormous initial effort went into avoiding search and exploiting human knowledge of the game's special features, but all of it proved irrelevant once deep search combined with massive parallel computation was applied effectively. Also important was learning a value function through self-play, as in many other games and even in chess, although that kind of learning played little role in the 1997 program that first defeated a world champion. Learning by self-play, and learning in general, is like search in that it allows massive computation to be brought to bear. Search and learning are the two most important ways of applying computational power in AI research. In computer Go, as in computer chess, researchers' initial effort went toward using human understanding of the domain (so that less search was needed), and only much later came far greater success by embracing search and learning.
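As an illustration of what "learning a value function by self-play" can mean in its simplest form, here is a rough sketch of tabular TD(0) updates over self-play games. The play_one_game helper and its trajectory format are assumptions made for this example, not the method of any particular Go program.

```python
# Minimal sketch of tabular TD(0) value learning from self-play episodes.
# `play_one_game` and its (state, reward, next_state) trajectory are
# hypothetical stand-ins used only for illustration.
from collections import defaultdict

def td0_self_play(play_one_game, num_games=10_000, alpha=0.1, gamma=1.0):
    V = defaultdict(float)                    # value estimate for each state
    for _ in range(num_games):
        trajectory = play_one_game(V)         # e.g. both sides move greedily w.r.t. V
        for (s, r, s_next) in trajectory:
            target = r + gamma * V[s_next]    # bootstrapped one-step target
            V[s] += alpha * (target - V[s])   # nudge the estimate toward the target
    return V
```

The point is that every additional self-play game is additional computation converted into an improved value estimate, again without hand-coded game knowledge.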
In the 1970s, DARPA sponsored a competition for speech recognition systems. Entrants included many special-purpose methods that exploited knowledge of the domain: knowledge of words, phonemes, the human vocal tract, and so on. On the other side were newer methods that were more statistical in nature and did far more computation, based on hidden Markov models (HMMs). Once again, the statistical methods won out over the knowledge-based ones. This led to a major change across all of natural language processing, as statistics and computation gradually came to dominate the field over the following decades. The recent rise of deep learning in speech recognition is the latest step in this direction. Deep learning methods rely even less on human knowledge, use even more computation, and train on huge data sets, producing dramatically better speech recognition systems. As in the games, researchers kept trying to build systems that worked the way they thought their own minds worked: they tried to put their knowledge of the domain into their systems. Ultimately this proved counterproductive, and a colossal waste of researchers' time, once Moore's law made massive computation available and tools were developed to use it effectively.
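For readers unfamiliar with HMMs, the sketch below shows their core computation, the forward algorithm, which scores an observation sequence under given transition and emission probabilities. The toy numbers are invented for illustration and have nothing to do with a real speech model.

```python
# Minimal sketch of the HMM forward algorithm: the likelihood of an
# observation sequence given initial, transition, and emission distributions.
# All values here are illustrative toy numbers, not a trained speech model.
import numpy as np

def hmm_forward(obs, init, trans, emit):
    """obs: sequence of observation indices; init: (S,), trans: (S, S), emit: (S, O)."""
    alpha = init * emit[:, obs[0]]            # joint prob. of first obs and each state
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]  # propagate one step, weight by emission
    return alpha.sum()                        # total likelihood of the sequence

# Toy example: 2 hidden states, 2 observation symbols.
init  = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
emit  = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
print(hmm_forward([0, 1, 0], init, trans, emit))
```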
In computer vision the picture is similar. Early methods treated vision as a search for edges, generalized cylinders, or SIFT features. Today all of that has been discarded. Modern deep-learning networks use only the notions of convolution and certain kinds of invariance, and they perform far better.
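To show what the "notion of convolution" amounts to, here is a minimal sketch of a single 2-D convolution (strictly, cross-correlation) of one filter over one image; real vision networks simply stack many such learned filters. The example values are arbitrary.

```python
# Minimal sketch of the convolution operation at the heart of modern vision
# networks: a small learned filter slid over the image, sharing its weights
# at every location (the main built-in invariance being translation).
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one filter."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# Toy example: a 3x3 edge-like filter applied to a random 8x8 "image".
rng = np.random.default_rng(0)
feature_map = conv2d(rng.random((8, 8)), np.array([[1, 0, -1]] * 3, dtype=float))
print(feature_map.shape)   # (6, 6)
```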
This is a big lesson. As a field, we still have not fully absorbed it, because we keep making the same kind of mistakes. To resist them effectively, we have to understand their appeal. We must learn the bitter lesson: building in how we think our own minds work does not pay off in the long run. The bitter lesson rests on several historical observations:
- AI researchers have often tried to build their knowledge into their agents.
- This always helps in the short term and is personally satisfying to the researcher, but
- In the long run it hits a ceiling and even inhibits further progress.
- Breakthrough progress eventually comes from the opposite approach, one based on scaling computation through search and learning.
The eventual success is tinged with bitterness and is often not fully digested, because it is a victory over an appealing, human-centered approach.
One thing to be learned from this bitter experience is the great power of general methods that continue to scale with increased computation, even when the amounts of computation required become very large. Search and learning appear to scale in this way without limit.
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex. We should stop trying to find simple ways to represent the contents of minds, such as simple models of space, objects, or multiple agents. All of this is part of an intrinsically complex external world. It cannot be modeled directly, because its complexity is endless. Instead, we should build meta-methods that can find and capture this arbitrary complexity. What matters for these methods is that they can find good approximations, but the search for them should be carried out by the methods themselves, not by us. We want AI agents that can discover as we do, not agents that merely contain what we have already discovered. Building our knowledge into an AI system only makes it harder for the system to learn.