Google engineer Cliff Young explains how the explosive growth of deep learning algorithms coincides with the failure of Moore's law, the empirical rule of computer chip progress for decades, and how it is forcing the development of fundamentally new computing schemes

The explosive development of AI and machine learning algorithms is changing the very nature of computing, according to one of the largest practitioners of AI: Google. Google engineer Cliff Young spoke at the opening of the autumn microprocessor conference organized by the Linley Group, a popular computer chip symposium held by the respected semiconductor analysis firm.
Young said that the use of AI had entered an “exponential phase” at the very moment when Moore’s law, the empirical rule that governed progress in computer chips for decades, had all but stalled.
“It's a pretty anxious time,” he said thoughtfully. “Digital CMOS is slowing down: we see problems with Intel's 10nm process, we see them with GlobalFoundries' 7nm process, and at the same time deep learning has arrived, and with it economic demand.” CMOS (complementary metal-oxide-semiconductor) is the most common technology used to make computer chips.
While classic chips barely improve in efficiency and performance, demand from AI researchers keeps growing, said Young. He cited some statistics: the number of machine learning papers posted on arXiv, the preprint server maintained by Cornell University, doubles every 18 months. The number of internal AI projects at Google, he said, also doubles every 18 months. The number of floating-point operations needed to run the neural networks used in machine learning grows even faster: it doubles every three and a half months.
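The doubling periods Young quoted are easier to compare as annual growth factors. The short Python sketch below does that arithmetic; the 24-month doubling period used for Moore's law is the commonly cited figure and is an assumption added here for comparison, not something from the talk.

```python
def annual_growth(doubling_months: float) -> float:
    """Return the factor by which a quantity grows in 12 months
    if it doubles every `doubling_months` months."""
    return 2 ** (12 / doubling_months)


# Doubling periods in months. The first three come from the talk;
# the Moore's law figure is an assumed, commonly cited value.
rates = {
    "ML papers on arXiv": 18,
    "Google internal AI projects": 18,
    "Compute for neural networks": 3.5,
    "Moore's law (assumed 24 months)": 24,
}

for label, months in rates.items():
    print(f"{label}: x{annual_growth(months):.2f} per year")
```

A quantity that doubles every three and a half months grows roughly 10.8-fold per year, versus about 1.6-fold for an 18-month doubling and about 1.4-fold for a 24-month one.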
All of this growth in computational demand adds up to a “super Moore's law,” said Young, a phenomenon he called “a bit terrifying,” “a little dangerous,” and “something to worry about.”
“Where did all this exponential growth in AI come from?” he asked. “In particular, it is because deep learning just works. I spent a long time in my career ignoring machine learning,” he said. “It was not obvious that these things would take off.”
But then breakthroughs such as pattern recognition began to arrive quickly, and it became clear that deep learning was “incredibly effective,” he said. “For most of the last five years we have been an AI-first company, and we have rebuilt a large part of our business on AI,” from search to advertising and much more.

The Google Brain team, the company's leading AI research project, needs “giant machines,” said Young. Neural networks, for example, are sometimes measured by the number of “weights” they contain: the variables that are applied to a neural network and affect how it processes data.
While ordinary neural networks may contain hundreds of thousands or even millions of weights that must be computed, Google researchers are asking for “tera-weight machines,” that is, computers capable of handling trillions of weights. That is because “every time we double the size of a neural network, we improve its accuracy.” In AI development, the rule has become: bigger and bigger.
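To make the weight counts concrete, here is a minimal Python sketch that counts the trainable parameters of a simple fully connected network. The layer widths are hypothetical, chosen only for illustration; the talk does not describe the architecture of Google's models.

```python
def count_weights(layer_sizes):
    """Count the trainable parameters of a fully connected network.

    A layer connecting n_in inputs to n_out outputs has
    n_in * n_out weights plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical layer widths: 784 inputs, two hidden layers of 512, 10 outputs.
small = [784, 512, 512, 10]
print(count_weights(small))
```

With these widths the function returns 669,706, in the “hundreds of thousands” range. Because a layer's weight count grows with the product of its input and output widths, a single square layer with one million units on each side already has about 10^12 weights, the scale of a “tera-weight” machine.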
In response to this demand, Google is developing its own line of machine learning chips, the Tensor Processing Unit (TPU). TPUs and similar chips are needed because traditional CPUs and GPUs cannot keep up with the workload.
“We held out for a very long time and said that Intel and Nvidia are great at building high-performance systems,” said Young. “But we crossed that threshold five years ago.”
After its first public appearance in 2017, the TPU made headlines with claims that it outperforms conventional chips in speed. Google is already working on the third generation of the TPU, using it in its own projects and offering computing capacity on demand through Google Cloud.
The company keeps building ever-larger TPU systems. In its “pod” configuration, 1,024 TPUs are connected together into a new kind of supercomputer, and Google plans to keep expanding this system, according to Young.
“We are building giant multicomputers with tens of petabytes,” he said. “We are pushing relentlessly on several fronts of progress at once, and terabyte-scale operations keep growing.” Projects like these raise all the problems associated with building supercomputers.
For example, Google engineers have borrowed tricks from the legendary Cray supercomputers. They combined a giant “matrix multiplication unit,” the part of the chip that carries the main computational load of neural networks, with a “general-purpose vector unit” and a “general-purpose scalar unit,” as Cray did. “The combination of scalar and vector units is what allowed Cray to outperform everyone,” he said.
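As a conceptual illustration only, and not a description of how TPUs are actually programmed, the Python sketch below splits one dense neural-network layer into the three kinds of work mentioned above. The function names are hypothetical: the matrix multiplication stands in for the matrix unit, the elementwise bias-and-ReLU step for the vector unit, and a reduction to a single number for the scalar unit.

```python
def matmul(a, b):
    """Matrix work: the bulk of the arithmetic (matrix multiplication unit)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add_bias_relu(rows, bias):
    """Vector work: elementwise bias add followed by ReLU on each output row."""
    return [[max(0.0, v + b) for v, b in zip(row, bias)] for row in rows]


def mean_activation(rows):
    """Scalar work: reduce all outputs to one number, e.g. for a statistic or a control decision."""
    values = [v for row in rows for v in row]
    return sum(values) / len(values)


x = [[1.0, 2.0], [3.0, 4.0]]     # batch of 2 inputs with 2 features each
w = [[0.5, -1.0], [0.25, 1.0]]   # hypothetical 2x2 weight matrix
bias = [0.0, -1.0]               # hypothetical bias vector

h = add_bias_relu(matmul(x, w), bias)
print(h, mean_activation(h))
```

Even in this toy layer the matrix step performs most of the multiply-adds, which is why the matrix unit dominates the chip while the vector and scalar units handle the remaining, less regular work.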
Google has also developed its own novel arithmetic formats for programming the chips. A particular way of representing real numbers, called bfloat16, makes number crunching in neural networks more efficient. Colloquially, it is known as the “brain float.”
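bfloat16 keeps the sign bit and all 8 exponent bits of a 32-bit float but only 7 of its 23 mantissa bits, so it covers the same numeric range as float32 with less precision. The Python sketch below converts between the two using the standard library's `struct` module. It rounds the dropped bits to nearest-even and, as a simplifying assumption, ignores NaN handling.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Convert x to bfloat16 by keeping the top 16 bits of its float32 form.

    Round-to-nearest-even is applied to the 16 dropped bits.
    NaN handling is omitted to keep the sketch short.
    """
    bits = float32_bits(x)
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def from_bfloat16_bits(b: int) -> float:
    """Expand a bfloat16 bit pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]


for value in [1.0, 3.14159265, 1e-30, 3.0e38]:
    b = to_bfloat16_bits(value)
    print(f"{value!r} -> 0x{b:04x} -> {from_bfloat16_bits(b)!r}")
```

Pi becomes 3.140625 after the round trip, showing the lost precision, while values such as 1e-30 and 3.0e38, which fall outside the range of the IEEE half-precision (float16) format, survive in bfloat16.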
The TPU uses the fastest memory chips available, high-bandwidth memory (HBM). Young said that demand for large amounts of memory in neural network training is growing rapidly.
“Memory is used much more intensively during training. People talk about hundreds of millions of weights, but handling the activations brings its own problems,” he said, referring to the activation variables of a neural network.
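Young's point about activations can be illustrated with back-of-the-envelope arithmetic. The Python sketch below estimates training memory under loudly hypothetical assumptions: float32 storage, one gradient per weight, two extra optimizer values per weight (as in Adam-style optimizers), and every forward-pass activation kept for the backward pass. The example sizes are invented for illustration.

```python
def training_memory_bytes(num_weights, activations_per_example, batch_size,
                          bytes_per_value=4, optimizer_copies=2):
    """Return (weight_bytes, activation_bytes) for one training step.

    Hypothetical model: each weight is stored alongside its gradient and
    `optimizer_copies` optimizer values; all activations are kept for backprop.
    """
    values_per_weight = 2 + optimizer_copies  # weight + gradient + optimizer state
    weight_bytes = num_weights * values_per_weight * bytes_per_value
    activation_bytes = activations_per_example * batch_size * bytes_per_value
    return weight_bytes, activation_bytes


# Hypothetical sizes: 300 million weights ("hundreds of millions"),
# 50 million activation values per example, batch of 64 examples.
w_bytes, a_bytes = training_memory_bytes(300_000_000, 50_000_000, 64)
print(f"weights and optimizer state: {w_bytes / 1e9:.1f} GB")
print(f"activations:                 {a_bytes / 1e9:.1f} GB")
```

With these made-up numbers the activations (12.8 GB) outweigh the weights and their optimizer state (4.8 GB), and the activation term grows with batch size while the weight term does not.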
Google is also refining how neural networks are programmed in order to squeeze the most out of its hardware. “We are working on data and model parallelism” in projects such as Mesh TensorFlow, an adaptation of the TensorFlow software platform that “combines data and model parallelism at pod scale.”
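The difference between data and model parallelism can be shown on a single matrix multiply. In the hypothetical Python sketch below, the “devices” are just list slices: data parallelism splits the batch (rows of the input) and gives every device a full copy of the weights, while model parallelism splits the weights (columns of the weight matrix) and gives every device the full batch. Mesh TensorFlow combines both; this sketch does not use its API.

```python
def matmul(a, b):
    """Plain single-device matrix multiply of nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def data_parallel(x, w, num_devices):
    """Split the batch (rows of x); every device holds a full copy of w."""
    shard = (len(x) + num_devices - 1) // num_devices
    outputs = []
    for d in range(num_devices):
        outputs.extend(matmul(x[d * shard:(d + 1) * shard], w))
    return outputs


def model_parallel(x, w, num_devices):
    """Split the weights (columns of w); every device sees the full batch."""
    shard = (len(w[0]) + num_devices - 1) // num_devices
    partials = [matmul(x, [row[d * shard:(d + 1) * shard] for row in w])
                for d in range(num_devices)]
    # Stitch each device's output columns back together, row by row.
    return [[v for p in partials for v in p[i]] for i in range(len(x))]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]  # batch of 4 examples
w = [[1, 0, 2, -1], [0, 1, 1, 0], [3, -2, 0, 1]]    # 3x4 weight matrix

print(matmul(x, w))
print(data_parallel(x, w, 2) == matmul(x, w), model_parallel(x, w, 2) == matmul(x, w))
```

Both splits reproduce the single-device result; the trade-off is that data parallelism replicates the weights on every device, while model parallelism lets a weight matrix too large for one device be spread across several.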
Young did not disclose some technical details. The company has not talked about its interconnects, that is, how data travels around the chip; he simply noted that “our connectors are gigantic.” He declined to go into the subject, which drew laughter from the audience.
Young pointed to even more intriguing areas of computing that may emerge soon. For example, he suggested that analog chips, circuits that process input as continuous values rather than zeros and ones, could play an important role. “Perhaps we will turn to the analog domain; in physics there are a lot of interesting things related to analog computing and NVM [non-volatile memory].”
He also expressed hope that the chip startups presenting at the conference will succeed: “There are very cool startups here, and we need them to work, because the capabilities of digital CMOS are not limitless; I want all these investments to pay off.”