
Google uses machine learning to improve data center efficiency

The Internet giant uses machine learning and artificial intelligence to improve the efficiency of its data centers. According to Joe Kava, Google's vice president of data centers, the company has begun using neural networks to analyze the huge volume of data it collects about its servers and to recommend ways to improve their operation.

In effect, Google has built a computer that knows more about its data centers than its own engineers do. People are not being written off, but Kava believes neural networks will let Google reach new levels of server farm performance by going beyond what engineers can see and analyze.


Google already runs some of the most energy-efficient data centers on the planet. Artificial intelligence will let it look ahead and simulate thousands of operating scenarios for its data centers.

In early use, the neural network predicted PUE (power usage effectiveness) with 99.6% accuracy. Its recommendations, however minor they may seem, led to substantial cost savings because they were applied across thousands of servers.
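For readers unfamiliar with the metric: PUE is the ratio of a facility's total power draw to the power drawn by its IT equipment, so values closer to 1.0 are better. The sketch below is only an illustration and is not Google's code. The readings are made up, and treating "accuracy" as one minus the mean relative error is an assumption about what the 99.6% figure means.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power divided by IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def prediction_accuracy(predicted, actual):
    """One minus the mean absolute relative error (an assumed reading of 'accuracy')."""
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 1.0 - sum(errors) / len(errors)


# Made-up readings, for illustration only.
actual = [pue(1120.0, 1000.0), pue(1100.0, 1000.0)]
predicted = [1.124, 1.096]  # hypothetical model outputs
accuracy = prediction_accuracy(predicted, actual)
print(f"accuracy: {accuracy:.4f}")
```

With these invented numbers the predictions miss by about 0.004 PUE each, which is the scale of error that an accuracy figure above 99% implies for a PUE near 1.1.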

Why did Google turn to machine learning and neural networks? The main reason is that its data centers keep growing, which is a challenge for a company that uses sensors to collect millions of data points on infrastructure and energy consumption.

“In an environment as dynamic as a data center, it can be hard for a person to see how all the system variables interact,” says Kava. “We have been working on optimizing our data centers for a long time. All the obvious best practices have already been implemented, but we must not stop there!”

Meet the boy genius


Google's neural network was created by Jim Gao, a Google engineer whose colleagues nicknamed him “boy genius” for his ability to analyze large amounts of data. Gao had analyzed the cooling systems, applying the principles of fluid dynamics to monitoring data to create a 3D model of airflow inside the server room.

Gao believed it was possible to build a model that tracks a much larger set of variables, including IT load, weather conditions, and the operation of the cooling towers, water pumps and heat exchangers that keep Google's servers at a normal temperature.

“Computers are good at seeing the whole story hidden in the data. Jim took the information we collect every day and ran it through his model to make sense of complex interactions that his colleagues, being mere mortals, might not have noticed,” writes Kava on his blog. “After some trial and error, Jim's model now predicts PUE with 99.6% accuracy. This means that he can now use the model to look for new ways to make our operations more efficient.” The image below shows the correlation between the predicted (black curve) and actual (yellow curve) PUE changes.



How it works


Gao began working on machine learning as a “20 percent project.” Google traditionally lets its employees spend part of their working time on innovative projects outside their core responsibilities. Gao was not an artificial intelligence specialist. To learn the fundamentals of machine learning, he took a Stanford course taught by Professor Andrew Ng.

A neural network imitates the workings of the human brain, allowing a computer to “learn” tasks without being explicitly programmed for them. Google's search engine is often cited as an example of this kind of learning, which is also one of the company's key research areas. “This model is nothing more than a set of differential equations,” Kava explained. “But you have to understand the math. The model begins by learning how the variables interact.”

First, Gao needed to identify the key factors affecting energy efficiency in Google's data centers. He narrowed them down to 19 and designed a neural network, a machine learning system that can recognize patterns in large data sets.
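To make the idea concrete, here is a minimal sketch of a network that maps 19 inputs to a single PUE prediction. Only the number of inputs comes from the article. The hidden-layer size, the tanh activation, the learning rate, and above all the synthetic training data are assumptions chosen to keep the example short, so this should not be read as Gao's actual model.

```python
import math
import random

N_INPUTS = 19  # the article's 19 factors; everything else here is assumed
N_HIDDEN = 8   # hypothetical hidden-layer size


def make_network(rng):
    """Small random weights for a 19 -> 8 -> 1 network."""
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
    b2 = 0.0
    return [w1, b1, w2, b2]


def forward(net, x):
    """Return (hidden activations, predicted PUE) for one input vector."""
    w1, b1, w2, b2 = net
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2


def train_step(net, x, target, lr=0.05):
    """One stochastic gradient descent step on squared error."""
    w1, b1, w2, b2 = net
    hidden, y = forward(net, x)
    err = y - target  # derivative of 0.5 * err**2 with respect to y
    for j in range(N_HIDDEN):
        # Backpropagate through tanh before w2[j] is updated.
        grad_h = err * w2[j] * (1.0 - hidden[j] ** 2)
        w2[j] -= lr * err * hidden[j]
        b1[j] -= lr * grad_h
        for i in range(N_INPUTS):
            w1[j][i] -= lr * grad_h * x[i]
    net[3] = b2 - lr * err  # the output bias is a float, so store it back


def mean_squared_error(net, data):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in data) / len(data)


def synthetic_sample(rng):
    """Invented data: 19 normalized inputs and a made-up PUE relationship."""
    x = [rng.random() for _ in range(N_INPUTS)]
    target = 1.10 + 0.05 * x[0] - 0.03 * x[1] + 0.02 * x[0] * x[2]
    return x, target


rng = random.Random(0)
data = [synthetic_sample(rng) for _ in range(200)]
net = make_network(rng)
mse_before = mean_squared_error(net, data)
for _ in range(20):  # 20 passes over the synthetic data
    for x, t in data:
        train_step(net, x, t)
mse_after = mean_squared_error(net, data)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.6f}")
```

A production model would be trained on real sensor history with a proper numerical library and a held-out test set. The point here is only the shape of the problem: many operating variables in, one efficiency number out.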

“The sheer number of possible equipment combinations and settings makes it difficult to determine where the optimal efficiency lies,” writes Gao in his report. “In a live data center, a given goal can be achieved through many combinations of hardware (mechanical and electrical equipment) and software (control strategies and setpoints). Testing every combination to maximize efficiency is practically impossible given the time constraints, frequent fluctuations in IT load and weather conditions, and the need to keep the data center running stably.”



Runs on a single server


As for hardware, Kava says the system does not need exceptional computing power: it runs on a single server and could even run on a high-end desktop computer.

The system has been deployed at several Google data centers. The machine learning tool has suggested several changes that led to gradual improvements in PUE, including better load distribution as infrastructure capacity is expanded and small changes to the temperature of the cooling water system.
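One way such a tool could arrive at a recommendation like a small cooling water temperature change is to sweep a setpoint through a trained model and keep the value with the lowest predicted PUE. The sketch below assumes that approach. `toy_pue_model` is a hypothetical stand-in with an invented formula, not a trained network, and the temperature range is arbitrary.

```python
def toy_pue_model(cooling_water_temp_c, it_load_kw):
    """Stand-in for a trained model; the formula is invented for illustration."""
    # Assumed shape: a shallow bowl whose minimum shifts with IT load.
    best_temp = 18.0 + it_load_kw / 1000.0
    return 1.10 + 0.002 * (cooling_water_temp_c - best_temp) ** 2


def best_setpoint(model, candidates, it_load_kw):
    """Return the candidate setpoint with the lowest predicted PUE."""
    return min(candidates, key=lambda temp: model(temp, it_load_kw))


# Candidate setpoints from 15.0 to 25.0 degrees Celsius in 0.5 degree steps.
candidates = [15.0 + 0.5 * step for step in range(21)]
choice = best_setpoint(toy_pue_model, candidates, it_load_kw=2000.0)
print(f"recommended setpoint: {choice} C, "
      f"predicted PUE: {toy_pue_model(choice, 2000.0):.3f}")
```

Because the sweep runs against a model rather than the live plant, thousands of what-if scenarios can be tried without touching the cooling system. An engineer would still review the result before changing a real setpoint.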

“Recent tests at Google's data centers have shown that machine learning is an effective method of using existing sensor readings to simulate energy distribution in a data center and leads to significant cost savings,” writes Gao.

The machines are not taking over


Kava believes the tool will help Google model and improve other projects in the future. But do not worry: Google's data centers will not become self-aware any time soon. The company is interested in automation and has even recently acquired several robotics companies, but so far none of Google's data centers runs entirely under automated management.

“We still need people to draw the right conclusions from all this,” says Kava. “And I still want our engineers to review these recommendations.”

The greatest benefits of neural networks will show up in the coming years, as Google builds its new server platforms. “I anticipate this approach being used in data center design,” says Kava. “This advanced technology can be used both in design and in later improvements. I think we will find other uses.”

Google described its machine learning approach in a paper by Gao, hoping that others who run large data centers will be able to put it into practice. “This mechanism is not something special that only Google or Jim Gao can use,” says Kava. “I would really like to see this technology applied more widely. I think the whole industry will benefit from it. It's a great tool for being as efficient as possible.”

Source: https://habr.com/ru/post/230627/

