
High performance Google Chrome

History and cornerstones of Google Chrome.


Google Chrome was introduced in the second half of 2008 as a beta version for the Windows platform. The Chrome code authored by Google was released under the permissive BSD license as the Chromium project. For most observers this turn of events came as a surprise: was the browser war back? Could Google really make its product better than the others?

"It was so good that it made me change my mind..." - Eric Schmidt, who was initially reluctant to accept the idea of Google Chrome.

Yes, it could! Today Google Chrome is one of the most popular browsers (about 35% market share according to StatCounter) and is available on Windows, Linux, OS X, Chrome OS, Android and iOS. Its strengths and broad functionality clearly resonated with users, and gave plenty of wonderful ideas to other browsers along the way.

The original 38-page comic book explaining the ideas and principles behind Google Chrome offers a wonderful example of the thinking and design process that produced this browser. However, that was only the beginning. The main principles that drove the first stages of Chrome's development have carried over into the rules of the browser's continuous improvement:

As the development team notes, many of the sites we use today are not so much web pages as web applications. In turn, ever larger and more ambitious applications demand speed, security and stability. Each of these qualities deserves a chapter of its own, but since our topic today is performance, we will talk mostly about speed.

The many facets of performance


Modern browsers are a platform that in many ways resembles an operating system, and Google Chrome is no exception. The browsers that preceded Chrome were designed as monolithic, single-process applications. All open pages shared the same address space and the same resources, so an error in handling any page, or in the browser engine itself, threatened to bring down the entire application.
In contrast to this approach, Chrome is built on a multi-process architecture that gives each page its own separate process and memory, creating something like a strictly isolated sandbox for every tab. In a world of increasingly multi-core processors, the ability to isolate processes, while also protecting each page from other, error-prone pages, gave Chrome a significant performance advantage over its competitors. It is worth noting that most other browsers followed suit, adopting or beginning to adopt a similar architecture.

With processes separated, the execution of a web application mainly involves three tasks: fetching all the necessary resources, building and laying out the page, and running JavaScript. Building the page and executing JavaScript follow a single-threaded, interleaved pattern, since it is not possible to build and modify the same page tree (DOM) at the same time; JavaScript itself is a single-threaded language. Therefore, optimizing how page construction and script execution interleave at run time is a very important task both for web application developers and for browser developers themselves.
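One practical consequence of this interleaving is that a synchronously loaded script stalls DOM construction until it has been fetched and executed. Below is a minimal sketch (the analytics.js URL is made up for illustration) of injecting a script asynchronously so the parser can keep building the page while it downloads:

```ts
// Injecting a script element with the async flag: the HTML parser does not stop
// while the file downloads, and the script runs whenever it arrives.
// "https://example.com/analytics.js" is a placeholder URL, not a real dependency.
const script = document.createElement("script");
script.src = "https://example.com/analytics.js";
script.async = true;
document.head.appendChild(script);
```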

Chrome uses WebKit, a fast, open-source and standards-compliant engine, to lay out and render pages. To execute JavaScript, Chrome uses its own, heavily optimized V8 JavaScript engine, which is itself an open-source project and has found its way into many other popular projects, for example node.js. However, optimizing V8 script execution, or WebKit's parsing and rendering of pages, matters little while the browser is still waiting for the resources needed to build the page.

The browser's ability to optimize the order and priority of every required resource, and to manage the possible delays of each, is one of the most important factors in its performance. You may not even be aware of it, but Chrome's network stack, figuratively speaking, grows smarter every day, trying to hide or minimize the cost of waiting for each resource: it learns things such as DNS lookups, remembers the network topology, pre-fetches the pages you are most likely to visit, and so on. Outwardly it is a simple mechanism for requesting and receiving resources, but its internals give us a fascinating opportunity to learn how to optimize web performance and leave the user with only the best impressions.
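Developers can also nudge this machinery explicitly with resource hints, some of which appeared after this article was written. A minimal sketch, assuming a hypothetical CDN host cdn.example.com, of asking the browser to resolve a name and warm up a connection ahead of time:

```ts
// dns-prefetch asks the browser to resolve the host name early;
// preconnect goes further and opens the TCP (and TLS) connection as well.
// cdn.example.com is an assumed host used only for illustration.
for (const rel of ["dns-prefetch", "preconnect"]) {
  const hint = document.createElement("link");
  hint.rel = rel;
  hint.href = "https://cdn.example.com";
  document.head.appendChild(hint);
}
```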

What is a modern web application?


Before we turn to the details of optimizing our interaction with the network, this section will help us understand the nature of the problem we are investigating. In other words: what does a modern web page do, and what does a modern web application look like?

The HTTP Archive project preserves the history of the web's evolution, and it will help us answer this question. Instead of collecting and analyzing content from the entire network, it periodically visits popular sites to collect and record data about the resources used, content types, headers and other metadata for each individual site. The statistics available as of January 2013 may surprise you. The average page, sampled from the top 300,000 Internet sites, has the following characteristics:

(Figure: HTTP Archive statistics for the average page, January 2013.)


Let's look at this in more detail: over 1 MB in size on average, composed of 88 resources, fetched from more than 30 different first- and third-party hosts. Note that each of these figures has been growing steadily over the past few years, and there is no reason to expect the growth to stop. We are building ever larger and more demanding web applications, with no end in sight.

Doing some simple arithmetic with the HTTP Archive figures, you can see that the average page resource weighs about 12 KB (1045 KB / 84), which means that most network transfers in the browser are short and bursty. This complicates our life even further, since the underlying protocol (TCP) is optimized for large, streaming downloads. So it is worth getting to the bottom of things and walking through a typical request for a typical resource.
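For what it is worth, the back-of-the-envelope calculation above, using the article's own figures, looks like this:

```ts
const avgPageKB = 1045;       // average page weight reported by the HTTP Archive
const avgResourceCount = 84;  // resource count used in the article's own calculation
console.log(`${(avgPageKB / avgResourceCount).toFixed(1)} KB per resource`); // ~12.4 KB
```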

The life of a typical request


The W3C Navigation Timing specification provides a browser API for tracking the timing and performance of each request. Let's take a closer look at its components, since each of them represents a very important part of the user's overall perception of browser performance.
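As a rough sketch of what this API exposes, here is how the individual phases can be read for the current page. The snippet uses the newer PerformanceNavigationTiming entry; the specification the article refers to exposed the same milestones on performance.timing:

```ts
// Reading the main Navigation Timing phases for the page we are on.
const [nav] = performance.getEntriesByType("navigation") as PerformanceNavigationTiming[];

if (nav) {
  const tlsStart = nav.secureConnectionStart; // 0 when the page was not served over HTTPS
  console.log("DNS lookup:    ", nav.domainLookupEnd - nav.domainLookupStart, "ms");
  console.log("TCP handshake: ", (tlsStart || nav.connectEnd) - nav.connectStart, "ms");
  console.log("TLS handshake: ", tlsStart ? nav.connectEnd - tlsStart : 0, "ms");
  console.log("Waiting for first byte:", nav.responseStart - nav.requestStart, "ms");
  console.log("Response download:     ", nav.responseEnd - nav.responseStart, "ms");
}
```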



After receiving the URL of a resource, the browser first checks whether the data is local, stored in its cache. If this resource has been fetched before, and the appropriate headers were set (Expires, Cache-Control, ...), all the data can be served from the cache: the fastest request is the request that is never made. Otherwise, if the cached copy has gone stale, or if we have never visited the site at all, it is time for an expensive network request.
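As a minimal sketch of the server side of this contract (a plain Node.js server, nothing Chrome-specific), these are the kinds of headers that let the browser skip the network entirely or revalidate cheaply:

```ts
// A toy Node.js server that marks its responses as cacheable.
// max-age lets the browser reuse the copy for a day without any request;
// the ETag lets a stale copy be revalidated with a cheap conditional request.
import { createServer } from "node:http";

createServer((req, res) => {
  res.setHeader("Cache-Control", "public, max-age=86400");
  res.setHeader("ETag", '"v42"');
  res.end("hello, cacheable world");
}).listen(8080);
```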

Given the host name and the path to the requested resource, Chrome first checks whether there are already open connections to this host that can be reused; sockets are pooled by {scheme, host, port}. If you access the Internet through a proxy, or have configured a proxy auto-config (PAC) script, Chrome checks for the appropriate connection through the relevant proxy. A PAC script allows different proxies to be specified based on the URL or other configuration rules, and each of them can have its own pool of connections. Finally, if none of the above applies, it is time to obtain the IP address for the host we need: a DNS lookup.

If we are lucky and the name is already cached, the answer will cost us just one quick system call. If not, the first step is a DNS query. The time it takes depends on your Internet provider, the popularity of the requested site, the likelihood that its name sits in an intermediate cache, and the response time of the DNS servers themselves. In other words, there is a lot of uncertainty here, but a few hundred milliseconds for a DNS query would not be out of the ordinary.
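A small illustration of measuring that cost from the outside (a Node.js sketch, nothing Chrome-specific; example.com is a placeholder host):

```ts
// Measure how long a single name resolution takes from this machine.
// Repeating the call often shows the effect of intermediate caches.
import { lookup } from "node:dns/promises";

const host = "example.com";
const start = performance.now();
await lookup(host);
console.log(`DNS lookup for ${host} took ${(performance.now() - start).toFixed(1)} ms`);
```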
Once the IP address is known, Chrome can establish a new TCP connection to the remote server, which means performing the so-called three-way handshake: SYN > SYN-ACK > ACK. This exchange adds a round trip of delay to every new TCP connection, no exceptions. Depending on the distance between the client and the server, and the routing path chosen, this can cost us a few hundred or even thousands of milliseconds. Note that all of this work happens before a single byte of application data has been transmitted!

Once the TCP connection is established, if we are using a secure transport (HTTPS), we will additionally need to perform the SSL handshake. This can take up to two additional full round trips between the client and the server. If the SSL session is cached, we can get away with just one additional round trip.

Finally, after all these procedures, Chrome is at last able to send the HTTP request (requestStart in Navigation Timing terms). Upon receiving the request, the server processes it and sends the response back to the client. This takes at least one round trip, plus the processing time on the server. And with that we finally have a response. Unless, that is, the response is an HTTP redirect! In that case the whole procedure described above has to be repeated from the start. Have a couple of not entirely necessary redirects on your pages? You may want to go back and reconsider that decision.
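A quick way to spot such redirects from the page itself is, again, Navigation Timing. Note that redirect timings are only exposed for same-origin redirects, or cross-origin ones that opt in via Timing-Allow-Origin:

```ts
// Count the redirects taken before the final document request, and what they cost.
const [nav] = performance.getEntriesByType("navigation") as PerformanceNavigationTiming[];

if (nav && nav.redirectCount > 0) {
  const cost = nav.redirectEnd - nav.redirectStart;
  console.warn(`${nav.redirectCount} redirect(s) added roughly ${Math.round(cost)} ms before the real request`);
}
```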

Have you been keeping track of all these delays? To illustrate the problem, let us assume the worst case on a typical broadband connection: a local cache miss, followed by a relatively fast DNS lookup (50 ms), the TCP handshake, the SSL negotiation, and a relatively fast (100 ms) server response, with 80 ms for delivering the request and the response (an average round trip across the continental US):
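A rough tally of that scenario, assuming one 80 ms round trip for the TCP handshake and two for the SSL negotiation, looks like this:

```ts
// Adding up the worst-case scenario described above.
const rtt = 80;                 // ms, one round trip across the continental US
const dns = 50;                 // ms, relatively fast DNS lookup
const tcpHandshake = rtt;       // one round trip
const sslHandshake = 2 * rtt;   // two round trips for a full SSL negotiation
const requestResponse = rtt;    // delivering the request and receiving the response
const serverTime = 100;         // ms, server processing

const total = dns + tcpHandshake + sslHandshake + requestResponse + serverTime;
console.log(total, "ms");       // 470 ms, of which only 100 ms is the server's work
```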

In total, that is 470 ms for a single request, of which roughly 80% is spent just getting to the server, compared with the time the server needed to process the request. In fact, even 470 milliseconds may be an optimistic estimate:



What does "fast enough" mean?


Network costs for DNS, the handshakes and the transfer of the request and response are what dominate the total time in the scenario above: the server response accounts for only 20% of the total wait! But, in the grand scheme of things, do these delays actually matter? If you are reading this, you probably already know the answer: yes, and very much so.

Recent user experience research paints the following picture of what users expect from any interface, web applications and desktop applications alike:
Delay             User reaction
0-100 ms          instant
100-300 ms        small but noticeable delay
300-1000 ms       "the request is being processed"
more than 1 s     the user's attention switches to another context
more than 10 s    "I'll come back later"

The table above also explains the unofficial performance rule for web applications: render your pages, or at least give visual feedback to a user's action, within 250 ms to keep the user engaged. And this is not speed merely for speed's sake. Studies at Google, Amazon, Microsoft and thousands of other sites show that additional latency has a direct impact on a site's success: faster sites get more page views, higher user loyalty and better conversion.
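One hedged way to check your own pages against that informal 250 ms budget is the Paint Timing API, which appeared in browsers well after this article was written:

```ts
// Log when the browser first painted anything, and first painted content,
// to compare against the ~250 ms target discussed above.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(entry.name, Math.round(entry.startTime), "ms"); // e.g. "first-contentful-paint 180 ms"
  }
}).observe({ type: "paint", buffered: true });
```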

So, we have a target latency of roughly 250 ms, yet, as we saw above, the combination of the DNS lookup, the TCP and SSL handshakes, and the delivery of the request already adds up to 370 ms. That puts us roughly 50% over budget, and we still have not counted the server's processing time!

For most users, and even most web developers, the DNS, TCP and SSL delays are effectively invisible; they accumulate at layers of abstraction that few of us ever think about. And yet each of these steps is critical to the overall user experience, since every extra network request can add tens or hundreds of milliseconds of latency. This is why Chrome's network stack is much, much more than a simple socket handler.

Now that we have discussed the problem, it is time to move on to the implementation details.

PS from the translator: since the article is quite large, I decided to split it into theory and practice; the second part is more interesting and much larger. As it turned out, a lot of the time spent on a translation goes into formatting it for Habr, about 40%, and into rereading it in Russian, since for me that is a kind of double translation. Thanks for your attention.

Source: https://habr.com/ru/post/167971/

