"Under the hood" Turbo Pages: architecture of fast loading web pages technology

Hi, my name is Stas Makeev. At Yandex, I lead the development of Turbo pages, a technology that loads content quickly even over a slow connection. Today I will tell Habr readers a little about the architecture of our project.

How happy users are depends largely on how quickly they see the contents of a web page. Speed is a concern for many people: in the mobile app store, Speedtest alone has more than one hundred million installs. Providers, mobile operators, and website and application developers all strive to deliver content as fast as possible to keep their customers satisfied.

The average download speed in Russian mobile networks is 16.26 Mbps, which is a pretty good figure. But connection speed is uneven, and we still run into slow Internet: 3G, 2G, EDGE. You have surely been in a cafe or a shopping center, on the road or at a country house, where the usually high speed drops and sites take tens of seconds or even longer to load.
Turbo pages solve the problem of access to content, including at low or unstable connection speeds. This matters to site owners, because fewer of their visitors drop off on the way from search results.

How Turbo Pages Work

The site owner registers an RSS feed in Yandex.Webmaster. The feed enters the Turbo pages content system, which fetches updates from it every few minutes. Heavy content, primarily images and video, is cached and distributed across a CDN. Besides RSS, content can be submitted through an API or collected by an auto-parser.

The volume of cached images of Turbo pages approaches 100 TB

Reliability and fault tolerance matter to us, so we keep several replicas of the data in our three data centers. In each data center, hundreds of servers process thousands of requests per second, which lets us balance the load flexibly.

The Turbo pages content system deserves a separate post, and we will write one. For now, we will limit ourselves to a simplified scheme.

What happens when you open a URL in a browser?

When a user opens a Turbo page, something like this happens “under the hood”:

The HTTP adapter handles the user's HTTP request and sends a request to the appropriate graph in AppHost and to Report-renderer.

AppHost is a dedicated component that encapsulates network interaction between sources, which is described as a dependency graph. Sources are polled in the topological order of this graph, and all business logic lives in the sources and in the graph configuration. In particular, it is at the graph level that the KV storage is polled and data requests are sent to third-party APIs.
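To make the idea of polling sources in topological order concrete, here is a minimal sketch in Python. It is not AppHost's actual code or configuration format: the source names, the `GRAPH` layout, and the handlers are all hypothetical, and the sketch calls sources one by one, whereas a real system would probably poll independent sources in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each source maps to the set of sources it depends on.
GRAPH = {
    "http_adapter": set(),
    "kv_storage": {"http_adapter"},
    "third_party_api": {"http_adapter"},
    "report_renderer": {"kv_storage", "third_party_api"},
}


def poll_in_dependency_order(graph, handlers):
    """Call each source only after all of its dependencies have answered."""
    results = {}
    for name in TopologicalSorter(graph).static_order():
        # Hand the answers of the dependencies to the source handler.
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = handlers[name](inputs)
    return results


# Hypothetical handlers standing in for real network calls.
HANDLERS = {
    "http_adapter": lambda inputs: {"url": "/turbo?text=example"},
    "kv_storage": lambda inputs: {"doc": "cached document"},
    "third_party_api": lambda inputs: {"extra": "api data"},
    "report_renderer": lambda inputs: "<html>" + inputs["kv_storage"]["doc"] + "</html>",
}

ORDER = list(TopologicalSorter(GRAPH).static_order())
RESULTS = poll_in_dependency_order(GRAPH, HANDLERS)
```

In this sketch the rendering source always runs last, because it depends on both the storage and the API answers.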

Report-renderer is a Node.js application that accepts JSON as input, executes templates written in JS, and returns a string.

All this happens almost instantly.

What affects the download speed?

We work on every aspect of speed: from enabling HTTP/2 on the balancer and optimizing the TLS handshake to hand-optimizing SVG. To do that well, you need to understand what the speed seen by the end user is made of.

Within the team, we distinguish three stages of request processing: server, network, and client.

Server

This covers everything that happens in the data centers: from the moment the HTTP request arrives at our server to the generation of the HTML page that is sent to the client.

Request processing time on the server should be minimal. Although the values are relatively small, this time affects absolutely every user request. Besides, all of these processes run in an environment we control, so there is simply no excuse for large delays.

Server time consists of the network interactions between the vertices of the source dependency graph and the execution time of each vertex. We will not dwell on the network infrastructure of Yandex data centers here: it deserves a separate post.

I would like to focus on the second component, the execution time of each vertex. As an example, let us look at our principles and tools for working on the Report-renderer component, which is responsible for generating HTML. For other components they are very similar.

Our CI process runs basic checks on every commit in a feature branch before a pull request can be merged into dev. If any indicator exceeds its limit, the merge into dev is blocked until the cause is understood.

Key metrics at this stage:

templating time;
the size of the final page;
the size of static files.

We assemble client static files (CSS and JS) for each page depending on the data, but the block bundles themselves do not depend on the request, so it is enough to compare file sizes in the branch with the same files in dev. Different file types have different thresholds; once a threshold is exceeded, the task cannot be merged into dev without an “OK” from the people responsible for speed.
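A size check of this kind could look roughly like the sketch below. The threshold values, the file names, and the function name `size_regressions` are assumptions made for illustration, not the actual CI task.

```python
from pathlib import PurePosixPath

# Hypothetical limits on allowed growth per file type, in bytes.
THRESHOLDS = {".js": 1024, ".css": 512}


def size_regressions(dev_sizes, branch_sizes, thresholds=THRESHOLDS):
    """Return files whose size grew past the threshold for their type."""
    regressions = {}
    for path, new_size in branch_sizes.items():
        old_size = dev_sizes.get(path, 0)  # a brand-new file counts in full
        limit = thresholds.get(PurePosixPath(path).suffix)
        growth = new_size - old_size
        if limit is not None and growth > limit:
            regressions[path] = growth
    return regressions


# Example sizes in bytes for the same bundles in dev and in the branch.
DEV = {"bundle.js": 50_000, "bundle.css": 20_000}
BRANCH = {"bundle.js": 52_000, "bundle.css": 20_100}
BLOCKED = size_regressions(DEV, BRANCH)
```

Here the JS bundle grew by 2000 bytes and would need a sign-off, while the CSS growth of 100 bytes stays under its limit.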

As a rule, this leads to a joint code review and a search for ways to optimize.

Page size and templating time need a different approach, since they depend heavily on the specific request and require some statistical confidence. Synthetic requests will not do either, because the measurements would not be representative. So we continuously sample random user requests from the access logs, turn them into “ammo”, and “fire” it at the templates in both the branch with the changes and dev. This lets us catch changes even on requests that are not very popular.

We have several “baskets of requests” that cover most of the traffic to Turbo pages.
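The comparison of templating time between dev and a branch can be sketched as follows. The choice of the 95th percentile, the 5% tolerance, and all function names are assumptions; the source does not say which statistics are actually used.

```python
import statistics
import time


def templating_times(render, requests):
    """Measure how long `render` takes on each sampled request, in seconds."""
    timings = []
    for request in requests:
        started = time.perf_counter()
        render(request)
        timings.append(time.perf_counter() - started)
    return timings


def percentile(values, q):
    """Return the q-th percentile (1..99) of the measured values."""
    return statistics.quantiles(values, n=100)[q - 1]


def is_regression(dev_timings, branch_timings, q=95, tolerance=1.05):
    """Flag the branch if its q-th percentile is over 5% slower than dev."""
    return percentile(branch_timings, q) > percentile(dev_timings, q) * tolerance
```

Feeding both builds the same sampled requests keeps the comparison fair, and a high percentile is more sensitive to slow, less popular requests than the mean is.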

Besides optimizing our own templates, we keep track of optimizations inside V8. For example, the switch to TurboFan gave excellent results: server templating time dropped significantly.

Server templating time decreased after the switch to TurboFan

Network

The network part covers everything that happens between the client and the server: data transfer time, the size of the page and static files, and resource caching. This is where it gets more interesting, because we leave our cozy data centers for the wild outside world, where not everything depends on us. Measurements become a little harder, but most importantly, you can win really tangible results here, in the hundreds of milliseconds.

This is what we do.

We have tuned TCP and TLS parameters to save several RTTs (round-trip times), which gives excellent results on high-latency networks. Our colleagues have already written about this, so I will not go into detail.

The size of the transmitted data can greatly affect the download speed, so we try to send only what the current page needs, in the most efficient way.

Images in our interfaces are optimized with ImageOptim. For SVG we use SVGO, and we are also not too lazy to look inside the files and optimize them by hand where possible.

We upload site owners' images to a special CDN optimized for serving images. We strip the EXIF data and the color profile after first converting the image to sRGB. The bit depth is reduced to 8 bits per channel, the compression level is set to 85, and the Lanczos filter is used for resizing.

We create dozens of variants of each picture for combinations of different screen sizes, taking pixel density (Retina displays) into account. And of course, we automatically encode images into WebP if the browser supports it.

Text formats (HTML, JavaScript, CSS) are compressed with gzip/zopfli, or with brotli if the browser supports it.
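As an illustration of serving different variants to different browsers, here is a simplified sketch of content negotiation based on the standard `Accept-Encoding` and `Accept` request headers. The function names and the example widths are hypothetical, and the sketch ignores quality values such as `q=0`, which a production implementation would have to respect.

```python
def pick_text_encoding(accept_encoding):
    """Prefer brotli, fall back to gzip, else send the text uncompressed."""
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    if "br" in offered:
        return "br"
    if "gzip" in offered:
        return "gzip"
    return "identity"


def pick_image_format(accept, original_format="jpeg"):
    """Serve WebP only when the browser lists image/webp in Accept."""
    offered = {token.split(";")[0].strip().lower()
               for token in accept.split(",")}
    return "webp" if "image/webp" in offered else original_format


def variant_widths(css_widths, pixel_ratios):
    """Pixel widths to pre-generate for each screen width and density."""
    return sorted({round(w * r) for w in css_widths for r in pixel_ratios})
```

For example, `variant_widths([320, 375], [1, 2])` yields one width per combination of screen width and pixel density, with duplicates removed.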

It is important not to forget how far users are from the servers. Turbo pages are used in many regions, and the content can be anything. So we make no compromises: to reduce latency even in the most remote regions, we use a CDN that is constantly expanding.

And of course, the fastest request is the one that is never made. All static files are served with permanent caching from a separate cookieless domain, and to raise the cache hit rate they can also be warmed up on the main page and the search results page.
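Permanent caching is usually only safe when a file's URL changes whenever its content changes. The sketch below shows one common way to arrange that, with a content hash in the file name and a long-lived `Cache-Control` header. The hashing scheme and both function names are assumptions, since the source does not describe how Turbo pages version their static files.

```python
import hashlib


def versioned_name(filename, content):
    """Put a short content hash into the name so a new build gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"


def static_headers(max_age_seconds=31_536_000):
    """Headers for a static file that never changes under its URL."""
    return {
        # One year, and "immutable" tells the browser not to revalidate.
        "Cache-Control": f"public, max-age={max_age_seconds}, immutable",
    }
```

With names like these, a cached file can be reused without any request until a new build produces a different hash.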

Client

It is not enough to generate the server response and deliver it to the browser over the network; it also has to be displayed efficiently. We optimize the time to the start of page rendering so that people can begin reading sooner.

In the HTML head, we “warm up” the connection to the servers that serve our static files and additionally preload those files. Styles are inlined into the page, which lets the browser start rendering without waiting for styles to load over the network.
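A page head built along these lines might be generated as in the following sketch. It uses the standard `preconnect` and `preload` link relations; the origin `https://static.example.com`, the script path, and the function name `build_head` are placeholders, not the real Turbo pages markup.

```python
from html import escape


def build_head(static_origin, preload_scripts, inline_css):
    """Build a <head> that warms up the static host and inlines styles."""
    # Open the connection to the static host before any file is requested.
    lines = [f'<link rel="preconnect" href="{escape(static_origin)}">']
    for src in preload_scripts:
        url = escape(static_origin + src)
        # Start downloading the script early without blocking rendering.
        lines.append(f'<link rel="preload" href="{url}" as="script">')
    # Inline styles: no extra network round trip before the first paint.
    lines.append(f"<style>{inline_css}</style>")
    return "<head>\n" + "\n".join(lines) + "\n</head>"


HEAD = build_head("https://static.example.com", ["/turbo.js"], "body{margin:0}")
```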

Content images, embeds, and ads are not loaded right away but as the page is read, when they approach the user's field of view.

JavaScript is partially embedded in the HTML, and all other scripts are loaded at the end with separate HTTP requests. The embedded part includes scripts that are critical for startup, error and metric collection, and components that rarely appear on a page.

We collect RUM metrics for page load. The most critical are time to first byte, first paint, and time to interactive, when all scripts have finished initializing and the user can use the page.

Most users reach Turbo pages not directly but from other Yandex services, and we wanted to evaluate page load time in the context of that user experience: not an abstract time in a vacuum, but a metric of how the user actually sees it.

So we formulated an integral speed metric:
max(firstContentfulPaint, firstImageLoadTime, timeToVisible) - timeToClick

Where:

timeToClick - the absolute time of the click that led to the Turbo page being shown. This can be a click on a snippet on the search results page or on a card in Yandex.Zen.
firstImageLoadTime - the absolute load time of the first content image on the first screen.
timeToVisible - the absolute time when the page became visible. This is relevant when the page was loaded in the background.

This gave us a metric of user experience. The reasoning behind it:

if 2/3 of the screen is taken up by an image that has not loaded yet, firstContentfulPaint on its own is a rather dubious measure;
links carry many event handlers, so a nonzero amount of time can pass between the click and the actual start of page loading, and we want to account for it.
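The formula above translates directly into code. In this sketch all four inputs are assumed to be absolute timestamps in milliseconds on the same clock; the function name and the sample values are made up for illustration.

```python
def integral_speed(first_contentful_paint, first_image_load_time,
                   time_to_visible, time_to_click):
    """Time from the click until the page is painted, visible, and has its first image."""
    # The page counts as shown only when the slowest of the three events happens.
    shown_at = max(first_contentful_paint, first_image_load_time, time_to_visible)
    return shown_at - time_to_click


# Click at 1000 ms, first paint at 1800 ms, first image at 2400 ms,
# page became visible at 1500 ms.
SAMPLE = integral_speed(1800, 2400, 1500, 1000)
```

In the sample, the first image is the last event to arrive, so the metric is measured from the click to the image load.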

We keep developing the technology so that websites attract more visitors. A Turbo page now loads on average 15 times faster than the regular mobile version. Tens of thousands of sites use Turbo, and the total number of visits to them exceeds 12 billion.

All this is the result of the work of developers, support staff, managers who work with site owners, and many others. The team, of course, keeps growing. For example, we are currently looking for frontend and backend specialists and will be glad to welcome new colleagues.

Which components of Turbo technology would you like to read about in more technical detail? Which parts of our experience would be interesting to you? We also welcome feedback and ideas. Thank you!

Source: https://habr.com/ru/post/460373/
