Better HTTP/2 prioritization for a faster web

HTTP/2 promised to make the web significantly faster, and Cloudflare has long since enabled HTTP/2 for all customers. But one HTTP/2 feature, prioritization, has not lived up to expectations. This is not because it is fundamentally broken, but because of how browsers implement it.

Today, Cloudflare is introducing a change to HTTP/2 prioritization that gives our servers control over prioritization decisions, and that genuinely makes the web faster.

Historically, the browser has controlled how and when web content is loaded. Today, for all paid plans, we are making a radical change to this model: control passes directly to the site owner. On the Speed tab of the Cloudflare dashboard, customers can turn on Enhanced HTTP/2 Prioritization. It overrides the browser defaults with an improved scheduling scheme that makes pages noticeably faster for visitors (in some cases we have seen them load 50% faster). With Cloudflare Workers, site owners can go even further and fully customize the behavior for their specific needs.

Current situation

Web pages consist of dozens (sometimes hundreds) of individual resources that the browser loads and assembles into the final displayed content. These include the visible content the user interacts with (HTML, CSS, images) as well as application logic (JavaScript) for the site itself, for advertising, for analytics, and for marketing tracking beacons. From the user's point of view, the order in which these resources load matters a great deal: it determines when they see the content and when they can interact with the page.

The browser is, in essence, an HTML processing engine that works through the HTML document from start to finish, following the instructions in order and building the page as it goes. Links to style sheets (CSS) tell the browser how to style the page content, and the browser delays displaying content until it has loaded the style sheet. Scripts can behave in different ways. If a script is marked as “async” or “defer”, the browser can keep processing the document and simply run the script when it becomes available. If a script is not marked as async or defer, the browser MUST stop processing the document until the script has loaded and run. Such scripts are called “blocking” because they block the browser from continuing to process the document.

An HTML document is divided into two parts. The document head (<head>) comes first and contains the style sheets, scripts, and other instructions the browser needs to display the content. After the head comes the document body (<body>), which contains the actual content displayed in the browser window (although scripts and style sheets can also appear in the body). Until the browser reaches the body, there is nothing to show the user, and the page stays blank. It is therefore important to get through the head as quickly as possible. If you are interested in the details, the HTML5 Rocks website has an excellent tutorial on how browsers work.

The browser is usually responsible for deciding the order in which to load the various resources needed to build the page and continue processing the document. In HTTP/1.x, the browser is limited in how many things it can request from any one server at a time (usually 6 connections, with only one resource at a time per connection), so the order of requests is strictly controlled by the browser. With HTTP/2 the situation is very different. The browser can request all resources at once (or at least as soon as it knows about them) and gives the server detailed instructions on how to deliver them.

Optimal resource loading order

For most parts of the page loading cycle, there is an optimal loading order that makes the page usable for the visitor as early as possible (and the difference between an optimal and a non-optimal order can be 50% or more).

As described above, the browser cannot display any content until it has loaded the CSS and blocking JavaScript in the <head> section. During this stage, it is best to devote 100% of the connection to the blocking resources and to download them one at a time, in the order they appear in the HTML, rather than in parallel. This lets the browser parse and execute each item while the next blocking resource is downloading, which creates an optimal pipeline.

The total download time for the scripts is the same whether they load in parallel or one after another, but with sequential loading the first script can be parsed and executed while the second one is still downloading.
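The pipelining effect can be checked with a small simulation. This is only a sketch under assumed numbers: two blocking scripts that each take 1 second to download at full bandwidth and 0.5 seconds to execute (the execution time and the file names are hypothetical, not taken from the article). It compares the moment the last script finishes executing when the scripts arrive one after another versus when they share the connection evenly.

```javascript
// Hypothetical workload: two blocking scripts, 1 s download each at full
// bandwidth, 0.5 s to parse and execute once fully downloaded.
const scripts = [
  { name: 'a.js', downloadSeconds: 1, executeSeconds: 0.5 },
  { name: 'b.js', downloadSeconds: 1, executeSeconds: 0.5 },
];

// Sequential delivery: each script gets 100% of the connection in order.
function sequentialArrivals(items) {
  let clock = 0;
  return items.map((item) => {
    clock += item.downloadSeconds;
    return clock;
  });
}

// Parallel delivery: bandwidth is split evenly among unfinished downloads.
function parallelArrivals(items) {
  const remaining = items.map((item) => item.downloadSeconds);
  const arrivals = new Array(items.length).fill(0);
  let clock = 0;
  let active = remaining.map((_, i) => i);
  while (active.length > 0) {
    // The smallest remaining download finishes next; with the bandwidth
    // split N ways it needs N times its full-speed duration.
    const smallest = Math.min(...active.map((i) => remaining[i]));
    clock += smallest * active.length;
    for (const i of active) {
      remaining[i] -= smallest;
      if (remaining[i] <= 1e-9) arrivals[i] = clock;
    }
    active = active.filter((i) => remaining[i] > 1e-9);
  }
  return arrivals;
}

// Blocking scripts execute in document order, each one starting as soon as
// it has arrived and the previous script has finished executing.
function executionDone(arrivals, items) {
  let cpuFree = 0;
  return items.map((item, i) => {
    const start = Math.max(arrivals[i], cpuFree);
    cpuFree = start + item.executeSeconds;
    return cpuFree;
  });
}

const seq = executionDone(sequentialArrivals(scripts), scripts);
const par = executionDone(parallelArrivals(scripts), scripts);
console.log('sequential: last script done at', seq[seq.length - 1], 's');
console.log('parallel:   last script done at', par[par.length - 1], 's');
```

Under these assumptions both variants finish downloading at the 2-second mark, but the sequential one finishes executing at 2.5 seconds and the parallel one at 3 seconds, because the first script's execution overlaps the second script's download.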

Once the blocking resources have loaded, things get a bit more interesting. Here the optimal order may depend on the particular site or even on business priorities (user content versus advertising, analytics, and so on). Fonts are a separate problem, because the browser discovers which fonts it needs only after applying the style sheets to the content that is about to be displayed. So by the time the browser learns about a font, it is already needed to draw text that is otherwise ready to appear on screen. Any delay in loading the font means the text is missing from the screen (or is shown in the wrong font).

Generally, there are a few trade-offs to consider:

Custom fonts and images in the visible part of the page (the viewport) should load as quickly as possible. They directly affect the user's visual experience as the page loads.
Non-blocking JavaScript should load sequentially relative to other JavaScript resources so that the execution of each script can be pipelined with the downloads. JavaScript may include user-facing application logic as well as tracking beacons for analytics and marketing, and delaying them can hurt the metrics the business tracks.
Images benefit from downloading in parallel. The first few bytes of an image file contain its dimensions, which the browser may need for layout, and progressive images downloaded in parallel can look visually complete after only about 50% of the bytes have been transferred.

Given the tradeoffs, in most cases, this strategy works well:

Custom fonts are loaded sequentially and share the available bandwidth with images in the viewport.
Visible images are loaded in parallel, splitting the share of bandwidth allocated to them.
When there are no more fonts or visible images:
- Non-blocking scripts are loaded sequentially and share the available bandwidth with images that are not visible (outside the viewport).
- Images outside the viewport are loaded in parallel, splitting the share of bandwidth allocated to them.

This way, user-visible content loads as quickly as possible, application logic is delayed as little as possible, and images outside the viewport load in a way that lets the layout complete as early as possible.

Example

To illustrate, we use a simplified product category page from a typical e-commerce site:

Blue - The HTML file for the page itself.
Green - One external style sheet (CSS file).
Orange - Four external scripts (JavaScript): two blocking scripts at the beginning of the page and two asynchronous ones. The blocking scripts are shown in a darker shade of orange.
Red - One custom web font.
Purple - 13 images. The viewport shows the page logo and four product images, and another 8 product images require scrolling. The five visible images are shown in a darker shade of purple.

For simplicity, we assume that all resources are the same size and that each takes 1 second to download. Downloading everything takes 20 seconds in total, but the order and manner of loading matter enormously.

Here's what the optimal resource load will look like in a browser:

The page is blank for the first 4 seconds while the HTML, CSS, and blocking scripts load, each using 100% of the connection.
At the 4-second mark, the background and page structure are displayed without text or images.
One second later, at the 5-second mark, the page text appears.
From 5 to 10 seconds the images load, blurry at first but sharpening very quickly. At about 7 seconds, the result is almost indistinguishable from the final version.
At 10 seconds, all visual content in the viewport has finished loading.
Over the next two seconds, the asynchronous JavaScript is loaded and executed, running any non-critical logic (analytics, marketing tags, etc.).
In the last 8 seconds, the remaining images are loaded in case the user scrolls the page.

Current browser prioritization

All current browser engines implement different prioritization strategies, none of which is optimal.

Microsoft Edge and Internet Explorer do not support prioritization, so they fall back to the HTTP/2 default, which loads everything in parallel and splits the bandwidth evenly among all resources. Future versions of Microsoft Edge will move to the Chromium engine, which may improve the situation. For now, though, in our example the browser is stuck in the page head most of the time, because the images slow down the delivery of the blocking scripts and style sheets.

Visually, this makes for a rather painful experience: the user stares at a blank screen for 19 seconds, and then the text takes one more second to appear. Be patient when watching the animation below, because for 19 seconds it may look as though nothing is happening on the empty screen (although it is):

Safari loads all resources in parallel, splitting the bandwidth according to how important Safari considers each one (blocking resources such as scripts and style sheets are more important than images). Images load in parallel with each other, but also at the same time as the blocking content.

Although Safari resembles Edge in that everything downloads at the same time, allocating more bandwidth to the blocking resources lets content appear much earlier:

After about 8 seconds, the style sheet and scripts have finished loading, so the page can start rendering. Since the images were loading in parallel, they can also be partially displayed (blurry, for progressive images). This is still twice as slow as the optimal scenario, but much better than in Edge.
After about 11 seconds, the font has loaded and the text can be displayed. By this point more image data has arrived, so the images are a little sharper. This is comparable to the state at around the 7-second mark in the optimal scenario.
Over the remaining 9 seconds, the images sharpen as more data arrives, until everything is finally complete at 20 seconds.

Firefox builds a dependency tree that groups resources, and then schedules the groups either to load one after another or to share bandwidth between groups. Within a group, resources share bandwidth and load simultaneously. Images are scheduled to load after the render-blocking style sheets and load in parallel with each other, but the render-blocking scripts and style sheets also load in parallel and do not get the benefits of pipelining.

In our example, this turns out a bit faster than Safari, since the images wait for the style sheets to load:

At 6 seconds, the initial page content is displayed with the background and blurry versions of the product images (compared to 8 seconds for Safari and 4 seconds in the optimal case).
At 8 seconds, the font has loaded and the text can be displayed along with slightly sharper product images (compared to 11 seconds for Safari and 7 seconds in the optimal case).
Over the remaining 12 seconds, the images sharpen as the rest of the content loads.

Chrome (and all Chromium-based browsers) arranges resources into a prioritized list. This works very well for blocking resources, which are best loaded in order, but not so well for images: each image is loaded to 100% before the next one starts.

In practice, this is almost the optimal loading scenario, with the only difference that images are loaded one at a time, and not in parallel:

Up to the 5-second mark, Chrome's loading is identical to the optimal scenario, displaying the background at 4 seconds and the text content at 5.
Over the next 5 seconds, the visible images load one at a time until they are all done at 10 seconds (compared to the optimal scenario, where they appear slightly blurry at 7 seconds and sharpen over the remaining three seconds).
Once the visual part of the page is complete at 10 seconds (identical to the optimal scenario), the remaining 10 seconds are spent running the asynchronous scripts and loading the hidden images (again as in the optimal scenario).

Visual comparison

The visual impact differs considerably between browsers, even though technically downloading all of the content takes the same amount of time:

Server-side prioritization

HTTP/2 prioritization is requested by the client (the browser), and it is up to the server to decide what to do with the request. A good number of servers do not support this feature at all, and those that do simply honor the client's request. Another option is to decide on the best prioritization on the server side, taking the client's request into account.

According to the specification, HTTP/2 prioritization is a dependency tree that requires full knowledge of all in-flight requests in order to prioritize resources relative to one another. This allows incredibly complex strategies, but it is hard to implement well on either the browser or the server side (as the varying browser strategies and levels of server support show). To make prioritization easier to manage, we developed a simpler scheme that still has all the flexibility needed for optimal scheduling.

The Cloudflare prioritization scheme consists of 64 priority “levels”, and within each level there are groups of resources that determine how to divide the connection between them:

All resources at a higher priority level are delivered first, and only then does delivery move on to the next lower level.

Within a given priority level, there are three different concurrency groups:

0: all resources in concurrency group “0” are sent sequentially in the order in which they were requested, using 100% of the bandwidth. Only after all the resources in group “0” have been delivered are the other groups at the same level considered.
1: all resources in concurrency group “1” are sent sequentially in the order in which they were requested. The available bandwidth is split evenly between concurrency group “1” and concurrency group “n”.
n: resources in concurrency group “n” are sent in parallel, splitting the bandwidth available to the group among themselves.

In practice, concurrency group “0” is useful for critical content that needs to be processed sequentially (scripts, CSS, etc.). Group “1” is useful for less important content that can share bandwidth with other resources, but where the resources themselves still benefit from sequential processing (asynchronous scripts, non-progressive images, etc.). Concurrency group “n” is useful for resources that benefit from parallel delivery (progressive images, video, audio, etc.).
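The rules above can be sketched as a function that, given the pending responses on a connection, returns the fraction of bandwidth each one would receive at that instant. This is an illustrative model, not Cloudflare's implementation. It assumes that a larger priority number means a higher level, that the pending list is already in request order, and that when only one of the groups “1” and “n” has pending resources at a level, that group gets all of the bandwidth; the function name and resource names are hypothetical.

```javascript
// pending: array of { id, priority, concurrency } in request order,
// where concurrency is one of the strings '0', '1' or 'n'.
// Returns an object mapping resource id -> fraction of bandwidth (0..1).
function bandwidthShares(pending) {
  const shares = {};
  if (pending.length === 0) return shares;

  // Only the highest pending priority level is served (assumption:
  // a larger number means a higher level).
  const top = Math.max(...pending.map((r) => r.priority));
  const level = pending.filter((r) => r.priority === top);
  const inGroup = (g) => level.filter((r) => r.concurrency === g);

  // Group "0": strictly sequential, 100% of the bandwidth, before the
  // other groups at this level are considered at all.
  const group0 = inGroup('0');
  if (group0.length > 0) {
    shares[group0[0].id] = 1;
    return shares;
  }

  // Groups "1" and "n" split the bandwidth evenly when both have work
  // (assumption: a lone group takes everything).
  const group1 = inGroup('1');
  const groupN = inGroup('n');
  const split = group1.length > 0 && groupN.length > 0 ? 0.5 : 1;

  // Group "1": only the earliest-requested resource is sent.
  if (group1.length > 0) shares[group1[0].id] = split;
  // Group "n": all resources share the group's portion equally.
  for (const r of groupN) shares[r.id] = split / groupN.length;
  return shares;
}

// Hypothetical snapshot of one connection.
const snapshot = [
  { id: 'font.woff2', priority: 30, concurrency: '1' },
  { id: 'hero.jpg', priority: 30, concurrency: 'n' },
  { id: 'logo.png', priority: 30, concurrency: 'n' },
  { id: 'async.js', priority: 30, concurrency: '1' },
  { id: 'footer.jpg', priority: 10, concurrency: 'n' },
];
console.log(bandwidthShares(snapshot));
```

For this snapshot the font receives half of the bandwidth, each of the two priority-30 images a quarter, and the second group “1” resource and the lower-level image wait their turn.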

Default Cloudflare Prioritization

When enabled, Enhanced HTTP/2 Prioritization implements the “optimal” resource loading order described above. The specific priorities used are as follows:

This scheme lets us send the render-blocking resources sequentially, then the visible images in parallel, and then the rest of the page content with some degree of bandwidth sharing to balance application and content loading. The “* If Detectable” caveat is that not all browsers distinguish between the different types of style sheets and scripts, but the result will still be significantly faster in all cases. Pages loading 50% faster, especially for Edge and Safari visitors, will not be unusual.

Customizing prioritization with Workers

Faster defaults are great, but things get really interesting with the ability to customize prioritization through Cloudflare Workers, so that sites can override the default priority for individual resources or implement prioritization schemes of their own.

If a Worker adds a cf-priority header to the response, the Cloudflare edge servers apply the specified priority and concurrency. The header format is <priority>/<concurrency>, so response.headers.set('cf-priority', '30/0'); sets the response to priority 30 and concurrency 0. Similarly, '30/1' sets the concurrency to 1, and '30/n' sets the concurrency to n.

With this flexibility, a site can assign arbitrary priorities to its resources to suit its needs. For example, it can raise the priority of some important asynchronous scripts or hero images so that they start downloading before the browser has worked out that they are in the viewport.
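A minimal sketch of how such overrides might be organized is shown below. Only the cf-priority header name and the <priority>/<concurrency> format come from the text above; the helper names, the URL paths, and the choice of priority 30 are hypothetical. The response object is replaced by a small stand-in with a headers.set method so that the snippet runs outside the Workers runtime.

```javascript
// Build a cf-priority value in the <priority>/<concurrency> format.
function formatPriority(priority, concurrency) {
  if (!Number.isInteger(priority) || priority < 0) {
    throw new RangeError('priority must be a non-negative integer');
  }
  if (!['0', '1', 'n'].includes(String(concurrency))) {
    throw new RangeError('concurrency must be 0, 1 or n');
  }
  return `${priority}/${concurrency}`;
}

// Hypothetical site-specific rules: boost a hero image and one important
// async script, and leave everything else on the default priorities.
function pickPriority(pathname) {
  if (pathname === '/img/hero.jpg') return formatPriority(30, 'n');
  if (pathname.endsWith('/app-async.js')) return formatPriority(30, 1);
  return null;
}

// Set the header on anything that exposes headers.set(name, value).
function applyPriority(response, pathname) {
  const value = pickPriority(pathname);
  if (value !== null) response.headers.set('cf-priority', value);
  return response;
}

// Stand-in for a response object; a Map has the same set/get shape.
const fakeResponse = { headers: new Map() };
applyPriority(fakeResponse, '/img/hero.jpg');
console.log(fakeResponse.headers.get('cf-priority')); // 30/n
```

In a real Worker, applyPriority would be called on the response just before it is returned from the fetch handler. A response obtained from fetch() would probably need to be copied first, for example with new Response(response.body, response), because its headers may not be modifiable directly.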

To help inform prioritization decisions, the Workers runtime also exposes the prioritization information requested by the browser in the request object that is passed to the Worker's fetch event listener (request.cf.requestPriority). The incoming priority is a semicolon-delimited list of attributes that looks like this: weight=192;exclusive=0;group=3;group-weight=127.

weight: the HTTP/2 prioritization weight.
exclusive: the HTTP/2 exclusive flag (1 for Chromium-based browsers, 0 for others).
group: the HTTP/2 stream ID of the request group (non-zero for Firefox).
group-weight: the HTTP/2 weight of the request group (non-zero for Firefox).
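The attribute list is straightforward to turn into an object. The sketch below parses a string in the format shown above and adds a rough browser-family guess based only on the hints in the attribute descriptions. Both function names are hypothetical, and the guess is a heuristic for illustration rather than a reliable detection method.

```javascript
// Parse "weight=192;exclusive=0;group=3;group-weight=127" into an object.
// Numeric values are converted to numbers; anything else is kept as text.
function parseRequestPriority(raw) {
  const result = {};
  for (const pair of String(raw).split(';')) {
    const [key, value] = pair.split('=');
    if (!key || value === undefined || value.trim() === '') continue;
    const number = Number(value);
    result[key.trim()] = Number.isNaN(number) ? value.trim() : number;
  }
  return result;
}

// Heuristic from the attribute descriptions: the exclusive flag is 1 for
// Chromium-based browsers, and the group is non-zero for Firefox.
function guessBrowserFamily(priority) {
  if (priority.exclusive === 1) return 'chromium';
  if (priority.group) return 'firefox';
  return 'other';
}

const parsed = parseRequestPriority(
  'weight=192;exclusive=0;group=3;group-weight=127'
);
console.log(parsed, guessBrowserFamily(parsed));
```

In a Worker, the input string would come from request.cf.requestPriority, and the parsed values could feed into the decision about which cf-priority header to set.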

This is just the beginning.

The ability to customize and control the priority of responses is a basic building block for a lot of future work. We intend to build our own advanced optimizations on top of it, but thanks to Workers support, all sites and researchers can experiment with different prioritization strategies. Through the Apps Marketplace, companies can also build new optimization services on top of the Workers platform and share them with other sites.

If you are on a Pro plan or higher, go to the Speed tab in the Cloudflare dashboard and turn on Enhanced HTTP/2 Prioritization to speed up your site.

Source: https://habr.com/ru/post/452020/
