Can you afford it? Real-world web performance budget

Posted by Alex Russell, developer of Chrome, Blink and Google’s web platform.

TL; DR: performance budgets are a significant but undervalued part of a successful product and a healthy team. Most of our partners are not aware of the conditions of the real world - and as a result, they choose the wrong technology. We set a budget for the time of five seconds or less before the interactivity of the site after the first download, as well as two or less seconds for subsequent downloads. If these guidelines are followed, we are limited to a typical real-world device and a typical network configuration. This is an Android smartphone for $ 200 on a 400 Kbps channel, RTT 400 ms. This means a budget of ~ 130-170 KB of critical path resources, depending on their composition: the larger the JS, the smaller the volume.

Over the past few years we have had the pleasure of working with dozens of teams. The work was enlightening, sometimes in very unexpected places. One of the most unexpected results is the frequent occurrences of “jams of javascript”.
')

“We need a new term for missed business opportunities because of the modern frontend. Could it be a “JavaScript trap”?

Managers who give the go-ahead to the creation of progressive web applications (PWA) are often referred to as the main motive for virtually seamless coverage of new users. At the same time, developers are mastering the tools that make it possible to achieve such a goal. Nobody wanted bad. However, the results of the “finished” PWA project often require weeks or months of painful rework to ensure minimally acceptable performance.

Such an alteration delays the launch, which in turn delays the collection of data on the viability of the chosen PWA strategy. Developers are often not aware of the problems until it is too late. They launch sites that are simply impossible to use, unless you are one of the wealthy owners of the best smartphones.

Baseline setting

Those teams that manage to avoid unpleasant results tend to show a few similarities:

The leaders are enthusiastic. They use the “do what you need” approach to ensure and keep the application running fast.
At an early stage, performance budgets are established.
Budgets are scaled according to the parameters of the network and devices on the market.
Continuous Integration (CI) tools and systems help track progress and prevent regression.

These parameters are based on each other: it’s difficult to plan the right things if you don’t have a manual that appreciates the importance of a user-friendly interface and its long-term importance to the business. Teams with such support can set performance budgets, hold contests between competing approaches, and invest in performance infrastructure. They have more will to go against “generally accepted standards” when popular tools prove their inconsistency.

Thanks to performance budgets, everyone is in the same boat - a culture of shared enthusiasm is being created to improve the user interface. Teams with budgets are also easier to track progress and make graphs. This helps managers: they have meaningful metrics in support of investments.

Budgets establish objective boundaries to determine which changes in the code base will be a step forward, and which - a step back from the user's point of view. Without this, you will inevitably fall into the trap and begin to pretend that you can afford more than allowed in reality. Very rarely, we have seen teams that succeed without a budget, collect RUM metrics, and use representative user devices.

Meetings with partners are indicative. We immediately got a clear idea of how bad the site’s performance is in terms of the percentage of leading programmers, project managers and managers with top smartphones that they use mainly in urban areas .

User enhancements consist of two steps:

Revising assumptions and growing understanding of real world conditions
Automated objective baseline testing

Never before have front-end developers had access to such excellent performance measurement tools and diagnostic techniques, however, poor results are normal. What's the matter?

JS is your most expensive resource

One distinct trend is the belief that the JavaScript framework and single-page architecture (SPA) must be used to develop progressive web applications. This is not true (more on this in the next article), and such sites will definitely need more scripts in each document (for example, for the router components). We regularly see websites loading more than 500 KB of scripts (compressed). This is important because all script downloads affect the most important metric: the time before the appearance of interactivity ( Time to Interactive ). Sites with so many scripts are simply not available to a significant part of users; statistically, users will not wait for the interface to load for so long. If they wait for the download, then they experience terrible lags.

We were often asked: “Why is the JS limit of 200 KB so important, do we have some bigger pictures?” Good question! To answer it, it is important to understand how the browser handles resources (of different types) and the concept of a critical path . As a timely introduction, I recommend Kevin Schaaf’s recent talk .

Loading JavaScript with a delay can lead to the fact that the “rendered on the server” pages do not work as the user expects, which is very annoying. This effect is the main reason why we strive to guarantee reliable interactivity.

Imagine this page:

<!DOCTYPE html> <html> <head> <link rel="stylesheet" href="/styles.css"> <script src="/app.js" async></script> </head> <body> <my-app> <picture slot="hero-image"> <source srcset="img@desktop.png, img@desktop-2x.png 2x" media="(min-width: 990px)"> <source srcset="img@tablet.png, img@tablet-2x.png 2x" media="(min-width: 750px)"> <img srcset="img@mobile.png, img@mobile-2x.png 2x" alt="I don't know why. It's a perfectly cromunlent word!"> </picture> </my-app> </body> </html>

The browser receives this document in response to a GET request to https://example.com/ . The server sends it as a stream of bytes, and when the browser encounters each of the sub-resources mentioned in the document, it requests them.

After the download is complete, this page should respond to user actions - there is the most “interactivity” from the “Time to Interactive” (TTI) parameter. Browsers process user actions by generating DOM events that application code waits for. User actions are processed in the main document stream , where JavaScript also works.

Here are some operations that can occur in other threads, in any case, preserving the browser's ability to respond to user actions:

HTML parsing
CSS parsing
JavaScript parsing and compilation (sometimes)
Some JS garbage collection tasks
Parsing and screening pictures
Hardware Accelerated CSS Transformations and Animations
Scrolling the main document (if there are no active touch event handlers)

However, the following operations should go in the main thread:

Run javascript
DOM construction
Layout
Processing input from the user (including scrolling in the presence of active touch event handlers)

If the document in our example did not rely on JavaScript to create the <my-app> element, the document content would most likely become interactive as soon as enough CSS and content is loaded for meaningful rendering.

Script execution delays interactivity in several ways:

If the script runs longer than 50 ms, the time to reach the interactive state is increased by the entire time required for downloading, compiling and executing JS
Any DOM or UI created on JS is not available for use while the script is running.

The images do not block the main stream, do not block the interaction during parsing and rasterization and do not interfere with other parts of the UI to run in interactive mode or save it. Therefore, a 150 KB image will not significantly increase TTI, but a JS of this size will delay interactivity by the time required for the following tasks:

Code request, including DNS, TCP, HTTP with unarchiving overhead
Parsing and compiling top-level JS functions
Script execution

These steps are often repeated.

If the script execution fits into 50 ms, then the TTI will not increase, but this is unrealistic. 150 KB of compressed JavaScript is unzipped to about 1 MB of code. As Eddie documented , the whole process takes more than a second on most phones in the world, not counting the download time .

JavaScript is the most braking part of any web page, both for downloading and for device performance. For developers and managers with fast smartphones on fast networks, these hidden costs can be doubly incomprehensible.

Global truth

It is extremely important to decide which benchmark to use to determine the performance budget. Some developers and companies know their audience very closely - and can make informed assessments about devices and networks for current and future users. However, most do not have such information for setting the base level. Where to begin?

Two numbers are important here:

45% of mobile connections in the world are 2G
75% of connections carried by 2G or 3G

The median user is working on a slow network . The only question is how slow the connection is.

Our metrics on Google give a controversial picture (I'm working to clarify). Some systems show median RTTs of about 100 ms for 3G users. Others indicate that the median user is not able to transmit or receive a separate packet in less than 400 ms in some large markets.

I think you should choose a conservative option. The competing, overloaded cells can make “fast” networks brutally slow, transmission dispersion greatly reduces the effectiveness of TCP , and natural surges of network traffic work against us .

For Google developers, a specially created “degraded 3G” network works to evaluate the behavior of applications in such conditions. It simulates an RTT connection of 400 ms and a bandwidth of 400-600 Kbps (plus latency dispersion and packet loss simulation). Given the contradictory picture that our metrics show, this can be taken as a basic level.

However, simulating packet loss and latency dispersion can make benchmarks very difficult and underestimate results. The effect of a lost packet during a DNS lookup gives a difference in seconds , making it difficult to compare the result before and after the changes made during development. Probably our baseline should take lower bandwidth and increased latency, sacrificing packet loss. We lose the accuracy of the real world, but we get the repeatability of tests, the ability to compare the results before and after the changes made, as well as to compare different products. Here you can still talk and talk about the influence of DNS, TLS, network topology and other factors. If you want to delve into this topic, I highly recommend the book “High Performance Browser Networking” by Ilya Grigorik . RRC description alone is worth your time.

Let's return to our basic level. We have roughly decided on network simulation: RTT 400 ms, channel 400 Kbps . What about the device itself?

At last year’s Chrome Dev Summit conference, I discussed some temperature and energy limitations that make a huge difference in performance between a mobile device and a desktop device . Add to this the gaping gap between the performance of top and low end devices due to different chip characteristics, such as cache size. Fortunately, it’s easier to pick a basic level there than with network speed: more than half of mobile users in the US use Android devices. If you look at other countries, the vast majority of smartphones sold now (and the last five years) work under Android . The average price of such devices in most regions falls due to the pervasiveness of Android and the steady decline in prices in the ecosystem . In turn, this affects the only major trend in determining the global base level of hardware performance budget: the next billion users will go online when they can afford it . This means a decrease in the average price of smartphones in the foreseeable future. And this, in turn, means that all the improvements in the quantity of transistors-per-dollar are converted into a lower price, and not into faster devices (on average).

In 2016, a true median device was sold for about $ 200 without being tied to an operator. This year the median device is even cheaper, but with about the same performance. It can be expected that the performance of the median device will freeze at one level for several more years. This is one of the reasons why I proposed the Moto G4 as the base device last year, and this year I recommend it as the Moto G5 Plus .

To summarize, our global baseline for measuring performance is:

~ $ 200 (new, without binding to the operator) Android-smartphone
On slow 3G networks, emulation:
- RTT 400 ms
- 400 Kbps speed

For most developers, creating applications in such an environment is akin to growing vegetables on Mars. Fortunately, this configuration is available at webpagetest.org/easy , so that we can recreate the Martian conditions here on Earth at any time.

Calculation of the acceptable level

The last thing to discuss in the performance budget is time. What time is too long?

I like the definition of Monica:

“The Monica Perf Test: if you have time to look away to eye contact with a stranger while the application is loading before the first rendering, then it is too slow”

... but it is more qualitative than quantitative. In numbers, I would like each page load to take less than a second ( see RAIL ). In the real world, this is impossible, so we set the Time-to-Interactive (TTI) metric with partners:

TTI up to 5 seconds for the first boot
TTI up to 2 seconds for subsequent downloads

Now we have everything we need to create a rough performance budget for the product in 2017.

First boot

If we subtract the time, conditions of the network connection and the main stages of the critical path, then we get some interesting results. You can start with a budget for the first boot in 5 seconds and calculate what transfer you can afford.

First, subtract 1.6 seconds from the budget for a DNS lookup and TLS handshake, which leaves 3.4 seconds for everything else.

Now let's calculate how much data can be transmitted over such a channel in 3.4 seconds: 400 Kbps = 50 KB / s. 50 KB / s * 3.4 = 170 KB.

NOTE: This discussion will clearly enrage competent network engineers. In previous versions of this article, slow start, bdp, tcp window scaling and the like were discussed. It was all hard to understand. Simplification does not significantly affect the overall conclusions, so these details were excluded.

Modern web applications mainly consist of JS, that is, we also need to subtract the time to parse and evaluate JS. The gzip compression for the JS code is from 5x to 7x, so 170 KB of the JS archive is converted into approximately 850 KB-1 MB of JS code. According to previous estimates, it takes about a second to launch it (assuming that there is no resource-intensive DOM there, but of course there is one). After playing a little with these figures, you can get within 3.4 seconds of downloading and parsing / evaluating, limiting yourself to the transfer of JS in the amount of 130 KB.

And the last detail: if one of the critical path resources is loaded from another place (for example, from a CDN), then you need to subtract from the budget another connection time with it (~ 1.6 s), which further reduces the fraction of time from 5 seconds We can spend on network data transfer and work on the client side.

Summing up, under ideal conditions, our approximate budget for critical path resources (CSS, JS, HTML and data) is:

170 KB for sites without a special amount of JS
130 KB for sites made on JS frameworks

This gives us the opportunity to consider the single most pressing question that stands in modern front-end development: “Can you afford it?”

For example, if your JS framework takes ~ 40 KB on a site overloaded with JS (which has a budget of 130 KB due to JS processing time), then only 90 KB remains for the rest. Your entire application should fit in this volume. A 100-kilobyte framework loaded from a CDN already exceeds the budget by 20 KB .

Remember: your favorite framework might fit in 40KB, but what about the data system? Router components that you added? Suddenly, 130 KB doesn’t seem to be such a large volume, given the data, patterns and styles.

Living on a budget means constantly asking yourself: “Can I really afford it?”

Second download

In an ideal world, all pages load faster than a second, but for many reasons this is often not feasible. Therefore, we will give ourselves a little more freedom and a budget of 2 seconds for the second (third, fourth, etc.) load.

Why not five? Because we no longer need to go online to download the UI. Service Workers and offline first architectures allow you to display interactive pixels on the screen without even accessing the network. This is the key to consistently high performance .

Two seconds is an eternity on modern CPUs, but we still need to correctly distribute them taking into account the following factors:

Process creation time (Android is relatively slow compared to other OSs)
Time to read bytes from disk (it is non-zero even on flash drives!)
Code execution time

All the applications I have seen that fit into the five-second first boot and correctly implemented the principle of offline-first, also fit into the budget of two seconds, and it is also possible to reach the limit of one second! But the introduction of offline-first is a huge problem for many teams. Designing with local saving of last user data (last-seen), reliable and consistent caching of application resources, tricks with updating application code using the Service Worker life cycle can be a big task.

I look forward to the tools continue to evolve in this direction. The most advanced framework I know now is the Polymer App Toolbox , so if you are not sure where to start, start with it.

130-170 KB ... Yes, you are just kidding!?!

Many teams with whom we had to talk, asked the question: is it possible to fit something meaningful at all in such a small amount of 130 KB. Maybe! The PRPL pattern does this by aggressively splitting code depending on the route, caching granular resources (sequential pages) using the Service Worker, and cleverly using modern protocol extensions like HTTP / 2 Push .

Together, these tools allow you to fit a modern functional interface of less than 100 KB on a critical path .

Unfortunately, it is still difficult to determine which parts of the page are critical resources for TTI and which are not from a specific log. But I believe that the tools will quickly fix this flaw, given the exceptional importance of this metric.

Despite all the arguments, it is possible to keep within the budget without even giving up completely the frameworks. Both Wego and Ele.me are created with the help of modern tools ( Polymer and Vue , respectively) - and they really work today , helping customers make transactions. Most applications are less complex than these. Living on a budget does not mean starving.

Tools for teams on a budget

Meeting the budget is really difficult, but the benefits to businesses and users are enormous. Benefits for development teams and their managers are not often discussed. No lead programmer or project manager wants to be in front of the manager who comes up to him with the phone in his hands and asks: “Why does it work so slowly when I'm on vacation?”

This is not a theoretical argument.

I saw teams that had just finished rewriting the code on a modern technological stack — and they sat nailed for an hour when we showed their “best” and “fast” applications in real world conditions.

Anyone will lose face if the product does not meet expectations. Months of unscheduled prompting to improve performance push back the introduction of new functions and negatively affect the team morale. When productivity becomes a problem, middle managers try to suppress self-doubt and at the same time close the team from the flying crap, which the team itself expects. Worse, managers themselves may begin to doubt the team. A performance crisis can have lasting effects; can the organization be sure that the team will issue a quality product? Can they trust a lead programmer who recommends using a new technology or make additional major investments? Then recriminations. This is a terrible experience, especially for developers who are too often in the position of incredible “fix it as soon as possible” pressure — and “this” can be the underlying technology on which the product is based.

In the worst cases, the product cannot be fixed in a time small enough to help the business. Often, development proceeds in an evolutionary way, and if a startup or a small team has relied on the wrong technology stack, in the absence of time, an error can become fatal. What is the worst, the error can not notice very, very long. If all employees of the group carry top-smartphones with the latest version of iOS in their pockets and do not leave the city, and the economy of the product is based on the growth of a wide audience, then no one will understand the lack of growth of this audience.

Of course, productivity is not a (whole) product. Many applications that are slowed down or intended for a narrow niche feel great. The unique service that people need (and they are ready to try to get it) is able to outweigh any of these concerns. Some have succeeded even in the App Store and Play Market, where it is not easy to attract an audience. But products in competitive markets matter every advantage.

Some specific tools and techniques can help teams that implement a performance budget:

webpagetest.org/easy : our favorite one-time analysis tool
WPT scripts : for teams that don’t want to install a special WPT instance, and they have public URLs for WIP applications, integration with WPT scripts could be a good option for regular “checks”
WPT private instances : if a team wants to integrate WPT directly into its CI or into commit-queue systems (automated test queue before a commit), then you should consider installing a private WPT server and equipment
Scripted Lighthouse : Not Ready for a Full WPT Instance? Scripting Lighthouse helps CI automate site analysis and regression search
grunt-perfbudget is an even simpler automated WPT test tool for your CI. Use it!
Speedcurve and Caliber : network services automate regular performance checks in the real world
Webpack Performance Budgets : Webpack , ,
bundlesize pr-bot , -. !

Fighting code blowing often means that you need to make obvious mistakes out of warnings . For teams using continuous integration systems or automated test queues before committing, it is strongly recommended to disable commits that exceed the performance budget.

If the group starts from scratch, my strong recommendation is to start with a stack that focuses on the structure of the application, the separation of code and the target assembly. The best for today:

Polymer App Toolbox
Next.js, preferably with Preact as a lighter runtime library

Whatever tools your team chooses, the most important is the budget. Without it, even the most advanced “lightweight” frameworks can easily create bloated, unusable applications. Starting from a global base level and increasing the budget only on the basis of specific metrics is the only way I know that the project will be good for everyone.

Notes

For the sake of saving time and space, you will have to postpone for the next article a discussion of architectures that have a reserve for the future. Curious users can explore Service Workers , Navigation Preload and Streams . Their combined power must fundamentally transform the optimal page load time in 2018 and beyond.

And the last.I thank everyone who reviewed the first drafts of this article, among them: Winamrata Singhal , Paul Quinlan , Peter O'Shaughnessy , Eddie Osmani and Gray Norton . I hope their heroic attempts to rid the article of errors are not inferior to my talent to add new ones.

Source: https://habr.com/ru/post/345212/

All Articles