Garbage Collection in V8: How the New Orinoco GC Works

To be honest, this is one of the most brutal articles that I read lately: there is a lot about death at a young age, about persecution from one area of memory to another, and about a fierce struggle for productivity. In general, welcome to the cat - there is a translation of Peter Marshall's excellent article on how garbage collection in V8 works today.

Over the past few years, the approach to garbage collection in V8 has changed a lot. As part of the Orinoco project, he has gone from a consistent stop-the-world approach to a parallel and competitive approach with incremental foldback.
')
Note: if you like watching a report rather than reading an article, you can do it here . If not, read on.

Any garbage collector has a set of tasks that need to be performed periodically:

Find live / dead objects in memory.
Reuse the memory occupied by dead objects.
Condense / defragment memory (optional).

These tasks can be performed sequentially, or you can alternate. The easiest way is to stop the execution of JavaScript and do everything consistently in the main thread. However, this may lead to delays, which we talked about in previous posts , as well as to a decrease in program performance in general.

Main GC (full mark-compact)

The main GC collects trash from the entire heap.

Garbage collection occurs in three steps: labeling, recycling and compacting

Marking

Determining from what objects you can free up memory is a mandatory part of the work of the garbage collector. He considers the object alive based on information about its reachability. This means that any object referenced from the current execution environment should be stored in memory, and all inaccessible objects can be collected by the GC.

Marking is the process of finding reachable objects. The GC has a set of pointers from which it begins to search, the so-called root set. This set includes objects from the current execution stack and a global object. Starting with this set, the GC follows each pointer to a JavaScript object and marks each one as reachable, and then moves to pointers from objects to other objects and repeats this process recursively until each reachable object is marked.

Recycling

Disposal is a process during which areas of memory left from dead objects are put into a list called free-list. As soon as the marking process is completed, the GC finds such areas and adds them to the appropriate list. Free-lists differ from each other in what size of the memory area they are stored in, which makes it possible to quickly find the desired one. Later, when we want to allocate a memory, we will look at one of the lists and find a section of suitable size.

Compaction

Also, the main GC sometimes makes decisions about cleaning / compacting some pages of memory, based on its own heuristic estimates based on the degree of page fragmentation. You can think of compaction as an analogue of hard disk defragmentation on older PCs. We copy the surviving objects onto other pages that have not yet been compacted (here the free-list is just used). Thus, we can reuse the small scattered memory areas left over from dead objects.

One of the drawbacks of GC, which copies surviving objects, is that when many long-lived objects are created, you have to pay a high price for copying them. It is for this reason that only some highly fragmented memory pages undergo compaction, while for the rest, recycling is simply carried out, which does not require copying the surviving objects.

Generation memory device

The pile in V8 is broken down into areas called generations. There is a younger generation (which in turn is subdivided into a generation “nursery” and “intermediate” generation) and old generations. Created objects are placed in the "nursery." Subsequently, if they are experiencing the next garbage collection, they remain in the younger generation, but pass into the category of "intermediate". If they survive after the next assembly, they are placed in the older generation.

A bunch of V8 is divided into generations. Objects are moved from the youngest to the older generation if they are experiencing garbage collection

There is an important term "generational hypothesis" in garbage collection. Simply put, it means that most of the objects “die young”. In other words, most objects are created and almost immediately die in terms of GC. And this statement is true not only for JavaScript, but also for most dynamic programming languages.

The heap organization in V8 relies on the above hypothesis. For example, at first glance it may seem illogical that GC is engaged in compaction / movement of objects that have survived garbage collection, because copying objects is a rather expensive operation in order to carry it out during garbage collection. But, based on the hypothesis of generations, we know that very few objects will survive this procedure. So, if you move only the surviving objects, anything that has not been moved can automatically be considered rubbish. This means that the price we pay for copying is proportional to the number of surviving objects, not all created ones.

Auxiliary GC (scavenger)

There are actually two garbage collectors in V8. The main (mark-compact) quite effectively collects garbage from the entire heap, while the auxiliary collects garbage only in young memory, because the hypothesis of generations tells us that the main efforts to collect garbage should be sent there.

The principle of operation of the auxiliary GC is such that the surviving objects always move to a new memory page. In V8, the young memory is divided into two halves. One is always free so that it is possible to move the surviving objects into it, and during assembly this initially empty area is called To-space. The area from which copying occurs is called From-space. In the worst case, every object can survive, and then you have to copy them all.

For this type of assembly, there is a separate set of pointers that reference old memory to young. And instead of scanning the entire heap, we use write barriers to maintain this set. So, combining this set with the stack and the global object, we get all the links in the young memory without having to scan all the objects from the old memory.

When copying objects from From-space to To-space, all surviving objects are placed in a continuous memory location. Thus, it is possible to get rid of fragmentation - gaps of memory left over from dead objects. After the transfer is complete, To-space becomes From-space, and vice versa. As soon as the GC finishes its work, the memory for new objects will be allocated starting from the first free address in From-space.

Scavenger moves surviving objects to a new memory page.

If you use only this strategy and do not move objects from the young memory, the memory will end pretty quickly. Therefore, objects that have survived two garbage collections are moved to the old memory.

The final step is to update the pointers to the objects that have been moved. Each copied object leaves its original address, leaving instead the forwarding address, which is necessary to find the original object in the future.

Scavenger transfers “intermediate” objects to old memory, and objects from “yasel” to a new page.

Thus, garbage collection in young memory consists of three steps: marking objects, copying them, updating pointers.

Orinoco

Most of the listed algorithms are described in various sources and are often used in execution environments with support for automatic garbage collection. But GC in V8 has come a long way before becoming a truly modern tool. One of the significant metrics describing its performance is how often and how long the main thread pauses while the garbage collector performs its functions. For classic stop-the-world collectors, this time leaves its mark on the experience of using the page due to delays, poor-quality rendering and an increase in response time.

Logo Orinoco GC V8

Orinoco is a codename for GC using modern techniques of parallel, incremental and competitive garbage collection. There are some terms that have a specific meaning in the context of GC, so let's first give their definitions.

Parallelism

Parallelism is when the main and auxiliary streams do approximately the same amount of work per unit time. This is still the stop-the-world approach, but the length of the pause in this case is divided by the number of threads involved in the work (minus the cost of synchronization).

This is the simplest of the three techniques. The heap does not change, because JavaScript is not executed, so that it is sufficient for threads to support synchronization of access to objects.

The main and auxiliary threads work on the same task simultaneously.

Incrementality

Incrementality is when the main thread does a small amount of work intermittently. Instead of full garbage collection, small tasks are done on partial assembly.

This is a more difficult task, since JavaScript runs between the incremental builds, which means that the state of the heap is changing, which in turn can invalidate some of the work done in the previous iteration.

As can be seen from the diagram, this approach does not reduce the total amount of work (and, as a rule, even increases it), but it distributes this work in time. Therefore, it is a good way to solve one of the main tasks - reducing the response time of the main stream.
By allowing javascript to run with a few interruptions to garbage collection, an application can continue to be responsive: respond to user input and update animations.

Small areas of GC work in the main thread

Competitiveness

Competitiveness is when the main thread continuously executes JavaScript, and auxiliary threads do garbage collection in the background. This is the most difficult of the three techniques: a heap can change at any time, invalidating the work done by the GC before.

In addition, there are also read / write races, since the auxiliary and main threads simultaneously read or modify the same objects.

The build is completely in the background, the main thread at this time can perform JavaScript

GC state in V8

Scavenging

V8 distributes the work of garbage collection between auxiliary streams in young memory (scavenging). Each stream receives a set of pointers, following which, moves all living objects into To-space.

When moving objects in To-space, threads need to synchronize through atomic read / write / compare and swap operations to avoid a situation where, for example, another thread found the same object, but following a different path, and also tries to move it.

The thread that moved the object to To-space then returns and leaves the forwarding pointer so that other threads that find this object can proceed to the right address. For fast and non-synchronized memory allocation for surviving objects, threads use thread local buffers.

Parallel assembly distributes work between several auxiliary threads and the main thread

Main GC

The main GC in V8 starts with the labeling of objects. As soon as the heap reaches a certain limit (calculated dynamically), competitive markers begin their work. Each of the streams receives a set of pointers, and, passing on them, they mark each found object as attainable.

Competitive labeling occurs entirely in the background while javascript is running in the main thread. Write barriers are used to keep track of new links between objects that are created in JavaScript while threads are marking.

Primary GC uses competitive labeling, recycling, and parallel compaction and pointer updating.

At the end of the competitive labeling, the main stream performs a quick step to complete the marking. During this, the execution of JavaScript in the main thread is suspended.

The root set is scanned again to make sure that all living objects are marked, and then memory compaction and pointer updating begin in several streams.
Not all pages in the old memory are compacted - those that do not will be scanned to the vacated memory areas (sweeping) for listing them (free-lists).

During this pause, sweeping tasks start that compete with memory compaction tasks and the main thread and can continue even when JavaScript is running in the main thread.

Idle-time GC

JavaScript developers do not have access to the GC - it is part of the implementation of the environment. And although the JS code cannot call GC directly, V8 provides such access to the environment that embeds the engine.

GC can send tasks (idle tasks) that can be performed “in free time” and which are parts of the work that in any case would have to be done. An environment like Chrome, where the engine is embedded, can have its own idea of what is considered free time. For example, in Chrome at a frequency of 60 frames per second, the browser has about 16.6 ms to render an animation frame.

If the animation work is completed earlier, in free time, before the next frame arrives, Chrome can perform some of the tasks received from GC.

GC uses the free time of the main thread to do pre-clean.

Learn more from our publication on Idle-time GC .

Results

GC in V8 has come a long way since its inception. Adding parallel, incremental and competitive techniques to it took several years, but it paid off, allowing us to do most of the work in the background.

Everything related to the pauses of the main thread, the response time and page load has significantly improved, which allows the animation, scrolling and user interaction on the page to be much smoother. The parallel collector made it possible to reduce the total duration of processing a young memory by 20–50%, depending on the load.

Idle-time GC allows you to reduce the size of the used heap for Gmail by 45%. Competitive labeling and disposal (sweeping) can reduce the duration of GC-pauses in heavy WebGL-games up to 50%.

However, the work is not over. Reducing pauses remains an important task to simplify the lives of web users, and we are looking for the possibility of using more advanced techniques to achieve the goal.

On top of that, Blink (a renderer in Chrome) is also equipped with an Oilpan picker, and we are working to improve the interaction between the two GCs and also to use Orinoco techniques in Oilpan.

Most JavaScript developers do not need to think about how GC works, but some idea of this can help make optimal decisions in terms of memory usage and programming patterns. For example, given the structure of the heap V8, based on generations, low-living objects are actually quite cheap in terms of GC, since we pay mainly for the surviving objects. And this kind of pattern is not only peculiar to JavaScript, but also to many languages with garbage collection support.

Source: https://habr.com/ru/post/454930/

All Articles