GC in Go: priority on speed and simplicity

Translation of the blog post of the main author of the garbage collector at Go, Richard Hudson, the inventor of many algorithms for GC in other languages, one of the leading engineers at Intel (currently working at Google).

Go plans its own garbage collector (GC) not only for 2015, but also for 2025 and beyond: it should be GC, which supports modern program development principles and scales well with the advent of new software and hardware in the next decades. In this future, there is no place for GC to pause with “stop the world” (stop-the-world), which were an obstacle to the wider use of such safe and reliable languages as Go.

Go 1.5, the first glimpse of this future, reached the goal of reducing the top pause bar to 10ms, which we set for ourselves a year ago. You can see some impressive figures in the report on GopherCon . These improvements in response time attracted a lot of attention; Robin Verlangen 's blog post “Billions of requests per day meet Go 1.5” confirms our calculations with real results. Separately, we liked the screenshots of the production server charts from Alan Shreve and his commentary “Holy 85% reduction!”.

Today, 16 gigabytes of memory cost $ 100 and all processors are multi-core, and support hyper-trading. After 10 years, this hardware will look obsolete, but programs written on Go today will have to be able to scale to work on growing iron resources and be ready for the “next big thing”. Considering that iron makes it possible to increase throughput, the Go garbage collector is designed so that responsiveness and small pauses are a priority, and tuning takes place with one slider. Go 1.5 is the first big step in this direction, and these first steps will forever affect the further development of Go and the programs written in it. This post gives an overview of what we did in the Go 1.5 collector.
')

Details

To create a garbage collector for the next decade, we turned to an algorithm written several decades ago. The new garbage collector in Go is a competitive (concurrent), three-color (tri-color), mark-sweep collector, the idea of which was first proposed by Dijkstra in 1978 . This approach is deliberately so inappropriate for most modern enterprise-level garbage collectors, and we believe that this is the best approach for modern iron and the requirements for pauses on the new hardware.

In the tricolor garbage collector, each object can be labeled as white, gray or black, and we consider the heap as a graph of related objects. At the beginning of each GC cycle, all objects are white. GC passes all the root nodes of the graph, which are objects that are directly accessible to the program - these are global variables and variables in the stack - and marks them with gray. GC then selects a gray object, turns it black, and then scans it for pointers and other objects. If the scan detects a pointer to a white object, it makes it gray. This process is repeated until there are gray objects. At this point, all white objects will be considered unreachable and can be reused.

This all happens in parallel with the work of the program, also called the mutator, which changes pointers as the collector works. From this it follows that the mutator must maintain the invariant of the fact that no black object indicates white, so that the garbage collector does not lose the trail of the object in that part of the heap that it has already bypassed. Supporting such an invariant is a task for the “write barrier” (write barrier), which is essentially a small function that starts every time a mutator changes the pointer on the heap. The entry barrier in Go marks gray objects that were white, ensuring that the garbage collector will scan it sooner or later.

Determining the moment when all objects are scanned is a delicate task and can be very expensive and difficult if we want to avoid mutator locks. For simplicity, Go 1.5 makes the maximum possible in the background, and then suspends the program for a very short time to check all potentially gray objects. Finding the ideal ratio for the time needed for this pause and for the entire work of the GC is one of the main tasks for Go 1.6.

Of course, the devil is in the details. When to start the next cycle of GC? Which metric to use to make this decision? How should the GC and the Go scheduler interact? How to stop mutator threads so that they have enough time to scan their stacks? How do we present white, gray and black objects in order to most effectively search and scan gray objects? How do we know where their root nodes are? How do we know where the pointers are in the object? How do we minimize memory fragmentation? How do we solve cache performance issues? How big should a bunch be? And so on, and so on, something related to allocations, something about finding reachable objects, something about planning, but the main issues affect performance. Discussions about the lower-level details of each of these areas are beyond the scope of this post.

At a higher level, one of the approaches for solving these problems for GC is to add to the GC sliders (knobs), one for each task. The developer can then customize the GC for himself, setting many parameters. The downside is that after 10 years with one or two new sliders each year, you end up with an Employment Contract for Using GC Switches. Go will not go this way. Instead, we give only one slider, called GOGC. Its value controls the total heap size relative to the size of reachable objects. The default value of “100” means that the total heap size is now 100% larger (that is, twice) the size of the actually accessible objects after the last GC cycle. “200” means that the total heap size is 200% larger (that is, three times) than the size of the actually used objects. If you want to reduce the total amount of time the GC runs, increase the GOGC. If you want to give more time to GC, and win yourself a memory - reduce GOGC.

It is important to understand that as the amount of memory doubles with the next generation of iron, a simple increase in GOGC will halve the number of GC cycles. On the other hand, since the GOGC operates with the concept of reachable objects, the increase in load and the concomitant increase in the number of reachable objects do not need to be retuned. The application simply scales. Moreover, unburdened by the support of dozens of sliders, the command writing the language runtime can focus on improving runtime based on feedback from real production programs.

Conclusion

The Go 1.5 garbage collector introduces us to the future, where pauses to stop the world are no longer an obstacle to a safe and reliable language. This is the future where applications scale out effortlessly with the new hardware, and as the hardware becomes more powerful, the garbage collector will not interfere with even better, and even more scalable software. This is a good place for the next decade and what comes after it. If you want to know the details of how we removed the problems with pauses, see the Go GC report : Latency Problem Solved presentation or report slides .

Source: https://habr.com/ru/post/265833/

All Articles

GC in Go: priority on speed and simplicity

Details

Conclusion

More articles: