
Visualizing concurrency in Go with WebGL

One of the great strengths of the Go programming language is its built-in concurrency support, based on Tony Hoare's work on Communicating Sequential Processes. Go is designed to make concurrent programming convenient, and it is easy to build fairly complex concurrent programs with it. But have you ever wondered what different concurrency patterns look like visually?

Of course you have. All of us, one way or another, think in visual images. If I ask you about something involving the numbers "from 1 to 100", you will instantly "see" them in your head in one form or another, probably without even realizing it. For example, I see the row from 1 to 100 as a line of numbers receding from me, turning 90 degrees to the right at the number 20 and continuing on to 1000+. And, digging through my memory, I recall that in my very first kindergarten the numbers were written along the wall of the locker room, and the number 20 was right in the corner. You probably have some picture of your own. Another frequent example: imagine the whole year and its 4 seasons. Some see them as a square with one side per season, some as a circle, others as something else entirely.

Anyway, let me show you my attempt to visualize the basic concurrency patterns using Go and WebGL. These interactive visualizations more or less reflect how I see them in my head. It will be interesting to hear how this differs from the mental pictures of readers.


So let's start with the simplest example, "Hello, concurrent world", to get acquainted with my approach.

Hello concurrent world


```go
package main

func main() {
    // create a new channel of type int
    ch := make(chan int)

    // start a new anonymous goroutine
    go func() {
        // send 42 to the channel
        ch <- 42
    }()
    // read from the channel
    <-ch
}
```


Link to an interactive WebGL demo
Here the blue lines represent goroutines, with time "running" down along the Y axis. The thin blue lines connecting 'main' and 'go #19' mark the beginning and end of a goroutine's life, showing parents and children. The red arrows show the event of sending a value to a channel, and the sent value is labeled. Strictly speaking, "send to channel" and "receive from channel" are two separate events, and the behavior differs greatly between buffered and unbuffered channels, but I animate these two events as one: "transmitting the value over the channel". The string "#19" in the name of the anonymous goroutine is the real ID of the running goroutine. Although there is no official way to get a goroutine's ID (so that people don't build concurrency designs that rely on identifiers, as other concurrency models do), for hacks like this it can be done; this is well described in the article "Goroutine IDs" by Scott Mansfield.
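For the curious, here is a minimal sketch of the hack mentioned above: parsing the goroutine ID out of a stack trace. This is my own illustration (debugging only, never rely on it in real code), not code from the demos:

```go
package main

import (
    "bytes"
    "fmt"
    "runtime"
    "strconv"
)

// goID extracts the current goroutine's ID from the stack trace header,
// which starts with "goroutine <id> [running]:".
func goID() uint64 {
    b := make([]byte, 64)
    b = b[:runtime.Stack(b, false)]
    b = bytes.TrimPrefix(b, []byte("goroutine "))
    b = b[:bytes.IndexByte(b, ' ')]
    id, _ := strconv.ParseUint(string(b), 10, 64)
    return id
}

func main() {
    fmt.Println("current goroutine id:", goID())
}
```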

Timers


In fact, our simple "Hello, concurrent world" above can already be used to build the simplest timer. The standard Go library has convenient functions like time.After or time.Tick, but let's implement our own: we'll write a function that creates a channel, starts a goroutine that sleeps for the given duration and then writes to the channel, and returns that channel to the caller.
```go
package main

import "time"

func timer(d time.Duration) <-chan int {
    c := make(chan int)
    go func() {
        time.Sleep(d)
        c <- 1
    }()
    return c
}

func main() {
    for i := 0; i < 24; i++ {
        c := timer(1 * time.Second)
        <-c
    }
}
```


Link to an interactive WebGL demo
Neat, right? But let's move on.

Ping pong


This interesting example of concurrency comes from "Advanced Go Concurrency Patterns", a famous talk by Googler Sameer Ajmani. Of course, the example itself is not very advanced, but for those just getting acquainted with concurrency in Go it should be interesting and illustrative.

So, there is a channel called table that plays the role of the table, there is a ball, an int variable that stores the number of hits, and there are player goroutines that "take the ball from the table" (read from the channel), "hit it" (increment the variable) and "throw it back on the table" (write to the channel).
```go
package main

import "time"

func main() {
    var Ball int
    table := make(chan int)
    go player(table)
    go player(table)

    table <- Ball
    time.Sleep(1 * time.Second)
    <-table
}

func player(table chan int) {
    for {
        ball := <-table
        ball++
        time.Sleep(100 * time.Millisecond)
        table <- ball
    }
}
```


Link to an interactive WebGL demo
At this point, I want to once again draw your attention to the interactive WebGL demo linked under each animation. Opening it in a new tab, you can move, rotate and zoom these 3D animations as you like, as well as slow them down, speed them up and restart them.

Now, let's start three player goroutines instead of two:
```go
go player(table)
go player(table)
go player(table)
```


Link to an interactive WebGL demo
In this example we see that each player picks the ball up from the table in turn, and you may wonder why that is so: who guarantees this order?

The answer is simple: the Go runtime keeps a FIFO queue of goroutines ready to receive from a channel, which is why we observe this order. In our case, each goroutine joins the queue right after sending the ball to the table. However, this behavior may change in the future, and you should not rely on this order. But for now it holds, so let's launch not three but one hundred goroutines.
```go
for i := 0; i < 100; i++ {
    go player(table)
}
```


Link to an interactive WebGL demo
The FIFO order is now even more obvious, isn't it? We could just as easily start a million goroutines (they are cheap, and it's fine to have hundreds of thousands of goroutines in large Go programs), but for our purposes that would be too much. Let's move on to other patterns.

Fan-in


One of the most famous concurrency patterns is the so-called fan-in. It is the opposite of the fan-out pattern, which we will look at next. In short, fan-in is a function that reads from multiple sources and multiplexes everything into one channel.
For example:
```go
package main

import (
    "fmt"
    "time"
)

func producer(ch chan int, d time.Duration) {
    var i int
    for {
        ch <- i
        i++
        time.Sleep(d)
    }
}

func reader(out chan int) {
    for x := range out {
        fmt.Println(x)
    }
}

func main() {
    ch := make(chan int)
    out := make(chan int)
    go producer(ch, 100*time.Millisecond)
    go producer(ch, 250*time.Millisecond)
    go reader(out)
    for i := range ch {
        out <- i
    }
}
```


Link to an interactive WebGL demo
As we can see, the first producer generates a number every 100ms, the second every 250ms, and the reader receives numbers from both producers immediately. The multiplexing, in effect, happens in the main function.
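The multiplexing loop in main can also be packaged into a standalone helper using select. This fanIn function is a hypothetical sketch of my own, not code from the demo above:

```go
package main

import "fmt"

// fanIn multiplexes two input channels into a single output channel.
// Whichever input has a value ready wins the select.
func fanIn(a, b <-chan int) <-chan int {
    out := make(chan int)
    go func() {
        for {
            select {
            case v := <-a:
                out <- v
            case v := <-b:
                out <- v
            }
        }
    }()
    return out
}

func main() {
    a := make(chan int)
    b := make(chan int)
    out := fanIn(a, b)
    go func() { a <- 1 }()
    go func() { b <- 2 }()
    fmt.Println(<-out + <-out) // 3, regardless of arrival order
}
```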

Fan out


The opposite of fan-in is the fan-out, or workers, pattern. Many goroutines read from a single channel, each taking some piece of data to process, effectively distributing the work between CPU cores, hence the name "workers". In Go this pattern is very easy to implement: start a pack of goroutines with a channel passed as a parameter, then write your data to that channel; the multiplexing and distribution are done automatically by the Go runtime.
```go
package main

import (
    "fmt"
    "sync"
    "time"
)

func worker(tasksCh <-chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    for {
        task, ok := <-tasksCh
        if !ok {
            return
        }
        d := time.Duration(task) * time.Millisecond
        time.Sleep(d)
        fmt.Println("processing task", task)
    }
}

func pool(wg *sync.WaitGroup, workers, tasks int) {
    tasksCh := make(chan int)

    for i := 0; i < workers; i++ {
        go worker(tasksCh, wg)
    }

    for i := 0; i < tasks; i++ {
        tasksCh <- i
    }

    close(tasksCh)
}

func main() {
    var wg sync.WaitGroup
    wg.Add(36)
    go pool(&wg, 36, 50)
    wg.Wait()
}
```


Link to an interactive WebGL demo
One thing I would like to draw attention to here is the parallelism. It is easy to see that the worker goroutines run in parallel, taking their "work" from the channel one after another. From the animation you can also see that they do it almost simultaneously. Unfortunately, the animation does not yet show where a goroutine is actually doing work and where it is blocked, and the time scale here is already close to the margin of error, but this particular animation was recorded from a program running on 4 cores, i.e. with GOMAXPROCS=4. We will look at this in more detail a bit later.

In the meantime, let's try something more complicated: workers that have their own sub-workers.
```go
package main

import (
    "fmt"
    "sync"
    "time"
)

const (
    WORKERS    = 5
    SUBWORKERS = 3
    TASKS      = 20
    SUBTASKS   = 10
)

func subworker(subtasks chan int) {
    for {
        task, ok := <-subtasks
        if !ok {
            return
        }
        time.Sleep(time.Duration(task) * time.Millisecond)
        fmt.Println(task)
    }
}

func worker(tasks <-chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    for {
        task, ok := <-tasks
        if !ok {
            return
        }

        subtasks := make(chan int)
        for i := 0; i < SUBWORKERS; i++ {
            go subworker(subtasks)
        }
        for i := 0; i < SUBTASKS; i++ {
            task1 := task * i
            subtasks <- task1
        }
        close(subtasks)
    }
}

func main() {
    var wg sync.WaitGroup
    wg.Add(WORKERS)
    tasks := make(chan int)

    for i := 0; i < WORKERS; i++ {
        go worker(tasks, &wg)
    }

    for i := 0; i < TASKS; i++ {
        tasks <- i
    }

    close(tasks)
    wg.Wait()
}
```


Link to an interactive WebGL demo
Great. Of course, I could have used more workers and sub-workers, but I tried to keep the animation as clear as possible.

There are far more complex fan-out variations: workers with sub-workers that have sub-workers of their own, channels passed over channels, and so on, but the idea of fan-out should be clear by now.

Servers


The next popular pattern, similar to fan-out, is servers. It differs in that goroutines are started dynamically, do the necessary work and finish. And this pattern is very often used to implement servers: we listen on a port, accept a connection, start a goroutine that will handle the incoming request, passing the connection to it, and meanwhile keep listening for the next connection. This is quite convenient and makes it very simple to implement an efficient server capable of handling 10K connections. Take a look at the following example:
```go
package main

import "net"

func handler(c net.Conn) {
    c.Write([]byte("ok"))
    c.Close()
}

func main() {
    l, err := net.Listen("tcp", ":5000")
    if err != nil {
        panic(err)
    }
    for {
        c, err := l.Accept()
        if err != nil {
            continue
        }
        go handler(c)
    }
}
```


Link to an interactive WebGL demo
This example is not very interesting: in fact, almost nothing happens here. Although, of course, under the hood there is enormous hidden complexity. "Simplicity is complicated."

But let's add some activity to our server: say, a logger to which each handler will write the client's address.
```go
package main

import (
    "fmt"
    "net"
    "time"
)

func handler(c net.Conn, ch chan string) {
    ch <- c.RemoteAddr().String()
    c.Write([]byte("ok"))
    c.Close()
}

func logger(ch chan string) {
    for {
        fmt.Println(<-ch)
    }
}

func server(l net.Listener, ch chan string) {
    for {
        c, err := l.Accept()
        if err != nil {
            continue
        }
        go handler(c, ch)
    }
}

func main() {
    l, err := net.Listen("tcp", ":5000")
    if err != nil {
        panic(err)
    }
    ch := make(chan string)
    go logger(ch)
    go server(l, ch)
    time.Sleep(10 * time.Second)
}
```


Link to an interactive WebGL demo
Quite illustrative, isn't it? This animation shows that our logger can quickly become a bottleneck if the number of connections grows and the logger is not fast enough (say, it serializes data and sends it somewhere else). But we can solve this with the fan-out pattern we already know. Let's write it.

Server + Worker


The server+worker example is a slightly more advanced version of the solution just announced. It not only runs the logger in several goroutines, but also collects results from them (for example, the result of writing to a remote service).
Let's look at the code and animation:
```go
package main

import (
    "net"
    "time"
)

func handler(c net.Conn, ch chan string) {
    addr := c.RemoteAddr().String()
    ch <- addr
    time.Sleep(100 * time.Millisecond)
    c.Write([]byte("ok"))
    c.Close()
}

func logger(wch chan int, results chan int) {
    for {
        data := <-wch
        data++
        results <- data
    }
}

func parse(results chan int) {
    for {
        <-results
    }
}

func pool(ch chan string, n int) {
    wch := make(chan int)
    results := make(chan int)

    for i := 0; i < n; i++ {
        go logger(wch, results)
    }
    go parse(results)

    for {
        addr := <-ch
        l := len(addr)
        wch <- l
    }
}

func server(l net.Listener, ch chan string) {
    for {
        c, err := l.Accept()
        if err != nil {
            continue
        }
        go handler(c, ch)
    }
}

func main() {
    l, err := net.Listen("tcp", ":5000")
    if err != nil {
        panic(err)
    }
    ch := make(chan string)
    go pool(ch, 4)
    go server(l, ch)
    time.Sleep(10 * time.Second)
}
```


Link to an interactive WebGL demo
We have improved our server by effectively distributing the logger's work between 4 goroutines, but we can still see that the logger may become a bottleneck: thousands of connections converge into a single channel before being multiplexed between the goroutines. But, of course, that will now happen at much higher loads than in the previous version.

Sieve of Eratosthenes


But enough fan-in/fan-out experiments. Let's look at a more interesting example. One of my favorites is the "Sieve of Eratosthenes" built on goroutines and channels, found in the talk "Go Concurrency Patterns". The Sieve of Eratosthenes is an ancient algorithm for finding prime numbers up to a given limit. Its essence is to successively cross out all the numbers divisible by each prime found so far. The naive algorithm is not very efficient, especially on multi-core machines.

The goroutines-and-channels variant of this algorithm starts one goroutine for each prime found, and that goroutine filters out the numbers divisible by its prime. When the first prime reaches a goroutine, it is sent up to the main goroutine and printed on the screen. This algorithm is also far from the most efficient, but I find it stunningly elegant. Here is the code:
```go
// A concurrent prime sieve
package main

import "fmt"

// Send the sequence 2, 3, 4, ... to channel 'ch'.
func Generate(ch chan<- int) {
    for i := 2; ; i++ {
        ch <- i // Send 'i' to channel 'ch'.
    }
}

// Copy the values from channel 'in' to channel 'out',
// removing those divisible by 'prime'.
func Filter(in <-chan int, out chan<- int, prime int) {
    for {
        i := <-in // Receive value from 'in'.
        if i%prime != 0 {
            out <- i // Send 'i' to 'out'.
        }
    }
}

// The prime sieve: Daisy-chain Filter processes.
func main() {
    ch := make(chan int) // Create a new channel.
    go Generate(ch)      // Launch Generate goroutine.
    for i := 0; i < 10; i++ {
        prime := <-ch
        fmt.Println(prime)
        ch1 := make(chan int)
        go Filter(ch, ch1, prime)
        ch = ch1
    }
}
```


And now take a look at the animation.

Link to an interactive WebGL demo
Don't forget that you can rotate it interactively in 3D space at the link above. I really like how illustrative this example is, and studying it in 3D can help you understand the algorithm better. We see that the first goroutine (Generate) sends the first prime (2) to main, then the first Filter goroutine starts, sifting out the twos, then the threes, the fives, the sevens... and each newly found prime goes to main; this is especially well seen from above. A beautiful algorithm, including in 3D.

GOMAXPROCS


Now let's go back to our workers example. Remember I wrote that it was run with GOMAXPROCS=4? That is because none of these animations is hand-drawn: they are real traces of real programs.

To begin with, let's refresh our memory on what GOMAXPROCS is:
GOMAXPROCS sets the maximum number of CPUs that can be executing Go code simultaneously.


I changed the workers code slightly to make it do real work, loading the CPU instead of just sleeping. Then I ran the code without any other changes on a Linux machine with two 12-core processors: first with GOMAXPROCS=1, then with GOMAXPROCS=24.
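The modified code was not published, but a sketch of the idea (my own guess at what "loading the CPU" might look like, with a busy loop in place of time.Sleep) could be:

```go
package main

import (
    "fmt"
    "sync"
)

// worker burns CPU with a busy loop instead of sleeping, so the effect
// of different GOMAXPROCS values becomes visible in the trace.
func worker(tasksCh <-chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    for task := range tasksCh {
        sum := 0
        for i := 0; i < (task+1)*100000; i++ { // CPU-bound work
            sum += i
        }
        fmt.Println("task", task, "checksum", sum)
    }
}

func main() {
    var wg sync.WaitGroup
    tasksCh := make(chan int)
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go worker(tasksCh, &wg)
    }
    for i := 0; i < 12; i++ {
        tasksCh <- i
    }
    close(tasksCh)
    wg.Wait()
}
```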

So, the first animation shows the same program running on 1 core, and the second on 24 cores.


WebGL animation: 1 core
WebGL animation: 24 cores
The animation speed differs between these examples (I wanted all animations to fit the same height in time), but the difference should be obvious. With GOMAXPROCS=1, the next worker takes work (reads from the channel) only when a CPU core is free and the previous goroutine has finished its piece. With 24 cores, the goroutines grab tasks almost immediately and the speedup is enormous.

However, it is important to understand that increasing GOMAXPROCS does not always improve performance; there are cases when performance even drops as the number of cores grows.

Goroutine leaks


What else from the world of concurrency can we demonstrate? One thing that comes to my mind is goroutine leaks. They can happen through negligence, or, say, if you start a goroutine but it goes out of scope.

The first (and only) time I ran into a goroutine leak, a terrifying picture appeared in my head, and literally the next weekend I wrote expvarmon for quick monitoring. And now I can visualize that horrific picture with WebGL.

Take a look:

Link to an interactive WebGL demo
It pains me even to look at it :) Each line here is wasted computer resources and a ticking time bomb for your program.

Concurrency is not parallelism


The word concurrency is often translated into Russian as "parallelism", but that is not entirely accurate. To be honest, I don't know a good translation, so everywhere here I leave it untranslated. But the topic itself, explaining the difference between concurrency and parallelism, has been covered many times, including by Rob Pike in the remarkable talk of the same name. Watch it if you haven't yet.


In short, then:
Parallelism is just a lot of things running in parallel.
Concurrency is a way to structure a program.

It is important to understand that these concepts are somewhat orthogonal: a concurrent program may or may not be parallel. We have just seen an example of this with different GOMAXPROCS values, where the same code ran both on 1 core (sequentially) and on 24 cores (in parallel).

I could repeat many of the points from the talks and articles linked above, but that has been done before me. Better to try to show it visually.

So, this is parallelism. Just a lot of things running in parallel.

Link to an interactive WebGL demo

And this is also parallelism. Even more parallel things, which, however, changes nothing.

Link to an interactive WebGL demo

But this is concurrency:


And this:


And this is also a concurrency:


How was this done?


To create these animations, I wrote two programs, gotracer and gothree.js . The first does the following things:

Example of the resulting JSON:


Then gothree.js uses the power of the wonderful Three.js library to draw and animate this data in 3D with WebGL. A small wrapper to cram it into a single demo page, and done.

However, this approach is very limited. I had to select the examples very carefully and rename channels to get a correct trace. With this approach there is no easy way to link channels between goroutines if they are named differently inside a function. There are also timing problems: console output sometimes takes more time than starting a goroutine, writing to a channel and exiting, so in some examples I had to cheat a bit by inserting time.Sleep (that is why the animation in the timers example is slightly off).

Generally, that is the main reason why I haven't open-sourced the code. Right now I'm playing with Dmitry Vyukov's execution tracer; it seems to give the required level of detail, although it contains no information about what was sent over a channel. Perhaps there are other ways to achieve more detailed traces; I will keep exploring. Ping me on Twitter or here in the comments if you have ideas. It would be great if this tool eventually grew into a 3D concurrency debugger applicable to absolutely any Go program, with no exceptions.

This article was originally presented as a talk at the Lviv Go Meetup (January 23, 2016), then published in English on my blog. It was also discussed on Hacker News and Reddit.

Source: https://habr.com/ru/post/276255/

