
bytes.Buffer in Go: optimizations that don't work

Many Go programmers are familiar with bytes.Buffer. One of its advantages is that it lets you avoid heap allocations, using the same pattern as the "small buffer/size optimization":


```go
type Buffer struct {
	bootstrap [64]byte // the "small buffer" used to avoid allocations
	// ...
}
```

There is only one problem: this optimization does not work.


By the end of this article, you will learn why this optimization does not work and what we can do about it.


The "small buffer optimization" as it was intended


Let's introduce a slightly simplified definition of bytes.Buffer:


```go
const smallBufSize int = 64

type Buffer struct {
	bootstrap [smallBufSize]byte
	buf       []byte
}
```

When we perform operations on the Buffer, for example call the Buffer.Write method, the write always goes into buf; but before that write, Buffer.grow(n) runs inside each of these methods to make sure the slice has room for the next n bytes.


Grow can look like this:


```go
func (b *Buffer) grow(n int) {
	// A simplified version of what bytes.Buffer does.
	l := len(b.buf) // actual length of the data in Buffer
	need := n + l
	have := cap(b.buf) - l
	if have >= need {
		b.buf = b.buf[:need]
		return
	}
	if need <= smallBufSize {
		// The bootstrap array is enough;
		// no allocation is required.
		b.buf = b.bootstrap[:]
	} else {
		// growFactor computes the new capacity.
		// Possible values are need and need*2.
		newBuf := make([]byte, need, growFactor(need))
		copy(newBuf, b.buf)
		b.buf = newBuf
	}
}
```

Assumptions used in our Buffer.grow implementation


We assume that len(b.buf) is the actual length of the data in the Buffer, which would require the Write methods to use append to add new bytes to the slice. In the bytes.Buffer from the standard library this is not the case, but for our example it is an inessential implementation detail.




If b is allocated on the stack, the bootstrap array inside it is allocated on the stack as well, which means the b.buf slice will reuse memory inside b without requiring an additional allocation.


When grow discovers that the bootstrap array is no longer enough, a new, "real" slice is created, into which the elements from the previous storage (the "small buffer") are copied. After that, Buffer.bootstrap loses its relevance. If Buffer.Reset is called, cap(b.buf) remains the same, and the bootstrap array is no longer needed.


Memory escaping to the heap


From here on, the reader is expected to be at least superficially familiar with what escape analysis is in Go.


Consider the following situation:


```go
func f() *bytes.Buffer {
	var b bytes.Buffer // leak.go:11:6: moved to heap: b
	return &b          // leak.go:12:9: &b escapes to heap
}
```

Here b will be allocated on the heap. The reason for this is a "leaking" pointer to b:


```
$ go tool compile -m leak.go
leak.go:12:9: &b escapes to heap
leak.go:11:6: moved to heap: b
```

Terminology


In this article, leaking and escapes are used almost as synonyms.


Inside the compiler itself there is some difference: for example, a value "escapes to heap" (x escapes to heap), while function parameters "leak" (leaking param x).


A leaking parameter means that the argument passed for that parameter will be allocated on the heap. In other words, a leaking parameter causes its arguments to escape to the heap.




Above was an obvious case, but what about this one:


```go
func length() int {
	var b bytes.Buffer
	b.WriteString("1")
	return b.Len()
}
```

Here we need only 1 byte; everything fits into bootstrap, the buffer itself is local and does not "escape" from the function. You might be surprised, but the result is still an allocation of b on the heap.



To be sure, you can check this with the benchmark:


```
BenchmarkLength-8   20000000   90.1 ns/op   112 B/op   1 allocs/op
```

Benchmark listing


```go
package p

import (
	"bytes"
	"testing"
)

func length() int {
	var b bytes.Buffer
	b.WriteString("1")
	return b.Len()
}

func BenchmarkLength(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = length()
	}
}
```



Explaining the 112 B/op


When the runtime asks the allocator for N bytes, it is not necessarily the case that exactly N bytes are allocated.


All results below are shown for the combination of GOOS=linux and GOARCH=amd64.

```go
package benchmark

import "testing"

//go:noinline
func alloc9() []byte {
	return make([]byte, 9)
}

func BenchmarkAlloc9(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = alloc9()
	}
}
```

If you run go test -bench=. -benchmem with this test:


```
BenchmarkAlloc9-8   50000000   33.5 ns/op   16 B/op   1 allocs/op
```

9 bytes are requested, 16 are allocated. Now let's go back to bytes.Buffer:


```go
fmt.Println(unsafe.Sizeof(bytes.Buffer{}))
// => 104
```

Look at $GOROOT/src/runtime/sizeclasses.go:


```
// class  bytes/obj  bytes/span  objects  tail waste  max waste
//     1          8        8192     1024           0     87.50%
//     2         16        8192      512           0     43.75%
//     3         32        8192      256           0     46.88%
//     4         48        8192      170          32     31.52%
//     5         64        8192      128           0     23.44%
//     6         80        8192      102          32     19.07%
//     7         96        8192       85          32     15.95%
//     8        112        8192       73          16     13.56%
// ...
```

104 bytes do not fit into the 96-byte class, so the 112-byte class is selected.




But why is this happening?


What happens and why


Some analysis of the situation can be found in the issue mentioned at the outset. There is also a simple reproducer.


The problematic place is the assignment b.buf = b.bootstrap[:]. This code causes escape analysis to conclude that b.bootstrap "escapes", and since it is an array, it is stored inside the object itself, which means the whole b must be allocated on the heap.


If bootstrap were a slice rather than an array, this would not happen, because there is an ad hoc optimization for assigning slices from an object into the object itself:


```go
// Does not cause object to escape;
// object can be allocated as before.
object.buf1 = object.buf2[a:b]
```

The answer to why this optimization does not apply to arrays has already been formulated above, but here is an excerpt from esc.go#L835-L866 (the entire optimization code is available at that reference):


```go
// Note, this optimization does not apply to OSLICEARR,
// because it does introduce a new pointer into b that was not already there
// (pointer to b itself). After such assignment, if b contents escape,
// b escapes as well. If we ignore such OSLICEARR, we will conclude
// that b does not escape when b contents do.
```

Here it is worth adding that the pointer analyzer distinguishes several levels of "leaking"; the main ones are:


  1. The object escapes (b escapes). In this case, the object itself must be allocated on the heap.
  2. The object's contents escape (b contents escape). In this case, the pointers inside the object are considered escaping.

The case of an array is special in that if the array escapes, the object containing it must escape as well.


Escape analysis decides whether an object can be placed on the stack or not, relying only on the information available in the body of the analyzed function. The Buffer.grow method takes b by pointer, so a suitable placement must be computed for it. Since in the case of an array we cannot distinguish between "b escapes" and "b contents escape", we have to be more pessimistic and conclude that it is not safe to place b on the stack.


Suppose the opposite: that the self-assignment pattern worked the same way for arrays as it does for slices:


```go
package example

var sink interface{}

type bad struct {
	array [10]byte
	slice []byte
}

func (b *bad) bug() {
	b.slice = b.array[:] // ignoring self-assignment to b.slice
	sink = b.array       // b.array escapes to heap
	// b does not escape
}
```

Deciding to place b on the stack in this situation would lead to a catastrophe: after exiting the function in which b was created, the memory that sink refers to would be nothing but garbage.


Pointers to arrays


Imagine that our Buffer were declared a little differently:


```diff
 const smallBufSize int = 64

 type Buffer struct {
-	bootstrap [smallBufSize]byte
+	bootstrap *[smallBufSize]byte
 	buf       []byte
 }
```

Unlike a regular array, a pointer to an array does not store all the elements inside the Buffer itself. This means that allocating bootstrap on the heap does not entail allocating the Buffer on the heap. Since escape analysis can keep pointer fields on the stack when possible, we may assume that such a definition of Buffer is more successful.


But that is in theory. In practice, a pointer to an array receives no special handling and falls into the same category as a slice of a regular array, which is not entirely correct. CL133375: cmd/compile/internal/gc: handle array slice self-assign in esc.go aims to remedy this situation.


Suppose that this change has been accepted into the Go compiler.


The zero value that we lost


Unfortunately, the transition from [64]byte to *[64]byte has a problem: we can no longer use bootstrap without initializing it explicitly. The zero value of Buffer is no longer useful; we need a constructor.


```go
func NewBuffer() Buffer {
	return Buffer{bootstrap: new([smallBufSize]byte)}
}
```

We return Buffer rather than *Buffer to avoid problems with pointer analysis (it is very conservative in Go), and given that NewBuffer is always inlined at the call site, no unnecessary copying occurs.


After inlining the body of NewBuffer into the caller, escape analysis can try to prove that new([smallBufSize]byte) does not outlive the frame of the function in which it is called. If so, the allocation happens on the stack.


Intel bytebuf


The optimization described above is applied in the intel-go/bytebuf package.


This library exports the type bytebuf.Buffer, which duplicates bytes.Buffer by 99.9%. All changes come down to introducing a constructor (bytebuf.New) and a pointer to an array instead of a regular array:


```diff
 type Buffer struct {
 	buf       []byte // contents are the bytes buf[off : len(buf)]
 	off       int    // read at &buf[off], write at &buf[len(buf)]
-	bootstrap [64]byte  // helps small buffers avoid allocation.
+	bootstrap *[64]byte // helps small buffers avoid allocation.
 	lastRead  readOp // last read operation (for Unread*).
 }
```

Here is a performance comparison with bytes.Buffer :


```
name            old time/op    new time/op    delta
String/empty-8     138ns ±13%      24ns ± 0%   -82.94%  (p=0.000 n=10+8)
String/5-8         186ns ±11%      60ns ± 1%   -67.82%  (p=0.000 n=10+10)
String/64-8        225ns ±10%     108ns ± 6%   -52.26%  (p=0.000 n=10+10)
String/128-8       474ns ±17%     338ns ±13%   -28.57%  (p=0.000 n=10+10)
String/1024-8      889ns ± 0%     740ns ± 1%   -16.78%  (p=0.000 n=9+10)

name            old alloc/op   new alloc/op   delta
String/empty-8      112B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
String/5-8          117B ± 0%        5B ± 0%   -95.73%  (p=0.000 n=10+10)
String/64-8         176B ± 0%       64B ± 0%   -63.64%  (p=0.000 n=10+10)
String/128-8        368B ± 0%      256B ± 0%   -30.43%  (p=0.000 n=10+10)
String/1024-8     2.16kB ± 0%    2.05kB ± 0%    -5.19%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
String/empty-8      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
String/5-8          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/64-8         2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/128-8        3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)
String/1024-8       3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)
```

All other information is available in the README .


Because the zero value cannot be used and everything is tied to the New constructor, this optimization cannot be applied to bytes.Buffer itself.


Is this the only way to make bytes.Buffer faster? No. But it is the way that requires minimal changes to the implementation.


Plans for escape analysis


In its current form, Go's escape analysis is rather weak. Almost any operation on pointer values leads to a heap allocation, even when that is not the only valid decision.


Most of the time I devote to the golang/go project, I will try to focus on solving precisely these problems, so that some improvements may land in the next release (1.12).


You can read about the results and the details of the internal structure of this part of the compiler in one of my following articles. I will also try to provide a set of recommendations that help, in some cases, to structure code so that it has fewer unwanted memory allocations.



Source: https://habr.com/ru/post/422447/

