
Simple Go Program Optimization Techniques

I've always cared about performance. I don't know exactly why; I just get annoyed by slow services and programs. It seems I'm not alone.

In A/B tests, we tried delaying the page in increments of 100 milliseconds and found that even very small delays lead to a substantial drop in revenue. - Greg Linden, Amazon.com

In my experience, poor performance manifests itself in one of two ways:


For most of my career, I have either done data science in Python or built services in Go. I have far more optimization experience with the latter. Go is usually not the bottleneck in the services I write: programs that talk to databases are typically I/O-bound. However, in the batch machine learning pipelines I have developed, the program is often CPU-bound. When a Go program uses too much CPU, there are various strategies.
This article explains some techniques that can significantly improve performance without much effort. I deliberately ignore methods that require significant effort or large changes to the program's structure.

Before you start


Before making any changes to the program, take the time to create a proper baseline for comparison. If you do not, you will wander in the dark, wondering whether your changes helped at all. First write benchmarks and capture profiles for use with pprof. Ideally, write the benchmarks with Go's own testing framework: it makes CPU and memory profiling easy. Also use benchcmp: a handy tool for comparing the performance difference between benchmark runs.

If the code does not lend itself well to benchmarks, just start with something that can be measured. You can profile the code manually using runtime/pprof.
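As a sketch, manual CPU profiling with runtime/pprof looks roughly like this (the workload function and the cpu.prof file name are placeholders for your own code):

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// workload stands in for whatever code you actually want to profile.
func workload() int {
	sum := 0
	for i := 0; i < 1000000; i++ {
		sum += i
	}
	return sum
}

func main() {
	// Write CPU samples to a file that `go tool pprof` understands.
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	workload()
}
```

You can then inspect the resulting profile with go tool pprof cpu.prof.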

So, let's begin!

Use sync.Pool to reuse previously allocated objects.


sync.Pool implements a free list. It lets you reuse previously allocated structures, amortising the cost of an allocation across many uses and reducing the garbage collector's work. The API is very simple: implement a function that allocates a new instance of the object. It should return a pointer type.

var bufpool = sync.Pool{
	New: func() interface{} {
		buf := make([]byte, 512)
		return &buf
	},
}

After that, you can Get() objects from the pool and Put() them back when you're done.

// sync.Pool returns an interface{}: you must cast it to the underlying
// type before you use it.
b := *bufpool.Get().(*[]byte)
defer bufpool.Put(&b)

// Now, go do interesting things with your byte buffer.
buf := bytes.NewBuffer(b)

There are nuances. Before Go 1.13, the pool was cleared on every garbage collection. This can hurt the performance of programs that allocate a lot. Starting with 1.13, more objects survive across GCs.

!!! Before returning an object to the pool, you must reset its fields.

If you do not, you can get a dirty object from the pool that contains data from a previous use. This is a serious security risk!

type AuthenticationResponse struct {
	Token  string
	UserID string
}

rsp := authPool.Get().(*AuthenticationResponse)
defer authPool.Put(rsp)

// If we don't hit this if statement, we might return data from other users!
if blah {
	rsp.UserID = "user-1"
	rsp.Token = "super-secret"
}

return rsp

The safe way to always guarantee zeroed memory is to reset it explicitly:

// reset resets all fields of the AuthenticationResponse before pooling it.
func (a *AuthenticationResponse) reset() {
	a.Token = ""
	a.UserID = ""
}

rsp := authPool.Get().(*AuthenticationResponse)
defer func() {
	rsp.reset()
	authPool.Put(rsp)
}()

The only time this is not a problem is when you use exactly the memory you wrote to. For example:

var (
	r io.Reader
	w io.Writer
)

// Obtain a buffer from the pool.
buf := *bufPool.Get().(*[]byte)
defer bufPool.Put(&buf)

// We only write to w exactly what we read from r, and no more.
nr, err := r.Read(buf)
if nr > 0 {
	if _, werr := w.Write(buf[:nr]); werr != nil {
		// handle write error
	}
}
if err != nil && err != io.EOF {
	// handle read error
}

Avoid using structures containing pointers as keys for a large map.


Whew, that was a mouthful. Apologies. Much has been said (including by my former colleague Phil Pearl) about Go's performance with large heaps. During garbage collection, the runtime scans objects containing pointers and traces through them. If you have a very large map[string]int, the GC has to check every string. This happens on every garbage collection, since strings contain pointers.

In this example, we write 10 million elements into a map[string]int and measure the duration of garbage collection. We allocate the map at package scope to guarantee heap allocation.

package main

import (
	"fmt"
	"runtime"
	"strconv"
	"time"
)

const (
	numElements = 10000000
)

var foo = map[string]int{}

func timeGC() {
	t := time.Now()
	runtime.GC()
	fmt.Printf("gc took: %s\n", time.Since(t))
}

func main() {
	for i := 0; i < numElements; i++ {
		foo[strconv.Itoa(i)] = i
	}

	for {
		timeGC()
		time.Sleep(1 * time.Second)
	}
}

Running the program, we will see the following:

  inthash → go install && inthash
 gc took: 98.726321ms
 gc took: 105.524633ms
 gc took: 102.829451ms
 gc took: 102.71908ms
 gc took: 103.084104ms
 gc took: 104.821989ms 

That's quite a long time in computer-land!

What can we do to optimize? Removing pointers wherever possible seems like a good idea, so the garbage collector has less to trace. Strings contain pointers, so let's implement this as map[int]int.

package main

import (
	"fmt"
	"runtime"
	"time"
)

const (
	numElements = 10000000
)

var foo = map[int]int{}

func timeGC() {
	t := time.Now()
	runtime.GC()
	fmt.Printf("gc took: %s\n", time.Since(t))
}

func main() {
	for i := 0; i < numElements; i++ {
		foo[i] = i
	}

	for {
		timeGC()
		time.Sleep(1 * time.Second)
	}
}

Running the program again, we will see:

  inthash → go install && inthash
 gc took: 3.608993ms
 gc took: 3.926913ms
 gc took: 3.955706ms
 gc took: 4.063795ms
 gc took: 3.91519ms
 gc took: 3.75226ms 

Much better. We sped up garbage collection roughly 35-fold. In production, you would need to hash the strings into integers before inserting them into the map.
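One minimal way to do that hashing (a sketch, not from the original article) is to key the map by FNV-1a from the standard library's hash/fnv package. Note that you lose the original keys and accept a tiny risk of collisions:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashString maps a string key to a uint64 using FNV-1a.
func hashString(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func main() {
	// A pointer-free map: the GC no longer has to scan every key.
	counts := map[uint64]int{}
	counts[hashString("user-1")]++
	counts[hashString("user-1")]++

	fmt.Println(counts[hashString("user-1")]) // 2
}
```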

By the way, there are other ways to avoid GC scans. If you allocate giant slices of pointer-free structs, ints, or bytes, the GC will not scan them: that saves GC time. Such techniques usually require substantial rework of the program, so we won't go into them today.

As with any optimization, your mileage may vary. See the tweet thread from Damian Gryski for an interesting example where removing strings from a large map in favour of a smarter data structure actually increased memory consumption. In general, read everything he publishes.

Generate marshaling code to avoid reflection in runtime.


Marshaling and unmarshaling your structures to and from serialization formats such as JSON is a typical operation, especially when building microservices. For many microservices, it may be the only job they do. Functions like json.Marshal and json.Unmarshal rely on runtime reflection to serialize struct fields to bytes and vice versa. This can be slow: reflection is not as efficient as explicit code.

However, there are ways to optimize. The JSON marshaling mechanics look roughly like this:

package json

// Marshal takes an object and returns its representation in JSON.
func Marshal(obj interface{}) ([]byte, error) {
	// Check if this object knows how to marshal itself to JSON
	// by satisfying the Marshaler interface.
	if m, is := obj.(Marshaler); is {
		return m.MarshalJSON()
	}

	// It doesn't know how to marshal itself. Do default
	// reflection-based marshalling.
	return marshal(obj)
}

If we control the marshalling code, we have a hook to avoid runtime reflection. But we don't want to hand-write all the marshalling code, so what do we do? Have the computer generate it! Code generators like easyjson inspect a struct and generate highly optimized code that is fully compatible with the existing marshaling interfaces, such as json.Marshaler.

Install the package and run the following against the $file.go containing the structures you want to generate code for.

  easyjson -all $file.go

The file $file_easyjson.go will be generated. Since easyjson has implemented the json.Marshaler interface for you, these functions will be called instead of the reflection-based default. Congratulations: you have just sped up your JSON code about three times. There are many more tricks to squeeze out further performance.

I recommend this package because I have used it successfully myself. But beware: please do not take this as an invitation to start an aggressive debate with me about the fastest JSON packages.

Make sure the marshaling code is re-generated when the structure changes. If you forget, new fields you add will not be serialized, which will lead to confusion! You can use go generate for this. To keep the generated code in sync with the structs, I prefer to put a generate.go at the root of the package that runs go generate for all package files: this helps when you have many files that need generated code. A general tip: to ensure the structs are up to date, run go generate in CI and check that there is no diff against the checked-in code.
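Such a generate.go can be as small as this (the package name and structs.go file name here are hypothetical):

```go
// generate.go holds the package's go:generate directives, so that
// running `go generate ./...` re-creates the easyjson marshalling
// code whenever the structs in structs.go change.

//go:generate easyjson -all structs.go

package api
```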

Use strings.Builder to build strings.


In Go, strings are immutable: think of them as read-only slices of bytes. This means that each time you build a string, you allocate memory and potentially create more work for the garbage collector.

Go 1.10 introduced strings.Builder as an efficient way to build strings. Internally, it writes to a byte buffer. A string is only actually created when String() is called on the builder. It relies on some unsafe tricks to return the underlying bytes as a string with zero allocation: see this blog for a deeper look at how that works.

Let's compare the performance of two approaches:

// main.go
package main

import "strings"

var strs = []string{
	"here's",
	"a",
	"some",
	"long",
	"list",
	"of",
	"strings",
	"for",
	"you",
}

func buildStrNaive() string {
	var s string
	for _, v := range strs {
		s += v
	}
	return s
}

func buildStrBuilder() string {
	b := strings.Builder{}
	// Grow the buffer to a decent length, so we don't have to continually
	// re-allocate.
	b.Grow(60)
	for _, v := range strs {
		b.WriteString(v)
	}
	return b.String()
}

// main_test.go
package main

import (
	"testing"
)

var str string

func BenchmarkStringBuildNaive(b *testing.B) {
	for i := 0; i < b.N; i++ {
		str = buildStrNaive()
	}
}

func BenchmarkStringBuildBuilder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		str = buildStrBuilder()
	}
}

Here are the results on my Macbook Pro:

  strbuild → go test -bench=. -benchmem
 goos: darwin
 goarch: amd64
 pkg: github.com/sjwhitworth/perfblog/strbuild
 BenchmarkStringBuildNaive-8      5000000   255 ns/op   216 B/op   8 allocs/op
 BenchmarkStringBuildBuilder-8   20000000  54.9 ns/op    64 B/op   1 allocs/op

As you can see, strings.Builder is 4.7 times faster, makes eight times fewer allocations, and uses about a third of the memory per operation.

When performance is important, use strings.Builder . In general, I recommend using it everywhere except in the most trivial cases of string construction.

Use strconv instead of fmt


fmt is one of Go's best-known packages. You probably used it in your first program to print "hello, world". But when it comes to converting integers and floats to strings, it is not as efficient as its lower-level sibling, strconv. That package gives decent performance for very few API changes.

fmt mostly accepts interface{} as its function arguments. There are two drawbacks: you lose static type safety, and passing a concrete value as an interface{} usually causes a heap allocation.

Source: https://habr.com/ru/post/457004/
