The Story of One Fat Binary


Hi. My name is Marco (I'm a systems programmer at Badoo), and I'd like to present a translation of a post about Go that I found interesting. Go really is criticized for its fat binaries, yet it is also praised for static linking and for the convenience of deploying a single file. On modern servers fat binaries are not a problem, but on embedded systems they very much are. The author describes his experience of fighting them in Go.

Small file size matters for applications that run under very limited resources. In this article we will look at building an agent program that has to work on a variety of low-power devices. Their memory and processor resources will be small, and I can't even predict how small.

Go binaries are known for their small size and self-sufficiency: when you write a program in Go, you get a single binary file that contains everything you need. Compare that with platforms such as Java, Node.js, Ruby and Python, where your code makes up only a small part of the application, and everything else is a pile of dependencies that you also have to package if you want a self-contained bundle.

Despite the convenience of self-contained binaries, Go has no built-in tools for estimating the size of dependencies, so developers cannot make informed decisions about whether a given dependency is worth including in the binary.

The gofat tool will help you deal with dependency sizes in your Go project.

Creating an IoT agent

I'll talk a little about how we designed and built one of our services: an IoT agent that will be deployed on low-power devices around the world. We will look at its architecture from an operational point of view.

Sample code can be downloaded from here: https://github.com/jondot/fattyproject

First, we need good CLI ergonomics, so we will use kingpin, a POSIX-compatible library for CLI flags and options (I like this library so much that I have used it in many of my projects). In practice I will start from my go-cli-starter project, which already includes this library:

 $ git clone https://github.com/jondot/go-cli-starter fattyproject
 Cloning into 'fattyproject'...
 remote: Counting objects: 55, done.
 remote: Total 55 (delta 0), reused 0 (delta 0), pack-reused 55
 Unpacking objects: 100% (55/55), done.

Since our program is an agent, it has to run all the time. As a stand-in for that, we will use a loop that endlessly performs a silly operation.

 for {
     f := NewFarble(&Counter{})
     f.Bumple()
     time.Sleep(time.Second * 1)
 }

Over a long run, garbage accumulates: small memory leaks, forgotten open file descriptors. Even a tiny leak can turn into a giant one if the application runs non-stop for years. Fortunately, Go has a built-in facility for metrics and health monitoring: expvar. It will be very helpful for looking inside the agent. Since the agent must run for a long time without stopping, from time to time we will examine its state: resource consumption, garbage collection cycles, and so on. expvar does all of this for us, which is very convenient for this kind of problem.

To use expvar we need a "magic" import. It is magic because the import itself registers a handler (at /debug/vars) on the default HTTP mux. For the handler to be reachable, we need a running HTTP server from net/http.

 import (
     _ "expvar"
     "net/http"
 :
 :
 go func() {
     http.ListenAndServe(":5160", nil)
 }()
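To make the side effect of the blank import concrete, here is a minimal sketch. The counter name agent_ticks and the helpers bump and varsJSON are hypothetical and not part of the sample project. Instead of opening a port, it sends a request straight to http.DefaultServeMux, which is the mux that receives the /debug/vars handler when expvar is imported.

```go
package main

import (
	"expvar"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ticks is a hypothetical application counter published through expvar.
var ticks = expvar.NewInt("agent_ticks")

// bump records one iteration of the agent loop.
func bump() { ticks.Add(1) }

// varsJSON asks the default mux for /debug/vars without opening a socket.
// The handler is there only because the expvar package was imported.
func varsJSON() string {
	req := httptest.NewRequest("GET", "/debug/vars", nil)
	rec := httptest.NewRecorder()
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	for i := 0; i < 3; i++ {
		bump()
	}
	fmt.Println("agent_ticks =", ticks.Value())
	fmt.Println("bytes served by /debug/vars:", len(varsJSON()))
}
```

The JSON document served at /debug/vars also contains the standard cmdline and memstats entries, so custom counters appear next to the runtime's memory statistics.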

Once our program grows into a complex service, we will also want a logging library with level support, so that we receive errors and warnings and can tell when the program is working normally. For this we will use zap (from Uber).

 import (
 :
     "go.uber.org/zap"
 :
 logger, _ := zap.NewProduction()
 logger.Info("OK", zap.Int("ip", *ip))

A service that runs non-stop on a remote device that you do not control and, most likely, cannot update has to be extremely stable. So it is wise to build some flexibility into it, for example the ability to execute custom commands and scripts. That gives you a mechanism for changing the behavior of the service without redeploying or restarting it.

Let's add a runner for arbitrary remote scripts. This looks suspicious, but if it is your own agent or service, you can prepare a built-in runtime sandbox for running such code. The runtimes most often embedded for this purpose are JavaScript and Lua.

We will use the embedded otto JS engine.

 import (
 :
     "github.com/robertkrimen/otto"
 :
 for {
 :
     vm.Run(`
         abc = 2 + 2;
         console.log("\nThe value of abc is " + abc); // 4
     `)
 :
 }

If we assume that the script passed to Run comes from outside, we now have a complex, self-updating IoT agent!

Understanding the dependencies of the Go binary

So, here is what we ended up with.

 $ ls -lha fattyproject
 ... 13M ... fattyproject*

We assumed that we need every dependency we added, but as a result the binary has grown to about 12 megabytes. That is not much compared with other languages and platforms, but given the modest capabilities of IoT hardware, it is worth reducing both the file size and the computing resources it consumes.

Let's find out how dependencies are added to our binary file.

First, let's look at a well-known binary. GraphicsMagick is a modern variant of the popular ImageMagick image processing system. You probably have it installed already. If not, on OS X you can install it with brew install graphicsmagick.

otool is the OS X counterpart of ldd. With it, we can inspect a binary and find out which libraries it is linked against.

 $ otool -L `which convert`
 /usr/local/bin/convert:
     /usr/local/Cellar/imagemagick/6.9.3-0_2/lib/libMagickCore-6.Q16.2.dylib (compatibility version 3.0.0, current version 3.0.0)
     /usr/local/Cellar/imagemagick/6.9.3-0_2/lib/libMagickWand-6.Q16.2.dylib (compatibility version 3.0.0, current version 3.0.0)
     /usr/local/opt/freetype/lib/libfreetype.6.dylib (compatibility version 19.0.0, current version 19.3.0)
     /usr/local/opt/xz/lib/liblzma.5.dylib (compatibility version 8.0.0, current version 8.2.0)
     /usr/lib/libbz2.1.0.dylib (compatibility version 1.0.0, current version 1.0.5)
     /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.5)
     /usr/local/opt/libtool/lib/libltdl.7.dylib (compatibility version 11.0.0, current version 11.1.0)
     /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)

From this list you can also find out the size of each dependency:

 $ ls -lha /usr/l/.../-0_2/lib/libMagickCore-6.Q16.2.dylib
 ... 1.7M ... /usr/.../libMagickCore-6.Q16.2.dylib

Can we thus get a fairly complete picture of any binary file? Obviously, the answer is no.

By default, Go links statically. That is how we get a single self-contained binary. But it also means that otool, like any similar tool, is useless here.

 $ cat main.go
 package main
 func main() { print("hello") }
 $ go build && otool -L main
 main:

If we still want to break a Go binary down into its dependencies, we need a tool that understands the format of these binaries. Let's look for something suitable.

To list the available tools, run go tool:

 $ go tool
 addr2line
 api
 asm
 cgo
 compile
 cover
 dist
 doc
 fix
 link
 nm
 objdump
 pack
 pprof
 trace
 vet
 yacc

We can go straight to the source code of these tools. Take nm, for example, and look at its package documentation. I mention this tool deliberately. As it turns out, nm comes very close to what we need, but it is still not enough. It can print a list of symbols and object sizes, but that is of little use when we want an overall picture of a binary's dependencies.

 $ go tool nm -sort size -size fattyproject | head -n 20
 5ee8a0 1960408 R runtime.eitablink
 5ee8a0 1960408 R runtime.symtab
 5ee8a0 1960408 R runtime.pclntab
 5ee8a0 1960408 R runtime.esymtab
 4421e0 1011800 R type.*
 4421e0 1011800 R runtime.types
 4421e0 1011800 R runtime.rodata
 551a80 543204 R go.func.*
 551a80 543204 R go.string.hdr.*
 12d160 246512 T github.com/robertkrimen/otto._newContext
 539238 100424 R go.string.*
 804760 65712 B runtime.trace
 cd1e0 23072 T net/http.init
 5e3b80 21766 R runtime.findfunctab
 1ae1a0 18720 T go.uber.org/zap.Any
 301510 18208 T unicode.init
 5e9088 17924 R runtime.typelink
 3b7fe0 16160 T crypto/sha512.block
 8008a0 16064 B runtime.semtable
 3f6d60 14640 T crypto/sha256.block

The sizes shown (second column) may be accurate for the individual symbols, but we cannot simply add these values up: several symbols share the same address and size, so the same bytes would be counted more than once.
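The following sketch shows why a plain sum of the nm sizes misleads. It feeds the first seven lines of the listing above into a hypothetical helper, sumSizes, which computes both a naive total and a total that counts each (address, size) pair only once. This is an illustration, not an accurate accounting method: symbols at different addresses can still overlap.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample holds the first seven lines of the nm listing shown in the article.
const sample = `5ee8a0 1960408 R runtime.eitablink
5ee8a0 1960408 R runtime.symtab
5ee8a0 1960408 R runtime.pclntab
5ee8a0 1960408 R runtime.esymtab
4421e0 1011800 R type.*
4421e0 1011800 R runtime.types
4421e0 1011800 R runtime.rodata`

// sumSizes parses "address size type name" lines and returns the naive sum
// of all sizes plus the sum over distinct (address, size) pairs.
func sumSizes(nmOutput string) (naive, distinct int64) {
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(nmOutput))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 {
			continue // skip lines without address and size
		}
		size, err := strconv.ParseInt(f[1], 10, 64)
		if err != nil {
			continue
		}
		naive += size
		key := f[0] + "/" + f[1]
		if !seen[key] {
			seen[key] = true
			distinct += size
		}
	}
	return naive, distinct
}

func main() {
	naive, distinct := sumSizes(sample)
	fmt.Printf("naive=%d distinct=%d\n", naive, distinct)
}
```

For these seven lines the naive sum is 10877032 bytes, while only 2972208 bytes are distinct, because the same two regions are reported under several symbol names.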

Gofat

One last trick remains that should work. When you compile your binary, Go generates an intermediate file for each dependency before statically linking them all into a single file.

Let me introduce gofat, a shell script that combines the Go toolchain with a few Unix tools. It reports dependency sizes in Go binaries:

 #!/bin/sh
 eval `go build -work -a 2>&1` && find $WORK -type f -name "*.a" | xargs -I{} du -hxs "{}" | gsort -rh | sed -es:${WORK}/::g

If you are in a hurry, just copy or download this script and make it executable ( chmod +x ). Then run the script without any arguments in your project directory to get information about its dependencies.

Let's go through this command piece by piece:

 eval `go build -work -a 2>&1`

The -a flag tells Go to ignore the cache and build the project from scratch, so all dependencies are forcibly rebuilt. The -work flag prints the temporary work directory as a WORK=... line and keeps the directory after the build. Passing that line through eval sets the WORK shell variable, so we can inspect the directory afterwards (thanks to the Go developers!).
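If you wanted to do the same thing from Go rather than through eval, the step amounts to finding the WORK= line in the build output. Below is a minimal sketch; the helper name workDir and the sample directory name are hypothetical, and the real directory name differs on every run.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// workDir scans the combined output of `go build -work` for the line
// "WORK=<dir>" and returns the directory.
func workDir(buildOutput string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(buildOutput))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "WORK=") {
			return strings.TrimPrefix(line, "WORK="), true
		}
	}
	return "", false
}

func main() {
	// Hypothetical build output, used so the sketch runs without a build.
	out := "WORK=/tmp/go-build123456789\n"
	if dir, ok := workDir(out); ok {
		fmt.Println(dir)
	}
}
```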

 find $WORK -type f -name "*.a" | xargs -I{} du -hxs "{}" | gsort -rh

Next we use find to locate all the *.a files, which are our compiled dependencies. We pipe the resulting lines (file paths) to xargs. This utility runs a command for each line it receives, in our case du, which reports the file size.

Finally, we use gsort (the GNU version of sort) to sort by file size in descending order.

 sed -es:${WORK}/::g

We strip the WORK directory prefix everywhere and print the cleaned-up lines with the dependency data.

Now for the most interesting part: what takes up the 12 MB in our binary?
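The find, du, sort and sed stages can also be sketched in Go. The names entry, archiveSizes and demo are hypothetical; demo builds a throwaway directory tree so that the sketch runs without a real build. Unlike du, which reports disk usage, this version reports plain file sizes, and the actual layout of the work directory depends on the Go version.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// entry is one archive file found under the work directory.
type entry struct {
	path string // path relative to the root, like the sed step produces
	size int64  // file size in bytes
}

// archiveSizes walks root, keeps regular files ending in ".a",
// and returns them sorted by size, largest first.
func archiveSizes(root string) ([]entry, error) {
	var out []entry
	err := filepath.Walk(root, func(p string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.Mode().IsRegular() && strings.HasSuffix(p, ".a") {
			rel, err := filepath.Rel(root, p)
			if err != nil {
				return err
			}
			out = append(out, entry{filepath.ToSlash(rel), info.Size()})
		}
		return nil
	})
	sort.Slice(out, func(i, j int) bool { return out[i].size > out[j].size })
	return out, err
}

// demo creates a small fake work directory and lists its archives.
func demo() ([]entry, error) {
	root, err := os.MkdirTemp("", "gofat-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	files := map[string]int{"net/http.a": 300, "runtime.a": 200, "fmt.a": 100, "notes.txt": 500}
	for name, n := range files {
		p := filepath.Join(root, filepath.FromSlash(name))
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(p, make([]byte, n), 0o644); err != nil {
			return nil, err
		}
	}
	return archiveSizes(root)
}

func main() {
	entries, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range entries {
		fmt.Printf("%6d %s\n", e.size, e.path)
	}
}
```

In real use you would pass the directory printed by go build -work to archiveSizes instead of calling demo.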

Losing weight

Let's run gofat for the first time on our toy IoT agent project. We get the following data:

 2.2M github.com/robertkrimen/otto.a
 1.8M net/http.a
 1.4M runtime.a
 960K net.a
 820K reflect.a
 788K gopkg.in/alecthomas/kingpin.v2.a
 668K github.com/newrelic/go-agent.a
 624K github.com/newrelic/go-agent/internal.a
 532K crypto/tls.a
 464K encoding/gob.a
 412K math/big.a
 392K text/template.a
 392K go.uber.org/zap/zapcore.a
 388K github.com/alecthomas/template.a
 352K crypto/x509.a
 344K go/ast.a
 340K syscall.a
 328K encoding/json.a
 320K text/template/parse.a
 312K github.com/robertkrimen/otto/parser.a
 312K github.com/alecthomas/template/parse.a
 288K go.uber.org/zap.a
 232K time.a
 224K regexp/syntax.a
 224K regexp.a
 224K go/doc.a
 216K fmt.a
 196K unicode.a
 192K compress/flate.a
 172K github.com/robertkrimen/otto/ast.a
 172K crypto/elliptic.a
 156K encoding/asn1.a
 152K os.a
 136K strconv.a
 128K os/exec.a
 128K github.com/Sirupsen/logrus.a
 128K flag.a
 112K vendor/golang_org/x/net/http2/hpack.a
 104K strings.a
 104K net/textproto.a
 104K mime/multipart.a

If you experiment, you will notice that builds take noticeably longer with gofat. That is because we run the build with -a, which rebuilds everything.

Now we know how much space each dependency takes. Let's roll up our sleeves, analyze the list and take action.

 1.8M net/http.a

Everything related to HTTP handling adds up to 1.8 MB. Perhaps we can throw it out. We will give up expvar and instead periodically write critical parameters and program status information to a log file. If we do this often enough, everything will be fine.

Update: as of Go 1.8, net/http weighs 2.2 MB.
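Here is a minimal sketch of that periodic logging, using only the standard library. The helper names snapshot and report, the set of counters and the 100 ms interval are all assumptions made for illustration; a real agent would pick its own metrics and write to a file rather than stderr.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime"
	"time"
)

// snapshot formats a few runtime counters as one log-friendly line.
func snapshot() string {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return fmt.Sprintf("heap_alloc=%d heap_objects=%d num_gc=%d goroutines=%d",
		m.HeapAlloc, m.HeapObjects, m.NumGC, runtime.NumGoroutine())
}

// report writes n snapshots to logger, one per interval.
// A real agent would loop until shutdown instead of stopping after n.
func report(logger *log.Logger, interval time.Duration, n int) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for i := 0; i < n; i++ {
		<-ticker.C
		logger.Println(snapshot())
	}
}

func main() {
	// Logging to stderr keeps the sketch self-contained.
	logger := log.New(os.Stderr, "agent ", log.LstdFlags)
	report(logger, 100*time.Millisecond, 3)
}
```

This reads the same memory statistics that expvar would expose, without pulling net/http into the binary.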

 788K gopkg.in/alecthomas/kingpin.v2.a
 388K github.com/alecthomas/template.a

And here is a big surprise: a very convenient POSIX-style flag parser takes about 1 MB. We can give it up and use the flag package from the standard library, or even drop flags altogether and read the configuration from environment variables (though that will take up some space too).
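A minimal sketch of the standard-library alternative, combining both ideas: an environment variable supplies the default and a flag can override it. The flag names, the AGENT_INTERVAL variable and the config type are hypothetical and do not come from the sample project.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

// config holds the hypothetical settings of the agent.
type config struct {
	interval time.Duration
	verbose  bool
}

// parseConfig reads defaults from the environment, then lets flags override them.
// getenv is injected so the function can be exercised without touching os.Environ.
func parseConfig(args []string, getenv func(string) string) (config, error) {
	cfg := config{interval: time.Second}
	if v := getenv("AGENT_INTERVAL"); v != "" {
		d, err := time.ParseDuration(v)
		if err != nil {
			return cfg, fmt.Errorf("AGENT_INTERVAL: %v", err)
		}
		cfg.interval = d
	}
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	fs.DurationVar(&cfg.interval, "interval", cfg.interval, "pause between iterations")
	fs.BoolVar(&cfg.verbose, "verbose", false, "log every iteration")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("interval=%s verbose=%t\n", cfg.interval, cfg.verbose)
}
```

Note that the standard flag package does not follow POSIX conventions as closely as kingpin does: for example, it does not support combining several short options into one argument.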

Newrelic adds another 1.3 MB, so you can also drop it:

 668K github.com/newrelic/go-agent.a
 624K github.com/newrelic/go-agent/internal.a

We will throw out zap as well and use the standard logging package instead:

392K go.uber.org/zap/zapcore.a

Otto, being an embedded JS engine, weighs a lot:

 2.2M github.com/robertkrimen/otto.a
 312K github.com/robertkrimen/otto/parser.a
 172K github.com/robertkrimen/otto/ast.a

At the same time, logrus takes up little space for such a feature-rich logging library:

 128K github.com/Sirupsen/logrus.a

We can keep it.

Conclusion

We found a way to measure dependency sizes in Go and saved about 7 MB. We also decided not to use certain dependencies and to take their counterparts from the Go standard library instead.
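As a rough check of that figure, the sketch below adds up the archive sizes quoted in the previous section for net/http, kingpin, New Relic, zap and otto. The helpers parseSize and total are hypothetical, and the sizes are assumed to use binary units (1K = 1024 bytes), as du -h does. Archive sizes are only an approximation of what each package contributes to the final binary.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// dropped lists the archive sizes quoted in the "Losing weight" section.
var dropped = []string{
	"1.8M",                 // net/http
	"788K", "388K",         // kingpin and its template package
	"668K", "624K",         // newrelic agent
	"392K",                 // zapcore
	"2.2M", "312K", "172K", // otto, its parser and ast
}

// parseSize converts a du -h style size such as "788K" or "1.8M" to bytes.
func parseSize(s string) (int64, error) {
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	mult, num := 1.0, s
	switch s[len(s)-1] {
	case 'K':
		mult, num = 1<<10, s[:len(s)-1]
	case 'M':
		mult, num = 1<<20, s[:len(s)-1]
	}
	v, err := strconv.ParseFloat(num, 64)
	if err != nil {
		return 0, err
	}
	return int64(math.Round(v * mult)), nil
}

// total sums a list of human-readable sizes in bytes.
func total(sizes []string) (int64, error) {
	var sum int64
	for _, s := range sizes {
		n, err := parseSize(s)
		if err != nil {
			return 0, err
		}
		sum += n
	}
	return sum, nil
}

func main() {
	sum, err := total(dropped)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, about %.1f MiB\n", sum, float64(sum)/(1<<20))
}
```

The listed archives add up to 7618560 bytes, roughly 7.3 MiB, which is consistent with the "about 7 MB" estimate.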

Moreover, I would say that if we try hard and experiment with a set of dependencies, we can shrink our binary file from the initial 12 MB to 1.2 MB.

You don't have to do this, because dependencies in Go are already small compared with other platforms. But you definitely need tools at hand that help you better understand what you are building. And if you develop software for environments with very limited resources, gofat can be one such tool.

PS: if you want to experiment more, here is the reference repository: https://github.com/jondot/fattyproject .

Source: https://habr.com/ru/post/322880/
