
3.5 years, 500k lines of Go. Part 1

This is a translation of Nate Finch's article (originally published March 24, 2017).


January 31, 2017 was my last day at Canonical, after 3.5 years of working on one of the largest open source projects written in Go: Juju.


At the time of writing, the main Juju repository contains 3,542 files and 540,000 lines of Go code (not counting 65,000 lines of comments). With all dependencies except the standard library, Juju comprises 9,523 files holding 1,963,000 lines of Go code (excluding 331,000 lines of comments).


Here are a few lessons learned from approximately 7,000 hours of work on this project.


It is worth noting that not everyone on the Juju team agrees with me, and the code base is so large that you could work on it for a year and still not see two-thirds of the code. So take the following with a grain of salt.


Juju


Juju is a tool for orchestrating services, akin to Nomad, Kubernetes, and the like. Juju consists (mostly) of two binaries: a client and a server. The server can run in several different modes (it used to ship as several different binaries, but since they were 99% identical, it was easier to make one that is simpler to distribute). The server runs in a cloud of your choice; you can run additional instances on separate machines, and they will be managed by the central server. The client and the auxiliary machines communicate with the main server via RPC over websockets.


Juju is a monolith. No microservices: a single binary is all it takes. And it really works well, because Go is good at concurrency - there is no need to worry about one goroutine blocking everything else, so it is convenient to run everything in one process. You avoid the overhead of serialization and other inter-process communication. This makes the code more interdependent, but separation of concerns is not always the highest priority in development. As a result, it seems to me that the monolith was much easier to develop and test than a bunch of small services would have been, and proper code splitting and encapsulation helped avoid spaghetti code.


Package management


Juju does not use vendoring. In my opinion it should, but the project was started before decent tools appeared, and switching to vendoring never seemed worth the time it would take. We use Roger Peppe's godeps (not to be confused with godep) to pin revisions. It has one problem: it messes with the other packages in your GOPATH, setting them to the specified commit, so if you ever build another project that does not use vendoring, there is a good chance its dependencies will not be built from master - surprise. Still, pinning revisions gave us repeatable builds (as long as no one did anything truly terrible in the external repositories). The only real pain point was that the file holding the commit hashes (godeps reads it to know which revision of each package to check out) was a constant source of merge conflicts. It changed often, and by many developers, so sooner or later two people would touch the same or adjacent lines in it. This became a real problem, so I started writing a tool to resolve these conflicts automatically (since godeps records the commit date, it is almost always correct to just pick the more recent commit). The problem persists with glide, and with any similar tool that stores dependency hashes in a single file; I am not sure I know how to fix it.


All in all, I never felt that package management was a huge problem. It was a minor part of our daily work... so it was always strange for me to read that people do not take Go seriously because it lacks a package manager. Most third-party repositories maintained stable APIs, and we could pin our code to a specific commit... it just was not a problem.


Project organization


Juju is about 80% a monorepo, located at github.com/juju/juju, with the remaining 20% of the code living in separate repositories (under github.com/juju). A monorepo has pros and cons: it is easy to make sweeping changes across the code, but it also means you most likely will not maintain a stable API in foo/bar/baz/bat/alt/special... and we did not. That translates into insanity for anyone who imports a package from the monorepo and expects it to keep existing in roughly the same form. Vendoring saves you in that case, of course, but if you ever need to upgrade, good luck.


A monorepo also meant we were less careful about APIs and separation of concerns, and the code was more interdependent than it could have been. Not that we were careless, but it seems to me that the code outside the main Juju repository was much better standardized in exactly those respects: separation of concerns, quality, and API stability. The documentation for the external repositories was better too, and that by itself means a lot.


The problem with external repositories was package management and synchronizing changes across repositories. If you updated an external repository, you then had to land a change in the main repository to start using it, and of course you cannot do that atomically across two GitHub repositories. Sometimes landing on master is blocked by code review, failing tests, or whatever else, and then you have potentially incompatible changes sitting in the external repository, waiting to trip up anyone who next decides to change it.


I will say one more thing: utility repositories are evil. Many times we set out to backport a fix in a sub-package of our utils repository to an earlier version of Juju, only to discover yet again that many, many unrelated changes would be dragged along with the fix - all because we had too much stuff in one repository. We ended up doing all sorts of horrible branching, hacks, and copy-paste; it is all bad, do not do it. Just say no to utils packages and repositories.


General simplicity


Go's simplicity was definitely a major factor in the success of the Juju project. Only about a third of the developers we hired had worked with Go before; the rest were newcomers. After a week, most newcomers were already quite proficient. The size and complexity of the product was a much bigger problem for developers than the language itself. Occasionally the more experienced Go developers would get questions from the team about the best way to do X in Go, but that was quite rare. Compare that with C# at my previous job, where I was constantly explaining parts of the language or why something works the way it does.


Simplicity was a boon to the project, since we could hire good developers in general, not just those with Go experience. The language was never an obstacle to understanding an unfamiliar part of the code. Juju was so huge that no one could know the whole project in detail, yet almost anyone could drop into a piece of code and figure out what the 100 or so lines around a bug do, and how (more or less). Most of the difficulty of learning a new piece of code was the same as it would be in any language: what is the architecture, how does information flow, what are the expectations.


Since Go has very little magic, it seems to me this project was easier to build in Go than it would have been in another language. You do not have the magic other languages have - magic that can add unexpected behavior to seemingly simple and clear lines of code. When reading old Go code, you never have to wonder "how does this even work?", because it is just Go code. That does not mean there is no convoluted code to puzzle over, or hidden expectations and preconditions... but at least none of it hides behind language features that obscure the underlying algorithms.


Testing


Test suites


At Juju, we used Gustavo Niemeyer's gocheck to run our tests. gocheck's suite features made full-stack testing possible: a Juju server and a MongoDB environment were brought up automatically before each test, reducing the overhead for the developer. Once written, those suites turned out to be simply huge, but you could just embed the "base suite" in your own test suite's struct and it would automatically do all the dirty work for you. As a result, our unit tests took almost 20 minutes on a very fast laptop, because every test did so much. That volume of test scaffolding also made the tests fragile and hard to understand and debug. To understand why a test passed or failed, you had to understand all the code that ran before the opening brace of your test function, and since it is easy to embed one suite in another, there was often a LOT happening before that opening brace.


In the future, I will stick with the standard library for testing instead. I like that tests written with the standard library read like regular Go code, and that dependencies are explicit. If you want some code to run at the beginning of your test, you can simply put a method call there... in fact, you must put it there.


Time in a bottle


The time package is the curse of tests and test code. If you have code that should time out after 30 seconds, how do you test it? Does the test have to take 30 seconds? And do the rest of the tests each wait 30 seconds when something goes wrong? This applies not only to time.Sleep but also to time.After and time.Ticker... it is a disaster in tests. Not to mention that under test (especially with the -race flag), your code can run much slower than in production.


The solution is to mock out time... which is, of course, non-trivial, because the time package is just a bunch of top-level functions. So everywhere you would use the time package, you instead have to accept a special interface that wraps time, and in tests pass in a fake implementation whose time you can control. This added a lot of time to producing working builds and propagating changes through the code. For a long time it was a constant source of flaky tests - tests that pass most of the time, but if the CI machine was feeling pensive that day, some random test would fail. And when you have hundreds of thousands of lines of tests, the probability that some test fails is high, and it will most likely not be the same test as last time. Fixing flaky tests was a game of whack-a-mole.


Cross-compilation happiness


I don't know the exact number of OS and architecture combinations, but Juju's server definitely builds for Windows and Linux (CentOS and Ubuntu), and for many architectures - not just amd64, but even exotic ones like ppc64le, arm64, and s390x.


Juju initially used gccgo for architectures the gc compiler did not support. This caused a few bugs in Juju where gccgo did something subtly wrong. When gc was updated to support all the architectures, we were very happy to drop the extra compiler from the project and work with gc alone.


Once we switched to gc, architecture-specific bugs almost disappeared. Which is great, given the breadth of architectures Juju supports, and the fact that the exotic architectures were usually used by large companies with a lot of leverage over Canonical.


OS-specific bugs


At first, when we started building Windows support, we had a few OS-related bugs (we all developed on Ubuntu, so Windows-specific bugs did not surface until the code went through CI). They basically boiled down to two common file system mistakes.


The first was defaulting to forward slashes in paths in tests. For example, if you know the configuration file should live in a "juju" subfolder and be called "config.yml", your test might check that the file path is folder + "/juju/config.yml", whereas on Windows it should be folder + "\juju\config.yml".


When building paths, even in tests, use filepath.Join rather than path.Join, and certainly do not concatenate strings and slashes. filepath.Join uses the correct separator for the OS. To compare paths, always use filepath.ToSlash first to bring them to a canonical form that can be compared.
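A small sketch of both rules (the helper names are mine, not from Juju): build with filepath.Join, compare after filepath.ToSlash.

```go
package main

import "path/filepath"

// configPath builds an OS-correct path: filepath.Join inserts the
// platform separator ("/" on Linux and macOS, "\" on Windows).
func configPath(folder string) string {
	return filepath.Join(folder, "juju", "config.yml")
}

// samePath compares two paths in canonical slash form, so a check
// written on Linux still passes on Windows.
func samePath(a, b string) bool {
	return filepath.ToSlash(a) == filepath.ToSlash(b)
}
```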


The other common mistake was that Linux lets developers delete or move an open file. That does not work on Windows, because Windows locks a file while it is open. This often showed up as a defer file.Delete() that, per LIFO order, ran before the deferred file.Close() and thus tried to delete a file that was still open. Oops. One solution is simply to always call file.Close() before moving or deleting. Note that it is safe to call Close more than once, so you can call it before deleting even if you already have a defer file.Close() that will run at the end of the function.


Neither of these bugs is complicated, and I believe the standard library's strong cross-platform support makes developing cross-platform code easy.


Error handling


Error handling in Go definitely contributed to Juju's stability. The fact that you can tell exactly where a function might fail with an error makes it much easier to write code that expects failure and handles it gracefully.


For a long time, Juju simply used the standard errors package. Eventually, though, we felt we needed more context to better trace the code path that produced an error, and we thought it would be nice to preserve more detail about the error while adding context to it (for example, wrapping with fmt.Errorf loses information about the original error - say, that it satisfied os.IsNotExist).


A couple of years ago, we started developing our own errors package that captures more context without losing the original error. After some fruitless thrashing in different directions, we combined all our ideas into https://github.com/juju/errors . It is certainly not a perfect library, and it has sprawled over the years as features were added, but it was a good start.


The main problem is that you have to call errors.Trace(err) every time you return an error if you want file names and line numbers for something like a stack trace. Today I would choose Dave Cheney's github.com/pkg/errors package, which captures the stack trace when the error is created and avoids all that tracing. Honestly, I do not find a stack trace on an error all that useful. In practice, unexpected errors get sufficient context from a simple fmt.Errorf("while doing foo: %v", err), so most of the time a stack trace is not needed. The ability to inspect the original error can occasionally be handy, but probably not as often as you think. If foobar.init() returns something satisfying os.IsNotExist, does that really help your code? In most cases, no.


Stability


For such a huge project, Juju is very stable (which is not to say it does not have plenty of bugs... I just mean it almost never crashed or misbehaved badly). I think a lot of that is down to the language. The company I worked at before Canonical had a million lines of C# code that regularly fell over with null reference exceptions and other unhandled exceptions. Honestly, I cannot remember ever seeing a nil pointer panic in Juju production code - only occasionally during development, when I did something stupid in new code.


I am sure Go's multiple return values for indicating errors deserve much of the credit. The foo, err := pattern, with errors checked always and everywhere, really does make nil pointer dereferences rare. Checking the error before accessing the other returned value(s) is such a basic principle in Go that we documented the exceptions to the rule. The extra error return cannot be ignored or forgotten, thanks to the compiler's checks for unused variables. This goes a long way toward reducing the nil pointer problem in Go compared to similar languages.
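The pattern in miniature (parsePort is my example, not Juju code): the caller gets a value and an error together, and the compiler will not let the error binding sit unused.

```go
package main

import (
	"errors"
	"strconv"
)

// parsePort returns a value and an error; the caller must check err
// before using port, so a zero value is never used blindly.
func parsePort(s string) (int, error) {
	port, err := strconv.Atoi(s)
	if err != nil {
		return 0, errors.New("invalid port: " + s)
	}
	if port < 1 || port > 65535 {
		return 0, errors.New("port out of range: " + s)
	}
	return port, nil
}
```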


Generics


This section will be short, because... well, you know. Only once or twice while working on Juju did I personally feel the lack of generics, and I do not remember ever wishing for them in someone else's code during review. I was happy not to have to grok the kind of cognitive complexity I used to encounter with generics in C#. Go interfaces are good enough 99% of the time, and I do not mean interface{}. We rarely used interface{} in Juju, and almost always it was because some kind of serialization was going on.


To be continued


This post is already quite long, so I think it is time to stop. I have many more specific things I could talk about: the API, versioning, the database, refactoring, logging, idioms, code reviews, etc.



Source: https://habr.com/ru/post/325326/

