
Using Golang to create microservices at The Economist: a retrospective

Hello! On May 28 we are launching the first group of the "Golang Developer" course, and today we are sharing the first publication dedicated to that launch. Let's go.



Key excerpts

I joined The Economist's development team as a Drupal developer. However, my real task was to take part in a project that would fundamentally change how The Economist delivers content. I spent the first few months learning Go, then worked for several months with an external consultant to build an MVP (minimum viable product), and then rejoined the team to guide their adoption of Go.
This technology shift was triggered by The Economist's mission to expand its digital audience as news consumption moved away from print. The Economist needed more flexibility to deliver content to increasingly diverse digital channels. To achieve this goal while maintaining a high level of performance and reliability, the platform moved from a monolithic to a microservice architecture. Tools written in Go were a key component of the new system, which enabled The Economist to provide scalable, high-performance services and to create new products quickly.

Go into The Economist:


Why did The Economist choose Go?

To answer this question, it helps to outline the overall architecture of the new platform. The platform, called the Content Platform, is an event-driven system. It responds to events from different content authoring platforms and starts a processing flow that runs in separately deployed microservices. These services perform functions such as data standardization, semantic tag analysis, indexing in Elasticsearch, and sending content to external platforms such as Apple News or Facebook. The platform also has a RESTful API, which, together with GraphQL, is the main entry point for front-end clients and products.

When designing the overall architecture, the team investigated which languages would fit the needs of the platform. Go was compared to Python, Ruby, Node, PHP and Java. While each language has its strengths, Go was the best fit for the platform architecture. Go's focus on concurrency and API support, together with its design as a static, compiled language, made it easier to build distributed event processing systems that could scale. In addition, Go's relatively simple syntax made it easy to get started and write working code, which promised immediate benefits for a team going through such a big technological transition. Overall, the team concluded that Go was the language best suited to usability and efficiency in a distributed cloud system.

Three years later: Did Go meet these ambitious goals?

Some elements of the platform design aligned well with Go. Failing fast was a critical part of the system, since it consisted of distributed, independent services. In line with the principles of the Twelve-Factor App, applications had to start quickly and fail fast. Go's design as a static, compiled language provides fast startup times; compiler performance keeps improving and has never been a problem for development or deployment. In addition, Go's error handling design allowed applications to fail not only faster, but also smarter.
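As a rough sketch of what failing fast at startup can look like, a service can validate its configuration before doing any work and exit immediately when something is missing. The variable names ELASTIC_URL and PORT, the config type, and the loadConfig helper are hypothetical and are not taken from the Content Platform.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
)

// config holds the settings this hypothetical service needs to start.
type config struct {
	elasticURL string
	port       int
}

// loadConfig reads settings through getenv and returns an error for the
// first missing or invalid value instead of falling back to defaults.
func loadConfig(getenv func(string) string) (config, error) {
	url := getenv("ELASTIC_URL")
	if url == "" {
		return config{}, fmt.Errorf("ELASTIC_URL is not set")
	}
	port, err := strconv.Atoi(getenv("PORT"))
	if err != nil {
		return config{}, fmt.Errorf("invalid PORT: %w", err)
	}
	return config{elasticURL: url, port: port}, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		// Exit before serving any traffic so the problem is visible
		// as soon as the container starts.
		log.Fatalf("startup failed: %v", err)
	}
	log.Printf("starting on port %d, indexing to %s", cfg.port, cfg.elasticURL)
}
```

Passing the lookup function in as a parameter keeps loadConfig easy to test without touching the real environment.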

Error handling

A feature that engineers quickly notice in Go is that it has an error type rather than an exception system. In Go, all errors are values. The error type is predeclared and is an interface. An interface in Go is essentially a named collection of methods, and any custom type satisfies an interface if it has the same methods. The error type is an interface for a value that can describe itself as a string.

type error interface { Error() string } 

This gives engineers more control and functionality in error handling. By adding an Error method that returns a string to any custom type, you can create your own errors and return them, for example, the way the New function from the errors package does with the type shown below.

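    // errorString is a trivial implementation of error.
    type errorString struct {
        s string
    }

    func (e *errorString) Error() string {
        return e.s
    }

    // New returns an error that formats as the given text.
    func New(text string) error {
        return &errorString{text}
    }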

What does this mean in practice? In Go, functions can return multiple values, so if your function can fail, it will most likely return an error value. The language encourages you to check for errors explicitly where they occur (as opposed to throwing and catching an exception), so your code will usually include an "if err != nil" check. At first, this frequent error handling may seem monotonous. However, treating errors as values lets you write your own logic around them to simplify error handling. For example, in a distributed system, you can easily implement request retries by wrapping errors.
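The pattern described above can be sketched in a few lines. The names fetchArticle and notFoundError are hypothetical and exist only for illustration: a function returns a result together with an error value, and the caller checks the error where it occurs and inspects its type to decide what to do.

```go
package main

import (
	"errors"
	"fmt"
)

// notFoundError is a hypothetical custom error type. It satisfies the
// built-in error interface because it has an Error() string method.
type notFoundError struct {
	id string
}

func (e *notFoundError) Error() string {
	return "article not found: " + e.id
}

// fetchArticle is a stand-in for a real lookup. It returns the result and
// an error value side by side.
func fetchArticle(id string) (string, error) {
	articles := map[string]string{"a1": "Go at The Economist"}
	title, ok := articles[id]
	if !ok {
		return "", &notFoundError{id: id}
	}
	return title, nil
}

func main() {
	for _, id := range []string{"a1", "missing"} {
		title, err := fetchArticle(id)
		if err != nil {
			// errors.As reports whether err is, or wraps, a *notFoundError.
			var nf *notFoundError
			if errors.As(err, &nf) {
				fmt.Println("skipping:", err)
				continue
			}
			fmt.Println("unexpected failure:", err)
			return
		}
		fmt.Println("found:", title)
	}
}
```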

Network problems will always occur in such a system, whether it is sending data to other internal services or passing it to third-party tools. This example from the net package shows how you can use an error as a type to distinguish temporary network errors from permanent ones. The Economist team used a similar error wrapper to build incremental retries when sending content to external APIs.

    package net

    type Error interface {
        error
        Timeout() bool   // Is the error a timeout?
        Temporary() bool // Is the error temporary?
    }

    // Client code, inside a retry loop:
    if nerr, ok := err.(net.Error); ok && nerr.Temporary() {
        time.Sleep(1e9) // wait one second, then try again
        continue
    }
    if err != nil {
        log.Fatal(err)
    }
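The incremental retries mentioned above could look roughly like the following. This is a sketch under assumptions, not The Economist's implementation: sendWithRetry and timeoutError are hypothetical names, and the check uses Timeout() rather than Temporary(), which newer Go releases mark as deprecated.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// sendWithRetry calls send up to maxAttempts times. It retries only when
// the error is a net.Error reporting a timeout, and waits a little longer
// before each new attempt. Any other error is returned immediately.
func sendWithRetry(send func() error, maxAttempts int, baseDelay time.Duration) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = send()
		if err == nil {
			return nil
		}
		var nerr net.Error
		if !errors.As(err, &nerr) || !nerr.Timeout() {
			return err // permanent failure: do not retry
		}
		if attempt < maxAttempts {
			time.Sleep(time.Duration(attempt) * baseDelay) // incremental backoff
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

// timeoutError is a hypothetical error used only to simulate a timeout.
type timeoutError struct{}

func (timeoutError) Error() string   { return "simulated timeout" }
func (timeoutError) Timeout() bool   { return true }
func (timeoutError) Temporary() bool { return true }

func main() {
	calls := 0
	err := sendWithRetry(func() error {
		calls++
		if calls < 3 {
			return timeoutError{} // fail twice, then succeed
		}
		return nil
	}, 5, 10*time.Millisecond)
	fmt.Println("calls:", calls, "err:", err)
}
```

Because the retry decision depends only on the error value, the same helper works for any call that returns an error.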

The authors of Go believe that not all exceptions are exceptional. Engineers are encouraged to recover sensibly from errors rather than let the application crash. In addition, Go's error handling gives you finer control over errors, which can improve things such as debugging or the usability of error messages. Within the Content Platform, this design feature of Go has allowed developers to make informed decisions about errors, which has increased the reliability of the system as a whole.

Data consistency

Data consistency is a critical factor in the Content Platform. At The Economist, content is the foundation of the business, and the Content Platform's goal is to ensure that content can be published once and accessed everywhere. It is therefore important that every product and consumer sees consistent data from the Content Platform API. Products mainly use GraphQL for API requests, which requires a static schema that serves as a kind of contract between consumers and the platform. Content processed by the Platform has to conform to this schema. A statically typed language helped enforce this and made data consistency easy to achieve.

Testing with Go

Another contributor to consistency is Go's testing tooling. Fast compilation, combined with testing as a first-class part of the toolchain, allowed the team to build effective testing practices into development workflows and to fail fast in build pipelines. Go's test tooling is easy to set up and run. Running "go test" runs all the tests in the current directory, and the command has several useful flags. The -cover flag reports code coverage. The -bench flag runs benchmarks, which are identified by function names that start with "Benchmark" rather than "Test". The TestMain function provides a hook for additional test setup, such as starting a mock authentication server.

In addition, Go supports table-driven tests with anonymous structs and stubs built on interfaces, which improves test coverage. Although testing is nothing new as a language feature, Go makes it easy to write reliable tests and integrate them into workflows. From the very beginning, The Economist's engineers were able to run tests as part of build pipelines without special configuration, and even added Git hooks to run tests before pushing code to GitHub.
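A table-driven test with an anonymous struct and an interface stub might look like the sketch below. The publisher interface, stubPublisher, and publishAll are hypothetical names invented for this example. In a real project TestPublishAll would live in a _test.go file and run with "go test"; it is shown next to the code under test only to keep the sketch in one file.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// errStub is the failure the stub returns when a test asks for one.
var errStub = errors.New("stub failure")

// publisher is a hypothetical interface for sending content to a channel.
type publisher interface {
	Publish(title string) error
}

// stubPublisher satisfies publisher without making any network calls.
type stubPublisher struct {
	err   error
	calls int
}

func (s *stubPublisher) Publish(title string) error {
	s.calls++
	return s.err
}

// publishAll is the code under test. It stops at the first failure and
// reports how many items were sent.
func publishAll(p publisher, titles []string) (int, error) {
	for i, t := range titles {
		if err := p.Publish(t); err != nil {
			return i, fmt.Errorf("publish %q: %w", t, err)
		}
	}
	return len(titles), nil
}

// TestPublishAll lists its cases in a slice of anonymous structs and runs
// each one as a named subtest.
func TestPublishAll(t *testing.T) {
	tests := []struct {
		name     string
		stubErr  error
		titles   []string
		wantSent int
		wantErr  bool
	}{
		{"all succeed", nil, []string{"a", "b"}, 2, false},
		{"first fails", errStub, []string{"a", "b"}, 0, true},
		{"nothing to send", nil, nil, 0, false},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			sent, err := publishAll(&stubPublisher{err: tc.stubErr}, tc.titles)
			if sent != tc.wantSent || (err != nil) != tc.wantErr {
				t.Errorf("got (%d, %v), want (%d, error=%v)", sent, err, tc.wantSent, tc.wantErr)
			}
		})
	}
}

func main() {
	sent, err := publishAll(&stubPublisher{}, []string{"a", "b"})
	fmt.Println(sent, err)
}
```

Adding a new case is a one-line change to the table, which is a large part of why this style tends to raise coverage.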

However, achieving data consistency was not without its struggles. The first major challenge for the platform was managing dynamic content from unpredictable backends. The platform consumes content from source CMS systems mainly through JSON endpoints, where the structure and data types are not guaranteed. This meant that the platform could not use the standard encoding/json package as is: it supports unmarshalling JSON into structs, but returns an error if the types of the struct fields and the incoming data do not match.

To overcome this problem, a custom method was needed to map the backends to a standard format. After several iterations on the approach, the team implemented its own unmarshalling process. Although this approach amounted to reworking part of the standard library package, it gave engineers complete control over how the source data was handled.
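The team's own unmarshalling code is not shown in the article. One common way to tolerate inconsistent types with the standard library is a custom UnmarshalJSON method, sketched below. The flexInt type, the article struct, and the wordCount field are hypothetical and chosen only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// flexInt is a hypothetical type that accepts a JSON number or a numeric
// string, since a source CMS may send either.
type flexInt int

// UnmarshalJSON implements json.Unmarshaler.
func (f *flexInt) UnmarshalJSON(data []byte) error {
	var n int
	if err := json.Unmarshal(data, &n); err == nil {
		*f = flexInt(n)
		return nil
	}
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return fmt.Errorf("flexInt: unsupported value %s", data)
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return fmt.Errorf("flexInt: %q is not a number", s)
	}
	*f = flexInt(n)
	return nil
}

// article is a hypothetical normalized shape for incoming content.
type article struct {
	Title     string  `json:"title"`
	WordCount flexInt `json:"wordCount"`
}

// parseArticle decodes one payload into the normalized shape.
func parseArticle(raw []byte) (article, error) {
	var a article
	err := json.Unmarshal(raw, &a)
	return a, err
}

func main() {
	inputs := []string{
		`{"title": "A", "wordCount": 1200}`,
		`{"title": "B", "wordCount": "950"}`,
	}
	for _, in := range inputs {
		a, err := parseArticle([]byte(in))
		fmt.Println(a.Title, a.WordCount, err)
	}
}
```

This keeps the tolerance for messy input in one type, so the rest of the code can work with plain, well-typed structs.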

Network support

Scalability was at the forefront of the new platform, and the standard Go libraries for networking and APIs supported it. In Go, you can quickly implement scalable HTTP endpoints without a framework. In the example below, the standard net/http package is used to set up a handler that takes a response writer and a request. When the Content Platform API was first implemented, it used an API framework. That framework was eventually replaced by the standard library, as the team recognized that the standard library met all their networking needs without additional, unnecessary trade-offs. Go HTTP handlers scale because each request is handled concurrently in its own goroutine, a lightweight thread, with no extra configuration.

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    // handler writes a fixed greeting to every request.
    func handler(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Hello World!")
    }

    func main() {
        http.HandleFunc("/", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

Concurrency model

The Go concurrency model brought multiple performance benefits across the platform. Working with distributed data means wrestling with the guarantees promised to consumers. According to the CAP theorem, a distributed system cannot provide more than two of the following three guarantees at the same time: consistency, availability, and partition tolerance. The Economist accepted eventual consistency, which means that reads from data sources will eventually be consistent, and moderate delays before all data sources reach a consistent state are acceptable. One way to minimize this gap is to use goroutines.

Goroutines are lightweight threads managed by the Go runtime, which prevents the application from running out of threads. Goroutines made it possible to optimize asynchronous tasks on the platform. For example, one of the Platform's data stores is Elasticsearch. When content is updated in the system, the content in Elasticsearch that references that item is also updated and reindexed. Introducing goroutines reduced the processing time, so items became consistent quickly. This example shows how items eligible for reprocessing are reprocessed in a goroutine.

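    // reprocess re-runs processing for every hit in an Elasticsearch result,
    // one goroutine per hit. The *elastic.SearchResult parameter type (from
    // the olivere/elastic client) is an assumption; response and
    // reprocessItem are defined elsewhere in the service.
    func reprocess(searchResult *elastic.SearchResult) (int, error) {
        hits := searchResult.Hits.Hits
        responses := make([]response, len(hits))
        var wg sync.WaitGroup
        for i, hit := range hits {
            wg.Add(1) // one count per goroutine started
            go func(i int, item elastic.SearchHit) {
                defer wg.Done()
                code, err := reprocessItem(item)
                responses[i].code = code
                responses[i].err = err
            }(i, *hit)
        }
        wg.Wait() // block until every goroutine has called Done
        return http.StatusOK, nil
    }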
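For readers who want to run the pattern, here is a self-contained sketch of the same fan-out with sync.WaitGroup. It replaces the Elasticsearch types with plain strings; result, reprocessItem, and reprocessAll are stand-ins invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// result records the outcome of reprocessing one item.
type result struct {
	id  string
	err error
}

// reprocessItem is a stand-in for the real work, such as re-indexing a
// document. It fails for an empty id so the example has an error case.
func reprocessItem(id string) error {
	if id == "" {
		return fmt.Errorf("empty item id")
	}
	return nil
}

// reprocessAll runs reprocessItem for every id in its own goroutine and
// waits for all of them. Each goroutine writes only to its own slice
// element, so no mutex is needed.
func reprocessAll(ids []string) []result {
	results := make([]result, len(ids))
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			results[i] = result{id: id, err: reprocessItem(id)}
		}(i, id)
	}
	wg.Wait()
	return results
}

func main() {
	for _, r := range reprocessAll([]string{"a1", "", "a3"}) {
		fmt.Printf("%q: %v\n", r.id, r.err)
	}
}
```

Returning the per-item results lets the caller decide whether a partial failure should be retried or reported.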

System design is more than just programming. Engineers need to understand what tools are appropriate where and when. While Go was a powerful tool for most of The Economist's Content Platform, some restrictions required other solutions.

Dependency management

When Go was first released, it had no dependency management system. Within the community, several tools were developed to address this need. The Economist used Git submodules, which made sense at a time when the community was still actively working toward a standard dependency management tool. Today, although the community is much closer to a consistent approach to managing dependencies, it is not there yet. At The Economist, the submodule approach did not cause serious problems, but it has been difficult for other Go developers, and this should be taken into account when switching to Go.

There were also platform requirements for which Go's features or design were not the best fit. When the Platform added audio processing support, Go tooling for metadata extraction was limited, so the team chose the external ExifTool utility instead. Platform services run in Docker containers, which made it possible to install ExifTool and invoke it from the Go application.

    // runExif runs the exiftool binary installed in the container and
    // returns its standard output.
    func runExif(args []string) ([]byte, error) {
        cmdOut, err := exec.Command("exiftool", args...).Output()
        if err != nil {
            return nil, err
        }
        return cmdOut, nil
    }

Another common scenario for the platform is to accept broken HTML from source CMS systems, check it for correctness, and clean it up. Initially, Go was used for this process, but since the standard Go HTML library expects valid HTML, a large amount of custom code was needed to parse the HTML before processing. This code quickly became fragile and missed edge cases, so a new solution was implemented in JavaScript. JavaScript provided greater flexibility and adaptability for managing the HTML validation and clean-up process.

JavaScript was also a common choice for filtering and routing events in the Platform. Events are filtered using AWS Lambda functions, which are lightweight functions that run only when invoked. One use case is to sort events into different lanes, such as fast and slow. This filtering is based on a single metadata field in the event's JSON object. The implementation used a JavaScript JSON pointer package to pick one element out of the JSON object. This approach was much more efficient than the full JSON unmarshalling that Go would have required. While this kind of functionality could also be achieved with Go, using JavaScript was easier for engineers and resulted in simpler Lambda functions.
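For comparison, here is one way the same lane decision might be sketched in Go with the standard library, by declaring a struct that contains only the routing field. This is an illustration, not the team's code: the metadata.priority field name and the lane function are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventHeader declares only the field needed for routing. encoding/json
// ignores every other key in the payload.
type eventHeader struct {
	Metadata struct {
		Priority string `json:"priority"`
	} `json:"metadata"`
}

// lane returns "fast" or "slow" for an event. The field name
// metadata.priority and the lane names are assumptions for illustration.
func lane(event []byte) (string, error) {
	var h eventHeader
	if err := json.Unmarshal(event, &h); err != nil {
		return "", fmt.Errorf("decode event: %w", err)
	}
	if h.Metadata.Priority == "high" {
		return "fast", nil
	}
	return "slow", nil
}

func main() {
	event := []byte(`{"metadata": {"priority": "high"}, "body": {"title": "ignored"}}`)
	name, err := lane(event)
	fmt.Println(name, err)
}
```

The decoder still has to scan the whole payload, so this saves type definitions and allocations rather than parsing work, which is consistent with the team finding a pointer lookup in JavaScript simpler for this job.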

Go retrospective

After implementing the Content Platform and supporting it in production, if I were to conduct a retrospective on Go and the Content Platform, my review would be as follows:

What is good?


What can be improved?


Overall, it was a positive experience, and Go is one of the most important elements that allowed the Content Platform to scale. Go is not always the right tool, and that's fine. The Economist has a polyglot platform and uses different languages where it makes sense. Go will probably never be the first choice for manipulating text blobs and dynamic content, so JavaScript stays in the toolbox. However, Go's strengths are the foundation that allows the system to scale and grow.
When considering whether Go is right for you, consider key system design issues:


If you are developing a system that addresses distributed data problems, asynchronous workflows, and high performance and scalability, I recommend that you consider Go and its capabilities to speed up the achievement of your system goals.

Friends, we look forward to your comments, and we invite everyone to the open webinar, which will be held on the 16th by Dmitry Small, a senior developer at Yandex and also one of our instructors.

Source: https://habr.com/ru/post/451760/

