Centrifugo - 3.5 million rpm

I last wrote about Centrifugo a little over a year ago. It is time to remind you that the project exists and to tell you what has happened since then. To keep the article from sliding into a boring list of changes, I will focus on some Go libraries that helped me during development; perhaps you will find something useful for yourself.

The best part is that over this year a decent number of projects started using Centrifugo in production, and every such story is very inspiring. Currently, the largest Centrifugo installation that I know about looks like this:

300 thousand users online
3.5 million fan-out messages per minute
4 Centrifugo nodes on Amazon c4.xlarge
nodes are connected through the PUB/SUB mechanism of a single Redis instance
average CPU usage of 40%

By tradition, I should remind you what this is all about. I will make my life easier and quote the previous post:

Centrifugo is a server that runs next to the backend of your application (the backend can be written in any language or framework). Application users connect to Centrifugo using the Websocket protocol or the SockJS polyfill library. Having connected and authenticated with an HMAC token (received from the application backend), they subscribe to the channels they are interested in. When the backend learns about a new event, it publishes it to the desired channel in Centrifugo using the HTTP API or a queue in Redis. Centrifugo, in turn, instantly sends the message to all connected users who are subscribed to that channel.
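The token step in the quote above can be sketched in a few lines of Go. This is only an illustration: it assumes the token is a hex-encoded HMAC-SHA256 over the user ID, a timestamp and an optional info string, signed with a secret shared between the backend and the server. The function names generateToken and checkToken are hypothetical; check the Centrifugo documentation for the exact scheme.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// generateToken signs the connection parameters with the shared secret.
// Assumption of this sketch: HMAC-SHA256 over user ID, timestamp and info,
// hex-encoded. The application backend would call this.
func generateToken(secret, user, timestamp, info string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(user))
	mac.Write([]byte(timestamp))
	mac.Write([]byte(info))
	return hex.EncodeToString(mac.Sum(nil))
}

// checkToken recomputes the token on the server side and compares the two
// values in constant time.
func checkToken(secret, user, timestamp, info, token string) bool {
	expected := generateToken(secret, user, timestamp, info)
	return hmac.Equal([]byte(expected), []byte(token))
}

func main() {
	token := generateToken("secret", "42", "1490000000", "")
	fmt.Println(token, checkToken("secret", "42", "1490000000", "", token))
}
```

The point of the design is that the backend never has to talk to the server while a user connects: the browser simply carries a value that only a holder of the secret could have produced.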

A year ago, the latest version was 1.4.2, and now it is already 1.7.3 - a lot of work has been done.
Last year, Centrifugo received HTTP/2 support. It cost me a lot of effort and many hours of work. Just kidding :) With the Go 1.6 release, Go projects got HTTP/2 support automatically. For Centrifugo, where the main transport is still Websocket, HTTP/2 support may seem useless. However, this is not entirely true: Centrifugo is also a SockJS server. SockJS provides a fallback to transports that use plain HTTP (Eventsource, XHR-streaming, etc.) in case the browser cannot establish a Websocket connection for some reason, or in case you simply do not want to use Websocket. For many years we struggled with the limit on persistent connections to a single host that comes from the HTTP specification (in practice, 5-6 depending on the browser). Now, thanks to HTTP/2, connections from different browser tabs are multiplexed into one, so you can open many tabs with persistent HTTP connections. So go figure which transport is now better for a mostly unidirectional flow of real-time messages from the server to the client: Websocket or something like Eventsource over HTTP/2.

A little later, another interesting HTTP server feature appeared: support for automatically obtaining an HTTPS certificate from Let's Encrypt. Again, I would like to say that I had to sweat, but no: thanks to the golang.org/x/crypto/acme/autocert package, writing a server that works with Let's Encrypt is a matter of a few lines of code:

    manager := autocert.Manager{
        Prompt:     autocert.AcceptTOS,
        HostPolicy: autocert.HostWhitelist("example.org"),
    }
    server := &http.Server{
        Addr:      ":https",
        TLSConfig: &tls.Config{GetCertificate: manager.GetCertificate},
    }
    server.ListenAndServeTLS("", "")

In version 1.5.0, an important change was that protobuf, rather than JSON, began to fly between Centrifugo and Redis. To work with protobuf I took the github.com/gogo/protobuf library: thanks to code generation and to not using the reflect package, the speed of serialization and deserialization is simply crazy, especially in comparison with JSON:

    BenchmarkMsgMarshalJSON             2022 ns/op    432 B/op    5 allocs/op
    BenchmarkMsgMarshalGogoprotobuf      124 ns/op     48 B/op    1 allocs/op
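Numbers in this form come from Go's standard benchmarking tooling. As a rough sketch, the JSON half of such a comparison could be measured like this; the msg type and its contents are hypothetical stand-ins, not the actual Centrifugo benchmark, so the figures it prints will differ from the ones above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// msg is a hypothetical stand-in for the server's message type.
type msg struct {
	UID     string          `json:"uid"`
	Channel string          `json:"channel"`
	Data    json.RawMessage `json:"data"`
}

// benchmarkMarshalJSON measures encoding/json on a small message and
// reports time and allocations per operation.
func benchmarkMarshalJSON(b *testing.B) {
	m := msg{UID: "test-uid", Channel: "test-channel", Data: json.RawMessage(`{"input":"hello"}`)}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := json.Marshal(&m); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	res := testing.Benchmark(benchmarkMarshalJSON)
	fmt.Println(res.String(), res.MemString())
}
```

In a real project the same function would normally live in a _test.go file as BenchmarkXxx and be run with go test -bench, next to an equivalent benchmark for the generated protobuf Marshal method.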

Initially, protobuf version 2 was used, but a little later it was possible to upgrade to the current version 3.

The graph shows that version 1.5, which uses protobuf, spends much less time processing a request. The more complex the request and the more data it contains (the more channels a message has to be published to), the more noticeable the difference.

To add protobuf support to your Go project, just write a proto file like this:

    syntax = "proto3";
    package proto;

    import "github.com/gogo/protobuf/gogoproto/gogo.proto";

    option (gogoproto.equal_all) = true;
    option (gogoproto.populate_all) = true;
    option (gogoproto.testgen_all) = true;

    message Message {
        string UID = 1 [(gogoproto.jsontag) = "uid"];
        string Channel = 2 [(gogoproto.jsontag) = "channel"];
        bytes Data = 3 [(gogoproto.customtype) = "github.com/centrifugal/centrifugo/libcentrifugo/raw.Raw", (gogoproto.jsontag) = "data", (gogoproto.nullable) = false];
    }

As you can see, a proto file can use not only the basic types but also your own custom types.

Once the proto file is written, all that remains is to run protoc (you can download it from the releases page) on this file with one of the code generators provided by the gogoprotobuf library. The result is a generated file with all the serialization and deserialization methods for the described structures. If you want to read about this in more detail, there is an article on the subject (in English).

I also experimented a lot with alternative JSON parsers for deserializing incoming API messages: ffjson, easyjson, gjson and jsonparser. jsonparser showed the best performance: it really does speed up JSON parsing by the claimed factor of 10 and allocates practically no memory. However, I did not dare to add it to Centrifugo: until JSON parsing becomes a bottleneck, I do not want to move away from the standard library. Still, it is nice to know that the performance of JSON parsing can be improved so significantly.

JSON is also used to communicate with clients. In some particularly hot paths (for example, new messages in a channel) I build JSON manually rather than with the Marshal function. It looks like this:

    func writeMessage(buf *bytebufferpool.ByteBuffer, msg *Message) {
        buf.WriteString(`{"uid":"`)
        buf.WriteString(msg.UID)
        buf.WriteString(`",`)
        buf.WriteString(`"channel":`)
        EncodeJSONString(buf, msg.Channel, true)
        buf.WriteString(`,"data":`)
        buf.Write(msg.Data)
        buf.WriteString(`}`)
    }

This uses the github.com/valyala/bytebufferpool library, which provides a pool of []byte buffers to further reduce the number of memory allocations.

I also recommend the wonderful github.com/nats-io/nuid library. In Centrifugo each message gets a unique id, and this library from the developers of Nats.io generates unique identifiers very quickly. Keep in mind, though, that you should only use it where you are not afraid that an attacker will be able to calculate the next id from an existing one. In many places, however, this library can be a good replacement for uuid.
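For the places where that caveat does matter (identifiers that must not be guessable), a slower but unpredictable identifier can be drawn from the operating system's random source. The sketch below uses only the standard library; unpredictableID is a hypothetical helper and is not part of nuid or Centrifugo.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// unpredictableID returns 16 bytes from the cryptographically secure random
// source, hex-encoded. Every call reads fresh randomness, which is slower
// than a sequence-based generator but gives an observer nothing to
// extrapolate from.
func unpredictableID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	id, err := unpredictableID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```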

Version 1.6.0 was the result of a complete refactoring of the server code, which I worked on for three months; this is where I really had to sweat. I still work on Centrifugo in my spare time, so those 3 months are not really that much in terms of actual hours. But still.

The result of the refactoring was that the code was split into small packages with a clear public API and clear interaction between them; before that, most of the code lay in one folder. It also became possible to replace certain parts of the server at the initialization stage. Now that almost half a year has passed, I will not say that splitting the code into small packages had any significant impact or brought tangible benefits afterwards. No, nothing like that. But it most likely makes the code easier to read for other programmers who have not been familiar with the project from its first days.

During the refactoring I managed to significantly improve some parts of the code, for example metrics, which now somewhat resemble the way metrics are added in the Prometheus client for Go.

Centrifugo uses the github.com/spf13/viper package for configuration. It is one of the best application configuration libraries I have worked with: with minimal effort from the programmer, the application can be configured through environment variables, command-line flags and a settings file (in popular formats such as YAML, JSON, TOML, etc.). On top of that, viper works together with github.com/spf13/cobra, one of the most convenient packages for building CLI utilities. But there is one big BUT! Viper pulls in an unreasonable number of external dependencies, some of which pull in their own, and most of them are not used in Centrifugo at all: remote configuration (Consul, Etcd), support for the afero file system, fsnotify (who really needs a server application to restart automatically when the config changes on disk?). The HCL and Java properties configuration formats are not needed either. So I had to fork viper and make my own “lite” version without the dependencies I do not need. This is not the best option, really: I would like viper to support plugins, so that library users could decide at the initialization stage which pieces of functionality they need.

Version 1.6 added Redis sharding by channel name to distribute the load among several Redis instances. I was always bothered by the limitation of a single Redis instance: although Redis is extremely fast on the operations that Centrifugo uses, I still wanted a way to scale this point. Now, with sharding, instead of the following scheme:

We get this:

Unfortunately, there is no resharding, but in the case of Centrifugo resharding is not that important: the message delivery model is at-most-once anyway, and thanks to the way Centrifugo works, the state restores itself after a while. Internally, a fast sharding algorithm with a negligible memory footprint called Jump is used; the code comes from the github.com/dgryski/go-jump library. More recently, a story of successfully using sharding in production has appeared: a Mesos environment with three shards. I myself, however, have not yet had a chance to use sharding in any of my own projects.

You may know that Centrifugo has a web interface written in ReactJS. This interface lives in a separate repository and is embedded into the server at the build stage, so the binary includes all the static files needed for the web interface, and Go's built-in FileServer makes it easy to serve them at the desired address. Initially I used github.com/jteeuwen/go-bindata together with github.com/elazarl/go-bindata-assetfs for this. Later, however, I came across the github.com/rakyll/statik library, which is lighter and simpler in my opinion, from Jaana B. Dogan, who is well known in the Go community.

Finally, the last server change I would like to mention is the integration with the PreparedMessage structure from the Gorilla Websocket library. PreparedMessage appeared in Gorilla Websocket only recently. The essence of this structure is that it caches the created websocket frame so that it can be reused instead of being built every time. In the case of Centrifugo, where there can be thousands of users in a channel and the same message is sent to every connection, this makes sense with a sufficiently large number of users (in my benchmarks, the gain appeared when there were more than 20k clients in one channel). But it makes even more sense when Websocket traffic compression is enabled. For the Websocket protocol this is the permessage-deflate extension, which compresses traffic using flate compression. In Go, the flate.Writer structure weighs more than 600 KB (!), so with a large fan-out of messages (regardless of the number of clients in a channel) PreparedMessage helps a lot.

A bit of a sore point is clients for mobile devices. Since I do not know Objective-C/Swift or Java well enough, I cannot help with the development of the mobile clients that let iOS and Android devices connect to Centrifugo. These clients were written by members of the open-source community, for which I am immensely grateful to them. However, having written the clients, the authors by and large lost interest in maintaining them, and some features are still missing. Still, these are working clients that have proved that Centrifugo can be used from mobile devices.

This situation cannot but upset me, so for my part I tried to write a client in Go and use Gomobile to generate client bindings for iOS (Objective-C/Swift) and Android (Java). Well, in general, I managed it: github.com/centrifugal/centrifuge-mobile. It was fascinating. The most difficult part was trying the resulting bindings in practice: for this I had to master XCode and Android Studio, and also write small examples of using the Centrifugo Websocket client in all three languages: Objective-C, Swift and Java. I wrote an article about the features of Gomobile; maybe someone will be interested in the details.

Among the shortcomings of gomobile, I would point out not so much the strict restrictions on supported types (which are actually quite possible to live with) as the fact that Go does not generate LLVM bitcode, which Apple recommends adding to every application. In theory, this bitcode allows Apple to optimize applications in the App Store on its own. Currently, when building an iOS application, you can disable bitcode in the project settings in XCode, but what happens if Apple decides to make it mandatory? Unclear. And the lack of control over the situation is a little sad.

The most amazing thing for me is that I learned about this only when the code of my library was ready and tested on an Android device, and I was absolutely sure that everything would go smoothly on iOS. I found no mention of it anywhere in the gomobile documentation (or was I inattentive and overlooked it?).

That is, in general, all of the notable events. Trying Centrifugo is not difficult: there are packages for popular Linux distributions, a Docker image, binary releases and a couple of lines to install it on MacOS using brew. All useful links can be found in the README on Github.

Source: https://habr.com/ru/post/326236/
