This is the first post in a series about the new dynamic HTTP router behind GOV.UK. It explains our motivation, describes the design, and sums up what we learned along the way.
Why we did it
GOV.UK is a single government domain that provides services and information for hundreds of government departments in one place. It deliberately hides the sprawl of departments and agencies behind the simple delivery of services. Behind the facade, of course, GOV.UK is not one huge monolithic program but a collection of small applications, each designed to do one thing well,
in the spirit of UNIX. To present these applications as a single website, we need technology that can forward user requests to the right service: an HTTP router.
When we started, in October 2012, routing was handled by three identically configured
Varnish instances sitting in front of the whole server farm. Varnish deserves the highest praise, but we had probably reached the limits of its capabilities. It is fair to say the
VCL configuration scripts could not be looked at without a shudder:
if (req.url ~ "^/autocomplete(\?.*)?$|^/preload-autocomplete(\?.*)?$|^/sitemap[^/]*.xml(\?.*)?$") {
  <%= set_backend('search') %>
} else if (req.url ~ "^/when-do-the-clocks-change([/?.].*)?$|^/bank-holidays([/?.].*)?$|^/gwyliau-banc([/?.].*)?$") {
  <%= set_backend('calendars') %>
} else if (req.url ~ "^/(<%= @smartanswers.join("|") %>)([/?.].*)?$") {
  <%= set_backend('smartanswers') %>
} else if (req.url ~ "^/child-benefit-tax-calculator([/?.].*)?$") {
  <%= set_backend('calculators') %>
} else if (req.url ~ "^/stylesheets|^/javascripts|^/images|^/templates|^/favicon\.ico(\?.*)?$|^/humans\.txt(\?.*)?$|^/robots\.txt(\?.*)?$|^/fonts|^/google[a-f0-9]{16}\.html(\?.*)?$|^/apple-touch(.*)?\.png$") {
  <%= set_backend('static') %>
} else if (req.url ~ "^/(designprinciples|service-manual|transformation)([/?.].*)?$") {
  <%= set_backend('designprinciples') %>
...
} else {
  <%= set_backend('frontend') %>
}
Even setting aside the regex-laden conditions that make any battle-hardened sysadmin's eyes water, what is wrong with this arrangement?
The main problems that forced us to rethink the HTTP router were:
1. Maintainability: keeping the list of all routes in one file requires frequent and unwelcome ceremony. When an application needs to change the URLs it is responsible for, we have to update the VCL scripts in lockstep with the application deployment, an operation requiring careful coordination. Worse, the need to edit a single file full of arcane syntax and ad-hoc hacks adds risk to every URL change, which slows us down significantly.
2. Performance: Varnish stoically tolerates our abuse of its configuration language, but the final lines of that VCL script give us an unrelenting headache.
The performance concern is real. GOV.UK serves tens of thousands of URLs, and we never attempted to enumerate all of them in the Varnish configuration. Fortunately, most of those URLs are served by just two applications, frontend and whitehall, which respectively give citizens the most important thing, content (such as
browse pages) and
government corporate publishing pages. These applications in turn fetch content via
contentAPI, the internal interface to the database that stores GOV.UK's canonical content. This means that a single request hitting the front of the stack, after running the gauntlet of VCL conditions, can generate the following requests behind the scenes:
1. Varnish -> frontend
2. frontend -> contentAPI ("do you have a page at this path, in a format I can render?")
3. If frontend does not own the content, it returns a 404 and our nginx web server tries its luck with the whitehall application.
4. whitehall -> contentAPI
Dizzying, isn't it? Even if every request is processed quickly (and "quickly", as you may have noticed, is not all that quick in a Rails application), we are still looking at a minimum of roughly 200ms to return a 404, or 150ms to return any whitehall page. Perhaps most annoying of all, every successful request to whitehall costs us two requests to application servers that return 404. Each application server has a limited number of worker processes (we run
unicorn in front of our Ruby applications), so until the doomed 404 request grinds to its conclusion, no other request can be processed by that worker.
Something had to change.
A prototype of the new router.
In April I spent a few days figuring out what an improved router should actually improve. I decided to use
Go. The simplicity of the language and the guarantees offered by the Go compiler fit well with the core components of our HTTP infrastructure, and a few quick experiments with the wonderful
net/http package confirmed the decision. In particular, Go's concurrency model makes it almost ridiculously easy to build high-performance I/O-bound applications, which a properly built router certainly is.
The first question was how to store and look up entries in the route table. GOV.UK has
clean URLs that reflect the logical structure of the site. (For example, the Department of Health's home page lives at /government/organisations/department-of-health, a sub-page of the list of departments and agencies at /government/organisations.) Given the tree structure of the URLs and the essentially prefix-based routing (everything under /government, for instance, is served by one application), the natural choice of data structure was a prefix tree, or trie.
Implementing a prefix tree in Go turned out to be trivial. The result (which, like everything mentioned in this post, is available on
GitHub) is a data structure that maps slices of path segments ([]string{"government", "organisations"}) to an arbitrary value (interface{}, in Go terms). The language's built-in testing support made the process particularly pleasant. Even though this was a prototype, writing tests required little extra effort: the roughly 80 lines of the trie package are covered by no more than 200 lines of
data-driven tests (DDT).
HTTP support
The next step was to use the prefix tree as a routing table. Go ships with a distinctive (and, arguably, beautifully designed) HTTP library, net/http, built around the concept of a handler, http.Handler. The http.Handler type is an interface. There is no space here to dig into Go's type system and the place of interfaces within it, but it is fair to say: if you can implement the ServeHTTP(http.ResponseWriter, *http.Request) method on your type, then that type can be used as an http.Handler.
This is precisely the purpose of the triemux package. Mux, short for multiplexer, is the term Go uses for a component that receives requests and routes them onwards for further processing based on their properties (for example, the URL). In other words, an HTTP router. Because triemux satisfies http.Handler, it is an HTTP router and can be used interchangeably with the ServeMux from the Go standard library. Ours adds some safety for concurrent access to the routing table (a
read-write lock), which allows the table to be updated dynamically without interrupting the handling of in-flight requests.
Part of the elegance of the ubiquitous http.Handler pattern is that the mux itself is an http.Handler: nothing more than a way of directing traffic to other http.Handler values. triemux makes no assumptions about the design of those handlers; it does not care where it is switching to. This is where the router package comes in.
Loading routes dynamically
To really solve the problems outlined at the start of this post, we need to load routes from some kind of storage that applications can update when they are deployed. We use
MongoDB under the hood of GOV.UK, and the router package is the link between triemux and the Mongo database. Routes are loaded into memory at startup and traffic is forwarded to one of the backends (also defined in the database) via the
reverse proxy built into Go's standard library. This arrangement brings a number of benefits. We can load routes dynamically when applications are deployed and
replace the route table atomically, without dropping any requests in the process. And if something goes wrong while routes are being reloaded (for example, the conversation with Mongo breaks down), we can register a
deferred recovery procedure that guarantees uninterrupted routing.
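The reload mechanism can be sketched like this. All names here are illustrative, not the real router package's API, and a simple map stands in for the full routing table; the point is the combination of an atomic table swap under a lock with Go's defer/recover, so that a failed load leaves the old table serving traffic.

```go
package main

import (
	"fmt"
	"sync"
)

// table stands in for a fully built routing table (path -> backend).
type table map[string]string

// loader stands in for reading routes from storage (MongoDB in the
// real system); it may panic if the store is unreachable.
type loader func() table

type router struct {
	mu sync.RWMutex
	t  table
}

// ReloadRoutes builds a new table and swaps it in atomically. The
// deferred recover ensures that if loading panics, the old table
// stays in place and routing continues uninterrupted.
func (r *router) ReloadRoutes(load loader) {
	defer func() {
		if err := recover(); err != nil {
			fmt.Println("reload failed, keeping old routes:", err)
		}
	}()
	newTable := load() // may panic; nothing is mutated until it succeeds
	r.mu.Lock()
	r.t = newTable
	r.mu.Unlock()
}

// Lookup reads the current table under a read lock.
func (r *router) Lookup(path string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	backend, ok := r.t[path]
	return backend, ok
}

func main() {
	rt := &router{t: table{"/bank-holidays": "calendars"}}
	// A failing reload (e.g. Mongo unreachable) leaves routes intact.
	rt.ReloadRoutes(func() table { panic("mongo unreachable") })
	fmt.Println(rt.Lookup("/bank-holidays")) // calendars true
}
```

The key property is that the new table is built completely before the lock is taken, so in-flight requests only ever see the old table or the new one, never a half-built intermediate state.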
The router in action
By the time I started working on the new router for GOV.UK, I had written almost no working Go. Even so, the whole thing took only two and a half days, and the result outperformed our existing production pipeline by several orders of magnitude. (In truth, I failed to measure the responsiveness of the router itself: such measurements were dominated by the test servers behind the router, not by the router.)
I have since taken to calling this the "unreasonable effectiveness of Go" (with apologies to Eugene Wigner). Go is a compact language that fits entirely in my head, which is largely what allowed me to become productive in such a short time. But the small size of the language belies its expressiveness, the quality of its standard library, and the striking ease with which fairly complex things are assembled from simple parts (in our case, trie -> triemux -> router).
I can say with confidence that this was one of the most enjoyable pieces of work I have done. But the path from a working prototype to a production build is a long one, especially for a component as heavily loaded as GOV.UK's router. My colleagues now have the far harder job of testing and deploying the new router in front of a
national resource.
Original article