The series of Kubernetes success stories continues with the British startup bank Monzo. This young company belongs to the so-called "challenger banks" (yes, the term is already in the Oxford Dictionary), i.e. small banks that challenge the large, long-established players of the financial industry. What makes this possible is the active and widespread use of modern information technology at their very foundation: they abandon operations in the traditional format in favor of electronic counterparts, which substantially reduces costs (banks with this approach are also called "digital-only"). The example of Monzo, created just 2 years ago, is interesting because the bank relies on the Kubernetes platform, the Go language, and other modern open source products, well known to DevOps engineers and beyond, to pursue ambitious goals.
What is Monzo?
Monzo, originally known as Mondo, was founded in 2015 by a team from another challenger bank, Starling Bank (created a year earlier), led by Tom Blomfield, a 29-year-old entrepreneur who was already successful by then. By positioning Monzo as a "bank for those who hate [traditional] banks," the founders quickly found their audience: in early 2016 they ran the "fastest crowdfunding campaign in history," raising 1 million GBP in 96 seconds.
In August of the same year, Monzo received its first (restricted) banking licence, and in April 2017 it had the restrictions lifted, which made Monzo a "fully authorized bank" and allowed it to offer current accounts to its users. In its first truly active calendar year (2016), Monzo activated 83,700 cards and grew its staff from 16 to 71 people.
The company's end-user service is a digital bank available as a mobile application for iOS and Android, which receives real-time push notifications for all payments made with Monzo bank cards. It also stores the transaction history, including the location where each payment was made, and automatically assigns categories depending on the type of company receiving the money (users themselves improve this data through crowd-sourced suggestions).
(A more detailed description of the functionality of the application and the bank is beyond the scope of this material.)
In August 2016, analysts from IDC published a report in which they expressed confidence that Monzo would be able to "provide all the guarantees offered by any [traditional] bank and effectively compete with large transnational banks in the country."
Infrastructure and solutions
The first reasonably detailed information about Monzo's software architecture and the infrastructure behind it came from the talk "Building a Bank with Kubernetes" by Oliver Beattie, who heads the company's engineering (by that time, the backend engineering team had 10 people). The talk was given at the Kubernetes London Meetup (October 2016) and then at KubeCon 2016 (November 2016).
Treating extensibility as one of the key requirements for the bank's main application, the company's engineers chose a microservice architecture from the start. They explained that they did not want to end up with one big "bloated" application that cannot be changed "not just now, but in 10 or 20 years". This is exactly what happens at legacy banks, as Monzo calls them:
“The IT systems of many large banks are not extensible, in the sense that making changes is too expensive for them. Take, for example, the ability to freeze a card in the app at the press of a button. A friend from RBS (Royal Bank of Scotland, one of the UK's "Big Four" banks; translator's note) told me that they had considered this feature many times, but it took too much time to figure out which of the 20 IT systems would require changes, and modifications to some of them had been frozen for years. Ultimately, the idea was dropped as too expensive.”
- Jonas Huckestein, co-founder and CTO of Monzo
At the time of Oliver's talk, the company had about 150 microservices in production (Docker was used for containers). One of the problems they ran into along the way was how efficiently the application used its machines. While there were few microservices, it was enough to run a copy of every one of them on each machine, but as their number grew, this approach stopped working. The engineers then began splitting the microservices into groups (app, core, etc.) placed on different hosts, but this option did not prove efficient either: at times the app machines lacked computing power while the core machines sat idle. At this stage the Monzo engineers turned to Kubernetes, which allowed them "just to have a common pool of resources" in which the application runs and scales easily as needed. No less interesting was how this change affected IT infrastructure costs:
In this chart, red marks the cost of the former infrastructure and white the cost of the Kubernetes-managed one. (Monzo uses the AWS cloud as its underlying hardware.)
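The difference between dedicated host groups and a common pool can be shown with a small Go sketch. It is only an illustration: the `group` type, the function names, and all the numbers are hypothetical and do not come from Monzo, and the pooled figure ignores per-host fragmentation, so it is a best case rather than what a real scheduler guarantees.

```go
package main

import "fmt"

// group describes a hypothetical set of hosts dedicated to one class of
// services; CPU figures are in arbitrary units.
type group struct {
	name     string
	capacity int // total CPU available on the group's hosts
	demand   int // total CPU requested by the group's services
}

// staticShortfall sums the unmet demand when every group may only use
// its own hosts.
func staticShortfall(groups []group) int {
	short := 0
	for _, g := range groups {
		if g.demand > g.capacity {
			short += g.demand - g.capacity
		}
	}
	return short
}

// pooledShortfall treats all hosts as one shared pool. It ignores
// fragmentation across hosts, so the result is a lower bound.
func pooledShortfall(groups []group) int {
	capacity, demand := 0, 0
	for _, g := range groups {
		capacity += g.capacity
		demand += g.demand
	}
	if demand > capacity {
		return demand - capacity
	}
	return 0
}

func main() {
	groups := []group{
		{name: "app", capacity: 8, demand: 11}, // overloaded
		{name: "core", capacity: 8, demand: 3}, // mostly idle
	}
	fmt.Println("static shortfall:", staticShortfall(groups)) // 3
	fmt.Println("pooled shortfall:", pooledShortfall(groups)) // 0
}
```

With the same hardware, the static split leaves the app group short while core capacity goes unused; pooling the capacity covers the whole demand.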
The next problem was the flexibility of a complex system made up of many interconnected microservices. At first it was partially solved with RabbitMQ-based queues, but this approach was not designed with Kubernetes in mind, and the list of requirements grew considerably.
In essence, these requirements boiled down to minimal latency in responding to user requests and the highest possible chance of responding successfully. To achieve these goals, the team chose linkerd (we covered this project, and the service mesh as a class of software, in an earlier article) and Finagle.
A local linkerd instance was installed on every host running microservices. The local microservices sent their requests to it, and it in turn communicated with the other linkerd instances to decide where each request should go.
Illustration of using linkerd when a GET request is made to a Unicorn HTTP server
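The per-host proxy pattern described above can be sketched in Go. This is a rough illustration that does not use linkerd itself: `callViaLocalProxy`, `newFakeProxy`, the service name `unicorn`, and the path are all hypothetical, and the stand-in proxy only reports where it would route the request instead of actually forwarding it.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// callViaLocalProxy sends a GET to the host-local proxy at proxyURL and names
// the logical target service in the Host header, leaving routing to the proxy.
func callViaLocalProxy(client *http.Client, proxyURL, service, path string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, proxyURL+path, nil)
	if err != nil {
		return "", err
	}
	req.Host = service // a logical name, not a concrete address
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// newFakeProxy is a stand-in for the local proxy: it only reports which
// service it would route to. A real mesh proxy would pick a healthy instance
// and forward the request to it.
func newFakeProxy() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "routed %s %s to service %q", r.Method, r.URL.Path, r.Host)
	}))
}

func main() {
	proxy := newFakeProxy()
	defer proxy.Close()

	out, err := callViaLocalProxy(proxy.Client(), proxy.URL, "unicorn", "/rainbows")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point of the design is that the calling service only needs to know a logical name and the address of its local proxy; discovery, load balancing, and retries can then live in one place.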
To isolate the different zones of its infrastructure at the network level, Monzo adopted Calico and Kubernetes network policies. For example, a "super-secure" zone can be used to store full bank card numbers (most services do not need this information in plain form, so they can live in other zones).
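How label-based isolation of such a zone works can be modelled in a few lines of Go. This is a heavily reduced sketch of ingress policy semantics, not Kubernetes or Calico code: the `policy` type, the label keys, and the pod names are hypothetical, and real policies also cover namespaces, ports, and egress.

```go
package main

import "fmt"

// labels is a simplified stand-in for Kubernetes pod labels.
type labels map[string]string

// policy is a hypothetical, reduced model of an ingress network policy: it
// protects pods matching target and admits traffic only from pods matching
// allowFrom.
type policy struct {
	target    labels
	allowFrom labels
}

// matches reports whether pod carries every key/value pair in selector.
func matches(selector, pod labels) bool {
	for k, v := range selector {
		if pod[k] != v {
			return false
		}
	}
	return true
}

// ingressAllowed applies the usual rule: a pod not selected by any policy
// accepts all traffic; a selected pod accepts only what some policy admits.
func ingressAllowed(policies []policy, src, dst labels) bool {
	selected := false
	for _, p := range policies {
		if !matches(p.target, dst) {
			continue
		}
		selected = true
		if matches(p.allowFrom, src) {
			return true
		}
	}
	return !selected
}

func main() {
	policies := []policy{{
		target:    labels{"zone": "super-secure"},
		allowFrom: labels{"role": "card-processor"},
	}}
	cardStore := labels{"zone": "super-secure", "app": "card-store"}
	processor := labels{"zone": "core", "role": "card-processor"}
	feed := labels{"zone": "app", "role": "feed"}

	fmt.Println(ingressAllowed(policies, processor, cardStore)) // true
	fmt.Println(ingressAllowed(policies, feed, cardStore))      // false
	fmt.Println(ingressAllowed(policies, feed, processor))      // true: no policy selects it
}
```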
At the next KubeCon, this time the European one held in March 2017, Oliver from Monzo gave another, shorter talk about the use of Kubernetes in the company's infrastructure, saying that by then the number of microservices in production had grown to 200. Almost all of them were stateless: the only stateful component mentioned was Apache Kafka, with the remark that "we have started moving database-related technologies there more actively as well". His presentation, entitled "Processing Real Money at Monzo with Kubernetes and Linkerd," was continued by his namesake from Buoyant, Oliver Gould, whose part of the talk focused more on introducing linkerd than on any Monzo specifics.
The most recent facts about the Monzo infrastructure come from the talk "Securing your Infrastructure with CoreOS" (May 31 at CoreOS Fest 2017):
- the number of microservices at Monzo has grown to 230;
- the Apache Cassandra DBMS is used to store all data;
- Vault is used to store secrets (with the Cassandra backend), but not all of the data has been moved there yet;
- for communication with payment systems (or rather, between AWS and the co-location facility that talks to these systems directly), a WireGuard-based VPN is used, which is "excellent for container infrastructure".
Major incident on October 27
A fly in the ointment of this success story is an event that happened quite recently, at the end of last month. In short, Monzo's production environment did not work correctly for about 80 minutes because of a known bug in Kubernetes combined with a particular set of circumstances.
To Monzo's credit, Oliver published a very detailed post-mortem on the Monzo community forum, with explanations accessible even to those hearing about Kubernetes for the first time. The essence of the problem was this: 2 weeks before the incident, Monzo engineers changed the etcd cluster, increasing the number of nodes from 3 to 9 and upgrading the etcd version at the same time, and verified that it worked... However, one day a routine rollout of an updated service caused strange problems in the system that were not resolved by rolling back to the working version.
Besides a particular sequence of events in operating the infrastructure, two bugs played a role in this incident: bug #47131 in Kubernetes, which shows up with certain versions of etcd (still not closed), and bug #1219 in linkerd (closed in April with the release of linkerd 1.0). The prompt work of the company's engineers and the existing incident-response processes and mechanisms kept the damage to a minimum, but a complete outage of the banking platform for 20 minutes (and significant problems for more than an hour) is, of course, an unacceptable failure even for such a young startup. However, the community's reaction to this engineering post was a pleasant surprise: the Monzo audience responded positively and welcomed the company's openness and honesty after what had happened.
In-house projects
In Monzo's GitHub account you can find dozens of repositories, some of which are the company's own projects written in Go. In particular:
- typhon - an RPC framework for communication between microservices;
- slog - a library for structured logging (with context and arbitrary key-value pairs for each event) that forwards the records to seelog;
- phosphor - a distributed tracing system, similar to Google's Dapper and Twitter's Zipkin;
- terrors - a Go package for wrapping errors with additional information.
You can learn more about how Monzo develops in Go from the talk "Building a Bank with Go", given by Matt Heath, a distributed systems engineer at Monzo, in March of this year.
Summing up
As of October 30, Monzo had 469 thousand users, and the latest investment round, which closed in November, brought the company 71 million GBP. All this points to great prospects for the project.
In general, Monzo and Kubernetes have more in common than a similar age: both bring value to the world by making available today what many have dreamed of and what is the logical next step for their industries. The October incident, of course, does no favors to any bank's reputation, but judging by the reaction that followed, in this case it is more likely to give a very talented engineering team useful new experience than to cause a collapse of trust.
Other articles in this series