
Tech blog: why Evernote chose Apache Thrift to build its API

When we began planning the Evernote service in 2007, we knew that from the very first day we would need to support both thin clients (such as browsers) and thick, synchronizing clients. This pushed us to think about remote protocols and client APIs before starting work on any web GUI. Otherwise, we would have had to spend several months bolting an API onto an already existing web service.

Our applications have specific requirements for the API, such as:
  1. Cross-platform. When we started in February 2008, we already had code that used Java on the server side, and Win32 C++ and Objective-C Cocoa on the client side.
  2. Compact data transfer. Evernote client applications synchronize notes, which can contain hundreds of embedded images totaling tens of megabytes. We wanted an API in which transferring a 15-megabyte note meant transferring exactly those 15 megabytes and no more.
  3. Forward/backward compatibility. Once a user has installed a version of the client on their computer, we do not want to force them to update the software every time we extend our data model.
  4. Language bindings. We did not want to hand-write a pile of code to parse and serialize the data structures for each client. That takes a lot of time and leads to errors, and it also makes the third requirement unachievable in practice.
  5. Based on standards and/or open source. Other things being equal, we did not want to tie our service API to proprietary technologies, for obvious reasons.
  6. Compactness. We would prefer not to add a megabyte of code and 200 classes to each of our mobile clients.
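As a rough illustration of requirement 2, the sketch below compares the size of a binary payload sent as raw bytes behind a length prefix with the same payload base64-encoded, as is commonly done when embedding binary data in XML or JSON. The 4-byte prefix and the function names are assumptions made for the example, not part of any Evernote or Thrift API.

```python
import base64
import os


def text_encoded_size(payload: bytes) -> int:
    """Size of the payload after base64 encoding (typical for XML/JSON)."""
    return len(base64.b64encode(payload))


def length_prefixed_size(payload: bytes) -> int:
    """Size of the payload behind a hypothetical 4-byte length prefix."""
    return 4 + len(payload)


if __name__ == "__main__":
    # Random bytes stand in for a 15-megabyte note with embedded images.
    note_body = os.urandom(15 * 1024 * 1024)
    print("raw bytes:          ", len(note_body))
    print("length-prefixed:    ", length_prefixed_size(note_body))
    print("base64 in XML/JSON: ", text_encoded_size(note_body))
```

Base64 emits four output characters for every three input bytes, so the text-encoded form is about a third larger before any markup is added, while the length-prefixed form adds only a constant few bytes.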

We spent a couple of months researching and testing various alternatives. XML-RPC and SOAP met some of the requirements (1, 5), while ICE from ZeroC met others (2, 4). At one point we even considered reinventing the wheel and rolling our own little ad hoc protocol.

One of our friends recommended that we look at Thrift, a framework used at Facebook that had just been open-sourced. Facebook used it internally for backend servers exchanging messages with other internal servers, where code written in different languages (for example, PHP and C++) often had to be connected. As far as we could tell, everyone else using Thrift was doing so for a similar task: communication between internal backend servers.

We were looking for something different: a framework that could be used not only for server-to-server connections, but also for large-scale client-server synchronization over the Internet. Even so, Thrift turned out to fit all of our requirements:
  1. Cross-platform. We define our data model and service operations in Thrift's Interface Definition Language, and the compiler generates client and server code for a dozen different languages.
  2. Compact data transfer. If we declare a binary field in a structure and put 1 megabyte of data into it, then exactly 1 megabyte is sent over the wire.
  3. Forward/backward compatibility. This is where Thrift is truly unmatched. With some care and an understanding of how Thrift works (which, of course, does not always come immediately), you can add structures, fields, service methods, and function parameters without breaking existing clients. Windows and Mac clients that we released 3 years ago can still sync with Evernote today.
  4. Language bindings. See point 1. At the very beginning Thrift had no Objective-C Cocoa support, so Andrew McGeachie (our one-man team for Mac client development) added it to the Thrift compiler.
  5. Based on standards and/or open source. Facebook handed the development of Thrift over to the Apache Software Foundation, which was very generous of them.
  6. Compactness. The Thrift runtime libraries and generated code were very small and straightforward. You could easily read them and understand exactly what they were doing. (Since then, admittedly, they have accumulated a fair amount of additional code, but we feel Thrift is still the most compact option compared to the alternatives.)
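The forward-compatibility behavior in point 3 follows from a general idea: every field on the wire carries a numeric identifier, so a reader can skip identifiers it does not recognize. The sketch below illustrates that idea with a deliberately simplified, hypothetical wire format. It is not Thrift's actual binary protocol, and the field ids and names are made up for the example.

```python
import struct
from typing import Dict, Set

# Hypothetical framing (NOT Thrift's real protocol): each field is a
# 2-byte field id, a 4-byte payload length, then the raw payload bytes.
# A field id of 0 marks the end of the struct.
HEADER = struct.Struct(">HI")


def encode(fields: Dict[int, bytes]) -> bytes:
    """Serialize fields as (id, length, payload) records plus a stop marker."""
    out = bytearray()
    for field_id, payload in fields.items():
        out += HEADER.pack(field_id, len(payload))
        out += payload
    out += HEADER.pack(0, 0)  # stop marker
    return bytes(out)


def decode(data: bytes, known_ids: Set[int]) -> Dict[int, bytes]:
    """Return only the fields this reader knows about; skip the rest."""
    result = {}
    offset = 0
    while True:
        field_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if field_id == 0:
            return result
        if field_id in known_ids:
            result[field_id] = data[offset:offset + length]
        offset += length  # unknown fields are skipped, not rejected


if __name__ == "__main__":
    TITLE, BODY, TAGS = 1, 2, 3  # made-up field ids
    # A newer server sends a TAGS field that an older client predates.
    message = encode({TITLE: b"Groceries", BODY: b"\x00" * 1024, TAGS: b"todo"})
    old_client_view = decode(message, known_ids={TITLE, BODY})
    print(sorted(old_client_view))        # fields the old client understood
    print(len(message) - (9 + 1024 + 4))  # framing overhead in bytes
```

In this sketch the older reader simply ignores the field it was never told about, and the binary payload travels as raw bytes with a fixed few bytes of framing per field, which is the same property described in point 2.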

In the end, we got the Evernote Service API, which lets all of our clients (and hundreds of partner applications) talk to the service through a common API using generated native code. With more than three million active users, many of whom use Evernote on several platforms, it seems to me that most of the computers and devices running Thrift are running it inside Evernote clients.

What about you?


You are about to implement an API for your web service. Should you use Thrift?

If your application has exactly the same requirements as Evernote, then Thrift may be a good choice. If you are not dealing with a complex data model with large binary structures (requirement 2), the answer is less obvious.

Web services with simpler data models tend to use less complex REST protocols that serialize data as XML or JSON. This approach makes simple operations really easy to test and execute. If I need to do a couple of things with the Twitter API, I can test them by hand from the command line with curl/wget and wire the calls into my application using printf/println/regexps and so on. This means that the barrier to entry for independent developers starting to work with this type of API is very low.

Our Thrift API places higher demands on developers, who need to understand all the details of the transport layer and the library dependencies within their applications before they can begin any testing at all. We ship code samples for different languages with our API, but it is still more laborious than using a simple REST scheme.

On the other hand, the low barrier of this type of API, with its simplified, untyped data serialization, tends to lead to compatibility problems later on (requirement 3). Our Twitter gateway uses the independent Twitter4J library to interact with the REST-based Twitter API. Last year, our gateway broke at least a couple of times because of server-side changes at Twitter and Twitter4J's subsequent misinterpretation of the XML data structures (for example, the bit width of tweet ID numbers).
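The identifier problem mentioned above can be sketched in a few lines. The source does not describe the exact failure, so this example assumes a client that stores IDs in a signed 32-bit integer and a server that starts returning larger values. The payload and function names are made up for illustration.

```python
import json

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def store_as_int32(value: int) -> int:
    """Wrap value the way a signed 32-bit field would (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def parse_tweet_id(raw_json: str) -> int:
    """Read the hypothetical "id" field as an arbitrary-precision Python int."""
    return json.loads(raw_json)["id"]


if __name__ == "__main__":
    # Made-up response: the first ID that no longer fits in 32 signed bits.
    payload = '{"id": 2147483648, "text": "hello"}'
    tweet_id = parse_tweet_id(payload)
    print("id on the wire:      ", tweet_id)
    print("fits in int32?       ", INT32_MIN <= tweet_id <= INT32_MAX)
    print("id after int32 wrap: ", store_as_int32(tweet_id))
```

Nothing in an untyped text format tells the client how wide the number may become, so the mismatch only shows up at runtime. A schema that declares the field's integer width lets the generated code on both sides agree on it in advance.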

A more formal IDL and native code generation can keep clients working reliably over the long term, so for services that care about the stability and longevity of client code, Thrift's initial complexity for developers can pay for itself later.

Source: https://habr.com/ru/post/120895/

