Hi, Habr! We continue to follow the topic of API design, which we took up after coming across this book in the Manning catalog. Today we decided to publish an overview article about the new graph APIs and once again invite you to think about what APIs will look like after the long, undisputed popularity of REST.
Enjoy reading!
If you have consumed an API in the past 10 years, I'm willing to bet it was a REST API. Most likely, the data was structured around resources, responses included ids pointing to related objects, and HTTP verbs told the server what to do with the information: read, write, or update (yes, I agree, this is a loose definition, not Roy Fielding's canonical REST). For quite a while, REST-style APIs have been the dominant standard in our industry.
However, REST has its problems. A client may end up over-fetching: requesting a whole resource when it needs only one or two pieces of information from it. Or the client may regularly need several objects at once but be unable to retrieve them all in a single request; this is so-called under-fetching. As for maintenance, changes to a REST API can force the client to update its entire integration so that the program matches the new API structure or response schemas.
To address these problems, fundamentally different APIs, called graph APIs, have been developed increasingly often in recent years.
What is a graph API?
A simplified definition: a graph API is an API that models data in terms of nodes and edges (objects and the relationships between them) and lets the client interact with many nodes at once within a single query. Suppose the server holds data about authors, blog posts, and comments on those posts. With a REST API, the client may need three HTTP requests to get a specific post, its author, and its comments, for example /posts/123, /authors/455, and /posts/123/comments.
In a graph API, the client formulates the call so that data from all three resources is fetched in one go. The client can also specify exactly which fields it needs, which gives it much finer control over the shape of the response.
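To make the contrast concrete, here is a minimal, self-contained Python sketch of the idea with no real API involved. The post, author, and comment data are invented for illustration, and resolve is a hypothetical helper, not part of any library. A nested selection follows edges from node to node and returns only the requested fields, so a single call covers what took three REST requests.

```python
# Hypothetical in-memory graph: post 123 by author 455 with two comments.
# IDs mirror the example URLs above; all field values are invented.
NODES = {
    "123": {"title": "Hello, graphs", "body": "...", "author": "455", "comments": ["c1", "c2"]},
    "455": {"name": "Ann", "email": "ann@example.com"},
    "c1": {"message": "First!", "from": "455"},
    "c2": {"message": "Nice post", "from": "455"},
}


def resolve(node_id, selection):
    """Return only the selected fields of a node, following edges for nested selections.

    `selection` maps a field name to None (return the plain value) or to a
    nested selection dict (follow the edge to one node or a list of nodes).
    Every resolved node also carries its own id.
    """
    node = NODES[node_id]
    result = {"id": node_id}
    for field, sub in selection.items():
        value = node[field]
        if sub is None:
            result[field] = value
        elif isinstance(value, list):  # edge to many nodes
            result[field] = [resolve(v, sub) for v in value]
        else:  # edge to a single node
            result[field] = resolve(value, sub)
    return result


# One "query" fetches the post, its author and its comments, and nothing else.
post = resolve("123", {"title": None, "author": {"name": None}, "comments": {"message": None}})
print(post)
```

The author's email and the post's body never leave the resolver, which is exactly the over-fetching that the client avoids by naming its fields.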
To see in more detail how this works, let's look at a couple of real, non-hypothetical APIs.
Case 1: Facebook Graph API
Facebook released version 1.0 of its API in 2010 and has since designed each new version with graph databases as its inspiration. There are nodes corresponding to, for example, posts and comments, and edges that connect them and indicate that a given comment belongs to a given post. This approach keeps the whole structure just as discoverable as a typical REST API while still letting the client optimize how it retrieves data. Let's take a single post as an example and look at some simple operations that can be performed on it.
First, the client fetches a post from the root of the API with a GET request, using the post's ID:
GET /<post-id>
By default, this returns most of the post's top-level fields. If the client needs only certain elements of the post, for example the caption and the creation time, it can request just those fields by listing them in a query parameter:
GET /<post-id>?fields=caption,created_time
To fetch related data, the client requests an edge, for example the post's comments:
GET /<post-id>/comments
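The three requests above differ only in their paths and in the fields parameter, so a client can build them with a pair of small helpers. This is a sketch under stated assumptions: BASE_URL points at Facebook's graph.facebook.com host but omits the API version prefix and the access token a real call would need, node_url and edge_url are hypothetical names, and POST_ID is a placeholder. The URLs are only built here, not requested.

```python
from urllib.parse import quote, urlencode

# Assumption: no version prefix (e.g. /vX.Y) and no access_token parameter.
BASE_URL = "https://graph.facebook.com"


def node_url(node_id, fields=None):
    """URL for GET /<node-id>, optionally limited to the listed fields."""
    url = f"{BASE_URL}/{node_id}"
    if fields:
        # quote() with safe="," keeps the comma-separated list readable.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",", quote_via=quote)
    return url


def edge_url(node_id, edge):
    """URL for GET /<node-id>/<edge>, e.g. a post's comments."""
    return f"{BASE_URL}/{node_id}/{edge}"


print(node_url("POST_ID"))
print(node_url("POST_ID", ["caption", "created_time"]))
print(edge_url("POST_ID", "comments"))
```

Because every object is a node, the same node_url helper would also address a comment directly by its own id.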
So far, all this resembles a REST API. The ability to request a subset of fields may be new, but on the whole the data is still treated as resources. Things get more interesting when the client composes a nested request. Here is how the client can fetch a post together with its comments:
GET /<post-id>?fields=caption,created_time,comments{id,message}
This query returns a response containing the post's creation time, its caption, and a list of comments (with only the id and message selected from each comment). In REST you could not do this: the client would have to fetch the post first and then the comments.
And what if the client needs deeper nesting?
GET /<post-id>?fields=caption,created_time,comments{id,message,from{id,name}}
This query fetches the post's comments, including the id and name of each comment's author. Consider how this would be done in REST. The client would need to request the post, request the comments, and then make a series of separate requests to fetch information about the author of each comment. That quickly adds up to a lot of HTTP calls! With a graph design, however, all of this information is condensed into one call, and that call returns only the information the client needs.
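The nested fields syntax is regular enough to generate from a data structure instead of writing it by hand. A minimal sketch, assuming a hypothetical spec format in which each item is either a field name or an (edge, nested_spec) tuple; fields_param is an illustrative helper, not part of any Facebook SDK.

```python
def fields_param(spec):
    """Serialize a nested field spec into the comma/brace syntax shown above.

    Each item in `spec` is either a plain field name or an
    (edge_name, nested_spec) tuple for an expanded edge.
    """
    parts = []
    for item in spec:
        if isinstance(item, tuple):
            name, nested = item
            # Expanded edge: name{...nested fields...}
            parts.append(f"{name}{{{fields_param(nested)}}}")
        else:
            parts.append(item)
    return ",".join(parts)


spec = ["caption", "created_time", ("comments", ["id", "message", ("from", ["id", "name"])])]
print(fields_param(spec))
```

The spec above reproduces the deepest query from this section, so the same structure can be extended or trimmed in code rather than by editing a string.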
One last point worth noting about graph design: any object fetched through an edge is itself a node and can therefore be requested directly. For example, here is how to fetch more information about a specific comment:
GET /<comment-id>
Note that the client does not need to construct URLs of the form /posts/<post-id>/comments/<comment-id>, as it might when working with a REST API. This is useful when the client does not have direct access to the id of the parent object.
The same holds when modifying data. For example, to update or delete an object (say, a comment), the client sends a PUT or DELETE request, respectively, directly to the endpoint for that object's id. To create an object, the client sends a POST to the corresponding edge of the parent node. So, to add a comment to a post, the client makes a POST request to that post's comments edge:
POST /<post-id>/comments message=This+is+a+comment
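The write operations can be sketched the same way with Python's standard urllib.request.Request, which lets us set the HTTP method explicitly. As before, this is a hedged sketch: BASE_URL omits the version prefix and access token a real call requires, the helper names and the POST_ID/COMMENT_ID placeholders are hypothetical, and the requests are built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: no version prefix and no access_token; requests are built, not sent.
BASE_URL = "https://graph.facebook.com"
FORM = {"Content-Type": "application/x-www-form-urlencoded"}


def create_comment(post_id, message):
    """POST to the post's comments edge with a form-encoded message."""
    body = urlencode({"message": message}).encode()
    return Request(f"{BASE_URL}/{post_id}/comments", data=body, headers=FORM, method="POST")


def update_node(node_id, **fields):
    """PUT new field values directly to the object's own endpoint."""
    body = urlencode(fields).encode()
    return Request(f"{BASE_URL}/{node_id}", data=body, headers=FORM, method="PUT")


def delete_node(node_id):
    """DELETE sent directly to the object's own endpoint."""
    return Request(f"{BASE_URL}/{node_id}", method="DELETE")


req = create_comment("POST_ID", "This is a comment")
print(req.get_method(), req.full_url, req.data)
```

The form-encoded body matches the message=This+is+a+comment payload in the example; actually sending a request would mean passing it to urllib.request.urlopen once authentication is added.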
Case 2: GitHub V4 GraphQL API
Another contender among graph APIs is the specification called GraphQL. It differs significantly from REST: only a single endpoint is exposed, and it accepts GET and POST requests. All interactions with the API are sent as queries written in GraphQL syntax.
In May 2017, GitHub released version 4 of its API, built on this specification. To get a feel for GraphQL, let's look at a few individual operations that can be performed on a repository.
To fetch a repository, the client writes a GraphQL query:
POST /graphql { "query": "{ repository(owner:\"zapier\", name:\"transformer\") { id description } }" }
This query fetches the ID and description of the "transformer" repository owned by the Zapier organization. A few things are worth noting here. First, we read data from the API using POST, because we send a request body. Second, the request payload is written in JSON, which is the standard convention for sending GraphQL queries over HTTP. Third, the response has exactly the structure specified in our query:
{"data": {"repository": {"id": "MDEwOlJlcG9zaXRvcnk1MDEzODA0MQ==", "description": "..."}}}
(The root data key is another mandatory element of GraphQL responses.)
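Since every GraphQL operation goes to one endpoint as a JSON body, a client needs very little code to send queries. The sketch below builds (but does not send) such a request with Python's standard library and unpacks the root data key of a response. It assumes GitHub's endpoint https://api.github.com/graphql; a real call would also need an Authorization header with a token, which is left out here. graphql_request and extract_data are hypothetical helper names.

```python
import json
from urllib.request import Request

# GitHub's GraphQL endpoint; a real call also needs an Authorization header (omitted).
GRAPHQL_URL = "https://api.github.com/graphql"


def graphql_request(query, variables=None):
    """Wrap a GraphQL document in the JSON body expected by the endpoint."""
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    return Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_data(raw_response):
    """Return the root `data` object, raising if the response reports errors."""
    body = json.loads(raw_response)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]


req = graphql_request('{ repository(owner: "zapier", name: "transformer") { id description } }')
sample = '{"data": {"repository": {"id": "MDEwOlJlcG9zaXRvcnk1MDEzODA0MQ==", "description": "..."}}}'
print(extract_data(sample)["repository"]["id"])
```

Checking the errors key before reading data matters because a GraphQL server can report a failed query with a normal HTTP response.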
To fetch data related to the repository, for example its issues and their authors, the client uses a nested selection:
POST /graphql { "query": "{ repository(owner: \"zapier\", name: \"transformer\") { id description issues(first: 20, orderBy: {field: CREATED_AT, direction: DESC}) { nodes { title body author { login } } } } }" }
This request fetches the repository's ID and description, the title and body of the 20 most recently created issues in the repository, and the login of each issue's author. That is a lot of information in a single request. Imagine what the REST equivalent of such a query would look like, and it becomes clear how much power and flexibility GraphQL gives clients here.
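Escaping string arguments inside the query string gets awkward quickly. GraphQL lets a query declare variables and receive their values in a separate variables field of the JSON body. The sketch below writes the issues query that way and walks the nested nodes of a response; the sample response data is invented, and issues_payload and issue_authors are hypothetical helpers.

```python
import json

# The issues query with owner, name and page size passed as variables.
# first: with DESC ordering by CREATED_AT selects the most recently created issues.
ISSUES_QUERY = """
query ($owner: String!, $name: String!, $count: Int!) {
  repository(owner: $owner, name: $name) {
    id
    description
    issues(first: $count, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes { title body author { login } }
    }
  }
}
"""


def issues_payload(owner, name, count=20):
    """JSON body for POST /graphql: the query plus its variable values."""
    variables = {"owner": owner, "name": name, "count": count}
    return json.dumps({"query": ISSUES_QUERY, "variables": variables})


def issue_authors(data):
    """Map each issue title to its author's login, given the response's `data` object."""
    nodes = data["repository"]["issues"]["nodes"]
    # Guard against a null author rather than assuming every issue has one.
    return {n["title"]: (n["author"] or {}).get("login") for n in nodes}


# Invented sample response data, shaped like the query's selection.
sample_data = {"repository": {"id": "R1", "description": "...", "issues": {"nodes": [
    {"title": "Bug in parser", "body": "...", "author": {"login": "alice"}},
    {"title": "Old report", "body": "...", "author": None},
]}}}
print(issue_authors(sample_data))
```

Keeping the query text constant and varying only the variables also makes queries easier to cache and log.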
For updating data, GraphQL uses a concept called a "mutation". In REST, an update is performed by sending a modified copy of the resource with PUT or POST to the same endpoint the client fetched it from; a GraphQL mutation, by contrast, is an explicit operation defined by the API. If the client needs to modify data, it has to know which mutations the server supports. Conveniently, GraphQL lets clients discover them through a process called "schema introspection".
Before discussing what introspection is, we need to clarify the term "schema". In GraphQL, each API defines a set of types that are used to validate queries. So far in the GitHub API we have worked with repository, issue, and author. Each type describes the data it contains as well as its relationships with other types. Together, all these types form the API's schema.
Given such a detailed schema, GraphQL requires that the client be able to query the schema itself using ordinary GraphQL syntax. This is how a client can discover an API's capabilities through introspection.
If a client wants to know which mutations GitHub supports, it can simply ask:
POST /graphql { "query": "{ __type(name: \"Mutation\") { name kind description fields { name description } } }" }
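Introspection results are ordinary JSON, so discovering mutations programmatically takes only a few lines. This sketch sends nothing: it takes a trimmed, invented response in the shape the __type query produces (only two of GitHub's many mutations are shown, with placeholder descriptions) and lists the mutation names. mutation_names is a hypothetical helper.

```python
import json

# The introspection query, as a JSON body for POST /graphql.
INTROSPECTION_BODY = json.dumps(
    {"query": '{ __type(name: "Mutation") { name kind description fields { name description } } }'}
)


def mutation_names(raw_response):
    """Return the sorted names of the fields on the Mutation type."""
    mutation_type = json.loads(raw_response)["data"]["__type"]
    return sorted(field["name"] for field in mutation_type["fields"])


# Trimmed, invented response: a real one lists many more mutations.
sample = json.dumps({"data": {"__type": {
    "name": "Mutation", "kind": "OBJECT", "description": "...",
    "fields": [
        {"name": "removeStar", "description": "..."},
        {"name": "addStar", "description": "..."},
    ],
}}})
print(mutation_names(sample))
```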
Among the mutations listed in the response we find, for example, addStar, which lets the client star a repository (or any other starrable object). To perform the mutation, the client sends a query like this:
POST /graphql { "query": "mutation { addStar(input:{starrableId:\"MDEwOlJlcG9zaXRvcnk1MDEzODA0MQ==\"}) { starrable { viewerHasStarred } } }" }
This request tells the server that the client wants to perform the addStar mutation and supplies the arguments the operation needs; in this case, just the repository ID. Note that the query is prefixed with the keyword mutation: this is how GraphQL knows the client intends to perform a mutation. All the previous queries could likewise have been prefixed with the keyword query, but it is customary to omit it, because an operation without an explicit type is treated as a query. Finally, note that the client has full control over the data in the response. In this request, the client asks for the viewerHasStarred field of the starred object (here, the repository). In this scenario that field is not very interesting, since the mutation adds a star and we know it will return true. However, if the client performed a different mutation, say, creating an issue, it could get back generated values such as the issue's ID or number, as well as nested data such as the total number of open issues in the repository.
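The addStar call can also be written with a variable for the ID, which avoids splicing it into the query string, and the client then reads back exactly the field it selected. A minimal sketch: add_star_payload and starred are hypothetical helpers, and the response checked below is a hand-written example of the shape this selection produces, not captured output.

```python
import json

# The addStar mutation, with the starrable ID supplied as an ID! variable.
ADD_STAR = """
mutation ($id: ID!) {
  addStar(input: {starrableId: $id}) {
    starrable { viewerHasStarred }
  }
}
"""


def add_star_payload(starrable_id):
    """JSON body for POST /graphql that stars the given object."""
    return json.dumps({"query": ADD_STAR, "variables": {"id": starrable_id}})


def starred(raw_response):
    """Read viewerHasStarred back from the mutation's response."""
    body = json.loads(raw_response)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]["addStar"]["starrable"]["viewerHasStarred"]


payload = add_star_payload("MDEwOlJlcG9zaXRvcnk1MDEzODA0MQ==")
example_response = '{"data": {"addStar": {"starrable": {"viewerHasStarred": true}}}}'
print(starred(example_response))
```

Selecting a different field inside starrable, or adding more, changes what comes back without any change on the server side.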
API of the future
I hope these cases show how API design is evolving in the SaaS industry. I'm not trying to say that graph APIs are the future and REST is dead. Architectures like GraphQL have their own problems. But it's good that the range of options keeps widening: the next time you need to build an API, you will be able to weigh the trade-offs of each design option and choose the best solution.