
Working with data when building an API based on GraphQL

Preamble


First of all, this article is aimed at readers who are already familiar with GraphQL and covers some of the finer points and nuances of working with it. Nevertheless, I hope it will also be useful to beginners.


GraphQL is a great tool. Many people already know and appreciate its advantages. However, there are some nuances you should be aware of when building an API based on GraphQL.


For example, GraphQL allows you to return to the consumer (a user or a program) only the part of the data that the consumer is actually interested in. However, when building a server, it is quite easy to make a mistake that causes the data to travel in full inside the server itself (which may well be distributed). This is primarily because, out of the box, GraphQL does not provide convenient tools for parsing an incoming request, and the interfaces it does expose are poorly documented.


Source of the problem


Let's look at a typical example of a non-optimal implementation:




Suppose that our consumer is some kind of "phone book" application or component, which requests from our API only the identifier, name and telephone number of the users we store. At the same time, our API is much broader and also provides access to other data, such as the users' physical and email addresses.
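To make the examples below concrete, let's assume the phone book sends a request of roughly this shape (the operation name is illustrative):

```graphql
query UserListQuery {
  users {
    id
    name
    phone
  }
}
```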


At the point of data exchange between the consumer and the API, GraphQL does exactly what we need: only the requested data is sent in response. The problem lies at the point where data is fetched from the database, i.e. in the internal implementation of our server: for each incoming request we select all user data from the database, even though we do not need part of it. This puts unnecessary load on the database and causes excess traffic to circulate within the system. With a significant number of queries, you can gain a substantial optimization by changing the approach to data fetching and selecting only those fields that were actually requested. It does not matter at all what acts as the data source: a relational database, a NoSQL store, or another service (internal or external). Any implementation can exhibit this non-optimal behavior; MySQL here is simply chosen as an example.
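As a sketch, the difference boils down to which SELECT list the resolver builds. The helper names below are hypothetical; the column and table names follow the example above:

```javascript
// Naive: always fetch every column, whatever the client asked for.
function naiveSelect() {
  return 'SELECT * FROM users';
}

// Optimized: fetch only the columns the GraphQL query actually requested.
function optimizedSelect(requestedFields) {
  return `SELECT ${requestedFields.join(', ')} FROM users`;
}
```

For the phone book request above, `optimizedSelect(['id', 'name', 'phone'])` produces a query that touches only three columns instead of five.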


Solution


This server behavior can be optimized by analyzing the arguments that come into the resolve() function:


```javascript
async resolve(source, args, context, info) {
  // ...
}
```

It is the last argument, info, that is of particular interest to us here. Let us turn to the documentation and look in detail at the signature of the resolve() function and the argument we are interested in:


```typescript
type GraphQLFieldResolveFn = (
  source?: any,
  args?: { [argName: string]: any },
  context?: any,
  info?: GraphQLResolveInfo
) => any

type GraphQLResolveInfo = {
  fieldName: string,
  fieldNodes: Array<Field>,
  returnType: GraphQLOutputType,
  parentType: GraphQLCompositeType,
  schema: GraphQLSchema,
  fragments: { [fragmentName: string]: FragmentDefinition },
  rootValue: any,
  operation: OperationDefinition,
  variableValues: { [variableName: string]: any },
}
```

So, the first three arguments passed to the resolver are: source, the data passed down from the parent node in the GraphQL schema tree; args, the query arguments (which come from the query itself); and context, a developer-defined execution context object, often used to pass some global data to resolvers. Finally, the fourth argument is the meta information about the request.


What can we extract from GraphQLResolveInfo to solve our problem?


The most interesting parts of it are fieldNodes, which contains the parsed AST of the fields requested for the current node; fragments, which holds the definitions of all fragments used in the query; and variableValues, which holds the values of the query variables.
So, as a solution, we need to parse the info object, extract the list of fields that came in the query, and then pass them on to the SQL query. Unfortunately, the GraphQL package from Facebook gives us nothing out of the box to simplify this task. As practice has shown, the task is not so trivial, given that queries can be fragmented. Besides, such parsing has a universal solution that would otherwise simply be copied from project to project.
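To see why this is non-trivial, here is a deliberately simplified sketch of parsing info by hand. It follows fragment spreads through info.fragments, but ignores inline fragments, aliases and directives, which is exactly why a reusable library is handy (the function name is mine, not part of any library):

```javascript
// Collect the child field names of the resolved node from a selection set,
// following fragment spreads through the `fragments` map of the query.
function collectFields(selectionSet, fragments, out = []) {
  for (const sel of selectionSet.selections) {
    if (sel.kind === 'Field') {
      // A plain field: record its name.
      out.push(sel.name.value);
    } else if (sel.kind === 'FragmentSpread') {
      // A `...FragmentName` spread: descend into the fragment's selections.
      const frag = fragments[sel.name.value];
      if (frag) collectFields(frag.selectionSet, fragments, out);
    }
  }
  return out;
}

// Usage with the resolver's `info` argument would look like:
// const fields = collectFields(info.fieldNodes[0].selectionSet, info.fragments);
```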


Therefore, I decided to write it as an open-source library under the ISC license. With its help, parsing the fields of an incoming request becomes quite simple; in our case, for example, like this:


```javascript
const { fieldsList } = require('graphql-fields-list');
// ...
async resolve(source, args, context, info) {
  const requestedFields = fieldsList(info);
  return await database.query(`SELECT ${requestedFields.join(',')} FROM users`);
}
```

fieldsList(info) here does all the work for us and returns a "flat" array of the child fields of this resolver, so our final SQL query will look like this:


```sql
SELECT id, name, phone FROM users;
```

If we change the incoming request to:


```graphql
query UserListQuery {
  users {
    id
    name
    phone
    email
  }
}
```

then the SQL query will turn into:


```sql
SELECT id, name, phone, email FROM users;
```

However, such a simple call is not always enough. Real-world applications are often much more complex in structure. In some implementations we may need to describe a resolver at a higher level relative to the data in the final GraphQL schema. For example, if we decide to use the Relay library, we may want to use its ready-made mechanism for splitting collections of data objects into pages, which means our GraphQL schema will have to be built according to certain rules. Let's rework our schema accordingly (TypeScript):


```typescript
import {
  GraphQLObjectType,
  GraphQLSchema,
  GraphQLString,
  GraphQLResolveInfo,
} from 'graphql';
import {
  connectionDefinitions,
  connectionArgs,
  nodeDefinitions,
  fromGlobalId,
  globalIdField,
  connectionFromArray,
} from 'graphql-relay';
import { fieldsList } from 'graphql-fields-list';

export const { nodeInterface, nodeField } = nodeDefinitions(async (globalId: string) => {
  const { type, id } = fromGlobalId(globalId);
  let node: any = null;
  if (type === 'User') {
    node = await database.select(`SELECT id FROM user WHERE id="${id}"`);
  }
  return node;
});

const User = new GraphQLObjectType({
  name: 'User',
  interfaces: [nodeInterface],
  fields: {
    id: globalIdField('User', (user: any) => user.id),
    name: { type: GraphQLString },
    email: { type: GraphQLString },
    phone: { type: GraphQLString },
    address: { type: GraphQLString },
  },
});

export const { connectionType: userConnection } = connectionDefinitions({ nodeType: User });

const Query = new GraphQLObjectType({
  name: 'Query',
  fields: {
    node: nodeField,
    users: {
      type: userConnection,
      args: { ...connectionArgs },
      async resolve(
        source: any,
        args: { [argName: string]: any },
        context: any,
        info: GraphQLResolveInfo,
      ) {
        // TODO: implement
      },
    },
  },
});

export const schema = new GraphQLSchema({ query: Query });
```

At the same time, connectionDefinitions from Relay will add the edges, node, pageInfo and cursor nodes to the schema, so we will now need to structure our requests differently (we will not dwell on pagination here):


```graphql
query UserListQuery {
  users {
    edges {
      node {
        id
        name
        phone
        email
      }
    }
  }
}
```

So the resolve() function of the users node now has to determine which fields were requested not for itself but for its nested child node node, which, as we can see, is located at the path edges.node relative to users.


fieldsList from the graphql-fields-list library helps solve this problem as well; you just need to pass it the corresponding path option. Here is the implementation in our case:


```typescript
async resolve(
  source: any,
  args: { [argName: string]: any },
  context: any,
  info: GraphQLResolveInfo,
) {
  const fields = fieldsList(info, { path: 'edges.node' });
  return connectionFromArray(
    await database.query(`SELECT ${fields.join(',')} FROM users`),
    args,
  );
}
```

Also, in the real world it may happen that a field has one name in the GraphQL schema and a different name in the database schema. For example, suppose the users table was defined differently:


```sql
CREATE TABLE users (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  fullName VARCHAR(255),
  email VARCHAR(255),
  phoneNumber VARCHAR(15),
  address VARCHAR(255)
);
```

In this case, the fields from the GraphQL query must be renamed before embedding them into the SQL query. fieldsList will help with this if you pass it a name map in the corresponding transform option:


```typescript
async resolve(
  source: any,
  args: { [argName: string]: any },
  context: any,
  info: GraphQLResolveInfo,
) {
  const fields = fieldsList(info, {
    path: 'edges.node',
    transform: { phone: 'phoneNumber', name: 'fullName' },
  });
  return connectionFromArray(
    await database.query(`SELECT ${fields.join(',')} FROM users`),
    args,
  );
}
```

Sometimes, though, a flat array of fields is not enough (for example, when the data source returns a complex nested structure). In this case, the fieldsMap function from the graphql-fields-list library comes to the rescue: it returns the entire tree of requested fields as an object:


```javascript
const { fieldsMap } = require('graphql-fields-list');
// ... some resolver implementation on `users`:
resolve(source, args, ctx, info) {
  const map = fieldsMap(info);
  /* RESULT:
  {
    edges: {
      node: {
        id: false,
        name: false,
        phone: false,
      }
    }
  }
  */
}
```

If the user is described by a complex nested structure, we will get all of it. This function can also take an optional path argument, which allows you to get the map of only the required branch of the tree, for example:


```javascript
const { fieldsMap } = require('graphql-fields-list');
// ... some resolver implementation on `users`:
resolve(source, args, ctx, info) {
  const map = fieldsMap(info, 'edges.node');
  /* RESULT:
  {
    id: false,
    name: false,
    phone: false,
  }
  */
}
```

Transformation of names on maps is not currently supported and is left to the developer.
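For example, one could apply a transform map to a fieldsMap() result with a small recursive helper like this (the helper is hypothetical, not part of the library):

```javascript
// Recursively rename the keys of a fieldsMap()-style tree using
// a { graphqlName: databaseName } transform map. Leaves in the tree
// are the value `false`, so anything that is not an object is
// returned unchanged.
function renameMapKeys(map, transform) {
  if (map === false || typeof map !== 'object') return map;
  const result = {};
  for (const [key, child] of Object.entries(map)) {
    result[transform[key] || key] = renameMapKeys(child, transform);
  }
  return result;
}
```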


Query fragmentation


GraphQL supports query fragmentation; for example, we can expect a consumer to send us a request like this (it refers to the original schema and is a bit contrived, but syntactically correct):


```graphql
query UsersFragmentedQuery {
  users {
    id
    ...NamesFragment
    ...ContactsFragment
  }
}

fragment NamesFragment on User {
  name
}

fragment AddressFragment on User {
  address
}

fragment ContactsFragment on User {
  phone
  email
  ...AddressFragment
}
```

There is no need to worry in this case: both fieldsList(info) and fieldsMap(info) will return the expected result, since they take query fragmentation into account. fieldsList(info) returns ['id', 'name', 'phone', 'email', 'address'], and fieldsMap(info) returns:


```javascript
{
  id: false,
  name: false,
  phone: false,
  email: false,
  address: false
}
```

PS


I hope this article has helped shed light on some of the nuances of working with GraphQL on the server, and that the graphql-fields-list library will help you create optimal solutions in the future.


UPD 1


Library version 1.1.0 has been released: support for the @skip and @include directives in queries has been added. This option is enabled by default; if necessary, you can disable it as follows:
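For example, assuming a boolean query variable (the variable name here is illustrative), a consumer could send:

```graphql
query UserListQuery($withContacts: Boolean!) {
  users {
    edges {
      node {
        id
        name
        phone @include(if: $withContacts)
        email @include(if: $withContacts)
      }
    }
  }
}
```

With directive support enabled (the default), the library should include phone and email in the result only when $withContacts is true.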


```javascript
fieldsList(info, { withDirectives: false });
fieldsMap(info, null, false);
```


Source: https://habr.com/ru/post/427399/

