📜 ⬆️ ⬇️

We do GraphQL API on PHP and MySQL. Part 3: Solving the N + 1 Query Problem

image

In this, the third article on the creation of GraphQL server with graphql-php, I will talk about how to deal with the problem of N + 1 queries.

Foreword


I will continue to change the code received at the end of the previous article . You can also view it in the article repository on Github . If you have not read the previous articles, I recommend reading them before continuing.

I will also comment out two lines in graphql.php, which add a limit to the maximum depth and complexity of the query, since they are not very necessary during development, and can only cause debugging problems:
')
// DocumentValidator::addRule('QueryComplexity', new QueryComplexity(6)); // DocumentValidator::addRule('QueryDepth', new QueryDepth(1)); 


Problem N + 1 queries


Problem


The easiest way to explain what the problem is N + 1 queries on the example. Suppose you need to request a list of articles and their authors. Without hesitation, you can do it like this:

 $articles = DB::table('articles')->get(); foreach ($articles as &$article) { $article->author = DB::table('users')->where('id', $article->author_id)->first(); } 

As a rule, DB::table('articles')->get() as a result sends one request to the database, something like this:

 SELECT * FROM articles; 

And then in the loop another N queries are sent to the database:

 SELECT * FROM users WHERE id = 1; SELECT * FROM users WHERE id = 2; SELECT * FROM users WHERE id = 3; SELECT * FROM users WHERE id = 4; SELECT * FROM users WHERE id = 5; ... SELECT * FROM users WHERE id = N; 

Where N is the number of articles received in the first request.

For example, we execute one request that returns 100 articles to us, and then for each article we execute one more request from the author. In total, 100 + 1 = 101 queries are obtained. This is an extra load on the database server and is called the N + 1 query problem.

Decision


The most common method for solving this problem is grouping queries.

If we rewrite the same example using the grouping of queries, we will get something like this:

 $articles = DB::table('articles')->get(); $authors_ids = get_authors_ids($articles); $authors = DB::table('users')->whereIn('id', $authors_ids)->get(); foreach ($articles as &$article) { $article->author = search_author_by_id($authors, $article->author_id); } 

That is, we do the following:

  1. We request an array of articles
  2. We remember the id of all authors of these articles.
  3. We request an array of users by these id
  4. We insert authors into articles from the array of users

At the same time, no matter how many articles we have requested, only two requests will be sent to the database:

 SELECT * FROM articles; SELECT * FROM users WHERE id IN (1, 2, 3, 4, 5, ..., N); 


Problem N + 1 queries in GraphQL


Now let's return to our GraphQL server in the state it is in after the previous article, and pay attention to how the request for the number of the user's friends is implemented.

If we request a list of users with an indication of the number of friends each, then first the Graph SQL server will request all the records from the user table:

 'allUsers' => [ 'type' => Types::listOf(Types::user()), 'description' => ' ', 'resolve' => function () { return DB::select('SELECT * from users'); } ] 

And then for each user will request from the database the number of his friends:

 'countFriends' => [ 'type' => Types::int(), 'description' => '  ', 'resolve' => function ($root) { return DB::affectingStatement("SELECT u.* FROM friendships f JOIN users u ON u.id = f.friend_id WHERE f.user_id = {$root->id}"); } ] 

This is where the problem of N + 1 queries appears.

To solve this problem by the method of grouping queries, graphql-php suggests that we postpone the execution of these fields to the resolol until the values ​​of all the other (non-postponed) fields are obtained.

The idea is simple: instead of the result, the function “resolve” of the field should return an object of the GraphQL \ Deferred class, to the constructor of which the function is passed to get the same result.

That is, now we can connect the class Deferred:

 use GraphQL\Deferred; 

And postpone execution by rewriting the “countFriends” field resolver as follows:

 'countFriends' => [ 'type' => Types::int(), 'description' => '  ', 'resolve' => function ($root) { return new Deferred(function () use ($root) { return DB::affectingStatement("SELECT u.* FROM friendships f JOIN users u ON u.id = f.friend_id WHERE f.user_id = {$root->id}"); }); } ] 

But simply postponing the execution of the query, we will not solve the problem of N + 1. Therefore, we need to create a buffer that will accumulate the id of all users for which we need to request the number of friends, and in the future will be able to return the results for all users.

To do this, I will create a small class that will have three simple static methods:


You can also implement this class in any way you like; I will only give its code for a specific example:

App / Buffer.php
 <?php namespace App; /** * Class Buffer * *    * * @package App */ class Buffer { /** *  id  * * @var array */ private static $ids = array(); /** *        * * @var array */ private static $results = array(); /** *           */ public static function load() { //     ,     if (!empty(self::$results)) return; //          $rows = DB::select("SELECT u.id, COUNT(f.friend_id) AS count FROM users u LEFT JOIN friendships f ON f.user_id = u.id WHERE u.id IN (" . implode(',', self::$ids) . ") GROUP BY u.id"); foreach ($rows as $row) { self::$results[$row->id] = $row->count; } } /** *  id    * * @param int $id */ public static function add($id) { //   id    ,     if (in_array($id, self::$ids)) return; self::$ids[] = $id; } /** *       * * @param $id * @return int */ public static function get($id) { if (!isset(self::$results[$id])) return null; return self::$results[$id]; } } 

Now we connect our buffer to UserType.php:

 use App\Buffer; 

And again rewrite the resolver for the “countFriends” field:

 'countFriends' => [ 'type' => Types::int(), 'description' => '  ', 'resolve' => function ($root) { //  id    Buffer::add($root->id); return new Deferred(function () use ($root) { //       (     ) Buffer::load(); //       return Buffer::get($root->id); }); } ], 

Is done. Now when you run the query:

GraphQL: Requesting users with number of friends

The number of friends for all users will be retrieved from the database only once. Moreover, the data request on the number of friends will be executed only once, even with this GraphQL query:

GraphQL: Nested user request with number of friends

Of course in this form our buffer is very highly specialized. It turns out that for another field we need to create another buffer. But this is just an example and nothing prevents us from making a universal buffer, which, for example, will store data for different fields by their keys, and also take a function as an argument to get results. In this case, the buffer can receive not only from the database, but also from some API.

Conclusion


That's all. Offer your solutions to these problems and ask questions if they arise.

The source code from an article on Github.

Other parts of this article:

  1. Installation, layout and queries
  2. Mutations, variables, validation and security
  3. Solving the problem of N + 1 queries

Source: https://habr.com/ru/post/329408/


All Articles