📜 ⬆️ ⬇️

Extra join in SQL queries

While debugging the performance of a small project, but with a fairly large base, I ran into an unpleasant special effect.
Django when samples with conditions for foreign keys associated with testing for NULL, generates queries containing JOIN for each such key. For example, for the model

class ForumPlugin(models.Model): name = models.CharField( null=False, blank=False, max_length=50, unique=True, verbose_name=_('name') ) class Thread(MPTTModel): parent = TreeForeignKey( 'self', null=True, blank=True, related_name='children', verbose_name=_('parent thread') ) plugin = models.ForeignKey( ForumPlugin, null=True, blank=True, related_name='threads', verbose_name=_('plugin') ) 

When performing a sample

 Thread.objects.filter(plugin__isnull=True, parent__isnull=True) 

Djando makes this request:
')
 SELECT `forum_thread`.`id`, `forum_thread`.`parent_id`, `forum_thread`.`plugin_id`, `forum_thread`.`lft`, `forum_thread`.`rght`, `forum_thread`.`tree_id`, `forum_thread`.`level` FROM `forum_thread` LEFT OUTER JOIN `forum_thread` T2 ON (`forum_thread`.`parent_id` = T2.`id`) LEFT OUTER JOIN `forum_forumplugin` ON (`forum_thread`.`plugin_id` = `forum_forumplugin`.`id`) WHERE (T2.`id` IS NULL AND `forum_forumplugin`.`id` IS NULL AND ) ORDER BY `forum_thread`.`id 

Naturally, the execution time of such a query is increased by several orders of magnitude, which can be critical with large tables. So on my project on a table of the order of 20-30k records, such sampling instead of the expected 1ms is performed from 100ms to 300ms, which doubles the page generation time.

Unfortunately, the bug of ORM developers has been known for four years and has a long and sad history .

Currently present in all stable versions, including 1.4.3. It is assumed that in 1.5 it will finally be fixed.

As a workaround, it is advised to use double negation:

 Thread.objects.exclude(plugin__isnull=False, parent__isnull=False) 

but I did not succeed in practice in this way to get rid of the problem. Referring directly to the parent_id field also does not help.

Be careful when designing models and try to take this Django feature into account, and avoid foreign key selections using NULL conditions.

UPD: Found solution:

 Category.objects.extra(where=['parent_id IS NULL']) 

Using raw sql is of course a bad form, but apparently this is the only solution.

Source: https://habr.com/ru/post/165895/


All Articles