Django-orm in search of LEFT JOIN

It has long been no longer a secret that Django-ORM as a whole is stupid as a stick and is not capable of solving less serious tasks, and is especially stupid in those cases when it is necessary to influence from the outside on the formation of reasonable SQL queries. About one of these cases and how I tried to deal with this - I will tell under the cut.

It all started with the fact that picking the TecDoc database inspired me with the idea of implementing my own translation storage system in the database. Without hesitation, I put up such models for the translation application and one for bullying:

class Translations(models.Model): "  " text = models.TextField(null=True, blank=True) lng = models.SlugField(max_length=32, choices=settings.LANGUAGES, db_index=True) des = models.ForeignKey("Designations", db_index=True, related_name='translations') class Meta: verbose_name = _("translation") verbose_name_plural = _("translations") ordering = ['lng'] # db_table='mlang_translations' class Designations(models.Model): "  ()     id" class Meta: verbose_name = _("designation") verbose_name_plural = _("designations") # db_table='mlang_designations' class Page(MPTTModel): content = models.ForeignKey('mlang.Designations', null=True, blank=True, related_name="+") keywords = models.ForeignKey('mlang.Designations', null=True, blank=True, related_name="+") description = models.ForeignKey('mlang.Designations', null=True, blank=True, related_name="+") title = models.ForeignKey('mlang.Designations', null=True, blank=True, related_name="+") code = models.CharField(max_length=256, db_index=True) parent = TreeForeignKey('self', null=True, blank=True) # db_table='flatpages_page'

It works as follows:
')
Many models refer to translation labels, after which you can get a translation for a field in one of the languages. The number of requests in the simplest case will be 1 + , * .
As can be seen from the description of the models, the translated fields and the translations themselves refer to the same label, which makes it easy to force the label itself when selecting translations by direct JOIN of the translation to the desired field. And this is where the tambourine dances begin.

Perhaps we will start with the “head on” option, which is better not to use if there are no other options: QuerySet.raw and we get this code:

 Page.objects.raw(""" select fpage.id id, content_translated.text content_translated, title_translated.text title_translated, keywords_translated.text keywords_translated, description_translated.text description_translated from flatpages_page fpage left join mlang_translations content_translated on fpage.content_id=content_translated.des_id and content_translated.lng=%s left join mlang_translations description_translated on fpage.description_id=description_translated.des_id and description_translated.lng=%s left join mlang_translations keywords_translated on fpage.keywords_id=keywords_translated.des_id and keywords_translated.lng=%s left join mlang_translations title_translated on fpage.title_id=title_translated.des_id and title_translated.lng=%s """, params=["ru", "ru", "ru", "ru"])

I do not need to paint pros and cons of this approach.
Naturally, if there are many models and / or you need to receive translations in several views, and / or the fields change at some point, this will be a nightmare in reality.

We are starting to actively google on the topic django orm left join order to get an equivalent SQL query, but the python / django way.

The first thing that caught my eye was to fake a Q object, so that it turned into a LEFT JOIN: QLeftOuterJoin , and if you google longer and more attentively, you can see that this solution is as old as mammoths and from about 2010 it does not work. Attempts to launch success were unsuccessful.

then, in Google’s output, we encounter a certain “ hack ” out of the box above the QuerySet.query, which by standard means allows us to embed a custom INNER / LEFT JOIN into the QuerySet and for our experimental sample the code will look like this:

  qs = Page.objects.filter(id__isnull=False) # ,    . for field in Page._meta.local_fields: if field.rel is not None and field.rel.to is Designations: join = qs.query.join( (Page._meta.db_table, Translations._meta.db_table, ((field.name+'_id', 'des_id'),)), nullable=True, #  LEFT JOIN join_field=Translations._meta.get_field_by_name('des')[0] ) qs = qs.extra( select={ field.name+"_translated": join+'.text' }, where=[join+".lng=%s"], params=['ru'] )

I'll tell you what's going on here: we iterate over all the fields of the Page model and for each ForeignKey (Designations) we generate a unique JOIN. The docstring query.join says:

'join_cols' is a tuple of tuples containing columns to join on ((l_id1, r_id1), (l_id2, r_id2))

Those. With the 3rd element of the first argument, we can transfer a set of connecting fields for the condition, BUT we cannot do filtering within JOIN. In consequence of this, where and param appeared in the qs.extra call, in our turn all our LEFT JOIN was broken into the usual INNER JOIN of the form:

 SELECT ... FROM flatpages_pages, mlang_translations t1, mlang_translations_t2, ..... where t1.lng='ru' AND t2.lng='ru' AND ......

On the one hand, we can say - this is a feature, if one field is not translated, then we hide the entire record and give it to 404. On the other hand, this is not the behavior that I want by default.

Well, okay, go ahead with the normal django way described in the documentation: QuerySet.extra and write the following auxiliary function to automatically generate translations for the desired model:

 def translate(model, lng, exclude=None): if exclude is not None and not isinstance(exclude, (list, tuple, set, frozenset,)): raise TypeError('exclude must be iterable') fields = [] for field in model._meta.fields: if field.rel is not None and field.rel.to is Designations: if exclude is not None and field.name in exclude: continue fields.append( [field.name, map(lambda x: x[1], field.rel.get_joining_columns())[0]] ) if not fields: return {} return dict( tables=[ '"{trans._meta.db_table}" AS "trans_{pos}"'.format(trans=Translations, pos=pos) for pos, val in enumerate(fields) ], select={ column[0] + "_translated": "trans_{0}.text".format(pos) for pos, column in enumerate(fields) }, where=[ "{model._meta.db_table}.{column[1]}=trans_{pos}.des_id and trans_{pos}.lng=%s".format(pos=pos, column=column, model=model) for pos, column in enumerate(fields) ], params=[lng] * len(fields) )

It works quite simply: it iterates all the ForeignKey (Designations) from the transferred model and fills the dictionary for transfer to QuerySet.extra and the result is the following call:

 Page.objects.extra(**translate(Page, lng))

It looks beautiful, BUT these are the same eggs only in profile as in P2, only the thoroughbred INNER JOIN in the query text ...

upd: As promised, I supplement the article with new and final results of the search for answers.

All the above methods relied mainly on the documentation and some of the “hacks” without much detail in how it works in general.
So, if you look at the sources of the Query class in django.db.models.sql.query in particular, the join function itself can be noticed that the work with alias_map and join_map dictionaries is not bad. join_map is nothing more than an ordinary dictionary where the tuples go, which we pass with the 1st argument when calling join, and the value is the alias tuple for the exact identification of join in the query. alias_map uses the same alias as keys, and the descriptor of JOIN itself, which later will be converted to SQL. The type and format of dexryptor is reduced to the form:

 JoinInfo = namedtuple('JoinInfo', 'table_name rhs_alias join_type lhs_alias ' 'join_cols nullable join_field')

The conversion to SQL itself is hard-hardened, which completely eliminates the possibility of no-one monkey patching in the depths of django to add the ability to generate JOINs in smarter than

 LEFT OUTER JOIN table alias ON main_table.field=alias.field

BUT, there is one weighty but:
In pursuit of an elegant solution, I managed to overlook the fact that our left join from paragraph 2 as a whole is fully working and meets all the requirements, but is a little under-appreciated. As written in all SQL: LEFT JOIN textbooks, NULL substitutes for all missing right samples, and therefore we can expand the WHERE condition so that n2 becomes an adequate JEFT JOIN equivalent to what was described at the very beginning of the article. And the code from n2 will look like this:

  qs = Page.objects.filter(id__isnull=False) #    for field in Page._meta.local_fields: if field.rel is not None and field.rel.to is Designations: alias = qs.query.join( (Page._meta.db_table, Translations._meta.db_table, ((field.name+"_id", 'des_id',),)), nullable=True, join_field=Translations._meta.get_field_by_name("des")[0] ) qs = qs.extra( select={field.name+"_translated": alias+'.text'}, where=["{0}.lng='{1}' or {0}.lng is null".format(alias, 'ru')], #  or lng is null )

at the output, we get data that are absolutely identical to SQL from Query.raw, only WHERE was complicated by transferring the filtering by language here, which diminished the readability of the final SQL and I can not unequivocally say how this affects the speed of execution of SQL at the database level relative to the original one. In general, we can say that the problem was solved, I was able to get rid of a huge number of unnecessary queries to the database by building smarter queries to the database without writing pure SQL, which in turn guarantees the portability of the code between different DBMS.

PS: I hope someone this little research will help to sleep more calmly.
The source can be found on github.com .

Source: https://habr.com/ru/post/230507/

All Articles

Django-orm in search of LEFT JOIN

More articles: