
Django: How to quickly get unwanted duplicates from a simple QuerySet

I just discovered an interesting bug (a bug in terms of human logic, though not from the machine's point of view) and decided to share it with the community. I have been programming in Django for quite a while, but I ran into this behavior for the first time, so I think it will be useful to someone. Well, to the point!

Suppose we have a primitive piece of code like this:

    # views.py
    ids = [5201, 5230, 5183, 5219, 5217, 5209, 5246, 5252, 5164, 5248, ...]
    products = Product.objects.filter(id__in=ids)

The retrieved products are displayed with pagination, 20 items per page. One day the manager calls and says that a product “jumps” between pages: first it was seen on the second page, and then it suddenly showed up again on the fifth.
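The post does not show the pagination code itself, but the setup probably looked something like this minimal sketch (the view name, template path and the use of Django's Paginator are my assumptions):

    # views.py - a hypothetical reconstruction of the paginated view
    from django.core.paginator import Paginator
    from django.shortcuts import render

    from shop.models import Product  # assumed app/model location


    def product_list(request):
        ids = [5201, 5230, 5183, 5219, 5217, 5209, 5246, 5252, 5164, 5248]  # shortened
        products = Product.objects.filter(id__in=ids)   # note: no order_by()

        paginator = Paginator(products, 20)             # 20 items per page
        # get_page() exists in Django 2.0+; older versions use page()
        page = paginator.get_page(request.GET.get('page'))
        return render(request, 'shop/product_list.html', {'page': page})

Newer Django versions even warn about exactly this situation (UnorderedObjectListWarning) when a Paginator is given an unordered QuerySet.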
“Ha,” we say, set a breakpoint after the code block above and do print(products). We check the output visually and, to be sure, with a loop - and there are no duplicates in it!

Fine, let's try to catch the duplicate product by indexing and slicing. After a while we find the culprits: products[3] == products[20]. So, we found them: positions 3 and 20, both showing the same product name.
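Roughly, the hunt might have looked like the following sketch (it assumes the ids list and Product model from above, a name field, and a queryset that has not been evaluated yet, since indexing an already evaluated queryset would just read from its result cache):

    # Indexing an unevaluated QuerySet issues a separate "LIMIT 1 OFFSET i"
    # query for every access, which is exactly what makes the result unstable.
    fresh = Product.objects.filter(id__in=ids)   # not evaluated, no result cache
    seen = {}
    for i in range(len(ids)):
        product = fresh[i]                       # each indexing -> its own SQL query
        if product.id in seen:
            print('duplicate at positions', seen[product.id], 'and', i, '-', product.name)
        else:
            seen[product.id] = i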

We print(products) again, look at positions 3 and 20... and the products there are different! How so?

We try print(products[0:10]): the product at position 3 is that same name. We try print(products[10:21]): the product at position 20 is there too, and it is the same name. @#! Well, apparently Django iterates and indexes the queryset in different ways; let's check.

We dig into the QuerySet class and look at the __getitem__ method; here it is, briefly retold:

    qs = self._clone()
    qs.query.set_limits(k, k + 1)
    return list(qs)[0]

So indexing is just a matter of calling set_limits on the query, and I decided to check what it looks like in SQL - maybe an error crept in there?

    qs1 = products._clone()
    qs1.query.set_limits(3, 4)
    print(qs1.query)

    qs2 = products._clone()
    qs2.query.set_limits(20, 21)
    print(qs2.query)

And when I received

 SELECT "shop_product"."id", ... FROM "shop_product" WHERE ("shop_product"."id" IN (5201, 5230, 5183, 5219, 5217, 5209, 5246, 5252, 5164, 5248)) LIMIT 1 OFFSET 3 

and

 SELECT "shop_product"."id", ... FROM "shop_product" WHERE ("shop_product"."id" IN (5201, 5230, 5183, 5219, 5217, 5209, 5246, 5252, 5164, 5248)) LIMIT 1 OFFSET 20, 

I realized that I understood nothing. The database returns the same record at different offsets? But with the same constraint on id there simply cannot be any duplicates...
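As an aside (not from the original post): the same SQL can be inspected without the private _clone() call, because slicing an unevaluated QuerySet applies the same limits and its query attribute prints the generated statement.

    # Equivalent check via public slicing syntax
    print(Product.objects.filter(id__in=ids)[3:4].query)    # ... LIMIT 1 OFFSET 3
    print(Product.objects.filter(id__in=ids)[20:21].query)  # ... LIMIT 1 OFFSET 20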

So when I ran both queries by hand directly in PostgreSQL and got the same rows back, I started googling “postgres limit offset duplicates” and found the answer on Stack Overflow. And here's the thing:

When the row order of a query is not specified (no ORDER BY), Postgres may use any ordering it likes. And I, in general, did not mind: I wrote Product.objects.filter(...) without any order_by(). When I first wrote this code there was no pagination and all products were shown on the page at once, so Postgres ordered them arbitrarily, but returned them all in one go.

Then, when pagination appeared, the database started receiving commands like “order the rows however you prefer and give me rows 20 to 40”, and for different ranges (0-20 or 20-40) the ordering could differ, depending on Postgres query planning. As a result, the requested rows were effectively taken from a differently ordered list each time.

And here is a quote from the Postgres documentation:
The query optimizer takes LIMIT into account when generating query plans, so you are very likely to get different plans (yielding different row orders) depending on what you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.

Well, now we know!

    products = Product.objects.filter(...).order_by('price')

Done? Nope! After rechecking everything I found duplicates again, but this time the catch was that order_by used a field whose values can coincide, so for all products with the same price the order was once again undefined. So:

    products = Product.objects.filter(...).order_by('price', 'id')

Now everything is stable.
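If you want this ordering applied by default instead of remembering to call order_by() everywhere, a default ordering can be declared on the model itself. A sketch, with the field definitions assumed:

    # models.py - default ordering for Product
    from django.db import models


    class Product(models.Model):
        name = models.CharField(max_length=200)
        price = models.DecimalField(max_digits=10, decimal_places=2)

        class Meta:
            ordering = ['price', 'id']   # 'id' breaks ties between equal prices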

PS: For me personally this was another clear confirmation of the “Law of Leaky Abstractions”: you write a simple ORM query like “give me rows 20-40” and it seems you don't even need to know SQL or Postgres, but then one fine day the abstraction leaks and there you are, learning the underlying basics.

PPS: By the way, if you feel like it, you can reproduce this bug in the Django admin by not specifying ordering for your ModelAdmin :)
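A minimal sketch of pinning the ordering in the admin (registration details assumed):

    # admin.py - explicit ordering for the changelist
    from django.contrib import admin

    from shop.models import Product  # assumed location


    @admin.register(Product)
    class ProductAdmin(admin.ModelAdmin):
        ordering = ('price', 'id')   # deterministic order across changelist pages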

Source: https://habr.com/ru/post/272559/

