Just discovered an interesting bug (a bug in terms of human logic, but not a machine), and decided to share it with the community. I have been programming on django for quite some time, but I came across this behavior for the first time, so I think it will be useful for someone. Well, to the point!
Suppose we have such a primitive piece in the code:
The received goods about the help of pagination are displayed on the corresponding page of 20 pieces. Once the manager calls and says that the product “jumps” through the pages - at first it was seen on the second page, and then suddenly repeated on the fifth.
')
“Ha” - we declare, set a breakpoint after the specified code block and do print (products). Visually and, for fidelity, we check the output by a cycle - and there are no duplicates there!
Let's do this: try to catch the duplicate product by indexing and slicing. After some time, find the villains: products [3] == products [20]. So, found them. 3 and 20. Item
name .
We derive: print (products), we look at positions 3 and 20 ... and there are different products! How so?
We try print (products [0:10]) - the product in position 3 is -
name . We try print (products [10:21]) - the product in position 20 is also there, and it is the same -
name . @ #! Well, apparently, django somehow iterates and takes on the index (stock?) In different ways, check it out.
We climb into the QuerySet class, we look at the __getitem__ method, here it is in a brief retelling:
qs = self._clone() qs.query.set_limits(k, k + 1) return list(qs)[0]
That is, taking by index is just setting the set_limits for the query, so I decided to check how it looks in SQL — maybe an error crept in there?
qs1 = products._clone() qs1.query.set_limits(3, 4) print(qs1.query) qs2 = products._clone() qs2.query.set_limits(20, 21) print(qs2.query)
And when I received
SELECT "shop_product"."id", ... FROM "shop_product" WHERE ("shop_product"."id" IN (5201, 5230, 5183, 5219, 5217, 5209, 5246, 5252, 5164, 5248)) LIMIT 1 OFFSET 3
and
SELECT "shop_product"."id", ... FROM "shop_product" WHERE ("shop_product"."id" IN (5201, 5230, 5183, 5219, 5217, 5209, 5246, 5252, 5164, 5248)) LIMIT 1 OFFSET 20,
I realized that I did not understand. At different offset in the database is the same record? But in the same constraint on id, there can be no duplicates ...
In general, when I executed both requests directly in
Postgresql with pens and received the same entries, I started to google on postgres limit offset duplicates and found the
answer to stackoverflow . And here's the thing:
When the order of sorting rows in a query (ORDER_BY) is not specified, Postgres can use
any sort that it likes - and I, in general, do not mind, I wrote: Product.objects.filter (... ), without any order_by (). When I just wrote this code, there was no pagination, and all the goods were displayed at once on the page - here Postgres sorted all these goods arbitrarily, but all at once.
And then, when the pagination appeared, the bd received a command like “sort the lines as you prefer and give me the lines from 20 to 40”, and now at different ranges (0-20 or 20-40) the sorting was different - it depended on postgres optimizations - and it turns out that the specified lines were sent to the output from a random list.
And here is a quote
from the postgres site :
The query optimizer takes LIMIT into account when it comes to LIMIT and OFFSET. Thus, it will be possible to choose inconsistent results if you’re using a predicate result ordering with ORDER BY. This is not a bug; If you’re not in use, you can’t use it.
Well, we will know!
products = Product.objects.filter(...).order_by('price')
Yes? NOT! After checking everything, I again found duplicates - but this time I was caught by the fact that order_by used a parameter that could be the same - and now for all products with the same price, the sorting order was again undefined. So that:
products = Products.objects.filter(...).order_by('price', 'id')
Now everything is exactly.
PS: For me personally, this was another clear confirmation of the “Law of Leaky Abstractions” - you kind of write ORM a simple query “give me lines 20-40,” and it doesn't even seem necessary to know SQL and Postgres, but in the end one fine moment this abstraction flowing, and here you are already learning the basics.
PPS: By the way, if there is a desire, you can turn this bug in the Django admin panel, if you do not specify ordering for modelAdmin :)