Not ORM single

Hello! I manage the Partners Development department in the hotel reservation service Ostrovok.ru . In this article I would like to tell you about how we used Django ORM on one project.

In fact, I was a slave, the name was supposed to be " ~~Not~~ ORM uniform. "If you are wondering why I wrote this, and also if:

You have Django on the stack, and you want to squeeze out the maximum of ORM capabilities, not just Model.objects.all() ,
You want to transfer part of business logic to the database level
Or do you want to find out why the most frequent excuse for developers in B2B.Ostrovok.ru "has historically been this way" ,

... welcome under cat.

cdpv

In 2014, we launched B2B.Ostrovok.ru, an online booking service for hotels, transfers, cars and other travel services for professionals of the tourism market (travel agents, operators and corporate clients).

In B2B, we have designed and quite successfully use an abstract order model based on GenericForeignKey — meta-order — MetaOrder .

A meta-order is an abstract entity that can be used regardless of which type of order it belongs to: hotel ( Hotel ), additional service ( Upsell ) or car ( Car ). In the future we may have other types.

This was not always the case. When the B2B service was launched, only hotels could be booked through it, and all business logic was oriented towards them. Many fields have been created, for example, to display the exchange rates of the sale amount and the refund amount of the reservation. Over time, we realized how best to store and reuse this data, given the meta-orders. But the whole code could not be rewritten, and part of this heritage came into the new architecture. Actually, this led to difficulties in calculations, which use several types of orders. What to do - so historically ...

My goal is to show on our example the power of Django ORM.

Prehistory

Our B2B clients for planning their expenses didn’t have enough information about how much they need to pay now / tomorrow / later, whether they have debts on orders and what is its size, as well as how much more they can spend within their limits. We decided to show this information in the form of a dashboard - such a simple socket with a clear diagram.

dash1
(all values are test and do not apply to a specific partner)

At first glance, everything is quite simple - we filter all the orders of a partner, summarize and show.

Solution options

A little explanation of how we make the calculations. We are an international company, our partners from different countries conduct transactions - they buy and resell reservations - in different currencies. At the same time, they should receive financial statements in their chosen currency (usually local). It would be foolish and impractical to store all possible data on the rates of all currencies, so you need to select a reference currency, for example, the ruble. Thus, it is possible to store the rates of all currencies only to the ruble Accordingly, when a partner wants to receive a summary, we convert the amount at the rate set at the time of sale.

"Head-on"

In fact, this is Model.objects.all() and a loop with conditions:

Model.objects.all () with conditions

 def output(partner_id): today = dt.date.today() # query_get_one -    partner = query_get_one(Partner.objects.filter(id=partner_id)) #    -  query = MetaOrder.objects.filter(partner=partner) result = defaultdict(Decimal) for morder in query: #  ,     #     payment_pending = morder.get_payment_pending() payment_due = morder.get_payment_due() #        # (     ) payable = morder.get_payable_in_cur() #       if payment_pending > today: result['payment_pending'] += payable # ,     if payment_pending < today and payment_due > today: result['payment_due'] += payable return result

This request will return a generator, which potentially contains several hundred bookings. Each of these bookings will be requested in the database, and therefore the cycle will work for a very long time.

You can speed things up a bit by adding the prefetch_related method:

 # object -      GenericForeignKey. query = query.prefetch_related('object')

Then there will be slightly fewer requests to the database (referrals by GenericForeignKey ), but all the same, in the end, we will rest on their number, because the query to the database will still be made for each iteration of the cycle.

The output method can (and should) be cached, but still the first call runs on the order of a minute, which is completely unacceptable.

Here are the results of this approach:

timing_before

The average response time is 4 seconds, and there are peaks reaching 21 seconds. Pretty long.

We didn’t roll out dashboards for all partners, and therefore we didn’t have so many requests to him, but still enough to understand that this approach is not effective.

count_before
The numbers on the bottom right are the number of requests: minimum, maximum, average, total.

Wisely

The head-on prototype was good for understanding the complexity of the task, but is not optimal for use. We decided that it would be much quicker and less resource-intensive to make several complex queries to the database than many simple ones.

Query plan

With broad strokes, the query plan can be described like this:

collect orders for initial conditions,
prepare the fields for calculation through annotate ,
calculate field values
make aggregate by amount and quantity,

Initial conditions

Partners who enter the site can only see information on their contract.

 partner = query_get_one(Partner.objects.filter(id=partner_id))

In the case when we do not want to show new types of orders / reservations, you need to filter only supported ones:

 query = MetaOrder.objects.filter( partner=partner, content_type__in=[ Hotel.get_content_type(), Car.get_content_type(), Upsell.get_content_type(), ] )

Order status is important (more about Q ):

 query = query.filter( Q(hotel__status__in=['completed', 'cancelled']) #     ,    # | Q(car__status__in=[...]) )

We also often use pre-prepared requests, for example, to exclude all orders that cannot be paid. There are quite a lot of business logic there, which is not very interesting to us in this article, but in fact it’s just additional filters. A method that returns a prepared query might look like this:

 query = MetaOrder.exclude_non_payable_metaorders(query)

As you can see, this is a class method that also returns a QuerySet .

We will also prepare a couple of variables for conditional constructions and for storing the results of calculations:

 import datetime as dt from typing.decimal import Decimal today = dt.date.today() result = defaultdict(Decimal)

Field preparation ( `annotate` )

Due to the fact that we have to refer to the fields depending on the type of order, we will apply Coalesce . Thus, we will be able to abstract any number of new types of orders in a single field.

Here is the first part of the annotate block:

First annotate

 #     , #      from app.helpers.numbers import ZERO, ONE query_annoted = query.annotate( _payment_pending=Coalesce( 'hotel__payment_pending', 'car__payment_pending', 'upsell__payment_pending', ), _payment_due=Coalesce( 'hotel__payment_due', 'car__payment_due', 'upsell__payment_due', ), _refund=Coalesce( 'hotel__refund', Value(ZERO) ), _refund_currency_rate=Coalesce( 'hotel__refund_currency_rate', Value(ONE) ), _sell=Coalesce( 'hotel__sell', Value(ZERO) ), _sell_currency_rate=Coalesce( 'hotel__sell_currency_rate', Value(ONE) ), )

Coalesce works here with a bang, because hotel orders have several special properties, and in all other cases (additional services and cars) these properties are not important to us. So appear Value(ZERO) for the sums and Value(ONE) for exchange rates. ZERO and ONE are Decimal('0') and Decimal(1) , only in the form of constants. The approach is an amateur, but in our project it is so accepted.

You might ask, why not put some fields one level higher in a meta-order? For example, payment_pending , which is everywhere. Indeed, over time, we transfer such fields into a meta-order, but now the code works well, so such tasks are not our priority.

Another preparation and calculations

Now we need to make some calculations with the sums that we received in the past annotate block. Notice, here you no longer need to be tied to the type of order (except for one exception).

Second annotate

 .annotate( #  _base     _sell_base=( F('_sell') * F('_sell_currency_rate') ), _refund_base=( F('_refund') * F('_refund_currency_rate') ), _payable_base=( F('_sell_base') - F('_refund_base') ), _reporting_currency_rate=Case( When( content_type=Hotel.get_content_type(), then=RawSQL( '(hotel.currency_data->>%s)::numeric', (partner.reporting_currency,), ), ), output_field=DecimalField(), default=Decimal('1'), ), )

The most interesting part of this block is the _reporting_currency_rate field, or the exchange rate to the reference currency at the time of sale. Data on the rates of all currencies to the reference currency for a hotel order are stored in currency_data . This is just JSON. Why do we keep it like this? So historically .

And here, it would seem, why not use F and not substitute the value of the currency of the contract? That is, it would be cool if you could do this:

 F(f'currency_data__{partner.reporting_currency}')

But f-strings not supported in F Although the fact that Django ORM already has the ability to refer to nested json-fields is very pleasing - F('currency_data__USD') .

And the last annotate block is the calculation _payable_in_cur , which will be summed up for all orders. This value must be in the currency of the contract.

dash2

 .annotate( _payable_in_cur=( F('_payable_base') / F('_reporting_currency_rate') ) )

The peculiarity of the annotate method is that it generates a lot of SELECT something AS something_else constructions that are not directly involved in the query. This can be seen by unloading the SQL query - query.__str__() .

This is the SQL generated by the Django ORM for base_query_annotated . It is quite often necessary to read it to understand where it is possible to optimize request.

Final calculations

There will be a small aggregate wrapper so that in the future, if a partner needs some other metric, it can be easily added.

dash3

 def _get_data_from_query(query: QuerySet) -> Decimal: result = query.aggregate( _sum_payable=Sum(F('_payable_in_cur')), ) return result['_sum_payable'] or ZERO

And one more thing - this is the last filtering by business condition, for example, we need all the orders that need to be paid soon.

dash4

 before_payment_pending_query = _get_data_from_query( base_query_annotated.filter(_payment_pending__gt=today) )

Debugging and checking

A very convenient way to verify the correctness of a created request is to verify it with a more readable version of calculations.

 for morder in query: payable = morder.get_payable_in_cur() payment_pending = morder.get_payment_pending() if payment_pending > today: result['payment_pending'] += payable

Do you know the method "in the forehead"?

Final code

As a result, we got something like the following:

Final code

 def _get_data_from_query(query: QuerySet) -> tuple: result = query.aggregate( _sum_payable=Sum(F('_payable_in_cur')), ) return result['_sum_payable'] or ZERO def output(partner_id: int): today = dt.date.today() partner = query_get_one(Partner.objects.filter(id=partner_id)) query = MetaOrder.objects.filter(partner=partner, content_type__in=[ Hotel.get_content_type(), Car.get_content_type(), Upsell.get_content_type(), ]) result = defaultdict(Decimal) query_annoted = query.annotate( _payment_pending=Coalesce( 'hotel__payment_pending', 'car__payment_pending', 'upsell__payment_pending', ), _payment_due=Coalesce( 'hotel__payment_due', 'car__payment_due', 'upsell__payment_due', ), _refund=Coalesce( 'hotel__refund', Value(ZERO) ), _refund_currency_rate=Coalesce( 'hotel__refund_currency_rate', Value(Decimal('1')) ), _sell=Coalesce( 'hotel__sell', Value(ZERO) ), _sell_currency_rate=Coalesce( 'hotel__sell_currency_rate', Value(Decimal('1')) ), ).annotate( # Calculated fields _sell_base=( F('_sell') * F('_sell_currency_rate') ), _refund_base=( F('_refund') * F('_refund_currency_rate') ), _payable_base=( F('_sell_base') - F('_refund_base') ), _reporting_currency_rate=Case( # Only hotels have currency_data, therefore we need a # check and default value When( content_type=Hotel.get_content_type(), then=RawSQL( '(hotel.currency_data->>%s)::numeric', (partner.reporting_currency,), ), ), output_field=DecimalField(), default=Decimal('1'), ), ) .annotate( _payable_in_cur=( F('_payable_base') / F('_reporting_currency_rate') ) ) before_payment_pending_query = _get_data_from_query( base_query_annotated.filter(_payment_pending__gt=today) ) after_payment_pending_before_payment_due_query = _get_data_from_query( base_query_annotated.filter( Q(_payment_pending__lte=today) & Q(_payment_due__gt=today) ) )

This is how it works now:

timing_after

count_after

findings

By rewriting and optimizing the logic, we managed to make a fairly fast handle for partner metrics and greatly reduce the number of queries to the database. The solution was good and we will reuse this logic in other parts of the project. ORM is our everything.

Write comments, ask questions - we will try to answer! Thank!

Source: https://habr.com/ru/post/447706/

All Articles

Not ORM single

Not ORM single

Prehistory

Solution options

"Head-on"

Wisely

Query plan

Initial conditions

Field preparation ( annotate )

Another preparation and calculations

Final calculations

Debugging and checking

Final code

findings

More articles:

Field preparation ( `annotate` )