Analysis of some Python ORM overhead costs

Introduction
While developing a Python/Django application, I ran into unacceptably slow performance.
After several attempts to speed up the fairly complex calculation algorithms, I noticed that even substantial improvements to them produced only a very modest effect, from which I concluded that the bottleneck was not in the algorithms.
Subsequent analysis showed that the main unproductive consumer of CPU resources was the Django ORM used to access the data needed in the calculations.
Intrigued, I decided to check what the overhead of using an ORM actually is. For the measurement I took the most elementary operation: getting the username of the first (and only) user in a freshly created database.
MySQL running on localhost (MyISAM tables) was used as the database.
As the reference “sample to imitate”, I used code that uses Django specifics minimally and obtains the required value almost optimally:
def test_native():
    import datetime
    from django.db import connection, transaction

    cursor = connection.cursor()

    # One field, a single cursor reused for every request.
    t1 = datetime.datetime.now()
    for i in range(10000):
        cursor.execute("select username from auth_user limit 1")
        f = cursor.fetchone()
        u = f[0]
    t2 = datetime.datetime.now()
    print "native req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.

    # All the fields of auth_user, still reusing the same cursor.
    t1 = datetime.datetime.now()
    for i in range(10000):
        cursor.execute("select username,first_name,last_name,email,password,is_staff,is_active,is_superuser,last_login,date_joined from auth_user limit 1")
        f = cursor.fetchone()
        u = f[0]
    t2 = datetime.datetime.now()
    print "native (1) req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.

    # All the fields, plus a new cursor created on every iteration.
    t1 = datetime.datetime.now()
    for i in range(10000):
        cursor = connection.cursor()
        cursor.execute("select username,first_name,last_name,email,password,is_staff,is_active,is_superuser,last_login,date_joined from auth_user limit 1")
        f = cursor.fetchone()
        u = f[0]
    t2 = datetime.datetime.now()
    print "native (2) req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.
The result of this code:
>>> test_native()
native req/seq: 8873.05935101 req time (ms): 0.1127007
native (1) req/seq: 5655.73751948 req time (ms): 0.1768116
native (2) req/seq: 3815.78751558 req time (ms): 0.2620691
Thus, the optimal “sample” gives about eight and a half thousand requests to the database per second.
Django and other ORMs usually drag in all the other attributes of an object when fetching it from the database. As is easy to see, dragging along the rest of the table's fields noticeably worsened the result: down to about five and a half thousand requests per second. This deterioration is relative, however, since calculations often need more than one field anyway.
Obtaining a new cursor turned out to be quite expensive: it takes about 0.1 ms and slows code execution down almost 1.5 times.
Let us take the second result of the sample as a baseline and see what overhead the ORMs add to these figures.
Django ORM
We will run several variants of the query, starting with the simplest one and progressively applying Django's tools for query optimization.
First we will run the most straightforward code for obtaining the required attribute, starting directly from the User model.
Then we will try to improve the result by preparing the query object outside the loop.
Next, we recall the only() method and try to improve the result further.
Finally, as one more attempt to improve the situation, we use the values() method, which eliminates the need to construct a model object.
Here is the final code that verifies the result of our efforts:
def test_django():
    import datetime
    from django.contrib.auth.models import User

    # 1. The most straightforward variant.
    t1 = datetime.datetime.now()
    for i in range(10000):
        u = User.objects.all()[0].username
    t2 = datetime.datetime.now()
    print "django req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.

    # 2. The queryset is prepared outside the loop.
    t1 = datetime.datetime.now()
    q = User.objects.all()
    for i in range(10000):
        u = q[0].username
    t2 = datetime.datetime.now()
    print "django (1) req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.

    # 3. only() restricts the set of loaded fields.
    t1 = datetime.datetime.now()
    q = User.objects.all().only('username')
    for i in range(10000):
        u = q[0].username
    t2 = datetime.datetime.now()
    print "django (2) req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.

    # 4. values() returns dictionaries, so no model object is created.
    t1 = datetime.datetime.now()
    q = User.objects.all().values('username')
    for i in range(10000):
        u = q[0]['username']
    t2 = datetime.datetime.now()
    print "django (3) req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.
The results are discouraging:
>>> test_django()
django req/seq: 1106.3929598 req time (ms): 0.903838
django (1) req/seq: 1173.20686476 req time (ms): 0.8523646
django (2) req/seq: 695.949871009 req time (ms): 1.4368851
django (3) req/seq: 1383.74156246 req time (ms): 0.7226783
First, using the ORM by itself degraded performance by more than 5 times (!) compared even with the non-optimal “sample”. Moving query preparation outside the loop improved the result only slightly (by less than 10%). The use of only(), however, spoiled the picture completely: instead of the expected improvement, the result got almost 2 times worse. Interestingly, eliminating the creation of the model object (with values()) raised throughput by about 20%.
Thus, the Django ORM adds an overhead of approximately 0.7226783 - 0.1768116 = 0.5458667 ms per object retrieved.
Omitting further experiments, which required creating additional models and tables, I will simply report that these results also hold for retrieving lists of objects: for each individual object in a collection returned by a single query, the loss is about half a millisecond or more. With MySQL these losses translate into code running more than 5 times slower.
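For completeness, the kind of list-retrieval comparison described above could be sketched as follows (a hypothetical snippet, not the article's measured code; it assumes auth_user contains a non-trivial number of rows):

import datetime
from django.db import connection
from django.contrib.auth.models import User

def test_list(iterations=1000):
    cursor = connection.cursor()

    # ORM: one query, one model instance per row.
    t1 = datetime.datetime.now()
    for i in range(iterations):
        users = list(User.objects.all())
    t2 = datetime.datetime.now()
    print "orm list, total (s):", (t2 - t1).total_seconds()

    # Raw cursor: one query, plain tuples, no object construction.
    t1 = datetime.datetime.now()
    for i in range(iterations):
        cursor.execute("select * from auth_user")
        rows = cursor.fetchall()
    t2 = datetime.datetime.now()
    print "raw list, total (s):", (t2 - t1).total_seconds()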
SQLAlchemy
For SQLAlchemy, I created a class AUser that declaratively describes a data structure corresponding to the standard django.contrib.auth.models.User.
For maximum performance, after a thoughtful reading of the documentation and some experiments, a simple query cache was used:
from sqlalchemy import create_engine

query_cache = {}
engine = create_engine('mysql://testalchemy:testalchemy@127.0.0.1/testalchemy',
                       execution_options={'compiled_cache': query_cache})
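The AUser mapping itself is not shown here; a reconstruction of such a declaration might look roughly like the following sketch (the column types, the metadata binding and the session setup are my assumptions, not the article's code):

from sqlalchemy import Column, Integer, String, Boolean, DateTime
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class AUser(Base):
    # Mirrors the table created by django.contrib.auth; column sizes are approximate.
    __tablename__ = 'auth_user'

    id = Column(Integer, primary_key=True)
    username = Column(String(30))
    first_name = Column(String(30))
    last_name = Column(String(30))
    email = Column(String(75))
    password = Column(String(128))
    is_staff = Column(Boolean)
    is_active = Column(Boolean)
    is_superuser = Column(Boolean)
    last_login = Column(DateTime)
    date_joined = Column(DateTime)

# Bind the metadata to the engine so that plain select objects can be executed
# implicitly (sel.execute() in the test below), and open a session for the ORM queries.
Base.metadata.bind = engine
session = sessionmaker(bind=engine)()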
Performance is tested first on the straightforward, “head-on” way of accessing the object.
Then an optimization is attempted by moving query preparation out of the loop.
Next, an optimization is attempted by eliminating the creation of the target object.
Finally, the query is optimized further by limiting the set of requested fields.
def test_alchemy():
    import datetime

    # The straightforward, head-on variant.
    t1 = datetime.datetime.now()
    for i in range(10000):
        u = session.query(AUser)[0].username
    t2 = datetime.datetime.now()
    print "alchemy req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.

    # The query object is prepared outside the loop.
    q = session.query(AUser)
    t1 = datetime.datetime.now()
    for i in range(10000):
        u = q[0].username
    t2 = datetime.datetime.now()
    print "alchemy (2) req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.

    # A plain Core select: no ORM object is constructed for the row.
    from sqlalchemy.sql import select
    table = AUser.__table__
    sel = select([table], limit=1)
    t1 = datetime.datetime.now()
    for i in range(10000):
        u = sel.execute().first()['username']
    t2 = datetime.datetime.now()
    print "alchemy (3) req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.

    # The same select restricted to the single required column.
    table = AUser.__table__
    sel = select(['username'], from_obj=table, limit=1)
    t1 = datetime.datetime.now()
    for i in range(10000):
        u = sel.execute().first()['username']
    t2 = datetime.datetime.now()
    print "alchemy (4) req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.
Here are the test results:
>>> test_alchemy()
alchemy req/seq: 512.719730527 req time (ms): 1.9503833
alchemy (2) req/seq: 526.34332554 req time (ms): 1.8999006
alchemy (3) req/seq: 1341.40897306 req time (ms): 0.7454848
alchemy (4) req/seq: 1995.34167532 req time (ms): 0.5011673
In the first two cases SQLAlchemy did not cache the queries, even though they were identical (I found out why, but for now the developers suggest working around it with a kind of plug that they promise to build into the code later; I did not go that route). With cached queries, SQLAlchemy comes out 30-35% ahead of the Django ORM in performance.
I should note right away that the SQL generated by the Django ORM and by SQLAlchemy is almost identical and introduces minimal distortion into the test.
A quick-and-dirty ORM
Naturally, after such results we rewrote all the code that fetched data for the processing algorithms to use direct queries. Working with code that issues raw queries is inconvenient, so we wrapped the most frequent operations in a simple class that does an ORM-like job:
class S:
    def __init__(self, **kw):
        self.__dict__.update(kw)

    @classmethod
    def list_from_cursor(cls, cursor):
        # One plain object per row, attributes named after the result columns.
        return [cls(**dict(zip([col[0] for col in cursor.description], row)))
                for row in cursor.fetchall()]

    @classmethod
    def from_cursor(cls, cursor):
        row = cursor.fetchone()
        if row:
            return cls(**dict(zip([col[0] for col in cursor.description], row)))

    def __str__(self):
        return str(self.__dict__)

    def __repr__(self):
        return str(self)

    def __getitem__(self, ind):
        return getattr(self, ind)
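For illustration, outside the benchmark the class is used roughly like this (a hypothetical snippet, not part of the measured code):

from django.db import connection

cursor = connection.cursor()
cursor.execute("select id, username, email from auth_user limit 10")
for user in S.list_from_cursor(cursor):
    # Both attribute access and item access work on the resulting objects.
    print user.username, user['email']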
We measure the performance losses brought about by the use of this class.
def test_S():
    import datetime
    from django.db import connection, transaction
    import util

    cursor = connection.cursor()

    # All the fields are fetched and wrapped into an S object.
    t1 = datetime.datetime.now()
    for i in range(10000):
        cursor.execute("select * from auth_user limit 1")
        u = util.S.from_cursor(cursor).username
    t2 = datetime.datetime.now()
    print "S req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.

    # Only the required field is fetched.
    t1 = datetime.datetime.now()
    for i in range(10000):
        cursor.execute("select username from auth_user limit 1")
        u = util.S.from_cursor(cursor).username
    t2 = datetime.datetime.now()
    print "S opt req/seq:", 10000/(t2-t1).total_seconds(), 'req time (ms):', (t2-t1).total_seconds()/10.
Test results:
>>> test_S()
S req/seq: 4714.92835902 req time (ms): 0.2120923
S opt req/seq: 7473.3388636 req time (ms): 0.133809
As you can see, the losses are very modest: 0.2120923 - 0.1768116 = 0.0352807 ms in the non-optimal case and 0.133809 - 0.1127007 = 0.0211083 ms in the optimal one. Note that our quick-and-dirty ORM still creates a full-fledged Python object.
General conclusion
Using a powerful general-purpose ORM leads to a very noticeable loss of performance. With fast DBMS engines such as MySQL, data-access performance drops 3-5 times or more. The loss amounts to roughly 0.5 ms or more per access to a single object on an Intel Pentium Dual CPU E2200 @ 2.20GHz.
A significant part of the loss is the construction of an object from the row of data fetched from the database: approximately 0.1 ms. Another 0.1 ms is eaten up by creating the cursor, which is quite hard to avoid in an ORM.
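The object-construction component can be isolated with a micro-benchmark along these lines (hypothetical code, not from the article: it builds unsaved Django model instances from an already-available set of values and never touches the database):

import datetime
from django.contrib.auth.models import User

fields = dict(username='john', first_name='John', last_name='Doe',
              email='john@example.com', password='x',
              is_staff=False, is_active=True, is_superuser=False)

t1 = datetime.datetime.now()
for i in range(10000):
    u = User(**fields)          # construct a model instance, no DB access
t2 = datetime.datetime.now()
print "User() construction, time per object (ms):", (t2 - t1).total_seconds() / 10.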
The origin of the remaining losses is unclear. One can only assume that a sizeable part of them comes from the sheer number of calls made while processing the result, that is, from the layers of abstraction the data passes through.
To achieve adequate performance, ORM developers should keep in mind the losses incurred in passing through layers of abstraction, in building the query, and in other ORM-specific operations. A truly fast ORM should let a developer prepare a parameterized query once and then run it with different parameters at minimal additional cost. One way to implement this is to cache the generated SQL expressions together with the DBMS-specific prepared-statement handles. To my surprise, even though SQLAlchemy implements this optimization, performance still suffers, albeit somewhat less.
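To illustrate the "prepare once, run many times" idea in SQLAlchemy terms, a sketch might look like this, reusing the engine and query_cache defined earlier (the statement and the loop are hypothetical, not the article's measured code):

from sqlalchemy.sql import select, bindparam

# The parameterized statement is built exactly once ...
table = AUser.__table__
sel = select([table.c.username]).where(table.c.id == bindparam('user_id'))

# ... and then executed with different parameter values; thanks to the
# compiled_cache execution option on the engine, the compiled SQL is reused
# instead of being regenerated on every call.
for user_id in (1, 2, 3):
    row = engine.execute(sel, user_id=user_id).first()
    if row:
        print row['username']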
For me personally it remains a mystery where another 0.3-0.4 ms of loss per object read comes from in both ORMs. Tellingly, both ORMs waste almost exactly the same amount of CPU time. This suggests that the losses are caused not by local problems of a particular ORM (such as Django's lack of a prepared-query cache) but by architectural traits that both ORMs probably share. I would be grateful to the community for professional comments.