
Django is one of the most popular web frameworks for Python, and even when a project is built on third-party code, individual parts of the framework are often used on their own, the ORM in particular. In this article I would like to cover the peculiarities of using the Django ORM with a MySQL database, namely transactions and the pitfalls associated with them. If at some point you discover that a query returns something entirely different from the data you expected, this article may help you figure out what is going on.
Throughout, we will talk about InnoDB, since it is the only engine shipped with MySQL that fully supports transactions (BDB does not count, as it has not been supported for a long time).
It is worth noting a number of features:
1. Django uses the MySQLdb extension as its interface to MySQL, and MySQLdb, every time it connects to the database, sets:

AUTOCOMMIT=0

That is, every data-modifying operation must be finished with a COMMIT or ROLLBACK to apply or discard the changes. If you previously used PHP (PDO, Mysqli) or Ruby extensions to access MySQL, this will probably surprise you a little, because almost all other database drivers leave AUTOCOMMIT unchanged on connect (by default MySQL sets AUTOCOMMIT=1).
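The effect of a driver disabling autocommit is easy to demonstrate without MySQL: Python's built-in sqlite3 driver behaves the same way by default, wrapping writes in an implicit transaction that other connections cannot see until commit(). A minimal sketch (illustration of the autocommit concept only, not of MySQLdb itself):

```python
import os
import sqlite3
import tempfile

# A file-backed database so two connections can share it.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
writer.commit()

# With autocommit off, this INSERT opens an implicit transaction.
writer.execute("INSERT INTO test (value) VALUES ('a')")

reader = sqlite3.connect(path)
before = reader.execute("SELECT COUNT(*) FROM test").fetchone()[0]  # 0

writer.commit()  # only now does the row become visible to others
after = reader.execute("SELECT COUNT(*) FROM test").fetchone()[0]   # 1

print(before, after)
```

Forgetting that final commit() is exactly the class of surprise described above: the writing process believes the data is saved, while every other connection still sees the old state.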
2. MySQL uses the REPEATABLE-READ transaction isolation level, unlike, for example, PostgreSQL or Oracle, whose default isolation level is READ-COMMITTED.
What does it mean? Consider a specific example:
CREATE TABLE IF NOT EXISTS `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `value` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `test` VALUES (NULL, 'a');
REPEATABLE-READ
-- 1st transaction:
SET AUTOCOMMIT = 0;
SELECT * FROM `test`;
+----+-------+
| id | value |
+----+-------+
|  1 | a     |
+----+-------+

-- 2nd transaction:
SET AUTOCOMMIT = 0;
INSERT INTO `test` VALUES (NULL, 'b');
SELECT * FROM `test`;
+----+-------+
| id | value |
+----+-------+
|  1 | a     |
|  2 | b     |
+----+-------+
COMMIT;

-- 1st transaction (continued):
INSERT INTO `test` VALUES (NULL, 'c');
SELECT * FROM `test`;
+----+-------+
| id | value |
+----+-------+
|  1 | a     |
|  3 | c     |
+----+-------+
COMMIT;
As the example shows, after the first read in the first transaction, all subsequent reads return exactly the same result until a COMMIT occurs, regardless of what happens in other transactions.
READ-COMMITTED
-- 1st transaction:
SET SESSION tx_isolation = 'READ-COMMITTED';
SET AUTOCOMMIT = 0;
SELECT * FROM `test`;
+----+-------+
| id | value |
+----+-------+
|  1 | a     |
+----+-------+

-- 2nd transaction:
SET SESSION tx_isolation = 'READ-COMMITTED';
SET AUTOCOMMIT = 0;
INSERT INTO `test` VALUES (NULL, 'b');

-- 1st transaction (the insert is not committed yet):
SELECT * FROM `test`;
+----+-------+
| id | value |
+----+-------+
|  1 | a     |
+----+-------+

-- 2nd transaction:
SELECT * FROM `test`;
+----+-------+
| id | value |
+----+-------+
|  1 | a     |
|  2 | b     |
+----+-------+
COMMIT;

-- 1st transaction:
INSERT INTO `test` VALUES (NULL, 'c');
SELECT * FROM `test`;
+----+-------+
| id | value |
+----+-------+
|  1 | a     |
|  2 | b     |
|  3 | c     |
+----+-------+
COMMIT;
In the case of READ-COMMITTED, a SELECT always returns the last committed version of the data.
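The two behaviors above can be sketched in a few lines of Python (a toy model for illustration only, not how MySQL is implemented): under REPEATABLE-READ a transaction freezes a snapshot at its first read, while under READ-COMMITTED every read fetches the latest committed rows.

```python
class Transaction:
    """Toy transaction: models snapshot vs. fresh reads."""

    def __init__(self, table, isolation):
        self.table = table          # shared list of committed rows
        self.isolation = isolation
        self.snapshot = None

    def select(self):
        if self.isolation == 'REPEATABLE-READ':
            if self.snapshot is None:
                self.snapshot = list(self.table)  # freeze on first read
            return self.snapshot
        return list(self.table)     # READ-COMMITTED: always fresh

table = [(1, 'a')]
rr = Transaction(table, 'REPEATABLE-READ')
rc = Transaction(table, 'READ-COMMITTED')
rr.select(); rc.select()            # both read the initial state
table.append((2, 'b'))              # a concurrent transaction COMMITs 'b'

assert rr.select() == [(1, 'a')]              # frozen snapshot
assert rc.select() == [(1, 'a'), (2, 'b')]    # sees the new commit
```

The single `if self.snapshot is None` branch is the whole difference that drives everything discussed in the rest of this article.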
Returning to Django: the trick with the Django ORM is that READ-COMMITTED appears to be the only isolation level its developers had in mind. For example, if we look at the Django source, namely the implementation of the get_or_create() method in the QuerySet class:
def get_or_create(self, **kwargs):
    """
    Looks up an object with the given kwargs, creating one if necessary.
    Returns a tuple of (object, created), where created is a boolean
    specifying whether an object was created.
    """
    assert kwargs, \
            'get_or_create() must be passed at least one keyword argument'
    defaults = kwargs.pop('defaults', {})
    lookup = kwargs.copy()
    for f in self.model._meta.fields:
        if f.attname in lookup:
            lookup[f.name] = lookup.pop(f.attname)
    try:
        self._for_write = True
        return self.get(**lookup), False
    except self.model.DoesNotExist:
        try:
            params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
            params.update(defaults)
            obj = self.model(**params)
            sid = transaction.savepoint(using=self.db)
            obj.save(force_insert=True, using=self.db)
            transaction.savepoint_commit(sid, using=self.db)
            return obj, True
        except IntegrityError, e:
            transaction.savepoint_rollback(sid, using=self.db)
            exc_info = sys.exc_info()
            try:
                return self.get(**lookup), False
            except self.model.DoesNotExist:
                # Re-raise the original IntegrityError.
                raise exc_info[1], None, exc_info[2]
then the second attempt to fetch the object:

return self.get(**lookup), False

will fail every time this situation arises.
Let me explain. Suppose two processes call the get_or_create() method of some model simultaneously. The first process tries to read the row; there is no data, so a DoesNotExist exception is raised. The second process likewise tries to read the row and likewise gets DoesNotExist. Since both connections use AUTOCOMMIT=0 with the REPEATABLE-READ isolation level, both processes have now frozen their view of the data. Suppose the first process successfully creates the record and returns the new object. The second process, however, cannot create anything, since that would violate the uniqueness constraint. The funny thing is that it does not see the object created by the first process either, because re-reading the data returns the "frozen" result.

Of course, this error is rather hard to reproduce under experimental conditions, but under heavy concurrent load this code works unreliably, periodically raising DoesNotExist.
How to deal with it?
1. If you use get_or_create(), write your own method that performs a forced COMMIT before re-reading the data:

@transaction.commit_manually()
def custom_get_or_create(...):
    try:
        obj = SomeModel.objects.create(...)
    except IntegrityError:
        transaction.commit()
        obj = SomeModel.objects.get(...)
    return obj
2. Force the READ-COMMITTED transaction isolation level in the MySQL settings (/etc/mysql/my.cnf):

transaction-isolation = READ-COMMITTED
3. With Django >= 1.2, set the following connection options in settings.py:

DATABASE_OPTIONS = {
    "init_command": "SET storage_engine=INNODB, SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
}
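With the new-style DATABASES setting introduced in Django 1.2, the same option is spelled inside the per-connection OPTIONS dict; a sketch (the database name is illustrative):

```python
# settings.py fragment, new-style configuration (Django >= 1.2).
# 'mydb' and the credentials are placeholders for your own values.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',
        'OPTIONS': {
            'init_command': 'SET storage_engine=INNODB, '
                            'SESSION TRANSACTION ISOLATION LEVEL '
                            'READ COMMITTED',
        },
    },
}
```

The init_command is executed on every new connection, so each session starts at READ-COMMITTED regardless of the server-wide default.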
This bug was reported in the Django bug tracker a long time ago, but the ticket is still open and the problem remains relevant.
Another example: suppose the Django ORM is used apart from a web server, as part of a daemon that stays resident in memory and periodically reads new data from a MySQL table. The experiment can be reproduced with Django's built-in shell:
python manage.py shell
>>> from test_module.models import *
>>> len(SomeModel.objects.all())
10
Next, in a second terminal, add a couple of records:
>>> SomeModel(name='test1').save()
>>> SomeModel(name='test2').save()
>>> len(SomeModel.objects.all())
12
Although the changes were made in the second terminal, the newly added records remain invisible in the first one: the transaction started there has not finished, and after the first read all subsequent reads return the same result until a COMMIT is issued explicitly.
What to do about it? Change the transaction isolation level in the MySQL settings (my.cnf) or in the database connection settings in Django's settings.py. Or forcibly commit after each read:
>>> from django.db import connection, transaction
>>> len(SomeModel.objects.all())
10
>>> transaction.commit_unless_managed()
>>> len(SomeModel.objects.all())
12
Why does all this happen? Probably because Django was originally designed to work with PostgreSQL as its database, where, as noted above, READ-COMMITTED is used out of the box. In any case, this not-quite-standard behavior of the Django ORM on MySQL InnoDB can lead to bugs that are quite hard to track down. Accordingly, in most places where this problem is discussed (various blogs and Stack Overflow), it is strongly recommended to set READ-COMMITTED as the default transaction isolation level, the argument being that READ-COMMITTED also performs better than REPEATABLE-READ.