
Django 3.0 will be asynchronous

Andrew Godwin published DEP 0009: Async-capable Django on May 9, and the Django technical board approved it on July 21, so there is hope that something interesting will be ready by the time Django 3.0 is released. It was already mentioned somewhere in the comments on Habr, but I decided to bring this news to a wider audience by translating it, primarily for those who, like me, do not follow Django news closely.



Asynchronous Python has been in development for many years, and in the Django ecosystem we experimented with it in Channels, with a primary focus on WebSocket support.


As the ecosystem developed, it became apparent that while there was no urgent need to extend Django to support non-HTTP protocols such as WebSockets, asynchronous support would bring many benefits to the traditional Django model-view-template framework.


The benefits are described in the Motivation section below, but the general conclusion I came to is that we gain so much from an asynchronous Django that it is worth the hard work it takes. I also believe it is very important to make the changes in an iterative, community-supported way that does not depend on one or two long-time contributors who might burn out.


Although this document is classified as a Feature DEP, all of this means that it is also partly a Process DEP. The scope of the changes proposed below is incredibly large, and attempting them as a traditional single-feature process is likely to fail.


Of course, throughout this document, it is important to remember the Django philosophy, which is to keep everything safe and backward compatible. The plan is not to remove synchronous Django - the plan is to keep it in its current form, but add asynchrony as an option for those who believe that they need additional performance or flexibility.


Is this a gigantic job? Of course. But I feel that this allows us to significantly change the future of Django - we have the opportunity to take a proven framework, and an incredible community, and introduce a completely new set of options that were previously impossible.


The web has changed, and Django should change with it, but in keeping with our ideals: being approachable, secure by default, and flexible as projects grow and their needs change. In a world of cloud data stores, service-oriented architecture, and backends as the foundation of complex business logic, the ability to do things concurrently is key.


This DEP outlines a plan that I think will lead us there. This is a vision that I really believe in and with which I will work to help do everything possible. At the same time, careful analysis and skepticism are justified; I ask for your constructive criticism, as well as your trust. Django relies on a community of people and the applications they create, and if we need to determine the path to the future, we must do it together.


Short description


We are going to add support for asynchronous views, middleware, the ORM, and other important elements to Django.


This will be done by running synchronous code in threads, gradually replacing it with asynchronous code. Synchronous APIs will continue to exist and be fully supported, and over time will turn into synchronous wrappers for initially asynchronous code.


ASGI mode will run Django as a native asynchronous application. WSGI mode will run a separate event loop each time Django is called, so that the asynchronous layer stays compatible with a synchronous server.


Multithreading around ORM is complex and requires a new concept of connection contexts and sticky threads to run synchronous ORM code.


Many parts of Django will continue to work synchronously, and our priority will be to support users writing views in both styles, letting them choose the best style for the view they are working on.


Some features, such as templates and caching, will need their own separate DEPs and research on how to make them fully asynchronous. This DEP focuses mainly on the HTTP/middleware/view flow and the ORM.


There will be full backward compatibility. The standard Django 2.2 project should run in asynchronous Django (be it 3.0 or 3.1) without change.


This proposal is focused on the implementation of small, iterative parts with their gradual placement in the master branch to avoid problems with the long-lived fork and allow us to change course as problems are discovered.


This is a good opportunity to attract new contributors. We should fund the project so that this happens faster. The funding should be on a scale that we are not used to.


Specification


The overall goal is to make every part of Django that can block (that is, anything that is not just CPU-bound computation) asynchronous, so that it runs in an asynchronous event loop without blocking.


This includes the following features:



However, this does not include such things as internationalization, which will not bring any performance gain, since this is a CPU-bound task that also runs quickly, or migrations that are single-threaded when launched through the management command.


Each individual feature that becomes asynchronous internally will also provide a synchronous interface that is backward compatible with the current API (in 2.2) for the foreseeable future. We could change these interfaces over time to improve them, but the synchronous APIs are not going anywhere.


An overview of how this is technically achieved is given below, followed by implementation details for specific areas. It does not cover every feature of Django exhaustively, but if we achieve this initial goal, we will cover almost all use cases.


The final part of this section, “Procedure,” also discusses how these changes can be implemented gradually and by several groups of developers in parallel, which is important for completing these changes with the help of volunteers in a reasonable amount of time.


Technical overview


The principle that allows us to maintain synchronous and asynchronous implementations in parallel is the ability to run one style inside another.


Each feature will go through three stages of implementation:



Asynchronous wrapper


First, the existing synchronous code will be wrapped in an asynchronous interface, which runs the synchronous code in the thread pool. This will allow us to design and provide an asynchronous interface relatively quickly, without having to rewrite all available code for asynchrony.


The toolkit for this is already available in asgiref as a function sync_to_async , which supports things like exception handling or threadlocals (more on this below).
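
The idea of an asynchronous wrapper around synchronous code can be sketched with the standard library alone. This is only an illustration of the technique, not asgiref's implementation: the helper name to_async and the function blocking_lookup are hypothetical, and the real sync_to_async additionally deals with things like threadlocals.

```python
import asyncio
import functools
import threading
import time


def to_async(func):
    """Hypothetical helper: expose a blocking function as a coroutine function."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        # Run the blocking call in the default thread pool so the event loop stays free.
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(None, call)
    return wrapper


def blocking_lookup(key):
    # Stand-in for a blocking operation such as a database query.
    time.sleep(0.05)
    on_main_thread = threading.current_thread() is threading.main_thread()
    return key.upper(), on_main_thread


async def main():
    lookup = to_async(blocking_lookup)
    # The two calls can overlap because each runs in a worker thread.
    return await asyncio.gather(lookup("foo"), lookup("bar"))


results = asyncio.run(main())
```

Each result records that the blocking function ran outside the main thread, which is why thread-sensitive code needs the extra care described next.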


Running code in threads will most likely not improve performance. The added overhead will probably slow things down slightly when you are just running ordinary linear code, but it will let developers start running things concurrently and get used to the new features.


In addition, there are several parts of Django that are sensitive to starting in the same thread upon repeated access; for example, processing transactions in a database. If we wrapped some code in atomic() that would then access the ORM through random threads taken from the pool, the transaction would have no effect, since it is bound to a connection inside the thread in which the transaction was started.


In such situations, a “sticky thread” is required, in which the asynchronous context calls all the synchronous code sequentially in the same thread instead of pushing it into the thread pool, preserving the correct behavior of the ORM and other thread-sensitive parts. All parts of Django that we suspect need this, including the entire ORM, will use a version of sync_to_async that takes this into account, so everything is safe by default. Users will be able to selectively disable this to run queries concurrently; for more details see "ORM" below.
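
A rough way to picture a sticky thread is a thread pool with exactly one worker: every synchronous call submitted to it runs on the same thread, one after another. The sketch below assumes this simplification; connection_state, begin_transaction, and run_query are hypothetical stand-ins for a per-thread database connection, not Django code.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# One worker means every submitted call runs on the same thread, in order.
sticky_executor = ThreadPoolExecutor(max_workers=1)
connection_state = threading.local()  # stand-in for a per-thread DB connection


def begin_transaction():
    connection_state.in_transaction = True
    return threading.get_ident()


def run_query():
    # The transaction is visible only on the thread that started it.
    in_transaction = getattr(connection_state, "in_transaction", False)
    return threading.get_ident(), in_transaction


async def main():
    loop = asyncio.get_running_loop()
    started_on = await loop.run_in_executor(sticky_executor, begin_transaction)
    ran_on, in_transaction = await loop.run_in_executor(sticky_executor, run_query)
    return started_on == ran_on, in_transaction


same_thread, in_transaction = asyncio.run(main())
sticky_executor.shutdown()
```

With a multi-worker pool the second call could land on a different thread and would not see the transaction, which is the failure mode the previous paragraph describes.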


Asynchronous implementation


The next step is to rewrite the implementation of the feature as asynchronous code and then expose the synchronous interface through a wrapper that runs the asynchronous code in a one-off event loop. This is already available in asgiref as the function async_to_sync.


It is not necessary to rewrite every feature at once in order to jump quickly to the third stage. We can focus our efforts on the parts that we can do well and that have third-party library support, while helping the rest of the Python ecosystem with the things that need more work to become natively asynchronous; this is discussed below.
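
The reverse direction, a synchronous wrapper around an asynchronous implementation, can be sketched in a similar way. The names to_sync and fetch_value_async are hypothetical, and this is a simplification of what asgiref's async_to_sync does: asyncio.run refuses to start if an event loop is already running in the current thread, a case the real helper has to handle.

```python
import asyncio
import functools


def to_sync(async_func):
    """Hypothetical helper: expose a coroutine function as a blocking function."""
    @functools.wraps(async_func)
    def wrapper(*args, **kwargs):
        # asyncio.run creates a fresh event loop, runs the coroutine, then closes the loop.
        return asyncio.run(async_func(*args, **kwargs))
    return wrapper


async def fetch_value_async(key):
    await asyncio.sleep(0)  # stand-in for real asynchronous I/O
    return f"value-for-{key}"


# Synchronous callers keep a plain function call with no await.
fetch_value = to_sync(fetch_value_async)
result = fetch_value("foo")
```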


This general approach works for almost all Django features that should become asynchronous, except in places where Python does not provide asynchronous equivalents of language features that we already use. There, the result will be either a change in how Django presents its API in asynchronous mode, or work with the Python core developers to help develop Python's asynchronous features.


Threadlocals


One of the basic details of the Django implementation that needs to be mentioned separately from most of the functions described below is threadlocals. As the name implies, threadlocals work within a thread, and although Django keeps the HttpRequest object outside of threadlocal, we put several other things in it - for example, database connections or the current language.


Using threadlocals can be divided into two options:



At first glance, it might seem that “context locals” can be handled with the new contextvars module in Python, but Django 3.0 will still have to support Python 3.6, while this module appeared in 3.7. In addition, contextvars is specifically designed to drop the context when switching, for example, to a new thread, while we need to preserve these values so that sync_to_async and async_to_sync can work properly as wrappers. Once Django supports only 3.7 and newer, we might consider using contextvars, but that would require a lot of work in Django.


This has already been resolved with asgiref Local , which is compatible with coroutines and threads. Now it does not use contextvars , but we can switch it to work with backport for 3.6 after some testing.
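
The contextvars behavior mentioned above can be observed directly on Python 3.7 or newer. In this small sketch (the variable current_language is just an example, and the result assumes a standard CPython build), a plainly started thread does not see a value set in the parent, while a thread that runs inside an explicitly copied context does:

```python
import contextvars
import threading

current_language = contextvars.ContextVar("current_language", default="en")


def read_language(out, key):
    out[key] = current_language.get()


current_language.set("de")
seen = {}

# A plainly started thread gets an empty context, so it falls back to the default.
plain = threading.Thread(target=read_language, args=(seen, "plain_thread"))
plain.start()
plain.join()

# Copying the context explicitly carries the value into the other thread.
ctx = contextvars.copy_context()
copied = threading.Thread(target=ctx.run, args=(read_language, seen, "copied_context"))
copied.start()
copied.join()
```

Wrappers such as sync_to_async need the second behavior, which is why values have to be carried across explicitly.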


True threadlocals, on the other hand, can simply continue to work in the current thread. However, we must be more careful to prevent such objects from leaking into another thread; when a view no longer runs in a single thread but spawns a thread for each ORM call (during the “synchronous implementation, asynchronous wrapper” phase), some things that were possible in synchronous mode will not be possible in asynchronous mode.


This will require special attention and the prohibition of some previously possible operations in asynchronous mode; the cases we know of are described below in specific sections.


Simultaneous support for synchronous and asynchronous interfaces


One of the big problems that we will encounter when trying to port Django is that Python does not allow you to make synchronous and asynchronous versions of a function with the same name.


This means that you cannot simply build an API that works something like this:


    # Synchronous call
    value = cache.get("foo")
    # Asynchronous call
    value = await cache.get("bar")

This is an unfortunate limitation of the way Python implements asynchrony, and there is no obvious workaround. When something is called, it does not know whether it will be awaited or not, so there is no way to determine what it needs to return.


(Note: this is because Python implements asynchronous functions as “a synchronous callable that returns a coroutine,” rather than something like “calling the __acall__ method on an object.” Asynchronous context managers and iterators do not have this problem, because they have separate methods __aiter__ and __aenter__ .)
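
The note above is easy to check: calling an async def function runs none of its body and simply returns a coroutine object, so the callee cannot know whether the caller will await it. A minimal illustration, with get_async as a made-up function:

```python
import asyncio
import inspect


async def get_async(key):
    return f"value-for-{key}"


# Calling the function does not run its body; it only returns a coroutine object.
pending = get_async("foo")
is_coroutine = inspect.iscoroutine(pending)

# The body runs only once something awaits or otherwise drives the coroutine.
value = asyncio.run(pending)
```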


With this in mind, we must keep the namespaces of the synchronous and asynchronous implementations separate so that they do not conflict. We could do this with a keyword argument such as sync=True, but that leads to confusing function and method bodies, rules out the use of async def, and makes it easy to forget the argument. Accidentally calling a synchronous method when you meant to call it asynchronously is dangerous.


The proposed solution for most places in the Django code base is to provide a suffix for names of asynchronous implementations of functions - for example, cache.get_async in addition to synchronous cache.get . Although this is an ugly solution, it makes it very easy to detect errors when viewing code (you should use await with the _async method).
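
As a sketch of what the suffix convention could look like, here is a hypothetical in-memory cache with a synchronous get and an asynchronous get_async side by side. The class is invented for illustration and is not Django's cache API.

```python
import asyncio


class InMemoryCache:
    """Hypothetical cache used only to illustrate the _async naming convention."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # Synchronous API: a plain call returns the value.
        return self._data.get(key, default)

    async def get_async(self, key, default=None):
        # Asynchronous API: same behavior, but it must be awaited.
        await asyncio.sleep(0)  # stand-in for non-blocking I/O
        return self._data.get(key, default)


async def read_both(cache):
    return cache.get("foo"), await cache.get_async("foo")


cache = InMemoryCache()
cache.set("foo", 42)
sync_value, async_value = asyncio.run(read_both(cache))
```

In review, any call ending in _async that is not preceded by await stands out immediately.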


Views and HTTP Handling


Views are probably where asynchrony is most useful, and they are where we expect most users to choose between asynchronous and synchronous code.


Django will support two kinds of views:



They will be handled by BaseHandler , which will check the view received from the URL resolver and call it accordingly. The base handler should be the first part of Django to become asynchronous, and we will need to modify the WSGI handler to call it in its own event loop using async_to_sync .
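
A handler that accepts both kinds of view might dispatch roughly as follows. This is a sketch under simplifying assumptions, not BaseHandler itself: call_view and both views are hypothetical, and the synchronous view is sent to the default thread pool rather than a sticky thread.

```python
import asyncio
import inspect


def sync_view(request):
    return f"sync response for {request}"


async def async_view(request):
    await asyncio.sleep(0)  # stand-in for asynchronous work
    return f"async response for {request}"


async def call_view(view, request):
    """Hypothetical dispatch: await async views, run sync views in a thread."""
    if inspect.iscoroutinefunction(view):
        return await view(request)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, view, request)


async def main():
    return [
        await call_view(sync_view, "/a"),
        await call_view(async_view, "/b"),
    ]


responses = asyncio.run(main())
```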


Middleware or settings such as ATOMIC_REQUESTS, which wrap views in code that is not async-safe (for example, an atomic() block), will continue to work, but they will affect performance (for example, by preventing parallel ORM calls inside a view wrapped in atomic()).


The existing StreamingHttpResponse class will be modified to be able to accept either a synchronous or asynchronous iterator, and then its internal implementation will always be asynchronous. Similarly for FileResponse . Since this is a potential backward incompatibility point for third-party code that directly accesses Response objects, we still need to provide a synchronous __iter__ for the transition period.
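
One possible way to accept either kind of iterator while keeping the internals asynchronous is a small adapter. The function as_async_iterator below is a hypothetical sketch, not the planned StreamingHttpResponse code; note that a slow synchronous iterator would still block the event loop between chunks.

```python
import asyncio


async def as_async_iterator(content):
    """Hypothetical adapter: yield chunks from a sync or an async iterable."""
    if hasattr(content, "__aiter__"):
        async for chunk in content:
            yield chunk
    else:
        for chunk in content:
            yield chunk


async def async_chunks():
    for chunk in (b"c", b"d"):
        await asyncio.sleep(0)  # stand-in for waiting on a data source
        yield chunk


async def collect(content):
    return [chunk async for chunk in as_async_iterator(content)]


from_sync = asyncio.run(collect(iter([b"a", b"b"])))
from_async = asyncio.run(collect(async_chunks()))
```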


WSGI will continue to be supported by Django indefinitely, but the WSGI handler will move on to running asynchronous middleware and views in its own one-time event loop. This is likely to lead to a slight decrease in performance, but in the initial experiments it did not have too much impact.


All asynchronous HTTP functions will work inside WSGI, including long-polling and slow responses, but they will be as inefficient as they are now, taking up a thread / process for each connection. ASGI servers will be the only ones that can efficiently support many concurrent requests, as well as handle non-HTTP protocols, such as WebSocket, for use by extensions like Channels .


Middleware


While the previous section discussed mainly the request / response path, middleware needs a separate section because of the complexity inherent in their current design.


Django middleware is currently arranged as a stack in which each middleware receives a get_response callable that runs the next middleware in order (or the view, for the lowest middleware on the stack). However, we need to support a mixture of synchronous and asynchronous middleware for backward compatibility, and the two types will not be able to call each other natively.


Thus, to make middleware work, we will instead have to initialize each middleware with a placeholder get_response that returns control to the handler, which then handles both passing data between the middleware and the view and propagating exceptions. In a way, this will end up looking internally like the middleware of the Django 1.0 era, although the user-facing API will of course remain the same.


We could deprecate synchronous middleware, but I recommend not doing this any time soon. If and when we reach the end of that deprecation cycle, we could then return the middleware implementation to a purely recursive stack model, as it is now.


ORM


ORM is the largest part of Django in terms of code size and the most difficult to convert to asynchronous.


This is largely due to the fact that the underlying database drivers are synchronous by design, and progress will be slow towards a set of mature, standardized, asynchronous database drivers. Instead, we must design a future in which database drivers will initially be synchronous, and lay the foundation for contributors who will further develop asynchronous drivers iteratively.


Problems with ORM fall into two main categories - threads and implicit blocking.


Threads


The main problem with ORM is that Django is designed around a single global connections object, which magically gives you the right connection for your current thread.


In an asynchronous world, where all coroutines run in the same thread, this is not only annoying but simply dangerous. Without any additional safeguards, a user accessing the ORM as usual risks breaking connection objects by accessing them from several different places.


Fortunately, connection objects are at least portable between threads, although they cannot be called from two threads at the same time. Django already cares about thread-safety for database drivers in the ORM code, and so we have a place to change its behavior to work properly.


We will modify the connections object so that it understands both coroutines and threads, reusing some code from asgiref.local but with additional logic. Connections will be shared between asynchronous and synchronous code that call each other, with context passed through sync_to_async and async_to_sync, and synchronous code will be forced to execute sequentially in one sticky thread, so it cannot run concurrently and break thread safety.


This implies that we need a solution like a context manager to open and close a database connection, like atomic() . This will allow us to provide consistent calls and sticky threads in this context and allow users to create multiple contexts if they want to open multiple connections. It also gives us a potential way to get rid of magical global connections if we want to develop this further.


At the moment, Django does not have connection lifecycle management that is independent of the signals from the handler class, and therefore we will use those signals to create and clean up these “connection contexts”. The documentation will also be updated to make it clearer how to handle connections properly outside the request/response cycle; even with the current code, many users do not know that any long-running management command must periodically call close_old_connections to work correctly.


Backward compatibility means that we must allow users access to connections from any random code at any time, but we will only allow this for synchronous code; we will ensure that the code is wrapped in a “connection context”, if it is asynchronous, from day one.


It might seem like it would be nice to add transaction.atomic() in addition to transaction.atomic() and require the user to run all the code inside one of them, but this can lead to confusion about what happens if you nest one of them inside the other.


Instead, I suggest creating a new db.new_connections() context manager that enables this behavior, and make it create a new connection whenever it is called, and allow arbitrary atomic() nesting inside it.


Each time you enter a new_connections() block, Django sets up a new context with new database connections. All transactions that were started outside the block continue; any ORM calls inside the block work with a new connection to the database and see the database from that connection's point of view. If transaction isolation is enabled in the database, as it usually is by default, this means that new connections inside the block may not see changes made by uncommitted transactions outside it.


In addition, the connections inside this new_connections block can themselves use atomic() to start additional transactions on these new connections. Any nesting of these two context managers is allowed, but each time new_connections is used, previously opened transactions are “suspended” and do not affect ORM calls until the new new_connections block is exited.


An example of how this API might look:


    async def get_authors(pattern):
        # Create a new context to call concurrently
        async with db.new_connections():
            return [
                author.name
                async for author in Authors.objects.filter(name__icontains=pattern)
            ]

    async def get_books(pattern):
        # Create a new context to call concurrently
        async with db.new_connections():
            return [
                book.title
                async for book in Book.objects.filter(name__icontains=pattern)
            ]

    async def my_view(request):
        # Query authors and books concurrently
        task_authors = asyncio.create_task(get_authors("an"))
        task_books = asyncio.create_task(get_books("di"))
        return render(
            request,
            "template.html",
            {
                "books": await task_books,
                "authors": await task_authors,
            },
        )

This is somewhat verbose, but the goal is also to add high-level shortcuts to enable this behavior (and also cover the transition from asyncio.ensure_future in Python 3.6 to asyncio.create_task in 3.7).


With this context manager and “sticky threads” within a single connection context, we guarantee that all code will be as safe by default as we can make it; it is still possible for a user to use a connection in one thread for two different parts of a request using yield, but that is already possible today.


Implicit blocking


Another problem with the current ORM design is that blocking (network-related) operations, in particular reading related fields, are encountered in model instances.


If you take an instance of the model and then access model_instance.related_field , Django will transparently load the contents of the associated model and return it to you. However, this is not possible in asynchronous code - blocking code should not be executed in the main thread, and there is no asynchronous access to attributes.


Fortunately, Django already has a way out of this: select_related, which loads related fields in advance, and prefetch_related for many-to-many relationships. If you use the ORM asynchronously, we will prohibit any implicitly blocking operations, such as lazy attribute access, and instead raise an error telling you to prefetch the field.


This has the added benefit of preventing slow code that executes N queries in a for loop, which is a common mistake among new Django programmers. It does raise the barrier to entry, but remember that asynchronous Django will be optional: users will still be able to write synchronous code if they wish (and the tutorial will encourage this, since it is much harder to make mistakes in synchronous code).
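
The rule "no implicit blocking in async code" can be illustrated with a plain Python descriptor. Everything here is hypothetical (LazyRelated, Book, and the loader are not Django code): the attribute loads lazily when no event loop is running, and raises if it is accessed unloaded from inside a coroutine.

```python
import asyncio


class LazyRelated:
    """Hypothetical lazy attribute that refuses to block inside an event loop."""

    def __init__(self, loader):
        self.loader = loader

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running, so a blocking load is acceptable.
            value = self.loader(instance)
            instance.__dict__[self.name] = value  # later reads skip the descriptor
            return value
        raise RuntimeError(f"{self.name!r} is not loaded; prefetch it before async use")


class Book:
    # The lambda stands in for a blocking database query.
    author = LazyRelated(loader=lambda book: "author loaded by a blocking query")


async def read_author(book):
    try:
        return book.author
    except RuntimeError:
        return "refused"


refused = asyncio.run(read_author(Book()))

prefetched_book = Book()
_ = prefetched_book.author  # load up front, outside the event loop
allowed = asyncio.run(read_author(prefetched_book))
```

The prefetched instance works inside the coroutine because the value is already stored on it, which mirrors what select_related and prefetch_related achieve for real models.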


QuerySet, fortunately, can easily implement asynchronous generators and transparently support both synchronous and asynchronous iteration:


    async def view(request):
        data = []
        async for user in User.objects.all():
            data.append(await extract_important_info(user))
        return await render("template.html", data)

Other


The parts of the ORM that deal with schema changes will not be asynchronous; they should be called only from management commands. Some projects already call them from views, but that is not a good idea anyway.


Templates


The templates are now completely synchronous, and the plan is to leave them that way in the first step. , , DEP.


, Jinja2 , .


, Django , . Jinja2 , , , .


, render_async , render ; , , .



Django — _async - (, get_async , set_async ).


, API sync_to_async , BaseCache .


, thread-safety API , Django, , . , ORM, , .


Forms


, , , ModelForm ORM .


, - clean save , , . , , , DEP.


Email


Django, . send_mail_async send_mail , async - (, mail_admins ).


Django , - SMTP, . , , , , .


Testing


, Django .


ASGI- asgiref.testing.ApplicationCommunicator . assert' .


Django , , . , — , , HTTP event loop, WSGI.


. , , .


, , . async def @async_to_sync , , , Django test runner.


asyncio ( loop' , ) , , , DEBUG=True . — , , .


WebSockets


Django; , Channels , ASGI, .


, Channels, , ASGI.


Procedure


, , . , .


, . , — .


, , . , ORM, , , .


:



; , . , , , , .


, - ; , , . , , Django, Django async-only .


, , DEP , , , email . DBAPI — , core Python , , PEP, .


Motivation


, , , . Django , - , -; .


, - . , — , , .


Python — . - Python , , .


Python asyncio , , . , , , , Django-size .



Django, «»; , , — , Django — .


, . , API , Django - .


, , Django . , Django ORM , , , -.


, , — . - long-poll server-sent events. Django , - .



Django; . , , , .


Django , . ; Django- , , , , , .


, -- Django .


backward compatibility


, . « Django», ; , , .


, , , , API Django, , , Python 3, API, Django Python.


Python


Python . Python, , , .


, Django — - Python — , Python, Python . , , , .



, Django, . , Django , .


— , , , , .


, — , , — Django ( , Python ).


Django?


, Django. , Lawrence Journal-World — , , SPA — , , . , , , , .


, Django , - , — — . , , ; , Django , .


, Django . , . , , , — , , .


Justification


Django django-developers , , , , DEP.


, , , :



Channels DEP ; Django , .


, , . DEP , , — Django .


. Django, , , WSGI , event loop , . 10% — , . , .


, , (- , Python ). , Django ; , , , Django master- .


, ( ORM, ..), ; Python.


, , Django, «» . , , , , .


Alternatives


, , , .


_async


, , (, django.core.cache.cache.get_async ), :


    from django.core.cache_async import cache
    cache.get("foo")

, ; , , .


, , ; .


Django


- , ; , , , — .


, , , — , .


Channels


, Channels , «» Django. , - , , Django; ORM, HTTP/middleware flow .


asyncio


event loop' Python, asyncio , Django. await async Python event loop .


, asyncio , ; Django , , . Django ; , , async runtime, , .


Greenlets/Gevent


Gevent, , Python.


, . yield await , API, Django, , . , .


, , , . greenlet-safe Django ORM - new-connection-context, .


, . Django « » gevent, , , , .


Financing


DEP.


, — , — , ( ).


, , - . Django Fellows ; — , ( ), , , - .


— , Kickstarter migrations contrib.postgres , MOSS (Mozilla) Channels . , Django, , .


, . — Python, Django — — . , Django/async, .


HTTP/middleware/view flow, , , , « », .


, , , ( , Fellows, / , Channels, , ), , .


, , , , Django, .


backward compatibility


, , , , API.


, , , , HTTP/middleware flow. , API, APM, .


, , Django , , . , ORM , , — , ORM .



DEP , ; Django .


, asgiref , , . Django Django.


Channels , Django, Django.


Copyright


This document has been placed in the public domain per the Creative Commons CC0 1.0 Universal license.



Source: https://habr.com/ru/post/461493/

