
Python Dependency Management: A Comparison of Approaches


I have been writing Python for about five years; for the last three I have been developing my own project, and for most of that time a team has been working on it with me. With every release and every new feature we try harder to keep the project from turning into a mess of unmaintainable code: we fight circular imports and mutual dependencies, extract reusable modules, and restructure the codebase.

Unfortunately, the Python community has no universal notion of "good architecture"; there is only the notion of "pythonicity", so we have to come up with the architecture ourselves. Below is a longread with reflections on architecture and, above all, on dependency management as it applies to Python.

django.setup()


I'll start with a question for Django developers: do you often write these two lines?

    import django

    django.setup()

You need to start a file with these lines if you want to work with Django objects without starting the Django web server itself. This applies to models, to the time utilities (django.utils.timezone), to django.urls.reverse, and to much more. If you skip them, you get an error:

 django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet. 
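For reference, a minimal sketch of what such a standalone script looks like; the settings module path and the myapp.models.Link model are illustrative, not from the original project:

    import os

    import django

    # Point django at the settings module before calling setup()
    # (the module path is an assumption).
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')
    django.setup()

    # Models may only be imported after setup() has run.
    from myapp.models import Link  # hypothetical app and model

    print(Link.objects.count())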

I write these two lines constantly. I am a big fan of throwaway code: I like to create a separate .py file, fiddle with something in it, figure it out, and only then embed it in the project.

And this constant django.setup() annoys me a lot. First, you get tired of repeating it everywhere; second, Django initialization takes a few seconds (we have a big monolith), and when you rerun the same file 10, 20, 100 times, it simply slows down development.

How do you get rid of django.setup()? By writing code that depends on django as little as possible.

For example, if we are writing a client for an external API, we can make it depend on django:

    from django.conf import settings

    class APIClient:
        def __init__(self):
            self.api_key = settings.SOME_API_KEY

    # usage:
    client = APIClient()

or we can make it independent of django:

    class APIClient:
        def __init__(self, api_key):
            self.api_key = api_key

    # usage:
    client = APIClient(api_key='abc')

In the second case the constructor is more cumbersome, but any manipulation of this class can be done without loading all the Django machinery.
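Note that inside Django code the key can still come from settings; the point is that only the composition site touches them. A minimal sketch, assuming the same illustrative SOME_API_KEY setting:

    # somewhere in django-aware code, e.g. a view:
    from django.conf import settings

    # the only place that knows about settings; the client itself stays django-free
    client = APIClient(api_key=settings.SOME_API_KEY)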

Tests also get easier. How do you test a component that depends on django.conf.settings? You mock the settings with the @override_settings decorator. And if the component does not depend on anything, there is nothing to mock: pass the parameters to the constructor and off you go.
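A rough sketch of the difference, assuming the two APIClient variants above; the test names and the key value are illustrative:

    from django.test import TestCase, override_settings

    class SettingsDependentTest(TestCase):
        @override_settings(SOME_API_KEY='test-key')
        def test_reads_key_from_settings(self):
            client = APIClient()  # the django-dependent variant
            self.assertEqual(client.api_key, 'test-key')

    # The independent variant needs no django machinery at all:
    def test_takes_key_directly():
        client = APIClient(api_key='test-key')
        assert client.api_key == 'test-key'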

Dependency management


The story of depending on django is the most striking example of a problem I run into every day: dependency management in Python and, more broadly, the architecture of Python applications.

Attitudes toward dependency management in the Python community are mixed. Three main camps can be distinguished:


There are several articles on dependency management in Python (example 1, example 2), but they all boil down to advertising someone's dependency injection framework. This article is a new entry on the same topic, but this time it is a pure thought experiment, with no advertising: an attempt to find a balance between the three approaches above, to do without an extra framework, and to keep the result "pythonic".

I recently read Clean Architecture, and I think I finally understand what dependency injection is worth in Python and how it can be implemented; I saw it play out in my own project. In a nutshell, it is about protecting code from breaking when other code changes.

Initial data


There is an API client that makes HTTP requests to a link-shortener service:

    # shortener_client.py
    import requests

    class ShortenerClient:
        def __init__(self, api_key):
            self.api_key = api_key

        def shorten_link(self, url):
            response = requests.post(
                url='https://fstrk.cc/short',
                headers={'Authorization': self.api_key},
                json={'url': url}
            )
            return response.json()['url']

And there is a module that shortens all the links in a text. To do that, it uses the shortener API client:

    # text_processor.py
    import re

    from shortener_client import ShortenerClient

    class TextProcessor:
        def __init__(self, text):
            self.text = text

        def process(self):
            changed_text = self.text
            links = re.findall(
                r'https?://[^\r\n\t") ]*',
                self.text,
                flags=re.MULTILINE
            )
            api_client = ShortenerClient('abc')
            for link in links:
                shortened = api_client.shorten_link(link)
                changed_text = changed_text.replace(link, shortened)
            return changed_text

The code that drives everything lives in a separate control file (let's call it a controller):

    # controller.py
    from text_processor import TextProcessor

    processor = TextProcessor("""
    Link 1: https://ya.ru
    Link 2: https://google.com
    """)
    print(processor.process())

Everything works: the processor parses the text, shortens the links via the shortener, and returns the result. The dependencies look like this:

(diagram: controller.py depends on text_processor.py, which depends on shortener_client.py)

Problem


Here is the problem: the TextProcessor class depends on the ShortenerClient class, and it breaks when the ShortenerClient interface changes.

How can this happen?

Suppose we decided to track clicks on shortened links and added a callback_url argument to the shorten_link method: the address to which notifications are sent when a link is clicked.

The ShortenerClient.shorten_link method now looks like this:

    def shorten_link(self, url, callback_url):
        response = requests.post(
            url='https://fstrk.cc/short',
            headers={'Authorization': self.api_key},
            json={'url': url, 'callback_on_click': callback_url}
        )
        return response.json()['url']

And what happens? When we try to run the code, we get an error:

 TypeError: shorten_link() missing 1 required positional argument: 'callback_url' 

That is, we changed the shortener, but it is not the shortener that broke; it is its consumer:

(diagram: the change in shortener_client.py breaks text_processor.py)

So what? The calling file broke, we went and fixed it. What's the problem?

If it really takes a minute to go and fix, then it is no problem at all. If the classes contain little code and you maintain them yourself (it is your side project, or they are two small classes of the same subsystem, and so on), you can stop right there.

Problems begin when:

- the classes are written and maintained by different people;
- the module is used in many places, not just one.


If you write the ShortenerClient class and a colleague writes TextProcessor, you get a frustrating situation: you changed your code, but it was someone else's that broke. And it broke in a place you have never seen, and now you have to sit down and untangle someone else's code.

It gets even more interesting when your module is used in several places rather than one, and your edit breaks code across a heap of files.

So the problem can be formulated as follows: how do we organize the code so that when the ShortenerClient interface changes, it is ShortenerClient itself that breaks, and not its consumers (of which there can be many)?

The solution is twofold:

- freeze the interface through which consumers talk to ShortenerClient;
- invert the dependency, so that consumers depend on that interface rather than on the implementation.


(diagram: consumers depend on a frozen interface, which ShortenerClient implements)

Freeze the interface


What does freezing an interface look like in Python? It is an abstract class:

    from abc import ABC, abstractmethod

    class AbstractClient(ABC):
        @abstractmethod
        def __init__(self, api_key):
            pass

        @abstractmethod
        def shorten_link(self, link):
            pass

If we now inherit from this class and forget to implement one of the methods, we get an error:

    class ShortenerClient(AbstractClient):
        def __ini__(self, api_key):  # note the typo: __init__ is still abstract
            self.api_key = api_key

    client = ShortenerClient('123')

    >>> TypeError: Can't instantiate abstract class ShortenerClient with abstract methods __init__, shorten_link

But this is not enough: an abstract class fixes only the names of the methods, not their signatures.
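To make the loophole concrete, here is a sketch (SneakyClient is an illustrative name): the subclass overrides both abstract methods, so Python instantiates it happily, even though the shorten_link signature has drifted.

    class SneakyClient(AbstractClient):
        def __init__(self, api_key):
            self.api_key = api_key

        def shorten_link(self, link, callback_url):  # extra argument, no complaint
            return 'xxx'

    client = SneakyClient('123')  # no TypeError: the ABC checks names only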

We need a second tool to verify signatures. That tool is mypy: it can check the signatures of inherited methods. For that, we add annotations to the interface:

    # shortener_client.py
    from abc import ABC, abstractmethod

    class AbstractClient(ABC):
        @abstractmethod
        def __init__(self, api_key: str) -> None:
            pass

        @abstractmethod
        def shorten_link(self, link: str) -> str:
            pass

    class ShortenerClient(AbstractClient):
        def __init__(self, api_key: str) -> None:
            self.api_key = api_key

        def shorten_link(self, link: str, callback_url: str) -> str:
            return 'xxx'

If we now check this code with mypy, we get an error due to the extra callback_url argument:

    mypy shortener_client.py

    >>> error: Signature of "shorten_link" incompatible with supertype "AbstractClient"

Now we have a reliable way to freeze a class's interface.

Dependency inversion


Having fixed the interface, we must move it elsewhere, to completely remove the consumer's dependence on the shortener_client.py file. For example, we can pull the interface right into the consumer, the file with the TextProcessor:

    # text_processor.py
    import re
    from abc import ABC, abstractmethod

    class AbstractClient(ABC):
        @abstractmethod
        def __init__(self, api_key: str) -> None:
            pass

        @abstractmethod
        def shorten_link(self, link: str) -> str:
            pass

    class TextProcessor:
        def __init__(self, text, shortener_client: AbstractClient) -> None:
            self.text = text
            self.shortener_client = shortener_client

        def process(self) -> str:
            changed_text = self.text
            links = re.findall(
                r'https?://[^\r\n\t") ]*',
                self.text,
                flags=re.MULTILINE
            )
            for link in links:
                shortened = self.shortener_client.shorten_link(link)
                changed_text = changed_text.replace(link, shortened)
            return changed_text

And that reverses the direction of the dependency! Now TextProcessor owns the interaction interface, and as a result ShortenerClient depends on it, not the other way around.

(diagram: TextProcessor owns AbstractClient; ShortenerClient now depends on the interface)

In simple words, the essence of the transformation is this: the consumer stops adapting to the client. Instead, the consumer declares the interface it needs, and now it is the client that must conform.


Multiple consumers


If several modules shorten links, the interface should live not in one of them but in a separate file that sits above the others in the hierarchy:

(diagram: a shared interface file above both the consumer modules and the client implementation)
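A minimal sketch of such a layout, assuming a shared module named interfaces.py (the file name is my choice, not from the original project):

    # interfaces.py - owns the contract, imports nothing from the implementations
    from abc import ABC, abstractmethod

    class AbstractClient(ABC):
        @abstractmethod
        def shorten_link(self, link: str) -> str:
            pass

    # text_processor.py and any other consumer import only the contract:
    from interfaces import AbstractClient

    # shortener_client.py - the implementation imports the same contract:
    from interfaces import AbstractClient

    class ShortenerClient(AbstractClient):
        ...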

Control component


If consumers do not import ShortenerClient, then who imports it and creates the object? That has to be a control component; in our case it is controller.py.

The simplest approach is straightforward, head-on dependency injection: we create the objects in the calling code and pass one object into another. Profit:

    # controller.py
    from text_processor import TextProcessor
    from shortener_client import ShortenerClient

    processor = TextProcessor(
        text='Link 1: https://ya.ru  Link 2: https://google.com',
        shortener_client=ShortenerClient(api_key='123')
    )
    print(processor.process())

The pythonic approach


A more "pythonic" approach is considered to be dependency injection through inheritance.

Raymond Hettinger covers this in great detail in his talk "Super considered super!".

To adapt the code to this style, we need to change TextProcessor slightly, making it inheritance-friendly:

    # text_processor.py
    class TextProcessor:
        def __init__(self, text: str) -> None:
            self.text = text
            self.shortener_client: AbstractClient = self.get_shortener_client()

        def get_shortener_client(self) -> AbstractClient:
            """Must be overridden in a subclass."""
            raise NotImplementedError

And then inherit from it in the calling code:

    # controller.py
    from text_processor import TextProcessor
    from shortener_client import ShortenerClient

    class ProcessorWithClient(TextProcessor):
        """A processor bound to a concrete shortener client."""

        def get_shortener_client(self) -> ShortenerClient:
            return ShortenerClient(api_key='abc')

    processor = ProcessorWithClient(
        text='Link 1: https://ya.ru  Link 2: https://google.com'
    )
    print(processor.process())

This second pattern is ubiquitous in popular frameworks.


The second version looks prettier and more familiar, doesn't it? Let's develop it further and see whether the beauty survives.

Developing the pythonic approach


In business logic there are usually more than two components. Suppose our TextProcessor is not a standalone class but just one element of a TextPipeline, which processes the text and sends it by email:

    class TextPipeline:
        def __init__(self, text, email):
            self.text_processor = TextProcessor(text)
            self.mailer = Mailer(email)

        def process_and_mail(self) -> None:
            processed_text = self.text_processor.process()
            self.mailer.send_text(text=processed_text)

If we want to isolate TextPipeline from the classes it uses, we must repeat the same procedure as before:

- describe abstract interfaces (AbstractTextProcessor, AbstractMailer) for the components the pipeline consumes;
- pass implementations of those interfaces in from outside instead of instantiating them inside the pipeline.


The dependency diagram will look like this:

(diagram: TextPipeline depends on AbstractTextProcessor and AbstractMailer; TextProcessor and Mailer implement them)

But what will the code that assembles these dependencies look like now?

    from text_processor import TextProcessor
    from shortener_client import ShortenerClient
    from mailer import Mailer
    from text_pipeline import TextPipeline

    class ProcessorWithClient(TextProcessor):
        def get_shortener_client(self) -> ShortenerClient:
            return ShortenerClient(api_key='123')

    class PipelineWithDependencies(TextPipeline):
        def get_text_processor(self, text: str) -> ProcessorWithClient:
            return ProcessorWithClient(text)

        def get_mailer(self, email: str) -> Mailer:
            return Mailer(email)

    pipeline = PipelineWithDependencies(
        email='abc@def.com',
        text='Link 1: https://ya.ru  Link 2: https://google.com'
    )
    pipeline.process_and_mail()

Did you notice? We first subclass TextProcessor to plug ShortenerClient into it, and then subclass TextPipeline to plug our overridden TextProcessor (as well as Mailer) into it. We end up with several levels of sequential overriding. It is already getting complicated.

Why are all frameworks organized this way? Because this pattern suits only frameworks: a framework's layers are fixed, well known, and exhaustively documented, so overriding the right method at the right level is routine.


Can you identify and document your business logic just as clearly and unambiguously, especially the architecture of the layers it operates on? I can't. Unfortunately, Raymond Hettinger's approach does not scale to business logic.

Back to the head-on approach


Once several levels of logic are involved, the simple approach wins: it looks simpler, and it is easier to change when the logic changes.

    from text_processor import TextProcessor
    from shortener_client import ShortenerClient
    from mailer import Mailer
    from text_pipeline import TextPipeline

    pipeline = TextPipeline(
        text_processor=TextProcessor(
            text='Link 1: https://ya.ru  Link 2: https://google.com',
            shortener_client=ShortenerClient(api_key='abc')
        ),
        mailer=Mailer('abc@def.com')
    )
    pipeline.process_and_mail()

But when the number of levels grows, even this approach becomes inconvenient: we have to imperatively instantiate a pile of classes, passing them into one another. I would like to avoid so many levels of nesting.

Let's try another take.

Global instance storage


Let's create a global dictionary that holds the instances of the components we need, and let the components reach each other through this dictionary.

Let's call it INSTANCE_DICT:

    # text_processor.py
    from instance_dict import INSTANCE_DICT  # a small shared module (name illustrative)
    from interfaces import AbstractClient, AbstractTextProcessor

    class TextProcessor(AbstractTextProcessor):
        def __init__(self, text) -> None:
            self.text = text

        def process(self) -> str:
            shortener_client: AbstractClient = INSTANCE_DICT['Shortener']
            # ... the rest of the processing logic

    # text_pipeline.py
    from instance_dict import INSTANCE_DICT
    from interfaces import AbstractMailer, AbstractTextProcessor

    class TextPipeline:
        def __init__(self) -> None:
            self.text_processor: AbstractTextProcessor = INSTANCE_DICT['TextProcessor']
            self.mailer: AbstractMailer = INSTANCE_DICT['Mailer']

        def process_and_mail(self) -> None:
            processed_text = self.text_processor.process()
            self.mailer.send_text(text=processed_text)

The trick is to put our objects into this dictionary before they are accessed. That is what we do in controller.py:

    # controller.py
    from instance_dict import INSTANCE_DICT
    from text_processor import TextProcessor
    from shortener_client import ShortenerClient
    from mailer import Mailer
    from text_pipeline import TextPipeline

    INSTANCE_DICT['Shortener'] = ShortenerClient('123')
    INSTANCE_DICT['Mailer'] = Mailer('abc@def.com')
    INSTANCE_DICT['TextProcessor'] = TextProcessor(text='Link: https://ya.ru')

    pipeline = TextPipeline()
    pipeline.process_and_mail()
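A pleasant side effect: in tests you can put a stub into the dictionary before the component under test reaches for it. A sketch, where StubShortener and the test name are illustrative:

    # test_text_processor.py
    from instance_dict import INSTANCE_DICT
    from text_processor import TextProcessor

    class StubShortener:
        def shorten_link(self, link):
            return 'https://fstrk.cc/stub'

    def test_process_replaces_links():
        INSTANCE_DICT['Shortener'] = StubShortener()  # swap in before use
        processor = TextProcessor(text='see https://ya.ru')
        assert processor.process() == 'see https://fstrk.cc/stub'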

Pros of working through a global dictionary:

- no multi-level inheritance: components no longer have to be subclassed just to be wired together;
- no deep nesting of constructor calls: each component fetches what it needs by key;
- the whole assembly lives in one place, the control component.


Of course, instead of a hand-rolled INSTANCE_DICT you can use some DI framework; the essence does not change. A framework offers more flexible management of instances: it can hand them out as singletons or produce them on demand, factory-style; but the idea stays the same.
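For illustration, a toy container with both behaviors might look like this; the Registry class and its method names are invented for the sketch, not taken from any real framework:

    class Registry:
        """A toy DI container: singletons and factories behind one lookup."""

        def __init__(self):
            self._singletons = {}
            self._factories = {}

        def register_instance(self, key, instance):
            # always hand back the same object
            self._singletons[key] = instance

        def register_factory(self, key, factory):
            # build a fresh object on every lookup
            self._factories[key] = factory

        def get(self, key):
            if key in self._singletons:
                return self._singletons[key]
            return self._factories[key]()

    registry = Registry()
    registry.register_instance('Mailer', Mailer('abc@def.com'))
    registry.register_factory('Shortener', lambda: ShortenerClient('123'))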

Perhaps at some point this will stop being enough for me, and I will pick a framework after all.

And perhaps all of this is unnecessary and it is easier to do without it: write direct imports and skip the extra abstract interfaces.

What is your experience with dependency management in Python? And in general: is this a real problem, or am I inventing one out of thin air?

Source: https://habr.com/ru/post/461511/

