
Python Dependency Management: A Comparison of Approaches


I have been writing Python for about five years; for the last three I have been developing my own project, and for most of that time a team has been working on it with me. With every release and every new feature we try harder to keep the project from turning into a mess of unmaintainable code: we fight circular imports and mutual dependencies, extract reusable modules, and restructure the codebase.

Unfortunately, the Python community has no universal notion of "good architecture"; there is only the notion of "pythonicity", so we have to come up with the architecture ourselves. Below is a longread with reflections on architecture and, above all, on dependency management as it applies to Python.

django.setup()


I'll start with a question for Django developers: do you often write these two lines?

    import django

    django.setup()

You need to start a file with these lines if you want to work with Django objects without starting the Django web server itself. This applies to models, to the time utilities (django.utils.timezone), to django.urls.reverse, and to much more. If you skip them, you get an error:

 django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet. 
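For reference, a minimal sketch of what such a standalone script looks like; the settings module path and the myapp.models.Link model are illustrative, not from the original project:

    import os

    import django

    # Point django at the settings module before calling setup()
    # (the module path is an assumption).
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')
    django.setup()

    # Models may only be imported after setup() has run.
    from myapp.models import Link  # hypothetical app and model

    print(Link.objects.count())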

I write these two lines constantly. I am a big fan of throwaway code: I like to create a separate .py file, fiddle with something in it, figure it out, and only then embed it in the project.

And this constant django.setup() annoys me a lot. First, you get tired of repeating it everywhere; second, Django initialization takes a few seconds (we have a big monolith), and when you rerun the same file 10, 20, 100 times, it simply slows down development.

How do you get rid of django.setup()? By writing code that depends on django as little as possible.

For example, if we are writing a client for an external API, we can make it depend on django:

    from django.conf import settings

    class APIClient:
        def __init__(self):
            self.api_key = settings.SOME_API_KEY

    # usage:
    client = APIClient()

or we can make it independent of django:

    class APIClient:
        def __init__(self, api_key):
            self.api_key = api_key

    # usage:
    client = APIClient(api_key='abc')

In the second case the constructor is more cumbersome, but any manipulation of this class can be done without loading all the Django machinery.
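Note that inside Django code the key can still come from settings; the point is that only the composition site touches them. A minimal sketch, assuming the same illustrative SOME_API_KEY setting:

    # somewhere in django-aware code, e.g. a view:
    from django.conf import settings

    # the only place that knows about settings; the client itself stays django-free
    client = APIClient(api_key=settings.SOME_API_KEY)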

Tests also get easier. How do you test a component that depends on django.conf.settings? You mock the settings with the @override_settings decorator. And if the component does not depend on anything, there is nothing to mock: pass the parameters to the constructor and off you go.
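A rough sketch of the difference, assuming the two APIClient variants above; the test names and the key value are illustrative:

    from django.test import TestCase, override_settings

    class SettingsDependentTest(TestCase):
        @override_settings(SOME_API_KEY='test-key')
        def test_reads_key_from_settings(self):
            client = APIClient()  # the django-dependent variant
            self.assertEqual(client.api_key, 'test-key')

    # The independent variant needs no django machinery at all:
    def test_takes_key_directly():
        client = APIClient(api_key='test-key')
        assert client.api_key == 'test-key'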

Dependency management


The story of depending on django is the most striking example of a problem I run into every day: dependency management in Python and, more broadly, the architecture of Python applications.

Attitudes toward dependency management in the Python community are mixed. Three main camps can be distinguished:


There are several articles on dependency management in Python (example 1, example 2), but they all boil down to advertising someone's dependency injection framework. This article is a new entry on the same topic, but this time it is a pure thought experiment, with no advertising: an attempt to find a balance between the three approaches above, to do without an extra framework, and to keep the result "pythonic".

I recently read Clean Architecture, and I think I finally understand what dependency injection is worth in Python and how it can be implemented; I saw it play out in my own project. In a nutshell, it is about protecting code from breaking when other code changes.

Initial data


There is an API client that makes HTTP requests to a link-shortener service:

    # shortener_client.py
    import requests

    class ShortenerClient:
        def __init__(self, api_key):
            self.api_key = api_key

        def shorten_link(self, url):
            response = requests.post(
                url='https://fstrk.cc/short',
                headers={'Authorization': self.api_key},
                json={'url': url}
            )
            return response.json()['url']

And there is a module that shortens all the links in a text. To do that, it uses the shortener API client:

    # text_processor.py
    import re

    from shortener_client import ShortenerClient

    class TextProcessor:
        def __init__(self, text):
            self.text = text

        def process(self):
            changed_text = self.text
            links = re.findall(
                r'https?://[^\r\n\t") ]*',
                self.text,
                flags=re.MULTILINE
            )
            api_client = ShortenerClient('abc')
            for link in links:
                shortened = api_client.shorten_link(link)
                changed_text = changed_text.replace(link, shortened)
            return changed_text

The code that drives everything lives in a separate control file (let's call it a controller):

    # controller.py
    from text_processor import TextProcessor

    processor = TextProcessor("""
    Link 1: https://ya.ru
    Link 2: https://google.com
    """)
    print(processor.process())

Everything works: the processor parses the text, shortens the links via the shortener, and returns the result. The dependencies look like this:

(diagram: controller.py depends on text_processor.py, which depends on shortener_client.py)

Problem


Here is the problem: the TextProcessor class depends on the ShortenerClient class, and it breaks when the ShortenerClient interface changes.

How can this happen?

Suppose we decided to track clicks on shortened links and added a callback_url argument to the shorten_link method: the address to which notifications are sent when a link is clicked.

The ShortenerClient.shorten_link method now looks like this:

    def shorten_link(self, url, callback_url):
        response = requests.post(
            url='https://fstrk.cc/short',
            headers={'Authorization': self.api_key},
            json={'url': url, 'callback_on_click': callback_url}
        )
        return response.json()['url']

And what happens? When we try to run the code, we get an error:

 TypeError: shorten_link() missing 1 required positional argument: 'callback_url' 

That is, we changed the shortener, but it is not the shortener that broke; it is its consumer:

(diagram: the change in shortener_client.py breaks text_processor.py)

So what? The calling file broke, we went and fixed it. What's the problem?

If it really takes a minute to go and fix, then it is no problem at all. If the classes contain little code and you maintain them yourself (it is your side project, or they are two small classes of the same subsystem, and so on), you can stop right there.

Problems begin when:

- the classes are written and maintained by different people;
- the module is used in many places, not just one.


If you write the ShortenerClient class and a colleague writes TextProcessor, you get a frustrating situation: you changed your code, but it was someone else's that broke. And it broke in a place you have never seen, and now you have to sit down and untangle someone else's code.

It gets even more interesting when your module is used in several places rather than one, and your edit breaks code across a heap of files.

So the problem can be formulated as follows: how do we organize the code so that when the ShortenerClient interface changes, it is ShortenerClient itself that breaks, and not its consumers (of which there can be many)?

The solution is twofold:

- freeze the interface through which consumers talk to ShortenerClient;
- invert the dependency, so that consumers depend on that interface rather than on the implementation.


(diagram: consumers depend on a frozen interface, which ShortenerClient implements)

Freeze the interface


What does freezing an interface look like in Python? It is an abstract class:

    from abc import ABC, abstractmethod

    class AbstractClient(ABC):
        @abstractmethod
        def __init__(self, api_key):
            pass

        @abstractmethod
        def shorten_link(self, link):
            pass

If we now inherit from this class and forget to implement one of the methods, we get an error:

    class ShortenerClient(AbstractClient):
        def __ini__(self, api_key):  # note the typo: __init__ is still abstract
            self.api_key = api_key

    client = ShortenerClient('123')

    >>> TypeError: Can't instantiate abstract class ShortenerClient with abstract methods __init__, shorten_link

But this is not enough: an abstract class fixes only the names of the methods, not their signatures.
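To make the loophole concrete, here is a sketch (SneakyClient is an illustrative name): the subclass overrides both abstract methods, so Python instantiates it happily, even though the shorten_link signature has drifted.

    class SneakyClient(AbstractClient):
        def __init__(self, api_key):
            self.api_key = api_key

        def shorten_link(self, link, callback_url):  # extra argument, no complaint
            return 'xxx'

    client = SneakyClient('123')  # no TypeError: the ABC checks names only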

We need a second tool to verify signatures. That tool is mypy: it can check the signatures of inherited methods. For that, we add annotations to the interface:

    # shortener_client.py
    from abc import ABC, abstractmethod

    class AbstractClient(ABC):
        @abstractmethod
        def __init__(self, api_key: str) -> None:
            pass

        @abstractmethod
        def shorten_link(self, link: str) -> str:
            pass

    class ShortenerClient(AbstractClient):
        def __init__(self, api_key: str) -> None:
            self.api_key = api_key

        def shorten_link(self, link: str, callback_url: str) -> str:
            return 'xxx'

If we now check this code with mypy, we get an error due to the extra callback_url argument:

    mypy shortener_client.py

    >>> error: Signature of "shorten_link" incompatible with supertype "AbstractClient"

Now we have a reliable way to freeze a class's interface.

Dependency inversion


Having fixed the interface, we must move it elsewhere, to completely remove the consumer's dependence on the shortener_client.py file. For example, we can pull the interface right into the consumer, the file with the TextProcessor:

    # text_processor.py
    import re
    from abc import ABC, abstractmethod

    class AbstractClient(ABC):
        @abstractmethod
        def __init__(self, api_key: str) -> None:
            pass

        @abstractmethod
        def shorten_link(self, link: str) -> str:
            pass

    class TextProcessor:
        def __init__(self, text, shortener_client: AbstractClient) -> None:
            self.text = text
            self.shortener_client = shortener_client

        def process(self) -> str:
            changed_text = self.text
            links = re.findall(
                r'https?://[^\r\n\t") ]*',
                self.text,
                flags=re.MULTILINE
            )
            for link in links:
                shortened = self.shortener_client.shorten_link(link)
                changed_text = changed_text.replace(link, shortened)
            return changed_text

And that reverses the direction of the dependency! Now TextProcessor owns the interaction interface, and as a result ShortenerClient depends on it, not the other way around.

(diagram: TextProcessor owns AbstractClient; ShortenerClient now depends on the interface)

In simple words, the essence of the transformation is this: the consumer stops adapting to the client. Instead, the consumer declares the interface it needs, and now it is the client that must conform.


Multiple consumers


If several modules shorten links, the interface should live not in one of them but in a separate file that sits above the others in the hierarchy:

(diagram: a shared interface file above both the consumer modules and the client implementation)
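A minimal sketch of such a layout, assuming a shared module named interfaces.py (the file name is my choice, not from the original project):

    # interfaces.py - owns the contract, imports nothing from the implementations
    from abc import ABC, abstractmethod

    class AbstractClient(ABC):
        @abstractmethod
        def shorten_link(self, link: str) -> str:
            pass

    # text_processor.py and any other consumer import only the contract:
    from interfaces import AbstractClient

    # shortener_client.py - the implementation imports the same contract:
    from interfaces import AbstractClient

    class ShortenerClient(AbstractClient):
        ...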

Control component


If consumers do not import ShortenerClient, then who imports it and creates the object? That has to be a control component; in our case it is controller.py.

The simplest approach is straightforward, head-on dependency injection: we create the objects in the calling code and pass one object into another. Profit:

    # controller.py
    from text_processor import TextProcessor
    from shortener_client import ShortenerClient

    processor = TextProcessor(
        text='Link 1: https://ya.ru  Link 2: https://google.com',
        shortener_client=ShortenerClient(api_key='123')
    )
    print(processor.process())

The pythonic approach


A more "pythonic" approach is considered to be dependency injection through inheritance.

Raymond Hettinger covers this in great detail in his talk "Super considered super!".

To adapt the code to this style, we need to change TextProcessor slightly, making it inheritance-friendly:

    # text_processor.py
    class TextProcessor:
        def __init__(self, text: str) -> None:
            self.text = text
            self.shortener_client: AbstractClient = self.get_shortener_client()

        def get_shortener_client(self) -> AbstractClient:
            """Must be overridden in a subclass."""
            raise NotImplementedError

And then inherit from it in the calling code:

    # controller.py
    from text_processor import TextProcessor
    from shortener_client import ShortenerClient

    class ProcessorWithClient(TextProcessor):
        """A processor bound to a concrete shortener client."""

        def get_shortener_client(self) -> ShortenerClient:
            return ShortenerClient(api_key='abc')

    processor = ProcessorWithClient(
        text='Link 1: https://ya.ru  Link 2: https://google.com'
    )
    print(processor.process())

This second pattern is ubiquitous in popular frameworks.


The second version looks prettier and more familiar, doesn't it? Let's develop it further and see whether the beauty survives.

Developing the pythonic approach


In business logic there are usually more than two components. Suppose our TextProcessor is not a standalone class but just one element of a TextPipeline, which processes the text and sends it by email:

    class TextPipeline:
        def __init__(self, text, email):
            self.text_processor = TextProcessor(text)
            self.mailer = Mailer(email)

        def process_and_mail(self) -> None:
            processed_text = self.text_processor.process()
            self.mailer.send_text(text=processed_text)

If we want to isolate TextPipeline from the classes it uses, we must repeat the same procedure as before:

- describe abstract interfaces (AbstractTextProcessor, AbstractMailer) for the components the pipeline consumes;
- pass implementations of those interfaces in from outside instead of instantiating them inside the pipeline.


The dependency diagram will look like this:

(diagram: TextPipeline depends on AbstractTextProcessor and AbstractMailer; TextProcessor and Mailer implement them)

But what will the code that assembles these dependencies look like now?

    from text_processor import TextProcessor
    from shortener_client import ShortenerClient
    from mailer import Mailer
    from text_pipeline import TextPipeline

    class ProcessorWithClient(TextProcessor):
        def get_shortener_client(self) -> ShortenerClient:
            return ShortenerClient(api_key='123')

    class PipelineWithDependencies(TextPipeline):
        def get_text_processor(self, text: str) -> ProcessorWithClient:
            return ProcessorWithClient(text)

        def get_mailer(self, email: str) -> Mailer:
            return Mailer(email)

    pipeline = PipelineWithDependencies(
        email='abc@def.com',
        text='Link 1: https://ya.ru  Link 2: https://google.com'
    )
    pipeline.process_and_mail()

Did you notice? We first subclass TextProcessor to plug ShortenerClient into it, and then subclass TextPipeline to plug our overridden TextProcessor (as well as Mailer) into it. We end up with several levels of sequential overriding. It is already getting complicated.

Why are all frameworks organized this way? Because this pattern suits only frameworks: a framework's layers are fixed, well known, and exhaustively documented, so overriding the right method at the right level is routine.


Can you identify and document your business logic just as clearly and unambiguously, especially the architecture of the layers it operates on? I can't. Unfortunately, Raymond Hettinger's approach does not scale to business logic.

Back to the head-on approach


Once several levels of logic are involved, the simple approach wins: it looks simpler, and it is easier to change when the logic changes.

    from text_processor import TextProcessor
    from shortener_client import ShortenerClient
    from mailer import Mailer
    from text_pipeline import TextPipeline

    pipeline = TextPipeline(
        text_processor=TextProcessor(
            text='Link 1: https://ya.ru  Link 2: https://google.com',
            shortener_client=ShortenerClient(api_key='abc')
        ),
        mailer=Mailer('abc@def.com')
    )
    pipeline.process_and_mail()

But when the number of levels grows, even this approach becomes inconvenient: we have to imperatively instantiate a pile of classes, passing them into one another. I would like to avoid so many levels of nesting.

Let's try another take.

Global instance storage


Let's create a global dictionary that holds the instances of the components we need, and let the components reach each other through this dictionary.

Let's call it INSTANCE_DICT:

    # text_processor.py
    from instance_dict import INSTANCE_DICT  # a small shared module (name illustrative)
    from interfaces import AbstractClient, AbstractTextProcessor

    class TextProcessor(AbstractTextProcessor):
        def __init__(self, text) -> None:
            self.text = text

        def process(self) -> str:
            shortener_client: AbstractClient = INSTANCE_DICT['Shortener']
            # ... the rest of the processing logic

    # text_pipeline.py
    from instance_dict import INSTANCE_DICT
    from interfaces import AbstractMailer, AbstractTextProcessor

    class TextPipeline:
        def __init__(self) -> None:
            self.text_processor: AbstractTextProcessor = INSTANCE_DICT['TextProcessor']
            self.mailer: AbstractMailer = INSTANCE_DICT['Mailer']

        def process_and_mail(self) -> None:
            processed_text = self.text_processor.process()
            self.mailer.send_text(text=processed_text)

The trick is to put our objects into this dictionary before they are accessed. That is what we do in controller.py:

    # controller.py
    from instance_dict import INSTANCE_DICT
    from text_processor import TextProcessor
    from shortener_client import ShortenerClient
    from mailer import Mailer
    from text_pipeline import TextPipeline

    INSTANCE_DICT['Shortener'] = ShortenerClient('123')
    INSTANCE_DICT['Mailer'] = Mailer('abc@def.com')
    INSTANCE_DICT['TextProcessor'] = TextProcessor(text='Link: https://ya.ru')

    pipeline = TextPipeline()
    pipeline.process_and_mail()
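A pleasant side effect: in tests you can put a stub into the dictionary before the component under test reaches for it. A sketch, where StubShortener and the test name are illustrative:

    # test_text_processor.py
    from instance_dict import INSTANCE_DICT
    from text_processor import TextProcessor

    class StubShortener:
        def shorten_link(self, link):
            return 'https://fstrk.cc/stub'

    def test_process_replaces_links():
        INSTANCE_DICT['Shortener'] = StubShortener()  # swap in before use
        processor = TextProcessor(text='see https://ya.ru')
        assert processor.process() == 'see https://fstrk.cc/stub'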

Pros of working through a global dictionary:

- no multi-level inheritance: components no longer have to be subclassed just to be wired together;
- no deep nesting of constructor calls: each component fetches what it needs by key;
- the whole assembly lives in one place, the control component.


Of course, instead of a hand-rolled INSTANCE_DICT you can use some DI framework; the essence does not change. A framework offers more flexible management of instances: it can hand them out as singletons or produce them on demand, factory-style; but the idea stays the same.
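For illustration, a toy container with both behaviors might look like this; the Registry class and its method names are invented for the sketch, not taken from any real framework:

    class Registry:
        """A toy DI container: singletons and factories behind one lookup."""

        def __init__(self):
            self._singletons = {}
            self._factories = {}

        def register_instance(self, key, instance):
            # always hand back the same object
            self._singletons[key] = instance

        def register_factory(self, key, factory):
            # build a fresh object on every lookup
            self._factories[key] = factory

        def get(self, key):
            if key in self._singletons:
                return self._singletons[key]
            return self._factories[key]()

    registry = Registry()
    registry.register_instance('Mailer', Mailer('abc@def.com'))
    registry.register_factory('Shortener', lambda: ShortenerClient('123'))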

Perhaps at some point this will stop being enough for me, and I will pick a framework after all.

And perhaps all of this is unnecessary and it is easier to do without it: write direct imports and skip the extra abstract interfaces.

What is your experience with dependency management in Python? And in general: is this a real problem, or am I inventing one out of thin air?

Source: https://habr.com/ru/post/461511/

