The fundamental problem of package managers for programming languages

Why are there so many different package managers? They exist for many operating systems (apt, yum, pacman, Homebrew) and for many programming languages (Bundler, Cabal, Composer, CPAN, CRAN, CTAN, EasyInstall, Go Get, Maven, npm, NuGet, OPAM, PEAR, pip, RubyGems, and so on). "It is a universally acknowledged truth that every programming language needs its own package manager." What inexplicable attraction makes programming languages, one after another, slide off this cliff? Why don't we just use the existing package managers?

You probably already have a few ideas about why using apt to manage Ruby packages is not a good idea. "System package managers and language package managers are completely different things. Centralized distribution of vetted packages is great, but it is completely unsuitable for most libraries hosted on GitHub. Centralized package distribution is too slow. All programming languages are different, and their communities do not talk to each other. System package managers install packages globally, and I want control over the versions of the libraries I use." These objections are all valid, but they miss the essence of the problem.

The fundamental problem is that package managers for programming languages are decentralized.

This decentralization is implied by the very definition of a package manager: a program that installs programs and libraries from remote sources that were not available locally at the time of installation. Even with an ideal centralized package manager, there would still be at least two copies of any library: one somewhere on a server, and one on the machine of the programmer writing an application that uses it. In reality, the library ecosystem is far more fragmented than that, because it brings together many libraries written by many different developers. Of course, all libraries can be uploaded and indexed in one place, but that still does not mean library authors will know about every use of their code. The result is what the Perl world calls DarkPAN: an uncountable amount of code that certainly exists but that we know nothing about, because it is buried in proprietary code or running on corporate servers. You can sidestep decentralization only when you control absolutely all of the code in your application. But in that case you hardly need a package manager, do you? (By the way, colleagues tell me that this kind of total control is practically mandatory for projects beyond a certain size, such as the Windows operating system or the Google Chrome browser.)

Decentralized systems are hard. Seriously, very hard. If you do not think carefully about the architecture of such a system, dependency hell certainly awaits you. There is no single "right" solution: I can name at least three different approaches, used by different generations of package managers, and each has its own advantages and disadvantages.

Pinned versions. Perhaps the most popular view is that the developer should pin the exact version of every package used. This approach is promoted by Bundler for Ruby, Composer for PHP, pip together with virtualenv for Python, and anything else inspired by the Ruby and node.js communities (for example, Gradle for Java or Cargo for Rust). Reproducible builds reign supreme here: these package managers solve the problem of decentralization simply by assuming that the rest of the package ecosystem ceases to exist once you have pinned your versions. The main advantage of this approach is that you always control which versions of the libraries your code uses. Of course, this is also its disadvantage: you always have to manage those versions yourself. In practice, versions are usually pinned and then safely forgotten, even when an important security update is released. Keeping all dependencies up to date takes development cycles, and that time is most often spent on other things (for example, developing new features).
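
To make the "pinned and forgotten" failure mode concrete, here is a minimal Python sketch. It assumes a simplified lockfile of "name==version" lines; the package names, the LATEST table, and the helper functions are hypothetical and do not describe the behavior of Bundler, pip, or any other real tool.

```python
def parse_pins(lockfile_text):
    """Parse lines of the form 'name==version' into a {name: version} dict."""
    pins = {}
    for raw_line in lockfile_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, separator, version = line.partition("==")
        if not separator or not version.strip():
            raise ValueError(f"not an exact pin: {raw_line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def stale_pins(pins, latest_releases):
    """Return {name: (pinned, latest)} for pins that differ from the latest release."""
    return {
        name: (pinned, latest_releases[name])
        for name, pinned in pins.items()
        if name in latest_releases and latest_releases[name] != pinned
    }


LOCKFILE = """
# hypothetical packages, for illustration only
webframework==2.1.0
imagelib==5.0.3   # pinned a long time ago
"""

# Hypothetical view of what the package index offers today.
LATEST = {"webframework": "2.1.0", "imagelib": "5.0.9"}

pins = parse_pins(LOCKFILE)
print(stale_pins(pins, LATEST))
```

The build stays reproducible because nothing changes until someone edits the lockfile, which is exactly why a fix released upstream goes unnoticed unless someone runs a check like this and acts on it.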

Stable versions. If package management requires every application developer to spend time and effort keeping all dependencies up to date and checking that they still work with the application and with each other, we might ask: is there a way to centralize this work? This leads to another approach: create a centralized repository of approved packages that have been tested to work together, and issue bug fixes and security updates for them while maintaining backward compatibility. Such package managers exist for various programming languages; the two I know of are Anaconda for Python and Stackage for Haskell. But if you look closely, you will see that operating system package managers use exactly the same model. As a system administrator, I often recommend that users prefer the libraries distributed in the operating system's repositories. They will not break backward compatibility until you move to a new OS release, and in the meantime you always get the latest bug fixes and security updates. (Yes, you will not be able to use features from newer versions, but wanting them is itself at odds with the idea of stability.)
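
As a rough illustration of the curated-repository model, the Python sketch below treats a snapshot as a table with exactly one approved version per package, so that "resolving" a dependency is only a lookup. The snapshot names, package names, and the resolve function are hypothetical and are not the format used by Stackage, Anaconda, or any OS distribution.

```python
# Hypothetical curated snapshots: package name -> the single approved version.
SNAPSHOTS = {
    "stable-1": {"textlib": "1.4.2", "netlib": "2.0.1"},
    # A later point release of the same snapshot: only a bug-fix version changed.
    "stable-1.1": {"textlib": "1.4.3", "netlib": "2.0.1"},
}


def resolve(requested_names, snapshot_name):
    """Look up every requested package in one snapshot; no version solving needed."""
    snapshot = SNAPSHOTS[snapshot_name]
    missing = sorted(set(requested_names) - set(snapshot))
    if missing:
        raise LookupError(f"not in {snapshot_name}: {', '.join(missing)}")
    return {name: snapshot[name] for name in requested_names}


print(resolve(["textlib", "netlib"], "stable-1.1"))
```

The application names only the snapshot, and the curators carry the cost of testing the set together. The price is visible in the LookupError branch: anything the curators have not approved is simply unavailable.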

Embracing decentralization. Both approaches discussed so far treat decentralization as something to be eliminated: they require either a central repository or tight control of updates by the developer. But aren't we throwing the baby out with the bathwater? The main disadvantage of the centralized approach is the huge amount of work needed to make all packages work together reliably and to keep them up to date. Besides, no one expects absolutely all packages to be compatible with each other, yet that does not stop particular subsets of packages from being useful together. An ideal decentralized system distributes the task of determining which packages work together across everyone who participates in it, which brings us back to the fundamental question: how can we create an ecosystem of decentralized package managers that actually works?

Here are a number of principles that can help us:

Strong dependency encapsulation. One of the reasons dependency hell is such an insidious problem is that a package's dependencies are often an inseparable part of its external API, so the choice of a dependency becomes a global choice that affects the entire application. If a library uses a dependency only internally, and that choice is purely a detail of the library's implementation, it should not impose any global constraints. npm for Node.js takes this principle to its logical extreme: by default it does not deduplicate dependencies at all, allowing each library to load its own copy of every package it depends on. Although I doubt that duplicating absolutely every package is worthwhile (this also occurs in the Maven ecosystem for Java), I certainly agree that this approach improves the composability of dependencies.
Promoting semantic versioning. In decentralized systems it is especially important that library developers provide information about their libraries that is as accurate as possible, so that users and package tools can make informed decisions. Differing version formats and version ranges only complicate an already difficult task (as I wrote in the previous post). If you can enforce semantic versioning or, better still, replace semantic versions with something more precise by recording the type-level dependencies on interfaces, our tools will be able to make better choices. The "gold standard" of information in a decentralized system is the answer to "Is package A compatible with package B?", and that information is often very difficult to compute (or impossible, for dynamically typed systems).
Centralization for special cases. One of the principles of a decentralized system is that each participant can assemble the environment that suits them best. This implies some freedom in choosing a central source, or in creating and using one's own: centralization for special cases. If we expect users to create their own repositories in the style of operating system distributions, we must give them tools that make creating and using those repositories easy and painless.
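
The value of a shared versioning convention can be sketched as a compatibility check that a tool runs without asking the library author. The Python below is a minimal illustration under the usual semantic versioning reading (same major version, and at least the required version); the function names are hypothetical, pre-release and build-metadata suffixes are not handled, and 0.x versions are treated conservatively as compatible only with themselves.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text):
    """Parse 'MAJOR.MINOR.PATCH'; anything else raises ValueError."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    return Version(*(int(part) for part in parts))


def is_compatible(required, available):
    """Can code written against `required` safely run on `available`?"""
    req, got = parse(required), parse(available)
    if req.major == 0:
        # Assumption: 0.x releases may break at any time, so demand an exact match.
        return got == req
    # Same major version means no breaking changes; tuple comparison
    # checks that the features and fixes we rely on are present.
    return got.major == req.major and got >= req


print(is_compatible("1.2.3", "1.4.0"))  # True: newer minor release, same major
print(is_compatible("1.2.3", "2.0.0"))  # False: major bump signals a breaking change
```

This check is only as trustworthy as the version numbers it reads. It answers "Is package A compatible with package B?" by convention rather than by inspecting interfaces, which is why type-level dependency information would be the stronger signal.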

For a long time, the source code management ecosystem was built entirely around centralized systems. The spread of distributed version control systems like Git has radically changed the situation: although Git may be harder than Subversion for non-technical people to master, the benefits of decentralization are many and varied. But no one has yet managed to create the Git of package management. If someone assures you that the package management problem is solved and that everyone is just reinventing Bundler, I ask you to think carefully about decentralization.

Source: https://habr.com/ru/post/250065/
