Node.js without node_modules

Last week, the developers of Yarn (a package manager for JavaScript) announced a new feature: Plug'n'Play installation. It lets you run Node.js projects without the node_modules folder, where project dependencies are usually installed before launch. According to the feature description, node_modules will no longer be needed: modules will be loaded from the package manager's shared cache.

Around the same time, the npm developers announced their own solution to the same problem.

Let's take a closer look at these solutions and try to test them in real projects.

History of the problem

Initially, the Node.js module system was built entirely on the file system: every require() call maps to a file. For third-party modules, the node_modules folder was invented, and reusable modules and libraries are downloaded and installed into it. As a result, each project received its own separate set of dependencies, at the cost of wasting disk space.
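To make "every require() call maps to the file system" concrete, here is a simplified sketch of how Node.js searches for a bare package name: it checks a node_modules folder in the requiring file's directory and then in every parent directory up to the root. The function name nodeModulesLookupPaths is hypothetical, and the sketch omits global fallbacks such as NODE_PATH; for a real module you can inspect the actual search list through module.paths.

```javascript
const path = require("path");

// Simplified sketch of the directory walk for a bare require('some-pkg'):
// starting from the requiring file's directory, check <dir>/node_modules at
// every level up to the filesystem root. Global fallbacks (NODE_PATH,
// ~/.node_modules) are deliberately left out.
function nodeModulesLookupPaths(fromDir) {
  // POSIX paths keep the example deterministic across platforms.
  let dir = path.posix.resolve(fromDir);
  const candidates = [];
  for (;;) {
    // A directory that is itself named node_modules does not get another
    // node_modules appended (no .../node_modules/node_modules).
    if (path.posix.basename(dir) !== "node_modules") {
      candidates.push(path.posix.join(dir, "node_modules"));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached the root
    dir = parent;
  }
  return candidates;
}

console.log(nodeModulesLookupPaths("/home/user/project/src"));
// [ '/home/user/project/src/node_modules', '/home/user/project/node_modules',
//   '/home/user/node_modules', '/home/node_modules', '/node_modules' ]
```

Every one of these candidate folders is a real directory on disk, which is why each project ends up with its own physical copy of its dependencies.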

Installing dependencies often takes up most of the build time in CI systems, so speeding up this step improves build time as a whole.

In simplified form, installing modules consists of the following steps:

1. Resolve a specific version of each module within the allowed range.
2. Download all modules of the required versions from the registry and store them in the local cache.
3. Copy the modules from the local cache into the project's node_modules folder.
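Step 1 can be illustrated with a deliberately tiny resolver that picks the highest available version satisfying a caret range. This is a sketch under simplifying assumptions (plain MAJOR.MINOR.PATCH versions only, no prerelease tags, no other range operators); real package managers rely on a full semver implementation, and the function names here are hypothetical.

```javascript
// Parse a plain "MAJOR.MINOR.PATCH" version into numbers.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`Unsupported version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound for ^X.Y.Z: bump the left-most non-zero component.
function caretUpperBound([major, minor, patch]) {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

// Return the highest version in `available` that satisfies the caret range,
// or null when nothing matches.
function resolveCaret(range, available) {
  if (!range.startsWith("^")) throw new Error(`Only caret ranges supported: ${range}`);
  const lower = parseVersion(range.slice(1));
  const upper = caretUpperBound(lower);
  let best = null;
  for (const v of available) {
    const parsed = parseVersion(v);
    const inRange =
      compareVersions(parsed, lower) >= 0 && compareVersions(parsed, upper) < 0;
    if (inRange && (best === null || compareVersions(parsed, parseVersion(best)) > 0)) {
      best = v;
    }
  }
  return best;
}

console.log(resolveCaret("^1.2.0", ["1.1.9", "1.2.0", "1.4.2", "2.0.0"])); // 1.4.2
```

Once the cache already holds the candidate versions, this step is pure computation, which is why it is cheap compared to copying files.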

While the first two steps are already well optimized and run quickly when the modules are cached, the third step has remained almost unchanged since the first versions of Node and npm.

The new approach eliminates the third step, replacing the actual copying of files with a table that maps the requested modules to their copies in the local cache.
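A minimal sketch of such a table follows, assuming a hypothetical cache layout of <cacheRoot>/<name>-<version> (Yarn's real cache layout and .pnp.js data format are different). Each resolved package is recorded once, and a request is answered by a lookup instead of a walk through node_modules folders.

```javascript
const path = require("path");

// Hypothetical cache layout: <cacheRoot>/<name>-<version>.
// Real package managers use their own layouts and file formats.
function buildLocationTable(cacheRoot, resolvedVersions) {
  const table = new Map();
  for (const [name, version] of Object.entries(resolvedVersions)) {
    // Record where the package already lives instead of copying it (step 3).
    table.set(name, path.posix.join(cacheRoot, `${name}-${version}`));
  }
  return table;
}

// Answer a request such as "lodash/fp" from the table rather than by
// searching node_modules folders on disk.
function resolveFromTable(table, request) {
  const parts = request.split("/");
  // Scoped packages ("@scope/name") use two segments for the package name.
  const nameLength = request.startsWith("@") ? 2 : 1;
  const name = parts.slice(0, nameLength).join("/");
  const location = table.get(name);
  if (location === undefined) return null; // not a known dependency
  return path.posix.join(location, ...parts.slice(nameLength));
}

const table = buildLocationTable("/home/user/.cache/pkgs", {
  lodash: "4.17.10",
  "@babel/core": "7.0.0",
});
console.log(resolveFromTable(table, "lodash/fp")); // /home/user/.cache/pkgs/lodash-4.17.10/fp
```

Writing one small table is a single file operation, regardless of how many files the dependencies contain.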

Using symlinks

Instead of actually copying modules, you can create symlinks to their location in the cache. This approach is implemented in pnpm, another alternative package manager. It can work, but symlinks bring many problems: each file effectively has two locations, resolving neighboring modules becomes tricky, and so on. In addition, creating symlinks is itself a file-system operation, which an ideal solution would avoid.
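The "two locations" problem is easy to reproduce. The sketch below is loosely modeled on the symlink approach, not on pnpm's real directory structure: it links a hypothetical demo-pkg from a cache folder into a project's node_modules and shows that, by default, Node.js reports the package's __dirname as its real path in the cache rather than the symlinked path in the project. Any require() the package makes for its own dependencies therefore starts searching from the cache directory.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Build a throwaway layout: <tmp>/cache/demo-pkg@1.0.0 holds the package,
// and <tmp>/project/node_modules/demo-pkg is a link to it.
// "demo-pkg" and the folder names are hypothetical.
function linkFromCache() {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), "symlink-demo-"));
  const cachedPkg = path.join(root, "cache", "demo-pkg@1.0.0");
  fs.mkdirSync(cachedPkg, { recursive: true });
  // The package simply exports the directory Node.js says it was loaded from.
  fs.writeFileSync(path.join(cachedPkg, "index.js"), "module.exports = __dirname;\n");

  const nodeModules = path.join(root, "project", "node_modules");
  fs.mkdirSync(nodeModules, { recursive: true });
  const linkPath = path.join(nodeModules, "demo-pkg");
  // "junction" lets directory links work on Windows without extra
  // privileges; the type argument is ignored on other platforms.
  fs.symlinkSync(cachedPkg, linkPath, "junction");
  return { root, cachedPkg, linkPath };
}

const { root, cachedPkg, linkPath } = linkFromCache();
const realCachedPkg = fs.realpathSync(cachedPkg);
const loadedFrom = require(linkPath);
console.log("required via:", linkPath);
console.log("__dirname is:", loadedFrom);
console.log("same as cache dir:", loadedFrom === realCachedPkg); // true
fs.rmSync(root, { recursive: true, force: true });
```

Node.js offers a --preserve-symlinks flag that keeps the symlinked path instead, which is one sign of how much special handling symlinks require.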

Trying Yarn PnP

More details about this feature are in the official description; this section is a brief summary of it.

The PnP version of Yarn currently lives in the yarn-pnp feature branch.

Clone the repository with that branch:

 git clone git@github.com:yarnpkg/yarn.git --branch yarn-pnp

The build instructions for Yarn are here; the steps are quite simple.

Once the build finishes, add an alias for the custom Yarn build (run this from the repository root) and start using it:

 alias yarn-local="node $PWD/lib/cli/index.js"

Plug'n'Play can be enabled in two ways: with the yarn --pnp flag, or through extra configuration in package.json: "installConfig": {"pnp": true}.

The Yarn developers have prepared a demo project that uses Webpack, Babel, and other tools typical of a modern frontend. Installing its dependencies in two different ways gives the following results:

Regular installation (yarn): 19 s
Installation via yarn --pnp: 3 s

Before measuring, I ran one cold installation so that all the required modules were already in the cache.

Now let's see how this works. A PnP installation creates an additional .pnp.js file in the project root, which overrides the native logic of the Module class built into Node.js. Loading this file into our code lets require() fetch modules from the global cache instead of looking in node_modules. Yarn commands such as yarn start or yarn test preload this file by default, so if you already run your project through Yarn, no code changes are required.
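The general mechanism can be sketched by wrapping Node's internal Module._resolveFilename, the function require() uses to turn a request into a file path. This is an illustration under stated assumptions, not the contents of Yarn's .pnp.js: Module._resolveFilename is an undocumented internal, and the installResolutionHook function, the mapping object, and the cached-greeting name are all hypothetical. A hook file like this would typically be preloaded with Node's --require (-r) flag, for example node -r ./hook.js app.js (hook.js being a hypothetical file name).

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const Module = require("module");

// Wrap the internal resolver so that selected bare requests resolve to
// absolute paths outside node_modules. `locations` maps request -> file path.
function installResolutionHook(locations) {
  const originalResolve = Module._resolveFilename;
  Module._resolveFilename = function (request, parent, isMain, options) {
    if (Object.prototype.hasOwnProperty.call(locations, request)) {
      // Delegate the mapped absolute path to the original resolver, which
      // still handles extensions and index.js lookup.
      return originalResolve.call(this, locations[request], parent, isMain, options);
    }
    return originalResolve.call(this, request, parent, isMain, options);
  };
  // Return an "uninstall" function that restores the default resolver.
  return () => {
    Module._resolveFilename = originalResolve;
  };
}

// Demo: place a module outside any node_modules folder and reach it by a
// bare name that exists nowhere on the normal search path.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "hook-demo-"));
const target = path.join(dir, "greeting.js");
fs.writeFileSync(target, 'module.exports = "loaded from outside node_modules";\n');

const uninstall = installResolutionHook({ "cached-greeting": target });
const greeting = require("cached-greeting");
uninstall();
console.log(greeting); // loaded from outside node_modules
```

Wrapping the original resolver rather than replacing it keeps the rest of Node's resolution behavior intact for every request the table does not cover.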

Besides mapping modules, .pnp.js also validates dependencies. If you call require('test') without declaring the dependency in package.json, you get the following error: Error: You cannot require a package ("test") that is not declared in your dependencies. This should make code more reliable and predictable.
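A hedged sketch of such a strictness check follows. It assumes the check only needs the requiring package's parsed package.json and treats Node.js built-in modules as always allowed; Yarn's real implementation works on its own dependency-tree data. The error text mirrors the message quoted above, and the function names are hypothetical.

```javascript
const { builtinModules } = require("module");

// Extract the package name from a bare request: "lodash/fp" -> "lodash",
// "@babel/core/lib/x" -> "@babel/core".
function packageNameOf(request) {
  const parts = request.split("/");
  return request.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

// Throw if `request` names a package that `manifest` (a parsed package.json)
// does not declare. Relative or POSIX-absolute paths and built-ins pass.
function assertDeclared(request, manifest) {
  if (request.startsWith(".") || request.startsWith("/")) return;
  if (request.startsWith("node:")) return;
  const name = packageNameOf(request);
  if (builtinModules.includes(name)) return;
  const declared = {
    ...manifest.dependencies,
    ...manifest.devDependencies,
    ...manifest.peerDependencies,
    ...manifest.optionalDependencies,
  };
  if (!Object.prototype.hasOwnProperty.call(declared, name)) {
    throw new Error(
      `You cannot require a package ("${name}") that is not declared in your dependencies`
    );
  }
}

const manifest = { dependencies: { lodash: "^4.17.0" } };
assertDeclared("lodash/fp", manifest); // declared, passes
assertDeclared("fs", manifest); // built-in, passes
try {
  assertDeclared("left-pad", manifest);
} catch (err) {
  console.log(err.message);
}
```

With node_modules, an undeclared package can still be found if some other dependency happened to install it; a lookup table makes it natural to reject such requests outright.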

One drawback of the new approach is that tools that worked with the node_modules directory directly, bypassing Node's built-in resolution, need additional integration. For example, Webpack and other frontend bundlers will need extra plugins to find the files they bundle.

The demo project includes prototype resolvers for ESLint, Jest, Rollup, and Webpack.

In my experiments, TypeScript still caused problems: it is tightly coupled to the presence of node_modules, and there is no easy way to override its module resolution strategy.

Postinstall scripts will also be a problem. Since a module stays in the cache, a postinstall script that changes its state (for example, by downloading additional files) can corrupt the cache and break other projects that depend on it. The Yarn developers recommend disabling script execution with the --ignore-scripts flag. They have already tried enabling this flag by default for all projects inside Facebook and found no serious problems. In the long run, moving away from postinstall scripts looks like a good step, given their known security issues.
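To see which packages would be affected by --ignore-scripts, a small audit over package manifests is enough. The helpers below are hypothetical and only look for explicit preinstall, install, and postinstall entries under scripts; they do not detect implicit scripts, such as the node-gyp build npm runs for packages that ship a binding.gyp.

```javascript
// Install-time lifecycle scripts: the ones that would run against a package
// sitting in a shared cache.
const INSTALL_LIFECYCLE = ["preinstall", "install", "postinstall"];

// Names of the install-time scripts declared in one parsed package.json.
function installScriptsOf(manifest) {
  const scripts = manifest.scripts || {};
  return INSTALL_LIFECYCLE.filter((name) => typeof scripts[name] === "string");
}

// Report only the packages that declare at least one install-time script.
function packagesWithInstallScripts(manifests) {
  return manifests
    .map((m) => ({ name: m.name, scripts: installScriptsOf(m) }))
    .filter((entry) => entry.scripts.length > 0);
}

const report = packagesWithInstallScripts([
  { name: "downloader", scripts: { postinstall: "node fetch-binary.js", test: "jest" } },
  { name: "plain-lib" },
  { name: "native-addon", scripts: { preinstall: "node check.js", install: "node build.js" } },
]);
console.log(report);
```

An empty report suggests --ignore-scripts can be enabled safely; any entry is a package whose install step needs a closer look.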

Trying npm tink

The npm team also announced its alternative solution. Their new tool, tink, is shipped as a separate module, independent of npm. As input, tink takes the package-lock.json file that npm install generates automatically. From the lock file, tink generates node_modules/.package-map.json, which maps local module paths to their actual locations in the cache.
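The transformation can be sketched roughly as follows, assuming the lockfileVersion 1 layout of package-lock.json, in which nested dependencies objects mirror nested node_modules folders. The output shape and the <cacheRoot>/<integrity> path scheme are hypothetical; tink's actual .package-map.json format may differ, and buildPackageMap is an illustrative name.

```javascript
// Map each path a package would occupy in node_modules to a location in a
// content-addressed cache keyed by the lock file's integrity hash.
function buildPackageMap(lock, cacheRoot) {
  const map = {};
  function walk(deps, prefix) {
    for (const [name, entry] of Object.entries(deps || {})) {
      const installPath = `${prefix}node_modules/${name}`;
      // Identical tarballs share one cache entry; skip entries without a hash.
      if (entry.integrity) {
        map[installPath] = `${cacheRoot}/${encodeURIComponent(entry.integrity)}`;
      }
      // Nested "dependencies" correspond to a nested node_modules folder.
      walk(entry.dependencies, `${installPath}/`);
    }
  }
  walk(lock.dependencies, "");
  return map;
}

const lock = {
  lockfileVersion: 1,
  dependencies: {
    a: {
      version: "1.0.0",
      integrity: "sha512-AAA",
      dependencies: { b: { version: "2.0.0", integrity: "sha512-BBB" } },
    },
    b: { version: "1.0.0", integrity: "sha512-CCC" },
  },
};
console.log(buildPackageMap(lock, "/cache"));
```

Because the lock file already pins every version and hash, the map can be produced without touching the registry.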

Unlike Yarn, tink provides no hook file that you can preload into your project to patch require(). Instead, you are expected to run the tink command in place of node to get the right environment. This is less ergonomic, since your project needs modifications to work with it. As a proof of concept, though, it will do.

I tried to compare installation speed between npm ci and tink, but tink was even slower, so I won't publish any numbers. This project is clearly much less mature than Yarn's and not optimized at all. We will wait for new releases.

Conclusion

Abandoning the node_modules directory is a logical step, given the experience of other languages that never had such an approach in the first place. It should speed up builds on CI systems that can keep the package cache between builds. In addition, if you copy the package cache and the .pnp.js file from one machine to another, you can reproduce the environment without even running Yarn. This can be useful for container-based builds: mount the cache directory, add the .pnp.js file, and you can run tests right away.

The new approach looks unusual and breaks some established practices that assume every module is always available in node_modules. However, the .pnp.js file offers an API that lets you abstract away from the files' actual locations and work with a virtual tree. And as a last resort, there is the yarn unplug --persist command, which extracts a module from the cache and places it locally in node_modules.

In any case, nothing is final yet: even the Yarn pull request has not been merged, so changes are to be expected. Still, it was interesting to try the alpha version of the feature on a couple of my personal projects and confirm that this approach really works and makes installation faster.

Links

Source: https://habr.com/ru/post/423487/
