Since 2012 I have been developing an open source browser as its only programmer, entirely in Python. A browser is not the simplest of projects: the main part now contains more than 1000 modules and more than 120,000 lines of Python code. Counting the satellite projects, the total is about one and a half times larger.
At some point I got tired of wrangling the stacks of imports at the top of every file and decided to deal with the problem once and for all. That is how the smart_imports library (github, pypi) was born.
The idea is quite simple. Any complex project eventually forms its own naming convention for everything.
If this convention is turned into more formal rules, any entity can be imported automatically from the name of the variable associated with it.
For example, there is no need to write import math in order to refer to math.pi: it is already clear that in this case math is a standard library module.
smart_imports supports Python >= 3.5. The library is covered with tests (coverage > 95%). I have been using it myself for a year.
Read on for the details.
How does it work in general?
So, the code from the title image works as follows:
- During the call to smart_imports.all(), the library builds the AST of the module from which the call was made;
- It finds the uninitialized variables;
- The name of each variable is run through a sequence of rules that try to find the module you need to import (or the module attribute) by name. If the rule detects the required entity, the following rules are not checked.
- Found modules are loaded, initialized and placed in the global namespace (or the necessary attributes of these modules are placed there).
Uninitialized variables are searched for throughout the code, including in new syntax constructs.
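As a rough illustration of the steps above, the sketch below finds names that are read but never bound in a piece of source code and tries to import a module for each of them. It is a simplified stand-in, not the library's implementation: it ignores scoping, applies only one "import by exact name" rule, and the helper names find_unresolved_names and auto_import are hypothetical.

```python
import ast
import builtins
import importlib


def find_unresolved_names(source):
    """Return names that are read somewhere but never bound (scopes are ignored)."""
    bound, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (used if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
    return used - bound - set(dir(builtins))


def auto_import(source, namespace):
    """Place a module into `namespace` for every unresolved name that is importable."""
    for name in sorted(find_unresolved_names(source)):
        try:
            namespace[name] = importlib.import_module(name)
        except ImportError:
            pass  # no rule matched; the name stays undefined


# The source uses math.pi without importing math.
source = "def area(radius):\n    return math.pi * radius ** 2\n"
namespace = {}
auto_import(source, namespace)
exec(source, namespace)
print(namespace["area"](1.0))
```

The real library has to be more careful than this, for example about names that are bound in one scope and read in another.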
Automatic import is enabled only for those project components that explicitly call smart_imports.all(). In addition, using smart imports does not rule out regular imports. This lets you adopt the library gradually and resolve complex cyclic dependencies.
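One way a call can affect only the module that makes it is frame inspection: a helper looks up the globals of whoever called it and writes only there. The sketch below shows that mechanism in isolation. It is an assumption about a possible approach, not a description of smart_imports internals, and import_into_caller is a hypothetical name.

```python
import importlib
import inspect


def import_into_caller(*module_names):
    """Import the given modules and bind them in the globals of the calling module."""
    caller_globals = inspect.currentframe().f_back.f_globals
    for module_name in module_names:
        caller_globals[module_name] = importlib.import_module(module_name)


# No "import math" statement appears in this module.
import_into_caller("math")
print(math.sqrt(16.0))
```

Modules that never call the helper keep their namespaces untouched, which matches the opt-in behavior described above.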
A meticulous reader will notice that the AST of a module is built twice:
- the first time by CPython, while the module is being imported;
- the second time by smart_imports, during the call to smart_imports.all().
The AST can in fact be built only once (for that you need to hook into the module import process using the import hooks described in PEP 302), but that solution slows imports down.
Why? Comparing the performance of the two implementations (with hooks and without), I concluded that when importing a module, CPython builds the AST in its internal C data structures. Converting them into Python data structures is more expensive than building the tree from source with the ast module.
Of course, the AST of each module is built and analyzed only once per launch.
Default import rules
The library can be used without additional configuration. By default, it imports modules according to the following rules:
- By exact match of the name, it searches for the module next to the current one (in the same directory).
- Checks standard library modules:
- by exact name matching for top-level packages;
- for nested packages and modules, it checks compound names in which the dot is replaced with an underscore. For example, os.path will be imported for the os_path variable.
- By exact name match, it looks for installed third-party packages. For example, the well-known requests package.
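The underscore convention from the list above can be sketched with importlib alone. This is a simplified illustration under stated assumptions: it does not check whether a module actually belongs to the standard library, it does not handle names with leading or doubled underscores, and import_by_variable_name is a hypothetical helper, not part of smart_imports.

```python
import importlib


def import_by_variable_name(name):
    """Try the exact name first, then the name with underscores read as dots."""
    candidates = [name]
    if "_" in name:
        candidates.append(name.replace("_", "."))
    for module_name in candidates:
        try:
            return importlib.import_module(module_name)
        except ImportError:
            continue
    return None


print(import_by_variable_name("math").__name__)
print(import_by_variable_name("os_path") is importlib.import_module("os.path"))
print(import_by_variable_name("no_such_module_here"))
```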
Performance
Smart imports do not affect the performance of a running program, but they increase its startup time.
Due to the repeated construction of the AST, the time of the first launch increases approximately 1.5-2 times. For small projects this is irrelevant. In large projects, the launch time suffers more from the structure of dependencies between modules than from the import time of a specific module.
When smart imports become popular, I will rewrite the AST handling in C, which should significantly reduce the startup cost.
To speed up loading, the results of processing module ASTs can be cached on the file system. Caching is enabled in the config. Of course, the cache is invalidated when the source changes.
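The idea of a file-system cache that is invalidated when the source changes can be sketched as follows. The file layout, the use of a SHA-256 digest, and the function cached_analysis are assumptions made for this illustration; the article does not describe how the library stores its cache.

```python
import hashlib
import json
import os
import tempfile


def cached_analysis(cache_dir, module_name, source, analyze):
    """Return analyze(source), reusing a cached result while the source is unchanged."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, module_name + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as cache_file:
            entry = json.load(cache_file)
        if entry["digest"] == digest:
            return entry["result"]
    result = analyze(source)
    with open(path, "w", encoding="utf-8") as cache_file:
        json.dump({"digest": digest, "result": result}, cache_file)
    return result


calls = []


def fake_analyze(source):
    """Stand-in for the expensive AST analysis; records every real invocation."""
    calls.append(source)
    return ["math"]


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_analysis(cache_dir, "demo", "x = math.pi", fake_analyze)
    second = cached_analysis(cache_dir, "demo", "x = math.pi", fake_analyze)
    third = cached_analysis(cache_dir, "demo", "x = math.e", fake_analyze)

print(first, second, third, len(calls))
```

The second call is served from the cache, while the third one re-runs the analysis because the source text differs.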
Startup time is affected both by the list of module search rules and by their order, since some rules use standard Python functionality to search for modules. You can avoid these costs by explicitly mapping names to modules with the "Customized Names" rule (see below).
Configuration
The default configuration was described earlier. It should be enough to work with the standard library in small projects.
Default config:
{
    "cache_dir": null,
    "rules": [{"type": "rule_local_modules"},
              {"type": "rule_stdlib"},
              {"type": "rule_predefined_names"},
              {"type": "rule_global_modules"}]
}
If necessary, a more complex config can be put on the file system.
An example of a complex config (from the browser).
During the call to smart_imports.all(), the library determines where the calling module is located on the file system and searches for a smart_imports.json file, moving from the module's directory up to the root directory. If such a file is found, it is used as the configuration for the current module.
You can use several different configs (placing them in different directories).
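The upward search for a config file can be sketched with pathlib. The file name smart_imports.json comes from the text above; the function find_config and the directory names are hypothetical, and the library's real lookup may differ in details.

```python
import pathlib
import tempfile

CONFIG_NAME = "smart_imports.json"


def find_config(start_dir):
    """Return the nearest config at or above start_dir, or None if there is none."""
    directory = pathlib.Path(start_dir).resolve()
    for candidate in [directory] + list(directory.parents):
        config_path = candidate / CONFIG_NAME
        if config_path.is_file():
            return config_path
    return None


with tempfile.TemporaryDirectory() as root:
    root_path = pathlib.Path(root).resolve()
    nested = root_path / "package" / "subpackage"
    nested.mkdir(parents=True)
    # The config lives one level above the directory the search starts from.
    (root_path / "package" / CONFIG_NAME).write_text('{"cache_dir": null, "rules": []}')
    found = find_config(nested)
    found_relative = found.relative_to(root_path).as_posix()

print(found_relative)
```

Because the nearest file wins, placing another config deeper in the tree overrides the outer one for the modules below it.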
There are only a few configuration options:
{
    // Directory for the AST cache.
    // null: the cache is not used.
    "cache_dir": null|"string",

    // List of import rules, in the order they are applied.
    "rules": []
}
Import rules
The order of the rules in the config determines the order of their application. The first rule that was triggered stops the further search for imports.
The rule_predefined_names rule appears often in the config examples; it is needed so that built-in functions (for example, print) are recognized correctly.
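The "first matching rule wins" behavior, and the reason a predefined-names rule is placed in the chain, can be sketched as a list of functions tried in order. The function names mirror the rule types mentioned in the article, but the signatures and return values are hypothetical and are not the library's API.

```python
import builtins
import importlib

PREDEFINED = {"__file__", "__name__", "__doc__"}


def rule_predefined_names(name):
    """Claim built-ins and module-level dunder names so no import is attempted."""
    if name in PREDEFINED or hasattr(builtins, name):
        return ("skip", None)
    return None


def rule_global_modules(name):
    """Try to import a module with exactly this name."""
    try:
        return ("module", importlib.import_module(name))
    except ImportError:
        return None


def apply_rules(name, rules):
    """Return the result of the first rule that handles the name, else None."""
    for rule in rules:
        result = rule(name)
        if result is not None:
            return result
    return None


rules = [rule_predefined_names, rule_global_modules]
for name in ("print", "json", "definitely_not_a_module"):
    print(name, apply_rules(name, rules))
```

Reordering the list changes which rule gets the first chance at a name, which is why the order in the config matters.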
Rule 1: Predefined Names
The rule allows you to ignore predefined names such as __file__ and built-in functions such as print.
Rule 2: Local Modules
Checks whether there is a module with the specified name next to the current module (in the same directory). If there is, it imports it.
Rule 3: Global Modules
It tries to import a module directly by name. For example, the requests module.
Rule 4: Customized Names
Maps a name to a specific module or to an attribute of a module. The mapping is specified in the rule's config.
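A minimal sketch of such a name mapping is shown below. The dictionary layout and the function resolve_custom_name are hypothetical and do not reproduce the library's config schema; they only show the idea of binding a variable name to a module or to one of its attributes.

```python
import importlib

# Hypothetical mapping, not the library's config schema.
CUSTOM_NAMES = {
    "fractions_lib": {"module": "fractions"},
    "sqrt": {"module": "math", "attribute": "sqrt"},
}


def resolve_custom_name(name, mapping=CUSTOM_NAMES):
    """Return the module or attribute mapped to `name`, or None if it is not listed."""
    target = mapping.get(name)
    if target is None:
        return None
    module = importlib.import_module(target["module"])
    attribute = target.get("attribute")
    return getattr(module, attribute) if attribute else module


print(resolve_custom_name("sqrt")(9.0))
print(resolve_custom_name("fractions_lib").Fraction(1, 3))
```

A lookup like this needs no module search at all, which is consistent with the earlier remark that explicit mappings avoid the cost of the search-based rules.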
Rule 5: Standard Modules
Checks whether the name is a standard library module. For example, math, or os.path, which is written as os_path.
It works faster than the global modules rule, since it checks for the presence of a module using a cached list. The lists for each version of Python are taken from here: github.com/jackmaney/python-stdlib-list
Rule 6: Import by Prefix
Imports a module by name from the package associated with its prefix. This is convenient when a few packages are used throughout the code. For example, modules of the utils package can be accessed with the utils_ prefix.
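Prefix-based lookup can be sketched as a table from variable prefix to package. To keep the example runnable without a project, standard library packages stand in for something like a project's utils package; the table, the etree_ prefix, and resolve_by_prefix are hypothetical.

```python
import importlib

# Hypothetical prefix table: variable prefix -> package that holds the modules.
PREFIXES = {"etree_": "xml.etree"}


def resolve_by_prefix(name, prefixes=PREFIXES):
    """Import <package>.<rest> when `name` starts with a known prefix."""
    for prefix, package in prefixes.items():
        if name.startswith(prefix) and len(name) > len(prefix):
            module_name = package + "." + name[len(prefix):]
            try:
                return importlib.import_module(module_name)
            except ImportError:
                return None
    return None


print(resolve_by_prefix("etree_ElementTree").__name__)
print(resolve_by_prefix("math"))
```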
Rule 7: Module from parent package
If you have subpackages with the same name in different parts of the project (for example, tests or migrations), you can allow them to look up modules to import by name in their parent packages.
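One reading of this rule is that a module inside a tests-like subpackage may refer to sibling modules of the package one level above it. The sketch below only computes which dotted module names would be tried; the function, its parameters, and the shop.orders package are hypothetical, and the library's actual matching may work differently.

```python
def parent_package_candidates(current_module, name, subpackage_names=("tests", "migrations")):
    """List dotted module names to try in the package above a `tests`-like subpackage."""
    parts = current_module.split(".")
    candidates = []
    for index, part in enumerate(parts[:-1]):
        if part in subpackage_names:
            parent_package = ".".join(parts[:index])
            candidates.append(parent_package + "." + name if parent_package else name)
    return candidates


# A test module refers to `logic`; the candidate lives next to the tests package.
print(parent_package_candidates("shop.orders.tests.test_logic", "logic"))
print(parent_package_candidates("shop.orders.logic", "views"))
```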
Rule 8: Binding to another package
For modules from a specific package, this rule allows imports to be looked up by name in other packages (specified in the config file). In my case it proved useful where I did not want to extend the previous rule (Module from parent package) to the whole project.
Add your own rules
Adding your own rule is quite simple:
- Inherit from the smart_imports.rules.BaseRule class.
- Implement the necessary logic.
- Register the rule with the smart_imports.rules.register method.
- Add the rule to the config.
- ???
- Profit
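The shape of the steps above can be sketched with stand-in classes. Everything here is hypothetical: DemoBaseRule and demo_register are not smart_imports.rules.BaseRule and smart_imports.rules.register, whose real signatures are not shown in this article, and the rule_suffix rule type is invented for the illustration.

```python
import importlib

RULES = {}  # stand-in registry: rule type name -> rule class


class DemoBaseRule:
    """Stand-in base class; a rule receives its own section of the config."""

    def __init__(self, config):
        self.config = config

    def apply(self, name):
        raise NotImplementedError


def demo_register(rule_type, rule_class):
    """Make a rule class available under the type name used in the config."""
    if rule_type in RULES:
        raise ValueError("rule already registered: " + rule_type)
    RULES[rule_type] = rule_class


class SuffixRule(DemoBaseRule):
    """Strip a configured suffix from the variable name and import the rest."""

    def apply(self, name):
        suffix = self.config["suffix"]
        if not name.endswith(suffix) or name == suffix:
            return None
        try:
            return importlib.import_module(name[:-len(suffix)])
        except ImportError:
            return None


demo_register("rule_suffix", SuffixRule)

# The config selects rules by type and passes each rule its own options.
config = {"rules": [{"type": "rule_suffix", "suffix": "_mod"}]}
active_rules = [RULES[rule["type"]](rule) for rule in config["rules"]]
print(active_rules[0].apply("json_mod").__name__)
```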
An example can be found in the implementation of the current rules.
Profit
The multi-line import lists at the top of every source file are gone.
The number of lines went down. Before the browser was moved to smart imports, 6,688 lines were responsible for imports. After the transition, 2,084 remain (two smart_imports lines per file, plus 130 imports made explicitly inside functions and similar places).
A nice bonus was the standardization of names in the project. The code has become easier to read and easier to write. There is no need to think about the names of imported entities - there are some clear rules that are easy to follow.
Development plans
I like the idea of defining code properties by variable names, so I will try to develop it both within smart imports and within other projects.
Regarding smart imports, I plan:
- Add support for new versions of Python.
- Investigate whether the community's current work on type annotations can be relied on.
- Explore the opportunity to make lazy imports.
- Implement utilities to automatically generate a config from sources and refactor sources to use smart_imports.
- Rewrite a piece of code in C to speed up the work with AST.
- Develop integration with linters and IDEs, if they have problems analyzing code without explicit imports.
In addition, I’m interested in your opinion about the library’s default behavior and import rules.
Thank you for making it through this wall of text :-D