πŸ“œ ⬆️ ⬇️

Python transpiler chain β†’ 11l β†’ C ++ [to speed up Python code and more]




This article discusses the most interesting transformations that the chain of two transpilers (the first translates the Python code into the code in the new 11l programming language , and the second - the 11l code in C ++), and compares the performance with other acceleration tools / code execution in Python (PyPy, Cython, Nuitka).

Replacing "slices" \ slices on ranges \ ranges

Python11l
s[-1] s[-2] s[:-1] s[1:] s[:1:] s[1:2] s[::2] s[3:10:2] s[3:10:] 
 s.last s[(len)-2] s[0..<(len)-1] s[1..] s[0..<1] s[1..<2] s[(0..).step(2)] s[(3..<10).step(2)] s[3..<10] 
An explicit indication for indexing from the end of the array s[(len)-2] instead of just s[-2] needed to eliminate the following errors:
  1. When it is required, for example, to get the previous character by s[i-1] , but with i = 0 such / this record instead of an error will silently return the last character of the string [ and I encountered this error in practice - commit ] .
  2. The expression s[i:] after i = s.find(":") will work incorrectly when the character is not found in the string [ instead of the "part of the string starting from the first character : and then" the last character of the string will be taken ] (and , to return -1 with the find() function in Python, I also consider incorrectly [ should return null / None [ and if -1 is required, then it should be written explicitly: i = s.find(":") ?? -1 ] ] ).
  3. Writing s[-n:] to get the last n characters of the string will work incorrectly with n = 0.

Chains of comparison operators


At first glance, it’s an outstanding feature of the Python language, but in practice it can be easily abandoned / dispensed with using the in operator and ranges:
a < b < cb in a<..<c
a <= b < cb in a..<c
a < b <= cb in a<..c
0 <= b <= 9b in 0..9

List inclusion (list comprehension)


Similarly, as it turned out, you can opt out of another interesting Python feature - list comprehensions.
While some glorify list comprehension and even suggest discarding `filter ()` and `map ()` , I found that:
  1. In all places where I met Python's list comprehension, you can easily get away with the functions `filter ()` and `map ()`.
     dirs[:] = [d for d in dirs if d[0] != '.' and d != exclude_dir] dirs[:] = filter(lambda d: d[0] != '.' and d != exclude_dir, dirs) '[' + ', '.join(python_types_to_11l[ty] for ty in self.type_args) + ']' '[' + ', '.join(map(lambda ty: python_types_to_11l[ty], self.type_args)) + ']' # Nested list comprehension: matrix = [ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], ] [[row[i] for row in matrix] for i in range(4)] list(map(lambda i: list(map(lambda row: row[i], matrix)), range(4))) 
  2. `filter ()` and `map ()` in 11l look prettier than in Python
     dirs[:] = filter(lambda d: d[0] != '.' and d != exclude_dir, dirs) dirs = dirs.filter(d -> d[0] != '.' & d != @exclude_dir) '[' + ', '.join(map(lambda ty: python_types_to_11l[ty], self.type_args)) + ']' '['(.type_args.map(ty -> :python_types_to_11l[ty]).join(', '))']' outfile.write("\n".join(x[1] for x in fileslist if x[0])) outfile.write("\n".join(map(lambda x: x[1], filter(lambda x: x[0], fileslist)))) outfile.write(fileslist.filter(x -> x[0]).map(x -> x[1]).join("\n")) 
    and therefore the need for list comprehensions in 11l actually disappears [the replacement of list comprehension with filter() and / or map() is performed during the conversion of the Python code to 11l automatically ] .

Convert the if-elif-else chain to switch


While Python does not contain a switch statement, this is one of the most beautiful constructs in 11l, and so I decided to insert the switch automatically:
Python11l
 ch = instr[i] if ch == "[": nesting_level += 1 elif ch == "]": nesting_level -= 1 if nesting_level == 0: break elif ch == "'": ending_tags.append(''') # '' elif ch == "'": assert(ending_tags.pop() == ''') 
 switch instr[i] '[' nesting_level++ ']' if --nesting_level == 0 loop.break "'" ending_tags.append("'") // '' "'" assert(ending_tags.pop() == "'") 
For completeness, here is the generated C ++ code.
 switch (instr[i]) { case u'[': nesting_level++; break; case u']': if (--nesting_level == 0) goto break_; break; case u''': ending_tags.append(u"'"_S); break; // '' case u''': assert(ending_tags.pop() == u'''); break; } 


Converting small dictionaries to native code


Consider this line of Python code:
 tag = {'*':'b', '_':'u', '-':'s', '~':'i'}[prev_char()] 
Most likely, this form of recording is not very effective [ in terms of performance ] , but very convenient.
')
In 11l, the entry corresponding to this line [ and obtained by the Python transpiler β†’ 11l ] is not only convenient [ however, not as elegant as in Python ] , but also fast:
 var tag = switch prev_char() {'*' {'b'}; '_' {'u'}; '-' {'s'}; '~' {'i'}} 

The line above is shipped to:
 auto tag = [&](const auto &a){return a == u'*' ? u'b'_C : a == u'_' ? u'u'_C : a == u'-' ? u's'_C : a == u'~' ? u'i'_C : throw KeyError(a);}(prev_char()); 
[ Calling the lambda function, the C ++ compiler will embed \ inline during the optimization process and only the chain of ?/: Operators will remain. ]

In the case when assignment is made to a variable, the dictionary is left as is:
Python
 rn = {'I': 1, 'V': 5, 'X': 10, 'L': 50, ...} 
11l
 var rn = ['I' = 1, 'V' = 5, 'X' = 10, 'L' = 50, ...] 
C ++
 auto rn = create_dict(dict_of(u'I'_C, 1)(u'V'_C, 5)(u'X'_C, 10)(u'L'_C, 50)...); 

Capture \ Ca pture external variables


In Python, to indicate that a variable is not local, but must be taken outside [ from the current function ] , the nonlocal keyword is used [ otherwise, for example, found = True will be interpreted as creating a new local variable found , rather than assigning a value existing external variable ] .
In 11l, the prefix @ is used for this:
Python11l
 writepos = 0 def write_to_pos(pos, npos): nonlocal writepos outfile.write(...) writepos = npos 
 var writepos = 0 fn write_to_pos(pos, npos) @outfile.write(...) @writepos = npos 
C ++:
 auto writepos = 0; auto write_to_pos = [..., &outfile, &writepos](const auto &pos, const auto &npos) { outfile.write(...); writepos = npos; }; 

Global variables


Similar to external variables, if you forget to declare a global variable in Python [ via the global keyword ] , you get an inconspicuous bug:
 break_label_index = -1 ... def parse(tokens, source_): global source, tokeni, token, scope source = source_ tokeni = -1 token = None break_label_index = -1 scope = Scope(None) ... 
 var break_label_index = -1 ... fn parse(tokens, source_) :source = source_ :tokeni = -1 :token = null break_label_index = -1 :scope = Scope(null) ... 
The code at 11l [ on the right ], as opposed to Python [ on the left ], will break_label_index error at the compilation stage with the 'undeclared variable break_label_index '.

Index / number of the current container item


I always forget the order of the variables that the Python function returns enumerate {first comes the value and then the index or vice versa}. Ruby's analog behavior - each.with_index - is much easier to remember: with index means that index comes after value, not before. But in 11l, the logic is even easier to memorize:
Python11l
 items = ['A', 'B', 'C'] for index, item in enumerate(items): print(str(index) + ' = ' + item) 
 var items = ['A', 'B', 'C'] loop(item) items print(loop.index' = 'item) 

Performance


As a test, the program uses the conversion of PC markup to HTML , and the source data is taken from the source code of the article on PC markup [ since this article is currently the largest written on PC markup ] , and is repeated 10 times, i.e. It turns out from 48.8 kilobyte article a file of 488Kb

Here is a diagram showing how many times the corresponding method of executing Python code is faster than the original [ CPython ] implementation:

And now let's add to the diagram the implementation generated by the Python transpiler β†’ 11l β†’ C ++:

Execution time [ 488Kb file conversion time ] was 868 ms for CPython and 38 ms for generated C ++ code [ this time includes full-featured [ i.e. not just working with data in RAM ] launching the program by the operating system and all the input / output [ reading the original file [ .pq ] and saving the new file [ .html ] to the disk ] ] .

I also wanted to try Shed Skin , but it does not support local functions.
Numba also failed to use (gives the error 'Use of unknown opcode LOAD_BUILD_CLASS').
Here is the archive with the program used for performance comparison [ for Windows ] (installed Python 3.6 or higher and the following Python packages are required: pywin32, cython).

Python source and output of Python β†’ 11l and 11l β†’ C ++ transpilers:
Python11l generated
(with keywords instead of letters)
11l
(with letters)
C ++ generated

Source: https://habr.com/ru/post/431318/


All Articles