
Modifying a function's bytecode in Python

Some time ago I had to solve a rather unusual task: adding a non-standard operator to Python. The goal was to generate Python code from an assembler-like pseudocode that contains a goto operator. The pseudocode used goto to build loops and conditional jumps, and since I did not want to write a complex lexical analyzer, I wanted a direct analogue of it in Python, which has none.

There is a module that was published as an April Fools' joke, but it did not work for me. I should say up front that I am aware of the drawbacks of this operator; still, in some cases, such as automatic code generation, it makes the programmer's life much easier. In addition, the approach described here can be used for any other code modification you may need; adding the goto operator serves as the example below.

So the problem is how to add a couple of new commands to Python and make the interpreter handle them correctly (jump to the right addresses). To do this, we write a decorator that is applied to the function in which we want to use the goto operator and labels (label). It relies on two modules: dis, which lets you work with Python bytecode, and new, which lets you create internal Python objects dynamically.

To start, let's define the format of the commands. Because Python places a number of restrictions on syntax, commands like

    a:
    goto a


will not work. However, Python does allow constructions like

    label .a
    goto .a
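A quick way to check which spellings the parser accepts is to feed them to the standard ast module. The helper names parses and is_attribute_access below are made up for this sketch and are not part of the decorator.

```python
import ast

def parses(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

def is_attribute_access(source):
    """Return True if `source` is one expression statement of the form name.attr."""
    tree = ast.parse(source)
    statement = tree.body[0]
    return (
        len(tree.body) == 1
        and isinstance(statement, ast.Expr)
        and isinstance(statement.value, ast.Attribute)
        and isinstance(statement.value.value, ast.Name)
    )

print(parses("a:\ngoto a"))              # assembler-style label and jump
print(parses("goto a"))                  # jump written without the dot
print(is_attribute_access("label .a"))   # dotted form used in this article
print(is_attribute_access("goto .a"))
```

The dotted forms are ordinary expression statements, so the compiler turns them into regular instructions that can later be found in the bytecode.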


Note that the dot plays an important role here: Python ignores the whitespace in front of it and treats the construction as an attribute access on the name label (or goto). Writing it without the dot results in a syntax error. Now let's look at the bytecode of these commands. To do this, run the following code:

    >>> def f():
    ...     label .a
    ...     goto .a
    ...
    >>> import dis
    >>> dis.dis( f )
      2           0 LOAD_GLOBAL              0 (label)
                  3 LOAD_ATTR                1 (a)
                  6 POP_TOP

      3           7 LOAD_GLOBAL              2 (goto)
                 10 LOAD_ATTR                1 (a)
                 13 POP_TOP
                 14 LOAD_CONST               0 (None)
                 17 RETURN_VALUE


Consequently, both declaring a label and jumping to a label compile to three operations: LOAD_GLOBAL, LOAD_ATTR and POP_TOP, of which the first two are the important ones. The dis module lets you look up the numeric code of an operation by its name through the opmap dictionary, and the name of an operation by its numeric code through opname.

    >>> dis.opmap[ 'LOAD_GLOBAL' ]
    116
    >>> dis.opmap[ 'LOAD_ATTR' ]
    105
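The reverse lookup through opname can be sketched the same way. The numeric values are deliberately not hard-coded here, because they differ between interpreter versions; only the round trip between the two tables is checked.

```python
import dis

def round_trip(name):
    """Map a mnemonic to its numeric opcode and back again."""
    code = dis.opmap[name]        # mnemonic -> numeric opcode
    return dis.opname[code]       # numeric opcode -> mnemonic

for name in ("LOAD_GLOBAL", "LOAD_ATTR", "POP_TOP"):
    print(name, dis.opmap[name], round_trip(name) == name)
```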


The bytecode of the function f is stored in f.func_code.co_code, and the names of the globals and attributes it refers to are stored in f.func_code.co_names.

    >>> f.func_code.co_names
    ('label', 'a', 'goto')
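On interpreters other than the one used in this article the same tuple can be read through the __code__ attribute, which exists on Python 2.6+ and Python 3, while func_code exists only on Python 2. The accessor get_code below is a hypothetical helper for this sketch, not part of the decorator.

```python
def get_code(function):
    """Return the code object of `function` on both Python 2 and Python 3."""
    code = getattr(function, "__code__", None)
    if code is None:
        code = function.func_code  # Python 2 spelling
    return code

def f():
    label .a
    goto .a

names = get_code(f).co_names
print(names)
```

Defining f is harmless because its body is never executed; only the compiled code object is inspected.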


Now a little about the byte representation of the instructions we are interested in. The disassembly shows that LOAD_GLOBAL and LOAD_ATTR each take three bytes (the offset is shown on the left): the first byte is the operation code (from opmap), and the second and third are the data (low and high byte, respectively), which form an index into the tuple f.func_code.co_names and identify the global name or attribute being loaded.
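As a small worked example of this layout, the sketch below decodes the three bytes that start at offset 7 of the disassembly. The function name decode_argument is made up for illustration, and the opcode value 116 is simply the one printed in the session above.

```python
def decode_argument(low_byte, high_byte):
    """Combine the two data bytes (low byte first) into a 16-bit index."""
    return low_byte | (high_byte << 8)

# Offset 7 of f: opcode 116 (LOAD_GLOBAL), then the data bytes 2 and 0.
instruction = [116, 2, 0]
co_names = ('label', 'a', 'goto')

index = decode_argument(instruction[1], instruction[2])
print(index, co_names[index])

# An index above 255 needs the high byte: 0x2C, 0x01 encodes 300.
print(decode_argument(0x2C, 0x01))
```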

You can tell whether an instruction has an argument (and therefore how long it is in bytes) by comparing its operation code with dis.HAVE_ARGUMENT: if the code is greater than or equal to this constant, the instruction has an argument; otherwise it does not. This gives us a way to parse the bytecode of a function. Next, we replace the label code with NOP operations, and the goto code with JUMP_ABSOLUTE, which takes an offset within the function as its argument. That is almost all there is to it. The decorator code and a usage example are shown below.
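The scanning rule can be tried in isolation on a hand-written list of integers, without a Python 2 interpreter. This is only a sketch: iter_instructions is a hypothetical helper, the value 90 for HAVE_ARGUMENT and the sample opcode numbers are assumptions about CPython 2.x numbering, and the three-byte layout does not apply to CPython 3.6 and later, which use two-byte instructions.

```python
HAVE_ARGUMENT = 90  # assumption: value of dis.HAVE_ARGUMENT on CPython 2.x

def iter_instructions(code_bytes, have_argument=HAVE_ARGUMENT):
    """Yield (offset, opcode, argument) for 2.x-style bytecode given as ints."""
    offset = 0
    while offset < len(code_bytes):
        opcode = code_bytes[offset]
        if opcode >= have_argument:
            # three-byte instruction: opcode, low byte, high byte
            argument = code_bytes[offset + 1] | (code_bytes[offset + 2] << 8)
            yield offset, opcode, argument
            offset += 3
        else:
            # one-byte instruction without an argument
            yield offset, opcode, None
            offset += 1

# Bytes mirroring the disassembly of f(); opcode numbers are sample values.
sample = [116, 0, 0, 105, 1, 0, 1,
          116, 2, 0, 105, 1, 0, 1,
          100, 0, 0, 83]
offsets = [offset for offset, _, _ in iter_instructions(sample)]
print(offsets)
```

The offsets produced this way line up with the left-hand column of the disassembly.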

    import dis, new

    class MissingLabelError( Exception ):
        pass

    class ExistingLabelError( Exception ):
        pass

    def goto( function ):
        labels_dict = {}
        gotos_list = []
        command_name = ''
        previous_operation = ''
        i = 0

        # scan the bytecode and collect the offsets of labels and gotos
        while i < len( function.func_code.co_code ):
            operation_code = ord( function.func_code.co_code[ i ] )
            operation_name = dis.opname[ operation_code ]

            if operation_code >= dis.HAVE_ARGUMENT:
                lo_byte = ord( function.func_code.co_code[ i + 1 ] )
                hi_byte = ord( function.func_code.co_code[ i + 2 ] )
                argument_position = ( hi_byte << 8 ) | lo_byte

                if operation_name == 'LOAD_GLOBAL':
                    command_name = function.func_code.co_names[ argument_position ]

                if operation_name == 'LOAD_ATTR' and previous_operation == 'LOAD_GLOBAL':
                    if command_name == 'label':
                        label = function.func_code.co_names[ argument_position ]
                        if label in labels_dict:
                            raise ExistingLabelError( 'Label redefinition: %s' % label )
                        # i - 3 is the offset of the preceding LOAD_GLOBAL
                        labels_dict[ label ] = i - 3
                    elif command_name == 'goto':
                        gotos_list.append( ( function.func_code.co_names[ argument_position ], i - 3 ) )

                i += 3
            else:
                i += 1

            previous_operation = operation_name

        codebytes_list = list( function.func_code.co_code )
        for label, index in labels_dict.items():
            # replace the 7 bytes of LOAD_GLOBAL, LOAD_ATTR and POP_TOP with NOP
            codebytes_list[ index : index + 7 ] = [ chr( dis.opmap[ 'NOP' ] ) ] * 7

        for label, index in gotos_list:
            if label not in labels_dict:
                raise MissingLabelError( 'Missing label: %s' % label )

            # jump to the first instruction after the label
            target_index = labels_dict[ label ] + 7
            codebytes_list[ index ] = chr( dis.opmap[ 'JUMP_ABSOLUTE' ] )
            codebytes_list[ index + 1 ] = chr( target_index & 0xFF )
            codebytes_list[ index + 2 ] = chr( ( target_index >> 8 ) & 0xFF )

        # build a code object with the patched bytecode
        code = function.func_code
        new_code = new.code( code.co_argcount, code.co_nlocals, code.co_stacksize, code.co_flags,
                             str().join( codebytes_list ), code.co_consts, code.co_names,
                             code.co_varnames, code.co_filename, code.co_name,
                             code.co_firstlineno, code.co_lnotab )

        # create the new function from it
        new_function = new.function( new_code, function.func_globals )
        return new_function
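To see the patching step on its own, the sketch below applies the same scan-and-replace logic to a plain list of integers instead of a real code object. It is a simulation under stated assumptions: the function patch and the table OP are hypothetical, and the opcode numbers are sample values in the CPython 2.x style (116 and 105 are the ones printed earlier), not something to rely on for a particular interpreter.

```python
# Assumed 2.x-style opcode numbers, used only for this simulation.
OP = {'POP_TOP': 1, 'NOP': 9, 'RETURN_VALUE': 83, 'LOAD_CONST': 100,
      'LOAD_ATTR': 105, 'JUMP_ABSOLUTE': 113, 'LOAD_GLOBAL': 116}
HAVE_ARGUMENT = 90
PATTERN_LENGTH = 7  # LOAD_GLOBAL (3) + LOAD_ATTR (3) + POP_TOP (1)

def patch(code_bytes, names):
    """Return a patched copy of `code_bytes` (a list of ints)."""
    labels, gotos = {}, []
    offset, previous, command = 0, None, None
    while offset < len(code_bytes):
        opcode = code_bytes[offset]
        if opcode >= HAVE_ARGUMENT:
            argument = code_bytes[offset + 1] | (code_bytes[offset + 2] << 8)
            if opcode == OP['LOAD_GLOBAL']:
                command = names[argument]
            elif opcode == OP['LOAD_ATTR'] and previous == OP['LOAD_GLOBAL']:
                if command == 'label':
                    labels[names[argument]] = offset - 3
                elif command == 'goto':
                    gotos.append((names[argument], offset - 3))
            offset += 3
        else:
            offset += 1
        previous = opcode

    patched = list(code_bytes)
    for start in labels.values():
        # the whole label pattern becomes NOPs
        patched[start:start + PATTERN_LENGTH] = [OP['NOP']] * PATTERN_LENGTH
    for name, start in gotos:
        # only the first three bytes of the goto pattern are overwritten
        target = labels[name] + PATTERN_LENGTH
        patched[start:start + 3] = [OP['JUMP_ABSOLUTE'], target & 0xFF, target >> 8]
    return patched

original = [116, 0, 0, 105, 1, 0, 1,      # goto .a
            116, 2, 0, 105, 1, 0, 1,      # label .a
            100, 0, 0, 83]                # return None
names = ('goto', 'a', 'label')
print(patch(original, names))
```

The LOAD_ATTR and POP_TOP bytes left behind after the jump are never executed, which is why only three of the seven goto bytes need to change.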


Example of use:

    @goto
    def test_function( n ):
        goto .label1
        label .label2
        print n
        goto .label3
        label .label1
        print n
        n -= 1
        if n != 0:
            goto .label1
        else:
            goto .label2
        label .label3
        print 'the end'

    test_function( 10 )


The result of the example:

    10
    9
    8
    7
    6
    5
    4
    3
    2
    1
    0
    the end


In conclusion, I should add that this solution is not quite in the general Python style: it is not very reliable because it depends heavily on the interpreter version (interpreter 2.7 was used here, but it should work on all 2.x versions). Still, solving this problem once again demonstrates the great flexibility of the language and the possibility of adding whatever functionality you need.

Source: https://habr.com/ru/post/140356/

