Shed Skin - an experimental translator from Python to C++

Introduction

Shed Skin is an experimental translator from Python to C++, designed to speed up computation-intensive Python programs. It converts programs written in a restricted subset of Python to C++. The C++ code can then be compiled into executable code, either as a standalone program or as an extension module that can be imported and used in a regular Python program.

Shed Skin uses type inference to determine the implicit types used in the Python program, in order to generate the explicit type declarations required for the C++ version. Since C++ is statically typed, Shed Skin requires that the Python code be written so that every variable has a single static type.

In addition to the typing and language-subset restrictions, supported programs cannot freely use the Python standard library, although about 25 commonly used modules, such as random and re, are supported (see Library Restrictions).

Furthermore, the type inference technique used by Shed Skin currently does not scale well beyond several thousand lines of code (the largest program translated so far is about 6,000 lines, as measured by sloccount). In practice, this means that Shed Skin is currently best suited to translating small programs and extension modules that make little use of Python's dynamic typing or of standard and external libraries. See below for a collection of 75 non-trivial example programs.

Since Shed Skin is still at an early stage of development, there is a lot of room for improvement, and you may well run into bugs while using it. Please send us a report about them so we can fix them!

Shed Skin is currently compatible with Python versions 2.4 to 2.7, behaves like 2.6, and runs on Windows and most UNIX platforms, such as GNU/Linux and OSX.

Restrictions on typing

Shed Skin translates ordinary, but statically typed, Python programs into C++. The static typing restriction means that a variable can have only a single, static type. So, for example, the code

 a = 1
 a = '1' # not allowed

is not allowed. However, as in C++, types can be abstract, so that, for example, the code

 a = A()
 a = B() # allowed

is allowed, provided A and B have a common base class.

The restriction on typing also means that elements of the same collection (list, set, etc.) cannot have different types (because the types of their members must also be static). Thus, the code:

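 a = ['apple', 'b', 'c'] # allowed
 b = (1, 2, 3) # allowed
 c = [[10.3, -2.0], [1.5, 2.3], []] # allowed

is allowed, but the code

 d = [1, 2.5, 'abc'] # not allowed
 e = [3, [1, 2]] # not allowed
 f = (0, 'abc', [1, 2, 3]) # not allowed

is not allowed. Dictionary keys may have a different type from dictionary values, but all keys must share one type and all values another:

 g = {'a': 1, 'b': 2, 'c': 3} # allowed
 h = {'a': 1, 'b': 'hello', 'c': [1, 2, 3]} # not allowed: mixed value types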

In the current version of Shed Skin, mixed types are also allowed in tuples of length two:

 a = (1, [1]) # allowed

In the future, it may be possible to allow mixed types in tuples of greater length.

Type None can only be mixed with non-scalar types (that is, not with int, float, bool, or complex):

 l = [1]
 l = None # allowed

 m = 1
 m = None # not allowed

 def fun(x = None): # not allowed: use a special value for x instead, e.g. x = -1
     pass
 fun(1)

Integers and floats can usually be mixed (the integers become floats). Where this is not possible, Shed Skin reports an error.
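The None restriction on scalar types can usually be worked around with a sentinel value of the same type. The following is a minimal sketch, assuming -1 is never a valid index; the function name find_index and the choice of sentinel are illustrative and not part of Shed Skin:

```python
def find_index(items, target, start=-1):
    # start is always an int: -1 stands in for "not given" instead of None
    if start == -1:
        start = 0
    for i in range(start, len(items)):
        if items[i] == target:
            return i
    return -1  # int sentinel instead of None, so the return type stays int

if __name__ == '__main__':
    # calls with concrete arguments, so that the types can be inferred
    print(find_index(['a', 'b', 'c'], 'c'))
    print(find_index(['a', 'b', 'a'], 'a', 1))
```

Both the parameter and the return value stay of type int throughout, which is what the typing restrictions require.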

Python subset restrictions

Shed Skin will always support only a subset of all features of the Python language. Currently the following features are not supported:

eval, getattr, hasattr, isinstance, and anything truly dynamic
arbitrary-precision arithmetic (integers become 32-bit (signed) by default on most architectures, see Command line options)
argument packing and unpacking (*args and **kwargs)
multiple inheritance
nested functions and classes
unicode
inheritance from built-in types (excluding Exception and object)
overloading of __iter__, __call__, __del__
closures
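Code that relies on closures or nested functions can often be restructured around a small class that stores the captured state explicitly. The following is only a sketch of that idea; the names Scaler, apply and scale_all are made up for illustration:

```python
class Scaler:
    # holds the state that a closure would otherwise have captured
    def __init__(self, factor):
        self.factor = factor

    # a regular method, since overloading __call__ is not supported
    def apply(self, x):
        return x * self.factor

def scale_all(values, factor):
    s = Scaler(factor)
    result = []
    for v in values:
        result.append(s.apply(v))
    return result

if __name__ == '__main__':
    print(scale_all([1.0, 2.0, 3.0], 2.0))
```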

Some other features are only partially supported:

class attributes must always be accessed through the class name:

  self.class_attr # not allowed
  SomeClass.class_attr # allowed
  SomeClass.some_static_method() # allowed

function references can be passed around, but references to methods or classes cannot, and function references cannot be stored in a container:
```
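 var = lambda x, y: x+y # allowed
 var = some_func # allowed
 var = self.some_method # not allowed: method reference
 var = SomeClass # not allowed: class reference
 [var] # not allowed: contained in a container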
```
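Put together, code that stays within these partially supported features might look like the sketch below. The names Counter, add and apply_twice are hypothetical, and the example only illustrates the two rules above: class attributes are reached through the class name, and only plain function references are passed around.

```python
class Counter:
    total = 0  # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name

    def bump(self):
        # go through the class name rather than self.total
        Counter.total += 1
        return Counter.total

def add(x, y):
    return x + y

def apply_twice(func, x):
    # func is a plain function reference, not a method or a class
    return func(func(x, x), x)

if __name__ == '__main__':
    c = Counter('first')
    print(c.bump())
    print(apply_twice(add, 3))
```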

Library Restrictions

Currently, the following roughly 25 modules are largely supported. Some of them, such as os.path, were themselves translated to C++ using Shed Skin.

array
binascii
bisect
collections (defaultdict, deque)
colorsys
ConfigParser (without SafeConfigParser)
copy
csv (without Dialect, Sniffer)
datetime
fnmatch
getopt
glob
heapq
itertools (without starmap)
math
mmap
os (under Windows some functionality is missing)
os.path
random
re
select (select function only, under UNIX)
socket
string
struct (without Struct, pack_into, unpack_from)
sys
time

Note that any other module, such as pygame, pyqt, or pickle, can be used in conjunction with an extension module generated using Shed Skin. Examples of such use are found in the Shed Skin examples.

Installation

There are two types of downloads: a self-extracting Windows installer and a UNIX archive. Of course, it is preferable to install Shed Skin through your GNU/Linux package manager (Shed Skin is available at least on Debian, Ubuntu, Fedora and Arch).

Windows

To install the Windows version, simply download and run the installer. If you are using ActivePython or another non-standard Python distribution, or MingW, uninstall it first. Also note that the 64-bit version of Python appears to be missing a file, so building extension modules with it is not possible. Use the 32-bit version of Python instead.

UNIX

Installation via package manager

Sample command for Ubuntu:

 sudo apt-get install shedskin

Manual installation

To install the distribution package from a UNIX archive manually, do the following:

download and unzip the archive
run sudo python setup.py install

Dependencies

To compile and run the programs generated by shedskin, the following libraries are needed:

g++, the C++ compiler (version 4.2 or higher)
pcre development files
Python development files
Boehm garbage collector

To install these libraries under Ubuntu, enter:

 sudo apt-get install g++ libpcre++-dev python-all-dev libgc-dev

If the Boehm garbage collector is not available through your package manager, use this method. Download, for example, version 7.2alpha6 from the website, unpack it, and install it as follows:

 ./configure --prefix=/usr/local --enable-threads=posix --enable-cplusplus --enable-thread-local-alloc --enable-large-config
 make
 make check
 sudo make install

If the PCRE library is not available through your package manager, use the following method. Download, for example, version 8.12 from the website, unpack it and install it as follows:

 ./configure --prefix=/usr/local
 make
 sudo make install

OSX

Manual installation

To install Shed Skin from a UNIX archive on an OSX system, do the following:

download and unzip the archive
run sudo python setup.py install

Dependencies

To compile and run the programs generated by shedskin, the following libraries are needed:

g++, the C++ compiler (version 4.2 or higher; comes with the Apple Xcode development environment?)
pcre development files
Python development files
Boehm garbage collector

If the Boehm garbage collector is not available through your package manager, use this method. Download, for example, version 7.2alpha6 from the website, unpack it, and install it as follows:

 ./configure --prefix=/usr/local --enable-threads=posix --enable-cplusplus --enable-thread-local-alloc --enable-large-config
 make
 make check
 sudo make install

If the PCRE library is not available through your package manager, use the following method. Download, for example, version 8.12 from the website, unpack it and install it as follows:

 ./configure --prefix=/usr/local
 make
 sudo make install

Translating a regular program

On Windows, first execute (by double-clicking) the init.bat file in the directory where you installed Shed Skin.

To compile the following simple test program named test.py:

print 'hello, world!'

enter:

shedskin test

Two C++ files are created, named test.cpp and test.hpp, as well as a Makefile.

To create an executable file named test (or test.exe), type:

 make

Creating an extension module

To compile the following program named simple_module.py as an extension module:

 # simple_module.py

 def func1(x):
     return x+1

 def func2(n):
     d = dict([(i, i*i) for i in range(n)])
     return d

 if __name__ == '__main__':
     print func1(5)
     print func2(10)

enter:

 shedskin -e simple_module
 make

For the 'make' command to succeed on a non-Windows system, make sure you have the Python development files installed (on Debian, install python-dev; on Fedora, install python-devel).

Note that for type inference to be possible, the module must (indirectly) call its own functions. In the example this is achieved by placing the calls under the if __name__ == '__main__' condition, so that they are not executed when the module is imported. Functions only need to be called indirectly: if func2 calls func1, the explicit call to func1 can be omitted.
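The point about indirect calls can be illustrated with a small sketch. The module and function names below are hypothetical; the only call under the main guard is to sum_of_squares, and square is exercised through it:

```python
# squares_module.py (hypothetical name)

def square(x):
    return x * x

def sum_of_squares(values):
    total = 0
    for v in values:
        total += square(v)  # square is called indirectly, with int arguments
    return total

if __name__ == '__main__':
    # a single call: it fixes values as a list of int and, through it, square(int)
    print(sum_of_squares([1, 2, 3]))
```

Because the call sits under the main guard, it should not run when the compiled module is imported.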

The extension module can now be simply imported and used as usual:

 >>> from simple_module import func1, func2
 >>> func1(5)
 6
 >>> func2(10)
 {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

Restrictions

There are significant differences between using a compiled extension module and the original module.

Only built-in scalar and container types (int, float, complex, bool, str, list, tuple, dict, set), None, and instances of user-defined classes can be passed and returned. For example, anonymous functions and iterators are currently not supported.
Built-in objects, including their contents, are completely converted on every function call and return, from Shed Skin types to CPython types and back. This means that changes made to CPython built-in objects on the Shed Skin side are not visible on the CPython side and vice versa, and that the conversion can be slow. Instances of user-defined classes can be passed and returned without any conversion, and can be changed from either side.
Global variables are converted only once, at initialization time, from Shed Skin to CPython. This means that the CPython and Shed Skin versions of a value can change independently of each other. This problem can be avoided by using only constant global variables, or by adding getter and setter functions.
Multiple (interacting) extension modules are currently not supported. Also, importing and using the Python version and the translated version of a module at the same time may not work.
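The getter and setter approach for global variables could look roughly like the sketch below. The module name and the functions get_threshold, set_threshold and above_threshold are assumptions made for this illustration; the idea is simply that the caller never reads the global directly, only through functions that run on the Shed Skin side:

```python
# settings_module.py (hypothetical name)

threshold = 10  # converted to CPython only once, at initialization

def get_threshold():
    # always returns the current Shed Skin-side value
    return threshold

def set_threshold(value):
    global threshold
    threshold = value

def above_threshold(values):
    result = []
    for v in values:
        if v > threshold:
            result.append(v)
    return result

if __name__ == '__main__':
    set_threshold(5)
    print(get_threshold())
    print(above_threshold([1, 6, 12]))
```

From the importing program, settings_module.get_threshold() would then be used instead of settings_module.threshold.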

Integration with Numpy

Shed Skin currently has no direct support for Numpy. However, a Numpy array can be passed to a translated Shed Skin extension module as a list, using its tolist method. Note that this is very inefficient (see above), so it is only worthwhile if a relatively large amount of time is spent inside the extension module. Consider the following example:

 # simple_module2.py

 def my_sum(a):
     """ compute sum of elements in list of lists (matrix) """
     h = len(a) # number of rows in matrix
     w = len(a[0]) # number of columns
     s = 0.0
     for i in range(h):
         for j in range(w):
             s += a[i][j]
     return s

 if __name__ == '__main__':
     print my_sum([[1.0, 2.0], [3.0, 4.0]])

After translating this module as an extension module using Shed Skin, we can pass the Numpy array as follows:

 >>> import numpy
 >>> import simple_module2
 >>> a = numpy.array(([1.0, 2.0], [3.0, 4.0]))
 >>> simple_module2.my_sum(a.tolist())
 10.0

Distributing binaries

Windows

To use a generated Windows binary on another system, or to run it without first running init.bat, place the following files in the same directory as the binary:

shedskin-0.9\shedskin\gc.dll
shedskin-0.9\shedskin-libpcre-0.dll
shedskin-0.9\bin\libgcc_s_dw-1.dll
shedskin-0.9\bin\libstdc++.dll

UNIX

To use the generated binary file on another system, make sure that libgc and libpcre3 are installed there. If this is not the case, and you cannot install them globally on the system, you can place copies of these libraries in the same directory as the binary, using the following commands:

 $ ldd test
 libgc.so.1 => /usr/lib/libgc.so.1
 libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3
 $ cp /usr/lib/libgc.so.1 .
 $ cp /lib/x86_64-linux-gnu/libpcre.so.3 .
 $ LD_LIBRARY_PATH=. ./test

Note that both systems must be either 32-bit or 64-bit. If they are not, Shed Skin must be installed on the other system in order to rebuild the binary there.

Multiprocessing

Suppose we have defined the following function in a file called meuk.py:

 def part_sum(start, end):
     """ calculate partial sum """
     sum = 0
     for x in xrange(start, end):
         if x % 2 == 0:
             sum -= 1.0 / x
         else:
             sum += 1.0 / x
     return sum

 if __name__ == '__main__':
     part_sum(1, 10)

To translate this file into an extension module, enter:

 shedskin -e meuk
 make

To use the resulting extension module with the multiprocessing module from the standard library, simply add a pure-Python wrapper:

 from multiprocessing import Pool

 def part_sum((start, end)):
     import meuk
     return meuk.part_sum(start, end)

 pool = Pool(processes=2)
 print sum(pool.map(part_sum, [(1,10000000), (10000001, 20000000)]))

Calling C/C++ Code

To call C/C++ code, do the following:

Provide Shed Skin with type information by means of a type model of the C/C++ code. Suppose we need to call a simple function that returns a list of the n smallest primes greater than a given number. The following type model, contained in the file stuff.py, is sufficient for Shed Skin to perform type inference:

 #stuff.py

 def more_primes(n, nr=10):
     return [1]

For type inference to actually take place, write a test program, test.py, that uses this type model, and then translate it:

 #test.py

 import stuff
 print stuff.more_primes(100)

 shedskin test

Besides test.py, this also translates stuff.py to C++. You can now manually write the C/C++ code in the stuff.cpp file. To prevent it from being overwritten the next time test.py is translated, move stuff.* to the lib/ directory of Shed Skin.

Standard library

By moving stuff.* to lib/, we have in fact added support for an arbitrary library module to Shed Skin. Other programs translated by Shed Skin can now import our library and use more_primes. In fact, the lib/ directory contains the type models and implementations of all supported modules. As you can see there, some of them were partially converted to C++ using Shed Skin itself.

Types of Shed Skin

Shed Skin reimplements Python's built-in types with its own set of C++ classes. They have the same interface as their Python counterparts, so they are easy to use (assuming you have a basic knowledge of C++). For details on the class definitions, see the file lib/builtin.hpp. If in doubt, translate some similar Python code to C++ and look at the result!

Command line options

The shedskin command supports the following options:

-a --ann        Output annotated source code (.ss.py)
-b --nobounds   Disable bounds checking
-e --extmod     Generate extension module
-f --flags      Set flags for the Makefile
-g --nogcwarns  Disable runtime GC warnings
-l --long       Use long long ("64-bit") integers
-m --makefile   Specify a different Makefile name
-n --silent     Silent mode, show only warnings
-o --noassert   Disable assert statements
-r --random     Use fast random number generator (rand())
-s --strhash    Use fast string hashing algorithm (murmur)
-w --nowrap     Disable wrap-around checking
-x --traceback  Print traceback for uncaught exceptions
-L --lib        Add a library directory

For example, to translate the file test.py as an extension module, enter shedskin -e test or shedskin --extmod test.

The -b or --nobounds option is used particularly often, because it disables out-of-bounds exceptions (IndexError), and bounds checking can have a large impact on performance.

  a = [1, 2, 3]
  print a[5] # invalid index: out of bounds
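Once bounds checking is disabled, code can no longer rely on IndexError being raised, so any index that might be out of range needs an explicit check. A minimal sketch of such a check follows; the helper name get_or_default is hypothetical:

```python
def get_or_default(values, index, default):
    # explicit range test instead of relying on IndexError
    if index >= 0 and index < len(values):
        return values[index]
    return default

if __name__ == '__main__':
    a = [1, 2, 3]
    print(get_or_default(a, 1, -1))
    print(get_or_default(a, 5, -1))
```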

Performance Tips and Tricks

Tips

Small memory allocations (for example, creating a new tuple, list, or class instance) usually do not slow down a Python program much. After translation to C++, however, they often become a bottleneck. This is because for each allocation, memory has to be requested from the system, it later has to be reclaimed by the garbage collector, and many successive allocations are likely to cause cache misses. A key to high performance is therefore often to reduce the number of small allocations, for example by replacing a small generator expression with a loop, or by avoiding intermediate tuples in a calculation.

Note, however, that for the idiomatic for a, b in enumerate(..) and for a, b in somedict.iteritems(), the intermediate small objects are optimized away, and that strings of length 1 are cached.
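As an illustration of avoiding intermediate tuples, the two functions below compute the same centroid of a list of 2D points. This is only a sketch with made-up names (centroid_alloc, centroid_loop); how much the second variant helps in practice would have to be measured on the translated code.

```python
def centroid_alloc(points):
    # creates a new (x, y) tuple on every iteration
    acc = (0.0, 0.0)
    for p in points:
        acc = (acc[0] + p[0], acc[1] + p[1])
    n = len(points)
    return (acc[0] / n, acc[1] / n)

def centroid_loop(points):
    # accumulates into two floats, allocating only the final result
    sx = 0.0
    sy = 0.0
    for p in points:
        sx += p[0]
        sy += p[1]
    n = len(points)
    return (sx / n, sy / n)

if __name__ == '__main__':
    pts = [(0.0, 0.0), (2.0, 4.0)]
    print(centroid_alloc(pts))
    print(centroid_loop(pts))
```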

Some features of Python (which can slow down the generated code) are not always necessary and can be turned off. See the “Command Line Options” section for details. Turning off range checking is usually a very safe optimization and can greatly help in the case of code where an index operation is often used.

Attribute access is faster in the generated code than indexing. For example, v.x * v.y * v.z is faster than v[0] * v[1] * v[2].
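In practice this means that a small fixed-size vector may be better represented as a class with named fields than as a list or tuple. A sketch of the two forms, with a hypothetical Vec3 class:

```python
class Vec3:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

def product_indexed(v):
    # v is a list or tuple of three floats
    return v[0] * v[1] * v[2]

def product_attr(v):
    # v is a Vec3; fields are read by name instead of by index
    return v.x * v.y * v.z

if __name__ == '__main__':
    print(product_indexed([2.0, 3.0, 4.0]))
    print(product_attr(Vec3(2.0, 3.0, 4.0)))
```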

Shed Skin takes the flags for the C++ compiler from the FLAGS* files in the directory where Shed Skin is installed. These flags can be modified there, or overridden with a local file named FLAGS.

With a large number of floating point calculations, it is not always necessary to follow the IEEE floating point specifications. Adding the -ffast-math flag can significantly improve performance.

Profile-guided optimization can squeeze out even more performance. With recent versions of GCC, first compile and run the generated code with -fprofile-generate, and then recompile with -fprofile-use.

For best results, configure a recent version of the Boehm GC using CPPFLAGS="-O3 -march=native" ./configure --enable-cplusplus --enable-threads=pthreads --enable-thread-local-alloc --enable-large-config --enable-parallel-mark. The last option allows the GC to take advantage of multiple processor cores.

When optimizing, it is very useful to know how much time is spent in each part of your program. Gprof2Dot can be used to generate attractive graphs for a standalone program, as well as for the original Python code. OProfile can be used to profile an extension module.

To use Gprof2dot, download the gprof2dot.py file from the website and install Graphviz. Then:

 shedskin program
 make program_prof
 ./program_prof
 gprof program_prof | gprof2dot.py | dot -Tpng -ooutput.png

To use OProfile, install it and use it as follows.

 shedskin -e extmod
 make
 sudo opcontrol --start
 python main_program_that_imports_extmod
 sudo opcontrol --shutdown
 opreport -l extmod.so

Tricks

The following two code fragments work in the same way, but only the second one is supported:

 statistics = {'nodes': 28, 'solutions': set()}

 class statistics: pass
 s = statistics(); s.nodes = 28; s.solutions = set()

The order in which the arguments of a function or print statement are evaluated changes after translation to C++, so it is best not to rely on it:

 print 'hoei', raw_input() # raw_input is called before 'hoei' is printed!

Tuples with different element types and a length greater than 2 are currently not supported. However, they can be emulated:

 class mytuple:
     def __init__(self, a, b, c):
         self.a, self.b, self.c = a, b, c

Blocks of code surrounded by #{ and #} are ignored by Shed Skin. This can be used to comment out code that cannot be compiled. For example, the following fragment will only produce a plot when run under CPython:

 print "x =", x
 print "y =", y
 #{
 import pylab as pl
 pl.plot(x, y)
 pl.show()
 #}

Version 0.9.4, June 16, 2013, Mark Dufour and James Coughlan

Source: https://habr.com/ru/post/194650/
