Type conversion in Boost.Python: converting between everyday C++ and Python types

This article is not a direct continuation of the series on wrapping a C++ API. There will be no wrappers today, although logically this is the third part of that story.

Today there will be a sea of blood, the dismemberment of existing types and the magical transformation of them into familiar analogs in another language.

We will not talk about the existing string conversions. No, we will write our own converters.

We will transform the usual Python datetime.datetime into boost::posix_time::ptime from the Boost library and back. But why stop there: we will turn the entire datetime library into Boost types! And to keep things from getting boring, we will sacrifice the built-in Python 3.x byte array class, for which Boost.Python happens to have no converter, and then brutally reuse the byte array conversion inside a new converter from Python's uuid.UUID to boost::uuids::uuid. Yes, a converter can be used inside a converter!

Do you crave blood, Colosseum?! ..

In place of an introduction

In case anyone has not noticed, Boost.Python does a great job of turning a whole bunch of scalars into Python objects of the appropriate type. If you want a comparison, write in pure C, use the C-API directly, and let it eat your brain. Spend plenty of time on it, so that you come to appreciate the comfort of modern technology: the convenience of an easy chair, the need for a hot tub and a TV remote. As for fans of wooden benches, washing in ice holes and reading by torchlight, let them carry on with their folk crafts.

So, here is the thing: Boost.Python has built-in type converters from Python to C++ and back, partially implemented in $(BoostPath)\libs\python\src\converter and $(BoostPath)\boost\python\converter. There are many of them, and they solve about 95% of the problems of working with the built-in types of Python and C++. There is string conversion too, not ideal of course, but if you work with UTF-8 strings or wide strings in C++, everything will be fast, solid and unobtrusive, that is, convenient to use.

Almost everything the built-in converters do not cover is solved by wrapping your classes. Boost.Python offers a truly monstrously simple way to describe class wrappers, in the form of a meta-language that even looks like a Python class:

    class_<Some>( "Some" )
        .def( "method_A", &Some::method_A, args( "x", "y", "z" ) )
        .def( "method_B", &Some::method_B, args( "u", "v" ) )
        ;

Everything is great, but there is one thing ...

... one big and wonderful thing: both C++ and Python are languages with their own libraries. In C++

    #include <boost/date_time.hpp>
    #include <boost/uuid/uuid.hpp>

has a de facto counterpart in Python:

    import datetime
    import uuid

So a lot of your C++ code may already be tied to, say, the class boost::gregorian::date, while on the Python side a lot is tied to its counterpart, the datetime.date class. Working in Python with a wrapper of boost::gregorian::date, complete with all its methods and operator overloads, and trying to pass its instances off as ordinary datetime.date objects... I do not even know what to call that. It is not a crutch, it is dancing with a grenade. And that grenade will go off, ladies and gentlemen of the jury. On the Python side you should work with the built-in date and time library.

If you are reading this while looking at your own code, where you pull the fields of a Python datetime out one by one in C++, there is no need for that silly grin: everything in the paragraph above applies to you just as much. Even if you have your own mega date/time class in C++, it is better to write a type converter than to pull the fields out one at a time in some home-made method.

In general, if Python has its own type and C++ has its own well-established type with similar core functionality, then you need a converter.

You really need a converter.

What is a converter

A converter is a conversion registered in Boost.Python from a C++ type to a Python type or vice versa. On the C++ side you use familiar types, fully confident that on the Python side they will appear as the corresponding type. Converters are usually written in both directions, but the conversion from C++ to Python is much easier to write, as you will see. The point is that creating an instance in C++ requires memory to put it in, which is often a non-trivial task, whereas creating an object in Python is extremely easy. So let's start with the conversion from C++ to Python.

Type conversion from C++ to Python

To convert from C++ to Python, you need a struct with a static convert method that accepts a reference to the C++ type and returns PyObject*, the common type for any object in the Python C-API and the stuffing inside a boost::python::object.

Let's define a template struct right away, because we want mass slaughter:

    template< typename T >
    struct type_into_python
    {
        static PyObject* convert( T const& );
    };

All that is required, for the type boost::posix_time::ptime for example, is to implement the specialization of the template struct's method:

 template<> PyObject* type_into_python<ptime>::convert( ptime const& );

and register the converter in the module declaration, inside BOOST_PYTHON_MODULE:

  to_python_converter< ptime, type_into_python<ptime> >();

Well, having said A, let's say B as well. The implementation of the converter for boost::posix_time::ptime will look something like this:

    // Standard C++ requires template<> on the definition of a specialized member.
    template<>
    PyObject* type_into_python<ptime>::convert( ptime const& t )
    {
        auto d = t.date();
        auto tod = t.time_of_day();
        // microseconds within the current second
        auto usec = tod.total_microseconds() % 1000000;
        return PyDateTime_FromDateAndTime( d.year(), d.month(), d.day(),
            tod.hours(), tod.minutes(), tod.seconds(), usec );
    }

Important! When registering the module, we must hook up datetime through the C-API:

    PyDateTime_IMPORT;
    to_python_converter< ptime, type_into_python<ptime> >();

Without the PyDateTime_IMPORT line, nothing will take off.


In general, we are lucky that the Python C-API has a ready-made function that creates a PyObject* for a new datetime.datetime from its components, essentially an analogue of the datetime class constructor. We are less lucky that Boost has such a "fun" API for the ptime class. The class turned out to be not entirely self-sufficient: you have to pull the date and the time out of it as separate components, and the time is represented as a time_duration, which is an analogue not so much of datetime.time as of datetime.timedelta! That, in general, rules out a one-to-one mapping of the datetime library types onto C++. And the fact that boost::posix_time::time_duration gives no direct access to microseconds and milliseconds is thoroughly unpleasant. Instead you either have to work "cunningly" with the fractional_seconds() method, or bluntly do a terrible thing and take total_microseconds() % 1000000. I have not yet decided which is worse; I simply do not like how time_duration is made. We will map it to the datetime.time class, and leave the similar datetime.timedelta class alone for now.
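The modulo trick is easy to sanity-check from the Python side. The sketch below uses only the standard datetime module; the helper split_microseconds is a made-up name for illustration and mirrors what total_microseconds() % 1000000 does in the C++ converter. It also hints at why a duration maps onto datetime.time only while it stays within a single day.

```python
from datetime import time, timedelta

def split_microseconds(total_us):
    """Split a non-negative microsecond count into (h, m, s, us)."""
    seconds, us = divmod(total_us, 1_000_000)   # us is the modulo from the C++ code
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    return h, m, s, us

# 6 h 30 min 15.25 s expressed as a duration (the timedelta view)
d = timedelta(hours=6, minutes=30, seconds=15, microseconds=250_000)
total_us = d // timedelta(microseconds=1)       # whole microseconds in the duration

h, m, s, us = split_microseconds(total_us)
as_time = time(h, m, s, us)                     # the datetime.time view
print(as_time)

# A duration of 24 hours or more has no datetime.time equivalent.
try:
    time(*split_microseconds(25 * 3600 * 1_000_000))
except ValueError as exc:
    print("not representable as datetime.time:", exc)
```

So a converter that targets datetime.time implicitly assumes the time_duration is non-negative and shorter than 24 hours.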

Conversion from Python to C++

Heh, my friends, this is the really complicated part. Stock up on validol and fasten your seat belts.

Everything seems to be much the same: we create a struct template with two methods, convertible and construct, which are the convertibility check and the constructor of the C++ type. In fact it does not matter what the methods are called; what matters is that you refer to them at registration, which is most conveniently done in the constructor of our template struct:

    template< typename T >
    struct type_from_python
    {
        type_from_python()
        {
            converter::registry::push_back( convertible, construct, type_id<T>() );
        }

        static void* convertible( PyObject* );
        static void construct( PyObject*, converter::rvalue_from_python_stage1_data* );
    };

When the module is declared, it is then enough to call the constructor of this struct. And of course you need to implement these methods for each type to be converted, for example for ptime:

    template<> void* type_from_python<ptime>::convertible( PyObject* );
    template<> void type_from_python<ptime>::construct( PyObject*,
        converter::rvalue_from_python_stage1_data* );

Let's look straight away at the implementation of the convertibility check and the construction method for ptime:

    template<>
    void* type_from_python<ptime>::convertible( PyObject* obj )
    {
        return PyDateTime_Check( obj ) ? obj : nullptr;
    }

    template<>
    void type_from_python<ptime>::construct( PyObject* obj,
        converter::rvalue_from_python_stage1_data* data )
    {
        // raw memory that Boost.Python reserved for the ptime
        auto storage = reinterpret_cast<
            converter::rvalue_from_python_storage<ptime>* >( data )->storage.bytes;
        date date_only( PyDateTime_GET_YEAR( obj ), PyDateTime_GET_MONTH( obj ),
            PyDateTime_GET_DAY( obj ) );
        time_duration time_of_day( PyDateTime_DATE_GET_HOUR( obj ),
            PyDateTime_DATE_GET_MINUTE( obj ), PyDateTime_DATE_GET_SECOND( obj ) );
        time_of_day += microsec( PyDateTime_DATE_GET_MICROSECOND( obj ) );
        new(storage) ptime( date_only, time_of_day );   // placement new
        data->convertible = storage;
    }

With the convertible method everything is clear: if you are a datetime, come on through; if not, nullptr and out you go.

But the construct method will be just as hairy for absolutely every type!

Even if you have your own MyDateTime type, you will have to create it in place with placement new, exactly where you are told to put it! See this funny operator:

  new(storage) ptime( date_only, time_of_day );

This is placement new. It constructs your new object at the specified location. That location still has to be computed, and we are offered the following way to obtain the required pointer:

  auto storage = reinterpret_cast< converter::rvalue_from_python_storage<ptime>* >( data )->storage.bytes;

I will not comment on this. Just remember.

Everything else is auxiliary computation needed to call a sensible constructor of the not-so-self-sufficient ptime class.

Do not forget to fill in another field at the end:

  data->convertible = storage;

Again, I do not know how to put it more simply; just remember that this is important and the field must be filled in: it tells Boost.Python where the constructed object lives. Think of it as a small nuisance on the road to universal happiness.
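As a rough mental model of the two-step protocol, here is a toy registry in plain Python. It is only a sketch and not how Boost.Python is actually implemented: the names registry, register and from_python are invented for this illustration, and the real library keeps a separate converter chain per C++ target type and constructs into preallocated storage rather than returning a value.

```python
from datetime import datetime

# Toy model: each entry is a (convertible, construct) pair, playing the role
# of the two static methods handed to registry::push_back in the C++ code.
registry = []

def register(convertible, construct):
    registry.append((convertible, construct))

def from_python(obj):
    """Step 1: find a converter that accepts obj. Step 2: let it build the value."""
    for convertible, construct in registry:
        if convertible(obj) is not None:      # None plays the role of nullptr
            return construct(obj)
    raise TypeError("no registered converter accepts %r" % (obj,))

# A stand-in for the ptime converter: accept datetime, build a plain tuple.
register(
    lambda obj: obj if isinstance(obj, datetime) else None,
    lambda obj: ("ptime", obj.year, obj.month, obj.day,
                 obj.hour, obj.minute, obj.second, obj.microsecond),
)

print(from_python(datetime(2013, 2, 10, 12, 30, 0, 5)))
```

The point of the split is that the cheap check runs first, and the comparatively expensive construction only happens for a converter that has already agreed to take the object.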

Examples of how this is done by people other than me can be found in several places, including the FAQ section of the Boost.Python website.

Converting datetime types to <boost/date_time.hpp> and back

For date and time separately, everything is pretty simple. Thanks to our template structs, we only need to implement the following method specializations for date and time_duration:

    template<> PyObject* type_into_python<date>::convert( date const& );
    template<> void* type_from_python<date>::convertible( PyObject* );
    template<> void type_from_python<date>::construct( PyObject*,
        converter::rvalue_from_python_stage1_data* );

    template<> PyObject* type_into_python<time_duration>::convert( time_duration const& );
    template<> void* type_from_python<time_duration>::convertible( PyObject* );
    template<> void type_from_python<time_duration>::construct( PyObject*,
        converter::rvalue_from_python_stage1_data* );

The task is simple: it comes down to splitting the previous methods into pairs for the date and the time separately.

For boost::gregorian::date and datetime.date:

    template<>
    PyObject* type_into_python<date>::convert( date const& d )
    {
        return PyDate_FromDate( d.year(), d.month(), d.day() );
    }

    template<>
    void* type_from_python<date>::convertible( PyObject* obj )
    {
        return PyDate_Check( obj ) ? obj : nullptr;
    }

    template<>
    void type_from_python<date>::construct( PyObject* obj,
        converter::rvalue_from_python_stage1_data* data )
    {
        auto storage = reinterpret_cast<
            converter::rvalue_from_python_storage<date>* >( data )->storage.bytes;
        new(storage) date( PyDateTime_GET_YEAR( obj ), PyDateTime_GET_MONTH( obj ),
            PyDateTime_GET_DAY( obj ) );
        data->convertible = storage;
    }

And for boost::posix_time::time_duration and datetime.time:

    template<>
    PyObject* type_into_python<time_duration>::convert( time_duration const& t )
    {
        // microseconds within the current second
        auto usec = t.total_microseconds() % 1000000;
        return PyTime_FromTime( t.hours(), t.minutes(), t.seconds(), usec );
    }

    template<>
    void* type_from_python<time_duration>::convertible( PyObject* obj )
    {
        return PyTime_Check( obj ) ? obj : nullptr;
    }

    template<>
    void type_from_python<time_duration>::construct( PyObject* obj,
        converter::rvalue_from_python_stage1_data* data )
    {
        auto storage = reinterpret_cast<
            converter::rvalue_from_python_storage<time_duration>* >( data )->storage.bytes;
        time_duration* t = new(storage) time_duration( PyDateTime_TIME_GET_HOUR( obj ),
            PyDateTime_TIME_GET_MINUTE( obj ), PyDateTime_TIME_GET_SECOND( obj ) );
        *t += microsec( PyDateTime_TIME_GET_MICROSECOND( obj ) );
        data->convertible = storage;
    }
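One detail worth keeping in mind on the Python side: datetime.datetime is a subclass of datetime.date, and PyDate_Check, like isinstance, accepts subtypes. So a function taking a date would presumably also accept a full datetime and silently drop its time part. The short sketch below, using only the standard library, shows the split into date and time parts and the subclass relationship.

```python
from datetime import date, datetime, time

moment = datetime(2013, 2, 10, 18, 45, 30, 123456)

# Splitting a datetime into the two parts the converters handle separately.
d, t = moment.date(), moment.time()

# Recombining the parts gives back the original value.
recombined = datetime.combine(d, t)
print(recombined == moment)

# datetime is a subclass of date, so a date check also accepts a datetime ...
print(isinstance(moment, date))
# ... but a plain date is not a datetime, and a time is neither of them.
print(isinstance(d, datetime), isinstance(t, date))
```

If that implicit truncation is not wanted, the date converter's convertible check would need to reject datetime objects explicitly.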

Registering all of this in our module will look like this:

    BOOST_PYTHON_MODULE( ... )
    {
        ...
        PyDateTime_IMPORT;

        to_python_converter< ptime, type_into_python<ptime> >();
        type_from_python< ptime >();

        to_python_converter< date, type_into_python<date> >();
        type_from_python< date >();

        to_python_converter< time_duration, type_into_python<time_duration> >();
        type_from_python< time_duration >();
        ...
    }

Checking the date and time conversion

It is time to test our mega-conversion. Let's set up all sorts of otherwise useless functions that take a date/time as input and return a date/time as output.

    ptime tomorrow();
    ptime day_before( ptime const& the_moment );
    date last_day_of_this_month();
    date year_after( date const& the_day );
    time_duration delta_between( ptime const& at, ptime const& to );
    time_duration plus_midday( time_duration const& the_moment );

We declare them in our module so they can be called from Python:

    def( "tomorrow", tomorrow );
    def( "day_before", day_before, args( "moment" ) );
    def( "last_day_of_this_month", last_day_of_this_month );
    def( "year_after", year_after, args( "day" ) );
    def( "delta_between", delta_between, args( "at", "to" ) );
    def( "plus_midday", plus_midday, args( "moment" ) );

Here is what these functions do (although it no longer really matters; what matters is the input and output types):

    ptime tomorrow()
    {
        return microsec_clock::local_time() + days( 1 );
    }

    ptime day_before( ptime const& that )
    {
        return that - days( 1 );
    }

    date last_day_of_this_month()
    {
        date today = day_clock::local_day();
        // first day of the next month, minus one day
        date next_first_day = (today.month() == Dec)
            ? date( today.year() + 1, 1, 1 )
            : date( today.year(), today.month() + 1, 1 );
        return next_first_day - days( 1 );
    }

    date year_after( date const& the_day )
    {
        return the_day + years( 1 );
    }

    time_duration delta_between( ptime const& at, ptime const& to )
    {
        return to - at;
    }

    time_duration plus_midday( time_duration const& the_moment )
    {
        return time_duration( 12, 0, 0 ) + the_moment;
    }

In particular, take this simple script (for Python 3.x):

    from someconv import *
    from datetime import *

    # test datetime.datetime <=> boost::posix_time::ptime
    t = tomorrow(); print( 'Tomorrow at same time:', t )
    for _ in range(3):
        t = day_before(t); print( 'Day before that moment:', t )

    # test datetime.date <=> boost::gregorian::date
    d = last_day_of_this_month(); print( 'Last day of this month:', d )
    for _ in range(3):
        d = year_after(d); print( 'Year after that day:', d )

    # test datetime.time <=> boost::posix_time::time_duration
    at = datetime.now()
    to = at + timedelta( seconds=12*60*60 )
    dt = delta_between( at, to )
    print( "Delta between '{at}' and '{to}' is '{dt}'".format( at=at, to=to, dt=dt ) )
    t0 = time( 6, 30, 0 )
    t1 = plus_midday( t0 )
    print( t0, "plus midday is:", t1 )

It should run correctly and print correct dates and times. The test script will of course be attached. (I am not showing the output, so as not to give away when this was written!)

In principle, do not be shy about writing your own test functions; they will all work as they should if you did everything correctly.
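When writing such tests, it can help to have a pure-Python reference to compare against. The sketch below reimplements the "first day of next month minus one day" approach from last_day_of_this_month() and cross-checks it with calendar.monthrange; both helper names are made up for this illustration.

```python
import calendar
from datetime import date, timedelta

def last_day_of_month(day):
    """Same approach as the C++ code: first day of next month minus one day."""
    if day.month == 12:
        next_first = date(day.year + 1, 1, 1)
    else:
        next_first = date(day.year, day.month + 1, 1)
    return next_first - timedelta(days=1)

def last_day_of_month_via_calendar(day):
    """Independent cross-check using calendar.monthrange."""
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return date(day.year, day.month, days_in_month)

# Ordinary February, leap-year February, and the December special case.
for sample in (date(2013, 2, 10), date(2012, 2, 1), date(2013, 12, 31)):
    assert last_day_of_month(sample) == last_day_of_month_via_calendar(sample)
    print(sample, "->", last_day_of_month(sample))
```

With the someconv module built, the value returned by last_day_of_this_month() should then equal last_day_of_month(date.today()).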

In any case, at the end I will post a link to the project along with the test script.

Byte array as a vector of bytes in C++

Generally speaking, the example below is extremely harmful. The standard std::vector template with an element type narrower than int will be extremely inefficient. The losses on copying and, as a consequence, on vector::resize() will be catastrophic, simply because the copying is done element by element. With all optimizations enabled, this leads to losses of up to 170% on plain copying compared with memcpy() (measured in an MSVS v10 Release build). That is not particularly pleasant for a frequently used piece of code, especially when the copying is not visible and resize() sometimes happens implicitly. You get "interesting" performance dips, in the sense that you will have plenty to do hunting down slowdowns in a large system.

The example below is purely academic, which matters if you need manic optimization in some part of the code you are writing in C++. If performance is not a concern for you, feel free to use this conversion.

For Python 2.x, this section is irrelevant in principle: back then byte arrays were called strings. It would be much more interesting to read about working with unicode and converting it to a standard C++ string, which is covered in PyWiki.

But for Python 3.x, this conversion reduces a large chunk of C-API code to the use of an ordinary vector (byte is an unsigned 8-bit integer, uint8_t).

So, once again we use our wonderful template structs and rejoice:

    typedef uint8_t byte;
    typedef vector<byte> byte_array;
    ...
    template<> PyObject* type_into_python<byte_array>::convert( byte_array const& );
    template<> void* type_from_python<byte_array>::convertible( PyObject* );
    template<> void type_from_python<byte_array>::construct( PyObject*,
        converter::rvalue_from_python_stage1_data* );

We also add the converter registration to our module declaration:

    BOOST_PYTHON_MODULE( ... )
    {
        ...
        to_python_converter< byte_array, type_into_python<byte_array> >();
        type_from_python< byte_array >();
    }

And the simplest implementation: we just apply our knowledge of the C-API for the PyBytes object and work with the methods of std::vector:

    template<>
    PyObject* type_into_python<byte_array>::convert( byte_array const& ba )
    {
        // front() must not be called on an empty vector
        const char* src = ba.empty() ? "" : reinterpret_cast<const char*>( &ba.front() );
        return PyBytes_FromStringAndSize( src, ba.size() );
    }

    template<>
    void* type_from_python<byte_array>::convertible( PyObject* obj )
    {
        return PyBytes_Check( obj ) ? obj : nullptr;
    }

    template<>
    void type_from_python<byte_array>::construct( PyObject* obj,
        converter::rvalue_from_python_stage1_data* data )
    {
        auto storage = reinterpret_cast<
            converter::rvalue_from_python_storage<byte_array>* >( data )->storage.bytes;
        byte* dest;
        Py_ssize_t len;
        PyBytes_AsStringAndSize( obj, reinterpret_cast<char**>( &dest ), &len );
        new(storage) byte_array( dest, dest + len );    // copies the bytes
        data->convertible = storage;
    }

Additional comments are hardly needed; for the C-API of the PyBytes object, I refer you to the Python documentation.
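For a Python-side picture of what this converter moves across the boundary, here is a small sketch using only built-ins. The list of integers merely stands in for std::vector<uint8_t>; the variable names are illustrative.

```python
# bytes is an immutable sequence of integers in range(256), which matches
# what std::vector<uint8_t> holds on the C++ side.
payload = b"I_must_be_string\x00\xff"

as_vector = list(payload)            # analogous to byte_array( dest, dest + len )
in_range = all(0 <= b <= 255 for b in as_vector)

round_trip = bytes(as_vector)        # analogous to PyBytes_FromStringAndSize
print(round_trip == payload, in_range)

# Embedded zero bytes survive, because the length is passed explicitly.
print(len(payload), as_vector[-2:])

# bytearray is not a subtype of bytes, so a PyBytes_Check-style test rejects it.
print(isinstance(bytearray(payload), bytes))
```

The last line suggests that, as written, the converter would turn away a bytearray argument; accepting it would take a separate check in convertible.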

Converting uuid.UUID to boost::uuids::uuid and back

You will laugh, but we simplified our work so much by creating those two templates at the very beginning that, once again, everything comes down to implementing three methods:

    using namespace boost::uuids;
    ...
    template<> PyObject* type_into_python<uuid>::convert( uuid const& );
    template<> void* type_from_python<uuid>::convertible( PyObject* );
    template<> void type_from_python<uuid>::construct( PyObject*,
        converter::rvalue_from_python_stage1_data* );

As usual, we add two new lines to the module declaration, registering the conversion in both directions:

    to_python_converter< uuid, type_into_python<uuid> >();
    type_from_python< uuid >();

And now the most interesting part: the C-API will not help us here, it will rather get in the way. The easiest way is to go through boost::python::import, importing Python's own uuid module and taking the UUID class from it.

    static object py_uuid = import( "uuid" );
    static object py_uuid_UUID = py_uuid.attr( "UUID" );

    template<>
    PyObject* type_into_python<uuid>::convert( uuid const& u )
    {
        // uuid.UUID( None, bytes ); the byte_array converter handles the argument
        return incref( py_uuid_UUID( object(),
            byte_array( u.data, u.data + sizeof(u.data) ) ).ptr() );
    }

    template<>
    void* type_from_python<uuid>::convertible( PyObject* obj )
    {
        // PyObject_IsInstance returns -1 on error, so compare with 1 explicitly
        return PyObject_IsInstance( obj, py_uuid_UUID.ptr() ) == 1 ? obj : nullptr;
    }

    template<>
    void type_from_python<uuid>::construct( PyObject* obj,
        converter::rvalue_from_python_stage1_data* data )
    {
        auto storage = reinterpret_cast<
            converter::rvalue_from_python_storage<uuid>* >( data )->storage.bytes;
        byte_array ba = extract<byte_array>(
            object( handle<>( borrowed( obj ) ) ).attr( "bytes" ) );
        uuid* res = new(storage) uuid;
        memcpy( res->data, &ba.front(), ba.size() );    // unchecked copy, see below
        data->convertible = storage;
    }

Apologies for the global variables; this is usually done in a singleton that calls Py_Initialize() and Py_Finalize() in its constructor and destructor respectively. But since this is a purely educational example that so far is only used from Python, we can get by with this approach. Apologies again, but the code is clearer this way.

Since the behavior in these methods is very different from all of the above, it is necessary to describe in more detail what is actually happening.

In py_uuid we stored the object of the imported uuid module from the Python standard library.

In py_uuid_UUID we stored the uuid.UUID class object, that is, the class itself. Applying parentheses to this object calls the constructor and creates an object of that type, which is what we do later. The class itself is also useful in the convertible method, for checking whether the argument is a UUID.

In the direction from C++ to Python, everything is clear: we simply call the constructor, passing None as the first parameter (the default constructor of boost::python::object creates exactly None) and our byte array from the previous section as the second. For Python 2.x the code changes slightly and gets simpler: it is enough to pass a string and pretend that it is a byte array.
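The same constructor call can be tried directly in Python. This sketch relies only on the standard uuid module, where the first positional parameter of UUID is hex and the second is bytes; the sample value raw is arbitrary.

```python
import uuid

raw = bytes(range(16))               # any 16 bytes will do for the demo

# Positional call, as in the C++ code: hex=None, bytes=raw.
u_positional = uuid.UUID(None, raw)
# The same call spelled with a keyword argument.
u_keyword = uuid.UUID(bytes=raw)

print(u_positional == u_keyword)
print(u_positional.bytes == raw)     # the attribute the reverse converter reads

# The constructor rejects anything that is not exactly 16 bytes long.
try:
    uuid.UUID(None, raw[:8])
except ValueError as exc:
    print("rejected:", exc)
```

Since boost::uuids::uuid always holds 16 bytes, the length check on the Python side never fires for values coming from C++.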

When checking a Python object for convertibility, the PyObject_IsInstance () function helps us a lot.

We obtain the PyObject* pointer for the uuid.UUID type with the ptr() method of boost::python::object. This is where the class object itself came in handy. In Python, classes are objects just like everything else. And this is great. Thank you for such a logical and understandable language.

As for the conversion from Python to C++, at first glance it is not at all clear what is happening on this line:

  byte_array ba = extract<byte_array>( object( handle<>( borrowed( obj ) ) ).attr( "bytes" ) );

In fact, everything here is extremely simple. From the uuid.UUID object that arrived as a PyObject*, we create a full-fledged boost::python::object. Note the handle<>( borrowed( obj ) ) construction: it is very important not to lose the borrowed call, otherwise our fresh object will, in its destructor, destroy the object that was passed in.

So, from the PyObject* we obtained a boost::python::object referring to the argument of type uuid.UUID. We take the bytes attribute of our object and pull a byte_array out of it with extract. That is it, we have the contents.

Those who like doing everything through serialization and deserialization can explore the conversion to a string and back. May lexical_cast() help them, with a stone around their necks. Remember that creating and serializing strings in C++ is actually a very expensive operation.

Python 2.x users get the bytes as a string right away. That is what strings used to be there, just as in C/C++: essentially char*.

From there everything is simple: we fill in the array (apologies for the unsafe copy) and hand the filled object back to C++.

Checking the byte array and UUID conversions

Let's add some more functions that drive our types back and forth between C++ and Python:

    byte_array string_to_bytes( string const& src );
    string bytes_to_string( byte_array const& src );
    uuid random_uuid();
    byte_array uuid_bytes( uuid const& src );

We declare them in our module so they can be called from Python:

    BOOST_PYTHON_MODULE( someconv )
    {
        ...
        def( "string_to_bytes", string_to_bytes, args( "src" ) );
        def( "bytes_to_string", bytes_to_string, args( "src" ) );
        def( "random_uuid", random_uuid );
        def( "uuid_bytes", uuid_bytes, args( "src" ) );
        ...
    }

Their behavior is not that important, but let's honestly show their implementation so the result is clear:

    byte_array string_to_bytes( std::string const& src )
    {
        return byte_array( src.begin(), src.end() );
    }

    string bytes_to_string( byte_array const& src )
    {
        return string( src.begin(), src.end() );
    }

    uuid random_uuid()
    {
        static random_generator gen_uuid;
        return gen_uuid();
    }

    byte_array uuid_bytes( uuid const& src )
    {
        return byte_array( src.data, src.data + sizeof(src.data) );
    }

Here is the test script (for Python 3.x):

    from someconv import *
    from uuid import *
    ...
    # test bytes <=> std::vector<uint8_t>
    print( bytes_to_string( b"I_must_be_string" ) )
    print( string_to_bytes( "I_must_be_byte_array" ) )
    print( bytes_to_string( " - !".encode() ) )
    print( string_to_bytes( " - !" ).decode() )
    print( bytes_to_string( string_to_bytes( " -  !" ) ) )

    # test uuid.UUID <=> boost::uuids::uuid
    u = random_uuid()
    print( 'Generated UUID (C++ module):', uuid_bytes(u) )
    print( 'Generated UUID (in Python): ', u.bytes)

It should run correctly and produce output along these lines:

    I_must_be_string
    b'I_must_be_byte_array'
    - !
    - !
    -  !
    Generated UUID (C++ module): b'\xf1B\xdb\xa9<lL\x9d\x9a\xfd\xf3\xe9\x9f\xa6\x9aT'
    Generated UUID (in Python): b'\xf1B\xdb\xa9<lL\x9d\x9a\xfd\xf3\xe9\x9f\xa6\x9aT'

By the way, if you remove borrowed from the UUID conversion from Python to C++ as an experiment, the script will crash exactly on the last line, since the object will already have been destroyed and there will be nothing left to read the bytes property from.

Summary

We have learned not only to write converters, but also to generalize them, to reduce the effort of writing them to a minimum, and to use one inside another. We now know what a converter is, how to use it and where it is vitally needed.

The link to the project is here (~ 207 KB) . The MSVS v11 project is configured to build with Python 3.3 x64.

Useful links

Boost.Python Documentation

How to write a string converter

Unicode to Python 2.x conversion

Converting arrays between C ++ and Python

Another option for converting date / time

Source: https://habr.com/ru/post/168827/
