Reshaping old code for new realities

In this article I describe one way to convert C/C++ code into C# code with the least effort. The principles, however, also apply to other pairs of languages. I should say up front that the method is not intended for code that implements a GUI.

Why do this? I, for example, ported the well-known graphics library LibTiff (and LibJpeg along with it) to C# this way. That let me use the work of the many people who created LibTiff in my own program, alongside the .NET Framework class library. Most code examples in this article come from LibTiff and LibJpeg.


1. Infrastructure

What you need:

The original code, which you can build with one click.
A set of tests, which you can also run with one click.
A version control system.
A basic understanding of refactoring.

The one-click build and the one-click test run are needed to make the change-compile-test cycle as fast as possible. The more effort one cycle takes, the less often you will run it, and that can lead to large, complicated rollbacks of failed changes.

Any version control system will do. I use Subversion; you can use whatever is convenient for you. The main thing is to use something other than folders on disk.

The tests are needed so that at any moment you can confirm the code still does what it should, exactly as before. This assurance that the code's behavior does not change is the main difference between the method described here and the “rewrite everything from scratch in a new language” approach. The tests do not have to cover 100% of the code, but it is desirable to have tests for all the main functionality. The tests should also avoid touching the program's internals, so that they do not have to be rewritten constantly.

For example, to port LibTiff I used:

A set of images in different variants of the TIFF format.
The tiffcp console program, which converts images from one TIFF variant to another.
A set of scripts (bat files) that call tiffcp to do the conversions.
A set of expected output images.
A program that does a binary comparison of the converted images against the expected ones.

As for refactoring, one book is enough: Martin Fowler's “Refactoring: Improving the Design of Existing Code”. If you have not read it, be sure to; knowing the principles of refactoring is useful to any programmer. You do not have to read the whole book. The first 130 pages or so are enough: the first five chapters and the beginning of the sixth, up to the “Inline Method” section.

Of course, the better you know both languages involved, the easier it will be. Note that you do not need to know how the original code works internally. To start, it is enough to know what the code does. How it does it will become clear during the conversion.

2. The porting process

The essence of the method is that the original code is reduced to a simplified form through a large number of small, simple changes, while keeping its functionality intact. Do not try to change a large piece of code at once, let alone optimize it at the same time. Move in the smallest steps possible, and after each step run the tests and record the successful changes. That is: change a little, then check. If everything is in order, commit the changes to the version control repository.

The porting process can be divided into three major steps:

In the original code, everything that relies on features specific to the source language is gradually replaced with something simpler but functionally equivalent. This often makes the code slower and less elegant. Do not worry about that at this stage.
The modified code is brought into a form that the new compiler can build.
The tests are ported, and the code in the new language is brought to functional parity with the original.

Only after all three steps are complete is it worth thinking about the speed and elegance of the code.

The main difficulty lies in the first step. Here you need to turn C/C++ code into “pure C++” that is as close as possible to C# syntax. At this step you need to get rid of:

preprocessor directives
goto statements
typedef declarations
pointer arithmetic
function pointers
multiple inheritance
“free” functions

Let us proceed to the consideration of specific steps.

2.1 Delete unused code

The first step is to remove unused code. In LibTiff, for example, I first deleted all files that were not part of the Windows build. Then, in the remaining files, I found the conditional compilation blocks whose code the Visual Studio compiler ignored, and deleted those too. Examples of such code:

#if defined(__BORLANDC__) || defined(__MINGW32__)
# define XMD_H 1
#endif

#if 0
extern const int jpeg_zigzag_order[];
#endif

Unused functions often turn up in the source code as well. They too should be sent to their eternal rest.

2.2 Preprocessor and conditional compilation

Conditional compilation is often used to create specialized versions of a program. One or more files use #define to configure what will be included in the build, and the code in other files is wrapped in #ifdef / #endif. Example:

/* jconfig.h for Microsoft Visual C++ on Windows 95 or NT. */
.....
#define BMP_SUPPORTED
#define GIF_SUPPORTED
.....

/* wrbmp.c */
....
#ifdef BMP_SUPPORTED
...
#endif /* BMP_SUPPORTED */

I recommend deciding right away what will be used and getting rid of the conditional compilation. For example, if you decide that BMP support is needed, remove every #ifdef BMP_SUPPORTED (and its matching #endif) from the code, keeping the code between them.

If you need to keep the ability to build several versions of the program, you will need a set of tests for each version. I advise keeping only the most complete version and working with that. Once the port is complete, you can add the necessary conditional compilation back.

The work with the preprocessor does not end there. You need to find the preprocessor macros that emulate functions and replace them with real functions.

#define CACHE_STATE(tif, sp) do { \
    BitAcc = sp->data; \
    BitsAvail = sp->bit; \
    EOLcnt = sp->EOLcnt; \
    cp = (unsigned char*) tif->tif_rawcp; \
    ep = cp + tif->tif_rawcc; \
} while (0)

To write the function signature correctly, you need to know the types of all the variables involved. Note that the macro assigns values to BitAcc, BitsAvail, EOLcnt, cp and ep. These variables will become parameters of the new function, and they must be passed by reference. For BitAcc, for example, you would write uint32& in the function signature.
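A minimal sketch of such a conversion is given here. The Tiff and Fax3State structures are simplified stand-ins invented for illustration, and their field types are assumptions rather than the real LibTiff declarations.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the real LibTiff structures.
struct Fax3State {
    std::uint32_t data;  // bit accumulator
    int bit;             // number of valid bits in the accumulator
    int EOLcnt;          // count of EOL codes seen
};

struct Tiff {
    unsigned char* tif_rawcp;  // current position in the raw data buffer
    int tif_rawcc;             // bytes remaining in the raw data buffer
};

// Function equivalent of the CACHE_STATE macro. Every variable the macro
// assigned to becomes a parameter passed by reference.
inline void CacheState(const Tiff* tif, const Fax3State* sp,
                       std::uint32_t& BitAcc, int& BitsAvail, int& EOLcnt,
                       unsigned char*& cp, unsigned char*& ep)
{
    BitAcc = sp->data;
    BitsAvail = sp->bit;
    EOLcnt = sp->EOLcnt;
    cp = tif->tif_rawcp;
    ep = cp + tif->tif_rawcc;
}
```

Each call site then changes from CACHE_STATE(tif, sp) to a call that lists the affected local variables explicitly, which also makes the hidden data flow of the macro visible.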

Sometimes programmers abuse the preprocessor. Here is a real example of such abuse:

#define HUFF_DECODE(result,state,htbl,failaction,slowlabel) \
{ register int nb, look; \
  if (bits_left < HUFF_LOOKAHEAD) { \
    if (! jpeg_fill_bit_buffer(&state,get_buffer,bits_left, 0)) {failaction;} \
    get_buffer = state.get_buffer; bits_left = state.bits_left; \
    if (bits_left < HUFF_LOOKAHEAD) { \
      nb = 1; goto slowlabel; \
    } \
  } \
  look = PEEK_BITS(HUFF_LOOKAHEAD); \
  if ((nb = htbl->look_nbits[look]) != 0) { \
    DROP_BITS(nb); \
    result = htbl->look_sym[look]; \
  } else { \
    nb = HUFF_LOOKAHEAD+1; \
slowlabel: \
    if ((result=jpeg_huff_decode(&state,get_buffer,bits_left,htbl,nb)) < 0) \
    { failaction; } \
    get_buffer = state.get_buffer; bits_left = state.bits_left; \
  } \
}

In the code above, PEEK_BITS and DROP_BITS are also “functions” created the same way as HUFF_DECODE. In this case it may make sense to first expand the bodies of PEEK_BITS and DROP_BITS directly into HUFF_DECODE, to simplify the conversion.

You can proceed to the next stage of code refinement after only harmless preprocessor commands are left:

#define DATATYPE_VOID 0

2.3 The switch and goto statements

You can get rid of goto by introducing boolean variables and/or restructuring the function. For example, if a function has a loop that uses goto to jump out of it, you can instead set a boolean variable, break out of the loop, and check the variable after the loop.
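The flag-and-break rewrite can be sketched as follows. The function sumOrFail is hypothetical and exists only to illustrate the pattern; the original goto-based shape is kept in a comment for comparison.

```cpp
#include <cstddef>

// Original shape, shown for comparison only:
//   for (i = 0; i < count; ++i) {
//       if (buf[i] < 0) goto bad;
//       sum += buf[i];
//   }
//   return sum;
// bad:
//   return -1;

// Rewritten without goto: a flag records why the loop ended.
// sumOrFail is a hypothetical function used only for illustration.
int sumOrFail(const int* buf, std::size_t count)
{
    bool failed = false;
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (buf[i] < 0) {
            failed = true;  // was: goto bad;
            break;
        }
        sum += buf[i];
    }
    if (failed)
        return -1;          // the code that used to live under the "bad" label
    return sum;
}
```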

Next, I check every switch statement for cases with a missing break.

switch (test1(buf))
{
case -1:
    if (line != buf + (bufsize - 1))
        continue;
    /* falls through */
default:
    fputs(buf, out);
    break;
}

This is allowed in C/C++ but prohibited in C# (for non-empty cases). Such switch statements can be replaced with several if blocks or, if the falling-through case is only a couple of lines long, the shared code can be duplicated.
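As a small illustration of the “several if blocks” option, here is a sketch built around a hypothetical function scoreFor; it is not taken from LibTiff or LibJpeg. The fallthrough version is kept in a comment so the two can be compared.

```cpp
// Original shape with fallthrough, shown for comparison only:
//   switch (kind) {
//   case 1:
//       total += 10;
//       /* falls through */
//   case 2:
//       total += 1;
//       break;
//   default:
//       break;
//   }

// Rewritten with plain if blocks, which carries over to C# unchanged.
// scoreFor is a hypothetical function used only for illustration.
int scoreFor(int kind)
{
    int total = 0;
    if (kind == 1)
        total += 10;  // code that belonged only to case 1
    if (kind == 1 || kind == 2)
        total += 1;   // code shared by case 1 (via fallthrough) and case 2
    return total;
}
```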

2.4 Time to gather stones

Everything described so far takes relatively little time compared with the tasks ahead. The first of those large tasks is gathering data and functions into classes. The goal is for every function to be a method of some class.

If the code was originally written in C++, it probably contains few free functions (functions that are not methods). In that case you need to find the relationship between the existing classes and the “free” functions. It usually turns out that the functions play a supporting role for the classes. If a function is used by only one class, it can be moved into that class as a static method. If it is used by several classes, you can create a new class and put the function there as a static method.

If the code was originally written in C, it has no classes at all. You will have to create them from scratch, grouping functions around the data they operate on. Fortunately, it is usually easy to see which data and functions form one logical unit, especially if the C code was written in an object-oriented style.

Consider the example below:

struct tiff
{
    char* tif_name;
    int tif_fd;
    int tif_mode;
    uint32 tif_flags;
    ......
};
......
extern int TIFFDefaultDirectory(tiff*);
extern void _TIFFSetDefaultCompressionState(tiff*);
extern int TIFFSetCompressionScheme(tiff*, int);
......

It is easy to see that the tiff structure is begging to become a class, with the three functions declared below it as its public methods. So change struct to class and make the functions static methods of that class.

As most functions become class methods, it will become easier to understand what to do with the remaining “free” functions. Do not forget that not all functions will become public methods. Usually there are a number of auxiliary functions not intended for external use. Such support functions will become private methods.

Once the functions have become static methods, I advise replacing malloc/free with new/delete and adding constructors and destructors. Then start turning the static methods into ordinary instance methods. As the methods stop being static, it becomes clear that at least one parameter is redundant: the pointer to the original structure, which is now the class itself. That redundancy should of course be eliminated. You may also find that some parameters of private methods can be turned into member variables of the class.
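The two stages can be sketched side by side. The class below is a heavily simplified, hypothetical version of the tiff structure: the member names and the method bodies are assumptions made for illustration, not the actual LibTiff code.

```cpp
#include <string>

// Hypothetical, heavily simplified class form of the tiff structure.
class Tiff
{
public:
    // The constructor replaces the malloc-and-fill code of the C version.
    explicit Tiff(const std::string& name)
        : m_name(name), m_compression(1) {}

    // Stage 1: a static method that still receives the object explicitly.
    static int SetCompressionSchemeStatic(Tiff* tif, int scheme)
    {
        tif->m_compression = scheme;
        return 1;
    }

    // Stage 2: an instance method; the redundant Tiff* parameter is gone.
    int SetCompressionScheme(int scheme)
    {
        m_compression = scheme;
        return 1;
    }

    int Compression() const { return m_compression; }
    const std::string& Name() const { return m_name; }

private:
    std::string m_name;
    int m_compression;
};
```

In a real port only one of the two methods would exist at any given time; both are shown here so that the change in signature is easy to compare.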

2.5 The preprocessor again, and multiple inheritance

Now that a set of functions and structures has become a set of classes, it is time to return to the preprocessor. More precisely, to defines like the one below (no other kind should be left by now):

#define STRIP_SIZE_DEFAULT 8192

Such defines should be turned into constants, each with a class chosen to own it. As with functions, you may need to create a dedicated class for the new constants (for example, Constants). And, like functions, constants can be public or private.
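A minimal sketch of this step, assuming the constant ends up in a dedicated Constants class. The helper stripsNeeded is hypothetical and only shows a use site.

```cpp
// was: #define STRIP_SIZE_DEFAULT 8192
class Constants
{
public:
    static const int STRIP_SIZE_DEFAULT = 8192;
};

// Hypothetical helper showing how a use site reads after the change:
// the number of strips needed to hold totalBytes bytes.
inline int stripsNeeded(int totalBytes)
{
    return (totalBytes + Constants::STRIP_SIZE_DEFAULT - 1)
           / Constants::STRIP_SIZE_DEFAULT;
}
```

In C# the same member would typically be written as a public const int inside the class, so very little changes at the moment of the move.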

If the original code was written in C++, it may use multiple inheritance. This is another thing you need to get rid of before moving to C#. One way is to change the class hierarchy so that multiple inheritance is no longer needed. Another way is to make sure that every class that takes part in multiple inheritance as a base contains only pure virtual methods and no member variables. Example:

class A
{
public:
    virtual bool DoSomething() = 0;
};
class B
{
public:
    virtual bool DoAnother() = 0;
};
class C : public A, public B
{ ...... };

Such multiple inheritance can be easily transferred to C # by declaring classes A and B as interfaces.

2.6 The typedef declaration

Before moving on to the next large task, getting rid of pointer arithmetic, pay attention to type alias declarations (typedef). Sometimes they are used just to shorten the code. For example:

typedef vector<Command*> Commands;

I prefer to inline such declarations, that is, replace Commands with vector<Command*> throughout the code and delete the typedef.

More interesting are the following typedef uses:

typedef signed char int8;
typedef unsigned char uint8;
typedef short int16;
typedef unsigned short uint16;
typedef int int32;
typedef unsigned int uint32;

Look at the names of the types being created. Clearly typedef short int16; and typedef int int32; only get in the way, so it is better to replace int16 with short and int32 with int in the code. The remaining typedefs, however, are very useful. It makes sense just to rename them so that they match the type names in C#, like this:

typedef signed char sbyte;
typedef unsigned char byte;
typedef unsigned short ushort;
typedef unsigned int uint;

Declarations like this one deserve special attention:

typedef unsigned char JBLOCK[64]; /* one block of coefficients */

This declaration introduces the name JBLOCK for an array of 64 unsigned char elements. I prefer to turn such declarations into classes: make JBLOCK a class that holds the array internally and provides methods for accessing its elements. This makes it much easier to understand how JBLOCK arrays (especially two- and three-dimensional ones) are created and destroyed, and how they change as the program runs.
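One possible shape for such a class is sketched here. The name JBlock and the get/set accessors are assumptions chosen for illustration; the text above does not prescribe a particular interface.

```cpp
#include <cstddef>

// Hypothetical class form of: typedef unsigned char JBLOCK[64];
class JBlock
{
public:
    static const std::size_t Size = 64;

    JBlock() : m_data() {}  // value-initializes all 64 elements to zero

    unsigned char get(std::size_t index) const { return m_data[index]; }
    void set(std::size_t index, unsigned char value) { m_data[index] = value; }

private:
    unsigned char m_data[Size];
};
```

Because every read and write now goes through get and set, it becomes easy to add temporary bounds checks or logging while tracking down a porting bug.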

2.7 Pointer arithmetic

Another major task is getting rid of pointer arithmetic. Many C/C++ programs rely heavily on this language feature. For example:

void horAcc32(int stride, uint* wp, int wc)
{
    if (wc > stride) {
        wc -= stride;
        do {
            wp[stride] += wp[0];
            wp++;
            wc -= stride;
        } while ((int)wc > 0);
    }
}

Such functions have to be changed, because pointer arithmetic is not available by default in C#. You can use it in unsafe code, but unsafe code has its drawbacks. So I prefer to rewrite this kind of code using “index arithmetic”, like this:

void horAcc32(int stride, uint* wp, int wc)
{
    int wpPos = 0;
    if (wc > stride) {
        wc -= stride;
        do {
            wp[wpPos + stride] += wp[wpPos];
            wpPos++;
            wc -= stride;
        } while ((int)wc > 0);
    }
}

The result is a function that does the same work but uses no pointer arithmetic and can easily be moved to C#. The modified code will most likely run slower than the original. Let me remind you once more that this does not matter at this stage.

Pay special attention to functions that modify the pointers passed to them. A variant of the original function:

void horAcc32(int stride, uint* & wp, int wc)

Here, when wp changes inside horAcc32, the pointer in the calling function changes too. The index approach still works in this case: introduce the index in the calling function and pass it to horAcc32 by reference.

void horAcc32(int stride, uint* wp, int& wpPos, int wc)

It is often convenient to make int wpPos a class field (member variable).
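Written out in full, the index-by-reference variant might look like the sketch below. The body is the same loop as in the earlier horAcc32 listing; only the handling of wpPos differs, and the uint typedef is repeated so that the fragment stands alone.

```cpp
typedef unsigned int uint;

// Variant of horAcc32 for the case where the caller must see how far
// the function advanced. The index, not the pointer, is passed by reference.
void horAcc32(int stride, uint* wp, int& wpPos, int wc)
{
    if (wc > stride) {
        wc -= stride;
        do {
            wp[wpPos + stride] += wp[wpPos];
            wpPos++;
            wc -= stride;
        } while (wc > 0);
    }
}
```

The caller declares int wpPos = 0; next to the buffer and, after the call, continues from wp[wpPos], which is where the original code would have continued from the advanced pointer.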

2.8 Pointers to functions

After pointer arithmetic, it is time to deal with function pointers (if the code has any). Uses of function pointers fall into three rather different categories:

Function pointers are created and used within a single class or function.
Function pointers are created and used by different classes of the program.
Function pointers are created by users and passed to the program (the program in this case is a static or dynamically loaded library).

An example for the first case:

typedef int (*func)(int x, int y);

class Calculator
{
    Calculator();
    int (*func)(int x, int y);

    static int sum(int x, int y) { return x + y; }
    static int mul(int x, int y) { return x * y; }
public:
    static Calculator* CreateSummator()
    {
        Calculator* c = new Calculator();
        c->func = sum;
        return c;
    }
    static Calculator* CreateMultiplicator()
    {
        Calculator* c = new Calculator();
        c->func = mul;
        return c;
    }
    int Calc(int x, int y) { return (*func)(x, y); }
};

Here, what Calc does in a given instance depends on whether that instance was created by CreateSummator or CreateMultiplicator. I prefer to add an internal enum to the class that lists all possible choices for func, plus a field that stores one of the enum values. Then, instead of the function pointer, I write a method consisting of a switch statement (or several ifs) that picks which function to call based on the field's value. Modified version:

class Calculator
{
    enum FuncType
    { ftSum, ftMul };
    FuncType type;

    Calculator();

    int func(int x, int y)
    {
        if (type == ftSum)
            return sum(x, y);
        return mul(x, y);
    }

    static int sum(int x, int y) { return x + y; }
    static int mul(int x, int y) { return x * y; }
public:
    static Calculator* CreateSummator()
    {
        Calculator* c = new Calculator();
        c->type = ftSum;
        return c;
    }
    static Calculator* CreateMultiplicator()
    {
        Calculator* c = new Calculator();
        c->type = ftMul;
        return c;
    }
    int Calc(int x, int y) { return func(x, y); }
};

Alternatively, you can leave everything as is for now and use delegates once the code is moved to C#.

An example for the second case (function pointers are created and used by different classes of the program):

typedef int (*TIFFVSetMethod)(TIFF*, ttag_t, va_list);
typedef int (*TIFFVGetMethod)(TIFF*, ttag_t, va_list);
typedef void (*TIFFPrintMethod)(TIFF*, FILE*, long);

class TIFFTagMethods
{
public:
    TIFFVSetMethod vsetfield;
    TIFFVGetMethod vgetfield;
    TIFFPrintMethod printdir;
};

I prefer to handle this by turning vsetfield / vgetfield / printdir into virtual methods. Code that used to assign vsetfield / vgetfield / printdir will instead derive classes from TIFFTagMethods that implement the virtual methods as needed.
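The idea can be sketched with a reduced example. The class names TagMethods and MapTagMethods and the simplified signatures (plain int tags and values instead of TIFF*, ttag_t and va_list) are assumptions made to keep the fragment self-contained.

```cpp
#include <map>

// Hypothetical base class: the function-pointer fields have become
// pure virtual methods.
class TagMethods
{
public:
    virtual ~TagMethods() {}
    virtual bool SetField(int tag, int value) = 0;
    virtual bool GetField(int tag, int& value) const = 0;
};

// Code that used to assign its own functions to the pointers now
// provides a derived class instead.
class MapTagMethods : public TagMethods
{
public:
    bool SetField(int tag, int value) override
    {
        m_values[tag] = value;
        return true;
    }

    bool GetField(int tag, int& value) const override
    {
        std::map<int, int>::const_iterator it = m_values.find(tag);
        if (it == m_values.end())
            return false;  // tag was never set
        value = it->second;
        return true;
    }

private:
    std::map<int, int> m_values;
};
```

Callers hold a TagMethods pointer or reference and never need to know which derived class is behind it, which is the same role the table of function pointers played before.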

An example for the third case (function pointers are created by users and passed to the program):

typedef int (*PROC)(int, int);
int DoUsingMyProc (int, int, PROC lpMyProc, …);

This is where delegates fit best. At this stage, while the original code is still being polished, nothing needs to be done. When moving to the C# project, declare a delegate in place of PROC, and have DoUsingMyProc accept an instance of that delegate.

2.9 Isolation of the "problem" code

The last change to the original code is to isolate everything that may cause problems when the compiler changes. This includes, for example, code that makes heavy use of the standard C/C++ library (functions such as fprintf, gets, atof and so on) or of WinAPI. In C#, such code will have to be changed to use .NET Framework methods or, where necessary, P/Invoke. For the latter, I recommend the site http://www.pinvoke.net.

“Problem” code should be localized as much as possible. To do that, you can create a class with static methods that wraps the standard C/C++ library and WinAPI calls. Then, when porting, only this wrapper has to be changed.

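A wrapper of this kind might look like the sketch below. The class name Platform and its two methods are hypothetical; a real wrapper would cover whichever library calls the code being ported actually uses.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical wrapper: the only place that touches the C standard library.
// After the move to C#, only the bodies of these methods need rewriting.
class Platform
{
public:
    // Wraps atof.
    static double ParseDouble(const std::string& text)
    {
        return std::atof(text.c_str());
    }

    // Wraps snprintf for the single format the callers need.
    static std::string FormatInt(int value)
    {
        char buf[32];
        std::snprintf(buf, sizeof(buf), "%d", value);
        return std::string(buf);
    }
};
```

The rest of the code calls Platform::ParseDouble and Platform::FormatInt and never includes the C headers directly.
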
2.10 Changing the compiler

The “moment of truth” has arrived: it is time to move the modified code into a project built by the C# compiler. This is quite simple, though laborious. Create an empty project, add the necessary classes to it, and copy the code from the corresponding original classes into them.

Along the way you will have to delete what is no longer needed (the various #include directives, for example) and make cosmetic changes. The “standard” changes are:

Merge the code from .h and .cpp files.
Replace obj->method() with obj.method().
Replace Class::StaticMethod with Class.StaticMethod.
Remove the * in func(A* anInstance).
Replace func(int& x) with func(ref int x).

Most of the changes are not difficult, but sometimes you will have to comment out part of the code. Mostly this will be the “problem” code discussed in section 2.9. The main goal is to get C# code that compiles. It will most likely not work yet, but everything in its time.

2.11 Finishing touches

Once the ported code compiles, it has to be brought to functional parity with the original. For this you need a second set of tests that exercise the ported code. The methods that were commented out earlier must be reviewed carefully and their bodies rewritten using the .NET Framework. I think this stage needs little explanation; I just want to draw attention to a couple of points.

When creating strings from byte arrays (and vice versa), choose the encoding carefully. Do not use Encoding.ASCII: it is 7-bit, so bytes with values above 127 turn into '?' instead of characters. It is better to use Encoding.Default (the current system encoding) or Encoding.GetEncoding("Latin1"). Which one depends on what happens to the text or bytes next. If the text will be shown to the user, Encoding.Default is the better choice; if the text is being converted to bytes to be written to a binary file, Encoding.GetEncoding("Latin1") is better.

Formatted output (the printf family of functions in C/C++) can cause some trouble. String.Format in .NET differs from printf both in capabilities (it is more limited) and in format string syntax. There are two ways to solve this:

Create a class that does the same thing as the printf function.
Change the format string so that String.Format gives similar results (not always possible).

If you take the first route, have a look at the existing “A printf implementation in C#”.

I prefer the second route. If you take it, a search for “c# format specifiers” and “Format Specifiers” from “C# in a Nutshell” will help.

Once all the tests that use the ported code pass, you can safely say the port is complete. Now you can remember that the code is not yet quite “in the spirit of C#” (for example, it uses get/set methods instead of properties) and refactor it. You can also use a profiler to find bottlenecks and optimize. But that is another story.

Good luck with porting!

PS. This article is not mine; it was written by Sergius Bobrovsky, who does not yet have an invitation to Habr. If anyone has a spare one and does not mind sharing it, please send me a private message.

Source: https://habr.com/ru/post/101653/
