Caché Native Access - working with native libraries in Caché


As you know, Caché is not only a DBMS but also a full-fledged programming language, Caché ObjectScript (COS). Both the DBMS and COS offer rich ways to reach outside Caché: .NET and Java through the .NET and Java Gateways, relational DBMSs through the SQL Gateway, and web services. Native binary libraries, however, are reached through the Caché Callout Gateway, which has its own peculiarities. Read on to see how working with native libraries directly from COS can be made radically easier.

Caché Callout Gateway

Today, Caché uses the Caché Callout Gateway to work with native code. The name covers several functions grouped under one name, $ZF(). These functions fall into two groups:

$ZF(-1), $ZF(-2). The first group lets you run system commands and console programs. It is an effective tool, but the drawback is obvious: it is hard to expose the full functionality of a library through one or several programs.
Example using $ZF(-1)
Creating a new folder named “newfolder” in the working directory:
```
set name = "newfolder"
set status = $ZF(-1, "mkdir " _ name)
```
$ZF(-3), $ZF(-5), $ZF(). The second group gives access to dynamic and static libraries. This is closer to what we need, but it is not that simple: $ZF() does not work with arbitrary libraries, only with libraries of a special kind, Callout Libraries. A Callout Library differs from an ordinary library in that its code contains a special symbol table, ZFEntry, which holds something like prototypes of the exported functions. Moreover, the argument types of exported functions are strictly limited: only int and a few pointer types are supported. So to turn an arbitrary library into a Callout Library, you will most likely have to write a wrapper around the whole library, which is inconvenient.
An example of creating a Callout Library and calling a function from it
Callout Library, file test.c:
```
#define ZF_DLL
#include <cdzf.h> // cdzf.h is located in Cache/dev/cpp/include

int square(int input, int *output)
{
    *output = input * input;
    return ZF_SUCCESS;
}

ZFBEGIN // table of exported functions
ZFENTRY("square", "iP", square) // "iP" means that square takes two arguments: int and int *
ZFEND
```
Compile (mingw):
```
 gcc -mdll -fpic test.c -o test.dll 
```
On Linux, you must replace -mdll with -shared.
Calling square() from Caché:
```
USER> write $ZF(-3, "test.dll", "square", 9)
81
```

Caché Native Access

To remove the limitations of the Callout Gateway and make working with native libraries convenient, the CNA project was created. The name is a calque of JNA, a similar project for the Java virtual machine.

CNA features:

You can call functions from any dynamic (shared) library that is binary compatible with C
Calling functions requires only COS code: there is no need to write anything in C or any other language compiled to machine code
Support for all simple C types, size_t and pointers
Support for structures (including nested structures)
Caché thread support
Supported platforms: Linux (x86-32/64), Windows (x86-32/64)

Installation

First, build the C part. It compiles with a single command:

 make libffi && make

On Windows, you can compile with mingw or download ready-made binaries. Then import the cna.xml file into any convenient namespace:

 do $system.OBJ.Load("  cna.xml", "c")

CNA example

The simplest native library, present on every system, is the C standard library. On Windows it is usually located at C:\Windows\System32\msvcrt.dll, on Linux at /usr/lib/libc.so. Let's try calling a function from it, for example strlen, which has this prototype:

 size_t strlen(const char *);

 Class CNA.Strlen Extends %RegisteredObject
 {

 ClassMethod Call(libcnaPath As %String, libcPath As %String, string As %String) As %Integer
 {
     set cna = ##class(CNA.CNA).%New(libcnaPath) // create a CNA.CNA object
     do cna.LoadLibrary(libcPath)                // load libc into CNA

     set pString = cna.ConvertStringToPointer(string) // convert the string to C format and keep a pointer to it

     // Call strlen: pass the function name, the return type,
     // the list of argument types, and the arguments themselves
     set result = cna.CallFunction("strlen", cna.#SIZET, $lb(cna.#POINTER), pString)
     do cna.FreeLibrary()
     return result
 }

 }

In the terminal:

 USER>w ##class(CNA.Strlen).Call("libcna.dll", "C:\Windows\system32\msvcrt.dll", "hello")
 5

Implementation details

CNA is a pairing of a C library and a Caché library. It relies mostly on libffi, a library that provides the low level of a foreign function interface (FFI). libffi lets you forget about the variety of calling conventions and call functions at run time without providing their specifications at compile time. But to call a function through libffi you need its address, and we would like to call functions by name alone. Getting the address of a function from its name requires platform-dependent interfaces: POSIX and WinAPI. POSIX provides the dlopen()/dlsym() mechanism for loading a library and finding the address of a function; WinAPI provides LoadLibrary() and GetProcAddress(). This is one of the obstacles to porting CNA to other platforms, although almost all modern systems (except, of course, Windows) support the POSIX standard at least partially.
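The name-to-address step on the POSIX side can be sketched in a few lines of C. This is only an illustration, not CNA's actual code: call_by_name and strlen_fn are hypothetical names, the sketch searches the libraries already loaded into the process instead of opening a library by path, and it calls the function through a known pointer type, whereas CNA would hand the address to libffi. On older glibc versions the program may need to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>   /* POSIX: dlopen, dlsym, dlclose */
#include <stddef.h>

/* The caller must know the real signature of the symbol it looks up. */
typedef size_t (*strlen_fn)(const char *);

/*
 * Hypothetical helper: find `name` among the symbols loaded into the
 * process and call it as if it had the signature of strlen.
 * Returns (size_t)-1 if the lookup fails.
 */
size_t call_by_name(const char *name, const char *arg)
{
    assert(name != NULL && arg != NULL);

    /* dlopen(NULL, ...) gives a handle to the main program and its dependencies. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return (size_t)-1;

    size_t result = (size_t)-1;
    void *addr = dlsym(handle, name);
    if (addr != NULL) {
        /* POSIX allows converting the void * from dlsym to a function pointer. */
        strlen_fn fn = (strlen_fn)addr;
        result = fn(arg);
    }
    dlclose(handle);
    return result;
}
```

Calling call_by_name("strlen", "hello") looks the function up by its name at run time and returns its result; a name that is not exported anywhere yields (size_t)-1.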

libffi is written in C and assembly. It is therefore a native library, and Caché can reach it only through the Callout Gateway. So we need a layer that connects libffi and Caché and is itself a Callout Library, so that it can be called from COS. Roughly, a call goes from COS through the Callout Gateway to this layer, then through libffi to the target library.

At this point a data conversion problem arises. When we call a function from COS, we pass arguments in Caché's internal format. They have to travel to the Callout Gateway and then to libffi, and somewhere along the way they must be converted to the C format. But the Callout Gateway supports very few data types, so if we converted the data on the C side, we would have to pass everything as strings and then parse them, which is inconvenient for many reasons. It was therefore decided to convert the data on the Caché side and pass all arguments as strings that already contain binary data in the C format.

Since all C data types except composite ones are numbers, converting data essentially comes down to turning numbers into binary strings in COS. Caché has handy functions for this that remove the need for direct access to the data: $CHAR and $ASCII, which convert an 8-bit number to a character and back. There are analogs for all the other number formats we need: 16-, 32- and 64-bit integers and double-precision floating-point numbers. There is one catch, though: for integers, each of these functions works either only with signed or only with unsigned numbers. In C, as is well known, an integer of any size can be either signed or unsigned. The missing half has to be implemented by hand.

Signed numbers in C are represented in two's complement:

The first bit holds the sign of the number: 0 for plus, 1 for minus
Positive numbers are encoded the same way as unsigned ones
The maximum positive number is 2^(k-1) - 1, where k is the number of bits
The code of a negative number x is the same as the code of the unsigned number 2^k + x

This representation lets signed numbers use the same implementation of addition as unsigned ones. This works thanks to arithmetic overflow.
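The claim about addition can be checked on a small case. The sketch below uses k = 8 to keep the numbers readable; encode8, decode8 and add_codes8 are hypothetical helper names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Code of a signed value x in k = 8 bits: x itself if x >= 0, else 2^8 + x. */
uint8_t encode8(int x)
{
    assert(x >= -128 && x <= 127);
    return (uint8_t)(x >= 0 ? x : 256 + x);
}

/* Inverse mapping: codes with the top bit set stand for negative numbers. */
int decode8(uint8_t code)
{
    return code < 128 ? (int)code : (int)code - 256;
}

/* Plain unsigned addition modulo 2^8: the carry out of the top bit is dropped. */
uint8_t add_codes8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

For example, -3 is encoded as 253 and 5 as 5. Their unsigned sum is 258, which overflows 8 bits and wraps to 2, and 2 is exactly the code of -3 + 5.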

Consider, as an example, the conversion of signed 32-bit numbers. If the number is non-negative, we simply use the $ZLCHAR function; if it is negative, we need to find the unsigned number that has the same binary representation. How to find it follows directly from the definition of two's complement: add the original number to the smallest number that does not fit in 32 bits, 2³², or FFFFFFFF₁₆ + 1. The result is the following code:

 if (x < 0) { set x = $ZLCHAR($ZHEX("FFFFFFFF") + x + 1) } else { set x = $ZLCHAR(x) }
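For comparison, here is a minimal C sketch of the same conversion: it produces the four bytes that the C side expects for a signed 32-bit argument. The name pack_int32_le is hypothetical, and little-endian byte order (as on x86) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the 4-byte little-endian two's complement form of x into out.
 * A negative x is first mapped to the unsigned value 0xFFFFFFFF + x + 1,
 * mirroring the COS snippet above.
 */
void pack_int32_le(int32_t x, unsigned char out[4])
{
    assert(out != NULL);

    int64_t wide = x; /* wide enough to hold 2^32 + x without overflow */
    uint32_t code = (uint32_t)(x < 0 ? (int64_t)0xFFFFFFFF + wide + 1 : wide);

    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((code >> (8 * i)) & 0xFF);
}
```

Packing -1 gives the bytes FF FF FF FF, and packing -2 gives FE FF FF FF, the same bytes a C program would find in memory for those int values on a little-endian machine.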

The next problem is converting structures, the composite type of the C language. Everything would be simple if structures were laid out in memory exactly as they are written, with all fields following one another. But in memory a structure is arranged so that the address of each field is a multiple of a special number, the alignment of the field. The end of the structure is aligned too, to the largest field alignment. Alignment is needed because most platforms either cannot work with unaligned data at all or do so rather slowly. On x86 the alignment usually equals the size of the field, with one exception: on 32-bit Linux, every field larger than 4 bytes is aligned to just 4 bytes. More about data alignment can be read in this article.

Take this structure as an example:

 struct X {
     char a, b; // sizeof(char) == 1
     double c;  // sizeof(double) == 8
     char d;
 };

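On x86-32 it is laid out differently on different operating systems. On Windows, a and b take offsets 0 and 1, c starts at offset 8 after 6 bytes of padding, d sits at offset 16, and 7 bytes of tail padding bring the size to 24 bytes. On Linux, c starts at offset 4 after 2 bytes of padding, d sits at offset 12, and 3 bytes of tail padding bring the size to 16 bytes.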

In practice, such a representation is built quite simply. The fields are written to memory one after another, but before each field an empty gap (padding) is inserted. The padding is calculated as follows:

 set padding = (alignment - (offset # alignment)) # alignment // offset is the end of the data written so far
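The same rule can be written as a small C sketch that lays out a whole structure. padding_for and layout_struct are hypothetical names used only here; the field sizes and alignments are passed in explicitly, so both x86-32 variants described in the text can be tried on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* Padding needed so that `offset` becomes a multiple of `alignment`. */
size_t padding_for(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return (alignment - (offset % alignment)) % alignment;
}

/*
 * Place fields with the given sizes and alignments one after another.
 * Stores the offset of each field in offsets[] and returns the total
 * size, including tail padding up to the largest field alignment.
 */
size_t layout_struct(const size_t *sizes, const size_t *aligns,
                     size_t count, size_t *offsets)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < count; i++) {
        offset += padding_for(offset, aligns[i]); /* gap before the field */
        offsets[i] = offset;
        offset += sizes[i];
        if (aligns[i] > max_align)
            max_align = aligns[i];
    }
    return offset + padding_for(offset, max_align); /* align the end */
}
```

For struct X the field sizes are {1, 1, 8, 1}. With alignments {1, 1, 8, 1} (double aligned to 8) the function places c at offset 8 and returns a size of 24; with alignments {1, 1, 4, 1} (double aligned to 4, as on 32-bit Linux) it places c at offset 4 and returns 16.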

What is not working yet

1) Integers in Caché are represented in such a way that exact work with them is guaranteed only while the number stays within the range of a 64-bit signed integer. But C has a 64-bit unsigned type (unsigned long long). So a number that exceeds the maximum 64-bit signed value, 2⁶³ - 1 (~9 * 10¹⁸), cannot be passed to an external function.

2) Caché has two types for real numbers: its own decimal type and IEEE 754 double-precision floating-point numbers. That is, Caché has no analogs of the C types float and long double. CNA can work with these types, but every time a value enters Caché it is converted to double.

3) On Windows, long double will most likely work incorrectly. Microsoft and the mingw developers have fundamentally different views on what a long double should be. Microsoft holds that long double is 8 bytes on both 32- and 64-bit systems. In mingw it is 12 bytes on 32-bit systems and 16 bytes on 64-bit ones. And since CNA is compiled with mingw, it is better to forget about long double.

4) Unions and bit fields in structures are not supported, because libffi does not support them.

Criticism, comments, and suggestions are welcome.

The entire source code is published on GitHub under the MIT license.
github.com/intersystems-ru/cna

Source: https://habr.com/ru/post/235473/
