Developing Modules for Limbo C (Part 2)

Part 1

Content

Heap

In order for C to correctly create and destroy complex structures with which the code will work on Limbo, it is necessary to imagine how they are stored in memory, i.e. how the heap is organized in Inferno. All the functions listed below for working with heap are described in libinterp/heap.c , and the structures in include/interp.h .

What is in my memory ...

In order to simply allocate n bytes in heap, and then free them, you need to do the following:

 Heap *h; uchar *data; h = nheap(256); data = H2D(uchar*, h); ... //   data  256  destroy(data);

Physically, the memory will be allocated 256 + the size of the heap header bytes, and at the beginning will be the header, and then the user data. The header is described in the Heap structure, plus there are two macros for converting a pointer to a heap header into a pointer to data (moreover, for convenience, immediately with a type H2D() ) H2D() and vice versa D2H() . The destroy() function uses D2H() to get a heap header instead of a pointer to the beginning of user data of unknown length and find out how many bytes need to be freed.

 struct Heap { int color; /* Allocation color */ ulong ref; Type* t; ulong hprof; /* heap profiling */ }; #define H2D(t, x) ((t)(((uchar*)(x))+sizeof(Heap))) #define D2H(x) ((Heap*)(((uchar*)(x))-sizeof(Heap)))

Something about garbage collection

What is in the header heap? I won't tell you anything about hprof , I didn't understand heap profiling, but the rest of the fields are extremely important.

First, a few words about the garbage collector in Inferno. Two strategies are used simultaneously: a simple reference counter of which is sufficient for most structures, plus a variation on the theme of tri-color marking to select structures with cyclical references. Accordingly, ref is the reference counter, and color used for tri-color marking. When nheap() or another similar function allocates a new object to the heap, the ref value is set to 1 in its heap header. When destroy() is called, it decreases the ref value by 1, and only if ref became equal to 0, it frees up the object occupied memory.

Accordingly, as long as you store the value returned by nheap() (or any other similar function) in one variable, you have exactly one reference to this object, and its ref is 1. As soon as you copy this reference to another variable, you need to increase the reference count plus notify the tri-color algorithm. It is done this way ( Setmark() is also a macro, but in order to deal with it you need to understand the work of the tri-color algorithm, which is not required from you right now):

 Heap *h; uchar *data, *data2; data = H2D(uchar*, nheap(256)); data2 = data; h = D2H(data2); h->ref++; Setmark(h); //   h->color destroy(data); //       destroy(data2); //

Of course, if you selected an object in heap, and then returned it to the user by writing to *f->ret , then nothing else needs to be done with ref and color - the link will be removed from the local variables of your function when the function ends, and only one will be left again the reference to this object is for the user, in the variable where he saved the value returned by your function, i.e. There was a move, not a copy of the link.

')

There is one implicit nuance associated with moving links. In the previous example with the return value, a new, just created link is moved, and nothing extra is required there. But if you moved a link from one already existing variable / structure to another, also an existing one, then you need to notify the tri-color algorithm by calling Setmark() (this is also related to the peculiarities of this algorithm and will be described below):

 dst->data = src->data; src->data = H; Setmark(D2H(dst->data));

Types of objects in heap

The example with nheap() described above is almost never used in real code, since Limbo does not have this data type: n bytes. Therefore, neither get from Limbo nor return to Limbo the object selected through nheap() will fail. And to allocate memory for the internal needs of your C-module, as a rule, there are usually enough malloc() with free() .

All objects in the heap that Limbo can operate on must be of some type described by the Type structure. This allows you to solve the problem of automatic detection of all links within any object - what needs to be done when:

memory allocation for this object (to set all the pointers inside it to H aka nil );
executing destroy() (so that when deleting an object from memory with it, delete or reduce the ref of all objects it referred to);
the work of the garbage collector (to take into account when crawling all the objects in the memory, the objects to which are located inside other complex objects).

 struct Type { int ref; void (*free)(Heap*, int); void (*mark)(Type*, void*); int size; int np; void* destroy; void* initialize; uchar map[STRUCTALIGN]; };

For standard types (any adt / tuple / structure that contains either non-reference fields or fields with links to regular heap objects that can be freed via destroy() - like String* and Array* ), the np and map fields are used. The map field contains a bit mask, one bit each (starting from the high bit of the first byte) for every 4 bytes of adt / tuple / structure, where the set bit means that the corresponding 4 bytes are a pointer to a heap object. (The np field contains the length of the map field in bytes.)

Some data types use memory in an unusual way. Typical examples are String and Array - the first contains inside the char* field, which must be freed via free() ; the second can be a slice and contain within itself a link to the "parent" array. The Type structure allows you to specify your own non-standard handler functions for these types, which will be called when allocating / initializing memory from destroy() and from the garbage collector: free , mark , destroy and initialize .

The size field contains the size of a structure of this type (since we still have to specify its type when allocating memory for this structure, preserving the size of the structure inside the type allows us to limit ourselves to specifying only the type, without adding the size of the structure to it each time).

The ref field is used to store the current number of objects in this type of heap. The fact is that the list of types is not limited to standard string , array of , etc. - any tuple described on the fly is a separate type for which you need your description by the Type structure. It turns out that some types are created by Limbo on the fly, stored in the same heap, and must be removed from memory as soon as all objects of this type are deleted. Therefore, when creating a new object of a certain type, you need to increase the ref this type, and when you delete this object, destroy() will automatically reduce the ref also for the type of this object (and remove it from memory if ref equals 0).

Type values for standard types are declared statically, with ref set to 1 (so their ref will never be less than 1 and they will never be removed from memory), and are described at the beginning of libinterp/head.c :

 Type Tbyte = { 1, 0, 0, 1 }; Type Tword = { 1, 0, 0, sizeof(WORD) }; Type Tptr = { 1, 0, markheap, sizeof(WORD*), 1, 0, 0, { 0x80 } }; Type Tarray = { 1, freearray, markarray, sizeof(Array) }; Type Tstring = { 1, freestring, noptrs, sizeof(String) }; ...

We create our complex structures in heap

Types for your own adt / tuples / structures in some cases will help determine libinterp/runt.h (by calculating the size of your structure for Type.size and the bit mask of pointer fields for Type.map and Type.np ), in others you will have to define them independently (for example, to create and return a tuple from the C function). They are usually created during initialization of your module (returning to the example with the module Example) using the dtype() function. And memory is allocated and initialized for them through heap(&Tsometype) , and not nheap(n_bytes) .

module/example.m

 Example: module { ... MyData: adt{ i: int; s: string; new: fn(i: int): ref MyData; }; };

libinterp/runt.h (automatically generated from module/example.m )

 typedef struct Example_MyData Example_MyData; struct Example_MyData { WORD i; String* s; }; #define Example_MyData_size 8 #define Example_MyData_map {0x40,} void MyData_new(void*); typedef struct F_MyData_new F_MyData_new; struct F_MyData_new { WORD regs[NREG-1]; Example_MyData** ret; uchar temps[12]; WORD i; };

Example_MyData_map 0x40 means 010000 ... bit mask, i.e. the first 4 bytes of our structure is not a pointer (WORD) but the second is a pointer (String *).

libinterp/example.c

 static Type* TMyData; static uchar MyData_map[] = Example_MyData_map; void examplemodinit(void) { ... TMyData = dtype(freeheap, Example_MyData_size, MyData_map, sizeof(MyData_map)); } void MyData_new(void *fp) { F_MyData_new *f; int i; Example_MyData* mydata; f = fp; i = f->i; mydata = H2D(Example_MyData*, heap(TMyData)); mydata->i = i; destroy(*f->ret); *f->ret = mydata; }

testexample.b

 ... example: Example; MyData: import example; ... init(nil: ref Draw->Context, nil: list of string) { ... mydata := MyData.new(5); sys->print("i = %d, s = %q\n", mydata.i, mydata.s); } ; testexample ... i = 5, s = '' ;

Array

Consider working with arrays. In memory, the array is located as follows: heap header, array header, array elements. Accordingly, to allocate memory for an array, you must know the size and number of its elements. To initialize these elements (all of a sudden there are links that need to be set in H ) and later to correctly remove from memory, you need to know the type of these elements (then there is no need to specify the size, it is already specified inside the type).

 struct Array { WORD len; Type* t; Array* root; uchar* data; };

Here len is the size of the array, t is the type of its elements, root pointer to the parent array (if this array is its slice), and data pointer to the first element of the array (this is the next byte after the Array structure if the array is independent or the address of the first element of our slice is among the elements parent array).

If we do not create a slice of another array, then we need to allocate more memory than the Array structure itself (and, accordingly, as indicated in Tarray.size ). Therefore, we will not be able to allocate memory for the array through the heap() function used previously. Fortunately, for this there is a convenient heaparray() function. An example of selecting array[16] of byte :

 Array *arr; arr = H2D(Array*, heaparray(&Tbyte, 16));

Here is an example of a function that returns an array slice: http://code.google.com/p/inferno-cjson/source/browse/libinterp/cjson.c#59 .

adt and ref adt

There is an implicit difference between Limbo and C in how adt and ref adt are handled. If at the Limbo level, working with them looks (almost) the same:

 a := array[10] of MyData; b := array[10] of ref MyData; for(i := 0; i < len b; i++) b[i] = ref MyData; a[0].i = b[0].i;

then at the C level these are completely different arrays:

 Array *a, *b; int i; Example_MyData* adata; Example_MyData** bdata; a = H2D(Array*, heaparray(TMyData, 10)); adata = (Example_MyData*)a->data; b = H2D(Array*, heaparray(&Tptr, 10)); bdata = (Example_MyData**)b->data; for(i = 0; i < b->len; i++) bdata[i] = H2D(Example_MyData*, heap(TMyData)); adata[0].i = bdata[0]->i;

GC

A general description of the tri-color algorithm can be found in Wikipedia . Inferno, these three “colors” are called mutator , marker and sweeper .

New objects set h->color=mutator .

After each full gc cycle, the values of mutator , marker and sweeper variables change in a circle, thus changing the values of h->color all objects in heap:

 mutator -> marker marker -> sweeper sweeper -> mutator //    ..  sweeper

If during operation gc h->color==sweeper , then h is removed from memory.

So, what is Setmark() and why it is needed.

 #define Setmark(h) if((h)->color!=mutator) { (h)->color = propagator; nprop=1; }

Further, since the garbage collector can work for a very long time, and everything else is stopped at this moment, in Inferno the garbage collector works in small pieces - bypassing some of the objects it pauses, letting other code work, and then continues from the place where it stopped last time . But for this algorithm, it is required to check all objects in memory in one pass, atomically, otherwise it is impossible to unambiguously identify unused objects. Therefore, a mechanism is implemented in the Inferno garbage collector that allows you to determine if its potentially interesting objects changed between GC launches.

This mechanism is the call to Setmark() . The nprop flag nprop it, if necessary, indicates to the GC that after the current crawl cycle of objects in the heap cannot be deleted, unused objects must be repeated from the beginning.

The value h->color==propagator means that the next time gc is run, it is necessary to view this object. In the process of viewing the object, h->color set in mutator . (The same propagator value is set to “root” objects at the beginning of a new GC cycle, but in this context it is not important.)

Disable object in heap from GC

Since the GC removes from memory all objects that are not referenced by “root” objects (which probably contain Dis working threads, loaded modules, etc.) without paying attention to the value of their ref , the question arises: how to store a link to the heap object outside Dis threads, for example in the global variables of your C-module, so that it does not kill the GC? To do this, you must inform the GC that it should not monitor this heap object using poolimmutable() (to connect an object back to the GC there is a poolmutable() , all these functions are in emu/port/alloc.c ):