Analog std :: vector from C ++ 11 on pure C89 and how I wrote it

Residential array of people. No seriously.

Holivary between lovers of C and its followers ~~bastard~~ son in the face of C + + began even before my birth and will end unless after the death of both these languages and me at the same time.

Adherents of the great creation Kernigan-Ritchie, up to the last second of the working day, are ready to prove to Stroustrup’s minions axioms about eternity Xi and its incredible flexibility.
Those, in their own way, advise them to better rejoice at the working day, because it is about to be the last - the twenty-first century does not need a cross-platform assembler.
Burning out, supporters of C lead millions of long-gone through their heads to embrace theses “why C is better than C ++,” while each time emphasizing that the second had lost all the merits of the first while still in his father's womb, incidentally losing the face of man.
The accused party is not offended and ...

but wait, what am I talking about?

I love C, I respect C ++ and I can not stand holivars (honestly). At the same time, I realize that C really lacks a lot, and a vivid example of this is the lack of convenient data handling. In C ++, this problem is largely solved by the STL and properties of the language itself. In my student's opinion, the std::vector is especially different. If it became interesting how I implemented its analogue using C89 tools - I ask for cat.

Prehistory

In general, everyone who switches to C from a slightly higher level is surely faced with the above described problem (in my case it was FreeBASIC and Free Pascal). The problem of the absence of the long-beloved Redim and SetLength() first solved "in the forehead with a sledgehammer" with the help of realloc() . Then knowledge comes around with experience, and instead, a simple, self-written dynamic array is used.

However, the need to duplicate its code for each individual data type each time causes more irritation. There is also an alternative - the use of pointers, requiring dereferencing and type conversions. And then the person gets into the hands of C ++ (or its equivalent) and the person sees STL (or its equivalent). Then you can read in any gutter novel.

However, they fall in love with the body, but love the soul. If a person has been in a happy relationship with X for a long time, if projects have already appeared in them, then it is only natural for a person to want to make the object of his love better - for mutual benefit. And a person in cultivation is always focused on something.

In short, this is a story about how love for C made me add to it (her?) The notorious std::vector - what I liked about C ++, which (which?) I was into at one time.

Before us even the flood

It has already been noted that the problem of the absence in C of the built-in dynamic array for arbitrary types is not new and has been solved differently many times.
Here are the options for the implementation of the vector, which I found in just five minutes on Google:

https://github.com/rxi/vec
https://github.com/eteran/c-vector
https://github.com/jibsen/scv
https://github.com/graphitemaster/cvec
https://github.com/robertkety/dataStructures (Ctrl + F "dynamicArray")
http://troydhanson.imtqy.com/uthash/utarray.html
https://github.com/dude719/Dynamic-Array-Kernel
https://developer.gnome.org/glib/stable/glib-Arrays.html
https://www.happybearsoftware.com/implementing-a-dynamic-array
https://github.com/nothings/stb/blob/master/stretchy_buffer.h (added on tip by Xop )

All of these solutions have at least one of the following ~~fatal~~ disadvantages:

The implementation of macros specific control functions.
Using macros as inline functions is a bad idea. This has been said many times, but are we tired of repeating?
First , when using macro functions, it is harder to track and debug errors due to incorrect argument types.
Secondly , the macros-functions do not know how to return anything, if you do not take into account the perversion with the transfer of the variable name as a separate argument to store the result.
Thirdly , due to constant code substitutions from macro functions that are not very similar to inline, the size of the translation unit swells. Hence the increase in the size of the output executable file and other pleasures of life.
Fourth , you cannot take a pointer to a macro.
Duplication of functions common to any vectors.
For example, different release functions for the vector int 's and the vector char ' s. Under the hood, they will be just a call to the function free() , deeply indifferent to what is stored in the buffer being destroyed, as well as to the type of pointer to it.
This again provokes an increase in the volume of translation units, duplication of code, and at the same time littering of the name space.
Work with values through untyped pointers.
This makes it necessary to always take a pointer to the value to add it even to a simple vector of primitive types (for example, int 's). And also do not forget about casting and dereferencing. Well, that in such a vector, you can potentially stick values of different types , and no one will warn us about this.
The designation of the type of vector as a structure.
The biggest drawback, in the presence of one which even the complete absence of others no longer plays a role.
First , access to the vector elements occurs through the structure field. For a one-dimensional vector, this is already inconvenient - is it worth talking about multidimensional.
Secondly , all fields of the structure, even technical, are freely accessible to the user.
Thirdly , the almost complete incompatibility between vectors of different types.
Fourth , to create and delete a vector, you need 2 malloc() / free() calls, respectively - one per structure and one per vector buffer itself. As you might guess, for the dimension vector $n$ calls will be already $2n$ .
Fifth , such a vector can be transferred to its function only by a pointer to a structure, therefore the syntax for accessing it in a function will be slightly different ( -> instead of . And so on).

Thus, the task of creating a vector that specializes in all types of C data and has the following capabilities emerges:

Access to elements of a vector as to elements of a regular array, regardless of its dimension: vec[k] , vec[i][j] , etc.
Vector control using ordinary functions that have typed arguments and a return value, as opposed to macros.
No duplicate code due to the specialization of only those functions that accept and / or return custom type values.
The user has no direct access to vector technical information.
Compatibility between vectors of different types at the level of assignment of one to another.
The ability of the user in the specialization of the vector to specify the method of transmission and return values: by value or by reference (through the pointer).
The maximum similarity of the vector interface with that of std::vector from C ++ 11.

Dive into C89

I will answer in advance the question why C89, and not at least C99. First, it gives support to the compiler from Visual Studio (although I don’t like it). Secondly, I myself love C99 very much, but in this case I felt that the task at hand could be solved in more severe conditions. After all, the publication in "abnormal programming" must be justified.

When I first began to study C, writing a convenient vector seemed to me extremely impossible - the indexing operator in the head was associated strictly with the array, the array was associated strictly with the typed pointer, and the vector was associated with the need to store technical information in the structure. I could not get away from the idea that access to the elements of a vector could be realized only through the field of this structure itself, and the use of this approach by the overwhelming majority of vector realizations only strengthened confidence in this.

But then I came across a dynamic string library for C called Simple Dynamic Strings , written at the time for Redis . It uses a different approach: technical information about the vector is not stored in the structure together with a pointer to it, but in the form of a header right before the vector buffer itself in memory. This allows you to operate the vector directly through a typed pointer, while the placement of technical information is always reliably known.

Let me remind you that a typed pointer allows you to access the elements of a vector through an indexing operator, as in a regular array. And the arrangement of technical information right in front of the vector itself deprives the user of direct access to it - for this he will have to manipulate the pointers. In the case of a structure, both the pointer to the vector and the technical information are its fields, access to which is the same.

Thus, we have implemented the possibilities (1) and (4). Go ahead.

Since now the vector is just a typed pointer, we, it would seem, can already generalize for different types of vectors such functions as, for example, the release function, simply designating the argument “pointer to the vector to be released” as void* . It is well known that any other pointer can be implicitly converted to void* , as well as vice versa.

However, can we do this for other functions? Oddly enough, but yes. We do not have functions that operate directly on the stored values themselves - they were originally supposed to specialize separately for each type of vector. In fact, we operate only on the places where values are stored, but not on the values themselves. Therefore, it is enough for us to know only the size of one element, which can be stored in the vector’s technical information and filled in as a function of its creation by passing the corresponding argument. Such a trick allows us to generalize all functions for different types of vectors, and to specialize on their basis only those that accept and / or return values of a user-defined type.

Clauses (2) and (3) are implemented. And since there are no objects in C and any value can be reassigned to another variable literally by copying the memory, item (5) is also implemented. We continue in the same vein.

In fact, all specialized functions operate on user-defined values in one of two ways:

assignment of the transferred value to the indicated vector elements;
return value of the specified item.

It is known that a value can be passed to a function or returned from it either by value (sorry for the pun) or by reference. For primitive types, the first option is preferable, while for complex structures the second one.
Of course there are no links a la C ++ in C, but pointers will replace them.

Tired of the text? rhetorical question.
Then I will give for clarity the definitions of variants of the same functions that accept / return variables by value and by reference, respectively.

 gvec_error_e gvec_NAME_push( gvec_NAME_t* phandle, const TYPE value ) gvec_error_e gvec_NAME_push( gvec_NAME_t* phandle, const TYPE* value )

 TYPE gvec_NAME_front( gvec_NAME_t handle ) TYPE* gvec_NAME_front( gvec_NAME_t handle )

It is seen that in both cases the difference is only in one symbol.

Already in C89, an assignment operator is available for all types, not just primitive ones. This allows sending and returning by reference or by value in specialized functions to be indicated by arguments of the specializer macro. True, a reasonable question arises: why not to indicate this with one argument at once for transmission and return at the same time? It is very simple: returning by value is more convenient and faster in the case of primitive types, but the value may not be determined if there is no requested element in the vector. When returning by reference in this case, we can simply return NULL . In short, this is left to the discretion of the programmer himself.

As a result, paragraph (6) is implemented. Paragraph (7) can also be considered implemented on the basis of all the previous ones.

Conclusion

The final implementation of the vector library on C89, ready for practical use, is here:

https://github.com/cher-nov/genvector ( ~~MIT License~~ now WTFPL)

The simplest example of use is given in ReadMe.

Of course, the article does not cover some other, less complex but equally interesting aspects of implementation, for the description of which I lacked brevity and eloquence. Also omitted the ranting about the decisions that turned out to be unsuccessful in the end, and their rethinking. But I am sure that the answers on the first one can be obtained from the code and ReadMe in the repository, and on the second one - from the history of commits.

This is my first article on Habré, so please judge as much as possible. For the impediment - especially.

I hope all this will prove to someone so useful.

Source: https://habr.com/ru/post/324210/

All Articles

Analog std :: vector from C ++ 11 on pure C89 and how I wrote it

Prehistory

Before us even the flood

Dive into C89

Conclusion

More articles: