
A little about strings in C, or several ways to optimize the unoptimizable

Hello, Habr!

Not long ago I had a rather interesting encounter with a teacher from a computer science college.

A conversation about programming on Linux gradually turned to his claim that the complexity of systems programming is greatly exaggerated, and that C is as simple as a matchstick, just like the Linux kernel (his words).
I had a Linux laptop with me, equipped with the gentleman's toolkit for C development (gcc, vim, make, valgrind, gdb). I no longer remember what task we set ourselves, but a couple of minutes later my opponent was sitting at the laptop, fully ready to solve it.

And in the very first lines he made a serious mistake while allocating memory for... a string.

char *str = (char *)malloc(sizeof(char) * strlen(buffer)); 

buffer is a stack variable holding data read from the keyboard.

Some readers will surely ask: "What could possibly be wrong here?"
Believe me, something can.

What exactly is wrong, read below the cut.

A bit of theory: a quick primer


If you already know this, skip to the next heading.

A string in C is an array of characters that should always end with '\0', the null (end-of-string) character. Fixed-size strings on the stack are declared like this:

 char str[n] = { 0 }; 

n is the size of the character array. Since one element is taken by the terminating '\0', the string itself can be at most n - 1 characters long.

The = { 0 } initializer zeroes the whole array (it is optional; you can declare the array without it). The result is the same as calling memset(str, 0, sizeof(str)) or bzero(str, sizeof(str)). It ensures that the array does not start out full of garbage.

A stack string can also be initialized with a literal right away:

 char buf[BUFSIZE] = "default buffer text\n"; 

In addition, the line can be declared a pointer and allocate memory for it on the heap:

 char *str = malloc(size); 

size is the number of bytes we allocate for the string. Such strings are called dynamic, because the required size is computed at run time and the allocated block can be resized at any time with realloc().

For the stack variable I used n for the array size, and for the heap variable I used size. This captures the essential difference between declaring an array on the stack and allocating memory on the heap: n usually denotes a number of elements, whereas size is a number of bytes, which is another story altogether...

I think that's enough for now. Let's move on.

Valgrind will help us


I also mentioned it in my previous article. Valgrind (there is a Wikipedia article about it and a short how-to) is a very useful program that helps programmers track down memory leaks and invalid memory accesses, which are exactly the things that come up most often when working with strings.

Let's look at a small listing that implements something similar to the program I mentioned, and run it through valgrind:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define HELLO_STRING "Hello, Habr!\n"

    int main(void)
    {
        char *str = malloc(sizeof(char) * strlen(HELLO_STRING));
        strcpy(str, HELLO_STRING);
        printf("->\t%s", str);
        free(str);
        return 0;
    }

And here is the program's output:

    [indever@localhost public]$ gcc main.c
    [indever@localhost public]$ ./a.out
    -> Hello, Habr!

Nothing unusual yet. And now let's run this program with valgrind!

    [indever@localhost public]$ valgrind --tool=memcheck ./a.out
    ==3892== Memcheck, a memory error detector
    ==3892== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
    ==3892== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
    ==3892== Command: ./a.out
    ==3892==
    ==3892== Invalid write of size 2
    ==3892==    at 0x4005B4: main (in /home/indever/prg/C/public/a.out)
    ==3892==  Address 0x520004c is 12 bytes inside a block of size 13 alloc'd
    ==3892==    at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
    ==3892==    by 0x400597: main (in /home/indever/prg/C/public/a.out)
    ==3892==
    ==3892== Invalid read of size 1
    ==3892==    at 0x4C30BC4: strlen (vg_replace_strmem.c:454)
    ==3892==    by 0x4E89AD0: vfprintf (in /usr/lib64/libc-2.24.so)
    ==3892==    by 0x4E90718: printf (in /usr/lib64/libc-2.24.so)
    ==3892==    by 0x4005CF: main (in /home/indever/prg/C/public/a.out)
    ==3892==  Address 0x520004d is 0 bytes after a block of size 13 alloc'd
    ==3892==    at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
    ==3892==    by 0x400597: main (in /home/indever/prg/C/public/a.out)
    ==3892==
    -> Hello, Habr!
    ==3892==
    ==3892== HEAP SUMMARY:
    ==3892==     in use at exit: 0 bytes in 0 blocks
    ==3892==   total heap usage: 2 allocs, 2 frees, 1,037 bytes allocated
    ==3892==
    ==3892== All heap blocks were freed -- no leaks are possible
    ==3892==
    ==3892== For counts of detected and suppressed errors, rerun with: -v
    ==3892== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)

"==3892== All heap blocks were freed -- no leaks are possible": there are no leaks, which is pleasing. But look a little further down (although, I should note, that is only the summary; the main information is elsewhere):

    ==3892== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)

Three errors. In two contexts. In such a simple program. How?!

Very simply. The whole catch is that strlen() does not count the terminating null character '\0'. Even if you add it to the string explicitly (#define HELLO_STRING "Hello, Habr!\n\0"), it is ignored, because strlen() stops at the first '\0' it meets.

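Just above the program's output (the line -> Hello, Habr!) is a detailed report of what our precious valgrind disliked and where. Look at those lines and draw your own conclusions: the invalid write of size 2 lands 12 bytes into the 13-byte block, so of the last two bytes of the string, the '\n' and the '\0', the '\0' falls outside the block; the invalid read is printf() (via strlen()) reading that byte back from 0 bytes after the block.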

The correct version of the program looks like this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define HELLO_STRING "Hello, Habr!\n"

    int main(void)
    {
        char *str = malloc(sizeof(char) * (strlen(HELLO_STRING) + 1));
        strcpy(str, HELLO_STRING);
        printf("->\t%s", str);
        free(str);
        return 0;
    }

Run it through valgrind:

    [indever@localhost public]$ valgrind --tool=memcheck ./a.out
    -> Hello, Habr!
    ==3435==
    ==3435== HEAP SUMMARY:
    ==3435==     in use at exit: 0 bytes in 0 blocks
    ==3435==   total heap usage: 2 allocs, 2 frees, 1,038 bytes allocated
    ==3435==
    ==3435== All heap blocks were freed -- no leaks are possible
    ==3435==
    ==3435== For counts of detected and suppressed errors, rerun with: -v
    ==3435== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Fine. No errors: one extra byte of allocated memory solved the problem.

Interestingly, in most cases the first and second programs behave the same. But the first one writes the terminating '\0' outside the allocated block, which is undefined behavior: that byte may belong to something else and be overwritten. If the string ends up without its '\0' (and the memory after it was never zeroed), printf() will print all the garbage that follows it, until it happens to run into a zero byte.

Still, (strlen(str) + 1) is a so-so solution. It has two problems:

  1. What if we need to allocate memory for a string built with, say, s(n)printf(..)? strlen() has no support for format arguments.
  2. Appearance. The line declaring the variable looks just awful. Some people even bolt a (char *) cast onto malloc, as if writing C++ (in C, the void * returned by malloc() converts implicitly). In a program that handles strings regularly, it makes sense to find a more elegant solution.

Let's come up with a solution that will satisfy both us and valgrind.

snprintf()


    int snprintf(char *str, size_t size, const char *format, ...);

This function is an extension of sprintf(): it formats a string and writes it to the buffer passed as the first argument. Unlike sprintf(), it never writes more than size bytes to str, the terminating '\0' included.

The function has one interesting property: whatever size is, it returns the length of the string it would have produced, not counting the terminating '\0'. For an empty string it returns 0 (and a negative value on an encoding error).

One of the strlen() problems I described concerns sprintf() and snprintf(). Suppose we need to write something into the string str, and the resulting string contains the values of other variables. Our code would look something like this:

    char *str = /* how much memory? */;
    sprintf(str, "Hello, %s\n", "Habr!");

The question is: how much memory should be allocated for str?

 char * str = malloc(sizeof(char) * (strlen(str, "Hello, %s\n", "Habr!") + 1)); 
This won't even compile. The prototype of strlen() looks like this:

    #include <string.h>

    size_t strlen(const char *s);

It takes a single const char *s and knows nothing about format strings or a variable number of arguments.

This is where the property of snprintf() mentioned above comes in handy. Consider the following program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* snprintf() with a NULL buffer and size 0 writes nothing and only returns the length */
        size_t needed_mem = snprintf(NULL, 0, "Hello, %s!\n", "Habr") + sizeof('\0');
        char *str = malloc(needed_mem);
        snprintf(str, needed_mem, "Hello, %s!\n", "Habr");
        printf("->\t%s", str);
        free(str);
        return 0;
    }

Run the program in valgrind:

    [indever@localhost public]$ valgrind --tool=memcheck ./a.out
    -> Hello, Habr!
    ==4132==
    ==4132== HEAP SUMMARY:
    ==4132==     in use at exit: 0 bytes in 0 blocks
    ==4132==   total heap usage: 2 allocs, 2 frees, 1,041 bytes allocated
    ==4132==
    ==4132== All heap blocks were freed -- no leaks are possible
    ==4132==
    ==4132== For counts of detected and suppressed errors, rerun with: -v
    ==4132== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    [indever@localhost public]$

Fine. Arguments are now supported. Because we pass 0 as the second argument (size), snprintf() writes nothing at all, so the NULL pointer in the first argument never causes a segfault. Nevertheless, the function still returns the length the string requires.

On the other hand, we had to introduce an extra variable, and the construct

 size_t needed_mem = snprintf(NULL, 0, "Hello, %s!\n", "Habr") + sizeof('\0'); 

looks even worse than the strlen() version.

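Note that you cannot get rid of + sizeof('\0') by ending the format string with '\0' (size_t needed_mem = snprintf(NULL, 0, "Hello, %s!\n\0", "Habr");). Like strlen(), snprintf() treats the format as an ordinary C string and stops at the first '\0', so the returned length stays the same and the buffer would end up one byte short. There is also a subtlety in + sizeof('\0') itself: in C a character constant such as '\0' has type int, so sizeof('\0') equals sizeof(int), typically 4, not 1. That is why valgrind reports 1,041 bytes here instead of the 1,038 of the strlen() + 1 version: 13 + 4 = 17 bytes for the string instead of 14. The program is still correct, it just over-allocates; + 1 is both shorter and exact.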

Something has to be done. After some thought I decided it was time to turn to the wisdom of the ancients: let's write a function-like macro that calls snprintf() with a null pointer as the first argument and zero as the second. And let's not forget the terminating null character!

 #define strsize(args...) snprintf(NULL, 0, args) + sizeof('\0') 

This may be news to some, but C macros support a variable number of arguments: the ellipsis tells the preprocessor that the named parameter (args in our case) stands for one or more actual arguments. Strictly speaking, the named form args... is a GCC extension; standard C99 spells it ... and refers to the arguments as __VA_ARGS__.

Let's test our solution in practice:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define strsize(args...) snprintf(NULL, 0, args) + sizeof('\0')

    int main(void)
    {
        char *str = malloc(strsize("Hello, %s\n", "Habr!"));
        sprintf(str, "Hello, %s\n", "Habr!");
        printf("->\t%s", str);
        free(str);
        return 0;
    }

Run it under valgrind:

    [indever@localhost public]$ valgrind --tool=memcheck ./a.out
    -> Hello, Habr!
    ==6432==
    ==6432== HEAP SUMMARY:
    ==6432==     in use at exit: 0 bytes in 0 blocks
    ==6432==   total heap usage: 2 allocs, 2 frees, 1,041 bytes allocated
    ==6432==
    ==6432== All heap blocks were freed -- no leaks are possible
    ==6432==
    ==6432== For counts of detected and suppressed errors, rerun with: -v
    ==6432== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

No errors. Everything is correct: valgrind is happy, and the programmer can finally go to sleep.

Finally, one more thing. If we need to allocate memory for an arbitrary string (even one with arguments), a ready-made, fully working solution already exists.

This is the asprintf function:

    #define _GNU_SOURCE         /* See feature_test_macros(7) */
    #include <stdio.h>

    int asprintf(char **strp, const char *fmt, ...);

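Its first argument is a pointer to a string pointer (char **strp): asprintf() allocates a buffer large enough for the formatted result and stores its address in *strp, which the caller must later free(). It returns the number of bytes printed, or -1 on error.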

Our program rewritten with asprintf() looks like this:

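    #define _GNU_SOURCE  /* needed for the asprintf() declaration in glibc */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *str;
        if (asprintf(&str, "Hello, %s!\n", "Habr") == -1)
            return 1;  /* allocation failed; str must not be used */
        printf("->\t%s", str);
        free(str);
        return 0;
    }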

And under valgrind:

    [indever@localhost public]$ valgrind --tool=memcheck ./a.out
    -> Hello, Habr!
    ==6674==
    ==6674== HEAP SUMMARY:
    ==6674==     in use at exit: 0 bytes in 0 blocks
    ==6674==   total heap usage: 3 allocs, 3 frees, 1,138 bytes allocated
    ==6674==
    ==6674== All heap blocks were freed -- no leaks are possible
    ==6674==
    ==6674== For counts of detected and suppressed errors, rerun with: -v
    ==6674== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Everything works, but, as you can see, more memory was allocated in total, and there are now three allocations instead of two. On weak embedded systems this may make the function undesirable.
In addition, man asprintf says:

    CONFORMING TO
           These functions are GNU extensions, not in C or POSIX. They are also available under *BSD. The FreeBSD implementation sets strp to NULL on error.


So this function is not part of standard C or POSIX: it is available in glibc (with _GNU_SOURCE defined) and on the BSDs, but code that must be portable cannot rely on it.

Conclusion


In conclusion: working with strings in C is a complex topic with many nuances. For example, to write "safer" code when allocating memory dynamically, it is often recommended to use calloc() instead of malloc(), because calloc() fills the allocated memory with zeros; alternatively, call memset() right after allocating. Otherwise, garbage left over in the allocated memory can cause confusion while debugging, and sometimes even when working with the string itself.

More than half of my fellow C programmers (most of them beginners) who, at my request, solved a task involving memory allocation for strings did it in a way that ultimately produced errors of the kind valgrind reported above. In one case it even led to a memory leak (the person simply forgot to call free(str); it happens to everyone). In fact, that is what prompted me to write the piece you have just read.

I hope someone finds this article useful. The point of all this: no language is simple. Every language has its own subtleties, and the more of them you know, the better your code.

I believe that after reading this article, your code will be a little better :)
Good luck, Habr!

Source: https://habr.com/ru/post/326108/

