📜 ⬆️ ⬇️

Glibc library verification experiment

glibc and PVS-Studio
We conducted an experiment on checking the glibc library with PVS-Studio. The purpose of the experiment is to see how successfully the analyzer can test Linux projects. While bad can. There is a huge number of false positives due to the use of non-standard extensions. However, I still managed to find something interesting.

glibc


glibc - GNU C Library (GNU library). Glibc is a C library that provides system calls and basic functions such as open, malloc, printf, etc. The C library is used for all dynamically linked programs. It is written by the Free Software Foundation for GNU operating systems. glibc is released under the GNU LGPL license.

Taken from Wikipedia: glibc .

Not so long ago, news appeared on the Internet that a new version of the glibc library was released. This prompted us to check this library with the help of our PVS-Studio analyzer. Namely, the version glibc-2-19-90 was tested. Unfortunately, I was distracted for a couple of weeks, so I found the time to write an article just now. I was busy comparing a few static analyzers. This is a very important task for us, since they do not stop asking us what is better than Cppcheck and the static analyzer in Visual Studio 2013. Therefore, glibc had to wait a bit.
')
We did not expect to find something terrible and really did not find it. The glibc library is very high quality and verifiable by many analyzers. At a minimum, this:So to find at least something, this is already a great achievement.

What is the complexity of the analysis


Unfamiliar with the internal kitchen of static analysis tools, they seem to be very simple utilities. This is not true. These are very complex programs.

Such tools as RATS can be confusing . If someone watched the RATS code, then I saw that it was just a search for certain function names in the files. This tool is also called a static code analyzer. However, it is very far from what real code analyzers do. Static analysis is not a search using regular expressions at all [ 1 ].

Repeatedly, we repeated that the Linux version is not at all the same as a recompiled executable [ 2 ]. There is an abyss between the executable module and the software product. One obstacle, support for specific extensions and the like.

What it is for an outsider is completely incomprehensible. Here from sees, in the program a function call of strcmp ():
cmpres = strcmp (newp->from_string, root->from_string); 

And he doesn’t even suspect the horror of this line after preprocessing and which non-standard extensions will be used. Specifically in our case, the string turns into this:
 cmpres = __extension__ ({ size_t __s1_len, __s2_len; (__builtin_constant_p (newp->from_string) && __builtin_constant_p (root->from_string) && (__s1_len = strlen (newp->from_string), __s2_len = strlen (root->from_string), (!((size_t)(const void *)((newp->from_string) + 1) - (size_t)(const void *)(newp->from_string) == 1) || __s1_len >= 4) && (!((size_t)(const void *)((root->from_string) + 1) - (size_t)(const void *)(root->from_string) == 1) || __s2_len >= 4)) ? __builtin_strcmp (newp->from_string, root->from_string) : (__builtin_constant_p (newp->from_string) && ((size_t)(const void *)((newp->from_string) + 1) - (size_t)(const void *)(newp->from_string) == 1) && (__s1_len = strlen (newp->from_string), __s1_len < 4) ? (__builtin_constant_p (root->from_string) && ((size_t)(const void *)((root->from_string) + 1) - (size_t)(const void *)(root->from_string) == 1) ? __builtin_strcmp (newp->from_string, root->from_string) : (__extension__ ({ const unsigned char *__s2 = (const unsigned char *) (const char *) (root->from_string); int __result = (((const unsigned char *) (const char *) (newp->from_string))[0] - __s2[0]); if (__s1_len > 0 && __result == 0) { __result = (((const unsigned char *) (const char *) (newp->from_string))[1] - __s2[1]); if (__s1_len > 1 && __result == 0) { __result = (((const unsigned char *) (const char *) (newp->from_string))[2] - __s2[2]); if (__s1_len > 2 && __result == 0) __result = (((const unsigned char *) (const char *) (newp->from_string))[3] - __s2[3]); } } __result; }))) : (__builtin_constant_p (root->from_string) && ((size_t)(const void *)((root->from_string) + 1) - (size_t)(const void *)(root->from_string) == 1) && (__s2_len = strlen (root->from_string), __s2_len < 4) ? (__builtin_constant_p (newp->from_string) && ((size_t)(const void *)((newp->from_string) + 1) - (size_t)(const void *)(newp->from_string) == 1) ? __builtin_strcmp (newp->from_string, root->from_string) : (- (__extension__ ({ const unsigned char *__s2 = (const unsigned char *) (const char *) (newp->from_string); int __result = (((const unsigned char *) (const char *) (root->from_string))[0] - __s2[0]); if (__s2_len > 0 && __result == 0) { __result = (((const unsigned char *) (const char *) (root->from_string))[1] - __s2[1]); if (__s2_len > 1 && __result == 0) { __result = (((const unsigned char *) (const char *) (root->from_string))[2] - __s2[2]); if (__s2_len > 2 && __result == 0) __result = (((const unsigned char *) (const char *) (root->from_string))[3] - __s2[3]); } } __result; })))) : __builtin_strcmp (newp->from_string, root->from_string)))); }); 

The analyzer is not ready for such a turn of events and, at times, produces meaningless false messages on such constructions.

I will explain about false messages on a simpler example. Suppose we have a line of code:
 assert(MAP_FAILED == (void *) -1); 

The macro assert () is expanded into the following code:
 ((((void *) -1) == (void *) -1) ? (void) (0) : __assert_fail ("((void *) -1) == (void *) -1", "loadmsgcat.c", 840, __PRETTY_FUNCTION__)); 

The PVS-Studio analyzer gives a false warning regarding the comparison (((void *) -1) == (void *) -1):

V501 There are identical to the left and the right of the operator: = (operator: ((void *) - 1) == (void *) - 1 loadmsgcat.c 840

We are not surprised by this. All this we have already passed, leading the development of programs collected using Visual C ++. There are also many interesting and unusual things. Much work needs to be done to teach the analyzer to understand what is what. We need to teach him to understand that he is dealing with the macro “assert” which is harmless and simply verifies that the macro MAP_FAILED is "(void *) -1". All this has already been done for Visual C ++. No for Linux.

The ability to work correctly with such constructions and consists a huge part of the work on supporting other compilers. Externally, this work is not visible. But it requires studying the features of the compiler and standard libraries. These features must be studied, supported and tested.

I hope I opened a little crack, so you can look into hell. In the future, I plan to write a series of articles that will show the complexity of developing static analysis tools. I think it will be interesting.

Suspicious code fragments found


Although the glibc project is being tested by many tools, we managed to find something interesting. Let's look at these sections of the code.

Strange expression


 char *DCIGETTEXT (....) { .... /* Make CATEGORYVALUE point to the next element of the list. */ while (categoryvalue[0] != '\0' && categoryvalue[0] == ':') ++categoryvalue; .... } 

V590 Consider inspecting this expression. The expression is misprint. dcigettext.c 582

The condition can be simplified to:
 while (categoryvalue[0] == ':') 

Perhaps this is not an error and the first part of the condition (categoryvalue [0]! = '\ 0') is simply superfluous. However, suddenly it should be written like this:
 while (categoryvalue[0] != '\0' && categoryvalue[0] != ':') 

Pointer dereferencing before checking


Not necessarily, this place is dangerous. Perhaps the pointer can never be zero. But nonetheless:
 static enum clnt_stat clntraw_call (h, proc, xargs, argsp, xresults, resultsp, timeout) CLIENT *h; u_long proc; xdrproc_t xargs; caddr_t argsp; xdrproc_t xresults; caddr_t resultsp; struct timeval timeout; { struct clntraw_private_s *clp = clntraw_private; XDR *xdrs = &clp->xdr_stream; .... if (clp == NULL) return RPC_FAILED; .... } 

V595 The 'clp' pointer was used before it was verified against nullptr. Check lines: 145, 150. clnt_raw.c 145

Next to this file you can see a similar defect: V595 The 'clp' pointer was used against nullptr. Check lines: 232, 235. clnt_raw.c 232

Another example of dangerous code:
 int __nss_getent_r (....) { .... if (res && __res_maybe_init (&_res, 0) == -1) { *h_errnop = NETDB_INTERNAL; *result = NULL; return errno; } .... if (status == NSS_STATUS_TRYAGAIN && (h_errnop == NULL || *h_errnop == NETDB_INTERNAL) && errno == ERANGE) } 

V595 The 'h_errnop' pointer was used before it was verified against nullptr. Check lines: 146, 172. getnssent_r.c 146

If the if (res && __res_maybe_init (& _res, 0) == -1) condition is met, the function returns error information. However, it dereferences the 'h_errnop' and 'result' pointers. However, these pointers may well be NULL. This conclusion can be made by examining the code below.

Dangerous optimization (vulnerability)


 char * __sha256_crypt_r (key, salt, buffer, buflen) const char *key; const char *salt; char *buffer; int buflen; { .... unsigned char temp_result[32] .... memset (temp_result, '\0', sizeof (temp_result)); .... .... // temp_result    } 

V597 The compiler could delete the memset function call, which is used to flush the temp_result buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sha256-crypt.c 385

The compiler has the right to remove a call to the memset () function when compiling the Release version. More precisely, it is not only in the right, but it is also obliged to do this for the purpose of optimization. The buffer 'temp_result' after calling the function memset () is not used anywhere, hence the function call itself is also superfluous.

We are dealing with a vulnerability, since private data will not be cleared. You should replace the memset () function with a more appropriate one. The analyzer offers RtlSecureZeroMemory (), which of course is not in Linux. But there are analogues.

A similar situation: V597 The compiler could delete the memset function call, which is used to flush 'temp_result' buffer. The RtlSecureZeroMemory () function should be used to erase the private data. sha512-crypt.c 396

Undefined behavior


It would seem that the library glibc should be written as portable as possible. However, we find in it quite a few shear structures that cannot be called safe from the point of view of tolerance.

Here is what the C language standard tells us about shifts:

Integer promotions are performed on each of the operands. The promoted left operand. If you want to make it to the bottom of the water, you’re not sure.

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. E1 * 2, E2 * 2, E2 * 2 E2, E2 * 2 E2, If E1 has a signed type and a non-negative value, and E1 * 2 pow E2 is a representation of the type, then that is the resulting value; otherwise, the behavior is undefined.

5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. The E1 / 2 pow E2 has an unsigned type. If E1 has a signed value

It follows from the standard that it is wrong to shift negative numbers. However, this is a very common operation in the glibc library.

Shift left example:
 static void init_cacheinfo (void) { .... count_mask = ~(-1 << (count_mask + 1)); .... } 

V610 Undefined behavior. Check the shift operator '<<. The left operand '-1' is negative. cacheinfo.c 645

Right shift example:
 utf8_encode (char *buf, int val) { .... *buf = (unsigned char) (~0xff >> step); .... } 

The expression "~ 0xff" is of type 'int' and is equal to -256.

Here is a list of all the places where you can observe incorrect shifts:

Using an uninitialized variable


 static int send_vc(....) { .... int truncating, connreset, resplen, n; .... #ifdef _STRING_ARCH_unaligned *anssizp2 = orig_anssizp - resplen; *ansp2 = *ansp + resplen; #else .... } V614 Uninitialized variable 'resplen' used. res_send.c 790 

Incorrect formatting of strings


In some places, '% u' is used to print character variables. In some other places, '% d' is used to print unsigned variables. These are trifles, of course, but they are also worth mentioning.

Example:
 typedef unsigned int __uid_t; typedef __uid_t uid_t; int user2netname (...., const uid_t uid, ....) { .... sprintf (netname, "%s.%d@%s", OPSYS, uid, dfltdom); .... } 

V576 Incorrect format. Consider checking the fourth argument of the 'sprintf' function. The SIGNED integer type argument is expected. netname.c 51

Other relevant messages:

Conclusion


An unsuccessful project was chosen to begin checking code from the Linux world It is too high quality. :) It's hard to write an interesting article about errors. But it does not matter. A lot of other well-known and interesting projects in Linux are waiting for us, which we check to demonstrate the capabilities of the PVS-Studio analyzer.

Additional links


  1. Andrey Karpov. Static analysis and regular expressions .
  2. Dmitry Tkachenko. Interview with Andrei Karpov, technical director of PVS-Studio and CppCat projects .


Answers on questions
Often our articles are asked the same questions. We collected the answers to them here: Answers to questions from readers of articles about PVS-Studio and CppCat, version 2014 .

Source: https://habr.com/ru/post/213969/


All Articles