What is the difference between the following pairs of lengths and pointers?
size_t len1 = 0; char *ptr1 = NULL;
size_t len2 = 0; char *ptr2 = malloc(0);
size_t len3 = 0; char *ptr3 = (char *)malloc(4096) + 4096;
size_t len4 = 0; char ptr4[0];
size_t len5 = 0; char ptr5[];
In many cases, all five will produce identical results. In others, their behavior can differ completely. One obvious difference is whether the pointer can later be passed to free(), but we will leave that aside.
The first case is interesting, but it is too different from the others, so we will set it aside for now.
malloc(0)
The behavior of malloc(0) is implementation defined by the standard: it may return either NULL or a unique pointer. Many implementations produce the latter by internally bumping the length to one (which is then typically rounded up to 16). By the rules, such a pointer must not be dereferenced, but in practice a few bytes are usually allocated anyway, so such a program will not crash.
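For illustration, here is a small hedged test of which behavior a given implementation picks; the output is implementation defined and may show NULL or two distinct non-null pointers.

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    char *a = malloc(0);
    char *b = malloc(0);

    /* implementation defined: both may be NULL,
       or two unique non-NULL pointers */
    printf("a=%p b=%p\n", (void *)a, (void *)b);

    free(a);
    free(b);
    return 0;
}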
Returning NULL opens the door to an interesting bug, because a NULL return from malloc is usually treated as an error:
if ((ptr = malloc(len)) == NULL) err(1, "out of memory");
If len is zero, this reports a spurious error unless you add an extra && len != 0 check. Alternatively, you can join the camp of those who never check malloc's return value.
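A minimal sketch of that extra check, reusing the err() call from the snippet above:

if ((ptr = malloc(len)) == NULL && len != 0)
    err(1, "out of memory");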
OpenBSD's malloc handles zero specially. Zero-size allocations return chunks of pages that have been protected with mprotect() and PROT_NONE. Attempting to dereference such a pointer will crash.
Note that the requirement to return unique pointers forbids "cheating" implementations like the following:
int thezero;

void *
malloc(size_t len)
{
    if (len == 0)
        return &thezero;
    /* ... normal allocation path omitted ... */
}

void
free(void *ptr)
{
    if (ptr == &thezero)
        return;
    /* ... normal deallocation path omitted ... */
}
Such an implementation does not conform, since successive calls would return the same value. So the second case behaves like either the first or the third, depending on the implementation.
Other cases
If malloc does not fail, cases 3, 4, and 5 mostly behave identically. The difference shows up when you use the sizeof(ptr) / sizeof(ptr[0]) construct, for example to set loop bounds: you will get the wrong answer, the right answer, or no answer at all, failing at compile time. Case 4 is not technically permitted by the standard, but compilers will most likely accept it.
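A hedged sketch of the loop-bound idiom in question, using case 4 (the zero-length array, a common compiler extension); the names here are hypothetical:

#include <stddef.h>

static char buf[0];    /* case 4: zero-length array, a common extension */

size_t
count_elements(void)
{
    size_t n = 0;
    /* with the zero-length array this loop never runs;
       with a plain pointer the bound would be meaningless */
    for (size_t i = 0; i < sizeof(buf) / sizeof(buf[0]); i++)
        n++;
    return n;
}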
The biggest difference between those cases and the first one is whether the pointer compares equal to NULL. It is the difference between an empty array and no array at all. In the same way, an empty string is not the same as a null string, even though the empty string still occupies one byte in memory.
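A tiny illustration of that last distinction (the variable names are hypothetical):

char empty[] = "";      /* an empty string: one byte of storage, the '\0' terminator */
char *missing = NULL;   /* a null string: no storage at all */

/* sizeof(empty) == 1 and strlen(empty) == 0, while strlen(missing) would crash */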
null objects
Let's return to the first case and zero-size objects. Consider the following call:
memset(ptr, 0, 0);
This sets 0 bytes of ptr to 0. With which of the five pointers above can we make such a call? Cases 3, 4, and 5, and case 2 as well if it returned a unique pointer. But what if ptr is NULL?
The C standard, in the section "Use of library functions", says:
If an argument to a function has an invalid value (such as a value outside the domain of the function, a pointer outside the address space of the program, or a null pointer), the behavior is undefined.
The section "Agreements on the functions of working with strings" specifies:
Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function, pointer arguments on such a call shall still have valid values.
So it appears that memsetting 0 bytes of a NULL pointer is undefined. The documentation for memset, memcpy, and memmove does not say they may accept null pointers. By contrast, the description of snprintf does: "If n is zero, nothing is written, and s may be a null pointer." POSIX's documentation for read similarly states that a zero-length read is not an error, but an implementation may still check other parameters for errors, such as an invalid buffer pointer.
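That snprintf wording is what makes the common "measure first, then allocate" idiom legal; a hedged sketch (format_int is a hypothetical helper name):

#include <stdio.h>
#include <stdlib.h>

char *
format_int(int value)
{
    /* explicitly permitted: n is zero and the buffer pointer is NULL,
       so snprintf only computes the required length */
    int needed = snprintf(NULL, 0, "%d", value);
    if (needed < 0)
        return NULL;

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return NULL;

    snprintf(buf, (size_t)needed + 1, "%d", value);
    return buf;
}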
What happens in practice? The simplest way for memset or memcpy to handle a zero length is to never enter the copy loop and do nothing. Normally, undefined behavior in C brings some consequence, but here the defined behavior for normal pointers is already to do nothing; checking for abnormal pointers would be extra work.
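A minimal sketch of a naive memcpy (not any particular libc's implementation) shows why: with a zero length the loop body never runs, so the pointers are never dereferenced.

#include <stddef.h>

void *
naive_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* when n == 0, dst and src are converted but never dereferenced */
    while (n--)
        *d++ = *s++;
    return dst;
}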
Checking for non-zero but otherwise invalid pointers is difficult. memcpy does not even try; it just lets the program crash. The read call does not check anything either: it delegates the work to copyout, which relies on a fault handler to detect errors. And while one could add a null check, null pointers are no more invalid for these functions than 0x1 or 0xffffffff, which get no special treatment.
Bummer
In practice, this means there is a lot of code that assumes, intentionally or accidentally, that a null pointer with a zero length is valid. As an experiment, I added logging to memcpy to report whenever a pointer turned out to be NULL, and installed the new libc.
Feb 11 01:52:47 carbolite xsetroot: memcpy with NULL
Feb 11 01:53:18 carbolite last message repeated 15 times
Yeah, that didn't take long. I wonder what is going on there:
Feb 11 01:53:18 carbolite gdb: memcpy with NULL
Feb 11 01:53:19 carbolite gdb: memcpy with NULL
I see. OK, these messages are going to get tiresome very quickly. Let's put everything back the way it was.
Effects
I looked into this question because the intersection of "undefined, but it should work anyway" territory with C compiler optimizations rarely produces anything good. A smart compiler can see the memcpy call, conclude that both pointers must be valid, and remove null checks.
int backup;

void
copyint(int *ptr)
{
    size_t len = sizeof(int);
    if (!ptr)
        len = 0;
    memcpy(&backup, ptr, len);
}
But the code above clearly will not work as intended if the compiler deletes the null check and the function is then called with a null pointer.
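One defensive rewrite, sketched under the assumption that the goal is simply to avoid passing a null pointer to memcpy at all (copyint_safe is a hypothetical name), is to skip the call rather than zero the length:

#include <string.h>

int backup;

void
copyint_safe(int *ptr)
{
    /* never call memcpy with a null pointer, so the compiler
       has no undefined behavior to reason from */
    if (ptr != NULL)
        memcpy(&backup, ptr, sizeof(int));
}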
This question worries me because I have previously seen this dereference-then-check optimization pattern lead to security holes. For software not prepared for such a strict reading of the standard, this is rather bad news.
At first I could not convince a compiler to remove the null check after the memcpy "dereference", but that does not mean it cannot happen. gcc 4.9 reportedly removes this check as an optimization. On OpenBSD, the gcc 4.9 package (which carries many patches) does not delete the check by default, even at -O3, but enabling "-fdelete-null-pointer-checks" does delete it. I do not know about clang; initial tests show it does not remove the check, but there are no guarantees. In theory it could perform the same optimization.