Suppose we have a region of memory described by two variables, for example:
byte* regionStart; size_t regionSize;
We need to check whether a pointer value lies within this region. Your first impulse might be to write this:
if (p >= regionStart && p < regionStart + regionSize)
But does the standard guarantee the expected behavior of this code?
The relevant clause of the C standard (6.5.8 Relational Operators) (1) reads as follows:
If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
Now recall that the C language was designed to cover a wide range of architectures, many of which are museum pieces today. It is therefore very conservative about what it permits, so that it remains possible to write C programs for those old systems. (Which were quite advanced in their day.)
With that in mind, an allocation can still produce a pointer that satisfies our condition even though it does not point into the region. This happens, for example, on an 80286 processor in protected mode, which was used by Windows 3.x in Standard mode and by OS/2 1.x.
A pointer on such a system is a 32-bit value made up of two 16-bit parts, customarily written as XXXX:YYYY. The first 16-bit half (XXXX) is the “selector”, which chooses a 64 KB memory segment. The second 16-bit half (YYYY) is the “offset”, which chooses a byte within the segment specified by the first half. (The actual mechanism is more complicated, but this explanation is enough for the present discussion.)
Memory blocks larger than 64 KB are split into 64 KB segments. To move to the next segment, you add 8 to the selector of the current segment. For example, the byte after 0101:FFFF is 0109:0000.
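The stepping rule above can be sketched in C. This is only an illustrative model, not real 80286 code: the type farptr_t and the helpers make_far and next_byte are hypothetical names, and the pointer is represented as a plain 32-bit integer with the selector in the high half.

```c
#include <stdint.h>

/* Hypothetical model of a selector:offset pointer:
   selector in the high 16 bits, offset in the low 16 bits. */
typedef uint32_t farptr_t;

static farptr_t make_far(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 16) | offset;
}

/* Address of the byte that follows p inside a block that spans
   several segments: when the offset wraps, step the selector by 8. */
static farptr_t next_byte(farptr_t p)
{
    uint16_t selector = (uint16_t)(p >> 16);
    uint16_t offset = (uint16_t)(p & 0xFFFFu);

    if (offset == 0xFFFFu) {
        return make_far((uint16_t)(selector + 8u), 0);
    }
    return make_far(selector, (uint16_t)(offset + 1u));
}
```

Under this model, next_byte(make_far(0x0101, 0xFFFF)) yields the value for 0109:0000, matching the example in the text.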
But why add 8? Why not simply increment the selector by one? Because the bottom three bits of the selector are used for other purposes. In particular, the lowest bit of the selector chooses the selector table. We will ignore bits 1 and 2 here, since they are not relevant to our question. For convenience, just assume that they are always zero. (2)
Two tables describe how selectors map to physical memory: the Global Descriptor Table (for memory segments shared by all processes) and the Local Descriptor Table (for memory segments private to a particular process). Thus, the selectors for a process's local memory are 0001, 0009, 0011, 0019, and so on, and the selectors for global memory are 0008, 0010, 0018, 0020, and so on. (Selector 0000 is reserved.)
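The selector sequences listed above follow directly from the simplified bit layout used in this article. A small sketch, with hypothetical helper names and with bits 1 and 2 assumed to be zero exactly as the text assumes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model from the text: bit 0 picks the table
   (1 = local, 0 = global); bits 1 and 2 are assumed to be zero. */
static bool selector_is_local(uint16_t selector)
{
    return (selector & 1u) != 0;
}

/* n-th usable selector of each table, counting from n = 0.
   Global numbering starts at 0008 because 0000 is reserved. */
static uint16_t nth_local_selector(unsigned n)
{
    return (uint16_t)(n * 8u + 1u);
}

static uint16_t nth_global_selector(unsigned n)
{
    return (uint16_t)((n + 1u) * 8u);
}
```

For n = 0 through 3 these helpers give 0001, 0009, 0011, 0019 for local memory and 0008, 0010, 0018, 0020 for global memory.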
Now we can build a counterexample. Let regionStart = 0101:0000 and regionSize = 0x00020000. This means that the guarded region spans 0101:0000 through 0101:FFFF and 0109:0000 through 0109:FFFF. Furthermore, regionStart + regionSize = 0111:0000.
Now suppose that a global memory segment is allocated at 0108:0000. We know it is global memory because the selector is an even number.
Note that this global memory is not part of the guarded region, yet the value of a pointer to it satisfies the inequality 0101:0000 ≤ 0108:0000 < 0111:0000.
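The counterexample can be replayed with ordinary integers. The sketch below is a model only: farptr_t, FAR, naive_in_range and really_in_region are hypothetical names, and the precise check assumes that the region starts at offset 0 and that its size is a multiple of 64 KB, as in the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: selector in the high 16 bits, offset in the low 16. */
typedef uint32_t farptr_t;

#define FAR(sel, off) (((uint32_t)(sel) << 16) | (uint32_t)(off))

/* The check as written, applied to the raw 32-bit pointer values. */
static bool naive_in_range(farptr_t p, farptr_t start, farptr_t end)
{
    return p >= start && p < end;
}

/* Precise membership: p belongs to the region only if its selector is one
   of the region's own selectors, which are spaced 8 apart.
   Assumes start has offset 0 and size is a multiple of 64 KB. */
static bool really_in_region(farptr_t p, farptr_t start, uint32_t size)
{
    uint16_t first = (uint16_t)(start >> 16);
    uint32_t segments = size / 0x10000u;

    for (uint32_t i = 0; i < segments; i++) {
        if ((uint16_t)(p >> 16) == (uint16_t)(first + 8u * i)) {
            return true;
        }
    }
    return false;
}
```

With start = FAR(0x0101, 0) and a size of 0x00020000, the pointer FAR(0x0108, 0) passes naive_in_range against the end value FAR(0x0111, 0) but is rejected by really_in_region.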
A bit more: the check can fail even on architectures with a flat memory model. Modern compilers are eager to optimize on the basis of undefined behavior. When they see a pointer comparison, they are entitled to assume that the pointers refer to the same aggregate or array (or to one past the last element of that array), because any other comparison is undefined behavior. In our case, if regionStart points to the start of an array or aggregate, then the only pointers that can legitimately be compared with it are those of the form regionStart, regionStart + 1, regionStart + 2, ..., regionStart + regionSize. All of them satisfy the condition p >= regionStart, so that test can be optimized away, and the compiler reduces our check to the following:
if (p < regionStart + regionSize)
Now the condition is also satisfied by every pointer whose value is less than regionStart.
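To see what the simplified check accepts, the two conditions can be modelled on plain integer addresses. This is only a sketch: the function names are hypothetical, and integers are used on purpose so that the example itself does not depend on comparing unrelated pointers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses are modelled as plain integers so that the sketch
   does not rely on out-of-object pointer comparisons. */

/* The condition as the programmer wrote it. */
static bool check_as_written(uintptr_t p, uintptr_t start, size_t size)
{
    return p >= start && p < start + size;
}

/* The condition after the lower-bound test has been dropped. */
static bool check_as_simplified(uintptr_t p, uintptr_t start, size_t size)
{
    (void)start; /* only the upper bound is still tested */
    return p < start + size;
}
```

For a region starting at 0x2000 with size 0x100, the address 0x1000 is rejected by check_as_written but accepted by check_as_simplified.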
(You might run into this situation if, like the author of the original question that this article answers, you allocated the region with regionStart = malloc(n), or if the region is a “quick access” pool of preallocated objects and you need to decide whether to release a pointer with the free function.)
Moral: This code is unsafe, even on architectures with a flat memory model.
But all is not lost: the result of converting a pointer to an integer type is implementation-defined, which means that the implementation must document how it behaves. If your implementation documents that the conversion yields the numeric value of the linear address of the object the pointer refers to, and you know that you are on a flat-memory architecture, then the way out is to compare integer values rather than pointers. Integer comparisons are not subject to the same restrictions as pointer comparisons.
if ((uintptr_t)p >= (uintptr_t)regionStart && (uintptr_t)p < (uintptr_t)regionStart + (uintptr_t)regionSize)
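The integer-based check can be wrapped in a small helper. This is a minimal sketch with a hypothetical name, ptr_in_region, and it rests on the same assumptions as the text: a flat memory model and an implementation whose pointer-to-uintptr_t conversion yields the linear address. It is written with a subtraction rather than an addition, a variation on the line above chosen so that the upper bound cannot wrap around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumes a flat memory model in which converting a pointer to uintptr_t
   yields its linear address (implementation-defined behavior; check your
   compiler's documentation before relying on it). */
static bool ptr_in_region(const void *p, const void *regionStart, size_t regionSize)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)regionStart;

    /* Written as a subtraction so that start + regionSize cannot wrap. */
    return addr >= start && addr - start < regionSize;
}
```

For example, given char buf[16], the call ptr_in_region(buf + 15, buf, sizeof buf) is true, while ptr_in_region(buf + 16, buf, sizeof buf) is false because the one-past-the-end address lies outside the region.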
Notes:
(1) Note that “equal” and “not equal” are not relational operators.
(2) I know that this is not how it really works; I treat those bits as zero for convenience.
(This article is based on my comments on StackOverflow.)
Update: Clarified that the “start of region” optimization applies only when the regionStart pointer refers to the start of an array or aggregate.