Reflections on dereferencing a null pointer

It turns out that the question of whether code such as &((T *)0)->x is correct is a very difficult one. I decided to write a short note about it.

In a recent article about checking the Linux kernel with the help of the PVS-Studio analyzer, I wrote that I found the following code fragment:

static int podhd_try_init(struct usb_interface *interface,
                          struct usb_line6_podhd *podhd)
{
    int err;
    struct usb_line6 *line6 = &podhd->line6;

    if ((interface == NULL) || (podhd == NULL))
        return -ENODEV;
    ....
}

In that article I also wrote that such code is, in my opinion, incorrect. The details can be found there.

After that, I received emails saying that I was wrong and that this code is perfectly correct. Many pointed out that if podhd == 0, this code essentially implements the “offsetof” idiom, so nothing bad can happen. Rather than reply to each message individually, I decided to answer in the form of a short blog post.

Naturally, I decided to study this topic in more detail. But honestly, I only ended up more confused. So I will not give you a definitive answer as to whether you can write code like this or not. I will only provide some links and share my opinion.

When I was writing the article about checking Linux, my reasoning was as follows.

Any null pointer dereference is undefined behavior. One possible manifestation of undefined behavior is an optimization in which the check (podhd == NULL) disappears. This is the scenario I described in the article.

In their emails, some developers wrote that they could not reproduce this behavior with their compilers. However, this does not prove anything. A program that works as expected is just one of the possible outcomes of undefined behavior.
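The scenario above can be sketched as follows. This is only an illustration of the reasoning, not a claim about what any particular compiler does. The types demo_line6 and demo_podhd and all function names are hypothetical stand-ins, not the kernel's definitions. If forming &podhd->line6 counts as undefined behavior for a null podhd, the optimizer is entitled to assume podhd is not null, and the second function is then a legal translation of the first. The functions are exercised only with a valid pointer, where both are well defined.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_line6 { int state; };
struct demo_podhd { struct demo_line6 line6; };

/* The pattern from the article: the member address is formed
 * before the pointer is compared with NULL. */
static int init_as_written(struct demo_podhd *podhd)
{
    struct demo_line6 *line6 = &podhd->line6;

    if (podhd == NULL)
        return -ENODEV;
    line6->state = 1;
    return 0;
}

/* What an optimizer would be allowed to produce under the assumption
 * that podhd cannot be null once &podhd->line6 has been evaluated:
 * the check is simply gone. */
static int init_as_maybe_optimized(struct demo_podhd *podhd)
{
    struct demo_line6 *line6 = &podhd->line6;

    line6->state = 1;
    return 0;
}

/* For a valid pointer the two versions are indistinguishable. */
static void compare_on_valid_input(void)
{
    struct demo_podhd a = { { 0 } };
    struct demo_podhd b = { { 0 } };

    assert(init_as_written(&a) == 0);
    assert(init_as_maybe_optimized(&b) == 0);
    assert(a.line6.state == b.line6.state);
}
```

The two functions differ only when podhd is null, which is exactly the case the check was meant to handle.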

Some also wrote that this is exactly how the offsetof() macro works:

 #define offsetof(st, m) ((size_t)(&((st *)0)->m))

However, this does not prove anything either. Such macros are written specifically to work correctly with a particular compiler. If we write similar code ourselves, there is no guarantee that it will work.

Moreover, here the compiler explicitly sees the constant 0 and can guess what the programmer wants. When 0 is stored in a variable, it is a completely different matter, and the compiler may behave in unexpected ways.
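For comparison, here is a minimal sketch of getting a member offset without writing the null-pointer expression by hand. It relies on the standard offsetof macro from <stddef.h>, and separately computes the same offset from a real object, which involves no null pointer at all. The struct sample and the function name are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure used only for this illustration. */
struct sample {
    char    tag;
    int32_t value;
    double  weight;
};

/* offsetof yields an integer constant expression, so it can be
 * checked at compile time (C11 static_assert from <assert.h>). */
static_assert(offsetof(struct sample, tag) == 0,
              "the first member is located at offset 0");

/* The same offset obtained from an existing object: both pointers
 * refer to storage inside s, so the subtraction is well defined. */
static size_t value_offset_from_object(void)
{
    struct sample s;

    return (size_t)((const char *)&s.value - (const char *)&s);
}
```

With this in place, value_offset_from_object() and offsetof(struct sample, value) give the same number, and how offsetof is implemented is left to the compiler vendor.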

This is what Wikipedia says about offsetof:

The “traditional” implementation of the macro relied on the compiler being not especially picky about pointers; it obtained the offset of a member by specifying a hypothetical structure that begins at address zero:

#define offsetof(st, m) ((size_t)(&((st *)0)->m))

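This works by casting a null pointer into a pointer to structure st, and then obtaining the address of member m within said structure. While this works correctly in many compilers, it has undefined behavior according to the C standard, since it involves a dereference of a null pointer (although, one might argue that no dereferencing takes place, because the whole expression is calculated at compile time). It also tends to produce confusing compiler diagnostics if one of the arguments is misspelled. Some modern compilers (such as GCC) define the macro using a special form instead, e.g.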

#define offsetof(st, m) __builtin_offsetof(st, m)

As you can see, according to Wikipedia, I am right: you cannot write code like this, because it is undefined behavior. Some people on StackOverflow think so too: Address of members of a struct via NULL pointer.

However, what confuses me is that although everyone talks about undefined behavior, nowhere is there a precise explanation of this point. For example, the Wikipedia article marks this statement with [citation needed].

Similar questions have been discussed many times on forums, but nowhere have I seen an unambiguous explanation backed by references to the C or C++ standard.

There is also an old discussion of the standard on this topic, which did not add any clarity either: 232. Is indirection through a null pointer undefined behavior?

So, at the moment, this question is still not completely clear to me. However, I still believe that this code is bad and should be refactored.
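One possible refactoring, sketched under assumptions: the pointers are checked first, and the member address is taken only afterwards, so the question of what &podhd->line6 means for a null podhd never arises. The _stub types, the ready field, and the function names below are hypothetical simplifications, not the kernel's actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct usb_interface_stub { int id; };
struct usb_line6_stub { int ready; };
struct usb_line6_podhd_stub { struct usb_line6_stub line6; };

static int podhd_try_init_checked(struct usb_interface_stub *interface,
                                  struct usb_line6_podhd_stub *podhd)
{
    struct usb_line6_stub *line6;

    /* Validate both pointers before touching either of them. */
    if ((interface == NULL) || (podhd == NULL))
        return -ENODEV;

    /* podhd is known to be non-null here, so this is well defined. */
    line6 = &podhd->line6;
    line6->ready = 1;
    return 0;
}

/* Usage: null arguments are rejected, valid ones are initialized. */
static void exercise_checked_init(void)
{
    struct usb_interface_stub iface = { 1 };
    struct usb_line6_podhd_stub pod = { { 0 } };

    assert(podhd_try_init_checked(NULL, &pod) == -ENODEV);
    assert(podhd_try_init_checked(&iface, NULL) == -ENODEV);
    assert(podhd_try_init_checked(&iface, &pod) == 0);
    assert(pod.line6.ready == 1);
}
```

The change costs one extra line and removes any dependence on how the standard's wording is interpreted.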

If someone sends me useful comments on this topic, I will add them to the end of this article.

UPDATE: The discussion continues here: habrahabr.ru/company/pvs-studio/blog/250701

Source: https://habr.com/ru/post/247973/
