During code analysis, PVS-Studio performs data flow analysis and tracks the values of variables. These values are taken from constants or derived from conditional expressions; we call them virtual values. We recently extended them to handle multi-character constants, and this work led to a new diagnostic rule.
Introduction
The value of a multi-character literal is implementation-defined, so different compilers can encode these literals in different ways. For example, GCC and Clang compute the value from the order of the characters in the literal, while MSVC reorders them depending on the kind of character (ordinary or escape sequence).
For example, the literal 'T\x65s\x74' will be encoded differently depending on the compiler. We had to add similar logic to the analyzer, and as a result we created a new diagnostic rule, V1039, that finds such literals in code. They are dangerous in cross-platform projects that are built with several compilers.
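The order-based scheme mentioned above can be sketched in a few lines of C. This is only an illustration of the approach GCC documents (shift the previous value left by one character width, then OR in the next character), not PVS-Studio's implementation. The function name fold_chars_in_order is hypothetical, and the sketch assumes 8-bit characters and a 32-bit result. It does not attempt to model the MSVC behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: folds characters left to right, one at a time.
 * Assumption: 8-bit characters, 32-bit result. With more than four
 * characters the oldest ones are shifted out of the 32-bit value. */
static uint32_t fold_chars_in_order(const unsigned char *chars, size_t count)
{
    uint32_t value = 0;
    for (size_t i = 0; i < count; ++i) {
        /* Make room for the next character, then append it. */
        value = (value << 8) | chars[i];
    }
    return value;
}
```

Under these assumptions, the characters T, \x65, s, \x74 fold to 0x54657374 regardless of whether each one is written as an ordinary character or as an escape sequence, because only their order matters.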
Diagnostic V1039
Consider an example. The code below, compiled by different compilers, will behave differently:
#include <stdio.h>

void foo(int c)
{
  if (c == 'T\x65s\x74')                       // <= V1039
  {
    printf("Compiled with GCC or Clang.\n");
  }
  else
  {
    printf("It's another compiler (for example, MSVC).\n");
  }
}

int main(int argc, char** argv)
{
  foo('Test');
  return 0;
}
The program prints a different message depending on which compiler built it.
In a project that uses a single compiler this goes unnoticed, but problems may arise when the code is ported. It is therefore better to replace such literals with plain numeric constants, for example, to change 'Test' to 0x54657374.
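If a bare hexadecimal number feels too opaque, one possible alternative is to build the constant from separate single-character literals, which every compiler evaluates the same way. The sketch below is an assumption on our side rather than something V1039 prescribes; the names MAKE_TAG, TAG_TEST and is_test_tag are hypothetical, and the first character is placed in the most significant byte so that the result matches 0x54657374.

```c
#include <stdint.h>

/* Hypothetical helper: packs four single characters into a 32-bit value,
 * first character in the most significant byte. Each argument is an
 * ordinary one-character literal, so no multi-character literal is used. */
#define MAKE_TAG(a, b, c, d)                    \
    (((uint32_t)(unsigned char)(a) << 24) |     \
     ((uint32_t)(unsigned char)(b) << 16) |     \
     ((uint32_t)(unsigned char)(c) << 8)  |     \
      (uint32_t)(unsigned char)(d))

/* Same value as the numeric constant 0x54657374. */
static const uint32_t TAG_TEST = MAKE_TAG('T', 'e', 's', 't');

/* Returns nonzero if the given value equals the "Test" tag. */
static int is_test_tag(uint32_t c)
{
    return c == TAG_TEST;
}
```

A call such as is_test_tag(MAKE_TAG('T', '\x65', 's', '\x74')) gives the same answer with any compiler, since escape sequences inside single-character literals are not reordered.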
To demonstrate the difference between compilers, let's write a small utility that takes sequences of 3 and 4 characters, for example 'GHI' and 'GHIJ', written with different mixes of ordinary characters and escape sequences, and prints how each one is laid out in memory after compilation.
Utility code:
#include <stdio.h>

typedef int char_t;

void PrintBytes(const char* format, char_t lit)
{
  printf("%20s : ", format);

  const unsigned char *ptr = (const unsigned char*)&lit;
  for (int i = sizeof(lit); i--;)
  {
    printf("%c", *ptr++);
  }
  putchar('\n');
}

int main(int argc, char** argv)
{
  printf("Hex codes are: G(%02X) H(%02X) I(%02X) J(%02X)\n",'G','H','I','J');
  PrintBytes("'GHIJ'", 'GHIJ');
  PrintBytes("'\\x47\\x48\\x49\\x4A'", '\x47\x48\x49\x4A');
  PrintBytes("'G\\x48\\x49\\x4A'", 'G\x48\x49\x4A');
  PrintBytes("'GH\\x49\\x4A'", 'GH\x49\x4A');
  PrintBytes("'G\\x48I\\x4A'", 'G\x48I\x4A');
  PrintBytes("'GHI\\x4A'", 'GHI\x4A');
  PrintBytes("'GHI'", 'GHI');
  PrintBytes("'\\x47\\x48\\x49'", '\x47\x48\x49');
  PrintBytes("'GH\\x49'", 'GH\x49');
  PrintBytes("'\\x47H\\x49'", '\x47H\x49');
  PrintBytes("'\\x47HI'", '\x47HI');
  return 0;
}
Output of the utility compiled with Visual C++:
Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : GHIJ
     'G\x48\x49\x4A' : HGIJ
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : GHI
            'GH\x49' : IHG
         '\x47H\x49' : HGI
            '\x47HI' : IHG
Output of the utility compiled with GCC or Clang:
Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : JIHG
     'G\x48\x49\x4A' : JIHG
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : IHG
            'GH\x49' : IHG
         '\x47H\x49' : IHG
            '\x47HI' : IHG
Conclusion
Diagnostic V1039 was added in PVS-Studio version 7.03, which was recently released. You can download the latest version of the analyzer from the download page.

If you want to share this article with an English-speaking audience, please use the link to the translation: Svyatoslav Razmyslov. The dangers of using multi-character constants.