Perl 5 was chosen to supplement the list of open source programming languages that were tested using the PVS-Studio static code analyzer. This article is about errors found and the difficulties of viewing the analysis results. The number of macros in the code is so great that it creates the feeling that the code is written not in C language, but in some strange dialect of it. Despite the difficulties in viewing the code, it was possible to gather up interesting problems, which will be discussed in this article.
Introduction
Perl is a high-level interpreted dynamic general purpose programming language (Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages). Development of Perl 5 began in 1994. After a couple of decades, C code with numerous macros causes nervousness among modern programmers.
Perl 5 source code was taken from the official
repository (the
blead branch). To check the project, the
PVS-Studio static code analyzer was used. The analysis was conducted on the Linux operating system, but the analyzer is also available for Windows and macOS.
')
Viewing the results of the analysis was not an easy task. The point is that the analyzer checks
preprocessed .i files, in which all the preprocessor directives are already disclosed, and gives warnings to the files with the source code. This is the correct behavior of the analyzer; nothing needs to be changed, but many warnings are issued for macros! And behind the macros there is an unreadable code.
The ternary operator doesn't work the way you think.
V502 Perhaps the '?:' Operator was different. The '?:' Operator has a lower limit than the '-' operator. toke.c 9494
STATIC char * S_scan_ident(pTHX_ char *s, char *dest, STRLEN destlen, I32 ck_uni) { .... if ((s <= PL_bufend - (is_utf8) ? UTF8SKIP(s) : 1) && VALID_LEN_ONE_IDENT(s, PL_bufend, is_utf8)) { .... } .... }
Let's start the review with a beautiful error. Every few reviews of the code have to repeat that the ternary operator has almost the lowest priority in the calculations.
Consider a fragment with an error:
s <= PL_bufend - (is_utf8) ? UTF8SKIP(s) : 1
The order of operations that the programmer expects:
- ?:
- -
- <=
What actually happens:
- -
- <=
- ?:
Keep a label with the priorities of operations: "The
priority of operations in the C / C ++ language ".
V502 Perhaps the '?:' Operator was different. The '?:' Operator has a lower limit than the '==' operator. re_exec.c 9193
STATIC I32 S_regrepeat(pTHX_ regexp *prog, char **startposp, const regnode *p, regmatch_info *const reginfo, I32 max _pDEPTH) { .... assert(STR_LEN(p) == reginfo->is_utf8_pat ? UTF8SKIP(STRING(p)) : 1); .... }
Simple code with a similar error. But if you do not know the priorities of operations, then you can make a mistake in the expression of any size.
Another place with assert:
- V502 Perhaps the '?:' Operator was different. The '?:' Operator has a lower limit than the '==' operator. re_exec.c 9286
V502 Perhaps the '?:' Operator was different. The '?:' Operator has a operator. pp_hot.c 3036
PP(pp_match) { .... MgBYTEPOS_set(mg, TARG, truebase, RXp_OFFS(prog)[0].end); .... }
And here is a warning to the macro ... To understand what is happening there, even the implementation of the macro will not help, because it uses a few more macros!
Therefore, I attach a fragment of the preprocessed file for this line of code:
(((targ)->sv_flags & 0x00000400) && (!((targ)->sv_flags & 0x00200000) || S_sv_only_taint_gmagic(targ)) ? (mg)->mg_len = ((prog->offs)[0].end), (mg)->mg_flags |= 0x40 : ((mg)->mg_len = (((targ)->sv_flags & 0x20000000) && !__builtin_expect(((((PL_curcop)->cop_hints + 0) & 0x00000008) ? (_Bool)1 :(_Bool)0),(0))) ? (ssize_t)Perl_utf8_length( (U8 *)(truebase), (U8 *)(truebase)+((prog->offs)[0].end)) : (ssize_t)((prog->offs)[0].end), (mg)->mg_flags &= ~0x40));
Somewhere here, the analyzer doubted the correct use of the ternary operator (there are 3), but I did not find the strength to understand what was being done in this code. We have already seen that the developers make such mistakes, so here it can also be very likely.
Three more uses of this macro:
- V502 Perhaps the '?:' Operator was different. The '?:' Operator has a operator. pp_ctl.c 324
- V502 Perhaps the '?:' Operator was different. The '?:' Operator has a operator. regexec.c 7335
- V502 Perhaps the '?:' Operator was different. The '?:' Operator has a operator. re_exec.c 7335
Note colleagues Andrei Karpov. I have been meditating on this code for 10 minutes and am inclined to believe that there is no error here. But in any case, it is extremely painful to read such code, and it is better not to write like this.
Errors in conditions
V523 The 'then' statement is equivalent to the 'else' statement. toke.c 12056
static U8 * S_add_utf16_textfilter(pTHX_ U8 *const s, bool reversed) { .... SvCUR_set(PL_linestr, 0); if (FILTER_READ(0, PL_linestr, 0)) { SvUTF8_on(PL_linestr); } else { SvUTF8_on(PL_linestr); } PL_bufend = SvEND(PL_linestr); return (U8*)SvPVX(PL_linestr); }
I think you can do without examining the contents of the macros to make sure that there are suspiciously duplicated code fragments.
V564 The '|' operator is applied to bool type value. You've probably forgotten to include the parentheses or intended to use the '||' operator. op.c 11494
OP * Perl_ck_rvconst(pTHX_ OP *o) { .... gv = gv_fetchsv(kidsv, o->op_type == OP_RV2CV && o->op_private & OPpMAY_RETURN_CONSTANT ? GV_NOEXPAND : iscv | !(kid->op_private & OPpCONST_ENTERED), iscv
Very strange code. The expression "iscv | ! (kid-> op_private & OPpCONST_ENTERED) ”is not used at all. There is clearly some typo here. For example, perhaps you should write here:
: iscv = !(kid->op_private & OPpCONST_ENTERED), iscv
V547 Expression 'RETVAL == 0' is always true. Typemap.c 710
XS_EUPXS(XS_XS__Typemap_T_SYSRET_pass); XS_EUPXS(XS_XS__Typemap_T_SYSRET_pass) { dVAR; dXSARGS; if (items != 0) croak_xs_usage(cv, ""); { SysRet RETVAL; #line 370 "Typemap.xs" RETVAL = 0; #line 706 "Typemap.c" { SV * RETVALSV; RETVALSV = sv_newmortal(); if (RETVAL != -1) {
The variable
RETVAL is checked twice in a row. At the same time, from the code it is clear that this variable is always zero. Perhaps, in one or both conditions, they wanted to check the
RETVALSV pointer, but made a typo.
Throw in warnings about the sizeof operator
The analyzer has several types of diagnostic rules that look for errors using the
sizeof operator. On the Perl 5 project, two such diagnostics issued a total of about a thousand warnings. In this case, not the analyzer is to blame, but the macros.
V568 It's odd that the sizeof () operator is the 'len + 1' expression. util.c 1084
char * Perl_savepvn(pTHX_ const char *pv, I32 len) { .... Newx(newaddr,len+1,char); .... }
There are many similar macros in the code. I chose one for an example, we are interested in the argument "len + 1".
The macro preprocessor is expanded to the following code:
(newaddr = ((void)(__builtin_expect(((((( sizeof(size_t) < sizeof(len+1) || sizeof(char) > ((size_t)1 << 8*(sizeof(size_t) - sizeof(len+1)))) ? (size_t)(len+1) : ((size_t)-1)/sizeof(char)) > ((size_t)-1)/sizeof(char))) ? (_Bool)1 : (_Bool)0),(0)) && (S_croak_memory_wrap(),0)), (char*)(Perl_safesysmalloc((size_t)((len+1)*sizeof(char))))));
The analyzer warning is issued on the
sizeof (len +1) construct. The fact is that no calculations in the arguments of the
sizeof operator are performed. In a similar code many macros are revealed. This is probably the old legacy code, in which no one wants to touch anything, but current developers continue to use old macros, suggesting a different behavior.
Null pointer dereferencing
V522 Dereferencing of the null pointer 'sv' might take place. pp_ctl.c 577
OP * Perl_pp_formline(void) { .... SV *sv = ((void *)0); .... switch (*fpc++) { .... case 4: arg = *fpc++; f += arg; fieldsize = arg; if (mark < sp) sv = *++mark; else { sv = &(PL_sv_immortals[2]); Perl_ck_warner( (28 ), "...."); } .... break; case 5: { const char *s = item = ((((sv)->sv_flags & (....)) == 0x00000400) ? .... .... } .... }
This code snippet is completely taken from the preprocessed file, because it is impossible to verify the existence of a problem from the source file, again due to macros.
The
sv pointer when declared is initialized to zero. The analyzer found that when passing by the value of
5 in the
switch statement , this pointer is dereferenced, which was not previously initialized. The change of the
sv pointer is present in the branch with the value
4 , but at the end of this block of code there is a
break statement. Most likely, this place requires writing additional code.
It was verified against nullptr. Check lines: 15919, 15920. op.c 15919
void Perl_rpeep(pTHX_ OP *o) { .... OP *k = o->op_next; U8 want = (k->op_flags & OPf_WANT);
In this code snippet, the analyzer detected a pointer
k , which is dereferenced one line before it is checked for validity. This can be either an error or an extra code.
Diagnostics
V595 finds a lot of warnings in any project, Perl 5 is no exception. It’s impossible to put all this into an article, so we’ll confine ourselves to one example, and the developers, if they wish, will check the project themselves.
miscellanea
V779 Unreachable code detected. It is possible that an error is present. universal.c 457
XS(XS_utf8_valid); XS(XS_utf8_valid) { dXSARGS; if (items != 1) croak_xs_usage(cv, "sv"); else { SV * const sv = ST(0); STRLEN len; const char * const s = SvPV_const(sv,len); if (!SvUTF8(sv) || is_utf8_string((const U8*)s,len)) XSRETURN_YES; else XSRETURN_NO; } XSRETURN_EMPTY; }
In the line with
XSRETURN_EMPTY, the analyzer detected an unreachable code. In this function there are two operators
return and
croak_xs_usage - a macro that is expanded into the noreturn function:
void Perl_croak_xs_usage(const CV *const cv, const char *const params) __attribute__((noreturn));
In similar places in the Perl 5 code, the
NOT_REACHED macro is used to specify an unreachable branch.
V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. inffast.c 296
void ZLIB_INTERNAL inflate_fast(z_streamp strm, unsigned start) { .... unsigned long hold; unsigned bits; .... hold &= (1U << bits) - 1; .... }
The analyzer detected a suspicious operation when working with bitmasks. A variable of smaller width than the variable
hold is used as a bit mask. This results in the loss of the most significant bits. Developers should pay attention to this code.
Conclusion

Finding errors through macros was very difficult. Viewing the report took a lot of time and effort. Nevertheless, the article includes very interesting cases, similar to real mistakes. The report of the analyzer is quite large, there definitely is a lot of interesting things there. But I can’t see it further :). I advise the developer to check the project on their own and eliminate the defects that they can detect.
PS We definitely want to support this interesting project, and we are ready to provide developers with a license for several months.

If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov.
Perl 5: How to Hide Errors in Macros