Microsoft has released the open source source code engine CoreCLR, which is a key element of. NET Core. This news, of course, could not fail to attract our attention. After all, the larger the audience of the project, the more alarming the suspicious places found will be. Despite the authorship of Microsoft, as in any large project, there is something to see and think about.
Introduction
CoreCLR is a .NET Core runtime environment, performing functions such as garbage collection or compiling to final machine code. .Net Core is a modular implementation of .Net that can be used as a base for a huge number of scenarios.
Recently the source code is available on GitHub and has been tested with
PVS-Studio 5.23. Like me, anyone can get a full verification log using Microsoft Visual Studio Community Edition, the release of which was also recent
news from Microsoft.
')
Typos
Traditionally, I start from typos-like places. In conditional expressions, variables, constants, macros, or structure / class fields are repeated. The presence of an error is a subject of discussion; nevertheless, such places were found and look suspicious.
V501 There are identical sub-expressions 'tree-> gtOper == GT_CLS_VAR' operator. ClrJit lsra.cpp 3140
Although the 'GenTree' structure has a similar field "tree-> gtType", it has a different type with "tree-> gtOper". I think the point here is in the copied constant. That is, in the expression should be used another constant, in addition to GT_CLS_VAR.
V501 There are identical sub-expressions "DECODE_PSP_SYM" operator. daccess 264
enum GcInfoDecoderFlags { DECODE_SECURITY_OBJECT = 0x01, DECODE_CODE_LENGTH = 0x02, DECODE_VARARG = 0x04, DECODE_INTERRUPTIBILITY = 0x08, DECODE_GC_LIFETIMES = 0x10, DECODE_NO_VALIDATION = 0x20, DECODE_PSP_SYM = 0x40, DECODE_GENERICS_INST_CONTEXT = 0x80, DECODE_GS_COOKIE = 0x100, DECODE_FOR_RANGES_CALLBACK = 0x200, DECODE_PROLOG_LENGTH = 0x400, DECODE_EDIT_AND_CONTINUE = 0x800, }; size_t GCDump::DumpGCTable(PTR_CBYTE table, ....) { GcInfoDecoder hdrdecoder(table, (GcInfoDecoderFlags)( DECODE_SECURITY_OBJECT | DECODE_GS_COOKIE | DECODE_CODE_LENGTH | DECODE_PSP_SYM
There are even two duplicate constants, although there are other constants in the "GcInfoDecoderFlags" enumeration that are not used in the condition.
Similar places:
- V501 There are identical sub-expressions L varLoc1.vlStk2.vls2BaseReg ’== 'operator. cee_wks util.cpp 657
- V501 There are identical sub-expressions' varLoc1.vlStk2.vls2Offset '==' operator. cee_wks util.cpp 658
- V501 There are identical sub-expressions 'varLoc1.vlFPstk.vlfReg' operator. cee_wks util.cpp 661
V700 Consider inspecting the 'T foo = foo = ...' expression. It is odd that variable is initialized through itself. cee_wks zapsig.cpp 172
BOOL ZapSig::GetSignatureForTypeHandle(....) { .... CorElementType elemType = elemType = TryEncodeUsingShortcut(pMT); .... }
It seems to be just an extra assignment, but such errors are often made when copying code and forget to rename something. In any case, in this form the code is meaningless.
V523 The 'then' statement is equivalent to the 'else' statement. cee_wks threadsuspend.cpp 2468
enum __MIDL___MIDL_itf_mscoree_0000_0004_0001 { OPR_ThreadAbort = 0, OPR_ThreadRudeAbortInNonCriticalRegion = .... , OPR_ThreadRudeAbortInCriticalRegion = ....) , OPR_AppDomainUnload = .... , OPR_AppDomainRudeUnload = ( OPR_AppDomainUnload + 1 ) , OPR_ProcessExit = ( OPR_AppDomainRudeUnload + 1 ) , OPR_FinalizerRun = ( OPR_ProcessExit + 1 ) , MaxClrOperation = ( OPR_FinalizerRun + 1 ) } EClrOperation; void Thread::SetRudeAbortEndTimeFromEEPolicy() { LIMITED_METHOD_CONTRACT; DWORD timeout; if (HasLockInCurrentDomain()) { timeout = GetEEPolicy()-> GetTimeout(OPR_ThreadRudeAbortInCriticalRegion);
This diagnostic finds the same blocks in the if / else constructs. And here, too, there is a suspicion of a typo in the constant. In the first case, the meaning of the “OPR_ThreadRudeAbortInNonCriticalRegion” just fits.
Similar places:
- V523 The 'then' statement is equivalent to the 'else' statement. ClrJit instr.cpp 3427
- V523 The 'then' statement is equivalent to the 'else' statement. ClrJit flowgraph.cpp 18815
- V523 The 'then' statement is equivalent to the 'else' statement. daccess dacdbiimpl.cpp 6374
Constructor initialization list
V670 The uninitialized class member 'gcInfo' is used to initialize the 'regSet' member. Remember that members are initialized in the order of declarations inside a class. ClrJit codegencommon.cpp 92
CodeGenInterface *getCodeGenerator(Compiler *comp); class CodeGenInterface { friend class emitter; public: .... RegSet regSet;
According to the standard, the order of initialization of class members in the constructor occurs in the order they are declared in the class. To correct the error, move the declaration of a member of the class 'gcInfo' above the declaration 'regSet'.
False but useful warning
V705 It is possible that “else” block was forgotten or commented out, thus altering the program's operation logics. daccess daccess.cpp 2979
HRESULT Initialize() { if (hdr.dwSig == sig) { m_rw = eRO; m_MiniMetaDataBuffSizeMax = hdr.dwTotalSize; hr = S_OK; } else
The analyzer detected a suspicious place in the code. Here you can see that the code is commented and everything is fine But this type of error is very common when the code after the 'else' is commented out and the operator following it becomes part of the condition. In this example, there is no error, but it is quite possible to allow it when editing this place in the future.
64-bit error
V673 The '0xefefefef << 28' expression evaluates to the 'unsigned' type '32' bits. cee_dac _dac object.inl 95
inline void Object::EnumMemoryRegions(void) { .... SIZE_T size = sizeof(ObjHeader) + sizeof(Object); .... size |= 0xefefefef << 28; .... }
You can read about the term “64-bit error”
here . In this example, after the shift, the operation “size | = 0xf0000000” in a 32-bit program and “size | = 0x00000000f0000000” in a 64-bit one will be performed. Most likely, in the 64-bit program they planned to calculate: "size | = 0x0efefefef0000000". But where is the oldest part of the number?
The number “0xefefefef” is of type 'unsigned', because it does not fit into the type of 'int'. The shift of the 32-bit number is performed and as a result we get 0xf0000000 unsigned type. Further, this unsigned number will expand to SIZE_T and we will get 0x00000000f0000000.
To work correctly, you must first perform an explicit type conversion. An example of the correct code:
inline void Object::EnumMemoryRegions(void) { .... SIZE_T size = sizeof(ObjHeader) + sizeof(Object); .... size |= SIZE_T(0xefefefef) << 28; .... }
Another place:
- V673 The '0xefefefef << 28' expression evaluates to the 'unsigned' type '32' bits. cee_dac dynamicmethod.cpp 807
Code "retired"
Sometimes the conditions are written so that they literally contradict each other.
V637 Two opposite conditions were encountered. The second condition is always false. Check lines: 31825, 31827. cee_wks gc.cpp 31825
void gc_heap::verify_heap (BOOL begin_gc_p) { .... if (brick_table [curr_brick] < 0) { if (brick_table [curr_brick] == 0) { dprintf(3, ("curr_brick %Ix for object %Ix set to 0", curr_brick, (size_t)curr_object)); FATAL_GC_ERROR(); } .... } .... }
Code that never gets control, but it doesn't look as significant as in the following example:
V517 The use of if (A) {...} else if (A) {...} 'pattern was detected. There is a possibility of logical error presence. Check lines: 2353, 2391. utilcode util.cpp 2353
void PutIA64Imm22(UINT64 * pBundle, UINT32 slot, INT32 imm22) { if (slot == 0) { const UINT64 mask0 = UI64(0xFFFFFC000603FFFF); pBundle[0] &= mask0; UINT64 temp0; temp0 = (UINT64) (imm22 & 0x200000) << 20;
Perhaps a very important code never gets control because of an error in the cascade of conditional statements.
More suspicious places:
- V637 Two opposite conditions were encountered. The second condition is always false. Check lines: 2898, 2900. daccess nidump.cpp 2898
- V637 Two opposite conditions were encountered. The second condition is always false. Check lines: 337, 339. utilcode prettyprintsig.cpp 337
- V637 Two opposite conditions were encountered. The second condition is always false. Check lines: 774, 776. utilcode prettyprintsig.cpp 774
Undefined behavior
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. bcltype metamodel.h 532
inline static mdToken decodeToken(....) {
The analyzer detected a shift operation of a negative number, which leads to undefined behavior.
V610 Undefined behavior. Check the shift operator '<<'. The left operand '(~ 0)' is negative. cee_dac decodemd.cpp 456
#define bits_generation 2 #define generation_mask (~(~0 << bits_generation)) #define MASK(len) (~((~0)<<len)) #define MASK64(len) ((~((~((unsigned __int64)0))<<len))) void Encoder::Add(unsigned value, unsigned length) { .... value = (value & MASK(length)); .... }
Thanks to the post of the V610 analyzer, I found several incorrect macros. '~ 0' is reduced to a signed negative integer, followed by a shift. As in one of the macros, you need to perform an explicit conversion to unsigned:
#define bits_generation 2 #define generation_mask (~(~((unsigned int)0) << bits_generation)) #define MASK(len) (~((~((unsigned int)0))<<len)) #define MASK64(len) ((~((~((unsigned __int64)0))<<len)))
Incorrect sizeof (xx)
The DacReadAll V579 It is possibly a mistake. Inspect the third argument. daccess dacimpl.h 1688
template<class T> inline bool MisalignedRead(CORDB_ADDRESS addr, T *t) { return SUCCEEDED(DacReadAll(TO_TADDR(addr), t, sizeof(t), false)); }
This is such a small function that always takes the size of the pointer. Most likely, they wanted to write “sizeof (* t)”, well, or “sizeof (T)”.
Another illustrative example:
V579 . It is possibly a mistake. Inspect the third argument. util.cpp 4943
HRESULT GetMTOfObject(TADDR obj, TADDR *mt) { if (!mt) return E_POINTER; HRESULT hr = rvCache->Read(obj, mt, sizeof(mt), NULL); if (SUCCEEDED(hr)) *mt &= ~3; return hr; }
Family of functions "memFAIL"
With the use of memXXX-functions can make a variety of errors. To search for such places in the analyzer there are several diagnostic rules.
V512 A call to the "memset" function will lead to the underflow of the buffer 'pAddExpression'. sos strike.cpp 11973
DECLARE_API(Watch) { .... if(addExpression.data != NULL || aExpression.data != NULL) { WCHAR pAddExpression[MAX_EXPRESSION]; memset(pAddExpression, 0, MAX_EXPRESSION); swprintf_s(pAddExpression, MAX_EXPRESSION, L"%S", ....); Status = g_watchCmd.Add(pAddExpression); } .... }
A common mistake when they forget to make an amendment to the size of a type:
WCHAR pAddExpression[MAX_EXPRESSION]; memset(pAddExpression, 0, sizeof(WCHAR)*MAX_EXPRESSION);
A few more places like this:
- V512 A call-out the memset function will lead to the underflow of the buffer pSaveName. sos strike.cpp 11997
- V512 A call-out the memset function will lead to the underflow of the buffer pOldName. sos strike.cpp 12013
- V512 A call of the memset function will lead to the underflow of the buffer pNewName. sos strike.cpp 12016
- V512 A call of the memset function will lead to underflow of the buffer 'pExpression'. sos strike.cpp 12024
- V512 A call of the memset function will lead to underflow of the buffer pFilterName. sos strike.cpp 12039
V598 The 'memcpy' function is used to copy the fields of the 'GenTree' class. Virtual table pointer will be maintained by this. ClrJit compiler.hpp 1344
struct GenTree { .... #if DEBUGGABLE_GENTREE virtual void DummyVirt() {} #endif
If the preprocessor variable 'DEBUGGABLE_GENTREE' is declared, then a virtual function is defined. Then the class contains a pointer to a table of virtual methods and it can no longer be copied like this.
V598 The 'memcpy' function is used to copy the fields of the 'GCStatistics' class. Virtual table pointer will be maintained by this. cee_wks gc.cpp 287
struct GCStatistics : public StatisticsBase { .... virtual void Initialize(); virtual void DisplayAndUpdate(); .... }; GCStatistics g_LastGCStatistics; void GCStatistics::DisplayAndUpdate() { .... memcpy(&g_LastGCStatistics, this, sizeof(g_LastGCStatistics)); .... }
In this place incorrect copying is performed not only in debug mode.
V698 Expression 'memcmp (....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp (....) <0' instead. sos util.cpp 142
bool operator( )(const GUID& _Key1, const GUID& _Key2) const { return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1; }
It is not correct to compare the result of the function 'memcmp' with the value 1 or -1. The efficiency of such constructions depends on the libraries, the compiler, its settings, the operating system, its capacity and so on; in this case, it is necessary to check one of the three states: '<0', '0' or '> 0'.
Similar place:
- V698 Expression 'wcscmp (....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'wcscmp (....) <0' instead. sos strike.cpp 3855
About pointers
V522 Dereferencing of the null pointer 'hp' might take place. cee_wks gc.cpp 4488
heap_segment* gc_heap::get_segment_for_loh (size_t size #ifdef MULTIPLE_HEAPS , gc_heap* hp #endif
If 'MULTIPLE_HEAPS' is not defined, then trouble. The pointer will be zero.
It was verified against nullptr. Check lines: 6970, 6976. ClrJit gentree.cpp 6970
void Compiler::gtDispNode(GenTreePtr tree, ....) { .... if (tree->gtOper >= GT_COUNT) { printf(" **** ILLEGAL NODE ****"); return; } if (tree && printFlags) { switch (tree->gtOper) { .... } .... } .... }
In the source code, there are common places when the pointer is checked for validity, but after dereference.
The entire list:
CoreCLR_V595.txt .
Extra checks
Even if the extra code is not harmful, its presence may simply distract the development attention from more important places.
V503 This is a nonsensical comparison: pointer> = 0. cee_wks gc.cpp 21707
void gc_heap::make_free_list_in_brick (BYTE* tree, make_free_args* args) { assert ((tree >= 0)); .... }
Here is a check pointer. More examples:
- V503 This is a nonsensical comparison: pointer> = 0. cee_wks gc.cpp 23204
- V503 This is a nonsensical comparison: pointer> = 0. cee_wks gc.cpp 27683
V547 Expression 'maxCpuId> = 0' is always true. Unsigned type value is always> = 0. cee_wks codeman.cpp 1219
void EEJitManager::SetCpuInfo() { .... unsigned char buffer[16]; DWORD maxCpuId = getcpuid(0, buffer); if (maxCpuId >= 0) { .... }
A similar example, only with the DWORD type.
V590 Consider inspecting the 'wzPath [0]! = L' \ 0 '&& wzPath [0] == L' \\ '' expression. The expression is misprint. cee_wks path.h 62
static inline bool HasUncPrefix(LPCWSTR wzPath) { _ASSERTE(!clr::str::IsNullOrEmpty(wzPath)); return wzPath[0] != W('\0') && wzPath[0] == W('\\') && wzPath[1] != W('\0') && wzPath[1] == W('\\') && wzPath[2] != W('\0') && wzPath[2] != W('?'); }
This function can be simplified to this option:
static inline bool HasUncPrefix(LPCWSTR wzPath) { _ASSERTE(!clr::str::IsNullOrEmpty(wzPath)); return wzPath[0] == W('\\') && wzPath[1] == W('\\') && wzPath[2] != W('\0') && wzPath[2] != W('?'); }
Another place:
- V590 Consider inspecting this expression. The expression is misprint. cee_wks path.h 72
V571 Recurring check. The 'if (moduleInfo [MSCORWKS] .baseAddr == 0)' condition was already verified in line 749. sos util.cpp 751
struct ModuleInfo { ULONG64 baseAddr; ULONG64 size; BOOL hasPdb; }; HRESULT CheckEEDll() { ....
In the second case, 'baseAddr' can no longer be checked.
V704 'this == nullptr' expression should be avoided - this expression can always be NULL. ClrJit gentree.cpp 12731
bool FieldSeqNode::IsFirstElemFieldSeq() { if (this == nullptr) return false; return m_fieldHnd == FieldSeqStore::FirstElemPseudoField; }
According to the C ++ standard, the this pointer can never be null. The possible consequences of such a code can be found in detail in the diagnostic description of the
V704 . The fact that such code can work correctly after compilation by the Visual C ++ compiler is just luck and you honestly cannot rely on it.
The entire list:
CoreCLR_V704.txt .
V668 has been defined using the 'new' operator. The exception will be generated in the case of memory allocation error. ClrJit stresslog.h 552
FORCEINLINE BOOL GrowChunkList () { .... StressLogChunk * newChunk = new StressLogChunk (....); if (newChunk == NULL) { return FALSE; } .... }
If the 'new' operator could not allocate memory, then, according to the C ++ standard of the language, the exception std :: bad_alloc () is generated. Thus, it does not make sense to check the pointer to equality to zero.
It is better to check such places, here is the complete list:
CoreCLR_V668.txt .
Conclusion
The newly opened project CoreCLR is a good example of what closed source software might look like. There are constantly discussions on this topic and here you have one more reason for reflection and discussion.
For us, it is important that in any large project you can find some mistakes and the best use of the static analyzer is regular checks. Do not be lazy,
download PVS-Studio and check your project.
This article is in English.
If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov.
PVS-Studio: 25 Suspicious Code Fragments in CoreCLR .