📜 ⬆️ ⬇️

PVS-Studio: 25 suspicious code snippets from CoreCLR


Microsoft has released the open source source code engine CoreCLR, which is a key element of. NET Core. This news, of course, could not fail to attract our attention. After all, the larger the audience of the project, the more alarming the suspicious places found will be. Despite the authorship of Microsoft, as in any large project, there is something to see and think about.

Introduction


CoreCLR is a .NET Core runtime environment, performing functions such as garbage collection or compiling to final machine code. .Net Core is a modular implementation of .Net that can be used as a base for a huge number of scenarios.

Recently the source code is available on GitHub and has been tested with PVS-Studio 5.23. Like me, anyone can get a full verification log using Microsoft Visual Studio Community Edition, the release of which was also recent news from Microsoft.

')

Typos


Traditionally, I start from typos-like places. In conditional expressions, variables, constants, macros, or structure / class fields are repeated. The presence of an error is a subject of discussion; nevertheless, such places were found and look suspicious.

V501 There are identical sub-expressions 'tree-> gtOper == GT_CLS_VAR' operator. ClrJit lsra.cpp 3140
// register variable GTNODE(GT_REG_VAR , "regVar" ,0,GTK_LEAF|GTK_LOCAL) // static data member GTNODE(GT_CLS_VAR , "clsVar" ,0,GTK_LEAF) // static data member address GTNODE(GT_CLS_VAR_ADDR , "&clsVar" ,0,GTK_LEAF) .... void LinearScan::buildRefPositionsForNode(GenTree *tree, ....) { .... if ((tree->gtOper == GT_CLS_VAR || tree->gtOper == GT_CLS_VAR) && i == 1) { registerType = TYP_PTR; currCandidates = allRegs(TYP_PTR); } .... } 

Although the 'GenTree' structure has a similar field "tree-> gtType", it has a different type with "tree-> gtOper". I think the point here is in the copied constant. That is, in the expression should be used another constant, in addition to GT_CLS_VAR.

V501 There are identical sub-expressions "DECODE_PSP_SYM" operator. daccess 264
 enum GcInfoDecoderFlags { DECODE_SECURITY_OBJECT = 0x01, DECODE_CODE_LENGTH = 0x02, DECODE_VARARG = 0x04, DECODE_INTERRUPTIBILITY = 0x08, DECODE_GC_LIFETIMES = 0x10, DECODE_NO_VALIDATION = 0x20, DECODE_PSP_SYM = 0x40, DECODE_GENERICS_INST_CONTEXT = 0x80, DECODE_GS_COOKIE = 0x100, DECODE_FOR_RANGES_CALLBACK = 0x200, DECODE_PROLOG_LENGTH = 0x400, DECODE_EDIT_AND_CONTINUE = 0x800, }; size_t GCDump::DumpGCTable(PTR_CBYTE table, ....) { GcInfoDecoder hdrdecoder(table, (GcInfoDecoderFlags)( DECODE_SECURITY_OBJECT | DECODE_GS_COOKIE | DECODE_CODE_LENGTH | DECODE_PSP_SYM //<==1 | DECODE_VARARG | DECODE_PSP_SYM //<==1 | DECODE_GENERICS_INST_CONTEXT //<==2 | DECODE_GC_LIFETIMES | DECODE_GENERICS_INST_CONTEXT //<==2 | DECODE_PROLOG_LENGTH), 0); .... } 

There are even two duplicate constants, although there are other constants in the "GcInfoDecoderFlags" enumeration that are not used in the condition.

Similar places:
V700 Consider inspecting the 'T foo = foo = ...' expression. It is odd that variable is initialized through itself. cee_wks zapsig.cpp 172
 BOOL ZapSig::GetSignatureForTypeHandle(....) { .... CorElementType elemType = elemType = TryEncodeUsingShortcut(pMT); .... } 

It seems to be just an extra assignment, but such errors are often made when copying code and forget to rename something. In any case, in this form the code is meaningless.

V523 The 'then' statement is equivalent to the 'else' statement. cee_wks threadsuspend.cpp 2468
 enum __MIDL___MIDL_itf_mscoree_0000_0004_0001 { OPR_ThreadAbort = 0, OPR_ThreadRudeAbortInNonCriticalRegion = .... , OPR_ThreadRudeAbortInCriticalRegion = ....) , OPR_AppDomainUnload = .... , OPR_AppDomainRudeUnload = ( OPR_AppDomainUnload + 1 ) , OPR_ProcessExit = ( OPR_AppDomainRudeUnload + 1 ) , OPR_FinalizerRun = ( OPR_ProcessExit + 1 ) , MaxClrOperation = ( OPR_FinalizerRun + 1 ) } EClrOperation; void Thread::SetRudeAbortEndTimeFromEEPolicy() { LIMITED_METHOD_CONTRACT; DWORD timeout; if (HasLockInCurrentDomain()) { timeout = GetEEPolicy()-> GetTimeout(OPR_ThreadRudeAbortInCriticalRegion); //<== } else { timeout = GetEEPolicy()-> GetTimeout(OPR_ThreadRudeAbortInCriticalRegion); //<== } .... } 

This diagnostic finds the same blocks in the if / else constructs. And here, too, there is a suspicion of a typo in the constant. In the first case, the meaning of the “OPR_ThreadRudeAbortInNonCriticalRegion” just fits.

Similar places:

Constructor initialization list


V670 The uninitialized class member 'gcInfo' is used to initialize the 'regSet' member. Remember that members are initialized in the order of declarations inside a class. ClrJit codegencommon.cpp 92
 CodeGenInterface *getCodeGenerator(Compiler *comp); class CodeGenInterface { friend class emitter; public: .... RegSet regSet; //<=== line 91 .... public: GCInfo gcInfo; //<=== line 322 .... }; // CodeGen constructor CodeGenInterface::CodeGenInterface(Compiler* theCompiler) : compiler(theCompiler), gcInfo(theCompiler), regSet(theCompiler, gcInfo) { } 

According to the standard, the order of initialization of class members in the constructor occurs in the order they are declared in the class. To correct the error, move the declaration of a member of the class 'gcInfo' above the declaration 'regSet'.

False but useful warning


V705 It is possible that “else” block was forgotten or commented out, thus altering the program's operation logics. daccess daccess.cpp 2979
 HRESULT Initialize() { if (hdr.dwSig == sig) { m_rw = eRO; m_MiniMetaDataBuffSizeMax = hdr.dwTotalSize; hr = S_OK; } else // when the DAC initializes this for the case where the target is // (a) a live process, or (b) a full dump, buff will point to a // zero initialized memory region (allocated w/ VirtualAlloc) if (hdr.dwSig == 0 && hdr.dwTotalSize == 0 && hdr.dwCntStreams == 0) { hr = S_OK; } // otherwise we may have some memory corruption. treat this as // a liveprocess/full dump else { hr = S_FALSE; } .... } 

The analyzer detected a suspicious place in the code. Here you can see that the code is commented and everything is fine But this type of error is very common when the code after the 'else' is commented out and the operator following it becomes part of the condition. In this example, there is no error, but it is quite possible to allow it when editing this place in the future.

64-bit error


V673 The '0xefefefef << 28' expression evaluates to the 'unsigned' type '32' bits. cee_dac _dac object.inl 95
 inline void Object::EnumMemoryRegions(void) { .... SIZE_T size = sizeof(ObjHeader) + sizeof(Object); .... size |= 0xefefefef << 28; .... } 

You can read about the term “64-bit error” here . In this example, after the shift, the operation “size | = 0xf0000000” in a 32-bit program and “size | = 0x00000000f0000000” in a 64-bit one will be performed. Most likely, in the 64-bit program they planned to calculate: "size | = 0x0efefefef0000000". But where is the oldest part of the number?

The number “0xefefefef” is of type 'unsigned', because it does not fit into the type of 'int'. The shift of the 32-bit number is performed and as a result we get 0xf0000000 unsigned type. Further, this unsigned number will expand to SIZE_T and we will get 0x00000000f0000000.

To work correctly, you must first perform an explicit type conversion. An example of the correct code:
 inline void Object::EnumMemoryRegions(void) { .... SIZE_T size = sizeof(ObjHeader) + sizeof(Object); .... size |= SIZE_T(0xefefefef) << 28; .... } 

Another place:

Code "retired"


Sometimes the conditions are written so that they literally contradict each other.

V637 Two opposite conditions were encountered. The second condition is always false. Check lines: 31825, 31827. cee_wks gc.cpp 31825
 void gc_heap::verify_heap (BOOL begin_gc_p) { .... if (brick_table [curr_brick] < 0) { if (brick_table [curr_brick] == 0) { dprintf(3, ("curr_brick %Ix for object %Ix set to 0", curr_brick, (size_t)curr_object)); FATAL_GC_ERROR(); } .... } .... } 

Code that never gets control, but it doesn't look as significant as in the following example:

V517 The use of if (A) {...} else if (A) {...} 'pattern was detected. There is a possibility of logical error presence. Check lines: 2353, 2391. utilcode util.cpp 2353
 void PutIA64Imm22(UINT64 * pBundle, UINT32 slot, INT32 imm22) { if (slot == 0) { const UINT64 mask0 = UI64(0xFFFFFC000603FFFF); /* Clear all bits used as part of the imm22 */ pBundle[0] &= mask0; UINT64 temp0; temp0 = (UINT64) (imm22 & 0x200000) << 20; // 1 s temp0 |= (UINT64) (imm22 & 0x1F0000) << 11; // 5 imm5c temp0 |= (UINT64) (imm22 & 0x00FF80) << 25; // 9 imm9d temp0 |= (UINT64) (imm22 & 0x00007F) << 18; // 7 imm7b /* Or in the new bits used in the imm22 */ pBundle[0] |= temp0; } else if (slot == 1) { .... } else if (slot == 0) //<== { const UINT64 mask1 = UI64(0xF000180FFFFFFFFF); /* Clear all bits used as part of the imm22 */ pBundle[1] &= mask1; UINT64 temp1; temp1 = (UINT64) (imm22 & 0x200000) << 37; // 1 s temp1 |= (UINT64) (imm22 & 0x1F0000) << 32; // 5 imm5c temp1 |= (UINT64) (imm22 & 0x00FF80) << 43; // 9 imm9d temp1 |= (UINT64) (imm22 & 0x00007F) << 36; // 7 imm7b /* Or in the new bits used in the imm22 */ pBundle[1] |= temp1; } FlushInstructionCache(GetCurrentProcess(),pBundle,16); } 

Perhaps a very important code never gets control because of an error in the cascade of conditional statements.

More suspicious places:

Undefined behavior


V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. bcltype metamodel.h 532
 inline static mdToken decodeToken(....) { //<TODO>@FUTURE: make compile-time calculation</TODO> ULONG32 ix = (ULONG32)(val & ~(-1 << m_cb[cTokens])); if (ix >= cTokens) return rTokens[0]; return TokenFromRid(val >> m_cb[cTokens], rTokens[ix]); } 

The analyzer detected a shift operation of a negative number, which leads to undefined behavior.

V610 Undefined behavior. Check the shift operator '<<'. The left operand '(~ 0)' is negative. cee_dac decodemd.cpp 456
 #define bits_generation 2 #define generation_mask (~(~0 << bits_generation)) #define MASK(len) (~((~0)<<len)) #define MASK64(len) ((~((~((unsigned __int64)0))<<len))) void Encoder::Add(unsigned value, unsigned length) { .... value = (value & MASK(length)); .... } 

Thanks to the post of the V610 analyzer, I found several incorrect macros. '~ 0' is reduced to a signed negative integer, followed by a shift. As in one of the macros, you need to perform an explicit conversion to unsigned:
 #define bits_generation 2 #define generation_mask (~(~((unsigned int)0) << bits_generation)) #define MASK(len) (~((~((unsigned int)0))<<len)) #define MASK64(len) ((~((~((unsigned __int64)0))<<len))) 

Incorrect sizeof (xx)


The DacReadAll V579 It is possibly a mistake. Inspect the third argument. daccess dacimpl.h 1688
 template<class T> inline bool MisalignedRead(CORDB_ADDRESS addr, T *t) { return SUCCEEDED(DacReadAll(TO_TADDR(addr), t, sizeof(t), false)); } 

This is such a small function that always takes the size of the pointer. Most likely, they wanted to write “sizeof (* t)”, well, or “sizeof (T)”.

Another illustrative example:

V579 . It is possibly a mistake. Inspect the third argument. util.cpp 4943
 HRESULT GetMTOfObject(TADDR obj, TADDR *mt) { if (!mt) return E_POINTER; HRESULT hr = rvCache->Read(obj, mt, sizeof(mt), NULL); if (SUCCEEDED(hr)) *mt &= ~3; return hr; } 


Family of functions "memFAIL"


With the use of memXXX-functions can make a variety of errors. To search for such places in the analyzer there are several diagnostic rules.

V512 A call to the "memset" function will lead to the underflow of the buffer 'pAddExpression'. sos strike.cpp 11973
 DECLARE_API(Watch) { .... if(addExpression.data != NULL || aExpression.data != NULL) { WCHAR pAddExpression[MAX_EXPRESSION]; memset(pAddExpression, 0, MAX_EXPRESSION); swprintf_s(pAddExpression, MAX_EXPRESSION, L"%S", ....); Status = g_watchCmd.Add(pAddExpression); } .... } 

A common mistake when they forget to make an amendment to the size of a type:
 WCHAR pAddExpression[MAX_EXPRESSION]; memset(pAddExpression, 0, sizeof(WCHAR)*MAX_EXPRESSION); 

A few more places like this:
V598 The 'memcpy' function is used to copy the fields of the 'GenTree' class. Virtual table pointer will be maintained by this. ClrJit compiler.hpp 1344
 struct GenTree { .... #if DEBUGGABLE_GENTREE virtual void DummyVirt() {} #endif // DEBUGGABLE_GENTREE .... }; void GenTree::CopyFrom(const GenTree* src, Compiler* comp) { .... memcpy(this, src, src->GetNodeSize()); .... } 

If the preprocessor variable 'DEBUGGABLE_GENTREE' is declared, then a virtual function is defined. Then the class contains a pointer to a table of virtual methods and it can no longer be copied like this.

V598 The 'memcpy' function is used to copy the fields of the 'GCStatistics' class. Virtual table pointer will be maintained by this. cee_wks gc.cpp 287
 struct GCStatistics : public StatisticsBase { .... virtual void Initialize(); virtual void DisplayAndUpdate(); .... }; GCStatistics g_LastGCStatistics; void GCStatistics::DisplayAndUpdate() { .... memcpy(&g_LastGCStatistics, this, sizeof(g_LastGCStatistics)); .... } 

In this place incorrect copying is performed not only in debug mode.

V698 Expression 'memcmp (....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp (....) <0' instead. sos util.cpp 142
 bool operator( )(const GUID& _Key1, const GUID& _Key2) const { return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1; } 

It is not correct to compare the result of the function 'memcmp' with the value 1 or -1. The efficiency of such constructions depends on the libraries, the compiler, its settings, the operating system, its capacity and so on; in this case, it is necessary to check one of the three states: '<0', '0' or '> 0'.

Similar place:

About pointers


V522 Dereferencing of the null pointer 'hp' might take place. cee_wks gc.cpp 4488
 heap_segment* gc_heap::get_segment_for_loh (size_t size #ifdef MULTIPLE_HEAPS , gc_heap* hp #endif //MULTIPLE_HEAPS ) { #ifndef MULTIPLE_HEAPS gc_heap* hp = 0; #endif //MULTIPLE_HEAPS heap_segment* res = hp->get_segment (size, TRUE); .... } 

If 'MULTIPLE_HEAPS' is not defined, then trouble. The pointer will be zero.

It was verified against nullptr. Check lines: 6970, 6976. ClrJit gentree.cpp 6970
 void Compiler::gtDispNode(GenTreePtr tree, ....) { .... if (tree->gtOper >= GT_COUNT) { printf(" **** ILLEGAL NODE ****"); return; } if (tree && printFlags) { /* First print the flags associated with the node */ switch (tree->gtOper) { .... } .... } .... } 

In the source code, there are common places when the pointer is checked for validity, but after dereference.

The entire list: CoreCLR_V595.txt .

Extra checks


Even if the extra code is not harmful, its presence may simply distract the development attention from more important places.

V503 This is a nonsensical comparison: pointer> = 0. cee_wks gc.cpp 21707
 void gc_heap::make_free_list_in_brick (BYTE* tree, make_free_args* args) { assert ((tree >= 0)); .... } 

Here is a check pointer. More examples:
V547 Expression 'maxCpuId> = 0' is always true. Unsigned type value is always> = 0. cee_wks codeman.cpp 1219
 void EEJitManager::SetCpuInfo() { .... unsigned char buffer[16]; DWORD maxCpuId = getcpuid(0, buffer); if (maxCpuId >= 0) { .... } 

A similar example, only with the DWORD type.

V590 Consider inspecting the 'wzPath [0]! = L' \ 0 '&& wzPath [0] == L' \\ '' expression. The expression is misprint. cee_wks path.h 62
 static inline bool HasUncPrefix(LPCWSTR wzPath) { _ASSERTE(!clr::str::IsNullOrEmpty(wzPath)); return wzPath[0] != W('\0') && wzPath[0] == W('\\') && wzPath[1] != W('\0') && wzPath[1] == W('\\') && wzPath[2] != W('\0') && wzPath[2] != W('?'); } 

This function can be simplified to this option:
 static inline bool HasUncPrefix(LPCWSTR wzPath) { _ASSERTE(!clr::str::IsNullOrEmpty(wzPath)); return wzPath[0] == W('\\') && wzPath[1] == W('\\') && wzPath[2] != W('\0') && wzPath[2] != W('?'); } 

Another place:
V571 Recurring check. The 'if (moduleInfo [MSCORWKS] .baseAddr == 0)' condition was already verified in line 749. sos util.cpp 751
 struct ModuleInfo { ULONG64 baseAddr; ULONG64 size; BOOL hasPdb; }; HRESULT CheckEEDll() { .... // Do we have clr.dll if (moduleInfo[MSCORWKS].baseAddr == 0) //<== { if (moduleInfo[MSCORWKS].baseAddr == 0) //<== g_ExtSymbols->GetModuleByModuleName ( MAIN_CLR_MODULE_NAME_A,0,NULL, &moduleInfo[MSCORWKS].baseAddr); if (moduleInfo[MSCORWKS].baseAddr != 0 && //<== moduleInfo[MSCORWKS].hasPdb == FALSE) { .... } .... } .... } 

In the second case, 'baseAddr' can no longer be checked.

V704 'this == nullptr' expression should be avoided - this expression can always be NULL. ClrJit gentree.cpp 12731
 bool FieldSeqNode::IsFirstElemFieldSeq() { if (this == nullptr) return false; return m_fieldHnd == FieldSeqStore::FirstElemPseudoField; } 

According to the C ++ standard, the this pointer can never be null. The possible consequences of such a code can be found in detail in the diagnostic description of the V704 . The fact that such code can work correctly after compilation by the Visual C ++ compiler is just luck and you honestly cannot rely on it.

The entire list: CoreCLR_V704.txt .

V668 has been defined using the 'new' operator. The exception will be generated in the case of memory allocation error. ClrJit stresslog.h 552
 FORCEINLINE BOOL GrowChunkList () { .... StressLogChunk * newChunk = new StressLogChunk (....); if (newChunk == NULL) { return FALSE; } .... } 

If the 'new' operator could not allocate memory, then, according to the C ++ standard of the language, the exception std :: bad_alloc () is generated. Thus, it does not make sense to check the pointer to equality to zero.

It is better to check such places, here is the complete list: CoreCLR_V668.txt .

Conclusion


The newly opened project CoreCLR is a good example of what closed source software might look like. There are constantly discussions on this topic and here you have one more reason for reflection and discussion.

For us, it is important that in any large project you can find some mistakes and the best use of the static analyzer is regular checks. Do not be lazy, download PVS-Studio and check your project.

This article is in English.


If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov. PVS-Studio: 25 Suspicious Code Fragments in CoreCLR .

Read the article and have a question?
Often our articles are asked the same questions. We collected the answers to them here: Answers to questions from readers of articles about PVS-Studio and CppCat, version 2014 . Please review the list.

Source: https://habr.com/ru/post/253280/


All Articles