📜 ⬆️ ⬇️

“Why haven't artificial intelligence been invented yet?” Or testing CNTK tools from Microsoft Research

Microsoft has released the open source source code of tools that the company uses to speed up development in the field of artificial intelligence: the Computational Network Toolkit is now available on Github. Developers had to create their own solution, because the tools available were too slow. Let's take a look at the results of testing this project with a static code analyzer.

Introduction


Computational Network Toolkit (CNTK) is a set of tools for designing and training networks of various types that can be used for pattern recognition, speech understanding, text analysis and more.

PVS-Studio is a static analyzer for detecting errors in the source code of programs written in C, C ++ and C # languages. The PVS-Studio tool is designed for developers of modern applications and integrates into Visual Studio 2010-2015 environments.

In preparing the article about checking the next open project , we provide in it information far from all warnings issued by the static analyzer. Therefore, we recommend that the authors of the project independently perform an analysis and study all the messages issued by the analyzer. We provide a key to open project developers for a time.
')
I must say that there were a few errors. And this is expected. The Microsoft product code is very high quality and we have already seen this more than once, checking out the projects that the company is gradually opening. But do not forget that the meaning of static analysis is in regular checks, and not in one-off "cavalry swoops."

Eh, typos


Typos - a nasty coincidence when typing. They penetrated social networks, books, publications on the Internet and even on television. In plain text, spell checking in text editors comes to the rescue, and static source code analysis does a good job with programming.



V501 There are identical sub-expressions '! Input (0) -> HasMBLayout ()' operator. trainingnodes.h 1416
virtual void Validate(bool isFinalValidationPass) override { .... if (isFinalValidationPass && !(Input(0)->GetSampleMatrixNumRows() == Input(2)->GetSampleMatrixNumRows() && (Input(0)->GetMBLayout() == Input(2)->GetMBLayout() || !Input(0)->HasMBLayout() || // <= !Input(0)->HasMBLayout()))) // <= { LogicError(..., NodeName().c_str(),OperationName().c_str()); } .... } 

The formatting of this fragment has been greatly changed for clarity. Only after that it became obvious that the condition contains two identical checks "! Input (0) -> HasMBLayout ()". Most likely, in one case, they wanted to use an element with the index '2'.

V501 There are identical sub-expressions to the left and the right to the operator - i0 - i0 ssematrix.h 564
 void assignpatch(const ssematrixbase &patch, const size_t i0, const size_t i1, const size_t j0, const size_t j1) { .... for (size_t j = j0; j < j1; j++) { const float *pcol = &patch(i0 - i0, j - j0); // <= float *qcol = &us(i0, j); const size_t colbytes = (i1 - i0) * sizeof(*pcol); memcpy(qcol, pcol, colbytes); } .... } 

Because of the seals, the expression “i0-i0” is always zero. Perhaps they wanted to write “i1-i0” or “j - i1”, or something else. Developers need to check out this place.

V596 The object was not used. The 'throw' keyword could be missing: throw runtime_error (FOO); simplenetworkbuilder.cpp 1578
 template <class ElemType> ComputationNetworkPtr SimpleNetworkBuilder<ElemType>:: BuildNetworkFromDbnFile(const std::wstring& dbnModelFileName) { .... if (this->m_outputLayerSize >= 0) outputLayerSize = this->m_outputLayerSize; else if (m_layerSizes.size() > 0) m_layerSizes[m_layerSizes.size() - 1]; else std::runtime_error("Output layer size must be..."); // <= .... } 

The error is that the 'throw' keyword is accidentally forgotten. As a result, this code does not generate an exception in case of an error situation. Corrected code:
 .... else throw std::runtime_error("Output layer size must be..."); .... 

Work with files




V739 EOF should not be compared with a value of the 'char' type. The 'c' should be of the 'int' type. fileutil.cpp 852
 string fgetstring(FILE* f) { string res; for (;;) { char c = (char) fgetc(f); // <= if (c == EOF) // <= RuntimeError("error reading .... 0: %s", strerror(errno)); if (c == 0) break; res.push_back(c); } return res; } 

The analyzer has detected that the constant EOF is compared with a variable of the type 'char'. This indicates that some characters will be processed by the program incorrectly.

Consider how EOF is declared:
 #define EOF (-1) 

As you can see, EOF is nothing more than '-1' of the 'int' type. The fgetc () function returns values ​​of type 'int'. Namely - it can return a number from 0 to 255 or -1 (EOF). The read values ​​are placed in a variable of type 'char'. Because of this, a character with a value of 0xFF (255) turns into -1 and is interpreted just like the end of file (EOF).

Users using Extended ASCII Codes may encounter an error when one of the characters of their alphabet is incorrectly processed by the program.

For example, the last letter of the Russian alphabet in the Windows-1251 encoding just has the code 0xFF and will be interpreted by the program as the end-of-file character.

Corrected code snippet:
 int c = fgetc(f); if (c == EOF) RuntimeError(....); 

V547 Expression 'val [0] == 0xEF' is always false. The value range of char type: [-128, 127]. file.cpp 462
 bool File::IsUnicodeBOM(bool skip) { .... else if (m_options & fileOptionsText) { char val[3]; file.ReadString(val, 3); found = (val[0] == 0xEF && val[1] == 0xBB && val[2] == 0xBF); } // restore pointer if no BOM or we aren't skipping it if (!found || !skip) { SetPosition(pos); } .... } 

By default, the 'char' type has a range of [-127; 127]. Using the compilation flag / J, you can tell the compiler to use the range [0; 255]. But for this source file such a flag is not specified, therefore such code will never determine that the file contains a BOM .

Work with memory




V595 The m_rowIndices' pointer was used before it was verified against nullptr. Check lines: 171, 175. libsvmbinaryreader.cpp 171
 template <class ElemType> void SparseBinaryMatrix<ElemType>::ResizeArrays(size_t newNNz) { .... if (m_nnz > 0) { memcpy(rowIndices, m_rowIndices, sizeof(int32_t)....); // <= memcpy(values, this->m_values, sizeof(ElemType)....); // <= } if (m_rowIndices != nullptr) { // free(m_rowIndices); CUDAPageLockedMemAllocator::Free(this->m_rowIndices, ....); } if (this->m_values != nullptr) { // free(this->m_values); CUDAPageLockedMemAllocator::Free(this->m_values, ....); } .... } 

The analyzer has detected a potential null pointer dereference. If in the code there is a comparison of the pointer with zero, but this pointer is used without verification above the code, then this code is suspicious and dangerous.

The memcpy () functions copy the bytes located at the addresses 'm_rowIndices' and 'm_values', while dereferencing these pointers, and in the given code example, they can potentially be zero.

V510 The 'sprintf_s' function is not expected to receive class-type variable as the third actual argument. binaryfile.cpp 501
 const std::wstring& GetName() { return m_name; } Section* Section::ReadSection(....) { .... char message[256]; sprintf_s(message,"Invalid header in file %ls, in header %s\n", m_file->GetName(), section->GetName()); // <= RuntimeError(message); .... } 

Only POD types can act as actual parameters of the sprint_s () function. POD is an abbreviation of "Plain Old Data", which can be translated as "Simple data in the style of C".

"Std :: wstring" does not apply to POD types. Instead of a pointer to a row, the contents of the object will fall onto the stack. Such a code will lead to the formation of a "gibberish" in the buffer or to the emergency termination of the program.

Corrected version:
 sprintf_s(message,"Invalid header in file %ls, in header %s\n", m_file->GetName().c_str(), section->GetName().c_str()); 

V630 The 'malloc' function is to allocate memory. latticeforwardbackward.cpp 912
 void lattice::forwardbackwardalign() { .... aligninfo *refinfo; unsigned short *refalign; refinfo = (aligninfo *) malloc(sizeof(aligninfo) * 1); // <= refalign = (unsigned short *) malloc(sizeof(....) * framenum); array_ref<aligninfo> refunits(refinfo, 1); array_ref<unsigned short> refedgealignmentsj(....); .... } 

In this code snippet, incorrect allocation of dynamic memory for a structure of the “aligninfo” type was found. The fact is that constructors are present in the structure definition, and with this method of memory allocation the constructor will not be called. When freeing memory using the free () function, the destructor will also not be called.

Below is a code describing the type "aligninfo":
 struct aligninfo // phonetic alignment { unsigned int unit : 19; // triphone index unsigned int frames : 11; // duration in frames unsigned int unused : 1; // (for future use) unsigned int last : 1; // set for last entry aligninfo(size_t punit, size_t pframes) : unit((unsigned int) punit), frames((unsigned int) pframes), unused(0), last(0) { checkoverflow(unit, punit, "aligninfo::unit"); checkoverflow(frames, pframes, "aligninfo::frames"); } aligninfo() // [v-hansu] initialize to impossible values { #ifdef INITIAL_STRANGE unit = unsigned int(-1); frames = unsigned int(-1); unused = unsigned int(-1); last = unsigned int(-1); #endif } template <class IDMAP> void updateunit(const IDMAP& idmap /*[unit] -> new unit*/) { const size_t mappedunit = idmap[unit]; unit = (unsigned int) mappedunit; checkoverflow(unit, mappedunit, "aligninfo::unit"); } }; 

Corrected version:
 aligninfo *refinfo = new aligninfo(); 

And naturally, to free up memory, you need to call the 'delete' operator.

V599 The virtual destructor is not present, although the IDataWriter class contains virtual functions. datawriter.cpp 47
 IDataWriter<ElemType>* m_dataWriter; .... template <class ElemType> void DataWriter<ElemType>::Destroy() { delete m_dataWriter; // <= V599 warning m_dataWriter = NULL; } 

The analyzer message indicates the absence of a virtual destructor for the base type of the object being destroyed. In this case, the destruction of the object of the derived class will entail an undefined behavior of the program. In practice, this can lead to memory leaks and non-release of other resources. Let's try to understand the cause of this warning.
 template <class ElemType> class DATAWRITER_API IDataWriter { public: typedef std::string LabelType; typedef unsigned int LabelIdType; virtual void Init(....) = 0; virtual void Init(....) = 0; virtual void Destroy() = 0; virtual void GetSections(....) = 0; virtual bool SaveData(....) = 0; virtual void SaveMapping(....) = 0; }; 

This definition of the base class, as we see - it contains virtual functions, but there is no virtual destructor.
 m_dataWriter = new HTKMLFWriter<ElemType>(); 

Thus, memory is allocated for an object of the derived class "HTKMLFWriter". Find its description:
 template <class ElemType> class HTKMLFWriter : public IDataWriter<ElemType> { private: std::vector<size_t> outputDims; std::vector<std::vector<std::wstring>> outputFiles; std::vector<size_t> udims; std::map<std::wstring, size_t> outputNameToIdMap; std::map<std::wstring, size_t> outputNameToDimMap; std::map<std::wstring, size_t> outputNameToTypeMap; unsigned int sampPeriod; size_t outputFileIndex; void Save(std::wstring& outputFile, ....); ElemType* m_tempArray; size_t m_tempArraySize; .... }; 

Due to the missing virtual destructor in the base class, this object will not be correctly destroyed. Destructors will not be called for outputDims, outputFiles, and so on. In general, it is impossible to predict all the consequences, because “uncertain behavior” is not for nothing called.

Mistakes




V502 Perhaps the '?:' Operator was different. The '?:' Operator has a lower than the '|' operator. sequenceparser.h 338
 enum SequenceFlags { seqFlagNull = 0, seqFlagLineBreak = 1, // line break on the parsed line seqFlagEmptyLine = 2, // empty line seqFlagStartLabel = 4, seqFlagStopLabel = 8 }; long Parse(....) { .... // sequence state machine variables bool m_beginSequence; bool m_endSequence; .... if (seqPos) { SequencePosition sequencePos(numbers->size(), labels->size(), m_beginSequence ? seqFlagStartLabel : 0 | m_endSequence ? seqFlagStopLabel : 0 | seqFlagLineBreak); // add a sequence element to the list seqPos->push_back(sequencePos); sequencePositionLast = sequencePos; } // end of sequence determines record separation if (m_endSequence) recordCount = (long) labels->size(); .... } 

Priority of the ternary operator ':?' below the priority of bitwise OR '|'. Consider a fragment with an error separately:
 0 | m_endSequence ? seqFlagStopLabel : 0 | seqFlagLineBreak 

In the code, it was expected to perform bitwise operations only with the specified flags, but due to the unexpected order of execution of the operators, first “0 | m_endSequence ", and not" m_endSequence? seqFlagStopLabel: 0 | seqFlagLineBreak.

This is generally an interesting case. Despite the error, the code works correctly. Beaten OR with 0 has no effect. Nevertheless, it is better to correct the error.

Two more such places:
V530 basics.h 428
 // TODO: merge this with todouble(const char*) above static inline double todouble(const std::string& s) { s.size(); // just used to remove the unreferenced warning double value = 0.0; .... } 

There is no mistake in this place, as the comment left behind tells us, but I cited this example for a reason:

First, to disable the compiler warning, there is a UNREFERENCED_PARAMETER macro, the name of which clearly indicates that the function parameter is not used intentionally:
 #define UNREFERENCED_PARAMETER(P) (P) static inline double todouble(const std::string& s) { UNREFERENCED_PARAMETER(s); .... } 

Secondly, I wanted to lead to another analyzer warning, which most likely indicates an error:

V530 utterancesourcemulti.h 340
 template <class UTTREF> std::vector<shiftedvector<....>>getclassids(const UTTREF &uttref) { std::vector<shiftedvector<....>> allclassids; allclassids.empty(); // <= .... } 

It makes no sense not to use the result of the empty () function. Most likely they wanted to clear the vector using the function clear ().

Similar place:
V688 '_ _ _ local local variable variable variable variable ses ses ses sequencereader.cpp 552
 template <class ElemType> class SequenceReader : public IDataReader<ElemType> { protected: bool m_idx2clsRead; bool m_clsinfoRead; bool m_idx2probRead; std::wstring m_file; // <= .... } template <class ElemType> template <class ConfigRecordType> void SequenceReader<ElemType>::InitFromConfig(....) { .... std::wstring m_file = readerConfig(L"file"); // <= if (m_traceLevel > 0) { fprintf(stderr, "....", m_file.c_str()); } .... } 

Using variables of the same name in a class, in class functions and class parameters is a very bad programming style. Here in this example: declaring the variable “std :: wstring m_file = readerConfig (L" file ");" really should have been here or added temporarily for debugging, and then forgot to delete?

Developers need to check this and the following places:

Conclusion


Computational Network Toolkit (CNTK) - turned out to be a small but very interesting project to check. Since the CNTK project has only recently been opened, we are waiting for interesting solutions using it, as well as waiting for the opening of other Microsoft projects.


If you want to share this article with an English-speaking audience, then please use the link to the translation: Svyatoslav Razmyslov. “Why is there no artificial intelligence yet?” Or, analysis of the CNTK tool kit from Microsoft Research .

Read the article and have a question?
Often our articles are asked the same questions. We collected answers to them here: Answers to questions from readers of articles about PVS-Studio, version 2015 . Please review the list.

Source: https://habr.com/ru/post/276427/


All Articles