PVS-Studio vs Chromium

This time the victory was won by the good. Or rather, the source code of the project Chromium. Chromium is one of the best projects that we tested with PVS-Studio.

Chromium is an open source web browser developed by Google and designed to provide users with fast and secure Internet access. Chromium is based on the Google Chrome browser. At the same time, Chromium is a preliminary version of Google Chrome, as well as a number of other alternative web browsers.

From the point of view of a programmer, Chromium is a solution (solution) consisting of 473 projects. The total amount of C / C ++ source code is about 460 megabytes. The number of rows is difficult to calculate.

These 460 megabytes include a large number of different libraries. If you separate them, then there will be about 155 megabytes. Significantly less, but still a lot. Moreover, everything is relative. Many of these libraries are made by the developers of Chromium as part of the task of creating Chromium. Although such libraries live by themselves, they can be completely attributed to the browser itself.
')
Chromium became the largest and highest quality project I met during the PVS-Studio tests. When working with the Chromium project, it was actually not very clear who was checking whom. We found and fixed several errors in PVS-Studio related to the analysis of C ++ files and with the support of a specific project structure.

The quality of the source code Chromium says a lot of points and techniques used. For example, most programmers use the following construction to determine the number of elements in an array:

int XX[] = { 1, 2, 3, 4 }; size_t N = sizeof(XX) / sizeof(XX[0]);

This is usually made into the following macro:

 #define count_of(arg) (sizeof(arg) / sizeof(arg[0]))

This is a workable and useful macro. I myself, to be honest, always used exactly this macro. However, it can cause an error, since he can accidentally slip a simple pointer, and he will not object to this. Let me explain with an example:

 void Test(int C[3]) { int A[3]; int *B = Foo(); size_t x = count_of( A ); // Ok x = count_of( B ); // Error x = count_of( C ); // Error }

The count_of (A) construct works correctly and returns the number of elements in array A, equal to three.

But if you accidentally apply count_of () to a pointer, the result will be a meaningless value. The trouble is that the macro will not warn the programmer in any way about the strange construction of the type count_of (B). The size of the pointer is divided by the size of the array element. This situation seems far-fetched and artificial, but I met it in real applications. As an example, I will give the code from the Miranda IM project:

 #define SIZEOF(X) (sizeof(X)/sizeof(X[0])) int Cache_GetLineText(..., LPTSTR text, int text_size, ...) { ... tmi.printDateTime(pdnce->hTimeZone, _T("t"), text, SIZEOF(text), 0); ... }

So such errors are quite a place to be and it would be nice to be able to defend against them. It is even easier to make a mistake if you try to calculate the size of the array passed as an argument:

 void Test(int C[3]) { x = count_of( C ); // Error }

According to the C ++ standard of the language, the 'C' variable is an ordinary pointer, and not an array at all. As a result, in programs it is quite often possible to find processing only parts of the transmitted array.
If we are talking about such errors, I will give you a trick, how can you find out the size of the transmitted array. To do this, pass it to the link:

 void Test(int (&C)[3]) { x = count_of( C ); // Ok }

Now the result of the count_of (C) expression is 3.

Let's go back to Chromium. It uses a macro that avoids the errors described above. Here is its implementation:

 template <typename T, size_t N> char (&ArraySizeHelper(T (&array)[N]))[N]; #define arraysize(array) (sizeof(ArraySizeHelper(array)))

The idea of this magic spell is as follows. The template function ArraySizeHelper accepts an array of arbitrary type with length N as input. At the same time, the function returns an array of 'char' elements with length N. The function is not implemented because it is not needed. For the sizeof () operator, only the declaration of the ArraySizeHelper function is sufficient. In the macro 'arraysize', the size of the byte array returned by the ArraySizeHelper function is calculated. This size is the number of elements in the array, the length of which we want to calculate.

If you have a headache, you can take my word for it to work. And it works much better than the previously considered macro 'count_of ()'. Since the ArraySizeHelper function takes an array by reference, it is impossible to pass a simple pointer to it. Let's write a test code:

 template <typename T, size_t N> char (&ArraySizeHelper(T (&array)[N]))[N]; #define arraysize(array) (sizeof(ArraySizeHelper(array))) void Test(int C[3]) { int A[3]; int *B = Foo(); size_t x = arraysize( A ); // Ok x = arraysize( B ); //   x = arraysize( C ); //   }

The erroneous code simply will not be compiled. In my opinion, this is great if you can prevent a potential error at the compilation stage. This is a wonderful example that reflects the quality of the approach to programming. My respect for the developers at Google.

I will give one more example, of a completely different plan, but also speaking about the quality of the code.

 if (!file_util::Delete(db_name, false) && !file_util::Delete(db_name, false)) { // Try to delete twice. If we can't, fail. LOG(ERROR) << "unable to delete old TopSites file"; return false; }

This code may seem strange to many programmers. What is the point of trying to delete a file twice? And there is meaning. The one who wrote it, reached enlightenment and the essence of being programs. The file is uniquely deleted or not deleted only in textbooks and the abstract world. In a real system, it happens that the file just could not be deleted and a moment later - it is possible. The reason for this may be antiviruses, viruses, version control systems, and God knows what else. Programmers often do not think about such situations. They think so, since they did not delete the file, it means that it will not work. But if you want to do well and not litter in the directories, you need to take into account these extraneous effects. I came across exactly the same situation when a file is not deleted once per 1000 launches. And the decision was exactly the same. Well, except that I just in case Sleep (0) inserted in the middle.

But what about the check with PVS-Studio? The Chromium code is probably the highest quality code I've ever seen. This is confirmed by the very low density of errors that we could find. If to take quantitatively, then of course there are a lot of mistakes. But if you divide this number of errors by the amount of code, then it turns out that there are practically none. What are these errors? The most ordinary. A few examples:

 V512 A call of the 'memset' function will lead to underflow of the buffer '(exploded)'. platform time_win.cc 116

 void NaCl::Time::Explode(bool is_local, Exploded* exploded) const { ... ZeroMemory(exploded, sizeof(exploded)); ... }

Mistakes are done by everyone. It just forgot the star. It should have been: sizeof (* exploded).

 V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '-' operator. views custom_frame_view.cc 400

 static const int kClientEdgeThickness; int height() const; bool ShouldShowClientEdge() const; void CustomFrameView::PaintMaximizedFrameBorder(gfx::Canvas* canvas) { ... int edge_height = titlebar_bottom->height() - ShouldShowClientEdge() ? kClientEdgeThickness : 0; ... }

Insidious operator "?:" Has a lower priority than subtraction. Additional brackets are needed here:

 int edge_height = titlebar_bottom->height() - (ShouldShowClientEdge() ? kClientEdgeThickness : 0);

A check that does not make sense.

 V547 Expression 'count < 0' is always false. Unsigned type value is never < 0. ncdecode_tablegen ncdecode_tablegen.c 197

 static void CharAdvance(char** buffer, size_t* buffer_size, size_t count) { if (count < 0) { NaClFatal("Unable to advance buffer by count!"); } else { ... }

The condition "count <0" is always false. Protection will not work and potentially some kind of buffer may be full. This is, by the way, an example of how static analyzers can be used to search for vulnerabilities. An attacker can quickly identify for himself those portions of the code that contain errors for their further, more careful analysis. And here is another piece of code that might be interesting from a security point of view:

 V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (salt)' expression. common visitedlink_common.cc 84

 void MD5Update(MD5Context* context, const void* buf, size_t len); VisitedLinkCommon::Fingerprint VisitedLinkCommon::ComputeURLFingerprint( ... const uint8 salt[LINK_SALT_LENGTH]) { ... MD5Update(&ctx, salt, sizeof(salt)); ... }

The MD5Update () function will process as many bytes as the pointer takes. Is there a potential hole in terms of data encryption? I do not know if there is any danger in all this or not. But from the point of view of intruders - this is definitely an interesting place for a more detailed study.

The correct code should look like this:

 MD5Update(&ctx, salt, sizeof(salt[0]) * LINK_SALT_LENGTH);

Or like this:

 VisitedLinkCommon::Fingerprint VisitedLinkCommon::ComputeURLFingerprint( ... const uint8 (&salt)[LINK_SALT_LENGTH]) { ... MD5Update(&ctx, salt, sizeof(salt)); ... }

Another example about a typo:

 V501 There are identical sub-expressions 'host != buzz::XmlConstants::str_empty ()' to the left and to the right of the '&&' operator. chromoting_jingle_glue iq_request.cc 248

 void JingleInfoRequest::OnResponse(const buzz::XmlElement* stanza) { ... std::string host = server->Attr(buzz::QN_JINGLE_INFO_HOST); std::string port_str = server->Attr(buzz::QN_JINGLE_INFO_UDP); if (host != buzz::STR_EMPTY && host != buzz::STR_EMPTY) { ... }

In fact, you need to check the variable port_str:

 if (host != buzz::STR_EMPTY && port_str != buzz::STR_EMPTY) {

Some of the classics:

 V530 The return value of function 'empty' is required to be utilized. chrome_frame_npapi np_proxy_service.cc 293

 bool NpProxyService::GetProxyValueJSONString(std::string* output) { DCHECK(output); output->empty(); ... }

Should be: output-> clear ();

But there is even a job with a null pointer:

 V522 Dereferencing of the null pointer 'plugin_instance' might take place. Check the logical condition. chrome_frame_npapi chrome_frame_npapi.cc 517

 bool ChromeFrameNPAPI::Invoke(...) { ChromeFrameNPAPI* plugin_instance = ChromeFrameInstanceFromNPObject(header); if (!plugin_instance && (plugin_instance->automation_client_.get())) return false; ... }

Another example of validation that never works:

 V547 Expression 'current_idle_time < 0' is always false. Unsigned type value is never < 0. browser idle_win.cc 23

 IdleState CalculateIdleState(unsigned int idle_threshold) { ... DWORD current_idle_time = 0; ... // Will go -ve if we have been idle for a long time (2gb seconds). if (current_idle_time < 0) current_idle_time = INT_MAX; ... }

Perhaps we should stop. I can go on, but it gets boring. And this is only that which relates to Chromium itself. But there are still tests with errors like this:

 V554 Incorrect use of auto_ptr. The memory allocated with 'new []' will be cleaned using 'delete'. interactive_ui_tests accessibility_win_browsertest.cc 306

 void AccessibleChecker::CheckAccessibleChildren(IAccessible* parent) { ... auto_ptr<VARIANT> child_array(new VARIANT[child_count]); ... }

And a huge number of libraries on which Chromium is built. Moreover, the size of the libraries is much larger than the size of Chromium itself. And there are also many interesting places in the code. It is clear that perhaps the code with errors is not used anywhere, but the error from this does not cease to be an error. One first example (ICU library):

 V547 Expression '* string != 0 || * string != '_'' is always true. Probably the '&&' operator should be used here. icui18n ucol_sit.cpp 242

 U_CDECL_BEGIN static const char* U_CALLCONV _processVariableTop(...) { ... if(i == locElementCapacity && (*string != 0 || *string != '_')) { *status = U_BUFFER_OVERFLOW_ERROR; } ... }

The expression "(* string! = 0 || * string! = '_')" Is always true. Apparently it should be: (* string == 0 || * string == '_').

Conclusion

PVS-Studio was defeated. Chromium code is one of the best codes we have analyzed. We found almost nothing in Chromium. Rather, we found a lot of mistakes, and in this article we showed only a small part of them. But if you consider that all these errors are spread over the source code of 460 megabytes, then it turns out that there are practically none.

PS

I answer the question: will we let the Chromium developers know about the errors found? No, we will not inform. This is a very large amount of work that we cannot afford to do for free. Check Chromium is not a Miranda IM check or Ultimate Toolbox check at all . This is a big job, you need to carefully examine all the messages and find out whether this is really a mistake or not. For this you need to navigate the project. We will send the translation of this article to the developers of Chromium, and if they are interested, they will be able to carry out a project analysis and analyze all diagnostic messages. Yes, for this they will need to purchase PVS-Studio. But any department of Google can easily afford it.

Pps

No, we are not greedy. We are ready to help open projects like FlylinkDC ++. But these are different things.

Source: https://habr.com/ru/post/119815/

All Articles

PVS-Studio vs Chromium

Conclusion

More articles: