Our team wrote three notes on the code analysis of the Tizen operating system. The operating system contains a lot of code, so it is fertile ground for articles. I think we will return to Tizen in the future, but right now other interesting projects await us. So I will summarize some of the work done and answer a number of questions that arose after the previously published articles.
Work done
So, our team wrote three articles:
- Andrey Karpov. 27000 errors in the Tizen operating system. A foundational article demonstrating the importance of using static analysis in large projects. The PVS-Studio static analyzer showed perfectly how many different error patterns it can detect in C/C++ code.
- Andrey Karpov. Let's talk about micro-optimizations on the example of Tizen code. Using Tizen as an example, this article demonstrated which code micro-optimizations the PVS-Studio analyzer suggests. Old code does not need to be edited, but new code should definitely be written with these recommendations in mind.
- Sergey Khrenov. We continue to study Tizen: C# components were of high quality. Here the PVS-Studio analyzer could not prove itself. A failure. But this article shows that we are honest in our research: we managed to find many interesting errors in the C and C++ code, and we wrote about that; we failed to find errors in the C# code, and we wrote about that too.
Following the publications, two major discussions arose: the first on Reddit, the second on Hacker News. A few news posts appeared as well.
All this prompted me to cover a few additional topics and answer some of the questions raised in the discussions.
Everything must be rewritten in Rust
Recently, enthusiasts have become active, campaigning everywhere for the use of Rust. An especially vigorous surge of discussion on this topic followed the article "Rewrite the Linux kernel in Rust?".
These enthusiasts showed up in the comments to our articles as well. Their proposal: to avoid such errors, everything must be rewritten in Rust.
Honestly, it does not matter to me whether anything gets rewritten or not. There is so much C and C++ code in the world that the PVS-Studio analyzer will have something to check for another 50 years. After all, if static analyzers are still used for Cobol, then for C and C++ they will be in demand for many more decades.
And yet I cannot get around this topic. Are you seriously suggesting rewriting such projects in Rust? Just take 72 MLOC of code and rewrite it in Rust? That's crazy!
It would require an incredible amount of time and effort. Moreover, after many years of development you would end up with exactly what already existed! It would be much better to invest those man-years in creating something new within the existing project.
Someone will argue that after such a rewrite the code will become better and more reliable. There is no such guarantee at all. In large projects, the importance of the language is not that great. Besides, many C and C++ libraries have been fine-tuned over a long time, and when rewriting you would have to reinvent the wheel, and those reimplementations would delight users with fresh bugs for years to come.
I think that someone who suggests rewriting 72 MLOC of code is simply incompetent. A beginner can be forgiven for this, but if an experienced person says it, he is apparently trolling.
3.3% is a very small sample, and your error estimate is incorrect
Yes, this approach may give an inaccurate result. It would make sense to worry about that if we had checked 1,000, 3,000, or 10,000 lines of code. It would also be worth worrying if only one project, written by a single team, had been checked: in another project, the error density could be very different.
But let me remind you that I checked (using the PVS-Studio analyzer) 2,400,000 lines of C/C++ code. That is a lot! It is the size of some entire projects.
Moreover, code from different projects was checked. I picked the projects essentially at random. A perfectly honest approach. Here is the list of projects I studied:
alsa-lib-1.0.28, aspell-0.60.6.1, augeas-1.3.0, bind-9.11.0, efl-1.16.0, enlightenment-0.20.0, ise-engine-anthy-1.0.9, bluetooth-frwk-0.2.157, capi-appfw-application-0.5.5, capi-base-utils-3.0.0, capi-content-media-content-0.3.10, capi-maps-service-0.6.12, capi-media-audio-io-0.3.70, capi-media-codec-0.5.3, capi-media-image-util-0.1.15, capi-media-player-0.3.58, capi-media-screen-mirroring-0.1.78, capi-media-streamrecorder-0.0.10, capi-media-vision-0.3.24, capi-network-bluetooth-0.3.4, capi-network-http-0.0.23, cynara-0.14.10, e-mod-tizen-devicemgr-0.1.69, ise-engine-default-1.0.7, ise-engine-sunpinyin-1.0.10, ise-engine-tables-1.0.10, isf-3.0.186, org.tizen.app-selector-0.1.61, org.tizen.apps-0.3.1, org.tizen.bluetooth-0.1.2, org.tizen.browser-3.2.0, org.tizen.browser-profile_common-1.6.4, org.tizen.classic-watch-0.0.1, org.tizen.d2d-conv-setting-profile_mobile-1.0, org.tizen.d2d-conv-setting-profile_wearable-1.0, org.tizen.download-manager-0.3.21, org.tizen.download-manager-0.3.22, org.tizen.dpm-toolkit-0.1, org.tizen.elm-demo-tizen-common-0.1, org.tizen.indicator-0.2.53, org.tizen.inputdelegator-0.1.170518, org.tizen.menu-screen-1.2.5, org.tizen.myplace-1.0.1, org.tizen.privacy-setting-profile_mobile-1.0.0, org.tizen.privacy-setting-profile_wearable-1.0.0, org.tizen.quickpanel-0.8.0, org.tizen.screen-reader-0.0.8, org.tizen.service-plugin-sample-0.1.6, org.tizen.setting-1.0.1, org.tizen.settings-0.2, org.tizen.settings-adid-0.0.1, org.tizen.telephony-syspopup-0.1.6, org.tizen.voice-control-panel-0.1.1, org.tizen.voice-setting-0.0.1, org.tizen.volume-0.1.149, org.tizen.w-home-0.1.0, org.tizen.w-wifi-1.0.229, org.tizen.watch-setting-0.0.1, security-manager-1.2.17
It is unlikely that I was "lucky" enough to pick so many projects belonging to a single team. Obviously, different teams of specialists worked on them.
Therefore, we can assume that the obtained density of detected errors is close to the average for the rest of the project.
It's not as bad as you say.
After the publication of my article "27000 errors in the Tizen operating system", several ill-informed news posts appeared on the Internet claiming that a large number of vulnerabilities had been found in Tizen. For example, one could see headlines like "27,000 vulnerabilities found in the code of the Tizen operating system". This, of course, is not true. Let me explain why.
First, I wrote about errors, not vulnerabilities. Second, nowhere in the article did I say that Tizen's code is of low quality. Yes, the PVS-Studio analyzer reveals many errors, but any large project will contain many errors. Therefore, the total number of errors by itself says nothing about the quality of the code.
Now let's talk a little more about vulnerabilities. Among all the errors that occur in programs, some are security weaknesses. Their peculiarity is that, under a certain coincidence of circumstances, such an error can be exploited by an attacker. These types of errors are described in CWE, a community-developed list of common software security weaknesses.
In my article, I classified many of the errors according to CWE. However, this does not mean anything by itself. The fact is that such errors can rarely be exploited as vulnerabilities. In other words, a CWE very rarely turns into a CVE. More on the terminology can be found here.
Let me underscore once again that an error can be used as a vulnerability only very, very rarely. In the overwhelming majority of cases, an error is just an error: unpleasant for the program's users, but not a security problem.
Do 27,000 errors indicate good or bad code quality? Impossible to say. However, it is not a terrifying number, as it may seem at first glance. Note that the total amount of code is 72,500,000 lines of C and C++ (excluding comments). It follows that the PVS-Studio analyzer detects approximately 0.37 errors per 1,000 lines of code; in other words, about 1 error per 3,000 lines of code.
Note. Do not confuse this with the total number of errors in the Tizen code. This is how many we can detect, not how many are actually there. Please pay attention to this, as some have interpreted the data incorrectly.
So, PVS-Studio detects approximately 0.37 errors per 1000 lines of code. Is that a lot or a little? About average. Sometimes it is better, sometimes worse. Here are some examples:
- Notepad ++: we find about 2 errors per 1000 lines of code.
- Far Manager for Linux: we find about 0.46 errors per 1000 lines of code.
- Tor project: we find nothing at all; the density is 0.
Let's sum up. In fact, there is no sensation. The figure of 27,000 errors is shocking, but it is impressive only because of the sheer size of the Tizen project. Take another big project and it will also contain many errors.
The purpose of my article was to show that the PVS-Studio tool can be useful for the Tizen project. And it seems I succeeded. However, I did not expect the heated reaction and discussion that arose around the article. We regularly write such notes; they can be found here.
The article does not indicate the percentage of false positives.
I'll start from afar. Unfortunately, many readers skim articles very inattentively and quite often misinterpret the numbers given in them. I am well acquainted with this effect and try to account for it. For example, in the article about "27000 errors" I specifically wrote twice that I had found 900 errors after studying 3.3% of the code, while emphasizing that I was talking about errors, not the number of warnings issued by the analyzer.
And although I hedged, this comment still appeared:
"900 warnings from a Lint analog does not mean 900 bugs. I would even say these figures are not related at all. Surely there were findings about code formatting, variable scope, and so on. Into the trash with such analyzers."
The man did not read the article; he saw the number 900 and hurried to share his opinion with others.
It is for this reason that I do not write about the number of false positives. People see a number, and then for years they write in comments everywhere: "this is a bad analyzer with a false-positive rate of NN".
The thing is that the analyzer requires configuration, and the majority of false positives come from a small number of macros. I have already demonstrated in several articles how suppressing the warnings for a few macros dramatically reduces the number of false positives.
The same is true for Tizen. However, I am afraid that few people will pay attention to these explanations and examples, while everyone will remember a number that suggests a large percentage of false positives.
A logical question arises: then why not configure the static analyzer and show a good result right away?
I answer: it would take time, and I am still waiting for interesting projects like iOS or Android. But that is not the main reason I do not want to do it. The fact is that it is unclear where to stop. I know that with some effort we can reduce the number of false positives to zero, or nearly zero. For example, we reduced the false positives to zero when we worked on the Unreal Engine project (see articles 1, 2).
So if I use the settings to cut false positives down to a tiny percentage, I will be told that it is unfair. It turns out that, on the one hand, I want to leave as few false positives as possible; on the other hand, I must not overdo it and show a too-perfect result. I do not like this situation. I think it is better to do nothing at all here.
So how can a programmer tell whether our analyzer works well or badly? Very simple! Download it and check your working project. It will immediately be clear whether you like it, how many false positives there are, and of what kind. After that, perhaps, your company will also be happy to join our list of clients.
Only, please, do not make the mistake of trying the analyzer on small projects or on synthetic test examples. The reasons:
Update. I am adding this note after writing the article. Well, the readers won. I give up and will quote a number. I studied the EFL Core Libraries and calculated that the PVS-Studio static analyzer produces about 10-15% false positives. Here is the article about it: Characteristics of the PVS-Studio analyzer on the example of EFL Core Libraries.
-Wall -Wextra -Werror is enough
As always, there were comments saying that modern compilers themselves perform static code analysis well and additional tools are not needed.
But additional tools are needed. Static analyzers are specialized tools that always outpace compilers in their diagnostic capabilities; that is what their clients pay for.
However, besides words, I have evidence. Every time we check a compiler, we find errors in it:
It should also not be forgotten that static analysis is not only warnings; it is also infrastructure. Here are some PVS-Studio features:
- Convenient and easy integration with Visual Studio 2010-2017.
- Integration with SonarQube.
- BlameNotifier utility. The tool sends emails to developers about the errors PVS-Studio found during the nightly run.
- Mass Suppression - allows you to suppress all old messages so that the analyzer reports zero warnings. You can always return to the suppressed messages later. This lets you safely integrate PVS-Studio into an existing development process and focus only on errors in new code.
- Saving and loading analysis results: you can check the code at night, save the results, and load them in the morning for review.
- IncrediBuild support.
- Mark as False Alarm - markup in the code that disables a specific diagnostic for a specific code fragment.
- Interactive filtering of the analysis results (log) in the PVS-Studio window: by diagnostic code, by file name, by inclusion of the word in the diagnostic text.
- Error statistics in Excel - you can see the rate of fixing errors, the number of errors over time, etc.
- Automatic check for new versions of PVS-Studio (both when working in the IDE and during nightly builds).
- Using relative paths in report files to enable report transfer to another machine.
- CLMonitoring - checking projects that have no Visual Studio files (.sln/.vcxproj); if the CLMonitoring functionality is not enough for you, you can integrate PVS-Studio into any Makefile-based build system manually.
- pvs-studio-analyzer - a Linux analog of the CLMonitoring utility.
- Ability to exclude from the analysis files by name, folder or mask.
More information about all of this can be found in the documentation.
No price
Yes, there is no price on our site. This is standard practice for companies selling static code analysis solutions.
We position PVS-Studio as a B2B solution. When selling, there are many points to discuss with each company that affect the price of a license. Posting a specific price on the site makes no sense; it is more productive to start the discussion right away.
Why don't we work with individual developers?
We tried, but we did not succeed.
However, individual developers can use one of the free license options:
I invite company representatives to contact us by mail.
Not all the places you mention in the article are real errors.
Yes, on closer study some of the findings may turn out not to be errors. On the other hand, a careful analysis might show that I, on the contrary, missed some errors. For example, I was too lazy to study the V730 warnings (not all members of a class are initialized inside the constructor). Trying to work out in someone else's code whether it is an error that a class member was left uninitialized is very hard. But if you do, you will certainly find real errors.
Let's look at one such case. The code is from the org.tizen.browser-profile_common-1.6.4 project.
To begin, consider the definition of the BookmarkItem class:
class BookmarkItem
{
public:
    BookmarkItem();
    BookmarkItem(const std::string& url,
                 const std::string& title,
                 const std::string& note,
                 unsigned int dir = 0,
                 unsigned int id = 0);
    virtual ~BookmarkItem();

    void setAddress(const std::string& url) { m_url = url; };
    std::string getAddress() const { return m_url; };
    void setTitle(const std::string& title) { m_title = title; };
    std::string getTitle() const { return m_title; };
    void setNote(const std::string& note) { m_note = note; };
    std::string getNote() const { return m_note; };
    void setId(int id) { m_saved_id = id; };
    unsigned int getId() const { return m_saved_id; };
    ....
    ....
    bool is_folder(void) const { return m_is_folder; }
    bool is_editable(void) const { return m_is_editable; }
    void set_folder_flag(bool flag) { m_is_folder = flag; }
    void set_editable_flag(bool flag) { m_is_editable = flag; }

private:
    unsigned int m_saved_id;
    std::string m_url;
    std::string m_title;
    std::string m_note;
    std::shared_ptr<tizen_browser::tools::BrowserImage> m_thumbnail;
    std::shared_ptr<tizen_browser::tools::BrowserImage> m_favicon;
    unsigned int m_directory;
    std::vector<unsigned int> m_tags;
    bool m_is_folder;
    bool m_is_editable;
};
We are interested in the members m_is_folder and m_is_editable. Note that they are at the end of the class definition. I am ready to bet $10 that they were not in the first version of the class and appeared later, during development, and that when they were added, only one constructor was updated.
As a result, we have these two constructors:
BookmarkItem::BookmarkItem()
    : m_saved_id(0)
    , m_url()
    , m_title()
    , m_note()
    , m_thumbnail(std::make_shared<.....>())
    , m_favicon(std::make_shared<.....>())
    , m_directory(0)
    , m_is_folder(false)
    , m_is_editable(true)
{
}

BookmarkItem::BookmarkItem(const std::string& url,
                           const std::string& title,
                           const std::string& note,
                           unsigned int dir,
                           unsigned int id)
    : m_saved_id(id)
    , m_url(url)
    , m_title(title)
    , m_note(note)
    , m_directory(dir)
{
}
One constructor initializes the m_is_folder and m_is_editable members, and the other does not. I am not absolutely certain, but most likely this is an error.
The PVS-Studio analyzer issues the following warning for the second constructor: V730. Not all members of a class are initialized inside the constructor. Consider inspecting: m_is_folder, m_is_editable. BookmarkItem.cpp 268
By the way, the PVS-Studio analyzer can also search for 64-bit errors. Tizen is still 32-bit, so they are not relevant to it yet, but I want to devote a few words to this topic.
Strictly speaking, there are no special "64-bit errors". However, it makes sense to single out some errors into this category and consider them separately. The fact is that such errors do not manifest themselves in 32-bit builds of an application: they do not show up at all and cannot be found by any amount of testing.
Consider a simple example for clarity. Suppose we need to create an array of pointers, and the following erroneous code was written:
int **PtrArray = (int **)malloc(Count * sizeof(int));
Memory is allocated for an array of int, not for an array of pointers. The correct code would be:
int **PtrArray = (int **)malloc(Count * sizeof(int *));
In a 32-bit program, the error does not manifest itself: the size of a pointer matches the size of int, so a buffer of the correct size is allocated. Everything works correctly, and the problems begin only when we build the 64-bit version of the program.
Note. Of course, on some 64-bit systems the pointer size may also coincide with the size of int, or the sizes may differ on 32-bit systems. These are exotic cases not worth discussing here. The example above works correctly on all common 32-bit systems and fails on 64-bit ones.
On our site you can find a lot of interesting materials on 64-bit errors and ways to deal with them:
- Collection of examples of 64-bit errors in real programs
- C++11 and 64-bit errors
- Undefined behavior is closer than you think.
- Lessons learned from developing 64-bit C/C++ applications
Let's return to the Tizen project and take, for example, capi-media-vision-0.3.24. It exhibits an interesting kind of 64-bit error: the PVS-Studio analyzer issues 11 warnings for it with the code V204:
- V204 Explicit conversion from 32-bit integer type to pointer type. mv_testsuite_common.c 94
- V204 Explicit conversion from 32-bit integer type to pointer type. mv_video_helper.c 103
- V204 Explicit conversion from 32-bit integer type to pointer type. mv_video_helper.c 345
- V204 Explicit conversion from 32-bit integer type to pointer type. mv_mask_buffer.c 39
- V204 Explicit conversion from 32-bit integer type to pointer type. mv_surveillance.c 52
- V204 Explicit conversion from 32-bit integer type to pointer type. mv_surveillance.c 134
- V204 Explicit conversion from 32-bit integer type to pointer type. mv_surveillance.c 172
- V204 Explicit conversion from 32-bit integer type to pointer type. surveillance_test_suite.c 452
- V204 Explicit conversion from 32-bit integer type to pointer type: (unsigned char *) malloc (buf_size) surveillance_test_suite.c 668
- V204 Explicit conversion from 32-bit integer type to pointer type: (unsigned char *) malloc (buf_size) surveillance_test_suite.c 998
- V204 Explicit conversion from 32-bit integer type to pointer type: (unsigned char *) malloc (buf_size) surveillance_test_suite.c 1109
At first glance, these warnings are issued for completely harmless code like this:
*string = (char*)malloc(real_string_len * sizeof(char));
What is the reason? The fact is that the header file declaring the malloc function is not included. You can verify this by preprocessing the C files and examining the resulting .i files: malloc is used, but it is not declared. Since this is a C program, it compiles despite the missing declaration, and an undeclared function is assumed to take and return arguments of type int.
In other words, the compiler thinks the function is declared like this:
int malloc(int x);
Thanks to this, the 32-bit program compiles and works perfectly: the pointer fits into the int type, and everything is fine.
The program will compile in 64-bit mode too. It will even work, almost always. The key word is almost.
Everything is fine as long as memory is allocated in the lower part of the address space. However, during execution the lower part of the address space may become occupied or fragmented, and the memory manager will then return memory allocated outside the low addresses. At that point a crash occurs, because the upper bits of the pointer are truncated. I described what happens in more detail here: "A beautiful 64-bit error in the C language".
As a result, we have 11 defects that can lead to hard-to-reproduce program crashes. Very unpleasant errors.
Unfortunately, the PVS-Studio diagnostics for detecting 64-bit errors generate a lot of noise (false positives), and nothing can be done about that: such is their nature. The analyzer often does not know the range of particular values and cannot prove that the code will work correctly. But if you want a reliable and fast 64-bit application, you should work through all of these warnings. By the way, we can take this painstaking work upon ourselves and handle porting an application to a 64-bit system; we have experience in this area (see "How to transfer a project of 9 million lines of code to a 64-bit platform").
So if the Tizen developers want to make the system 64-bit, know that our team is ready to help.
Conclusion
Thank you all for your attention. Those interested in the PVS-Studio analyzer and its capabilities can watch a long presentation (47 minutes): PVS-Studio static code analyzer for C, C++ and C#.
I invite you to subscribe to stay informed about new publications:
If you want to share this article with an English-speaking audience, please use the link to the translation: Andrey Karpov. Tizen: Summing Up.