
PVS-Studio wanted to find bugs in robots.txt, but could not

Picture 1

The other day, Google published the source code of its robots.txt parser. Why not run PVS-Studio over a project that everyone has already tested inside and out and, perhaps, find an error? No sooner said than done. It is a pity that nothing significant turned up. Well, then let this simply be an occasion to praise the Google developers.

robots.txt is an index file that contains rules for search engine crawlers. It applies to the https, http and FTP protocols. Google has made its robots.txt parser available to everyone. You can read more about this news here: Google opens the source code of the robots.txt parser

I think most of our readers know what PVS-Studio does. But in case you are new to our blog, here is a brief summary. PVS-Studio is a static code analyzer that finds a variety of errors, vulnerabilities and shortcomings in projects written in C, C++, C# and Java. In other words, PVS-Studio is a SAST solution that can run on user machines, on build servers, and in the cloud. Moreover, the PVS-Studio team loves to write articles about checking various projects. So let's get down to business and try to find errors in the source code of Google's parser.

To our regret, and to everyone else's joy, no errors were found. We found only a couple of minor flaws, which we describe below. After all, we have to write about something :). The absence of errors is explained by the small size of the project and the high quality of the code itself. This does not mean that no errors are hiding there, but this time static analysis was powerless.

Overall, this article turned out in the spirit of another publication of ours, "The shortest article on checking nginx".

There was room for a small optimization:

V805 Decreased performance. It is inefficient to identify an empty string by using 'strlen(str) > 0' construct. A more efficient way is to check: str[0] != '\0'. robots.cc 354

bool RobotsTxtParser::GetKeyAndValueFrom(char **key, ....) {
  ....
  *key = line;
  ....
  if (strlen(*key) > 0) {
    ....
    return true;
  }
  return false;
}

Calling strlen just to find out whether a string is non-empty is inefficient, because strlen walks the entire string to compute a length that is then discarded. The check can be written much more simply as if ((*key)[0] != '\0'), which looks only at the first character regardless of how long the string is.
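To make the difference concrete, here is a minimal sketch of the two checks side by side. The helper names IsNonEmptyViaStrlen and IsNonEmptyViaFirstChar are hypothetical and do not come from the parser's source; they only isolate the condition that V805 points at.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not from robots.cc): computes the full length of the
// string only to compare it with zero.
bool IsNonEmptyViaStrlen(const char* s) {
  assert(s != nullptr);
  return std::strlen(s) > 0;
}

// Hypothetical helper (not from robots.cc): inspects only the first
// character, so the cost does not depend on the string length.
bool IsNonEmptyViaFirstChar(const char* s) {
  assert(s != nullptr);
  return s[0] != '\0';
}
```

Both functions return the same result for any valid C string. An optimizing compiler may well perform this rewrite on its own, so the gain is likely small in practice, which is why we call it a minor flaw rather than a bug.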

V808 'path' object of the 'basic_string' type was created but was not used. robots.cc 123

std::string GetPathParamsQuery(....) {
  std::string path;
  ....
}

The path string is declared but never used. In some cases, unused variables may indicate an error. Here, though, it looks as if the variable was used at some point and became unnecessary after later changes. In this way the analyzer often helps to make code cleaner, and removing such leftovers also removes the preconditions for future errors.

In the following case, the analyzer essentially recommends adding an explicit return at the end of main. It might be worth adding one at the very end to make it clear that everything really did succeed. However, if this behavior is intended, nothing needs to be changed, and you do not want to see the message, PVS-Studio lets you suppress this warning and never see it again :).

V591 The 'main' function doesn’t return a value, which is equivalent to 'return 0'. It is possible that this is an unintended behavior. robots_main.cc 99

int main(int argc, char** argv) {
  ....
  if (filename == "-h" || filename == "-help" || filename == "--help") {
    ShowHelp(argc, argv);
    return 0;
  }
  if (argc != 4) {
    ....
    return 1;
  }
  if (....) {
    ....
    return 1;
  }
  ....
  if (....) {
    std::cout << "...." << std::endl;
  }
}

The analyzer also found two functions with different names but identical implementations. Perhaps these functions once had different logic that later converged. Or a typo may have crept in somewhere, which is why such warnings should be checked carefully.

V524 It is odd that the body of 'MatchDisallow' function is fully equivalent to the body of 'MatchAllow' function. robots.cc 645

int MatchAllow(absl::string_view path, absl::string_view pattern) {
  return Matches(path, pattern) ? pattern.length() : -1;
}

int MatchDisallow(absl::string_view path, absl::string_view pattern) {
  return Matches(path, pattern) ? pattern.length() : -1;
}

This is the only place that makes me suspicious. The authors of the project should take a look at it.
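If the duplication turns out to be intentional, one way to make that explicit is to route both functions through a single helper, so a reader can see at a glance that the two rules share the same logic. The sketch below is only an illustration under stated assumptions: it uses std::string_view (C++17) instead of absl::string_view, the name MatchPriority is hypothetical, and Matches is a simplified prefix-only stand-in for the parser's real matching function.

```cpp
#include <string_view>

// Stand-in for the parser's Matches(): a plain prefix test. This is an
// assumption for illustration, NOT the real wildcard-aware implementation.
bool Matches(std::string_view path, std::string_view pattern) {
  return path.substr(0, pattern.size()) == pattern;
}

// Hypothetical shared helper: the pattern length on a match, -1 otherwise.
int MatchPriority(std::string_view path, std::string_view pattern) {
  return Matches(path, pattern) ? static_cast<int>(pattern.length()) : -1;
}

// Both rules delegate to the same helper, which documents that the
// identical behavior is deliberate rather than a copy-paste slip.
int MatchAllow(std::string_view path, std::string_view pattern) {
  return MatchPriority(path, pattern);
}

int MatchDisallow(std::string_view path, std::string_view pattern) {
  return MatchPriority(path, pattern);
}
```

Whether such a refactoring is appropriate depends on what the authors intended; if the two functions are meant to diverge, the warning points at a place where one of them still needs its own logic.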

So, checking Google's robots.txt parser showed that this project, which is so widely used and has most likely been checked for errors many times, has high-quality code. The flaws we found cannot spoil the impression of how skilled the Google developers who worked on it are :).

We invite you to download PVS-Studio and try it on a project of your choice.



If you want to share this article with an English-speaking audience, then please use the link to the translation: Victoria Khanieva. PVS-Studio couldn’t find bugs in robots.txt

Source: https://habr.com/ru/post/459662/

