📜 ⬆️ ⬇️

Unicorn in search of extraterrestrial intelligence: an analysis of the SETI @ home code

There are two possibilities: either we are alone in the universe, or not. Both are equally terrible. (c) Arthur Charles Clark

Discussions about whether we are alone in the Universe excite people’s minds for more than a dozen years. The SETI project dealing with the search for extraterrestrial civilizations and the possibility of entering into contact with them has a serious attitude to this issue. The analysis of the source code of one of the projects of this program, SETI @ home, will be discussed in this article.

More about the project




SETI @ home is a scientific non-commercial voluntary computing project that uses free resources on volunteer computers to search for radio signals from extraterrestrial civilizations. The project is based on an open software platform for distributed computing - BOINC , written in C ++.
')
A static analyzer of C / C ++ code - PVS-Studio was used as an analysis tool. The source code for the SETI @ home project is available on the official website. Instructions on how to build a project are also not difficult to find, so, having collected everything you need and preparing a cup of coffee, I got down to business.

Test results


I admit that before analyzing the project, I was in anticipation of how many problem areas could be detected. But to my surprise, really interesting code fragments (problem) turned out to be not very much, which speaks of its quality.

However, there were suspicious places, and some of them I would like to consider.

To warm up


The code examples in this section cannot be brought under any one category, such as “pointers” or “cycles”, as they have different themes, but they are interesting in their own way.

Therefore, I propose to move closer to the point:

struct SETI_WU_INFO : public track_mem<SETI_WU_INFO> { .... int splitter_version; .... }; SETI_WU_INFO::SETI_WU_INFO(const workunit &w):.... { .... splitter_version=(int)floor(w.group_info-> splitter_cfg->version)*0x100; splitter_version+=(int)((w.group_info->splitter_cfg->version)*0x100) && 0xff; .... } 

Analyzer warning: V560 A part of conditional expression is always true: 0xff. seti_header.cpp 96

Suspicious is the '&&' operator, which is used to get an integer value. Perhaps, in this case, the '&' operator was necessary, since otherwise the variable 'splitter_version' will always take one of two values: 0 or 1.

Of course, there is a probability that the addition to 'splitter_version' is 0 or 1, but I think you will agree that it is not very large, and in this case you could use more understandable code (for example, the ternary operator).

The next suspicious code snippet is the methods that should return a value but, nevertheless, do not return anything. Moreover, they have empty bodies. Such a code at least looks suspicious. I propose to take a look for yourself:

 struct float4 { .... inline float4 rsqrt() const { } inline float4 sqrt() const { } inline float4 recip() const { } .... }; 

Analyzer Warnings:As can be seen from this fragment, none of the methods returns anything. I specifically wrote out this particular section of code separately, and was somewhat surprised to see that the compilation was successful. There were no compiler warnings either. But only until these methods are invoked. Attempting to do this still produces a compilation error.

What it was: a reserve for the future or a mistake is difficult to say, since there were no comments on this code. Just keep in mind what I wrote above.

But let's not dwell on this example, it’s better to see what else was found.

 template <typename T> std::vector<T> xml_decode_field(const std::string &input, ....) { .... std::string::size_type start,endt,enc,len; .... if ((len=input.find("length=",start)!=std::string::npos)) length=atoi(&(input.c_str()[len+strlen("length=")])); .... } 

Analyzer warning: Consider reviewing the expression of the A = B! = C 'kind. The expression is calculated as the following: 'A = (B! = C)'. xml_util.h 891

As is clear from the code, during the parsing of the input data, it was necessary to obtain the length value (the 'length' variable).

What was meant? The string searches for the substring “length =”; if it is found, the index of the beginning of the substring is written to the variable 'len'. After that, the original string is converted to a C-string, from which the required length value is extracted using the indexing operator. As a calculation of the index of the character storing the length value, the substring index “length =” and its length are used.

However, due to the priority of operations (or incorrectly placed brackets in the condition, it is clear that they are duplicated) everything goes wrong. First, a comparison with the value of 'npos' will be performed, and the result of this comparison (0 or 1) will be written into the variable 'len', which will lead to an incorrect calculation of the array index.

While viewing the analysis log, I came across a couple of interesting macros. I propose to take a look and you:

 #define FORCE_FRAME_POINTER (0) #define SETIERROR( err, errmsg ) do { \ FORCE_FRAME_POINTER; \ throw seti_error( err, __FILE__, __LINE__, errmsg ); \ } while (0) 

Analyzer Warning: V606 Ownerless token '0'. analyzefuncs.cpp 212

Just want to say that this macro has been met several times in the course of the code. It is not clear why not just generate an exception. Instead, the code contains an incomprehensible token and there is a loop for which only one iteration is performed. The approach is interesting, but why such a bike is unclear.

Pointers and memory management


For a change, sample code with pointers. As a rule, in code fragments containing work with pointers or addresses, the probability of stepping on a rake increases in order. Therefore, they cause more interest.

 size_t GenChirpFftPairs(....) { .... double * ChirpSteps; .... ChirpSteps = (double *)calloc(swi.num_fft_lengths, sizeof(double)); .... CRate+=ChirpSteps[j]; .... if (ChirpSteps) free (ChirpSteps); .... } 

Analyzer Warning: V595 The ChirpSteps' pointer was used against nullptr. Check lines: 138, 166. chirpfft.cpp 138

The analyzer warns that the pointer is used before the check is performed, on whether it is zero. If the memory cannot be allocated and the function 'calloc' returns the value 'NULL', a null pointer dereference will be performed, which, as we all know very well, is not very good.

Another point is that the function 'free' is called only if the pointer is not equal to 'NULL'. This check is redundant since the free function handles null pointers without any problems.

Another piece of code with suspicious use of the 'memset' function. Let's get a look:

 int ReportTripletEvent(....) { .... static int * inv; if (!inv) inv = (int*)calloc_a(swi.analysis_cfg.triplet_pot_length, sizeof(int), MEM_ALIGN); memset(inv, -1, sizeof(inv)); for (i=0;i<swi.analysis_cfg.triplet_pot_length;i++) { j = (i*pot_len)/swi.analysis_cfg.triplet_pot_length; if (inv[j] < 0) inv[j] = i; .... } .... } 

Analyzer Warning: V579 It is possibly a mistake. Inspect the third argument. analyzereport.cpp 271

From this code fragment, it is clear that the memory is first allocated for the array, after which its elements are filled with the value '-1', and after that work is done with them. But in the 'memset' function, the third parameter is passed not the size of the array, but the size of the pointer. To correctly fill the array with the required characters, the third argument should be the buffer size.

Cycles


In many projects, there are cycles, the body of which either runs infinitely or not at all. SETI @ home is no exception. On the other hand, the consequences here are not as critical as in some other projects.

 std::string hotpix::update_format() const { std::ostringstream rv(""); for (int i=2;i<2;i++) rv << "?,"; rv << "?"; return rv.str(); } 

Analyzer Warning: Consider Inspecting the 'for' operator. It is possible that you will not be executed at all. schema_master.cpp 9535

The error is very trivial. As we all know, the body of the 'for' loop is executed as long as its conditional expression is true. Here, on the first iteration, the condition will be false, so that the loop will be immediately exited. Personally, I cannot understand what was meant here, but nevertheless the body of this cycle will never be executed.

A similar code snippet met again, but in a different method of another class:

V621 Consider inspecting the 'for' operator. It is possible that you will not be executed at all. schema_master.cpp 11633

Not so transparent, but potentially erroneous example:

 template <typename T> std::istream &operator >>(std::istream &i, sqlblob<T> &b) { .... while (!i.eof()) { i >> tmp; buf+=(tmp+' '); } .... } 

Analyzer Warning: V663 Infinite loop is possible. The 'cin.eof ()' condition is not a loop from the loop. Consider adding the 'cin.fail ()' function call to the conditional expression. sqlblob.h 58

Since we are considering cycles, it is easy to guess that the error is in the condition for exiting the 'while' loop. Although many probably will not find anything suspicious, since the method used here looks quite standard. But there is an underwater rock, otherwise this article would not exist in the article.

The fact is that in the event of a failure to read the data, such a check will not be enough. In this case, the 'eof ()' method will constantly return 'false', as a result - an infinite loop.

To correct the error, add an additional condition. Then the loop will look like this:

 while(!i.eof() && !i.fail()) { //do something } 

Other suspicious places


You need to be careful with bit operations. During the analysis, a code segment was encountered, leading to undefined behavior:
 int seti_analyze (ANALYSIS_STATE& state) { .... int last_chirp_ind = - 1 << 20, chirprateind; .... } 

Analyzer warning: V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. analyzefuncs.cpp 177

As can be seen from the code, the variable is initialized with the value obtained as a result of the bit shift. And everything would be fine, but the left operand is negative, and according to the C ++ 11 standard, this operation leads to undefined behavior.

There is a double-edged sword. On the one hand, a similar code has been used for a long time and repeatedly, on the other hand, the standard still states that this code leads to undefined behavior.

The final decision remains for the programmer, but pay attention to it.

The code was repeatedly encountered, where different values ​​were assigned twice to the same variable, and no other variable operations were performed between these assignments. One example of such a code:

 int checkpoint(BOOLEAN force_checkpoint) { int retval=0, i, l=xml_indent_level; .... retval = (int)state_file.write(str.c_str(), str.size(), 1); // ancillary data retval = state_file.printf( "<bs_score>%f</bs_score>\n" "<bs_bin>%d</bs_bin>\n" "<bs_fft_ind>%d</bs_fft_ind>\n", best_spike->score, best_spike->bin, best_spike->fft_ind); .... } 

Analyzer Warning: V519 The 'retval' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 450, 452. seti.cpp 452

In this situation, it is difficult to say what was meant or how it should be corrected. But the programmer who wrote the code may understand the reason for this use of the variable. We can only wonder and speculate.

Four more similar code sections were met. Relevant analyzer messages:
Perhaps these variables were used simply to view the values ​​that the functions return when debugging code. Then there is nothing dangerous and these warnings can be ignored or suppressed in one of the many ways that the PVS-Studio analyzer provides.

Finally, I will give an example where the 'strlen' function is somewhat irrational:
 int parse_state_file(ANALYSIS_STATE& as) { .... while(fgets(p, sizeof(buf)-(int)strlen(buf), state_file)) { if (xml_match_tag(buf, "</bt_pot_min")) break; p += strlen(p); } .... } 

Analyzer Warning: V814 Decreased performance. Calls to the 'strlen' function for the loop's continuation was calculated. seti.cpp 770

Since the buffer (variable 'buf') does not change during the execution of the loop, there is no need to calculate its length at each iteration. Perhaps it would be more expedient to create for this a separate variable with which to make a comparison. This is not so noticeable with small buffer sizes, but with large ones, when the number of iterations is significantly larger, it can be much more noticeable.

This code has been met several times, here are some more similar messages:

What else could I find?


There were other warnings of the analyzer, but I found the code fragments are not so interesting as to write them out and analyze them separately. You can just read this section and find out what else was encountered during the test.

For example, there came across "hanging" arrays, which are declared but not used at all. Fortunately, their size was fixed and small. However, the stack memory was allocated for them, which is inexpedient.

Also, pointer dereference has been encountered several times, followed by its increase (* p ++). At the same time, the value stored in the index was not used in any way. From the examples it was clear that the change was simply to the pointer itself, but for some reason it was still dereferenced. This code is potentially erroneous, as it can sometimes be implied to change the value stored in the pointer, and not its own. Therefore, to treat with distrust these warnings are not worth it.

The function 'fprintf' was repeatedly encountered, the format string of which did not match the actual arguments passed to the function. This leads to indefinite behavior and may cause, for example, displaying nonsense on the screen.

Conclusion


After checking, I left a double impression. On the one hand, I was somewhat upset that there were fewer errors in the code than expected, therefore, there was less material for the article. On the other hand, the project was verified, and it was interesting. A small number of errors indicates the quality of the code, which is also good.

What else to add? Install the SETI @ home client - help in the search for extraterrestrial civilizations, install PVS-Studio - it will help you in finding errors in the source codes written in C / C ++.

This article is in English.


If you want to share this article with an English-speaking audience, then please use the link to the translation: Sergey Vasiliev. A Unicorn Seeking Extraterrestrial Life: Analyzing SETI @ home's Source Code .

Read the article and have a question?
Often our articles are asked the same questions. We collected answers to them here: Answers to questions from readers of articles about PVS-Studio, version 2015 . Please review the list.

Source: https://habr.com/ru/post/259151/


All Articles