
Notes on the article "How to C in 2016"


If assembler were a weapon, it would look something like this - but with C, too, you need to be extremely careful

From the translator:
This publication is a translation of a response to the article "How to C in 2016". I published a translation of the latter on Friday, and in places it drew a mixed reaction from the community. The user CodeRush pointed me to this "answer", keeping the discussion going here on Habr, for which he has my special thanks.

An article, "How to C in 2016", was published earlier with many useful tips - among which, alas, were some not very good ideas. That is why I decided to comment on the relevant points. While I was preparing this material, someone remarked that only responsible programmers should take on work in C, while the irresponsible should stick to languages that are more forgiving of existing habits. Let's look into the secrets of the experts.

Use a debugger


Point number 1, which you probably ignore - but in vain - is to run every line of code under a kernel-level debugger right at the stage of writing it. If you use this tool only to solve particularly complex problems, you are definitely making a mistake.
I mean using IDEs like Visual Studio, Xcode, or Eclipse. If you work only in an editor (without debugging capabilities), you are not doing your job well. I mention this nuance because so many people write code in editors that have no debugger. And I am no exception.

This is important when programming in any language, but especially in C. If memory is corrupted, you will need to dump the structures and the contents of memory to find the error. Why is X suddenly some strange 37653? A printf() debugging statement will not clarify the situation, but looking at a hex dump of the stack, you will understand exactly how that data was copied there.

Do not forget to debug your code


Since C provides no memory protection, an error made in one place may surface in another, even if that part of the code has nothing to do with the damaged one. That is why some problem areas are so difficult to debug. In such cases, many programmers tear their hair out, shout "I can't fix it", and beg colleagues for help.

Do not step on the same rake twice. After facing a couple of these "intractable" problems, you will learn to write better code: code that checks itself and surfaces bugs quickly, or code developed alongside effective verification utilities that exercise whole blocks of data, covering the edge cases.

Protect code actively


I once worked on a project whose leaders decided to put catch (...) (in C++) everywhere to keep the program from crashing. Exceptions, and even cases of memory corruption, were simply masked, and the program kept running. The developers thought they were reducing the code's vulnerability; they thought this defensive programming style was a great solution.

But this is not defense, it is stupidity. What is the logic in hiding errors that will be much harder to find later?

Do it differently. What you need is an attacking posture: code that exposes its shortcomings as early as possible.

One way to make this approach a reality is assert() - double-checking your assumptions, which ensures you really are doing everything correctly. It catches bugs before they corrupt memory. I'm serious: to debug inconspicuous errors in C, I simply insert assert() wherever things could go wrong and cause a crash (just don't get carried away with it).
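A minimal sketch of the idea (the buffer and its invariants here are hypothetical, not from the original text): assert() the assumptions at the top of the function, so a bad length crashes immediately instead of silently corrupting memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical buffer-append helper: the asserts state the invariants
 * up front, so a bad length fails here, not three functions later. */
size_t buf_write(char *buf, size_t used, const char *src, size_t len) {
    assert(buf != NULL && src != NULL);   /* no NULL pointers          */
    assert(used <= BUF_SIZE);             /* caller's bookkeeping sane */
    assert(len <= BUF_SIZE - used);       /* the write cannot overflow */
    memcpy(buf + used, src, len);
    return used + len;
}
```

In a release build you can compile the checks away with -DNDEBUG, but during development every call is verified.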

When it comes to code, the best way to "attack in advance" is unit testing. As soon as you have doubts, write unit tests that exercise the ambiguous parameters. In C, it is easy to lose your way when everything goes according to plan at first and oddities only emerge later - for such cases it is also a good idea to have suitable tests in stock.

The code must be of high quality


Here, things like unit testing, regression testing, and even fuzzing are gradually becoming the norm. For an open source project, running make test should be enough to perform a thorough check, executing unit tests with high code coverage. This approach is considered standard for large open source projects. I'm not kidding: unit testing with high code coverage should be the starting point of every new project. You will see this in all my serious open source work, where I start writing unit tests at the earliest stage, one by one (though I am lazy, so I cannot boast of high coverage).

Fuzzing with AFL is a relatively new phenomenon, but once you try this tool you will see how effective it is at finding bugs in open source projects. C breaks all records when it comes to parsing input from the outside. Programs used to crash routinely on poorly formatted files or malformed network packets. But in 2016, no one will tolerate such nonsense. If you are not sure the program will work correctly regardless of its input, you have made a mistake somewhere.

And, if it seems to you that someone else should take care of quality, you are again mistaken.

Forget about global variables


When I work on open source C/C++ projects, it is global variables that make my life miserable. Debugging the project and making it multithreaded is hard because you have bred a whole colony of globals. Minimize their use, and refactoring the code will become far easier.

Yes, for something like a debug/logging status system you may genuinely need global variables; in all other cases, just forget they exist.


A bit of OOP, a dash of functional programming, and some Java


There is an old joke, "I can write FORTRAN in any language", about programmers who bend whatever language they are given back to their familiar tools. It is like saying that bad programmers stay bad no matter which language they pick up. And yet sometimes universal ideas from other programming traditions do carry over just fine.

From object-oriented programming we are interested in how a structure ties together data and the methods needed to process it. For a struct Foobar, you create a family of functions named foo_xxxx(): at your service are the constructor foo_create(), the destructor foo_destroy(), and a sea of functions that operate on the structure.

Most important of all: define struct Foobar in the C file, not in the header. Let the functions be public, but hide the exact layout of the structure. As a rule, code holds only pointers to such structures, which matters especially for libraries: exporting the layout in a header ties you to binary interface (ABI) compatibility, since the structure's size can change. If you must export a structure, put a version or size field as its first member.
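A sketch of what this looks like in practice (header and source are combined into one listing for brevity; foo_get_value() is an assumed accessor, not from the original text):

```c
/* foobar.h - the public interface exposes only an opaque pointer. */
#include <stdlib.h>

struct Foobar;                          /* forward declaration only */
struct Foobar *foo_create(int value);   /* constructor              */
int  foo_get_value(const struct Foobar *f);
void foo_destroy(struct Foobar *f);     /* destructor               */

/* foobar.c - the layout stays private, so it can change later
 * without breaking callers that only ever hold the pointer. */
struct Foobar {
    int value;
};

struct Foobar *foo_create(int value) {
    struct Foobar *f = malloc(sizeof *f);
    if (f)
        f->value = value;
    return f;
}

int foo_get_value(const struct Foobar *f) { return f->value; }

void foo_destroy(struct Foobar *f) { free(f); }
```

Callers can only go through foo_xxxx(), never poke at the fields directly.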

No, I am not going to go deeper into OOP, into inheritance and polymorphism. On the contrary, I just want to note the advantages of modular programming, which are, in essence, similar to those of OOP.

Functional programming, too, has a few good ideas nobody has repealed - above all, functions without "side effects": functions that take input data and produce output, and nothing else. No improvisation. Most of the functions you write should look like this. If you find yourself composing something like void foobar(void); - expect trouble.

Global variables appear again on the list of things that make life harder, and so do hidden state and the calls that hurt performance. Globals are, in essence, like variables buried deep inside a structure that you pass as int foobar(struct Xyz *p); - you end up digging through the depths of p to find the parameters you need. It is simpler when everything lies on the surface and the call reads foobar(p->length, p->socket->status, p->bbb). Yes, you then deal with long, annoying parameter lists, but foobar() now depends on simple types instead of a complex structure.

A related, partly functional technique is constness: pointers marked const, which means the function cannot modify them. It is then clear where the results go (the return value and the non-const pointers) and where the input data is.
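For example, a side-effect-free helper in this style might look as follows (count_byte() is a hypothetical function, used only to illustrate the pattern):

```c
#include <stddef.h>

/* "Functional style" in C: the input is const, the only output is the
 * return value, and no global or hidden state is touched. */
size_t count_byte(const unsigned char *data, size_t len,
                  unsigned char needle) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == needle)
            n++;
    return n;
}
```

A reader sees at a glance that data is input only and the count is the sole result.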

C is a low-level systems programming language, but outside of the genuinely low-level cases, try not to abuse that. Instead of C-specific tricks, use constructs that keep your code viable in a wider world whose conditions merely resemble C - so it can sit comfortably alongside JavaScript, Java, C#, and so on.

We will have to do without pointer arithmetic. Yes, in the 1980s it made code noticeably faster, but since the 1990s it brings no benefit, especially with modern optimizing compilers. Pointer arithmetic only hurts readability. Almost every time an open source project falls victim to an attack (Heartbleed, Shellshock, and so on), look for the cause in code doing pointer arithmetic. Use integer index variables over the corresponding data arrays instead, as if you were writing in Java.
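To make the difference concrete, here is the same loop written both ways (sum_ptr() and sum_idx() are illustrative names):

```c
#include <stddef.h>

/* Pointer-arithmetic version: the bounds logic is implicit in the
 * moving pointer, which is harder to audit. */
int sum_ptr(const int *p, size_t n) {
    int total = 0;
    const int *end = p + n;
    while (p < end)
        total += *p++;
    return total;
}

/* Index version: the bound check `i < n` is explicit, exactly as it
 * would read in Java; modern compilers emit equivalent machine code. */
int sum_idx(const int *a, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```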

This ideal approach also means not parsing network protocols or file formats by casting the bytes onto structures or integers. Yes, the network programming manuals advise something like ntohs(*(short *)p), but that advice was dubious when the books were written and is completely out of place today. Parse integers as you would in Java: p[0] * 256 + p[1]. You may think that overlaying a packed structure on the raw bytes makes parsing easier - it does not.
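A sketch of byte-by-byte parsing (read_be16() and read_be32() are hypothetical helpers following the p[0] * 256 + p[1] recipe): unlike the cast, this has no alignment or strict-aliasing hazards and works identically on any platform.

```c
#include <stdint.h>

/* Read a big-endian 16-bit value byte by byte - the safe replacement
 * for ntohs(*(short *)p). */
uint16_t read_be16(const unsigned char *p) {
    return (uint16_t)(p[0] << 8 | p[1]);
}

/* Same idea for a big-endian 32-bit value. */
uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```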

Block unsafe functions


Stop using the outdated strcpy() and sprintf() functions. If I find vulnerabilities, they will most likely be right here. Worse, such code is far more expensive to audit, because every one of those calls has to be inspected to make sure the buffer cannot overflow. Perhaps you are sure the buffer is fine, but I will still have to check it, long and tediously. Use strlcpy()/strcpy_s() and snprintf()/sprintf_s() instead.
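A small sketch of the safer pattern (format_greeting() is a made-up example): snprintf() takes the destination size and always NUL-terminates, truncating rather than overflowing the buffer as sprintf() would.

```c
#include <stdio.h>

/* snprintf() is told how big dst is; on a too-small buffer it truncates
 * and still NUL-terminates, returning the length it *wanted* to write. */
int format_greeting(char *dst, size_t dstlen, const char *name) {
    return snprintf(dst, dstlen, "hello, %s", name);
}
```

Checking the return value against the buffer size tells you whether truncation occurred.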

In general, you really need to understand what buffer overflows and integer overflows are. Look at how reallocarray() is used on OpenBSD, understand why it solves the integer-overflow problem when sizing allocations, and then use it in all your code instead of bare malloc(). If necessary, copy the original reallocarray() from OpenBSD and stick to this function in your programs.
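For reference, a simplified sketch modeled on OpenBSD's reallocarray() (renamed my_reallocarray() here to avoid clashing with the system function on platforms that already provide it; the MUL_NO_OVERFLOW shortcut is the same trick the OpenBSD source uses):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* If both operands are below 2^(bits/2), the product cannot overflow,
 * so the division check is only needed for large values. */
#define MUL_NO_OVERFLOW ((size_t)1 << (sizeof(size_t) * 4))

void *my_reallocarray(void *optr, size_t nmemb, size_t size) {
    if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
        nmemb > 0 && SIZE_MAX / nmemb < size) {
        errno = ENOMEM;   /* nmemb * size would wrap around */
        return NULL;
    }
    return realloc(optr, size * nmemb);
}
```

With plain malloc(n * sizeof(T)), a huge n silently wraps and under-allocates; here the same call fails cleanly.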

Do you know why your code suddenly fails on certain input? Maybe that input is hostile. And, by the way, why do attackers manage to break into your code at all? If you do everything correctly, you will not have to worry about such problems.

The article "How to C in 2016" suggested using calloc() everywhere. Do not rush to follow that advice: on many platforms you will still run into integer overflow. Instead, get used to functions like realloc() - and, along with it, reallocarray().

There are many rules for writing safe code, but if you do what I have said, you will solve most of the problems in this area. And yes, be skeptical of any input data, even a local file or a USB port that seems to be under your control.

Down with the weird code


What every software development company is missing is after-work meetings where anyone can drop in and voice proposals for a unified style for the code they write - and then you simply fire everyone who shows off. Sounds silly.

The only thing that deserves to be called the right "style" is code that looks like its peers on the Internet. This applies both to your personal code and to the open source projects you usually work on. Just pick one of the existing well-known styles, such as those used by Linux, BSD, WebKit, or GNU.

One advantage of other programming languages, especially Python, is the very small number of generally accepted styles - which cannot be said of C. For example, the analysis of the Heartbleed vulnerability revealed that OpenSSL uses Whitesmiths braces - a style that was once common but is now rare and looks weird. LibreSSL converted it to the BSD style. A very good decision: if your C style is too ornate or outdated, it may be time to switch to something common and familiar.

Are you sure everyone will adopt the same cool tricks you use, as soon as they see how good they look in your code? No - get rid of them; they only annoy people. Or, if such constructs are genuinely vital (it happens), document them.

The future belongs to multi-core processors


Processors are unlikely to get much faster; as you can see, they just keep sprouting more cores. No, this does not mean you have to write multithreaded code right now, but it is probably worth thinking about its future.

Say no to mutexes and critical sections - they only complicate your code. Yes, they make the product safer, but performance suffers: your code may fly on 2 or 3 cores, yet once there are more of them, the program starts to slow down. In C programming, only ensuring a decent level of security is more important than scalability across core counts.

I think you will soon see a huge article on scalability across systems, but for now just do as I said: get rid of global variables and of data exchanged through hidden fields of embedded structures. Then, when you need to refactor the code and scale it, the work will be much easier.

Forget about true / false for success / failure


The article "How to C in 2016" says that success is always true. Nonsense. True is true, and success is success; do not put an equals sign between them. When functions return 0 on success and some other value on failure, you end up having to write thoroughly confusing code.

Yes, it really is this silly: there is no standard, and one is unlikely ever to appear. Instead of listening to the bad advice from "How to C in 2016", look at how well the "down with weird code" principle handles this task. The author believes that if we taught others to do the same, setting a good example instead of muddying our own code, there would be no trace of the problem. Naive. Programmers will never agree on such a standard. Your code will have to survive in a world full of ambiguity, where both true and 0 mean success, despite being opposite values. A real standard would be possible only with unambiguous SUCCESS and FAILURE indicators.

If the code looks like this:

if (foobar(x,y)) { ...; } else { ...; }

it is unclear which branch means success and which means failure. It is much clearer this way:

if (foobar(x,y) == Success) { ...; } else { ...; }
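One way to sketch such a convention (the enum and foobar() here are hypothetical, chosen to match the example above):

```c
/* Name the outcomes explicitly instead of overloading true/false. */
enum status { Success = 0, Failure = 1 };

/* A hypothetical function: the caller compares against Success and
 * never has to remember whether 0 or 1 means "it worked". */
enum status foobar(int x, int y) {
    if (x < 0 || y < 0)
        return Failure;
    return Success;
}
```

The call site then reads if (foobar(x, y) == Success), with no ambiguity.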


A little about integers


The author of "How to C in 2016" states that there is no longer any reason to use plain int or unsigned, and that you should reach for int32_t and uint32_t instead. Nonsense! The int and long types are what most library functions accept, and they provide a kind of type safety, warning you even when two distinct types happen to be the same size.

Frankly, getting integer values wrong is harder than it sounds, even across 64- and 32-bit systems. Yes, using int to manipulate a pointer will break 64-bit code (write intptr_t, ptrdiff_t, or size_t instead), but you would not believe how rarely this bites in practice. Just mmap() the first 4 gigabytes with the pages marked invalid at load time, do your unit/regression testing - and you will quickly catch any such problem. And I should not have to explain how to do that.

What pollutes code most is programmers' eagerness to redefine integer types. Knock it off. I understand that u32 gives the code a special charm, but it simply drives me mad - and I am the one who will have to read that code. Please replace it with something standard, such as uint32_t or unsigned int. And, oh horror, enough with arbitrarily inventing integer types like filesize. I know you want to give your chosen integer a new meaning, but remember that C is meant for "low-level" programming, and readers simply die checking such pearls.
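For the one case that genuinely matters - pointers stored in integers - a quick sketch of the right types (the function names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* A pointer round-trips through intptr_t without loss; through a plain
 * int it would be truncated on typical 64-bit platforms. */
int roundtrips_through_intptr(int *p) {
    intptr_t addr = (intptr_t)p;
    return (int *)addr == p;
}

/* ptrdiff_t is the type of a pointer difference; size_t fits any
 * object size. Use these instead of int for pointer math. */
ptrdiff_t element_distance(const int *a, const int *b) {
    return b - a;
}
```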

Use static and dynamic analysis


Where C once offered only "warning levels" and "lint", their place has now been taken by static analysis. Clang keeps delighting users with new and better diagnostics, and gcc is trying to keep up. And, although many are unaware of it, Microsoft's compilers also offer Clang-level static analysis. Xcode's visualization of Clang's analysis is genuinely impressive, although the underlying machinery is the same as in Clang itself.

But that is only general-purpose static analysis. There are also many security tools that take static analysis to a higher level - Coverity, Veracode, and HP Fortify.

Static analyzers are not free of "false positives", but that term is somewhat misleading: annotating the code so the warnings go away "cleans it up", and you get far more reliable results. In other words, this kind of scheme lets you polish the code, removing questionable constructs. Writing code under the strict supervision of a static analyzer improves your programming skills.

These terrible dependencies


A few years into a corporation's life, its projects will build only on the current system, because a great many unsystematic dependencies have accumulated. At one company where I worked, we joked that we could safely share our sources with competitors, because without our environment nobody would ever remember how this or that piece was built.

And for perfectly understandable reasons, this practice becomes entrenched forever. Another company proposed standardizing on one compiler version to avoid the integration problems that arise when teams use different compilers. But that solves a relatively minor problem and creates a much more serious one: by eliminating the integration issues, you lose the kind of code sanitation that building with several compilers gives you.

Open source brings its own difficulties. Dependencies are rarely fully documented, and they have plenty of flaws. In most such cases, you end up wasting time installing two incompatible versions of the same dependency just to compile the code you need.

The fewer the dependencies, the more widely the code gets used.



Understand undefined behavior in C


Most likely, you do not quite understand how C works. Consider the expression (x + 1 < x). What does it yield? C does not specify what happens when x holds the maximum integer value and you add 1, overflowing it. Many compilers treat this as an ordinary binary wraparound, the way other languages (Java, for example) do. But, as it turns out, some compilers classify the situation as impossible and simply delete all the code involved.

Thus, you cannot reason from how your current C compiler happens to behave; you have to dig deeper and think about how any C compiler might react to your code.
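A sketch of both versions of the check (the names are illustrative): the first relies on signed overflow, which is undefined behavior that an optimizer may remove outright; the second compares against the limit first and is well defined.

```c
#include <limits.h>
#include <stdbool.h>

/* UNDEFINED: signed overflow. An optimizing compiler is allowed to
 * assume x + 1 < x is always false and delete this branch entirely. */
bool will_overflow_bad(int x) {
    return x + 1 < x;
}

/* WELL-DEFINED: check the limit before doing the arithmetic. */
bool will_overflow_good(int x) {
    return x == INT_MAX;
}
```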

Conclusion


Do not take up programming in C if you are not used to taking responsibility. It is important to thoroughly understand such things as buffer overflows, integer overflows, thread synchronization, undefined behavior, and so on. Responsibility means writing high-quality code designed to catch bugs early. That is how C programming will move forward in 2016.

Source: https://habr.com/ru/post/275823/

