After many years of doing this and that, I decided to return to basics. To programming. Given the many "modern achievements" in this area, it was hard to decide what is genuinely lacking and what to take up so that it would be both pleasant and useful. Having tried a little of everything, I decided to go back to what drew me from the very first days of my acquaintance with a computer (back then, a clone of Sir Sinclair's creation): programming in assembler. In fact, at one time I knew assembler quite well (here and below, x86 assembler), but for almost 15 years I have not written anything in it. So this is a kind of return of the "prodigal son."
But then the first disappointment awaited me. To my deep regret, the books, manuals and other assembler references I found on the Internet contain only minimal information on how to program in assembler, why it should be done that way, and what it actually gives you.
An example from another area. If we take boxing, all such manuals teach you how to throw a punch and how to move on your feet, but completely ignore what makes boxing boxing rather than "sanctioned brawling." That is, combination work, use of the ring, defensive actions, the tactical construction of a bout and, moreover, the strategy of a fight are not considered at all. A man was taught to hit the heavy bag and sent straight into the ring. This is fundamentally wrong. Yet this is how almost all the "textbooks" and "manuals" on programming in assembler are built.
Decent books probably do exist; most likely I simply failed to find them under the mountain of "slag." Therefore, before we fill our heads with a global description of the architecture, the mnemonics and all sorts of tricks of the "how to make a fig with two fingers" kind, let us approach programming in assembler from an "ideological" point of view.
Idyll
A small note: hereinafter a classification is used that differs from the generally accepted one. This is not an invitation to a "debate about the color of truth"; it is simply easier in this form to explain the author's view of programming.

So, it would seem that an era of happiness has arrived for programmers. A huge selection of tools for every occasion and wish. There are millions of "frameworks" / "patterns" / "templates" / "libraries" and thousands of tools that "facilitate" programming, hundreds of languages and dialects, dozens of methodologies and approaches. Help yourself, as they say. And yet it is not "taken." The point is not religious convictions, but the fact that all of this feels like trying to eat something tasteless. With enough desire and zeal you can, of course, get used to it. But, returning to programming, in most of what is offered no technical beauty is visible, only a great many "crutches." As a result, when these "achievements" are used, from under the "artist's brush" we get, instead of fascinating landscapes, solid "abstractionism", or cheap popular prints if we are lucky. Are most programmers really so mediocre, ignorant and genetically challenged? No, I do not think so. So what is the reason?
There are many ideas about and approaches to programming today. Let us consider the most "fashionable" of them.
- Imperative programming: the programmer specifies the sequence of actions leading to the solution of the problem. The basis is dividing the program into parts that perform logically independent operations (modules, functions, procedures). Unlike the typed approach (see below), there is an important feature here: the absence of "typing" of variables. In other words, there is no notion of a "variable type"; instead it is understood that values of the same variable may be of different types. Vivid representatives of this approach are Basic, REXX and MUMPS.
- Typed programming is a modification of imperative programming in which the programmer and the system restrict the possible values of variables. The best-known languages here are Pascal and C.
- Functional programming is a more mathematical way of solving a problem, where the solution consists in "constructing" a hierarchy of functions (creating the missing ones along the way) that leads to the solution of the problem. Examples: Lisp, Forth.
- Automata programming is an approach in which the programmer builds a model / network of message-passing objects / actors that change / store their internal "state" and are able to interact with the outside world. In other words, this is what is usually called "object programming" (not object-oriented). This way of programming is represented by Smalltalk.
And the many other languages? As a rule, they are already "mutants." For example, mixing the typed and the automata approaches gave us "object-oriented programming."
As we can see, each of these approaches (even without taking into account the limitations of specific implementations) imposes its own restrictions on programming technique. It cannot be otherwise. Unfortunately, these restrictions are often created artificially, to "maintain the purity of the idea." As a result, the programmer has to "bend" the solution he originally found into a form that at least somehow corresponds to the ideology of the language or "template" being used. And that is even before we consider modern design and development methodologies.
It would seem that when programming in assembler we are free to do everything the way we want and the "iron" allows. But as soon as we want to use a "universal driver" for some type of equipment, we are forced to trade the freedom of "creativity" for the prescribed (standardized) ways of using that driver. As soon as we need to use the work of colleagues, or to let them use the fruits of our work, we are forced to trade the freedom of choosing how parts of the program interact for certain negotiated / standardized ways.
Thus, the "freedom" for which people so often escape into assembler frequently turns out to be a "myth." And this, the understanding of the restrictions and of the ways they are organized, deserves, in my opinion, increased attention. The programmer must understand the reason for the restrictions being introduced and, unlike in many high-level languages, be able to change them if the need arises. Yet today the assembler programmer is forced to put up with the restrictions imposed by high-level languages without getting the perks available to those who program in them. On the one hand, operating systems provide a lot of ready-made functions, there are ready-made libraries and much more. But how to use them? They are implemented without any regard for being called from programs written in assembler, or even contrary to the programming logic of the x86 architecture. As a result, programming in assembler with calls to OS functions or to external libraries of high-level languages is nowadays "fear" and "horror."
The farther into the forest, the thicker it gets
So, we have realized that although assembler is very simple, one must still be able to use it. And the main constraint is the need to interact with the execution environment in which our program runs. Programmers in high-level languages already have, for every occasion, the necessary libraries, functions and subroutines, plus ways of interacting with the outside world in a form consistent with the ideology of the language; the assembler programmer has to wade through a thicket of all kinds of obstacles starting from scratch. And when you look at what high-level languages generate during compilation, you get the feeling that those who wrote the compilers either have no idea how the x86 processor works, "or one of the two" (c).
So let us take it in order. Programming is first of all engineering, that is, scientific work aimed at the efficient (in terms of reliability, use of available resources, implementation time and ease of use) solution of practical problems. And at the heart of any engineering lies a systematic approach. That is, no solution can be considered as some "indivisible" black box functioning in a complete and ideal vacuum.
Another example from another area. A vivid example of the systematic approach is truck production in the USA. There, the truck manufacturer is essentially a maker of the frame and cab plus an assembler of the final design. Everything else (engine, transmission, suspension, electrical equipment, etc.) is chosen according to the wishes of the customer. One customer wanted a Kenworth with a Detroit Diesel engine, a manual Fuller gearbox and a spring suspension from, say, Dana: please. His friend wanted the same Kenworth model, but with the "native" Paccar engine, an Allison automatic transmission and air suspension from another manufacturer: easy! And so do all truck builders in the United States. That is, a truck is a system in which each module can be replaced with another of the same purpose and seamlessly docked with the existing ones. Moreover, the way modules are docked is designed for maximum versatility and convenience of further extending the functionality. That is what an engineer should strive for.
Unfortunately, for now we have to live with what we have, but in the future this should be avoided. So, a program is, in essence, a set of modules (whatever they are called and however they "behave") whose composition solves the task at hand. For efficiency it is highly desirable that these modules can be reused, and not just reused at any cost, but reused in a convenient way. And here another unpleasant "surprise" awaits us. Most high-level languages operate with such structural units as the "function" and the "procedure", and the way of dealing with them is "parameter passing." That is quite logical, no questions there. But as always, "it is not what is done that matters, but how it is done" (c). And here the most incomprehensible part begins. There are three ways of organizing parameter passing:
- cdecl,
- stdcall,
- fastcall.

None of these methods is native to x86. Moreover, all of them are flawed from the point of view of extending the functionality of the called subroutines: as soon as the number of parameters grows, we are forced either to change every call site of that function / subroutine, or to produce a new subroutine with similar functionality that is called in a slightly different way.
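To make the discussion concrete, here is a minimal sketch of a cdecl call in NASM syntax (32-bit Linux); the subroutine name sum3 and the argument values are made up for illustration. Arguments are pushed right to left, the result comes back in EAX, and the caller removes the arguments from the stack.

```nasm
section .text
global  _start

sum3:                           ; int sum3(int a, int b, int c), cdecl
        mov     eax, [esp+4]    ; a ([esp] holds the return address)
        add     eax, [esp+8]    ; + b
        add     eax, [esp+12]   ; + c
        ret                     ; cdecl: the caller removes the arguments

_start:
        push    dword 3         ; arguments are pushed right to left
        push    dword 2
        push    dword 1
        call    sum3            ; result arrives in EAX
        add     esp, 12         ; caller cleans up 3 * 4 bytes of stack

        mov     ebx, eax        ; use the result (6) as the exit code
        mov     eax, 1          ; sys_exit (32-bit Linux)
        int     0x80
```

Add a fourth parameter to sum3 and every push sequence and every `add esp, 12` in every caller has to change, which is exactly the fragility described above.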
The parameter-passing methods listed above work reasonably well on processors with two separate stacks (a data stack and an address / control stack) and developed stack-manipulation instructions (at least indexed access to stack elements). But on x86 you have to contort yourself first when passing / receiving parameters, and then again when removing them from the stack in a "structured" way, all the while trying to guess / calculate the maximum stack depth. Recall that x86 (in 16/32-bit mode) is a processor that has:
- specialized registers (true general-purpose registers are, as such, absent: we cannot, with a single instruction, multiply the contents of GS by the value in EDI and get the result in the EDX:ECX pair, or divide the value in the EDI:ESI register pair by the contents of EAX; see the sketch after this list);
- few registers;
- one stack;
- memory cells that carry no information about the type of the value stored in them.
In other words, most of the programming techniques designed for processors with a large register file, support for several independent stacks and so on are simply not applicable when programming x86.
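As an illustration of those fixed register roles, here is a minimal NASM fragment (32-bit Linux, made-up values) showing that MUL and DIV always work through the implicit EDX:EAX pair rather than through registers of our choosing.

```nasm
section .text
global  _start

_start:
        mov     eax, 100000     ; the multiplicand must sit in EAX
        mov     ecx, 50000      ; the multiplier may be any r/m32
        mul     ecx             ; unsigned: EDX:EAX = EAX * ECX (64-bit product)

        mov     ebx, 7
        div     ebx             ; unsigned: EAX = EDX:EAX / EBX, remainder in EDX

        mov     ebx, edx        ; exit with the remainder as the status code
        mov     eax, 1          ; sys_exit (32-bit Linux)
        int     0x80
```

Whatever registers our data happens to occupy, it first has to be shuffled into EAX (and EDX), which accounts for much of the register pressure mentioned above.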
The next peculiarity of interacting with ready-made modules written in "high-level languages" is the "struggle" with "variable types." On the one hand, the reason for the appearance of variable types is clear: the programmer knows what values are used inside his subroutine / module. On this basis it seems quite logical that by declaring the type of a variable's values we "simplify" the writing of the program, delegating the control of types / ranges of values to the language translator. But here, too, the baby was thrown out with the bathwater. Any program is written not to generate spherical horses in a vacuum, but for practical work with user data. This is an obvious violation of the systematic approach: as if the developers of high-level languages designed their systems without taking interaction with the outside world into account. As a result, when programming in a typed language the developer must anticipate every possible kind of "wrong" input data and look for ways around the uncertainties. And this is where the monstrous machinery of regular expressions, exception handling, method / procedure signatures for different types of values and other crutch factories enters the scene.
As mentioned above, on the x86 architecture the value stored in a memory cell has no type of its own. The assembler programmer receives both the privilege and the responsibility of deciding how to handle that value. How to determine the type of a value and how to process it: there are many options to choose from. But, we emphasize once again, all of this applies only to values received from the user. As the developers of typed languages rightly noted, the types of internal and service variables are almost always known in advance.
It is this reason (the contorted passing of parameters to modules written in high-level languages, plus the need to strictly track the types of the parameters passed to those same modules) that I see as the main one making assembly-language programming unreasonably difficult. The majority prefer to dig through the wilds of "high-level languages" in order to reuse what others have already done, rather than suffer inserting the same "typical" crutches to patch up what those languages failed to do. And only a rare assembler translator relieves the programmer of this routine in any way.
What to do?
Preliminary conclusions after a 15-year break from programming in assembler.
First, about the modules, or parts, of a program. In the general case it is worth distinguishing two types of executable modules in an assembly-language program: the "operation" and the "subroutine."
- An "operation" is a module that performs an "atomic" action and does not require a set of parameters for its execution (for example, clearing the whole screen, or calculating the median of a numeric series, etc.).
- A "subroutine" is a functional module that requires a set of input parameters (more than 2-3) for its correct functioning.
And here it is worth drawing on the experience of imperative and functional languages. They gave us two valuable tools that should be used: the "data structure" (or, as in REXX, compound variables) and the "inviolability" of data.
For passing parameters to subroutines it is convenient to use "structures", that is, formed sets of parameters located in an area of memory accessible both to the main program and to the called subroutines. Moreover, the approach can be standardized by using the "0-th" parameter as a bit mask of the filled / significant fields of the structure. It becomes a kind of call signature that the subroutine can additionally analyze, changing its logic depending on the parameters actually supplied. The developer can then extend the capabilities of a subroutine while maintaining compatibility with old call sites, and increase the number of parameters used without having to produce many subroutines with the same functionality within the supported API. An additional advantage of this approach is the reduction of "parasitic" work with the stack.
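Here is one possible shape of such a convention, a minimal sketch in NASM (32-bit Linux). The structure layout, the field names, the bit assignments and the choice of ESI as the block pointer are assumptions made up for this illustration, not an established standard.

```nasm
struc PARAMS
    .mask   resd 1          ; bit mask of the fields actually filled in
    .x      resd 1          ; bit 0
    .y      resd 1          ; bit 1
    .scale  resd 1          ; bit 2, added later without breaking old callers
endstruc

section .data
args:   istruc PARAMS
            at PARAMS.mask,  dd 0x3     ; only x and y are significant
            at PARAMS.x,     dd 10
            at PARAMS.y,     dd 32
            at PARAMS.scale, dd 0
        iend

section .text
global  _start

; add_point: EAX = x + y, multiplied by scale when bit 2 of the mask is set.
; The parameter block is read-only for the subroutine.
add_point:
        mov     eax, [esi + PARAMS.x]
        add     eax, [esi + PARAMS.y]
        test    dword [esi + PARAMS.mask], 0x4
        jz      .done
        imul    eax, [esi + PARAMS.scale]
.done:
        ret

_start:
        mov     esi, args       ; the single "parameter": address of the block
        call    add_point
        mov     ebx, eax        ; exit code = 42
        mov     eax, 1          ; sys_exit (32-bit Linux)
        int     0x80
```

An old caller that never sets bit 2 keeps working unchanged after the `scale` field is added; only the mask tells the subroutine which fields to trust.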
It is also useful to follow the rule of inviolability, that is, the immutability of the passed parameters. A subroutine cannot (must not) change values in the structure passed to it, and it returns its result either in registers (no more than 2-3 values) or in a newly created structure. Thus we are spared the need to make copies of structures to guard against a "forgotten" data change by a subroutine, and we can use an already created structure, or its main part, to call several subroutines operating on the same or a similar set of parameters. Moreover, almost "automatically" we arrive at the next, "functional", rule: the internal context-independence of subroutines and operations. In other words, the separation of state / data from the method / subroutine that processes it (in contrast to the automata model). In parallel programming, and when a single subroutine is shared, this removes both the need to maintain multiple execution contexts and watch for their "non-intersection", and the need to create many instances of one subroutine with different "states" when it is called several times.
As for data "types", you can leave everything as it is, or you can avoid reinventing the wheel and borrow what the developers of translators of imperative languages have long used: the "value type identifier." That is, all data coming from the outside world is analyzed, and each received value is assigned an identifier of the detected type (integer, floating point, packed BCD, character code, etc.) and the size of the field / value. With this information the programmer, on the one hand, does not squeeze the user into an unnecessarily narrow framework of input "rules", and on the other hand is able to choose the most efficient way to process the user's data. But, I repeat once more, this applies only to working with user data.
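A minimal sketch of what such a tagged value might look like in NASM (32-bit Linux). The tag codes, the field layout and the classify routine are assumptions invented for this illustration, not a standard of any kind.

```nasm
TYPE_INT    equ 1               ; two's-complement integer
TYPE_STRING equ 2               ; byte string

struc TVALUE
    .type   resd 1              ; identifier assigned by the input parser
    .len    resd 1              ; payload length in bytes
    .data   resd 1              ; pointer to the raw bytes
endstruc

section .data
num:        dd 42
val:        istruc TVALUE
                at TVALUE.type, dd TYPE_INT
                at TVALUE.len,  dd 4
                at TVALUE.data, dd num
            iend

section .text
global  _start

; classify: ESI -> TVALUE; returns EAX = 0 for integer, 1 for string, -1 otherwise
classify:
        cmp     dword [esi + TVALUE.type], TYPE_INT
        je      .int
        cmp     dword [esi + TVALUE.type], TYPE_STRING
        je      .str
        mov     eax, -1         ; unknown tag: an explicit error path, not a crash
        ret
.int:   xor     eax, eax
        ret
.str:   mov     eax, 1
        ret

_start:
        mov     esi, val
        call    classify
        mov     ebx, eax        ; exit code reflects the detected type
        mov     eax, 1          ; sys_exit (32-bit Linux)
        int     0x80
```

The dispatch happens on the tag attached to the incoming value, not on a type declared in advance, which is exactly the division of responsibility argued for above.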
These were general considerations about programming in assembler, without touching on design, debugging and error handling. I hope that OS developers who write their systems from scratch (and all the more so in assembler) will find something to think about here and will choose some ways (whether those described above or any others) to make programming in assembler more systematic, convenient and enjoyable, rather than blindly copying other people's, often hopelessly "crooked", solutions.