Boris Babayan on the past, present and future of computing technology

What does the development of computing technology look like to someone who has spent more than half a century building computers?
I had the chance to talk about this with Boris Artashesovich Babayan, Director of Architecture at Intel.
Boris Babayan is known as the chief architect of the Elbrus-1, Elbrus-2 and Elbrus-3 computing systems. Some of his ideas were used in the Transmeta architecture. He currently leads the development of a new microprocessor architecture at Intel.
To get the formalities out of the way, here are Boris's titles, degrees and positions: Corresponding Member of the Russian Academy of Sciences, Doctor of Technical Sciences, Professor, Head of the Microprocessor Technologies Department at the Moscow Institute of Physics and Technology, Intel Fellow, laureate of the State Prize and the Lenin Prize.
The narration below is in Babayan's own voice. My modest comments appear as sidebars or as links to web pages.

It is fair to ask what I am going to talk about. The answer may seem very unusual: everything. About everything that my colleagues and I do. I was thinking about the problems of my current project and about what to do next, and I felt deeply that everything in our business is closely connected. Architecture, programming languages, the operating system. Well, everything, everything, everything, you know? That will be my story.
When I looked at the situation as a whole, it struck me very strongly. It seems that nothing more can be optimized in the hardware. My colleagues and I know that this is not so: we are already optimizing a new-generation microprocessor architecture, although it has not yet become a product. And for those working on superscalar processors, the end seems to have arrived: there is nothing left to do. Nevertheless, if we consider the whole development of computing technology, we are closer to the beginning than to the end.
Unfortunately, or perhaps fortunately, I do not know, the whole story, I think, does have an end. But that is probably only the visible end. It seems to me it is the end. Although, possibly, under other conditions ... But all that was something of a preface.
In general, the topic of my talk is the history of the development of our field, of all computing technology. Along the way I will speak about the role of our Russian scientists, Soviet scientists, if you like ... But that is not my goal; it comes up only in passing. Unfortunately, very few people even in Russia know about this.
That was the last bit of preface. Now let us begin.
Look: what is computing? It is the implementation of algorithms in hardware by means of programming languages. That is, we have three major components: algorithms, languages and hardware. So if we are going to analyze anything, we must first look at these components.
Let's start with the algorithms. Algorithms are something abstract, something eternal. Like numbers, for example. There may be many different algorithms, and each of them is eternal.
How are algorithms structured? In them, as in many other things related to the real world, two components can be distinguished: temporal and spatial. The temporal component is the sequence of operations: what is being done. This component is extremely parallel. Of course, there are purely sequential algorithms, but in principle an algorithm is inherently a parallel structure. Typically the computation graph is highly parallel and highly structured.
Now the spatial component. The spatial component is also parallel and structured. What does it consist of? The objects that operations work on. A great many objects. These objects can refer to each other and be nested in each other. There are many of them, and different operations work on them. Everything is very parallel.
That's what algorithms are.
And the hardware? The hardware, unlike algorithms, is not eternal. It changes greatly over time. Initially it was very primitive. When I started (and I started at the Institute of Physics and Technology, which I entered in 1951), the first BESM was not yet running. One bit of information was then represented by two vacuum tubes. A whole cubic decimeter! An incredible volume. And there was only one execution unit, and it occupied three rooms. As I always joke, the carry of a bit began in one room and ended two rooms away. A terrible size.
And look at it now: an incredible number of execution units fits on a single die.
But even the first BESM was far from the first machine. I never saw the very earliest ones, such as the Ural. That was a strictly sequential, bit-serial machine. There was a magnetic drum; one bit of information was read from it, then, say, the read values were added and one bit was written back to the drum. Awful. But machines had to be designed that way; there was no alternative. At that time there was no way to reflect in hardware such incredible magnificence as parallelism in time and space. There was simply nothing to discuss.
It was necessary to simplify. And what simplification would be natural here? Linearize everything! Turn everything into a single line. It is very simple: nothing is easier than turning a parallel algorithm into a line, both in space and in time. That is how the first machines were made. And it was a brilliant decision, of course. Although ... it is hard to call it genius, since it is so simple. But it was an effective solution for its time, because turning a complex algorithm into a line requires no optimization at all; a compiler can do it easily. True, nobody had yet heard of compilers or programming languages then. Everything was programmed in bits. The state of computing was such that programmers could not be the focus; they were not considered at all. Programmers worked with whatever the hardware developers gave them. And the programs were tiny, so it was easier and more efficient to write in bits than in any language. Back then, using languages would have been a waste of machine time.
My favorite joke in those days was that real men program in assembler, and languages are mere self-indulgence.
In the absence of languages and compilers, a person personally turned the algorithm into a line. And this matched the familiar notion of time: time is sequential, so the algorithm is sequential. All very natural. Therefore compilers, by the time they appeared, found this work very simple. For the machine it was even simpler: the linear sequence of instructions that the algorithm had been turned into was executed strictly in order. The hardware this requires is very simple! Back then, and I was already building machines by that point, we never even thought of modeling the behavior of the hardware.
Today it is impossible to work without modeling: modern architectures are very dynamic, and they cannot be optimized without it. But in that era the execution time of an application equaled the sum of the execution times of the individual operations. Nothing needed to be modeled; everything was clear. There were no registers and no caches. The first BESM worked like this: read one number from memory, read the second, perform the operation, write the result back. A read from memory, compared with an operation, was effectively instantaneous; operations took many cycles.
The spatial side was also very simple. There were no objects, just a sequence of bits.
As for security, that is, protecting processes from one another, there was no talk of it at all. The machine stood in a separate room. Someone held the key to that room, and that was all the security. A programmer came in, sat down at the machine and started to work. When he finished, he took his tape and left, and the next one came. Each brought and carried away his own data, and only one process ran on the machine at a time. 100% secure.
The programmer's life was not easy. Every new machine meant a new instruction set, and all programs had to be rewritten. The only saving grace was that there were very few programs, so the rewriting could be tolerated.
That was the beginning.
The tragedy is that we carry compatibility with that beginning to this day. All modern instruction sets are linear, although the hardware is now very parallel. And that is very bad. It complicates programming, it breeds errors, and we have major security problems because of pointers that let you reach any information from any place. Awful.

But that is not all. From the first shared-memory machines we inherited the bus. The hardware then was not yet parallel, and each processor had only one execution unit, but the processors themselves could already work in parallel with each other on shared memory. That is exactly what they needed the bus for. The bus imposed a strict ordering on all memory accesses from all processors: what is now called strong ordering, or a memory model.
Programmers were given such a system, and they understood that everything was ordered in time. Programmers, as always, simply used what they were given. And this is what they were given. Back then, synchronizing through semaphores and reading data from memory took the same time, so from the point of view of application efficiency it made no difference to a programmer whether to synchronize with semaphores or to rely on the ordering imposed by the memory model. As a result, programs were written any old way.
And then the situation only got worse. Caches appeared. Anyone who had implemented honest synchronization through semaphores in his programs now simply had to rewrite them to work through the memory model, because access to data had become much faster than access to a semaphore. The data now sat in the cache, while the semaphore sat in shared memory, which is much slower to reach than the cache. Using semaphores became very unprofitable.
That is where the compatibility problems came from.
The bus itself is long gone. And yet every machine still emulates the bus of those very first machines. Of course, it has been slightly relaxed. The developers of the P6 found that strictly enforcing memory ordering costs 25% of performance, so they allowed one relaxation: reads may overtake writes to memory. But reads still cannot overtake each other, writes cannot overtake each other, and writes cannot overtake reads.
And it would have been possible to drop all that and do everything in a new way. We never used a bus in Elbrus; we had a crossbar switch from the very beginning.
Crossbar switch: a direct connection from each processor to each memory block.
So here is the irony: the very first machines are still being emulated. And not only in hardware but in programming languages too. All programming languages are tainted by this. They are very far from algorithms, from what I called parallel time and parallel space. They are much closer to the very first machines, where everything was sequential.
We see that compatibility has taken everything in a stranglehold, and it is very difficult to change it.
Now let us look at how the temporal component developed, and how the spatial one did. We will start with the spatial component: it is much simpler. The temporal component is perhaps more relevant, but it is harder.
So, the spatial component ... Everything is very simple here. It is about objects; the machine must work with objects. When we started the first Elbrus, we were not thinking about security.
Security. Protecting processes from one another could have been an independent motivation for introducing everything mentioned below. That is why security is brought up here.
As I already said, there were no security problems then. We took up objects in order to support high-level languages. We asked ourselves: what should a high-level language be? At that time, in 1972, it was believed that high-level programming was provided by languages like Algol and Fortran. But we quickly realized that the existing languages were oriented toward the existing machines, where everything is sequential. By supporting the existing languages in a new machine, we would in fact be orienting ourselves, indirectly, through the languages, toward the existing architectures. Nonsense!
I should note that machines with hardware support for high-level languages already existed. For example, Burroughs produced a machine that executed Extended Algol. Data types there were implemented through tags stored in memory alongside the data. That was a good idea, and we borrowed it from Burroughs. But the way they used those tags was simply a misunderstanding. With the tags, Burroughs supported static types at the hardware level. It worked like this: each memory cell was rigidly assigned a type, and this information was used to convert data automatically as it was written to memory. If, for example, a real number was stored into a cell tagged as integer, the machine dynamically converted the real to an integer before writing.
By supporting the existing static languages, Burroughs in effect supported the old hardware, because it was the static languages that were oriented toward it. Simply stupid! Burroughs, while developing a new architecture, ended up orienting it toward the old machines. On top of that, the tags could be changed on the fly in unprivileged mode, which is also complete nonsense.
In fairness, though, the dynamic type conversion cost Burroughs almost nothing. To perform the conversion, the type of the destination cell had to be recognized on every memory write, in order to know how the data should be converted. So every write to memory required a prior read. Such extra reads look like a terrible waste. But ferrite core memory was in use then. Before writing to such memory, the cell had to be cleared, and clearing was done by reading it. So reading the cell's type during a write added no overhead: it was simply combined with the clearing read.
But back to Elbrus. We rejected orientation toward the old languages. We analyzed many different options and realized that high-level programming should mean programming not over linear memory, not over bits, but over objects. That is object orientation.
After all, is that not how algorithms work? Imagine some algorithm executing. If it generates a new object, it does not need any bits; the object simply appears, in space, so to speak. Only the current algorithm can access this newly generated object. No one else even knows about it, because it was generated by this specific algorithm. An abstract algorithm, not hardware, you understand? And this must somehow be supported in the hardware.
Further, every object has a type, so we must support types. At first we tried a very extremist approach: the tags recorded whether a value was an integer or a real. But then we saw that this was completely unnecessary. There is no need to support ordinary data types in tags. Here is why.
A little about the Elbrus-1 type system: alongside user data, information about its type was stored in memory. The same information accompanied the data into the registers when it was read from memory. Knowing the type made it possible to control which operations could be applied to which objects. For example, an attempt to perform integer addition on real numbers caused a hardware fault.
Imagine that you take a real number and want to change some of its bits. Say you want to do something to the exponent. If the full variety of types is strictly enforced (bit set, real, integer, ...), then to change any bits of a real number you must first convert it into a bit set, perform the manipulations, and then convert the bit set back into a real. A pile of unnecessary conversions! And since literal work with the material must of course be allowed, the protection is illusory anyway: the programmer's error may lie not in the act of changing bits of a real number as such, but in literally writing incorrect values.
Working literally: that is, through the bit-set representation.
What I mean is that inside a procedure its semantics cannot be checked. The hardware cannot know the semantics; errors inside a procedure are the programmer's responsibility. There is no need to protect the programmer inside the procedure: if he wants, he can always make a mistake there. Interprocedural relationships are another matter. They are governed by pointers, and pointers, generally speaking, must not be modifiable literally. That is, if a procedure works with a certain set of pointers, it must not be able to forge them.
Thus we realized, first, that only pointers need to be checked, and nothing else. Second, we realized that a high-level language must be dynamic.
That is, if we want a universal language. For me at the time (let me remind you, this was 1972) universality meant that an operating system could be written in the language effectively. If the operating system cannot be programmed in it, the approach is not universal. And for an operating system to be writable, data types must be dynamic.

Why is that? Suppose you define some name and statically bind a type to it. Inside your program you know this type, but the operating system, when handling your data, does not know how it was declared in the source language. If a program variable is statically declared as an integer, the operating system knows nothing about that; it sees only binary codes. Therefore the static approach is completely unacceptable. Besides, it is also inconvenient. For example, it is convenient to be able to pass the same procedure parameter as a direct value, as a reference to the value, or even as a procedure call that produces the desired value. That is perfectly natural, but the static approach does not allow it, unlike the dynamic one.

As a result we concluded that a high-level language is a dynamic language with strict control of pointers. Niklaus Wirth arrived at exactly the same conclusion, in fact independently: we had little contact and hardly read each other. He is the number one man in languages. He too realized that high-level programming rests on type safety and dynamic types.

Wirth created the dynamically typed language Euler. It was incredibly interesting; everyone was simply delighted with it. But it lacked efficiency, because it was not supported by hardware: there was a mass of dynamic checking. Then Wirth stepped back. He said: yes, there should be types, but let them be static. And the checking, perhaps, should not be too strict, because, for example, checking that array accesses stay within bounds requires extra instructions.

We, from the very beginning, assumed that types are checked by the hardware. For example, if you have a pointer to an array, then no one holding that pointer can go beyond the bounds of the array. It is impossible.
And what came of it? We made a real language. We had tags: several extra bits describing the type were attached to all data. We did control pointers. In short, we created El-76. It was done by the late Volodya Pentkovsky, who was a programmer then.

The Elbrus operating system was written in El-76. That was done by Serezha Semenikhin. In the mid-70s, 26 people built an operating system: multiprogramming, multiprocessor, multiterminal. At a time when batch mode was the norm almost everywhere, we had such a developed operating system!

Moreover, our approach dramatically simplified programming. It even increased productivity. We built three generations of machines on this approach: the first, second and third Elbrus.

Our machines were used in highly critical systems: Moscow's anti-missile defense, space flight control, the nuclear projects in Arzamas. And all our users said the same thing: debugging on this machine is 10 times faster than on the old machines. The impression was of working with a permanently enabled debugging system, and without any loss of efficiency.

When our people got their hands on Western machines in the early 90s, they came to me and asked: how can anyone work on these machines? It is impossible to debug a program. Awful.
Now look at what happened to this approach over the course of history!
We released the first machine in 1978; then the operating system passed its first tests. Until 1982-83 the first Elbrus machines were in wide use in our country. Around the same time, research on type safety was under way at many universities around the world. Intel, too, took an interest in the approach and built a type-safe machine, the 432 (the iAPX 432). Ask any Intel employee whether he knows that Intel has the 432 in its history; the answer will be: no, I don't. It is remembered with shame! Simply a shameful machine. It provided type safety, but it made elementary mistakes.

The 432 implemented user protection, which of course is right. However, all pointers were stored in a separate segment, called the program reference table. The machine ensured that only pointers could reside in that segment, and that pointers could exist nowhere else. I read their papers. They say all of this is very nice, but keeping the pointers in a separate segment is very inconvenient.

Pointers should be passed around like ordinary data. That is exactly what we did: in our machine, pointers are simply data. The 432 kept pointers in a separate segment, and this led to terrible results. For example, indexing an array required a three-address operation: you had to specify the position of the array pointer in the pointer segment, the position of the index in the data segment, and again an offset in the pointer segment where the result was to be written. But the worst part is not even that: on entry to a procedure, four segments were requested from the operating system! Two segments for parameters, pointer and scalar, and two segments for local data, also pointer and scalar.

The Central Committee and the Council of Ministers: at that time Western machines were being copied, and with the appearance of the 432 a paper was circulated among scientists: here, this is the machine to copy.

Since I was a defender of the type-safety ideology, I was included in the commission, which Ernst Filtsev headed. And I wrote a long screed against Intel: that the machine was bad and that within a few months it would fail. And fail it did. When I later came to Intel, I found an article by Bob Colwell, the brilliant man who created the P6. He had analyzed the 432 and reached the same conclusion I had: the idea is colossal, but the implementation is worthless. Colwell's article gives concrete numbers. For example, entering a procedure took up to fifty memory accesses. How could anyone work on such a machine?

As I said, the 432 did not last long. But what happened to type safety after that? In what directions did languages and hardware develop?

The author thanks Andrei Dobrov, Alexander Kim, Dmitry Maslennikov and Alexander Ostanevich for their help in preparing the material.