Today - the story about one of the key concepts of OS Phantom. However, the concept itself, of course, existed even before the Phantom - in fact, at Tannenbaum, at the end of the book, where he allows himself to dream, the outlines of almost all the features of the Phantom are visible, so that, in general, the approach is quite obvious for those thinks about the future of systems in general.
Persistent RAM is a very simple and very difficult thing.
In general, everything is simple: imagine that the contents of the RAM does not disappear. Never. For example, when you turn off the computer. Or, for example, when ... the disappearance of the computer. "And the souls of the dead programs rush over the water."
')
Well, really - it does not matter: if we were able to save the state of the computer before shutting it down, then we can restore this state in another. Same. Generally the same? Straight to the chip? And if there is another video card in it - is it already impossible?
It is possible, because we are not talking about persisting, saving the
entire state of the computer, but only its operational memory. But then there is no use for this, the attentive reader will say - at least it would be necessary to save the registers? Otherwise, restoring only the memory will not allow the program to work as if nothing had happened.
And the task is exactly that. To provide the program with an environment in which the OS shutdown and computer shutdown for the program looked exclusively like pressing the “pause” button while watching a movie. During the pause "under the program" you can even change the computer, but you need to somehow provide a situation in which the continuation of work for the program will be completely transparent.
This is unattainable if the requirement to bring to the absolute. The condition of the hardware cannot be saved and fully restored. But do not. The program does not need a video card, it needs the same API and the saved picture on the screen, and this is possible.
What emerges: it’s not enough to save the state of the RAM only, and the whole computer is unrealistic.
Engineering ingenuity tells us that it is necessary to specify an environment for a program, which is, firstly, sufficient for the program itself, and secondly, within such an environment, it should be possible to save the consistent state of the program and everything that it “sees”.
It is easy to see that this is the operating system specification. Well - not any. Good operating system. A system that quite qualitatively hides from the application details of the equipment (and in general the frailty of the surrounding world).
Therefore, in general, it is necessary to speak not about persistent (virtual) memory, but about persistent execution environment = persistent OS. Actually, this is the OS project Phantom, but today I will limit myself in principle, nevertheless, only a question of RAM, otherwise the article will not be able to finish writing in the morning.
So, we understand that persistent memory is surrounded by other persistent services, and for the time being we set them aside.
The purpose of the war is also clear. Is it clear? Not?
The goal - to simplify the life of the programmer to the obscene. Rid him of his job of serializing the state of the program. And from the restrictions that he imposes on himself, designing the internal representation of data in the program, keeping in mind that all this will sooner or later have to be squeezed into the Procrustean bed of the byte stream, which is the file.
Well - here is a simple example:
habrahabr.ru/company/2gis/blog/198564Instead of fighting with the FS, you can simply forget about it as about a bad dream. Obviously, this raises a lot of questions about data exchange, but, in general, we admit to ourselves that the exchange also takes place not through files for a long time, but through REST / SOAP / whatever, and also not in the serialization format of the tree into a byte stream, and in the form of requests for objects or "branches" of the object tree. What, obviously, corresponds not to the “file” semantics, but to the presentation of the data, which is handy for the program itself.
Well, the goal is clear. Now means. How to write memory to disk?
Well, write (fd, (void *) 0, get_mem_size ()), right?
Of course not. For many reasons. And because there can be “holes” in memory - areas that are not displayed anywhere, access to which is denied. And because there is absolutely no need to write down what has not changed. Yes, the same code - why? And, most importantly, because it is wildly expensive.
And it is absolutely important that the program during the recording will have to stop. But this is not going anywhere at all.
The point here is what it is. If we want (and we want) that the application program trusts us and really doesn’t try to write to the file for peace of mind (and this kills the whole idea altogether), we need to guarantee it, there is NO
GUARANTEE that we will definitely save it. Perhaps, not quite in the last state (it is not so important - it will repeat part of the calculations), but in the integral and not the day before yesterday.
That is - the state of the program (a little later we will find out that this is the state of all programs in general) cannot be saved in the old way - before turning off the computer or by ctrl-s, it must be done
constantly . Or, at least, quite regularly. And often.
It is unlikely that the user, and even more so - the computer-controlled technological process will be pleased with the message “I keep the RAM” for a minute or two.
We must learn to keep the state in the background, slowly, right “under” the running program.
But then it will be inconsistent! We saved the left half of the program state, and when we started to keep the right one, the program had already gained something new - half "do not grow together" when restarted. The program will add an object to the left half, put a link from the right half to it, and in the “photo” of the state it turns out that there is a link, but no object.
The stored state must be strictly, absolutely, consistent. But it must be done consistently.
Can?
Can.
There are many heuristics that can reduce the cost and ease such a process, but it is based on the usual COW, copy on write.
If you exaggerate to the limit, then photographing the state is reduced to instantaneous switching of all memory to read only - this is the only (and extremely inexpensive) operation that needs to be done atomically. (We make a reservation in advance that it is necessary to drive programs into a state that is fully represented in memory. And not in the processor registers. Which is also easy and inexpensive).
This is, in fact, "the point of photographing."
Further, the process acquires a certain meditativeness and can be performed without pulling out the hair from the body's natural cavities, calmly and with alignment. Memory pages can be safely written to disk one by one. And what about the program? She, of course, will not wait and try to write something in some page of memory. Such a page is divided into a pair — an old one; a permanent copy is queued to write to disk (closer to the beginning), and the new one is used and modified by the program. When writing to the disc is over, old copies will be destroyed.
As I said, this is just a draft. There are lots of details. (Those who wish to blow the brain can
look at the code right now.)
The obvious optimization is to write only diffs. Since last time, not everything has changed.
Equally obvious, but more controversial and requiring heuristic optimization, is to write something down to the point of photographing (and switch to r / o), in the hope that it will not change. Then you can simply use the finished recorded block on the disk, without repeating the disk operation for this page of memory.
It is also clear that the working set of the program (or rather, its not read only part) will potentially double during the snapshot recording: if we are unlucky, while we record the state, the program “overtakes”, overwrites all its pages and all of them have to select second physical page of RAM. That is, if bad luck costs RAM doubled during snapshot. In reality, the situation is always softer, but the “demand” for RAM during snapshot inevitably increases. In order to satisfy it without stopping the program due to lack of RAM, it would be good to prepare a little before snapshots, write pages that have changed to disk, and then they can “take away” the physical memory page if it is urgently needed elsewhere. Again, an attentive reader will see that the above has already been said about writing to disk proactively, but for other purposes.
Preventive recording improves another indicator. Snapshot latency - the time between "photographing" and the end of recording photos on the disk.
I note that all this today is fully implemented and works in the OS kernel.
In this place we put a semicolon. After some time, I will try to write an article that links the previous topic - the synchronization primitives - and this one. That is, to write about the problems of implementing synchronization primitives in persistent memory.
And this will be an article that still has not been done properly. That is, to a significant extent it will consist of questions.