* The link to the library is at the end of the article. The article itself outlines the mechanisms implemented in the library in medium detail. The macOS implementation is not yet complete, but it differs little from the Linux one; this article mainly describes the Linux implementation.
Browsing GitHub one Saturday afternoon, I came across a library that implements on-the-fly reloading of C++ code on Windows. I moved off Windows a few years ago without a shred of regret, and nowadays all my programming happens either on Linux (at home) or macOS (at work). A little googling showed that the approach from that library is quite popular, and MSVC uses the same technique for the "Edit and Continue" feature in Visual Studio. The only problem was that I could not find a single implementation for non-Windows platforms (did I not look hard enough?). I asked the author of the library above whether he would port it to other platforms; the answer was negative.
I will say right away that I was only interested in an approach that would not require changing existing project code (unlike, for example, RCCPP or cr, where all potentially reloadable code must live in a separate dynamically loaded library).
"How so?" - I thought, and began to smoke incense.
I mostly do gamedev. Most of my working time is spent writing game logic and tweaking visuals. In addition, I use imgui for auxiliary tools. My code cycle, as you have probably guessed, is Write -> Compile -> Run -> Repeat. Everything happens pretty quickly (incremental builds, ccache, and so on). The problem is that this cycle has to be repeated very often. For example, say I am writing a new game mechanic, a jump; a nice, controllable jump:
1. Wrote a draft impulse-based implementation, built, launched. Noticed that I accidentally apply the impulse every frame instead of just once.
2. Fixed, built, launched; fine now. But the magnitude of the impulse should be bigger.
3. Fixed, built, launched, works. But it somehow feels wrong. Should try a force-based approach.
4. Wrote a draft force-based implementation, built, launched, works. Just need to also change the instantaneous velocity at the moment of the jump.
...
10. Fixed, built, launched, works. But it's still not right. Probably need to try an implementation based on changing gravityScale.
...
20. Great, looks good! Now expose all the parameters in the editor for the game designers, test, and commit.
...
30. The jump is done.
And at each iteration you need to rebuild the code and then, in the running application, get to the place where you can jump. That alone usually takes at least 10 seconds. And what if I can only jump in open areas that I still have to reach? And what if I need to be able to jump onto blocks N units high? Then I also have to put together a test scene, which needs debugging of its own and takes extra time. For iterations like these, hot code reloading would be ideal. Of course, it is not a panacea, it is far from suitable for everything, and sometimes even after a reload you need to recreate part of the game world, which must be taken into account. But for many tasks it is useful and can save a lot of time and concentration.
This was the minimum set of requirements the implementation had to satisfy. Looking ahead, a few things were implemented beyond that.
Until that point I had been very far from this problem domain, so I had to gather and absorb information from scratch.
At a high level, the mechanism consists of several pieces, which are described one by one below.
Let's start with the most interesting thing - the mechanism for reloading functions.
There are three more or less popular ways of replacing functions at (or almost at) runtime. The classic example: write your own strcpy and make the application pick up your version instead of the library one at startup. The first two options are obviously not suitable, since they work only with exported functions, and we do not want to mark all the functions of our application with special attributes. Therefore, function hooking is our option!
In short, hooking works like this: the beginning of the old function is overwritten with a jump to the new function, so every call to the old function ends up executing the new code.

MSVC has the /hotpatch and /FUNCTIONPADMIN flags. The first writes 2 bytes at the beginning of each function that do nothing, so that they can later be overwritten with a "short jump". The second leaves empty space before the body of each function in the form of nop instructions for a "long jump" to the desired place; this way, in 2 jumps you can get from the old function to the new one. You can read more about how this is implemented in Windows and MSVC, for example, here.

Unfortunately, there is nothing similar in clang and gcc (at least on Linux and macOS). In fact, this is not a big problem: we will write directly over the beginning of the old function. In doing so, we risk trouble if our application is multi-threaded. If normally, in a multi-threaded environment, we restrict access to data from one thread while another thread modifies it, here we need to prevent one thread from executing code while another thread modifies that code. I have not figured out how to do this, so the implementation will behave unpredictably in a multi-threaded environment.
There is one subtle point. On a 32-bit system, 5 bytes are enough for us to jump anywhere. On a 64-bit system, if we do not want to clobber registers, we need 14 bytes. The catch is that 14 bytes is quite a lot at machine-code scale, and if the code contains a stub function with an empty body, it will likely be shorter than 14 bytes. I do not know the whole truth, but I spent some time in the disassembler while thinking, writing, and debugging this code, and I noticed that all functions are aligned on a 16-byte boundary (in a debug build without optimizations; I am not sure about optimized code). That means there are at least 16 bytes between the beginnings of any two functions, which is enough for us to hook them. Superficial googling led here, but I do not know for sure whether I just got lucky or all compilers do this today. In any case, if in doubt, simply declare a couple of variables at the beginning of a stub function so that it becomes large enough.
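As an illustration of what "writing on top of the old function" can look like on x86-64 Linux, here is a minimal sketch of my own (not the library's actual code): make the code page writable with mprotect and overwrite the first 14 bytes of the function with the absolute indirect jump discussed above (the bytes FF 25 00 00 00 00, i.e. `jmp [rip+0]`, followed by the 8-byte target address).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Overwrite the first 14 bytes of `oldFunc` with `jmp [rip+0]` plus an
// 8-byte absolute address, so every call to oldFunc lands in newFunc.
bool installHook(void* oldFunc, void* newFunc)
{
    const long pageSize = sysconf(_SC_PAGESIZE);
    auto start = reinterpret_cast<uintptr_t>(oldFunc);
    void* page = reinterpret_cast<void*>(start & ~uintptr_t(pageSize - 1));
    // Unprotect two pages in case the 14-byte patch straddles a page boundary.
    if (mprotect(page, size_t(pageSize) * 2, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        return false;
    }
    unsigned char patch[14] = {0xFF, 0x25, 0x00, 0x00, 0x00, 0x00}; // jmp [rip+0]
    auto target = reinterpret_cast<uint64_t>(newFunc);
    std::memcpy(patch + 6, &target, sizeof(target));
    std::memcpy(oldFunc, patch, sizeof(patch));
    return true;
}

// Two demo functions to hook (noinline so the call actually goes through
// the patched code).
__attribute__((noinline)) int oldVersion(int v) { return v * 2; }
__attribute__((noinline)) int newVersion(int v) { return v * 3; }
```

After installing the hook, a call to oldVersion executes newVersion instead. A real implementation must also worry about threads currently executing the patched function, as noted above, and about instruction-cache invalidation on non-x86 architectures.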
So, we have the first piece: a mechanism for redirecting calls from the old version of a function to the new one.
Now we need to somehow get the addresses of all (not only exported) functions of our program or of an arbitrary dynamic library. This can be done quite simply through the system API, as long as symbols have not been stripped from your application. On Linux this is the API from elf.h and link.h, on macOS from loader.h and nlist.h.

On Linux, using dl_iterate_phdr we walk over all loaded libraries and, in fact, the program itself. From the .symtab section we retrieve all the information about the symbols: the name, the type, the index of the section the symbol lives in, and its size; we also compute the symbol's "real" address from its virtual address and the library's load address.

There is one subtlety. When loading an ELF file, the system does not load the .symtab section (correct me if I am wrong), and the .dynsym section does not suit us, because through it we cannot get symbols with STV_INTERNAL and STV_HIDDEN visibility. Simply put, we would not see functions like this:
```cpp
// some_file.cpp
namespace
{
int someUsefulFunction(int value) // <-----
{
    return value * 2;
}
}
```
and such variables:
```cpp
// some_file.cpp
void someDefaultFunction()
{
    static int someVariable = 0; // <-----
    ...
}
```
So at this step we work not with the program image that dl_iterate_phdr gives us, but with the file loaded from disk and parsed with some ELF parser (or with the bare API). That way we will not miss anything. On macOS the procedure is similar, only the function names in the system API differ.
After that we filter the symbols and keep only:

- symbols of type STT_FUNC located in the .text section that have a non-zero size; this filter passes only the functions whose code is actually contained in this program or library;
- symbols of type STT_OBJECT located in the .bss section.

To reload the code, we also need to know where to get the source files and how to compile them.
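To make the filtering step above concrete, here is a simplified sketch of my own using the raw structures from elf.h: load the ELF file from disk, find the .symtab section and its string table, and keep the function symbols with non-zero size. (64-bit ELF only, minimal error handling; the .text section-index check is omitted for brevity, only the type and size are tested.)

```cpp
#include <elf.h>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file into memory.
static std::vector<char> readFile(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Collect the names of all STT_FUNC symbols with non-zero size from .symtab.
std::vector<std::string> collectFunctions(const char* path)
{
    std::vector<std::string> result;
    std::vector<char> buf = readFile(path);
    if (buf.size() < sizeof(Elf64_Ehdr)
        || std::memcmp(buf.data(), ELFMAG, SELFMAG) != 0) {
        return result; // not an ELF file we can handle
    }
    auto* ehdr = reinterpret_cast<const Elf64_Ehdr*>(buf.data());
    auto* shdrs = reinterpret_cast<const Elf64_Shdr*>(buf.data() + ehdr->e_shoff);
    for (int i = 0; i < ehdr->e_shnum; i++) {
        if (shdrs[i].sh_type != SHT_SYMTAB)
            continue; // .dynsym is SHT_DYNSYM and lacks hidden symbols
        auto* syms = reinterpret_cast<const Elf64_Sym*>(buf.data() + shdrs[i].sh_offset);
        size_t count = shdrs[i].sh_size / sizeof(Elf64_Sym);
        // sh_link is the index of the associated string table (.strtab).
        const char* strtab = buf.data() + shdrs[shdrs[i].sh_link].sh_offset;
        for (size_t j = 0; j < count; j++) {
            if (ELF64_ST_TYPE(syms[j].st_info) == STT_FUNC && syms[j].st_size > 0)
                result.push_back(strtab + syms[j].st_name);
        }
    }
    return result;
}
```

Running this on an unstripped binary (e.g. /proc/self/exe) yields the full list of defined functions, including internal and hidden ones.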
In the first implementation I read this information from the .debug_info section, which contains debug information in the DWARF format. For each translation unit (TU), to have its compile command recorded inside the DWARF, you have to pass the -grecord-gcc-switches flag when compiling. I parsed the DWARF with the libdwarf library, which ships together with libelf. Besides the compile command, DWARF can also provide information about the dependencies of our TUs on other files. But I abandoned this implementation for several reasons:
10 seconds at application startup just for this parsing was too much. After some deliberation, I rewrote the parsing logic from DWARF to compile_commands.json. This file can be generated by simply adding set(CMAKE_EXPORT_COMPILE_COMMANDS ON) to your CMakeLists.txt. This way we get all the information we need.
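For reference, compile_commands.json (the Clang JSON compilation database format) is simply an array of entries, one per TU, like this (paths here are illustrative):

```json
[
  {
    "directory": "/home/user/project/build",
    "command": "/usr/bin/c++ -I/home/user/project/include -g -std=c++14 -o CMakeFiles/app.dir/src/main.cpp.o -c /home/user/project/src/main.cpp",
    "file": "/home/user/project/src/main.cpp"
  }
]
```

The command field is exactly the compile command we need in order to rebuild the TU later.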
Since we abandoned DWARF, we need another way to track dependencies between files. Parsing the files by hand to find includes is something you really do not want to do, and who knows more about dependencies than the compiler itself? Clang and gcc have options that generate so-called depfiles almost for free. The make and ninja build systems use these files to track dependencies between files. Depfiles have a very simple format:
```make
CMakeFiles/lib_efsw.dir/libs/efsw/src/efsw/DirectorySnapshot.cpp.o: \
 /home/ddovod/_private/_projects/jet/live/libs/efsw/src/efsw/base.hpp \
 /home/ddovod/_private/_projects/jet/live/libs/efsw/src/efsw/sophist.h \
 /home/ddovod/_private/_projects/jet/live/libs/efsw/include/efsw/efsw.hpp \
 /usr/bin/../lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/string \
 /usr/bin/../lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/x86_64-linux-gnu/c++/7.3.0/bits/c++config.h \
 /usr/bin/../lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/x86_64-linux-gnu/c++/7.3.0/bits/os_defines.h \
 ...
```
The compiler puts these files next to the object file of each TU; we parse them and put them into a hash map. In total, parsing compile_commands.json plus the depfiles for the same 500 TUs takes a little over 1 second. For all of this to work, we need to globally add the -MD flag to the compile options of all project files.
There is one subtlety related to ninja. This build system generates depfiles for its own needs regardless of the -MD flag. But after they are generated, it translates them into its own binary format and deletes the original files. Therefore, when running ninja, you must pass the -d keepdepfile flag. Also, for reasons unknown to me, with make (and the -MD option) the file is named some_file.cpp.d, while with ninja it is called some_file.cpp.od. So you have to check for both variants.
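The parsing itself is trivial; here is a rough sketch of my own (not the library's code) that handles the backslash line continuations but ignores rarer depfile features such as escaped spaces in paths:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse a depfile of the form "target.o: dep1 dep2 \<newline> dep3 ...".
// Returns the list of dependency paths.
std::vector<std::string> parseDepfile(const std::string& content)
{
    // Replace each "\<newline>" line continuation with a space.
    std::string cleaned;
    for (size_t i = 0; i < content.size(); ++i) {
        if (content[i] == '\\' && i + 1 < content.size() && content[i + 1] == '\n') {
            ++i; // skip the backslash and the newline
            cleaned += ' ';
        } else {
            cleaned += content[i];
        }
    }
    // Everything after the first ':' is a whitespace-separated dependency list.
    std::vector<std::string> deps;
    size_t colon = cleaned.find(':');
    if (colon == std::string::npos)
        return deps;
    std::istringstream iss(cleaned.substr(colon + 1));
    std::string token;
    while (iss >> token)
        deps.push_back(token);
    return deps;
}
```

Each parsed dependency list then goes into the hash map keyed by the TU, as described above.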
Suppose we have code like this (a very synthetic example):
```cpp
// Singleton.hpp
class Singleton
{
public:
    static Singleton& instance();
};

int veryUsefulFunction(int value);

// Singleton.cpp
Singleton& Singleton::instance()
{
    static Singleton ins;
    return ins;
}

int veryUsefulFunction(int value)
{
    return value * 2;
}
```
We want to change the veryUsefulFunction function to:

```cpp
int veryUsefulFunction(int value)
{
    return value * 3;
}
```
When the code is reloaded, the dynamic library with the new code will contain, besides veryUsefulFunction, also the static variable static Singleton ins; and the method Singleton::instance. As a result, the program will start calling the new versions of both functions. But static ins in this library has not been initialized yet, so the first access to it will call the Singleton constructor, even though the program already has a live instance. We certainly do not want this. Therefore, the implementation transfers the values of all such variables that it finds in the compiled dynamic library from the old code into this new dynamic library, together with their guard variables.
There is one subtle and, in general, unsolvable problem.
Suppose we have a class:
```cpp
class SomeClass
{
public:
    void calledEachUpdate()
    {
        m_someVar1++;
    }

private:
    int m_someVar1 = 0;
};
```
The calledEachUpdate method is called 60 times per second. We change it by adding a new field:
```cpp
class SomeClass
{
public:
    void calledEachUpdate()
    {
        m_someVar1++;
        m_someVar2++;
    }

private:
    int m_someVar1 = 0;
    int m_someVar2 = 0;
};
```
If an instance of this class lives in dynamic memory or on the stack, the application will most likely crash after the code is reloaded. The allocated instance contains only the m_someVar1 variable, but after the reload the calledEachUpdate method will also try to modify m_someVar2, changing memory that does not actually belong to this instance, which leads to unpredictable consequences. In this case the logic of transferring state is left to the programmer, who must somehow save the object's state and delete the object before the code is reloaded, then create a new object and restore the state after the reload. For this, the library provides events in the form of the onCodePreLoad and onCodePostLoad delegate methods, which the application can handle.
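For illustration, here is how such a handler might look. The method names onCodePreLoad and onCodePostLoad are taken from the text above, but the listener type, its signatures, and everything else in this sketch are my assumptions, not the library's actual API:

```cpp
#include <memory>

// Hypothetical listener; only the two method names come from the article,
// the rest is an illustrative assumption.
struct MyReloadListener
{
    struct Player { int health = 100; };

    void onCodePreLoad()
    {
        // The layout of Player may change in the new code: save its state
        // and destroy the instance before the reload.
        savedHealth = player ? player->health : 0;
        player.reset();
    }

    void onCodePostLoad()
    {
        // Recreate the instance with the (possibly new) layout
        // and restore the saved state.
        player = std::make_unique<Player>();
        player->health = savedHealth;
    }

    std::unique_ptr<Player> player = std::make_unique<Player>();
    int savedHealth = 0;
};
```

The key point is that no object whose layout may have changed survives across the reload; only plain serialized state does.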
I do not know how (or whether it is possible at all) to resolve this situation in the general case; I will keep thinking about it. For now this case works "more or less okay" only for static variables, using the following logic:
```cpp
void* oldVarPtr = ...;
void* newVarPtr = ...;
size_t oldVarSize = ...;
size_t newVarSize = ...;
memcpy(newVarPtr, oldVarPtr, std::min(oldVarSize, newVarSize));
```
This is not quite correct, but it is the best I have come up with. As a result, the code will behave unpredictably if the set or layout of fields in data structures is changed at runtime. The same applies to polymorphic types.
How it all works together:

1. On startup, the library looks for compile_commands.json in the application directory and, recursively, in its parent directories, and extracts all the necessary information about the TUs.
2. It watches the source files; when one of them changes, it recompiles the file in the background using the command from compile_commands.json.
3. When a reload is triggered (in my case Ctrl+r is assigned to it), the library waits for the compilation processes to finish and links all the new object files into a dynamic library.
4. It loads this dynamic library with the dlopen function, transfers the static variables, and redirects the old functions to the new ones.

It works very well, especially when you know what is under the hood and what to expect, at least at a high level.
Personally, I was very surprised by the absence of such a solution for Linux. Is nobody really interested in this?

I will be glad to hear any criticism, thanks!
Source: https://habr.com/ru/post/435260/