
EA again, NFS again, bugs again. Let's fix them

Hi, Habr! The NFS speedrunning community is with you again, and once again we are repairing an old toy: NFS Most Wanted. I have already written about fixing its bugs in my previous articles, and today I want to take you a little deeper into the jungle of disassembly. If you're interested, read on.



Background


Once upon a time, back when EA published good NFS games, one of the most famous racing games came out: Most Wanted. Alas, it was not written as well as it sold, and it crashed from time to time. An ordinary player pays little attention to this: so it crashed once during a playthrough, no big deal. For us, though, it is a tremendous problem: who knows how many potential records have been killed by random crashes with no distinct symptoms. It all ended with KuruHS personally asking me to get to the bottom of it. I could not refuse.

What we have



IDA - for disassembly
Cheat Engine - for editing memory and instructions
Visual Studio - for debugging (tracepoints turned out to be a very convenient thing)

We have a pile of dumps. A decent pile, about 10 gigabytes. We'll start with them and work out which instructions the game crashes on. It crashes fairly randomly, although some patterns can be traced. While working on the problem, we found several potentially dangerous places that occasionally crash the game. For example:



One is in the string hash calculation function. Apparently the developers did not expect a null pointer here, so they never added a check for one, and in rare cases the game crashed. The fix is pretty banal: jump to the first empty piece of the executable and do test edi, edi there, then jz to the return and jmp back to where we came from.
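In C++ terms, the patch amounts to adding a guard like the one below. This is only a sketch: the name `hash_string`, the multiply-by-33 hash and the choice to return 0 for a null input are placeholders picked for illustration, not the game's actual routine.

```cpp
#include <cstdint>

// Hypothetical stand-in for the game's string hash; the real algorithm
// is not shown in the article.
std::uint32_t hash_string(const char* s) {
    // The added guard: the C++ equivalent of `test edi, edi` / `jz return`.
    if (s == nullptr) {
        return 0;  // assumed result for a null input
    }
    std::uint32_t h = 5381;
    for (; *s != '\0'; ++s) {
        h = h * 33u + static_cast<unsigned char>(*s);
    }
    return h;
}
```

Without the guard, the first `*s` would read through the null pointer, which is the same crash the dumps showed.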



Another similar case was found in the procedure at
00057D105 mov edx, [ecx]

Again the developers did not expect a null pointer there, so the game crashed. The fix is identical to the previous one.



The most common cause of crashes was in the AllocateMemory function. Attempts to disassemble it horrified everyone who worked on the crash problem. It did not help that the game has at least 5 different memory management subsystems. What have I gotten myself into...



Okay, no time to whine, we need to reverse it. Several evenings spent analyzing this mess bore fruit: the code, although still unreadable, became more understandable. Apparently this subsystem works according to the standard scheme: grab a large amount of memory at once, split it into blocks, and store them in a doubly linked list; on request, hand out free blocks, and if there are none, try to take more from the system. Ah, 2005, when memory operations were too expensive to squander left and right...
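To make the "standard scheme" concrete, here is a minimal sketch of such a pool. It is not a reconstruction of the game's allocator: the names `BlockPool` and `BlockHeader`, the fixed block size and the use of `malloc` as "the system" are all my assumptions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Header stored in front of every block; free blocks form a doubly linked list.
struct BlockHeader {
    BlockHeader* prev = nullptr;
    BlockHeader* next = nullptr;
};

class BlockPool {
public:
    BlockPool(std::size_t block_size, std::size_t blocks_per_chunk)
        : stride_(round_up(sizeof(BlockHeader)) + round_up(block_size)),
          blocks_per_chunk_(blocks_per_chunk == 0 ? 1 : blocks_per_chunk) {}
    ~BlockPool() {
        for (void* chunk : chunks_) std::free(chunk);
    }
    BlockPool(const BlockPool&) = delete;
    BlockPool& operator=(const BlockPool&) = delete;

    // Hand out a free block; ask the system for another chunk if none is left.
    void* allocate() {
        if (free_head_ == nullptr && !grow()) return nullptr;
        BlockHeader* block = free_head_;
        unlink(block);
        return reinterpret_cast<unsigned char*>(block) + round_up(sizeof(BlockHeader));
    }

    // Put a block back at the front of the free list.
    void release(void* payload) {
        if (payload == nullptr) return;
        push_front(reinterpret_cast<BlockHeader*>(
            static_cast<unsigned char*>(payload) - round_up(sizeof(BlockHeader))));
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }
    void push_front(BlockHeader* block) {
        block->prev = nullptr;
        block->next = free_head_;
        if (free_head_ != nullptr) free_head_->prev = block;
        free_head_ = block;
    }
    void unlink(BlockHeader* block) {
        if (block->prev != nullptr) block->prev->next = block->next;
        else free_head_ = block->next;
        if (block->next != nullptr) block->next->prev = block->prev;
        block->prev = block->next = nullptr;
    }
    // Grab one large chunk from the system and split it into blocks.
    bool grow() {
        auto* chunk = static_cast<unsigned char*>(std::malloc(stride_ * blocks_per_chunk_));
        if (chunk == nullptr) return false;
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < blocks_per_chunk_; ++i) {
            push_front(new (chunk + i * stride_) BlockHeader{});
        }
        return true;
    }

    std::size_t stride_;
    std::size_t blocks_per_chunk_;
    BlockHeader* free_head_ = nullptr;
    std::vector<void*> chunks_;
};
```

A single stale `prev` or `next` in a structure like this is enough to send the next allocation into garbage, which matches the kind of crash seen in the dumps.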



Some parts of this function give me headaches, because my brain refuses to even try to process them. But one thing is clear: somewhere among all these linked lists made of linked lists lies a bad pointer, and that is what brings everything down. The only solution that came to mind was to disable the “use_best_fit” check, so that the subsystem hands out the first available free block rather than searching for the one it considers the best fit.
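The difference between the two strategies can be sketched as follows. Only the flag name `use_best_fit` comes from the game; the `FreeBlock` structure, the `pick_block` function and the use of `std::list` are illustrative assumptions.

```cpp
#include <cstddef>
#include <list>

// Hypothetical description of one free block in the pool.
struct FreeBlock {
    std::size_t offset;  // position inside the pool
    std::size_t size;    // bytes available
};

// Returns an iterator to the chosen block, or free_list.end() if nothing fits.
std::list<FreeBlock>::iterator pick_block(std::list<FreeBlock>& free_list,
                                          std::size_t wanted,
                                          bool use_best_fit) {
    auto best = free_list.end();
    for (auto it = free_list.begin(); it != free_list.end(); ++it) {
        if (it->size < wanted) continue;   // too small, keep looking
        if (!use_best_fit) return it;      // first fit: stop at the first match
        if (best == free_list.end() || it->size < best->size) {
            best = it;                     // best fit: remember the tightest match
        }
    }
    return best;
}
```

One plausible reason the change helps is that first fit walks fewer list nodes, so it is less likely to step on a corrupted one. That is a guess on my part, not something the disassembly confirmed.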

Of course, this did not solve the whole problem, but the game did become noticeably more stable: it crashed at this particular spot only a few times during a week of testing (bearing in mind that KuruHS spends 10 hours a day in the game), which I consider a pretty good result.

Pure virtual function call


This is the error shown in the header image. People familiar with C++ will immediately understand what the problem is. Without the source code, however, everything becomes much harder. The situation is made worse by the CRT, which, like a captured partisan, stubbornly refuses to produce dumps when it catches this type of error.

Purecall means that the code tried to call a pure virtual function (a virtual function that the class declares but does not implement). Obviously it cannot do that, so the only thing the runtime does is inform the user and exit with code 0. As a result, the process appears to have ended cleanly, when in reality everything is bad.

Thanks to Microsoft for a great feature, _set_purecall_handler, which lets you replace the purecall handler. We look for references to it in the executable and find the function itself. All that remains is to write our own handler and not forget to install it. To do this, we need a large enough piece of unused code in the file that we can overwrite with our own. A short search showed that the _CxxThrowException function would do (no references to it were found). We mercilessly overwrite its entire body with nops and start building on top of it:



Here is what the pseudo-code of the new procedures looks like:

new_handler:
    xor eax, eax                    ; return *(0);
    mov eax, [eax]                  ; read from address 0 to crash on purpose
    ret
set_handler:
    push new_handler
    call _set_purecall_handler      ; _set_purecall_handler(new_handler);
    add esp, 4                      ; cdecl, the caller cleans up the stack
    ret
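For readers who prefer C++ to assembly, the same two procedures might look roughly like this. `_set_purecall_handler` is the real Microsoft CRT function; `crash_on_purecall`, `install_purecall_handler` and `kHasPurecallHook` are names I made up for the sketch, and the non-MSVC branch exists only so the snippet compiles elsewhere.

```cpp
#include <cstdlib>
#ifdef _MSC_VER
#include <stdlib.h>  // declares _set_purecall_handler in the Microsoft CRT
constexpr bool kHasPurecallHook = true;
#else
constexpr bool kHasPurecallHook = false;
#endif

// Same idea as new_handler: read through a null pointer so the process
// dies with an access violation (and a dump) instead of exiting quietly.
void crash_on_purecall() {
    volatile int* null_ptr = nullptr;
    int value = *null_ptr;  // deliberate crash
    (void)value;
    std::abort();           // fallback in case the read somehow did not fault
}

// Same idea as set_handler. Returns true if a handler was actually installed.
bool install_purecall_handler() {
#ifdef _MSC_VER
    _set_purecall_handler(crash_on_purecall);
    return true;
#else
    return false;  // _set_purecall_handler is specific to the Microsoft CRT
#endif
}
```

The handler has to be installed once, early, before the first bad call can happen.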

We assemble it (in my case, by typing it into Cheat Engine by hand) and insert it into the code:



Now we need a suitable place to call this procedure from. I didn't find a truly suitable one, but I did find a wonderful empty function right in the game's main loop, so I replaced the call to it with a call to the function we wrote. We make the patch, and it is ready to test.



The only problem is that this error is quite rare, and nobody wants to play aimlessly for hours. I nevertheless decided to test a bit myself and was pleasantly surprised: the game crashed after literally 10 minutes of gameplay, and it crashed in the code I had just written. Moving a little higher up the call stack:

 0043E005 call dword ptr [edx+80h] 

I can say nothing except: “Yes, this is a virtual function call.” My very first thought: what if we do without it? We cut it out with nops and test, and everything seems alive. The game works as it should, with no side effects. We build the patch and send it off for testing. A day later a dump arrives in which the same procedure crashes a few bytes further down. I cut that call out too, and the game starts crashing. Everything points to the need for a more serious solution. But nothing comes to mind, so it is postponed indefinitely.
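The nop-out step from the first attempt can be sketched on a plain byte buffer. The instruction `call dword ptr [edx+80h]` encodes as six bytes (FF 92 80 00 00 00), so six single-byte NOPs (0x90) replace it. The helper name `nop_out` and the idea of working on a copy of the bytes are my assumptions; in practice the edit was made directly in Cheat Engine.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kNop = 0x90;  // x86 single-byte NOP

// Overwrite `length` bytes starting at `offset` with NOPs.
// Returns false and leaves the buffer untouched if the range is out of bounds.
bool nop_out(std::vector<std::uint8_t>& code, std::size_t offset, std::size_t length) {
    if (offset > code.size() || length > code.size() - offset) return false;
    for (std::size_t i = 0; i < length; ++i) {
        code[offset + i] = kNop;
    }
    return true;
}
```

The replacement must cover the whole instruction: leaving part of the original six bytes behind would make the CPU decode the leftovers as something else entirely.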

Overnight I managed to think it all over and reached a conclusion. You say C++ cannot determine the type of an object at runtime? I say it can, and very simply: by the address of its virtual table in memory. Having studied the dumps, I concluded that from time to time an object of the wrong class (vtbl @ 0x00890970) arrives in the procedure, which means we can catch this situation:

    cmp edx, 00890970h
    jnz good_class
    xor eax, eax
    jmp return
good_class:
    call dword ptr [edx+80h]
    jmp continue
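The same check can be illustrated in C++. The sketch assumes that the vtable pointer is the first pointer-sized word of the object, which holds for single inheritance on the common ABIs (MSVC and Itanium) but is not guaranteed by the language. The classes `Shape`, `Triangle` and `Broken` and both function names are invented for the example.

```cpp
#include <cstdint>
#include <cstring>

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};
struct Triangle : Shape { int sides() const override { return 3; } };
struct Broken : Shape { int sides() const override { return -1; } };

// Read the vtable pointer, assumed to be the first pointer-sized word of the object.
std::uintptr_t vtable_address(const Shape& object) {
    std::uintptr_t address = 0;
    std::memcpy(&address, static_cast<const void*>(&object), sizeof(address));
    return address;
}

// Guarded call: skip the virtual call and return 0 when the object's
// vtable is the known-bad one, mirroring the cmp / jnz / xor eax, eax above.
int guarded_sides(const Shape& object, std::uintptr_t bad_vtable) {
    if (vtable_address(object) == bad_vtable) return 0;
    return object.sides();
}
```

In the game the bad address is the constant 0x00890970 taken from the dumps; here it would be obtained from an instance, for example `vtable_address(Broken{})`.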

But there is a catch: this check takes up a lot of space, and it has to be built into the procedure. There is no single spot big enough; all we have is a handful of empty chunks of a few bytes each just before the function. At least there are many of them and they are close by. So we write spaghetti and jump from one place to another after almost every instruction:



Digression
Maybe I got a little carried away, and it would have been better to put this into the former _CxxThrowException function, since I had already cleared it out. But alas, I did what I did. I'll try to redo this fix one of these days.

Patch and run. And we hit the same problem: this crash is so rare that in almost 4 hours of testing this piece of code ran just a couple of times, and every time the object at the input was of the correct class.

I could have left it at that, but I needed confirmation that the fix really works. So we go on reversing and try to trigger the exceptional situation by hand.

A quick inspection showed that the game can crash only if one of the arguments is non-zero. The procedure itself is called from only two places, and in one of them it is called with that very argument set to 0. So we turn to the other calling function.



We strip out as many of the “extra” checks as we can and try to force this function to run. We start the test and finally get the wrong class at the input. We wait for the Visual Studio debugger to finish printing all its text, the game stutters and... keeps working. Hooray!


The screenshot is blurry because it was taken from a stream recording

Conclusion


The solution is found: the game no longer crashes even if something wrong is passed in as input. This is visible in the screenshot above: part of the barrier is missing, because the game tried to put something wrong there. What exactly is a mystery shrouded in darkness, but I am sure that sooner or later we will find that out as well.

Overall, the situation has improved noticeably: KuruHS was able to spend about 20 hours in the game without a single crash, which previously would have been simply impossible.

I decided to package all the fixes as an asi script, along the lines of the Widescreen patches from ThirteenAG. You can read the sources and download the scripts on GitHub.

Thanks for your attention!

Source: https://habr.com/ru/post/349296/

