After updating to the latest AMD 16.4.2 drivers in late April, I discovered that all DirectX12 applications had stopped working. Not surprised in the least, I decided to wait for a fix and put DirectX12 aside. But months went by, and new driver releases changed nothing.
A Google search shows that the problem is widespread (one, two, three, four), yet AMD has not reacted at all. A user of the AMD forums, tapek, found out by debugging that the cause is the newer drivers' use of the popcnt instruction from the SSE4.2 set, which the affected processors do not support.
Loading one of the problematic libraries (amdxc32.dll) into Hiew and searching for the popcnt opcode, F3 0F B8, we find that the instruction is used a whole three times! So it is hardly essential there, and we can come up with a replacement. The instruction stores in its first operand the number of set bits in its second operand.
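The same search can be sketched in code. This is only an illustration, assuming the library has already been read into a byte buffer; the names `Bytes`, `find_pattern` and `find_popcnt_candidates` are hypothetical and not part of Hiew or the driver. A raw byte match can also hit data or the middle of another instruction, so every hit still needs to be checked in a disassembler, and in 64-bit code a REX prefix may sit between F3 and 0F.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: offsets of every occurrence of `pattern` in `image`.
std::vector<std::size_t> find_pattern(const Bytes& image, const Bytes& pattern) {
    assert(!pattern.empty());
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + pattern.size() <= image.size(); ++i) {
        bool match = true;
        for (std::size_t j = 0; j < pattern.size(); ++j) {
            if (image[i + j] != pattern[j]) {
                match = false;
                break;
            }
        }
        if (match) hits.push_back(i);
    }
    return hits;
}

// Candidate popcnt sites: the three-byte sequence F3 0F B8.
std::vector<std::size_t> find_popcnt_candidates(const Bytes& image) {
    return find_pattern(image, Bytes{0xF3, 0x0F, 0xB8});
}
```

Calling `find_popcnt_candidates` on the contents of amdxc32.dll would return the file offsets to inspect by hand.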
To replace popcnt, we take Brian Kernighan's algorithm.
In C++, it looks like this:
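    // Counts the set bits of value; unsigned avoids overflow in value - 1.
    int kernigan(unsigned int value) {
        int count = 0;
        while (value != 0) {
            value &= (value - 1);  // clears the lowest set bit
            count++;
        }
        return count;
    }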
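To see why the loop above works: each `value &= value - 1` clears exactly one set bit, the lowest one, so the loop runs once per set bit. For example, 0b1011000 becomes 0b1010000, then 0b1000000, then 0, giving a count of 3. A minimal sketch that checks the same loop against the standard library follows; the names `kernighan_popcount`, `reference_popcount` and `check_samples` are made up for this illustration, and a 32-bit operand is assumed.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// The same loop as above, written for an unsigned 32-bit value.
int kernighan_popcount(std::uint32_t value) {
    int count = 0;
    while (value != 0) {
        value &= value - 1;  // clears the lowest set bit
        ++count;
    }
    return count;
}

// Reference result from the standard library.
int reference_popcount(std::uint32_t value) {
    return static_cast<int>(std::bitset<32>(value).count());
}

// Compares both implementations on a few sample values.
void check_samples() {
    const std::uint32_t samples[] = {0u, 1u, 0x58u, 0x80000000u, 0xFFFFFFFFu};
    for (std::uint32_t v : samples) {
        assert(kernighan_popcount(v) == reference_popcount(v));
    }
}
```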
In assembly:
        push ebx
        push ecx
        xor eax, eax            ; eax = bit count
        mov ebx, value
    kernigan_start:
        cmp ebx, 0
        jz kernigan_end
        add eax, 1
        mov ecx, ebx
        sub ebx, 1
        and ebx, ecx            ; clear the lowest set bit
        jmp kernigan_start
    kernigan_end:
        pop ecx
        pop ebx
        retn
Next, we look for unused, zero-filled space at the end of the code section. That is where we will write our code.
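Locating that space can be sketched as follows, assuming the raw bytes of the code section are already in a buffer. The helper names and the `margin` parameter are hypothetical. The margin is there because the last real instruction may itself end in zero bytes, so it is safer not to start writing at the very first zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: offset at which the trailing run of zero bytes begins.
// Returns section.size() if the section does not end in zeros.
std::size_t trailing_zero_run_start(const Bytes& section) {
    std::size_t start = section.size();
    while (start > 0 && section[start - 1] == 0) {
        --start;
    }
    return start;
}

// True if the trailing zero run can hold `needed` bytes of new code
// while leaving `margin` zero bytes untouched after the last non-zero byte.
bool cave_has_room(const Bytes& section, std::size_t needed, std::size_t margin) {
    assert(needed > 0);
    const std::size_t available = section.size() - trailing_zero_run_start(section);
    return available >= needed + margin;
}
```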
Then we find a popcnt instruction in the library.
And replace it with a jump to our code.
At the location found earlier, we write our code and then return control to the point we jumped from.
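The bytes of such a jump can be worked out by hand or with a small helper. The sketch below is an assumption about how one might do it, not the exact patch used here: it encodes a near `jmp rel32` (opcode E9 followed by a 32-bit little-endian displacement measured from the end of the jump) and pads the rest of the overwritten instructions with NOPs. The function name and the `patch_len` parameter are hypothetical, and 32-bit addresses are assumed. Since the jump takes 5 bytes, it may overwrite more than the popcnt instruction alone, in which case the displaced instructions have to be reproduced in the new code as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes of a `jmp rel32` located at address `from`
// that lands at address `to`, padded with NOPs (0x90) up to `patch_len` bytes.
std::vector<std::uint8_t> encode_jmp_rel32(std::uint32_t from, std::uint32_t to,
                                           std::size_t patch_len) {
    assert(patch_len >= 5);  // E9 plus a 4-byte displacement
    // The displacement is relative to the next instruction (from + 5).
    // Unsigned arithmetic wraps modulo 2^32, which also covers backward jumps.
    const std::uint32_t rel = to - (from + 5u);
    std::vector<std::uint8_t> bytes(patch_len, 0x90);
    bytes[0] = 0xE9;
    for (int i = 0; i < 4; ++i) {
        bytes[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));  // little-endian
    }
    return bytes;
}
```

The same helper can produce the jump back from the end of the new code to the instruction that follows the patched site.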
After that, we repeat the steps above for the remaining popcnt instructions, both in this library and in amdxc64.dll, replace the original files with the patched ones, and get working DirectX12 again without SSE4.2.
P.S. Link to my modified library for the 16.9.1 drivers of September 13.