Not so long ago I came across an interesting non-standard problem that made me think about two hours. It has mixed several factors, such as some curvature of the product architecture, insufficient logging and features of other software already installed. At once I will make a reservation that by and large the view of the problem and the solution will be from the philistine bell tower (in programming I shave a little more than nothing), but in this case it was this fresh look that was key in the final decision.

')
Actually the problem:
After installing
Acronis vmProtect Windows Agent (upgrade from version 8 to version 9), an agent for backing up virtual machines running VMware ESX / ESXi, the main Acronis vmProtect Managed Machine Service (
vmms.exe ) service refused to start. After starting the service spontaneously terminated within a few seconds:

Our dances with and without tambourines:
To begin with, we tried the standard dances with tambourines ala “reinstall under another user, reboot, etc.”, as obviously the problem did not reproduce in our QA department and accordingly the conclusion was that something was wrong with a particular installation of the user . As you can guess, these dances did not lead to anything and the service still stubbornly refused to start normally. The following lines were found in our service logs:

We knew that the
MigrationManager component was being loaded inside the main library
VELightBundle.dll and the fact that the service had successfully started earlier was known, and the problem arose only after upgrading from version 8 to version 9. The first assumption was made that
VELightBundle.dll was incorrectly updated and the old one remained. Let me clarify that by this time the investigation of the problem was already taking place on the user's side through TeamViewer, so it was quite funny for him to watch our attempts. All this time there was a lively discussion on Skype between me and the developer (the sky began to look like a sieve):
[10/31/2013 12:17:35] Developer: Listen on his machine, the dll part was created on 20.08 and on 28.10.
[10/31/2013 12:17:46 PM] Developer: I understand he didn’t have an update.
[10/31/2013 12:18:53 PM] Developer: For example, DiskBundle on the 20th, and velight 28
[10/31/2013 12:27:21] Developer: strange
[10/31/2013 12:27:49 PM] Developer: it's just that the file is not from 9k
[10/31/2013 12:28:13] Chineek: from 8 chtoli
[10/31/2013 12:28:39] Chineek: if I were you, I would have just planted all the RIGHT ones for myself) well, either aninstall, install
[10/31/2013 12:28:48] Chineek: Aninstall - remove all that is extra in the long run - install
Unfortunately, even the use of the standard method “in any incomprehensible situation if something does not work, delete and install as it should work” in this case did not help and had to use “heavy artillery” in the person of one of my favorite utilities
Process Monitor . The log was filmed from the moment of the start of the Acronis vmProtect Managed Machine Service until the moment of its spontaneous stop. The log was set to filter the process vmms.exe and viewed messages about trying to download
VELightBundle.dll . I will not describe how long it took to search for these particular lines (I was helped by removing the similar log from the system where the service started correctly and comparing it with the problem log regarding the boot process of
VELightBundle.dll ), but this is what
happened :

Decision
As you can see, during the download process,
VELightBundle.dll tried to load the shared 3rd-party library
libxml2.dll (we use it to generate certain xml reports starting from version 9), but instead of the correct path
C: \ Program Files (x86) \ Common Files \ Acronis \ vmProtect \ Common \ , it began to load from
C: \ Program Files (x86) \ ATI Stream \ bin \ x86_64 \ , because was present in both places (each had its own unique version of this library):

From the Process Monitor log you can see that the first place where libxml2.dll is searched for is the product installation folder, therefore, by simply copying libxml2.dll from
C: \ Program Files (x86) \ Common Files \ Acronis \ vmProtect \ Common \ to
C: \ Program Files (x86) \ Acronis \ vmProtect \ Windows Agent \ initial problem was solved and the service started correctly. Hooray!
How did this happen?
What have we missed? Where did
C: \ Program Files (x86) \ ATI Stream \ bin \ x86_64 \ come from?
The answer to the second question is fairly obvious: loading the .dll goes by traversing the paths from the% path% variable if it is not found in the same folder where the original process was started. As you can see in the following screenshot,
C: \ Program Files (x86) \ ATI Stream \ bin \ x86_64 \ is indeed listed and therefore was selected to search and download
libxml2.dll :
An important explanation: I took screenshots after I reproduced the problem locally on my machine. While reproducing and writing this article, I downloaded
libxml2.dll from the first available link and copied it to the first path mentioned in% path%, which was
C: \ Program Files (x86) \ ATI Stream \ bin \ x86_64 \ . In the case of a real user,
libxml2.dll was in the installation folder of the
Norman antivirus, which, as it turned out, also registers itself in% path%.
Having continued to get to the bottom of the developers, I confirmed for myself the following thing:
All third-party (3rd-party) .dll files are added to
C: \ Program Files (x86) \ Common Files \ Acronis \ vmProtect \ Common \ folder, which, however, is not written into the% path% variable. The .dll data is jerked from our own libraries, but in an indirect way, that is, only if they are not found anywhere in the standard paths
C: \ Windows \ ,
C: \ Windows \ Syswow64 \ (system32) or% path%. In other words, there is a bug /
dirty hack in the logic of loading third-party libraries.
Once again, Process Monitor
saved the world helped solve a nonstandard problem, while finding a reason by remote debugging via VPN would take much longer.
I hope that this experience will prevent developers from writing buggy code, and technical support will provide another means to investigate problems with the launch of services. Agree, it is pleasant to receive similar confessions from developers:
[10/31/2013 2:56:21 PM] Developer: Thank you. I myself would have come to this for a long time.
PS I will quote our developers (I hope for other developers this phrase will make sense, and not a set of characters for me :)): “Well, what can I add. The fact is that we have a hook on delayload, which this kind of problem is designed to solve. Now does not solve. "