
How would you solve this compatibility problem? The answer

As a reminder, this is still a translation of Raymond Chen's post, not a reply to the comments on the previous post here on Habr, although the suggestions in the comments here and there are quite similar.

Let us examine several solutions proposed in the comments.


Some people missed the fact that the main failure scenario involves NAS devices. These consist of a large disk, a small computer, and a socket for the power cord. The operating system of that small computer lives in ROM and can be replaced only by reflashing the whole firmware. You have to wait for the vendor to release an update, so simply downloading and installing a fixed driver is not an option.
As for detecting a buggy driver: the CIFS protocol (also known as SMB) does not give the client enough information to determine which version is running on the server. There is only a “family” field, which reports the general category of the server (OS/2, Samba, Windows NT, ...). All the client can say is “Well, the server is running some version of Samba.” It cannot tell whether this is a buggy version or a fixed one. The only way to find out is to start talking to the server and see how it ends.
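To make the limitation concrete, here is a minimal Python simulation. The `ServerInfo` class and its `version` and `has_fast_query_bug` fields are hypothetical; the only assumption taken from the text is that the client sees nothing beyond the family string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerInfo:
    """Hypothetical server description; only `family` travels over the wire."""
    family: str               # e.g. "Samba", as reported by the protocol
    version: str              # known to the server, never sent to the client
    has_fast_query_bug: bool  # the property the client would like to know


def client_visible(server: ServerInfo) -> str:
    """What the client can actually observe: the family field alone."""
    return server.family


buggy = ServerInfo(family="Samba", version="old", has_fast_query_bug=True)
fixed = ServerInfo(family="Samba", version="patched", has_fast_query_bug=False)

# Both servers look identical from the client's side, so no lookup keyed on
# the visible data can tell which one has the bug.
print(client_visible(buggy) == client_visible(fixed))  # True
```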

We also cannot simply re-read the list when an error occurs: by the time the error is detected, part of the list has already been returned to the client. The application calls IShellFolder::EnumObjects, and the shell issues a fast query. Each time the application calls IEnumIDList::Next, it receives the next result. After about 100 items have been returned, oops, it turns out that the server is one of the bad ones that break the fast query. Now what? You cannot go back in time and take away all the items the application has already received.
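A minimal Python sketch of why a mid-stream failure cannot be undone, with a generator standing in for an IEnumIDList-style enumerator. The `fast_query_listing` function, the `ServerError` exception, and the failure after exactly 100 items are hypothetical.

```python
class ServerError(Exception):
    """Hypothetical stand-in for the error the server returns mid-listing."""


def fast_query_listing(names, fail_after):
    """Yield names one at a time, like repeated calls to a Next() method,
    then fail once `fail_after` items have been handed out."""
    for i, name in enumerate(names):
        if i == fail_after:
            raise ServerError("server rejected the fast query")
        yield name


files = [f"file{i:03}.txt" for i in range(250)]
received = []  # what the application has already consumed
try:
    for name in fast_query_listing(files, fail_after=100):
        received.append(name)  # the application acts on each item immediately
except ServerError:
    # The error surfaces only now; the first 100 items are already in the
    # application's hands, and the enumerator has no way to take them back.
    pass

print(len(received))  # 100
```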

Another suggestion was to return a new error code such as ERROR_PLEASE_RESTART, meaning “Hmm, the server is having problems. Please try again, only slower.” This is practically the same as doing nothing, because the server has already returned a specific error, namely STATUS_INVALID_LEVEL. That error does not really mean “Please try again”; it means “Sorry, I can't do that.” It can legitimately occur, for example, when you ask the server for a response in “fast” mode and the server does not support it. Either way, the effect from the program's point of view is the same: “If FindNextFile returned error xyz, the server is having problems and you have to repeat the request.” Call it ERROR_PLEASE_RESTART, STATUS_INVALID_LEVEL, or PURPLE_LILACS. Whatever you choose, the result is the same: existing code must be changed to know about the new error and react to it correctly. Programs that are not changed will behave strangely.
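A minimal Python simulation of this argument. `open_listing` is a hypothetical stand-in for a FindFirstFile/FindNextFile loop, and the failure at the fourth item is arbitrary; the only behavior assumed for existing programs is that they treat any error as the end of the listing.

```python
ERROR_NO_MORE_FILES = "ERROR_NO_MORE_FILES"
ERROR_PLEASE_RESTART = "ERROR_PLEASE_RESTART"  # the proposed new error code


def open_listing(names, fast, fail_at=3):
    """Hypothetical enumerator: yields (name, None) for each entry, or
    (None, error_code) when the listing stops."""
    for i, name in enumerate(names):
        if fast and i == fail_at:
            yield None, ERROR_PLEASE_RESTART  # the server chokes on fast mode
            return
        yield name, None
    yield None, ERROR_NO_MORE_FILES


def legacy_list(names):
    """Existing code: any error simply ends the loop."""
    result = []
    for name, err in open_listing(names, fast=True):
        if err is not None:
            break
        result.append(name)
    return result


def updated_list(names):
    """Code rewritten to know about the new error and retry in slow mode."""
    result = []
    for name, err in open_listing(names, fast=True):
        if err == ERROR_PLEASE_RESTART:
            # Start over in slow mode, discarding the partial fast results.
            return [n for n, e in open_listing(names, fast=False) if e is None]
        if err is not None:
            break
        result.append(name)
    return result


files = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
print(legacy_list(files))   # ['a.txt', 'b.txt', 'c.txt'] (silently truncated)
print(updated_list(files))  # ['a.txt', 'b.txt', 'c.txt', 'd.txt', 'e.txt']
```

Whatever the new code is named, only `updated_list` benefits from it; `legacy_list` still returns a truncated listing without noticing.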

The fight against this bug actually lasted several months. During that time, many more devices turned up that did not work correctly in “fast” mode. Some were based on unpatched versions of Samba; others used their own implementations of the SMB protocol, and the fix found for Samba did not work for them. To make matters worse, most of these devices were budget models that offered no way to upgrade the firmware.

There were even reports of similar problems on some fully patched mainstream Linux distributions.

Moreover, these devices mishandled “fast” queries in completely different ways. For example, one I dealt with returned no error codes at all. It simply returned garbage data: it dropped the first five characters of each file name and returned the rest. How do you detect an error like that? If the server says “OK, I have a file named e.txt,” should Windows reply, “Oh, I don't think so. I bet you're one of those bad servers that skip the first five characters, and you actually mean readme.txt”? What if the server really does have a file named e.txt?
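A small Python sketch of why this kind of corruption is undetectable from the client side. `buggy_server_name` is a hypothetical model of the device described above, and `xxxxxe.txt` is an arbitrary example name.

```python
def buggy_server_name(real_name: str) -> str:
    """Hypothetical model of the faulty device: drops the first five
    characters of every file name."""
    return real_name[5:]


def correct_server_name(real_name: str) -> str:
    """A well-behaved server returns the name unchanged."""
    return real_name


# A correct server that really has e.txt, and a buggy one that has readme.txt:
from_correct = correct_server_name("e.txt")
from_buggy = buggy_server_name("readme.txt")
print(from_correct == from_buggy)  # True: the client receives "e.txt" either way

# The mapping cannot even be reversed: different real names collapse together.
print(buggy_server_name("readme.txt") == buggy_server_name("xxxxxe.txt"))  # True
```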

One of the devices simply crashed when accessed in “fast” mode. Others hung and had to be rebooted. “Oh great, somebody brought his Windows Vista laptop to work and plugged it into the corporate network. Our file server crashed again.”

Such a variety of failures in “fast” mode made automatic detection of buggy servers unrealistic, especially when the server returns well-formed but incorrect data. And even when detection does work, it is no help if the server simply crashes.

So the decision was made to stop using “fast” queries for anything other than local drives. The drivers for the most popular file systems (NTFS, FAT, CDFS, UDF) are entirely under Microsoft's control and have been tested for compatibility with fast mode.
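The resulting rule can be sketched as a small policy function in Python. `VolumeKind` and `choose_query_mode` are hypothetical names; the source states only the rule itself (fast queries for local drives only), not how Windows implements it.

```python
from enum import Enum


class VolumeKind(Enum):
    """Hypothetical classification of where a directory lives."""
    LOCAL = "local"      # e.g. NTFS, FAT, CDFS, UDF: drivers Microsoft controls
    NETWORK = "network"  # any SMB/CIFS server, whatever its implementation


def choose_query_mode(kind: VolumeKind) -> str:
    """Use the fast query only where the driver is known to handle it;
    everything else gets the slower query."""
    return "fast" if kind is VolumeKind.LOCAL else "slow"


print(choose_query_mode(VolumeKind.LOCAL))    # fast
print(choose_query_mode(VolumeKind.NETWORK))  # slow
```

The design trades speed on network shares for never having to guess whether a given server is one of the bad ones.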

This is all sad, but that’s the price of compatibility.

Raymond has several more posts analyzing suggestions like these, and I won't translate them all. For anyone interested:

Adding flags to APIs to work around driver bugs doesn't scale

Be very careful

It was a flag

Source: https://habr.com/ru/post/108523/

