Invisible File Mystery

image

Not long ago, having finished my next article for Habr, I decided to send it to a friend for review. I saved the HTML page together with all of its resources (pictures, styles, etc.), packed it into a ZIP archive, and sent it off. Within five minutes I got feedback which, contrary to my expectations, had nothing to do with the article itself: the archive was completely empty. I scratched my head, decided that I had botched the archiving, and repeated the procedure, making sure to select all the files that needed packing. A few minutes later my friend replied again, this time in surprise: “Are you kidding me?” But I was not joking at all.

I started putting the pieces of the puzzle together. First, I asked what he was using to open the archive. What if he was using some third-rate viewer from an unknown developer? However, it turned out to be the default explorer.exe. I had used Total Commander both to pack the archive and to view the result, and in my case it was not empty at all:
image

Could it really be some pumped-up xxxWindowsUltimateEditionxxx build acting up? I tried to open the same archive on my own computer with explorer.exe and finally believed my friend. The archive really did look empty:

image

Who is to blame for this behavior? Let's figure it out.

How the investigation went and what came of it is described below (careful, lots of screenshots). Before reading this article, I also strongly recommend that you look through the previous ones.

Experimentally, I found that the problem reproduces, at a minimum, when the name of the archived file contains the character '«' (for example, "«some_file.txt"). Next, I found out that when 7-Zip is used as the archiver, the contents of the resulting archive are displayed in explorer.exe without any trouble. Checking the “problematic” archive for errors with 7-Zip's built-in tools also revealed nothing:

image

By the way, did you notice that instead of the characters '«' and '»' in the original archive, Total Commander shows an underscore ('_')? The 7-Zip File Manager, in turn, replaced '«' with '<':

image

What is going on? Let's not guess. Instead, let's see how the ZIP archives differ when one is packed with Total Commander's built-in archiver and the other with 7-Zip.

To begin with, we create a minimal example that reproduces the problem. I settled on an empty file named "«some_file.txt" (the resulting archive will be called "«some_file.zip"). Next, we archive it both ways without compression and open both resulting archives in XVI32:

Problem archive
image

Normal archive
image

Even with the naked eye you can see that their contents differ. That alone should not raise any eyebrows yet, because archivers can easily write information about their beloved selves into various “additional” fields. To find out exactly how these files differ, let's consult the specification and a third-party description of the ZIP format and split our bytes into their component parts:

Problem archive
 Local file header

 50 4B 03 04 - signature
 14 00 - PKZip version needed to extract
 02 00 - General purpose bit flag
 00 00 - Compression
 83 55 - Mod.  time
 DC 46 - Mod.  date
 00 00 00 00 - CRC-32 checksum
 00 00 00 00 - Compressed size
 00 00 00 00 - Uncompressed size
 0E 00 - File name len
 00 00 - Extra field len
 3C 73 6F 6D 65 5F 66 69 6C 65 2E 74 78 74 - File name ("<some_file.txt")

 Central directory file header

 50 4B 01 02 - Signature
 14 00 - Version
 14 00 - PKZip version needed to extract
 02 00 - Flags
 00 00 - Compression
 83 55 - Mod.  time
 DC 46 - Mod.  date
 00 00 00 00 - CRC-32 checksum
 00 00 00 00 - Compressed size
 00 00 00 00 - Uncompressed size
 0E 00 - File name len
 00 00 - Extra field len
 00 00 - File comm.  len
 00 00 - Number of disk
 00 00 - Internal attr.
 20 00 00 00 - External attr.
 00 00 00 00 - Offset of local header
 3C 73 6F 6D 65 5F 66 69 6C 65 2E 74 78 74 - File name ("<some_file.txt")

 End of central directory record

 50 4B 05 06 - Signature
 00 00 - Number of this disk
 00 00 - Number of the disk where the central directory starts
 01 00 - Number of central directory entries on this disk
 01 00 - Total number of entries in the central directory
 3C 00 00 00 - Central directory size
 2C 00 00 00 - Offset of cd wrt to starting disk
 00 00 - Comment len


Normal archive
 Local file header

 50 4B 03 04 - signature
 0A 00 - PKZip version needed to extract
 00 08 - General purpose bit flag
 00 00 - Compression
 84 55 - Mod.  time
 DC 46 - Mod.  date
 00 00 00 00 - CRC-32 checksum
 00 00 00 00 - Compressed size
 00 00 00 00 - Uncompressed size
 0F 00 - File name len
 00 00 - Extra field len
 C2 AB 73 6F 6D 65 5F 66 69 6C 65 2E 74 78 74 - File name ("«some_file.txt")

 Central directory file header

 50 4B 01 02 - Signature
 3F 00 - Version
 0A 00 - PKZip version needed to extract
 00 08 - Flags
 00 00 - Compression
 84 55 - Mod.  time
 DC 46 - Mod.  date
 00 00 00 00 - CRC-32 checksum
 00 00 00 00 - Compressed size
 00 00 00 00 - Uncompressed size
 0F 00 - File name len
 24 00 - Extra field len
 00 00 - File comm.  len
 00 00 - Number of disk
 00 00 - Internal attr.
 20 00 00 00 - External attr.
 00 00 00 00 - Offset of local header
 C2 AB 73 6F 6D 65 5F 66 69 6C 65 2E 74 78 74 - File name ("«some_file.txt")
 0A 00 20 00 00 00 00 00 01 00 18 00 F0 88 3F D4 6D B1 D0 01 F0 88 3F D4 6D B1 D0 01 F0 88 3F D4 6D B1 D0 01 - Extra field

 End of central directory record

 50 4B 05 06 - Signature
 00 00 - Number of this disk
 00 00 - Number of the disk where the central directory starts
 01 00 - Number of central directory entries on this disk
 01 00 - Total number of entries in the central directory
 61 00 00 00 - Central directory size
 2D 00 00 00 - Offset of cd wrt to starting disk
 00 00 - Comment len
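
The same breakdown can be reproduced programmatically. Below is a minimal Python sketch that rebuilds the problem archive from the bytes listed above and reads its single central directory entry with `struct`. The helper names (`dos_datetime`, `read_single_entry`) are made up for this illustration, and the parser assumes a single-entry archive without an archive comment, which holds for our example but not for ZIP files in general.

```python
import struct

# Bytes of the "problem" archive, copied from the dump above.
NAME_HEX = b"<some_file.txt".hex()
PROBLEM_ZIP = bytes.fromhex(
    "504b0304 1400 0200 0000 8355 dc46 00000000 00000000 00000000 0e00 0000"
    + NAME_HEX  # local file header, then the file name
    + "504b0102 1400 1400 0200 0000 8355 dc46 00000000 00000000 00000000"
    + "0e00 0000 0000 0000 0000 20000000 00000000"
    + NAME_HEX  # central directory file header, then the file name
    + "504b0506 0000 0000 0100 0100 3c000000 2c000000 0000"  # end of central directory
)


def dos_datetime(date, time):
    """Decode the packed MS-DOS date and time fields used by ZIP."""
    return ((date >> 9) + 1980, (date >> 5) & 0x0F, date & 0x1F,
            time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)


def read_single_entry(data):
    """Read the only central directory entry (assumes no archive comment)."""
    eocd = struct.unpack("<4s4H2LH", data[-22:])
    assert eocd[0] == b"PK\x05\x06", "end of central directory not found"
    cd_size, cd_offset = eocd[5], eocd[6]
    fields = struct.unpack_from("<4s6H3L5H2L", data, cd_offset)
    assert fields[0] == b"PK\x01\x02", "central directory header not found"
    flags, time, date, name_len = fields[3], fields[5], fields[6], fields[10]
    name = data[cd_offset + 46:cd_offset + 46 + name_len]
    return {
        "name": name,
        "flags": flags,
        "utf8_flag": bool(flags & (1 << 11)),
        "modified": dos_datetime(date, time),
        "cd_size": cd_size,
        "cd_offset": cd_offset,
    }


entry = read_single_entry(PROBLEM_ZIP)
print(entry)
```

All multi-byte fields are little-endian, which is why `3C 00 00 00` in the dump is read as a central directory size of 60 bytes.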


As you can see, in the problem archive the '«' character really did “turn” into '<' (0x3C) for some reason, while in the normal archive it remains itself (0xC2 0xAB, which is how it is represented in UTF-8).

What will happen if we simply replace '<' in the problem archive with '«', adjusting along the way every other byte that this change affects? Replace 0x3C with 0xC2 0xAB (note that this must be done in two places), 0x0E 0x00 (File name len) with 0x0F 0x00 (also in two places), 0x3C 0x00 0x00 0x00 (Central directory size) with 0x3D 0x00 0x00 0x00 (since our previous changes made the central directory one byte larger), and 0x2C 0x00 0x00 0x00 (Offset of cd wrt to starting disk) with 0x2D 0x00 0x00 0x00. The result should be the following:

image
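
The manual hex edit can also be scripted. The following Python sketch is only an approximation of the experiment: the input archive is produced by Python's `zipfile` rather than by Total Commander (so header details such as the flags value may differ from the dump), and `make_archive` and `rename_entry` are hypothetical helpers that only handle a single-entry archive without an archive comment.

```python
import io
import struct
import zipfile


def make_archive(name):
    """Build an uncompressed single-entry archive holding an empty file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, b"")
    return buf.getvalue()


def rename_entry(data, new_name):
    """Replace the raw file name bytes in both headers and fix up the sizes."""
    # Central directory size and offset live at bytes 12..19 of the 22-byte EOCD.
    cd_size, cd_offset = struct.unpack_from("<2L", data, len(data) - 10)
    (name_len,) = struct.unpack_from("<H", data, 26)  # local "File name len"
    delta = len(new_name) - name_len

    local_hdr = bytearray(data[:30])
    struct.pack_into("<H", local_hdr, 26, len(new_name))
    local_rest = data[30 + name_len:cd_offset]  # extra field and file data

    cd_hdr = bytearray(data[cd_offset:cd_offset + 46])
    struct.pack_into("<H", cd_hdr, 28, len(new_name))  # central "File name len"
    cd_rest = data[cd_offset + 46 + name_len:cd_offset + cd_size]

    eocd = bytearray(data[-22:])
    struct.pack_into("<2L", eocd, 12, cd_size + delta, cd_offset + delta)

    return (bytes(local_hdr) + new_name + local_rest
            + bytes(cd_hdr) + new_name + cd_rest + bytes(eocd))


original = make_archive("<some_file.txt")
patched = rename_entry(original, "«some_file.txt".encode("utf-8"))
with zipfile.ZipFile(io.BytesIO(patched)) as zf:
    names = zf.namelist()
print(names)
```

Because the UTF-8 flag is not set on the patched entry, `zipfile` decodes the name bytes as CP437, so the listed name comes out as mojibake rather than as "«some_file.txt".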

Open the resulting archive in explorer.exe and see:

image

Yes, the file is now visible, but something is clearly wrong with its name. We search the specification for the word “unicode” and find the following:

APPENDIX D - Language Encoding (EFS)

D.1 The ZIP format has historically supported only the original IBM PC character
encoding set, commonly referred to as IBM Code Page 437. This limits storing
file name characters to only those within the original MS-DOS range of values
and does not properly support file names in other character encodings, or
languages. To address this limitation, this specification will support the
following change.

D.2 If general purpose bit 11 is unset, the file name and comment should conform
to the original ZIP character encoding. If general purpose bit 11 is set, the
filename and comment must support The Unicode Standard, Version 4.1.0 or
greater using the character encoding form defined by the UTF-8 storage
specification. The Unicode Standard is published by the The Unicode
Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files
is expected to not include a byte order mark (BOM).

Let's see whether bit 11 of the flags field is set in each of our cases:

Problem archive
image

Normal archive
image
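
The same check takes a couple of lines of Python. The sketch below interprets the two raw flag values from the dumps (`02 00` and `00 08`) as little-endian 16-bit integers and tests bit 11; `has_utf8_flag` is a name invented for this example.

```python
UTF8_FLAG = 1 << 11  # general purpose bit 11, i.e. 0x0800


def has_utf8_flag(raw_flags):
    """raw_flags: the two flag bytes exactly as they appear in the file."""
    return bool(int.from_bytes(raw_flags, "little") & UTF8_FLAG)


problem = has_utf8_flag(bytes.fromhex("0200"))  # Total Commander archive
normal = has_utf8_flag(bytes.fromhex("0008"))   # 7-Zip archive
print(problem, normal)
```

Note the byte order: the on-disk bytes `00 08` represent the value 0x0800, which is exactly bit 11.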

Let's set this flag in the problem archive (Tools -> Bit manipulation)

image

and try again to open our archive in explorer.exe:

image

It seems we forgot that there are actually two flags fields:

image

We set the required bit in this field and re-open our archive:

image
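
To illustrate why the flag has to be set in both places, here is a small Python sketch. It uses Python's `zipfile` as the reader instead of explorer.exe, on the assumption that both take the listed names and flags from the central directory; the helpers `flag_offsets`, `with_utf8_flag` and `listed_name` are hypothetical and only work for a single-entry archive without a comment.

```python
import io
import struct
import zipfile

UTF8_FLAG = 1 << 11


def flag_offsets(data):
    """Offsets of the flags field in the local and the central header."""
    (cd_offset,) = struct.unpack_from("<L", data, len(data) - 6)
    return 6, cd_offset + 8


def with_utf8_flag(data, offset, enabled):
    """Return a copy of the archive with bit 11 set or cleared at offset."""
    out = bytearray(data)
    (flags,) = struct.unpack_from("<H", out, offset)
    flags = flags | UTF8_FLAG if enabled else flags & ~UTF8_FLAG
    struct.pack_into("<H", out, offset, flags)
    return bytes(out)


def listed_name(data):
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()[0]


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("«some_file.txt", b"")  # zipfile stores the name as UTF-8
data = buf.getvalue()

local_off, central_off = flag_offsets(data)
# Recreate the hand-patched state: UTF-8 name bytes, but no flag anywhere.
no_flag = with_utf8_flag(with_utf8_flag(data, local_off, False), central_off, False)
local_only = with_utf8_flag(no_flag, local_off, True)
both = with_utf8_flag(local_only, central_off, True)

for label, blob in (("no flag", no_flag), ("local only", local_only), ("both", both)):
    print(label, "->", listed_name(blob))
```

With this reader, setting the bit in the local file header alone changes nothing in the listing; the name is shown correctly only once the central directory copy of the flag is set as well.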

Great! But why does Total Commander “turn” the character '«' into '<'? To understand this, let's pick up OllyDbg and launch the file manager under it. But wait, let's first check whether ASLR is enabled for totalcmd.exe. We load it into PE Tools with Alt-1, click the “Optional Header” button, and see that the image base will not change (for a more detailed description of this process, I recommend looking at the previous article):

image

Obviously, to create the archive TC first has to call the CreateFile WinAPI function, so we set breakpoints on its calls (most likely, in our case, the Unicode version of this function is used):

image

We remove the breakpoints that fire on actions we do not care about (for example, when the TC window receives focus, at 0x00567264):

image

Press Alt-F5 (the key combination for packing files in Total Commander), click “OK”, and find ourselves here:

image

Let's try to find out whether TC has already converted the character '«' to '<'. To do this, open the “Memory” window with Alt-M -> left-click on the first line -> Ctrl-B -> enter "<some_file.txt" in the “ASCII” field:

image

Click “OK” and see that by this point the application has already performed its conversion:

image

Press Alt-K to look at the Call Stack:

image

We set breakpoints at the beginning of each of the procedures in the list, press F9, delete the resulting archive, and make sure that the string "<some_file.txt" is no longer in the process memory. After this, we start the archiving process again and stop at the beginning of the first procedure from the Call Stack shown earlier:

image

Again we search the entire process memory for the same string and... we find it:

image

Well, the last logical option at the moment is to put a breakpoint at the beginning of the last procedure in the Call Stack, the one from which we were ultimately called:

image

We jump to the call (right-click the line with the address of the current procedure in the Call Stack window -> Show Call), scroll up to the start of the procedure as identified by OllyDbg, and set a breakpoint on it:

image

We repeat the same steps as before (press F9, delete the archive, check that the string "<some_file.txt" is absent from the process memory, press Alt-F5 and click “OK”) and stop at the breakpoint we just set. We search for the same string and... find it again:

image

Given that the Call Stack is currently empty, we can assume that we are running in a thread other than the main one, or that we got here through a conditional or unconditional jump. Press Ctrl-R and see:

image

Jump to the only reference by pressing Enter:

image

Now we look at what, in turn, refers to this location:

image

We jump there:

image

We step into several procedures and see a call to the WinAPI function CreateThread:

image

In principle, we could have confirmed this another way: just look at the title of the CPU window, which in my case reported a thread ID of 0x000013B8:

image

At the same time, the “Log” window (opened with Alt-L) shows that the ID of the main thread is 0x00001E30:

image

Press Alt-F5, set breakpoints on CreateThread calls

image

Then click “OK” and stop at a place already familiar to us:

image

We look at the Call Stack and, using a “binary search” approach (halve the set of candidates and check the result), unwind the chain of procedure calls until we know which of them makes the string "<some_file.txt" appear in the process memory: it is the procedure located at 0x00491780. Looking closely at what happens inside it, we find a call to the WinAPI function CharToOem:

image

According to the official documentation, this function translates the given string into the OEM-defined character set, and, if the ANSI version is used, the translation can be performed “in place” (src and dst may point to the same address, which eliminates the need for a separate destination buffer), which is exactly what happens in TC's case:

lpszDst [out]
Type: LPSTR
The destination buffer, which receives the translated string. If the CharToOem function is being used as an ANSI function, the string can be translated in place by setting the lpszDst parameter to the same address as the lpszSrc parameter. This cannot be done if CharToOem is being used as a wide-character function.

Indeed, after the call, the buffer passed to it already contains the character '<' instead of '«':

image
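
The effect can be approximated in Python, with several loudly stated assumptions: the OEM code page is taken to be CP866 (a common pairing with the CP1251 ANSI code page mentioned later, but not confirmed in this investigation), Python's codecs do not implement Windows' “best fit” fallback at all, and the one-entry `BEST_FIT` table below is a hypothetical stand-in that merely mirrors the '«' to '<' substitution observed in the debugger. `char_to_oem_sketch` is not a real API.

```python
# Hypothetical excerpt of a best-fit table: only the substitution actually
# observed in this article is listed.
BEST_FIT = {"«": "<"}


def char_to_oem_sketch(text, oem_codec="cp866"):
    """Rough emulation of an ANSI-to-OEM conversion with a fallback table."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(oem_codec)
        except UnicodeEncodeError:
            # The character has no slot in the OEM code page.
            out += BEST_FIT.get(ch, "?").encode("ascii")
    return bytes(out)


ansi_bytes = "«some_file.txt".encode("cp1251")  # '«' does exist in CP1251
oem_bytes = char_to_oem_sketch("«some_file.txt")
print(ansi_bytes, oem_bytes)
```

The point of the sketch is only that '«' has a code in CP1251 (0xAB) but none in CP866, so a lossy conversion has to substitute something for it.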

What will we patch? Let's first take a look at the archiving options in TC:

image

By default, the “Pack Unicode names” option is set to “Ask every time a Unicode name is encountered”. Consequently, TC did not consider the name it encountered to be Unicode. And what if we try to archive a file whose name contains, say, Chinese characters?

image

As you can see, in this case TC displays a message saying that it has encountered a file name containing characters outside the code page in use. For the '«' character there was no such message, most likely because many code pages (in my case, apparently, CP1251) include this character in their upper, code-page-specific range. But if you set this option to “All as UTF-8 if at least one contains characters > 127”, our "«some_file.txt" file is packed correctly and is subsequently displayed in explorer.exe.

You may ask: why, even if the file name changed from "«some_file.txt" to "<some_file.txt", could explorer.exe not display it at least under the changed name? The fact is that '<' is one of the characters that are prohibited in directory and file names on NTFS:

The following reserved characters:

< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
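
A quick way to check a name against this list is sketched below. `reserved_chars_in` is a name made up for this example, and the check covers only the nine characters quoted above, not the other Windows naming rules (reserved device names, trailing dots and spaces, and so on).

```python
RESERVED = set('<>:"/\\|?*')


def reserved_chars_in(name):
    """Return the reserved characters that occur in a file name, in order."""
    return [ch for ch in name if ch in RESERVED]


print(reserved_chars_in("<some_file.txt"))
print(reserved_chars_in("«some_file.txt"))
```

The name stored by Total Commander contains one reserved character, while the original name with '«' contains none.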

Moreover, extracting such an archive with the built-in Windows tools also fails. Firstly, because of the '«' character in the name of the archive itself:

image

Secondly, because of its content:

image

Afterword


Absolutely any product contains bugs / features (call them what you like) that can surface in the most unexpected places, and the more complex a piece of software is, the more bugs can usually be found in it. Do not be lazy about investigating the causes of behavior you run into, because it is quite possible that while researching an application you will learn something new.

Thank you for your attention, and again I hope that the article was useful to someone.

Source: https://habr.com/ru/post/261733/

