The topic of automating web application security testing with PhantomJS in combination with BurpSuite, ModSecurity, Garmr and the like is developing very actively these days. I am no exception: I would like to share my experience of building a working prototype of a scanner that understands JavaScript, Ajax and DOM mutations. Perhaps it will help someone build their own, much better, solution. Details below :-)
I hope nobody will argue that today's web applications are a hellish brew of different technologies, and that more and more logic keeps moving from the server to the client. Naturally this affects information security teams, which are often understaffed as it is, and on top of that the usual automated testing methods simply fall apart here. At some point it became clear that static analysis and the usual html-parser + fuzzer scheme were no longer enough even for an initial assessment, so I decided to dig into the problem and try to do something about it.
Why is support for JavaScript, Ajax and mutations so important?
Because there is no other way :-) History knows plenty of cases where the lack of even the simplest JavaScript support in a security scanner reduced its effectiveness to nothing. For example, the CSRF in Yandex.Mail: if the change was passed in GET parameters, the backend would honestly respond with an error about an invalid CSRF token, and then the JavaScript on the page would re-send (!) the same request with a valid token added :) The backend's position was more than correct, but from the client's point of view... Or the XSS in translating a letter into any of the supported languages: a similar story, a primitive vulnerability that w3af and its ilk could not detect. Going without JavaScript support nowadays looks simply unacceptable, and we have to do our best to fix that.
So what can we do?
I decided to build on existing ideas, take my favourite SlimerJS and CasperJS as a base, and try to end up with a crawler capable of recursively walking DOM elements and their events while watching mutations, in order to spot "pathologies" such as XSS and other issues. Why not PhantomJS? Because it has no MutationObserver support, which I needed for analyzing mutations. The complete system, as I pictured it, consists of four large blocks:
* A crawler that, monkey-style, walks through every event we care about and tracks mutations according to formal rules (a small sketch of this idea follows the list)
* A proxy that collects data for further fuzzing and for running the remaining context-dependent checks
* Fixtures in the web application: pre-planted labels that we can look for during testing
* A diff report against the results of previous scans
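To make the mutation-tracking part more concrete, here is a minimal sketch of the kind of observer that could be injected into each page (for example via casper.evaluate). The marker tag and the way results are collected here are my illustration, not the prototype's actual API:

// Minimal sketch: watch the DOM for nodes added by mutations and flag the
// ones produced by our pre-planted fixture payloads. The <xssmark> tag and
// the window.__foundXss container are illustrative.
var observer = new MutationObserver(function (mutations) {
    mutations.forEach(function (mutation) {
        Array.prototype.forEach.call(mutation.addedNodes, function (node) {
            // A fixture payload renders as a custom element, so an <xssmark>
            // node appearing anywhere in the DOM means the payload executed.
            if (node.nodeType === 1 && node.tagName === 'XSSMARK') {
                window.__foundXss = window.__foundXss || [];
                window.__foundXss.push({
                    innerHtml: node.innerHTML,
                    parentId: node.parentNode && node.parentNode.id
                });
            }
        });
    });
});
observer.observe(document.documentElement, { childList: true, subtree: true });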
Most of these parts had been implemented earlier, so I concentrated on the first one (if there is interest, I will write about the others). The overall algorithm, as I see it, splits into two large parts, page processing and event processing, and in simplified form looks like this:

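In rough pseudocode (a deliberately simplified sketch of how I read these two phases; every helper name here is a placeholder, not the prototype's real API) it boils down to:

// Pseudocode only: the helpers below stand in for what the real crawler does,
// the point is the page-processing / event-processing split.
function processPage(page) {
    openAndWait(page);                         // load the page, let it settle
    collectLinksAndForms(page);                // the classic crawling part
    collectEventableElements(page).forEach(function (event) {
        processEvent(page, event, 0);          // hand over to the event phase
    });
}

function processEvent(page, event, depth) {
    if (depth > MAX_DEPTH) { return; }
    startMutationObserver(page);               // see the observer sketch above
    triggerEvent(page, event);                 // dispatch click / change / ...
    waitForAjaxAndMutations(page);
    recordResourcesAndXss(page, event);        // new requests, fixture markers
    // mutations may have produced new clickable elements: recurse into them
    collectNewEvents(page, event).forEach(function (child) {
        processEvent(page, child, depth + 1);
    });
}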
As you can see, everything is fairly simple, though not yet perfect; for readability I drew only the main blocks. Now imagine a web application that displays a list of users and, on click, shows additional information fetched by an Ajax request:

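The interesting part of such a page is the click handler that drops the Ajax response into the DOM unescaped. In spirit (this is a simplified sketch of the behaviour, not the test page's actual source, and the field name is an assumption) it looks something like this:

// Simplified sketch of the "More info" handler: the description field from
// the JSON response ends up in the DOM via innerHTML, unescaped.
function showUserInfo(name) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/user/' + name + '.json');
    xhr.onload = function () {
        var user = JSON.parse(xhr.responseText);
        // the sink: whatever the backend returned becomes markup
        document.getElementById('userInfoDescription').innerHTML = user.description;
    };
    xhr.send();
}

One of the fixture users carries a harmless marker element instead of a real payload in that field, so the mutation observer can flag it the moment the handler fires.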
The test page is available here:
http://crawl-test.mmmkay.info/
Source code:
https://github.com/dharrya/monkey-crawler/tree/master/tests
For example, if you run w3af with the webSpider plugin against it, no Ajax requests will be detected, even though such requests are quite often vulnerable for a number of obvious reasons. Skipfish gives roughly the same result.
To confirm that the algorithm works, I prepared a small prototype, available on github. At the root of the project there is a test script, "test.js":
(function start(require) {
    "use strict";
    var Spider = require('lib/spider').Spider;
    var utils = require('utils');
    var startTime = new Date().getTime();
    var spider = new Spider();

    // the target URL can be overridden from the command line
    var url = 'https://github.com/dharrya';
    if (spider.cli.has(0))
        url = spider.cli.get(0);

    spider.initialize({
        targetUri: url,
        eventContainer: undefined
    });

    // crawl the target and dump the raw result once everything is processed
    spider.start(url);
    spider.then(spider.process);
    spider.run(function() {
        this.echo('\n<---------- COMPLETED ---------->\n');
        var deltaTime = new Date().getTime() - startTime;
        deltaTime = (deltaTime / 1000).toFixed(2);
        this.echo('time: ' + deltaTime + 'sec');
        this.echo('Processed pages:' + this.pagesQueue.length);
        utils.dump(this.pages);
        spider.exit();
    });
})(require);
It simply starts the crawl and dumps the "raw" result. Let's try it on my example:
$ ./test.sh http://crawl-test.mmmkay.info

<---------- COMPLETED ---------->

time: 3.61sec
Processed pages:1
[
  {
    "url": "http://crawl-test.mmmkay.info/",
    "opened": true,
    "processed": true,
    "reloadCount": 0,
    "status_code": 200,
    "jsErrors": [],
    "xss": [],
    "xssHashMap": [],
    "pages": [],
    "events": [
      {
        "eventType": "click",
        "path": "id(\"user-Lisa\")/DIV[3]/BUTTON[1]",
        "parentEvent": null,
        "depth": 0,
        "status": "completed",
        "completed": true,
        "deleted": false,
        "xss": [],
        "xssHashMap": [],
        "events": [
          {
            "eventType": "click",
            "path": "//DIV[4]",
            "parentEvent": null,
            "depth": 1,
            "status": "completed",
            "completed": true,
            "deleted": false,
            "xss": [],
            "xssHashMap": [],
            "events": [],
            "resourses": []
          }
        ],
        "resourses": [
          "http://crawl-test.mmmkay.info/user/Lisa.json"
        ]
      },
      {
        "eventType": "click",
        "path": "id(\"user-Jimmy\")/DIV[3]/BUTTON[1]",
        "parentEvent": null,
        "depth": 0,
        "status": "completed",
        "completed": true,
        "deleted": false,
        "xss": [
          {
            "innerHtml": "Another XSS...",
            "path": "id(\"userInfoDescription\")/XSSMARK[1]",
            "initiator": null,
            "dbRecord": null
          }
        ],
        "xssHashMap": [
          0
        ],
        "events": [],
        "resourses": [
          "http://crawl-test.mmmkay.info/user/Jimmy.json"
        ]
      },
      {
        "eventType": "click",
        "path": "id(\"user-Mark\")/DIV[3]/BUTTON[1]",
        "parentEvent": null,
        "depth": 0,
        "status": "completed",
        "completed": true,
        "deleted": false,
        "xss": [],
        "xssHashMap": [],
        "events": [],
        "resourses": [
          "http://crawl-test.mmmkay.info/user/Mark.json"
        ]
      },
      {
        "eventType": "click",
        "path": "id(\"user-Tommy\")/DIV[3]/BUTTON[1]",
        "parentEvent": null,
        "depth": 0,
        "status": "completed",
        "completed": true,
        "deleted": false,
        "xss": [],
        "xssHashMap": [],
        "events": [],
        "resourses": [
          "http://crawl-test.mmmkay.info/user/Tommy.json"
        ]
      }
    ],
    "deferredEvents": [],
    "startTime": 1391266464847,
    "endTime": 1391266466787,
    "resourses": [
      "http://crawl-test.mmmkay.info/"
    ]
  }
]
See the requests for user data triggered by the Ajax events? That is exactly what we need! I agree, the current debug output is a bit verbose, but it is informative: it shows that the crawler discovered the additional requests while processing the "More info" button events, and those requests can be fuzzed later on. On top of that, thanks to the fixtures in the web application, it immediately detected an XSS in the mutations and kindly reported it. It is not very fast yet, but I am actively working on that in my spare time. Another great example is LinkedIn; here is the result for 5 of its pages (starting with the main one):

Green nodes are element events that were processed.
Blue nodes are the resources requested while processing them.
As you can see, this approach can be quite effective in web applications with many chained events!
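The "resourses" lists collected along the way are exactly the input the proxy/fuzzing stage needs. Purely as an illustration (this helper is not part of the prototype), pulling every discovered URL out of the dump could look like this:

// Illustrative only: walk the crawler's dump and collect every URL that was
// requested while pages and their event chains were being processed.
function collectResources(nodes, acc) {
    acc = acc || [];
    (nodes || []).forEach(function (node) {
        (node.resourses || []).forEach(function (url) {   // field name as in the dump
            if (acc.indexOf(url) === -1) { acc.push(url); }
        });
        collectResources(node.events, acc);   // resources found inside event chains
        collectResources(node.pages, acc);    // and inside nested pages
    });
    return acc;
}
// collectResources(JSON.parse(dump)) would yield the main page plus the
// /user/*.json requests from the run above, ready to be handed to a fuzzer.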
Summary
I plan to develop this idea further into a full-fledged web application security scanner and to publish the rest of the tooling (if management permits), perhaps as a plugin for w3af or Minion. But before that there are still plenty of unsolved issues around performance and critical features.
I hope I have not bored you too much and that my attempts will be useful to someone. If something came out unclear, say so and I will try to explain it in more detail.