
NW + Edge.js + Fiddler, or a tale of combining the uncombinable

Hello, everyone.
Not long ago I read the article "What Should We Parse a Site With? WebDriver API Basics" and remembered that I had long been meaning to bring at least one relatively funny idea to a relatively working state. I finally got around to it, which means it is time to tell what came of it.
There is a great program, Fiddler, which lets you intercept and modify HTTP/HTTPS requests. There is also a wonderful thing called NW.js, also known as node-webkit, which lets you... parse various sites, among other things. You are beautiful, I am beautiful, so why don't we make friends?
The idea is this: one could, of course, run Fiddler separately, write the logic in it and route traffic from node-webkit through it, but that is not very interesting. So we will put everything under one roof; fortunately, Fiddler is available as a C# library, FiddlerCore.
For Node there is an excellent module, Edge.js. It is a clever thing that lets you execute C# code (and not only C#). It is a Node module? Wonderful, then it can be made to work under NW.js as well; there is even a manual for that.


So, let's skip the couple of hours of pain spent building this wonderful library (how many times do I have to tell myself: read the manuals carefully! We need Visual Studio 2013, not 2012 and not 2015!) and get down to writing code. I will not dwell on connecting and loading the modules either; let's assume that anyone interested in this article knows how to read manuals (yes, I get it, I get it, manuals should be read carefully!).

    //#r "System.Windows.Forms.dll"
    //#r "fiddler/FiddlerCore.dll"

    using Fiddler;
    using System;
    using System.Windows.Forms;
    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using System.Threading;
    using System.Threading.Tasks;

    public class Startup
    {
        Func<object, Task<object>> _console;
        Func<object, Task<object>> _html;

        public async void _print(object text)
        {
            if (_console != null)
                await _console(text);
        }

        public async void _getHtml(object html)
        {
            if (_html != null)
                await _html(html);
        }

        public async Task<object> Invoke(dynamic data)
        {
            _console = (Func<object, Task<object>>)data.console;
            _html = (Func<object, Task<object>>)data._html;
            _print("Started");
            FiddlerApplication.Shutdown();
            new FiddlerLogic
            {
                _beforeRequest = (oS) =>
                {
                    var proxy = oS.oRequest.headers["POverride"];
                    if (proxy != null)
                    {
                        oS["X-OverrideGateway"] = proxy;
                    }
                },
                _beforeResponse = (oS) =>
                {
                    var response = oS.GetResponseBodyAsString();
                    _html(response);
                }
            }._start(5000);
            return Task.FromResult("Done");
        }
    }

    class FiddlerLogic
    {
        public Action<Session> _beforeRequest;
        public Action<Session> _beforeResponse;

        public void _start(int port = 5555)
        {
            FiddlerApplication.BeforeRequest += (oS) => _beforeRequest(oS);
            FiddlerApplication.BeforeResponse += (oS) => _beforeResponse(oS);
            FiddlerApplication.Startup(port, false, true);
        }
    }

So what is going on here?
These 2 lines

    //#r "System.Windows.Forms.dll"
    //#r "fiddler/FiddlerCore.dll"

are markers for Edge.js; they reference the libraries the code needs.
The FiddlerLogic class is just a thin wrapper over Fiddler. It used to contain code for installing certificates, but then, by some street magic, the need for it went away (most likely I dug an old version out of my stash that did not require it). Now the class does nothing special, but where would we be without legacy code? When creating the object we set 2 callbacks, one called before the request is sent and one after the response arrives, then pass the port (if needed) to _start, and that is it.
Oh yes: in FiddlerApplication.Startup the 2nd and 3rd arguments control whether Fiddler registers as the system proxy and whether it intercepts HTTPS requests, respectively. Since I do want to intercept HTTPS requests and do not need the system proxy, the values are false and true.

Now about Startup. This is the class Edge.js needs in order to work with the code (more on this module later). Invoke is the entry point, and console / _html are functions passed in from JS. FiddlerApplication.Shutdown() stops any previously started Fiddler instance. In _beforeRequest the proxy is changed when the request carries a POverride header. In _beforeResponse nothing very useful happens; it is left as an example. The _print and _getHtml functions are simple wrappers that check that the corresponding functions were actually passed from JS.
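For those who find JS easier to read than C#, the _beforeRequest rule can be sketched as a plain function. This is only an illustration: applyProxyOverride and the plain objects standing in for the request headers and the session flags are made up for this example and are not part of FiddlerCore.

```javascript
// Hypothetical stand-in for the _beforeRequest callback from the C# listing.
// `headers` and `sessionFlags` are plain objects invented for this sketch;
// in the real code they are oS.oRequest.headers and the session flags oS[...].
function applyProxyOverride(headers, sessionFlags) {
  // HTTP header names are case-insensitive, so compare them in lower case.
  const name = Object.keys(headers).find(
    (key) => key.toLowerCase() === 'poverride'
  );
  if (name !== undefined) {
    // Same effect as oS["X-OverrideGateway"] = proxy in the listing.
    sessionFlags['X-OverrideGateway'] = headers[name];
  }
  return sessionFlags;
}

// A request that asks for a specific upstream proxy...
const overridden = applyProxyOverride({ POverride: '160.92.56.41:80' }, {});
// ...and one that does not, which leaves the flags untouched.
const untouched = applyProxyOverride({ Accept: 'text/html' }, {});
```

The point of the design is that the proxy is chosen per request: the JS side only has to set one header, and the Fiddler side routes that single session through the given gateway.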

Now let's look at the JS part.

    var edge = require('edge');
    var gui = require('nw.gui');
    var async = require('async');
    var request = require('request');

    var start = Date.now();
    var arr = [];
    var count = 50;
    var prev = Date.now();
    var proxyList = ['160.92.56.41:80'];

    _init();

    for (var i = 0; i < count; i++) {
        arr.push('http://myip.ru/index_small.php');
    }
    var i = 0;

    var _node = function (url, c) {
        console.log(url);
        var options = {
            url: url,
            headers: {
                'POverride': proxyList[0]
            }
        };
        request(options, function (err, response, html) {
            if (err) console.log(err);
            var j = _html(html);
            console.log('Container:', j.find('.network-info tbody>:nth-child(2) td').text());
            c();
        });
    };

    async.map(arr, _node, function () {
        var ms = Date.now() - start;
        console.info('Node parse got %f seconds. Mid time: %f. Mid page per second: %f',
            ms / 1000, ms / count, count * 1000 / ms);
    });

    /***DEFINITIONS***/
    function _html(html) {
        return $('<div></div>').html(html);
    }

    function _init() {
        try {
            request = request.defaults({'proxy': 'http://localhost:5000'});
            gui.App.setProxyConfig("http://localhost:5000");
            func = edge.func("fiddler/Main.cs");
            func({
                console: function (data, callback) {
                    console.log(data);
                },
                _html: function (html, callback) {
                    //var container=_html(html);
                    //console.log('Container:', container, container.find('.network-info tbody>:nth-child(2) td'));
                }
            });
        } catch (ex) {
            console.log(ex);
        }
    }


First of all, pay attention to the 2 functions at the bottom, _html and _init. The first does something indecent, namely building a DOM from a string; the second handles the basic settings and hooks up the C# code.
edge.func loads the contents of Main.cs (see the code above), and the call to func passes the required functions as arguments. In other words, the argument of func is the data parameter of public async Task<object> Invoke(dynamic data)
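The (data, callback) shape of the functions passed to func matters: Edge.js exposes each of them to C# as a Func<object, Task<object>>, and the task completes only when the JS side calls callback(error, result). Below is a minimal sketch of that handshake which runs without Edge.js; callFromDotNet is a made-up stand-in for the .NET side, not an Edge.js API.

```javascript
// Hypothetical stand-in for what the .NET side does when it awaits a JS
// function: call fn(data, callback) and settle once callback(error, result)
// fires. This is NOT part of Edge.js, only a model of the convention.
function callFromDotNet(fn, data) {
  return new Promise((resolve, reject) => {
    fn(data, (error, result) => (error ? reject(error) : resolve(result)));
  });
}

const logged = [];

const handlers = {
  // Same shape as the `console` handler in the listing, plus the callback call.
  console: function (data, callback) {
    logged.push(data);
    callback(null, null); // signals completion; this is what ends the C# await
  },
};

// Roughly what `await _console("Started")` amounts to on the C# side.
const pending = callFromDotNet(handlers.console, 'Started');
```

In the JS listing the console and _html handlers never call callback, so the tasks awaited in _print and _getHtml stay pending. Since both are fire-and-forget this goes unnoticed, but anything that needs a value back from JS has to call callback.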

Source: https://habr.com/ru/post/273311/

