
Testing and parsing sites with dynamic content and more. Nightmare.js doesn't care

This article will not contain much preamble or moralizing about why and for whom it may be useful.

In a nutshell:


1. The package can be used to test sites.
2. The package can be used to parse data.
3. The package can be used to automate data entry on sites.

Alternatives:


Casper.js, phantom.js, watir, and many more; Google is full of them. Why I chose nightmare.js:
  1. Ease of use.
  2. Full HTML5 support, no conflicts with sites.
  3. Extensible through actions.

Library structure


The Nightmare class uses the Electron framework; for each page it creates a BrowserWindow object that launches a Chromium browser shell.

Principle of operation


  1. Nightmare initializes a new Electron application with the start page to be processed.
  2. Before the page under study loads, scripts are injected that let the programmer maintain two-way interaction with the page through a series of emitters.
  3. Nightmare provides the programmer with an API (chains of actions) that allows any manipulation of the site and retrieval of the required data.

Pros


  1. The code on the client side and on the site is written in the same language; no templating is required.
  2. The ability to extend the module by creating actions. An action can be created at the nightmare class level, or at both the nightmare class level and the Electron level (which in turn gives access to Chromium's dev API). There are already plenty of ready-made extension modules in npm that can be plugged into your project (for example, realMouse fully emulates mouse hovering, and there are modules for working with iframes, which browser security normally blocks).
  3. All commands are chainable and each returns a promise; this lets you write code in promise style, or inside async functions or generators.
  4. A relatively small load on the CPU and memory. Bear in mind that comparing such a tool with simple GET and POST requests is not fair: in terms of speed and memory, browser-based parsers lose without question.
  5. Nightmare can work in two modes: browser display mode and background process mode.
  6. Supports proxies, setting the user agent, and installing browser extensions.
  7. You can enable or disable image loading, WebGL support, and much more.
  8. You can create preload scripts, which lets you add your own library functions to the page before it loads. As a particular example, you can rewrite addEventListener, turning it into a decorator over the real function and injecting analytics to check what the site actually does while you are on it, or to fight the fingerprinting everyone loves so much while forgetting about your "anonymity".

From emotion to business


A classic example of using the module from the documentation:

var Nightmare = require('nightmare');
var nightmare = Nightmare({ show: true });

nightmare
  .goto('https://duckduckgo.com')
  .type('#search_form_input_homepage', 'github nightmare')
  .click('#search_button_homepage')
  .wait('#zero_click_wrapper .c-info__title a')
  .evaluate(function () {
    return document.querySelector('#zero_click_wrapper .c-info__title a').href;
  })
  .end()
  .then(function (result) {
    console.log(result);
  })
  .catch(function (error) {
    console.error('Search failed:', error);
  });

In a nutshell, what is happening


We connect the library and create an object with the browser window visible. Then we go to the page, find an element by its CSS selector, enter text, press a button, wait for a new element matching a CSS selector to appear, and execute a function on the browser side, returning its result. After the chain of tasks completes, the result is passed to then, or an exception is raised. In my opinion, everything is simple and convenient, but as soon as the page-crawling script grows large, this style of command description becomes inconvenient, so I suggest a good pattern using async functions:

const Nightmare = require('nightmare');

(async () => {
  let nightmare;
  try {
    nightmare = Nightmare({ show: true });
    await nightmare
      .goto('https://duckduckgo.com')
      .type('#search_form_input_homepage', 'github nightmare')
      .click('#search_button_homepage')
      .wait('#zero_click_wrapper .c-info__title a');
    let siteData = await nightmare.evaluate(function () {
      return document.querySelector('#zero_click_wrapper .c-info__title a').href;
    });
    // further processing of siteData
  } catch (error) {
    console.error(error);
    throw error;
  } finally {
    if (nightmare) await nightmare.end();
  }
})();

What are the advantages of this style of writing code? You can pull data from the site as many times as you like through evaluate, analyze it, and apply different behavioral scenarios, all described within your script.

You can walk through pages sequentially with await nightmare.goto(...); Nightmare will wait for the DOM to load.

About documented features


I consider it useless to describe all the functions with examples, since all of this is well covered in the documentation. Let me just say that the module can read any data, take screenshots, save pages as HTML or PDF, and send data to the site. Through additional modules, uploading files to a server via a form input type="file" is available. It can respond to alert, prompt, and confirm, and can broadcast data from the console as events.

What to keep in mind when working with nightmare


You need to understand that each action will either complete or throw an exception, so wherever you are not 100% sure the code will pass, you need to wrap the calls in try/catch and handle them accordingly. Take wait(selector) as an example: this instruction suspends script execution until an HTML element matching the CSS selector appears, but the module has a default timeout (which can be changed via an option); when it expires, an exception is thrown, and you must catch it or react to it somehow.

Summary


In my opinion, nightmare.js is a very serious library with good functionality. It is easy to learn and flexible, and it lets you carry out almost any task in testing and analyzing sites. I treat strict critics with understanding; for those interested in the topic, I will gather ideas for future articles from the comments.

Links


→ Nightmare.js
→ Electron

Thanks for your attention!

Source: https://habr.com/ru/post/331752/

