
A human-friendly parser on Selenium WebDriver



Start


Then it was my turn to buy a car. I had seen how the guys at work do it: they go to the site and watch the listings, and those who are a little older buy a newspaper and look through the ads. All of this is monotonous, and I did not want to be distracted by searching, researching, and clicking on links. I just wanted someone to do it for me, but there was no such person. So I had to make the computer do it all.

Problem statement


The way I saw it, the solution was to write a parser and a mailing script. The parser should collect ad data from the site (I chose “From Hand to Hand”), and the mailing script should e-mail me about each new listing. The message text should contain:
The parser should run every N minutes, and the messages should arrive after each run. The parsing results are saved to the database, and once a message has been e-mailed, each ad is marked as sent. I don’t want to see the same thing a thousand times.

Parser


This was the hardest step. I remember that a long time ago I wrote a parser for Odnoklassniki ("Classmates"), in PHP. First I had to figure out the social network itself and understand how its magic works. Then I had to keep track of all those sessions, cookies, and the sequence of link clicks. And then turn all those thoughts into code? Oh God. How I wished that everything would happen in plain sight. How I wanted not to think about things the browser has long known how to do. I wanted my beloved Natasha to finally understand and, most importantly, see the results of my work, and not white text on the black background of a command line.
That is why I wanted to simply drive a browser, which would be understandable and visible. This is where Selenium WebDriver enters the scene. It lets you control the browser as long as you can pick the right selectors (CSS, XPath). The parser's logic becomes transparent: press a button, wait, enter data, press a button, and that's it. And no fiddling with cookies. Hooray! Most importantly, I will see everything live, and not in the logs.

Preparatory work


So, we need to install:

Next, you will need to install several Node modules in the project folder.

As a reminder, modules are installed like this:
npm install " " 

Now let's start our Selenium server:
 java -jar "    selenium'" 

And the database server:
 mongod --dbpath "    " 


Writing the parser


As already mentioned, I chose "From Hand to Hand" as the source. Now the sequence of actions:
Clear all cookies:
 browser.deleteAllCookies(); 

Choose a region

image
In this field we enter the region we are interested in. The region, like all the other parameters, is described in an object shown further below. Entering the region and clicking it can be described by the following pseudocode:
    // find the region input field
    browser.elementByCss(LOCATOR.cssPath)
      .then(function (el) {
        // type the region name
        return el.type(OPTION.region);
      })
      .then(function () {
        // point the locator at the first item of the region list
        LOCATOR.className = '';
        LOCATOR.cssPath = '.b-searchRegion > ul:nth-child(1) > li:nth-child(1)';
        return browser.elementByXPath('//span[contains(text(), "' + OPTION.region + '")]');
      })
      .then(function (el) {
        // click the matching region
        return el.click();
      });
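The XPath above is built by string concatenation, so a region name that contains a double quote would break the expression. Here is a minimal sketch of a helper that quotes arbitrary text for XPath 1.0. The names `xpathLiteral` and `spanContaining` are hypothetical, they are not part of the original code or of the WebDriver client:

```javascript
// Hypothetical helper: quote arbitrary text as an XPath 1.0 string literal.
function xpathLiteral(text) {
  if (text.indexOf('"') === -1) return '"' + text + '"';
  if (text.indexOf("'") === -1) return "'" + text + "'";
  // Both quote kinds are present: split on " and glue the parts with concat().
  var parts = text.split('"').map(function (part) {
    return '"' + part + '"';
  });
  return 'concat(' + parts.join(", '\"', ") + ')';
}

// Hypothetical helper: the same locator shape as in the pseudocode above.
function spanContaining(text) {
  return '//span[contains(text(), ' + xpathLiteral(text) + ')]';
}
```

With it, the lookup could be written as `browser.elementByXPath(spanContaining(OPTION.region))`.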


Next, select the section (I need "Cars")

image

This can be described simply as a click on a link containing specific text.
    // wait until the category link becomes visible
    browser.waitForVisibleByPartialLinkText(OPTION.category, OPTION.elWait)
      .then(function () {
        // find the link
        return browser.elementByPartialLinkText(OPTION.category);
      })
      .then(function (el) {
        // click it
        return el.click();
      })


Now the final part of setting up the search: click the “more parameters” button and enter the price, year of production, and other parameters. All of this can be seen in the video.
As you can see, it is all very easy: find the element, work out its unique locator, and then click it, type into it, or leave it alone.
Now it is time to collect the data: click the "Show" button and parse the results. Getting an element's text looks simple:
    // look up the element; resolves to null if nothing matches
    browser.elementByXPathOrNull(locationXPath)
      .then(function (el) {
        if (el) {
          // return the element's text
          return el.text();
        } else {
          // report the locator that matched nothing
          cb('   - ' + locationXPath);
        }
      })
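The same lookup has to be repeated for every field of an ad, so it may help to wrap it. The sketch below is an assumption, not the author's code: `readText` and `readFields` are hypothetical names, and a missing element is turned into `null` instead of calling the `cb` callback. Only `elementByXPathOrNull` and `text()` come from the snippet above.

```javascript
// Hypothetical wrapper: resolves to the element's text,
// or to null when the locator matches nothing.
function readText(browser, xpath) {
  return browser.elementByXPathOrNull(xpath).then(function (el) {
    return el ? el.text() : null;
  });
}

// Hypothetical helper: read several fields of one ad, one XPath per field.
// The lookups run one after another so the browser gets one command at a time.
function readFields(browser, locators) {
  var ad = {};
  return Object.keys(locators).reduce(function (chain, name) {
    return chain.then(function () {
      return readText(browser, locators[name]);
    }).then(function (text) {
      ad[name] = text;
    });
  }, Promise.resolve()).then(function () {
    return ad;
  });
}
```

A call such as `readFields(browser, { title: titleXPath, price: priceXPath })` would then resolve to one plain object per ad, where `titleXPath` and `priceXPath` are whatever locators you found for the page.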

The stop signal for data collection is the absence of a blue right arrow on the results page:
image
After collecting the data, we save it to the database and close the browser.
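The "keep going until the arrow disappears" rule can be sketched as a small recursive loop. Everything here is an assumption made for illustration: `collectAllPages` is a hypothetical name, `collectPage` stands for whatever function scrapes the current results page, and `nextXPath` is a locator for the right arrow that you would have to find yourself.

```javascript
// Hypothetical loop: scrape pages until the "next page" arrow is gone.
// collectPage(browser) must return a Promise of the ads on the current page.
function collectAllPages(browser, collectPage, nextXPath) {
  var ads = [];
  function step() {
    return collectPage(browser).then(function (pageAds) {
      ads = ads.concat(pageAds);
      return browser.elementByXPathOrNull(nextXPath);
    }).then(function (arrow) {
      if (!arrow) return ads;            // no arrow: this was the last page
      return arrow.click().then(step);   // otherwise move on to the next page
    });
  }
  return step();
}
```

In a real run you would probably also wait for the next page to load after the click, for example with the `ajaxWaitMilisec` pause from the options object.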
By the way, here is the object that describes the search parameters for the car.
    OPTION = {
      region : ' ', // region
      category: ' ', // category
      price : {from : 0 , to : 1800000}, // price range
      cy : 'RUR', // currency
      releaseYear : {from : 2010, to : 2013}, // year of production
      mileage : {from : 0 , to : 99000 }, // mileage
      mark : ['BMW','', 'Audi','Hyundai'], // makes
      model : ['X1', 'X3', 'X5'], // models
      carcass : ['', ''], // body type
      transmisson : ['', ''], // transmission
      motor : [''], // engine
      gear : ['','',' ',' '], // drive
      photo : false, // photo
      video : false, // video
      district : ['',''], // district (okrug)
      area : ['', '', ''], // area
      metro : {
        lines : ['', ''], // metro lines
        station : [' .', ' .'] // metro stations
      },
      source : [''], // ad source
      submitted : ['  '], // when the ad was submitted
      ajaxWaitMilisec : 2000, // how long to wait for AJAX, ms
      elWait : 3000 // how long to wait for an element, ms
    },


Working with the database


The database structure is as follows:
    MONGODBSCHEMA : {
      title : String, // ad title
      link : {type : String , unique : true}, // ad link
      price : String, // price
      location : String, // location
      phone : String, // phone *
      text : String, // ad text *
      images : Array, // images *
      sms : {type : Boolean, default : false }, // whether an SMS was sent *
      email : {type : Boolean, default : false } // whether an e-mail was sent
    }

* marks the fields that I planned to use but then decided to drop. They may be needed in the future.
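The `unique : true` flag on `link` is what keeps repeated parser runs from storing the same ad twice. Its effect can be illustrated without a database. The sketch below uses a plain `Map` as a stand-in for the collection. `saveNewAds` is a hypothetical name, and this is not how MongoDB is actually called:

```javascript
// Hypothetical in-memory stand-in for the collection, keyed by ad link.
function saveNewAds(store, ads) {
  var added = 0;
  ads.forEach(function (ad) {
    if (store.has(ad.link)) return;   // link already stored: skip the duplicate
    // New ads start as "not yet e-mailed", like the schema default.
    store.set(ad.link, Object.assign({ email: false }, ad));
    added += 1;
  });
  return added;                       // number of ads that were really new
}
```

Calling it twice with overlapping results only adds the ads whose links were not seen before, which is the behaviour the mailing step relies on.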

Sending e-mail notifications


This is the easiest part. We use the emailjs module for it. We select all documents from the database whose "email" field is "false", send them to my mailbox, and set the “email” property to “true” for the ones that were sent. Then I open the mail app on my phone and study the suitable ads.
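The select, send, and mark sequence can be sketched over a plain array of ads. This is an illustration under assumptions, not the author's code: `mailUnsent` is a hypothetical name, `send` is any function that takes a message and returns a Promise (in the article that role is played by emailjs), and the message layout is made up.

```javascript
// Hypothetical mailing step over an array of stored ads.
// send(message) is assumed to return a Promise that resolves on success.
function mailUnsent(ads, send) {
  var unsent = ads.filter(function (ad) { return ad.email === false; });
  return unsent.reduce(function (chain, ad) {
    return chain.then(function () {
      return send({
        subject: ad.title + ' - ' + ad.price,
        text: [ad.location, ad.link].join('\n')
      });
    }).then(function () {
      ad.email = true;   // mark as sent only after a successful send
    });
  }, Promise.resolve()).then(function () {
    return unsent.length;   // how many messages went out
  });
}
```

Marking the ad only after `send` succeeds means a failed delivery is retried on the next run instead of being lost.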

Running everything every N minutes


We use the cron module for this. Every 20 minutes we run the parser first and then the mailing script.
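One thing to watch with a fixed schedule is a parser run that takes longer than the interval. A minimal sketch of a guarded job follows. It uses plain `setInterval` in the usage comment rather than the cron module, and `makeJob`, `parse`, and `mail` are hypothetical names for functions that return Promises:

```javascript
// Hypothetical job wrapper: run the parser, then the mailer, and skip a tick
// if the previous run has not finished yet.
function makeJob(parse, mail) {
  var running = false;
  return function run() {
    if (running) return Promise.resolve(false);   // previous run still going
    running = true;
    return parse().then(mail).then(function () {
      running = false;
      return true;
    }, function (err) {
      running = false;                             // allow the next tick to retry
      throw err;
    });
  };
}

// Assumed wiring, every 20 minutes:
// var job = makeJob(parse, mail);
// setInterval(function () { job().catch(console.error); }, 20 * 60 * 1000);
```

The same `job` function could just as well be handed to the cron module as its tick callback.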

That's all


Now I follow car sales whenever I have my phone at hand. And there is no unpleasant aftertaste of a brain broken by storing the whole sequence of magical actions, as there was before with PHP (the language itself is not to blame). I also have a feeling that I could write the next parser in about 60 minutes, just by finding the element locators. All the code is here.
I also want to say a huge thank you to my future wife Natasha for not minding all these crazy home-programming ideas of mine, and for having such an invigorating and sweet laugh.

Source: https://habr.com/ru/post/186496/

