
Architecture of a service for collecting and classifying housing ads from Vkontakte



In this article, I describe how the service for finding housing ads from Vkontakte works and how it was developed, why a service-oriented architecture was chosen, and which technologies and solutions were used to build it.

The service has been running for more than nine months.

During this time:

From here on, I use the word "service" to mean a single SOA module, not the entire web service.

I chose the SOA architecture because it enabled:


You could call it a microservice architecture, but there are some minor differences. Services exchange data through a "Common Database" over the MDBWP protocol, instead of the HTTP API that is usual for microservices with each service storing data in its own database. This approach was chosen because it allowed rapid development while retaining all the advantages of SOA described above.

Ansible was chosen to automate deployment.
It is a configuration management system with a low entry threshold.

MongoDB was chosen as the database. This document-oriented database was perfect for storing ads with a list of metro stations, contact details of landlords, and a description of the ad.
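As a rough illustration of why a document model suits this data, the sketch below builds one ad as a single nested document. The field names are hypothetical and are not taken from the real schema; the point is only that the list of metro stations and the landlord contacts sit inside the ad itself rather than in separate tables.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Contact:
    # Hypothetical field names; the real schema is not shown in this article.
    name: str
    phone: str


@dataclass
class Ad:
    description: str
    price: int
    metro_stations: List[str] = field(default_factory=list)
    contacts: List[Contact] = field(default_factory=list)

    def to_document(self) -> dict:
        """Return a nested dict that a document store could save as-is."""
        return asdict(self)


ad = Ad(
    description="Two-room apartment near the metro",
    price=30000,
    metro_stations=["Station A", "Station B"],
    contacts=[Contact(name="Landlord", phone="9999999999")],
)
print(json.dumps(ad.to_document(), indent=2))
```

In a relational schema the same ad would typically need extra tables for stations and contacts; here one read returns everything needed to display it.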

At the moment, the general scheme of interaction between services is as follows:



Services:




rent-view - service for displaying and searching ads


github.com/mrsuh/rent-view



The service is written in Node.js, because the most important quality criterion for it was the speed of the server's response to the user.

The service requests ads from MongoDB, renders HTML pages using the doT.js template engine, and returns them to the browser.

The service is built using Grunt.

The browser scripts are written in pure JS, and the styles are written in LESS. Nginx is used as a proxy server; it caches part of the responses and provides the HTTPS connection.

rent-collector - ad collection service


github.com/mrsuh/rent-collector



The service collects ads, classifies them and writes them to the database.

It is written in PHP for two reasons: familiarity with the libraries needed to write the service, and high development speed.

The Symfony 3 framework is used.

Beanstalk was selected as the queuing service. It is lightweight, but does not have its own message broker. This is exactly what is needed for a small virtual server and for data whose loss is not critical.
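The sketch below shows the general shape of queue-based hand-off between processing stages. It uses Python's in-process queue module purely as a stand-in for beanstalk, and the tube names "collect" and "parse" are hypothetical, not the channels actually used by the service.

```python
import queue

# Stand-ins for beanstalk tubes; the names are hypothetical.
tubes = {"collect": queue.Queue(), "parse": queue.Queue()}


def put(tube: str, message: dict) -> None:
    """Append a message to the named tube."""
    tubes[tube].put(message)


def reserve(tube: str):
    """Return the next message, or None when the tube is empty."""
    try:
        return tubes[tube].get_nowait()
    except queue.Empty:
        return None


def collect_worker() -> None:
    """Move raw posts from 'collect' to 'parse', one message per post."""
    while (job := reserve("collect")) is not None:
        for post in job["posts"]:
            put("parse", {"source": job["source"], "text": post})


put("collect", {"source": "group-1", "posts": ["first ad", "second ad"]})
collect_worker()

parsed = []
while (msg := reserve("parse")) is not None:
    parsed.append(msg)
print(parsed)
```

Each stage only knows the tube it reads from and the tube it writes to, so stages can be restarted or scaled independently of one another.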

Using beanstalk, 4 messaging channels were made:


rent-parser - ad classification service


github.com/mrsuh/rent-parser
The service is written in Golang.

To extract structured data from text, the service uses the Tomita parser from Yandex. It preprocesses the text before parsing and post-processes the parsing results.

I made an open API so that you can test the service.
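The article does not spell out what the preprocessing and post-processing steps do, so the following is only a plausible sketch of that kind of step, not the service's actual logic. It assumes that preprocessing normalizes whitespace and case, and that post-processing reduces a Russian phone number to its 10 national digits; both function names are hypothetical.

```python
import re
from typing import Optional


def preprocess(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def normalize_phone(raw: str) -> Optional[str]:
    """Reduce a Russian phone number to its 10 national digits.

    Returns None when the input does not contain a usable number.
    """
    digits = re.sub(r"\D", "", raw)
    # Drop the leading country or trunk prefix (7 or 8) if present.
    if len(digits) == 11 and digits[0] in "78":
        digits = digits[1:]
    return digits if len(digits) == 10 else None


print(preprocess("  Rent   ROOM \n now "))
print(normalize_phone("+ 7 999 999 9999"))
```

Cleaning the input first keeps the grammar rules simpler, and normalizing afterwards gives every consumer of the result one canonical form to compare against.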

Try parser online
Request:
curl -X POST -d '   30   .  + 7 999 999 9999' 'http://api.socrent.ru/parse' 

Answer:

 {"type":2,"phone":["9999999999"],"price":30000} 

Types of ads:
+ 0 - room
+ 1 - 1-room apartment
+ 2 - 2-room apartment
+ 3 - 3-room apartment
+ 4 - apartment with 4 or more rooms
+ 5 - studio
+ 6 - not an ad
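A client of the API might decode the answer as sketched below. The JSON string is the answer shown above; the class and member names are hypothetical, and type 6 is read here as "the text is not a rental ad".

```python
import json
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class AdType(IntEnum):
    ROOM = 0
    ONE_ROOM = 1
    TWO_ROOM = 2
    THREE_ROOM = 3
    FOUR_PLUS_ROOM = 4
    STUDIO = 5
    NOT_AN_AD = 6  # assumed reading of type 6


@dataclass
class ParseResult:
    type: AdType
    phones: List[str]
    price: int


def decode(body: str) -> ParseResult:
    """Turn the JSON answer into a typed result; unknown types raise ValueError."""
    data = json.loads(body)
    return ParseResult(
        type=AdType(data["type"]),
        phones=list(data["phone"]),
        price=int(data["price"]),
    )


result = decode('{"type":2,"phone":["9999999999"],"price":30000}')
print(result.type.name, result.phones, result.price)
```

Mapping the numeric type to an enum at the boundary means the rest of the client code never has to compare against bare numbers.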

I wrote more about ad classification here: habrahabr.ru/post/328282

rent-control - settings management service


github.com/mrsuh/rent-control



It is written in PHP for two reasons: familiarity with the libraries needed to write the service, and high development speed.
The Symfony 3 framework is used.
The Bootstrap 3 style library is used for the interface.

The settings managed by the service include:


Initially, all the data that controls parsing lived in configuration files. As the number of cities grew, the records needed to be visualized and made easier to edit. Adding new parameters also had to become simpler.

rent-notifier - bot service for sending new ads in Telegram and Vkontakte


github.com/mrsuh/rent-notifier

Example of subscribing to ads:



The service is written in Golang because the speed of response to the user is critical.
The essence of the service is as follows: you subscribe to new ads, and as they are added, the bot sends you messages about them. The service inserts a link to the original ad into the message text.
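The subscribe-then-notify flow can be sketched as a simple match of each new ad against the stored subscriptions. The subscription fields (city, ad type, maximum price) and the message format are assumptions made for illustration; the article does not list the actual filters.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subscription:
    # Hypothetical filter fields.
    chat_id: int
    city: str
    ad_type: int
    max_price: int


@dataclass
class NewAd:
    city: str
    ad_type: int
    price: int
    link: str


def matches(sub: Subscription, ad: NewAd) -> bool:
    """A subscription matches when city and type agree and the price fits."""
    return (
        sub.city == ad.city
        and sub.ad_type == ad.ad_type
        and ad.price <= sub.max_price
    )


def build_messages(subs: List[Subscription], ad: NewAd) -> List[Tuple[int, str]]:
    """Return (chat_id, text) pairs; the text carries a link to the original ad."""
    text = f"New ad, price {ad.price}\n{ad.link}"
    return [(sub.chat_id, text) for sub in subs if matches(sub, ad)]


subs = [
    Subscription(chat_id=1, city="Moscow", ad_type=2, max_price=35000),
    Subscription(chat_id=2, city="Moscow", ad_type=2, max_price=25000),
]
ad = NewAd(city="Moscow", ad_type=2, price=30000, link="https://example.com/ad/1")
messages = build_messages(subs, ad)
print(messages)
```

Only the first subscriber receives a message here, because the ad's price is above the second subscriber's limit.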

Auxiliary repos




Common database code for PHP


github.com/mrsuh/rent-schema

General database schema:



With the addition of the rent-control service, the database schema code became duplicated. I therefore moved that code into a separate package. Now any PHP service only needs to add this package to its dependencies via composer.

 composer require mrsuh/rent-schema 


ODM for MongoDB


github.com/mrsuh/mongo-odm

The first MongoDB ODM for PHP that came to mind was Doctrine 2. It comes with Symfony 3 and has good documentation.

But at the time the service was written, making this ODM work with the latest version of the PHP drivers for Mongo required installing another package as a layer between the new and the old API. Doctrine 2 is a fairly large project in itself, and with the additional package it became even bigger. I wanted something lightweight instead, so I decided to write an ODM myself with a minimal feature set. It worked out: the ODM fully copes with its responsibilities.
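To show how small a minimal object-document mapper can be, here is a sketch of the core idea: convert an object to a plain document on write and rebuild the object on read. It is not the API of mrsuh/mongo-odm; the class names are hypothetical, and an in-memory dict stands in for a MongoDB collection.

```python
from dataclasses import asdict, dataclass


@dataclass
class Note:
    # Hypothetical model used only for this illustration.
    id: int
    city: str
    price: int


class Repository:
    """Maps dataclass objects to plain dict documents and back."""

    def __init__(self, model):
        self.model = model
        self._documents = {}  # in-memory stand-in for a MongoDB collection

    def insert(self, obj) -> None:
        """Store the object as a document keyed by its id."""
        self._documents[obj.id] = asdict(obj)

    def find(self, **criteria) -> list:
        """Return model objects whose documents match every criterion."""
        return [
            self.model(**doc)
            for doc in self._documents.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


repo = Repository(Note)
repo.insert(Note(id=1, city="Moscow", price=30000))
repo.insert(Note(id=2, city="Kazan", price=20000))
print(repo.find(city="Kazan"))
```

A mapper limited to this kind of insert and find with hydration covers simple services, which is the trade-off a minimal feature set makes against a full ODM.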

Some statistics




The service adds an average of 519.41 ads per day to the site.

The most popular metro stations in the largest cities of Russia were the following:


More statistics can be viewed on the site itself.

Conclusion




If you have not yet decided whether you need an SOA architecture, build a monolithic application split into modules. That will make it easier to move the application to services later if necessary. If you do decide to use SOA, keep in mind that it may increase the complexity of development and deployment, the amount of code, and the volume of messages between services.

PS I found my last two apartments with the help of my service. I hope it helps you too.

Source: https://habr.com/ru/post/342220/

