
Personal Book Search Service

Good afternoon, friends.

Allow me to present a personal book search service. Unlike a classical search, the system takes the user's queries once and then runs them again and again. Each time a new match is found, it sends the user a notification. This continues until the user has found all the books they need and deleted their search queries.
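The loop described above can be sketched roughly as follows. This is only an illustration, not the site's actual code: the `SavedQuery` class, the `matches` rule (a plain substring check), and the dictionary layout of a book are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SavedQuery:
    """A stored search query (hypothetical structure)."""
    user: str
    text: str  # e.g. an author name or a fragment of a title
    notified: set = field(default_factory=set)  # ids of books already reported


def matches(query_text, book):
    """Naive rule (an assumption): the query occurs in the author or the title."""
    needle = query_text.casefold()
    return needle in book["author"].casefold() or needle in book["title"].casefold()


def process_new_books(queries, new_books):
    """Run every saved query against a batch of newly added books.

    Returns (user, book id) pairs to notify about and remembers them,
    so running the same batch again does not notify twice.
    """
    notifications = []
    for query in queries:
        for book in new_books:
            if book["id"] in query.notified:
                continue
            if matches(query.text, book):
                query.notified.add(book["id"])
                notifications.append((query.user, book["id"]))
    return notifications
```

The queries stay in storage between runs; a query stops producing notifications only when its owner deletes it.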



The idea of personal search



I don't claim to have invented personal search, but I'll dwell on the idea briefly. Suppose we come across an interesting site (for example, one dedicated to selling and exchanging books, like mine) where users regularly add new books. All well and good, but visiting the site again and again in the hope that something we need is about to appear is inconvenient. Besides, we are busy people and may simply forget.

So an idea arises at once: what if we shift all the “dirty work” onto a search robot? It sounds tempting! Let it look for the books we name and bother (notify) us only when such books actually appear.

If you think about it, the same approach applies in many cases: a notice that a suitable job vacancy has opened in your city; that a needed medicine has arrived at the pharmacy (say, for a person with a chronic illness whose supply is running out); that an interesting gadget, video card, or hard drive has gone on sale. These seem like simple things, but they take time, and take it regularly. And what if the information is scattered across a dozen sites? In short, it is inconvenient.

Personal Book Search



But back to books. Books are convenient because it is easy to tell whether a book matches a search query. For example, a Lukyanenko book is always a book by Lukyanenko, and “Bury me behind a plinth” is that book and no other. So the search algorithm can do about 95 percent of the work on its own, and the remaining 5 percent falls to an editor. There is no way around it: some search queries are ambiguous and produce a large stream of irrelevant matches, which have to be sifted by hand.

Even with such a simple model the numbers are quite good: out of roughly 3,000 incoming books and about 200 search queries, about 300 suitable books were found. In other words, roughly every tenth book already has a potential buyer (sometimes several) at the moment it is added.

Finally, one small technical secret: if the search query contains an author, the system searches not only for the name as entered but also for its synonyms (for example, “Lukyanenko” = “Sergey Lukyanenko” = “Lukyanenko, Sergey” = “Lukyanenko S.”). Synonyms are stored in the database and added to as time and advertising revenue from the site allow :-)
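A minimal sketch of the synonym idea might look like this. On the site the synonyms are stored in the database; here they are a hard-coded list for brevity, and the names `SYNONYM_GROUPS`, `normalize`, `expand_author`, and `author_matches` are invented for the example.

```python
# Hypothetical in-memory stand-in for the synonym table kept in the database.
SYNONYM_GROUPS = [
    ["Lukyanenko", "Sergey Lukyanenko", "Lukyanenko, Sergey", "Lukyanenko S."],
]


def normalize(name):
    """Collapse whitespace and ignore case so spellings compare reliably."""
    return " ".join(name.split()).casefold()


def expand_author(query):
    """Return every known spelling of the author, or just the query itself."""
    key = normalize(query)
    for group in SYNONYM_GROUPS:
        if key in {normalize(variant) for variant in group}:
            return list(group)
    return [query]


def author_matches(query, book_author):
    """True if the book's author field equals any spelling of the query."""
    author = normalize(book_author)
    return any(normalize(variant) == author for variant in expand_author(query))
```

With this table, a query for “Lukyanenko” also matches a book whose author field reads “Lukyanenko S.”, while an author missing from the table is compared only against the spelling that was typed.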

Service expansion



Initially, personal search was available only to registered users of the site. After about three months of trial running, we decided to open it to ordinary guests as well. Now anyone can leave search queries for books on our website.

But that is not the best part. Just recently we extended the search beyond the books on our website to LiveJournal communities. About 30 RSS feeds (the book communities that are still active) were connected to the system. A script downloads their contents and looks for the search queries inside the posts. To keep results relevant, only posts no more than 20 days old are analyzed. The first run immediately found about 50 books matching users' search queries, which again is a very good result.

The algorithms are still rough, but the plan is to include any other book projects that publish their books through RSS feeds. Later we also plan to introduce a “personal search area”: “in my city”, “in my city and nearby cities”, “in all cities”. After all, one person may be hunting for a rare book and be ready to order it even from abroad, while another is looking for bestsellers in their own city.
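The feed scan with its 20-day freshness window could be sketched like this, using only the Python standard library. It assumes RSS 2.0 items with `title`, `description`, and `pubDate` elements and a plain substring match; the function name `find_matches` is made up, and downloading the feed is left out (the function takes the XML text directly).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_AGE = timedelta(days=20)  # freshness window mentioned in the article


def find_matches(rss_xml, queries, now):
    """Return (query, item title) pairs for fresh feed items mentioning a query."""
    found = []
    for item in ET.fromstring(rss_xml).iter("item"):
        try:
            published = parsedate_to_datetime(item.findtext("pubDate"))
        except (TypeError, ValueError):
            continue  # missing or unparseable date: skip the item
        if published.tzinfo is None:
            # Assumption: treat dates without a time zone as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if now - published > MAX_AGE:
            continue  # older than 20 days, no longer relevant
        title = item.findtext("title", "")
        text = (title + " " + item.findtext("description", "")).casefold()
        for query in queries:
            if query.casefold() in text:
                found.append((query, title))
    return found
```

`now` is passed in as a time-zone-aware `datetime`, for example `datetime.now(timezone.utc)`, which keeps the function easy to test.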

Performance



Performance is a separate matter. The algorithm consumes quite a lot of resources, so it has to be run on a local copy of the database (conveniently, a backup is made and the search is run right away).

Typical figures: the books table holds about 11 thousand records, and the search is started after every 200-300 added records. There are about 200 queries. The script runs for about a minute on my machine. That is not too annoying yet, but as the service grows, optimization will be needed (the slowness apparently comes from the large number of relationships between tables). For comparison, running the same 200 queries against the table of posts downloaded from LJ took only about 7 seconds, but that is a single table with about 70 records. In short, the experiments continue.

Source: https://habr.com/ru/post/87535/

