From time to time I notice that I cannot find an article I have read before.
It seems simple: with the details I remember, the article should be easy to find. But no. A Google search often turns up nothing, because I remember only fragments of the content and the results contain a lot of noise.
The same thing happens at work. To store and share useful links to GitHub projects, articles, and services, we used to use Skype, and now we use Yammer. Both have drawbacks. The main problem with Skype for sharing links is that searching the history is difficult. The problem with Yammer is that it indexes only a snippet, not the full text of the article. Neither offers automatic categorization.
In my spare time, I wrote an application specifically designed to search for articles. Its features:
- add an article with one button from the browser
- automatic categorization
- Russian and English morphology
- view article text
- search query operators
A registered user has three feeds: all articles (all), a personal selection (selected), and articles the user has added (stars). A link for editing the personal feed appears in the menu after registration. There too, in the drop-down list to the right of the search box, you can set a category filter.
The main technologies used for development are Ruby on Rails, Sidekiq, Elasticsearch, and PostgreSQL.
To implement a high-quality search, I used a morphology plugin and the readability gem, which extracts the important content from the original source page.
A category is defined as follows. Articles in the “web development” category contain terms such as html, html5, css, css3, javascript, js, and others. Accordingly, to find articles on web development, you run a query with a list of these keywords. Elasticsearch has two suitable query types: query string and simple query string. I chose the latter because it never throws an exception and simply discards the invalid parts of a query.
Sample query for the web development category:
javascript* jQuery coffeescript ajax bootstrap foundation backbone* angularjs css* less sass scss adaptive responsive html* haml DOM frontend "front-end" web "image placeholder" mozilla firefox chrome opera codepen
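As a rough illustration, a category query like this could be wrapped in a simple query string search body as sketched below. The helper name `category_search_body` and the field names `title` and `content` are assumptions made for this example, not taken from the application.

```ruby
require 'json'

# Hypothetical helper: builds an Elasticsearch search body for one category.
# `terms` is the category's keyword list; the field names are assumptions.
def category_search_body(terms, fields: %w[title content])
  {
    query: {
      simple_query_string: {
        query: terms,            # keywords, prefixes (css*) and "phrases"
        fields: fields,          # fields to search in
        default_operator: 'or'   # any one keyword is enough to match
      }
    }
  }
end

web_dev = 'javascript* jQuery css* html* "front-end" "image placeholder"'
puts JSON.pretty_generate(category_search_body(web_dev))
```

The printed JSON is what would be sent as the body of a search request. Nothing here talks to a cluster, so the sketch runs without Elasticsearch installed.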
This is how you find the documents that belong to a category. Then the opposite question arises: how do you find the categories that a particular document belongs to? Elasticsearch lets you swap the roles of documents and queries. A category is a saved query, so you can ask which categories match a given article. Elasticsearch has a request type for exactly this, the percolator, and when you add a new article or category the change takes effect immediately.
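The reverse lookup described here corresponds to the percolator feature in Elasticsearch. The sketch below only builds the request bodies, assuming the `percolator` field type and `percolate` query syntax of Elasticsearch 7 or later; the version the application actually ran on may have used a different API. The field names and helper names are hypothetical.

```ruby
require 'json'

# Mapping for a hypothetical "categories" index. The `query` field stores
# the saved query; `content` must be mapped too, because the saved queries
# refer to it.
def categories_mapping
  {
    mappings: {
      properties: {
        name:    { type: 'keyword' },
        query:   { type: 'percolator' },
        content: { type: 'text' }
      }
    }
  }
end

# A category is indexed as a document whose `query` field is the saved query.
def category_document(name, terms)
  {
    name: name,
    query: { simple_query_string: { query: terms, fields: ['content'] } }
  }
end

# Search body that asks: which saved queries match this article text?
def percolate_body(article_text)
  {
    query: {
      percolate: {
        field: 'query',
        document: { content: article_text }
      }
    }
  }
end

puts JSON.pretty_generate(category_document('web development', 'css* html*'))
puts JSON.pretty_generate(percolate_body('Responsive layouts with CSS3'))
```

Each hit returned for the percolate search would be a category document, so the `name` values of the hits are the categories of the article.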
I thought for a long time about how to make adding new categories easy and convenient. I wanted a convenient query editor, the possibility of moderation, and a way to see the contribution of each user. I had many different ideas, and in the end I settled on a GitHub repository. GitHub lets you fork the repository and edit the categories online. To check that the categories file is correct there is an rspec test, which runs automatically on travis-ci when a pull request is submitted.
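The kind of check such a test might perform can be sketched as follows. The file format is an assumption (a YAML mapping of category name to query string), as are the function name `category_errors` and the specific rules. The real repository may store and validate categories differently.

```ruby
require 'yaml'

# Returns a list of human-readable problems found in a categories file.
# Assumed (hypothetical) format: a YAML mapping of category name => query.
def category_errors(yaml_text)
  data = YAML.safe_load(yaml_text)
  return ['top level must be a mapping'] unless data.is_a?(Hash)

  data.flat_map do |name, query|
    errors = []
    if !query.is_a?(String) || query.strip.empty?
      errors << "#{name}: query must be a non-empty string"
    elsif query.count('"').odd?
      # An odd number of double quotes means a phrase was never closed.
      errors << "#{name}: unbalanced double quotes"
    end
    errors
  end
end

sample = <<~YAML
  web development: 'javascript* css* html* "front-end"'
  databases: 'postgresql mysql "query plan'
YAML

puts category_errors(sample)
```

An rspec example could then simply expect `category_errors(File.read(path))` to be empty, which would fail the build for a pull request that breaks the file.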