Preamble
It will surprise no one that a modern person, and a programmer in particular, takes in a lot of information every day. My RSS client alone delivers about 500 articles a week, and it is far from my only source of information.
I thought about writing an RSS client for myself on NodeJS with a trainable article filter. There are ready-made RSS readers for Node and ready-made neural networks with classifiers, so writing some kind of prototype did not seem like a particularly difficult task.
I decided to start by testing the neural networks that came to hand. I took a small set of input data: the positive examples were copied from NodeJS articles on Habr, and the negative ones came from Lenta.ru. The classifier's task was to separate articles on programming and NodeJS from ordinary news that is of no use for my professional development.
I don't want to show the results I got with Brain and Fann: I don't think I have enough expertise to judge them. I can only say that out of the box they did not suit me at all; on my input data they did not give an acceptable number of correct answers. The Natural library, however, impressed me a lot.
Below I will show how I trained the classifier, checked its work, and made it understand Russian.
Input data
The data on which I trained and tested the classifier can be viewed here. There is too much of it to fit in the article, so I moved it out.
Code
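'use strict';

// Training and test samples (see "Input data")
var data = require('./data');

// Use the Russian Porter stemmer instead of the default English one
var natural = require('natural'),
    porterStemmer = natural.PorterStemmerRu,
    classifier = new natural.BayesClassifier(porterStemmer);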
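The snippet above only sets up the classifier. As a rough illustration of what training and classifying involve, here is a dependency-free toy naive Bayes classifier. It is a sketch under my own assumptions and not Natural's implementation: the class name ToyBayes, the tokenizer, and the sample sentences are all hypothetical, and the toy updates its counts directly in addDocument, so it has no separate training step.

```javascript
'use strict';

// Toy multinomial naive Bayes with add-one smoothing.
// NOT Natural's implementation; it only illustrates the train/classify flow.
class ToyBayes {
  constructor(tokenize) {
    this.tokenize = tokenize;     // function: string -> array of tokens
    this.docCount = {};           // label -> number of training documents
    this.wordCount = {};          // label -> Map(token -> occurrences)
    this.totalWords = {};         // label -> total number of tokens seen
    this.vocabulary = new Set();  // every token seen during training
    this.totalDocs = 0;
  }

  // Record one labelled training text.
  addDocument(text, label) {
    if (!this.wordCount[label]) {
      this.docCount[label] = 0;
      this.wordCount[label] = new Map();
      this.totalWords[label] = 0;
    }
    this.docCount[label] += 1;
    this.totalDocs += 1;
    const counts = this.wordCount[label];
    for (const token of this.tokenize(text)) {
      counts.set(token, (counts.get(token) || 0) + 1);
      this.totalWords[label] += 1;
      this.vocabulary.add(token);
    }
  }

  // Return the label with the highest log-probability for the text.
  classify(text) {
    const tokens = this.tokenize(text);
    let best = null;
    let bestScore = -Infinity;
    for (const label of Object.keys(this.docCount)) {
      // log prior of the label
      let score = Math.log(this.docCount[label] / this.totalDocs);
      // plus the smoothed log likelihood of every token
      for (const token of tokens) {
        const seen = this.wordCount[label].get(token) || 0;
        score += Math.log(
          (seen + 1) / (this.totalWords[label] + this.vocabulary.size)
        );
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}

// Hypothetical tokenizer: lowercase, split on anything that is not a letter or digit.
const tokenize = (text) =>
  text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);

// Made-up training samples, using the same two labels as the article.
const toy = new ToyBayes(tokenize);
toy.addDocument('streams and callbacks in nodejs', 'good');
toy.addDocument('npm modules for a nodejs server', 'good');
toy.addDocument('the president met the ministers today', 'bads');
toy.addDocument('football team wins the cup final', 'bads');

console.log(toy.classify('writing a nodejs server with streams')); // good
console.log(toy.classify('ministers watch the cup final'));        // bads
```

With only four training sentences the toy is easy to fool. It is meant to show the mechanics, not to be compared with a real library.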
Result
START CLASSIFICATION
Test on good
> good
> good
> good
> good
Test on bad
> bads
> bads
> bads
> bads
> good
> bads
> bads
> good
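The run above can be summed up as a single number. The following sketch simply recounts the labels printed in that output; the variable and function names are mine and are not part of the original script.

```javascript
'use strict';

// Labels printed by the run above, in order.
const onGood = ['good', 'good', 'good', 'good'];
const onBad = ['bads', 'bads', 'bads', 'bads', 'good', 'bads', 'bads', 'good'];

// Count how many predictions match the expected label.
const countCorrect = (predicted, expected) =>
  predicted.filter((label) => label === expected).length;

const correct = countCorrect(onGood, 'good') + countCorrect(onBad, 'bads');
const total = onGood.length + onBad.length;
const accuracy = correct / total;

console.log(`${correct} of ${total} correct (${(accuracy * 100).toFixed(1)}%)`);
// prints "10 of 12 correct (83.3%)"
```

Both mistakes are uninteresting articles labelled as good, so on this small sample the classifier errs on the side of showing too much rather than hiding something useful.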
Russian language support
For high-quality classification, Natural uses a "stemmer" component, which splits text into an array of words, removes uninformative words (so-called stopwords), and cuts off word endings.
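To make those three steps concrete, here is a toy sketch of such a pipeline. The stopword and suffix lists are invented for this illustration and are far cruder than a real Porter stemmer, and the function names are hypothetical. It shows the idea only, not Natural's code.

```javascript
'use strict';

// Hypothetical, deliberately tiny lists; a real stemmer uses far richer rules.
const STOPWORDS = new Set(['и', 'в', 'на', 'the', 'a', 'of']);
const SUFFIXES = ['ами', 'ов', 'ы', 'а', 'ing', 's']; // checked in this order

// Cut the first matching ending, keeping at least three characters of the stem.
function stripEnding(word) {
  for (const suffix of SUFFIXES) {
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Step 1: split into words. Step 2: drop stopwords. Step 3: cut endings.
function tokenizeAndStem(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((word) => word && !STOPWORDS.has(word))
    .map(stripEnding);
}

console.log(tokenizeAndStem('Серверы и настройка серверов'));
// prints [ 'сервер', 'настройк', 'сервер' ]
console.log(tokenizeAndStem('Parsing streams of the server'));
// prints [ 'pars', 'stream', 'server' ]
```

Different forms of the same word collapse into one token, which is what lets a classifier treat them as the same feature.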
By default, the classifier ignores Russian words, although the project does support Russian. To make the classifier understand Russian, initialize it with the stemmer for the Russian language, which replaces the default English stemmer. This is very easy to do:
var classifier = new natural.BayesClassifier(natural.PorterStemmerRu);
Now the text inside the classifier will be processed correctly, taking into account the peculiarities of the Russian language.
For those who like to experiment
I created a repository with a working classifier specifically for this. Installation is trivial:
git clone git@github.com:shuvalov-anton/classifier.git
cd classifier
npm i
node app.js
Then change the data in data.js to your own and see the result.
PS
To be honest, I have no experience in text classification with which to judge the result properly, but as an ordinary user I was very impressed by what Natural produced. Unfortunately, I did not find any serious project documentation beyond the readme on GitHub, and to figure out how to enable Russian I had to dig through the source code. There was nothing overly complex in that, though, and I believe the result was worth it!