Building neural networks in PHP with FANN: an example implementation

I needed to analyze a large amount of information and identify patterns in it. The first thing that came to mind was to build a mathematical model using a neural network.

Since the data for analysis is generated in PHP, and PHP is the language I am most comfortable with right now, I looked for a library with a PHP interface. FANN (Fast Artificial Neural Network), an open-source neural network library, was recommended to me. It has bindings for 15 languages, so almost everyone can find something that suits them.

Example: recognizing the language of a text on a page

As an example, let's take a task that is easy, but not far from reality or from serious work. Suppose there are 1000 documents in 3 different languages: French, English and Polish. Our task is to teach the neural network to recognize the language of a document. For this we use the simplest possible mechanism, character frequencies, which still gives decent results. The idea is that the same characters occur with different frequencies in different languages. We prepare one large piece of text for each of the 3 languages (English, French, Polish) and count the frequency of each character in it. We pass this data to the neural network, indicating which set of frequencies belongs to which language. The neural network does the rest itself.
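To make the frequency idea concrete, here is a minimal sketch in plain PHP (no FANN needed). The helper name `letter_frequencies` is hypothetical and is not part of the article's code; it counts whole UTF-8 letters, whereas the article's own function further down counts bytes. It also assumes `strtolower()` is good enough for lowercasing, which only covers ASCII letters; `mb_strtolower()` would be the better choice where the mbstring extension is available.

```php
<?php
// Hypothetical helper (an illustration, not the article's code):
// relative frequency of each letter in a UTF-8 string, most frequent first.
function letter_frequencies($text)
{
    // Lowercase ASCII letters and drop everything that is not a letter.
    $letters = preg_replace('/[^\p{L}]/u', '', strtolower($text));

    // Split into UTF-8 characters rather than bytes.
    $chars = preg_split('//u', $letters, -1, PREG_SPLIT_NO_EMPTY);
    $total = count($chars);
    if ($total === 0) {
        return array();
    }

    // Turn absolute counts into shares of the whole text.
    $freq = array();
    foreach (array_count_values($chars) as $char => $count) {
        $freq[$char] = $count / $total;
    }
    arsort($freq);

    return $freq;
}

// Show the three most frequent letters of a short English sample.
print_r(array_slice(letter_frequencies('The quick brown fox jumps over the lazy dog'), 0, 3, true));
```

Running the same function over French or Polish text gives a visibly different ranking, and that difference is the signal the network learns from.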

Start by installing FANN

The example below shows installation on Ubuntu.
1) You need to install the libfann1 and libfann1-dev packages
apt-get install libfann* 


2) Add FANN support to PHP
I have Apache and the php5-dev package installed, so I do this:
 # wget http://pecl.php.net/get/fann
 # tar xvfz fann
 # cd fann-0.1.1
 # phpize
 # ./configure
 # make


If compilation fails and the errors include this one:

fann.c:393: error: 'zif_fannOO___set' undeclared (first use in this function)

then edit the file php_fann.h and comment out line 28, #define PHP_FANN_OO 1
After that, recompile.

The build produces modules that need to be copied into the PHP extensions directory:
 sudo cp -R ./modules/* /usr/lib/php5/20090626+lfs/ 


Then add this line to php.ini:
 extension=fann.so 


Restart Apache and check that everything is OK:
 php -m | grep fann 
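
The same check can also be made from inside PHP, which helps when the command-line PHP and the Apache module read different php.ini files. The sketch below uses only built-in functions; `describe_extension` is a hypothetical helper name, not something provided by FANN.

```php
<?php
// Hypothetical helper: report whether an extension is loaded
// in the PHP process that runs this script.
function describe_extension($name)
{
    return extension_loaded($name)
        ? "$name: loaded"
        : "$name: NOT loaded (check the extension= line in the php.ini used by this PHP)";
}

echo describe_extension('fann'), PHP_EOL;

// Show which php.ini this process actually read.
$ini = php_ini_loaded_file();
echo 'php.ini in use: ', ($ini ? $ini : '(none)'), PHP_EOL;
```

Opening this script through the web server as well as running it from the command line shows whether both environments see the extension.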


Solving the problem

This takes 2 steps:
1) Train the network (first listing)
2) Use the trained model for classification (second listing)

I will give an example for the first step, with a pointer to the documentation.

I have commented the code as thoroughly as I could, so that it is clear what is happening without a separate walkthrough.

Train.php file
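    <?php
    /*
     * Create the network.
     * 256 - number of inputs; generate_frequencies() returns one value
     *       per possible byte value (0-255).
     * 128 - number of neurons in the hidden layer.
     * 3   - number of outputs, one per language.
     * 1.0 - connection_rate
     * 0.7 - learning_rate, see http://www.basegroup.ru/glossary/definitions/learning_rate/
     */
    $ann = fann_create(array(256, 128, 3), 1.0, 0.7);

    /*
     * Train the network. Each sample is a pair: the first element holds the
     * inputs, the second holds the expected outputs. There are 3 samples,
     * one per language: English, French, Polish. array(1, 0, 0) marks the
     * first output as the right answer for the first sample, and so on.
     * generate_frequencies() builds the input vector.
     *
     * en.txt, fr.txt and pl.txt hold the training text for each language
     * (about 10000 each, presumably characters).
     */
    fann_train($ann,
        array(
            array(
                generate_frequencies(file_get_contents("en.txt")), // Inputs
                array(1, 0, 0) // Outputs
            ),
            array(
                generate_frequencies(file_get_contents("fr.txt")), // Inputs
                array(0, 1, 0) // Outputs
            ),
            array(
                generate_frequencies(file_get_contents("pl.txt")), // Inputs
                array(0, 0, 1) // Outputs
            ),
        ),
        100000,  // maximum number of training iterations
        0.00001, // desired error
        1000     // iterations between progress reports
    );

    /*
     * Save the trained model so that it can be loaded later for classification.
     */
    fann_save($ann, "classify.txt");

    /*
     * Build the frequency vector for a text.
     */
    function generate_frequencies($text){
        // Lowercase the text and keep letters only
        $text = preg_replace("/[^\p{L}]/iu", "", strtolower($text));
        // Count how often each byte value occurs
        $total = strlen($text);
        $data = count_chars($text);
        // Convert the counts to relative frequencies
        array_walk($data, function (&$item, $key, $total){
            $item = round($item/$total, 3);
        }, $total);
        return array_values($data);
    }
    ?>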


The code above only generates the model. Now let's try it out: the code below analyzes a text and estimates how likely it is to belong to each language.

Run.php file
    <?php
    /*
     * Load the trained network from the file saved by Train.php.
     */
    $ann = fann_create("classify.txt");

    /*
     * Run each sample text through the network. The output is an array
     * of 3 values, one per language.
     */
    $output = fann_run($ann, generate_frequencies("ANN are slowly adjusted so as to produce the same output as in the examples. The hope is that when the ANN is shown a new X-ray images containing healthy tissues"));
    var_dump($output);

    $output = fann_run($ann, generate_frequencies("Voyons, Monsieur, absolument pas, les camions d'aujourd'hui ne se traînent pas, bien au contraire. Il leur arrive même de pousser les voitures. Non, croyez moi, ce qu'il vous faut, c'est un camion ! - Vous croyez ? Si vous le dites. Est-ce que je pourrais l'avoir en rouge ? - Bien entendu cher Monsieur,vos désirs sont des ordres, vous l'aurez dans quinze jours clé en main. Et la maison sera heureuse de vous offrir le porte-clé. Si vous payez comptant. Cela va sans dire, ajouta Monsieur Filou. - Ah, si ce "));
    var_dump($output);

    $output = fann_run($ann, generate_frequencies("tworząc dzieło literackie, pracuje na języku. To właśnie język stanowi tworzywo, dzięki któremu powstaje tekst. Język literacki ( lub inaczej artystyczny) powstaje poprzez wybór odpowiednich środków i przy wykorzystaniu odpowiednich zabiegów technicznych. Kompozycja - jest to układ elementów treściowych i formalnych dzieła dokonanych według określonych zasad konstrukcyjnych. Kształtowanie tworzywa dzieła literackiego jest procesem skomplikowanym i przebiegającym na wielu poziomach. Składa się na nie:"));
    var_dump($output);

    /*
     * Same function as in Train.php; it has to be defined in this file too,
     * otherwise the calls above fail.
     */
    function generate_frequencies($text){
        $text = preg_replace("/[^\p{L}]/iu", "", strtolower($text));
        $total = strlen($text);
        $data = count_chars($text);
        array_walk($data, function (&$item, $key, $total){
            $item = round($item/$total, 3);
        }, $total);
        return array_values($data);
    }
    ?>


Result

Our model produced the following answers for the three texts.
For the first text, the model decided that it had been given English (about 98.7%), and it is right:
    array(3) {
      [0]=> float(0.98745632171631)
      [1]=> float(0.0094089629128575)
      [2]=> float(0)
    }


For the second text, it decided in favor of French and was right again:
    array(3) {
      [0]=> float(0)
      [1]=> float(0.99334162473679)
      [2]=> float(0)
    }


It also correctly recognized the third text as Polish:
    array(3) {
      [0]=> float(0.015697015449405)
      [1]=> float(0)
      [2]=> float(1)
    }


Some users complain that neural networks give probabilities rather than a definite answer. For those who have missed the point, I will add that everything in our world is based on probabilities. Treat an answer as correct only if the network is at least 90% confident in it; if the confidence is lower, the network needs more training to improve the classification.
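
The 90% rule can be applied to the output array with a few lines of plain PHP. This is only a sketch: `pick_language` is a hypothetical helper, and the order of `$labels` is assumed to match the order of the training outputs (English, French, Polish).

```php
<?php
// Hypothetical helper: return the label of the strongest output,
// or null when even the strongest output is below the threshold.
function pick_language(array $output, array $labels, $threshold = 0.9)
{
    // Index of the largest value in the output vector.
    $best = array_keys($output, max($output));
    $index = $best[0];

    return $output[$index] >= $threshold ? $labels[$index] : null;
}

// Assumed to be in the same order as the outputs used for training.
$labels = array('English', 'French', 'Polish');

// The first result from the article: confidently English.
var_dump(pick_language(array(0.98745632171631, 0.0094089629128575, 0.0), $labels)); // string(7) "English"

// A made-up, undecided output: nothing reaches 90%, so no answer is given.
var_dump(pick_language(array(0.5, 0.4, 0.1), $labels)); // NULL
```

A null result is the signal that the text should be set aside or that the network needs further training.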

Despite being such a simple system, the neural network gives good results. You can build n-grams and classify on them, which should be even more reliable, or combine both approaches. Neural networks are a powerful tool; you just need to learn how to use them.
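
As a rough illustration of the n-gram idea, the sketch below counts character bigrams instead of single characters. `ngram_frequencies` is a hypothetical helper, not part of FANN or of the listings above. It strips non-letters first, so n-grams may span word boundaries, and `strtolower()` lowercases ASCII letters only.

```php
<?php
// Hypothetical helper: relative frequencies of character n-grams
// (bigrams by default), most frequent first.
function ngram_frequencies($text, $n = 2)
{
    // Keep letters only and split into UTF-8 characters.
    $clean = preg_replace('/[^\p{L}]/u', '', strtolower($text));
    $letters = preg_split('//u', $clean, -1, PREG_SPLIT_NO_EMPTY);

    // Number of n-grams in a sequence of count($letters) characters.
    $total = count($letters) - $n + 1;
    if ($total < 1) {
        return array();
    }

    $counts = array();
    for ($i = 0; $i < $total; $i++) {
        $gram = implode('', array_slice($letters, $i, $n));
        $counts[$gram] = isset($counts[$gram]) ? $counts[$gram] + 1 : 1;
    }

    // Turn absolute counts into shares of all n-grams.
    foreach ($counts as $gram => $count) {
        $counts[$gram] = $count / $total;
    }
    arsort($counts);

    return $counts;
}

// Five most frequent bigrams of a short French sample.
print_r(array_slice(ngram_frequencies("les camions d'aujourd'hui ne se traînent pas"), 0, 5, true));
```

Unlike the 256 byte counters, the set of possible n-grams is open-ended, so feeding them to a network with a fixed number of inputs would require choosing a fixed list of n-grams first, for example the most frequent ones across the training texts.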

Go to the FANN website

Source: https://habr.com/ru/post/158729/