I work on neural network learning algorithms: so far, fairly simple non-recurrent networks, trained with one flavor or another of gradient descent. Today I spoke at an interesting seminar on neuroinformatics and was asked: why rediscover what has already been invented?
And indeed, there is MATLAB. Anyone can, in a couple of clicks, build a standard network, train it with one of the ready-made and already optimized algorithms on some terribly standard classification problem, and everything will be fine. All the more so since nothing fundamentally new has happened in error backpropagation since the 1970s, and the newer kinds of networks are already in MATLAB anyway.
In this post I will try to show why it is worth reinventing the wheel. Fewer words, so to speak, and more pictures: I sat down for a few hours and made a couple of videos.
We will solve the problem from my previous article. Take a standard architecture, tested for decades: a multilayer perceptron with 4 layers of 10 neurons each, fully connected, with an additional bias synapse for every neuron (nothing more boring has existed since Rumelhart's time). It receives coordinates in the range [-1; 1] at the input and is expected to output a prediction of three numbers: what color the point at that location should be in a photo of a peacock in flight. We then query the network for the colors of all points within the picture and see what it has learned.
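For readers who want to poke at something similar themselves, here is a rough sketch of such a perceptron in PyTorch. My own code is not published in this post, so the tanh activations and the sigmoid output below are just a plausible choice, not necessarily what was used in the experiments:

```python
import torch.nn as nn

def make_mlp():
    # 2 inputs (x, y coordinates in [-1, 1]) -> 4 hidden layers of 10 neurons
    # -> 3 outputs (R, G, B). nn.Linear already includes the bias "synapse"
    # for every neuron. Activations are an assumption, not the article's spec.
    return nn.Sequential(
        nn.Linear(2, 10), nn.Tanh(),
        nn.Linear(10, 10), nn.Tanh(),
        nn.Linear(10, 10), nn.Tanh(),
        nn.Linear(10, 10), nn.Tanh(),
        nn.Linear(10, 3), nn.Sigmoid(),   # colors scaled to [0, 1]
    )
```

With two inputs and three outputs the whole thing has only a few hundred weights, so it trains comfortably on an ordinary home computer.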
On the left is the picture the network draws, with the RMSE below it: the root-mean-square error of the network over all points of the picture. At the bottom right is a graph of the mean squared error over the points of each epoch; since the points are chosen randomly, this error differs from the average over all the data. The graph is dynamically rescaled so it stays readable, and its current scale is shown above it. Finally, the upper-right square shows the position of the network in a random two-dimensional phase space. This visualization method is described in detail in one of my past articles; it is very convenient for understanding what is happening inside the network.
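For reference, the left panel and its RMSE can be produced roughly like this: evaluate the network on a grid of all pixel coordinates and compare with the target photo. The model here is assumed to be the PyTorch sketch above, and the target image an (H, W, 3) tensor of values in [0, 1]:

```python
import torch

def render_and_rmse(model, target):
    """target: float tensor of shape (H, W, 3) with values in [0, 1]."""
    h, w, _ = target.shape
    ys = torch.linspace(-1.0, 1.0, h)
    xs = torch.linspace(-1.0, 1.0, w)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=1)  # (H*W, 2)
    with torch.no_grad():
        pred = model(coords).reshape(h, w, 3)   # the picture the network "draws"
    rmse = torch.sqrt(torch.mean((pred - target) ** 2))
    return pred, rmse.item()
```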
We will train the first network with the crudest method: plain gradient descent. Training examples are fed in mini-batches of 300 points. The method is still used as a building block of more complex algorithms, for example when training today's fashionable deep networks. The video covers 1000 epochs, one frame per epoch, with an adaptive learning rate in the range 0.1 to 0.01.
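A plain mini-batch gradient descent loop along these lines might look as follows. The exact schedule by which the learning rate moves between 0.1 and 0.01 is not spelled out above, so the linear decay here is only an illustrative guess:

```python
import torch

def train_plain_sgd(model, coords, colors, epochs=1000, batch=300):
    # coords: (N, 2) points in [-1, 1]; colors: (N, 3) target RGB in [0, 1]
    n = coords.shape[0]
    for epoch in range(epochs):
        # assumed schedule: learning rate slides linearly from 0.1 down to 0.01
        lr = 0.1 - (0.1 - 0.01) * epoch / max(epochs - 1, 1)
        perm = torch.randperm(n)
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            pred = model(coords[idx])
            loss = torch.mean((pred - colors[idx]) ** 2)
            model.zero_grad()
            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= lr * p.grad          # plain gradient descent step
```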
The video shows that by the middle of training the network almost stops improving, stalling around RMSE = 0.35, but the phase picture reveals a steady lateral drift below the noise level. The last few frames fast-forward through roughly another 8000 epochs to show the local minimum where this drift ends. The network settles at about RMSE = 0.308; torturing it further is pointless.
Now the second video. The same 4 layers of 10 neurons, but here I switch on almost the full power of my best semi-stochastic algorithms, invented in the course of reinventing the bicycle. The mini-batch size adapts between 100 and 1000 points, and the learning rate likewise varies over about one and a half orders of magnitude.
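To give a feel for the knobs involved, and only for that, here is a naive stand-in that simply resamples the batch size and the learning rate every epoch. This is not the adaptive semi-stochastic algorithm from the video, whose details are not published here; the learning-rate range is also an assumption chosen to span about one and a half orders of magnitude:

```python
import math
import random
import torch

def train_semi_stochastic_sketch(model, coords, colors, epochs=500,
                                 lr_lo=0.003, lr_hi=0.1):
    # NOT the author's algorithm: just a random resampling of the two knobs
    # (batch size and learning rate) that the real method adapts.
    n = coords.shape[0]
    for epoch in range(epochs):
        batch = random.randint(100, 1000)                    # batch size drawn anew each epoch
        lr = math.exp(random.uniform(math.log(lr_lo),
                                     math.log(lr_hi)))       # log-uniform learning rate
        idx = torch.randint(0, n, (batch,))                  # random mini-batch of points
        pred = model(coords[idx])
        loss = torch.mean((pred - colors[idx]) ** 2)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad
```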
We immediately see that things go much more briskly. Within 100 epochs the network passes the limit that the textbook method could not reach in many thousands, and after 500 epochs we can admire a far nicer picture at RMSE = 0.24 (and this may not be the end), which is dramatically better than the algorithms without stochasticity. The picture shows that the network has learned a much more beautiful transformation.
But that is not the point. Compare this rather good result with the picture from the previous article, obtained by almost the same stochastic algorithms but on a network free of such a rudiment as fixed layers: RMSE = 0.218.
Personally, one glance is enough for me to see that before moving on, some things in neuroinformatics are well worth rediscovering. As Jobs said, "Stay hungry. Stay foolish." A great deal here can be improved by the most ordinary person on the most ordinary home computer, if they have an idea and take the time to pursue it. Too little knowledge has accumulated in this field, too much is inherited from the era of room-sized computers, and too many wonderful algorithms have simply never been tried by anyone.
And then, maybe, tomorrow Google will offer you a billion, or, better still, the brains-in-a-jar of the future will be named in your honor. Or, cooler yet, you will simply find this world an interesting place to live in.