This is a translation of a post by John Carmack, published on Facebook last week, which has gained some popularity.

After a long break, I finally took another vacation, which I spent programming. For a whole week I was able to work quietly in hermit mode, away from the usual pressures of work. My wife has generously been offering me a vacation like this for several years now, but vacations and I are, in general, poorly compatible things.
As a change of pace from my current work at Oculus, I wanted to write several neural network implementations from scratch in C++, and I planned to do it strictly on OpenBSD. Some of my friends remarked that it was a rather random combination of technologies, but in the end everything worked out well.
Even though I have never had to use OpenBSD in my work, I have always liked the idea of it: a relatively minimal and self-contained system with a coherent vision and an emphasis on quality and craftsmanship. Linux is capable of a lot, but coherence is not what Linux is about.
I am not an ardent Unix fan. I get along with the operating system well enough, but I am most comfortable working in Visual Studio on Windows. I thought a week of immersion in old-school Unix-style work would be an interesting experience, even if it meant working more slowly. It was a kind of retro adventure: fvwm and vi became my best friends for a while. Note: not vim, but the real BSD vi.
In the end, I was not able to explore the system as deeply as I had wanted; I spent 95% of my time on nothing more than basic vi / make / gdb operations. I came to appreciate the high quality of the man pages, since I tried to do everything inside the system itself without resorting to internet searches. It was fun to see references to things more than 30 years old, like Tektronix terminals.
I was a little surprised that C++ support was not up to par. G++ did not support C++11, and LLVM C++ did not play well with gdb. Gdb crashed repeatedly, as it seems to me, because of C++ issues. I know that newer versions can be installed through ports, but I decided to stick to the base system.
Looking back, I think I should have simply gone "full retro" and written everything in ANSI C. Like many older programmers, I often have days when I think along the lines of: "Perhaps C++ is not as good as it is made out to be after all...". There is a lot I like about it, but it would not bother me at all to write a small project in plain C.
Perhaps on my next vacation I will try using only emacs; that is another major layer of programmer culture I have never had time to get acquainted with.
I have a good general understanding of how most machine learning algorithms work, and I have written a linear classifier and a decision tree before, but for some reason I have always avoided neural networks. Deep down, I suspect the fashionable hype around machine learning triggered my skepticism, and I still have some reflexive bias against the "throw everything into a neural network and let it sort it out" approach.
Continuing with the retro theme, I printed out some of Yann LeCun's older papers and planned to do all my work offline, as if I really were in a mountain cabin, but it ended with me watching many of the Stanford CS231N lectures on YouTube, and they turned out to be genuinely helpful. I rarely watch lecture videos, since I usually find it hard to justify spending that much time on them, but on vacation you can allow yourself that.
I do not think I have any insights about neural networks worth sharing, but it was an extremely productive week for me that helped turn theoretical "book" knowledge into real experience.
I used my traditional approach: first get a result quickly with hastily written, rough "hacked" code, then write a new implementation from scratch using the lessons learned, so that both implementations exist and can be cross-checked against each other when necessary.
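A minimal sketch of what I mean by a cross check, assuming two interchangeable forward-pass functions (the names, the stand-in "network", and the tolerance here are only for illustration): run the same input through both implementations and compare the outputs element by element.

```cpp
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

using Net = std::function<std::vector<float>(const std::vector<float>&)>;

// Returns true when both implementations agree element by element within tolerance.
bool CrossCheck(const Net& hacked, const Net& clean,
                const std::vector<float>& input, float tolerance = 1e-5f) {
    std::vector<float> a = hacked(input);
    std::vector<float> b = clean(input);
    if (a.size() != b.size()) return false;
    for (size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tolerance) {
            std::printf("mismatch at %zu: %g vs %g\n", i, a[i], b[i]);
            return false;
        }
    }
    return true;
}

int main() {
    // Trivial stand-in "network" used for both sides, just to show the call.
    Net doubler = [](const std::vector<float>& x) {
        std::vector<float> y = x;
        for (float& v : y) v *= 2.0f;
        return y;
    };
    std::printf("implementations agree: %s\n",
                CrossCheck(doubler, doubler, {1.0f, 2.0f, 3.0f}) ? "yes" : "no");
    return 0;
}
```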
At first I got backpropagation (backprop) wrong a couple of times; the turning point was comparing against numerical differentiation. I found it interesting that a neural network keeps training even when various parts of it are not entirely correct: as long as the sign of the gradient is right most of the time, things often keep moving forward.
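A rough sketch of that kind of check, with a toy quadratic loss standing in for the real forward pass (the function and values are only illustrative): the analytic gradient is compared against a central-difference estimate, and a large relative error points at a backprop bug.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Stand-in loss: f(w) = 0.5 * sum(w_i^2), whose true gradient is simply w.
static double Loss(const std::vector<double>& w) {
    double sum = 0.0;
    for (double x : w) sum += 0.5 * x * x;
    return sum;
}

// Analytic gradient as backprop would report it (here trivially dL/dw_i = w_i).
static void AnalyticGradient(const std::vector<double>& w, std::vector<double>& grad) {
    grad = w;
}

int main() {
    std::vector<double> w = {0.3, -1.2, 0.7};
    std::vector<double> grad;
    AnalyticGradient(w, grad);

    const double eps = 1e-5;
    for (size_t i = 0; i < w.size(); ++i) {
        // Central difference: (f(w + eps) - f(w - eps)) / (2 * eps)
        std::vector<double> wp = w, wm = w;
        wp[i] += eps;
        wm[i] -= eps;
        double numeric = (Loss(wp) - Loss(wm)) / (2.0 * eps);
        double relErr = std::fabs(numeric - grad[i]) /
                        (std::fabs(numeric) + std::fabs(grad[i]) + 1e-12);
        std::printf("w[%zu]: analytic=%g numeric=%g relative error=%g\n",
                    i, grad[i], numeric, relErr);
    }
    return 0;
}
```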
I was pleased with how the code for my multilayer neural network turned out; it is in a form I can simply reuse in further experiments. Yes, for anything serious I would have to use an existing library, but there are plenty of situations in life where it is convenient to have just a couple of .cpp and .h files at hand in which you wrote every line of code yourself. My convolutional network (conv net) code only made it to the "works, but with a bunch of hacks" stage; I could have spent another day or two on it to produce a clean and flexible implementation.
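Purely for illustration (this is not my actual code, just the general shape), a "couple of files" network can boil down to something as small as a dense layer with a forward pass:

```cpp
#include <vector>

struct DenseLayer {
    int inputs, outputs;
    std::vector<float> weights;  // outputs x inputs, row-major
    std::vector<float> biases;   // one bias per output

    DenseLayer(int in, int out)
        : inputs(in), outputs(out), weights(in * out, 0.0f), biases(out, 0.0f) {}

    // y = ReLU(W * x + b)
    std::vector<float> Forward(const std::vector<float>& x) const {
        std::vector<float> y(outputs);
        for (int o = 0; o < outputs; ++o) {
            float sum = biases[o];
            for (int i = 0; i < inputs; ++i) {
                sum += weights[o * inputs + i] * x[i];
            }
            y[o] = sum > 0.0f ? sum : 0.0f;  // ReLU activation
        }
        return y;
    }
};
```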
It seemed interesting to me that when I tested my original network on MNIST (the database of handwritten digit samples) before adding any convolutions, I got significantly better results than the figures quoted for non-convolutional NNs in LeCun's 1998 comparison: about 2% error on the test set with a single layer of 100 nodes, versus 3% for the wider and deeper networks of that time. I attribute this to modern best practices: ReLU, softmax, and improved initialization.
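As a rough sketch of two of those practices (not my exact code; He-style initialization is one common choice for "improved initialization" with ReLU layers):

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// He initialization: weights ~ N(0, sqrt(2 / fan_in)), suited to ReLU units.
void HeInitialize(std::vector<float>& weights, int fanIn, std::mt19937& rng) {
    std::normal_distribution<float> dist(0.0f, std::sqrt(2.0f / fanIn));
    for (float& w : weights) w = dist(rng);
}

// Numerically stable softmax: subtract the max logit before exponentiating.
std::vector<float> Softmax(const std::vector<float>& logits) {
    float maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxLogit);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}
```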
This is one of the most impressive properties of neural networks: they are all so simple that breakthrough results can often be expressed in just a few lines of code. There is apparently some parallel with ray tracing in computer graphics, where you can quickly implement a physically based light transport ray tracer and produce state-of-the-art images, provided you have the necessary data and enough patience to wait for the results.
I got a much better grasp of overfitting / generalization / regularization by sweeping a range of training parameters. On the last night of my vacation I left the architecture alone and just played with hyperparameters. As it turned out, keeping my concentration while "it's training" was much harder than in the classic case of "it's compiling".
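As an example of one such regularization knob, here is a small sketch (the lambda value is only an example, not something from my runs) of L2 weight decay folded into a plain SGD update:

```cpp
#include <vector>

void SgdStepWithWeightDecay(std::vector<float>& weights,
                            const std::vector<float>& gradients,
                            float learningRate, float lambda) {
    for (size_t i = 0; i < weights.size(); ++i) {
        // The decay term lambda * w pulls each weight toward zero,
        // which penalizes large weights and helps generalization.
        weights[i] -= learningRate * (gradients[i] + lambda * weights[i]);
    }
}

// Example call: SgdStepWithWeightDecay(weights, grads, 0.01f, 1e-4f);
```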
Now I will be keeping an eye out for a good opportunity to apply these new skills in my work at Oculus!
I am afraid to even think what has become of my inbox and my desk while I was away; we will see tomorrow.