In the spring, the Habr administration kindly provided us with a blog so that we could write about our experiments with license plate recognition. We maintained the whole system purely out of interest and enthusiasm, but it let us talk to interesting people, help a few of them, and even find part-time work on completely different topics.

In any image-processing task, 90% of success is a good database: representative and large. In the spring we promised to publish the full database of images that came to our server. Our blog subscription is ending, so it is time to fulfill that promise (the blog may be extended, or maybe not). The server has been up about 95% of the time since the first post. Everything that arrived is now available, and in addition we have made separate datasets of cropped plates and cropped symbols.
Below: a link to the dataset + its analysis + some code + a short story about what comes next for our server and the project.
The dataset itself
The plate datasets are unlabeled (there is no file anywhere with the correct transcriptions). Only the symbol dataset is labeled.
Uncropped photos of cars (1.4 GB): approximately 9,300 frames.
Image sizes range from a couple of hundred pixels to a dozen megapixels. The pictures look like this:
Cropped plates + counterexamples (260 MB): approximately 5,000 plates + 1,200 counterexamples.
Pictures look like this:
Cropped symbols for Russian plates (60 MB): approximately 18 thousand letters, digits, and counterexamples.
* Small addition: folder "17" is empty. It used to contain the letter "O", but the classifiers could not distinguish it from zero, so we merged those images into folder "0".
Pictures look like this:

A few words about the dataset
The dataset combines what we collected ourselves, what users sent us, and a manual weeding-out of bad frames. For the symbols, a cropped set of negative examples is provided (folder number 22). Without a negative sample a real algorithm is almost unusable: letters will be detected in any noise. The negative sample contains fragments of letters; without these, half of an eight can be mistaken for a three. At the end of the article there is an example of using this dataset. A negative sample is also provided for the cropped plates, in case anyone wants to train a cascade on them.
Among the dataset's shortcomings:
- a fair number of frames were shot off a monitor
- some plates appear more than once
- there are duplicate images, although not very many
- the weeding of the large-photo set is imperfect. Absolutely everything reached the server: completely irrelevant photos, empty photos, photos with unreadable plates. I cleaned out most of it, but some residue may remain.
Why is this needed?
Oddly enough, plate recognition comes up in a wide variety of situations, often unrelated to plates themselves. And although there seem to be dozens of ready-made solutions, there are many projects where it has to be solved independently and from scratch. A dataset like this is a help for such projects.
It is also interesting for testing recognition and classification algorithms. Take
MNIST : a synthetic task, but still interesting to many. Here, by contrast, there is the possibility of real application for the trained algorithm.
The project's further life
We plan to keep the plate-recognition server running and to update the algorithm from time to time. The phone application, as we expected, did not really take off. It is available in the
PlayStore and there is a version for the
iPhone . The applications generally work, but we will not support them further. Besides, they did not collect a particularly large dataset. The code for
both applications is
open . If you want, you can finish it yourself.
On the other hand, we were pleasantly surprised that our server was often used as a kind of benchmark for testing other people's algorithms. Three times huge databases were run through our server. We do not know exactly by whom or why, but it is flattering: it shows that people took the time to compare their algorithm with ours, even if ours is far from ideal.
Hence an idea: we kept a small piece of the dataset for ourselves. If anyone is interested, we can run your algorithm on our held-out set (or give the set to you), on the conditions of non-redistribution of the set and publication of the results. We will publish the results
here and
here ; when publishing, we will put a link to your website / contact.
Keeping such a test set private is necessary to avoid overfitting to it, as happens with open datasets (for example, MNIST).
If our blog is extended, we will report the results and the evaluation methodology on Habr.
About the cascade
We have already been asked five times to update the cascade. It needs a few more evenings, and right now there is no time. We will definitely do it; the sorted dataset presented here is a large piece of the work in that direction. The new version will appear in our
repository . I will most likely get to it over the January holidays, but I may not make it. In principle, anyone who is not too lazy can take the dataset presented here and train a cascade on it. How to do
this — here is the instruction. If you do, send it to us, or straight to GitHub, and we will update the cascade with it.
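For orientation, cascade training with OpenCV's standard tools usually looks roughly like the sketch below. This is an assumption on my part, not the authors' exact recipe: the file names (`positives.txt`, `negatives.txt`) and window size are hypothetical, and `positives.txt` / `negatives.txt` would have to be built from the cropped plates and counterexamples in the dataset.

```shell
# positives.txt: annotation file listing plate images; negatives.txt: one
# counterexample image path per line. Both are hypothetical names.
mkdir -p cascade_out

# Pack the positive samples into a .vec file at a fixed detection window size.
opencv_createsamples -info positives.txt -vec plates.vec -num 5000 -w 36 -h 12

# Train the cascade; numPos is kept below the total number of packed samples.
opencv_traincascade -data cascade_out -vec plates.vec -bg negatives.txt \
    -numPos 4500 -numNeg 1200 -numStages 16 -w 36 -h 12 -featureType LBP
```

The `-w`/`-h` window must match between the two commands; LBP features train much faster than Haar at a small cost in quality.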
Training symbol recognition
Although we published all the logic and algorithms in previous articles, we do not want to publish the source code of the part that recognizes plate letters (if we ever decide to shut down the server completely, we will of course publish it).
Still, we would like to show how to recognize symbols quickly and easily. So we went through several simple algorithms that are easy to train (given a large dataset) and trained them on the dataset published above. The most interesting and, in our opinion, simplest option is the SVM in the Accord library (the ML library of the AForge project). In principle, everything is done similarly in OpenCV; it has an SVM too.
Training:
using Accord.MachineLearning.VectorMachines;
using Accord.MachineLearning.VectorMachines.Learning;
using Accord.Statistics.Kernels;

double[][] inputs;   // input vectors: one flattened, binarized symbol image per element
int[] outputs;       // the "answers" for the SVM: the class label of each input
double sigma = 12;   // Gaussian kernel width; values around 10-20 work, 12 was best for us
int classCount = 23; // 22 symbol classes + 1 class of negative examples

// One-vs-one multiclass machine over width*height input pixels with a Gaussian kernel
MulticlassSupportVectorMachine machine =
    new MulticlassSupportVectorMachine(width * height, new Gaussian(sigma), classCount);

MulticlassSupportVectorLearning teacher =
    new MulticlassSupportVectorLearning(machine, inputs, outputs);
teacher.Algorithm = (svm, classInputs, classOutputs, i, j) =>
    new SequentialMinimalOptimization(svm, classInputs, classOutputs) { CacheSize = 0 };
teacher.Run();
machine.Save("MachineForSymbol");
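For readers outside .NET, a comparable setup can be sketched with scikit-learn's `SVC`. This is my assumption, not the authors' code: scikit-learn parameterizes the Gaussian (RBF) kernel as gamma rather than sigma, with gamma = 1 / (2 * sigma^2), and the toy data below stands in for the real flattened 17x30 symbol images.

```python
import numpy as np
from sklearn.svm import SVC

# The article uses sigma = 12 on 17x30 binarized images; for this 4-feature
# toy demo a smaller sigma is appropriate. gamma = 1 / (2 * sigma^2).
sigma = 1.0
gamma = 1.0 / (2.0 * sigma ** 2)

# Tiny synthetic "binarized" vectors standing in for flattened symbol images.
X = np.array([[0, 0, 1, 1],
              [0, 1, 1, 1],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

# SVC trains one-vs-one binary machines internally, analogous to Accord's
# MulticlassSupportVectorMachine + SequentialMinimalOptimization pair.
clf = SVC(kernel="rbf", gamma=gamma, C=1.0)
clf.fit(X, y)
pred = clf.predict(X)
```

Saving and loading the trained machine (the `machine.Save` / `Load` step above) is done in Python with `joblib.dump` / `joblib.load`.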
Recognition is implemented in two lines:
MulticlassSupportVectorMachine machine = MulticlassSupportVectorMachine.Load("MachineForSymbol");
int output = machine.Compute(input);
It is important that the input data be binarized: this greatly improves accuracy. Here is an example of loading an image:
using AForge.Imaging;
using AForge.Imaging.Filters;

private static List<double> test(string str)
{
    Bitmap bmp = new Bitmap(str);
    // Symbols in the dataset are roughly 34*60 pixels; downscale to 17*30
    ResizeNearestNeighbor filter = new ResizeNearestNeighbor(17, 30);
    bmp = filter.Apply(bmp);
    BitmapData bitmapData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height),
        ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
    List<double> res = new List<double>();
    int width = bitmapData.Width;
    int height = bitmapData.Height;
    int stride = bitmapData.Stride;
    int offset = stride - width * 3;
    unsafe
    {
        byte* ptr = (byte*)bitmapData.Scan0.ToPointer();
        double summ = 0;
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++, ptr += 3)
            {
                // Average the three channels and normalize to [0..1]
                res.Add((ptr[0] + ptr[1] + ptr[2]) / (3 * 255.0));
                summ += (ptr[0] + ptr[1] + ptr[2]);
            }
            ptr += offset;
        }
        // Mean brightness of the whole image, used as the binarization threshold
        summ = summ / (3 * 255.0 * height * width);
        for (int i = 0; i < res.Count; i++)
        {
            if (res[i] < summ)
                res[i] = 0;
            else
                res[i] = 1;
        }
    }
    bmp.UnlockBits(bitmapData);
    return res;
}
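The same preprocessing can be sketched outside .NET. Below is a minimal NumPy version; the function name and default sizes are mine, mirroring the 34*60 → 17*30 nearest-neighbour resize and mean-threshold binarization of the loader above.

```python
import numpy as np

def binarize_symbol(gray, out_h=30, out_w=17):
    """Nearest-neighbour downscale of a grayscale image, then binarize at the mean.

    Mirrors the C# loader: resize, normalize to [0..1], then threshold at the
    mean brightness so dark pixels become 0 and bright pixels become 1.
    """
    h, w = gray.shape
    rows = np.arange(out_h) * h // out_h   # nearest-neighbour row picks
    cols = np.arange(out_w) * w // out_w   # nearest-neighbour column picks
    small = gray[rows][:, cols].astype(float) / 255.0
    return (small >= small.mean()).astype(float)
```

The flattened result, `binarize_symbol(img).ravel()`, is what would be fed to the SVM as one input vector.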
An example of the trained SVM's performance: 96% of characters are correctly recognized on a sample independent of the training set.
PS
One more story related to plates. Right now in Kazakhstan there is (or was) some kind of mass tender for installing plate-recognition systems. But apparently there are very few competent IT system-integration firms there. Roughly once or twice a month yet another manager contacts us and offers us to supply them, right away, with a system already trained on the country's plates, all while confusing the words "server" and "camera"...