
Image Classifier

We are given a bit matrix containing a filled-in image of a circle, square, or triangle. The image may be slightly distorted and may contain noise. The task is to write an algorithm that determines which figure is drawn in the matrix.

This task, simple at first glance, greeted me at the entrance exam at DM Labs. At the first lesson we discussed the solution, and the teacher (Alexander Shlemov, who also supervised the further implementation) showed why it is better to solve it with machine learning.

During the discussion we found that our solution consists of two stages. The first stage is noise filtering; the second is computing the metrics by which the classification will be done. Here the problem of defining boundaries arises: we need to know what values each metric can take for each shape. These boundaries could be drawn by hand, "by eye", but it is better to entrust this to a mathematically sound algorithm.
This learning task was my introduction to machine learning, and I would like to share the experience with you.

Main idea


First, we generate many random images of circles, triangles and quadrangles. Next, we pass the pictures through filters to eliminate the noise and compute a set of metrics that will be used to classify the figures. We determine the boundary values for the metrics with a decision tree, specifically its CART variant: the algorithm is simple to implement and is easily interpreted in terms of boundaries.
By passing an unknown image through the filters and computing its metrics, we can then tell from the decision tree which class the figure in the picture belongs to.

Generation


The generator must not only create pictures but also be able to add noise to them. When generating, we discard figures with sides shorter than a certain size (for example, 15 pixels) and with very obtuse angles (more than 150 degrees): even a human finds such figures hard to classify.
The script was kindly provided by Victor Evstratov.
bitbucket.org/rampeer/image-classifier/src/HEAD/picture_generator.py

Filters


During a discussion with colleagues we came up with good ideas for the filters. We treat black pixels surrounded by pixels of the opposite color as noise (this idea forms the basis of the median filter), and a large connected mass of black pixels as the desired contour (this is the "bicomponent", i.e. connected-component, filter).

Median filter: for each black pixel we count the number of black neighbours within a radius R. If this number is less than some threshold T, we discard the pixel, i.e. paint it white. In the "classic" median filter a pixel is dropped if fewer than half of its neighbours are black; since we choose our own threshold T, ours is really a quantile filter.
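A naive version of this quantile filter can be sketched as follows (the function name, parameter order and the 1-for-black pixel convention are my own assumptions; the repository version may differ):

```python
import numpy as np

def quantile_filter(img, radius, threshold):
    """Naive O(W*H*R^2) filter: keep a black pixel only if it has
    at least `threshold` black neighbours within `radius`.
    `img` is a 2-D array with 1 for black and 0 for white pixels."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            if img[y, x] == 0:
                continue
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            # Count black pixels in the square neighbourhood,
            # excluding the pixel itself
            neighbours = img[y0:y1, x0:x1].sum() - 1
            if neighbours >= threshold:
                out[y, x] = 1
    return out
```

An isolated black pixel with no neighbours is painted white, while pixels inside a solid blob survive.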



I wrote the median filter honestly and head-on, so it works in O(WHR^2), where W and H are the dimensions of the image. The teacher pointed out that the algorithm can be accelerated with the Fourier transform: the median filter can be expressed through convolution, and convolution can be computed quickly via the fast Fourier transform.
It turns out that the matrix with the number of neighbours can be computed as
result = FFT^-1(FFT(data) * FFT(window))
This algorithm works in O(WH log(W+H)), much faster than the naive implementation, and the cost does not depend on the window size. Because the convolution is cyclic, an artifact occurs at the boundaries: when processing the leftmost pixels, the rightmost ones are counted as their neighbours. This can be corrected by adding a frame of white pixels around the image; or you can leave it as is, which is what I did, since this effect does not hurt the accuracy.
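With numpy the neighbour-count matrix via FFT might look like this. This sketch adds the zero-padding frame to suppress the wrap-around artifact (unlike the article's version, which leaves it in); all names are mine:

```python
import numpy as np

def neighbour_counts_fft(img, radius):
    """Count black neighbours of every pixel at once via FFT
    convolution with a (2R+1)x(2R+1) window of ones. The image is
    padded by `radius` so the cyclic convolution cannot wrap around."""
    h, w = img.shape
    padded = np.zeros((h + 2 * radius, w + 2 * radius))
    padded[radius:radius + h, radius:radius + w] = img
    # Build the window and shift it so its centre sits at (0, 0)
    window = np.zeros_like(padded)
    window[:2 * radius + 1, :2 * radius + 1] = 1
    window = np.roll(window, (-radius, -radius), axis=(0, 1))
    counts = np.fft.ifft2(np.fft.fft2(padded) * np.fft.fft2(window)).real
    counts = np.rint(counts)[radius:radius + h, radius:radius + w]
    return counts - img  # exclude the pixel itself
```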
bitbucket.org/rampeer/image-classifier/src/HEAD/filter_median.py
However, this filter has a bad property: it rounds off the acute angles of triangles. Because of this, the window radius R has to be kept small and the filter applied only a few (N) times, although at first it seemed that the median filter could be applied for as long as it keeps removing pixels.

Bicomponent filter: take an arbitrary black pixel and assign it some number Q. Assign the same number to the black neighbours of this pixel within radius R, then repeat for the neighbours of the neighbours, and so on until the region stops growing (this resembles the Fill tool in Paint, only we paint with the "color" Q). Then we increase Q by one, pick the next uncolored black pixel and repeat the process.
After running this algorithm we get a set of disjoint "islands". We can say with high confidence that the largest island is the figure and the rest are noise.
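The bicomponent filter is essentially flood-fill labeling followed by keeping the biggest label. A sketch (function name and interface are my assumptions):

```python
from collections import deque
import numpy as np

def largest_component(img, radius=1):
    """Label connected 'islands' of black pixels via flood fill
    (neighbourhood of the given radius) and keep only the largest."""
    h, w = img.shape
    labels = np.zeros((h, w), dtype=int)
    q_number = 0
    sizes = {}
    for sy in range(h):
        for sx in range(w):
            if img[sy, sx] == 0 or labels[sy, sx] != 0:
                continue
            q_number += 1  # new island: new "color" Q
            queue = deque([(sy, sx)])
            labels[sy, sx] = q_number
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        if img[ny, nx] == 1 and labels[ny, nx] == 0:
                            labels[ny, nx] = q_number
                            queue.append((ny, nx))
            sizes[q_number] = size
    if not sizes:
        return np.zeros_like(img)
    biggest = max(sizes, key=sizes.get)
    return (labels == biggest).astype(img.dtype)
```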


bitbucket.org/rampeer/image-classifier/src/HEAD/filter_bicomp.py
Unlike the previous one, this filter does not spoil the image. It can cause damage only if the contour is crossed by a noise line wider than T, which is unlikely.
The median filter, it turns out, has another property, this time a positive one: it breaks up space filled with noise into islands. So by applying the bicomponent filter afterwards, we get the contour with some noise still stuck to it. After the bicomponent filter, the median should be run once more to remove the remaining irregularities. Then we build a convex hull around the remaining points, fill it in, and compute the metrics on this new image.
bitbucket.org/rampeer/image-classifier/src/HEAD/process.py
The filters at work (left to right): source image; median filter; median + bicomponent; median + bicomponent + median + fill.

The filter parameters are chosen based on the size of the images in the sample and their noise level. For the tricky quadrangle shown above:
  median.filter(img, 2, 6, 1)
  bicomp.filter(img, 2)
  median.filter(img, 2, 5, 2)

In our sample the images will be less noisy, and the settings can be gentler.

Metrics


In the course of the same discussion with colleagues, we came up with many more interesting metrics.

Counting angles. This is the very first thing that comes to mind after reading the task. But because of noise, extra angles close to 0 degrees may appear. I tried, unsuccessfully, to deal with this by merging almost-collinear vectors and setting thresholds. Such methods are hard to tune, and they can still give an incorrect result because the figure gets smoothed during filtering. It is better to sum the squared sines of the angles, rounding the square up to one for angles greater than some threshold T. This gives a fairly good result: sharp angles add one to the counter, while angles close to 0 add almost nothing. By the way, I found it amusing that with this scheme the number of angles of a triangle can vary from 2.5 to 4.
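For a polygonal contour the soft angle counter could be sketched like this (the threshold value, names and the vertex-list interface are my assumptions):

```python
import numpy as np

def angle_score(vertices, threshold_deg=30):
    """Soft angle counter for a closed polygon given as (x, y)
    vertices: each corner contributes sin^2 of its turn angle,
    rounded up to 1 once the turn exceeds the threshold."""
    pts = np.asarray(vertices, dtype=float)
    n = len(pts)
    score = 0.0
    for i in range(n):
        a = pts[i] - pts[i - 1]          # incoming edge
        b = pts[(i + 1) % n] - pts[i]    # outgoing edge
        cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        turn = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
        # Sharp turns count as a full corner; near-zero turns
        # (almost collinear edges) contribute almost nothing
        score += 1.0 if turn > threshold_deg else np.sin(np.radians(turn)) ** 2
    return score
```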

bitbucket.org/rampeer/image-classifier/src/HEAD/feature_edges.py

Mirroring. The idea of this metric is to measure how much the figure coincides with a copy of itself reflected horizontally / vertically / along both axes; that is, it is a kind of measure of symmetry. A circle, whatever you do, always coincides with itself completely. I also intuitively assumed that a square would coincide with itself more than a triangle.
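A minimal sketch of such a symmetry measure (my own naming and normalization; feature_mirror.py may compute it differently):

```python
import numpy as np

def mirror_score(img):
    """Fraction of figure pixels that coincide with the flipped copy,
    for horizontal, vertical and double reflection. The figure is
    cropped to its bounding box first so the flips line up."""
    ys, xs = np.nonzero(img)
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    area = crop.sum()
    def overlap(flipped):
        return np.logical_and(crop, flipped).sum() / area
    return (overlap(crop[:, ::-1]),     # horizontal flip
            overlap(crop[::-1, :]),     # vertical flip
            overlap(crop[::-1, ::-1]))  # both axes
```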

bitbucket.org/rampeer/image-classifier/src/HEAD/feature_mirror.py

The ratio of the area to the square of the perimeter. The perimeter must be squared so that the metric is dimensionless and does not depend on the size of the image. The perimeter and area are taken from the convex hull. The circle has the largest value of this metric among the figures: s/p^2 = (πr^2)/(2πr)^2 = 1/(4π). For an equilateral triangle (which has the largest ratio among triangles): s/p^2 = (√3 a^2/4)/(3a)^2 = √3/36. For the square: s/p^2 = a^2/(4a)^2 = 1/16.
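These values are easy to sanity-check numerically (unit sizes; since the metric is dimensionless, the choice of size does not matter):

```python
import math

# s/p^2 for each shape, using radius r = 1 and side a = 1
circle = (math.pi * 1 ** 2) / (2 * math.pi * 1) ** 2       # 1/(4*pi) ~ 0.0796
triangle = (math.sqrt(3) / 4 * 1 ** 2) / (3 * 1) ** 2      # sqrt(3)/36 ~ 0.0481
square = (1 ** 2) / (4 * 1) ** 2                           # 1/16 = 0.0625
```

The ordering triangle < square < circle is what lets the decision tree place boundaries between the classes.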

bitbucket.org/rampeer/image-classifier/src/HEAD/feature_perimeter.py

Metrics on the bounding rectangle. The previous metrics did not separate quadrangles and triangles very well, so I decided to invent a new one. Construct a bounding rectangle around the figure. On each side of the rectangle, find the first ("minimal") and last ("maximal") intersection with the figure. This gives an "octagon" for which various metrics can be computed.

For example, the ratio of the area of the figure to the area of the bounding rectangle (sbound):
bitbucket.org/rampeer/image-classifier/src/HEAD/feature_sbound.py
Another promising metric is the ratio of the area of the figure to the area of this octagon (sbound2):
bitbucket.org/rampeer/image-classifier/src/HEAD/feature_sbound2.py
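A minimal sketch of the first of these ratios (the real feature_sbound.py may compute it differently):

```python
import numpy as np

def sbound(img):
    """Ratio of the figure's pixel area to the area of its
    axis-aligned bounding rectangle."""
    ys, xs = np.nonzero(img)
    box_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return img.sum() / box_area
```

A filled axis-aligned square scores 1.0; a thin diagonal figure scores much lower, which is what separates the shapes.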

Data collection


Applying this knowledge, I generated images and collected statistics. To this set of images I added shapes representing potential "bad cases" for the metrics:

I gave these cases a large weight when building the tree. The probability of randomly generating such images is small, yet they are boundary cases for some metrics, so for correct classification they have to be added to the sample with a large weight.
I moved the procedure for filtering images and collecting metrics into a separate file; it will be needed for the analysis. By the way, in decision-tree terminology our metrics are called "input variables" or "features".
bitbucket.org/rampeer/image-classifier/src/HEAD/process.py
bitbucket.org/rampeer/image-classifier/src/HEAD/stats.py
Then I plotted the data to make sure the metrics distinguish the figures well enough. A box-and-whisker plot is suitable for this:

The "whiskers" overlap, which means misclassifications are possible. As written above, this is exactly why we need several metrics rather than one. Then I checked that the "bad cases" of the different metrics do not coincide, by plotting one metric against another: if two metrics turned out to be monotonically dependent on each other, their "bad cases" would coincide too.

As the plots show, the "clouds" of points intersect strongly, so classification by any single metric would make big mistakes. On the other hand, the metrics are not monotonically dependent on each other.

Analysis


From the collected data we can build a decision tree:
bitbucket.org/rampeer/image-classifier/src/HEAD/analyze.py
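The core of CART's boundary search, and the place where the large weights of the "bad cases" enter, can be sketched for a single metric. This is a minimal illustration of the idea, not the analyze.py code:

```python
import numpy as np

def best_split(values, labels, weights):
    """CART-style boundary on one metric: try every midpoint between
    sorted values and keep the one minimising the weighted Gini
    impurity of the two halves. A real tree repeats this recursively
    over all features; sample weights enter the impurity sums, which
    is how the up-weighted 'bad cases' shift the boundary."""
    classes = np.unique(labels)
    def gini(w, lab):
        total = w.sum()
        if total == 0:
            return 0.0
        p = np.array([w[lab == c].sum() / total for c in classes])
        return 1.0 - (p ** 2).sum()
    order = np.argsort(values)
    v, lab, w = values[order], labels[order], weights[order]
    best = (None, np.inf)
    for i in range(1, len(v)):
        if v[i] == v[i - 1]:
            continue
        thr = (v[i] + v[i - 1]) / 2
        left, right = v < thr, v >= thr
        score = (w[left].sum() * gini(w[left], lab[left])
                 + w[right].sum() * gini(w[right], lab[right])) / w.sum()
        if score < best[1]:
            best = (thr, score)
    return best  # (threshold, weighted impurity)
```

On a cleanly separable metric the search lands exactly between the two clusters, with zero impurity.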
I tried to visualize the tree and got this scheme:

It shows that some metrics remained unused; from the very beginning we could not predict which metrics would turn out "better" than others.
The prediction accuracy is about 91 percent, which is not bad considering the distortions and noise in the sample:



Let's try to draw some images ourselves and analyze them:

Circle

Rectangle

Triangle

Now let's turn up the pressure: we will distort the figures until they stop being classified correctly.

Rectangle

Rectangle

Triangle

Rectangle

That's all.
In the last image the corners of the triangle are strongly rounded, so edges cannot work correctly and perimeter gives too large an error. The triangle is also rotated unfavourably: only two of its vertices touch the bounding rectangle, so sbound and sbound2 yield nothing reasonable. Only mirror could give the correct answer, but it is not included in the tree. And when only one metric out of five points to a triangle, the conclusion is interpreted as erroneous.

Conclusion


Machine-learning methods allowed us to build a system that copes well with the task: it recognizes the figures in the images quite reliably.
The graphs were built in R:
bitbucket.org/rampeer/image-classifier/src/HEAD/charts.R

Source: https://habr.com/ru/post/205842/

