Label recognition technology, using IKEA labels as an example (2 videos)

The task of label recognition is of great practical importance: solving it can significantly simplify the handling of goods in stores, from merchandising all the way to the sale to the end buyer. However, because the task itself is poorly formalized and the objects to be recognized vary widely, there is currently no universal label recognition technology. Commercial enterprises, aware of the high commercial value of such automation, resort to various workarounds (for example, they stick 1D or 2D barcodes on goods).

Even so, label recognition keeps attracting many inquiring minds. We wanted to find a practical task that is solved with label recognition technology and at the same time brings public benefit. The answer came by itself during yet another meal of the famous Swedish meatballs with cranberry sauce.

That's right, we will talk about label recognition at IKEA, the furniture and household goods hypermarket. Let's get started.

Consider the features of the problem:

the object of recognition is well known and understandable;
the red and yellow IKEA labels are quite high-contrast;
IKEA labels are filled in according to definite rules;
IKEA product numbers are subject to strict patterns.
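
The strict pattern of product numbers from the last item makes recognition results easy to validate. Below is a minimal sketch, assuming a product number is eight digits printed as three dot-separated groups (XXX.XXX.XX). Both this format and the function name normalize_article are assumptions made for illustration and are not taken from the program itself.

```python
import re

# Assumed format: eight digits, optionally grouped with dots as XXX.XXX.XX.
ARTICLE_RE = re.compile(r"(\d{3})\.?(\d{3})\.?(\d{2})")


def normalize_article(text):
    """Return the number as 'XXX.XXX.XX', or None if the text does not fit the pattern."""
    match = ARTICLE_RE.fullmatch(text.strip())
    if match is None:
        return None
    return ".".join(match.groups())
```

A check like this lets the program reject a recognition result that contains a letter or has the wrong number of digits before it reaches the shopping list.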

In this form, the task sounds both interesting and not too difficult: a great chance to test beginners (interns) on a real case. So today we present SmartHelper, a program created by three interns of our IT lab that can make shopping at IKEA easier for thousands of users.

Just as any car, however fancy, consists of a body, an engine, a transmission and a chassis, a recognition program contains a number of mandatory modules:

1. Image acquisition and preprocessing module. Frames captured from the camera enter the program as color images in RGB representation. There is no need to drag a multi-megabyte image through the rest of the processing, so we should suppress the color, preferably without degrading the picture for further recognition. Therefore, the first step is to analyze the central part of the input image to determine what color of label we are recognizing, after which the appropriate color suppression method is selected.

2. Document localization and orientation module. Capturing document images with a mobile phone camera introduces many geometric and optical distortions. This module is responsible for finding the borders of the label, determining its type, determining the projective basis and rectifying the image of the document.

3. Field localization and recognition module. Any document (an IKEA label in particular) consists of auxiliary and informational elements. On the label it is necessary to recognize the product number and the row and place in the warehouse. Depending on the type of label, these fields may be located in different places or be absent altogether.

4. Text line recognition module. The purpose and functionality of this module need, we are sure, no further explanation. We will only note that we use dynamic programming to segment a line into characters, and that the recognition engine works directly on the grayscale image (without prior binarization).

5. Result integration module. This module is a must-have for video stream recognition. A field can be recognized much more confidently by analyzing the results from successive frames. We call this technique “interframe integration of results”. It is sometimes also useful when recognizing individual images (important fields are often duplicated within a document, and their recognition results can be “integrated”).
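
To illustrate item 5, here is a minimal sketch of how results from successive frames might be combined. It assumes each frame yields a recognized string together with a confidence value, and it uses a simple confidence-weighted vote per character position. The function name integrate_frames, the input format and the voting rule are all assumptions for illustration; the source does not describe the scheme actually used.

```python
from collections import defaultdict


def integrate_frames(frame_results):
    """Combine (text, confidence) pairs from successive frames into one string.

    Hypothetical scheme: choose the string length with the largest total
    confidence, then take a confidence-weighted vote at each character position.
    """
    if not frame_results:
        return ""

    # Total confidence collected by each candidate string length.
    length_weight = defaultdict(float)
    for text, conf in frame_results:
        length_weight[len(text)] += conf
    best_len = max(length_weight, key=length_weight.get)

    # Weighted vote per position among results of the chosen length.
    votes = [defaultdict(float) for _ in range(best_len)]
    for text, conf in frame_results:
        if len(text) != best_len:
            continue
        for pos, char in enumerate(text):
            votes[pos][char] += conf

    return "".join(max(v, key=v.get) for v in votes)
```

With a vote like this, an error that appears in one frame is outweighed as long as the other frames agree on the correct character at that position.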

As a result, we have a fully functional program that is a pleasure to talk about. But before publishing a story about this technology on Habr, we of course told our friends about it (people who are far from programming and object recognition but close to various economic matters and to shopping at IKEA). The reactions varied, from wild delight (which is incredibly pleasant) to a complete failure to see the point. After a kind of data mining of the answers we received, we formulated three main questions that interested our “potential users”.

1. Why bother with recognition when everyone has long been used to simply taking pictures of labels?

Again, the primary purpose of the program is to serve as a kind of test for our new employees. But the resulting program is not without practical value. Unlike a chaotic set of label photographs, our program lets you build a shopping list compactly and conveniently while walking through the showroom. And given an agreement with IKEA, it would also be possible to check the availability of goods in the store and calculate the running total of the basket.

2. Well, then why do the recognition on the mobile phone, when it is easier to send a photo to the cloud and wait for the results?

From a software developer's point of view, remote recognition on a powerful server is, of course, easier. But not every user will appreciate this approach, for several reasons. Firstly, even today not every IKEA customer has unlimited mobile Internet. Secondly, even for customers who do, reception may be poor in parts of the store for technical reasons (as a rule, an IKEA store is a large reinforced concrete structure that blocks high-frequency radio waves). Finally, as we described in our previous post, your smartphone is fully capable of solving recognition tasks quickly enough, so why send something somewhere and wait for an answer?

3. And what part did the experienced programmers of your organization play in developing the program?

Of course, the program contains a number of subtasks that our novice programmers could not solve on their own. Specifically, they used ready-made text line recognition tools (a character segmenter and a letter recognition engine). The rest of the functionality (frame preprocessing, field localization, interframe integration of results) they implemented themselves, practically from scratch.

Of course, there was one more task that fell on the shoulders of absolutely every employee of our organization: the resulting product had to be tested in real conditions with all rigor and seriousness. Below is the video, filmed outside our IT lab. We invite you, dear readers of our blog, to take part in testing our new program (for now, the iPhone version has been uploaded to the Apple Store).

Source: https://habr.com/ru/post/255699/
