
How we learned to process receipt scans quickly

In this article I'll describe how we learned to recognize receipts (or, more precisely, card payment slips), which pitfalls we ran into, and what efficiency we achieved.

I'll start with a brief intro. We make our living running promotions for manufacturers and retailers of various consumer goods. As a rule, they follow the pattern “buy a product, find a code, send it to us, win a prize”. Our main product is a platform that generates, accepts and processes these codes, helps us communicate with participants, pays out bonuses and does a lot of other interesting things.

Recently, we have increasingly seen promotions run by our peers in the industry in which proof of purchase was not a promo code, as in the example above, but a photo of the receipt. The practice was becoming widespread, and staying on the sidelines was not an option.
First of all, I registered in every receipt-based promotion I could find (well, to be honest, not every one: somewhere between the tenth and the twentieth I got tired). As you can guess, I had no receipts, so instead of a receipt I uploaded a photo of a cat against an abstract background. Imagine my surprise when every promotion except one accepted my cat and entered me into the prize draw. Some even handed out an instant prize in the form of a promo code for an online library. To be honest, the one remaining promotion accepted the cat too, but said it would be sent to moderation and that a decision on my participation would be made within 8 (!) hours.

This approach clearly doesn't work for us. First, letting a person enter the draw with any photo at all is a bad idea. He can upload a photo of the same receipt as many times as the platform allows, multiplying his chances of winning. When one of those entries wins, he presents the original of that single receipt and collects the prize. Of course, he might win twice and be exposed, but I digress. Second, leaving a person without feedback for 8 hours looks like a mockery in a world where a visitor who stays on a site for more than 15 seconds already counts as a targeted one. Third, handing out a prize for a photo of a cat makes you look like a not very competent organizer. By the way, here he is.



The conclusion suggested itself: we had to learn to recognize receipts. The task is hard, so we turned to professionals, a well-known company. Luckily, they had a receipt recognition solution, which, unfortunately, had not yet been localized for the Russian market. On that occasion we were given 1000 free recognition attempts, a promise of advice, and wishes of good luck.

By then a request had come in from a client. Our task was to run a promotion for a large retail chain. Looking ahead, I'll say that we received up to 1000 registrations per day. To qualify for prizes, a participant had to make purchases totaling N rubles within a certain period and pay with a VISA card. The photo of the slip received with the purchase had to be uploaded to our promotional site. If you were declared a winner, you had to present the slip and the VISA card at the checkout and collect the prize. One photo, one chance to win. Winners were determined by a special formula among all participants who had uploaded valid slips. Our task at this stage was to take a slip and either allow or deny the person's participation in the draw. At the same time, we wanted to filter out as many tricksters as possible: people who might try to pass off the same slip twice, submit a slip printed before the promotion started, and many more interesting things, including but not limited to a photo of a cat.

Extensive testing of the big company's product showed that it recognizes the purchase amount, card type, card number, and the time and date the slip was printed. It seemed we had everything we needed: duplicates could be discarded (by hashing the recognized parameters and the image itself), and the amount, date, payment system and card number were all recognized. Admittedly, they were recognized with errors... and not always.
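
Duplicate detection by hashing could look roughly like the sketch below. It is an illustration under assumptions, not our production code: the function and class names are hypothetical, SHA-256 is an arbitrary choice, and an in-memory set stands in for whatever storage a real platform would use. The image hash only catches byte-identical re-uploads, while the hash of the recognized fields also catches a fresh photo of the same slip, as long as both photos were recognized the same way.

```python
import hashlib
from datetime import datetime


def fields_fingerprint(amount: str, card_last4: str, printed_at: datetime,
                       payment_system: str) -> str:
    """Hash of the recognized parameters (hypothetical helper).

    Two photos of the same slip should produce the same fingerprint.
    """
    key = "|".join([
        amount.strip(),
        card_last4.strip(),
        printed_at.isoformat(),          # date and time the slip was printed
        payment_system.strip().upper(),  # "visa" and "VISA" are the same system
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def image_fingerprint(image_bytes: bytes) -> str:
    """Hash of the raw image file; only catches byte-identical re-uploads."""
    return hashlib.sha256(image_bytes).hexdigest()


class DuplicateFilter:
    """Remembers fingerprints of accepted uploads (in memory, for illustration)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_duplicate(self, *fingerprints: str) -> bool:
        """True if any fingerprint was seen before; otherwise remember them all."""
        if any(fp in self._seen for fp in fingerprints):
            return True
        self._seen.update(fingerprints)
        return False
```

A caveat of the field hash: two genuinely different purchases with the same amount, card and print time would collide, which is why the print time is part of the key.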

Let me remind you that when a prize was handed out, the winner's slip amount and card (more precisely, the last 4 digits of the card number) were checked against a registry that our system sent to the store automatically after the draw. In other words, this data simply had to be correct.

We had to make our first compromise: we asked participants to type in the purchase amount and the last 4 digits of the card number by hand. If what the person entered matched what the machine recognized, and the payment system and the print date of the slip were valid, we let the participant play.

When we ran the numbers, it turned out that only 71% of slips were accepted. The remaining 29% were split roughly 50/50 between invalid or poor-quality images and valid images that had been recognized incorrectly.
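
The acceptance rule behind these numbers can be sketched in Python as follows. This is a minimal illustration under assumptions: the `Recognized` container, the `accept_slip` name, the promotion window and the `min_amount` parameter (standing in for the N-ruble requirement) are hypothetical, and the recognition service's real response format is not described here.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class Recognized:
    """Hypothetical container for recognized fields; None means 'not recognized'."""
    amount: Optional[Decimal]
    card_last4: Optional[str]
    payment_system: Optional[str]
    printed_on: Optional[date]


def accept_slip(rec: Recognized, entered_amount: Decimal, entered_last4: str,
                promo_start: date, promo_end: date, min_amount: Decimal) -> bool:
    """True only if the typed values match the slip and the slip meets the rules."""
    if rec.amount is None or rec.card_last4 is None or rec.printed_on is None:
        return False  # a required field was not recognized
    if rec.amount != entered_amount or rec.card_last4 != entered_last4:
        return False  # what the participant typed disagrees with the machine
    if (rec.payment_system or "").upper() != "VISA":
        return False  # the promotion requires payment by VISA
    if not promo_start <= rec.printed_on <= promo_end:
        return False  # slip printed outside the promotion period
    return rec.amount >= min_amount  # hypothetical N-ruble threshold
```

Using `Decimal` rather than `float` keeps amounts such as 1500.00 and 1500 comparing equal without rounding surprises.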

What to do with the 14.5% of slips that were rejected wrongly? A solution came fairly quickly: we started sending them for manual approval to a partner contact center. The downsides: it was expensive and slow. While the lucky 71% got a result within a minute, these people had to be told to wait up to 8 hours. We decided to try to normalize the recognition results within our own system.

We turned to analytics: we manually compared the data in the photos with the recognition results. The recognition result comes back as separate fields (“date”, “amount” and so on) plus, separately, the full text, that is, everything that was found in the image. Data missing from one of the fields could often be spotted by eye in the full text. After analyzing several hundred slips, we decided to do the following:

1) Telling a receipt from a slip: among all accepted slips, we find the one with the most lines. For any rejected document (whatever the reason), we count its lines, and if the count exceeds that maximum, we tell the person: “Perhaps you are trying to upload a receipt rather than a slip. Take a photo of the slip separately from the receipt and try again.” This way people understood better what was wrong with their photo.

2) If the date was not recognized: search the full text for a fragment matching the mask “XX/XX/XY”, where X is any digit and Y is any character. When such a fragment is found, replace Y with 6 (or 7, depending on the year of the slip) and treat the result as the slip's print date. Yes, the system mostly stumbled on the last digit of the date. Gain: 2%.

3) If the amount was not recognized: search the full text by the mask “Z. RU”, where X is any digit and Z is any character, including a space, or no character at all. Compare the fragment found with what the participant entered. If they differ, replace the 6s in the fragment with 8s one by one and compare each variant with what was entered. For some reason the machine often confused exactly these two digits, and only in one direction: it read 8 as 6, not the other way around. Gain: about 3%.

4) Card number: search the full text by the mask “** XXXX”, where X is any digit. There may be spaces or punctuation marks between the X characters; we discard them. The resulting number is compared with the card digits the participant entered by hand. Gain: 1%.

5) Payment system: search the full text for any of the fragments “Card: V”, “Card: V”, “Card'V”, “VISH”. If one is found, treat the card as VISA. Gain: 3%.
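
Heuristics 1 to 5 can be sketched in Python roughly as follows. Everything here is an assumption for illustration: the function names are hypothetical, dates are assumed to be in DD/MM/YY order, the amount pattern is a guess because the exact mask is not recoverable from the text, and the VISA fragments are taken as they read in this translation. For the amount, the sketch tries every combination of 6-to-8 swaps, which also covers swapping the 6s one by one.

```python
import re
from datetime import date, datetime
from decimal import Decimal
from itertools import product
from typing import Iterable, Optional


# 1) Receipt vs. slip: a document longer than any accepted slip is probably a receipt.
def max_slip_lines(accepted_texts: Iterable[str]) -> int:
    """Largest line count among the full texts of accepted slips."""
    return max((len(t.splitlines()) for t in accepted_texts), default=0)


def looks_like_receipt(full_text: str, max_lines: int) -> bool:
    return len(full_text.splitlines()) > max_lines


# 2) Date: mask XX/XX/XY (assumed DD/MM/YY); the misread digit Y is forced to year_digit.
DATE_RE = re.compile(r"(\d{2})/(\d{2})/(\d)(.)")


def find_date(full_text: str, year_digit: str = "6") -> Optional[date]:
    for m in DATE_RE.finditer(full_text):
        candidate = f"{m.group(1)}/{m.group(2)}/{m.group(3)}{year_digit}"
        try:
            return datetime.strptime(candidate, "%d/%m/%y").date()
        except ValueError:
            continue  # e.g. 45/13/16 is not a real date; try the next match
    return None


# 3) Amount: HYPOTHETICAL pattern (the article's exact mask is unclear):
#    digits, a decimal separator, two digits, an optional character, then "RU".
AMOUNT_RE = re.compile(r"(\d+[.,]\d{2}).?RU")


def amount_matches(full_text: str, entered: Decimal) -> bool:
    for m in AMOUNT_RE.finditer(full_text):
        raw = m.group(1).replace(",", ".")
        sixes = [i for i, ch in enumerate(raw) if ch == "6"]
        # Try every way of turning some of the 6s back into 8s.
        for swaps in product((False, True), repeat=len(sixes)):
            chars = list(raw)
            for pos, swap in zip(sixes, swaps):
                if swap:
                    chars[pos] = "8"
            if Decimal("".join(chars)) == entered:
                return True
    return False


# 4) Card: "**" followed by four digits, possibly separated by spaces or punctuation.
CARD_RE = re.compile(r"\*\*\s*(\d(?:\W*\d){3})")


def card_matches(full_text: str, entered_last4: str) -> bool:
    for m in CARD_RE.finditer(full_text):
        if re.sub(r"\D", "", m.group(1)) == entered_last4:
            return True
    return False


# 5) Payment system: fragments from the list above, as they read in this translation.
VISA_FRAGMENTS = ("Card: V", "Card'V", "VISH")


def looks_like_visa(full_text: str) -> bool:
    return any(fragment in full_text for fragment in VISA_FRAGMENTS)
```

Each helper is meant to run only when the corresponding structured field came back empty or disagreed with what the participant typed, so a false match costs at most one wrongly accepted slip rather than a wrongly rejected participant.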

In this way we raised the share of entries accepted within one minute to 80%. Alas, that nearly exhausted what normalization could do, and we moved on to making manual recognition more efficient (but that's another story).
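
For reference, here is how the figures above add up, assuming each gain in the list is measured in percentage points of all uploaded slips. The last line additionally assumes that all of the gain came from the valid-but-misrecognized half, so it is only a rough estimate.

```latex
\begin{aligned}
\text{valid but rejected} &= \tfrac{1}{2}\cdot 29\% = 14.5\% \\
\text{accepted automatically} &= 71\% + \underbrace{2\%}_{\text{date}} + \underbrace{3\%}_{\text{amount}} + \underbrace{1\%}_{\text{card}} + \underbrace{3\%}_{\text{VISA}} = 80\% \\
\text{valid but still rejected} &\approx 14.5\% - 9\% = 5.5\%
\end{aligned}
```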

All in all, we managed to run what is, as far as I know, the first promotion in the country with real machine recognition of receipts. For a first attempt, the result seems quite good to me, and our partner has promised to noticeably improve recognition quality by summer, when it officially launches the Russian version of its service.

Source: https://habr.com/ru/post/401391/

