Google Cloud Vision API. Has the future of Computer Vision as a service arrived?

A year ago, Google launched the Cloud Vision API platform. The idea is to offer Computer Vision technologies, where Google is the undisputed leader, as a service. A couple of years ago every task had its own technology. You could not take one general-purpose tool and expect the algorithm to solve everything. But Google took a swing at it. Now a year has passed, and the technology is still not widely known. There is one article on Habr, and even that one is not about the Cloud Vision API but about the Face API, its predecessor. The English-language Internet is not full of articles either, apart from Google's own. Is it a failure?

I first got curious about what it can do back in the spring, but I never had the energy to sit down and study it properly. Now and then I tested individual pieces. From time to time customers came and asked why the Cloud API could not be used for their task, and I had to answer. Or, the other way round, I sent them in that direction right from the doorstep. And suddenly I realized I already had enough material for an article. Let's go.

What is included in the Cloud Vision API

Google says a lot of nice words, but that is not interesting. Here is everything the service can do, according to its price list:

Label Detection - detecting the class an image belongs to, that is, what it shows: cats, dogs, elephants, serenity, etc.
OCR - text recognition
Explicit Content Detection - detection of any bad content: explicit imagery and other horrors of life.
Facial Detection - detection of faces, facial features and facial landmarks
Landmark Detection - determining the location from a photo
Logo Detection - detecting logos, symbols and icons
Image Properties - I did not figure out what this means
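For readers who want to try the features listed above, here is a minimal sketch of how a request body for the v1 images:annotate REST method could be assembled. The short keys ("labels", "ocr", ...) and the helper name build_annotate_request are made up for this illustration. Only the feature type strings and the overall body layout follow the public REST interface, and the code only builds the JSON, it does not send anything.

```python
import base64
import json

# Hypothetical short names (left) mapped to the feature type strings
# used by the v1 images:annotate REST method (right).
FEATURES = {
    "labels": "LABEL_DETECTION",
    "ocr": "TEXT_DETECTION",
    "explicit": "SAFE_SEARCH_DETECTION",
    "faces": "FACE_DETECTION",
    "landmarks": "LANDMARK_DETECTION",
    "logos": "LOGO_DETECTION",
    "properties": "IMAGE_PROPERTIES",
}


def build_annotate_request(image_bytes, wanted, max_results=10):
    """Build the JSON body for one image and several features.

    image_bytes: raw bytes of the image file.
    wanted: iterable of short names from FEATURES.
    """
    return {
        "requests": [
            {
                # The image is sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": FEATURES[name], "maxResults": max_results}
                    for name in wanted
                ],
            }
        ]
    }


# Example: ask for labels and text on a (fake) three-byte "image".
body = build_annotate_request(b"abc", ["labels", "ocr"])
print(json.dumps(body, indent=2))
```

The resulting body would then be POSTed to the images:annotate endpoint with whatever credentials your project uses. That part is left out on purpose, since it depends on the account setup.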

In this article I will cover Label Detection, OCR and Facial Detection, as the most logical Computer Vision tasks, the ones for which I know analogues and alternatives. I will also touch briefly on Landmark Detection.

Label Detection

One way or another, everyone has used this Google feature: image search on google.com relies on it. In essence, Label Detection returns characteristics of an image. A characteristic can be the object shown in the photo. It can be a photographic style, such as “Macro”, “Portrait” or “Black and White”. Or it can be something very general: “botanical object”, “atmospheric phenomenon”.
Besides search, this function can be used for:

Sorting an image collection
Tagging photos for a photo bank
Analyzing interests from photos
Etc.

Analogues. Oddly enough, Google has a lot of competitors in this area:
Microsoft. Describes quite well what is happening in the image as a whole, not just its separate parts. There is no online demo to compare with.
IBM. Much, much weaker.
CloudSight - a shady scam. They pretend to have an automatic system that recognizes everything 100% correctly. In reality, people sit behind it and label by hand. They want 50 bucks for 800 images. On my images it recognized very poorly. But maybe everyone had just stepped out for a smoke.
Clarifai. Works awesome. I could hardly believe it. It recognizes and tags better than anyone, within 2-3 seconds. Sometimes Google won, though.
There are a few smaller players with worse recognition.
There are open networks trained on ImageNet that you can fine-tune. Cheap and cheerful, but it will not work very well.

Here is a big and full comparison. I will give just a few examples:

According to Google.
According to IBM.
According to CloudSight.
According to Clarifai.
According to an ImageNet-trained GoogLeNet published openly for Caffe, this is a Siberian Husky.

An example where Google failed: 1. An example where it gave what I consider an inaccurate description: 1. The word "apple" is missing, and only CloudSight got it. The word "man" is missing: 2. The word “crow” is missing: 3.

And only in this picture, of all the ones I tried, did Google beat everyone and find the cat:

The rest failed: 1, 2, 3.

Conclusion: it works well. There are more than enough competitors, though. It is almost impossible to replicate at home.

Facial detection

Face recognition. What Google can do:

Find a face
Find facial landmarks
Determine facial expression
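The three capabilities above map onto fields of the face annotation that the service returns. Below is a minimal sketch of reading them out of a response. The sample response is hand-written for illustration, not real output, and the helper names are hypothetical. The field names (boundingPoly, landmarks, joyLikelihood and so on) follow the v1 REST response format as I understand it.

```python
# Likelihood values ordered from weakest to strongest.
LIKELIHOOD = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]
EMOTIONS = ["joy", "sorrow", "anger", "surprise"]

# Hand-written sample in the shape of an images:annotate response (illustrative only).
SAMPLE_RESPONSE = {
    "responses": [{
        "faceAnnotations": [{
            "boundingPoly": {"vertices": [
                {"x": 10, "y": 20}, {"x": 110, "y": 20},
                {"x": 110, "y": 140}, {"x": 10, "y": 140},
            ]},
            "landmarks": [
                {"type": "LEFT_EYE", "position": {"x": 40.0, "y": 60.0, "z": 0.0}},
                {"type": "RIGHT_EYE", "position": {"x": 80.0, "y": 60.0, "z": 0.0}},
                {"type": "NOSE_TIP", "position": {"x": 60.0, "y": 85.0, "z": -10.0}},
            ],
            "joyLikelihood": "VERY_LIKELY",
            "sorrowLikelihood": "VERY_UNLIKELY",
            "angerLikelihood": "VERY_UNLIKELY",
            "surpriseLikelihood": "UNLIKELY",
        }]
    }]
}


def summarize_face(face):
    """Reduce one face annotation to a box, 2D landmarks and a dominant expression."""
    vertices = face["boundingPoly"]["vertices"]
    # Zero coordinates may be omitted from the JSON, so default them to 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    landmarks = {
        lm["type"]: (lm["position"].get("x", 0.0), lm["position"].get("y", 0.0))
        for lm in face.get("landmarks", [])
    }
    ranks = {e: LIKELIHOOD.index(face.get(e + "Likelihood", "UNKNOWN")) for e in EMOTIONS}
    best = max(ranks, key=ranks.get)
    # Report an expression only if it is at least POSSIBLE.
    expression = best if ranks[best] >= LIKELIHOOD.index("POSSIBLE") else None
    return {
        "box": (min(xs), min(ys), max(xs), max(ys)),
        "landmarks": landmarks,
        "expression": expression,
    }


def summarize_faces(response):
    """Summarize every face in every per-image response."""
    return [
        summarize_face(face)
        for item in response.get("responses", [])
        for face in item.get("faceAnnotations", [])
    ]


for face in summarize_faces(SAMPLE_RESPONSE):
    print(face["box"], face["expression"], sorted(face["landmarks"]))
```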

What it cannot do, but competitors can:

Determine gender
Determine age
Compare two faces and decide whether they are the same or not.

Competitors, for example: 1, 2, 3. In general, there are dozens of them.

What does it make sense to compare Google with? For example, with face detection by Haar cascades, as in OpenCV, or by HOG, as in dlib. Google beats them. And its facial landmarks are better than dlib's:

dlib:

More, and more.

Google:

More, and more.

But at the same time, Google costs money, while dlib is free. Both take about the same number of lines to set up. Moreover, if you put in the effort, you can replace dlib with something state-of-the-art and get accuracy almost as good as Google's.

In general, Google definitely lost this round.

Landmark detection

But this one is something else, and Google has no equal here. There are no analogues at all. When I first understood what this function does, I thought, "well, it will recognize the Kremlin." Not so. Besides the Kremlin, it successfully recognizes every more or less significant tourist site. Two examples that stunned me:

Borovsk:

Well, fine, it is a more or less popular tourist spot. There are plenty of photos of it. Say it got lucky.

Conclusion:

A manor turned pioneer camp, lost between Moscow and St. Petersburg near Borovichi.

How it does this, I do not know. Here are a bunch of examples: 1, 2, 3, 4, 5, 6, 7.

It slipped up only once, but in epic fashion:

OCR - Optical Character Recognition

And finally, the most interesting part, the one I started digging for: how well Google recognizes text. This is the most industrial application. Here dozens of producers and consumers would love to have ready-made solutions:

Recognizing books
Recognizing price tags
Recognizing signs
Recognizing license plates, train car numbers, house numbers, ...
Etc., the applications are countless

Let's compare what Google has achieved with existing solutions, and with its only competitor as a “general-purpose recognizer”: Microsoft.

Books, Texts
For texts, Google has a strong competitor: Abbyy. From what I tested, their level of character recognition seems to be about the same:

Google

Perform a wireless installation
Before starting, install, verify that the
wireless access point is working correctly
network, and the product is turned on
If there is a process
If there is a blue light on it
3. Connect the USB cable between the computer and the product.
The HP Smart Install program (see picture should start
automatically within 30 seconds. Note: If HP Smart Install does not
start automatically. AutoPlay might be disabled on your computer
Browse My Computer and double-click the HP Smart Install CD drive.
-SISetup. the7exe file
product. If you can’t find the HP Smart Install CD drive,
use the software CD to install the product.
2. Follow the onscreen instructions.
3. When prompted to select a connection type, select the configuration
to print over the Wireless Network option.
1. From the product control panel, press and hold the cancel button X
for 5 seconds, a configuration page. This
page will have an lP address in the etworkNetwork Information section.
2. At the computer, open a Web browser, type the product IP address
in the address field
embedded web server page.
3. Click the ⟨HP⟩1 Smart Install tab, and then click the Download button.
4. Follow the onscreen instructions.

Abbyy

Perform a wireless installation
Before starting, install, verify that the
point is working correctly
network, and the product is turned on.
If there is not a pfod-; to process A.
If there is a blue light on it, g <f process B.
A.
1. Connect the USB cable between the computer and the product. The HP Smart Install program can be disabled automatically within 30 seconds. Double-click the SISetup.exe file to run the program. If you can’t find the HP Smart Install CD drive
2. Follow the onscreen instructions.
3. When prompted to select the wireless network option
1. For X seconds, click here for the printout page. This page will have an IP address in the Network Information section.
2. Click here for the product embedded web server page.
3. Click the HP Smart Install tab, and then dick the Download button.
4. Follow the onscreen instructions.

Microsoft

Perform a wireless instanotion (wir & ss models 0+)
Before starting the installation. verify that the
pant is working correctly
network. and the product is turned on.
If there is not a solid blue light on the product.
to process A.
If there is a blue light on the bp the product. 90 to
process B.
l. Connect the USB cable between the computer and
the product. The HP Smart Install program (see p • dure
above) should skirt automatically within 30 seconds-
Note: If HP Smart Install does not start automatically
AutoPlay rmght be disabled on your computer
Browse my computer and double-click the
HP Smart Install CD drive. Double-click the
SISetup.exe file to run the program the
product. If you can't find the HP Smart Install CD
drive, use the software CD to install the product.
2
Follow the onscreen instructions.
3
When prompted to select a connection type
Configure to print over Wireless Network opt.on.
B.
. From the product control panel, press and hold the
X for 5 seconds
print a Configuration page. This page will have an IP
address in the Network Information section.
2. At the computer, open a Web browser, type
IP address in the address field. and press the enter key
to open the product embedded web server page.
3. Click the HP Smart Install tab, and then click the
Download button.
4. Follow the onscreen instructions.

You can see that only Google and Abbyy really compete. But when it comes to a full page of text, Abbyy wins: it can structure the text and convert tables, footers, and so on. Google returns the text as one block. Plus, Google supports few languages.

I predict that startups will soon appear that use the Google API for the recognition itself and bolt all the structural analysis and text assembly on top. Given that Abbyy charges 10 times more than Google for the conversion, that is quite a juicy niche.

Clearly, in the text segment there is no good software you can just run at home. So let's move on to the next OCR task.
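The comparison above is by eye. One simple way to put a number on it is character-level similarity against a reference transcript. The sketch below uses Python's standard difflib for that. The reference line is my assumption of what the printed page says, reconstructed from the engines' agreement, and the two OCR strings are taken from the outputs quoted above (the Microsoft one joined from two lines).

```python
from difflib import SequenceMatcher

# Assumed ground truth for one line of the page (not taken from the original scan).
REFERENCE = "3. Click the HP Smart Install tab, and then click the Download button."

# The same line as returned by two of the engines in the listings above.
OCR_LINES = {
    "Abbyy": "3. Click the HP Smart Install tab, and then dick the Download button.",
    "Microsoft": "3. Click the HP Smart Install tab, and then click the Download button.",
}


def similarity(reference, recognized):
    """Return a 0..1 similarity score between the reference and the OCR text."""
    return SequenceMatcher(None, reference, recognized).ratio()


for engine, line in OCR_LINES.items():
    print(f"{engine}: {similarity(REFERENCE, line):.3f}")
```

A score of 1.0 means the line matched the reference exactly. On whole pages you would probably normalize whitespace and line breaks first, since the engines split lines differently.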

Price tags, other signs
An important point: Google does not support other languages, unlike Microsoft. But in general, both work when the text is good: not blurry, not tilted, not noisy and free of glare:

Microsoft:

175766
AHAHAC B AHAHACE
tıııışııııııı 175T

Google:

175 AHAHAC BAHAHACE 340167 000000 900708249

Microsoft: nothing recognized

Google:

K) cpeacrao ainocya. AOS Aqua Aqua ban.aasa an03 Depa Pocowe 100or - 912 mov

Microsoft: nothing at all

Google:

Overall, the text gets read on roughly 60% of the labels. In my opinion, that is simply an awesome result. But how to assemble this text, and what for, is not clear.

Moreover, well-photographed signs are read well too. Not all of the text, of course, but the large text for sure:

original

Google

Microsoft

But still, text in mixed formats is not recognized well:

Google

Microsoft

Technical information
And here both Google and Microsoft do really badly. Microsoft recognized 20% of the several dozen license plates I checked. Google got 60%, and even those without the region code. It recognized the region only in the ideal case: a large, clean plate. As soon as there is dirt, it recognizes a separate fragment and that is all.

Plus regular errors of 1-2 characters:

License plate recognition systems rely on a priori information, so they work better. Although, of course, Google may be enough for some applications. It does recognize a perfect, in-focus shot of a plate.

Without a priori information things are bad. Another machine vision case is train car numbers. Microsoft did not cope at all. Google got only 50 percent right. On the rest it kept making mistakes:

So for quality control and stable text recognition, Google and Microsoft are suitable only for the simplest tasks.

Of course, there are no ready open-source solutions for such problems, but often you can solve them yourself: enumerating simple hypotheses, contour search, etc. Localizing a license plate, for example, works quite reliably in OpenCV: there are Haar cascades and contour extraction. Plus you can train LBP or HOG detectors.
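To show what "a priori information" buys you, here is a minimal sketch of post-correcting raw OCR text with the known plate layout. It assumes the standard Russian civil plate format (one letter, three digits, two letters, then a 2-3 digit region code, with only the letters that have Latin look-alikes). The confusion tables and the function name are hypothetical and deliberately tiny. This is not how any particular commercial system works.

```python
import re

# Assumption: Russian civil plates use only these letters (Latin look-alikes).
PLATE_LETTERS = "ABEKMHOPCTYX"
# Layout: letter, 3 digits, 2 letters, 2-3 digit region code.
PLATE_RE = re.compile(rf"^[{PLATE_LETTERS}]\d{{3}}[{PLATE_LETTERS}]{{2}}\d{{2,3}}$")

# Hypothetical, minimal confusion tables for characters OCR often mixes up.
TO_DIGIT = {"O": "0", "B": "8"}
TO_LETTER = {"0": "O", "8": "B"}


def normalize_plate(raw):
    """Fix position-dependent confusions, then validate against the layout.

    Returns the corrected plate, or None if the text cannot be a full plate
    (for example when the region code is missing).
    """
    text = re.sub(r"[^0-9A-Z]", "", raw.upper())
    if not 8 <= len(text) <= 9:
        return None
    # L = letter position, D = digit position.
    layout = "LDDDLL" + "D" * (len(text) - 6)
    fixed = "".join(
        (TO_DIGIT if kind == "D" else TO_LETTER).get(ch, ch)
        for ch, kind in zip(text, layout)
    )
    return fixed if PLATE_RE.match(fixed) else None


print(normalize_plate("a 0O1 bc 77"))  # letter O in a digit position gets fixed
print(normalize_plate("A123BC"))       # no region code, rejected
```

A general-purpose recognizer cannot do this, because it does not know that the second character must be a digit. That is the whole advantage of a dedicated system.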

Summary

Label Detection - Google is ahead, but the fight is close and tight.
Facial Detection - Google is behind. It is not clear what its solution is for.
Landmark Detection - I adore it! You will not find this anywhere else!
OCR - this is where the battle is unfolding. Google has started stepping on the heels of the serious solutions, but so far cannot overtake them. At the same time, in areas where there is no clear problem statement, it leads. Microsoft is quite far behind, but trying to catch up.

So far the giants are still far from a stable CV solution for anything and everything. But they are slowly and smoothly capturing the whole market. Yes, their solutions only work when there is Internet access. But it will often be easier to provide Internet access than to build the solution yourself.

Source: https://habr.com/ru/post/312714/
