
Data Labeling Specialist

Today is a wonderful day (if you know what I mean) to announce our new program: Data Labeling Specialist.

Right now, training a strong neural network takes several components: hardware, software and, of course, data. Lots of data.

Hardware is, by and large, available to everyone through the cloud. Yes, it can be expensive, but compute instances on EC2 are quite affordable for most researchers. The software is open source: most frameworks can simply be downloaded and used. Some are harder, some are simpler, but the barrier to entry is quite acceptable. That leaves only the last component: data. And here is the snag.

Deep learning requires really big data: hundreds of thousands to millions of objects. If you want to tackle, say, image classification, then in addition to the data itself you need to tell the neural network which class each object belongs to. If your task also involves image segmentation, getting a good dataset becomes fantastically difficult. Imagine having to outline the boundaries of every object in every image.


In this post, I want to review the tools (commercial and free) that try to make life easier for these wonderful people: data labelers.

LabelMe


First up is a free tool made at MIT. With it you can annotate your images, either with simple bounding boxes or with pixel-level segmentation.


In essence, it is a UI in which you trace the outlines of objects in the image by placing points. That's all. The tool cannot do anything smarter. Another feature: LabelMe has a mobile app, so you need not waste time on the subway, the train, the bus, or in a boring lecture.
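To make the difference between the two kinds of annotation concrete, here is a minimal sketch of a polygon annotation from which a bounding box can be derived. The `PolygonAnnotation` class and its fields are hypothetical names chosen for illustration; this is not LabelMe's actual storage format.

```python
from dataclasses import dataclass


@dataclass
class PolygonAnnotation:
    """A hypothetical annotation: a class label plus the outline points."""
    label: str
    points: list  # list of (x, y) tuples, in the order they were clicked

    def bounding_box(self):
        """Return (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return (min(xs), min(ys), max(xs), max(ys))

    def area(self):
        """Polygon area in square pixels, computed with the shoelace formula."""
        total = 0
        n = len(self.points)
        for i in range(n):
            x1, y1 = self.points[i]
            x2, y2 = self.points[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2


# Four clicks around a rectangular object.
car = PolygonAnnotation("car", [(10, 20), (60, 20), (60, 50), (10, 50)])
box = car.bounding_box()
```

A polygon always yields a bounding box, but not the other way around, which is why segmentation labeling is so much more work per image.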

Prodi.gy


One of the most advanced active learning systems. The idea is that a pre-trained model, after minimal additional training, tries to label your data, and your job is only to guide it. The target audience is analysts and engineers who need to label data properly but do not have the resources for external labelers. The UX, according to the developers, is similar to Tinder.


The tool asks you to label only the objects it is unsure about. They seem to put more emphasis on working with text, but they also support computer vision, including video. We have not used it ourselves. It is a paid product: licenses start at $390.
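The "ask only about the uncertain objects" idea is usually called uncertainty sampling. Below is a minimal sketch of it, assuming the model returns class probabilities for each item. The function names and the sample predictions are made up for illustration, and this is not Prodigy's API or necessarily the criterion it uses.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` items the model is least sure about.

    `predictions` maps an item id to the model's class probabilities.
    Higher entropy means the model is less certain about the item.
    """
    ranked = sorted(
        predictions,
        key=lambda item: entropy(predictions[item]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled images (two classes).
predictions = {
    "img_001": [0.98, 0.02],
    "img_002": [0.55, 0.45],
    "img_003": [0.80, 0.20],
    "img_004": [0.50, 0.50],
}
queue = select_for_labeling(predictions, budget=2)
```

Here the human would be shown img_004 and img_002 first, while the confident predictions are left alone. After each batch of answers the model is retrained and the ranking is recomputed.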

Scale API


These guys offer a turnkey labeling process. Their pitch: give us your data, we will pass it to our labelers, check the quality, and return the result after a while. And all of this goes through an API.


Naturally, this is not a free tool either. For example, labeling one image for semantic segmentation (that is, outlining the objects in the image and saying what kind of objects they are) costs $8 if you need it urgently, or $6.40 if you are willing to wait.
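To get a feel for what these per-image prices mean at deep learning scale, here is a back-of-the-envelope calculation. The dataset size of 100,000 images is an assumption picked from the "hundreds of thousands" range mentioned earlier, and the prices are the two quoted above.

```python
from decimal import Decimal

# Per-image prices for semantic segmentation quoted in the text.
PRICE_URGENT = Decimal("8.00")
PRICE_STANDARD = Decimal("6.40")


def labeling_cost(num_images, price_per_image):
    """Total cost in dollars of labeling `num_images` at a flat per-image price."""
    return num_images * price_per_image


num_images = 100_000  # assumed dataset size, for illustration only

urgent_total = labeling_cost(num_images, PRICE_URGENT)
standard_total = labeling_cost(num_images, PRICE_STANDARD)
savings = urgent_total - standard_total

print(f"urgent:   ${urgent_total:,.2f}")
print(f"standard: ${standard_total:,.2f}")
print(f"savings:  ${savings:,.2f}")
```

Under these assumptions the bill comes to $800,000 for urgent labeling or $640,000 if you can wait, which is why labeling cost, rather than hardware or software, tends to be the limiting factor.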

Supervise.ly


This tool is meant to simplify segmentation labeling. Under the hood (judging by how it feels) something like Polygon-RNN is at work: you mark objects with rectangles, and the system itself finds the object's boundary inside each rectangle. They have different trained networks for different subject areas.


The team can also generate synthetic data from games and mix it with real data when real data is hard to get. On top of that, they can deploy the whole system inside your company, so that your data never leaves it. Overall, it feels like it can speed up a labeler's work. But that is not certain.

Mechanical Turk


The power of Indian labelers at your fingertips. Expensive for you, pennies for them, poor quality, opaque quality control, but everyone uses it. In Russia there is an analogue: Yandex.Toloka.


Someday we will interview the workers on these platforms and find out what their working day looks like and what difficulties they face.

Crowdflower


This tool is the de facto standard for labeling. They also use live people, but give them more advanced tools than Toloka or MTurk to make labeling easier.


In addition to standard bounding boxes, semantic segmentation, and polygons, they also label points, for example for warehouses or shelves in stores.

As you can see, the market for such solutions is still very narrow, but the potential is large, because the bottleneck in AI right now is precisely well-labeled data. Joking aside, this really is the future.

If you know of other tools, write in the comments.

Source: https://habr.com/ru/post/352572/

