
A neural network generates images of dishes from their textual recipes.


Comparison of real photos (top row) with images generated with semantic regularization (middle row) and without it (bottom row)

A team of researchers from Tel Aviv University has developed a neural network that generates images of dishes from their textual recipes. A home cook can thus preview how a dish will turn out if one or another part of the recipe is changed: a new ingredient added, or an existing one removed. This work could well form the basis of a commercial application, especially since the program's source code is publicly available.

The neural network is a modified version of a generative adversarial network (GAN) called StackGAN V2. It was trained on a large base of 52,000 image/recipe pairs from the recipe1M dataset.
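In a conditional GAN, the generator receives an embedding of the conditioning text alongside the usual noise vector, so different recipes steer the output in different directions while the noise provides sample diversity. Below is a minimal NumPy sketch of that conditioning step; the toy text encoder, all dimensions, and the single-layer "generator" are illustrative assumptions, not the authors' StackGAN V2 code:

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 16   # size of the recipe-text embedding (illustrative)
NOISE_DIM = 8    # size of the latent noise vector (illustrative)
IMG_DIM = 32     # size of the flattened toy "image"


def embed_recipe(text: str) -> np.ndarray:
    """Toy text encoder: average of per-word pseudo-random vectors.

    The paper uses a learned recipe embedding; this hash-seeded
    stand-in only illustrates mapping text to a fixed-size vector.
    """
    vecs = [
        np.random.default_rng(zlib.crc32(w.encode())).standard_normal(EMBED_DIM)
        for w in text.lower().split()
    ]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMBED_DIM)


# Toy "generator": a single affine layer mapping [embedding | noise] -> image.
W = rng.standard_normal((EMBED_DIM + NOISE_DIM, IMG_DIM)) * 0.1


def generate(recipe_text: str) -> np.ndarray:
    cond = embed_recipe(recipe_text)      # conditioning signal from the text
    z = rng.standard_normal(NOISE_DIM)    # noise gives per-sample diversity
    # Concatenate condition and noise, then map to a fake "image" in (-1, 1).
    return np.tanh(np.concatenate([cond, z]) @ W)


img = generate("fish cutlets with tomato sauce")
print(img.shape)  # prints (32,)
```

In the real model the generator is a deep convolutional network producing an actual image tensor, but the conditioning idea is the same: the text embedding is fed in together with the noise.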

In principle, the neural network can take almost any list of ingredients and instructions, even a fantastic combination, and figure out what the finished dish would look like.
β€œIt all started when I asked my grandmother for the recipe for her legendary fish cutlets with tomato sauce,” says Ori Bar El, lead author of the paper. β€œBecause of her age, she did not remember the exact recipe. So I wondered whether it is possible to build a system that infers a recipe from an image of food. After thinking about the problem, I concluded that it is too hard for a system to recover an exact recipe with real and β€˜hidden’ ingredients such as salt, pepper, butter, flour, and so on. Then I wondered whether it could be done the other way around: generating images of dishes from their recipes. We believe this task is very difficult for people, and all the more so for computers. Since most modern artificial intelligence systems try to replace experts on tasks that are simple for humans, we thought it would be interesting to solve a problem that may even go beyond human capabilities. As you can see, it can be done with some success.”

Generating images from text is a difficult task with many applications in computer vision. Recent work has shown that generative adversarial networks (GANs) are very effective at synthesizing high-quality, realistic images from datasets of low variability and low resolution.

It is also known that cGAN-type networks can generate convincing images directly from a text description. Recently, the recipe1M dataset, containing 800,000 pairs of recipes and their corresponding images, was published as part of a scientific study (see A. Salvador, N. Hynes, Y. Aytar, J. Marin, F. Ofli, I. Weber, and A. Torralba. Learning Cross-Modal Embeddings for Cooking Recipes and Food Images. IEEE Conference on Computer Vision and Pattern Recognition, 2017). The set has high variability thanks to the diversity of food categories. In addition, each image is accompanied by complex text in two sections, ingredients and instructions; in total, the text part can run to dozens of lines.
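To make the two-section text structure concrete, here is what a single recipe/image record in a recipe1M-style dataset might look like. The field names and contents below are illustrative assumptions for the sketch, not the dataset's exact schema:

```python
# A simplified, hypothetical recipe/image record: two text sections
# (ingredients and instructions) paired with an image reference.
record = {
    "title": "Fish cutlets with tomato sauce",
    "ingredients": [
        "2 fillets of white fish",
        "1 cup tomato sauce",
        "salt",
        "pepper",
    ],
    "instructions": [
        "Mince the fish and form small cutlets.",
        "Fry until golden, then simmer in the tomato sauce.",
    ],
    "image": "images/fish_cutlets.jpg",  # path is hypothetical
}


def recipe_to_text(rec: dict) -> str:
    """Concatenate the two text sections that condition the generator."""
    return " ".join(rec["ingredients"]) + " " + " ".join(rec["instructions"])


text = recipe_to_text(record)
print("salt" in text)  # prints True
```

Even this short example already spans several lines of free-form text per recipe, which is what makes conditioning image generation on it harder than on the single-sentence captions used in earlier text-to-image work.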

With such an excellent dataset in hand, all that remained for the Tel Aviv University researchers was to train the neural network, combining their accumulated expertise in generative adversarial networks with the published data.

The researchers acknowledge that the system is not yet perfect. The problem is that the source dataset consists of relatively low-resolution images (256 Γ— 256 pixels), often of poor quality: many were shot in bad lighting, many depict shapeless, porridge-like food, and many are non-square, which complicates training. This explains why both of the cGAN models developed manage to produce convincing images of "mushy" dishes (for example, pasta, rice, soups, salads) but struggle to generate dishes with a distinctive, well-defined shape (for example, a hamburger or chicken).
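One plausible mitigation for these data-quality issues is to filter the training images by resolution and aspect ratio before training. The sketch below shows the idea; the thresholds and the filter itself are assumptions for illustration, not the authors' actual preprocessing:

```python
# Illustrative pre-filter for the quality problems described above:
# drop images that are too small or too far from square. The specific
# thresholds are assumptions, not the paper's preprocessing choices.
def keep_image(width: int, height: int,
               min_side: int = 256, max_aspect: float = 1.2) -> bool:
    if min(width, height) < min_side:
        return False  # too low-resolution to be useful at 256 x 256
    aspect = max(width, height) / min(width, height)
    return aspect <= max_aspect  # reject strongly non-square images


sizes = [(256, 256), (640, 480), (120, 120), (512, 500)]
kept = [s for s in sizes if keep_image(*s)]
print(kept)  # prints [(256, 256), (512, 500)]
```

Such a filter trades dataset size for consistency, which matters here because the authors report that shape-sensitive dishes are exactly where the model fails.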

In the future, the authors intend to continue the work by training the system on the rest of the recipes (about 350,000 suitable images remain in the dataset). That, however, does not change the fact that the available photos are of poor quality, so they are also considering building a dataset of their own, based on the text and illustrations of children's books.

The scientific article was published on January 8, 2019 on the arXiv.org preprint server (arXiv:1901.02404).

Source: https://habr.com/ru/post/435924/

