
Deep learning to determine the style and genre of paintings

Hi, Habr!


Today I want to talk about the second part of our project: a service for identifying and classifying works of art. As a reminder, we were solving two main tasks:


  1. finding a painting in the database from a photo taken with a mobile phone;
  2. determining the style and genre of a painting that is not in the database.

Today we will consider the use of a convolutional neural network to classify images by style and genre.



Let's help Dasha understand modern art?


Determining the style of paintings


Of the nearly 250,000 paintings in the Arthive database, fewer than 20% have a genre, style or technique assigned. The classes recorded in the database often do not match the true values, and many classes contain too few images; some contain only a handful. Apparently, some artists feel the need to invent a name for their own style.

In total, about 75 styles are defined in the database. For our work, however, the customer selected 27 mandatory styles (one more was added later) that the system must be able to recognize.

The number of images per style is very uneven.


Style                 Qty     Style                 Qty
Realism               19594   Primitivism           1234
Impressionism         15864   Art Deco              1092
Romanticism           8963    Northern Renaissance  921
Baroque               7726    Cubism                902
Modern                4882    Academism             707
Surrealism            4793    Gothic                608
Renaissance           4709    Modernism             539
Expressionism         4329    Socialist realism     481
Symbolism             4321    Pop Art               475
Post-impressionism    3951    Pointillism           275
Abstractionism        3664    Fauvism               217
Ukiyo-e               3136    Avant-garde           174
Classicism            1730    Hyperrealism          13
Rococo                1600    Fantasy               8
Total                 96908

All styles
Style                 Qty     Style                   Qty   Style                   Qty
Realism               19594   Pop Art                 475   Decorative art          66
Impressionism         15864   Biedermeier             471   Minimalism              66
Romanticism           8963    Fantastic realism       386   Sentimentalism          66
Baroque               7726    Abstract expressionism  358   Cloisonnism             60
Modern                4882    Nabi                    339   Metaphysical painting   56
Surrealism            4793    Pointillism             275   Macchiaioli             52
Renaissance           4709    Suprematism             273   Orphism                 51
Expressionism         4329    Pre-Raphaelites         252   Dadaism                 50
Symbolism             4321    Magical realism         248   Neoimpressionism        49
Post-impressionism    3951    Early Renaissance       232   Luminism                41
Abstractionism        3664    Neo-expressionism       230   Proto-renaissance       39
Dutch Golden Age      3292    Fauvism                 217   Plentanism              37
Ukiyo-e               3136    Postmodernism           192   Tenebrism               35
Classicism            1730    Avant-garde             174   Abstract impressionism  34
Rococo                1600    Modern Art              149   Conceptualism           29
Primitivism           1234    Precisionism            138   Japonism                24
Art Deco              1092    Cubofuturism            108   Postmodern              24
Northern Renaissance  921     Constructivism          104   Rayonism                24
Cubism                902     Tonalism                103   Byzantine               20
Academism             707     Orphism                 94    Romantic realism        19
Gothic                608     Regionalism             93    Hyperrealism            13
Neoclassicism         601     Analytical realism      89    Verism                  11
Mannerism             544     Naturalism              73    Neo-primitivism         10
Modernism             539     Neo-modernism           70    Fantasy                 8
Socialist realism     481     Futurism                67    Metarealism             7
Total                 106284


We are faced with an image classification task, but we cannot hand-pick simple features for it. So we will use deep learning, in which such complex features are extracted automatically during training.



Transfer learning


Consider the Inception v3 network.



General architecture with intermediate outputs

In its architecture (as in any other deep network) we can loosely distinguish two main components: the Feature Extractor and the Predictor.

The Feature Extractor maps input color images into a multi-dimensional feature space (a multi-channel feature map). The feature map preserves spatial information: it is a three-dimensional tensor with dimensions for width, height, and number of feature channels, because the final pooling, which completely discards information about the relative position of features in the original image, has not yet been applied. The Feature Extractor of the Inception v3 network receives a 299×299×3 image as input and produces a feature map of size 17×17×2048. The input size can be varied, which changes the size of the feature map and can be useful for reducing the computational cost of running the network.

A Predictor is a network that produces the output from the feature map formed by the Feature Extractor. For classification, the Predictor is usually a fully connected layer whose number of outputs equals the number of classes in the problem.

Classical transfer learning means that we take a trained network, detach its Feature Extractor, and attach a new Predictor with the number of classes we need. The resulting network is then trained at a low learning rate with the weights of the Feature Extractor layers partially or completely frozen.


We apply transfer learning to style classification. We take the Inception-v3 network trained on the ImageNet dataset and replace its output layer with one that classifies input images into the selected styles. We trained the resulting network on images of different styles, freezing all layers except the last.
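The idea of training only the new output layer on top of a frozen extractor can be illustrated with a deliberately tiny sketch in plain Python. Everything in it is an assumption made for illustration: frozen_extractor is a hand-written stand-in for the pre-trained Inception-v3 layers, the "images" are short lists of pixel intensities, and the softmax head with plain gradient descent is not the project's actual training setup.

```python
import math
import random


def frozen_extractor(image):
    """Stand-in for the pre-trained Feature Extractor; it has no trainable parameters."""
    mean = sum(image) / len(image)
    contrast = max(image) - min(image)
    return [mean, contrast, 1.0]  # the constant 1.0 acts as a bias input


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(head, image):
    feats = frozen_extractor(image)
    return softmax([sum(w * f for w, f in zip(row, feats)) for row in head])


def predict(head, image):
    probs = predict_proba(head, image)
    return probs.index(max(probs))


def train_head(images, labels, n_classes, lr=0.1, epochs=300, seed=0):
    """Train only the new Predictor (one weight row per class) by gradient descent."""
    rng = random.Random(seed)
    n_feats = len(frozen_extractor(images[0]))
    head = [[rng.uniform(-0.01, 0.01) for _ in range(n_feats)]
            for _ in range(n_classes)]
    for _ in range(epochs):
        for image, label in zip(images, labels):
            feats = frozen_extractor(image)
            probs = predict_proba(head, image)
            for c in range(n_classes):
                # Gradient of the cross-entropy loss with respect to the class score.
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(n_feats):
                    head[c][j] -= lr * grad * feats[j]
    return head


# Toy data: two "dark" and two "bright" images, classes 0 and 1.
images = [[0.1, 0.2, 0.1, 0.15], [0.05, 0.1, 0.2, 0.1],
          [0.9, 0.8, 0.95, 0.85], [0.8, 0.9, 0.85, 0.9]]
labels = [0, 0, 1, 1]
head = train_head(images, labels, n_classes=2)
predictions = [predict(head, image) for image in images]
```

Only the weights in head change during training; the extractor stays exactly as it was, which is what freezing all layers except the last amounts to.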


To analyze the results, we plotted how the network distributes the validation set across classes.



Each row corresponds to a class from the validation set. The brightness of each square in a row is proportional to the number of paintings of that class that the network assigned to the class of the corresponding column.

For better clarity, we exclude the main diagonal and re-normalize the values of each row.
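As a minimal sketch of this step, the function below zeroes the main diagonal of a confusion matrix and rescales each row so that the remaining errors sum to one. The style names and counts are made up for illustration and are not the project's data.

```python
def off_diagonal_rows(confusion):
    """Zero the main diagonal and rescale each row so its remaining entries sum to 1."""
    result = []
    for i, row in enumerate(confusion):
        errors = [0.0 if j == i else float(v) for j, v in enumerate(row)]
        total = sum(errors)
        # A row with no errors stays all zeros instead of dividing by zero.
        result.append([v / total if total else 0.0 for v in errors])
    return result


# Hypothetical counts: rows are true styles, columns are predicted styles.
styles = ["Fauvism", "Expressionism", "Realism"]
confusion = [
    [10, 8, 2],
    [3, 40, 7],
    [0, 0, 50],
]
errors = off_diagonal_rows(confusion)
```

After this transformation each row shows only where the mistakes for that style go, so a dominant confusion (here, the made-up Fauvism row leaning towards Expressionism) is no longer drowned out by the correct answers on the diagonal.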



In addition, we project the distribution of styles onto a two-dimensional space using t-SNE.


Many errors are visible. For example, a significant share of Fauvist paintings are assigned by the network to Expressionism. Northern Renaissance and Gothic are often assigned to Renaissance. Many Rococo and Classicism images are assigned to Realism. Modernism and Modern are split across many styles.


We threw together a simple script that sorted the training set into folders according to the style predicted by the network, and carried out a brief error analysis. It turned out that the labeling of the database raises questions, to say the least.
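A script of this kind might look roughly like the sketch below. It is not the original script: predict_style is a hypothetical callable that wraps the trained network and returns a style name for an image path, and the list of accepted file extensions is an assumption.

```python
import shutil
from pathlib import Path


def sort_by_predicted_style(image_dir, output_dir, predict_style):
    """Copy every image into output_dir/<predicted style>/ and return the mapping."""
    image_dir, output_dir = Path(image_dir), Path(output_dir)
    placed = {}
    for path in sorted(image_dir.iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue  # skip anything that is not an image file
        style = predict_style(path)  # hypothetical wrapper around the trained network
        target = output_dir / style
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target / path.name)  # copy, so the source set stays intact
        placed[path.name] = style
    return placed
```

Browsing the resulting folders side by side with the original labels makes systematic labeling problems easy to spot by eye.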

Many images labeled as Modernism (which the customer marked as mandatory, although it is not a style so much as a broad movement in art) were in fact duplicated in other styles, especially in Modern (Art Nouveau), which is a style proper.


The Socialist Realism class contained abstract images, for example works by Lissitzky. Apparently they ended up there because of Lissitzky's work on Soviet posters, which is only very loosely related to Socialist Realism.


In many cases these are genuine mistakes, but sometimes the cause is that the very definition of certain styles, especially modern ones, is debatable. Keep in mind that the database is filled in by many different users, who do not always agree with one another.

Errors in the data lead to corresponding classification errors in the network. After the database was cleaned, both by us and by an expert art critic on the customer's side, the labeling of the training set improved significantly.


However, most of the network's classification errors (by total count) involve more or less well-established styles, such as Rococo, Classicism, and Realism. Works are usually attributed to these styles on the basis of period or authorship, and the attribution appears to be uncontroversial. Why can the network not tell these styles apart? The main reason is the use of a pre-trained network for feature extraction.


The point is that this network was trained to classify objects, that is, to determine what is depicted, while discarding information about how it is depicted, which is irrelevant to that task. For example, from the network's point of view, all the images at the beginning of the article simply depict a person.


To address this, we built a network with intermediate outputs. It is generally believed that features become more and more complex deeper in the network, while irrelevant information gradually disappears. We try to extract from the intermediate layers what was irrelevant for ImageNet classification.



General architecture with intermediate outputs
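The article does not say how the intermediate outputs are combined. One common option, shown here purely as an assumption, is to average-pool each tapped feature map over its spatial dimensions and concatenate the pooled vectors into a single feature vector for the Predictor. The sketch uses nested lists in place of real tensors, and the two tiny maps are invented.

```python
def global_average_pool(feature_map):
    """Collapse a height x width x channels map (nested lists) to one value per channel."""
    n_channels = len(feature_map[0][0])
    n_cells = len(feature_map) * len(feature_map[0])
    sums = [0.0] * n_channels
    for row in feature_map:
        for cell in row:
            for c, value in enumerate(cell):
                sums[c] += value
    return [s / n_cells for s in sums]


def combine_intermediate_outputs(feature_maps):
    """Pool each tapped layer separately, then concatenate the pooled vectors."""
    combined = []
    for feature_map in feature_maps:
        combined.extend(global_average_pool(feature_map))
    return combined


# Two hypothetical taps: an early 2x2 map with 2 channels and a late 1x1 map with 3.
early = [[[1.0, 0.0], [3.0, 2.0]], [[5.0, 4.0], [7.0, 6.0]]]
late = [[[0.5, 0.25, 0.0]]]
features = combine_intermediate_outputs([early, late])
```

Pooling each tap separately lets layers with different spatial sizes contribute to the same vector, so the Predictor sees both low-level texture-like features and high-level object features.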

There is another problem: graphic works, prints, and sketches. ImageNet, on which the Inception network was pre-trained, contains nothing of the kind, so the features extracted by the network are poorly suited to classifying such images.







Realism, Impressionism.

Camille Corot, Hagar and the Angel

Baroque
Rembrandt Harmenszoon van Rijn, Hagar and the Angel


On the other hand, the Ukiyo-e style stands out beautifully as a separate cloud. Ukiyo-e is a kind of engraving that has been widespread in Japan since the 17th century. Although this style was not initially on our mandatory list, we added it.



Asakusa rice fields and the Torinomachi festival


After working on the data, we achieved a better separation of the classes.


Sorting out the genres


Of the total number of genres, 13 were selected (bold)


Genre                 Qty
Allegorical scene     2500
Portrait              2308
Landscape             2213
Fantasy               2191
Literary scene        2096
Cityscape             2048
Nude                  1981
Still life            1932
Genre scene           1736
Animalism             1587
Religious scene       1417
Mythological scene    1368
Marina                1210
Architecture          958
Interior              635
Historical scene      534
Battle scene          201
Zakli                 180
Lead                  124
Urban landscape       16
Total                 27235

The number of genres was reduced mainly by merging the genres for various kinds of scenes ("religious", "mythological", "allegorical", "literary") under the common name "genre scene". We concluded that these genres can hardly be separated with sufficient accuracy without substantial cultural analysis.

For example, an allegorical scene by definition implies that the image has a hidden meaning and uses the depicted objects figuratively. The "religious scene" poses a different difficulty: a network trained to output such a class will very likely assign it to caricatures as well (for example, parodies of da Vinci's "Last Supper"), and this may offend someone.


The labeling of the data by genre initially looked quite good, except for several genres with few images in the database. By searching the Internet, we managed to slightly increase the number of images in these genres (mainly the battle scene, twists and vedutas).
After merging the difficult genres into a common "genre scene", we first tried to train the network head-on, using transfer learning on Inception.


Genres, result 1


You can see that the points corresponding to images of different genres are mixed together. For these images the network assigns high probabilities to several genres at once, and the resulting genre is most likely picked almost at random. The reason seems to be that genres, unlike styles, have a more pronounced hierarchy. We tried to untangle these relationships and arrived at the following map of genres:



Hierarchy of genres


Child and parent genres in the hierarchy often share features from the network's point of view (and from ours as well). For example, a battle scene on land has broadly the same characteristics as an ordinary landscape (an image of a large open area or a city), while a battle scene at sea looks more like the marina genre. We therefore split the battle scene genre in two: on land and at sea. Another example: portraits, genre scenes and nudes all share one feature from the pre-trained network's point of view, namely the presence of people.


In the database, paintings with similar content are often assigned either to a child or to a parent genre, depending on the choice of the expert who added them. We therefore carried out a large-scale cleaning and relabeling of the database that took the possible genre hierarchy into account. This took a lot of effort (we managed to automate it, but only slightly).


To convey the genre hierarchy to the network, we abandoned the one-hot representation and set a one not only for the image's own genre but also for its parent genre, if it has one. We also replaced the loss function of the training process and the activation function of the output layer. The task thus became multi-label classification (an input image can belong to several classes).
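The target encoding described above can be sketched as follows. The genre list and the PARENT mapping are a hypothetical fragment of the hierarchy, not the project's actual map. The article also does not name the new loss and activation, so the per-class sigmoid with binary cross-entropy shown here is only the usual choice for multi-label problems, taken as an assumption.

```python
import math

# Hypothetical fragment of the genre hierarchy: child genre -> parent genre.
GENRES = ["landscape", "marina", "portrait",
          "battle scene on land", "battle scene at sea"]
PARENT = {"battle scene on land": "landscape", "battle scene at sea": "marina"}


def multilabel_target(genre):
    """Set a one for the genre itself and for its parent, if it has one."""
    active = {genre}
    if genre in PARENT:
        active.add(PARENT[genre])
    return [1.0 if g in active else 0.0 for g in GENRES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, target):
    """Mean per-class binary cross-entropy; every class is scored independently."""
    eps = 1e-12  # guards against log(0)
    losses = []
    for z, t in zip(logits, target):
        p = sigmoid(z)
        losses.append(-(t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)))
    return sum(losses) / len(losses)


target = multilabel_target("battle scene at sea")
```

Unlike softmax, independent sigmoids do not force the class probabilities to compete, so a painting can score high for both "battle scene at sea" and "marina" without that being treated as an error.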



It seems to us that one more genre is missing here: abstraction. Strictly speaking, it is not exactly a genre; at least the experts insisted that no such genre exists. To keep the network from assigning random genres to abstract images, we added one more class, called "failed to identify", which includes abstract and controversial images.


Instead of a conclusion


Overall, we achieved satisfactory classification accuracy for styles and genres, but there is much room for improvement.


Unfortunately, the classification of styles and techniques was not completed: support for it was never implemented in the service.

Source: https://habr.com/ru/post/422357/

